Cloud – Page 5 – LeftAsExercise

Automating provisioning with Ansible – variables and facts

In the playbooks that we have considered so far, we have used tasks, links to the inventory and modules. In this post, we will add another important feature of Ansible to our toolbox – variables.

Declaring variables

Using variables in Ansible is slightly more complex than you might expect at first glance, mainly because variables can be defined at many different points and the precedence rules are a bit complicated, making errors likely. Ignoring some of the details, here are the essential options that you have to define variables and assign values:

You can assign variables in a playbook on the level of a play, which are then valid for all tasks and all hosts within that play
Similarly, variables can be defined on the level of an individual task in a playbook
You can define variables on the level of hosts or groups of hosts in the inventory file
There is a module called set_fact that allows you to define variables and assign values which are then scoped per host and for the remainder of the playbook execution
Variables can be defined on the command line when executing a playbook
Variables can be bound to a module so that the return values of that module are assigned to that variable within the scope of the respective host
Variable definitions can be moved into separate files and be referenced from within the playbook
Finally, Ansible will provide some variables and facts

Let us go through these various options using the following playbook as an example.

---
- hosts: all
  become: yes
  # We can define a variable on the level of a play, it is
  # then valid for all hosts to which the play applies
  vars:
    myVar1: "Hello"
  vars_files:
  - vars.yaml
  tasks:
    # We can also set a variable using the set_fact module
    # This will be valid for the respective host until completion
    # of the playbook
  - name: Set variable
    set_fact:
      myVar2: "World"
      myVar5: "{{ ansible_facts['machine_id'] }}"
    # We can register variables with tasks, so that the output of the
    # task will be captured in the variable
  - name: Register variables
    command: "date"
    register:
      myVar3
  - name: Print variables
    # We can also set variables on the task level
    vars:
      myVar4: 123
    debug:
      var: myVar1, myVar2, myVar3['stdout'], myVar4, myVar5, myVar6, myVar7

At the top of this playbook, we see an additional attribute vars on the level of the play. This attribute is itself a dictionary of key-value pairs that define variables which are valid across all tasks in the play. In our example, this is the variable myVar1.

The same syntax is used for the variable myVar4 on the task level. This variable is then only valid for that specific task.

Directly below the declaration of myVar1, we instruct Ansible to pick up variable definitions from an external file. This file is again in YAML syntax and can define arbitrary key-value pairs. In our example, this file could be as simple as

---
  myVar7: abcd

Separating variable definitions from the rest of the playbook is very useful if you deal with several environments. You could then move all environment-specific variables into separate files so that you can use the same playbook for all environments. You could even turn the name of the file holding the variables into a variable that is then set using a command line switch (see below), which allows you to use different sets of variables for each execution without having to change the playbook.

The variable myVar3 is registered with the module command, meaning that it will capture the output of this module. Note that this output will usually be a complex data structure, i.e. a dictionary. One of the keys in this dictionary, which depends on the module, is typically stdout and captures the output of the command.
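
To get a feeling for the shape of such a registered result, here is a minimal Python sketch that runs the same command locally and collects the output in a dictionary. The helper run_and_register is hypothetical and only mimics a few of the keys (rc, stdout, stdout_lines, stderr, changed) that the command module typically returns; the real result contains more fields.

```python
import subprocess


def run_and_register(argv):
    """Run a command and return a dict shaped loosely like a registered result.

    Hypothetical helper for illustration only, not part of Ansible.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    stdout = proc.stdout.rstrip("\n")
    return {
        "cmd": argv,
        "rc": proc.returncode,
        "stdout": stdout,
        "stdout_lines": stdout.splitlines(),
        "stderr": proc.stderr.rstrip("\n"),
        # The command module reports a change by default, as it cannot
        # know whether the command modified anything.
        "changed": True,
    }


myVar3 = run_and_register(["date"])
# This corresponds to myVar3['stdout'] in the debug task of the playbook
print(myVar3["stdout"])
```

This is why the debug task refers to myVar3['stdout'] rather than to myVar3 itself.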

For myVar2, we use the module set_fact to define it and assign a value to it. Note that this value will only be valid per host, as myVar5 demonstrates (here we use a fact and Jinja2 syntax – we will discuss this further below).

In the last task of the playbook, we print out the value of all variables using the debug module. If you look at this statement, you will see that we print out a variable – myVar6 – which is not defined anywhere in the playbook. This variable is in fact defined in the inventory. Recall that the inventory for our test setup with Vagrant introduced in the last post looked as follows.

[servers]
192.168.33.10
192.168.33.11

To define the variable myVar6, change this file as follows.

[servers]
192.168.33.10 myVar6=10
192.168.33.11 myVar6=11

Note that after each host, we have added the variable name along with its value, which is specific to that host. If you now run this playbook with a command like

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook  \
        -u vagrant \
        --private-key ~/vagrant/vagrant_key \
        -i ~/vagrant/hosts.ini \
         definingVariables.yaml

then the last task will produce an output that contains a list of all variables along with their values. You will see that myVar6 has the value defined in the inventory, that myVar5 is in fact different for each host and that all other variables have the values defined in the playbook.

As mentioned before, it is also possible to define variables using an argument to the ansible-playbook executable. If, for instance, you use the following command to run the playbook

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook  \
        -u vagrant \
        --private-key ~/vagrant/vagrant_key \
        -i ~/vagrant/hosts.ini \
        -e myVar4=789 \
         definingVariables.yaml

then the output will change and the variable myVar4 has the value 789. This is an example of the precedence rule mentioned above – an assignment specified on the command line overrides all other definitions.
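
One way to think about precedence is as a stack of layers, where a definition in a higher layer hides a definition of the same variable in a lower layer. The following Python sketch models this for the sources used in this post only. The layer names and the resolve function are made up for illustration, and the real precedence list in the Ansible documentation has many more entries.

```python
# Layers ordered from lowest to highest precedence. This is a simplified
# subset of the precedence list, restricted to the sources used in this post.
layers = [
    ("inventory host variable", {"myVar6": 10}),
    ("play vars", {"myVar1": "Hello"}),
    ("vars_files", {"myVar7": "abcd"}),
    ("task vars", {"myVar4": 123}),
    ("set_fact / register", {"myVar2": "World"}),
    # Values passed as key=value on the command line arrive as strings
    ("extra vars (-e)", {"myVar4": "789"}),
]


def resolve(layers):
    """Merge all layers, letting later (higher) layers win."""
    resolved, origin = {}, {}
    for name, values in layers:
        for key, value in values.items():
            resolved[key] = value
            origin[key] = name
    return resolved, origin


resolved, origin = resolve(layers)
for key in sorted(resolved):
    print(f"{key} = {resolved[key]!r} (from {origin[key]})")
```

In this model, myVar4 ends up with the value from the command line, as the extra vars layer is applied last.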

Using facts and special variables

So far we have been using variables that we instantiate and to which we assign values. In addition to these custom variables, Ansible will create and populate a few variables for us.

First, for every machine in the inventory to which Ansible connects, it will create a complex data structure called ansible_facts which represents data that Ansible collects on the machine. To see an example, run the command (assuming again that you use the setup from my last post)

export ANSIBLE_HOST_KEY_CHECKING=False
ansible \
      -u vagrant \
      -i ~/vagrant/hosts.ini \
      --private-key ~/vagrant/vagrant_key  \
      -m setup all

This will print a JSON representation of the facts that Ansible has gathered. We see that facts include information on the machine like the number of cores and hyperthreads per core, the available memory, the IP addresses, the devices, the machine ID (which we have used in our example above), environment variables and so forth. In addition, we find some information on the user that Ansible is using to connect, the used Python interpreter and the operating system installed.

It is also possible to add custom facts by placing a file with key-value pairs in a special directory on the host. Confusingly, this is called local facts, even though these facts are not defined on the control machine on which Ansible is running but on the host that is provisioned. Specifically, a file in /etc/ansible/facts.d ending with .fact can contain key-value pairs that are interpreted as facts and added to the dictionary ansible_local.

Suppose, for instance, that on one of your hosts, you have created a file called /etc/ansible/facts.d/myfacts.fact with the following content

[test]
testvar=1

If you then run the above command again to gather all facts, then, for that specific host, the output will contain a variable

"ansible_local": {
    "myfacts": {
        "test": {
            "testvar": "1"
        }
    }
}

So we see that ansible_local is a dictionary, with the keys being the names of the files in which the facts are stored (without the extension). The value for each of the files is again a dictionary, where the key is the section in the facts file (the one in brackets) and the value is a dictionary with one entry for each of the variables defined in this section (you might want to consult the Wikipedia page on the INI file format).
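
The mapping from fact files to this nested dictionary can be reproduced with the configparser module from the Python standard library. This is only a sketch of the idea, not Ansible's actual code: the function load_ini_facts is hypothetical, and it takes the file contents as strings instead of reading /etc/ansible/facts.d.

```python
import configparser
from pathlib import Path


def load_ini_facts(files):
    """Build an ansible_local-like dictionary from INI style fact files.

    files maps a file name such as 'myfacts.fact' to its content.
    Hypothetical helper for illustration only.
    """
    facts = {}
    for filename, content in files.items():
        parser = configparser.ConfigParser()
        parser.read_string(content)
        # Key on the first level: file name without the .fact extension
        facts[Path(filename).stem] = {
            # Key on the second level: section name, value: its entries
            section: dict(parser.items(section))
            for section in parser.sections()
        }
    return facts


ansible_local = load_ini_facts({"myfacts.fact": "[test]\ntestvar=1\n"})
print(ansible_local)
```

Note that the value of testvar is the string "1", not a number, which matches the JSON output shown above.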

In addition to facts, Ansible will populate some special variables like the inventory_hostname, or the groups that appear in the inventory file.

Using variables – Jinja2 templates

In the example above, we have used variables in the debug statement to print their value. This, of course, is not a typical usage. In general, you will want to expand a variable to use it at some other point in your playbook. To this end, Ansible uses Jinja2 templates.

Jinja2 is a very powerful templating language and Python library, and we will only be able to touch on some of its features in this post. Essentially, Jinja2 accepts a template and a set of Python variables and then renders the template, substituting special expressions according to the values of the variables. Jinja2 differentiates between expressions which are evaluated and replaced by the values of the variables they refer to, and tags which control the flow of the template processing, so that you can realize things like loops and if-then statements.

Let us start with a very simple and basic example. In the above playbook, we have hard-coded the name of our variable file vars.yaml. As mentioned above, it is sometimes useful to use different variable files, depending on the environment. To see how this can be done, change the start of our playbook as follows.

---
- hosts: all
  become: yes
  # We can define a variable on the level of a play, it is
  # then valid for all hosts to which the play applies
  vars:
    myVar1: "Hello"
  vars_files:
  - "{{ myfile }}"

When you now run the playbook again, the execution will fail and Ansible will complain about an undefined variable. To fix this, we need to define the variable myfile somewhere, say on the command line.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook  \
        -u vagrant \
        --private-key ~/vagrant/vagrant_key \
        -i ~/vagrant/hosts.ini \
        -e myVar4=789 \
        -e myfile=vars.yaml \
         definingVariables.yaml

What happens is that before using this value, Ansible will run it through the Jinja2 templating engine. The expression {{ myfile }} is the most basic example of a Jinja2 expression and evaluates to the value of the variable myfile. So the entire expression gets replaced by vars.yaml and Ansible will read the variables defined there.

Simple variable substitution is probably the most commonly used feature of Jinja2. But Jinja2 can do much more. As an example, let us modify our playbook so that we use a certain value for myVar7 in DEV and a different value in PROD. The beginning of our playbook now looks as follows (everything else is unchanged):

---
- hosts: all
  become: yes
  # We can define a variable on the level of a play, it is
  # then valid for all hosts to which the play applies
  vars:
    myVar1: "Hello"
    myVar7: "{% if myEnv == 'DEV' %} A {% else %} B {% endif %}"

Let us run this again. On the command line, we set the variable myEnv to DEV.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook  \
        -u vagrant \
        --private-key ~/vagrant/vagrant_key \
        -i ~/vagrant/hosts.ini \
        -e myVar4=789 \
        -e myEnv=DEV \
         definingVariables.yaml

In the output, you will see that the value of the variable is " A ", as expected (the spaces around the letter in the template are part of the value). If you use a different value for myEnv, you get " B ". The characters "{%" instruct Jinja2 to treat everything that follows (until "%}") as a tag. Tags are comparable to statements in a programming language. Here, we use the if-then-else tag, which evaluates to a value depending on a condition.

Jinja2 comes with many tags, and I advise you to browse the documentation of all available control structures. In addition to control structures, Jinja2 also uses filters that can be applied to variables and can be chained.

To see this in action, we turn to an example which demonstrates a second common use of Jinja2 templates with Ansible apart from using them in playbooks – the template module. This module is very similar to the copy module, but it takes a Jinja2 template on the control machine and does not only copy it to the remote machine, but also evaluates it.

Suppose, for instance, you wanted to dynamically create a web page on the remote machine that reflects some of the machine's characteristics, as captured by the Ansible facts. Then, you could use a template that refers to facts to produce some HTML output. I have created a playbook that demonstrates how this works – this playbook will install NGINX in a Docker container and dynamically create a webpage containing machine details. If you run this playbook with our Vagrant based setup and point your browser to http://192.168.33.10/, you will see a page displaying things like the number of CPU cores in the virtual machine or the network interfaces attached to it.

I will not go through this in detail, but I advise you to try out the playbook and take a look at the Jinja2 template that it uses. I have added a few comments which, along with the Jinja2 documentation, should give you a good idea how the evaluation works.

To close this post, let us see how we can test Jinja2 templates. Of course, you could simply run Ansible, but this is a bit slow and creates an overhead that you might want to avoid. As Jinja2 is a Python library, there is a much easier approach – you can simply create a small Python script that imports your template, runs the Jinja2 engine and prints the result. First, of course, you need to install the Jinja2 Python module.

pip3 install jinja2

Here is an example of how this might work. We import the template index.html.j2 that we also use for our dynamic webpage displayed above, define some test data, run the engine and print the result.

import jinja2
#
# Create a Jinja2 environment, using the file system loader
# to be able to load templates from the local file system
#
env = jinja2.Environment(
    loader=jinja2.FileSystemLoader('.')
)
#
# Load our template
#
template = env.get_template('index.html.j2')
#
# Prepare the input variables, as Ansible would do it (use ansible -m setup to see
# what this structure looks like, you can even copy the JSON output)
#
groups = {'all': ['127.0.0.1', '192.168.33.44']}
ansible_facts={
        "all_ipv4_addresses": [
            "10.0.2.15",
            "192.168.33.10"
        ],
        "env": {
            "HOME": "/home/vagrant",
        },
        "interfaces": [
            "enp0s8"
        ],
        "enp0s8": {
            "ipv4": {
                "address": "192.168.33.11",
                "broadcast": "192.168.33.255",
                "netmask": "255.255.255.0",
                "network": "192.168.33.0"
            },
            "macaddress": "08:00:27:77:1a:9c",
            "type": "ether"
        },
}
#
# Render the template and print the output
#
print(template.render(groups=groups, ansible_facts=ansible_facts))

An additional feature that Ansible offers on top of the standard Jinja2 templating language, and that is sometimes useful, is lookups. Lookups allow you to query data from external sources on the control machine (where they are evaluated), like environment variables, the content of a file, and many more. For example, the expression

"{{ lookup('env', 'HOME') }}"

in a playbook or a template will evaluate to the value of the environment variable HOME on the control machine. Lookups are enabled by plugins (the name of the plugin is the first argument to the lookup statement), and Ansible comes with a large number of pre-installed lookup plugins.

We have now discussed Ansible variables in some depth. You might want to read through the corresponding section of the Ansible documentation which contains some more details and links to additional information. In the next post, we will turn our attention back from playbooks to inventories and how to structure and manage them.

Automating provisioning with Ansible – playbooks

So far, we have used Ansible to execute individual commands on all hosts in the inventory, one by one. Today, we will learn how to use playbooks to orchestrate the command execution. As a side note, we will also learn how to set up a local test environment using Vagrant.

Setting up a local test environment

To develop and debug Ansible playbooks, it can be very useful to have a local test environment. If you have a reasonably powerful machine (RAM will be the bottleneck), this can be easily done by spinning up virtual machines using Vagrant. Essentially, Vagrant is a tool to orchestrate virtual machines, based on configuration files which can be put under version control. We will not go into details on Vagrant in this post, but only describe briefly how the setup works.

First, we need to install Vagrant and VirtualBox. On Ubuntu, this is done using

sudo apt-get install virtualbox vagrant

Next, create a folder vagrant in your home directory, switch to it and create an SSH key pair that we will later use to access our virtual machines.

ssh-keygen -b 2048 -t rsa -f vagrant_key -P ""

This will create two files, the private key vagrant_key and the corresponding public key vagrant_key.pub.

Next, we need a configuration file for the virtual machines we want to bring up. This file is traditionally called Vagrantfile. In our case, the file looks as follows.

Vagrant.configure("2") do |config|

  config.ssh.private_key_path = ['~/.vagrant.d/insecure_private_key', '~/vagrant/vagrant_key']
  config.vm.provision "file", source: "~/vagrant/vagrant_key.pub", destination: "~/.ssh/authorized_keys"
  
  config.vm.define "boxA" do |boxA|
    boxA.vm.box = "ubuntu/bionic64"
    boxA.vm.network "private_network", ip: "192.168.33.10"
  end

  config.vm.define "boxB" do |boxB|
    boxB.vm.box = "ubuntu/bionic64"
    boxB.vm.network "private_network", ip: "192.168.33.11"
  end
end

We will not go into the details, but essentially this file instructs Vagrant to provision two machines, called boxA and boxB (the file, including some comments, can also be downloaded here). Both machines will be connected to a virtual network device and will be reachable from the host and from each other using the IP addresses 192.168.33.10 and 192.168.33.11 (using the VirtualBox networking machinery in the background, which I have described in a bit more detail in this post). On both machines, we will install the public key just created, and we ask Vagrant to use the corresponding private key.

Now place this file in the newly created directory ~/vagrant and, from there, run

vagrant up

Vagrant will now start to spin up the two virtual machines. When you run this command for the first time, it needs to download the machine image which will take some time, but once the image is present locally, this usually only takes one or two minutes.

Our first playbook

Without further ado, here is our first playbook that we will use to explain the basic structure of an Ansible playbook.

---
# Our first play. We create an Ansible user on the host
- name: Initial setup
  hosts: all
  become: yes
  tasks:
  - name: Create a user ansible_user on the host
    user:
      name: ansible_user
      state: present
  - name: Create a directory /home/ansible_user/.ssh
    file:
      path: /home/ansible_user/.ssh
      group: ansible_user
      owner: ansible_user
      mode: 0700
      state: directory
  - name: Distribute ssh key
    copy:
      src: ~/.keys/ansible_key.pub
      dest: /home/ansible_user/.ssh/authorized_keys
      mode: 0700
      owner: ansible_user
      group: ansible_user
  - name: Add newly created user to sudoers file
    lineinfile:
      path: /etc/sudoers
      state: present
      line: "ansible_user      ALL = NOPASSWD: ALL"

Let us go through this line by line. First, recall from our previous posts that an Ansible playbook is a sequence of plays. Each play in turn refers to a group of hosts in the inventory.

A playbook is written in YAML-syntax that you probably know well if you have followed my series on Kubernetes (if not, you might want to take a look at the summary page on Wikipedia). Being a sequence of plays, it is a list. Our example contains only one play.

The first attribute of our play is called name. This is simply a human-readable name that ideally briefly summarizes what the play is doing. The next attribute hosts refers to a set of hosts in our inventory. In our case, we want to run the play for all nodes in the inventory. More precisely, this is a pattern which can specify individual hosts, groups of hosts or various combinations.

The third attribute is the become flag that we have already seen several times. We need this, as our vagrant user is not root and we need to do a sudo to execute most of our commands.

The next attribute, tasks, is itself a list and contains all tasks that make up the play. When Ansible executes a playbook, it will execute the plays in the order in which they are present in the playbook. For each play, it will go through the tasks and, for each task, it will loop through the corresponding hosts and execute this task on each host. So a task will have to be completed for each host before the next task starts. If an error occurs on a host, this host will be removed from the rotation and the execution of the play will continue for the remaining hosts.
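
The execution order described here can be summarized in a few lines of Python. This is a simplified model and not how Ansible is implemented: the function run_play and the example tasks are made up for illustration, and the hosts are processed one after the other, whereas Ansible can work on several hosts in parallel within a task.

```python
def run_play(hosts, tasks):
    """Run each task on all active hosts before moving on to the next task.

    tasks is a list of (name, action) pairs, where action is a callable
    that takes a host and raises an exception on failure.
    """
    active = list(hosts)
    failed = []
    log = []
    for name, action in tasks:
        # Iterate over a copy, as we may remove hosts while looping
        for host in list(active):
            try:
                action(host)
                log.append((name, host, "ok"))
            except Exception:
                # A failed host is skipped for all remaining tasks
                log.append((name, host, "failed"))
                active.remove(host)
                failed.append(host)
    return log, failed


def copy_key(host):
    # Simulate a failure on the second host
    if host == "192.168.33.11":
        raise RuntimeError("copy failed")


tasks = [
    ("Create user", lambda host: None),
    ("Distribute ssh key", copy_key),
    ("Add user to sudoers", lambda host: None),
]
log, failed = run_play(["192.168.33.10", "192.168.33.11"], tasks)
for entry in log:
    print(entry)
```

In this run, the last task is only executed for 192.168.33.10, as the second host has dropped out after the failed copy.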

We have learned in an earlier post that a task is basically the execution of a module. Each task usually starts with a name. The next line specifies the module to execute, followed by a list of parameters for this module. Our first task executes the user module, passing the name of the user to be created as an argument, similar to what we did when executing commands ad-hoc. Our second task executes the file module, the third task the copy module and the last task the lineinfile module. If you have followed my previous posts, you will easily recognize what these tasks are doing – we create a new user, distribute the SSH key in ~/.keys/ansible_key.pub and add the user to the sudoers file on each host (of course, we could as well continue to use the vagrant user, but we perform the switch to a different user for the sake of demonstration). To run this example, you will have to create an SSH key pair called ansible_key and store it in the subdirectory .keys of your home directory, as in my last post.

Executing our playbook

How do we actually execute a playbook? The community edition of Ansible contains a command-line tool called ansible-playbook that accepts one or more playbooks as arguments and executes them. So assuming that you have saved the above playbook in a file called myFirstPlaybook.yaml, you can run it as follows.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook -i hosts.ini  -u vagrant --private-key ~/vagrant/vagrant_key myFirstPlaybook.yaml

The parameters are quite similar to the parameters of the ansible command. We use -i to specify the location of an inventory file, -u to specify the user to use for the SSH connections and --private-key to specify the location of the private key file of this user.

When we run this playbook, Ansible will log a summary of the actions to the console. We will see that it starts to work on our play, and then, for each task, executes the respective action once for each host. When the script completes, we can verify that everything worked by using SSH to connect to the machines using our newly created user, for instance

ssh -i ~/.keys/ansible_key -o StrictHostKeyChecking=no ansible_user@192.168.33.10

Instead of executing our playbook manually after Vagrant has completed the setup of the machines, it is also possible to integrate Ansible with Vagrant. In fact, Ansible can be used in Vagrant as a provisioner, meaning that we can specify an Ansible playbook that Vagrant will execute for each host immediately after it has been provisioned. To see this in action, copy the playbook to the Vagrant directory ~/vagrant and, in the Vagrantfile, add the three lines

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "myFirstPlaybook.yaml"
  end

immediately after the line starting with config.vm.provision "file". When you now shut down the machines again using

vagrant destroy

from within the vagrant directory and then re-start using

vagrant up

you will see that the playbook gets executed once for each machine that is being provisioned. Behind the scenes, Vagrant will create an inventory in ~/vagrant/.vagrant/provisioners/ansible/inventory and will add the hosts there (when inspecting this file, you will find that Vagrant incorrectly adds the insecure default SSH key as a parameter to each host, but apparently this is overridden by the settings in our Vagrantfile and does not cause any problems).

Of course there is much more that you can do with playbooks. As a first more complex example, I have created a playbook that

Creates a new user ansible_user and adds it to the sudoers file
Distributes a corresponding public SSH key
Installs Docker
Installs pip3 and the Python Docker library
Pulls an NGINX container from the Docker hub
Creates an HTML file and dynamically adds the primary IP address of the host
Starts the container and maps the HTTP port

To run this playbook, copy it to the vagrant directory as well along with the HTML file template, change the name of the playbook in the Vagrantfile to nginx.yaml and run vagrant as before.

Once this script completes, you can point your browser to 192.168.33.10 or 192.168.33.11 and see a custom NGINX welcome screen. This playbook uses some additional features that we will discuss in the next post, most notably variables and facts.

Automating provisioning with Ansible – using modules

In the previous post, we have learned the basics of Ansible and how to use Ansible to execute a command – represented by a module – on a group of remote hosts. In this post, we will look into some useful modules in a bit more detail and learn a bit more on idempotency and state.

Installing software with the apt module

Maybe one of the most common scenarios that you will encounter when using Ansible is that you want to install a certain set of packages on a set of remote hosts. Thus, you want to use Ansible to invoke a package manager. Ansible comes with modules for all major package managers – yum for RedHat systems, apt for Debian derived systems like Ubuntu, or even the Solaris package managers. In our example, we assume that you have a set of Ubuntu hosts on which you need to install a certain package, say Docker. Assuming that you have these hosts in your inventory in the group servers, the command to do this is as follows.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m apt \
  -a 'name=docker.io update_cache=yes state=present'

Most options of this command are as in my previous post. We use a defined private key and a defined inventory file and instruct Ansible to use the user root when logging into the machines. With the switch -m, we ask Ansible to run the apt module. With the switch -a, we pass a set of parameters to this module, i.e. a set of key-value pairs. Let us look at these parameters in detail.

The first parameter, name, is simply the package that we want to install. The parameter update_cache instructs apt to update the package information before installing the package, which is the equivalent of apt-get update. The last parameter is the parameter state. This defines the target state that we want to achieve. In our case, we want the package to be present.

When we run this command first for a newly provisioned host, chances are that the host does not yet have Docker installed, and the apt module will install it. We will receive a rather lengthy JSON formatted response, starting as follows.

206.189.56.178 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "cache_update_time": 1569866674,
    "cache_updated": true,
    "changed": true,
    "stderr": "",
    "stderr_lines": [],
    ...

In the output, we see the line "changed" : true, which indicates that Ansible has actually changed the state of the target system. This is what we expect – the package was not yet installed, we want it to be present, and this requires a change.

Now let us run this command a second time. This time, the output will be considerably shorter.

206.189.56.178 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "cache_update_time": 1569866855,
    "cache_updated": true,
    "changed": false
}

Now the attribute changed in the JSON output is set to false. In fact, the module has first determined the current state and found that the package is already installed. It has then looked at the target state and found that current state and target state are equal. Therefore, no action is necessary and the module does not perform any actual change.

This example highlights two very important and very general (and related) properties of Ansible modules. First, they are idempotent, i.e. executing them more than once results in the same state as executing them only once. This is very useful if you manage groups of hosts. If the command fails for one host, you can simply execute it again for all hosts, and it will not do any harm on the hosts where the first run worked.

Second, an Ansible task does (in general, there are exceptions like the command module) not specify an action, but a target state. The module is then supposed to determine the current state, compare it to the target state and only take those actions necessary to move the system from the current state to the target state.
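
To make these two properties a bit more tangible, here is a minimal Python sketch of a module-like function that works on a target state. It does not talk to a real package manager: the set installed stands in for the state of a host, and the function ensure_package is a made-up name for illustration.

```python
def ensure_package(installed, name, state="present"):
    """Move the set of installed packages to the target state.

    Returns True if a change was necessary and False otherwise,
    similar to the changed flag in the output of a module.
    """
    if state == "present" and name not in installed:
        installed.add(name)
        return True
    if state == "absent" and name in installed:
        installed.remove(name)
        return True
    # Current state and target state already agree, nothing to do
    return False


installed = set()
first = ensure_package(installed, "docker.io")
second = ensure_package(installed, "docker.io")
print(first, second)
```

The first call reports a change, the second call does not, and after both calls the state is the same as after the first one.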

These are very general principles, and the exact way how they are implemented is specific to the module in question. Let us look at a second example to see how this works in practice.

Working with files

Next, we will look at some modules that allow us to work with files. First, there is a module called copy that can be used to copy a file from the control host (i.e. the host on which Ansible is running) to all nodes. The following commands will create a local file called test.asc and distribute it to all nodes in the servers group in the inventory, using /tmp/test.asc as target location.

export ANSIBLE_HOST_KEY_CHECKING=False
echo "Hello World!" > test.asc
ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m copy \
  -a 'src=test.asc dest=/tmp/test.asc'

After we have executed this command, let us look at the first host in the inventory to see that this has worked. We first use grep to strip off the first line of the inventory file, then awk to get the first IP address and finally scp to manually copy the file back from the target machine to a local file test.asc.check. We can then look at this file to see that it contains what we expect.

ip=$(cat hosts.ini  | grep -v "\[" | awk 'NR == 1')
scp -i ~/.ssh/do_k8s \
    root@$ip:/tmp/test.asc test.asc.check
cat test.asc.check

Again, it is instructive to run the copy command twice. If we do this, we see that as long as the local version of the file is unchanged, Ansible will not repeat the operation and leave the destination file alone. Again, this follows the state principle – as the node is already in the target state (file present), no action is necessary. If, however, we change the local file and resubmit the command, Ansible will again detect a deviation from the target state and copy the changed file again.
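
The following Python sketch illustrates the idea of comparing the current state with the target state before copying. It works on two local files in a temporary directory instead of a remote host. The function copy_if_changed is made up for this example, and the use of a SHA-1 checksum for the comparison is an assumption of the sketch.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path):
    """Return a hex digest of the file content."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def copy_if_changed(src, dest):
    """Copy src to dest only if dest is missing or has different content.

    Returns True if the file was copied and False otherwise.
    """
    if dest.exists() and checksum(src) == checksum(dest):
        return False
    shutil.copyfile(src, dest)
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "test.asc"
    dest = Path(tmp) / "remote_test.asc"
    src.write_text("Hello World!\n")
    # First run copies the file, second run finds nothing to do
    results = [copy_if_changed(src, dest), copy_if_changed(src, dest)]
    # After changing the source, the next run copies again
    src.write_text("Hello again!\n")
    results.append(copy_if_changed(src, dest))

print(results)
```

The three results correspond to the three runs described above: a change, no change, and again a change after the local file has been modified.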

It is also possible to fetch files from the nodes. This is done with the fetch module. As this will in general produce more than one file (one per node), the destination that we specify is a directory and Ansible will create one subdirectory for each node on the local machine and place the files there.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m fetch \
  -a 'src=/tmp/test.asc dest=.'
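With dest=. and src=/tmp/test.asc, the fetched copies should end up in ./<host>/tmp/test.asc, one per host. As a small sketch, the following shell function prints the expected local path for every host in the inventory. The function name list_fetched is made up, and the inventory is assumed to contain one host per line below the group header.

```shell
# Print the local path of the fetched copy for every host in the inventory.
list_fetched() {
  inventory=$1   # inventory file, e.g. hosts.ini
  src=$2         # remote path passed to fetch, e.g. /tmp/test.asc
  dest=$3        # local destination directory, e.g. .
  # Skip group headers like [servers], then take the first word of each line
  grep -v '\[' "$inventory" | while read -r host rest; do
    [ -n "$host" ] || continue   # ignore empty lines
    echo "$dest/$host$src"
  done
}
```

For the example above, list_fetched hosts.ini /tmp/test.asc . lists the files that can then be inspected with cat.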

Creating users and ssh keys

When provisioning a new host, maybe the first thing that you want to do is to create a new user on the host and provide SSH keys to be able to use this user going forward. This is especially important when using a platform like DigitalOcean which by default only creates a root account.

The first step is to actually create the user on the hosts, and of course there is an Ansible module for this – the user module. The following command will create a user called ansible_user on all hosts in the servers group.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m user \
  -a 'name=ansible_user'

To use this user via Ansible, we need to create an SSH key pair and distribute it. Recall that when SSH keys are used to log in to host A (“server”) from host B (“client”), the public key needs to be present on host A, whereas the private key is on host B. On host A, the key needs to be appended to the file authorized_keys in the .ssh directory of the user's home directory.

So let us first create our key pair. Assuming that you have OpenSSH installed on your local machine, the following command will create a private / public RSA key pair with 2048 bit key length and no passphrase and store it in ~/.keys in two files – the private key will be in ansible_key, the public key in ansible_key.pub.

ssh-keygen -b 2048 -f ~/.keys/ansible_key -t rsa -P ""

Next, we want to distribute the public key to all hosts in our inventory, i.e. for each host, we want to append the key to the file authorized_keys in the user's SSH directory. In our case, instead of appending the key, we can simply overwrite the authorized_keys file with our newly created public key. Of course, we can use Ansible for this task. First, we use the file module to create a directory .ssh in the new user's home directory. Note that we still use our initial user root and the corresponding private key.

ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m file \
  -a 'path=/home/ansible_user/.ssh owner=ansible_user mode=0700 state=directory group=ansible_user'

Next, we use the copy module to copy the public key into the newly created directory with target file name authorized_keys (there is also a module called authorized_key that we could use for that purpose).

ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m copy \
  -a 'src=~/.keys/ansible_key.pub dest=/home/ansible_user/.ssh/authorized_keys group=ansible_user owner=ansible_user mode=0700'

We can verify that this worked by using the ping module again. This time, we use the newly created Ansible user and the corresponding private key.

ansible servers \
  -i hosts.ini \
  -u ansible_user \
  --private-key ~/.keys/ansible_key \
  -m ping

However, there is still a problem. For many purposes, our user is only useful if it has the right to use sudo without a password. There are several ways to achieve this, the easiest being to add the following line to the file /etc/sudoers which will allow the user ansible_user to run any command using sudo without a password.

ansible_user ALL = NOPASSWD: ALL

How to add this line? Again, there is an Ansible module that comes to our rescue – lineinfile. This module can be used to enforce the presence of defined lines in a file, at specified locations. The documentation has all the options, but for our purpose, the usage is rather simple.

ansible servers \
  -i hosts.ini \
  -u root \
  --private-key ~/.ssh/do_k8s \
  -m lineinfile \
  -a 'path=/etc/sudoers state=present line="ansible_user      ALL = NOPASSWD: ALL"'

Again, this module follows the principles of state and idempotency – if you execute this command twice, the output will indicate that the second execution does not result in any changes on the target system.
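A broken /etc/sudoers file can lock you out of sudo entirely, so it might be worth letting Ansible check the result before it is written. The lineinfile module has a validate parameter for this purpose, which runs a command against a temporary copy of the file (passed in as %s). The following sketch wraps the call from above in a shell function. The function name grant_passwordless_sudo is made up for this example, and it assumes that visudo is installed as /usr/sbin/visudo on the target hosts.

```shell
# Add a NOPASSWD rule for the given user on all hosts in the servers group.
# Assumption: /usr/sbin/visudo exists on the nodes and is used to check the
# modified file before it replaces /etc/sudoers.
grant_passwordless_sudo() {
  user=$1
  ansible servers \
    -i hosts.ini \
    -u root \
    --private-key ~/.ssh/do_k8s \
    -m lineinfile \
    -a "path=/etc/sudoers state=present line=\"$user ALL = NOPASSWD: ALL\" validate=\"/usr/sbin/visudo -cf %s\""
}
```

Calling grant_passwordless_sudo ansible_user would then have the same effect as the command above, except that a syntactically invalid result is rejected instead of being written.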

So far, we have used the ansible command line client to execute commands ad-hoc. This, however, is not the usual way of using Ansible. Instead, you would typically compose a playbook that contains the necessary commands and use Ansible to run this playbook. In the next post, we will learn how to create and deal with playbooks.

Automating provisioning with Ansible – the basics

For my projects, I often need a clean Linux box with a defined state which I can use to play around and, if I make a mistake, simply dump it again. Of course, I use a cloud environment for this purpose. However, I often find myself logging into one of these newly created machines to carry out installations manually. Time to learn how to automate this.

Ansible – the basics

Ansible is a platform designed to automate the provisioning and installation of virtual machines (or, more generally, any type of server, including physical machines and even containers). Ansible is agent-less, meaning that in order to manage a virtual machine, it does not require any special agent installed on that machine. Instead, it uses ssh to access the machine (in the Linux / Unix world, whereas with Windows, you would use something like WinRM).

When you use Ansible, you run the Ansible engine on a dedicated machine, on which Ansible itself needs to be installed. This machine is called the control machine. From this control machine, you manage one or more remote machines, also called hosts. Ansible will connect to these nodes via SSH and execute the required commands there. Obviously, Ansible needs a list of all hosts that it needs to manage, which is called the inventory. Inventories can be static, i.e. an inventory is simply a file listing all nodes, or can be dynamic, implemented either as a script that supplies a JSON-formatted list of nodes and is invoked by Ansible, or a plugin written in Python. This is especially useful when using Ansible with cloud environments where the list of nodes changes over time as nodes are being brought up or are shut down.
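To make the idea of a script-based dynamic inventory a bit more concrete, here is a minimal sketch in shell. It assumes the usual convention for inventory scripts: Ansible calls the script with --list to get all groups and with --host <name> to get the variables of a single host. The function name inventory is made up, and the host list is hard-coded with the example IP address used later in this post, whereas a real script would query the cloud provider.

```shell
# Minimal dynamic inventory sketch with one hard-coded group.
inventory() {
  case "$1" in
    --list)
      # One group "servers" with a single host. The _meta.hostvars entry
      # lets Ansible skip the per-host --host calls.
      cat <<'EOF'
{
  "servers": { "hosts": ["134.209.240.213"] },
  "_meta": { "hostvars": {} }
}
EOF
      ;;
    --host)
      # No host-specific variables in this sketch
      echo '{}'
      ;;
    *)
      echo "usage: --list | --host <hostname>" >&2
      return 1
      ;;
  esac
}
```

In a real script, the last line would be inventory "$@" so that the arguments passed by Ansible reach the function. The file would need to be executable and could then be passed to ansible with -i in place of a static inventory file.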

The “thing that is actually executed” on a node is, at the end of the day, a Python script. These scripts are called Ansible modules. You might ask yourself how the module can be invoked on the node. The answer is simple – Ansible copies the module to the node, executes it and then removes it again. When you use Ansible, then, essentially, you execute a list of tasks for each node or group of nodes, and a task is a call to a module with specific arguments.

A sequence of tasks to be applied to a specific group of nodes is called a play, and the file that specifies a set of plays is called a playbook. Thus a playbook allows you to apply a defined sequence of module calls to a set of nodes in a repeatable fashion. Playbooks are just plain files in YAML format, and as such can be kept under version control and can be treated according to the principles of infrastructure as code.

Installation and first steps

Being agentless, Ansible is rather simple to install. In fact, Ansible is written in Python (with the source code of the community version being available on GitHub), and can be installed with

pip3 install --user ansible

Next, we need a host. For this post, I simply spun up a host on DigitalOcean. In my example, the host has been assigned the IP address 134.209.240.213. We need to put this IP address into an inventory file, to make it visible to Ansible. A simple inventory file (in INI format, YAML is also supported) looks as follows.

[servers]
134.209.240.213

Here, servers is a label that we can use to group nodes if we need to manage nodes with different profiles (for instance webservers, application servers or database servers).

Of course, you would in general not create this file manually. We will look at dynamic inventories in a later post, but for the time being, you can use a version of the script that I used in an earlier post on DigitalOcean to automate the provisioning of droplets. This script brings up a machine and adds it to an inventory file hosts.ini.

With this inventory file in place, we can already use Ansible to ad-hoc execute commands on this node. Here is a simple example.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i hosts.ini \
        servers \
        -u root \
        --private-key ~/.ssh/do_k8s \
        -m ping

Let us go through this example step by step. First, we set an environment variable which prevents the SSH host key check when first connecting to an unknown host. Next, we invoke the ansible command line client. The first parameter is the name of the inventory file, hosts.ini in our case. The second argument (servers) instructs Ansible to run our command only for those hosts that carry the label servers in our inventory file. This allows us to use different node profiles and apply different commands to them. Next, the -u switch asks Ansible to use the root user to connect to our hosts (which is the default ssh user on DigitalOcean). The next switch specifies the file in which the private key that Ansible will use to connect to the node is stored.

Finally, the switch -m specifies the module to use. For this example, we use the ping module which tries to establish a connection to the host (you can look at the source code of this module here).

Modules can also take arguments. A very versatile module is the command module (which is actually the default, if no module is specified). This module accepts an argument which it will simply execute as a command on the node. For instance, the following code snippet executes uname -a on all nodes.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i hosts.ini \
        servers \
        -u root \
        --private-key ~/.ssh/do_k8s \
        -m command \
        -a "uname -a"

Privilege escalation

Our examples so far have assumed that the user that Ansible uses to SSH into the machines has all the required privileges, for instance to run apt. On DigitalOcean, the standard user is root, so this simple approach works, but in other cloud environments, like EC2, the situation is different.

Suppose, for instance, that we have created an EC2 instance using the Amazon Linux AMI (which has a default user called ec2-user) and added its IP address to our hosts.ini file (of course, I have written a script to automate this). Let us also assume that the SSH key used is called ansibleTest and stored in the directory ~/.keys. Then, a ping would look as follows.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i hosts.ini \
        servers \
        -u ec2-user \
        --private-key ~/.keys/ansibleTest.pem \
        -m ping

This would actually work, but this changes if we want to install a package like Docker. We will learn more about package installation and state in the next post, but naively, the command to actually install the package would be as follows.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i hosts.ini \
        servers \
        -u ec2-user \
        --private-key ~/.keys/ansibleTest.pem \
        -m yum -a 'name=docker state=present'

Here, we use the yum module which is able to leverage the YUM package manager used by Amazon Linux to install, upgrade, remove and manage packages. However, this will fail, and you will receive a message like ‘You need to be root to perform this command’. This happens because Ansible will SSH into your machine using the ec2-user, and this user, by default, does not have the necessary privileges to run yum.

On the command line, you would of course use sudo to fix this. To instruct Ansible to do this, we have to add the switch -b to our command. Ansible will then use sudo to execute the modules as root.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i hosts.ini \
        servers \
        -b \
        -u ec2-user \
        --private-key ~/.keys/ansibleTest.pem \
        -m yum -a 'name=docker state=present'

An additional parameter, --become-user, can be used to control which user Ansible will use to run the operations. The default is root, but any other user can be used as well (assuming, of course, that the user has the required privileges for the respective command).

The examples above still look like something that could easily be done with a simple shell script, looping over all hosts in an inventory file and invoking an ssh command. The real power of Ansible, however, becomes apparent when we start to talk about modules in the next post in this series.
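For comparison, such a loop could look roughly like the following sketch. The function name run_on_all is made up, and the sketch assumes the simple inventory format used in this post (group headers in square brackets, one IP address per line) as well as the root user and key from the DigitalOcean examples.

```shell
# Run a command via ssh on every host listed in a simple INI inventory.
run_on_all() {
  inventory=$1
  shift   # the remaining arguments form the remote command
  # Skip group headers like [servers], then take the first word of each line
  grep -v '\[' "$inventory" | while read -r host rest; do
    [ -n "$host" ] || continue   # ignore empty lines
    # -n keeps ssh from reading the rest of the host list from stdin
    ssh -n -i ~/.ssh/do_k8s "root@$host" "$@"
  done
}
```

With this in place, run_on_all hosts.ini uname -a would roughly correspond to the command module example above, but without any of the state handling that Ansible modules provide.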

Building a CI/CD pipeline for Kubernetes with Travis and Helm

One of the strengths of Kubernetes is the ability to spin up pods and containers with a defined workload within seconds, which makes it the ideal platform for automated testing and continuous deployment. In this post, we will see how GitHub, Kubernetes, Helm and Travis CI play together nicely to establish a fully cloud based CI/CD pipeline for your Kubernetes projects.

Introduction

Traditional CI/CD pipelines require a fully equipped workstation, with tools like Jenkins, build environments, libraries, repositories and so forth installed on them. When you are used to working in a cloud based environment, however, you might be looking for alternatives, allowing you to maintain your projects from everywhere and from virtually every PC with basic equipment. What are your options to make this happen?

Of course there are many possible approaches to realize this. You could, for instance, maintain a separate virtual machine running Jenkins and trigger your builds from there, maybe using Docker containers or Kubernetes as build agents. You could use something like Gitlab CI with build agents on Kubernetes. You could install Jenkins X on your Kubernetes cluster. Or you could turn to Kubernetes native solutions like Argo or Tekton.

All these approaches, however, have in common that they require additional infrastructure, which means additional cost. Therefore I decided to stick to Travis CI as a CI engine and control my builds from there. As Travis runs builds in a dedicated virtual machine, I can use kind to bring up a cluster for integration testing at no additional cost.

The next thing I wanted to try out is a multi-staged pipeline based on the GitOps approach. Roughly speaking, this approach advocates the use of several repositories, one per stage, which each reflect the actual state of the respective stage using infrastructure as code. Thus, you would have one repository for development, one for integration testing and one for production (knowing, of course, that real organisations typically have additional stages). Each repository contains the configuration (like Kubernetes YAML files or other configuration items) for the respective Kubernetes cluster. At every point in time, the cluster state is fully in sync with the state of the repository. Thus, if you want to make changes to a cluster, you would not use kubectl or the API to directly deploy into the cluster and update your repository after the fact, but you would rather change the configuration of the cluster stored in the repository, and have a fully automated process in place which detects this change and updates the cluster.

The tool chain originally devised by the folks at Weaveworks requires access to a Kubernetes cluster, which, as described above, I wanted to avoid for cost reasons. Still, some of the basic ideas of GitOps can be applied with Travis CI as well.

Finally, I needed an example project. Of course, I decided to choose my bitcoin controller for Kubernetes, which is described in a series of earlier posts starting here.

Overall design and workflow

Based on these considerations, I came up with the following high-level design. The entire pipeline is based on three GitHub repositories.

The first repository, bitcoin-controller, represents the DEV stage of the project. It contains the actual source code of the bitcoin controller.
The second repository, bitcoin-controller-helm-qa, represents the QA stage. It does not contain source code, but a Helm chart that describes the state of the QA environment.
Finally, the third repository, bitcoin-controller-helm, represents a release of the production stage and contains the final, tested and released packaged Helm charts.

To illustrate the overall pipeline, let us take a look at the image below.

The process starts on the left hand side of the above diagram when a developer pushes a change into the DEV repository. At this point, the Travis CI process will start, spin up a virtual machine, install Go and required libraries and run the build and the unit tests. Then, a Docker image is built and pushed into the Docker Hub image repository, using the Git commit hash as a tag. Finally, the new tag is written into the Helm chart stored in the QA repository so that the Helm chart points to the now latest version of the Docker image.

This change in the bitcoin-controller-helm-qa repository now triggers a second Travis CI pipeline. Once the virtual machine has been brought up by Travis, we install kind, spin up a Kubernetes cluster, install Helm in this cluster, download the current version of the Helm charts and install the bitcoin controller using this Helm chart. As we have previously updated the Docker tag in the Helm chart, this will pull the latest version of the Docker image.

We then run the integration tests against our newly established cluster. If the integration test succeeds, we package our Helm chart and upload it into the bitcoin-controller-helm repository.

However, we do not want to perform this last step for every single commit, but only for releases. To achieve this, we check at this point whether the commit was a tagged commit. If yes, a new package is built using the tag as version number. If not, the process stops at this point and no promotion to the bitcoin-controller-helm repository is executed.

Possible extensions

This simple approach can of course be extended in several directions. First, we could add an additional stage to also test our packaged Helm chart. In this stage, we would fully simulate a possible production environment, i.e. spin up a cluster at AWS, DigitalOcean or whatever your preferred provider is, deploy the packaged Helm chart and run additional tests. You could also easily integrate additional QA steps, like a performance test or static code analysis, into this pipeline.

Some organisations like to add manual approval steps before deploying into production. Unfortunately, Travis CI does not seem to offer an easy solution for this. To solve this, one could potentially use branches instead of tags to flag a certain code version as a release, and only allow specific users to perform a push or merge into this branch.

Finally, we currently only store the Docker image which we then promote through the stages. This is fine for a simple project using Go, where there are no executables or other artifacts. For other projects, like a typical Java web application, you could use the same approach, but in addition store important artifacts like a WAR file in a separate repository, like Nexus or Artifactory.

Let us now dive into some more implementation details and pitfalls when trying to actually code this solution.

Build and deploy

Our pipeline starts when a developer pushes a code change into the DEV repository bitcoin-controller. At this point, Travis CI will step in and run our pipeline, according to the contents of the respective .travis.yml file. After some initial declarations, the actual processing is done by the stage definitions for the install, script and deploy phase.

install:
  - go get -d -t ./...

script:
  - go build ./cmd/controller/
  - go test -v  ./... -run "Unit" -count=1
  - ./travis/buildImage.sh

deploy:
  skip_cleanup: true
  provider: script
  script:  bash ./travis/deploy.sh
  on:
    all_branches: true

Let us go through this step by step. In the install phase, we run go get to install all required dependencies. Among other things, this will download the Kubernetes libraries that are needed by our project. Once this has been completed, we use the go utility to build and run the unit tests. We then invoke the script buildImage.sh.

The first part of the script is important for what follows – it determines the tag that we will be using for this build. Here are the respective lines from the script.

#
# Get short form of git hash for current commit
#
hash=$(git log --pretty=format:'%h' -n 1)
#
# Determine tag. If the build is from a tag push, use tag name, otherwise
# use commit hash
#
if [ "X$TRAVIS_TAG" == "X" ]; then
  tag=$hash
else
  tag=$TRAVIS_TAG
fi

Let us see how this works. We first use git log with the pretty format option to get the short form of the hash of the current commit (this works, as Travis CI will have checked out the code from Github and will have taken us to the root directory of the repository). We then check the environment variable TRAVIS_TAG which is set by Travis CI if the build trigger originates from pushing a tag to the server. If this variable is empty, we use the commit hash as our tag, and treat the build as an ordinary build (we will see later that this build will not make it into the final stage, but will only go through unit and integration testing). If the variable is set, then we use the name of the tag itself.

The rest of the script is straightforward. We run a docker build using our tag to create an image locally, i.e. within the Docker instance of the Travis CI virtual machine used for the build. We also tag this image as latest to make sure that the latest tag does actually point to the latest version. Finally, we write the tag into a file for later use.

Now we move into the deploy stage. Here, we use the option skip_cleanup to prevent Travis from cleaning up our working directory. We then invoke another script deploy.sh. Here, we read the tag again from the temporary file that we have created during the build stage and push the image to the Docker Hub, using this tag once more.

#
# Login to Docker hub
#

echo "$DOCKER_PASSWORD" | docker login --username $DOCKER_USER --password-stdin

#
# Get tag
#
tag=$(cat $TRAVIS_BUILD_DIR/tag)

#
# Push images
#
docker push christianb93/bitcoin-controller:$tag
docker push christianb93/bitcoin-controller:latest

At this point, it is helpful to remember the use of image tags in Helm as discussed in one of my previous posts. Helm advocates the separation of charts (holding deployment information and dependencies) from configuration by moving the configuration into separate files (values.yaml) which are then merged back into the chart at runtime using placeholders. Applying this principle to image tags implies that we keep the image tag in a values.yaml file. To prepare for integration testing where we will use the Helm chart to deploy, we will now have to replace the tag name in this file by the current tag. So we need to check out our Helm chart using git clone and use our beloved sed to replace the image tag in the values file by its current value.
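The sed step could look roughly like the following sketch. The key name image_tag is a placeholder, as the actual name used in the values file of the chart is not shown here, and the function name set_image_tag is made up as well. The sketch also assumes that the tag does not contain characters that are special to sed, which holds for commit hashes and version numbers like 1.2.

```shell
# Replace the value of the (hypothetical) top-level key image_tag
# in a Helm values file with the given tag.
set_image_tag() {
  values_file=$1
  tag=$2
  tmp=$(mktemp)
  # Rewrite the matching line, leave all other lines untouched
  sed "s/^image_tag:.*/image_tag: \"$tag\"/" "$values_file" > "$tmp"
  # Write back in place so that the file keeps its permissions
  cat "$tmp" > "$values_file"
  rm -f "$tmp"
}
```

Going through a temporary file instead of using sed -i avoids the differences between the GNU and BSD versions of that option.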

But this is not the only change that we want to make to our Helm chart. Remember that a Helm chart also contains versioning information – a chart version and an application version. However, at this point, we cannot simply use our tag anymore, as Helm requires that these version numbers follow the SemVer semantic versioning rules. So at this point, we need to define rules for how we compose our version number.

We do this as follows. Every release receives a version number like 1.2, where the first digit is the major release and the second digit is the minor release. In GitHub, releases are created by pushing a tag, and the tag name is used as version number (and thus has to follow this convention). Development releases are marked by appending a hyphen followed by dev and the commit hash to the current version. So if the latest version is 0.9 and we create a new build with the commit hash 64ed033, the new version number will be 0.9-dev64ed033.
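As a sketch, this rule can be written as a small shell function. The function name chart_version and the way the latest release number is passed in as an argument are assumptions for this example; how the real script determines the latest release is not covered here.

```shell
# Compose the version number for a build: a tagged build uses the tag
# itself, any other build gets <latest release>-dev<short commit hash>.
chart_version() {
  latest_release=$1   # e.g. 0.9
  commit_hash=$2      # e.g. 64ed033
  travis_tag=$3       # empty for untagged builds
  if [ -z "$travis_tag" ]; then
    echo "${latest_release}-dev${commit_hash}"
  else
    echo "$travis_tag"
  fi
}
```

For the example from the text, chart_version 0.9 64ed033 "" yields 0.9-dev64ed033, while a build triggered by pushing the tag 1.0 would simply get the version 1.0.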

So we update the values file and the Helm chart itself with the new image tag and the new version numbers. We then push the change back into the Helm repository. This will trigger a second Travis CI pipeline and the integration testing starts.

Integration testing

When the Travis CI pipeline for the repository bitcoin-controller-helm-qa has reached the install stage, the first thing that is being done is to download the script setupVMAndCluster.sh which is located in the code repository and to run it. This script is responsible for executing the following steps.

Download and install Helm (from the snap)
Download and install kubectl
Install kind
Use kind to create a test cluster inside the virtual machine that Travis CI has created for us
Init Helm and wait for the Tiller container to be ready
Get the source code from the code repository
Install all required Go libraries to be ready for the integration test

Most of these steps are straightforward, but there are a few details which are worth mentioning. First, this setup requires a significant data volume to be downloaded – the kind binary, the container images required by kind, Helm and so forth. To keep this from slowing down the build, we use the caching feature provided by Travis CI which allows us to cache the content of an arbitrary directory. If, for instance, we find that the kind node image is in the cache, we skip the download and instead use docker load to pre-load the image into the local Docker instance.

The second point to observe is that for integration testing, we need the code for the test cases which is located in the code repository, not in the repository for which the pipeline has been triggered. Thus we need to manually clone the code repository. However, we want to make sure that we get the version of the test cases that matches the version of the Helm chart (which could, for instance, be an issue if someone changes the code repository while a build is in progress). Thus we need to figure out the git commit hash of the code version under test and run git checkout to use that version. Fortunately, we have put the commit hash as application version into the Helm chart while running the build and deploy pipeline, so we can use grep and awk to extract and use the commit hash.

tag=$(grep "appVersion:" Chart.yaml | awk '{ print $2 }')
cd $GOPATH/src/github.com/christianb93
git clone https://github.com/christianb93/bitcoin-controller
cd bitcoin-controller
git checkout $tag

Once this script has completed, we have a fully functional Kubernetes cluster with Helm and Tiller installed running in our VM. We can now use the Helm chart to install the latest version of the bitcoin controller and run our tests. Once the tests complete, we perform a cleanup and run an additional script (promote.sh) to enter the final stage of the build process.

This script updates the repository bitcoin-controller-helm that represents the Helm repository with the fully tested and released versions of the bitcoin controller. We first examine the tag to figure out whether this is a stable version, i.e. a tagged release. If this is not the case, the script completes without any further action.
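One way to implement such a check, given the version convention described above, is to test whether the version consists of a major and a minor number only, without the dev suffix. This is a sketch under that assumption, with a made-up function name is_release, and not necessarily how promote.sh does it.

```shell
# Succeed if the given version looks like a release (e.g. 1.2),
# fail for development versions (e.g. 0.9-dev64ed033) or plain hashes.
is_release() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+$'
}
```

A promotion script could then start with a line like is_release "$tag" || exit 0 to stop early for ordinary builds.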

If the commit is a release, we rename the directory in which our Helm chart is located (as Helm assumes that the name of the Helm chart and the name of the directory coincide) and update the chart name in the Chart.yaml file. We then remove a few unneeded files and use the Helm client to package the chart.

Next we clone into the bitcoin-controller-helm repository, place our newly packaged chart there and update the index. Finally, we push the changes back into the repository – and we are done.

Playing with Travis CI

You are using Github to manage your open source projects and want a CI/CD pipeline, but do not have access to a permanently available private infrastructure? Then you might want to take a look at hosted CI/CD platforms like Bitbucket Pipelines, Gitlab, CircleCI – or Travis CI.

When I was looking for a cloud-based integration platform for my own projects, I decided to give Travis CI a try. Travis offers builds in virtual machines, which makes it much easier to spin up local Kubernetes clusters with e.g. kind for integration testing, offers unlimited free builds for open source projects, is fully integrated with Github and makes setting up standard builds very easy – of course more sophisticated builds require more work. In this post, I will briefly describe the general usage of Travis CI, while a later post will be devoted to the design and implementation of a sample pipeline integrating Kubernetes, Travis and Helm.

Introduction to Travis CI

Getting started with Travis is easy, assuming that you already have a Github account. Simply navigate to the Travis CI homepage, sign up with your Github account and grant the required authorizations to Travis. You can then activate Travis for your Github repositories individually. Once Travis has been enabled for a repository, Travis will create a webhook so that every commit in the repository triggers a build.

For each build, Travis will spin up a virtual machine. The configuration of this machine and the subsequent build steps are defined in a file in YAML format called .travis.yml that needs to be present in the root directory of the repository.

Let us look at a very simple example to see how this works. As a playground, I have created a simple sample repository that contains a “Hello World” implementation in Go and a corresponding unit test. In the root directory of the repository, I have placed a file .travis.yml with the following content.

language: go
dist: xenial

go:
  - 1.10.8

script:
  - go build
  - go test -v -count=1

This is a file in YAML syntax, defining an associative array, i.e. key/value pairs. Here, we see four keys: language, dist, go and script. While the first three keys define settings (the language to use, the base image for the virtual machine that Travis will create and the Go version), the fourth key is a build phase and defines a list of commands which will be executed during the build phase. Each of these commands can be an ordinary command as you would place it in a shell script, in particular you can invoke any program or shell script you need.

To see this in action, we can now trigger a test build in Travis. There are several options to achieve this. I usually place a meaningless text file somewhere in the repository, change this file, commit and push to trigger a build. When you do this and wait for a few minutes, you will see a new build in your Travis dashboard. Clicking on this build takes you to the build log, which is displayed on a screen similar to the following screenshot.

Working your way through this logfile, it becomes pretty transparent what Travis is doing. First, it will create a virtual machine for the build, using Ubuntu Xenial (16.04) as a base system (which I did select using the dist key in the Travis CI file). If you browse the system information, you will see some indications that behind the scenes, Travis is using Google's Cloud Platform (GCP) to provision the machine. Then, a Go language environment is set up, using Go 1.10.8 (again due to the corresponding setting in the Travis CI file), and the repository that triggered the build is cloned.

Once these steps have been completed, the Travis engine processes the job in several phases according to the Travis lifecycle model. Not all phases need to be present in the Travis CI file. If a phase is not specified, default actions are taken. In our case, the install phase is not defined in the file, so Travis executes a default script which depends on the programming language and in our case simply invokes go get.

The same holds for the build phase. Here, we have overridden the default action for the build phase using the script tag in the Travis CI file. This tag is a YAML formatted list of commands which will be executed step by step during the build phase. Each of these commands can indicate failure by returning a non-zero exit code, but still, all commands will be executed even if one of the previous commands has failed – this is important to understand when designing your build phase. Alternatively, you could of course place your commands in a shell script and use for instance set -e to make sure that the script fails if one individual step fails.
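The effect of set -e can be tried out locally with a few lines of shell. The following sketch writes a hypothetical three-step build script to a temporary file, with false standing in for a failing step, and runs it in a separate shell.

```shell
# Write a small build script whose second step fails
build_script=$(mktemp)
cat > "$build_script" <<'EOF'
set -e            # abort on the first failing command
echo "build"
false             # stands in for a failing step, e.g. a failed test run
echo "deploy"     # never reached because of set -e
EOF

# Run the script in its own shell, capture its output and exit status
if output=$(sh "$build_script"); then
  status=0
else
  status=$?
fi
rm -f "$build_script"
echo "output: $output, exit status: $status"
```

With set -e, the script stops after the first step, so this should print output: build, exit status: 1. Without the set -e line, the last step would run as well and the script would end with exit status 0, hiding the failure.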

A more complex job will usually have additional phases after the build phase, like a deploy phase. The commands in the deploy phase are only executed once your build phase completes successfully and are typically used to deploy a “green” build into the next stage.

Environment variables and secrets

During your build, you will typically need some additional information on the build context. Travis offers a couple of mechanisms to get access to that information.

First, there are of course environment variables that Travis will set for you. It is instructive to study the source code of the Travis build system to see where these variables are defined – like builtin.rb or several bash scripts like travis_export_go or travis_setup_env. Here is a list of the most important environment variables that are available.

TRAVIS_TMPDIR is pointing to a temporary directory that your scripts can use
TRAVIS_HOME is set to the home directory of the travis user which is used for running the build
TRAVIS_TAG is set only if the current build is for a git tag, if this is the case, it will contain the tag name
TRAVIS_COMMIT is the full git commit hash of the current build
TRAVIS_BUILD_DIR is the absolute path name to the directory where the Git checkout has been done to and where the build is executed

In addition to these built-in variables, users have several options to declare environment variables on top. First, environment variables can be declared using the env tag in the Travis CI file. Note that Travis will trigger a build for each individual item in this list, with the environment variables set to the values specified in this line. Thus if you want to avoid additional builds, specify all environment variables in one line like this.

env:
  - FOO=foo BAR=bar

Now suppose that during your build, you want to push a Docker image into your public repository, or you want to publish an artifact on your GitHub page. You will then need access to the respective credentials. Storing these credentials as environment variables in the Travis CI file is a bad idea, as anybody with read access to your GitHub repository (i.e. the world) can read your credentials. To handle this use case, Travis offers a different option. You can specify environment variables on the repository level in the Travis CI web interface, which are then available to every build for this repository. These variables are stored in encrypted form and – assuming that you trust Travis – appear to be reasonably safe (when you use Travis, you have to grant them privileged access to your GitHub repository anyway, so if you do not trust them, you might want to think about a privately hosted CI solution).

Testing a build configuration locally

One of the nice things with Travis is that the source code of the build system is publicly available on GitHub, along with instructions on how to use it. This is very useful if you are composing a Travis CI file and run into errors which you might want to debug locally.

As recommended in the README of the repository, it is a good idea to do this in a container. I have turned the build instructions on this page into a Dockerfile that is available as a gist. Let us use this and the Travis CI file from my example repository to dive a little bit into the inner workings of Travis.

For that purpose, let us first create an empty directory, download the Dockerfile and build the image.

$ mkdir ~/travis-test
$ cd ~/travis-test
$ wget https://gist.githubusercontent.com/christianb93/e14252a122d081a219b84a905a40543f/raw/1525fec3b26c7dc4eab71e7838c02f8637e40675/Dockerfile.travis-build
$ docker build --tag travis-build --file Dockerfile.travis-build .

Next, clone my test repository and run the container, mounting the current directory as a bind mount on /travis-test.

$ git clone --depth=1 https://github.com/christianb93/go-hello-world
$ docker run -it -v $(pwd):/travis-test travis-build

You should now see a shell prompt within the container, with a working directory containing a clone of the travis build repository. Let us grab our Travis CI file and copy it to the working directory.

# cp /travis-test/go-hello-world/.travis.yml .

When Travis starts a build, it internally converts the Travis CI file into a shell script. Let us do this for our sample file and take a quick look at the resulting shell script. Within the container, run

# bin/travis compile > /travis-test/script

The compile command will create the shell script and print it to standard output. Here, we redirect this back into the bind mount to have it on the host where you can inspect it using your favorite code editor. If you open this script, you will see that it is in fact a bash script that first defines some of the environment variables that we have discussed earlier. It then defines and executes a function called travis_preamble and then prints a very long string containing some function definitions into a file called job_stages which is then sourced, so that these functions are available. A similar process is then repeated several times, until finally, at the end of the script, a collection of these functions is executed.

Actually running this script inside the container will fail (the script expects some environment that is not present in our container, it will for instance probe a specific URL to connect to the Travis build system), but at least the script job_stages will be created and can be inspected. Towards the end of the script, there is one function for each of the phases of a Travis build job, and starting with these functions, we could now theoretically dive into the script and debug our work.
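The mechanism described above, writing function definitions into a file and then sourcing it, can be sketched in a few lines of shell. This only illustrates the general pattern and is not an excerpt from the script that Travis generates; the function names demo_install and demo_script and the temporary file are made up for this example.

```shell
#!/usr/bin/env bash
# Illustration of the pattern only; not taken from the script Travis generates.

stages_file="$(mktemp)"

# Step 1: write function definitions as text into a file. The quoted 'EOF'
# prevents the shell from expanding anything while writing.
cat > "$stages_file" <<'EOF'
demo_install() { echo "running install stage"; }
demo_script()  { echo "running script stage"; }
EOF

# Step 2: source the file so that the functions exist in the current shell.
. "$stages_file"

# Step 3: execute the stages in order.
stage_output="$(demo_install; demo_script)"
echo "$stage_output"

rm -f "$stages_file"
```

Keeping the stage functions in a separate file is what makes it possible to inspect them even when the surrounding script cannot run to completion.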

Caching with Travis CI

Travis runs each job in a new, dedicated virtual machine. This is very useful, as it gives you a completely reproducible and clean build environment, but also implies that it is difficult to cache results during builds. Go, for instance, maintains a build cache that significantly reduces build times and that is not present by default during a build on Travis. Dependencies are also not cached, meaning that they have to be downloaded over and over again with each new build.

To enable caching, a Travis CI file can contain a cache directive. In addition to a few language specific options, this allows you to specify a list of directories which are cached between builds. Behind the scenes, Travis uses a caching utility which will store the content of the specified directories in a cloud object store (judging from the code, this is either AWS S3 or Google’s GCS). At the end of a build, the cached directories are scanned and if the content has changed, a new archive is created and uploaded to the cloud storage. Conversely, when a job is started, the archive is downloaded and unzipped to create the cached directories. Thus we learn that

The cached content still needs to be fetched via the network, so that caching for instance a Docker image is not necessarily faster than pulling the image from the Docker Hub
Whenever something in the cached directories changes, the entire cache is rebuilt and uploaded to the cloud storage

These characteristics imply that caching just to avoid network traffic will usually not work – you should cache data to avoid repeated processing of the same data. In my experience, for instance, caching the Go build cache is really useful to speed up builds, but caching Docker images (by exporting them into an archive which is then placed in a cached directory) can actually be slower than downloading the images again for each build.

Using the API to inspect your build results

So far, we have operated Travis through the web interface. However, to support full automation, Travis does of course also offer an API.

To use the API, you will first have to navigate to your profile and retrieve an API token. For the remainder of this section, I will assume that you have done this and defined an environment variable travisToken that contains this token. The token needs to be present in the authorization header of each HTTPS request that goes to the API.

To test the API and the token, let us first get some data on our user using curl. So in a terminal window (in which you have exported the token) enter

$ curl -s\
  -H "Travis-API-Version: 3"\
  -H "Authorization: token $travisToken" \
  https://api.travis-ci.org/user | jq

As a result, you should see a JSON formatted output that I do not reproduce here for privacy reasons. Among other fields, you should be able to see your user ID, both as a string and an integer, as well as your GitHub user ID, demonstrating that the token works and is associated with your credentials.

As a different and more realistic example, let us suppose that you wanted to retrieve the state of the latest build for the sample repository go-hello-world. You would then navigate from the repository, identified by its slug (i.e. the combination of GitHub user name and repository), to its builds, sort the builds by start date and use jq to retrieve the state of the first entry in the list.

$ curl -s\
    -H "Travis-API-Version: 3"\
    -H "Authorization: token $travisToken" \
    "https://api.travis-ci.org/repo/christianb93%2Fgo-hello-world/builds?limit=5&sort_by=started_at:desc" \
     | jq -r '.builds[0].state'

Note that we need to URL-encode the slash which is part of the slug as %2F and need to include the entire URL in double quotes so that the shell does not interpret the ampersand & as an instruction to spawn curl in the background.
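The encoding of the slug can also be scripted instead of being typed by hand. The sketch below only handles the slash, which is the one character that matters for a slug of the form user/repository; it is not a general URL encoder, and the variable names are chosen for this example.

```shell
#!/usr/bin/env bash
# Encode the slash in a repository slug as %2F, as required in the API path.
# Only the slash is handled here; this is not a general-purpose URL encoder.
slug="christianb93/go-hello-world"
encoded_slug="$(printf '%s' "$slug" | sed 's|/|%2F|g')"

# Keep the URL in double quotes so that the shell does not act on the ampersand.
url="https://api.travis-ci.org/repo/${encoded_slug}/builds?limit=5&sort_by=started_at:desc"
echo "$url"
```

The resulting value of url can then be passed to curl as "$url", again in double quotes.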

There is of course much more that you can do with the API – you can activate and deactivate repositories, trigger and cancel builds, retrieve environment variables, change repository settings and so forth. Instead of using the API directly with curl, you could also use the official Travis client which is written in Ruby, and it would probably not be very difficult to create a simple library accessing that API in any other programming language.

We have reached the end of this short introduction to Travis CI. In one of the next posts, I will show you how to actually put this into action. We will build a CI pipeline for my bitcoin controller which fully automates unit and integration tests using Helm and kind to spin up a local Kubernetes cluster. I hope you enjoyed this post and come back to read more soon.

Extending Kubernetes with custom resources and custom controllers

The Kubernetes API is structured around resources. Typical resources that we have seen so far are pods, nodes, containers, ingress rules and so forth. These resources are built into Kubernetes and can be addressed using the kubectl command line tool, the API or the Go client.

However, Kubernetes is designed to be extendable – and in fact, you can add your own resources. These resources are defined by objects called custom resource definitions (CRD).

Setting up custom resource definitions

Confusingly enough, the definition of a custom resource – i.e. the CRD – itself is nothing but a resource, and as such, can be created using either the Kubernetes API directly or any client you like, for instance kubectl.

Suppose we wanted to create a new resource type called book that has two attributes – an author and a title. To distinguish this custom resource from other resources that Kubernetes already knows, we have to put our custom resource definition into a separate API group. This can be any string, but to guarantee uniqueness, it is good practice to use some sort of domain, for instance a GitHub repository name. As my GitHub user name is christianb93, I will use the API group christianb93.github.com for this example.

To understand how we can define that custom resource type using the API, we can take a look at its specification or the corresponding Go structures. We see that

The CRD resource is part of the API group apiextensions.k8s.io and has version v1beta1, so the value of the apiVersion field needs to be apiextensions.k8s.io/v1beta1
The kind is, of course, CustomResourceDefinition
There is again a metadata field, which is built up as usual. In particular, there is a name field
A custom resource definition spec consists of a version, the API group, a field scope that determines whether our CRD instances will live in a cluster scope or in a namespace and a list of names

This translates into the following manifest file to create our CRD.

$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    names:
      plural: books
      singular: book
      kind: Book
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

This will create a new type of resource, our books. We can access books in the same way as all other resources Kubernetes is aware of. We can, for instance, get a list of existing books using the API. To do this, open a separate terminal and run

kubectl proxy

to get access to the API endpoints. Then use curl to get a list of all books.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/books"  | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "items": [],
  "kind": "BookList",
  "metadata": {
    "continue": "",
    "resourceVersion": "7281",
    "selfLink": "/apis/christianb93.github.com/v1/books"
  }
}

So in fact, Kubernetes knows about books and has established an API endpoint for us. Note that the path contains “apis” and not “api” to indicate that this is an extension of the original Kubernetes API. Also note that the path contains our dedicated API group name and the version that we have specified.

At this point we have completed the definition of our custom resource “book”. Now let us try to actually create some books.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: david-copperfield
spec:
  title: David Copperfield
  author: Dickens
EOF
book.christianb93.github.com/david-copperfield created

Nice – we have created our first book as an instance of our new CRD. We can now work with this book similar to a pod, a deployment and so forth. We can for instance display it using kubectl

$ kubectl get book david-copperfield
NAME                AGE
david-copperfield   3m38s

or access it using curl and the API.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield" | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "kind": "Book",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"christianb93.github.com/v1\",\"kind\":\"Book\",\"metadata\":{\"annotations\":{},\"name\":\"david-copperfield\",\"namespace\":\"default\"},\"spec\":{\"author\":\"Dickens\",\"title\":\"David Copperfield\"}}\n"
    },
    "creationTimestamp": "2019-04-21T09:32:54Z",
    "generation": 1,
    "name": "david-copperfield",
    "namespace": "default",
    "resourceVersion": "7929",
    "selfLink": "/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield",
    "uid": "70fbc120-6418-11e9-9fbf-080027a84e1a"
  },
  "spec": {
    "author": "Dickens",
    "title": "David Copperfield"
  }
}

Validations

If we look again at what we have done and where we have started, something still feels a bit wrong. Remember that we wanted to define a resource called a “book” that has a title and an author. We have used those fields when actually creating a book, but we have not referred to them at all in the CRD. How does the Kubernetes API know which fields a book actually has?

The answer is simple – it does not know this at all. In fact, we can create a book with any collection of fields we want. For instance, the following will work just fine.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: moby-dick
spec:
  foo: bar
EOF
book.christianb93.github.com/moby-dick created

In fact, when you run this, the Kubernetes API server will happily take your JSON input and store it in the etcd that keeps the cluster state – and it will store there whatever you provide. To avoid this, let us add a validation rule to our resource definition. This allows you to attach an OpenAPI schema to your CRD against which the books will be validated. Here is our updated CRD manifest file to make this work.

$ kubectl delete crd  books.christianb93.github.com
customresourcedefinition.apiextensions.k8s.io "books.christianb93.github.com" deleted
$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: books
      singular: book
      kind: Book
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required: 
            - author
            - title
            properties:
              author:
                type: string
              title:
                type: string
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

If you now repeat the commands above, you will find that "David Copperfield" can be created, but "Moby Dick" is rejected, as it does not match the validation rules (the required fields author and title are missing).

There is another change that we have made in this version of our CRD – we have added a subresource called status to our CRD. This subresource allows a controller to update the status of the resource independently of the specification – see the corresponding API convention for more details on this.

The controller pattern

As we have seen above, a CRD is essentially allowing you to store data as part of the cluster state kept in the etcd key-value store using a Kubernetes API endpoint. However, CRDs do not actually trigger any change in the cluster. If you POST a custom resource like a book to the Kubernetes API server, all it will do is to store that object in the etcd store.

It might come as a bit of a surprise, but strictly speaking, this is true for built-in resources as well. Suppose, for instance, that you use kubectl to create a deployment. Then, kubectl will create a POST request for a deployment and send it to the API server. The API server will process the request and store the new deployment in the etcd. It will, however, not actually create pods, spin up containers and so forth.

This is the job of another component of the Kubernetes architecture – the controllers. Essentially, a controller is monitoring the etcd store to keep track of its contents. Whenever a new resource, for example a deployment, is created, the controller will trigger the associated actions.

Kubernetes comes with a set of built-in controllers in the controller package. Essentially, there is one controller for each type of resource. The deployment controller, for instance, monitors deployment objects. When a new deployment is created in the etcd store, it will make sure that there is a matching replica set. These sets are again managed by another controller, the replica set controller, which will in turn create matching pods. The pods are again monitored by the scheduler that determines the node on which the pods should run and writes the bindings back to the API server. The updated bindings are then picked up by the kubelet and the actual containers are started. So essentially, all involved components of the Kubernetes architecture talk to the etcd via the API server, without any direct dependencies.

Of course, the Kubernetes built-in controllers will only monitor and manage objects that come with Kubernetes. If we create custom resources and want to trigger any actual action, we need to implement our own controllers.

Suppose, for instance, we wanted to run a small network of bitcoin daemons on a Kubernetes cluster for testing purposes. Bitcoin daemons need to know each other and register themselves with other daemons in the network to be able to exchange messages. To manage that, we could define a custom resource BitcoinNetwork which contains the specification of such a network, for instance the number of nodes. We could then write a controller which

Detects new instances of our custom resource
Creates a corresponding deployment set to spin up the nodes
Monitors the resulting pods and whenever a pod comes up, adds this pod to the network
Keeps track of the status of the nodes in the status of the resource
Makes sure that when we delete or update the network, the corresponding deployments are deleted or updated as well

Such a controller would operate by detecting newly created or changed BitcoinNetwork resources, comparing their definition to the actual state, i.e. existing deployments and pods, and updating the actual state accordingly. This pattern is known as the controller pattern or operator pattern. Operators exist for many applications, like Postgres, MySQL, Prometheus and many others.
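The heart of this pattern, comparing the desired state with the observed state and deriving an action, can be illustrated with a toy example. The sketch below makes no Kubernetes calls at all; the function name reconcile and the printed actions are made up, and a real controller would obtain both numbers from the API server and run this comparison whenever a relevant object changes.

```shell
#!/usr/bin/env bash
# Toy reconcile step: compares a desired count with an observed count and
# prints the action a controller would take. No Kubernetes calls are made.
reconcile() {
  local desired="$1" actual="$2"
  if [ "$actual" -lt "$desired" ]; then
    echo "create $((desired - actual)) node(s)"
  elif [ "$actual" -gt "$desired" ]; then
    echo "delete $((actual - desired)) node(s)"
  else
    echo "nothing to do"
  fi
}

reconcile 3 1   # the spec asks for three nodes, one is running
reconcile 3 3   # the actual state already matches the spec
reconcile 2 3   # the spec was scaled down
```

Note that the function only looks at the current difference and not at what happened before, so it can safely be called again and again.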

I did actually pick this example for a reason – in an earlier post, I showed you how to set up and operate a small bitcoin test network based on Docker and Python. In the next few posts, we will learn how to write a custom controller in Go that automates all this on top of Kubernetes! To achieve this, we will first analyze the components of a typical controller – informers, queues, caches and all that – using the Kubernetes sample controller and then dig into building a custom bitcoin controller armed with this understanding.

Automating cluster creation on DigitalOcean

So far I have mostly used Amazon's EKS platform for my posts on Kubernetes. However, this is of course not the only choice – there are many other providers that offer Kubernetes in a cloud environment. One of them which is explicitly targeting developers is DigitalOcean. In this post, I will show you how easy it is to automate the creation of a Kubernetes cluster on the DigitalOcean platform.

Creating an SSH key

Similar to most other platforms, DigitalOcean offers SSH access to their virtual machines. You can either ask DigitalOcean to create a root password for you and send it to you via mail, or – preferred – you can use SSH keys.

Unlike on AWS, key pairs need to be generated manually outside of the platform and imported. So let us generate a key pair called do_k8s and import it into the platform. To create the key locally, run

$ ssh-keygen -f ~/.ssh/do_k8s -N ""

This will create a new key (not protected by a passphrase, so be careful) and store the private key file and the public key file in separate files in the SSH standard directory. You can print out the contents of the public key file as follows.

$ cat ~/.ssh/do_k8s.pub

The resulting output is the public part of your SSH key, including the string “ssh-rsa” at the beginning. To make this key known to DigitalOcean, log into the console, navigate to the security tab, click “Add SSH key”, enter the name “do_k8s” and copy the public key into the corresponding field.

Next, let us test our setup. This assumes that you have created an API token in the DigitalOcean console and stored it in the shell variable bearerToken. We will create a request using curl to list all our droplets. In the DigitalOcean terminology, a droplet is a virtual machine instance. Of course, we have not yet created one, so expect to get an empty list, but we can use this to test that our token works. For that purpose, we simply use curl to direct a GET request to the API endpoint and pass the bearer token in an additional header.

$ curl -s -X\
     GET "https://api.digitalocean.com/v2/droplets/"\
     -H "Authorization: Bearer $bearerToken"\
     -H "Content-Type: application/json"
{"droplets":[],"links":{},"meta":{"total":0}}

So no droplets, as expected, but our token seems to work.

Droplets

Let us now see how we can create a droplet. We could of course also use the cloud console to do this, but as our aim is automation, we will leverage the API.

When you have worked with a REST API before, you will not be surprised to learn that this is done by submitting a POST request. This request will contain a JSON body that describes the resource to be created – a droplet in our case – and a header that, among other things, is used to submit the bearer token that we have just created.

To be able to log into our droplet later on, we will have to pass the SSH key that we have just created to the API. Unfortunately, for that, we cannot use the name of the key (do_k8s), but we will have to use the internal ID. So the first thing we need to do is to place a GET request to extract this ID. As so often, we can do this with a combination of curl to retrieve the key and jq to parse the JSON output.

$ sshKeyName="do_k8s"
$ sshKeyId=$(curl -s -X \
      GET "https://api.digitalocean.com/v2/account/keys/" \
      -H "Authorization: Bearer $bearerToken" \
      -H "Content-Type: application/json" \
       | jq -r ".ssh_keys[] | select(.name==\"$sshKeyName\") | .id")

Here we first use curl to get a list of all keys in JSON format. We then pipe the output into jq and use the select statement to get only those items for which the attribute name matches our key name. Finally, we extract the ID field from this item and store it in a shell variable.
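The lookup by name can be tried out without calling the API. The following sketch runs jq against a small sample document with made-up IDs that is shaped like a reduced version of the key list; it passes the key name with jq's --arg option, which avoids the escaped quotes. The sample contains two keys to show that the entry is picked by name rather than by position.

```shell
#!/usr/bin/env bash
# Sample data with made-up IDs, a reduced version of the key list structure.
response='{"ssh_keys":[{"id":111,"name":"other_key"},{"id":222,"name":"do_k8s"}]}'
sshKeyName="do_k8s"

# --arg passes the shell value into jq as the variable $name, so no quote
# escaping is needed. Each array entry is filtered individually.
sshKeyId="$(echo "$response" | jq -r --arg name "$sshKeyName" \
  '.ssh_keys[] | select(.name == $name) | .id')"
echo "$sshKeyId"
```

If no key with the given name exists, the filter produces no output and sshKeyId stays empty, which a script should check before using the value.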

We can now assemble the data part of our request. The code is a bit difficult to read, as we need to escape quotes.

$ data="{\"name\":\"myDroplet\",\
       \"region\":\"fra1\",\
       \"size\":\"s-1vcpu-1gb\",\
       \"image\":\"ubuntu-18-04-x64\",\
       \"ssh_keys\":[ $sshKeyId ]}"

To get a nice, readable representation of this, we can use jq’s pretty printing capabilities.

$ echo $data | jq
{
  "name": "myDroplet",
  "region": "fra1",
  "size": "s-1vcpu-1gb",
  "image": "ubuntu-18-04-x64",
  "ssh_keys": [
    24322857
  ]
}

We see that this is a simple JSON structure. There is a name, which will be the name used later in the DigitalOcean console to display our droplet, a region (I use fra1 in Central Europe, a full list of all available regions is here), a size specifying the type of the droplet (in this case one vCPU and 1 GB of memory), the OS image to use and finally the SSH key ID that we have extracted before. Let us now submit our creation request.

$ curl -s  -X \
      POST "https://api.digitalocean.com/v2/droplets"\
      -d "$data" \
      -H "Authorization: Bearer $bearerToken"\
      -H "Content-Type: application/json"

When everything works, you should see your droplet on the DigitalOcean web console. If you repeat the GET request above to obtain all droplets, your droplet should also show up in the list. To format the output, you can again pipe it through jq. After some time, the status field (located at the top of the output) should be “active”, and you should be able to retrieve an IP address from the section “networks”. In my case, this is 46.101.128.54. We can now SSH into the machine as follows.

$ ssh -i ~/.ssh/do_k8s root@46.101.128.54

Needless to say that it is also easy to delete a droplet again using the API. A full reference can be found here. I have also created a few scripts that can automatically create a droplet, list all running droplets and delete a droplet.
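As an aside on the request body assembled earlier in this section: the escaped quotes can be avoided by using a here-document. This is a sketch of an alternative way to build the same JSON, with a placeholder value for the key ID; it is not how the scripts mentioned above do it.

```shell
#!/usr/bin/env bash
# Placeholder ID for illustration; in the walkthrough this value comes from the API.
sshKeyId=12345

# An unquoted here-document expands $sshKeyId but needs no escaped quotes.
data="$(cat <<EOF
{
  "name": "myDroplet",
  "region": "fra1",
  "size": "s-1vcpu-1gb",
  "image": "ubuntu-18-04-x64",
  "ssh_keys": [ $sshKeyId ]
}
EOF
)"
echo "$data"
```

The variable data can then be passed to curl with -d "$data" exactly as before.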

Creating a Kubernetes cluster

Let us now turn to the creation of a Kubernetes cluster. The good news is that this is even easier than the creation of a droplet – a single POST request will do!

But before we can assemble our request, we need to understand how the cluster definition is structured. Of course, a Kubernetes cluster consists of a couple of management nodes (which DigitalOcean manages for you in the background) and worker nodes. On DigitalOcean, worker nodes are organized in node pools. Each node pool contains a set of identical worker nodes. We could, for instance, create one pool with memory-heavy machines for database workloads that require caching, and a second pool with general purpose machines for microservices. The smallest machines that DigitalOcean will allow you to bring up as worker nodes are of type s-1vcpu-2gb. To fully specify a node pool with two machines of this type, the following JSON fragment is used.

$ nodePool="{\"size\":\"s-1vcpu-2gb\",\
      \"count\": 2,\
      \"name\": \"my-pool\"}"
$ echo $nodePool | jq
{
  "size": "s-1vcpu-2gb",
  "count": 2,
  "name": "my-pool"
}

Next, we assemble the data part of the POST request. We will need to specify an array of node pools (here we will use only one node pool), the region, a name for the cluster, and a Kubernetes version (you can of course ask the API to give you a list of all existing versions by running a GET request on the URL path /v2/kubernetes/options). Using the node pool snippet from above, we can assemble and display our request data as follows.

$ data="{\"name\": \"my-cluster\",\
        \"region\": \"fra1\",\
        \"version\": \"1.13.5-do.1\",\
        \"node_pools\": [ $nodePool ]}"
$ echo $data | jq
{
  "name": "my-cluster",
  "region": "fra1",
  "version": "1.13.5-do.1",
  "node_pools": [
    {
      "size": "s-1vcpu-2gb",
      "count": 2,
      "name": "my-pool"
    }
  ]
}

Finally, we submit this data using a POST request as we have done it for our droplet above.

$ curl -s -X\
    POST "https://api.digitalocean.com/v2/kubernetes/clusters"\
    -d "$data" \
    -H "Authorization: Bearer $bearerToken"\
    -H "Content-Type: application/json"

Now cluster creation should start, and if you navigate to the Kubernetes tab of the DigitalOcean console, you should see your cluster being created.

Cluster creation is rather fast on DigitalOcean, and typically takes less than five minutes. To complete the setup, you will have to download the kubectl config file for the newly generated cluster. Of course, there are again two ways to do this – you can use the web console or the API. I have created a script that fully automates cluster creation – it detects the latest Kubernetes version, creates the cluster, waits until it is active and downloads the kubectl config file for you. If you run this, make sure to populate the shell variable bearerToken with your token or use the -t switch to pass the token to the script. The same directory also contains a few more scripts to list all existing clusters and to delete them again.
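The waiting step can be sketched as a small polling helper. This is not the script mentioned above, which is not reproduced here; wait_for_status and the stand-in command fake_state are hypothetical names. In practice, the command passed to the helper would be a curl and jq call that prints the current state of the cluster.

```shell
#!/usr/bin/env bash
# wait_for_status EXPECTED MAX_TRIES DELAY CMD [ARGS...]
# Runs CMD until it prints EXPECTED; returns 1 if MAX_TRIES is exhausted.
wait_for_status() {
  local expected="$1" max_tries="$2" delay="$3"
  shift 3
  local tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    if [ "$("$@")" = "$expected" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Stand-in for an API call: reports "active" from the third call onwards.
# The call counter lives in a file because the command runs in a subshell.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
fake_state() {
  local calls=$(( $(cat "$counter_file") + 1 ))
  echo "$calls" > "$counter_file"
  if [ "$calls" -ge 3 ]; then echo "active"; else echo "pending"; fi
}

if wait_for_status active 5 0 fake_state; then
  echo "ready after $(cat "$counter_file") polls"
fi
```

Limiting the number of attempts ensures that the script terminates with an error instead of waiting forever when the cluster never reaches the expected state.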

Stateful sets with Kubernetes

In one of my previous posts, we have looked at deployment controllers which make sure that a certain number of instances of a given pod is running at all times. Failures of pods and nodes are automatically detected and the pod is restarted. This mechanism, however, only works well if the pods are actually interchangeable and stateless. For stateful pods, additional mechanisms are needed which we discuss in this post.

Simple stateful sets

As a reminder, let us first recall some properties of deployments aka stateless sets in Kubernetes. For that purpose, let us create a deployment that brings up three instances of the Apache httpd.

$ kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
spec:
  selector:
    matchLabels:
      app: httpd
  replicas: 3
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - name: httpd-ctr
          image: httpd:alpine
EOF
deployment.apps/httpd created
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-76ff88c864-nrs8q   1/1     Running   0          12s
httpd-76ff88c864-rrxw8   1/1     Running   0          12s
httpd-76ff88c864-z9fjq   1/1     Running   0          12s

We see that – as expected – Kubernetes has created three pods and assigned random names to them, composed of the name of the deployment and a random string. We can run hostname inside one of the containers to verify that these are also the hostnames of the pods. Let us pick the first pod.

$ kubectl exec -it httpd-76ff88c864-nrs8q hostname
httpd-76ff88c864-nrs8q

Let us now kill the first pod.

$ kubectl delete pod httpd-76ff88c864-nrs8q

After a few seconds, a new pod is created. Note, however, that this pod receives a new pod name and also a new host name. And, of course, the new pod will receive a new IP address. So its entire identity – hostname, IP address etc. – has changed. Kubernetes did not actually magically revive the pod, but it realized that one pod was missing in the set and simply created a new pod according to the same pod specification template.

This behavior allows us to recover from a simple pod failure very quickly. It does, however, pose a problem if you want to deploy a set of pods that each take a certain role and rely on stable identities. Suppose, for instance, that you have an application that comes in pairs. Each instance needs to connect to a second instance, and to do this, it needs to know the name of that instance and needs to rely on the stability of that name. How would you handle this with Kubernetes?

Of course, we could simply define two pods (not deployments) and use different names for the two pods. In addition, we could set up a service for each pod, which will create DNS entries in the Kubernetes internal DNS system. This would allow each pod to locate the second pod in the pair. But of course this is cumbersome and requires some additional monitoring to detect failures and restart pods. Fortunately, Kubernetes offers an alternative known as stateful sets.

In some sense, a stateful set is similar to a deployment. You define a pod template and a desired number of replicas. A controller will then bring up these replicas, monitor them and replace them in case of a failure. The difference between a stateful set and a deployment is that each instance in a stateful set receives a unique identity which is stable across restarts. In addition, the instances of a stateful set are brought up and terminated in a defined order.

This is best explained using an example. So let us define a stateful set that contains three instances of httpd (which, of course, is a toy example and not a typical application for which you would use stateful sets).

$ kubectl apply -f - << EOF
apiVersion: v1
kind: Service
metadata:
  name: httpd-svc
spec:
  clusterIP: None
  selector:
    app: httpd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stateful-test
spec:
  selector:
    matchLabels:
      app: httpd
  replicas: 3 
  serviceName: httpd-svc
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd-ctr
        image: httpd:alpine
EOF
service/httpd-svc created
statefulset.apps/stateful-test created

That is a long manifest file, so let us go through it step by step. The first part of the file is simply a service called httpd-svc. The only thing that might seem strange is that this service has no cluster IP. This type of service is called a headless service. Its primary purpose is not to serve as a load balancer for pods (which would not make sense anyway in a typical stateful scenario, as the pods are not interchangeable), but to provide a domain in the cluster internal DNS system. We will get back to this point in a few seconds.

The second part of the manifest file is the specification of the actual stateful set. The overall structure is very similar to a deployment – there is the usual metadata section, a selector that selects the pods belonging to the set, a replica count and a pod template. However, there is also a reference serviceName to the headless service that we have just created.

Let us now inspect our cluster to see what this manifest file has created. First, of course, there is the service that the first part of our manifest file describes.

$ kubectl get svc httpd-svc
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
httpd-svc   ClusterIP   None         <none>        <none>    13s

As expected, there is no cluster IP address for this service and no port. In addition, we have created a stateful set that we can inspect using kubectl.

$ kubectl get statefulset stateful-test
NAME            READY   AGE
stateful-test   3/3     88s

The output looks similar to a deployment – we see that our stateful set has three replicas out of three in total in status READY. Let us inspect those pods. We know that, by definition, our stateful set consists of all pods that have an app label with value equal to httpd, so we can use a label selector to list all these pods.

$ kubectl get pods -l app=httpd 
NAME              READY   STATUS    RESTARTS   AGE
stateful-test-0   1/1     Running   0          4m28s
stateful-test-1   1/1     Running   0          4m13s
stateful-test-2   1/1     Running   0          4m11s

So there are three pods that belong to our set, all in status running. However, note that the names of the pods follow a different pattern than for a deployment. The name is composed of the name of the stateful set plus an ordinal, starting at zero. Thus the names are predictable. In addition, the AGE field tells you that Kubernetes will also bring up the pods in this order, i.e. it will start pod stateful-test-0, wait until this pod is ready, then bring up the second pod and so on. If you scale the set down again, it will terminate the pods in the reverse order. Note that no such ordering is guaranteed when you delete the entire set, so you might want to scale the set down to zero first if an ordered termination matters.
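The naming and ordering rules can be captured in a few lines of Python. This is only an illustrative model and not the logic of the actual controller. The function names pod_names, startup_order and scale_down_order are made up for this sketch.

```python
def pod_names(set_name, replicas):
    """Return the predictable pod names of a stateful set: <set name>-<ordinal>."""
    return [f"{set_name}-{ordinal}" for ordinal in range(replicas)]


def startup_order(set_name, replicas):
    """Pods are started one after the other, in order of ascending ordinal."""
    return pod_names(set_name, replicas)


def scale_down_order(set_name, old_replicas, new_replicas):
    """When scaling down, the pods with the highest ordinals are removed first."""
    names = pod_names(set_name, old_replicas)
    return list(reversed(names[new_replicas:]))


print(startup_order("stateful-test", 3))
print(scale_down_order("stateful-test", 3, 1))
```

Because the names are a pure function of the set name and the ordinal, a client can compute the name of any member of the set without asking the API server.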

Let us check whether the pod names are not only predictable, but also stable. To do this, let us bring down one pod.

$ kubectl delete pod stateful-test-1
pod "stateful-test-1" deleted
$ kubectl get pods -l app=httpd 
NAME              READY   STATUS              RESTARTS   AGE
stateful-test-0   1/1     Running             0          8m41s
stateful-test-1   0/1     ContainerCreating   0          2s
stateful-test-2   1/1     Running             0          8m24s

So Kubernetes will bring up a new pod with the same name again – our names are stable. The IP addresses, however, are not guaranteed to remain stable and typically change if a pod dies and a substitute is created. So how would the pods in the set communicate with each other, and how would another pod communicate with one of them?

To solve this, Kubernetes will add DNS records for each pod. Recall that as part of a Kubernetes cluster, a DNS server is running (with recent versions of Kubernetes, this is typically CoreDNS). Kubernetes will mount the DNS configuration file /etc/resolv.conf into each of the pods so that the pods use this DNS server. For each pod in a stateful set, Kubernetes will add a DNS record. Let us use the busybox image to show this record.

$ kubectl run -i --tty --image busybox:1.28 busybox --restart=Never --rm
If you don't see a command prompt, try pressing enter.
/ # cat /etc/resolv.conf
nameserver 10.245.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ # nslookup stateful-test-0.httpd-svc
Server:    10.245.0.10
Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

Name:      stateful-test-0.httpd-svc
Address 1: 10.244.0.108 stateful-test-0.httpd-svc.default.svc.cluster.local

What happens? Apparently Kubernetes creates one DNS record for each pod in the stateful set. The name of that record is composed of the pod name, the service name, the namespace in which the service is defined (the default namespace in our case), the subdomain svc in which all entries for services are stored and the cluster domain. Kubernetes will also add a record for the service itself.

/ # nslookup httpd-svc
Server:    10.245.0.10
Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

Name:      httpd-svc
Address 1: 10.244.0.108 stateful-test-0.httpd-svc.default.svc.cluster.local
Address 2: 10.244.0.17 stateful-test-1.httpd-svc.default.svc.cluster.local
Address 3: 10.244.1.49 stateful-test-2.httpd-svc.default.svc.cluster.local

So we find that Kubernetes has added additional A records for the service, corresponding to the three pods that are part of the stateful set. Thus an application could either look up one of the pods directly, or use round-robin and a lookup of the service name to talk to the pods.

As a sidenote, you really have to use version 1.28 of the busybox image to make this work – later versions seem to have a broken nslookup implementation, see this issue for details.
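To make the structure of the DNS names and the round-robin idea more tangible, here is a small Python sketch. The helper names pod_dns_name, resolve_all and round_robin are hypothetical, the default namespace and the cluster domain cluster.local are taken from the output above, and port 80 is an assumption based on the httpd image. The function resolve_all only returns meaningful results when run inside the cluster, so the example uses the addresses from the nslookup output instead.

```python
import itertools
import socket


def pod_dns_name(pod, service, namespace="default", cluster_domain="cluster.local"):
    """Compose the DNS name <pod>.<service>.<namespace>.svc.<cluster domain>."""
    return ".".join([pod, service, namespace, "svc", cluster_domain])


def resolve_all(name, port=80):
    """Return all IPv4 addresses behind a DNS name (use this inside the cluster)."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr), sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


def round_robin(addresses):
    """Return an endless iterator that cycles through the given addresses."""
    return itertools.cycle(addresses)


print(pod_dns_name("stateful-test-0", "httpd-svc"))
# Inside the cluster, round_robin(resolve_all("httpd-svc")) could be used instead
# of the hard-coded addresses taken from the nslookup output
targets = round_robin(["10.244.0.108", "10.244.0.17", "10.244.1.49"])
print([next(targets) for _ in range(4)])
```

Keep in mind that the pod IP addresses can change, so a real client would have to repeat the lookup from time to time instead of caching the result forever.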

Stateful sets and storage

Stateful sets are meant to be used for applications that store state, and they typically do this using persistent storage. When you create a stateful set with three replicas, you would want to attach storage to each of them, using a different volume for each replica. In case a pod is lost and restarted, you would also want to automatically attach the replacement to the same volume to which the pod it replaces was attached. All this is done using PVC templates.

Essentially, a PVC template has the same role for storage that a pod specification template has for pods. When Kubernetes creates a stateful set, it will use this template to create a set of persistent volume claims, one for each member of the set. These persistent volume claims are then bound to volumes, either manually created volumes or dynamically provisioned volumes. Here is an extension of our previously used example that includes PVC templates.

apiVersion: v1
kind: Service
metadata:
  name: httpd-svc
spec:
  clusterIP: None
  selector:
    app: httpd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stateful-test
spec:
  selector:
    matchLabels:
      app: httpd
  replicas: 3 
  serviceName: httpd-svc
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd-ctr
        image: httpd:alpine
        volumeMounts:
        - name: my-pvc
          mountPath: /test
  volumeClaimTemplates:
  - metadata:
      name: my-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

The code is the same as above, with two differences. First, we have an additional array called volumeClaimTemplates. If you consult the API reference, you will find that the elements of this array are of type PersistentVolumeClaim and therefore follow the same syntax rules as the PVCs discussed in my previous posts.

The second change is that in the container specification, there is a mount point. This mount point refers directly to the PVC name, not using a volume on Pod level as a level of indirection as for ordinary deployments.

When we apply this manifest file and display the existing PVCs, we will find that for each instance of the stateful set, Kubernetes has created a PVC.

$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-pvc-stateful-test-0   Bound    pvc-78bde820-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            1m
my-pvc-stateful-test-1   Bound    pvc-8790edc9-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            38s
my-pvc-stateful-test-2   Bound    pvc-974cc9fa-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            12s

We see that again, the naming of the PVCs follows a predictable pattern and is composed of the PVC template name and the name of the corresponding pod in the stateful set. Let us now look at one of the pods in greater detail.

$ kubectl describe pod stateful-test-0
Name:               stateful-test-0
Namespace:          default
Priority:           0
[ --- SOME LINES REMOVED --- ]
Containers:
  httpd-ctr:
[ --- SOME LINES REMOVED --- ]
    Mounts:
      /test from my-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kfwxf (ro)
[ --- SOME LINES REMOVED --- ]
Volumes:
  my-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  my-pvc-stateful-test-0
    ReadOnly:   false
  default-token-kfwxf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kfwxf
    Optional:    false
[ --- SOME LINES REMOVED --- ]

Thus we find that Kubernetes has automatically created a volume called my-pvc (the name of the PVC template) in each pod. This pod volume refers to the instance-specific PVC, which in turn refers to the persistent volume, and the mount point in the container refers to the pod volume by name.
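The chain of references that we have just observed can be written down as a short Python sketch. The functions claim_name and pod_volumes are hypothetical helpers that only mimic the naming scheme visible in the output above; they are not part of Kubernetes or its Python client.

```python
def claim_name(template_name, set_name, ordinal):
    """PVC names follow the pattern <template name>-<set name>-<ordinal>."""
    return f"{template_name}-{set_name}-{ordinal}"


def pod_volumes(template_names, set_name, ordinal):
    """Build the volume entries that end up in the pod with the given ordinal."""
    return [
        {
            # The pod volume is named after the PVC template ...
            "name": template,
            # ... and points to the PVC that belongs to this specific pod
            "persistentVolumeClaim": {
                "claimName": claim_name(template, set_name, ordinal)
            },
        }
        for template in template_names
    ]


print(pod_volumes(["my-pvc"], "stateful-test", 0))
```

The mount point in the container only ever mentions the template name, which is why the same pod template works for every member of the set.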

What happens if a pod goes down and is recreated? Let us test this by writing an instance specific file onto the volume mounted by pod #0, deleting the pod, waiting for a few seconds to give Kubernetes enough time to recreate the pod and checking the content of the volume.

$ kubectl exec -it stateful-test-0 -- touch /test/I_am_number_0
$ kubectl exec -it stateful-test-0 -- ls /test
I_am_number_0  lost+found
$ kubectl delete pod stateful-test-0 && sleep 30
pod "stateful-test-0" deleted
$ kubectl exec -it stateful-test-0 -- ls /test
I_am_number_0  lost+found

So we see that in fact, Kubernetes has bound the same volume to the replacement for pod #0 so that the application can access the same data again. The same happens if we forcefully remove the node – the volume is then automatically attached to the node on which the replacement pod will be scheduled. Note, however, that this only works if the new node is in the same availability zone as the previous node, as on most cloud platforms, cross-AZ attachment of volumes to nodes is not possible. In this case, the scheduler will have to wait until a new node in that zone has been brought up (see e.g. this link for more information on running Kubernetes in multiple zones).

Let us continue to investigate the life cycle of our volumes. We have seen that the volumes have been created automatically when the stateful set was created. The converse, however, is not true.

$ kubectl delete statefulset stateful-test
statefulset.apps "stateful-test" deleted
$ kubectl delete svc httpd-svc
service "httpd-svc" deleted
$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-pvc-stateful-test-0   Bound    pvc-78bde820-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            14m
my-pvc-stateful-test-1   Bound    pvc-8790edc9-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            13m
my-pvc-stateful-test-2   Bound    pvc-974cc9fa-53e0-11e9-b859-0afa21a383ce   1Gi        RWO            gp2            13m

So the persistent volume claims (and the volumes) continue to exist if we delete the stateful set. This seems to be a deliberate design decision to give the administrator a chance to copy or back up the data on the volumes even after the set has been deleted. To fully clean up the volumes, we therefore need to delete the PVCs manually. Fortunately, this can again be done using a label selector, in our case

kubectl delete pvc -l app=httpd

will do the trick. Note that it is even possible to reuse the PVCs – in fact, if you recreate the stateful set and the PVCs still exist, they will be reused and attached to the new pods.

This completes our overview of stateful sets in Kubernetes. Running stateful applications in a distributed cloud environment is tricky, and there is much more that needs to be considered. To see a few examples, you might want to look at some of the examples which are part of the Kubernetes documentation, for instance the setup of a distributed key-value store like ZooKeeper. You might also want to learn about operators that provide application specific controller logic, for instance promoting a database replica to a master if the current master node goes down. For some additional considerations on running stateful applications on Kubernetes you might want to read this nice post that explains additional mechanisms like leader elections that are common ingredients for a stateful distributed application.

Kubernetes storage under the hood part III – storage classes and provisioning

In the last post, we have seen the magic of persistent volume claims in action. In this post, we will look in more detail at how Kubernetes actually manages storage.

Storage classes and provisioners

First, we need to understand the concept of a storage class. In a typical environment, there are many different types of storage. There could be some block storage like EBS or a GCE persistent disk, or some local storage like an SSD or an HDD, NFS based storage or some distributed storage like Ceph or StorageOS. So far, we have kept our persistent volume claims platform agnostic. On the other hand, there might be a need to specify in more detail what type of storage we want. This is done using a storage class.

To start, let us use kubectl to list the available storage classes in our standard EKS cluster (you will get different results if you use a different provider).

$ kubectl get storageclass
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   3h

In this case, there is only one storage class called gp2. This is marked as the default storage class, meaning that in case we define a PVC which does not explicitly refer to a storage class, this class is chosen. Using kubectl with one of the flags --output json or --output yaml, we can get more information on the gp2 storage class. We find that there is an annotation storageclass.kubernetes.io/is-default-class which defines whether this storage class is the default storage class. In addition, there is a field provisioner which in this case is kubernetes.io/aws-ebs.
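As a small illustration, the following Python sketch determines the default storage class from a list of storage class descriptions. It assumes that the dictionaries have the shape of the items that kubectl returns with --output json. The second class named other and the helper names is_default and default_class_name are made up for this example.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def is_default(storage_class):
    """Check whether the annotation marks this storage class as the default."""
    annotations = storage_class.get("metadata", {}).get("annotations") or {}
    return annotations.get(DEFAULT_ANNOTATION) == "true"


def default_class_name(storage_classes):
    """Return the name of the first class marked as default, or None."""
    defaults = [sc["metadata"]["name"] for sc in storage_classes if is_default(sc)]
    return defaults[0] if defaults else None


classes = [
    {
        "metadata": {"name": "gp2", "annotations": {DEFAULT_ANNOTATION: "true"}},
        "provisioner": "kubernetes.io/aws-ebs",
    },
    # A hypothetical second class without the annotation
    {"metadata": {"name": "other"}, "provisioner": "kubernetes.io/aws-ebs"},
]
print(default_class_name(classes))
```

Note that the value of the annotation is the string "true" and not a boolean, as annotation values are always strings.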

This looks like a Kubernetes provided component, so let us try to locate its source code in the Kubernetes GitHub repository. A quick search in the source tree will show you that there is in fact a manifest file defining the storage class gp2. In addition, the source tree contains a plugin which will communicate with the AWS cloud provider to manage EBS block storage.

The inner workings of this are nicely explained here. Basically, the PVC controller will use the storage class in the PVC to find the provisioner that is supposed to be used for this storage class. If a provisioner is found, it is asked to create the requested storage dynamically. If no provisioner is found, the controller will just wait until storage becomes available. An external provisioner can periodically scan unmatched volume claims and provision storage for them. It then creates a corresponding persistent storage object using the Kubernetes API so that the PVC controller can detect this storage and bind it to the claim. If you are interested in the details, you might want to take a look at the source code of the external provisioner controller and the example of the Digital Ocean provisioner using it.

So at the end of the day, the workflow is as follows.

A user creates a PVC
If no storage class is provided in the PVC, the default storage class is merged into the PVC
Based on the storage class, the provisioner responsible for creating the storage is identified
The provisioner creates the storage and a corresponding Kubernetes PV object
The PVC is bound to the PV and available for use in Pods
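The steps above can be sketched as a toy model in Python. Nothing in this code talks to a cluster or to AWS: the function fake_ebs_provisioner is a stand-in that only returns a description of a volume, and the flat dictionaries are a simplification chosen for this sketch, not the real API objects.

```python
import uuid


def provision(pvc, storage_classes, default_class, provisioners):
    """Walk through the provisioning steps for one claim, return claim and volume."""
    # Step 2: merge the default storage class into the claim if none is given
    claim = dict(pvc)
    claim.setdefault("storageClassName", default_class)
    # Step 3: find the provisioner responsible for the storage class
    storage_class = storage_classes[claim["storageClassName"]]
    create_storage = provisioners[storage_class["provisioner"]]
    # Step 4: the provisioner creates the storage and a PV object describing it
    pv = create_storage(claim, storage_class)
    # Step 5: bind claim and volume to each other
    claim["volumeName"] = pv["name"]
    pv["claimRef"] = claim["name"]
    return claim, pv


def fake_ebs_provisioner(claim, storage_class):
    """Stand-in for a real provisioner, it only returns a volume description."""
    return {
        "name": f"pvc-{uuid.uuid4()}",
        "capacity": claim["storage"],
        "type": storage_class["parameters"]["type"],
    }


storage_classes = {
    "gp2": {"provisioner": "kubernetes.io/aws-ebs", "parameters": {"type": "gp2"}},
}
provisioners = {"kubernetes.io/aws-ebs": fake_ebs_provisioner}

# Step 1: a user creates a claim, here without naming a storage class
claim, pv = provision({"name": "my-pvc", "storage": "1Gi"},
                      storage_classes, "gp2", provisioners)
print(claim["storageClassName"], pv["type"], claim["volumeName"] == pv["name"])
```

In the real system, these steps are spread across several components that communicate via the API server, but the flow of information is the same.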

Let us see how we can create our own storage classes. We have used the AWS EBS provisioner to create GP2 storage, but it does in fact support all storage types (gp2, io1, st1, sc1) offered by Amazon. Let us create a storage class which we can use to dynamically provision HDD storage of type st1.

$ kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: st1
provisioner: kubernetes.io/aws-ebs
parameters:
  type: st1
EOF
storageclass.storage.k8s.io/st1 created
$ kubectl get storageclass
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   17m
st1             kubernetes.io/aws-ebs   15s

When you compare this to the default class, you will find that we have dropped the annotation which designates this class as default – there can of course only be one default class per cluster. We have again used the aws-ebs provisioner, but changed the type field to st1. Let us now create a persistent volume claim using this class.

$ kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdd-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: st1
  resources:
    requests:
      storage: 500Gi
EOF

If you submit this, wait until the PV has been created and then use aws ec2 describe-volumes, you will find a new AWS EBS volume of type st1 with a capacity of 500 Gi. As always, make sure to delete the PVC again to avoid unnecessary charges for that volume.

Manually provisioning volumes

So far, we have used a provisioner to automatically provision storage based on storage claims. We can, however, also provision storage manually and bind claims to it. This is done as usual – using a manifest file which describes a resource of kind PersistentVolume. To be able to link this newly created volume to a PVC, we first need to define a new storage class.

$ kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold
provisioner: kubernetes.io/aws-ebs
parameters:
  type: sc1
EOF
storageclass.storage.k8s.io/cold created

Once we have this storage class, we can now manually create a persistent volume with this storage class. The following commands create a new volume with type sc1, retrieve its volume id and create a Kubernetes persistent volume linked to that EBS volume.

$ volumeId=$(aws ec2 create-volume --availability-zone=eu-central-1a --size=1024 --volume-type=sc1  --query 'VolumeId')
$ kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cold-pv
spec:
  capacity:
    storage: 1024Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: cold
  awsElasticBlockStore:
    volumeID: $volumeId
EOF

Initially, this volume will be available, as there is no matching claim. Let us now create a PVC with a capacity of 512 Gi and a matching storage class.

$ kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cold-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cold
  resources:
    requests:
      storage: 512Gi
EOF

This will create a new PVC and bind it to the existing persistent volume. Note that the PVC will consume the entire 1 Ti volume, even though we have only requested half of that. Thus manually pre-provisioning volumes can be rather inefficient if the administrator and the teams do not agree on standard sizes for persistent volumes.
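The following Python sketch models this matching in a simplified way: a claim is bound to the smallest free volume with the same storage class, sufficient capacity and compatible access modes. The function names and the flat dictionaries are assumptions made for this example, and the real controller takes more criteria into account.

```python
UNITS = {"Gi": 1, "Ti": 1024}


def to_gib(quantity):
    """Convert a quantity like '512Gi' or '1Ti' to GiB (only Gi and Ti supported)."""
    number, unit = quantity[:-2], quantity[-2:]
    return int(number) * UNITS[unit]


def find_volume(claim, volumes):
    """Pick the smallest unbound volume that satisfies the claim, or None."""
    candidates = [
        pv for pv in volumes
        if pv.get("claim") is None
        and pv["storageClassName"] == claim["storageClassName"]
        and set(claim["accessModes"]) <= set(pv["accessModes"])
        and to_gib(pv["capacity"]) >= to_gib(claim["request"])
    ]
    return min(candidates, key=lambda pv: to_gib(pv["capacity"]), default=None)


volumes = [{"name": "cold-pv", "capacity": "1024Gi", "storageClassName": "cold",
            "accessModes": ["ReadWriteOnce"], "claim": None}]
claim = {"name": "cold-pvc", "request": "512Gi", "storageClassName": "cold",
         "accessModes": ["ReadWriteOnce"]}

pv = find_volume(claim, volumes)
pv["claim"] = claim["name"]  # the claim now owns the entire volume
unused = to_gib(pv["capacity"]) - to_gib(claim["request"])
print(pv["name"], unused)
```

In this model, half of the volume is reserved for the claim without having been requested, which is exactly the inefficiency described above.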

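If we delete the PVC again, we see that the persistent volume is not automatically deleted, as was the case for dynamic provisioning, but survives. Note, however, that with the reclaim policy Retain, which is the default for manually created volumes, the volume enters the status Released and needs to be made available again by an administrator before a new matching claim can consume it. Similarly, deleting the PV will not automatically delete the underlying EBS volume, and we need to clean up manually.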

Using Python to create volume claims

As usual, we close this post with some remarks on how to use Python to create and use persistent volume claims. Again, we omit the typical boilerplate code and refer to the full source code on GitHub for all the details.

First, let us look at how to create and populate a Python object that corresponds to a PVC. We need to create an instance of V1PersistentVolumeClaim. This object again has the typical metadata, contains the access mode and a resource requirement.

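# The client module is part of the official Kubernetes Python library
from kubernetes import client
pvc=client.V1PersistentVolumeClaim()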
pvc.api_version="v1"
pvc.metadata=client.V1ObjectMeta(
                  name="my-pvc")
spec=client.V1PersistentVolumeClaimSpec()
spec.access_modes=["ReadWriteOnce"]
spec.resources=client.V1ResourceRequirements(
                  requests={"storage" : "32Gi"})
pvc.spec=spec

Once we have this, we can submit an API request to Kubernetes to actually create the PVC. This is done using the method create_namespaced_persistent_volume_claim of the API object. We can then create a pod specification that uses this PVC, for instance as part of a deployment. In this template, we first need a container specification that contains the mount point.

container = client.V1Container(
                name="alpine-ctr",
                image="httpd:alpine",
                volume_mounts=[client.V1VolumeMount(
                    mount_path="/test",
                    name="my-volume")])

In the actual Pod specification, we also need to populate the field volumes. This field contains a list of volumes, which again refer to a volume source. So we end up with the following code.

pvcSource=client.V1PersistentVolumeClaimVolumeSource(
              claim_name="my-pvc")
podVolume=client.V1Volume(
              name="my-volume",
              persistent_volume_claim=pvcSource)
podSpec=client.V1PodSpec(containers=[container], 
                         volumes=[podVolume])

Here the claim name links the volume to our previously defined PVC, and the volume name links the volume to the mount point within the container. Starting from here, the creation of a Pod using this specification is then straightforward.
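As these links are established by name only, a typo is easy to make. The following sketch shows how one could check a pod specification for dangling references before submitting it. It works on plain dictionaries that follow the structure of a manifest file instead of the client objects, and the function unresolved_mounts is a hypothetical helper written for this example.

```python
def unresolved_mounts(pod_spec):
    """Return the names of volume mounts that do not refer to a volume of the pod."""
    volume_names = {volume["name"] for volume in pod_spec.get("volumes", [])}
    return [
        mount["name"]
        for container in pod_spec["containers"]
        for mount in container.get("volumeMounts", [])
        if mount["name"] not in volume_names
    ]


pod_spec = {
    "containers": [{
        "name": "alpine-ctr",
        "image": "httpd:alpine",
        # The name links the mount point to the pod volume below
        "volumeMounts": [{"mountPath": "/test", "name": "my-volume"}],
    }],
    "volumes": [{
        "name": "my-volume",
        # The claim name links the pod volume to the PVC
        "persistentVolumeClaim": {"claimName": "my-pvc"},
    }],
}
print(unresolved_mounts(pod_spec))
```

An empty list means that every mount point refers to a volume that is defined in the pod. Whether the referenced PVC actually exists can of course only be verified against the cluster.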

This completes our short series on storage in Kubernetes. In the next post on Kubernetes, we will look at a typical application of persistent storage – stateful applications.