Mastering large language models – Part XII: byte-pair encoding

On our way to actually coding and training a transformer-based model, there is one last challenge that we have to master – encoding our input by splitting it in a meaningful way into subwords. Today, we will learn how to do this using an algorithm known as byte-pair encoding (BPE).

Let us first quickly recall the problem that subword encoding tries to solve. To a large extent, the recent success of large language models is due to the tremendous amount of data used to train them. GPT-3, for instance, has been trained on roughly 300 billion tokens (which, as we will see soon, are not exactly words) [3]. But even much smaller datasets tend to contain a large number of unique words. The WikiText103 dataset [4], for instance, contains a bit more than 100 million words, which represent a vocabulary of 267,735 unique words. With standard word-level encoding, each of these words would become a dimension in the input and output layers of our neural network. It is not hard to imagine that with hundreds of billions of tokens as input, this number would be much larger still – so we clearly have a problem.

The traditional way to control the growth of the vocabulary is to introduce a minimum frequency. In other words, we only include tokens in the vocabulary that occur more than a given number of times – the minimum frequency – in the training data, and tune this number to realize a cap on the vocabulary size. This, however, has the disadvantage that during training and later during inference, we will face unknown words. Of course we can reserve a special token for these unknown words, but the model still has to know how to handle them.
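
To make this concrete, here is a minimal sketch of frequency-based vocabulary truncation in Python – the threshold of 5 and the token <unk> for unknown words are illustrative choices, not taken from any specific model.

import collections

def build_truncated_vocab(words, min_frequency=5):
    # Count how often each word occurs in the training data
    counts = collections.Counter(words)
    # Keep only words that reach the minimum frequency and add a
    # placeholder token for everything else
    vocab = {word for word, count in counts.items() if count >= min_frequency}
    vocab.add("<unk>")
    return vocab

def encode(words, vocab):
    # Replace every out-of-vocabulary word by the unknown token
    return [word if word in vocab else "<unk>" for word in words]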

Character-level encoding solves the problem of unknown words, but introduces a new problem – the model first has to learn words before it can start to learn relations between words. In addition, if our model is capable of learning relations within a certain range L, it does of course make a difference whether the unit in which we measure this range is a character or a word.

This is the point where subword tokenization comes into play. Here, the general idea is to build up a vocabulary that consists of subword units, not full words – but also more than just individual characters. Of course, to make this efficient, we want to include those subwords in our vocabulary that occur often, and we want to include all individual characters. If we need to encode a word that consists of known subwords, we can use those, and in the worst case, we can still go down to the character level during encoding so that we will not face unknown words.

Byte-pair encoding, first applied to the tokenization problem in neural machine translation in [1], is one way to do this. Here, we build our vocabulary in two phases. In the first phase, we go through our corpus and add every character that we find to our vocabulary. We then encode our data using this preliminary vocabulary.

In the second phase, we iterate through a process known as a merge. During each merge, we go through our data and identify the pair of tokens in the vocabulary that occurs most frequently. We then add a new token to our vocabulary that is simply the concatenation of these two existing tokens and update our existing encoding. Thus every merge adds one more token to the vocabulary. We continue this process until the vocabulary has reached the desired size.

Note that in a real implementation, the process of counting token pairs to determine their frequency is usually done in two steps. First, we count how often each individual word occurs in the text and save this for later reference. Then, given a pair of tokens, we go through all words that contain this pair and add up the frequencies of these words to determine the frequency of the token pair.

Let us look at an example to see how this works in practice. Suppose that the text we want to encode is

low, lower, newest, widest

In the first phase, we build an initial vocabulary that consists of all characters that occur in this text. However, we want to make sure that we respect word boundaries, so we need to be able to distinguish between characters that appear within a word and those that appear at the end of a word. We do this by appending a special marker </w> to a token that is located at the end of a word. In our case, this applies for instance to the character w, which therefore gives rise to two entries in our vocabulary – w and w</w>. With this modification, our initial vocabulary is

'o', 'r</w>', 's', 'w</w>', 'n', 'i', 'e', 'l', 'd', 'w', 't</w>'

We now go through all possible combinations of two tokens and count how often they appear in the text. Here is what we would find in our example.

Token pair    Frequency
l + o         2
o + w</w>     1
o + w         1
w + e         2
e + r</w>     1
n + e         1
e + w         1
e + s         2
s + t</w>     2
w + i         1
i + d         1
d + e         1

We now pick the pair of tokens (usually called a byte pair, even though this is, strictly speaking, not correct once we operate on Unicode code points) that occurs most frequently. In our case, several byte pairs occur twice, and we pick one of them, say w and e. We now add an additional token “we” to our vocabulary and re-encode our text. With this vocabulary, our text, which previously was represented by the sequence of tokens

l, o, w</w>, l, o, w, e, r</w>, n, e, w, e, s, t</w>, w, i, d, e, s, t</w>

would now be encoded as

l, o, w</w>, l, o, we, r</w>, n, e, we, s, t</w>, w, i, d, e, s, t</w>

Note the new token we that appears at the two positions where we previously had the combination of the tokens w and e. This concludes our first merge.

We can now continue and conduct the next merge. Each merge will add one token to our vocabulary, so that controlling the number of merges allows us to create a vocabulary of the desired size. Note that each merge results in two outputs – an updated vocabulary and a rule. In our case, the first merge resulted in the rule w, e ==> we.

To encode a piece of text that was not part of our training data when running the merges, we now simply replay these rules: we start with our text, break it down into characters corresponding to the tokens in the initial vocabulary, and apply the rules that we have derived during the merges in the order in which the merges took place. Thus, to encode text, we need access to the vocabulary and to the rules.
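
As a minimal sketch (the notebook and the reference implementation do this more carefully), replaying the rules for a single word could look like this, assuming that rules is the list of merges in the order in which they were learned.

def encode_word(word, rules):
    # Start with the character-level representation and mark the end of the word
    tokens = list(word[:-1]) + [word[-1] + "</w>"]
    # Replay the merge rules in the order in which they were learned
    for left, right in rules:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                tokens[i:i + 2] = [left + right]
            else:
                i += 1
    return tokens

# With the rule from our first merge, "newest" is encoded as
# ['n', 'e', 'we', 's', 't</w>']
print(encode_word("newest", [("w", "e")]))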

Let us now turn to implementing this in Python. The original paper [1] already contains a few code snippets that make an implementation rather easy. As usual, this blog post comes with a notebook that, essentially based on the code in [1], guides you through a simple implementation using the example discussed above. Here are a few code snippets to illustrate the process.

The first step is to build a dictionary that contains the frequencies of individual words, which we will later use to easily calculate the frequency of a byte pair. Note that the keys in this dictionary are the words in the input text, but already broken down into a sequence of tokens separated by spaces, so that we need to update them as we merge and add new tokens.

import collections

def get_word_frequencies(pre_tokenized_text):
    # Count how often each word appears in the pre-tokenized input
    counter = collections.Counter(pre_tokenized_text)
    # Keys are the words, split into characters separated by spaces,
    # with the end-of-word marker appended to the last character
    word_frequencies = {" ".join(word) + "</w>" : frequency for word, frequency in counter.items() if len(word) > 0}
    return word_frequencies

As the keys in the dictionary are already pretokenized, we can now build our initial vocabulary based on these keys.

def build_vocabulary(word_frequencies):
    vocab = set()
    for word in word_frequencies.keys():
        for c in word.split():
            vocab.add(c)
    return vocab

Next, we need to be able to identify the pair of tokens that occurs most frequently. Again, Python collections can be utilized to do this – we simply go through all words, split them into symbols, iterate through all pairs and increase the count for each pair, which we store in a dictionary.

def get_stats(word_frequencies):
    pairs = collections.defaultdict(int)
    for word, freq in word_frequencies.items():
        symbols = word.split()
        for i in range(len(symbols)-1):
            pairs[symbols[i],symbols[i+1]] += freq
    return pairs

Finally, we need a function that executes an actual merge. During each merge, we use a regular expression (more on the exact expression that we use here can be found in my notebook) to replace each occurrence of the pair by the new token.

import re

def do_merge(best_pair, word_frequencies, vocab):
    new_frequencies = dict()
    new_token = "".join(best_pair)
    # Match the pair only if it is not preceded or followed by other
    # non-whitespace characters, so that we do not merge inside a larger token
    pattern = r"(?<!\S)" + re.escape(" ".join(best_pair)) + r"(?!\S)"
    vocab.add(new_token)
    for word, freq in word_frequencies.items():
        new_word = re.sub(pattern, new_token, word)
        new_frequencies[new_word] = freq
    return new_frequencies, vocab

We can now combine the functions above to run a merge. Essentially, during a merge, we need to collect the statistics to identify the most frequent pair, call our merge function to update the dictionary containing the word frequencies and append the rule that we have found to a list of rules that we will save later.

stats = get_stats(word_frequencies)
best_pair = max(stats, key=lambda x: (stats[x], x)) 
print(f"Best pair: {best_pair}")
word_frequencies, vocab = do_merge(best_pair, word_frequencies, vocab)
rules.append(best_pair)
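
Wrapping this into a loop that runs merges until the vocabulary has reached a target size could look like the following sketch, which reuses the functions defined above; the target size of 50 is just an example value.

target_vocab_size = 50
rules = []
word_frequencies = get_word_frequencies(pre_tokenized_text)
vocab = build_vocabulary(word_frequencies)
while len(vocab) < target_vocab_size:
    stats = get_stats(word_frequencies)
    # Stop early if no pairs are left to merge
    if not stats:
        break
    best_pair = max(stats, key=lambda x: (stats[x], x))
    word_frequencies, vocab = do_merge(best_pair, word_frequencies, vocab)
    rules.append(best_pair)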

This code, however, is awfully slow, as it basically repeats the counting process with every merge. The authors of the original paper also provide a reference implementation [2] that applies some tricks like caching and the use of indices to speed up the process significantly. In my repository for this series, I have assembled an implementation that follows this reference implementation to a large extent, but simplifies a few steps (in particular the incremental updates of the frequency counts) and is therefore hopefully a bit easier to read. The code consists of the main file BPE.py and a test script test_bpe.py.

The test script can also be used to compare the output of our implementation with the reference implementation [2]. Let us quickly do this using a short snippet from “War and Peace” that I have added to my repository.

#
# Clone the reference implementation
#
git clone https://github.com/rsennrich/subword-nmt.git
cd subword-nmt
#
# Get files from my repository
#
wget https://raw.githubusercontent.com/christianb93/MLLM/main/bpe/BPE.py
wget https://raw.githubusercontent.com/christianb93/MLLM/main/bpe/test_bpe.py
wget https://raw.githubusercontent.com/christianb93/MLLM/main/bpe/test.in
wget https://raw.githubusercontent.com/christianb93/MLLM/main/bpe/test.rules
#
# Run unit tests
# 
python3 test_bpe.py
#
# Run reference implementation
#
cat test.in | python3 subword_nmt/learn_bpe.py -s=50 > rules.ref
#
# Run our implementation
#
python3 test_bpe.py --infile=test.in --outfile=rules.dat
#
# Diff outputs
#
diff rules.dat rules.ref
diff rules.ref test.rules

The only difference that the diff should show you is a header line (containing the version number) that the reference implementation adds and which we do not include in our output; the remaining parts of the output files, which contain the actual rules, should be identical. The second diff verifies the reference file that is part of the repository and used for the unit tests.

Over time, several other subword tokenization methods have been developed. Two popular methods are Google's WordPiece model ([6], [7]) and the unigram tokenizer [8]. While WordPiece is very similar to BPE and mainly differs in how the next pair to merge is selected, the unigram tokenizer starts with a very large vocabulary, containing all characters and the most commonly found substrings, and then iteratively removes items from the vocabulary based on a statistical model (see [5] for a comparison of BPE and unigram tokenization for language model pretraining).

With BPE, we now have a tokenization method at our disposal which will allow us to decouple vocabulary size from the size of the training data, so that we can train a model even on large datasets without having to increase the model size. In the next post, we will put everything together and train a transformer-based model on the WikiText dataset.

References:

[1] R. Sennrich et al., Neural Machine Translation of Rare Words with Subword Units
[2] https://github.com/rsennrich/subword-nmt
[3] T. Brown et al., Language Models are Few-Shot Learners
[4] https://blog.salesforceairesearch.com/the-wikitext-long-term-dependency-language-modeling-dataset/
[5] K. Bostrom, G. Durrett, Byte Pair Encoding is Suboptimal for Language Model Pretraining
[6] Y. Wu et al., Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
[7] M. Schuster, K. Nakajima, Japanese and Korean voice search
[8] T. Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates

Managing KVM virtual machines part III – using libvirt with Ansible

In my previous post, we have seen how the libvirt toolset can be used to directly create virtual volumes, virtual networks and KVM virtual machines. If this is not the first time you visit my blog, you will know that I am a big fan of automation, so let us investigate today how we can use Ansible to bring up KVM domains automatically.

Using the Ansible libvirt modules

Fortunately, Ansible comes with a couple of libvirt modules that we can use for this purpose. These modules mainly follow the same approach – we initially create objects from an XML template and can then perform additional operations on them, referencing them by name.

To use these modules, you will have to install a couple of additional libraries on your workstation. First, obviously, you need Ansible, and I suggest that you take a look at my short introduction to Ansible if you are new to it and do not have it installed yet. I have used version 2.8.6 for this post.

Next, Ansible of course uses Python, and there are a couple of Python modules that we need. In the first post on libvirt, I have already shown you how to install the python3-libvirt package. In addition, you will have to run

pip3 install lxml

to install the XML module that we will need.

Let us start with a volume pool. In order to somehow isolate our volumes from other volumes on the same hypervisor, it is useful to put them into a separate volume pool. The virt_pool module is able to do this for us, given that we provide an XML definition of the pool, which we can of course create from a Jinja2 template, using the name of the volume pool and its location on the hard disk as parameters.

At this point, we have to define where we store the state of our configuration, consisting of the storage pool, but also potentially SSH keys and the XML templates that we use. I have decided to put all this into a directory state in the same directory where the playbook is kept. This avoids polluting system- or user-wide directories, but also implies that when we eventually clean up this directory, we first have to remove the storage pool again in libvirt, as we would otherwise have a stale reference from the libvirt XML structures in /etc/libvirt to a non-existent directory.

Once we have a storage pool, we can create a volume. Unfortunately, Ansible does not (at least not at the time of writing this) have a module to handle libvirt volumes, so a bit of scripting is needed. Essentially, we will have to

  • Download a base image if not done yet – we can use the get_url module for that purpose
  • Check whether our volume already exists (this is needed to make sure that our playbook is idempotent)
  • If not, create a new overlay volume using our base image as backing store

We could of course upload the base image into the libvirt pool, or, alternatively, keep the base image outside of libvirt's control and refer to it via its location on the local file system.
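
To make the volume step a bit more concrete, here is a sketch of how an overlay volume with a backing store can be created via the libvirt Python bindings – this is not the code from the playbook (which wraps the equivalent steps in Ansible tasks), and the pool name, volume name and path of the base image are made up for the example.

import libvirt

volume_xml = """
<volume>
  <name>test-volume</name>
  <capacity unit="G">20</capacity>
  <target>
    <format type="qcow2"/>
  </target>
  <backingStore>
    <path>/path/to/base-image.img</path>
    <format type="qcow2"/>
  </backingStore>
</volume>
"""

conn = libvirt.open("qemu:///system")
pool = conn.storagePoolLookupByName("test-pool")
# Create the volume only if it does not exist yet to keep the step idempotent
if "test-volume" not in pool.listVolumes():
    pool.createXML(volume_xml, 0)
conn.close()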

At this point, we have a volume and can start to create a network. Here, we can again use an Ansible module which has idempotency already built into it – virt_net. As before, we need to create an XML description for the network from a Jinja2 template and hand that over to the module to create our network. Once created, we can then refer to it by name and start it.

Finally, it is time to create a virtual machine. Here, the virt module comes in handy. Still, we need an XML description of our domain, and there are several ways to create that. We could of course build an XML file from scratch based on the specification on libvirt.org, or simply use the tools explored in the previous post to create a machine manually, dump the XML file and use it as a template. I strongly recommend doing both – dump the XML file of a virtual machine and then go through it line by line, with the libvirt documentation in a separate window, to see what it does.

When defining the virtual machine via the virt module, I was running into error messages that did not show up when using the same XML definition with virsh directly, so I decided to also use virsh for that purpose.

In case you want to try out the code I had at this point, follow the instructions above to install the required modules and then run (there is still an issue with this code, which we will fix in a minute)

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples
cd libvirt
git checkout origin/v0.1
ansible-playbook site.yaml

When you now run virsh list, you should see the newly created domain test-instance, and using virt-viewer or virt-manager, you should be able to see its screen and that it boots up (after waiting a few seconds because there is no device labeled as root device, which is not a problem – the boot process will still continue).

So now you can log into your machine…but wait a minute, the image that we use (the Ubuntu cloud image) has no root password set. No problem, we will use SSH. So figure out the IP address of your domain using virsh domifaddr test-instance…hmmm…no IP address. Something is still wrong with our machine – it is running, but there is no way to log into it. And even if we had an IP address, we could not SSH into it because there is no SSH key.

Apparently we are missing something, and in fact, things like getting a DHCP lease and creating an SSH key would be the job of cloud-init. So there is a problem with the cloud-init configuration of our machine, which we will now investigate and fix.

Using cloud-init

First, we have to diagnose our problem in a bit more detail. To do this, it would of course be beneficial if we could log into our machine. Fortunately, there is a great tool called virt-customize which is part of the libguestfs project and allows you to manipulate a virtual disk image. So let us shut down our machine, add a root password and restart the machine.

virsh destroy test-instance
sudo virt-customize \
  -a state/pool/test-volume \
  --root-password password:secret
virsh start test-instance

After some time, you should now be able to log into the instance as root using the password “secret”, either from the VNC console or via virsh console test-instance.

Inside the guest, the first thing we can try is to run dhclient to get an IP address. Once this completes, you should see that ens3 has an IP address assigned, so our network and the DHCP server works.

Next let us try to understand why no DHCP address was assigned during the boot process. Usually, this is done by cloud-init, which, in turn, is started by the systemd mechanism. Specifically, cloud-init uses a generator, i.e. a script (/lib/systemd/system-generators/cloud-init-generator) which then dynamically creates unit files and targets. This script (or rather a Jinja2 template used to create it when cloud-init is installed) can be found here.

Looking at this script and the logging output in our test instance (located at /run/cloud-init), we can now figure out what has happened. First, the script tries to determine whether cloud-init should be enabled. For that purpose, it uses the following sources of information.

  • The kernel command line as stored in /proc/cmdline, where it will search for a parameter cloud-init which is not set in our case
  • Marker files in /etc/cloud, which are also not there in our case
  • A default as fallback, which is set to “enabled”

So far so good, cloud-init has figured out that it should be enabled. Next, however, it checks for a data source, i.e. a source of meta-data and user-data. This is done by calling another script called ds-identify. It is instructive to re-run this script in our virtual machine with an increased log level and take a look at the output.

DEBUG=2 /usr/lib/cloud-init/ds-identify --force
cat /run/cloud-init/ds-identify.log

Here we see that the script tries several sources, each modeled as a function. An example is the check for EC2 metadata, which uses a combination of data like the DMI serial number or hypervisor IDs with “seed files” (which can be used to enforce the use of a data source) to check whether we are running on EC2.

For setups like ours, which are not part of a cloud environment, there is a special data source called the NoCloud data source. To force cloud-init to use this data source, there are several options.

  • Add the switch “ds=nocloud” to the kernel command line
  • Fake a DMI product serial number containing “ds=nocloud”
  • Add a special marker directory (“seed directory”) to the image
  • Attach a disk that contains a file system with label “cidata” or “CIDATA” to the machine

So the first step to use cloud-config would be to actually enable it. We will choose the approach to add a correctly labeled disk image to our machine. To create such a disk, there is a nice tool called cloud-localds, but we will do this manually to better understand how it works.

First, we need to define the cloud-config data that we want to feed to cloud-init by putting it onto the disk. In general, cloud-init expects two pieces of data: meta data, providing information on the instance, and user data, which contains the actual cloud-init configuration.

For the metadata, we use a file containing only a bare minimum.

$ cat state/meta-data
instance-id: iid-test-instance
local-hostname: test-instance

For the user data, we use the example from the cloud-init documentation as a starting point, which will simply set up a password for the default user.

$ cat state/user-data
#cloud-config
password: password
chpasswd: { expire: False }
ssh_pwauth: True

Now we use the genisoimage utility to create an ISO image from this data. Here is the command that we need to run.

genisoimage \
  -o state/cloud-config.iso \
  -joliet -rock \
  -volid cidata \
  state/user-data \
  state/meta-data 

Now open the virt-manager, shut down the machine, navigate to the machine details, and use “Add hardware” to add our new ISO image as a CDROM device. When you now restart the machine, you should be able to log in as user “ubuntu” with the password “password”.

When playing with this, there is an interesting pitfall. The cloud-init tool actually caches data on disk, and will only process a changed configuration if the instance ID changes. So if cloud-init runs once, and you then make a change, you need to delete the root volume as well to make sure that this cache is not used – otherwise you will run into interesting issues during testing.

Similar mechanisms need to be kept in mind for the network configuration. When running for the first time, cloud-init will create a network configuration in /etc/netplan. This configuration makes sure that the network is correctly reconfigured when we reboot the machine, but contains the MAC address of the virtual network interface. So if you destroy and undefine the domain while leaving the disk untouched, and create a new domain (with a new MAC address) using the same disk, then the network configuration will fail because the MAC address no longer matches.

Let us now automate the entire process using Ansible and add some additional configuration data. First, having a password is of course not the most secure approach. Alternatively, we can use the ssh_authorized_keys key in the cloud-config file which accepts a public SSH key (as a string) which will be added as an authorized key to the default user.

Next, the way we have managed our ISO image is not yet ideal. In fact, when we attach the image to a virtual machine, libvirt will assume ownership of the file and later change its owner and group to “root”, so that an ordinary user cannot easily delete it. To fix this, we can create the ISO image as a volume under the control of libvirt (i.e. in our volume pool) so that we can remove or update it using virsh.

To see all this in action, clean up by removing the virtual machine and the generated ISO file in the state directory (which should work as long as your user is in the kvm group), get back to the directory into which you cloned my repository and run

cd libvirt
git checkout master
ansible-playbook site.yaml
# Wait for machine to get an IP
ip=""
while [ "$ip" == "" ]; do
  sleep 5
  echo "Waiting for IP..."
  ip=$(virsh domifaddr \
    test-instance \
    | grep "ipv4" \
    | awk '{print $4}' \
    | sed "s/\/24//")
done
echo "Got IP $ip, waiting another 5 seconds"
sleep 5
ssh -i ./state/ssh-key ubuntu@$ip

This will run the playbook which configures and brings up the machine, wait until the machine has acquired an IP address and then SSH into it using the ubuntu user with the generated SSH key.

The playbook will also create two scripts and place them in the state directory. The first script – ssh.sh – will exactly do the same as our snippet above, i.e. figure out the IP address and SSH into the machine. The second script – cleanup.sh – removes all created objects, except the downloaded Ubuntu cloud image so that you can start over with a fresh install.

This completes our post on using Ansible with libvirt. Of course, this does not replace a more sophisticated tool like Vagrant, but for simple setups, it works nicely and avoids the additional dependency on Vagrant. I hope you had some fun – keep on automating!

Understanding cloud-init

For a recent project using Ansible to define and run KVM virtual machines, I had to debug an issue with cloud-init. This was a trigger for me to do a bit of research on how cloud-init operates, and I decided to share my findings in this post. Note that this post is not an instruction on how to use cloud-init or how to prepare configuration data, but an introduction into the structure and inner workings of cloud-init which should put you in a position to understand most of the available official documentation.

Overview

Before getting into the details of how cloud-init operates, let us try to understand the problem it is designed to solve. Suppose you are running virtual machines in a cloud environment, booting off a standardized image. In most cases, you will want to apply some instance-specific configuration to your machine, which could be creating users, adding SSH keys, installing packages and so forth. You could of course try to bake this configuration into the image, but this easily leads to a large number of different images you need to maintain.

A more efficient approach would be to write a little script which runs at boot time and pulls instance-specific configuration data from an external source, for instance a little web server or a repository. This data could then contain things like SSH keys or lists of packages to install or even arbitrary scripts that get executed. Most cloud environments have a built-in mechanism to make such data available, either by running an HTTP GET request against a well-known IP or by mapping data as a drive (to dig deeper, you might want to take a look at my recent post on meta-data in OpenStack as an example of how this works). So our script would need to figure out in which cloud environment it is running, get the data from the specific source provided by that environment and then trigger processing based on the retrieved data.

This is exactly what cloud-init is doing. Roughly speaking, cloud-init consists of the following components.

(Figure: overview of the cloud-init components)

First, there are data sources that contain instance-specific configuration data, like SSH keys, users to be created, scripts to be executed and so forth. Cloud-init comes with data sources for all major cloud platforms, and a special “no-cloud” data source for standalone environments. Typically, data sources provide four different types of data, which are handled as YAML structures, i.e. essentially nested dictionaries, by cloud-init.

  • User data which is defined by the user of the platform, i.e. the person creating a virtual machine
  • Vendor data which is provided by the organization running the cloud platform. Actually, cloud-init will merge user data and vendor data before running any modules, so that user data can overwrite vendor data
  • Network configuration which is applied at an early stage to set up networking
  • Meta data which is specific to an instance, like a unique ID of the instance (used by cloud-init to figure out whether it runs the first time after a machine has been created) or a hostname

Cloud-init is able to process different types of data in different formats, like YAML formatted data, shell scripts or even Jinja2 templates. It is also possible to mix data by providing a MIME multipart message to cloud-init.

This data is then processed by several functional blocks in cloud-init. First, cloud-init has some built-in functionality to set up networking, i.e. assigning IP addresses and bringing up interfaces. This runs very early, so that additional data can be pulled in over the network. Then, there are handlers which are code snippets (either built into cloud-init or provided by a user) which run when a certain format of configuration data is detected – more on this below. And finally, there are modules which provide most of the cloud-init functionality and obtain the result of reading from the data sources as input. Examples for cloud-init modules are

  • Bootcmd to run a command during the boot process
  • Growpart which will resize partitions to fit the actual available disk space
  • SSH to configure host keys and authorized keys
  • Users and groups to create or modify users and groups
  • Phone home to make an HTTP POST request to a defined URL, which can for instance be used to inform a config management system that the machine is up

These modules are not executed sequentially, but in four stages. These stages are linked into the boot process at different points, so that a module that, for instance, requires fully mounted file systems and networking can run at a later point than a module that performs an action very early in the boot process. The modules that will be invoked and the stages at which they are invoked are defined in a static configuration file /etc/cloud/cloud.cfg within the virtual machine image.

Having discussed the high-level structure of cloud-init, let us now dig into each of these components in a bit more detail.

The startup procedure

The first thing we have to understand is how cloud-init is integrated into the system startup procedure. Of course this is done using systemd, however there is a little twist. As cloud-init needs to be able to enable and disable itself at runtime, it does not use a static systemd unit file, but a generator, i.e. a little script which runs at an early boot stage and creates systemd units and targets. This script (which you can find here as a Jinja2 template which is evaluated when cloud-init is installed) goes through a couple of checks to see whether cloud-init should run and whether there are any valid data sources. If it finds that cloud-init should be executed, it adds a symlink to make the cloud-init target a precondition for the multi-user target.

When this has happened, the actual cloud-init execution takes place in several stages. Each of these stages corresponds to a systemd unit, and each invokes the same executable (/usr/bin/cloud-init), but with different arguments.

Unit                Invocation
cloud-init-local    /usr/bin/cloud-init init --local
cloud-init          /usr/bin/cloud-init init
cloud-config        /usr/bin/cloud-init modules --mode=config
cloud-final         /usr/bin/cloud-init modules --mode=final

The purpose of the first stage is to evaluate local data sources and to prepare the networking, so that modules running in later stages can assume a working networking configuration. The second, third and fourth stage then run specific modules, as configured in /etc/cloud/cloud.cfg, at several points in the startup process (see this link for a description of the various networking targets used by systemd).

The cloud-init executable which is invoked here is in fact a Python script, which runs the entry point “cloud-init” that resolves (via the usual entry point mechanism in setup.py) to this function. From here, we then branch into several other functions (main_init for the first and second stage, main_modules for the third and fourth stage) which do the actual work.

The exact processing steps during each stage depend a bit on the configuration and data source as well as on previous executions due to caching. The following diagram shows a typical execution using the no-cloud data source in four stages and indicates the respective processing steps.

(Figure: processing steps during the four cloud-init stages when using the no-cloud data source)

For other data sources, the processing can be different. On EC2, for instance, the EC2 datasource will itself set up a minimal network configuration to be able to retrieve metadata from the EC2 metadata service.

Data sources

One of the steps that takes place in the functions discussed above is to locate a suitable data source that cloud-init uses to obtain user data, meta data, and potentially vendor data and networking configuration data. A data source is an object providing this data, for instance the EC2 metadata service when running on AWS. Data sources have dependencies on either a file system or a network or both, and these dependencies determine at which stages a data source is available. All existing data sources are kept in the directory cloudinit/sources, and the function find_sources in this package is used to identify all data sources that can be used at a certain stage. These data sources are then probed by invoking their update_metadata method. Note that data sources typically maintain a cache to avoid having to re-read the data several times (the caches are in /run/cloud-init and in /var/lib/cloud/instance/).

The way the actual data retrieval is done (encoded in the data-source specific method _get_data) is of course dependent on the data source. The “NoCloud” data source, for instance, parses DMI data, the kernel command line, seed directories and information from file systems with a specific label, while the EC2 data source makes an HTTP GET request to 169.254.169.254.

Network setup

One of the key activities that takes place in the first cloud-init stage is to bring up the network via the method apply_network_config of the Init class. Here, a network configuration can come from various sources, including – depending on the type of data source – an identified data source. Other possible sources are the kernel command line, the initram file system, or a system-wide configuration. If no network configuration could be found, a distribution-specific fallback configuration is applied which, for most distributions, is defined here. An example of such a fallback configuration on an Ubuntu guest looks as follows.

{
  'ethernets': {
     'ens3': {
       'dhcp4': True, 
       'set-name': 'ens3', 
       'match': {
         'macaddress': '52:54:00:28:34:8f'
       }
     }
   }, 
  'version': 2
}

Once a network configuration has been determined, a distribution specific method apply_network_config is invoked to actually apply the configuration. On Ubuntu, for instance, the network configuration is translated into a netplan configuration and stored at /etc/netplan/50-cloud-init.yaml. Note that this only happens if either we are in the first boot cycle for an instance or the data source has changed. This implies, for instance, that if you attach a disk to a different instance with a different MAC address (which is persisted in the netplan configuration), the network setup in the new instance might fail because the instance ID is cached on disk and not refreshed, so that the netplan configuration is not recreated and the stale MAC address is not updated. This is only one example for the subtleties that can be caused by cached instance-specific data – thus if you ever clone a disk, make sure to use a tool like virt-sysprep to remove this sort of data.

Handlers and modules

User data and meta data for cloud-init can be in several different formats, which can actually be mixed in one file or data source. User data can, for instance, be in cloud-config format (starting with #cloud-config), or a shell script which will simply be executed at startup (starting with #!) or even a Jinja2 template. Some of these formats require pre-processing, and doing this is the job of a handler.

To understand handlers, it is useful to take a closer look at how the user data is actually retrieved from a data source and processed. We have mentioned above that in the first cloud-init stage, different data sources are probed by invoking their update_metadata method. When this happens for the first time, a data source will actually pull the data and cache it.

In the second processing stage, the method consume_data of the Init object will be invoked. This method will retrieve user data and vendor data from the data source and invoke all handlers. Each handler is called several times – once initially, once for every part of the user data and once at the end. A typical example is the handler for the cloud-config format, which (in its handle_part method) will, when called first, reset its state, then, during the intermediate calls, merge the received data into one big configuration and, in the final call, write this resulting configuration to a file for later use.
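
To illustrate what a handler looks like, here is a sketch of a custom part handler following the structure documented by cloud-init – the MIME type is made up for this example. Such a handler can be shipped as part of the user data itself, starting with #part-handler.

#part-handler

def list_types():
    # Return the MIME types that this handler is willing to process
    return ["text/x-my-custom-config"]

def handle_part(data, ctype, filename, payload):
    # Called once with ctype __begin__, then once for every matching part,
    # and finally once with ctype __end__
    if ctype == "__begin__":
        print("Starting to process custom parts")
        return
    if ctype == "__end__":
        print("Done processing custom parts")
        return
    print("Received part %s with payload %s" % (filename, payload))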

Once all handlers are executed, it is time to run all modules. As already mentioned above, modules are executed in one of the defined stages. However, modules are also executed with a specified frequency. This mechanism can be used to make sure that a module executes only once, or until the instance ID changes, or during every boot process. To keep track of the module execution, cloud-init stores special files called semaphore files in the directory /var/lib/cloud/instance/sem and (for modules that should execute independently of an instance) in /var/lib/cloud. When cloud-init runs, it retrieves the instance-ID from the metadata and creates or updates a link from /var/lib/cloud/instance to a directory specific for this instance, to be able to track execution per instance even if a disk is re-mounted to a different instance.

Technically, a module is a Python module in the cloudinit.config package. Running a module simply means that cloud-init is going to invoke the function handle in the corresponding module. A nice example is the scripts_user module. Recall that the script handler discussed above will extract scripts (parts starting with #!) from the user data and place them as executable files in a directory on the file system. The scripts_user module will simply go through this directory and run all scripts located there.
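
As a rough sketch of what such a module looks like – the exact signature of handle has changed between cloud-init versions, so treat this as an illustration rather than a template – a minimal, hypothetical config module could be written as follows.

# A hypothetical module cloudinit/config/cc_hello.py; cloud-init invokes
# handle() when the module is executed in its configured stage
def handle(name, cfg, cloud, log, args):
    # cfg is the configuration merged from user data and vendor data
    message = cfg.get("hello", {}).get("message", "Hello from cloud-init")
    log.info("Module %s says: %s", name, message)
    with open("/var/tmp/hello-module.txt", "w") as f:
        f.write(message + "\n")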

Other modules are more complex, and typically perform an action depending on a certain set of keys in the configuration merged from user data and vendor data. The SSH module, for instance, looks for keys like ssh_deletekeys (which, if present, causes the deletion of existing host keys), ssh_keys to define keys which will be used as host keys and ssh_authorized_keys which contains the keys to be used as authorized keys for the default user. In addition, if the meta data contains a key public-keys containing a list of SSH keys, these keys will be set up for the default user as well – this is the mechanism that AWS EC2 uses to pull the SSH keys defined during machine creation into the instance.

Debugging cloud-init

When you take a short look at the source code of cloud-init, you will probably be surprised by the complexity to which this actually quite powerful toolset has grown. As a downside, the behaviour can sometimes be unexpected, and you need to find ways to debug cloud-init.

If cloud-init fails completely, the first thing you need to do is to find an alternative way to log into your machine, either using a graphical console or an out-of-band mechanism, depending on what your cloud platform offers (or you might want to use a local testbed based on e.g. KVM, using an ISO image to provide user data and meta data – you might want to take a look at the Ansible scripts that I use for that purpose).

Once you have access to the machine, the first thing is to use systemctl status to see whether the various services (see above) behind cloud-init have been executed, and journalctl --unit=… to get their output. Next, you can take a look at the log file and state files written by cloud-init. Here are a few files that you might want to check.

  • /var/log/cloud-init.log – the main log file
  • /var/log/cloud-init-output.log – contains additional output, like the network configuration or the output of ssh-keygen
  • The various state files in /var/lib/cloud/instance and /run/cloud-init/ which contain semaphores, downloaded user data and vendor data and instance metadata.

Another very useful option is the debug module. This is a module (not enabled by default) which will simply print out the merged configuration, i.e. meta data, user data and vendor data. As other modules, this module is configured by supplying a YAML structure. To try this out, simply add the following lines to /etc/cloud/cloud.cfg.

debug:
   output: /var/log/cloud-init-debug.log
   verbose: true

This will instruct the module to print verbose output into the file /var/log/cloud-init-debug.log. As the module is not enabled by default, we either have to add it to the module lists in /etc/cloud/cloud.cfg and reboot or – much easier – use the single switch of the cloud-init executable to run only this module.

cloud-init single \
  --name debug \
  --frequency always

When you now look at the contents of the newly created file /var/log/cloud-init-debug.log, you will see the configuration that the cloud-init modules have actually received, after all merges are complete. With the same single switch, you can also re-run other modules (as in the example above, you might need to overwrite the frequency to enforce the execution of the module).

And of course, if everything else fails on you – cloud-init is written in Python, so the source code is there – typically in /usr/lib/python3/dist-packages. So you can modify the source code and add debugging statements as needed, and then use cloud-init single to re-run specific modules. Happy hacking!

Understanding TLS certificates with Ansible and NGINX – part II

In the first part of this short series, we have seen how Ansible can be used to easily generate self-signed certificates. Today, we will turn to more complicated set-ups and learn how to act as a CA, build chains of certificates and create client-certificates.

Creating CA and intermediate CA certificates

Having looked at the creation of a single, self-signed certificate for which issuer and subject are identical, let us now turn to a more realistic situation – using one certificate, a CA certificate, to sign another certificate. If this second certificate is directly used to authorize an entity, for instance by being deployed into a web server, it is usually called an end-entity certificate. If, however, this certificate is used in turn to sign a third certificate, it is called an intermediate CA certificate.

In the first post, we have looked at the example of the certificate presented by github.com, which is signed by a certificate with the CN “DigiCert SHA2 Extended Validation Server CA” (more precisely, of course, by the private key associated with the public key verified by this certificate), which in turn is issued by “DigiCert High Assurance EV Root CA”, the root CA. Here, the second certificate is the intermediate CA certificate, and the certificate presented by github.com is the end-entity certificate.

Let us now try to create a similar chain in Ansible. First, we need a root CA. This will again be a self-signed certificate (which is the case for all root CA certificates). In addition, root CA certificates typically contain a set of extensions. To understand these extensions, the easiest approach is to look at a few examples. You can either use openssl x509 to inspect some of the root certificates that come with your operating system, or use your browser's certificate management tab to look at some of the certificates there. Doing this, you will find that root CA certificates typically contain three extensions as specified by X509v3, which are also defined in RFC 3280.

  • Basic Constraints: CA: True – this marks the certificate as a CA certificate
  • Key Usage: Digital Signature, Certificate Sign, CRL Sign – this entitles the certificate to be used to sign other certificates, perform digital signatures and sign CRLs (certificate revocation lists)
  • Subject Key Identifier: this is an extension which needs to be present for a CA according to RFC 3280 and allows the use of a hash of the public key to easily identify certificates for a specific public key

All these requirements can easily be met using our Ansible modules. We essentially proceed as in the previous post and use the openssl_csr module to create a CSR, from which we then generate a certificate using the openssl_certificate module. The full playbook (also containing the code for the following sections) can be found here. A few points are worth being noted.

  • When creating the CSR, we need to add the fields key_usage and key_usage_critical to the parameters of the Ansible module. The same holds for basic_constraints and basic_constraints_critical
  • The module will by default put the common name into the subject alternative name extension (SAN). To turn this off, we need to set use_common_name_for_san to false.
  • When creating the certificate using openssl_certificate, we need the flag selfsigned_create_subject_key_identifier to instruct the module to add a subject key identifier extension to the certificate. This feature is only available since Ansible version 2.9. So in case you have an older version, you need to use pip3 install ansible to upgrade to the latest version (you might want to run this in a virtual environment)
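
The Ansible modules translate these parameters into the corresponding X509v3 extensions for us. If you would like to see the three extensions spelled out in code, here is a sketch using the Python cryptography package – this is not part of the playbook and only meant to illustrate the extensions discussed above.

import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "My Root CA")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)   # self-signed: issuer and subject are identical
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.datetime.utcnow())
    .not_valid_after(datetime.datetime.utcnow() + datetime.timedelta(days=3650))
    # Basic Constraints: CA: TRUE
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    # Key Usage: Digital Signature, Certificate Sign, CRL Sign
    .add_extension(
        x509.KeyUsage(
            digital_signature=True, content_commitment=False,
            key_encipherment=False, data_encipherment=False,
            key_agreement=False, key_cert_sign=True, crl_sign=True,
            encipher_only=False, decipher_only=False,
        ),
        critical=True,
    )
    # Subject Key Identifier derived from the public key
    .add_extension(
        x509.SubjectKeyIdentifier.from_public_key(key.public_key()), critical=False
    )
    .sign(key, hashes.SHA256())
)
print(cert.public_bytes(serialization.Encoding.PEM).decode())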

Having this CA in place, we can now repeat the procedure to create an intermediate CA certificate. This will again be a CA certificate, with the difference that its issuer will be the root certificate that we have just created. So we no longer use the selfsigned provider when calling the Ansible openssl_certificate module, but the ownca provider. This requires a few additional parameters, most notably of course the root CA certificate and the private key of the root CA. So the corresponding task in the playbook will look like this.

- name: Create certificate for intermediate CA
  openssl_certificate:
    csr_path: "{{playbook_dir}}/intermediate-ca.csr"
    path: "{{playbook_dir}}/etc/certs/intermediate-ca.crt"
    provider: ownca
    ownca_path: "{{playbook_dir}}/etc/certs/ca.crt"
    ownca_create_subject_key_identifier: always_create
    ownca_privatekey_path: "{{playbook_dir}}/etc/certs/ca.rsa" 

When creating the CSR, we also modify the basic constraints field a bit and add the second key/value-pair pathlen:0. This specifies that the resulting certificate cannot be used to create any additional CA certificates, but only to create the final, end-entity certificate.

This is what we will do next. The code for this is more or less the same as that for creating the intermediate CA, but this time, we use the intermediate CA instead of the root CA for signing and we also change the extensions again to create a classical service certificate.

Let us now put all this together and verify that our setup works. To create all certificates, enter the following commands.

git clone https://github.com/christianb93/tls-certificates
cd tls-certificates/lab2
ansible-playbook site.yaml

When the script completes, you should see a couple of certificates created in etc/certs. We can use OpenSSL to inspect them.

for cert in server.crt intermediate-ca.crt ca.crt; do
  openssl x509 -in etc/certs/$cert -noout -text
done

This should display all three certificates in the order listed. Looking at the common names and e-mail addresses (all other attributes of the distinguished name are identical), you should now nicely see that these certificates really form a chain, with the issuer of one element in the chain being the subject of the next one, up to the last one, which is self-signed.

Now let us see how we need to configure NGINX to use our new server certificate when establishing a TLS connection. At first glance, you might think that we simply replace the server certificate from the last lab with our new one. But there is an additional twist. A client will typically have a copy of the root CA, but it is not guaranteed that a client will have a copy of the intermediate CA as well. Therefore, instead of using just the server certificate, we point NGINX to a file server-chain.crt which contains both the server certificate and the intermediate CA certificate, in this order. So run

cp etc/certs/server.crt etc/certs/server-chain.crt
cat etc/certs/intermediate-ca.crt >> etc/certs/server-chain.crt
docker run -d --rm \
       -p 443:443 \
       -v $(pwd)/etc/conf.d:/etc/nginx/conf.d \
       -v $(pwd)/etc/certs:/etc/nginx/certs \
       nginx

Once the NGINX server is running, we should now be able to build a connection for testing using OpenSSL. As the certificates that the server presents are not self-signed, we also need to tell OpenSSL where the root CA needed to verify the chain of certificates is stored.

openssl s_client \
  --connect localhost:443 \
  -CAfile etc/certs/ca.crt
GET /index.html HTTP/1.0

You should again see the NGINX welcome page. It is also instructive to look at the output that OpenSSL produces and which, right at the beginning, also contains a representation of the certificate chain as received and verified by OpenSSL.

Creating and using client certificates

So far, our certificates have been server certificates – a certificate presented by a server to prove that the public key that the server presents us is actually owned by the entity operating the server. Very often, for instance when securing REST APIs like that of Kubernetes, however, the TLS protocol is used to also authenticate a user.

Let us take the Kubernetes API as an example. The Kubernetes API is a REST API using HTTPS and listening (by default) on port 6443. When a user connects to this URL, a server certificate is used so that the user can verify that the server is really owned by whoever provides the cluster. When a user makes a request to the API server, then, in addition to that, the server would also like to know that the user is a trusted user, and will have to authenticate the user, i.e. associate a certain identity with the request.

For that purpose, Kubernetes can be configured to ask the user for a client certificate during the TLS handshake. The server will then try to verify this certificate against a configured CA certificate. If that verification is successful, i.e. if the server can build a chain of certificates from the certificate that the client presents – the so-called client certificate – then the server will extract the common name and the organization from that certificate and use it as user and group to process the API request.

Let us now see how these client certificates can be created. First, of course, we need to understand what properties of a certificate turn it into a client certificate. Finding a proper definition of the term “client certificate” is not as straightforward as you might expect. There are several recommendations describing a reasonable set of extensions for client certificates (RFC 3279, RFC 5246 and the man page of the OpenSSL x509 tool). Combining these recommendations, we use the following set of extensions:

  • keyUsage is present and contains the bits digitalSignature and keyEncipherment
  • extendedKeyUsage is present and contains the clientAuth key

The Ansible code to generate this certificate is almost identical to the code in the previous section, with the differences due to the different extensions that we request. Thus we again create a self-signed root CA certificate, use this certificate to sign a certificate for an intermediate CA, and then use the intermediate CA certificate to issue certificates for client and server.

We also have to adjust our NGINX setup by adding the following two lines to the configuration of the virtual server.

ssl_verify_client       on;
ssl_client_certificate  /etc/nginx/certs/ca.crt;

With the first line, we instruct NGINX to ask a client for a TLS certificate during the handshake. With the second line, we specify the CA that NGINX will use to verify these client certificates. In fact, as you will see immediately when running our example, the server will even tell the client which CAs it will accept as issuers – this is part of the certificate request message specified in the TLS standard.

Time to see all this in action again. To download, run and test the playbook enter the following commands (do not forget to stop the container created in the previous section).

git clone https://github.com/christianb93/tls-certificates
cd tls-certificates/lab3
ansible-playbook site.yaml
openssl s_client \
  --connect localhost:443 \
  -CAfile etc/certs/ca.crt \
  -cert etc/certs/client.crt \
  -cert_chain etc/certs/intermediate-ca.crt \
  -key etc/certs/client.rsa
GET /index.html HTTP/1.0


Note the additional switches to the OpenSSL client command. With the -cert switch, we tell OpenSSL to submit a client certificate when requested and point it to the file containing this certificate. With the -cert_chain parameter, we specify additional certificates (if any) that the client will send in order to complete the certificate chain between the client certificate and the root certificate. In our case, this is the intermediate CA certificate (this would not be needed if we had used the intermediate CA certificate in the server configuration). Finally, the last switch -key contains the location of the private RSA key matching the presented certificate.

This closes our post (and the two-part mini series) on TLS certificates. We have seen that Ansible can be used to automate the generation of self-signed certificates and to build entire chains-of-trust involving end-entity certificates, intermediate CAs and private root CAs. Of course, you could also reach out to a provider to do this for you, but that is (maybe) a topic for another post.

Understanding TLS certificates with NGINX and Ansible – part I

If you read technical posts like this one, chances are that you have already had some exposure to TLS certificates, for instance because you have deployed a service that uses TLS and needed to create and deploy certificates for the servers and potentially for clients. Dealing with certificates can be a challenge, and a sound understanding of what certificates actually do is more than helpful for this. In this and the next post, we will play with NGINX and Ansible to learn what certificates are, how they are generated and how they are used.

What is a certificate?

To understand the structure of a certificate, let us first try to understand the problem that certificates try to solve. Suppose you are communicating with some other party over an encrypted channel, using some type of asymmetric cryptosystem like RSA. To send an encrypted message to your peer, you will need the peer's public key as a prerequisite. Obviously, you could simply ask the peer to send you the public key before establishing a connection, but then you need to mitigate the risk that someone uses a technique like IP address spoofing to pretend to be the peer you want to connect with, and is sending you a fake public key. Thus you need a way to verify that the public key that is presented to you is actually the public key owned by the party to which you want to establish a connection.

One approach could be to establish a third, trusted and publicly known party and ask that trusted party to digitally sign the public key, using a digital signature algorithm like ECDSA. With that party in place, your peer would then present you the signed public key, you would retrieve the public key of the trusted party, use that key to verify the signature and proceed if this verification is successful.

[Diagram: Certificates I]

So what your peer will present you when you establish a secure connection is a signed public key – and this is, in essence, what a certificate really is. More precisely, a certificate according to the X509 v3 standard consists of the following components (see also RFC 5280).

  • A version number which refers to the version of the X509 specification; currently, version 3 is what is mostly used
  • A serial number which the third party (called the issuer) assigns to the certificate
  • A valid-from and a valid-to date
  • The public key that the certificate is supposed to certify, along with some information on the underlying algorithm, for instance RSA
  • The subject, i.e. the party owning the key
  • The issuer, i.e. the party – also called certificate authority (CA) – signing the certificate
  • Some extensions which are additional, optional pieces of data that a certificate can contain – more on this later
  • And finally, a digital signature signing all the data described above

Let us take a look at an example. Here is a certificate from github.com that I have extracted using OpenSSL (we will learn how to do this later), from which I have removed some details and added some line breaks to make the output a bit more readable.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0a:06:30:42:7f:5b:bc:ed:69:57:39:65:93:b6:45:1f
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = DigiCert Inc, OU = www.digicert.com, 
                CN = DigiCert SHA2 Extended Validation Server CA
        Validity
            Not Before: May  8 00:00:00 2018 GMT
            Not After : Jun  3 12:00:00 2020 GMT
        Subject: businessCategory = Private Organization, 
                jurisdictionC = US, 
                jurisdictionST = Delaware, 
                serialNumber = 5157550, 
                C = US, ST = California, L = San Francisco, 
                O = "GitHub, Inc.", CN = github.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    SNIP --- SNIP
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            SNIP --- SNIP
    Signature Algorithm: sha256WithRSAEncryption
         70:0f:5a:96:a7:58:e5:bf:8a:9d:a8:27:98:2b:00:7f:26:a9:
         SNIP ----- SNIP
         af:ed:7a:29

We clearly recognize the components just discussed. At the top, there are the version number and the serial number (in hex). Then we see the signature algorithm and, at the bottom, the signature, the issuer (DigiCert), the validity, the subject (GitHub Inc.) and, last but not least, the full public key. Note that both issuer and subject are identified using distinguished names as you might know them from LDAP and similar directory services.

If we now wanted to verify this certificate, we would need to get the public key of the issuer, DigiCert. Of course, this is a bit of a chicken-and-egg problem, as we would need another certificate to verify the authenticity of this key as well. So we would need a certificate with subject DigiCert, signed by some other party, and then another certificate signed by yet another certificate authority, and so forth. This chain obviously has to end somewhere, and it does – the last certificate in such a chain (the root CA certificate) is typically a self-signed certificate. These are certificates for which issuer and subject are identical, i.e. certificates for which no further verification is possible and which we simply have to trust.

How, then, do we obtain these root certificates? The answer is that root certificates are either distributed inside an organization or are bundled with operating systems and browsers. In our example, the DigiCert certificate that we see here is itself signed by another DigiCert unit called “DigiCert High Assurance EV Root CA”, and a certificate for this CA is part of the Ubuntu distribution that I use and stored in /etc/ssl/certs/DigiCert_High_Assurance_EV_Root_CA.pem which is a self-signed root certificate.
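If you are curious, you can use OpenSSL to convince yourself that issuer and subject of this root certificate are indeed identical. Here is a short example using the path from my Ubuntu system mentioned above; the location might differ on your distribution.

openssl x509 \
  -in /etc/ssl/certs/DigiCert_High_Assurance_EV_Root_CA.pem \
  -noout \
  -subject \
  -issuer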

[Diagram: Certificates II]

In this situation, the last element of the chain is called the root CA, the first element the end-entity and any element in between an intermediate CA.

To obtain a certificate, the owner of the server github.com would turn to the intermediate CA and submit a file, a so-called certificate signing request (CSR), containing the public key to be signed. The format for CSRs is standardized in RFC 2986 which, among other things, specifies that a CSR is itself signed with the private key of the requestor; this also proves to the intermediate CA that the requestor possesses the private key corresponding to the public key to be signed. The intermediate CA will then issue a certificate. To establish the intermediate CA, the intermediate CA has, at some point in the past, filed a similar CSR with the root CA and that root CA has issued a corresponding certificate to the intermediate CA.
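To make this a bit more concrete, here is a quick sketch of how a CSR could be created and inspected with OpenSSL. It assumes that you already have an RSA private key in a file called server.rsa (we will create such a key later in this post), and the subject example.com is of course just a placeholder.

# Create a CSR for the (made-up) common name example.com, signed with our private key
openssl req \
  -new \
  -key server.rsa \
  -subj "/CN=example.com" \
  -out server.csr
# Verify the signature on the CSR and display its content
openssl req \
  -in server.csr \
  -noout \
  -verify \
  -text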

The TLS handshake

Let us now see how certificates are applied in practice to secure a communication. Our example is the transport layer security protocol TLS, formerly known as SSL, which underlies the HTTPS protocol (which is nothing but HTTP sitting on top of TLS).

In a very basic scenario, a TLS communication roughly works as follows. First, the client sends a “hello” message to the server, containing information like the version of TLS supported and a list of supported ciphers. The server answers with a similar message, immediately followed by the server's certificate. This certificate contains the name of the server (either as a fully-qualified domain name, or including wildcards like *.domain.com in which case the certificate is called a wildcard certificate) and, of course, the public key of the server. Client and server can now use this key to agree on a secret key which is then used to encrypt the further communication. This phase of the protocol which prepares the actual encrypted connection is known as the TLS handshake.

To successfully conclude this handshake, the server therefore needs a certificate called the server certificate which it will present to the client and, of course, the matching private key, called the server private key. The client needs to verify the server certificate and therefore needs access to the certificate of the (intermediate or root) CA that signed the server certificate. This CA certificate is known as the server CA certificate. Instead of just presenting a single certificate, a server can also present an entire chain of certificates which must end with the server CA certificate that the client knows. In practice, these certificates are often the root certificates distributed with operating systems and browsers to which the client will have access.
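To see such a chain in the wild, you can already at this point use the s_client command of OpenSSL (which we will discuss in more detail below) to connect to a public server and print the certificates that it presents. Piping one of these certificates into openssl x509 -text would produce a dump like the one shown above for github.com.

# Connect to github.com and print the certificate chain presented by the server
openssl s_client \
  -connect github.com:443 \
  -showcerts < /dev/null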

Now suppose that you are a system administrator aiming to set up a TLS secured service, say an HTTPS-based reverse proxy with NGINX. How would you obtain the required certificates? First, of course, you would create a key pair for the server. Once you have that, you need to obtain a certificate for the public key. Basically, you have three options to obtain a valid certificate.

First, you could turn to an independent CA and ask the CA to issue a certificate, based on a CSR that you provide. Most professional CAs will charge for this. There are, however, a few providers like Let's Encrypt or Cloudflare that offer free certificates.

Alternatively, you could create your own, self-signed CA certificate using OpenSSL or Ansible; this is what we will do in this post. And finally, as we will see in the next post, you could even build your own “micro-CA” to issue intermediate CA certificates which you can then use to issue end-entity certificates within your organization.

Using NGINX with self-signed certificates

Let us now see how self-signed certificates can be created and used in practice. As an example, we will secure NGINX (running in a Docker container, of course) using self-signed certificates. We will first do this using OpenSSL and the command line, and then see how the entire process can be automated using Ansible.

The setup we are aiming at is NGINX acting as TLS server, i.e. we will ask NGINX to provide content via HTTPS which is based on TLS. We already know that in order to do this, the NGINX server will need an RSA key pair and a valid server certificate.

To create the key pair, we will use OpenSSL. OpenSSL is composed of a variety of different commands. The command that we will use first is the genrsa command that is responsible for creating RSA keys. The man page – available via man genrsa – is quite comprehensive, and we can easily figure out that we need the following command to create a 2048 bit RSA key, stored in the file server.rsa.

openssl genrsa \
  -out server.rsa \
  2048

As a side note, the created file does not only contain the private key, but also the public key components (i.e. the modulus and the public exponent), as you can see by using openssl rsa -in server.rsa -noout -text to dump the generated key.
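If you ever need the public key in a separate file, for instance to hand it to someone else, you can extract it from the same key file. Here is a short example.

# Extract the public key from the generated key file into server.pub
openssl rsa \
  -in server.rsa \
  -pubout \
  -out server.pub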

Now we need to create the server certificate. If we wanted to ask a CA to create a certificate for us, we would first create a CSR, and the CA would then create a matching certificate. When we use OpenSSL to create a self-signed certificate, we do this in one step – we use the req command of OpenSSL to create the CSR, and pass the additional switch -x509 which instructs OpenSSL to not create a CSR, but a self-signed certificate.

To be able to do this, OpenSSL will need a few pieces of information from us – the validity, the subject (which will also be the issuer), the public key to be signed, any extensions that we want to include and finally the output file name. Some of these options will be passed on the command line, but other options are usually kept in a configuration file.

OpenSSL configuration files are plain-text files in the INI format. There is one section for each command, and there can be additional sections which are then referenced in the command-specific section. In addition, there is a default section with settings which apply to all commands. Again, the man page (run man config for the general structure of the configuration file and man req for the part specific to the req command) is quite good and readable. Here is a minimal configuration file for our purposes.

[req]
prompt = no
distinguished_name = dn
x509_extensions = v3_ext

[dn]
CN = Leftasexercise
emailAddress = me@leftasexercise.com
O = Leftasexercise blog
L = Big city
C = DE

[v3_ext]
subjectAltName=DNS:*.leftasexercise.local,DNS:leftasexercise.local

We see that the file has three sections. The first section is specific to the req command. It contains a setting that instructs OpenSSL to not prompt us for information, and then two references to other sections. The first of these sections contains the distinguished name of the subject, and the second section contains the extensions that we want to include.

There are many different extensions that were introduced with version 3 of the X509 format, and this is not the right place to discuss all of them. The one that we use for now is the subject alternative name extension which allows us to specify a couple of alias names for the subject. Often, these are DNS names for servers for which the certificate should be valid, and browsers will typically check these DNS names and try to match them with the name of the server. As shown here, we can either use a fully-qualified domain name, or we can use a wildcard – these certificates are often called wildcard certificates (which are disputed as they give rise to security concerns, see for instance this discussion). This extension is typical for server certificates.

Let us assume that we have saved this configuration file as server.cnf in the current working directory. We can now invoke OpenSSL to actually create a certificate for us. Here is the command to do this and to print out the resulting certificate.

openssl req \
  -new \
  -config server.cnf \
  -x509 \
  -days 365 \
  -key server.rsa \
  -out server.crt
# Take a look at the certificate
openssl x509 \
  -text \
  -in server.crt -noout

If you scroll through the output, you will be able to identify all components of a certificate discussed so far. You will also find that the subject and the issuer of the certificate are identical, as we expect it from a self-signed certificate.
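In particular, you might want to double-check that the subject alternative names from our configuration file have made it into the certificate. With a reasonably recent version of OpenSSL (1.1.1 or later), this specific extension can be displayed as follows.

# Display only the subject alternative name extension of the certificate
openssl x509 \
  -in server.crt \
  -noout \
  -ext subjectAltName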

Let us now turn to the configuration of NGINX needed to serve HTTPS requests presenting our newly created certificate as server certificate. Recall that an NGINX configuration file contains a context called server which contains the configuration for a specific virtual server. To instruct NGINX to use TLS for this server, we need to add a few lines to this section. Here is a full configuration file containing these lines.

server {
    listen               443 ssl;
    ssl_certificate      /etc/nginx/certs/server.crt;
    ssl_certificate_key  /etc/nginx/certs/server.rsa;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}

In the line starting with listen, specifically the ssl keyword, we ask NGINX to use TLS for port 443, which is the default HTTPS port. In the next line, we tell NGINX which file it should use as a server certificate, presented to a client during the TLS handshake. And finally, in the third line, we point NGINX to the location of the key matching this certificate.

To try this out, let us bring up an NGINX container with that configuration. We will mount two directories into this container – one directory containing our certificates, and one directory containing the configuration file. So create the following directories in your current working directory.

mkdir ./etc
mkdir ./etc/conf.d
mkdir ./etc/certs

Then place a configuration file default.conf with the content shown above in ./etc/conf.d and the server certificate and server private key that we have created in the directory ./etc/certs. Now we start the container and map these directories into the container.

docker run -d --rm \
       -p 443:443 \
       -v $(pwd)/etc/conf.d:/etc/nginx/conf.d \
       -v $(pwd)/etc/certs:/etc/nginx/certs \
       nginx

Note that we map port 443 inside the container to the same port number on the host, so this will only work if you do not yet have a server running on this port; if you do, simply pick a different host port. Once the container is up, we can test our connection using the s_client command of the OpenSSL package.

openssl s_client --connect 127.0.0.1:443

This will produce a lengthy output that details the TLS handshake protocol and will then stop. Now enter an HTTP GET request like

GET /index.html HTTP/1.0

The HTML code for the standard NGINX welcome page should now be printed, demonstrating that the setup works.
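Alternatively, you can run a quick test with curl. As we are using a self-signed certificate that curl cannot verify, we pass the --insecure flag to skip certificate verification for this test.

curl --insecure https://localhost/index.html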

When you go through the output produced by OpenSSL, you will see that the client displays the full certificate chain from the certificate presented by the server up to the root CA. In our case, this chain has only one element, as we are using a self-signed certificate (which the client detects and reports as error – we will see how to get rid of this in the next post).

Automating certificate generation with Ansible

So far, we have created keys and certificates manually. Let us now see how this can be automated using Ansible. Fortunately, Ansible comes with modules to manage TLS certificates.

The first module that we will need is the openssl_csr module. With this module, we will create a CSR which we will then, in a second step, present to the module openssl_certificate to perform the actual signing process. A third module, openssl_privatekey, will be used to create a key pair.

Let us start with the key generation. Here, the only parameters that we need are the length of the key (we again use 2048 bits) and the path to the location of the generated key. The algorithm will be RSA, which is the default, and the key file will by default be created with the permissions 0600, i.e. only readable and writable by the owner.

- name: Create key pair for the server
  openssl_privatekey:
    path: "{{playbook_dir}}/etc/certs/server.rsa"
    size: 2048

Next, we create the certificate signing request. To use the openssl_csr module to do this, we need to specify the following parameters:

  • The components of the distinguished name of the subject, i.e. common name, organization, locality, e-mail address and country
  • Again the path of the file into which the generated CSR will be written
  • The parameters for the requested subject alternative name extension
  • And, of course, the path to the private key used to sign the request

- name: Create certificate signing request
  openssl_csr:
    common_name: "Leftasexercise"
    country_name: "DE"
    email_address: "me@leftasexercise.com"
    locality_name: "Big city"
    organization_name: "Leftasexercise blog"
    path: "{{playbook_dir}}/server.csr"
    subject_alt_name: 
      - "DNS:*.leftasexercise.local"
      - "DNS:leftasexercise.local"
    privatekey_path: "{{playbook_dir}}/etc/certs/server.rsa"

Finally, we can now invoke the openssl_certificate module to create a certificate from the CSR. This module is able to operate using different backends, the so-called providers. The provider that we will use for the time being is the selfsigned provider which generates self-signed certificates. Apart from the path to the CSR and the path to the created certificate, we therefore need to specify this provider and the private key to use (which, of course, should be that of the server), and can otherwise rely on the default values.

- name: Create self-signed certificate
  openssl_certificate:
    csr_path: "{{playbook_dir}}/server.csr"
    path: "{{playbook_dir}}/etc/certs/server.crt"
    provider: selfsigned
    privatekey_path: "{{playbook_dir}}/etc/certs/server.rsa"

Once this task completes, we are now ready to start our Docker container. This can again be done using Ansible, of course, which has a Docker module for that purpose. To see and run the full code, you might want to clone my GitHub repository.

git clone http://github.com/christianb93/tls-certificates
cd tls-certificates/lab1
ansible-playbook site.yaml

This completes our post for today. In the next post, we will look into more complex setups involving our own local certificate authority and learn how to generate and use client certificates.

Using Ansible with a jump host

For an OpenStack project using Ansible, I recently had to figure out how to make Ansible work with a jump host. After an initial phase of total confusion, I finally found my way through the documentation and various sources and ended up with several working configurations. This post documents what I have learned on that journey to hopefully make your life a bit easier.

Setup

In my previous posts on Ansible, I have used a rather artificial setup – a set of hosts which all expose the SSH port on the network so that we can connect directly. In the real world, hosts are often hidden behind firewalls, and a pattern that you will see frequently is that only one host in a network can be reached directly via SSH – called a jump host or a bastion host – and you need to SSH into all the other hosts from there.

To be able to experiment with this situation, let us first create a lab environment which simulates this setup on Google’s cloud platform (but any other cloud platform that has a concept of a VPC should do as well).

First, we need a project in which our resources will live. For this lab, create a new project called terraform-project with a project ID like terraform-project-12345 (of course, you will not be able to use the exact same project ID as I did, as project IDs are supposed to be unique), for instance from the Google Cloud console under “Manage Resources” in the IAM & Admin tab.

Next, create a service account for this project and assign the role “Compute Admin” to this account (which is definitely not the most secure setup and clearly not advisable for a production setup). Create a key for this service account, download the key in JSON format and store it as ~/gcp_terraform_service_account.json.

In addition, you will need a private / public SSH key pair. You can reuse an existing key or create a new one using

ssh-keygen -P "" -b 2048 -t rsa -f ~/.ssh/gcp-default-key

Now we are ready to download and run the Terraform script. To do this, open a terminal on your local PC and enter

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/jumphost
terraform init
terraform apply -auto-approve

When opening the Google Cloud Console after the script has completed, you should be able to verify that two virtual networks with two machines on them have been created, with a topology as summarized by the following diagram.

[Diagram: SSH jump host lab setup]

So we see that there is a target host which is connected to a private network only, and a jump host which has a public IP address and is attached to a public network.

One more hint: when playing with SSH, keep in mind that on the Ubuntu images used by GCE, sshguard is installed by default. It will monitor the SSH log files and, if something that looks like an attack is identified, insert a firewall rule into the filter table which blocks all incoming traffic (including ICMP) from the machine from which the suspicious SSH connections came. As playing around with some SSH features might trigger an alert, the Terraform setup script will therefore remove sshguard from the machines upon startup (though there would of course be smarter ways to deal with that, for instance by adding our own IP to the sshguard whitelist).

The SSH ProxyCommand feature

Before talking about SSH and jump hosts, we first have to understand some features of SSH (and when I say SSH here and in the following, I mean OpenSSH) that are relevant for such a configuration. Let us start with the ProxyCommand feature.

In an ordinary setup, SSH will connect to an SSH server via TCP, i.e. it will establish a TCP connection to port 22 of the server and will start to run the SSH protocol over this connection. You can, however, tell SSH to operate differently. In fact, SSH can spawn a process and write to STDIN of that process instead of writing to a TCP connection, and similarly read from STDOUT of this process. Thus we replace the TCP connection as communication channel by an indirect communication via this proxy process. In general, it is assumed that the proxy process in turn will talk to an SSH server in the background. The ProxyCommand flag tells SSH to use a proxy process to communicate with a server instead of a TCP connection and also how to start this process. Here is a diagram displaying the ordinary connection method (1) compared to the use of a proxy process (2).

[Diagram: SSH ProxyCommand]

To see this feature in action, let us play a bit with this and netcat. Netcat is an extremely useful tool which can establish a connection to a socket or listen on a socket, send its own input to this socket and print out whatever it sees on the socket.

Let us now open a terminal and run the command

nc -l 1234

which will ask netcat to listen for incoming connections on port 1234. In a second terminal window, run

ssh -vvv \
    -o "ProxyCommand nc localhost 1234" \
    test@172.0.0.1

Here the IP address at the end can be any IP address (in fact, SSH will not even try to establish a connection to this IP as it uses the proxy process to communicate with the apparent server). The flag -vvv has nothing to do with the proxy, but just produces some more output to better see what is going on. Finally, the ProxyCommand flag will specify to use nc localhost 1234 as proxy process, i.e. an instance of netcat connecting to port 1234 (and thus to the instance of netcat in our second terminal).

When you run this, you should see a string similar to

SSH-2.0-OpenSSH_7.6p1 Ubuntu-4

on the screen in the second terminal, and in the first terminal, SSH will seem to wait for something. Copy this string and insert it again in the second terminal (in which netcat is running) below this output. At this point, some additional output should appear, starting with some binary garbage and then printing a few strings that seem to be key types.

This is confusing, but actually expected – let us try to understand why. When we start the SSH client, it will first launch the proxy process, i.e. an instance of netcat – let us call this nc1. This netcat instance receives localhost and 1234 as its parameters, so it will happily try to establish a connection to port 1234 on localhost. As our second netcat instance – which we call nc2 – is listening on this port, it will connect to nc2.

From the client's point of view, the channel to the SSH server is now established, and the protocol version exchange as described in section 4.2 of RFC4253 starts. Thus the client sends a version string – the string you see appearing first in the second terminal – to its communication channel. In our case, this is STDIN of the proxy process nc1, which takes that string and sends it to nc2, which in turn prints it on the screen.

The SSH client is now waiting for the server's version string as the response. When we copied that string to the second terminal, we provided it as input to nc2, which in turn sent it to nc1 where it was printed on STDOUT of nc1. This data is seen as coming across the communication channel by the SSH client, i.e. from our fictitious SSH server. The client is happy and continues with the next phase of the protocol – the key exchange (KEX) phase described in section 7 of the RFC. Thus the client sends a list of key types that it supports, in the packet format specified in section 6, and this is the garbage followed by some strings that we see. Nice…

STDIO forwarding with SSH

Let us now continue to study a second feature of OpenSSH called standard input and output tunneling which is activated with the -W switch.

The man page is a bit short at this point, stating that this switch requests that standard input and output on the client be forwarded to host on port over the secure channel. Let us try to make this a bit clearer.

First, when you start the SSH client, it will, like any interactive process, connect its STDOUT and STDIN file descriptors to the terminal. When you use the option -W, no command will be executed on the SSH server, but instead the connection will remain open and SSH will establish a connection from the SSH server to a remote host and port that you specify along with the -W switch. Then any input that you provide to the SSH client will travel across the SSH connection and be fed into that connection, whereas anything that is received from this connection from the remote host is sent back via the SSH connection to the client – a bit like a remote version of netcat.

[Diagram: SSH STDIO tunnel]

Again, let us try this out. To do this, we need some host and port combination that we can use with -W which will provide some meaningful output. I have decided to use httpbin.org which, among other things, will give you back your own IP address. Let us first try this locally. We will use the shell’s built-in printf statement to prepare a HTTP GET request and feed that into netcat which will connect to httpbin.org, send our request and read the response.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | nc httpbin.org 80
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:20:20 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 45
Connection: Close

{
  "origin": "46.183.103.8, 46.183.103.8"
}

The last part is the actual body data returned with the response, which is our own IP address in JSON format. Now replace our local netcat with its remote version implemented via the SSH -W flag. If you have followed the setup described above, you will have provisioned a remote host in the cloud which we can use as SSH target, and a user called vagrant on that machine. Here is our example code.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | ssh -i ~/.ssh/gcp-default-key -W httpbin.org:80 vagrant@34.89.221.226
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:22:17 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 47
Connection: Close

{
  "origin": "34.89.221.226, 34.89.221.226"
}

Of course, you will have to replace 34.89.221.226 with the public IP address of your cloud instance, and ~/.ssh/gcp-default-key with your private SSH key for this host. We see that this time, the IP address of the host is displayed. What happens is that SSH makes a connection to this host, and the SSH server on the host in turn reaches out to httpbin.org, opens a connection on port 80, sends the string received via the SSH client’s STDIN to the server, gets the response back and sends it back over the SSH connection to the client where it is finally displayed.

TCP/IP tunneling with SSH

Instead of tunneling standard input / output through an SSH connection, SSH can also tunnel a TCP/IP connection using the flag -L. In this mode, you specify a local port, a remote host (reachable from the SSH server) and a remote port. The SSH daemon on the server will then establish a connection to the remote host and remote port, and the SSH client will listen on the local port. If a connection is made to the local port, the connection will be forwarded through the SSH tunnel to the remote host.

[Diagram: SSH TCP tunnel]

There is a very similar switch -R which establishes the same mechanism, but with the role of client and server exchanged. Thus the client will connect to the specified target host and port, and the server will listen on a local port on the server. Incoming connections to this port will then be forwarded via the tunnel to the connection held by the client.
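To quickly try out the -L variant with our lab setup, you could, as a sketch reusing the jump host address and the httpbin.org example from above, forward a local port to httpbin.org and then talk to that local port. This time, the returned origin should be the public IP of the jump host, as this is where the connection to httpbin.org originates.

# Forward local port 8080 via the jump host to httpbin.org:80
ssh -i ~/.ssh/gcp-default-key \
    -L 8080:httpbin.org:80 \
    vagrant@34.89.221.226
# In a second terminal, talk to the local end of the tunnel
printf "GET /ip HTTP/1.0\r\n\r\n" | nc localhost 8080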

Putting it all together – the five methods to use SSH jump hosts

We now have all the tools in our hands to use jump hosts with SSH. It turns out that these tools can be combined in five different ways to achieve our goal (and there might be even more if you are creative enough). Let us go through them one by one.

Method one

We could, of course, simply place the private SSH key for the target host on the jump host and then use ssh to run ssh on the jump host.

scp -i ~/.ssh/gcp-default-key \
    ~/.ssh/gcp-default-key \
    vagrant@34.89.221.226:/home/vagrant/key-on-jump-host
ssh -t -i ~/.ssh/gcp-default-key \
  vagrant@34.89.221.226 \
    ssh -i key-on-jump-host vagrant@192.168.178.3

You will have to adjust the IP addresses to your setup – 34.89.221.226 is the public IP address of the jump host, and 192.168.178.3 is the IP address under which our target host is reachable from the jump host. Also note the -t flag which is required to make the inner SSH process feel that it is connected to a terminal.

This simple approach works, but has the major disadvantage that it forces you to store the private key on the jump host. This makes your jump host a single point of failure in your security architecture – not a good idea, as this host is typically exposed at a network edge. In addition, this can quickly undermine any serious attempt to establish a central key management in an organisation. So we are looking for methods that will allow you to keep the private key on the local host.

Method two

To achieve this, there is a method which seems to be the “traditional” approach that you can find in most tutorials and that uses a combination of netcat and the ProxyCommand flag. Here is the command that we use.

ssh -i ~/.ssh/gcp-default-key \
   -o "ProxyCommand \
     ssh -i ~/.ssh/gcp-default-key vagrant@34.89.221.226\
     nc %h 22" \
   vagrant@192.168.178.3 

Again, you will have to adjust the IP addresses in this example as explained above. When you run this, you should be greeted by a shell prompt on the target host – and we can now understand why this works. SSH will first run the proxy command on the client machine, which in turn will invoke another “inner” SSH client establishing a session to the jump host. In this session, netcat will be started on the jump host and connect to the target host.

We have now established a direct channel from standard input / output of the second SSH client – the proxy process – to port 22 of the target host. Using this channel, the first “outer” SSH client can now proceed, negotiate versions, exchange keys and establish the actual SSH session to the target host.

[Diagram: SSH jump host via netcat]

It is interesting to use ps on the client and the jump host and netstat on the jump host to verify this diagram. On the client, you will see two SSH processes, one with the full command line (the outer client) and a second one, spawned by the first one, representing the proxy command. On the jump host, you will see the netcat process that the SSH daemon sshd has spawned, and the TCP connection to the SSH daemon on the target host established by the netcat process.
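Here is, as a sketch, what you could run to do this (the netstat command requires the net-tools package; on newer systems, ss -tpn will provide the same information).

# On the client: the outer SSH client and the proxy process that it has spawned
ps axf | grep "[s]sh"
# On the jump host: the netcat process spawned by sshd ...
ps axf | grep "[n]c "
# ... and its TCP connection to port 22 of the target host
sudo netstat -tpn | grep ":22"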

There is one more mechanism being used here – the symbol %h in the proxy command is an example of what the man page calls a token – a placeholder which is replaced by SSH at runtime. Here, %h is replaced by the host name to which we connect (in the outer SSH command!), i.e. the name or IP of the target host. Instead of hardcoding the port number 22, we could also use the %p token for the port number.
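Using both tokens, the command from above could therefore also be written as follows (again, adjust the IP addresses to your setup).

ssh -i ~/.ssh/gcp-default-key \
   -o "ProxyCommand \
     ssh -i ~/.ssh/gcp-default-key vagrant@34.89.221.226 \
     nc %h %p" \
   vagrant@192.168.178.3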

Method three

This approach still has the disadvantage that it requires the extra netcat process on the jump host. In order to avoid this, we can use the same approach using a stdin / stdout tunnel instead of netcat.

ssh -i ~/.ssh/gcp-default-key \
  -o "ProxyCommand \
    ssh -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"\
  vagrant@192.168.178.3 

When you run this and then inspect processes and connections on the jump host, you will find that the netcat process has gone, and the connection from the jump host to the target is initiated by (a child process of) the SSH daemon running on the jump host.

[Diagram: SSH jump host via stdio tunnel]

This method is described in many sources and also in the Ansible FAQs.

Method four

Now let us turn to the fourth method that is available in recent versions of OpenSSH – a new ProxyJump directive that can be used in the SSH configuration. This approach is very convenient when working with SSH configuration files, so let us take a closer look at it. Let us create an SSH configuration file (typically this is ~/.ssh/config) with the following content:

Host jump-host
  HostName 34.89.221.226
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 

Host target-host
  HostName 192.168.178.3
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 
  ProxyJump jump-host

To test this configuration, simply run

ssh target-host

and you should directly be taken to a prompt on the target host. What actually happens here is that SSH looks up the configuration for the host that you specify on the command line – target-host – in the configuration file. There, it will find the ProxyJump directive, referring to the host jump-host. SSH will follow that reference, retrieve the configuration from this host from the same file and use it to establish the connection.

It is instructive to run ps axf in a second terminal on the client after establishing the connection. The output of this command on my machine contains the following two lines.

  49 tty1     S      0:00  \_ ssh target-host
  50 tty1     S      0:00      \_ ssh -W [192.168.178.3]:22 jump-host

So what happens behind the scenes is that SSH will simply start a second session to open a stdin/stdout tunnel, as we have done manually before. Thus the ProxyJump option is nothing but a shortcut for what we have done previously.

The equivalent of the ProxyJump directive in the configuration file is the switch -J on the command line. Using this switch directly without a configuration file does, however, have the disadvantage that it is not possible to specify the SSH key to be used for connecting to the jump host. If you need this, you will have to use the -W option discussed above which will provide the same result.
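For completeness, here is what the command line version could look like – note that, as just mentioned, the -i switch only applies to the connection to the target host, so this will only work if the key for the jump host is your default key or has been added to an SSH agent.

# Command line version using -J; the key for the jump host must be the default
# key or be available via an SSH agent
ssh -i ~/.ssh/gcp-default-key \
    -J vagrant@34.89.221.226 \
    vagrant@192.168.178.3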

Method five

Finally, there is another method that we could use – TCP/IP tunneling. On the client, we start a first SSH session that will open a local port with port number 8222 and establish a connection to port 22 of the target host.

ssh -i ~/.ssh/gcp-default-key \
  -L 8222:192.168.178.3:22 \
  vagrant@34.89.221.226

Then, in a second terminal on the client, we use this port as the target for an SSH connection. The connection request will then go through the tunnel and we will actually establish a connection to the target host, not to the local host.

ssh -i ~/.ssh/gcp-default-key \
    -p 8222 \
    -o "HostKeyAlias 192.168.178.3" \
    vagrant@127.0.0.1

Why do we need the additional option HostKeyAlias here? Without this option, SSH will take the target host specified on the command line, i.e. 127.0.0.1, and use this host name to look up the host key in the database of known host keys. However, the host key it actually receives during the attempt to establish a connection is the host key of the target host. Therefore, the keys will not match, and SSH will complain that the host key is not known. The HostKeyAlias 192.168.178.3 instructs SSH to use 192.168.178.3 as the host name for the lookup, and SSH will find the correct key.

Ansible configuration with jump hosts

Let us now discuss the configuration needed in Ansible to make this work. As explained in the Ansible FAQs, Ansible has a configuration parameter ansible_ssh_common_args that can be used to define additional parameters to be added to the SSH command used to connect to a host. In our case, we could set this variable as follows.

ansible_ssh_common_args:  '-o "ProxyCommand ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'

Here the first few options are used to avoid issues with unknown or changed SSH keys (you can also set them here for the outer SSH connection if this is not yet done in your ansible.cfg file). There are several options for where you can set this variable. Like all variables in Ansible, this is a per-host variable which we could set in the inventory or (as the documentation suggests) in a group_vars folder on the file system.
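As a sketch, placing the variable in a group_vars file could look as follows – the group name private_hosts is made up and needs to match a group in your inventory.

# Hypothetical example: put the variable into a group_vars file next to your
# playbook so that it applies to all hosts in the group private_hosts
mkdir -p group_vars
cat > group_vars/private_hosts.yaml <<'EOF'
ansible_ssh_common_args: '-o "ProxyCommand ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'
EOF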

In my repository, I have created an example that uses the technique explained in an earlier post to build an inventory dynamically by capturing Terraform output. In this inventory, we set the ansible_ssh_common_args variable as above to be able to reach our target host via the jump host. To run the example, follow the initial configuration steps as explained above and then do

git clone https://github.com/christianb93/ansible-samples/
cd ansible-samples/jumphost
terraform init
ansible-playbook site.yaml

This playbook will not do anything except running Terraform (which will create the environment if you have not done this yet), capturing the output, building the inventory and connecting once to each host to verify that they can be reached.

There are many more tunneling features offered by SSH which I have not touched upon in this post – X-forwarding, for instance, or device tunneling which creates tun and tap devices on the local and the remote machine and therefore allows us to easily build a simple VPN solution. You might want to play with the SSH man pages (and your favorite search engine) to find out more on those features. Enjoy!

Using Terraform and Ansible to manage your cloud environments

Terraform is one of the most popular open source tools to quickly provision complex cloud based infrastructures, and Ansible is a powerful configuration management tool. Time to see how we can combine them to easily create cloud based infrastructures in a defined state.

Terraform – a quick introduction

This post is not meant to be a full fledged Terraform tutorial, and there are many sources out there on the web that you can use to get familiar with it quickly (the excellent documentation itself, for instance, which is quite readable and quickly gets to the point). In this section, we will only review a few basic facts about Terraform that we will need.

When using Terraform, you declare the to-be state of your infrastructure in a set of resource definitions using a declarative language known as HCL (Hashicorp configuration language). Each resource that you declare has a type, which could be something like aws_instance or packet_device, which needs to match a type supported by a Terraform provider, and a name that you will use to reference your resource. Here is an example that defines a Packet.net server.

resource "packet_device" "web1" {
  hostname         = "tf.coreos2"
  plan             = "t1.small.x86"
  facilities       = ["ewr1"]
  operating_system = "coreos_stable"
  billing_cycle    = "hourly"
  project_id       = "${local.project_id}"
}

In addition to resources, there are a few other things that a typical configuration will contain. First, there are providers, which are an abstraction for the actual cloud platform. Each provider enables a certain set of resource types and needs to be declared so that Terraform is able to download the respective plugin. Then, there are variables that you can declare and use in your resource definitions. In addition, Terraform allows you to define data sources, which represent queries against the cloud provider's API to gather certain facts and store them into variables for later use. And there are objects called provisioners that allow you to run certain actions on the local machine or on the machine just provisioned when a resource is created or destroyed.

An important difference between Terraform and Ansible is that Terraform also maintains a state. Essentially, the state keeps track of the infrastructure and maps the Terraform resources to actual resources on the platform. If, for instance, you define an EC2 instance my-machine as a Terraform resource and ask Terraform to provision that machine, Terraform will capture the EC2 instance ID of this machine and store it as part of its state. This allows Terraform to link the abstract resource to the actual EC2 instance that has been created.

State can be stored locally, but this is not only dangerous, as a locally stored state is not protected against loss, but also makes working in teams difficult, as everybody who is using Terraform to maintain a certain infrastructure needs access to the state. Therefore, Terraform offers backends that allow you to store state remotely, including PostgreSQL, S3, Artifactory, GCS, etcd or a Hashicorp managed service called Terraform cloud.


A first example – Terraform and DigitalOcean

In this section, we will see how Terraform can be used to provision a droplet on DigitalOcean. First, obviously, we need to install Terraform. Terraform comes as a single binary in a zipped file. Thus installation is very easy. Just navigate to the download page, get the URL for your OS, run the download and unzip the file. For Linux, for instance, this would be

wget https://releases.hashicorp.com/terraform/0.12.10/terraform_0.12.10_linux_amd64.zip
unzip terraform_0.12.10_linux_amd64.zip

Then move the resulting executable somewhere into your path. Next, we will prepare a simple Terraform resource definition. By convention, Terraform expects resource definitions in files with the extension .tf. Thus, let us create a file droplet.tf with the following content.

# The DigitalOcean oauth token. We will set this with the
#  -var="do_token=..." command line option
variable "do_token" {}

# The DigitalOcean provider
provider "digitalocean" {
  token = "${var.do_token}"
}

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
  image  = "ubuntu-18-04-x64"
  name   = "droplet0"
  region = "fra1"
  size   = "s-1vcpu-1gb"
  ssh_keys = []
}

When you run Terraform in a directory, it will pick up all files with that extension that are present in this directory (Terraform treats directories as modules with a hierarchy starting with the root module). In our example, we make a reference to the DigitalOcean provider, passing the DigitalOcean token as an argument, and then declare one resource of type digitalocean_droplet called droplet0.

Let us try this out. When we use a provider for the first time, we need to initialize this provider, which is done by running (in the directory where your resource definitions are located)

terraform init

This will detect all required providers and download the respective plugins. Next, we can ask Terraform to create a plan of the changes necessary to realize the to-be state described in the resource definition. To see this in action, run

terraform plan -var="do_token=$DO_TOKEN"

Note that we use the command line switch -var to define the value of the variable do_token which we reference in our resource definition to allow Terraform to authorize against the DigitalOcean API. Here, we assume that you have stored the DigitalOcean token in an environment variable called DO_TOKEN.
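As an alternative to the -var switch, Terraform also picks up variables from environment variables that carry the prefix TF_VAR_, so you could also export the token once and then drop the switch from the subsequent commands.

# Alternative: provide the variable via the environment instead of -var
export TF_VAR_do_token=$DO_TOKEN
terraform plan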

When we are happy with the output, we can now finally ask Terraform to actually apply the changes. Again, we need to provide the DigitalOcean token. In addition, we also use the switch -auto-approve, otherwise Terraform would ask us for a confirmation before the changes are actually applied.

terraform apply -auto-approve -var="do_token=$DO_TOKEN"

If you have a DigitalOcean console open in parallel, you can see that a droplet is actually being created, and after less than a minute, Terraform will complete and inform you that a new resource has been created.

We have mentioned above that Terraform maintains a state, and it is instructive to inspect this state after a resource has been created. As we have not specified a backend, Terraform will keep the state locally in a file called terraform.tfstate that you can look at manually. Alternatively, you can also use

terraform state pull

You will see an array resources with – in our case – only one entry, representing our droplet. We see the name and the type of the resource as specified in the droplet.tf file, followed by a list of instances that contains the (provider specific) details of the actual instances that Terraform has stored.

When we are done, we can also use Terraform to destroy our resources again. Simply run

terraform destroy -auto-approve -var="do_token=$DO_TOKEN"

and watch how Terraform brings down the droplet again (and removes it from the state).

Now let us improve our simple resource definition a bit. Our setup so far did not provision any SSH keys, so to log into the machine, we would have to use the SSH password that DigitalOcean will have mailed you. To avoid this, we again need to specify an SSH key name and to get the corresponding SSH key ID. First, we define a Terraform variable that will contain the key name and add it to our droplet.tf file.

variable "ssh_key_name" {
  type = string
  default = "do-default-key"
}

This definition has three parts. Following the keyword variable, we first specify the variable name. We then tell Terraform that this is a string, and we provide a default. As before, we could override this default using the switch -var on the command line.

Next, we need to retrieve the ID of the key. The DigitalOcean plugin provides the required functionality as a data source. To use it, add the following to our resource definition file.

data "digitalocean_ssh_key" "ssh_key_data" {
  name = "${var.ssh_key_name}"
}

Here, we define a data source called ssh_key_data. The only argument to that data source is the name of the SSH key. To provide it, we use the template mechanism that Terraform provides to expand variables inside a string to refer to our previously defined variable.

The data source will then use the DigitalOcean API to retrieve the key details and store them in memory. We can now access the SSH key to add it to the droplet resource definition as follows.

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
...  
ssh_keys = [ data.digitalocean_ssh_key.ssh_key_data.id ]
}

Note that the variable reference consists of the keyword data to indicate that it has been populated by a data source, followed by the type and the name of the data source and finally the attribute that we refer to. If you now run the example again, we will get a machine with an SSH key so that we can access it via SSH as usual (run terraform state pull to get the IP address).
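If you want to grab the IP address from the state without scrolling through the output, a somewhat fragile one-liner using jq could look like this – it assumes that jq is installed and that the resource is still called droplet0.

terraform state pull \
  | jq -r '.resources[] | select(.name=="droplet0") | .instances[0].attributes.ipv4_address'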

Having to extract the IP manually from the state file is not nice; it would be much more convenient if we could ask Terraform to somehow provide this as an output. So let us add the following section to our file.

output "instance_ip_addr" {
  value = digitalocean_droplet.droplet0.ipv4_address
}

Here, we define an output called instance_ip_addr and populate it by referring to a data item which is exported by the DigitalOcean plugin. Each resource type has its own list of exported attributes which you can find in the documentation, and you can only refer to attributes from this list. If we now run Terraform again, it will print the output upon completion.
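The outputs are also stored in the state, so you can display them again at any time after the run, either all at once or individually by name.

# Display all outputs, or a single output by name
terraform output
terraform output instance_ip_addr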

We can also create more than one instance at a time by adding the keyword count to the resource definition. When defining the output, we will now have to refer to the instances as an array, using the splat expression syntax. It is also advisable to move the entire instance configuration into variables and to split out variable definitions into a separate file.

Using Terraform with a non-local state

Using Terraform with a local state can be problematic – it could easily get lost or overwritten and makes working in a team or at different locations difficult. Therefore, let us quickly look at an example to set up Terraform with non-local state. Among the many available backends, we will use PostgreSQL.

Thanks to docker, it is very easy to bring up a local PostgreSQL instance. Simply run

docker run --name tf-backend -e POSTGRES_PASSWORD=my-secret-password -d postgres

Note that we do not map the PostgreSQL port here for security reasons, so we will need to figure out the IP of the resulting Docker container and use it in the Terraform configuration. The IP can be retrieved with the following command.

docker inspect tf-backend | jq -r '.[0].NetworkSettings.IPAddress'

In my example, the Docker container is running at 172.17.0.2. Next, we will have to create a database for the Terraform backend. Assuming that you have the required client packages installed, run

createdb --host=172.17.0.2 --username=postgres terraform_backend

and enter the password my-secret-password specified above when starting the Docker container. We will use the PostgreSQL superuser for our example; in a real life setup, you would of course first create your own user for Terraform and use it going forward.

Next, we add the backend definition to our Terraform configuration using the following section.

# The backend - we use PostgreSQL.
terraform {
  backend "pg" {
  }
}

You might expect that we also need to provide the location of the database (i.e. a connection string) and credentials. In fact, we could do this, but this would imply that the credentials which are part of the connection string would be present in the file in clear text. We therefore use a partial configuration and later supply the missing data on the command line.

Finally, we have to re-run terraform init to make the new backend known to Terraform and create a new workspace. Terraform will detect the change and even offer you to automatically migrate the state into the new backend.

terraform init -backend-config="conn_str=postgres://postgres:my-secret-password@172.17.0.2/terraform_backend?sslmode=disable"
terraform workspace new default

Be careful – as our database is running in a container with ephemeral storage, the state will be lost when we destroy the container! In our case, this would not be a disaster, as we would still be able to control our small playground environment manually. Still, it is a good idea to tag all your servers so that you can easily identify the servers managed by Terraform. If you intend to use Terraform more often, you might also want to spin up a local PostgreSQL server outside of a container (here is a short Ansible script that will do this for you and create a default user terraform with password being equal to the user name). Note that with a non-local state, Terraform will still store a reference to your state locally (look at the directory .terraform), so that when you work from a different directory, you will have to run the init command again. Also note that this will store your database credentials locally on disk in clear text.

Combining Terraform and Ansible

After this very short introduction into Terraform, let us now discuss how we can combine Terraform to manage our cloud environment with Ansible to manage the machines. Of course there are many different options, and I have not even tried to create an exhaustive list. Here are the options that I did consider.

First, you could operate Terraform and Ansible independently and use a dynamic inventory. Thus, you would have some Terraform templates and some Ansible playbooks and, when you need to verify or update your infrastructure, first run Terraform and then Ansible. The Ansible playbooks would use provider-specific inventory scripts to build a dynamic inventory and operate on this. This setup is simple and allows you to manage the Terraform configuration and your playbooks without any direct dependencies.

However, there is a potential loss of information when working with inventory scripts. Suppose, for instance, that you want to bring up a configuration with two web servers and two database servers. In a Terraform template, you would then typically have a resource “web” with two instances and a resource “db” with two instances. Correspondingly, you would want groups “web” and “db” in your inventory. If you use inventory scripts, there is no direct link between Terraform resources and cloud resources, and you would have to use tags to provide that link. Also, things easily get a bit more complicated if your configuration uses more than one cloud provider. Still, this is a robust method that seems to have some practical uses.

A second potential approach is to use Terraform provisioners to run Ansible scripts on each provisioned resource. If you decide to go down this road, keep in mind that provisioners only run when a resource is created or destroyed, not when it changes. If, for instance, you change your playbooks and simply run Terraform again, it will not trigger any provisioners and the changed playbooks will not apply. There are approaches to deal with this, for instance null resources, but this is difficult to control and the Terraform documentation itself does not advocate the use of provisioners.

Next, you could think about parsing the Terraform state. Your primary workflow would be coded into an Ansible playbook. When you execute this playbook, there is a play or task which reads out the Terraform state and uses this information to build a dynamic inventory with the add_host module. There are a couple of Python scripts out there that do exactly this. Unfortunately, the structure of the state is still provider specific; a DigitalOcean droplet is stored in a structure that is different from the structure used for an AWS EC2 instance. Thus, at least the script that you use for parsing the state is still provider specific. And of course there are potential race conditions when you parse the state while someone else might modify it, so you have to think about locking the state while working with it.

Finally, you could combine Terraform and Ansible with a tool like Packer to create immutable infrastructures. With this approach, you would use Packer to create images, supported maybe by Ansible as a provisioner. You would then use Terraform to bring up an infrastructure using this image, and would try to restrict the in-place configurations to an absolute minimum.

The approach that I have explored for this post is not to parse the state, but to trigger Terraform from Ansible and to parse the Terraform output. Thus our playbook will work as follows.

  • Use the Terraform module that comes with Ansible to run a Terraform template that describes the infrastructure
  • Within the Terraform template, create an output that contains the inventory data in a provider independent JSON format
  • Back in Ansible, parse that output and use add_host to build a corresponding dynamic inventory
  • Continue with the actual provisioning tasks per server in the playbook

Here, the provider specific part is hidden in the Terraform template (which is already provider specific by design), while the data passed back to Ansible and the Ansible playbook itself at least have a chance to be provider agnostic.
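
To make the first step a bit more tangible, here is a minimal sketch of how the Terraform module that comes with Ansible could be invoked from a play on localhost. The project path ./terraform and the name of the registered variable are assumptions made for this sketch, not the exact code from the repository.

---
  - name: Run Terraform and capture its output
    hosts: localhost
    tasks:
    # Run the Terraform template in the given directory (terraform plan / apply
    # behind the scenes) and capture the Terraform outputs in a registered variable.
    # The path ./terraform is just a placeholder for this sketch.
    - name: Apply Terraform configuration
      terraform:
        project_path: "./terraform"
        state: present
      register: tfResult
    # The outputs defined in the template are now available as
    # tfResult.outputs.<name>.value and can be fed into add_host (see below)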

Similar to a dynamic inventory script, we have to reach out to the cloud provider once to get the current state. In fact, behind the scenes, the Terraform module will always run a terraform plan and then check its output to see whether there is any change. If there is a change, it will run terraform apply and return its output, otherwise it will fall back to the output from the last run stored in the state by running terraform output. This also implies that there is again a certain danger of running into race conditions if someone else runs Terraform in parallel, as we release the lock on the Terraform state once this phase has completed.

Note that this approach has a few consequences. First, there is a structural coupling between Terraform and Ansible. Whoever is coding the Terraform part needs to prepare an output in a defined structure so that Ansible is happy. Also the user running Ansible needs to have the necessary credentials to run Terraform and access the Terraform state. In addition, Terraform will be invoked every time when you run the Ansible playbook, which, especially for larger infrastructures, slows down the processing a bit (looking at the source code of the Terraform module, it would probably be easy to add a switch so that only terraform output is run, which should provide a significant speedup).

Getting and running the sample code

Let us now take a look at a sample implementation of this idea which I have uploaded into my GitHub repository. The code is organized into a collection of Terraform modules and Ansible roles. The example will bring up two web servers on DigitalOcean and two database servers on AWS EC2.

Let us first discuss the Terraform part. There are two modules involved, one module that will create a droplet on DigitalOcean and one module that will create an EC2 instance. Each module returns, as an output, a data structure that contains one entry for each server, and each entry contains the inventory data for that server, for instance

inventory = [
  {
    "ansible_ssh_user" = "root"
    "groups" = "['web']"
    "ip" = "46.101.162.72"
    "name" = "web0"
    "private_key_file" = "~/.ssh/do-default-key"
  },
...

This structure can easily be assembled using Terraform's for-expressions. Assuming that you have defined a DigitalOcean resource for your droplets, for instance, the corresponding code for DigitalOcean is

output "inventory" {
  value = [for s in digitalocean_droplet.droplet[*] : {
    # the Ansible groups to which we will assign the server
    "groups"           : var.ansibleGroups,
    "name"             : "${s.name}",
    "ip"               : "${s.ipv4_address}",
    "ansible_ssh_user" : "root",
    "private_key_file" : "${var.do_ssh_private_key_file}"
  } ]
}

The EC2 module is structured similarly and delivers output in the same format (note that there is no name exported by the AWS provider, but we can use a tag to capture and access the name).

In the main.tf file, we then simply invoke each of the modules, concatenate their outputs and return this as Terraform output.

This output is then processed by the Ansible role terraform. Here, we simply iterate over the output (which is in JSON format and therefore recognized by Ansible as a data structure, not just a string), and for each item, we create a corresponding entry in the dynamic inventory using add_host.
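
Schematically – and assuming, as in the sketch above, that the Terraform output is called inventory and has been captured in a registered variable tfResult – this iteration boils down to a task like the following.

# Loop over the entries of the Terraform output "inventory" and add
# each server to the in-memory inventory
- name: Add provisioned hosts to dynamic inventory
  add_host:
    hostname: "{{ item.name }}"
    ansible_ssh_host: "{{ item.ip }}"
    ansible_ssh_user: "{{ item.ansible_ssh_user }}"
    ansible_ssh_private_key_file: "{{ item.private_key_file }}"
    groups: "{{ item.groups }}"
  loop: "{{ tfResult.outputs.inventory.value }}"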

The other roles in the repository are straightforward, maybe with one exception – the role sshConfig which will add entries for all provisioned hosts in an SSH configuration file ~/.ssh/ansible_created_config and include this file in the user's main SSH configuration file. This is not necessary for things to work, but will allow you to simply type something like

ssh db0

to access the first database server, without having to specify IP address, user and private key file.

A note on SSH. When I played with this, I hit upon a problem with the Gnome keyring daemon. The daemon adds identities to your SSH agent, which are attempted every time Ansible tries to log into one of the machines. If you work with more than a few SSH identities, you will therefore exceed the number of failed SSH login attempts configured in the Ubuntu image and will not be able to connect any more. The solution for me was to disable the Gnome keyring at startup.

In order to run the sample, you will first have to prepare a few credentials. We assume that

  • You have AWS credentials set up to access the AWS API
  • The DigitalOcean API key is stored in the environment variable DO_TOKEN
  • There is a private key file ~/.ssh/ec2-default-key.pem matching an SSH key on EC2 called ec2-default-key
  • Similarly, there is a private key file ~/.ssh/do-default-key that belongs to a public key uploaded to DigitalOcean as do-default-key
  • There is a private / public SSH key pair ~/.ssh/default-user-key[.pub] which we will use to set up an SSH enabled default user on each machine

We also assume that you have a Terraform backend up and running and have run terraform init to connect to this backend.

To run the examples, you can then use the following steps.

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/terraform
terraform init
ansible-playbook site.yaml

The extra invocation of terraform init is required because we introduce two new modules that Terraform needs to pre-load, and because you might not have the provider plugins for DigitalOcean and AWS on your machine. If everything works, you will have a fully provisioned environment – two servers on DigitalOcean with Apache2 installed and two servers on EC2 with PostgreSQL installed – up and running in a few minutes. If you are done, do not forget to shut down everything again by running

terraform destroy -var="do_token=$DO_TOKEN"

I also highly recommend using the DigitalOcean and AWS consoles to manually check that no servers are left running, to avoid unnecessary cost and to be prepared for the case that your state is out of sync.

Automating provisioning with Ansible – building cloud environments

When you browse the module list of Ansible, you will find that by far the largest section is the list of cloud modules, i.e. modules to interact with cloud environments to define, provision and maintain objects like virtual machines, networks, firewalls or storage. These modules make it easy to access the APIs of virtually every existing cloud provider. In this post, we will look at an example that demonstrates the usage of these modules for two cloud platforms – DigitalOcean and AWS.

Using Ansible with DigitalOcean

Ansible comes with several modules that are able to connect to the DigitalOcean API to provision machines, manage keys, load balancers, snapshots and so forth. Here, we will need two modules – the module digital_ocean_droplet to deal with individual droplets, and digital_ocean_sshkey_facts to retrieve information about the SSH keys stored with DigitalOcean.

Before getting our hands dirty, we need to talk about credentials first. There are three types of credentials involved in what follows.

  • To connect to the DigitalOcean API, we need to identify ourselves with a token. Such a token can be created on the DigitalOcean cloud console and needs to be stored safely. We do not want to put this into a file, but instead assume that you have set up an environment variable DO_TOKEN containing its value and access that variable.
  • The DigitalOcean SSH key. This is the key that we create once and upload (its public part, of course) to the DigitalOcean website. When we provision our machines later, we pass the name of the key as a parameter and DigitalOcean will automatically add this public key to the authorized_keys file of the root user so that we can access the machine via SSH. We will also use this key to allow Ansible to connect to our machines.
  • The user SSH key. As in previous examples, we will, in addition, create a default user on each machine. We need to create a key pair and also distribute the public key to all machines.

Let us first create these keys, starting with the DigitalOcean SSH key. We will use the key file name do-default-key and store the key in ~/.ssh on the control machine. So run

ssh-keygen -b 2048 -t rsa -P "" -f ~/.ssh/do-default-key
cat ~/.ssh/do-default-key.pub

to create the key and print out the public key.

Then navigate to the DigitalOcean security console, hit “Add SSH key”, enter the public key, enter the name “do-default-key” and save the key.

Next, we create the user key. Again, we use ssh-keygen as above, but this time use a different file name. I use ~/.ssh/default-user-key. There is no need to upload this key to DigitalOcean, but we will use it later in our playbook for the default user that we create on each machine.

Finally, make sure that you have a valid DigitalOcean API token (if not, go to this page to get one) and put the value of this token into the environment variable DO_TOKEN.

We are now ready to write the playbook to create droplets. Of course, we place this into a role to enable reuse. I have called this role droplet and you can find its code here.

I will not go through the entire code (which is documented), but just mention some of the more interesting points. First, when we want to use the DigitalOcean module to create a droplet, we have to pass the SSH key ID. This is a number, which is different from the key name that we specified while doing the upload. Therefore, we first have to retrieve the key ID. For that purpose, we retrieve a list of all keys by calling the module digital_ocean_sshkey_facts which will write its results into the ansible_facts dictionary.

- name: Get available SSH keys
  digital_ocean_sshkey_facts:
    oauth_token: "{{ apiToken }}"

We then need to navigate this structure to get the ID of the key we want to use. We do this by preparing a JSON query, which we put together in several steps to avoid issues with quoting. First, we apply the query string [?name=='{{ doKeyName }}'], where doKeyName is the variable that holds the name of the DigitalOcean SSH key, to navigate to the entry in the key list representing this key. The result will still be a list from which we extract the first (and typically only) element and finally access its attribute id holding the key ID.

- name: Prepare JSON query string
  set_fact:
    jsonQueryString: "[?name=='{{ doKeyName }}']"
- name: Apply query string to extract matching key data
  set_fact:
    keyData: "{{ ansible_facts['ssh_keys']|json_query(jsonQueryString) }}"
- name: Get keyId
  set_fact:
    keyId: "{{ keyData[0]['id'] }}"

Once we have that, we can use the DigitalOcean Droplet module to actually bring up the droplet. This is rather straightforward, using the following code snippet.

- name: Bring up or stop machines
  digital_ocean_droplet:
    oauth_token: "{{ apiToken }}"
    image: "{{osImage}}"
    region_id: "{{regionID}}"
    size_id: "{{sizeID}}"
    ssh_keys: [ "{{ keyId }}" ]
    state: "{{ targetState }}"
    unique_name: yes
    name: "droplet{{ item }}"
    wait: yes
    wait_timeout: 240
    tags:
    - "createdByAnsible"
  loop:  "{{ range(0, machineCount|int )|list }}"
  register: machineInfo

Here, we go through a loop, with the number of iterations being governed by the value of the variable machineCount, and, in each iteration, bring up one machine. The name of this machine is a fixed part plus the loop variable, i.e. droplet0, droplet1 and so forth. Note the usage of the parameter unique_name. This parameter instructs the module to check that the supplied name is unique. So if there is already a machine with that name, we skip its creation – this is what makes the task idempotent!

Once the machine is up, we finally loop over all provisioned machines once more (using the variable machineInfo) and complete two additional steps for each machine. First, we use add_host to add the host to the inventory – this makes it possible to use them later in the playbook. For convenience, we also add an entry for each host to the local SSH configuration, which allows us to log into the machine using a shorthand like ssh droplet0.
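
Condensed into a sketch, the add_host part of this loop could look as shown below. Note that the exact structure of the registered data depends on the module version – here I simply assume that the IP address is exposed as data.ip_address, so check the return values of digital_ocean_droplet for your Ansible release; the group name droplets is also just an assumption for this sketch.

# Loop over the results of the provisioning task and register each
# droplet with the in-memory inventory
- name: Add droplets to inventory
  add_host:
    hostname: "droplet{{ item.item }}"
    # data.ip_address is an assumption - check the module's return values
    ansible_ssh_host: "{{ item.data.ip_address }}"
    ansible_ssh_user: root
    ansible_ssh_private_key_file: "~/.ssh/do-default-key"
    groups: droplets
  loop: "{{ machineInfo.results }}"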

Once the role droplet has completed, our main playbook will start a second and a third play. These plays now use the inventory that we have just built dynamically. The first of them is quite simple and just waits until all machines are reachable. This is necessary, as DigitalOcean reports a machine as up and running at a point in time when the SSH daemon might not yet accept connections. The next and last play simply executes two roles. The first role is the role that we have already put together in an earlier post which simply adds a standard non-root user, and the second role simply uses apt and pip to install a few basic packages.

A few words on security are in order. Here, we use the DigitalOcean standard setup, which gives you a machine with a public IP address directly connected to the Internet. No firewall is in place, and all ports on the machine are reachable from everywhere. This is obviously not a setup that I would recommend for anything else than a playground. As a starting point, we could bring up a local firewall on the machine using the ufw module for Ansible, and then continue with some basic hardening measures before installing anything else (which, of course, could be automated using Ansible as well).

Using Ansible with AWS EC2

Let us now see what we need to change when trying the same thing with AWS EC2 instead of DigitalOcean. First, of course, the authorization changes a bit.

  • The Ansible EC2 module uses the Python Boto library behind the scenes which is able to reuse the credentials of an existing AWS CLI configuration. So if you follow my setup instructions in an earlier post on using Python with AWS, no further setup is required
  • Again, there is an SSH key that AWS will install on each machine. To create a key, navigate to the AWS EC2 console, select “Key pairs” under “Network and Security” in the navigation bar on the left, create a key called ec2-default-key, copy the resulting PEM file to ~/.ssh/ and set the access rights 0700.
  • The user SSH key – we can use the same key as before

As the EC2 module uses the Boto3 Python library, you might have to install it first using pip or pip3.

Let us now go through the steps that we have to complete to spin up our servers and see how we realize idempotency. First, to create a server on EC2, we need to specify an AMI ID. AMI-IDs change with every new version of an AMI, and typically, you want to install the latest version. Thus we first have to figure out the latest AMI ID. Here is a snippet that will do this.

- name: Locate latest AMI
  ec2_ami_facts:
    region: "eu-central-1"
    owners: 099720109477
    filters:
      name: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-????????"
      state: "available"
  register: allAMIs
- name: Sort by creation date to find the latest AMI
  set_fact:
    latestAMI: "{{ allAMIs.images | sort(attribute='creation_date') | last }}"

Here, we first use the module ec2_ami_facts to get a list of all AMIs matching a certain expression – in this case, we look for available AMIs for Ubuntu 18.04 for EBS-backed machines with the owner 099720109477 (Ubuntu). We then use a Jinja2 template to sort by creation date and pick the latest entry. The result will be a dictionary, which, among other things, contains the AMI ID as image_id.

Now, as in the DigitalOcean example, we can start the actual provisioning step using the ec2 module. Here is the respective task.

- name: Provision machine
  ec2:
    exact_count: "{{ machineCount }}"
    count_tag: "managedByAnsible"
    image: "{{ latestAMI.image_id }}"
    instance_tags:
      managedByAnsible: true
    instance_type: "{{instanceType}}"
    key_name: "{{ ec2KeyName }}"
    region: "{{ ec2Region }}"
    wait: yes
  register: ec2Result

Idempotency is realized differently on EC2. Instead of using the server name (which Amazon will assign randomly), the module uses tags. When a machine is brought up, the tags specified in the attribute instance_tags are attached to the new instance. The attribute exact_count instructs the module to make sure that exactly this number of instances with this tag is running. Thus if you run the playbook twice, the module will first query all machines with the specified tag, figure out that there is already an instance running and will not create any additional machines.

This simple setup will provision all new machines into the default VPC and use the default security group. This has the disadvantage that in order for Ansible to be able to reach these machines, you will have to open port 22 for incoming connections in the attached security group. The playbook does not do this, but expects that you do this manually, using for instance the EC2 console. Again, this is of course not a production-like setup.

Running the examples

I have put the full playbooks with all the required roles into my GitHub repository. To run the examples, carry out the following steps.

First, prepare the SSH keys and API tokens for EC2 and DigitalOcean as explained in the respective sections. Then, use the EC2 console to allow incoming traffic for port 22 from your local workstation in the default VPC, into which our playbook will launch the machines created on EC2. Next, run the following commands to clone the repository and execute the playbooks (if you have not used the SSH key names and locations as above, you will first have to edit the default variables in the roles ec2Instance and droplet accordingly).

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/partVII
# Turn off strict host key checking
export ANSIBLE_HOST_KEY_CHECKING=False
# Run the EC2 example
ansible-playbook awsSite.yaml
# Run the DigitalOcean example
ansible-playbook doSite.yaml

Once the playbooks complete, you should now be able to ssh into the machine created on DigitalOcean using ssh droplet0 and into the machine created on EC2 using ssh aws0. Do not forget to delete all machines again when you are done to avoid unnecessary cost!

Limitations of our approach

The approach we have followed so far to manage our cloud environments is simple, but has a few drawbacks and limitations. First, the provisioning is done on the localhost in a loop, and therefore sequentially. Even if the provisioning of one machine only takes one or two minutes, this implies that the approach does not scale, as bringing up a large number of machines would take hours. One approach to fix this that I have seen on Stack Exchange (unfortunately I cannot find the post any more, so I cannot give proper credit) is to use a static inventory file with a dummy group containing only the host names, in combination with connection: local, to execute the provisioning locally, but in parallel – see the sketch below.
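
As a rough sketch of this idea (group and variable names are made up for illustration): the static inventory would contain a group, say droplets, whose members are just the names droplet0, droplet1 and so forth, and the provisioning play would run against this group with a local connection, so that Ansible forks once per name and creates the machines in parallel, reusing the same variables as in the droplet role above.

---
  # The hosts in the group "droplets" are only names at this point, no real
  # machines. With connection: local, the provisioning runs on the control
  # machine, but once per inventory name and therefore in parallel.
  - name: Provision droplets in parallel
    hosts: droplets
    connection: local
    gather_facts: no
    tasks:
    - name: Bring up droplet
      digital_ocean_droplet:
        oauth_token: "{{ apiToken }}"
        image: "{{ osImage }}"
        region_id: "{{ regionID }}"
        size_id: "{{ sizeID }}"
        ssh_keys: [ "{{ keyId }}" ]
        state: present
        unique_name: yes
        # the inventory name doubles as the droplet name
        name: "{{ inventory_hostname }}"
        wait: yes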

The second problem is more severe. Suppose, for instance, you have already created a droplet and now decide that you want 2GB of memory instead. So you might be tempted to go back, edit (or override) the variable sizeID and re-run the playbook. This will not give you any error message, but in fact, it will not do anything. The reason is that the module will only use the host name to verify that the target state has been reached, not any other attributes of the droplet (take a look at the module source code here to verify this). Thus, we have a successful playbook execution, but a deviation between the target state coded into the playbook and the real state.

Of course this could be fixed by a more sophisticated check – either by extending the module, or by coding this check into the playbook. Alternatively, we could try to leverage an existing solution that has this logic already implemented. For AWS, for instance, you could try to define your target state using CloudFormation templates and then leverage the CloudFormation Ansible module to apply this from within Ansible. Or, if you want to be less provider specific, you might want to give Terraform a try. This is what I decided to do, and we will dive into more details how to integrate Terraform and Ansible in the next post.
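
To give an idea of what the CloudFormation route would look like, here is a minimal sketch using the cloudformation module that ships with Ansible – the stack name, template path and parameters are purely hypothetical.

# Apply a CloudFormation template describing the target state; AWS will
# work out and execute the necessary changes (all names are hypothetical)
- name: Create or update stack
  cloudformation:
    stack_name: "ansible-demo-stack"
    state: present
    region: "eu-central-1"
    template: "templates/servers.yaml"
    template_parameters:
      InstanceType: "t2.micro"
  register: stackResult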

Automating provisioning with Ansible – control structures and roles

When building more sophisticated Ansible playbooks, you will find that a linear execution is not always sufficient, and the need for more advanced control structures and ways to organize your playbooks arises. In this post, we will cover the corresponding mechanisms that Ansible has up its sleeves.

Loops

Ansible allows you to build loops that execute a task more than once with different parameters. As a simple example, consider the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop with a static list of items
        debug:
          msg: "{{ item }}"
        loop:
        - A
        - B

This playbook contains only one play. In the play, we first limit the execution to localhost and then use the connection keyword to instruct Ansible not to use ssh but to execute commands directly, which speeds up execution (and of course only works for localhost). Note that more generally, the connection keyword specifies a so-called connection plugin, of which Ansible offers a few more than just ssh and local.

The next task has an additional keyword loop. The value of this attribute is a list in YAML format, having the elements A and B. The loop keyword instructs Ansible to run this task twice per host. For each execution, it will assign the corresponding value of the loop variable to the built-in variable item so that it can be evaluated within the loop.

If you run this playbook, you will get the following output.

PLAY [localhost] *************************************
TASK [Gathering Facts]********************************
ok: [localhost]

TASK [Loop with a static list of items] **************
ok: [localhost] => (item=A) => {
    "msg": "A"
}
ok: [localhost] => (item=B) => {
    "msg": "B"
}

So we see that even though there is only one host, the loop body has been executed twice, and within each iteration, the expression {{ item }} evaluates to the corresponding item in the list over which we loop.

Loops are most useful in combination with Jinja2 expressions. In fact, you can iterate over everything that evaluates to a list. The following example demonstrates this. Here, we loop over all environment variables. To do this, we use Jinja2 and Ansible facts to get a dictionary with all environment variables, then convert this to a list using the items() method and use this list as argument to loop.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop over all environment variables 
        debug:
          msg: "{{ item.0 }}"
        loop:
         "{{ ansible_facts.env.items() | list }}"
        loop_control:
          label:
            "{{ item.0 }}"

This task also demonstrates loop controls. This is just a set of attributes that we can specify to control the behaviour of the loop. In this case, we set the label attribute which specifies the output that Ansible prints out for each loop iteration. The default is to print the entire item, but this can quickly get messy if your item is a complex data structure.

You might ask yourself why we use the list filter in the Jinja2 expression – after all, the items() method should return a list. In fact, this is only true for Python 2. I found that in Python 3, this method returns something called a dictionary view, which we then need to convert into a list using the filter. This version will work with both Python 2 and Python 3.

Also note that when you use loops in combination with register, the registered variable will be populated with the results of all loop iterations. To this end, Ansible will populate the variable to which the register refers with a list called results. This list will contain one entry for each loop iteration, and each entry will contain the item and the module-specific output. To see this in action, run my example playbook and have a look at its output.
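
If you prefer a minimal self-contained example, the following sketch registers the output of a command that is run in a loop and then prints, for each iteration, the loop item together with the captured output.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Run a command once per item
        command: "echo {{ item }}"
        loop:
        - A
        - B
        register: cmdOutput
      - name: Print item and output for each iteration
        debug:
          msg: "{{ item.item }} --> {{ item.stdout }}"
        loop: "{{ cmdOutput.results }}"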

Conditions

In addition to loops, Ansible allows you to execute specific tasks only if a certain condition is met. You could, for instance, have a task that populates a variable, and then execute a subsequent task only if this variable has a certain value.

The below example is a bit artificial, but demonstrates this nicely. In a first task, we create a random value, either 0 or 1, using Jinja2 templating. Then, we execute a second task only if this value is equal to one.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Populate variable with random value
        set_fact:
          randomVar: "{{ range(0, 2) | random }}"
      - name: Print value
        debug:
          var: randomVar
      - name: Execute additional statement if randomVar is 1
        debug:
          msg: "Variable is equal to one"
        when:
          randomVar == "1"

To create our random value, we combine the Jinja2 range method with the random filter which picks a random element from a list. Note that the result will be a string, either “0” or “1”. In the last task within this play, we then use the when keyword to restrict execution based on a condition. This condition is in Jinja2 syntax (without the curly braces, which Ansible will add automatically) and can be any valid expression which evaluates to either true or false. If the condition evaluates to false, then the task will be skipped (for this host).

Care needs to be taken when combining loops with conditional execution. Let us take a look at the following example.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Combine loop with when
        debug:
          msg: "{{ item }}"
        loop:
          "{{ range (0, 3)|list }}"
        when:
          item == 1
        register: loopOutput
      - debug:
          var: loopOutput

Here, the condition specified with when is evaluated once for each loop iteration, and if it evaluates to false, the iteration is skipped. However, the results array in the output still contains an item for this iteration. When working with the output, you might want to evaluate the skipped attribute for each item which will tell you whether the loop iteration was actually executed (be careful when accessing this, as it is not present for those items that were not skipped).
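
For instance, to process only those iterations that were actually executed, you could filter the results as follows (continuing the example above, where the executed iterations of the debug task carry a msg attribute).

      - name: Print only iterations that were not skipped
        debug:
          msg: "{{ item.msg }}"
        loop: "{{ loopOutput.results }}"
        when: not (item.skipped | default(false))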

Handlers

Handlers provide a different approach to conditional execution. Suppose, for instance, you have a sequence of tasks that each update the configuration file of a web server. When this actually resulted in a change, you want to restart the web server. To achieve this, you can define a handler.

Handlers are similar to tasks, with the difference that in order to be executed, they need to be notified. The actual execution is queued until the end of the playbook, and even if the handler is notified more than once, it is only executed once (per host). To instruct Ansible to notify a handler, we can use the notify attribute for the respective task. Ansible will, however, only notify the handler if the task results in a change. To illustrate this, let us take a look at the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
    - name: Create empty file
      copy:
        content: ""
        dest: test
        force: no
      notify: handle new file
    handlers:
    - name: handle new file
      debug:
        msg: "Handler invoked"

This playbook defines one task and one handler. Handlers are defined in a separate section of the playbook which contains a list of handlers, similarly to tasks. They are structured as tasks, having a name and an action. To notify a handler, we add the notify attribute to a task, using exactly the name of the handler as argument.

In our example, the first (and only) task of the play will create an empty file test, but only if the file is not yet present. Thus if you run this playbook for the first time (or manually remove the file), this task will result in a change. If you run the playbook once more, it will not result in a change.

Correspondingly, upon the first execution, the handler will be notified and execute at the end of the play. For subsequent executions, the handler will not run.

Handler names should be unique, so you cannot use handler names to make a handler listen for different events. There is, however, a way to decouple handlers and notifications using topics. To make a handler listen for a topic, we can add the listen keyword to the handler, and use the name of the topic instead of the handler name when defining the notification, see the documentation for details.
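
Here is a short sketch of how this looks – two handlers listening on the same topic, and a task that notifies the topic rather than an individual handler (the task and handler bodies are just placeholders).

---
  - hosts: localhost
    connection: local
    tasks:
    - name: Update a configuration file
      copy:
        content: "some configuration"
        dest: test.conf
      # we notify the topic, not a handler name
      notify: config has changed
    handlers:
    - name: Restart service
      debug:
        msg: "Would restart the service now"
      listen: config has changed
    - name: Reload firewall
      debug:
        msg: "Would reload the firewall now"
      listen: config has changed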

Handlers can be problematic, as they are change triggered, not state triggered. If, for example, a task results in a change and notifies a handler, but the playbook fails in a later task, the trigger is lost. When you run the playbook again, the task will most likely not result in a change any more, and no additional notification is created. Effectively, the handler never gets executed. There is, however, a flag --force-handlers to force execution of handlers even if the playbook fails.

Using roles and includes

Playbooks are a great way to automate provisioning steps, but for larger projects, they easily get a bit messy and difficult to maintain. To better organize larger projects, Ansible has the concept of a role.

Essentially, roles are snippets of tasks, variables and handlers that are organized as reusable units. Suppose, for instance, that in all your playbooks, you do a similar initial setup for all machines – create a user, distribute SSH keys and so forth. Instead of repeating the same tasks and the variables to which they refer in all your playbooks, you can organize them as a role.

What makes roles a bit more complex to use initially is that roles are backed by a defined directory structure that we need to understand first. Specifically, Ansible will look for roles in a subdirectory called (surprise) roles in the directory in which the playbook is located (it is also possible to maintain system-wide roles in /etc/ansible/roles, but it is obviously much harder to put these roles under version control). In this subdirectory, Ansible expects a directory for each role, named after the role.

Within this role directory, each type of object defined by the role is placed in a separate subdirectory. Each of these subdirectories contains a file main.yml that defines the respective objects. Not all of these directories need to be present. The most important ones, which most roles will include, are (see the Ansible documentation for a full list):

  • tasks, holding the tasks that make up the role
  • defaults that define default values for the variables referenced by these tasks
  • vars containing additional variable definitions
  • meta containing information on the role itself, see below

To initialize this structure, you can either create these directories manually, or you run

cd roles
ansible-galaxy init <role_name>

to let Ansible create a skeleton for you. The tool that we use here – ansible-galaxy – is actually part of an infrastructure that is used by the community to maintain a set of reusable roles hosted centrally. We will not use Ansible Galaxy here, but you might want to take a look at the Galaxy website to get an idea.

So suppose that we want to restructure the example playbook that we used in one of my last posts to bring up a Docker container as an Ansible test environment using roles. This playbook essentially has two parts. In the first part, we create the Docker container and add it to the inventory. In the second part, we create a default user within the container with a given SSH key.

Thus it makes sense to re-structure the playbook using two roles. The first role would be called createContainer, the second defaultUser. Our directory structure would then look as follows.

(Screenshot: directory layout of the example, with docker.yaml at the top level and the roles createContainer and defaultUser with their subdirectories)

Here, docker.yaml is the main playbook that will use the roles, and we have skipped the vars directory for the second role as it is not needed. Let us go through these files one by one.

The first file, docker.yaml, is now much shorter as it only refers to the two roles.

---
  - name: Bring up a Docker container 
    hosts: localhost
    roles:
    - createContainer
  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - defaultUser

Again, we define two plays in this playbook. However, there are no tasks defined in any of these plays. Instead, we reference roles. When running such a play, Ansible will go through the roles and for each role:

  • Load the tasks defined in the main.yml file in the tasks-folder of the role and add them to the play
  • Load the default variables defined in the main.yml file in the defaults-folder of the role
  • Merge these variable definitions with those defined in the vars-folder, so that these variables will overwrite the defaults (see also the precedence rules for variables in the documentation)

Thus all tasks imported from roles will be run before any tasks defined in the playbook directly are executed.

Let us now take a look at the files for each of the roles. The tasks file, for instance, is simply a list of tasks that you would otherwise put into a play directly, for example

---
- name: Run Docker container
  docker_container:
    auto_remove: yes
    detach: yes
    name: myTestNode
    image: node:latest
    network_mode: bridge
    state: started
  register: dockerData

Note that this is a perfectly valid YAML list, as you would expect it. Similarly, the variables you define either as default or as override are simple dictionaries.

---
# defaults file for defaultUser
userName: chr
userPrivateKeyFile: ~/.ssh/ansible-default-user-key_rsa

The only exception is the role meta data main.yml file. This is a specific YAML-based syntax that defines some attributes of the role itself. Most of the information within this file is meant to be used when you decide to publish your role on Galaxy and is not needed if you work locally. There is one important exception, though – you can define dependencies between roles, which Ansible will evaluate to make sure that roles required by another role are automatically pulled into the playbook when needed.
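
As an illustration – using a purely hypothetical role webServer – a meta/main.yml with a dependency on our role defaultUser could look like this; Ansible would then run defaultUser before the tasks of webServer.

---
# meta/main.yml of a hypothetical role webServer
dependencies:
  # pull in defaultUser before running this role
  - role: defaultUser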

The full example, including the definitions of these two roles and the related variables, can be found in my GitHub repository. To run the example, use the following steps.

# Clone my repository and cd into partVI
git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/partVI
# Create two new SSH keys. The first key is used for the initial 
# container setup, the public key is baked into the container
ssh-keygen -f ansible -b 2048 -t rsa -P ""
# the second key is the key for the default user that our 
# playbook will create within the container
ssh-keygen -f default-user-key -b 2048 -t rsa -P ""
# Get the Dockerfile from my earlier post and build the container, using
# the new key ansible just created
cp ../partV/Dockerfile .
docker build -t node .
# Put location of user SSH key into vars/main.yml for 
# the defaultUser role
mkdir roles/defaultUser/vars
echo "---" > roles/defaultUser/vars/main.yml
echo "userPrivateKeyFile: $(pwd)/default-user-key" >> roles/defaultUser/vars/main.yml
# Run playbook
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook docker.yaml

If everything worked, you can now log into your newly provisioned container with the new default user using

ssh -i default-user-key chr@172.17.0.2

where of course you need to replace the IP address by the IP address assigned to the Docker container (which the playbook will print for you). Note that we have demonstrated variable overriding in action – we have created a variable file for the role defaultUser and put the specific value, namely the location of the default user's SSH key, into it, overriding the default coming with the role. This ability to bundle all required variables as part of a role while at the same time allowing a user to override them is what makes roles suitable for actual code reuse. We could also have overridden the variable in the playbook that uses the role, making the use of roles a bit similar to a function call in a procedural programming language.
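
As a sketch of this alternative, we could have dropped the extra vars file and instead passed the value directly when listing the role in the playbook – the path used here is of course just a placeholder.

  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - role: defaultUser
      vars:
        # placeholder path - point this to your actual key file
        userPrivateKeyFile: "/path/to/default-user-key"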

This completes our post on roles. In the next few posts, we will now tackle some more complex projects – provisioning infrastructure in a cloud environment with Ansible.

Automating provisioning with Ansible – working with inventories

So far, we have used Ansible inventories more or less as a simple list of nodes. But there is much more you can do with inventories – you can assign hosts to groups, build hierarchies of groups, use dynamic inventories and assign variables. In this post, we will look at some of these options.

Groups in inventories

Recall that our simple inventory file (when using the Vagrant setup demonstrated in an earlier post) looks as follows (say this is saved as hosts.ini in the current working directory)

[servers]
192.168.33.10
192.168.33.11

We have two servers and one group called servers. Of course, we could have more than one group. Suppose, for instance, that we change our file as follows.

[web]
192.168.33.10
[db]
192.168.33.11

When running Ansible, we can now specify either the group web or db, for instance

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -u vagrant \
        -i hosts.ini \
        --private-key ~/vagrant/vagrant_key \
        -m ping web

then Ansible will only operate on the hosts in the group web. If you want to run Ansible for all hosts, you can still use the group “all”.

Hosts can be in more than one group. For instance, when we change the file as follows

[web]
192.168.33.10
[db]
192.168.33.10
192.168.33.11

then the host 192.168.33.10 will be in both groups, db and web. If you run Ansible for the db group, it will operate on both hosts; if you run it for the web group, it will operate only on 192.168.33.10. Of course, if you use the pseudo-group all, then Ansible will touch this host only once, even though it appears twice in the configuration file.

It is also possible to define groups as the union of other groups. To create a group that contains all servers, we could for instance do something like

[web]
192.168.33.10
[db]
192.168.33.11
192.168.33.10
[servers:children]
db
web

Finally, as we have seen in the last post, we can define variables directly in the inventory file. We have already seen this on the level of individual hosts, but it is also possible on the level of groups. Let us take a look at the following example.

[web]
192.168.33.10
[db]
192.168.33.11
192.168.33.10
[servers:children]
db
web
[db:vars]
a=5
[servers:vars]
b=10

Here we define two variables, a and b. The first variable is defined for all servers in the db group. The second variable is defined for all servers in the servers group. We can print out the values of these variables using the debug Ansible module

ansible -i hosts.ini \
        --private-key ~/vagrant/vagrant_key  \
        -u vagrant -m debug -a "var=a,b" all

which will give us the following output

192.168.33.11 | SUCCESS => {
    "a,b": "(5, 10)"
}
192.168.33.10 | SUCCESS => {
    "a,b": "(5, 10)"
}

To better understand the structure of an inventory file, it is often useful to create a YAML representation of the file, which is better suited to visualize the hierarchical structure. For that purpose, we can use the ansible-inventory command line tool. The command

ansible-inventory -i hosts.ini --list -y

will create the following YAML representation of our inventory.

all:
  children:
    servers:
      children:
        db:
          hosts:
            192.168.33.10:
              a: 5
              b: 10
            192.168.33.11:
              a: 5
              b: 10
        web:
          hosts:
            192.168.33.10: {}
    ungrouped: {}

which nicely demonstrates that we have effectively built a tree, with the implicit group all at the top, the group servers as the only descendant and the groups db and web as children of this group. In addition, there is a group ungrouped which lists all hosts which are not explicitly assigned to a group. If you omit the -y flag, you will get a JSON representation which lists the variables in a separate dictionary _meta.

Group and host variables in separate files

Ansible also offers you the option to maintain group and host variables in separate files, which can again be useful if you need to deal with different environments. By convention, Ansible will look for variables defined on group level in a directory group_vars that needs to be a subdirectory of the directory in which the inventory file is located. Thus, if your inventory file is called /home/user/somedir/hosts.ini, you would have to create a directory /home/user/somedir/group_vars. Inside this directory, place a YAML file named after the group, containing a dictionary with key-value pairs which will then be used as variable definitions. In our case, if we wanted to define a variable c for all hosts in the db group, we would create a file group_vars/db.yaml with the following content

---
  c: 15

We can again check that this has worked by printing the variables a, b and c.

ansible -i hosts.ini \
        --private-key ~/vagrant/vagrant_key  \
        -u vagrant -m debug -a "var=a,b,c" db

Similarly, you could create a directory host_vars and place a file there to define variables for a specific host – again, the file name should match the host name. It is also possible to merge several files – see the corresponding page in the documentation for all the details.
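
For example, to define a variable d only for the host 192.168.33.10, you could create a file host_vars/192.168.33.10.yaml next to hosts.ini with the following content (the variable name and value are of course just for illustration).

---
  d: 20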

Dynamic inventories

So far, our inventories have been static, i.e. we prepare them before we run Ansible, and they are unchanged while the playbook is running. This is nice in a classical setup where you have a certain set of machines you need to manage, and make changes only if a new machine is physically added to the data center or removed. In a cloud environment, however, the setup tends to be much more dynamic. To deal with this situation, Ansible offers different approaches to manage dynamic inventories that change at runtime.

The first approach is to use inventory scripts. An inventory script is simply a script (an executable, which can be written in any programming language) that creates an inventory in JSON format, similarly to what ansible-inventory does. When you provide such an executable using the -i switch, Ansible will invoke the inventory script and use the output as inventory.

Inventory scripts are invoked by Ansible in two modes. When Ansible needs a full list of all hosts and groups, it will add the switch --list. When Ansible needs details on a specific host, it will pass the switch --host and, in addition, the hostname as an argument.

Let us take a look at an example to see how this works. Ansible comes bundled with a large number of inventory scripts. Let us play with the script for Vagrant. After downloading the script to your working directory and installing the required module paramiko using pip, you can run the script as follows.

python3 vagrant.py --list

This will give you an output similar to the following (I have piped this through jq to improve readability)

{
  "vagrant": [
    "boxA",
    "boxB"
  ],
  "_meta": {
    "hostvars": {
      "boxA": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "/home/chr/vagrant/vagrant_key",
        "ansible_port": "2222"
      },
      "boxB": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "/home/chr/vagrant/vagrant_key",
        "ansible_port": "2200"
      }
    }
  }
}

We see that the script has created a group vagrant with two hosts, using the names in your Vagrantfile. For each host, it has, in addition, declared some variables, like the private key file, the SSH user and the IP address and port to use for SSH.

To use this dynamically created inventory with Ansible, we first have to make the Python script executable, using chmod 700 vagrant.py. Then, we can simply invoke Ansible, pointing to the script with the -i switch.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i vagrant.py -m ping all

Note that we do not have to use the switches --private-key and -u as this information is already present in the inventory.

It is instructive to look at the (rather short) script. Doing this, you will see that behind the scenes, the script simply invokes vagrant status and vagrant ssh-config. This implies, however, that the script will only detect your running instances properly if you execute it – and thus Ansible – in the directory in which your Vagrantfile lives and in which you issued vagrant up to bring up the machines!

In practice, dynamic inventories are mostly used with public cloud providers. Ansible comes with inventory scripts for virtually every cloud provider you can imagine. As an example, let us try out the script for EC2.

First, there is some setup required. Download the inventory script ec2.py and the configuration file ec2.ini, placing both in the same directory. Then, make the script executable.

A short look at the script will show you that it uses the Boto library to access the AWS API. So you need Boto installed, and you need to make sure that Boto has access to your AWS credentials, for instance because you have a working AWS CLI configuration (see my previous post on using Python with AWS for more details on this). As explained in the comments in the script, you can also use environment variables to provide your AWS credentials (which are stored in ~/.aws/credentials when you have set up the AWS CLI).

Next, bring up some instances on EC2 and then run the script using

./ec2.py --list

Note that the script uses a cache to avoid having to repeat time-consuming API calls. The expiration time is provided as a parameter in the ec2.ini file. The default is 5 minutes, which can be a bit too long when playing around, so I recommend changing this to e.g. 30 seconds.

Even if you have only one or two machines running, the output that the script produces is significantly more complex than the output of the Vagrant dynamic inventory script. The reason for this is that, instead of just listing the hosts, the EC2 script will group the hosts according to certain criteria (that again can be selected in ec2.ini), for instance availability zone, region, AMI, platform, VPC and so forth. This allows you to target for instance all Linux boxes, all machines running in a specific data center and so forth. If you tag your machines, you will also find that the script groups the hosts by tags. This is very useful and allows you to differentiate between different types of machines (e.g. database servers, web servers), or different stages. The script also attaches certain characteristics as variables to each host.

In addition to inventory scripts, there is a second mechanism to dynamically change an inventory – the add_host module, which modifies the copy of the inventory that Ansible builds in memory at startup (and thus makes changes that are only valid for this run). This is mostly used when we use Ansible to actually bring up hosts.

To demonstrate this, we will use a slightly different setup as we have used so far. Recall that Ansible can work with any host on which Python is installed and which can be accessed via SSH. Thus, instead of spinning up a virtual machine, we can as well bring up a container and use that to simulate a host. To spin up a container, we can use the Ansible module docker_container that we execute on the control machine, i.e. with the pseudo-group localhost, which is present even if the inventory is empty. After we have created the container, we add the container dynamically to the inventory and can then use it for the remainder of the playbook.

To realize this setup, the first thing which we need is a container image with a running SSH daemon. As base image, we can use the latest version of the official Ubuntu image for Ubuntu bionic. I have created a Dockerfile which, based on the Ubuntu image, installs the OpenSSH server and sudo, creates a user ansible, adds the user to the sudoer group and imports a key pair which is assumed to exist in the directory.

Once this image has been built, we can use the following playbook to bring up a container, dynamically add it to the inventory and run ping on it to test that the setup has worked. Note that this requires that the docker Python module is installed on the control host.

---
  # This is our first play - it will bring up a new Docker container and register it with
  # the inventory
  - name: Bring up a Docker container that we will use as our host and build a dynamic inventory
    hosts: localhost
    tasks:
    # We first use the docker_container module to start our container. This of course assumes that you
    # have built the image node:latest according to the Dockerfile which is distributed along with
    # this script. We use bridge networking, but do not expose port 22
    - name: Run Docker container
      docker_container:
        auto_remove: yes
        detach: yes
        name: myTestNode
        image: node:latest
        network_mode: bridge
        state: started
      register: dockerData
    # As we have chosen not to export the SSH port, we will have to figure out the IP of the container just created. We can
    # extract this from the return value of the docker run command
    - name: Extract IP address from dockerData
      set_fact:
        ipAddress:  "{{ dockerData['ansible_facts']['docker_container']['NetworkSettings']['IPAddress'] }}"
    # Now we add the new host to the inventory, as part of a new group docker_nodes
    # This inventory is then valid for the remainder of the playbook execution
    - name: Add new host to inventory
      add_host:
        hostname: myTestNode
        ansible_ssh_host: "{{ ipAddress }}"
        ansible_ssh_user: ansible
        ansible_ssh_private_key_file: "./ansible"
        groups: docker_nodes
  #
  # Our second play. We now ping our host using the newly created inventory
  #
  - name: Ping host
    hosts: docker_nodes
    become: yes
    tasks:
    - name: Ping  host
      ping:

If you want to run this example, you can download all the required files from my GitHub repository, build the required Docker image, generate the key and run the example as follows.

pip install docker
git clone https://www.github.com/christianb93/ansible-samples
cd ansible-samples/partV
./buildContainer
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook docker.yaml

This is nice, and you might want to play with this to create additional containers for more advanced test setups. When you do this, however, you will soon realize that it would be very beneficial to be able to execute one task several times, i.e. to use loops. Time to look at control structures in Ansible, which we do in the next post.