Virtual networking labs – virtual Ethernet networks with VLAN tags

In the previous posts, we have mainly been looking at virtual networking within a single physical host. This is nice, but to build cloud environments, we need to establish virtual networks that span several physical hosts. In this post, we will start to look into technologies that make this possible and learn how VLAN tagging supports virtual Ethernet networks.

An introduction to virtual Ethernet networks

Today, essentially every Ethernet network you will come across is a switched network, where every server is more or less directly connected to a switch, and the switches are connected to each other to propagate traffic through your data center. A naive approach would be to use layer 2 switches to combine all Ethernet networks into one large broadcast domain, where every node is connected to every other node by a sequence of switches. This approach, however, creates a very large broadcast domain and is difficult to maintain, as changes to the topology require a physical rearrangement. It might therefore be beneficial to have some way of dividing your physical Ethernet network into two or more logical (“virtual”) networks.

For servers that are connected to the same switch, this can be implemented by an approach known as port-based VLAN. To illustrate the idea, let us look at the following configuration, where four servers are connected to four different ports of one switch.

SwitchFlag

With this setup, a broadcast issued by one server will reach every other server, and all servers are part of one Ethernet network. To introduce virtualization, we could simply add some logic to the switch to divide the ports into two sets, where forwarding of Ethernet frames is only done within those two sets. If, for instance, we define one set to consist of the two ports connected to server 1 and server 2 (green), and the other to consist of the remaining two ports (red), and configure the switch such that it will only forward frames between ports with the same color, we will effectively have established two virtual networks.

SwitchVirtualNetworks

This is nice, as – if your switch supports it – no additional hardware is required and you can define and change the configuration entirely in software. But there is a problem. Typically, your data center will have more than one switch. How can you extend these virtual networks across multiple switches? Of course, you could add a separate connection between any two switches for every virtual network, but this would blow up your hardware requirements and again require hardware changes whenever the configuration changes. To avoid this, a technology called VLAN trunking is needed.

With VLAN trunking, different virtual LANs (VLANs) can share the same physical connection. To enable this, Ethernet frames that travel on this shared part of your infrastructure are enhanced by adding a VLAN tag which contains a numerical ID identifying the VLAN to which they belong, as indicated in the following diagram.

VLANTrunking

Here, we have two switches, which both use port-based virtual networks as just discussed. The upper two ports of each switch belong to the green network which is assigned the ID 1 (VLAN ID or VID – note that in reality, this ID is often reserved) and the other set of ports is part of VLAN 2 (the red network). When a frame leaves, for instance, the server in the upper left corner and needs to be forwarded to the server in the upper right corner, the switch will add a VLAN tag to indicate that this frame is part of VLAN 1. The frame then travels across the connection between the two switches. When the switch on the right hand side receives the frame, it strips off the VLAN tag again and, based on the tag, injects the frame back into its own VLAN 1, so that it can only reach the green ports on the right hand side.

Thus your network is divided into two parts. In the middle, on the connection between the two switches, frames carry the VLAN tag to flag them as being part of the red or green network. The ports facing this part of the network need to be aware of the VLAN tag – these ports are often called trunk ports. The parts of the network behind the switches, however, never see a VLAN tag, as it is added and removed by the switches when transmitting and receiving on trunk ports. These ports are called access ports. Thus the servers do not need to know to which VLAN they belong, and the configuration can be done entirely on the switches and in software.

The standard that describes all this and also defines how a VLAN tag is added to an Ethernet frame is called IEEE 802.1Q. This standard adds a 16-bit field called TCI – tag control information – to the layout of an Ethernet frame. Four bits of this field are reserved for other purposes, so that 12 bits remain for the VLAN ID, allowing a maximum of 4096 different VLAN IDs.

Lab 8: VLAN networking with Linux

Linux has the capability to create virtual Ethernet devices that are associated with a VLAN network. To see this in action, get lab 8 from my GitHub repository and run it.

git clone https://github.com/christianb93/networking-samples
cd networking-samples/lab8
vagrant up

Vagrant will now use the Vagrantfile and the three Ansible playbooks located in this directory to bring up and configure three virtual machines. Here is a diagram summarizing the network configuration that the scripts create (we will see how this is done manually further below).

VLANLab8

We see that all three machines are connected to one virtual Ethernet cable (we use a VirtualBox internal network for that purpose). The three interfaces attached to this network are configured as part of the IP network 192.168.50.0/24.

However, in addition, we have set up two virtual networks – one network with VLAN ID 100 (green), and a second network with VLAN ID 200 (red). In each Linux machine, every virtual network to which the machine is attached is represented by a virtual device called a VLAN device.

Let us look at boxA to see how this works. On boxA, the Ansible playbook that was executed during the vagrant up ran the following command

vconfig add enp0s8 100

This command creates a new network interface enp0s8.100 which sits on top of enp0s8 but is associated with the VID 100. From the point of view of the operating system, this is an ordinary network device, i.e. you can assign IP addresses, add routes and so forth.
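
The vconfig utility used here is part of the legacy vlan package and is considered deprecated on newer distributions. As a sketch (assuming the same device name, the IP address 192.168.60.4 from the lab diagram and a /24 prefix), the equivalent setup with the iproute2 toolchain would look like this.

# Create a VLAN device with VID 100 on top of enp0s8 using iproute2
sudo ip link add link enp0s8 name enp0s8.100 type vlan id 100
sudo ip link set enp0s8.100 up
# Assign the IP address used in the lab (prefix length assumed)
sudo ip addr add 192.168.60.4/24 dev enp0s8.100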

Such a VLAN device operates as follows. When an Ethernet frame arrives on the underlying device, enp0s8 in our case, the kernel checks whether the frame contains a VLAN tag. If not, the frame is processed as usual. If it does, the kernel next checks whether a VLAN device is associated with this VID. If there is one, it strips off the VLAN tag, changes the frame so that it appears to originate from the virtual VLAN device and re-injects the frame into the networking stack. The frame then travels up the stack and can be processed by the higher layers, e.g. the IP layer. Conversely, if a frame needs to be transmitted on enp0s8.100, the kernel adds a VLAN tag with the VID 100 to the frame and redirects it to the physical device enp0s8.
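
To see which VID a VLAN device uses and on which physical device it sits, you can ask the ip tool for details – a quick check that works on most distributions.

# Display device details - for a VLAN device, the output includes the protocol (802.1Q) and the VID
ip -d link show enp0s8.100
# If the 8021q module exposes it, this file lists the VLAN mappings as well
cat /proc/net/vlan/config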

Let us see this in action. Open two SSH connections, one to boxA, and one to boxB – if you use the Gnome terminal, simply run

for i in "A" "B" ; do gnome-terminal -e "vagrant ssh box$i"; done

In boxA, start a tcpdump session on the VLAN device.

sudo tcpdump -e -i enp0s8.100

On boxB, ping boxA, using the IP address 192.168.60.4 (the IP address of the VLAN device). You will see an ordinary frame coming in, with ethertype IPv4. There is no VLAN tag within this frame, and the VLAN device operates like a physical device with no VLAN tagging.

Now, stop the tcpdump session and start it again, but this time, use enp0s8 instead of enp0s8.100, i.e. the underlying physical device. If you now run a ping again, you will see that the ethertype of the incoming packets has changed and is now 802.1Q, indicating that the frames are tagged (tcpdump will also show you the VLAN ID 100).

When you ping boxA from boxB using the IP address 192.168.50.4, the traffic will be as expected, coming in on enp0s8 without any VLAN tag, and will not reach enp0s8.100. Thus even though you have put a VLAN device on top of the physical interface, you can still use the physical interface as usual.

It is instructive to check the ARP cache on boxB using arp -n after the pings have been exchanged. You will see that the MAC address of the enp0s8 device on boxA now appears twice, once with the IP address 192.168.50.4 and once with 192.168.60.4. So the MAC address is shared between the virtual VLAN device and the physical device.

Still, the traffic is separated by the Linux kernel. If, for instance, you try to ping 192.168.70.6 (one of the IP addresses of boxC) from boxA, you will not be successful, because this IP address is on the red network and not reachable from the green network. If you run the ping on boxB, however, it will work, because boxB participates in both virtual networks.

This closes today's lab. In the next lab, we will start to look at a completely different approach to building virtual networks – overlay networks.

Using Ansible with a jump host

For an OpenStack project using Ansible, I recently had to figure out how to make Ansible work with a jump host. After an initial phase of total confusion, I finally found my way through the documentation and various sources and ended up with several working configurations. This post documents what I have learned on that journey to hopefully make your life a bit easier.

Setup

In my previous posts on Ansible, I have used a rather artificial setup – a set of hosts which all expose the SSH port on the network so that we can connect directly. In the real world, hosts are often hidden behind firewalls, and a pattern that you will see frequently is that only one host in a network can be reached directly via SSH – called a jump host or a bastion host – and you need to SSH into all the other hosts from there.

To be able to experiment with this situation, let us first create a lab environment which simulates this setup on Google’s cloud platform (but any other cloud platform that has a concept of a VPC should do as well).

First, we need a project in which our resources will live. For this lab, create a new project called terraform-project with a project ID like terraform-project-12345 (of course, you will not be able to use the exact same project ID as I did, as project IDs are supposed to be unique), for instance from the Google Cloud console under “Manage Resources” in the IAM & Admin tab.

Next, create a service account for this project and assign the role “Compute Admin” to this account (which is definitely not the most restrictive choice and clearly not advisable for a production setup). Create a key for this service account, download the key in JSON format and store it as ~/gcp_terraform_service_account.json.

In addition, you will need a private / public SSH key pair. You can reuse an existing key or create a new one using

ssh-keygen -P "" -b 2048 -t rsa -f ~/.ssh/gcp-default-key

Now we are ready to download and run the Terraform script. To do this, open a terminal on your local PC and enter

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/jumphost
terraform init
terraform apply -auto-approve

When opening the Google Cloud Console after the script has completed, you should be able to verify that two virtual networks with two machines on them have been created, with a topology as summarized by the following diagram.

SSHJumpHostLabSetup

So we see that there is a target host which is connected to a private network only, and a jump host which has a public IP address and is attached to a public network.

One more hint: when playing with SSH, keep in mind that on the Ubuntu images used by GCE, sshguard is installed by default. It monitors the SSH log files and, if something that looks like an attack is identified, inserts a firewall rule into the filter table which blocks all incoming traffic (including ICMP) from the machine from which the suspicious SSH connections came. As playing around with some SSH features might trigger an alert, the Terraform setup script will therefore remove sshguard from the machines upon startup (though there would of course be smarter ways to deal with that, for instance by adding our own IP to the sshguard whitelist).

The SSH ProxyCommand feature

Before talking about SSH and jump hosts, we first have to understand some features of SSH (and when I say SSH here and in the following, I mean OpenSSH) that are relevant for such a configuration. Let us start with the ProxyCommand feature.

In an ordinary setup, SSH will connect to an SSH server via TCP, i.e. it will establish a TCP connection to port 22 of the server and will start to run the SSH protocol over this connection. You can, however, tell SSH to operate differently. In fact, SSH can spawn a process and write to STDIN of that process instead of writing to a TCP connection, and similarly read from STDOUT of this process. Thus we replace the TCP connection as communication channel by an indirect communication via this proxy process. In general, it is assumed that the proxy process in turn will talk to an SSH server in the background. The ProxyCommand flag tells SSH to use a proxy process to communicate with a server instead of a TCP connection and also how to start this process. Here is a diagram displaying the ordinary connection method (1) compared to the use of a proxy process (2).

SSHProxyCommand

To see this feature in action, let us play a bit with this and netcat. Netcat is an extremely useful tool which can establish a connection to a socket or listen on a socket, send its own input to this socket and print out whatever it sees on the socket.

Let us now open a terminal and run the command

nc -l 1234

which will ask netcat to listen for incoming connections on port 1234. In a second terminal window, run

ssh -vvv \
    -o "ProxyCommand nc localhost 1234" \
    test@172.0.0.1

Here the IP address at the end can be any IP address (in fact, SSH will not even try to establish a connection to this IP as it uses the proxy process to communicate with the apparent server). The flag -vvv has nothing to do with the proxy, but just produces some more output to better see what is going on. Finally, the ProxyCommand flag will specify to use nc localhost 1234 as proxy process, i.e. an instance of netcat connecting to port 1234 (and thus to the instance of netcat in our second terminal).

When you run this, you should see a string similar to

SSH-2.0-OpenSSH_7.6p1 Ubuntu-4

on the screen in the second terminal, and in the first terminal, SSH will seem to wait for something. Copy this string and insert it again in the second terminal (in which netcat is running) below this output. At this point, some additional output should appear, starting with some binary garbage and then printing a few strings that seem to be key types.

This is confusing, but actually expected – let us try to understand why. When we start the SSH client, it will first launch the proxy process, i.e. an instance of netcat – let us call this nc1. This netcat instance receives localhost and 1234 as parameters, so it will happily try to establish a connection to port 1234 on the local machine. As our second netcat instance – which we call nc2 – is listening on this port, the connection to nc2 is established.

From the client's point of view, the channel to the SSH server is now established, and the protocol version exchange as described in section 4.2 of RFC 4253 starts. Thus the client sends a version string – the string you see appearing first in the second terminal – to its communication channel. In our case, this is STDIN of the proxy process nc1, which takes that string and sends it to nc2, which in turn prints it on the screen.

The SSH client is now waiting for the server's version string as the response. When we copied that string to the second terminal, we provided it as input to nc2, which in turn sent it to nc1, where it was printed on STDOUT of nc1. This data is seen by the SSH client as coming across the communication channel, i.e. from our fictitious SSH server. The client is happy and continues with the next phase of the protocol – the key exchange (KEX) phase described in section 7 of the RFC. Thus the client sends a list of key types that it supports, in the packet format specified in section 6, and this is the garbage followed by some strings that we see. Nice…

STDIO forwarding with SSH

Let us now continue to study a second feature of OpenSSH called standard input and output tunneling which is activated with the -W switch.

The man page is a bit short at this point, stating that this switch requests that standard input and output on the client be forwarded to host on port over the secure channel. Let us try to make this a bit clearer.

First, when you start the SSH client, it will, like any interactive process, connect its STDOUT and STDIN file descriptors to the terminal. When you use the option -W, no command will be executed on the SSH server; instead, the connection remains open and the SSH server establishes a connection to the remote host and port that you specify along with the -W switch. Any input that you provide to the SSH client then travels across the SSH connection and is fed into that remote connection, whereas anything received from the remote host is sent back via the SSH connection to the client – a bit like a remote version of netcat.

SSHSTDIOTunnel

Again, let us try this out. To do this, we need some host and port combination that we can use with -W and that will provide some meaningful output. I have decided to use httpbin.org which, among other things, will give you back your own IP address. Let us first try this locally. We will use the shell’s built-in printf statement to prepare an HTTP GET request and feed that into netcat, which will connect to httpbin.org, send our request and read the response.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | nc httpbin.org 80
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:20:20 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 45
Connection: Close

{
  "origin": "46.183.103.8, 46.183.103.8"
}

The last part is the actual body data returned with the response, which is our own IP address in JSON format. Now replace our local netcat with its remote version implemented via the SSH -W flag. If you have followed the setup described above, you will have provisioned a remote host in the cloud which we can use as SSH target, and a user called vagrant on that machine. Here is our example code.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | ssh -i ~/.ssh/gcp-default-key -W httpbin.org:80 vagrant@34.89.221.226
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:22:17 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 47
Connection: Close

{
  "origin": "34.89.221.226, 34.89.221.226"
}

Of course, you will have to replace 34.89.221.226 with the public IP address of your cloud instance, and ~/.ssh/gcp-default-key with your private SSH key for this host. We see that this time, the IP address of the host is displayed. What happens is that SSH makes a connection to this host, and the SSH server on the host in turn reaches out to httpbin.org, opens a connection on port 80, sends the string received via the SSH client’s STDIN to the server, gets the response back and sends it back over the SSH connection to the client where it is finally displayed.

TCP/IP tunneling with SSH

Instead of tunneling stdinput / stdoutput through an SSH connection, SSH can also tunnel a TCP/IP connection using the flag -L. In this mode, you specify a local port, a remote host (reachable from the SSH server) and a remote port. The SSH daemon on the server will then establish a connection to the remote host and remote port, and the SSH client will listen on the local port. If a connection is made to the local port, the connection will be forwarded through the SSH tunnel to the remote host.

SSHTunnelTCP

There is a very similar switch -R which establishes the same mechanism, but with the role of client and server exchanged. Thus the client will connect to the specified target host and port, and the server will listen on a local port on the server. Incoming connections to this port will then be forwarded via the tunnel to the connection held by the client.
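
As an illustration of the -R variant (a sketch only – the port number 8022 is an arbitrary choice), the following command would make the SSH daemon on our jump host listen on port 8022 and forward every connection made to that port back through the tunnel to port 22 on the local machine.

# Remote forwarding: sshd on the jump host listens on port 8022 (on its loopback interface by default)
# and forwards incoming connections through the tunnel to port 22 on the client
ssh -i ~/.ssh/gcp-default-key -R 8022:localhost:22 vagrant@34.89.221.226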

Putting it all together – the five methods to use SSH jump hosts

We now have all the tools in our hands to use jump hosts with SSH. It turns out that these tools can be combined in five different ways to achieve our goal (and there might even be more, if you are only creative enough). Let us go through them one by one.

Method one

We could, of course, simply place the private SSH key for the target host on the jump host and then use ssh to run ssh on the jump host.

scp -i ~/.ssh/gcp-default-key \
    ~/.ssh/gcp-default-key \
    vagrant@34.89.221.226:/home/vagrant/key-on-jump-host
ssh -t -i ~/.ssh/gcp-default-key \
  vagrant@34.89.221.226 \
    ssh -i key-on-jump-host vagrant@192.168.178.3

You will have to adjust the IP addresses to your setup – 34.89.221.226 is the public IP address of the jump host, and 192.168.178.3 is the IP address under which our target host is reachable from the jump host. Also note the -t flag which is required to make the inner SSH process feel that it is connected to a terminal.

This simple approach works, but has the major disadvantage that it forces you to store the private key on the jump host. This makes your jump host a single point of failure in your security architecture, which is not a good thing, as this host is typically exposed at a network edge. In addition, this can quickly undermine any serious attempt to establish a central key management in an organisation. So we are looking for methods that allow you to keep the private key on the local host.

Method two

To achieve this, there is a method which seems to be the “traditional” approach that you will find in most tutorials and that uses a combination of netcat and the ProxyCommand flag. Here is the command that we use.

ssh -i ~/.ssh/gcp-default-key \
   -o "ProxyCommand \
     ssh -i ~/.ssh/gcp-default-key vagrant@34.89.221.226\
     nc %h 22" \
   vagrant@192.168.178.3 

Again, you will have to adjust the IP addresses in this example as explained above. When you run this, you should be greeted by a shell prompt on the target host – and we can now understand why this works. SSH will first run the proxy command on the client machine, which in turn will invoke another “inner” SSH client establishing a session to the jump host. In this session, netcat will be started on the jump host and connect to the target host.

We have now established a direct channel from standard input / output of the second SSH client – the proxy process – to port 22 of the target host. Using this channel, the first “outer” SSH client can now proceed, negotiate versions, exchange keys and establish the actual SSH session to the target host.

SSHJumpHostViaNetcat

It is interesting to use ps on the client and the jump host and netstat on the jump host to verify this diagram. On the client, you will see two SSH processes, one with the full command line (the outer client) and a second one, spawned by the first one, representing the proxy command. On the jump host, you will see the netcat process that the SSH daemon sshd has spawned, and the TCP connection to the SSH daemon on the target host established by the netcat process.
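
If you want to reproduce this yourself, commands like the following (run on the jump host while the session from method two is open, with the IP addresses from our setup) should reveal the netcat process and its connection – the exact output will of course vary.

# On the jump host: find the netcat process spawned by sshd ...
ps -ef | grep "[n]c 192.168.178.3"
# ... and the TCP connection it holds towards port 22 of the target host
sudo netstat -tpn | grep 192.168.178.3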

There is one more mechanism being used here – the symbol %h in the proxy command is an example of what the man page calls a token – a placeholder which is replaced by SSH at runtime. Here, %h is replaced by the host name to which we connect (in the outer SSH command!), i.e. the name or IP of the target host. Instead of hardcoding the port number 22, we could also use the %p token for the port number.

Method three

This approach still has the disadvantage that it requires an extra netcat process on the jump host. In order to avoid this, we can use the same approach with a stdin / stdout tunnel instead of netcat.

ssh -i ~/.ssh/gcp-default-key \
  -o "ProxyCommand \
    ssh -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"\
  vagrant@192.168.178.3 

When you run this and then inspect processes and connections on the jump host, you will find that the netcat process has gone, and the connection from the jump host to the target is initiated by a (child process of) the SSH daemon running on the jump host.

SSHJumpHostViaStdioTunnel

This method is described in many sources and also in the Ansible FAQs.

Method four

Now let us turn to the fourth method that is available in recent versions of OpenSSH – a ProxyJump directive that can be used in the SSH configuration. This approach is very convenient when working with SSH configuration files, so let us take a closer look at it. Let us create an SSH configuration file (typically this is ~/.ssh/config) with the following content:

Host jump-host
  HostName 34.89.221.226
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 

Host target-host
  HostName 192.168.178.3
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 
  ProxyJump jump-host

To test this configuration, simply run

ssh target-host

and you should directly be taken to a prompt on the target host. What actually happens here is that SSH looks up the configuration for the host that you specify on the command line – target-host – in the configuration file. There, it will find the ProxyJump directive, referring to the host jump-host. SSH will follow that reference, retrieve the configuration from this host from the same file and use it to establish the connection.

It is instructive to run ps axf in a second terminal on the client after establishing the connection. The output of this command on my machine contains the following two lines.

  49 tty1     S      0:00  \_ ssh target-host
  50 tty1     S      0:00      \_ ssh -W [192.168.178.3]:22 jump-host

So what happens behind the scenes is that SSH will simply start a second session to open a stdin/stdout tunnel, as we have done manually before. Thus the ProxyJump option is nothing but a shortcut for what we have done previously.

The equivalent of the ProxyJump directive in the configuration file is the switch -J on the command line. Using this switch directly without a configuration file does, however, have the disadvantage that it is not possible to specify the SSH key to be used for connecting to the jump host. If you need this, you will have to use the -W option discussed above which will provide the same result.
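
For completeness, here is what a direct invocation with -J looks like (a sketch with the IP addresses from our setup – as just discussed, this only works if SSH can locate a suitable key for the jump host without an explicit -i, for instance via an SSH agent).

# Jump via the bastion host to reach the target host in the private network
ssh -J vagrant@34.89.221.226 vagrant@192.168.178.3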

Method five

Finally, there is another method that we could use – TCP/IP tunneling. On the client, we start a first SSH session that will open a local port with port number 8222 and establish a connection to port 22 of the target host.

ssh -i ~/.ssh/gcp-default-key \
  -L 8222:192.168.178.3:22 \
  vagrant@34.89.221.226

Then, in a second terminal on the client, we use this port as the target for an SSH connection. The connection request will then go through the tunnel, and we will actually establish a connection to the target host, not to the local host.

ssh -i ~/.ssh/gcp-default-key \
    -p 8222 \
    -o "HostKeyAlias 192.168.178.3" \
    vagrant@127.0.0.1

Why do we need the additional option HostKeyAlias here? Without this option, SSH will take the target host specified on the command line, i.e. 127.0.0.1, and use this host name to look up the host key in the database of known host keys. However, the host key it actually receives during the attempt to establish a connection is the host key of the target host. Therefore, the keys will not match, and SSH will complain that the host key is not known. The HostKeyAlias 192.168.178.3 instructs SSH to use 192.168.178.3 as the host name for the lookup, and SSH will find the correct key.

Ansible configuration with jump hosts

Let us now discuss the configuration needed in Ansible to make this work. As explained in the Ansible FAQs, Ansible has a configuration parameter ansible_ssh_common_args that can be used to define additional parameters to be added to the SSH command used to connect to a host. In our case, we could set this variable as follows.

ansible_ssh_common_args:  '-o "ProxyCommand ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'

Here the first few options are used to avoid issues with unknown or changed SSH keys (you can also set them here for the outer SSH connection if this is not yet done in your ansible.cfg file). There are several places where you can set this variable. Like all variables in Ansible, it is a per-host variable which we could set in the inventory or (as the documentation suggests) in a group_vars folder on the file system.
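
For a quick test, you can also pass the same string on the command line using the --ssh-common-args option of the ansible and ansible-playbook commands instead of defining the variable in the inventory – a sketch, where the inventory file name hosts.yaml is just a placeholder for your own inventory.

# Ad-hoc connectivity test, passing the proxy settings on the command line
ansible -i hosts.yaml all -m ping \
  --ssh-common-args='-o "ProxyCommand ssh -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'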

In my repository, I have created an example that uses the technique explained in an earlier post to build an inventory dynamically by capturing Terraform output. In this inventory, we set the ansible_ssh_common_args variable as above to be able to reach our target host via the jump host. To run the example, follow the initial configuration steps as explained above and then do

git clone https://github.com/christianb93/ansible-samples/
cd ansible-samples/jumphost
terraform init
ansible-playbook site.yaml

This playbook will not do anything except running Terraform (which will create the environment if you have not done this yet), capturing the output, building the inventory and connecting once to each host to verify that they can be reached.

There are many more tunneling features offered by SSH which I have not touched upon in this post – X-forwarding, for instance, or device tunneling which creates tun and tap devices on the local and the remote machine and therefore allows us to easily build a simple VPN solution. You might want to play with the SSH man pages (and your favorite search engine) to find out more on those features. Enjoy!

Virtual networking labs – more on bridges

In the previous post, we have seen how a software-defined Linux bridge can be established and how it transparently connects two Ethernet devices. In this post, we will take a closer look at how to set up and monitor bridges and learn how VirtualBox uses bridges for virtual networking.

Lab 6: setting up and monitoring bridges

For this lab, we will start with the setup of lab 5 that we have gone through in the previous post. If you have destroyed your environments again, the easiest way to get back to the point where we left off is to let Vagrant and Ansible do the work. I have created a Vagrantfile and a set of playbooks to take care of this. So simply do

git clone https://github.com/christianb93/networking-samples
cd networking-samples/lab6
vagrant up

to bring up all machines and configure the network interfaces as in my last post. You can then use vagrant ssh to SSH into one of the three virtual machines.

First, let us go through the steps that we have used to set up boxB, the machine on which the bridge is running. Recall that, after installing the bridge-utils package, we used the following sequence of commands.

sudo brctl addbr myBridge
sudo ifconfig enp0s8 promisc 0.0.0.0
sudo ifconfig enp0s9 promisc 0.0.0.0
sudo brctl addif myBridge enp0s8
sudo brctl addif myBridge enp0s9
sudo ifconfig myBridge up

The first command is easy to understand. It uses the brctl command line utility to actually set up a bridge called myBridge.

Next, we re-configure the two devices that we will turn into bridge ports. As explained in chapter 10 of “Understanding Linux network internals”, if an Ethernet frame is received on an interface which has been added to a bridge, the usual processing of the frame (i.e. passing the frame to all registered layer 3 protocol handlers) is skipped, and the frame is handed over to the bridging code. Therefore, it does not make sense to have an IP address associated with our bridge ports enp0s8 and enp0s9 any more. In addition, we need to set the devices into promiscuous mode, i.e. we need to enable them to receive packets which are not directed towards their own Ethernet address. This becomes clear if you look at our network diagram once more.

Bridge

If an Ethernet frame is sent out by boxC, directed towards the interface of boxA, it will have the MAC address of this interface as target address in its Ethernet header. Still, it needs to be picked up by the enp0s9 device on boxB so that it can be handed over to the bridge. If we did not put the device into promiscuous mode, it would drop the frame, as its target MAC address does not match the device's own MAC address (strictly speaking, setting the device into promiscuous mode manually is not really needed, as the Linux kernel will do this automatically when we add the port to the bridge, but we do this here explicitly to highlight this point).

Once we have re-configured our two network devices, we add them to the bridge using brctl addif. We finally bring up the bridge using ifconfig.
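
If you prefer the iproute2 toolchain (brctl and ifconfig are no longer installed by default on some recent distributions), the same bridge can be set up with commands along the following lines – a sketch; recall that the kernel will put the ports into promiscuous mode automatically when they are added to the bridge.

# Equivalent bridge setup using iproute2
sudo ip link add name myBridge type bridge
# Remove any IP configuration from the ports - they now belong to the bridge
sudo ip addr flush dev enp0s8
sudo ip addr flush dev enp0s9
sudo ip link set enp0s8 master myBridge
sudo ip link set enp0s9 master myBridge
sudo ip link set myBridge up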

Let us now look a bit into the details of our bridge. First, recall that a bridge usually operates by learning MAC addresses. For a Linux bridge, this holds as well, and in fact, a Linux bridge maintains a table of known MAC addresses and the ports behind which they are located. To display this table, open an SSH connection to boxB and run

sudo brctl showmacs myBridge

brctl_showmacs

If you look at the output, you will see that the bridge differentiates between local and non-local addresses. A local address is the MAC address of an interface which is attached to the bridge. In our case, these are the two interfaces enp0s9 and enp0s8 that are part of your bridge on boxB. A non-local address is the address of an Ethernet device on the local network which is not directly attached to the bridge. In our example, these are the Ethernet devices enp0s8 on boxA and boxC.

You also see that these entries are ageing, i.e. if no frames from a MAC address that the bridge knows are seen for some time, the entry is dropped and only recreated when the address appears again. The reason for this behaviour is to avoid problems if you reconfigure your physical network so that an Ethernet device that has been part of the network behind port 1 moves into a part of the network which is behind port 2.
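
On systems where the bridge utility from the iproute2 package is available, you can display the same forwarding database there, and you can adjust the ageing time with brctl – a short sketch (the ageing time of 600 seconds is just an example value).

# Show the bridge's forwarding database (local and learned MAC addresses)
bridge fdb show br myBridge
# Optionally change the ageing time (in seconds)
sudo brctl setageing myBridge 600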

You can also monitor the traffic that flows through the bridge. If, for instance, you run a sniffer like tcpdump on boxB using

sudo tcpdump -e -i myBridge

and then create some traffic using for instance ping, you will see that the packets cross the Ethernet bridge.

It is also instructive to run a traceroute on boxA targeted towards boxC. If you do this, you will find that there is no hop between the two devices, again confirming that our bridge operates on layer 2 and behaves like a direct connection between boxA and boxC.

Finally, let us quickly discuss the configuration of the bridge itself. If you look at the configuration using ifconfig myBridge, you will see that the bridge has a MAC address itself, which is the lowest MAC address of all devices added to the bridge (but can also be set manually). In fact, we will see in a second that it is also possible to assign an IP address to a bridge!

This is a bit confusing – after all, a bridge is logically simply a direct connection between its ports, not something which can by itself emit and absorb Ethernet frames. However, on Linux, setting up a bridge also creates a “default port” on the bridge which is handled like any other network device. Technically speaking, the bridge driver is itself a network device driver (implemented here), and you can ask it to transmit frames. I tend to think of the situation as in the following image.

BridgeDefaultPort

When the Linux kernel asks the bridge to transmit a frame, the bridge code will consult its table of known MAC addresses and send the frame to the correct port. Conversely, if a frame is received by any of the two ports enp0s8 or enp0s9 and forwarded to the bridge, the bridge does not only forward the frame to the correct port depending on the destination address, but also delivers the frame to the higher layers of the Linux networking stack if its Ethernet target address matches the MAC address of the bridge (or any of the local MAC address in the table of known MAC addresses).

Let us try this out. In our configuration so far, we have not been able to reach boxB via the bridged network, and, conversely, we could not reach boxA and boxC from boxB (try a ping to verify this). Let us now assign an IP address to the bridge device itself and add a route. On boxB, run

sudo ifconfig myBridge netmask 255.255.0.0 192.168.70.4

which will automatically add a route as well. Now, our network diagram has changed as follows (note the additional IP address on boxB).

BridgeIPAddress

You should now be able to ping boxB (192.168.70.4) from both boxA and boxC and vice versa. This capability allows one to use one Linux host as both an Ethernet bridge and a router at the same time.

Lab 7: bridged networking with VirtualBox

So far, we have used VirtualBox to create virtual machines, and have played with bridges inside these machines. Now we will turn this around and see how conversely, VirtualBox can use bridges to realize virtual networks.

It is tempting to assume that what is called bridged networking in the VirtualBox documentation actually uses bridges. This, however, is no longer the case. Instead, when you define a bridged network with VirtualBox, the vboxnetflt netfilter driver that also featured in our last post will be used to attach a “virtual Ethernet cable” to an existing device, and the device will be set into promiscuous mode so that it can pick up Ethernet frames targeted towards the virtual Ethernet card of the VM and redirect them to the VirtualBox networking engine. Effectively, this exposes the virtual device of the VM to the local network. This is the reason why this mode of operation is called public networking in Vagrant.

BridgedVirtualBoxNetworking

Let us try this out. Again, you can start the test setup using Vagrant. This time, the Vagrantfile contains several machines which we bring up one by one.

git clone https://github.com/christianb93/networking-samples
cd networking-samples/lab7
vagrant up boxA

When you start this script, it will first scan the existing network interfaces on the host and ask you which one it should connect to. Choose the device which connects your machine to the LAN; for me this is eno1, which has the IP address 192.168.178.25 assigned to it.

To run these tests, you need a second machine connected to the same LAN to which your host is connected via the device that we have just used (eno1). In my case, this second machine has the IP address 192.168.178.28. According to the diagram above, this machine should now be able to see our VM on the local network. In fact, all we have to do is to establish the required routes. First, on your second machine, run

sudo route add -net 192.168.0.0 netmask 255.255.0.0 eth0

where eth0 needs to be replaced by the device which this machine uses to connect to the LAN. Now SSH into the virtual machine boxA and set up the corresponding route there.

sudo route add -net 192.168.0.0 netmask 255.255.0.0 enp0s8

In boxA, you should now be able to ping 192.168.178.28, and conversely, in your second machine, you should be able to ping 192.168.50.4. The setup is logically equivalent to the following diagram.

VirtualBoxExposedInterface

Of course this setup is broken as we work with two different subnets / netmasks on the same Ethernet network, but hopefully serves well to illustrate bridged networking with VirtualBox.

Now we stop this machine again, create a bridge on the host and bring up the second and third machine that are used in this lab.

vagrant destroy boxA --force
sudo brctl addbr myBridge
vagrant up boxB
vagrant up boxC

Here, both machines have a network device using the bridged networking mode. The difference to the previous setup, however, is now that the virtual machines are not attached to an existing physical device, but to a bridge, and both are attached to the same bridge.

VirtualBoxBridgedNetworking

This configuration is very flexible and leaves many options. We could, for instance, use an existing bridge created by some other virtualization engine or even Docker to interact with other virtual networks. We could also, as in the previous post, set up forwarding and NAT rules and assign an IP address to the bridge device to use the bridge as a gateway into the LAN. And we can attach additional interfaces like veth and tun/tap devices to the bridge. I invite you to play with this to try out some of these options.

We have now seen some of the typical networking technologies in virtual networks in action. However, there are additional approaches that we have not touched upon yet – network separation using VLAN tags and overlay networks. In the next post, we will start to look at VLANs in order to establish virtual networks on layer 2.

Virtual networking labs – VirtualBox internal networks and bridges

So far, we have been playing with virtual networking for one virtual machine, connected to the host. Now let us see how we can establish virtual networks connecting more than one machine.

Lab 3: VirtualBox host-only networking with more than one machine

In this lab, we will connect two virtual machines that both use host-only networking. To run the example, you can again clone my repository and use the prepared Vagrantfile.

git clone https://github.com/christianb93/networking-samples
cd networking-samples/lab3
vagrant up

This will bring up two virtual machines, boxA and boxB. When both of them are running, use vagrant ssh boxA and vagrant ssh boxB to connect to them.

When we inspect the network on the host, we see nothing which is really unexpected. Again, there is the virtual device vboxnet0 which has an IP address assigned to it, and there is a new entry in the routing table which sends all traffic for the network 192.168.50.0 to this device.

In each virtual machine, the situation is as in the last post. There is a virtual network interface enp0s3 which is connected to the NAT device, and there is a virtual interface enp0s8 which is connected to vboxnet0 via the mechanisms discussed in the previous post. However, the trick is that both machines are actually connected to the same virtual device, as in the following diagram.

HostOnlyNetworkingTwoNodes

So we should expect that the machines can talk to each other via this device, and in fact they can. You should be able to ping boxB as 192.168.50.5 from boxA and similarly boxA as 192.168.50.4 from boxB.

When you run ifconfig -a to get the MAC addresses of the enp0s8 interfaces on both machines and also run arp -n to display the ARP cache, you will see that the MAC address of boxA is known on boxB and vice versa. This demonstrates that the machines can see each other on the Ethernet level, i.e. on layer 2, not only layer 3, as if they were connected to the same Ethernet segment.

ARPResolution
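
If you want to repeat this check yourself, a sequence like the following on boxA should do (the addresses are those used in this lab; ip neigh is the iproute2 equivalent of arp -n).

# On boxA: generate some traffic towards boxB, then inspect MAC and ARP information
ping -c 3 192.168.50.5
ifconfig -a
arp -n
ip neigh show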

Again, the virtual device has a MAC and an IP address and can be reached from the host. Via the route for the network 192.168.50.0 pointing to it, we can also reach both virtual machines from the host as in the case of an individual machine as before. So we could summarize the host-only network as a virtual network to which the machines are attached and which is also connected to the host networking stack.

Lab 4: VirtualBox internal networking

Host-only networking is very useful for many purposes, but sometimes you want a virtual network that is completely separated from the host network. This is what VirtualBox internal networking provides.

This networking option does not require the virtual device vboxnet0, and to verify this, let us first remove it. To do this, open the VirtualBox GUI by running virtualbox, navigate to “Global Tools -> Host Network Manager”, locate vboxnet0 in the list and remove it.

Now let us bring up the virtual machines using Vagrant. If you have not yet done so, run vagrant destroy to complete lab3. Then switch to lab4, start Vagrant there and open two additional terminals with SSH sessions on the machines.

cd ../lab4
vagrant up
gnome-terminal -e 'vagrant ssh boxA' ;   gnome-terminal -e 'vagrant ssh boxB'

When you inspect the virtual machines, the situation is very similar to what we have seen in lab3, when we connected two machines with a host-only network.

  • Each machine has two interfaces, enp0s3 (the NAT interface) and enp0s8 (the internal networking interface)
  • Each machine has a route for the network 192.168.50.0 pointing to enp0s8
  • The machines can see each other as 192.168.50.4 and 192.168.50.5
  • If you ping the machines and then inspect the ARP cache, you will again find that the MAC address of the respective other machine is stored in the cache, indicating that the machines appear to be on the same Ethernet network

There is, however, a difference on the host. There is no additional virtual networking device being created, and there is no additional routing table entry on the host (nor any local routing table entry). Thus, the new network to which the machines are attached is actually completely isolated from the host network.

VirtualBoxInternalNetworking
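
A quick way to convince yourself of this isolation is to run the following commands on the host – assuming you have removed vboxnet0 as described above, neither of them should produce any output related to the 192.168.50.0 network.

# On the host: look for a vboxnet device and a route for the internal network
ifconfig -a | grep vboxnet
route -n | grep 192.168.50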

We have now considered host-only networking, NAT networking and internal networking in some detail. However, VirtualBox offers a couple of additional networking models. A model which is used similarly by other hypervisors like KVM is bridged networking. To get a feeling for this, we will first study Linux bridging in some detail before starting to see how VirtualBox applies this.

Lab 5: Linux bridging basics

In this lab, we will use a Linux bridge to connect two Ethernet networks and gain a basic understanding of bridges.

A Linux bridge is essentially the virtual equivalent of a classical, physical Ethernet bridge. Recall that a bridge connects Ethernet networks on the link layer level. A bridge device has several ports, and is able to direct Ethernet frames entering in one port to the correct outgoing port to forward the packet into the part of the network where the target address is located. Most bridges are able to learn which MAC addresses are behind which port in order to operate efficiently.

Linux bridges are similar. They are virtual network devices to which you can attach other devices. They will then pick up traffic flowing into the bridge from one of these devices, evaluate the Ethernet address of the target and forward the packet to the respective target device (assuming that this is attached as well).

Let us see this in action. For this lab, I have created a configuration which has three virtual machines. Two of them are connected to a private network myNetworkA, two of them are connected to a second private network myNetworkB, and they all have a NAT device for SSH access.

Lab5Setup

Now, in this configuration, there is no way for boxC to reach boxA, because the networks myNetworkA and myNetworkB are completely isolated. Let us now set up a bridge to change this. Before we do this, however, we need to change a setting within VirtualBox. VirtualBox allows us to specify, per network interface, whether switching this device into promiscuous mode should be allowed. For a bridge, we need this, because the Ethernet devices attached to the bridge should receive packets which are directed towards any other port on the bridge. If the VirtualBox setting is not changed, putting the devices into promiscuous mode on the OS level will silently fail, and the bridge will not work (I had a bit of a hard time figuring this out, until I found this post in the VirtualBox forum). To change this setting, run the following commands on the host machine.

vm=$(vboxmanage list vms | grep "boxB" | awk '{print $1}' | sed s/\"//g)
vboxmanage controlvm $vm nicpromisc2 allow-all
vboxmanage controlvm $vm nicpromisc3 allow-all

Now we set up the actual bridge on boxB. Switch into boxB and enter the following commands

sudo apt-get update
sudo apt-get install bridge-utils
sudo brctl addbr myBridge
sudo ifconfig enp0s8 promisc 0.0.0.0
sudo ifconfig enp0s9 promisc 0.0.0.0
sudo brctl addif myBridge enp0s8
sudo brctl addif myBridge enp0s9
sudo ifconfig myBridge up
# check that interfaces are in promiscuous mode
ifconfig -a

On boxA, run

sudo ifconfig enp0s8 netmask 255.255.0.0 192.168.50.4

And finally, enter the following commands on boxC:

sudo ifconfig enp0s8 netmask 255.255.0.0 192.168.60.4
ping 192.168.50.4

Let us see the bridge in action by dumping the traffic on the bridge device on boxB. To do this, switch to boxB and enter

sudo tcpdump -e -vvv -i myBridge

Then, in either boxA or boxC, try to ping the other machine. You should see the ICMP packages moving forth and back along the bridge. When you run arp -n on boxA and boxC, you will also see that each host knows the other host on the Ethernet level, i.e. the bridge did actually implement a connection on layer 2 (as opposed to an IP-based router which operates on layer 3). Thus with the bridge in place, the network now looks as follows.

Bridge

To summarize, a virtual Linux bridge does exactly what a traditional switch in hardware does – it connects two Ethernet networks transparently on the Ethernet layer. But there is more to it, and in the next post, we will dig a bit deeper into how this works and how it can be applied in the context of virtualization.

Virtual networking labs – NAT and host-only networking with VirtualBox

When you work with virtualized environments, you will sooner or later realize that a large part of the complexity of such environments originates in the networking part. Networking itself is a non-trivial endeavor, and in the context of cloud and virtualization technology, you often stack different virtualization layers on top of each other. To provide the basics to understand all this, this series aims at introducing some of the more commonly used techniques using hands-on exercises.

Setup

To follow this series, I highly recommend running the examples yourself. For that purpose, you will need Vagrant and VirtualBox installed on your machine, as we use them for most of the examples. We will also use Docker at times, so this should be installed as well.

The setup of most examples is automated, using tools like Vagrant or Ansible that you will know if you have followed some earlier posts on this blog. The labs are stored in a GitHub repository that you should clone by running

git clone https://github.com/christianb93/networking-samples

on your machine.

Lab 1: NAT networking with VirtualBox

When you take a look at the networking options of Linux-based virtualization solutions like KVM, Xen or VirtualBox, you will find that certain networking modes tend to be common to all of them. First, there is typically a networking mode based on network address translation (NAT) to allow access to the internet from within the virtual machine. Then, there are networking modes which allow you to connect one or more virtual machines using software-emulated Ethernet bridges. This can be combined with VLANs, the usage of routing tables or iptables firewall rules to realize advanced networking topologies. And finally, all these methods can be combined in a variety of different setups. Networking for VirtualBox is comparatively easy to understand, but still displays some of these ideas nicely, which is why I have chosen VirtualBox as an example hypervisor. The first networking mode that we will look at is called NAT networking and is actually the VirtualBox default.

To see this in action, switch to the lab1 directory and run Vagrant to bring up the example machine, then use Vagrant to SSH into the machine.

cd networking-samples/lab1
vagrant up
vagrant ssh

When you run this for the first time, Vagrant might have to download the used Ubuntu disk image, which might take a few minutes. Once you are logged into the machine, run ifconfig -a to get a list of all network devices.

You will find that there are two networking devices. First, there is of course the standard loopback device lo which is present on every Linux system. Then, there is an interface enp0s3 which looks like an ordinary Ethernet device (but is of course a virtual device). This device has a MAC address and an IP address assigned to it, usually 10.0.2.15.

When you run route -n to list the content of the kernel routing table, you will find that this is the default interface for outgoing traffic, with the gateway IP address being 10.0.2.2. We can try this out – run

ping leftasexercise.com

to verify that you can actually reach servers on the Internet via this device.

How does this work? When an application within the virtual machine sends a TCP/IP packet to the virtual device, VirtualBox picks up the packet and performs a network address translation on it. It then forwards the resulting packet to the network on the host system. When the answer comes back, the reverse process is applied and to the application, it looks like the reply came from a real network device. In this way, we can reach any host which is also reachable from the host – including the host itself and any other virtual networks reachable from the host.

Let us try this out. On the host, start an NGINX container and determine its IP address.

docker run -d --rm --name=nginx  nginx:latest
docker inspect nginx | jq -r ".[0].NetworkSettings.IPAddress"

Let us suppose that the result is 172.17.0.2. Now switch back into the virtual machine and run

curl 172.17.0.2

and you should see the NGINX welcome page. To see the NAT’ing in action run

sudo netstat -t  -c -p

on the host and then run

telnet 172.17.0.2 80

inside the virtual machine to establish a long running connection to the NGINX server. When you stop the output of netstat and browse through it, you should find a connection established by the VBoxHeadless process that connects to port 80 on 172.17.0.2. What happens is that when we run the telnet command inside the virtual machine, VirtualBox will open a socket on the host machine and use that to connect to the target, similar to a NAT’ing device which proxies outgoing connections. So if you wanted to represent the setup diagrammatically, the result would be something like this.

NATNetworking

By the way, if you are asking yourself how the configuration of the network within the virtual machine has worked, take a look at the file /etc/netplan/50-cloud-init.yaml inside the virtual machine – here we see that the configuration is done by cloud-init and that the IP address is obtained using a DHCP server, which again is emulated by VirtualBox.

But wait, there is still a problem. If we are conceptually behind a gateway, this implies that the virtual machine cannot be reached from the host network. But how can we then SSH into it? The answer is that VirtualBox (respectively Vagrant) has created a port mapping for us, similar to an incoming forwarding rule that you would configure in a classical gateway. Let us try to print out this rule using the VirtualBox machine manager. First, we retrieve the name of the machine that Vagrant has created for us, place it in an environment variable and then invoke the VMM again to list some details, which we search for forwarding rules.

vm=$(vboxmanage list vms | grep "boxA" | awk '{print $1}' | sed s/\"//g)
vboxmanage showvminfo --machinereadable $vm  | grep "Forwarding"

In fact, we see that there is a forwarding rule that directs incoming traffic on port 2222 from the host to port 22 (SSH) in the virtual machine where the SSH daemon is listening. This makes it possible to reach the machine via SSH.
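
You can also use this forwarding rule directly, bypassing the vagrant ssh wrapper – a sketch, assuming that the private key generated by Vagrant is located at its usual place below .vagrant (the exact path and port can be displayed with vagrant ssh-config).

# Connect to the VM via the port forwarding rule on the host
ssh -p 2222 -i .vagrant/machines/boxA/virtualbox/private_key vagrant@127.0.0.1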

Lab 2: host-only networking

Next, we try a slightly different combination. We will bring up a virtual machine with two network devices, one using NAT as before, and one using host-only networking, or, in Vagrant terminology, private networking.

To run this example, first shut down your existing lab, then switch over to lab2 and restart Vagrant from there.

vagrant destroy
cd ../lab2/
vagrant up

The first thing that you will realize by running ifconfig -a on the host is that VirtualBox has actually created a new networking device vboxnet0 with IP address 192.168.50.1 on the host. When you run ethtool -i on this device, you will see that this device is managed by a custom driver which comes with VirtualBox (see source code here). On the host, VirtualBox has also added a new route, sending all traffic for the network destination 192.168.50.0 to this device.
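
To reproduce these observations, you can run commands along the following lines on the host.

# Inspect the new device, the driver behind it and the route added by VirtualBox
ifconfig vboxnet0
ethtool -i vboxnet0
route -n | grep 192.168.50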

When you log into the machine and run ifconfig -a, you will see that inside the machine, a new interface enp0s8 with IP address 192.168.50.4 is visible. This is the newly created host-only virtual networking device. Internally, VirtualBox captures traffic sent to this device and re-routes it to the vboxnet0 device and vice versa. Graphically, this looks as follows.

HostOnlyNetworking

Let us briefly discuss how packets travel across this interface. First, inside the virtual machine, a new route has been added, sending traffic for the network 192.168.50.0 to this device. To test this route, let us first get rid of the NAT interface to have a clearer picture. To do this, we again use the VirtualBox machine manager.

vm=$(vboxmanage list vms | grep "boxA" | awk '{print $1}' | sed s/\"//g)
vboxmanage controlvm $vm setlinkstate1 off

If you have used vagrant ssh to SSH into the machine, this will of course kill your connection, as that connection uses the port forwarding rule associated with the NAT device. But we can easily get it back, and in doing so verify our first route, by using the IP address 192.168.50.4 to SSH into the machine from our host. This should work, as on the host we have a route to this destination via vboxnet0. We only need the private SSH key file that Vagrant has created as part of the provisioning process; running vagrant ssh-config will show you that it is stored at .vagrant/machines/boxA/virtualbox/private_key. So we can run

ssh -o StrictHostKeyChecking=no -i .vagrant/machines/boxA/virtualbox/private_key vagrant@192.168.50.4

and should be back in our machine. Thus we can actually reach the machine from the host using vboxnet0. To verify that the reverse process also works, let us again bring up our Docker container for NGINX, but this time, we use port forwarding to bind it to a port on the host.

docker run -d --rm --name=nginx  -p 80:80 nginx:latest

This will of course only work if you do not already have a webserver bound to port 80 on the host – but if there is one, the next step will work just as well, as all we need is something listening on that port. If you now switch back to the virtual machine and run

curl 192.168.50.1

you will again see the NGINX default page.

It is also instructive to look at the ARP caches on both machines. First, on the host, when running arp -n, we see an entry for the MAC address of the enp0s8 interface registered with the outgoing interface vboxnet0. So on layer 2, the traffic seems to flow transparently between enp0s8 on the virtual machine and the vboxnet0 device on the host. When you run arp on the virtual machine, the picture is reversed, and we see an entry showing us that the MAC address of vboxnet0 is reachable via enp0s8.

How does all this work? First, let us see what happens when we try to reach 192.168.50.4 from the host, and let us start our investigation by looking at the source code of the VirtualBox network driver.

Like every network driver, the VirtualBox network driver has a function hard_start_xmit which is responsible for the actual transmission of a frame. When you look at the source code of this driver, you will see that this function does nothing except update the statistics. Logically, this means that the device points “into nowhere”. But how can the packet then reach the virtual machine?

This is where for me, things start to become a bit blurry, but I believe that the answer is hidden in the concept of a local route (ip_fib_local_table in the kernel). The local routing table is maintained by the Linux kernel, and when a network device comes up, an entry is added to it automatically. To inspect the table in our case, enter

ip route show table local

on the host. This should yield, among others, an entry for the destination 192.168.50.1 of type local. The presence of this entry means that when delivering IP packets to this destination, the hard_start_xmit function of the device is never actually invoked; instead, the packet is injected back into the kernel’s IP stack (see for instance chapter 35 of “Understanding Linux network internals” by C. Benvenuti), as if it had come in via vboxnet0. Thus, effectively, the device acts as a loopback device.
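
For illustration, the relevant entry should look roughly like this (the exact wording depends on your kernel version).

local 192.168.50.1 dev vboxnet0 proto kernel scope host src 192.168.50.1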

When the packet is picked up again on the IP layer, one of the first things that happens is that the netfilter mechanism is invoked. VirtualBox comes with an additional kernel module VBoxNetFlt that attaches itself to the virtual device vboxnet0 (look at the output of dmesg) and seems to divert traffic to and from the virtual network device so that it is processed by VirtualBox. Understanding the details of this mechanism is beyond my own expertise, but conceptually, this seems to be what is happening.

Combining host-only networking with LAN access

Before we close this post, let us try one more thing. We have seen that the virtual device vboxnet0 allows us to connect to the host network. As a Linux host can serve as a router, it should therefore also be possible to connect to the outside world. So let us pick some server on your LAN, for instance the router that you use to connect to the Internet. In my home network, the router is at 192.168.178.1, reachable from the host via the device eno1. The first thing that we have to do is to add a new default route inside the VM, as we have disconnected the NAT device to which the old default route was pointing. So in the VM, enter

sudo route add default gw 192.168.50.1 enp0s8

Next, we have to prepare the host to enable forwarding. First, we enable forwarding globally in the kernel. Then, we set up a set of forwarding rules. As my router is connected to the device eno1, we first allow all new connections from the virtual device to this device, using the conntrack matching extension.

sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
sudo iptables -A FORWARD -o eno1 -i vboxnet0 -s 192.168.50.0/24 -m conntrack --ctstate NEW -j ACCEPT

Next, we need to make sure that the reply is allowed back into the system, so we set up a rule that will enable forwarding for all established connections.

sudo iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

We also need to enable IP masquerading so that the reply is directed back towards the host. The following two commands will first flush the POSTROUTING chain of the nat table (this might not be needed, and since it might interfere with existing rules, you may want to try without this command first) and then add a rule that will enable masquerading (i.e. replacing the source IP address by the address of the outgoing interface) for all traffic going out via eno1.

sudo iptables -t nat -F POSTROUTING
sudo iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE

This is already sufficient to reach hosts in the LAN and globally using IP addresses. However, DNS lookups will be broken in the virtual machine. To fix this, edit the file /etc/systemd/resolved.conf in the virtual machine and change the line

#DNS=

into

DNS=192.168.178.1

or whatever your preferred IP address is. Then pick up the configuration by running

sudo systemctl restart systemd-resolved

and DNS resolution should work again.

In this post, we have covered the basics of host-only networking and played a bit with only one virtual machine involved. However, with host-only networking, we can do more – we can also connect more than one virtual machine to the same virtual network. We will look into this in detail in the next post.

Using Terraform and Ansible to manage your cloud environments

Terraform is one of the most popular open source tools to quickly provision complex cloud based infrastructures, and Ansible is a powerful configuration management tool. Time to see how we can combine them to easily create cloud based infrastructures in a defined state.

Terraform – a quick introduction

This post is not meant to be a full-fledged Terraform tutorial, and there are many sources out there on the web that you can use to get familiar with it quickly (the excellent documentation itself, for instance, which is quite readable and quickly gets to the point). In this section, we will only review a few basic facts about Terraform that we will need.

When using Terraform, you declare the to-be state of your infrastructure in a set of resource definitions using a declarative language known as HCL (Hashicorp configuration language). Each resource that you declare has a type, which could be something like aws_instance or packet_device, which needs to match a type supported by a Terraform provider, and a name that you will use to reference your resource. Here is an example that defines a Packet.net server.

resource "packet_device" "web1" {
  hostname         = "tf.coreos2"
  plan             = "t1.small.x86"
  facilities       = ["ewr1"]
  operating_system = "coreos_stable"
  billing_cycle    = "hourly"
  project_id       = "${local.project_id}"
}

In addition to resources, there are a few other things that a typical configuration will contain. First, there are providers, which are an abstraction for the actual cloud platform. Each provider is enabling a certain set of resource types and needs to be declared so that Terraform is able to download the respective plugin. Then, there are variables that you can declare and use in your resource definitions. In addition, Terraform allows you to define data sources, which represent queries against the cloud providers API to gather certain facts and store them into variables for later use. And there are objects called provisioners that allow you to run certain actions on the local machine or on the machine just provisioned when a resource is created or destroyed.

An important difference between Terraform and Ansible is that Terraform also maintains a state. Essentially, the state keeps track of the actual state of the infrastructure and maps the Terraform resources to actual resources on the platform. If, for instance, you define an EC2 instance my-machine as a Terraform resource and ask Terraform to provision that machine, Terraform will capture the EC2 instance ID of this machine and store it as part of its state. This allows Terraform to link the abstract resource to the actual EC2 instance that has been created.

State can be stored locally, but this is not only dangerous, as a locally stored state is not protected against loss, but also makes working in teams difficult, as everybody who uses Terraform to maintain a certain infrastructure needs access to the state. Therefore, Terraform offers backends that allow you to store state remotely, including PostgreSQL, S3, Artifactory, GCS, etcd or a Hashicorp-managed service called Terraform cloud.

Slide1

A first example – Terraform and DigitalOcean

In this section, we will see how Terraform can be used to provision a droplet on DigitalOcean. First, obviously, we need to install Terraform. Terraform comes as a single binary in a zipped file. Thus installation is very easy. Just navigate to the download page, get the URL for your OS, run the download and unzip the file. For Linux, for instance, this would be

wget https://releases.hashicorp.com/terraform/0.12.10/terraform_0.12.10_linux_amd64.zip
unzip terraform_0.12.10_linux_amd64.zip

Then move the resulting executable somewhere into your path. Next, we will prepare a simple Terraform resource definition. By convention, Terraform expects resource definitions in files with the extension .tf. Thus, let us create a file droplet.tf with the following content.

# The DigitalOcean oauth token. We will set this with the
#  -var="do_token=..." command line option
variable "do_token" {}

# The DigitalOcean provider
provider "digitalocean" {
  token = "${var.do_token}"
}

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
  image  = "ubuntu-18-04-x64"
  name   = "droplet0"
  region = "fra1"
  size   = "s-1vcpu-1gb"
  ssh_keys = []
}

When you run Terraform in a directory, it will pick up all files with that extension that are present in this directory (Terraform treats directories as modules, with a hierarchy starting with the root module). In our example, we make a reference to the DigitalOcean provider, passing the DigitalOcean token as an argument, and then declare one resource of type digitalocean_droplet called droplet0.

Let us try this out. When we use a provider for the first time, we need to initialize this provider, which is done by running (in the directory where your resource definitions are located)

terraform init

This will detect all required providers and download the respective plugins. Next, we can ask Terraform to create a plan of the changes necessary to realize the to-be state described in the resource definition. To see this in action, run

terraform plan -var="do_token=$DO_TOKEN"

Note that we use the command line switch -var to define the value of the variable do_token which we reference in our resource definition to allow Terraform to authorize against the DigitalOcean API. Here, we assume that you have stored the DigitalOcean token in an environment variable called DO_TOKEN.

When we are happy with the output, we can now finally ask Terraform to actually apply the changes. Again, we need to provide the DigitalOcean token. In addition, we also use the switch -auto-approve, otherwise Terraform would ask us for a confirmation before the changes are actually applied.

terraform apply -auto-approve -var="do_token=$DO_TOKEN"

If you have a DigitalOcean console open in parallel, you can see that a droplet is actually being created, and after less than a minute, Terraform will complete and inform you that a new resource has been created.

We have mentioned above that Terraform maintains a state, and it is instructive to inspect this state after a resource has been created. As we have not specified a backend, Terraform will keep the state locally in a file called terraform.tfstate that you can look at manually. Alternatively, you can also use

terraform state pull

You will see an array resources with – in our case – only one entry, representing our droplet. We see the name and the type of the resource as specified in the droplet.tf file, followed by a list of instances that contains the (provider specific) details of the actual instances that Terraform has stored.
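
To give you an idea what to expect, here is a heavily abbreviated sketch of the structure of such a state file – most fields are omitted and the values are placeholders, the actual content depends on the provider and the Terraform version.

{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "digitalocean_droplet",
      "name": "droplet0",
      "instances": [
        {
          "attributes": {
            "name": "droplet0",
            "ipv4_address": "<IP address of the droplet>"
          }
        }
      ]
    }
  ]
}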

When we are done, we can also use Terraform to destroy our resources again. Simply run

terraform destroy -auto-approve -var="do_token=$DO_TOKEN"

and watch how Terraform brings down the droplet again (and removes it from the state).

Now let us improve our simple resource definition a bit. Our setup so far did not provision any SSH keys, so to log into the machine, we would have to use the root password that DigitalOcean will have mailed to you. To avoid this, we need to specify an SSH key when creating the droplet, and for that, we have to determine the corresponding SSH key ID. First, we define a Terraform variable that will contain the key name and add it to our droplet.tf file.

variable "ssh_key_name" {
  type = string
  default = "do-default-key"
}

This definition has three parts. Following the keyword variable, we first specify the variable name. We then tell Terraform that this is a string, and we provide a default. As before, we could override this default using the switch -var on the command line.

Next, we need to retrieve the ID of the key. The DigitalOcean plugin provides the required functionality as a data source. To use it, add the following to our resource definition file.

data "digitalocean_ssh_key" "ssh_key_data" {
  name = "${var.ssh_key_name}"
}

Here, we define a data source called ssh_key_data. The only argument to that data source is the name of the SSH key. To provide it, we use the template mechanism that Terraform provides to expand variables inside a string to refer to our previously defined variable.

The data source will then use the DigitalOcean API to retrieve the key details and store them in memory. We can now access the SSH key to add it to the droplet resource definition as follows.

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
  ...
  ssh_keys = [ data.digitalocean_ssh_key.ssh_key_data.id ]
}

Note that the variable reference consists of the keyword data to indicate that it has been populated by a data source, followed by the type and the name of the data source and finally the attribute that we refer to. If you now run the example again, you will get a machine with an SSH key, so that you can access it via SSH as usual (run terraform state pull to get the IP address).

Having to extract the IP address manually from the state file is not nice; it would be much more convenient if we could ask Terraform to provide this as an output. So let us add the following section to our file.

output "instance_ip_addr" {
  value = digitalocean_droplet.droplet0.ipv4_address
}

Here, we define an output called instance_ip_addr and populate it by referring to a data item which is exported by the DigitalOcean plugin. Each resource type has its own list of exported attributes, which you can find in the documentation, and you can only refer to attributes from this list. If we now run Terraform again, it will print the output upon completion.

We can also create more than one instance at a time by adding the keyword count to the resource definition. When defining the output, we will now have to refer to the instances as an array, using the splat expression syntax. It is also advisable to move the entire instance configuration into variables and to split out variable definitions into a separate file.
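
As a sketch of how this could look (the variable and resource names below are my choice and not part of the example above), here is a configuration that brings up a configurable number of droplets and outputs all of their IP addresses.

# Number of droplets to create, can be overridden with -var="machine_count=..."
variable "machine_count" {
  type    = number
  default = 2
}

resource "digitalocean_droplet" "droplet" {
  count  = var.machine_count
  image  = "ubuntu-18-04-x64"
  name   = "droplet${count.index}"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}

# Splat expression referring to all instances of the resource
output "instance_ip_addrs" {
  value = digitalocean_droplet.droplet[*].ipv4_address
}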

Using Terraform with a non-local state

Using Terraform with a local state can be problematic – it could easily get lost or overwritten and makes working in a team or at different locations difficult. Therefore, let us quickly look at an example to set up Terraform with non-local state. Among the many available backends, we will use PostgreSQL.

Thanks to docker, it is very easy to bring up a local PostgreSQL instance. Simply run

docker run --name tf-backend -e POSTGRES_PASSWORD=my-secret-password -d postgres

Note that we do not map the PostgreSQL port here for security reasons, so we will need to figure out the IP of the resulting Docker container and use it in the Terraform configuration. The IP can be retrieved with the following command.

docker inspect tf-backend | jq -r '.[0].NetworkSettings.IPAddress'

In my example, the Docker container is running at 172.17.0.2. Next, we will have to create a database for the Terraform backend. Assuming that you have the required client packages installed, run

createdb --host=172.17.0.2 --username=postgres terraform_backend

and enter the password my-secret-password specified above when starting the Docker container. We will use the PostgreSQL superuser for our example; in a real-life setup, you would of course first create a dedicated user for Terraform and use it going forward.

Next, we add the backend definition to our Terraform configuration using the following section.

# The backend - we use PostgreSQL.
terraform {
  backend "pg" {
  }
}

You might expect that we also need to provide the location of the database (i.e. a connection string) and credentials. In fact, we could do this, but this would imply that the credentials which are part of the connection string would be present in the file in clear text. We therefore use a partial configuration and later supply the missing data on the command line.

Finally, we have to re-run terraform init to make the new backend known to Terraform and to create a new workspace. Terraform will detect the change and even offer to automatically migrate the state into the new backend.

terraform init -backend-config="conn_str=postgres://postgres:my-secret-password@172.17.0.2/terraform_backend?sslmode=disable"
terraform workspace new default

Be careful – as our database is running in a container with ephemeral storage, the state will be lost when we destroy the container! In our case, this would not be a disaster, as we would still be able to control our small playground environment manually. Still, it is a good idea to tag all your servers so that you can easily identify the servers managed by Terraform. If you intend to use Terraform more often, you might also want to spin up a local PostgreSQL server outside of a container (here is a short Ansible script that will do this for you and create a default user terraform with a password equal to the user name). Note that with a non-local state, Terraform will still store a reference to your state locally (look at the directory .terraform), so when you work from a different directory, you will have to run the init command again. Also note that this will store your database credentials locally on disk in clear text.

Combining Terraform and Ansible

After this very short introduction to Terraform, let us now discuss how we can combine Terraform, which manages our cloud environment, with Ansible, which manages the machines. Of course there are many different options, and I have not even tried to create an exhaustive list. Here are the options that I considered.

First, you could operate Terraform and Ansible independently and use a dynamic inventory. Thus, you would have some Terraform templates and some Ansible playbooks and, when you need to verify or update your infrastructure, first run Terraform and then Ansible. The Ansible playbooks would use provider-specific inventory scripts to build a dynamic inventory and operate on this. This setup is simple and allows you to manage the Terraform configuration and your playbooks without any direct dependencies.

However, there is a potential loss of information when working with inventory scripts. Suppose, for instance, that you want to bring up a configuration with two web servers and two database servers. In a Terraform template, you would then typically have a resource “web” with two instances and a resource “db” with two instances. Correspondingly, you would want groups “web” and “db” in your inventory. If you use inventory scripts, there is no direct link between Terraform resources and cloud resources, and you would have to use tags to provide that link. Also, things easily get a bit more complicated if your configuration uses more than one cloud provider. Still, this is a robust method that seems to have some practical uses.

A second potential approach is to use Terraform provisioners to run Ansible scripts on each provisioned resource. If you decide to go down this road, keep in mind that provisioners only run when a resource is created or destroyed, not when it changes. If, for instance, you change your playbooks and simply run Terraform again, it will not trigger any provisioners and the changed playbooks will not apply. There are approaches to deal with this, for instance null resources, but this is difficult to control and the Terraform documentation itself does not advocate the use of provisioners.

Next, you could think about parsing the Terraform state. Your primary workflow would be coded into an Ansible playbook. When you execute this playbook, there is a play or task which reads out the Terraform state and uses this information to build a dynamic inventory with the add_host module. There are a couple of Python scripts out there that do exactly this. Unfortunately, the structure of the state is still provider specific – a DigitalOcean droplet is stored in a structure that is different from the structure used for an AWS EC2 instance. Thus, at least the script that you use for parsing the state is still provider specific. And of course there are potential race conditions when you parse the state while someone else might modify it, so you have to think about locking the state while working with it.

Finally, you could combine Terraform and Ansible with a tool like Packer to create immutable infrastructures. With this approach, you would use Packer to create images, supported maybe by Ansible as a provisioner. You would then use Terraform to bring up an infrastructure using this image, and would try to restrict the in-place configurations to an absolute minimum.

The approach that I have explored for this post is not to parse the state, but to trigger Terraform from Ansible and to parse the Terraform output. Thus our playbook will work as follows.

  • Use the Terraform module that comes with Ansible to run a Terraform template that describes the infrastructure
  • Within the Terraform template, create an output that contains the inventory data in a provider independent JSON format
  • Back in Ansible, parse that output and use add_host to build a corresponding dynamic inventory
  • Continue with the actual provisioning tasks per server in the playbook

Here, the provider specific part is hidden in the Terraform template (which is already provider specific by design), where the data passed back to Ansible and the Ansible playbook at least has a chance to be provider agnostic.
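
To make this a bit more tangible, here is a minimal sketch of how the Ansible side of this could look – this is not a verbatim copy of the playbook discussed below, and it assumes that the Terraform configuration lives in a subdirectory tf and defines an output called inventory in the format shown in the next section.

- name: Run Terraform and build a dynamic inventory
  hosts: localhost
  tasks:
    - name: Apply the Terraform configuration
      terraform:
        project_path: "./tf"
        state: present
      register: tf
    - name: Add each provisioned machine to the inventory
      add_host:
        name: "{{ item.name }}"
        ansible_ssh_host: "{{ item.ip }}"
        ansible_ssh_user: "{{ item.ansible_ssh_user }}"
        ansible_ssh_private_key_file: "{{ item.private_key_file }}"
        groups: "{{ item.groups }}"
      loop: "{{ tf.outputs.inventory.value }}"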

Similar to a dynamic inventory script, we have to reach out to the cloud provider once to get the current state. In fact, behind the scenes, the Terraform module will always run a terraform plan and then check its output to see whether there is any change. If there is a change, it will run terraform apply and return its output, otherwise it will fall back to the output from the last run stored in the state by running terraform output. This also implies that there is again a certain danger of running into race conditions if someone else runs Terraform in parallel, as we release the lock on the Terraform state once this phase has completed.

Note that this approach has a few consequences. First, there is a structural coupling between Terraform and Ansible. Whoever is coding the Terraform part needs to prepare an output in a defined structure so that Ansible is happy. Also the user running Ansible needs to have the necessary credentials to run Terraform and access the Terraform state. In addition, Terraform will be invoked every time when you run the Ansible playbook, which, especially for larger infrastructures, slows down the processing a bit (looking at the source code of the Terraform module, it would probably be easy to add a switch so that only terraform output is run, which should provide a significant speedup).

Getting and running the sample code

Let us now take a look at a sample implementation of this idea which I have uploaded into my Github repository. The code is organized into a collection of Terraform modules and Ansible roles. The example will bring up two web servers on DigitalOcean and two database servers on AWS EC2.

Let us first discuss the Terraform part. There are two modules involved, one module that will create a droplet on DigitalOcean and one module that will create an EC2 instance. Each module returns, as an output, a data structure that contains one entry for each server, and each entry contains the inventory data for that server, for instance

inventory = [
  {
    "ansible_ssh_user" = "root"
    "groups" = "['web']"
    "ip" = "46.101.162.72"
    "name" = "web0"
    "private_key_file" = "~/.ssh/do-default-key"
  },
...

This structure can easily be assembled using Terraform's for-expressions. Assuming, for instance, that you have defined a DigitalOcean resource for your droplets, the corresponding code for DigitalOcean is

output "inventory" {
  value = [for s in digitalocean_droplet.droplet[*] : {
    # the Ansible groups to which we will assign the server
    "groups"           : var.ansibleGroups,
    "name"             : "${s.name}",
    "ip"               : "${s.ipv4_address}",
    "ansible_ssh_user" : "root",
    "private_key_file" : "${var.do_ssh_private_key_file}"
  } ]
}

The EC2 module is structured similarly and delivers output in the same format (note that there is no name exported by the AWS provider, but we can use a tag to capture and access the name).
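
For illustration, here is a sketch of how the corresponding EC2 output could be assembled – the resource and variable names are again my own, and I assume that the EC2 resource sets a Name tag and uses the standard Ubuntu login user.

output "inventory" {
  value = [for s in aws_instance.instance[*] : {
    "groups"           : var.ansibleGroups,
    # EC2 instances have no name attribute, so we read the Name tag instead
    "name"             : "${s.tags.Name}",
    "ip"               : "${s.public_ip}",
    "ansible_ssh_user" : "ubuntu",
    "private_key_file" : "${var.aws_ssh_private_key_file}"
  } ]
}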

In the main.tf file, we then simply invoke each of the modules, concatenate their outputs and return this as Terraform output.

This output is then processed by the Ansible role terraform. Here, we simply iterate over the output (which is in JSON format and therefore recognized by Ansible as a data structure, not just a string), and for each item, we create a corresponding entry in the dynamic inventory using add_host.

The other roles in the repository are straightforward, maybe with one exception – the role sshConfig, which will add entries for all provisioned hosts to an SSH configuration file ~/.ssh/ansible_created_config and include this file in the user's main SSH configuration file. This is not necessary for things to work, but will allow you to simply type something like

ssh db0

to access the first database server, without having to specify IP address, user and private key file.
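
An entry in the generated SSH configuration file might look roughly like this – host name, user and key file are placeholders, the actual values are filled in from the inventory data.

Host db0
  HostName <public IP of the machine>
  User ubuntu
  IdentityFile ~/.ssh/ec2-default-key.pem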

A note on SSH. When I played with this, I hit upon this problem with the Gnome keyring daemon. The daemon adds identities to your SSH agent, which are attempted every time Ansible tries to log into one of the machines. If you work with more than a few SSH identities, you will therefore exceed the number of failed SSH login attempts configured in the Ubuntu image and will not be able to connect any more. The solution for me was to disable the Gnome keyring at startup.

In order to run the sample, you will first have to prepare a few credentials. We assume that

  • You have AWS credentials set up to access the AWS API
  • The DigitalOcean API key is stored in the environment variable DO_TOKEN
  • There is a private key file ~/.ssh/ec2-default-key.pem matching an SSH key on EC2 called ec2-default-key
  • Similarly, there is a private key file ~/.ssh/do-default-key that belongs to a public key uploaded to DigitalOcean as do-default-key
  • There is a private / public SSH key pair ~/.ssh/default-user-key[.pub] which we will use to set up an SSH enabled default user on each machine

We also assume that you have a Terraform backend up and running and have run terraform init to connect to this backend.

To run the examples, you can then use the following steps.

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/terraform
terraform init
ansible-playbook site.yaml

The extra invocation of terraform init is required because we introduce two new modules that Terraform needs to pre-load, and because you might not have the provider plugins for DigitalOcean and AWS on your machine. If everything works, you will have a fully provisioned environment with two servers on DigitalOcean with Apache2 installed and two servers on EC2 with PostgreSQL installed up and running in a few minutes. If you are done, do not forget to shut down everything again by running

terraform destroy -var="do_token=$DO_TOKEN"

Also, I highly recommend using the DigitalOcean and AWS consoles to manually check that no servers are left running, both to avoid unnecessary cost and to be prepared for the case that your state is out of sync.

Automating provisioning with Ansible – building cloud environments

When you browse the module list of Ansible, you will find that by far the largest section is the list of cloud modules, i.e. modules to interact with cloud environments to define, provision and maintain objects like virtual machines, networks, firewalls or storage. These modules make it easy to access the APIs of virtually every existing cloud provider. In this post, we will look at an example that demonstrates the usage of these modules for two cloud platforms – DigitalOcean and AWS.

Using Ansible with DigitalOcean

Ansible comes with several modules that are able to connect to the DigitalOcean API to provision machines, manage keys, load balancers, snapshots and so forth. Here, we will need two modules – the module digital_ocean_droplet to deal with individual droplets, and digital_ocean_sshkey_facts to retrieve information about the SSH keys stored in your account.

Before getting our hands dirty, we need to talk about credentials first. There are three types of credentials involved in what follows.

  • To connect to the DigitalOcean API, we need to identify ourselves with a token. Such a token can be created on the DigitalOcean cloud console and needs to be stored safely. We do not want to put this into a file, but instead assume that you have set up an environment variable DO_TOKEN containing its value and access that variable.
  • The DigitalOcean SSH key. This is the key that we create once and upload (its public part, of course) to the DigitalOcean website. When we provision our machines later, we pass the name of the key as a parameter and DigitalOcean will automatically add this public key to the authorized_keys file of the root user so that we can access the machine via SSH. We will also use this key to allow Ansible to connect to our machines.
  • The user SSH key. As in previous examples, we will, in addition, create a default user on each machine. We need to create a key pair and also distribute the public key to all machines.

Let us first create these keys, starting with the DigitalOcean SSH key. We will use the key file name do-default-key and store the key in ~/.ssh on the control machine. So run

ssh-keygen -b 2048 -t rsa -P "" -f ~/.ssh/do-default-key
cat ~/.ssh/do-default-key.pub

to create the key and print out the public key.

Then navigate to the DigitalOcean security console, hit “Add SSH key”, enter the public key, enter the name “do-default-key” and save the key.

Next, we create the user key. Again, we use ssh-keygen as above, but this time use a different file name. I use ~/.ssh/default-user-key. There is no need to upload this key to DigitalOcean, but we will use it later in our playbook for the default user that we create on each machine.

Finally, make sure that you have a valid DigitalOcean API token (if not, go to this page to get one) and put the value of this key into an environment variable DO_TOKEN.

We are now ready to write the playbook to create droplets. Of course, we place this into a role to enable reuse. I have called this role droplet and you can find its code here.

I will not go through the entire code (which is documented), but just mention some of the more interesting points. First, when we want to use the DigitalOcean module to create a droplet, we have to pass the SSH key ID. This is a number, which is different from the key name that we specified while doing the upload. Therefore, we first have to retrieve the key ID. For that purpose, we retrieve a list of all keys by calling the module digital_ocean_sshkey_facts which will write its results into the ansible_facts dictionary.

- name: Get available SSH keys
  digital_ocean_sshkey_facts:
    oauth_token: "{{ apiToken }}"

We then need to navigate this structure to get the ID of the key we want to use. We do this by preparing a JSON query, which we put together in several steps to avoid issues with quoting. First, we apply the query string [?name=='{{ doKeyName }}'], where doKeyName is the variable that holds the name of the DigitalOcean SSH key, to navigate to the entry in the key list representing this key. The result will still be a list from which we extract the first (and typically only) element and finally access its attribute id holding the key ID.

- name: Prepare JSON query string
  set_fact:
    jsonQueryString: "[?name=='{{ doKeyName }}']"
- name: Apply query string to extract matching key data
  set_fact:
    keyData: "{{ ansible_facts['ssh_keys']|json_query(jsonQueryString) }}"
- name: Get keyId
  set_fact:
    keyId: "{{ keyData[0]['id'] }}"

Once we have that, we can use the DigitalOcean Droplet module to actually bring up the droplet. This is rather straightforward, using the following code snippet.

- name: Bring up or stop machines
  digital_ocean_droplet:
    oauth_token: "{{ apiToken }}"
    image: "{{osImage}}"
    region_id: "{{regionID}}"
    size_id: "{{sizeID}}"
    ssh_keys: [ "{{ keyId }}" ]
    state: "{{ targetState }}"
    unique_name: yes
    name: "droplet{{ item }}"
    wait: yes
    wait_timeout: 240
    tags:
    - "createdByAnsible"
  loop:  "{{ range(0, machineCount|int )|list }}"
  register: machineInfo

Here, we go through a loop, with the number of iterations being governed by the value of the variable machineCount, and, in each iteration, bring up one machine. The name of this machine is a fixed part plus the loop variable, i.e. droplet0, droplet1 and so forth. Note the usage of the parameter unique_name. This parameter instructs the module to check that the supplied name is unique. So if there is already a machine with that name, we skip its creation – this is what makes the task idempotent!

Once the machines are up, we finally loop over all provisioned machines once more (using the variable machineInfo) and complete two additional steps for each machine. First, we use add_host to add the host to the inventory – this makes it possible to use the hosts later in the playbook. For convenience, we also add an entry for each host to the local SSH configuration, which allows us to log into the machine using a shorthand like ssh droplet0.
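
As a sketch of how the first of these steps could look – the exact attribute paths returned by the module depend on the Ansible version, so treat the data references below as assumptions.

- name: Add newly created machines to the inventory
  add_host:
    name: "{{ item.data.droplet.name }}"
    ansible_ssh_host: "{{ item.data.ip_address }}"
    groups: droplets
  loop: "{{ machineInfo.results }}"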

Once the role droplet has completed, our main playbook will start a second and a third play. These plays now use the inventory that we have just built dynamically. The first of them is quite simple and just waits until all machines are reachable. This is necessary, as DigitalOcean reports a machine as up and running at a point in time when the SSH daemon might not yet accept connections. The next and last play simply executes two roles. The first role is the role that we have already put together in an earlier post which simply adds a standard non-root user, and the second role simply uses apt and pip to install a few basic packages.

A few words on security are in order. Here, we use the DigitalOcean standard setup, which gives you a machine with a public IP address directly connected to the Internet. No firewall is in place, and all ports on the machine are reachable from everywhere. This is obviously not a setup that I would recommend for anything else than a playground. We could start a local firewall on the machine by using the ufw module for Ansible as a starting point, and then you would probably continue with some basic hardening measures before installing anything else (which, of course, could be automated using Ansible as well).

Using Ansible with AWS EC2

Let us now see what we need to change when trying the same thing with AWS EC2 instead of Digital Ocean. First, of course, the authorization changes a bit.

  • The Ansible EC2 module uses the Python Boto library behind the scenes which is able to reuse the credentials of an existing AWS CLI configuration. So if you follow my setup instructions in an earlier post on using Python with AWS, no further setup is required
  • Again, there is an SSH key that AWS will install on each machine. To create a key, navigate to the AWS EC2 console, select “Key pairs” under “Network and Security” in the navigation bar on the left, create a key called ec2-default-key, copy the resulting PEM file to ~/.ssh/ and set the access rights 0700.
  • The user SSH key – we can use the same key as before

As the EC2 module uses the Boto3 Python library, you might have to install it first using pip or pip3.

Let us now go through the steps that we have to complete to spin up our servers and see how we realize idempotency. First, to create a server on EC2, we need to specify an AMI ID. AMI-IDs change with every new version of an AMI, and typically, you want to install the latest version. Thus we first have to figure out the latest AMI ID. Here is a snippet that will do this.

- name: Locate latest AMI
  ec2_ami_facts:
    region: "eu-central-1"
    owners: 099720109477
    filters:
      name: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-????????"
      state: "available"
  register: allAMIs
- name: Sort by creation date to find the latest AMI
  set_fact:
    latestAMI: "{{ allAMIs.images | sort(attribute='creation_date') | last }}"

Here, we first use the module ec2_ami_facts to get a list of all AMIs matching a certain expression – in this case, we look for available AMIs for Ubuntu 18.04 for EBS-backed machines with the owner 099720109477 (Ubuntu). We then use a Jinja2 template to sort by creation date and pick the latest entry. The result will be a dictionary, which, among other things, contains the AMI ID as image_id.

Now, as in the DigitalOcean example, we can start the actual provisioning step using the ec2 module. Here is the respective task.

- name: Provision machine
  ec2:
    exact_count: "{{ machineCount }}"
    count_tag: "managedByAnsible"
    image: "{{ latestAMI.image_id }}"
    instance_tags:
      managedByAnsible: true
    instance_type: "{{instanceType}}"
    key_name: "{{ ec2KeyName }}"
    region: "{{ ec2Region }}"
    wait: yes
  register: ec2Result

Idempotency is realized differently on EC2. Instead of using the server name (which Amazon will assign randomly), the module uses tags. When a machine is brought up, the tags specified in the attribute instance_tags are attached to the new instance. The attribute exact_count instructs the module to make sure that exactly this number of instances with this tag is running. Thus if you run the playbook twice, the module will first query all machines with the specified tag, figure out that there is already an instance running and will not create any additional machines.

This simple setup will provision all new machines into the default VPC and use the default security group. This has the disadvantage that in order for Ansible to be able to reach these machines, you will have to open port 22 for incoming connections in the attached security group. The playbook does not do this, but expects that you do this manually, using for instance the EC2 console. Again, this is of course not a production-like setup.

Running the examples

I have put the full playbooks with all the required roles into my GitHub repository. To run the examples, carry out the following steps.

First, prepare the SSH keys and API tokens for EC2 and DigitalOcean as explained in the respective sections. Then, use the EC2 console to allow incoming traffic for port 22 from your local workstation in the default VPC, into which our playbook will launch the machines created on EC2. Next, run the following commands to clone the repository and execute the playbooks (if you have not used the SSH key names and locations as above, you will first have to edit the default variables in the roles ec2Instance and droplet accordingly).

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/partVII
# Turn off strict host key checking
export ANSIBLE_HOST_KEY_CHECKING=False
# Run the EC2 example
ansible-playbook awsSite.yaml
# Run the DigitalOcean example
ansible-playbook doSite.yaml

Once the playbooks complete, you should now be able to ssh into the machine created on DigitalOcean using ssh droplet0 and into the machine created on EC2 using ssh aws0. Do not forget to delete all machines again when you are done to avoid unnecessary cost!

Limitations of our approach

The approach we have followed so far to manage our cloud environments is simple, but has a few drawbacks and limitations. First, the provisioning is done on the localhost in a loop, and therefore sequentially. Even if the provisioning of one machine only takes one or two minutes, this implies that the approach does not scale, as bringing up a large number of machines would take hours. One approach to fix this that I have seen on Stackexchange (unfortunately I cannot find the post any more, so I cannot give proper credit) is to use a static inventory file with a dummy group containing only the host names, in combination with connection: local, to execute the provisioning locally, but in parallel.

The second problem is more severe. Suppose, for instance, you have already created a droplet and now decide that you want 2GB of memory instead. So you might be tempted to go back, edit (or override) the variable sizeID and re-run the playbook. This will not give you any error message, but in fact, it will not do anything. The reason is that the module will only use the host name to verify that the target state has been reached, not any other attributes of the droplet (take a look at the module source code here to verify this). Thus, we have a successful playbook execution, but a deviation between the target state coded into the playbook and the real state.

Of course this could be fixed by a more sophisticated check – either by extending the module, or by coding this check into the playbook. Alternatively, we could try to leverage an existing solution that has this logic already implemented. For AWS, for instance, you could try to define your target state using CloudFormation templates and then leverage the CloudFormation Ansible module to apply this from within Ansible. Or, if you want to be less provider specific, you might want to give Terraform a try. This is what I decided to do, and we will dive into more details how to integrate Terraform and Ansible in the next post.

Automating provisioning with Ansible – control structures and roles

When building more sophisticated Ansible playbooks, you will find that a linear execution is not always sufficient, and the need for more advanced control structures and ways to organize your playbooks arises. In this post, we will cover the corresponding mechanisms that Ansible has up its sleeves.

Loops

Ansible allows you to build loops that execute a task more than once with different parameters. As a simple example, consider the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop with a static list of items
        debug:
          msg: "{{ item }}"
        loop:
        - A
        - B

This playbook contains only one play. In the play, we first limit the execution to localhost and then use the connection keyword to instruct Ansible not to use ssh but execute commands directly to speed up execution (which of course only works for localhost). Note that more generally, the connection keyword specifies a so-called connection plugin of which Ansible offers a few more than just ssh and local.

The next task has an additional keyword loop. The value of this attribute is a list in YAML format, having the elements A and B. The loop keyword instructs Ansible to run this task twice per host. For each execution, it will assign the corresponding value of the loop variable to the built-in variable item so that it can be evaluated within the loop.

If you run this playbook, you will get the following output.

PLAY [localhost] *************************************
TASK [Gathering Facts] *******************************
ok: [localhost]

TASK [Loop with a static list of items] **************
ok: [localhost] => (item=A) => {
    "msg": "A"
}
ok: [localhost] => (item=B) => {
    "msg": "B"
}

So we see that even though there is only one host, the loop body has been executed twice, and within each iteration, the expression {{ item }} evaluates to the corresponding item in the list over which we loop.

Loops are most useful in combination with Jinja2 expressions. In fact, you can iterate over everything that evaluates to a list. The following example demonstrates this. Here, we loop over all environment variables. To do this, we use Jinja2 and Ansible facts to get a dictionary with all environment variables, then convert this to a list using the items() method and use this list as the argument to loop.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop over all environment variables 
        debug:
          msg: "{{ item.0 }}"
        loop:
         "{{ ansible_facts.env.items() | list }}"
        loop_control:
          label:
            "{{ item.0 }}"

This task also demonstrates loop controls. This is just a set of attributes that we can specify to control the behaviour of the loop. In this case, we set the label attribute which specifies the output that Ansible prints out for each loop iteration. The default is to print the entire item, but this can quickly get messy if your item is a complex data structure.

You might ask yourself why we use the list filter in the Jinja2 expression – after all, the items() method should return a list. In fact, this is only true for Python 2. I found that in Python 3, this method returns something called a dictionary view, which we then need to convert into a list using the filter. This version will work with both Python 2 and Python 3.

Also note that when you use loops in combination with register, the registered variable will be populated with the results of all loop iterations. To this end, Ansible will populate the variable to which the register refers with a list called results. This list will contain one entry for each loop iteration, and each entry will contain the item and the module-specific output. To see this in action, run my example playbook and have a look at its output.
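
As a quick illustration of the resulting structure, the following toy example registers the output of a loop and then accesses the item processed in the first iteration.

- name: Loop and register the output
  debug:
    msg: "{{ item }}"
  loop:
    - A
    - B
  register: loopOutput
- name: Access the item processed in the first iteration
  debug:
    msg: "{{ loopOutput.results[0].item }}"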

Conditions

In addition to loops, Ansible allows you to execute specific tasks only if a certain condition is met. You could, for instance, have a task that populates a variable, and then execute a subsequent task only if this variable has a certain value.

The below example is a bit artificial, but demonstrates this nicely. In a first task, we create a random value, either 0 or 1, using Jinja2 templating. Then, we execute a second task only if this value is equal to one.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Populate variable with random value
        set_fact:
          randomVar: "{{ range(0, 2) | random }}"
      - name: Print value
        debug:
          var: randomVar
      - name: Execute additional statement if randomVar is 1
        debug:
          msg: "Variable is equal to one"
        when:
          randomVar == "1"

To create our random value, we combine the Jinja2 range method with the random filter which picks a random element from a list. Note that the result will be a string, either “0” or “1”. In the last task within this play, we then use the when keyword to restrict execution based on a condition. This condition is in Jinja2 syntax (without the curly braces, which Ansible will add automatically) and can be any valid expression which evaluates to either true or false. If the condition evaluates to false, then the task will be skipped (for this host).

Care needs to be taken when combining loops with conditional execution. Let us take a look at the following example.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Combine loop with when
        debug:
          msg: "{{ item }}"
        loop:
          "{{ range (0, 3)|list }}"
        when:
          item == 1
        register:
          loopOutput
      - debug:
          var: loopOutput

Here, the condition specified with when is evaluated once for each loop iteration, and if it evaluates to false, the iteration is skipped. However, the results array in the output still contains an item for this iteration. When working with the output, you might want to evaluate the skipped attribute for each item which will tell you whether the loop iteration was actually executed (be careful when accessing this, as it is not present for those items that were not skipped).
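
One pattern that handles both cases is to fall back to a default value when the attribute is missing, for instance along the following lines.

- name: Print only those iterations that were actually executed
  debug:
    msg: "{{ item.item }}"
  loop: "{{ loopOutput.results }}"
  when: not (item.skipped | default(false))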

Handlers

Handlers provide a different approach to conditional execution. Suppose, for instance, you have a sequence of tasks that each update the configuration file of a web server. When this actually resulted in a change, you want to restart the web server. To achieve this, you can define a handler.

Handlers are similar to tasks, with the difference that in order to be executed, they need to be notified. The actual execution is queued until the end of the playbook, and even if the handler is notified more than once, it is only executed once (per host). To instruct Ansible to notify a handler, we can use the notify attribute for the respective task. Ansible will, however, only notify the handler if the task results in a change. To illustrate this, let us take a look at the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
    - name: Create empty file
      copy:
        content: ""
        dest: test
        force: no
      notify: handle new file
    handlers:
    - name: handle new file
      debug:
        msg: "Handler invoked"

This playbook defines one task and one handler. Handlers are defined in a separate section of the playbook which contains a list of handlers, similarly to tasks. They are structured as tasks, having a name and an action. To notify a handler, we add the notify attribute to a task, using exactly the name of the handler as argument.

In our example, the first (and only) task of the play will create an empty file test, but only if the file is not yet present. Thus if you run this playbook for the first time (or manually remove the file), this task will result in a change. If you run the playbook once more, it will not result in a change.

Correspondingly, upon the first execution, the handler will be notified and execute at the end of the play. For subsequent executions, the handler will not run.

Handler names should be unique, so you cannot use handler names to make a handler listen for different events. There is, however, a way to decouple handlers and notifications using topics. To make a handler listen for a topic, we can add the listen keyword to the handler and use the name of the topic instead of the handler name when defining the notification; see the documentation for details.
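
To illustrate the idea – the service and topic names below are made up for this example – a handler listening on a topic could look like this.

handlers:
  - name: Restart web server
    service:
      name: apache2
      state: restarted
    listen: "web server configuration changed"

A task would then notify the topic instead of the handler name.

  - name: Update virtual host configuration
    template:
      src: vhost.conf.j2
      dest: /etc/apache2/sites-available/000-default.conf
    notify: "web server configuration changed"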

Handlers can be problematic, as they are change triggered, not state triggered. If, for example, a task results in a change and notifies a handler, but the playbook fails in a later task, the trigger is lost. When you run the playbook again, the task will most likely not result in a change any more, and no additional notification is created. Effectively, the handler never gets executed. There is, however, a flag --force-handlers to force execution of handlers even if the playbook fails.

Using roles and includes

Playbooks are a great way to automate provisioning steps, but for larger projects, they easily get a bit messy and difficult to maintain. To better organize larger projects, Ansible has the concept of a role.

Essentially, roles are snippets of tasks, variables and handlers that are organized as reusable units. Suppose, for instance, that in all your playbooks, you do a similar initial setup for all machines – create a user, distribute SSH keys and so forth. Instead of repeating the same tasks and the variables to which they refer in all your playbooks, you can organize them as a role.

What makes roles a bit more complex to use initially is that roles are backed by a defined directory structure that we need to understand first. Specifically, Ansible will look for roles in a subdirectory called (surprise) roles in the directory in which the playbook is located (it is also possible to maintain system-wide roles in /etc/ansible/roles, but it is obviously much harder to put these roles under version control). In this subdirectory, Ansible expects a directory for each role, named after the role.

Within this subdirectory, each type of object defined by the role is placed in a separate subdirectory. Each of these subdirectories contains a file main.yml that defines these objects. Not all of these directories need to be present. The most important ones, which most roles will include, are (see the Ansible documentation for a full list):

  • tasks, holding the tasks that make up the role
  • defaults that define default values for the variables referenced by these tasks
  • vars containing additional variable definitions
  • meta containing information on the role itself, see below

To initialize this structure, you can either create these directories manually, or you run

cd roles
ansible-galaxy init <role name>

to let Ansible create a skeleton for you. The tool that we use here – ansible-galaxy – is actually part of an infrastructure that is used by the community to maintain a set of reusable roles hosted centrally. We will not use Ansible Galaxy here, but you might want to take a look at the Galaxy website to get an idea.

So suppose that we want to restructure the example playbook that we used in one of my last posts to bring up a Docker container as an Ansible test environment using roles. This playbook essentially has two parts. In the first part, we create the Docker container and add it to the inventory. In the second part, we create a default user within the container with a given SSH key.

Thus it makes sense to re-structure the playbook using two roles. The first role would be called createContainer, the second defaultUser. Our directory structure would then look as follows.

Screenshot from 2019-10-07 13-17-55

Here, docker.yaml is the main playbook that will use the role, and we have skipped the var directory for the second role as it is not needed. Let us go through these files one by one.
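
In plain text, the layout is roughly the following – this is reconstructed from the description above, so the repository might differ in some details.

partVI
├── docker.yaml
└── roles
    ├── createContainer
    │   ├── defaults
    │   │   └── main.yml
    │   ├── meta
    │   │   └── main.yml
    │   ├── tasks
    │   │   └── main.yml
    │   └── vars
    │       └── main.yml
    └── defaultUser
        ├── defaults
        │   └── main.yml
        ├── meta
        │   └── main.yml
        └── tasks
            └── main.yml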

The first file, docker.yaml, is now much shorter as it only refers to the two roles.

---
  - name: Bring up a Docker container 
    hosts: localhost
    roles:
    - createContainer
  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - defaultUser

Again, we define two plays in this playbook. However, there are no tasks defined in any of these plays. Instead, we reference roles. When running such a play, Ansible will go through the roles and for each role:

  • Load the tasks defined in the main.yml file in the tasks-folder of the role and add them to the play
  • Load the default variables defined in the main.yml file in the defaults-folder of the role
  • Merge these variable definitions with those defined in the vars-folder, so that these variables will overwrite the defaults (see also the precedence rules for variables in the documentation)

Thus all tasks imported from roles will be run before any tasks defined in the playbook directly are executed.
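As a small sketch (the debug task is purely illustrative), mixing a role with an additional task in one play would look like this; the debug task only runs after all tasks of the role have completed.

---
  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - defaultUser
    tasks:
    - name: Confirm that role tasks have already run
      debug:
        msg: "All tasks imported from roles have been executed"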

Let us now take a look at the files for each of the roles. The tasks, for instance, are simply a list of tasks that you could also put into a play directly, for example

---
- name: Run Docker container
  docker_container:
    auto_remove: yes
    detach: yes
    name: myTestNode
    image: node:latest
    network_mode: bridge
    state: started
  register: dockerData
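A second task in the same role could then use the registered variable dockerData to add the new container to the inventory. This is only a hedged sketch; the exact structure of the data returned by the docker_container module depends on your Ansible version, so the field names below are assumptions.

- name: Add container to inventory
  add_host:
    name: myTestNode
    groups: docker_nodes
    # the IP address is taken from the data registered by the previous task
    ansible_host: "{{ dockerData.container.NetworkSettings.IPAddress }}"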

Note that this is a perfectly valid YAML list, as you would expect. Similarly, the variables that you define, either as defaults or as overrides, are simple dictionaries.

---
# defaults file for defaultUser
userName: chr
userPrivateKeyFile: ~/.ssh/ansible-default-user-key_rsa

The only exception is the file main.yml in the meta directory, which holds the role's meta data. This file uses a specific YAML-based syntax that defines some attributes of the role itself. Most of the information in this file is meant to be used when you decide to publish your role on Galaxy and is not needed if you work locally. There is one important exception, though: you can define dependencies between roles, which Ansible will evaluate to make sure that roles required by another role are automatically pulled into the playbook when needed.
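A minimal sketch of such a dependency declaration could look like this (the role name common is made up for illustration).

---
# meta/main.yml of a role that depends on the (hypothetical) role common
dependencies:
- role: common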

The full example, including the definitions of these two roles and the related variables, can be found in my GitHub repository. To run the example, use the following steps.

# Clone my repository and cd into partVI
git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/partVI
# Create two new SSH keys. The first key is used for the initial 
# container setup, the public key is baked into the container
ssh-keygen -f ansible -b 2048 -t rsa -P ""
# the second key is the key for the default user that our 
# playbook will create within the container
ssh-keygen -f default-user-key -b 2048 -t rsa -P ""
# Get the Dockerfile from my earlier post and build the container, using
# the new key (named ansible) that we have just created
cp ../partV/Dockerfile .
docker build -t node .
# Put location of user SSH key into vars/main.yml for 
# the defaultUser role
mkdir roles/defaultUser/vars
echo "---" > roles/defaultUser/vars/main.yml
echo "userPrivateKeyFile: $(pwd)/default-user-key" >> roles/defaultUser/vars/main.yml
# Run playbook
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook docker.yaml

If everything worked, you can now log into your newly provisioned container with the new default user using

ssh -i default-user-key chr@172.17.0.2

where of course you need to replace the IP address by the IP address assigned to the Docker container (which the playbook will print for you). Note that we have demonstrated variable overriding in action: we have created a variable file for the role defaultUser and put the specific value, namely the location of the default user's SSH key, into it, overriding the default that comes with the role. This ability to bundle all required variables as part of a role while still allowing a user to override them is what makes roles suitable for actual code reuse. We could also have overridden the variable directly in the playbook that uses the role, making the use of roles somewhat similar to a function call in a procedural programming language.
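As a sketch of that alternative (the path is just an example and would have to point to your actual key file), the second play could pass the variable directly to the role.

  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - role: defaultUser
      vars:
        # example value - replace by the actual location of your key
        userPrivateKeyFile: /home/chr/ansible-samples/partVI/default-user-key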

This completes our post on roles. In the next few posts, we will now tackle some more complex projects – provisioning infrastructure in a cloud environment with Ansible.