Learning Kafka with Python – installation

In this post, we will install a Kafka cluster with three nodes on our lab PC, using KVM, Vagrant and Ansible.

The setup

Of course it is possible (and actually easy, see the instructions in the quickstart section of the excellent Apache Kafka documentation) to run Kafka as a single-node cluster on your PC. The standard distribution of Kafka contains all you need for this, even an embedded ZooKeeper and a default configuration that should work out-of-the-box. Some of the more advanced topics that we want to try out in the course of this series, like replication and failover scenarios, do however only make sense if you have a clustered setup.

Fortunately, creating such a setup is comparatively easy using KVM virtual machines and tools like Vagrant and Ansible. If you have never used these tools before – do not worry, I will show you in the last section of this post how to download and run my samples. Still, you might want to take a look at some of the previous posts that I have written to get a basic understanding of Ansible and Vagrant with libvirt.

In our test setup, we will be creating three virtual machines, each sized with two vCPUs and 2 GB of memory. On each of the three nodes, we will be running a ZooKeeper instance and a Kafka broker. Note that in a real-world setup, you would probably use dedicated nodes for the ZooKeeper ensemble, but we co-locate these components to keep the number of involved virtual machines reasonable.

Our machines will be KVM machines running Debian Buster. I have chosen this distribution primarily because it is small (less than 300 MB download for the libvirt vagrant box) and boots up blazingly fast: once the box has been downloaded to your machine, creating the machines from scratch takes only a bit less than a minute.

To simulate a more realistic setup, each machine has two network interfaces. One interface is the default interface that Vagrant will attach to a “public” network and that we will use to SSH into the machine. On this interface, we will expose a Kafka listener secured using TLS, and the network will be connected to the internet using NATing. A second interface will connect to a “private” network (which, however, will still be reachable from the lab PC). On this network, we will run an unsecured listener and Kafka will use this interface for inter-broker communication and connecting to the ZooKeeper instances.


Using the public interface, we will be able to connect to the Kafka brokers via the TLS secured listener and run our sample programs. This setup could easily be migrated to your favored public cloud provider.

Installation steps

The installation is mostly straightforward. First, we set up networking on each machine, i.e. we bring up the interfaces, assign IP addresses and add the host names to /etc/hosts on each node, so that the hostname will resolve to the private interface IP address. We then install ZooKeeper from the official Debian packages (which, at the time of writing, is version 3.4.13).

The ZooKeeper configuration can basically be taken over from the packaged configuration, with one change – we need to define the ensemble by listing all ZooKeeper nodes, i.e. by adding the section


to /etc/zookeeper/conf/zoo.cfg. On each node, we also have to create a file called myid in /etc/zookeeper/conf/ (which is symlinked to /var/lib/zookeeper/myid) containing the unique ZooKeeper ID – here we just use the last character of the server name, i.e. “1” for broker1 and so forth.
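To illustrate what such an ensemble section and the matching myid files could look like, here is a minimal Python sketch. The host names broker1 to broker3 are the ones used in this post. The ports 2888 and 3888 are the ports conventionally used for ZooKeeper peer communication and leader election and are an assumption here, as are the helper names zk_id and ensemble_lines.

```python
def zk_id(host):
    """Derive the ZooKeeper ID from the last character of the host name."""
    return int(host[-1])


def ensemble_lines(hosts, peer_port=2888, election_port=3888):
    """Return one 'server.<id>=<host>:<peer port>:<election port>' line per node."""
    return [
        f"server.{zk_id(host)}={host}:{peer_port}:{election_port}"
        for host in hosts
    ]


hosts = ["broker1", "broker2", "broker3"]

# Lines to append to zoo.cfg (identical on every node)
print("\n".join(ensemble_lines(hosts)))

# Content of the myid file (different on every node)
for host in hosts:
    print(f"{host}: myid contains {zk_id(host)}")
```

The ensemble lines are the same on all nodes, whereas each node gets its own myid, which is why the two are generated separately.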

Once ZooKeeper is up and running, we can install Kafka. First, as we want to use TLS to secure one of the listeners, we need to create a few keys and certificates. Specifically, we need

  • A self-signed CA certificate and a matching key pair
  • A key-pair and matching server certificate for each broker, signed by the CA
  • A client key-pair and a client certificate, signed by the same CA (though we could of course use a different CA for the client certificates, but let us keep things simple)

Creating these certificates with OpenSSL is straightforward (if you have never worked with OpenSSL certificates before, you might want to take a look at my previous post on this). We also need to bundle keys and certificates into a key store and a trust store for the server and similarly a key store and a trust store for the client. The key store holds the credentials, i.e. keys and certificates, presented by the server or the client, whereas the trust store holds the CA certificate. For the server, I was able to use a PKCS12 key store created by OpenSSL. For the client, however, this did not work, and I had to use the Java keytool to convert the PKCS12 key store to the JKS format.

After these preparations, we can now install Kafka. I have used the official tarball of release 2.4.1 (built for Scala 2.13, hence the file name kafka_2.13-2.4.1.tgz), downloaded from this mirror URL. After unpacking the archive, we first need to adapt the configuration file server.properties. The items we need to change are

  • On each broker, we need to set a broker ID – again, I have used the last character of the hostname for this. Note that this implies that broker ID and ZooKeeper ID are the same, which is pure coincidence and not needed
  • The default configuration contains an unsecured (“PLAINTEXT”) listener on port 9092, to which we add an SSL listener on port 9093, using the IP of the public interface
  • The default configuration places the Kafka logs in /tmp, which is obviously not a good idea for anything but a simple test, as most Linux distributions clean up /tmp when you reboot. So we need to change this to point to a different directory
  • As we are running more than one node, it makes sense to change the replication settings for the internal topics
  • Finally, we need to adapt the ZooKeeper connection string so that all Kafka brokers connect to our ZooKeeper ensemble and thus form a cluster. Note that, more or less by definition, a cluster in Kafka is simply a bunch of brokers that all connect to the same ZooKeeper ensemble.
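The changes listed above can be summarized in a small Python sketch that renders the per-broker settings. The property names are standard Kafka broker settings and port 2181 is the ZooKeeper client port used later in this post. Everything else is an assumption made for illustration: the helper names, the log directory /opt/kafka/logs, the replication factor of 3, the use of the host name for the unsecured listener, and the address 192.0.2.11, which is only a placeholder for the public interface IP.

```python
def broker_overrides(hostname, public_ip, zk_hosts, log_dir="/opt/kafka/logs"):
    """Return the server.properties entries that differ from the defaults."""
    return {
        # Broker ID taken from the last character of the host name
        "broker.id": hostname[-1],
        # Unsecured listener on the private network, TLS listener on the public IP
        "listeners": f"PLAINTEXT://{hostname}:9092,SSL://{public_ip}:9093",
        # Keep the Kafka logs out of /tmp (hypothetical directory)
        "log.dirs": log_dir,
        # Replicate the internal topics across the brokers (assumed value)
        "offsets.topic.replication.factor": "3",
        "transaction.state.log.replication.factor": "3",
        # All brokers point to the same ZooKeeper ensemble
        "zookeeper.connect": ",".join(f"{host}:2181" for host in zk_hosts),
    }


def render(props):
    """Render a dictionary in the key=value format of server.properties."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


brokers = ["broker1", "broker2", "broker3"]
print(render(broker_overrides("broker1", "192.0.2.11", brokers)))
```

Note that the TLS listener additionally needs the key store and trust store settings, which are left out here to keep the sketch short.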

Finally, it makes sense to add a systemd unit so that Kafka comes up again if you restart the machine.

Trying it out

After all this theory, we can now finally try this out. As mentioned above, I have prepared a set of Ansible scripts to set up virtual machines and install ZooKeeper and Kafka along the lines of the procedure described above. To run them, you will first have to install a few packages on your machine. Specifically, you need to install (if you have not done so yet) libvirt, Vagrant and Ansible, and install the libvirt Vagrant plugin. The instructions below work on Ubuntu Bionic. If you use a different Linux distribution, you might have to use slightly different package names and / or install the Vagrant libvirt plugin directly using the instructions here. Also, some of the packages (especially Java, which we only need to be able to use the Java keytool) might already be present on your machine.

sudo apt-get update 
sudo apt-get install \
  libvirt-daemon \
  libvirt-clients \
  python3-pip \
  python3-libvirt \
  virt-manager \
  vagrant \
  vagrant-libvirt \
  git
sudo adduser $(id -un) libvirt
sudo adduser $(id -un) kvm
pip3 install ansible lxml pyopenssl

Note that when installing Ansible as an individual user as above, Ansible will be installed in ~/.local/bin, so make sure to add this to your path.

Next, clone my repository, change into the created directory, use virsh and vagrant up to start the network and the virtual machines and then run Ansible to install ZooKeeper and Kafka.

git clone https://github.com/christianb93/kafka.git
cd kafka
virsh net-define kafka-private-network.xml
wget http://mirror.cc.columbia.edu/pub/software/apache/kafka/2.4.1/kafka_2.13-2.4.1.tgz
tar xvf kafka_2.13-2.4.1.tgz
mv kafka_2.13-2.4.1 kafka
vagrant up
ansible-playbook site.yaml

Once the installation completes, it is time to run a few checks. First, let us verify that ZooKeeper is running correctly on each node. For that purpose, SSH into the first node using vagrant ssh broker1 and run

/usr/share/zookeeper/bin/zkServer.sh status

This should print out the configuration file used by ZooKeeper as well as the mode the node is in (follower or leader).
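If you prefer to run this check from Python, the following sketch asks a node for its mode over the ZooKeeper client port, using the four-letter command srvr. It assumes that this command is enabled on the server and that the host name resolves from where you run the script. The function names parse_mode and query_mode are made up for this example.

```python
import socket


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line from the response to 'srvr'."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port=2181, timeout=5.0):
    """Send 'srvr' to a ZooKeeper node and return its mode (e.g. leader or follower)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"srvr")
        chunks = []
        while True:
            # The server closes the connection once the response is complete
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

Calling query_mode("broker1") on one of the nodes should then return the same mode that zkServer.sh reports.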

Now let us see whether Kafka is running on each node. First, of course, you should check the status using systemctl status kafka. Then, we can see whether all brokers have registered themselves with ZooKeeper. To do this, run

sudo /usr/share/zookeeper/bin/zkCli.sh \
  -server broker1:2181 \
  ls /brokers/ids

on any of the broker nodes. You should get a list with the broker ids of the cluster, i.e. “[1,2,3]”. Finally, we can try to create a topic.

/opt/kafka/kafka_2.13-2.4.1/bin/kafka-topics.sh \
  --create \
  --bootstrap-server broker1:9092 \
  --replication-factor 3 \
  --partitions 2 \
  --topic test

Congratulations, you are now the proud owner of a working three-node Kafka cluster. Having this in place, we are now ready to dive deeper into producing Kafka data, which will be the topic of our next post.

Managing KVM virtual machines part I – Vagrant and libvirt

When you first install and play with Vagrant, chances are that you will be using the VirtualBox VM provider initially, which is supported out-of-the-box by Vagrant and open source. However, in some situations, VirtualBox might not be your preferred hypervisor. Luckily, with the help of a plugin, Vagrant can also be used with KVM. In this and the next post, we will learn how this works and take the opportunity to also learn a bit about KVM, libvirt and all that.

KVM and libvirt

KVM (Kernel-based Virtual Machine) is a Linux kernel module which turns Linux into a hypervisor, making use of the hardware support for virtualization that is built into all modern x86 CPUs (this feature is called VT-x on Intel CPUs and AMD-V on AMD CPUs). Typically, KVM is not used directly, but is managed by libvirt, which is a collection of software components like an API, a daemon for remote access and the virsh command line utility to control virtual machines.

Libvirt, in turn, can be used with clients in most major programming languages (including C, Java, Python and Go), and is employed by many virtualization tools like the graphical virtual machine manager virt-manager or OpenStack Nova to create, manage and destroy virtual machines. There is also a Ruby client for the libvirt API, which makes it accessible from Vagrant.

In addition to KVM, libvirt is actually able to leverage many other virtualization providers, including LXC, VMware and Hyper-V. The following diagram summarizes how the components discussed so far are related; the components that we will see in action today are marked.


Creating virtual machines with Vagrant

The above diagram indicates that we have a choice between several methods to create virtual machines. I personally like to use Vagrant for that purpose. As libvirt is not one of the standard providers built into Vagrant, we will have to install a plugin first. Assuming that you have not yet installed Vagrant at all, here are the steps needed to install and set up Vagrant, KVM and the required plugin on a standard Ubuntu 18.04 install. First, we install the libvirt library, the virt-manager and vagrant, and add the current user to the groups libvirt and kvm.

sudo apt-get update 
sudo apt-get install \
  libvirt-daemon \
  libvirt-clients \
  virt-manager \
  python3-libvirt \
  vagrant \
  vagrant-libvirt
sudo adduser $(id -un) libvirt
sudo adduser $(id -un) kvm

At this point, you will have to log out and in again (or run su -l) to make sure that the new group assignments become effective. Note that we install the libvirt Vagrant plugin from the Ubuntu package and not directly. For other Linux distributions, you might want to install it using vagrant plugin install vagrant-libvirt. For this post, I have used Vagrant 2.0.2 and version 0.0.43 of the plugin. Finally, we download a Debian Jessie image (called a box in Vagrant terminology).

vagrant box add \
  debian/jessie64 \
  --provider=libvirt

Now we are ready to bring up our first machine. Obviously, we need a Vagrant file for that purpose. Here is a minimal Vagrant file.

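ENV['VAGRANT_DEFAULT_PROVIDER'] = 'libvirt'
Vagrant.configure("2") do |config|
  config.vm.box = "debian/jessie64"
  config.vm.hostname = "test"
  config.vm.provider :libvirt do |v|
    # provider-specific settings go here: v.cpus, v.memory, v.default_prefix
  end
end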

In the first line, we set an environment variable which instructs Vagrant to use the libvirt provider (instead of the default provider VirtualBox). In the next few lines, we define a virtual machine as usual. In the provider specific block, we define the number of vCPUs for the machine and the amount of RAM and set the prefix that Vagrant is going to use to build a libvirt domain name for the VM.

Now you should be able to bring up the machine using vagrant up in the directory where the file is located.

Once this command completes, it is time to analyze the resulting configuration a bit. Basically, as explained here, the plugin will go through the following steps to bring up the machine.

  • Upload the image which is part of the box that we use to /var/lib/libvirt/images/, into a libvirt storage pool. This is important to understand in case you change the box, as in this case, you will have to remove the image manually again to force a refresh. We will learn more about storage pools in the next post
  • Create a virtual machine, called a domain in the libvirt terminology
  • Create a virtual network and attach it to the machine (we get back to this point later). In addition, a “public” IP address will be allocated for the machine, using the built-in DHCP server
  • Create an SSH key and inject it into the machine

Let us try to learn more about the configuration that Vagrant has created for us. First, run virt-manager to start the graphical machine manager. You should now see a new virtual machine in the overview, and when double-clicking on the machine, a terminal should open. As the Debian image that we use has a passwordless root account, you should actually be able to log in as root.

By clicking on the “Info” icon or via “View –> Details”, you should also be able to see the configuration of the machine, including things like the attached virtual disks and network interfaces.


Of course we can also get this – and even more – information using the command line client virsh. First, run virsh list to see a list of all domains (i.e. virtual machines). Assuming that you have no other libvirt-managed virtual machines running, this will give you only one line corresponding to the domain test_default which we have already seen in the virtual machine manager. You can retrieve the basic facts about this domain using

virsh dominfo test_default

The virsh utility has a wide variety of options. The best way to learn it is to type virsh help and simply try out a few of the commands. A few notable examples are

# List all block devices attached to our VM
virsh domblklist test_default
# List all storage pools
virsh pool-list 
# List all images in the default pool
virsh vol-list default
# List all virtual network interfaces attached to the VM
virsh domiflist test_default
# List all networks
virsh net-list

Internally, libvirt uses XML configuration files to maintain the state of the virtual machines, networks and storage objects. To see, for instance, the full XML configuration of our test machine, run

virsh dumpxml test_default

In the output, we now see all objects – CPU, disks and disk controller, network interfaces, graphics card, keyboard etc. – attached to the machine. We can now dump further XML structures and data to deep dive into the configuration. For instance, the XML output for the machine tells us that the machine is connected to the network vagrant-libvirt, corresponding to the virtual Linux bridge virbr1 (of course, libvirt uses bridges to model networks). To get more information on this, run
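As the configuration is plain XML, it is also easy to process it with a script. Here is a small Python sketch that extracts the network interfaces and the libvirt networks they are attached to from a domain definition. The sample document is a hand-written and heavily abridged stand-in for the real output of virsh dumpxml, the MAC address in it is made up, and the function name attached_networks is only chosen for this example.

```python
import xml.etree.ElementTree as ET


def attached_networks(domain_xml):
    """Return a (MAC address, libvirt network name) pair for each network interface."""
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface[@type='network']"):
        mac = iface.find("mac")
        source = iface.find("source")
        result.append((
            mac.get("address") if mac is not None else None,
            source.get("network") if source is not None else None,
        ))
    return result


# Abridged, hand-written stand-in for the output of `virsh dumpxml test_default`
SAMPLE = """
<domain type='kvm'>
  <name>test_default</name>
  <devices>
    <interface type='network'>
      <mac address='52:54:00:00:00:01'/>
      <source network='vagrant-libvirt'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

print(attached_networks(SAMPLE))
```

To use this with a real machine, you could feed the output of virsh dumpxml into attached_networks instead of the sample.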

virsh net-dumpxml vagrant-libvirt
virsh net-dhcp-leases vagrant-libvirt
ifconfig virbr1
brctl show virbr1

It is instructive to play a bit with that, maybe add a second virtual machine using virt-manager and see how it is reflected in the virsh tool and the network configuration and so forth.

Advanced features

Let us now look at some of the more advanced options you have with the libvirt Vagrant plugin. The first option I would like to mention is to use custom networking. For that purpose, assume that you have created a libvirt network outside of Vagrant. As an example, create a file /tmp/my-network.xml with the following content.

<network>
  <name>my-network</name>
  <bridge name='my-bridge' stp='on' delay='0'/>
  <ip address='' netmask=''/>
</network>


Then run the following commands to create and start a custom network from this definition using virsh.

virsh net-define /tmp/my-network.xml
virsh net-start my-network

This will create a simple network supported by a Linux bridge my-bridge (which libvirt will create for us). As there is no forward block, the network will be isolated and machines attached to it will not be able to connect to the outside world, so this is the equivalent of a private network. To connect a machine to this network, use the following Vagrant file (make sure to delete our first machine again using vagrant destroy first).

Vagrant.configure("2") do |config|
  config.vm.box = "debian/jessie64"
  config.vm.hostname = "test"
  config.vm.network :private_network,
      :ip => "",
      :libvirt__network_name => "my-network"  
  config.vm.provider :libvirt do |v|
    # provider-specific settings (v.cpus, v.memory, v.default_prefix) as before
  end
end

Note the line starting with config.vm.network, in which we add our custom network as private network, with a static IP address. When you now run vagrant up again, and repeat the analysis above, you will see that Vagrant has actually attached our machine to two networks – the private network and, in addition, a “public” network which Vagrant will always create to reach the machines via SSH and to enable access to the Internet from the machine.

At this point, it is important that we create the network before we run Vagrant. In fact, if we refer to a network in a Vagrantfile that does not exist yet, Vagrant will be happy to create the network for us, but will use default settings – for instance it will attach a DHCP server to our network and allow access to the internet. This is most likely not what you want, so make sure to create the network using virsh net-define before running vagrant up.

Next let us try to understand how we can use Vagrant to attach additional disks to our virtual machine. Thanks to the libvirt plugin, this is again very easy. Simply add a line like

v.storage :file, :size => '5G', :type => 'raw', :bus => 'scsi'

to the customization section (i.e. to the section which also contains the settings for the number of virtual CPUs and the memory). This will attach a virtual SCSI device to our machine, with a size of 5 GB and image type “raw”. Vagrant will then take care of creating this image, setting it up as a volume in libvirt and attaching it to the virtual machine.

Note that the plugin is also able to automatically synchronize folders in the virtual machine with folders on the host, using for instance rsync. Please refer to the excellent documentation for this and more options.

This completes our short tour through the Vagrant libvirt plugin. You might have realized that libvirt and virsh are powerful tools with a rich data model – we have seen objects like domains, networks, volumes and storage devices. In the next post, we will dig a bit deeper into the inner structure of libvirt and learn how to create virtual machines from scratch, without using a tool like Vagrant.

Virtual networking labs – virtual Ethernet networks with VLAN tags

In the previous posts, we have mainly been looking at virtual networking within one single physical host. This is nice, but to build cloud environments, we need to establish virtual networks across several physical hosts. In this post, we will start to look into technologies that make this possible and learn how VLAN tagging supports virtual Ethernet networks.

An introduction to virtual Ethernet networks

Today, essentially every Ethernet network you will come across is a switched network, where every server is more or less directly connected to a switch, and the switches are connected to each other to propagate traffic through your data center. A naive approach would be to use layer 2 switches to combine all Ethernet networks into one large broadcast domain, where every node is connected to every other node by a sequence of switches. This approach, however, creates a very large broadcast domain and is difficult to maintain as changes to the topology need to be done by a physical rearrangement. It might therefore be beneficial to have some way of dividing your physical Ethernet network into two or more logical (“virtual”) networks.

For servers that are connected to the same switch, this can be implemented by an approach known as port-based VLAN. To illustrate the idea, let us look at the following configuration, where four servers are connected to four different ports of one switch.


With this setup, a broadcast issued by one server will reach every other server, and all servers are part of one Ethernet network. To introduce virtualization, we could simply add some logic to the switch to divide the ports into two sets, where forwarding of Ethernet frames is only done within those two sets. If, for instance, we define one set to consist of the two ports connected to server 1 and server 2 (green), and the other consisting of the remaining two ports (red), and configure the switch such that it will only forward frames between ports with the same color, we will effectively have established two virtual networks.
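The forwarding rule of such a switch is simple enough to write down in a few lines of Python. This is only a toy model of the idea, not the logic of any real switch: the ports are numbered 1 to 4 as in the example above, and the function name flood is made up.

```python
def flood(port_vlan, ingress_port):
    """Return the ports to which a broadcast arriving on ingress_port is forwarded."""
    vlan = port_vlan[ingress_port]
    return sorted(
        port
        for port, port_color in port_vlan.items()
        if port_color == vlan and port != ingress_port
    )


# Ports 1 and 2 (server 1 and server 2) are green, ports 3 and 4 are red
port_vlan = {1: "green", 2: "green", 3: "red", 4: "red"}

print(flood(port_vlan, 1))  # [2]
print(flood(port_vlan, 4))  # [3]
```

A broadcast from server 1 therefore only reaches server 2, even though all four servers are connected to the same physical switch.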


This is nice, as – if your switch supports it – no additional hardware is required and you can define and change the configuration entirely in software. But there is a problem. Typically, your data center will have more than one switch. How can you extend these virtual networks across multiple switches? Of course, you could add an additional connection for every virtual network between any two switches, but this will blow up your hardware requirements and again make changes in hardware necessary. To avoid this, a technology called VLAN trunking is needed.

With VLAN trunking, different virtual LANs (VLANs) can share the same physical connection. To enable this, Ethernet frames that travel on this shared part of your infrastructure are enhanced by adding a VLAN tag which contains a numerical ID identifying the VLAN to which they belong, as indicated in the following diagram.


Here, we have two switches, which both use port-based virtual networks as just discussed. The upper two ports of each switch belong to the green network which is assigned the ID 1 (VLAN ID or VID, note that in reality, this ID is often reserved) and the other set of ports is part of VLAN 2 (the red network). When a frame leaves, for instance, the server in the upper left corner and needs to be forwarded to the server in the upper right corner, the switch will add a VLAN tag to indicate that this frame is part of VLAN 1. Then the frame travels across the connection between the two switches. When the switch on the right hand side receives the frame, it strips off the VLAN tag again and, based on the tag, injects the frame back into its own VLAN 1, so that it can only reach the green ports on the right hand side.

Thus your network is divided into two parts. In the middle, on the connection between the two switches, frames carry the VLAN tag to flag them as being part of the red or green network. Thus the ports facing this part need to be aware of the VLAN tag; these ports are often called trunk ports. The parts of the network behind the switches, however, never see a VLAN tag, as it is added and removed by the switches when transmitting and receiving on trunk ports. The ports facing the servers are called access ports. Thus the servers do not need to know to which VLAN they belong, and the configuration can be done entirely on the switches and in software.

The standard that describes all this and also defines how a VLAN tag is added to an Ethernet frame is called IEEE 802.1Q. This standard adds a 16-bit field called TCI (tag control information) to the layout of an Ethernet frame. Four bits of this field are reserved for other purposes, so that 12 bits remain for the VLAN ID, allowing for 4096 different values (of which 0 and 4095 are reserved, leaving 4094 usable VLAN IDs).
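To make the layout of the tag a bit more tangible, here is a short Python sketch that builds and parses it. On the wire, the TCI is preceded by a 16-bit tag protocol identifier with the value 0x8100, and the four bits not used for the VLAN ID are a 3-bit priority (PCP) and a 1-bit drop eligible indicator (DEI). The function names pack_tag and unpack_tag are chosen for this example only.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def pack_tag(vid, pcp=0, dei=0):
    """Build the four bytes of a VLAN tag: TPID followed by the TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("the VLAN ID must fit into 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP has 3 bits and DEI has 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def unpack_tag(tag):
    """Split a four-byte VLAN tag into its PCP, DEI and VID fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0xFFF}


tag = pack_tag(100)
print(tag.hex())        # 81000064
print(unpack_tag(tag))  # {'pcp': 0, 'dei': 0, 'vid': 100}
```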

Lab 8: VLAN networking with Linux

Linux has the capability to create virtual Ethernet devices that are associated with a VLAN network. To see this in action, get lab 8 from my GitHub repository and run it.

git clone https://github.com/christianb93/networking-samples
cd networking-samples/lab8
vagrant up

The Vagrantfile and the three Ansible playbooks that are located in this directory will now execute and bring up three virtual machines. Here is a diagram summarizing the network configuration that the scripts create (we will see how this is done manually further below).


We see that all three machines are connected to one virtual Ethernet cable (we use a VirtualBox internal network for that purpose). The three interfaces attached to this network are all configured as part of one common IP network.

However, in addition, we have set up two virtual networks, one network with VLAN ID 100 (green), and a second network with VLAN ID 200 (red). In each Linux machine, every virtual network to which the machine is attached is represented by a virtual device called a VLAN device.

Let us look at boxA to see how this works. On boxA, the Ansible playbook that got executed during the vagrant up ran the following command

vconfig add enp0s8 100

This command creates a new network interface enp0s8.100 sitting on top of enp0s8 but being associated with the VID 100. This device is an ordinary device from the point of view of the operating system, i.e. you can assign IP addresses, add routes and so forth.

Such a VLAN device operates as follows. When an Ethernet frame arrives on the underlying device, enp0s8 in our case, the kernel checks whether the frame contains a VLAN tag. If no, the processing is as usual. If yes, then the kernel next checks whether a VLAN device is associated with this VID. If there is one, it strips off the VLAN tag, changes the frame so that it appears to be coming from the virtual VLAN device and re-injects the frame into the networking stack. The frame then travels up the stack and can be processed by the higher layers, e.g. the IP layer. Conversely, if a frame needs to be transmitted on enp0s8.100, the kernel adds a VLAN tag with the VID 100 to the frame and redirects it to the physical device enp0s8.
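The logic just described can be mimicked in a few lines of Python operating on raw Ethernet frames. This is a simplified model and not the actual kernel code. In particular, the case of a tagged frame for which no VLAN device exists is not covered by the description above, and the sketch simply drops such a frame, which is an assumption. The function names receive and transmit are made up.

```python
import struct

ETHERTYPE_8021Q = 0x8100  # EtherType of a VLAN-tagged frame


def receive(frame, parent, vlan_devices):
    """Return (device name, frame as seen by that device), or None if dropped.

    vlan_devices maps a VID to the name of the VLAN device for that VID.
    """
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != ETHERTYPE_8021Q:
        # No VLAN tag: processing as usual on the underlying device
        return parent, frame
    vid = struct.unpack("!H", frame[14:16])[0] & 0x0FFF
    if vid not in vlan_devices:
        # Assumption in this sketch: tagged frames without a matching device are dropped
        return None
    # Strip off the four tag bytes and hand the frame to the VLAN device
    return vlan_devices[vid], frame[:12] + frame[16:]


def transmit(frame, vid):
    """Insert a VLAN tag with the given VID after the two MAC addresses."""
    tag = struct.pack("!HH", ETHERTYPE_8021Q, vid)
    return frame[:12] + tag + frame[12:]


# Destination MAC, source MAC, EtherType IPv4, dummy payload
frame = bytes(6) + bytes([2, 0, 0, 0, 0, 1]) + b"\x08\x00" + b"payload"
devices = {100: "enp0s8.100"}

print(receive(frame, "enp0s8", devices)[0])                 # enp0s8
print(receive(transmit(frame, 100), "enp0s8", devices)[0])  # enp0s8.100
print(receive(transmit(frame, 200), "enp0s8", devices))     # None
```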

Let us see this in action. Open two SSH connections, one to boxA, and one to boxB – if you use the Gnome terminal, simply run

for i in "A" "B" ; do gnome-terminal -e "vagrant ssh box$i"; done

In boxA, start a tcpdump session on the VLAN device.

sudo tcpdump -e -i enp0s8.100

On boxB, ping boxA, using the IP address of the VLAN device. You will see an ordinary frame coming in, with ethertype IPv4. There is no VLAN tag within this frame, and the VLAN device operates like a physical device with no VLAN tagging.

Now, stop the tcpdump session and start it again, but this time, use enp0s8 instead of enp0s8.100, i.e. the underlying physical device. If you now run a ping again, you will see that the ethertype of the incoming packets has changed and is now 802.1Q, indicating that the frame is tagged (tcpdump will also show you the VLAN ID 100).

When you ping boxA from boxB using the IP address of the physical interface enp0s8 instead, the traffic will be as expected, coming in on enp0s8 without any VLAN tag, and will not reach enp0s8.100. Thus even though you have put a VLAN device on top of the physical interface, you can still use the physical interface as usual.

It is instructive to check the ARP cache on boxB using arp -n after the pings have been exchanged. You will see that the MAC address of the enp0s8 device on boxA now appears twice, once with the IP address of the physical device and once with the IP address of the VLAN device. So the MAC address is shared between the virtual VLAN device and the physical device.

Still, the traffic is separated by the Linux kernel. If, for instance, you try to ping the IP address of boxC on the red network from boxA, you will not be successful, because this IP address is not reachable from the green network. If you run the ping on boxB, however, it will work, because boxB participates in both virtual networks.

This closes today's lab. In the next lab, we will start to look at a completely different approach to building virtual networks: overlay networks.

Automating provisioning with Ansible – playbooks

So far, we have used Ansible to execute individual commands on all hosts in the inventory, one by one. Today, we will learn how to use playbooks to orchestrate the command execution. As a side note, we will also learn how to set up a local test environment using Vagrant.

Setting up a local test environment

To develop and debug Ansible playbooks, it can be very useful to have a local test environment. If you have a reasonably powerful machine (RAM will be the bottleneck), this can be easily done by spinning up virtual machines using Vagrant. Essentially, Vagrant is a tool to orchestrate virtual machines, based on configuration files which can be put under version control. We will not go into details on Vagrant in this post, but only describe briefly how the setup works.

First, we need to install Vagrant and VirtualBox. On Ubuntu, this is done using

sudo apt-get install virtualbox vagrant

Next, create a folder vagrant in your home directory, switch to it and create an SSH key pair that we will later use to access our virtual machines.

ssh-keygen -b 2048 -t rsa -f vagrant_key -P ""

This will create two files, the private key vagrant_key and the corresponding public key vagrant_key.pub.

Next, we need a configuration file for the virtual machines we want to bring up. This file is traditionally called Vagrantfile. In our case, the file looks as follows.

Vagrant.configure("2") do |config|

  config.ssh.private_key_path = ['~/.vagrant.d/insecure_private_key', '~/vagrant/vagrant_key']
  config.vm.provision "file", source: "~/vagrant/vagrant_key.pub", destination: "~/.ssh/authorized_keys" 
  config.vm.define "boxA" do |boxA|
    boxA.vm.box = "ubuntu/bionic64"
    boxA.vm.network "private_network", ip: ""
  end

  config.vm.define "boxB" do |boxB|
    boxB.vm.box = "ubuntu/bionic64"
    boxB.vm.network "private_network", ip: ""
  end
end

We will not go into the details, but essentially this file instructs Vagrant to provision two machines, called boxA and boxB (the file, including some comments, can also be downloaded here). Both machines will be connected to a virtual network device and will be reachable from the host and from each other using static IP addresses on a private network (using the VirtualBox networking machinery in the background, which I have described in a bit more detail in this post). On both machines, we will install the public key just created, and we ask Vagrant to use the corresponding private key.

Now place this file in the newly created directory ~/vagrant and, from there, run

vagrant up

Vagrant will now start to spin up the two virtual machines. When you run this command for the first time, it needs to download the machine image which will take some time, but once the image is present locally, this usually only takes one or two minutes.

Our first playbook

Without further ado, here is our first playbook that we will use to explain the basic structure of an Ansible playbook.

# Our first play. We create an Ansible user on the host
- name: Initial setup
  hosts: all
  become: yes
  tasks:
  - name: Create a user ansible_user on the host
    user:
      name: ansible_user
      state: present
  - name: Create a directory /home/ansible_user/.ssh
    file:
      path: /home/ansible_user/.ssh
      group: ansible_user
      owner: ansible_user
      mode: 0700
      state: directory
  - name: Distribute ssh key
    copy:
      src: ~/.keys/ansible_key.pub
      dest: /home/ansible_user/.ssh/authorized_keys
      mode: 0700
      owner: ansible_user
      group: ansible_user
  - name: Add newly created user to sudoers file
    lineinfile:
      path: /etc/sudoers
      state: present
      line: "ansible_user      ALL = NOPASSWD: ALL"


Let us go through this line by line. First, recall from our previous posts that an Ansible playbook is a sequence of plays. Each play in turn refers to a group of hosts in the inventory.


A playbook is written in YAML syntax, which you probably know well if you have followed my series on Kubernetes (if not, you might want to take a look at the summary page on Wikipedia). Being a sequence of plays, it is a list. Our example contains only one play.
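
To make the list structure a bit more concrete, here is a minimal sketch of roughly what our playbook looks like once a YAML parser has turned it into Python data. The structure is written out by hand as a literal, so no parser is involved, and only the first task is included to keep the example short.

```python
# Hand-written equivalent of the parsed playbook: a list of plays,
# where each play is a mapping (dict) of attributes.
playbook = [
    {
        "name": "Initial setup",
        "hosts": "all",
        "become": True,  # YAML 1.1 parsers such as PyYAML read "yes" as a boolean
        "tasks": [
            {
                "name": "Create a user ansible_user on the host",
                "user": {"name": "ansible_user", "state": "present"},
            },
            # the remaining three tasks are omitted for brevity
        ],
    }
]

# Walk the structure: plays at the top level, tasks nested inside each play.
for play in playbook:
    print(f"Play '{play['name']}' targets '{play['hosts']}'")
    for task in play["tasks"]:
        print(f"  task: {task['name']}")
```

The point to take away is that the top level is a list, each play is a dictionary, and the tasks attribute is again a list of dictionaries.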

The first attribute of our play is called name. This is simply a human-readable name that ideally briefly summarizes what the play is doing. The next attribute hosts refers to a set of hosts in our inventory. In our case, we want to run the play for all nodes in the inventory. More precisely, this is a pattern which can specify individual hosts, groups of hosts or various combinations.

The third attribute is the become flag that we have already seen several times. We need this, as our vagrant user is not root and we need to use sudo to execute most of our commands.

The next attribute, tasks, is itself a list and contains all tasks that make up the play. When Ansible executes a playbook, it will execute the plays in the order in which they are present in the playbook. For each play, it will go through the tasks and, for each task, it will loop through the corresponding hosts and execute this task on each host. So a task will have to be completed for each host before the next task starts. If a task fails on a host, this host will be removed from the rotation, and the execution of the play will continue on the remaining hosts.
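
The order of execution described here can be sketched in a few lines of Python. This is only a simplified model and not how Ansible is actually implemented: the function run_play and the callback execute are names made up for this illustration, and the sketch visits the hosts one after the other, whereas Ansible works on several hosts in parallel within a task.

```python
from __future__ import annotations

from typing import Callable


def run_play(hosts: list[str], tasks: list[str],
             execute: Callable[[str, str], bool]) -> list[str]:
    """Run each task on all remaining hosts before moving on to the next task.

    execute(task, host) is a hypothetical callback that returns True on success.
    The return value is the list of hosts that completed every task.
    """
    active = list(hosts)
    for task in tasks:
        still_ok = []
        for host in active:
            if execute(task, host):
                still_ok.append(host)
            # a host that failed is dropped and skips all later tasks
        active = still_ok
    return active


log = []


def fake_execute(task: str, host: str) -> bool:
    """Record the call and let boxB fail its first task."""
    log.append((task, host))
    return not (task == "create user" and host == "boxB")


survivors = run_play(["boxA", "boxB"],
                     ["create user", "create .ssh directory"],
                     fake_execute)
print(survivors)  # ['boxA']
for entry in log:
    print(entry)
```

The log shows the first task running on both machines, while the second task is only attempted on boxA, because boxB dropped out after its failure.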

We have learned in an earlier post that a task is basically the execution of a module. Each task usually starts with a name. The next line specifies the module to execute, followed by a list of parameters for this module. Our first task executes the user module, passing the name of the user to be created as an argument, similar to what we did when executing commands ad-hoc. Our second task executes the file module, the third task the copy module and the last task the lineinfile module. If you have followed my previous posts, you will easily recognize what these tasks are doing – we create a new user, create its .ssh directory, distribute the SSH key in ~/.keys/ansible_key.pub and add the user to the sudoers file on each host (of course, we could as well continue to use the vagrant user, but we perform the switch to a different user for the sake of demonstration). To run this example, you will have to create an SSH key pair called ansible_key and store it in the subdirectory .keys of your home directory, as in my last post.
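
To get a feeling for what a module like lineinfile does with state: present, here is a small Python sketch of the underlying idea: the line is only appended if it is not yet in the file, so running the task a second time changes nothing. The helper ensure_line is a name invented for this illustration, it works on a temporary stand-in file rather than the real /etc/sudoers, and the actual module offers much more (regular expressions, insertion points, backups and so forth).

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append line to an existing file unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    existing = path.read_text().splitlines()
    if line in existing:
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    sudoers = Path(tmp) / "sudoers"  # stand-in file, not the real /etc/sudoers
    sudoers.write_text("root ALL=(ALL:ALL) ALL\n")
    entry = "ansible_user      ALL = NOPASSWD: ALL"

    first = ensure_line(sudoers, entry)   # line is missing, file gets changed
    second = ensure_line(sudoers, entry)  # line is present, nothing to do
    content = sudoers.read_text()

print(first, second)  # True False
```

This "changed or not changed" result is what Ansible reports for each task and host in its output.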

Executing our playbook

How do we actually execute a playbook? The community edition of Ansible contains a command-line tool called ansible-playbook that accepts one or more playbooks as arguments and executes them. So assuming that you have saved the above playbook in a file called myFirstPlaybook.yaml, you can run it as follows.

ansible-playbook -i hosts.ini  -u vagrant --private-key ~/vagrant/vagrant_key myFirstPlaybook.yaml

The parameters are quite similar to the parameters of the ansible command. We use -i to specify the location of an inventory file, -u to specify the user to use for the SSH connections and --private-key to specify the location of the private key file of this user.
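
If you prefer to drive this from a script, the same call can be assembled in Python. The following is a minimal sketch under the assumption that ansible-playbook is on your path; the helper playbook_command is a name made up for this example and only builds the argument list without running anything.

```python
import shlex
from pathlib import Path


def playbook_command(playbook: str, inventory: str, user: str,
                     private_key: str) -> list:
    """Build the argument list for an ansible-playbook run."""
    return [
        "ansible-playbook",
        "-i", inventory,                    # inventory file
        "-u", user,                         # user for the SSH connections
        "--private-key", str(Path(private_key).expanduser()),
        playbook,
    ]


cmd = playbook_command("myFirstPlaybook.yaml", "hosts.ini",
                       "vagrant", "~/vagrant/vagrant_key")
print(shlex.join(cmd))
# To actually execute it, pass the list to subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than as one string avoids any quoting issues if a path contains spaces.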

When we run this playbook, Ansible will log a summary of the actions to the console. We will see that it starts to work on our play, and then, for each task, executes the respective action once for each host. When the script completes, we can verify that everything worked by using SSH to connect to the machines using our newly created user, for instance

ssh -i ~/.keys/ansible_key -o StrictHostKeyChecking=no ansible_user@

Instead of executing our playbook manually after Vagrant has completed the setup of the machines, it is also possible to integrate Ansible with Vagrant. In fact, Ansible can be used in Vagrant as a provisioner, meaning that we can specify an Ansible playbook that Vagrant will execute for each host immediately after it has been provisioned. To see this in action, copy the playbook to the Vagrant directory ~/vagrant and, in the Vagrantfile, add the three lines

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "myFirstPlaybook.yaml"
  end

immediately after the line starting with config.vm.provision "file". When you now shut down the machines again using

vagrant destroy

from within the vagrant directory and then re-start using

vagrant up

you will see that the playbook gets executed once for each machine that is being provisioned. Behind the scenes, Vagrant will create an inventory in ~/vagrant/.vagrant/provisioners/ansible/inventory and will add the hosts there (when inspecting this file, you will find that Vagrant incorrectly adds the insecure default SSH key as a parameter to each host, but apparently this is overridden by the settings in our Vagrantfile and does not cause any problems).

Of course there is much more that you can do with playbooks. As a first more complex example, I have created a playbook that

  • Creates a new user ansible_user and adds it to the sudoers file
  • Distributes a corresponding public SSH key
  • Installs Docker
  • Installs pip3 and the Python Docker library
  • Pulls an NGINX container from the Docker hub
  • Creates an HTML file and dynamically adds the primary IP address of the host
  • Starts the container and maps the HTTP port

To run this playbook, copy it to the vagrant directory as well along with the HTML file template, change the name of the playbook in the Vagrantfile to nginx.yaml and run vagrant as before.

Once this script completes, you can point your browser to either of the two machines and see a custom NGINX welcome screen. This playbook uses some additional features that we will discuss in the next post, most notably variables and facts.