Managing KVM virtual machines part III – using libvirt with Ansible

In my previous post, we saw how the libvirt toolset can be used to directly create virtual volumes, virtual networks and KVM virtual machines. If this is not the first time you visit my blog, you will know that I am a big fan of automation, so let us investigate today how we can use Ansible to bring up KVM domains automatically.

Using the Ansible libvirt modules

Fortunately, Ansible comes with a couple of libvirt modules that we can use for this purpose. These modules all follow essentially the same approach – we initially create objects from an XML definition and can then perform additional operations on them, referencing them by name.

To use these modules, you will have to install a couple of additional libraries on your workstation. First, obviously, you need Ansible, and I suggest that you take a look at my short introduction to Ansible if you are new to it and do not have it installed yet. I have used version 2.8.6 for this post.

Next, Ansible of course uses Python, and there are a couple of Python modules that we need. In the first post on libvirt, I have already shown you how to install the python3-libvirt package. In addition, you will have to run

pip3 install lxml

to install the XML module that we will need.

Let us start with a volume pool. In order to somehow isolate our volumes from other volumes on the same hypervisor, it is useful to put them into a separate volume pool. The virt_pool module is able to do this for us, given that we provide an XML definition of the pool, which we can of course create from a Jinja2 template, using the name of the volume pool and its location on the hard disk as parameters.
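To make this a bit more concrete, here is a minimal sketch of what such tasks could look like – the template name pool.xml.j2 and the variable pool_name are assumptions of mine, not necessarily the names used in my repository.

- name: Define the volume pool from a Jinja2 template
  virt_pool:
    name: "{{ pool_name }}"
    command: define
    xml: "{{ lookup('template', 'pool.xml.j2') }}"

# for a directory based pool, an additional task with command: build
# might be needed before the pool can be started
- name: Start the volume pool
  virt_pool:
    name: "{{ pool_name }}"
    state: active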

At this point, we have to decide where we store the state of our configuration – the storage pool, but potentially also SSH keys and the XML templates that we use. I have decided to put all this into a directory called state in the same directory where the playbook is kept. This avoids polluting system- or user-wide directories, but also implies that when we eventually clean up this directory, we first have to remove the storage pool again in libvirt, as we would otherwise have a stale reference from the libvirt XML structures in /etc/libvirt to a non-existing directory.

Once we have a storage pool, we can create a volume. Unfortunately, Ansible does not (at least not at the time of writing this) have a module to handle libvirt volumes, so a bit of scripting is needed. Essentially, we will have to

  • Download a base image if not done yet – we can use the get_url module for that purpose
  • Check whether our volume already exists (this is needed to make sure that our playbook is idempotent)
  • If not, create a new overlay volume using our base image as backing store

We could of course upload the base image into the libvirt pool or, alternatively, keep the base image outside of libvirt's control and refer to it via its location on the local file system.
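Put together, the corresponding part of a playbook could look roughly like the sketch below – variable names, paths and the exact qemu-img invocation are illustrative assumptions, not necessarily what my playbook actually does.

- name: Download the base image if not done yet
  get_url:
    url: "{{ base_image_url }}"
    dest: "{{ state_dir }}/base-image.qcow2"

- name: Check whether our volume already exists
  stat:
    path: "{{ pool_dir }}/test-volume"
  register: volume_file

- name: Create an overlay volume using the base image as backing store
  command: >
    qemu-img create -f qcow2
    -b {{ state_dir }}/base-image.qcow2
    {{ pool_dir }}/test-volume
  when: not volume_file.stat.exists

- name: Make libvirt pick up the new volume
  command: virsh pool-refresh {{ pool_name }}
  when: not volume_file.stat.exists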

At this point, we have a volume and can start to create a network. Here, we can again use an Ansible module which has idempotency already built into it – virt_net. As before, we need to create an XML description for the network from a Jinja2 template and hand that over to the module to create our network. Once created, we can then refer to it by name and start it.
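Again, here is a minimal sketch of such tasks, assuming a template network.xml.j2 and the network name test-network (both of which are my choices for the purpose of illustration).

- name: Define the virtual network from a Jinja2 template
  virt_net:
    name: test-network
    command: define
    xml: "{{ lookup('template', 'network.xml.j2') }}"

- name: Start the virtual network
  virt_net:
    name: test-network
    state: active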

Finally, it is time to create a virtual machine. Here, the virt module comes in handy. Still, we need an XML description of our domain, and there are several ways to create that. We could of course build an XML file from scratch based on the specification on libvirt.org, or simply use the tools explored in the previous post to create a machine manually, dump its XML definition and use that as a template. I strongly recommend doing both – dump the XML file of a virtual machine and then go through it line by line, with the libvirt documentation in a separate window, to see what it does.

When defining the virtual machine via the virt module, I ran into error messages that did not show up when using the same XML definition with virsh directly, so I decided to also use virsh for that purpose.
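A sketch of this workaround – first checking whether the domain exists to keep the playbook idempotent and then defining it with virsh – could look as follows; file and domain names are again assumptions.

- name: Check whether the domain already exists
  command: virsh dominfo test-instance
  register: dominfo
  failed_when: false
  changed_when: false

- name: Define the domain using virsh
  command: virsh define {{ state_dir }}/test-instance.xml
  when: dominfo.rc != 0

- name: Start the domain
  virt:
    name: test-instance
    state: running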

In case you want to try out the code I had at this point, follow the instructions above to install the required modules and then run (there is still an issue with this code, which we will fix in a minute)

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples
cd libvirt
git checkout origin/v0.1
ansible-playbook site.yaml

When you now run virsh list, you should see the newly created domain test-instance, and using virt-viewer or virt-manager, you should be able to see its screen and watch it boot up (after a short delay during which the boot process waits for a device labeled as the root device – this is not a problem, the boot process will still continue).

So now you can log into your machine…but wait a minute, the image that we use (the Ubuntu cloud image) has no root password set. No problem, we will use SSH. So figure out the IP address of your domain using virsh domifaddr test-instance…hmmm…no IP address. Something is still wrong with our machine – it is running, but there is no way to log into it. And even if we had an IP address, we could not SSH into it because there is no SSH key deployed in the image.

Apparently we are missing something, and in fact, things like getting a DHCP lease and creating an SSH key would be the job of cloud-init. So there is a problem with the cloud-init configuration of our machine, which we will now investigate and fix.

Using cloud-init

First, we have to diagnose our problem in a bit more detail. To do this, it would of course be beneficial if we could log into our machine. Fortunately, there is a great tool called virt-customize which is part of the libguestfs project and allows you to manipulate a virtual disk image. So let us shut down our machine, add a root password and restart the machine.

virsh destroy test-instance
sudo virt-customize \
  -a state/pool/test-volume \
  --root-password password:secret
virsh start test-instance

After some time, you should now be able to log into the instance as root using the password “secret”, either from the VNC console or via virsh console test-instance.

Inside the guest, the first thing we can try is to run dhclient to get an IP address. Once this completes, you should see that ens3 has an IP address assigned, so our network and the DHCP server works.

Next let us try to understand why no DHCP address was assigned during the boot process. Usually, this is done by cloud-init, which, in turn, is started by the systemd mechanism. Specifically, cloud-init uses a generator, i.e. a script (/lib/systemd/system-generators/cloud-init-generator) which then dynamically creates unit files and targets. This script (or rather a Jinja2 template used to create it when cloud-init is installed) can be found here.

Looking at this script and the logging output in our test instance (located at /run/cloud-init), we can now figure out what has happened. First, the script tries to determine whether cloud-init should be enabled. For that purpose, it uses the following sources of information.

  • The kernel command line as stored in /proc/cmdline, where it will search for a parameter cloud-init which is not set in our case
  • Marker files in /etc/cloud, which are also not there in our case
  • A default as fallback, which is set to “enabled”

So far so good, cloud-init has figured out that it should be enabled. Next, however, it checks for a data source, i.e. a source of meta-data and user-data. This is done by calling another script called ds-identify. It is instructive to re-run this script in our virtual machine with an increased log level and take a look at the output.

DEBUG=2 /usr/lib/cloud-init/ds-identify --force
cat /run/cloud-init/ds-identify.log

Here we see that the script tries several sources, each modeled as a function. An example is the check for EC2 metadata, which uses a combination of data like the DMI serial number or hypervisor IDs with “seed files” (which can be used to enforce the use of a specific data source) to check whether we are running on EC2.

For setups like ours, which are not part of a cloud environment, there is a special data source called the NoCloud data source. To force cloud-init to use this data source, there are several options.

  • Add the switch “ds=nocloud” to the kernel command line
  • Fake a DMI product serial number containing “ds=nocloud”
  • Add a special marker directory (“seed directory”) to the image
  • Attach a disk that contains a file system with label “cidata” or “CIDATA” to the machine

So the first step to use cloud-config would be to actually enable it. We will choose the approach of adding a correctly labeled disk image to our machine. To create such a disk, there is a nice tool called cloud-localds, but we will do this manually to better understand how it works.

First, we need to define the cloud-config data that we want to feed to cloud-init by putting it onto the disk. In general, cloud-init expects two pieces of data: meta data, providing information on the instance, and user data, which contains the actual cloud-init configuration.

For the metadata, we use a file containing only a bare minimum.

$ cat state/meta-data
instance-id: iid-test-instance
local-hostname: test-instance

For the user data, we use the example from the cloud-init documentation as a starting point, which will simply set a password for the default user.

$ cat state/user-data
#cloud-config
password: password
chpasswd: { expire: False }
ssh_pwauth: True

Now we use the genisoimage utility to create an ISO image from this data. Here is the command that we need to run.

genisoimage \
  -o state/cloud-config.iso \
  -joliet -rock \
  -volid cidata \
  state/user-data \
  state/meta-data 

Now open virt-manager, shut down the machine, navigate to the machine details, and use “Add hardware” to add our new ISO image as a CDROM device. When you now restart the machine, you should be able to log in as user “ubuntu” with the password “password”.

When playing with this, there is an interesting pitfall. The cloud-init tool actually caches data on disk and will only process a changed configuration if the instance ID changes. So if cloud-init has run once and you then make a change, you need to delete the root volume as well to make sure that this cache is not used – otherwise you will run into interesting issues during testing.

Similar mechanisms need to be kept in mind for the network configuration. When running for the first time, cloud-init will create a network configuration in /etc/netplan. This configuration makes sure that the network is correctly reconfigured when we reboot the machine, but it contains the MAC address of the virtual network interface. So if you destroy and undefine the domain while leaving the disk untouched, and create a new domain (with a new MAC address) using the same disk, the network configuration will fail because the MAC address no longer matches.

Let us now automate the entire process using Ansible and add some additional configuration data. First, having a password is of course not the most secure approach. Alternatively, we can use the ssh_authorized_keys key in the cloud-config file, which accepts a list of public SSH keys that will be added as authorized keys for the default user.
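A user-data file using this mechanism could look like the sketch below, where the key string is of course just a placeholder for the actual public key that the playbook generates.

#cloud-config
ssh_pwauth: False
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2E... ansible-generated-key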

Next, the way we have managed our ISO image is not yet ideal. In fact, when we attach the image to a virtual machine, libvirt will assume ownership of the file and later change its owner and group to “root”, so that an ordinary user cannot easily delete it. To fix this, we can create the ISO image as a volume under the control of libvirt (i.e. in our volume pool) so that we can remove or update it using virsh.
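One possible way to do this – not necessarily exactly what my playbook does – is to create a small raw volume in our pool with virsh and to upload the generated ISO image into it; size and names are illustrative, and as written, these tasks are not yet idempotent.

- name: Create a volume for the cloud-init ISO image in our pool
  command: virsh vol-create-as {{ pool_name }} cloud-config.iso 1M --format raw

- name: Upload the generated ISO image into this volume
  command: >
    virsh vol-upload --pool {{ pool_name }}
    cloud-config.iso {{ state_dir }}/cloud-config.iso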

To see all this in action, clean up by removing the virtual machine and the generated ISO file in the state directory (which should work as long as your user is in the kvm group), get back to the directory into which you cloned my repository and run

cd libvirt
git checkout master
ansible-playbook site.yaml
# Wait for machine to get an IP
ip=""
while [ "$ip" == "" ]; do
  sleep 5
  echo "Waiting for IP..."
  ip=$(virsh domifaddr \
    test-instance \
    | grep "ipv4" \
    | awk '{print $4}' \
    | sed "s/\/24//")
done
echo "Got IP $ip, waiting another 5 seconds"
sleep 5
ssh -i ./state/ssh-key ubuntu@$ip

This will run the playbook which configures and brings up the machine, wait until the machine has acquired an IP address and then SSH into it using the ubuntu user with the generated SSH key.

The playbook will also create two scripts and place them in the state directory. The first script – ssh.sh – will exactly do the same as our snippet above, i.e. figure out the IP address and SSH into the machine. The second script – cleanup.sh – removes all created objects, except the downloaded Ubuntu cloud image so that you can start over with a fresh install.

This completes our post on using Ansible with libvirt. Of course, this does not replace a more sophisticated tool like Vagrant, but for simple setups, it works nicely and avoids the additional dependency on Vagrant. I hope you had some fun – keep on automating!

A cloud in the cloud – running OpenStack on a public cloud platform

When you are playing with virtualization and cloud technology, you will sooner or later realize that the resources of an average lab PC are limited. Especially memory can easily become a bottleneck if you need to spin up more than just a few virtual machines on an average desktop computer. Public cloud platforms, however, offer virtually unlimited scalability – so why not move your lab into the cloud to build a cloud in the cloud? In this post, I show you how this can be done and what you need to keep in mind when selecting your platform.

The all-in-one setup on Packet.net

Over the last couple of months, I have spent time playing with OpenStack, which involves a minimal but constantly growing installation of OpenStack on a couple of virtual machines running on my PC. While this was working nicely for the first couple of weeks, I soon realized that scaling this would require more resources – especially memory – than an average PC typically has. So I started to think about moving my entire setup into the cloud.

Of course my first idea was to use a bare metal platform like Packet.net to bring up a couple of physical nodes replacing the VirtualBox instances used so far. However, I soon dismissed this idea again for a couple of reasons.

First, for a reasonably realistic setup, I would need at least five nodes – a controller node, a network node, two compute nodes and a storage node. Each of these nodes would need at least two network interfaces – the network node would actually need three – and the nodes would need direct layer 2 connectivity. With the networking options offered by Packet, this is not easily possible. Yes, you can create layer 2 networks on Packet, but there are some limitations – you can have at most two interfaces, one of which might be needed for SSH access and therefore cannot be used for a private network. In addition, this feature requires servers of a minimum size, which makes it a bit more expensive than what I was willing to spend on a playground. Of course we could collapse all logical network interfaces into a single physical interface, but as one of my main interests is exploring different networking options, this is not what I want to do.

Therefore, I decided to use a slightly different approach on Packet which you might call the all-in-one approach – I simply moved my entire lab host into the cloud. I would get one sufficiently large bare metal server on Packet (with at least 32 GB of RAM, which is still quite affordable), install all the tools I would also need on my local lab host (Vagrant, VirtualBox, Ansible, …) and run the lab there as usual. With this approach, our OpenStack nodes are VirtualBox instances running on bare metal, and the instances that we bring up in Nova use QEMU, as nested virtualization on Intel CPUs is not supported by VirtualBox.

[Diagram: all-in-one lab host on Packet]

To automate this setup using Ansible, we can employ two stages. The first stage uses the Packet Ansible modules to create the lab host, set up the SSH configuration, install all the required software on the lab host, create a user and set up a basic firewall. We also need to install a few additional packages as the installation of VirtualBox will require some kernel modules to be built. In the second stage, we can then run the actual lab by fetching the latest version of the lab setup from the GitHub repository and invoking Vagrant and Ansible on the lab host.
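To give you an idea of what the first stage looks like, here is a rough sketch of a task bringing up the lab host – the project ID variable, the plan and the facility are examples and not the values used in my repository (the API token is picked up from the PACKET_API_TOKEN environment variable).

- name: Create the lab host on Packet
  packet_device:
    project_id: "{{ packet_project_id }}"
    hostnames: lab-host
    operating_system: ubuntu_18_04
    plan: c2.medium.x86        # pick a plan with at least 32 GB of RAM
    facility: ams1
    state: active
  register: lab_host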

If you want to try this, you will of course need a valid Packet.net account. Once you have that, head over to the Packet console and grab an API key. Put this API key into the environment variable PACKET_API_TOKEN. Next, create an SSH key pair called packet-default-key and place the private and public key in the .ssh subdirectory of your home directory. This SSH key will be added as authorized key to the lab host so that you can SSH into it. Finally, clone the GitHub repository and run the main playbook.

export PACKET_API_TOKEN=<your packet.net API token>
ssh-keygen \
  -t rsa \
  -b 2048 \
  -f ~/.ssh/packet-default-key -P ""
git clone https://www.github.com/christianb93/openstack-labs
cd openstack-labs/Packet
ansible-playbook site.yaml

Be patient, this might take up to 30 minutes to complete, and in the last phase, when the actual lab is running on the remote host, you will not see any output from the main Ansible script. When the script completes, it will print a short message instructing you how to access the Horizon dashboard using an SSH tunnel to verify your setup.

By default, the script downloads and runs Lab6. You can change this by editing global_vars.yaml and setting the variable lab accordingly.

A distributed setup on GCP

This all-in-one setup is useful, but actually not very flexible. It limits me to essentially one machine type (8 cores, 32 GB RAM), and if I ever needed more RAM, I would have to choose a machine with 64 GB, thus doubling my bill. So I decided to try out a different setup – using virtual machines in a public cloud environment instead of having one large lab host, a setup which you might want to call a distributed setup.

Of course this comes at a price – we put a virtualized environment (compute and network) on top of another virtualized environment. To limit the impact of this, I was looking for a platform that supports nested virtualization with KVM, which would allow me to run KVM instead of QEMU as OpenStack hypervisor.

[Diagram: OpenStack running on a public cloud platform]

So I started to look at the various cloud providers that I typically use to understand my options. DigitalOcean seems to support nested virtualization with KVM, but is limited when it comes to network interfaces – at least I am not aware of a way to add more than one additional network interface or more than one private network to a DigitalOcean droplet. AWS does not seem to support nested virtualization at all. Azure does, but only for Hyper-V. That left me with Google’s GCP platform.

GCP does, in fact, support nested virtualization with KVM. In addition, I can add an arbitrary number of public and private network interfaces to each instance, though this requires a minimum number of vCPUs, and GCP also offers custom instances with a flexible ratio of vCPUs and RAM. So I decided to give it a try.

Next I started to design the layout of my networks. Obviously, a cloud platform implies some limitations for your networking setup. GCP, like most other public cloud platforms, does not provide layer 2 connectivity, but only layer 3 networking, so I would not be able to use flat or VLAN networks with Neutron, leaving overlay networks as the only choice. In addition, GCP also does not support IP multicast, which, however, is not a problem, as the OVS based implementation of Neutron overlay networks only uses IP unicast messages. And GCP accepts VXLAN traffic (but does not support GRE encapsulation), so a Neutron overlay network using VXLAN should work.

I was looking for a way to support at least three different GCP networks – a public network, which would provide access to the Internet from my nodes to download packages and to SSH into the machines, a management network to which all nodes would be attached, and an underlay network that I would use to support Neutron overlay networks. As GCP only allows me to attach three network interfaces to a virtual machine with at least 4 vCPUs, I wanted to avoid a setup where each of the nodes would be attached to each network. Instead, I decided to put an APT cache on the controller node and to use the network node as SSH jump host for storage and compute nodes, so that only the network node requires four interfaces. To implement an external Neutron network, I use an OVS based VXLAN network across compute and network nodes which is presented as an external network to Neutron, as we have already done in a previous post. Ignoring storage nodes, this leads to the following setup.

[Diagram: GCP network setup]

Of course, we need to be a bit careful with MTUs. As GCP does itself use an overlay network, the MTU of a standard GCP network device is 1460. When we add an additional layer of encapsulation to realize our Neutron overlay networks, we end up with an MTU of 1410 for the VNICs created inside the Nova KVM instances.
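For illustration – this is an assumption on how one could configure it, not a quote from my scripts – the relevant setting is Neutron's global physical network MTU, from which Neutron subtracts the 50 bytes of VXLAN overhead when advertising the MTU to instances.

- name: Let Neutron know about the smaller MTU of the GCP networks
  ini_file:
    path: /etc/neutron/neutron.conf
    section: DEFAULT
    option: global_physnet_mtu
    value: "1460"
# Neutron will then advertise 1460 - 50 = 1410 bytes on VXLAN networks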

Automating the setup is rather straightforward. As demonstrated in a previous post, I use Terraform to bring up the base infrastructure and parse the Terraform output in my Ansible script to create a dynamic inventory. Then we set up the APT cache on the controller node (which, to avoid loops, implies that APT on the controller node must not use caching) and adapt the APT configuration on all other nodes to use it. We also use the network node as an SSH jump host (see this post for more on this) so that we can reach compute and storage nodes even though they do not have a public IP address.
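To sketch the jump host mechanism (host names, the user and the key path below are assumptions, not the values that my scripts actually generate), an SSH configuration along the following lines on the workstation makes connections to compute and storage nodes transparently hop via the network node.

- name: Configure the network node as SSH jump host
  blockinfile:
    path: ~/.ssh/config
    create: yes
    block: |
      Host network
        HostName {{ network_node_public_ip }}
        User stack
        IdentityFile ~/.ssh/gcp-default-key
      Host compute* storage*
        User stack
        IdentityFile ~/.ssh/gcp-default-key
        ProxyJump network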

Of course you will want to protect your machines using GCP's built-in firewalls (i.e. security groups). I have chosen to only allow incoming public traffic from the IP address of my own workstation (which also implies that you will not be able to use the GCP web based SSH console to reach your machines).

Finally, we need a way to access Horizon and the OpenStack API from our local workstation. For that purpose, I use an instance of NGINX running on the network host which proxies requests to the controller node. When setting this up, there is a little twist – typically, a service like Keystone or Nova will return links in their responses, for instance to refer to entries in the service catalog or to the VNC proxy. These links will contain the IP address of our controller node, more specifically the management IP address of this node, which is not reachable from the public network. Therefore the NGINX forwarding rules need to rewrite these URLs to point to the network node – see the documentation of the Ansible role that I have written for this purpose for more details. The configuration uses SSL to secure the connection to NGINX, so that the OpenStack credentials do not travel from your workstation to the network node unencrypted. Note, however, that in this setup, the traffic on the internal networks (the management network, the underlay network) is not encrypted.

To run this lab, there are of course some prerequisites. Obviously, you need a GCP account. It is also advisable to create a new project for this, to be able to easily separate the automatically created resources from anything else you might have running on GCE. In the IAM console, make sure that your user has the “resourcemanager.organisationadministrator” role to be able to list and create new projects. Then navigate to the Manage Resources page, create a new project and create a service account for this new project. You will need to assign some roles to this service account so that it can create instances and networks – I use the compute instance admin role, the compute network admin role and the compute security admin role for that purpose.

To allow Terraform to create and destroy resources, we need the credentials of the service account. Use the GCP console to download a service account key in JSON format and store it on your local machine as ~/.keys/gcp_service_account.json. In addition, you will again have to create an SSH key pair and store it as ~/.ssh/gcp-default-key and ~/.ssh/gcp-default-key.pub – we will use this key to enable access to all GCP instances as user stack.

Once you are done with these preparations, you can now run the Terraform and Ansible scripts.

git clone https://www.github.com/christianb93/openstack-labs
cd openstack-labs/GCE
terraform init
terraform apply -auto-approve
ansible-playbook site.yaml
ansible-playbook demo.yaml

Once these scripts complete, you should be able to log into the Horizon console using the public IP of the network node, using the demo user credentials stored in ~/.os_credentials/credentials.yaml and see the instances created in demo.yaml up and running.

When you are done, you should clean up your resources again to avoid unnecessary charges. Thanks to Terraform, you can easily destroy all resources again using terraform destroy -auto-approve.

OpenStack Octavia – creating and monitoring a load balancer

In the last post, we have seen how Octavia works at an architectural level and have gone through the process of installing and configuring Octavia. Today, we will see Octavia in action – we will create our first load balancer and inspect the resulting configuration to better understand what Octavia is doing.

Creating a load balancer

This post assumes that you have followed the instructions in my previous post and run Lab14, so that you are now the proud owner of a working OpenStack installation including Octavia. If you have not done this yet, here are the instructions to do so.

git clone https://www.github.com/christianb93/openstack-labs
cd openstack-labs/Lab14
wget https://s3.eu-central-1.amazonaws.com/cloud.leftasexercise.com/amphora-x64-haproxy.qcow2
vagrant up
ansible-playbook -i hosts.ini site.yaml

Now, let us bring up a test environment. As part of the Lab, I have provided a playbook which will create two test instances, called web-1 and web-2. To run this playbook, enter

ansible-playbook -i hosts.ini demo.yaml

In addition to the test instances, the playbook is creating a demo user and a role called load-balancer_admin. The default policy distributed with Octavia will grant all users to which this role is assigned the right to read and write load balancer configurations, so we assign this role to the demo user as well. The playbook will also set up an internal network to which the instances are attached plus a router, and will assign floating IP addresses to the instances, creating the following setup.

[Diagram: Octavia test configuration I]

Once the playbook completes, we can log into the network node and inspect the running servers.

vagrant ssh network
source demo-openrc
openstack server list

Now it is time to create our first load balancer. The load balancer will listen for incoming traffic on an address which is traditionally called the virtual IP address (VIP). This terminology originates from a typical HA setup, in which you would have several load balancer instances in an active-passive configuration and use a protocol like VRRP to switch the IP address over to a new instance if the currently active instance fails. In our case, we do not do this, but the term VIP is still commonly used. When we create a load balancer, Octavia will assign a VIP for us and attach the load balancer to this network, but we need to pass the name of the corresponding subnet to Octavia as a parameter. With this, our command to start the load balancer and to monitor the Octavia log file to see how the provisioning process progresses is as follows.

openstack loadbalancer create \
  --name demo-loadbalancer \
  --vip-subnet external-subnet
sudo tail -f /var/log/octavia/octavia-worker.log

In the log file output, we can nicely see that Octavia is creating and signing a new certificate for use by the amphora. It then brings up the amphora and tries to establish a connection to its port 9443 (on which the agent will be listening) until the connection succeeds. If this happens, the instance is supposed to be ready. So let us wait until we see a line like “Mark ACTIVE in DB…” in the log file, hit ctrl-c and then display the load balancer.

openstack loadbalancer list
openstack loadbalancer amphora list

You should see that your new load balancer is in status ACTIVE and that an amphora has been created. Let us get the IP address of this amphora and SSH into it.

amphora_ip=$(openstack loadbalancer amphora list \
  -c lb_network_ip -f value)
ssh -i amphora-key ubuntu@$amphora_ip

Nice. You should now be inside the amphora instance, which is running a stripped down version of Ubuntu 16.04. Now let us see what is running inside the amphora.

ifconfig -a
sudo ps axwf
sudo netstat -a -n -p -t -u 

We find that in addition to the usual basic processes that you would expect in every Ubuntu Linux, there is an instance of the Gunicorn WSGI server, which runs the amphora agent, listening on port 9443. We also see that the amphora agent holds a UDP socket – this is the socket that the agent uses to send health messages to the control plane. We also see that our amphora has received an IP address on the load balancer management network. It is also instructive to display the configuration file that Octavia has generated for the agent – here we find, for instance, the address of the health manager to which the agent should send heartbeats.

This is nice, but where is the actual proxy? Based on our discussion of the architecture, we would have expected to see a HAProxy somewhere – where is it? The answer is that Octavia puts this HAProxy into a separate namespace, to avoid potential IP address range conflicts between the subnets on which HAProxy needs to listen (i.e. the subnet on which the VIP lives, which is specified by the user) and the subnet to which the agent needs to be attached (the load balancer management network, specified by the administrator). So let us look for this namespace.

ip netns list

In fact, there is a namespace called amphora-haproxy. We can use the ip netns exec command (or nsenter) to take a look at the configuration inside this namespace.

sudo ip netns exec amphora-haproxy ifconfig -a

We see that there is a virtual device eth1 inside the namespace, with an IP address on the external network. In addition, we see an alias on this device, i.e. essentially a second IP address. This is the VIP, which could be detached from this instance and attached to another instance in an HA setup (the first IP is often called the VRRP address, again a term originating from HA setups, where it denotes the IP address of the interface across which the VRRP protocol runs).

At this point, no proxy is running yet (we will see later that the proxy is started only when we create listeners), so the configuration that we find is as displayed below.

[Diagram: Octavia test configuration II]

Monitoring load balancers

We have mentioned several times that the agent sends heartbeats to the health manager, so let us take a few minutes to dig into this. During the installation, we have created a virtual device called lb_port on our network node, which is attached to the integration bridge to establish connectivity to the load balancer management network. So let us log out of the amphora to get back to the network node and dump the UDP traffic crossing this interface.

sudo tcpdump -n -p udp -i lb_port

If we look at the output for a few seconds, then we find that every 10 seconds, a UDP packet arrives, coming from the UDP port of the amphora agent that we have seen earlier and the IP address of the amphora on the load balancer management network, and targeted towards port 5555. If you add -A to the tcpdump command, you find that the output is unreadable – this is because the heartbeat message is encrypted using the heartbeat key that we have defined in the Octavia configuration file. The format of the status message that is actually sent can be found here in the Octavia source code. We find that the agent will transmit the following information as part of the status messages:

  • The ID of the amphora
  • A list of listeners configured for this load balancer, with status information and statistics for each of them
  • Similarly, a list of pools with information on the members that the pool contains

We can display the current status information using the OpenStack CLI as follows.

openstack loadbalancer status show demo-loadbalancer
openstack loadbalancer stats show demo-loadbalancer

At the moment, this is not yet too exciting, as we did not yet set up any listeners, pools and members for our load balancer, i.e. our load balancer is not yet accepting and forwarding any traffic at all. In the next post, we will look in more detail into how this is done.

OpenStack Octavia – architecture and installation

Once you have a cloud platform with virtual machines, network and storage, you will sooner or later want to expose services running on your platform to the outside world. The natural way to do this is to use a load balancer, and in a cloud, you of course want to utilize a virtual load balancer. For OpenStack, the Octavia project provides this as a service, and in today's post, we will take a look at Octavia's architecture and learn how to install it.

Octavia components

When designing a virtual load balancer, one of the key decisions you have to take is where to place the actual load balancer functionality. One obvious option would be to spawn software load balancers like HAProxy or NGINX on one of the controller nodes or the network node and route all traffic via those nodes. This approach, however, has the clear disadvantage that it introduces a single point of failure (unless, of course, you run your controller nodes in an HA setup) and, even worse, it puts a lot of load on the network interfaces of these few nodes, which implies that this solution does not scale well when the number of virtual load balancers or endpoints increases.

Octavia chooses a different approach. The actual load balancers are realized by virtual machines, which are ordinary OpenStack instances running on the compute nodes, but use a dedicated image containing a HAProxy software load balancer and an agent used to control the configuration of the HAProxy instance. These instances – called the amphorae – are therefore scheduled by the Nova scheduler as any other Nova instances and thus scale well, as they can leverage all available compute nodes. To control the amphorae, Octavia uses a control plane which consists of the Octavia worker running the logic to create, update and remove load balancers, the health manager which monitors the amphorae, and the housekeeping service which performs clean up activities and can manage a pool of spare amphorae to optimize the time it takes to spin up a new load balancer. In addition, as any other OpenStack project, Octavia exposes its functionality via an API server.

The API server and the control plane components communicate with each other using RPC calls, i.e. RabbitMQ, as we have already seen for the other OpenStack services. However, the control plane components also have to communicate with the amphorae. This communication is bi-directional.

  • The agent running on each amphora exposes a REST API that the control plane needs to be able to reach
  • Conversely, the health manager listens for health status messages issued by the amphorae and therefore the control plane needs to be reachable from the amphorae as well

To enable this two-way communication, Octavia assumes that there is a virtual (i.e. Neutron) network called the load balancer management network which is specified by the administrator during the installation. Octavia will then attach all amphorae to this network. Further, Octavia assumes that via this network, the control plane components can reach the REST API exposed by the agent running on each amphora and that, conversely, the agent can reach the control plane. There are several ways to achieve this; we get back to this point when discussing the installation further below. Thus the overall architecture of an Octavia installation including running load balancers is as in the diagram below.

[Diagram: Octavia architecture]

Octavia installation

Large parts of the Octavia installation procedure (which, for Ubuntu, is described here in the official documentation) are along the lines of the installation procedures for other OpenStack components that we have already seen – create users, database tables, API endpoints, configure and start services and so forth. However, there are a few points during the installation where I found that the official documentation is misleading (at the least) or where I had problems figuring out what was going on and had to spend some time reading source code to clarify a few things. Here are the main pitfalls.

Creating and connecting the load balancer management network

The first challenge is the setup of the load balancer management network mentioned before. Recall that this needs to be a virtual network to which our amphorae will attach and which allows access to the amphorae from the control plane as well as traffic from the amphorae to the health manager. To build this network, there are several options.

First, we could of course create a dedicated provider network for this purpose. Thus, we would reserve a physical interface or a VLAN tag on each node for that network and would set up a corresponding provider network in Neutron which we use as load balancer management network. Obviously, this only works if your physical network infrastructure allows for it.

If this is not the case, another option would be to use a “fake” physical network as we have done in one of our previous labs. We could, for instance, set up an OVS bridge managed outside of Neutron on each node, connect these bridges using an overlay network and present this network to Neutron as a physical network on which we base a provider network. This should work in most environments, but creates additional overhead due to the extra bridges needed on each node.

Finally – and this is the approach that the official installation instructions also take – we could simply use a VXLAN network as load balancer management network and connect to it from the network node by adding an additional network device to the Neutron integration bridge. Unfortunately, the instructions provided as part of the official documentation only work if Linux bridges are used, so we need to take a more detailed look at this option in our case (using OVS bridges).

Recall that if we set up a Neutron VXLAN network, this network will manifest itself as a local VLAN tag on the integration bridge of each node on which a port is connected to this network. Specifically, this is true for the network node, on which the DHCP agent for our VXLAN network will be running. Thus, when we create the network, Neutron will spin up a DHCP agent on the network node and will assign a local VLAN tag used for traffic belonging to this network on the integration bridge br-int.

To connect to this network from the network node, we can now simply bring up an additional internal port attached to the integration bridge (which will be visible as a virtual network device), configured as an access port using this VLAN tag. We then assign an IP address to this device, and using this IP address as a gateway, we can now connect to every other port on the Neutron VXLAN network from the network node. As the Octavia control plane components communicate with the Octavia API server via RPC, we can place them on the network node so that they can use this interface to communicate with the amphorae.

[Diagram: Octavia load balancer management network]

With this approach, the steps to set up the network are as follows (see also the corresponding Ansible script for details).

  • Create a virtual network as an ordinary Neutron VXLAN network and add a subnet
  • Create security groups to allow traffic to the TCP port on which the amphora agent exposes its REST API (port 9443 by default) and to allow access to the health manager (UDP, port 5555). It is also helpful to allow SSH and ICMP traffic to be able to analyze issues by pinging and accessing the amphorae
  • Now we create a port on the load balancer network. This will reserve an IP address that we can use for our port to avoid IP address conflicts with Neutron
  • Then we wait until the namespace for the DHCP agent has been created, get the ID of the corresponding Neutron port and read the VLAN tag for this port from the OVS DB (I have created a Jinja2 template to build a script doing all this)
  • Now create an OVS access port using this VLAN ID, assign an IP address to the corresponding virtual network device and bring up the device

There is a little gotcha with this configuration when the OVS agent is restarted, as in this case, the local VLAN ID can change. Therefore we also need to create a script to refresh the configuration and run it via a systemd unit file whenever the OVS agent is restarted.
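Stripped down to the essential commands, the tasks behind these steps could look roughly as follows – the name of the DHCP port, the device name lb_port and the IP address are assumptions for the purpose of illustration.

- name: Read the local VLAN tag of the DHCP port from the OVS DB
  command: ovs-vsctl get port {{ dhcp_port_name }} tag
  register: vlan_tag
  changed_when: false

- name: Attach an internal access port to the integration bridge
  command: >
    ovs-vsctl --may-exist add-port br-int lb_port tag={{ vlan_tag.stdout }}
    -- set interface lb_port type=internal

- name: Assign an IP address on the load balancer management network and bring the port up
  shell: |
    ip addr replace 172.16.0.100/24 dev lb_port
    ip link set lb_port up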

Image creation

The next challenge I was facing during the installation was the creation of the image. Sure, Octavia comes with instructions and a script to do this, but I found some of the parameters a bit difficult to understand and had to take a look at the source code of the scripts to figure out what they mean. Eventually, I wrote a Dockerfile (and corresponding instructions) to run the build in a container. When building and running a container with this Dockerfile, the necessary code will be downloaded from the Octavia GitHub repository, some required tools are installed in the container and the image build script is started. To have access to the generated image and to be able to cache data across builds, the Dockerfile assumes that your local working directory is mapped into the container as described in the instructions.

Once the image has been built, it needs to be uploaded into Glance and tagged with a tag that will also be added to the Octavia configuration. This tag is later used by Octavia when an amphora is created to locate the correct image. In addition, we will have to set up a flavor to use for the amphorae. Note that we need at least 1 GB of RAM and 2 GB of disk space to be able to run the amphora image, so make sure to size the flavor accordingly.
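Assuming that admin credentials have been sourced, the upload and the flavor creation could be done via the OpenStack CLI roughly as follows – image name, tag and flavor values are examples and need to match your Octavia configuration.

- name: Upload the amphora image into Glance and tag it
  command: >
    openstack image create
    --disk-format qcow2 --container-format bare
    --file ./amphora-x64-haproxy.qcow2
    --tag amphora --private
    amphora-x64-haproxy

- name: Create a flavor for the amphorae
  command: >
    openstack flavor create
    --ram 1024 --disk 2 --vcpus 1
    --private m1.amphora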

Certificates, keys and configuration

We have seen above that the control plane components of Octavia use a REST API exposed by the agent running on each amphora to make changes to the configuration of the HAProxy instances. Obviously, this connection needs to be secured. For that purpose, Octavia uses TLS certificates.

First, there is a client certificate. This certificate will be used by the control plane to authenticate itself when connecting to the agent. The client certificate and the corresponding key need to be created during installation. As the CA certificate used to sign the client certificate needs to be present on every amphora (so that the agent can use it to verify incoming requests), this certificate needs to be known to Octavia as well, and Octavia will distribute it to each newly created amphora.

Second, each agent of course needs a server certificate. These certificates are unique to each agent and are created dynamically at run time by a certificate generator built into the Octavia control plane. During the installation, we only have to provide a CA certificate and a corresponding private key which Octavia will then use to issue the server certificates. The following diagram summarizes the involved certificates and keys.

[Diagram: Octavia certificates and keys]

In addition, Octavia can place an SSH key on each amphora to allow us to SSH into an amphora in case there are any issues with it. And finally, a short string is used as a secret to encrypt the health status messages. Thus, during the installation, we have to create

  • A root CA certificate that will be placed on each amphora
  • A client certificate signed by this root CA and a corresponding client key
  • An additional root CA certificate that Octavia will use to create the server certificates and a corresponding key
  • An SSH key pair
  • A secret for the encryption of the health messages

More details on this, how these certificates are referenced in the configuration and a list of other relevant configuration options can be found in the documentation of the Ansible role that I use for the installation.
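To make this a bit more concrete, here is a rough sketch of how the client-side artifacts could be created manually with openssl – directory and file names are made up, the server CA is created in exactly the same way as the client CA, and the Ansible role referenced above handles all of this in more detail.

- name: Create the client CA certificate and key
  command: >
    openssl req -x509 -nodes -newkey rsa:2048 -days 365
    -subj "/CN=OctaviaClientCA"
    -keyout {{ cert_dir }}/client_ca.key
    -out {{ cert_dir }}/client_ca.pem

- name: Create a key and a signing request for the client certificate
  command: >
    openssl req -nodes -newkey rsa:2048
    -subj "/CN=OctaviaClient"
    -keyout {{ cert_dir }}/client.key
    -out {{ cert_dir }}/client.csr

- name: Sign the client certificate with the client CA
  command: >
    openssl x509 -req -days 365
    -in {{ cert_dir }}/client.csr
    -CA {{ cert_dir }}/client_ca.pem
    -CAkey {{ cert_dir }}/client_ca.key
    -CAcreateserial
    -out {{ cert_dir }}/client.pem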

Versioning issues

During the installation, I came across an interesting versioning issue. The API used to communicate between the control plane and the agent is versioned. To allow different versions to interact, the client code used by the control plane has a version detection mechanism built into it, i.e. it will first ask the REST API for a list of available versions and then pick one based on its own capabilities. This code obviously was added with the Stein release.

When I first installed Octavia, I used the Ubuntu packages for the Stein release which are part of the Ubuntu Cloud archive. However, I experienced errors when the control plane was trying to connect to the agents. During debugging, I found that the versioning code is present in the Stein branch on GitHub but not included in the version of the code distributed with the Ubuntu packages. This, of course, makes it impossible to establish a connection.

I do not know whether this is an archiving error or whether the versioning code was added to the Stein maintenance branch after the official release had gone out. To fix this, I now pull the source code once more from GitHub when installing the Octavia control plane to make sure that I run the latest version from the Stein GitHub branch.

Lab 14: adding Octavia to our OpenStack playground

After all this theory, let us now run Lab14, in which we add Octavia to our OpenStack playground. Obviously, we need the amphora image to do this. You can either follow the instructions above to build your own version of the image, or you can use a version which I have built and uploaded into an S3 bucket. The instructions below use this version.

So to run the lab, enter the following commands (assuming, as always in this series, that you have gone through the basic setup described here).

git clone https://www.github.com/christianb93/openstack-labs
cd openstack-labs/Lab14
wget https://s3.eu-central-1.amazonaws.com/cloud.leftasexercise.com/amphora-x64-haproxy.qcow2
vagrant up
ansible-playbook -i hosts.ini site.yaml

In the next post, we will test this setup by bringing up our first load balancer and go through the configuration and provisioning process step by step.

OpenStack Cinder – architecture and installation

Having looked at the foundations of the storage technology that Cinder uses in the previous posts, we are now ready to explore the basic architecture of Cinder and install Cinder in our playground.

Cinder architecture

Essentially, Cinder consists of three main components which are running as independent processes and typically on different nodes.

First, there is the Cinder API server cinder-api. This is a WSGI-server running inside of Apache2. As the name suggests, cinder-api is responsible for accepting and processing REST API requests from users and other components of OpenStack and typically runs on a controller node.

Then, there is cinder-volume, the Cinder volume manager. This component is running on each node to which storage is attached (“storage node”) and is responsible for managing this storage, i.e. preparing, maintaining and deleting virtual volumes and exporting these volumes so that they can be used by a compute node. And finally, Cinder comes with its own scheduler, which directs requests to create a virtual volume to an appropriate storage node.

[Diagram: Cinder architecture]

Of course, Cinder also has its own database and communicates with other components of OpenStack, for example with Keystone to authenticate requests and with Glance to be able to create volumes from images.

Cinder can use a variety of different storage backends, ranging from LVM (managing local storage directly attached to a storage node) over standards like NFS and open source solutions like Ceph to vendor-specific drivers from Dell, Huawei, NetApp or Oracle – see the official support matrix for a full list. This is achieved by moving low-level access to the actual volumes into a volume driver. Similarly, Cinder uses various technologies to connect the virtual volumes to the compute node on which an instance that wants to use them is running, using a so-called target driver.

To better understand how Cinder operates, let us take a look at one specific use case – creating a volume and attaching to an instance. We will go in more detail on this and similar use cases in the next post, but what essentially happens is the following:

  • A user (for instance an administrator) sends a request to the Cinder API server to create a logical volume
  • The API server asks the scheduler to determine a storage node with sufficient capacity
  • The request is forwarded to the volume manager on the target node
  • The volume manager uses a volume driver to create a logical volume. With the default settings, this will be LVM, for which a volume group and underlying physical volumes have been created during installation. LVM will then create a new logical volume
  • Next, the administrator attaches the volume to an instance using another API request
  • Now, the target driver is invoked which sets up an iSCSI target on the storage node and a LUN pointing to the logical volume
  • The storage node informs the compute node about the parameters (portal IP and port, target name) that are needed to access this target
  • The Nova compute agent on the compute node invokes an iSCSI initiator which maps the iSCSI target into the local file system
  • Finally, the Nova compute agent asks the virtual machine driver (libvirt in our case) to attach this locally mapped device to the instance

[Diagram: Cinder volume export]

Installation steps

After this short summary of the high-level architecture of Cinder, let us move on and try to understand what the installation process looks like. Again, the installation follows the standard pattern that we have now seen so many times.

  • Create a database for Cinder and prepare a database user
  • Add user, services and endpoints in Keystone
  • Install the Cinder packages on the controller and the storage nodes
  • Modify the configuration files as needed

In addition to these standard steps, there are a few points specific for Cinder. First, as sketched above, Cinder uses LVM to manage virtual volumes on the storage nodes. We therefore need to perform a basic setup of LVM on each storage node, i.e. we need to create physical volumes and a volume group. Second, every compute node will need to act as iSCSI initiator and therefore needs a valid iSCSI initiator name. The Ubuntu distribution that we use already contains the Open-iSCSI package, which maintains an initiator name in the file /etc/iscsi/initiatorname.iscsi. However, as this name is supposed to be unique, it is not set in the Ubuntu Vagrant image and we need to do this once during the installation by running the script /lib/open-iscsi/startup-checks.sh as root.
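A sketch of these two additional steps as Ansible tasks could look like this – the physical volume /dev/sdc and the volume group name are specific to the setup and just examples.

- name: Create the LVM volume group used by the Cinder LVM backend
  become: yes
  lvg:
    vg: cinder-volumes
    pvs: /dev/sdc

- name: Generate a unique iSCSI initiator name on the compute node
  become: yes
  command: /lib/open-iscsi/startup-checks.sh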

There is an additional point that we need to observe which is also not mentioned in the official installation guide for the Stein release. When installing Cinder, you have to define which network interface Cinder will use for the iSCSI traffic. In production, you would probably want to use a separate storage network, but for our setup, we use the management network. According to the installation guide, it should be sufficient to set the configuration item my_ip, but in reality, this did not work for me and I had to set the item target_ip_address on the storage nodes.
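For illustration, and assuming that the backend section in cinder.conf is called lvm (as in the official installation guide), this setting could be applied with a task like the following – the variable holding the management IP is of course an assumption.

- name: Use the management interface of the storage node for iSCSI traffic
  ini_file:
    path: /etc/cinder/cinder.conf
    section: lvm
    option: target_ip_address
    value: "{{ mgmt_ip }}"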

Lab13: running and testing the Cinder installation

Time to try this out and set up a lab for that. The first thing we need to do is to modify our Vagrantfile to add a storage node. In order to reduce memory consumption a bit, we also move from two compute nodes to only one compute node, so that our new setup looks as follows.

[Diagram: lab topology with storage node]

As mentioned above, we will not use a dedicated storage network, but send the iSCSI traffic over the management network so that our network setup remains essentially unchanged. To bring up this scenario and to run the Cinder installation plus a demo setup do the following.

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab13
vagrant up
ansible-playbook -i hosts.ini site.yaml
ansible-playbook -i hosts.ini demo.yaml

When all this completes, we can run some tests. First, verify that all storage nodes are up and running. For that purpose, log into the controller node and use the OpenStack CLI to retrieve a list of all recognized storage nodes.

vagrant ssh controller
source admin-openrc
openstack volume service list

The output that you get should look similar to the sample output below.

+------------------+-------------+------+---------+-------+----------------------------+
| Binary           | Host        | Zone | Status  | State | Updated At                 |
+------------------+-------------+------+---------+-------+----------------------------+
| cinder-scheduler | controller  | nova | enabled | up    | 2020-01-01T14:03:14.000000 |
| cinder-volume    | storage@lvm | nova | enabled | up    | 2020-01-01T14:03:14.000000 |
+------------------+-------------+------+---------+-------+----------------------------+

We see that there are two services that have been detected. First, there is the Cinder scheduler running on the controller node, and then, there is the Cinder volume manager on the storage node. Both are “up”, which is promising.

Next, let us try to create a volume and attach it to one of our test instances, say demo-instance-3. For this to work, you need to be on the network node (as we will have to establish connectivity to the instance).

vagrant ssh network
source demo-openrc
openstack volume create \
  --size=1 \
  demo-volume
openstack server add volume \
  demo-instance-3 \
  demo-volume
openstack server show demo-instance-3
openstack server ssh \
  --identity demo-key \
  --login cirros \
  --private demo-instance-3

You should now be inside the instance. When you run lsblk, you should find a new block device /dev/vdb being added. So our installation seems to work!

Having a working testbed, we are now ready to dive a little deeper into the inner workings of Cinder. In the next post, we will try to understand some of the common use cases in a bit more detail.

OpenStack Nova – installation and overview

In this post, we will look into Nova, the cloud fabric component of OpenStack. We will see how Nova is installed and go briefly through the individual components and Nova services.

Overview

Before getting into the installation process, let us briefly discuss the various components of Nova on the controller and compute nodes.

First, there is the Nova API server which runs on the controller node. The Nova service will register itself as a systemd service with entry point /usr/bin/nova-api. Similar to Glance, invoking this script will bring up a WSGI server which uses PasteDeploy to build a pipeline with the actual Nova API endpoint (an instance of nova.api.openstack.compute.APIRouterV21) being the last element of the pipeline. This component will then distribute incoming API requests to various controllers which are part of the nova.api.openstack.compute module. The routing rules themselves are actually hardcoded in the ROUTE_LIST which is part of the Router class and maps request paths to controller objects and their methods.

[Diagram: Nova API components]

When you browse the source code, you will find that Nova offers some APIs like the image API or the bare metal API which are simply proxies to other OpenStack services like Glance or Ironic. These APIs are deprecated, but still present for backwards compatibility. Nova also has a network API which, depending on the value of the configuration item use_neutron, will either act as a proxy to Neutron or present the legacy Nova networking API.

The second Nova component on the controller node is the Nova conductor. The Nova conductor does not expose a REST API, but communicates with the other Nova components via RPC calls (based on RabbitMQ). The conductor is used to handle long-running tasks like building an instance or performing a live migration.

Similar to the Nova API server, the conductor has a tiered architecture. The actual binary which is started by the systemd mechanism creates a so called service object. In Nova, a service object represents an RPC API endpoint. When a service object is created, it starts up an RPC service that handles the actual communication via RabbitMQ and forwards incoming requests to an associated service manager object.

NovaRPCAPI

Again, the mapping between binaries and manager classes is hardcoded and, for the Stein release, is as follows.

SERVICE_MANAGERS = {
  'nova-compute': 'nova.compute.manager.ComputeManager',
  'nova-console': 'nova.console.manager.ConsoleProxyManager',
  'nova-conductor': 'nova.conductor.manager.ConductorManager',
  'nova-metadata': 'nova.api.manager.MetadataManager',
  'nova-scheduler': 'nova.scheduler.manager.SchedulerManager',
}

Apart from the conductor service, this list contains one more component that runs on the controller node and uses the same mechanism to handle RPC requests – the Nova scheduler (the nova-console binary is deprecated and we use the noVNC proxy instead, see the section below; the nova-compute binary runs on the compute nodes; and the nova-metadata binary is the old metadata service used with the legacy Nova networking API).

The scheduler receives and maintains information on the instances running on the individual hosts and, upon request, uses the Placement API that we have looked at in the previous post to decide where a new instance should be placed. The actual scheduling is carried out by a pluggable instance of the nova.scheduler.Scheduler base class. The default scheduler is the filter scheduler, which first applies a set of filters to narrow the list of hosts down to those that are candidates for hosting the instance, and then computes a score using a set of weights to make the final decision. Details on the scheduling algorithm are described here.

The last service which we have not yet discussed is Nova compute. One instance of the Nova compute service runs on each compute node. The manager class behind this service is the ComputeManager which itself invokes various APIs like the networking API or the Cinder API to manage the instances on this node. The compute service interacts with the underlying hypervisor via a compute driver. Nova comes with compute drivers for the most commonly used hypervisors, including KVM (via libvirt), VMware, Hyper-V and Xen. In a later post, we will walk through the call chain when provisioning a new instance to see how the Nova API, the Nova conductor service, the Nova compute service and the compute driver interact to bring up the machine.

The Nova compute service itself does not have a connection to the database. However, in some cases, the compute service needs to access information stored in the database, for instance when the Nova compute service initializes on a specific host and needs to retrieve a list of instances running on this host from the database. To make this possible, Nova uses remotable objects provided by the Oslo versioned objects library. This library provides decorators like remotable_classmethod to mark methods of a class or an object as remotable. These decorators point to the conductor API (indirection_api within Oslo) and delegate the actual method invocation to a remote copy via an RPC call to the conductor API. In this way, only the conductor needs access to the database and Nova compute offloads all database access to the conductor.

Nova cells

In a large OpenStack installation, access to the instance data stored in the MariaDB database can easily become a bottleneck. To avoid this, OpenStack provides a sharding mechanism for the database known as cells.

The idea behind this is that the set of compute nodes is partitioned into cells. Every compute node is part of a cell, and in addition to these regular cells, there is a cell called cell0 (which is usually not used and only holds instances which could not be scheduled to a node). The Nova database schema is split into a global part which is stored in a database called the API database and a cell-local part. This cell-local database is different for each cell, so each cell can use a different database running (potentially) on a different host. A similar sharding applies to message queues. When you set up a compute node, the configuration of the database connection and the connection to the RabbitMQ service determine to which cell the node belongs. The compute node will then use this database connection to register itself with the corresponding cell database, and a special script (nova-manage) needs to be run to make these hosts visible in the API database as well so that they can be used by the scheduler.

Cells themselves are stored in a database table cell_mappings in the API database. Here each cell is set up with a dedicated RabbitMQ connection string (called the transport URL) and a DB connection string. Our setup will have two cells – the special cell0 which is always present and a cell1. Therefore, our installation will require three databases.

Database      Description
nova_api      Nova API database
nova_cell0    Database for cell0
nova          Database for cell1

In a deployment with more than one real cell, each cell will have its own Nova conductor service, in addition to a “super conductor” running across cells, as explained here and in the diagram below, which is part of the OpenStack documentation.

The Nova VNC proxy

Usually, you will use SSH to access your instances. Sometimes, however, for instance if the SSH daemon is not coming up properly or the network configuration is broken, it is very helpful to have a way to connect to the instance directly. For that purpose, OpenStack offers VNC console access to running instances. Several VNC clients can be used, but the default is the browser-based noVNC client embedded directly into the Horizon dashboard.

How exactly does this work? First, there is KVM. The KVM hypervisor has the option to export the content of the emulated graphics card of the instance as a VNC server. Obviously, this VNC server is running on the compute node on which the instance is located. The server for the first instance will listen on port 5900, the server for the second instance will listen on port 5901 and so forth. The server_listen configuration option determines the IP address to which the server will bind.

Now theoretically a VNC client like noVNC could connect directly to the VNC server. However, in most setups, the network interfaces of the compute node are not directly reachable from a browser in which the Horizon GUI is running. To solve this, Nova comes with a dedicated proxy for noVNC. This proxy is typically running on the controller node. The IP address and port number on which this proxy is listening can again be configured using the novncproxy_host and novncproxy_port configuration items. The default port is 6080.

When a client like the Horizon dashboard wants to get access to the proxy, it can use the Nova API path /servers/{server_id}/remote-consoles. This call will be forwarded to the Nova compute method get_vnc_console on the compute node. This method will return a URL, consisting of the base URL (which can again be configured using the novncproxy_base_url configuration item) and a token which is also stored in the database. When the client uses this URL to connect to the proxy, the token is used to verify that the call is authorized.

The following diagram summarizes the process to connect to the VNC console of an instance from a browser running noVNC.

NoVNCProxy

  1. Client uses Nova API /servers/{server_id}/remote-consoles to retrieve the URL of a proxy
  2. Nova API delegates the request to Nova Compute on the compute node
  3. Nova Compute assembles the URL, which points to the proxy, creates a token containing the ID of the instance as well as additional information, and hands the URL including the token back to the client
  4. Client uses the URL to connect to the proxy
  5. The proxy validates the token, extracts the target compute node and port information, establishes the connection to the actual VNC server and starts to service the session

Installing Nova on the controller node

Armed with the knowledge from the previous discussions, we can now almost guess what steps we need to take in order to install Nova on the controller node.

NovaInstallation

First, we need to create the Nova databases – the Nova API database (nova_api), the Nova database for cell0 (nova_cell0) and the Nova database for our only real cell cell1 (nova). We also need to create a user which has the necessary grants on these databases.

Next, we create a user in Keystone representing the Nova service, register the Nova API service with Keystone and define endpoints.

We then install the Ubuntu packages corresponding to the four components that we will install on the controller node – the Nova API service, the Nova conductor, the Nova scheduler and the VNC proxy.

Finally, we adapt the configuration file /etc/nova/nova.conf. The first change is easy – we set the value my_ip to the IP of the controller management interface.

We then need to set up the networking part. To enforce the use of Neutron instead of the built-in legacy Nova networking, we set the configuration option use_neutron that we already discussed above to True. We also set the firewall driver to the No-OP driver nova.virt.firewall.NoopFirewallDriver.

The next information we need to provide is the connection information to RabbitMQ and the database. Recall that we need to configure two database connections, one for the API database and one for the database for cell 1 (Nova will automatically append _cell0 to this database name to obtain the database connection for cell 0).
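
To illustrate what this looks like, here is a sketch of the corresponding entries in nova.conf. The host name controller and the placeholders RABBIT_PASS and NOVA_DBPASS follow the conventions of the official installation guide and are assumptions, not necessarily the values used by my playbooks.

[DEFAULT]
transport_url = rabbit://openstack:RABBIT_PASS@controller

[api_database]
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api

[database]
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova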

We also need to provide some information that Nova needs to communicate with other Nova services. In the glance section, we need to define the URL to reach the Glance API server. In the neutron section, we need to set up the necessary credentials to connect to Neutron. Here we use a Keystone user neutron which we will set up when installing Neutron in a later post, and we also define some data needed for the metadata proxy that we will discuss in a later post. And finally Nova needs to connect to the Placement service for which we have to provide credentials as well, this time using the placement user created earlier.

To set up the communication with Keystone, we need to set the authorization strategy to Keystone (which will also select the PasteDeploy Pipeline containing the Keystone authtoken middleware) and provide the credentials that the authtoken middleware needs. And finally, we set the path that the Oslo concurrency library will use to create temporary files.
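
As a sketch, the sections for Glance, Placement and the Oslo concurrency library could look roughly like the snippet below (the Neutron and keystone_authtoken sections follow the same pattern). Again, host names, the service project and the password placeholder are assumptions borrowed from the official installation guide.

[glance]
api_servers = http://controller:9292

[placement]
region_name = RegionOne
auth_url = http://controller:5000/v3
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = placement
password = PLACEMENT_PASS

[oslo_concurrency]
lock_path = /var/lib/nova/tmp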

Once all this has been done, we need to prepare the database for use. As with the other services, we need to sync the database schema to the latest version which, in our case, will simply create the database schema from scratch. We also need to establish our cell 1 in the database using the nova-manage utility.
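
If you want to perform these steps manually, the commands would look roughly as follows (this is the sequence used by the official installation guide; my playbooks automate the equivalent steps).

su -s /bin/sh -c "nova-manage api_db sync" nova
su -s /bin/sh -c "nova-manage cell_v2 map_cell0" nova
su -s /bin/sh -c "nova-manage cell_v2 create_cell --name=cell1 --verbose" nova
su -s /bin/sh -c "nova-manage db sync" nova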

Installing Nova on the compute nodes

Let us now turn to the installation of Nova on the compute nodes. Recall that on the compute nodes, only nova-compute needs to be running. There is no database connection needed, so the only installation step is to install the nova-compute package and to adapt the configuration file.

The configuration file nova.conf on the compute node is very similar to the configuration file on the controller node, with a few differences.

As there is no database connection, we can comment out the DB connection string. In light of our discussion of the VNC proxy above, we also need to provide some configuration items for the proxy mechanism – these are listed below, followed by a short sample configuration.

  • The configuration item server_proxyclient_address is evaluated by the get_vnc_console method of the compute driver and used to return the IP address and port on which the actual VNC server is running and on which it can be reached from the controller node (this is the address to which the proxy will connect)
  • The server_listen configuration item is the IP address to which the KVM VNC server will bind on the compute host; this address should be reachable from the controller node via the server_proxyclient_address
  • The novncproxy_base_url is the base URL which the compute node hands out to the client and which points to the noVNC proxy
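
Here is a sketch of what the [vnc] section in nova.conf on a compute node could look like. The proxy client address is a placeholder for the management IP of the respective compute node and needs to be adapted to your network layout.

[vnc]
enabled = true
# address to which the KVM VNC servers will bind
server_listen = 0.0.0.0
# management IP of this compute node, reachable from the controller
server_proxyclient_address = <management IP of this compute node>
# URL handed out to clients, pointing to the noVNC proxy on the controller
novncproxy_base_url = http://controller:6080/vnc_auto.html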

Finally, there is a second configuration file nova-compute.conf specific to the compute nodes. This file determines the compute driver used (in our case, we use libvirt) and the virtualization type. With libvirt, we can either use KVM or QEMU. KVM will only work if the CPU supports virtualization (i.e. offers the VT-x extension for Intel or AMD-V for AMD). In our setup, the virtual machines will run on top of another virtual machine (VirtualBox), which only passes these features through for AMD CPUs. We will therefore set the virtualization type to QEMU.
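
A minimal nova-compute.conf reflecting these choices might look like this (a sketch; the Ubuntu package ships a similar default file).

[DEFAULT]
compute_driver = libvirt.LibvirtDriver

[libvirt]
virt_type = qemu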

Finally, after installing Nova on all compute nodes, we need to run the nova-manage tool once more to make these nodes known and move them into the correct cells.
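
The command to do this is the cell_v2 discover_hosts subcommand of nova-manage, run on the controller node, for instance:

su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova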

Run and verify the installation

Let us now run the installation and verify that it has succeeded. Here are the commands to bring up the environment and to obtain and execute the needed playbooks.

git clone https://www.github.com/christianb93/openstack-labs
cd openstack-labs/Lab4
vagrant up
ansible-playbook -i hosts.ini site.yaml

This will run for a few minutes, depending on the network connection and the resources available on your machine. Once the installation completes, log into the controller and source the admin credentials.

vagrant ssh controller
source admin-openrc

First, we verify that all components are running on the controller. To do this, enter

systemctl | grep "nova"

The output should contain four lines, corresponding to the four services nova-api, nova-conductor, nova-scheduler and nova-novncproxy running on the controller node. Next, let us inspect the Nova database to see which compute services have registered with Nova.

openstack compute service list

The output should be similar to the sample output below, listing the scheduler, the conductor and two compute instances, corresponding to the two compute nodes that our installation has.

+----+----------------+------------+----------+---------+-------+----------------------------+
| ID | Binary         | Host       | Zone     | Status  | State | Updated At                 |
+----+----------------+------------+----------+---------+-------+----------------------------+
|  1 | nova-scheduler | controller | internal | enabled | up    | 2019-11-18T08:35:04.000000 |
|  6 | nova-conductor | controller | internal | enabled | up    | 2019-11-18T08:34:56.000000 |
|  7 | nova-compute   | compute1   | nova     | enabled | up    | 2019-11-18T08:34:56.000000 |
|  8 | nova-compute   | compute2   | nova     | enabled | up    | 2019-11-18T08:34:57.000000 |
+----+----------------+------------+----------+---------+-------+----------------------------+

Finally, the command sudo nova-status upgrade check will run some checks that are meant to be executed after an upgrade and can be used to further verify the installation.

OpenStack supporting services – Glance and Placement

Apart from Keystone, Glance and Placement are two additional infrastructure services that are part of every OpenStack installation. While Glance is responsible for storing and maintaining disk images, Placement (formerly part of Nova) keeps track of resources and allocations in a cluster.

Glance installation

Before we get into the actual installation process, let us take a short look at the Glance runtime environment. Different from Keystone, but similar to most other OpenStack services, Glance is not running inside Apache, but is an independent process using a standalone WSGI server.

To understand the startup process, let us start with the setup.cfg file. This file contains an entry point glance-api which, via the usual mechanism provided by Python's setuptools, will provide a Python executable which runs glance/cmd/api.py. This in turn uses the simple WSGI server implemented in glance/common/wsgi.py. This server is then started in the line

server.start(config.load_paste_app('glance-api'), default_port=9292)

Here we see that the actual WSGI app is created and passed to the server using the PasteDeploy Python library. If you have read my previous post on WSGI and WSGI middleware, you will know that this is a library which uses configuration data to plumb together a WSGI application and middleware. The actual call of the PasteDeploy library is delegated to a helper library in glance/common and happens in the function load_paste_app defined here.

Armed with this understanding, let us now dive right into the installation process. We will spend a bit more time with this process, as it contains some recurring elements which are relevant for most of the OpenStack services that we will install and use. Here is a graphical overview of the various steps that we will go through.

GlanceInstallationProcess

The first thing we have to take care of is the database. Almost all OpenStack services require some sort of database access, thus we have to create one or more databases in our MariaDB database server. In the case of Glance, we create a database called glance. To allow Glance to access this database, we also need to set up a corresponding MariaDB user and grant the necessary access rights on our newly created database.
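
Done manually, this amounts to a few SQL statements, executed for instance in a session opened with mysql -u root -p on the controller (GLANCE_DBPASS is a placeholder for the database password of your choice).

CREATE DATABASE glance;
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' IDENTIFIED BY 'GLANCE_DBPASS';
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' IDENTIFIED BY 'GLANCE_DBPASS';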

Next, Glance of course needs access to Keystone to authenticate users and authorize API requests. For that purpose, we create a new user glance in Keystone. Following the recommended installation process, we will in fact create one Keystone user for every OpenStack service, which is of course not strictly necessary.

With this, we have set up the necessary identities in Keystone. However, recall that Keystone is also used as a service catalog to decouple services from endpoints. An API user will typically not access Glance directly, but first get a list of service endpoints from Keystone, select an appropriate endpoint and then use this endpoint. To support this pattern, we need to register Glance with the Keystone service catalog. Thus, we create a Keystone service and API endpoints. Note that the port provided needs to match the actual port on which the Glance service is listening (using the default unless overridden explicitly in the configuration).

OpenStack services typically expose more than one endpoint – a public endpoint, an internal endpoint and an admin endpoint. As described here, there does not seem to be a fully consistent configuration scheme that allows an administrator to easily define which endpoint type the services will use. Following the installation guideline, we will register all our services with all three endpoint types.
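
For reference, here is roughly what these registration steps look like when done manually with the OpenStack CLI, following the official installation guide (this assumes that a service project already exists and that admin credentials have been sourced).

openstack user create --domain default --password-prompt glance
openstack role add --project service --user glance admin
openstack service create --name glance --description "OpenStack Image" image
openstack endpoint create --region RegionOne image public http://controller:9292
openstack endpoint create --region RegionOne image internal http://controller:9292
openstack endpoint create --region RegionOne image admin http://controller:9292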

Next, we can install Glance by simply installing the corresponding APT package. Similar to Keystone, this package comes with a set of configuration files that we now have to adapt.

The first change which is again standard across all OpenStack components is to change the database connection string so that Glance is able to find our previously created database. Note that this string needs to contain the credentials for the Glance database user that we have created.

Next, we need to configure the Glance WSGI middleware chain. As discussed above, Glance uses the PasteDeploy mechanism to create a WSGI application. When you take a look at the corresponding configuration, however, you will see that it contains a variety of different pipeline definitions. To select the pipeline that will actually be deployed, Glance has a configuration option called deployment flavor. This is a short form for the name of the pipeline to be selected, and when the actual pipeline is assembled here, the name of the pipeline is put together by combining the flavor with the string “glance-api”. We use the flavor “keystone” which will result in the pipeline “glance-api-keystone” being loaded.

This pipeline contains the Keystone auth token middleware which (as discussed in our deep dive into tokens and policies) extracts and validates the token data in a request. This middleware component needs access to the Keystone API, and therefore we need to add the required credentials to our configuration in the section [keystone_authtoken].
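
Putting the last two points together, the relevant parts of glance-api.conf could look roughly like the snippet below. The password placeholder and host names are assumptions in the style of the official installation guide.

[paste_deploy]
flavor = keystone

[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = glance
password = GLANCE_PASS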

To complete the installation, we still have to create the actual database schema that Glance expects. Like most other OpenStack services, Glance is able to automatically create this schema and to synchronize an existing database with the current version of the schema by automatically running the necessary migration routines. This is done by the helper script glance-manage.
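
Run as the glance user, this boils down to a single command:

su -s /bin/sh -c "glance-manage db_sync" glance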

The actual installation process is now completed, and we can restart the Glance service so that the changes in our configuration files are picked up.

Note that the current version of the OpenStack install guide for Stein will instruct you to start two Glance services – glance-api and glance-registry. We only start the glance-api service, for the following reason.

Internally, Glance is structured into a database access layer and the actual Glance API server, plus of course a couple of other components like common services. Historically, Glance used an additional service sitting between the Glance API service and the database, called the Glance registry, which contained the code for the actual database layer based on SQLAlchemy. In this setup, the Glance API service is reachable via the REST API, whereas the Glance registry server is only reachable via RPC calls (using the RabbitMQ message queue). This added an additional layer of security, as the database credentials needed to be stored only in the configuration of the Glance registry service, and made it easier to scale Glance across several nodes. Later, the Glance registry service was deprecated, and our configuration instructs Glance to access the database directly (this is the data_api parameter in the Glance configuration file).

As in my previous posts, I will not replicate the exact commands to do all this manually (you can find them in the well-written OpenStack installation guide), but have put together a set of Ansible scripts doing all this. To run them, enter the following commands

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab3
vagrant up
ansible-playbook -i hosts.ini site.yaml

This playbook will not only install and configure the Glance service, but will also download the CirrOS image (which I have mirrored in an S3 bucket as the original location is sometimes a bit slow) and import it into Glance.

Working with Glance images

Let us now play a bit with Glance. The following commands need to be run from the controller node, and we have to source the credentials that we need to connect to the API. So SSH into the controller node and source the credentials by running

vagrant ssh controller
source admin-openrc

First, let us use the CLI to display all existing images.

openstack image list

As we have only loaded one image so far – the CirrOS image – the output will be similar to the following sample output.

+--------------------------------------+--------+--------+
| ID                                   | Name   | Status |
+--------------------------------------+--------+--------+
| f019b225-de62-4782-9206-ed793fbb789f | cirros | active |
+--------------------------------------+--------+--------+

Let us now get some more information on this image. For better readability, we display the output in JSON format.

openstack image show cirros -f json

The output is a bit longer, and we will only discuss a few of the returned attributes. First, there is the file attribute. If you look at this and compare it to the contents of the directory /var/lib/glance/images/, you will see that this is a reference to the actual image stored on the hard disk. Glance delegates the actual storage to a storage backend. Storage backends are provided by the separate glance_store library and include a file store (which simply stores files on the disk as we have observed and is the default), an HTTP store which uses an HTTP GET to retrieve an image, an interface to the RADOS distributed object store and interfaces to Cinder, Swift and the VMware data store.

We also see from the output that images can be active or inactive, belong to a project (the owner field refers to a project), can be tagged and can either be visible for the public (i.e. outside the project to which they belong) or private (i.e. only visible within the project). It is also possible to share images with individual projects by adding these projects as members.

Note that Glance stores image metadata, like visibility, hash values, owner and so forth in the database, while the actual image is stored in one of the storage backends.

Let us now go through the process of adding an additional image. The OpenStack Virtual Machine Image guide contains a few public sources for OpenStack images and explains how administrators can create their own images based on the most commonly used Linux distributions. As an example, here are the commands needed to download the latest Ubuntu Bionic cloud image and import it into Glance.

wget http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
openstack image create \
    --disk-format qcow2 \
    --file bionic-server-cloudimg-amd64.img \
    --public \
    --project admin \
    ubuntu-bionic

We will later see how we need to reference an image when creating a virtual machine.

Installing the placement service

Having discussed the installation process for the Glance service in a bit more detail, let us now quickly go over the steps to install Placement. Structurally, these steps are almost identical to those to install Glance, and we will not go into them in detail.

PlacementInstallation

The changes in the configuration file are also very similar to those that we had to apply for Glance. Placement again uses the Keystone authtoken plugin, so we have to supply credentials for Keystone in the keystone_authtoken section of the configuration file. We also have to supply a database connection string. Apart from that, we can take over the default values in the configuration file without any further changes.

Placement overview

Let us now investigate the essential terms and objects that the Placement API uses. As the openstack client does not yet contain full support for Placement, we will access the API directly using curl. With each request, we need to include two header parameters.

  • X-Auth-Token needs to contain a valid token that we need to retrieve from Keystone first
  • OpenStack-API-Version needs to be included to define the version of the API (this is the so-called microversion).

Here is an example. We will SSH into the controller, source the credentials, get a token from Keystone and submit a GET request on the URL /resource_classes that we feed into jq for better readability.

vagrant ssh controller
source admin-openrc
sudo apt-get install jq
token=$(openstack token issue -f json | jq -r ".id") 
curl \
  -H "X-Auth-Token: $token" \
  -H "OpenStack-API-Version: placement 1.31"\
  "http://controller:8778/resource_classes" | jq

The resulting list is a list of all resource classes known to Placement. Resource classes are types of resources that Placement manages, like IP addresses, vCPUs, disk space or memory. In a fully installed system, OpenStack services can register as resource providers. Each provider offers a certain set of resource classes, which is called an inventory. A compute node, for instance, would typically provide CPUs, disk space and memory. In our current installation, we cannot yet test this, but in a system with compute nodes, the inventory for a compute node would typically look as follows.

{
  "resource_provider_generation": 3,
  "inventories": {
    "VCPU": {
      "total": 2,
      "reserved": 0,
      "min_unit": 1,
      "max_unit": 2,
      "step_size": 1,
      "allocation_ratio": 16
    },
    "MEMORY_MB": {
      "total": 3944,
      "reserved": 512,
      "min_unit": 1,
      "max_unit": 3944,
      "step_size": 1,
      "allocation_ratio": 1.5
    },
    "DISK_GB": {
      "total": 9,
      "reserved": 0,
      "min_unit": 1,
      "max_unit": 9,
      "step_size": 1,
      "allocation_ratio": 1
    }
  }
}

Here we see that the compute node has two virtual CPUs, roughly 4 GB of memory and 9 GB of disk space available. For each resource provider, Placement also maintains usage data which keeps track of the current usage of the resources. Here is a JSON representation of the usage for a compute node.

{
  "resource_provider_generation": 3,
  "usages": {
    "VCPU": 1,
    "MEMORY_MB": 128,
    "DISK_GB": 1
  }
}

So in this example, one vCPU, 128 MB RAM and 1 GB disk of the compute node are in use. To link consumers and usage information, Placement uses allocations. An allocation represents a usage of resources by a specific consumer, like in the following example.

{
  "allocations": {
    "aaa957ac-c12c-4010-8faf-55520200ed55": {
      "resources": {
        "DISK_GB": 1,
        "MEMORY_MB": 128,
        "VCPU": 1
      },
      "consumer_generation": 1
    }
  },
  "resource_provider_generation": 3
}

In this case, the consumer (represented by the UUID aaa957ac-c12c-4010-8faf-55520200ed55) is actually a compute instance which consumes 1 vCPU, 128 MB memory and 1 GB disk space. Here is a diagram that represents a simplified version of the Placement data model.

PlacementDataModel

Placement offers a few additional features like Traits, which define qualitative properties of resource providers, or aggregates which are groups of resource providers, or the ability to make reservations.

Let us close this post by briefly discussing the relation between Nova and Placement. As we have mentioned above, compute nodes represent resource providers in Placement, so Nova needs to register resource provider records for the compute nodes it manages and provide the inventory information. When a new instance is created, the Nova scheduler will request information on inventories and current usage from Placement to determine the compute node on which the instance will be placed, and will subsequently update the allocations to register itself as a consumer for the resources consumed by the newly created instance.

With this, the installation of Glance and Placement is complete and we have all the ingredients in place to start installing the Nova services in the next post.

OpenStack Keystone – installation and overview

Today we will dive into OpenStack Keystone, the part of OpenStack that provides services like management of users, roles and projects, authentication and a service catalog to the other OpenStack components. We will first install Keystone and then take a closer look at each of these areas.

Installing Keystone

As in the previous lab, I have put together a couple of scripts that automatically install Keystone in a virtual environment. To run them, issue the following commands (assuming, of course, that you did go through the basic setup steps from the previous post to set up the environment)

pwd
# you should be in the directory into which 
# you did clone the repository
cd openstack-samples/Lab2
vagrant up
ansible-playbook -i hosts.ini site.yaml

While the scripts are running, let us discuss the installation steps. First, we need to prepare the database. Keystone uses its own database schema called (well, you might guess …) keystone that needs to be added to the MariaDB instance. We will also have to create a new database user keystone with the necessary privileges on the keystone database.

Then, we install the Keystone packages via APT. This will put default configuration files into /etc/keystone which we need to adapt. Actually, there is only one change that we need to make at this point – we need to change the connection key to contain a proper connection string to reach our MariaDB with the database credentials just established.

Next, the Keystone database schema needs to be created. To do this, we use the keystone-manage db_sync command, which actually performs an upgrade of the Keystone DB schema to the latest version. We then again utilize the keystone-manage tool to create symmetric keys for the Fernet token mechanism and for the encryption of credentials in the SQL backend.
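
Done manually, these steps correspond to the following commands (taken from the official installation guide).

su -s /bin/sh -c "keystone-manage db_sync" keystone
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
keystone-manage credential_setup --keystone-user keystone --keystone-group keystone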

Now we need to add a minimum set of domains, projects and users to Keystone. Here, however, we face a chicken-and-egg problem: to be able to add a user, we need the authorization to do this, so we need a user, but there is no user yet.

There are two solutions to this problem. First, it is possible to define an admin token in the Keystone configuration file. When this token is used for a request, the entire authorization mechanism is bypassed, which we could use to create our initial admin user. This method, however, is a bit dangerous. The admin token is contained in the configuration file in clear text and never expires, so anyone who has access to the file can perform any action in Keystone and consequently in OpenStack.

The second approach is to again use the keystone-manage tool, which has a command bootstrap that will access the database directly (more precisely, via the Keystone code base) and will create a default domain, a default project, an admin user and three roles (admin, member, reader). The admin user is set up to have the admin role for the admin project and on system level. In addition, the bootstrap process will create a region and catalog entries for the identity services (we will discuss these terms later on).
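
A typical invocation, with ADMIN_PASS as a placeholder for the admin password of your choice, looks like this.

keystone-manage bootstrap \
  --bootstrap-password ADMIN_PASS \
  --bootstrap-admin-url http://controller:5000/v3/ \
  --bootstrap-internal-url http://controller:5000/v3/ \
  --bootstrap-public-url http://controller:5000/v3/ \
  --bootstrap-region-id RegionOne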

Users, projects, roles and domains

Of course, users are the central object in Keystone. A user can either represent an actual human user or a service account which is used to define access rights for the OpenStack services with respect to other services.

In a typical cloud environment, just having a global set of users, however, is not enough. Instead, you will typically have several organizations or tenants that use the same cloud platform, but require a certain degree of separation. In OpenStack, tenants are modeled as projects (even though the term tenant is sometimes used as well to refer to the same thing). Projects and users, in turn, are both grouped into domains.

To actually define which user has which access rights in the system, Keystone allows you to define roles and assign roles to users. In fact, when you assign a role, you always do this for a project or a domain. You would, for instance, assign the role reader to the user bob for the project test or for a domain. So a role assignment always refers to a user, a role and either a project or a domain.
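
With the OpenStack CLI, such an assignment would be made with a command like the following (bob, test and reader are of course just example names).

openstack role add --user bob --project test reader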

DomainsUserRolesProjects

Note that it is possible to assign a role to a user in one domain for a project living in a different domain (though you will rarely have a good reason to do this).

In fact, the full picture is even a bit more complicated than this. First, roles can imply other roles. In the default installation, the admin role implies the member role, and the member role implies the reader role. Second, the above diagram suggests that a role is not part of a domain. This is true in most cases, but it is in fact possible to create domain-specific roles. These roles do not appear in a token and are therefore not directly relevant to authorization, but are intended to be used as prior roles to map domain specific role logic onto the overall role logic of an installation.

It is also not entirely true that roles always refer to either a domain or a project. In fact, Keystone allows for so-called system roles which are supposed to be used to restrict access to operations that are system wide, for instance the configuration of API endpoints.

Finally, there are also groups. Groups are just collections of users, and instead of assigning a role to a user, you can assign a role to a group which then effectively is valid for all users in that group.
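
As a quick sketch with made-up names, creating a group, adding a user to it and assigning a role to the group could look like this.

openstack group create engineers
openstack group add user engineers bob
openstack role add --group engineers --project test reader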

And, yes, there are also subprojects… but let us stop here. You can see that the Keystone data structures are complicated and have been growing significantly over time.

To better understand the terms discussed so far, let us take a look at our sample installation. First, establish an SSH connection to one of the nodes, say the controller node.

vagrant ssh controller

On this node, we will use the OpenStack Python client to explore users, projects and domains. To run it, we will need credentials. When you work with the OpenStack CLI, there are several methods to supply credentials. The option we will use is to provide credentials in environment variables. To be able to quickly set up these variables, the installation script creates a bash script admin-openrc that sets these credentials. So let us source this script and then submit an OpenStack API request to list all existing users.
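
Such a file typically contains little more than a handful of export statements, roughly like the following sketch (the exact password and URL of course depend on your installation).

export OS_USERNAME=admin
export OS_PASSWORD=<admin password chosen during the installation>
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3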

source admin-openrc
openstack user list

At this point, this should only give you one user – the admin user created during the installation process. To display more details for this user, you can use openstack user show admin, and you should obtain an output similar to the one below.

+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 67a4f789b4b0496cade832a492f7048f |
| name                | admin                            |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+

We see that the user admin is part of the default domain, which is the standard domain used in OpenStack as long as no other domain is specified.

Let us now see which role assignments this user has. To do this, let us list all assignments for the user admin, using JSON output for better readability.

openstack role assignment list --user admin -f json

This will yield the following output.

[
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "5b634876aa9a422c83591632a281ad59",
    "Domain": "",
    "System": "",
    "Inherited": false
  },
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "",
    "Domain": "",
    "System": "all",
    "Inherited": false
  }
]

Here we see that there are two role assignments for this user. As the output only contains the UUIDs of the role and the project, we will have to list all projects and all roles to be able to interpret the output.

openstack project list 
openstack role list

So we see that for both assignments, the role is the admin role. For the first assignment, the project is the admin project, and for the second assignment, there is no project (and no domain), but the system field is filled. Thus the first assignment assigns the admin role for the admin project to our user, whereas the second one assigns the admin role on system level.

So far, we have not specified anywhere what these roles actually imply. To understand how roles lead to authorizations, there are still two missing pieces. First, OpenStack has a concept of implied roles – roles that a user automatically holds because they are implied by explicitly assigned roles. To see implied roles in action, run

openstack implied role list 

The resulting table will list the prior roles on the left and the implied roles on the right. So we see that having the admin role implies also having the member role, and having the member role in turn implies also having the reader role.

The second concept that we have not touched upon is policies. Policies actually define what a user having a specific role is allowed to do. Whenever you submit an API request, this request targets a certain action. Actions more or less correspond to API endpoints, so an action could be “list all projects” or “create a user”. A policy defines a rule for this action which is evaluated to determine whether that request is allowed. A simple rule could be “user needs to have the admin role”, but the rule engine is rather powerful and we can define much more elaborate rules – more on this in the next post.

The important point to understand here is that policies are not defined by the admin via APIs, but are predefined either in the code or in specific policy files that are part of the configuration of each OpenStack service. Policies refer to roles by name, and it does not make sense to define and use a role that is not referenced by any policy (even though you can technically do this). Thus you will rarely need to create roles beyond the standard roles admin, member and reader unless you also change the policy files.

Service catalogs

Apart from managing users (the Identity part of Keystone) and projects, roles and domains (the Resources part of Keystone), Keystone also acts as a service registry. OpenStack services register themselves and their API endpoints with Keystone, and OpenStack clients can use this information to obtain the URL of service endpoints.

Let us take a look at the services that are currently registered with Keystone. This can be done by running the following commands on the controller.

source admin-openrc
openstack service list -f json

At this point in the installation, before installing any other OpenStack services, there is only one service – Keystone itself. The corresponding output is

[
  {
    "ID": "3acb257f823c4ecea6cf0a9e94ce67b9",
    "Name": "keystone",
    "Type": "identity"
  }
]

We see that a service has a name which identifies the actual service, in addition to a type which defines the type of service delivered. Given the type of a service, we can now use Keystone to retrieve a list of service API endpoints. In our example, enter

openstack endpoint list --service identity -f json

which should yield the following output.

[
  {
    "ID": "062975c2758f4112b5d6568fe068aa6f",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "public",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "207708ecb77e40e5abf9de28e4932913",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "admin",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "781c147d02604f109eef1f55248f335c",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "internal",
    "URL": "http://controller:5000/v3/"
  }
]

Here, we see that every service typically offers different types of endpoints. There are public endpoints, which are supposed to be reachable from an external network, internal endpoints for users in the internal network and admin endpoints for administrative access. This, however, is not enforced by Keystone but by the network layout you have chosen. In our simple test installation, all three endpoints for a service will be identical.

When we install more OpenStack services later, you will see that as part of this installation, we will always register a new service and corresponding endpoints with Keystone.

Token authorization

So far, we have not yet discussed how an OpenStack service actually authenticates a user. There are several ways to do this. First, you can authenticate using passwords. When using the OpenStack CLI, for instance, you can put the username and password into environment variables which will then be used to make API requests (for ease of use, the Ansible playbooks that we use to bring up our environment create a file admin-openrc which you can source to set these variables and which we have already used in the examples above).

In most cases, however, subsequent authorizations will use a token. A token is essentially a short string which is issued once by Keystone and then put into the X-Auth-Token field in the HTTP header of subsequent requests. If a token is present, Keystone will validate this token and, if it is valid, be able to derive all the information it needs to authenticate the user and authorize a request.

Keystone is able to use different token formats. The default token format with recent releases of Keystone is the Fernet token format.

It is important to understand that tokens are scoped objects. The scope of a token determines which roles are taken into account for the authorization process. If a token is project scoped, only those roles of a user that target a project are considered. If a token is domain scoped, only the roles that are defined on domain level are considered. And finally, a system scope token implies that only roles at system level are relevant for the authorization process.

Earlier versions of Keystone supported a token type called PKI token that contained a large amount of information directly, including role information and service endpoints. The advantage of this approach was that once a token had been issued, it could be processed without any further access to Keystone. The disadvantage, however, was that the tokens generated in this way tended to be huge, and soon reached a point where they could no longer be put into an HTTP header. The Fernet token format handles things differently. A Fernet token is an encrypted token which contains only a limited amount of information. To use it, a service will, in most cases, need to run additional calls against Keystone to retrieve additional data like roles and services. For a project scoped token, for instance, the following diagram displays the information that is stored in a token on the left hand side.

FernetToken

First, there is a version number which encodes the information on the scope of the token. Then, there is the ID of the user for whom the token is issued, the methods that the user has used to authenticate, the ID of the project for which the token is valid, and the expiration date. Finally, there is an audit ID which is simply a randomly generated string that can be put into logfiles (in contrast to the token itself, which should be kept secret) and can be used to trace the usage of this token. All these fields are serialized and encrypted using a symmetric key stored by Keystone, typically on disk. A domain scoped token contains a domain ID instead of the project ID and so forth.

Equipped with this understanding, we can now take a look at the overall authorization process. Suppose a client wants to request an action via the API, say from Nova. The client would first use password-based authentication to obtain a token from Keystone. Keystone returns the token along with an enriched version containing roles and endpoints as well. The client would use the endpoint information to determine the URL of the Nova service. Using the token, it would then submit an API request to the Nova API.

The Nova service will take the token, validate it and ask Keystone again to enrich the token, i.e. to add the missing information on roles and endpoints (in fact, this happens in the Keystone middleware). It is then able to use the role information and its policies to determine whether the user is authorized for the request.

OpenStackAuthorization

Using Keystone with LDAP and other authentication mechanisms

So far, we have stored user identities and groups inside the MariaDB database, i.e. locally. In most production setups, however, you will want to connect to an existing identity store, which is typically exposed via the LDAP protocol. Fortunately, Keystone can be integrated with LDAP. This integration is read-only, and Keystone will use LDAP for authentication, but still store projects, domains and role information in the MariaDB database.

When using this feature, you will have to add various data to the Keystone configuration file. First, of course, you need to add basic connectivity information like credentials, host and port so that Keystone can connect to an LDAP server. In addition, you need to define how the fields of a user entity in LDAP map onto the corresponding fields in Keystone. Optionally, TLS can be configured to secure the connection to an LDAP server.

In addition to LDAP, Keystone also supports a variety of alternative methods for authentication. First, Keystone supports federation, i.e. the ability to share authentication data between different identity providers. Typically, Keystone will act as a service provider, i.e. when a user tries to connect to Keystone, the user is redirected to an identity provider, authenticates with this provider and Keystone receives and accepts the user data from this provider. Keystone supports both the OpenID Connect and the SAML standard to exchange authentication data with an identity provider.

As an alternative mechanism, Keystone can delegate the process of authentication to the Apache webserver in which Keystone is running – this is called external authentication in Keystone. In this case, Apache will handle the authentication, using whatever mechanisms the administrator has configured in Apache, and pass the resulting user identity as part of the request context down to Keystone. Keystone will then look up this user in its backend and use it for further processing.

Finally, Keystone offers a mechanism called application credentials to allow applications to use the API on behalf of a Keystone user without having to reveal the user's password to the application. In this scenario, a user creates an application credential, passing in a secret and (optionally) a subset of roles and endpoints to which the credential grants access. Keystone will then create a credential and pass its ID back to the user. The user can then store the credential ID and the secret in the application's configuration. When the application wants to access an OpenStack service, it uses a POST request on the /auth/tokens endpoint to request a token, and Keystone will generate a token that the application can use to connect to OpenStack services.

This completes our post today. Before moving on to install additional services like Placement, Glance and Nova, we will – in the next post – go on a guided tour through a part of the Keystone source code to see how tokens and policies work under the hood.

Understanding TLS certificates with Ansible and NGINX – part II

In the first part of this short series, we have seen how Ansible can be used to easily generate self-signed certificates. Today, we will turn to more complicated set-ups and learn how to act as a CA, build chains of certificates and create client-certificates.

Creating CA and intermediate CA certificates

Having looked at the creation of a single, self-signed certificate for which issuer and subject are identical, let us now turn to a more realistic situation – using one certificate, a CA certificate, to sign another certificate. If this second certificate is directly used to authorize an entity, for instance by being deployed into a web server, it is usually called an end-entity certificate. If, however, this certificate is in turn used to sign a third certificate, it is called an intermediate CA certificate.

In the first post, we have looked at the example of the certificate presented by github.com, which is signed by a certificate with the CN “DigiCert SHA2 Extended Validation Server CA” (more precisely, of course, by the private key associated with the public key verified by this certificate), which in turn is issued by “DigiCert High Assurance EV Root CA”, the root CA. Here, the second certificate is the intermediate CA certificate, and the certificate presented by github.com is the end-entity certificate.

Let us now try to create a similar chain in Ansible. First, we need a root CA. This will again be a self-signed certificate (which is the case for all root CA certificates). In addition, root CA certificates typically contain a set of extensions. To understand these extensions, the easiest approach is to look at a few examples. You can either use openssl x509 to inspect some of the root certificates that come with your operating system, or use your browser's certificate management tab to look at some of the certificates there. Doing this, you will find that root CA certificates typically contain three extensions as specified by X509v3, which are also defined in RFC 3280.

  • Basic Constraints: CA: True – this marks the certificate as a CA certificate
  • Key Usage: Digital Signature, Certificate Sign, CRL Sign – this entitles the certificate to be used to sign other certificates, perform digital signatures and sign CRLs (certificate revocation lists)
  • Subject Key Identifier: this is an extension which needs to be present for a CA according to RFC 3280 and allows the use of a hash of the public key to easily identify all certificates issued for a specific public key

All these requirements can easily be met using our Ansible modules. We essentially proceed as in the previous post and use the openssl_csr module to create a CSR from which we then generate a certificate using the openssl_certificate module; a short sketch of the corresponding tasks follows the list below. The full playbook (also containing the code for the following sections) can be found here. A few points are worth noting.

  • When creating the CSR, we need to add the fields key_usage and key_usage_critical to the parameters of the Ansible module. The same holds for basic_constraints and basic_constraints_critical
  • The module will by default put the common name into the subject alternative name extension (SAN). To turn this off, we need to set use_common_name_for_san to false.
  • When creating the certificate using openssl_certificate, we need the flag selfsigned_create_subject_key_identifier to instruct the module to add a subject key identifier extension to the certificate. This feature is only available since Ansible version 2.9. So in case you have an older version, you need to use pip3 install ansible to upgrade to the latest version (you might want to run this in a virtual environment)
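
To give you an idea, here is a sketch of the two tasks for the root CA. File names, the common name and the exact extension values are assumptions for illustration; the playbook linked above is the authoritative version.

- name: Create CSR for the root CA
  openssl_csr:
    path: "{{playbook_dir}}/ca.csr"
    privatekey_path: "{{playbook_dir}}/etc/certs/ca.rsa"
    common_name: "Example root CA"
    # mark the certificate as a CA certificate
    basic_constraints:
      - "CA:TRUE"
    basic_constraints_critical: yes
    # allow signing of certificates and CRLs
    key_usage:
      - digitalSignature
      - keyCertSign
      - cRLSign
    key_usage_critical: yes
    use_common_name_for_san: no

- name: Create self-signed root CA certificate
  openssl_certificate:
    csr_path: "{{playbook_dir}}/ca.csr"
    path: "{{playbook_dir}}/etc/certs/ca.crt"
    privatekey_path: "{{playbook_dir}}/etc/certs/ca.rsa"
    provider: selfsigned
    # requires Ansible 2.9 or later
    selfsigned_create_subject_key_identifier: always_create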

Having this CA in place, we can now repeat the procedure to create an intermediate CA certificate. This will again be a CA certificate, with the difference that its issuer will be the root certificate that we have just created. So we no longer use the selfsigned provider when calling the Ansible openssl_certificate module, but the ownca provider. This requires a few additional parameters, most notably the root CA certificate and the private key of the root CA. So the corresponding task in the playbook will look like this.

- name: Create certificate for intermediate CA
  openssl_certificate:
    # CSR for the intermediate CA, created in a previous task
    csr_path: "{{playbook_dir}}/intermediate-ca.csr"
    path: "{{playbook_dir}}/etc/certs/intermediate-ca.crt"
    # sign with our own root CA instead of creating a self-signed certificate
    provider: ownca
    ownca_path: "{{playbook_dir}}/etc/certs/ca.crt"
    ownca_create_subject_key_identifier: always_create
    # private key of the root CA used for signing
    ownca_privatekey_path: "{{playbook_dir}}/etc/certs/ca.rsa"

When creating the CSR, we also modify the basic constraints field a bit and add a second key/value pair, pathlen:0. This specifies that the resulting certificate cannot be used to create any additional CA certificates, but only the final, end-entity certificates.

This is what we will do next. The code for this is more or less the same as that for creating the intermediate CA, but this time, we use the intermediate CA instead of the root CA for signing and we also change the extensions again to create a classical service certificate.

Let us now put all this together and verify that our setup works. To create all certificates, enter the following commands.

git clone https://github.com/christianb93/tls-certificates
cd tls-certificates/lab2
ansible-playbook site.yaml

When the script completes, you should see a couple of certificates created in etc/certs. We can use OpenSSL to inspect them.

for cert in server.crt intermediate-ca.crt ca.crt; do
  openssl x509 -in etc/certs/$cert -noout -text
done

This should display all three certificates in the order listed. Looking at the common names and e-mail addresses (all other attributes of the distinguished name are identical), you should now nicely see that these certificates really form a chain, with the issuer of one element in the chain being the subject of the next one, up to the last one, which is self-signed.

Now let us see how we need to configure NGINX to use our new server certificate when establishing a TLS connection. At first glance, you might think that we simply replace the server certificate from the last lab with our new one. But there is an additional twist. A client will typically have a copy of the root CA, but it is not guaranteed that a client will have a copy of the intermediate CA as well. Therefore, instead of using just the server certificate, we point NGINX to a file server-chain.crt which contains both the server certificate and the intermediate CA certificate, in this order. So run

cp etc/certs/server.crt etc/certs/server-chain.crt
cat etc/certs/intermediate-ca.crt >> etc/certs/server-chain.crt
docker run -d --rm \
       -p 443:443 \
       -v $(pwd)/etc/conf.d:/etc/nginx/conf.d \
       -v $(pwd)/etc/certs:/etc/nginx/certs \
       nginx

Once the NGINX server is running, we should now be able to build a connection for testing using OpenSSL. As the certificates that the server presents are not self-signed, we also need to tell OpenSSL where the root CA needed to verify the chain of certificates is stored.

openssl s_client \
  --connect localhost:443 \
  -CAfile etc/certs/ca.crt
GET /index.html HTTP/1.0

You should again see the NGINX welcome page. It is also instructive to look at the output that OpenSSL produces and which, right at the beginning, also contains a representation of the certificate chain as received and verified by OpenSSL.

Creating and using client certificates

So far, our certificates have been server certificates – a certificate presented by a server to prove that the public key that the server presents us is actually owned by the entity operating the server. Very often, for instance when securing REST APIs like that of Kubernetes, however, the TLS protocol is used to also authenticate a user.

Let us take the Kubernetes API as an example. The Kubernetes API is a REST API using HTTPS and listening (by default) on port 6443. When a user connects to this URL, a server certificate is used so that the user can verify that the server is really owned by whoever provides the cluster. When a user makes a request to the API server, then, in addition to that, the server would also like to know that the user is a trusted user, and will have to authenticate the user, i.e. associate a certain identity with the request.

For that purpose, Kubernetes can be configured to ask the user for a client certificate during the TLS handshake. The server will then try to verify this certificate against a configured CA certificate. If that verification is successful, i.e. if the server can build a chain of certificates from the certificate that the client presents – the so-called client certificate – then the server will extract the common name and the organization from that certificate and use it as user and group to process the API request.
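
Just to make this a bit more concrete: on the API server, this behaviour is essentially controlled by a single flag pointing to the CA certificate used to verify client certificates (the path below is only an example).

# Excerpt from a kube-apiserver command line (other flags omitted)
kube-apiserver --client-ca-file=/etc/kubernetes/pki/ca.crt ...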

Let us now see how these client certificates can be created. First, of course, we need to understand what properties of a certificate turn it into a client certificate. Finding a proper definition of the term “client certificate” is not as straightforward as you might expect. There are several recommendations describing a reasonable set of extensions for client certificates (RFC 3279, RFC 5246 and the man page of the OpenSSL x509 tool). Combining these recommendations, we use the following set of extensions:

  • keyUsage is present and contains the bits digitalSignature and keyEncipherment
  • extended key usage is present and contains the clientAuth purpose

The Ansible code to generate this certificate is almost identical to the code in the previous section, with the differences due to the different extensions that we request. Thus we again create a self-signed root CA certificate, use this certificate to sign a certificate for an intermediate CA, and then use the intermediate CA certificate to issue certificates for client and server.
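
For illustration, the CSR task for the client certificate might look roughly like the following sketch (the common name and the CSR file name are assumptions; client.rsa is the key file that we will also use below when testing the connection).

- name: Create CSR for client certificate
  openssl_csr:
    # Common name and CSR file name are assumptions - adjust to your setup
    common_name: "leftasexercise-client"
    key_usage:
      - digitalSignature
      - keyEncipherment
    key_usage_critical: yes
    extended_key_usage:
      - clientAuth
    path: "{{playbook_dir}}/client.csr"
    privatekey_path: "{{playbook_dir}}/etc/certs/client.rsa"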

We also have to adjust our NGINX setup by adding the following two lines to the configuration of the virtual server.

ssl_verify_client       on;
ssl_client_certificate  /etc/nginx/certs/ca.crt;

With the first line, we instruct NGINX to ask a client for a TLS certificate during the handshake. With the second line, we specify the CA that NGINX will use to verify these client certificates. In fact, as you will see immediately when running our example, the server will even tell the client which CAs it accepts as issuers; this information is part of the certificate request that the server sends during the handshake.

Time to see all this in action again. To download, run and test the playbook enter the following commands (do not forget to stop the container created in the previous section).

git clone https://github.com/christianb93/tls-certificates
cd tls-certificates/lab3
ansible-playbook site.yaml
openssl s_client \
  --connect localhost:443 \
  -CAfile etc/certs/ca.crt \
  -cert etc/certs/client.crt \
  -cert_chain etc/certs/intermediate-ca.crt \
  -key etc/certs/client.rsa
GET /index.html HTTP/1.0


Note the additional switches to the OpenSSL client command. With the -cert switch, we tell OpenSSL to submit a client certificate when requested and point it to the file containing this certificate. With the -cert_chain parameter, we specify additional certificates (if any) that the client will send in order to complete the certificate chain between the client certificate and the root certificate. In our case, this is the intermediate CA certificate (this would not be needed if we had used the intermediate CA certificate in the server configuration). Finally, the last switch -key contains the location of the private RSA key matching the presented certificate.
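
To convince yourself that the client certificate is really required, you can repeat the test from the previous section without the client certificate switches. NGINX should now refuse to serve the page, typically answering with a 400 error complaining that no required SSL certificate was sent.

openssl s_client \
  --connect localhost:443 \
  -CAfile etc/certs/ca.crt
GET /index.html HTTP/1.0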

This closes our post (and the two-part mini-series) on TLS certificates. We have seen that Ansible can be used to automate the generation of self-signed certificates and to build entire chains-of-trust involving end-entity certificates, intermediate CAs and private root CAs. Of course, you could also reach out to a provider to do this for you, but that is (maybe) a topic for another post.

Understanding TLS certificates with NGINX and Ansible – part I

If you read technical posts like this one, chances are that you have already had some exposure to TLS certificates, for instance because you have deployed a service that uses TLS and needed to create and deploy certificates for the servers and potentially for clients. Dealing with certificates can be a challenge, and a sound understanding of what certificates actually do is more than helpful for this. In this and the next post, we will play with NGINX and Ansible to learn what certificates are, how they are generated and how they are used.

What is a certificate?

To understand the structure of a certificate, let us first try to understand the problem that certificates try to solve. Suppose you are communicating with some other party over an encrypted channel, using some type of asymmetric cryptosystem like RSA. To send an encrypted message to your peer, you will need the peer's public key as a prerequisite. Obviously, you could simply ask the peer to send you the public key before establishing a connection, but then you need to mitigate the risk that someone uses a technique like IP address spoofing to pretend to be the peer you want to connect with and sends you a fake public key. Thus you need a way to verify that the public key that is presented to you is actually the public key owned by the party to which you want to establish a connection.

One approach could be to establish a third, trusted and publicly known party and ask that trusted party to digitally sign the public key, using a digital signature algorithm like ECDSA. With that party in place, your peer would then present you the signed public key, you would retrieve the public key of the trusted party, use that key to verify the signature and proceed if this verification is successful.

So what your peer will present you when you establish a secure connection is a signed public key – and this is, in essence, what a certificate really is. More precisely, a certificate according to the X509 v3 standard consists of the following components (see also RFC 5280).

  • A version number which refers to a version of the X509 specification, currently version 3 is what is mostly used
  • A serial number which the third party (called the issuer) assigns to the certificate
  • A valid-from and a valid-to date
  • The public key that the certificate is supposed to certify, along with some information on the underlying algorithm, for instance RSA
  • The subject, i.e. the party owning the key
  • The issuer, i.e. the party – also called certificate authority (CA) – signing the certificate
  • Some extensions which are additional, optional pieces of data that a certificate can contain – more on this later
  • And finally, a digital signature signing all the data described above

Let us take a look at an example. Here is a certificate from github.com that I have extracted using OpenSSL (we will learn how to do this later), from which I have removed some details and added some line breaks to make the output a bit more readable.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0a:06:30:42:7f:5b:bc:ed:69:57:39:65:93:b6:45:1f
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = DigiCert Inc, OU = www.digicert.com, 
                CN = DigiCert SHA2 Extended Validation Server CA
        Validity
            Not Before: May  8 00:00:00 2018 GMT
            Not After : Jun  3 12:00:00 2020 GMT
        Subject: businessCategory = Private Organization, 
                jurisdictionC = US, 
                jurisdictionST = Delaware, 
                serialNumber = 5157550, 
                C = US, ST = California, L = San Francisco, 
                O = "GitHub, Inc.", CN = github.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    SNIP --- SNIP
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            SNIP --- SNIP
    Signature Algorithm: sha256WithRSAEncryption
         70:0f:5a:96:a7:58:e5:bf:8a:9d:a8:27:98:2b:00:7f:26:a9:
         SNIP ----- SNIP
         af:ed:7a:29

We clearly recognize the components just discussed. At the top, there are the version number and the serial number (in hex). Then we see the signature algorithm and, at the bottom, the signature, the issuer (DigiCert), the validity, the subject (GitHub Inc.) and, last but not least, the full public key. Note that both issuer and subject are identified using distinguished names as you might know them from LDAP and similar directory services.

If we now wanted to verify this certificate, we would need to get the public key of the issuer, DigiCert. Of course, this is a bit of a chicken-and-egg problem, as we would need another certificate to verify the authenticity of this key as well. So we would need a certificate with subject DigiCert, signed by some other party, and then another certificate signed by yet another certificate authority, and so forth. This chain obviously has to end somewhere, and it does – the last certificate in such a chain (the root CA certificate) is typically a self-signed certificate. These are certificates for which issuer and subject are identical, i.e. certificates for which no further verification is possible and which we simply have to trust.

How, then, do we obtain these root certificates? The answer is that root certificates are either distributed inside an organization or are bundled with operating systems and browsers. In our example, the DigiCert certificate that we see here is itself signed by another DigiCert unit called “DigiCert High Assurance EV Root CA”, and a certificate for this CA is part of the Ubuntu distribution that I use and stored in /etc/ssl/certs/DigiCert_High_Assurance_EV_Root_CA.pem which is a self-signed root certificate.
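
If your system ships this bundle as well, you can easily convince yourself that this is a self-signed certificate by printing its subject and issuer, which should turn out to be identical.

openssl x509 \
  -in /etc/ssl/certs/DigiCert_High_Assurance_EV_Root_CA.pem \
  -noout -subject -issuer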

In this situation, the last element of the chain is called the root CA, the first element the end-entity and any element in between an intermediate CA.

To obtain a certificate, the owner of the server github.com would turn to the intermediate CA and submit a file, a so-called certificate signing request (CSR), containing the public key to be signed. The format for CSRs is standardized in RFC 2986 which, among other things, specifies that a CSR is itself signed with the private key of the requestor, which also proves to the intermediate CA that the requestor possesses the private key corresponding to the public key to be signed. The intermediate CA will then issue a certificate. To establish the intermediate CA, the intermediate CA has, at some point in the past, filed a similar CSR with the root CA, and the root CA has issued a corresponding certificate to the intermediate CA.
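
As a side note, OpenSSL can display the content of a CSR and check its self-signature, which will come in handy later once we create our own requests (here assuming a request stored in a file server.csr).

openssl req \
  -in server.csr \
  -noout -text -verify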

The TLS handshake

Let us now see how certificates are applied in practice to secure a communication. Our example is the transport layer security protocol TLS, formerly known as SSL, which underlies the HTTPS protocol (HTTPS is nothing but HTTP sitting on top of TLS).

In a very basic scenario, a TLS communication roughly works as follows. First, the client sends a “hello” message to the server, containing information like the version of TLS supported and a list of supported ciphers. The server answers with a similar message, immediately followed by the server's certificate. This certificate contains the name of the server (either as a fully-qualified domain name, or including wildcards like *.domain.com, in which case the certificate is called a wildcard certificate) and, of course, the public key of the server. Client and server can now use this key to agree on a secret key which is then used to encrypt the further communication. This phase of the protocol which prepares the actual encrypted connection is known as the TLS handshake.

To successfully conclude this handshake, the server therefore needs a certificate called the server certificate which it will present to the client and, of course, the matching private key, called the server private key. The client needs to verify the server certificate and therefore needs access to the certificate of the (intermediate or root) CA that signed the server certificate. This CA certificate is known as the server CA certificate. Instead of presenting just a single certificate, a server can also present an entire chain of certificates which must end with the server CA certificate that the client knows. In practice, these certificates are often the root certificates distributed with operating systems and browsers to which the client will have access.

Now suppose that you are a system administrator aiming to set up a TLS-secured service, say an HTTPS-based reverse proxy with NGINX. How would you obtain the required certificates? First, of course, you would create a key pair for the server. Once you have that, you need to obtain a certificate for the public key. Basically, you have three options to obtain a valid certificate.

First, you could turn to an independent CA and ask the CA to issue a certificate, based on a CSR that you provide. Most professional CAs will charge for this. There are, however, a few providers like Let's Encrypt or Cloudflare that offer free certificates.

Alternatively, you could create your own, self-signed CA certificate using OpenSSL or Ansible; this is what we will do in this post. And finally, as we will see in the next post, you could even build your own “micro-CA” to issue intermediate CA certificates which you can then use to issue end-entity certificates within your organization.

Using NGINX with self-signed certificates

Let us now see how self-signed certificates can be created and used in practice. As an example, we will secure NGINX (running in a Docker container, of course) using self-signed certificates. We will first do this using OpenSSL and the command line, and then see how the entire process can be automated using Ansible.

The setup we are aiming at is NGINX acting as TLS server, i.e. we will ask NGINX to provide content via HTTPS which is based on TLS. We already know that in order to do this, the NGINX server will need an RSA key pair and a valid server certificate.

To create the key pair, we will use OpenSSL. OpenSSL is composed of a variety of different commands. The command that we will use first is the genrsa command that is responsible for creating RSA keys. The man page – available via man genrsa – is quite comprehensive, and we can easily figure out that we need the following command to create a 2048 bit RSA key, stored in the file server.rsa.

openssl genrsa \
  -out server.rsa 2048

As a side note, the created file does not only contain the private key, but also the public key components (i.e. the modulus and the public exponent), as you can see by using openssl rsa -in server.rsa -noout -text to dump the generated key.

Now we need to create the server certificate. If we wanted to ask a CA to create a certificate for us, we would first create a CSR, and the CA would then create a matching certificate. When we use OpenSSL to create a self-signed certificate, we do this in one step – we use the req command of OpenSSL to create the CSR, and pass the additional switch -x509 which instructs OpenSSL to not create a CSR, but a self-signed certificate.

To be able to do this, OpenSSL will need a few pieces of information from us – the validity, the subject (which will also be the issuer), the public key to be signed, any extensions that we want to include and finally the output file name. Some of these options will be passed on the command line, but other options are usually kept in a configuration file.

OpenSSL configuration files are plain-text files in the INI format. There is one section for each command, and there can be additional sections which are then referenced in the command-specific section. In addition, there is a default section with settings which apply to all commands. Again, the man pages (run man config for the general structure of the configuration file and man req for the part specific to the req command) are quite good and readable. Here is a minimal configuration file for our purposes.

[req]
prompt = no
distinguished_name = dn
x509_extensions = v3_ext

[dn]
CN = Leftasexercise
emailAddress = me@leftasexercise.com
O = Leftasexercise blog
L = Big city
C = DE

[v3_ext]
subjectAltName=DNS:*.leftasexercise.local,DNS:leftasexercise.local

We see that the file has three sections. The first section is specific for the req command. It contains a setting that instructs OpenSSL to not prompt us for information, and then two references to other sections. The first of these sections contains the distinguished name of the subject, the second section contains the extensions that we want to include.

There are many different extensions that were introduced with version 3 of the X509 format, and this is not the right place to discuss all of them. The one that we use for now is the subject alternative name extension which allows us to specify a couple of alias names for the subject. Often, these are DNS names for servers for which the certificate should be valid, and browsers will typically check these DNS names and try to match them with the name of the server. As shown here, we can either use a fully-qualified domain name, or we can use a wildcard – these certificates are often called wildcard certificates (which are disputed as they give rise to security concerns, see for instance this discussion). This extension is typical for server certificates.

Let us assume that we have saved this configuration file as server.cnf in the current working directory. We can now invoke OpenSSL to actually create a certificate for us. Here is the command to do this and to print out the resulting certificate.

openssl req \
  -new \
  -config server.cnf \
  -x509 \
  -days 365 \
  -key server.rsa \
  -out server.crt
# Take a look at the certificate
openssl x509 \
  -text \
  -in server.crt -noout

If you scroll through the output, you will be able to identify all components of a certificate discussed so far. You will also find that the subject and the issuer of the certificate are identical, as we expect it from a self-signed certificate.

Let us now turn to the configuration of NGINX needed to serve HTTPS requests presenting our newly created certificate as server certificate. Recall that an NGINX configuration file contains a context called server which contains the configuration for a specific virtual server. To instruct NGINX to use TLS for this server, we need to add a few lines to this section. Here is a full configuration file containing these lines.

server {
    listen               443 ssl;
    ssl_certificate      /etc/nginx/certs/server.crt;
    ssl_certificate_key  /etc/nginx/certs/server.rsa;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}

In the line starting with listen, the ssl keyword asks NGINX to use TLS on port 443, which is the default HTTPS port. In the next line, we tell NGINX which file it should use as a server certificate, presented to a client during the TLS handshake. And finally, in the third line, we point NGINX to the location of the key matching this certificate.

To try this out, let us bring up an NGINX container with that configuration. We will mount two directories into this container – one directory containing our certificates, and one directory containing the configuration file. So create the following directories in your current working directory.

mkdir ./etc
mkdir ./etc/conf.d
mkdir ./etc/certs

Then place a configuration file default.conf with the content shown above in ./etc/conf.d and the server certificate and server private key that we have created in the directory ./etc/certs. Now we start the container and map these directories into the container.

docker run -d --rm \
       -p 443:443 \
       -v $(pwd)/etc/conf.d:/etc/nginx/conf.d \
       -v $(pwd)/etc/certs:/etc/nginx/certs \
       nginx

Note that we map port 443 inside the container to the same port number on the host, so this will only work if you do not already have a server running on this port; if you do, pick a different port. Once the container is up, we can test our connection using the s_client command of the OpenSSL package.

openssl s_client --connect 127.0.0.1:443

This will produce a lengthy output that details the TLS handshake protocol and will then stop. Now enter an HTTP GET request like

GET /index.html HTTP/1.0

The HTML code for the standard NGINX welcome page should now be printed, demonstrating that the setup works.

When you go through the output produced by OpenSSL, you will see that the client displays the full certificate chain from the certificate presented by the server up to the root CA. In our case, this chain has only one element, as we are using a self-signed certificate (which the client detects and reports as error – we will see how to get rid of this in the next post).

Automating certificate generation with Ansible

So far, we have created keys and certificates manually. Let us now see how this can be automated using Ansible. Fortunately, Ansible comes with modules to manage TLS certificates.

The first module that we will need is the openssl_csr module. With this module, we will create a CSR which we will then, in a second step, present to the module openssl_certificate to perform the actual signing process. A third module, openssl_privatekey, will be used to create a key pair.

Let us start with the key generation. Here, the only parameters that we need are the length of the key (we again use 2048 bits) and the path to the location of the generated key. The algorithm will be RSA, which is the default, and the key file will by default be created with the permissions 0600, i.e. only readable and writable by the owner.

- name: Create key pair for the server
  openssl_privatekey:
    path: "{{playbook_dir}}/etc/certs/server.rsa"
    size: 2048

Next, we create the certificate signing request. To use the openssl_csr module to do this, we need to specify the following parameters:

  • The components of the distinguished name of the subject, i.e. common name, organization, locality, e-mail address and country
  • Again the path of the file into which the generated CSR will be written
  • The parameters for the requested subject alternative name extension
  • And, of course, the path to the private key used to sign the request

- name: Create certificate signing request
  openssl_csr:
    common_name: "Leftasexercise"
    country_name: "DE"
    email_address: "me@leftasexercise.com"
    locality_name: "Big city"
    organization_name: "Leftasexercise blog"
    path: "{{playbook_dir}}/server.csr"
    subject_alt_name: 
      - "DNS:*.leftasexercise.local"
      - "DNS:leftasexercise.local"
    privatekey_path: "{{playbook_dir}}/etc/certs/server.rsa"

Finally, we can now invoke the openssl_certificate module to create a certificate from the CSR. This module is able to operate using different backends, the so-called providers. The provider that we will use for the time being is the selfsigned provider, which generates self-signed certificates. Apart from the path to the CSR and the path to the created certificate, we therefore need to specify this provider and the private key to use (which, of course, should be that of the server), and can otherwise rely on the default values.

- name: Create self-signed certificate
  openssl_certificate:
    csr_path: "{{playbook_dir}}/server.csr"
    path: "{{playbook_dir}}/etc/certs/server.crt"
    provider: selfsigned
    privatekey_path: "{{playbook_dir}}/etc/certs/server.rsa"

Once this task completes, we are ready to start our Docker container. This can again be done using Ansible, of course, which has a Docker module for that purpose. To see and run the full code, you might want to clone my GitHub repository.

git clone http://github.com/christianb93/tls-certificates
cd tls-certificates/lab1
ansible-playbook site.yaml
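
For reference, the task in the playbook that brings up the container is essentially the Ansible equivalent of the docker run command that we have used before. A sketch using the docker_container module (the container name and the exact parameters here are my assumptions, not necessarily what the repository uses) could look like this.

- name: Start NGINX container
  docker_container:
    # Name and parameters are assumptions - roughly equivalent to the docker run command above
    name: nginx-tls
    image: nginx
    state: started
    ports:
      - "443:443"
    volumes:
      - "{{playbook_dir}}/etc/conf.d:/etc/nginx/conf.d"
      - "{{playbook_dir}}/etc/certs:/etc/nginx/certs"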

This completes our post for today. In the next post, we will look into more complex setups involving our own local certificate authority and learn how to generate and use client certificates.