OpenStack Neutron – running Neutron with a separate network node

So far, our OpenStack control plane setup was rather simple – we had a couple of compute nodes, and all other services were running on the same controller node. In practice, this not only creates a single point of failure, but also concentrates fairly high traffic on the network interfaces of that node. In this post, we will move towards a more distributed setup with a dedicated network node.

OpenStack nodes and their roles

Before we plan our new topology, let us quickly try to understand what nodes we have used so far. Recall that each node has an Ansible hostname which is the name returned by the Ansible variable inventory_hostname and defined in the inventory file. In addition, each node has a DNS name, and during our installation procedure, we adapt the /etc/hosts file on each node so that DNS names and Ansible hostnames are identical. A host called, for instance, controller in the Ansible inventory will therefore be reachable under this name across the cluster.

The following table lists the types of nodes (defined by the services running on them) and the Ansible hostnames used so far.

Node type | Description | Hostname
api_node | Node on which all APIs are exposed; this will typically be the controller node, but could also be a load balancer in an HA setup | controller
db_node | Node on which the database is running | controller
mq_node | Node on which the RabbitMQ service is running | controller
memcached_node | Node on which the memcached service is running | controller
ntp_node | Node on which the NTP service is running | controller
network_node | Node on which the DHCP agent and the L3 agent (and thus routers) are running | controller
horizon_node | Node on which Horizon is running | controller
compute_node | Compute nodes | compute*

Now we can start to split out some of the nodes and distribute the functionality across several machines. It is rather obvious how to do this for e.g. the database node or the RabbitMQ node – simply start MariaDB or the RabbitMQ service on a different node and update all URLs in the configuration accordingly. In this post, we will instead introduce a dedicated network node that will hold all our Neutron agents. Thus the new distribution of functionality to hosts will be as follows.

Node type | Description | Hostname
api_node | Node on which all APIs are exposed; this will typically be the controller node, but could also be a load balancer in an HA setup | controller
db_node | Node on which the database is running | controller
mq_node | Node on which the RabbitMQ service is running | controller
memcached_node | Node on which the memcached service is running | controller
ntp_node | Node on which the NTP service is running | controller
network_node | Node on which the DHCP agent and the L3 agent (and thus routers) are running | network
horizon_node | Node on which Horizon is running | controller
compute_node | Compute nodes | compute*

Here is a diagram that summarizes our new distribution of components to the various VirtualBox instances.

NewNodeSetup

In addition, we will make a second change to our network topology. So far, we have used a setup where all machines are directly connected on layer 2, i.e. are part of a common Ethernet network. This allowed us to use flat networks and VLAN networks to connect our different nodes. In reality, however, an OpenStack cluster might be operated on top of an IP fabric, so that layer 3 connectivity between all nodes is guaranteed, but layer 2 connectivity cannot be assumed. Also, broadcast and multicast traffic might be restricted – an example could be an existing bare-metal cloud environment on top of which we want to install OpenStack. To be prepared for this situation, we will change our setup to avoid direct layer 2 connectivity. Here is our new network topology implementing these changes.

NetworkTopologySeparateNetworkNode

This is a bit more complicated than what we used in the past, so let us stop for a moment and discuss the setup. First, each node (i.e. each VirtualBox instance) will still be directly connected to the network on our lab host by a VirtualBox NAT interface with IP 10.0.2.15, and – for the time being – we continue to use this interface to access our machines via SSH and to download packages and images (this is something which we will change in an upcoming post as well). Then, there is still a management network with IP range 192.168.1.0/24.

The network that we called the provider network in our previous posts is now called the underlay network. This network is reserved for traffic between virtual machines (and OpenStack routers) that travels across VXLAN virtual OpenStack networks. As we use this network for VXLAN traffic, the network interfaces connected to it are now numbered, i.e. have IP addresses assigned.

All compute nodes are connected to the underlay network and to the management network. The same is true for the network node, on which all Neutron agents (the DHCP agent, the L3 agent and the metadata agent) will be running. The controller node, however, is not connected to the underlay network any more.

But we need a bit more than this to allow our instances to connect to the outside world. In our previous posts, we built a flat network that was directly connected to the physical infrastructure to provide access to the public network. In our new setup, where direct layer 2 connectivity between the machines can no longer be assumed, we realize this differently. On the network node and on each compute node, we bring up an additional OVS bridge called the external bridge br-ext. This bridge will essentially act as an additional virtual Ethernet device that is used to realize a flat network. All external bridges will be connected with each other by a VXLAN that is not managed by OpenStack, but by our Ansible scripts. For this VXLAN, we use a segmentation ID which is different from the segmentation IDs of the tenant networks (as all VXLAN connections will use the underlay network IP addresses and the standard port 4789, this isolation is required).

VXLANNetworkNodesComputeNodes

Essentially, the external bridge on the network node acts as a virtual switch connecting all compute nodes with the network node and with each other. For Neutron, the external bridges will look like a physical interface attached to a physical network, and we can use this network as the supporting provider network for a Neutron flat network to which we can attach routers and virtual machines.

This setup also avoids the use of multicast traffic, as we connect every bridge to the bridge running on the network node directly. Note that we also need to adjust the MTU of the bridge on the network node to account for the overhead of the VXLAN header.

On the network node, the external bridge will be numbered with IP address 172.16.0.1. It can therefore, as in the previously created flat networks, be used as a gateway. To establish connectivity from our instances and routers to the outside world, the network node will be configured as a router connecting the external bridge to the device enp0s3 so that traffic can leave the flat network and reach the lab host (and from there, the public internet).

MTU settings in Neutron

This is a good point in time to briefly discuss how Neutron handles MTUs. In Neutron, the MTU is an attribute of each virtual network. When a network is created, the ML2 plugin determines the MTU of the network and stores it in the Neutron database (in the networks table).

When a network is created, the ML2 plugin asks the type driver to calculate the MTU by calling its method get_mtu. For a VXLAN network, the VXLAN type driver first determines the MTU of the underlay network as the minimum of two values:

  1. the globally defined MTU set by the administrator using the parameter global_physnet_mtu in the neutron.conf configuration file (which defaults to 1500)
  2. the path MTU, defined by the parameter path_mtu in the ML2 configuration

Then, 50 bytes are subtracted from this value to account for the VXLAN overhead. Here 20 bytes are for the IPv4 header (so Neutron assumes that no options are used) and 30 bytes are for the remaining VXLAN overhead, assuming no inner VLAN tagging (you might want to consult this post to understand the math behind this). Therefore, with the default value of 1500 for the global MTU and no path MTU set, this results in 1450 bytes.

For flat networks, the logic is a bit different. Here, the type driver uses the minimum of the globally defined global_physnet_mtu and a network specific MTU which can be defined for each physical network by setting the parameter physical_network_mtus in the ML2 configuration. Thus, there are in total three parameters that determine the MTU of a VXLAN or flat network.

  1. The globally defined global_physnet_mtu in the Neutron configuration
  2. The per-network MTU defined in physical_network_mtus in the ML2 configuration which can be used to overwrite the global MTU for a specific flat network
  3. The path MTU in the ML2 configuration which can be used to overwrite the global MTU of the underlay network for VXLAN networks

What does this imply in our case? In the standard configuration, the VirtualBox network interfaces have an MTU of 1500, which is the standard Ethernet MTU. Thus we set the global MTU to 1500 and leave the path MTU undefined. With these settings, Neutron will correctly derive the MTU 1450 for interfaces attached to a Neutron managed VXLAN network.

Our own VXLAN network joining the external bridges is used as supporting network for a Neutron flat network. To tell Neutron that the MTU of this network is only 1450 (the 1500 of the underlying VirtualBox network minus 50 bytes for the encapsulation overhead), we can set the MTU for this network explicitly in the physical_network_mtus configuration item.
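
To make this concrete, here is a sketch of how these MTU-related settings could look like in our setup. The file locations follow the standard Neutron layout, and the entry for physical_network_mtus assumes that our flat network is still mapped to the physical network label physnet.

# /etc/neutron/neutron.conf
[DEFAULT]
# MTU of the underlying (VirtualBox) network interfaces
global_physnet_mtu = 1500

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
# the flat network runs on top of our own VXLAN, so only 1450 bytes are available
physical_network_mtus = physnet:1450
# path_mtu is left unset, so VXLAN tenant networks get global_physnet_mtu - 50 = 1450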

Implementation and tests

Let us now take a closer look at how our setup needs to change to make this work. First, obviously, we need to adapt our Vagrantfile to bring up an additional node and to reflect the changed network configuration.

Next, we need to bring up the external bridge br-ext on the network node and on the compute nodes. On each compute node, we create a VXLAN port pointing to the network node, and on the network node, we create a corresponding VXLAN port for each compute node. All VXLAN ports will be assigned the VXLAN ID 100 (using the options:key setting of OVS). We then add a flow table entry to the bridge which defines NORMAL processing for all packets, i.e. forwarding like an ordinary switch.
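
As a sketch, this could be done manually on a compute node roughly as follows – the underlay IP address of the network node (192.168.2.12 below) is an assumption and needs to be adjusted to the actual setup.

# create the external bridge on the compute node
ovs-vsctl add-br br-ext
# add a VXLAN port pointing to the network node, using VXLAN ID 100
# (192.168.2.12 is an assumed underlay IP of the network node)
ovs-vsctl add-port br-ext vxlan0 -- set interface vxlan0 \
    type=vxlan options:remote_ip=192.168.2.12 options:key=100
# forward all packets like an ordinary learning switch
ovs-ofctl add-flow br-ext "priority=0,actions=NORMAL"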

We also need to make sure that the external bridge interface on the network node is up and that it has an IP address assigned, which will automatically create a route on the network node as well.
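
On the network node, bringing up the bridge interface, lowering its MTU to account for the VXLAN overhead and assigning the gateway address could look roughly like this (a sketch, using the addresses discussed above):

ip link set br-ext up
ip link set br-ext mtu 1450
# assign the gateway address of the flat network to the bridge;
# this also creates a route for 172.16.0.0/24 on the network node
ip addr add 172.16.0.1/24 dev br-ext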

The next step is to configure the network node as a router. After having set the famous flag in /proc/sys/net/ipv4/ip_forward to 1 to enable forwarding, we need to set up the necessary rules in iptables. Here is a sample iptables-save file that demonstrates this setup.

# Set policies for raw table to accept
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Set policies for NAT table to ACCEPT and 
# add SNAT rule for traffic going out via the public interface
# Generated by iptables-save v1.6.1 on Mon Dec 16 08:50:33 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -o enp0s3 -j MASQUERADE
COMMIT
# Set policies in mangle table to ACCEPT
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Set policy for forwarded traffic in filter table to DROP, but allow
# forwarding for traffic coming from br-ext and established connections
# Also block incoming traffic on the public interface except SSH traffic 
# and reply to connected traffic
# Do not set the INPUT policy to DROP, as this would also drop all traffic
# on the management and underlay networks
*filter
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A FORWARD -i br-ext -j ACCEPT
-A FORWARD -i enp0s3 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i enp0s3 -p tcp --destination-port 22 -j ACCEPT
-A INPUT -i enp0s3 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i enp0s3 -j DROP
-A INPUT -i br-ext -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i br-ext -p icmp -j ACCEPT
-A INPUT -i br-ext -j DROP
COMMIT

Let us quickly discuss these rules. In the NAT table, we set up a rule to apply IP masquerading to all traffic that goes out on the public interface. Thus, the source IP address will be replaced by the IP address of enp0s3 so that the reply traffic is correctly routed. In the filter table, we set the default policy of the FORWARD chain to DROP. We then explicitly allow forwarding for all traffic coming from br-ext and for all traffic coming from enp0s3 which belongs to an already established connection. This is the firewall part of the rules – all traffic not matching one of these rules cannot reach the OpenStack networks. Finally, we need some rules in the INPUT chain to protect the network node itself from unwanted traffic from the external bridge (to avoid attacks from an instance) and from the public interface, allowing only reply traffic and SSH connections. Note that we make an exception for ICMP traffic so that we can ping 172.16.0.1 from the flat network – this is helpful to avoid confusion during debugging.

In addition, we use an OVS patch port to connect our external bridge to the bridge br-phys, as we would do for a physical device. We can then proceed to set up OpenStack as before, with the only difference that we install the Neutron agents on our network node, not on the controller node. If you want to try this out, run

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab10
vagrant up
ansible-playbook -i hosts.ini site.yaml
ansible-playbook -i hosts.ini demo.yaml

The playbook demo.yaml will again create two networks, one VXLAN network and one flat network, and will start one instance (demo-instance-3) on the flat network and two instances on the VXLAN network. It will also install a router connecting these two networks, and assign a floating IP address to the first instance on the VXLAN network.

As before, we can still reach Horizon from the lab host via the management network on 192.168.1.11. If we navigate to the network topology page, we see the same pattern that we have already seen in the previous post.

DedicatedNetworkNodeVirtualTopology

Let us now try out a few things. First, let us try to reach the instance demo-instance-3 from the network node. To do this, log into the network node and ssh into the machine from there (replacing 172.16.0.9 with the IP address of demo-instance-3 on the flat network, which might be different in your case).

vagrant ssh network
ssh -i demo-key cirros@172.16.0.9

Note that we can no longer reach the machine from the lab host, as we are not using the VirtualBox network vboxnet1 any more. We can, however, SSH directly from the lab host (!) into this machine using the network node as a jump host.

ssh -i ~/.os_credentials/demo-key \
    -o StrictHostKeyChecking=no \
    -o "ProxyCommand ssh -i .vagrant/machines/network/virtualbox/private_key \
      -o StrictHostKeyChecking=no \
      -o UserKnownHostsFile=/dev/null \
      -p 2200 -q \
      -W 172.16.0.9:22 \
      vagrant@127.0.0.1" \
    cirros@172.16.0.9 

What about the instances on the VXLAN network? Our demo playbook has created a floating IP for one of the instances which you can either take from the output of the playbook or from the Horizon GUI. In my case, this is 172.16.0.4, and we can therefore reach this machine similarly.

ssh -i ~/.os_credentials/demo-key \
    -o StrictHostKeyChecking=no \
    -o "ProxyCommand ssh -i .vagrant/machines/network/virtualbox/private_key \
      -o StrictHostKeyChecking=no \
      -o UserKnownHostsFile=/dev/null \
      -p 2200 -q \
      -W 172.16.0.4:22 \
      vagrant@127.0.0.1" \
    cirros@172.16.0.4 

Now let us try to reach a few target IP addresses from this instance. First, you should be able to ping the machine on the flat network, i.e. 172.16.0.9 in this case. This is not surprising, as we have created a virtual router connecting the VXLAN network to the flat network. However, thanks to the routing functionality on the network node, we should now also be able to reach our lab host and other machines in the network of the lab host. In my case, for instance, the lab host is connected to a cable modem router with IP address 192.168.178.1, and in fact pinging this IP address from demo-instance-1 works just fine. You should even be able to SSH into the lab host from this instance!

It is interesting to reflect on the path that this ping request takes through the network.

  • First, the request leaves the instance via its network interface eth0 and is routed to the default gateway 172.18.0.1
  • From there, the packet travels all the way down via the integration bridge to the tunnel bridge on the compute node, via VXLAN to the tunnel bridge on the network node, to the integration bridge on the network node and to the internal port of the router
  • In the virtual OpenStack router, the packet is forwarded to the gateway interface and reaches the integration bridge again
  • As we are now on the flat network, the packet travels from the integration bridge to the physical bridge br-phys and from there to our external bridge br-ext. In the OpenStack router, a first NAT’ing has taken place which replaces the source IP address of the packet by the IP address of the router’s gateway port on the flat network
  • The packet is received by the network node, and our iptables rules become effective. Thus, a second NAT’ing happens, the IP source address is set to that of enp0s3 and the packet is forwarded to enp0s3
  • This device is a VirtualBox NAT device. Therefore, VirtualBox now opens a connection to the target, replaces the source IP address with that of the outgoing lab host interface via which this target can be reached and sends the packet to the target host

If we log into the instance demo-instance-3 which is directly attached to the flat network, we are of course also able to reach our lab host and other machines to which it is directly connected, essentially via the same mechanism with the only difference that the first three steps are not necessary.

There is, however, still one issue: DNS resolution inside the instances does not work. To fix this, we will have to set up our DHCP agent – this agent and how it works will be the topic of the next post.

OpenStack Neutron – building VXLAN overlay networks with OVS

In this post, we will learn how to set up VXLAN overlay networks as tenant networks in Neutron and explore the resulting configuration on our compute nodes.

Tenant networks

The networks that we have used so far have been provider networks – they have been created by an administrator, specifying the link to the physical network resource (physical network, VLAN ID) manually. As already indicated in our introduction to Neutron, OpenStack also allows us to create tenant networks, i.e. networks that a tenant, using a non-privileged user, can create using either the CLI or the GUI.

To make this work, Neutron needs to understand which resources, i.e. VLAN IDs in the case of VLANs or VXLAN IDs in the case of VXLANs, it can use when a tenant requests the creation of a network without getting in conflict with the physical network infrastructure. To achieve this, the configuration of the ML2 plugin allows us to specify ranges for VLANs and VXLANs which act as pools from which Neutron can allocate resources to tenant networks. In the case of VLANs, these ranges can be specified in the item network_vlan_ranges that we have already seen. Instead of just specifying the name of a physical network, we could use an expression like

physnet:100:200

to tell Neutron that the VLAN ids between 100 and 200 can be freely allocated for tenant networks. A similar range can be specified for VXLANs, here the configuration item is called vni_ranges. In addition, the ML2 configuration contains the item tenant_network_types which is an ordered list of network types which are offered as tenant networks.

When a user requests the creation of a tenant network, Neutron will go through this list in the specified order. For each network type, it will try to find a free segmentation ID. The first segmentation ID found determines the type of the network. If, for instance, all configured VLAN ids are already in use, but there is still a free VXLAN id, the tenant network will be created as a VXLAN network.

Setting up VXLAN networks in Neutron

After this theoretical discussion, let us now configure our playground to offer VXLAN networks as tenant networks. Here are the configuration changes that we need (a sketch of the resulting configuration follows the list).

  • add vxlan to the list type_drivers in the ML2 plugin configuration file ml2_conf.ini so that VXLAN networks are enabled
  • add vxlan to tenant_network_types in the same file
  • navigate to the ml2_type_vxlan section and edit vni_ranges to specify VXLAN IDs available for tenant networks
  • Set the local_ip in the configuration of the OVS agent, which is the IP address on which the agent will listen for VXLAN connections. Here we use the node's IP address on the management network (which is something you would probably not do in a real world situation, as it is preferred to use separate network interfaces for management and VM traffic, but in our setup, the interface connected to the provider network is unnumbered)
  • In the same file, add vxlan to the key tunnel_types
  • In the Horizon configuration, add vxlan to the item horizon_supported_network_types
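
Put together, the relevant configuration items could look roughly like the sketch below. The VNI range is just an example, and the name and location of the OVS agent configuration file might differ slightly in your distribution or in the Ansible scripts.

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
# keep whatever type drivers you already had and add vxlan
type_drivers = flat,local,vlan,vxlan
tenant_network_types = vxlan
[ml2_type_vxlan]
# example range of VXLAN IDs reserved for tenant networks
vni_ranges = 100:200

# /etc/neutron/plugins/ml2/openvswitch_agent.ini (on each node running the OVS agent)
[ovs]
# management IP of this node (assumed value)
local_ip = 192.168.1.21
[agent]
tunnel_types = vxlan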

Instead of doing all this manually, you can of course once more use one of the Labs that I have created for you. To do this, run

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab9
vagrant up
ansible-playbook -i hosts.ini site.yaml
ansible-playbook -i hosts.ini demo_base.yaml

Note that the second playbook that we run will create a demo project and a demo user as in our previous projects as well as the m1.nano flavor. This playbook will also modify the default security groups to allow incoming ICMP and SSH traffic.

Let us now inspect the state of the compute nodes by logging into the compute node and executing the usual commands to check our network configuration.

vagrant ssh compute1
ifconfig -a
sudo ovs-vsctl show

The first major difference that we see compared to the previous setup with a pure VLAN based network separation is that an additional OVS bridge has been created by Neutron called the tunnel bridge br-tun. This bridge is connected to the integration bridge by a virtual patch cable. Attached to this bridge, there are two VXLAN peer-to-peer ports that connect the tunnel bridge on the compute node compute1 to the tunnel bridges on the second compute node and the controller node.

VXLANTunnelBridgeInitial

Let us now inspect the flows defined for the tunnel bridge. At this point in time, with no virtual machines created, the rules are rather simple – all packets are currently dropped.
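
To inspect the flows yourself, you can dump them on the compute node – depending on the OpenFlow versions enabled on the bridge, you might have to add an option like -O OpenFlow13.

sudo ovs-ofctl dump-flows br-tun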

TunnelBridgeFlowsInitial

Let us now verify that, as promised, a non-admin user can use the Horizon GUI to create virtual networks. So let us log into the Horizon GUI (reachable from the lab host via http://192.168.1.11/horizon) as the demo user (use the demo password from ~/.os_credentials/credentials.yaml) and navigate to the “Networks” page. Now click on “Create Network” at the top right corner of the page. Fill in all three tabs to create a network with the following attributes.

  • Name: vxlan-network
  • Subnet name: vxlan-subnet
  • Network address: 172.18.0.0/24
  • Gateway IP: 172.18.0.1
  • Allocation Pools: 172.18.0.2,172.18.0.10 (make sure that there is no space after the comma separating the start and end address of the allocation pool)

It is interesting to note that the GUI does not ask us to specify a network type. This is in line with our discussion above on the mechanism that Neutron uses to assign tenant networks – we have only specified one tenant network type, namely VXLAN, and even if we had specified more than one, Neutron would pick the next available combination of segmentation ID and type for us automatically.
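
For reference, the same network could also be created from the command line as the demo user – a sketch using the values entered in the GUI above:

source demo-openrc
openstack network create vxlan-network
openstack subnet create \
  --network vxlan-network \
  --subnet-range 172.18.0.0/24 \
  --gateway 172.18.0.1 \
  --allocation-pool start=172.18.0.2,end=172.18.0.10 \
  vxlan-subnet

Note that, as in the GUI, no provider options are specified – Neutron picks the network type and the segmentation ID itself.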

If everything worked, you should now be able to navigate to the “Network topology” page and see the following, very simple network layout.

NetworkTopologyVXLANNetworkOnly

Now use the button “Launch Instance” to create two instances demo-instance-1 and demo-instance-2 attached to the VXLAN network that we have just created. When we now navigate back to the network topology overview, the image should look similar to the one below.

NetworkTopologyVXLANInstances

Using the noVNC console, you should now be able to log into your instances and ping the first instance from the second one and the other way around. When we now re-investigate the network setup on the compute node, we see that a few things have changed.

First, as you would probably guess from what we have learned in the previous post, an additional tap port has been created which connects the integration bridge to the virtual machine instance on the compute node. This port is an access port with VLAN tag 1, which is again the local VLAN ID.

VXLANTunnelBridgeWithInstances

The second change that we find is that additional rules have been created on the tunnel bridge. Ethernet unicast traffic coming from the integration bridge is processed by table 20. In this table, we have one rule for each peer which directs traffic tagged with VLAN ID 1 and destined for this peer to the respective VXLAN port. Note that the MAC destination address used here is the address of the tap port of the respective other VM. Outgoing Ethernet multicast traffic with VLAN ID 1 is copied to all VXLAN ports (“flooding rule”). Traffic with unknown VLAN IDs is dropped.

For ingress traffic, the VXLAN ID is mapped to VLAN IDs in table 4 and the packets are forwarded to table 10. In this table, we find a learning rule that will make sure that MAC addresses from which traffic is received are considered as targets in table 20. Typically, the packet triggering this learning rule is an ARP reply or an ARP request, so that the table is populated automatically when two machines establish an IP based communication. Then the packet is forwarded to the integration bridge.

TunnelBridgeFlowsWithInstances

At this point, we have two instances which can talk to each other, but there is still no way to connect these instances to the outside world. To do this, let us now add an external network and a router.

The external network will be a flat network, i.e. a provider network, and therefore needs to be added as the admin user. We do this using the CLI on the controller node.

vagrant ssh controller
source admin-openrc
openstack network create \
  --share \
  --external \
  --provider-network-type flat \
  --provider-physical-network physnet \
  flat-network
openstack subnet create \
  --dhcp  \
  --subnet-range 172.16.0.0/24 \
  --network flat-network \
  --allocation-pool start=172.16.0.2,end=172.16.0.10 \
  flat-subnet

Back in the Horizon GUI (where we are logged in as the user demo), let us now create a router demo-router with the flat network that we just created as the external network. When you inspect the newly created router in the GUI, you will find that Neutron has assigned one of the IP addresses on the flat network to the external interface of the router. On the “Interfaces” tab, we can now create an internal interface connected to the VXLAN network. When we verify our work so far in the network topology overview, the displayed image should look similar to the one below.

NetworkTopologyVXLANInstancesRouter
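
If you prefer the CLI over the GUI, the router could be created along these lines as the demo user (a sketch, using the network and subnet names from above):

source demo-openrc
openstack router create demo-router
openstack router set --external-gateway flat-network demo-router
openstack router add subnet demo-router vxlan-subnet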

Finally, to be able to reach our instances from the flat network (and from the lab host), we need to assign a floating IP. We will create it using the CLI as the demo user.

vagrant ssh controller
source demo-openrc
openstack floating ip create \
  --subnet flat-subnet \
  flat-network

Now switch back to the GUI and navigate to the compute instance to which you want to attach the floating IP, say demo-instance-1. From the instance overview page, select “Associate floating IP”, pick the IP address of the floating IP that we have just created and complete the association. At this point, you should be able to ping your instance and SSH into it from the lab host.
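
The association can also be done on the command line – here is a sketch that simply picks the first floating IP in the list (the actual address will differ in your setup):

source demo-openrc
# grab the floating IP created above
fip=$(openstack floating ip list -f value -c "Floating IP Address" | head -n 1)
openstack server add floating ip demo-instance-1 $fip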

Concluding remarks

Before closing this post, let us quickly discuss several aspects of VXLAN networks in OpenStack that we have not yet touched upon.

First, it is worth mentioning that the MTU settings turn out to be an issue that comes up frequently when working with VXLANs. Recall that VXLAN technology encapsulates Ethernet frames in UDP packets. This implies a certain overhead – to a given Ethernet frame, we need to add a VXLAN header, a UDP header, an IP header and another Ethernet header. This overhead increases the size of a packet compared to the size of the payload.

Now, in an Ethernet network, every interface has an MTU (maximum transmission unit) which defines the maximum payload size of a frame which can be processed by this interface without fragmentation. A typical MTU for Ethernet is 1500 bytes, which (adding 14 bytes for the Ethernet header and 4 bytes for the checksum) corresponds to an Ethernet frame of 1518 bytes. However, if the MTU of the physical device on the host used to transmit the VXLAN packets is 1500 bytes, the MTU available to the virtual device is smaller, as the packet transmitted by the physical device is composed of the packet transmitted by the virtual device plus the overhead.

As discussed in RFC 4459, there are several possible solutions for this issue which essentially boil down to two alternatives. First, we could of course simply allow fragmentation so that packets exceeding the MTU are fragmented, either by the encapsulator (i.e. fragment the outer packet) or by the virtual interface (i.e. fragment the inner packet). Second, we could adjust the MTU of the underlying network infrastructure, for instance by using jumbo frames with a size of 9000 bytes. The long and interesting discussion of the pros and cons of both approaches in the RFC comes to the conclusion that no clean and easy solution for this problem exists. The approach that OpenStack takes by default is to reduce the MTU of the virtual network interfaces (see the comments of the parameter global_physnet_mtu in the Neutron configuration file). In addition, the ML2 plugin can also be configured with specific MTUs for physical devices and a path MTU, see this page for a short discussion of the options.

The second challenge that the usage of VXLAN can create is the overhead generated by broadcast traffic. As OVS does not support the use of VXLAN in combination with multicast IP groups, Neutron needs to handle broadcast and multicast traffic differently. We have already seen that Neutron will install OpenFlow rules on the tunnel bridge (if you want to know all nitty-gritty details, the method tunnel_sync in the OVS agent is a good starting point), which implies that all broadcast traffic like ARP requests goes to all nodes, even those on which no VMs in the same virtual network are hosted (and, as remarked above, the reply typically creates a learned OpenFlow rule used for further communication).

To avoid the unnecessary overhead of this type of broadcast traffic, the L2 population driver was introduced. This driver uses a proxy ARP mechanism to intercept ARP requests coming from the virtual machines on the bridge. The ARP requests are then answered locally, using an OpenFlow rule, and the request never leaves the local machine. As therefore the ARP reply can no longer be used to learn the MAC addresses of the peers, forwarding rules are created directly by the agent to send traffic to the correct VXLAN endpoint (see here for an entry point into the code).

OpenStack Neutron – deep dive into flat and VLAN networks

Having installed Neutron in my last post, we will now analyze flat networks and VLAN networks in detail and see how Neutron actually realizes virtual Ethernet networks. This will also provide the basic understanding that we need for more complex network types in future posts.

Setup

To follow this post, I recommend repeating the setup from the previous post, so that we have two virtual machines running which are connected by a flat virtual network. Instead of going through the setup again manually, you can also use the Ansible scripts for Lab5 and combine them with the demo playbook from Lab6.

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab5
vagrant up
ansible-playbook -i hosts.ini site.yaml
ansible-playbook -i hosts.ini ../Lab6/demo.yaml

This will install a demo project and a demo user, import an SSH key pair, create a flavor and a flat network and bring up two instances connected to this network, one on each compute node.

Analyzing the virtual devices

Once all instances are running, let us SSH into the first compute node and list all network interfaces present on the node.

vagrant ssh compute1
ifconfig -a

The output is long and a bit daunting. Here is the output from my setup, where I have marked the most relevant sections in red.

FlatNetworkIfconfig

The first interface that we see at the top is the integration bridge br-int which is created automatically by Neutron (in fact, by the Neutron OVS agent). The second bridge is the bridge that we have created during the installation process and that is used to connect the integration bridge to the physical network infrastructure – in our case to the interface enp0s9 which we use for VM traffic. The name of the physical bridge is known to Neutron from our configuration file, more precisely from the mapping of logical network names (physnet in our case) to bridge devices.

The full output also contains two devices (virbr0 and virbr0-nic) that are created by the libvirt daemon but not used.

We also see a tap device, tapd5fc1881-09 in our case. This tap device is realizing the port of our demo instance. To see this, source the credentials of the demo user and run openstack port list to see all ports. You will see two ports, one corresponding to each instance. The second part of the name of the tap device matches the first part of the UUID of the corresponding port (and we can use ethtool -i to get the driver managing this interface and see that it is really a tap device).
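
Here is a short sketch of these checks – the openstack CLI commands can be run wherever the demo credentials are available (e.g. on the controller node), and the tap device name is taken from my setup and will differ in yours.

# on the controller node: list the ports and compare the UUIDs with the tap device names
source demo-openrc
openstack port list
# on the compute node: the driver reported for the interface should be "tun", i.e. a tap device
ethtool -i tapd5fc1881-09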

The QEMU process emulating the virtual machine attaches to this tap device and uses it to provide the virtual NIC to its guest. To verify that QEMU is really attached to the tap device, you can use the following commands (run this and all following commands as root).

# Figure out the PID of QEMU
pid=$(ps --no-headers \
      -C qemu-system-x86_64 \
      | awk '{ print $1}')
# Search file descriptors in /proc/*/fdinfo 
grep "tap" /proc/$pid/fdinfo/*

This should show you that one of the file descriptors is connected to the tap device. Let us now see how this tap device is attached to the integration bridge by running ovs-vsctl show. The output should look similar to the following sample output.

4629e2ce-b4d9-40b1-a362-5a1ba7f79e12
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-phys
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-phys
            Interface phy-br-phys
                type: patch
                options: {peer=int-br-phys}
        Port "enp0s9"
            Interface "enp0s9"
        Port br-phys
            Interface br-phys
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "tapd5fc1881-09"
            tag: 1
            Interface "tapd5fc1881-09"
        Port int-br-phys
            Interface int-br-phys
                type: patch
                options: {peer=phy-br-phys}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"

Here we see that both OVS bridges are connected to a controller listening on port 6633, which is actually the Neutron OVS agent (the manager in the second line is the OVSDB server). The integration bridge has three ports. First, there is a port connected to the tap device, which is an access port with VLAN tag 1. This tagging is used to separate traffic on the integration bridge belonging to two different virtual networks. The VLAN ID here is called the local VLAN ID and is only valid per node. Then, there is a patch port connecting the integration bridge to the physical bridge, and there is the usual internal port.

The physical bridge also has three ports – the other side of the patch port connecting it to the integration bridge, the internal port and finally the physical network interface enp0s9. Thus the following picture emerges.

FlatNetworkTopology

So we get a first idea of how traffic flows. When the guest sends a packet to the virtual interface in the VM, it shows up on the tap device and goes to the integration bridge. It is then forwarded to the physical bridge and from there to the physical interface. The packet travels across the physical network connecting the two compute nodes and there again hits the physical bridge, travels along the virtual patch cable to the integration bridge and finally arrives at the tap interface.

At this point, it is important that the physical network interface enp0s9 is in promiscuous mode. In fact, it needs to pick up traffic directed to the MAC address of the virtual instance, not to its own MAC address. Effectively, this interface is not visible itself but merely part of a virtual Ethernet cable connecting the two physical bridges.

OpenFlow rules on the bridges

We now have a rough understanding of the flow, but there is still a bit of a twist – the VLAN tagging. Recall that the port to which the tap interface is connected is an access port, so traffic arriving there will receive a VLAN tag. If you run tcpdump on the physical interface, however, you will see that the traffic is untagged. So at some point, the VLAN tag is stripped off.

To figure out where this happens, we need to inspect the OpenFlow rules on the bridges. To simplify this process, we will first remove the security groups (i.e. disable firewall rules) and turn off port security for the attached port to get rid of the rules realizing this. For simplicity, we do this for all ports (needless to say that this is not a good idea in a production environment).

source /home/vagrant/admin-openrc
ports=$(openstack port list \
       | grep "ACTIVE" \
       | awk '{print $2}')
for port in $ports; do 
  openstack port set --no-security-group $port
  openstack port set --disable-port-security $port
done

Now let us dump the flows on the integration bridge using ovs-ofctl dump-flows br-int. In the following image, I have marked those rules that are relevant for traffic coming from the instance.

IntegrationBridgeOutgoingRules

The first rule drops all traffic for which the VLAN TCI, masked with 0x1fff (i.e. the last 13 bits) is equal to 0x0fff, i.e. for which the VLAN ID is the reserved value 0xfff. These packets are assumed to be irregular and are dropped. The second rule directs all traffic coming from the tap device, i.e. from the virtual machine, to table 60.

In table 60, the traffic coming from the tap device, i.e. egress traffic, is marked by loading the registers 5 and 6, and resubmitted to table 73, where it is again resubmitted to table 94. In table 94, the packet is handed over to the ordinary switch processing using the NORMAL target.

When we dump the rules on the physical bridge br-phys, the result is much shorter and displayed in the lower part of the image above. Here, the first rule will pick up the traffic, strip off the VLAN tag (as expected) and hand it over to normal processing, so that the untagged package is forwarded to the physical network.

Let us now turn to the analysis of ingress traffic. If a packet arrives at br-phys, it is simply forwarded to br-int. Here, it is picked up by the second rule (unless it has a reserved VLAN ID) which adds a VLAN tag with ID 1 and resubmits to table 60. In this table, NORMAL processing is applied and the packet is forwarded to all ports. As the port connected to the tap device is an access port for VLAN 1, the packet is accepted by this port, the VLAN tag is stripped off again and the packet appears in the tap device and therefore in the virtual interface in the instance.

IntegrationBridgeIncomingRules

All this is a bit confusing but becomes clearer when we study the meaning of the various tables in the Neutron source code. The relevant source files are br_int.py and br_phys.py. Here is an extract of the relevant tables for the integration bridge from the code.

Table Name
0 Local switching table
23 Canary table
24 ARP spoofing table
25 MAC spoofing table
60 Transient table
71,72 Firewall for egress traffic
73,81,82 Firewall for accepted and ingress traffic
91 Accepted egress traffic
92 Accepted ingress traffic
93 Dropped traffic
94 Normal processing

Let us go through the meaning of some of these tables. The canary table (23) is simply used to verify that OVS is up and running (whence the name). The MAC spoofing and ARP spoofing tables (24, 25) are not populated in our example as we have disabled the port protection feature. Similarly, the firewall tables (71, 72, 73, 81, 82) only contain a minimal setup. Table 91 (accepted egress traffic) simply routes to table 94 (normal processing), tables 92 and 93 are not used and table 94 simply hands over the packets to normal processing.

In our setup, the local switching table (table 0) and the transient table (table 60) are actually the most relevant tables. Together, these two tables realize the local VLANs on the compute node. We will see later that on each node, a local VLAN is built for each global virtual network. The method provision_local_vlan installs a rule into the local switching table for each local VLAN which adds the corresponding VLAN ID to ingress traffic coming from the corresponding global virtual network and then resubmits to the transient table.

Here is the corresponding table for the physical bridge.

Table Name
0 Local switching table
2 Local VLAN translation

In our setup, only the local switching table is used which simply strips off the local VLAN tags for egress traffic.

You might ask yourself how we can reach the instances from the compute node. The answer is that a ping (or an SSH connection) to the instance running on the compute node actually travels via the default gateway, as there is no direct route to 172.16.0.0 on the compute node. In our setup, the gateway is 10.0.2.2 on the enp0s3 device which is the NAT network provided by VirtualBox. From there, the connection travels via the lab host where we have configured the virtual network device vboxnet1 as a gateway for 172.16.0.0, so that the traffic enters the virtual network again via this gateway and eventually reaches enp0s9 from there.

We could now turn on port protection and security groups again and study the resulting rules, but this would go far beyond the scope of this post (and far beyond my understanding of OpenFlow rules). If you want to get into this, I recommend this summary of the firewall rules. Instead, we move on to a more complex setup using VLANs to separate virtual networks.

Adding a VLAN network

Let us now adjust our configuration so that we are able to provision a VLAN based virtual network. To do this, there are two configuration items that we have to change and that both appear in the configuration of the ML2 plugin.

The first item is type_drivers. Here, we need to add vlan as an additional value so that the VLAN type driver is loaded.

When starting up, this driver reads the second parameter that we need to change – network_vlan_ranges. Here, we need to specify a list of physical network labels that can be used for VLAN based networks. In our case, we set this to physnet to use our only physical network that is connected via the br-phys bridge.
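
As a sketch, the changed items in ml2_conf.ini could look like this – a VLAN ID range such as physnet:100:200 would only be needed if we also wanted VLAN based tenant networks:

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,local,vlan
[ml2_type_vlan]
network_vlan_ranges = physnet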

You can of course make these changes yourself (do not forget to restart the Neutron server) or you can use the Ansible scripts of lab 7. The demo script that is part of this lab will also create a virtual network based on VLAN ID 100 and attach two instances to it.

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab7
vagrant up
ansible-playbook -i hosts.ini site.yaml
ansible-playbook -i hosts.ini demo.yaml
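
For reference, a VLAN based provider network like the one created by the demo playbook could also be set up manually on the controller node with admin credentials, roughly as follows (the network name is made up):

source admin-openrc
openstack network create \
  --provider-network-type vlan \
  --provider-physical-network physnet \
  --provider-segment 100 \
  vlan-network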

Once the instances are up, log again into the compute node and, as above, turn off port security for all ports. We can now go through the exercise above again and see what has changed.

First, ifconfig -a shows that the basic setup is the same as before. We have still our integration bridge connected to the tap device and connected to the physical bridge. Again, the port to which the tap device is attached is an access port, tagged with the VLAN ID 1. This is the local VLAN corresponding to our virtual network.

When we analyze the OpenFlow rules in the two bridges, however, a difference to our flat network is visible. Let us start again with egress traffic.

In the integration bridge, the flow is the same as before. As the port to which the VM is attached is an access port, traffic originating from the VM is tagged with VLAN ID 1, processed by the various tables and eventually forwarded via the virtual patch cable to br-phys.

Here, however, the handling is different. The first rule for this bridge matches, and the VLAN ID 1 is rewritten to become VLAN ID 100. Then, normal processing takes over, and the packet leaves the bridge and travels via enp0s9 to the physical network. Thus, traffic tagged with VLAN ID 1 on the integration bridge shows up with VLAN ID 100 on the physical network. This is the mapping between the local VLAN ID (which represents a virtual network on the node) and the global VLAN ID (which represents a virtual VLAN network on the physical network infrastructure connecting the nodes).

IntegrationBridgeOutgoingRulesVLAN

For ingress traffic, the reverse mapping applies. A packet travels from the physical bridge to the integration bridge. Here, the second rule for table 0 matches for packets that are tagged with VLAN 100, the global VLAN ID of our virtual network, and rewrites the VLAN ID to 1, the local VLAN ID. This packet is then processed as before and eventually reaches the access port connecting the bridge with the tap device. There, the VLAN tagging is stripped off and the untagged traffic reaches the VM.

IntegrationBridgeIncomingRulesVLAN

The diagram below summarizes our findings. We see that on the same physical infrastructure, two virtual networks are realized. There is still the flat network corresponding to untagged traffic, and the newly created virtual network corresponding to VLAN ID 100.

VLANNetworkTopology

It is interesting to note how the OpenFlow rules change if we bring up an additional instance on this compute node which is attached to the flat network. Then, an additional local VLAN ID (2 in our case) will appear corresponding to the flat network. On the physical bridge, the VLAN tag will be stripped off for egress traffic with this local VLAN ID, so that it appears untagged on the physical network. Similarly, on the integration bridge, untagged ingress traffic will no longer be dropped but will receive VLAN ID 2.

Note that this setup implies that we can no longer easily reach the machines connected to a VLAN network via SSH from the lab host or the compute node itself. In fact, even if we set up a route via the vboxnet1 interface on the lab host, our traffic would arrive untagged and would not reach the virtual machine. This is the reason why our lab 7 comes with a fully installed Horizon GUI which allows you to use the noVNC console to log into our instances.

This is very similar to a physical setup where a machine is connected to a switch via an access port, but the connection to the external network is on a different VLAN or on the untagged, native VLAN. In this situation, one would typically use a router to connect the two networks. Of course, Neutron offers virtual routers to connect two virtual networks. In the next post, we will see how this works and re-establish SSH connectivity to our instances.

OpenStack Neutron installation – basic setup and our first instances

In this post, we will go through the installation of Neutron for flat networks and get to know the basic configuration options for the various Neutron components, thus completing our first fully working OpenStack installation. If you have not already read my previous post describing some of the key concepts behind Neutron, I advise you to do this to be able to better follow today's post.

Installing Neutron on the controller node

The setup of Neutron on the controller node follows a similar logic to the installation of Nova or other OpenStack components that we have already seen. We create database tables and database users, add users, services and endpoints to Keystone, install the required packages, update the configuration and sync the database.

NeutronInstallation

As the first few steps are rather standard, let us focus on the configuration. There are

  • the Neutron server configuration file /etc/neutron/neutron.conf
  • the configuration of the ML2 plugin /etc/neutron/plugins/ml2/ml2_conf.ini
  • the configuration of the DHCP agent /etc/neutron/dhcp_agent.ini
  • the configuration of the metadata agent /etc/neutron/metadata_agent.ini

Let us now discuss each of these configuration files in a bit more detail.

The Neutron configuration file on the controller node

The first group of options that we need to configure is familiar – the authorization strategy (keystone) and the related configuration for the Keystone authtoken middleware, the URL of the RabbitMQ server and the database connection as well as a directory for lock files.

The second change is the list of service plugins. Here we set this to an empty string, so that no additional service plugins beyond the core ML2 plugin will be loaded. At this point, we could configure additional services like Firewall-as-a-service which act as an extension.

Next we set the options notify_nova_on_port_status_changes and notify_nova_on_port_data_changes which instruct Neutron to inform Nova about status changes, for instance when the creation of a port fails.

To send these notifications, the external events REST API of Nova is used, and therefore Neutron needs credentials for the Nova service, which need to be present in a section [nova] of the configuration file.

The Neutron ML2 plugin configuration file

As you might almost guess from the presentation of Neutron's high-level architecture in the previous post, the ML2 plugin configuration file contains the information which type drivers and which mechanism drivers are loaded. The corresponding configuration items (type_drivers and mechanism_drivers) are both comma-separated lists. For our initial setup, we load the type drivers for a local network and a flat network, and we use the openvswitch mechanism driver.

In this file, we also configure the list of network types that are available as tenant networks. Here, we leave this list empty as we only offer a flat provider network.

In addition, we can configure so-called extension drivers. These are additional drivers that can be loaded by the ML2 plugin. In our case, the only driver that we load at this point in time is the port_security driver which allows us to disable the built-in port protection mechanisms and eases debugging.

In addition to the main [ml2] section, the configuration file also contains one section for each network type. As we only provide flat networks so far, we only have to populate the section for the flat network type. The parameter that we need is a list of flat networks. These networks are physical networks, but are not specified by the name of an actual physical interface, but by a logical network name which will later be mapped to an actual network device. Here, we choose the name physnet for our physical network.
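
A minimal sketch of the resulting ml2_conf.ini, based on the items discussed above (the exact contents of the Ansible templates might differ slightly):

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,local
# no tenant networks yet, we only offer a flat provider network
tenant_network_types =
mechanism_drivers = openvswitch
extension_drivers = port_security
[ml2_type_flat]
flat_networks = physnet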

The configuration of the metadata agent

We will talk in more detail about the metadata agent in a later post, and only briefly discuss it here. The metadata agent is an agent that allows instances to read metadata like the name of an SSH key using the de-facto standard provided by EC2's metadata service.

Nova provides a metadata API service that typically runs on a controller node. To allow access from within an instance, Neutron provides a proxy service which is protected by a secret shared between Nova and Neutron. In the configuration, we therefore need to provide the IP address (or DNS name) of the node on which the Nova metadata service is running and the shared secret.
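
A sketch of the corresponding entries in metadata_agent.ini – the host name and the secret are placeholders, and on older releases the first option is called nova_metadata_ip instead:

# /etc/neutron/metadata_agent.ini
[DEFAULT]
nova_metadata_host = controller
metadata_proxy_shared_secret = some-shared-secret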

The configuration of the DHCP agent

The configuration file for the DHCP agent requires only a few changes. First, we need to define the driver that the DHCP agent uses to connect itself to the virtual networks, which in our case is the openvswitch driver. Then, we need to specify the driver for the DHCP server itself. Here, we use the dnsmasq driver which actually spawns a dnsmasq process. We will learn more about DHCP agents in Neutron in a later post.

There is another feature that we enable in our configuration file. Our initial setup does not contain any router. Usually, routers are used to provide connectivity between an instance and the metadata service. To still allow the instances to access the metadata agent, the DHCP agent can be configured to add a static route to each instance pointing to the DHCP service. We enable this feature by setting the flag enable_isolated_metadata to true.
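
Put together, a sketch of the resulting dhcp_agent.ini could look like this:

# /etc/neutron/dhcp_agent.ini
[DEFAULT]
interface_driver = openvswitch
dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
# add a static route to the metadata service as no router is present yet
enable_isolated_metadata = true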

Installing Neutron on the compute nodes

We are now done with the setup of Neutron on the controller nodes and can turn our attention to the compute nodes. First, we need to install Open vSwitch and the neutron OVS agent on each compute node. Then, we need to create an OVS bridge on each compute node.

This is actually the point where I have been cheating in my previous post on Neutrons architecture. As mentioned there, Neutron does not connect the integration bridge directly to the network interface of the compute node. Instead, Neutron expects the presence of an additional OVS bridge that has to be provided by the administrator. We will simply let Neutron know what the name of this bridge is, and Neutron will attach it to the integration bridge and add a few OpenFlow rules. Everything else, i.e. how this bridge is connected to the actual physical network infrastructure, is up to the administrator. To create the bridge on the compute node, simply run (unless you are using my Ansible scripts, of course – see below)

ovs-vsctl add-br br-phys \
  -- set bridge br-phys fail-mode=secure
ovs-vsctl add-port br-phys enp0s9

Once this is done, we now need to adapt our configuration files as follows. In the item bridge_mappings, we need to provide the mapping from physical network names to bridges. In our example, we have used the name physnet for our physical network. We now map this name to the bridge just created by setting

bridge_mappings = physnet:br-phys

We then need to define the driver that the Neutron agent uses to provide firewall functionality. This should of course be compatible with the chosen L2 driver, so we set this to openvswitch. We also enable security groups.
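
On the compute nodes, the corresponding entries in the OVS agent configuration (typically /etc/neutron/plugins/ml2/openvswitch_agent.ini) could look roughly like this:

[ovs]
bridge_mappings = physnet:br-phys
[securitygroup]
firewall_driver = openvswitch
enable_security_group = true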

Once all configuration files have been updated, we should now restart the Neutron OVS agent on each compute node and all services on the controller nodes.

Installation using Ansible scripts

As always, I have put the exact steps to replicate this deployment into a set of Ansible scripts and roles. To run them, use (assuming, as always, that you did go through the basic setup described in an earlier post)

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab5
vagrant up
ansible-playbook -i hosts.ini site.yaml

Let us now finally test our installation. We will create a project and a user, a virtual network, bring up two virtual machines, SSH into them and test connectivity.

First, we SSH into the controller node and source the OpenStack credentials for the admin user. We then create a demo project, read the generated password for the demo user from demo-openrc, create the demo user with this password and assign the member role in the demo project to this user.

vagrant ssh controller
source admin-openrc
openstack project create demo
password=$(awk -F '=' '/OS_PASSWORD=/ { print $2}' demo-openrc)
openstack user create \
   --domain default\
   --project demo\
   --project-domain default\
   --password $password\
   demo
openstack role add \
   --user demo\
   --project demo\
   --project-domain default\
   --user-domain default member

Let us now create our network. This will be a flat network (the only type we have so far), based on the physical network physnet, and we will call this network demo-network.

openstack network create \
--share \
--external \
--provider-physical-network physnet \
--provider-network-type flat \
demo-network

Next, we create a subnet attached to this network with an IP range from 172.16.0.2 to 172.16.0.10.

openstack subnet create --network demo-network \
  --allocation-pool start=172.16.0.2,end=172.16.0.10 \
  --gateway 172.16.0.1 \
  --subnet-range 172.16.0.0/12 demo-subnet

Now we need to create a flavor. A flavor is a bit like a template for a virtual machine and defines the machine size.

openstack flavor create\
  --disk 1\
  --vcpus 1\
  --ram 512 m1.nano

To complete the preparations, we now have to adjust the firewall rules in the default security group to allow for ICMP and SSH traffic and to import the SSH key that we want to use. We execute these commands as the demo user.

source demo-openrc
openstack security group rule create --proto icmp default
openstack security group rule create --proto tcp --dst-port 22 default
openstack keypair create --public-key /home/vagrant/demo-key.pub demo-key

We are now ready to create our first instances. We will bring up two instances, called demo-instance-1 and demo-instance-2, both being attached to our demo network.

source demo-openrc
net_id=$(openstack network show -f value -c id demo-network)
openstack server create \
--flavor m1.nano \
--image cirros \
--nic net-id=$net_id \
--security-group default \
--key-name demo-key \
demo-instance-1
openstack server create \
--flavor m1.nano \
--image cirros \
--nic net-id=$net_id \
--security-group default \
--key-name demo-key \
demo-instance-2

To inspect the status of the instances, use openstack server list, which will also give you the IP address of the instances. To determine the IP address of the first demo instance and verify connectivity using ping, run

ip=$(openstack server show demo-instance-1 \
  -f value -c addresses \
  | awk -F "=" '{print $2}')
ping $ip

Finally, you should be able to SSH into your instances as follows.

ssh -i demo-key cirros@$ip

You can now ping the second instance and verify that the instance is in the ARP cache, demonstrating that there is direct layer 2 connectivity between the instances.
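
Inside the first instance, this check could look as follows, assuming for the sake of the example that demo-instance-2 received the address 172.16.0.3 (use openstack server list to find the actual address).

# run inside demo-instance-1
ping -c 3 172.16.0.3
# the second instance should now show up in the ARP cache
cat /proc/net/arp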

We have now successfully completed a full OpenStack installation. In the next post, we will analyse the network setup that Neutron uses on the compute nodes in more detail. Until then, enjoy your running OpenStack playground!

OpenStack Neutron – architecture and overview

In this post, which is part of our series on OpenStack, we will start to investigate OpenStack Neutron – the OpenStack component which provides virtual networking services.

Network types and some terms

Before getting into the actual Neutron architecture, let us try to understand how Neutron provides virtual networking capabilities to compute instances. First, it is important to understand that in contrast to some container networking technologies like Calico, Neutron provides actual layer 2 connectivity to compute instances. Networks in Neutron are layer 2 networks, and if two compute instances are assigned to the same virtual network, they are connected to an actual virtual Ethernet segment and can reach each other on the Ethernet level.

What technologies do we have available to realize this? In a first step, let us focus on connecting two different virtual machines running on the same host. So assume that we are given two virtual machines, call them VM1 and VM2, on the same physical compute node. Our hypervisor will attach a virtual interface (VIF) to each of these virtual machines. In a physical network, you would simply connect these two interfaces to ports of a switch to connect the instances. In our case, we can use a virtual switch / bridge to achieve this.

OpenStack is able to leverage several bridging technologies. First, OpenStack can of course use the Linux bridge driver to build and configure virtual switches. In addition, Neutron comes with a driver that uses Open vSwitch (OVS). Throughout this series, I will focus on the use of OVS as a virtual switch.

So, to connect the VMs running on the same host, Neutron could use (and it actually does) an OVS bridge to which the virtual machine networking interfaces are attached. This bridge is called the integration bridge. We will see later that, as in a typical physical network, this bridge is also connected to a DHCP agent, routers and so forth.

NeutronNetworkingStepI

But even for the simple case of VMs on the same host, we are not yet done. In reality, to operate a cloud at scale, you will need some approach to isolate networks. If, for instance, the two VMs belong to different tenants, you do not want them to be on the same network. To do this, Neutron uses VLANs. So the ports connecting the integration bridge to the individual VMs are tagged, and there is one VLAN for each Neutron network.

This networking type is called a local network in Neutron. It is possible to set up Neutron to only use this type of network, but in reality, this is of course not really useful. Instead, we need to move on and connect the VMs that are attached to the same network on different hosts. To do this, we will have to use some virtual networking technology to connect the integration bridges on the different hosts.

NeutronNetworkingStepII

At this point, the above diagram is – on purpose – a bit vague, as there are several technologies available to achieve this (and I am cheating a bit and ignoring the fact that the integration bridge is not actually connected to a physical network interface but to a second bridge which in turn is connected to the network interface). First, we could simply connect each integration bridge to a physical network device which in turn is connected to the physical network. With this setup, called a flat network in Neutron, all virtual machines are effectively connected to the same Ethernet segment. Consequently, there can only be one flat network per deployment.

The second option we have is to use VLANs to partition the physical network according to the virtual networks that we wish to establish. In this approach, Neutron would assign a global VLAN ID to each virtual network (which in general is different from the VLAN ID used on the integration bridge) and tag the traffic within each virtual network with the corresponding VLAN ID before handing it over to the physical network infrastructure.

Finally, we could use tunnels to connect the integration bridges across the hosts. Neutron supports the most commonly used tunneling protocols (VXLAN, GRE, Geneve).

Regardless of the network type used, Neutron networks can be external or internal. External networks are networks that allow for connectivity to networks outside of the OpenStack deployment, like the Internet, whereas internal networks are isolated. Technically, Neutron does not really know whether a network has connectivity to the outside world, therefore “external network” is essentially a flag attached to a network which becomes relevant when we discuss IP routers in a later post.

Finally, Neutron deployment guides often use the terms provider network and tenant networks. To understand what this means, suppose you wanted to establish a network using e.g. VLAN tagging for separation. When defining this network, you would have to assign a VLAN tag to this virtual network. Of course, you need to make sure that there are no collisions with other Neutron networks or other reserved VLAN IDs on the physical networks. To achieve this, there are two options.

First, an administrator who has a certain understanding of the underlying physical network structure could determine an available VLAN ID and assign it. This implies that an administrator needs to define the network, and thus, from the point of view of a tenant using the platform, the network is created by the platform provider. Therefore, these networks are called provider networks.

Alternatively, an administrator could, initially, when installing Neutron, define a pool of available VLAN IDs. Using this pool, Neutron would then be able to automatically assign a VLAN ID when a tenant uses, say, the Horizon GUI to create a network. With this mechanism in place, tenants can define their own networks without having to rely on an administrator. Therefore, these networks are called tenant networks.
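
In an ML2-based installation, such a pool is typically defined in the ML2 configuration. As a sketch (the physical network name physnet matches our example, the range 100:200 is just an illustration), this could look like:

[ml2_type_vlan]
network_vlan_ranges = physnet:100:200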

Neutron architecture

Armed with this basic understanding of how Neutron realizes virtual networks, let us now take a closer look at the architecture of Neutron.

NeutronArchitecture

The diagram above displays a very rough high-level overview of the components that make up Neutron. First, there is the Neutron server on the left hand side that provides the Neutron API endpoint. Then, there are the components that provide the actual functionality behind the API. The core functionality of Neutron is provided by a plugin called the core plugin. At the time of writing, there is one plugin – the ML2 plugin – which is provided by the Neutron team, but there are also other plugins available which are provided by third parties, like the Contrail plugin. Technically, a plugin is simply a Python class implementing the methods of the NeutronPluginBaseV2 class.

The Neutron API can be extended by API extensions. These extensions (which are again Python classes which are stored in a special directory and loaded upon startup) can be action extensions (which provide additional actions on existing resources), resource extensions (which provide new API resources) or request extensions that add new fields to existing requests.

Now let us take a closer look at the ML2 plugin. This plugin again utilizes pluggable modules called drivers. There are two types of drivers. First, there are type drivers which provide functionality for a specific network type, like a flat network, a VXLAN network, a VLAN network and so forth. Second, there are mechanism drivers that contain the logic specific to an implementation, like OVS or Linux bridging. Typically, the mechanism driver will in turn communicate with an L2 agent like the OVS agent running on the compute nodes.
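
The chosen type and mechanism drivers are again reflected in the ML2 configuration. A typical sketch of the [ml2] section (the exact values of course depend on the deployment) is:

[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = openvswitch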

On the right hand side of the diagram, we see several agents. Neutron comes with agents for additional functionality like DHCP, a metadata server or IP routing. In addition, there are agents running on the compute node to manipulate the network stack there, like the OVS agent or the Linux bridging agent, which correspond to the chosen mechanism drivers.

Basic Neutron objects

To close this post, let us take a closer look at some of the objects that Neutron manages. First, there are networks. As mentioned above, these are virtual layer 2 networks to which a virtual machine can attach. The point where the machine attaches is called a port. Each port belongs to a network and has a MAC address. A port contains a reference to the device to which it is attached. This can be a Nova managed instance, but also be another network device like a DHCP agent or a router. In addition, an IP address can be assigned to a port, either directly when the port is created (this is often called a fixed IP address) or dynamically.

In addition to layer 2 networks, Neutron has the concept of a subnet. A subnet is attached to a network and describes an IP network on top of this Ethernet network. Thus, a subnet has a CIDR and a gateway IP address.

Of course, this list is far from complete – there are routers, floating IP addresses, DNS servers and so forth. We will touch upon some of these objects in later posts in this series. In the next post, we will learn more about the components making up Neutron and how they are installed.

OpenStack Nova – deep-dive into the provisioning process

In the last post, we did go through the installation process and the high-level architecture of Nova, talking about the Nova API server, the Nova scheduler and the Nova agent. Today, we will make this a bit more tangible by observing how a typical request to provision an instance flows through this architecture.

The use case we are going to consider is the creation of a virtual server, triggered by a POST request to the /servers/ API endpoint. This is a long and complicated process, and we try to focus on the main path through the code without diving into every possible detail. This implies that we will skim over some points very briefly, but the understanding of the overall process should put us in a position to dig into other parts of the code if needed.
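
For reference, a minimal body for such a request could look like the JSON below; the image, flavor and network references are placeholders that would have to be replaced by actual IDs.

{
  "server": {
    "name": "demo-instance-1",
    "imageRef": "<image uuid>",
    "flavorRef": "<flavor id>",
    "networks": [
      {"uuid": "<network uuid>"}
    ],
    "key_name": "demo-key"
  }
}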

Roughly speaking, the processing of the request will start in the Nova API server which will perform validations and enrichments and populate the database. Then, the request is forwarded to the Nova conductor which will invoke the scheduler and eventually the Nova compute agent on the compute nodes. We will go through each of these phases in a bit more detail in the following sections.

Part I – the Nova API server

Being triggered by an API request, the process of course starts in the Nova API server. We have already seen in the previous post that the request is dispatched to a controller based on a set of hard-wired routes. For the endpoint in question, we find that the request is routed to the method create of the server controller.

This method first assembles some information like the user data which needs to be passed to the instance or the name of the SSH key to be placed in the instance. Then, authorization is carried out by calling the can method on the context (which, behind the scenes, will eventually invoke the Oslo policy rule engine that we have studied in our previous deep dive). Next, the request data for networks, block devices and the requested image is processed before we eventually call the create method of the compute API. Finally, we parse the result and use a view builder to assemble a response.

Let us now follow the call into the compute API. Here, all input parameters are validated and normalized, for instance by adding defaults. Then the method _provision_instances is invoked, which builds a request specification and the actual instance object and stores these objects in the database.

At this point, the Nova API server is almost done. We now call the method schedule_and_build_instances of the compute task API. From here, the call will simply be delegated to the corresponding method of the client side of the conductor RPC API which will send a corresponding RPC message to the conductor. At this point, we leave the Nova API server and enter the conductor. The flow through the code up to this point is summarized in the diagram below.

NovaProvisioningPartI

Part II – the conductor

In the last post, we have already seen that RPC calls are accepted by the Nova conductor service and are passed on to the Nova conductor manager. The corresponding method is schedule_and_build_instances.

This method first retrieves the UUIDs of the instances from the request. Then, for each instance, the private method self._schedule_instances is called. Here, the class SchedulerQueryClient is used to submit an RPC call to the scheduler, which is processed by the scheduler's select_destinations method.

We will not go into the details of the scheduling process here, but simply note that this will in turn make a call to the placement service to retrieve allocation candidates and then calls the scheduler driver to actually select a target host.

Back in the conductor, we check whether the scheduling was successful. If not, the instance is moved into cell0. If yes, we determine the cell in which the selected host is living, update some status information and eventually, at the end of the method, invoke the method build_and_run_instance of the RPC client for the Nova compute service. At this point, we leave the Nova conductor service and the processing continues in the Nova compute service running on the selected host.

InstanceCreationPartII

Part III – the processing on the compute node

We have now reached the Nova compute agent running on the selected compute node, more precisely the method build_and_run_instance of the Nova compute manager. Here we spawn a separate worker thread which runs the private method _do_build_and_run_instance.

This method updates the VM state to BUILDING and calls _build_and_run_instance. Within this method, we first invoke _build_resources which triggers the creation of resources like networks and storage devices, and then move on to the spawn method of the compute driver from nova.virt. Note that this is again a pluggable driver mechanism – in fact the compute driver class is an abstract class, and needs to be implemented by each compute driver.

Now let us see how the processing works in our specific case of the libvirt driver library. First, we create an image for the VM by calling the private method _create_image. Next, we create the XML descriptor for the guest, i.e. we retrieve the required configuration data and turn it into the XML structure that libvirt expects. We then call _create_domain_and_network and finally set a timer to periodically check the state of the instance until the boot process is complete.

In _create_domain_and_network, we plug in the virtual network interfaces, set up the firewall (in our installation, this is the point where we use the No-OP firewall driver as firewall functionality is taken over by Neutron) and then call _create_domain which creates the actual guest (called a domain in libvirt).

This delegates the call to nova.virt.libvirt.Guest.create() and then powers on the guest using the launch method on the newly created guest. Let us take a short look at each of these methods in turn.

In nova.virt.libvirt.Guest.create(), we use the write_instance_config method of the host class to create the libvirt guest without starting it.

In the launch method in nova/virt/libvirt/guest.py, we now call createWithFlags on the domain. This is actually a call into the libvirt library itself and will launch the previously defined guest.

InstanceCreationPartIII

At this point, our newly created instance will start to boot. The timer which we have created earlier will check at periodic intervals whether the boot process is complete and update the status of the instance in the database accordingly.

This completes our short tour through the instance creation process. There are a few points which we have deliberately skipped, for instance the details of the scheduling process, the image creation and image caching on the compute nodes or the network configuration, but the information in this post might be a good starting point for further deep dives.

OpenStack supporting services – Glance and Placement

Apart from Keystone, Glance and Placement are two additional infrastructure services that are part of every OpenStack installation. While Glance is responsible for storing and maintaining disk images, Placement (formerly part of Nova) is keeping track of resources and allocation in a cluster.

Glance installation

Before we get into the actual installation process, let us take a short look at the Glance runtime environment. Different from Keystone, but similar to most other OpenStack services, Glance is not running inside Apache, but is an independent process using a standalone WSGI server.

To understand the startup process, let us start with the setup.cfg file. This file contains an entry point glance-api which, via the usual mechanism provided by Python's setuptools, will provide a Python executable which runs glance/cmd/api.py. This in turn uses the simple WSGI server implemented in glance/common/wsgi.py. This server is then started in the line

server.start(config.load_paste_app('glance-api'), default_port=9292)

Here we see that the actual WSGI app is created and passed to the server using the PasteDeploy Python library. If you have read my previous post on WSGI and WSGI middleware, you will know that this is a library which uses configuration data to plumb together a WSGI application and middleware. The actual call of the PasteDeploy library is delegated to a helper library in glance/common and happens in the function load_paste_app defined here.

Armed with this understanding, let us now dive right into the installation process. We will spend a bit more time with this process, as it contains some recurring elements which are relevant for most of the OpenStack services that we will install and use. Here is a graphical overview of the various steps that we will go through.

GlanceInstallationProcess

The first thing we have to take care of is the database. Almost all OpenStack services require some sort of database access, thus we have to create one or more databases in our MariaDB database server. In the case of Glance, we create a database called glance. To allow Glance to access this database, we also need to set up a corresponding MariaDB user and grant the necessary access rights on our newly created database.
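
The corresponding SQL statements are essentially those from the official installation guide and look similar to the following; the password is of course a placeholder.

CREATE DATABASE glance;
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' IDENTIFIED BY '<glance db password>';
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' IDENTIFIED BY '<glance db password>';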

Next, Glance of course needs access to Keystone to authenticate users and authorize API requests. For that purpose, we create a new user glance in Keystone. Following the recommended installation process, we will in fact create one Keystone user for every OpenStack service, which is of course not strictly necessary.

With this, we have set up the necessary identities in Keystone. However, recall that Keystone is also used as a service catalog to decouple services from endpoints. An API user will typically not access Glance directly, but first get a list of service endpoints from Keystone, select an appropriate endpoint and then use this endpoint. To support this pattern, we need to register Glance with the Keystone service catalog. Thus, we create a Keystone service and API endpoints. Note that the port provided needs to match the actual port on which the Glance service is listening (using the default unless overridden explicitly in the configuration).

OpenStack services typically expose more than one endpoint – a public endpoint, an internal endpoint and an admin endpoint. As described here, there does not seem to be a fully consistent configuration scheme that allows an administrator to easily define which endpoint type the services will use. Following the installation guide, we will register all our services with all three endpoint types.
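
If you wanted to do this manually, the commands would look roughly like the sketch below, which follows the conventions of the official installation guide (in particular it assumes that a service project exists and uses RegionOne as region name; the password is a placeholder).

openstack user create --domain default --password <glance password> glance
openstack role add --project service --user glance admin
openstack service create --name glance --description "OpenStack Image" image
openstack endpoint create --region RegionOne image public http://controller:9292
openstack endpoint create --region RegionOne image internal http://controller:9292
openstack endpoint create --region RegionOne image admin http://controller:9292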

Next, we can install Glance by simply installing the corresponding APT package. Similar to Keystone, this package comes with a set of configuration files that we now have to adapt.

The first change which is again standard across all OpenStack components is to change the database connection string so that Glance is able to find our previously created database. Note that this string needs to contain the credentials for the Glance database user that we have created.
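
In glance-api.conf, this is the connection option in the [database] section, roughly like this (placeholder password again).

[database]
connection = mysql+pymysql://glance:<glance db password>@controller/glance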

Next, we need to configure the Glance WSGI middleware chain. As discussed above, Glance uses the PasteDeploy mechanism to create a WSGI application. When you take a look at the corresponding configuration, however, you will see that it contains a variety of different pipeline definitions. To select the pipeline that will actually be deployed, Glance has a configuration option called deployment flavor. This is a short form for the name of the pipeline to be selected, and when the actual pipeline is assembled here, the name of the pipeline is put together by combining the flavor with the string “glance-api”. We use the flavor “keystone” which will result in the pipeline “glance-api-keystone” being loaded.

This pipeline contains the Keystone auth token middleware which (as discussed in our deep dive into tokens and policies) extracts and validates the token data in a request. This middleware component needs access to the Keystone API, and therefore we need to add the required credentials to our configuration in the section [keystone_authtoken].
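
A sketch of the corresponding sections in glance-api.conf, again following the conventions of the official installation guide (service project, placeholder password), could look like this.

[paste_deploy]
flavor = keystone

[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = glance
password = <glance password>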

To complete the installation, we still have to create the actual database schema that Glance expects. Like most other OpenStack services, Glance is able to automatically create this schema and to synchronize an existing database with the current version of the existing schema by automatically running the necessary migration routines. This is done by the helper script glance-manage.
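
This typically boils down to a single command, run as the glance user (as in the official installation guide).

su -s /bin/sh -c "glance-manage db_sync" glance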

The actual installation process is now completed, and we can restart the Glance service so that the changes in our configuration files are picked up.

Note that the current version of the OpenStack install guide for Stein will instruct you to start two Glance services – glance-api and glance-registry. We only start the glance-api service, for the following reason.

Internally, Glance is structured into a database access layer and the actual Glance API server, plus of course a couple of other components like common services. Historically, the database access was handled by a separate service called the Glance registry. Essentially, the Glance registry is a service sitting between the Glance API service and the database, containing the actual database layer which uses SQLAlchemy. In this setup, the Glance API service is reachable via the REST API, whereas the Glance registry server is only reachable via RPC calls (using the RabbitMQ message queue). This adds an additional layer of security, as the database credentials need to be stored in the configuration of the Glance registry service only, and it makes it easier to scale Glance across several nodes. The Glance registry service has since been deprecated, however, and our configuration instructs Glance to access the database directly (this is the data_api parameter in the Glance configuration file).

As in my previous posts, I will not replicate the exact commands to do all this manually (you can find them in the well-written OpenStack installation guide), but have put together a set of Ansible scripts doing all this. To run them, enter the following commands

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab3
vagrant up
ansible-playbook -i hosts.ini site.yaml

This playbook will not only install and configure the Glance service, but will also download the CirrOS image (which I have mirrored in an S3 bucket as the original location is sometimes a bit slow) and import it into Glance.

Working with Glance images

Let us now play a bit with Glance. The following commands need to be run from the controller node, and we have to source the credentials that we need to connect to the API. So SSH into the controller node and source the credentials by running

vagrant ssh controller
source admin-openrc

First, let us use the CLI to display all existing images.

openstack image list

As we have only loaded one image so far – the CirrOS image – the output will be similar to the following sample output.

+--------------------------------------+--------+--------+
| ID                                   | Name   | Status |
+--------------------------------------+--------+--------+
| f019b225-de62-4782-9206-ed793fbb789f | cirros | active |
+--------------------------------------+--------+--------+

Let us now get some more information on this image. For better readability, we display the output in JSON format.

openstack image show cirros -f json

The output is a bit longer, and we will only discuss a few of the returned attributes. First, there is the file attribute. If you look at this and compare it to the contents of the directory /var/lib/glance/images/, you will see that this is a reference to the actual image stored on the hard disk. Glance delegates the actual storage to a storage backend. Storage backends are provided by the separate glance_store library and include a file store (which simply stores files on the disk as we have observed and is the default), an HTTP store which uses an HTTP GET to retrieve an image, an interface to the RADOS distributed object store and interfaces to Cinder, Swift and the VMware data store.
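
The file store and its location on disk are configured in the [glance_store] section of the Glance configuration; a sketch matching what we observe here would be:

[glance_store]
stores = file,http
default_store = file
filesystem_store_datadir = /var/lib/glance/images/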

We also see from the output that images can be active or inactive, belong to a project (the owner field refers to a project), can be tagged and can either be visible for the public (i.e. outside the project to which they belong) or private (i.e. only visible within the project). It is also possible to share images with individual projects by adding these projects as members.

Note that Glance stores image metadata, like visibility, hash values, owner and so forth in the database, while the actual image is stored in one of the storage backends.

Let us now go through the process of adding an additional image. The OpenStack Virtual Machine Image guide contains a few public sources for OpenStack images and explains how administrators can create their own images based on the most commonly used Linux distributions. As an example, here are the commands needed to download the latest Ubuntu Bionic cloud image and import it into Glance.

wget http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
openstack image create \
    --disk-format qcow2 \
    --file bionic-server-cloudimg-amd64.img \
    --public \
    --project admin \
    ubuntu-bionic

We will later see how we need to reference an image when creating a virtual machine.

Installing the placement service

Having discussed the installation process for the Glance service in a bit more detail, let us now quickly go over the steps to install Placement. Structurally, these steps are almost identical to those to install Glance, and we will not go into them in detail.

PlacementInstallation

Also the changes in the configuration file are very similar to those that we had to apply for Glance. Placement again uses the Keystone authtoken plugin, so that we have to supply credentials for Keystone in the keystone_authtoken section of the configuration file. We also have to supply a database connection string. Apart from that, we can take over the default values in the configuration file without any further changes.
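
A sketch of the relevant sections of the Placement configuration file, again using placeholders for the passwords and the install guide conventions for user and project names, would be:

[placement_database]
connection = mysql+pymysql://placement:<placement db password>@controller/placement

[api]
auth_strategy = keystone

[keystone_authtoken]
auth_url = http://controller:5000/v3
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = placement
password = <placement password>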

Placement overview

Let us now investigate the essential terms and objects that the Placement API uses. As the openstack client does not yet contain full support for placement, we will directly use the API using curl. With each request, we need to include two header parameters.

  • X-Auth-Token needs to contain a valid token that we need to retrieve from Keystone first
  • OpenStack-API-Version needs to be included to define the version of the API (this is the so-called microversion).

Here is an example. We will SSH into the controller, source the credentials, get a token from Keystone and submit a GET request on the URL /resource_classes that we feed into jq for better readability.

vagrant ssh controller
source admin-openrc
sudo apt-get install jq
token=$(openstack token issue -f json | jq -r ".id") 
curl \
  -H "X-Auth-Token: $token" \
  -H "OpenStack-API-Version: placement 1.31"\
  "http://controller:8778/resource_classes" | jq

The resulting list is a list of all resource classes known to Placement. Resource classes are types of resources that Placement manages, like IP addresses, vCPUs, disk space or memory. In a fully installed system, OpenStack services can register as resource providers. Each provider offers a certain set of resource classes, which is called an inventory. A compute node, for instance, would typically provide CPUs, disk space and memory. In our current installation, we cannot yet test this, but in a system with compute nodes, the inventory for a compute node would typically look as follows.

{
  "resource_provider_generation": 3,
  "inventories": {
    "VCPU": {
      "total": 2,
      "reserved": 0,
      "min_unit": 1,
      "max_unit": 2,
      "step_size": 1,
      "allocation_ratio": 16
    },
    "MEMORY_MB": {
      "total": 3944,
      "reserved": 512,
      "min_unit": 1,
      "max_unit": 3944,
      "step_size": 1,
      "allocation_ratio": 1.5
    },
    "DISK_GB": {
      "total": 9,
      "reserved": 0,
      "min_unit": 1,
      "max_unit": 9,
      "step_size": 1,
      "allocation_ratio": 1
    }
  }
}

Here we see that the compute node has two virtual CPUs, roughly 4 GB of memory and 9 GB of disk space available. For each resource provider, Placement also maintains usage data which keeps track of the current usage of the resources. Here is a JSON representation of the usage for a compute node.

{
  "resource_provider_generation": 3,
  "usages": {
    "VCPU": 1,
    "MEMORY_MB": 128,
    "DISK_GB": 1
  }
}

So in this example, one vCPU, 128 MB RAM and 1 GB disk of the compute node are in use. To link consumers and usage information, Placement uses allocations. An allocation represents a usage of resources by a specific consumer, like in the following example.

{
  "allocations": {
    "aaa957ac-c12c-4010-8faf-55520200ed55": {
      "resources": {
        "DISK_GB": 1,
        "MEMORY_MB": 128,
        "VCPU": 1
      },
      "consumer_generation": 1
    }
  },
  "resource_provider_generation": 3
}

In this case, the consumer (represented by the UUID aaa957ac-c12c-4010-8faf-55520200ed55) is actually a compute instance which consumes 1 vCPU, 128 MB memory and 1 GB disk space. Here is a diagram that represents a simplified version of the Placement data model.

PlacementDataModel

Placement offers a few additional features like Traits, which define qualitative properties of resource providers, or aggregates which are groups of resource providers, or the ability to make reservations.
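
On a system that already has compute nodes registered, all of these objects can be explored with the same curl pattern that we have used above. As an illustration (the consumer UUID is simply the one from the allocation example and purely illustrative):

curl -H "X-Auth-Token: $token" \
  -H "OpenStack-API-Version: placement 1.31" \
  "http://controller:8778/resource_providers" | jq
curl -H "X-Auth-Token: $token" \
  -H "OpenStack-API-Version: placement 1.31" \
  "http://controller:8778/allocations/aaa957ac-c12c-4010-8faf-55520200ed55" | jq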

Let us close this post by briefly discussing the relation between Nova and Placement. As we have mentioned above, compute nodes represent resource providers in Placement, so Nova needs to register resource provider records for the compute nodes it manages and provide the inventory information. When a new instance is created, the Nova scheduler will request information on inventories and current usage from Placement to determine the compute node on which the instance will be placed, and will subsequently update the allocations to register the new instance as a consumer for the resources it uses.

With this, the installation of Glance and Placement is complete and we have all the ingredients in place to start installing the Nova services in the next post.

OpenStack Keystone – a deep-dive into tokens and policies

In the previous post, we have installed Keystone and provided an overview of its functionality. Today, we will dive in detail into a typical authorization handshake and take you through the Keystone source code to see how it works under the hood.

The overall workflow

Let us first take a look at the overall process before we start to dig into details. As an example, we will use the openstack CLI to list all existing projects. To better see what is going on behind the scenes, we run the openstack client with the -vv command line switch, which creates a bit more output than usual.

So, log into the controller node and run

source admin-openrc
openstack -vv project list

This will give a rather lengthy output, so let us focus on those lines that signal that a request to the API is made. The first API call is a GET request to the URL

http://controller:5000/v3

This request will return a list of available API versions, marked with a status. In our case, the result indicates that the stable version is version v3. Next, the client submits a POST request to the URL

http://controller:5000/v3/auth/tokens

If we look up this API endpoint in the Keystone Identity API reference, we find that this method is used to create and return a token. When making this request, the client will use the data provided in the environment variables set by our admin-openrc script to authenticate with Keystone, and Keystone will assemble and return a token.

The returned data actually has two parts. First, there is the actual Fernet token, which is provided in the HTTP header instead of the HTTP body. Second, there is a token structure which is returned in the response body. This structure contains the user that owns the token, the date when the token expires and the date when the token has been issued, the project for which the token is valid (for a project scoped token) and the roles that the user has for this project. In addition, it contains a service catalog. Here is an example, where I have collapsed the catalog part for better readability.

token

Finally, at the bottom of the output, we see that the actual API call to get a list of projects is made, using our newly acquired token and the endpoint

http://controller:5000/v3/projects

So our overall flow looks like this, ignoring some client internal processes like selecting the endpoint (and recovering from failed authorizations, see the last section of this post).

AuthorizationWorkflowGetProjects

Let us now go through these requests step by step and see how tokens and policies interact.

Creating a token

When we submit the API request to create a token, we end up in the method post in the AuthTokenResource class defined in keystone/api/auth.py. Here we find the code.

token=authentication.authenticate_for_token(auth_data)
resp_data=render_token.render_token_response_from_model(
          token, include_catalog=include_catalog
)

The method authenticate_for_token is defined in keystone/api/_shared/authentication.py. Here, we first authenticate the user, using the auth data provided in the request, in our case this is username, password, domain and project as defined in admin-openrc. Then, the actual token generation is triggered by the call

token=PROVIDERS.token_provider_api.issue_token(
          auth_context['user_id'], 
          method_names, 
          expires_at=expires_at,
          system=system, 
          project_id=project_id, 
          domain_id=domain_id,
          auth_context=auth_context, 
          trust_id=trust_id,
          app_cred_id=app_cred_id, 
         parent_audit_id=token_audit_id)

Here we see an additional layer of indirection in action – the ProviderAPIRegistry as defined in keystone/common/provider_api.py. Without getting into details, here is the idea of this approach which is used in a similar way in other OpenStack services.

Keystone itself consists of several components, each of which provide different methods (aka internal APIs). There is, for instance, the code in keystone/identity handling the core identity features, the code in keystone/assignment handling role assigments, the code in keystone/token handling tokens and so forth. Each of these components contains a class typically called Manager which is derived from the base class Manager in keystone/common/manager.py.

When such a class is instantiated, it registers its methods with the static instance ProviderAPI of the class ProviderAPIRegistry defined in keystone/common/provider_api.py. Technically, registering means that the object is added as an attribute to the ProviderAPI object. For the token API, for instance, the Manager class in keystone/token/provider.py registers itself using the name token_provider_api, so that it is added to the provider registry object as the attribute token_provider_api. Thus a method XXX of this manager class can now be invoked using

from keystone.common import provider_api
provider_api.ProviderAPIs.token_provider_api.XXX()

or by

from keystone.common import provider_api
PROVIDERS = provider_api.ProviderAPIs
PROVIDERS.token_provider_api.XXX()

This is exactly what happens here, and this is why the above line will actually take us to the method issue_token of the Manager class defined in keystone/token/provider.py. Here, we build an instance of the Token class defined in keystone/models/token_model.py and populate it with the available data. We then fill the field token.id with the actual token, i.e. the encoded string that will end up in the HTTP header of future requests. This is done in the line

token_id, issued_at =
             self.driver.generate_id_and_issued_at(token)

which calls the actual token provider, for instance the Fernet provider. For a Fernet token, this will eventually end up in the line

token_id=self.token_formatter.create_token(
    token.user_id,
    token.expires_at,
    token.audit_ids,
    token_payload_class,
    methods=token.methods,
    system=token.system,
    domain_id=token.domain_id,
    project_id=token.project_id,
    trust_id=token.trust_id,
    federated_group_ids=token.federated_groups,
    identity_provider_id=token.identity_provider_id,
    protocol_id=token.protocol_id,
    access_token_id=token.access_token_id,
    app_cred_id=token.application_credential_id
)

calling the token formatter which will do the low level work of actually creating and encrypting the token. The token ID will then be added to the token data structure, along with the creation time (a process known as minting) before the token is returned up the call chain.

At this point, the token does not yet contain any role information and no service catalog. To enrich the token by this information, it is rendered by calling render_token defined in keystone/common/render_token.py. Here, a dictionary is built and populated with data including information on role, scope and endpoints.

Note that the role information in the token is dynamic – in fact, in the Token class, the property decorator is used to divert access to the roles property to a method call. Here, we receive the scope information and select and return only those roles which are bound to the respective domain or project if the token is domain scoped or project scoped. When we render the token, we access the roles attribute and retrieve the role information from the method bound to it.

Within this method, an additional piece of logic is implemented which is relevant for the later authorization process. Keystone allows an administrator to define a so-called admin project. Any user who authenticates with a token scoped to this special project is called a cloud admin, a special role which can be referenced in policies. When rendering the token, the project to which the token refers (if it is project scoped) is compared to this special project, and if they match, an additional attribute is_admin_project is added to the token dictionary.

Finally, back in the post method, we build the response body from the token structure and add the actual token to the response header in the line

response.headers['X-Subject-Token'] = token.id

Here is a graphical overview on the process as we have discussed it so far.

IssueToken

The key learnings from the code that we can deduce so far are

  • The actual Fernet token contains a minimum of information, like the user for whom the token is issued and – depending on the scope – the ID of the project or domain to which the token is scoped
  • When a token is requested, the actual Fernet token (the token ID) is returned in the response header, and an enriched version of the token is added in the response body
  • This enrichment is done dynamically using the Keystone database, and the enrichment will only add the roles to the token data that are relevant for the token scope
  • There is a special admin project, and a token scoped to this project implies the cloud administrator role

Using the token to authorize a request

Let us now see what happens when a client uses this token to actually make a request to the API – in our example, this happens when the openstack client makes the actual API call to the endpoint http://controller:5000/v3/projects.

Before this request is actually dispatched to the business logic, it passes through the WSGI middleware. Here, more precisely in the class method AuthContextMiddleware.process_request defined in the file keystone/server/flask/request_processing/middleware/auth_context.py, the token is retrieved from the field X-Auth-Token in the HTTP header of the request (here we also put the marker field is_admin into the context when an admin_token is defined in the configuration and equal to the actual token). Then the process_request method of the superclass is called which invokes fetch_token (of the derived class!). Here, the validate_token method of the token provider is called which performs the actual token validation. Finally, the token is again rendered as above, thereby adding the relevant roles dynamically, and put as token_reference in the request context (this happens in the methods fill_context and _keystone_specific_values of the middleware class).

At this point, it is instructive to take a closer look at the method that actually selects the relevant roles – the method roles of the token class defined in keystone/models/token_model.py. If you follow the call chain, you will find that, to obtain for instance all project roles, the internal API of the assignment component is used. This API returns the effective roles of the user, i.e. roles that include those roles that the user has due to group membership and roles that are inherited, for instance from the domain-level to the project level or down a tree of subprojects. Effective roles also include implied roles. It is important to understand (and reasonable) that it is the effective roles that enter a token and are therefore evaluated during the authorization process.

Once the entire chain of middleware has been processed, we finally reach the method _list_projects in keystone/api/projects.py. Close to the start of this method, the enforce_call method of the class RBACEnforcer in keystone/common/rbac_enforcer/enforcer.py is invoked. When making this call, the action identity:list_projects is passed as a parameter. In addition, a parameter called target is passed, a dictionary which contains some information on the objects to which the API request refers. In our example, as long as we do not specify any filters, this dictionary will be empty. If, however, we specify a domain ID as a filter, it will contain the ID of this domain. As we will see later, this allows us to define policies that allow a user to see projects in a specific domain, but not globally.

The enforce_call method will first make a couple of validations before it checks whether the request context contains the attribute is_admin. If yes, the token validation is skipped and the request is always allowed – this is to support the ADMIN_TOKEN bootstrapping mechanism. Then, close to the bottom of the method, we retrieve the request context, instantiate a new object and call its _enforce method which essentially delegates the call to the Oslo policy rules engine and its Enforcer class, more precisely to the enforce method of this class.

As input, this method receives the action (identity:list_projects in our case), the target of the action, and the credentials, in the form of the Oslo request context, and the processing of the rules starts.

InvokePolicyEngine

Again, let us quickly summarize what the key takeaways from this discussion should be – these points actually apply to most other OpenStack services as well.

  • When a request is received, the WSGI middleware is responsible for validating the token, retrieving the additional information like role data and placing it in the request context
  • Again, only those roles are stored in the context which the user has for the scope of the token (i.e. on project level for project-scoped token, on the domain level for domain-scoped token and on the system level for system-scoped token)
  • The roles in the token are effective roles, i.e. taking inheritance into account
  • The actual check against the policy is done by the Oslo policy rule engine

The Oslo policy rule engine

Before getting into the details of the rule engine, let us quickly summarize what data the rule engine has at its disposal. First, we have seen that it receives the action, which is simply a string, identity:list_projects in our case. Then, it has information on the target, which, generally speaking, is the object on which the action should be performed (this is less relevant in our example, but becomes important when we modify data). Finally, it has the credentials, including the token and role information which was part of the token and is now stored in the request context which the rule engine receives.

The engine will now run this data through all rules which are defined in the policy. Within the engine, a rule (or check) is simply an object with a __call__ method, so that it can be treated and invoked like a function. In the module _checks.py, a few basic checks are defined. There are, for instance, simple checks that always return true or false, and there are checks like AndCheck and OrCheck which can be used to build more complex rules from basic building blocks. And there are other checks like the RoleCheck which checks whether a certain role is present in the credentials, which, as we know from the discussion above, is the case if the token used to authorize the request contains this role, i.e. if the user owning the token has this role with respect to the scope of the token.

Where do the rules come from that are processed? First, note that the parameter rule to the enforce method does, in our case at least, contain a string, namely the action (identity:list_projects). To load the actual rules, the method enforce will first call load_rules which loads rules from a policy file, at which we will take a look in a second. Loading the policy file will create a new instance of the Rules class, which is a container class to hold a set of rules.

After loading all rules, the following line in enforce identifies the actual rule to be processed.

to_check = self.rules[rule]

This looks a bit confusing, but recall that here, rule actually contains the action identity:list_projects, so we look up the rule associated with this action. Finally, the actual rule checking is done by invoking the _check methods of the _checks module.

Let us now take a closer look at the policy files themselves. These files are typically located in the /etc/XXX subdirectory, where XXX is the OpenStack component in question. Sample files are maintained by the OpenStack team. To see an example, let us take a look at the sample policy file for Keystone which was distributed with the Rocky release. Here, we find a line

"identity:list_projects": "rule:cloud_admin or rule:admin_and_matching_domain_id",

This file is in JSON syntax, and this line defines a dictionary entry with the action identity:list_projects and the rule rule:cloud_admin or rule:admin_and_matching_domain_id. The full syntax of the rule is explained nicely here or in the comments at the start of policy.py. In essence, in our example, the rule says that the action is allowed if either the user is a cloud administrator (i.e. an administrator of the special admin project or admin domain which can be configured in the Keystone configuration file) or is an admin for the requested domain.

When I first looked at the policy files in my test installation, however, which uses the Stein release, I was more than confused. Here, the rule for the action identity:list_projects is as follows.

"identity:list_projects": "rule:identity:list_projects"

Here we define a rule called identity:list_projects for the action with the same name, but where is this rule defined?

The answer is that there is a second source of rules, namely software defined rules (which the OpenStack documentation calls policy-in-code) which are registered when the enforcer object is created. This happens in the _enforcer method of the RBACEnforcer when a new enforcer is created. Here we call register_rules which creates a list of rules by calling the function list_rules defined in the keystone/common/policies module which returns a list of software-defined rules, and registers these rules with the Oslo policy enforcer. The rule we are looking for, for instance, is defined in keystone/common/policies/project.py and looks as follows.

policy.DocumentedRuleDefault(
        name=base.IDENTITY % 'list_projects',
        check_str=SYSTEM_READER_OR_DOMAIN_READER,
        scope_types=['system', 'domain'],
        description='List projects.',
        operations=[{'path': '/v3/projects',
                     'method': 'GET'}],
        deprecated_rule=deprecated_list_projects,
        deprecated_reason=DEPRECATED_REASON,
        deprecated_since=versionutils.deprecated.STEIN),

Here we see that the actual rule (in the attribute check_str) has now changed compared to the Rocky release, and allows access if either the user has the reader role on the system level or has the reader role for the requested domain. In addition, there is a deprecated rule for backwards compatibility which is OR’ed with the actual rule. So the rule that really gets evaluated in our case is

(role:reader and system_scope:all) or (role:reader and domain_id:%(target.domain_id)s) or rule:admin_required

In our case, asking OpenStack to list all projects, there is a further piece of magic involved. This becomes visible if you try a different user. For instance, we can create a new project demo with a user demo who has the reader role for this project. If you now run the OpenStack client again to get all projects, you will only see those projects for which the user has a role. This is again a bit confusing, because by what we have discussed above, the authorization should fail.

In fact, it does, but the client is smart enough to have a plan B. If you look at the output of the OpenStack CLI with the -vvv flag, you will see that a first request is made to list all projects, which fails as expected. The client then tries a second request, this time using the URL /users//projects to get all projects for that specific user. This call ends up in the method get of the class UserProjectsResource defined in keystone/api/users.py which will list all projects for which a specific user has a role. Here, a call is made with a different action called identity:list_user_projects, and the rule for this action allows access if the user making the request (i.e. the user data from the token) is equal to the target user (i.e. the user ID specified in the request). Thus this final call succeeds.

These examples are hopefully sufficient to demonstrate that policies can be a tricky topic. It is actually very instructive to add debugging output to the involved classes (the Python source code is on the controller node in /usr/lib/python3/dist-packages, do not forget to restart Apache if you have made changes to the code) to print out the various structures and trace the flow through the code. Happy hacking!

OpenStack Keystone – installation and overview

Today we will dive into OpenStack Keystone, the part of OpenStack that provides services like management of users, roles and projects, authentication and a service catalog to the other OpenStack components. We will first install Keystone and then take a closer look at each of these areas.

Installing Keystone

As in the previous lab, I have put together a couple of scripts that automatically install Keystone in a virtual environment. To run them, issue the following commands (assuming, of course, that you did go through the basic setup steps from the previous post to set up the environment)

pwd
# you should be in the directory into which 
# you did clone the repository
cd openstack-labs/Lab2
vagrant up
ansible-playbook -i hosts.ini site.yaml

While the scripts are running, let us discuss the installation steps. First, we need to prepare the database. Keystone uses its own database schema called (well, you might guess …) keystone that needs to be added to the MariaDB instance. We will also have to create a new database user keystone with the necessary privileges on the keystone database.

Then, we install the Keystone packages via APT. This will put default configuration files into /etc/keystone which we need to adapt. Actually, there is only one change that we need to make at this point – we need to change the connection key to contain a proper connection string to reach our MariaDB with the database credentials just established.
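
The resulting entry in keystone.conf looks roughly as follows; the password is, of course, a placeholder.

[database]
connection = mysql+pymysql://keystone:<keystone db password>@controller/keystone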

Next, the keystone database needs to be created. To do this, we use the keystone-manage db_sync command that actually performs an upgrade of the Keystone DB schema to the latest version. We then again utilize the keystone-manage tool to create symmetric keys for the Fernet token mechanism and to encrypt credentials in the SQL backend.
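
The corresponding commands, essentially as in the official installation guide, are:

su -s /bin/sh -c "keystone-manage db_sync" keystone
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
keystone-manage credential_setup --keystone-user keystone --keystone-group keystone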

Now we need to add a minimum set of domains, project and users to Keystone. Here, however, we face a chicken-and-egg problem. To be able to add a user, we need the authorization to do this, so we need a user, but there is no user yet.

There are two solutions to this problem. First, it is possible to define an admin token in the Keystone configuration file. When this token is used for a request, the entire authorization mechanism is bypassed, which we could use to create our initial admin user. This method, however, is a bit dangerous. The admin token is contained in the configuration file in clear text and never expires, so anyone who has access to the file can perform every action in Keystone and then OpenStack.

The second approach is to again use the keystone-manage tool, which has a bootstrap command that will access the database directly (more precisely, via the Keystone code base) and will create a default domain, a default project, an admin user and three roles (admin, member, reader). The admin user is set up to have the admin role for the admin project and on system level. In addition, the bootstrap process will create a region and catalog entries for the identity service (we will discuss these terms later on).
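A typical bootstrap invocation looks like the following sketch – the password is a placeholder, and the URLs match the identity endpoints that we will see later in the service catalog.

keystone-manage bootstrap \
  --bootstrap-password <admin-password> \
  --bootstrap-admin-url http://controller:5000/v3/ \
  --bootstrap-internal-url http://controller:5000/v3/ \
  --bootstrap-public-url http://controller:5000/v3/ \
  --bootstrap-region-id RegionOne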

Users, projects, roles and domains

Of course, users are the central object in Keystone. A user can either represent an actual human user or a service account which is used to define access rights for the OpenStack services with respect to other services.

In a typical cloud environment, just having a global set of users, however, is not enough. Instead, you will typically have several organizations or tenants that use the same cloud platform, but require a certain degree of separation. In OpenStack, tenants are modeled as projects (even though the term tenant is sometimes used as well to refer to the same thing). Projects and users, in turn, are both grouped into domains.

To actually define which user has which access rights in the system, Keystone allows you to define roles and assign roles to users. In fact, when you assign a role to a user, you always do this for a project or a domain. You would, for instance, assign the role reader to user bob for the project test or for a domain. So a role assignment always refers to a user, a role, and either a project or a domain.

DomainsUserRolesProjects
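To make this concrete, here is a hypothetical sequence of commands (the user bob and the project test do not exist in our installation, this is just to illustrate the syntax) that creates a project and a user and then assigns the reader role, once on project level and once on domain level.

openstack project create --domain default test
openstack user create --domain default --password-prompt bob
openstack role add --user bob --project test reader
openstack role add --user bob --domain default reader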

Note that it is possible to assign a role to a user in a domain for a project living in a different domain (though you should have a good reason to do this).

In fact, the full picture is even a bit more complicated than this. First, roles can imply other roles. In the default installation, the admin role implies the member role, and the member role implies the reader role. Second, the above diagram suggests that a role is not part of a domain. This is true in most cases, but it is in fact possible to create domain-specific roles. These roles do not appear in a token and are therefore not directly relevant to authorization, but are intended to be used as prior roles to map domain-specific role logic onto the overall role logic of an installation.

It is also not entirely true that roles always refer to either a domain or a project. In fact, Keystone allows for so-called system roles which are supposed to be used to restrict access to operations that are system wide, for instance the configuration of API endpoints.

Finally, there are also groups. Groups are just collections of users, and instead of assigning a role to a user, you can assign a role to a group which then effectively is valid for all users in that group.
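Again as a purely hypothetical illustration (the group and its members do not exist in our setup), assigning a role to a group could look like this.

openstack group create developers
openstack group add user developers bob
openstack role add --group developers --project test member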

And, yes, there are also subprojects, but let us stop here – you can see that the Keystone data structures are complicated and have been growing significantly over time.

To better understand the terms discussed so far, let us take a look at our sample installation. First, establish an SSH connection to one of the nodes, say the controller node.

vagrant ssh controller

On this node, we will use the OpenStack Python client to explore users, projects and domains. To run it, we will need credentials. When you work with the OpenStack CLI, there are several methods to supply credentials. The option we will use is to provide credentials in environment variables. To be able to quickly set up these variables, the installation script creates a bash script admin-openrc that sets these credentials. So let us source this script and then submit an OpenStack API request to list all existing users.

source admin-openrc
openstack user list

At this point, this should only give you one user – the admin user created during the installation process. To display more details for this user, you can use openstack user show admin, and you should obtain an output similar to the one below.

+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 67a4f789b4b0496cade832a492f7048f |
| name                | admin                            |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+

We see that the user admin is part of the default domain, which is the standard domain used in OpenStack as long as no other domain is specified.

Let us now see which role assignments this user has. To do this, let us list all assignments for the user admin, using JSON output for better readability.

openstack role assignment list --user admin -f json

This will yield the following output.

[
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "5b634876aa9a422c83591632a281ad59",
    "Domain": "",
    "System": "",
    "Inherited": false
  },
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "",
    "Domain": "",
    "System": "all",
    "Inherited": false
  }
]

Here we see that there are two role assignments for this user. As the output only contains the UUIDs of the role and the project, we will have to list all projects and all roles to be able to interpret the output.

openstack project list 
openstack role list

So we see that for both assignments, the role is the admin role. For the first assignment, the project is the admin project, and for the second assignment, there is no project (and no domain), but the system field is filled. Thus the first assignment assigns the admin role for the admin project to our user, whereas the second one assigns the admin role on system level.

So far, we have not specified anywhere what these roles actually imply. To understand how roles lead to authorizations, there are still two missing pieces. First, OpenStack has a concept of implied roles. These are roles that a user automatically has because they are implied by explicitly assigned roles. To see implied roles in action, run

openstack implied role list 

The resulting table will list the prior roles on the left and the implied roles on the right. So we see that having the admin role implies also having the member role, and having the member role in turn implies also having the reader role.

The second concept that we have not touched upon are policies. Policies actually define what a user having a specific role is allowed to do. Whenever you submit an API request, this request targets a certain action. Actions more or less correspond to API endpoints, so an action could be “list all projects” or “create a user”. A policy defines a rule for this action which is evaluated to determine whether that request is allowed. A simple rule could be “user needs to have the admin role”, but the rule engine is rather powerful and we can define much more elaborate rules – more on this in the next post.

The important point to understand here is that policies are not defined by the admin via APIs, but are predefined either in the code or in specific policy files that are part of the configuration of each OpenStack service. Policies refer to roles by name, and it does not make sense to define and use a role that is not referenced by any policy (even though you can technically do this). Thus you will rarely need to create roles beyond the standard roles admin, member and reader unless you also change the policy files.
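To give a first impression of what such a rule looks like, here are two entries in the style of an oslo.policy policy file (YAML). These are simplified, illustrative examples, not the actual default policies shipped with Keystone.

# simplified, illustrative policy entries (not the shipped defaults)
"identity:list_projects": "role:admin"
"identity:get_user": "role:reader or user_id:%(target.user.id)s"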

Service catalogs

Apart from managing users (the Identity part of Keystone) as well as projects, roles and domains (the Resources part of Keystone), Keystone also acts as a service registry. OpenStack services register themselves and their API endpoints with Keystone, and OpenStack clients can use this information to obtain the URL of service endpoints.

Let us take a look at the services that are currently registered with Keystone. This can be done by running the following commands on the controller.

source admin-openrc
openstack service list -f json

At this point in the installation, before installing any other OpenStack services, there is only one service – Keystone itself. The corresponding output is

[
  {
    "ID": "3acb257f823c4ecea6cf0a9e94ce67b9",
    "Name": "keystone",
    "Type": "identity"
  }
]

We see that a service has a name which identifies the actual service, in addition to a type which defines the type of service delivered. Given the type of a service, we can now use Keystone to retrieve a list of service API endpoints. In our example, enter

openstack endpoint list --service identity -f json

which should yield the following output.

[
  {
    "ID": "062975c2758f4112b5d6568fe068aa6f",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "public",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "207708ecb77e40e5abf9de28e4932913",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "admin",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "781c147d02604f109eef1f55248f335c",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "internal",
    "URL": "http://controller:5000/v3/"
  }
]

Here, we see that every service typically offers different types of endpoints. There are public endpoints, which are supposed to be reachable from an external network, internal endpoints for users in the internal network and admin endpoints for administrative access. This, however, is not enforced by Keystone but by the network layout you have chosen. In our simple test installation, all three endpoints for a service will be identical.

When we install more OpenStack services later, you will see that as part of this installation, we will always register a new service and corresponding endpoints with Keystone.
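As a preview, registering a service and its endpoints is done with commands like the following – here using the Glance image service, which we will actually install in a later post, as an example.

openstack service create --name glance --description "OpenStack Image" image
openstack endpoint create --region RegionOne image public http://controller:9292
openstack endpoint create --region RegionOne image internal http://controller:9292
openstack endpoint create --region RegionOne image admin http://controller:9292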

Token authorization

So far, we have not yet discussed how an OpenStack service actually authenticates a user. There are several ways to do this. First, you can authenticate using a password. When using the OpenStack CLI, for instance, you can put username and password into environment variables which will then be used to make API requests (for ease of use, the Ansible playbooks that we use to bring up our environment create a file admin-openrc which you can source to set these variables and which we have already used in the examples above).
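Such a file typically looks similar to the sketch below – the file actually created by the playbooks might differ in details, and the password is of course a placeholder.

export OS_USERNAME=admin
export OS_PASSWORD=<admin-password>
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3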

In most cases, however, subsequent requests will be authorized using a token. A token is essentially a short string which is issued once by Keystone and then put into the X-Auth-Token field in the HTTP header of subsequent requests. If a token is present, Keystone will validate this token and, if it is valid, will be able to derive all the information it needs to authenticate the user and authorize a request.
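You can easily try this out yourself. The following two commands are a minimal sketch, assuming that the admin credentials are still sourced: the first asks Keystone to issue a token, the second uses it in the X-Auth-Token header of a raw API request that lists the projects available to the user.

TOKEN=$(openstack token issue -f value -c id)
curl -s -H "X-Auth-Token: $TOKEN" http://controller:5000/v3/auth/projects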

Keystone is able to use different token formats. The default token format with recent releases of Keystone is the Fernet token format.

It is important to understand that tokens are scoped objects. The scope of a token determines which roles are taken into account for the authorization process. If a token is project scoped, only those roles of a user that target a project are considered. If a token is domain scoped, only the roles that are defined on domain level are considered. And finally, a system scope token implies that only roles at system level are relevant for the authorization process.

Earlier versions of Keystone supported a token type called PKI token that contained a large amount of information directly, including role information and service endpoints. The advantage of this approach was that once a token had been issued, it could be processed without any further access to Keystone. The disadvantage, however, was that the tokens generated in this way tended to be huge, and soon reached a point where they could no longer be put into an HTTP header. The Fernet token format handles things differently. A Fernet token is an encrypted token which contains only a limited amount of information. To use it, a service will, in most cases, need to run additional calls against Keystone to retrieve additional data like roles and services. For a project scoped token, for instance, the following diagram displays the information that is stored in a token on the left hand side.

FernetToken

First, there is a version number which encodes the information on the scope of the token. Then, there is the ID of the user for whom the token is issued, the methods that the user has used to authenticate, the ID of the project for which the token is valid, and the expiration date. Finally, there is an audit ID which is simply a randomly generated string that can be put into logfiles (in contrast to the token itself, which should be kept secret) and can be used to trace the usage of this token. All these fields are serialized and encrypted using a symmetric key stored by Keystone, typically on disk. A domain scoped token contains a domain ID instead of the project ID and so forth.

Equipped with this understanding, we can now take a look at the overall authorization process. Suppose a client wants to request an action via the API, say from Nova. The client would first use password-based authentication to obtain a token from Keystone. Keystone returns the token along with an enriched version containing roles and endpoints as well. The client would use the endpoint information to determine the URL of the Nova service. Using the token, it would then try to submit an API request to the Nova API.

The Nova service will take the token, validate it and ask Keystone again to enrich the token, i.e. to add the missing information on roles and endpoints (in fact, this happens in the Keystone middleware). It is then able to use the role information and its policies to determine whether the user is authorized for the request.

OpenStackAuthorization

Using Keystone with LDAP and other authentication mechanisms

So far, we have stored user identities and groups inside the MariaDB database, i.e. locally. In most production setups, however, you will want to connect to an existing identity store, which is typically exposed via the LDAP protocol. Fortunately, Keystone can be integrated with LDAP. This integration is read-only, and Keystone will use LDAP for authentication, but still store projects, domains and role information in the MariaDB database.

When using this feature, you will have to add various data to the Keystone configuration file. First, of course, you need to add basic connectivity information like credentials, host and port so that Keystone can connect to an LDAP server. In addition, you need to define how the fields of a user entity in LDAP map onto the corresponding fields in Keystone. Optionally, TLS can be configured to secure the connection to an LDAP server.
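Just to give an impression, a purely illustrative LDAP configuration in keystone.conf could contain entries like the following – all values are placeholders, and the exact set of options needed depends on the layout of your directory.

[identity]
driver = ldap

[ldap]
url = ldap://ldap.example.com
user = cn=keystone,ou=services,dc=example,dc=com
password = <ldap-bind-password>
suffix = dc=example,dc=com
user_tree_dn = ou=users,dc=example,dc=com
user_objectclass = inetOrgPerson
user_id_attribute = cn
user_name_attribute = sn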

In addition to LDAP, Keystone also supports a variety of alternative methods for authentication. First, Keystone supports federation, i.e. the ability to share authentication data between different identity providers. Typically, Keystone will act as a service provider, i.e. when a user tries to connect to Keystone, the user is redirected to an identity provider, authenticates with this provider and Keystone receives and accepts the user data from this provider. Keystone supports both the OpenID Connect and the SAML standard to exchange authentication data with an identity provider.

As an alternative mechanism, Keystone can delegate the process of authentication to the Apache webserver in which Keystone is running – this is called external authentication in Keystone. In this case, Apache will handle the authentication, using whatever mechanisms the administrator has configured in Apache, and pass the resulting user identity as part of the request context down to Keystone. Keystone will then look up this user in its backend and use it for further processing.

Finally, Keystone offers a mechanism called application credentials to allow applications to use the API on behalf of a Keystone user without having to reveal the user's password to the application. In this scenario, a user creates an application credential, passing in a secret and (optionally) a subset of roles and endpoints to which the credential grants access. Keystone will then create a credential and pass its ID back to the user. The user can then store the credential ID and the secret in the application's configuration. When the application wants to access an OpenStack service, it uses a POST request on the /auth/tokens endpoint to request a token, and Keystone will generate a token that the application can use to connect to OpenStack services.
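Using the CLI, creating such a credential could look like this (a sketch – the name and the secret are arbitrary placeholders), optionally restricted to a subset of the user's roles. The application would then authenticate with the application_credential method, passing the credential ID and the secret.

openstack application credential create --secret <some-secret> --role reader my-app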

This completes our post today. Before moving on to install additional services like Placement, Glance and Nova, we will – in the next post – go on a guided tour through a part of the Keystone source code to see how tokens and policies work under the hood.

Setting up our OpenStack playground

In this post, we will describe the setup of our Lab environment and install the basic infrastructure services that OpenStack uses.

Environment setup

In a real world setup, OpenStack runs on a collection of physical servers on which the virtual machines provided by the cloud run. Now most of us will probably not have a rack in the basement, so using four or five physical servers for our labs is not a realistic option. Instead, we will use virtual machines for that purpose.

To avoid confusion, let us first fix some terms. First, there is the actual physical machine on which the labs will run, most likely a desktop PC or a laptop, and most likely the PC you are using to read this post. Let us call this machine the lab host.

On this host, we will run Virtualbox to create virtual machines. These virtual machines will be called the nodes, and they will play the role that in a real world setup, the physical servers would play. We will be using one controller node on which most of the OpenStack components will run, and two compute nodes.

Inside the compute nodes, the Nova compute service will then provision virtual machines which we call VMs. So effectively, we use nested virtualization – the VM is itself running inside a virtual machine (the node).

To run the labs, your host will need to have a certain minimum amount of RAM. When I tested the setup, I found that the controller node and the compute nodes in total consume at least 7-8 GB of RAM, which will increase depending on the number of VMs you run. To still be able to work on the machine, you will therefore need at least 16 GB of RAM. If you have more – even better. If you have less, you might also want to use a cloud based setup. In this case, the host could itself be a virtual machine in the cloud, or you could use a bare-metal provider like Packet to get access to a physical host with the necessary memory.

Not every cloud will work, though, as it needs to support nested virtualization. I have tested the setup on DigitalOcean and found that it works, but other cloud providers might yield different results.

Networking

Let us now take a look at the network configuration that we will use for our hosts. If you run OpenStack, there will be different categories of traffic between the nodes. First, there is management traffic, i.e. communication between the different components of the platform, like messages exchanged via RabbitMQ or API calls. For security and availability reasons, this traffic is typically handled via a dedicated management network. The management network is configured by the administrator and used by the OpenStack components.

Then, there is traffic between the VMs, or, more precisely, between the guests running inside the VMs. The network which is supporting this traffic is called the guest network. Note that we are not yet talking about a virtual network here, but about the network connecting the various nodes which eventually will be used for this traffic.

Sometimes, additional network types need to be considered. There could, for instance, be a dedicated API network to allow end users and administrators access to the API without depending on any of the other networks, or a dedicated external network that connects the network node to a physical router to provide internet access for guests. For this setup, however, we will only use two networks – a management network and a guest network. Note that the guest network needs to be provided by an administrator, but is controlled by OpenStack (which, for instance, will add the interfaces that make up the network to virtual bridges so that they can no longer be used for other traffic).

In our case, both networks, the management network and the guest network, will be set up as Virtualbox host-only networks, connecting our nodes. Here is a diagram that summarizes the network topology we will use.

OpenStackEnvironment

Setting up your host and first steps

Let us now roll up our sleeves and dive right into the first lab. Today, we will bring up our environment, and, on each node, install the required infrastructure like MySQL, RabbitMQ and so forth.

First, however, we need to prepare our host. Obviously, we need some tools installed – Vagrant, VirtualBox and Ansible. We will also use pwgen to create credentials. How exactly these tools need to be installed depends on your Linux distribution; on Ubuntu, you would run

sudo apt-get install python3-pip
pip3 install 'ansible==v2.8.6' 
sudo apt-get install pwgen
sudo apt-get install virtualbox
sudo apt-get install vagrant

The Ansible version is important. I found that there is at least one issue which breaks network creation in OpenStack with some 2.9.x versions of Ansible.

When we set up our labs, we will sometimes have to throw away our environment and rebuild it. This will be fully automated, but it implies that we need to download packages into the nodes over and over again. To speed up this process, we install a local APT cache. I use APT-Cacher-NG for that purpose. Installing it is very easy, simply run

sudo apt-get install apt-cacher-ng

This will install a proxy, listening on port 3142, which will create local copies of packages that you install. Later, we will instruct the apt processes running in our virtual machines to use this cache.
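One common way to do this is to drop a small configuration file into /etc/apt/apt.conf.d inside each node, similar to the following sketch. The IP address is an assumption – it needs to be an address under which the nodes can reach your lab host.

# e.g. /etc/apt/apt.conf.d/02proxy inside a node (IP address is an assumption)
Acquire::http::Proxy "http://192.168.56.1:3142";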

Now we are ready to start. First, you will have to clone my repository to get a copy of the scripts that we will use.

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab1

Next, we will bring up our virtual machines. There is, however, a little twist when it comes to networking. As mentioned above, we will use VirtualBox host networking. As you might know if you have read my previous post on this topic, VirtualBox will create two virtual devices to support this, one for each network. These devices will be called vboxnet0 and vboxnet1. However, if these devices already exist, VirtualBox will use them and take over parts of the existing network configuration. This can lead to problems later: if, for instance, VirtualBox runs a DHCP server on one of these devices, this will conflict with the OpenStack DHCP agent, and your VMs will get incorrect IP addresses and will not be reachable. To avoid this, we will delete any existing interfaces (which of course requires that you stop all other virtual machines) and recreate them before we bring up our machines. The repository contains a shell script to do this. To run it and start the machines, enter

../scripts/createVBoxNetworks.sh
vagrant up
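If you are curious what such a script needs to do, here is a minimal sketch (not the actual script from the repository) using VBoxManage directly.

# remove existing host-only interfaces (make sure no other VMs are using them)
VBoxManage hostonlyif remove vboxnet0
VBoxManage hostonlyif remove vboxnet1
# recreate them - VirtualBox will name them vboxnet0 and vboxnet1 again
VBoxManage hostonlyif create
VBoxManage hostonlyif create
# make sure that no VirtualBox DHCP server is attached to the first network
# (this command may fail harmlessly if no such DHCP server exists)
VBoxManage dhcpserver remove --ifname vboxnet0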

We are now ready to run our playbook. Before doing this, let us first discuss what the scripts will actually do.

First, we need a set of credentials. These credentials consist of a set of randomly generated passwords that we use to set up the various users that the installation needs (database users, RabbitMQ users, Keystone users and so forth) and an SSH key pair that we will use later to access our virtual machines. These credentials will be created automatically and stored in ~/.os_credentials.

Next, we need a basic setup within each of the nodes – we will need the Python OpenStack modules, we will need to bring up all network interfaces, and we will update the /etc/hosts configuration files in each of the nodes to be able to resolve all other nodes.

We will also change the configuration of the APT package manager. We will point APT to the APT cache running on the host and we will add the Ubuntu OpenStack Cloud Archive repository to the repository list from which we will pull the OpenStack packages.

Next, we need to make sure that the time on all nodes is synchronized. To achieve this, we install a network of NTP daemons. We use Chrony and set up the controller as Chrony server and the compute nodes as clients. We then install MySQL, Memcached and RabbitMQ on the controller node and create the required users.

All this is done by the playbook site.yaml, and you can simply run it by typing

ansible-playbook -i hosts.ini site.yaml

Once the script completes, we can run a few checks to see that everything worked. First, log into the controller node using vagrant ssh controller and verify that Chrony is running and that we have network connectivity to the other nodes.

sudo systemctl | grep "chrony"
ping compute1
ping compute2

Then, verify that you can log into MySQL locally and that the root user has a non-empty password (we can still log in locally as root without a password when using sudo) by running sudo mysql and then, on the SQL prompt, typing

select * from mysql.user;

Finally, let us verify that RabbitMQ is running and has a new user openstack.

sudo rabbitmqctl list_users
sudo rabbitmqctl node_health_check
sudo rabbitmqctl status

A final note on versions. This post and most upcoming posts in this series have been created with a lab PC running Python 3.6 and Ansible 2.8.9. After upgrading my lab PC to Ubuntu 20.04 today, I continued to use Ansible 2.8.9 because I had experienced problems with newer versions earlier on, but upgraded to Python 3.8. After doing this, I hit upon this bug that requires this fix which I reconciled manually into my local Ansible version.

We are now ready to install our first OpenStack services. In the next post, we will install Keystone and learn more about domains, users, projects and services in OpenStack.