Managing KVM virtual machines part III – using libvirt with Ansible

In my previous post, we have seen how the libvirt toolset can be used to directly create virtual volumes, virtual networks and KVM virtual machines. If this is not the first time you visit my post, you will know that I am a big fan of automation, so let us investigate today how we can use Ansible to bring up KVM domains automatically.

Using the Ansible libvirt modules

Fortunately, Ansible comes with a couple of libvirt modules that we can use for this purpose. These modules mainly follow the same approach – we initially create objects from an XML template and then can perform additional operations on them, referencing them by name.

To use these modules, you will have to install a couple of additional libraries on your workstation. First, obviously, you need Ansible, and I suggest that you take a look at my short introduction to Ansible if you are new to it and do not have it installed yet. I have used version 2.8.6 for this post.

Next, Ansible of course uses Python, and there are a couple of Python modules that we need. In the first post on libvirt, I have already shown you how to install the python3-libvirt package. In addition, you will have to run

pip3 install lxml

to install the XML module that we will need.

Let us start with a volume pool. In order to somehow isolate our volumes from other volumes on the same hypervisor, it is useful to put them into a separate volume pool. The virt_pool module is able to do this for us, given that we provide an XML definition of the pool which we can of course create from a Jinja2 template, using the name of the volume pool and its location on the hard disk as parameter.

At this point, we have to define where we store the state of our configuration, consisting of the storage pool, but also potentially SSH keys and the XML templates that we use. I have decided to put all this into a directory state in the same directory where the playbook is kept. This avoids polluting system- or user-wide directories, but also implies that when we eventually clean up this directory, we first have to remove the storage pool again in libvirt as we otherwise have a stale reference from the libvirt XML structures in /etc/libvirt into a non-existing directory.

Once we have a storage pool, we can create a volume. Unfortunately, Ansible does not (at least not at the time of writing this) have a module to handle libvirt volumes, so a bit of scripting is needed. Essentially, we will have to

  • Download a base image if not done yet – we can use the get_url module for that purpose
  • Check whether our volume already exists (this is needed to make sure that our playbook is idempotent)
  • If no, create a new overlay volume using our base image as backing store

We could of course upload the base image into the libvirt pool, or, alternatively, keep the base image outside of libvirts control and refer to it via its location on the local file system.

At this point, we have a volume and can start to create a network. Here, we can again use an Ansible module which has idempotency already built into it – virt_net. As before, we need to create an XML description for the network from a Jinja2 template and hand that over to the module to create our network. Once created, we can then refer to it by name and start it.

Finally, its time to create a virtual machine. Here, the virt module comes in handy. Still, we need an XML description of our domain, and there are several ways to create that. We could of course built an XML file from scratch based on the specification on libvirt.org, or simply use the tools explored in the previous post to create a machine manually and then dump the XML file and use it as a template. I strongly recommend to do both – dump the XML file of a virtual machine and then go through it line by line, with the libvirt documentation in a separate window, to see what it does.

When defining the virtual machine via the virt module, I was running into error messages that did not show up when using the same XML definition with virsh directly, so I decided to also use virsh for that purpose.

In case you want to try out the code I had at this point, follow the instructions above to install the required modules and then run (there is still an issue with this code, which we will fix in a minute)

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples
cd libvirt
git checkout origin/v0.1
ansible-playbook site.yaml

When you now run virsh list, you should see the newly created domain test-instance, and using virt-viewer or virt-manager, you should be able to see its screen and that it boots up (after waiting a few seconds because there is no device labeled as root device, which is not a problem – the boot process will still continue).

So now you can log into your machine…but wait a minute, the image that we use (the Ubuntu cloud image) has no root password set. No problem, we will use SSH. So figure out the IP address of your domain using virsh domifaddr test-instance…hmmm…no IP address. Something is still wrong with our machine, its running but there is no way to log into it. And even if we had an IP address, we could not SSH into it because there is SSH key.

Apparently we are missing something, and in fact, things like getting a DHCP lease and creating an SSH key would be the job of cloud-init. So there is a problem with the cloud-init configuration of our machine, which we will now investigate and fix.

Using cloud-init

First, we have to diagnose our problem in a bit more detail. To do this, it would of course be beneficial if we could log into our machine. Fortunately, there is a great tool called virt-customize which is part of the libguestfs project and allows you to manipulate a virtual disk image. So let us shut down our machine, add a root password and restart the machine.

virsh destroy test-instance
sudo virt-customize \
  -a state/pool/test-volume \
  --root-password password:secret
virsh start test-instance

After some time, you should now be able to log into the instance as root using the password “secret”, either from the VNC console or via virsh console test-instance.

Inside the guest, the first thing we can try is to run dhclient to get an IP address. Once this completes, you should see that ens3 has an IP address assigned, so our network and the DHCP server works.

Next let us try to understand why no DHCP address was assigned during the boot process. Usually, this is done by cloud-init, which, in turn, is started by the systemd mechanism. Specifically, cloud-init uses a generator, i.e. a script (/lib/systemd/system-generators/cloud-init-generator) which then dynamically creates unit files and targets. This script (or rather a Jinja2 template used to create it when cloud-init is installed) can be found here.

Looking at this script and the logging output in our test instance (located at /run/cloud-init), we can now figure out what has happened. First, the script tries to determine whether cloud-init should be enabled. For that purpose, it uses the following sources of information.

  • The kernel command line as stored in /proc/cmdline, where it will search for a parameter cloud-init which is not set in our case
  • Marker files in /etc/cloud, which are also not there in our case
  • A default as fallback, which is set to “enabled”

So far so good, cloud-init has figured out that it should be enabled. Next, however, it checks for a data source, i.e. a source of meta-data and user-data. This is done by calling another script called ds-identify. It is instructive to re-run this script in our virtual machine with an increased log level and take a look at the output.

DEBUG=2 /usr/lib/cloud-init/ds-identify --force
cat /run/cloud-init/ds-identify.log

Here we see that the script tries several sources, each modeled as a function. An example is the check for EC2 metadata, which uses a combination of data like DMI serial number or hypervisor IDs with “seed files” which can be used to enforce the use a data source to check whether we are running on EC2.

For setups like our setup which is not part of a cloud environment, there is a special data source called the no cloud data source. To force cloud-init to use this data source, there are several options.

  • Add the switch “ds=nocloud” to the kernel command line
  • Fake a DMI product serial number containing “ds=nocloud”
  • Add a special marker directory (“seed directory”) to the image
  • Attach a disk that contains a file system with label “cidata” or “CIDATA” to the machine

So the first step to use cloud-config would to actually enable it. We will choose the approach to add a correctly labeled disk image to our machine. To create such a disk, there is a nice tool called cloud_localds, but we will do this manually to better understand how it works.

First, we need to define the cloud-config data that we want to feed to cloud-init by putting it onto the disk. In general, cloud-init expects two pieces of data: meta data, providing information on the instance, and user data, which contains the actual cloud-init configuration.

For the metadata, we use a file containing only a bare minimum.

$ cat state/meta-data
instance-id: iid-test-instance
local-hostname: test-instance

For the metadata, we use the example from the cloud-init documentation as a starting point, which will simply set up a password for the default user.

$ cat state/user-data
#cloud-config
password: password
chpasswd: { expire: False }
ssh_pwauth: True

Now we use the genisoimage utility to create an ISO image from this data. Here is the command that we need to run.

genisoimage \
  -o state/cloud-config.iso \
  -joliet -rock \
  -volid cidata \
  state/user-data \
  state/meta-data 

Now open the virt-manager, shut down the machine, navigate to the machine details, and use “Add hardware” to add our new ISO image as a CDROM device. When you now restart the machine, you should be able to log in as user “ubuntu” with the password “password”.

When playing with this, there is an interesting pitfall. The cloud-init tool actually caches data on disk, and will only process a changed configuration if the instance-ID changes. So if cloud-init runs once, and you then make a change, you need to delete root volume as well to make sure that this cache is not used, otherwise you will run into interesting issues during testing.

Similar mechanisms need to be kept in mind for the network configuration. When running for the first time, cloud-init will create a network configuration in /etc/netplan. This configuration makes sure that the network is correctly reconfigured when we reboot the machine, but contains the MAC-address of the virtual network interface. So if you destroy and undefine the domain while the disk untouched, and create a new domain (with a new MAC-address) using the same disk, then the network configuration will fail because the MAC-address no longer matches.

Let us now automate the entire process using Ansible and add some additional configuration data. First, having a password is of course not the most secure approach. Alternatively, we can use the ssh_authorized_keys key in the cloud-config file which accepts a public SSH key (as a string) which will be added as an authorized key to the default user.

Next the way we have managed our ISO image is not yet ideal. In fact, when we attach the image to a virtual machine, libvirt will assume ownership of the file and later change its owner and group to “root”, so that an ordinary user cannot easily delete it. To fix this, we can create the ISO image as a volume under the control of libvirt (i.e. in our volume pool) so that we can remove or update it using virsh.

To see all this in action, clean up by removing the virtual machine and the generated ISO file in the state directory (which should work as long as your user is in the kvm group), get back to the directory into which you cloned my repository and run

cd libvirt
git checkout master
ansible-playbook site.yaml
# Wait for machine to get an IP
ip=""
while [ "$ip" == "" ]; do
  sleep 5
  echo "Waiting for IP..."
  ip=$(virsh domifaddr \
    test-instance \
    | grep "ipv4" \
    | awk '{print $4}' \
    | sed "s/\/24//")
done
echo "Got IP $ip, waiting another 5 seconds"
sleep 5
ssh -i ./state/ssh-key ubuntu@$ip

This will run the playbook which configures and brings up the machine, wait until the machine has acquired an IP address and then SSH into it using the ubuntu user with the generated SSH key.

The playbook will also create two scripts and place them in the state directory. The first script – ssh.sh – will exactly do the same as our snippet above, i.e. figure out the IP address and SSH into the machine. The second script – cleanup.sh – removes all created objects, except the downloaded Ubuntu cloud image so that you can start over with a fresh install.

This completes our post on using Ansible with libvirt. Of course, this does not replace a more sophisticated tool like Vagrant, but for simple setups, it works nicely and avoids the additional dependency on Vagrant. I hope you had some fun – keep on automating!

Managing KVM virtual machines part II – the libvirt toolkit

In the previous post, we have seen how Vagrant can be used to define, create and destroy KVM virtual machines. Today, we will dig a bit deeper into the objects managed by the libvirt library and learn how to create virtual machines using the libvirt toolkit directly

Creating a volume

When creating a virtual machine, you need to supply a volume which will be attached to the machine, either as a bootable root partition or as an additional device. In the libvirt object model, volumes are objects with a lifecycle independent of a virtual machine. Let us take a closer look at how volumes are defined and managed by libvirt.

At the end of the day, a volume which is attached to a virtual machine is linked to some physical storage – usually a file, i.e. a disk image – on the host on which KVM is running. These physical file locations are called target in the libvirt terminology. To organize the storage available for volume targets, libvirt uses the concept of a storage pool. Essentially, a storage pool is some physical disk space which is reserved for libvirt and used to create and store volumes.

LibvirtStorage

Libvirt is able to manage different types of storage pools. The most straightforward type of storage pool is a directory pool. In this case, the storage available for the pool is simply a directory, and each volume in the pool is a disk image stored in this directory. Other, more advanced pool types include pools that utilize storage provided by an NFS server or an iSCSI server, LVM volume groups, entire physical disks or IP storage like Ceph and Gluster.

When libvirt is initially installed, a default storage pool is automatically created. To list all available storage pools and get some information on the default pool, use the commands

virsh pool-list
virsh pool-info default
virsh pool-dumpxml default

Here we see that the default pool is of type “directory” and its target (i.e. location on the host file system) is /var/lib/libvirt/images

Let us now create an image in this pool. There are several ways to do this. In our case, we will first download a disk image and then upload this image into the pool, which will essentially create a copy of the image inside the pool directory and thus under libvirts control. For our tests, we will use the Cirros image, as it has a password enabled by default and is very small. To obtain a copy, run the commands

wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
mv cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.qcow2

It happened to me several times that the download was corrupted, so it is a good idea to check the integrity of the image using the MD5 checksums provided here. For our image, the MD5 checksum (which you can verify using md5sum cirros-0.4.0-x86_64-disk.qcow2) should be 443b7623e27ecf03dc9e01ee93f67afe.

Now let us import this image into the default pool. First, we use the qemu-img tool to figure out the size of the image, and then we use virsh vol-create-as to create a volume in the default pool which is large enough to hold our image.

qemu-img info cirros-0.4.0-x86_64-disk.qcow2 
virsh vol-create-as \
  default \
  cirros-image.qcow2 \
  128M \
  --format qcow2

When this command completes, we can verify that a new disk image has been created in /var/lib/libvirt/images.

ls -l /var/lib/libvirt/images
virsh vol-list --pool=default
sudo qemu-img info /var/lib/libvirt/images/cirros-image.qcow2
virsh vol-dumpxml cirros-image.qcow2 --pool=default

This image is now logically still empty, but we can now perform the actual upload which will copy the contents of our downloaded image into the libvirt volume

virsh vol-upload \
  cirros-image.qcow2 \
  cirros-0.4.0-x86_64-disk.qcow2 \
  --pool default 

Create a network

The next thing that we need to spin up a useful virtual machine is a network. To create a network, we use a slightly different approach. In libvirt, every object is defined and represented by an XML structure (placed in a subdirectory of /etc/libvirt). We have already seen some of these XML structures in this and the previous post. If you want full control over each attribute of a libvirt managed object, you can also create them directly from a corresponding XML structure. Let us see how this works for a network. First, we create an XML file with a network definition – use this link for a full description of the structure of the XML file.

<network>
<name>test-network</name>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr-test' stp='on' delay='0'/>
<ip address='192.168.200.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.200.2' end='192.168.200.254'/>
</dhcp>
</ip>
</network>

view raw
test-network.xml
hosted with ❤ by GitHub

Here we define a new virtual network called test-network. This network has NAT’ing enabled, which implies that libvirt will create iptables rules to masquerade outgoing traffic so that any VM that we attach to this network later will be able to reach the public network. We also instruct libvirt to bring up a virtual Linux bridge virbr-test to implement this network on the host. Finally, we specify a CIDR for our network and ask libvirt to start a DHCP server listening on this network that will hand out leases for a specific range of IP address.

Store this XML structure in a file /tmp/test-network.xml and then use it to create a network as follows.

virsh net-define /tmp/test-network.xml
virsh net-start test-network

You can now inspect the created bridge, iptables rule and DNS processes by running

sudo iptables -S -t nat
brctl show virbr-test
ip addr show dev virbr-test
ps ax | grep "dnsmasq"
sudo cat /var/lib/libvirt/dnsmasq/test-network.conf

Looking at all this, we find that libvirt will start a dnsmasq process which is listening on the virbr-test bridge and managing the IP range that we specify. When we start a virtual machine later on, this machine will also be attached to the bridge using a TUN device, so that we have the following picture.

LibvirtNetwork

Note that the IP range assigned to the network should not overlap with the IP range of any other libvirt virtual network (or any other virtual network on your host created by e.g. Docker or VirtualBox)

Bring up a machine

We are now ready to start a machine which is attached to our previously defined network and volume (and actually booting from this volume). To create a virtual machine – called a domain in libvirt – we again have several options. We could use the graphical virt-manager or, similar to a network, could prepare an XML file with a domain definition and use virsh create to create a domain from that. A slightly more convenient method is to use the virt-install tool which is part of the virt-manager project. Here is the command that we need to create a new domain called test-instance using our previously created image and network.

virt-install \
  --name test-instance \
  --memory 512 \
  --vcpus 1 \
  --import \
  --disk vol=default/cirros-image.qcow2,format=qcow2,bus=virtio \
  --network network=test-network \
  --graphics vnc,keymap=local --noautoconsole 

Let us quickly go through some of the parameters that we use. First, we give the instance a name, define the amount of RAM that we allocate and the number of vCPUs that the machine will have. With the import flag, we instruct virt-install to boot from the first provided disk (alternatively, virt-install has an option to boot from an image defined using the –location directive, which can point to a disk image or a remote location).

In the next line, we specify the first (and only) disk that we want to attach. Note that we refer to the logical name of the volume, in the form pool/volume name. We also tell libvirt which format our image has and that it should use the virtio driver to set up the virtual storage controller in our machine.

In the next line, we attach our machine to the test network. The CirrOS image that we use contains a startup script which will use DHCP to get a lease, so it will get a lease from the DHCP server that libvirt has attached to this network. Finally, in the last line, we ask libvirt to start a VNC server which will reflect the virtual graphics device, mouse and keyboard of the machine, using the same keymap as on the local machine, and to not start a VNC console automatically.

To verify the startup process, you have several options. First, you can use the virt-viewer tool which will display a list of all running machines and allow you to connect via VNC. Alternatively, you can use virt-manager as we have done it in the last post, or use

virt console test-instance

to connect to a text console and log in from there (the user is cirros, the password is gocubsgo). Once the machine is up, you can also SSH into it:

ip=$(virsh domifaddr test-instance \
  | grep "ipv4"  \
  | awk '{print $4}'\
  | sed 's/\/24//')
ssh cirros@$ip

When playing with this, you will find that it takes a long time for the machine to boot. The reason is that the image we use is meant to be used as a lean test image in a cloud platform and therefore tries to query metadata from a metadata server which, in our case, is not present. There are ways to handle this, we get back to this in a later post.

Using backing stores

In the setup we have used so far, every machine has a disk image serving as its virtual hard disk, and all these disk images are maintained independently. Obviously, if you are running a larger number of guests on a physical host, this tends to consume a lot of disk space. To optimize this setup, libvirt allows us to use overlay images. An overlay image is an image which is backed by another images and uses a copy-on-write approach so that we only have to store the data which is actually changed compared to the underlying image.

To try this out, let us first delete our machine again.

virsh destroy test-instance
virsh undefine test-instance

Now we create a new volume which is an overlay volume backed by our CirrOS image.

virsh vol-create-as default test-image.qcow2 20G \
  --format qcow2 \
  --backing-vol /var/lib/libvirt/images/cirros-image.qcow2 \
  --backing-vol-format qcow2 

Here we create a new image test-image.qcow2 (second parameter) of size 20 GB (third parameter in the default pool (first parameter) in qcow2 format (fourth parameter). The additional parameters instruct libvirt to set this image up as an overlay image, backed by our existing CirrOS image. When you now inspect the created image using

sudo ls -l /var/lib/libvirt/images/test-image.qcow2
sudo qemu-img info /var/lib/libvirt/images/test-image.qcow2

you will see a reference to the backing image in the output as well. Make sure that the format of the backing image is correct (apparently libvirt cannot autodetect this, and I had problem when not specifying the format explicitly). Also note that the physical file behind the image is still very small, as it only needs to capture some metadata and changed blocks, and we have not made any changes yet. We can now again bring up a virtual machine, this time using the newly created overlay image.

virt-install \
--name test-instance \
--memory 512 \
--vcpus 1 \
--import \
--disk vol=default/test-image.qcow2,format=qcow2,bus=virtio \
--network network=test-network \
--graphics vnc --noautoconsole 

This completes our short tour through the libvirt toolset and related tools. There are a couple of features that libvirt offers that we have not yet looked at (including things like network filters or snapshots), but I hope that with the overview given in this and the previous post, you will find your way through the available documentation on libvirt.org.

We have seen that to create virtual machines, we have several options, including CLI tools like virsh and virt-install suitable for scripting. Thus libvirt is a good candidate if you want to automate the setup of virtual environments. Being a huge fan of Ansible, I did of course also play with Ansible to see how we can use it to manage virtual machines, which will be the content of my next post.

Managing KVM virtual machines part I – Vagrant and libvirt

When you first install and play with Vagrant, chances are that you will be using the VirtualBox VM provider initially, which is supported out-of-the-box by Vagrant and open source. However, in some situations, VirtualBox might not be your preferred hypervisor. Luckily, with the help of a plugin, Vagrant can also be used with KVM. In this and the next post, we will learn how this works and take the opportunity to also learn a bit on KVM, libvirt and all that.

KVM and libvirt

KVM (Kernel Virtual Machine) is a Linux kernel module which turns Linux into a hypervisor, making use of the hardware support for virtualization that is built into all modern x86 CPUs (this feature is called VT-X on Intel CPUs and AMD-V on AMD CPUs). Typically, KVM is not used directly, but is being managed by libvirt, which is a collection of software components like an API, a daemon for remote access and the virsh command line utility to control virtual machines.

Libvirt, in turn, can be used with clients in most major programming languages (including C, Java, Python and Go), and is employed by many virtualization tools like the graphical virtual machine manager virt-manager or OpenStack Nova to create, manage and destroy virtual machines. There is also a Ruby client for the libvirt API, which makes it accessible from Vagrant.

In addition to KVM, libvirt is actually able to leverage many other virtualization providers, including LXC, VMWare and HyperV. The following diagram summarizes how the components discussed so far are related, the components that we will see in action today are marked.

Libvirt

Creating virtual machines with Vagrant

The above diagram indicates that we have a choice between several methods to create virtual machines. I personally like to use Vagrant for that purpose. As libvirt is not one of the standard providers built into Vagrant, we will have to install a plugin first. Assuming that you have not yet installed Vagrant at all, here are the steps needed to install and set up Vagrant, KVM and the required plugin on a standard Ubuntu 18.04 install. First, we install the libvirt library, the virt-manager and vagrant, and add the current user to the groups libvirt and KVM.

sudo apt-get update 
sudo apt-get install \
  libvirt-daemon \
  libvirt-clients \
  virt-manager \
  python3-libvirt \
  vagrant \
  vagrant-libvirt
sudo adduser $(id -un) libvirt
sudo adduser $(id -un) kvm

At this point, you will have to log out (or run su -l) and in again to make sure that the new group assignments become effective. Note that we install the libvirt Vagrant plugin from the Ubuntu package and not directly, for other Linux distributions, you might want to install using vagrant plugin install vagrant-libvirt. For this post, I have used Vagrant 2.0.2 and version 0.0.43 of the plugin. Finally we download a Debian Jessie image (called a box in Vagrant terminology)

vagrant box add \
  debian/jessie64 \
  --provider=libvirt

Now we are ready to bring up our first machine. Obviously, we need a Vagrant file for that purpose. Here is a minimum Vagrant file

ENV['VAGRANT_DEFAULT_PROVIDER'] = 'libvirt'
Vagrant.configure("2") do |config|
  config.vm.box = "debian/jessie64"
  config.vm.hostname = "test"
  config.vm.provider :libvirt do |v|
    v.cpus=1
    v.memory=1024
    v.default_prefix="test"
  end
end

In the first line, we set an environment variable which instructs Vagrant to use the libvirt provider (instead of the default provider VirtualBox). In the next few lines, we define a virtual machine as usual. In the provider specific block, we define the number of vCPUs for the machine and the amount of RAM and set the prefix that Vagrant is going to use to build a libvirt domain name for the VM.

Now you should be able to bring up the machine using vagrant up in the directory where the file is located.

Once this command completes, it is time to analyze the resulting configuration a bit. Basically, as explained here, the plugin will go through the following steps to bring up the machine.

  • Upload the image which is part of the box that we use to /var/lib/libvirt/images/, into a libvirt storage pool. This is important to understand in case you change the box, as in this case, you will have to remove the image manually again to force a refresh. We will learn more about storage pools in the next post
  • create a virtual machine, called a domain in the libvirt terminology
  • Create a virtual network and attach it to the machine – we get back to this point later. In addition, a “public IP” will be allocated for this IP address, using the built-in DHCP server
  • Create an SSH key and inject it into the machine

Let us try to learn more about the configuration that Vagrant has created for us. First, run virt-manager to start the graphical machine manager. You should now see a new virtual machine in the overview, and when doubleclicking on the machine, a terminal should open. As the Debian image that we use has a passwordless root account, you should actually be able to log in as root.

By clicking on the “Info” icon or via “View –> Details”, you should also be able to see the configuration of the machine, including things like the attached virtual disks and network interfaces.

virt-manager-two

Of course we can also get this – and even more – information using the command line client virsh. First, run virsh list to see a list of all domains (i.e. virtual machines). Assuming that you have no other libvirt-managed virtual machines running, this will give you only one line corresponding to the domain test_default which we have already seen in the virtual machine manager. You can retrieve the basic facts about this domain using

virsh dominfo test_default

The virsh utility has a wide variety of options, the best way to learn it is to type virsh help and simply try out a few of the commands. A few notable examples are

# List all block devices attached to our VM
virsh domblkinfo test_default
# List all storage pools
virsh pool-list 
# List all images in the default pool
virsh vol-list default
# List all virtual network interfaces attached to the VM
virsh domiflist test_default
# List all networks
virsh net-list

Internally, libvirt uses XML configuration files to maintain the state of the virtual machines, networks and storage objects. To see, for instance, the full XML configuration of our test machine, run

virsh dumpxml test_default

In the output, we now see all objects – CPU, disks and disk controller, network interfaces, graphics card, keyboard etc. – attached to the machine. We can now dump further XML structures and data to deep dive into the configuration. For instance, the XML output for the machine tells us that the machine is connected to the network vagrant-libvirt, corresponding to the virtual Linux bridge virbr1 (of course, libvirt uses bridges to model networks). To get more information on this, run

virsh net-dumpxml vagrant-libvirt
virsh net-dhcp-leases vagrant-libvirt
ifconfig virbr1
brctl show virbr1

It is instructive to play a bit with that, maybe add a second virtual machine using virt-manager and see how it is reflected in the virsh tool and the network configuration and so forth.

Advanced features

Let us now look at some of the more advanced options you have with the libvirt Vagrant plugin. The first option I would like to mention is to use custom networking. For that purpose, assume that you have created a libvirt network outside of Vagrant. As an example, create a file /tmp/my-network.xml with the following content.

<network>
<name>my-network</name>
<bridge name='my-bridge' stp='on' delay='0'/>
<ip address='10.100.0.1' netmask='255.255.255.0'/>
</network>

view raw
my-network.xml
hosted with ❤ by GitHub

Then run the following commands to create and start a custom network from this definition using virsh.

virsh net-define /tmp/my-network.xml
virsh net-start my-network

This will create a simple network supported by a Linux bridge my-bridge (which libvirt will create for us). As there is no forward block, the network will be isolated and machines attached to it will not be able to connect to the outside world, so this is the equivalent of a private network. To connect a machine to this network, use the following Vagrant file (make sure to delete our first machine again using vagrant destroy first).

ENV['VAGRANT_DEFAULT_PROVIDER'] = 'libvirt'
Vagrant.configure("2") do |config|
  config.vm.box = "debian/jessie64"
  config.vm.hostname = "test"
  config.vm.network :private_network,
      :ip => "10.100.0.11",
      :libvirt__network_name => "my-network"  
  config.vm.provider :libvirt do |v|
    v.cpus=1
    v.memory=1024
    v.default_prefix="test"
  end
end

Note the line starting with config.vm.network, in which we add our custom network as private network, with a static IP address. When you now run vagrant up again, and repeat the analysis above, you will see that Vagrant has actually attached our machine to two networks – the private network and, in addition, a “public” network which Vagrant will always create to reach the machines via SSH and to enable access to the Internet from the machine.

At this point, it is important that we create the network before we run Vagrant. In fact, if we refer to a network in a Vagrantfile that does not exist yet, Vagrant will be happy to create the network for us, but will use default settings – for instance it will attach a DHCP server to our network and allow access to the internet. This is most likely not what you want, so make sure to create the network using virsh net-define before running vagrant up.

Next let us try to understand how we can use Vagrant to attach additional disks to our virtual machine. Thanks to the libvirt plugin, this is again very easy. Simply add a line like

v.storage :file, :size => '5G', :type => 'raw', :bus => 'scsi'

to the customization section (i.e. to the section which also contains the settings for the number of virtual CPUs and the memory). This will attach a virtual SCSI device to our machine, with a size of 5 GB and image type “raw”. Vagrant will then take care of creating this image, setting it up as a volume in libvirt and attaching it to the virtual machine.

Note that the plugin is also able to automatically synchronize folders in the virtual machine with folders on the host, using for instance rsync. Please refer to the excellent documentation for this and more options.

This completes our short tour through the Vagrant libvirt plugin. You might have realized that libvirt and virsh are powerful tools with a rich data model – we have seen objects like domains, networks, volumes and storage devices. In the next post, we will dig a bit deeper into the inner structure of libvirt and learn how to create virtual machines from scratch, without using a tool like Vagrant.

OpenStack Cinder foundations – building logical volumes and snapshots with LVM

When you want to build a volume service for a cloud platform, you need to find a way to quickly create and remove block devices on your compute nodes. We could of course use loopback devices for this, but this is slow, as every operation goes through the file system. A logical volume manager might be a better alternative. Today, we will investigate the logical volume manager that Cinder actually uses – Linux LVM2.

The Linux logical volume manager – some basic terms

In this section, we will briefly explain some of the key concepts of the Linux logical volume manager (LVM2). First, there are of course physical devices. These are ordinary block devices that the LVM will completely manage, or partitions on block devices. Technically, even though these devices are called physical devices in this context, these devices can themselves be virtual devices, which happens for instance if you run LVM on top of a software RAID. Logically,
the physical devices are divided further into physical extents. An extent the smallest unit of storage that LVM manages.

On the second layer, LVM now bundles one or several physical devices into a volume group. On top of that volume group, you can now create logical devices. These logical devices can be thought of as being divided into logical extents. LVM maps these logical extents to physical extents of the underlying volume group. Thus, a logical device is essentially a collection of physical extents of the underlying volume group which are presented to a user as a logical block device. On top of these logical volumes, you can then create file systems as usual.

LVMTerms

Why would you want to do this? One obvious advantage is again based on the idea of pooling. A logical volume essentially pools the storage capacity of the underlying physical devices and LVM can dynamically assign space to logical devices. If a logical device starts to fill up while other logical devices are still mostly empty, an administrator can simply reallocate capacity between the logical devices without having to change the physical configuration of the system.

Another use case is virtualization. Given that there is sufficiently storage in your logical volume group, you can dynamically create new logical devices with a simple command, which can for instance be used to automatically provision volumes for cloud instances – this is how Cinder leverages the LVM as we will see later on.

Looking at this, you might be reminded of a RAID controller which also manages physical devices and presents their capacity as virtual RAID volumes. It is important to understand that LVM is not (primarily) a RAID manager. In fact, newer versions of LVM also offer RAID functionality (more on this below), but this it not its primary purpose.

Another useful functionality that LVM offers is a snapshot. When you create a snapshot, LVM will not simply create a physical copy. Instead, it will start to mark blocks which are changed after the snapshot has been taken as changed and only copy those blocks to a different location. This makes using the snapshot functionality very efficient.

Lab12: installing and using LVM

Let us now try to see how LVM works in practice. First, we need a machine with a couple of unused block devices. As it is unlikely that you have some spare disks lying around under your desk, we will again use a virtual machine for that purpose. So bring up our test machine and log into it using the following commands (assuming that you have gone through the basic setup steps in the the first post in this series).

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab12
vagrant up
vagrant ssh box

When you now run lsblk inside the machine, you should see two additional devices /dev/sdc and /dev/sdd which are both unmounted and have a capacity of 5 GB each.

As a first step, let us now prepare these physical volumes for use with LVM. This is done using the pvcreate utility. WARNING: if you accidentally run this outside of the VM, it will render the device unusable!

sudo pvcreate /dev/sdc
sudo pvcreate /dev/sdd

What is this command actually doing? To understand this, let us first use pvscan to print a list of all physical volumes on the system which LVM knows.

sudo pvscan -u

You will see a list of two volumes, and after each volume, LVM will print a UUID for this volume. Now let us see what LVM has actually written on the volume.

sudo dd if=/dev/sdc \
  bs=1024 \
  count=10 \
  | hexdump -C

In the output, you will see that LVM has written some sort of signature onto the device, containing some binary information and the UUID of the device. In fact, this is how LVM stores state and is able to recognize a volume even if it has been moved to a different point in /dev.

Now we can build our first volume group. For that purpose, we use the command vgcreate and specify the name of the volume group and a list of physical devices that the volume group should contain.

sudo vgcreate test_vg /dev/sdc /dev/sdd

If you now repeat the dump above, you will see that LVM has again written some additional data on the device, we find the name of the newly created volume group and even a JSON representation of the physical volumes in the volume group.

Let us now print out a bit more information on the system using the lvm shell. So run

sudo lvm 

to start the shell and then type fullreport to get a description of the current configuration. It is instructive to play a bit with the shell, use help to get a list of available commands and exit to exit the shell when you are done.

Finally, it is now time to create a few logical volumes. Our entire volume group has 10 GB available. We will create three logical volumes which in total consume 6 GB.

for i in {1..3}; do
  sudo lvcreate \
    --size 2G \
    test_vg \
    --type linear \
   --name lv0$i
done
sudo lvscan

The last command will print a list of all logical volumes on the system and should display the three logical volumes that we have just created. If you now again create a full report using lvm, you will find these three devices and a table that indicates how the logical extends are distributed across the various physical devices.

Behind the scenes, LVM uses the Linux device mapper kernel module, and in fact, each device that we create is displayed in the /dev tree at three different points. First, LVM exposes the logical volumes at a location built according to the scheme

/dev/volume group name/logical volume name

In our example, the first volume, for instance, is located at /dev/test_vg/lv01. This, however, is only a link to the device /dev/dm-0, indicating that it is created by the device mapper. Finally, a second link is created in /dev/mapper.

The LVM metadata daemon

We have said above that LVM stores its state on the physical devices. This, however, is only a part of the story, as it would imply that whenever we use one of the tools introduced above, we have to scan all devices, which is slow and might interfere with other read or write access to the device.

For that reason, LVM comes with a metadata daemon, running as lvmetad in the background as a systemctl service. This daemon maintains a cache of the LVM metadata that a command like lvscan will typically use (you can see this if you try to run such a command as non-root, which will cause an error message while the tool is trying to connect to the daemon via a Unix domain socket).

The metadata daemon is also involved when devices are added (hotplug), removed, or changed. If, for example, a physical volume comes up, a Linux kernel mechanism known as udev informs LVM about this event, and when a volume group is complete, all logical volumes based on it are automatically activated (see the comment on use_lvmetad in the configuration file /etc/lvm/lvm.conf).

It is interesting to take a look at the udev ruleset that LVM creates for this purpose (you will find these rules in the LVM-related files in /lib/udev/rules.d, in my distribution, these are the files with the numbers 56 and 69). In the rules file 69-lvm-metad.rules, for instance, you will find a rule that invokes (via systemd dependencies) a pvscan every time a physical device is added which will update the cache maintained by the metadata daemon (see also this man-page for a bit more background on the various options that you have to activate logical LVM devices at boot-time).

However, there is one problem with this type of scan that should be mentioned. Suppose, in our scenario, someone exports our logical device /dev/test_vg/lv01 using a block device level tool like iSCSI. A client then consumes the device and it appears inside the file system of the client as, say, /dev/sdc. On the client, an administrator now decides to also use LVM and sets up this device as a physical volume.

LVMStacked

LVM on the client will now write a signature into /dev/sdc. This write will go through the iSCSI connection and the signature will be written to /dev/test_vg/lv01 on the server. If now LVM on the server scans the devices for signatures the next time, this signature will also appear on the server, and LVM will be confused and believe that a new physical device has been added.

To avoid this sort of issues, the LVM configuration file /etc/lvm/lvm.conf contains an option which allow us to add a filter to the scan, so that only devices which are matching that filter are scanned for PV signatures. We will need this when we later install Cinder which uses LVM to create logical volumes for virtual machines on the fly.

LVM snapshots

Let us now explore a very useful feature of LVM – efficiently creating COW (copy-on-write) snapshots.

The idea behind a copy-on-write snapshot is easily explained. Suppose you have a logical volume that contains, say, 100 extends. You now want to create a snapshot, i.e. a copy of that volume at a given point in time. The naive approach would be to go through all extents and to create an exact copy for each of them. This, however, has two major disadvantages – it is very time consuming and it requires a lot of additional disk space.

When using copy-on-write, you would proceed differently. First, you would create a list of all extents. Then, you would start to monitor write activities on the original volume. As soon as an extent is about to be changed, you would mark it as changed and create a copy of that extent to preserve its content. For those extents, however, that have not yet changed since the snapshot has been created, you would not create a copy, but refer to the original content when someone tries to read from the snapshot, similar to a file system link.

Thus when a read is done on the snapshot, you would first check your list whether the extent has been changed. If yes, the copied extent is used. If no, the read is redirected to the original extent. This procedure is very fast, as we do not have to copy around all the data at the time when the snapshot is created, and uses space efficiently, as the capacity needed for the snapshot does not depend on the total size of the original volume, but on the volume of change.

COWSnapshot

Let us try this out. For this exercise, we will use the logical volume /dev/test_vg/lv01 that we have created earlier. First, use fdisk to create a partition on this volume, then create a file system and a mount point and mount this volume under /mnt/lv/ . Note that – which confused me quite a bit when trying this – the device belonging to the partition will NOT show in in /dev/test_vg, but in /dev/mapper/, i.e. the path to the partition that you have to use with mkfs is /dev/mapper/test_vg-lv01p1. Then create a file in in the mounted directory.

(echo n; echo p; echo 1; echo ; echo ; echo w)\
  | sudo fdisk /dev/test_vg/lv01
sudo partprobe /dev/test_vg/lv01
sudo mkfs -t ext4 /dev/mapper/test_vg-lv01p1
sudo mkdir -p /mnt/lv
sudo mount /dev/mapper/test_vg-lv01p1 /mnt/lv
echo "1" |  sudo tee /mnt/lv/test
sudo sync

Note that we need one execution of partprobe to force the kernel to read the partition table on the logical device which will create the device node for the partition. We also sync the filesystem to make sure that the write goes through to the block device level.

Next, we will create a snapshot. This done using the lvcreate command as follows.

sudo lvcreate \
  --snapshot \
  --name snap01 \
  --size 128M \
  --permission r \
  test_vg/lv01

There are two things that should be noted here. First, we explicitly specify a size of the snapshot which is much smaller than the original volume. At a later point in time, when a lot of data has been written, we might have to extend the volume manually, or we can make use of LVMs auto-extension feature for snapshots (see the comments for the parameter snapshot_autoextend_threshold in /etc/lvm/lvm.conf and the man page of dmeventd which needs to be running to make this work for details). Second, we ask LVM to create a read-only snapshot – LVM can also create read-write snapshots, which in fact is the default, but we will not need this here.

If you now run lvs to get a list of all logical volumes, you will see that a new snapshot volume has been created which is linked (via the “origin” field) to the original volume. Let us now mount the snapshot as well, change the data in our test file and then verify that the file in the snapshot is unchanged.

sudo partprobe /dev/mapper/test_vg-snap01
sudo mkdir -p /mnt/snap
sudo mount /dev/mapper/test_vg-snap01p1 /mnt/snap
echo "2" |  sudo tee /mnt/lv/test
sudo cat /mnt/snap/test

In a real world scenario, we could now use the mounted snapshot as a backup, copy the files that we want to restore then eventually remove the snapshot volume again. Alternatively, we can restore the entire snapshot by merging it back into the original volume, which will reset the original volume to the state in which it was when the snapshot was taken. This is done using the command lvconvert.

sudo lvconvert \
  --mergesnapshot \
  test_vg/snap01

When you run this, the merge will be scheduled, but it will only be executed once the devices are re-activated. At this point, I got a bit into trouble. To understand the problem, let us first umount all mount points and then try to deactivate the original volume.

sudo umount /mnt/lv 
sudo umount /mnt/snap
sudo lvchange -an test_vg/lv01

But wait, there is a problem – when you simply run this command, you will get an error message informing you that the logical volume “is in use by another device”. It took me some time and this blog post describing a similar problem to figure out what goes wrong. To diagnose the problem, we can find the links to our device in the /sys filesystem. First, find the major and minor device number of the logical volume using dmsetup info – in my example, this gave me 253:0. Then, navigate to /sys/dev/block. Here, you will find a subdirectory for each major-minor device number representing the existing devices. Navigate into the one for the combination you just noted and check the holders subdirectory to see who is holding a reference to the device. You will find that the entry in /dev/mapper representing the partition that showed up after running partprobe causes the problem! So we can use

sudo dmsetup remove test_vg-lv01p1
sudo dmsetup remove test_vg-snap01p1

to remove these links for the original volume and the snapshot. Now you should be able to de-activate and activate the volume again.

sudo lvchange -an test_vg/lv01
sudo lvchange -ay test_vg/lv01

After a few seconds, the snapshot should disappear from /dev/mapper, and sudo lvs -a should now longer show the snapshot, indicating that the merge is complete. When you now mount the original volume again and check the test file

sudo partprobe /dev/mapper/test_vg-lv01
sudo mount /dev/mapper/test_vg-lv01p1 /mnt/lv
sudo cat /mnt/lv/test

you should see the original content (1) again.

Note that it is not possible to detach a snapshot from its origin (there is a switch –splitsnapshot for lvconvert, but this does only split of the changed extents, i.e. the COW part, and is primarily intended to be able to zero out those extents before returning them into the volume group pool by removing the snapshot). A snapshot will always require a reference to the original volume.