Cloud – Page 4 – LeftAsExercise

OpenStack Keystone – a deep-dive into tokens and policies

In the previous post, we have installed Keystone and provided an overview of its functionality. Today, we will dive in detail into a typical authorization handshake and take you through the Keystone source code to see how it works under the hood.

The overall workflow

Let us first take a look at the overall process before we start to dig into details. As an example, we will use the openstack CLI to list all existing projects. To better see what is going on behind the scenes, we run the openstack client with the -v command line switch which creates a bit more output than usual.

So, log into the controller node and run

source admin-demorc
openstack -vv project list

This will give a rather lengthy output, so let us focus on those lines that signal that a requests to the API is made. The first API is a GET request to the URL

http://controller:5000/v3

This request will return a list of available API versions, marked with a status. In our case, the result indicates that the stable version is version v3. Next, the clients submits a POST request to the URL

http://controller:5000/v3/auth/tokens

If we look up this API endpoint in the Keystone Identity API reference, we find that this method is used to create and return a token. When making this request, the client will use the data provided in the environment variables set by our admin-openrc script to authenticate with Keystone, and Keystone will assemble and return a token.

The returned data has actually two parts. First, there is the actual Fernet token, which is provided in the HTTP header instead of the HTTP body. Second, there is a token structure which is returned in the response body. This structure contains the user that owns the token, the date when the token expires and the data when the token has been issued, the project for which the token is valid (for a project scoped token) and the roles that the user has for this project. In addition, it contains a service catalog. Here is an example, where I have collapsed the catalog part for better readibility.

Finally, at the bottom of the output, we see that the actual API call to get a list of projects is made, using our newly acquired token and the endpoint

http://controller:5000/v3/projects

So our overall flow looks like this, ignoring some client internal processes like selecting the endpoint (and recovering from failed authorizations, see the last section of this post).

Let us now go through these requests step by step and see how tokens and policies interact.

Creating a token

When we submit the API request to create a token, we end up in the method post in the AuthTokenResource class defined in keystone/api/auth.py. Here we find the code.

token=authentication.authenticate_for_token(auth_data)
resp_data=render_token.render_token_response_from_model(
          token, include_catalog=include_catalog
)

The method authenticate_for_token is defined in keystone/api/_shared/authentication.py. Here, we first authenticate the user, using the auth data provided in the request, in our case this is username, password, domain and project as defined in admin-openrc. Then, the actual token generation is triggered by the call

token=PROVIDERS.token_provider_api.issue_token(
          auth_context['user_id'], 
          method_names, 
          expires_at=expires_at,
          system=system, 
          project_id=project_id, 
          domain_id=domain_id,
          auth_context=auth_context, 
          trust_id=trust_id,
          app_cred_id=app_cred_id, 
         parent_audit_id=token_audit_id)

Here we see an additional layer of indirection in action – the ProviderAPIRegistry as defined in keystone/common/provider_api.py. Without getting into details, here is the idea of this approach which is used in a similar way in other OpenStack services.

Keystone itself consists of several components, each of which provide different methods (aka internal APIs). There is, for instance, the code in keystone/identity handling the core identity features, the code in keystone/assignment handling role assigments, the code in keystone/token handling tokens and so forth. Each of these components contains a class typically called Manager which is derived from the base class Manager in keystone/common/manager.py.

When such a class is instantiated, it registers its methods with the static instance ProviderAPI of the class ProviderAPIRegistry defined in keystone/common/provider_api.py. Technically, registering means that the object is added as attribute to the ProviderAPI object. For the token API, for instance, the the Manager class in keystone/token/provider.py registers itself using the name token_provider_api, so that it is added to the provider registry object as the attribute token_provider_api. Thus a method XXX of this manager class can now be invoked using

from keystone.common import provider_api
provider_api.ProviderAPIs.token_provider_api.XXX()

or by

from keystone.common import provider_api
PROVIDERS = provider_api.ProviderAPIs
PROVIDERS.token_provider_api.XXX()

This is exactly what happens here, and this is why the above line will actually take us to the method issue_token of the Manager class defined in keystone/token/provider.py. Here, we build and populate an instance of the Token class defined in keystone/models/token_model.py and populate it with the available data. We then populate the field token.id where we put the actual token, i.e. the encoded string that will end up in the HTTP header of future requests. This is done in the line

token_id, issued_at =
             self.driver.generate_id_and_issued_at(token)

which calls the actual token provider, for instance the Fernet provider. For a Fernet token, this will eventually end up in the line

token_id=self.token_formatter.create_token(
    token.user_id,
    token.expires_at,
    token.audit_ids,
    token_payload_class,
    methods=token.methods,
    system=token.system,
    domain_id=token.domain_id,
    project_id=token.project_id,
    trust_id=token.trust_id,
    federated_group_ids=token.federated_groups,
    identity_provider_id=token.identity_provider_id,
    protocol_id=token.protocol_id,
    access_token_id=token.access_token_id,
    app_cred_id=token.application_credential_id
)

calling the token formatter which will do the low level work of actually creating and encrypting the token. The token ID will then be added to the token data structure, along with the creation time (a process known as minting) before the token is returned up the call chain.

At this point, the token does not yet contain any role information and no service catalog. To enrich the token by this information, it is rendered by calling render_token defined in keystone/common/render_token.py. Here, a dictionary is built and populated with data including information on role, scope and endpoints.

Note that the role information in the token is dynamic, in fact, in the Token class, the property decoration is used to divert access to the roles property to a method call. Here, we receive the scope information and select and return only these roles which are bound to the respective domain or project if the token is domain scoped or project scoped. When we render the token, we access the roles attribute and retrieve the role information from the method bound to it.

Within this method, an additional piece of logic is implemented which is relevant for the later authorization process. Keystone allows an administrator to define a so-called admin project. Any user who authenticates with a token scoped to this special project is called a cloud admin, a special role which can be referenced in policies. When rendering the token, the project to which the token refers (if it its project scoped) is compared to this special project, and if they match, an additional attribute is_admin_project is added to the token dictionary.

Finally, back in the post method, we build the response body from the token structure and add the actual token to the response header in the line

response.headers['X-Subject-Token'] = token.id

Here is a graphical overview on the process as we have discussed it so far.

The key learnings from the code that we can deduce so far are

The actual Fernet token contains a minimum of information, like the user for whom the token is issued and – depending on the scope – the Ids of the project or domain to which the token is scoped
When a token is requested, the actual Fernet token (the token ID) is returned in the response header, and an enriched version of the token is added in the response body
This enrichment is done dynamically using the Keystone database, and the enrichment will only add the roles to the token data that are relevant for the token scope
There is a special admin project, and a token scoped to this project implies the cloud administrator role

Using the token to authorize a request

Let us now see what happens when a client uses this token to actually make a request to the API – in our example, this happens when the openstack client makes the actual API call to the endpoint http://controller:5000/v3/projects.

Before this request is actually dispatched to the business logic, it passes through the WSGI middleware. Here, more precisely in the class method AuthContextMiddleware.process_request defined in the file keystone/server/flask/request_processing/middleware/auth_context.py, the token is retrieved from the field X-Auth-Token in the HTTP header of the request (here we also put the marker field is_admin into the context when an admin_token is defined in the configuration and equal to the actual token). Then the process_request method of the superclass is called which invokes fetch_token (of the derived class!). Here, the validate_token method of the token provider is called which performs the actual token validation. Finally, the token is again rendered as above, thereby adding the relevant roles dynamically, and put as token_reference in the request context (this happens in the method fill_context respectively _keystone_specific_values of the middleware class).

At this point, it is instructive to take a closer look at the method that actually selects the relevant roles – the method roles of the token class defined in keystone/models/token_model.py. If you follow the call chain, you will find that, to obtain for instance all project roles, the internal API of the assignment component is used. This API returns the effective roles of the user, i.e. roles that include those roles that the user has due to group membership and roles that are inherited, for instance from the domain-level to the project level or down a tree of subprojects. Effective roles also include implied roles. It is important to understand (and reasonable) that it is the effective roles that enter a token and are therefore evaluated during the authorization process.

Once the entire chain of middleware has been processed, we finally reach the method _list_projects in keystone/api/projects.py. Close to the start of this method, the enforce_call method of the class RBACEnforcer in keystone/common/rbac_enforcer/enforcer.py. When making this call, the action identity:list_projects is passed as a parameter. In addition, a parameter called target is passed, a dictionary which contains some information on the objects to which the API request refers. In our example, as long as we do not specify any filters, this dictionary will be empty. If, however, we specify a domain ID as a filter, it will contain the ID of this domain. As we will see later, this allows us to define policies that allow a user to see projects in a specific domain, but not globally.

The enforce_call method will first make a couple of validations before it checks whether the request context contains the attribute is_admin. If yes, the token validation is skipped and the request is always allowed- this is to support the ADMIN_TOKEN bootstrapping mechanism. Then, close to the bottom of the method, we retrieve the request context, instantiate a new object and calls its _enforce method which essentially delegates the call to the Oslo policy rules engine and its Enforcer class, more precisely to the enforce method of this class.

As input, this method receives the action (identity:list_projects in our case), the target of the action, and the credentials, in the form of the Oslo request context, and the processing of the rules starts.

Again, let us quickly summarize what the key take aways from this discussion should be – these points actually apply to most other OpenStack services as well.

When a request is received, the WSGI middleware is responsible for validating the token, retrieving the additional information like role data and placing it in the request context
Again, only those roles are stored in the context which the user has for the scope of the token (i.e. on project level for project-scoped token, on the domain level for domain-scoped token and on the system level for system-scoped token)
The roles in the token are effective roles, i.e. taking inheritance into account
The actual check against the policy is done by the Oslo policy rule engine

The Oslo policy rule engine

Before getting into the details of the rule engine, let us quickly summarize what data the rule engine has at its disposal. First, we have seen that it receives the action, which is simply a string, identity:list_projects in our case. Then, it has information on the target, which, generally speaking, is the object on which the action should be performed (this is less relevant in our example, but becomes important when we modify data). Finally, it has the credentials, including the token and role information which was part of the token and is now stored in the request context which the rule engine receives.

The engine will now run this data through all rules which are defined in the policy. Within the engine, a rule (or check) is simply an object with a __call__ method, so that they can be treated and invoked like a function. In the module _checks.py, a few basic checks are defined. There are, for instance, simple checks that always return true or false, and their checks like AndCheck and OrCheck which can be used to build more complex rules from basic building blocks. And there are other checks like the RoleCheck which checks whether a certain role is present in the credentials, which, as we know from the discussion above, is the case if the token use to authorize contains this role, i..e if the user who is owning the token has this role with respect to the scope of the token.

Where do the rules come from that are processed? First, note that the parameter rule to the enforce method does, in our case at least, contain a string, namely the action (identity:list_projects). To load the actual rules, the method enforce will first call load_rules which loads rules from a policy file, at which we will take a look in a second. Loading the policy file will create a new instance of the Rules class, which is a container class to hold a set of rules.

After loading all rules, the following line in enforce identifies the actual rule to be processed.

to_check = self.rules[rule]

This looks a bit confusing, but recall that here, rule actually contains the action identity:list_projects, so we look up the rule associated with this action. Finally, the actual rule checking is done by invoking the _check methods of the _checks module.

Let us now take a closer look at the policy files themselves. These files are typically located in the /etc/XXX subdirectory, where XXX is the OpenStack component in question. Samples files are maintained by the OpenStack team. To see an example, let us take a look at the sample policy file for Keystone which was distributed with the Rocky release. Here, we find a line

identity:list_projects": "rule:cloud_admin or rule:admin_and_matching_domain_id",

This file is in JSON syntax, and this line defines a dictionary entry with the action identity:list_projects and the rule rule:cloud_admin or rule:admin_and_matching_domain_id. The full syntax of the rule is explained nicely here or in the comments at the start of policy.py. In essence, in our example, the rule says that the action is allowed if either the user is a cloud administrator (i.e. an administrator the the special admin project or admin domain which can be configured in the Keystone configuration file) or is an admin for the requested domain.

When I first looked at the policy files in my test installation, however, which uses the Stein release, I was more than confused. Here, the rule for the action identity:list_projects is as follows.

"identity:list_projects": "rule:identity:list_projects"

Here we define a rule called identity:list_projects for the action with the same name, but where is this rule defined?

The answer is that there is a second source of rules, namely software defined rules (which the OpenStack documentation calls policy-in-code) which are registered when the enforcer object is created. This happens in the _enforcer method of the RBACEnforcer when a new enforcer is created. Here we call register_rules which creates a list of rules by calling the function list_rules define in the keystone/common/policies module which returns a list of sofware-defined rules, and registers these rules with the Oslo policy enforcer. The rule we are looking for, for instance, is defined in keystone/common/policies/project.py and looks as follows.

policy.DocumentedRuleDefault(
        name=base.IDENTITY % 'list_projects',
        check_str=SYSTEM_READER_OR_DOMAIN_READER,
        scope_types=['system', 'domain'],
        description='List projects.',
        operations=[{'path': '/v3/projects',
                     'method': 'GET'}],
        deprecated_rule=deprecated_list_projects,
        deprecated_reason=DEPRECATED_REASON,
        deprecated_since=versionutils.deprecated.STEIN),

Here we see that the actual rule (in the attribute check_str) has now changed compared to the Rocky release, and allows access if either the user has the reader role on the system level or has the reader role for the requested domain. In addition, there is a deprecated rule for backwards compatibility which is OR’ed with the actual rule. So the rule that really gets evaluated in our case is

(role:reader and system_scope:all) or (role:reader and domain_id:%(target.domain_id)s) or rule:admin_required

In our case, asking OpenStack to list all projects, there is a further piece of magic involved. This becomes visible if you try a different user. For instance, we can create a new project demo with a user demo who has the reader role for this project. If you now run the OpenStack client again to get all projects, you will only see those projects for which the user has a role. This is again a bit confusing, because by what we have discussed above, the authorization should fail.

In fact, it does, but the client is smart enough to have a plan B. If you look at the output of the OpenStack CLI with the -vvv flag, you will that a first request is made to list all projects which fails, as expected. The client then tries a second request, this time using the URL /users//projects to get all projects for that specific user. This call ends up in the method get of the class UserProjectsResource defined in keystone/api/users.py which will list all projects for which a specifc user has a role. Here, a call is made with a different action called identity:list_user_projects, and the rule for this action allows access if the user making the request (i.e. the user data from the token) is equal to target user (i.e. the user ID specified in the request). Thus this final call succeeds.

These examples are hopefully sufficient to demonstrate that policies can be a tricky topic. It is actually very instructive to add debugging output to the involved classes (the Python source code is on the controller node in /usr/lib/python3/dist-packages, do not forget to restart Apache if you have made changes to the code) to print out the various structures and trace the flow through the code. Happy hacking!

Openstack Keystone – installation and overview

Today we will dive into OpenStack Keystone, the part of OpenStack that provides services like management of users, roles and projects, authentication and a service catalog to the other OpenStack components. We will first install Keystone and then take a closer look at each of these areas.

Installing Keystone

As in the previous lab, I have put together a couple of scripts that automatically install Keystone in a virtual environment. To run them, issue the following commands (assuming, of course, that you did go through the basic setup steps from the previous post to set up the environment)

pwd
# you should be in the directory into which 
# you did clone the repository
cd openstack-samples/Lab2
vagrant up
ansible-playbook -i hosts.ini site.yaml

While the scripts are running, let us discuss the installation steps. First, we need to prepare the database. Keystone uses its own database schema called (well, you might guess …) keystone that needs to be added to the MariaDB instance. We will also have to create a new database user keystone with the necessary privileges on the keystone database.

Then, we install the Keystone packages via APT. This will put default configuration files into /etc/keystone which we need to adapt. Actually, there is only one change that we need to make at this point – we need to change the connection key to contain a proper connection string to reach our MariaDB with the database credentials just established.

Next, the keystone database needs to be created. To do this, we use the keystone-manage db_sync command that actually performs an upgrade of the Keystone DB schema to the latest version. We then again utilize the keystone-manage tool to create symmetric keys for the Fernet token mechanism and to encrypt credentials in the SQL backend.

Now we need to add a minimum set of domains, project and users to Keystone. Here, however, we face a chicken-and-egg problem. To be able to add a user, we need the authorization to do this, so we need a user, but there is no user yet.

There are two solutions to this problem. First, it is possible to define an admin token in the Keystone configuration file. When this token is used for a request, the entire authorization mechanism is bypassed, which we could use to create our initial admin user. This method, however, is a bit dangerous. The admin token is contained in the configuration file in clear text and never expires, so anyone who has access to the file can perform every action in Keystone and then OpenStack.

The second approach is to again the keystone-manage tool which has a command bootstrap which will access the database directly (more precisely, via the keystone code base) and will create a default domain, a default project, an admin user and three roles (admin, member, reader). The admin user is set up to have the admin role for the admin project and on system level. In addition, the bootstrap process will create a region and catalog entries for the identity services (we will discuss these terms later on).

Users, projects, roles and domains

Of course, users are the central object in Keystone. A user can either represent an actual human user or a service account which is used to define access rights for the OpenStack services with respect to other services.

In a typical cloud environment, just having a global set of users, however, is not enough. Instead, you will typically have several organizations or tenants that use the same cloud platform, but require a certain degree of separation. In OpenStack, tenants are modeled as projects (even though the term tenant is sometimes used as well to refer to the same thing). Projects and users, in turn, are both grouped into domains.

To actually define which user has which access rights in the system, Keystone allows you to define roles and assign roles to users. In fact, when you assign a role, you always do this for a project or a domain. You would, for instance, assign the role reader to user bob for the project test or for a domain. So role assignments always refer to a user and either a role or a project.

Note that it is possible to assign a role to a user in a domain for a project living in a different domain (though you will probably have a good reason to do this).

In fact, the full picture is even a bit more complicated than this. First, roles can imply other roles. In the default installation, the admin role implies the member role, and the member role implies the reader role. Second, the above diagram suggests that a role is not part of a domain. This is true in most cases, but it is in fact possible to create domain-specific roles. These roles do not appear in a token and are therefore not directly relevant to authorization, but are intended to be used as prior roles to map domain specific role logic onto the overall role logic of an installation.

It is also not entirely true that roles always refer to either a domain or a project. In fact, Keystone allows for so-called system roles which are supposed to be used to restrict access to operations that are system wide, for instance the configuration of API endpoints.

Finally, there are also groups. Groups are just collections of users, and instead of assigning a role to a user, you can assign a role to a group which then effectively is valid for all users in that group.

And, yes, there are also subprojects.. but let us stop here, you see that the Keystone data structures are complicated and have been growing significantly over time.

To better understand the terms discussed so far, let us take a look at our sample installation. First, establish an SSH connection to some of the nodes, say the controller node.

vagrant ssh controller

On this node, we will use the OpenStack Python client to explore users, projects and domains. To run it, we will need credentials. When you work with the OpenStack CLI, there are several methods to supply credentials. The option we will use is to provide credentials in environment variables. To be able to quickly set up these variables, the installation script creates a bash script admin-openrc that sets these credentials. So let us source this script and then submit an OpenStack API request to list all existing users.

source admin-openrc
openstack user list

At this point, this should only give you one user – the admin user created during the installation process. To display more details for this user, you can use openstack user show admin, and you should obtain an output similar to the one below.

+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 67a4f789b4b0496cade832a492f7048f |
| name                | admin                            |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+

We see that the user admin is part of the default domain, which is the standard domain used in OpenStack as long as no other domain is specified.

Let us now see which role assignments this user has. To do this, let us list all assigments for the user admin, using JSON output for better readability.

openstack role assignment list --user admin -f json

This will yield the following output.

[
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "5b634876aa9a422c83591632a281ad59",
    "Domain": "",
    "System": "",
    "Inherited": false
  },
  {
    "Role": "18307c8c97a34d799d965f38b5aecc37",
    "User": "92f953a349304d48a989635b627e1cb3",
    "Group": "",
    "Project": "",
    "Domain": "",
    "System": "all",
    "Inherited": false
  }
]

Here we see that there are two role assignments for this user. As the output only contains the UUIDs of the role and the project, we will have to list all projects and all roles to be able to interpret the output.

openstack project list 
openstack role list

So we see that for both assignments, the role is the admin role. For the first assignment, the project is the admin project, and for the second assignment, there is no project (and no domain), but the system field is filled. Thus the first assignment assigns the admin role for the admin project to our user, whereas the second one assigns the admin role on system level.

So far, we have not specified anywhere what these roles actually imply. To understand how roles lead to authorizations, there are still two missing pieces. First, OpenStack has a concept of implied roles. These are roles that a user has which are implied by explicitly defined roles. To see implied roles in action, run

openstack implied role list

The resulting table will list the prior roles on the left and the implied role on the right. So we see that having the admin role implies to also have the member role, and having the member role in turn implies to also have the reader role.

The second concept that we have not touched upon are policies. Policies actually define what a user having a specific role is allowed to do. Whenever you submit an API request, this request targets a certain action. Actions more or less correspond to API endpoints, so an action could be “list all projects” or “create a user”. A policy defines a rule for this action which is evaluated to determine whether that request is allowed. A simple rule could be “user needs to have the admin role”, but the rule engine is rather powerful and we can define much more elaborated rules – more on this in the next post.

The important point to understand here is that policies are not defined by the admin via APIs, but are predefined either in the code or in specific policy files that are part of the configuration of each OpenStack service. Policies refer to roles by name, and it does not make sense to define and use a role that this is not referenced by policies (even though you can technically do this). Thus you will rarely need to create roles beyond the standard roles admin, member and reader unless you also change the policy files.

Service catalogs

Apart from managing users (the Identity part of Keystone), project, roles and domains (the Resources part of Keystone), Keystone also acts as a service registry. OpenStack services register themselves and their API endpoints with Keystone, and OpenStack clients can use this information to obtain the URL of service endpoints.

Let us take a look at the services that are currently registered with Keystone. This can be done by running the following commands on the controller.

source admin-openrc
openstack service list -f json

At this point in the installation, before installing any other OpenStack services, there is only one service – Keystone itself. The corresponding output is

[
  {
    "ID": "3acb257f823c4ecea6cf0a9e94ce67b9",
    "Name": "keystone",
    "Type": "identity"
  },
]

We see that a service has a name which identifies the actual service, in addition to a type which defines the type of service delivered. Given the type of a service, we can now use Keystone to retrieve a list of service API endpoints. In our example, enter

openstack endpoint list --service identity -f json

which should yield the following output.

[
  {
    "ID": "062975c2758f4112b5d6568fe068aa6f",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "public",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "207708ecb77e40e5abf9de28e4932913",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "admin",
    "URL": "http://controller:5000/v3/"
  },
  {
    "ID": "781c147d02604f109eef1f55248f335c",
    "Region": "RegionOne",
    "Service Name": "keystone",
    "Service Type": "identity",
    "Enabled": true,
    "Interface": "internal",
    "URL": "http://controller:5000/v3/"
  }
]

Here, we see that every service typically offers different types of endpoints. There are public endpoints, which are supposed to be reachable from an external network, internal endpoints for users in the internal network and admin endpoints for administrative access. This, however, is not enforced by Keystone but by the network layout you have chosen. In our simple test installation, all three endpoints for a service will be identical.

When we install more OpenStack services later, you will see that as part of this installation, we will always register a new service and corresponding endpoints with Keystone.

Token authorization

So far, we have not yet discussed how an OpenStack service actually authenticates a user. There are several ways to do this. First, you can authorize using passwords. When using the OpenStack CLI, for instance, you can put username and password into environment variables which will then be used to make API requests (for ease of use, the Ansible playbooks that we use to bring up our environment will create a file admin-openrc which you can source to set these variables and that we have already used in the examples above).

In most cases, however, subsequent authorizations will use a token. A token is essentially a short string which is issued once by Keystone and then put into the X-Auth-Token field in the HTTP header of subsequent requests. If a token is present, Keystone will validate this token and, if it is valid, be able to derive all informations it needs to authenticate the user and authorize a request.

Keystone is able to use different token formats. The default token format with recent releases of Keystone is the Fernet token format.

It is important to understand that tokens are scoped objects. The scope of a token determines which roles are taken into account for the authorization process. If a token is project scoped, only those roles of a user that target a project are considered. If a token is domain scoped, only the roles that are defined on domain level are considered. And finally, a system scope token implies that only roles at system level are relevant for the authorization process.

Earlier versions of Keystone supported a token type called PKI token that contained a large amount of information directly, including role information and service endpoints. The advantage of this approach was that once a token had been issued, it could be processed without any further access to Keystone. The disadvantage, however, was that the tokens generated in this way tended to be huge, and soon reached a point where they could no longer be put into a HTTP header. The Fernet token format handles things differently. A Fernet token is an encrypted token which contains only a limited amount of information. To use it, a service will, in most cases, need to run additional calls agains Keystone to retrieve additional data like roles and services. For a project scoped token, for instance, the following diagram displays the information that is stored in a token on the left hand side.

First, there is a version number which encodes the information on the scope of the token. Then, there is the ID of the user for whom the token is issued, the methods that the user has used to authenticate, the ID of the project for which the token is valid, and the expiration date. Finally, there is an audit ID which is simply a randomly generated string that can be put into logfiles (in contrast to the token itself, which should be kept secret) and can be used to trace the usage of this token. All these fields are serialized and encrypted using a symmetric key stored by Keystone, typically on disk. A domain scoped token contains a domain ID instead of the project ID and so forth.

Equipped with this understanding, we can now take a look at the overall authorization process. Suppose a client wants to request an action via the API, say from Nova. First, the client would then use password-based authorization to obtain a token from Keystone. Keystone returns the token along with an enriched version containing roles and endpoints as well. The client would use the endpoint information to determine the URL of the Nova service. Using the token, it would then try to submit an API request to the Nova API.

The Nova service will take the token, validate it and ask Keystone again to enrich the token, i.e to add the missing information on roles and endpoints (in fact, this happens in the Keystone middleware). It is then able to use the role information and its policies to determine whether the user is authorized for the request.

Using Keystone with LDAP and other authentication mechanisms

So far, we have stored user identities and group inside the MariaDB database, i.e. local. In most production setups, however, you will want to connect to an existing identity store, which is typically exposed via the LDAP protocol. Fortunately, Keystone can be integrated with LDAP. This integration is read-only, and Keystone will use LDAP for authentication, but still store projects, domains and role information in the MariaDB database.

When using this feature, you will have to add various data to the Keystone configuration file. First, of course, you need to add basic connectivity information like credentials, host and port so that Keystone can connect to an LDAP server. In addition, you need to define how the fields of a user entity in LDAP map onto the corresponding fields in Keystone. Optionally, TLS can be configured to secure the connection to an LDAP server.

In addition to LDAP, Keystone also supports a variety of alternative methods for authentication. First, Keystone supports federation, i.e. the ability to share authentication data between different identity providers. Typically, Keystone will act as a service provider, i.e. when a user tries to connect to Keystone, the user is redirected to an identity provider, authenticates with this provider and Keystone receives and accepts the user data from this provider. Keystone supports both the OpenID Connect and the SAML standard to exchange authentication data with an identity provider.

As an alternative mechanism, Keystone can delegate the process of authentication to the Apache webserver in which Keystone is running – this is called external authentication in Keystone. In this case, Apache will handle the authentication, using whatever mechanisms the administrator has configured in Apache, and pass the resulting user identity as part of the request context down to Keystone. Keystone will then look up this user in its backend and use it for further processing.

Finally, Keystone offers a mechanism called application credentials to allow applications to use the API on behalf of a Keystone user without having to reveal the users password to the application. In this scenario, a user creates an application credential, passing in a secret and (optionally) a subset of roles and endpoints to which the credential grants access. Keystone will then create a credential and pass its ID back to the user. The user can then store the credential ID and the secret in the applications configuration. When the application wants to access an OpenStack service, it uses a POST request on the /auth/tokens endpoint to request a token, and Keystone will generate a token that the application can use to connect to OpenStack services.

This completes our post today. Before moving on to install additional services like Placement, Glance and Nova, we will – in the next post – go on a guided tour through a part of the Keystone source code to see how tokens and policies work under the hood.

Setting up our OpenStack playground

In this post, we will describe the setup of our Lab environment and install the basic infrastructure services that OpenStack uses.

Environment setup

In a real world setup, OpenStack runs on a collection of physical servers on which the virtual machines provided by the cloud run. Now most of us will probably not have a rack in their basement, so that using four or five physical servers for our labs is not a realistic option. Instead, we will use virtual machines for that purpose.

To avoid confusion, let us first fix some terms. First, there is the actual physical machine on which the labs will run, most likely a desktop PC or a laptop, and most likely the PC you are using to read this post. Let us call this machine the lab host.

On this host, we will run Virtualbox to create virtual machines. These virtual machines will be called the nodes, and they will play the role that in a real world setup, the physical servers would play. We will be using one controller node on which most of the OpenStack components will run, and two compute nodes.

Inside the compute nodes, the Nova compute service will then provision virtual machines which we call VMs. So effectively, we use nested virtualization – the VM is itself running inside a virtual machine (the node).

To run the labs, your host will need to have a certain minimum amount of RAM. When I tested the setup, I found that the controller node and the compute nodes in total consume at least 7-8 GB of RAM, which will increase depending on the number of VMs you run. To still be able to work on the machine, you will therefore need at least 16 GB of RAM. If you have more – even better. If you have less, you might also want to use a cloud based setup. In this case, the host could itself be a virtual machine in the cloud, or you could use a bare-metal provider like Packet to get access to a physical host with the necessary memory.

Not every cloud will work, though, as it needs to support nested virtualization. I have tested the setup on DigitalOcean and found that it works, but other cloud providers might yield different results.

Networking

Let us now take a look at the network configuration that we will use for our hosts. If you run OpenStack, there will be different categories of traffic between the nodes. First, there is management traffic, i.e. communication between the different components of the platform, like messages exchanged via RabbitMQ or API calls. For security and availability reasons, this traffic is typically handled via a dedicated management network. The management network is configured by the administrator and used by the OpenStack components.

Then, there is traffic between the VMs, or, more precisely, between the guests running inside the VMs. The network which is supporting this traffic is called the guest network. Note that we are not yet talking about a virtual network here, but about the network connecting the various nodes which eventually will be used for this traffic.

Sometimes, additional network types need to be considered, there could for instance be a dedicated API network to allow end users and administrators access to the API without depending on any of the other networks, or a dedicated external network that connects the network node to a physical route to provide internet access for guests, but for this setup, we will only use a two networks – a management network and a guest network. Note that the guest network needs to be provided by an adminstrator, but is controlled by Openstack (which, for instance, will add the interfaces that make up the network to virtual bridges so that they can no longer be used for other traffic).

In our case, both networks, the management network and the guest network, will be set up as Virtualbox host-only networks, connecting our nodes. Here is a diagram that summarizes the network topology we will use.

Setting up your host and first steps

Let us now roll up our sleeves and dive right into the first lab. Today, we will bring up our environment, and, on each node, install the required infrastructure like MySQL, RabbitMQ and so forth.

First, however, we need to prepare our host. Obviously, we need some tools installed – Vagrant, Virtualbox and Ansible. We will also use pwgen to create credentials. How exactly these tool need to be installed depends on your Linux distribution, on Ubuntu, you would run

sudo apt-get install python3-pip
pip3 install 'ansible==v2.8.6' 
sudo apt-get install pwgen
sudo apt-get install virtualbox
sudo apt-get install vagrant

The Ansible version is important. I found that there is at least oneissue which breaks network creation in OpenStack with Ansible with some 2.9.x versions of Ansible.

When we set up our labs, we will sometimes have to throw away our environment and rebuild it. This will be fully automated, but it implies that we need to download packages into the nodes over and over again. To speed up this process, we install a local APT cache. I use APT-Cacher-NG for that purpose. Installing it is very easy, simply run

sudo apt-get install apt-cacher-ng

This will install a proxy, listening on port 3142, which will create local copies of packages that you install. Later, we will instruct the apt processes running in our virtual machines to use this cache.

Now we are ready to start. First, you will have to clone my repository to get a copy of the scripts that we will use.

git clone https://github.com/christianb93/openstack-labs
cd openstack-labs/Lab1

Next, we will bring up our virtual machines. There is, however, a little twist when it comes to networking. As mentioned above, we will use Virtualbox host networking. As you might know when you have read my previous post on this topic, Virtualbox will create two virtual devices to support this, one for each network. These devices will be called vboxnet0 and vboxnet1. However, if these devices already exist, Virtualbox will use them and take over parts of the existing network configuration. This can lead to problems later, if, for instance, Virtualbox runs a DHCP server on this device, this will conflict with the OpenStack DHCP agent and your VMs will get incorrect IP addresses and will not be reachable. To avoid this, we will delete any existing interfaces (which of course requires that you stop all other virtual machines) and recreate them before we bring up our machines. The repository contains a shells script to do this. To run it and start the machines, enter

../scripts/createVBoxNetworks.sh
vagrant up

We are now ready to run our playbook. Before doing this, let us first discuss what the scripts will actually do.

First, we need a set of credentials. These credentials consist of a set of randomly generated passwords that we use to set up the various users that the installation needs (database users, RabbitMQ users, Keystone users and so forth) and an SSH key pair that we will use later to access our virtual machines. These credentials will be created automatically and stored in ~/.os_credentials.

Next, we need a basic setup within each of the nodes – we will need the Python OpenStack modules, we will need to bring up all network interfaces, and we will update the /etc/hosts configuration files in each of the nodes to be able to resolve all other nodes.

We will also change the configuration of the APT package manager. We will point APT to the APT cache running on the host and we will add the Ubuntu OpenStack Cloud Archive repository to the repository list from which we will pull the OpenStack packages.

Next, we need to make sure that the time on all nodes is synchronized. To achieve this, we install a network of NTP daemons. We use Chrony and set up the controller as Chrony server and the compute nodes as clients. We then install MySQL, Memcached and RabbitMQ on the controller node and create the required users.

All this is done by the playbook site.yaml, and you can simply run it by typing

ansible-playbook -i hosts.ini site.yaml

Once the script completes, we can run a few checks to see that everything worked. First, log into the controller node using vagrant ssh controller and verify that Chrony is running and that we have network connectivity to the other nodes.

sudo systemctl | grep "chrony"
ping compute1
ping compute2

Then, verify that you can log into MySQL locally and that the root user has a non-empty password (we can still log in locally as root without a password) by running sudo mysql and then, on the SQL prompt, typing

select * from mysql.user;

Finally, let us verify that RabbitMQ is running and has a new user openstack.

sudo rabbitmqctl list_users
sudo rabbitmqctl node_health_check
sudo rabbitmqctl status

A final note on versions. This post and most upcoming posts in this series have been created with a lab PC running Python 3.6 and Ansible 2.8.9. After upgrading my lab PC to Ubuntu 20.04 today, I continued to use Ansible 2.8.9 because I had experienced problems with newer versions earlier on, but upgraded to Python 3.8. After doing this, I hit upon this bug that requires this fix which I reconciled manually into my local Ansible version.

We are now ready to install our first OpenStack services. In the next post, we will install Keystone and learn more about domains, users, projects and services in OpenStack.

Building your own cloud with OpenStack – overview

Over time, I have worked with a couple of different commercial cloud platforms like AWS, DigitalOcean, GCP, Paperspace or Packet.net. Even though these platforms are rather well documented, there comes a point where you would like to have more insights into the inner workings of a cloud platform. Unfortunately, not too many of use have permission to walk into a Google data center and dive into their setup, but we can install and study one of the most relevant open source cloud platforms – OpenStack.

What is OpenStack?

OpenStack is an open source project (or, more precisely, a collection of projects) aiming at providing a state-of-the-art cloud platform. Essentially, OpenStack contains everything that you need to convert a set of physical servers into a cloud. There are components that interact with a hypervisor like KVM to build and run virtual machines, components to define and operate virtual networks and virtual storage, components to maintain images, a set of APIs to operate your cloud and a web-based graphical user interface.

OpenStack has been launched by Rackspace and NASA in 2010, but is currently supported by a large number of organisations. Some commercially supported OpenStack distributions are available, like RedHat OpenStack, Lenovo Thinkcloud or VMWare Integrated OpenStack. The software is written in Python, which for me was one of the reasons why I have decided to dive into OpenStack instead of one of the other open source cloud platforms like OpenNebula or Apache Cloudstack.

New releases of OpenStack are published every six months. This and the following posts use the Stein release from April 2019 and Ubuntu 18.04 Bionic as the underlying operating system.

OpenStack architecture basics

OpenStack is composed of a large number of components and services which initially can be a bit confusing. However, a minimal OpenStack installation only requires a hand-full of services which are displayed in the diagram below.

At the lowest layer, there are a couple of components that are used by OpenStack but provided by other open source projects. First, as OpenStack is written in Python, it uses the WSGI specification to expose its APIs. Some services come with their own WSGI container, others use Apache2 for that purpose.

Then, of course, OpenStack needs to persist the state of instances, networks and so forth somewhere and therefore relies on a relational database which by default is MariaDB (but could also be PostgreSQL, and in fact, every database that works with SQLAlchemy should do). Next, the different components of an OpenStack service communicate with each other via message queues provided by RabbitMQ and store data temporarily in Memcached. And finally, there is of course the hypervisor which by default is KVM.

On top of these infrastructure components, there are OpenStack services that lay the foundations for the compute, storage and network components. The first of these services is Keystone which provides identity management and a service catalog. All end user and all other services are registered as user with Keystone, and Keystone is handing out tokens so that these users can access the APIs of the various OpenStack services.

Then, there is the Glance image service. Glance allows an administrator to import OS images for use with virtual machines in the cloud, similar to a Docker registry for Docker images. The third of these intermediate services is the placement service which used to be a part of Nova and is providing information on available and used resources so that OpenStack can decide where a virtual machine should be scheduled.

On the upper layer, we have the services that make up the heart of OpenStack. Nova is the compute service, responsible for interacting with the hypervisor to bring up and maintain virtual machines. Neutron is creating virtual networks so that these virtual machines can talk to each other and the outside world. And finally, Cinder (which is not absolutely needed in a minimum installation) is providing block storage.

There are a couple of services that we have not represented in this picture, like the GUI Horizon or the bare-metal service Ironic. We will not discuss Ironic in this series and we will set up Horizon, but mostly use the API.

OpenStack offers quite a bit of flexibility as to how these services are distributed among physical nodes. We can not only distribute these services, but can even split individual services and distribute them across several physical nodes. Neutron, for instance, consists of a server and several agents, and typically these agents are installed on each compute node. Over time, we will look into more complex setups, but for our first steps, we will use a setup where there is a single controller node holding most of the Nova services and one or more compute nodes on which parts of the Nova service and the Neutron service are running.

In a later lab, we will build up an additional network host that runs a part of the Neutron network services, to demonstrate how this works.

Organisation of the upcoming series

In the remainder of this series, I will walk you through the installation of OpenStack in a virtual environment. But the main purpose of this exercise is not to simply have a working installation of OpenStack – if you want this, you could as well use one of the available installation methods like DevStack. Instead, the idea is to understand a bit what is going on behind the scenes – the architecture, the main configuration options, and here and then a little deep-dive into the source code.

To achieve this, we will discuss each service, its overall architecture, some use cases and the configuration steps, starting with the basic setup and ending with the Neutron networking service (on which I will put a certain focus out of interest). To turn this into a hands-on experience, I will guide you through a sequence of labs. In each lab, we will do some exercises and see OpenStack in action. Here is my current plan how the series will be organized.

Environment and base services
The Keystone identity service
A deep-dive into Keystone: tokens, roles and policies
The Glance image service and the Placement service
The Nova compute service
A deep-dive into the provisioning process with Nova
Neutron basics – components and architecture
Setting up Neutron and running our first instances
Provider networks: flat networks and VLAN networks with Neutron
Project and overlay networks
Separating the network node
Cinder: architecture, installation and deep-dives
Octavia: load balancer as-a-service with OpenStack
Moving the entire setup into the cloud – running OpenStack on Packet, GCE or DigitalOcean

As always, the code for this series is available on GitHub. Most of the actual setup will be fully automated using Vagrant and Ansible. We will simulate the individual nodes as virtual machines using VirtualBox, but it should not be difficult to adapt this to the hypervisor of your choice. And finally, the setup is flexible enough to work on a sufficiently well equipped desktop PC as well as in the cloud.

After this general overview, let us now get started. In the next post, we will dive right into our first lab and install the base services that OpenStack needs.

Virtual networking labs – Open vSwitch in practice

In the last post, we have discussed the architecture of Open vSwitch and how it with a control plane to realize an SDN. Today, we will make this a bit more tangible by running two hands-on labs with OVS.

The labs in this post are modelled after some of the How-to documents that are part of the Open vSwitch documentation, but use a combination of virtual machines and Docker to avoid the need for more than one physical machine. In both labs, we bring up two virtual machines which are connected via a VirtualBox virtual network, and inside each machine, we bring up two Docker containers that will eventually interact via OVS bridges.

Lab 11: setting up an overlay network with Open vSwitch

In the first lab, we will establish interaction between the OVS bridges on the two involved virtual machines using an overlay network. Specifically, the Docker containers on each VM will be connected to an OVS bridge, and the OVS bridges will use VXLAN to talk to each other, so that effectively, all Docker containers appear to be connected to an Ethernet network spanning the two virtual machines.

Instead of going through all steps required to set this up, we will again bring up the machines automatically using a combination of Vagrant and Ansible, and then discuss the major steps and the resulting setups. To run the lab, you will again have to download the code from my repository and start Vagrant.

git clone https://github.com/christianb93/networking-samples
cd lab11
vagrant up

While this is running, let us quickly discuss what the scripts are doing. First, of course, we create two virtual machines, each running Ubuntu Bionic. In each machine, we install Open vSwitch and Docker. We then install the docker Python3 module to make Ansible happy.

Next, we bring up two Docker containers, each running an image which is based on NGINX but has some networking tools installed on top. For each container, we set up a pair of two VETH devices. One of the devices is then moved into the networking namespace of the container, and one of the two devices will later be added to our bridge, so that these VETH device pairs effectively operate like an Ethernet cable connecting the containers to the bridge.

We then create the OVS bridge. In the Ansible script, we use the Ansible OVS module to do this, but if you wanted to create the bridge manually, you would use a command like

ovs-vsctl add-br myBridge \
           -- add-port myBridge web1_veth1 \
           -- add-port myBridge web2_veth1

This is actually a combination of three commands (i.e updates on the OVSDB database) which will be run in one single transaction (the OVS CLI uses the double dashes to combine commands into one transaction). With the first part of the command, we create a virtual OVS bridge called myBridge. With the second and third line, we then add two ports, connected to the two VETH pairs that we have created earlier.

Once the bridge exists and is connected to the containers, we add a third port, which is a VLXAN port, which, using a manual setup, would be the result of the following commands.

ovs-vsctl add-port myBridge vxlan0 \\
          -- set interface vxlan0 type=vxlan options:remote_ip=192.168.50.4  options:dst_port=4789 options:ttl=5

Again, we atomically add the port to the bridge and pass the VXLAN options. We set up the VTEP as a point-to-point connection to the second virtual machine, using the standard UDP port and a TTL of five to avoid that UDP packets get lost.

Finally, we configure the various devices and assign IP addresses. To configure the devices in the container namespaces, we could attach to the containers, but it is easier to use netns to run the required commands within the container namespaces.

Once the setup is complete, we are ready to explore the newly created machines. First, use vagrant ssh boxA to log into boxA. From there, use Docker exec to attach to the first container.

sudo docker exec -it web1 "/bin/bash"

You should now be able to ping all other containers, using the IP addresses 172.16.0.2 – 172.16.0.4. If you run arp -n inside the container, you will also find that all three IP addresses are directly resolved into MAC addresses and are actually present on the same Ethernet segment.

To inspect the bridges that OVS has created, exit the container again so that we are now back in the SSH session on boxA and use the command line utility ovs-vsctl to list all bridges.

sudo ovs-vsctl list-br

This will show us one bridge called myBridge, as expected. To get more information, run

sudo ovs-vsctl show

This will print out the full configuration of the current OVS node. The output should look similar to the following snippet.

af3a2230-3d2a-4364-a0b9-1da4e32211e4
    Bridge myBridge
        Port "web2_veth1"
            Interface "web2_veth1"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {dst_port="4789", remote_ip="192.168.50.5", ttl="5"}
        Port "web1_veth1"
            Interface "web1_veth1"
        Port myBridge
            Interface myBridge
                type: internal
    ovs_version: "2.9.2"

We can see that the output nicely reflects the structure of our network. There is one bridge, with three ports – the two VETH ports and the VXLAN port. We also see the parameters of the VXLAN ports that we have specified during creation. It is also possible to obtain the content of the OVSDB tables that correspond to the various objects in JSON format.

sudo ovs-vsctl list bridge
sudo ovs-vsctl list port
sudo ovs-vsctl list interface

Lab 12: VLAN separation with Open vSwitch

In this lab, we will use a setup which is very similar to the previous one, but with the difference that we use layer 2 technology to span our network across the two virtual machines. Specifically, we establish two VLANs with ids 100 (containing web1 and web3) and 200 (containing the other two containers). On those two logical Ethernet networks, we establish two different layer 3 networks – 192.168.50.0/24 and 192.168.60.0/24.

The first part of the setup – bringing up the containers and creating the VETH pairs – is very similar to the previous labs. Once this is done, we again set up the two bridges. On boxA, this would be done with the following sequence of commands.

sudo ovs-vsctl add-br myBridge
sudo ovs-vsctl add-port myBridge enp0s8
sudo ovs-vsctl add-port myBridge web1_veth1 tag=100
sudo ovs-vsctl add-port myBridge web2_veth1 tag=200

This will create a new bridge and first add the VM interface enp0s8 to it. Note that by default, every port added to OVS is a trunk port, i.e. the traffic will carry VLAN tags. We then add the two VETH ports with the additional parameter tag which will mark the port as an access port and define the corresponding VLAN ID.

Next we need to fix our IP setup. We need to remove the IP address from the enp0s8 as this is now part of our bridge, and set the IP address for the two VETH devices inside the containers.

sudo ip addr del 192.168.50.4 dev enp0s8
web1PID=$(sudo docker inspect --format='{{.State.Pid}}' web1)
sudo nsenter -t $web1PID -n ip addr add  192.168.50.1/24 dev web1_veth0
web2PID=$(sudo docker inspect --format='{{.State.Pid}}' web2)
sudo nsenter -t $web2PID -n ip addr add  192.168.60.2/24 dev web2_veth0

Finally, we need to bring up the devices.

sudo nsenter -t $web1PID -n ip link set  web1_veth0 up
sudo nsenter -t $web2PID -n ip link set  web2_veth0 up
sudo ip link set web1_veth1 up
sudo ip link set web2_veth1 up

The setup of boxB proceeds along the following lines. In the lab, we again use Ansible scripts to do all this, but if you wanted to do it manually, you would have to run the following on boxB.

sudo ovs-vsctl add-br myBridge
sudo ovs-vsctl add-port myBridge enp0s8
sudo ovs-vsctl add-port myBridge web3_veth1 tag=100
sudo ovs-vsctl add-port myBridge web4_veth1 tag=200
sudo ip addr del 192.168.50.5 dev enp0s8
web3PID=$(sudo docker inspect --format='{{.State.Pid}}' web3)
sudo nsenter -t $web3PID -n ip addr add  192.168.50.3/24 dev web3_veth0
web4PID=$(sudo docker inspect --format='{{.State.Pid}}' web4)
sudo nsenter -t $web4PID -n ip addr add  192.168.60.4/24 dev web4_veth0
sudo nsenter -t $web3PID -n ip link set  web3_veth0 up
sudo nsenter -t $web4PID -n ip link set  web4_veth0 up
sudo ip link set web3_veth1 up
sudo ip link set web4_veth1 up

Instead of manually setting up the machines, I have of course again composed a couple of Ansible scripts to do all this. To try this out, run

git clone https://github.com/christianb93/networking-samples
cd lab12
vagrant up

Now log into one of the boxes, say boxA, attach to the web1 container and try to ping web3 and web4.

vagrant ssh boxA
sudo docker exec -it web1 /bin/bash
ping 192.168.50.3
ping 192.168.60.4

You should see that you can get a connection to web3, but not to web4. This is of course what we expect, as the VLAN tagging is supposed to separate the two networks. To see the VLAN tags, open a second session on boxA and enter

sudo tcpdump -e -i enp0s8

When you now repeat the ping, you should see that the traffic generated from within the container web1 carries the VLAN tag 100. This is because the port to which enp0s8 is attached has been set up as a trunk port. If you stop the dump and start it again, but this time listening on the device web1_veth1 which we have added to the bridge as an access port, you should see that no VLAN tag is present. Thus the bridge operates as expected by adding the VLAN tag according to the tag of the access port on which the traffic comes in.

In the next post, we will start to explore another important feagure of OVS – controlling traffic using flows.

Virtual networking labs – a short introduction to Open vSwitch

In the previous posts, we have used standard Linux tools to establish and configure our network interfaces. This is nice, but becomes very difficult to manage if you need to run environments with hundreds or even thousands of machines. Open vSwitch (OVS) is an Open source software switch which can be integrated with SDN control planes and cloud management software. In this post, we will look a bit at the theoretical background of OVS, leaving the practical implementation of some examples to the next post.

Some terms from the world of software defined networks

It is likely that you have heard the magical word SDN before, and it is also quite likely that you have already found that giving a precise meaning to this term is hard. Still, there is a certain agreement that one of the core ideas of SDN is to separate data flow through your networking devices from the and networking configuration.

In a traditional data center, your network would be implemented by a large number of devices like switches and routers. Each of these devices holds some configuration and typically has a way to change that configuration remotely. Thus, the configuration is tightly integrated with the networking infrastructure, and making sure that the entire configuration is consistent and matches the desired state of your network is hard.

With sofware defined networking, you separate the configuration from the networking equipment and manage it centrally. Thus, the networking equipment handles the flow of data – and is referred to as the data plane or flow plane – while a central component called the control plane is responsible for controlling the flow of data.

This is still a bit vague, but becomes a bit more tangible when we look at an example. Enter Open vSwitch (OVS). OVS is a software switch that turns a Linux server (which we will call a node) into a switch. Technically, OVS is a set of server processes that are installed on each node and that handle the network flow between the interfaces of the node. These nodes together make up the data plane. On top of that, there is a control plane or controller. This controller talks to the individual nodes to make sure the rules that they use to manage traffic (called the flows) are set up accordingly.

To allow controllers and switch nodes to interact, an open standard called OpenFlow has been created which defines a common way to describe flows and to exchange data between the controller and the switches. OVS supports OpenFlow (currently only version 1.1 is supported) and thus can be combined with OpenFlow based controllers like Faucet or Open Daylight, creating a layered architecture as follows. Additionally, a switch can be configured to ask the controller how to handle a packet for which no matching flow can be found.

Here, OVS uses OpenFlow to exchange flows with the controller. To exchange information on the underlying configuration of the virtual bridge (which ports are connected, how are these ports set up, …) OVS provides a second protocol called OVSDB (see below) which can also be used by the control plane to change the configuration of the virtual switch (some people would probably prefer to call the part of the control logic which handles this the management plane in contrast to the control plane, which really handles the data flow only).

Components of Open vSwitch

Let us now dig a little bit into the architecture of OVS itself. Essentially, OVS consists of three components plus a set of command-line interfaces to operate the OVS infrastructure.

First, there is the OVS virtual switch daemon ovs-vswitchd. This is a server process running on the virtual switch and is connected to a socket (usually a Unix socket, unless it needs to communicate with controllers not on the same machine). This component is responsible for actually operating the software defined switch.

Then, there is a state store, in the form of the ovsdb-server process. This process is maintaining the state that is managed by OVS, i.e. the objects like bridges, ports and interfaces that make up the virtual switch, and tables like the flow tables used by OVS. This state is usually kept in a file in JSON format in /etc/openvswitch. The OVSDB connects to the same Unix domain socket as the Switch daemon and uses it to exchange information with the Switch daemon (in the database world, the switch daemon is a client to the OVSDB database server). Other clients can connect to the OVSDB using a JSON based protocol called the OVSDB protocol (which is described in RFC 7047) to retrieve and update information.

The third main component of OVS is a Linux kernel module openvswitch. This module is now part of the official Linux kernel tree and therefore is typically pre-installed. This kernel module handles one part of the OVS data path, sometimes called the fast path. Known flows are handled entirely in kernel space. New flows are handled once in the user space part of the datapath (slow path) and then, once the flow is known, subsequently in the kernel data path.

Finally, there are various command-line interfaces, the most important one being ovs-vsctl. This utility can be used to add, modify and delete the switch components managed by OVS like bridges, port and so forth – more on this below. In fact, this utility operates by making updates to the OVSDB, which are then detected and realized by the OVS switch daemon. So the OVSDB is the leading provider of the target state of the system.

The OVS data model

To understand how OVS operates, it is instructive to look at the data model that describes the virtual switches deployed by OVS. This model is verbally described in the man pages. If you have access to a server on which OVS is installed, you can also get a JSON representation of the data model by running

ovsdb-client get-schema Open_vSwitch

At the top level of the hierarchy, there is a table called Open_vSwitch. This table contains a set of configuration items, like the supported interface types or the version of the database being used.

Next, there are bridges. A bridge has one or more ports and is associated with a set of tables, each table representing a protocol that OVS supports to obtain flow information (for instance NetFlow or OpenFlow). Note that the Flow_Table does not contain the actual OpenFlow flow table entries, but just additional configuration items for a flow table. In addition, there are mirror ports which are used to trace and monitor the network traffic (which we ignore in the diagram below).

Each port refers to one or more interfaces. In most situations, each port has one interface, but in case of bonding, for instance, one port is supported by two interfaces. In addition, a port can be associated with QoS settings and queue for traffic control.

Finally, there are controllers and managers. A controller, in OVS terminology, is some external system which talks to OVS via OpenFlow to control the flow of packets through a bridge (and thus is associated with a bridge). A manager, on the other hand, is an external system that uses the OVSDB protocol to read and update the OVSDB. As the OVS switch daemon constantly polls this database for changes, a manager can therefore change the setup, i.e. add or remove bridges, add or remove ports and so on – like a remote version of the ovs-vsctl utility. Therefore, managers are associated with the overall OVS instance.

Installation and first steps with OVS

Before we get into the actual labs in the next post, let us see how OVS can be installed, and let us use OVS to create a simple bridge in order to get used to the command line utilities.

On an Ubuntu distribution, OVS is available as a collection of APT packages. Usually, it should be sufficient to install openvswitch-switch, which will pull in a few additional dependencies. There are similar packages for other Linux distributions.

Once the installation is complete, you should see that two new server processes are running, called (as you might expect from the previous sections) ovsdb-server and ovs-vswitchd. To try out that everything worked, you can now run the ovs-vsctl utility to display the current configuration.

$ ovs-vsctl show
518a2ed6-56d0-433f-98f2-a575daf20f72
    ovs_version: "2.9.2"

The output is still very short, as we have not yet defined any objects. What it shows you is, in fact, an abbreviated version of the one and only entry in the Open_vSwitch table, which shows the unique row identifier (UUID) and the OVS version installed.

Now let us populate the database by creating a bridge, currently without any ports attached to it. Run

sudo ovs-vsctl add-br myBridge

When we now inspect the current state again using ovs-vsctl show, the output should look like this.

518a2ed6-56d0-433f-98f2-a575daf20f72
    Bridge myBridge
        Port myBridge
            Interface myBridge
                type: internal
    ovs_version: "2.9.2"

Note how the output reflects the hierarchical structure of the database. There is one bridge, and attached to this bridge one port (this is the default port which is added to every bridge, similarly to a Linux bridge where creating a bridge also creates a device that has the same name as the bridge). This port has only one interface of type “internal”. If you run ifconfig -a, you will see that OVS has in fact created a Linux networking device myBridge as well. If, however, you run ethtool -i myBridge, you will find that this is not an ordinary bridge, but simply a virtual device managed by the openvswitch driver.

It is interesting to directly inspect the content of the OVSDB. You could either do this by browsing the file /etc/openvswitch/conf.db, or, a bit more conveniently, using the ovsdb-client tool.

sudo ovsdb-client dump Open_vSwitch

This will provide a nicely formatted dump of the current database content. You will see one entry in the Bridge table, representing the new bridge, and corresponding entries in the Port and Interface table.

This closes our post for today. In the next post, we will setup an example (again using Vagrant and Ansible to do all the heavy lifting) in which we connect containers on different virtual machines using OVS bridges and a VXLAN tunnel. In the meantime, you might want to take a look at the following references which I found helpful.

Click to access OpenVSwitch.pdf

https://github.com/openvswitch/ovs
https://tools.ietf.org/html/rfc7047

Using Terraform and Ansible to manage your cloud environments

Terraform is one of the most popular open source tools to quickly provision complex cloud based infrastructures, and Ansible is a powerful configuration management tool. Time to see how we can combine them to easily create cloud based infastructures in a defined state.

Terraform – a quick introduction

This post is not meant to be a full fledged Terraform tutorial, and there are many sources out there on the web that you can use to get familiar with it quickly (the excellent documentation itself, for instance, which is quite readable and quickly gets to the point). In this section, we will only review a few basic facts about Terraform that we will need.

When using Terraform, you declare the to-be state of your infrastructure in a set of resource definitions using a declarative language known as HCL (Hashicorp configuration language). Each resource that you declare has a type, which could be something like aws_instance or packet_device, which needs to match a type supported by a Terraform provider, and a name that you will use to reference your resource. Here is an example that defines a Packet.net server.

resource "packet_device" "web1" {
  hostname         = "tf.coreos2"
  plan             = "t1.small.x86"
  facilities       = ["ewr1"]
  operating_system = "coreos_stable"
  billing_cycle    = "hourly"
  project_id       = "${local.project_id}"
}

In addition to resources, there are a few other things that a typical configuration will contain. First, there are providers, which are an abstraction for the actual cloud platform. Each provider is enabling a certain set of resource types and needs to be declared so that Terraform is able to download the respective plugin. Then, there are variables that you can declare and use in your resource definitions. In addition, Terraform allows you to define data sources, which represent queries against the cloud providers API to gather certain facts and store them into variables for later use. And there are objects called provisioners that allow you to run certain actions on the local machine or on the machine just provisioned when a resource is created or destroyed.

An important difference between Terraform and Ansible is that Terraform also maintains a state. Essentially, the state keeps track of the state of the infrastructure and maps the Terraform resources to actual resources on the platform. If, for instance, you define a EC2 instance my-machine as a Terraform resource and ask Terraform to provision that machine, Terraform will capture the EC2 instance ID of this machine and store it as part of its state. This allows Terraform to link the abstract resource to the actual EC2 instance that has been created.

State can be stored locally, but this is not only dangerous, as a locally stored state is not protected against loss, but also makes working in teams difficult as everybody who is using Terraform to maintain a certain infrastructure needs access to the state. Thereform, Terraform offers backends that allow you to store state remotely, including PostgreSQL, S3, Artifactory, GCS, etcd or a Hashicorp managed service called Terraform cloud.

A first example – Terraform and DigitalOcean

In this section, we will see how Terraform can be used to provision a droplet on DigitalOcean. First, obviously, we need to install Terraform. Terraform comes as a single binary in a zipped file. Thus installation is very easy. Just navigate to the download page, get the URL for your OS, run the download and unzip the file. For Linux, for instance, this would be

wget https://releases.hashicorp.com/terraform/0.12.10/terraform_0.12.10_linux_amd64.zip
unzip terraform_0.12.10_linux_amd64.zip

Then move the result executable somewhere into your path. Next, we will prepare a simple Terraform resource definition. By convention, Terraform expects resource definitions in files with the extension .tf. Thus, let us create a file droplet.tf with the following content.

# The DigitalOcean oauth token. We will set this with the
#  -var="do_token=..." command line option
variable "do_token" {}

# The DigitalOcean provider
provider "digitalocean" {
  token = "${var.do_token}"
}

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
  image  = "ubuntu-18-04-x64"
  name   = "droplet0"
  region = "fra1"
  size   = "s-1vcpu-1gb"
  ssh_keys = []
}

When you run Terraform in a directory, it will pick up all files with that extension that are present in this directory (Terraform treats directories as modules with a hierarchy starting with the root module). In our example, we make a reference to the DigitalOcean provider, passing the DigitalOcean token as an argument, and then declare one resource of type digitalocean_droplet called droplet0

Let us try this out. When we use a provider for the first time, we need to initialize this provider, which is done by running (in the directory where your resource definitions are located)

terraform init

This will detect all required providers and download the respective plugins. Next, we can ask Terraform to create a plan of the changes necessary to realize the to-be state described in the resource definition. To see this in action, run

terraform plan -var="do_token=$DO_TOKEN"

Note that we use the command line switch -var to define the value of the variable do_token which we reference in our resource definition to allow Terraform to authorize against the DigitalOcean API. Here, we assume that you have stored the DigitalOcean token in an environment variable called DO_TOKEN.

When we are happy with the output, we can now finally ask Terraform to actually apply the changes. Again, we need to provide the DigitalOcean token. In addition, we also use the switch -auto-approve, otherwise Terraform would ask us for a confirmation before the changes are actually applied.

terraform apply -auto-approve -var="do_token=$DO_TOKEN"

If you have a DigitalOcean console open in parallel, you can see that a droplet is actually being created, and after less than a minute, Terraform will complete and inform you that a new resource has been created.

We have mentioned above that Terraform maintains a state, and it is instructive to inspect this state after a resource has been created. As we have not specified a backend, Terraform will keep the state locally in a file called terraform.tfstate that you can look at manually. Alternatively, you can also use

terraform state pull

You will see an array resources with – in our case – only one entry, representing our droplet. We see the name and the type of the resource as specified in the droplet.tf file, followed by a list of instances that contains the (provider specific) details of the actual instances that Terraform has stored.

When we are done, we can also use Terraform to destroy our resources again. Simply run

terraform destroy -auto-approve -var="do_token=$DO_TOKEN"

and watch how Terraform brings down the droplet again (and removes it from the state).

Now let us improve our simple resource definition a bit. Our setup so far did not provision any SSH keys, so to log into the machine, we would have to use the SSH password that DigitalOcean will have mailed you. To avoid this, we again need to specify an SSH key name and to get the corresponding SSH key ID. First, we define a Terraform variable that will contain the key name and add it to our droplet.tf file.

variable "ssh_key_name" {
  type = string
  default = "do-default-key"
}

This definition has three parts. Following the keyword variable, we first specify the variable name. We then tell Terraform that this is a string, and we provide a default. As before, we could override this default using the switch -var on the command line.

Next, we need to retrieve the ID of the key. The DigitalOcean plugin provides the required functionality as a data source. To use it, add the following to our resource definition file.

data "digitalocean_ssh_key" "ssh_key_data" {
  name = "${var.ssh_key_name}"
}

Here, we define a data source called ssh_key_data. The only argument to that data source is the name of the SSH key. To provide it, we use the template mechanism that Terraform provides to expand variables inside a string to refer to our previously defined variable.

The data source will then use the DigitalOcean API to retrieve the key details and store them in memory. We can now access the SSH key to add it to the droplet resource definition as follows.

# Create a droplet
resource "digitalocean_droplet" "droplet0" {
...  
ssh_keys = [ data.digitalocean_ssh_key.ssh_key_data.id ]
}

Note that the variable reference consists of the keyword data to indicate that is has been populated by a data source, followed by the type and the name of the data source and finally the attribute that we refer to. If you now run the example again, we will get a machine with an SSH key so that we can access it via SSH as usual (run terraform state pull to get the IP address).

Having to extract the IP manually from the state file is not nice, it would be much more convenient if we could ask the Terraform to somehow provide this as an output. So let us add the following section to our file.

output "instance_ip_addr" {
  value = digitalocean_droplet.droplet0.ipv4_address
}

Here, we define an output called instance_ip_addr and populate it by referring to a data item which is exported by the DigitalOcean plugin. Each resource type will have its own list of exported variables which you can find in the documentation, and you can only refer to one of these variables. If we now run Terraform again, it will print the output upon completion.

We can also create more than one instance at a time by adding the keyword count to the resource definition. When defining the output, we will now have to refer to the instances as an array using the splat expression syntax to refer to an entire array. It is also advisible to move the entire instance configuration into variables and to split out variable definitions into a separate file.

Using Terraform with a non-local state

Using Terraform with a local state can be problematic – it could easily get lost or overwritten and makes working in a team or at different locations difficult. Therefore, let us quickly look at an example to set up Terraform with non-local state. Among the many available backends, we will use PostgreSQL.

Thanks to docker, it is very easy to bring up a local PostgreSQL instance. Simply run

docker run --name tf-backend -e POSTGRES_PASSWORD=my-secret-password -d postgres

Note that we do not map the PostgreSQL port here for security reasons, so we will need to figure out the IP of the resulting Docker container and use it in the Terraform configuration. The IP can be retrieved with the following command.

docker inspect tf-backend | jq -r '.[0].NetworkSettings.IPAddress'

In my example, the Docker container is running at 172.17.0.2. Next, we will have to create a database for the Terraform backend. Assuming that you have the required client packages installed, run

createdb --host=172.17.0.2 --username=postgres terraform_backend

and enter the password my-secret-password specified above when starting the Docker container. We will use the PostgreSQL superuser for our example, in a real life example you would of course first create your own user for Terraform and use it going forward.

Next, we add the backend definition to our Terraform configuration using the following section.

# The backend - we use PostgreSQL.
terraform {
  backend "pg" {
  }
}

You might expect that we also need to provide the location of the database (i.e. a connection string) and credentials. In fact, we could do this, but this would imply that the credentials which are part of the connection string would be present in the file in clear text. We therefore use a partial configuration and later supply the missing data on the command line.

Finally, we have to re-run terraform init to make the new backend known to Terraform and create a new workspace. Terraform will detect the change and even offer you to automatically migrate the state into the new backend.

terraform init -backend-config="conn_str=postgres://postgres:my-secret-password@172.17.0.2/terraform_backend?sslmode=disable"
terraform workspace new default

Be careful – as our database is running in a container with ephemeral storage, the state will be lost when we destroy the container! In our case, this would not be a desaster, as we would still be able to control our small playground environment manually. Still, it is a good idea to tag all your servers so that you can easily identify the servers managed by Terraform. If you intend to use Terraform more often, you might also want to spin up a local PostgreSQL server outside of a container (here is a short Ansible script that will do this for you and create a default user terraform with password being equal to the user name). Note that with a non-local state, Terraform will still store a reference to your state locally (look at the directory .terraform), so that when you work from a different directory, you will have to run the init command again. Also note that this will store your database credentials locally on disk in clear text.

Combining Terraform and Ansible

After this very short introduction into Terraform, let us now discuss how we can combine Terraform to manage our cloud environment with Ansible to manage the machines. Of course there are many different options, and I have not even tried to create an exhaustive list. Here are the options that I did consider.

First, you could operate Terraform and Ansible independently and use a dynamic inventory. Thus, you would have some Terraform templates and some Ansible playbooks and, when you need to verify or update your infrastructure, first run Terraform and then Ansible. The Ansible playbooks would use provider-specific inventory scripts to build a dynamic inventory and operate on this. This setup is simple and allows you to manage the Terraform configuration and your playbooks without any direct dependencies.

However, there is a potential loss of information when working with inventory scripts. Suppose, for instance, that you want to bring up a configuration with two web servers and two database servers. In a Terraform template, you would then typically have a resource “web” with two instances and a resource “db” with two instances. Correspondingly, you would want groups “web” and “db” in your inventory. If you use inventory scripts, there is no direct link between Terraform resources and cloud resources, and you would have to use tags to provide that link. Also, things easily get a bit more complicated if your configuration uses more than one cloud provider. Still, this is a robust method that seems to have some practical uses.

A second potential approach is to use Terraform provisioners to run Ansible scripts on each provisioned resource. If you decide to go down this road, keep in mind that provisioners only run when a resource is created or destroyed, not when it changes. If, for instance, you change your playbooks and simply run Terraform again, it will not trigger any provisioners and the changed playbooks will not apply. There are approaches to deal with this, for instance null resources, but this is difficult to control and the Terraform documentation itself does not advocate the use of provisioners.

Next, you could think about parsing the Terraform state. Your primary workflow would be coded into an Ansible playbook. When you execute this playbook, there is a play or task which reads out the Terraform state and uses this information to build a dynamic inventory with the add_host module. There are a couple of Python scripts out there that do exactly this. Unfortunately, the structure of the state is still provider specific, a DigitalOcean droplet is stored in a structure that is different from the structure used for a AWS EC2 instance. Thus, at least the script that you use for parsing the state is still provider specific. And of course there are potential race conditions when you parse the state while someone else might modify it, so you have to think about locking the state while working with it.

Finally, you could combine Terraform and Ansible with a tool like Packer to create immutable infrastructures. With this approach, you would use Packer to create images, supported maybe by Ansible as a provisioner. You would then use Terraform to bring up an infrastructure using this image, and would try to restrict the in-place configurations to an absolute minimum.

The approach that I have explored for this post is not to parse the state, but to trigger Terraform from Ansible and to parse the Terraform output. Thus our playbook will work as follows.

Use the Terraform module that comes with Ansible to run a Terraform template that describes the infrastructure
Within the Terraform template, create an output that contains the inventory data in a provider independent JSON format
Back in Ansible, parse that output and use add_host to build a corresponding dynamic inventory
Continue with the actual provisioning tasks per server in the playbook

Here, the provider specific part is hidden in the Terraform template (which is already provider specific by design), where the data passed back to Ansible and the Ansible playbook at least has a chance to be provider agnostic.

Similar to a dynamic inventory script, we have to reach out to the cloud provider once to get the current state. In fact, behind the scenes, the Terraform module will always run a terraform plan and then check its output to see whether there is any change. If there is a change, it will run terraform apply and return its output, otherwise it will fall back to the output from the last run stored in the state by running terraform output. This also implies that there is again a certain danger of running into race conditions if someone else runs Terraform in parallel, as we release the lock on the Terraform state once this phase has completed.

Note that this approach has a few consequences. First, there is a structural coupling between Terraform and Ansible. Whoever is coding the Terraform part needs to prepare an output in a defined structure so that Ansible is happy. Also the user running Ansible needs to have the necessary credentials to run Terraform and access the Terraform state. In addition, Terraform will be invoked every time when you run the Ansible playbook, which, especially for larger infrastructures, slows down the processing a bit (looking at the source code of the Terraform module, it would probably be easy to add a switch so that only terraform output is run, which should provide a significant speedup).

Getting and running the sample code

Let us now take a look at a sample implementation of this idea which I have uploaded into my Github repository. The code is organized into a collection of Terraform modules and Ansible roles. The example will bring up two web servers on DigitalOcean and two database servers on AWS EC2.

Let us first discuss the Terraform part. There are two modules involved, one module that will create a droplet on DigitalOcean and one module that will create an EC2 instance. Each module returns, as an output, a data structure that contains one entry for each server, and each entry contains the inventory data for that server, for instance

inventory = [
  {
    "ansible_ssh_user" = "root"
    "groups" = "['web']"
    "ip" = "46.101.162.72"
    "name" = "web0"
    "private_key_file" = "~/.ssh/do-default-key"
  },
...

This structure can easily be assembled using Terraforms for-expressions. Assuming that you have defined a DigitalOcean resource for your droplets, for instance, the corresponding code for DigitalOcean is

output "inventory" {
  value = [for s in digitalocean_droplet.droplet[*] : {
    # the Ansible groups to which we will assign the server
    "groups"           : var.ansibleGroups,
    "name"             : "${s.name}",
    "ip"               : "${s.ipv4_address}",
    "ansible_ssh_user" : "root",
    "private_key_file" : "${var.do_ssh_private_key_file}"
  } ]
}

The EC2 module is structured similarly and delivers output in the same format (note that there is no name exported by the AWS provider, but we can use a tag to capture and access the name).

In the main.tf file, we then simply invoke each of the modules, concatenate their outputs and return this as Terraform output.

This output is then processed by the Ansible role terraform. Here, we simply iterate over the output (which is in JSON format and therefore recognized by Ansible as a data structure, not just a string), and for each item, we create a corresponding entry in the dynamic inventory using add_host.

The other roles in the repository are straightforward, maybe with one exception – the role sshConfig which will add entries for all provisioned hosts in an SSH configuration file ~/.ssh/ansible_created_config and include this file in the users main SSH configuration file. This is not necessary for things to work, but will allow you to simply type something like

ssh db0

to access the first database server, without having to specify IP address, user and private key file.

A note on SSH. When I played with this, I hit upon this problem with the Gnome keyring daemon. The daemon adds identities to your SSH agent, which are attempted every time Ansible tries to log into one of the machines. If you work with more than a few SSH identities, we will therefore exceed the number of failed SSH login attempts configured in the Ubuntu image, and will not be able to connect any more. The solution for me was to disable the Gnome keyring at startup.

In order to run the sample, you will first have to prepare a few credentials. We assume that

You have AWS credentials set up to access the AWS API
The DigitalOcean API key is stored in the environment variable DO_TOKEN
There is a private key file ~/.ssh/ec2-default-key.pem matching an SSH key on EC2 called ec2-default-key
Similarly, there is a private key file ~/.ssh/do-default-key that belongs to a public key uploaded to DigitalOcean as do-default-key
There is a private / public SSH key pair ~/.ssh/default-user-key[.pub] which we will use to set up an SSH enabled default user on each machine

We also assume that you have a Terraform backend up and running and have run terraform init to connect to this backend.

To run the examples, you can then use the following steps.

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/terraform
terraform init
ansible-playbook site.yaml

The extra invocation of terraform init is required because we introduce two new modules that Terraform needs to pre-load, and because you might not have the provider plugins for DigitalOcean and AWS on your machine. If everything works, you will have a fully provisioned environment with two servers on DigitalOcean with Apache2 installed and two servers on EC2 with PostgreSQL installed up and running in a few minutes. If you are done, do not forget to shut down everything again by running

terraform  destroy -var="do_token=$DO_TOKEN"

Also I highly recommend to use the DigitalOcean and AWS console to manually check that no servers are left running to avoid unnecessary cost and to be prepared for the case that your state is out-of-sync.

Automating provisioning with Ansible – building cloud environments

When you browse the module list of Ansible, you will find that the by far largest section is the list of cloud modules, i.e. modules to interact with a cloud environments to define, provision and maintain objects like virtual machines, networks, firewalls or storage. These modules make it easy to access the APIs of virtually every existing cloud provider. In this post, we will look at an example that demonstrates the usage of these modules for two cloud platforms – DigitalOcean and AWS.

Using Ansible with DigitalOcean

Ansible comes with several modules that are able to connect to the DigitalOcean API to provision machines, manage keys, load balancers, snapshots and so forth. Here, we will need two modules – the module digital_ocean_droplet to deal with individual droplets, and digital_ocean_sshkey_facts to manage SSH keys.

Before getting our hands dirty, we need to talk about credentials first. There are three types of credentials involved in what follows.

To connect to the DigitalOcean API, we need to identify ourselves with a token. Such a token can be created on the DigitalOcean cloud console and needs to be stored safely. We do not want to put this into a file, but instead assume that you have set up an environment variable DO_TOKEN containing its value and access that variable.
The DigitalOcean SSH key. This is the key that we create once and upload (its public part, of course) to the DigitalOcean website. When we provision our machines later, we pass the name of the key as a parameter and DigitalOcean will automatically add this public key to the authorized_keys file of the root user so that we can access the machine via SSH. We will also use this key to allow Ansible to connect to our machines.
The user SSH key. As in previous examples, we will, in addition, create a default user on each machine. We need to create a key pair and also distribute the public key to all machines.

Let us first create these keys, starting with the DigitalOcean SSH key. We will use the key file name do-default-key and store the key in ~/.ssh on the control machine. So run

ssh-keygen -b 2048 -t rsa -P "" -f ~/.ssh/do-default-key
cat ~/.ssh/do-default-key.pub

to create the key and print out the public key.

Then navigate to the DigitalOcean security console, hit “Add SSH key”, enter the public key, enter the name “do-default-key” and save the key.

Next, we create the user key. Again, we use ssh-keygen as above, but this time use a different file name. I use ~/.ssh/default-user-key. There is no need to upload this key to DigitalOcean, but we will use it later in our playbook for the default user that we create on each machine.

Finally, make sure that you have a valid DigitalOcean API token (if not, go to this page to get one) and put the value of this key into an environment variable DO_TOKEN.

We are now ready to write the playbook to create droplets. Of course, we place this into a role to enable reuse. I have called this role droplet and you can find its code here.

I will not go through the entire code (which is documented), but just mention some of the more interesting points. First, when we want to use the DigitalOcean module to create a droplet, we have to pass the SSH key ID. This is a number, which is different from the key name that we specified while doing the upload. Therefore, we first have to retrieve the key ID. For that purpose, we retrieve a list of all keys by calling the module digital_ocean_sshkey_facts which will write its results into the ansible_facts dictionary.

- name: Get available SSH keys
  digital_ocean_sshkey_facts:
    oauth_token: "{{ apiToken }}"

We then need to navigate this structure to get the ID of the key we want to use. We do this by preparing a JSON query, which we put together in several steps to avoid issues with quoting. First, we apply the query string [?name=='{{ doKeyName }}'], where doKeyName is the variable that holds the name of the DigitalOcean SSH key, to navigate to the entry in the key list representing this key. The result will still be a list from which we extract the first (and typically only) element and finally access its attribute id holding the key ID.

- name: Prepare JSON query string
  set_fact:
    jsonQueryString: "[?name=='{{ doKeyName }}']"
- name: Apply query String to extract matching key data
  set_fact:
      keyData: "{{ ansible_facts['ssh_keys']|json_query(jsonQueryString) }}"
- name: Get keyId
  set_fact:
      keyId: "{{ keyData[0]['id'] }}"

Once we have that, we can use the DigitalOcean Droplet module to actually bring up the droplet. This is rather straightforward, using the following code snippet.

- name: Bring up or stop machines
  digital_ocean_droplet:
    oauth_token: "{{ apiToken }}"
    image: "{{osImage}}"
    region_id: "{{regionID}}"
    size_id: "{{sizeID}}"
    ssh_keys: [ "{{ keyId }}" ]
    state: "{{ targetState }}"
    unique_name: yes
    name: "droplet{{ item }}"
    wait: yes
    wait_timeout: 240
    tags:
    - "createdByAnsible"
  loop:  "{{ range(0, machineCount|int )|list }}"
  register: machineInfo

Here, we go through a loop, with the number of iterations being governed by the value of the variable machineCount, and, in each iteration, bring up one machine. The name of this machine is a fixed part plus the loop variable, i.e. droplet0, droplet1 and so forth. Note the usage of the parameter unique_name. This parameter instructs the module to check that the supplied name is unique. So if there is already a machine with that name, we skip its creation – this is what makes the task idempotent!

Once the machine is up, we finally loop over all provisioned machines once more (using the variable machineInfo) and complete two additional step for each machine. First, we use add_host to add the host to the inventory – this makes it possible to use them later in the playbook. For convenience, we also add an entry for each host to the local SSH configuration, which allows us to log into the machine using a shorthand like ssh droplet0.

Once the role droplet has completed, our main playbook will start a second and a third play. These plays now use the inventory that we have just built dynamically. The first of them is quite simple and just waits until all machines are reachable. This is necessary, as DigitalOcean reports a machine as up and running at a point in time when the SSH daemon might not yet accept conditions. The next and last play simply executes two roles. The first role is the role that we have already put together in an earlier post which simply adds a standard non-root user, and the second role simply uses apt and pip to install a few basic packages.

A few words on security are in order. Here, we use the DigitalOcean standard setup, which gives you a machine with a public IP address directly connected to the Internet. No firewall is in place, and all ports on the machine are reachable from everywhere. This is obviously not a setup that I would recommend for anything else than a playground. We could start a local firewall on the machine by using the ufw module for Ansible as a starting point, and then you would probably continue with some basic hardening measures before installing anything else (which, of course, could be automated using Ansible as well).

Using Ansible with AWS EC2

Let us now see what we need to change when trying the same thing with AWS EC2 instead of Digital Ocean. First, of course, the authorization changes a bit.

The Ansible EC2 module uses the Python Boto library behind the scenes which is able to reuse the credentials of an existing AWS CLI configuration. So if you follow my setup instructions in an earlier post on using Python with AWS, no further setup is required
Again, there is an SSH key that AWS will install on each machine. To create a key, navigate to the AWS EC2 console, select “Key pairs” under “Network and Security” in the navigation bar on the left, create a key called ec2-default-key, copy the resulting PEM file to ~/.ssh/ and set the access rights 0700.
The user SSH key – we can use the same key as before

As the EC2 module uses the Boto3 Python library, you might have to install it first using pip or pip3.

Let us now go through the steps that we have to complete to spin up our servers and see how we realize idempotency. First, to create a server on EC2, we need to specify an AMI ID. AMI-IDs change with every new version of an AMI, and typically, you want to install the latest version. Thus we first have to figure out the latest AMI ID. Here is a snippet that will do this.

- name: Locate latest AMI
  ec2_ami_facts:
    region: "eu-central-1"
    owners: 099720109477
    filters:
      name: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-????????"
      state: "available"
  register: allAMIs
- name: Sort by creation date to find the latest AMI
  set_fact:
    latestAMI: "{{ allAMIs.images | sort(attribute='creation_date') | last }}"

Here, we first use the module ec2_ami_facts to get a list of all AMIs matching a certain expression – in this case, we look for available AMIs for Ubuntu 18.04 for EBS-backed machines with the owner 099720109477 (Ubuntu). We then use a Jinja2 template to sort by creation date and pick the latest entry. The result will be a dictionary, which, among other things, contains the AMI ID as image_id.

Now, as in the DigitalOcean example, we can start the actual provisioning step using the ec2 module. Here is the respective task.

- name: Provision machine
  ec2:
    exact_count: "{{ machineCount }}"
    count_tag: "managedByAnsible"
    image: "{{ latestAMI.image_id }}"
    instance_tags:
      managedByAnsible: true
    instance_type: "{{instanceType}}"
    key_name: "{{ ec2KeyName }}"
    region: "{{ ec2Region }}"
    wait: yes
  register: ec2Result

Idempotency is realized differently on EC2. Instead of using the server name (which Amazon will assign randomly), the module uses tags. When a machine is brought up, the tags specified in the attribute instance_tags are attached to the new instance. The attribute exact_count instructs the module to make sure that exactly this number of instances with this tag is running. Thus if you run the playbook twice, the module will first query all machines with the specified tag, figure out that there is already an instance running and will not create any additional machines.

This simple setup will provision all new machines into the default VPC and use the default security group. This has the disadvantage that in order for Ansible to be able to reach these machines, you will have to open port 22 for incoming connections in the attached security group. The playbook does not do this, but expects that you do this manually using for instance the EC2 console. Again, this is of course not a production-like.

Running the examples

I have put the full playbooks with all the required roles into my GitHub repository. To run the examples, carry out the following steps.

First, prepare the SSH keys and API tokens for EC2 and DigitalOcean as explained in the respective sections. Then, use the EC2 console to allow incoming traffic for port 22 from your local workstation in the default VPC, into which our playbook will launch the machines created on EC2. Next, run the following commands to clone the repository and execute the playbooks (if you have not used the SSH key names and locations as above, you will first have to edit the default variables in the roles ec2Instance and droplet accordingly).

git clone https://github.com/christianb93/ansible-samples
cd partVII
# Turn off strict host key checking
export ANSIBLE_HOST_KEY_CHECKING=False
# Run the EC2 example
ansible-playbook awsSite.yaml
# Run the DigitalOcean example
ansible-playbook doSite.yaml

Once the playbooks complete, you should now be able to ssh into the machine created on DigitalOcean using ssh droplet0 and into the machine created on EC2 using ssh aws0. Do not forget to delete all machines again when you are done to avoid unnecessary cost!

Limitations of our approach

The approach we have followed so far to manage our cloud environments is simple, but has a few drawbacks and limitations. First, the provisioning is done on the localhost in a loop, and therefore sequentially. Even if the provisioning of one machine only takes one or two minutes, this implies that the approach does not scale, as bringing up a large number of machines would take hours. One approach to fix that that I have seen on Stackexchange (unfortunately I cannot find the post any more, so I cannot give proper credit) is to use a static inventory file with a dummy group containing only the host names in combination with connection: local to execute the provisioning locally, but in parallel.

The second problem is more severe. Suppose, for instance, you have already created a droplet and now decide that you want 2GB of memory instead. So you might be tempted to go back, edit (or override) the variable sizeID and re-run the playbook. This will not give you any error message, but in fact, it will not do anything. The reason is that the module will only use the host name to verify that the target state has been reached, not any other attributes of the droplet (take a look at the module source code here to verify this). Thus, we have a successful playbook execution, but a deviation between the target state coded into the playbook and the real state.

Of course this could be fixed by a more sophisticated check – either by extending the module, or by coding this check into the playbook. Alternatively, we could try to leverage an existing solution that has this logic already implemented. For AWS, for instance, you could try to define your target state using CloudFormation templates and then leverage the CloudFormation Ansible module to apply this from within Ansible. Or, if you want to be less provider specific, you might want to give Terraform a try. This is what I decided to do, and we will dive into more details how to integrate Terraform and Ansible in the next post.

Automating provisioning with Ansible – control structures and roles

When building more sophisticated Ansible playbooks, you will find that a linear execution is not always sufficient, and the need for more advanced control structures and ways to organize your playbooks arises. In this post, we will cover the corresponding mechanisms that Ansible has up its sleeves.

Loops

Ansible allows you to build loops that execute a task more than once with different parameters. As a simple example, consider the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop with a static list of items
        debug:
          msg: "{{ item }}"
        loop:
        - A
        - B

This playbook contains only one play. In the play, we first limit the execution to localhost and then use the connection keyword to instruct Ansible not to use ssh but execute commands directly to speed up execution (which of course only works for localhost). Note that more generally, the connection keyword specifies a so-called connection plugin of which Ansible offers a few more than just ssh and local.

The next task has an additional keyword loop. The value of this attribute is a list in YAML format, having the elements A and B. The loop keyword instructs Ansible to run this task twice per host. For each execution, it will assign the corresponding value of the loop variable to the built-in variable item so that it can be evaluated within the loop.

If you run this playbook, you will get the following output.

PLAY [localhost] *************************************
TASK [Gathering Facts]********************************
ok: [localhost]

TASK [Loop with a static list of items] **************
ok: [localhost] => (item=A) => {
    "msg": "A"
}
ok: [localhost] => (item=B) => {
    "msg": "B"
}

So we see that even though there is only one host, the loop body has been executed twice, and within each iteration, the expression {{ item }} evaluates to the corresponding item in the list over which we loop.

Loops are most useful in combination with Jinja2 expressions. In fact, you can iterate over everything that evaluates to a list. The following example demonstrates this. Here, we loop over all environment variables. To do this, we use Jinja2 and Ansible facts to get a dictionary with all environment variables, then convert this to a list using the item() method and use this list as argument to loop.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Loop over all environment variables 
        debug:
          msg: "{{ item.0 }}"
        loop:
         "{{ ansible_facts.env.items() | list }}"
        loop_control:
          label:
            "{{ item.0 }}"

This task also demonstrates loop controls. This is just a set of attributes that we can specify to control the behaviour of the loop. In this case, we set the label attribute which specifies the output that Ansible prints out for each loop iteration. The default is to print the entire item, but this can quickly get messy if your item is a complex data structure.

You might ask yourself why we use the list filter in the Jinja2 expression – after all, the items() method should return a list. In fact, this is only true for Python 2. I found that in Python 3, this method returns something called a dictionary view, which we then need to convert into a list using the filter. This version will work with both Python 2 and Python 3.

Also note that when you use loops in combination with register, the registered variable will be populated with the results of all loop iterations. To this end, Ansible will populate the variable to which the register refers with a list results. This list will contain one entry for each loop iteration, and each entry will contain the item and the module-specific output. To see this in action, run my example playbook and have a look at its output.

Conditions

In addition to loops, Ansible allows you to execute specific tasks only if a certain condition is met. You could, for instance, have a task that populates a variable, and then execute a subsequent task only if this variable has a certain value.

The below example is a bit artificial, but demonstrates this nicely. In a first task, we create a random value, either 0 or 1, using Jinja2 templating. Then, we execute a second task only if this value is equal to one.

---
  - hosts: localhost
    connection: local
    tasks:
      - name: Populate variable with random value
        set_fact:
          randomVar: "{{ range(0, 2) | random }}"
      - name: Print value
        debug:
          var: randomVar
      - name: Execute additional statement if randomVar is 1
        debug:
          msg: "Variable is equal to one"
        when:
          randomVar == "1"

To create our random value, we combine the Jinja2 range method with the random filter which picks a random element from a list. Note that the result will be a string, either “0” or “1”. In the last task within this play, we then use the when keyword to restrict execution based on a condition. This condition is in Jinja2 syntax (without the curly braces, which Ansible will add automatically) and can be any valid expression which evaluates to either true or false. If the condition evaluates to false, then the task will be skipped (for this host).

Care needs to be taken when combining loops with conditional execution. Let us take a look at the following example.

---
  - hosts: localhost
    connection: local
      - name: Combine loop with when
        debug:
          msg: "{{ item }}"
        loop:
          "{{ range (0, 3)|list }}"
        when:
          item == 1
        register:
          loopOutput
      - debug:
          var: loopOutput

Here, the condition specified with when is evaluated once for each loop iteration, and if it evaluates to false, the iteration is skipped. However, the results array in the output still contains an item for this iteration. When working with the output, you might want to evaluate the skipped attribute for each item which will tell you whether the loop iteration was actually executed (be careful when accessing this, as it is not present for those items that were not skipped).

Handlers

Handlers provide a different approach to conditional execution. Suppose, for instance, you have a sequence of tasks that each update the configuration file of a web server. When this actually resulted in a change, you want to restart the web server. To achieve this, you can define a handler.

Handlers are similar to tasks, with the difference that in order to be executed, they need to be notified. The actual execution is queued until the end of the playbook, and even if the handler is notified more than once, it is only executed once (per host). To instruct Ansible to notify a handler, we can use the notify attribute for the respective task. Ansible will, however, only notify the handler if the task results in a change. To illustrate this, let us take a look at the following playbook.

---
  - hosts: localhost
    connection: local
    tasks:
    - name: Create empty file
      copy:
        content: ""
        dest: test
        force: no
      notify: handle new file
    handlers:
    - name: handle new file
      debug:
        msg: "Handler invoked"

This playbook defines one task and one handler. Handlers are defined in a separate section of the playbook which contains a list of handlers, similarly to tasks. They are structured as tasks, having a name and an action. To notify a handler, we add the notify attribute to a task, using exactly the name of the handler as argument.

In our example, the first (and only) task of the play will create an empty file test, but only if the file is not yet present. Thus if you run this playbook for the first time (or manually remove the file), this task will result in a change. If you run the playbook once more, it will not result in a change.

Correspondingly, upon the first execution, the handler will be notified and execute at the end of the play. For subsequent executions, the handler will not run.

Handler names should be unique, so you cannot use handler names to make a handler listen for different events. There is, however, a way to decouple handlers and notifications using topics. To make a handler listen for a topic, we can add the listen keyword to the handler, and use the name of the topic instead of the handler name when defining the notificiation, see the documentation for details.

Handlers can be problematic, as they are change triggered, not state triggered. If, for example, a task results in a change and notifies a handler, but the playbook fails in a later task, the trigger is lost. When you run the playbook again, the task will most likely not result in a change any more, and no additional notification is created. Effectively, the handler never gets executed. There is, however, a flag –force-handler to force execution of handlers even if the playbook fails.

Using roles and includes

Playbooks are a great way to automate provisioning steps, but for larger projects, they easily get a bit messy and difficult to maintain. To better organize larger projects, Ansible has the concept of a role.

Essentially, roles are snippets of tasks, variables and handlers that are organized as reusable units. Suppose, for instance, that in all your playbooks, you do a similar initial setup for all machines – create a user, distribute SSH keys and so forth. Instead of repeating the same tasks and the variables to which they refer in all your playbooks, you can organize them as a role.

What makes roles a bit more complex to use initially is that roles are backed by a defined directory structure that we need to understand first. Specifically, Ansible will look for roles in a subdirectory called (surprise) roles in the directory in which the playbook is located (it is also possible to maintain system-wide roles in /etc/ansible/roles, but it is obviously much harder to put these roles under version control). In this subdirectory, Ansible expects a directory for each role, named after the role.

Within this subdirectory, each type of objects defined by this role are placed in separate subdirectories. Each of these subdirectories contains a file main.yml that defines these objects. Not all of these directories need to be present. The most important ones that most roles will include are (see the Ansible documentation for a full list) are:

tasks, holding the tasks that make up the role
defaults that define default values for the variables referenced by these tasks
vars containing additional variable definitions
meta containing information on the role itself, see below

To initialize this structure, you can either create these directories manually, or you run

cd roles
ansible-galaxy init

to let Ansible create a skeleton for you. The tool that we use here – ansible-galaxy – is actually part of an infrastructure that is used by the community to maintain a set of reusable roles hosted centrally. We will not use Ansible Galaxy here, but you might want to take a look at the Galaxy website to get an idea.

So suppose that we want to restructure the example playbook that we used in one of my last posts to bring up a Docker container as an Ansible test environment using roles. This playbook essentially has two parts. In the first part, we create the Docker container and add it to the inventory. In the second part, we create a default user within the container with a given SSH key.

Thus it makes sense to re-structure the playbook using two roles. The first role would be called createContainer, the second defaultUser. Our directory structure would then look as follows.

Here, docker.yaml is the main playbook that will use the role, and we have skipped the var directory for the second role as it is not needed. Let us go through these files one by one.

The first file, docker.yaml, is now much shorter as it only refers to the two roles.

---
  - name: Bring up a Docker container 
    hosts: localhost
    roles:
    - createContainer
  - name: Provision hosts
    hosts: docker_nodes
    become: yes
    roles:
    - defaultUser

Again, we define two plays in this playbook. However, there are no tasks defined in any of these plays. Instead, we reference roles. When running such a play, Ansible will go through the roles and for each role:

Load the tasks defined in the main.yml file in the tasks-folder of the role and add them to the play
Load the default variables defined in the main.yml file in the defaults-folder of the role
Merge these variable definitions with those defined in the vars-folder, so that these variables will overwrite the defaults (see also the precedence rules for variables in the documentation

Thus all tasks imported from roles will be run before any tasks defined in the playbook directly are executed.

Let us now take a look at the files for each of the roles. The tasks, for instance, are simply a list of tasks that you would also put into a play directly, for instance

---
- name: Run Docker container
  docker_container:
    auto_remove: yes
    detach: yes
    name: myTestNode
    image: node:latest
    network_mode: bridge
    state: started
  register: dockerData

Note that this is a perfectly valid YAML list, as you would expect it. Similarly, the variables you define either as default or as override are simple dictionaries.

---
# defaults file for defaultUser
userName: chr
userPrivateKeyFile: ~/.ssh/ansible-default-user-key_rsa

The only exception is the role meta data main.yml file. This is a specific YAML-based syntax that defines some attributes of the role itself. Most of the information within this file is meant to be used when you decide to publish your role on Galaxy and not needed if you work locally. There is one important exception, though – you can define dependencies between roles that will be evaluated and make sure that roles that are required for a role to execute are automatically pulled into the playbook when needed.

The full example, including the definitions of these two roles and the related variables, can be found in my GitHub repository. To run the example, use the following steps.

# Clone my repository and cd into partVI
git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/partVI
# Create two new SSH keys. The first key is used for the initial 
# container setup, the public key is baked into the container
ssh-keygen -f ansible -b 2048 -t rsa -P ""
# the second key is the key for the default user that our 
# playbook will create within the container
ssh-keygen -f default-user-key -b 2048 -t rsa -P ""
# Get the Dockerfile from my earlier post and build the container, using
# the new key ansible just created
cp ../partV/Dockerfile .
docker build -t node .
# Put location of user SSH key into vars/main.yml for 
# the defaultUser role
mkdir roles/defaultUser/vars
echo "---" > roles/defaultUser/vars/main.yml
echo "userPrivateKeyFile: $(pwd)/default-user-key" >> roles/defaultUser/vars/main.yml
# Run playbook
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook docker.yaml

If everything worked, you can now log into your newly provisioned container with the new default user using

ssh -i default-user-key chr@172.17.0.2

where of course you need to replace the IP address by the IP address assigned to the Docker container (which the playbook will print for you). Note that we have demonstrated variable overriding in action – we have created a variable file for the role defaultUser and put the specific value, namely the location of the default users SSH key – into it, overriding the default coming with the role. This ability to bundle all required variables as part of a role but at the same time allowing a user to override them is it what makes roles suitable for actual code reuse. We could have overwritten the variable as well in the playbook using it, making the use of roles a bit similar to a function call in a procedural programming language.

This completes our post on roles. In the next few posts, we will now tackle some more complex projects – provisioning infrastructure in a cloud environment with Ansible.

Automating provisioning with Ansible – working with inventories

So far, we have used Ansible inventories more or less as a simple list of nodes. But there is much more you can do with inventories – you can assign hosts to groups, build hierarchies of groups, use dynamic inventories and assign variables. In this post, we will look at some of these options.

Groups in inventories

Recall that our simple inventory file (when using the Vagrant setup demonstrated in an earlier post) looks as follows (say this is saved as hosts.ini in the current working directory)

[servers]
192.168.33.10
192.168.33.11

We have two servers and one group called servers. Of course, we could have more than one group. Suppose, for instance, that we change our file as follows.

[web]
192.168.33.10
[db]
192.168.33.11

When running Ansible, we can now specify either the group web or db, for instance

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -u vagrant \
        -i hosts.ini \
        --private-key ~/vagrant/vagrant_key \
        -m ping web

then Ansible will only operate on the hosts in the group web. If you want to run Ansible for all hosts, you can still use the group “all”.

Hosts can be in more than one group. For instance, when we change the file as follows

[web]
192.168.33.10
[db]
192.168.33.10
192.168.33.11

then the host 192.168.33.10 will be on both groups, db and web. If you run Ansible for the db group, it will operate on both hosts, if you run it for the web group, it will operate only on 192.168.33.10. Of course, if you use the pseudo-group all, then Ansible will touch this host only once, even though it appears twice in the configuration file.

It is also possible to define groups as the union of other groups. To create a group that contains all servers, we could for instance do something like

[web]
192.168.33.10
[db]
192.168.33.11
192.168.33.10
[servers:children]
db
web

Finally, as we have seen in the last post, we can define variables directly in the inventory file. We have already seen this on the level of individual hosts, but it is also possible on the level of groups. Let us take a look at the following example.

[web]
192.168.33.10
[db]
192.168.33.11
192.168.33.10
[servers:children]
db
web
[db:vars]
a=5
[servers:vars]
b=10

Here we define two variables, a and b. The first variable is defined for all servers in the db group. The second variable is defined for all servers in the servers group. We can print out the values of these variables using the debug Ansible module

ansible -i hosts.ini \
        --private-key ~/vagrant/vagrant_key  \
        -u vagrant -m debug -a "var=a,b" all

which will give us the following output

192.168.33.11 | SUCCESS => {
    "a,b": "(5, 10)"
}
192.168.33.10 | SUCCESS => {
    "a,b": "(5, 10)"
}

To better understand the structure of an inventory file, it is often useful to create a YAML representation of the file, which is better suited to visualize the hierarchical structure. For that purpose, we can use the ansible-inventory command line tool. The command

ansible-inventory -i hosts.ini --list -y

will create the following YAML representation of our inventory.

all:
  children:
    servers:
      children:
        db:
          hosts:
            192.168.33.10:
              a: 5
              b: 10
            192.168.33.11:
              a: 5
              b: 10
        web:
          hosts:
            192.168.33.10: {}
    ungrouped: {}

which nicely demonstrates that we have effectively built a tree, with the implicit group all at the top, the group servers as the only descendant and the groups db and web as children of this group. In addition, there is a group ungrouped which lists all hosts which are not explicitly assigned to a group. If you omit the -y flag, you will get a JSON representation which lists the variables in a separate dictionary _meta.

Group and host variables in separate files

Ansible also offers you the option to maintain group and host variables in separate files, which can again be useful if you need to deal with different environments. By convention, Ansible will look for variables defined on group level in a directory group_vars that needs to be a subdirectory of the directory in which the inventory file is located. Thus, if your inventory file is called /home/user/somedir/hosts.ini, you would have to create a directory /home/user/somedir/group_vars. Inside this directory, place a YAML file containing a dictionary with key-value pairs which will then be used as variable definitions and whose name is that of the group. In our case, if we wanted to define a variable c for all hosts in the db group, we would create a file group_vars/db.yaml with the following content

---
  c: 15

We can again check that this has worked by printing the variables a, b and c.

ansible -i hosts.ini \
        --private-key ~/vagrant/vagrant_key  \
        -u vagrant -m debug -a "var=a,b,c" db

Similary, you could create a directory host_vars and place a file there to define variables for a specific host – again, the file name should match the host name. It is also possible to merge several files – see the corresponding page in the documentation for all the details.

Dynamic inventories

So far, our inventories have been static, i.e. we prepare them before we run Ansible, and they are unchanged while the playbook is running. This is nice in a classical setup where you have a certain set of machines you need to manage, and make changes only if a new machine is physically added to the data center or removed. In a cloud environment, however, the setup tends to be much more dynamic. To deal with this situation, Ansible offers different approaches to manage dynamic inventories that change at runtime.

The first approach is to use inventory scripts. An inventory script is simply a script (an executable, which can be written in any programming language) that creates an inventory in JSON format, similarly to what ansible-inventory does. When you provide such an executable using the -i switch, Ansible will invoke the inventory script and use the output as inventory.

Inventory scripts are invoked by Ansible in two modes. When Ansible needs a full list of all hosts and groups, it will add the switch –list. When Ansible needs details on a specific host, it will pass the switch –host and, in addition, the hostname as an argument.

Let us take a look at an example to see how this works. Ansible comes bundled with a large number of inventory scripts. Let us play with the script for Vagrant. After downloading the script to your working directory and installing the required module paramiko using pip, you can run the script as follows.

python3 vagrant.py --list

This will give you an output similar to the following (I have piped this through jq to improve readability)

{
  "vagrant": [
    "boxA",
    "boxB"
  ],
  "_meta": {
    "hostvars": {
      "boxA": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "/home/chr/vagrant/vagrant_key",
        "ansible_port": "2222"
      },
      "boxB": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "/home/chr/vagrant/vagrant_key",
        "ansible_port": "2200"
      }
    }
  }
}

We see that the script has created a group vagrant with two hosts, using the names in your Vagrantfile. For each host, it has, in addition, declared some variables, like the private key file, the SSH user and the IP address and port to use for SSH.

To use this dynamically created inventory with ansible, we first have to make the Python script executable, using chmod 700 vagrant.py. Then, we can simply invoke Ansible pointing to the script with the -i switch.

export ANSIBLE_HOST_KEY_CHECKING=False
ansible -i vagrant.py -m ping all

Note that we do not have to use the switches –private-key and -u as this information is already present in the inventory.

It is instructive to look at the (rather short) script. Doing this, you will see that behind the scenes, the script simply invokes vagrant status and vagrant ssh-config. This implies, however, that the script will only detect your running instances properly if you execute it – and thus Ansible – in the directory in which your Vagrantfile is living and in which you issued vagrant up to bring up the machines!.

In practice, dynamic inventories are mostly used with public cloud providers. Ansible comes with inventory scripts for virtually every cloud provider you can imagine. As an example, let us try out the script for EC2.

First, there is some setup required. Download the inventory script ec2.py and, placing it in the same directory, the configuration file ec2.ini. Then, make the script executable.

A short look at the script will show you that it uses the Boto library to access the AWS API. So you need Boto installed, and you need to make sure that Boto has access to your AWS credentials, for instance because you have a working AWS CLI configuration (see my previous post on using Python with AWS for more details on this). As explained in the comments in the script, you can also use environment variables to provide your AWS credentials (which are stored in ~/.aws/credentials when you have set up the AWS CLI).

Next, bring up some instances on EC2 and then run the script using

./ec2.py --list

Note that the script uses a cache to avoid having to repeat time-consuming API calls. The expiration time is provided as a parameter in the ec2.ini file. The default is 5 minutes, which can be a bit too long when playing around, so I recommend to change this to e.g. 30 seconds.

Even if you have only one or two machines running, the output that the script produces is significantly more complex than the output of the Vagrant dynamic inventory script. The reason for this is that, instead of just listing the hosts, the EC2 script will group the hosts according to certain criteria (that again can be selected in ec2.ini), for instance availability zone, region, AMI, platform, VPC and so forth. This allows you to target for instance all Linux boxes, all machines running in a specific data center and so forth. If you tag your machines, you will also find that the script groups the hosts by tags. This is very useful and allows you to differentiate between different types of machines (e.g. database servers, web servers), or different stages. The script also attaches certain characteristics as variables to each host.

In addition to inventory scripts, there is a second mechanism to dynamically change an inventory – there is actually an Ansible module which maintains the copy of the inventory that Ansible builds in memory at startup (and thus makes changes that are only valid for this run), the add_host module. This is mostly used when we use Ansible to actually bring up hosts.

To demonstrate this, we will use a slightly different setup as we have used so far. Recall that Ansible can work with any host on which Python is installed and which can be accessed via SSH. Thus, instead of spinning up a virtual machine, we can as well bring up a container and use that to simulate a host. To spin up a container, we can use the Ansible module docker_container that we execute on the control machine, i.e. with the pseudo-group localhost, which is present even if the inventory is empty. After we have created the container, we add the container dynamically to the inventory and can then use it for the remainder of the playbook.

To realize this setup, the first thing which we need is a container image with a running SSH daemon. As base image, we can use the latest version of the official Ubuntu image for Ubuntu bionic. I have created a Dockerfile which, based on the Ubuntu image, installs the OpenSSH server and sudo, creates a user ansible, adds the user to the sudoer group and imports a key pair which is assumed to exist in the directory.

Once this image has been built, we can use the following playbook to bring up a container, dynamically add it to the inventory and run ping on it to test that the setup has worked. Note that this requires that the docker Python module is installed on the control host.

---
  # This is our first play - it will bring up a new Docker container and register it with
  # the inventory
  - name: Bring up a Docker container that we will use as our host and build a dynamic inventory
    hosts: localhost
    tasks:
    # We first use the docker_container module to start our container. This of course assumes that you
    # have built the image node:latest according to the Dockerfile which is distributed along with
    # this script. We use bridge networking, but do not expose port 22
    - name: Run Docker container
      docker_container:
        auto_remove: yes
        detach: yes
        name: myTestNode
        image: node:latest
        network_mode: bridge
        state: started
      register: dockerData
    # As we have chosen not to export the SSH port, we will have to figure out the IP of the container just created. We can
    # extract this from the return value of the docker run command
    - name: Extract IP address from docker dockerData
      set_fact:
        ipAddress:  "{{ dockerData['ansible_facts']['docker_container']['NetworkSettings']['IPAddress'] }}"
    # Now we add the new host to the inventory, as part of a new group docker_nodes
    # This inventory is then valid for the remainder of the playbook execution
    - name: Add new host to inventory
      add_host:
        hostname: myTestNode
        ansible_ssh_host: "{{ ipAddress }}"
        ansible_ssh_user: ansible
        ansible_ssh_private_key_file: "./ansible"
        groups: docker_nodes
  #
  # Our second play. We now ping our host using the newly created inventory
  #
  - name: Ping host
    hosts: docker_nodes
    become: yes
    tasks:
    - name: Ping  host
      ping:

If you want to run this example, you can download all the required files from my GitHub repository, build the required Docker image, generate the key and run the example as follows.

pip install docker
git clone https://www.github.com/christianb93/ansible-samples
cd ansible-samples/partV
./buildContainer
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook docker.yaml

This is nice, and you might want to play with this to create additional containers for more advanced test setups. When you do this, however, you will soon realize that it would be very beneficial to be able to execute one task several times, i.e. to use loops. Time to look at control structures in Ansible, which we do in the next post.