Building a bitcoin controller for Kubernetes part IX – managing secrets and creating events

In the last post in this series, we have created a more or less functional bitcoin controller. However, to be reasonably easy to operate, there are still a few things that are missing. We have hardcoded secrets in our images as well as our code, and we log data, but do not publish events. These shortcomings are on our todo list for today.

Step 12: using secrets to store credentials

So far, we have used the credentials to access the bitcoin daemon at several points. We have placed the credentials in a configuration file in the bitcoin container where they are accessed by the daemon and the bitcoin CLI and we have used them in our bitcoin controller when establishing a connection to the RPC daemon. Let us now try to replace this by a Kubernetes secret.

We will store the bitcoind password and user in a secret and map this secret into the pods in which our bitcoind is running (thus the secret needs to be in the namespace in which the bitcoin network lives). The name of the secret will be configurable in the definition of the network.

In the bitcoind container, we add a startup script that checks for the existence of the environment variables. If they exist, it overwrites the configuration. It then starts the bitcoind as before. This makes sure that our image will still work in a pure Docker environment and that the bitcoin CLI can use the same passwords.

When our controller brings up pods, it needs to make sure that the secret is mapped into the environments of the pod. To do this, the controller needs to add a corresponding structure to the container specification when bringing up the pod, using the secret name provided in the specification of the bitcoin network. This is done by the following code snippet.

sts.Spec.Template.Spec.Containers[0].EnvFrom = []corev1.EnvFromSource{
	corev1.EnvFromSource{
		SecretRef: &corev1.SecretEnvSource{
			Optional: &optional,
			LocalObjectReference: corev1.LocalObjectReference{
				Name: bcNetwork.Spec.Secret,
			},
		},
	},
}

The third point where we need the secret is when the controller itself connects to a bitcoind to manage the node list maintained by the daemon. We will use a direct GET request to retrieve the secret, not an informer or indexer. The advantage of this approach is that in our cluster role, we can restrict access to a specific secret and do not have to grant the service account the right to access ANY secret in the cluster which would be an obvious security risk.

Note that the secret that we use needs to be in the same namespace as the pod into which it is mapped, so that we need one secret for every namespace in which a bitcoin network will be running.

Once we have the secret in our hands, we can easily extract the credentials from it. To pass the credentials down the call path into the bitcoin client, we also need to restructure the client a bit – the methods of the client now accept a full configuration instead of just an IP address so that we can easily override the default credentials. If no secret has been defined for the bitcoin network, we still use the default credentials. The code to read the secret and extract the credentials has been placed in a new package secrets.

Step 13: creating events

As a second improvement, let us adapt our controller so that it does not only create log file entries, but actively emits events that can be accessed using kubectl or the dashboard or picked up by a monitoring tool.

Let us take a quick look at the client-side code of the Kubernetes event system. First, it is important to understand that events are API resources – you can create, get, update, list, delete and watch them like any other API resource. Thus, to post an event, you could simply use the Kubernetes API directly and submit a POST request. However, the Go client package contains some helper objects that make it much easier to create and post events.

A major part of this mechanism is located in the tools/record package within the Go client. Here the following objects and interfaces are defined.

  • An event sink is an object that knows how to forward events to the Kubernetes API. Most of the time, this will be a REST client accessing the API, for instance the implementation in events.go in kubernetes/typed/core/v1.
  • Typically, a client does not use this object directly, but makes use of an event recorder. This is just a helper object that has a method Event that assembles an event and passes it to the machinery so that it will eventually be picked up by the recorder and sent to the Kubernetes API
  • The missing piece that connects and event sink and an event recorder is an event broadcaster. This is a factory class for event recorders. You can ask a broadcaster for a record and set up the broadcaster such that events received via this recorder are not only forwarded to the API, but also logged and forwarded to additional event handlers.
  • Finally, an event source is basically a label that is added to the events that we generate, so that whoever evaluates or reads the events knows where they originate from

Under the hood, the event system uses the broadcaster logic provided by the package apimachinery/pkg/watch. Here, a broadcaster is essentially a collection of channels. One channel, called the incoming channel, is used to collect messages, which are then distributed to N other channels called watchers. The diagram below indicates how this is used to manage events.

Broadcaster

When you create an event broadcaster, a watch.Broadcaster is created as well (embedded into the event broadcaster), and when you ask this broadcaster to create a new recorder, it will return a recorder which is connected to the same watch.Broadcaster. If a recorder publishes an event, it will write into the incoming queue of this broadcaster, which then distributes the event to all registered watchers. For each watcher, a new goroutine is started with invokes a defined function once an event is received. This can be a function to perform logging, but also be a function to write into an event sink.

To use this mechanism, we therefore have to create an event broadcaster, an event source, an event sink, register potentially needed additional handlers and finally receive a recorder. The Kubernetes sample controller again provides a good example how this is done.

After adding a similar code to our controller, we will run into two small problems. First, events are API resources, and therefore our controller needs the right to create them. So once more, we need to adapt our cluster role to grant that right. The second problem that we can get is that the event refers to a bitcoin network, but is published via the core Kubernetes API. The scheme used for that purpose is not aware of the existence of bitoin network objects, and the operation will fail, resulting in the mesage ‘Could not construct reference to …due to: ‘no kind is registered for the type v1.BitcoinNetwork in scheme “k8s.io/client-go/kubernetes/scheme/register.go:65″‘. To fix this, we can simply add our scheme to the default scheme (as it is also done in the sample controller

This completes todays post. In the next post, we will discuss how we can efficiently create and run automated unit and integration tests for our controller and mock the Kubernetes API.

Building a bitcoin controller for Kubernetes part VIII – creating a helm chart

Our bitcoin controller is more or less complete, and can be deployed into a Kubernetes cluster. However, the deployment process itself is still a bit touchy – we need to deploy secrets, RBAC profiles and the CRD, bring up a pod running the controller and make sure that we grab the right version of the involved images. In addition, doing this manually using kubectl makes it difficult to reconstruct which version is running, and environment specific configurations need to be made manually. Enter Helm….

Helm – an overview

Helm is (at the time of writing, this is supposed to change with the next major release) a combination of two different components. First, there is the helm binary itself which is a command-line interface that typically runs locally on your workstation. Second, there is Tiller which is a controller running in your Kubernetes cluster and carrying out the actual deployment.

With Helm, packages are described by charts. A chart is essentially a directory with a certain layout and a defined set of files. Specifically, a Helm chart directory contains

  • A file called Chart.yaml that contains some basic information on the package, like its name, its version, information on its maintainers and a timestamp
  • A directory called templates. The files in this directory are Kubernetes manifests that need to be applied as part of the deployment, but there is a twist – these files are actually templates, so that parts of these files can be placeholders that are filled as part of the deployment process.
  • A file called values.yaml that contains the values to be used for the placeholders

To make this a bit more tangible, suppose that you have an application which consists of several pods. One of these pods runs a Docker image, say myproject/customer-service. Each time you build this project, you will most likely create a new Docker image and push it into some repository, and you will probaby use a unique tag name to distinguish different builds from each other.

When you deploy this without Helm, you would have to put the tag number into the YAML manifest file that describes the deployment. With every new build, you would then have to update the manifest file as well. In addition, if this tag shows up in more than one place, you would have to do this several times.

With Helm, you would not put the actual tag into the deployment manifest, but use a placeholder. These placeholders follow the Go template syntax. Instead, you would put the actual tag into the values.yaml file. Thus the respective lines in your deployment would be something like

containers:
  - name: customer-service-ctr
    image: myproject/customer-service:{{.Values.customer_service_image_tag}}

and within the file values.yaml, you would provide a value for the tag

customer_service_image_tag: 0.7

When you want to install the package, you would run helm install . in the directory in which your package is located. This would trigger the process of joining the templates and the values to create actual Kubernetes manifest files and apply these files to your cluster.

Of course there is much more that you can customize in this way. As an example, let us take a look at the Apache helm chart published by Bitnami. When you read the deployment manifest in the templates folder, you will find that almost every value is parametrized. Some of these parameters – those starting with .Values – are defined in the values.yaml. Others, like those starting with .Release, refer to the current release and are auto-generated by Helm. Here, a release in the Helm terminology is simply the result of installing a chart. This does, for instance, allow you to label pods with a label that contains the release in which they were created.

Helm repositories

So far, we have described Helm charts as directories. This is one way to implement Helm charts – those of you who have worked with J2EE and WAR files before might be tempted to call this the “exploded” view. However, you can also bundle all files that comprise a chart into a single archive. These archives can be published in a Helm repository.

Technically, a Helm repository is nothing but an URL that can serve two types of objects. First, there is a collection of archives (zipped tar files with suffix .tgz). These files contain the actual Helm charts – they are literally archived versions of an exploded directory, created using the command helm package. Second, there is an index, a file called index.yaml that lists all archives contained in the repository and contains some metadata. An index file can be created using the command helm repo index.

Helm maintains a list of known repositories, similar to a package manager like APT that maintains a list of known package sources. To add a new repository to this list, use the helm repo add command.

Let us again take a look at an example. Gitlab maintains a Helm repository at the URL https://charts.gitlab.io. If you add index.yaml to this URL and submit a GET request, you will get this file. This index file contains several entries for different charts. Each chart has, among other fields, a name, a version and the URL of an archive file containing the actual chart. This archive can be on the same web server, or somewhere else. If you use Helm to install one of the charts from the repository, Helm will use the latest version that shows up in the index file, unless you specify a version explicitly.

Installing and running Helm

Let us go through an example to see how all this works. We assume that you have a running Kubernetes cluster and a kubectl pointing to this cluster.

First, let us install Helm. As explained earlier, this involves a local installation and the installation of Tiller in your cluster. The local installation depends, of course, on your operating system. Several options are available and described here.

Once the helm binary is in your path, let us install the Tiller daemon in your cluster. Before we do this, however, we need to think about service accounts. Tiller, being the component of Helm that is responsible for carrying out the actual deployment, requires the permissions to create, update and delete basically every Kubernetes resource. To make this possible, we need to define a service account that Tiller uses and map this account to a corresponding cluster role. For a test setup, we can simply use the pre-existing cluster-admin role.

$ kubectl create serviceaccount tiller -n kube-system
$ kubectl create clusterrolebinding tiller --clusterrole=cluster-admin  --serviceaccount=kube-system:tiller

Once this has been done, we can now instruct Helm to set up its local configuration and to install Tiller in the cluster using this service account.

$ helm init --service-account tiller

If you now display all pods in your cluster, you will see that a new Tiller pod has been created in the namespace kube-system (this is why we created our service account in that namespace as well).

Now we can verify that this worked. You could, for instance, run helm list which will list all releases in your cluster. This should not give you any output, as we have not yet installed anything. Let us change this – as an example, we will install the Bitnami repo referenced earlier. As a starting point, we retrieve the list of repos that Helm is currently aware of.

$ helm repo list
NAME    URL
stable  https://kubernetes-charts.storage.googleapis.com
local   http://127.0.0.1:8879/charts

These are the default repositories that Helm knows from scratch. Let us now add the Bitnami repo and then use the search function to find all Gitlab charts.

$ helm repo add bitnami https://charts.bitnami.com
$ helm search bitnami

This should give you a long list of charts that are now available for being installed, among them the chart for the Apache HTTP server. Let us install this.

$ helm install bitnami/apache

Note how we refer to the chart – we use the combination of the chart name that we have used when doing our helm repo add and the name of the chart. You should now see a summary that lists the components that have been installed as part of this release – a pod, a load balancer service and a deployment. You can also track the status of the service using kubectl get svc. After a few seconds, the load balancer should have been brough up, and you should be able to curl it.

You will also see that Helm cas created a release name for you that you can reference the release. You can now use this release name and the corresponding helm commands to delete or upgrade your release.

A helm repository for our bitcoin controller

It is easy to set up a Helm repository for the bitcoin controller. As mentioned earlier, every HTTP server will do. I use Github for this purpose. The Helm repository itself is here. It contains the archives that have been created with helm package and the index file which is the result of helm index. With that repository in place, it is very easy to install our controller – two lines will do.

$ helm repo add bitcoin-controller-repo https://raw.githubusercontent.com/christianb93/bitcoin-controller-helm/master
$ helm install bitcoin-controller-repo/bitcoin-controller

An exploded view of this archive is available in a separate Github repository. This version of the Helm chart is automatically updated as part of my CI/CD pipeline (which I will describe in more detail in a later post), and when all tests succeed, this repository is packaged and copied into the Helm repository.

This completes our short introduction to Helm and, at the same time, is my last post in the short series on Kubernetes controllers. I highly recommend to spend some time with the quite readable documentation at helm.sh to learn more about Helm templates, repositories and the life cycle of releases. If you want to learn more about controllers, my best advice is to read the source code of the controllers that are part of Kubernetes itself that demonstrate all of the mechanisms discussed so far and much more. Happy hacking!

Building a bitcoin controller for Kubernetes part VII – testing

Kubernetes controllers are tightly integrated with the Kubernetes API – they are invoked if the state of the cluster changes, and they act by invoking the API in turn. This tight dependency turns testing into a challenge, and we need a smart testing strategy to be able to run unit and integration tests efficiently.

The Go testing framework

As a starting point, let us recall some facts about the standard Go testing framework and see what this means for the organization of our code.

A Go standard installation comes with the package testing which provides some basic functions to define and execute unit tests. When using this framework, a unit test is a function with a signature

func TestXXX(t *testing.T)

where XXX is the name of your testcase (which, ideally, should of course reflect the purpose of the test). Once defined, you can easily run all your tests with the command

$ go test ./...

from the top-level directory of your project. This command will scan all packages for functions following this naming convention and invoke each of the test functions. Within each function, you can use the testing.T object that can be used to indicate failure and log error messages. At the end of a test execution, a short summary of passed and failed tests will be printed.

It is also possible to use go test to create and display a coverage analysis. To do this, run

$ go test ./... -coverprofile /tmp/cp.out
$ go tool cover -html=/tmp/cp.out

The first command will execute the tests and write coverage information into /tmp/cp.out. The second command will turn the contents of this file into HTML output and display this nicely in a browser, resulting in a layout as below (and yes, I understand that test coverage is a flawed metric, but still it is useful….)

GoTestCoverage

How do we utilize this framework in our controller framework? First, we have to decide in which packages we place the test functions. The Go documentation recommends to place the test code in the same package as the code under test, but I am not a big fan of this approach, because this will allow you to access private methods and attributes of the objects under test and not to test against contracts, but against implementation. Therefore I have decided to put the code for unit testing a package XYZ into a dedicated package XYZ_test (more on integration tests below).

This approach has its advantages, but requires that you take testability into account when designing your code (which, of course, is a good idea anyway). In particular, it is good practice to use interfaces to allow for injection of test code. Let us take the code for the bitcoin RPC client as an example. Here, we use an interface HTTPClient to model the dependency of the RPC client from a HTTP clent. This interface is implemented by the client from the HTTP package, but for testing purposes, we can use a mock implementation and inject it when creating a bitcoin RPC client.

We also have to think about separation of unit tests which will typically use mock objects from integration tests that require a running Kubernetes cluster or a bitcoin daemon. There are different ways to do this, but the approach that I have chosen is as follows.

  • Unit tests are placed in the same directory as the code that they test
  • A unit test functions has a name that ends with “Unit”
  • Integration tests – which typically test the system as a whole and thus are not clearly linked to an individual package – are placed in a global subdirectory test
  • Integration test function names end on “Integration”

With this approach, we can run all unit tests by executing

go test ./... -run "Unit"

from the top-level directory, and can use the command

go test -run "Integration"

from within the test directory to run all integration tests.

The Kubernetes testing package

To run useful unit tests in our context, we typically need to simulate access to a Kubernetes API. We could of course our own mock objects, but fortunately, the Kubernetes client go package comes with a set of ready-to-use helper objects to do this. The core of this is the package client-go/testing. The key objects and concepts used in this package are

  • An action describes an API call, i.e. a HTTP request against the Kubernetes API, by defining the namespace, the HTTP verb and the resource and subresource that is addressed
  • A Reactor describes how a mock object should react to a particular action. We can ask a Reactor whether it is ready to handle a specific action and then invoke its React action to actually do this
  • A Fake object is essentially a collection of actions recording the actions that have been taken on the mock object and reactors that react upon these actions. The core of the Fake object is its Invoke method. This method will first record the action and then walk the registered reactors, invoking the first reactor that indicates that it will handle this action and returning its result

This is of course rather a framework than a functional mock object, but the client-go library offers more – it has a generated set of fake client objects that build on this logic. In fact, there is a fake clientset in the package client-go/kubernetes/fake which implements the kubernetes.Interface interface that makes up a Kubernetes API client. If you take a look at the source code, you will find that the implementation is rather straightforward – a fake clientset embeds a testing.Fake object and a testing.ObjectTracker which is essentially a simple object store. To link those two elements, it installs reactors for the various HTTP verbs like GET, UPDATE, … that simply carry out the respective action on the object tracker. When you ask such a fake clientset for, say, a Nodes object, you will receive an object that delegates the various methods like Get to the invoke method of the underlying fake object which in turn uses the reactors to get the result from the object tracker. And, of course, you can add your own reactors to simulate more specific responses.

KubernetesTesting

Using the Kubernetes testing package to test our bitcoin controller

Let us now discuss how we can use that testing infrastructure provided by the Kubernetes packages to test our bitcoin controller. To operate our controller, we will need the following objects.

  • Two clientsets – one for the standard Kubernetes API and one for our custom API – here we can utilize the machinery described above. Note that the Kubernetes code generators that we use also creates a fake clientset for our bitcoin API
  • Informer factories – again there will be two factories, one for the Kubernetes API and one for our custom API. In a fully integrated environment these informers would invoke the event handlers of the controller and maintain the cache. In our setup, we do not actually run the controller, but only use the cache that they contain, and maintain the cache ourselves. This gives us more control about the contents of the cache, the timing and the invocations of the event handlers.
  • A fake bitcoin client that we inject into the controller to simulate the bitcoin RPC server

Its turns out to be useful to collect all components that make up this test bed in a Go structure testFixture. We can then implement some recurring functionality like the basic setup or starting and stopping the controller as methods of this object.

TestFixture

In this approach, there is a couple of pitfalls. First, it is important to keep the informer caches and the state stored in the fake client objects in sync. If, for example, we let the controller add a new stateful set, it will do this via the API, i.e. in the fake clientset, and we need to push this stateful set into the cache so that during the next iteration, the controller can find it there.

Another challenge is that our controller uses go routines to do the actual work. Thus, whenever we invoke an event handler, we have to wait until the worker thread has picked up the resulting queued event before we can validate the results. We could do this by simply waiting for some time, however, this is of course not really reliable and can lead to long execution times. Instead, it is a better approach to add a method to the controller which allows us to wait until the controller internal queue is empty and makes the tests deterministic. Finally, it is good practice to put independent tests into independent functions, so that each unit test function starts from scratch and does not rely on the state left over by the previous function. This is especially important because go test will cache test results and therefore not necessarily execute all test cases every time we run the tests.

Integration testing

Integration testing does of course require a running Kubernetes cluster, ideally a cluster that can be brought up quickly and can be put into a defined state before each test run. There are several options to achieve this. Of course, you could make use of your preferred cloud provider to bring up a cluster automatically (see e.g. this post for instructions on how to do this in Python), run your tests, evaluate the results and delete the cluster again.

If you prefer a local execution, there are by now several good candidates for doing this. I have initially executed integration tests using minikube, but found that even though this does of course provide perfect isolation, it has the disadvantage that starting a new minikube cluster takes some time, which can slow down the integration tests considerably. I have therefore switched to kind which runs Kubernetes locally in a Docker container. With kind, a cluster can be started in approximately 30 seconds (depending, of course, on your machine). In addition, kind offers an easy way to pre-load docker images into the nodes, so that no download from Docker hub is needed (which can be very slow). With kind, the choreography of a integration test run is roughly as follows.

  • Bring up a local bitcoin daemon in Docker which will be used to test the Bitcoin RPC client which is part of the bitcoin controller
  • Bring up a local Kubernetes cluster using kind
  • Install the bitcoin network CRD, the RBAC profile and the default secret
  • Build the bitcoin controller and a new Docker image and tag it in the local Docker repository
  • Pre-load the image into the nodes
  • Start a pod running the controller
  • Pre-pull the docker image for the bitcoin daemon
  • Run the integration tests using go test
  • Tear down the cluster and the bitcoin daemon again

I have created a shell script that runs these steps automatically. Currently, going through all these steps takes about two minutes (where roughly 90 seconds are needed for the actual tests and 30 seconds for setup and tear-down). For the future, I plan to add support for microk8s as well. Of course, this could be automated further using a CI/CD pipeline like Jenkins or Travis CI, with proper error handling, a clean re-build from a freshly cloned repo and more reporting functionality – this is a topic that I plan to pick up in a later post.

Building a bitcoin controller for Kubernetes part V – establishing connectivity

Our bitcoin controller now has the basic functionality that we expect – it can synchronize the to-be state and the as-is state and update status information. However, to be really useful, a few things are still missing. Most importantly, we want our nodes to form a real network and need to establish a mechanism to make them known to each other. Specifically, we will use RPC calls to exchange IP addresses between the nodes in our network so that they can connect.

Step 10: talking to the bitcoin daemon

To talk to the bitcoin daemon, we will use its JSON RPC interface. We thus need to be able to send and receive HTTP POST requests and to serialize and de-serialize JSON. Of course there are some libraries out there that could do this for us, but it is much more fun to implement our own bitcoin client in Go. For that purpose, we will use the packages HTTP and JSON.

While developing this client, it is extremely useful to have a locally running bitcoind. As we already have a docker image, this is very easy – simply run

$ docker run -d -p 18332:18332 christianb93/bitcoind

on your local machine (assuming you have Docker installed). This will open port 18332 which you can access using for instance curl, like

$ curl --user 'user:password' --data '{"jsonrpc":"1.0","id":"0","method":"getnetworkinfo","params":[]}' -H 'content-type:text/plain;' http://localhost:18332

Our client will be very simple. Essentially, it consists of the following two objects.

  • A Config represents the configuration data needed to access an RPC server (IP, port, credentials)
  • A BitcoinClient which is the actual interface to the bitcoin daemon and executes RPC calls

A bitcoin client holds a reference to a configuration which is used as default if no other configuration is supplied, and a HTTP client. Its main method is RawRequest which creates an RPC request, adds credentials and parses the response. No error handling to e.g. deal with timeouts is currently in place (this should not be a real restriction in practice, as we have the option to re-queue our processing anyway). In addition to this generic function which can invoke any RPC method, there are specific functions like AddNode, RemoveNode and GetAddedNodeList that accept and return Go structures instead of JSON objects. In addition, there are some structures to model RPC request, RPC responses and errors.

BitcoinClient

Node that our controller now needs to run inside the cluster, as it needs to access the bitcoind RPC servers (there might be ways around this, for instance by adding a route on the host similar to what minikube tunnel is doing for services, but I found that this is easily leads to IP range conflicts with e.g. Docker).

Step 11: adding new nodes to our network

When we bring up a network of bitcoin nodes, each node starts individually, but is not connected to any other node in the network – in fact, if we bring up three nodes, we maintain three isolated blockchains. For most use cases, this is of course not what we want. So let us now try to connect the nodes to each other.

To do this, we will manipulate the addnode list that each bitcoind maintains. This list is like a database of known nodes to which the bitcoind will try to connect. Before we automate this process, let us first try this out manually. Bring up the network and enter

$ ip1=$(kubectl get bitcoinnetwork my-network -o json | jq -r ".status.nodes[1].ip")
$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf addnode $ip1 add

This will find out the (node) IP address of the second node using our recently implemented status information and invoke the JSON-RPC method addnode on the first node to connect the first and the second node. We can now verify that the IP address of node 1 has been added to the addnode list of node 0 and that node 1 has been added to the peer list of node 0, but also node 0 has been added to the peer list of node 1.

$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getaddednodeinfo
$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getpeerinfo
$ kubectl exec my-network-sts-1 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getpeerinfo

We can now repeat this process with the third node – we again make the node known to node 0 and then get the list of nodes each nodes knows about.

$ ip2=$(kubectl get bitcoinnetwork my-network -o json | jq -r ".status.nodes[2].ip")
$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf addnode $ip2 add
$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getpeerinfo
$ kubectl exec my-network-sts-1 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getpeerinfo
$ kubectl exec my-network-sts-2 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getpeerinfo

We see that

  • Node 0 knows both node 1 and node 2
  • Node 1 knows only node 0
  • Node 2 knows only node 1

So in contrast to my previous understanding, the nodes do not automatically connect to each other when there is a node that is known to all of them. After some research, I suspect that this is because bitcoind puts addresses into buckets and only connects to one IP address in the bucket. As IP addresses in the same subnet go into the same bucket, only one connection will be made by default. To avoid an artificial dependency on node 0, we therefore explicitly connect each node to any other node in the network.

To do this, we create an additional function syncNodes which is called during the reconciliation if we detect a change in the node list. Within this function, we then simply loop over all nodes that are ready and, for each node:

  • Submit the RPC call addednodeinfo to get a list of all nodes that have previously been added
  • For each node that is not in the list, add it using the RPC call addnode with command add
  • For each node that is in the list, but is no longer ready, use the same RPC call with command remove to remove it from the list

As there might be another worker thread working on the same network, we ignore, for instance, errors that a bitcoind returns when we try to add a node that has already been added before, similarly for deletions.

Time again to run some tests. First, let us run the controller and bring up a bitcoin network called my-network (assuming that you have cloned my repository)

$ kubectl apply -f deployments/controller.yaml
$ kubectl apply -f deployments/testNetwork.yaml

Wait for some time – somewhere between 30 and 45 seconds – to allow all nodes to come up. Then, inspect the log file of the controller

$ kubectl logs bitcoin-controller -n bitcoin-controller

You should now see a few messages indicating that the controller has determined that nodes need to be added to the network. To verify that this worked, we can print all added node lists for all three instances.

$ for i in {0..2}; 
do
  ip=$(kubectl get pod my-network-sts-$i -o json  | jq -r ".status.podIP")
  echo "Connectivity information for node $i (IP $ip):" 
  kubectl exec my-network-sts-$i -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getaddednodeinfo | jq -r ".[].addednode"
done

This should show you that in fact, all nodes are connected to each other – each node is connected to all other nodes. Now let us connect to one node, say node 0, and mine a few blocks.

$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf generate 101

After a few seconds, we can verify that all nodes have synchronized the chain.

$ for i in {0..2}; 
do
  ip=$(kubectl get pod my-network-sts-$i -o json  | jq -r ".status.podIP")
  blocks=$(kubectl exec my-network-sts-$i -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getblockchaininfo | jq -r ".blocks")
  echo "Node $i (IP $ip) has $blocks blocks" 
done

This should show you that all three nodes have 101 blocks in their respective chain. What happens if we bring down a node? Let us delete, for instance, pod 0.

$ kubectl delete pod my-network-sts-0

After a few seconds, the stateful set controller will have brought up a replacement. If you wait for a few more seconds and repeat the command above, you will see that the new node has been integrated into the network and synchronized the blockchain. In the logfiles of the controller, you will also see that two things have happened (depending a bit on timing). First, the controller has realized that the node is no longer ready and uses RPC calls to remove it from the added node lists of the other nodes. Second, when the replacement node comes up, it will add this node to the remaining nodes and vice versa, so that the synchronization can take place.

Similarly, we can scale our deployment. To do this, enter

$ kubectl edit bitcoinnetwork my-network

Then change the number of replicas to four and save the file. After a few seconds, we can inspect the state of the blockchain on the new node and find that is also has 101 blocks.

$ kubectl exec my-network-sts-3 -- /usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getblockchaininfo

Again, the log files of the controller tell us that the controller has detected the new node and added it to all other nodes. Similarly, if we use the same procedure to scale down again, the nodes that are removed from the stateful set will also be removed from the added node lists of the remaining nodes.

We now have the core functionality of our controller in place. As in the previous posts, I have pushed the code into a new tag on GitHub. I have also pushed the latest image to Docker Hub so that you can repeat the tests described above without building the image yourself. In the next post, we will start to add some more meat to our controller and to implement some obvious improvements – proper handling of secrets, for instance.

Building a bitcoin controller for Kubernetes part IV – garbage collection and status updates

In our short series on implementing a bitcoin controller for Kubernetes, we have reached the point where the controller is actually bringing up bitcoin nodes in our network. Today, we will extend its logic to also cover deletions and we will start to add additional logic to monitor the state of our network.

Step 9: owner references and deletions

As already mentioned in the last post, Kubernetes has a built-in garbage collector which we can utilize to handle deletions so that our controller does not have to care about this.

The Kubernetes garbage collector is essentially a mechanism that is able to perform cascading deletes. Let us take a deployment as an example. When you create a deployment, the deployment will in turn create a replica set, and the replica set will bring up pods. When you delete the deployment, the garbage collector will make sure that the replica set is deleted, and the deletion of the replica set in turn will trigger the deletion of the pods. Thus we only have to maintain the top-level object of the hierarchy and the garbage collector will help us to clean up the dependent objects.

The order in which objects are deleted is controller by the propagation policy which can be selected when deleting an object. If “Foreground” is chosen, Kubernetes will mark the object as pending deletion by setting its deletionTimestamp and delete all objects that are owned by this object in the background before itself is eventually removed. For a “Background” deletion, the order is reversed – the object will be deleted right away, and the cleanup will be performed afterwards.

How does the garbage collector identify the objects that need to be deleted when we delete, say, a deployment? The ownership relation underlying this logic is captured by the ownership references in the object metadata. This structure contains all the information (API version, kind, name, UID) that Kubernetes needs to identify the owner of an object. An owner reference can conveniently be generated using the function NewControllerRef in the package k8s.io/apimachinery/pkg/apis/meta/v1

Thus, to allow Kubernetes to clean up our stateful set and the service when we delete a bitcoin network, we need to make two changes to our code.

  • We need to make sure that we add the owner reference to the metadata of the objects that we create
  • When reconciling the status of a bitcoin network with its specification, we should ignore networks for which the deletion timestamp is already set, otherwise we would recreate the stateful set while the deletion is in progress

For the sake of simplicity, we can also remove the delete handler from our code completely as it will not trigger any action anyway. When you now repeat the tests at the end of the last post and delete the bitcoin, you will see that the stateful set, the service and the pods are deleted as well.

At this point, let us also implement an additional improvement. When a service or a stateful set changes, we have so far been relying on the periodic resynchronisation of the cache. To avoid long synchronization times, we can also add additional handlers to our code to detect changes to our stateful set and our headless service. To distinguish changes that affect our bitcoin networks from other changes, we can again use the owner reference mechanism, i.e. we can retrieve the owner reference from the stateful set to figure out to which – if any – bitcoin network the stateful set belongs. Following the design of the sample controller, we can put this functionality into a generic method handleObject that works for all objects.

BitcoinControllerStructureII

Strictly speaking, we do not really react upon changes of the headless service at the moment as the reconciliation routine only checks that it exists, but not its properties, so changes to the headless service would go undetected at the moment. However, we add the event handler infrastructure for the sake of completeness.

Step 10: updating the status

Let us now try to add some status information to our bitcoin network which is updated regularly by the controller. As some of the status information that we are aiming at is not visible to Kubernetes (like the synchronization state of the blockchain), we will not add additional watches to capture for instance the pod status, but once more rely on the periodic updates that we do anyway.

The first step is to extend the API type that represents a bitcoin network to add some more status information. So let us add a list of nodes to our status. Each individual node is described by the following structure

type BitcoinNetworkNode struct {
	// a number from 0...n-1 in a deployment with n nodes, corresponding to
	// the ordinal in the stateful set
	Ordinal int32 `json:"ordinal"`
	// is this node ready, i.e. is the bitcoind RPC server ready to accept requests?
	Ready bool `json:"ready"`
	// the IP of the node, i.e. the IP of the pod running the node
	IP string `json:"ip"`
	// the name of the node
	NodeName string `json:"nodeName"`
	// the DNS name
	DNSName string `json:"dnsName"`
}

Correspondingly, we also need to update our definition of a BitcoinNetworkStatus – do not forget to re-run the code generation once this has been done.

type BitcoinNetworkStatus struct {
	Nodes []BitcoinNetworkNode `json:"nodes"`
}

The next question we have to clarify is how we determine the readiness of a bitcoin node. We want a node to appear as ready if the bitcoind representing the node is accepting JSON RPC requests. To achieve this, there is again a Kubernetes mechanism which we can utilize – readiness probes. In general, readiness probes can be defined by executing an arbitrary command or by running a HTTP request. As we are watching a server object, using HTTP requests seems to be the way to go, but there is a little challenge: the bitcoind RPC server uses HTTP POST requests, so we cannot use a HTTP GET request as a readiness probe, and Kubernetes does not allow us to configure a POST request. Instead, we use the exec-option of a readiness check and run the bitcoin CLI inside the container to determine when the node is ready. Specifically, we execute the command

/usr/local/bin/bitcoin-cli -regtest -conf=/bitcoin.conf getnetworkinfo

Here we use the configuration file that contains, among other things, the user credentials that the CLI will use. As a YAML structure, this readiness probe would be set up as follows (but of course we do this in Go in our controller programmatically).

   readinessProbe:
      exec:
        command:
        - /usr/local/bin/bitcoin-cli
        - -regtest
        - -conf=/bitcoin.conf
        - getnetworkinfo
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Note that we wait 10 seconds before doing the first probe, to give the bitcoind sufficient time to come up. It is instructive to test higher values of this, for instance 60 seconds – when the stateful set is created, you will see how the creation of the second pod is delayed for 60 seconds until the first readiness check for the first pod succeeds.

Now let us see how we can populate the status structure. Basically, we need to retrieve a list of all pods that belong to our bitcoin network. We could of course again use the ownership references to find those pods, or use the labels that we need anyway to define our stateful set. But in our case, there is even a an easier approach – as we use a stateful set, the names of the pods are completely predictable and we can easily retrieve them all by name. So to update the status information, we need to

  • Loop through the pods controlled by this stateful set, and for each pod
  • Find the status of the pod, using its conditions
  • Retrieve the pods IP address from its status
  • Assemble the DNS name (which, as we know, is the combination of the name of the pod and the name of the headless service)
  • Put all this into the above structure
  • Post this using the UpdateStatus method of our client object

Note that at this point, we follow the recommended best practise and update the status independent of the spec. It is instructive to extend the logging of the controller to log the generation of the bitcoin network and the resourceVersion. The generation (contained in the ObjectMeta structure) represents a version of the desired state, i.e. the spec, and is only updated (usually incremented by one) if we change the spec for the bitcoin network resource. In contrast to this, the resource version is updated for every change of the persisted state of the object and represents the etcd’s internal sequence number.

When you try to run our updated controller inside the cluster, however, you will find that there is again a problem – with the profile that we have created and to which our service account is linked, the update of the status information of a bitcoin network is not allowed. Thus we have to explicitly allow this by granting the update right on the subresource status, which is done by adding the following rule to our cluster role.

- apiGroups: ["bitcoincontroller.christianb93.github.com"]
  resources: ["bitcoinnetworks/status"]
  verbs: ["update"] 

We can now run a few more tests to see that our status updates work. When we bring up a new bitcoin network and use kubectl with “-o json” to retrieve the status of the bitcoin network, we can see that the node list populates as the pods are brought up and the status of the nodes changes to “Ready” one by one.

I have again created a tag v0.4 in the GitHub repository for this series to persist the state of the code at this point in time, so that you have the chance to clone the code and play with it. In the next post, we will move on and add the code needed to make sure that our bitcoin nodes detect each other at startup and build a real bitcoin network.

Building a bitcoin controller for Kubernetes part III – service accounts and the reconciliation function

In the previous post, we have reached the point where our controller is up and running and is starting to handle events. However, we hit upon a problem at the end of the last post – when running in-cluster, our controller uses a service account which is not authorized to access our bitcoin network resources. In todays post, we will see how to fix this by adding RBAC rules to our service account. In addition, we will implement the creation of the actual stateful set when a new bitcoin network is created.

Step 7: defining a service account and roles

To define what a service account is allowed to do, Kubernetes offers an authorization scheme based on the idea of role-based access control (RBAC). In this model, we do not add authorizations to a user or service account directly. Instead, the model knows three basic entities.

  • Subjects are actors that need to be authorized. In Kubernetes, actors can either be actual users or service accounts.
  • Roles are collection of policy rules that define a set of allowed actions. For instance, there could be a role “reader” which allows read-access to all or some resources, and a separate role “writer” that allows write-access.
  • Finally, there are role bindings which link roles and subjects. A subject can have more than one role, and each role can be assigned to more than one subject. The sum of all roles assigned to a subject determines what this subject is allowed to do in the cluster

The actual data model is a bit more complicated, as there are some rules that only make sense on the cluster level, and other rules can be restricted to a namespace.

RBAC

How do we specify a policy rule? Essentially, a policy rule lists a set of resources (specified by the API group and the resource type or even specific resource names) as they would show up in an API path, and a set of verbs like GET, PUT etc. When we add a policy rule to a role, every subject that is linked to this role will be authorized to run API calls that match this combination of resource and verb.

A cluster role then basically consists of a list of policy rules (there is also a mechanism called aggregation which allows us to build hierarchies or roles). Being an API object, it can be described by a manifest file and created using kubectl as any other resource. So to set up a role that will allow our controller to list, get and update bitcoin network resources and pods and create, update, get, list and delete stateful sets, we would apply the following manifest file.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: bitcoin-controller-role
rules:
- apiGroups: ["apps", ""]
  resources: ["statefulsets", "services"]
  verbs: ["get", "watch", "list", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["bitcoincontroller.christianb93.github.com"]
  resources: ["bitcoinnetworks"]
  verbs: ["get", "list", "watch"]

Next, we set up a specific service account for our controller (otherwise we would have to add our roles to the default service account which is used by all pods by default – this is not what we want). We need to do this for every namespace in which we want to run the bitcoin operator. Here is a manifest file that creates a new namespace bitcoin-controller with a corresponding service account.

apiVersion: v1
kind: Namespace
metadata:
    name: bitcoin-controller
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bitcoin-controller-sva
  namespace: bitcoin-controller

Let us now link this service account and our cluster role by defining a cluster role binding. Again, a cluster role binding can be defined in a manifest file and be applied using kubectl.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: bitcoin-controller-role-binding
subjects:
- kind: ServiceAccount
  name: bitcoin-controller-sva
  namespace: bitcoin-controller
roleRef:
  kind: ClusterRole
  name: bitcoin-controller-role
  apiGroup: rbac.authorization.k8s.io

Finally, we need to modify our pod specification to instruct Kubernetes to run our controller using our newly created service account. This is easy, we just need to add the service account as a field to the Pod specification:

...
spec:
  serviceAccountName: bitcoin-controller-sva
...

When we now run our controller using the modified manifest file, it should be able to access all the objects it needs and the error messages observed at the end of our last post should disappear. Note that we need to run our controller in the newly created namespace bitcoin-controller, as our service account lives in this namespace. Thus you will have to create a service account and a cluster role binding for every namespace in which you want to run the bitcoin controller.

Step 8: creating stateful sets

Let us now start to fill the actual logic of our controller. The first thing that we will do is to make sure that a stateful set (and a matching headless service) is created when a new bitcoin network is defined and conversely, the stateful set is removed again when the bitcoin network is deleted.

This requires some additional fields in our controller object that we need to define and populate. We will need

  • access to indexers for services and stateful sets in order to efficiently query existing stateful sets and services, i.e. additional listers and informers (strictly speaking we will not need the informers in todays post, but in a future post – here we only need the listers)
  • A clientset that we can use to create services and stateful sets

Once we have this, we can design the actual reconciliation logic. This requires a few thoughts. Remember that our logic should be level-based and not edge-based, because our controller could actually miss events, for instance if it is down for some time and comes up again. So the logic that we implement is as follows and will be executed every time when we retrieve a trigger from the work queue.

Retrieve the headless service for this bitcoin network 
IF service does not exist THEN
  create new headless service
END IF
Retrieve the stateful set for this bitcoin network 
IF stateful set does not exist THEN
  create new stateful set
END IF
Compare number of nodes in bitcoin network spec with replicas in stateful set
IF they are not equal
  update stateful set object
END IF

For simplicity, we will use a naming convention to match bitcoin networks and stateful sets. This has the additional benefit that when we try to create a second stateful set by mistake, it will be refused as no two stateful sets with the same name can exist. Alternatively, we could use a randomly generated name and use labels or annotations to match stateful sets and controllers (and, of course, there are owner references – more on this below).

Those of you who have some experience with concurrency, multi-threading, locks and all this (for instance because you have built an SMP-capable operating system kernel) will be a bit alerted when looking at this code – it seems very vulnerable to race conditions. What if a node is just going down and the etcd does not know about it yet? What if the cache is stale and the status in the etcd is already reflecting updates that we do not see? What if two events are processed concurrently by different worker threads? What if a user updates the bitcoin network spec while we are just bringing up our stateful sets?

There are two fundamentally different ways to deal with these challenges. Theoretically, we could probably use the API mechanisms provided for optimistic locking ( resource versions that are being checked on updates) to implement basic synchronization primitives like compare-and-swap as it is done to implement leader election on Kubernetes, see also this blog. We could then implement locking mechanisms based on these primitives and use them to protect our resources. However, this will never be perfect, as there will always be a lag between the state in the etcd and the actual state of the cluster. In addition, this can easily put us in a situation where deadlocks occur or locks at least slow down the processing massively.

The second approach – which, looking at the source code of some controllers in the Kubernetes repositories, seems to be the approach taken by the K8s community – is to accept that full consistency will never be possible and to strive for eventual consistency. All actors in the system need to prepare for encountering temporary inconsistencies and implement mechanisms to deal with them, for instance be re-queuing events until the situation is resolved. This is the approach that we will also take for our controller. This implies, for instance, that we re-queue events when errors occur and that we leverage the periodic resync of the cache to reconcile the to-be state and the as-is state periodically. In this way, inconsistencies can arise but should be removed in the next synchronisation cycle.

In this version of the code, error handling is still very simple – most of the time, we simply stop the reconciliation when an error occurs without re-queuing the event and rely on the periodic update that happens every 30 seconds anyway because the cache is re-built. Of course there are errors for which we might want to immediately re-queue to retry faster, but we leave that optimization to a later version of the controller.

Let us now run a few tests. I have uploaded the code after adding all the features explained in this post as tag v3 to Github. For simplicity, I assume that you have cloned this code into the corresponding directory github.com/christianb93/bitcoin-controller in your Go workspace and have a fresh copy of a Minikube cluster. To build and deploy the controller, we have to add the CRD, the service account, cluster role and cluster role binding before we can build and deploy the actual image.

$ kubectl apply -f deployments/crd.yaml
$ kubectl apply -f deployments/rbac.yaml
$ ./build/controller/minikube_build.sh
$ kubectl apply -f deployments/controller.yaml

At this point, the controller should be up and running in the namespace bitcoin-controller, and you should be able to see its log output using

$ kubectl logs -n bitcoin-controller bitcoin-controller

Let us now add an actual bitcoin network with two replicas.

$ kubectl apply -f deployments/testNetwork.yaml

If you now take a look at the logfiles, you should see a couple of messages indicating that the controller has created a stateful set my-network-sts and a headless service my-network-svc. These objects have been created in the same namespace as the bitcoin network, i.e. the default namespace. You should be able to see them doing

$ kubectl get pods
$ kubectl get sts
$ kubectl get svc

When you run these tests for the first time in a new cluster, it will take some time for the containers to come up, as the bitcoind image has to be downloaded from the Docker Hub first. Once the pods are up, we can verify that the bitcoin daemon is running, say on the first node

$ kubectl exec my-network-sts-0 -- /usr/local/bin/bitcoin-cli -regtest -rpcuser=user -rpcpassword=password getnetworkinfo

We can also check that our controller will monitor the number of replicas in the stateful set and adjust accordingly. When we set the number of replicas in the stateful set to five, for instance, using

$ kubectl scale --replicas=5 statefulset/my-network-sts

and then immediately list the stateful set, you will see that the stateful set will bring up additional instances. After a few seconds, however, when the next regular update happens, the controller will detect the difference and scale the replica set down again.

This is nice, but there is again a problem which becomes apparent if we delete the network again.

$ kubectl delete bitcoinnetwork my-network

As we can see in the logs, this will call the delete handler, but at this point in time, the handler is not doing anything. Should we clean up all the objects that we have created? And how would that fit into the idea of a level based processing? If the next reconciliation takes place after the deletion, how can we identify the remaining objects?

Fortunately, Kubernetes offers very general mechanisms – owner references and cascading deletes – to handle these problems. In fact, Kubernetes will do all the work for us if we only keep a few points in mind – this will be the topic of the next post.

Building a bitcoin controller for Kubernetes part II – code generation and event handling

In this post, we will use the Kubernetes code generator to create client code and informers which will allow us to set up the basic event handlers for our customer controller.

Before we start to dig into this, note that compared to my previous post, I had to make a few changes to the CRD definition to avoid dashes in the name of the API group. The updated version of the CRD definition looks as follows.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: bitcoinnetworks.bitcoincontroller.christianb93.github.com
spec:
    version: v1
    group: bitcoincontroller.christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: bitcoinnetworks
      singular: bitcoinnetwork
      kind: BitcoinNetwork
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required:
            - nodes
            properties:
              nodes:
                type: integer

Step 5: running the code generators

Of course we will use the Kubernetes code generator to generate the code for the clientset and the informer. To use the code generator, we first need to get the corresponding packages from the repository.

$ go get k8s.io/code-generator
$ go get k8s.io/gengo

The actual code generation takes place in three steps. In each step, we will invoke one of the Go programs located in $GOPATH/src/k8s.io/code-generator/cmd/ to create a specific set of objects. Structurally, these programs are very similar. They accept a parameter that specifies certain input packages that are scanned. They then look at every structure in these packages and detect tags, i.e. comments in a special format, to identify those objects for which they need to create code. Then they place the resulting code in an output package that we need to specify.

Fortunately, we only need to prepare three inputs files for the code generation – the first one is actually scanned by the generators for tags, the second and third file have to be provided to make the generated code compile.

  • In the package apis/bitcoincontroller/v1, we need to provide a file types.go in which define the Go structures corresponding to our CRD – i.e. a BitcoinNetwork, the corresponding list type BitcoinNetworkList, a BitcoinNetworkSpec and a BitcoinNetworkStatus. This is also the file in which we need to place our tags (as the scan is based on package structures, we could actually call our file however we want, but following the usual conventions makes it easier for third parties to read our code)
  • In the same directory, we will place a file register.go. This file defines some functions that will later be called by the generated code to register our API group and version with a scheme
  • Finally, there is a second file register.go which is placed in apis/bitcoincontroller and defines a constant representing the fully qualified name of the API group

We first start with the generator that creates the code to create deep copies for our API objects. In this case, we mark the structures for which code should be generated with the tag +k8s.deepcopy-gen=true (which we could also do on package level). As we also want to create DeepCopyObject() methods for these structures, we also add the additional tags

+k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

Then we invoke the code generator using

go run $GOPATH/src/k8s.io/code-generator/cmd/deepcopy-gen/main.go \
  --bounding-dirs github.com/christianb93/bitcoin-controller/internal/apis \
  --input-dirs github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1

By default, the generator will place its results in a file deepcopy_generated.go in the input directory. If you run the controller and open the file, you should find the generated code which is not hard to read and does in fact simply create deep copies. For a list, for instance, it creates a new list and copies item by item. As our structures are not deeply nested, the code is comparatively straightforward. If something goes wrong, you can add the switch --v 5 to increase the log level and obtain additional debugging output.

The second code generator that we will use is creating the various clients that we need – a clientset for our new API group and a client for our new resource. The structure of the command is similar, but this time, we place the generated code in a separate directory.

go run $GOPATH/src/k8s.io/code-generator/cmd/client-gen/main.go \
  --input-base "github.com/christianb93/bitcoin-controller/internal/apis" \
  --input "bitcoincontroller/v1" \
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/clientset" \
  --clientset-name "versioned"

The first two parameters taken together define the package that is scanned for tagged structures. This time, the magic tag that will cause a structure to be considered for code generation is +genclient. The third parameter and the fourth parameters similarly define where the output will be placed in the Go workspace. The actual package name will be formed from this output path by appending the name of the API group and the version. Make sure to set this variable, as the default will point into the Kubernetes package and not into your own code tree (it took me some time to figure out the exact meaning of all these switches and a few failed attempts plus some source code analysis – but this is one of the beauties of Go – all the source code is at your fingertip…)

When you run this command, it will place a couple of files in the directory $GOPATH/src/github.com/christianb93/bitcoin-controller/internal/generated/clientset. With these files, we have now all the code in place to handle our objects via the API – we can create, update, get and list our bitcoin networks. To list all existing bitcoin networks, for instance, the following code snippet will work (I have skipped some of the error handling code to make this more readable).

import (
	"fmt"
	"path/filepath"

	bitcoinv1 "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"
	clientset "github.com/christianb93/bitcoin-controller/internal/generated/clientset/versioned"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

home := homedir.HomeDir()
kubeconfig := filepath.Join(home, ".kube", "config")
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
// Create BitcoinNetwork client set
c, err := clientset.NewForConfig(config)
client := c.BitcoincontrollerV1()
list, err := client.BitcoinNetworks("default").List(metav1.ListOptions{})
for _, item := range list.Items {
	fmt.Printf("Have item %s\n", item.Name)
}

This code is very similar to the code that we have used in one of our first examples to list pods and nodes, with the only difference that we are now using our generated packages to create a clientset. I have written a few tests to verify that the generated code works.

To complete our code generation, we now have to generate listers and informers. The required commands will first generate the listers package and then the informers package that uses the listers.

go run $GOPATH/src/k8s.io/code-generator/cmd/lister-gen/main.go \
  --input-dirs  "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"\
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/listers"

go run $GOPATH/src/k8s.io/code-generator/cmd/informer-gen/main.go \
  --input-dirs  "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"\
  --versioned-clientset-package "github.com/christianb93/bitcoin-controller/internal/generated/clientset/versioned"\
  --listers-package "github.com/christianb93/bitcoin-controller/internal/generated/listers"\
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/informers"

You can find a shell script that runs all necessary commands here.

Again, we can now use our listers and informers as for existing API objects. If you want to try this out, there is also a small test set for this generated code.

Step 6: writing the controller skeleton and running first tests

We can now implement most of the code of the controller up to the point where the actual business logic kicks in. In main.go, we create a shared informer and a controller object. Within the controller, we add event handlers to this informer that put the events onto a work queue. Finally, we create worker threads that pop the events off the queue and trigger the actual business logic (which we still have to implement). If you have followed my previous posts, this code is straightforward and does not contain anything new. Its structure at this point in time is summarized in the following diagram.

BitcoinControllerStructureI

We are now in a position to actually run our controller and test that the event handlers are called. For that purpose, clone my repository into your workspace, make sure that the CRD has been set up correctly in your cluster and start the controller locally using

$ go run $GOPATH/src/github.com/christianb93/bitcoin-controller/cmd/controller/main.go --kubeconfig "$HOME/.kube/config"

You should now see a few messages telling you that the controller is running and has entered its main loop. Then, in a second terminal, create a test bitcoin network using

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/bitcoin-controller/master/deployments/testNetwork.yaml

You should now see that the ADD handler has been called and see a message that the worker thread has popped the resulting event off the work queue. So our message distribution scheme works! You will also see that even though there are no further changes, update events are published every 30 seconds. The reason for this behaviour is that the cache is resynced every 30 seconds which will push the update events. This can be useful to make sure that a reconciliation is done every 30 seconds, which might heal a potentially incorrect state which was the result of an earlier error.

This is nice, but there is a problem which becomes apparent if you now try to package our code in a container and run it inside the cluster as we have done it at the end of our previous post. This will not produce the same output, but error messages ending with “cannot list resource “bitcoinnetworks” in API group “bitcoincontroller.christianb93.github.com” at the cluster scope”.

The reason for this is that the pod is running with the default service account, and this account does not have the privileges to read our resources. In the next post, we will see how role based access control comes to the rescue.

As before, I have created the tag v0.2 to reflect the status of the code at the end of this post.

Building a bitcoin controller for Kubernetes part I – the basics

As announced in a previous post, we will, in this and the following posts, implement a bitcoin controller for Kubernetes. This controller will be aimed at starting and operating a bitcoin test network and is not designed for production use.

Here are some key points of the design:

  • A bitcoin network will be specified by using a custom resource
  • This definition will contain the number of bitcoin nodes that the controller will bring up. The controller will also talk to the individual bitcoin daemons using the Bitcon JSON RPC API to make the nodes known to each other
  • The controller will monitor the state of the network and maintain a node list which is part of the status subresource of the CRD
  • The bitcoin nodes are treated as stateful pods (i.e. controlled by a stateful set), but we will use ephemeral storage for the sake of simplicity
  • The individual nodes are not exposed to the outside world, and users running tests against the cluster either have to use tunnels or log into the pod to run tests there – this is of course something that could be changed in a future version

The primary goal of writing this operator was not to actually run it in real uses cases, but to demonstrate how Kubernetes controllers work under the hood… Along the way, we will learn a bit about building a bitcoin RPC client in Go, setting up and using service accounts with Kubernetes, managing secrets, using and publishing events and a few other things from the Kubernetes / Go universe.

Step 1: build the bitcoin Docker image

Our controller will need a Docker image that contains the actual bitcoin daemon. At least initially, we will use the image from one of my previous posts that I have published on the Docker Hub. If you decide to use this image, you can skip this section. If, however, you have your own Docker Hub account and want to build the image yourself, here is what you need to do.

Of course, you will first need to log into Docker Hub and create a new public repository.
You will also need to make sure that you have a local version of Docker up and running. Then follow the instructions below, replacing christianb93 in all but the first lines with your Docker Hub username. This will

  • Clone my repository containing the Dockerfile
  • Trigger the build and store the resulting image locally, using the tag username/bitcoind:latest – be patient, the build can take some time
  • Log in to the Docker hub which will store your credentials locally for later use by the docker command
  • Push the tagged image to the Docker Hub
  • Delete your credentials again
$ git clone https://github.com/christianb93/bitcoin.git
$ cd bitcoin/docker 
$ docker build --rm -f Dockerfile -t christianb93/bitcoind:latest .
$ docker login
$ docker push christianb93/bitcoind:latest
$ docker logout

Step 2: setting up the skeleton – logging and authentication

We are now ready to create a skeleton for our controller that is able to start up inside a Kubernetes cluster and (for debugging purposes) locally. First, let us discuss how we package our code in a container and run it for testing purposes in our cluster.

The first thing that we need to define is our directory layout. Following standard conventions, we will place our code in the local workspace, i.e. the $GOPATH directory, under $GOPATH/src/github.com/christianb93/bitcoin-controller. This directory will contain the following subdirectories.

  • internal will contain our packages as they are not meant to be used outside of our project
  • cmd/controller will contain the main routine for the controller
  • build will contain the scripts and Dockerfiles to build everything
  • deployments will holds all manifest files needed for the deployment

By default, Go images are statically linked against all Go specific libraries. This implies that you can run a Go image in a very minimal container that contains only C runtime libraries. But we can go even further and ask the Go compiler to also statically link the C runtime library into the Go executable. This executable is then independent of any other libraries and can therefore run in a “scratch” container, i.e. an empty container. To compile our controller accordingly, we can use the commands

CGO_ENABLED=0 go build
docker build --rm -f ../../build/controller/Dockerfile -t christianb93/bitcoin-controller:latest .

in the directory cmd/controller. This will build the controller and a docker image based on the empty scratch image. The Dockerfile is actually very simple:

FROM scratch

#
# Copy the controller binary from the context into our
# container image
#
COPY controller /
#
# Start controller
#
ENTRYPOINT ["/controller"]

Let us now see how we can run our controller inside a test cluster. I use minikube to run tests locally. The easiest way to run own images in minikube is to build them against the docker instance running within minikube. To do this, execute the command

eval $(minikube docker-env)

This will set some environment variables so that any future docker commands are directed to the docker engine built into minikube. If we now build the image as above, this will create a docker image in the local repository. We can run our image from there using

kubectl run bitcoin-controller --image=christianb93/bitcoin-controller --image-pull-policy=Never --restart=Never

Note the image pull policy – without this option, Kubernetes would try to pull the image from the Docker hub. If you do not use minikube, you will have to extend the build process by pushing the image to a public repository like Docker hub or a local repository reachable from within the Kubernetes cluster that you use for your tests and omit the image pull policy flag in the command above. We can now inspect the log files that our controller writes using

kubectl logs bitcoin-controller

To implement logging, we use the klog package. This will write our log message to the standard output of the container, where they are picked up by the Docker daemon and forwarded to the Kubernetes logging system.

Our controller will need access to the Kubernetes API, regardless of whether we execute it locally or within a Kubernetes cluster. For that purpose, we use a command-line argument kubeconfig. If this argument is set, it refers to a kubectl config file that is used by the controller. We then follow the usual procedure to create a clientset.

In case we are running inside a cluster, we need to use a different mechanism to obtain a configuration. This mechanism is based on a service accounts.

Essentially, service accounts are “users” that are associated with a pod. When we associate a service account with a pod, Kubernetes will map the credentials that authenticate this service account into /var/run/secrets/kubernetes.io/serviceaccount. When we use the helper function clientcmd.BuildConfigFromFlags and pass an empty string as configuration file, the Go client will fall back to in-cluster configuration and try to retrieve the credentials from that location. If we do not specify a service account for the pod, a default account is used. This is what we will do for the time being, but we will soon run into trouble with this approach and will have to define a service account, an RBAC role and a role binding to grant permissions to our controller.

Step 3: create a CRD

Next, let us create a custom resource definition that describes our bitcoin network. This definition is very simple – the only property of our network that we want to make configurable at this point in time is the number of bitcoin nodes that we want to run. We do specify a status subresource which we will later use to track the status of the network, for instance the IP addresses of its nodes. Here is our CRD.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: bitcoin-networks.bitcoin-controller.christianb93.github.com
spec:
    version: v1
    group: bitcoin-controller.christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: bitcoin-networks
      singular: bitcoin-network
      kind: BitcoinNetwork
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required:
            - nodes
            properties:
              nodes:
                type: int

Step 4: pushing to a public repository and running the controller

Let us now go through the complete deployment cycle once, including the push to a public repository. I assume that you have a user on Docker Hub, (for me, this is christianb93), and have set up a repository called bitcoin-controller in this account. I will also assume that you have done a docker login before running the commands below. Then, building the controller is easy – simply run the following commands, replacing the christianb93 in the last two commands with your username on Docker Hub.

cd $GOPATH/src/github.com/christianb93/bitcoin-controller/cmd/controller
CGO_ENABLED=0 go build
docker build --rm -f ../../build/controller/Dockerfile -t christianb93/bitcoin-controller:latest .
docker push christianb93/bitcoin-controller:latest

Once the push is complete, you can run the controller using a standard manifest file as the one below.

apiVersion: v1
kind: Pod
metadata:
  name: bitcoin-controller
  namespace: default
spec:
  containers:
  - name: bitcoin-controller-ctr
    image: christianb93/bitcoin-controller:latest

Note that this will only pull the image from Docker Hub if we delete the local image using

docker rmi christianb93/bitcoin-controller:latest

from the minikube Docker repository (or did not use that repository at all). You will see that pushing takes some time, this is why I prefer to work with the local registry most of the time and only push to the Docker Hub once in a while.

We now have our build system in place and a working skeleton which we can run in our cluster. This version of the code is available in my GitHub repository under the v0.1 tag. In the next post, we will start to add some meat – we will model our CRD in a Go structure and put our controller in a position to react on newly added bitcoin networks.

Understanding Kubernetes controllers part IV – putting it all together

In the last few posts, we have looked in detail at the various parts of the machinery behind Kubernetes controllers – informers, queues, stores and so forth. In this final post, we will wrap up a bit and see how all this comes together in the Kubernetes sample controller.

The flow of control

The two main source code files that make up the sample controller are main.go and controller.go. The main function is actually rather simple. It creates a clientset, an informer factory for Foo objects and an informer factory for deployments and then uses those items to create a controller.

Once the controller exists, main starts the informers using the corresponding functions of the factory, which will bring up the goroutines of the two informers. Finally, the controllers main loop is started by calling its Run method.

The controller is more interesting. When it is created using NewController, it attaches event handlers to the informes for Deployments and Foo resources. As we have seen, both event handlers will eventually put Foo objects that require attention into a work queue.

The items in this work queue are processed by worker threads that are created in the controllers Run method. As the queue servers act as a synchronization point, we can run as many worker threads that we want and still make sure that each event is processed by only one worker thread. The main function of the worker thread is processNextWorkItem which retrieves elements from the queue, i.e. the keys of the Foo objects that need to be reconciled, and calls syncHandler for each of them. If that functions fails, the item is put back onto the work queue.

The syncHandler function contains the actual reconciliation logic. It first splits the key into namespace and name of the Foo resource. Then it tries to receive an existing deployment for this Foo resource and creates one if it does not yet exist. If a deployment is found, but deviates from the target state, it is replaced by a new deployment as returned by the function newDeployment. Finally, the status subresource of the Foo object is updated.

So this simple controller logic realizes some of the recommendations for building controllers.

  • It uses queues to decouple workers from event handlers and to allow for concurrent processing
  • Its logic is level based, not edge based, as a controller might be down for some time and miss updates
  • It uses shared informers and caches. The cached objects are never updated directly, but if updates are needed, deep copies are used and modified and updates are done via the API
  • It waits for all caches to sync before starting worker threads
  • It collapses all work to be done into a single queue

Package structure and generated code

If you browse the sourcecode of the sample controller, you might at the first glance be overwhelmed by the number of files in the repository. However, there is good news – most of this code is actually generated! In fact, the diagram below shows the most important files and packages. Generated files are in italic font, and files that serve as input for the code generator or are referenced by the generated code are in bold face.

SampleControllerPackages

We see that the number of files that we need to provide is comparatively small. Apart from, of course, the files discussed above (main.go and controller.go), these are

  • register.go which adds our new types to the scheme and contains some lookup functions that the generated code will use
  • types.go which defines the Go structures that correspond to the Foo CRD
  • register.go in the top-level directory of the API package which defines the name of the API group as a constant

All the other files are created by the Kubernetes API code generator. This generator accepts a few switches (“generators”) to generate various types of code artifacts – deep-copy routines, clientsets, listers and informers. We will see the generator in action in a later post. Instead of using the code generator directly, we could of course as well use the sample controller as a starting point for our own code and make the necessary modifications manually, the previous posts should contain all the information that we need for this.

This post completes our short mini-series on the machinery behind custom controllers. In the next series, we will actually apply all this to implement a controller that operators a small bitcoin test network in a Kubernetes cluster.

Understanding Kubernetes controllers part III – informers

After understanding how an informer can be used to implement a custom controller, we will now learn more about the inner working of an informer.

We have seen that essentially, informers use the Kubernetes API to learn about changes in the state of a Kubernetes cluster and use that information to maintain a cache (the indexer) of the current cluster state and to inform clients about the changes by calling handler functions. To achieve this, an informer (more specifically: a shared informer) is again a composition of several components.

  1. A reflector is the component which is actually talking to the Kubernetes API server
  2. The reflector stores the information on state changes in a special queue, a Delta FIFO
  3. A processor is used to distribute the information to all registered event handlers
  4. Finally, a cache controller is reading from the queue and orchestrating the overall process

SharedInformer

In the next few sections, we visit each of these components in turn.

Reflectors and the FIFO queue

In the last post, we have seen that the Kubernetes API offers mechanisms like resource versions and watches to allow a client to keep track of the cluster state. However, these mechanism require some logic – keeping track of the resource version, handling timeouts and so forth. Within the Kubernetes client package, this logic is built into an Reflector.

A reflector uses the API to keep track of changes for one specific type of resources and update an object store accordingly. To talk to the Kubernetes API, it uses an object implementing the ListWatch interface, which is simply a convenience interface with a default implementation

When a reflector is started by invoking its Run method, it will periodically run the ListAndWatch method, which contains the core logic. This function first lists all resources and retrieves the resource version. It then uses the retrieved list to rebuild the object store. It then creates a watch (which is an instance of the watch.Interface interface) and reads from the channel provided by this watch resource in a loop, while keeping the resource version up to date. Depending on the type of the event, it then calls either Add, Update or Delete on the store.

The store that is maintained by the reflector is not just an ordinary store, but a Delta FIFO. This is a special store that maintains deltas, i.e. instances of cache.Delta, which is a structure containing the changed object and the type of change. The store will, for each object that has been subject to at least one change, maintain a list of all changes that have been reported for this object, and perform a certain de-duplication. Thus a reader will get a complete list of all deltas for a specific resource.

The cache controller and the processor

So our reflector is feeding the delta FIFO queue, but who is reading from it? This is done by another component of the shared informer – the cache controller.

When the Run method of a cache controller is invoked, it creates a Reflector, starts the Reflectors Run method in a separate goroutine and starts its own main loop processLoop as a second goroutine. Within that loop, the controller will pop elements off the FIFO queue and invoke the handleDeltas method of the informer once for each element in the queue. This method represents the main loop of the informer that is executed once for each object in the delta store.

This function will do two things. First, it will update the indexer according to the detected change retrieved from the delta FIFO queue. Second, it will build notifications (using the indexer to get old versions of an object if needed) to create notifications and distribute them to all handler functions. This work is delegated to the processor.

Essentially, a processor is an object that maintains a list of listeners. A listeners has an add method to receive notifications. These notifications are then buffered and forwarded to a resource event handler which can be any object that has methods OnAdd, OnUpdate and OnDelete.

We now have a rather complete understanding of what happens in case the state of the Kubernetes cluster changes.

  • The reflector picks up the change via the Kubernetes API and enqueues it in the delta FIFO queue, where deduplication is performed and all changes for the same object are consolidated
  • The controller is eventually reading the change from the delta FIFO queue
  • The controller updates the indexer
  • The controller forwards the change to the processor, which in turns calls all handler functions
  • The handler functions are methods of the custom controller who can then inspect the delta, use the updated indexer to retrieve additional information and adjust the cluster state if needed

Note that the same informer object can serve many different handler functions, while maintaining only one, shared indexer – as the name suggests. Throughout the code, synchronisation primitives are used to protect the shared data structures to make all this thread-safe.

Creating and using shared informers

Let us now see how we can create and use a shared informer in practice. Suppose we wanted to write a controller that is taking care of pods and, for instance, trigger some action whenever a pod is created. We can do this by creating an informer which will listen for events on pods and call a handler function that we define. Again, the sample controller is guiding us how to do this.

The recommended way to create shared informers is to use an informer factory. This factory maintains a list of informers per resource type, and if a new informer is requested, it first performs a lookup in that list and returns an existing informer if possible (this happens in the function InformerFor). It is this mechanism which actually makes the informers and therefore their caches shared and reduces both memory footprint and the load on the API server.

We can create a factory and use it to get a shared informer listening for pod events as follows.

factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
podInformer := factory.Core().V1().Pods().Informer()

Here clientset is a Kubernetes clientset that we can create as in our previous examples, and the second argument is the resync period. In our example, we instruct the informer to rebuild its cache every 30 seconds.

Once our informer is created, we need to start its main loop. However, we cannot simply call its Run method, as this informer might be shared and we do not want to call this method twice. Instead, we can again use a convenience function of the factory class – its method Start will start all informers created by the factory and keep track of their status.

We can now register event handlers with the informer. An event handler is an object implementing the ResourceHandler interface. Instead of building a class implementing this interface ourselves, we can use the existing class ResourceHandlerFuncs. Assuming that our handler functions for adding, updating and deleting pods are called onAdd, onUpdate and onDelete, the code to add these functions as an event handler would look as follows.

podInformer.AddEventHandler(
		&cache.ResourceEventHandlerFuncs{
			AddFunc:    onAdd,
			DeleteFunc: onDelete,
			UpdateFunc: onUpdate,
		})

Finally, we need to think about how we stop our controller again. At the end of our main function, we cannot simply exit, as this would stop the entire controller loop immediately. So we need to wait for something – most likely for a signal sent to us by the operating system, for instance because we have hit Ctrl-C. This can be achieved using the idea of a stop channel. This is a channel which is closed when we want to exit, for instance because a signal is received. To achieve this, we can create a dedicated goroutine which uses the os standard package to receive signals from the operating system and closes the channel accordingly. An example implementation is part of the sample controller.

Our event handlers are now very easy. They receive an object that we can convert to a Pod object using a type assertion. We can than use properties of the pod, for instance its name, print information or process that information further. A full working example can be found in my GitHub repository here.

This completes our post for today. In the next post in this mini-series on Go and Kubernetes, we will put everything that we have learned so far together and talk a short walk through the full source code of the sample controller to understand how a custom controller works. This will then finally put us in a position to start the implementation of our planned bitcoin controller.