Building a bitcoin controller for Kubernetes part II – code generation and event handling

In this post, we will use the Kubernetes code generator to create client code and informers which will allow us to set up the basic event handlers for our custom controller.

Before we start to dig into this, note that compared to my previous post, I had to make a few changes to the CRD definition to avoid dashes in the name of the API group. The updated version of the CRD definition looks as follows.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: bitcoinnetworks.bitcoincontroller.christianb93.github.com
spec:
    version: v1
    group: bitcoincontroller.christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: bitcoinnetworks
      singular: bitcoinnetwork
      kind: BitcoinNetwork
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required:
            - nodes
            properties:
              nodes:
                type: integer

Step 5: running the code generators

Of course we will use the Kubernetes code generator to generate the code for the clientset and the informer. To use the code generator, we first need to get the corresponding packages from the repository.

$ go get k8s.io/code-generator
$ go get k8s.io/gengo

The actual code generation takes place in three steps. In each step, we will invoke one of the Go programs located in $GOPATH/src/k8s.io/code-generator/cmd/ to create a specific set of objects. Structurally, these programs are very similar. They accept a parameter that specifies certain input packages that are scanned. They then look at every structure in these packages and detect tags, i.e. comments in a special format, to identify those objects for which they need to create code. Then they place the resulting code in an output package that we need to specify.

Fortunately, we only need to prepare three input files for the code generation – the first one is actually scanned by the generators for tags, the second and third files have to be provided to make the generated code compile.

  • In the package apis/bitcoincontroller/v1, we need to provide a file types.go in which we define the Go structures corresponding to our CRD – i.e. a BitcoinNetwork, the corresponding list type BitcoinNetworkList, a BitcoinNetworkSpec and a BitcoinNetworkStatus. This is also the file in which we need to place our tags (as the scan is based on package structures, we could actually name the file whatever we want, but following the usual conventions makes it easier for third parties to read our code)
  • In the same directory, we will place a file register.go. This file defines some functions that will later be called by the generated code to register our API group and version with a scheme
  • Finally, there is a second file register.go which is placed in apis/bitcoincontroller and defines a constant representing the fully qualified name of the API group

We first start with the generator that creates the code to create deep copies for our API objects. In this case, we mark the structures for which code should be generated with the tag +k8s:deepcopy-gen=true (which we could also do at the package level). As we also want to create DeepCopyObject() methods for these structures, we add the additional tag

+k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
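To make this more concrete, here is a condensed sketch of what the tagged types.go could look like. This is a sketch only – the exact field layout, in particular of the status, is an assumption for illustration; the authoritative version is in the repository.

// internal/apis/bitcoincontroller/v1/types.go (condensed sketch)
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// BitcoinNetwork describes a bitcoin test network
type BitcoinNetwork struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   BitcoinNetworkSpec   `json:"spec"`
	Status BitcoinNetworkStatus `json:"status,omitempty"`
}

// BitcoinNetworkSpec is the specification - so far only the number of nodes
type BitcoinNetworkSpec struct {
	Nodes int32 `json:"nodes"`
}

// BitcoinNetworkStatus captures the observed state of the network
// (the exact fields here are an assumption made for this sketch)
type BitcoinNetworkStatus struct {
	Nodes []string `json:"nodes,omitempty"`
}

// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// BitcoinNetworkList is a list of bitcoin networks
type BitcoinNetworkList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	Items []BitcoinNetwork `json:"items"`
}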

Then we invoke the code generator using

go run $GOPATH/src/k8s.io/code-generator/cmd/deepcopy-gen/main.go \
  --bounding-dirs github.com/christianb93/bitcoin-controller/internal/apis \
  --input-dirs github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1

By default, the generator will place its results in a file deepcopy_generated.go in the input directory. If you run the generator and open the file, you should find the generated code, which is not hard to read and does in fact simply create deep copies. For a list, for instance, it creates a new list and copies item by item. As our structures are not deeply nested, the code is comparatively straightforward. If something goes wrong, you can add the switch --v 5 to increase the log level and obtain additional debugging output.

The second code generator that we will use is creating the various clients that we need – a clientset for our new API group and a client for our new resource. The structure of the command is similar, but this time, we place the generated code in a separate directory.

go run $GOPATH/src/k8s.io/code-generator/cmd/client-gen/main.go \
  --input-base "github.com/christianb93/bitcoin-controller/internal/apis" \
  --input "bitcoincontroller/v1" \
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/clientset" \
  --clientset-name "versioned"

The first two parameters taken together define the package that is scanned for tagged structures. This time, the magic tag that will cause a structure to be considered for code generation is +genclient. The third and fourth parameters similarly define where the output will be placed in the Go workspace. The actual package name will be formed from this output path by appending the name of the API group and the version. Make sure to set the output package explicitly, as the default points into the Kubernetes packages and not into your own code tree (it took me some time, a few failed attempts and some source code analysis to figure out the exact meaning of all these switches – but this is one of the beauties of Go: all the source code is at your fingertips…)

When you run this command, it will place a couple of files in the directory $GOPATH/src/github.com/christianb93/bitcoin-controller/internal/generated/clientset. With these files, we have now all the code in place to handle our objects via the API – we can create, update, get and list our bitcoin networks. To list all existing bitcoin networks, for instance, the following code snippet will work (I have skipped some of the error handling code to make this more readable).

import (
	"fmt"
	"path/filepath"

	bitcoinv1 "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"
	clientset "github.com/christianb93/bitcoin-controller/internal/generated/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

home := homedir.HomeDir()
kubeconfig := filepath.Join(home, ".kube", "config")
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
// Create BitcoinNetwork client set
c, err := clientset.NewForConfig(config)
client := c.BitcoincontrollerV1()
list, err := client.BitcoinNetworks("default").List(metav1.ListOptions{})
for _, item := range list.Items {
	fmt.Printf("Have item %s\n", item.Name)
}

This code is very similar to the code that we have used in one of our first examples to list pods and nodes, with the only difference that we are now using our generated packages to create a clientset. I have written a few tests to verify that the generated code works.

To complete our code generation, we now have to generate listers and informers. The required commands will first generate the listers package and then the informers package that uses the listers.

go run $GOPATH/src/k8s.io/code-generator/cmd/lister-gen/main.go \
  --input-dirs  "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"\
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/listers"

go run $GOPATH/src/k8s.io/code-generator/cmd/informer-gen/main.go \
  --input-dirs  "github.com/christianb93/bitcoin-controller/internal/apis/bitcoincontroller/v1"\
  --versioned-clientset-package "github.com/christianb93/bitcoin-controller/internal/generated/clientset/versioned"\
  --listers-package "github.com/christianb93/bitcoin-controller/internal/generated/listers"\
  --output-package "github.com/christianb93/bitcoin-controller/internal/generated/informers"

You can find a shell script that runs all necessary commands here.

Again, we can now use our listers and informers as for existing API objects. If you want to try this out, there is also a small test set for this generated code.
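To illustrate how the generated code can be used, here is a sketch that creates an informer factory and a lister for bitcoin networks and lists all networks from the cache. The package path of the factory (the externalversions subpackage) and the name of the group accessor method follow the usual conventions of the code generator and should be checked against the generated code; c is the clientset created in the snippet above, and error handling is largely skipped.

import (
	"fmt"
	"time"

	informers "github.com/christianb93/bitcoin-controller/internal/generated/informers/externalversions"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/tools/cache"
)

// create a shared informer factory with a resync period of 30 seconds
factory := informers.NewSharedInformerFactory(c, 30*time.Second)
informer := factory.Bitcoincontroller().V1().BitcoinNetworks().Informer()
lister := factory.Bitcoincontroller().V1().BitcoinNetworks().Lister()

// start the informers and wait until the cache has been populated
stopCh := make(chan struct{})
factory.Start(stopCh)
cache.WaitForCacheSync(stopCh, informer.HasSynced)

// list all bitcoin networks in the default namespace from the cache
networks, err := lister.BitcoinNetworks("default").List(labels.Everything())
if err != nil {
	panic(err)
}
for _, network := range networks {
	fmt.Printf("Found network %s\n", network.Name)
}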

Step 6: writing the controller skeleton and running first tests

We can now implement most of the code of the controller up to the point where the actual business logic kicks in. In main.go, we create a shared informer and a controller object. Within the controller, we add event handlers to this informer that put the events onto a work queue. Finally, we create worker threads that pop the events off the queue and trigger the actual business logic (which we still have to implement). If you have followed my previous posts, this code is straightforward and does not contain anything new. Its structure at this point in time is summarized in the following diagram.

[Figure: BitcoinControllerStructureI]
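To make this structure a bit more tangible, here is a heavily condensed sketch of the event handler and worker wiring. This is a sketch only – the method names and the setup are simplified compared to the actual code in the repository, which adds logging, error handling and cache synchronisation.

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

type Controller struct {
	workqueue workqueue.RateLimitingInterface
}

func NewController(informer cache.SharedIndexInformer) *Controller {
	c := &Controller{
		workqueue: workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
	}
	// the event handlers only convert the object into a namespace/name key
	// and put that key onto the work queue
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				c.workqueue.Add(key)
			}
		},
		UpdateFunc: func(old, new interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(new); err == nil {
				c.workqueue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// this helper also handles tombstone objects received on deletions
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				c.workqueue.Add(key)
			}
		},
	})
	return c
}

// worker loop - pops keys off the queue and hands them to the business logic
func (c *Controller) runWorker() {
	for {
		key, shutdown := c.workqueue.Get()
		if shutdown {
			return
		}
		c.reconcile(key.(string))
		c.workqueue.Done(key)
	}
}

func (c *Controller) reconcile(key string) {
	// the actual business logic will be added in a later post
}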

We are now in a position to actually run our controller and test that the event handlers are called. For that purpose, clone my repository into your workspace, make sure that the CRD has been set up correctly in your cluster and start the controller locally using

$ go run $GOPATH/src/github.com/christianb93/bitcoin-controller/cmd/controller/main.go --kubeconfig "$HOME/.kube/config"

You should now see a few messages telling you that the controller is running and has entered its main loop. Then, in a second terminal, create a test bitcoin network using

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/bitcoin-controller/master/deployments/testNetwork.yaml

You should now see that the ADD handler has been called and see a message that the worker thread has popped the resulting event off the work queue. So our message distribution scheme works! You will also see that even though there are no further changes, update events are published every 30 seconds. The reason for this behaviour is that the cache is resynced every 30 seconds, which pushes update events for all cached objects. This can be useful to make sure that a reconciliation is done every 30 seconds, which can heal an incorrect state left behind by an earlier error.

This is nice, but there is a problem which becomes apparent if you now try to package our code in a container and run it inside the cluster, as we did at the end of our previous post. Instead of the same output, you will see error messages ending with “cannot list resource “bitcoinnetworks” in API group “bitcoincontroller.christianb93.github.com” at the cluster scope”.

The reason for this is that the pod is running with the default service account, and this account does not have the privileges to read our resources. In the next post, we will see how role based access control comes to the rescue.

As before, I have created the tag v0.2 to reflect the status of the code at the end of this post.

Building a bitcoin controller for Kubernetes part I – the basics

As announced in a previous post, we will, in this and the following posts, implement a bitcoin controller for Kubernetes. This controller will be aimed at starting and operating a bitcoin test network and is not designed for production use.

Here are some key points of the design:

  • A bitcoin network will be specified by using a custom resource
  • This definition will contain the number of bitcoin nodes that the controller will bring up. The controller will also talk to the individual bitcoin daemons using the Bitcoin JSON-RPC API to make the nodes known to each other
  • The controller will monitor the state of the network and maintain a node list which is part of the status subresource of the CRD
  • The bitcoin nodes are treated as stateful pods (i.e. controlled by a stateful set), but we will use ephemeral storage for the sake of simplicity
  • The individual nodes are not exposed to the outside world, and users running tests against the cluster either have to use tunnels or log into the pod to run tests there – this is of course something that could be changed in a future version

The primary goal of writing this operator was not to actually run it in real use cases, but to demonstrate how Kubernetes controllers work under the hood… Along the way, we will learn a bit about building a bitcoin RPC client in Go, setting up and using service accounts with Kubernetes, managing secrets, using and publishing events and a few other things from the Kubernetes / Go universe.

Step 1: build the bitcoin Docker image

Our controller will need a Docker image that contains the actual bitcoin daemon. At least initially, we will use the image from one of my previous posts that I have published on the Docker Hub. If you decide to use this image, you can skip this section. If, however, you have your own Docker Hub account and want to build the image yourself, here is what you need to do.

Of course, you will first need to log into Docker Hub and create a new public repository.
You will also need to make sure that you have a local version of Docker up and running. Then follow the instructions below, replacing christianb93 in all but the first line with your Docker Hub username. This will

  • Clone my repository containing the Dockerfile
  • Trigger the build and store the resulting image locally, using the tag username/bitcoind:latest – be patient, the build can take some time
  • Log in to the Docker hub which will store your credentials locally for later use by the docker command
  • Push the tagged image to the Docker Hub
  • Delete your credentials again
$ git clone https://github.com/christianb93/bitcoin.git
$ cd bitcoin/docker 
$ docker build --rm -f Dockerfile -t christianb93/bitcoind:latest .
$ docker login
$ docker push christianb93/bitcoind:latest
$ docker logout

Step 2: setting up the skeleton – logging and authentication

We are now ready to create a skeleton for our controller that is able to start up inside a Kubernetes cluster and (for debugging purposes) locally. First, let us discuss how we package our code in a container and run it for testing purposes in our cluster.

The first thing that we need to define is our directory layout. Following standard conventions, we will place our code in the local workspace, i.e. the $GOPATH directory, under $GOPATH/src/github.com/christianb93/bitcoin-controller. This directory will contain the following subdirectories.

  • internal will contain our packages as they are not meant to be used outside of our project
  • cmd/controller will contain the main routine for the controller
  • build will contain the scripts and Dockerfiles to build everything
  • deployments will hold all manifest files needed for the deployment

By default, Go binaries are statically linked against all Go-specific libraries. This implies that you can run a Go binary in a very minimal container that contains only the C runtime libraries. But we can go even further and ask the Go compiler to also statically link the C runtime library into the Go executable. This executable is then independent of any other libraries and can therefore run in a “scratch” container, i.e. an empty container. To compile our controller accordingly, we can use the commands

CGO_ENABLED=0 go build
docker build --rm -f ../../build/controller/Dockerfile -t christianb93/bitcoin-controller:latest .

in the directory cmd/controller. This will build the controller and a docker image based on the empty scratch image. The Dockerfile is actually very simple:

FROM scratch

#
# Copy the controller binary from the context into our
# container image
#
COPY controller /
#
# Start controller
#
ENTRYPOINT ["/controller"]

Let us now see how we can run our controller inside a test cluster. I use minikube to run tests locally. The easiest way to run your own images in minikube is to build them against the Docker instance running within minikube. To do this, execute the command

eval $(minikube docker-env)

This will set some environment variables so that any future docker commands are directed to the docker engine built into minikube. If we now build the image as above, this will create a docker image in the local repository. We can run our image from there using

kubectl run bitcoin-controller --image=christianb93/bitcoin-controller --image-pull-policy=Never --restart=Never

Note the image pull policy – without this option, Kubernetes would try to pull the image from the Docker Hub. If you do not use minikube, you will have to extend the build process by pushing the image to a public repository like Docker Hub or to a local registry reachable from within the Kubernetes cluster that you use for your tests, and omit the image pull policy flag in the command above. We can now inspect the log files that our controller writes using

kubectl logs bitcoin-controller

To implement logging, we use the klog package. This will write our log messages to the standard output of the container, where they are picked up by the Docker daemon and forwarded to the Kubernetes logging system.

Our controller will need access to the Kubernetes API, regardless of whether we execute it locally or within a Kubernetes cluster. For that purpose, we use a command-line argument kubeconfig. If this argument is set, it refers to a kubectl config file that is used by the controller. We then follow the usual procedure to create a clientset.

In case we are running inside a cluster, we need to use a different mechanism to obtain a configuration. This mechanism is based on service accounts.

Essentially, service accounts are “users” that are associated with a pod. When we associate a service account with a pod, Kubernetes will map the credentials that authenticate this service account into /var/run/secrets/kubernetes.io/serviceaccount. When we use the helper function clientcmd.BuildConfigFromFlags and pass an empty string as configuration file, the Go client will fall back to in-cluster configuration and try to retrieve the credentials from that location. If we do not specify a service account for the pod, a default account is used. This is what we will do for the time being, but we will soon run into trouble with this approach and will have to define a service account, an RBAC role and a role binding to grant permissions to our controller.
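Putting both configuration paths together, the startup logic could look roughly like the following sketch (the actual main.go in the repository does more, of course).

import (
	"flag"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/klog"
)

func main() {
	// with --kubeconfig unset, the empty default makes BuildConfigFromFlags
	// fall back to the in-cluster configuration based on the service account
	kubeconfig := flag.String("kubeconfig", "", "Path to a kubectl config file")
	flag.Parse()

	config, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
	if err != nil {
		klog.Fatalf("Could not get configuration: %s", err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		klog.Fatalf("Could not create clientset: %s", err)
	}
	klog.Infof("Connected to API server, clientset is %T", clientset)
}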

Step 3: create a CRD

Next, let us create a custom resource definition that describes our bitcoin network. This definition is very simple – the only property of our network that we want to make configurable at this point in time is the number of bitcoin nodes that we want to run. We do specify a status subresource which we will later use to track the status of the network, for instance the IP addresses of its nodes. Here is our CRD.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: bitcoin-networks.bitcoin-controller.christianb93.github.com
spec:
    version: v1
    group: bitcoin-controller.christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: bitcoin-networks
      singular: bitcoin-network
      kind: BitcoinNetwork
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required:
            - nodes
            properties:
              nodes:
                type: integer

Step 4: pushing to a public repository and running the controller

Let us now go through the complete deployment cycle once, including the push to a public repository. I assume that you have a user on Docker Hub (for me, this is christianb93) and have set up a repository called bitcoin-controller in this account. I will also assume that you have done a docker login before running the commands below. Then, building the controller is easy – simply run the following commands, replacing christianb93 in the last two commands with your username on Docker Hub.

cd $GOPATH/src/github.com/christianb93/bitcoin-controller/cmd/controller
CGO_ENABLED=0 go build
docker build --rm -f ../../build/controller/Dockerfile -t christianb93/bitcoin-controller:latest .
docker push christianb93/bitcoin-controller:latest

Once the push is complete, you can run the controller using a standard manifest file as the one below.

apiVersion: v1
kind: Pod
metadata:
  name: bitcoin-controller
  namespace: default
spec:
  containers:
  - name: bitcoin-controller-ctr
    image: christianb93/bitcoin-controller:latest

Note that this will only pull the image from Docker Hub if we delete the local image using

docker rmi christianb93/bitcoin-controller:latest

from the minikube Docker repository (or if we did not use that repository at all). You will see that pushing takes some time, which is why I prefer to work with the local registry most of the time and only push to the Docker Hub once in a while.

We now have our build system in place and a working skeleton which we can run in our cluster. This version of the code is available in my GitHub repository under the v0.1 tag. In the next post, we will start to add some meat – we will model our CRD in a Go structure and put our controller in a position to react on newly added bitcoin networks.

Understanding Kubernetes controllers part IV – putting it all together

In the last few posts, we have looked in detail at the various parts of the machinery behind Kubernetes controllers – informers, queues, stores and so forth. In this final post, we will wrap up a bit and see how all this comes together in the Kubernetes sample controller.

The flow of control

The two main source code files that make up the sample controller are main.go and controller.go. The main function is actually rather simple. It creates a clientset, an informer factory for Foo objects and an informer factory for deployments and then uses those items to create a controller.

Once the controller exists, main starts the informers using the corresponding functions of the factory, which will bring up the goroutines of the two informers. Finally, the controller's main loop is started by calling its Run method.

The controller is more interesting. When it is created using NewController, it attaches event handlers to the informers for Deployments and Foo resources. As we have seen, both event handlers will eventually put Foo objects that require attention into a work queue.

The items in this work queue are processed by worker threads that are created in the controller's Run method. As the queue serves as a synchronization point, we can run as many worker threads as we want and still make sure that each event is processed by only one worker thread. The main function of the worker thread is processNextWorkItem, which retrieves elements from the queue, i.e. the keys of the Foo objects that need to be reconciled, and calls syncHandler for each of them. If that function fails, the item is put back onto the work queue.

The syncHandler function contains the actual reconciliation logic. It first splits the key into namespace and name of the Foo resource. Then it tries to retrieve an existing deployment for this Foo resource and creates one if it does not yet exist. If a deployment is found, but deviates from the target state, it is replaced by a new deployment as returned by the function newDeployment. Finally, the status subresource of the Foo object is updated.
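Paraphrased, the shape of this worker logic is roughly the following – a condensed sketch, not a verbatim copy of the sample controller (the real code adds logging and error wrapping, and this assumes the controller struct with its workqueue and the syncHandler method).

// Paraphrased sketch of the worker logic in the sample controller
func (c *Controller) processNextWorkItem() bool {
	obj, shutdown := c.workqueue.Get()
	if shutdown {
		return false
	}
	defer c.workqueue.Done(obj)

	key := obj.(string)
	if err := c.syncHandler(key); err != nil {
		// put the key back onto the (rate limited) queue so that it is retried
		c.workqueue.AddRateLimited(key)
		return true
	}
	// success - reset the rate limiter history for this key
	c.workqueue.Forget(obj)
	return true
}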

So this simple controller logic realizes some of the recommendations for building controllers.

  • It uses queues to decouple workers from event handlers and to allow for concurrent processing
  • Its logic is level based, not edge based, as a controller might be down for some time and miss updates
  • It uses shared informers and caches. The cached objects are never updated directly – if updates are needed, deep copies are created and modified, and the update is done via the API
  • It waits for all caches to sync before starting worker threads
  • It collapses all work to be done into a single queue

Package structure and generated code

If you browse the source code of the sample controller, you might at first glance be overwhelmed by the number of files in the repository. However, there is good news – most of this code is actually generated! In fact, the diagram below shows the most important files and packages. Generated files are in italic font, and files that serve as input for the code generator or are referenced by the generated code are in bold face.

[Figure: SampleControllerPackages]

We see that the number of files that we need to provide is comparatively small. Apart from, of course, the files discussed above (main.go and controller.go), these are

  • register.go which adds our new types to the scheme and contains some lookup functions that the generated code will use
  • types.go which defines the Go structures that correspond to the Foo CRD
  • register.go in the top-level directory of the API package which defines the name of the API group as a constant

All the other files are created by the Kubernetes API code generator. This generator accepts a few switches (“generators”) to generate various types of code artifacts – deep-copy routines, clientsets, listers and informers. We will see the generator in action in a later post. Instead of using the code generator directly, we could of course also use the sample controller as a starting point for our own code and make the necessary modifications manually – the previous posts contain all the information that we need for this.

This post completes our short mini-series on the machinery behind custom controllers. In the next series, we will actually apply all this to implement a controller that operates a small bitcoin test network in a Kubernetes cluster.

Understanding Kubernetes controllers part III – informers

After understanding how an informer can be used to implement a custom controller, we will now learn more about the inner working of an informer.

We have seen that essentially, informers use the Kubernetes API to learn about changes in the state of a Kubernetes cluster and use that information to maintain a cache (the indexer) of the current cluster state and to inform clients about the changes by calling handler functions. To achieve this, an informer (more specifically: a shared informer) is again a composition of several components.

  1. A reflector is the component which is actually talking to the Kubernetes API server
  2. The reflector stores the information on state changes in a special queue, a Delta FIFO
  3. A processor is used to distribute the information to all registered event handlers
  4. Finally, a cache controller is reading from the queue and orchestrating the overall process

[Figure: SharedInformer]

In the next few sections, we visit each of these components in turn.

Reflectors and the FIFO queue

In the last post, we have seen that the Kubernetes API offers mechanisms like resource versions and watches to allow a client to keep track of the cluster state. However, these mechanisms require some logic – keeping track of the resource version, handling timeouts and so forth. Within the Kubernetes client package, this logic is built into a Reflector.

A reflector uses the API to keep track of changes for one specific type of resource and updates an object store accordingly. To talk to the Kubernetes API, it uses an object implementing the ListWatch interface, which is simply a convenience interface with a default implementation.

When a reflector is started by invoking its Run method, it will periodically run the ListAndWatch method, which contains the core logic. This function first lists all resources and retrieves the resource version. It then uses the retrieved list to rebuild the object store. Next, it creates a watch (which is an instance of the watch.Interface interface) and reads from the channel provided by this watch in a loop, while keeping the resource version up to date. Depending on the type of the event, it then calls either Add, Update or Delete on the store.
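If you want to see a reflector at work outside of an informer, the following sketch wires one up manually to mirror pods into a plain store (clientset is assumed to be a Kubernetes clientset as in our earlier examples). Inside a shared informer, the reflector is instead wired to a delta FIFO, as described below.

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/tools/cache"
)

// a ListWatch for pods in the default namespace, backed by the core v1 REST client
lw := cache.NewListWatchFromClient(
	clientset.CoreV1().RESTClient(),
	"pods",
	"default",
	fields.Everything(),
)

// a simple store keyed by namespace/name, kept in sync by the reflector
store := cache.NewStore(cache.MetaNamespaceKeyFunc)
reflector := cache.NewReflector(lw, &v1.Pod{}, store, 30*time.Second)

stopCh := make(chan struct{})
go reflector.Run(stopCh)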

The store that is maintained by the reflector is not just an ordinary store, but a Delta FIFO. This is a special store that maintains deltas, i.e. instances of cache.Delta, which is a structure containing the changed object and the type of change. The store will, for each object that has been subject to at least one change, maintain a list of all changes that have been reported for this object, and perform a certain de-duplication. Thus a reader will get a complete list of all deltas for a specific resource.

The cache controller and the processor

So our reflector is feeding the delta FIFO queue, but who is reading from it? This is done by another component of the shared informer – the cache controller.

When the Run method of a cache controller is invoked, it creates a Reflector, starts the Reflector's Run method in a separate goroutine and starts its own main loop processLoop as a second goroutine. Within that loop, the controller pops elements off the FIFO queue and invokes the handleDeltas method of the informer for each element. This method represents the main loop of the informer, executed once for each object in the delta store.

This function will do two things. First, it will update the indexer according to the change retrieved from the delta FIFO queue. Second, it will build notifications (using the indexer to get the old version of an object if needed) and distribute them to all handler functions. This work is delegated to the processor.

Essentially, a processor is an object that maintains a list of listeners. A listener has an add method to receive notifications. These notifications are then buffered and forwarded to a resource event handler, which can be any object that has methods OnAdd, OnUpdate and OnDelete.

We now have a rather complete understanding of what happens in case the state of the Kubernetes cluster changes.

  • The reflector picks up the change via the Kubernetes API and enqueues it in the delta FIFO queue, where deduplication is performed and all changes for the same object are consolidated
  • The controller is eventually reading the change from the delta FIFO queue
  • The controller updates the indexer
  • The controller forwards the change to the processor, which in turns calls all handler functions
  • The handler functions are methods of the custom controller, which can then inspect the delta, use the updated indexer to retrieve additional information and adjust the cluster state if needed

Note that the same informer object can serve many different handler functions, while maintaining only one, shared indexer – as the name suggests. Throughout the code, synchronisation primitives are used to protect the shared data structures to make all this thread-safe.

Creating and using shared informers

Let us now see how we can create and use a shared informer in practice. Suppose we wanted to write a controller that is taking care of pods and, for instance, trigger some action whenever a pod is created. We can do this by creating an informer which will listen for events on pods and call a handler function that we define. Again, the sample controller is guiding us how to do this.

The recommended way to create shared informers is to use an informer factory. This factory maintains a list of informers per resource type, and if a new informer is requested, it first performs a lookup in that list and returns an existing informer if possible (this happens in the function InformerFor). It is this mechanism which actually makes the informers and therefore their caches shared and reduces both memory footprint and the load on the API server.

We can create a factory and use it to get a shared informer listening for pod events as follows.

factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
podInformer := factory.Core().V1().Pods().Informer()

Here clientset is a Kubernetes clientset that we can create as in our previous examples, and the second argument is the resync period. In our example, we instruct the informer to perform a resync every 30 seconds, i.e. to re-deliver the objects in its cache to the registered event handlers.

Once our informer is created, we need to start its main loop. However, we cannot simply call its Run method, as this informer might be shared and we do not want to call this method twice. Instead, we can again use a convenience function of the factory class – its method Start will start all informers created by the factory and keep track of their status.

We can now register event handlers with the informer. An event handler is an object implementing the ResourceEventHandler interface. Instead of building a class implementing this interface ourselves, we can use the existing class ResourceEventHandlerFuncs. Assuming that our handler functions for adding, updating and deleting pods are called onAdd, onUpdate and onDelete, the code to add these functions as an event handler would look as follows.

podInformer.AddEventHandler(
		&cache.ResourceEventHandlerFuncs{
			AddFunc:    onAdd,
			DeleteFunc: onDelete,
			UpdateFunc: onUpdate,
		})

Finally, we need to think about how we stop our controller again. At the end of our main function, we cannot simply exit, as this would stop the entire controller loop immediately. So we need to wait for something – most likely for a signal sent to us by the operating system, for instance because we have hit Ctrl-C. This can be achieved using the idea of a stop channel. This is a channel which is closed when we want to exit, for instance because a signal is received. To achieve this, we can create a dedicated goroutine which uses the os/signal standard package to receive signals from the operating system and closes the channel accordingly. An example implementation is part of the sample controller.
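A minimal sketch of such a stop channel, based on the os/signal package, could look like this (the version in the sample controller is a bit more elaborate).

import (
	"os"
	"os/signal"
	"syscall"
)

// setupSignalHandler returns a channel that is closed as soon as
// SIGINT or SIGTERM is received
func setupSignalHandler() chan struct{} {
	stopCh := make(chan struct{})
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigCh
		close(stopCh)
	}()
	return stopCh
}

// in main: create the channel, start the informers and block until a signal arrives
// stopCh := setupSignalHandler()
// factory.Start(stopCh)
// <-stopCh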

Our event handlers are now very easy. They receive an object that we can convert to a Pod object using a type assertion. We can then use properties of the pod, for instance its name, to print information or process it further. A full working example can be found in my GitHub repository here.
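As a sketch, an add handler along these lines could look as follows.

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// onAdd is called by the informer with the newly added object as an interface{},
// which we convert back into a *v1.Pod using a type assertion
func onAdd(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return
	}
	fmt.Printf("Pod %s added in namespace %s\n", pod.Name, pod.Namespace)
}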

This completes our post for today. In the next post in this mini-series on Go and Kubernetes, we will put everything that we have learned so far together and take a short walk through the full source code of the sample controller to understand how a custom controller works. This will then finally put us in a position to start the implementation of our planned bitcoin controller.

Understanding Kubernetes controllers part II – object stores and indexers

In the last post, we have seen that our sample controller uses Listers to retrieve the current state of Kubernetes resources. In this post, we will take a closer look at how these Listers work.

Essentially, we have already seen how to use the Kubernetes Go client to retrieve information on Kubernetes resources, so we could simply do that in our controller. However, this is a bit inefficient. Suppose, for instance, that you are using multiple worker threads, as we do. You would then probably retrieve the same information over and over again, creating a high load on the API server. To avoid this, a special class of Kubernetes informers – called index informers – can be used, which build a thread-safe object store serving as a cache. When the state of the cluster changes, the informer will not only invoke the handler functions of our controller, but also do the necessary updates to keep the cache up to date. As the cache has the additional ability to deal with indices, it is called an Indexer. Thus, at the end of today's post, the following picture will emerge.

[Figure: InformerControllerInteraction]

In the remainder of this post, we will discuss indexers and how they interact with an informer in more detail, while in the next post, we will learn how informers are created and used and dig a little bit into their inner workings.

Watches and resource versions

Before we talk about informers and indexers, we have to understand the basic mechanisms that clients can use to keep track of the cluster state. To enable this, the Kubernetes API offers a mechanism called a watch. This is maybe explained best using an example.

To follow this example, we assume that you have a Kubernetes cluster up and running. We will use curl to directly interact with the API. To avoid having to add tokens or certificates to our request, we will use the kubectl proxy mechanism. So in a separate terminal, run

$ kubectl proxy

You should see a message that the proxy is listening on a port (typically 8001) on the local host. Any requests sent to this port will be forwarded to the Kubernetes API server. To populate our cluster, let us first start a single HTTPD.

$ kubectl run alpine --image=httpd:alpine

Then let us use curl to get a list of running pods in the default namespace.

$ curl localhost:8001/api/v1/namespaces/default/pods
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/pods",
    "resourceVersion": "6834"
  },
  "items": [
    {
      "metadata": {
        "name": "alpine-56cf65bbfc-tzqqx",
        "generateName": "alpine-56cf65bbfc-",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/pods/alpine-56cf65bbfc-tzqqx",
        "uid": "584ddf85-5f8d-11e9-80c0-080027696a3f",
        "resourceVersion": "6671",
--- REDACTED ---

As expected, you will get a JSON encoded object of type PodList. The interesting part is the data in the metadata. You will see that there is a field resourceVersion. Essentially, the resource version is a number which increases over time and that uniquely identifies a certain state of the cluster.

Now the Kubernetes API offers you the option to request a watch, using this resource version as a starting point. To do this manually, enter

$ curl -v "localhost:8001/api/v1/namespaces/default/pods?watch=1&resourceVersion=6834"

Looking at the output, you will see that this request returns an HTTP response with the transfer encoding “chunked”. This is specified in RFC 7230 and puts the client into streaming mode, i.e. the connection will remain open and the API server will continue to send updates in small chunks, which curl prints to the terminal as they arrive. If you now create additional pods in your cluster or delete existing pods, you will continue to see notifications being received, informing you about the events. Each notification consists of a type (ADDED, MODIFIED, ERROR or DELETED) and an object – the layout of the message is described here.

This gives us a way to obtain a complete picture of the clusters state in an efficient manner. We first use an ordinary API request to list all resources. We then remember the resource version in the response and use that resource version as a starting point for a watch. Whenever we receive a notification about a change, we update our local data accordingly. And essentially, this is exactly what the combination of informer and indexer are doing.
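Translated into Go, this list-then-watch pattern could look like the following sketch, where clientset is a Kubernetes clientset as in our earlier examples (note that newer versions of client-go additionally expect a context.Context as the first argument of List and Watch).

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// list all pods once and remember the resource version of the list
podList, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{})
if err != nil {
	panic(err)
}
rv := podList.ResourceVersion

// start a watch at that resource version and consume the event stream
watcher, err := clientset.CoreV1().Pods("default").Watch(metav1.ListOptions{
	ResourceVersion: rv,
})
if err != nil {
	panic(err)
}
for event := range watcher.ResultChan() {
	fmt.Printf("Received event of type %s\n", event.Type)
}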

Caching mechanisms and indexers

An indexer is any object that implements the interface cache.Indexer. This interface in turn is derived from cache.Store, so let us study that first. Its definition is in store.go.

type Store interface {
	Add(obj interface{}) error
	Update(obj interface{}) error
	Delete(obj interface{}) error
	List() []interface{}
	ListKeys() []string
	Get(obj interface{}) (item interface{}, exists bool, err error)
	GetByKey(key string) (item interface{}, exists bool, err error)
	Replace([]interface{}, string) error
	Resync() error
}

So basically a store is something to which we can add objects, retrieve them, update or delete them. The interface itself does not make any assumptions about keys, but when you create a new store, you provide a key function which extracts the key from an object and has the following signature.

type KeyFunc func(obj interface{}) (string, error)

Working with stores is very convenient and easy; you can find a short example that stores objects representing books in a store here.
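As a flavour of how this looks, here is a small sketch (not necessarily identical to the linked example) that defines a key function for books and uses a store.

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// Book is the object type that we want to put into the store
type Book struct {
	Author string
	Title  string
}

// bookKeyFunc derives a unique key from a book
func bookKeyFunc(obj interface{}) (string, error) {
	book, ok := obj.(*Book)
	if !ok {
		return "", fmt.Errorf("not a book: %v", obj)
	}
	return book.Author + "/" + book.Title, nil
}

func main() {
	store := cache.NewStore(bookKeyFunc)
	store.Add(&Book{Author: "Dickens", Title: "David Copperfield"})

	item, exists, _ := store.GetByKey("Dickens/David Copperfield")
	if exists {
		fmt.Printf("Found book: %+v\n", item.(*Book))
	}
}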

Let us now verify that, as the diagram above claims, both the informer and the lister have a reference to the same indexer. To see this, let us look at the creation process of our sample controller.

When a new controller is created by the function NewController, this function accepts a DeploymentInformer and a FooInformer. These are interfaces that provide access to an actual informer and a lister for the respective resources. Let us take the FooInformer as an example. The actual creation method for the Lister looks as follows.

func (f *fooInformer) Lister() v1alpha1.FooLister {
	return v1alpha1.NewFooLister(f.Informer().GetIndexer())
}

This explains how the link between the informer and the indexer is established – the lister is created with a reference to the informer's indexer. The communication between informer and indexer is done via the function handleDeltas, which receives a list of Delta objects as defined in delta_fifo.go (we will learn more about how this works in the next post). If we look at this function, we find that it does not only call all registered handler functions (with the help of a processor), but also calls the methods Add, Update and Delete on the store, depending on the type of the delta.

We now have a rather complete picture of how our sample controller works. The informer uses the Kubernetes API and its mechanism to watch for changes based on resource versions to obtain updates of the cluster state. These updates are used to maintain an object store which reflects the current state of the cluster and to invoke defined event handler functions. A controller registers its functions with the informer to be called when a resource changes. It can then access the object store to easily retrieve the current state of the resources and take necessary actions to drive the system towards the target state.

What we have not yet seen, however, is how exactly the magical informer works – this will be the topic of our next post.

Understanding Kubernetes controllers part I – queues and the core controller loop

In previous posts in my series on Kubernetes, we have stumbled across the concept of a controller. Essentially, a controller is a daemon that monitors the to-be state of components of a Kubernetes cluster against the as-is state and takes action to reach the to-be state.

A classical example is the Replica set controller which monitors replica sets and pods and is responsible for creating new pods or deleting existing pods if the number of replicas is out-of-sync with the defined value.

In this series, we will perform a deep dive into controllers. Specifically, we will take a tour through the sample controller that is provided by the Kubernetes project and try to understand how this controller works. We will also explain how this relates to custom resource definitions (CRDs) and what steps are needed to implement a custom controller for a given CRD.

Testing the sample controller

Let us now start with our analysis of the sample controller. To follow this analysis, I advise you to download a copy of the sample controller into your Go workspace using go get k8s.io/sample-controller and then use an editor like Atom that offers plugins to navigate Go code.

To test the controller and to have a starting point for debugging and tracing, let us follow the instructions in the README file that is located at the root of the repository. Assuming that you have a working kubectl config in $HOME/.kube/config, build and start the controller as follows.

$ cd $GOPATH/src/k8s.io/sample-controller
$ go build
$ kubectl create -f artifacts/examples/crd.yaml
$ ./sample-controller --kubeconfig=$HOME/.kube/config

This will create a custom resource definition, specifically a new resource type named “Foo” that we can use as any other resource like Pods or Deployments. In a separate terminal, we can now create a Foo resource.

$ kubectl create -f artifacts/examples/example-foo.yaml 
$ kubectl get deployments
$ kubectl get pods

You will find that our controller has created one deployment which in turn brings up one pod running nginx. If you delete the custom resource again using kubectl delete foo example-foo, both the deployment and the pods disappear again. However, if you manually delete the deployment, it is recreated by the controller. So apparently, our controller is able to detect changes to deployments and foo resources and to match them accordingly. How does this work?

Basically, a controller will periodically match the state of the system to the to-be state. For that purpose, several functionalities are required.

  • We need to be able to keep track of changes to the state of the system. This is done using event-driven processing, handled by informers that are able to subscribe to events and invoke specific handlers, and by listers that are able to list all resources in a given Kubernetes cluster
  • We need to be able to cache the current state of the system. This is done using object stores and their indexed variants, indexers
  • Ideally, we should be able to process larger volumes using multi-threading, coordinated by queues

In this and the next post, we will go through these elements one by one. We start with queues.

Queues and concurrency

Let us start by investigating threads and queues in the Kubernetes library. The ability to easily create threads (called go-routines) in Go and the support for managing concurrency and locking are one of the key differentiators of the Go programming language, and of course the Kubernetes client library makes use of these features.

Essentially, a queue in the Kubernetes client library is something that implements the interface k8s.io/client-go/util/workqueue/Interface. That interface contains (among others) the following methods.

  • Add adds an object to the queue
  • Get blocks until an item is available in the queue, and then returns the first item in the queue
  • Done marks an item as processed

Internally, the standard implementation of a queue in queue.go uses Go maps. The keys of these maps are arbitrary objects, while the values are just placeholders (empty structures). One of these maps is called the dirty set; it contains all elements that make up the actual queue, i.e. that need to be processed. The second map is called the processing set; it contains all items which have been retrieved using Get, but for which Done has not yet been called. As maps are unordered, there is also an array which holds the elements in the queue and is used to define the order of processing. Note that each of the maps can hold a specific object only once, whereas the queue can hold several copies of the object.

[Figure: Queue]

If we add something to the queue, it is added to the dirty set and appended to the queue array. If we call Get, the first item is retrieved from the queue, removed from the dirty set and added to the processing set. Calling Done will remove the element from the processing set as well, unless someone else has called Add in the meantime again on the same object – in this case it will be removed from the processing set, but also be added to the queue again.

Let us implement a little test program that works with queues. For that purpose, we will establish two threads aka goroutines. The first thread will call Add five times to add something to the queue and then complete. The second thread will sleep for three seconds and then read from the queue in a loop to retrieve the elements. Here are the functions to send and receive.

func fillQueue(queue workqueue.Interface) {
	time.Sleep(time.Second)
	queue.Add("this")
	queue.Add("is")
	queue.Add("a")
	queue.Add("complete")
	queue.Add("sentence")
}

func readFromQueue(queue workqueue.Interface) {
	time.Sleep(3 * time.Second)
	for {
		item, _ := queue.Get()
		fmt.Printf("Got item: %s\n", item)
		queue.Done(item)
	}
}

With these two functions in place, we can now easily create two goroutines that use the same queue to communicate (goroutines, being threads, share the same address space and can therefore communicate using common data structures).

myQueue := workqueue.New()
go fillQueue(myQueue)
go readFromQueue(myQueue)

However, if you run this, you will find that there is a problem. Our main thread completes after creating both worker threads, and this will cause the program to exit and kill both worker threads before the reader has done anything. To avoid this, we need to wait for the reader thread (which, reading from the queue, will in turn wait for the writer thread). One way to do this with Go language primitives is to use channels. So we change our reader function to receive a channel of integer elements

func readFromQueue(queue workqueue.Interface, stop chan int)

and in the main function, we create a channel, pass it to the function and then read from it which will block until the reader thread sends a confirmation that it is done reading.

stop := make(chan int)
myQueue := workqueue.New()
go fillQueue(myQueue)
go readFromQueue(myQueue, stop)
<-stop

Now, however, there is another problem – how does the reader know that no further items will be written to the queue? Fortunately, queues offer a way to handle this. When a writer is done using the queue, it will call ShutDown on the queue. This will change the queue's behavior – reads will no longer block, and the second return value of a Get will be true once the queue is empty. If a reader recognizes this situation, it can stop its goroutine.
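Assuming that our writer calls queue.ShutDown() after adding its last element, the reader from our example could be adapted as in the following sketch to detect this situation and signal completion via the stop channel.

func readFromQueue(queue workqueue.Interface, stop chan int) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			// the writer has called ShutDown and the queue has been drained
			stop <- 1
			return
		}
		fmt.Printf("Got item: %s\n", item)
		queue.Done(item)
	}
}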

A full example can be found here – of course this is made up to demonstrate goroutines, queues and channels and not the most efficient solution for the problem at hand.

The core controller loop

Armed with our understanding of concurrency and queues, we can now take a first look at the code of the sample controller. The main entry points are the functions handleObject and enqueueFoo – these are the functions invoked by the Informer, which we will discuss in one of the next posts, whenever either a Foo object or a Deployment is created, updated or deleted.

The function enqueueFoo is called whenever a Foo object is changed (i.e. added, updated or deleted). It simply determines a key for the object and adds that key to the workqueue.

The workqueue is read by worker threads, which are created in the Run function of the controller. This function creates a certain number of goroutines and then listens on a channel called stopCh, as we have done it in our simplified example before. This channel is created by main.go and used to be able to stop all workers and the main thread if a signal is received.

Each worker thread executes the method processNextWorkItem of the controller in a loop. For each item in the queue, this method calls another method – syncHandler – passing the item retrieved from the queue, i.e. the key of the Foo resource. This method then uses a Lister to retrieve the current state of the Foo resource. It then retrieves the deployment behind the Foo resource, creates it if it could not be found, and updates the number of replicas if needed.

The function handleObject is similar. It is invoked by the informer with the new, updated state of the Deployment object. It then determines the owning Foo resource and simply enqueues that Foo resource. The rest of the processing will then be the same.

At this point, two open ends remain. First, we will have to understand how an Informer works and how it invokes the functions handleObject and enqueueFoo. And we will need to understand what a Lister is doing and where the lister and the data it uses come from. This will be the topic of our next post.

Extending Kubernetes with custom resources and custom controllers

The Kubernetes API is structured around resources. Typical resources that we have seen so far are pods, nodes, containers, ingress rules and so forth. These resources are built into Kubernetes and can be addressed using the kubectl command line tool, the API or the Go client.

However, Kubernetes is designed to be extendable – and in fact, you can add your own resources. These resources are defined by objects called custom resource definitions (CRD).

Setting up custom resource definitions

Confusingly enough, the definition of a custom resource – i.e. the CRD – itself is nothing but a resource, and as such, can be created using either the Kubernetes API directly or any client you like, for instance kubectl.

Suppose we wanted to create a new resource type called book that has two attributes – an author and a title. To distinguish this custom resource from other resources that Kubernetes already knows, we have to put our custom resource definition into a separate API group. This can be any string, but to guarantee uniqueness, it is good practice to use some sort of domain, for instance a GitHub repository name. As my GitHub user name is christianb93, I will use the API group christianb93.github.com for this example.

To understand how we can define that custom resource type using the API, we can take a look at its specification or the corresponding Go structures. We see that

  • The CRD resource is part of the API group apiextensions.k8s.io and has version v1beta1, so the value of the apiVersion fields needs to be apiextensions.k8s.io/v1beta1
  • The kind is, of course, CustomResourceDefinition
  • There is again a metadata field, which is built up as usual. In particular, there is a name field
  • A custom resource definition spec consists of a version, the API group, a field scope that determines whether our CRD instances will live in a cluster scope or in a namespace and a list of names

This translates into the following manifest file to create our CRD.

$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    names:
      plural: books
      singular: book
      kind: Book
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

This will create a new type of resources, our books. We can access books similar to all other resources Kubernetes is aware of. We can, for instance, get a list of existing books using the API. To do this, open a separate terminal and run

kubectl proxy

to get access to the API endpoints. Then use curl to get a list of all books.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/books"  | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "items": [],
  "kind": "BookList",
  "metadata": {
    "continue": "",
    "resourceVersion": "7281",
    "selfLink": "/apis/christianb93.github.com/v1/books"
  }
}

So in fact, Kubernetes knows about books and has established an API endpoint for us. Note that the path contains “apis” and not “api” to indicate that this is an extension of the original Kubernetes API. Also note that the path contains our dedicated API group name and the version that we have specified.

At this point we have completed the definition of our custom resource “book”. Now let us try to actually create some books.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: david-copperfield
spec:
  title: David Copperfield
  author: Dickens
EOF
book.christianb93.github.com/david-copperfield created

Nice – we have created our first book as an instance of our new CRD. We can now work with this book similar to a pod, a deployment and so forth. We can for instance display it using kubectl

$ kubectl get book david-copperfield
NAME                AGE
david-copperfield   3m38s

or access it using curl and the API.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield" | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "kind": "Book",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"christianb93.github.com/v1\",\"kind\":\"Book\",\"metadata\":{\"annotations\":{},\"name\":\"david-copperfield\",\"namespace\":\"default\"},\"spec\":{\"author\":\"Dickens\",\"title\":\"David Copperfield\"}}\n"
    },
    "creationTimestamp": "2019-04-21T09:32:54Z",
    "generation": 1,
    "name": "david-copperfield",
    "namespace": "default",
    "resourceVersion": "7929",
    "selfLink": "/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield",
    "uid": "70fbc120-6418-11e9-9fbf-080027a84e1a"
  },
  "spec": {
    "author": "Dickens",
    "title": "David Copperfield"
  }
}

Validations

If we look again at what we have done and where we have started, something still feels a bit wrong. Remember that we wanted to define a resource called a “book” that has a title and an author. We have used those fields when actually creating a book, but we have not referred to them at all in the CRD. How does the Kubernetes API know which fields a book actually has?

The answer is simple – it does not know this at all. In fact, we can create a book with any collection of fields we want. For instance, the following will work just fine.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: moby-dick
spec:
  foo: bar
EOF
book.christianb93.github.com/moby-dick created

In fact, when you run this, the Kubernetes API server will happily take your JSON input and store it in the etcd that keeps the cluster state – and it will store whatever you provide. To avoid this, let us add a validation rule to our resource definition. This allows us to attach an OpenAPI schema to our CRD against which the books will be validated. Here is the updated CRD manifest file to make this work.

$ kubectl delete crd  books.christianb93.github.com
customresourcedefinition.apiextensions.k8s.io "books.christianb93.github.com" deleted
$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: books
      singular: book
      kind: Book
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required: 
            - author
            - title
            properties:
              author:
                type: string
              title:
                type: string
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

If you now repeat the commands above, you will find that "David Copperfield" can be created, but "Moby Dick" is rejected, as it does not match the validation rules (the required fields author and title are missing).

There is another change that we have made in this version of our CRD – we have added a subresource called status. This subresource allows a controller to update the status of the resource independently of its specification – see the corresponding API conventions for more details on this.

The controller pattern

As we have seen above, a CRD essentially allows you to store data as part of the cluster state kept in the etcd key-value store, using a Kubernetes API endpoint. However, CRDs do not actually trigger any change in the cluster. If you POST a custom resource like a book to the Kubernetes API server, all it will do is to store that object in the etcd store.

It might come as a bit of a surprise, but strictly speaking, this is true for built-in resources as well. Suppose, for instance, that you use kubectl to create a deployment. Then kubectl will assemble a POST request for a deployment and send it to the API server. The API server will process the request and store the new deployment in the etcd. It will, however, not actually create pods, spin up containers and so forth.

This is the job of another component of the Kubernetes architecture – the controllers. Essentially, a controller is monitoring the etcd store to keep track of its contents. Whenever a new resource, for example a deployment, is created, the controller will trigger the associated actions.

Kubernetes comes with a set of built-in controllers in the controller package. Essentially, there is one controller for each type of resource. The deployment controller, for instance, monitors deployment objects. When a new deployment is created in the etcd store, it will make sure that there is a matching replica set. These sets are again managed by another controller, the replica set controller, which will in turn create matching pods. The pods are again monitored by the scheduler that determines the node on which the pods should run and writes the bindings back to the API server. The updated bindings are then picked up by the kubelet and the actual containers are started. So essentially, all involved components of the Kubernetes architecture talk to the etcd via the API server, without any direct dependencies.

[Figure: Kubernetes components]

Of course, the Kubernetes built-in controllers will only monitor and manage objects that come with Kubernetes. If we create custom resources and want to trigger any actual action, we need to implement our own controllers.

Suppose, for instance, we wanted to run a small network of bitcoin daemons on a Kubernetes cluster for testing purposes. Bitcoin daemons need to know each other and register themselves with other daemons in the network to be able to exchange messages. To manage that, we could define a custom resource BitcoinNetwork which contains the specification of such a network, for instance the number of nodes. We could then write a controller which

  • Detects new instances of our custom resource
  • Creates a corresponding deployment (or similar workload resource) to spin up the nodes
  • Monitors the resulting pods and whenever a pod comes up, adds this pod to the network
  • Keeps track of the status of the nodes in the status of the resource
  • Makes sure that when we delete or update the network, the corresponding deployments are deleted or updated as well

Such a controller would operate by detecting newly created or changed BitcoinNetwork resources, comparing their definition to the actual state – i.e. the existing deployments and pods – and updating the state accordingly. This pattern is known as the controller pattern or operator pattern. Operators exist for many applications, like Postgres, MySQL, Prometheus and many others.
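
To make the idea of comparing desired and actual state a bit more tangible, here is a deliberately oversimplified sketch in Go. The type BitcoinNetworkSpec and the function reconcile are hypothetical placeholders, not the code that we will actually develop in the next posts – a real controller would of course work with the generated API types and create or delete deployments instead of printing messages.

package main

import "fmt"

// BitcoinNetworkSpec is a hypothetical stand-in for the spec of our custom resource
type BitcoinNetworkSpec struct {
	Nodes int
}

// reconcile compares the desired state taken from the spec with the observed
// state of the cluster and returns the number of pods that we would have to
// add (positive) or remove (negative) to close the gap
func reconcile(spec BitcoinNetworkSpec, runningPods int) int {
	return spec.Nodes - runningPods
}

func main() {
	diff := reconcile(BitcoinNetworkSpec{Nodes: 4}, 2)
	switch {
	case diff > 0:
		fmt.Printf("Need to spin up %d additional pods\n", diff)
	case diff < 0:
		fmt.Printf("Need to remove %d pods\n", -diff)
	default:
		fmt.Println("Desired state reached, nothing to do")
	}
}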

I did actually pick this example for a reason – in an earlier post, I showed you how to set up and operate a small bitcoin test network based on Docker and Python. In the next few posts, we will learn how to write a custom controller in Go that automates all this on top of Kubernetes! To achieve this, we will first analyze the components of a typical controller – informers, queues, caches and all that – using the Kubernetes sample controller and then dig into building a custom bitcoin controller armed with this understanding.

Learning Go with Kubernetes IV – life of a request

So far we have described how a client program utilizes the Kubernetes library to talk to a Kubernetes API server. In this post, we will actually look into the Kubernetes client code and try to understand how it works under the hood.

When we work with the Kubernetes API and try to understand the client code, we first have to take a look at how the API is versioned.

Versioning in the Kubernetes API

The Kubernetes API is a versioned API. Every resource that you address using requests like GET or POST has a version. For stable versions, the version name is of the form vX, where X is an integer. In addition, resources are grouped into API groups. The full relative path to a resource is of the form

/apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCE

Let us take the job resource as an example. This resource is in the API group batch. A GET request for a job called my-job in the default namespace using version v1 of the API would therefore be something like

GET /apis/batch/v1/namespaces/default/jobs/my-job

An exception is made for the core API group, which is omitted from the URL path for historical reasons – its resources live under the prefix /api/VERSION instead of /apis/GROUP/VERSION. In a manifest file, API group and version are both stored in the field apiVersion, which, in our example, would be batch/v1.

Within the Kubernetes Go client, the combination of a type of resource (a “kind”, like a Pod), a version and an API group is stored in a Go structure called GroupVersionKind. In fact, this structure is declared as follows

type GroupVersionKind struct {
	Group   string
	Version string
	Kind    string
}

in the file k8s.io/apimachinery/pkg/runtime/schema/group_version.go. In the source code, instances of this class are typically called gvk. We will later see that, roughly speaking, the client contains machinery which allows us to map back and forth between valid combinations of group, version and kind on the one hand and Go structures on the other hand.
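
As a quick illustration, here is a small standalone snippet showing how a GroupVersionKind can either be assembled by hand or derived from the apiVersion and kind fields of a manifest – it only uses the schema package mentioned above.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Build a GroupVersionKind explicitly...
	gvk := schema.GroupVersionKind{
		Group:   "batch",
		Version: "v1",
		Kind:    "Job",
	}
	fmt.Println(gvk.String())
	// ...or derive it from the apiVersion and kind fields of a manifest
	fmt.Println(schema.FromAPIVersionAndKind("batch/v1", "Job"))
}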

An overview of the Kubernetes client

At least as far as we are concerned with getting, updating and deleting resources, the Kubernetes client code consists of the following core components:

  • A clientset is the entry point into the package and typically created from a client configuration as stored in the file ~/.kube/config
  • For each combination of API group and version, there is a package that contains the corresponding clients. For each resource in that group, like a Node, there is a corresponding Interface that allows us to perform operations like get, list etc. on the resource, and a corresponding object like a node itself
  • The package k8s.io/client-go/rest contains the code to create, submit and process REST requests. It provides a REST client, request and result structures, serializers and configuration objects
  • The package k8s.io/apimachinery/pkg/runtime contains the machinery to translate API requests and replies from and to Go structures. An Encoder is able to write objects to a stream. A Decoder transforms a stream of bytes into an object. A Scheme is able to map a group-version-kind combination into a Go type and vice versa.
  • The subpackage k8s.io/apimachinery/pkg/runtime/serializer contains a CodecFactory that is able to create encoders and decoders, as well as some standard encoders and decoders, for instance for JSON and YAML

[Figure: Kubernetes Go client overview]

Let us now dive into each of these building blocks in more detail.

Clientsets and clients

In our first example program, we have used the following lines to connect to the API.

clientset, err := kubernetes.NewForConfig(config)
coreClient := clientset.CoreV1()
nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Let us walk through this and see how each of these lines is implemented behind the scenes. The first line creates an instance of the class Clientset. Essentially, a clientset is a set of client objects, where each client object represents one version of an API group. When we access nodes, we will use the API group core, and correspondingly call the method CoreV1(), which returns the field coreV1 of this structure.

This core client is an instance of the type CoreV1Client in the package k8s.io/client-go/kubernetes/typed/core/v1 and implements the interface CoreV1Interface. This interface declares, for each resource in the core API, a dedicated getter function which returns an interface to work with this resource. For a node, the getter function is called Nodes and returns an object implementing the interface NodeInterface, which defines all the functions we are looking for – get, update, delete, list and so forth.
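
As a small example of how this interface is used in practice, the following snippet – continuing the example above, so clientset and the usual imports are assumed to exist, and "my-node" is just a placeholder for an actual node name – retrieves a single node and prints its kubelet version. Note that newer versions of client-go expect an additional context argument for the Get call.

// Retrieve a single node by name via the NodeInterface
node, err := clientset.CoreV1().Nodes().Get("my-node", metav1.GetOptions{})
if err != nil {
	panic(err)
}
fmt.Println(node.Status.NodeInfo.KubeletVersion)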

The object returned by Nodes also contains a reference to a RESTClient, which is the workhorse where the actual REST requests to the Kubernetes API are assembled and processed – let us continue our analysis there.

RESTClient and requests

How do REST clients work? Going back to our example code, we invoke the REST client in the line

nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Here we invoke the List method on the nodes object which is defined in the file k8s.io/client-go/kubernetes/typed/core/v1/node.go. The core of this method is the following code snippet.

err = c.client.Get().
	Resource("nodes").
	VersionedParams(&opts, scheme.ParameterCodec).
	Timeout(timeout).
	Do().
	Into(result)

Let us go through this step by step to see what is going on. First, the attribute client referenced here is a RESTClient, defined in k8s.io/client-go/rest/client.go. Among other things, this class contains a set of methods to manipulate requests.

The first method that we call is Get, which returns an instance of the class rest.Request, defined in rest.go in the same directory. A request contains a reference to a HTTPClient, which typically is equal to the HTTPClient to which the RESTClient itself refers. The request created by the Get method will be pre-initialized with the verb “GET”.

Next, several parameters are added to the request. Each of the following methods is a method of the Request object and again returns a request, so that chaining becomes possible. First, the method Resource sets the name of the resource that we want to access, in this case “nodes”. This will become part of the URL. Then we use VersionedParams to add the options to the request and Timeout to set a timeout.
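
We can use the same fluent interface ourselves without going through the typed NodeInterface. The following snippet – again assuming the clientset from our earlier example and, depending on the client-go version, a context argument for Do – issues a GET request for the nodes and returns the raw, undecoded response body.

// Talk to the API server directly via the REST client and obtain the raw response body
raw, err := clientset.CoreV1().RESTClient().
	Get().
	Resource("nodes").
	Do().
	Raw()
if err != nil {
	panic(err)
}
fmt.Println(string(raw))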

We then call Do() on the request. Here is the key section of this method

var result Result
err := r.request(func(req *http.Request, resp *http.Response) {
	result = r.transformResponse(resp, req)
})
return result

In the first line, we create an (empty) rest.Result object. In addition to the typical attributes that you would expect from an HTTP response, like a body, i.e. a sequence of bytes, this object also contains a decoder, which will become important later on.

We then invoke the request method of the Request object. This function assembles a new HTTP request based on our request, invokes the Do() method on the HTTP client and then calls the provided function, which is responsible for converting the HTTP response into a Result object. The default implementation of this conversion is transformResponse, which also sets a decoder in the Result object by copying the decoder contained in the request object.

[Figure: RESTClient]

When all this completes, we have a Result object in our hands. This is still a generic object – we have a response body, which is a stream of bytes, not a typed Go structure.

This conversion – the unmarshalling – is handled by the method Into. This method accepts as an argument an instance of the type runtime.Object and fills that object according to the response body. To understand how this works, we will have to take a look at schemes, codec factories and decoders.
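
Before we look into the details, here is how the pieces seen so far fit together when used by hand – a sketch that rebuilds what the typed List method does, assuming the clientset from our first example and the imports corev1 (k8s.io/api/core/v1), metav1 and scheme (k8s.io/client-go/kubernetes/scheme).

// Submit the request via the REST client and decode the response body into a typed NodeList
nodeList := &corev1.NodeList{}
err := clientset.CoreV1().RESTClient().
	Get().
	Resource("nodes").
	VersionedParams(&metav1.ListOptions{}, scheme.ParameterCodec).
	Do().
	Into(nodeList)
if err != nil {
	panic(err)
}
for _, node := range nodeList.Items {
	fmt.Println(node.Name)
}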

Schemes and decoders

In the first section, we have seen that API resources are uniquely determined by the combination of API group, version and kind. For each valid combination, our client should contain a Go structure representing this resource, and conversely, for every valid resource in the Go world, we would expect to have a combination of group, version and kind. The translation between these two worlds is accomplished by a scheme. Among other things, a scheme implements the following methods.

  • A method ObjectKinds which returns all known combinations of kind, API group and version for a given object
  • A method Recognizes which in turn determines whether a given combination of kind, API group and version is allowed
  • A method New which is able to create a new object for a given combination of API group, kind and version

Essentially, a scheme knows all combinations of API group, version and kind and the corresponding Go structures and is able to create and default the Go structures. For this to work, all resources handled by the API need to implement the interface runtime.Object.
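
The predefined scheme that ships with client-go (we will look at where it is created in a minute) makes this easy to try out. The following sketch creates a new, empty Pod object from its group, version and kind and then asks the scheme for the known combinations for that object.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/kubernetes/scheme"
)

func main() {
	// Ask the scheme to create a new, empty object for a given group / version / kind
	gvk := schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Pod"}
	obj, err := scheme.Scheme.New(gvk)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Created an object of type %T\n", obj)

	// Going the other way: which combinations does the scheme know for this object?
	kinds, _, err := scheme.Scheme.ObjectKinds(obj)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}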

This is nice, but what we need to transform the result of a call to the Kubernetes API into a Go structure is a decoder object. To create decoder (and encoder) objects, the API uses the class CodecFactory. A codec factory refers to a scheme and is able to create encoders and decoders. Some of the public methods of such an object are collected in the interface NegotiatedSerializer.

This interface provides the missing link between a REST client and the scheme and decoder / encoder machinery. In fact, a REST client has an attribute contentConfig which is an object of type ContentConfig. This object contains the HTTP content type, the API group and version the client is supposed to talk to and a NegotiatedSerializer which will be used to obtain decoders and encoders.
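
To see how these pieces are wired up, we can also build a REST client by hand from a rest.Config. Treat this as a sketch – the exact fields differ a bit between client-go versions; in particular, the NegotiatedSerializer is scheme.Codecs.WithoutConversion() in newer versions, while older versions use a DirectCodecFactory wrapping scheme.Codecs. We assume that config is a *rest.Config obtained, for instance, via clientcmd, and that the packages corev1 (k8s.io/api/core/v1), scheme (k8s.io/client-go/kubernetes/scheme) and rest (k8s.io/client-go/rest) are imported.

// Configure the content of the client: path prefix, group/version and serializers
config.APIPath = "/api"
config.GroupVersion = &corev1.SchemeGroupVersion
config.NegotiatedSerializer = scheme.Codecs.WithoutConversion()
// Create a REST client for the core API group from this configuration
restClient, err := rest.RESTClientFor(config)
if err != nil {
	panic(err)
}
fmt.Println(restClient.APIVersion())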

[Figure: Schemes and decoders]

Where are schemes and codec factories created and stored? Within the package k8s.io/client-go/kubernetes/scheme, there is a public variable Scheme and a public variable Codecs which is a CodecFactory. Both variables are declared in register.go. The scheme is initially empty, but in the init method of the package, the scheme is built up by calling (via a function list called a scheme builder) the function AddToScheme for each known API group.
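
The codec factory is also useful on its own. As a small illustration, the following sketch uses the universal deserializer created by the codec factory to decode a YAML manifest into a typed Go object, letting the scheme figure out the correct type from the apiVersion and kind fields.

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes/scheme"
)

func main() {
	manifest := []byte(`
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: main
    image: nginx
`)
	// The universal deserializer recognizes JSON and YAML and uses the scheme
	// to map the group / version / kind of the manifest to a Go type
	obj, gvk, err := scheme.Codecs.UniversalDeserializer().Decode(manifest, nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Decoded a %s into an object of type %T\n", gvk.String(), obj)
}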

Putting it all together

Armed with this understanding of the class structures, we can now again try to understand what happens when we submit our request to list all nodes.

During initialization of the package k8s.io/client-go/kubernetes/scheme, the initialization code in the file register.go is executed. This will initialize our scheme and a codec factory. As part of this, a standard decoder for the JSON format will be created (this happens in the function NewCodecFactory in codec_factory.go).

Then, we create our clientset using the function NewForConfig in the kubernetes package, which calls the method NewForConfig for each of the managed clients, including our CoreV1Client. Here, the following things happen:

  • We set group and version to the static values provided in the file register.go of the v1 package – the group will be empty, as we are in the special case for the core client, and the version will be “v1”
  • We add a reference to the CodecFactory to our configuration
  • We create a REST client with a base URL constructed from the host name and port of the Kubernetes API server, the API group and the version as above
  • We then invoke the function createSerializers in the rest package. This function retrieves all supported media types from the codec factory and matches them against the media type in the client configuration. Then a rest.Serializer is selected which matches group, version and media type
  • The REST client is added to the core client and the core client is returned
  • When we subsequently create a request using this REST client, we add this serializer to the request from which it is later copied to the result

At this point, we are ready to use this core client. We now navigate to a NodeInterface and call its list method. As explained above, this will eventually take us to the function Into defined in request.go. Here, we invoke the Decode method of our REST decoder. As this is the default JSON serializer, this method is the Decode function in json.go. This decode function first uses the scheme to determine if API group, version and kind of the response are valid and match the expected Go type. It then uses a standard JSON unmarshaller to perform the actual decoding.

This completes our short tour through the structure of the Go client source code. We have seen some central concepts of the client library – API groups, versions and kinds, schemes, encoder and decoder and the various client types – which we will need again in a later post when we discuss custom controllers.