Understanding Kubernetes controllers part III – informers

After understanding how an informer can be used to implement a custom controller, we will now learn more about the inner workings of an informer.

We have seen that essentially, informers use the Kubernetes API to learn about changes in the state of a Kubernetes cluster and use that information to maintain a cache (the indexer) of the current cluster state and to inform clients about the changes by calling handler functions. To achieve this, an informer (more specifically: a shared informer) is again a composition of several components.

  1. A reflector is the component which is actually talking to the Kubernetes API server
  2. The reflector stores the information on state changes in a special queue, a Delta FIFO
  3. A processor is used to distribute the information to all registered event handlers
  4. Finally, a cache controller is reading from the queue and orchestrating the overall process

(Diagram: SharedInformer)

In the next few sections, we visit each of these components in turn.

Reflectors and the FIFO queue

In the last post, we have seen that the Kubernetes API offers mechanisms like resource versions and watches to allow a client to keep track of the cluster state. However, these mechanisms require some logic – keeping track of the resource version, handling timeouts and so forth. Within the Kubernetes client package, this logic is built into a Reflector.

A reflector uses the API to keep track of changes for one specific type of resource and updates an object store accordingly. To talk to the Kubernetes API, it uses an object implementing the ListWatch interface, which is simply a convenience interface with a default implementation.

When a reflector is started by invoking its Run method, it will periodically run the ListAndWatch method, which contains the core logic. This function first lists all resources and retrieves the resource version, then uses the retrieved list to rebuild the object store. Next, it creates a watch (an instance of the watch.Interface interface) and reads from the channel provided by this watch in a loop, keeping the resource version up to date. Depending on the type of the event, it then calls either Add, Update or Delete on the store.
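
To make this more tangible, here is a minimal sketch of how a reflector could be wired up by hand to track pods, assuming a clientset as created in the earlier posts and the packages k8s.io/client-go/tools/cache, k8s.io/apimachinery/pkg/fields, k8s.io/api/core/v1 (as v1) plus the usual metav1 and time imports. The shared informer does all of this for us (and uses a delta FIFO instead of the plain store shown here), and exact signatures may vary slightly between client-go versions.

lw := cache.NewListWatchFromClient(
	clientset.CoreV1().RESTClient(), // client for the core API group
	"pods", metav1.NamespaceDefault, fields.Everything())
store := cache.NewStore(cache.MetaNamespaceKeyFunc)
reflector := cache.NewReflector(lw, &v1.Pod{}, store, 30*time.Second)
stopCh := make(chan struct{})
go reflector.Run(stopCh) // list pods, then watch and keep the store up to date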

The store that is maintained by the reflector is not just an ordinary store, but a Delta FIFO. This is a special store that maintains deltas, i.e. instances of cache.Delta, a structure containing the changed object and the type of change. For each object that has been subject to at least one change, the store maintains a list of all changes that have been reported for this object and performs a certain amount of de-duplication. Thus a reader will get a complete list of all deltas for a specific resource.
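
For reference, this is (slightly abbreviated, with comments added) how these structures are defined in delta_fifo.go.

type Delta struct {
	Type   DeltaType   // Added, Updated, Deleted or Sync
	Object interface{} // the object after the change (the last known state for deletions)
}

// Deltas is the list of all changes recorded for a single object
type Deltas []Delta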

The cache controller and the processor

So our reflector is feeding the delta FIFO queue, but who is reading from it? This is done by another component of the shared informer – the cache controller.

When the Run method of a cache controller is invoked, it creates a Reflector, starts the Reflector's Run method in a separate goroutine and starts its own main loop processLoop as a second goroutine. Within that loop, the controller will pop elements off the FIFO queue and invoke the handleDeltas method of the informer once for each element it pops. This method represents the main loop of the informer and is executed once for each object in the delta store.

This function does two things. First, it updates the indexer according to the change retrieved from the delta FIFO queue. Second, it builds notifications (using the indexer to get the old version of an object if needed) and distributes them to all handler functions. This work is delegated to the processor.

Essentially, a processor is an object that maintains a list of listeners. A listener has an add method to receive notifications. These notifications are then buffered and forwarded to a resource event handler, which can be any object that has methods OnAdd, OnUpdate and OnDelete.

We now have a rather complete understanding of what happens in case the state of the Kubernetes cluster changes.

  • The reflector picks up the change via the Kubernetes API and enqueues it in the delta FIFO queue, where deduplication is performed and all changes for the same object are consolidated
  • The controller eventually reads the change from the delta FIFO queue
  • The controller updates the indexer
  • The controller forwards the change to the processor, which in turns calls all handler functions
  • The handler functions are methods of the custom controller, which can then inspect the delta, use the updated indexer to retrieve additional information and adjust the cluster state if needed

Note that the same informer object can serve many different handler functions while maintaining only one shared indexer – as the name suggests. Throughout the code, synchronisation primitives are used to protect the shared data structures and make all this thread-safe.

Creating and using shared informers

Let us now see how we can create and use a shared informer in practice. Suppose we wanted to write a controller that takes care of pods and, for instance, triggers some action whenever a pod is created. We can do this by creating an informer which will listen for events on pods and call a handler function that we define. Again, the sample controller shows us how to do this.

The recommended way to create shared informers is to use an informer factory. This factory maintains a list of informers per resource type, and if a new informer is requested, it first performs a lookup in that list and returns an existing informer if possible (this happens in the function InformerFor). It is this mechanism which actually makes the informers and therefore their caches shared and reduces both memory footprint and the load on the API server.

We can create a factory and use it to get a shared informer listening for pod events as follows.

factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
podInformer := factory.Core().V1().Pods().Informer()

Here clientset is a Kubernetes clientset that we can create as in our previous examples, and the second argument is the resync period. In our example, we instruct the informer to re-deliver the objects in its cache to the registered event handlers every 30 seconds.

Once our informer is created, we need to start its main loop. However, we cannot simply call its Run method, as this informer might be shared and we do not want to call this method twice. Instead, we can again use a convenience function of the factory class – its method Start will start all informers created by the factory and keep track of their status.
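
In code, this could look like the following sketch – the stop channel passed to Start is discussed further below, and the call to WaitForCacheSync is optional, but it is usually a good idea to wait until the initial cache fill has completed before relying on the indexer.

stopCh := make(chan struct{})
factory.Start(stopCh)            // starts all informers created by this factory
factory.WaitForCacheSync(stopCh) // block until the initial list has been loaded into the caches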

We can now register event handlers with the informer. An event handler is an object implementing the ResourceEventHandler interface. Instead of building a class implementing this interface ourselves, we can use the existing class ResourceEventHandlerFuncs. Assuming that our handler functions for adding, updating and deleting pods are called onAdd, onUpdate and onDelete, the code to add these functions as an event handler would look as follows.

podInformer.AddEventHandler(
		&cache.ResourceEventHandlerFuncs{
			AddFunc:    onAdd,
			DeleteFunc: onDelete,
			UpdateFunc: onUpdate,
		})

Finally, we need to think about how we stop our controller again. At the end of our main function, we cannot simply exit, as this would stop the entire controller loop immediately. So we need to wait for something – most likely for a signal sent to us by the operating system, for instance because we have hit Ctrl-C. This can be achieved using the idea of a stop channel. This is a channel which is closed when we want to exit, for instance because a signal has been received. To achieve this, we can create a dedicated goroutine which uses the os/signal standard package to receive signals from the operating system and closes the channel accordingly. An example implementation is part of the sample controller.
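
A minimal sketch of such a stop channel, fed by the standard os/signal and syscall packages (the sample controller ships a similar, more complete helper), could look like this.

stopCh := make(chan struct{})
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
go func() {
	<-sigCh       // wait for Ctrl-C or a termination signal
	close(stopCh) // closing the channel unblocks everything waiting on it
}()
// ... start the informers and workers, then block until the channel is closed
<-stopCh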

Our event handlers are now very simple. They receive an object that we can convert to a Pod object using a type assertion. We can then use properties of the pod, for instance its name, print information or process that information further. A full working example can be found in my GitHub repository here.
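
As an illustration, an onAdd handler could look like the following sketch (the function name and output are just examples).

func onAdd(obj interface{}) {
	pod, ok := obj.(*v1.Pod) // the informer hands us an interface{}, so we assert the type
	if !ok {
		return
	}
	fmt.Printf("Pod added: %s/%s\n", pod.Namespace, pod.Name)
}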

This completes our post for today. In the next post in this mini-series on Go and Kubernetes, we will put everything that we have learned so far together and take a short walk through the full source code of the sample controller to understand how a custom controller works. This will then finally put us in a position to start the implementation of our planned bitcoin controller.

Understanding Kubernetes controllers part II – object stores and indexers

In the last post, we have seen that our sample controller uses Listers to retrieve the current state of Kubernetes resources. In this post, we will take a closer look at how these Listers work.

Essentially, we have already seen how to use the Kubernetes Go client to retrieve information on Kubernetes resources, so we could simply do that in our controller. However, this is a bit inefficient. Suppose, for instance, that you are using multiple worker threads, as we do. You would then probably retrieve the same information over and over again, creating a high load on the API server. To avoid this, a special class of Kubernetes informers – called index informers – can be used which build a thread-safe object store serving as a cache. When the state of the cluster changes, the informer will not only invoke the handler functions of our controller, but also perform the necessary updates to keep the cache up to date. As the cache has the additional ability to deal with indices, it is called an Indexer. Thus, at the end of today's post, the following picture will emerge.

(Diagram: InformerControllerInteraction)

In the remainder of this post, we will discuss indexers and how they interact with an informer in more detail, while in the next post, we will learn how informers are created and used and dig a little bit into their inner workings.

Watches and resource versions

Before we talk about informers and indexers, we have to understand the basic mechanisms that clients can use to keep track of the cluster state. To enable this, the Kubernetes API offers a mechanism called a watch. This is best explained using an example.

To follow this example, we assume that you have a Kubernetes cluster up and running. We will use curl to directly interact with the API. To avoid having to add tokens or certificates to our request, we will use the kubectl proxy mechanism. So in a separate terminal, run

$ kubectl proxy

You should see a message that the proxy is listening on a port (typically 8001) on the local host. Any requests sent to this port will be forwarded to the Kubernetes API server. To populate our cluster, let us first start a single HTTPD.

$ kubectl run alpine --image=httpd:alpine

Then let us use curl to get a list of running pods in the default namespace.

$ curl localhost:8001/api/v1/namespaces/default/pods
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/pods",
    "resourceVersion": "6834"
  },
  "items": [
    {
      "metadata": {
        "name": "alpine-56cf65bbfc-tzqqx",
        "generateName": "alpine-56cf65bbfc-",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/pods/alpine-56cf65bbfc-tzqqx",
        "uid": "584ddf85-5f8d-11e9-80c0-080027696a3f",
        "resourceVersion": "6671",
--- REDACTED ---

As expected, you will get a JSON encoded object of type PodList. The interesting part is the data in the metadata. You will see that there is a field resourceVersion. Essentially, the resource version is a number which increases over time and that uniquely identifies a certain state of the cluster.

Now the Kubernetes API offers you the option to request a watch, using this resource version as a starting point. To do this manually, enter

$ curl -v "localhost:8001/api/v1/namespaces/default/pods?watch=1&resourceVersion=6834"

Looking at the output, you will see that this request returns an HTTP response with the transfer encoding “chunked”. This is specified in RFC 7230 and puts the client into streaming mode, i.e. the connection will remain open and the API server will continue to send updates in small chunks, which curl prints to the terminal as they arrive. If you now create additional pods in your cluster or delete existing pods, you will continue to see notifications being received, informing you about the events. Each notification consists of a type (ADDED, MODIFIED, ERROR or DELETED) and an object – the layout of the message is described here.

This gives us a way to obtain a complete picture of the cluster's state in an efficient manner. We first use an ordinary API request to list all resources. We then remember the resource version in the response and use that resource version as a starting point for a watch. Whenever we receive a notification about a change, we update our local data accordingly. And essentially, this is exactly what the combination of informer and indexer does.
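
The same list-then-watch pattern can also be implemented directly with the Go client. The following sketch (assuming a clientset as created in the earlier posts) lists the pods in the default namespace, remembers the resource version and then consumes watch events from the channel returned by the watch.

pods, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{})
if err != nil {
	panic(err)
}
rv := pods.ResourceVersion // resource version of the list, our starting point
w, err := clientset.CoreV1().Pods("default").Watch(metav1.ListOptions{ResourceVersion: rv})
if err != nil {
	panic(err)
}
for event := range w.ResultChan() {
	if pod, ok := event.Object.(*v1.Pod); ok {
		fmt.Printf("%s: %s\n", event.Type, pod.Name)
	}
}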

Caching mechanisms and indexers

An indexer is any object that implements the interface cache.Indexer. This interface in turn is derived from cache.Store, so let us study that first. Its definition is in store.go.

type Store interface {
	Add(obj interface{}) error
	Update(obj interface{}) error
	Delete(obj interface{}) error
	List() []interface{}
	ListKeys() []string
	Get(obj interface{}) (item interface{}, exists bool, err error)
	GetByKey(key string) (item interface{}, exists bool, err error)
	Replace([]interface{}, string) error
	Resync() error
}

So basically a store is something to which we can add objects and from which we can retrieve, update or delete them. The interface itself does not make any assumptions about keys, but when you create a new store, you provide a key function which extracts the key from an object and has the following signature.

type KeyFunc func(obj interface{}) (string, error)

Working with stores is very convenient and easy – you can find a short example that stores objects representing books in a store here.
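
As a small sketch, this is what the basic pattern looks like when we use the MetaNamespaceKeyFunc that client-go uses throughout, which builds keys of the form namespace/name.

store := cache.NewStore(cache.MetaNamespaceKeyFunc)
pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "my-pod"}}
_ = store.Add(pod)
item, exists, _ := store.GetByKey("default/my-pod")
if exists {
	fmt.Println(item.(*v1.Pod).Name) // prints my-pod
}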

Let us now verify that, as the diagram above claims, both the informer and the lister have a reference to the same indexer. To see this, let us look at the creation process of our sample controller.

When a new controller is created by the function NewController, this function accepts a DeploymentInformer and a FooInformer. These are interfaces that provide access to an actual informer and a lister for the respective resources. Let us take the FooInformer as an example. The actual creation method for the Lister looks as follows.

func (f *fooInformer) Lister() v1alpha1.FooLister {
	return v1alpha1.NewFooLister(f.Informer().GetIndexer())
}

This explains how the link between the informer and the indexer is established. The communication between informer and indexer is done via the function handleDeltas, which receives a list of Delta objects as defined in delta_fifo.go (we will learn more about how this works in the next post). If we look at this function, we find that it not only calls all registered handler functions (with the help of a processor), but also calls the methods Add, Update and Delete on the store, depending on the type of the delta.

We now have a rather complete picture of how our sample controller works. The informer uses the Kubernetes API and its mechanism to watch for changes based on resource versions to obtain updates of the cluster state. These updates are used to maintain an object store which reflects the current state of the cluster and to invoke defined event handler functions. A controller registers its functions with the informer to be called when a resource changes. It can then access the object store to easily retrieve the current state of the resources and take necessary actions to drive the system towards the target state.

What we have not yet seen, however, is how exactly the magical informer works – this will be the topic of our next post.

Understanding Kubernetes controllers part I – queues and the core controller loop

In previous posts in my series on Kubernetes, we have stumbled across the concept of a controller. Essentially, a controller is a daemon that monitors the to-be state of components of a Kubernetes cluster against the as-is state and takes action to reach the to-be state.

A classical example is the Replica set controller which monitors replica sets and pods and is responsible for creating new pods or deleting existing pods if the number of replicas is out-of-sync with the defined value.

In this series, we will perform a deep dive into controllers. Specifically, we will take a tour through the sample controller that is provided by the Kubernetes project and try to understand how this controller works. We will also explain how this relates to custom resource definitions (CRDs) and what steps are needed to implement a custom controller for a given CRD.

Testing the sample controller

Let us now start with our analysis of the sample controller. To follow this analysis, I advise you to download a copy of the sample controller into your Go workspace using go get k8s.io/sample-controller and then use an editor like Atom that offers plugins to navigate Go code.

To test the client and to have a starting point for debugging and tracing, let us follow the instructions in the README file that is located at the root of the repository. Assuming that you have a working kubectl config in $HOME/.kube/config, build and start the controller as follows.

$ cd $GOPATH/src/k8s.io/sample-controller
$ go build
$ kubectl create -f artifacts/examples/crd.yaml
$ ./sample-controller --kubeconfig=$HOME/.kube/config

This will create a custom resource definition, specifically a new resource type named “Foo” that we can use as any other resource like Pods or Deployments. In a separate terminal, we can now create a Foo resource.

$ kubectl create -f artifacts/examples/example-foo.yaml 
$ kubectl get deployments
$ kubectl get pods

You will find that our controller has created one deployment which in turn brings up one pod running nginx. If you delete the custom resource again using kubectl delete foo example-foo, both the deployment and the pods disappear again. However, if you manually delete the deployment, it is recreated by the controller. So apparently, our controller is able to detect changes to deployments and foo resources and to match them accordingly. How does this work?

Basically, a controller will periodically match the state of the system to the to-be state. For that purpose, several functionalities are required.

  • We need to be able to detect changes in the state of the system. This is done using event-driven processing, handled by informers that subscribe to events and invoke specific handlers, and by listers that are able to list all resources in a given Kubernetes cluster
  • We need to be able to keep track of the state of the system. This is done using object stores and their indexed variants, indexers
  • Ideally, we should be able to process larger volumes using multi-threading, coordinated by queues

In this and the next post, we will go through these elements one by one. We start with queues.

Queues and concurrency

Let us start by investigating threads and queues in the Kubernetes library. The ability to easily create threads (called goroutines) in Go and the support for managing concurrency and locking are among the key differentiators of the Go programming language, and of course the Kubernetes client library makes use of these features.

Essentially, a queue in the Kubernetes client library is something that implements the interface k8s.io/client-go/util/workqueue/Interface. That interface contains (among others) the following methods.

  • Add adds an object to the queue
  • Get blocks until an item is available in the queue, and then returns the first item in the queue
  • Done marks an item as processed

Internally, the standard implementation of a queue in queue.go uses Go maps. The keys of these maps are arbitrary objects, while the items in the maps are just placeholders (empty structures). One of these maps is called the dirty set; it contains all elements that make up the actual queue, i.e. that still need to be processed. The second map is called the processing set; it contains all items which have been retrieved using Get, but for which Done has not yet been called. As maps are unordered, there is also an array which holds the elements in the queue and is used to define the order of processing. Note that each of the maps can hold a specific object only once, whereas the queue can hold several copies of the object.

(Diagram: Queue)

If we add something to the queue, it is added to the dirty set and appended to the queue array. If we call Get, the first item is retrieved from the queue, removed from the dirty set and added to the processing set. Calling Done will remove the element from the processing set as well – unless someone else has called Add on the same object in the meantime, in which case it will be removed from the processing set but also added to the queue again.

Let us implement a little test program that works with queues. For that purpose, we will establish two threads aka goroutines. The first thread will call Add five times to add something to the queue and then complete. The second thread will sleep for three seconds and then read from the queue in a loop to retrieve the elements. Here are the functions to send and receive.

func fillQueue(queue workqueue.Interface) {
	time.Sleep(time.Second)
	queue.Add("this")
	queue.Add("is")
	queue.Add("a")
	queue.Add("complete")
	queue.Add("sentence")
}

func readFromQueue(queue workqueue.Interface) {
	time.Sleep(3 * time.Second)
	for {
		item, _ := queue.Get()
		fmt.Printf("Got item: %s\n", item)
		queue.Done(item)
	}
}

With these two functions in place, we can now easily create two goroutines that use the same queue to communicate (goroutines, being threads, share the same address space and can therefore communicate using common data structures).

myQueue := workqueue.New()
go fillQueue(myQueue)
go readFromQueue(myQueue)

However, if you run this, you will find that there is a problem. Our main thread completes after creating both worker threads, and this will cause the program to exit and kill both worker threads before the reader has done anything. To avoid this, we need to wait for the reader thread (which, reading from the queue, will in turn wait for the writer thread). One way to do this with Go language primitives is to use channels. So we change our reader function to receive a channel of integer elements

func readFromQueue(queue workqueue.Interface, stop chan int)

and in the main function, we create a channel, pass it to the function and then read from it which will block until the reader thread sends a confirmation that it is done reading.

stop := make(chan int)
myQueue := workqueue.New()
go fillQueue(myQueue)
go readFromQueue(myQueue, stop)
<-stop

Now, however, there is another problem – how does the reader know that no further items will be written to the queue? Fortunately, queues offer a way to handle this. When a writer is done using the queue, it calls ShutDown on the queue. This changes the queue's behavior – once the queue has been drained, reads will no longer block, and the second return value of Get will be true. If a reader recognizes this situation, it can stop its goroutine.
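
Putting this together, our two functions from above could be adapted as in the following sketch – the writer calls ShutDown once it has added its last item, and the reader exits and signals the main goroutine once the queue reports the shutdown.

func fillQueue(queue workqueue.Interface) {
	queue.Add("this")
	queue.Add("is")
	queue.Add("a")
	queue.Add("complete")
	queue.Add("sentence")
	queue.ShutDown() // no more writes – pending and future Gets will no longer block
}

func readFromQueue(queue workqueue.Interface, stop chan int) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			break // the queue has been shut down and drained
		}
		fmt.Printf("Got item: %s\n", item)
		queue.Done(item)
	}
	stop <- 1 // tell the main goroutine that we are done
}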

A full example can be found here – of course this is made up to demonstrate goroutines, queues and channels and not the most efficient solution for the problem at hand.

The core controller loop

Armed with our understanding of concurrency and queues, we can now take a first look at the code of the sample controller. The main entry points are the functions handleObject and enqueueFoo – these are the functions invoked by the Informer, which we will discuss in one of the next posts, whenever a Foo object or a Deployment is created, updated or deleted.

The function enqueueFoo is called whenever a Foo object is changed (i.e. added, updated or deleted). It simply determines a key for the object and adds that key to the workqueue.

The workqueue is read by worker threads, which are created in the Run function of the controller. This function creates a certain number of goroutines and then listens on a channel called stopCh, as we did in our simplified example before. This channel is created by main.go and is used to stop all workers and the main thread if a signal is received.

Each worker thread executes the method processNextWorkItem of the controller in a loop. For each item in the queue, this method calls another method – syncHandler – passing the item retrieved from the queue, i.e. the key of the Foo resource. This method then uses a Lister to retrieve the current state of the Foo resource. It then retrieves the deployment behind the Foo resource, creates it if it could not be found, and updates the number of replicas if needed.
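
Condensed into a sketch (the names follow the sample controller, but error handling, logging and some details are omitted, so treat this as an approximation rather than the actual code), the worker loop looks roughly like this.

func (c *Controller) runWorker() {
	for c.processNextWorkItem() {
	}
}

func (c *Controller) processNextWorkItem() bool {
	key, shutdown := c.workqueue.Get()
	if shutdown {
		return false // the queue has been shut down, stop this worker
	}
	defer c.workqueue.Done(key)
	if err := c.syncHandler(key.(string)); err != nil {
		c.workqueue.AddRateLimited(key) // requeue with a rate limit and retry later
		return true
	}
	c.workqueue.Forget(key) // processed successfully, reset the rate limiter for this key
	return true
}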

The function handleObject is similar. It is invoked by the informer with the new, updated state of the Deployment object. It then determines the owning Foo resource and simply enqueues that Foo resource. The rest of the processing will then be the same.

At this point, two open ends remain. First, we will have to understand how an Informer works and how it invokes the functions handleObject and enqueueFoo. And we will need to understand what a Lister is doing and where the lister and the data it uses come from. This will be the topic of our next post.

Extending Kubernetes with custom resources and custom controllers

The Kubernetes API is structured around resources. Typical resources that we have seen so far are pods, nodes, containers, ingress rules and so forth. These resources are built into Kubernetes and can be addressed using the kubectl command line tool, the API or the Go client.

However, Kubernetes is designed to be extensible – and in fact, you can add your own resources. These resources are defined by objects called custom resource definitions (CRDs).

Setting up custom resource definitions

Confusingly enough, the definition of a custom resource – i.e. the CRD – itself is nothing but a resource, and as such, can be created using either the Kubernetes API directly or any client you like, for instance kubectl.

Suppose we wanted to create a new resource type called book that has two attributes – an author and a title. To distinguish this custom resource from other resources that Kubernetes already knows, we have to put our custom resource definition into a separate API group. This can be any string, but to guarantee uniqueness, it is good practice to use some sort of domain, for instance a GitHub repository name. As my GitHub user name is christianb93, I will use the API group christianb93.github.com for this example.

To understand how we can define that custom resource type using the API, we can take a look at its specification or the corresponding Go structures. We see that

  • The CRD resource is part of the API group apiextensions.k8s.io and has version v1beta1, so the value of the apiVersion field needs to be apiextensions.k8s.io/v1beta1
  • The kind is, of course, CustomResourceDefinition
  • There is again a metadata field, which is built up as usual. In particular, there is a name field
  • A custom resource definition spec consists of a version, the API group, a field scope that determines whether our CRD instances will live in a cluster scope or in a namespace and a list of names

This translates into the following manifest file to create our CRD.

$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    names:
      plural: books
      singular: book
      kind: Book
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

This will create a new type of resources, our books. We can access books similar to all other resources Kubernetes is aware of. We can, for instance, get a list of existing books using the API. To do this, open a separate terminal and run

kubectl proxy

to get access to the API endpoints. Then use curl to get a list of all books.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/books"  | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "items": [],
  "kind": "BookList",
  "metadata": {
    "continue": "",
    "resourceVersion": "7281",
    "selfLink": "/apis/christianb93.github.com/v1/books"
  }
}

So in fact, Kubernetes knows about books and has established an API endpoint for us. Note that the path contains “apis” and not “api” to indicate that this is an extension of the original Kubernetes API. Also note that the path contains our dedicated API group name and the version that we have specified.

At this point we have completed the definition of our custom resource “book”. Now let us try to actually create some books.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: david-copperfield
spec:
  title: David Copperfield
  author: Dickens
EOF
book.christianb93.github.com/david-copperfield created

Nice – we have created our first book as an instance of our new CRD. We can now work with this book just like with a pod, a deployment and so forth. We can for instance display it using kubectl

$ kubectl get book david-copperfield
NAME                AGE
david-copperfield   3m38s

or access it using curl and the API.

$ curl -s -X GET "localhost:8001/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield" | jq
{
  "apiVersion": "christianb93.github.com/v1",
  "kind": "Book",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"christianb93.github.com/v1\",\"kind\":\"Book\",\"metadata\":{\"annotations\":{},\"name\":\"david-copperfield\",\"namespace\":\"default\"},\"spec\":{\"author\":\"Dickens\",\"title\":\"David Copperfield\"}}\n"
    },
    "creationTimestamp": "2019-04-21T09:32:54Z",
    "generation": 1,
    "name": "david-copperfield",
    "namespace": "default",
    "resourceVersion": "7929",
    "selfLink": "/apis/christianb93.github.com/v1/namespaces/default/books/david-copperfield",
    "uid": "70fbc120-6418-11e9-9fbf-080027a84e1a"
  },
  "spec": {
    "author": "Dickens",
    "title": "David Copperfield"
  }
}

Validations

If we look again at what we have done and where we have started, something still feels a bit wrong. Remember that we wanted to define a resource called a “book” that has a title and an author. We have used those fields when actually creating a book, but we have not referred to them at all in the CRD. How does the Kubernetes API know which fields a book actually has?

The answer is simple – it does not know this at all. In fact, we can create a book with any collection of fields we want. For instance, the following will work just fine.

$ kubectl apply -f - << EOF
apiVersion: christianb93.github.com/v1
kind: Book
metadata:
  name: moby-dick
spec:
  foo: bar
EOF
book.christianb93.github.com/moby-dick created

In fact, when you run this, the Kubernetes API server will happily take your JSON input and store it in the etcd that keeps the cluster state – and it will store there whatever you provide. To avoid this, let us add a validation rule to our resource definition. This allows you to attach an OpenAPI schema to your CRD against which the books will be validated. Here is our updated CRD manifest file to make this work.

$ kubectl delete crd  books.christianb93.github.com
customresourcedefinition.apiextensions.k8s.io "books.christianb93.github.com" deleted
$ kubectl apply -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
    name: books.christianb93.github.com
spec:  
    version: v1
    group: christianb93.github.com
    scope: Namespaced
    subresources:
      status: {}
    names:
      plural: books
      singular: book
      kind: Book
    validation:
      openAPIV3Schema:
        properties:
          spec:
            required: 
            - author
            - title
            properties:
              author:
                type: string
              title:
                type: string
EOF
customresourcedefinition.apiextensions.k8s.io/books.christianb93.github.com created

If you now repeat the commands above, you will find that "David Copperfield" can be created, but "Moby Dick" is rejected, as it does not match the validation rules (the required fields author and title are missing).

There is another change that we have made in this version of our CRD – we have added a subresource called status to our CRD. This subresource allows a controller to update the status of the resource independently of the specification – see the corresponding API convention for more details on this.

The controller pattern

As we have seen above, a CRD is essentially allowing you to store data as part of the cluster state kept in the etcd key-value store using a Kubernetes API endpoint. However, CRDs do not actually trigger any change in the cluster. If you POST a custom resource like a book to the Kubernetes API server, all it will do is to store that object in the etcd store.

It might come as a bit of a surprise, but strictly speaking, this is true for built-in resources as well. Suppose, for instance, that you use kubectl to create a deployment. Then kubectl will create a POST request for a deployment and send it to the API server. The API server will process the request and store the new deployment in the etcd. It will, however, not actually create pods, spin up containers and so forth.

This is the job of another component of the Kubernetes architecture – the controllers. Essentially, a controller is monitoring the etcd store to keep track of its contents. Whenever a new resource, for example a deployment, is created, the controller will trigger the associated actions.

Kubernetes comes with a set of built-in controllers in the controller package. Essentially, there is one controller for each type of resource. The deployment controller, for instance, monitors deployment objects. When a new deployment is created in the etcd store, it will make sure that there is a matching replica set. These sets are again managed by another controller, the replica set controller, which will in turn create matching pods. The pods are again monitored by the scheduler that determines the node on which the pods should run and writes the bindings back to the API server. The updated bindings are then picked up by the kubelet and the actual containers are started. So essentially, all involved components of the Kubernetes architecture talk to the etcd via the API server, without any direct dependencies.

(Diagram: KubernetesComponents)

Of course, the Kubernetes built-in controllers will only monitor and manage objects that come with Kubernetes. If we create custom resources and want to trigger any actual action, we need to implement our own controllers.

Suppose, for instance, we wanted to run a small network of bitcoin daemons on a Kubernetes cluster for testing purposes. Bitcoin daemons need to know each other and register themselves with other daemons in the network to be able to exchange messages. To manage that, we could define a custom resource BitcoinNetwork which contains the specification of such a network, for instance the number of nodes. We could then write a controller which

  • Detects new instances of our custom resource
  • Creates a corresponding deployment set to spin up the nodes
  • Monitors the resulting pods and whenever a pod comes up, adds this pod to the network
  • Keeps track of the status of the nodes in the status of the resource
  • Makes sure that when we delete or update the network, the corresponding deployments are deleted or updated as well

Such a controller would operate by detecting newly created or changed BitcoinNetwork resources, comparing their definition to the actual state, i.e. existing deployments and pods, and updating the state accordingly. This pattern is known as the controller pattern or operator pattern. Operators exist for many applications, like Postgres, MySQL, Prometheus and many others.

I did actually pick this example for a reason – in an earlier post, I showed you how to set up and operate a small bitcoin test network based on Docker and Python. In the next few posts, we will learn how to write a custom controller in Go that automates all this on top of Kubernetes! To achieve this, we will first analyze the components of a typical controller – informers, queues, caches and all that – using the Kubernetes sample controller and then dig into building a custom bitcoin controller armed with this understanding.

Learning Go with Kubernetes IV – life of a request

So far we have described how a client program utilizes the Kubernetes library to talk to a Kubernetes API server. In this post, we will actually look into the Kubernetes client code and try to understand how it works under the hood.

When we work with the Kubernetes API and try to understand the client code, we first have to take a look at how the API is versioned.

Versioning in the Kubernetes API

The Kubernetes API is a versioned API. Every resource that you address using requests like GET or POST has a version. For stable versions, the version name is of the form vX, where X is an integer. In addition, resources are grouped into API groups. The full relative path to a resource is of the form

/apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCE

Let us take the job resource as an example. This resource is in the API group batch. A GET request for a job called my-job in the default namespace using version v1 of the API would therefore be something like

GET /apis/batch/v1/namespaces/default/jobs/my-job

An exception is made for the core API group, which is omitted from the URL path for historical reasons – its resources live under the prefix /api/v1 instead of /apis/GROUP/VERSION. In a manifest file, API group and version are both stored in the field apiVersion, which, in our example, would be batch/v1.

Within the Kubernetes Go client, the combination of a type of resource (a “kind”, like a Pod), a version and an API group is stored in a Go structure called GroupVersionKind. In fact, this structure is declared as follows

type GroupVersionKind struct {
	Group   string
	Version string
	Kind    string
}

in the file k8s.io/apimachinery/pkg/runtime/schema/group_version.go. In the source code, instances of this class are typically called gvk. We will later see that, roughly speaking, the client contains a machinery which allows us to map forth and back between possible combinations of group, version and kind and Go structures.
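
As a small illustration (not taken from the client code itself), the job example from above would be expressed as the following GroupVersionKind.

gvk := schema.GroupVersionKind{
	Group:   "batch",
	Version: "v1",
	Kind:    "Job",
}
fmt.Println(gvk) // prints something like batch/v1, Kind=Job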

An overview of the Kubernetes client

At least as far as we are concerned with getting, updating and deleting resources, the Kubernetes client code consists of the following core components:

  • A clientset is the entry point into the package and typically created from a client configuration as stored in the file ~/.kube/config
  • For each combination of API group and version, there is a package that contains the corresponding clients. For each resource in that group, like a Node, there is a corresponding Interface that allows us to perform operations like get, list etc. on the resource, and a corresponding object like a node itself
  • The package k8s.io/client-go/rest contains the code to create, submit and process REST requests. There is a REST client, request and result structures, serializers and configuration objects
  • The package k8s.io/apimachinery/pkg/runtime contains the machinery to translate API requests and replies from and to Go structures. An Encoder is able to write objects to a stream. A Decoder transforms a stream of bytes into an object. A Scheme is able to map a group-version-kind combination into a Go type and vice versa.
  • The subpackage k8s.io/apimachinery/pkg/runtime/serializer contains a CodecFactory that is able to create encoders and decoders, as well as some standard encoders and decoders, for instance for JSON and YAML

(Diagram: KubernetesGoClientOverview)

Let us now dive into each of these building blocks in more detail.

Clientsets and clients

In our first example program, we have used the following lines to connect to the API.

clientset, err := kubernetes.NewForConfig(config)
coreClient := clientset.CoreV1()
nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Let us walk through this and see how each of these lines is implemented behind the scenes. The first line creates an instance of the class Clientset. Essentially, a clientset is a set of client objects, where each client object represents one version of an API group. When we access nodes, we will use the API group core, and correspondingly use the field coreV1 of this structure.

This core client is an instance of k8s.io/client-go/kubernetes/typed/core/v1/CoreV1Client and implements the interface CoreV1Interface. This interface declares, for each resource in the core API, a dedicated getter function which returns an interface to work with this resource. For a node, the getter function is called Nodes and returns a class implementing the interface NodeInterface, which defines all the functions we are looking for – get, update, delete, list and so forth.

An instance of the Nodes class also contains a reference to a RESTClient, which is the workhorse where the actual REST requests to the Kubernetes API are assembled and processed – let us continue our analysis there.

RESTClient and requests

How do REST clients work? Going back to our example code, we invoke the REST client in the line

nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Here we invoke the List method on the nodes object which is defined in the file k8s.io/client-go/kubernetes/typed/core/v1/node.go. The core of this method is the following code snippet.

err = c.client.Get().
	Resource("nodes").
	VersionedParams(&opts, scheme.ParameterCodec).
	Timeout(timeout).
	Do().
	Into(result)

Let us go through this step by step to see what is going on. First, the attribute client referenced here is a RESTClient, defined in k8s.io/client-go/rest/client.go. Among other things, this class contains a set of methods to manipulate requests.

The first method that we call is Get, which returns an instance of the class rest.Request, defined in rest.go in the same directory. A request contains a reference to a HTTPClient, which typically is equal to the HTTPClient to which the RESTClient itself refers. The request created by the Get method will be pre-initialized with the verb “GET”.

Next, several parameters are added to the request. Each of the following methods is a method of the Request object and again returns a request, so that chaining becomes possible. First, the method Resource sets the name of the resource that we want to access, in this case “nodes”. This will become part of the URL. Then we use VersionedParams to add the options to the request and Timeout to set a timeout.

We then call Do() on the request. Here is the key section of this method

var result Result
err := r.request(func(req *http.Request, resp *http.Response) {
	result = r.transformResponse(resp, req)
})
return result

In the first line, we create an (empty) rest.Result object. In addition to the typical attributes that you would expect from an HTTP response, like a body, i.e. a sequence of bytes, this object also contains a decoder, which will become important later on.

We then invoke the request method of the Request object. This function assembles a new HTTP request based on our request, invokes the Do() method on the HTTP client and then calls the provided function, which is responsible for converting the HTTP response into a Result object. The default implementation of this is transformResponse, which also sets a decoder in the Result object, copying the decoder contained in the request object.

(Diagram: RESTClient)

When all this completes, we have a Result object in our hands. This is still a generic object, we have a response body which is a stream of bytes, not a typed Go structure.

This conversion – the unmarshalling – is handled by the method Into. This method accepts as an argument an instance of the type runtime.Object and fills that object according to the response body. To understand how this works, we will have to take a look at schemes, codec factories and decoders.

Schemes and decoders

In the first section, we have seen that API resources are uniquely determined by the combination of API group, version and kind. For each valid combination, our client should contain a Go structure representing this resource, and conversely, for every valid resource in the Go world, we would expect to have a combination of group, version and kind. The translation between these two worlds is accomplished by a scheme. Among other things, a scheme implements the following methods.

  • A method ObjectKinds which returns all known combinations of kind, API group and version for a given object
  • A method Recognizes which in turn determines whether a given combination of kind, API group and version is allowed
  • A method New which is able to create a new object for a given combination of API group, kind and version

Essentially, a scheme knows all combinations of API group, version and kind and the corresponding Go structures and is able to create and default the Go structures. For this to work, all resources handled by the API need to implement the interface runtime.Object.
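
As a small illustration of what a scheme can do, the following sketch asks the client's default scheme (the variable Scheme in the package k8s.io/client-go/kubernetes/scheme, which we will meet again below) about a pod and then creates a fresh, empty object for the returned combination.

gvks, _, err := scheme.Scheme.ObjectKinds(&v1.Pod{})
if err != nil || len(gvks) == 0 {
	panic(err)
}
fmt.Println(gvks[0])                 // the core group is empty, so this prints /v1, Kind=Pod
obj, _ := scheme.Scheme.New(gvks[0]) // returns a new, empty *v1.Pod as a runtime.Object
fmt.Printf("%T\n", obj)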

This is nice, but what we need to transform the result of a call to the Kubernetes API into a Go structure is a decoder object. To create decoder (and encoder) objects, the API uses the class CodecFactory. A codec factory refers to a scheme and is able to create encoders and decoders. Some of the public methods of such an object are collected in the interface NegotiatedSerializer.

This interface provides the missing link between a REST client and the scheme and decoder / encoder machinery. In fact, a REST client has an attribute contentConfig which is an object of type ContentConfig. This object contains the HTTP content type, the API group and version the client is supposed to talk to and a NegotiatedSerializer which will be used to obtain decoders and encoders.

(Diagram: SchemesAndDecoders)

Where are schemes and codec factories created and stored? Within the package k8s.io/client-go/kubernetes/scheme, there is a public variable Scheme and a public variable Codecs which is a CodecFactory. Both variables are declared in register.go. The scheme is initially empty, but in the init method of the package, the scheme is built up by calling (via a function list called a scheme builder) the function AddToScheme for each known API group.

Putting it all together

Armed with this understanding of the class structures, we can now again try to understand what happens when we submit our request to list all nodes.

During initialization of the package k8s.io/client-go/kubernetes/scheme, the initialization code in the file register.go is executed. This will initialize our scheme and a codec factory. As part of this, a standard decoder for the JSON format will be created (this happens in the function NewCodecFactory in codec_factory.go).

Then, we create our clientset using the function NewForConfig in the kubernetes package, which calls the method NewForConfig for each of the managed clients, including our CoreV1Client. Here, the following things happen:

  • We set group and version to the static values provided in the file register.go of the v1 package – the group will be empty, as we are in the special case for the core client, and the version will be “v1”
  • We add a reference to the CodecFactory to our configuration
  • We create a REST client with a base URL constructed from the host name and port of the Kubernetes API server, the API group and the version as above
  • We then invoke the function createSerializers in the rest package. This function retrieves all supported media types from the codec factory and matches them against the media type in the kubectl config. Then a rest.Serializer is selected which matches group, version and media type
  • The REST client is added to the core client and the core client is returned
  • When we subsequently create a request using this REST client, we add this serializer to the request from which it is later copied to the result

At this point, we are ready to use this core client. We now navigate to a NodeInterface and call its list method. As explained above, this will eventually take us to the function Into defined in request.go. Here, we invoke the Decode method of our REST decoder. As this is the default JSON serializer, this method is the Decode function in json.go. This decode function first uses the scheme to determine if API group, version and kind of the response are valid and match the expected Go type. It then uses a standard JSON unmarshaller to perform the actual decoding.

This completes our short tour through the structure of the Go client source code. We have seen some central concepts of the client library – API groups, versions and kinds, schemes, encoder and decoder and the various client types – which we will need again in a later post when we discuss custom controllers.

Learning Go with Kubernetes III – slices and Kubernetes resources

In the last post, we have seen how structures, methods and interfaces in Go are used by the Kubernetes client API to model object oriented behavior. Today, we will continue our walk through to our first example program.

Arrays and slices

Recall that in the last post, we got to the point that we were able to get a list of all nodes in our cluster using

nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Let us now try to better understand what nodeList actually is. If we look up the signature of the List method, we find that it is

List(opts metav1.ListOptions) (*v1.NodeList, error)

So we get a pointer to a NodeList. This in turns has a field Items which is defined as

Items []Node

We can access the field Items using either an explicit dereferencing of the pointer as items := (*nodeList).Items or the shorthand notation items := nodeList.Items.

Now looking at the definition above, it seems that Items is some sort of array whose elements are of type Node, but which does not have a fixed length. So it is time to learn more about arrays in Go.

At the first glance, arrays in Go are very much like in many other languages. A declaration like

var a [5]int

declares an array called a of five integers. Arrays, like in C, cannot be resized. Unlike in C, however, an assignment of arrays does not create two pointers that point to the same location in memory, but creates a copy. Thus if you do something like

b := a

you create a second array b which is initially identical to a, but if you modify b, a remains unchanged. This is especially important when you pass arrays to functions – you will pass a copy, and especially for large arrays, this is probably not what you want.

So why not pass pointers to arrays? Well, there is a little problem with that approach. In Go, the length of an array is part of the array's type, so [5]int and [6]int are different types, which makes it difficult to write functions that accept an array of arbitrary length. For that purpose, Go offers slices, which are essentially pointers to arrays.

Arrays are created either by declaring them or by using the new keyword. Slices are created either by slicing an existing array or by using the make keyword. As in Python, slices can refer to a part of an array and we can take slices of an existing slice. When slices are assigned, they refer to the same underlying array, and if a slice is passed as a parameter, no copy of the underlying array is created. So slices are effectively pointers to parts of arrays (with a bit more features, for instance the possibility to extend them by appending data).
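
The following short, standalone example (not taken from the Kubernetes code) illustrates these semantics.

a := [5]int{1, 2, 3, 4, 5}
b := a      // assignment copies the array – modifying b leaves a untouched
b[0] = 99
s := a[1:4] // slice referring to a[1], a[2], a[3] – no copy is made
s[0] = 42   // modifies the underlying array, so a[1] is now 42
t := make([]int, 0, 2)
t = append(t, 7, 8, 9) // append may allocate a new underlying array if the capacity is exceeded
fmt.Println(a, b, s, t) // [1 42 3 4 5] [99 2 3 4 5] [42 3 4] [7 8 9]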

How do you loop over a slice? For an array, you know its length and can build an ordinary loop. For a slice, you have two options. First, you can use the built-in function len to get the length of a slice and use that to construct a loop. Or you can use the for statement with a range clause, which also works for other data structures like strings, maps and channels. So we can iterate over the nodes in the list and print some basic information on them as follows.

items := nodeList.Items
for _, item := range items {
	fmt.Printf("%-20s  %-10s %s\n", item.Name,
		item.Status.NodeInfo.Architecture,
		item.Status.NodeInfo.OSImage)
}

Standard types in the Kubernetes API

The elements of the list we are cycling through above are instances of the struct Node, which is declared in the file k8s.io/api/core/v1/types.go. It is instructive to look at this definition for a moment.

type Node struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	Spec NodeSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`
	Status NodeStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

First, we see that the first and second lines are examples of embedded fields. It is worth noting that we can address these fields in two different ways. The second field, for instance, is ObjectMeta, which itself has a field Name. To access this field when node is of type Node, we could either write node.ObjectMeta.Name or node.Name. This mechanism is called field promotion.
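
To make this concrete, both of the following lines print the same value, assuming that node is one of the Node items from the list we retrieved earlier.

fmt.Println(node.ObjectMeta.Name) // explicit path through the embedded field
fmt.Println(node.Name)            // promoted field – refers to the very same data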

The second thing that is interesting in this definition are the string literals like json:",inline" added after some field names. These string literals are called tags. They are mostly ignored, but can be inspected using the reflection API of Go and are, for instance, used by the JSON marshaller and unmarshaller.

When we take a further look at the file types.go in which these definitions are located, at its location in our Go workspace and at the layout of the various Kubernetes GitHub repositories, we see that this file is part of the Go package k8s.io/api/core/v1 (verify this in the output of go list -json k8s.io/client-go/kubernetes), which is part of the Kubernetes API GitHub repository. As explained here, the Swagger API specification is generated from this file. If you take a look at the resulting API specification, you will see that the comments in the source file appear in the documentation and that the json tags determine the field names that are used in the API.

To further practice navigating the source code and the API documentation, let us try to use the API to create a (naked) pod. The documentation tells us that a pod belongs to the API group core, so that the core client is probably again what we need. So the first few lines of our code will be as before.

home := homedir.HomeDir()
kubeconfig := filepath.Join(home, ".kube", "config")
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)

if err != nil {
	panic(err)
}
clientset, err := kubernetes.NewForConfig(config)
if err != nil {
	panic(err)
}
coreClient := clientset.CoreV1()

Next we have to create a Pod. To understand what we need to do, we can again look at the definition of a Pod in types.go which looks as follows.

type Pod struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	Spec PodSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`
	Status PodStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

As in a YAML manifest file, we will not provide the Status field. The first field that we need is the TypeMeta field. If we locate its definition in the source code, we see that this is again a structure. To create an instance, we can use the following code

metav1.TypeMeta{
	Kind:       "Pod",
	APIVersion: "v1",
}

This will create an unnamed instance of this structure with the specified fields – if you have trouble reading this code, you might want to consult the corresponding section of A Tour of Go. Similarly, we can create an instance of the ObjectMeta structure.

The PodSpec structure is a bit more interesting. The key field that we need to provide is the field Containers which is a slice. To create a slice consisting of one container only, we can use the following syntax

[]v1.Container{
	v1.Container{
		Name:  "my-ctr",
		Image: "httpd:alpine",
	},
}

Here the inner composite literal creates a single instance of the Container structure, and surrounding it with []v1.Container{...} turns this into a slice literal containing exactly one element.

We could use temporary variables to store all these fields and then assemble our Pod structure step by step. However, in most examples, you will see a coding style that avoids this and instead nests the composite literals directly. So our final result could be

pod := &v1.Pod{
	TypeMeta: metav1.TypeMeta{
		Kind:       "Pod",
		APIVersion: "v1",
	},	
	ObjectMeta: metav1.ObjectMeta{
		Name: "my-pod",
	},
	Spec: v1.PodSpec{
		Containers: []v1.Container{
			v1.Container{
				Name:  "my-ctr",
				Image: "httpd:alpine",
			},
		},
	},
}

It takes some time to get used to expressions like this one, but once you have seen and understood a few of them, they become surprisingly readable, as all declarations are in one place. Also note that this gives us a pointer to a Pod, as we place the address-of operator & in front of our composite literal. This pointer can then be used as input for the Create() method of a PodInterface, which finally creates the actual Pod. You can find the full source code here, including all the boilerplate code.
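
To sketch how that final call might look (note that the exact signature depends on your version of client-go – more recent versions expect an additional context and a metav1.CreateOptions argument):

createdPod, err := coreClient.Pods("default").Create(pod)
if err != nil {
	panic(err)
}
fmt.Printf("Created pod %s\n", createdPod.Name)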

At this point, you should be able to read and (at least roughly) understand most of the source code in the Kubernetes client package. In the next post, we will be digging deeper into this code and trace an API request through the library, starting in your Go program and ending at the communication with the Kubernetes API server.

Learning Go with Kubernetes II – navigating structs, methods and interfaces

In the last post, we have set up our Go development environment and downloaded the Kubernetes Go client package. In this post, we will start to work on our first Go program which will retrieve and display a list of all nodes in a cluster.

You can download the full program here from my GitHub account. Copy this file to a subdirectory of the src directory in your Go workspace. You can build the program with

go build

which will create an executable called example1. If you run this (and have a working kubectl configuration pointing to a cluster with some nodes), you should see an output similar to the following.

$ ./example1 
NAME                  ARCH       OS
my-pool-7t62          amd64      Debian GNU/Linux 9 (stretch)
my-pool-7t6p          amd64      Debian GNU/Linux 9 (stretch)

Let us now go through the code step by step. In the first few lines, I declare my source code as part of the package main (in which Go will look for the main entry point) and import a few packages that we will need later.

package main

import (
	"fmt"
	"path/filepath"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

Note that the first two packages are Go standard packages, whereas the other four are Kubernetes packages. Also note that we import the package k8s.io/apimachinery/pkg/apis/meta/v1 using metav1 as an alias, so that an element foo in this package will be accessible as metav1.foo.

Next, we declare a function called main. As in many other programming languages, the linker will use this function (in the package main) as the entry point for the executable.

func main() {
....
}

This function does not take any arguments and has no return values. Note that in Go, the return value is declared after the function name, not in front of the function name as in C or Java. Let us now take a look at the first three lines in the program

home := homedir.HomeDir()
kubeconfig := filepath.Join(home, ".kube", "config")
fmt.Printf("%-20s  %-10s %s\n", "NAME", "ARCH", "OS")

In the first line, we declare and initialize (using the := syntax) a variable called home. Variables in Go are strongly typed, but with this short variable declaration syntax, we ask the compiler to derive the type of the variable automatically from the assigned value.
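
As a quick illustration of the difference between an explicit declaration and the short form:

var s string = "a string" // explicit declaration with type
n := 42                   // short declaration, the compiler infers that n is an int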

Let us try to figure out the value of home. homedir is one of the packages that we have imported above. Using go list -json k8s.io/client-go/util/homedir, you can easily find the directory and the files in which this package is defined. Let us look for a function HomeDir.

$ grep HomeDir $GOPATH/src/k8s.io/client-go/util/homedir/homedir.go 
// HomeDir returns the home directory for the current user
func HomeDir() string {

We see that there is a function HomeDir which returns a string, so our variable home will be a string. Note that the name of the function starts with an uppercase character, so that it is exported by Go (in Go, elements starting with an uppercase character are exported, elements starting with a lowercase character are not – you have to get used to this if you have worked with C or Java before). Applying the same exercise to the second line, you will find that kubeconfig is a string that the function Join in the package filepath builds from its three arguments and path separators, i.e. this will be the path to the kubectl config file. Finally, the third line prints the header of the table of nodes we want to generate, using a printf-like syntax.

The next few lines in our code use the name of the kubectl config file to apply the configuration.

config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
if err != nil {
     panic(err)
}

This is again relatively straightforward, with one exception. The function BuildConfigFromFlags does actually return two values, as we can also see in its function declaration.

$ (cd  $GOPATH/src/k8s.io/client-go/tools/clientcmd ; grep "BuildConfigFromFlags" *.go)
client_config.go:// BuildConfigFromFlags is a helper function that builds configs from a master
client_config.go:func BuildConfigFromFlags(masterUrl, kubeconfigPath string) (*restclient.Config, error) {
client_config_test.go:	config, err := BuildConfigFromFlags("", tmpfile.Name())

The first return value is the actual configuration, the second is an error. We store both return values and check the error value – if everything went well, it should be nil, which is Go's equivalent of a null value.

Structures and methods

As a next step, we create a reference to the API client that we will use – a clientset.

clientset, err := kubernetes.NewForConfig(config)
if err != nil {
	panic(err)
}

Again, we can easily locate the function that we actually call here and try to determine its return type. By now, you should know how to figure out the directory in which we have to search for the answer.

$ (cd $GOPATH/src/k8s.io/client-go/kubernetes ; grep "func NewForConfig(" *.go)
clientset.go:func NewForConfig(c *rest.Config) (*Clientset, error) {

Now this is interesting for a couple of reasons. First, the first return value is of type *Clientset. This, like in C, is a pointer (yes, there are pointers in Go, but there is no pointer arithmetic), referring to the area in memory where an object is stored. The type of this object, Clientset, is not an elementary type, but a custom data type. If you search the file clientset.go in which the function is defined for the string Clientset, you will easily locate its definition.

// Clientset contains the clients for groups. Each group has exactly one
// version included in a Clientset.
type Clientset struct {
	*discovery.DiscoveryClient
	admissionregistrationV1beta1 *admissionregistrationv1beta1.AdmissionregistrationV1beta1Client
	appsV1                       *appsv1.AppsV1Client
	appsV1beta1                  *appsv1beta1.AppsV1beta1Client
...
	coreV1                       *corev1.CoreV1Client
...
}

where I have removed some lines for better readability. So this is a structure. Similar to C, a structure is a sequence of named elements (fields) which are bundled into one data structure. In the third line, for instance, we declare a field called appsV1 which is a pointer to an object of type appsv1.AppsV1Client (note that appsv1 is a package imported at the start of the file). The first line is a bit different – there is a field type, but no field name. This is called an embedded field, and its name is derived from the type name as the unqualified part of that name (in this case, DiscoveryClient).

The field that we need from this structure is the field coreV1. However, there is a problem. Recall that only fields whose names start with an uppercase character are exported. So from outside the package, we cannot simply access this field using something like

clientset.coreV1

Instead, we need something like a getter function – a function located inside the package that returns this value. If you inspect the file clientset.go, you will easily locate the following function

// CoreV1 retrieves the CoreV1Client
func (c *Clientset) CoreV1() corev1.CoreV1Interface {
	return c.coreV1
}

This seems to be doing what we need, but its declaration is a bit unusual. There is a function name (CoreV1()), a return type (corev1.CoreV1Interface) and an empty parameter list. But there is also a declaration preceding the function name ((c *Clientset)).

This is called a receiver argument in Go. This binds our function to the type Clientset and at the same time acts as a parameter. When you invoke this function, you do it in the context of an instance of the type Clientset. In our example, we invoke this function as follows.

coreClient := clientset.CoreV1()

Here we call the function CoreV1 that we have just seen in clientset.go, passing a pointer to our instance clientset as the argument c. The function then reads the field coreV1 from this instance (which it can do, as it is located in the same package) and returns its value. Such a function could also accept additional parameters, which are passed as usual. Note that the type of the receiver argument and the function need to be defined in the same package, otherwise the compiler would not know in which package to look for the function CoreV1(). The presence of a receiver argument turns a function into a method.

This looks a bit complicated, but is in fact rather intuitive (and those of you who have seen object oriented Perl know the drill). This construction links the structure Clientset containing data and the function CoreV1() with each other, under the umbrella of the package in which data type and function are defined. This comes close to a class in object-oriented programming languages.
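
A stripped-down sketch of this pattern, using made-up names instead of the real client-go types, could look like this:

package example

// Config bundles some data; the field host is unexported and therefore
// not accessible from outside the package
type Config struct {
	host string
}

// Host is bound to *Config via its receiver argument and acts as a getter
func (c *Config) Host() string {
	return c.host
}

From outside the package, code holding a *Config would call cfg.Host() to read the field, just as we call clientset.CoreV1() above.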

Interfaces and inheritance

At this point, we hold the variable coreClient in our hands, and we know that it contains a value of the interface type CoreV1Interface from the package k8s.io/client-go/kubernetes/typed/core/v1. Resolving packages as before, we can now locate the file in which this type is declared.

$ (cd $GOPATH/src/k8s.io/client-go/kubernetes/typed/core/v1 ; grep "type CoreV1Interface" *.go)
core_client.go:type CoreV1Interface interface {

Hmm… this does not look like a structure. In fact, this is an interface. An interface defines a collection of methods (not just ordinary functions!). The value of an interface can be an instance of any type which implements all these methods, i.e. a type to which methods with the same names and signatures as those contained in the interface are linked using receiver arguments.

This sounds complicated, so let us look at a simple example from the corresponding section of “A Tour of Go”.

type I interface {
	M()
}

type T struct {
	S string
}

// This method means type T implements the interface I,
// but we don't need to explicitly declare that it does so.
func (t T) M() {
	fmt.Println(t.S)
}

Here T is a structure containing a field S. The function M() has a receiver argument of type T and is therefore a method linked to the structure T. The interface I can be “anything on which we can call a method M“. Hence any value of type T is an admissible value for a variable of type I. In this sense, T implements I. Note that the “implements” relation is purely implicit – there is no explicit declaration of implementation like in Java.

Let us now try to understand what this means in our case. If you locate the definition of the interface CoreV1Interface in core_client.go, you will see something like

type CoreV1Interface interface {
	RESTClient() rest.Interface
	ComponentStatusesGetter
	ConfigMapsGetter
...
	NodesGetter
...
}

The first line looks as expected – the interface contains a method RESTClient() returning an Interface as defined in the package k8s.io/client-go/rest. However, the following lines do not look like method definitions. In fact, if you search the source code for these names, you will find that these are again interfaces! ConfigMapsGetter, for instance, is an interface declared in configmap.go, which belongs to the same package and is defined as follows.

// ConfigMapsGetter has a method to return a ConfigMapInterface.
// A group's client should implement this interface.
type ConfigMapsGetter interface {
	ConfigMaps(namespace string) ConfigMapInterface
}

So this is a “real” interface. Its occurrence in CoreV1Interface is an example of an embedded interface. This simply means that the interface CoreV1Interface contains all methods declared in ConfigMapsGetter plus all the other methods declared directly. This relation corresponds to interface inheritance in Java.
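
A tiny sketch of interface embedding with made-up names:

type Reader interface {
	Read() string
}

type Writer interface {
	Write(s string)
}

// ReadWriter embeds Reader and Writer: it contains all their methods,
// so any type implementing Read and Write also implements ReadWriter
type ReadWriter interface {
	Reader
	Writer
}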

Among other interfaces, we find that CoreV1Interface embeds (think: inherits) the interface NodesGetter, which is defined in node.go.

type NodesGetter interface {
	Nodes() NodeInterface
}

So we find that our variable coreClient contains something that – among other interfaces – implements the NodesGetter interface and therefore has a method called Nodes(). Thus we could do something like

coreClient.Nodes()

which would return an instance of the interface type NodeInterface. This in turn is declared in the same file, and we find that it has a method List()

type NodeInterface interface {
	Create(*v1.Node) (*v1.Node, error)
...
	List(opts metav1.ListOptions) (*v1.NodeList, error)
...
}

Calling List returns two things – a pointer to a NodeList and an error. NodeList is part of the package k8s.io/api/core/v1 and declared in types.go. To get this node list, we therefore need the following code.

nodeList, err := coreClient.Nodes().List(metav1.ListOptions{})

Note the argument to List – this is an anonymous instance of the type ListOptions, which is just a structure; here we create an instance of this structure with all fields set to their zero values and pass that instance as a parameter.
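
As a quick preview of the next post, a loop over the returned list that produces the table shown at the top of this post could look roughly like this (a sketch – the full program linked above contains the complete version):

for _, node := range nodeList.Items {
	// Each item is a v1.Node; thanks to field promotion we can access
	// the name directly, and the node details live in the status
	fmt.Printf("%-20s  %-10s %s\n",
		node.Name,
		node.Status.NodeInfo.Architecture,
		node.Status.NodeInfo.OSImage)
}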

This completes our post for today. We have learned to navigate through structures, interfaces, methods and inheritance, and how to locate type definitions and method signatures in the source code. In the next post, we will learn how to work with the node list, i.e. how to walk the list and print its contents.

Learning Go with Kubernetes I – basics

When you work with Kubernetes and want to learn more about its internal workings and how to use the API, you will sooner or later reach the point at which the documentation can no longer answer all your questions and you need to consult the one and only source of truth – the source code of Kubernetes and a plethora of examples. Of course all of this is not written in my beloved Python (nor in C or Java) but in Go. So I decided to come up with a short series of posts that documents my own efforts to learn Go, using the Kubernetes source code as an example.

Note that this is not a stand-alone introduction into Go – there are many other sites doing this, like the Tour of Go on the official Golang home page, the free Golang book or many many blogs like Yourbasic. Rather, it is meant as an illustration of Go concepts for programmers with a background in a language like C (preferred), Java or Python which will help you to read and understand the Kubernetes server and client code.

What is Go(lang)?

Go – sometimes called Golang – is a programming language that was developed by Google engineers in an effort to create an easy to learn programming language well suited for programming fast, multithreaded servers and web applications. Some of its syntax and ideas actually remind me of C – there are pointers and structs – others remind me of things I have seen in Python, like slices. If you know any of these languages, you will find your way through the Go syntax quickly.

Go compiles code into native statically linked executables, which makes them easy to deploy. And Go comes with built-in support for multithreading, which makes it comparatively easy to build server-like applications. This blog lists some of the features of the Go language and compares them to concepts known from other languages.

Installation

The first thing, of course, is to install the Go environment. As Go is evolving rapidly, the packages offered by your distribution will most likely be outdated. To install the (fairly recent) version 1.10 on my Linux system, I have therefore downloaded the binary distribution using

wget https://dl.google.com/go/go1.10.8.linux-amd64.tar.gz
gzip -d go1.10.8.linux-amd64.tar.gz
tar xvf go1.10.8.linux-amd64.tar

in the directory where I wanted Go to be installed (I have used a subdirectory $HOME/Local for that purpose, but you might want to use /usr/local for a system-wide installation).

To resolve dependencies at compile time, Go uses a couple of standard directories – more on this below. These directories are stored in environment variables. To set these variables, add the following to your shell configuration script (.bashrc or .profile, depending on your system). Here GOROOT needs to point to the sub-directory go which the commands above will generate.

export GOROOT=directory-in-which-you-did-install-go/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

The most important executable in $GOROOT/bin is the go utility itself. This program can operate in various modes – go build will build a program, go get will install a package and so forth. You can run go help for a full list of available commands.

Packages and imports

A Go program is built from a collection of packages. Each source code file is part of a package, defined via the package declaration at the start of the file. Packages can be imported by other packages, which is the basis for reusable libraries. To resolve package names, Go has two different mechanisms – the GOPATH mode and the newer module mode available since Go 1.11. Here we will only discuss and use the old GOPATH mode.

In this mode, the essential idea is that your entire Go code is organized in a workspace, i.e. in one top-level directory, typically located in your home directory. In my case, my workspace is $HOME/go. To tell the Go build system about your workspace, you have to export the GOPATH environment variable.

export GOPATH=$HOME/go

The workspace directory follows a standard layout and typically contains the following subdirectories.

  • src – this is where all the source files live
  • pkg – here compiled versions of packages will be installed
  • bin – this is where executables resulting from a build process will be placed

Let us try this out. Run the following commands to download the Kubernetes Go client package and some standard libraries.

go get k8s.io/client-go/...
go get golang.org/x/tools/...

When this has completed, let us take a look at our GOPATH directory to see what has happened.

$ ls $GOPATH/src
github.com  golang.org  gopkg.in  k8s.io

So the Go utility has actually put the source code of the downloaded package into our workspace. As GOPATH points to this workspace, the Go build process can resolve package names and map them into this workspace. If, for instance, we refer to the package k8s.io/client-go/kubernetes in our source code (as we will do it in the example later on), the Go compiler will look for this package in

$GOPATH/src/k8s.io/client-go/kubernetes

To get information on a package, we can use the list command of the Go utility. Let us try this out.

$ go list -json k8s.io/client-go/kubernetes
{
	"Dir": "[REDACTED - this is $GOPATH]/src/k8s.io/client-go/kubernetes",
	"ImportPath": "k8s.io/client-go/kubernetes",
	"ImportComment": "k8s.io/client-go/kubernetes",
        [REDACTED - SOME MORE LINES]
	"Root": "[REDACTED - THIS SHOULD BE $GOROOT]",
	"GoFiles": [
		"clientset.go",
		"doc.go",
		"import.go"
	],
[... REDACTED - MORE OUTPUT ...]

Here I have removed a few lines and redacted the output a bit. We see that the Dir field is the directory in which the compiler will look for the code constituting the package; its top-level directory is the directory to which GOPATH points. The ImportPath is the path that an import statement in a program using this package would use. The list GoFiles contains all files that are part of this package. If you inspect these files, you will in fact find that the first statement they contain is

package kubernetes

indicating that they belong to the package kubernetes. You will see that, by convention, the package name (defined in the source code) equals the last part of the full import path (which is part of the filesystem structure) – a convention which, as far as I understand, is not technically enforced.
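
To make this concrete with a hypothetical example: a file stored at $GOPATH/src/github.com/myuser/greetings/greetings.go would be imported by other code as github.com/myuser/greetings and would typically look like this.

// Package greetings is a toy package; by convention its name matches the
// last element of its import path github.com/myuser/greetings
package greetings

// Hello returns a friendly greeting
func Hello(name string) string {
	return "Hello, " + name
}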

I recommend spending some time reading more on typical layouts of Go packages here or here.

We have reached the end of our first post in this series. In the next post, we will write our first example program which will list nodes in a Kubernetes cluster.

Kubernetes on your PC: playing with minikube

In my previous posts on Kubernetes, I have used public cloud providers like AWS or DigitalOcean to spin up test clusters. This is nice and quite flexible – you can create clusters with an arbitrary number of nodes, attach volumes, create load balancers and define networks. However, cloud providers will of course charge for that, and your freedom to adapt the configuration and play with the management nodes is limited. It would be nice to have a playground, maybe even on your own machine, which gives you a small environment to play with. This is exactly what the minikube project is about.

Basics and installation

Minikube is a set of tools that allows you to easily create a one-node Kubernetes cluster inside a virtual machine running on your PC. Thus there is only one node, which serves as a management node and a worker node at the same time. Minikube supports several virtualization toolsets, but the default (both on Linux and on Windows) is Virtualbox. So as a first step, let us install this.

$ sudo apt-get install virtualbox

Next, we can install minikube. We will use release 1.0, which was published at the end of March. Minikube is one single, statically linked binary. I keep third-party binaries in a directory ~/Local/bin, so I applied the following commands to download and install minikube.

$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/v1.0.0/minikube-linux-amd64 
$ chmod 700 minikube
$ mv minikube ~/Local/bin

Running minikube

Running minikube is easy – just execute

$ minikube start

When you do this for the first time after installation, Minikube needs to download a couple of images. These images are cached in ~/.minikube/cache and require a bit more than 2 GB of disk space, so this will take some time.

Once the download is complete, minikube will bring up a virtual machine, install Kubernetes in it and adapt your kubectl configuration to point to this newly created cluster.

By default, minikube will create a virtual machine with two virtual CPUs (i.e. two hyperthreads) and 2 GB of RAM. This is the minimum for a reasonable setup. If you have a machine with sufficient memory, you can allocate more. To create a machine with 4 GB RAM and four CPUs, use

$ minikube start --memory 4096 --cpus 4

Let us see what this command does. If you print your kubectl config file using kubectl config view, you will see that minikube has added a new context to your configuration and set this context as the default context, while preserving any previous configuration that you had. Next, let us inspect our nodes.

$ kubectl get nodes
NAME       STATUS   ROLES    AGE     VERSION
minikube   Ready    master   3m24s   v1.14.0

We see that there is one node, as expected. This node is a virtual machine – if you run virtualbox, you will be able to see that machine and its configuration.

screenshot-from-2019-04-08-14-06-02.png

When you run minikube stop, the virtual machine will be shut down, but will survive. When you restart minikube, this machine will again be used.

There are several ways to actually log into this machine. First, minikube has a command that will do that – minikube ssh. This will log you in as user docker, and you can do a sudo -s to become root.

Alternatively, you can stop minikube, start the machine manually from the virtualbox management console, log into it (user “docker”, password “tcuser” – it took me some time to figure this out; if you want to verify it, look at this file, read the minikube Makefile to confirm that the build uses buildroot, and take a look at the description in this file) and then start minikube. In this case, minikube will detect that the machine is already running.

Networking in Minikube

Let us now inspect the networking configuration of the virtualbox instance that minikube has started for us. When minikube comes up, it will print a message like the following

“minikube” IP address is 192.168.99.100

In case you missed this message, you can run minikube ip to obtain this IP address. How is that IP address reachable from the host?

If you run ifconfig and ip route on the host system, you will find that virtualbox has created an additional virtual network device vboxnet0 (use ls -l /sys/class/net to verify that this is a virtual device) and has added a route that sends all traffic for the CIDR range 192.168.99.0/24 to this device, using the source IP address 192.168.99.1 (the src field in the output of ip route). So this gives you yet another way to SSH into the virtual machine

ssh docker@$(minikube ip)

which also shows that the connection works.

Inside the VM, however, the picture is a bit more complicated. As a starting point, let us print some details on the virtual machine that minikube has created.

$ vboxmanage showvminfo  minikube --details | grep "NIC" | grep -v "disabled"
NIC 1:           MAC: 080027AE1062, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: virtio, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 1 Settings:  MTU: 0, Socket (send: 64, receive: 64), TCP Window (send:64, receive: 64)
NIC 1 Rule(0):   name = ssh, protocol = tcp, host ip = 127.0.0.1, host port = 44359, guest ip = , guest port = 22
NIC 2:           MAC: 080027BDDBEC, Attachment: Host-only Interface 'vboxnet0', Cable connected: on, Trace: off (file: none), Type: virtio, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none

So we find that virtualbox has equipped our machine with two virtual network interfaces, called NIC 1 and NIC 2. If you ssh into the machine, run ifconfig and compare the MAC address values, you will find that these two devices appear as eth0 and eth1.

Let us first take a closer look at the first interface. This is a so-called NAT device. Basically, this device acts like a router – when a TCP/IP packet is sent to this device, the virtualbox engine extracts the data, opens a port on the host machine and sends the data to the target host. When the answer is received, another address translation is performed and the packet is fed again into the virtual device.

Much like an actual router, this mechanism makes it impossible to reach the virtual machine from the host – unless a port forwarding rule is set up. If you look at the output above, you will see that there is one port forwarding rule already in place, mapping the SSH port of the guest system to a port on the host, in our case 44359. When you run netstat on the host, you will find that minikube itself actually connects to this port to reach the SSH daemon inside the virtual machine – and, incidentally, this gives us yet another way to SSH into our machine.

ssh -p 44359 docker@127.0.0.1

Now let us turn to the second interface – eth1. This is an interface type which the VirtualBox documentation refers to as host-only networking. In this mode, an additional virtual network device is created on the host system – this is the vboxnet0 device which we have already spotted. Traffic sent to the virtual device eth1 in the machine is forwarded to this device and vice versa (this is in fact handled by a special driver vboxnet as you can tell from the output of ethtool -i vboxnet0). In addition, VirtualBox has added routes on the host and the guest system to connect this device to the network 192.168.99.0/24. Note that this network is completely separated from the host network. So our picture looks as follows.

VirtualBoxNetworking

What does this mean for Kubernetes networking in Minikube? Well, the first obvious consequence is that we can use node ports to access services from our host system. Let us try this out, using the examples from a previous post.

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/pods/deployment.yaml
deployment.apps/alpine created
$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/nodePortService.yaml
service/alpine-service created
$ kubectl get svc
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
alpine-service   NodePort    10.99.112.157   <none>        8080:32197/TCP   26s
kubernetes       ClusterIP   10.96.0.1       <none>        443/TCP          4d17h

So our service has been created and is listening on the node port 32197. Let us see whether we can reach our service from the host. On the host, open a terminal window and enter

$ nodeIP=$(minikube ip)
$ curl $nodeIP:32197
<h1>It works!</h1>

So node port services work as expected. What about load balancer services? In a typical cloud environment, Kubernetes will create load balancers whenever we set up a load balancer service that is reachable from outside the cluster. Let us see what the corresponding behavior in a minikube environment is.

$ kubectl delete svc alpine-service
service "alpine-service" deleted
$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/loadBalancerService.yaml
service/alpine-service created
$ kubectl get svc
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
alpine-service   LoadBalancer   10.106.216.127   <pending>     8080:31282/TCP   3s
kubernetes       ClusterIP      10.96.0.1        <none>        443/TCP          4d18h
$ curl $nodeIP:31282
<h1>It works!</h1>

You will find that even after a few minutes, the external IP remains pending. Of course, we can still reach our service via the node port, but this is not the idea of a load balancer service. This is not awfully surprising, as there is no load balancer infrastructure on your local machine.

However, minikube does offer a tool that allows you to emulate a load balancer – minikube tunnel. To see this in action, open a second terminal on your host and enter

minikube tunnel

After a few seconds, you will be asked for your root password, as minikube tunnel requires root privileges. After providing this, you should see some status message on the screen. In our first terminal, we can now inspect our service again.

$ kubectl get svc alpine-service
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
alpine-service   LoadBalancer   10.106.216.127   10.106.216.127   8080:31282/TCP   17m
$ curl 10.106.216.127:8080
<h1>It works!</h1>

Suddenly, the field external IP is populated, and we can reach our service under this IP address and the port number that we have configured in our service description. What is going on here?

To find the answer, we can use ip route on the host. If you run this, you will find that minikube has added an additional route which looks as follows.

10.96.0.0/12 via 192.168.99.100 dev vboxnet0 

Let us compare this with the CIDR range that minikube uses for services.

$ kubectl cluster-info dump | grep -m 1 range
                            "--service-cluster-ip-range=10.96.0.0/12",

So minikube has added a route that forwards all traffic directed to the IP range used for Kubernetes services to the IP address of the VM in which minikube is running, using the virtual ethernet device created for this VM. Effectively, this sets up the VM as a gateway through which this CIDR range can be reached (see also the minikube documentation for details). In addition, minikube sets the external IP of the service to its cluster IP address, so that the service can now be reached from the host (you can also verify the setup using ip route get 10.106.216.127 to display the result of the route resolution process for this destination).

Note that if you stop the separate tunnel process, the additional route disappears and the external IP address of the service switches back to “pending”.

Persistent storage in Minikube

We have seen in my previous posts on persistent storage that cloud platforms typically define a default storage class and offer a way to automatically create persistent volumes for a PVC. The same is true for minikube – there is a default storage class.

$ kubectl get storageclass
NAME                 PROVISIONER                AGE
standard (default)   k8s.io/minikube-hostpath   5d1h

In fact, minikube is by default starting a custom storage controller (as you can check by running kubectl get pods -n kube-system). To understand how this storage controller is operating, let us construct a PVC and analyse the resulting volume.

$ kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 512Mi
EOF

If you use kubectl get pv, you will see that the storage controller has created a new persistent volume. Let us attach this volume to a container to play with it.

$ kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: pv-test
  namespace: default
spec:
  containers:
  - name: pv-test-ctr
    image: httpd:alpine
    volumeMounts:
      - mountPath: /test
        name: test-volume
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: my-pvc
EOF

If you then SSH into the VM once more, you should see our new container running. Using docker inspect, you will find that Docker has again created a bind mount, binding the mount point /test to a directory on the host named /tmp/hostpath-provisioner/pvc-*, where * indicates some randomly generated number. When you attach to the container, create a file /test/myfile and then display the contents of this directory in the VM, you will in fact see that the file has been created.

So at the end of the day, a persistent volume in minikube is simply a host-path volume, pointing to a directory on the one and only node used by minikube. Also note that this storage is really persistent in the sense that it survives a restart of minikube.

Additional features

There are a few additional features of minikube that are worth mentioning. First, it is very easy to install an NGINX ingress controller – the command

minikube addons enable ingress

will do this for you. Second, minikube also allows you to install and enable the Kubernetes dashboard. In fact, running

minikube dashboard

will install the dashboard and open a browser pointing to it.

KubernetesDashboard

And there are many more addons – you can get a full list with minikube addons list or in the minikube documentation. I highly recommend browsing that list and playing with some of them.

Automating cluster creation on DigitalOcean

So far I have mostly used Amazon's EKS platform for my posts on Kubernetes. However, this is of course not the only choice – there are many other providers that offer Kubernetes in a cloud environment. One of them, which explicitly targets developers, is DigitalOcean. In this post, I will show you how easy it is to automate the creation of a Kubernetes cluster on the DigitalOcean platform.

Creating an SSH key

Similar to most other platforms, DigitalOcean offers SSH access to their virtual machines. You can either ask DigitalOcean to create a root password for you and send it to you via mail, or – preferred – you can use SSH keys.

Unlike AWS, DigitalOcean requires you to generate key pairs manually outside of the platform and import them. So let us generate a key pair called do_k8s and import it into the platform. To create the key locally, run

$ ssh-keygen -f ~/.ssh/do_k8s -N ""

This will create a new key (not protected by a passphrase, so be careful) and store the private key file and the public key file in separate files in the SSH standard directory. You can print out the contents of the public key file as follows.

$ cat ~/.ssh/do_k8s.pub

The resulting output is the public part of your SSH key, including the string “ssh-rsa” at the beginning. To make this key known to DigitalOcean, log into the console, navigate to the security tab, click “Add SSH key”, enter the name “do_k8s” and copy the public key into the corresponding field.

Next, let us test our setup. We will create a request using curl to list all our droplets. In the DigitalOcean terminology, a droplet is a virtual machine instance. Of course, we have not yet created one, so we expect to get an empty list, but we can use this to test that our token works. For that purpose, we simply use curl to direct a GET request to the API endpoint and pass the bearer token in an additional header.

$ curl -s -X\
     GET "https://api.digitalocean.com/v2/droplets/"\
     -H "Authorization: Bearer $bearerToken"\
     -H "Content-Type: application/json"
{"droplets":[],"links":{},"meta":{"total":0}}

So no droplets, as expected, but our token seems to work.

Droplets

Let us now see how we can create a droplet. We could of course also use the cloud console to do this, but as our aim is automation, we will leverage the API.

When you have worked with a REST API before, you will not be surprised to learn that this is done by submitting a POST request. This request will contain a JSON body that describes the resource to be created – a droplet in our case – and a header that, among other things, is used to submit the bearer token that we have just created.

To be able to log into our droplet later on, we will have to pass the SSH key that we have just created to the API. Unfortunately, for that, we cannot use the name of the key (do_k8s), but we will have to use the internal ID. So the first thing we need to do is to place a GET request to extract this ID. As so often, we can do this with a combination of curl to retrieve the key and jq to parse the JSON output.

$ sshKeyName="do_k8s"
$ sshKeyId=$(curl -s -X \
      GET "https://api.digitalocean.com/v2/account/keys/" \
      -H "Authorization: Bearer $bearerToken" \
      -H "Content-Type: application/json" \
       | jq -r "select(.ssh_keys[].name=\"$sshKeyName\") .ssh_keys[0].id")

Here we first use curl to get a list of all keys in JSON format. We then pipe the output into jq and use the select statement to get only those items for which the attribute name matches our key name. Finally, we extract the ID field from this item and store it in a shell variable.

We can now assemble the data part of our request. The code is a bit difficult to read, as we need to escape quotes.

$ data="{\"name\":\"myDroplet\",\
       \"region\":\"fra1\",\
       \"size\":\"s-1vcpu-1gb\",\
       \"image\":\"ubuntu-18-04-x64\",\
       \"ssh_keys\":[ $sshKeyId ]}"

To get a nice, readable representation of this, we can use jq’s pretty printing capabilities.

$ echo $data | jq
{
  "name": "myDroplet",
  "region": "fra1",
  "size": "s-1vcpu-1gb",
  "image": "ubuntu-18-04-x64",
  "ssh_keys": [
    24322857
  ]
}

We see that this is a simple JSON structure. There is a name, which will later be used in the DigitalOcean console to display our droplet, a region (I use fra1 in central Europe, a full list of all available regions is here), a size specifying the type of the droplet (in this case one vCPU and 1 GB RAM), the OS image to use and finally the ID of the SSH key that we have extracted before. Let us now submit our creation request.

$ curl -s  -X \
      POST "https://api.digitalocean.com/v2/droplets"\
      -d "$data" \
      -H "Authorization: Bearer $bearerToken"\
      -H "Content-Type: application/json"

When everything works, you should see your droplet on the DigitalOcean web console. If you repeat the GET request above to obtain all droplets, your droplet should also show up in the list. To format the output, you can again pipe it through jq. After some time, the status field (located at the top of the output) should be “active”, and you should be able to retrieve an IP address from the section “networks”. In my case, this is 46.101.128.54. We can now SSH into the machine as follows.

$ ssh -i ~/.ssh/do_k8s root@46.101.128.54

Needless to say that it is also easy to delete a droplet again using the API. A full reference can be found here. I have also created a few scripts that can automatically create a droplet, list all running droplets and delete a droplet.
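
If you prefer to stay in Go – in the spirit of this series – the same droplet creation request can be submitted with a small program. The following sketch assumes that the bearer token is available in the environment variable DO_TOKEN (a choice made only for this example) and, for brevity, omits the ssh_keys field that we added above.

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
)

func main() {
	token := os.Getenv("DO_TOKEN")
	body := []byte(`{"name":"myDroplet","region":"fra1","size":"s-1vcpu-1gb","image":"ubuntu-18-04-x64"}`)
	// Build the POST request with the JSON body and the bearer token header,
	// mirroring the curl command above
	req, err := http.NewRequest("POST", "https://api.digitalocean.com/v2/droplets", bytes.NewBuffer(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(data))
}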

Creating a Kubernetes cluster

Let us now turn to the creation of a Kubernetes cluster. The good news is that this is even easier than the creation of a droplet – a single POST request will do!

But before we can assemble our request, we need to understand how the cluster definition is structured. Of course, a Kubernetes cluster consists of a couple of management nodes (which DigitalOcean manages for you in the background) and worker nodes. On DigitalOcean, worker nodes are organized in node pools. Each node pool contains a set of identical worker nodes. We could, for instance, create one pool with memory-heavy machines for database workloads that require caching, and a second pool with general purpose machines for microservices. The smallest machines that DigitalOcean will allow you to bring up as worker nodes are of type s-1vcpu-2gb. To fully specify a node pool with two machines of this type, the following JSON fragment is used.

$ nodePool="{\"size\":\"s-1vcpu-2gb\",\
      \"count\": 2,\
      \"name\": \"my-pool\"}"
$ echo $nodePool | jq
{
  "size": "s-1vcpu-2gb",
  "count": 2,
  "name": "my-pool"
}

Next, we assemble the data part of the POST request. We will need to specify an array of node pools (here we will use only one node pool), the region, a name for the cluster, and a Kubernetes version (you can of course ask the API for a list of all existing versions by running a GET request against the URL path /v2/kubernetes/options). Using the node pool snippet from above, we can assemble and display our request data as follows.

$ data="{\"name\": \"my-cluster\",\
        \"region\": \"fra1\",\
        \"version\": \"1.13.5-do.1\",\
        \"node_pools\": [ $nodePool ]}"
$ echo $data | jq
{
  "name": "my-cluster",
  "region": "fra1",
  "version": "1.13.5-do.1",
  "node_pools": [
    {
      "size": "s-1vcpu-2gb",
      "count": 2,
      "name": "my-pool"
    }
  ]
}

Finally, we submit this data using a POST request as we have done it for our droplet above.

$ curl -s -X\
    POST "https://api.digitalocean.com/v2/kubernetes/clusters"\
    -d "$data" \
    -H "Authorization: Bearer $bearerToken"\
    -H "Content-Type: application/json"

Now cluster creation should start, and if you navigate to the Kubernetes tab of the DigitalOcean console, you should see your cluster being created.

Cluster creation is rather fast on DigitalOcean, and typically takes less than five minutes. To complete the setup, you will have to download the kubectl config file for the newly generated cluster. Of course, there are again two ways to do this – you can use the web console or the API. I have created a script that fully automates cluster creation – it detects the latest Kubernetes version, creates the cluster, waits until it is active and downloads the kubectl config file for you. If you run this, make sure to populate the shell variable bearerToken with your token or use the -t switch to pass the token to the script. The same directory also contains a few more scripts to list all existing clusters and to delete them again.