Kubernetes storage under the hood part II – persistent storage

The storage types that we have discussed so far realize ephemeral storage, i.e. storage tied to the lifecycle of the Pod on a specific node. Of course, there are many use cases like databases or other stateful applications that require storage that is persistent and has a lifecycle independent of the Pod. In this post, we look at different ways to realize this in Kubernetes.

Using cloud provider specific storage directly

The easiest – and historically first – way to do this is to directly refer to an existing piece of persistent storage provided by the underlying cloud platform. Recall that when you define and use volumes, you define the volume as part of Pod specification, assign a name and tell Kubernetes which type of volume it should allocate – like emptyDir or hostPath. If you check the list of supported volumes, you will find some volumes that are specific to a cloud provider, for instance awsElasticBlockStore. This allows you to mount a pre-existing AWS Elastic Block Store volume into your Pod. As EBS volumes can only be attached to one instance at a time, you can only connect to this volume from one Pod.

To try this out, we of course have to generate an EBS volume first (as always, be aware of the charges and make sure to delete the volume if it is no longer needed). EBS volumes need to be created within an availability zone (which you can figure out using aws ec2 describe-availability-zones). In my example, I will use the availability zone eu-central-1a. The following command creates a 16 GiB GP2 (SSD) drive in this availability zone.

$ aws ec2 create-volume --availability-zone=eu-central-1a\
           --size=16 --volume-type=gp2

If you now run aws ec2 describe-volumes --output json to list all volumes, you will see several volumes that have the status “in-use” (these are the root volumes of your nodes) and one new volume that has the status “available”. We will need the volume ID of this volume, in my case this is vol-0eb2505d4b7d035cb. Let us try to attach this to a Pod using the following manifest file.

apiVersion: v1
kind: Pod
  name: ebs-demo
  namespace: default
  - name: ebs-demo-ctr
    image: httpd:alpine
      - mountPath: /test
        name: test-volume
  - name: test-volume
      volumeID: vol-0eb2505d4b7d035cb

Sometimes you learn most from your mistakes. If you apply this manifest file chances are that your Pod will never be fully established, but will remain in the state “ContainerCreating” forever. What is going wrong?

To find the answer, use the AWS CLI or the AWS Web console to look at the availability zones of the instance on which the Pod is scheduled and the volume. In my example, Kubernetes did schedule the Pod on a node running in eu-central-1c, whereas the volume was created in eu-central-1a. Unfortunately, EBS volumes cannot be attached across availability zones, and the creation of the Pod fails.

Fortunately, there is a way out. Note that EKS will attach a label to the nodes which capture there availability zone. This label is called failure-domain.beta.kubernetes.io/zone. Now, Kubernetes has a general mechanism called node selector. This allows you to instruct the Kubernetes scheduler to move Pods only on specific nodes, matching certain selection criteria. These criteria are provided in the section nodeSelector of the Pod specification. So the following updated manifest file makes sure that the node will be scheduled in the availability zone in which we did create the volume.

apiVersion: v1
kind: Pod
  name: ebs-demo
  namespace: default
  - name: ebs-demo-ctr
    image: httpd:alpine
      - mountPath: /test
        name: test-volume
  - name: test-volume
      volumeID: vol-0eb2505d4b7d035cb
    failure-domain.beta.kubernetes.io/zone: eu-central-1a

If you apply this manifest file and wait for some time, you will see that the Pod comes up. If you again run aws ec2 describe-volumes, you will find that the volumes status has changed to “in-use” and that EKS has automatically attached it to the node on which the Pod is scheduled (provided, of course, that you have a node running in eu-central-1a, which is not a given if you only use two nodes as we do it – in this case you will have to create the volume in one of the availability zones in which you have a node). You can also attach to the Pod and run mount to verify that a device has been mounted on /test. Combining the output of docker inspect and mount on the node will tell you that the following has happened.

  • Kubernetes has asked AWS to attach our volume as /dev/xvdba to the node
  • Then Kubernetes did create a directory specific to the Pod
  • The volume was mounted into this directory
  • Finally, Kubernetes did create a Docker bind mount to hook up this directory on the node with the directory /test in the container file system

This example nicely demonstrates that using existing persistent storage in the underlying cloud platform is possible, but comes with some drawbacks. An administrator will have to manage these volumes manually, using whatever tools your cloud provider makes available. There are limitations if your cluster is spanning multiple availability zones, and of course you tie all your manifest files directly to a specific cloud provider. In addition, Kubernetes tries to follow the idea of “run everywhere”, which does not combine well with an approach where each individual cloud provider needs to be hardcoded into the core Kubernetes code base. As so often in computer science, this situation almost cries for an additional abstraction layer between Pod volumes and the underlying storage. This abstraction layer exists and is the topic of the next section.

Persistent volume claims

In the previous section, we have defined volumes as part of a Pod specification. These volumes – which we should actually call Pod volumes – are linked to the lifecycle of an individual Pod. We have seen that these volumes can refer to existing storage within your cloud layer, but that this comes with some drawbacks and requires manual provisioning outside of the Kubernetes world.

To change this, Kubernetes defines an additional abstraction layer between the storage as provided by the cloud platform and the Pod volumes. The most important objects we need to discuss in order to understand this are persistent volumes (PV) and persistent volume claims (PVC).

Let us start with persistent volumes. Essentially, a persistent volume is a Kubernetes object that represents a piece of storage in the underlying cloud platform. Typically, volumes are created dynamically, but we will see in the next post in this series that they can also be managed manually. The important thing is that volumes are first-class Kubernetes citizens and have a lifecycle independent of that of Pods.

Volumes represent the supply side of storage on your cluster. The demand side is represented by volume claims. A persistent volume claim is an object that a user creates to let Kubernetes know that a certain amount of storage is required. If the cluster is set up for it, Kubernetes will then automatically try to fulfill the claim, i.e. to either find an existing, unused volume that matches the claim or to provision a volume dynamically, using a so-called provisioner. A persistent volume claim can then be referenced in a Pod volume which in turn can be mounted into a container.


To get an idea what this means, let us again consider an example. The following manifest file describes a persistent volume claim.

apiVersion: v1
kind: PersistentVolumeClaim
  name: ebs-pvc
  namespace: default
    - ReadWriteOnce
      storage: 4Gi

In addition to the standard fields, we have a specification that consists of two sections. The first section specifies the access mode. Here we use ReadWriteOnce, which states that we want a volume which can be accessed by one Pod at a time, reading and writing. In the second section, we specify the resources, i.e. the amount of storage that we need, in our case 4 Gigabytes of storage. This example already demonstrates one nice feature of a PVC – it does not refer at all to a specific type of volume or a specific cloud platform (it does so indirectly, via the mechanism of storage classes, which we will investigate in the next post).

When we apply this manifest file and wait for a few seconds, we can look at the objects that have been created. First, run kubectl get pvc to get a list of all persistent volume claims.

$ kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ebs-pvc   Bound    pvc-63521bd7-4e51-11e9-8e51-0af6f9d0ca50   4Gi        RWO            gp2            7m

So as expected, we have a new persistent volume claim that has been generated. The status of this PVC is “bound”, telling us that the provisioner working behind the scenes was able to find a matching volume. We also find a reference to the volume in the output. So let us list this volume.

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-63521bd7-4e51-11e9-8e51-0af6f9d0ca50   4Gi        RWO            Delete           Bound    default/ebs-pvc   gp2   

As stated above, this volume exists as an independent entity that we can refer to by its name and that has some properties. When we append --output json to get all the details, we see a few interesting things.

First, the volume has a field claimRef which tells us to which claim this volume is bound. This is only one field, not an array. Similarly, the PVC has a field volumeName referring to the underlying volume, which is again not an array. So the relation between a PV and a PVC is one-to-one. As mentioned in the documentation, this can lead to overprovisioning. If, for instance, you request 4 Gi, and there is a free volume with 8 Gi, the provisioner might decide to bind the PVC to this volume, even though only 4 Gi were requested.

Another interesting information that we get from the JSON output is that a volume has a label that tells us in which availability zone the volume (or, more precisely, the underlying EBS volume) is located. And, finally, the spec section contains a field that lets us locate the underlying EBS storage. You can use the following statements to extract this information and use the AWS CLI to print out some details of this volume.

$ fullVolumeID=$(kubectl get pv\
$ volumeID=$(echo $fullVolumeID | sed 's/aws:\/\/.*\///')
$ aws ec2 describe-volumes --volume-id=$volumeID --output json

Let us now actually use this volume, i.e. attach it to a Pod. Here is a manifest file which will bring up a Pod mounting this volume.

apiVersion: v1
kind: Pod
  name: pv-demo
  namespace: default
  - name: pv-demo-ctr
    image: httpd:alpine
      - mountPath: /test
        name: test-volume
  - name: test-volume
      claimName: ebs-pvc

We see that at the point in the file were we did previously refer directly to an EBS volume, we now refer to the PVC. So again, there is no EBS or EKS specific data in this manifest file, and we can theoretically use the same manifest file on any cloud platform.

When you apply this manifest file, you will notice that it takes significantly longer for the containers to come up, this is because behind the scenes, Kubernetes needs to attach the EBS volume to the node on which the Pod is scheduled, which takes some time.

We can now again analyze the structure of the file systems in the container and on the node. If we do this, we will find a very similar picture as above. The container has a bind mount into a directory managed by Kubernetes. This directory in turn is a mount point for the device /dev/xvdbu. If we look at the output of aws ec2 describe-volumes, we find that this is the device to which AWS has attached the EBS volume behind the PVC. So the outcome of the entire exercise is the same as before, with the difference that no manual provisioning of the volume as necessary.

In addition, Kubernetes is smart enough to place our Pod in a the same availability zone in which the volume is located. In fact, as explained in the documentation on multi-zone capabilities, the Kubernetes scheduler will make sure that a Pod that requires a PV located in a given availability zone will only be placed on a node running in that zone. It can, however, happen that a volume is created in an availability zone where there is no node at all, rendering it unusable (see this GitHub issue for a discussion).

Let us now investigat the lifecycle of this storage. To do this, let us create a file in our newly mounted directory, kill the pod, bring it up again and list the contents of the volume (this assumes that you have saved the manifest file for the Pod in a file called pvUsage.yaml).

$ kubectl exec -it pv-demo touch /test/hello
$ kubectl delete pod pv-demo
pod "pv-demo" deleted
$ kubectl apply -f pvUsage.yaml
pod/pv-demo created
$ # Wait for container to be ready
$ kubectl exec -it pv-demo ls /test
hello       lost+found

So, as expected, the volume and its content have survived the restart of the Pod. If a persistent volume claim is deleted, however, Kubernetes will automatically delete the underlying volume and delete the corresponding storage in the cloud platform layer. When you shutdown a cluster, make sure to delete all PVCs first, otherwise orphaned block storage volumes not controlled by Kubernetes any more could result.

We have now covered the basics of persistent storage in Kubernetes – but of course there is much more that we have still left open. How is storage actually provisioned? How does the platform know which type of storage we need when we issue a PVC? And how can we manually create storage? These are the topics that we will discuss in the next post in this series.

Superconducting qubits – the flux qubit

In the last post, we have discussed the basic idea of superconducting qubits – implement circuits in which a supercurrent flows that can be described by a quantum mechanical wave function, and use two energy levels of the resulting quantum system as a qubit. Today, we will look in some more detail into one possible realization of this idea – flux qubits.

In its simplest form, a flux qubit is a superconducting loop threaded by an external magnetic field and interrupted by a Josephson junction. This is visualized on the left hand side of the diagram below, while the right hand side shows an equivalent circuit, formed by an inductance L, the capacity CJ of the junction and the pure junction.


It is not too difficult to describe this system in the language of quantum mechanics. Essentially, its state at a given point in time can be described by two quantities – the charge stored in the capacitor formed by the two leads of the Josephson junction, and the magnetic flow (or flux) through the loop. If you carefully go through all the details and write down the resulting equation for the Hamiltonian, you will find that the classical equivalent of this system is a particle moving in a potential which, for an appropriate choice of the parameters of the circuit, looks like the one displayed below.


This potential has a form which physicists call a double-well potential. Let us discuss the behavior of a particle moving in such a potential qualitatively. Classically, the particle would eventually settle down at one the minima. As long as its energy is below the height of the potential separating these two minima, it would remain in that state. In the quantum world, however, we would expect tunneling to occur, i.e. we would expect that our system has two basic states, one corresponding to the particle being close the left minimum and one corresponding to the particle being right to the minimum, and that we see a certain non-zero probability for the particle to cross the potential wall and to flip from one state into the other state. This two-state system already looks like a good candidate for a qubit.

Being a one-dimensional system, the eigenstate wave functions can be approximated numerically using standard methods, for example the “particle-in-a-box” approach. Again, I refer to my detailed notes for the actual calculation. The result is displayed below.


The diagram shows the ground state (blue curve on the left hand side) and the first excited state (blue curve on the right hand side). In both diagrams, I have added the classical potential (orange line) for the purpose of comparison. So we actually obtain the picture that we expect. Up to normalization, the ground state – the eigenstate on the left – is a superposition of two states

|l \rangle - | r \rangle

where |l \rangle is a state localized around the left minimum of the potential and |r \rangle is a state centered around the right minimum, whereas the first excited state is a linear combination proportional to

|l \rangle + | r \rangle

In general, there will be a small energy difference between these two states, which leads to a non-zero probability for tunneling between them. Intuitively, the state |l \rangle corresponds to a supercurrent that flows through the loop in one direction and the state |r \rangle is a state in which the supercurrent flows in the opposite direction. Our ground state – which would be the state |0 \rangle in an interpretation as a qubit – and our first excited state |1 \rangle are superpositions of these two states.

A nice property of the flux qubit is that the energy gap between the first excited state and the second excited state is much higher than the energy gap between the ground state and the first excited state. In the example used as basis for the numerical simulations described here, the second gap is more than one order of magnitude higher than the first gap. This implies that the two level system spanned by |0 \rangle and |1 \rangle is a fairly well isolated system and thus serves as a very good approximation to a qubit. The Hamiltonian can be manipulated by changing the flux through the loop by applying an external magnetic field or an external microwave pulse can be used to stimulate a transition from the ground state to the first excited state. In this way, the qubit can be read-out and controlled.

So theoretically, this system is a good candidate for the implementation of a qubit. In fact, this has been used in practice – the D-Wave quantum annealer is based on interconnected flux qubits. However, it seems that the flux qubit has come a bit out of fashion, and research has focussed on a new generation of superconducting qubits like the transmon qubit that work slightly differently. We will study this type of qubit in the next post in this series.

Kubernetes storage under the hood part I – ephemeral storage

So far, we have mainly discussed how compute and network resources are used and managed with Kubernetes. We will now turn to the third fundamental element of a container platform – storage.

Docker storage concepts

Before we talk about Kubernetes storage concepts, let us first recall how storage is managed in Docker. The following tests assume that you have a local installation of Docker on a Linux workstation (or virtual machine, of course). As a refresher, you might want to take a look at my introduction into Docker internals before reading on.

First, let us start a Docker container and spawn a shell. The easiest way to do this is the busybox image. So let us spin up a busybox container, attached to the terminal, and run mount to inspect its file system.

$ docker run -it busybox
/ # mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/BNUAN2FS4K75ZLQFSZVQWXRLUU:/var/lib/docker/overlay2/l/UKEGOK4TLTD2T4XN5KIEPN7JXF,upperdir=/var/lib/docker/overlay2/e8f16f0d705fb8e4677b605796b7319ef3f0226e2ad173b506e13b479afa515f/diff,workdir=/var/lib/docker/overlay2/e8f16f0d705fb8e4677b605796b7319ef3f0226e2ad173b506e13b479afa515f/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)

Your output will most likely look differently, but the general pattern will be the same. The first mount point that we see is the mount for the root directory. On newer Docker versions, this will be an overlay2 file system, on older Docker versions, it will be of type aufs. This entry points to one or more actual files on the host system that are located in the directory /var/lib/docker/ which is managed by Docker. These are the files in which the actual content of the root filesystem is stored.

The data on that file system is volatile and linked to the lifecycle of the container. To see this, let us create a file in the containers root file system and exit the container.

/ # echo "dancing in the rain" > /franky
/ # exit

This will stop the container, but not remove it – it will still exist and be visible in the output of docker ps -a. If you restart the container and attach to the running container, you will find that the file is still there and its content has been preserved. If, however, you remove the container using docker rm, the corresponding files on the host file system will be removed and the content of the file system of our container is lost. In that sense, these volumes are ephemeral – they survive across restarts, but die if the container is removed.

But Docker can do more – we can also use persistent storage. One option to do this are bind mounts. A bind mount maps a directory or a file from the host file system into the namespace of the container and attaches it to a mount point. To see an example, create a temporary directory on your host system and put some data into it. We can then mount this directory into a new Docker container using the -v option.

$ mkdir /tmp/ctr-test/
$ echo "Hello World" > /tmp/ctr-test/hello
$ docker run -v /tmp/ctr-test:/ctr-test/ -it busybox 
/ # cat /ctr-test/hello
Hello World
/# exit

So the content of the directory /tmp/ctr-test on the host becomes accessible within the container as /ctr-test (of course I could have chosen any other name as well). We can also see this mount point in the output of docker inspect. Use docker ps -a to find out the ID of the busybox container, in my case fd8ef21ba685, and then run

$ docker inspect fd8ef21ba685 --format="{{json .Mounts}}"

So the mount point shows up as a mount point of type bind in the list of mounts of our container.

We remark that Docker also has a more advanced way to mount storage referred to as volumes. In contrast to a bind mount, a volume is an object managed by Docker, backed by files in the Docker controlled directories. Volumes can be created manually or dynamically, can be given a name and can be mounted into a container. As they are objects with an independent lifecycle, they survive container eviction and can be mounted to more than one container. However, we will not look deeper into this as (at least to my knowledge) this feature is not used by Kubernetes.

Ephemeral storage in Kubernetes

Now let us try out how things change if we use Kubernetes to spin up our containers.

$ kubectl run -i --tty busybox --image=busybox --restart=Never
/ # / # mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/57X3FQLOLATAMPLAASUMBKHD5W:/var/lib/docker/overlay2/l/Q6HSQOG4NX7UHGVZEPQIZI43KQ,upperdir=/var/lib/docker/overlay2/f183dfd193c86bf43ebca4ae7d08cbfd6e268229f3bd59c70be2f48d9dc0937f/diff,workdir=/var/lib/docker/overlay2/f183dfd193c86bf43ebca4ae7d08cbfd6e268229f3bd59c70be2f48d9dc0937f/work)

So we see pretty much the same picture. The root volume of the container is again an overlay file system that is backed by directories managed by Docker on the host on which the pod is running.

If, however, you ssh into the node on which the container is running and do a docker inspect as before, you will find that there are actually a couple of bind mounts that we have not explicitly specified. These bind mounts are added by Kubernetes to give the container access to some configuration data like the token that can be used to connect to a Kubernetes service account, or host name resolutions (the own IP address of the container, for instance). Up to this point, our picture is actually quite simple.


Now things get a bit more complicated if you want to mount additional volumes. Kubernetes does in fact offer a large number of different volume types. Some of these volume types are ephemeral, i.e. the data is potentially lost if the pod dies or is rescheduled to a different node, other types of volumes are persistent. In this post, we focus on ephemeral storage and discuss different strategies to attach persistent storage in a later post.

Mount ephemeral storage of type emptyDir

In Kubernetes, we can define volumes on the level of an individual pod and attach these volumes to one or more containers that are running in this pod. Kubernetes offers several types of volumes. The type we are going to look at first is called an emptyDir because, from the point of view of a container, it is exactly that – a directory which is initially empty.

To see this in action, let us look at the following manifest file.

apiVersion: v1
kind: Pod
  name: empty-dir-demo
  namespace: default
  - name: empty-dir-demo-ctr
    image: httpd:alpine
      - mountPath: /test
        name: test-volume
  - name: test-volume
    emptyDir: {}

This manifest file defines an individual Pod, as we have seen it before. However, there are a few new elements which are populated in this manifest. The Pod specification contains a new field volumes, which is an array of volume objects. This volume has a name and an additional field which indicates the type of the volume. The documentation lists many of them, here we are working with a volume of type emptyDir.

In the container specification, we now refer to this volume. This instructs Kubernetes to create the volume and to mount it into this container at the defined mount point. To see this in action, let us apply this manifest file, spawn a shell in the pod that is created and inspect its file system.

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/emptyDir.yaml 
pod/empty-dir-demo created
$ kubectl exec -it empty-dir-demo "/bin/bash"
bash-4.4# mount | grep "test"
/dev/xvda1 on /test type xfs (rw,noatime,attr2,inode64,noquota)

So we see that Kubernetes has actually mounted a new file system onto the mount point /test. To figure out how this is realized, let us take a closer look at the Docker container that has been created. So ssh into the node on which the Pod is running and run the following commands (this assumes that jq is installed on the node, which is the default when using the standard AWS AMI).

$ containerId=$(docker ps | grep "httpd" | awk '{print $1}')
$ docker inspect $containerId | jq -r '.[0].Mounts[]'
  "Type": "bind",
  "Source": "/var/lib/kubelet/pods/0914a859-4da2-11e9-931c-06a2d10ef1fe/volumes/kubernetes.io~empty-dir/test-volume",
  "Destination": "/test",
  "Mode": "",
  "RW": true,
  "Propagation": "rprivate"
[ ... more output ... ]

So we find that Kubernetes realizes an emptyDir volume as a bind mount, i.e. Kubernetes will create a directory on the nodes local file system and use a Docker bind mount to mount this into the container. Of course, this directory will initially be empty (as the name strongly suggests). Let us see what happens if we actually write something onto this file system. The following commands (to be run again on the node on which the Pod is running) extract the directory which is used for the bind mount from the output of docker inspect and list the contents of this directory.

$ dir=$(docker inspect $containerId | jq -r '.[0].Mounts[] | select(.Destination=="/test") | .Source')
$ sudo ls $dir

If you run this now, you will find that the directory is empty. Now switch back to the terminal attached to the Pod and create a file in the /test directory.

bash-4.4# echo "hello" > /test/hello

If you now list the directories content again, you will find that a file hello has been created.

Knowing how an emptyDir is implemented now makes it easy to understand the statements on the lifecycle in the Kubernetes documentation. It is stored in a directory specific for the Pod, i.e. it is initially created when the Pod is created and removed when the Pod is removed. It survices container restarts, but when the Pod is migrated to a different node, the content will be lost. In that sense, it is ephemeral storage.


Accessing host-local file systems

We have found that a volume of type emptyDir is nothing but a Docker bind mount to a Pod specific directory managed by Kubernetes. Of course, Kubernetes also offers a way to set up bind mounts to existing directories in the host file system (needless to say that this might be a security risk). This done using a volume of type hostPath as in the example below.

apiVersion: v1
kind: Pod
  name: host-path-demo
  namespace: default
  - name: host-path-demo-ctr
    image: httpd:alpine
      - mountPath: /test
        name: test-volume
  - name: test-volume
      path: /etc

When you run this and attach to the resulting Pod, you will find that the content of the directory /test now match the content of the directory /etc on the host. Using again docker inspect on the node on which the Pod is running, you will find that Kubernetes has created an additional bind mount for the container which links the containers /test directory to the directory /etc on the host. Consequently, the contents of a hostPath volume will survive container restarts but will not be accessible anymore once the Pod is migrated to a different host.

Superconducting qubits – an introduction

In some of the last posts in my series on quantum computing, we have discussed how NMR (nuclear magnetic resonance) technology can be used to implement quantum computers. Over the last couple of years, however, a different technology has attracted significantly more interest and invest – superconducting qubits.

What are superconducting qubits? To start with something a bit more familiar, let us look at an analogue in classical electronics – an LC circuit. If you have ever built a transistor-based radio, you will have seen this. An LC circuit consists of an inductor and a capacitor, connected together as follows.


To understand what this circuit is doing, let us assume that at some point in time, the capacitor is fully charged. Then current will start to flow through the inductor. This will create a magnetic field in the inductor, so the electric energy stored in the capacitor is transformed into magnetic energy stored in the magnetic field of the inductor. Once the capacitor is fully decharged, the magnetic field will start to break down, again inducing a current which will then recharge the capacitor and so forth. This follows a pattern that every physicist knows by the name harmonic oscillator.

Now, in quantum mechanics, a harmonic oscillator has an infinite number of energy levels at equally spaced values. Can we use two of them, say the ground state and the first excited state, to build a qubit? Unfortunately there are a few obstacles.

The first problems is that an ordinary current is the coordinated movement of many electrons that constantly interact with the environment (dissipating heat due to the non-zero resistance) and therefore do not make a good closed quantum system. Even though each individual electron is of course a quantum mechanical system, the overall system exhibits classical behavior.

This changes if we use a superconducting LC circuit. As soon as we enter the superconducting regime, the flow of charge is no longer given by individual moving electrons, but by Cooper pairs that flow through the conductor. It is well known that in this state, the system can be described by a macroscopic wave function and is therefore behaving according to the laws of quantum mechanics, leading to phenomena like the quantization of the magnetic flux in a superconducting loop.

Does this mean that we can use a superconducting LC circuit, which then should behave as a quantum mechanical harmonic oscillator, as a qubit? Well, we are not quite there yet – as the energy levels of a harmonic oscillator are equally spaced, the first two levels are not well isolated against the remaining levels which makes it very hard to confine the system to these two levels.

Enter Josephson junctions. A Josephson junction is a superconducting element which consists of two superconducting solids that are separated by a thin insulator, like displayed schematically below.


Due to tunneling, Cooper pairs can cross the junction and thus a superconducting current can flow through the element. It turns out that a Josephson junction behaves like an inductor whose inductance is not constant (as for an ordinary inductor), but depends on the current. This will change the Hamiltonian of the system to the effect that the energy levels become distorted. If we get the parameters right, we are able to obtain a situation where the two lowest energy levels are separated fairly well from the rest of the energy spectrum and therefore the corresponding subsystem can approximately be treated as a two-level system, i.e. a qubit.

Superconducting qubits have gained a lot of attention over the last two years. They can be produced with techniques known from the production of integrated circuit, and they can be controlled with classical electronics. Their dimensions are within the realm of traditional IC technology, being in the micrometer range. The picture below (credits A. ter Haar, PhD thesis) shows a superconducting qubit (a so-called flux qubit) made of an inner loop with three Josephson junctions and an outer loop for readout and control.

picture taken from the PhD thesis of A. ter Haar

In addition, superconducting qubits can easily be layed out on a chip, which makes them good candidates for small, highly integrated quantum computing devices. Of course they need cooling as the need to operate below the critical temperature of the employed superconducting material, but this is standard technology by now. Today, most of the big players in the quantum world like Google, IBM, Rigetti and D-Wave are focusing their efforts on superconducting qubits.

To interact with the qubit (and to make two qubits interact), there are several options. We can apply an external voltage, an external current or couple the system with a microwave cavity acting as a resonator. We also have several choices to combine a Josephson junction with classical circuits – we can add an additional capacitor in series with the Josephson junction (which will give us a qubit called the charge qubit), combine this setup with an additional capacitor parallel to the Junction (transmon qubit) or use a Josephson junction in combination with a classical inductance formed by a superconducting loop (flux qubit). All these circuits have different characteristics and have been used to realize qubits. The flux qubit, for instance, is being used in the D-Wave quantum annealer, whereas Google uses transmon qubits for their devices. We will dive a little bit deeper into each of them in the next posts in this series.

Managing traffic with Kubernetes ingress controllers

In one of the previous posts, we have learned how to expose arbitrary ports to the outside world using services and load balancers. However, we also found that this is not very efficient – in the worst case, the number of load balancers we need equals the number of services.

Specifically for HTTP/HTTPS traffic, there is a different option available – an ingress rule.

Ingress rules and ingress controllers

An ingress rule is a Kubernetes resource that defines a kind of routing on the HTTP(S) path level. To illustrate this, assume that you have two services running, call them svcA and svcB. Assume further that both services work with HTTP as the underlying protocol and are listening on port 80. If you expose these services naively as in the previous post, you will need two load balancers, call them LB1 and LB2. Then, to reach svcA, you would use the URL


and to access svcB, you would use


The idea of an ingress is to have only one load balancer, with one DNS entry, and to use the path name to route traffic to our services. So with an ingress, we would be able to reach svcA under the URL


and the second service under the URL


With this approach, you have several advantages. First, you only need one load balancer that clients can use for all services. Second, the path name is related to the service that you invoke, which follows best practises and makes coding against your services much easier. Finally, you can easily add new services and remove old services without a need to change DNS names.

In Kubernetes, two components are required to make this work. First, there are ingress rules. These are Kubernetes resources that contain rules which specify how to route incoming requests based on their path (or even hostname). Second, there are ingress controllers. These are controllers which are not part of the Kubernetes core software, but need to be installed on top. These controllers will work with the ingress rules to manage the actual routing. So before trying this out, we need to install an ingress controller in our cluster.

Installing an ingress controller

On EKS, there are several options for an ingress controller. One possible choice is the nginx ingress controller. This is the controller that we will use for our examples. In addition, AWS has created their own controller called AWS ALB controller that you could use as an alternative – I have not yet looked at this in detail, though.

So let us see how we can install the nginx ingress controller in our cluster. Fortunately, there is a very clear installation instruction which tells us that to run the install, we simply have to execute a number of manifest files, as you would expect for a Kubernetes application. If you have cloned my repository and brought up your cluster using the up.sh script, you are done – this script will set up the controller automatically. If not, here are the commands to do this manually.

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/service-l4.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/patch-configmap-l4.yaml

Let us go through this and see what is going on. The first file (mandatory.yaml) will set up several config maps, a service account and a new role and connect the service account with the newly created role. It then starts a deployment to bring up one instance of the nginx controller itself. You can see this pod running if you do a

kubectl get pods --namespace ingress-nginx

The first AWS specific manifest file service-l4.yaml will establish a Kubernetes service of type LoadBalancer. This will create an elastic load balancer in your AWS VPC. Traffic on the ports 80 and 443 is directed to the nginx controller.

kubectl get svc --namespace ingress-nginx

Finally, the second AWS specific file will update the config map that stores the nginx configuration and set the parameter use-proxy-protocol to True.

To verify that the installation has worked, you can use aws elb describe-load-balancers to verify that a new load balancer has been created and curl the DNS name provided by

kubectl get svc ingress-nginx --namespace ingress-nginx

from your local machine. This will still give you an error as we have not yet defined ingress rule, but show that the ingress controller is up and running.

Setting up and testing an ingress rule

Having our ingress controller in place, we can now set up ingress rules. To have a toy example at hand, let us first apply a manifest file that will

  • Add a deployment of two instances of the Apache httpd
  • Install a service httpd-service accepting traffic for these two pods on port 8080
  • Add a deployment of two instances of Tomcat
  • Create a service tomcat-service listening on port 8080 and directing traffic to these two pods

You can either download the file here or directly use the URL with kubectl

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/ingress-prep.yaml
deployment.apps/httpd created
deployment.apps/tomcat created
service/httpd-service created
service/tomcat-service created

When all pods are up and running, you can again spawn a shell in one of the pods and use the DNS name entries created by Kubernetes to verify that the services are available from within the pods.

$ pod=$(kubectl get pods --output \
$ kubectl exec -it $pod "/bin/bash"
bash-4.4# apk add curl
bash-4.4# curl tomcat-service:8080
bash-4.4# curl httpd-service:8080

Let us now define an ingress rule which directs requests to the path /httpd to our httpd service and correspondingly requests to /tomcat to the Tomcat service. Here is the manifest file for this rule.

apiVersion: extensions/v1beta1
kind: Ingress
  name: test-ingress
    nginx.ingress.kubernetes.io/rewrite-target : "/"
  - http:
      - path: /tomcat
          serviceName: tomcat-service
          servicePort: 8080
      - path: /alpine
          serviceName: alpine-service
          servicePort: 8080

The first few lines are familiar by now, specifying the API version, the type of resource that we want to create and a name. In the metadata section, we also add an annotation. The nginx ingress controller can be configured using this and similar annotations (see here for a list), and this annotation is required to make the example work.

In the specification section, we now define a set of rules. Our rule is a http rule, which, at the time of writing, is the only supported protocol. This is followed by a list of paths. Each path consists of a selector (“/httpd” and “/tomcat” in our case), followed by the specification of a backend, i.e. a combination of service name and service port, serving requests matching this path.

Next set up the ingress rule. Assuming that you have saved the manifest file above as ingress.yaml, simply run

$ kubectl apply -f ingress.yaml
ingress.extensions/test-ingress created

Now let us try this. We already know that the ingress controller has created a load balancer for us which serves all ingress rules. So let us get the name of this load balancer from the service specification and then use curl to try out both paths

$ host=$(kubectl get svc ingress-nginx -n ingress-nginx --output\
$ curl -k https://$host/httpd
$ curl -k https://$host/tomcat

The first curl should give you the standard output of the httpd service, the second one the standard Tomcat welcome page. So our ingress rule works.

Let us try to understand what is happening. The load balancer is listening on the HTTPS port 443 and picking up the traffic coming from there. This traffic is then redirected to a host port that you can read off from the output of aws elb describe-load-balancers, in my case this was 31135. This node port belongs to the service ingress-nginx that our ingress controller has set up. So the traffic is forwarded to the ingress controller. The ingress controller interprets the rules, determines the target service and forwards the traffic to the target service. Ignoring the fact that the traffic goes through the node port, this gives us the following simplified picture.


In fact, this diagram is a bit simplified as (see here) the controller does not actually send the traffic to the service cluster IP, but directly to the endpoints, thus bypassing the kube-proxy mechanism, so that advanced features like session affinity can be applied.

Ingress rules have many additional options that we have not yet discussed. You can define virtual hosts, i.e. routing based on host names, define a default backend for requests that do not match any of the path selectors, use regular expressions in your paths and use TLS secrets to secure your HTTPS entry points. This is described in the Kubernetes networking documentation and the API reference.

Creating ingress rules in Python

To close this post, let us again study how to implement Ingress rules in Python. Again, it is easiest to build up the objects from the bottom to the top. So we start with our backends and the corresponding path objects.


Having this, we can now define our rule and our specification section.

            paths=[tomcat_path, httpd_path]))

Finally, we assemble our ingress object. Again, this consists of the metadata (including the annotation) and the specification section.

metadata.annotations={"nginx.ingress.kubernetes.io/rewrite-target" : "/"}

We are now ready for our final steps. We again read the configuration, create an API endpoint and submit our creation request. You can find the full script including comments and all imports here


Quantum teleportation

Quantum states are in many ways different from information stored in classical systems – quantum states cannot be cloned and quantum information cannot be erased. However, it turns out that quantum information can be transmitted and replicated by combining a quantum channel and a classical channel – a process known as quantum teleportation.

Bell states

Before we explain the teleportation algorithm, let us quickly recall some definitions. Suppose that we are given a system with two qubits, say A and B. Consider the unitary operator

U = C_{NOT} (H \otimes I)

i.e. the operator that applies a Hadamard operator to qubit A and then a CNOT gate with control qubit A and target qubit B. Let us calculate the action of this operator on the basis  state |0 \rangle |0 \rangle.

U |0 \rangle |0 \rangle = \frac{1}{\sqrt{2}} C_{NOT} (|0 \rangle |0 \rangle + |1 \rangle |0 \rangle) = \frac{1}{\sqrt{2}} (|00 \rangle + |11 \rangle)

It is not difficult to show that this state cannot be written as a product – it is an entangled state. Traditionally, this state is called a Bell state.

Now the operator U is unitary, and it therefore maps a Hilbert space basis to a Hilbert space basis. We can therefore obtain a basis of our two-qubit Hilbert space that consists of entangled states by applying the transformation U to the elements of the computational basis. The resulting states are easily calculated and are as follows (we use the notation from [1]).

Computational basis vector Bell basis vector
|00 \rangle |\beta_{00} \rangle = \frac{1}{\sqrt{2}} (|00 \rangle + |11 \rangle)
|01 \rangle |\beta_{01} \rangle = \frac{1}{\sqrt{2}} (|01 \rangle + |10 \rangle)
|10 \rangle |\beta_{10} \rangle = \frac{1}{\sqrt{2}} (|00 \rangle - |11 \rangle)
|11 \rangle |\beta_{11} \rangle = \frac{1}{\sqrt{2}} (|01 \rangle - |10 \rangle)

Of course we can also reverse this process and express the elements of the computational basis in terms of the Bell basis. For later reference, we note that we can write the computational basis as

|00\rangle = \frac{1}{\sqrt{2}} (\beta_{00} + \beta_{10})
|01\rangle = \frac{1}{\sqrt{2}} (\beta_{01} + \beta_{11})
|10\rangle = \frac{1}{\sqrt{2}} (\beta_{01} - \beta_{11})
|11\rangle = \frac{1}{\sqrt{2}} (\beta_{00} - \beta_{10})

Quantum teleportation

Let us now turn to the real topic of this post – quantum teleportation. Suppose that Alice is in possession of a qubit that captures some quantum state |\psi \rangle = a |0 \rangle + b|1 \rangle, stored in some sort of quantum device, which could be a superconducting qubit, a trapped ion, a polarized photon or any other physical implementation of a qubit. Bob is operating a similar device. The task is to create a quantum state in Bob’s device which is identical to the state stored in Alice device.

To be able to do this, Alice and Bob will need some communication channel. Let us suppose that there is a classical channel that Alice can use to transmit a finite number of bits to Bob. Is that sufficient to transmit the state of the qubit?

Obviously, the answer is no. Alice will not be able to measure both coefficients a and b, and even if she would find a way to do this, she would be faced with the challenge of transmitting two complex numbers with arbitrary precision using a finite string of bits.

At the other extreme, if Alice and Bob were able to fully entangle their quantum devices, they would probably be able to transmit the state. They could for instance implement a swap gate to move the state from Alice device onto Bob’s device.

So if Alice and Bob are able to perform arbitrary entangled quantum operations on the combined system consisting of Alice qubit and Bob’s qubit, a transmission is possible. But what happens if the devices are separated? It turns out that a transmission is still possible based on the classical channel provided that Alice and Bob had a chance to prepare a common state before Alice prepares |\psi \rangle. In fact, this common state does not depend at all on |\psi \rangle, and at no point during the process, any knowledge of the state |\psi \rangle is needed.

The mechanism first described in [2] works as follows. We need three qubits that we denote by A, B and C. The qubit C contains the unknown state |\psi \rangle_C (we use the lower index to denote the qubit in which the state lives). We also assume that Alice is able to perform arbitrary measurements and operations on A and C, and that B similarly controls qubit B.

The first step in the algorithm is to prepare an entangled state

|\beta_{00}^{AB} \rangle = \frac{1}{\sqrt{2}} \big[ |0\rangle _A |0 \rangle _B+ |1\rangle_A |1 \rangle_B) \big]

Here the upper index at \beta indicates the two qubits in which the Bell state vector lives. This is the common state mentioned above, and it can obviously be prepared upfront, without having the state |\psi \rangle_C. Bob and Alice could even prepare an entire repository of such states, ready for being used when the actual transmission should take place.

Now let us look at the state of the combined system consisting of all three qubits.

|\beta_{00}^{AB} \rangle |\psi\rangle_C = \frac{1}{\sqrt{2}} \big[ a |000 \rangle + a |110 \rangle + b |001\rangle + b |111 \rangle \big]

For a reason that will become clear in a second, we will now adapt the tensor product order and choose the new order A-C-B. Then our state is (simply swap the last two qubits)

\frac{1}{\sqrt{2}} \big[ a |000 \rangle + a |101 \rangle + b |010\rangle + b |111 \rangle \big]

Now instead of using the computational basis for our three-qubit system, we could as well use the basis that is given by the Bell basis of the qubits A and C and the computational basis of B, i.e. |\beta^{AC}_{ij} \rangle |k\rangle. Using the table above, we can express our state in this basis. This will give us four terms that contain a and four terms that contain b. The terms that contain a are

\frac{a}{2} \big[ |\beta^{AC}_{00}\rangle |0 \rangle + |\beta^{AC}_{10}\rangle |0 \rangle + |\beta^{AC}_{01}\rangle |1 \rangle - |\beta^{AC}_{11} \rangle |1 \rangle \big]

and the terms that contain b are

\frac{b}{2} \big[ |\beta^{AC}_{01}\rangle |0 \rangle + |\beta^{AC}_{11}\rangle |0 \rangle + |\beta^{AC}_{00}\rangle |1 \rangle - |\beta^{AC}_{10} \rangle |1 \rangle \big]

So far we have not done anything to our state, we have simply re-written the state as an expansion into a different basis. Now Alice conducts a measurement – but she measures A and C in the Bell basis. This will project the state onto one of the basis vectors |\beta_{ij}^{AC}\rangle. Thus there are four different possible outcomes of this measurement, which we can read off from the expansion in terms of the Bell basis above.

Measurement State of qubit B
\beta_{00} a|0\rangle + b|1\rangle = |\psi \rangle
\beta_{01} a|1\rangle + b|0\rangle = X|\psi \rangle
\beta_{10} a|0\rangle - b|1\rangle = Z|\psi \rangle
\beta_{11} b|0\rangle - a|1\rangle = -XZ|\psi \rangle

Now Alice sends the outcome of the measurement to Bob using the classical channel. The table above shows that the state in Bob’s qubit B is now the result of a unitary transformation which depends only on the measurement outcome. Thus if Bob receives the measurement outcomes (two bits), he can do a lookup on the table above and apply the inverse transformation on his qubit. If, for instance, he receives the measurement outcome 01, he can apply a Pauli X gate to his qubit and then knows that the state in this qubit will be identical to the original state |\psi \rangle. The teleportation process is complete.

Implementing teleportation as a circuit

Let us now build a circuit that models the teleportation procedure. The first part that we need is a circuit that creates a Bell state (or, more generally, transforms the elements of the computational basis into the Bell basis). This is achieved by the circuit below (which again uses the Qiskit convention that the most significant qubit q1 is the leftmost one in the tensor product).


It is obvious how this circuit can be reversed – as both individual gates used are there own inverse, we simply need to reverse their order to obtain a circuit that maps the Bell basis to the computational basis.

Let us now see how the overall teleportation circuit looks like. Here is the circuit (drawn with Qiskit).


Let us see how this relates to the above description. First, the role of the qubits is, from the top to the bottom

C = q[2]
A = q[1]
B = q[0]

The first two gates (the Hadamard and the CNOT gate) act on qubits A and B and create the Bell basis vector \beta_{00} from the initial state. This state is the shared entangled state that Alice and Bob prepare upfront.

The next section of the circuit operates on the qubits A and C and realizes the measurement in the Bell basis. We first use a combination of another CNOT gate and another Hadamard gate to map the Bell basis to the computational basis and then perform a measurement – these steps are equivalent to doing a measurement in the Bell basis directly.

Finally, we have a controlled X gate and a controlled Z gate. As described above, these gates apply the conditional corrections to Bob’s state in qubit B that depend on the outcome of the measurement. As a result, the state of qubit B will be the same as the initial state of qubit C.

How can we test this circuit? One approach could be to add a pre-processing part and a post-processing part to our teleportation circuit. The pre-processing gate acts with one or more gates on qubit C to put this qubit into a defined state. We then apply the teleportation circuit. Finally, we apply a post-processing circuit, namely the inverse sequence of gates to the “target” of the teleportation, i.e. to qubit B. If the teleportation works, the pre- and post-steps act on the same logical state and therefore cancel each other. Thus the final result in qubit B will be the initial state of qubit C, i.e. |0 \rangle. Therefore, a measurement of this qubit will always yield zero. Thus our final test circuit looks as follows.


When we run this on a simulator, the result will be as expected – the value in the classical register c2 after the measurement will always be zero. I have not been able to test this on real hardware as the IBM device does not (yet?) support classical conditional gates, but the result is predictable – we would probably get the same output with some noise. If you want to play with the circuits yourself, you can, as always, find the code in my GitHub repository.

So this is quantum teleportation – not quite as mysterious as the name suggests. In particular, we do not magically teleport particles and actual matter, but simply quantum information, based on a clever split of the information contained in an unknown state into a classical part, transmitted in two classical bit, and a quantum part that we can prepare upfront. We also do not violate special relativity – to complete the process, we depend on the measurement outcomes that are transmitted via a classical channel and therefore not faster than the speed of light. It is also important to note that quantum teleportation does also not violate the “no-cloning”-principle – the measurements let the original state collapse and therefore we do not end up with two copies of the same state.

Quantum teleportation has already been demonstrated in reality several times. The first experimental verification was published in [3] in 1997. Since then, the distance between the involved parties “Alice” and “Bob” has been gradually increased. In 2017, a chinese team reported a teleportation of a single qubit between a ground observatory to a low-Earth-orbit satellite (see [4]).

The term quantum teleportation is also used in the context of quantum error correction for a circuit that uses a measurement on an entangled state to bring one of the involved qubits into a specific state. When using a surface code, for instance, this technique – also called gate teleportation – is used to implement the T gate on the level of logical qubits. We refer to [5] for a more detailed discussion of this pattern.


1. M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge 2010
2. C.H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A Peres, W.K. Wootter, Teleporting an Unknown Quantum State via Dual Classical and EPR Channels, Phys. Rev. Lett. 70, March 1993
3. D. Bouwmeester, Jian-Wei Pan, K. Mattle, M. Eibl, H. Weinfurter, A. Zeilinger, Experimental quantum teleportation, Nature Vol. 390, Dec. 1997
4. Ji-Gang Ren et. al., Ground-to-satellite quantum teleportation, Nature Vol. 549, September 2017
5. D. Gottesman, I. L. Chuang, Quantum Teleportation is a Universal Computational Primitive, arXiv:quant-ph/9908010

Watching Kubernetes networking in action

In this post, we will look in some more detail into networking in a Kubernetes cluster. Even though the Kubernetes networking model is independent of the underlying cloud provider, the actual implementation does of course depend on the cloud provider which communicates with Kubernetes through a CNI plugin.

I will continue to use EKS, so some of this will be EKS specific. To work with me through the example, you will first have to bring up your cluster, start two nodes and deploy a pod running a httpd on one of the nodes. I have written a script up.sh and a manifest file that automates all this. To download and apply all this, enter

$ git clone https://github.com/christianb93/Kubernetes.git
$ cd Kubernetes/cluster
$ chmod 700 up.sh
$ ./up.sh
$ kubectl apply -f ../pods/alpine.yaml

Node-to-Pod networking

Now let us log into the node on which the container is running and collect some information on the local network interface attached to the VM.

$ ifconfig eth0
eth0: flags=4163  mtu 9001
        inet  netmask  broadcast
        inet6 fe80::2b:dcff:fee7:448c  prefixlen 64  scopeid 0x20
        ether 02:2b:dc:e7:44:8c  txqueuelen 1000  (Ethernet)
        RX packets 197837  bytes 274587781 (261.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25656  bytes 2389608 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

So the local IP address of the node is If we do a kubectl get pods -o wide, we get a different IP address – – for the pod. Let us curl this from the node.

$ curl
<h1>It works!</h1>

So apparently we have reached our httpd. To understand why this works, let us investigate the network configuration in more detail. First, on the node on which the container is running, let us take a look at the network configuration inside the container.

$ ID=$(docker ps --filter name=alpine-ctr --format "{{.ID}}")
$ docker exec -it $ID "/bin/bash"
bash-4.4# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if6:  mtu 9001 qdisc noqueue state UP 
    link/ether 4a:8b:c9:bb:8c:8e brd ff:ff:ff:ff:ff:ff
bash-4.4# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 4A:8B:C9:BB:8C:8E  
          inet addr:  Bcast:  Mask:
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:936 (936.0 B)  TX bytes:0 (0.0 B)
bash-4.4# ip route
default via dev eth0 dev eth0 scope link

What do we learn from this? First, we see that inside the container namespace, there is a virtual ethernet device eth0, with IP address If you run kubectl get pods -o wide on your local workstation, you will find that this is the IP address of the Pod. We also see that there is a route in the container namespace that direct all traffic to this interface. The output of the ip link command also shows that this device is a virtual ethernet device that has a paired device (with index if6) in a different namespace. So let us exit the container, go back to the node and try to figure out what the network configuration on the node is.

$ ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:2b:dc:e7:44:8c brd ff:ff:ff:ff:ff:ff
3: eni3f5399ec799:  mtu 9001 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 66:37:e5:82:b1:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: enie68014839ee@if3:  mtu 9001 qdisc noqueue state UP mode DEFAULT group default 
    link/ether f6:4f:62:dc:38:18 brd ff:ff:ff:ff:ff:ff link-netnsid 1
5: eth1:  mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:e8:f0:26:a7:3e brd ff:ff:ff:ff:ff:ff
6: eni97d5e6c4397@if3:  mtu 9001 qdisc noqueue state UP mode DEFAULT group default 
    link/ether b2:c9:58:c0:20:25 brd ff:ff:ff:ff:ff:ff link-netnsid 2
$ ifconfig eth0
eth0: flags=4163  mtu 9001
        inet  netmask  broadcast
        inet6 fe80::2b:dcff:fee7:448c  prefixlen 64  scopeid 0x20
        ether 02:2b:dc:e7:44:8c  txqueuelen 1000  (Ethernet)
        RX packets 197837  bytes 274587781 (261.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25656  bytes 2389608 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
$ ip route
default via dev eth0 dev eth0 dev eth0 proto kernel scope link src dev eni3f5399ec799 scope link dev eni97d5e6c4397 scope link dev enie68014839ee scope link

Here we see that – in addition to a few other interfaces – there is a device eth0 to which all traffic is sent by default. However, there is also a device eni97d5e6c4397 which is the other end of the interface visible in the container. And there is a route that sends all traffic that is directed to the IP address of the pod to this interface. Overall, this gives a picture which seems familiar from our earlier analysis of docker networking


If we try to establish a connection to the httpd running in the pod, the routing table entry on the node will send the traffic to the interface eni97d5e6c4397. This is one end of a veth-pair, the other end appears inside the container as eth0. So from the containers point of view, this is incoming traffic received via eth0, which is happily accepted and processed by the httpd. The reply goes the other way – it is directed to eth0 inside the container, travels via the veth pair and ends up inside the host namespace, coming from eni97d5e6c4397.

Pod-to-Pod networking across nodes

Now let us try something else. Log into the second node – on which the container is not running – and try the curl from there. Surprisingly, this works as well! What we have seen so far does not explain this, so there is probably a piece of magic that we are still missing. To find this, let us use the aws cli to print out the network interfaces attached to the node on which the container is running (the following snippet assumes that you have the extremely helpful tool jq installed on your PC).

$ nodeName=$(kubectl get pods --output json | jq -r ".items[0].spec.nodeName")
$ aws ec2 describe-instances --output json --filters Name=private-dns-name,Values=$nodeName --query "Reservations[0].Instances[0].NetworkInterfaces"
---- SNIP -----
        "MacAddress": "02:e8:f0:26:a7:3e",
        "SubnetId": "subnet-06088e09ce07546b9",
        "PrivateDnsName": "ip-192-168-84-108.eu-central-1.compute.internal",
        "VpcId": "vpc-060469b2a294de8bd",
        "Status": "in-use",
        "Ipv6Addresses": [],
        "PrivateIpAddress": "",
        "Groups": [
                "GroupName": "eks-auto-scaling-group-myCluster-NodeSecurityGroup-1JMH4SX5VRWYS",
                "GroupId": "sg-08163e3b40afba712"
        "NetworkInterfaceId": "eni-0ed2f1cf4b09cb8be",
        "OwnerId": "979256113747",
        "PrivateIpAddresses": [
                "Primary": true,
                "PrivateDnsName": "ip-192-168-84-108.eu-central-1.compute.internal",
                "PrivateIpAddress": ""
                "Primary": false,
                "PrivateDnsName": "ip-192-168-72-200.eu-central-1.compute.internal",
                "PrivateIpAddress": ""
                "Primary": false,
                "PrivateDnsName": "ip-192-168-96-163.eu-central-1.compute.internal",
                "PrivateIpAddress": ""
                "Primary": false,
                "PrivateDnsName": "ip-192-168-99-199.eu-central-1.compute.internal",
                "PrivateIpAddress": ""
---- SNIP ----

I have removed some of the output to keep it readable. We see that AWS has attached several elastic network interfaces (ENI) to our node. An ENI is a virtual network interface that AWS creates and manages for you. Each node can have more than one ENI attached, and each ENI can have a primary and multiple secondary IP addresses.

If you look at the last line of the output, you see that there is a network interface eni-0ed2f1cf4b09cb8be that has, as one of the secondary IP addresses, the IP address This is the IP address of our Pod! Let us now go back to the node and inspect its network configuration once more. You will not find a network interface with this exact name, but you will find a network interface on the node on which the pod is running which has the same MAC address, namely eth1.

$ ifconfig eth1
eth1: flags=4163  mtu 9001
        inet6 fe80::e8:f0ff:fe26:a73e  prefixlen 64  scopeid 0x20
        ether 02:e8:f0:26:a7:3e  txqueuelen 1000  (Ethernet)
        RX packets 224  bytes 6970 (6.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20  bytes 1730 (1.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

This is an ordinary VPC interface and visible in the entire VPC under all of its IP addresses. So if we curl our httpd from the second node, the traffic will leave that node via the default interface, be picked up by the VPC, routed to the node on which the pod is running and enter via eth1. As IP forwarding is enabled on this node, the traffic will be routed to the Pod and arrive at the httpd.

This is the missing piece of magic we have been looking for. In fact, for every pod running on a node, EKS will add an additional secondary IP address to an ENI attached to the node (and attach additional ENIs if needed) which will make the Pod IP addressses visible in the entire VPC. This mechanism is nicely described in the documentation of the CNI plugin which EKS uses. So we now have the following picture.


So this allows us to run our httpd in such a way that it can be reached from the entire Pod network (and the entire VPC). Note, however, that it can of course not be reached from the outside world. It is interesting to repeat this experiment with a slighly adapted YAML file that uses the containerPort field:

apiVersion: v1
kind: Pod
  name: alpine
  namespace: default
  - name: alpine-ctr
    image: httpd:alpine
      - containerPort: 80

If we remove the old Pod and use this YAML file to create a new pod, we will find that the configuration does not change at all. In particular, running docker ps on the node on which the Pod is scheduled will teach you that this port specification is not the equivalent of the port specification of the docker run port mapping feature – as the Kubernetes API specification states, this field is informational.

Implementation of services

Let us now see how this picture changes if we add a service. First, we will use a service of type ClusterIP, i.e. a service that will make our httpd reachable from within the entire cluster under a common IP address. For that purpose – after deleting our existing pods – let us create a deployment that brings up two instances of the httpd.

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/pods/deployment.yaml

Once the pods are up, you can again use curl to verify that you can talk to every pod from every node and every pod. Now let us create a service.

$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/service.yaml

Once that has been done, enter kubectl get svc to get a list all services. You should see a newly created service alpine-service. Note down its cluster IP address – in my case this was

Now log into one of the nodes again, attach to the container, install curl there and try to connect to port

$ ID=$(docker ps --filter name=alpine-ctr --format "{{.ID}}")
$ docker exec -it $ID "/bin/bash"
bash-4.4# apk add curl
OK: 124 MiB in 67 packages
bash-4.4# curl
<h1>It works!</h1>

So, as promised by the definition of s service, the httpd is visible within the cluster under the cluster IP address of the service. The same works if we are on a node and not attached to a container.

To see how this works, let us log out of the container again and search the NAT tables for the cluster IP address of the service.

$ sudo iptables -S -t nat | grep
-A KUBE-SERVICES -d -p tcp -m comment --comment "default/alpine-service: cluster IP" -m tcp --dport 8080 -j KUBE-SVC-SXWLG3AINIW24QJC

So we see that Kubernetes (more precisely the kube-proxy running on each node) has added a NAT rule that captures traffic directed towards the service IP address to a special chain. Let us dump this chain.

$ sudo iptables -S -t nat | grep KUBE-SVC-SXWLG3AINIW24QJC
-A KUBE-SERVICES -d -p tcp -m comment --comment "default/alpine-service: cluster IP" -m tcp --dport 8080 -j KUBE-SVC-SXWLG3AINIW24QJC
-A KUBE-SVC-SXWLG3AINIW24QJC -m comment --comment "default/alpine-service:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YRELRNVKKL7AIZL7
-A KUBE-SVC-SXWLG3AINIW24QJC -m comment --comment "default/alpine-service:" -j KUBE-SEP-BSEIKPIPEEZDAU6E

Now this is actually pretty interesting. The first line is simply the creation of the chain. The second line is the line that we already looked at above. The next two lines are the lines we are looking for. We see that, with a probability of 50%, we either jump to the chain KUBE-SEP-YRELRNVKKL7AIZL7 or to the chain KUBE-SEP-BSEIKPIPEEZDAU6E. Let us display one of them.

$ sudo iptables -S KUBE-SEP-BSEIKPIPEEZDAU6E -t nat 
-A KUBE-SEP-BSEIKPIPEEZDAU6E -s -m comment --comment "default/alpine-service:" -j KUBE-MARK-MASQ
-A KUBE-SEP-BSEIKPIPEEZDAU6E -p tcp -m comment --comment "default/alpine-service:" -m tcp -j DNAT --to-destination

So we see the that this chain has two rules. The first rule marks all packages that are originating from the pod running on this node, this mark is later evaluated in the forwarding rules to make sure that the packet is accepted for forwarding. The second rule is where the magic happens – it performs a DNAT, i.e. a destination NAT, and sends our packets to one of the pods. The rule KUBE-SEP-YRELRNVKKL7AIZL7 is similar, with the only difference that it sends the packets to the other pod. So we see that two things are happening

  • Traffic directed towards port 8080 of the cluster IP address is diverted to one of the pods
  • Which one of the pods is selected is determined randomly, with a probability of 50% for both pods. Thus these rules implement a simple load balancer.

Let us now see how things change when we use a service of type NodePort. So let us use a slightly different YAML file.

$ kubectl delete -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/service.yaml
$ kubectl apply -f https://raw.githubusercontent.com/christianb93/Kubernetes/master/network/nodePortService.yaml

When we now run kubectl get svc, we see that our service appears as a NodePort service, and, as the second entry in the columns PORTS, we find the port that Kubernetes opens for us.

$ kubectl get services
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
alpine-service   NodePort           8080:32755/TCP   13s
kubernetes       ClusterIP             443/TCP          7h

In my case, the port 32755 has been used. If we now go back to one of the nodes and search the iptables rules for this port, we find that Kubernetes has created two additional NAT rules.

$ sudo iptables -S  -t nat | grep 32755
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/alpine-service:" -m tcp --dport 32755 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/alpine-service:" -m tcp --dport 32755 -j KUBE-SVC-SXWLG3AINIW24QJC

So we find that for all traffic directed to this port, again a marker is set and the rule KUBE-SVC-SXWLG3AINIW24QJC applies. If you inspect this rule, you will find that it is similar to the rules above and again realizes a load balancer that sends traffic to port 80 of one of the pods.

Let us now verify that we can really reach this pod from the outside world. Of course, this only works once we have allowed incoming traffic on at least one of the nodes in the respective AWS security group. The following commands determine the node Port, your IP address, the security group and the IP address of the node, allow access and submit the curl command (note that I use the cluster name myCluster to select the worker nodes, in case you are not using my scripts to run this example, you will have to change the filter to make this work).

$ nodePort=$(kubectl get svc alpine-service --output json | jq ".spec.ports[0].nodePort")
$ IP=$(aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/myCluster Name=instance-state-name,Values=running --output text --query Reservations[0].Instances[0].PublicIpAddress)
$ SEC_GROUP_ID=$(aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/myCluster Name=instance-state-name,Values=running --output text --query Reservations[0].Instances[0].SecurityGroups[0].GroupId)
$ myIP=$(wget -q -O- https://ipecho.net/plain)
$ aws ec2 authorize-security-group-ingress --group-id $SEC_GROUP_ID --port $nodePort --protocol tcp --cidr "$myIP/32"
$ curl $IP:$nodePort
<h1>It works!</h1>


After all these nitty-gritty details, let us summarize what we have found. When you start a pod on a node, a pair of virtual ethernet devices is created, with one end being assigned to the namespace of the container and one end being assigned to the namespace of the host. Then IP routes are added so that traffic directed towards the pod is forwarded to this bridge. This allows access to the container from the node on which they are running. To realize access from other nodes and pods and thus the flat Kubernetes networking model, EKS uses the AWS CNI plugin which attaches the pods IP addresses as secondary IP addresses to elastic network interfaces.

When you start a service, Kubernetes will in addition set up NAT rules that will capture traffic determined for the cluster IP address of the service and perform a destination network address translation so that this traffic gets send to one of the pods. The pod is selected at random, which implements a simple load balancer. For a service of type NodePort, additional rules will be created which make sure that the same NAT processing applies to traffic coming from the outside world.

This completes today post. If you want to learn more, you might want to check out some of the links below.

NMR based quantum computing: gates and state preparation

In my last post on NMR based quantum computing, we have seen how an individual qubit can be implemented based on NMR technology. However, just having a single qubit is of course not really helpful – what we are still missing is the ability to initialize several qubits and to realize interacting quantum gates. These are the topics that we will discuss today.

The physics and mathematics behind these topics is a bit more involved, and I will try to keep things simple. In case you would like to dive more deeply into some of the details, you might want to take a look at my notes on GitHub as well as the references cited there.

Pseudo-pure states

In quantum computing, we are typically manipulating individual qubits which are initially in a pure state. In an NMR probe, however, we usually have a state which is very far from being a pure state. Instead, we typically deal with highly mixed states, and it appears to be impossible to prepare all spins in an NMR probe in the same, well defined pure state. So we first need to understand how the usual formalism of quantum computing – expressed in terms of qubits being in pure states which are subject to unitary operations – relates to the description of an NMR experiment in terms of density matrices and mixed states.

The formalism that we will now describe is known as the formalism of pseudo-pure states. These are states that are described by a density matrix of the form

\rho = \frac{1}{2^N}(1-\epsilon) + \epsilon |\psi \rangle \langle \psi |

with a pure state |\psi \rangle. Here, N is the number of spins, and the factor 1/2N has been inserted to normalize the state. This density matrix describes an ensemble for which almost all molecules are in purely statistically distributed states, but a small fraction – measured by \epsilon – of them are in a pure state. The second term in this expression is often called the deviation and denoted by \rho_\Delta.

Why are these states useful? To see this, let us calculate the time evolution of this state under a unitary matrix U. In the density matrix formalism, the time evolution is given by conjugation, and a short calculation shows that

U(t) \rho U(t)^{-1} = \frac{1}{2^N} (1-\epsilon) + \epsilon |U(t) \psi \rangle \langle U(t) \psi |

Thus the result is again a pseudo-pure state. In fact, it is the pseudo-pure state that corresponds to the pure state U|\psi \rangle. Similarly, one can show that if A is an observable, described by a hermitian matrix (which, for technical reasons, we assume to be traceless), then the expectation value of A evaluated on |\psi \rangle – which is the result a measurement would yield in the pure state formalism – is, up to a constant, the same as the result of a measurement of A on an ensemble prepared in the pure state corresponding to |\psi \rangle!

Using these relations, we can now translate the typical process of a quantum computation as expressed in the standard, pure state formalism, into NMR terminology as follows.

  • The process of initializing a quantum computer into an initial state |\psi \rangle corresponds to putting the NMR probe into the corresponding pseudo-pure state with \rho_{\Delta} \sim |\psi \rangle \langle \psi|
  • If the quantum algorithm is described as a unitary operator U (typically presented as a sequence of gates Ui), we apply the same unitary operator to the density matrix, i.e. we realize the gates as an NMR experiment
  • We then measure the macroscopic quantity A which will give us – up to a factor – the result of the calculation

This is nice, but how do we prepare these states? Different researchers have come up with different techniques to do this. One of the ideas that is commonly applied is averaging. This is based on the observation that the state at thermal equilibrium is a pseudo-pure state up to an error term. This error term can be removed by averaging over many instances of the same experiment (while somehow shuffling the initial states around using a clever manipulation). So we first let the probe settle down, i.e. prepare it in a thermal state (which can even be at room temperature). We then run our quantum algorithm and measure. Next, we re-initialize the system, apply a certain transformation and re-run the algorithm. We repeat this several times, with a prescribed set of unitary transformations that we apply in each run to the thermal equilibrium state before running the quantum algorithm. At the end, we add up all our results and take the average. It can be shown that these initial “shuffling transformations” can be chosen such that the difference between the thermal state and the pseudo-pure state cancels out.

Quantum gates

Having this result in place, we now need to understand how we can actually realize quantum gates. In the last post, we have already seen that we can use RF-pulses to rotate the state of an NMR qubit on the Bloch sphere, which can be used to realize single qubit gates. But how do we realize two-qubit gates like the CNOT gate? For that purpose, we need some type of interaction between two qubits.

Now, in reality, any two molecules in an NMR probe do of course interact – by their electric and magnetic fields, by direct collision and so forth. It turns out that in most circumstances, all but one type of coupling called the J-coupling can be neglected. This coupling is an indirect coupling – the magnetic moment created by a nuclear spin interacts with the electric field of an electron, which in turn interacts with the magnetic moment of a different nuclear spin. In the Hamiltonian – in a rotating frame – the J-coupling contributes a term like

\frac{2 \pi}{\hbar} J_{12} I^1_z I^2_z

where J12 is a coupling constant. This term introduces a correlation between the two nuclear spins, similar to an additional magnetic field depending on the state of the second qubit. The diagram below is the result of a simulation of an initial state to which J-coupling is applied and demonstrates that the J-coupling manifests itself in a splitting of peaks in an NRM diagram.


An NMR diagram is the result of a Fourier transform, so an additional peak corresponds to a slow, additional rotation of the two spin polarizations around each other.

Let us take a closer look at this. If we apply a free evolution under the influence of J-coupling over some time t, it can be shown – again I refer to my notes on GitHub for the full math – that the time evolution is given by the operator

U = \cos (\frac{\pi J_{12}}{2} t) -i \sigma_z^1 \sigma_z^2 \sin (\frac{\pi J_{12}}{2}  t)

If we choose t such that

\frac{\pi J_{12}}{2} t = \frac{\pi}{4}

and combine this time evolution with rotations around the z-axis of both qubits, we obtain the following transformation, expressed as a matrix

R_{z^2}(-\frac{\pi}{2}) R_{z^1}(-\frac{\pi}{2}) U =  \frac{1+i}{\sqrt{2}} \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1  \end{pmatrix}

Technically, this transformation is the result of letting the system evolve under the influence of J-coupling for the time t and then applying two sharp RF pulses, one at resonance with the first qubit and one at resonance with the second qubit, timed such that they correspond to a rotation on the Bloch sphere around the z-axis with angle \pi / 2. The matrix on the right hand side of this equation represents a two-qubit transformation known as the controlled phase gate. This gate corresponds to applying a phase gate to the second qubit, conditional on the state of the first qubit.

This already looks very similar to a CNOT gate, and in fact it is – one can easily show that the phase gate is equivalent to the CNOT gate up to single qubit operations, more precisely these two gates are related by

C_{NOT} = (I \otimes H) C_{PHASE} (I \otimes H)

where I is the identity and H is the Hadamard gate. As single qubit operations can be realized by RF pulses, this shows that a CNOT gate can be realized by a free evolution under the J-coupling, framed by sequences of RF pulses. This, together with single qubit gates, gives us a universal gate set.

This completes our short deep dive into NMR based quantum computing. Historically, NMR based quantum computers were among the first fully functional implementations of (non error-corrected) universal quantum computers, but have come a bit out of fashion in favor of other technologies. At the time of writing, most of the large technology players focus on a different approach – superconducting qubits – which I will cover in the next couple of posts on quantum computing.

Kubernetes services and load balancers

In my previous post, we have seen how we can use Kubernetes deployment objects to bring up a given number of pods running our Docker images in a cluster. However, most of the time, a pod by itself will not be able to operate – we need to connect it with other pods and the rest of the world, in other words we need to think about networking in Kubernetes.

To try this out, let us assume that you have a cluster up and running and that you have submitted a deployment of two httpd instances in the cluster. To easily get to this point, you can use the scripts in my GitHub repository as follows. These scripts will also open two ssh connections, one to each of the EC2 instances which are part of the cluster (this assumes that you are using a PEM file called eksNodeKey.pem as I have done it in my examples, if not you will have to adjust the script up.sh accordingly).

# Clone repository
$ git clone https://github.com/christianb93/Kubernetes.git
$ cd Kubernetes/cluster
# Bring up cluster and two EC2 nodes
$ chmod 700 up.sh
$ ./up.sh
# Bring down nginx controller
$ kubectl delete svc ingress-nginx -n ingress-nginx
# Deploy two instances of the httpd
$ kubectl apply -f ../pods/deployment.yaml

Be patient, the creation of the cluster will take roughly 15 minutes, so time to get a cup of coffee. Note that we delete an object – the nginx ingress controller service – that my scripts generate and that we will use in a later post, but which blur the picture for today.

Now let us inspect our cluster and try out a few things. First, let us get the running pods.

$ kubectl get pods -o wide

This should give you two pods, both running an instance of the httpd. Typically, Kubernetes will distribute these two pods over two different nodes in the cluster. Each pod has an IP address called the pod IP address which is displayed in the column IP of the output. This is the IP address under which the pod is reachable from other pods in the cluster.

To verify this, let us attach to one of the pods and run curl to access the httpd running in the other pod. In my case, the first pod has IP address We will attach to the second pod and verify that this address is reachable from there. To get a shell in the pod, we can use the kubectl exec command which executes code in a pod. The following commands extract the id of the second pod, opens a shell in this pod, installs curl and talks to the httpd in the first pod. Of course, you will have to replace the IP address of the first pod – – with whatever your output gives you.

$ name=$(kubectl get pods --output json | \
             jq -r ".items[1].metadata.name")
$ kubectl exec -it $name "/bin/bash"
bash-4.4# apk add curl
bash-4.4# curl
<h1>It works!</h1>

Nice. So we can reach a port on pod A from any other pod in the cluster. This is what is called the flat networking model in Kubernetes. In this model, each pod has a separate IP address. All containers in the pod share this IP address and one IP namespace. Every pod IP address is reachable from any other pod in the cluster without a need to set up a gateway or NAT. Coming back to the comparison of a pod with a logical host, this corresponds to a topology where all logical hosts are connected to the same IP network.

In addition, Kubernetes assumes that every node can reach every pod as well. You can easily confirm this – if you log into the node (not the pod!) on which the second pod is running and use curl from there, directed to the IP address of the first pod, you will get the same result.

Now Kubernetes is designed to run on a variety of platforms – locally, on a bare metal cluster, on GCP, AWS or Azure and so forth. Therefore, Kubernetes itself does not implement those rules, but leaves that to the underlying platform. To make this work, Kubernetes uses an interface called CNI (container networking interface) to talk to a plugin that is responsible for executing the platform specific part of the network configuration. On EKS, the AWS CNI plugin is used. We will get into the details in a later post, but for the time being simply assume that this works (and it does).

So now we can reach every pod from every other pod. This is nice, but there is an issue – a pod is volatile. An application which is composed of several microservices needs to be prepared for the event that a pod goes down and is brought up again, be it due to a failure or simply due to the fact that an auto-scaler tries to empty a node. If, however, a pod is restarted, it will typically receive a different IP address.

Suppose, for instance, you had a REST service that you want to expose within your cluster. You use a deployment to start three pods running the REST service, but which IP address should another service in the cluster use to access it? You cannot rely on the IP address of individual pods to be stable. What we need is a stable IP address which is reachable by all pods and which routes traffic to one instance of this REST service – a bit like a cluster-internal load balancer.

Fortunately, Kubernetes services come to the rescue. A service is a Kubernetes object that has a stable IP address and is associated with a set of pods. When traffic is received which is directed to the service IP address, the service will select one of the pods (at random) and forward the traffic to it. Thus a service is a decoupling layer between the instable pod IP addresses and the rst of the cluster or the outer world. We will later see that behind the scenes, a service is not a process running somewhere, but essentially a set of smart NAT and routing rules.

This sounds a bit abstract, so let us bring up a service. As usual, a service object is described in a manifest file.

apiVersion: v1
kind: Service
  name: alpine-service
    app: alpine
  - protocol: TCP
    port: 8080
    targetPort: 80

Let us call this file service.yaml (if you have cloned my repository, you already have a copy of this in the directory network). As usual, this manifest file has a header part and a specification section. The specification section contains a selector and a list of ports. Let us look at each of those in turn.

The selector plays a role similar to the selector in a deployment. It defines a set of pods that are assumed to be reachable via the service. In our case, we use the same selector as in the deployment, so that our service will send traffic to all pods brought up by this deployment.

The ports section defines the ports on which the service is listening and the ports to which traffic is routed. In our case, the service will listen for TCP traffic on port 8080, and will forward this traffic to port 80 on the associated pods (as specified by the selector). We could omit the targetPort field, in this case the target port would be equal to the port. We could also specify more than one combination of port and target port and we could use names instead of numbers for the ports – refer to the documentation for a full description of all configuration options.

Let us try this. Let us apply this manifest file and use kubectl get svc to list all known services.

$ kubectl apply -f service.yaml
$ kubectl get svc

You should now see a new service in the output, called alpine-service. Similar to a pod, this service has a cluster IP address assigned to it, and a port number (8080). In my case, this cluster IP address is We can now again get a shell in one of the pods and try to curl that address

$ name=$(kubectl get pods --output json | \
             jq -r ".items[1].metadata.name")
$ kubectl exec -it $name "/bin/bash"
bash-4.4# apk add curl # might not be needed 
bash-4.4# curl
<h1>It works!</h1>

If you are lucky and your container did not go down in the meantime, curl will already be installed and you can skip the apk add curl. So this works, we can actually reach the service from within our cluster. Note that we now have to use port 8080, as our service is listening on this port, not on port 80.

You might ask how we can get the IP address of the service in real world? Well, fortunately Kubernetes does a bit more – it adds a DNS record for the service! So within the pod, the following will work

bash-4.4# curl alpine-service:8080
<h1>It works!</h1>

So once you have the service name, you can reach the httpd from every pod within the cluster.

Connecting to services using port forwarding

Having a service which is reachable within a cluster is nice, but what options do we have to reach a cluster from the outside world? For a quick and dirty test, maybe the easiest way is using the kubectl port forwarding feature. This command allows you to forward traffic from a local port on your development machine to a port in the cluster which can be a service, but also a pod. In our case, let us forward traffic from the local port 5000 to port 8080 of our service (which is hooked up to port 80 on our pods).

$ kubectl port-forward service/alpine-service 5000:8080 

This will start an instance of kubectl which will bind to port 5000 on your local machine ( You can now connect to this port, and behind the scenes, kubectl will tunnel the traffic through the Kubernetes master node into the cluster (you need to run this in a second terminal, as the kubectl process just started is still blocking your terminal).

$ curl localhost:5000
<h1>It works!</h1>

A similar forwarding can be realized using kubectl proxy, which is designed to give you access to the Kubernetes API from your local machine, but can also be used to access services.

Connect to a service using host ports

Forwarding is easy and a quick solution, but most likely not what you want to do in a production setup. What other options do we have?

One approach is to use host ports. Essentially, a host port is a port on a node that Kubernetes will wire up with the cluster IP address of a service. Assuming that you can reach the host from the outside world, you can then use the public IP address of the host to connect to a service.

To create a host port, we have to modify our manifest file slightly by adding a host port specification.

apiVersion: v1
kind: Service
  name: alpine-service
    app: alpine
  - protocol: TCP
    port: 8080
    targetPort: 80
  type: NodePort

Note the additional line at the bottom of the file which instructs Kubernetes to open a node port. Assuming that this file is called nodePortService.yaml, we can again use kubectl to bring down our existing service and add the node port service.

$ kubectl delete svc alpine-service
$ kubectl apply -f nodePortService.yaml
$ kubectl get svc

We see that Kubernetes has brought up our service, but this time, we see two ports in the line describing our service. The second port (32226 in my case) is the port that Kubernetes has opened on each node. Traffic to this port will be forwarded to the service IP address and port. To try this out, you can use the following commands to get the external IP address of the first node, adapt the AWS security group such that traffic to this node is allowed from your workstation, determine the node port and curl it. If your cluster is not called myCluster, replace every occurrence of myCluster with the name of your cluster.

$ nodePort=$(kubectl get svc alpine-service --output json | jq ".spec.ports[0].nodePort")
$ IP=$(aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/myCluster Name=instance-state-name,Values=running --output text --query Reservations[0].Instances[0].PublicIpAddress)
$ SEC_GROUP_ID=$(aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/myCluster Name=instance-state-name,Values=running --output text --query Reservations[0].Instances[0].SecurityGroups[0].GroupId)
$ myIP=$(wget -q -O- https://ipecho.net/plain)
$ aws ec2 authorize-security-group-ingress --group-id $SEC_GROUP_ID --port $nodePort --protocol tcp --cidr "$myIP/32"
$ curl $IP:$nodePort
<h1>It works!</h1>

Connecting to a service using a load balancer

A node port will allow you to connect to a service using the public IP of a node. However, if you do this, this node will be a single point of failure. For a HA setup, you would typically choose a different options – load balancers.

Load balancers are not managed directly by Kubernetes. Instead, Kubernetes will ask the underlying cloud provider to create a load balancer for you which is then connected to the service – so there might be additional charges. Creating a service exposed via a load balancer is easy – just change the type field in the manifest file to LoadBalancer

apiVersion: v1
kind: Service
  name: alpine-service
    app: alpine
  - protocol: TCP
    port: 8080
    targetPort: 80
  type: LoadBalancer

After applying this manifest file, it takes a few seconds for the load balancer to be created. Once this has been done, you can find the external DNS name of the load balancer (which AWS will create for you) in the column EXTERNAL-IP of the output of kubectl get svc. Let us extract this name and curl it. This time, we use the jsonpath option of the kubectl command instead of jq.

$ host=$(kubectl get svc alpine-service --output  \
$ curl $host:8080
<h1>It works!</h1>

If you get a “couldn not resolve hostname” error, it might be that the DNS entry has not yet propagated through the infrastructure, this might take a few minutes.

What has happened? Behind the scenes, AWS has created an elastic load balancer (ELB) for you. Let us describe this load balancer.

$ aws elb describe-load-balancers --output json
    "LoadBalancerDescriptions": [
            "Policies": {
                "OtherPolicies": [],
                "LBCookieStickinessPolicies": [],
                "AppCookieStickinessPolicies": []
            "AvailabilityZones": [
            "CanonicalHostedZoneName": "ad76890d448a411e99b2e06fc74c8c6e-2099085292.eu-central-1.elb.amazonaws.com",
            "Subnets": [
            "CreatedTime": "2019-03-17T11:07:41.580Z",
            "SecurityGroups": [
            "Scheme": "internet-facing",
            "VPCId": "vpc-060469b2a294de8bd",
            "LoadBalancerName": "ad76890d448a411e99b2e06fc74c8c6e",
            "HealthCheck": {
                "UnhealthyThreshold": 6,
                "Interval": 10,
                "Target": "TCP:30829",
                "HealthyThreshold": 2,
                "Timeout": 5
            "BackendServerDescriptions": [],
            "Instances": [
                    "InstanceId": "i-0cf7439fd8eb65858"
                    "InstanceId": "i-0fda48856428b9a24"
            "SourceSecurityGroup": {
                "GroupName": "k8s-elb-ad76890d448a411e99b2e06fc74c8c6e",
                "OwnerAlias": "979256113747"
            "DNSName": "ad76890d448a411e99b2e06fc74c8c6e-2099085292.eu-central-1.elb.amazonaws.com",
            "ListenerDescriptions": [
                    "PolicyNames": [],
                    "Listener": {
                        "InstancePort": 30829,
                        "LoadBalancerPort": 8080,
                        "Protocol": "TCP",
                        "InstanceProtocol": "TCP"
            "CanonicalHostedZoneNameID": "Z215JYRZR1TBD5"

This is a long output, let us see what this tells us. First, there is a list of instances, which are the instances of the nodes in your cluster. Then, there is the block ListenerDescriptions. This block specificies, among other things, the load balancer port (8080 in our case, this is the port that the load balancer exposes) and the instance port (30829). You will note that these are also the ports that kubectl get svc will give you. So the load balancer will send incoming traffic on port 8080 to port 30829 of one of the instances. This in turn is a host port as discussed before, and therefore will be connected to our service. Thus, even though technically not fully correct, the following picture emerges (technically, a service is not a process, but a collection of iptables rules on each node, which we will look at in more detail in a later post).


Using load balancers, however, has a couple of disadvantages, the most obvious one being that each load balancer comes with a cost. If you have an application that exposes tens or even hundreds of services, you clearly do not want to fire up a load balancer for each of them. This is where an ingress comes into play, which can distribute incoming HTTP(S) traffic across various services and which we will study in one of the next posts.

There is one important point when working with load balancer services – do not forget to delete the service when you are done! Otherwise, the load balancer will continue to run and create charges, even if it is not used. So delete all services before shutting down your cluster and if in doubt, use aws elb describe-load-balancers to check for orphaned load balancers.

Creating services in Python

Let us close this post by looking into how services can be provisioned in Python. First, we need to create a service object and populate its metadata. This is done using the following code snippet.

service = client.V1Service()
service.api_version = "v1"
service.kind = "Service"
metadata = client.V1ObjectMeta(name="alpine-service")
service.metadata = metadata

Now we assemble the service specification and attach it to the service object.

spec = client.V1ServiceSpec()
selector = {"app": "alpine"}
spec.selector = selector
port = client.V1ServicePort(
              port = 8080, 
              protocol = "TCP", 
              target_port = 80 )
spec.ports = [port]
service.spec = spec

Finally, we authenticate, create an API endpoint and submit the creation request.

api = client.CoreV1Api()

If you have cloned my GitHub repository, you will find a script network/create_service.py that contains the full code for this and that you can run to try this out (do not forget to delete the existing service before running this).