Docker internals: networking part I

In this post, we will investigate one of the more complex topics when working with Docker – networking.

We have already seen in the previous post that namespaces can be used to isolate networking resources used by different containers as well as resources used by containers from those used by the host system. However, by nature, networking is not only about isolating but also about connecting – how does this work?

So let us do some tests with the httpd:alpine image. First, get a copy of that image into your local repository:

$ docker pull httpd:alpine

Then open two terminals that we call Container1 and Container2 and attach to them (note that you need to switch to the second terminal after entering the second line).

$ docker run --rm -d  --name="Container1" httpd:alpine
$ docker exec -it Container1 "/bin/sh"

$ docker run --rm -d  --name="Container2" httpd:alpine
$ docker exec -it Container2 "/bin/sh"

Next, make curl available in both containers using apk update followed by apk add curl. Now let us switch to cointainer 1 and inspect the network configuration.

/usr/local/apache2 # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02  
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1867 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1017 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1831738 (1.7 MiB)  TX bytes:68475 (66.8 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

We see that there is an ethernet interface eth0 with IP address 172.17.0.2. If you do the same in the second container, you should see a similar output with a different IP address, say 172.17.0.3.

Now let us make a few connection tests. First, go to the terminal running inside container 1 and enter

curl 172.17.0.3

You should now see a short HTML snippet containing the text “It works!”. So apparently we can reach container 2 from container 1 – and similarly, you should be able to reach container 1 from container 2. Finally, try the same from a terminal attached to the host – you should be able to reach both containers from there. Finally, if you also have a web server or a similar server running on the host, you will see that you can also reach that from within the containers. In my case, I have a running tomcat being bound to 0.0.0.0:8080 on my local host, and was able to connect using

curl 192.168.178.27:8080

from within the container. How does this work?

To solve the puzzle, go back to a terminal on the host system and take a look at the routing table.

$ ip route show
default via 192.168.178.1 dev enp4s0  proto static  metric 100 
169.254.0.0/16 dev enp4s0  scope link  metric 1000 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
192.168.178.0/24 dev enp4s0  proto kernel  scope link  src 192.168.178.27  metric 100 

We see that docker has apparently added an additional routing table entry and has created an additional networking device – docker0 – to which all packages with destination in the class B network 172.17.0.0 are sent.

This device is a so called bridge. A (software) bridge is very similar to an ethernet bridge in hardware. It connects two or more devices – each packet that goes to one device is forwarded to all other devices connected to the bridge. The Linux kernel offers the option to establish a bridge in software which does exactly the same thing.

Let us list all existing bridges using the brctl utility.

$ brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242fd67b17b	no		veth1f8d78c
							vetha1692e1

Let us compare this with the output of the ifconfig command (again on the host). The corresponding output is

veth1f8d78c Link encap:Ethernet  HWaddr 56:e5:92:e8:77:1e  
          inet6 addr: fe80::54e5:92ff:fee8:771e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1034 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:70072 (70.0 KB)  TX bytes:1852757 (1.8 MB)

vetha1692e1 Link encap:Ethernet  HWaddr ca:73:3e:20:36:f7  
          inet6 addr: fe80::c873:3eff:fe20:36f7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1133 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2033 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:78715 (78.7 KB)  TX bytes:1860499 (1.8 MB)

These two devices are also appearing in the output of brctl show and are two devices that are connected to the bridge. These devices are called a virtual ethernet devices. They are always created in pairs and act like a pipe: traffic flowing in at one of the two devices appears to come out of the second device and vice versa – like a virtual network cable connecting the two devices.

We just said that virtual devices are always created in pairs. We have two containers, and if you start them one by one and look at the output of ifconfig, we see that each of the two containers contributes one device. Can that be correct? After starting the first container we should already see a pair, and after starting the second one we should see four instead of just two devices. So one in each pair is missing. Where did it go?

The answer is that Docker did create it, but move it into the namespace of the container, so that it is no longer visible on the host. To verify this, we can see the connection between the two interfaces as follows. First, enter

ip link

on the host. The output should look similar to the following lines

$ ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp4s0:  mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 1c:6f:65:c0:c9:85 brd ff:ff:ff:ff:ff:ff
3: docker0:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:fd:67:b1:7b brd ff:ff:ff:ff:ff:ff
25: vetha1692e1@if24:  mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether ca:73:3e:20:36:f7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: veth1f8d78c@if26:  mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 56:e5:92:e8:77:1e brd ff:ff:ff:ff:ff:ff link-netnsid 1

This will show you (the very first number in each line) the so called ifindex which is a unique identifier for each network device within the kernel. In our case, the virtual ethernet devices visible in the host namespace have the indices 25 and 27. After each name, after the “at” symbol, you see a second number – 24 and 26. Now let us execute the same commmand in the first container.

/usr/local/apache2 # ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
24: eth0@if25:  mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff

Here suddenly the device with index 24 appears, and we see that it is connected to device if25, which is displayed as vetha1692e1 in the host namespace! Similarly, we find the device if26 inside container two:

/usr/local/apache2 # ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
26: eth0@if27:  mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff

This gives us now a rather complete picture of the involved devices. Ignoring the loopback devices for a moment, the following picture emerges.

dockernetworking.png

Now we can understand what happens when an application in container 1 sends a packet to container 2. First, the kernel will inspect the routing table in container 1. This table looks as follows.

/usr/local/apache2 # ip route
default via 172.17.0.1 dev eth0 
172.17.0.0/16 dev eth0  src 172.17.0.2

So the kernel will determine that the packet should be sent to the interface known as eth0 in container 1 – this is the interface with the unique index 24. As this is part of a virtual ethernet device pair, it appears on the other side of the pair, i.e. at the device vetha1692e1. This device in turn is connected to the bridge docker0. Being a bridge, it will distribute the packet to all other attached devices, so it will reach veth1f8d78c. This is now one endpoint of the second virtual ethernet device pair, and so the packet will finally end up at the interface with the unique index 26, i.e. the interface that is called eth0 in container 2. On this interface, the HTTP daemon is listening, receives the message, prepares an answer and that answer goes the same way back to container 1.

Thus, effectively, it appears from inside the containers as if all container network interfaces would be attached to the same network segment. To complete the picture, we can actually follow the trace of a packet going from container 1 to container 2 using arp and traceroute.

/usr/local/apache2 # arp
? (172.17.0.1) at 02:42:fd:67:b1:7b [ether]  on eth0
? (172.17.0.3) at 02:42:ac:11:00:03 [ether]  on eth0
/usr/local/apache2 # traceroute 172.17.0.3
traceroute to 172.17.0.3 (172.17.0.3), 30 hops max, 46 byte packets
 1  172.17.0.3 (172.17.0.3)  0.016 ms  0.013 ms  0.010 ms

We can now understand how applications in different containers can communicate with each other. However, what we have discussed so far is not yet sufficient to explain how an application on the host can access a HTTP server running inside a container, and what additional setup we need to access a HTTP server running in a container from the LAN. This will be the topic of my next post.

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s