Recently, I picked up an old project of mine – implementing a Unix like operating kernel from scratch. I will post more on this later, but one of the first things I stumbled across when browsing my old code and my old documentation was the networking stack. I used this as an opportunity to refresh my own understanding of networking basics, and reckoned it might help others to post my findings here. So I decided to write a short series of posts on the basics of networking – ethernet, ARP, IP, TCP/UDP and all that stuff.
Before we start to get into details on the different networking technologies, let me first explain the idea of a layered networking architecture.
Ultimately, networking is about physical media (copper cables, for instance) connecting physical machines. Each of these machines will be connected to the network using a network adapter. These adapters will write messages into the network, i.e. create a certain sequence of electrical signals on the medium, and read other messages coming from the network.
However, the magic of a modern networking architecture is that the same physical network can be used to connect different machines with different operating systems using different networking protocols. When you use your web browser to connect to some page on the web – this one for instance – your web browser will use a protocol called HTTP(S) for that purpose. From the web browsers point of view, this appears to be a direct connection to the web server. The underlying physical network can be quite different – it can be a combination of Ethernet or WLAN to connect your PC to a router, some technology specific to your ISP or even a mobile network. The beauty of that is that the browser does not have to care. As often in computer science, this becomes possible by an abstract model organizing networking capabilities into layers.
The lowest layer in this model is called the link layer. Roughly speaking, this layer is the layer which is responsible for the actual physical connection between hosts. This covers things like the physical media connecting the machines and the mechanisms used to avoid collisions on the media, but also the addressing of network interfaces on this level.
One of the most commonly used link layer protocols is the Ethernet protocol. When the Ethernet protocol is used, hosts in the network are addressed using the so-called MAC address which uniquely identifies a network card within the network. In the Ethernet protocol (and probably in most other protocols), the data is transmitted in small units called packets or frames. Each packet contains a header with some control data like source and destination, the actual data and maybe a checksum and some end-of-data marker.
Now Ethernet is the most common, but by far not the only available link layer protocol. Another protocol which was quite popular at some point at the end of the last century is the Token Ring protocol, and in modern networks, a part of the path between two stations could be bridged by a provider specific technology or a mobile network. So we need a way to make hosts talk to each other which are not necessarily connected via a direct Ethernet link.
The approach taken by the layered TCP/IP model to this is as follows. Suppose you have – as in the diagram below – two sets of machines that are organized in networks. On the left hand side, we see an Ethernet network with three hosts that can talk to each other using Ethernet. On the right hand side, there is a smaller network that consists of two hosts, also connected with each other via Ethernet (this picture is a bit misleading, as in a real modern network, the topology would be different, but let us ignore this for a moment).
Both networks are connected by a third network, indicated by the dashed line. This network can use any other link layer technology. Each network contains a dedicated host called the gateway that connects the network to this third network.
Now suppose host A wants to send a network message to host B. Then, instead of directly using the link layer protocol in its network, i.e. Ethernet, it composes a network message according to a protocol that is in the second layer of the TCP/IP networking model, called the Internet layer which uses the IP protocol. Similar to an Ethernet packet, this IP packet contains again a header with target and source address, the actual data and a checksum.
Then host A takes this message, puts that into an Ethernet packet and sends it to the gateway. The gateway will now extract the IP message from the Ethernet packet, put it into a message specific to the networking connecting the two gateways and transmit this message.
When the message arrives at the gateway on the right hand side, this gateway will again extract the IP message, put it into an Ethernet message and send this via Ethernet to host B. So eventually, the unchanged IP message will reach host B, after traveling through several networks, piggybacked on messages specific to the used network protocols. However, for applications running on host A and B, all this is irrelevant – they will only get to see IP messages and do not have to care about the details and layer below.
In this way, the internet connects many different networks using various different technologies – you can access hosts on the internet from your mobile device using mobile link layer protocols, and still communicate with a host in a traditional data center using Ethernet or something else. In this sense, the internet is a network of networks, powered by the layer model.
But we are not yet done. The IP layer is nice – it allows us to send messages across the internet to other hosts, using the addresses specific to the IP layer (yes, this is the IP address). But it does, for instance, not yet guarantee that the message ever arrives, nor does it provide a way to distinguish between different communication end points (aka ports) on one host.
These items are the concern of the next layer, the transport layer. The best known example of a transport layer protocol is TCP. On top of IP, TCP offers features like stream oriented processing, re-transmission of lost messages and ports as additional address components. For an application using TCP, a TCP connection appears a bit like a file into which bytes can be written and out of which bytes can be read, in a well defined order and with guaranteed delivery.
Finally, the last layer is the application layer. Common application layer protocols are HTTP (the basis of the world wide web), FTP (the file transfer protocol), or SMTP (the mail transport protocol).
The transitions between the different layers work very much like the transition between the internet layer and the link layer. For instance, if an application uses TCP to send a message, the operating system will take the message (not exactly true, as TCP is byte oriented and not message oriented, but let us gracefully ignore this), add a TCP header, an IP header and finally an Ethernet header, and ask the network card to transmit the message on the Ethernet network. When the message is received by the target host, the operating system of that host strips off the various headers one by one and finally obtains the original data sent by the first host.
The part of the operating system responsible for this mechanism is naturally itself organized in layers – there could for instance be an IP layer that receives data from higher layers without having to care whether the data represents a TCP message or a UDP message. This layer then adds an IP header and forwards the resulting message to another layer responsible for operating the Ethernet network device and so forth. Due to this layered architecture stacking different components on top of each other, the part of the operating system handling networking is often called the networking stack. The Linux kernel, for instance, is loosely organized in this way (see for instance this article).
This completes our short overview of the networking stack. In the next few posts, we will look at each layer one by one, getting our hands dirty again, i.e. we will create and inspect actual network messages and see the entire stack in action.