If you wanted a slogan that summarizes key trends in the IT industry over the last 30+ years, then “everything is virtual” would be a good candidate. In todays computing environments, essentially every physical resource is virtualized – and pretty much the first resource where this happened in mainstream computing was memory. In this post, we will look a bit into virtual memory: what it is, how it is implemented and how it is used.
To introduce the concept, suppose you were running a document storage service. Maybe you own a large warehouse with thousands of little boxes, and customers can hand in envelopes with documents that you will then safely store in one of the boxes. Obviously, you need a smart way to manage all these little envelops and boxes.
The first approach that someone might come up with is very simple. Suppose that your warehouse contains 10.000 boxes and each box can store one envelope. You could then number the boxes, say from zero to 9.999, and ask customers to label their envelops with numbers in that range. When a customer hands in an envelope that has, say, 11 on it, then an agent accepts the envelope and simply puts it into the box that has exactly that number, i.e. 11 in this case.
In the world of computing, this is how physical memory addresses work. Of course, the agent corresponds the memory controller and the customer to the CPU. When the CPU wants to store some data in memory, it passes that data (the envelope) along with the address where the data should be stored to the memory controller, and the memory controller stores the data at the requested address.
That model is simple, and the first customer will love it – the customer can freely decide about where the envelope should be placed, and feels like owning the entire warehouse exclusively.
However, suppose you are the second customer. If, for some reason, you also want to store an envelope in box 11, there is a problem as this box is already taken. Similar problems are caused by physical memory addressing if you want to implement multitasking, i.e. allow several programs (customers) to use resources of the PC (i.e. memory) simultaneously. If, for instance, you want to run two programs that are both designed to start at address 0x100 in memory, the first program will properly load and execute, but it will consume this area of memory and you will not be able to load and run the second program. This is the mode that most early CPUs used, for instance the 8086 used in the first IBM PCs (and todays CPUs still start in the so-called real mode that uses the same addressing pattern with a bit of a twist).
So for multitasking, you need a different model. Let us look at our warehouse again. Instead of just storing the envelope in the box designated by the label on the envelope, the agent could, for each and every customer, maintain a mapping table. When a customer hands in an envelope, say again with label 11, the agent would locate an unused box. This can have any label, say it is 3. The agent would then add an entry to the mapping table that maps – for this customer – label 11 to box 3, and store the envelope in box 3. If a second customer also hands in an envelope with label 11, the agent would add a similar entry to the mapping table for this customer, this time mapping 11 to – say – box 7
Similarly, if a customer requests to retrieve an envelope, say again envelope 11, the agent would consult the mapping table for this specific customer and see that the envelope is located in box 3, so that it can be located and handed over to the customer. The agent could then either mark the box as unused again or agree with the customer to reserve the space for later use.
Of course, this requires some overhead – the agent needs to maintain mapping tables, one table per customer. But assuming that there is still enough space left in the warehouse, every customer still feels like owning the entire warehouse – in fact, the customer is not even able to detect a difference to the first system. And there are more interesting opportunities that the system offers. If, for instance, a new customer arrives but the warehouse is full, the agent could locate a box that has not been used for a while, transfer the content of this box into some other storage room and use the box again.
Translated back to the world of computing, this model corresponds to virtual memory addresses. In that model, an additional translation unit sits between the CPU and the memory controller. This unit uses a system of mapping tables – called the page tables – to map forth and back between virtual and physical addresses. An operating system implements one different set of tables, i.e. a different mapping, for each process. Thus each process is like the customer in our warehouse example and logically can access the entire virtual memory space, even if other processes run at the same time. The operating system can even swap out areas of memory, i.e. if physical memory is exhausted, it could copy parts of its content onto a slower medium, say the hard drive, and reallocate that space – a mechanism which is usually called swapping.
There is much more that we can do having virtual memory at our disposal. We could, for instance, implement a mechanism called copy-on-write. Suppose that you wanted to copy a large area of physical memory which can be quite time consuming. Instead of copying, you could simply adapt the address mapping such that different virtual addresses point to the same physical address. For the CPU, which only sees virtual addresses, it appears like if the content had copied – until, of course, the CPU tries to change the copy. So the operating system needs to listen for writes into this memory area, and only if that write takes place create an actual physical copy. If we are lucky, only a comparatively small area of the copied memory is actually written to, and then copy-on-write can be much more efficient in terms of performance and overall memory utilization than a plain physical copy.
Designing the structure of the mapping tables requires some care. Of course, we cannot map every single byte of memory – storing that mapping information alone would consume the entire system memory. Instead, the memory is divided into small chunks called pages. Traditionally, a page is 4096 bytes, but other page sizes are feasible. We then map page by page, i.e. for each page, we map its starting address in virtual address space to a corresponding physical page.
In fact, this mapping is usually organized as a hierarchy of page tables, with the lowest level being the so-called page table directory. Each page table has 1024 entries, each of which describes the mapping of one page, so each page table is able to hold the mapping for 4 MB of memory. Thus we need 1024 page tables to describe the full address space accessible with 32 bits. The page table directory is then another table that simply holds the address of these 1024 page tables, as illustrated below.
In practice, there is not only one page table directory, but several page table directories, such that each process uses a different set of page table and hence a different mapping of physical and virtual addresses, i.e. a different virtual address space. In fact, for most operating systems, a process and an address space are almost synonyms. For Unix like operating systems, each process has a separate address space, but all threads running within this process share this address space which is an important conceptual difference between threads and processes.
To be able to use several page table directories, we need a way to switch between them. On the x86 platform, this is done by loading the address of the page table directory into a special register of the CPU called CR3. When we switch forth and back between two threads or tasks that do not belong to the same process, the operating system needs to make sure that the register CR3 is loaded with the address of the new page table directory corresponding to the process to which we switch. As every process has its own virtual-to-physical address mapping, the address spaces of two different processes are very effectively isolated and we can make sure that process A does not read or write memory allocated by process B and vice versa.
There is one problem, though. Changing the value of CR3 is, at the end of the day, an ordinary instruction. If every process could execute this instruction, a process could reload CR3 with a pointer to a manipulated page table. Thus, every process would be able to change the mapping between virtual and physical memory, effectively destroying the isolation.
The solution is to introduce privilege levels. Each process is running at one of these levels, and the operating system is running at the highest level. We could then make the instructions that manipulate CR3 privileged instructions that only the operating system is allowed to execute, and thus only the operating system could change page tables. These different privilege levels have been introduced into the x86 architecture as part of the protected mode that is also required to implement virtual memory and which will be the topic of one of the next posts in this series.