Most careers in operating system development probably start with a seemingly simple task – produce a program that, at start time, takes full control of a computer and prepares it for the execution of the actual operating system, i.e. boot the computer. It turns out, however, that – thanks to the complexity of modern x86-based hardware – this is easier said than done. In this post, we will look at the boot process in a bit more detail.
History of the boot process
To understand the boot process in a modern PC, it is useful to go back into the past a bit. The first home computer that I owned was an Amstrad CPC 464 with a Zilog Z80 CPU. Like any other CPU I am aware of, this CPU stores the memory location of the next instruction to be executed in a register, called – in this case – the program counter (PC). When you turned on that machine, the CPU was in its initial state, with the PC set to its initial value 0x0000. At this point in the address space, there was no RAM, but a ROM that contained the CPC firmware. The so-called restart block was located at address 0x0000; it took over and initialized the system. And let the fun begin … (picture taken on the fabulous CPCBox, a browser-based CPC emulator).
This mechanism is straightforward and fast – when you turn on the machine, the operating system is available immediately and you can start to work with it. However, there are obvious disadvantages, the most important being that the firmware is hard to update once a machine has shipped, or even late in the manufacturing process.
The developers of the famous IBM PC XT decided to use a different approach. Instead of holding the full operating system, the ROM contained only a firmware called the BIOS: a small set of routines needed for some basic operations, like reading from the hard drive or a floppy disk, and a short startup routine. When the system comes up, the CPU again starts execution at some predefined point in memory, where the BIOS startup routine is located. This routine would then typically execute some self-tests and then load the actual operating system from the hard drive or a floppy disk.
Loading the operating system happens in two phases. In the first phase, the BIOS loads the so-called boot loader. The boot loader is located at a fixed location on the hard drive, specifically in the first sector (called the master boot record (MBR)). As this 512-byte sector also holds the partition table and a two-byte signature, the boot loader has to fit into a bit less than 512 bytes. This boot loader is brought into memory and then executed. It is then responsible for locating the actual operating system and loading it.
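To make the layout of that first sector a bit more tangible, here is a minimal Python sketch that checks and decodes a classic MBR. It assumes the traditional layout (boot code in bytes 0 to 445, four 16-byte partition entries starting at offset 446, and the signature bytes 0x55 0xAA at the very end). The function names and the synthetic demo sector are made up for illustration and are not taken from any particular boot loader.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # bytes 0..445 are available for boot code
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # stored in bytes 510 and 511


def parse_mbr(sector):
    """Return the non-empty partition entries of a classic MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("boot signature 0x55 0xAA not found")
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        # Entry layout: status (1), CHS start (3), type (1), CHS end (3),
        # first LBA sector (4), sector count (4), all little-endian.
        status, ptype, first_lba, count = struct.unpack_from(
            "<B3xB3xII", sector, start
        )
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append(
                {
                    "index": index,
                    "bootable": status == 0x80,
                    "type": ptype,
                    "first_lba": first_lba,
                    "sectors": count,
                }
            )
    return partitions


def make_demo_sector():
    """Build a synthetic MBR with one bootable partition of type 0x83."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into(
        "<B3xB3xII", sector, PARTITION_TABLE_OFFSET, 0x80, 0x83, 2048, 204800
    )
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    print(parse_mbr(make_demo_sector()))
```

To try this on a real disk, you would read the first 512 bytes of the block device or of a disk image instead of calling the hypothetical `make_demo_sector` helper, which typically requires elevated privileges.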
So far, this is comparatively simple. Unfortunately, PC hardware then started to evolve, and with every step in the evolution, the boot process had to be adapted so that it could make use of new features but still remain backwards compatible. For instance, when the x86 CPU gained the so-called protected mode (introduced with the 80286 in 1982 and extended to 32 bits with the 80386 in 1985), the BIOS continued to operate in the older real mode to stay compatible with legacy operating systems. When an operating system wanted to make use of the new features of protected mode, either the boot loader or the early stages of the operating system kernel had to switch the CPU into protected mode. However, as the BIOS is still 16-bit real-mode code, its routines can no longer be called directly once that switch has been completed. In particular, reading from a hard drive is no longer easily possible until the initialization of the operating system has reached a certain point. This makes for an interesting chicken-and-egg problem: if the operating system that needs to be loaded exceeds a certain size, you have to switch to protected mode to be able to use the full memory, but once this is done, it becomes much more complicated to access the hard drive to load the operating system files.
To solve this, several sophisticated boot mechanisms were implemented by various boot loaders. One approach was to compress the operating system image so that it could be loaded into memory while the BIOS was still accessible, then perform the switch to protected mode and then decompress the image. Another approach is to boot the system in several stages. The boot code for stage 0 would, for instance, be stored in the MBR, which would then load a stage 1. This stage 1 would contain sufficient functionality to read data from a standard file system, including a hard disk driver. It would then switch to protected mode and use this functionality to load stage 2, which would then start the actual operating system. The GRUB bootloader, for instance, uses a similar (but slightly different) mechanism.
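The memory limit behind all these workarounds comes from the way real mode forms addresses: a 16-bit segment is multiplied by 16 and added to a 16-bit offset, so real-mode code can reach only slightly more than 1 MiB. The following short Python sketch illustrates that arithmetic. The function name is made up for this post, and the address 0x7C00 used in the demo is the conventional location to which a PC BIOS loads the boot sector, which is not discussed further here.

```python
def real_mode_address(segment, offset):
    """Translate a real-mode segment:offset pair into a linear address."""
    for value in (segment, offset):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("segment and offset are 16-bit values")
    # The segment is shifted left by four bits (multiplied by 16)
    # and the offset is added on top.
    return (segment << 4) + offset


if __name__ == "__main__":
    # Two different segment:offset pairs can name the same byte.
    print(hex(real_mode_address(0x07C0, 0x0000)))  # 0x7c00
    print(hex(real_mode_address(0x0000, 0x7C00)))  # 0x7c00
    # The highest address that can be formed at all.
    print(hex(real_mode_address(0xFFFF, 0xFFFF)))  # 0x10ffef
```

Anything that does not fit below that upper bound, such as a large kernel image, cannot be placed at its final location while the CPU is still in real mode.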
Booting a modern PC
Around 2005, the industry started to introduce the UEFI as the next generation of standardized PC firmware, aiming to become the successor of the outdated BIOS. The UEFI is much more powerful than the BIOS. Instead of loading one sector from a specified location on the hard drive, the UEFI is able to read from a file system. Thus the initial operating system image can be an ordinary executable file, stored on a file system. The UEFI will search certain standard directories for files with predefined names (like BOOTX64.EFI) and execute such a file if it is found. This file then executes in a UEFI environment in 32-bit protected mode or 64-bit long mode and has access to UEFI services to perform operations like input, output or hard drive access. These services can then be used to load the operating system. Eventually, control of the system is handed over to the operating system, and from that point on the UEFI boot services must no longer be used.
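The lookup by predefined name can be mimicked from user space. The Python sketch below searches a directory tree that stands in for the file system the firmware reads, and looks for the fallback loader under `EFI/BOOT`, which is where the UEFI specification places files like BOOTX64.EFI on removable media. This is only an illustration of the idea and not how any firmware is implemented; the function names and the small architecture table are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Fallback loader names for a few architectures.
FALLBACK_NAMES = {
    "x86_64": "BOOTX64.EFI",
    "i386": "BOOTIA32.EFI",
    "aarch64": "BOOTAA64.EFI",
}


def _child_ignoring_case(directory, name):
    """Return the entry of `directory` matching `name`, ignoring case."""
    if not directory.is_dir():
        return None
    for child in directory.iterdir():
        if child.name.upper() == name.upper():
            return child
    return None


def find_fallback_loader(esp_root, arch="x86_64"):
    """Return the path of EFI/BOOT/<fallback name> below esp_root, or None."""
    current = Path(esp_root)
    for part in ("EFI", "BOOT", FALLBACK_NAMES[arch]):
        current = _child_ignoring_case(current, part)
        if current is None:
            return None
    return current


if __name__ == "__main__":
    # Build a throwaway directory tree that plays the role of the partition.
    with tempfile.TemporaryDirectory() as tmp:
        boot_dir = Path(tmp) / "EFI" / "BOOT"
        boot_dir.mkdir(parents=True)
        (boot_dir / "BOOTX64.EFI").write_bytes(b"")
        print(find_fallback_loader(tmp))
```

The names are compared without regard to case because the partition is normally formatted with a FAT file system, where `bootx64.efi` and `BOOTX64.EFI` refer to the same file.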
However, the BIOS used to be more than just a set of routines – it also contained several configuration tables that provided basic information on the hardware relevant for the operating system, like the available graphics modes, the number and type of available CPUs, the available memory and reserved memory areas, and information on the configuration of the system interrupts. This information is now presented to an operating system as part of the ACPI (Advanced Configuration and Power Interface). The ACPI is a complex collection of tables, some of which are static and some of which are dynamic, i.e. contain byte code that needs to be run by a bytecode interpreter.
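An operating system finds the ACPI tables via a small root structure, the RSDP, which starts with the signature "RSD PTR " and is protected by a checksum: the first 20 bytes must add up to zero modulo 256. On BIOS systems this structure is located by scanning memory on 16-byte boundaries. The following Python sketch shows that validation and scan on a plain byte buffer. It only covers the 20-byte ACPI 1.0 part of the structure, and the helper that builds a synthetic RSDP, including its OEM id and table address, is invented for the demo.

```python
import struct

RSDP_SIGNATURE = b"RSD PTR "   # eight bytes, including the trailing space
RSDP_V1_LENGTH = 20


def is_valid_rsdp(blob):
    """Check signature and checksum of the ACPI 1.0 part of an RSDP."""
    if len(blob) < RSDP_V1_LENGTH or blob[:8] != RSDP_SIGNATURE:
        return False
    # All 20 bytes, checksum byte included, must sum to 0 modulo 256.
    return sum(blob[:RSDP_V1_LENGTH]) % 256 == 0


def find_rsdp(memory):
    """Scan a buffer on 16-byte boundaries; return the RSDP offset or None."""
    for offset in range(0, len(memory) - RSDP_V1_LENGTH + 1, 16):
        if is_valid_rsdp(memory[offset:offset + RSDP_V1_LENGTH]):
            return offset
    return None


def rsdt_address(blob):
    """Extract the 32-bit physical address of the RSDT (bytes 16..19)."""
    return struct.unpack_from("<I", blob, 16)[0]


def make_demo_rsdp(address):
    """Build a synthetic ACPI 1.0 RSDP with a correct checksum."""
    # signature (8), checksum (1), OEM id (6), revision (1), RSDT address (4)
    body = bytearray(
        RSDP_SIGNATURE + b"\x00" + b"DEMO  " + b"\x00" + struct.pack("<I", address)
    )
    body[8] = (-sum(body)) % 256
    return bytes(body)


if __name__ == "__main__":
    memory = bytes(32) + make_demo_rsdp(0x7FE0000) + bytes(12)
    offset = find_rsdp(memory)
    print(offset, hex(rsdt_address(memory[offset:])))
```

A real kernel would run the same kind of scan over the physical memory ranges named in the ACPI specification, or, on UEFI systems, obtain the address of the RSDP from the firmware instead of searching for it.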
So far, the description has focused on the standard case for a home PC – booting from local storage like a hard drive, a USB stick or a CD-ROM. However, several methods exist to boot over a network as well. The PXE boot protocol, for example, is a mechanism to boot an operating system over a network. When PXE boot is used, either a boot loader or a sufficiently modern BIOS initializes the client's network card and establishes a connection to a server. The server runs a TFTP server (TFTP, the Trivial File Transfer Protocol, is a much simpler alternative to FTP), and the client uses TFTP to transfer an operating system image over the network. This is the technology behind thin clients, i.e. clients that only contain a ROM-based firmware capable of executing a PXE-based boot process, but no local storage and no local copy of the operating system.
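To give an impression of how simple TFTP is, here is a small Python sketch of the three message types a client needs for a download, following the packet layout in RFC 1350: a read request, a data packet and an acknowledgement. It only builds and parses the bytes and does not open any network connection. The function names and the file name `kernel.img` are placeholders chosen for this example.

```python
import struct

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4   # opcodes defined by RFC 1350
BLOCK_SIZE = 512                     # payload size of a full data packet


def build_read_request(filename, mode="octet"):
    """RRQ: 2-byte opcode, file name, zero byte, transfer mode, zero byte."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def parse_data(packet):
    """DATA: 2-byte opcode, 2-byte block number, up to 512 payload bytes."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError("not a DATA packet")
    return block, packet[4:]


def build_ack(block):
    """ACK: 2-byte opcode followed by the block number being confirmed."""
    return struct.pack("!HH", OP_ACK, block)


def is_last_block(payload):
    """A payload shorter than 512 bytes ends the transfer."""
    return len(payload) < BLOCK_SIZE


if __name__ == "__main__":
    print(build_read_request("kernel.img"))
    block, payload = parse_data(b"\x00\x03\x00\x01" + b"boot")
    print(block, payload, build_ack(block), is_last_block(payload))
```

In an actual transfer, the client sends the read request as a UDP datagram, and every data packet it receives is answered with the matching acknowledgement until a short block arrives.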
To get an idea of what the full boot process looks like on a modern PC, let us take a look at the sequence of steps that are executed when my homebrew operating system ctOS starts up. Here is a screenshot of the system once the startup sequence is completed (taken using the QEMU emulator).
The most important steps that actually execute in the background and produce this output are as follows.
- The hardware (or the emulator) loads the BIOS (SeaBIOS in this case)
- The BIOS locates the boot loader (GRUB2 in my case) on the hard drive and loads it in several stages
- GRUB2 starts to execute on the first CPU in the system, called the bootstrap CPU
- When GRUB2 hands over control to the operating system, the system is already in protected mode, but certain system tables are in a preliminary or incomplete state, so these tables are initialized first
- Then all components of the kernel are initialized. This includes device drivers, interrupts, the memory manager, the process manager and the TCP/IP stack.
- Next, all the other CPUs in the system are initialized and put into protected mode
- Finally, the root process is started and the command line interface is located on the disk and started – at this point the boot process is complete and the user takes over control of the system
During kernel initialization, the system will also read the BIOS or ACPI configuration tables and detect and initialize other hardware components like the interrupt controller, the system clock, the keyboard, or attached PCI devices like hard drives and network cards.
Of course, for a more advanced operating system, the actual boot process would do much more – loading a graphical user interface, for instance, or starting system daemons. However, technically, once we get to the point where a user space program can execute, most of the low-level subtleties are mastered and we can work in a stable environment.
This concludes our discussion of the boot process. With the next post in this series, I will start our trip through the various components of an operating system. Happy booting!