Gilad Ben-Yossef July 8th, 2008
One of the tasks embedded system developers face is managing boot latency, or the time it takes for a device to become functional after power up.
After all, a set top box that will take more then 60 seconds from the time the “On” button is pressed until the end customer can at least interact with the menus, for example, will most likely be returned to the store for a refund, no matter what feature set it has. When it come to our gadgets, we are all hungry for immediate satisfaction, it seems.
There are many tricks one can employ to to achieve boot time nirvana, but as Knuth taught us, premature optimization is the root of all evil. Therefore, before we turn our efforts to optimize boot latency, we should first try to measure what that boot latency really is and what part of the boot sequence contribute to it, lest we optimize the wrong thing.
Generally speaking, on a x86 (32bit or 64 bit), the boot process of a Linux system is comprised of the following phases and milestones:
- The power up milestone: when the power is set to on
- The BIOS phase: which includes the POST (or Power On Self Test), device initialization, running of option ROMs and loading the boot loader form the MBR (or Master Boot Record).
- The boot loader phase: loading of an operating system kernel and ancillary data (such as Linux initrd or initramfs) into RAM.
- Kernel initialization phase: initialization of CPU, peripherals and kernel data structures, including the bring up of the non boot cores in the case of a multi-core machine.
- First user application first line of code milestone: the time when the first line of user application source code is executed.
Unfortunately, measuring the contribution of each of these phases to the overall boot latency is not an easy task to accomplish. At each of the phases different kind of code is executing on the machine: from the 16 bit BIOS code which is part of the machine firmware, via the 16 or 32 bit boot loader code, the 32 bit kernel code and the finally user applications - each of these is executing in a completely different software environment from the other and it is hard to find a common ground to compare the time each phase takes.
Luckily, the x86 architecture provides a useful tool: the TSC (or Time Stamp Clock) register. The TSC register was introduced in the original Intel Pentium CPU and counts the number of clock ticks from the last processor reset. Reading the current value of the TSC register is done using the RDTSC instruction.
Assuming no processor frequency changes (for example via SpeedStep technology) takes place and that we always sample the register of the same core in a multi core environment, both of which are easy to guarantee during the boot phase, the TSC register provides us with an accurate hi-res timer through which we can benchmarks the various phases in the boot process.
Continue Reading »