As a Unix derivative, the Linux operating system is organized into a kernel and a number of user-level programs. It has been the experience of the author that a large number of userland programs which are already known to work under Linux on some other architecture (say, i386 or ppc) can be ported to the ARM with relatively little effort.1 As such, most of the material in this report will concern itself with the kernel. Having said this, it is worth noting that none of the ARM Linux kernel work could take place without the efforts of Philip Blundell on the ARM ports of the GNU C library, GNU binutils, and GNU Compiler Collection (gcc). These resources presently support version 4 of the ARM architecture, which the SA-1110 implements.
The Linux kernel was not originally intended to support architectures beyond the 80386 found in the personal computer of its creator. Although a number of steps to correct this have been taken, evidence of the i386 bias remains easy to find in the current release. For example, many device drivers assume the existence of an I/O address space, and dedicated I/O instructions through which to access this region. Nevertheless, many key kernel services are now sufficiently architecture-independent as to enable new ports -- such as the SA-1110 work described herein -- to proceed in a straightforward manner.
In order to present the portions of the kernel which have relevance to a discussion of the SA-1110 port, the following subsections describe the bootstrap process on a generic StrongARM SA-1100 or SA-1110 system. Beginning with the boot loader, we review the low-level hardware initialization steps which are required for the kernel to begin execution. Next discussed is a feature of Linux which allows the kernel to boot from a compressed program image. Following this will be a summary of kernel memory initialization and low-level resource setup, such as that for interrupts. The overview concludes with an explanation of a typical method for launching userland.
The StrongARM SA-1100 microprocessor, upon emerging from reset, begins executing code from the base of static memory bank 0, which is generally mapped to some quantity of non-volatile flash RAM.2 Because of this convention, a design will generally involve a program, stored in bank 0, which maintains responsibility for certain hardware diagnostics, communication with a workstation development environment, and the task of populating physical memory with kernel images and other data. This program, referred to as the ``boot loader'' in this document,3 changes infrequently, and if corrupted, generally renders the system unusable.
The StrongARM SA-1100 processor includes features which allow it to operate at a number of discrete multiples of a core clock frequency. Because this has implications for the memory access times presented to the available RAM components, one of the first responsibilities of a boot loader is to properly configure the processor clock and memory access settings. Once the RAMs are accessible, programmatic constructs such as a heap or stack can be initialized, if so desired.4
Following memory initialization, the boot loader may enable interrupts from the processor serial port, which will allow communication with the development environment. Multiple serial ports are available on the StrongARM SA-1100, and it is at the discretion of the boot loader developer to assign a role to each one. For example, the boot loader might accept command and data traffic on one UART, while displaying debugging information over another. The choice of which port to configure, and the selection of parameters used (e.g., data rate, parity), must be consistent with the assumptions made in the kernel startup code.5 Once a serial connection is available, the boot loader may accept commands from a development host. One type of command might instruct the boot loader to accept a sequence of data over the serial connection and write that data to a location in memory, such as RAM or flash RAM. Another command might direct the boot loader to perform a series of hardware diagnostics, such as a memory integrity test. A third command might direct the processor to reset and begin executing instructions at a specified location in memory.
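The three kinds of command described above can be illustrated with a small model. This is only a sketch: the command names (`load`, `memtest`, `go`) and the `BootLoader` class are hypothetical and are not taken from any particular loader, and a `bytearray` stands in for RAM or flash RAM.

```python
class BootLoader:
    """Toy model of a serial-command boot loader (all names are hypothetical)."""

    def __init__(self, size):
        self.memory = bytearray(size)  # stands in for RAM or flash RAM
        self.entry = None              # address that execution would jump to

    def load(self, addr, data):
        """Write data received over the serial link to memory at addr."""
        if addr < 0 or addr + len(data) > len(self.memory):
            raise ValueError("load outside of memory")
        self.memory[addr:addr + len(data)] = data
        return len(data)

    def memtest(self):
        """Write and read back two patterns at every address, restoring contents."""
        for addr in range(len(self.memory)):
            saved = self.memory[addr]
            for pattern in (0x55, 0xAA):
                self.memory[addr] = pattern
                if self.memory[addr] != pattern:
                    return False
            self.memory[addr] = saved
        return True

    def go(self, addr):
        """Record the address at which execution would begin."""
        if not 0 <= addr < len(self.memory):
            raise ValueError("entry point outside of memory")
        self.entry = addr

    def dispatch(self, command, *args):
        """Route one command from the development host to its handler."""
        handlers = {"load": self.load, "memtest": self.memtest, "go": self.go}
        if command not in handlers:
            raise ValueError("unknown command: " + command)
        return handlers[command](*args)
```

A real loader would parse these commands from the UART byte stream; the table-driven dispatch is simply one convenient way to keep the command set extensible.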
Not all systems will rely on an attached host for operation. An independent system may boot by copying a program image from non-volatile storage (such as flash RAM or a rotating disk) into RAM and jumping to that program. A system may also initialize a network interface, such as a wireless LAN adapter, and request a program image from a network file server.
Regardless of how the kernel image is obtained, ARM Linux imposes three requirements on the boot loader which must be satisfied before the kernel may be entered: the memory management unit must be disabled, register r0 must contain the value 0x0, and register r1 must contain a unique architecture identifier.6 No other information is communicated to the kernel beyond these two register values. Once these preconditions have been met, the boot loader is free to jump to the beginning of the kernel image and begin executing instructions.
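The entry requirements can be written down as a simple check. The sketch below models processor state with a hypothetical `CpuState` record; `EXAMPLE_ARCH_ID` is a made-up placeholder and not a real architecture identifier.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Hypothetical snapshot of the state the kernel expects on entry."""
    mmu_enabled: bool
    r0: int
    r1: int


# Placeholder only: a real loader must use the identifier assigned to its board.
EXAMPLE_ARCH_ID = 0x1234


def entry_violations(cpu, arch_id):
    """Return the list of unmet preconditions; an empty list means safe to jump."""
    problems = []
    if cpu.mmu_enabled:
        problems.append("MMU must be disabled")
    if cpu.r0 != 0:
        problems.append("r0 must contain 0x0")
    if cpu.r1 != arch_id:
        problems.append("r1 must contain the architecture identifier")
    return problems
```

A loader written along these lines would jump to the kernel image only when `entry_violations` returns an empty list.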
The Linux kernel, configured in a manner typical for a StrongARM host, is now over six megabytes in size.7 For kernel developers, this is unhappy news, as a complete kernel image must be moved over a serial link -- at 115200 bps -- each time a code change is made. Adding an eight-minute delay to the compile-execute-debug cycle seems unattractive, but worse, some systems lack sufficient non-volatile storage to house such a large kernel image.8
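The delay can be estimated with a short calculation. The framing is an assumption here: the sketch compares 8 bits per byte (no overhead) with 10 bits per byte (one start and one stop bit, as in a common 8N1 serial configuration), and takes "six megabytes" to mean 6 x 1024 x 1024 bytes.

```python
def transfer_minutes(image_bytes, bits_per_second, bits_per_byte):
    """Time in minutes to move image_bytes over a serial link."""
    return image_bytes * bits_per_byte / bits_per_second / 60


IMAGE_BYTES = 6 * 1024 * 1024  # assumed meaning of "six megabytes"
LINE_RATE = 115200             # bits per second, as stated in the text

raw_minutes = transfer_minutes(IMAGE_BYTES, LINE_RATE, 8)      # no framing overhead
framed_minutes = transfer_minutes(IMAGE_BYTES, LINE_RATE, 10)  # start + stop bits

print(f"unframed: {raw_minutes:.1f} min, framed: {framed_minutes:.1f} min")
```

Under these assumptions the transfer takes roughly seven to nine minutes, which brackets the eight-minute figure quoted above. Protocol overhead from the download mechanism itself is not modelled.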
Fortunately, the kernel program text is highly compressible -- a size reduction by a factor of ten is not uncommon -- and so a solution to both the delay and storage problems just mentioned seems to have presented itself. Unfortunately, as feature-filled as the StrongARM may be, the ability to execute compressed machine instructions has not yet been included. One could imagine placing decompression features in the boot loader, but not all loaders have been developed with Linux in mind.9 A third solution is to piggyback a small decompression program onto the compressed kernel image.
From the perspective of the boot loader, executing a kernel compressed in this manner, known as a zImage, is no different from executing a ``normal'' program image. The processor jumps to the decompressor, which reads the compressed image from memory, expands the image (using, for example, the Lempel-Ziv algorithm of gzip), and places the resulting instructions and data in physical memory. The decompressor then jumps to the kernel entry point, and kernel startup begins as usual.
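The piggyback arrangement can be modelled in a few lines. This is a sketch of the idea only, not of the actual zImage layout: `STUB` is a hypothetical stand-in for the decompressor's machine code, and Python's standard `gzip` module plays the part of the decompression routine.

```python
import gzip

# Hypothetical stand-in for the decompressor program placed ahead of the payload.
STUB = b"DECOMPRESSOR-STUB"


def build_zimage(kernel):
    """Prepend the decompressor stub to the gzip-compressed kernel image."""
    return STUB + gzip.compress(kernel)


def boot_zimage(zimage):
    """Act as the decompressor: find the payload, expand it, and return the
    kernel image that execution would then jump to."""
    if not zimage.startswith(STUB):
        raise ValueError("not a recognised compressed image")
    return gzip.decompress(zimage[len(STUB):])


# Synthetic, highly repetitive "kernel" used only to exercise the round trip.
kernel = b"\x00" * 4096 + b"kernel text " * 100
zimage = build_zimage(kernel)
restored = boot_zimage(zimage)
```

The boot loader sees only `zimage`, which it loads and enters like any other program; the expansion happens entirely inside the image itself.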
Once the kernel program image is in memory -- regardless of how it got there -- execution begins with a small amount of code written in assembly which initializes the system. The kernel begins by attempting to discover on what type of machine it is executing, first by checking the processor ID register, then by examining the unique architecture identifier passed in register r1. If the system is recognized, the kernel constructs the processor page tables, which include direct mappings for the first few megabytes of the kernel image. Any data in the BSS segment are zeroed, the stack pointer is initialized, and the processor branches to the high-level kernel startup routine, which is written in C.
At this point, the kernel begins to allocate various resources such as memory maps and interrupts. Different system designs may have varying quantities of DRAM, SDRAM, or flash RAM installed at varying locations in the system address space. Further, I/O devices -- either external to the processor (such as a coprocessor), or internal (such as the Peripheral Control Module) -- must have virtual-to-physical mappings in order to be accessible. The kernel selects memory and I/O mapping lists to add to the page table based on knowledge of the system type, which was passed by the boot loader as the architecture identifier. (I/O mappings can also be created or destroyed dynamically.)
Describing how interrupts are handled under Linux on the StrongARM SA-1100 first requires a brief discussion of the processor's interrupt sources. The processor recognizes 32 separate interrupts, which arrive from sources such as the operating system timers, serial ports, and general-purpose I/O (GPIO) pins. Only twelve interrupts are allocated to the 28 GPIO sources, which leads to a potentially confusing interrupt handling strategy. Interrupts 0-10 map directly to edge detects on GPIO pins 0-10, while interrupt 11 is shared by all of GPIO pins 11-27. Any interrupt handling implementation must therefore receive IRQ 11, consult the GPIO edge detect register, and demultiplex the GPIO sources itself. The original method of handling these interrupts was for device drivers to request IRQ 11 in shared mode. At runtime, each driver which had requested this interrupt would be entered, given the chance to examine the GPIO edge detect register, and possibly handle a pending device interrupt. As will be discussed later, this approach has subsequently been replaced by a somewhat easier-to-use mechanism.
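The demultiplexing step that any implementation must perform for IRQ 11 can be sketched as follows. The function names and the `handlers` dictionary are hypothetical, and the sketch assumes that bit n of the GPIO edge detect register corresponds to GPIO pin n.

```python
SHARED_IRQ = 11          # IRQ shared by GPIO pins 11-27
FIRST_SHARED_GPIO = 11
LAST_GPIO = 27


def pending_shared_gpios(edge_detect):
    """Return the GPIO pins in 11-27 whose edge detect bit is set
    (assumes bit n of the register corresponds to GPIO pin n)."""
    return [pin for pin in range(FIRST_SHARED_GPIO, LAST_GPIO + 1)
            if edge_detect & (1 << pin)]


def handle_irq(irq, edge_detect, handlers):
    """Dispatch a GPIO interrupt to per-pin handlers; return the pins serviced."""
    if 0 <= irq < SHARED_IRQ:
        pins = [irq]                              # IRQ n is GPIO n for n in 0-10
    elif irq == SHARED_IRQ:
        pins = pending_shared_gpios(edge_detect)  # multiplexed GPIO 11-27
    else:
        return []                                 # not a GPIO interrupt
    serviced = []
    for pin in pins:
        handler = handlers.get(pin)
        if handler is not None:
            handler(pin)
            serviced.append(pin)
    return serviced
```

In the shared-mode scheme described above, each driver effectively repeats the `pending_shared_gpios` test for its own pin; centralising that test in one place is one way such handling can be simplified.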
At this point, the kernel can proceed with a number of architecture-independent initializations, such as setting up the console, enabling filesystems and caching, and launching the init thread. This thread begins by performing a number of setup tasks which require the processor, memory, and other basic resources to be initialized. One operation performed by this thread is to mount the root filesystem, which on many StrongARM systems involves the use of a RAM disk.
A RAM disk is a compressed filesystem image which can be loaded into memory,10 and then mounted just like any other volume, such as a disk. RAM disks tend to be small; a compressed disk image might be 1.5-2.5MB in size. Such a disk might contain the familiar /etc initialization scripts, userland utilities such as interactive shells and editors, and loadable kernel modules. A filesystem mounted from a RAM disk may be writeable, but in general, any modifications to a RAM disk filesystem will not survive a reset or loss of power. The Journalling Flash File System, which would allow modifications to propagate through to flash RAM, is being developed by Axis Communications.
The last task performed by the init thread during startup is to exec() a program from the root filesystem which will be responsible for bringing up userland. In this document, ``userland'' will refer to the set of programs and data which exist outside of the kernel address space and which do not have the ability to execute privileged instructions. Informally, this will mean the set of programs and data with which the user can interact, such as shells or editors. The program run by the init thread11 usually begins by executing a number of shell scripts which activate system services such as inetd or syslogd. An init program may also be responsible for running a program that handles login terminals, such as getty. Note that init is the ancestor of all userland processes that execute on an ARM Linux system, and as such, is indirectly responsible for all user interaction with the system, regardless of whether this occurs via a terminal, a graphics console, voice, or some other mode.