MkLinux Server

And hence begins our discussion of the MkLinux server running on PowerPCs. My plan here is to do a bit of tracing of the operation of the MkLinux server, going into each function, figuring out what it does, etc - something really thorough. Given that its source takes up something like 28MB on my drive, I know I'll end up being less thorough than I'd like. I plan to focus more on the Mach-related aspects of the server (and the emulation of the Linux stuff) rather than the particulars of Linux itself. There are some other sites that cater specifically to how Linux works, so maybe I don't need to cover that...

Oh yes. I seem to enjoy converting the source code files for the server into HTML, so I'll provide them whenever possible, with links to and from the discussion here (I hope nobody at the OSF takes offense at this and sues me...). I cannot possibly provide all of the source, though, as I've only got a 5MB quota with my ISP...so sometimes we'll have to do without. Finally, if I'm ever wrong, please let me know! I don't want to be providing incorrect information!

And so it begins...

970603

Let's begin at the beginning: main(). The first function called from main() is server_thread_set_priv_data(). This function basically just attaches a block of data to the current thread with the cthread routine cthread_set_data(). The data in question is a struct server_thread_priv_data, which contains information regarding whether activations are being used, whether the thread is preemptive or not, the task that contains the current thread, a Mach port, and a jump buffer (to jump back into the ux_server_loop when its Linux process exits?). At this point the structure is essentially zeroed out. (Activations, as they're called by the Mach docs, are "empty threads," which are used to handle RPC calls to a port/portset managed by a Mach subsystem.)

The second function called from main() is set_current_task(). This function sets up some data in the current thread's data (as described above) so that a task's associated cthread can be identified.
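To make the per-thread data idea concrete, here's a minimal sketch in portable C11. It is not the MkLinux code: the struct layout and field names are my guesses from the description above, and a _Thread_local pointer stands in for cthread_set_data() and its matching getter.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct server_thread_priv_data. The field
 * names are assumptions, not the real layout. */
struct thread_priv_data {
    bool     activation;    /* is this thread an activation? */
    bool     preemptive;    /* may it be preempted while it waits? */
    void    *current_task;  /* task this thread is working for */
    unsigned port;          /* placeholder for a Mach port name */
    jmp_buf  exit_jmp;      /* where to jump back to later on */
};

/* One slot per thread, playing the role of the cthread data pointer. */
static _Thread_local void *thread_slot;

static void  thread_set_data(void *p) { thread_slot = p; }
static void *thread_get_data(void)    { return thread_slot; }

/* Sketch of server_thread_set_priv_data(): zero the block, attach it. */
static void set_priv_data(struct thread_priv_data *priv)
{
    assert(priv != NULL);
    memset(priv, 0, sizeof *priv);
    thread_set_data(priv);
}

/* Sketch of set_current_task(): record the task in this thread's block. */
static void set_current_task_sketch(void *task)
{
    struct thread_priv_data *priv = thread_get_data();
    assert(priv != NULL);
    priv->current_task = task;
}
```

The point of the indirection is that any code running on the thread can find "its" task without being handed a pointer explicitly.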

main() now goes on to call uniproc_enter() and uniproc_preemption_init(). uniproc_enter() seems to make its living as a mutual exclusion lock, ensuring that only one thread can perform a certain set of operations at a time. I'm not yet exactly certain what operations are governed by this system, however. In uniproc_enter(), there is a block of code wrapped in ifdefs for UNIPROC_PREEMPTION, which is only compiled in if UNIPROC_PREEMPTION is true in the preprocessor. The code in question seems to allow a task that is blocked waiting in uniproc_enter() to be preempted - or interrupted, so that it can go and do something else. A task can only be preempted if the preemptive flag in the thread's private data structure is set to TRUE. When the task gets control of the mutex, it stashes the current task into an array, indexed by the processor that the task is running on, in uniproc_has_entered(). It also takes note of the cthread that currently has control of the mutex in uniproc_holder_cthread.
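Here's a rough sketch of the "one holder at a time, and remember who it is" pattern, using a C11 atomic_flag as a spin lock. This is only an illustration under my own assumptions: the real server uses cthread mutexes and blocks rather than spinning, and every name ending in _sketch is made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_flag uniproc_lock_sketch = ATOMIC_FLAG_INIT;
static const void *uniproc_holder_sketch;   /* who holds the lock right now */

/* Try to take the lock once; returns true on success. */
static bool uniproc_try_enter_sketch(const void *self)
{
    if (atomic_flag_test_and_set(&uniproc_lock_sketch))
        return false;                       /* somebody else is inside */
    uniproc_holder_sketch = self;           /* note the new holder */
    return true;
}

/* Keep trying until we get in. The real code would block, not spin. */
static void uniproc_enter_sketch(const void *self)
{
    while (!uniproc_try_enter_sketch(self))
        ;
}

/* Only the current holder may leave. */
static void uniproc_exit_sketch(const void *self)
{
    assert(uniproc_holder_sketch == self);
    uniproc_holder_sketch = NULL;
    atomic_flag_clear(&uniproc_lock_sketch);
}
```

Recording the holder costs almost nothing and makes it possible to assert, anywhere in the server, that the caller really is the one inside the lock.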

uniproc_preemption_init() seems to initialize some kind of mutex lock to make sure that only one task can preempt another at one time. The way this seems to work is like so:

uniproc_preemption_init() is called, locking the uniproc_preemption_mutex.
uniproc_preemptible() is called, unlocking the mutex, allowing preemption, or uniproc_unpreemptible() is called, preventing preemption.
uniproc_preempt() is called, preempting a waiting task.
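The sequence above can be modelled in a few lines. This is just my reading of the three steps turned into a toy, with an atomic_flag standing in for uniproc_preemption_mutex; the non-blocking try_preempt_sketch() and the preempt_done_sketch() helper are my own inventions for the sake of a testable example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag preemption_lock_sketch = ATOMIC_FLAG_INIT;

/* Like uniproc_preemption_init(): start out locked, i.e. not preemptible. */
static void preemption_init_sketch(void)
{
    atomic_flag_test_and_set(&preemption_lock_sketch);
}

/* Like uniproc_preemptible(): release the lock so a preempter can take it. */
static void preemptible_sketch(void)
{
    atomic_flag_clear(&preemption_lock_sketch);
}

/* Like uniproc_unpreemptible(): take the lock back, waiting out any preempter. */
static void unpreemptible_sketch(void)
{
    while (atomic_flag_test_and_set(&preemption_lock_sketch))
        ;
}

/* Non-blocking take on uniproc_preempt(): true if the preemption went through. */
static bool try_preempt_sketch(void)
{
    return !atomic_flag_test_and_set(&preemption_lock_sketch);
}

/* The preempter is finished and lets go of the lock again. */
static void preempt_done_sketch(void)
{
    atomic_flag_clear(&preemption_lock_sketch);
}
```

In this model only one preempter can win at a time, and nobody can preempt at all until the target has explicitly declared itself preemptible.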

Now we start to get into the meaty stuff. main() now calls setup_server() to do a whole lot of Mach-specific stuff. The first thing it happens to do is call ports_init(), which acquires a number of special ports in the Mach kernel. Inside ports_init(), we see first a call to task_get_special_port(), which gets the bootstrap port. This port is used mainly as a channel along which other special ports are found, via the bootstrap_ports() function, which resides in the kernel. This call gives the server a privileged host port, device server port, root ledger wired port, root ledger paged port, and a security port. The privileged host port is used to get privileged information from the kernel, as well as to send privileged commands to the kernel. The device server port is used as a channel along which data can be read/written to and from the devices attached to the computer. The other ports returned are still a bit of a mystery to me.

While still inside ports_init(), we see calls to vm_set_default_memory_manager(), processor_set_default(), host_processor_set_priv(), and task_info().

The call to vm_set_default_memory_manager() is being used to find the port used to communicate with the default memory manager. Methinks the MkLinux coders are relying on the fact that global variables are initialized to 0 before being used: default_pager_port goes in as a null port, so the Mach kernel doesn't install a new manager and just returns the current default memory manager into default_pager_port. According to the OSF Mach3 docs, vm_set_default_memory_manager() has been changed to host_default_memory_manager(), but is retained for backward compatibility. This port is used for all memory_object_create calls.

processor_set_default() returns the default processor set, to which all threads, tasks, and processors are assigned by the kernel unless explicitly told otherwise. host_processor_set_priv() is used to get send rights for the control port of the default processor set. task_info() is being used to get a security token, which is used later on to establish communication with some devices attached to the system.

Upon return from ports_init(), we come to a call to host_info(). This function is used to extract priority (scheduling) information from the kernel. Control then flows into gen_init(), which initializes a bit of information about the machine it's running on.

The first thing that gen_init() does is call size_memory(). size_memory(), in turn, calls host_info(), asking the kernel for basic information on the machine it's running on, extracting the amount of memory it has. Then, it calls host_statistics(), looking for information on virtual memory, and finally, it calls host_page_size() to find the memory page size of the system it's running on.

gen_init() then calls serv_port_register(), assigning a unique name (calculated via a hash function) based on the current task. The result is a porthash structure, which is dynamically allocated and stored in a linked list. This, presumably, will be used later in keeping track of the server's port...
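For anyone who hasn't met this pattern before, here's a minimal sketch of a hashed registry with linked-list chaining. It is not the real porthash code: I'm assuming entries are keyed by a port number, and the bucket count, hash function, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PORT_HASH_SIZE 64   /* hypothetical bucket count */

struct port_hash_entry {
    unsigned port;                  /* stand-in for a Mach port name */
    void *object;                   /* whatever the port is registered for */
    struct port_hash_entry *next;   /* next entry in the same bucket */
};

static struct port_hash_entry *port_buckets[PORT_HASH_SIZE];

/* Hypothetical hash: drop the low bits, then fold into the table. */
static unsigned port_hash(unsigned port)
{
    return (port >> 2) % PORT_HASH_SIZE;
}

/* Allocate an entry and push it on the front of its bucket's list. */
static int port_register_sketch(unsigned port, void *object)
{
    struct port_hash_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    unsigned h = port_hash(port);
    e->port = port;
    e->object = object;
    e->next = port_buckets[h];
    port_buckets[h] = e;
    return 0;
}

/* Walk the bucket's list looking for an exact port match. */
static void *port_lookup_sketch(unsigned port)
{
    struct port_hash_entry *e;
    for (e = port_buckets[port_hash(port)]; e != NULL; e = e->next)
        if (e->port == port)
            return e->object;
    return NULL;
}
```

Two ports that land in the same bucket (4 and 260 do, with this hash) simply share a list, so lookups stay correct and only get a little slower.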

MkLinux then goes and tries to protect page 0, so that NULL-pointer references are caught. This is done through the use of the Mach calls vm_protect(), vm_deallocate(), and vm_map(). First, vm_protect(...TRUE, VM_PROT_READ) is called. The TRUE here is the set_maximum argument, so this call lowers the maximum protection of page 0 to read-only rather than touching its current protection. Secondly, vm_protect(...,FALSE, VM_PROT_NONE) is called, eliminating the current access privileges for page zero (so it doesn't look like one needs read permission on a page in order to remove its access permissions; the two calls simply set two different things). vm_deallocate() is now called, apparently to make sure that nothing is currently allocated there. Finally, vm_map() is called, mapping page zero to itself (zero to zero), such that child tasks don't inherit the page (VM_INHERIT_NONE). A maximum protection of VM_PROT_READ and a current protection of VM_PROT_NONE are used in the mapping. As I read the Mach docs, the maximum protection is a ceiling on what the current protection can later be raised to, not something that applies to child tasks. This would mean that the server task can't access page zero at all right now, and could at most give itself read access later on.
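One way to read the two vm_protect() calls, assuming the Mach convention that the boolean argument selects the maximum rather than the current protection, is with a toy model of a region that carries both values. This is not kernel code and the names are hypothetical; it only encodes the rules as I understand them: a request may never exceed the maximum, and lowering the maximum drags the current protection down with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy protection bits, standing in for VM_PROT_NONE / READ / WRITE / EXECUTE. */
enum { PROT_NONE_SK = 0, PROT_READ_SK = 1, PROT_WRITE_SK = 2, PROT_EXEC_SK = 4 };

struct region_sketch {
    unsigned cur_prot;   /* what the task may do right now */
    unsigned max_prot;   /* ceiling that cur_prot can be raised to */
};

/* Returns 0 on success, -1 if the request exceeds the maximum. */
static int protect_sketch(struct region_sketch *r, bool set_maximum, unsigned prot)
{
    assert(r != NULL);
    if ((prot & r->max_prot) != prot)
        return -1;               /* asking for more than the ceiling allows */
    if (set_maximum) {
        r->max_prot = prot;      /* lower the ceiling... */
        r->cur_prot &= prot;     /* ...and clip current access to fit */
    } else {
        r->cur_prot = prot;      /* change current access only */
    }
    return 0;
}
```

Run against a read/write page, the sequence (TRUE, read) then (FALSE, none) leaves a page that can't be touched now, can be made readable again later, and can never be made writable.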

970604

OK...that went smoothly. In case you're lost, we've gotten through all the basic init code up to the call to get_config_info() midway through gen_init(). The purpose of get_config_info() is to read the arguments that the server was booted with, and configure the server appropriately. A few of the options available in DR2.1update2 are the ability to use activations (where Mach messaging is handled through the use of subsystems), single/multi-mode, and whether the server is the only server running under the kernel, or if a Linux server already has control (check the calls to parent_server_set_type() when the 'l', 'o', and '2' switches are found). get_config_info() also seems to pass arguments along to the standard Linux code when given the 'c' flag, and afterwards finds the root device name to boot from.

As a little note here, when the use of activations is turned on, it allows OSF Mach3 to use short-circuited RPC's for message communication. It seems from the flag parsing code here, that kernel-located MkLinux servers don't use activations, and that MkLinux servers using activations aren't kernel-located. I presume that this is because when the MkLinux server is kernel-located, Mach can easily identify that no context switching and copying of data has to be done when delivering a Mach message, and using a subsystem would be more inefficient than just letting Mach handle it.

So, we come back to our friend, gen_init(), calling the last function in this routine, osfmach3_console_init(), which initializes the console. If the MkLinux server is running under another server (possibly another MkLinux server), it uses parent_server_get_mach_privilege() to get control of the console, otherwise it uses the Mach device_open() call to open it (using the security token described above). parent_server_get_mach_privilege() uses some funky procedures to get control of the console, depending on whether the server it's running under is an OSF1 server or another Linux server.

After the server gets control of the console, it calls register_console() with a pointer to the osfmach3_console_print() function. register_console() takes that function pointer and keeps track of it so that it may be used within the printk() function (which uses osfmach3_console_print() by default). register_console() also prints out anything that might have been printk()'ed before register_console() was called.
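The "remember a function pointer, then replay whatever was logged early" behaviour is easy to sketch. What follows is a simplified model, not the Linux source: the buffer size is arbitrary, all the _sketch names are mine, and capture_console() is just a hypothetical console that records what it is handed so the effect can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 256

typedef void (*console_fn)(const char *text);

static char log_buf[LOG_BUF_SIZE];   /* holds text printed before a console exists */
static size_t log_len;
static console_fn console;           /* NULL until one is registered */

/* With a console, print right away; without one, stash the text. */
static void printk_sketch(const char *text)
{
    if (console != NULL) {
        console(text);
        return;
    }
    size_t n = strlen(text);
    size_t room = LOG_BUF_SIZE - 1 - log_len;
    if (n > room)
        n = room;                    /* silently drop what doesn't fit */
    memcpy(log_buf + log_len, text, n);
    log_len += n;
    log_buf[log_len] = '\0';
}

/* Remember the console, then replay anything that was stashed. */
static void register_console_sketch(console_fn fn)
{
    assert(fn != NULL);
    console = fn;
    if (log_len > 0) {
        console(log_buf);
        log_len = 0;
        log_buf[0] = '\0';
    }
}

/* Hypothetical console that just records its input. */
static char captured[LOG_BUF_SIZE * 2];
static void capture_console(const char *text)
{
    size_t room = sizeof captured - 1 - strlen(captured);
    strncat(captured, text, room);
}
```

Nothing printed before registration is lost (up to the buffer size); it just shows up late.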

Speaking of osfmach3_console_print(), that's a fairly simple function, essentially, just a little loop that cnputc()'s each character in a buffer, calling cnflush() every once in a while.

cnputc() doesn't do much, either, just caching up characters until its buffer is full or it gets a '\n' or '\r', at which point it calls cnflush(). cnflush() will call parent_server_write() if it's running underneath another Mach server, and will call the Mach device_write_inband() function if it is running on its own, which prints out the buffer built up with cnputc(). Note that if the call to device_write_inband() fails the first time 'round, the server tries to write "panic: cnflush" to the console one last time, and then divides by zero, indicating that "panic may fail." I'm not exactly sure why panic() might fail here, but it might have something to do with the possibility that the server might have multiple threads executing when this code is run.
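The buffering scheme described above looks roughly like this. It's a sketch under my own assumptions: the 16-byte buffer size is arbitrary, the device_out array stands in for whatever device_write_inband() or parent_server_write() would actually send, and the panic path is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CN_BUF_SIZE 16               /* hypothetical; the real size may differ */

static char cn_buf[CN_BUF_SIZE];     /* characters waiting to be written */
static size_t cn_len;

static char device_out[256];         /* stand-in for the console device */
static size_t device_len;
static int flush_count;              /* how many writes reached the "device" */

/* Push the pending characters out in one write. */
static void cnflush_sketch(void)
{
    if (cn_len == 0)
        return;
    assert(device_len + cn_len < sizeof device_out);
    memcpy(device_out + device_len, cn_buf, cn_len);
    device_len += cn_len;
    device_out[device_len] = '\0';
    cn_len = 0;
    flush_count++;
}

/* Cache one character; flush on a full buffer or an end-of-line character. */
static void cnputc_sketch(char c)
{
    cn_buf[cn_len++] = c;
    if (cn_len == CN_BUF_SIZE || c == '\n' || c == '\r')
        cnflush_sketch();
}

/* The print loop: feed each character of the string to cnputc. */
static void console_print_sketch(const char *s)
{
    while (*s != '\0')
        cnputc_sketch(*s++);
}
```

Batching like this matters because each write to the device is a Mach message; sending one per line is a lot cheaper than one per character.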

Now, on return from register_console(), we find ourselves back inside setup_server(). The next bit of code we execute, print_versions(), doesn't do much, just calling the Mach host_kernel_version() function and printing out the results.

After a version check, we come back to setup_server(), and if and only if we're using activations, we call subsystem_init(). subsystem_init() calls the Mach functions mach_subsystem_join() and mach_subsystem_create(). These functions seem to be used to create a subsystem through which RPC calls for exception handling and server callbacks can be handled. I can't find any docs on mach_subsystem_join() at the moment, so let's go digging around in the Mach kernel source to see what this function does... Its code can be found in DR2.1update2/osfmk/src/mach_services/lib/libmach/mach_subsystem_join.c.

970605

OK. Looking at the code for mach_subsystem_join(), we can see that its first two arguments, two subsystem types, should really be initialized prior to being given to the function. This implies that the two variables being passed here, in subsystem_init(), are statically initialized data. They're initialized a bit strangely. catch_exc_subsystem is declared in exc_server.h, described as an external, seemingly in libmach.a or libmach_maxonstack.a. I believe the definitions for this structure are generated from the file DR2.1update2/osfmk/src/mach_kernel/mach/exc.defs, but I have no proof...

Given that it's so tricky to find out what's going on with catch_exc_subsystem, let's just go on to the slightly easier serv_callback_subsystem. I believe that this structure is defined in the serv_callback.defs file, and that it ends up being generated by the mig program during the build, being placed into one of the generated files (serv_callbackServer.c?). This belief is strengthened by the comment that the routine numbers for the subsystem defined there should be close to the ones in mach/exc.defs. This is consistent with the concept of joining subsystems, as the range of numbers between the first subsystem and the second is used up (wasted) during the join (i.e., joining a subsystem with a routine number base of 1 and one with a base of 1000 would use up numbers 2 through 999, at least in this implementation of Mach).
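The arithmetic behind that "wasted numbers" remark can be spelled out with a small sketch. I'm assuming here that a joined subsystem needs one table slot for every routine number from the lowest start to the highest end, which is my reading of the join code rather than anything documented; the struct and function names are made up.

```c
#include <assert.h>

/* Routine numbers covered by one subsystem: start inclusive, end exclusive. */
struct subsys_range {
    int start;
    int end;
};

/* Slots needed by a joined table spanning both ranges. */
static int joined_size(struct subsys_range a, struct subsys_range b)
{
    int lo = a.start < b.start ? a.start : b.start;
    int hi = a.end > b.end ? a.end : b.end;
    return hi - lo;
}

/* Slots in the joined table that belong to neither subsystem. */
static int wasted_slots(struct subsys_range a, struct subsys_range b)
{
    assert(a.end <= b.start || b.end <= a.start);   /* ranges must not overlap */
    return joined_size(a, b) - (a.end - a.start) - (b.end - b.start);
}
```

With one routine at number 1 and one at number 1000, the joined table has 1000 slots and 998 of them (numbers 2 through 999) are dead weight, which is why the two subsystems want routine numbers that sit close together.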

To be continued...