Carnegie Mellon
Computer Science Department

Namespace Crossings for Plan 9


The Current Situation

Currently, in Plan 9, all namespaces are isolated from each other. This is, indeed, how Plan 9 achieves most of its security. However, this can pose a problem at times.

  • Clients that want visibility into other namespaces, such as rio, cannot get it. In particular, rio's file name completion is performed with respect to its own namespace, not the namespace shared by processes inside the window.
  • There is a somewhat awkward "default" namespace, which is built up of almost-always-present kernel servers.
  • Suppose we have /a/b/c, where /, /a, and /a/b are each mounted servers. If /a goes offline, there is no way to obtain descriptors into /a/b without redialing and remounting the remote server. For some servers, this is problematic.

This project is an experiment in solving the first issue, though it might provide insight into alternative mechanisms useful for the others.

Approach

The #p device driver, traditionally mounted on /proc, provides a debugging interface to processes, including /proc/$PID/ns, a textual reconstruction of the process's namespace.

We could add a /proc/$PID/nsmnt directory which would emulate the process's root directory and would implement walk() operations with respect to the target process's namespace.
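To make the proposed naming concrete, here is a minimal sketch of how a client might form a name under the emulated root. It assumes only what is stated above, namely that /proc/$PID/nsmnt stands in for the target process's root directory. The helper nsmntpath() is hypothetical and is written in portable C rather than against the Plan 9 libraries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/proc/<pid>/nsmnt<path>" into buf.
 * path must be absolute in the target process's namespace.
 * Returns 0 on success, -1 on a bad argument or if buf is too small.
 */
int nsmntpath(char *buf, size_t len, int pid, const char *path)
{
    int n;

    assert(buf != NULL && path != NULL);
    if (pid < 0 || path[0] != '/')
        return -1;
    n = snprintf(buf, len, "/proc/%d/nsmnt%s", pid, path);
    if (n < 0 || (size_t)n >= len)
        return -1;    /* output error or truncated name */
    return 0;
}
```

A client such as rio could then open or walk the resulting name, and each walk() beneath nsmnt would be resolved in the target's namespace rather than its own.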

Background

A file descriptor is backed by a structure called a Chan. A Chan contains all the metadata one might expect to find, including a specification of the owning device, the channel's type and Qid (identifier, in a sense), union-mount metadata if relevant, etc.

A channel also includes a Path, which is a list of mountpoints crossed to reach the current point in the hierarchy. The Plan 9 "dot-dot" paper has more about how this actually works, but perhaps a short example will suffice. Since servers can be queried for the parent of any directory, but obviously don't know about namespaces, walk(chan, ..) needs a way to "back up" across servers. The Path provides the required "left hand side" of all mountpoints (where the "right hand side" is the root of the lower server as mounted).
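The "backing up" step can be illustrated with a toy model. This is not the kernel's actual Path layout: ToyChan, ToyPath, Loc, and the toy* functions are all invented for this sketch, and a position is reduced to a server name plus a depth below that server's root. The point is only that each mountpoint crossing records the "left hand side", and that ".." at a server's root pops that record before asking the server above for its parent.

```c
#include <assert.h>
#include <stddef.h>

enum { MAXDEPTH = 8 };

/* A position inside one server: which server, and how far below its root. */
typedef struct Loc {
    const char *server;    /* hypothetical server name */
    int depth;             /* 0 means the server's own root */
} Loc;

/* Toy Path: the "left hand side" of every mountpoint crossed so far. */
typedef struct ToyPath {
    Loc lhs[MAXDEPTH];
    int n;
} ToyPath;

typedef struct ToyChan {
    Loc cur;
    ToyPath path;
} ToyChan;

/* Walk down one level inside the current server. */
void toydown(ToyChan *c)
{
    assert(c != NULL);
    c->cur.depth++;
}

/* Cross a mountpoint: remember the left hand side, continue at the
 * root of the mounted server. Returns -1 if the toy Path is full. */
int toycross(ToyChan *c, const char *server)
{
    assert(c != NULL && server != NULL);
    if (c->path.n == MAXDEPTH)
        return -1;
    c->path.lhs[c->path.n++] = c->cur;
    c->cur.server = server;
    c->cur.depth = 0;
    return 0;
}

/* Walk "..". A server cannot name the parent of its own root, so at a
 * root we first back up to the recorded left hand side(s). */
void toydotdot(ToyChan *c)
{
    assert(c != NULL);
    while (c->cur.depth == 0 && c->path.n > 0)
        c->cur = c->path.lhs[--c->path.n];
    if (c->cur.depth > 0)
        c->cur.depth--;    /* ordinary ".." answered by the server */
}
```

With a server mounted on /a, walking to /a/b and then ".." twice ends at the root of the original server with an empty Path, which is the behavior a user expects from /a/b/../.. .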

Namespaces (that is, the hierarchy of servers) are stored in a structure called a Pgrp. A namespace consists of a binding of "left hand side"s to one or more "right hand side"s (a "union mount" occurs when one "left hand side" maps to several "right hand side"s).
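As a rough illustration of that shape, the sketch below models a namespace as a flat table of bindings. It is a deliberate simplification and does not reproduce the real Pgrp: ToyPgrp, Binding, toybind(), and toyfindmount() are made-up names, and both sides of a binding are plain strings instead of Chans.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAXRHS = 4, MAXBIND = 8 };

/* One binding: a "left hand side" mapped to one or more "right hand side"s. */
typedef struct Binding {
    const char *lhs;            /* mountpoint as named in the namespace */
    const char *rhs[MAXRHS];    /* what is mounted there, in search order */
    int nrhs;                   /* nrhs > 1 means a union mount */
} Binding;

/* Toy stand-in for a Pgrp: just the table of bindings. */
typedef struct ToyPgrp {
    Binding bind[MAXBIND];
    int nbind;
} ToyPgrp;

/* Look up lhs; return its binding, or NULL if nothing is mounted there. */
const Binding *toyfindmount(const ToyPgrp *pg, const char *lhs)
{
    int i;

    assert(pg != NULL && lhs != NULL);
    for (i = 0; i < pg->nbind; i++)
        if (strcmp(pg->bind[i].lhs, lhs) == 0)
            return &pg->bind[i];
    return NULL;
}

/* Append rhs to the binding for lhs, creating the binding if needed.
 * Returns 0 on success, -1 if either table is full. */
int toybind(ToyPgrp *pg, const char *lhs, const char *rhs)
{
    int i;
    Binding *b = NULL;

    assert(pg != NULL && lhs != NULL && rhs != NULL);
    for (i = 0; i < pg->nbind; i++)
        if (strcmp(pg->bind[i].lhs, lhs) == 0)
            b = &pg->bind[i];
    if (b == NULL) {
        if (pg->nbind == MAXBIND)
            return -1;
        b = &pg->bind[pg->nbind++];
        b->lhs = lhs;
        b->nrhs = 0;
    }
    if (b->nrhs == MAXRHS)
        return -1;
    b->rhs[b->nrhs++] = rhs;
    return 0;
}
```

Binding two different right hand sides to the same left hand side, say /bin, leaves a single entry with nrhs equal to 2, which is the union-mount case.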

Implementation

The prototype alters the namespace code so that Chans can request that Chans in other Pgrps be consulted ("impersonated") when doing namespace operations. As a side effect, the Path mechanism for keeping Chan history is updated so that path elements are (Chan, Pgrp) pairs.

The general flow is then:

  1. Acquire the current namespace: use up->pgrp unless the current Chan's Path has a non-nil current Pgrp.
  2. Do walk()s inside this namespace explicitly, rather than have findmount() consult up->pgrp on its own.
  3. walk() may return a Walkqid with ->nc holding an "impersonated" Chan and Pgrp, which will then become the current Chan and Pgrp for subsequent walks (see the domount() Path-update code).
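
The first and third steps above can be sketched in a few lines. This is a toy model and not the patch itself: ToyProc, ToyChan, ToyPath, ToyWalk, curns(), and adopt() are hypothetical names, and the structures carry only the one field needed to show the selection rule.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyPgrp { int id; } ToyPgrp;

/* Toy Path: remembers the Pgrp, if any, that walks should consult. */
typedef struct ToyPath { ToyPgrp *pgrp; } ToyPath;

typedef struct ToyChan { ToyPath path; } ToyChan;

/* Toy process: stands in for "up" and its own namespace. */
typedef struct ToyProc { ToyPgrp *pgrp; } ToyProc;

/* Toy walk result: a non-NULL pgrp asks for impersonation. */
typedef struct ToyWalk { ToyPgrp *pgrp; } ToyWalk;

/* Step 1: the process's own namespace, unless the Chan's Path
 * already names another one. */
ToyPgrp *curns(const ToyProc *up, const ToyChan *c)
{
    assert(up != NULL && c != NULL);
    return c->path.pgrp != NULL ? c->path.pgrp : up->pgrp;
}

/* Step 3: if the walk returned an impersonated Pgrp, it becomes the
 * namespace for subsequent walks through this Chan. */
void adopt(ToyChan *c, const ToyWalk *w)
{
    assert(c != NULL && w != NULL);
    if (w->pgrp != NULL)
        c->path.pgrp = w->pgrp;
}
```

Step 2 corresponds to passing the result of curns() down into the mount lookup as an explicit argument, so that nothing below the walk has to guess which namespace is meant.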

Current Status

The prototype functions well enough to support a demo! This screenshot shows the bottom window visiting the namespace of the top window. Each window displays a list of filenames appearing in square brackets; this is the result of requesting a filename-completion, and demonstrates that rio has been patched to perform completion with respect to each window's namespace.

The changes are available in diff format or in annotated form offsite.

Shortcomings

  1. Any kernel server which depends on up rather than the Chan structure is probably going to get something wrong.
  2. The current patch doesn't alter the meanings of bind() / mount() / unmount() to properly work in alternate namespaces.
  3. This does not solve the "detached server" problem cleanly. The relatively straightforward nature of the tree exposure mechanism hints that we might need a separate enumeration of mountpoints to solve this problem cleanly.

Future Work

This patch as it stands is not scheduled for submission to the Plan 9 distribution. While it "works" in some sense, it is not entirely obvious what the desirable behavior is in all cases (should a process in one namespace be able to make changes in another namespace?). In addition, thinking after the dust settled suggests it might be structurally cleaner to approach the problem by slightly altering the definition of a namespace.

Currently, each process specifies its namespace (Pgrp) and also its Chan for /, the root directory. It is not clear that the ->root Chan is used or that using it would work correctly, because each Path stores the (definitive) textual representation of its derivation. The presence of the root Chan limits us to exposing namespaces on a per-process basis, because each process views a slice of a namespace descending from its ->root Chan.

It would seem attractive for the root to be a namespace property rather than a process property. Then it would be possible for a kernel server to present userland with two views of a namespace: an unordered list of bindings and a walkable hierarchy of file servers. While it would be possible to expose the "mount-table view" the way things stand, it is not entirely clear what names to give the bindings in absence of a well-defined root; moving the root into the namespace is necessary to expose the server-hierarchy view.

This approach would allow the namespace-crossing code to be a separate device driver essentially independent of /proc and would probably enable the same functionality as this patch without requiring as much invasiveness into the namespace code. Making root a property of the namespace would also enable implementation of cross-namespace mount(), bind(), and unmount() operations, as the effects could then be identical for all processes in a namespace.

Conclusion

Namespaces are a core service of the Plan 9 kernel, so any changes in this area are tricky to specify and implement. Hopefully the extended functionality demonstrated by this project will spark discussion about what should be done and how it might best be accomplished.


[Last modified Monday March 26, 2007]