The kernel you created in the previous project has one fatal flaw however - there is no way for users to save data from one period of system uptime to the next. Worse yet, there is not even a way for a program to save data such that it can be restored later by another incarnation of the same program (for example, a simple text editor opening a file created an hour earlier by a user).
This type of data is called persistent data, and one way to achieve it is with a file system. Using various forms of media we can save data so that it can be recovered later, even if the system goes down for some period of time or the accesses are from different processes with disjoint address spaces.
While the file system you will implement in this project is fairly simple, it will increase the utility of your operating system greatly. Users will be able to create files and store data in them. These files will then be available to other processes, even after the system has been rebooted.
Like the kernel project you have just completed, the file system is a
four week project. It will again be very important to adhere to a
reasonable schedule throughout the four weeks to ensure completion of
the project. The file system probably involves the most code out of
the four projects in this course. This causes problems not only in
writing the code, but in finding bugs as well, so be prepared and make
sure you leave time at the end for plenty of debugging.
Goals
The goal of this project set is certainly not to teach the
idiosyncrasies of the x86 architecture (or Intel's
documentation). That said, it will be necessary to become accustomed
to the x86 way of doing things, and the Intel nomenclature, for the
purposes of completing this project set. Just keep in mind that the
x86 way of doing things is not the only way of doing things. It is the
price to be paid for learning the principles of operating systems on a
real world system instead of a simulated architecture.
Important Dates
Monday, March 31st: Project four assigned.
Monday, April 7thth: Checkpoint one.
Monday, April 14th: Checkpoint two.
Monday, April 28th: Project four is due at midnight (Monday night).
Groups
The file system project is a group assignment. You should already be in a group
of two from the previous project. If you are not in a group, or you are
having other group difficulties, send email to staff-412@cs.cmu.edu.
Grading
The primary criteria for grading the file system will be correctness
and robustness. Does the kernel crash if you pass a null
pointer to your write system call? File systems are expected to take
abuse and continue to run.
Although it is not the first thing we will look at, style is also important. Use an appropriate level of commenting, name variables meaningfully and consistently, and try not to resort to global variables when possible. Style is an especially important consideration when dealing with large amounts of code.
It is also our intent that part of the project 4 grade will be based on a face to face interview process. We reserve the right to give partners in a group different project grades if circumstances make that necessary.
Please hand in a design document with your file system. It should talk about all the important design decisions you made, what works, and what does not. If you tell us about something that does not work, it puts you in a better position to be given more partial credit than if you do not tell us and we stumble across it ourselves, because it lets us know that you understand that there is a problem and have spent some nonzero amount of time trying to find it.
Handin
The procedure for handing in project 4 will be the same as it was for
project 3. Look for more specific handin directions closer to the project
due date.
Hardware Primitives
Projects one and three involved quite a bit of hardware interaction. To
build a file system however, we require only a couple basic abilities -
the ability to read and write blocks to persistent storage. In this project,
as is the case in many environments, we will be using a hard disk for
persistent storage.
The Organization of a Hard Disk
A hard disk is composed of one or more disk-shaped pieces of magnetic
media. Unlike floppy disks (which contain magnetic media that is, well,
floppy), the disks in a hard disk are rigid, hence the name hard disk.
Each disk in a hard disk is referred to as a platter. Information can
be stored on either side of this platter, and each side of a platter is
referred to as a surface.
Each surface is further divided by concentric circles organizing it into tracks. The last step is to divide each track into some number of sectors. The sectors within a track occupy an equal amount of surface area. These sectors are the unit of transfer for a disk; you may not read or write less than one sector. For most modern hard disks, a sector contains 512 bytes.
Modern hard disks get more complicated than this - for example, they compensate for the fact that with this simple scheme, tracks near the outside of the surface would have much larger sectors than the tracks near the inside. However we will not go into such details here.
In fact, through a convenient abstraction known as logical block addressing, or LBA addressing (yes, that is technically logical block addressing addressing, but that is unfortunately the way the term is used), the geometry of the disk need not be a concern to you. LBA addressing numbers the sectors throughout the disk (all sectors on all surfaces) from zero on up, so that the disk appears to the system programmer like a large array of sectors. This simplification does not come completely for free (you should think about why), but it does make your job easier for the purpose of designing and implementing your file system.
By convention, the first sector of the disk (sector zero) is not
available for use by the file system. This sector is the boot
sector. It contains important information such as a bootstrap program
or partition table information. For this reason overwriting the first
sector of the disk normally has catastrophic effects. You must ensure
that your file system never writes to sector zero.
Blocks vs. Sectors
A block is different than a sector in that it is the minimum
unit of transfer for the file system. You may choose to make blocks
equal to one sector, or you may choose to have blocks larger than one
sector. A block must be an integer multiple of sectors, however.
Using the Disk
Like the devices you supported in projects one and three,
communication with the disk is done through I/O ports using functions
like inb and outb. Configuring and using the disk is
a little more complicated than reading scancodes from the
keyboard. The disk must be properly configured first, and there are
several steps needed to initiate a read or a write. For this reason we
have supplied you with a very simple disk driver. It is available in
the library for use with this project, libproj4.a. We have provided a
version of the source in the tarball, however this is only for you to
look at - please do not try to compile the disk driver we provide in
the tarball..
It is important to remember that only one request may be sent to the disk at any given time. If you try to initiate an IDE transfer while another transfer is already pending, an error will occur and the device driver will cause your kernel to panic.
The disk driver has the following interface (defined in disk.h):
int disk_init()
Installs the interrupt handler for disk interrupts into the IDT,
blocks until the disks are ready (on power up a real disk may need
time to begin responding to commands), and finds out how big the disks
are. SUCCESS and ERROR are returned as appropriate. You will need to
call this function, directly or indirectly, during the setup phase of
your kernel.
void ide_read_sector(int lba_addr, char *dest, int num)
Initiates an IDE operation to read the specified sectors from the
disk. The sectors will be stored sequentially starting at the address
dest.
void ide_write_sector(int lba_addr, char *source, int num)
Initiates an IDE operation to write the specified sectors to the
disk. The sectors will be taken sequentially starting at the address
source.
int get_num_sectors()
Returns the number of sectors available on the disk. You must call
this after calling disk_init() (and having disk_init()
return SUCCESS).
In order to use the disk driver, you must provide two callback functions,
disk_read_callback() and disk_write_callback(). These
two functions will be called when a read or write completes. You will most
likely need to put some process management code in these callback functions.
File System Design
File systems come in a large variety of shapes and sizes. The file
system you need to implement is not very complicated, but it does have
some important features that you should take a moment to consider.
Files
While a disk contains sectors of a fixed size, user-level programs
only see the contents of the disk through the file abstraction. A file
is, as far as programs know, a sequence of contiguous bytes, even
though the bytes that belong in a file may not be physically
contiguous on disk. A file has a name by which it is created and
found. Your file system needs to allow a variety of operations on
files. Some identify the file by its name, while others use a file
descriptor. A file descriptor contains information relevant to
active (opened) files, including, for instance, the current offset into
the file and the location of the file's metadata cached in memory. A
file descriptor is created from a file name as a result of the
open or create system calls.
Note that you need to manage file descriptors carefully when a process
fork()s or exec()s. You should use UNIX semantics when
managing file descriptors in these situations. This implies that the child
process gets its own copy of the parent's file descriptors. These new
descriptors reference the same underlying objects, so for example a seek
on the file descriptor in the child will change the offset of the file
descriptor in the parent. File descriptors are not destroyed when a process
exec()s.
Naming
The hierarchical namespace of your file system will be similar to what
you are familiar with from UNIX environments. A collection of
variable-sized storage units (files) are referred to by name. File
names can be grouped into directories. Directories can also include
other directories, or a mix of files and directories. This creates a
hierarchy of directories. Directories can contain any number of files
and directories, without limits other than those imposed by the
physical size of the disk. The directory tree starts at the root
directory, denoted by "/". The character "/" also serves as a file
separator. As in UNIX environments, the name ".." refers to the
parent of the directory that contains it, and "." refers to the
containing directory itself. The name "../foo" then would point to the
file "foo" in the parent directory.
Hierarchical names, known as paths, may be absolute or relative. Absolute paths start with "/". Relative paths are transformed into absolute paths by your file system. Your file system needs to maintain a notion of the current directory of a process. This current directory is modified with the chdir system call. The current directory, which is a path itself, is prepended to relative paths to produce the corresponding absolute path.
A link creates a file name reference that refers to an existing file under an additional name. Your file system should support both early-binding (hard) and late-binding (symbolic) links. This allows the naming tree to be made into a directed graph.
As in UNIX environments, directories are stored as files whose
contents are interpreted by the file system.
Fragmentation
Your files will contain an integral number of sectors. Even though the
space in a sector may be only partially used, the entire sector is
reserved for that file. This means that the file system may suffer
from internal fragmentation. However, the file data need not be
contiguous on disk. Adjacent disk sectors may contain data belonging
to two different files, and a file's sectors need not be adjacent to
each other. That means that the file system need not and must not be
vulnerable to external fragmentation.
Persistent Metadata
Your file system exists to store application data in files. To manage
the location and grouping of raw data into files, your file system
will need to build data structures. These data structures associate
directories with file names, and file names with collections of disk
sectors. These data structures are collectively referred to as
metadata. This metadata must be written in the same disk that
holds the file data. You must decide which disk sectors are used to
store data and which ones will be used to store metadata.
Metadata on disk is inconsistent when one part of the metadata indicates one fact and another part of the metadata contradicts that fact. For example, an inconsistent metadata set might indicate that a file exists, and at the same time indicate that the file does not appear in any directories. The metadata on disk is also inconsistent when some part of the metadata asserts a fact that is not reflected by the data present in the disk. For example, the metadata might indicate that a file is 4 kilobytes long but only the first 2 kilobytes of the file might have been written to disk.
Your file system need not constantly maintain consistency in the
metadata stored on disk. You may assume that when your file system
starts the metadata on disk is consistent. You must ensure, however,
that immediately after an application calls the sync() system
call the metadata on disk is consistent. At all other times during
execution, the metadata written on disk need not be consistent.In the
real world, if a system crash occurs when the disk metadata is
inconsistent, a utility will make it consistent before the file system
is restarted.
Limits
Your file system should perform well, however, for random data access
within a file. Accessing a file's data has two components: first,
looking up the metadata to determine which disk sector B contains the
data at file offset off, and second, accessing the disk sector B. Your
file system should guarantee that lookup of B given off takes no
longer than O(log off). This means, for example, that you cannot use a
linked list to store the sector numbers that form a file. Also, the
requirement that any disk be able to contain up to I directories
and/or files, any of which may be the size of the disk, implies that
you cannot simply use a directly indexed array for this purpose. Think
about why this is true.
Buffer Cache
Because the disk is very slow, your file system must try to amortize
the large cost of reading and/or writing one sector over several data
and/or metadata accesses. It achieves this by loading the data from
disk sectors into memory, and allowing several file system operations
to access the memory copy rather than the data directly on disk. This
would be rather pointless if the disk copy of the data needed to be
updated every time the memory copy was updated. Therefore your file
system must implement a write-behind policy for its buffer cache.
This policy dictates that when the cached data (or metadata) is
written, the corresponding disk data need not be immediately
updated. The disk data is only updated when:
A reasonable replacement policy for the buffer cache is
least-recently-used (LRU.) This policy dictates that the block which
has been used least recently among those currently in the cache is
chosen as the victim when a free block is needed. The LRU policy may
be modified to prefer victim blocks that have not been modified since
they were loaded into memory from disk, because this would avoid an
extra disk write. LRU may be further modified advantageously to apply
different treatment to different types of blocks. For instance, data
and metadata often have different patterns of access, and may benefit
from different LRU queues. More exotically, large files (such as those
used by databases) are typically accessed in more random patterns than
shorter files are (which are often accessed sequentially). This fact
may or may not warrant a modification of the cache replacement
policies.
Your buffer cache should hold no more than 256 sectors. You may choose
to divide this space among several separate pools for the different
kinds of access (e.g, data, metadata, prefetch, and whatever other
clever things you may dispose,) but the total number of sectors cached
may not exceed 256.
File System Interface
Your file system must support the following interface. The system call
numbers are defined in syscall_nums.h. Please continue to
use the same system call convention (i.e. the same software interrupt,
same registers, etc). As in project 3, argument structures have been
defined for system calls that take more than one argument. They are
provided in fs_types.h.
There are two main kinds of data structures required for the file
system - those that reside on disk, and those that never get written
to disk (buffer cache data structures, for example). It is best to
start by planning the first type of data structures.
Metadata Design
Remember that:
At this point, you will be done with design. You will have designed:
The hard disk image file is organized simply as an array of sectors. The first 512 bytes in the file correspond to the first sector, the second 512 bytes correspond to the second sector, etc.
It is not required that you create a user-level buffer cache, however
doing so will allow you to also create a user-level tool for exploring
the hard disk image outside of Simics and performing operations on it.
Also, you may find that implementing a user-level buffer cache and
single-process draft versions of the system calls may significantly
speed up your project, as the debugging environment will probably be
simpler.
Requests for Help
Please do not ask for help from the course staff with a message like this:
"I'm getting the OSKit default trap handler telling me I have a general protection fault. What's wrong?"
or
"My create system call crashes the kernel! What's wrong?"
The problem with both of these messages is that they do not follow the initial information-gathering steps which you must perform yourself. Debugging is actually one of the best ways to learn operating system concepts.
Having said that, if you have spent reasonable amount of time trying to solve the problem and are not making any progress, do not hesitate to ask a question. But when you do, provide evidence that you have spent some time trying to solve the problem yourself.
Plan of Attack
Like the kernel project, the file system is a great deal of
work. Although it is not necessarily as conceptually challenging, it
is probably more code. Do not be fooled into thinking you can waste
two of your four weeks. It is critical for you to start the project
right away if you wish to finish on time and do well.
To help you, we have established a recommended plan of attack. Hopefully this will give you an idea of how to start. This also establishes the criteria used for the checkpoints.