Project 4: Writing a File System

Project Overview
Goals
Technical Disclaimer
Important Dates
Groups
Grading
Handin
Hardware Primitives
The Organization of a Hard Disk
Blocks vs. Sectors
Using the Disk
File System Design
Files
Naming
Fragmentation
Persistent Metadata
Limits
Performance
Buffer Cache
File System Interface
Using the Reference Kernel
Hints on Implementing a File System
A User Level Buffer Cache
Requests for Help
Plan of Attack

Project Overview

By now you should have a thorough understanding of how to implement a simple kernel on the x86 architecture. Features like multiple disjoint address spaces, context switching, and system call implementation should no longer be a mystery. In addition, you have now been exposed to a somewhat large scale systems programming project (both in size and complexity) if you had not been exposed to one before. Indeed, the kernel project is a valuable assignment.

The kernel you created in the previous project has one fatal flaw however - there is no way for users to save data from one period of system uptime to the next. Worse yet, there is not even a way for a program to save data such that it can be restored later by another incarnation of the same program (for example, a simple text editor opening a file created an hour earlier by a user).

This type of data is called persistent data, and one way to achieve it is with a file system. Using various forms of media we can save data so that it can be recovered later, even if the system goes down for some period of time or the accesses are from different processes with disjoint address spaces.

While the file system you will implement in this project is fairly simple, it will increase the utility of your operating system greatly. Users will be able to create files and store data in them. These files will then be available to other processes, even after the system has been rebooted.

Like the kernel project you have just completed, the file system is a four week project. It will again be very important to adhere to a reasonable schedule throughout the four weeks to ensure completion of the project. The file system probably involves the most code out of the four projects in this course. This causes problems not only in writing the code, but in finding bugs as well, so be prepared and make sure you leave time at the end for plenty of debugging.

Goals

Implementing a file system. This involves understanding how a disk is organized, how to use a buffer cache, etc.
Designing and managing large amounts of code well. If enough planning is done ahead of time, significant amounts of code can be reused throughout the file system.
Debugging large amounts of code. The file system code will likely be annoying to debug - lots of possibilities for off by one errors, array bounds errors, etc.

Technology Disclaimer

Because of the availability, low cost, and widespread use of the x86 architecture, it was chosen as the platform for this sequence of projects. As its creator, Intel Corporation has provided much of the documentation used in the development of these projects. In its literature Intel uses and defines terms like interrupt, fault, etc. On top of this the x86 architecture will be the only platform used in these projects.

The goal of this project set is certainly not to teach the idiosyncrasies of the x86 architecture (or Intel's documentation). That said, it will be necessary to become accustomed to the x86 way of doing things, and the Intel nomenclature, for the purposes of completing this project set. Just keep in mind that the x86 way of doing things is not the only way of doing things. It is the price to be paid for learning the principles of operating systems on a real world system instead of a simulated architecture.

Important Dates

Monday, March 31st: Project four assigned.
Monday, April 7thth: Checkpoint one.
Monday, April 14th: Checkpoint two.
Monday, April 28th: Project four is due at midnight (Monday night).

Groups

The file system project is a group assignment. You should already be in a group of two from the previous project. If you are not in a group, or you are having other group difficulties, send email to staff-412@cs.cmu.edu.

Grading

The primary criteria for grading the file system will be correctness and robustness. Does the kernel crash if you pass a null pointer to your write system call? File systems are expected to take abuse and continue to run.

Although it is not the first thing we will look at, style is also important. Use an appropriate level of commenting, name variables meaningfully and consistently, and try not to resort to global variables when possible. Style is an especially important consideration when dealing with large amounts of code.

It is also our intent that part of the project 4 grade will be based on a face to face interview process. We reserve the right to give partners in a group different project grades if circumstances make that necessary.

Please hand in a design document with your file system. It should talk about all the important design decisions you made, what works, and what does not. If you tell us about something that does not work, it puts you in a better position to be given more partial credit than if you do not tell us and we stumble across it ourselves, because it lets us know that you understand that there is a problem and have spent some nonzero amount of time trying to find it.

Handin

The procedure for handing in project 4 will be the same as it was for project 3. Look for more specific handin directions closer to the project due date.

Hardware Primitives

Projects one and three involved quite a bit of hardware interaction. To build a file system however, we require only a couple basic abilities - the ability to read and write blocks to persistent storage. In this project, as is the case in many environments, we will be using a hard disk for persistent storage.

The Organization of a Hard Disk

A hard disk is composed of one or more disk-shaped pieces of magnetic media. Unlike floppy disks (which contain magnetic media that is, well, floppy), the disks in a hard disk are rigid, hence the name hard disk. Each disk in a hard disk is referred to as a platter. Information can be stored on either side of this platter, and each side of a platter is referred to as a surface.

Each surface is further divided by concentric circles organizing it into tracks. The last step is to divide each track into some number of sectors. The sectors within a track occupy an equal amount of surface area. These sectors are the unit of transfer for a disk; you may not read or write less than one sector. For most modern hard disks, a sector contains 512 bytes.

Modern hard disks get more complicated than this - for example, they compensate for the fact that with this simple scheme, tracks near the outside of the surface would have much larger sectors than the tracks near the inside. However we will not go into such details here.

In fact, through a convenient abstraction known as logical block addressing, or LBA addressing (yes, that is technically logical block addressing addressing, but that is unfortunately the way the term is used), the geometry of the disk need not be a concern to you. LBA addressing numbers the sectors throughout the disk (all sectors on all surfaces) from zero on up, so that the disk appears to the system programmer like a large array of sectors. This simplification does not come completely for free (you should think about why), but it does make your job easier for the purpose of designing and implementing your file system.

By convention, the first sector of the disk (sector zero) is not available for use by the file system. This sector is the boot sector. It contains important information such as a bootstrap program or partition table information. For this reason overwriting the first sector of the disk normally has catastrophic effects. You must ensure that your file system never writes to sector zero.

Blocks vs. Sectors

A block is different than a sector in that it is the minimum unit of transfer for the file system. You may choose to make blocks equal to one sector, or you may choose to have blocks larger than one sector. A block must be an integer multiple of sectors, however.

Using the Disk

Like the devices you supported in projects one and three, communication with the disk is done through I/O ports using functions like inb and outb. Configuring and using the disk is a little more complicated than reading scancodes from the keyboard. The disk must be properly configured first, and there are several steps needed to initiate a read or a write. For this reason we have supplied you with a very simple disk driver. It is available in the library for use with this project, libproj4.a. We have provided a version of the source in the tarball, however this is only for you to look at - please do not try to compile the disk driver we provide in the tarball..

It is important to remember that only one request may be sent to the disk at any given time. If you try to initiate an IDE transfer while another transfer is already pending, an error will occur and the device driver will cause your kernel to panic.

The disk driver has the following interface (defined in disk.h):

int disk_init()
Installs the interrupt handler for disk interrupts into the IDT, blocks until the disks are ready (on power up a real disk may need time to begin responding to commands), and finds out how big the disks are. SUCCESS and ERROR are returned as appropriate. You will need to call this function, directly or indirectly, during the setup phase of your kernel.

void ide_read_sector(int lba_addr, char *dest, int num)
Initiates an IDE operation to read the specified sectors from the disk. The sectors will be stored sequentially starting at the address dest.

void ide_write_sector(int lba_addr, char *source, int num)
Initiates an IDE operation to write the specified sectors to the disk. The sectors will be taken sequentially starting at the address source.

int get_num_sectors()
Returns the number of sectors available on the disk. You must call this after calling disk_init() (and having disk_init() return SUCCESS).

In order to use the disk driver, you must provide two callback functions, disk_read_callback() and disk_write_callback(). These two functions will be called when a read or write completes. You will most likely need to put some process management code in these callback functions.

File System Design

File systems come in a large variety of shapes and sizes. The file system you need to implement is not very complicated, but it does have some important features that you should take a moment to consider.

Files

While a disk contains sectors of a fixed size, user-level programs only see the contents of the disk through the file abstraction. A file is, as far as programs know, a sequence of contiguous bytes, even though the bytes that belong in a file may not be physically contiguous on disk. A file has a name by which it is created and found. Your file system needs to allow a variety of operations on files. Some identify the file by its name, while others use a file descriptor. A file descriptor contains information relevant to active (opened) files, including, for instance, the current offset into the file and the location of the file's metadata cached in memory. A file descriptor is created from a file name as a result of the open or create system calls.

Note that you need to manage file descriptors carefully when a process fork()s or exec()s. You should use UNIX semantics when managing file descriptors in these situations. This implies that the child process gets its own copy of the parent's file descriptors. These new descriptors reference the same underlying objects, so for example a seek on the file descriptor in the child will change the offset of the file descriptor in the parent. File descriptors are not destroyed when a process exec()s.

Naming

The hierarchical namespace of your file system will be similar to what you are familiar with from UNIX environments. A collection of variable-sized storage units (files) are referred to by name. File names can be grouped into directories. Directories can also include other directories, or a mix of files and directories. This creates a hierarchy of directories. Directories can contain any number of files and directories, without limits other than those imposed by the physical size of the disk. The directory tree starts at the root directory, denoted by "/". The character "/" also serves as a file separator. As in UNIX environments, the name ".." refers to the parent of the directory that contains it, and "." refers to the containing directory itself. The name "../foo" then would point to the file "foo" in the parent directory.

Hierarchical names, known as paths, may be absolute or relative. Absolute paths start with "/". Relative paths are transformed into absolute paths by your file system. Your file system needs to maintain a notion of the current directory of a process. This current directory is modified with the chdir system call. The current directory, which is a path itself, is prepended to relative paths to produce the corresponding absolute path.

A link creates a file name reference that refers to an existing file under an additional name. Your file system should support both early-binding (hard) and late-binding (symbolic) links. This allows the naming tree to be made into a directed graph.

As in UNIX environments, directories are stored as files whose contents are interpreted by the file system.

Fragmentation

Your files will contain an integral number of sectors. Even though the space in a sector may be only partially used, the entire sector is reserved for that file. This means that the file system may suffer from internal fragmentation. However, the file data need not be contiguous on disk. Adjacent disk sectors may contain data belonging to two different files, and a file's sectors need not be adjacent to each other. That means that the file system need not and must not be vulnerable to external fragmentation.

Persistent Metadata

Your file system exists to store application data in files. To manage the location and grouping of raw data into files, your file system will need to build data structures. These data structures associate directories with file names, and file names with collections of disk sectors. These data structures are collectively referred to as metadata. This metadata must be written in the same disk that holds the file data. You must decide which disk sectors are used to store data and which ones will be used to store metadata.

Metadata on disk is inconsistent when one part of the metadata indicates one fact and another part of the metadata contradicts that fact. For example, an inconsistent metadata set might indicate that a file exists, and at the same time indicate that the file does not appear in any directories. The metadata on disk is also inconsistent when some part of the metadata asserts a fact that is not reflected by the data present in the disk. For example, the metadata might indicate that a file is 4 kilobytes long but only the first 2 kilobytes of the file might have been written to disk.

Your file system need not constantly maintain consistency in the metadata stored on disk. You may assume that when your file system starts the metadata on disk is consistent. You must ensure, however, that immediately after an application calls the sync() system call the metadata on disk is consistent. At all other times during execution, the metadata written on disk need not be consistent.In the real world, if a system crash occurs when the disk metadata is inconsistent, a utility will make it consistent before the file system is restarted.

Limits

Your file system may have a maximum file size, however it must be at least the size of the disk.
There is no limit on the number of files and/or directories contained within a directory, other than the limit imposed by the maximum file size.
The part of a filename that corresponds to a single directory entry is a string of n characters, where 0 < n < MAXFILENAMESIZE. The file name may contain any characters except '\0' and '/'. Two file names that differ in any character position are recognized as two different file names. MAXFILENAMESIZE is a parameter you define such that MAXFILENAMESIZE >= 32.
A path (a full hierarchical file name) may be limited to a maximum size of MAXPATHNAMELEN. MAXPATHNAMELEN is a parameter you define such that MAXPATHNAMELEN >= 256.
Your file system can only store a maximum number of files or directories. While this limit is constant for a given disk, it can be different for different disks. The mkfs system call is responsible for initializing the file system on a disk. This system call takes an argument, which is the maximum number I of files and/or directories that the file system may contain. I is such that 0 < I < 8192. Your file system must be able to handle any of the valid values of I.
Your file system refers to files in two ways. All files are referred to by name, however certain file operations (such as write) refer to their files by file descriptor. A file descriptor is a small integer that refers to the information associated with a file in use. Your file system should support a maximum of MAX_OPEN_FILES open files at any given time. MAX_OPEN_FILES is a parameter that your file system will define such that MAX_OPEN_FILES >= 32. In addition, each process supports a maximum of MAX_FD file descriptors. MAX_FD is a parameter that your file system will define such that MAX_FD >= 16.

Note: A single file may be opened multiple times, each with its own instantiation of the relevant state of open files. For example, a file may be accessed with two different current file offsets through two different instances of open. It is also possible that two different file descriptors may refer to a single instantiation of an open file. In this case, the two file descriptors would share and update the current file offset concurrently.

Performance

Your file system need not guarantee that the lookup of a file inside a directory is fast. In particular, your file system may take longer to locate a file inside a large directory than in a small one. You need not use sorted and/or balanced data structures to implement directories, but you may do so.

Your file system should perform well, however, for random data access within a file. Accessing a file's data has two components: first, looking up the metadata to determine which disk sector B contains the data at file offset off, and second, accessing the disk sector B. Your file system should guarantee that lookup of B given off takes no longer than O(log off). This means, for example, that you cannot use a linked list to store the sector numbers that form a file. Also, the requirement that any disk be able to contain up to I directories and/or files, any of which may be the size of the disk, implies that you cannot simply use a directly indexed array for this purpose. Think about why this is true.

Buffer Cache

Because the disk is very slow, your file system must try to amortize the large cost of reading and/or writing one sector over several data and/or metadata accesses. It achieves this by loading the data from disk sectors into memory, and allowing several file system operations to access the memory copy rather than the data directly on disk. This would be rather pointless if the disk copy of the data needed to be updated every time the memory copy was updated. Therefore your file system must implement a write-behind policy for its buffer cache. This policy dictates that when the cached data (or metadata) is written, the corresponding disk data need not be immediately updated. The disk data is only updated when:

The sync() system call is called by a user process.
The cache is full and a modified disk block is replaced by a newly accessed block from disk.

The update policy for cached data is one parameter of caches. Another parameter of cache design is the replacement policy. The replacement policy dictates which cached blocks will be evicted when the cache is full and space is needed in the cache for newly needed blocks.

A reasonable replacement policy for the buffer cache is least-recently-used (LRU.) This policy dictates that the block which has been used least recently among those currently in the cache is chosen as the victim when a free block is needed. The LRU policy may be modified to prefer victim blocks that have not been modified since they were loaded into memory from disk, because this would avoid an extra disk write. LRU may be further modified advantageously to apply different treatment to different types of blocks. For instance, data and metadata often have different patterns of access, and may benefit from different LRU queues. More exotically, large files (such as those used by databases) are typically accessed in more random patterns than shorter files are (which are often accessed sequentially). This fact may or may not warrant a modification of the cache replacement policies. Your buffer cache should hold no more than 256 sectors. You may choose to divide this space among several separate pools for the different kinds of access (e.g, data, metadata, prefetch, and whatever other clever things you may dispose,) but the total number of sectors cached may not exceed 256.

File System Interface

Your file system must support the following interface. The system call numbers are defined in syscall_nums.h. Please continue to use the same system call convention (i.e. the same software interrupt, same registers, etc). As in project 3, argument structures have been defined for system calls that take more than one argument. They are provided in fs_types.h.

int mkfs(int max_inodes)
Initializes the file system such that max_inodes inodes are available. This parameter determines the maximum number of files and directories that may be present on your file system. If max_inodes is 0, mkfs estimates a reasonable number for max_inodes based on the size of the disk. If max_inodes is less than 0 or greater than 8192, or if any error prevents the initialization from completing, ERROR is returned. Otherwise mkfs returns SUCCESS.
int open(char *pathname)
Opens the file named by pathname. If the file exists, this syscall sets the file position to the beginning of the file and returns a file descriptor number that can be used for future operations on this open file. If the file does not exist or pathname refers to a directory, open returns ERROR. open also returns ERROR if pathname is relative and the process' current director y has not yet been set with chdir, or if any internal file system error occurs. The file descriptor returned should be a small integer that uniquely identifies the open pathname file for as long as the file remains open. A single pathname name can be opened multiple times simultaneously, and each separate open should return a separate file descriptor. If pathname is a symbolic link, open attempts to open the file that the symbolic link points to, and not the symbolic link itself.
int close(int fd)
Closes the open file specified by the file descriptor number fd. If fd is not the descriptor number of a file currently open in the system, close returns ERROR; otherwise, it returns SUCCESS.
int create(char *pathname)
Creates and opens the new file named pathname. If the file can be created and opened, create sets the file position to the beginning of the empty file and returns a file descriptor number that can be used for future operations on this open file. Otherwise, create returns ERROR. Note that create should return ERROR if pathname already exists, or if its parent directory does not exist, and also if pathname is a relative path but the process' current directory has not yet been set with chdir.
int read(int fd, void *buf, int size)
Reads data from the current file position of an open file. The argument fd specifies the file descriptor number of the open file to be read. buf specifies the address of the calling process' buffer into which to deposit the data from the file, and size is the number of bytes of data to be read. read returns the number of bytes read, which will be 0 if reading at the end-of-file. The number of bytes read will be the minimum of the number of bytes requested and the number of bytes remaining in the file between the current file position and the end-of-file. Any error should cause read to return ERROR.
int write(int fd, void *buf, int size)
Writes data to the current file position of an open file. The argument fd specifies the file descriptor number of the open file to be written, buf specifies the address of the calling process' buffer from which to get the write data, and size is the number of bytes to be written to the file. write returns the number of bytes written, unless any error occurs, in which case it returns ERROR. It is not an error to write past the end of the file.
int seek(int fd, int offset)
Changes the current file position of the open file specified by fd. The argument offset specifies a byte offset in the file to which the current file position will be set. offset must be non-negative, and fd must be a currently valid descriptor. offset, however, may be larger than the current file size, in which case the file is automatically grown. seek returns the new file position or ERROR.
int link(char *oldname, char *newname)
Creates a hard link from the new file newname to the existing file oldname. The files oldname and newname need not be in the same directory. The file oldname must not be a directory. If newname exists, oldname does not exist, or if any file system limit prevents the completion of the operation, link returns ERROR, and it returns SUCCESS otherwise.
int symlink(char *oldname, char *newname)
Creates a symbolic link from the new file newname to the existing file oldname. The files oldname and newname need not be in the same directory. The file oldname may be a directory and may not even exist. If newname exists or if any file system limit prevents the completion of the operation, symlink returns ERROR, and it returns SUCCESS otherwise.
int unlink(char* pathname)
Removes the directory entry for pathname, and, if pathname is the last reference to a file, also deletes the file itself by freeing all data and metadata used by the file. The file pathname must not be a directory. If pathname is a symbolic link, the unlink operation is attempted on the symbolic link itself, and not on the file the symbolic link points to. On success, unlink returns SUCCESS, and it returns ERROR otherwise.
int filesize(char *pathname)
Returns the size of the file pathname, in bytes. Returns ERROR if the file does not exist, if the file is a directory, or if any other error occurs.
int mkdir(char *pathname)
Creates a new directory named pathname and creates the ``.'' and ``..'' entries in the new directory. If pathname exists and/or any other error occurs, mkdir returns ERROR. It returns SUCCESS otherwise.
int rmdir(char *pathname)
Deletes the existing directory named pathname. pathname must be a directory, and it must contain no valid entries other than ``.'' and ``..''. rmdir returns ERROR if any error occurs, and returns SUCCESS otherwise.
int chdir(char *pathname)
Changes the current directory of the current process to pathname. The notion of current directory is maintained by the file system for processes that use the file system, and is not reflected on disk. Before a process calls chdir, or if all previous calls from a process to chdir have returned ERROR, the current directory for that process is undefined, and all pathnames given to file system calls must be absolute paths (paths starting with ' /'.) After a successful call to chdir, the process may specify relative paths, that is, paths that do not start with ' /'. chdir returns ERROR if any error occurs (if pathname does not exist, for example), and returns SUCCESS otherwise.
int dirsize(char *pathname)
Returns the number of entries in the directory pathname, including the entries for ``.'' and ``..''. Returns ERROR if pathname does not exist, if pathname is not a directory, or if any other error occurs.
int direntry(char *pathname, int entrypos, char *entryname)
Sets entryname to the name of the entrypos'th entry within the directory specified by pathname. The entries within a directory can be returned in any order, but the order must remain the same for as long as no files are created or unlinked within it. 0 is a valid value for entrypos. direntry returns ERROR if pathname refers to a regular file, if pathname does not exist, if the directory has fewer than entrypos + 1 entries, or if any other error occurs. It returns SUCCESS otherwise. If direntry returns ERROR, the value of entryname is undefined. entryname is an address in the caller's memory space to which at least MAXFILENAMESIZE bytes have been allocated by the caller.
int sync()
Writes all ``dirty'' cached metadata and data to the disk, thus ensuring that the disk contains a consistent copy of the file system. ``Dirty'' data (or metadata) is data whose value on disk is stale because the buffer cache's write-behind policy has not yet updated it. sync returns ERROR if any file system error prevents the sync operation from proceeding. Otherwise sync returns SUCCESS.

Using the Reference Kernel

We encourage you to use your kernel from project 3 during project 4. If for some reason you feel that is not possible, please meet with the course staff to discuss using a reference kernel.

Hints on Implementing a File System

Think about the data structures you will need before you start writing code. As was the case in the kernel project, writing pseudocode here will help you a lot. Go through each system call and sketch out the code. At the same time, you will be refining the data structures for your file system.

There are two main kinds of data structures required for the file system - those that reside on disk, and those that never get written to disk (buffer cache data structures, for example). It is best to start by planning the first type of data structures.

Metadata Design

Remember that:

Your file system should not suffer from external fragmentation.
Random accesses must be efficient.
The maximum file size be at least as large as the size of the disk.

These constraints will likely dictate that you have some of the following:

A list of free sectors or blocks, possibly implemented as a bit mask of free sectors, one bit for each sector on the disk, or for each block if you decide to implement blocks that span multiple sectors. This list may be on disk or exist only in memory. Think about why one would be better than the other and why. Also think about whether it would be all right to require this list of free blocks to be contiguous on disk, or how you would relax that constraint in your implementation.
A file system header containing the maximum number of files and directories in the file system. This should be in a well known location.
A multi-level indexed structure for storing the blocks that belong to a file, in the manner of UNIX inodes, as described in the textbook.
An inode structure containing the above indexed structure in addition to any information regarding the file, such as size, or regarding the inode itself, such as the number of objects (i.e. directories) that know of its existence.
A set of disk sectors in one or several locations for storing the inodes, and a way to find them.
A well-known location for the inode of the root directory.
A directory entry structure containing a file name and an inode number.

Think about what data structures you will need, and how the various file system calls will use them. Increasingly refine the system calls, looking for functionality that is needed from within several of them, e.g, finding the disk block that contains a given offset within a file, finding the nth directory entry within a directory, or finding an open file structure given a file descriptor. There are many such opportunities for sharing code among the system calls: write a legible and maintainable program by separating common functionality into procedures.

At this point, you will be done with design. You will have designed:

A layout for the disk.
Internal format of directories, files, file descriptors, inodes, etc.
Buffer cache queues, hash tables, etc.
A rough call graph of procedures, to implement all file system syscalls.

The rest of the recommended approach is listed in the "Plan of Attack" section.

A User Level Buffer Cache

There is a neat trick that you can use to simplify the debugging of your file system code. Simics uses raw files for its disk images. This means that you can create a version of your buffer cache that, rather than interacting with the hard disk device driver in Simics, simply seeks and reads/writes to a regular file in user space under Linux or Solaris. If you are careful in your design of the buffer cache and how the file system interacts with it, the changes required to create a user-level version of the buffer cache are minimal.

The hard disk image file is organized simply as an array of sectors. The first 512 bytes in the file correspond to the first sector, the second 512 bytes correspond to the second sector, etc.

It is not required that you create a user-level buffer cache, however doing so will allow you to also create a user-level tool for exploring the hard disk image outside of Simics and performing operations on it. Also, you may find that implementing a user-level buffer cache and single-process draft versions of the system calls may significantly speed up your project, as the debugging environment will probably be simpler.

Requests for Help

Please do not ask for help from the course staff with a message like this:

"I'm getting the OSKit default trap handler telling me I have a general protection fault. What's wrong?"

"My create system call crashes the kernel! What's wrong?"

The problem with both of these messages is that they do not follow the initial information-gathering steps which you must perform yourself. Debugging is actually one of the best ways to learn operating system concepts.

Having said that, if you have spent reasonable amount of time trying to solve the problem and are not making any progress, do not hesitate to ask a question. But when you do, provide evidence that you have spent some time trying to solve the problem yourself.

Plan of Attack

Like the kernel project, the file system is a great deal of work. Although it is not necessarily as conceptually challenging, it is probably more code. Do not be fooled into thinking you can waste two of your four weeks. It is critical for you to start the project right away if you wish to finish on time and do well.

To help you, we have established a recommended plan of attack. Hopefully this will give you an idea of how to start. This also establishes the criteria used for the checkpoints.

Read this handout thoroughly.
Re-read the complicated parts to make sure you understand what's being said.
Think about it all for a while.
Write up pseudocode for all the system calls. At the same time, start sketching out the layout of the disk and design your data structures. Use the previous hints section as a guide for this step.
Work on the buffer cache. The buffer cache can be seen as a layer underneath the file system. The buffer cache need not understand much about the organization of the disk. Usage of the disk driver, as well as process blocking and rescheduling, can be completely encapsulated in the buffer cache to make the design cleaner and simpler.
Test your buffer cache by simply reading and writing some blocks to the disk. At this point you are simply trying to verify that the blocks go where you think they are, and that the process management is working correctly.
At this point you have enough to pass checkpoint one.
Write mkfs and the helper functions needed by it. Test it. This should be made easier by using a user-level buffer cache (described in this document).
Write more of the common helper functions. Choose the order of these carefully - it will be difficult to test some of them on their own.
Complete the create, open, read, write, and close system calls. If you have done the previous step thoroughly (and tested the helper functions), this step is not as bad as it seems.
You have enough to pass checkpoint 2.
Implement the remaining system calls.
Test your file system extensively.
You are done with the 15-412 Project Set! Go celebrate, and then start studying for the final. :D

Steve Muckle

Last modified: Mon Mar 31 20:50:41 EST 2003