My advisor was Garth Gibson of the School of Computer Science. I worked with both the Parallel Data Lab and the Data Storage Systems Center.
Now, I work at a new startup called Data Domain. We are still in stealth mode, but we have many openings. Feel free to contact me if you'd like to learn more about the opportunities here.
Many applications have serial I/O workloads that benefit from a disk array no more than single-threaded applications benefit from a parallel processor. For such serial workloads, read latency dominates I/O performance, and disk arrays don't reduce latency. How can we help these applications leverage disk array parallelism to achieve low access latency?
My approach is for applications to give hints about their future I/O accesses. The file system can then use the hints to prefetch data and manage the cache. We've shown that a broad range of regularly used, I/O-intensive applications can give good hints. Examples include: Agrep text search, XDataSlice scientific visualization, Postgres relational database, Sphinx speech recognition, Davidson computational physics, and the Gnuld object code linker. The hard part is managing resources such as file cache buffers and disk bandwidth.
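To make the idea of hint-driven prefetching concrete, here is a minimal sketch in Python. Everything in it is hypothetical and illustrative: the `HintedPrefetcher` class, its `disclose` and `read` methods, and the fixed `depth` parameter are assumptions, not the interface of the system described here. The application discloses the ordered list of blocks it will read, and the file system keeps up to `depth` hinted blocks in flight ahead of the application's demand reads. A fixed depth is a simplifying assumption; choosing how many buffers to devote to prefetching is exactly the resource-management question that the cost-benefit framework below addresses.

```python
from collections import deque


class HintedPrefetcher:
    """Hypothetical sketch: prefetch disclosed blocks up to a fixed depth ahead."""

    def __init__(self, depth):
        self.depth = depth        # assumed fixed number of hinted blocks to keep in flight
        self.hints = deque()      # disclosed future accesses, in access order
        self.in_flight = set()    # hinted blocks already prefetched (or being prefetched)

    def disclose(self, blocks):
        """Application discloses its future reads; return the blocks to prefetch now."""
        self.hints.extend(blocks)
        return self._top_up()

    def read(self, block):
        """Record a demand read; return the blocks to prefetch next."""
        # Consume the hint if the application read the block it said it would.
        if self.hints and self.hints[0] == block:
            self.hints.popleft()
        self.in_flight.discard(block)
        return self._top_up()

    def _top_up(self):
        # Issue prefetches for the next `depth` hinted blocks not already in flight.
        issue = []
        for b in list(self.hints)[: self.depth]:
            if b not in self.in_flight:
                self.in_flight.add(b)
                issue.append(b)
        return issue


p = HintedPrefetcher(depth=2)
print(p.disclose([5, 9, 3, 7]))  # prefetch the first two hinted blocks
print(p.read(5))                 # after block 5 is consumed, block 3 is issued
```

Because the hints name blocks before they are needed, several prefetches can be outstanding at once, which is how a serial reader can keep multiple disks in an array busy.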
When should cache buffers be used to hold data for reuse, and when should potentially useful data be ejected to prefetch new data? I have developed a framework for resource management based on cost-benefit analysis to answer this question. It uses a system model to estimate the benefit of using a buffer for prefetching and the cost of taking a buffer from the cache. The system I implemented (with help from my friends in the PDL) computes these estimates dynamically and reallocates a buffer from the cache for prefetching when the benefit is greater than the cost. The system reduces execution time for the applications listed above by 19% to 84%.
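The core decision rule, reallocating a buffer from the cache to prefetching only when the estimated benefit exceeds the estimated cost, can be illustrated with a small greedy sketch in Python. This is not the system model from the work above: the `CachedBuffer` and `PrefetchCandidate` types, the `reallocate` function, and the idea that each buffer and each candidate carries a single precomputed cost or benefit number (say, estimated change in I/O stall time) are all assumptions made for illustration. The sketch pairs the cheapest buffer to eject with the most valuable prefetch and keeps reallocating while benefit exceeds cost.

```python
from dataclasses import dataclass


@dataclass
class CachedBuffer:
    block: int
    cost: float      # hypothetical estimate: stall time added if this buffer is ejected


@dataclass
class PrefetchCandidate:
    block: int
    benefit: float   # hypothetical estimate: stall time saved if this block is prefetched


def reallocate(cache, candidates):
    """Greedy sketch: return (ejected_block, prefetched_block) pairs where benefit > cost."""
    victims = sorted(cache, key=lambda b: b.cost)                      # cheapest ejection first
    wanted = sorted(candidates, key=lambda c: c.benefit, reverse=True)  # best prefetch first
    decisions = []
    for victim, cand in zip(victims, wanted):
        # Costs only rise and benefits only fall from here on, so the first
        # unprofitable pairing means no later pairing is profitable either.
        if cand.benefit <= victim.cost:
            break
        decisions.append((victim.block, cand.block))
    return decisions


cache = [CachedBuffer(1, 5.0), CachedBuffer(2, 0.5), CachedBuffer(3, 2.0)]
candidates = [PrefetchCandidate(10, 3.0), PrefetchCandidate(11, 1.0), PrefetchCandidate(12, 4.0)]
print(reallocate(cache, candidates))
```

Here buffers 2 and 3 are reallocated to prefetch blocks 12 and 10, while buffer 1 stays in the cache because its ejection cost (5.0) exceeds the remaining prefetch benefit (1.0). In a real system the hard part is producing these estimates dynamically from a model of the disks, the cache, and the application's hints.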
Here is the paper (postscript, pdf) I presented on this work at the 15th Symposium on Operating Systems Principles. Here are the slides for my talk.
Here is the full treatment in my dissertation (postscript, pdf).
Here is a complete list of publications on the topic.
rhp@acm.org