(I'm the one with the glasses. The furry one is Smokey.)
Angela Demke Brown
Ph.D. Candidate
Computer Science Department
Programs with data sets that exceed the capacity of physical memory (referred to as "out-of-core" programs) are common in many scientific computing applications. Examples can be found in weather modeling, computational fluid dynamics, earthquake simulations, scientific visualization, and speech recognition, to name just a few. These programs require large amounts of I/O throughout their execution, but existing techniques for managing these requirements leave much to be desired. The simple solution, virtual memory, delivers unacceptable performance due to the high latency of page faults. Another problem with virtual memory is that out-of-core programs can over-consume memory resources and degrade the performance of other applications executing at the same time. Anyone who has ever run a large simulation on their desktop machine is all too familiar with this effect! The alternative for good performance, explicit read/write calls, is much harder to implement and leads to code that is less portable and harder to maintain.
My thesis research investigates a fully automatic scheme that enables out-of-core applications to explicitly manage their memory requirements by prefetching pages that will be needed in the near future, and by informing the operating system about pages that can be replaced. To make this work, we combine the ability of the compiler to analyze data access patterns statically and automatically transform code (by inserting the required memory management hints) with the ability of the operating system to monitor dynamic resource usage conditions. A new run-time layer intercepts the hints inserted by the compiler and uses the dynamic information to decide when action needs to be taken. This system has been implemented using IRIX 6.5 and the Stanford SUIF compiler infrastructure; results have shown that the technique can dramatically improve the performance of array-based out-of-core programs (over implementations that rely on ordinary virtual memory) while virtually eliminating the negative impact on an interactive program. No intervention by the application programmer is required.
Currently, I am exploring the performance of a new compiler algorithm for inserting prefetch directives in multi-dimensional loops, such that the prefetches will be issued early enough to hide the I/O latency completely (we call this scheduling the prefetches). The previous algorithm only considered the innermost loop, which typically contains too few iterations for effective scheduling given the large latencies involved. I am also looking at extending automatic prefetching techniques to irregular out-of-core programs that use pointer-based data structures.
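To see why the innermost loop alone is often not enough, consider the usual back-of-the-envelope calculation: a prefetch must be issued roughly ceil(latency / time per iteration) iterations ahead. The sketch below is only that textbook calculation, not the new compiler algorithm mentioned above; `schedule_prefetch`, its parameters, and the example numbers are all assumptions made for illustration. It walks outward through a loop nest until the required distance is smaller than the trip count of the loop at that level.

```c
#include <assert.h>
#include <stddef.h>

/* ceil(a / b) for positive integers */
static size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

typedef struct {
    int loop_level;   /* 0 = innermost, 1 = next enclosing loop, ... */
    size_t distance;  /* how many iterations ahead to prefetch at that level */
} prefetch_schedule;

/* trip_counts[l] is the iteration count of the loop at level l;
 * iter_us is the estimated time of one innermost iteration. */
prefetch_schedule schedule_prefetch(size_t latency_us, size_t iter_us,
                                    const size_t *trip_counts, int depth) {
    assert(depth > 0 && iter_us > 0);
    size_t time_per_iter = iter_us;
    prefetch_schedule s = {0, 0};

    for (int level = 0; level < depth; level++) {
        s.loop_level = level;
        s.distance = ceil_div(latency_us, time_per_iter);
        /* The distance fits if the loop runs longer than the lead time. */
        if (s.distance < trip_counts[level])
            break;
        /* One iteration of the enclosing loop runs this whole loop. */
        time_per_iter *= trip_counts[level];
    }
    return s; /* if nothing fits, the outermost level is returned */
}
```

For example, with an assumed 10,000 microsecond I/O latency, 1 microsecond per inner iteration, and trip counts of 512 (inner) and 1000 (outer), the inner loop would need a distance of 10,000 iterations, far more than it has. Moving out one level, each outer iteration takes 512 microseconds, so prefetching 20 outer iterations ahead covers the latency.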