Parallel Data Laboratory
Carnegie Mellon University, Pittsburgh, PA
//TRACE (pronounced "parallel trace") is a new approach for extracting and replaying traces of parallel applications to recreate their I/O behavior. Its tracing engine automatically discovers inter-node data dependencies and inter-I/O compute times for each node (process) in an application. This information is reflected in per-node annotated I/O traces. Such annotation allows a parallel replayer to closely mimic the behavior of a traced application across a variety of storage systems. When compared to other replay mechanisms, //TRACE offers significant gains in replay accuracy. Overall, the average replay error for the parallel applications evaluated in this paper is below 6%.
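To make the idea of replaying per-node annotated traces concrete, here is a minimal Python sketch. It is not the //TRACE implementation or its trace format: the `TraceRecord` fields, the `replay` function, and the example trace are all hypothetical names chosen for illustration, and real I/O is replaced by appending to a list. The sketch assumes each record carries a compute time that precedes the I/O and a list of records on other nodes that must complete first.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class TraceRecord:
    """One annotated I/O in a node's trace (hypothetical format)."""
    op: str                                         # e.g. "read" or "write"
    compute_s: float = 0.0                          # compute time before this I/O
    waits_for: list = field(default_factory=list)   # [(node_id, record_index), ...]


def replay(traces):
    """Replay per-node traces, honoring compute times and cross-node dependencies.

    `traces` maps a node id to its list of TraceRecord objects.
    Returns the global order in which the (simulated) I/Os were issued.
    """
    # One completion event per record, so other nodes can wait on it.
    done = {
        (node, i): threading.Event()
        for node, records in traces.items()
        for i in range(len(records))
    }
    order = []
    lock = threading.Lock()

    def run_node(node, records):
        for i, rec in enumerate(records):
            time.sleep(rec.compute_s)        # mimic inter-I/O compute time
            for dep in rec.waits_for:        # block until dependencies finish
                done[dep].wait()
            with lock:
                order.append((node, i, rec.op))  # stand-in for issuing real I/O
            done[(node, i)].set()            # release any nodes waiting on this I/O

    threads = [
        threading.Thread(target=run_node, args=(node, records))
        for node, records in traces.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


# Node 1's read depends on node 0's write, which follows 20 ms of compute.
example_traces = {
    0: [TraceRecord("write", compute_s=0.02)],
    1: [TraceRecord("read", waits_for=[(0, 0)])],
}
```

Calling `replay(example_traces)` issues node 0's write before node 1's read regardless of thread scheduling, because node 1 blocks on the completion event for node 0's record.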
BibTeX entry
@inproceedings{ptrace-fast2007,
  author       = "Michael Mesnier and Matthew Wachs and Julio L\'{o}pez and Raja Sambasivan and James Hendricks and Gregory Ganger",
  title        = "//TRACE -- Parallel Trace Replay with Approximate Causal Events",
  organization = "{USENIX}",
  booktitle    = "Proceedings of 5th Conference on File Systems and Storage Technologies ({FAST'07})",
  month        = "February",
  year         = 2007,
  address      = "San Jose, CA"
}