//TRACE -- Parallel Trace Replay with Approximate Causal Events
Michael Mesnier, Matthew Wachs, Julio López, Raja Sambasivan, James Hendricks and Greggory Ganger

Parallel Data Laboratory
Carnegie Mellon University, Pittsburgh, PA

Abstract

//TRACE¹ is a new approach for extracting and replaying traces of parallel applications to recreate their I/O behavior. Its tracing engine automatically discovers inter-node data dependencies and inter-I/O compute times for each node (process) in an application. This information is reflected in per-node annotated I/O traces. Such annotation allows a parallel replayer to closely mimic the behavior of a traced application across a variety of storage systems. Compared with other replay mechanisms, //TRACE offers significant gains in replay accuracy. Overall, the average replay error for the parallel applications evaluated in this paper is below 6%.

¹ Pronounced "parallel trace".
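To illustrate how per-node annotations (compute time between I/Os plus inter-node dependencies) can drive a parallel replay, here is a minimal timing simulation. It is a sketch under stated assumptions, not the //TRACE replayer: the record format, the field names, `replay`, and the `io_time` storage model are all hypothetical, and the code computes completion times instead of issuing real I/O. An event may start only after its node has finished its previous event and spent the recorded compute time, and after every event it depends on has completed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TraceRecord:
    """One annotated I/O event in a per-node trace (hypothetical format)."""
    compute_before: float  # think time since this node's previous I/O completed
    op: str                # e.g. "read" or "write"
    nbytes: int
    # (node, event index) pairs that must complete before this event may start
    deps: List[Tuple[int, int]] = field(default_factory=list)


def replay(traces: Dict[int, List[TraceRecord]],
           io_time: Callable[[TraceRecord], float]) -> Dict[int, List[float]]:
    """Simulate a parallel replay; return each event's completion time per node.

    io_time is a hypothetical model of how long the target storage system
    takes to service one request.
    """
    done: Dict[int, List[float]] = {n: [] for n in traces}
    clock = {n: 0.0 for n in traces}  # time at which each node's last event finished
    progress = True
    while progress:
        progress = False
        for node, records in traces.items():
            i = len(done[node])
            if i == len(records):
                continue  # this node has replayed its whole trace
            rec = records[i]
            # Block this node until every dependency has been replayed.
            if any(len(done[dn]) <= di for dn, di in rec.deps):
                continue
            ready = clock[node] + rec.compute_before
            for dn, di in rec.deps:
                ready = max(ready, done[dn][di])
            finish = ready + io_time(rec)
            done[node].append(finish)
            clock[node] = finish
            progress = True
    # If some node never finished, its dependencies were cyclic or dangling.
    if any(len(done[n]) != len(traces[n]) for n in traces):
        raise ValueError("cyclic or dangling dependencies in trace")
    return done
```

For example, if node 1's read depends on node 0's write and the (hypothetical) storage model charges 2 time units per 4 KiB, node 0 finishes at 3.0 (1.0 compute + 2.0 I/O). Node 1 waits for that write despite its own shorter compute time and finishes at 5.0. Swapping in a different `io_time` stands in for replaying the same annotated traces against a different storage system.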

BibTeX entry

@inproceedings	{ ptrace-fast2007,
  author	= "Michael Mesnier and Matthew Wachs and Julio L\'{o}pez and
		   Raja Sambasivan and James Hendricks and Greggory Ganger",
  title		= "//TRACE -- Parallel Trace Replay with Approximate Causal
		   Events",
  organization	= "{USENIX}",
  booktitle	= "Proceedings of the 5th Conference on File and Storage
		   Technologies ({FAST'07})",
  month		= "February",
  year		= 2007,
  address	= "San Jose, CA"
}