Matthew Wachs, Ph.D.
E-mail: Look at the URL of this
page. Take everything between the tilde (~) and the following slash
(/), and append ".com" to it. Then prepend "misc@" to it.
Research
I worked on performance insulation for shared storage servers. Shared storage servers are an
appealing alternative to per-application, dedicated storage systems.
However, it is essential that applications sharing a server receive
good performance, fairness, and efficiency. Unfortunately, interference
between workloads may reduce all three of these. With a combination of
three techniques (timeslicing, amortization, and cache partitioning),
we've been able to approach the goal of providing
each of n clients 1/n of their standalone throughput,
while keeping average response times reasonable [ read
more | web site ].
These techniques have been implemented in the
Argon storage server.
We've also demonstrated how to extend this technique to
a workload using multiple servers to store its data [ read more ].
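A toy model can make the amortization idea concrete. The sketch below is only an illustration under stated assumptions, not Argon's implementation: it assumes each switch between workloads costs a fixed overhead (the hypothetical `switch_cost_ms`) and that a workload runs at its standalone rate for the rest of its timeslice. The function names and parameters are invented for this example.

```python
def min_timeslice_ms(switch_cost_ms: float, target_efficiency: float) -> float:
    """Smallest timeslice T such that T / (T + switch_cost) >= target_efficiency.

    Assumption: every switch between workloads wastes a fixed switch_cost_ms.
    """
    if not 0 < target_efficiency < 1:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    return target_efficiency * switch_cost_ms / (1 - target_efficiency)


def per_client_throughput(standalone_mbps: float, n_clients: int,
                          timeslice_ms: float, switch_cost_ms: float) -> float:
    """Throughput one of n equal clients sees under round-robin timeslicing."""
    # Fraction of each slice spent doing useful work rather than switching.
    efficiency = timeslice_ms / (timeslice_ms + switch_cost_ms)
    # Each client owns 1/n of the server's time.
    return standalone_mbps * efficiency / n_clients


if __name__ == "__main__":
    slice_ms = min_timeslice_ms(switch_cost_ms=10.0, target_efficiency=0.9)
    print(slice_ms)
    print(per_client_throughput(60.0, 3, slice_ms, 10.0))
```

In this model, longer timeslices push each client closer to 1/n of its standalone throughput, at the price of longer waits between turns, which is why average response time has to be watched alongside efficiency.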
Our latest work in this area, Cesium, shows how to provide specific bandwidth guarantees to workloads
while building on the high efficiency of Argon. A new timeslicing-based
scheduler grows or shrinks timeslices depending on the access patterns of
workloads to provide them with their specified bandwidth requirements. When a
guarantee cannot be met, we are able to differentiate between fundamental
violations (those where the workload's access pattern is temporarily too
demanding for its guarantee to be met) and avoidable violations. Our scheduler
is able to prevent nearly all of the avoidable violations, whereas other
approaches that do not explicitly manage efficiency suffer from many avoidable
violations when the workloads are
complex [ read more ].
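The distinction between fundamental and avoidable violations can be sketched in a few lines. This is a simplified illustration, not Cesium's scheduler: it assumes a workload's current access pattern is summarized by the bandwidth it would get with the server to itself (`standalone_mbps`), and that each workload may use at most a fixed fraction of server time (`max_share`). Both names, and the `classify` function itself, are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    MET = "guarantee met"
    FUNDAMENTAL = "fundamental violation"
    AVOIDABLE = "avoidable violation"


def needed_share(guarantee_mbps: float, standalone_mbps: float) -> float:
    """Fraction of server time needed to deliver the guarantee.

    Assumption: bandwidth scales linearly with the share of time received.
    """
    if standalone_mbps <= 0:
        return float("inf")
    return guarantee_mbps / standalone_mbps


def classify(guarantee_mbps: float, achieved_mbps: float,
             standalone_mbps: float, max_share: float) -> Outcome:
    """Label one measurement period for one workload."""
    if achieved_mbps >= guarantee_mbps:
        return Outcome.MET
    # The access pattern is so demanding that even the workload's full
    # share of the server could not have delivered the guarantee.
    if needed_share(guarantee_mbps, standalone_mbps) > max_share:
        return Outcome.FUNDAMENTAL
    # The guarantee was achievable within the share, yet was missed.
    return Outcome.AVOIDABLE


if __name__ == "__main__":
    print(classify(10.0, 6.0, 15.0, 0.5))
    print(classify(10.0, 6.0, 40.0, 0.5))
```

Under these assumptions, `needed_share` also suggests how a timeslice might grow or shrink: a workload whose pattern becomes less efficient needs a larger share of time to keep the same bandwidth.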
My thesis on these topics may be viewed here.
I've also worked on a number of other topics. We explored making it possible to use a
file system implementation in one operating system from within another.
Not all file systems are available on all operating systems. Porting file
systems can be a significant burden for implementers. One type of "porting" is
merely maintaining compatibility with newer versions of a kernel; even minor
kernel revisions often change file system interfaces enough to require
significant effort from developers. While file systems can be exported from
one operating system to another using file sharing / network file systems like
NFS, the semantics of these protocols often differ dramatically from the file
system of interest. If NFS is used, the semantics become the "lowest common
denominator." Our solution, which preserves semantics, is File System Virtual
Appliances (FSVAs). Each FSVA is a virtual machine that hosts a file
system on that file system's operating system of choice. Other virtual machines on the
same machine can then access the file system as if it were local to them.
This is accomplished by installing a relatively simple kernel module in the
operating systems of both virtual machines. The module performs VFS forwarding
(redirecting kernel file system API calls)
between the machines [ read more |
web site ].
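The forwarding idea can be illustrated at user level with a small sketch. This is an analogy only, not the FSVA kernel module: `InMemoryFS` stands in for the file system hosted in the appliance, `ForwardingStub` stands in for the client-side module, and the `transport` callable stands in for the inter-VM channel. All of these names are hypothetical.

```python
class InMemoryFS:
    """Stand-in for the file system running inside the appliance VM."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        # Append data to the file, creating it if necessary.
        self.files[path] = self.files.get(path, b"") + data
        return len(data)

    def read(self, path):
        return self.files[path]


def make_transport(fs):
    """Stand-in for the inter-VM channel: delivers a message to the host side."""
    def transport(message):
        op, args = message
        # The host side unpacks the request and calls the real file system.
        return getattr(fs, op)(*args)
    return transport


class ForwardingStub:
    """Stand-in for the client-side module: forwards every call unchanged."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, op):
        # Any operation not defined on the stub is packaged and forwarded.
        def call(*args):
            return self._transport((op, args))
        return call


if __name__ == "__main__":
    stub = ForwardingStub(make_transport(InMemoryFS()))
    stub.write("/notes.txt", b"hello")
    print(stub.read("/notes.txt"))
```

Because the stub passes operations through without reinterpreting them, the caller sees the hosted file system's own behavior rather than the semantics of an intermediate protocol.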
I've also worked on parallel
application I/O tracing for benchmarking. The best benchmark for
a real application is the real application, or trace replay based on
traces from that application. Unfortunately, running the real
application against a new or different storage system can be difficult,
or even impossible if the application or data set is classified,
confidential, or sensitive. Trace replay can be significantly more
straightforward and can be done with 'dummy' data.
For parallel applications, however,
accurate trace replay requires respecting the dependencies between
multiple nodes. Thus, it is necessary to discover these dependencies
during the trace extraction process. We've proposed and implemented a
black-box technique to do this by running a parallel application,
slowing down nodes, and observing how other nodes react [ read more | web site ].
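The slow-down-and-observe idea can be demonstrated on a toy simulation. This sketch is not the //TRACE implementation: it assumes each node does a fixed amount of work after waiting for the nodes it depends on, that dependencies are acyclic, and that the experimenter can see only finish times. The `deps` map plays the role of the hidden ground truth; `run` and `discover_dependents` are hypothetical names.

```python
def run(deps, work, delay_node=None, delay=0.0):
    """Simulate one run and return each node's finish time.

    deps maps a node to the nodes it must wait for (assumed acyclic);
    work maps a node to its own processing time.
    """
    finish = {}

    def finish_time(node):
        if node not in finish:
            # A node starts once all of its dependencies have finished.
            start = max((finish_time(d) for d in deps.get(node, [])), default=0.0)
            extra = delay if node == delay_node else 0.0
            finish[node] = start + work[node] + extra
        return finish[node]

    for node in work:
        finish_time(node)
    return finish


def discover_dependents(deps, work, delay=5.0):
    """For each node, list the other nodes that finish later when it is slowed."""
    baseline = run(deps, work)
    affected = {}
    for node in work:
        slowed = run(deps, work, delay_node=node, delay=delay)
        affected[node] = sorted(
            n for n in work if n != node and slowed[n] > baseline[n]
        )
    return affected


if __name__ == "__main__":
    hidden_deps = {"b": ["a"], "c": ["b"]}
    work = {"a": 1.0, "b": 2.0, "c": 1.0}
    print(discover_dependents(hidden_deps, work))
```

Note that this black-box view reveals which nodes are affected, directly or transitively, and in this simple model a dependency that is not on the critical path can go unnoticed if the injected delay is too small.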
Publications
- Incremental Algorithm for Updating Betweenness Centrality in Dynamically Growing Networks. Miray Kas, Matthew Wachs, Kathleen M. Carley, L. Richard Carley. Proceedings of the 2013 IEEE / ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013). August 25-28, 2013, Niagara Falls, ON.
- File System Virtual Appliances: Portable File
System Implementations. Michael Abd-El-Malek, Matthew Wachs, James Cipar,
Karan Sanghi, Gregory R. Ganger, Garth A. Gibson, Michael K. Reiter.
ACM Transactions on Storage 8, 3, Article 9 (September 2012), 26 pages.
Supersedes Carnegie Mellon University Parallel Data Lab Technical Report
CMU-PDL-10-105. May 2010.
- Incremental Centrality Computations for Dynamic Social Networks. Miray Kas, Matthew Wachs, L. Richard Carley, Kathleen M. Carley. Conference Presentation at XXXII International Sunbelt Social Network Conference (Sunbelt 2012). March 12-18, 2012, Redondo Beach, CA. [ read more ]
- Exertion-based Billing for Cloud Storage Access. Matthew Wachs, Lianghong Xu, Arkady Kanevsky, Gregory R. Ganger. Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '11). June 14-15, 2011, Portland, OR. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-11-105. March 2011.
Abstract / PDF [65K]
- Improving Storage Bandwidth Guarantees with
Performance Insulation. Matthew Wachs, Gregory R. Ganger.
Carnegie Mellon University Parallel Data Lab Technical Report
CMU-PDL-10-113. October 2010.
Abstract / PDF [285K]
- Relative Fitness Modeling. Michael P. Mesnier,
Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger.
Communications of the ACM (Vol 52, No 4, pg 91-96). April, 2009.
Abstract / PDF [775K]
- Modeling the relative fitness of storage. Michael P. Mesnier,
Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger.
Proceedings of the Joint
International Conference on Measurement and Modeling of Computer
Systems (SIGMETRICS'07). June 12th–16th 2007, San Diego, CA.
Awarded Best Paper
Abstract / PDF [235K]
- Argon: Performance Insulation
for Shared Storage Servers.
Matthew Wachs, Michael Abd-El-Malek,
Eno Thereska, Gregory R. Ganger. Proceedings of the 5th USENIX
Conference on File and Storage Technologies (FAST '07),
February 13–16, 2007, San Jose, CA. Supersedes Carnegie Mellon
University Parallel Data Lab Technical Report
CMU-PDL-06-106, May 2006.
Abstract / PDF [167K]
- //TRACE: Parallel Trace
Replay with Approximate Causal
Events.
Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Julio Lopez, James
Hendricks, Gregory R. Ganger, David O'Hallaron.
Proceedings of the 5th USENIX Conference on File and Storage
Technologies (FAST '07),
February 13–16, 2007, San Jose, CA. Supersedes Carnegie Mellon
University Parallel Data Lab Technical Report
CMU-PDL-06-108, September 2006.
Abstract / PDF [187K]
- Early Experiences on the Journey Towards Self-* Storage.
Michael
Abd-El-Malek, William V. Courtright II, Chuck
Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman,
Michael Mesnier, Manish Prasad,
Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D.
Strunk, Eno Thereska, Matthew Wachs, Jay J. Wylie.
Bulletin of the IEEE Computer Society Technical Committee on Data
Engineering, September 2006.
Abstract / PDF [113K] / Postscript [745K]
- Stardust: Tracking Activity in a Distributed Storage System.
Eno Thereska, Brandon Salmon, John Strunk, Matthew Wachs, Michael
Abd-El-Malek, Julio Lopez, Gregory R. Ganger. Proceedings of the Joint
International Conference on Measurement and Modeling of Computer
Systems, (SIGMETRICS'06). June 26th-30th 2006, Saint-Malo, France.
Abstract / PDF [578K]
- Relative fitness models for storage. Michael Mesnier,
Matthew Wachs, Brandon Salmon, Gregory R. Ganger. SIGMETRICS
Performance Evaluation Review (Vol 33, No 4, pg 23-38). March, 2006.
- Ursa Minor: Versatile Cluster-based Storage.
Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory
R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier,
Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq
Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, Jay J.
Wylie. Proceedings of the 4th USENIX Conference on File and Storage
Technologies (FAST '05). December 13–16, 2005, San Francisco, CA.
Supersedes Carnegie Mellon University Parallel Data Lab Technical
Report CMU-PDL-05-104, April 2005.
Awarded Best Paper
Abstract / PDF [490K]
Support
I appreciate the support, while I was a graduate student, of an
NDSEG (National Defense Science and Engineering) Graduate Fellowship,
thanks to the Air Force Office of Scientific Research (AFOSR).
Education
I received my Ph.D. from Carnegie Mellon University. I was a member of the Computer Science Department in the School of Computer Science.
I double-majored in Computer
Science and Math in the
College of Arts and Sciences
at Cornell University.
While I was a student, I enjoyed being a part of a number of interesting courses:
I was a teaching assistant for 15-212 (Fall 2009), Carnegie Mellon's course on
functional programming (ML). It was taught by Professor Steven Brookes.
I was a teaching assistant for
15-213 (Fall 2007), Carnegie Mellon's course on
computer architecture from a programmer's perspective (such as representation
of ints and floats, understanding assembly language, and buffer overflows). It
was taught by Professor Todd Mowry
and Professor Greg Ganger.
I was a teaching assistant for CS 482 in
Spring 2004 with Professor
Jon Kleinberg. CS 482 is Cornell's required CS theory course
covering algorithms topics such as greedy algorithms, dynamic
programming, network flow, and NP-completeness.
I was a teaching assistant for CS 381 in Fall 2003 with Professor John Hopcroft. CS
381 is Cornell's required CS theory course covering finite automata,
context-free languages, and Turing machines.
Last Modified: September 2014