Computer Science Department
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891
Carnegie Mellon University, Pittsburgh, PA
Ph.D. in Computer Science, Sep 2000 - Sep 2007 (expected)
Research Advisor: Srinivasan Seshan
University of Michigan, Ann Arbor, MI
M.S. in Computer Science and Engineering, Aug 2000
Research Advisor: Farnam Jahanian
University of Michigan, Ann Arbor, MI
B.S. in Computer Science, Dec 1997
No longer an academic plaything, the Internet today is a vital global resource. Despite its importance, its availability leaves much to be desired. In particular, end-users are often disconnected from the Internet by planned network maintenance, deployment of critical security updates, and hardware and software failures.
A key challenge to higher availability is that most users maintain only
a single connection to the Internet. As such, when the router on the
ISP's side of that connection (the "access router") is out of service,
the user is disconnected from the Internet. While router vendors have
proposed methods for improving the reliability of access routers,
the proposed solutions are costly, complex, and have yet to achieve
their availability goals.
We proposed an alternative approach to high availability, inspired by
the design of RAID storage systems and web-server farms. In this
approach, which we call RouterFarm, we mask the unavailability of
access routers by dynamically moving ("re-homing") users to spare
access routers.
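A minimal sketch of the re-homing idea, under an assumed (hypothetical) data model in which each customer is homed on a single access router; the class, method names, and router names below are illustrative, not the actual implementation:

```python
# Illustrative sketch of RouterFarm-style re-homing: when an access router
# must be taken out of service, its customers are moved to a spare.
# All names and the data model are hypothetical simplifications.

class RouterFarm:
    def __init__(self, routers, spares):
        self.homes = {}                 # customer -> access router
        self.routers = set(routers)     # in-service routers
        self.spares = list(spares)      # idle spare routers

    def home(self, customer, router):
        self.homes[customer] = router

    def rehome_all(self, failed_router):
        """Move every customer on failed_router to a spare router,
        masking the outage from the user's point of view."""
        spare = self.spares.pop(0)
        for customer, router in self.homes.items():
            if router == failed_router:
                self.homes[customer] = spare
        self.routers.discard(failed_router)
        self.routers.add(spare)
        return spare


farm = RouterFarm(routers=["ar1", "ar2"], spares=["spare1"])
farm.home("alice", "ar1")
farm.home("bob", "ar2")
farm.rehome_all("ar1")          # e.g., ar1 goes down for a software upgrade
print(farm.homes["alice"])      # alice is now served by spare1
```

The same move operation supports both failure masking and planned maintenance, which is why re-homing also helps manageability.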
Through a test-bed evaluation, we showed that RouterFarm can reduce
downtime due to planned maintenance, such as software upgrades,
by a factor of 3-5, depending on the number of customers and their
configuration. We have also explored how RouterFarm can improve the
manageability of IP networks. For example, the ability to re-home
users on demand could be used to balance load across routers
dynamically, rather than relying on conservative engineering rules at
provisioning time.
Peer-to-peer systems have been rapidly and widely adopted for applications such as file sharing and Internet telephony. Evolving these systems is difficult, however, owing to the challenge of designing interoperable changes to their distributed protocols (such as those for routing, cooperative caching, and load balancing).
In this work, we proposed simultaneous execution as a methodology for
evolving peer-to-peer storage systems. In this methodology, each
version of the application runs its own distributed protocols, but
client-visible state is kept loosely consistent by replicating
operations that modify that state in both versions.
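The methodology above can be sketched roughly as follows; the classes and method names are hypothetical stand-ins, not the prototype's actual interfaces:

```python
# Hypothetical sketch of simultaneous execution: state-modifying operations
# are replicated in both versions of the system, while reads are served by
# whichever version is currently primary. Names are illustrative.

class VersionedStore:
    """Stand-in for one version of the peer-to-peer storage system."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class SimultaneousExecutionProxy:
    """Keeps client-visible state loosely consistent across versions
    by replicating writes to both; reads go to the primary."""
    def __init__(self, old_version, new_version):
        self.versions = [old_version, new_version]
        self.primary = old_version

    def put(self, key, value):
        # Replicate the state-modifying operation in both versions,
        # each of which runs its own distributed protocols underneath.
        for v in self.versions:
            v.put(key, value)

    def get(self, key):
        return self.primary.get(key)

    def switch_over(self, version):
        # Once the new version is warmed up, direct reads to it.
        self.primary = version


old, new = VersionedStore("v1"), VersionedStore("v2")
proxy = SimultaneousExecutionProxy(old, new)
proxy.put("/readme", b"contents")
proxy.switch_over(new)
print(proxy.get("/readme"))   # the new version already holds the state
```

Because both versions see every write, the switch-over requires no bulk state transfer, at the cost of duplicated work during the overlap period.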
We prototyped the use of this methodology to perform upgrades of the
Cooperative File System and IrisLog, using Xen as the virtual machine
monitor, and user-space proxies for replication of state-modifying
operations. We found that the developer effort to employ our
methodology was small (about 1000 lines of code for each application),
but that the overheads of our prototype were limiting.
Performance with Satellite Networks
A recent development in Internet access technologies is satellite-based Internet service. Typically, users connect to the network using a modem for the uplink and a satellite dish for the downlink. We investigated how the performance of these networks might be improved by two simple techniques: caching, and use of the return path on the modem link.

Via simulation, we showed that caching alone can simultaneously reduce bandwidth requirements by 33% and improve response times by 62%. We then developed heuristic schedulers for determining which requests to service over the satellite link, and which to serve over the return path of the modem. We showed that combining caching with the heuristic schedulers yields a system that performs far better under high load than existing systems.
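One plausible heuristic in the spirit of the schedulers described above (the cutoff value and all names here are illustrative assumptions, not the actual heuristics):

```python
# Hypothetical downlink-selection heuristic: small responses go over the
# modem's return path (where satellite bandwidth buys little), while large
# responses use the satellite downlink unless its queue is far longer.
# The threshold and function names are assumptions for illustration.

SMALL_RESPONSE_BYTES = 4 * 1024   # assumed cutoff for the modem path

def choose_downlink(response_bytes, satellite_queue, modem_queue):
    """Return which downlink should carry the response."""
    if response_bytes <= SMALL_RESPONSE_BYTES:
        return "modem"
    # Large responses prefer the satellite, unless it is heavily backlogged.
    return "satellite" if satellite_queue <= modem_queue + 10 else "modem"

print(choose_downlink(1_000, 0, 0))     # small object -> modem
print(choose_downlink(200_000, 2, 0))   # large object -> satellite
```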
Server Performance
Although web servers typically service requests in a manner that approximates fair allocation of network bandwidth across client requests, scheduling theory tells us that scheduling requests using the SRPT (shortest remaining processing time first) policy will minimize mean response time. One concern about using SRPT, however, is that requests for large files will be starved in favor of requests for small files when the server is heavily loaded.
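The policy itself can be sketched in a few lines; with a static set of requests (no new arrivals or preemption), SRPT reduces to finishing the smallest remaining job first. The function and file names below are illustrative; the actual work modified Apache and the Linux network stack.

```python
# Minimal sketch of the SRPT idea, with each pending request characterized
# only by the bytes remaining to send.

import heapq

def srpt_completion_order(pending):
    """pending: list of (remaining_bytes, request_name) pairs.
    Returns request names in the order they complete under SRPT."""
    heapq.heapify(pending)
    order = []
    while pending:
        _, name = heapq.heappop(pending)   # least remaining work first
        order.append(name)
    return order

print(srpt_completion_order(
    [(500_000, "big.iso"), (2_000, "index.html"), (40_000, "logo.png")]))
# -> ['index.html', 'logo.png', 'big.iso']
```

The sketch makes the starvation concern visible: the largest file always finishes last, which is why the experimental question is how much its response time degrades.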
In this work, we evaluated SRPT experimentally, using Apache and
Linux. In LAN scenarios, we found that SRPT decreased mean response
time by a factor of 3-8 under high load, with only a 20% increase in
response time for large files. Significantly, only requests for the
very largest files (0.5% of requests overall) achieve better
performance under traditional fair scheduling than under SRPT. The
gains achieved by SRPT in a WAN setting are smaller, but improvements
of a factor of two are still possible.
IBM T.J. Watson Research Center, Summer 2000.
Server Selection
DNS-based server selection is a widely used technique for directing web clients to web servers. The technique attempts to direct clients to servers that are likely to deliver the best performance, based on network topology information and server load measurements.
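A hypothetical sketch of the mechanism: the authoritative name server picks a web server based on where the *querying name server* sits in the topology, and returns a short TTL so the answer is re-fetched often. The region table, addresses, and TTL value are illustrative assumptions.

```python
# Sketch of DNS-based server selection. Note the two properties the work
# examines: the low TTL (which defeats DNS caching) and the use of the
# name server's location as a proxy for the client's location.

SERVERS_BY_REGION = {"us-east": "192.0.2.10", "eu-west": "198.51.100.20"}
LOW_TTL_SECONDS = 30   # short TTL keeps selection fresh, but defeats caching

def select_server(name_server_region):
    """Answer an A query with the server assumed closest to the client.
    Embedded assumption: the client is near its name server."""
    ip = SERVERS_BY_REGION.get(name_server_region, SERVERS_BY_REGION["us-east"])
    return ip, LOW_TTL_SECONDS

print(select_server("eu-west"))   # ('198.51.100.20', 30)
```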
In this work, we identified and quantified two factors that limit the
effectiveness of DNS-based server selection. First, the technique
defeats DNS caching, thus potentially increasing service
latency. Second, the technique assumes that web clients are
topologically near their name servers, an assumption that does not
always hold.
We found that, due to its effects on DNS caching, DNS-based server
selection substantially increases the time required for the name
resolution phase of web requests. We also found that a significant
fraction of clients are distant from their name servers.
Investigated the detection of anomalous network behavior using statistical profiling of network flow data. Developed components of a system for near-real-time monitoring of Internet performance. Components included a multithreaded data-visualization toolkit supporting push and pull data retrieval, and a flexible, extensible, multithreaded system for report generation.
Designed and implemented an OS abstraction/resource management software component for facilities such as threads and timers. Conducted preliminary investigation of approaches to providing fail-over support for a network control system.