Clean Slate Architectures for Network Management
While the Internet Protocol (IP) has been a runaway success, today's IP
networks are difficult to manage well. We take a clean-slate
approach to redesigning different aspects of network control
and management, guided by the following three principles:
Network-level objectives: Running a robust data network
depends on satisfying objectives for performance, reliability, and
policy that can (and should) be expressed as goals for the entire
network, separately from the low-level network elements.
Network-wide views: Timely, accurate, network-wide views of
topology, traffic, and events are crucial for running a robust
network.
Direct control: The decision logic should
provide network operators with a direct interface to configure network
elements; this logic should not be implicitly or explicitly hardwired in
protocols distributed among switches.
These design principles have been embodied in three research
initiatives:
- An architecture for centralizing network decision logic
- The theory and practice of interconnecting multiple routing instances
- The design of new flow monitoring solutions
The 4-D Architecture
[Figure: Layers of the 4D architecture]
Despite the early design goal of minimizing the state in network
elements, tremendous amounts of state are distributed
across routers and management platforms in IP networks.
We believe that the many, loosely-coordinated actors
that create and manipulate the distributed state introduce
substantial complexity that makes both backbone and enterprise
networks increasingly fragile and difficult to manage.
In the 4D architecture, we decompose the functions of network
control into four planes: a decision plane that is responsible for
creating a network configuration (e.g., computing FIBs for each router
in the network); a dissemination plane that gathers information about
network state (e.g., link up/down information), delivers it to the
decision plane, and distributes decision plane output to routers; a
discovery plane that enables devices to discover their directly
connected neighbors; and a data plane for forwarding network traffic.
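As a rough illustration of the decision plane's role, the following Python sketch computes a FIB for every router from a network-wide view of link state. The function name `compute_fibs`, the link-cost map, and the choice of shortest-path routing are assumptions made for this example; it is not the implementation described in the publications below.

```python
import heapq


def compute_fibs(links):
    """Return {router: {destination: next_hop}} for every router.

    `links` maps an undirected link (u, v) to its cost. It stands in for
    the network-wide view that the dissemination plane would deliver
    (hypothetical format, chosen for this sketch).
    """
    # Build an adjacency map from the link list.
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost

    fibs = {}
    for source in graph:
        # Dijkstra from `source`, carrying the first hop of each path.
        done = set()
        fib = {}
        heap = [(0, source, source)]
        while heap:
            dist, node, hop = heapq.heappop(heap)
            if node in done:
                continue
            done.add(node)
            if node != source:
                fib[node] = hop
            for nbr, cost in graph[node].items():
                if nbr not in done:
                    next_hop = nbr if node == source else hop
                    heapq.heappush(heap, (dist + cost, nbr, next_hop))
        fibs[source] = fib
    return fibs


# Network-wide view: three routers, with a costly direct A-C link.
view = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5}
print(compute_fibs(view)["A"])

# A link-down report changes the view; the decision plane recomputes.
del view[("B", "C")]
print(compute_fibs(view)["A"])
```

In this sketch the routers hold no routing logic of their own: a change in the view (here, a failed link) leads to a fresh computation whose output would then be distributed to the routers.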
Publications
- Tesseract: A 4D Network Control Plane
Hong Yan, David A. Maltz, T. S. Eugene Ng, Hemant Gogineni, Hui Zhang, Zheng Cai.
Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI '07), April 2007.
- A Clean Slate 4D Approach to Network Control and Management
Albert Greenberg, Gisli Hjalmtysson, David A. Maltz, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang.
ACM SIGCOMM Computer Communication Review. 35(5). October, 2005.
- Network-Wide Decision Making: Toward A Wafer-Thin Control Plane
Jennifer Rexford, Albert Greenberg, Gisli Hjalmtysson, David A. Maltz, Andy Myers, Geoffrey Xie, Jibin Zhan, and Hui Zhang.
Proceedings of HotNets III. November, 2004.
- Refactoring Network Control and Management: A Case for the 4D Architecture
Albert Greenberg, Gisli Hjalmtysson, David A. Maltz, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang.
CMU CS Technical Report CMU-CS-05-117, September 2005.
- On Static Reachability Analysis of IP Networks
Geoffrey Xie, Jibin Zhan, Dave Maltz, Hui Zhang, Albert Greenberg, Gisli Hjalmtysson, Jennifer Rexford.
Proceedings of IEEE Infocom 2005.
- Routing Design in Operational Networks: A Look from the Inside
D. Maltz, G. Xie, J. Zhan, H. Zhang, A. Greenberg, G. Hjalmtysson.
Proceedings of ACM SIGCOMM 2004.
Theory and Practice of Interconnecting Multiple Routing Instances
Today, a large body of research exists on the correctness of existing
routing protocols.
However, analytical frameworks for studying routing dynamics have
mostly focused on a single routing protocol instance at a time. In
reality, the Internet is composed not of one protocol instance (e.g.,
BGP) but of a multitude of protocol instances that need to interact. For
example, routes must be exchanged between BGP and OSPF. The interactions
between these protocol instances are governed by the routing glue
component. However, despite its wide usage and essential role, there
has been no formal investigation into how safe its usage is. We develop
analytical models to rigorously analyze the interactions between
multiple routing protocol instances and their impact at a network-wide
level. We show that making individual routing protocols safe is not
sufficient to ensure the correctness of Internet routing; the
routing glue plays an equally important part. Its usage can result in a
wide range of routing anomalies, including persistent forwarding loops
and permanent route oscillations. This routing glue deserves further
attention from the networking community.
Publications:
- Instability Free Routing: Beyond One Protocol Instance
Franck Le, Geoffrey Xie, Hui Zhang.
Proceedings of ACM CoNEXT '08, December 2008.
Brief description
The interactions between routing protocol instances are in fact governed by two procedures: route redistribution permits the exchange of routing information, and route selection allows routers to rank routes received from different instances. We demonstrate that the problem is broader than that of route redistribution alone. Route selection by itself, i.e., the mere co-existence of multiple routing protocol instances, and its interplay with route redistribution can each result in routing anomalies. We show that the routing glue could actually be at the origin of many global disruptions of Internet connectivity that have been reported but not fully explained so far.
- Shedding Light on the Glue Logic of the Internet Routing Architecture (Slides)
Franck Le, Geoffrey Xie, Dan Pei, Jia Wang, Hui Zhang.
Proceedings of ACM SIGCOMM '08, August 2008.
Brief description
We conduct a large-scale empirical study of the prevalence and usage of the routing glue in more than 1600 operational networks. The evidence shows that the routing glue is widely deployed. More surprisingly, we discover that operators depend on the routing glue not simply to interconnect routing instances but also to implement complex design objectives that existing routing protocols (e.g., BGP) alone cannot accomplish. This reinforces the importance of the role played by the routing glue. Finally, we find that actual deployed configurations can be vulnerable to routing anomalies. These results confirm the importance of the problem.
- Understanding Route Redistribution (Slides)
Franck Le, Geoffrey Xie, Hui Zhang.
Proceedings of IEEE ICNP '07, October 2007.
Best Paper Award.
Brief description
We develop an analytical model to rigorously analyze the impacts of route redistribution, i.e., the exchange of routing information between different routing protocol instances, on a network-wide level. We illustrate how easily inaccurate configurations of route redistribution may cause severe routing instabilities (including route oscillations and persistent routing loops) and we discuss potential changes to the current route redistribution procedure to guarantee safety.
- On Guidelines for Safe Route Redistributions
Franck Le, Geoffrey Xie.
Proceedings of ACM SIGCOMM Workshop on Internet Network Management (INM'07), August 2007.
Brief description
We show that existing recommendations put forth by router vendors do not effectively protect against routing anomalies. Configurations of route redistribution, compliant with existing guidelines, can still experience permanent route oscillations and other unacceptable instabilities. Consequently, we propose a set of new configuration guidelines for different targeted objectives. The configuration guidelines consist of sufficient conditions for the usage of route redistribution, and we formally prove that each guideline will prevent the targeted routing anomaly.
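The two glue procedures described in this section, route selection and route redistribution, can be illustrated with a small Python sketch. The `Route` type, the `PREFERENCE` table, the prefix, and the router names are hypothetical; the preference values only resemble vendor-style administrative distances and are not taken from the papers above.

```python
from dataclasses import dataclass

# Illustrative per-instance preference; a lower value wins (assumption).
PREFERENCE = {"bgp": 20, "ospf": 110, "rip": 120}


@dataclass(frozen=True)
class Route:
    prefix: str
    instance: str  # routing instance that offered the route
    next_hop: str


def select_route(candidates):
    """Route selection: rank routes to one prefix across instances."""
    return min(candidates, key=lambda r: PREFERENCE[r.instance])


def redistribute(route, target_instance, next_hop):
    """Route redistribution: re-announce a route into another instance."""
    return Route(route.prefix, target_instance, next_hop)


# A router learns a prefix natively through RIP from neighbor R1.
original = Route("10.0.0.0/8", "rip", "R1")

# Elsewhere the same route is redistributed from RIP into OSPF and
# reaches this router again, this time through neighbor R2.
copy = redistribute(original, "ospf", "R2")

# Selection compares instances, not path quality, so the copy wins.
best = select_route([original, copy])
print(best.instance, best.next_hop)
```

Because the ranking depends only on which instance offered the route, the redistributed copy displaces the original. If R2's own path to the prefix happened to lead back through this router, the outcome could be the kind of forwarding loop discussed above.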
Rethinking Flow Monitoring: A Coordinated RISC Architecture for Network Flow Monitoring
[Figures: RISC vs. application-specific approaches; Example of a network-wide RISC approach]
Flow monitoring supports several critical network management tasks such as
traffic engineering, accounting, anomaly detection, identifying and
understanding end-user applications, understanding traffic structure at
various granularities, detecting worms, scans, and botnet activities,
and forensic analysis. These require high-fidelity estimates of traffic
metrics relevant to each application. The set of network management and
security applications is a moving target, and new applications arise as
the nature of both normal and anomalous traffic patterns changes over
time. We make the case for a "RISC" approach to flow monitoring,
which employs simple collection primitives on each monitoring
device and manages them in an intelligent network-wide fashion to
ensure that the collected data will support computation of metrics of
interest to various applications. A RISC architecture dramatically
reduces the implementation complexity of monitoring elements; enables
router vendors and researchers to focus their energies on
efficiently implementing a small number of primitives; and allows
late binding to what traffic metrics are important, thus insulating
router implementations from the changing needs of flow monitoring
applications.
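The idea of late binding can be sketched in Python as follows. This is only an illustration under stated assumptions: the single generic primitive shown here (hash-based flow sampling), the record fields, and the two metrics are chosen for the example and are not claimed to be the primitives or applications studied in the publications below.

```python
import zlib
from collections import defaultdict


def sample_flows(flows, rate):
    """Generic collection primitive: keep a flow if its hash is below `rate`."""
    kept = []
    for flow in flows:
        key = f"{flow['src']}|{flow['dst']}".encode()
        # crc32 yields an unsigned 32-bit value; scale it into [0, 1).
        if zlib.crc32(key) / 2**32 < rate:
            kept.append(flow)
    return kept


def bytes_per_destination(records):
    """Metric for a traffic-engineering style application."""
    totals = defaultdict(int)
    for r in records:
        totals[r["dst"]] += r["bytes"]
    return dict(totals)


def fanout_per_source(records):
    """Metric for a scan-detection style application."""
    peers = defaultdict(set)
    for r in records:
        peers[r["src"]].add(r["dst"])
    return {src: len(dsts) for src, dsts in peers.items()}


flows = [
    {"src": "h1", "dst": "web", "bytes": 500},
    {"src": "h1", "dst": "db", "bytes": 200},
    {"src": "h2", "dst": "web", "bytes": 300},
]

# The device only runs the generic primitive (rate 1.0 keeps every flow).
records = sample_flows(flows, rate=1.0)

# Each application derives its own metric afterwards from the same records.
print(bytes_per_destination(records))
print(fanout_per_source(records))
```

The monitoring device knows nothing about the two metrics; a new application would add another function over the same records rather than a new primitive on the router.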
Presentation
Rethinking NetFlow
Publications
- A Case for a RISC Architecture for Network Flow Monitoring
Vyas Sekar, Michael K Reiter, Hui Zhang.
CMU CS Technical Report CMU-CS-09-125.
Brief description
This paper addresses the question of whether we need complex application-specific primitives to meet the demands of different flow monitoring applications or if it suffices to implement a small number of "RISC" primitives on routers to get sufficient fidelity across the entire spectrum of applications.
- Coordinated Sampling sans Origin-Destination Identifiers: Algorithms, Analysis, and Evaluation
Vyas Sekar, Anupam Gupta, Michael K Reiter, Hui Zhang.
CMU CS Technical Report CMU-CS-09-104.
Brief description
This paper describes how to implement cSamp using only local information on routers without requiring global OD-pair identifiers. It provides an immediate and incremental deployment path for ISPs without requiring changes to packet headers or the existing routing infrastructure.
- cSamp: A System for Network-Wide Flow Monitoring
Vyas Sekar, Michael K. Reiter, Walter Willinger, Hui Zhang, Ramana Rao Kompella, David G. Andersen.
Proceedings of NSDI 2008.
Brief description
This paper describes the basic Coordinated Sampling framework. There are three key ideas: flow sampling, hash-based coordination, and an optimization framework for meeting network-wide flow monitoring objectives while operating within router resource constraints.
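
Hash-based coordination, the second of the key ideas listed above, can be sketched in a few lines of Python. This is a simplified illustration rather than the cSamp implementation: the function names, the router names, and the hand-picked hash ranges are assumptions, and in the framework described above the ranges would come from the optimization step instead.

```python
import hashlib


def flow_hash(five_tuple):
    """Map a flow 5-tuple to a value in [0, 1) with a hash shared by all routers."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Four bytes give an integer below 2**32, so the quotient stays below 1.
    return int.from_bytes(digest[:4], "big") / 2**32


def should_log(router, path_ranges, five_tuple):
    """A router logs the flow only if the hash falls in its assigned range."""
    low, high = path_ranges[router]
    return low <= flow_hash(five_tuple) < high


# Hypothetical non-overlapping ranges for the routers on one path.
path_ranges = {"R1": (0.0, 0.3), "R2": (0.3, 0.6), "R3": (0.6, 1.0)}

flow = ("10.0.0.1", "10.0.0.2", 12345, 80, "tcp")
loggers = [r for r in path_ranges if should_log(r, path_ranges, flow)]
print(loggers)
```

Because every router applies the same hash and the ranges do not overlap, each flow on the path is logged by at most one router, without the routers exchanging any messages about individual flows.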