Camel: A Distributed and Scalable Content Discovery System
A Content Discovery System (CDS) is a distributed system that enables the
discovery of contents. A node in a CDS can publish and provide contents,
issue queries looking for contents, store contents or contents' meta-data
published by other nodes, and resolve other nodes' queries.
A wide spectrum of distributed applications either are CDSs themselves
or use a CDS as one of their major components.
Examples include service discovery services, peer-to-peer (P2P) object sharing
systems, sensor networks and publication-subscription (pub/sub) systems.
The primary task of a CDS is to efficiently locate the set of contents that
matches a client's query. Existing CDSs have difficulty achieving both
rich functionality and scalability.
At one end, some systems scale to the Internet level but offer limited
functionality: they support only exact content name lookup [Chord, CAN,
Pastry, Tapestry] or search over strictly hierarchical content names [DNS],
or, like search engines [Google], they consider static contents only.
At the other end, some systems offer general search over both static and
dynamic contents, but their search mechanisms do not scale [Gnutella, KaZaa].
In this project, we design Camel, a distributed and scalable CDS that
overcomes the above difficulties and enables powerful content discovery
on the Internet. Camel uses a Distributed Hash Table (DHT) as an overlay
network substrate, and possesses the following properties:
- Scalability.
Camel achieves scalability through the use of
Rendezvous Points (RPs), and thus avoids system-wide
message flooding at both content registration and query time.
- Load Balancing.
Camel deploys a novel mechanism that uses Load Balancing Matrices (LBMs)
to dynamically balance both registration and query load in a fully distributed
fashion, sustaining throughput even under extremely skewed load
such as flash crowds.
- Rich searchability.
Camel uses a flexible attribute-value based naming scheme for searching,
and provides efficient support for complex queries, such as subset-matching,
range, and similarity queries.
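The Rendezvous Point and subset-matching ideas above can be illustrated with a small sketch. It is not Camel's implementation: it assumes that a content name is a set of attribute-value pairs, that each pair hashes to one RP node, that a name is registered at the RP of every pair it contains, and that a query is resolved at the RP of any one of its pairs. The names `rp_node`, `RendezvousOverlay`, and `NUM_NODES` are hypothetical, and the modulo hash stands in for a real DHT lookup.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 16  # hypothetical overlay size, for illustration only


def rp_node(attribute: str, value: str) -> int:
    """Map one attribute-value pair to a node ID (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(f"{attribute}={value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class RendezvousOverlay:
    """Toy in-memory model of RP-based registration and query resolution."""

    def __init__(self):
        # node ID -> list of (content name as a frozenset of pairs, owner)
        self.store = defaultdict(list)
        self.messages = 0  # counts simulated network messages

    def register(self, name: dict, owner: str) -> None:
        """Send the full content name to the RP of each pair it contains."""
        pairs = frozenset(name.items())
        for attribute, value in pairs:
            self.store[rp_node(attribute, value)].append((pairs, owner))
            self.messages += 1

    def query(self, query: dict) -> set:
        """Resolve a query at one RP: a name matches if it contains every query pair."""
        if not query:
            raise ValueError("a query needs at least one attribute-value pair")
        wanted = frozenset(query.items())
        attribute, value = next(iter(query.items()))
        self.messages += 1  # one message, no flooding
        candidates = self.store[rp_node(attribute, value)]
        return {owner for pairs, owner in candidates if wanted <= pairs}
```

Under these assumptions a registration costs one message per attribute-value pair and a query costs a single message, so neither operation touches every node. Any name that matches a query contains the pair the query was routed on, so it is always present at the RP that resolves the query.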
Camel is designed as a generic software layer so that
higher-level applications can be built on top of it.
We have implemented Camel both in a simulator and as a real deployment
on the Internet.
As a proof of concept, we integrated Camel with a content-based
music classification engine, and
implemented a distributed music information retrieval system.
Please refer to our publications for more technical details.
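As a rough illustration of the load-balancing property listed above, the sketch below models one Load Balancing Matrix. This page does not describe the matrix layout, so the layout here is an assumption: each attribute-value pair gets a grid of cells with `partitions` columns and `replicas` rows, a registration is stored in every replica of one randomly chosen partition, and a query visits every partition within one randomly chosen replica row. The class and function names are hypothetical, and the matrix size is fixed, whereas Camel adjusts it dynamically.

```python
import hashlib
import random
from collections import defaultdict


def cell_node(pair: str, partition: int, replica: int, num_nodes: int) -> int:
    """Map one matrix cell to a node ID (stand-in for a DHT lookup)."""
    key = f"{pair}|{partition}|{replica}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big") % num_nodes


class LoadBalancingMatrix:
    """Toy fixed-size matrix for a single attribute-value pair."""

    def __init__(self, pair, partitions, replicas, num_nodes=1024, seed=0):
        self.pair = pair
        self.partitions = partitions  # columns: split the registration load
        self.replicas = replicas      # rows: split the query load
        self.num_nodes = num_nodes
        self.rng = random.Random(seed)
        self.cells = defaultdict(set)  # (partition, replica) -> content names

    def register(self, content):
        """Store content in all replicas of one random partition."""
        p = self.rng.randrange(self.partitions)
        for r in range(self.replicas):
            self.cells[(p, r)].add(content)
        # Node IDs that would receive this registration.
        return [cell_node(self.pair, p, r, self.num_nodes)
                for r in range(self.replicas)]

    def query(self):
        """Collect matches from every partition of one random replica row."""
        r = self.rng.randrange(self.replicas)
        found = set()
        for p in range(self.partitions):
            found |= self.cells[(p, r)]
        return found
```

In this model, more partitions spread a popular pair's registrations over more nodes at the cost of more messages per query, and more replicas spread queries at the cost of more messages per registration. Every query still sees all registered content, because each row holds a copy of every partition.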
-
Efficient Support for Similarity Searches in DHT-based Peer-to-Peer
Systems.
Jun Gao and Peter Steenkiste.
To appear in Proceedings of the 2007 IEEE International Conference on
Communications (ICC'07), Glasgow, Scotland, June 2007.
FULL TEXT:
(154KB)
-
A Distributed and Scalable Peer-to-Peer
Content Discovery System Supporting Complex Queries.
Jun Gao.
Ph.D. Thesis, CMU Technical Report, CMU-CS-04-170,
Computer Science Department, Carnegie Mellon University, Oct. 2004.
FULL TEXT:
(1.8MB)
-
An Adaptive Protocol for Efficient Support of Range Queries in DHT-based Systems.
Jun Gao and Peter Steenkiste.
In Proceedings of the 12th IEEE International Conference on
Network Protocols (ICNP'04), pages 239-250, Berlin, Germany, Oct. 2004.
FULL TEXT:
(277KB)
(A previous version of this paper is published as
CMU Technical Report, CMU-CS-03-215, Dec. 2003.)
-
Design and Evaluation of a
Distributed Scalable Content Discovery System.
Jun Gao and Peter Steenkiste.
IEEE Journal on Selected Areas in Communications (JSAC),
22(1):54-66, January 2004. Special Issue on Recent
Advances in Service Overlay Networks.
FULL TEXT:
(544KB)
-
A Scalable
Peer-to-Peer System for Music Information Retrieval.
George Tzanetakis, Jun Gao, and Peter Steenkiste.
Computer Music Journal, 28(2):24-33, June 2004. The MIT Press.
FULL TEXT:
(89KB; Available from MIT Press)
(Previous version appeared in
Proceedings of the Fourth International Conference on Music Information
Retrieval (ISMIR'03), pages 209-214, Baltimore, MD, October, 2003.)
FULL TEXT:
(109KB)
-
Content-Based Retrieval of Music in Scalable Peer-to-Peer Networks.
Jun Gao, George Tzanetakis, and Peter Steenkiste.
In Proceedings of the 2003 IEEE International Conference on Multimedia &
Expo (ICME'03), pages 309-312, volume I, Baltimore, MD, July 2003.
FULL TEXT:
(90KB)
-
Rendezvous Points-Based Scalable Content Discovery with Load Balancing.
Jun Gao and Peter Steenkiste.
In Proceedings of the Fourth International Workshop on
Networked Group Communication (NGC'02), pages 71-78, Boston, MA, Oct. 2002.
FULL TEXT:
(360KB)
-
Distributed Scalable Content Discovery Based on Rendezvous Points.
Jun Gao.
Ph.D. Thesis Proposal, Computer Science Department, Carnegie Mellon
University, May 20, 2002.
FULL TEXT:
(340KB)
- Meeting of the Minds, May 2004. (Adam's poster; here is one graph that shows
registration load balancing in a PlanetLab experiment.)
- CWBN Spring Review Talk, April 2004.
- Invited Talk at IBM TJ Watson, March, 2004.
- CMU Student Seminar Talk, March 2004. (Abstract)
- ISMIR'03 talk (George).
- ICME'03 poster.
- NGC'02 talk.
PowerPoint |
PDF |
Gzip'd Postscript
(Best Student Presentation Award)
- Proposal Talk
PowerPoint |
PDF |
PostScript
- CDS Simulator. A comprehensive event-driven simulator that implements all
of Camel's functionality.
- Real implementation. A separate library built on top of Chord that
exports a simple API to applications that use Camel. It currently runs on the PlanetLab testbed.
We will be releasing the simulator and our real implementation soon.
Please email any comments to Jun Gao.
Maintained by Jun Gao.
Last updated on July 1, 2004.