go back to the main page

SSS Abstracts
Spring 2004

go back to the main page

The Case for End System Multicast

Friday, February 20^th, 2004 from 12-1 pm in NSH 1507.

Presented by Sanjay G. Rao,

The vision of enabling ubiquitous deployment of bandwidth-demanding broadcasting and conferencing applications on the Internet has driven significant research activity in the networking community over the last two decades. For much of the 1990's, the conventional wisdom was that such applications are best supported by the IP Multicast architecture. However, despite significant efforts by the research and industrial community, IP Multicast has seen limited deployment. Over the last 5 years, we have been exploring an alternate architecture for supporting such applications that we termed End System Multicast (ESM). Here, end systems participating in the application organize into efficient overlays for data delivery, and there is no support from routing infrastructure. While ESM has the potential to address issues with IP Multicast, the key concern is performance.

In this talk, I will present three phases of our research in ESM, each of which has further strengthened the case for the architecture while addressing key issues in the process. The early phase involved the design of one of the first self-organizing overlay protocols (Narada), and laid the early evaluation framework in the area. The second phase has involved building and deploying a broadcasting system based on ESM that has been used for over 15 real events including the Sigcomm and SOSP conferences and has been used by over 3000 users. I will talk in greater detail about a key component of this effort: optimizing overlays for bandwidth-demanding applications. The third phase involved exploring design issues that were motivated by real deployment experience. In particular, I will describe the critical need for addressing heterogeneous bandwidth constraints in application end-point environments through differential treatment of end systems, an issue that has been largely overlooked by the research community.

Compact Representations of Ordered Sets

Friday, March 12^th, 2004 from 12-1 pm in NSH 1507.

Presented by Dan Blandford,

Joint work with Guy Blelloch.

I discuss a blocking technique for improving the space efficiency of ordered-set data structures such as balanced trees. This technique stores blocks of compressed values rather than individual values. The space bounds of the resulting structure are provably good, and the blocks can be decompressed in O(1) time using table lookup.

Blocks are compressed using difference codes such as the gamma code. I describe prefix codes, difference coding and gamma codes, and I discuss how gamma codes can be rapidly decoded using table lookup.

I present experimental results using treaps and red-black trees. The blocking technique improves compression by a factor of up to ten on red-black trees (depending on the density of the set), while causing slowdown of a factor of one to three.

This talk is in partial fulfillment of the speaking requirement.

A Scalable Peer-to-Peer Content Discovery System Supporting Complex Queries

Friday, March 19^th, 2004 from 12-1 pm in NSH 1507.

Presented by Jun Gao,

Joint work with Peter Steenkiste

A Content Discovery System (CDS) is a distributed system that allows nodes in the system to discover contents published by other nodes. CDSes are an essential component in many distributed applications including wide-area service discovery systems, peer-to-peer (P2P) object sharing systems, publication subscription systems and sensor networks. However, current systems often have difficulties in achieving both rich functionality and good scalability. For example, applications built directly on top of Distributed Hash Tables (DHTs) are scalable but only allow exact name lookup. Peer-to-peer file sharing systems such as Gnutella and KaZaA, on the other hand, offer searching capability, but their flooding-based searching mechanism does not scale well with the number of queries.

In this talk, I will present the design and evaluation of Camel, a DHT-based CDS that scales well with both registration and query load and provides efficient support for complex searches such as range and similarity queries. In particular, I will describe how Camel achieves scalability through the use of rendezvous points to avoid system-wide message broadcasting. I will present a novel distributed load balancing mechanism that dynamically recruits lightly loaded nodes in the system to eliminate hot spots caused by skewed load such as flash crowds, and thus ensures high system throughput. I will discuss in detail a new distributed data structure, the Range Search Tree (RST), that automatically optimizes itself to adapt to the load and support range queries efficiently in a fully distributed fashion.

Compact Representations of Ordered Sets

Friday, April 9^th, 2004 from 12-1 pm in NSH 1507.

Presented by Dan Blandford,

Joint work with Guy Blelloch.

I discuss a blocking technique for improving the space efficiency of ordered-set data structures such as balanced trees. This technique stores blocks of compressed values rather than individual values. The space bounds of the resulting structure are provably good, and the blocks can be decompressed in O(1) time using table lookup.

Blocks are compressed using difference codes such as the gamma code. I describe prefix codes, difference coding and gamma codes, and I discuss how gamma codes can be rapidly decoded using table lookup.

I present experimental results using treaps and red-black trees. The blocking technique improves compression by a factor of up to ten on red-black trees (depending on the density of the set), while causing slowdown of a factor of one to three.

This talk is in partial fulfillment of the speaking requirement.

The IOC algorithm: Efficient Many-Class Non-parametric Classification for High-Dimensional Data

Friday, April 23^rd, 2004 from 12-1 pm in NSH 1507.

Presented by Ting Liu,

Joint work: Ke Yang, Andrew W. Moore

K nearest neighbor remains a very popular classification technique, especially in areas such as computer vision, drug activity prediction and astrophysics. Furthermore, many more modern classifiers, such as kernel-based Bayes classifiers or the prediction phase of SVMs, require computational regimes similar to k-NN. We believe that tractable k-NN algorithms therefore continue to be important.

This work relies on the insight that even with many classes, the task of finding the majority class among the k nearest neighbors of a query need not require us to explicitly find those k nearest neighbors. This insight was previously used in (Liu et al. 2003) in algorithms which dealt with fast classification in the case of two classes. In this work we show how a different approach, IOC (standing for International Olympic Commity), can apply to the case of n classes where n > 2.

IOC assumes a slightly different processing of the data points in the neighborhood of the query. This allows it to search a set of ball- trees, one for each class. During the searches it is possible to quickly prune away classes that cannot possibly be the majority. Further accelerations are possible by a process (called RIOC) that initially estimates and then confirms the majority class.

This talk is in partial fulfillment of the speaking requirement.

An Efficient Decision Procedure for Separation Logic

Friday, April 30^th, 2004 from 12-1 pm in NSH 1507.

Presented by Murali Talupur,

Separation Logic consists of a Boolean combination of predicates of the form x ≥ y + c where c is a constant and x, y are variables of type real or integer. Any linear equality or inequality can be expressed in this logic. Separation logic formulas arise quite often during the verification of hardware/software systems and thus it is important to have an efficient decision procedure to decide them. In this talk I will describe a new decision procedure based on enumerating a small domain for each variable in a Separation formula F. These small domains are provably sufficient for preserving the satisfiability of F. Once we have reduced the domains of the variables from integer/real to small finite domains, we can encode the separation formula F as a propositional formula. We can then take advantage of the powerful SAT solvers to decide the formula. I will present some experimental results which demonstrate the efficacy of our method.

This is joint work with N. Sinha, O. Strichman, and A. Pnueli

This talk is in partial fulfillment of the speaking requirement.

Polyphonic Audio Matching and Alignment

Friday, May 7^th, 2004 from 12-1 pm in NSH 3305.

Presented by Ning Hu,

Getting computers to understand and process audio recordings in terms of their musical content is a difficult challenge. In this talk, I will describe a method in which general, polyphonic audio recordings of music can be aligned to symbolic score information in standard MIDI files. Because of the difficulties of polyphonic transcription, we perform matching and alignment directly on acoustic features that we extract from MIDI and audio.

One of the challenges in Music Information Retrieval (MIR) is to find correspondences among these different music representations. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording. Polyphonic audio matching and alignment can also be used for polyphonic score following, building intelligent editors that understand the content of recorded audio, and the analysis of expressive performance.

This talk is in partial fulfillment of the speaking requirement.

The Mixture Modeling Theory of Hippocampal Place Cell Remapping

Friday, May 7^th, 2004 from 1:30-2:30 am in WeH 4623.

Presented by Mark Fuhs,

Place cells throughout the rat hippocampus show location-specific firing fields that topologically reorganize, or "remap", when the rat moves to a different environment. In some cases, however, remapping occurs only after extensive exposure to one or both environments, indicating that it is experience-dependent and not purely sensory driven. We propose that this construction of a multi-map representation of the world can be understood as a mixture modeling process, where the degree of remapping between environments reflects the perceived statistical likelihood that the features of each environment form distinct clusters. We simulated the construction of mixture models for several different training paradigms where the environments differed (square vs. circular arenas, or different room location, or different rooms), or the animals had different amounts of pretraining. We found that the observed time course of remapping could be explained by the degree of similarity between environments and the amount of experience the rat had in each one. Our computationally explicit mixture modeling theory can also be used to predict the time course of remapping in as-yet untested training paradigms.

This talk is in partial fulfillment of the speaking requirement.

Web contact: sss+www@cs