SSS Abstracts |
Compact Representations of Ordered Sets
Friday, March 12th, 2004 from 12-1 pm in NSH 1507.
Presented by Dan Blandford,
Joint work with Guy Blelloch.
I discuss a blocking technique for improving the space efficiency of ordered-set data structures such as balanced trees. This technique stores blocks of compressed values rather than individual values. The space bounds of the resulting structure are provably good, and the blocks can be decompressed in O(1) time using table lookup.
Blocks are compressed using difference codes such as the gamma code. I describe prefix codes, difference coding and gamma codes, and I discuss how gamma codes can be rapidly decoded using table lookup.
I present experimental results using treaps and red-black trees. The blocking technique improves compression by a factor of up to ten on red-black trees (depending on the density of the set), while causing slowdown of a factor of one to three.
This talk is in partial fulfillment of the speaking requirement.
A Scalable Peer-to-Peer Content Discovery System Supporting Complex Queries
Friday, March 19th, 2004 from 12-1 pm in NSH 1507.
Presented by Jun Gao,
Joint work with Peter Steenkiste
A Content Discovery System (CDS) is a distributed system that allows nodes in the system to discover contents published by other nodes. CDSes are an essential component in many distributed applications including wide-area service discovery systems, peer-to-peer (P2P) object sharing systems, publication subscription systems and sensor networks. However, current systems often have difficulties in achieving both rich functionality and good scalability. For example, applications built directly on top of Distributed Hash Tables (DHTs) are scalable but only allow exact name lookup. Peer-to-peer file sharing systems such as Gnutella and KaZaA, on the other hand, offer searching capability, but their flooding-based searching mechanism does not scale well with the number of queries.
In this talk, I will present the design and evaluation of Camel, a DHT-based CDS that scales well with both registration and query load and provides efficient support for complex searches such as range and similarity queries. In particular, I will describe how Camel achieves scalability through the use of rendezvous points to avoid system-wide message broadcasting. I will present a novel distributed load balancing mechanism that dynamically recruits lightly loaded nodes in the system to eliminate hot spots caused by skewed load such as flash crowds, and thus ensures high system throughput. I will discuss in detail a new distributed data structure, the Range Search Tree (RST), that automatically optimizes itself to adapt to the load and support range queries efficiently in a fully distributed fashion.
Compact Representations of Ordered Sets
Friday, April 9th, 2004 from 12-1 pm in NSH 1507.
Presented by Dan Blandford,
Joint work with Guy Blelloch.
I discuss a blocking technique for improving the space efficiency of ordered-set data structures such as balanced trees. This technique stores blocks of compressed values rather than individual values. The space bounds of the resulting structure are provably good, and the blocks can be decompressed in O(1) time using table lookup.
Blocks are compressed using difference codes such as the gamma code. I describe prefix codes, difference coding and gamma codes, and I discuss how gamma codes can be rapidly decoded using table lookup.
I present experimental results using treaps and red-black trees. The blocking technique improves compression by a factor of up to ten on red-black trees (depending on the density of the set), while causing slowdown of a factor of one to three.
This talk is in partial fulfillment of the speaking requirement.
The IOC algorithm: Efficient Many-Class Non-parametric Classification for High-Dimensional Data
Friday, April 23rd, 2004 from 12-1 pm in NSH 1507.
Presented by Ting Liu,
Joint work: Ke Yang, Andrew W. Moore
K nearest neighbor remains a very popular classification technique, especially in areas such as computer vision, drug activity prediction and astrophysics. Furthermore, many more modern classifiers, such as kernel-based Bayes classifiers or the prediction phase of SVMs, require computational regimes similar to k-NN. We believe that tractable k-NN algorithms therefore continue to be important.
This work relies on the insight that even with many classes, the task of finding the majority class among the k nearest neighbors of a query need not require us to explicitly find those k nearest neighbors. This insight was previously used in (Liu et al. 2003) in algorithms which dealt with fast classification in the case of two classes. In this work we show how a different approach, IOC (standing for International Olympic Commity), can apply to the case of n classes where n > 2.
IOC assumes a slightly different processing of the data points in the neighborhood of the query. This allows it to search a set of ball- trees, one for each class. During the searches it is possible to quickly prune away classes that cannot possibly be the majority. Further accelerations are possible by a process (called RIOC) that initially estimates and then confirms the majority class.
This talk is in partial fulfillment of the speaking requirement.
An Efficient Decision Procedure for Separation Logic
Friday, April 30th, 2004 from 12-1 pm in NSH 1507.
Presented by Murali Talupur,
Separation Logic consists of a Boolean combination of predicates of the form x ≥ y + c where c is a constant and x, y are variables of type real or integer. Any linear equality or inequality can be expressed in this logic. Separation logic formulas arise quite often during the verification of hardware/software systems and thus it is important to have an efficient decision procedure to decide them. In this talk I will describe a new decision procedure based on enumerating a small domain for each variable in a Separation formula F. These small domains are provably sufficient for preserving the satisfiability of F. Once we have reduced the domains of the variables from integer/real to small finite domains, we can encode the separation formula F as a propositional formula. We can then take advantage of the powerful SAT solvers to decide the formula. I will present some experimental results which demonstrate the efficacy of our method.
This is joint work with N. Sinha, O. Strichman, and A. Pnueli
This talk is in partial fulfillment of the speaking requirement.
Polyphonic Audio Matching and Alignment
Friday, May 7th, 2004 from 12-1 pm in NSH 3305.
Presented by Ning Hu,
Getting computers to understand and process audio recordings in terms of their musical content is a difficult challenge. In this talk, I will describe a method in which general, polyphonic audio recordings of music can be aligned to symbolic score information in standard MIDI files. Because of the difficulties of polyphonic transcription, we perform matching and alignment directly on acoustic features that we extract from MIDI and audio.
One of the challenges in Music Information Retrieval (MIR) is to find correspondences among these different music representations. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording. Polyphonic audio matching and alignment can also be used for polyphonic score following, building intelligent editors that understand the content of recorded audio, and the analysis of expressive performance.
This talk is in partial fulfillment of the speaking requirement.
The Mixture Modeling Theory of Hippocampal Place Cell Remapping
Friday, May 7th, 2004 from 1:30-2:30 am in WeH 4623.
Presented by Mark Fuhs,
Place cells throughout the rat hippocampus show location-specific firing fields that topologically reorganize, or "remap", when the rat moves to a different environment. In some cases, however, remapping occurs only after extensive exposure to one or both environments, indicating that it is experience-dependent and not purely sensory driven. We propose that this construction of a multi-map representation of the world can be understood as a mixture modeling process, where the degree of remapping between environments reflects the perceived statistical likelihood that the features of each environment form distinct clusters. We simulated the construction of mixture models for several different training paradigms where the environments differed (square vs. circular arenas, or different room location, or different rooms), or the animals had different amounts of pretraining. We found that the observed time course of remapping could be explained by the degree of similarity between environments and the amount of experience the rat had in each one. Our computationally explicit mixture modeling theory can also be used to predict the time course of remapping in as-yet untested training paradigms.
This talk is in partial fulfillment of the speaking requirement.
Web contact: sss+www@cs