SSS Abstracts
Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems
Friday, March 14th, 2003 from 12-1 pm in NSH 1507.
Presented by Kay Sripanidkulchai.
Locating content in decentralized peer-to-peer systems is a challenging problem. Gnutella, a popular file-sharing application, relies on flooding queries to all peers. Although flooding is simple and robust, it is not scalable. In this talk, I will present an approach that retains the simplicity of Gnutella while addressing its inherent weakness: scalability. We propose a solution in which peers loosely organize themselves into an interest-based structure on top of the existing Gnutella network. Our approach exploits a simple yet powerful principle called interest-based locality, which posits that if a peer has a particular piece of content that one is interested in, it is very likely to have other items of interest as well. When using our algorithm, called interest-based shortcuts, a significant amount of flooding can be avoided, making Gnutella a more competitive solution.
I will demonstrate the existence of interest-based locality in five diverse traces of content distribution applications, two of which are traces of popular peer-to-peer file-sharing applications. Trace-driven simulations show that interest-based shortcuts often resolve queries quickly in one peer-to-peer hop, while reducing the total load in the system by a factor of 3 to 7.
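A rough sketch of the shortcut mechanism described above (the class and method names here are illustrative, not taken from the actual implementation): each peer keeps a small ranked list of peers that answered its past queries, tries them before anything else, and falls back to Gnutella-style flooding only when the shortcuts fail.

    class Peer:
        def __init__(self, name, max_shortcuts=10):
            self.name = name
            self.content = set()         # items this peer holds
            self.shortcuts = []          # peers that satisfied earlier queries, best first
            self.max_shortcuts = max_shortcuts

        def lookup(self, item, flood):
            """Try shortcuts first; fall back to flooding (passed in as a callable)."""
            for peer in self.shortcuts:
                if item in peer.content:
                    self._promote(peer)  # successful shortcuts move toward the front
                    return peer
            provider = flood(item)       # expensive network-wide search
            if provider is not None:
                self._promote(provider)  # learn a new shortcut from the flood result
            return provider

        def _promote(self, peer):
            if peer in self.shortcuts:
                self.shortcuts.remove(peer)
            self.shortcuts.insert(0, peer)
            del self.shortcuts[self.max_shortcuts:]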
Truecasing
Friday, March 21st, 2003 from 12-1 pm in WeH 5409.
Presented by Lucian Lita.
Truecasing is the process of restoring case information to badly-cased or non-cased text. Although not a new problem, case restoration has not been properly studied with respect to its impact on text processing tasks. In this talk I will focus on truecasing issues and propose a statistical, language-model-based truecaser that achieves an accuracy of ~98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection performance on automatic speech recognition text is considerably improved. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This talk argues for the use of truecasing as a valuable component in text processing applications.
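As a toy illustration of the statistical idea only (the truecaser described in the talk uses a full language model; the functions below are hypothetical and simply pick the most frequent surface form observed for each lowercased token in cased training text):

    from collections import Counter, defaultdict

    def train_truecaser(cased_sentences):
        counts = defaultdict(Counter)
        for sentence in cased_sentences:
            for token in sentence.split():
                counts[token.lower()][token] += 1
        # map each lowercased token to its most frequent cased form
        return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}

    def truecase(text, model):
        tokens = [model.get(tok, tok) for tok in text.lower().split()]
        if tokens:
            tokens[0] = tokens[0][:1].upper() + tokens[0][1:]  # naive sentence-initial capitalization
        return " ".join(tokens)

    model = train_truecaser(["Apple shares rose in New York .",
                             "Apple stock fell .",
                             "She ate an apple ."])
    print(truecase("apple shares rose in new york .", model))  # -> "Apple shares rose in New York ."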
This talk is in partial fulfillment of the speaking requirement.
Physically Large Displays Improve Spatial Performance
Friday, April 4th, 2003 from 12-1 pm in NSH 1507.
Presented by Desney Tan.
Large wall-sized displays are becoming prevalent. Although researchers have articulated qualitative benefits of group work on large displays, little work has been done to quantify the benefits for individual users. I ran three studies comparing the performance of users working on a large projected wall display to that of users working on a standard desktop monitor. In these studies, I held the visual angle constant by adjusting the viewing distance to each of the displays. Results from the first study indicate that although there was no significant difference in performance on a reading comprehension task, users performed about 26% better on a spatial orientation task done on the large display. Results from the second and third studies suggest that the large display affords a greater sense of presence, allowing users to treat the spatial task as an egocentric rather than an exocentric rotation. Results from these studies show that users perform better when they are provided with an egocentric strategy than when they use an exocentric one. In the absence of an explicit strategy, users choose an exocentric one when working on the small display and the much more efficient egocentric one when working on the large display. I discuss future work to extend my findings and formulate design principles for computer interfaces and physical workspaces.
This talk is in partial fulfillment of the speaking requirement.
Statistics, Geometry, and the Cosmos
Friday, April 18th, 2003 from 12-1 pm in NSH 1507.
Presented by Alexander Gray.
In this talk I'll begin by describing the central question of cosmology: is the "Standard Model" of the origin and structure of the universe, which implies an expanding universe among other things, correct or not? A little-known fact, outside of astrophysics and spatial statistics, is that the computational tractability of a fundamental family of statistics called the n-point correlations is one of the only remaining obstacles to answering this question. Unfortunately, no significant computational advance has been made on the n-point correlations for nearly three decades. I will explain what the n-point correlations are, how they are related to fractal geometry and to machine learning, why they are so important, and what makes them so hard to compute. Then I will describe two algorithms that attack different cases of the problem: one based on a simple but unusual computational geometry idea, and one based on a simple but unusual combination of computational geometry and Monte Carlo integration. Along the way I'll show you some things you have seen before and some things you probably haven't, regarding high-school-level divide-and-conquer, basic sampling, and even basic probability and statistics if there's time. I will finally demonstrate that these algorithms have in fact changed the tractability status of this problem, and are being used in massive computational experiments by several astrophysics groups as we speak. So will the universe collapse and crush us all? Find out this Friday, and eat free food at the same time.
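To make the scaling problem concrete, here is a brute-force count for the simplest case, the 2-point correlation (a hypothetical sketch, not either of the algorithms in the talk): counting pairs is already O(N^2), and the n-point statistic requires examining all n-tuples of points, which is exactly the enumeration the tree-based and Monte Carlo methods above are designed to avoid.

    import itertools, math

    def pair_count(points, r_min, r_max):
        """Count pairs of 3-D points whose separation falls in [r_min, r_max) -- O(N^2)."""
        count = 0
        for p, q in itertools.combinations(points, 2):
            if r_min <= math.dist(p, q) < r_max:
                count += 1
        return count

    # the raw ingredient of a 2-point correlation estimate: compare pair counts in the
    # survey data against pair counts in a random (unclustered) catalog at each radius
    galaxies = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.9), (0.5, 0.5, 0.5)]
    print(pair_count(galaxies, 0.0, 0.6))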
This is joint work with Andrew Moore.
This talk is in partial fulfillment of the speaking requirement.
Early Disease Outbreak Detection
Friday, April 25th, 2003 from 12-1 pm in NSH 1507.
Presented by Weng-Keen Wong.
When an epidemic breaks out in a region, it leaves a noticeable mark on health care data. For instance, more people visit physicians and emergency departments, school and work absenteeism increases, sales of over-the-counter medication peak, and perhaps even the number of deaths is unusually large. With such distinct signals in health care data, data mining algorithms have been developed to improve the timeliness of disease outbreak detection. The first part of this talk will provide an overview of the science of early disease outbreak detection, describing the strategies employed and the common problems that early warning systems encounter. The second portion of this talk will describe WSARE (What's Strange About Recent Events), a rule-based anomalous pattern detection algorithm that is used for early outbreak detection.
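The core comparison behind this style of detector can be sketched as follows (this is only a schematic of the general idea, not the WSARE algorithm itself, which also searches over rules built from record attributes and corrects for multiple hypotheses): test whether records matching a candidate rule are over-represented in recent data relative to a historical baseline.

    from scipy.stats import fisher_exact

    def score_rule(recent_match, recent_total, baseline_match, baseline_total):
        """P-value that the rule's recent count is unusually high versus the baseline period."""
        table = [[recent_match, recent_total - recent_match],
                 [baseline_match, baseline_total - baseline_match]]
        _, p_value = fisher_exact(table, alternative="greater")
        return p_value

    # e.g. 12 of 200 emergency visits today match a hypothetical rule such as
    # "respiratory symptoms AND downtown zip code", versus 15 of 1000 on comparable past days
    print(score_rule(12, 200, 15, 1000))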
This is joint work with Andrew Moore, Greg Cooper, and Michael Wagner.
This talk is NOT for the speaking requirement.
Providing Contextual Information to Pervasive Computing Applications
Friday, May 2nd, 2003 from 12-1 pm in WeH 4623.
Presented by Glenn Judd.
Pervasive computing applications are increasingly leveraging contextual information from several sources to provide users with behavior appropriate to the environment in which they reside. If these sources of contextual information are used and deployed in an ad hoc manner, however, they may provide overlapping functionality, fail to provide needed functionality, and require applications to use inconsistent interfaces. To overcome these problems, we introduce a Contextual Information Service that provides applications with contextual information via a virtual database. Unlike previous efforts, our service provides applications with a consistent, lightweight, and powerful mechanism for obtaining contextual information, and includes explicit support for the on-demand computation of contextual information. We show, via example applications and a Contextual Information Service prototype that we have implemented, how this approach can be used to allow proactive applications to adapt their behavior to match a user's current environment.
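The "virtual database" idea can be pictured with a small hypothetical sketch (the interface below is invented for illustration and is not the prototype's actual API): applications ask one service for named contextual attributes, and attributes that are not stored are computed on demand by registered providers.

    import time

    class ContextService:
        def __init__(self):
            self._providers = {}          # attribute name -> callable that computes its value

        def register(self, attribute, provider):
            self._providers[attribute] = provider

        def query(self, attribute, **params):
            """Compute the requested contextual attribute on demand."""
            if attribute not in self._providers:
                raise KeyError(f"no provider registered for {attribute!r}")
            return self._providers[attribute](**params)

    service = ContextService()
    service.register("location", lambda user: {"user": user, "room": "unknown", "ts": time.time()})
    print(service.query("location", user="glenn"))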
This talk is in partial fulfillment of the speaking requirement.
SPADES --- System for Parallel Agent, Discrete Event Simulation
Friday, May 9th, 2003 from 12-1 pm in NSH 1507.
Presented by Patrick Riley.
Simulations are an excellent tool for studying artificial intelligence. However, the simulation technology designed for and in use by the AI community often fails to take advantage of the work in the larger simulation community on how to construct stable, repeatable simulations. In this talk, I will discuss some of the problems with current AI simulation technology and some of the special requirements of the AI community that are not addressed by most simulation infrastructure. I'll then present a system I designed and implemented called SPADES, which attempts to provide solid infrastructure for AI simulations and to address the problems created by distributed simulation. No knowledge of AI techniques or of the jargon and algorithms of the simulation community will be assumed.
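For readers unfamiliar with discrete event simulation, a minimal event-queue loop looks like the sketch below (a generic illustration, not SPADES itself, which adds distributed agents, modeled think time, and parallel execution on top of this core): events are processed strictly in simulation-time order, so results are repeatable regardless of wall-clock timing.

    import heapq, itertools

    class Simulator:
        def __init__(self):
            self._queue = []
            self._seq = itertools.count()   # tie-breaker so same-time events never compare handlers

        def schedule(self, when, handler):
            heapq.heappush(self._queue, (when, next(self._seq), handler))

        def run(self, horizon):
            while self._queue and self._queue[0][0] <= horizon:
                when, _, handler = heapq.heappop(self._queue)
                handler(self, when)         # handlers may schedule further events

    def sense_act(sim, t):                  # a toy agent that acts every 10 time units
        print(f"agent acts at t={t}")
        sim.schedule(t + 10, sense_act)

    sim = Simulator()
    sim.schedule(0, sense_act)
    sim.run(horizon=50)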
This talk is in partial fulfillment of the speaking requirement.
Web contact: sss+www@cs