SSS Abstracts
Compiler Optimization of Scalar Value Communication Between Speculative Threads
Friday, September 27th, 2002 from 12-1 pm in NSH 1507.
Presented by Antonia Zhai.
Compiler and architecture researchers have striven to improve the performance of a wide spectrum of applications on microprocessors by automatically exploiting parallelism. However, the desired performance is elusive for a large number of general-purpose programs, mainly due to their complicated control flow and ambiguous data dependences. Thread-Level Speculation (TLS) opens up new opportunities for automatic parallelization by allowing possibly dependent threads to execute speculatively in parallel. Although there have been many recent proposals for hardware that supports TLS, relatively little has been done on compiler optimization to fully exploit this potential.
In this talk, I will introduce compiler techniques that remove an important limitation on program performance in TLS execution: the stalls required to forward scalar values between threads. I will present dataflow algorithms for three increasingly aggressive instruction-scheduling techniques that shorten the critical forwarding path introduced by these stalls, and I will demonstrate their effectiveness by contrasting them with related hardware-only approaches.
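To make the cost of those stalls concrete, here is a toy cost model, a minimal sketch of my own (not the talk's dataflow algorithms): each speculative thread forwards the scalar at some point in its iteration, and successive threads serialize on that point. `ITER_LEN`, `LAST_DEF`, and `NUM_THREADS` are invented constants.

```python
# Toy model (illustration only): thread i+1 stalls until thread i forwards
# the scalar, so the forwarding points of successive threads form a chain.

ITER_LEN = 100       # instructions per speculative thread (invented)
LAST_DEF = 30        # instruction at which the scalar's final value is ready
NUM_THREADS = 8

def total_time(signal_point):
    """Critical-path length when every thread forwards the scalar after
    executing `signal_point` instructions of its iteration."""
    # Thread i cannot reach its signal point until it has received the value
    # from thread i-1, so there are NUM_THREADS - 1 chained forwarding delays
    # before the last thread finishes its iteration.
    return (NUM_THREADS - 1) * signal_point + ITER_LEN

conservative = total_time(ITER_LEN)  # signal at the end of the iteration
aggressive = total_time(LAST_DEF)    # signal hoisted to the last definition

print(f"conservative: {conservative} cycles, aggressive: {aggressive} cycles")
print(f"speedup from earlier forwarding: {conservative / aggressive:.2f}x")
```

In this model, hoisting the forwarding point from the end of the iteration to just after the scalar's last definition is exactly what shortens the critical forwarding path that the scheduling techniques attack.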
This talk is in partial fulfillment of the speaking requirement.
Unsupervised Learning of Arabic Stemming using a Parallel Corpus
Friday, October 18th, 2002 from 12-1 pm in NSH 1507.
Presented by Monica Rogati.
Stemming is the process of normalizing word variations by removing prefixes and suffixes that add little or no additional meaning. In most cases, stemming improves both the efficiency and effectiveness of text processing applications such as information retrieval and machine translation. Stemming English is relatively easy: the popular Porter stemmer consists of ~150 lines of code. However, building a rule-based stemmer for an arbitrary language is time-consuming and requires experts with linguistic knowledge of that particular language.
This talk presents an unsupervised learning approach to non-English stemming. The stemming model is based on statistical machine translation, and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. A parallel corpus is a collection of sentence pairs with the same meaning but in different languages (e.g. United Nations proceedings, bilingual newspapers, the Bible). Monolingual, unannotated text in the target language is readily available and can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre.
Examples and results will be given for Arabic, but the approach is applicable to any language that needs affix removal. Our resource-frugal approach results in 87.5% agreement with a proprietary Arabic stemmer built using rules, suffix and prefix lists, and human-annotated text.
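As a rough illustration of the core intuition, though not the talk's actual translation-based model, the sketch below groups foreign words aligned to the same stemmed English word and counts the leftover string material as affix candidates; the tiny aligned word list is invented.

```python
from collections import Counter, defaultdict

# (english_stem, foreign_word) pairs, as if produced by stemming the English
# side of a parallel corpus and word-aligning it to the foreign side.
aligned = [
    ("write", "ktb"), ("write", "ktbt"), ("write", "alktb"),
    ("book", "ktab"), ("book", "alktab"), ("book", "ktabh"),
]

groups = defaultdict(list)
for stem, foreign in aligned:
    groups[stem].append(foreign)

prefixes, suffixes = Counter(), Counter()
for words in groups.values():
    # Words sharing an English stem likely share a foreign stem; whatever
    # one word adds around another is an affix candidate.
    for a in words:
        for b in words:
            if a != b and b.startswith(a):
                suffixes[b[len(a):]] += 1   # b = a + suffix
            if a != b and b.endswith(a):
                prefixes[b[:-len(a)]] += 1  # b = prefix + a

print("suffix candidates:", suffixes.most_common(3))
print("prefix candidates:", prefixes.most_common(3))
```

A stemmer would then strip the highest-scoring affixes; the talk's model derives this kind of evidence from a statistical translation model instead of raw counts and, as noted above, can be refined further with monolingual text.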
This is joint work with Scott McCarley while at IBM TJ Watson (Summer 2002).
This talk is in partial fulfillment of the speaking requirement.
Steganography: Undercover Cryptography
Friday, October 25th, 2002 from 12-1 pm in NSH 1507.
Presented by Nicholas Hopper.
Steganography is typically described as the art of hiding secret messages in "innocent-looking messages" in such a way that the hidden messages cannot be detected. There is a substantial literature describing protocols which heuristically satisfy this goal, and attacks which detect or even recover the messages hidden by these protocols. A likely reason for this phenomenon is the lack of a rigorous definition of steganography, along the lines of the definitions for secure encryption developed by Blum-Micali, Goldwasser-Micali, Yao, and others in the 1980s.
In this talk I will present a formal definition of a stegosystem; give strong definitions of steganographic security against passive and active adversaries; and present constructions which provably satisfy these definitions, under the assumption that pseudorandom function families exist.
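For flavor, here is a minimal sketch of the classic rejection-sampling construction from a pseudorandom function family; HMAC-SHA256 stands in for the PRF, and `sample_channel` is a hypothetical oracle for drawing innocent-looking documents (a full construction would also feed the history of previously sent documents into the PRF).

```python
import hashlib
import hmac
import random

def prf_bit(key: bytes, doc: bytes) -> int:
    """One pseudorandom bit of a document, F_K(doc) in {0, 1}.
    HMAC-SHA256 stands in for the pseudorandom function family."""
    return hmac.new(key, doc, hashlib.sha256).digest()[0] & 1

COVERTEXTS = [b"see you at noon", b"lunch was great", b"nice weather today",
              b"meeting ran long", b"coffee later?", b"train was late"]

def sample_channel() -> bytes:
    # Hypothetical oracle for the covertext channel. A real channel has far
    # more variety; the random tag fakes that here.
    return b"%s [%d]" % (random.choice(COVERTEXTS), random.randrange(10**6))

def embed_bit(key: bytes, bit: int, max_tries: int = 64) -> bytes:
    """Rejection-sample the channel until a document's PRF bit equals `bit`."""
    doc = sample_channel()
    for _ in range(max_tries):
        if prf_bit(key, doc) == bit:
            break
        doc = sample_channel()
    return doc  # distributed (nearly) like innocent channel traffic

def extract_bit(key: bytes, doc: bytes) -> int:
    return prf_bit(key, doc)

key = b"shared secret key"
message = [1, 0, 1, 1, 0]
stegotext = [embed_bit(key, b) for b in message]
assert [extract_bit(key, d) for d in stegotext] == message
```

Because each transmitted document is an honest sample from the channel, an adversary without the key sees traffic whose distribution is negligibly far from innocent communication, which is the intuition the formal security definitions pin down.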
This talk is in partial fulfillment of the speaking requirement.
High-Dimensional Nearest-Neighbor Location: Statistics, Geometry and Algorithms (talk cancelled)
Friday, November 1st, 2002 from 12-1 pm in NSH 1507.
Presented by Alexander Gray.
Nearest-neighbor location is a fundamental problem in computer science, actively studied in computational geometry and database indexing and applied heavily in machine learning. Increasingly, high-dimensional spaces are where the problems lie, while the classic tree-based approaches fail after reaching a handful of dimensions. This has led to recent pessimistic conclusions in both the theory and database communities: that the problem is hopeless unless it is relaxed or redefined in various ways, including the acceptance of highly approximate and/or non-deterministic answers.
In this talk I'll first develop a straightforward statistical theory identifying the actual source of difficulty for tree-based approaches as dimensionality grows, including its concrete effect on algorithmic complexity. Using the resulting insights, I'll then present a number of simple geometric techniques, based on the properties of hierarchically arranged hyperspheres and hyperrectangles, which dramatically improve the ability of trees to localize neighborhoods in high dimensions, while counterintuitively refuting a couple of previously held notions. I'll conclude by showing empirical results demonstrating efficiency in the highest dimensionalities to date, surpassing the recent compression-based and tree-based methods from the database community and the recent hashing methods from the theory community.
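To fix intuition for what hierarchically arranged hyperspheres buy, here is a minimal generic ball-tree nearest-neighbor search with triangle-inequality pruning, a sketch of the general family of tree-based methods rather than the talk's specific techniques; the leaf size and split rule are arbitrary choices.

```python
import numpy as np

class Ball:
    """A node bounding its points with a hypersphere (center, radius)."""
    def __init__(self, points):
        self.points = points
        self.center = points.mean(axis=0)
        self.radius = np.linalg.norm(points - self.center, axis=1).max()
        self.children = []
        if len(points) > 16:                    # leaf size: arbitrary choice
            dim = points.var(axis=0).argmax()   # split the widest dimension
            order = points[:, dim].argsort()
            half = len(points) // 2
            self.children = [Ball(points[order[:half]]),
                             Ball(points[order[half:]])]

def nn_search(node, q, best=(np.inf, None)):
    # Triangle inequality: every x in the ball has d(q, x) >= d(q, c) - r,
    # so if that lower bound already exceeds the best distance, prune.
    if np.linalg.norm(q - node.center) - node.radius >= best[0]:
        return best
    if not node.children:
        d = np.linalg.norm(node.points - q, axis=1)
        i = d.argmin()
        return (d[i], node.points[i]) if d[i] < best[0] else best
    # Visit the closer child first so the bound tightens early.
    for child in sorted(node.children,
                        key=lambda c: np.linalg.norm(q - c.center)):
        best = nn_search(child, q, best)
    return best

rng = np.random.default_rng(0)
tree = Ball(rng.normal(size=(2000, 20)))        # 2000 points in 20 dimensions
dist, _ = nn_search(tree, rng.normal(size=20))
print(f"nearest-neighbor distance: {dist:.3f}")
```

The difficulty the talk diagnoses shows up here directly: as dimensionality grows, the lower bound d(q, c) - r is rarely large enough to prune, and the search degenerates toward a linear scan.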
This talk is in partial fulfillment of the speaking requirement.
A Calculus for Probabilistic Languages
Friday, November 8th, 2002 from 12-1 pm in NSH 1507.
Presented by Sungwoo Park.
As probabilistic computation plays an increasing role in diverse fields of computer science, researchers have designed new languages to facilitate the development of probabilistic programs. An example is the imperative language CES, an extension of C++ with probabilistic data types. The language was designed specifically for probabilistic robotics and has been successfully employed to implement typical robot control programs rapidly and compactly.
In this talk, I will present a calculus for probabilistic languages, which extends the traditional lambda calculus. In our calculus, every expression denotes a probability distribution yet evaluates to a regular value. The most notable feature of our calculus is that it is founded upon sampling functions, which map the unit interval to probability domains. As a consequence, we achieve a unified representation scheme for all types of probability distributions. I will also present preliminary evidence that probabilistic languages based upon our calculus are viable in applications involving massive probabilistic computation.
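To make the sampling-function idea concrete, here is a minimal sketch (illustrative names, not the calculus's syntax): every distribution is represented as a function from a random-number source, standing in for the unit interval, to a value, and distributions compose through a single `bind` operation.

```python
import random

# A distribution is a sampling function: it consumes a source of randomness
# (standing in for the unit interval) and produces a value.
def point(x):
    return lambda r: x                  # the point-mass distribution

def bind(d, f):
    return lambda r: f(d(r))(r)         # sample d, then sample f's result

uniform = lambda r: r.random()          # the primitive: U(0, 1)

def bernoulli(p):
    return bind(uniform, lambda u: point(1 if u < p else 0))

def geometric(p):
    # Recursively defined distributions fit the same representation:
    # count Bernoulli trials until the first success.
    return bind(bernoulli(p),
                lambda b: point(1) if b else
                          bind(geometric(p), lambda n: point(n + 1)))

r = random.Random(42)
samples = [geometric(0.5)(r) for _ in range(10000)]
print(sum(samples) / len(samples))      # close to 2.0, i.e. 1/p
```

Discrete, continuous, and recursively defined distributions all live under this one representation, which is the unified scheme described above.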
The development of the calculus also engendered new research topics. For instance, we have developed a structural formulation of the intuitionistic modal logic S4 with an intersection connective, which is employed in the type system for the calculus. I will briefly discuss these new research topics.
This talk is in partial fulfillment of the speaking requirement.
Understanding the Slowdown of Large Jobs in an M/GI/1 System
Friday, November 22nd, 2002 from 12-1 pm in NSH 1507.
Presented by Adam Wierman.
It is well-known that choosing the right scheduling algorithm can have a big impact on performance, both in theory and in practice. For example, changing the scheduling algorithm in a CPU from Processor-Sharing (PS) to a policy which biases towards small jobs, such as Shortest-Remaining-Processing-Time-First (SRPT), or to a policy which biases towards young jobs, such as Least-Attained-Service, can improve mean response time (a.k.a. sojourn time) dramatically. However, less well understood is the performance impact of different scheduling policies on large jobs. For example, how does a policy which biases towards small jobs, such as SRPT, compare against a policy which biases towards large jobs, such as Longest-Remaining-Processing-Time-First, when the performance metric is the response time of the large jobs?
In this talk I will limit the discussion to an M/GI/1 queue. In the M/GI/1/PS queue, all jobs (large or small) are slowed down by the same factor in expectation. Because this expected slowdown (response time divided by job size) is the same for all job sizes, the PS policy is often referred to as a fair policy. I will show that, with respect to large jobs, all work-conserving scheduling policies perform as well as PS. In particular, I will show that as job size tends to infinity, the slowdown under any work-conserving policy is almost surely at most that of PS, even for policies that clearly bias against large jobs.
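To make the claim tangible, here is a small simulation of my own (not the talk's proof technique). Under PS, the expected response time of a size-x job in an M/G/1 queue is x/(1-ρ), so every job size sees expected slowdown 1/(1-ρ); the sketch compares the measured slowdown of large jobs under PS and SRPT against that benchmark. The load, job count, time step, and "large job" cutoff are arbitrary choices.

```python
import random

LOAD, N_JOBS, DT = 0.7, 5000, 0.01  # utilization, #jobs, time step (arbitrary)

def simulate(policy, arrivals):
    """Time-stepped M/G/1 queue; returns (size, response_time) per job.
    DT must be small relative to job sizes to keep discretization error low."""
    active, done, t, i = [], [], 0.0, 0
    while i < len(arrivals) or active:
        while i < len(arrivals) and arrivals[i][0] <= t:
            arrive, size = arrivals[i]
            active.append([size, size, arrive])  # [remaining, size, arrival]
            i += 1
        if not active:
            t = arrivals[i][0]                   # jump over idle periods
            continue
        if policy == "PS":                       # all jobs share the server
            for job in active:
                job[0] -= DT / len(active)
        else:                                    # SRPT: shortest remaining runs
            min(active, key=lambda j: j[0])[0] -= DT
        t += DT
        done += [(j[1], t - j[2]) for j in active if j[0] <= 0]
        active = [j for j in active if j[0] > 0]
    return done

random.seed(1)
arrivals, t = [], 0.0
for _ in range(N_JOBS):
    t += random.expovariate(1.0)                 # Poisson arrivals at rate 1
    arrivals.append((t, random.expovariate(1.0 / LOAD)))  # exp sizes, mean rho

for policy in ("PS", "SRPT"):
    slow = [resp / size for size, resp in simulate(policy, arrivals)
            if size > 3 * LOAD]                  # look only at "large" jobs
    print(f"{policy}: mean slowdown of large jobs = {sum(slow)/len(slow):.2f}")
print(f"PS benchmark for every job size: 1/(1-rho) = {1/(1-LOAD):.2f}")
```

On a run like this, SRPT's large jobs typically fare no worse than PS's, illustrating the point that even a policy biased toward small jobs does not hurt the asymptotic slowdown of the large ones.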
This talk is in partial fulfillment of the speaking requirement.