Time: Tuesday
Place: Wean Hall 5409
Organizer: Dr.
Assistant: Phyllis Pomerantz (plp@cs.cmu.edu). Unless a different person is named below, please contact Phyllis for appointments with the speakers.
Date | Speaker | Affiliation | Topic
9/26/03, 3:30pm, Wean Hall 7220 (note unusual date & room) | Jason Hartline (this talk is part of the theory seminar series, but I highly recommend it to AI seminar series participants also) | CMU Computer Science Department |
| Ron Lavi | | Towards a Characterization of Truthful Combinatorial Auctions
| Drew Bagnell | | Learning Policies: Leveraging Supervised Learning for Planning and Control
| David Blei | UC Berkeley, CS |
| Andrew McCallum | UMass Amherst, Computer Science | Conditional Random Fields for Information Extraction and Coreference
| Eric Xing | UC Berkeley, CS | Generalized Mean Field Inference in Graphical Models (for appointments, contact Jill Lentz [jlentz@cs.cmu.edu])
| Jonathan Schaeffer | |
| Naftali Tishby | School of Computer Science and Engineering, The Hebrew University of Jerusalem |
(note unusual date & room) | Michael Kearns | UPenn CIS |
| Russ Greiner | |
SPRING BREAK | NO TALK | |
| Henry Kautz | |
4/20/04 | Leslie Pack Kaelbling | MIT Computer Science and Artificial Intelligence Lab |
| Michael Lewicki | | Probabilistic models for learning structure in natural scenes and sounds
| Pat Langley | | Computational Induction of Explanatory Process Models
TBA | Tom Mitchell | | TBA (Brain science & machine learning)
One feature of the classical treatment of optimization problems is that the space over which the optimization is performed, i.e., the input description of the problem, is assumed to be publicly known to the optimizer. This assumption does not always accurately represent reality, for example in applications involving resource sharing and electronic commerce on the Internet, where transactions must occur among parties with diverse and selfish interests. In such settings, the inputs to many optimizations are not publicly known in advance. Instead they must be solicited from companies, computerized agents, individuals, etc., which may act selfishly to promote their own self-interests. In particular, they may lie about their values or may not adhere to specified protocols if it benefits them. One of the most fundamental problems where the selfish interests of the participants come into play is auctions. In this talk I will review the worst-case competitive analysis of auctions for digital goods, including the necessary game-theoretic notions, the solution concept of truthful mechanism design, and the framework for competitive analysis of auctions. I will then pose two open problems relating to this work and present what is known about them. The first open problem is the existence of competitive deterministic auctions. On this topic, I will show that there is no symmetric deterministic auction that is competitive; on the other hand, I will show that there is a competitive auction that uses only one random bit. The open question is whether there exist asymmetric deterministic auctions that are competitive. The second open problem concerns the optimal competitive ratio. The best known auction achieves a competitive ratio of 3.39, and I will discuss a lower bound on the competitive ratio of 2.42. Furthermore, I will present two simplifications of the auction problem, one for which the optimal competitive ratio is known, and the other for which it is not.
(This talk presents work that is joint with Amos Fiat, Andrew Goldberg, Anna Karlin, and Mike Saks.)
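The competitive framework above can be made concrete with a small sketch. For digital goods, the benchmark is the revenue of the best single sale price chosen in hindsight, and the Random Sampling Optimal Price (RSOP) auction from this literature is truthful because each bidder's offered price is computed only from the other bidders' bids. This is an illustrative sketch, not the 3.39-competitive auction discussed in the talk; the function names are my own.

```python
import random

def optimal_single_price_revenue(bids):
    """Benchmark F: revenue of the best single price chosen with hindsight."""
    return max(p * sum(1 for b in bids if b >= p) for p in bids)

def random_sampling_auction(bids, rng=random):
    """Random Sampling Optimal Price (RSOP) auction: randomly split the
    bidders into two halves, compute the optimal single price on each
    half, and offer that price to the *other* half.  Truthful, because
    no bidder's offered price depends on her own bid."""
    left, right = [], []
    for b in bids:
        (left if rng.random() < 0.5 else right).append(b)

    def best_price(group):
        if not group:
            return float("inf")  # empty half: no sale to the other side
        return max(group, key=lambda p: p * sum(1 for b in group if b >= p))

    p_left, p_right = best_price(left), best_price(right)
    # Each half buys at the price computed from the opposite half.
    revenue = sum(p_right for b in left if b >= p_right)
    revenue += sum(p_left for b in right if b >= p_left)
    return revenue
```

With bids [1, 2, 3, 4] the hindsight benchmark is 6 (sell at price 2 or 3); the randomized auction's revenue varies with the split but never requires a bidder's own bid to set her price.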
We analyze incentive compatible (truthful) mechanisms over restricted domains of preferences, the leading example being combinatorial auctions. Our work generalizes the characterization of Roberts (1979) who showed that truthful mechanisms over unrestricted domains with at least 3 possible outcomes must be "affine maximizers". We show that truthful mechanisms for combinatorial auctions (and related restricted domains) must be "almost affine maximizers" if they also satisfy an additional requirement of "independence of irrelevant alternatives". This requirement is without loss of generality for unrestricted domains as well as for auctions between two players where all goods must be allocated. This implies unconditional results for these cases, including a new proof of Roberts' theorem. The computational implications of this characterization are severe, as reasonable "almost affine maximizers" are shown to be as computationally hard as exact optimization. This implies the near-helplessness of such truthful polynomial-time auctions in all cases where exact optimization is computationally intractable.
See also http://www.cs.huji.ac.il/~tron for more information.
Joint work with Ahuva Mu'alem and Noam Nisan.
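For readers unfamiliar with affine maximizers: the canonical example is the Vickrey-Clarke-Groves (VCG) mechanism, which is the special case with all player weights equal to 1 and no outcome-specific constants. A minimal sketch (not from the paper; the outcome labels and valuation dictionaries below are hypothetical):

```python
def vcg(valuations, outcomes):
    """Vickrey-Clarke-Groves mechanism: choose the welfare-maximizing
    outcome and charge each player the externality she imposes on the
    others.  VCG is the canonical affine maximizer and is truthful."""
    def welfare(outcome, exclude=None):
        return sum(v[outcome] for i, v in enumerate(valuations) if i != exclude)

    best = max(outcomes, key=welfare)
    payments = []
    for i in range(len(valuations)):
        # Welfare the others could have gotten without player i, minus
        # the welfare the others actually get: i's externality.
        others_best = max(welfare(o, exclude=i) for o in outcomes)
        payments.append(others_best - welfare(best, exclude=i))
    return best, payments

# Two outcomes: allocate a single item to player 0 ("A") or player 1 ("B").
valuations = [{"A": 3, "B": 0}, {"A": 0, "B": 2}]
best, payments = vcg(valuations, ["A", "B"])
```

Here the item goes to player 0 (welfare 3 beats 2), who pays 2, the displaced value of player 1, recovering the second-price auction. The characterization result above says that, under the stated conditions, truthful combinatorial auctions cannot stray far from this template.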
Increasingly, learning systems are called upon to not merely predict,
but to take actions that affect the world. The fields of planning and
control are meanwhile asked to make good sequential decisions in
complex problems where simple analytic models are unavailable. This
confluence of learning and control has driven research to identify
algorithms that can search for good strategies while remaining
efficient in both sample and computational resources.
Policy search approaches are becoming a popular alternative to value
function methods in part because they gracefully handle both large
state-spaces and partial observability. I'll discuss two recent
developments that transplant techniques from supervised learning to
learning policies. First, I'll show how notions from information
geometry give simple algorithms with dramatically better performance
than existing reinforcement learning algorithms. Next, I'll discuss a
new approach that discards Bellman equations and searches directly in
a space of policies while retaining the central insight of dynamic
programming. With appropriate domain knowledge, this algorithm is
efficient in both computational and sample complexity and provides
much stronger guarantees than classical value-function approximation
can. Examples will illustrate how each theoretical advance leads to
state-of-the-art empirical performance on difficult benchmark
problems.
J.A. (Drew) Bagnell is a doctoral candidate in Robotics and an NSF
graduate fellow at Carnegie Mellon University. His interests include
rich probabilistic models for statistical learning, robust and
stochastic control, and the application of machine learning to large,
partially observable planning and control problems.
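The information-geometry idea can be seen in a one-parameter toy example. The natural gradient preconditions the vanilla policy gradient by the inverse Fisher information of the policy; for a Bernoulli (two-armed bandit) policy with a single logit, the p(1-p) factors cancel, so the update no longer stalls on the flat tails of the sigmoid. This toy comparison is my illustration of the principle, not an algorithm from the talk.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def vanilla_step(theta, r0, r1, lr):
    """Plain gradient ascent on J(theta) = p*r1 + (1-p)*r0,
    where p = sigmoid(theta) is the probability of pulling arm 1."""
    p = sigmoid(theta)
    grad = p * (1 - p) * (r1 - r0)      # dJ/dtheta
    return theta + lr * grad

def natural_step(theta, r0, r1, lr):
    """Natural gradient ascent: precondition by the inverse Fisher
    information of a Bernoulli policy, F = p*(1-p).  The factors cancel,
    leaving a step of size lr*(r1 - r0) everywhere in parameter space."""
    p = sigmoid(theta)
    fisher = p * (1 - p)
    grad = p * (1 - p) * (r1 - r0)
    return theta + lr * grad / fisher
```

Starting from a near-deterministic policy (theta = -6), the vanilla step moves by about 0.001 while the natural step moves by the full lr*(r1 - r0), which is the qualitative speedup the information-geometric algorithms exploit.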
I will discuss the problem of learning topic
hierarchies from text data. Model selection in this domain is daunting---which
of the large collection of possible trees to use? Taking a Bayesian approach,
we have developed the nested Chinese restaurant process, a flexible,
nonparametric prior on tree structures which is built via a collection of
distributions on partitions of integers. This prior allows arbitrarily large
branching factors and readily accommodates growing data collections.
We have used this distribution as a prior in a
hierarchical variant of latent Dirichlet allocation
(LDA). LDA is a probabilistic graphical model of text collections in which
documents can exhibit multiple latent topics. In the hierarchical variant,
these topics lie in a hierarchy which itself is random and unknown. A first
approach to inference in such a model is a straightforward Gibbs sampling
algorithm which finds an approximate posterior distribution on topic
hierarchies induced by a collection of documents. This yields interesting
results when applied to a dataset of abstracts from the NIPS conferences.
(Joint work with Thomas Griffiths, Michael Jordan, and
Josh Tenenbaum)
BIO
David Blei is a doctoral
candidate in the Computer Science Department at U.C. Berkeley. He is
interested in probabilistic graphical models applied to problems in
text-processing and information retrieval.
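The generative story of the nested Chinese restaurant process can be sketched in a few lines: each document draws a root-to-leaf path through an infinitely branching tree, where at every node it either follows an existing child (with probability proportional to how many earlier paths took it) or opens a new one (with probability proportional to a parameter gamma). A minimal sketch under those assumptions; the data structures are my own, not the paper's:

```python
import random
from collections import defaultdict

def crp_draw(counts, gamma, rng):
    """Chinese restaurant process draw: existing child k with probability
    proportional to counts[k], a brand-new child with probability
    proportional to gamma."""
    total = sum(counts.values()) + gamma
    r = rng.random() * total
    for k, c in counts.items():
        r -= c
        if r < 0:
            return k
    return max(counts, default=-1) + 1  # open a new child

def nested_crp_path(tree, depth, gamma, rng):
    """Draw one root-to-leaf path of the given depth through the
    (potentially infinitely branching) tree, updating child counts
    so later paths prefer popular subtrees."""
    path, node = [], ()
    for _ in range(depth):
        child = crp_draw(tree[node], gamma, rng)
        tree[node][child] += 1
        node = node + (child,)
        path.append(node)
    return path

# 50 "documents" each pick a depth-3 path; popular subtrees attract more paths.
tree = defaultdict(lambda: defaultdict(int))
rng = random.Random(0)
paths = [nested_crp_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(50)]
```

In the hierarchical LDA model each node additionally carries a topic distribution over words, and Gibbs sampling resamples each document's path given the others; the rich-get-richer counts are what let the branching factor grow with the data.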
Information
extraction is the process of filling a structured database
from unstructured text. It is a difficult statistical and
computational problem often involving hundreds of thousands of
variables, complex algorithms, and noisy and sparse data. Recently
there has been significant interest in conditionally-trained
alternatives to joint probabilistic models such as HMMs.
In this talk
I will briefly review previous work in finite state, Conditional
Random Field (CRF) models for information extraction, and then
describe four pieces of more recent work: (1) the application of CRFs
to extraction of tables from government reports, (2) feature induction
for these models, applied to named entity extraction, (3) a
random field method for noun co-reference resolution that has strong
ties to graph partitioning, (4) an extension of CRFs
to factorial
state representation, enabling simultaneous part-of-speech tagging and
noun-phrase segmentation.
Joint
work with colleagues at UMass (Ben Wellner, Khashayar
Rohanimanesh, Charles Sutton, David Pinto, Xing Wei, Wei Li, Bruce Croft),
CMU (John Lafferty), and UPenn (Fernando Pereira).
Andrew McCallum is an Associate Professor at the University of Massachusetts Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction from the Web. In the late 1990's he was a Research Scientist and Coordinator at JustResearch, after receiving his PhD from the University of Rochester. He is on the editorial board of the Journal of Machine Learning Research. For the past eight years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, document classification, finite state models, and learning from combinations of labeled and unlabeled data. Web page: http://www.cs.umass.edu/~mccallum.
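The linear-chain machinery behind these models can be sketched compactly: a tag sequence gets an unnormalized log-score from emission and transition weights, and the forward algorithm computes the log-partition function needed for conditional training in time polynomial in the sequence length rather than exponential. This is a bare-bones sketch with scalar scores standing in for the general weighted feature functions of a CRF; the toy weights are invented.

```python
import itertools
import math

def score(obs, tags, emit, trans):
    """Unnormalized log-score of one tag sequence under a linear-chain
    model: sum of per-position emission scores plus transition scores."""
    s = sum(emit[t][x] for t, x in zip(tags, obs))
    s += sum(trans[a][b] for a, b in zip(tags, tags[1:]))
    return s

def log_partition(obs, tagset, emit, trans):
    """Forward algorithm: log of the sum of exp(score) over all tag
    sequences, in O(len(obs) * |tagset|^2) instead of |tagset|^len(obs)."""
    alpha = {t: emit[t][obs[0]] for t in tagset}
    for x in obs[1:]:
        alpha = {t: emit[t][x] + math.log(sum(math.exp(alpha[s] + trans[s][t])
                                              for s in tagset))
                 for t in tagset}
    return math.log(sum(math.exp(a) for a in alpha.values()))
```

The conditional probability of a labeling is then exp(score - log_partition); on tiny inputs the dynamic program can be checked directly against brute-force enumeration of all tag sequences.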
Recent years have seen increasing popularity of modeling complex phenomena in applied fields using "probabilistic graphical models", a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. In this talk, I discuss a class
of generalized mean field (GMF) inference algorithms for approximate inference
in exponential family graphical models, and a novel combination of graph
partitioning (GP) algorithms with the GMF algorithms, which potentially leads
to a generic, autonomous, and distributed algorithm for deterministic
approximate inference in arbitrary graphical models. I discuss the mathematical
underpinnings for the GMF algorithms and formal relationships between GMF and
GP. I also discuss applications of the GMF algorithms in both toy examples and
a large-scale computational biology problem in genomic sequence analysis.
(Joint work with Michael
Jordan and Stuart Russell.)
Eric Xing received his B.S. with honors in Physics and Biology from Tsinghua University, his Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and his Ph.D. in Computer Science at UC Berkeley. His early work in molecular biology focused on
the genetic mechanisms of human carcinogenesis and the mutational
spectrum of tumor suppressor genes. Then he moved into machine
learning and has worked on probabilistic graphical models, approximate
inference and pattern recognition. He is interested in studying
biological problems (in particular, systems biology, genetic inference
and evolution) using statistical learning approaches, theory and
application of graphical models, nonparametric Bayesian analysis and
semi-unsupervised learning.
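The simplest member of the mean field family that GMF generalizes is naive mean field, in which every variable gets its own independent factor. For a pairwise Ising/Boltzmann model with ±1 spins, the fixed-point updates take one line each; GMF replaces the single-variable factors with factors over clusters found by graph partitioning. A sketch under those assumptions, not the GMF algorithm itself:

```python
import math

def mean_field_ising(J, h, iters=200):
    """Naive mean-field inference for a pairwise Ising model over +/-1
    spins with couplings J[i][j] and fields h[i]: iterate the
    fixed-point equations m_i <- tanh(h_i + sum_j J[i][j] * m_j),
    where m_i approximates the marginal mean E[s_i]."""
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
    return m
```

Each update is a coordinate ascent step on the variational (mean-field) lower bound on the log-partition function; with positive couplings and positive fields the magnetizations settle near +1, as expected.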
Poker is a challenging problem for AI research: multiple agents (up to 10), a stochastic element (cards being dealt), imperfect information (you do not know the opponents' cards), user modeling (identifying player patterns), and risk management (betting decisions). For over 10 years the University of Alberta has been building poker-playing programs using four designs:
1) knowledge-based systems,
2) simulations,
3) game theory, and
4) tree searching with learning.
The prospects of a program successfully challenging the best human players in the near future are excellent. In this talk we will motivate the research, compare the four different program designs, and discuss our future directions.
Jonathan Schaeffer is a professor in Computing Science at the University of Alberta.
A fundamental issue in computational learning theory, as well as in biological and cognitive information processing, is the best possible relationship between model representation complexity and prediction accuracy. Clearly, we expect more complex models that require longer data representation to be more accurate. Can one provide a quantitative - yet general - formulation of this tradeoff? In this talk I will discuss this question from an information-theoretic perspective. I will argue that this trade-off can be traced back to the basic duality between source and channel coding, and that it is also related to the notion of "coding with side information". Moreover, using some recent results I will argue that when formulated properly (as an "Information Bottleneck") the optimal representations form a "perfectly matched source-channel" and are achievable without delays or block-coding. I will review the theoretical achievability results for such relevant data representations and discuss some of our algorithms for extracting them. I will then demonstrate the application of these ideas to the analysis of natural language corpora and speculate on the possibly-universal scaling between meaningful information and word entropy in human communication that they reveal.
http://www.cs.huji.ac.il/~tishby/
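The Information Bottleneck has a concrete self-consistent iteration: alternately update the cluster prior p(t), the cluster predictions p(y|t), and the soft assignments p(t|x) ∝ p(t)·exp(-β·KL(p(y|x) ‖ p(y|t))). The sketch below implements those standard equations in plain Python; the toy joint distribution and the small smoothing constant are my additions.

```python
import math
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ib(p_xy, n_t, beta, iters=200, seed=0):
    """Self-consistent Information Bottleneck iterations: compress X
    into n_t clusters T while preserving information about Y."""
    rng = random.Random(seed)
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_given_x = [normalize(row) for row in p_xy]
    # Random soft assignment p(t|x) to break symmetry.
    q = [normalize([rng.random() + 0.01 for _ in range(n_t)])
         for _ in range(n_x)]
    for _ in range(iters):
        p_t = [sum(p_x[i] * q[i][t] for i in range(n_x)) for t in range(n_t)]
        p_y_given_t = [normalize([sum(p_xy[i][y] * q[i][t] for i in range(n_x))
                                  + 1e-12 for y in range(n_y)])
                       for t in range(n_t)]
        q = [normalize([p_t[t] *
                        math.exp(-beta * kl(p_y_given_x[i], p_y_given_t[t]))
                        for t in range(n_t)])
             for i in range(n_x)]
    return q

# Four x-values, two y-values: x0,x1 predict y0; x2,x3 predict y1.
p_xy = [[0.225, 0.025], [0.225, 0.025], [0.025, 0.225], [0.025, 0.225]]
q = ib(p_xy, n_t=2, beta=5.0)
```

Because the update for p(t|x) depends on x only through p(y|x), values of x with identical conditionals receive identical assignments, which is the sense in which the bottleneck keeps only the information relevant to Y.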
Over the last several years, a number of authors have developed graph-theoretic or network models for large-population game theory and economics. In such models, each player or organization is represented by a vertex in a graph, and payoffs and transactions are restricted to obey the topology of the graph. This allows the detailed specification of rich structure (social, technological, organizational, political, regulatory) in strategic and economic systems.
In this talk, I will survey these models and the attendant algorithms for certain basic computations, including Nash, correlated, and Arrow-Debreu equilibria. Connections to related topics, such as Bayesian and Markov networks for probabilistic modeling and inference, will be discussed. Time permitting, I will briefly discuss some recent experimental work marrying this general line of thought with topics in social network theory.
Many web recommendation systems direct users to webpages, from a single website, that other similar users
have visited. By contrast, our WebIC web
recommendation system is designed to locate "information content (IC)
pages" --- pages the current user needs to see to complete her task ---
from essentially anywhere on the web. WebIC first extracts the "browsing properties" of
each word encountered in the user's current click-stream --- e.g., how often each word appears in the title of a page, or
in the "anchor" of a link that was followed, etc. It then uses a user- and site-independent
model, learned from a set of annotated web logs acquired in a user study, to
determine which of these words is likely to appear in an IC page. We discuss how to use these IC-words to find
IC-pages, and demonstrate empirically that this browsing-based approach works
effectively.
Joint work with
Tingshao Zhu, Gerald Häubl
and Bob Price.
After earning a PhD from Stanford, Russ Greiner worked in both academic and industrial research before settling at the University of Alberta.
For more information, see http://www.cs.ualberta.ca/~greiner/WebIC
Dr. Greiner is visiting CALD here at CMU this Spring.
In the early days of AI some researchers proposed that
intelligent problem solving could be reduced to the application of general
purpose theorem provers to an axiomatization
of commonsense knowledge. Although automated first-order theorem proving was
unwieldy, general reasoning engines for propositional logic turned out to be
surprisingly efficient for a wide variety of applications. Yet many problems of interest to AI involve
probabilities or quantification, and would seem to be beyond propositional
methods. However, recent research has shown that the basic backtrack search algorithm for satisfiability generalizes to a strikingly efficient approach for broader classes of inference. We describe our progress in building the world's fastest engine for model counting (the core #P-complete problem) by leveraging the techniques of clause learning and formula caching.
Henry Kautz is an Associate Professor in the Department of Computer Science and Engineering at the University of Washington. He joined the faculty in the summer of 2000 after a career at Bell Labs and AT&T Laboratories, where he was Head of the AI Principles Research Department. His academic degrees include an A.B. in mathematics and a Ph.D. in computer science. He is a recipient of the "Computers and Thought Award" from the International Joint Conference on Artificial Intelligence and a Fellow of the American Association for Artificial Intelligence. In 1998 he was elected to the Executive Council of AAAI, and in 2000 was Program Chair for the AAAI National Conference.
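The core model-counting recursion extends DPLL backtrack search in one small way: when every clause is satisfied, the remaining unassigned variables are free, contributing a factor of 2^k. The sketch below shows just that recursion, without the clause learning and formula caching that make the engine described in the talk fast; DIMACS-style literal encoding (positive/negative integers) is assumed.

```python
def count_models(clauses, n_vars, assignment=None):
    """DPLL-style #SAT: count satisfying assignments of a CNF formula.
    Each clause is a list of nonzero ints; -v means variable v negated."""
    if assignment is None:
        assignment = {}
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                      # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return 0                      # clause falsified: no models here
        simplified.append(rest)
    if not simplified:
        # All clauses satisfied: every unassigned variable is free.
        return 2 ** (n_vars - len(assignment))
    v = abs(simplified[0][0])             # branch on an unassigned variable
    total = 0
    for val in (True, False):
        assignment[v] = val
        total += count_models(clauses, n_vars, assignment)
        del assignment[v]
    return total
```

For instance, (x1 OR x2) over two variables has three models. Clause learning prunes the two branches with conflict-derived clauses, and formula caching reuses counts for residual formulas that recur across branches, which is where the #P-hard work actually gets saved.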
In the last 10 years, the combination of techniques from machine
learning and operations research has allowed major advances in learning
and planning for uncertain environments. Reasonably large problems
can be solved using current techniques. But what if we want to scale
up to the uncertain learning and planning problem that you face every
day? It is many orders of magnitude larger than the biggest problem we
can solve currently.
In this talk, I'll describe three early pieces of work that try to
begin to address working in truly huge environments. The first is a
method for learning probabilistic rules to describe naive physics
models of the interactions between objects. The second is an uncertain
planning algorithm that uses the rules learned by the first method to
construct contingency plans that consider enough cases to perform
robustly, but are much smaller than complete policies. The last piece
is preliminary work on combining multiple abstraction methods
dynamically, in order to allow an agent to have a working model of the
environment that changes focus depending on the current situation.
Our perceptual systems are bewildering in their complexity. Although a
great deal has been learned about their structure and organization,
insights into the underlying information processing algorithms have
proved elusive. Recent theoretical work, however, has begun to shed
light on some of the underlying computational principles and has
provided explanations for both the properties of early perceptual
representations and their relationship to the natural environment. A
motivating hypothesis of these theories is that part of what makes
biological sensory systems so robust and powerful is their ability to
learn cues and features from the statistical regularities of natural
signals. By developing probabilistic models and algorithms for
learning structure from these regularities, we will both be able to
develop more robust artificial systems, and gain insight into the types
of representations and computations used by biological systems. I will
show that a simple set of theoretical principles can be used to learn
optimal codes for both natural images and sounds, and that these
representations can explain a wide range of complex neural properties
and provide new insights into their organization. I will also show
that this framework can be extended in order to learn abstract sensory
properties and invariant features. This framework allows us to make
novel predictions about the functional roles of many aspects of
biological visual and auditory systems.
Joint work with Yan Karklin and Evan Smith.
Dr. Lewicki received his B.S. degree in mathematics and cognitive science from Carnegie Mellon University, his doctoral degree in computation and neural systems from the California Institute of Technology, and did postdoctoral studies in the Computational
Neurobiology Laboratory at the Salk Institute. His research involves
the study and development of computational and theoretical approaches
to the representation, processing, and learning of pattern structure in
natural visual and acoustic environments. He is the recipient of an
NSF CAREER award for his multidisciplinary work on the representation
of natural auditory scenes. He is currently an assistant professor in
the Computer Science Department at Carnegie Mellon University and a member of the CMU-University of Pittsburgh Center for the Neural Basis of Cognition.
Computational Induction of Explanatory Process Models
The growing amount of scientific data has led to the increased use of
computational discovery methods to understand and interpret them. However, most
work has relied on knowledge-lean techniques like clustering and classification
learning, which produce descriptive rather than explanatory
models, and it has utilized formalisms developed in AI or statistics, so that
results seldom make contact with current theories or scientific notations. In
this talk, I present an approach to computational discovery that encodes
explanatory scientific models as sets of quantitative processes, simulates
these models' behavior over time, incorporates background knowledge to
constrain model construction, and induces these models from time-series data in a robust manner. I illustrate this framework
on data and models from Earth science and microbiology, two domains in which explanatory process accounts occur frequently. In
closing, I describe our progress toward an interactive software
environment for the construction, evaluation, and revision of such explanatory
scientific models.
This talk describes joint work with Kevin Arrigo, Nima Asgharbeygi, and others.
http://cll.stanford.edu/~langley/