Neural Information Processing Systems

1999 Tutorial Information

Tutorial Program: November 29, 1999

Click on the session to see the abstracts, biographical sketches of the session leaders and links to more detailed information (given below).

Session I: 09:30--11:30

"Data Mining and Scalability to Large Databases", Usama Fayyad and Heikki Mannila, Microsoft Research
"Probabilistic Models for Unsupervised Learning", Zoubin Ghahramani and Sam Roweis, University College London

Session II: 13:00--15:00

"Probabilistic Language Models: 100 Years and Counting", John Lafferty, Carnegie Mellon University
"Neural Computation: from Software Simulation to Hardware Emulation", Rodney Douglas, UNIZ/ETH Zurich

Session III: 15:30--17:30

"WARNING: Data-Snooping May Be Dangerous To Your Wealth!", Andrew Lo, MIT
"On-line Learning Algorithms: Concepts and Relative Loss Bounds", Manfred Warmuth, University of California Santa Cruz

Session I: 09:30--11:30

Usama Fayyad and Heikki Mannila
- Abstract: Data mining techniques have their origin in statistical pattern recognition. However, related research in statistics, learning, and other fields has not addressed the issue of scaling to large data sets, or other database systems issues. Also, most transactional type data, which are typically very sparse, have received little attention. We cover recent progress in scalable techniques for classification, clustering, data reduction, and modeling sequential data. We also cover methods for finding local patterns or structures (both static and temporal) from large very high-dimensional data sets. We conclude with research challenges and basic open problems that need to be addressed to enable data mining over large databases and facilitate its use by non-specialists.
- Usama Fayyad is a Senior Researcher at Microsoft Research. His research interests include scaling data mining algorithms to large databases, learning algorithms, and statistical pattern recognition, especially classification and clustering. He received his PhD from the University of Michigan, Ann Arbor in 1991, and joined the Jet Propulsion Laboratory (JPL), California Institute of Technology, where he developed data mining systems for automated science data analysis and headed the Machine Learning Systems Group until 1996. He remains affiliated with JPL as a Distinguished Visting Scientist. He received the 1994 NASA Exceptional Achievement Medal and the JPL 1993 Lew Allen Award for Excellence in Research for his work on developing data mining systems to solve challenging science analysis problems in astronomy and remote sensing. He is a co-editor of "Advances in Knowledge Discovery and Data Mining" (AAAI/MIT Press, 1996) and is an Editor-in-Chief of "Data Mining and Knowledge Discovery". He is a director of ACM SIGKDD and Editor-in-chief of its newsletter, "SIGKDD Explorations".
- Heikki Mannila obtained his PhD in 1985, and has been a professor of computer science at the University of Helsinki since 1989. He has also been a professor at the University of Tampere, as well as a visiting professor at the Technical University of Vienna, and a visiting researcher at the Max Planck Institute for Computer Science in Saarbruecken, Germany. He has been a Senior Researcher at Microsoft Research since September 1998. His research interests are in algorithms, databases, and especially data mining. He has worked extensively on finding frequent local patterns from transactional and sequential data, on discovering integrity constraints in databases, on MCMC methods for fitting intensity models, and on using data mining methods for applications in medical genetics.
Zoubin Ghahramani and Sam Roweis
- Abstract: Many of the methods used for clustering, dimensionality reduction, source separation, time series modeling, and other classical problems in unsupervised data modeling are closely related to each other. The focus of this tutorial is to present a consistent unified picture of how these methods, which have been developed and rediscovered in several different fields, are variants of each other, and how a single framework can be used to develop learning algorithms for all of them. We will start from a humble Gaussian model, to describe how continuous state models such as factor analysis, principal components analysis (PCA) and independent components analysis (ICA) are related to each other. We will then motivate discrete state mixture models and vector quantization. Mixture models and factor analysis are then extended to model time series data, and result in hidden Markov models (HMMs) and linear-Gaussian dynamical systems (a.k.a. state-space models), respectively.
  All of these models can be described within the framework of probabilistic graphical models, which we will briefly introduce. In this framework it becomes easy to explore variants and hybrids (such as mixtures of factor analyzers and switching state-space models) which are potentially powerful tools. This framework also makes it clear that the same general probability propagation algorithm can be used to infer the hidden (i.e. latent) variables in all these models, and that the EM algorithm can be used to learn the maximum likelihood (ML) parameters. In the latter part of the tutorial we will focus on approximate inference techniques for models in which probability propagation is intractable, and on variational methods for Bayesian model averaging which can overcome the overfitting and model selection problems in ML learning. Matlab demos will be used to demonstrate some of the models and algorithms.
- Zoubin Ghahramani studied Computer Science and Cognitive Science at the University of Pennsylvania and received his PhD in Cognitive Neuroscience from the Massachusetts Institute of Technology. After leaving MIT he was a postdoctoral fellow in Geoffrey Hinton's group in the Department of Computer Science at the University of Toronto. Since 1998 he has been a Lecturer in the Gatsby Computational Neuroscience Unit at University College London, with joint appointments in the Departments of Computer Science and Psychology. His current research interests include studying how the brain controls movement and integrates information from different senses, developing computational theories of learning in biological and artificial systems, and applying probabilistic approaches to computer intelligence.
- Sam Roweis is currently a postdoctoral research fellow in the Gatsby Computational Neuroscience Unit at University College London. He obtained his undergraduate degree at the University of Toronto, where his interest in neural networks and pattern recognition problems developed under the guidance of Geoffrey Hinton. His graduate studies at Caltech and Princeton University were supervised by John Hopfield. He has also worked at Bell Labs, Microsoft and Northern Telecom. His research focusses on time-series problems, such as speech processing, source separation, motion video analysis, and motor control learning.

Session II: 13:00--15:00

John Lafferty
- Abstract: At the beginning of this century, Andrei Andreyevich Markov pioneered the theory of stochastic processes, motivated in part by probabilistic models of language. Nearly a half century later, Claude Shannon introduced information theory, using statistical language models to illustrate many of its features. Today, at the turn of the century, language modeling methods are proving to be of fundamental importance to a wide range of information processing technologies. This tutorial will survey the art and science of probabilistic language modeling, from n-grams to recent advances in unsupervised clustering, hidden Markov models, probabilistic grammars, maximum entropy, and Bayesian methods. Case studies from information retrieval, document categorization and other information processing technologies will be used to demonstrate the unreasonable effectiveness of simple, generative probabilistic models of natural language, arguably the most interesting and complex of all stochastic processes.
- John Lafferty is an Associate Professor in the Computer Science Department and Language Technologies Institute at Carnegie Mellon University, where he has been a member of the faculty since 1994. Prior to joining CMU, Dr. Lafferty was a Research Staff Member at the IBM Thomas J. Watson Research Center in Yorktown Heights, working on statistical methods for language processing. Dr. Lafferty received his PhD in Mathematics in 1986 from Princeton University, where he was a member of the Program in Applied and Computational Mathematics. His research interests include statistical learning algorithms, natural language processing, information retrieval, and coding and information theory.
Rodney Douglas
- Abstract: This tutorial deals with issues of modeling the computational aspects of neurons and neuronal networks. We will critically discuss four classes of models in terms of their level of abstraction, methods of implementation, and utility. The first class is that of detailed simulations. Examples of simulations of realistic compartmental models of neurons will be presented, and their application to understanding the operation of neocortical pyramidal cells will be discussed. The second class is that of net works of simplified neurons, whose architecture approximates that of the neocortex. These more abstract models are used to illustrate general operating principles of cortical circuits. The third class refers to the implementation of neuromorphic processing in analog and hybrid VLSI chips. Neuromorphic engineering is concerned with the design and fabrication of artificial neural systems, such as neuronal networks, vision chips, and head-eye systems. We will use the silicon retina and silicon synapses as examples. The final class concerns the implementation of complete behaving systems on roving robots whose architecture and design principles are based on those of biological nervous systems.
- Rodney Douglas is Co-Director and Professor of the Institute of Neuroinformatics, University of Zurich. He graduated in Science and Medicine at the University of Cape Town. After obtaining a Doctorate in Neuroscience, he moved to the Anatomical Neuropharmacology Unit in Oxford, where he continued research on the anatomy and biophysics of the microcircuitry of cerebral cortex. In 1989 he moved back to UCT as Associate Professor in Physiology; there he developed digital and analog electronic models of neuronal circuitry. In 1991 he moved back to the ANU in Oxford, where he extended his research into analog VLSI neuromorphic engineering in collaboration with Kevan Martin and Misha Mahowald. A few years later, the three of them moved to Zurich to establish the new Institute of Neuroinformatics. Prof. Douglas' research goal is to define principles of neuronal computation, and to implement these computations in aVLSI neuromorphic systems. His research methods include experimental neurophysiology `in vivo' and `in vitro', theoretical analysis and modeling, biophysics, applied computing, and the use of analog Very Large Scale Integration technology for the development of neuromorphic systems. Professor Douglas is a co-founder and organizer of the annual NSF Neuromorphic Engineering Workshop at Telluride.

Session III: 15:30--17:30

Andrew Lo
- Abstract: While data-mining has become a serious endeavor among computer scientists, statisticians, and financial engineers, there is a dark side---data-snooping---that must be acknowledged if it is to be controlled and managed. This tutorial will provide a brief historical review of data-snooping, describe its psychological underpinnings, and propose several methods for grappling with its most insidious effects on financial decision making and statistical inference. While data-snooping biases can never be completely eliminated in the non-experimental sciences, the magnitude of their effects can sometimes be bounded. No prior statistical background will be assumed; however, participants should be capable of proving the first fundamental theorem of data-snooping: there is no such thing as an uninteresting number!
- Andrew W. Lo is the Harris & Harris Group Professor at MIT's Sloan School of Management and director of MIT's Laboratory for Financial Engineering. He received his PhD in Economics from Harvard University in 1984, and taught at the University of Pennsylvania's Wharton School from 1984 to 1988 before joining the MIT faculty. His research interests include the empirical validation and implementation of financial asset pricing models; the pricing of options and other derivative securities; financial engineering and risk management; trading technology and market microstructure; statistics, econometrics, and stochastic processes; computer algorithms and numerical methods; financial visualization; nonlinear models of stock and bond returns; and, most recently, evolutionary and neurobiological models of individual risk preferences. He has published numerous articles in finance and economics journals, and is a co-author of "The Econometrics of Financial Markets" and "A Non-Random Walk Down Wall Street". He is currently an associate editor of the "Financial Analysts Journal", the "Journal of Portfolio Management", the "Journal of Computational Finance", and the "Review of Economics and Statistics".
Manfred Warmuth
- Abstract: An on-line algorithm sees one example at a time and incurs a loss on each example based on its current model. The hypothesis is updated after each example. The best fixed model is chosen off-line, based on all examples.
  The loss of the on-line algorithm on a sequence of examples is typically larger than the loss of the best off-line model. However, the goal of the on-line learner is to minimize the additional loss of the on-line algorithm over the loss of the best off-line model. Bounds relating the on-line loss to the best off-line loss are called `relative loss bounds'. Such bounds hold for arbitrary sequences of examples. They quantify the price of hiding the future examples from the learner.
  We will review methods for proving relative loss bounds and give an overview of applications. We will emphasize a method that starts with a Bregman divergence which measures the `distance' between parameterized models. This divergence function is used to derive a parameter update rule for the on-line learner, and it becomes the potential function in the proof of the relative loss bound for this update rule. We will discuss related methods for proving bounds on the generalization error. We will then introduce families of update algorithms that are characterized by different divergence functions. The two main families are gradient descent and exponentiated gradient. The former family includes all the kernel based algorithms and the latter family is motivated by the minimum relative entropy principle. We will discuss the merits of the two families. No background is required.
- Manfred K. Warmuth is a professor in the Computer Science department at the University of California at Santa Cruz. He received his PhD in Computer Science in 1981 from the University of Colorado at Boulder. He is an editor of the journal of "Machine Learning" and is involved with the organization of the annual ACM Conference on Computational Learning Theory. His current research interests include machine learning, on-line learning, game theory, neural networks, and statistics.

We have attempted to ensure that all information is correct, but we cannot guarantee it.

Please send comments and corrections to:

L. Douglas Baker

Carnegie Mellon University

ldbapp+nips@cs.cmu.edu