Algorithms for Indexing and Searching


ALGORITHMS FOR INDEXING AND SEARCHING


INSTRUCTORS: Guy Blelloch, John Lafferty, Gary Miller
TIME: Wednesday 3:00-5:30
ROOM: Wean 4601

Description:

With the growth of the Web and other online resources in recent years, the problems associated with managing massive amounts of data have become increasingly interesting and important. A variety of algorithms and techniques have emerged for indexing, filtering, searching, and transmitting these online resources. This course will present a selection of these techniques, with an emphasis on the underlying algorithms and the need to scale up to handle very large data collections. A particular focus of the seminar will be on spectral methods (eigenvalues, singular value decompositions, and graph partitioning) and randomized algorithms (sampling and dimension reduction) for clustering.

Evaluation:

Grading will be based on participation, including giving presentations, and a class project.

Topics:

  • Indexing schemes for massive amounts of data
  • Spectral methods for graph analysis
  • Clustering and nearest neighbor search in high-dimensional spaces
  • Approximation algorithms for singular value decomposition
  • Statistical methods for clustering documents
  • Efficient near-equality testing using randomized algorithms
  • Cache replacement strategies
  • Erasure codes for transmitting information

Schedule:

  Week   Topic
  January 13   Introduction to IR, classical indexing algorithms
  January 20   Overview of algorithmic issues
  January 27   Latent semantic indexing
  February 3   Link-based ranking algorithms
  February 10   Efficient SVD
  February 17   High-dimensional nearest neighbors
  February 24   Cluster analysis
  March 3   Open
  March 10   Properties of the Web
  March 17   Removing duplicate documents
  March 24   Mid-semester break
  March 31   Web caching
  April 7   Erasure codes
  April 14   Open

Handouts



lafferty@cs.cmu.edu