15-853: Algorithms in the Real World (Guy Blelloch)
Topic 9: Indexing and Searching
[ Topics |
Scribe Notes |
Readings |
Text Books |
Links ]
Indexing 1 (draft) (Helen Wang)
Indexing 2 (draft) (Ben Horowitz)
Evolutionary Trees and Indexing 3 (draft) (Amar Chaudhary)
Ian H. Witten, Alistair Moffat, and Timothy C. Bell.
Managing Gigabytes: Compressing and Indexing Documents and
Images.
Van Nostrand Reinhold, 1994.
Chapter 3: Indexing
Christos Faloutsos.
Searching Multimedia Data Bases by Content.
Kluwer Academic, 1996.
Frakes and Baeza-Yates (ed.).
Information Retrieval: Data Structures and Algorithms.
Prentice Hall, 1992
Frants V. J., Shapiro J., Voiskunskii V. G.
Automated
Information Retrieval Theory and Methods
Academic Press, Aug 1997.
Robert R. Korfhage
Information Storage and Retrieval.
John Wiley & Sons, 1997.
Lesk M.
Practical Digital Libraries Books, Bytes & Bucks
Morgan Kaufman Publishers, 1997.
Salton.
Automatic Text Processing: The Transformation, Analysis and Retrieval
of Information by Computer.
Addison-Wesley, 1989.
Sparck Jones K. and Willett P. (editors).
Readings in Information Retrieval.
Morgan Kaufman Publishers, 1997.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell.
Managing Gigabytes: Compressing and Indexing Documents and
Images.
Academic Press/Morgan Kaufmann, 1999.
Summaries
A nice tutorial by Andrei Broder and Monika Henzinger on Information
Retrieval on the Web.
Information on Web search engines
Lycos: Design choices in an Internet search service
Search Engine Watch.
Information on the various search engines, including size and features and
some information on how they work.
Latent Semantic Indexing
Latent Semantic Indexing (LSI)
Latent semantic indexing: A probabilistic analysis
by Papadimitriou et. al.
Analyzes an information retrieval technique related to principle components analysis.
In PODS 98.
Link Based Analysis for Indexing
Authoritative sources in a hyperlinked environment by J. Kleinberg.
Overviews the ideas behind the HITS method for finding Hubs and Authorities.
Automatic resource compilation by analyzing hyperlink
structure and associated text by Chakrabarti et. al.
Describes how HITS can be used to find pages within given clasifications
to fill the topics pages of a Yahoo like system automatically.
The Clever project at IBM Almaden.
The Anatomy of a Large-Scale Hypertextual Web Search Engine"
by Sergey Brin and Lawrence Page. This describes the Google search engine,
at least the original version from Stanford. It briefly describes the
page ranking scheme.
The Connectivity Server: fast access to linkage information on the Web
by Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumara, and Suresh Venkatasubramanian.
Digital Libraries
The digital libraries initiative. Includes projects
at Carnegie Mellon U., Stanford U., UC Berkeley, UC Santa Barbara,
U. Illinois, U. Michigan.
Digital Library Related Information and Resources
Systems for Indexing/Searching your own site
A long list of indexing/search engines that can be used to search
your own site.
Web Site search engines. A nice overview of the various
search engines available for indexing your own site
Glimpse
Other relevant info
The Text Retrieval Conference (TREC).
"The goal of the conference series is to encourage research in information
retrieval from large text applications by providing a large test collection, uniform scoring procedures, and a
forum for organizations interested in comparing their results." Sponsored by NIST and DARPA.
Back to the Algorithms in the Real World home page.
Guy Blelloch,
guyb@cs.cmu.edu.