Reading List
NOTICE:
Several of the links are internal
to CMU.
Required texts
Recommended texts
- [HKP] Jiawei Han,
Micheline Kamber and Jian Pei, Data Mining:
Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011 (amazon
link).
- [PTVF] William H. Press Saul A.
Teukolsky William T. Vetterling Brian P. Flannery Numerical
Recipes in C Cambridge University Press, 1992, 2nd Edition,
free on-line and also here. Latest
is 3rd edition, 2007.
- Undergraduate DB textbook,
for those who took a db class too long ago:
Foils:
In pdf, from the course schedule
page.
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- Jon Louis Bentley, Multidimensional
binary search trees used for associative searching,
Comm. of the ACM (CACM), Volume 18 , Issue 9,
pp. 509-517, (September 1975)
- A. Guttman R-Trees:
a Dynamic Index Structure for Spatial Searching,
Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston, Mass.
- J. Orenstein, Spatial
Query Processing in an Object-Oriented Database System,
Proc. ACM SIGMOD, May, 1986, pp. 326-336, Washington D.C.
- MM-Textbook, chapters 4 and 5.
- Fractals
- Christos Faloutsos and Ibrahim Kamel, Beyond
Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN.
-
Alberto Belussi and Christos Faloutsos, Estimating
the
Selectivity of Spatial Queries Using the `Correlation'
Fractal Dimension Proc. of VLDB, p.
299-310, 1995 (deleted- replaced by the one below)
- Bernd-Uwe Pagel, Flip Korn and Christos Faloutsos, Deflating
the
Dimensionality Curse using Multiple Fractal Dimensions,
ICDE 2000, San Diego, CA, Feb. 2000. (NEW:
9/30 - replaces the one above it)
- Power laws, lognormals etc: M. E. J. Newman, Power
laws, Pareto distributions and Zipf's law Contemporary
Physics 46, 323-351 (2005) (local pdf
copy)
- Text and LSI
- MM-Textbook, chapter 6
- Peter W. Foltz and Susan T. Dumais, Personalized
Information Delivery: an Analysis of Information
Filtering Methods, Comm. of ACM (CACM), 35, 12,
Dec. 1992, pp. 51-60.
- SVD: In PTVF ch. 2.6; MM-Textbook
Appendix D
- PageRank: Sergey Brin, Lawrence Page The Anatomy of a
Large-Scale Hypertextual Web Search Engine (1998)
(local pdf)
- HITS: Jon M. Kleinberg Authoritative Sources in a
Hyperlinked Environment JACM, 46,5 (1999) (local pdf)
- Tensors survey: Papalexakis, Faloutsos, Sidiropoulos Tensor
for Data Mining and Data Fusion: Models, Applications, and
Scalable Algorithms ACM Trans. on Intelligent Systems
and Technology, 8,2, Oct. 2016. (local
copy)
- Tensors: [Graph-Textbook] Ch.16.
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon
Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner,
Denis Lee, Dragutin Petkovic, David Steele and Peter Yanker
Query
by Image and Video Content: the QBIC System IEEE
Computer 28, 9, Sep. 1995, pp. 23-32.
- Journal
of
Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994 An
earlier, more technical version of the IEEE Computer '95
paper.
- FastMap: MM-Textbook chapter 11; Also
in: C. Faloutsos and K.I. Lin FastMap:
A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets
ACM SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4;
in MM-Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in MM-Textbook Appendix C
- Karhunen-Loeve: in MM-Textbook Appendix
D.
- JPEG: Gregory K. Wallace, The
JPEG Still Picture Compression Standard, CACM,
34, 4, April 1991, pp. 31-44
- MPEG: D. Le Gall, MPEG:
a Video Compression Standard for Multimedia Applications
CACM, 34, 4, April 1991, pp. 46-58
- Fractal compression: M.F. Barnsley and A.D. Sloan, A
Better Way to Compress Images, BYTE, Jan. 1988,
pp. 215-223.
- MM-Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos
Faloutsos,
On Power-Law Relationships of the Internet Topology,
SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter
of the World Wide Web Nature, 401,
130-131 (1999).
- Réka Albert and Albert-László Barabási Statistical
mechanics of complex networks, Reviews of
Modern Physics, 74, 47 (2002).
- Haveliwala,
Taher H. (2003) Topic-Sensitive PageRank: A
Context-Sensitive Ranking Algorithm for Web Search.
Technical Report. Stanford InfoLab. (Extended version of the
WWW2002 paper on Topic-Sensitive PageRank.)
- D.
Chakrabarti,
S. Papadimitriou, D. Modha and C. Faloutsos, Fully Automatic
Cross-Associations, in KDD 2004 (pages
79-88), Washington, USA
- Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs:
Problem Definition and Fast Solutions, KDD
2006, Philadelphia, PA
- Jure Leskovec, Jon Kleinberg, Christos Faloutsos Graphs
over
Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL, USA, 2005.
- D. Chakrabarti and C. Faloutsos, Graph Mining: Laws, Generators and Algorithms, in
ACM Computing Surveys, 38(1), 2006 (pdf
draft, internal to CMU)
- J. Leskovec, D. Chakrabarti, J. Kleinberg, and C.
Faloutsos, Realistic, Mathematically
Tractable Graph Generation and Evolution, Using
Kronecker Multiplication, in PKDD 2005,
Porto, Portugal
- Danai Koutra, Tai-You Ke, U. Kang, Duen Horng Chau,
Hsing-Kuo Kenneth Pao, and Christos Faloutsos.
Unifying guilt-by-association approaches: theorems and
fast algorithms. ECML/PKDD'11, Athens, Greece. (NEW, 11/25)
- [Graph-Textbook], Ch.18: virus
propagation
- [Graph-Textbook],
Ch.19: Case studies: Random walk with restarts, image
captioning.
- Time series forecasting
- Statistics background: In PTVF pp. 620-621
and ch. 14.4-14.5;
- AI background / Classification
- [HKP] chapter 8.1-2.
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer
and Arun Swami An
Interval Classifier for Database Mining Applications
VLDB Conf. Proc. Vancouver, BC, Canada, Aug. 1992, pp.
560-573.
- M. Mehta, R. Agrawal and J. Rissanen, `SLIQ:
A
Fast Scalable Classifier for Data Mining', Proc.
of the Fifth Int'l Conference on Extending Database
Technology, Avignon, France, March 1996.
- Data Mining in Databases:
- Data warehouses, OLAP and DataCubes: [HKP],
ch. 4.1-3.
- Data reduction: [HKP] chapter 3.3.
- Association Rules:
- Cluster analysis: [HKP] chapter 10.
- Miscellaneous (ICA, approximate counting)
- Jia-Yu Pan, Christos Faloutsos, Masafumi Hamamoto and
Hiroyuki Kitagawa: AutoSplit:
Fast and Scalable Discovery of Hidden Variables in
Stream and Multimedia Databases, PAKDD, Sydney,
Australia, May 2004.
- Christopher Palmer, Phillip Gibbons and Christos
Faloutsos,
ANF: A Fast and Scalable Tool for Data Mining in Massive
Graphs, KDD 2002, Edmonton, Alberta, Canada,
July 2002
- Efficient
and Tunable Similar Set Retrieval, by
Aristides Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM
SIGMOD, Santa Barbara, California, May 21-24, 2001.
- New
sampling-based summary statistics for improving
approximate query answers, by Phillip B. Gibbons
and Yossi Matias, ACM SIGMOD, pp 331 - 342, Seattle,
Washington, 1998.
RECOMMENDED OPTIONAL READING
Additional, optional citations, that may be useful for
your project:
Multimedia indexing
- Spatial access methods:
- N. Beckmann, H.-P. Kriegel, R. Schneider B. Seeger The
R*-Tree: an Efficient and Robust Access Method for Points
and Rectangles ACM SIGMOD, May 1990, pp. 322-331
Atlantic City, NJ. (Deferred splitting in R-trees)
- Fractals
Data mining
Last modified: Nov. 25, 2019, by Christos Faloutsos