Carnegie Mellon University
15-826: Multimedia and Data Mining
Fall 2024 - Christos Faloutsos
Reading list
Several of the links are internal
to CMU.
Required texts
Recommended texts
- [HKP] Jiawei Han,
Micheline Kamber and Jian Pei, Data Mining:
Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.
- [PTVF] William H. Press, Saul A.
Teukolsky, William T. Vetterling and Brian P. Flannery, Numerical
Recipes in C, Cambridge University Press,
1992, 2nd edition, free on-line.
The latest is the 3rd edition, 2007.
- Undergraduate DB textbook,
for those who took a db class too long ago:
In pdf, from the course schedule
A. Multimedia Indexing
- Secondary key and spatial access methods
- Jon Louis Bentley, Multidimensional
binary search trees used for associative searching,
Comm. of the ACM (CACM), Volume 18, Issue 9,
pp. 509-517 (September 1975)
- A. Guttman, R-Trees:
a Dynamic Index Structure for Spatial Searching,
Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston, Mass.
- J. Orenstein, Spatial
Query Processing in an Object-Oriented Database System,
Proc. ACM SIGMOD, May, 1986, pp. 326-336, Washington D.C.
- MM-Textbook, chapters 4 and 5.
- Fractals
- Christos Faloutsos and Ibrahim Kamel, Beyond
Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis.
- Bernd-Uwe Pagel, Flip Korn and Christos Faloutsos, Deflating
the Dimensionality Curse using Multiple Fractal Dimensions,
ICDE 2000, San Diego, CA, Feb. 2000.
- Power laws, lognormals etc: M. E. J. Newman, Power
laws, Pareto distributions and Zipf's law, Contemporary
Physics 46, 323-351 (2005) (local pdf)
- Text and LSI
- MM-Textbook, chapter 6
- Peter W. Foltz and Susan T. Dumais, Personalized
Information Delivery: an Analysis of Information
Filtering Methods, Comm. of ACM (CACM), 35, 12,
Dec. 1992, pp. 51-60.
- SVD: In PTVF ch. 2.6; MM-Textbook
Appendix D
- PageRank: Sergey Brin, Lawrence Page, The Anatomy of a
Large-Scale Hypertextual Web Search Engine (1998)
(local pdf)
- HITS: Jon M. Kleinberg, Authoritative Sources in a
Hyperlinked Environment, JACM, 46, 5 (1999) (local pdf)
- Tensors survey: Papalexakis, Faloutsos, Sidiropoulos, Tensors
for Data Mining and Data Fusion: Models, Applications, and
Scalable Algorithms, ACM Trans. on Intelligent Systems
and Technology, 8, 2, Oct. 2016 (local pdf).
- Tensors: [Graph-Textbook] Ch.16.
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon
Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner,
Denis Lee, Dragutin Petkovic, David Steele and Peter Yanker,
Query by Image and Video Content: the QBIC System, IEEE
Computer 28, 9, Sep. 1995, pp. 23-32.
- FastMap: MM-Textbook chapter 11; also
in: C. Faloutsos and K.I. Lin, FastMap:
A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets,
ACM SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4;
in MM-Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in MM-Textbook Appendix C
- Karhunen-Loeve: in MM-Textbook Appendix
- JPEG: Gregory K. Wallace, The
JPEG Still Picture Compression Standard, CACM,
34, 4, April 1991, pp. 31-44
- MPEG: D. Le Gall, MPEG:
a Video Compression Standard for Multimedia Applications,
CACM, 34, 4, April 1991, pp. 46-58
- Laurens van der Maaten, Geoffrey Hinton, Visualizing
Data using t-SNE, JMLR 9(86):2579-2605, 2008
(local copy).
- Leland McInnes, John Healy, James Melville, UMAP: Uniform
Manifold Approximation and Projection for Dimension
Reduction, arXiv, 2018
(local copy).
- Fractal compression: M.F. Barnsley and A.D. Sloan, A
Better Way to Compress Images, BYTE, Jan. 1988,
pp. 215-223.
- MM-Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos,
On Power-Law Relationships of the Internet Topology,
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter
of the World Wide Web, Nature, 401,
130-131 (1999).
- Réka Albert and Albert-László Barabási, Statistical
mechanics of complex networks, Reviews of
Modern Physics, 74, 47 (2002).
- Haveliwala,
Taher H. (2003) Topic-Sensitive PageRank: A
Context-Sensitive Ranking Algorithm for Web Search.
Technical Report. Stanford InfoLab. (Extended version of the
WWW2002 paper on Topic-Sensitive PageRank.)
- D. Chakrabarti,
S. Papadimitriou, D. Modha and C. Faloutsos, Fully Automatic
Cross-Associations, in KDD 2004 (pages
79-88), Washington, USA
- Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs:
Problem Definition and Fast Solutions, KDD
2006, Philadelphia, PA
- Jure Leskovec, Jon Kleinberg, Christos Faloutsos, Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL, USA, 2005.
- J. Leskovec, D. Chakrabarti, J. Kleinberg, and C.
Faloutsos, Realistic, Mathematically
Tractable Graph Generation and Evolution, Using
Kronecker Multiplication, in PKDD 2005,
Porto, Portugal
- Danai Koutra, Tai-You Ke, U. Kang, Duen Horng Chau,
Hsing-Kuo Kenneth Pao, and Christos Faloutsos.
Unifying guilt-by-association approaches: theorems and
fast algorithms. ECML/PKDD'11, Athens, Greece.
- [Graph-Textbook], Ch.18: virus
- [Graph-Textbook],
Ch.19: Case studies: Random walk with restarts, image
- Time series forecasting
- Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity
Estimation Using Query Feedbacks, SIGMOD
- Byoung-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson,
H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online
Data Mining for Co-Evolving Time Sequences, ICDE
2000, Feb. 2000.
- Steven L. Brunton, Joshua L. Proctor, and J. Nathan
Kutz, Discovering governing equations from data by
sparse identification of nonlinear dynamical systems,
PNAS 113 (15) 3932-3937, March 28, 2016
(local copy).
- Statistics background: In PTVF pp. 620-621
and ch. 14.4-14.5;
- AI background / Classification
- [HKP] chapter 8.1-2.
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer
and Arun Swami, An
Interval Classifier for Database Mining Applications,
VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992, pp.
- M. Mehta, R. Agrawal and J. Rissanen, SLIQ:
A Fast Scalable Classifier for Data Mining, Proc.
of the Fifth Int'l Conference on Extending Database
Technology, Avignon, France, March 1996.
- Data Mining in Databases:
- Association Rules:
- Cluster analysis: [HKP] chapter 10.
- Miscellaneous (ICA, approximate counting)
- Jia-Yu Pan, Christos Faloutsos, Masafumi Hamamoto and
Hiroyuki Kitagawa: AutoSplit:
Fast and Scalable Discovery of Hidden Variables in
Stream and Multimedia Databases, PAKDD, Sydney,
Australia, May 2004.
- Christopher Palmer, Phillip Gibbons and Christos Faloutsos,
ANF: A Fast and Scalable Tool for Data Mining in Massive
Graphs, KDD 2002, Edmonton, Alberta, Canada,
July 2002
- Efficient
and Tunable Similar Set Retrieval, by
Aristides Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM
SIGMOD, Santa Barbara, California, May 21-24, 2001.
- New
sampling-based summary statistics for improving
approximate query answers, by Phillip B. Gibbons
and Yossi Matias, ACM SIGMOD, pp 331 - 342, Seattle,
Washington, 1998.
- Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi: Efficient
Computation of Frequent and Top-k Elements in Data
Streams. ICDT 2005: 398-412
Additional, optional citations that may be useful for
your project:
Multimedia indexing
Data mining
- Time sequences
- Graph mining:
- Tom Mitchell,
Machine Learning, McGraw Hill, 1997.
Last modified: Aug. 27, 2024, by Christos Faloutsos