Carnegie Mellon University
15-826: Multimedia and Data Mining
Fall 2024 - Christos Faloutsos
Reading list
NOTICE:
Several of the links are internal
to CMU.
Required texts
Recommended texts
- [HKP] Jiawei Han,
Micheline Kamber and Jian Pei, Data Mining:
Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011 (amazon
link).
- [PTVF] William H. Press, Saul A.
Teukolsky, William T. Vetterling, Brian P. Flannery, Numerical
Recipes in C, Cambridge University Press,
1992, 2nd Edition, free on-line.
Latest
is 3rd edition, 2007.
- Undergraduate DB textbook,
for those who took a DB class too long ago:
Foils:
In pdf, from the course schedule
page.
A. Multimedia Indexing
- Secondary key and spatial access methods
- Jon Louis Bentley, Multidimensional
binary search trees used for associative searching,
Comm. of the ACM (CACM), Volume 18, Issue 9,
pp. 509-517, (September 1975)
- A. Guttman, R-Trees:
a Dynamic Index Structure for Spatial Searching,
Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston, Mass.
- J. Orenstein, Spatial
Query Processing in an Object-Oriented Database System,
Proc. ACM SIGMOD, May, 1986, pp. 326-336, Washington D.C.
- MM-Textbook, chapters 4 and 5.
- Fractals
- Christos Faloutsos and Ibrahim Kamel, Beyond
Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN.
- Bernd-Uwe Pagel, Flip Korn and Christos Faloutsos, Deflating the
Dimensionality Curse using Multiple Fractal Dimensions,
ICDE 2000, San Diego, CA, Feb. 2000.
- Power laws, lognormals etc: M. E. J. Newman, Power
laws, Pareto distributions and Zipf's law, Contemporary
Physics 46, 323-351 (2005) (local pdf copy)
- Text and LSI
- MM-Textbook, chapter 6
- Peter W. Foltz and Susan T. Dumais, Personalized
Information Delivery: an Analysis of Information
Filtering Methods, Comm. of ACM (CACM), 35, 12,
Dec. 1992, pp. 51-60.
- SVD: In PTVF ch. 2.6; MM-Textbook
Appendix D
- PageRank: Sergey Brin, Lawrence Page, The Anatomy of a
Large-Scale Hypertextual Web Search Engine (1998)
(local pdf)
- HITS: Jon M. Kleinberg, Authoritative Sources in a
Hyperlinked Environment, JACM, 46,5 (1999) (local pdf)
- Tensors survey: Papalexakis, Faloutsos, Sidiropoulos, Tensors
for Data Mining and Data Fusion: Models, Applications, and
Scalable Algorithms, ACM Trans. on Intelligent Systems
and Technology, 8,2, Oct. 2016. (local copy)
- Tensors: [Graph-Textbook] Ch.16.
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon
Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner,
Denis Lee, Dragutin Petkovic, David Steele and Peter Yanker,
Query by Image and Video Content: the QBIC System, IEEE
Computer 28, 9, Sep. 1995, pp. 23-32.
- FastMap: MM-Textbook chapter 11; also
in: C. Faloutsos and K.I. Lin, FastMap:
A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets,
ACM SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4;
in MM-Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in MM-Textbook Appendix C
- Karhunen-Loeve: in MM-Textbook Appendix
D.
- JPEG: Gregory K. Wallace, The
JPEG Still Picture Compression Standard, CACM,
34, 4, April 1991, pp. 31-44
- MPEG: D. Le Gall, MPEG:
a Video Compression Standard for Multimedia Applications,
CACM, 34, 4, April 1991, pp. 46-58
- Laurens van der Maaten, Geoffrey Hinton, Visualizing
Data using t-SNE, JMLR 9(86):2579−2605, 2008.
local copy.
- Leland McInnes, John Healy, James Melville, UMAP: Uniform
Manifold Approximation and Projection for Dimension
Reduction, arxiv, 2018
local copy.
- Fractal compression: M.F. Barnsley and A.D. Sloan, A
Better Way to Compress Images, BYTE, Jan. 1988,
pp. 215-223.
- MM-Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos
Faloutsos,
On Power-Law Relationships of the Internet Topology,
SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter
of the World Wide Web, Nature, 401,
130-131 (1999).
- Réka Albert and Albert-László Barabási, Statistical
mechanics of complex networks, Reviews of
Modern Physics, 74, 47 (2002).
- Haveliwala, Taher H. (2003) Topic-Sensitive PageRank: A
Context-Sensitive Ranking Algorithm for Web Search.
Technical Report. Stanford InfoLab. (Extended version of the
WWW2002 paper on Topic-Sensitive PageRank.)
- D. Chakrabarti, S. Papadimitriou, D. Modha and C. Faloutsos,
Fully Automatic Cross-Associations, in KDD 2004 (pages
79-88), Washington, USA
- Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs:
Problem Definition and Fast Solutions, KDD
2006, Philadelphia, PA
- Jure Leskovec, Jon Kleinberg, Christos Faloutsos, Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL, USA, 2005.
- J. Leskovec, D. Chakrabarti, J. Kleinberg, and C.
Faloutsos, Realistic, Mathematically
Tractable Graph Generation and Evolution, Using
Kronecker Multiplication, in PKDD 2005,
Porto, Portugal
- Danai Koutra, Tai-You Ke, U. Kang, Duen Horng Chau,
Hsing-Kuo Kenneth Pao, and Christos Faloutsos.
Unifying guilt-by-association approaches: theorems and
fast algorithms. ECML/PKDD'11, Athens, Greece.
- [Graph-Textbook], Ch.18: virus
propagation
- [Graph-Textbook],
Ch.19: Case studies: Random walk with restarts, image
captioning.
- Time series forecasting
- Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity
Estimation Using Query Feedbacks, SIGMOD
1994.
- Byoung-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson,
H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online
Data Mining for Co-Evolving Time Sequences, ICDE
2000, Feb. 2000.
- Steven L. Brunton, Joshua L. Proctor, and J. Nathan
Kutz, Discovering governing equations from data by
sparse identification of nonlinear dynamical systems,
PNAS 113 (15) 3932-3937, March 28, 2016, https://doi.org/10.1073/pnas.1517384113
(local copy).
- Statistics background: In PTVF pp. 620-621
and ch. 14.4-14.5.
- AI background / Classification
- [HKP] chapter 8.1-2.
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer
and Arun Swami, An
Interval Classifier for Database Mining Applications,
VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992, pp.
560-573.
- M. Mehta, R. Agrawal and J. Rissanen, SLIQ: A
Fast Scalable Classifier for Data Mining, Proc.
of the Fifth Int'l Conference on Extending Database
Technology, Avignon, France, March 1996.
- Data Mining in Databases:
- Association Rules:
- Cluster analysis: [HKP] chapter 10.
- Miscellaneous (ICA, approximate counting)
- Jia-Yu Pan, Christos Faloutsos, Masafumi Hamamoto and
Hiroyuki Kitagawa: AutoSplit:
Fast and Scalable Discovery of Hidden Variables in
Stream and Multimedia Databases, PAKDD, Sydney,
Australia, May 2004.
- Christopher Palmer, Phillip Gibbons and Christos
Faloutsos,
ANF: A Fast and Scalable Tool for Data Mining in Massive
Graphs, KDD 2002, Edmonton, Alberta, Canada,
July 2002
- Efficient
and Tunable Similar Set Retrieval, by
Aristides Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM
SIGMOD, Santa Barbara, California, May 21-24, 2001.
- New
sampling-based summary statistics for improving
approximate query answers, by Phillip B. Gibbons
and Yossi Matias, ACM SIGMOD, pp. 331-342, Seattle,
Washington, 1998.
- Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi: Efficient
Computation of Frequent and Top-k Elements in Data
Streams. ICDT 2005: 398-412
RECOMMENDED OPTIONAL READING
Additional, optional citations, that may be useful for
your project:
Multimedia indexing
Data mining
- Time sequences
- Graph mining:
- Tom Mitchell,
Machine Learning, McGraw Hill, 1997.
Last modified: Aug. 27, 2024, by Christos Faloutsos