Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2012 - C. Faloutsos
Reading List
NOTICE:
Several of the links are internal to CMU.
Required texts
Recommended texts
- [HK] Jiawei Han and
Micheline Kamber, Data Mining: Concepts
and Techniques, 2nd ed., Morgan Kaufmann, 2006, or 3rd edition, 2011
- [PTVF] William H. Press Saul A.
Teukolsky William T. Vetterling Brian P. Flannery Numerical
Recipes in C Cambridge University Press, 1992, 2nd Edition.
On-line evaluation copy
- Undergraduate DB textbook, for
those who took a db class too long ago:
Foils:
In pdf, from the course schedule
page.
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- Jon
Louis Bentley,
Multidimensional binary search trees used for associative
searching, Comm. of the ACM (CACM), Volume 18 ,
Issue 9, pp. 509-517, (September 1975)
- A. Guttman
R-Trees: a Dynamic Index Structure for Spatial
Searching, Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston,
Mass.
- J. Orenstein,
Spatial Query Processing in an Object-Oriented Database
System, Proc. ACM SIGMOD, May, 1986, pp. 326-336,
Washington D.C.
- Ibrahim Kamel and Christos Faloutsos,
Hilbert R-tree: An improved R-tree using fractals Proc.
of VLDB Conference, Santiago, Chile, Sept. 12-15, 1994, pp.
500-509.
- Roberto F. Santos Filho, Agma Traina, Caetano Traina Jr., and
Christos Faloutsos:
Similarity search without tears: the OMNI family of all-purpose
access methods ICDE, Heidelberg, Germany, April 2-6
2001.
- MM-Textbook, chapters 4 and 5.
- Fractals
- Christos Faloutsos and Ibrahim Kamel,
Beyond Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN.
- Alberto Belussi and Christos Faloutsos, Estimating
the Selectivity of Spatial Queries Using the `Correlation' Fractal
Dimension Proc. of VLDB, p. 299-310, 1995 (and
gzipped postscript )
- Power laws, lognormals etc: M. E. J. Newman, Power laws, Pareto
distributions and Zipf's law Contemporary Physics 46, 323-351
(2005) (local
pdf copy)
- Text and LSI
- MM-Textbook, chapter 6
- Peter W. Foltz and Susan T. Dumais,
Personalized Information Delivery: an Analysis of Information
Filtering Methods, Comm. of ACM (CACM), 35, 12, Dec. 1992,
pp. 51-60.
- SVD: In PTVF ch. 2.6; MM-Textbook Appendix D
- PageRank: Sergey Brin, Lawrence Page The Anatomy of a
Large-Scale Hypertextual Web Search Engine (1998) (local
pdf)
- HITS: Jon M. Kleinberg Authoritative Sources in a
Hyperlinked Environment JACM, 46,5 (1999) (local
pdf)
- Tensors: Tamara G. Kolda and Brett W. Bader.
Tensor decompositions and applications. Technical Report
SAND2007-6702, Sandia National Laboratories, Albuquerque, NM and
Livermore, CA, November 2007 (local
pdf copy )
- Tensors: [Graph-Textbook] Ch.16.
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon Ashley,
Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee,
Dragutin Petkovic, David Steele and Peter Yanker
Query by Image and Video Content: the QBIC System IEEE
Computer 28, 9, Sep. 1995, pp. 23-32.
- Journal
of Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994 An
earlier, more technical version of the IEEE Computer '95
paper.
- FastMap: MM-Textbook chapter 11; Also in: C.
Faloutsos and K.I. Lin
FastMap: A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets ACM
SIGMOD 95, pp. 163-174
.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4; in MM-Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in MM-Textbook Appendix C
- Karhunen-Loeve: in MM-Textbook Appendix D.
- JPEG: Gregory K. Wallace,
The JPEG Still Picture Compression Standard, CACM, 34,
4, April 1991, pp. 31-44
- MPEG: D. Le Gall,
MPEG: a Video Compression Standard for Multimedia
Applications CACM, 34, 4, April 1991, pp. 46-58
- Fractal compression: M.F. Barnsley and A.D. Sloan,
A Better Way to Compress Images, BYTE, Jan. 1988, pp.
215-223.
- MM-Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos,
On Power-Law Relationships of the Internet Topology,
SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási,
Diameter of the World Wide Web Nature,
401, 130-131 (1999).
- Réka Albert and Albert-László
Barabási
Statistical mechanics of complex networks, Reviews of
Modern Physics, 74, 47 (2002).
- Haveliwala, Taher H. (2003) Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search. Technical Report. Stanford InfoLab. (Extended version of the WWW2002 paper on Topic-Sensitive PageRank.)
- D. Chakrabarti, S. Papadimitriou, D. Modha and C. Faloutsos, Fully Automatic Cross-Associations, in KDD 2004 (pages 79-88), Washington, USA
- Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA
- Jure Leskovec, Jon Kleinberg, Christos Faloutsos Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL, USA, 2005.
- D. Chakrabarti and C. Faloutsos, Graph Mining: Laws, Generators and
Algorithms, in ACM Computing Surveys, 38(1), 2006
(
pdf draft, internal to CMU)
- J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos,
Realistic, Mathematically
Tractable Graph Generation and Evolution, Using Kronecker
Multiplication, in PKDD 2005, Porto, Portugal
- [Graph-Textbook], Ch.18: virus propagation
- [Graph-Textbook], Ch.19: Case studies: Random walk with restarts, image captioning.
- Time series forecasting
- Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity Estimation Using Query Feedbacks, SIGMOD 1994.
- Byong-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson, H.V.
Jagadish, Christos Faloutsos and Alex Biliris,
Online Data Mining for Co-Evolving Time Sequences, ICDE
2000, Feb. 2000.
- Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos
Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin,
Germany, Sept. 2003. CMU-TR
version
- Statistics background: In PTVF pp. 620-621
and ch. 14.4-14.5;
- AI background / Classification
- [HK] chapter 7.3
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer and
Arun Swami
An Interval Classifier for Database Mining Applications
VLDB Conf. Proc. Vancouver, BC, Canada, Aug. 1992, pp.
560-573.
- M. Mehta, R. Agrawal and J. Rissanen, `
SLIQ: A Fast Scalable Classifier for Data Mining', Proc. of
the Fifth Int'l Conference on Extending Database Technology,
Avignon, France, March 1996.
- Data Mining in Databases:
- Data warehouses, OLAP and DataCubes: [HK], ch. 2.
- Data reduction: [HK] chapter 3.4
- Association Rules:
- Cluster analysis: [HK] chapter 8.
- Miscellaneous (ICA, approximate counting)
- Jia-Yu Pan, Christos Faloutsos, Masafumi Hamamoto and Hiroyuki
Kitagawa:
AutoSplit: Fast and Scalable Discovery of Hidden Variables in
Stream and Multimedia Databases, PAKDD, Sydney, Australia,
May 2004.
- Christopher Palmer, Phillip Gibbons and Christos Faloutsos,
ANF: A Fast and Scalable Tool for Data Mining in Massive
Graphs, KDD 2002, Edmonton, Alberta, Canada, July 2002
-
Efficient and Tunable Similar Set Retrieval, by
Aristides Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM
SIGMOD, Santa Barbara, California, May 21-24, 2001.
-
New sampling-based summary statistics for improving approximate
query answers, by Phillip B. Gibbons and Yossi Matias, ACM
SIGMOD, pp 331 - 342, Seattle, Washington, 1998.
RECOMMENDED OPTIONAL READING
Additional, optional citations, that may be useful for
your project:
Multimedia indexing
- Spatial access methods:
- N. Beckmann, H.-P. Kriegel, R. Schneider B. Seeger The
R*-Tree: an Efficient and Robust Access Method for Points and
Rectangles ACM SIGMOD, May 1990, pp. 322-331 Atlantic City, NJ.
(Deferred splitting in R-trees)
- Fractals
Data mining
Last modified: Sept. 9, 2012, by Christos Faloutsos