Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Spring 2007 - C. Faloutsos
Final exam study guide
Reminders:
- Exam duration: 3 hours, on Friday, May 11, 8:30am-11:30am, PH A18A (but double-check the room)
- All aids allowed, EXCEPT laptops (due to their wireless
connections)
- The exam will be comprehensive, with more emphasis on
the material after the midterm
- Several of the links are internal to CMU.
- The reading list below is a slightly modified version of the
original reading list. Namely, we added the paper on ICA,
and we deleted the papers on approximate counting.
Required text
Recommended text
- [HK] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, Morgan Kaufmann, 2000.
- [PTVF] William H. Press, Saul A. Teukolsky, William T. Vetterling
and Brian P. Flannery, Numerical Recipes in C, Cambridge
University Press, 1992, 2nd Edition.
On-line evaluation copy
- Undergraduate DB textbook, for
those who took a db class too long ago:
- Raghu Ramakrishnan, Johannes Gehrke, "Database Management
Systems," McGraw-Hill 2002 (3rd ed).
Foils:
In pdf, from the course schedule page.
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- A. Guttman,
R-Trees: a Dynamic Index Structure for Spatial
Searching, Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston,
Mass.
- J. Orenstein,
Spatial Query Processing in an Object-Oriented Database
System, Proc. ACM SIGMOD, May, 1986, pp. 326-336,
Washington D.C.
- Textbook, chapters 4 and 5.
- Fractals
- Ibrahim Kamel and Christos Faloutsos,
Hilbert R-tree: An improved R-tree using fractals, Proc.
of VLDB Conference, Santiago, Chile, Sept. 12-15, 1994, pp.
500-509.
- Christos Faloutsos and Ibrahim Kamel,
Beyond Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN.
- Text and LSI
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon Ashley,
Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee,
Dragutin Petkovic, David Steele and Peter Yanker,
Query by Image and Video Content: the QBIC System, IEEE
Computer 28, 9, Sep. 1995, pp. 23-32. (hard copy - on reserve)
- Journal of Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994.
An earlier, more technical version of the IEEE Computer '95
paper.
- FastMap: Textbook chapter 11; also in: C.
Faloutsos and K.I. Lin, FastMap: A Fast Algorithm for Indexing,
Data-Mining and Visualization of Traditional and Multimedia
Datasets, ACM SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4; in
Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in Textbook Appendix C
- Karhunen-Loeve: in Textbook Appendix D.
- JPEG: Gregory K. Wallace,
The JPEG Still Picture Compression Standard, CACM, 34,
4, April 1991, pp. 31-44
- MPEG: D. Le Gall,
MPEG: a Video Compression Standard for Multimedia
Applications, CACM, 34, 4, April 1991, pp. 46-58
- Fractal compression: M.F. Barnsley and A.D. Sloan,
A Better Way to Compress Images, BYTE, Jan. 1988, pp.
215-223. (hard copy: on reserve)
- Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos,
On Power-Law Relationships of the Internet Topology,
SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the
World Wide Web, Nature, 401,
130-131 (1999).
- Réka Albert and Albert-László
Barabási, Statistical mechanics
of complex networks, Reviews of Modern Physics,
74, 47 (2002).
- Time series forecasting
- Statistics background:
In PTVF pp. 620-621 and ch. 14.4-14.5.
- AI background / Classification
- [HK] chapter 7.3
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer and
Arun Swami,
An Interval Classifier for Database Mining Applications,
VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992, pp.
560-573.
- M. Mehta, R. Agrawal and J. Rissanen, `SLIQ:
A Fast Scalable Classifier for Data Mining', Proc. of the
Fifth Int'l Conference on Extending Database Technology, Avignon,
France, March 1996.
- Data Mining in Databases:
--------------DELETED (i.e., the papers below will not be
examined in the final)--------------------------
- Miscellaneous (approximate counting)
- Christopher Palmer, Phillip Gibbons and Christos Faloutsos,
ANF: A Fast and Scalable Tool for Data Mining in Massive
Graphs, KDD 2002, Edmonton, Alberta, Canada, July 2002
- Efficient and Tunable Similar Set Retrieval, by
Aristides Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM
SIGMOD, Santa Barbara, California, May 21-24, 2001.
- New sampling-based summary statistics for improving approximate
query answers, by Phillip B. Gibbons and Yossi Matias, ACM
SIGMOD, pp 331 - 342, Seattle, Washington, 1998.
Last modified May 1, 2007, by Christos Faloutsos