Carnegie Mellon University
15-826: Multimedia Databases and Data Mining
Spring 2007 - C. Faloutsos
Syllabus
DESCRIPTION
The course covers advanced algorithms for learning, analysis, data
management and visualization of large datasets. Topics include
indexing for text and DNA databases, searching medical and
multimedia databases by content, fundamental signal processing
methods, compression, fractals in databases, data mining, privacy
and security issues, rule discovery, data visualization, graph
mining, stream mining.
TOPICS TO BE COVERED
- Database topics:
- Traditional databases: advanced hashing and multi-key access
methods, for main-memory and disk-based data.
- Text databases: indexing text and DNA strings,
clustering, information filtering, LSI (latent semantic indexing,
based on singular value decomposition).
- Multimedia databases: searching by content in signals (time
sequences, photographs and medical images, video clips),
feature extraction, continuous media storage and delivery.
- Tools:
- Fundamental signal processing methods: Discrete Fourier
Transform, wavelets, JPEG and MPEG compression.
- Singular Value Decomposition (revisited).
- Fractals in databases: self-similarity/non-uniformity of real
datasets, fractal dimensions, selectivity estimation using fractals
and multifractals, fractal image compression, self-similarity in
web-traffic patterns.
- Data Mining:
- Graph mining: "laws" in large graphs (power laws, "small-world"
phenomena); graph generators; social networks.
- Sensor and time series mining: linear and non-linear
forecasting.
- Review of statistical methods.
- Review of AI methods.
- Database methods for massive datasets: association rules;
frequent sets; single-pass learning algorithms; information
compression and reconstruction; sampling; condensed data
representations; datacubes; cube-trees; function finding.
- Security and privacy protection.
- Visualization of large datasets.
- More tools: approximate counting algorithms; Independent
Component Analysis.
- OVERVIEW OF RECENT TOPICS: Graphs, trust and influence
propagation; Future directions.
PREREQUISITES: Introductory database course 15-415
(familiarity with B-trees and hashing), or permission of the
instructor.
UNIVERSITY UNITS: 12
CORE UNITS: 1
TEXT
Copies of instructor's transparencies and notes, as well as copies
of selected articles will be made available. The required text is
Recommended, but not required texts:
- William H. Press, Saul A. Teukolsky, William T. Vetterling and
Brian P. Flannery, "Numerical Recipes in C," 2nd ed., Cambridge
University Press, 1992.
- Raghu Ramakrishnan and Johannes Gehrke, "Database Management
Systems," 3rd ed., McGraw-Hill, 2002.
- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and
Techniques," Morgan Kaufmann, 2000.
METHOD OF EVALUATION
The course involves
- A midterm (20%)
- Homeworks (10%)
- A Project (40%)
- A Final exam (30%)
Clarifications:
- Some homeworks will be designated as mandatory; that is, a
passing grade on them is required.
- Projects will be carried out in teams of one or two students. A
detailed handout about the project, along with a list of suggested
projects, will be distributed at the beginning of the course. The
goal of the project is to give participants the opportunity to
tackle a large, interesting problem, which may lead to a
publication.