Carnegie Mellon University
15-826: Multimedia Databases and Data Mining
Fall 2019 - C. Faloutsos

Syllabus

DESCRIPTION

The course covers advanced algorithms for learning, analysis, data management and visualization of large datasets. Topics include indexing for text and DNA databases, searching medical and multimedia databases by content, fundamental signal processing methods, compression, fractals in databases, data mining, privacy and security issues, rule discovery, data visualization, graph mining, stream mining.

TOPICS TO BE COVERED

Database topics:
- Traditional databases: Advanced hashing and multi-key access methods, for main-memory and for disk-based data.
- Text databases: indexing text and DNA strings, clustering, information filtering, LSI (Latent Semantic Indexing, based on singular value decomposition).
- Multimedia databases: searching by content in signals (time sequences, photographs and medical images, video clips), feature extraction, continuous media storage and delivery.
Tools:
- Fundamental signal processing methods: Discrete Fourier Transform, wavelets, JPEG and MPEG compression.
- Singular Value Decomposition (revisited).
- Fractals in databases: Self-similarity/non-uniformity of real datasets, fractal dimensions, selectivity using fractals and multifractals, fractal image compression, self-similarity in web-traffic patterns.
Data Mining:
- Graph mining: "laws" in large graphs (power laws, "small-world" phenomena); graph generators; social networks.
- Sensor and time-series mining: linear and non-linear forecasting.
- Review of statistical methods.
- Review of AI methods.
- Database methods - Massive datasets: Association rules; Frequent sets; Single-pass learning algorithms; Information compression and reconstruction; Sampling; Condensed data representations; Datacubes; Cube-trees; Function finding.
- Security and privacy protection.
- Visualization of large datasets.
- More tools: approximate counting algorithms; Independent Component Analysis.
OVERVIEW OF RECENT TOPICS: trust and influence propagation; future directions.

PREREQUISITES: Introductory database course 15-415/615 or 15-445/645 (familiarity with B-trees and Hashing), or permission of the instructor.

UNIVERSITY UNITS: 12

CORE UNITS: 1

TEXT

Copies of the instructor's transparencies and notes, as well as copies of selected articles, will be made available. The required texts are:

Christos Faloutsos, Searching Multimedia Databases by Content, Kluwer Academic Publishers, 1996. (Evaluation copy, internal to CMU.)
Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies, Morgan & Claypool, 2012. (Evaluation copy, internal to CMU.)

Recommended, but not required texts:

William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C, 2nd edition, Cambridge University Press, 1992.
Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 3rd edition, McGraw-Hill, 2002.
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd edition, 2011.

METHOD OF EVALUATION

The final grade is based on:

A midterm exam (20%)
Homeworks (10%): HW1 1%; HW2, HW3, and HW4 3% each
A project (40%)
A final exam (30%)

Clarifications:

Projects will be carried out in teams of two. A detailed handout about the project, along with a list of suggested projects, will be distributed at the beginning of the course. The goal of the project is to give participants the opportunity to tackle a large, interesting problem, which may lead to a publication and/or a large software system.

Last updated: Sept. 2, 2019, by Christos Faloutsos

Carnegie Mellon University 15-826: Multimedia Databases and Data Mining Fall 2019 - C. Faloutsos