Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2013 - C. Faloutsos
PROJECT PHASE3 - GRADING SCHEME
REMINDER - CHECK-LIST FOR ALL PROJECTS:
- Hard copy: Turn in one complete report in hard copy, following the provided LaTeX template, i.e., with
- Abstract,
- Introduction,
- Literature Survey,
- Method,
- Experimental Analysis,
- Appendix, with the breakdown of work and all of your code (if your code is too long, contact the corresponding grader).
- E-copy: submit a tar file, with
- all your LaTeX/MS Word sources for the write-up, and
- all your code plus data, packaged as specified in http://www.cs.cmu.edu/~christos/courses/826.F13/proj.html
Details for PROJECT 1 ('insects') - grader: Vagelis Papalexakis
For Project 1, you are expected to complete all 5 tasks; since all of
you did Visualization in Phase 2, this means that you have to complete
the Anomaly Detection task, as well as finalize the Custom Distance
Function task (which is the innovative task). You also have to complete
any sub-tasks that you left incomplete in Phases 1 & 2 (there will
be a 5% penalty for any sub-task missing from the final report).
The detailed point break-down is as follows:
- 40% for providing a detailed explanation and evaluation of your custom distance function
- 40% for completing the anomaly detection task and successfully recovering as many anomalies as possible.
- (-5% penalty for each of the 5 tasks that is missing from the report).
- 15% for your code
- 5% for the quality/clarity of your report and the division of labor.
- 0% for poster - it is optional for all the default projects.
Details for PROJECT 2 ('graph mining') - grader: Alex Beutel
You are expected to complete
- all 7 tasks,
- an innovative task, and
- a thorough experimental analysis.
In your method section for each task, you should explain both the math
and how you use SQL to implement the algorithm (see the sketch after
this paragraph). In your experimental analysis, you should verify the
validity of your implementation against Matlab (which can easily
perform most of these tasks on small datasets) or against test data
that you construct to show that the methods work. In addition, you
should run your algorithms on as many datasets as possible (at least 5
relatively large datasets, with at least 1 million nodes each). In your
report you should give an overview of the results for each method,
explain why they make sense, and discuss similarities and differences
of the results across the different datasets. This should be done
primarily through plots (and do not use Node ID as an axis in your plots).
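For concreteness, here is a minimal sketch of the kind of SQL you might
describe in your method section; it assumes a hypothetical edge table
edges(src, dst) with one row per directed edge (your actual schema and
task list may differ), and computes the out-degree of every node and
the resulting degree distribution.

    -- Assumed (hypothetical) schema: edges(src, dst), one row per directed edge.

    -- Out-degree of every node.
    SELECT src AS node, COUNT(*) AS outdeg
    FROM edges
    GROUP BY src;

    -- Degree distribution: how many nodes have each out-degree value.
    SELECT outdeg, COUNT(*) AS num_nodes
    FROM (SELECT src, COUNT(*) AS outdeg
          FROM edges
          GROUP BY src) AS deg
    GROUP BY outdeg
    ORDER BY outdeg;

Plotting num_nodes versus outdeg (e.g., on log-log axes) is one way to
present such a result without using Node ID as an axis.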
The detailed point break-down is as follows:
- 15% for explaining the methods and the math behind them.
- 15% for explaining how you programmed these algorithms using SQL.
- 56% for experiments on your methods. Each method is worth 7%: 3% for
demonstrating its accuracy and 4% for interpreting its results on a
variety of datasets.
- 10% for your code
- 4% for the quality/clarity of your report and the division of labor.
- 0% for poster - it is optional for all the default projects.
Details for non-default projects - grader: Christos Faloutsos
No changes from the earlier announcements.
Created: Nov. 21, 2013, by Christos Faloutsos