Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2013 - C. Faloutsos
PROJECT PHASE3 - GRADING SCHEME
REMINDER - CHECK-LIST FOR ALL PROJECTS:
- Hard copy: Turn in one complete report in hard copy, following the provided LaTeX template, i.e., with
- Abstract,
- Introduction,
- Literature Survey,
- Method,
- Experimental Analysis,
- Appendix, with the breakdown of work and all of your code (if your code is too long, contact the corresponding grader).
- E-copy: submit a tar file, with
- all your LaTeX/MS Word sources for the write-up, and
- all your code plus data, packaged as specified in http://www.cs.cmu.edu/~christos/courses/826.F13/proj.html
Details for PROJECT 1 ('insects') - grader: Vagelis Papalexakis
For Project 1, you are expected to complete all 5 tasks; since all of
you did Visualization in Phase 2, this means that you have to complete
the Anomaly Detection task, as well as finalize the Custom Distance
Function task (which is the innovative task). You also have to complete
any sub-tasks that you left incomplete in Phases 1 & 2 (there will
be a 5% penalty for any sub-task missing from the final report).
The detailed point break-down is as follows:
- 40% for providing a detailed explanation and evaluation of your custom distance function
- 40% for completing the anomaly detection task and successfully recovering as many anomalies as possible.
- (-5% penalty for each of the 5 tasks that is missing from the report).
- 15% for your code
- 5% for the quality/clarity of your report and the division of labor.
- 0% for poster - it is optional for all the default projects.
Details for PROJECT 2 ('graph mining') - grader: Alex Beutel
You are expected to complete
- all 7 tasks,
- an innovative task, and
- a thorough experimental analysis.
In your method section for each task, you should explain both the math
and how you use SQL to implement the algorithm (see the sketch after
this paragraph). In your experimental analysis, you should verify the
validity of your implementation against Matlab (which can easily
perform most of these tasks on small datasets) or against test data
that you construct to show that the methods work. In addition, you
should run your algorithms on as many datasets as possible (at least 5
relatively large datasets, with at least 1 million nodes each). In your
report you should give an overview of the results for each method,
explain why they make sense, and discuss similarities and differences
of the results across the different datasets. This should be done
primarily through plots (and do not use Node ID as an axis in your plots).
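For concreteness, here is a minimal sketch of the kind of SQL you might
describe in your method section; it assumes a hypothetical edge table
edges(src, dst) with one row per directed edge (your actual schema and
task list may differ), and computes the out-degree of every node and
the resulting degree distribution.

    -- Assumed (hypothetical) schema: edges(src, dst), one row per directed edge.

    -- Out-degree of every node.
    SELECT src AS node, COUNT(*) AS outdeg
    FROM edges
    GROUP BY src;

    -- Degree distribution: how many nodes have each out-degree value.
    SELECT outdeg, COUNT(*) AS num_nodes
    FROM (SELECT src, COUNT(*) AS outdeg
          FROM edges
          GROUP BY src) AS deg
    GROUP BY outdeg
    ORDER BY outdeg;

Plotting num_nodes versus outdeg (e.g., on log-log axes) is one way to
present such a result without using Node ID as an axis.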
The detailed point break-down is as follows:
- 15% for explaining the methods and the math behind them.
- 15% for explaining how you programmed these algorithms using SQL.
- 56% for experiments on your methods. Each method is worth 7%: 3% for
demonstrating its accuracy and 4% for interpreting its results on a
variety of datasets.
- 10% for your code
- 4% for the quality/clarity of your report and the division of labor.
- 0% for poster - it is optional for all the default projects.
Details for non-default projects - grader: Christos Faloutsos
No changes from the earlier announcements.
Created: Nov. 21, 2013, by Christos Faloutsos