Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2019 - C. Faloutsos
Phase 2
- clarifications/announcements
DEFAULT projects
- Explanations: Notice the emphasis on the explanations
of the outliers ('deep dive'). Imagine you are interning at a
company, and your job is to find anomalies in the
who-friends-whom graph (or who-buys-what, or who-reviews-what).
Any lead you generate will go to human investigators, who will
spend time trying to determine whether the nodes you listed are
indeed suspicious. The investigators will ignore
your leads if they are not convinced that the leads are worth their time.
That is, as we mentioned in the grading scheme for phase 2, for
each outlier (or group of outlier nodes), make sure that you give
  - the list of node-ids
  - one or more plots that justify your decision
- Radius plot - sub-quadratic approximation: If we measure the
number of hops from each of the $n$ nodes to each of the other
nodes, the cost will be quadratic in $n$ (at best). The proposed
approximation is the following:
  - choose a small sample of $s$ nodes, at random
  - for each node $i$ in (1, 2, ..., $n$),
    - compute how many of the $s$ nodes are within 1, 2,
      3, ... hops
    - and estimate the actual (and/or effective) radius
      of node $i$. The actual radius is the maximum
      number of hops to any of the $s$ nodes that node $i$ can
      reach; the effective radius is the 90th percentile
      (i.e., how many hops are needed for node $i$ to reach
      90% of the $s$ nodes it can reach)
    - if node $i$ is isolated, then its radius is zero,
      and so is its effective radius.
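The sampling steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference solution: it assumes an undirected graph stored as a dict `adj` that maps every node-id (including isolated ones) to a list of neighbors, and the names `bfs_hops`, `sampled_radii`, `percent` and `seed` are hypothetical. It also assumes that a node does not count itself among the sampled nodes it reaches. Because the graph is undirected, one BFS from each sampled node gives the hop count from every node to that sampled node, so the total cost is roughly $s$ times (nodes + edges) rather than quadratic.

```python
import random
from collections import deque


def bfs_hops(adj, source):
    """Return {node: hop count} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_radii(adj, s, percent=90, seed=0):
    """Estimate {node: (actual radius, effective radius)} from s sampled nodes.

    Assumptions (not from the course page): the graph is undirected, adj has
    a key for every node, and node-ids are sortable.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    sample = rng.sample(nodes, min(s, len(nodes)))

    # One BFS per sampled node; record, for each node, its hop count
    # to every sampled node it can reach (excluding itself).
    hops_to_sample = {i: [] for i in nodes}
    for src in sample:
        for node, d in bfs_hops(adj, src).items():
            if node != src:
                hops_to_sample[node].append(d)

    radii = {}
    for i in nodes:
        hops = sorted(hops_to_sample[i])
        if not hops:
            # Isolated node (or no sampled node reachable): both radii are 0.
            radii[i] = (0, 0)
            continue
        # Smallest hop count covering at least `percent`% of the
        # reachable sampled nodes (integer ceiling, no float rounding).
        k = (percent * len(hops) + 99) // 100
        radii[i] = (hops[-1], hops[k - 1])
    return radii
```

For example, on a 12-node path 0-1-...-11 with every node sampled, node 0 reaches 11 sampled nodes at hops 1 through 11, so its actual radius is 11 and its effective radius is 10. With a genuinely small $s$ the values are estimates and will vary with the random sample.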
NON-DEFAULT projects
- Please contact the instructor (during office hours, or send an
'invite'; any free slot between 11am and 8pm on the calendar
is fine)
  - for your proposed next steps
  - and for clarifications on the feedback on phase 1
Last modified: Oct. 23, 2019, by Christos Faloutsos