James Cheney
University of Edinburgh
Data provenance as dependency analysis
* * * * * PLEASE NOTE: NOT USUAL DAY OR CONFERENCE ROOM * * * * *
Abstract:
Scientists in a variety
of disciplines are now using databases and other sophisticated systems
in novel ways and placing new demands on them. For example,
biologists expect data to be accompanied by so-called provenance
information explaining how the data got there, where it came from, and
how it has been manipulated. Currently, provenance is maintained
through manual effort. Doing so is expensive, tedious, and
error-prone, which motivates investigating ways of automatically
tracking and managing provenance in general-purpose systems such as
databases.
This raises many system design and implementation issues. But
there are also foundational questions that ought to be addressed first,
such as what makes a given candidate definition of provenance correct
or suitable for a given purpose. These questions have been
largely ignored. Instead, most work in this area is based on ad
hoc definitions motivated by imprecise claims that the definition
captures how parts of the input "influence", "contribute to" or "are
relevant to" parts of the output.
In this talk I will present a new provenance-tracking technique that is
equipped with a clear and (I argue) well-motivated correctness
property, called dependency provenance. For each part of the
output of a query, we define the dependency provenance as the set of
input locations on which the given output part depends, in a sense
similar to that used in programming language dependency analyses.
It is also closely related to debugging techniques such as dynamic
program slicing, adapted to databases. Calculating exact
dependency provenance turns out to be expensive (and undecidable in
general), so we consider dynamic and static over-approximations.
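As a rough illustration of the dynamic flavor of this idea, the following Python sketch tags every input field with its own location and unions those tags as a query runs. It is a minimal sketch under stated assumptions, not the technique presented in the talk: the names Tagged, load, add and query, the table name "R", and the example query are all hypothetical, and the propagation rules are a simple over-approximation chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value paired with the set of input locations it may depend on."""
    value: int
    deps: frozenset


def load(table_name, rows):
    # Tag each input field with its own location: (table, row index, column).
    return [
        {col: Tagged(v, frozenset({(table_name, i, col)})) for col, v in row.items()}
        for i, row in enumerate(rows)
    ]


def add(x, y):
    # The sum may change if either operand changes, so union the tags.
    return Tagged(x.value + y.value, x.deps | y.deps)


def query(rows):
    # Hypothetical query: SELECT b + c FROM R WHERE a > 1
    out = []
    for row in rows:
        if row["a"].value > 1:
            total = add(row["b"], row["c"])
            # Whether this output appears at all depends on the tested field,
            # so its location is included as well (an over-approximation).
            out.append(Tagged(total.value, total.deps | row["a"].deps))
    return out


rows = load("R", [{"a": 1, "b": 10, "c": 100}, {"a": 2, "b": 20, "c": 200}])
result = query(rows)
for r in result:
    print(r.value, sorted(r.deps))
```

Here the single output value 220 is reported as depending on fields a, b and c of the second input row: changing any of them could change or remove that output, while fields of the first row are not reported for it.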
Joint work with Amal Ahmed and Umut Acar.
Thursday, April 24, 2008
3:30 - 5:00 p.m.
Wean Hall 7220
Principles of Programming Seminars