Abstract
Assume a uniform, multidimensional grid of bivariate data, where each
cell of the grid has a count c_i and a baseline b_i. Our goal is to find
spatial regions (d-dimensional rectangles) where the c_i are significantly
higher than expected given b_i. We focus on two applications: detection
of clusters of disease cases from epidemiological data (emergency
department visits, over-the-counter drug sales), and discovery of regions
of increased brain activity corresponding to given cognitive tasks (from
fMRI data). Each of these problems can be solved using a spatial scan
statistic (Kulldorff, 1997), where we compute the maximum of a likelihood
ratio statistic over all spatial regions, and assess the significance of
the maximizing region by randomization. However, computing the scan statistic for
all spatial regions is generally computationally infeasible, so we
introduce a novel fast spatial scan algorithm, generalizing the 2D scan
algorithm of Neill and Moore (2004) to arbitrary dimensions. Our new
multidimensional multiresolution algorithm allows us to find spatial
clusters up to 1400x faster than the naive spatial scan, without any loss
of accuracy.
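
To make the baseline concrete, the following is a minimal sketch of the naive spatial scan that the fast algorithm is compared against. It is not the multiresolution algorithm described in this talk. It assumes a 2D grid, integer counts, and the Poisson form of Kulldorff's likelihood ratio statistic; the function names (kulldorff_score, naive_scan, p_value) are illustrative and not taken from the talk. Randomization is done here by redistributing the observed cases over cells in proportion to their baselines, which is one common choice among several.

```python
import math
import random


def kulldorff_score(c_in, b_in, c_tot, b_tot):
    """Poisson log-likelihood ratio of a region (assumed form, after Kulldorff 1997)."""
    c_out, b_out = c_tot - c_in, b_tot - b_in
    # Only regions whose inside rate exceeds the outside rate are of interest.
    if b_in <= 0 or b_out <= 0 or c_in / b_in <= c_out / b_out:
        return 0.0

    def xlogy(x, y):
        return x * math.log(y) if x > 0 else 0.0

    return (xlogy(c_in, c_in / b_in) + xlogy(c_out, c_out / b_out)
            - xlogy(c_tot, c_tot / b_tot))


def prefix_sums(grid):
    """2D cumulative sums so that any rectangle sum costs O(1)."""
    n, m = len(grid), len(grid[0])
    p = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p, r0, r1, c0, c1):
    """Sum over the half-open rectangle [r0, r1) x [c0, c1)."""
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


def naive_scan(counts, baselines):
    """Score every axis-aligned rectangle; return (best score, best rectangle)."""
    n, m = len(counts), len(counts[0])
    pc, pb = prefix_sums(counts), prefix_sums(baselines)
    c_tot, b_tot = pc[n][m], pb[n][m]
    best_score, best_rect = 0.0, None
    for r0 in range(n):
        for r1 in range(r0 + 1, n + 1):
            for c0 in range(m):
                for c1 in range(c0 + 1, m + 1):
                    score = kulldorff_score(rect_sum(pc, r0, r1, c0, c1),
                                            rect_sum(pb, r0, r1, c0, c1),
                                            c_tot, b_tot)
                    if score > best_score:
                        best_score, best_rect = score, (r0, r1, c0, c1)
    return best_score, best_rect


def p_value(counts, baselines, replicas=19, seed=0):
    """Randomization test: compare the observed maximum with null replicas."""
    rng = random.Random(seed)
    n, m = len(counts), len(counts[0])
    observed, _ = naive_scan(counts, baselines)
    cells = [(i, j) for i in range(n) for j in range(m)]
    weights = [baselines[i][j] for i, j in cells]
    total = sum(sum(row) for row in counts)
    beaten = 0
    for _ in range(replicas):
        # Null replica: place each observed case in a cell with
        # probability proportional to that cell's baseline.
        sim = [[0] * m for _ in range(n)]
        for i, j in rng.choices(cells, weights=weights, k=total):
            sim[i][j] += 1
        if naive_scan(sim, baselines)[0] >= observed:
            beaten += 1
    return (beaten + 1) / (replicas + 1)


# Toy data: uniform baseline of 10, with a 2x2 block of tripled counts.
baselines = [[10] * 6 for _ in range(6)]
counts = [[10] * 6 for _ in range(6)]
for i in (2, 3):
    for j in (2, 3):
        counts[i][j] = 30

score, rect = naive_scan(counts, baselines)
print(rect, round(score, 2), p_value(counts, baselines))
```

Even with constant-time rectangle sums, this exhaustive search visits on the order of N^4 rectangles on an N x N grid, and every randomization replica repeats the full search. That cost, which grows further with each added dimension, is what motivates a faster search.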
This is joint work with Andrew W. Moore.
Pradeep Ravikumar Last modified: Fri Oct 15 10:37:45 EDT 2004