J. Ernst, G.J. Nau, and Z. Bar-Joseph
Clustering Short Time Series Gene Expression Data
Bioinformatics (Proceedings of ISMB 2005), 21 Suppl. 1, pp. i159-i168, 2005.
Abstract
Motivation:
Time series expression experiments are used to study a
wide range of biological systems. More than 80%
of all time series expression datasets are short (8 time points or fewer).
These datasets present unique challenges. Due to the large number
of genes profiled (often tens of thousands) and the small number
of time points, many patterns are expected to arise at random. Most
clustering algorithms are unable to distinguish between real and
random patterns.
Results:
We present an algorithm specifically designed for clustering
short time series expression data.
Our algorithm works by assigning genes to a pre-defined
set of model profiles that capture the potential distinct patterns
that can be expected from the experiment. We discuss how to obtain such a
set of profiles and how to determine the significance of each of these
profiles. Significant profiles are retained for further analysis and can be
combined to form clusters.
We tested our method on both simulated and real biological data. Using
immune response data, we show that our algorithm can correctly detect
the temporal profile of relevant functional categories.
Using Gene Ontology (GO) analysis, we show that our algorithm outperforms both
general clustering algorithms and algorithms designed specifically for
clustering time series gene expression data.
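
The profile-assignment idea described under Results can be illustrated with a minimal sketch. The abstract does not state the similarity measure or the null model, so this example assumes Pearson correlation for assigning each gene to its closest model profile and a permutation of time points for estimating how many genes a profile would attract by chance. All function names (pearson, assign, profile_counts, permutation_pvalues) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return num / den if den else 0.0


def assign(gene, profiles):
    """Index of the model profile most correlated with the gene (assumed rule)."""
    return max(range(len(profiles)), key=lambda i: pearson(gene, profiles[i]))


def profile_counts(genes, profiles):
    """Number of genes assigned to each model profile."""
    counts = [0] * len(profiles)
    for gene in genes:
        counts[assign(gene, profiles)] += 1
    return counts


def permutation_pvalues(genes, profiles, n_perm=200, seed=0):
    """Empirical p-value per profile: how often shuffled data gives a count
    at least as large as the observed one (assumed null model)."""
    rng = random.Random(seed)
    observed = profile_counts(genes, profiles)
    exceed = [0] * len(profiles)
    for _ in range(n_perm):
        # Shuffle the time points of every gene independently.
        shuffled = [rng.sample(gene, len(gene)) for gene in genes]
        null = profile_counts(shuffled, profiles)
        for i, count in enumerate(null):
            if count >= observed[i]:
                exceed[i] += 1
    # Add-one correction keeps every estimate strictly positive.
    pvalues = [(e + 1) / (n_perm + 1) for e in exceed]
    return observed, pvalues


# Toy example: three hypothetical model profiles over four time points.
profiles = [[0, 1, 2, 3], [0, -1, -2, -3], [0, 1, 0, -1]]
genes = [
    [0.1, 0.9, 2.2, 3.1],
    [0.0, 1.2, 1.9, 2.8],
    [0.0, -1.0, -2.1, -2.9],
    [0.2, 1.1, 0.1, -0.8],
]
observed, pvalues = permutation_pvalues(genes, profiles)
```

Profiles whose observed count is unusually large relative to the shuffled data would be the ones retained as significant. With only four toy genes the p-values carry no meaning; the sketch shows the mechanics only.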