THESIS DRAFT: Computational Methods for Analyzing and Modeling Gene Regulation Dynamics
Thesis Defense
Jason Ernst
Tuesday, August 19, 2008, 11am
Wean Hall 5409
Abstract
Gene regulation is a central biological process whose disruption can lead to many diseases. This
process is largely controlled by a dynamic network of transcription factors interacting with specific
genes to control their expression. Time series microarray gene expression experiments have become
a widely used technique to study the dynamics of this process. This thesis introduces new
computational methods designed to better utilize data from these experiments and to integrate these
data with static transcription factor-gene interaction data to analyze and model the dynamics of
gene regulation. The first method, STEM (Short Time-series Expression Miner), is a clustering
algorithm and software specifically designed for short time series expression experiments, which
represent the substantial majority of experiments in this domain. The second method, DREM (Dynamic
Regulatory Events Miner), integrates transcription factor-gene interactions with time series
expression data to model regulatory networks while taking into account their dynamic nature. The
method uses an Input-Output Hidden Markov Model to identify bifurcation points in the time series
expression data. While the method can be readily applied to some species, the coverage of experimentally
determined transcription factor-gene interactions in most species is limited. To address
this, we introduce two methods to improve the computational predictions of these interactions. The
first of these methods, SEREND (SEmi-supervised REgulatory Network Discoverer), motivated by
the species E. coli, is a semi-supervised learning method that uses verified transcription factor-gene
interactions, DNA sequence binding motifs, and gene expression data to predict new interactions.
We also present a method motivated by human genomic data that combines motif information with a
probabilistic prior on transcription factor binding at each location in the organism's genome, which
it infers based on a diverse set of genomic properties. We applied these methods to yeast, E. coli,
and human cells. Our methods successfully predicted interactions and pathways, many of which
have been experimentally validated. Our results indicate that by explicitly addressing the temporal
nature of regulatory networks, we can obtain accurate models of dynamic interaction networks in
the cell.
Thesis Committee
Ziv Bar-Joseph, Carnegie Mellon University (Chair)
Zoubin Ghahramani, Carnegie Mellon University
Naftali Kaminski, University of Pittsburgh
Zoltan Oltvai, University of Pittsburgh
Eric Xing, Carnegie Mellon University