Identifying Cycling Genes by Combining Sequence and Expression Data

Yong Lu, Roni Rosenfeld and Ziv Bar-Joseph


Data Sources

How Cycling Scores are Computed

Score Distributions of Human Genes with Fitted EVD

Procedure to Generate Labels for Simulation Data

GO Enrichment Comparison

Yeast Binding Comparison

List of Top 1200 Human Genes Identified By Our Model

Annotated Human Cell Cycle Genes in the Top 1200 List

Annotated Human Cell Cycle Genes with a Low Cycling Score

GO Validation Using A Stricter GO Set

Edge Distribution of the Human-Yeast Graph

Abstract: The expression of genes during the cell division process has now been studied in many different species. An important goal of these studies is to identify the set of cycling genes. To date, this has been done independently for each of the species studied. Due to noise and other data analysis problems, accurately deriving a set of cycling genes from expression data is a hard problem. This is especially true for some of the multicellular organisms, including humans.

Here we present the first algorithm that combines microarray expression data from multiple species for identifying cycling genes. Our algorithm represents genes from multiple species as nodes in a graph. Edges between genes represent sequence similarity. Starting with the measured expression values for each species, we use Belief Propagation to determine a posterior score for each gene. This posterior is used to determine a new set of cycling genes for each species.
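The graph-based idea above can be illustrated with a minimal sketch of loopy belief propagation on a binary (cycling / not cycling) pairwise model. This is not the authors' implementation: the function name loopy_bp, the gene names, the use of an expression-only prior probability per gene, and the single coupling parameter (the assumed probability that two sequence-similar genes share the same label) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def loopy_bp(prior, edges, coupling=0.8, iters=50, tol=1e-9):
    """Sketch of sum-product belief propagation on a gene graph.

    prior    -- dict gene -> P(cycling) from expression data alone (assumed input)
    edges    -- list of (gene_a, gene_b) pairs linked by sequence similarity;
                every gene in an edge must also appear in `prior`
    coupling -- assumed probability that two linked genes share the same label
    Returns a dict gene -> posterior P(cycling).
    """
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    # Node evidence: (P(not cycling), P(cycling)).
    phi = {g: (1.0 - p, p) for g, p in prior.items()}
    # Edge compatibility psi[x][y] between the states of two linked genes.
    psi = ((coupling, 1.0 - coupling), (1.0 - coupling, coupling))
    # msg[(i, j)] is the message from gene i to gene j over the states of j.
    msg = {(i, j): (0.5, 0.5) for i in nbrs for j in nbrs[i]}

    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            # Evidence at i times all incoming messages except the one from j.
            b = list(phi[i])
            for k in nbrs[i]:
                if k != j:
                    b[0] *= msg[(k, i)][0]
                    b[1] *= msg[(k, i)][1]
            out = [sum(b[x] * psi[x][y] for x in (0, 1)) for y in (0, 1)]
            z = out[0] + out[1]
            new[(i, j)] = (out[0] / z, out[1] / z)
        delta = max((abs(new[e][1] - msg[e][1]) for e in msg), default=0.0)
        msg = new
        if delta < tol:  # messages have stopped changing
            break

    posterior = {}
    for g in prior:
        b = list(phi[g])
        for k in nbrs[g]:
            b[0] *= msg[(k, g)][0]
            b[1] *= msg[(k, g)][1]
        posterior[g] = b[1] / (b[0] + b[1])
    return posterior


# Hypothetical toy input: one confident yeast gene linked to one uncertain human gene.
prior = {"yeast_A": 0.9, "human_A": 0.5, "human_B": 0.5}
edges = [("yeast_A", "human_A")]
posterior = loopy_bp(prior, edges)
```

In this toy input, human_A has an uninformative expression prior but is linked to a confidently cycling yeast gene, so its posterior rises above 0.5, while the unlinked human_B keeps its prior. On graphs with cycles the procedure is only approximate and is not guaranteed to converge, which is why the sketch caps the number of iterations.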

We applied our algorithm to improve the identification of the set of cell cycle genes in budding yeast and humans. As we show, by incorporating sequence similarity information we were able to obtain a more accurate set of genes than methods that rely on expression data alone. Our method was especially successful for the human dataset, indicating that it can use a high-quality dataset from one species to overcome noise problems in another.

This website provides additional information and results that were omitted from our submission due to lack of space. Follow the links on the left to view them.

For software requests, please write to lyongu at cs.cmu.edu