Matthew Ruffalo and Ziv Bar-Joseph
Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on experimentally and computationally determining the set of transcription factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes, including development and disease. The main challenge in predicting such interactions is the very small positive training set currently available. Another challenge is that a large fraction of miRNAs are encoded within genes, making it hard to determine how they are specifically regulated.
To enable genome-wide predictions of TF-miRNA interactions, we extended semi-supervised machine learning approaches to integrate a large set of different types of data, including sequence, expression, ChIP-Seq, and epigenetic data. As we show, the methods we develop perform well both on a labeled test set and when analyzing general co-expression networks. We next analyze mRNA and miRNA expression data from cancer, demonstrating the advantage of using the predicted set of interactions to identify more coherent and relevant modules, genes and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that integrates miRNAs, genes and TFs.