Novel Models and Algorithms for Network Modeling, Mining and Reverse Engineering

NSF-IIS: Award # 0713379


PI: Eric P. Xing

Students:

 

Abstract:

In many problems arising in biology, the social sciences and various other fields, it is often necessary to analyze populations of entities (e.g., molecules or individuals) interconnected by a network. This proposal intends to develop new statistical formalisms and computational methodologies for modeling and inferring the semantic underpinnings of network entities, and to investigate how these aspects influence the network topology and its temporal evolution during biological and sociological processes. It will also study several as-yet unexplored topics, such as discriminative learning of network structures, recovery of temporally evolving network sequences, and the related theoretical issues.

 

The proposed research is envisaged to help address big-picture problems such as: 1) Hidden Identity/Function Induction, e.g., what role(s) do individuals play when they interact with different peers under different conditions? 2) Structural/Organizational Forecast, e.g., do changes in molecular function lead to alterations in biological pathways, and if so, how? 3) System Robustness, e.g., how does a network adjust to perturbations caused by exogenous intrusions?

 

This research straddles statistical learning, the social and biological sciences, and data mining. The intellectual merit of the proposed work lies both in the algorithmic and theoretical novelty of the methods developed and in the analysis of specific social and biological networks, along with the other applications these methods enable. The main novelties include: (1) new Bayesian formalisms for latent space modeling of node functions and network linkages, which capture the functional/behavioral context of network entities; (2) novel temporal extensions of the exponential random graph model (ERGM) for network evolution, with accompanying inference and learning algorithms; (3) algorithms for reverse-engineering temporally rewiring networks from longitudinal node attribute data; and (4) novel discriminative algorithms for learning very large networks from partial samples of the network, together with the relevant learning theory. These methods will be applied to the "Enron email network" to explore behavioral patterns under various business operating conditions, and to a longitudinal molecular abundance profile measured from breast cancer cells to infer networks, and their alterations, under carcinogenic or tumor-suppressing environments. The results are expected to advance the principles and technologies of network analysis and to enable a wide range of applications of broader interest.

 

The proposed research is also expected to have broad educational and societal impact. As an interdisciplinary research effort, this project will provide rich opportunities for multidisciplinary education and research training at both the undergraduate and graduate levels. A thorough understanding of social network structures in human populations can have a significant impact on important issues such as policy making or technology adoption. Knowledge of cellular networks and their changes in response to exogenous interventions can help in reasoning about disease causes and designing therapeutic schemes. Our methodological and software deliverables can potentially facilitate such studies, improve the cost-effectiveness of network data collection, and foster future development in this area.

 

Publications:

· Guo, F., Hanneke, S., Fu, W., & Xing, E.P. (2007). Recovering temporally rewiring networks: A model-based approach. Proceedings of the 24th International Conference on Machine Learning (ICML 2007).

· Airoldi, E.M., Blei, D.M., Fienberg, S.E., & Xing, E.P. (submitted). Mixed-membership stochastic blockmodels. (arXiv stat/0705.4485) (An earlier version appeared at the Statistical Network Analysis workshop, ICML 2006.)

· Airoldi, E.M., Blei, D.M., Fienberg, S.E., & Xing, E.P. (submitted). Admixtures of latent blocks with application to protein interaction networks. (arXiv q-bio/0706.0294) (Recipient of the John Van Ryzin Award, 2006.) (An earlier version appeared at the Link-KDD workshop, KDD 2005.)

 


Last updated 06/25/2007