Chenyan Xiong
Associate Professor, Language Technologies Institute, Carnegie Mellon University.
6409 GHC,
5000 Forbes Avenue
Pittsburgh, PA 15213
I am an Associate Professor at the Language Technologies Institute (LTI), affiliated with the Machine Learning Department (MLD) in the School of Computer Science at Carnegie Mellon University. I am also a member of the CMU Foundation and Language Model Center (FLAME). From 2018 to 2023, I worked at Microsoft Research Redmond on conversational search, dense retrieval, and large-scale pretraining, contributing both scientific advances and real-world impact across production systems serving billions of users and trillions of web pages. I received my Ph.D. from LTI, CMU, in 2018 under the supervision of Jamie Callan, focusing on integrating knowledge graphs and deep learning into search engines. Prior to that, I completed my undergraduate studies at Wuhan University in 2009, earned a master’s degree at the Institute of Software, Chinese Academy of Sciences in 2012, and spent two years interning at Microsoft Research Asia in Tie-Yan Liu’s group.
My research group welcomes Ph.D. students, postdoctoral researchers, and undergraduate/graduate interns. Recent publications are available at the CX Research Group at CMU. If our research interests align, please feel free to reach out.
- Ph.D. students: I primarily review applicants in LTI’s Ph.D. program. You can list me as a potential advisor in the application system and send me an email to ensure I see your materials.
- Postdocs: Please contact me directly via email.
- Current CMU students: Fill out this form and email me. I particularly enjoy working with students who share my research interests, have well-defined directions, and value long-term impact or real-world applications.
2026 Recruiting
Our group has multiple openings for Fall 2026 Ph.D. students, two postdoc openings available now, and multiple pre-doc internship roles in the research directions listed below. Here are some example directions I am looking to grow:
- Pretraining multimodal foundation models that unify a range of capabilities, e.g., understanding, editing, and generation
- Deeper understanding and development of Mixture-of-Experts, new attention mechanisms, and other foundation model architectures
- Foundation models for robotics, self-driving, and scientific robots, and more generally better foundation models for vision-language-action (VLA) systems
- New training mechanisms for foundation models, e.g., from the perspective of elevating the upper bound and closing the generation-verification gap
- LLM Science: theoretically and systematically understanding the inner workings and sources of power of language models
- Healthcare AI: bringing the power of foundation models into healthcare and revolutionizing medicine
I plan to attend NeurIPS 2026 in person and am happy to chat if you are on the market for these directions.
Research Interests
My recent work focuses on foundation and large language models, with particular emphasis on improving the speed–quality trade-offs in pretraining, exploring new scaling frontiers, and enabling new capabilities for next-generation GenAI applications. Current directions in my group include:
Foundation Model Science
- Advancing the Pareto frontier of scaling laws (speed–quality) through data-centric strategies, new architectures, and model–infrastructure co-design.
- Exploring scaling frontiers with synthetic data, innovative training methods, and feedback-driven learning.
- Developing foundation models with new capabilities for emerging applications in multimodality, vision–language–action, and models as agents.
GenAI-Native Information Retrieval
- Building agentic search and recommendation systems leveraging the new capabilities of foundation models.
- Exploring the ecosystem of the agentic web, including new organization of the digital world, new economic models, and fair revenue sharing.
- Supporting community research on agentic information systems with retrieval and large-scale training infrastructures.
New GenAI-Enabled Scenarios
- Designing healthcare foundation models to support clinical applications such as disease risk prediction and clinician copilots for improved patient outcomes.
- Developing new context learning paradigms for agents and test-time scaling in various applications.
- Adapting foundation models to verticals such as finance, robotics, and sports.