Hello World!
I'm Sherry Tongshuang Wu
(吴彤霜)!
Assistant Professor
School of CS, Carnegie Mellon University (CMU SCS)
Human Computer Interaction Institute (HCII)
Language Technologies Institute (LTI)

I was trained (by my amazing PhD advisors Jeffrey Heer and Dan Weld at the University of Washington) to be an HCI+NLP researcher. I study how humans (AI experts, lay users, domain experts) interact with (debug, audit, collaborate with) AI systems.

Most recently, I work on:

Building practical AI systems by mapping general-purpose AIs to the right specific use cases.

Click & jump to some recent papers that represent my research interests and style:
If you are interested in exploring relevant topics with me at CMU, I will be looking for undergraduate, master's, or PhD students! PLEASE read this FAQ to find out about our open projects and the best ways to contact us.

Research Highlights

Real-world AI Evaluation
General Scales Unlock AI Evaluation with Explanatory and Predictive Power
Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E. Prunty, Zongqian Li, Pablo Sánchez-García, Kexin Jiang Chen, Pablo A. M. Casares, Jiyun Zu, John Burden, Behzad Mehrbakhsh, David Stillwell, Manuel Cebrian, Jindong Wang, Peter Henderson, Sherry Tongshuang Wu, Patrick C. Kyllonen, Lucy Cheke, Xing Xie, José Hernández-Orallo
ArXiv 2025: arXiv:2409.08775
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma*, Dora Zhao*, Xinran Zhao, Chenglei Si, Chenyang Yang, Ryan Louie, Ehud Reiter, Diyi Yang+, Tongshuang Wu+
ACL Findings 2025: Findings of the Association for Computational Linguistics
Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu
CoLM 2024: Conference on Language Modeling
Task-specific AI Testing & Distillation
What Prompts Don’t Say: Understanding and Managing Underspecification in LLM Prompts
Chenyang Yang, Yike Shi, Qianou Ma, Michael Xieyang Liu, Christian Kästner, Tongshuang Wu
ArXiv 2025: arXiv:2505.13360
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu
ArXiv 2025: arXiv:2507.18624
Prompt2Model: Generating Deployable Models from Natural Language Instructions
Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig
EMNLP Demo Track 2023: The 2023 Conference on Empirical Methods in Natural Language Processing
Human-AI Task Delegation
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu
EMNLP 2025: The 2025 Conference on Empirical Methods in Natural Language Processing
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang
CHI Case Study 2025: The 2025 Conference on Human Factors in Computing Systems
What Should We Engineer in Prompts? Training Humans in Requirement-Driven LLM Use
Qianou Ma, Weirui Peng, Chenyang Yang, Hua Shen, Kenneth Koedinger, Tongshuang Wu
TOCHI 2025: ACM Transactions on Computer-Human Interaction