Carolyn Penstein Rose


cprose@cs.cmu.edu
Kavcic-Moura Professor of Computer Science


Carnegie Mellon University
Language Technologies Institute
and HCI Institute
Gates-Hillman Center 5515
5000 Forbes Ave.
Pittsburgh, PA 15213-3891
+1 (412) 268-7130 (W)
+1 (412) 268-6298 (F)

Google Scholar Profile: Publications
Past President and Inaugural Fellow of The International Society of the Learning Sciences
IEEE Senior Member
AAAS Leshner Leadership Fellow for Public Engagement with Science: AI Cohort

Welcome to my webpage!

I have the privilege of directing the Teledia lab, a large, interdisciplinary lab involving PhD, Masters, and undergraduate students, staff, and affiliates researching interactive and explainable Sociotechnical Artificial Intelligence. From a machine learning perspective, we push the frontier of learnability and generalizability through deeply data-focused explorations of inductive biases. In particular, we develop and explore novel representations and architectural elements using a problem-driven approach motivated by error analysis and exploratory data analysis, with a current emphasis on abstraction and decomposition, which are arguably two of the greatest challenges for LLMs. We investigate these issues across multiple problem areas, including multimodal conversational process analysis, multimodal document understanding, clinical text processing, knowledge-based question answering, and language models of code. Our work is particularly known for the way it bridges deep theoretical insights from theories of language and interaction on the one side (e.g., social psychology and cognitive psychology, sociolinguistics, and discourse analysis) and computational modeling technology on the other (e.g., deep learning, LLMs, neurosymbolic reasoning). The key enabler of effective machine learning is the measurement of capabilities, operationalized in computational objectives and embodied in benchmarks. We have collaborated on the development of benchmarks and challenge data sets in coreference for dialogue, textual entailment, and event ordering, and are working on new benchmarks for code translation and code review.

Recent/Upcoming Invited Talks:
  • Invited Speaker: Human-centered GenAI for Education, NeurIPS 2024
  • Invited Panelist: Human-Centered Large Language Modeling Workshop, Annual Meeting of the Association for Computational Linguistics, Summer 2024
  • Keynote Speaker: Annual Meeting of the International Educational Data Mining Society (EDM), July 16, 2024
  • Invited Discussant: Symposium on What does it mean to be literate in the time of AI? Different Perspectives on Learning and Teaching AI Literacies in K-12 Education, Annual Meeting of the International Society of the Learning Sciences, Summer 2024
  • Keynote Speaker: ODSC East Virtual Conference, April 24, 2024
  • Invited Speaker: SUNY IAD Days, Institute for Artificial Intelligence and Data Science, Spring 2024

For more than three decades, my passion has been to use technology to positively impact human learning. Building on my group’s computational advances, which have been published in numerous papers at top conferences in Language Technologies, our research has given rise to and substantially contributed to the growth of two thriving, interrelated areas of research in the Learning Sciences: Automated Analysis of Collaborative Learning Processes and Dynamic Support for Collaborative Learning. Numerous classroom studies have demonstrated the efficacy of these interventions, which have frequently been associated with average learning gains of a letter grade or more. Recent work employs LLM agents to support learning in collaborative software teams. Other recent work focuses on AI Literacy, including work to address the perception and effectiveness of LLM guardrails as well as developing curricula and learning technologies to engage K-12 students in learning about artificial intelligence and machine learning. Our highly interdisciplinary work, published in over 320 peer-reviewed publications (H-index 65), is represented in the top venues in 5 fields: Language Technologies, Learning Sciences, Cognitive Science, Educational Technology, and Human-Computer Interaction, with awards in 4 of these fields. In the summer of 2023, I directed Carnegie Mellon University’s Generative AI Innovation Incubator, with numerous online events available for viewing on its Zoom playlist, which has been accessed nearly 9K times. I am a Fellow of the AAAS Leshner Leadership Institute for Public Engagement with Science, past president and inaugural fellow of the International Society of the Learning Sciences, Senior Member of IEEE, former co-editor-in-chief of the International Journal of Computer Supported Collaborative Learning, and founding chair of the International Alliance to Advance Learning in a Digital Era.

Selected Publications
  • Dutt, R., Wu, Z., Shi, J., Sheth, D., Gupta, P., Rosé, C. P. (2024). Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics
  • Nourbakhsh, A., Rosé, C. P. (2024). Towards a new research agenda for multimodal enterprise document understanding: What are we missing? Findings of the 62nd Annual Meeting of the Association for Computational Linguistics
  • Xie, Y., Rosé, C. P. (2024). DocLens: Multi-aspect Fine-grained Medical Text Evaluation. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics
  • Zhou, C., Ao, D., Rosé, C. P. (2024). Estimating Agreement by Chance for Sequence Annotation. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics
  • Yao, H. R., Breitfeller, L., Naik, A., Zhou, C. and Rosé, C. P. (2024). Distilling Multi-Scale Knowledge for Event Temporal Relation Extraction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024) (Full Research Paper track, 23% acceptance rate)
  • Wu, Z., Dutt, R., Rosé, C. P. (2024). Evaluating Large Language Models on Social Signal Sensitivity: An Appraisal Theory Approach. Proceedings of the Human-Centered Large Language Modeling Workshop, ACL 2024.
  • Naik, Atharva, Yin, Jessica, Kamanth, Anusha, Ma, Qianou, Wu, Sherry Tongshuang, Murray, R Charles, Sakr, Majd, and Rosé, Carolyn (2024). Generating Situated Reflection Prompts About Alternative Solution Paths: A Case Study in Generative AI for Computer-Supported Collaborative Learning, Proceedings of AI in Education (nominated for best paper)
  • Jeanne McClure, Juan Zheng, Franziska Bickel, Shiyan Jiang, Carolyn P. Rosé and Jie Chao (2024). Modeling with Primary Sources: An Approach to Teach Data Bias for Artificial Intelligence and Machine Learning Education, Proceedings of the Annual Meeting of the International Society of the Learning Sciences (best design paper award)
  • Y Xie, A Naik, D Fried, C Rose (2023). Data Augmentation for Code Translation with Comparable Corpora and Multiple References, Proceedings of Findings of Empirical Methods in Natural Language Processing
  • Armineh Nourbakhsh, Sameena Shah and Carolyn Rosé (2023). Using counterfactual contrast to improve compositional generalization for multi-step quantitative reasoning. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
  • James Fiacco, David Adamson and Carolyn Rosé (2023). Towards Extracting and Understanding the Implicit Rubrics of Transformer Based Automatic Essay Scoring Models. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications