Sireesh Gururaja

I’m a PhD student at Carnegie Mellon University’s Language Technologies Institute, advised by Emma Strubell. I previously completed a master’s degree here under the supervision of Carolyn Rose, and a BA in computer science at Columbia University. My work is supported by the Army Research Lab’s HTMDEC US Citizen Fellowship and the Mozilla Foundation.

My research focuses on NLP and AI tools that allow users in specialized domains to keep agency in their work. How can we empower people to customize and change their tools to reflect and be useful to how they see their own jobs, rather than how their boss or a tech company with a billion other users does? More concretely, I focus on user-customizable, on-device models that live in the browser, and how to effectively reason about their limitations and update them. I’m also interested in the incentives that shape NLP research, whether funding, tooling, or culture.

Before coming to CMU, I spent six years in industry. I started at IBM Watson in 2015 on a team that did bespoke prototypes; I then moved to Kensho Technologies in 2018, where I spent three and a half years, first working as an ML engineer focused on NLP, then as the first lead of the ML Ops and Internal Tools team.

You can find my CV here.

news

Dec 12, 2025 I’m proposing my thesis next week, on 12/19! If you’re interested, you can take a look at the proposal here.
Dec 02, 2025 I’m at NeurIPS this week, presenting our work on data-driven materials design as a dataset proposal. I also contributed to work at the Tackling Climate Change with Machine Learning Workshop.
Jun 22, 2025 I’m presenting two papers at ACL next month: an interview study that characterizes the sociotechnical gap between what experts in materials science and law/policy do and where NLP research is focused, and our work on Collage, a tool for facilitating rapid prototyping, co-design, and debugging of information extraction approaches on PDFs.

selected work

  1. ACL ’25 Findings
    Beyond Text: Expert Needs in Document Research
    Sireesh Gururaja, Nupoor Gandhi, Jeremiah Milbauer, and 1 more author
    2025
  2. SDProc@ACL ’25
    Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs
    Sireesh Gururaja, Yueheng Zhang, Guannan Tang, and 6 more authors
    2025
  3. Non-archival
    Data-driven Design as a High-Impact, Ecologically Valid Benchmark for Document Understanding
    Sireesh Gururaja, Junwon Seo, Hung-Yi Lin, and 3 more authors
    2025
  4. Preprint
    Basic Research, Lethal Effects: Military AI Research Funding as Enlistment
    David Gray Widder, Sireesh Gururaja, and Lucy Suchman
    2024
  5. EMNLP ’23
    To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing
    Sireesh Gururaja, Amanda Bertsch, Clara Na, and 2 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023