Siddharth Dalmia

Hi! I am Sid [s-ih-d]. I am a Research Scientist at Google DeepMind, where I work on Gemini, in particular on building reliable evaluations for audio and long-context capabilities.

I graduated with a PhD from the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. I was fortunate to be advised by Florian Metze (now at Meta), Alan W Black and Shinji Watanabe.

During my PhD, I worked on making sequence models amenable to resource-constrained scenarios (both data and compute) by incorporating the compositionality principles of system building, such as task simplification, reusability, transferability, and data pooling, into sequence models used for various speech and language tasks.

I have also spent time doing research at Google Brain (2021), Amazon AWS AI (2020), Facebook AI Research (2019, 2020) and INRIA (2015, 2016). There I was fortunate to work under many amazing mentors who have helped me evolve as a researcher: Yu Zhang, Ron J Weiss, Tara Sainath and Alexis Conneau (Google Brain); Yuzong Liu, Srikanth Ronanki and Katrin Kirchhoff (Amazon AWS AI); Mike Lewis and Abdelrahman Mohamed (Facebook AI Research); Emmanuel Vincent and Irina Illina (INRIA).

I received my undergraduate degree in Computer Science from BITS, Pilani (Hyderabad Campus) in 2016.

The best way to reach me is by email: sdalmia[at]cs.cmu.edu

CV / Google Scholar / Twitter / LinkedIn / GitHub

Publications

Outdated! Sorry, I am working on updating this. Meanwhile, please follow Google Scholar for my latest work.

	Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021) BibTeX / Website / PDF
	Differentiable Allophone Graphs for Language Universal Speech Recognition Brian Yan, Siddharth Dalmia, David Mortensen, Florian Metze, Shinji Watanabe The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021) BibTeX / Code / PDF
	ESPnet-ST IWSLT 2021 Offline Speech Translation System Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021) BibTeX / Code / PDF
	Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021) BibTeX / Code / PDF
	Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, Shinji Watanabe Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2021) BibTeX / Dataset / PDF
	NoiseQA: Challenge Set Evaluation for User-Centric Question Answering Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021) BibTeX / Project Page / Data / Code / PDF
	Transformer-Transducers for Code-Switched Speech Recognition Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff 46th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) BibTeX / PDF
	On Long Tailed Phenomena in Neural Machine Translation Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020) BibTeX / PDF / Code
	Universal Phone Recognition with a Multilingual Allophone System Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R Mortensen, Graham Neubig, Alan W Black, Florian Metze 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020) BibTeX / PDF / Code
	Towards Zero-shot Learning for Automatic Phonemic Transcription Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W Black, Florian Metze 34th AAAI Conference on Artificial Intelligence (AAAI 2020) BibTeX / PDF
	Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models Siddharth Dalmia, Abdelrahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer arXiv 2019 BibTeX / PDF
	Cross-Attention End-to-End ASR for Two-Party Conversations Suyoun Kim, Siddharth Dalmia, Florian Metze 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019) BibTeX / PDF
	Multilingual Speech Recognition with Corpus Relatedness Sampling Xinjian Li, Siddharth Dalmia, Alan W Black, Florian Metze 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019) BibTeX / PDF
	SANTLR: Speech Annotation Toolkit for Low Resource Language Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W Black, Florian Metze 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019). Show and Tell Track BibTeX / PDF / Demo
	The ARIEL-CMU Systems for LoReHLT18 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R Mortensen, Graham Neubig, Eduard Hovy, Alan W Black, Jaime Carbonell, Graham V Horwood, Shabnam Tafreshi, Mona Diab, Efsun S Kayi, Noura Farra, Kathleen McKeown CMU System Description for Low Resource Human Language Technologies (LoREHLT 2018) BibTeX / PDF / Project
	Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion Suyoun Kim, Siddharth Dalmia, Florian Metze 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019) BibTeX / PDF
	Phoneme Level Language Models for Sequence Based Low Resource ASR Siddharth Dalmia, Xinjian Li, Alan W Black, Florian Metze 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019) BibTeX / PDF
	Situation Informed End-to-End ASR for CHiME-5 Challenge Suyoun Kim, Siddharth Dalmia, Florian Metze 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018) BibTeX / PDF
	Domain Robust Feature Extraction for Rapid Low Resource ASR Development Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black 7th IEEE Workshop on Spoken Language Technology (SLT 2018) BibTeX / PDF
	Sequence-based Multi-lingual Low Resource Speech Recognition Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) BibTeX / PDF / Code / Slides
	Epitran: Precision G2P for Many Languages David R. Mortensen, Siddharth Dalmia, Patrick Littell 11th International Conference on Language Resources and Evaluation (LREC 2018) BibTeX / PDF / Code
	An Approach for Self-Training Audio Event Detectors Using Web Data Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane 25th European Signal Processing Conference (EUSIPCO 2017) BibTeX / PDF
	Robust ASR using neural network based speech enhancement and feature simulation Sunit Sivasankaran, Aditya Arie Nugraha, Emmanuel Vincent, Juan A Morales-Cordovilla, Siddharth Dalmia, Irina Illina, Antoine Liutkus 14th IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2015) BibTeX / PDF

Academic Services

ICML 2019, LREC 2020, ACL 2020 (SRW), EMNLP 2020, NeurIPS 2020, AACL 2020 (SRW), EACL 2021, AAAI 2021, NAACL 2021, ACL 2021, ICML 2021, INTERSPEECH 2021, ACMMM 2021, EMNLP 2021, NeurIPS 2021, ICLR 2022

Teaching Assistant

Spring 2019: Large Scale Multimedia Analysis, Graduate Course @ CMU

Fall 2019: Speech Recognition and Understanding, Graduate Course @ CMU

Website source from Jon Barron here