David Huggins-Daines
About me
I am a PhD student (as of spring 2006) at the Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
View my Curriculum Vitae or download it as a PDF.
I received my B.A. with Honours in Linguistics from the University of Ottawa in 2000. I have worked as a computer programmer since 1999, doing, among other things, search engine and web-database programming, operating system kernel and bootloader development, platform support for the GNU dynamic linker and standard C library, telephony integration, and, of course, high-quality, small-footprint TTS. At various other points I have also worked as a bicycle mechanic.
I worked on text-to-speech for several years prior to coming to the LTI. My general research interests include pronunciation and acoustic modeling for speech synthesis and recognition, efficient techniques for speech recognition, and data-driven methods in linguistic analysis and language processing.
My PhD advisor is Dr. Alex Rudnicky. My research interests are described below; you can also browse my Wiki. For the most part, I work on acoustic and language modeling tools for the CMU Sphinx system. I am also developing PocketSphinx, an open-source speech recognition system for embedded and handheld devices.
Publications
- "Combining Mixture Weight Pruning and Quantization for Small-Footprint Speech Recognition" David Huggins-Daines and Alexander I. Rudnicky. Proceedings of ICASSP-2009, Taipei, Taiwan, April 2009.
- "Mixture Pruning and Roughening for Scalable Acoustic Models." David Huggins-Daines and Alexander I. Rudnicky. Proceedings of ACL-08 Workshop on Mobile Language Processing, Columbus, OH, USA, June 2008.
- "Interactive ASR Error Correction for Touchscreen Devices." David Huggins-Daines and Alexander I. Rudnicky. Demo presented at ACL 2008, Columbus, OH, USA, June 2008.
- "Implicitly Supervised Language Model Adaptation for Meeting Transcription." David Huggins-Daines and Alexander I. Rudnicky. Proceedings of HLT-NAACL 2007, Rochester, NY, USA, May 2007.
- "Conquest - an Open-Source Dialog System for Conferences." Bohus, D., Grau, S., Huggins-Daines, D., Keri, V., Krishna, G., Kumar, R., Raux, A., and Tomko, S. Proceedings of HLT-NAACL 2007, Rochester, NY, USA, May 2007.
- "A Constrained Baum-Welch Algorithm for Improved Phoneme Segmentation and Efficient Training." David Huggins-Daines and Alexander I. Rudnicky. To appear in Proceedings of Interspeech 2006, Pittsburgh, USA, September 2006.
- "PocketSphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices." David Huggins-Daines, Mohit Kumar, Arthur Chan, Alan W Black, Mosur Ravishankar, and Alexander I. Rudnicky. In Proceedings of ICASSP 2006, Toulouse, France, May 2006.
- "Investigations on Ensemble Based Semi-Supervised Acoustic Model Training" Rong Zhang, Ziad Al Bawab, Arthur Chan, Ananlada Chotimongkol, David Huggins-Daines, and Alexander I. Rudnicky. In Proc. of Eurospeech 2005.
Code
Every once in a while I write some small piece of code that isn't good enough to end up in a public CVS/Subversion server somewhere, but might still be useful. You may be interested in the PocketSphinx with touch-correction demo for the Nokia N800 (see the YouTube video here). Or perhaps you are interested in a Phoneme Decoder written in Python, or an IAX (Asterisk Voice over IP) interface for the Communicator/Olympus/GALAXY dialog system. I also have some Perl modules on CPAN, which I assume still work though I haven't touched them in a while. I am a big fan of SciPy and NumPy these days.
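For a taste of what a phoneme decoder does, here is a toy Viterbi decoder in NumPy. This is only an illustrative sketch, not the code linked above: the phoneme inventory, transition matrix, and per-frame scores are all made-up values.

    import numpy as np

    def viterbi(log_trans, log_emit, log_init):
        """Find the most likely state sequence for a sequence of frames.

        log_trans: (N, N) log transition probabilities between states
        log_emit:  (T, N) per-frame log-likelihood of each state
        log_init:  (N,)   log initial state probabilities
        """
        T, N = log_emit.shape
        score = np.empty((T, N))
        back = np.zeros((T, N), dtype=int)
        score[0] = log_init + log_emit[0]
        for t in range(1, T):
            # cand[i, j] is the score of being in state j at time t
            # having come from state i at time t - 1.
            cand = score[t - 1][:, None] + log_trans
            back[t] = cand.argmax(axis=0)
            score[t] = cand.max(axis=0) + log_emit[t]
        # Backtrace from the best final state.
        path = [int(score[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    if __name__ == "__main__":
        phones = ["SIL", "AA", "B"]  # toy phoneme inventory
        rng = np.random.default_rng(0)
        log_trans = np.log(np.full((3, 3), 0.1) + 0.7 * np.eye(3))
        log_emit = np.log(rng.random((10, 3)))  # fake acoustic scores
        log_init = np.log([0.9, 0.05, 0.05])
        print([phones[s] for s in viterbi(log_trans, log_emit, log_init)])

A real decoder differs mainly in scale: it scores acoustic features against Gaussian mixture models and prunes the search space instead of evaluating every state at every frame.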
Research
Researchy things I'm interested in at the moment include but are by no means limited to:
- Distributed decoding algorithms for ASR and statistical machine translation (SMT). I'm interested in building low-power personal recording devices that collaborate over a wireless network to do speech recognition, information extraction, machine translation, or what have you, in a meeting room or lecture hall situation.
- Speech recognition of spontaneous bilingual speech involving code-switching. Aside from being an interesting test of multilingual speech recognition, this is potentially useful for large parts of the world where bilingualism (defined as being a native or near-native speaker of multiple languages) is the norm rather than the exception.
- Efficient algorithms for on-line speaker adaptation. I'm interested in strategies for performing vocal tract normalization and acoustic model adaptation based on sufficient statistics that can be collected and estimated with very little overhead compared to decoding alone (a toy sketch of the idea appears after this list).
- Methods for combining data from different channel conditions and sampling rates in acoustic model training, and for dealing with feature mismatches in decoding.
- Figuring out what speech recognition (and by extension statistical NLP) can learn from sociolinguistics and vice versa. It has often seemed to me that these research communities are "speaking the same language" in many ways, in that both accept variation as a fundamental fact of naturalistic spoken language, and both are interested in modeling it using statistical techniques. For further food for thought, William Labov's paper on Quantitative Reasoning in Linguistics may be of interest.
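To make the on-line adaptation point concrete, here is a toy sketch of about the cheapest possible case: running cepstral mean normalization, whose sufficient statistics (a decayed frame count and a running mean) cost only one vector update per frame on top of decoding. The feature dimension and decay constant are illustrative assumptions, not values from any particular system.

    import numpy as np

    class OnlineCMN:
        """Running cepstral mean estimate, updated one frame at a time."""

        def __init__(self, dim=13, decay=0.995):
            self.mean = np.zeros(dim)  # first-order sufficient statistic
            self.count = 0.0           # decayed frame count
            self.decay = decay

        def normalize(self, frame):
            # Update the sufficient statistics with exponential
            # forgetting, then subtract the current mean estimate.
            self.count = self.decay * self.count + 1.0
            self.mean += (frame - self.mean) / self.count
            return frame - self.mean

    if __name__ == "__main__":
        cmn = OnlineCMN()
        rng = np.random.default_rng(0)
        # Fake cepstra with a constant channel offset of 5.0.
        for frame in rng.normal(5.0, 1.0, size=(200, 13)):
            out = cmn.normalize(frame)
        print(out.mean())  # near zero once the channel mean is learned

The same pattern extends to richer statistics, for example the regression accumulators used in MLLR-style model adaptation, without changing the per-frame control flow.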
Last modified: Mon Oct 26 14:34:15 EDT 2009