A partial list of papers and theses from the
CMU Robust Speech Group

- Robust Speech Group, Carnegie Mellon University

Ph.D. Theses

Anjali Menon, Robust Recognition Of Binaural Speech Signals Using Techniques Based On Human Auditory Processing, February, 2019.
Mark J. Harvilla, Compensation for Nonlinear Distortion in Noise for Robust Speech Recognition, Ph.D. Thesis, ECE, CMU, October, 2014.
Amir Moghimi, Array-Based Spectro-Temporal Masking for Automatic Speech Recognition, Ph.D. Thesis, ECE, CMU, April, 2014.
Griffin Romigh, Individualized Head-Related Transfer Functions: Efficient Modeling and Estimation from Small Sets of Spatial Samples, Ph.D. Thesis, ECE, CMU, December, 2012.
Kshitiz Kumar, A Spectro-Temporal Framework for Compensation of Reverberation for Speech Recognition, Ph.D. Thesis, ECE, CMU, February, 2011.
Chanwoo Kim, Signal Processing for Robust Speech Recognition Motivated by Auditory Processing, Ph.D. Thesis, LTI, CMU, September, 2010.
Lingyun Gu, Single-Channel Speech Separation Based on Instantaneous Frequency, Ph.D. Thesis, LTI, CMU, May, 2010.
Yu-Hsiang Bosco Chiu, Learning-Based Auditory Encoding for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, April, 2010.
Ziad Al Bawab, An Analysis-by-Synthesis Approach to Vocal Tract Modeling for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, September, 2009.
Xiang Li, Combination and Generation of Parallel Feature Streams for Improved Speech Recognition, Ph.D. Thesis, ECE Department, CMU, February 2005.
Jon P. Nedel, Duration Normalization for Robust Recognition of Spontaneous Speech via Missing Feature Methods, Ph.D. Thesis, ECE Department, CMU, April, 2004.
Michael L. Seltzer, Microphone Array Processing for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, July 2003.
Sam-Joo Doh, Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression, Ph.D. Thesis, ECE Department, CMU, July 2000.
Juan M. Huerta, Robust Speech Recognition in GSM Codec Environments, Ph.D. Thesis, ECE Department, CMU, April 2000.
Bhiksha Raj, Reconstruction of Incomplete Spectrograms for Robust Speech Recognition (.pdf 1.3MB), Ph.D. Thesis, ECE Department, CMU, April 2000.
Matthew A. Siegler, Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance, Ph.D. Thesis, ECE Department, CMU, December 1999.
Evandro B. Gouvea, Acoustic-Feature-Based Frequency Warping for Speaker Normalization, Ph.D. Thesis, ECE Department, CMU, February 1999.
Thomas M. Sullivan, Multi-Microphone Correlation-Based Processing for Robust Automatic Speech Recognition (2.2MB), (PDF format) Ph.D. Thesis, ECE Department, CMU, August 1996. (Compressed, 0.7MB) (Abstract)
Pedro J. Moreno, Speech Recognition in Noisy Environments (1.3MB), (PDF format) Ph.D. Thesis, ECE Department, CMU, May 1996. (Compressed, 0.5MB) (Abstract)
Fu-Hua Liu, Environmental Adaptation for Robust Speech Recognition (2.3MB), Ph.D. Thesis, ECE Department, CMU, June 1994. (abstract)
Yoshiaki Ohshima, Environmental Robustness in Speech Recognition using Physiologically-Motivated Signal Processing, Ph.D. Thesis, ECE Department, CMU, December 1993. (abstract)
William A. Rozzi, Speaker Adaptation in Automatic Speech Recognition via Estimation of Correlated Mean Vectors (2MB), Ph.D. Thesis, ECE Department, CMU, May 1991. (Compressed, 0.6MB) (abstract)
Alejandro Acero, Acoustical and Environmental Robustness for Automatic Speech Recognition (.pdf, 1.3MB), Ph.D. Thesis, ECE Department, CMU, September 1990. (abstract)

MS Reports

Balakrishnan Narayanaswamy, Improved Text-Independent Speaker Recognition using Gaussian Mixture Probabilities, Master's Report, ECE Department, CMU, May 2005.
Michael Seltzer, Automatic Detection of Corrupted Speech Features for Robust Speech Recognition, ECE Department, CMU, May 2000.
Jon Nedel, Integration of Speech and Video: Applications for Lip Synch: Lip Movement Synthesis and Time Warping, Master's Report, ECE Department, CMU, May 1999.
Uday Jain, Connected Digit Recognition over Long Distance Telephone Lines Using the SPHINX-II System, Master's Report, ECE Department, CMU, May 1995. (abstract)
Matthew Siegler, Effects of Speech Rate on Speech Recognition Accuracy, Master's Report, ECE Department, CMU, December 1995.
Pedro J. Moreno, Speech Recognition in Telephone Environments, Master's Report, ECE Department, CMU, January 1993.

Papers and Talks

2020

T. Vuong, Y. Xia, and R. M. Stern, "Learnable Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination," Interspeech 2020, October 2020, Shanghai, China
R. M. Stern and A. J. Menon, "Binaural Technology for Machine Speech Recognition and Understanding," in The Technology of Binaural Understanding, J. Blauert and J. Braasch, Eds., Springer-Verlag.

2019

A. J. Menon, C. Kim, and R. M. Stern, "Robust Recognition of Reverberant and Noisy Speech using Coherence-Based Processing," IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2019, Brighton, United Kingdom

2018

Y. Xia and R. M. Stern, "A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement," Interspeech 2018, September 2018, Hyderabad, India.

2017

V. Mitra, H. Franco, R. Stern, J. Van Hout, L. Ferrer, M. Graciarena, W. Wang, D. Vergyri, A. Alwan, and J.H.L. Hansen, “Robust features in Deep Learning based Speech Recognition,” in New Era for Robust Speech Recognition: Exploiting Deep Learning, S. Watanabe, M. Delcroix, F. Metze, & J. Hershey (eds.), Springer, in press. (preliminary version).
F. de la Calle Silos and R. M. Stern, “Synchrony-Based Feature Extraction for Robust Speech Recognition,” IEEE Signal Processing Letters, 24:1158-1162.
A. Menon, C. Kim, and R. M. Stern, "Robust Speech Recognition Based on Binaural Auditory Processing," Interspeech 2017, August 2017, Stockholm, Sweden.
A. Menon, C. Kim, U. Kurokawa, and R. M. Stern, (2017), “Binaural Processing for Robust Recognition of Degraded Speech,” IEEE Automatic Speech Recognition and Understanding Workshop, December 2017, Naha, Okinawa, Japan.

2016

C. Kim and R. M. Stern, Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition, IEEE Trans. on Audio, Speech, and Language Processing, 24:1315-1329.
B. J. Cho, H. Kwon, J.-W. Cho, C. Kim, R. M. Stern, and H.-M. Park, A Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant Speech Recognition, IEEE Signal Processing Letters, 23:780-784.
R. M. Stern, C. Kim, A. R. Moghimi, A. Menon, Binaural Technology and Automatic Speech Recognition, International Congress on Acoustics, September 2016, Buenos Aires, Argentina.

2015

G. D. Romigh, D. S. Brungart, R. M. Stern, and B. D. Simpson, Efficient Real Spherical Harmonic Representation of Head-Related Transfer Functions, IEEE Journal on Selected Topics in Signal Processing, 9:921-930, August 2015.
M. J. Harvilla and R. M. Stern, Efficient audio declipping using regularized least squares, IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2015, Brisbane, Australia.
M. J. Harvilla and R. M. Stern, Robust parameter estimation for audio declipping in noise, Interspeech 2015, September 2015, Dresden, Germany.
K. Osako, R. Singh, and B. Raj, Complex Recurrent Neural Networks for Denoising Speech Signals, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York.

2014

A. Moghimi and R. M. Stern, An Analysis of Binaural Spectro-Temporal Masking as Nonlinear Beamforming, IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2014, Florence, Italy.
M. J. Harvilla and R. M. Stern, Least squares signal declipping for robust speech recognition, Interspeech 2014, September 2014, Singapore.
A. R. Moghimi, B. Raj, and R. M. Stern, Post-masking: A hybrid approach to array processing for speech recognition, Interspeech 2014, September 2014, Singapore.
C. Kim, K. K. Chin, M. Bacchiani, and R. M. Stern, Robust speech recognition using temporal masking and thresholding algorithm, Interspeech 2014, September 2014, Singapore.

2013

H. Hermansky, J. R. Cohen, and R. M. Stern, Perceptual properties of current speech recognition technology, Proc. IEEE, 101:1969-1985, September 2013.
R. M. Stern and N. Morgan, Features based on auditory physiology and perception, Chapter in Techniques for Noise Robustness in Speech Recognition, T. Virtanen, B. Raj, and R. Singh, Eds., pp. 193-227. (page proofs)
M. J. Harvilla and R. M. Stern, Recognition of speech enhanced by blind compensation for artifacts of single-sideband modulation, (unpublished) 2013.

2012

Y.-H. B. Chiu, B. Raj, and R. M. Stern, Learning-based auditory encoding for robust speech recognition, IEEE Trans. on Audio, Speech, and Language Processing, 20:900-914, March, 2012.
R. M. Stern and N. Morgan, Hearing is believing: Biologically-inspired feature extraction for robust automatic speech recognition, IEEE Signal Processing Magazine, 29:34-43, November, 2012.
M. J. Harvilla and R. M. Stern, Histogram-Based Subband Power Warping and Spectral Averaging for Robust Speech Recognition under Matched and Multistyle Training, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2012, Kyoto, Japan.
C. Kim and R. M. Stern, Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2012, Kyoto, Japan.
C. Kim, C. Khawand, and R. M. Stern, Two-Microphone Source Separation Algorithm Based on Statistical Modeling of Angle Distributions, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2012, Kyoto, Japan.

2011

R. Stern, Applying physiologically-motivated models of auditory processing to automatic speech recognition, Third International Symposium on Auditory and Audiological Research, August 2011, Nyborg, Denmark.
C. Kim, K. Kumar, and R. M. Stern, Binaural sound source separation motivated by auditory processing, IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2011, Prague, Czech Republic.
K. Kumar, C. Kim, and R. M. Stern, Delta-spectral cepstral coefficients for robust speech recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2011, Prague, Czech Republic.
K. Kumar, B. Raj, R. Singh, and R. M. Stern, An iterative least-squares technique for dereverberation, IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2011, Prague, Czech Republic.
K. Kumar, R. Singh, B. Raj, and R. M. Stern, Gammatone sub-band magnitude-domain dereverberation, IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2011, Prague, Czech Republic.
W. Kim and R. M. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition," Speech Communication, 53:1-11, January, 2011.

2010

Z. Al Bawab, B. Raj, and R. M. Stern, "A hybrid physical and statistical dynamic articulatory framework incorporating analysis-by-synthesis for improved phone classification," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010, Dallas, Texas.
Y.-H. B. Chiu, B. Raj, and R. M. Stern, "Learning-based auditory encoding for robust speech recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010, Dallas, Texas.
C. Kim and R. M. Stern, "Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010, Dallas, Texas.
K. Kumar and R. M. Stern, "Maximum-likelihood-based cepstral inverse filtering for blind speech dereverberation," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010, Dallas, Texas.
C. Kim, R. M. Stern, K. Eom, and J. Lee, "Automatic selection of thresholds for signal separation algorithms based on interaural delay," Interspeech 2010, September 2010, Makuhari, Japan.
C. Kim and R. M. Stern, "Nonlinear enhancement of onset for robust speech recognition," Interspeech 2010, September 2010, Makuhari, Japan.

2009

H.-M. Park and R. M. Stern, "Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero crossings," Speech Communication, 51:15-25, January 2009.
Y.-H. B. Chiu and R. M. Stern, "Minimum variance modulation filters for robust speech recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2009, Taipei, Taiwan.
Z. Al Bawab, L. Turicchia, R. M. Stern, and B. Raj, "Deriving vocal tract shapes from electromagnetic articulograph data via geometric adaptation and matching," Interspeech 2009, September 2009, Brighton, United Kingdom.
L. Buera, A. Miguel, A. Ortega, E. Lleida, and R. Stern, "Unsupervised training scheme with non-stereo data for empirical feature vector compensation," Interspeech 2009, September 2009, Brighton, United Kingdom.
Y.-H. B. Chiu, B. Raj, and R. M. Stern, "Toward fusion of feature extraction and acoustic model training: a top-down process for robust speech recognition," Interspeech 2009, September 2009, Brighton, United Kingdom.
L. Gu and R. M. Stern, "Speaker segmentation and clustering for simultaneously-presented speech," Interspeech 2009, September 2009, Brighton, United Kingdom.
C. Kim, K. Kumar, B. Raj, and R. M. Stern, "Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain," Interspeech 2009, September 2009, Brighton, United Kingdom.
C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction," Interspeech 2009, September 2009, Brighton, United Kingdom.
C. Kim and R. M. Stern, "Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition," IEEE Automatic Speech Recognition and Understanding Workshop, December 2009, Merano, Italy.
C. Kim and R. M. Stern, "Robust Speech Recognition using a Small Power Boosting Algorithm," IEEE Automatic Speech Recognition and Understanding Workshop, December 2009, Merano, Italy.

2008

R. Stern, E. Gouvea, C. Kim, K. Kumar, and H.-M. Park, “Binaural and multiple-microphone signal processing motivated by auditory perception,” HSCMA Joint Workshop on Hands-free Speech Communication and Microphone Arrays, May 2008, Trento, Italy.
Z. Al Bawab, B. Raj, and R. M. Stern, “Analysis-by-synthesis features for speech recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
L. Gu and R. M. Stern, “Single-channel speech separation based on modulation frequency,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
K. Kumar, and R. M. Stern, “Environment-invariant compensation for reverberation using linear post-filtering for minimum distortion,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
Y.-H. Chiu and R. M. Stern, "Analysis of physiologically-motivated signal processing for robust speech recognition," Interspeech 2008, September 2008, Brisbane, Australia.
C. Kim and R. M. Stern, "Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis," Interspeech 2008, September 2008, Brisbane, Australia.

2007

H.-M. Park and R. M. Stern, “Missing-feature speech recognition using dereverberation and echo suppression in reverberant environments,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2007, Honolulu, Hawaii.
K. Kumar, T. Chen, and R. M. Stern, “Profile view lip reading,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2007, Honolulu, Hawaii.
R. M. Stern, E. Gouvea, and G. Thattai, “‘Polyaural’ array processing for automatic speech recognition in degraded environments,” Proc. Interspeech 2007, August 2007, Antwerp, Belgium.

2006

M. L. Seltzer and R. M. Stern, “Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments,” IEEE Trans. on Audio, Speech, and Language Processing, 14(6): 2109-2121, November 2006.
R. M. Stern, DeL. Wang, and G. Brown, “Binaural sound localization,” Chapter in Computational Auditory Scene Analysis, G. Brown and DeL. Wang, Eds., Wiley/IEEE Press, 2006.
R. M. Stern, C. Trahiotis, and A. Ripepi, “Fluctuations in amplitude and frequency enable interaural delays to foster the identification of speech-like stimuli,” Chapter in Dynamics of Speech Production and Perception, P. Divenyi et al., Eds., IOS Press, 2006.
H.-M. Park and R. M. Stern, “Spatial separation of speech signals using continuously-variable masks estimated from comparisons of zero crossings,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, Toulouse, France.
W. Kim and R. M. Stern, “Band-independent mask estimation for missing-feature reconstruction,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, Toulouse, France.
C. Kim, Y.-H. Chiu, and R. M. Stern, “Physiologically-motivated synchrony-based processing for robust automatic speech recognition,” Interspeech 2006, September 2006, Pittsburgh, Pennsylvania.
B. Narayanaswamy, R. Gangadharaiah, and R. M. Stern, “Voting for two speaker segmentation,” Interspeech 2006, September 2006, Pittsburgh, Pennsylvania.

2005

B. Raj and R. M. Stern, “Missing-Feature Methods for Robust Automatic Speech Recognition,” IEEE Signal Processing Magazine, 22(5):101-116, September 2005.
N.S. Kim, W. Lim, and R. M. Stern, “Feature compensation based on switching linear dynamic model,” IEEE Signal Processing Letters, 12 (6): 473-476, June, 2005.
W. Kim, R. M. Stern, and H. Ko, "Environment-Independent Mask Estimation for Missing Feature Reconstruction," Proc. Eurospeech-2005, September, 2005, Lisbon, Portugal.

2004

B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction of Missing Features for Robust Speech Recognition,” Speech Communication Journal 43(4): 275-296, September 2004.
M. L. Seltzer, B. Raj, and R. M. Stern, “A Bayesian Framework for Spectrographic Mask Estimation for Missing Feature Speech Recognition,” Speech Communication Journal 43(4): 379-393, September 2004.
M. L. Seltzer, B. Raj, and R. M. Stern, “Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition,” IEEE Trans. on Speech and Audio Processing, 12(5): 489-498, September 2004.
R. M. Stern, “Signal Separation Motivated by Human Auditory Perception: Applications to Automatic Speech Recognition,” in Speech Separation by Humans and Machines, P. Divenyi, Ed., Springer-Verlag, 2004.
Y. Obuchi, N. Hataoka, and R. M. Stern, "Normalization of Time-Derivative Parameters for Robust Speech Recognition in Small Devices," IEICE Transactions on Information and Systems 87-D(4): 1004-1011, April 2004.
X. Li and R. M. Stern, “Feature Generation Based on Maximum Normalized Acoustic Likelihood for Improved Speech Recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
B. Raj, R. Singh, and R. M. Stern, “On Tracking Noise with Linear Dynamical System Models,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
M. L. Seltzer and R. M. Stern, “Parameter Sharing in Subband Likelihood-Maximizing Beamforming for Speech Recognition using Microphone Arrays,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
X. Li and R. M. Stern, "Parallel Feature Generation Based on Maximum Normalized Acoustic Likelihood for Improved Combination Performance," International Conference on Spoken Language Processing, October, 2004, Jeju Island, Korea.

2003

B. Raj and R. Singh, "Classifier-Based Non-Linear Projection for Adaptive Endpointing of Continuous Speech," Computer Speech and Language 17(1):5-26, January 2003.
M. L. Seltzer, and B. Raj, "Speech Recognizer Based Filter Optimization for Microphone Array Processing", IEEE Signal Processing Letters 10(3):69-71, March 2003.
M. Seltzer and R. Stern, “Subband Parameter Optimization of Microphone Arrays for Speech Recognition in Reverberant Environments,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2003, Hong Kong.
X. Li and R. Stern, “Training of Stream Weights for the Decoding of Speech using Parallel Feature Streams,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2003, Hong Kong.
X. Li and R. M. Stern, “Feature Generation Based on Maximum Classification Probability for Improved Speech Recognition,” Proc. Eurospeech-2003, September, 2003, Geneva, Switzerland.
J. P. Nedel and R. M. Stern, “Duration Normalization and Hypothesis Combination for Improved Spontaneous Speech Recognition,” Proc. Eurospeech-2003, September, 2003, Geneva, Switzerland.
Y. Obuchi and R. M. Stern, “Normalization of Time-Derivative Parameters using Histogram Equalization,” Proc. Eurospeech-2003, September, 2003, Geneva, Switzerland.

2002

R. Singh, B. Raj, and R. M. Stern, "Automatic Generation of Subword Units for Speech Recognition Systems," IEEE Transactions on Speech and Audio Processing, 10(2): 89-99, 2002.
R. Singh, R. M. Stern, and B. Raj, “Signal and Feature Compensation Methods for Robust Speech Recognition,” Chapter in CRC Handbook on Noise Reduction in Speech Applications, Gillian Davis, Ed. CRC Press, 2002.
R. Singh, B. Raj, and R. M. Stern, “Model Compensation and Matched Condition Methods for Robust Speech Recognition,” Chapter in CRC Handbook on Noise Reduction in Speech Applications, Gillian Davis, Ed. CRC Press, 2002.
M. L. Seltzer, B. Raj, and R. M. Stern, “Speech Recognizer-Based Microphone Array Processing for Robust Hands-Free Speech Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2002, Orlando, Florida.
X. Li, R. Singh, and R. M. Stern, "Lattice Combination for Improved Speech Recognition," Proc. of the International Conference of Spoken Language Processing, September, 2002, Denver, Colorado.

2001

J. M. Huerta and R. M. Stern, “Distortion-Class Modeling for Robust Speech Recognition under GSM RPE-LTP Coding,” Speech Communication Journal, 34:213-225.
R. Singh, M. L. Seltzer, B. Raj, and R. M. Stern, “Speech in Noisy Environments: Robust Automatic Segmentation, Feature Extraction, and Hypothesis Combination,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
J. P. Nedel and R. M. Stern, “Duration Normalization for Improved Recognition of Spontaneous and Read Speech via Missing Feature Methods,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
D. P. W. Ellis, R. Singh, and S. Sivadas, “Tandem Acoustic Modeling in Large-Vocabulary Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
M. L. Seltzer and B. Raj, "Calibration of Microphone Arrays for Improved Speech Recognition," Proc. Eurospeech-2001, September, 2001, Aalborg, Denmark.
B. Raj, M. L. Seltzer, and R. M. Stern, “Robust Speech Recognition: The Case for Restoring Missing Features,” Proc. of the Workshop on Consistent and Reliable Acoustic Cues, September, 2001, Aalborg, Denmark.

2000

S.-J. Doh and R. M. Stern, “Using Class Weighting in Inter-Class MLLR,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. M. Huerta and R. M. Stern, “Instantaneous Distortion-Based Weighted Acoustic Modeling for Robust Recognition of Coded Speech,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. P. Nedel, R. Singh, and R. M. Stern, “Automatic Subword Unit Refinement for Spontaneous Speech Recognition via Phoneword Splitting,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. P. Nedel, R. Singh, and R. M. Stern, “Phone Transition Acoustic Modeling: Application to Speaker Independent and Spontaneous Speech Systems,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction of Damaged Spectrographic Features for Robust Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
M. L. Seltzer, B. Raj, and R. M. Stern, “Classifier-Based Mask Estimation for Missing Feature Methods of Robust Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
R. Singh, B. Raj, and R. M. Stern, “Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
S.-J. Doh and R. M. Stern, “Inter-Class MLLR for Speaker Adaptation,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., June, 2000, Istanbul, Turkey. (Poster)
R. Singh, B. Raj, and R. M. Stern, “Automatic Generation of Phone Sets and Lexical Transcriptions,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., June, 2000, Istanbul, Turkey.
M. Ravishankar, R. Singh, B. Raj, R. M. Stern, "The 1999 CMU 10X Real Time Broadcast News Transcription System,” Proc. NIST Speech Transcription Workshop, May, 2000, College Park, Maryland.

1999

S.-J. Doh and R. M. Stern, "Weighted principal component MLLR for speaker adaptation," Proc. of Automatic Speech Recognition and Understanding Workshop (ASRU 99), Colorado, USA, 1999. (Poster)
R. Singh, B. Raj and R. M. Stern, "Automatic Clustering And Generation of Contextual Questions For Tied States In Hidden Markov Models," Proc. of the ICASSP., Phoenix, Arizona, March, 1999.
J. M. Huerta and R. M. Stern, "Distortion-Class Weighted Acoustic Modeling for Robust Recognition under GSM RPE-LTP Coding", Proc. of the International Symposium on Robust Speech Recognition, Tampere, Finland, June, 1999.
R. Singh, B. Raj, and R. M. Stern, “Domain Adduced State Tying for Cross-domain Acoustic Modelling,” Proc. Eurospeech-99, September, 1999, Budapest, Hungary.
J. M. Huerta, S. J. Chen, and R. M. Stern, “The 1998 Carnegie Mellon University Sphinx-3 Spanish Broadcast News Transcription System", Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, March, 1999, Herndon, Virginia.

1998

P. J. Moreno, B. Raj, and R. M. Stern, “Data-Driven Environmental Compensation for Speech Recognition: A Unified Approach,” Speech Communication, 24:267-85, 1998.
J. M. Huerta and R. M. Stern, "Speech Recognition From GSM Codec Parameters," Proc. of the International Conference on Spoken Language Processing, Sydney, Australia, November, 1998.
B. Raj, R. Singh, and R. M. Stern, "Inference of Missing Spectrographic Features for Robust Speech Recognition," Proc. of the International Conference on Spoken Language Processing, Sydney, Australia, November, 1998.

1997

R. M. Stern, B. Raj, and P. J. Moreno, (1997). “Compensation for Environmental Degradation in Automatic Speech Recognition,” Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, April, 1997, Pont-à-Mousson, France, pp. 33-42.
M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, "Automatic Segmentation, Classification and Clustering of Broadcast News Audio," Proc. of the Speech Recognition Workshop (DARPA), Chantilly, VA, Feb. 1997.
J. M. Huerta, E. Thayer, M. Ravishankar, and R. M. Stern, “The Development of the 1997 CMU Spanish Broadcast News Transcription System,” Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, February, 1998, Lansdowne, Virginia.
E. Gouvêa, and R. M. Stern, "Speaker Normalization Through Formant-Based Warping Of The Frequency Scale," Proc. of the EUROSPEECH, 1997.
B. Raj, E. Gouvêa, and R. M. Stern, "Vector Polynomial Approximations For Robust Speech Recognition," Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-à-Mousson, France, April, 1997.
B. Raj, V. N. Parikh, and R. M. Stern, "The Effects Of Background Music On Speech Recognition Accuracy," Proc. of the ICASSP, Munich, Germany, April 1997.
J. M. Huerta and R. M. Stern, “Compensation for Environmental and Speaker Variability by Normalization of Pole Locations,” Proc. Eurospeech-97, September, 1997, Rhodes, Greece.

1996

R. M. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, “Signal Processing for Robust Speech Recognition,” Chapter in Speech Recognition, pp. 351-378, C.-H. Lee and F. Soong, Eds., Boston: Kluwer Academic Publishers, 1996.
P. J. Moreno, B. Raj, and R. M. Stern, "A Vector Taylor Series Approach For Environment-Independent Speech Recognition," Proc. of the ICASSP, Atlanta, GA, May 1996.
B. Raj, E. Gouvêa, P. J. Moreno, and R. M. Stern, "Cepstral Compensation By Polynomial Approximation For Environment-Independent Speech Recognition," Proc. of the ICSLP, Philadelphia, PA, Oct. 1996.
E. B. Gouvea, P. J. Moreno, B. Raj, T. M. Sullivan, and R. M. Stern, “Adaptation and Compensation: Approaches To Microphone And Speaker Independence In Automatic Speech Recognition,” Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.
U. Jain, M. A. Siegler, S.-J. Doh, E. Gouvea, P. J. Moreno, B. Raj, and R. M. Stern, “Recognition Of Continuous Broadcast News With Multiple Unknown Speakers And Environments,” Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.

1995

P. J. Moreno, B. Raj, E. Gouvêa, and R. M. Stern, "Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition," Proc. of the ICASSP, Detroit, Michigan, 1995.
M. A. Siegler, and R. M. Stern, "On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems," Proc. of the ICASSP, Detroit, Michigan, 1995.
P. J. Moreno, B. Raj, R. M. Stern, “A Unified Approach to Robust Speech Recognition,” Proc. of Eurospeech-95, Madrid, Spain, September, 1995.
P. J. Moreno, M. A. Siegler, U. Jain, and R. M. Stern, "Continuous Speech Recognition of Large Vocabulary Telephone Quality Speech," Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.
P. J. Moreno, U. Jain, B. Raj, and R. M. Stern, "Approaches to Microphone Independence in Automatic Speech Recognition," Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.
P. J. Moreno, B. Raj, and R. M. Stern, "Approaches to Environment Compensation in Automatic Speech Recognition," Proc. 15th International Conference on Acoustics, Trondheim, Norway, Vol. III, pp. 109-112, June, 1995.
Stern, R. M. and Sullivan, T. M. “Robust Speech Recognition Based on Human Binaural Perception,” Proc. of the ATR workshop on A Biological Framework for Speech Perception and Production, Kansai Science City, September, 1994, Reprinted as ATR Technical Report TR-H-121, (1995).

1994

F.-H. Liu, R. M. Stern, A. Acero, and P. J. Moreno, "Environment Normalization for Robust Speech Recognition using Direct Cepstral Comparison," Proc. of the ICASSP, Adelaide, Australia, 1994.
P. J. Moreno, and R. M. Stern, "Sources of Degradation of Speech Recognition in the Telephone Network," Proc. of the ICASSP, Adelaide, Australia, 1994.
R. M. Stern, F.-H. Liu, P. J. Moreno, and A. Acero, "Signal Processing for Robust Speech Recognition," Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
N. Hanai, and R. M. Stern, "Robust Speech Recognition in the Automobile," Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
Y. Ohshima and R. M. Stern, “Environmental Robustness in Automatic Speech Recognition Using Physiologically-Motivated Signal Processing,” Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal Processing For Robust Speech Recognition,” Proceedings of the Seventh ARPA Workshop on Human Language Technology, Princeton, New Jersey, Morgan Kaufmann, C. J. Weinstein, Ed.
F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal Processing For Robust Speech Recognition,” Proceedings of the ARPA Workshop on Spoken Language Technology, Princeton, New Jersey, March, 1994, R. M. Stern, Ed.

1993

T. M. Sullivan and R. M. Stern, "Multi-Microphone Correlation-Based Processing for Robust Speech Recognition," Proc. of the ICASSP, Minneapolis, Minnesota, April, 1993.
F.-H. Liu, R. M. Stern, X. Huang, and A. Acero, "Efficient Cepstral Normalization For Robust Speech Recognition," Proc. of the Sixth ARPA Workshop on Human Language Technology, Princeton, NJ, Morgan Kaufmann, March, 1993.

1992

R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, "Multiple Approaches to Robust Speech Recognition," Proc. of the Fifth DARPA Speech and Natural Language Workshop, Harriman, New York, February, 1992.
F.-H. Liu, A. Acero, and R. M. Stern, "Efficient Joint Compensation of Speech for the Effects of Additive Noise and Linear Filtering," Proc. of the ICASSP, San Francisco, CA, March, 1992.
R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, "Multiple Approaches to Robust Speech Recognition," Proc. of the ICSLP, 1992.

1991

A. Acero and R. M. Stern, "Robust Speech Recognition by Normalization of the Acoustic Space," Proc. of the ICASSP, Toronto, Ontario, 1991.
W. A. Rozzi and R. M. Stern, “Fast Estimation of Mean Vectors using Adaptive Filtering,” Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Ontario, pp. 865-868, 1991.

1990

A. Acero and R. M. Stern, "Environmental Robustness in Automatic Speech Recognition," Proc. of the ICASSP, Albuquerque, New Mexico, 1990.
A. Acero and R. M. Stern, “Toward Microphone-Independent Spoken Language Systems,” Proceedings of the DARPA Speech and Natural Language Workshop, Hidden Valley, PA, R. M. Stern, Ed., Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1990.
A. Acero and R. M. Stern, “Acoustical Pre-Processing for Robust Spoken Language Systems,” Proc. First International Conference on Spoken Language Processing, pp. 1121-1124, Kobe, Japan, November, 1990.
D. A. Coast, R. M. Stern, G. G. Cano, and S. A. Briller, "An Approach to Cardiac Arrhythmia Analysis Using Hidden Markov Models," IEEE Transactions on Biomedical Engineering, September, 1990.

"Classic" robust papers (pre-1990)

Original description of extended maximum a posteriori probability (EMAP) speaker adaptation:

R. M. Stern and M. J. Lasry, “Dynamic Speaker Adaptation for Feature-Based Isolated Letter Recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing 35: 751-763, 1987.
M. J. Lasry and R. M. Stern, “A Posteriori Estimation of Correlated Jointly Gaussian Mean Vectors,” IEEE Trans. on Pattern Anal. and Mach. Intel. 6: 530-535, 1984.
M. J. Lasry and R. M. Stern, “Unsupervised Adaptation to New Speakers in Feature-Based Letter Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., San Diego, California, May, 1984.
R. M. Stern and M. J. Lasry, “Dynamic Speaker Adaptation for Isolated Letter Recognition Using MAP Estimation,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., Boston, Massachusetts, May, 1983.
