A partial list of papers and theses from the
CMU Robust Speech Group
- Robust
Speech Group, Carnegie Mellon University
Ph.D. Theses
- Anjali Menon, Robust Recognition
Of Binaural Speech Signals Using Techniques Based On Human Auditory
Processing, February, 2019.
- Mark J. Harvilla, Compensation
for Nonlinear Distortion in Noise for Robust Speech Recognition,
Ph.D. Thesis, ECE, CMU, October, 2014.
- Amir Moghimi, Array-Based
Spectro-Temporal Masking for Automatic Speech Recognition,
Ph.D. Thesis, ECE, CMU, April, 2014.
- Griffin Romigh, Individualized
Head-Related Transfer Functions: Efficient Modeling and Estimation
from Small Sets of Spatial Samples, Ph.D. Thesis, ECE, CMU,
December, 2012.
- Kshitiz Kumar, A
Spectro-Temporal Framework for Compensation of Reverberation for
Speech Recognition, Ph.D. Thesis, ECE, CMU, February,
2011.
- Chanwoo Kim, Signal
Processing for Robust Speech Recognition Motivated by Auditory
Processing, Ph.D. Thesis, LTI, CMU, September, 2010.
- Lingyun Gu, Single-Channel Speech
Separation Based on Instantaneous Frequency, Ph.D. Thesis,
LTI, CMU, May, 2010.
- Yu-Hsiang Bosco Chiu, Learning-Based
Auditory Encoding for Robust Speech Recognition, Ph.D.
Thesis, ECE Department, CMU, April, 2010.
- Ziad Al Bawab, An
Analysis-by-Synthesis Approach to Vocal Tract Modeling for Robust
Speech Recognition, Ph.D. Thesis, ECE Department, CMU,
September, 2009.
- Xiang Li, Combination and
Generation of Parallel Feature Streams for Improved Speech
Recognition, Ph.D. Thesis, ECE Department, CMU, February
2005.
- Jon P. Nedel, Duration
Normalization for Robust Recognition of Spontaneous Speech via
Missing Feature Methods, Ph.D. Thesis, ECE Department, CMU,
April, 2004.
- Michael L. Seltzer, Microphone
Array Processing for Robust Speech Recognition, Ph.D.
Thesis, ECE Department, CMU, July 2003.
- Sam-Joo Doh, Enhancements to
Transformation-Based Speaker Adaptation: Principal Component and
Inter-Class Maximum Likelihood Linear Regression, Ph.D.
Thesis, ECE Department, CMU, July 2000.
- Juan M. Huerta, Robust Speech
Recognition in GSM Codec Environments, Ph.D. Thesis, ECE
Department, CMU, April 2000.
- Bhiksha Raj, Reconstruction of
Incomplete Spectrograms for Robust Speech Recognition (.pdf 1.3MB),
Ph.D. Thesis, ECE Department, CMU, April 2000.
- Matthew A. Siegler, Integration
of Continuous Speech Recognition and Information Retrieval for
Mutually Optimal Performance, Ph.D. Thesis, ECE Department,
CMU, December 1999.
- Evandro B. Gouvea, Acoustic-Feature-Based
Frequency Warping for Speaker Normalization, Ph.D. Thesis,
ECE Department, CMU, February 1999.
- Thomas M. Sullivan, Multi-Microphone
Correlation-Based Processing for Robust Automatic Speech Recognition
(2.2MB), (PDF format) Ph.D.
Thesis, ECE Department, CMU, August 1996. (Compressed,
0.7MB) (Abstract)
- Pedro J. Moreno, Speech Recognition
in Noisy Environments (1.3MB), (PDF
format) Ph.D. Thesis, ECE Department, CMU, May 1996. (Compressed,
0.5MB) (Abstract)
- Fu-Hua Liu, Environmental
Adaptation for Robust Speech Recognition (2.3MB), Ph.D.
Thesis, ECE Department, CMU, June 1994. (abstract)
- Yoshiaki Ohshima, Environmental
Robustness in Speech Recognition using Physiologically-Motivated
Signal Processing, Ph.D. Thesis, ECE Department, CMU,
December 1993. (abstract)
- William A. Rozzi, Speaker
Adaptation in Automatic Speech Recognition via Estimation of
Correlated Mean Vectors (2MB), Ph.D. Thesis, ECE Department,
CMU, May 1991. (Compressed, 0.6MB)
(abstract)
- Alejandro Acero, Acoustical and
Environmental Robustness for Automatic Speech Recognition (.pdf,
1.3MB), Ph.D. Thesis, ECE Department, CMU, September 1990. (abstract)
MS Reports
- Balakrishnan Narayanaswamy, Improved
Text-Independent Speaker Recognition using Gaussian Mixture
Probabilities, Master's Report, ECE Department, CMU, May
2005.
- Michael Seltzer, Automatic
Detection of Corrupted Speech Features for Robust Speech Recognition,
ECE Department, CMU, May 2000.
- Jon Nedel, Integration of Speech
and Video: Applications for Lip Synch: Lip Movement Synthesis and
Time Warping, Master's Report, ECE Department, CMU, May
1999.
- Uday Jain, Connected Digit Recognition
over Long Distance Telephone Lines Using the SPHINX-II System,
Master's Report, ECE Department, CMU, May 1995. (abstract)
- Matthew Siegler, Effects of
Speech Rate on Speech Recognition Accuracy, Master's Report,
ECE Department, CMU, December 1995.
- Pedro J. Moreno, Speech
Recognition in Telephone Environments, Master's Report, ECE
Department, CMU, January 1993.
Papers and Talks
2020
- T. Vuong, Y. Xia, and R. M. Stern, "Learnable
Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination,"
Interspeech 2020, October 2020, Shanghai, China
- R. M. Stern and A. J. Menon, "Binaural
Technology for Machine Speech Recognition and Understanding,"
in The Technology of Binaural Understanding, J. Blauert and J.
Braasch, Eds., Springer-Verlag.
2019
2018
2017
- V. Mitra, H. Franco, R. Stern, J. Van Hout, L. Ferrer, M. Graciarena,
W. Wang, D. Vergyri, A. Alwan, J.H.L. Hansen, “Robust features in Deep
Learning based Speech Recognition,” in New Era for Robust Speech
Recognition: Exploiting Deep Learning, S. Watanabe, M. Delcroix,
F. Metze, & J. Hershey (eds.), Springer, in press. (preliminary
version).
- F. de la Calle Silos and R. M. Stern, “Synchrony-Based
Feature Extraction for Robust Speech Recognition," IEEE
Signal Processing Letters, 24:1158-1162.
- A. Menon, C. Kim, and R. M. Stern, "Robust
Speech Recognition Based on Binaural Auditory Processing," Interspeech
2017, August 2017, Stockholm, Sweden.
- A. Menon, C. Kim, U. Kurokawa, and R. M. Stern, (2017), “Binaural
Processing for Robust Recognition of Degraded Speech,” IEEE
Automatic Speech Recognition and Understanding Workshop, December 2017,
Naha, Okinawa, Japan.
2016
- C. Kim and R. M. Stern, Power-Normalized
Cepstral Coefficients (PNCC) for Robust Speech Recognition, IEEE
Trans. on Audio, Speech, and Language Processing, 24:1315-1329.
- B. J. Cho, H. Kwon, J.-W. Cho, C. Kim, R. M. Stern, and H.-M. Park, A
Subband-Based Stationary-Component Suppression Method Using Harmonics
and Power Ratio for Reverberant Speech Recognition, IEEE
Signal Processing Letters, 23:780-784.
- R. M. Stern, C. Kim, A. R. Moghimi, A. Menon, Binaural
Technology and Automatic Speech Recognition, International
Congress on Acoustics, September 2016, Buenos Aires, Argentina.
2015
- G. D. Romigh, D. S. Brungart, R. M. Stern, and B. D.
Simpson, Efficient Real Spherical
Harmonic Representation of Head-Related Transfer Functions, IEEE
Journal on Selected Topics in Signal Processing, 9:921-930,
August 2015.
- M. J. Harvilla and R. M. Stern, Efficient
audio declipping using regularized least squares, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
April 2015, Brisbane, Australia.
- M. J. Harvilla and R. M. Stern, Robust
parameter estimation for audio declipping in noise, Interspeech
2015, September 2015, Dresden, Germany.
- K. Osako, R. Singh, and B. Raj, Complex
Recurrent Neural Networks for Denoising Speech Signals, IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics
(WASPAA), New Paltz, New York.
2014
- A. Moghimi and R. M. Stern, An
Analysis of Binaural Spectro-Temporal Masking as Nonlinear
Beamforming, IEEE International Conference on Acoustics,
Speech, and Signal Processing, May 2014, Florence, Italy.
- M. J. Harvilla and R. M. Stern, Least
squares signal declipping for robust speech recognition, Interspeech
2014, September 2014, Singapore.
- A. R. Moghimi, B. Raj, and R. M. Stern, Post-masking:
A hybrid approach to array processing for speech recognition, Interspeech
2014, September 2014, Singapore.
- C. Kim, K. K. Chin, M. Bacchiani, and R. M.
Stern, Robust speech recognition
using temporal masking and thresholding algorithm, Interspeech
2014, September 2014, Singapore.
2013
- H. Hermansky, J. R. Cohen, and R. M. Stern, Perceptual
properties of current speech recognition technology, Proc.
IEEE, 101:1969-1985, September 2013.
- R. M. Stern and N. Morgan, Features
based on auditory physiology and perception, Chapter in Techniques
for Noise Robustness in Speech Recognition, T. Virtanen, B. Raj,
and R. Singh, Eds., pp. 193-227. (page proofs)
- M. J. Harvilla and R. M. Stern, Recognition
of speech enhanced by blind compensation for artifacts of
single-sideband modulation, (unpublished) 2013.
2012
- Y.-H. B. Chiu, B. Raj, and R. M. Stern, Learning-based
auditory encoding for robust speech recognition, IEEE Trans.
on Audio, Speech, and Language Processing, 20:900-914,
March, 2012.
- R. M. Stern and N. Morgan, Hearing
is believing: Biologically-inspired feature extraction for robust
automatic speech recognition, IEEE Signal Processing Magazine,
29:34-43, November, 2012.
- M. J. Harvilla and R. M. Stern, Histogram-Based
Subband Power Warping and Spectral Averaging for Robust Speech
Recognition under Matched and Multistyle Training, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
March 2012, Kyoto, Japan.
- C. Kim and R. M. Stern, Power-Normalized
Cepstral Coefficients (PNCC) for Robust Speech Recognition, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
March 2012, Kyoto, Japan.
- C. Kim, C. Khawand, and R. M. Stern, Two-Microphone
Source Separation Algorithm Based on Statistical Modeling of Angle
Distributions, IEEE International Conference on Acoustics,
Speech, and Signal Processing, March 2012, Kyoto, Japan.
2011
- R. Stern, Applying
physiologically-motivated models of auditory processing to automatic
speech recognition, Third International Symposium on Auditory
and Audiological Research, August 2011, Nyborg, Denmark.
- C. Kim, K. Kumar, and R. M. Stern, Binaural
sound source separation motivated by auditory processing, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2011, Prague, Czech Republic.
- K. Kumar, C. Kim, and R. M. Stern, Delta-spectral
cepstral coefficients for robust speech recognition, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2011, Prague, Czech Republic.
- K. Kumar, B. Raj, R. Singh, and R. M. Stern, An
iterative least-squares technique for dereverberation, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2011, Prague, Czech Republic.
- K. Kumar, R. Singh, B. Raj, and R. M. Stern, Gammatone
sub-band magnitude-domain dereverberation, IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2011, Prague, Czech Republic.
- W. Kim and R. M. Stern, "Mask
classification for missing-feature reconstruction for robust speech
recognition," Speech Communication, 53:1-11,
January, 2011.
2010
- Z. Al Bawab, B. Raj, and R. M. Stern, "A
hybrid physical and statistical dynamic articulatory framework
incorporating analysis-by-synthesis for improved phone classification,"
IEEE International Conference on Acoustics, Speech, and Signal
Processing, March 2010, Dallas, Texas.
- Y.-H. B. Chiu, B. Raj, and R. M. Stern, "Learning-based
auditory encoding for robust speech recognition," IEEE
International Conference on Acoustics, Speech, and Signal Processing,
March 2010, Dallas, Texas.
- C. Kim and R. M. Stern, "Feature
extraction for robust speech recognition based on maximizing the
sharpness of the power distribution and on power flooring," IEEE
International Conference on Acoustics, Speech, and Signal Processing,
March 2010, Dallas, Texas.
- K. Kumar and R. M. Stern, "Maximum-likelihood-based
cepstral inverse filtering for blind speech dereverberation," IEEE
International Conference on Acoustics, Speech, and Signal Processing,
March 2010, Dallas, Texas.
- C. Kim, R. M. Stern, K. Eom, and J. Lee, "Automatic
selection of thresholds for signal separation algorithms based on
interaural delay," Interspeech 2010, September 2010, Makuhari,
Japan.
- C. Kim and R. M. Stern, "Nonlinear
enhancement of onset for robust speech recognition," Interspeech
2010, September 2010, Makuhari, Japan.
2009
- H.-M. Park and R. M. Stern, "Spatial
separation of speech signals using amplitude estimation based on
interaural comparisons of zero crossings," Speech
Communication, 51:15-25, January 2009.
- Y.-H. B. Chiu and R. M. Stern, "Minimum
variance modulation filters for robust speech recognition," IEEE
International Conference on Acoustics, Speech, and Signal Processing,
April 2009, Taipei, Taiwan.
- Z. Al Bawab, L. Turicchia, R. M. Stern, and B. Raj, "Deriving
vocal tract shapes from electromagnetic articulograph data via
geometric adaptation and matching," Interspeech 2009,
September 2009, Brighton, United Kingdom.
- L. Buera, A. Miguel, A. Ortega, E. Lleida, and R. Stern, "Unsupervised
training scheme with non-stereo data for empirical feature vector
compensation," Interspeech 2009, September 2009, Brighton,
United Kingdom.
- Y.-H. B. Chiu, B. Raj, and R. M. Stern, "Toward
fusion of feature extraction and acoustic model training: a top-down
process for robust speech recognition," Interspeech 2009,
September 2009, Brighton, United Kingdom.
- L. Gu and R. M. Stern, "Speaker
segmentation and clustering for simultaneously-presented speech,"
Interspeech 2009, September 2009, Brighton, United Kingdom.
- C. Kim, K. Kumar, B. Raj, and R. M. Stern, "Signal
separation for robust speech recognition based on phase difference
information obtained in the frequency domain," Interspeech
2009, September 2009, Brighton, United Kingdom.
- C. Kim and R. M. Stern, "Feature
extraction for robust speech recognition using a power-law
nonlinearity and power-bias subtraction," Interspeech 2009,
September 2009, Brighton, United Kingdom.
- C. Kim and R. M. Stern, "Power
Function-Based Power Distribution Normalization Algorithm for Robust
Speech Recognition," IEEE Automatic Speech Recognition and
Understanding Workshop, December 2009, Merano, Italy.
- C. Kim and R. M. Stern, "Robust
Speech Recognition using a Small Power Boosting Algorithm," IEEE
Automatic Speech Recognition and Understanding Workshop, December
2009, Merano, Italy.
2008
- R. Stern, E. Gouvea, C. Kim, K. Kumar, and H.-M. Park, “Binaural
and multiple-microphone signal processing motivated by auditory
perception,” HSCMA Joint Workshop on Hands-free Speech
Communication and Microphone Arrays, May 2008, Trento, Italy.
- Z. Al Bawab, B. Raj, and R. M. Stern, “Analysis-by-synthesis
features for speech recognition,” IEEE International
Conference on Acoustics, Speech, and Signal Processing, April
2008, Las Vegas, Nevada.
- L. Gu and R. M. Stern, “Single-channel
speech separation based on modulation frequency,” IEEE
International Conference on Acoustics, Speech, and Signal Processing,
April 2008, Las Vegas, Nevada.
- K. Kumar, and R. M. Stern, “Environment-invariant
compensation for reverberation using linear post-filtering for minimum
distortion,” IEEE International Conference on Acoustics,
Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
- Y.-H. Chiu and R. M. Stern, "Analysis
of physiologically-motivated signal processing for robust speech
recognition," Interspeech 2008, September 2008, Brisbane,
Australia.
- C. Kim and R. M. Stern, "Robust
Signal-to-Noise Ratio Estimation Based on Waveform Amplitude
Distribution Analysis," Interspeech 2008, September 2008, Brisbane,
Australia.
2007
- H.-M. Park and R. M. Stern, “Missing-feature
speech recognition using dereverberation and echo suppression in
reverberant environments,” IEEE International Conference on
Acoustics, Speech, and Signal Processing, April 2007, Honolulu,
Hawaii.
- K. Kumar, T. Chen, and R. M. Stern, “Profile
view lip reading,” IEEE International Conference on
Acoustics, Speech, and Signal Processing, April 2007, Honolulu,
Hawaii.
- R. M. Stern, E. Gouvea, and G. Thattai, “‘Polyaural’
array processing for automatic speech recognition in degraded
environments,” Proc. Interspeech 2007, August 2007,
Antwerp, Belgium.
2006
- M. L. Seltzer and R. M. Stern, “Subband
Likelihood-Maximizing Beamforming for Speech Recognition in
Reverberant Environments,” IEEE Trans. on Audio, Speech, and
Language Processing, 14(6): 2109-2121, November 2006.
- R. M. Stern, DeL. Wang, and G. Brown, “Binaural
sound localization,” Chapter in Computational Auditory Scene
Analysis, G. Brown and DeL. Wang, Eds., Wiley/IEEE Press, 2006.
- R. M. Stern, C. Trahiotis, and A. Ripepi, “Fluctuations
in amplitude and frequency enable interaural delays to foster the
identification of speech-like stimuli,” Chapter in Dynamics
of Speech Production and Perception, P. Divenyi et al.,
Eds., IOS Press, 2006.
- H.-M. Park and R. M. Stern, “Spatial
separation of speech signals using continuously-variable masks
estimated from comparisons of zero crossings,” IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2006, Toulouse, France.
- W. Kim and R. M. Stern, “Band-independent
mask estimation for missing-feature reconstruction,” IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2006, Toulouse, France.
- C. Kim, Y.-H. Chiu, and R. M. Stern, “Physiologically-motivated
synchrony-based processing for robust automatic speech recognition,”
Interspeech 2006, September 2006, Pittsburgh, Pennsylvania.
- B. Narayanaswamy, R. Gangadharaiah, and R. M. Stern, “Voting
for two speaker segmentation,” Interspeech 2006, September
2006, Pittsburgh, Pennsylvania.
2005
- B. Raj and R. M. Stern, “Missing-Feature
Methods for Robust Automatic Speech Recognition,” IEEE Signal
Processing Magazine, 22(5):101-116, September
2005.
- N.S. Kim, W. Lim, and R. M. Stern, “Feature compensation based on
switching linear dynamic model,” IEEE Signal Processing Letters,
12 (6): 473-476, June, 2005.
- W. Kim, R. M. Stern, and H. Ko, "Environment-Independent
Mask Estimation for Missing Feature Reconstruction," Proc.
Eurospeech-2005 September, 2005, Lisbon, Portugal.
2004
- B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction
of Missing Features for Robust Speech Recognition,” Speech
Communication Journal 43(4): 275-296, September 2004.
- M. L. Seltzer, B. Raj, and R. M. Stern, “A
Bayesian Framework for Spectrographic Mask Estimation for Missing
Feature Speech Recognition,” Speech Communication Journal 43(4):
379-393, September 2004.
- M. L. Seltzer, B. Raj, and R. M. Stern, “Likelihood-Maximizing
Beamforming for Robust Hands-Free Speech Recognition,” IEEE
Trans. on Speech and Audio Processing, 12(5): 489-498,
September 2004.
- R. M. Stern, “Signal
Separation Motivated by Human Auditory Perception: Applications to
Automatic Speech Recognition,” in Speech Separation by Humans
and Machines, P. Divenyi, Ed., Springer-Verlag, 2004.
- Y. Obuchi, N. Hataoka, and R. M. Stern, "Normalization
of Time-Derivative Parameters for Robust Speech Recognition in Small
Devices," IEICE Transactions on Information and Systems 87-D(4):
1004-1011, April 2004.
- X. Li and R. M. Stern, “Feature
Generation Based on Maximum Normalized Acoustic Likelihood for
Improved Speech Recognition,” IEEE International Conference on
Acoustics, Speech, and Signal Processing, May 2004, Montreal,
Quebec.
- B. Raj, R. Singh, and R. M. Stern, “On
Tracking Noise with Linear Dynamical System Models,” IEEE
International Conference on Acoustics, Speech, and Signal Processing,
May 2004, Montreal, Quebec.
- M. L. Seltzer and R. M. Stern, “Parameter
Sharing in Subband Likelihood-Maximizing Beamforming for Speech
Recognition using Microphone Arrays,” IEEE International
Conference on Acoustics, Speech, and Signal Processing, May 2004,
Montreal, Quebec.
- X. Li and R. M. Stern, "Parallel Feature
Generation Based on Maximum Normalized Acoustic Likelihood for
Improved Combination Performance," International Conference on
Spoken Language Processing, October, 2004, Jeju Island, Korea.
2003
- B. Raj and R. Singh, "Classifier-Based
Non-Linear Projection for Adaptive Endpointing of Continuous Speech,"
Computer Speech and Language 17(1):5-26, January 2003.
- M. L. Seltzer, and B. Raj, "Speech
Recognizer Based Filter Optimization for Microphone Array Processing",
IEEE Signal Processing Letters 10(3):69-71, March 2003.
- M. Seltzer and R. Stern, “Subband
Parameter Optimization of Microphone Arrays for Speech Recognition in
Reverberant Environments,” IEEE International Conference on
Acoustics, Speech, and Signal Processing, April 2003, Hong Kong.
- X. Li and R. Stern, “Training of
Stream Weights for the Decoding of Speech using Parallel Feature
Streams,” IEEE International Conference on Acoustics, Speech,
and Signal Processing, April 2003, Hong Kong.
- X. Li and R. M. Stern, “Feature
Generation Based on Maximum Classification Probability for Improved
Speech Recognition," Proc. Eurospeech-2003 September,
2003, Geneva, Switzerland.
- J. P. Nedel and R. M. Stern, “Duration
Normalization and Hypothesis Combination for Improved Spontaneous
Speech Recognition,” Proc. Eurospeech-2003 September,
2003, Geneva, Switzerland.
- Y. Obuchi and R. M. Stern, “Normalization
of Time-Derivative Parameters using Histogram Equalization," Proc.
Eurospeech-2003 September, 2003, Geneva, Switzerland.
2002
- R. Singh, B. Raj, and R. M. Stern, "Automatic
Generation of Subword Units for Speech Recognition Systems," IEEE
Transactions on Speech and Audio Processing, 10(2): 89-99,
2002.
- R. Singh, R. M. Stern, and B. Raj, “Signal and Feature Compensation
Methods for Robust Speech Recognition,” Chapter in CRC Handbook on
Noise Reduction in Speech Applications, Gillian Davis, Ed. CRC
Press, 2002.
- R. Singh, B. Raj, and R. M. Stern, “Model Compensation and Matched
Condition Methods for Robust Speech Recognition,” Chapter in CRC
Handbook on Noise Reduction in Speech Applications, Gillian Davis,
Ed. CRC Press, 2002.
- M. L. Seltzer, B. Raj, and R. M. Stern, “Speech
Recognizer-Based Microphone Array Processing for Robust Hands-Free
Speech Recognition,” Proc. IEEE Conf. on Acoustics, Speech,
and Sig. Proc., May, 2002, Orlando, Florida.
- X. Li, R. Singh, and R. M. Stern, "Lattice
Combination for Improved Speech Recognition," Proc. of the
International Conference of Spoken Language Processing, September,
2002, Denver, Colorado.
2001
- J. M. Huerta and R. M. Stern. "Distortion-Class
Modeling for Robust Speech Recognition under GSM RPE-LTP Coding,”
Speech Communication Journal, 34:213-225.
- R. Singh, M. L. Seltzer, B. Raj, and R. M. Stern, “Speech
in Noisy Environments: Robust Automatic Segmentation, Feature
Extraction, and Hypothesis Combination,” Proc. IEEE Conf. on
Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City,
Utah.
- J. P. Nedel and R. M. Stern, “Duration
Normalization for Improved Recognition of Spontaneous and Read Speech
via Missing Feature Methods,” Proc. IEEE Conf. on Acoustics,
Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
- D. P. W. Ellis, R. Singh, and S. Sivadas, “Tandem
Acoustic Modeling in Large-Vocabulary Recognition,” Proc. IEEE
Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake
City, Utah.
- M. L. Seltzer and B. Raj, "Calibration
of Microphone Arrays for Improved Speech Recognition," Proc.
Eurospeech-2001 September, 2001, Aalborg, Denmark.
- B. Raj, M. L. Seltzer, and R. M. Stern, “Robust
Speech Recognition: The Case for Restoring Missing Features,” Proc.
of the Workshop on Consistent and Reliable Acoustic Cues,
September, 2001, Aalborg, Denmark.
2000
- S.-J. Doh and R. M. Stern, “Using
Class Weighting in Inter-Class MLLR,” Proc. of the
International Conference of Spoken Language Processing, October,
2000, Beijing, China.
- J. M. Huerta and R. M. Stern, “Instantaneous
Distortion-Based Weighted Acoustic Modeling for Robust Recognition of
Coded Speech,” Proc. of the International Conference of Spoken
Language Processing, October, 2000, Beijing, China.
- J. P. Nedel, R. Singh, and R. M. Stern, “Automatic
Subword Unit Refinement for Spontaneous Speech Recognition via
Phoneword Splitting,” Proc. of the International Conference of
Spoken Language Processing, October, 2000, Beijing, China.
- J. P. Nedel, R. Singh, and R. M. Stern, “Phone
Transition Acoustic Modeling: Application to Speaker Independent and
Spontaneous Speech Systems,” Proc. of the International
Conference of Spoken Language Processing, October, 2000, Beijing,
China.
- B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction
of Damaged Spectrographic Features for Robust Speech Recognition,”
Proc. of the International Conference of Spoken Language Processing,
October, 2000, Beijing, China.
- M. L. Seltzer, B. Raj, and R. M. Stern, “Classifier-Based
Mask Estimation for Missing Feature Methods of Robust Speech
Recognition,” Proc. of the International Conference of Spoken
Language Processing, October, 2000, Beijing, China.
- R. Singh, B. Raj, and R. M. Stern, “Structured
Redefinition of Sound Units by Merging and Splitting for Improved
Speech Recognition,” Proc. of the International Conference of
Spoken Language Processing, October, 2000, Beijing, China.
- S.-J. Doh and R. M. Stern, “Inter-Class
MLLR for Speaker Adaptation,” Proc. IEEE Conf. on Acoustics,
Speech, and Sig. Proc., June, 2000, Istanbul, Turkey. (Poster)
- R. Singh, B. Raj, and R. M. Stern, “Automatic
Generation of Phone Sets and Lexical Transcriptions,” Proc.
IEEE Conf. on Acoustics, Speech, and Sig. Proc., June, 2000,
Istanbul, Turkey.
- M. Ravishankar, R. Singh, B. Raj, R. M. Stern, "The
1999 CMU 10X Real Time Broadcast News Transcription System,” Proc.
NIST Speech Transcription Workshop, May, 2000, College Park,
Maryland.
1999
- S.-J. Doh and R. M. Stern, "Weighted
principal component MLLR for speaker adaptation," Proc. of
Automatic Speech Recognition and Understanding Workshop (ASRU 99),
Colorado, USA, 1999. (Poster)
- R. Singh, B. Raj and R. M. Stern, "Automatic
Clustering And Generation of Contextual Questions For Tied States In
Hidden Markov Models," Proc. of the ICASSP., Phoenix,
Arizona, March, 1999.
- J. M. Huerta and R. M. Stern, "Distortion-Class
Weighted Acoustic Modeling for Robust Recognition under GSM RPE-LTP
Coding", Proc. of the International Symposium on Robust Speech
Recognition, Tampere, Finland, June, 1999.
- R. Singh, B. Raj, and R. M. Stern, “Domain
Adduced State Tying for Cross-domain Acoustic Modelling,” Proc.
Eurospeech-99, September, 1999, Budapest, Hungary.
- J. M. Huerta, S. J. Chen, and R. M. Stern, “The
1998 Carnegie Mellon University Sphinx-3 Spanish Broadcast News
Transcription System", Proc. of the DARPA Broadcast News
Transcription and Understanding Workshop, March, 1999, Herndon,
Virginia.
1998
- P. J. Moreno, B. Raj, and R. M. Stern. “Data-Driven
Environmental Compensation for Speech Recognition: A Unified Approach,”
Speech Communication, 24: 267-85, 1998.
- J. M. Huerta and R. M. Stern, "Speech
Recognition From GSM Codec Parameters," Proc. of the
International Conference on Spoken Language Processing, Sydney,
Australia, November, 1998.
- B. Raj, R. Singh, and R. M. Stern, "Inference
of Missing Spectrographic Features for Robust Speech Recognition,"
Proc. of the International Conference on Spoken Language Processing,
Sydney, Australia, November, 1998.
1997
- R. M. Stern, B. Raj, and P. J. Moreno, (1997). “Compensation
for Environmental Degradation in Automatic Speech Recognition,” Proc.
of the ESCA Tutorial and Research Workshop on Robust Speech
Recognition for Unknown Communication Channels, April, 1997,
Pont-à-Mousson, France, pp. 33-42.
- M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, "Automatic
Segmentation, Classification and Clustering of Broadcast News Audio,"
Proc. of the Speech Recognition Workshop (DARPA), Chantilly, VA,
Feb. 1997.
- J. M. Huerta, E. Thayer, M. Ravishankar, and R. M. Stern, “The
Development of the 1997 CMU Spanish Broadcast News Transcription
System,” Proc. of the DARPA Broadcast News Transcription and
Understanding Workshop, February, 1998, Lansdowne, Virginia.
- E. Gouvêa, and R. M. Stern, "Speaker
Normalization Through Formant-Based Warping Of The Frequency Scale,"
Proc. of the EUROSPEECH, 1997.
- B. Raj, E. Gouvêa, and R. M. Stern, "Vector
Polynomial Approximations For Robust Speech Recognition," Proc.
of the ESCA Tutorial and Research Workshop on Robust Speech
Recognition for Unknown Communication Channels, Pont-à-Mousson,
France, April, 1997.
- B. Raj, V. N. Parikh, and R. M. Stern, "The
Effects Of Background Music On Speech Recognition Accuracy," Proc.
of the ICASSP, Munich, Germany, April 1997.
- J. M. Huerta and R. M. Stern, “Compensation
for Environmental and Speaker Variability by Normalization of Pole
Locations,” Proc. Eurospeech-97, September, 1997, Rhodes,
Greece.
1996
- R. M. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, “Signal
Processing for Robust Speech Recognition,” Chapter in Speech
Recognition, pp. 351-378, C.-H. Lee and F. Soong, Eds., Boston:
Kluwer Academic Publishers, 1996.
- P. J. Moreno, B. Raj, and R. M. Stern, "A
Vector Taylor Series Approach For Environment-Independent Speech
Recognition," Proc. of the ICASSP, Atlanta, GA, May 1996.
- B. Raj, E. Gouvêa, P. J. Moreno, and R. M. Stern, "Cepstral
Compensation By Polynomial Approximation For Environment-Independent
Speech Recognition," Proc. of the ICSLP, Philadelphia,
PA, Oct. 1996.
- E. B. Gouvea, P. J. Moreno, B. Raj, T. M. Sullivan, and R. M. Stern, “Adaptation
and Compensation: Approaches To Microphone And Speaker Independence In
Automatic Speech Recognition,” Proceedings of the ARPA Workshop on
Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D.
Pallett, Ed.
- U. Jain, M. A. Siegler, S.-J. Doh, E. Gouvea, P. J. Moreno, B. Raj,
and R. M. Stern, “Recognition Of
Continuous Broadcast News With Multiple Unknown Speakers And
Environments,” Proceedings of the ARPA Workshop on Speech
Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.
1995
- P. J. Moreno, B. Raj, E. Gouvêa, and R. M. Stern, "Multivariate-Gaussian-Based
Cepstral Normalization for Robust Speech Recognition," Proc.
of the ICASSP, Detroit, Michigan, 1995.
- M. A. Siegler, and R. M. Stern, "On the
Effects of Speech Rate in Large Vocabulary Speech Recognition
Systems," Proc. of the ICASSP, Detroit, Michigan, 1995.
- P. J. Moreno, B. Raj, R. M. Stern, “A
Unified Approach to Robust Speech Recognition,” Proc. of
Eurospeech-95, Madrid, Spain, September, 1995.
- P. J. Moreno, M. A. Siegler, U. Jain, and R. M. Stern, "Continuous
Speech Recognition of Large Vocabulary Telephone Quality Speech,"
Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.
- P. J. Moreno, U. Jain, B. Raj, and R. M. Stern, "Approaches
to Microphone Independence in Automatic Speech Recognition," Proc.
of the Eighth Spoken Language Systems Technology Workshop, 1995.
- P. J. Moreno, B. Raj, and R. M. Stern, "Approaches
to Environment Compensation in Automatic Speech Recognition," Proc.
15th International Conference on Acoustics, Trondheim, Norway,
Vol. III, pp. 109-112, June, 1995.
- Stern, R. M. and Sullivan, T. M. “Robust
Speech Recognition Based on Human Binaural Perception,” Proc.
of the ATR workshop on A Biological Framework for Speech Perception
and Production, Kansai Science City, September, 1994, Reprinted
as ATR Technical Report TR-H-121, (1995).
1994
- F.-H. Liu, R. M. Stern, A. Acero, and P. J. Moreno, "Environment
Normalization for Robust Speech Recognition using Direct Cepstral
Comparison," Proc. of the ICASSP, Adelaide, Australia,
1994.
- P. J. Moreno, and R. M. Stern, "Sources
of Degradation of Speech Recognition in the Telephone Network," Proc.
of the ICASSP, Adelaide, Australia, 1994.
- R. M. Stern, F.-H. Liu, P. J. Moreno, and A. Acero, "Signal
Processing for Robust Speech Recognition," Proc. of the
International Conference on Spoken Language Processing, Yokohama,
Japan, September, 1994.
- N. Hanai, and R. M. Stern, "Robust
Speech Recognition in the Automobile," Proc. of the
International Conference on Spoken Language Processing, Yokohama,
Japan, September, 1994.
- Y. Ohshima and R. M. Stern, “Environmental
Robustness in Automatic Speech Recognition Using
Physiologically-Motivated Signal Processing,” Proc. of the
International Conference on Spoken Language Processing, Yokohama,
Japan, September, 1994.
- F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal
Processing For Robust Speech Recognition,” Proceedings of the
Seventh ARPA Workshop on Human Language Technology, Princeton, New
Jersey, Morgan Kaufmann, C. J. Weinstein, Ed.
- F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal
Processing For Robust Speech Recognition,” Proceedings of the
ARPA Workshop on Spoken Language Technology, Princeton, New
Jersey, March, 1994, R. M. Stern, Ed.
1993
- T. M. Sullivan and R. M. Stern, "Multi-Microphone
Correlation-Based Processing for Robust Speech Recognition," Proc.
of the ICASSP, Minneapolis, Minnesota, April, 1993.
- F.-H. Liu, R. M. Stern, X. Huang, and A. Acero, "Efficient
Cepstral Normalization For Robust Speech Recognition," Proc.
of the Sixth ARPA Workshop on Human Language Technology, Princeton,
NJ, Morgan Kaufmann, March, 1993.
1992
- R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, "Multiple
Approaches to Robust Speech Recognition," Proc. of the Fifth
DARPA Speech and Natural Language Workshop, Harriman, New York,
February, 1992.
- F.-H. Liu, A. Acero, and R. M. Stern, "Efficient
Joint Compensation of Speech for the Effects of Additive Noise and
Linear Filtering," Proc. of the ICASSP, San Francisco,
CA, March, 1992.
- R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, "Multiple
Approaches to Robust Speech Recognition," Proc. of the ICSLP,
1992.
1991
- A. Acero, and R. M. Stern, "Robust Speech
Recognition by Normalization of the Acoustic Space," Proc. of
the ICASSP, Toronto, Ontario, 1991.
- W. A. Rozzi and R. M. Stern, “Fast
Estimation of Mean Vectors using Adaptive Filtering,” Proc. of
the IEEE International Conference on Acoustics, Speech, and Signal
Processing, Toronto, Ontario, pp. 865-868, 1991.
1990
- A. Acero, and R. M. Stern, "Environmental
Robustness in Automatic Speech Recognition," Proc. of the
ICASSP, Albuquerque, New Mexico, 1990.
- A. Acero, and R. M. Stern, “Toward
Microphone-Independent Spoken Language Systems,” Proceedings
of the DARPA Speech and Natural Language Workshop, Hidden Valley,
PA, R. M. Stern, Ed., Morgan Kaufmann Publishers, Inc., San Mateo, CA,
1990.
- A. Acero, and R. M. Stern, “Acoustical
Pre-Processing for Robust Spoken Language Systems,” Proc.
First International Conference on Spoken Language Processing, pp.
1121-1124, Kobe, Japan, November, 1990.
- D. A. Coast, R. M. Stern, G. G. Cano, and S. A. Briller, "An
Approach to Cardiac Arrhythmia Analysis Using Hidden Markov Models,"
IEEE Transactions on Biomedical Engineering, September, 1990.
"Classic" robust papers (pre-1990)
Original description of extended maximum a posteriori
probability (EMAP) speaker adaptation:
- R. M. Stern and M. J. Lasry, “Dynamic
Speaker Adaptation for Feature-Based Isolated Letter Recognition,”
IEEE Trans. on Acoustics, Speech, and Signal Processing 35:
751-763, 1987.
- M. J. Lasry and R. M. Stern, “A Posteriori Estimation of Correlated
Jointly Gaussian Mean Vectors,” IEEE Trans. on Pattern Anal. and
Mach. Intel. 6: 530-535, 1984.
- M. J. Lasry and R. M. Stern, “Unsupervised Adaptation to New Speakers
in Feature-Based Letter Recognition,” Proc. IEEE Conf. on Acoustics,
Speech, and Sig. Proc., San Diego, California, May, 1984.
- R. M. Stern and M. J. Lasry (1983). “Dynamic Speaker Adaptation for
Isolated Letter Recognition Using MAP Estimation,” Proc. IEEE Conf.
on Acoustics, Speech, and Sig. Proc., Boston, Massachusetts, May,
1983.