11-756 / 18799D Design and Implementation of ASR Systems
11-756/18799D ASR: Assignment 1, Feature Computation
Problem
Write a routine for computing MFCC from audio
- Record multiple instances of digits
- Zero, One, Two etc.
- 16 kHz sampling, 16-bit PCM
- Compute log spectra and cepstra
- Use 40 Mel spectral filters. They must cover the frequencies between 50Hz and 7000Hz (you may use a different setting if you choose).
- No. of features = 13 for cepstra (use first 13 DCT coefficients)
- Visualize both as spectrograms (easy using MATLAB)
- Note similarity in different instances of the same word
- Change the number of filters to 30 and 25 (over the same frequency range).
- Patterns will remain, but be more blurry
- Record data with noise
- Degradation due to noise may be less severe in the 25-filter outputs
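The filterbank and cepstrum steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes a 512-point FFT, the common 2595*log10(1 + f/700) mel formula, triangular filters, and an unnormalized DCT-II. All function names are illustrative, and the input is assumed to be the power spectrum of one windowed frame.

```python
import math

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate, f_lo, f_hi):
    """Triangular filters, one row of weights per filter over n_fft//2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    step = (mel_hi - mel_lo) / (n_filters + 1)
    edges_hz = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_spectrum(power_spectrum, bank, floor=1e-10):
    """Log of the filter outputs; the floor avoids log(0)."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
            for row in bank]

def dct_ii(x, n_out):
    """First n_out coefficients of the unnormalized DCT-II of x."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

# Example: 40 filters over 50-7000 Hz for 16 kHz audio and a 512-point FFT.
bank = mel_filterbank(40, 512, 16000, 50.0, 7000.0)
flat = [1.0] * 257                    # placeholder power spectrum, not real data
cepstra = dct_ii(log_mel_spectrum(flat, bank), 13)
```

Changing the first argument to 30 or 25 gives the wider filters asked for above. The frequency range stays the same, so each filter averages over more FFT bins.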
Some suggestions
You are allowed to use code from the web
- The "wav2feat" code in CMU sphinx is good.
- Dan Ellis has nice MATLAB code on his website.
However, we recommend writing your own code if you can.
Regardless of what you use, the feature computation code must be integrated with the audio capture routine.
- Assume a key press (kbhit) starts the recording. The end of the recording is determined by automatic endpointing.
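One simple way to do the endpointing is an energy threshold with a hangover count. The sketch below is only an illustration, and the assignment does not prescribe this method. The function names, the 15 dB threshold, the 30-frame hangover, and the assumption that the first 10 frames contain only background noise are all hypothetical choices that would need tuning.

```python
import math

def frame_energy_db(frame):
    """Mean energy of one frame of PCM samples, in dB."""
    energy = sum(s * s for s in frame) / max(len(frame), 1)
    return 10.0 * math.log10(energy + 1e-10)

def find_endpoint(frames, threshold_db=15.0, hangover=30, noise_frames=10):
    """Return the index of the frame at which recording can stop, or None.

    Assumptions: the first `noise_frames` frames are background only; a frame
    counts as speech if it is `threshold_db` above the background level; the
    utterance ends after `hangover` consecutive non-speech frames that follow
    at least one speech frame.
    """
    energies = [frame_energy_db(f) for f in frames]
    if len(energies) <= noise_frames:
        return None
    background = sum(energies[:noise_frames]) / noise_frames
    seen_speech = False
    silent_run = 0
    for i in range(noise_frames, len(energies)):
        if energies[i] > background + threshold_db:
            seen_speech = True
            silent_run = 0
        elif seen_speech:
            silent_run += 1
            if silent_run >= hangover:
                return i
    return None
```

In a live capture loop the same logic would run frame by frame as audio arrives, rather than on a stored list of frames.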
How to visualize the spectrogram represented by cepstra
The Mel-log spectrum can be directly visualized as a matrix.
However, the cepstrum is a dimensionality-reduced and transformed version of the log spectrum, so it is not visually meaningful. The truncated cepstrum can be converted back to a log spectrum by zero-padding it to 64 or 128 points and computing an inverse DCT (if you used a DCT to derive the cepstra from the log spectra). The IDCT-derived log spectrum is what the cepstrum really represents.
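The zero-pad-and-invert step can be sketched as follows. This is a minimal sketch, assuming the cepstra came from an unnormalized DCT-II. The function names are illustrative. Dividing by the number of filters rather than by the padded length is a scaling choice made here so that the reconstructed values stay on the scale of the original log spectrum.

```python
import math

def dct_ii(x, n_out):
    """First n_out coefficients of the unnormalized DCT-II of x."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def cepstrum_to_log_spectrum(cepstrum, n_filters, n_points=64):
    """Zero-pad the cepstrum to n_points and apply the matching inverse DCT.

    n_filters is the length of the log spectrum the cepstrum was computed
    from; it is used only to undo the scaling of the unnormalized DCT-II.
    """
    padded = list(cepstrum) + [0.0] * (n_points - len(cepstrum))
    smooth = []
    for j in range(n_points):
        t = (j + 0.5) / n_points          # position along the frequency axis
        acc = padded[0]
        for k in range(1, n_points):
            acc += 2.0 * padded[k] * math.cos(math.pi * k * t)
        smooth.append(acc / n_filters)
    return smooth

# Example: a smooth 40-point log spectrum survives truncation to 13 cepstra.
log_spec = [3.0 + math.cos(2.0 * math.pi * (i + 0.5) / 40) for i in range(40)]
cepstra = dct_ii(log_spec, 13)
smooth = cepstrum_to_log_spectrum(cepstra, n_filters=40, n_points=64)
```

Stacking one such reconstructed vector per frame gives a matrix that can be displayed the same way as the Mel-log spectrogram.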
Due: Wednesday, 8 Feb 2011.