Exercise 3
Preprocessing with JRTk

Introduction

The goal of Exercise 3 is to investigate how to preprocess a speech signal so that a speech recognition engine can handle it properly. This first processing stage of a speech recognition engine is called preprocessing (or front end). As we learnt in the first session, in Janus this is done by the FeatureSet object.

Read the introduction of the FeatureSet manual to become more familiar with the FeatureSet object. Also read the section about the feature description file format.

Most of the FeatureSet methods are described in more detail in the section Tcl methods. You may find them useful for solving the upcoming tasks and answering the questions.

Question 7-1 Compare the plots of the spectrum and of the mel-scale filterbank coefficients. What are the units of the plots' axes? Which feature of a speech signal is represented in the mel-scale filterbank but not in the spectrum? Which of these two 'features' is more appropriate as input for a speech recognition engine? Please give at least two reasons.
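As a reference for the frequency axis of the mel-scale plot, the sketch below uses the widely used analytic form mel(f) = 2595 log10(1 + f/700). This is an assumption for illustration only: JRTk's adc2mel may use a different mel formula, filter count or filter shape. The sketch shows that filters spaced equally on the mel scale have center frequencies that spread further apart in Hz as frequency increases.

```python
import math


def hz_to_mel(f_hz):
    """Common analytic mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) of n_filters filters equally spaced on the mel scale."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    # n_filters + 2 equally spaced mel points; the outer two are band edges, not centers
    return [mel_to_hz(m_low + i * step) for i in range(1, n_filters + 1)]
```

For example, mel_filter_centers(10, 0, 8000) returns ten increasing center frequencies whose spacing grows toward 8000 Hz.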

Liftering and Cepstrum
Now we investigate a speech signal and see how and why we calculate a cepstrum. Briefly review the SpeechCourse section on cepstra.

In the cepstral domain it is easy to separate the excitation signal (e.g. the periodic glottal pulses of voiced sounds) from the channel (vocal tract). The periodic excitation of a voiced sound shows up in its spectrum as regularly spaced harmonics, and these produce a peak in the higher part of the cepstrum. Setting these higher cepstral coefficients to 0 is called liftering (in analogy to the corresponding filtering operation on the spectrum). When we transform the liftered cepstrum back to the spectral domain, we get a smoothed spectrum.
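To see what liftering does numerically, here is a small pure-Python sketch. It is independent of JRTk, whose cepstrum methods may use different conventions such as an FFT, windowing or another log base. The sketch computes the real cepstrum of a frame as the inverse DFT of the log magnitude spectrum. It then zeroes all quefrencies from index keep upward, together with their mirror images, because the real cepstrum of a real frame is symmetric. Finally it transforms back to obtain a smoothed log spectrum. The cutoff keep is a free parameter you choose.

```python
import cmath
import math


def dft(x):
    """Naive complex DFT (O(N^2)); enough for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-10):
    """Inverse DFT of the log magnitude spectrum (floor avoids log(0))."""
    log_mag = [math.log(max(abs(s), floor)) for s in dft(frame)]
    return [c.real for c in idft(log_mag)]


def lifter(cepstrum, keep):
    """Keep quefrencies 0..keep-1 and their mirror images, set the rest to 0."""
    n = len(cepstrum)
    return [c if (q < keep or q > n - keep) else 0.0 for q, c in enumerate(cepstrum)]


def smoothed_log_spectrum(frame, keep):
    """Log magnitude spectrum rebuilt from the liftered cepstrum."""
    return [s.real for s in dft(lifter(real_cepstrum(frame), keep))]
```

With keep = N/2 + 1 nothing is removed and the original log magnitude spectrum is recovered. With keep = 1 only c0 survives and the result is flat (the mean log magnitude). A voiced frame liftered with a cutoff below its pitch period (in samples) loses the harmonic ripple and keeps the spectral envelope.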

Task 7 Follow, step by step, the commands in An example how liftering works.

As soon as you have created a <FEATURE>, you can display it with the command
% fs show <FEATURE>

If you have more than one <FEATURE>, you can choose among them from the feature selection list in the 'Feature' menu.
Now cut out a voiced region using the left and right mouse buttons (for example 247-260). Adjust the zoom so that you can see the cut-out region.

By double-clicking the middle mouse button you can choose a second, third, ... feature from the selection list, which makes it easy to compare features (e.g. CEP and newCEP).

Spectra and cepstra are easiest to view with the display mode set to vertical, via the menu 'Display -> mode -> vertical'.

Frame statistics

Task 8 Write a Tcl script frameCnt.tcl which creates a DBase object and a FeatureSet. As the database, reuse the one you created in the last session. Use this database and the FeatureSet to read in all audio files of our sample database. For each file, create a feature 'power' by applying the FeatureSet method adc2pow with a window size of 16 ms, and determine the number of frames of this feature. Your script frameCnt.tcl should report the database key (ID) of each utterance together with its number of frames. Finally, the script should output the total number of processed sentences, the average number of frames and the total number of frames.
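The bookkeeping part of frameCnt.tcl does not depend on JRTk. The Python sketch below illustrates it under stated assumptions. frame_count is a hypothetical helper that estimates the number of frames from the number of samples. It assumes 16 kHz audio, a 16 ms window, a 10 ms frame shift and no padding. JRTk may round or pad differently, so in your script take the frame count from the feature itself. report prints one line per utterance, followed by the summary statistics.

```python
def frame_count(n_samples, sample_rate_khz=16, win_ms=16.0, shift_ms=10.0):
    """Estimated frame count, assuming no padding: a frame starts every shift_ms
    and the last frame must fit completely into the signal (hypothetical
    convention; JRTk may differ)."""
    win = int(win_ms * sample_rate_khz)      # 256 samples at 16 kHz
    shift = int(shift_ms * sample_rate_khz)  # 160 samples at 16 kHz
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // shift


def report(counts):
    """Print '<key> <frames>' per utterance, then the summary statistics."""
    for key, n_frames in counts.items():
        print(f"{key} {n_frames}")
    n_utts = len(counts)
    total = sum(counts.values())
    average = total / n_utts if n_utts else 0.0
    print(f"sentences: {n_utts}")
    print(f"average frames: {average:.1f}")
    print(f"total frames: {total}")
    return n_utts, average, total
```

For example, with made-up keys, report({"utt01": frame_count(16000), "utt02": frame_count(32000)}) lists 99 and 199 frames and reports 2 sentences, an average of 149.0 frames and 298 frames in total.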

Mean statistics

Task 9 Change the script from Task 8 so that the FeatureSet uses a feature description file. The feature description file reads in the audio file and creates a mel-scale feature using the method adc2mel. Apply the method meansub with the options -mean and -upMean to accumulate the total average of all feature vectors in an FVector. The resulting script melAverage.tcl reports the mean vector of all feature vectors.
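What melAverage.tcl should report is the mean over all frames of all utterances: each frame counts once, regardless of which utterance it comes from. The Python sketch below shows that computation outside of JRTk. It does not describe how meansub updates its FVector internally, and the class name RunningMean is hypothetical.

```python
class RunningMean:
    """Frame-weighted mean of feature vectors accumulated over many utterances."""

    def __init__(self, dim):
        self.sum = [0.0] * dim  # per-dimension sum over all frames seen so far
        self.count = 0          # number of frames seen so far

    def update(self, frames):
        """Add all frames (one vector per frame) of one utterance."""
        for vec in frames:
            if len(vec) != len(self.sum):
                raise ValueError("feature dimension mismatch")
            for i, x in enumerate(vec):
                self.sum[i] += x
            self.count += 1

    def mean(self):
        """Mean vector over every frame passed to update()."""
        if self.count == 0:
            raise ValueError("no frames accumulated")
        return [s / self.count for s in self.sum]
```

For example, utterances with frames [[1, 2], [3, 4]] and [[5, 6]] give the mean [3.0, 4.0]. Averaging the two per-utterance means instead would give [3.5, 4.5], which overweights the shorter utterance.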

Beep detector

In the following task you will build a simple beep detector, which detects whether and where an audio file contains a 1000 Hz or a 4000 Hz sinusoidal signal.

Task 10 Write a Tcl script beepDetect.tcl that creates a feature with the following characteristic: for each frame (i.e. every 10 ms) the feature has one of the values:

                  1.0        if a 1000 Hz sinusoid is present
                 -1.0        if a 4000 Hz sinusoid is present
                  0.0        otherwise
Use the audio file ../isl-lab/beeps.adc as input (adjust the path in the commands below to the location of your copy of beeps.adc). You can display the signal by typing the following commands:
FeatureSet fs
fs readADC beeps ../../../beeps.adc -sr 16 -bm 01
fs show beeps

Sinusoidal signals of different frequencies are easy to separate in the spectral domain. Think about how many points (i.e. how wide) the FFT window must be to resolve frequencies that are 1000 Hz apart. After normalizing the spectrum, detect the sinusoid in the relevant bands (which ones?) using threshold detectors. To make life easier, you can detect each band independently and combine the results afterwards.
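The spacing between the bins of an N-point FFT is Δf = f_s / N. At a 16 kHz sampling rate, a 16-point window therefore already gives 1000 Hz bins, and wider windows give finer bins. The pure-Python sketch below illustrates the detection logic outside of JRTk, under these assumptions:

- 16 kHz samples and a 10 ms frame shift;
- a naive DFT instead of an FFT;
- each frame's magnitude spectrum normalized to sum 1;
- a hypothetical threshold of 0.5.

The two bands are detected independently (+1 for 1000 Hz, -1 for 4000 Hz) and the results are added, mirroring the thresh/add idea from the method list.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def beep_feature(signal, sr_hz=16000, shift_ms=10, n_fft=160, thresh=0.5):
    """One value per frame: 1.0 for a 1000 Hz tone, -1.0 for a 4000 Hz tone, else 0.0.
    If both tones occur in the same frame, the two detections cancel to 0.0."""
    shift = sr_hz * shift_ms // 1000     # 160 samples at 16 kHz
    bin_hz = sr_hz / n_fft               # frequency resolution f_s / N
    k_1000 = round(1000 / bin_hz)        # bin index of the 1000 Hz band
    k_4000 = round(4000 / bin_hz)        # bin index of the 4000 Hz band
    feature = []
    for start in range(0, len(signal) - n_fft + 1, shift):
        mags = dft_magnitudes(signal[start:start + n_fft])
        total = sum(mags) or 1.0         # avoid dividing by zero on silence
        norm = [m / total for m in mags] # normalized spectrum sums to 1
        value = 0.0
        if norm[k_1000] > thresh:        # threshold detector for the 1000 Hz band
            value += 1.0
        if norm[k_4000] > thresh:        # threshold detector for the 4000 Hz band
            value -= 1.0
        feature.append(value)
    return feature
```

For example, 0.1 s of a 1000 Hz tone, 0.1 s of silence and 0.1 s of a 4000 Hz tone yield ten frames of 1.0, ten of 0.0 and ten of -1.0.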

The following methods might be helpful:

spectrum
normalize
split
thresh
add
You can also type % fs <METHOD> -help or consult the FeatureSet manual.

Last modified: Fri Mar 9 10:40:46 EST 2001
Maintainer: tanja@cs.cmu.edu.