11-756 / 18799D Design and Implementation of ASR Systems

11-756/18799D ASR: Assignment 6, Training from Continuous Speech

In this assignment we will train HMMs for digits from continuous speech recordings.

In Problem 1 we will use the recordings from assignment 6 as test data.

Problem 1: Record each of the following digit sequences five times. Say the digit "0" as "zero":

0 1 2 3 4 5 6 7 8 9
9 8 7 6 5 4 3 2 1 0
1 2 3 4 5 6 7 8 9 0
0 9 8 7 6 5 4 3 2 1
1 3 5 7 9 0 2 4 6 8
8 6 4 2 0 9 7 5 3 1

Record each digit sequence as a single continuous recording (without pauses between words). Since every sequence contains each digit exactly once, you will now have 30 instances of each digit (6 sequences x 5 repetitions).

Train models for all ten digits from these continuous recordings. To do so, compose the model for each digit string by concatenating the models for the individual digits. Include silence models on either side, e.g. to model "0 1 2 3 4 5 6 7 8 9", compose the HMM as "sil * 0 * 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * sil". (The "*" here indicates concatenation, and the digits represent their HMMs.)
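The composition described above can be sketched as follows. This is a minimal illustration, not the required implementation: it assumes each word HMM is strictly left-to-right, is entered only at its first state, and stores its transitions as an n x (n+1) table whose last column is the exit probability. The names `left_to_right`, `concatenate_hmms` and `compose_utterance` are hypothetical helpers introduced for this sketch.

```python
def left_to_right(name, n_states, self_loop=0.5):
    """Toy left-to-right HMM (assumed layout): n x (n+1) transition table,
    where column n holds the probability of leaving the model."""
    trans = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop            # stay in state i
        row[i + 1] = 1.0 - self_loop  # move on (or exit from the last state)
        trans.append(row)
    return {"name": name, "trans": trans}


def concatenate_hmms(models):
    """Chain models so that each model's exit feeds the next model's first state.

    Returns (labels, trans): labels[k] = (word, state index within the word),
    and trans uses the same n x (n+1) layout as the inputs.
    """
    total = sum(len(m["trans"]) for m in models)
    trans = [[0.0] * (total + 1) for _ in range(total)]
    labels = []
    offset = 0
    for m in models:
        n = len(m["trans"])
        for i, row in enumerate(m["trans"]):
            labels.append((m["name"], i))
            for j, p in enumerate(row):
                # Column n of the word model lands on the first state of the
                # next word, or on the final exit column for the last word.
                trans[offset + i][offset + j] = p
        offset += n
    return labels, trans


def compose_utterance(transcript, models):
    """Build "sil * w1 * w2 * ... * sil" for a space-separated transcript."""
    names = ["sil"] + transcript.split() + ["sil"]
    return concatenate_hmms([models[w] for w in names])


models = {str(d): left_to_right(str(d), 5) for d in range(10)}
models["sil"] = left_to_right("sil", 3)
labels, trans = compose_utterance("0 1 2 3 4 5 6 7 8 9", models)
```

The `labels` list records which word and which state each composite state came from, so statistics gathered on the composite HMM during training can be pooled back into the shared per-digit models.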

Initialize the HMMs for all digits with the models you learned for them from isolated recordings in previous assignments. For the "silence" HMM, record 5 separate 1-second segments of silence, train an HMM from them, and use it to initialize the silence model trained from the continuous recordings.
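One common way to start training the silence HMM from the five silence segments is uniform segmentation: split each recording evenly across the states and compute a mean and variance per state. The sketch below assumes single-Gaussian states with diagonal covariance and feature vectors given as plain lists of floats; the function name `uniform_segment_init` and the variance floor are illustrative choices, not part of the assignment.

```python
def uniform_segment_init(utterances, n_states, var_floor=1e-3):
    """Initial per-state Gaussian statistics from evenly split utterances.

    utterances: list of recordings, each a list of feature vectors.
    Returns one dict per state with "mean", "var" (diagonal) and "count".
    """
    dim = len(utterances[0][0])
    buckets = [[] for _ in range(n_states)]
    for frames in utterances:
        n_frames = len(frames)
        for t, x in enumerate(frames):
            # Frame t goes to state floor(t * n_states / n_frames).
            buckets[t * n_states // n_frames].append(x)

    states = []
    for s, frames in enumerate(buckets):
        if not frames:
            raise ValueError("state %d received no frames" % s)
        n = len(frames)
        mean = [sum(x[d] for x in frames) / n for d in range(dim)]
        var = [
            max(sum((x[d] - mean[d]) ** 2 for x in frames) / n, var_floor)
            for d in range(dim)
        ]
        states.append({"mean": mean, "var": var, "count": n})
    return states


# Two tiny one-dimensional "recordings", six frames each.
toy = [
    [[0.0], [0.0], [2.0], [2.0], [4.0], [4.0]],
    [[1.0], [1.0], [3.0], [3.0], [5.0], [5.0]],
]
init = uniform_segment_init(toy, n_states=3)
```

These statistics would then be refined by the same iterative training procedure used for the digit models.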

Recognize all the continuous digit sequences you recorded for assignment 6 using these models. Report accuracies in the same way as you did for assignment 6.
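The reporting format of assignment 6 is not restated here. As one plausible way to score continuous digit recognition, the sketch below computes word error rate from the Levenshtein distance between reference and hypothesis word sequences, together with the fraction of sequences recognized exactly. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def score(pairs):
    """pairs: list of (reference_words, hypothesis_words)."""
    total_words = sum(len(ref) for ref, _ in pairs)
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    exact = sum(ref == hyp for ref, hyp in pairs)
    return {
        "word_error_rate": total_errors / total_words,
        "sentence_accuracy": exact / len(pairs),
    }


results = score([
    ("1 2 3".split(), "1 2 3".split()),
    ("4 5 6".split(), "4 6 6".split()),
])
```

Silence labels should be stripped from both reference and hypothesis before scoring so that only digits are counted.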

Problem 2: We will now go for a "real" task -- training models from a medium sized corpus of recordings of digit sequences. This link points to a tar file that includes a portion of the "Aurora2" corpus. In it you will find a "train" directory with 8400 training recordings. The list of filenames for training is in the file TRAIN.list. Their corresponding transcriptions are in TRAIN.transcripts. The recordings are continuous digit strings, including the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. "0" is spoken both as "zero" and as "oh" (as you will find in the transcriptions), so "oh" needs its own model. Use these data to train models for all digits. The transcriptions also have the "silence" marked, so you need not explicitly add silences at the ends of the digit strings. Initialize all models with the models from problem 1. Record a few instances of "oh" and initialize your model for "oh" with those.
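Before training it helps to pair each file name with its word sequence and to confirm that every word in the transcriptions has a model. The layout of TRAIN.list and TRAIN.transcripts is not described here, so the sketch below assumes, purely for illustration, that line i of the transcript file belongs to line i of the list file and that words are separated by whitespace. The file names and word spellings in the example are made up; check them against the real files.

```python
def pair_transcripts(list_lines, transcript_lines):
    """Map each file name to its word list.

    ASSUMPTION: the two files are line-aligned; verify against the corpus.
    """
    names = [ln.strip() for ln in list_lines if ln.strip()]
    words = [ln.split() for ln in transcript_lines if ln.strip()]
    if len(names) != len(words):
        raise ValueError(
            "%d file names but %d transcripts" % (len(names), len(words))
        )
    return dict(zip(names, words))


def vocabulary(transcripts):
    """Sorted list of all distinct words that occur in the transcripts."""
    return sorted({w for ws in transcripts.values() for w in ws})


def missing_models(transcripts, model_names):
    """Words that appear in the transcripts but have no model yet."""
    return sorted(set(vocabulary(transcripts)) - set(model_names))


# Made-up example lines standing in for the two corpus files.
list_lines = ["a.wav\n", "b.wav\n"]
transcript_lines = ["sil one oh sil\n", "sil zero two sil\n"]
transcripts = pair_transcripts(list_lines, transcript_lines)
vocab = vocabulary(transcripts)
todo = missing_models(transcripts, ["sil", "zero", "one", "two"])
```

In this toy example `todo` flags "oh" as the word that still needs an initial model.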

The "test" directory contains 1000 test recordings. Each recording is a digit sequence. Use the setup from the second problem of assignment 6 (loopy digits) to recognize these sequences. Report the recognition accuracy as you did for problem 2 of assignment 6.