11-756 / 18799D Design and Implementation of ASR Systems
11-756/18799D ASR: Assignment 2, DTW
Problem
- Write a routine to compute the Levenshtein distance between two symbol strings.
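- The routine must optionally support two pruning strategies, each of which can be turned on or off independently: one based on a maximum string edit distance of 3 (any trellis node whose score exceeds 3 is discarded), and another based on a "beam" of 3 relative to the current best score in each column of the trellis (any node scoring more than 3 worse than that column's best is discarded).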
- Display the trellis, the scores and the best path in table format.
- Extend the routine to compare a given text string to multiple templates simultaneously and select the best one. For example, you may want to compare the text string "Eleaphent" to "Elephant", "Elegant" and "Sycophant" at the same time to determine which of the three is closest. Do NOT use lexical trees: templates must not be merged to share common portions.
- Apply the same pruning strategies as before (absolute distance of 3, and relative beam of 3). For the latter case, the best score and the scoring threshold must be computed across all templates.
- Display the trellis and best path in table form.
- Using the code for the above problem, develop a spelling checker. For the spelling checker, you are provided a dictionary (download from this link) which is simply a list of word spellings. Each of the word spellings in your dictionary is now a template. Given some incoming text, each word is compared to the entire list of templates to determine the closest one. This is returned as the “spellchecked” version of the word.
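A minimal Python sketch of the single-template routine follows. It rests on assumptions the assignment does not state. The input string indexes the trellis columns and the template indexes the rows. A match costs 0, while substitution, insertion and deletion each cost 1. Absolute pruning discards any node whose cost exceeds `max_dist`. Beam pruning discards, once a column is filled, any node more than `beam` above that column's best score. The names `levenshtein_trellis`, `best_path` and `show_trellis` are hypothetical.

```python
import math

INF = math.inf  # marks a pruned (or unreachable) trellis node


def levenshtein_trellis(inp, tmpl, max_dist=None, beam=None):
    """Fill the trellis column by column: column j covers inp[:j], row i covers tmpl[:i].

    max_dist: if not None, prune every node whose cost exceeds it (absolute pruning).
    beam:     if not None, prune nodes worse than (best cost in the column) + beam.
    Returns the cost table D and the backpointer table bp (assumed unit edit costs).
    """
    m, n = len(tmpl), len(inp)
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    bp = [[None] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for j in range(n + 1):
        for i in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:  # diagonal: match (cost 0) or substitution (cost 1)
                cands.append((D[i - 1][j - 1] + (inp[j - 1] != tmpl[i - 1]), (i - 1, j - 1)))
            if j > 0:  # horizontal: extra input symbol (insertion)
                cands.append((D[i][j - 1] + 1, (i, j - 1)))
            if i > 0:  # vertical: template symbol missing from the input (deletion)
                cands.append((D[i - 1][j] + 1, (i - 1, j)))
            cost, prev = min(cands)
            if cost < INF and (max_dist is None or cost <= max_dist):
                D[i][j], bp[i][j] = cost, prev
        if beam is not None:  # relative pruning against this column's best node
            best = min(D[i][j] for i in range(m + 1))
            for i in range(m + 1):
                if D[i][j] > best + beam:
                    D[i][j], bp[i][j] = INF, None
    return D, bp


def best_path(bp, D):
    """Trace backpointers from the final node; returns [] if that node was pruned."""
    i, j = len(D) - 1, len(D[0]) - 1
    if D[i][j] == INF:
        return []
    path = [(i, j)]
    while bp[i][j] is not None:
        i, j = bp[i][j]
        path.append((i, j))
    return path[::-1]


def show_trellis(inp, tmpl, D, path):
    """Print the trellis as a table; '*' marks best-path nodes, '.' marks pruned ones."""
    on_path = set(path)
    print("    " + "".join(f"{c:>5}" for c in "-" + inp))
    for i in range(len(tmpl) + 1):
        cells = []
        for j in range(len(inp) + 1):
            if D[i][j] == INF:
                cells.append(f"{'.':>5}")
            else:
                mark = "*" if (i, j) in on_path else ""
                cells.append(f"{str(D[i][j]) + mark:>5}")
        label = "-" if i == 0 else tmpl[i - 1]
        print(f"{label:>4}" + "".join(cells))


if __name__ == "__main__":
    word, template = "eleaphent", "elephant"
    D, bp = levenshtein_trellis(word, template, max_dist=3, beam=3)
    show_trellis(word, template, D, best_path(bp, D))
    print("distance:", D[-1][-1])
```

Passing `max_dist=3` and/or `beam=3` turns each pruning strategy on independently; leaving either as `None` turns it off. Applying the beam only after a column is complete is safe for the vertical (deletion) moves: a node reached vertically costs one more than its predecessor in the same column, so it is pruned whenever that predecessor is.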
At the “demonstration” each of you will also be given a separate text string that you must run your spell checker on.
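The multi-template comparison and the spell checker built on it can be sketched the same way. This sketch assumes unit edit costs and lowercase dictionary words. It advances every template's trellis by one input symbol (one column) at a time, so the relative beam threshold is computed from the best score across all templates, as the assignment requires. Each template keeps its own separate trellis; nothing is merged into a lexical tree. The function names, the word-splitting regular expression, and the choice to leave a word unchanged when every template is pruned are all illustrative. For brevity, the sketch keeps only the current column of each template, not the full trellis and backpointers needed for display.

```python
import math
import re

INF = math.inf


def _prune_abs(col, max_dist):
    """Absolute pruning: any node whose cost exceeds max_dist is discarded."""
    if max_dist is None:
        return col
    return [c if c <= max_dist else INF for c in col]


def _advance(prev, tmpl, sym, max_dist):
    """Compute the next trellis column of one template for input symbol `sym`."""
    col = [prev[0] + 1] + [INF] * len(tmpl)  # row 0: input symbol consumed, no template symbol
    for i in range(1, len(tmpl) + 1):
        col[i] = min(prev[i - 1] + (sym != tmpl[i - 1]),  # match (0) / substitution (1)
                     prev[i] + 1,                          # insertion: extra input symbol
                     col[i - 1] + 1)                       # deletion: template symbol skipped
    return _prune_abs(col, max_dist)


def _apply_beam(cols, beam):
    """Relative pruning: keep nodes within `beam` of the best node in the current
    column across ALL templates, then drop templates with no surviving node."""
    if beam is not None and cols:
        best = min(min(c) for c in cols.values())
        cols = {t: [x if x <= best + beam else INF for x in c] for t, c in cols.items()}
    return {t: c for t, c in cols.items() if min(c) < INF}


def best_template(word, templates, max_dist=None, beam=None):
    """Return (closest template, distance), or (None, INF) if every template was pruned."""
    cols = {t: _prune_abs(list(range(len(t) + 1)), max_dist) for t in templates}
    cols = _apply_beam(cols, beam)
    for sym in word:
        cols = {t: _advance(c, t, sym, max_dist) for t, c in cols.items()}
        cols = _apply_beam(cols, beam)
    finals = {t: c[-1] for t, c in cols.items() if c[-1] < INF}
    if not finals:
        return None, INF
    best = min(finals, key=finals.get)
    return best, finals[best]


def spellcheck(text, dictionary, max_dist=3, beam=3):
    """Replace each word of `text` by its closest dictionary template (assumed
    lowercase); words with no template inside the pruning limits are kept as-is."""
    out = []
    for w in re.findall(r"[A-Za-z']+", text):
        best, _ = best_template(w.lower(), dictionary, max_dist, beam)
        out.append(best if best is not None else w)
    return " ".join(out)
```

For example, `best_template("eleaphent", ["elephant", "elegant", "sycophant"], max_dist=3, beam=3)` returns `("elephant", 2)`. Both "elegant" (distance 4) and "sycophant" exceed the absolute limit of 3, so they are pruned.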
Due: Wednesday, 15 Feb 2011.