11-756 / 18799D Design and Implementation of ASR Systems

11-756/18799D ASR: Spell Checking with Lextrees

Problem

This homework builds on your previous spellchecker assignment. We have two problems:

Problem 1: Write a spellchecker that loads the dictionary as a lexical tree (lextree).

For the spellchecker, use the dictionary at (this link). Run the spellchecker on this little Indian story.
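One common way to spellcheck against a lextree is to walk the tree while carrying one row of the Levenshtein table per node. Each child extends its parent's row, so words that share a prefix share that part of the computation. The sketch below assumes unit costs for substitutions, insertions and deletions and returns the single closest word. The names `LexNode`, `build_lextree` and `spellcheck` are hypothetical and not part of the assignment.

```python
from typing import Dict, List, Optional, Tuple

# Sketch only: names and unit edit costs are illustrative assumptions.


class LexNode:
    """A lextree node: one character, its children, and the word ending here."""

    def __init__(self, char: str) -> None:
        self.char = char
        self.children: Dict[str, "LexNode"] = {}
        self.word: Optional[str] = None  # set where a dictionary word ends


def build_lextree(words: List[str]) -> LexNode:
    """Words that share a prefix share the nodes for that prefix."""
    root = LexNode("*")
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, LexNode(ch))
        node.word = w
    return root


def spellcheck(root: LexNode, target: str) -> Tuple[Optional[str], float]:
    """Return (closest dictionary word, Levenshtein distance) for target."""
    best_word: Optional[str] = None
    best_cost = float("inf")

    def walk(node: LexNode, prev_row: List[int]) -> None:
        nonlocal best_word, best_cost
        # row[i] = edit distance between the lextree prefix ending at node
        # and target[:i]; it extends the parent's row by one character.
        row = [prev_row[0] + 1]
        for i in range(1, len(target) + 1):
            row.append(min(
                row[i - 1] + 1,                                  # extra input char
                prev_row[i] + 1,                                 # input char missing
                prev_row[i - 1] + (target[i - 1] != node.char),  # match / substitution
            ))
        if node.word is not None and row[-1] < best_cost:
            best_word, best_cost = node.word, row[-1]
        # Costs below this node never drop under min(row), so prune hopeless subtrees.
        if min(row) < best_cost:
            for child in node.children.values():
                walk(child, row)

    top_row = list(range(len(target) + 1))  # distances from the empty prefix "*"
    for child in root.children.values():
        walk(child, top_row)
    return best_word, best_cost
```

For example, with `tree = build_lextree(["cat", "car", "cart", "dog", "door"])`, calling `spellcheck(tree, "dorr")` returns `("door", 1)`. Correcting a story then amounts to calling `spellcheck` on each token.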

Problem 2: Use the lextree structure to automatically segment the text in this file and this file, finding the word boundaries and inserting spaces in the right places; for the second file, also spellcheck the words at the same time. To do this, permit a transition from the leaves of the lextree back to the root "*". The positions of "*" in the best path mark the word boundaries.
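The loop-back idea can be sketched as a time-synchronous, Viterbi-style search that consumes one input character per step. Each lextree node keeps its cheapest cost and the words completed so far. A word-final node may jump back to `*` at no cost, which records a word boundary. This sketch makes several assumptions of its own: unit edit costs, a loop-back cost of 0, word-final nodes standing in for leaves (because words are marked on nodes rather than on separate leaf nodes), and a relative beam applied within every step. The names `build_lextree`, `segment` and `_closure` are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

# Sketch only: names, unit edit costs and zero loop-back cost are assumptions.


class LexNode:
    def __init__(self, char: str) -> None:
        self.char = char
        self.children: Dict[str, "LexNode"] = {}
        self.word: Optional[str] = None  # set where a dictionary word ends


def build_lextree(words: List[str]) -> LexNode:
    root = LexNode("*")
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, LexNode(ch))
        node.word = w
    return root


# node -> (accumulated edit cost, words completed on this path)
States = Dict[LexNode, Tuple[int, Tuple[str, ...]]]


def _closure(states: States, root: LexNode, beam: int) -> States:
    """Follow transitions that consume no input: skipping a lextree character
    (cost 1) and looping a word-final node back to '*' (cost 0, records the
    word). States costlier than best + beam are pruned (relative beam)."""
    limit = min(cost for cost, _ in states.values()) + beam
    best = {n: v for n, v in states.items() if v[0] <= limit}
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(cost, next(tie), n) for n, (cost, _) in best.items()]
    heapq.heapify(heap)
    while heap:
        cost, _, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        hist = best[node][1]
        moves = [(child, cost + 1, hist) for child in node.children.values()]
        if node.word is not None:
            moves.append((root, cost, hist + (node.word,)))
        for target, c, h in moves:
            if c <= limit and (target not in best or c < best[target][0]):
                best[target] = (c, h)
                heapq.heappush(heap, (c, next(tie), target))
    return best


def segment(text: str, root: LexNode, beam: int) -> Tuple[int, List[str]]:
    """Return (total edit cost, corrected words) for an unsegmented text."""
    states = _closure({root: (0, ())}, root, beam)
    for ch in text:
        nxt: States = {}
        for node, (cost, hist) in states.items():
            # Input character matches no lextree character (stay, cost 1).
            candidates = [(node, cost + 1)]
            # Advance into a child: free on a match, cost 1 on a substitution.
            candidates += [(child, cost + (child.char != ch))
                           for child in node.children.values()]
            for target, c in candidates:
                if target not in nxt or c < nxt[target][0]:
                    nxt[target] = (c, hist)
        states = _closure(nxt, root, beam)
    if root not in states:
        raise ValueError("no complete segmentation survived the beam")
    cost, hist = states[root]
    return cost, list(hist)
```

With `tree = build_lextree(["the", "cat", "sat", "on", "mat"])`, running `segment("thecatsatonthemst", tree, beam=5)` both splits the text and corrects "mst" to "mat", returning a cost of 1. Keeping one history per node is what makes this a Viterbi (best-path) search rather than a full enumeration of segmentations.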

Note that the procedure above is likely to make many errors. Can you think of any variation to the procedure that may result in better segmentation?

For Problem 2, try relative beam widths of 5, 10, and 15. Compare the segmentation and corrected spellings in this file to determine which works best. The "accuracy" of an output is computed as the difference between the number of words in the hypothesized segmentation and the number of words in the correct transcription, PLUS the number of misspelled words; despite its name, it is an error count, so lower is better. If possible, plot accuracy as a function of beam width.
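The scoring rule above can be written as a small function. The assignment does not say how a misspelled word is identified when the hypothesis and the reference have different lengths, so this sketch makes two assumptions. First, it takes the word-count difference as an absolute value. Second, it counts a hypothesized word as misspelled when it cannot be paired with an unused occurrence of the same word in the reference. The function name `segmentation_error` is hypothetical.

```python
from collections import Counter
from typing import List


def segmentation_error(hyp: List[str], ref: List[str]) -> int:
    """Word-count difference plus misspelled words (lower is better).

    Assumptions (not fixed by the assignment): the count difference is an
    absolute value, and a hypothesized word is "misspelled" when no unused
    occurrence of the same word remains in the reference.
    """
    count_diff = abs(len(hyp) - len(ref))
    unmatched = Counter(hyp) - Counter(ref)  # keeps only positive leftovers
    misspelled = sum(unmatched.values())
    return count_diff + misspelled
```

Scoring the outputs for beams 5, 10 and 15 with this function gives one number per beam width, which is what the requested plot needs.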

Due: Wednesday, 19 Mar 2014.