Announcing: The Lotec Speech Recognition Package

All that you need to build a single-speaker, small-vocabulary,
low-quality continuous speech recognition module, for use as part of
a larger system.

Input:    a sound sample in Sun .au file format, plus word templates
          in the same format
Output:   a bunch of word hypotheses, each consisting of temporal
          location and likelihood score (eg, `template 2 for the word
          "central" matched the input best in the time span from 450
          to 830 milliseconds, and the match discrepancy was 405.32').
          (That is, it outputs a lattice of word hypotheses.)
Hardware: SUN SparcStation, decent microphone
Software: SunOS 4.1.2 (Unix plus the "multimedia" library files in
          /usr/demo/SOUND), X, and if you want to compile, gcc (the
          gnu C compiler)

======= CONTENTS =======

"grab" helps you record speech samples.
"labeler" lets you interactively assign word labels to a speech sample.
"chopper" chops a speech sample file into files for each word.
"featurizer" converts a speech file to a parametric representation.
"match" does word spotting.
"real" is an online version of grab|featurize|match that runs in
  real time on a Sun SparcStation 10.
Other goodies.

======= POLITICAL STATEMENT =======

It's time for people to exploit a little speech input functionality in
all sorts of systems. To do so, they shouldn't have to buy expensive
software, nor learn a lot about speech processing.

======= PERSONAL STATEMENT =======

I hate C. I don't understand signal processing. But, having no luck
trying to beg, borrow, or steal some simple speech software, I had no
choice but to put together my own package. It's simplistic, but has
served at least to let me try out some ideas (on the integration of
speech and language processing). I'm making it available in the hope
that others will find it useful, but I have no time or inclination to
support it.

======= BACKGROUND =======

Naively, you might expect speech recognition to be like a
stenographer: converting your speech to words.
But that's impossible without human-type knowledge. Using acoustic,
phonetic, and lexical knowledge only, all you can get is probabilities
for what word is where. Most systems hide this fact, by searching thru
the lattice of word hypotheses to come up with one (or a few) best
sentence hypotheses. This is fine if it hits on the right
interpretation, but if not, the downstream system is stuck. So it's
better for the speech recognizer to output the entire lattice of word
hypotheses (or so I claim).

So, if you're interested in building systems which use speech input,
Lotec may be for you.

Moreover, the low quality shouldn't bother you. My rationale is this:
Sometime in the next century there will be recognition systems which
can extract useful information from uncooperative speakers with bad
microphones in noisy environments. What will the output of these
systems be like? Not very good; and impossible to make sense of
without the application of semantic knowledge. That is, the output of
these future systems will probably be similar in quality to the output
of Lotec today, with a single speaker in a normal room with a good
microphone and a small vocabulary. This means that, if you're
interested in building systems that use the results of speech
recognition, you can use Lotec today to prototype systems that will
work well with 21st century speech recognition technology.

======= HOW TO GET IT =======

Lotec is available by anonymous ftp. To get it, do something like
this:

  ftp ftp.sanpo.t.u-tokyo.ac.jp
  anonymous             ((when it asks for Name))
  xxx@yyy.zzz           ((or whatever, when it asks for Password))
  cd pub/nigel/lotec
  get lotec.tar.Z
  quit                  ((exit ftp))
  gunzip lotec.tar.Z    ((or uncompress lotec.tar.Z))
  tar xvf lotec.tar

Now, put lotec/bin in your path, and enjoy.

If ftp is slow for you, take lotec-no-bin.tar.Z instead; then compile
it by going to lotec/src, saying "make all", and waiting a couple of
minutes.

---
Nigel Ward
nigel@sanpo.t.u-tokyo.ac.jp
University of Tokyo
May 1994
---