Jonathan Brown - Sphinx2-CTAT Integration

Sphinx2-CTAT Integration Directions

This page includes directions for integrating the Sphinx2 speech recognition system into the Cognitive Tutor Authoring Tools.

After following these directions, you will be able to build speech exercises using the Authoring Tools. Be sure to visit the Sphinx website to learn about the recognizer, including how to customize the language models for your application.

These directions will guide you through installing Cygwin and Sphinx, adding some utilities and a language model to Sphinx, and connecting the Authoring Tools to the recognizer. I assume you already have NetBeans and the Authoring Tools installed; if not, please install them first. Note that the Authoring Tools and widgets are not included in these files. (Neither is the recognizer, but the directions below explain how to get it.) To begin, please download the Sphinx2-CTAT Integration Files.

Sphinx2-CTAT Integration Files for Windows

Sphinx2-CTAT Integration Files for Linux/Unix

Downloadable Directions

Notes explaining what is going on follow each step. Feel free to ignore them if you just want to get things working.

Directions

Go to www.cygwin.com and download the setup file for Cygwin.
Direct link: http://www.cygwin.com/setup.exe
Although Sphinx runs under Windows, it is easiest to build and work with in a Linux-like environment, which is what Cygwin provides.

Go to the Sphinx SourceForge site and download sphinx2 (version 0.4).
Direct link: http://prdownloads.sourceforge.net/cmusphinx/sphinx2-0.4.tar.gz?download
Save this file in a new directory at C:\sphinx.

Install Cygwin:
Double click setup.exe that you downloaded.
When it asks for the Root Directory, choose C:\cygwin
When it asks for the Local Package Directory, choose C:\cygwin\packages
Choose any download site you want.
When you get to the select packages screen, you need to change the defaults as follows:
Click on "default" next to Archive to change it to "install".
Click on "default" next to Devel to change it to "install".
Click on "default" next to Editors to change it to "install".
Proceed with download.
If you are familiar with Cygwin, feel free to install any packages you want, provided you can build Sphinx2. These choices are simply the quickest way to get a working build environment; they do not necessarily give the smallest possible installation.

Install Sphinx:
Start the Cygwin bash shell (there should be a shortcut on the desktop or in the Start menu under Programs).
Type the following commands in order:
cd /cygdrive/c/sphinx
gunzip sphinx2-0.4.tar.gz
tar -xf sphinx2-0.4.tar
cd sphinx2-0.4
./configure --prefix=/cygdrive/c/sphinx/sphinx2
make
make install
The path /cygdrive/c/sphinx is referenced in the accompanying Java code. If you install Sphinx in another location, you will need to modify the Java code accordingly.
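If you do install elsewhere, one way to keep that change to a single spot is to derive every Sphinx path from one base directory. The sketch below is hypothetical: the class name and the `sphinx.home` system property are illustrative and are not part of the accompanying code. It assumes, as above, that these are Cygwin-style paths handed to the Cygwin shell.

```java
// Hypothetical helper: derives the Sphinx paths used in these directions
// from a single base directory, so a different install location is a one-line change.
final class SphinxPaths {
    // Default install location from the directions above.
    static final String DEFAULT_HOME = "/cygdrive/c/sphinx";

    private SphinxPaths() {}

    /** Base directory; override with -Dsphinx.home=... (illustrative property name). */
    static String home() {
        return System.getProperty("sphinx.home", DEFAULT_HOME);
    }

    /** The bin directory under the --prefix used at configure time. */
    static String binDir(String home) {
        return home + "/sphinx2/bin";
    }

    /** The "current" language model directory, where the recognizer also writes its output. */
    static String currentModelDir(String home) {
        return home + "/sphinx2/share/sphinx2/model/lm/current";
    }
}
```

With no property set, `SphinxPaths.binDir(SphinxPaths.home())` yields `/cygdrive/c/sphinx/sphinx2/bin`, matching the `--prefix` used in the configure step.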

Add helper files:
Copy the files sphinx2-current and sphinx2-current-run to the bin directory: C:\sphinx\sphinx2\bin under Windows, or /cygdrive/c/sphinx/sphinx2/bin under Cygwin.
These are helper shell scripts that run Sphinx using the "current" language model (see below). The sphinx2-current-run script is the main script. The sphinx2-current script is the one the Java code runs; it redirects stdout and stderr to /dev/null, so to run the recognizer manually and see its output, use sphinx2-current-run.
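How the accompanying Recognizer.java actually starts the quiet script is not described here, so the following is only a sketch of one way to do it from Java. It assumes Cygwin's bash is at C:\cygwin\bin\bash.exe (the Root Directory chosen during the Cygwin install) and runs bash as a login shell so that Cygwin's PATH is set up. It changes into the bin directory first, mirroring the manual test below, in case the scripts use relative paths.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher for the quiet helper script; not the accompanying Recognizer.java.
final class SphinxLauncher {
    // Assumption: Cygwin was installed with C:\cygwin as its root directory.
    static final String CYGWIN_BASH = "C:\\cygwin\\bin\\bash.exe";
    static final String BIN_DIR = "/cygdrive/c/sphinx/sphinx2/bin";

    private SphinxLauncher() {}

    /** bash --login -c "cd '<binDir>' && ./sphinx2-current" */
    static List<String> command(String bash, String binDir) {
        return List.of(bash, "--login", "-c", "cd '" + binDir + "' && ./sphinx2-current");
    }

    /** Runs the recognizer, blocks until it finishes, and returns the script's exit code. */
    static int run() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(CYGWIN_BASH, BIN_DIR));
        // The script already discards its own output; discarding here as well means
        // the child can never block on an unread pipe (Redirect.DISCARD needs Java 9+).
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        return pb.start().waitFor();
    }
}
```

Blocking in `run()` keeps the sketch simple; in a Swing event handler you would likely call it from a background thread so the interface stays responsive while Sphinx works.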

Add language model:
Copy the directory "current" to C:\sphinx\sphinx2\share\sphinx2\model\lm
This is the language model directory. It includes all the data files Sphinx needs to do recognition, and it is also where Sphinx is configured to write its output files. The file output.txt contains the single best hypothesis, and the file current.hyp contains the top n hypotheses, where n is set in the shell scripts (3 by default). To recognize your own utterances, replace the files in this directory with language models for your problems. Please see my example applications for their language models, and see the Sphinx website for details on constructing these models.
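Once the recognizer has finished, the results can be picked up from those two files. The reader below is a hypothetical sketch, and it makes an assumption about the file layout that you should check against what your setup actually writes: that output.txt holds the best hypothesis on its first non-blank line, and that current.hyp holds one hypothesis per line, best first. The parsing is kept separate from the file access so it can be checked without running Sphinx.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the recognizer's output files in the "current" directory.
// Assumed layout: output.txt = best hypothesis on its first non-blank line;
// current.hyp = one hypothesis per line, best first.
final class HypothesisFiles {
    private HypothesisFiles() {}

    /** First non-blank line of output.txt's contents, or "" if there is none. */
    static String parseBest(String contents) {
        for (String line : contents.split("\\R")) {
            if (!line.isBlank()) return line.trim();
        }
        return "";
    }

    /** Up to n non-blank lines of current.hyp's contents, in file order. */
    static List<String> parseNBest(String contents, int n) {
        List<String> result = new ArrayList<>();
        for (String line : contents.split("\\R")) {
            if (result.size() == n) break;
            if (!line.isBlank()) result.add(line.trim());
        }
        return result;
    }

    static String readBest(Path modelDir) throws IOException {
        return parseBest(Files.readString(modelDir.resolve("output.txt")));
    }

    static List<String> readNBest(Path modelDir, int n) throws IOException {
        return parseNBest(Files.readString(modelDir.resolve("current.hyp")), n);
    }
}
```

Note that Java opening these files directly on Windows needs the Windows form of the directory (C:\sphinx\sphinx2\share\sphinx2\model\lm\current), not the /cygdrive path used inside the Cygwin shell. `Files.readString` and `String.isBlank` require Java 11 or later.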

Test Sphinx:
Run the following two commands.
cd /cygdrive/c/sphinx/sphinx2/bin
./sphinx2-current-run
You should get a good deal of output, and near the end you should see something that says:
BESTPATH: GO FORWARD FIVE METERS
If you see this, Sphinx has correctly recognized the speech in an example file.
The example file is in the "current" directory and is named current.16k.
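If you want to repeat this check automatically (for example, after swapping in a new language model), you can save the output with `./sphinx2-current-run > run.log 2>&1` and look for the BESTPATH line. The sketch below is hypothetical. It assumes only what is shown above, namely that the result appears on a line containing "BESTPATH:". It returns the text after the marker, taking the last such line if there are several.

```java
import java.util.Optional;

// Hypothetical check for the manual test: extract the recognized text from
// the recognizer's console output, assuming it follows a "BESTPATH:" marker.
final class BestPathCheck {
    private static final String MARKER = "BESTPATH:";

    private BestPathCheck() {}

    /** Text after the last "BESTPATH:" marker, trimmed; empty if no line contains it. */
    static Optional<String> bestPath(String log) {
        String found = null;
        for (String line : log.split("\\R")) {
            int i = line.indexOf(MARKER);
            if (i >= 0) {
                found = line.substring(i + MARKER.length()).trim();
            }
        }
        return Optional.ofNullable(found);
    }
}
```

For the sample file, comparing `bestPath(log)` against `Optional.of("GO FORWARD FIVE METERS")` reproduces the check described above. If your recognizer prints extra fields after the words, they will be included in the result.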

Test Example Application:
Copy the "Speech" directory to your ProblemsOrganizer.
Open NetBeans and mount the example application folder.
Build each of the four Java files.
Run Interface.java.
Open the GoForward problem under Speech->test->test in the Behavior Recorder.
Run it by clicking Start, speaking the utterance, and clicking Stop.
See my example applications for more complex examples. This example application (like the others) includes code for recording the user's utterance (Recorder.java) and for communicating with the recognizer (Recognizer.java). These utilities are used by the event handlers for the Start and Stop buttons, which also contain the code that works with the Authoring Tools widgets to force updates.
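The start/stop recording pattern used by those button handlers can be sketched with the standard javax.sound.sampled API. This is a hypothetical illustration, not the recording code shipped with the examples. It assumes the recognizer wants raw, headerless 16 kHz, 16-bit, mono, signed little-endian PCM, as the ".16k" name of the sample file suggests. Where the example code writes the utterance for the recognizer is not described here, so the output path is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical start/stop recorder for the Start and Stop button handlers.
// Assumed format: 16 kHz, 16-bit, mono, signed, little-endian raw PCM.
final class UtteranceRecorder {
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private volatile boolean recording;
    private TargetDataLine line;
    private Thread reader;

    /** Start button: open the default microphone and copy audio into memory on a background thread. */
    void start() throws LineUnavailableException {
        buffer.reset();
        TargetDataLine mic = AudioSystem.getTargetDataLine(FORMAT);
        mic.open(FORMAT);
        mic.start();
        recording = true;
        reader = new Thread(() -> {
            byte[] chunk = new byte[4096]; // a whole number of 2-byte frames
            while (recording) {
                int n = mic.read(chunk, 0, chunk.length); // returns early once the line is stopped
                if (n > 0) buffer.write(chunk, 0, n);
            }
        });
        line = mic;
        reader.start();
    }

    /** Stop button: stop capturing and write the raw samples to the given file. */
    void stop(Path rawFile) throws IOException, InterruptedException {
        recording = false;
        line.stop();   // unblocks a read() that is waiting for data
        reader.join(); // after this, the buffer is no longer being written
        line.close();
        Files.write(rawFile, buffer.toByteArray());
    }
}
```

After `stop(...)` returns, the handler would run the recognizer and read its result. Writing raw samples with no WAV header is a deliberate choice in this sketch, made to match the assumed raw format of current.16k.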

Feel free to email me with any questions or comments: jonbrown @ cs.cmu.edu.