Name |
Remarks |
Sphinx-3.5 (Live-mode APIs, Speaker Adaptation and Code Convergence) |
- Location: Open source, release available.
- Based on Sphinx-3.4 and Sphinx-3.0.
- Live-mode APIs are implemented and thoroughly tested.
- Speaker adaptation based on a single regression class
is implemented.
- Four tools from Sphinx 3.0 are now incorporated into Sphinx 3.x:
align (s3align in 3.0) : Forced alignment
allphone (s3allphone in 3.0) : Allphone decoding
astar (s3astar in 3.0) : A* search, nbest generation
dag (s3dag in 3.0) : Shortest-path search
- The feature extraction libraries of Sphinx 3 are now EXACTLY the same as those in SphinxTrain
- Corresponding changes in SphinxTrain
- Two new tools are introduced:
mllr_solve : computes the regression matrix using the MLLR algorithm.
mllr_transform : given a regression matrix, applies that linear transformation to the Gaussian means.
- The command-line interfaces of all SphinxTrain tools are now unified.
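With a single regression class, one MLLR matrix is shared by every Gaussian in the model. The sketch below illustrates only the mean update that a tool like mllr_transform performs, using the standard MLLR form mu_hat = W xi with xi = [1, mu], which is equivalent to mu_hat = A mu + b. The function name `apply_mllr_to_means` and the [b | A] column layout of W are assumptions for illustration; the matrix file format and the estimation done by mllr_solve are not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_mllr_to_means(means: List[Vector], w: Matrix) -> List[Vector]:
    """Adapt Gaussian means with one shared MLLR matrix W = [b | A] (hypothetical helper).

    Single regression class: every Gaussian uses the same W.
    Each d-dimensional mean mu is extended to xi = [1, mu_1, ..., mu_d],
    and the adapted mean is W @ xi, which equals A @ mu + b.
    """
    adapted: List[Vector] = []
    for mu in means:
        xi = [1.0] + list(mu)  # extended mean vector: bias term first
        if len(w) != len(mu) or any(len(row) != len(xi) for row in w):
            raise ValueError("W must have shape d x (d + 1)")
        # Row r of W dotted with xi gives component r of the adapted mean.
        adapted.append([sum(wr * xr for wr, xr in zip(row, xi)) for row in w])
    return adapted


if __name__ == "__main__":
    # d = 2: A = [[2, 0], [0, 1]], b = [0.5, -1.0]
    w = [[0.5, 2.0, 0.0],
         [-1.0, 0.0, 1.0]]
    print(apply_mllr_to_means([[1.0, 3.0]], w))  # [[2.5, 2.0]]
```

Keeping the bias in the first column lets a single matrix-vector product express both the rotation/scaling A and the offset b.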
|
Sphinx-3.4 (Fast GMM computation) |
- Location: Open source, module archive_s3/s3.4 in cvs tree.
- Based on Sphinx-3.3; fast GMM computation is implemented:
- frame down-sampling
- CI-based GMM selection
- VQ-based and SVQ-based Gaussian Selection
- Support for SVQ with an arbitrary number of sub-vectors
- Phoneme look-ahead
- Support for class-based LMs and multiple LMs
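Two of the GMM-level shortcuts listed above, frame down-sampling and CI-based GMM selection, can be sketched as follows. This is an illustration under stated assumptions, not Sphinx-3.4's code: scores are log-likelihoods (higher is better), each CD senone maps to exactly one CI phone, skipped frames simply copy the last computed scores, and a deselected senone falls back to its CI phone's score. `fast_senone_scores` and all of its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Frame = Sequence[float]


def fast_senone_scores(
    frames: Sequence[Frame],
    ci_score: Callable[[int, Frame], float],
    cd_score: Callable[[int, Frame], float],
    cd_to_ci: Dict[int, int],
    ci_beam: float,
    downsample: int = 2,
) -> List[Dict[int, float]]:
    """Per-frame senone log-scores with two fast-GMM shortcuts (hypothetical sketch).

    Frame down-sampling: full scoring runs only on every `downsample`-th
    frame; frames in between reuse the most recent scores.
    CI-based GMM selection: a CD senone's own GMM is evaluated only if its
    CI phone scored within `ci_beam` of the best CI phone; otherwise the
    CI score stands in for it.
    """
    if downsample < 1:
        raise ValueError("downsample must be >= 1")
    ci_ids = sorted(set(cd_to_ci.values()))
    results: List[Dict[int, float]] = []
    last: Optional[Dict[int, float]] = None
    for t, x in enumerate(frames):
        if last is not None and t % downsample != 0:
            results.append(dict(last))  # skipped frame: copy previous scores
            continue
        ci = {p: ci_score(p, x) for p in ci_ids}  # cheap CI GMMs first
        threshold = max(ci.values()) - ci_beam
        scores = {
            s: cd_score(s, x) if ci[p] >= threshold else ci[p]
            for s, p in cd_to_ci.items()
        }
        results.append(scores)
        last = scores
    return results
```

The CI pass is cheap because there are far fewer CI phones than CD senones, so pruning with it saves most of the expensive CD evaluations on frames where only a few phones are plausible.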
|
Sphinx-3.3 (fast decoder) |
- Location: Open source, module archive_s3/s3.3 in cvs tree.
- Fast Sphinx-3 decoder using lextree organization:
- 5-10x real time on large-vocabulary tasks (measured in 1999)
- Continuous density acoustic models only
- Batch-mode or live operation
- Other tools
gausubvq : builds sub-vector clustered acoustic models, needed for fast acoustic model evaluation
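Sub-vector clustering makes Gaussian evaluation cheaper by letting many full Gaussians share a small set of sub-vector Gaussians. The sketch below assumes diagonal covariances and shows one way such codebooks can approximate full-Gaussian log-likelihoods: every codeword is scored once per frame, and each Gaussian's score becomes a sum of table look-ups. `subvq_gaussian_scores`, its arguments, and the data layout are hypothetical and do not describe gausubvq's output format.

```python
import math
from typing import List, Sequence, Tuple

SubGaussian = Tuple[List[float], List[float]]  # (means, variances) for one sub-vector


def diag_log_density(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def subvq_gaussian_scores(
    x: Sequence[float],
    partition: List[List[int]],
    codebooks: List[List[SubGaussian]],
    gauss_codes: List[List[int]],
) -> List[float]:
    """Approximate log-likelihoods of many Gaussians via sub-vector codebooks (hypothetical).

    partition[k]: feature dimensions forming sub-vector k (any number of sub-vectors).
    codebooks[k]: shared sub-Gaussians (codewords) for sub-vector k.
    gauss_codes[g][k]: codeword index standing in for Gaussian g on sub-vector k.
    """
    # Score every codeword once per frame...
    sub_scores: List[List[float]] = []
    for dims, book in zip(partition, codebooks):
        xs = [x[d] for d in dims]
        sub_scores.append([diag_log_density(xs, m, v) for m, v in book])
    # ...then each Gaussian's score is a sum of look-ups, one per sub-vector.
    return [sum(sub_scores[k][c] for k, c in enumerate(codes)) for codes in gauss_codes]
```

When a Gaussian's sub-vectors match their codewords exactly, the sum equals the exact diagonal log density, because that density factorizes over disjoint groups of dimensions.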
|
Sphinx-3.2 |
- Location: Open source, module archive_s3/s3.2 in cvs tree.
- Same features as s3.3, but supports batch-mode operation only.
|
Sphinx-3 (slow decoder) |
- Location: Open source, module archive_s3/s3 in cvs tree.
- Original Sphinx-3 decoder
- Slow; 50-100x real time on large-vocabulary tasks (measured in 1999)
- Any kind of acoustic model (discrete, semi-continuous, continuous,
others)
- Major applications:
s3decode and s3decode-anytopo : Speech-to-text decoding
s3align : Forced alignment
s3allphone : Allphone decoding
s3astar : A* search, nbest generation
s3dag : Shortest-path search
- Other utilities:
stseg-read : State-segmentation binary file reader
sen2s2 : creates a Sphinx-II "sendump" file from a Sphinx-3
acoustic model
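The shortest-path search listed above (s3dag) finds the best path through a word lattice, which is a directed acyclic graph. The sketch below shows only the generic dynamic-programming shortest path over a DAG whose nodes are supplied in topological order. The node and edge layout, the single combined cost per edge (in a real decoder this would combine acoustic and language-model scores), and the name `dag_best_path` are assumptions for illustration, not s3dag's data structures or scoring.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str, float]  # (from_node, to_node, word, cost)


def dag_best_path(nodes: List[str], edges: List[Edge], start: str, end: str) -> Tuple[float, List[str]]:
    """Lowest-cost word sequence from start to end (hypothetical sketch).

    `nodes` must be in topological order, so each node's best cost is final
    before its outgoing edges are relaxed.
    """
    out: Dict[str, List[Edge]] = {n: [] for n in nodes}
    for e in edges:
        out[e[0]].append(e)
    inf = float("inf")
    best = {n: inf for n in nodes}
    back: Dict[str, Edge] = {}  # best incoming edge per node
    best[start] = 0.0
    for n in nodes:
        if best[n] == inf:
            continue  # unreachable from start
        for e in out[n]:
            _, m, _, cost = e
            if best[n] + cost < best[m]:
                best[m] = best[n] + cost
                back[m] = e
    if best[end] == inf:
        raise ValueError("end node is not reachable from start")
    words: List[str] = []
    n = end
    while n != start:  # follow back-pointers to recover the word sequence
        e = back[n]
        words.append(e[2])
        n = e[0]
    return best[end], words[::-1]
```

Because a lattice has no cycles, one pass in topological order suffices; no priority queue is needed as in Dijkstra's algorithm.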
|
Sphinx-2 (fbs8) |
- Location: Open source, release available.
- Sphinx-II decoder
- Real-time operation
- Semi-continuous, Sphinx-II acoustic models only (Sphinx-II format)
- User applications support:
- Compiled into a library with a straightforward API for building
speech-enabled applications
- Continuous-listening support
- Dynamic language model loading and switching
- Several test applications:
- Basic dictation with and without "push-to-talk"
- Basic audio recording and playback
- Audio segmentation using the continuous listener
- Additional recognition modes:
- Forced alignment
- Allphone decoding
- A* search, nbest generation
- Shortest-path search
|