Minimum Error Rate Training Tools
This document walks you through running Minimum Error Rate (MER) training for the STTK system. Check it for updates on the script as well as parameter settings for the STTK binary. It assumes you have TranslateTS.cc v1.19, which was committed to CVS on 07/14/2005; run "cvs update TranslateTS.cc" to update your version. Download the latest version of the script.
STTK settings
The following additional settings are required in the parameter file to run MER training. Make sure you have at least 50 sentences to train with.
- MEROptimize 1
- IterationLimit 10
- ReferenceFileListing (make sure it matches your source files)
- NormalizingScript (use ~ashishv/NoNormalize.pl for no normalization; otherwise make sure the script matches your needs)
- NBest 1000.0
- ParameterRestrictionFile myinit.opt
LanguageModel  AReorderingModel  SentenceLengthModel  PhraseCountModel  TranslationModels1...N
1              -3                -3                   -3                0 0
1              3                 3                    3                 1 1
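As a sanity check before training, the two numeric rows above can be read and validated with a short script. This is only a sketch under assumptions that this page does not confirm: that the ParameterRestrictionFile contains just the two whitespace-separated numeric rows (no header line), that the first row holds the lower bound and the second the upper bound for each scaling factor, and that equal bounds mean the factor is held fixed. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_restrictions(text: str) -> Tuple[List[float], List[float]]:
    """Parse two numeric rows (ASSUMED: lower bounds first, then upper bounds)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 2:
        raise ValueError(f"expected 2 non-empty rows, found {len(rows)}")
    lower = [float(token) for token in rows[0]]
    upper = [float(token) for token in rows[1]]
    if len(lower) != len(upper):
        raise ValueError("the two rows have different numbers of columns")
    for index, (low, high) in enumerate(zip(lower, upper)):
        if low > high:
            raise ValueError(f"column {index}: lower bound {low} exceeds upper bound {high}")
    return lower, upper


def fixed_columns(lower: List[float], upper: List[float]) -> List[int]:
    """Return the column indices whose two bounds coincide (ASSUMED to mean 'fixed')."""
    return [i for i, (low, high) in enumerate(zip(lower, upper)) if low == high]


# The two rows shown on this page, used as sample input.
EXAMPLE = "1 -3 -3 -3 0 0\n1 3 3 3 1 1\n"
lower, upper = parse_restrictions(EXAMPLE)
print("columns with identical bounds:", fixed_columns(lower, upper))
```

Under these assumptions, only the first column (LanguageModel) has identical bounds in the example, so it would stay at 1 while the other scaling factors are searched within their ranges.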
Running training
- Make sure your staging directory has a link to the optimization script
- Make sure your staging directory has a ParameterRestrictionFile in it
- Run TranslateTS with your new parameter file and save the output to a log file
- grep 'TrainTranslationScore' to see how the score is improving
- grep 'Got' to see the scaling factors found by the optimization
- Note: It is common to see the score drop after the first iteration (since the n-best list doesn't have many negative examples yet)
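The two grep steps above can also be done in one pass over the saved log. The sketch below is a plain substring filter, equivalent to the grep calls; it assumes nothing about the layout of the log lines beyond the two marker strings named in the steps. The function names and the log file name in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

# Marker strings taken from the steps above; matching is case-sensitive, like plain grep.
MARKERS = ("TrainTranslationScore", "Got")


def collect_marked_lines(
    lines: Iterable[str], markers: Sequence[str] = MARKERS
) -> Dict[str, List[str]]:
    """Group the lines that contain each marker, preserving log order."""
    found: Dict[str, List[str]] = {marker: [] for marker in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                found[marker].append(line.rstrip("\n"))
    return found


def report(log_path: str) -> None:
    """Print the matching lines of a saved TranslateTS log, one group per marker."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        found = collect_marked_lines(handle)
    for marker, hits in found.items():
        print(f"== {marker}: {len(hits)} line(s)")
        for hit in hits:
            print(hit)
```

For example, if the output was saved to a file called mer.log (a placeholder name), calling report("mer.log") lists the score lines first and the scaling-factor lines second.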
User parameters in the script
There are a few parameters in the script that allow you to change the way the optimization runs. Their default settings usually work well, but you might want to experiment with them to speed up the process if your parameters are generalizing well. Here they are in no particular order.
- NumRandomTests : currently set to 5. Number of times new random seeds are used in the optimization process
- ConvergedLimit : currently set to 3. Number of times the error must stay the same to claim convergence has occurred (within a random seed run)
- IterationLimit : currently set to 20. Max number of times random permutations are considered for a single random seed
- ExpansionMargin : currently set to 0.0. If the optimized parameter value gets within ExpansionMargin of its right bound, the right bound is increased by ExpansionFactor * (maxValue - minValue), effectively enlarging the search space.
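The ExpansionMargin rule can be illustrated with a few lines of code. This is a sketch of the rule as described above, not the script's actual implementation: the function name is hypothetical, the comparison is assumed to be inclusive (so with a margin of 0.0 the bound grows only when the value reaches it), and the ExpansionFactor of 0.5 used in the example is made up, since its default is not given here.

```python
def expand_right_bound(
    value: float,
    min_value: float,
    max_value: float,
    expansion_margin: float,
    expansion_factor: float,
) -> float:
    """Return the right bound, enlarged if value lies within the margin of it.

    ASSUMPTION: 'within the margin' is treated as an inclusive comparison.
    """
    if max_value - value <= expansion_margin:
        return max_value + expansion_factor * (max_value - min_value)
    return max_value


# Hypothetical numbers: bounds [-3, 3], margin 0.2, factor 0.5.
near_bound = expand_right_bound(2.9, -3.0, 3.0, 0.2, 0.5)   # within the margin
far_from_bound = expand_right_bound(1.0, -3.0, 3.0, 0.2, 0.5)  # well inside the range
print(near_bound, far_from_bound)
```

In the first call the value is within 0.2 of the right bound, so the bound moves from 3 to 3 + 0.5 * (3 - (-3)) = 6; in the second call the bound is left at 3.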