Minimum Error Rate Training Tools

This document walks you through running Minimum Error Rate (MER) training for the STTK system. Check it for updates to the script as well as parameter settings for the STTK binary. It assumes you have TranslateTS.cc v1.19, which was committed to CVS on 07/14/2005; run "cvs update TranslateTS.cc" to update your version. Download the latest version of the script.

STTK settings

The following additional settings are required in the parameter file to run MER training. Make sure you have at least 50 sentences to train with.

  • MEROptimize 1
  • IterationLimit 10
  • ReferenceFileListing (make sure it matches your source files)
  • NormalizingScript (use ~ashishv/NoNormalize.pl for no normalization; otherwise make sure the script matches your needs)
  • NBest 1000.0
  • ParameterRestrictionFile myinit.opt

myinit.opt is a file that restricts the scaling factors to lie within a set of ranges. The decoder outputs its models in the following order:

LanguageModel
AReorderingModel
SentenceLengthModel
PhraseCountModel
TranslationModels1...N

The parameter restriction file should follow this same order, and it's important to keep one model fixed. Here is an example restriction file for a system whose phrases have 2 model scores, each kept between 0 and 1, with the LM fixed at 1 and the other models ranging between -3 and 3. The first line gives the lower bound for each model and the second line gives the upper bound.

1 -3 -3 -3 0 0
1 3 3 3 1 1
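
As a rough illustration, the example file above could be generated and sanity-checked with a few lines of Python. This is only a sketch: the function names and arguments are hypothetical and not part of STTK or the optimization script, and it assumes (as the example suggests) that the first line holds lower bounds and the second line holds upper bounds, in the decoder's model order.

```python
def build_restriction_lines(num_phrase_scores, lm_value=1.0,
                            other_range=(-3.0, 3.0), phrase_range=(0.0, 1.0)):
    """Return the (lower, upper) lines of a restriction file.

    Column order follows the decoder's model order: language model,
    reordering model, sentence length model, phrase count model,
    then one column per translation model score.
    """
    lower = [lm_value] + [other_range[0]] * 3 + [phrase_range[0]] * num_phrase_scores
    upper = [lm_value] + [other_range[1]] * 3 + [phrase_range[1]] * num_phrase_scores

    def fmt(values):
        # "%g"-style formatting prints 1.0 as "1" and -3.0 as "-3".
        return " ".join(f"{v:g}" for v in values)

    return fmt(lower), fmt(upper)


def fixed_columns(lower_line, upper_line):
    """Return the indices of models whose lower and upper bounds coincide."""
    lower = [float(x) for x in lower_line.split()]
    upper = [float(x) for x in upper_line.split()]
    if len(lower) != len(upper):
        raise ValueError("both lines must have the same number of columns")
    return [i for i, (lo, hi) in enumerate(zip(lower, upper)) if lo == hi]


lower_line, upper_line = build_restriction_lines(num_phrase_scores=2)
print(lower_line)
print(upper_line)
print("fixed model columns:", fixed_columns(lower_line, upper_line))
```

If fixed_columns returns an empty list, no model is held fixed, which goes against the advice above.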

Running training

  • Make sure your staging directory has a link to the optimization script
  • Make sure your staging directory has a ParameterRestrictionFile in it
  • Run TranslateTS with your new parameter file and save the output to a log file
  • grep 'TrainTranslationScore' to see how the score is improving
  • grep 'Got' to see the scaling factors found by the optimization
  • Note: It is common to see the score drop after the first iteration (since the n-best list doesn't have many negative examples)
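
If you prefer to collect both kinds of lines in one pass rather than running grep twice, a small Python helper along these lines might do. It is a sketch only: the function name is hypothetical, the log path is whatever file you saved the TranslateTS output to, and nothing is assumed about the log format beyond the two search strings mentioned above.

```python
import sys


def extract_mer_progress(log_lines):
    """Split log lines into score lines and scaling-factor lines.

    Matches the same substrings as the two grep commands:
    'TrainTranslationScore' and 'Got'.
    """
    scores, factors = [], []
    for line in log_lines:
        text = line.rstrip("\n")
        if "TrainTranslationScore" in text:
            scores.append(text)
        if "Got" in text:
            factors.append(text)
    return scores, factors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mer_progress.py <log file>
    with open(sys.argv[1]) as log_file:
        score_lines, factor_lines = extract_mer_progress(log_file)
    print("\n".join(score_lines))
    print("\n".join(factor_lines))
```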

User parameters in the script

There are a few parameters in the script that allow you to change the way the optimization runs. Their default settings usually work well, but you might want to experiment with them to speed up the process if your parameters are generalizing well. Here they are in no particular order.

  • NumRandomTests : currently set to 5. Number of times new random seeds are used in the optimization process
  • ConvergedLimit : currently set to 3. Number of times the error must stay the same to claim convergence has occurred (within a random seed run)
  • IterationLimit : currently set to 20. Max number of times random permutations are considered for a single random seed
  • ExpansionMargin : currently set to 0.0. If the optimized parameter value gets within the ExpansionMargin of its right bound, the right bound will be increased by ExpansionFactor * (maxValue - minValue), effectively increasing the search space.
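
The ExpansionMargin rule can be pictured with the short Python sketch below. It is not taken from the script itself: the function name is hypothetical, "within the margin" is assumed to mean a distance less than or equal to ExpansionMargin, and the numbers in the usage lines are purely illustrative.

```python
def expand_right_bound(value, min_value, max_value,
                       expansion_margin, expansion_factor):
    """Return the (possibly enlarged) right bound for one parameter.

    If the optimized value lies within expansion_margin of max_value,
    the right bound grows by expansion_factor * (max_value - min_value).
    Otherwise the bound is returned unchanged.
    """
    if max_value - value <= expansion_margin:
        return max_value + expansion_factor * (max_value - min_value)
    return max_value


# Illustrative values only: a parameter restricted to [-3, 3].
print(expand_right_bound(2.9, -3.0, 3.0, expansion_margin=0.2, expansion_factor=0.5))
print(expand_right_bound(1.0, -3.0, 3.0, expansion_margin=0.2, expansion_factor=0.5))
```

Under this reading, a margin of 0.0 means the bound only grows when the optimized value lands exactly on the right bound.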