·Two issues with large-vocabulary bigram LMs:
·With vocabulary size V and N word exits per frame, N×V cross-word transitions per frame (e.g., V = 64K and N = 1K would mean 64M transitions per frame)
·Bigram probabilities are very sparse; most word pairs "back off" to unigrams (see the backoff form below)
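For reference, a minimal sketch of the standard Katz-style backoff form these bullets allude to; the notation (the discounted bigram estimate and the backoff weight α(w)) is assumed here, not defined on the slide:

```latex
P(v \mid w) =
  \begin{cases}
    \hat{P}(v \mid w) & \text{if bigram } (w, v) \text{ was observed} \\
    \alpha(w)\, P(v)  & \text{otherwise (back off to the unigram)}
  \end{cases}
```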
·Optimize cross-word transitions using a "backoff node":
·Viterbi decision at the backoff node selects the single best predecessor (see the sketch after these bullets)
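The saving is easiest to see in code. Below is a minimal, self-contained Python sketch, assuming log-domain scores; all names (cross_word_transitions, exit_scores, bigram, backoff_wt, unigram) are illustrative, not from the slides. Unseen word pairs are routed through one shared backoff node, and the Viterbi max at that node keeps only a single best predecessor.

```python
def cross_word_transitions(exit_scores, bigram, backoff_wt, unigram):
    """One frame of cross-word transitions via a shared backoff node.

    Hypothetical inputs (all log-domain):
      exit_scores: {w: Viterbi score of best path ending in word w this frame}
      bigram:      {(w, v): log P(v | w)} for the sparse, explicitly seen pairs
      backoff_wt:  {w: log alpha(w)}, backoff weight of predecessor w
      unigram:     {v: log P(v)} for every word v in the vocabulary
    Returns {v: (best entry score, best predecessor word)}.
    """
    # Viterbi decision at the backoff node: of all N exiting words, only the
    # single best predecessor survives (N transitions into the node).
    backoff_score, backoff_pred = max(
        ((score + backoff_wt[w], w) for w, score in exit_scores.items()),
        key=lambda t: t[0],
    )

    # Every word v is reachable through the node via its unigram probability
    # (V transitions out of the node instead of N*V word-to-word arcs).
    entry = {v: (backoff_score + p, backoff_pred) for v, p in unigram.items()}

    # Explicitly seen bigrams keep their direct arcs and override the backoff
    # path whenever they score better.
    for (w, v), p in bigram.items():
        if w in exit_scores:
            score = exit_scores[w] + p
            if score > entry[v][0]:
                entry[v] = (score, w)
    return entry
```

Per frame this costs N transitions into the node, V out of it, plus the few explicit bigram arcs, instead of N×V; the price is the approximation named above: every word entered via the node inherits the node's single surviving predecessor.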