Abstract
The Viterbi algorithm is an efficient and optimal method for decoding
linear-chain Markov Models. However, the entire input sequence must be
observed before the labels for any time step can be generated, and
therefore Viterbi cannot be directly applied to
online/interactive/streaming scenarios without incurring significant
(possibly unbounded) latency. A widely used approach is to break the
input stream into fixed-size windows, and apply Viterbi to each window.
Larger windows lead to higher accuracy, but result in higher latency.
We propose several alternative algorithms to the fixed-sized window
decoding approach. These approaches compute a certainty measure on
predicted labels that allows us to trade off latency for expected
accuracy dynamically, without having to choose a fixed window size up
front. Not surprisingly, this more principled approach gives us a
substantial improvement over choosing a fixed window. We show the
effectiveness of the approach for the task of spotting semi-structured
information in large documents.
|
Pradeep Ravikumar Last modified: Thu Nov 9 02:04:18 EST 2006