Recognition performance of a large-scale dependency-grammar language model
A. Berger, H. Printz
International Conference on Spoken Language Processing, Sydney, Australia
(1998)
A Comparison of Criteria for Maximum Entropy/Minimum Divergence Language Modelling
A. Berger, H. Printz
Third Conference on Empirical Methods in Natural Language Processing. Granada, Spain (1998)
Just in Time Language Modelling
A. Berger, R. Miller
IEEE Conference on Acoustics, Speech and Signal Processing. Seattle, WA (1998)
Traditional approaches to language modelling have relied on a fixed
corpus of text to inform the parameters of a probability distribution
over word sequences. Increasing the corpus size often leads to
better-performing language models, but no matter how large, the corpus
is a static entity, unable to reflect information about events which
postdate it. In these pages we introduce an online paradigm which
interleaves the estimation and application of a language model. We
present a Bayesian approach to online language modelling, in which the
marginal probabilities of a static trigram model are dynamically
updated to match the topic being dictated to the system. We also
describe the architecture of a prototype we have implemented which
uses the World Wide Web (WWW) as a source of information, and provide
results from some initial proof-of-concept experiments.
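As a rough illustration of the adaptation step, the sketch below interpolates a static trigram distribution with unigram statistics estimated from freshly retrieved text. The function name, the dictionary representation, and the simple linear interpolation are assumptions of this illustration, not the paper's exact Bayesian update of the trigram marginals.

    from collections import Counter

    def adapt_trigram(static_trigram, retrieved_text, lam=0.3):
        """Blend a static trigram model with unigram statistics from
        just-retrieved, topic-relevant text (e.g. fetched from the WWW).

        static_trigram: dict mapping a history (w1, w2) -> {w3: prob}
        retrieved_text: list of tokens from the retrieved documents
        lam:            interpolation weight given to the topical unigram

        This linear interpolation stands in for the paper's Bayesian
        update; it is an illustrative simplification.
        """
        counts = Counter(retrieved_text)
        total = sum(counts.values())
        topical = {w: c / total for w, c in counts.items()}
        adapted = {}
        for history, dist in static_trigram.items():
            adapted[history] = {w: (1 - lam) * p + lam * topical.get(w, 0.0)
                                for w, p in dist.items()}
        return adapted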
Cyberpunc: A lightweight punctuation annotation system for speech
D. Beeferman, A. Berger, J. Lafferty
IEEE Conference on Acoustics, Speech and Signal Processing. Seattle, WA (1998)
This paper describes a lightweight method for the automatic insertion
of intra-sentence punctuation into text. Despite the intuition that
pauses in an acoustic stream are a positive indicator for some types
of punctuation, this work demonstrates the feasibility of a system
which relies solely on lexical information. Besides its potential role
in a speech recognition system, such a system could serve equally well
in non-speech applications such as automatic grammar correction in a
word processor and parsing of spoken text. After describing the design
of a punctuation-restoration system, which relies on a trigram
language model and a straightforward application of the Viterbi
algorithm, we summarize results, both quantitative and subjective, of
the performance and behavior of a prototype system.
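A minimal sketch of the trigram-plus-Viterbi design described above: candidate punctuation marks are hypothesized after each word, and the insertion sequence that maximizes trigram language-model probability is recovered by dynamic programming. The lm interface (a smoothed conditional probability function) and the state bookkeeping are simplifying assumptions of this sketch, not the Cyberpunc implementation.

    import math

    def punctuate(words, lm, punct=(None, ",")):
        """Insert intra-sentence punctuation between words so as to
        maximize trigram probability, via Viterbi search.

        words: list of tokens, e.g. "we came we saw we left".split()
        lm:    function (w1, w2, w3) -> P(w3 | w1, w2), assumed smoothed
        punct: candidate insertions after each word (None = no mark)
        """
        # A Viterbi state is the pair of most recently emitted tokens.
        beams = {("<s>", "<s>"): (0.0, [])}   # state -> (log prob, output)
        for w in words:
            nxt = {}
            for (u, v), (lp, out) in beams.items():
                for p in punct:
                    seq = [w] if p is None else [w, p]
                    lp2, state = lp, (u, v)
                    for t in seq:             # emit the word, then the mark
                        lp2 += math.log(lm(state[0], state[1], t))
                        state = (state[1], t)
                    if state not in nxt or lp2 > nxt[state][0]:
                        nxt[state] = (lp2, out + seq)
            beams = nxt
        return max(beams.values(), key=lambda s: s[0])[1]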
Text segmentation using exponential models
D. Beeferman, A. Berger, J. Lafferty
Second Conference on Empirical Methods in Natural Language Processing. Providence, RI. (1997)
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both
short-range and long-range language models to help it sniff out likely
sites of topic changes in text. To aid its search, the system consults a
set of simple lexical hints it has learned to associate with the
presence of boundaries through inspection of a large corpus of annotated
data. We also propose a new probabilistically motivated error metric for
use by the natural language processing and information retrieval
communities, intended to supersede precision and recall for appraising
segmentation algorithms. Qualitative assessment of our algorithm as
well as evaluation using this new metric demonstrate the effectiveness
of our approach in two very different domains: Wall Street
Journal articles and the TDT Corpus,
a collection of newswire articles and broadcast news transcripts.
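The sketch below illustrates the flavor of the proposed metric: slide a probe of fixed width k across the text and measure how often the reference and hypothesized segmentations disagree about whether the probe's two endpoints fall within the same segment. This is an illustrative simplification in the spirit of the paper's probabilistically motivated metric, not its exact estimator.

    def window_error(ref_bounds, hyp_bounds, n, k):
        """Window-based segmentation error: the fraction of width-k
        probes on which reference and hypothesis disagree about
        whether the probe's endpoints lie in the same segment.

        ref_bounds, hyp_bounds: sets of boundary positions, where a
        boundary at i separates word i-1 from word i; n: word count.
        """
        def same_segment(bounds, i, j):
            # True iff no boundary falls strictly between i and j.
            return not any(i < b <= j for b in bounds)

        trials = n - k
        errors = sum(same_segment(ref_bounds, i, i + k)
                     != same_segment(hyp_bounds, i, i + k)
                     for i in range(trials))
        return errors / trials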
A Model of Lexical Attraction and Repulsion
D. Beeferman, A. Berger, J. Lafferty
ACL-EACL'97 Joint Conference, Madrid, Spain (1997)
This paper introduces new techniques based on exponential
families for modeling the correlations between words in
text and speech.
The motivation for this work is to build improved
statistical language models by treating a static trigram model as a
default distribution, and adding sufficient statistics, or "features,"
to a family of conditional exponential distributions in order to model
the nonstationary characteristics of language. We focus on features
based on pairs of mutually informative words which allow the trigram
model to adapt to recent context. While previous work assumed the
effects of these word pairs to be constant over a window of several
hundred words, we show that their influence is nonstationary on a much
smaller time scale. In particular, empirical samples drawn from both
written text and conversational speech reveal that the "attraction"
between words decays exponentially, while stylistic and syntactic
constraints create a "lexical exclusion" effect that discourages close
co-occurrence. We show that these characteristics are well described by
mixture models based on two-stage exponential distributions. These
models are a common tool in queueing theory, but they have not
previously found use in speech and language processing. We show how the
EM algorithm can be used to estimate the parameters of these models,
which can then be incorporated as penalizing features in the posterior
distribution for predicting the next word. Experimental results
illustrate the benefit these techniques yield when incorporated into
a long-range language model.
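To make the estimation step concrete, the sketch below runs EM on a two-component mixture fit to observed word-pair distances. For brevity it uses plain exponential densities in place of the paper's two-stage exponential components, so it illustrates the EM machinery rather than reproducing the published model.

    import math

    def em_exponential_mixture(distances, n_iter=50):
        """Fit a two-component mixture of exponential densities to
        positive word-pair distances by EM. (The paper's components
        are two-stage exponentials; plain exponentials are used here
        as a simplification.)"""
        mean = sum(distances) / len(distances)
        rates = [2.0 / mean, 0.5 / mean]     # heuristic initialization
        w = 0.5                              # mixing weight
        for _ in range(n_iter):
            # E-step: posterior responsibility of component 0 per point.
            resp = []
            for d in distances:
                p0 = w * rates[0] * math.exp(-rates[0] * d)
                p1 = (1 - w) * rates[1] * math.exp(-rates[1] * d)
                resp.append(p0 / (p0 + p1))
            # M-step: weighted maximum-likelihood re-estimates.
            r0 = sum(resp)
            w = r0 / len(distances)
            rates[0] = r0 / sum(r * d for r, d in zip(resp, distances))
            rates[1] = (len(distances) - r0) / sum(
                (1 - r) * d for r, d in zip(resp, distances))
        return w, rates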