Recognition performance of a large-scale dependency-grammar language model
A. Berger, H. Printz
International Conference on Spoken Language Processing. Sydney, Australia (1998)
A Comparison of Criteria for Maximum Entropy/Minimum Divergence Language Modelling
A. Berger, H. Printz
Third Conference on Empirical Methods in Natural Language Processing. Granada, Spain (1998)
Just-in-Time Language Modelling
A. Berger, R. Miller
IEEE Conference on Acoustics, Speech and Signal Processing. Seattle, WA (1998)
Traditional approaches to language modelling have relied on a fixed corpus of text to inform the parameters of a probability distribution over word sequences. Increasing the corpus size often leads to better-performing language models, but no matter how large, the corpus is a static entity, unable to reflect information about events which postdate it. In these pages we introduce an online paradigm which interleaves the estimation and application of a language model. We present a Bayesian approach to online language modelling, in which the marginal probabilities of a static trigram model are dynamically updated to match the topic being dictated to the system. We also describe the architecture of a prototype we have implemented which uses the World Wide Web (WWW) as a source of information, and provide results from some initial proof of concept experiments.
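As a rough illustration of this kind of dynamic update, the sketch below rescales a static trigram distribution toward unigram statistics gathered from recently retrieved, on-topic text. This is a generic unigram-rescaling sketch rather than the paper's Bayesian update, and every name in it (adapt_trigram, beta, and so on) is hypothetical.

```python
def adapt_trigram(p_trigram, vocab, topic_counts, background_counts, beta=0.5):
    """Rescale a static trigram distribution toward a topic (illustrative sketch).

    p_trigram         : dict mapping word -> P(word | history) under the static model
    vocab             : iterable of all words to renormalize over
    topic_counts      : dict of word counts from recently retrieved, on-topic text
    background_counts : dict of word counts from the static training corpus
    beta              : adaptation strength (0 = no adaptation)

    Returns a renormalized distribution whose unigram marginals are pushed
    toward the topic sample, one simple stand-in for a dynamic marginal update.
    """
    vocab = list(vocab)
    topic_total = sum(topic_counts.values()) or 1
    bg_total = sum(background_counts.values()) or 1
    scores = {}
    for w in vocab:
        # Add-one smoothed unigram estimates for the topic and background samples.
        p_topic = (topic_counts.get(w, 0) + 1) / (topic_total + len(vocab))
        p_bg = (background_counts.get(w, 0) + 1) / (bg_total + len(vocab))
        scores[w] = p_trigram.get(w, 1e-12) * (p_topic / p_bg) ** beta
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}
```

With beta set to zero the static model is recovered unchanged; larger values push the distribution harder toward the topic sample.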
Cyberpunc: A lightweight punctuation annotation system for speech
D. Beeferman, A. Berger, J. Lafferty
IEEE Conference on Acoustics, Speech and Signal Processing. Seattle, WA (1998)
This paper describes a lightweight method for the automatic insertion of intra-sentence punctuation into text. Despite the intuition that pauses in an acoustic stream are a positive indicator for some types of punctuation, this work will demonstrate the feasibility of a system which relies solely on lexical information. Besides its potential role in a speech recognition system, such a system could serve equally well in non-speech applications such as automatic grammar correction in a word processor and parsing of spoken text. After describing the design of a punctuation-restoration system, which relies on a trigram language model and a straightforward application of the Viterbi algorithm, we summarize results, both quantitative and subjective, of the performance and behavior of a prototype system.
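The sketch below shows how such a punctuation search can be framed as Viterbi decoding under a trigram model, with states given by the last two emitted tokens. It is a generic illustration of the trigram-plus-Viterbi design described above, not the Cyberpunc implementation; the logprob interface and the punctuation inventory are assumptions.

```python
PUNCT = ["", ",", ";", ":"]   # "" means no punctuation after the word


def restore_punctuation(words, logprob):
    """Viterbi search for intra-sentence punctuation (illustrative sketch).

    words   : list of word tokens without punctuation
    logprob : caller-supplied trigram model, (w, u, v) -> log P(w | u, v)
              (hypothetical interface)
    Returns the words with the highest-scoring punctuation interleaved.
    """
    # A Viterbi state is the pair of the last two emitted tokens, so the
    # trigram context is exact.  Each entry: state -> (score, emitted tokens).
    states = {("<s>", "<s>"): (0.0, [])}
    for w in words:
        new_states = {}
        for (u, v), (score, seq) in states.items():
            base = score + logprob(w, u, v)        # emit the word itself
            for p in PUNCT:
                if p:                              # emit punctuation after the word
                    s, key, out = base + logprob(p, v, w), (w, p), seq + [w, p]
                else:                              # no punctuation after the word
                    s, key, out = base, (v, w), seq + [w]
                if key not in new_states or s > new_states[key][0]:
                    new_states[key] = (s, out)
        states = new_states
    return max(states.values(), key=lambda t: t[0])[1]
```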
Text segmentation using exponential models
D. Beeferman, A. Berger, J. Lafferty
Second Conference on Empirical Methods in Natural Language Processing. Providence, RI. (1997)
This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data. We also propose a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities, intended to supersede precision and recall for appraising segmentation algorithms. Qualitative assessment of our algorithm as well as evaluation using this new metric demonstrate the effectiveness of our approach in two very different domains, Wall Street Journal articles and the TDT Corpus, a collection of newswire articles and broadcast news transcripts.
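For concreteness, the sketch below computes a window-based error of the kind the abstract advocates: an estimate of the probability that two positions a fixed distance apart are inconsistently judged to lie in the same segment by the reference and the hypothesis. This is one common formulation of such a probabilistic segmentation metric, offered only as an illustration; the function name and the label-based input format are assumptions.

```python
def pk_error(reference, hypothesis, k=None):
    """Window-based segmentation error (illustrative sketch).

    reference, hypothesis : lists of segment labels, one per token position,
                            e.g. [0, 0, 0, 1, 1, 2, ...]
    k : window size; defaults to half the mean reference segment length
    Returns the fraction of windows on which the two segmentations disagree
    about whether the endpoints fall in the same segment.
    """
    n = len(reference)
    assert n == len(hypothesis), "segmentations must cover the same positions"
    if k is None:
        num_segments = len(set(reference))
        k = max(1, round(n / (2 * num_segments)))
    comparisons = n - k
    errors = 0
    for i in range(comparisons):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / comparisons if comparisons else 0.0
```

For example, pk_error([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) is nonzero because the hypothesized boundary falls one position early.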
A Model of Lexical Attraction and Repulsion
D. Beeferman, A. Berger, J. Lafferty
ACL-EACL'97 Joint Conference. Madrid, Spain (1997)
This paper introduces new techniques based on exponential families for modeling the correlations between words in text and speech. The motivation for this work is to build improved statistical language models by treating a static trigram model as a default distribution, and adding sufficient statistics, or "features," to a family of conditional exponential distributions in order to model the nonstationary characteristics of language. We focus on features based on pairs of mutually informative words which allow the trigram model to adapt to recent context. While previous work assumed the effects of these word pairs to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. In particular, empirical samples drawn from both written text and conversational speech reveal that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "lexical exclusion" effect that discourages close co-occurrence. We show that these characteristics are well described by mixture models based on two-stage exponential distributions. These models are a common tool in queueing theory, but they have not previously found use in speech and language processing. We show how the EM algorithm can be used to estimate the parameters of these models, which can then be incorporated as penalizing features in the posterior distribution for predicting the next word. Experimental results illustrate the benefit these techniques yield when incorporated into a long-range language model.
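To make the mixture idea concrete, the sketch below fits a two-component mixture of a plain exponential and a two-stage (Erlang-2) exponential density to inter-word distances with EM, using the closed-form M-step updates for the rates. It is a generic illustration of this style of estimation, not the paper's exact parameterization, and all names in it are hypothetical.

```python
import math


def fit_distance_mixture(distances, iters=50):
    """EM for a mixture of an exponential and a two-stage exponential density
    over word-pair distances (illustrative sketch).

    Component 1: f(d) = lam_e * exp(-lam_e * d)
    Component 2: f(d) = lam_g**2 * d * exp(-lam_g * d)   (Erlang-2)

    distances : nonempty list of positive distances (in words) between
                occurrences of a mutually informative word pair
    Returns (mixing weight of the exponential component, lam_e, lam_g).
    """
    mean_d = sum(distances) / len(distances)
    w, lam_e, lam_g = 0.5, 1.0 / mean_d, 2.0 / mean_d
    for _ in range(iters):
        # E-step: responsibility of the exponential component for each distance.
        resp = []
        for d in distances:
            p_e = w * lam_e * math.exp(-lam_e * d)
            p_g = (1 - w) * lam_g ** 2 * d * math.exp(-lam_g * d)
            resp.append(p_e / (p_e + p_g))
        # M-step: closed-form updates for the rates and the mixing weight.
        se = sum(resp)
        sd_e = sum(r * d for r, d in zip(resp, distances))
        sg = len(distances) - se
        sd_g = sum((1 - r) * d for r, d in zip(resp, distances))
        w = se / len(distances)
        lam_e = se / sd_e if sd_e else lam_e
        lam_g = 2 * sg / sd_g if sd_g else lam_g
    return w, lam_e, lam_g
```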