
### A Comparison of Two Smoothing Methods for Word Bigram Models

**1994-10-31**

cmp-lg/9410034

Linda Bauman Peto
Department of Computer Science
University of Toronto

**Abstract.** Word bigram models estimated from text corpora
require smoothing methods to estimate the probabilities of unseen bigrams. The
deleted estimation method uses the formula:
Pr(i|j) = lambda f_i + (1 - lambda) f_{i|j},

where f_i and f_{i|j} are the relative frequency of i and the conditional
relative frequency of i given j, respectively, and lambda is an optimized
parameter. MacKay (1994) proposes a
Bayesian approach using Dirichlet priors, which yields a different formula:
Pr(i|j) = (alpha/(F_j + alpha)) m_i + (1 - alpha/(F_j + alpha)) f_{i|j},

where F_j is the count of j, and alpha and m_i are optimized parameters. This thesis
describes an experiment in which the two methods were trained on a
two-million-word corpus taken from the Canadian _Hansard_ and compared on the
basis of the experimental perplexity that they assigned to a shared test
corpus. The two methods proved about equally accurate, with MacKay's method
requiring fewer computational resources.
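The two smoothing formulas can be sketched directly from the abstract. The snippet below is a minimal illustration under assumed data structures (dictionaries of relative frequencies and counts); the function names and the toy numbers are not from the thesis.

```python
# Illustrative sketch of the two smoothing formulas in the abstract.
# Function names, data layout, and toy counts are assumptions, not the
# thesis's actual implementation.

def deleted_estimation(i, j, f_uni, f_cond, lam):
    """Interpolated estimate: Pr(i|j) = lambda*f_i + (1-lambda)*f_{i|j}.

    f_uni[i]       -- relative frequency of word i in the corpus
    f_cond[(j, i)] -- conditional relative frequency of i given j
    lam            -- interpolation weight, optimized on held-out data
    """
    return lam * f_uni.get(i, 0.0) + (1 - lam) * f_cond.get((j, i), 0.0)

def dirichlet_estimate(i, j, m, f_cond, F, alpha):
    """Dirichlet-prior estimate with prior weight alpha/(F_j + alpha).

    m[i]  -- optimized prior probability of word i
    F[j]  -- count of context word j in the corpus
    alpha -- optimized concentration parameter
    """
    w = alpha / (F[j] + alpha)
    return w * m.get(i, 0.0) + (1 - w) * f_cond.get((j, i), 0.0)

# Toy data: context "the" seen twice, both times followed by "cat".
f_uni = {"the": 0.5, "cat": 0.5}
f_cond = {("the", "cat"): 1.0}
F = {"the": 2}
m = {"the": 0.5, "cat": 0.5}

p_del = deleted_estimation("cat", "the", f_uni, f_cond, lam=0.5)
# 0.5*0.5 + 0.5*1.0 = 0.75
p_dir = dirichlet_estimate("cat", "the", m, f_cond, F, alpha=1.0)
# (1/3)*0.5 + (2/3)*1.0 = 5/6
```

Note that in both formulas the conditional frequency f_{i|j} dominates as the evidence for context j grows (lambda small, or F_j large relative to alpha), which is one way to read the finding that the two methods achieved similar test-set perplexity.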
