Towards History-based Grammars: Using Richer Models for Probabilistic Parsing
1994-05-03
9405007 | cmp-lg
We describe a generative probabilistic model of natural language, which we
call HBG, that takes advantage of detailed linguistic information to resolve
ambiguity. HBG incorporates lexical, syntactic, semantic, and structural
information from the parse tree into the disambiguation process in a novel way.
We use a corpus of bracketed sentences, called a Treebank, in combination with
decision tree building to tease out the relevant aspects of a parse tree that
will determine the correct parse of a sentence. This stands in contrast to the
usual approach of further tailoring the grammar through linguistic
introspection in the hope of generating the correct parse. In head-to-head
tests against one of the best existing robust probabilistic parsing models,
which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing
the parsing accuracy rate from 60% to 75%, a 37% reduction in error.
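
The error-reduction figure follows from the two accuracy rates quoted above: an accuracy of 60% leaves a 40% error rate, and an accuracy of 75% leaves 25%. The short sketch below works through that arithmetic. The function name and its parameters are hypothetical, chosen only for illustration, and the computation assumes "reduction in error" means the relative drop in error rate.

```python
def relative_error_reduction(baseline_accuracy_pct, new_accuracy_pct):
    """Relative drop in error rate, given two accuracy percentages.

    Hypothetical helper for illustration; assumes error = 100 - accuracy.
    """
    baseline_error = 100 - baseline_accuracy_pct  # P-CFG: 100 - 60 = 40
    new_error = 100 - new_accuracy_pct            # HBG:   100 - 75 = 25
    return (baseline_error - new_error) / baseline_error


# Accuracy rates quoted in the abstract: P-CFG at 60%, HBG at 75%.
reduction = relative_error_reduction(60, 75)
print(f"{reduction:.1%}")  # prints 37.5%
```

Under that reading the reduction is 15/40 = 37.5%, which is consistent with the roughly 37% reported.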