Automated Postediting of Documents
1994-07-29
9407028 | cmp-lg
Large amounts of low- to medium-quality English text are now being produced
by machine translation (MT) systems, optical character readers (OCR), and
non-native speakers of English. Most of this text must be postedited by hand
before it sees the light of day. Improving text quality is tedious work, but
its automation has not received much research attention. Anyone who has
postedited a technical report or thesis written by a non-native speaker of
English knows the potential of an automated postediting system. For the case of
MT-generated text, we argue for the construction of postediting modules that
are portable across MT systems, as an alternative to hardcoding improvements
inside any one system. As an example, we have built a complete self-contained
postediting module for the task of article selection (a, an, the) for English
noun phrases. This is a notoriously difficult problem for Japanese-English MT.
Our system contains over 200,000 rules derived automatically from online text
resources. We report on learning algorithms, accuracy, and comparisons with
human performance.
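
To make the idea of a self-contained article-selection postediting module concrete, here is a minimal sketch in Python. It is not the system described above: the abstract does not specify the features or the learning algorithm, so this toy assumes that a rule simply maps the word following an article to the article class most often seen before it in training text. The names `learn_rules`, `choose_article`, `postedit`, and the `<ART>` slot marker are hypothetical, introduced only for illustration.

```python
from collections import Counter, defaultdict

ARTICLES = {"a", "an", "the"}


def learn_rules(sentences):
    """Derive toy rules: for each word seen right after an article,
    record the article class that preceded it most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            if prev in ARTICLES:
                # "a" and "an" form one class; the surface form is picked later.
                label = "the" if prev == "the" else "a/an"
                counts[nxt][label] += 1
    # One rule per context word: the majority article class.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def choose_article(next_word, rules, default="the"):
    """Pick an article for the slot before next_word (assumption: unseen
    words fall back to the default class)."""
    label = rules.get(next_word.lower(), default)
    if label == "a/an":
        # Crude spelling heuristic; the real a/an choice depends on pronunciation.
        return "an" if next_word[0].lower() in "aeiou" else "a"
    return "the"


def postedit(tokens, rules, slot="<ART>"):
    """Fill every article slot in a token list using the learned rules."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == slot and i + 1 < len(tokens):
            out.append(choose_article(tokens[i + 1], rules))
        else:
            out.append(tok)
    return out


# Tiny illustrative training text (made up for this sketch).
corpus = [
    "the sun rose over a hill",
    "she saw a hill and the sun",
    "he ate an apple",
    "an apple fell near a hill",
]
rules = learn_rules(corpus)
draft = "<ART> sun warmed <ART> apple on <ART> hill".split()
print(" ".join(postedit(draft, rules)))  # the sun warmed an apple on a hill
```

Because the module only reads text with article slots and writes text with the slots filled, it could in principle sit behind any MT system that produces such output, which is the portability argument made above. A realistic version would need far richer context than the single following word.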