K-vec: A New Approach for Aligning Parallel Texts
Various methods have been proposed for aligning texts in two or more languages such as the Canadian Parliamentary Debates(Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word "fisheries" is similar to the French "pe^ches" by noting that the distribution of "fisheries" in the English text is similar to the distribution of "pe^ches" in the French. K-vec does not depend on sentence boundaries.