The tagging accuracies obtained with MU on CZ-EN and FR-EN are similar to that obtained by Li and Jurafsky (2015) with their multi-sense model (93.8), while the accuracy of SG is more competitive in our case (around 94.0 compared to 92.5), although they use a larger corpus for training the word representations.
As our downstream evaluation task, we use the learned word representations to initialize the embedding layer of a neural network tagging model.
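As an illustration of this initialization step, the sketch below copies pretrained vectors into the embedding layer of a small bidirectional-LSTM tagger. The use of PyTorch, and all names and dimensions, are assumptions for illustration; the text does not specify the tagger architecture.

```python
# Minimal sketch (PyTorch assumed): initialize a tagger's embedding layer
# with pretrained word representations; the vectors are then fine-tuned
# jointly with the rest of the network. Shapes and names are illustrative.
import torch
import torch.nn as nn

pretrained = torch.randn(10000, 100)  # stand-in for the learned vectors (V x d)

class Tagger(nn.Module):
    def __init__(self, pretrained, num_tags, hidden=128):
        super().__init__()
        vocab_size, dim = pretrained.shape
        self.embed = nn.Embedding(vocab_size, dim)
        self.embed.weight.data.copy_(pretrained)  # initialization from pretraining
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):             # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)                     # per-token tag scores

tagger = Tagger(pretrained, num_tags=17)
```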
similarity dataset for evaluating multi-sense embeddings, since it allows us to perform the sense prediction step based on the sentential context provided for each word in the pair.
In practice, we avoid the costly computation of the normalization factor in the softmax of Eq. Optimizing the autoencoding objective is broadly similar to the learning algorithms defined for multi-sense embedding induction in previous work (Neelakantan et al., 2014; Li and Jurafsky, 2015).
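The specific approximation is not named in this excerpt; one common way to sidestep the softmax normalizer is negative sampling, as popularized by word2vec. The sketch below is written under that assumption, with illustrative names and shapes.

```python
# Hedged sketch: negative sampling as one way to avoid the full softmax
# normalizer (an assumption, not the method stated in the text). Only the
# observed target word and a handful of sampled negatives are scored.
import numpy as np

def negative_sampling_loss(ctx_vec, target_vec, neg_vecs):
    """ctx_vec, target_vec: (d,) vectors; neg_vecs: (k, d) negative samples."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(ctx_vec @ target_vec))            # pull the true target closer
    neg = np.sum(np.log(sigmoid(-(neg_vecs @ ctx_vec))))   # push negatives away
    return -(pos + neg)

rng = np.random.default_rng(0)
loss = negative_sampling_loss(rng.normal(size=100),
                              rng.normal(size=100),
                              rng.normal(size=(5, 100)))
```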
Figure 1: Model schema: the sense encoder with bilingual signal and the context-word predictor are learned jointly.
The main differences are that we use word embeddings for generation and include N-gram features for ranking, both of which can easily be obtained from raw text.
Hassan and Menezes (2013) use random walks in a bipartite graph based on words and their contexts to generate normalization candidates, which they rank using the Viterbi algorithm.
We use features from the generation modules as well as additional features in a random forest classifier, which decides which candidate is the correct normalization.
We train a random forest classifier to rank the candidates; it generalizes well across the different types of normalization actions.
Our proposed model is based on modular candidate generation, in which each module is responsible for a different type of normalization action.
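A minimal, hypothetical sketch of such a pipeline is given below: the module names, features, and toy training data are invented for illustration, but the overall shape, several generation modules feeding one random forest ranker, follows the description above.

```python
# Hypothetical sketch of modular candidate generation plus random forest
# ranking; the modules, features, and training data are toy placeholders.
from sklearn.ensemble import RandomForestClassifier

def lexical_module(token):       # e.g. edit-distance lookup in a lexicon
    return ["tomorrow"] if token == "tmrw" else []

def embedding_module(token):     # e.g. nearest neighbours in embedding space
    return ["tomorrow", "tumor"] if token == "tmrw" else []

def generate_candidates(token):
    # Union of candidates from all modules, remembering which module proposed each.
    cands = {}
    for name, module in [("lexical", lexical_module), ("embedding", embedding_module)]:
        for cand in module(token):
            cands.setdefault(cand, set()).add(name)
    return cands

def features(token, candidate, sources):
    # Toy features: which modules proposed the candidate, plus character overlap.
    return [int("lexical" in sources), int("embedding" in sources),
            len(set(token) & set(candidate)) / max(len(candidate), 1)]

# In practice X, y would be built from annotated (token, candidate, is_correct) triples.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit([[1, 1, 0.9], [0, 1, 0.3]], [1, 0])

cands = generate_candidates("tmrw")
scored = {c: clf.predict_proba([features("tmrw", c, src)])[0][1]
          for c, src in cands.items()}
print(max(scored, key=scored.get))   # best-scoring normalization candidate
```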
The resulting word representations achieve better performance than those from the unlabeled-tree model.
As our extension builds upon sequential and unlabeled-tree HMMs, we also revisit the basic difference between the two, but are unable to entirely corroborate the alleged advantage of syntactic context for word representations in the NER task.
Unlike most recent research in word representation learning, which focuses heavily on word embeddings from the neural network tradition (Collobert and Weston, 2008; Mikolov et al., 2013a; Pennington et al., 2014), our work falls into the framework of hidden Markov models (HMMs), drawing on the work of Grave et al. (2013).
Recently, it has been shown that representations using syntactic contexts can be superior to those learned from linear sequences in downstream tasks such as named entity recognition (Grave et al., 2013), dependency parsing (Bansal et al., 2014; Sagae and Gordon, 2009) and PP-attachment disambiguation (Belinkov et al., 2014).
We observe improvements from exploiting syntactic function information in both cases, with results rivaling those of state-of-the-art representation learning methods.