UdL at SemEval-2017 Task 1: Semantic textual similarity estimation of English sentence pairs using a regression model over pairwise features.
The Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015) is the primary evaluation data source, with the exception of the pilot track on cross-lingual Spanish-English STS. The cross-lingual track explores data from the WMT 2014 quality estimation task (Bojar et al., 2014). Sentence pairs in SNLI derive from Flickr30k image captions (Young et al., 2014) and are labeled with the entailment relations entailment, neutral, and contradiction.
Significant research effort has focused on STS over English sentence pairs, drawing on a wide range of data sources: i.a., news headlines, video and image descriptions, glosses from lexical resources including WordNet (Miller, 1995; Fellbaum, 1998), FrameNet (Baker et al., 1998), OntoNotes (Hovy et al., 2006), web discussion fora, plagiarism, MT post-editing and Q&A data sets.
The STS task is motivated by the observation that accurately modeling the meaning similarity of sentences is a foundational language understanding problem relevant to numerous applications, including machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, and dialog and conversational systems.
In order to better understand the effect of the proposed post-processing in the two similarity axes introduced in Section 1, we adopt the widely used word analogy and word similarity tasks, which offer specific benchmarks for semantics/syntax and similarity/relatedness, respectively.
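As a concrete illustration of how the word analogy benchmark probes embeddings, the sketch below implements the standard 3CosAdd rule on a toy vocabulary. The embedding values are invented for illustration and the function name is ours, not taken from the paper.

```python
import numpy as np

def analogy(emb, a, a_star, b):
    """Answer 'a is to a_star as b is to ?' with the 3CosAdd rule:
    pick the word whose vector is most similar to b - a + a_star,
    excluding the three query words themselves."""
    words = list(emb)
    X = np.array([emb[w] for w in words], dtype=float)
    # Normalize rows so that dot products equal cosine similarities.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    vec = dict(zip(words, X))
    target = vec[b] - vec[a] + vec[a_star]
    target /= np.linalg.norm(target)
    scores = X @ target
    for w in (a, a_star, b):            # query words may not be the answer
        scores[words.index(w)] = -np.inf
    return words[int(np.argmax(scores))]

# Toy embeddings (hypothetical values, for illustration only).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.2, 0.8, 0.9]),
}
print(analogy(emb, "man", "woman", "king"))  # → queen
```

Real analogy benchmarks apply this rule over thousands of semantic and syntactic question quadruples and report the accuracy of the retrieved word.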
While there have been several proposals to learn specialized word embeddings (Levy and Goldberg, 2014a; Kiela et al., 2015; Bojanowski et al., 2017), previous work explicitly altered the training objective and often relied on external resources like knowledge bases, whereas the proposed method is applied as a post-processing step to any pre-trained embedding model and does not require any additional resources.
Several unsupervised methods have been proposed to efficiently train dense vector representations of words (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2017) and successfully applied in a variety of tasks like parsing (Bansal et al., 2014), topic modeling (Batmanghelich et al., 2016) and document classification (Taddy, 2015).
A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in those aspects, providing a new perspective on how embeddings encode divergent linguistic information.
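As an illustration, the sketch below implements one plausible form of such a linear post-processing, assuming it amounts to multiplying the embedding matrix by a fractional power of its cross-correlation matrix M = XᵀX (computed via the eigendecomposition of the symmetric M). The parameterization and function name are our assumptions, not necessarily the exact transformation proposed.

```python
import numpy as np

def postprocess(X, alpha):
    """Apply the linear map W = M^alpha, with M = X^T X, to the embedding
    matrix X (one row per word). The fractional matrix power is taken
    through the eigendecomposition of the symmetric matrix M; varying
    alpha moves the model along a similarity axis (an assumption here)."""
    M = X.T @ X
    vals, vecs = np.linalg.eigh(M)        # M is symmetric PSD
    vals = np.clip(vals, 1e-12, None)     # guard against tiny negatives
    W = vecs @ np.diag(vals ** alpha) @ vecs.T
    return X @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))       # toy 50-dimensional embeddings
X0 = postprocess(X, 0.0)                  # alpha = 0 gives the identity map
assert np.allclose(X0, X, atol=1e-6)
```

Because the transformation is a single matrix multiplication, it can be applied to any pre-trained embedding table after training, which matches the "no retraining, no external resources" property described above.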
Abstract

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.
Comparing the performance of the hierarchical approaches with that of a 5-way flat SVM model (also tuned on the training set), the results turn out to be similar.
That is, when training the non-top binary classifiers, we do not consider instances of the classes handled at higher levels of the hierarchy.
Such a relation can be used to build a hierarchical approach that takes the label relationships into account inside the classification model.
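A minimal sketch of such a hierarchical cascade, assuming a hypothetical three-label setup and scikit-learn's LinearSVC: the top-level binary classifier handles one label, and the lower classifier is trained only on the remaining instances, mirroring the exclusion of higher-level classes described above.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical 3-label setup: a top-level binary classifier separates
# label "A" from {"B", "C"}; a second classifier, trained ONLY on the
# B/C instances (A is handled one level up), separates "B" from "C".
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
y = np.array(["A", "B", "C"] * 100)
X[y == "A", 0] += 3.0                        # make the classes separable
X[y == "B", 1] += 3.0
X[y == "C", 2] += 3.0

top = LinearSVC().fit(X, y == "A")           # A vs. rest
mask = y != "A"                              # drop the higher-level class
bottom = LinearSVC().fit(X[mask], y[mask])   # B vs. C only

def predict(x):
    x = x.reshape(1, -1)
    if top.predict(x)[0]:
        return "A"
    return bottom.predict(x)[0]

preds = np.array([predict(x) for x in X])
print((preds == y).mean())                   # training accuracy
```

Deeper label hierarchies follow the same pattern: each level filters out the classes already decided above it before training the next binary classifier.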
Within this context, the confusion matrix of a tuned 5-way classifier (see Figure 1) gives us an appropriate way to automatically identify the problematic label relationships and thus take fruitful steps towards a better solution.
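This confusion-matrix inspection can be sketched as follows; the label names, gold labels, and predictions are hypothetical, and the snippet simply locates the largest off-diagonal cell, i.e., the most frequently confused gold/predicted label pair.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical gold and predicted labels for a tuned 5-way classifier.
labels = ["correct", "partial", "contradictory", "irrelevant", "non_domain"]
gold = ["correct", "partial", "partial", "correct", "irrelevant",
        "contradictory", "non_domain", "partial", "correct", "irrelevant"]
pred = ["correct", "correct", "partial", "correct", "irrelevant",
        "partial", "non_domain", "partial", "correct", "non_domain"]

cm = confusion_matrix(gold, pred, labels=labels)  # rows = gold, cols = pred
off = cm.astype(float).copy()
np.fill_diagonal(off, 0)                          # keep only the errors
i, j = np.unravel_index(np.argmax(off), off.shape)
print(f"most confused: gold={labels[i]!r} predicted as {labels[j]!r} "
      f"({int(off[i, j])} times)")
```

The label pair surfaced this way is a natural candidate for the top-level split of the hierarchical classifier discussed above.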
Dzikovska et al. (2013) proposed the SemEval-2013 Student Response Analysis (SRA) task to automatically grade answers to open questions.