WASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter

2016-01-31
Language in social media is largely driven by new words and spellings that constantly enter the lexicon, polluting it and causing a high deviation from the formal written form. The primary entities of such language are out-of-vocabulary (OOV) words. In this paper, we study various sociolinguistic properties of OOV words and propose a classification model that assigns them to at least six categories. We achieve 81.26% accuracy with high precision and recall. We observe that content features are the most discriminative, followed by lexical and context features.

Related Articles

2017-10-07
1710.02650 | cs.CL

Current topic models often suffer from discovering topics not matching human intuition, unnatural sw…

2018-04-19

Compromised social media accounts are legitimate user accounts that have been hijacked by a third (m…

2017-08-17

Stance classification determines the attitude, or stance, in a (typically short) text. The task has …

2017-03-06

The inverse relationship between the length of a word and the frequency of its use, first identified…

2016-04-25

In contrast to much previous work that has focused on location classification of tweets restricted t…

2019-01-03
1901.00570 | cs.SI

Event detection using social media streams needs a set of informative features with strong signals t…

2018-05-16
1805.06201 | cs.CL

We propose a novel data augmentation for labeled sentences called contextual augmentation. We assume…

2016-12-12
1612.03769 | cs.CL

Traditional sentiment analysis often uses sentiment dictionary to extract sentiment information in t…

2016-08-02
1608.00789 | cs.CL

The word embedding methods have been proven to be very useful in many tasks of NLP (Natural Language…

2015-06-17
1506.05230 | cs.CL

Data-driven representation learning for words is a technique of central importance in NLP. While ind…

2019-01-24

In this paper, we address the problem of detection, classification and quantification of emotions of…

2018-12-04
1812.01199 | cs.IR

Recently, researchers have shown an increased interest in harnessing Twitter data for dynamic monito…

2018-09-02

In Twitter, there is a rising trend in abusive behavior which often leads to incivility. This trend …

2017-09-01
1709.00345 | cs.CL

In an online community, new words come and go: today's "haha" may be replaced by tomorrow's "lol." C…

2018-10-25

In this article we present the design and implementation of the Logoscope, the first tool especially…

2019-01-28

Extensive evaluation on a large number of word embedding models for language processing applications…