ML p(r)ior | Using Hadoop for Large Scale Analysis on Twitter: A Technical Report

2016-02-03
Sentiment analysis (or opinion mining) on Twitter data has attracted much attention recently. One of the system's key features is the immediacy of communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast number of opinions on a wide diversity of topics. This amount of information offers huge potential and can be harnessed to infer the prevailing sentiment towards these topics. However, since no one can invest an unlimited amount of time reading through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments. Thus, they can process at most a few thousand tweets. Such a sample is not representative enough to determine the sentiment polarity towards a topic, given the massive number of tweets published daily. In this paper, we go one step further and develop a novel method for sentiment learning in the MapReduce framework. Our algorithm exploits the hashtags and emoticons inside a tweet as sentiment labels and performs classification of diverse sentiment types in a parallel and distributed manner. Moreover, we utilize Bloom filters to reduce the storage size of intermediate data and boost the performance of our algorithm. Through an extensive experimental evaluation, we demonstrate that our solution is efficient, robust and scalable, and we confirm the quality of our sentiment identification.
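The abstract does not spell out the algorithm, so the following is only a minimal single-process Python sketch of the general idea under stated assumptions. A map step treats hashtags and emoticons as noisy sentiment labels. A stand-in for the shuffle/reduce phase aggregates word counts per label. Bloom filters then replace explicit per-label vocabularies. The label table, the SHA-256-based hashing, the filter sizes, and every function name (`map_tweet`, `reduce_counts`, `build_filters`, `classify`) are hypothetical. The code does not use Hadoop and is not the authors' implementation.

```python
import hashlib
from collections import defaultdict

# Hypothetical label table (assumption): the report uses hashtags and emoticons
# as sentiment labels, but these particular entries are illustrative only.
LABELS = {":)": "positive", ":D": "positive", "#happy": "positive",
          ":(": "negative", "#sad": "negative"}


class BloomFilter:
    """Compact set-membership structure: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # An item is "probably present" only if all k of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def map_tweet(tweet):
    """Map: emit ((label, word), 1) for each ordinary word of a tweet with exactly one label."""
    tokens = tweet.split()
    labels = {LABELS[t] for t in tokens if t in LABELS}
    if len(labels) != 1:  # skip unlabeled tweets and tweets with conflicting labels
        return
    label = labels.pop()
    for t in tokens:
        if t not in LABELS:
            yield (label, t.lower()), 1


def reduce_counts(pairs):
    """In-memory stand-in for shuffle + reduce: sum the counts for each (label, word) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def build_filters(counts):
    """Store each label's vocabulary in a Bloom filter instead of an explicit set."""
    filters = defaultdict(BloomFilter)
    for label, word in counts:
        filters[label].add(word)
    return dict(filters)


def classify(tweet, filters):
    """Pick the label whose filter (probably) contains the most words of the tweet."""
    words = [t.lower() for t in tweet.split() if t not in LABELS]
    scores = {label: sum(w in bf for w in words) for label, bf in filters.items()}
    return max(scores, key=scores.get)


# Usage on a toy corpus (made-up tweets, for illustration only).
tweets = ["great game tonight :)", "love this song #happy",
          "stuck in traffic again :(", "worst service ever #sad"]
counts = reduce_counts(pair for tweet in tweets for pair in map_tweet(tweet))
filters = build_filters(counts)
print(counts[("positive", "great")])  # → 1
print(classify("what a great song", filters))
```

A Bloom filter trades exactness for space. Membership tests can return false positives but never false negatives, which is why the classifier counts words a filter *probably* contains. On a real cluster, the map and reduce functions would run as distributed Hadoop tasks rather than as the in-memory generator and dictionary used here.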

Related Articles

2013-05-27

We have explored different methods of improving the accuracy of a Naive Bayes classifier for sentime…

2017-04-07

This paper describes our multi-view ensemble approach to SemEval-2017 Task 4 on Sentiment Analysis i…

2018-06-07

Sentiment classification typically relies on a large amount of labeled data. In practice, the availa…

2017-06-25
1706.08032 | cs.CL

This paper introduces a novel deep learning framework including a lexicon-based approach for sentenc…

2017-03-07

This paper presents a novel approach for multi-lingual sentiment classification in short texts. This…

2016-06-14
1606.04351 | cs.CL

This paper describes the participation of the team "TwiSE" in the SemEval 2016 challenge. Specifical…

2018-04-02

This paper describes our NIHRIO system for SemEval-2018 Task 3 "Irony detection in English tweets". …

2017-09-07

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Releva…

2017-01-26
1701.07681 | cs.DS

Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillan…

2017-08-17

Stance classification determines the attitude, or stance, in a (typically short) text. The task has …

2019-01-24

A character-level convolutional neural network (CNN) motivated by applications in "automated machine…

2019-04-17

In recent years, there has been an exponential growth in the number of complex documents and texts t…

2013-08-28

In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the se…

2018-07-09

Deep neural networks have shown good data modelling capabilities when dealing with challenging and l…

2019-04-04

In Brain-Computer Interfacing (BCI), due to inter-subject non-stationarities of electroencephalogram…

2019-01-07
1901.01695 | cs.CL

In this dissertation we report results of our research on dense distributed representations of text …