
Sentiment Analysis of Twitter Data

Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, Rebecca Passonneau
Department of Computer Science, Columbia University, New York, NY 10027, USA
{apoorv@cs, xie@cs, iv2121@, rambow@ccls, becky@cs}.columbia.edu

Abstract

We examine sentiment analysis on Twitter data. The contributions of this paper are: (1) We introduce POS-specific prior polarity features. (2) We explore the use of a tree kernel to obviate the need for tedious feature engineering. The new features (in conjunction with previously proposed features) and the tree kernel perform approximately at the same level, both outperforming the state-of-the-art baseline.

1 Introduction

Microblogging websites have evolved to become a source of varied kinds of information. This is due to the nature of microblogs, on which people post real time messages about their opinions on a variety of topics, discuss current issues, complain, and express positive sentiment for products they use in daily life. In fact, companies manufacturing such products have started to poll these microblogs to get a sense of general sentiment for their product. Many times these companies study user reactions and reply to users on microblogs. One challenge is to build technology to detect and summarize an overall sentiment.

In this paper, we look at one such popular microblog called Twitter and build models for classifying "tweets" into positive, negative and neutral sentiment. We build models for two classification tasks: a binary task of classifying sentiment into positive and negative classes, and a 3-way task of classifying sentiment into positive, negative and neutral classes. We experiment with three types of models: a unigram model, a feature based model and a tree kernel based model. For the feature based model we use some of the features proposed in past literature and propose new features. For the tree kernel based model we design a new tree representation for tweets. We use a unigram model, previously shown to work well for sentiment analysis for Twitter data, as our baseline. Our experiments show that a unigram model is indeed a hard baseline, achieving over 20% over the chance baseline for both classification tasks. Our feature based model that uses only 100 features achieves similar accuracy as the unigram model that uses over 10,000 features. Our tree kernel based model outperforms both these models by a significant margin. We also experiment with a combination of models: combining unigrams with our features and combining our features with the tree kernel. Both these combinations outperform the unigram baseline by over 4% for both classification tasks.

In this paper, we present an extensive analysis of the 100 features we propose. Our experiments show that features that have to do with Twitter-specific properties (emoticons, hashtags etc.) add value to the classifier, but only marginally. Features that combine the prior polarity of words with their parts-of-speech tags are the most important for both classification tasks. Thus, we see that standard natural language processing tools are useful even in a genre which is quite different from the genre on which they were trained (newswire). Furthermore, we also show that the tree kernel model performs roughly as well as the best feature based models, even though it does not require detailed feature engineering.

We use manually annotated Twitter data for our experiments. One advantage of this data, over previously used data-sets, is that the tweets are collected in a streaming fashion and therefore represent a true sample of actual tweets in terms of language use and content. Our new data set is available to other researchers. In this paper we also introduce two resources which are available (contact the first author): 1) a hand annotated dictionary for emoticons that maps emoticons to their polarity, and 2) an acronym dictionary collected from the web with English translations of over 5000 frequently used acronyms.
The rest of the paper is organized as follows. In section 2, we discuss classification tasks like sentiment analysis on micro-blog data. In section 3, we give details about the data. In section 4 we discuss our pre-processing technique and additional resources. In section 5 we present our prior polarity scoring scheme. In section 6 we present the design of our tree kernel. In section 7 we give details of our feature based approach. In section 8 we present our experiments and discuss the results. We conclude and give future directions of research in section 9.

2 Literature Survey

Sentiment analysis has been handled as a Natural Language Processing task at many levels of granularity. Starting from being a document level classification task (Turney, 2002; Pang and Lee, 2004), it has been handled at the sentence level (Hu and Liu, 2004; Kim and Hovy, 2004) and more recently at the phrase level (Wilson et al., 2005; Agarwal et al., 2009).

Microblog data like Twitter, on which users post real time reactions to and opinions about "everything", poses newer and different challenges. Some of the early and recent results on sentiment analysis of Twitter data are by Go et al. (2009), Bermingham and Smeaton (2010) and Pak and Paroubek (2010). Go et al. (2009) use distant learning to acquire sentiment data. They use tweets ending in positive emoticons like ":)" ":-)" as positive and negative emoticons like ":(" ":-(" as negative. They build models using Naive Bayes, MaxEnt and Support Vector Machines (SVM), and they report that SVM outperforms the other classifiers. In terms of feature space, they try a unigram and a bigram model in conjunction with parts-of-speech (POS) features. They note that the unigram model outperforms all other models. Specifically, bigrams and POS features do not help.

Pak and Paroubek (2010) collect data following a similar distant learning paradigm. They perform a different classification task though: subjective versus objective. For subjective data they collect the tweets ending with emoticons in the same manner as Go et al. (2009). For objective data they crawl twitter accounts of popular newspapers like "New York Times", "Washington Posts" etc. They report that POS and bigrams both help (contrary to results presented by Go et al. (2009)). Both these approaches, however, are primarily based on ngram models. Moreover, the data they use for training and testing is collected by search queries and is therefore biased. In contrast, we present features that achieve a significant gain over a unigram baseline. In addition we explore a different method of data representation and report significant improvement over the unigram models. Another contribution of this paper is that we report results on manually annotated data that does not suffer from any known biases. Our data is a random sample of streaming tweets, unlike data collected by using specific queries. The size of our hand-labeled data allows us to perform cross-validation experiments and check for the variance in performance of the classifier across folds.

Another significant effort for sentiment classification on Twitter data is by Barbosa and Feng (2010). They use polarity predictions from three websites as noisy labels to train a model and use 1000 manually labeled tweets for tuning and another 1000 manually labeled tweets for testing. They however do not mention how they collect their test data. They propose the use of syntax features of tweets like retweet, hashtags, link, punctuation and exclamation marks in conjunction with features like prior polarity of words and POS of words. We extend their approach by using real valued prior polarity, and by combining prior polarity with POS. Our results show that the features that enhance the performance of our classifiers the most are features that combine prior polarity of words with their parts of speech. The tweet syntax features help, but only marginally.
Gamon (2004) performs sentiment analysis on feedback data from a Global Support Services survey. One aim of that paper is to analyze the role of linguistic features like POS tags. They perform extensive feature analysis and feature selection and demonstrate that abstract linguistic analysis features contribute to the classifier accuracy. In this paper we perform extensive feature analysis and show that the use of only 100 abstract linguistic features performs as well as a hard unigram baseline.

3 Data Description

Twitter is a social networking and microblogging service that allows users to post real time messages, called tweets. Tweets are short messages, restricted to 140 characters in length. Due to the nature of this microblogging service (quick and short messages), people use acronyms, make spelling mistakes, and use emoticons and other characters that express special meanings. Following is a brief terminology associated with tweets. Emoticons: these are facial expressions pictorially represented using punctuation and letters; they express the user's mood. Target: users of Twitter use the "@" symbol to refer to other users on the microblog. Referring to other users in this manner automatically alerts them. Hashtags: users usually use hashtags to mark topics. This is primarily done to increase the visibility of their tweets.

We acquire 11,875 manually annotated Twitter data (tweets) from a commercial source. They have made part of their data publicly available. For information on how to obtain the data, see the Acknowledgments section at the end of the paper. They collected the data by archiving the real-time stream. No language, location or any other kind of restriction was made during the streaming process. In fact, their collection consists of tweets in foreign languages. They use Google translate to convert these into English before the annotation process. Each tweet is labeled by a human annotator as positive, negative, neutral or junk. The "junk" label means that the tweet cannot be understood by a human annotator. A manual analysis of a random sample of tweets labeled as "junk" suggested that many of these tweets were those that were not translated well using Google translate. We eliminate the tweets with the junk label for experiments. This leaves us with an unbalanced sample of 8,753 tweets. We use stratified sampling to get a balanced data-set of 5127 tweets (1709 tweets each from the classes positive, negative and neutral).
4 Resources and Pre-processing of data

In this paper we introduce two new resources for pre-processing Twitter data: 1) an emoticon dictionary and 2) an acronym dictionary. We prepare the emoticon dictionary by labeling 170 emoticons listed on Wikipedia (http://en.wikipedia.org/wiki/List_of_emoticons) with their emotional state. For example, ":)" is labeled as positive whereas ":=(" is labeled as negative. We assign each emoticon a label from the following set of labels: Extremely-positive, Extremely-negative, Positive, Negative, and Neutral. We compile an acronym dictionary from an online resource (http://www.noslang.com/). The dictionary has translations for 5,184 acronyms. For example, lol is translated to laughing out loud.

  Acronym     English expansion
  gr8, gr8t   great
  lol         laughing out loud
  rotf        rolling on the floor
  bff         best friend forever

Table 1: Example acronyms and their expansion in the acronym dictionary.

  Emoticon                Polarity
  :-) :) :o) :] :3 :c)    Positive
  :D C:                   Extremely-Positive
  :-( :( :c :[            Negative
  D8 D; D= DX v.v         Extremely-Negative
  :|                      Neutral

Table 2: Part of the dictionary of emoticons.

We pre-process all the tweets as follows: a) replace all the emoticons with their sentiment polarity by looking up the emoticon dictionary, b) replace all URLs with a tag ||U||, c) replace targets (e.g. "@John") with the tag ||T||, d) replace all negations (e.g. not, no, never, n't, cannot) with the tag "NOT", and e) replace a sequence of repeated characters by three characters, for example, convert coooooooool to coool. We do not replace the sequence by only two characters since we want to differentiate between the regular usage and the emphasized usage of the word.
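The pre-processing steps (a)-(e) are straightforward to implement. The following is a minimal sketch, assuming tiny stand-in dictionaries (emoticon_polarity, acronyms) in place of the full emoticon and acronym resources described above, and simple prefix tests for URLs and targets; it is not the authors' code.

```python
import re

# Stand-ins for the emoticon and acronym dictionaries introduced in this section.
emoticon_polarity = {":)": "||P||", ":-)": "||P||", ":(": "||N||", ":-(": "||N||"}
acronyms = {"lol": "laughing out loud", "gr8": "great", "bff": "best friend forever"}
negations = {"not", "no", "never", "n't", "cannot"}

def preprocess(tweet):
    out = []
    for tok in tweet.split():
        low = tok.lower()
        if tok in emoticon_polarity:                              # (a) emoticon -> polarity tag
            out.append(emoticon_polarity[tok])
        elif low.startswith(("http://", "https://", "www.")):    # (b) URL -> ||U||
            out.append("||U||")
        elif tok.startswith("@"):                                 # (c) target -> ||T||
            out.append("||T||")
        elif low in negations:                                    # (d) negation -> NOT
            out.append("NOT")
        else:
            low = acronyms.get(low, low)                          # acronym dictionary lookup
            # (e) squeeze runs of 3+ repeated characters down to exactly three
            out.append(re.sub(r"(.)\1{2,}", r"\1\1\1", low))
    return " ".join(out)

print(preprocess("@John this is coooooooool :) http://example.com"))
# -> ||T|| this is coool ||P|| ||U||
```

The sketch assumes whitespace tokenization and lowercasing; in the paper's pipeline a contraction such as "isn't" would first be tokenized into "is" and "n't" before the negation rule applies.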
We present some preliminary statistics about the data in Table 3. We use the Stanford tokenizer (Klein and Manning, 2003) to tokenize the tweets. We use a stop-word dictionary (http://www.webconfs.com/stop-words.php) to identify stop words. All the other words which are found in WordNet (Fellbaum, 1998) are counted as English words. We use the standard tagset defined by the Penn Treebank for identifying punctuation. We record the occurrence of three standard Twitter tags: emoticons, URLs and targets. The remaining tokens are either non-English words (like coool, zzz etc.) or other symbols.

  Number of tokens               79,152
  Number of stop words           30,371
  Number of English words        23,837
  Number of punctuation marks     9,356
  Number of capitalized words     4,851
  Number of twitter tags          3,371
  Number of exclamation marks     2,228
  Number of negations               942
  Number of other tokens          9,047

Table 3: Statistics about the data used for our experiments.

In Table 3 we see that 38.3% of the tokens are stop words, 30.1% of the tokens are found in WordNet and 1.2% of the tokens are negation words. 11.8% of all the tokens are punctuation marks excluding exclamation marks, which make up 2.8% of all tokens. In total, 84.1% of all tokens are tokens that we expect to see in a typical English language text. There are 4.2% tags that are specific to Twitter, which include emoticons, targets, hashtags and "RT" (retweet). The remaining 11.7% of tokens are either words that cannot be found in WordNet (like Zzzzz, kewl) or special symbols which do not fall in the category of Twitter tags.

5 Prior polarity scoring

A number of our features are based on the prior polarity of words. For obtaining the prior polarity of words, we take motivation from work by Agarwal et al. (2009). We use the Dictionary of Affect in Language (DAL) (Whissel, 1989) and extend it using WordNet. This dictionary of about 8000 English language words assigns every word a pleasantness score (a real number) between 1 (Negative) and 3 (Positive). We first normalize the scores by dividing each score by the scale (which is equal to 3). We consider words with polarity less than 0.5 as negative, higher than 0.8 as positive, and the rest as neutral. If a word is not directly found in the dictionary, we retrieve all its synonyms from WordNet. We then look for each of the synonyms in DAL. If any synonym is found in DAL, we assign the original word the same pleasantness score as its synonym. If none of the synonyms is present in DAL, the word is not associated with any prior polarity. For the given data we directly found the prior polarity of 81.1% of the words. We find the polarity of another 7.8% of the words by using WordNet. So we find the prior polarity of about 88.9% of the English language words.
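A minimal sketch of this scoring scheme follows; the dal_pleasantness dictionary is a tiny illustrative stand-in for DAL, and the WordNet fallback uses NLTK, which is an assumption for illustration rather than a toolkit named in the paper.

```python
from nltk.corpus import wordnet as wn

# Tiny stand-in for the DAL pleasantness scores (1 = negative ... 3 = positive).
dal_pleasantness = {"great": 2.7, "hate": 1.2, "day": 2.0}

def prior_polarity(word):
    """Return 'positive', 'negative', 'neutral', or None if no prior polarity is found."""
    score = dal_pleasantness.get(word)
    if score is None:
        # Fall back to WordNet synonyms and reuse the first synonym's DAL score.
        for synset in wn.synsets(word):
            for lemma in synset.lemma_names():
                if lemma in dal_pleasantness:
                    score = dal_pleasantness[lemma]
                    break
            if score is not None:
                break
    if score is None:
        return None
    normalized = score / 3.0          # divide by the scale of DAL (3)
    if normalized < 0.5:
        return "negative"
    if normalized > 0.8:
        return "positive"
    return "neutral"
```

With the illustrative scores above, "great" normalizes to 0.9 (positive), "hate" to 0.4 (negative), and "day" to about 0.67 (neutral), matching the thresholds described in this section.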

6 Design of Tree Kernel

We design a tree representation of tweets to combine many categories of features in one succinct, convenient representation. For calculating the similarity between two trees we use a Partial Tree (PT) kernel, first proposed by Moschitti (2006). A PT kernel calculates the similarity between two trees by comparing all possible sub-trees. This tree kernel is an instance of a general class of convolution kernels. Convolution kernels, first introduced by Haussler (1999), can be used to compare abstract objects, like strings, instead of feature vectors. This is because these kernels involve a recursive calculation over the "parts" of an abstract object. This calculation is made computationally efficient by using dynamic programming techniques. By considering all possible combinations of fragments, tree kernels capture any possible correlation between features and categories of features.

Figure 1 shows an example of the tree structure we design. This tree is for the synthesized tweet: "@Fernando this isn't a great day for playing the HARP! :)".

[Figure 1: Tree kernel for the synthesized tweet "@Fernando this isn't a great day for playing the HARP! :)". The ROOT node dominates leaves and subtrees such as ||T||, (STOP (this)), (STOP (is)), NOT, (EW (JJ great POS)), (EW (NN day POS)), (STOP (for)), (EW (VBG playing POS)), (EW (NN CAPS harp)), EXC and ||P||, together with example tree fragments compared by the PT kernel.]

We use the following procedure to convert a tweet into a tree representation. Initialize the main tree to be "ROOT". Then tokenize the tweet and, for each token: a) if the token is a target, emoticon, exclamation mark, other punctuation mark, or a negation word, add a leaf node to the "ROOT" with the corresponding tag. For example, in the tree in Figure 1 we add the tag ||T|| (target) for "@Fernando", add the tag "NOT" for the token "n't", add the tag "EXC" for the exclamation mark at the end of the sentence, and add ||P|| for the emoticon representing positive mood. b) if the token is a stop word, we simply add the subtree "(STOP ('stop-word'))" to "ROOT". For instance, we add a subtree corresponding to each of the stop words: this, is, and for. c) if the token is an English language word, we map the word to its part-of-speech tag, calculate the prior polarity of the word using the procedure described in section 5, and add the subtree (EW ('POS' 'word' 'prior polarity')) to the "ROOT". For example, we add the subtree (EW (JJ great POS)) for the word great. "EW" refers to English word. d) for any other token we add the subtree "(NE ())" to the "ROOT". "NE" refers to non-English.

The PT tree kernel creates all possible subtrees and compares them to each other. These subtrees include subtrees in which non-adjacent branches become adjacent by excising other branches, though order is preserved. In Figure 1, we show some of the tree fragments that the PT kernel will attempt to compare with tree fragments from other trees. For example, given the tree (EW (JJ) (great) (POS)), the PT kernel will use (EW (JJ) (great) (POS)), (EW (great) (POS)), (EW (JJ) (POS)), (EW (JJ) (great)), (EW (JJ)), (EW (great)), (EW (POS)), (EW), (JJ), (great), and (POS). This means that the PT tree kernel attempts to use full information, and also abstracts away from specific information (such as the lexical item). In this manner, it is not necessary to create features by hand at all levels of abstraction.
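A compact sketch of this construction is shown below, producing a flat bracketed string of the kind a partial tree kernel implementation (for example, Moschitti's SVM-Light-TK) would typically take as input. The helpers pos_tag and prior_polarity, the english_vocab set, and the PUNC/NONE labels are assumptions for illustration; the paper only specifies the tags shown in Figure 1.

```python
STOP_WORDS = {"this", "is", "for", "a", "the"}             # stand-in stop-word list
TAG_TOKENS = {"||T||", "||P||", "||N||", "||U||", "NOT"}   # tags produced by pre-processing

def tweet_to_tree(tokens, pos_tag, prior_polarity, english_vocab):
    """Convert an already pre-processed, tokenized tweet into a bracketed tree string."""
    subtrees = []
    for tok in tokens:
        if tok in TAG_TOKENS:
            subtrees.append(f"({tok})")                               # a) target/emoticon/negation leaf
        elif tok == "!":
            subtrees.append("(EXC)")                                  # a) exclamation mark leaf
        elif not tok.isalnum():
            subtrees.append(f"(PUNC {tok})")                          # a) other punctuation leaf
        elif tok in STOP_WORDS:
            subtrees.append(f"(STOP ({tok}))")                        # b) stop-word subtree
        elif tok in english_vocab:
            polarity = prior_polarity(tok) or "NONE"
            subtrees.append(f"(EW ({pos_tag(tok)} {tok} {polarity}))")  # c) English word subtree
        else:
            subtrees.append("(NE ())")                                # d) everything else
    return "(ROOT " + " ".join(subtrees) + ")"
```

Applied to the tokens of the tweet in Figure 1, this yields a string along the lines of "(ROOT (||T||) (STOP (this)) (STOP (is)) (NOT) ... (EW (JJ great POS)) ... (EXC) (||P||))".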
7 Our features

We propose a set of features, listed in Table 4, for our experiments. These are a total of 50 types of features. We calculate these features for the whole tweet and for the last one-third of the tweet. In total we get 100 additional features. We refer to these features as Senti-features throughout the paper.

Our features can be divided into three broad categories: first, features that are primarily counts of various properties, so the value of the feature is a natural number (N); second, features whose value is a real number (R), primarily features that capture the score retrieved from DAL; and third, features whose values are boolean (B), such as bag of words, presence of exclamation marks and capitalized text. Each of these broad categories is divided into two subcategories: Polar features and Non-polar features. We refer to a feature as polar if we calculate its prior polarity either by looking it up in DAL (extended through WordNet) or in the emoticon dictionary. All other features, which are not associated with any prior polarity, fall in the Non-polar category. Each of the Polar and Non-polar categories is further subdivided into two categories: POS and Other. POS refers to features that capture statistics about the parts-of-speech of words, and Other refers to all other types of features.

In reference to Table 4, row f1 belongs to the category Polar POS and refers to the count of positive and negative parts-of-speech (POS) in a tweet; rows f2, f3, f4 belong to the category Polar Other and refer to the count of negation words, the count of words that have positive and negative prior polarity, the count of emoticons per polarity type, and the count of hashtags, capitalized words and words with exclamation marks associated with words that have prior polarity; row f5 belongs to the category Non-Polar POS and refers to counts of different parts-of-speech tags; rows f6, f7 belong to the category Non-Polar Other and refer to the count of slangs, latin alphabets, and other words without polarity, as well as special terms such as the number of hashtags, URLs, targets and newlines. Row f8 belongs to the category Polar POS and captures the summation of the prior polarity scores of words with POS of JJ, RB, VB and NN. Similarly, row f9 belongs to the category Polar Other and calculates the summation of the prior polarity scores of all words, and row f10 belongs to the category Non-Polar Other and calculates the percentage of the tweet that is capitalized. Finally, row f11 belongs to the category Non-Polar Other and refers to the presence of exclamation marks and the presence of capitalized words as features.

  Type  Category          ID    Feature
  N     Polar, POS        f1    # of (+/-) POS (JJ, RB, VB, NN)
  N     Polar, Other      f2    # of negation words, positive words, negative words
  N     Polar, Other      f3    # of extremely-pos., extremely-neg., positive, negative emoticons
  N     Polar, Other      f4    # of (+/-) hashtags, capitalized words, exclamation words
  N     Non-Polar, POS    f5    # of JJ, RB, VB, NN
  N     Non-Polar, Other  f6    # of slangs, latin alphabets, dictionary words, words
  N     Non-Polar, Other  f7    # of hashtags, URLs, targets, newlines
  R     Polar, POS        f8    sum of prior pol. scores of words of that POS, for POS JJ, RB, VB, NN
  R     Polar, Other      f9    sum of prior polarity scores of all words
  R     Non-Polar, Other  f10   percentage of capitalized text
  B     Non-Polar, Other  f11   exclamation, capitalized text

Table 4: N refers to the set of features whose value is a positive integer. They are primarily count features; for example, count of the number of positive adverbs, negative verbs etc. R refers to features whose value is a real number; for example, the sum of the prior polarity scores of words with part-of-speech of adjective/adverb/verb/noun, and the sum of the prior polarity scores of all words. B refers to the set of features that have a boolean value; for example, presence of exclamation marks, presence of capitalized text.
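To make the feature definitions concrete, here is a small sketch of how a few of the Senti-features might be computed from a POS-tagged tweet; the (token, POS, polarity) input format and the feature names are illustrative assumptions, and the remaining rows of Table 4 follow the same pattern.

```python
CONTENT_POS = {"JJ", "RB", "VB", "NN"}

def senti_features(tagged):
    """tagged: list of (token, pos, polarity) triples, polarity in {'POS', 'NEG', None}."""
    feats = {}
    # f1 (N, Polar POS): counts of positive / negative content-word POS tags
    feats["f1_pos"] = sum(1 for _, p, pol in tagged if p in CONTENT_POS and pol == "POS")
    feats["f1_neg"] = sum(1 for _, p, pol in tagged if p in CONTENT_POS and pol == "NEG")
    # f5 (N, Non-Polar POS): counts of each content POS tag, regardless of polarity
    for p in CONTENT_POS:
        feats[f"f5_{p}"] = sum(1 for _, q, _ in tagged if q == p)
    # f10 (R, Non-Polar Other): percentage of fully capitalized alphabetic tokens
    words = [t for t, _, _ in tagged if t.isalpha()]
    feats["f10_caps_pct"] = 100.0 * sum(t.isupper() for t in words) / len(words) if words else 0.0
    # f11 (B, Non-Polar Other): presence of exclamation marks and capitalized tokens
    feats["f11_exclaim"] = any(t == "!" for t, _, _ in tagged)
    feats["f11_caps"] = any(t.isupper() and t.isalpha() for t, _, _ in tagged)
    return feats
```

Each such feature is computed twice, once over the whole tweet and once over its last third, which is how the 50 feature types become the 100 Senti-features.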

8 Experiments and Results

In this section, we present experiments and results for two classification tasks: 1) Positive versus Negative and 2) Positive versus Negative versus Neutral. For each of the classification tasks we present three models, as well as results for two combinations of these models:

1. Unigram model (our baseline)
2. Tree kernel model
3. 100 Senti-features model
4. Kernel plus Senti-features
5. Unigram plus Senti-features

For the unigram plus Senti-features model, we present feature analysis to gain insight about what kinds of features are adding the most value to the model. We also present learning curves for each of the models and compare the learning abilities of the models when provided limited data.

Experimental Set-up: For all our experiments we use Support Vector Machines (SVM) and report averaged 5-fold cross-validation test results. We tune the C parameter for SVM using an embedded 5-fold cross-validation on the training data of each fold, i.e. for each fold, we first run 5-fold cross-validation only on the training data of that fold for different values of C. We pick the setting that yields the best cross-validation error and use that C for determining the test error for that fold. As usual, the reported accuracy is the average over the five folds.
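This embedded cross-validation can be reproduced with standard tooling. The sketch below uses scikit-learn with a linear SVM as an assumption; the paper does not name its SVM implementation, and the tree kernel model would instead require a kernel-capable package such as SVM-Light-TK, which this sketch does not cover.

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

def nested_cv_accuracy(X, y, c_grid=(0.01, 0.1, 1, 10, 100)):
    """Outer 5-fold CV reports test accuracy; an inner 5-fold CV on each training split tunes C."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    tuned_svm = GridSearchCV(SVC(kernel="linear"), {"C": list(c_grid)}, cv=inner)
    scores = cross_val_score(tuned_svm, X, y, cv=outer)
    return scores.mean(), scores.std()
```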

8.1 Positive versus Negative

This is a binary classification task with two classes of sentiment polarity: positive and negative. We use a balanced data-set of 1709 instances for each class and therefore the chance baseline is 50%.

8.1.1 Comparison of models

We use a unigram model as our baseline. Researchers report state-of-the-art performance for sentiment analysis on Twitter data using a unigram model (Go et al., 2009; Pak and Paroubek, 2010). Table 5 compares the performance of three models: the unigram model, the feature based model using only 100 Senti-features, and the tree kernel model. We report the mean and standard deviation of 5-fold test accuracy. We observe that the tree kernel outperforms the unigram and the Senti-features models by 2.58% and 2.66% respectively. The 100 Senti-features described in Table 4 perform as well as the unigram model that uses about 10,000 features. We also experiment with combinations of models. Combining unigrams with Senti-features outperforms the combination of the kernel with Senti-features by 0.78%. This is our best performing system for the positive versus negative task, gaining about 4.04% absolute over a hard unigram baseline.

  Model                      Avg. Acc (%)   Std. Dev. (%)
  Unigram                    71.35          1.95
  Senti-features             71.27          0.65
  Kernel                     73.93          1.50
  Unigram + Senti-features   75.39          1.29
  Kernel + Senti-features    74.61          1.43

Table 5: Average and standard deviation of test accuracy for the 2-way classification task using different models: Unigram (baseline), tree kernel, Senti-features, unigram plus Senti-features, and tree kernel plus Senti-features.

8.1.2 Feature Analysis

Table 6 presents classifier accuracy and F1-measure when features are added incrementally. We start with our baseline unigram model and subsequently add various sets of features. First, we add all non-polar features (rows f5, f6, f7, f10, f11 in Table 4) and observe no improvement in the performance. Next, we add all part-of-speech based features (rows f1, f8) and observe a gain of 3.49% over the unigram baseline. We see an additional increase in accuracy of 0.55% when we add the other prior polarity features (rows f2, f3, f4, f9 in Table 4). From these experiments we conclude that the most important features in Senti-features are those that involve the prior polarity of parts-of-speech. All other features play a marginal role in achieving the best performing system. In fact, we experimented by using unigrams with only the prior polarity POS features and achieved a performance of 75.1%, which is only slightly lower than using all Senti-features.

                           Acc.    F1 Pos   F1 Neg
  Unigram baseline         71.35   71.13    71.50
  + f5, f6, f7, f10, f11   70.1    69.66    70.46
  + f1, f8                 74.84   74.4     75.2
  + f2, f3, f4, f9         75.39   74.81    75.86

Table 6: Accuracy and F1-measure for the 2-way classification task using unigrams and Senti-features. All fi refer to Table 4 and are cumulative.

In terms of unigram features, we use Information Gain as the attribute evaluation metric to do feature selection. In Table 7 we present a list of unigrams that consistently appear among the top 15 unigram features across all folds. Words having positive or negative prior polarity top the list. Emoticons also appear as important unigrams. Surprisingly though, the word for appeared as a top feature. A preliminary analysis revealed that the word for appears as frequently in positive tweets as it does in negative tweets. However, tweets containing phrases like for you and for me tend to be positive even in the absence of any other explicit prior polarity words. Owing to previous research, the URL appearing as a top feature is less surprising, because Go et al. (2009) report that tweets containing URLs tend to be positive.

  Positive words   love, great, good, thanks
  Negative words   hate, shit, hell, tired
  Emoticons        ||P|| (positive emoticon), ||N|| (negative emoticon)
  Other            for, ||U|| (URL)

Table 7: List of top unigram features for the 2-way task.
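Information gain for a presence/absence unigram feature can be computed directly from class counts. A minimal sketch for the two-class case, with made-up counts in the example (the 1709 tweets per class come from the balanced data-set described above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(pos_with, pos_without, neg_with, neg_without):
    """IG of a binary unigram feature (word present vs. absent) for two classes."""
    n_with = pos_with + neg_with
    n_without = pos_without + neg_without
    n = n_with + n_without
    h_class = entropy([pos_with + pos_without, neg_with + neg_without])
    h_cond = (n_with / n) * entropy([pos_with, neg_with]) \
        + (n_without / n) * entropy([pos_without, neg_without])
    return h_class - h_cond

# e.g. a word present in 300 of 1709 positive tweets but only 40 of 1709 negative tweets
print(information_gain(300, 1409, 40, 1669))
```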

8.1.3 Learning curve

The learning curve for the 2-way classification task is shown in Figure 2. The curve shows that when limited data is used, the advantage in performance of our best performing systems is even more pronounced. This implies that with a limited amount of training data, simply using unigrams has a critical disadvantage, while both the tree kernel and the unigram model with our features exhibit promising performance.

[Figure 2: Learning curve for the two-way classification task. The x-axis is the percentage of training data (0.2 to 1.0) and the y-axis is accuracy (%); curves are shown for Unigram, Unigram + Our Features, and Tree Kernel.]

8.2 Positive versus Negative versus Neutral

This is a 3-way classification task with three classes of sentiment polarity: positive, negative and neutral. We use a balanced data-set of 1709 instances for each class and therefore the chance baseline is 33.33%.

8.2.1 Comparison of models

For this task the unigram model achieves a gain of 23.25% over the chance baseline. Table 8 compares the performance of our three models. We report the mean and standard deviation of 5-fold test accuracy. We observe that the tree kernel outperforms the unigram and the Senti-features models by 4.02% and 4.29% absolute, respectively. We note that this difference is much more pronounced compared to the two-way classification task. Once again, our 100 Senti-features perform almost as well as the unigram baseline, which has about 13,000 features. We also experiment with the combination of models. For this classification task the combination of the tree kernel with Senti-features outperforms the combination of unigrams with Senti-features by a small margin. This is our best performing system for the 3-way classification task, gaining 4.25% over the unigram baseline.

  Model                      Avg. Acc (%)   Std. Dev. (%)
  Unigram                    56.58          1.52
  Senti-features             56.31          0.69
  Kernel                     60.60          1.00
  Unigram + Senti-features   60.50          2.27
  Kernel + Senti-features    60.83          1.09

Table 8: Average and standard deviation of test accuracy for the 3-way classification task using different models: Unigram (baseline), tree kernel, Senti-features, unigram plus Senti-features, and Senti-features plus tree kernel.

The learning curve for the 3-way classification task is similar to the curve of the 2-way classification task, and we omit it.

8.2.2 Feature Analysis

Table 9 presents classifier accuracy and F1-measure when features are added incrementally. We start with our baseline unigram model and subsequently add various sets of features. First, we add all non-polar features (rows f5, f6, f7, f10 in Table 4) and observe a small improvement in the performance. Next, we add all part-of-speech based features and observe a gain of 3.28% over the unigram baseline. We see an additional increase in accuracy of 0.64% when we add the other prior polarity features (rows f2, f3, f4, f9 in Table 4). These results are in line with our observations for the 2-way classification task. Once again, the main contribution comes from features that involve the prior polarity of parts-of-speech.

                           Acc.    F1 Pos   F1 Neu   F1 Neg
  Unigram baseline         56.58   56.86    56.58    56.20
  + f5, f6, f7, f10, f11   56.91   55.12    59.84    55
  + f1, f8                 59.86   58.42    61.04    59.82
  + f2, f3, f4, f9         60.50   59.41    60.15    61.86

Table 9: Accuracy and F1-measure for the 3-way classification task using unigrams and Senti-features.

The top ranked unigram features for the 3-way classification task are mostly similar to those of the 2-way classification task, except that several terms with neutral polarity appear to be discriminative features, such as to, have, and so.
9 Conclusion

We presented results for sentiment analysis on Twitter. We use the previously proposed state-of-the-art unigram model as our baseline and report an overall gain of over 4% for two classification tasks: a binary positive versus negative task and a 3-way positive versus negative versus neutral task. We presented a comprehensive set of experiments for both these tasks on manually annotated data that is a random sample of a stream of tweets. We investigated two kinds of models, tree kernel and feature based models, and demonstrate that both of these models outperform the unigram baseline. For our feature-based approach, we perform a feature analysis which reveals that the most important features are those that combine the prior polarity of words and their parts-of-speech tags. We tentatively conclude that sentiment analysis for Twitter data is not that different from sentiment analysis for other genres.

In future work, we will explore even richer linguistic analysis, for example, parsing, semantic analysis and topic modeling.

10 Acknowledgments

Agarwal and Rambow are funded by NSF grant IIS-0713548. Vovsha is funded by NSF grant IIS-0916200. We would like to thank NextGen Invent (NGI) Corporation for providing us with the Twitter data. Please contact Deepak Mittal ([email protected]) about obtaining the data.

References

Apoorv Agarwal, Fadi Biadsy, and Kathleen Mckeown. 2009. Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams. Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pages 24-32, March.

Luciano Barbosa and Junlan Feng. 2010. Robust sentiment detection on twitter from biased and noisy data. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 36-44.

Adam Bermingham and Alan Smeaton. 2010. Classifying sentiment in microblogs: is brevity an advantage? ACM, pages 1833-1836.

C. Fellbaum. 1998. WordNet, an electronic lexical database. MIT Press.

Michael Gamon. 2004. Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. Proceedings of the 20th International Conference on Computational Linguistics.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Technical report, Stanford.

David Haussler. 1999. Convolution kernels on discrete structures. Technical report, University of California at Santa Cruz.

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. KDD.

S. M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. Coling.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pages 423-430.

Alessandro Moschitti. 2006. Efficient convolution kernels for dependency and constituent syntactic trees. In Proceedings of the 17th European Conference on Machine Learning.

Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of LREC.

B. Pang and L. Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. ACL.

P. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. ACL.

C. M. Whissel. 1989. The dictionary of affect in language. Emotion: Theory, Research and Experience, Academic Press, London.

T. Wilson, J. Wiebe, and P. Hoffman. 2005. Recognizing contextual polarity in phrase level sentiment analysis. ACL.