Learning for Microblogs with Distant Supervision: Political Forecasting with Twitter
Micol Marchetti-Bowick
Microsoft Corporation
475 Brannan Street
San Francisco, CA 94122
[email protected]

Nathanael Chambers
Department of Computer Science
United States Naval Academy
Annapolis, MD 21409
[email protected]

Abstract

Microblogging websites such as Twitter offer a wealth of insight into a population's current mood. Automated approaches to identify general sentiment toward a particular topic often perform two steps: Topic Identification and Sentiment Analysis. Topic Identification first identifies tweets that are relevant to a desired topic (e.g., a politician or event), and Sentiment Analysis extracts each tweet's attitude toward the topic. Many techniques for Topic Identification simply involve selecting tweets using a keyword search. Here, we present an approach that instead uses distant supervision to train a classifier on the tweets returned by the search. We show that distant supervision leads to improved performance in the Topic Identification task as well as in the downstream Sentiment Analysis stage. We then use a system that incorporates distant supervision into both stages to analyze the sentiment toward President Obama expressed in a dataset of tweets. Our results better correlate with Gallup's Presidential Job Approval polls than previous work. Finally, we discover a surprising baseline that outperforms previous work without a Topic Identification stage.

1 Introduction

Social networks and blogs contain a wealth of data about how the general public views products, campaigns, events, and people. Automated algorithms can use this data to provide instant feedback on what people are saying about a topic. Two challenges in building such algorithms are (1) identifying topic-relevant posts, and (2) identifying the attitude of each post toward the topic. This paper studies distant supervision (Mintz et al., 2009) as a solution to both challenges. We apply our approach to the problem of predicting Presidential Job Approval polls from Twitter data, and we present results that improve on previous work in this area. We also present a novel baseline that performs remarkably well without using topic identification.

Topic identification is the task of identifying text that discusses a topic of interest. Most previous work on microblogs uses simple keyword searches to find topic-relevant tweets, on the assumption that short tweets do not need more sophisticated processing. For instance, searches for the name "Obama" have been assumed to return a representative set of tweets about the U.S. President (O'Connor et al., 2010). One of the main contributions of this paper is to show that keyword search can lead to noisy results, and that the same keywords can instead be used in a distantly supervised framework to yield improved performance.

Distant supervision uses noisy signals in text as positive labels to train classifiers. For instance, the token "Obama" can be used to identify a series of tweets that discuss U.S. President Barack Obama. Although searching for token matches can return false positives, using the resulting tweets as positive training examples provides supervision from a distance. This paper experiments with several diverse sets of keywords to train distantly supervised classifiers for topic identification. We evaluate each classifier on a hand-labeled dataset of political and apolitical tweets, and demonstrate an improvement in F1 score over simple keyword search (.39 to .90 in the best case). We also make available the first labeled dataset for topic identification in politics to encourage future work.

Sentiment analysis encompasses a broad field of research, but most microblog work focuses on two moods: positive and negative sentiment. Algorithms to identify these moods range from matching words in a sentiment lexicon to training classifiers with a hand-labeled corpus. Since labeling corpora is expensive, recent work on Twitter uses emoticons (i.e., ASCII smiley faces such as :-( and :-)) as noisy labels in tweets for distant supervision (Pak and Paroubek, 2010; Davidov et al., 2010; Kouloumpis et al., 2011). This paper presents new analysis of the downstream effects of topic identification on sentiment classifiers and their application to political forecasting.

Interest in measuring the political mood of a country has recently grown (O'Connor et al., 2010; Tumasjan et al., 2010; Gonzalez-Bailon et al., 2010; Carvalho et al., 2011; Tan et al., 2011). Here we compare our sentiment results to Presidential Job Approval polls and show that the sentiment scores produced by our system are positively correlated with both the Approval and Disapproval job ratings.

In this paper we present a method for coupling two distantly supervised algorithms for topic identification and sentiment classification on Twitter. In Section 4, we describe our approach to topic identification and present a new annotated corpus of political tweets for future study. In Section 5, we apply distant supervision to sentiment analysis. Finally, Section 6 discusses our system's performance on modeling Presidential Job Approval ratings from Twitter data.
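To make the distantly supervised setup concrete, the sketch below assembles noisy training data with a keyword match and trains a binary topic classifier. It is a minimal illustration assuming scikit-learn; the keyword list, the balanced random sampling of negatives, and the bag-of-words logistic regression model are assumptions for exposition, not necessarily the exact configuration evaluated in this paper.

```python
# Minimal sketch of distant supervision for topic identification.
# Tweets containing a political keyword become noisy positive examples;
# an equal number of randomly sampled non-matching tweets become
# negatives. Keyword set, features, and classifier are illustrative.
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

POLITICAL_KEYWORDS = {"obama", "congress", "senate"}  # hypothetical set

def matches_keyword(tweet):
    return any(k in tweet.lower().split() for k in POLITICAL_KEYWORDS)

def train_topic_classifier(tweets):
    positives = [t for t in tweets if matches_keyword(t)]
    pool = [t for t in tweets if not matches_keyword(t)]
    negatives = random.sample(pool, min(len(positives), len(pool)))
    texts = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=2)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(texts), labels)
    return vectorizer, model

def is_political(vectorizer, model, tweet):
    # Classify a new tweet as political (True) or apolitical (False).
    return bool(model.predict(vectorizer.transform([tweet]))[0])
```

Unlike a raw keyword search, a classifier trained this way can also retrieve topic-relevant tweets that never mention a seed keyword, which is one reason it can outperform the very search that produced its training data.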
2 Previous Work

The past several years have seen sentiment analysis grow into a diverse research area. The idea of sentiment applied to microblogging domains is relatively new, but there are numerous recent publications on the subject. Since this paper focuses on the microblog setting, we concentrate on these contributions here.

The most straightforward approach to sentiment analysis is to use a sentiment lexicon to label tweets based on how many sentiment words appear. This approach tends to be used by applications that measure the general mood of a population. O'Connor et al. (2010) use a ratio of positive and negative word counts on Twitter, Kramer (2010) counts lexicon words on Facebook, and Thelwall (2011) uses the publicly available SentiStrength algorithm to make weighted counts of keywords based on predefined polarity strengths.

In contrast to lexicons, many approaches instead focus on ways to train supervised classifiers. However, labeled data is expensive to create, and examples of Twitter classifiers trained on hand-labeled data are few (Jiang et al., 2011). Instead, distant supervision has grown in popularity. These algorithms use emoticons as semantic indicators for sentiment. For instance, a sad face (e.g., :-() serves as a noisy label for a negative mood. Read (2005) was the first to suggest emoticons for UseNet data, followed by Go et al. (2009) on Twitter, and many others since (Bifet and Frank, 2010; Pak and Paroubek, 2010; Davidov et al., 2010; Kouloumpis et al., 2011). Hashtags (e.g., #cool and #happy) have also been used as noisy sentiment labels (Davidov et al., 2010; Kouloumpis et al., 2011). Finally, multiple models can be blended into a single classifier (Barbosa and Feng, 2010). Here, we adopt the emoticon algorithm for sentiment analysis and evaluate it on a specific domain (politics).

Topic identification in Twitter has received much less attention than sentiment analysis. The majority of approaches simply select a single keyword (e.g., "Obama") to represent their topic (e.g., "US President") and retrieve all tweets that contain the word (O'Connor et al., 2010; Tumasjan et al., 2010; Tan et al., 2011). The underlying assumption is that the keyword is precise, and that, due to the vast number of tweets, the search will return a large enough dataset to measure sentiment toward that topic. In this work, we instead use a distantly supervised system similar in spirit to those recently applied to sentiment analysis.

Finally, we evaluate the approaches presented in this paper on the domain of politics. Tumasjan et al. (2010) showed that the results of a recent German election could be predicted through frequency counts with remarkable accuracy. Most similar to this paper is the work of O'Connor et al. (2010), in which tweets relating to President Obama are retrieved with a keyword search and a sentiment lexicon is used to measure overall approval. This extracted approval ratio is then compared to Gallup's Presidential Job Approval polling data. We directly compare their results with various distantly supervised approaches.
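Because we adopt the emoticon algorithm in Section 5, a brief sketch of the idea may be useful here. The emoticon inventory and the policy of discarding tweets with conflicting emoticons are illustrative choices, not a specification of the exact systems cited above.

```python
# Sketch of emoticon-based distant supervision for sentiment: happy and
# sad emoticons act as noisy positive/negative labels and are stripped
# from the text so a classifier cannot simply memorize them. The
# emoticon lists and the mixed-signal policy are illustrative.
HAPPY = (":-)", ":)", ":D")
SAD = (":-(", ":(")

def emoticon_label(tweet):
    """Return (cleaned_text, label), with label 1=positive, 0=negative,
    or None when the tweet has no emoticon or conflicting emoticons."""
    has_happy = any(e in tweet for e in HAPPY)
    has_sad = any(e in tweet for e in SAD)
    if has_happy == has_sad:  # neither present, or both: no usable label
        return None
    for e in HAPPY + SAD:
        tweet = tweet.replace(e, " ")
    return " ".join(tweet.split()), 1 if has_happy else 0
```

Tweets labeled this way can then train the same style of classifier sketched at the end of Section 1, with no hand annotation required.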
3 Datasets

The experiments in this paper use seven months of tweets from Twitter (www.twitter.com) collected between June 1, 2009 and December 31, 2009. The corpus contains over 476 million tweets labeled with usernames and timestamps, collected through Twitter's 'spritzer' API without keyword filtering. Tweets are aligned with polling data in Section 6 using their timestamps.

The full system is evaluated against the publicly available daily Presidential Job Approval polling data from Gallup. Every day, Gallup asks 1,500 adults in the United States whether they approve or disapprove of "the job President Obama is doing as president." The results are compiled into two trend lines for Approval and Disapproval ratings, as shown in Figure 1. We compare our positive and negative sentiment scores against these two trends.

ID    Type        Keywords
PC-1  Obama       obama
PC-2  General     republican, democrat, senate, congress, government
PC-3  Topic       health care, economy, tax cuts, tea party, bailout, sotomayor
PC-4  Politician  obama, biden, mccain, reed, pelosi, clinton, palin
PC-5  Ideology    liberal, conservative, progressive, socialist, capitalist

Table 1: The keywords used to select positive training sets for each political classifier (a subset of all PC-3 and PC-5 keywords are shown to conserve space).

4 Topic Identification

This section addresses the task of Topic Identification in the context of microblogs.

positive: LOL, obama made a bears reference in green bay. uh oh.
negative: New blog up! It regards the new iPhone 3G S: <URL>

We then use these automatically extracted
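Section 3 states that tweets are aligned with Gallup's daily trend lines by timestamp. As a forward-looking sketch of that evaluation, the code below buckets labeled tweets by calendar day and correlates the daily positive ratio with a poll series. The input formats and the choice of Pearson's r are assumptions; the paper's actual correlation analysis appears in Section 6.

```python
# Sketch of aligning tweet sentiment with daily polling data, assuming
# SciPy. Tweets are bucketed by calendar day via their timestamps, and
# the daily positive ratio is correlated with the poll's trend line.
# Input formats and the use of Pearson's r are assumptions.
from collections import defaultdict
from scipy.stats import pearsonr

def daily_positive_ratio(labeled_tweets):
    """labeled_tweets: iterable of (date, label), label 1=pos, 0=neg."""
    pos, total = defaultdict(int), defaultdict(int)
    for date, label in labeled_tweets:
        pos[date] += label
        total[date] += 1
    return {day: pos[day] / total[day] for day in total}

def correlate_with_poll(ratios, poll):
    """poll: dict mapping date -> daily Approval (or Disapproval) %."""
    days = sorted(set(ratios) & set(poll))
    x = [ratios[d] for d in days]
    y = [poll[d] for d in days]
    return pearsonr(x, y)[0]  # correlation coefficient only
```

The same routine can be run once against the Approval trend and once against the Disapproval trend, mirroring the two comparisons described above.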