Identification of Good and Bad News on Twitter
Piush Aggarwal, University of Duisburg-Essen, [email protected]
Ahmet Aker, University of Duisburg-Essen, [email protected]

Abstract

Social media plays a great role in news dissemination, which includes good and bad news. However, studies show that news in general has a significant impact on our mental state, and that this influence is stronger for bad news. Ideally, we would have a tool that can help filter out the type of news we do not want to consume. In this paper, we provide the basis for such a tool. In our work, we focus on Twitter. We release a manually annotated dataset containing 6,853 tweets from 5 different topical categories. Each tweet is annotated with a good or bad label. We also investigate various machine learning systems and features and evaluate their performance on the newly generated dataset. We further perform a comparative analysis with sentiment, showing that sentiment alone is not enough to distinguish between good and bad news.

1 Introduction

Social media sites like Twitter, Facebook, Reddit, etc. have become a major source of information seeking. They give users the chance to shout to the world in search of vanity, attention, or just shameless self-promotion. There is a lot of personal discussion, but at the same time there is a base of useful, knowledgeable content that is worthy of public interest. For example, on Twitter, tweets may report news related to recent events such as natural or man-made disasters, discoveries, local or global election outcomes, health reports, financial updates, etc. In all cases, there are good and bad news scenarios.

Studies show that news in general has a significant impact on our mental state (Johnston and Davey, 1997). However, it has also been demonstrated that the influence of bad news is more significant than that of good news (Soroka, 2006; Baumeister et al., 2001) and that, due to the natural negativity bias described by Rozin and Royzman (2001), humans may end up consuming more bad than good news. Since bad news travels faster than good news (Kamins et al., 1997; Hansen et al., 2011), this consumption may increase. This is a real threat to society, since according to medical doctors and psychologists, exposure to bad news may have severe and long-lasting negative effects on our well-being and lead to stress, anxiety, and depression (Johnston and Davey, 1997). Milgrom (1981), Braun et al. (1995), Conrad et al. (2002), and Soroka (2006) describe the crucial role of good and bad news in financial markets. For instance, bad news about unemployment is likely to affect stock markets and, in turn, the overall economy (Boyd et al., 2005). Differentiating between good and bad news may help readers combat this issue, and a system that filters news based on content may enable them to control the amount of bad news they consume.

The aim of this paper is to provide the basis for developing such a filtering system to help readers in their selection process. We focus on Twitter and aim to develop such a filtering system for tweets. In this respect, the contributions of this work are:

• We introduce a new task, namely the distinction between good and bad news on Twitter.
• We provide the community with a new gold standard dataset containing 6,853 tweets. Each tweet is labeled either as good or bad. To the best of our knowledge, this is the first dataset containing tweets with good and bad labels. The dataset is publicly accessible and can be used for further research (https://github.com/aggarwalpiush/goodBadNewsTweet).

• We provide guidelines to annotate good/bad news on Twitter.

• We implement several feature-based approaches and report their performance.

• The dataset covers diverse domains. We also run out-of-domain experiments and report system performance when systems are trained on in-domain data and tested on out-of-domain data.

In the following, we first discuss related work. In Section 3 we discuss the guidelines that we use to annotate tweets and gather our dataset. Section 4 provides a description of the data itself. In Section 5 we describe several baseline systems performing good and bad news classification, as well as the features used to guide them. Finally, we conclude the paper in Section 7.

2 Related Work

In terms of classifying tweets into good and bad classes, no prior work exists. The closest studies to ours are those performing sentiment classification on Twitter (Nakov et al., 2016; Rosenthal et al., 2017). Kouloumpis et al. (2011) use n-gram, lexicon, part-of-speech, and micro-blogging features for detecting sentiment in tweets. Similar features are used by Go (2009). More recently, researchers have also investigated deep learning strategies to tackle the tweet-level sentiment problem (Severyn and Moschitti, 2015; Ren et al., 2016). Twitter is multi-lingual, and Mozetič et al. (2016) investigate multi-lingual sentiment classification. The task, as well as the approaches proposed for determining tweet-level sentiment, are nicely summarized in the survey paper of Kharde et al. (2016). However, Balahur et al. (2010) report that there is no direct link between good and bad news and positive and negative sentiment, respectively.

Thus, unlike related work, we perform tweet-level good vs. bad news classification. Similar to Balahur et al. (2010), we also show that there is no evidence that positive sentiment implies good news and negative sentiment bad news.

3 Good vs Bad News

News can be good for one section of society but bad for another. For example, news about wins or losses is always subjective; in such cases, agreement on the news type (good or bad) is quite low. On the other hand, news related to natural disasters, geographical changes, humanity, women's empowerment, etc. shows very high agreement. Therefore, topicality plays an important role in defining news types.

We consider news to be good news if it relates to low-subjectivity topics, includes positive overtones such as recoveries, breakthroughs, cures, wins, and celebrations (Harcup and O'Neill, 2017), and is beneficial for an individual, a group, or society. An example of a good news tweet is shown in Figure 1.

[Figure 1: Good news tweet]

Conversely, we define news as bad news if it relates to a low-subjectivity topic, includes negative overtones such as death, injury, defeat, and loss, and is not beneficial for an individual, a group, or society. An example of a bad news tweet is shown in Figure 2.

[Figure 2: Bad news tweet]

Based on these definitions/guidelines we have gathered our dataset of tweets with good and bad labels (see the next section).

4 Dataset
Data Collection
To collect tweets for annotation, we first choose ten low-subjectivity topics, which can be divided into five different categories. Then we retrieve the examples from Twitter using its API. Next, we discard non-English tweets and re-tweets. We also remove duplicates based on the lower-cased first four words of each tweet, keeping only the first occurrence. Thereafter, we keep only those tweets that can be regarded as news, using an in-house SVM classifier (Aggarwal, 2019). This classifier is trained on tweets annotated with the labels news and not news; we use it to remove not-news tweets from the annotation task, selecting only tweets for which the classifier's prediction probability is greater than or equal to 80% (a minimal sketch of this filtering step appears at the end of this section). In Table 1, we provide information about the topics and categories, as well as statistics about the collected tweets used for annotation (column Collected).

Category            Topic              Collected  Annotated
Health              Ebola                    892        852
                    HIV                      663        630
Natural Disaster    Hurricane Harvey       2,073      1,997
                    Hurricane Irma           795        772
Terrorist Attack    Macerata                 668        625
                    Stockholm Attack         743        697
Geography and Env.  AGU17                    652        592
                    Swachh Bharat             21         21
Science and Edu.    IOT                      627        602
                    Nintendo                  78         65
Total                                      7,212      6,853

Table 1: Categories, their topics, and distributions for the dataset generation.

Data Annotation
For data annotation, we use the Figure Eight crowdsourcing service. Before uploading our collected examples, we carried out a round of trial annotation on 300 randomly selected instances from our tweet collection corpus. We asked three annotators to classify the selected examples into good and bad news, also allowing a third category, cannot say. We computed Fleiss' kappa (Fleiss, 1971) on the trial dataset for the three annotators; a sketch of this computation appears at the end of this section. The value is 0.605, which indicates fairly high agreement. We used the 247 instances on which all three annotators agreed as test questions for the crowdsourcing platform.

During the crowd annotation, we showed each annotator 5 tweets per page and paid 3 US cents per tweet. To maintain quality standards, in addition to the test questions, we applied a restriction so that annotation could be performed only by people from English-speaking countries. We also made sure that each tweet was annotated by at most 7 annotators and that an annotator agreement of at least 70% was met. Note that if the agreement of 70% was reached with fewer annotators, the system would not force an annotation to be done by 7 annotators but would finish earlier (this stopping rule is sketched at the end of the section). The system requires 7 annotators if the 70% agreement is not reached with fewer.
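To make the de-duplication and news-filtering step concrete, below is a minimal Python sketch. The input format, the first_four_key helper, and the news_clf interface are illustrative assumptions, not the paper's released code; any classifier exposing a scikit-learn-style predict_proba would fit, and the position of the "news" class is assumed to be index 1.

    import re

    def first_four_key(text):
        """Duplicate key: the lower-cased first four words of a tweet."""
        tokens = re.findall(r"\w+", text.lower())
        return " ".join(tokens[:4])

    def filter_tweets(tweets, news_clf, threshold=0.80):
        """Drop near-duplicates and keep only confidently-news tweets.

        tweets:   iterable of (tweet_id, text) pairs, already restricted
                  to English, non-retweet tweets
        news_clf: stand-in for the in-house news/not-news SVM; anything
                  with a predict_proba method works
        """
        seen, kept = set(), []
        for tweet_id, text in tweets:
            key = first_four_key(text)
            if key in seen:  # duplicate first four words: keep first only
                continue
            seen.add(key)
            # probability of the "news" class (class index 1 is an assumption)
            p_news = news_clf.predict_proba([text])[0][1]
            if p_news >= threshold:
                kept.append((tweet_id, text))
        return kept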
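The trial-round agreement statistic can be reproduced with standard tooling. The sketch below computes Fleiss' kappa using statsmodels; the ratings matrix is toy data (one row per tweet, one column per annotator), and the label encoding 0 = good, 1 = bad, 2 = cannot say is an assumption for illustration.

    import numpy as np
    from statsmodels.stats import inter_rater

    # Toy ratings: one row per tweet, one column per annotator.
    ratings = np.array([
        [0, 0, 0],
        [1, 1, 1],
        [0, 0, 1],
        [2, 1, 1],
        [0, 0, 0],
    ])

    # aggregate_raters converts per-rater labels into per-item category
    # counts, the input format fleiss_kappa expects.
    table, _ = inter_rater.aggregate_raters(ratings)
    kappa = inter_rater.fleiss_kappa(table, method="fleiss")
    print("Fleiss' kappa: %.3f" % kappa)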
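Finally, the dynamic stopping rule for crowd judgments can be expressed in a few lines. This is a sketch of the rule as described above, not the crowdsourcing platform's actual implementation; in particular, requiring a minimum of 3 judgments before agreement is checked is an assumption.

    def needs_more_judgments(labels, min_agreement=0.70,
                             min_annotators=3, max_annotators=7):
        """Return True while another crowd judgment should be collected
        for one tweet; labels holds the judgments gathered so far."""
        if len(labels) < min_annotators:
            return True                      # assumed minimum, see above
        if len(labels) >= max_annotators:
            return False                     # hard cap of 7 annotators
        top = max(labels.count(l) for l in set(labels))
        return top / len(labels) < min_agreement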