
Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking

Hannah Rashkin† Eunsol Choi† Jin Yea Jang‡ Svitlana Volkova‡ Yejin Choi†
†Paul G. Allen School of Computer Science & Engineering, University of Washington
{hrashkin,eunsol,yejin}@cs.washington.edu
‡Data Sciences and Analytics, Pacific Northwest National Laboratory
{jinyea.jang,svitlana.volkova}@pnnl.gov

Abstract

We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains to be an open research question, stylistic cues can help determine the truthfulness of text.

Figure 1: Example statements rated by PolitiFact as mostly true and mostly false. Misleading phrasing (bolded in green in the original figure) was one reason for the in-between ratings. The figure places each statement on a True-to-False scale:
• "There are already more American jobs in the solar industry than in coal mining." Rated True by PolitiFact (May 2014)
• "You cannot get ebola from just riding on a plane or a bus." Rated Mostly True by PolitiFact (Oct. 2014)
• "Google search spike suggests many people don't know why they voted for Brexit." Rated Mostly False by PolitiFact (June 2016)
• "By declaring that Pluto was no longer a planet, the (International Astronomical Union) put into place a planetary definition that would have even declassified Earth as a planet if it existed as far from the sun as Pluto does." Rated Half True by PunditFact (July 2015)

1 Introduction

Words in news media and political discourse have considerable power in shaping people's beliefs and opinions. As a result, their truthfulness is often compromised to maximize impact. Recently, fake news has captured worldwide interest, and the number of organized efforts dedicated solely to fact-checking has almost tripled since 2014.1 Organizations such as PolitiFact.com actively investigate and rate the veracity of comments made by public figures, journalists, and organizations.

Figure 1 shows example quotes rated for truthfulness by PolitiFact. Per their analysis, one component of the mostly true and mostly false statements' ratings is the misleading phrasing (bolded in green in the figure). For instance, the Ebola statement is true as stated, though only because the speaker hedged their meaning with the quantifier just. In the Brexit example, two correlated events – Brexit and Google search trends – are presented ambiguously as if they were directly linked.

Importantly, like the above examples, most fact-checked statements on PolitiFact are rated as neither entirely true nor entirely false. Analysis indicates that falsehoods often arise from subtle differences in phrasing rather than outright fabrication (Rubin et al., 2015). Compared to most prior work in the deception literature, which focused on binary categorization of truth and deception, political fact-checking poses a new challenge as it involves a graded notion of truthfulness.

While political fact-checking generally focuses on examining the accuracy of a single quoted statement by a public figure, the reliability of general news stories is also a concern (Connolly et al., 2016; Perrott, 2016). Figure 2 illustrates news types categorized along two dimensions: the intent of the authors (desire to deceive) and the content of the articles (true, mixed, false).

1https://www.poynter.org/2017/there-are-now-114-fact-checking-initiatives-in-47-countries/450477/

2 Fake News Analysis

News Corpus with Varying Reliability To analyze linguistic patterns across different types of articles, we sampled standard trusted news articles from the English Gigaword corpus and crawled articles from seven different unreliable news sites of differing types.2 Table 1 displays sources identified under each type according to US News & World Report.3 These news types include:

• Satire: mimics real news but still cues the reader that it is not meant to be taken seriously

• Hoax: convinces readers of the validity of a paranoia-fueled story

• Propaganda: misleads readers so that they believe a particular political/social agenda

Unlike hoaxes and propaganda, satire is intended to be notably different from real news so that audiences will recognize the humorous intent. Hoaxes are more likely to invent stories, while propaganda frequently combines truths, falsehoods, and ambiguities to confound readers.

2All resources created for this paper, including the corpus of news articles from unreliable sources, the collection of PolitiFact ratings, and the compiled Wiktionary lexicons, have been made publicly available at homes.cs.washington.edu/~hrashkin/factcheck.html
3www.usnews.com/news/national-news/articles/2016-11-14/avoid-these-fake-news-sites-at-all-costs

Figure 2: Types of news articles categorized based on their intent (no desire to deceive vs. to deceive) and information quality, spanning trusted news, satire, hoax, and propaganda.

News Type  | Source              | # of Docs | # Tokens per Doc.
Trusted    | Gigaword News       | 13,995    | 541
Satire     | The Onion           | 14,170    | 350
Satire     | The Borowitz Report | 627       | 250
Satire     | Clickhole           | 188       | 303
Hoax       | American News       | 6,914     | 204
Hoax       | DC Gazette          | 5,133     | 582
Propaganda | The Natural News    | 15,580    | 857
Propaganda | Activist Report     | 17,869    | 1,169

Table 1: News articles used for analysis in Section 2.

To characterize differences between these news types, we applied various lexical resources to trusted and fake news articles. We draw lexical resources from prior work in communication theory and stylistic analysis in computational linguistics. We tokenize the text with NLTK (Bird et al., 2009), compute per-document counts for each lexicon, and report averages per article of each type.

First among these lexicons is the Linguistic Inquiry and Word Count (LIWC), a lexicon widely used in social science studies (Pennebaker et al., 2015). In addition, we estimate the use of strongly and weakly subjective words with a sentiment lexicon (Wilson et al., 2005). Subjective words can be used to dramatize or sensationalize a news story. We also use lexicons for hedging from Hyland (2015), because hedging can indicate vague, obscuring language. Lastly, we introduce intensifying lexicons that we crawled from Wiktionary based on the hypothesis that fake news articles try to enliven stories to attract readers. We compiled five lists from Wiktionary of words that imply a degree of dramatization (comparatives, superlatives, action adverbs, manner adverbs, and modal adverbs) and measured their presence.

Discussion Table 2 summarizes the ratio of averages between unreliable news and truthful news for a handful of the measured features. Ratios greater than one denote features more prominent in fake news, and ratios less than one denote features more prominent in truthful news. All reported unreliable/reliable news ratios are statistically significant (p < 0.01, Welch t-test) after Bonferroni correction.
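As a concrete reading of the ratio computation described above, the following sketch computes per-document lexicon rates and the unreliable/trusted ratio with a Bonferroni-corrected Welch t-test. The names `lexicon`, `unreliable_docs`, and `trusted_docs` are hypothetical placeholders (a set of lowercase lexicon words and two lists of NLTK-tokenized articles); none of them come from the paper's released resources.

```python
import numpy as np
from scipy import stats

def doc_rate(tokens, lexicon):
    """Fraction of a document's tokens that fall inside the lexicon."""
    return sum(tok.lower() in lexicon for tok in tokens) / max(len(tokens), 1)

def lexicon_ratio(unreliable_docs, trusted_docs, lexicon, n_lexicons=1):
    """Ratio of average per-document rates, plus a corrected p-value."""
    unreliable = np.array([doc_rate(d, lexicon) for d in unreliable_docs])
    trusted = np.array([doc_rate(d, lexicon) for d in trusted_docs])
    # Welch's t-test (unequal variances); Bonferroni-correct by the
    # number of lexicons tested.
    _, p = stats.ttest_ind(unreliable, trusted, equal_var=False)
    return unreliable.mean() / trusted.mean(), min(p * n_lexicons, 1.0)
```

A ratio above 1.0 would indicate a feature more prominent in the unreliable articles, matching how Table 2 below is read.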

LEXICON MARKERS       | RATIO | MAX | SOURCE          | EXAMPLE TEXT
Swear (LIWC)          | 7.00  | S   | Borowitz Report | ... Ms. Rand, who has been damned to eternal torment ...
2nd pers (You)        | 6.73  | P   | DC Gazette      | You would instinctively justify and rationalize your ...
Modal Adverb          | 2.63  | S   | American News   | ... investigation of Hillary Clinton was inevitably linked ...
Action Adverb         | 2.18  | S   | Activist News   | ... if one foolishly assumes the US State Department ...
1st pers singular (I) | 2.06  | S   | Activist Post   | I think its against the law of the land to finance riots ...
Manner Adverb         | 1.87  | S   | Natural News    | ... consequences of deliberately engineering extinction.
Sexual (LIWC)         | 1.80  | S   | The Onion       | ... added that his daughter better not be pregnant.
See (LIWC)            | 1.52  | H   | Clickhole       | New Yorkers ... can bask in the beautiful image ...
Negation (LIWC)       | 1.51  | H   | American News   | There is nothing that outrages liberals more than ...
Strong subjective     | 1.51  | H   | Clickhole       | He has one of the most brilliant minds in basketball.
Hedge (Hyland, 2015)  | 1.19  | H   | DC Gazette      | As the Communist Party USA website claims ...
Superlatives          | 1.17  | P   | Activist News   | Fresh water is the single most important natural resource ...
Weak subjective       | 1.13  | P   | American News   | ... he made that very clear in his response to her.
Number (LIWC)         | 0.43  | S   | Xinhua News     | ... 7 million foreign tourists coming to the country in 2010 ...
Hear (LIWC)           | 0.50  | S   | AFP             | The prime minister also spoke about the commission ...
Money (LIWC)          | 0.57  | P   | NYTimes         | He has proposed to lift the state sales tax on groceries ...
Assertive             | 0.84  | P   | NYTimes         | Hofstra has guaranteed scholarships to the current players.
Comparatives          | 0.86  | P   | Assoc. Press    | ... from fossil fuels to greener sources of energy ...

Table 2: Linguistic features and their relationship with fake news. The ratio refers to how frequently a feature appears in fake news articles compared to trusted ones. We list linguistic phenomena more pronounced in fake news first, and then those that appear less in fake news. Examples illustrate sample texts from news articles containing the lexicon words. All reported ratios are statistically significant. The last column (MAX) lists which type of fake news used words from that lexicon most prominently (P = propaganda, S = satire, H = hoax).

Our results show that first-person and second-person pronouns are used more in less reliable or deceptive news types. This contrasts with studies in other domains (Newman et al., 2003), which found fewer self-references in people telling lies about their personal opinions. Unlike those domains, news writers are trying to appear indifferent. Editors at trustworthy sources are possibly more rigorous about removing language that seems too personal, which is one reason why this result differs from other lie detection domains. This finding instead corroborates previous work in written domains by Ott et al. (2011) and Rayson et al. (2001), who found that such pronouns were indicative of imaginative writing. Perhaps imaginative storytelling is a closer match to detecting unreliable news than lie detection on opinions.

Our results also show that words that can be used to exaggerate – subjectives, superlatives, and modal adverbs – are all used more by fake news. Words used to offer concrete figures – comparatives, money, and numbers – appear more in truthful news. This also builds on previous findings by Ott et al. (2011) on the difference between superlative/comparative usage.

Trusted sources are more likely to use assertive words and less likely to use hedging words, indicating that they are also less vague about describing events. This relates to psychology theories (Buller and Burgoon, 1996) that deceivers show more "uncertainty and vagueness" and "indirect forms of expression". Similarly, the trusted sources use the hear category words more often, possibly indicating that they are citing primary sources more often.

The last column in Table 2 shows the fake news type that uses the corresponding lexicon most prominently. We found that one distinctive feature of satire compared to other types of untrusted news is its prominent use of adverbs. Hoax stories tend to use fewer superlatives and comparatives. In contrast, compared to other types of fake news, propaganda uses relatively more assertive verbs and superlatives.

News Reliability Prediction We study the feasibility of predicting the reliability of a news article, classifying it into four categories: trusted, satire, hoax, or propaganda. We split our collected articles into balanced training (20k total articles from The Onion, American News, The Activist, and the Gigaword news excluding 'APW' and 'WPB' sources) and test sets (3k articles from the remaining sources). Because articles in the training and test set come from different sources, the models must classify articles without relying on author-specific cues. We also use 20% of the training articles as an in-domain development set. We trained a Max-Entropy classifier with L2 regularization on n-gram tf-idf feature vectors (up to trigrams).4

The model achieves F1 scores of 65% on the out-of-domain test set (Table 3). This is a promising result as it is much higher than random, but it still leaves room for improvement compared to the performance on the development set consisting of articles from in-domain sources.

4N-gram tf-idf vectors have acted as a competitive means of cross-domain text classification. Zhang et al. (2015) found that for data sets smaller than a million examples, this was the best model, outperforming neural models.
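The paper describes the classifier setup but not its implementation; a minimal sketch with scikit-learn (one plausible rendering, not the authors' code) is below. `train_texts`, `train_labels`, `test_texts`, and `test_labels` are hypothetical stand-ins for the article splits described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# 4-way reliability classifier: trusted / satire / hoax / propaganda.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),              # n-grams up to trigrams
    LogisticRegression(penalty="l2", max_iter=1000),  # max-entropy with L2
)
model.fit(train_texts, train_labels)
pred = model.predict(test_texts)
print(f1_score(test_labels, pred, average="macro"))   # out-of-domain F1
```

Because the held-out articles come from unseen sources, the tf-idf vocabulary learned on the training sources is the only signal available at test time, which is what makes the 65% out-of-domain F1 a meaningful cross-source result.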

Data | Sources       | Random | MaxEnt
Dev  | in-domain     | 0.26   | 0.91
Test | out-of-domain | 0.26   | 0.65

Table 3: F1 scores of 4-way classification of news reliability.

We examined the 50 highest weighted n-gram features in the MaxEnt classifier for each class. The highest weighted n-grams for trusted news were often specific places (e.g., "washington") or times ("on monday"). Many of the highest weighted features from satire were vaguely facetious hearsay ("reportedly", "confirmed"). For hoax articles, heavily weighted features included divisive topics ("liberals", "trump") and dramatic cues ("breaking"). Heavily weighted features for propaganda tend towards abstract generalities ("truth", "freedom") as well as specific issues ("vaccines", "syria"). Interestingly, "youtube" and "video" are highly weighted for the propaganda and hoax classes respectively, indicating that they often rely on video clips as sources.

3 Predicting Truthfulness

Politifact Data Related to the issue of identifying the truthfulness of a news article is the fact-checking of individual statements made by public figures. Misleading statements can also have a variety of intents and levels of reliability depending on who is making the statement.

PolitiFact5 is a site led by journalists who actively fact-check suspicious statements. One unique quality of PolitiFact is that each quote is evaluated on a 6-point scale of truthfulness ranging from "True" (factual) to "Pants-on-Fire False" (absurdly false). This scale allows for distinction between categories like mostly true (the facts are correct but presented in an incomplete manner) or mostly false (the facts are not correct but are connected to a small kernel of truth).

We collected labelled statements from PolitiFact and its spin-off sites (PunditFact, etc.), 10,483 statements in total. We analyze the subset of 4,366 statements that are direct quotes by the original speaker. The distribution of ratings on the PolitiFact scale for this subset is shown in Table 4. Most statements are labeled as neither completely true nor false.

        | True | Mostly True | Half True | Mostly False | False | Pants-on-fire
6-class | 20%  | 21%         | 21%       | 14%          | 17%   | 7%
2-class | 62% (more true)                | 38% (more false)

Table 4: PolitiFact label distribution. PolitiFact uses a 6-point scale ranging from True, Mostly True, and Half-true down to Mostly False, False, and Pants-on-fire False.

We formulate a fine-grained truthfulness prediction task with the Politifact data. We split quotes into training/development/test sets of {2575, 712, 1074} statements, respectively, so that all of each speaker's quotes are in a single set. Given a statement, the model returns a rating for how reliable the statement is (Politifact ratings are used as gold labels). We ran the experiment in two settings, one considering all 6 classes and the other considering only 2 (treating the top three truthful ratings as true and the lower three as false).

5www.politifact.com/
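A minimal sketch of the speaker-grouped split and the 2-class collapse described above, assuming a hypothetical `quotes` list of (speaker, statement, rating) triples; scikit-learn's GroupShuffleSplit stands in for whatever grouping procedure the authors actually used, and the rating strings are illustrative labels, not an official PolitiFact schema.

```python
from sklearn.model_selection import GroupShuffleSplit

# The 6-point scale, ordered from least to most truthful.
RATINGS = ["pants-fire", "false", "mostly-false",
           "half-true", "mostly-true", "true"]

def binarize(rating):
    """Collapse to 2 classes: top three ratings -> True, lower three -> False."""
    return RATINGS.index(rating) >= 3

speakers = [speaker for speaker, _, _ in quotes]
splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, heldout_idx = next(splitter.split(quotes, groups=speakers))
# A second grouped split of heldout_idx would yield the development and
# test portions, keeping each speaker's quotes within a single set.
```

Grouping by speaker matters here for the same reason as the source-disjoint split in Section 2: it prevents the model from memorizing an individual's style instead of learning truthfulness cues.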
Model We trained an LSTM model (Hochreiter and Schmidhuber, 1997) that takes the sequence of words as the input and predicts the Politifact rating. We also compared this model with Maximum Entropy (MaxEnt) and Naive Bayes models, frequently used for text categorization.

For input to the MaxEnt and Naive Bayes models, we tried two variants: one with the word tf-idf vectors as input, and one with the LIWC measurements concatenated to the tf-idf vectors. For the LSTM model, we used word sequences as input and also a version where the LSTM output is concatenated with LIWC feature vectors before undergoing the activation layer. The LSTM word embeddings are initialized with 100-dim embeddings from GloVe (Pennington et al., 2014) and fine-tuned during training. The LSTM was implemented with Theano and Keras with a 300-dim hidden state and a batch size of 64. Training was done with ADAM to minimize categorical cross-entropy loss over 10 epochs.
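A Keras sketch of the text-only LSTM variant is below (Keras 2-style; newer Keras versions replace the `weights` argument with an embeddings initializer). `emb_matrix` (a vocab_size x 100 array with rows taken from GloVe), `X_train` (padded word-index sequences), and `y_train` (one-hot 6-class ratings) are hypothetical placeholders; the "text + LIWC" variant would instead use the functional API to concatenate LIWC vectors with the LSTM output before the softmax.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding

vocab_size, emb_dim = emb_matrix.shape   # emb_dim = 100 in the paper

model = Sequential([
    # GloVe-initialized word embeddings, fine-tuned during training.
    Embedding(vocab_size, emb_dim, weights=[emb_matrix], trainable=True),
    LSTM(300),                            # 300-dim hidden state
    Dense(6, activation="softmax"),       # one unit per truthfulness rating
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X_train, y_train, batch_size=64, epochs=10)
```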

Classifier Results Table 5 summarizes the performance on the development set. We report macro-averaged F1 scores in all tables. The LSTM outperforms the other models when only using text as input; however, the other two models improve substantially when adding LIWC features, particularly in the case of the multinomial Naive Bayes model. In contrast, the LIWC features do not improve the neural model much, indicating that some of this lexical information is perhaps redundant to what the model was already learning from text.

MODEL             | 2-CLASS text | +LIWC | 6-CLASS text | +LIWC
Majority Baseline | .39          | -     | .06          | -
Naive Bayes       | .44          | .58   | .16          | .21
MaxEnt            | .55          | .58   | .20          | .21
LSTM              | .58          | .57   | .21          | .22

Table 5: Model performance on the Politifact validation set.

We report results on the test set in Table 6. We again find that LIWC features improve the MaxEnt and NB models to perform similarly to the LSTM model. As in the dev set results, the LIWC features do not improve the LSTM's performance, and even seem to hurt it slightly.

MODEL             | FEATURE     | 2-CLASS | 6-CLASS
Majority Baseline |             | .39     | .06
Naive Bayes       | text + LIWC | .56     | .17
MaxEnt            | text + LIWC | .55     | .22
LSTM              | text + LIWC | .52     | .19
LSTM              | text        | .56     | .20

Table 6: Model performance on the Politifact test set.

4 Related Work

Deception Detection Psycholinguistic work in interpersonal deception theory (Buller and Burgoon, 1996) has postulated that certain speech patterns can be signs of a speaker trying to purposefully obscure the truth. Hedge words and other vague qualifiers (Choi et al., 2012; Recasens et al., 2013), for example, may add indirectness to a statement that obscures its meaning. Linguistic aspects of deception detection have been well studied in a variety of NLP applications (Ott et al., 2011; Mihalcea and Strapparava, 2009; Jindal and Liu, 2008; Girlea et al., 2016; Zhou et al., 2004). In these applications, people purposefully tell lies to receive an extrinsic payoff. In our study, we compare varying types of unreliable news sources, created with differing intents and levels of veracity.

Fact-Checking and Fake News There is research in political science exploring how effective fact-checking is at improving people's awareness (Lord et al., 1979; Thorson, 2016; Nyhan and Reifler, 2015). Prior computational works (Vlachos and Riedel, 2014; Ciampaglia et al., 2015) have proposed fact-checking through entailment from knowledge bases. Our work takes a more linguistic approach, performing lexical analysis over varying types of falsehood.

Biyani et al. (2016) examine the unique linguistic styles found in clickbait articles, and Kumar et al. (2016) also characterize hoax documents on Wikipedia. The differentiation between these fake news types was also proposed in previous work (Rubin et al., 2015). Our paper extends this work by offering a quantitative study of linguistic differences found in articles of different types of fake news, and by building predictive models for graded deception across multiple domains: PolitiFact and news articles. More recent work (Wang, 2017) has also investigated PolitiFact data, though they investigated meta-data features for prediction whereas our investigation is focused on linguistic analysis through stylistic lexicons.

5 Conclusion

We examine truthfulness and its contributing linguistic attributes across multiple domains, e.g., online news sources and public statements. We perform multiple prediction tasks on fact-checked statements of varying levels of truth (graded deception) as well as a deeper linguistic comparison of differing types of fake news, e.g., propaganda, satire, and hoaxes. We have shown that fact-checking is indeed a challenging task, but that various lexical features can contribute to our understanding of the differences between more reliable and less reliable digital news sources.

6 Acknowledgements

We would like to thank the anonymous reviewers for providing insightful feedback. The research described in this paper was conducted under the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy; the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1256082; in part by NSF grants IIS-1408287 and IIS-1714566; and gifts by Google and Facebook.

References

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media.

Prakhar Biyani, Kostas Tsioutsiouliklis, and John Blackmer. 2016. "8 amazing secrets for getting more clicks": Detecting clickbaits in news streams using article informality. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, pages 94–100.

David B. Buller and Judee K. Burgoon. 1996. Interpersonal deception theory. Communication Theory 6(3):203–242. https://doi.org/10.1111/j.1468-2885.1996.tb00127.x.

Eunsol Choi, Chenhao Tan, Lillian Lee, Cristian Danescu-Niculescu-Mizil, and Jennifer Spindel. 2012. Hedge detection as a lens on framing in the GMO debates: A position paper. In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics. Association for Computational Linguistics, pages 70–79.

Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational fact checking from knowledge networks. PLOS ONE 10(6):e0128193. https://doi.org/10.1371/journal.pone.0128193.

Kate Connolly, Angelique Chrisafis, Poppy McPherson, Stephanie Kirchgaessner, Benjamin Haas, Dominic Phillips, Elle Hunt, and Michael Safi. 2016. Fake news: An insidious trend that's fast becoming a global problem. https://www.theguardian.com/media/2016/dec/02/fake-news-facebook-us-election-around-the-world. Accessed: 2017-01-30.

Codruta Girlea, Roxana Girju, and Eyal Amir. 2016. Psycholinguistic features for deceptive role detection in werewolf. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pages 417–422.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Ken Hyland. 2015. Metadiscourse. In The International Encyclopedia of Language and Social Interaction. John Wiley & Sons, Inc., pages 1–11.

Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, pages 219–230.

Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, pages 591–602.

Charles G. Lord, Lee Ross, and Mark R. Lepper. 1979. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology 37(11):2098–2109.

Rada Mihalcea and Carlo Strapparava. 2009. The lie detector: Explorations in the automatic recognition of deceptive language. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, pages 309–312.

Matthew L. Newman, James W. Pennebaker, Diane S. Berry, and Jane M. Richards. 2003. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin 29(5):665–675.

Brendan Nyhan and Jason Reifler. 2015. The effect of fact-checking on elites: A field experiment on US state legislators. American Journal of Political Science 59(3):628–640.

Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pages 309–319.

James W. Pennebaker, Roger J. Booth, Ryan L. Boyd, and Martha E. Francis. 2015. Linguistic Inquiry and Word Count: LIWC2015. Pennebaker Conglomerates, Austin, TX. www.liwc.net.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pages 1532–1543.

Kathryn Perrott. 2016. 'Fake news' on social media influenced US election voters, experts say. http://www.abc.net.au/news/2016-11-14/fake-news-would-have-influenced-us-election-experts-say/8024660. Accessed: 2017-01-30.

Paul Rayson, Andrew Wilson, and Geoffrey Leech. 2001. Grammatical word class variation within the British National Corpus sampler. Language and Computers 36(1):295–306.

Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 1650–1659.

Victoria L. Rubin, Yimin Chen, and Niall J. Conroy. 2015. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology 52(1):1–4.

Emily Thorson. 2016. Belief echoes: The persistent effects of corrected misinformation. Political Communication 33(3):460–480.

Andreas Vlachos and Sebastian Riedel. 2014. Fact checking: Task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science. Association for Computational Linguistics, pages 18–22.

William Yang Wang. 2017. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Proceedings of the Association for Computational Linguistics Short Papers. Association for Computational Linguistics.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 347–354.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657.

Lina Zhou, Judee K. Burgoon, Jay F. Nunamaker, and Doug Twitchell. 2004. Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications. Group Decision and Negotiation 13(1):81–106.
