
Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking

Hannah Rashkin† Eunsol Choi† Jin Yea Jang‡ Svitlana Volkova‡ Yejin Choi†
†Paul G. Allen School of Computer Science & Engineering, University of Washington
{hrashkin,eunsol,yejin}@cs.washington.edu
‡Data Sciences and Analytics, Pacific Northwest National Laboratory
{jinyea.jang,svitlana.volkova}@pnnl.gov

Abstract

We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains to be an open research question, stylistic cues can help determine the truthfulness of text.

Figure 1: Example statements rated by PolitiFact as mostly true and mostly false. Misleading phrasing (bolded in green in the original figure) was one reason for the in-between ratings. The figure places each statement on a True-to-False scale:
• "There are already more American jobs in the solar industry than in coal mining." Rated True by PolitiFact (May 2014)
• "You cannot get ebola from just riding on a plane or a bus." Rated Mostly True by PolitiFact (Oct. 2014)
• "Google search spike suggests many people don't know why they voted for Brexit." Rated Mostly False by PolitiFact (June 2016)
• "By declaring that Pluto was no longer a planet, the (International Astronomical Union) put into place a planetary definition that would have even declassified Earth as a planet if it existed as far from the sun as Pluto does." Rated Half True by PunditFact (July 2015)

1 Introduction

Words in news media and political discourse have considerable power in shaping people's beliefs and opinions. As a result, their truthfulness is often compromised to maximize impact. Recently, fake news has captured worldwide interest, and the number of organized efforts dedicated solely to fact-checking has almost tripled since 2014.1 Organizations such as PolitiFact.com actively investigate and rate the veracity of comments made by public figures, journalists, and organizations.

Figure 1 shows example quotes rated for truthfulness by PolitiFact. Per their analysis, one component of the mostly true and mostly false statements' ratings is the misleading phrasing (bolded in green in the figure). For instance, the Ebola statement is true as stated, though only because the speaker hedged their meaning with the quantifier just. In the Brexit example, two correlated events – Brexit and Google search trends – are presented ambiguously as if they were directly linked.

Importantly, like the above examples, most fact-checked statements on PolitiFact are rated as neither entirely true nor entirely false. Analysis indicates that falsehoods often arise from subtle differences in phrasing rather than outright fabrication (Rubin et al., 2015). Compared to most prior work in the deception literature, which focused on binary categorization of truth and deception, political fact-checking poses a new challenge as it involves a graded notion of truthfulness.

While political fact-checking generally focuses on examining the accuracy of a single quoted statement by a public figure, the reliability of general news stories is also a concern (Connolly et al., 2016; Perrott, 2016). Figure 2 illustrates news types categorized along two dimensions: the intent of the authors (desire to deceive) and the content of the articles (true, mixed, false).

1https://www.poynter.org/2017/there-are-now-114-fact-checking-initiatives-in-47-countries/450477/

2 Fake News Analysis

News Corpus with Varying Reliability To analyze linguistic patterns across different types of articles, we sampled standard trusted news articles from the English Gigaword corpus and crawled articles from seven different unreliable news sites of differing types.2 Table 1 displays sources identified under each type according to US News & World Report.3 These news types include:

• Satire: mimics real news but still cues the reader that it is not meant to be taken seriously

• Hoax: convinces readers of the validity of a paranoia-fueled story

• Propaganda: misleads readers so that they believe a particular political/social agenda

Unlike hoaxes and propaganda, satire is intended to be notably different from real news so that audiences will recognize the humorous intent. Hoaxes are more likely to invent stories, while propaganda frequently combines truths, falsehoods, and ambiguities to confound readers.

2All resources created for this paper, including the corpus of news articles from unreliable sources, the collection of PolitiFact ratings, and the compiled Wiktionary lexicons, have been made publicly available at homes.cs.washington.edu/~hrashkin/factcheck.html
3www.usnews.com/news/national-news/articles/2016-11-14/avoid-these-fake-news-sites-at-all-costs

Figure 2: Types of news articles categorized based on their intent (no desire to deceive vs. to deceive) and information quality, spanning trusted news, satire, hoax, and propaganda.

News Type  | Source              | # of Docs | # Tokens per Doc.
Trusted    | Gigaword News       | 13,995    | 541
Satire     | The Onion           | 14,170    | 350
Satire     | The Borowitz Report | 627       | 250
Satire     | Clickhole           | 188       | 303
Hoax       | American News       | 6,914     | 204
Hoax       | DC Gazette          | 5,133     | 582
Propaganda | The Natural News    | 15,580    | 857
Propaganda | Activist Report     | 17,869    | 1,169

Table 1: News articles used for analysis in Section 2.

To characterize differences between these news types, we applied various lexical resources to trusted and fake news articles. We draw lexical resources from prior work in communication theory and stylistic analysis in computational linguistics. We tokenize the text with NLTK (Bird et al., 2009), compute per-document counts for each lexicon, and report averages per article of each type.

First among these lexicons is the Linguistic Inquiry and Word Count (LIWC), a lexicon widely used in social science studies (Pennebaker et al., 2015). In addition, we estimate the use of strongly and weakly subjective words with a sentiment lexicon (Wilson et al., 2005). Subjective words can be used to dramatize or sensationalize a news story. We also use lexicons for hedging from Hyland (2015), because hedging can indicate vague, obscuring language. Lastly, we introduce intensifying lexicons that we crawled from Wiktionary based on the hypothesis that fake news articles try to enliven stories to attract readers. We compiled five lists from Wiktionary of words that imply a degree of dramatization (comparatives, superlatives, action adverbs, manner adverbs, and modal adverbs) and measured their presence.

Discussion Table 2 summarizes the ratio of averages between unreliable news and truthful news for a handful of the measured features. Ratios greater than one denote features more prominent in fake news, and ratios less than one denote features more prominent in truthful news. All reported unreliable/reliable news ratios are statistically significant (p < 0.01, Welch t-test) after Bonferroni correction.
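As a concrete reading of the ratio computation described above, the following sketch computes per-document lexicon rates and the unreliable/trusted ratio with a Bonferroni-corrected Welch t-test. The names `lexicon`, `unreliable_docs`, and `trusted_docs` are hypothetical placeholders (a set of lowercase lexicon words and two lists of NLTK-tokenized articles); none of them come from the paper's released resources.

```python
import numpy as np
from scipy import stats

def doc_rate(tokens, lexicon):
    """Fraction of a document's tokens that fall inside the lexicon."""
    return sum(tok.lower() in lexicon for tok in tokens) / max(len(tokens), 1)

def lexicon_ratio(unreliable_docs, trusted_docs, lexicon, n_lexicons=1):
    """Ratio of average per-document rates, plus a corrected p-value."""
    unreliable = np.array([doc_rate(d, lexicon) for d in unreliable_docs])
    trusted = np.array([doc_rate(d, lexicon) for d in trusted_docs])
    # Welch's t-test (unequal variances); Bonferroni-correct by the
    # number of lexicons tested.
    _, p = stats.ttest_ind(unreliable, trusted, equal_var=False)
    return unreliable.mean() / trusted.mean(), min(p * n_lexicons, 1.0)
```

A ratio above 1.0 would indicate a feature more prominent in the unreliable articles, matching how Table 2 below is read.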

LEXICON MARKERS       | RATIO | MAX | SOURCE          | EXAMPLE TEXT
Swear (LIWC)          | 7.00  | S   | Borowitz Report | ... Ms. Rand, who has been damned to eternal torment ...
2nd pers (You)        | 6.73  | P   | DC Gazette      | You would instinctively justify and rationalize your ...
Modal Adverb          | 2.63  | S   | American News   | ... investigation of Hillary Clinton was inevitably linked ...
Action Adverb         | 2.18  | S   | Activist News   | ... if one foolishly assumes the US State Department ...
1st pers singular (I) | 2.06  | S   | Activist Post   | I think its against the law of the land to finance riots ...
Manner Adverb         | 1.87  | S   | Natural News    | ... consequences of deliberately engineering extinction.
Sexual (LIWC)         | 1.80  | S   | The Onion       | ... added that his daughter better not be pregnant.
See (LIWC)            | 1.52  | H   | Clickhole       | New Yorkers ... can bask in the beautiful image ...
Negation (LIWC)       | 1.51  | H   | American News   | There is nothing that outrages liberals more than ...
Strong subjective     | 1.51  | H   | Clickhole       | He has one of the most brilliant minds in basketball.
Hedge (Hyland, 2015)  | 1.19  | H   | DC Gazette      | As the Communist Party USA website claims ...
Superlatives          | 1.17  | P   | Activist News   | Fresh water is the single most important natural resource ...
Weak subjective       | 1.13  | P   | American News   | ... he made that very clear in his response to her.
Number (LIWC)         | 0.43  | S   | Xinhua News     | ... 7 million foreign tourists coming to the country in 2010 ...
Hear (LIWC)           | 0.50  | S   | AFP             | The prime minister also spoke about the commission ...
Money (LIWC)          | 0.57  | P   | NYTimes         | He has proposed to lift the state sales tax on groceries ...
Assertive             | 0.84  | P   | NYTimes         | Hofstra has guaranteed scholarships to the current players.
Comparatives          | 0.86  | P   | Assoc. Press    | ... from fossil fuels to greener sources of energy ...

Table 2: Linguistic features and their relationship with fake news. The ratio refers to how frequently a feature appears in fake news articles compared to trusted ones. We list linguistic phenomena more pronounced in fake news first, and then those that appear less in fake news. Examples illustrate sample texts from news articles containing the lexicon words. All reported ratios are statistically significant. The last column (MAX) lists which type of fake news used words from that lexicon most prominently (P = propaganda, S = satire, H = hoax).

Our results show that first-person and second-person pronouns are used more in less reliable or deceptive news types. This contrasts with studies in other domains (Newman et al., 2003), which found fewer self-references in people telling lies about their personal opinions. Unlike those domains, news writers are trying to appear indifferent. Editors at trustworthy sources are possibly more rigorous about removing language that seems too personal, which is one reason why this result differs from other lie detection domains. This finding instead corroborates previous work in written domains by Ott et al. (2011) and Rayson et al. (2001), who found that such pronouns were indicative of imaginative writing. Perhaps imaginative storytelling is a closer match to detecting unreliable news than lie detection on opinions.

Our results also show that words that can be used to exaggerate – subjectives, superlatives, and modal adverbs – are all used more by fake news. Words used to offer concrete figures – comparatives, money, and numbers – appear more in truthful news. This also builds on previous findings by Ott et al. (2011) on the difference between superlative/comparative usage.

Trusted sources are more likely to use assertive words and less likely to use hedging words, indicating that they are also less vague about describing events. This relates to psychology theories (Buller and Burgoon, 1996) that deceivers show more "uncertainty and vagueness" and "indirect forms of expression". Similarly, the trusted sources use the hear category words more often, possibly indicating that they are citing primary sources more often.

The last column in Table 2 shows the fake news type that uses the corresponding lexicon most prominently. We found that one distinctive feature of satire compared to other types of untrusted news is its prominent use of adverbs. Hoax stories tend to use fewer superlatives and comparatives. In contrast, compared to other types of fake news, propaganda uses relatively more assertive verbs and superlatives.

News Reliability Prediction We study the feasibility of predicting the reliability of a news article, classifying it into four categories: trusted, satire, hoax, or propaganda. We split our collected articles into balanced training (20k total articles from The Onion, American News, The Activist, and the Gigaword news excluding 'APW' and 'WPB' sources) and test sets (3k articles from the remaining sources). Because articles in the training and test set come from different sources, the models must classify articles without relying on author-specific cues. We also use 20% of the training articles as an in-domain development set. We trained a Max-Entropy classifier with L2 regularization on n-gram tf-idf feature vectors (up to trigrams).4

The model achieves F1 scores of 65% on the out-of-domain test set (Table 3). This is a promising result as it is much higher than random, but it still leaves room for improvement compared to the performance on the development set consisting of articles from in-domain sources.

4N-gram tf-idf vectors have acted as a competitive means of cross-domain text classification. Zhang et al. (2015) found that for data sets smaller than a million examples, this was the best model, outperforming neural models.
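The paper describes the classifier setup but not its implementation; a minimal sketch with scikit-learn (one plausible rendering, not the authors' code) is below. `train_texts`, `train_labels`, `test_texts`, and `test_labels` are hypothetical stand-ins for the article splits described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# 4-way reliability classifier: trusted / satire / hoax / propaganda.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),              # n-grams up to trigrams
    LogisticRegression(penalty="l2", max_iter=1000),  # max-entropy with L2
)
model.fit(train_texts, train_labels)
pred = model.predict(test_texts)
print(f1_score(test_labels, pred, average="macro"))   # out-of-domain F1
```

Because the held-out articles come from unseen sources, the tf-idf vocabulary learned on the training sources is the only signal available at test time, which is what makes the 65% out-of-domain F1 a meaningful cross-source result.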

Data | Sources       | Random | MaxEnt
Dev  | in-domain     | 0.26   | 0.91
Test | out-of-domain | 0.26   | 0.65

Table 3: F1 scores of 4-way classification of news reliability.

We examined the 50 highest weighted n-gram features in the MaxEnt classifier for each class. The highest weighted n-grams for trusted news were often specific places (e.g., "washington") or times ("on monday"). Many of the highest weighted features from satire were vaguely facetious hearsay ("reportedly", "confirmed"). For hoax articles, heavily weighted features included divisive topics ("liberals", "trump") and dramatic cues ("breaking"). Heavily weighted features for propaganda tend towards abstract generalities ("truth", "freedom") as well as specific issues ("vaccines", "syria"). Interestingly, "youtube" and "video" are highly weighted for the propaganda and hoax classes respectively, indicating that they often rely on video clips as sources.

3 Predicting Truthfulness

Politifact Data Related to the issue of identifying the truthfulness of a news article is the fact-checking of individual statements made by public figures. Misleading statements can also have a variety of intents and levels of reliability depending on who is making the statement.

PolitiFact5 is a site led by journalists who actively fact-check suspicious statements. One unique quality of PolitiFact is that each quote is evaluated on a 6-point scale of truthfulness ranging from "True" (factual) to "Pants-on-Fire False" (absurdly false). This scale allows for distinction between categories like mostly true (the facts are correct but presented in an incomplete manner) or mostly false (the facts are not correct but are connected to a small kernel of truth).

We collected labelled statements from PolitiFact and its spin-off sites (PunditFact, etc.), 10,483 statements in total. We analyze the subset of 4,366 statements that are direct quotes by the original speaker. The distribution of ratings on the PolitiFact scale for this subset is shown in Table 4. Most statements are labeled as neither completely true nor false.

        | True | Mostly True | Half True | Mostly False | False | Pants-on-fire
6-class | 20%  | 21%         | 21%       | 14%          | 17%   | 7%
2-class | 62% (more true)                | 38% (more false)

Table 4: PolitiFact label distribution. PolitiFact uses a 6-point scale ranging from True, Mostly True, and Half-true down to Mostly False, False, and Pants-on-fire False.

We formulate a fine-grained truthfulness prediction task with the Politifact data. We split quotes into training/development/test sets of {2575, 712, 1074} statements, respectively, so that all of each speaker's quotes are in a single set. Given a statement, the model returns a rating for how reliable the statement is (Politifact ratings are used as gold labels). We ran the experiment in two settings, one considering all 6 classes and the other considering only 2 (treating the top three truthful ratings as true and the lower three as false).

5www.politifact.com/
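A minimal sketch of the speaker-grouped split and the 2-class collapse described above, assuming a hypothetical `quotes` list of (speaker, statement, rating) triples; scikit-learn's GroupShuffleSplit stands in for whatever grouping procedure the authors actually used, and the rating strings are illustrative labels, not an official PolitiFact schema.

```python
from sklearn.model_selection import GroupShuffleSplit

# The 6-point scale, ordered from least to most truthful.
RATINGS = ["pants-fire", "false", "mostly-false",
           "half-true", "mostly-true", "true"]

def binarize(rating):
    """Collapse to 2 classes: top three ratings -> True, lower three -> False."""
    return RATINGS.index(rating) >= 3

speakers = [speaker for speaker, _, _ in quotes]
splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, heldout_idx = next(splitter.split(quotes, groups=speakers))
# A second grouped split of heldout_idx would yield the development and
# test portions, keeping each speaker's quotes within a single set.
```

Grouping by speaker matters here for the same reason as the source-disjoint split in Section 2: it prevents the model from memorizing an individual's style instead of learning truthfulness cues.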
Model We trained an LSTM model (Hochreiter and Schmidhuber, 1997) that takes the sequence of words as the input and predicts the Politifact rating. We also compared this model with Maximum Entropy (MaxEnt) and Naive Bayes models, frequently used for text categorization.

For input to the MaxEnt and Naive Bayes models, we tried two variants: one with the word tf-idf vectors as input, and one with the LIWC measurements concatenated to the tf-idf vectors. For the LSTM model, we used word sequences as input and also a version where the LSTM output is concatenated with LIWC feature vectors before undergoing the activation layer. The LSTM word embeddings are initialized with 100-dim embeddings from GloVe (Pennington et al., 2014) and fine-tuned during training. The LSTM was implemented with Theano and Keras with a 300-dim hidden state and a batch size of 64. Training was done with ADAM to minimize categorical cross-entropy loss over 10 epochs.
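A Keras sketch of the text-only LSTM variant is below (Keras 2-style; newer Keras versions replace the `weights` argument with an embeddings initializer). `emb_matrix` (a vocab_size x 100 array with rows taken from GloVe), `X_train` (padded word-index sequences), and `y_train` (one-hot 6-class ratings) are hypothetical placeholders; the "text + LIWC" variant would instead use the functional API to concatenate LIWC vectors with the LSTM output before the softmax.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding

vocab_size, emb_dim = emb_matrix.shape   # emb_dim = 100 in the paper

model = Sequential([
    # GloVe-initialized word embeddings, fine-tuned during training.
    Embedding(vocab_size, emb_dim, weights=[emb_matrix], trainable=True),
    LSTM(300),                            # 300-dim hidden state
    Dense(6, activation="softmax"),       # one unit per truthfulness rating
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X_train, y_train, batch_size=64, epochs=10)
```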

Classifier Results Table 5 summarizes the performance on the development set. We report macro-averaged F1 scores in all tables. The LSTM outperforms the other models when only using text as input; however, the other two models improve substantially when adding LIWC features, particularly in the case of the multinomial Naive Bayes model. In contrast, the LIWC features do not improve the neural model much, indicating that some of this lexical information is perhaps redundant to what the model was already learning from text.

MODEL             | 2-CLASS text | +LIWC | 6-CLASS text | +LIWC
Majority Baseline | .39          | -     | .06          | -
Naive Bayes       | .44          | .58   | .16          | .21
MaxEnt            | .55          | .58   | .20          | .21
LSTM              | .58          | .57   | .21          | .22

Table 5: Model performance on the Politifact validation set.

We report results on the test set in Table 6. We again find that LIWC features improve the MaxEnt and NB models to perform similarly to the LSTM model. As in the dev set results, the LIWC features do not improve the LSTM's performance, and even seem to hurt it slightly.

MODEL             | FEATURE     | 2-CLASS | 6-CLASS
Majority Baseline |             | .39     | .06
Naive Bayes       | text + LIWC | .56     | .17
MaxEnt            | text + LIWC | .55     | .22
LSTM              | text + LIWC | .52     | .19
LSTM              | text        | .56     | .20

Table 6: Model performance on the Politifact test set.

4 Related Work

Deception Detection Psycholinguistic work in interpersonal deception theory (Buller and Burgoon, 1996) has postulated that certain speech patterns can be signs of a speaker trying to purposefully obscure the truth. Hedge words and other vague qualifiers (Choi et al., 2012; Recasens et al., 2013), for example, may add indirectness to a statement that obscures its meaning. Linguistic aspects of deception detection have been well studied in a variety of NLP applications (Ott et al., 2011; Mihalcea and Strapparava, 2009; Jindal and Liu, 2008; Girlea et al., 2016; Zhou et al., 2004). In these applications, people purposefully tell lies to receive an extrinsic payoff. In our study, we compare varying types of unreliable news sources, created with differing intents and levels of veracity.

Fact-Checking and Fake News There is research in political science exploring how effective fact-checking is at improving people's awareness (Lord et al., 1979; Thorson, 2016; Nyhan and Reifler, 2015). Prior computational works (Vlachos and Riedel, 2014; Ciampaglia et al., 2015) have proposed fact-checking through entailment from knowledge bases. Our work takes a more linguistic approach, performing lexical analysis over varying types of falsehood.

Biyani et al. (2016) examine the unique linguistic styles found in clickbait articles, and Kumar et al. (2016) also characterize hoax documents on Wikipedia. The differentiation between these fake news types was also proposed in previous work (Rubin et al., 2015). Our paper extends this work by offering a quantitative study of linguistic differences found in articles of different types of fake news, and by building predictive models for graded deception across multiple domains: PolitiFact and news articles. More recent work (Wang, 2017) has also investigated PolitiFact data, though they investigated meta-data features for prediction whereas our investigation is focused on linguistic analysis through stylistic lexicons.

5 Conclusion

We examine truthfulness and its contributing linguistic attributes across multiple domains, e.g., online news sources and public statements. We perform multiple prediction tasks on fact-checked statements of varying levels of truth (graded deception) as well as a deeper linguistic comparison of differing types of fake news, e.g., propaganda, satire, and hoaxes. We have shown that fact-checking is indeed a challenging task, but that various lexical features can contribute to our understanding of the differences between more reliable and less reliable digital news sources.

6 Acknowledgements

We would like to thank the anonymous reviewers for providing insightful feedback. The research described in this paper was conducted under the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy; the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1256082; in part by NSF grants IIS-1408287 and IIS-1714566; and gifts by Google and Facebook.

References

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media.

Prakhar Biyani, Kostas Tsioutsiouliklis, and John Blackmer. 2016. "8 amazing secrets for getting more clicks": Detecting clickbaits in news streams using article informality. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, pages 94–100.

David B. Buller and Judee K. Burgoon. 1996. Interpersonal deception theory. Communication Theory 6(3):203–242. https://doi.org/10.1111/j.1468-2885.1996.tb00127.x.

Eunsol Choi, Chenhao Tan, Lillian Lee, Cristian Danescu-Niculescu-Mizil, and Jennifer Spindel. 2012. Hedge detection as a lens on framing in the GMO debates: A position paper. In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics. Association for Computational Linguistics, pages 70–79.

Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational fact checking from knowledge networks. PLOS ONE 10(6):e0128193. https://doi.org/10.1371/journal.pone.0128193.

Kate Connolly, Angelique Chrisafis, Poppy McPherson, Stephanie Kirchgaessner, Benjamin Haas, Dominic Phillips, Elle Hunt, and Michael Safi. 2016. Fake news: An insidious trend that's fast becoming a global problem. https://www.theguardian.com/media/2016/dec/02/fake-news-facebook-us-election-around-the-world. Accessed: 2017-01-30.

Codruta Girlea, Roxana Girju, and Eyal Amir. 2016. Psycholinguistic features for deceptive role detection in werewolf. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pages 417–422.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Ken Hyland. 2015. Metadiscourse. In The International Encyclopedia of Language and Social Interaction. John Wiley & Sons, Inc., pages 1–11.

Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, pages 219–230.

Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, pages 591–602.

Charles G. Lord, Lee Ross, and Mark R. Lepper. 1979. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology 37(11):2098–2109.

Rada Mihalcea and Carlo Strapparava. 2009. The lie detector: Explorations in the automatic recognition of deceptive language. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, pages 309–312.

Matthew L. Newman, James W. Pennebaker, Diane S. Berry, and Jane M. Richards. 2003. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin 29(5):665–675.

Brendan Nyhan and Jason Reifler. 2015. The effect of fact-checking on elites: A field experiment on US state legislators. American Journal of Political Science 59(3):628–640.

Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pages 309–319.

James W. Pennebaker, Roger J. Booth, Ryan L. Boyd, and Martha E. Francis. 2015. Linguistic Inquiry and Word Count: LIWC2015. Pennebaker Conglomerates, Austin, TX. www.liwc.net.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pages 1532–1543.

Kathryn Perrott. 2016. 'Fake news' on social media influenced US election voters, experts say. http://www.abc.net.au/news/2016-11-14/fake-news-would-have-influenced-us-election-experts-say/8024660. Accessed: 2017-01-30.

Paul Rayson, Andrew Wilson, and Geoffrey Leech. 2001. Grammatical word class variation within the British National Corpus sampler. Language and Computers 36(1):295–306.

Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 1650–1659.

Victoria L. Rubin, Yimin Chen, and Niall J. Conroy. 2015. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology 52(1):1–4.

Emily Thorson. 2016. Belief echoes: The persistent effects of corrected misinformation. Political Communication 33(3):460–480.

Andreas Vlachos and Sebastian Riedel. 2014. Fact checking: Task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science. Association for Computational Linguistics, pages 18–22.

William Yang Wang. 2017. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Proceedings of the Association for Computational Linguistics Short Papers. Association for Computational Linguistics.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 347–354.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657.

Lina Zhou, Judee K. Burgoon, Jay F. Nunamaker, and Doug Twitchell. 2004. Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications. Group Decision and Negotiation 13(1):81–106.
