That Is a Known Lie: Detecting Previously Fact-Checked Claims
arXiv:2005.06058v1 [cs.CL] 12 May 2020

Shaden Shaar1, Nikolay Babulkov2, Giovanni Da San Martino1, Preslav Nakov1
1Qatar Computing Research Institute, HBKU, Doha, Qatar
2Sofia University, Sofia, Bulgaria
{sshar, gmartino, [email protected]
[email protected]

Abstract

The recent proliferation of "fake news" has triggered a number of responses, most notably the emergence of several manual fact-checking initiatives. As a result, over time, a large number of fact-checked claims have accumulated, which increases the likelihood that a new claim in social media or a new statement by a politician has already been fact-checked by some trusted fact-checking organization: viral claims often come back after a while in social media, and politicians like to repeat their favorite statements, true or false, over and over again. As manual fact-checking is very time-consuming (and fully automatic fact-checking has credibility issues), it is important to try to save this effort and to avoid wasting time on claims that have already been fact-checked. Interestingly, despite the importance of the task, it has been largely ignored by the research community so far. Here, we aim to bridge this gap. In particular, we formulate the task, and we discuss how it relates to, but also differs from, previous work. We further create a specialized dataset, which we release to the research community. Finally, we present learning-to-rank experiments that demonstrate sizable improvements over state-of-the-art retrieval and textual similarity approaches.

1 Introduction

The year 2016 was marked by massive disinformation campaigns related to Brexit and the US Presidential Elections. While false statements are not a new phenomenon, e.g., the yellow press and tabloids have been around for decades, this time things were notably different in terms of scale and effectiveness thanks to social media platforms, which provided both a medium to reach millions of users and an easy way to micro-target specific narrow groups of voters based on precise geographical, demographic, psychological, and/or political profiling.

Governments, international organizations, tech companies, media, journalists, and regular users launched a number of initiatives to limit the impact of the newly emerging large-scale weaponization of disinformation[1] online. Notably, this included manual fact-checking initiatives, which aimed at debunking various false claims, with the hope of limiting their impact, but also of educating the public that not all claims online are true.

[1] In the public discourse, the problem is generally known as "fake news", a term that was declared Word of the Year 2017 by Collins dictionary. Despite its popularity, it remains a confusing term, with no generally agreed upon definition. It is also misleading, as it puts emphasis on (a) the claim being false, while generally ignoring (b) its intention to do harm. In contrast, the term disinformation covers both aspects (a) and (b), and it is generally preferred at the EU level.

Over time, the number of such initiatives grew substantially, e.g., at the time of writing, the Duke Reporters' Lab lists 237 active fact-checking organizations plus another 92 inactive ones.[2] While some organizations have debunked just a couple of hundred claims, others such as PolitiFact,[3] FactCheck.org,[4] Snopes,[5] and Full Fact[6] have fact-checked thousands or even tens of thousands of claims.

[2] http://reporterslab.org/fact-checking/
[3] http://www.politifact.com/
[4] http://www.factcheck.org/
[5] http://www.snopes.com/
[6] http://fullfact.org/

The value of these collections has been recognized in the research community: they have been used to train systems to perform automatic fact-checking (Popat et al., 2017; Wang, 2017; Zlatkova et al., 2019) or to detect check-worthy claims in political debates (Hassan et al., 2015; Gencheva et al., 2017; Patwari et al., 2017; Vasileva et al., 2019). There have also been datasets that combine claims from multiple fact-checking organizations (Augenstein et al., 2019), again with the aim of performing automatic fact-checking.

It has been argued that checking against a database of previously fact-checked claims should be an integral step of an end-to-end automated fact-checking pipeline (Hassan et al., 2017). This is illustrated in Figure 1, which shows the general steps of such a pipeline (Elsayed et al., 2019): (i) assess the check-worthiness of the claim (which could come from social media, from a political debate, etc.), (ii) check whether a similar claim has been previously fact-checked (the task we focus on here), (iii) retrieve evidence (from the Web, from social media, from Wikipedia, from a knowledge base, etc.), and (iv) assess the factuality of the claim.

[Figure 1: A general information verification pipeline.]
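To make the shape of this pipeline concrete, here is a minimal Python sketch of the four steps. This is our illustration, not code from the paper: every function name is a hypothetical placeholder, and the stub bodies are toy stand-ins for real models.

```python
from typing import List, Optional

# All names below are hypothetical placeholders; the bodies are toy stubs.

def is_check_worthy(claim: str) -> bool:
    # (i) check-worthiness: a real system would use a trained classifier
    return len(claim.split()) > 3

def find_previously_checked(claim: str, db: List[str]) -> Optional[str]:
    # (ii) the task of this paper; naive exact match as a placeholder
    return claim if claim in db else None

def retrieve_evidence(claim: str) -> List[str]:
    # (iii) would query the Web, social media, Wikipedia, a knowledge base, ...
    return []

def assess_factuality(claim: str, evidence: List[str]) -> str:
    # (iv) would score the claim against the retrieved evidence
    return "unverified" if not evidence else "needs assessment"

def verify(claim: str, fact_check_db: List[str]) -> str:
    if not is_check_worthy(claim):
        return "not check-worthy"
    if find_previously_checked(claim, fact_check_db) is not None:
        return "already fact-checked"  # no need to fact-check it again
    return assess_factuality(claim, retrieve_evidence(claim))

db = ["Crime in the city has doubled in the last year."]
print(verify("Crime in the city has doubled in the last year.", db))
# -> already fact-checked
```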
From a fact-checker's point of view, the abundance of previously fact-checked claims increases the likelihood that the next claim that needs to be checked has already been fact-checked by some trusted organization. Indeed, viral claims often come back after a while in social media, and politicians are known to repeat the same claims over and over again.[7] Thus, before spending hours fact-checking a claim manually, it is worth first making sure that nobody has done it already.

[7] President Trump has repeated one claim over 80 times: http://tinyurl.com/yblcb5q5

On another point, manual fact-checking often comes too late. One study has shown that "fake news" spreads six times faster than real news (Vosoughi et al., 2018). Another study has indicated that over 50% of the spread of some viral claims happens within the first ten minutes of their posting on social media (Zaman et al., 2014). At the same time, detecting that a new viral claim has already been fact-checked can be done automatically and very quickly, thus allowing for a timely reaction that can limit the spread and the potential malicious impact.

From a journalistic perspective, the ability to quickly check whether a claim has been previously fact-checked could be revolutionary, as it would allow putting politicians on the spot in real time, e.g., during a live interview. In such a scenario, automatic fact-checking would be of limited utility because, given the current state of the technology, it does not offer enough credibility in the eyes of a journalist.

Interestingly, despite the importance of the task of detecting whether a claim has been fact-checked in the past, it has been largely ignored by the research community. Here, we aim to bridge this gap. Our contributions can be summarized as follows:

• We formulate the task, and we discuss how it relates to, but also differs from, previous work.

• We create a specialized dataset, which we release to the research community.[8] Unlike previous work on fact-checking, which used normalized claims from fact-checking datasets, we work with naturally occurring claims, e.g., from debates or from social media.

• We propose a learning-to-rank model that achieves sizable improvements over state-of-the-art retrieval and textual similarity models.

[8] Data and code are available at https://github.com/sshaar/That-is-a-Known-Lie

The remainder of this paper is organized as follows: Section 2 discusses related work, Section 3 introduces the task, Section 4 presents the dataset, Section 5 discusses the evaluation measures, Section 6 presents the models we experiment with, Section 7 describes our experiments, and Section 8 concludes and discusses future work.

2 Related Work

To the best of our knowledge, the task of detecting whether a claim has been previously fact-checked has not been addressed before. Hassan et al. (2017) mentioned it as an integral step of their end-to-end automated fact-checking pipeline, but they provided very little detail about this component, and they did not evaluate it.

In an industrial setting, Google has developed the Fact Check Explorer,[9] an exploration tool that allows users to search a number of fact-checking websites (those that use ClaimReview markup from schema.org[10]) for mentions of a topic, a person, etc. However, the tool cannot handle a complex claim, as it runs Google Search, which is not optimized for semantic matching of long claims. While this might change in the future, as there have been reports that Google has started using BERT in its search, at the time of writing the tool could not handle a long claim as input.

[9] http://toolbox.google.com/factcheck/explorer
[10] http://schema.org/ClaimReview
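Since ClaimReview is central to how such tools find fact-checks, a small example may help. The sketch below shows typical ClaimReview fields as a Python dict serialized to JSON-LD; the concrete values (URL, organization, claim, rating) are invented for illustration.

```python
import json

# A made-up instance of schema.org ClaimReview markup; fact-checking pages
# embed such JSON-LD so that tools like Fact Check Explorer can index them.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example-factchecker.org/reviews/12345",  # fictitious
    "claimReviewed": "Crime in the city has doubled in the last year.",
    "datePublished": "2020-05-12",
    "author": {"@type": "Organization", "name": "Example Fact-Checker"},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": "1",
        "bestRating": "5",
        "alternateName": "False",  # the human-readable verdict
    },
}

# The string a page would place inside <script type="application/ld+json">:
print(json.dumps(claim_review, indent=2))
```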
A very similar work is the ClaimsKG dataset and system (Tchechmedjiev et al., 2019), which includes 28K claims from multiple sources, organized into a knowledge graph (KG). The system can perform data exploration, e.g., it can find all claims that contain a certain named entity or keyphrase. In contrast, we are interested in detecting whether a claim has been previously fact-checked.

Other work has focused on creating datasets of textual fact-checked claims, without building knowledge graphs. Some of the larger ones include the Liar, Liar dataset of 12.8K claims from PolitiFact (Wang, 2017), the MultiFC dataset of 38K claims from 26 fact-checking organizations (Augenstein et al., 2019), and the 10K-claim Truth of Varying Shades dataset (Rashkin et al., 2017), among several other datasets, all of which were used for automatic fact-checking.

While our main contribution here is the new task and the new dataset, we should also mention some work on document retrieval. In our experiments, we perform retrieval using BM25 (Robertson and Zaragoza, 2009) and re-ranking using BERT-based similarity, which is a common strategy in recent state-of-the-art retrieval models (Akkalyoncu Yilmaz et al., 2019a; Nogueira and Cho, 2019; Akkalyoncu Yilmaz et al., 2019b).

Our approach is most similar to that of Akkalyoncu Yilmaz et al. (2019a), but we differ in that we perform matching, both with BM25 and with BERT, against the normalized claim, against the title, and against the full text of the articles in the fact-checking dataset; we also use both the scores and the reciprocal ranks when combining the different scores and rankings.
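As a rough illustration of this retrieve-then-re-rank strategy, here is a minimal sketch using the rank_bm25 package for BM25 and the sentence-transformers package for BERT-based similarity. Both libraries and the specific model name are our example choices, not necessarily the authors' exact setup, and a real system would also index article titles and full text, not just the normalized claims.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25
from sentence_transformers import SentenceTransformer, util

# Toy database of previously fact-checked (normalized) claims.
verified_claims = [
    "Crime in the city has doubled in the last year.",
    "The unemployment rate is at a fifty-year low.",
    "Vaccines cause autism in children.",
]

query = "He claimed that crime went up two times in twelve months."

# Stage 1: BM25 retrieves an initial candidate list.
bm25 = BM25Okapi([c.lower().split() for c in verified_claims])
bm25_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(verified_claims)),
                    key=lambda i: -bm25_scores[i])[:2]  # keep the top 2

# Stage 2: a BERT-based sentence encoder re-ranks the candidates by
# cosine similarity to the query ("all-MiniLM-L6-v2" is just an example).
model = SentenceTransformer("all-MiniLM-L6-v2")
q_emb = model.encode(query, convert_to_tensor=True)
c_emb = model.encode([verified_claims[i] for i in candidates],
                     convert_to_tensor=True)
sims = util.cos_sim(q_emb, c_emb)[0]

for idx, sim in sorted(zip(candidates, sims.tolist()), key=lambda x: -x[1]):
    print(f"{sim:.3f}  {verified_claims[idx]}")
```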
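The paper combines such signals inside a learning-to-rank model; as a simpler, unsupervised illustration of why reciprocal ranks are convenient when merging rankings produced against different fields (claim, title, full text), here is a reciprocal-rank-fusion sketch. The field names, candidate IDs, and the constant k = 60 are all illustrative choices, not values from the paper.

```python
from collections import defaultdict

# Hypothetical per-field rankings of candidate IDs (best match first),
# e.g., from matching the input claim against different article fields.
rankings = {
    "claim":     ["c17", "c03", "c42"],
    "title":     ["c03", "c17", "c99"],
    "full_text": ["c17", "c99", "c03"],
}

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings: score(d) = sum over lists of 1 / (k + rank(d))."""
    fused = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: -item[1])

for doc_id, score in reciprocal_rank_fusion(rankings):
    print(f"{doc_id}: {score:.4f}")
# c17 comes out on top: it is near the top of all three per-field rankings.
```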