Fact Checking: Detecting and Verifying Facts


Bilal Ghanem
Universitat Politècnica de València
[email protected]

Abstract: With the uncontrolled increase of fake news, untruthful claims, and rumors over the web, different approaches have recently been proposed to address the problem. Distinguishing false claims from truthful ones may positively affect society in different aspects. In this paper we describe the motivations behind the fake news research topic in recent years, we present similar and related research topics, and we show our preliminary work and future plans.

Keywords: Fact checking, rumors, fake news, disinformation

Resumen: Due to their uncontrolled increase, approaches to the detection of rumors and of false news and claims have recently begun to be investigated and proposed. Distinguishing false statements from true ones can positively affect society in different aspects. In this article we describe the current motivation for research on the topic of fake news, we present similar and related topics, and we show our preliminary work and our future plans.

Palabras clave: Fact checking, rumors, fake news, disinformation

Lloret, E.; Saquete, E.; Martínez-Barco, P.; Moreno (eds.) Proceedings of the Doctoral Symposium of the XXXIV International Conference of the Spanish Society for Natural Language Processing (SEPLN 2018), p. 19–23, Sevilla, Spain, September 19th 2018. Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.

1 Fake News

Fake news has become an important topic, and the research community has grown interested in the need to identify it. Fake news has been defined as: "An inaccurate, sometimes sensationalistic report that is created to gain attention, mislead, deceive or damage a reputation" (https://whatis.techtarget.com/definition/fake-news, visited in May 2018). Unlike misinformation, in fake news the authors have a prior intention to pose a misleading statement, whereas in misinformation the authors simply have inaccurate or confused information about a specific topic. The spread of this phenomenon has increased massively on online sites. The openness of the web has increased the number of online news agencies, social media networks, and online blogs. These platforms allow certain organizations or individuals to pose fake news, because they offer several attractive factors, such as privacy, free access, availability, and a large audience. At the same time, a large number of untrusted news agencies have appeared. These sites can affect public opinion on specific issues for which they have political, social, or financial agendas.

In a recent approach (Nelson and Taneja, 2018), the authors investigated users' behavior when visiting these sites. They examined online visitation data across different Internet devices. During the 2016 US elections, they found that the number of devices that visited trusted news sites was 40 times larger than the number that visited fake news sites. This observation suggests the hypothesis that Internet users are aware of fake news and are able to discriminate it from other news. However, (Bond Jr and DePaulo, 2006) showed that humans can detect lies only 4% better than random chance. They also analyzed the metadata of more than 200 trusted sites and showed that a good reputation may contribute to attracting more visitors. This study revealed that users are not able to detect fake news and are affected by the good reputation of online sites.

Improving the search engine ranks of untrusted sites might make them more popular and, perhaps, perceived as truthful. Therefore, the administrators of fake news sites employ website spoofing or authentic news styling techniques to mimic the high reputation of authentic sites, in an attempt to make their own sites closely similar. The last US election raised a real fear about fake news in the American nation (Nelson and Taneja, 2018). The fast spreading of fake news motivated the owner of the Wikipedia encyclopedia to create a news site called WikiTribune (https://www.wikitribune.com, visited in May 2018) to promote evidence-based journalism.

2 Rumors

A similar research topic that has recently attracted the attention of the research community is rumor detection. Despite the different definitions of rumors in the literature, the most accepted one is: "Unverified and instrumentally relevant information statements in circulation" (DiFonzo and Bordia, 2007). Rumors and fake news are similar research topics, although they have been tackled independently. Unlike fake news, rumors may turn out to be true, false, or partly true; at the time of posting, their veracity is unknown. Rumors normally remain in circulation (e.g., being retweeted by users) until a trusted party uncovers or verifies the truth. Another difference has been shown by (Zubiaga et al., 2018): rumors cannot be classified only by their veracity type (true, false, half-true), but also by their credibility degree (high or low). Previously, (Allport and Postman, 1947) studied rumors from a psychological perspective. They were interested in answering why people spread rumors in their environment. At that time (1947), it was difficult and complex to find a clear answer, but in recent years renewed interest has led to one: people spread rumors when there is uncertainty, when they feel anxiety, and when the information is important.

According to (Zubiaga et al., 2018), rumors have been studied in the literature from different perspectives: rumor detection (rumor or not), tracking (detecting the social media posts that deal with a rumor), stance classification (how each user or post is oriented towards a rumor's veracity), and veracity prediction (true, false, or unverified). In the RumourEval shared task at SemEval-2017 (Derczynski et al., 2017), the organizers proposed two different subtasks: stance classification and veracity prediction. (Enayet and El-Beltagy, 2017) achieved the highest result in veracity prediction; they used the percentage of replying tweets, the existence of hashtags, and the existence of URLs as features for a classification model. Of the previous tasks, veracity prediction is the one most related to fake news detection.

3 Fact Checking in Presidential Debates

Fake news gained attention in the 2016 US presidential elections, where both Democrats and Republicans blamed each other for spreading false information. According to a public poll, many people believed after the election that fake news had affected its results. Even during the presidential debates, journalists found that there were many false claims between candidates, whose goal was to weaken the contenders or to build them a bad reputation. These false claims could have a real effect on the election results. In particular, presidential debates are long by nature, and detecting false claims may take more time than is needed to spread those claims among the public. Also, these debates contain a large number of sentences stated by each candidate, and most of these are non-factual claims (opinions). These opinions pose another challenge for journalists, who must filter them out from the factual claims. This real issue received attention at the CLEF 2018 CheckThat! Lab, where two shared tasks were proposed: check-worthiness and factuality. In the following, we give a brief review of these two tasks and show our preliminary approaches.

3.1 CLEF-2018 CheckThat! Lab

As mentioned above, two different tasks were proposed at the CLEF-2018 CheckThat! Lab (Nakov et al., 2018). The first task is similar to rumor detection: the idea is to detect claims that are worth checking. The other task is complementary: the factuality of these check-worthy claims needs to be verified.

Task 1 - Check-Worthiness: A set of presidential debates from the US presidential election is provided for the task, where each claim in the debate text has been tagged manually as worth checking or not. The full text of the debates is used in the task to allow participants to exploit contextual features. The goal is to detect claims that are worthy of checking and to rank them from the most check-worthy to the least. Our preliminary approach for this task (Ghanem et al., 2018b) was inspired by previous works by (Granados et al., 2011) and Stamatatos (2017). The authors of the former work used a text distortion technique to enhance thematic text clustering by maintaining the words that have a low frequency in the documents. Similarly, in the latter work, the same text distortion technique was used for authorship attribution; there, the authors maintained the words with the highest frequency in the documents in order to detect an author from his/her writing style.

In this work, we used the same text distortion technique to detect check-worthy claims. We believed that this type of task is more thematic than stylistic, i.e., the writing style is not as important as the thematic words. We maintained the thematic words (those with the lowest frequency) using a ratio C; the higher the value of C, the more thematic words are maintained. Also, we maintained a set of linguistic cue words (LC) that were used previously by (Mukherjee and Weikum, 2015) to infer news credibility. Additionally, we kept named entities, such as Iraq, Trump, or America, from being distorted: by checking the claims manually, we found that check-worthy claims tended to list different types of named entities. After applying the distortion process, the new version of the text was represented as a bag of characters with the Tf-Idf weighting scheme. The distorted text is less biased by high-frequency words, such as stopwords. For ranking, we used a method inspired by the K-Nearest Neighbor (KNN) classifier to rank the worthy claims by the distance to their nearest neighbor; for classification, we used a KNN classifier. During the training phase of the task, Average Precision @N was used as the evaluation measure. In the testing phase, a set of test files was provided by the organizers, and the Mean Average Precision (MAP) measure was employed.

Table 1 presents the official results. It is worth mentioning that the task was also organized in Arabic, with the English claims translated manually. In the English part of the task, our approach (UPV-INAOE) achieved the third position among seven teams. In the Arabic part, only two teams submitted results. Similarly to English, the results are close and there is not a big difference among them. We believe that the low result of our approach in the Arabic part is due to having translated the LC lexicons automatically; a manual translation might have been more reliable.

Table 1: Official results for Task 1, reported using the MAP measure

Team          English   Arabic
Prise de Fer  0.1332    -
Copenhagen    0.1152    -
UPV-INAOE     0.1130    0.0585
bigIR         0.1120    0.0899
Fragarach     0.0812    -
blue          0.0801    -
RNCC          0.0632    -

Task 2 - Factuality: As mentioned above, this task concerns detecting the factuality of the claims from the US presidential election. The claims that are not worth checking have not been annotated, but they were kept in the debates to maintain the context. Factual claims have been tagged as True, False, or Half-True. These debates are provided in two languages, English and Arabic, as in the previous task. The macro F1 score was used as the performance measure.

Our approach for this task was based on the hypothesis that factual claims have been discussed and mentioned by online news agencies. In our approach (Ghanem et al., 2018a), we used the distribution of these claims in the results of search engines (we used both the Google and Bing search engines in our experiments). Furthermore, we supposed that truthful claims are mentioned more often by trusted web news agencies than untruthful ones. Thus, our approach depended on modeling the results returned by the search engines using similarity measures combined with the reliability of the sources. Our feature set consists of two types: dependent and independent features. For the dependent features, we used the cosine over embeddings between a claim query and each of the first N results from the search engines, building the sentence embeddings from the main sentence components and discarding stopwords.
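The distortion-plus-ranking pipeline of Task 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cue-word and entity sets stand in for the LC lexicon and a named-entity list, `keep_ratio` stands in for the tuned ratio C, and ranking claims by their maximum cosine similarity to a check-worthy example is a simple stand-in for the KNN-inspired nearest-neighbor ranking.

```python
from collections import Counter
import math

# Hypothetical stand-ins: the paper's LC lexicon and named-entity list are
# not reproduced here, so tiny illustrative sets are used instead.
CUE_WORDS = {"million", "percent"}
NAMED_ENTITIES = {"Iraq", "Trump", "America"}

def distort(texts, keep_ratio=0.5):
    """Mask every token except rare (thematic) words, cue words, and named
    entities; masked tokens keep their length, as in text distortion."""
    df = Counter(tok for t in texts for tok in set(t.split()))
    rarest_first = sorted(df, key=df.get)
    keep = set(rarest_first[: int(len(rarest_first) * keep_ratio)])
    keep |= CUE_WORDS | NAMED_ENTITIES
    return [" ".join(tok if tok in keep else "*" * len(tok) for tok in t.split())
            for t in texts]

def char_tfidf(texts, n=3):
    """Bag of character n-grams weighted with the Tf-Idf scheme."""
    grams = [Counter(t[i:i + n] for i in range(max(len(t) - n + 1, 0)))
             for t in texts]
    df = Counter(g for doc in grams for g in doc)
    total = len(texts)
    return [{g: tf * math.log(total / df[g]) for g, tf in doc.items()}
            for doc in grams]

def cosine(a, b):
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_claims(claims, worthy_examples):
    """Rank claims by similarity to their nearest check-worthy example,
    mimicking a KNN-style nearest-neighbor ranking."""
    vectors = char_tfidf(distort(list(claims) + list(worthy_examples)))
    claim_vecs, worthy_vecs = vectors[:len(claims)], vectors[len(claims):]
    scored = [(max(cosine(v, w) for w in worthy_vecs), c)
              for v, c in zip(claim_vecs, claims)]
    return [c for _, c in sorted(scored, key=lambda p: -p[0])]
```

Because masked tokens keep their length, the character-level representation still reflects sentence shape while the Tf-Idf weighting discounts the mask n-grams that appear in every distorted claim.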
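The dependent cosine features for Task 2 can be sketched as below, assuming a toy two-dimensional word-vector table; in practice, pre-trained embeddings and real search-engine snippets would be used. The sketch also derives the average and standard deviation of the per-result similarities as simple distribution features over them.

```python
import math
from statistics import mean, stdev

STOPWORDS = {"the", "a", "of", "is", "was"}
# Hypothetical 2-dimensional word vectors; in practice pre-trained
# embeddings would be loaded instead.
WORD_VECS = {
    "tax": [1.0, 0.2], "cut": [0.9, 0.3], "economy": [0.8, 0.4],
    "wall": [0.1, 1.0], "border": [0.2, 0.9],
}

def embed(sentence):
    """Average the vectors of the known, non-stopword tokens."""
    vecs = [WORD_VECS[t] for t in sentence.lower().split()
            if t not in STOPWORDS and t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def claim_features(claim, results):
    """One cosine similarity per search result, plus the average and
    standard deviation of those similarities as aggregate features."""
    sims = [cos_sim(embed(claim), embed(r)) for r in results]
    spread = stdev(sims) if len(sims) > 1 else 0.0
    return sims + [mean(sims), spread]
```

A claim whose nearby results are all semantically close to it yields uniformly high similarities and a small spread, which is the kind of signal these features are meant to capture.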

Also, for each result, we extracted the AlexaRank value of its site, to capture the reliability of the result. Finally, another text similarity feature was used that measures the similarity over the full sentences, without embeddings and without discarding stopwords. From this set of features, we built further dependent features that capture the distribution of some of the previous ones: we used the standard deviation and the average of the cosine similarity values and, in a similar manner, of the AlexaRank values.

The official results of Task 2 are shown in Table 2. In general, the low results on both of these complex tasks give an intuition of how difficult this research topic is. Further work is needed.

Table 2: Official results for Task 2, reported using the MAE measure

Team          English   Arabic
Copenhagen    0.705     -
FACTR         0.913     0.657
UPV-INAOE     0.949     0.820
bigIR         0.964     -
Check It Out  0.964     -

4 Future Work

Fact checking has recently become an even more interesting research topic. We think that detecting check-worthy claims should be the first step in fake news identification. We will address these issues both in political debates and in social media. To the best of our knowledge, previous works on fact checking have concentrated only on validating facts using external resources. We will work on investigating facts from different aspects: linguistically, structurally, and semantically. In this vein, SemEval-2019 (http://alt.qcri.org/semeval2019/index.php?id=tasks, visited in May 2018) proposes two tasks for next year: first, determining rumor veracity and support for rumors, which is similar to the task previously proposed at SemEval-2017; second, fact checking in community question answering forums, which is a new environment for investigating the veracity of facts. This shows the high interest of the research community in these two research topics. Therefore, participating in these tasks is one of our future plans.

References

Allport, G. W. and L. Postman. 1947. The psychology of rumor. Oxford, England: Henry Holt.

Bond Jr, C. F. and B. M. DePaulo. 2006. Accuracy of deception judgments. Personality and Social Psychology Review, 10(3):214–234.

Derczynski, L., K. Bontcheva, M. Liakata, R. Procter, G. W. S. Hoi, and A. Zubiaga. 2017. SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours. arXiv preprint arXiv:1704.05972.

DiFonzo, N. and P. Bordia. 2007. Rumor, gossip and urban legends. Diogenes, 54(1):19–35.

Enayet, O. and S. R. El-Beltagy. 2017. NileTMRG at SemEval-2017 Task 8: Determining rumour and veracity support for rumours on Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 470–474.

Ghanem, B., M. Montes-y-Gómez, F. Rangel, and P. Rosso. 2018a. UPV-INAOE-CheckThat: An approach based on external sources to detect claims credibility. In Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum, CLEF '18, Avignon, France, September.

Ghanem, B., M. Montes-y-Gómez, F. Rangel, and P. Rosso. 2018b. UPV-INAOE-CheckThat: Preliminary approach for checking worthiness of claims. In Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum, CLEF '18, Avignon, France, September.

Granados, A., M. Cebrian, D. Camacho, and F. de Borja Rodriguez. 2011. Reducing the loss of information through annealing text distortion. IEEE Transactions on Knowledge and Data Engineering, 23(7):1090–1102.

Mukherjee, S. and G. Weikum. 2015. Leveraging joint interactions for credibility analysis in news communities. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 353–362. ACM.

Nakov, P., A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, L. Màrquez, W. Zaghouani, P. Atanasova, S. Kyuchukov, and G. Da San Martino. 2018. Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. In P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J. Nie, L. Soulier, E. Sanjuan, L. Cappellato, and N. Ferro, editors, Proceedings of the Ninth International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Lecture Notes in Computer Science, Avignon, France, September. Springer.

Nelson, J. L. and H. Taneja. 2018. The small, disloyal fake news audience: The role of audience availability in fake news consumption. New Media & Society, page 1461444818758715.

Stamatatos, E. 2017. Authorship attribution using text distortion. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1138–1149.

Zubiaga, A., A. Aker, K. Bontcheva, M. Liakata, and R. Procter. 2018. Detection and resolution of rumours in social media: A survey. ACM Computing Surveys (CSUR), 51(2):32.
