A Survey on Computational Propaganda Detection
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) Survey Track

Giovanni Da San Martino1∗, Stefano Cresci2, Alberto Barrón-Cedeño3, Seunghak Yu4, Roberto Di Pietro5 and Preslav Nakov1
1Qatar Computing Research Institute, HBKU, Doha, Qatar
2Institute of Informatics and Telematics, IIT-CNR, Pisa, Italy
3DIT, Alma Mater Studiorum–Università di Bologna, Forlì, Italy
4MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
5College of Science and Engineering, HBKU, Doha, Qatar
gmartino, rdipietro, [email protected], [email protected], [email protected], [email protected]
∗Contact Author

Abstract

Propaganda campaigns aim at influencing people's mindset with the purpose of advancing a specific agenda. They exploit the anonymity of the Internet, the micro-profiling ability of social networks, and the ease of automatically creating and managing coordinated networks of accounts, to reach millions of social network users with persuasive messages, specifically targeted to topics each individual user is sensitive to, and ultimately influencing the outcome on a targeted issue. In this survey, we review the state of the art on computational propaganda detection from the perspective of Natural Language Processing and Network Analysis, arguing for the need for combined efforts between these communities. We further discuss current challenges and future research directions.

1 Introduction

The Web makes it possible for anybody to create a website or a blog and to become a news medium. Undoubtedly, this is a hugely positive development, as it elevates freedom of expression to a whole new level, giving anybody the opportunity to make their voice heard. With the rise of social media, everyone can reach out to a very large audience, something that until recently was only possible for major news outlets.

However, this new avenue for self-expression has also brought unintended consequences, the most evident one being that society has been left unprotected against potential manipulation from a multitude of sources. The issue became of general concern in 2016, a year marked by micro-targeted online disinformation and misinformation at an unprecedented scale, primarily in connection with Brexit and the US Presidential campaign; then, in 2020, the COVID-19 pandemic also gave rise to the first global infodemic. Spreading disinformation disguised as news created the illusion that the information was reliable, and thus people tended to lower their natural barrier of critical thinking compared to when information came from other types of sources.

Whereas false statements are not really a new phenomenon —e.g., the yellow press has been around for decades— this time things were notably different in terms of scale and effectiveness, thanks to social media, which provided both a medium to reach millions of users and an easy way to micro-target specific narrow groups of voters based on precise geographic, demographic, psychological, and/or political profiling.

An important aspect of the problem that is often largely ignored is the mechanism through which disinformation is conveyed: the use of propaganda techniques. These include specific rhetorical and psychological techniques, ranging from leveraging emotions —such as using loaded language, flag waving, appeal to authority, slogans, and clichés— to using logical fallacies —such as straw men (misrepresenting someone's opinion), red herring (presenting irrelevant data), black-and-white fallacy (presenting two alternatives as the only possibilities), and whataboutism. Moreover, the problem is exacerbated by the fact that propaganda does not necessarily have to lie; it can appeal to emotions or cherry-pick the facts. Thus, we believe that specific research on propaganda detection is a relevant contribution to the fight against online disinformation.

Here, we focus on computational propaganda, which is defined as "propaganda created or disseminated using computational (technical) means" [Bolsover and Howard, 2017]. Traditionally, propaganda campaigns had been a monopoly of state actors, but nowadays they are within reach of various groups and even of individuals. One key element of such campaigns is that they often rely on coordinated efforts to spread messages at scale. Such coordination is achieved by leveraging botnets (groups of fully automated accounts) [Zhang et al., 2016], cyborgs (partially automated) [Chu et al., 2012], and troll armies (human-driven) [Linvill and Warren, 2018], also known as sockpuppets [Kumar et al., 2017], Internet water army [Chen et al., 2013], astroturfers [Ratkiewicz et al., 2011], and seminar users [Darwish et al., 2017]. Thus, a promising direction to thwart propaganda campaigns is to discover such coordination, as demonstrated by the recent interest of Facebook1 and Twitter2; we give a minimal illustration of this idea below.

1 newsroom.fb.com/news/2018/12/inside-feed-coordinated-inauthentic-behavior/
2 https://help.twitter.com/en/rules-and-policies/platform-manipulation
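To make coordination discovery concrete, the following minimal sketch (our own toy example, not a method from the cited works; the account data, the choice of shared URLs as the signal, and the 0.8 threshold are all illustrative assumptions) flags pairs of accounts whose posted links overlap suspiciously:

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def flag_coordinated_pairs(posts_by_account: dict, threshold: float = 0.8):
    """Flag account pairs that share an unusually high fraction of the
    URLs they post -- one simple signal of coordinated behavior."""
    urls = {acct: {p["url"] for p in posts}
            for acct, posts in posts_by_account.items()}
    flagged = []
    for a, b in combinations(urls, 2):
        sim = jaccard(urls[a], urls[b])
        if sim >= threshold:
            flagged.append((a, b, sim))
    return flagged

# Toy data: two accounts pushing the same links, plus one unrelated account.
posts = {
    "acct_1": [{"url": "ex.com/1"}, {"url": "ex.com/2"}, {"url": "ex.com/3"}],
    "acct_2": [{"url": "ex.com/1"}, {"url": "ex.com/2"}, {"url": "ex.com/3"}],
    "acct_3": [{"url": "other.org/a"}],
}
print(flag_coordinated_pairs(posts))  # -> [('acct_1', 'acct_2', 1.0)]
```

Real systems combine many such behavioral signals (timing, hashtags, retweet patterns) and operate at a very different scale, but the underlying idea of measuring pairwise similarity of account behavior is the same.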
In order for propaganda campaigns to work, it is critical that they go unnoticed. This further motivates work on detecting and exposing propaganda campaigns, which should make them increasingly inefficient. Given the above, in the present survey, we focus on computational propaganda from two perspectives: (i) the content of the propaganda messages, and (ii) their propagation in social networks.

Finally, it is worth noting that, even though there have been several recent surveys on fake news detection [Shu et al., 2017; Zhou et al., 2019], fact-checking [Thorne and Vlachos, 2018], and truth discovery [Li et al., 2016], none of them focuses on computational propaganda. There has also been a special issue of the Big Data journal on Computational Propaganda and Political Big Data [Bolsover and Howard, 2017], but it did not include a survey. Here we aim to bridge this gap.

2 Propaganda

The term propaganda was coined in the 17th century, and initially referred to the propagation of the Catholic faith in the New World [Jowett and O'Donnell, 2012, p. 2]. It soon took on a pejorative connotation, as its meaning was extended to also mean opposition to Protestantism. In more recent times, back in 1938, the Institute for Propaganda Analysis [Ins, 1938] defined propaganda as "expression of opinion or action by individuals or groups deliberately designed to influence opinions or actions of other individuals or groups with reference to predetermined ends".

Recently, Bolsover and Howard [2017] dug deeper into this definition, identifying its two key elements: (i) trying to influence opinion, and (ii) doing so on purpose. Influencing opinions is achieved through a series of rhetorical and psychological techniques. Clyde R. Miller in 1937 proposed one of the seminal categorizations of propaganda, consisting of seven devices [Ins, 1938], which remain well accepted today [Jowett and O'Donnell, 2012, p. 237]: name calling, glittering generalities, transfer, testimonial, plain folks, card stacking, and bandwagon. Other scholars consider categorizations with as many as eighty-nine techniques [Conserva, 2003], and Wikipedia lists about seventy techniques.3 However, these larger sets of techniques are essentially subtypes of the seven devices above.

Although lying and creating fake stories is considered one of the propaganda techniques (some authors refer to it as "black propaganda" [Jowett and O'Donnell, 2012]), there are contexts where this course of action is taken without the objective of influencing the audience, as in satire and clickbaiting. These special cases are of less interest when it comes to fighting the weaponization of social media, and are therefore considered out of the scope of this survey.

3 Text Analysis Perspective

Research on propaganda detection based on text analysis has a short history, mainly due to the lack of suitable annotated datasets for training supervised models. There have been some relevant initiatives, where expert journalists or volunteers analyzed entire news outlets, which could be used for training. For example, Media Bias/Fact Check (MBFC)6 is an independent organization analyzing media in terms of their factual reporting, bias, and propagandist content, among other aspects. Similar initiatives are run by US News & World Report7 and the European Union.8 Such data has been used in distant supervision approaches [Mintz et al., 2009], i.e., by assigning each article from a given news outlet the propagandistic/non-propagandistic label of that outlet, as in the sketch below. Unfortunately, such a coarse approximation inevitably introduces noise into the learning process, as we discuss in Section 5.
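The following minimal sketch illustrates this distant supervision setup (the outlet names, labels, and article records are invented for illustration; this is not code from any of the surveyed systems):

```python
# Outlet-level labels, as might be derived from an MBFC-style source list
# (outlet names and labels here are hypothetical).
OUTLET_LABELS = {
    "daily-patriot-news.example": "propagandistic",
    "city-herald.example": "non-propagandistic",
}

def distant_label(articles):
    """Assign to each article the label of the outlet that published it.
    Every article inherits its outlet's label wholesale, which is exactly
    why the resulting dataset is noisy: a propagandistic outlet may still
    publish neutral articles, and vice versa."""
    labeled = []
    for article in articles:
        label = OUTLET_LABELS.get(article["outlet"])
        if label is not None:  # skip outlets we have no label for
            labeled.append((article["text"], label))
    return labeled

corpus = [
    {"outlet": "daily-patriot-news.example", "text": "They will destroy us all!"},
    {"outlet": "city-herald.example", "text": "The council met on Tuesday."},
]
print(distant_label(corpus))
```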
In the remainder of this section, we review current work on propaganda detection from a text analysis perspective. This includes the production of annotated datasets, characterizing entire documents, and detecting the use of propaganda techniques at the span level.

3.1 Available Datasets

Given that existing models to detect propaganda in text are supervised, annotated corpora are necessary (we sketch a minimal example of such a supervised setup below). Table 1 shows an overview of the available corpora (to the best of our knowledge), with annotation both at the document and at the fragment level.

Rashkin et al. [2017] released TSHP-17, a balanced corpus with document-level annotation including
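To make the document-level setting concrete, here is a minimal sketch of the kind of supervised classifier such corpora support (toy articles and labels; TSHP-17's actual format and label inventory are not reproduced here). It uses a TF-IDF bag-of-words representation with logistic regression, a standard baseline rather than the method of any particular surveyed paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy document-level training data; real corpora such as TSHP-17
# provide thousands of articles with document-level labels.
train_texts = [
    "The corrupt elites are destroying everything you hold dear!",
    "The council approved the budget after a two-hour session.",
    "Only a fool would believe what they are telling you.",
    "Researchers reported the results in a peer-reviewed journal.",
]
train_labels = ["propagandistic", "non-propagandistic",
                "propagandistic", "non-propagandistic"]

# TF-IDF features feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

print(clf.predict(["They are lying to you, wake up!"]))
```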