Generating Counter Narratives against Online Hate Speech: Data and Strategies

Serra Sinem Tekiroğlu¹, Yi-Ling Chung¹,², and Marco Guerini¹

¹Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, Italy
[email protected], [email protected], [email protected]
²University of Trento, Italy

Abstract

Recently research has started focusing on avoiding the undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. Accordingly, automation strategies, such as natural language generation, are beginning to be investigated. Still, they suffer from the lack of a sufficient amount of quality data and tend to produce generic/repetitive responses. Being aware of the aforementioned limitations, we present a study on how to collect responses to hate effectively, employing large scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.

1 Introduction

Owing to the upsurge in the use of social media platforms over the past decade, Hate Speech (HS) has become a pervasive issue by spreading quickly and widely. Meanwhile, it is difficult to track and control its diffusion, since nuances in cultures and languages make it difficult to provide a clear-cut distinction between hate and dangerous speech (Schmidt and Wiegand, 2017). The standard approaches to prevent online hate spreading include the suspension of user accounts or the deletion of hate comments from the social media platforms (SMPs), paving the way for accusations of censorship and overblocking. Alternatively, to safeguard the right to freedom of speech, shadow-banning has been put into use, where the content/account is not deleted but hidden from SMP search results. Still, we believe that we must overstep such reactive identify-and-delete strategies and responsively intervene in the conversations (Bielefeldt et al., 2011; Jurgens et al., 2019). In this line of action, some Non-Governmental Organizations (NGOs) train operators to intervene in online hateful conversations by writing counter-narratives. A Counter-Narrative (CN) is a non-aggressive response that offers feedback through fact-bound arguments, and it is considered the most effective approach to withstand hate messages (Benesch, 2014; Schieb and Preuss, 2016). To be effective, a CN should follow guidelines similar to those in the 'Get the Trolls Out' project¹, in order to avoid escalating the hatred in the discussion.

Still, manual intervention against hate speech is not scalable. Therefore, data-driven NLG approaches are beginning to be investigated to assist NGO operators in writing CNs. As a necessary first step, diverse CN collection strategies have been proposed, each of which has its advantages and shortcomings (Mathew et al., 2018; Qian et al., 2019; Chung et al., 2019).

In this study, we aim to investigate methods to obtain high quality CNs while reducing the effort required from experts. We first compare data collection strategies with respect to the two main requirements that datasets must meet: (i) data quantity and (ii) data quality. Finding the right trade-off between the two is in fact a key element for effective automatic CN generation. To our understanding, none of the collection strategies presented so far is able to fulfill this requirement. Thus, we test several hybrid strategies to collect data, mixing niche-sourcing, crowd-sourcing, and synthetic data generation obtained by fine-tuning deep neural architectures specifically developed for NLG tasks, such as GPT-2 (Radford et al., 2019). We propose an author-reviewer framework in which an author is tasked with text generation and a reviewer can be a human or a classifier model that filters the produced output. Finally, a validation/post-editing phase is conducted with NGO operators over the filtered data. Our findings show that this framework is scalable, allowing us to obtain datasets that are suitable in terms of diversity, novelty, and quantity.

¹ http://stoppinghate.getthetrollsout.org/

2 Related Work

We briefly focus on three research aspects related to hate online, i.e. available datasets, methodologies for detection, and studies on the effectiveness of textual intervention. In the next section we will instead focus on the few methodologies specifically devoted to HS-CN pair collection.

Hate datasets. Several datasets have been collected from SMPs, including Twitter (Waseem and Hovy, 2016; Waseem, 2016; Ross et al., 2017), Facebook (Kumar et al., 2018), WhatsApp (Sprugnoli et al., 2018), and forums (de Gibert et al., 2018), in order to perform hate speech classification (Xiang et al., 2012; Silva et al., 2016; Del Vigna et al., 2017; Mathew et al., 2018).

Hate detection. Most of the research on online hatred focuses on hate speech detection (Warner and Hirschberg, 2012; Silva et al., 2016; Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018), feeding features such as lexical resources (Gitari et al., 2015; Burnap and Williams, 2016), sentiment polarity (Burnap and Williams, 2015) and multimodal information (Hosseinmardi et al., 2015) to a classifier.

Hate countering. Recent work has proved that counter-narratives are effective in hate countering (Benesch, 2014; Silverman et al., 2016; Schieb and Preuss, 2016; Stroud and Cox, 2018; Mathew et al., 2019). Several CN methods to counter hatred are outlined and tested by Benesch (2014), Munger (2017), and Mathew et al. (2019).
3 CN Collection Approaches

Three prototypical strategies to collect HS-CN pairs have been presented recently.

Crawling (CRAWL). Mathew et al. (2018) focus on the intuition that CNs can be found on SMPs as responses to hateful expressions. The proposed approach is a mix of automatic HS collection via linguistic patterns and manual annotation of the replies, to check whether they are responses that counter the original hate content. Thus, all the collected material consists of natural/real occurrences of HS-CN pairs.

Crowdsourcing (CROWD). Qian et al. (2019) propose that, once a list of HS is collected from SMPs and manually annotated, we can briefly instruct crowd-workers (non-experts) to write possible responses to such hate content. In this case the content is obtained in controlled settings, as opposed to crawling approaches.

Nichesourcing (NICHE). The study by Chung et al. (2019) still relies on the idea of outsourcing and collecting CNs in controlled settings. However, in this case the CNs are written by NGO operators, i.e. persons specifically trained to fight online hatred via textual responses, who can be considered experts in CN production.

4 Characteristics of the Datasets

Regardless of the HS-CN collection strategy, datasets must meet two criteria: quality and quantity. While quantity has a straightforward interpretation, we propose that data quality should be decomposed into conformity (to NGO guidelines) and diversity (lexical and semantic). Additionally, HS-CN datasets should not be ephemeral, which is a structural problem with crawled data since, due to copyright limitations, datasets are usually distributed as lists of tweet IDs (Klubička and Fernández, 2018). With generated data (crowdsourcing or nichesourcing) this problem is avoided.

Quantity. While the CRAWL dataset is very small and ephemeral, representing more a proof of concept than an actual dataset, the CROWD dataset involved more than 900 workers to produce ≈41K CNs, and the NICHE dataset was constructed with the participation of ≈100 expert operators to obtain ≈4K pairs (in three languages), resorting to HS paraphrasing and pair translation to reach the final ≈14K HS-CN pairs. Evidently, employing non-experts, e.g. crowd-workers or annotators, is preferable in terms of data quantity.

Quality. In terms of quality, we consider diversity to be of paramount importance, since verbatim repetition of arguments can become detrimental to operator credibility and to the CN intervention itself. Following Li et al. (2016), we distinguish between (i) lexical diversity and (ii) semantic diversity. While lexical diversity concerns the surface realization of CNs and can be captured by word-overlap metrics, semantic diversity concerns meaning and is harder to capture, as in the case of CNs with similar meaning but different wordings (e.g., "Any source?" vs. "Do you have a link?").
(i) Semantic Diversity & Conformity. To model semantic diversity and conformity, we focus on the CN 'argument' types present in the various datasets. Argument types are useful in assessing content richness (Hua et al., 2019). In a preliminary analysis, CROWD CNs are observed to be simpler and to mainly focus on 'denouncing' the use of profanity, while NICHE CNs are found to be richer, with a higher variety of arguments. On the other hand, CRAWL CNs can cover diverse arguments to a certain extent, while being highly prone to containing profanities. To perform a quantitative comparison, we randomly sampled 100 pairs from each dataset and annotated them according to the CN types presented by Benesch et al. (2016), which is the most comprehensive CN schema.

The results are reported in Table 1. For the sake of conciseness we focus on the hostile, denouncing and consequences classes, assigning other to all remaining types (including the fact class). Clearly, CRAWL does not meet the conformity standards of CNs, considering the vast amount of hostile responses (50%), while still granting a certain amount of type variety (other: 34%). Contrarily, CROWD conforms to the CN standards (hostile: 0%), yet mostly focuses on pure denouncing (76%) or denouncing with simple arguments (10%). Its other class (14%) consists almost exclusively of simple arguments, such as "All religions deserve tolerance". In NICHE, instead, arguments are generally and expectedly more complex and articulated, and represent the vast majority of cases (81%). A few examples of CN types are given in Table 2.

              CRAWL   CROWD   NICHE
Hostile          50       0       0
Denouncing       16      76      10
Den.+Oth.         0      10       9
Other            34      14      81
RR             3.16    4.83    2.72

Table 1: Diversity analysis of the three datasets. Semantic diversity is reported in terms of CN type percentages, lexical diversity in terms of Repetition Rate (RR, averaged over 5 shuffles).

(ii) Lexical Diversity. The Repetition Rate (RR) is used to measure the repetitiveness of a collection of texts, by considering the rate of non-singleton n-gram types it contains (Cettolo et al., 2014; Bertoldi et al., 2013). We utilize RR instead of a simple count of distinct n-grams (Xu et al., 2018; Li et al., 2016) or the standard type/token ratio (Richards, 1987), since it allows us to compare corpora of diverse sizes by averaging the statistics collected on a sliding window of 1000 words.
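For illustration, the sketch below computes RR as the geometric mean, over n-gram orders 1 to 4, of the non-singleton n-gram type rates accumulated over consecutive 1000-word windows. This follows our reading of Cettolo et al. (2014); details such as window overlap and scaling may differ from the implementation used for Table 1.

```python
from collections import Counter

def repetition_rate(tokens, window=1000, max_n=4):
    # Accumulate, per n-gram order, the number of n-gram types and of
    # singleton types over consecutive windows of `window` words.
    types = [0] * (max_n + 1)
    singletons = [0] * (max_n + 1)
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        for n in range(1, max_n + 1):
            counts = Counter(tuple(chunk[i:i + n])
                             for i in range(len(chunk) - n + 1))
            types[n] += len(counts)
            singletons[n] += sum(1 for c in counts.values() if c == 1)
    # RR = geometric mean over n = 1..4 of the non-singleton type rate,
    # scaled by 100 (consistent with the magnitudes reported in Table 1).
    rr = 1.0
    for n in range(1, max_n + 1):
        rr *= (types[n] - singletons[n]) / types[n]
    return 100 * rr ** (1.0 / max_n)
```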
Since CROWD and NICHE contain repeated CNs for different HSs², we first removed the repeated CNs and then applied a shuffling procedure to avoid that CNs answering the same HS (and so more likely to contain repetitions) appear close together. The results in Table 1 show that NICHE is the dataset with the highest lexical diversity (lowest RR), followed by CRAWL and CROWD.

Discussion. We can reasonably conclude that: (i) crawling, as presented in (Mathew et al., 2018), is not a mature procedure for CN collection yet, even if it is promising; (ii) nichesourcing produces the best and most diverse material by far, however it is also the most challenging to implement, considering the difficulty of making agreements with NGOs specialized in CN creation, and it does not provide a sufficient amount of data; (iii) on the contrary, CROWD seems the only approach that can grant the amount of data needed for deep learning approaches, but it contains simpler and more stereotyped arguments. A summary of the pros and cons of each collection approach is presented in Table 3.

² While this is an explicit data augmentation choice in NICHE, for CROWD it seems to derive from crowd-workers writing the same CNs for similar HSs.

Hostile      "Hell is where u belong! Stupid f***t... go hang yourself!!"
Denouncing   "The N word is unacceptable. Please refrain from future use."
Fact         "The majority of sexual assaults are committed by a family member, friend, or partner of the victim, and only 12% of convicted rapists are Muslim. It is not the religion, its the individuals, whether they're Muslim or not."

Table 2: Some examples of the categories relevant to our analysis. Hostile from the CRAWL dataset, Denouncing from CROWD, Fact (other) from NICHE.

          Quantity   Quality           non-eph.
                     Conf.   Diver.
Crawl        ✗         -       ✓          -
Crowd.       ✓         ✓       -          ✓
Niche.       -         ✓       ✓          ✓

Table 3: Comparison of the different approaches proposed in the literature according to the main characteristics required for the dataset.

5 CN Generation through Author-Reviewer Architecture

Since none of the aforementioned approaches alone can be decisive for creating proper CN datasets, we propose a novel framework that mixes crowdsourcing and nichesourcing to obtain new quality data while reducing collection cost/effort. The key elements of this mix are: (i) there must be an external element in the framework that produces HS-CN candidates; (ii) non-experts should pre-filter the material to be presented to and validated by the experts. Thus, we settle on the author-reviewer modular architecture (Oberlander and Brew, 2000; Manurung et al., 2008). In this architecture the author has the task of generating a text that conveys the correct propositional content (a CN), whereas the reviewer must ensure that the author's output satisfies certain quality properties. The reviewer finally evaluates the text's viability and picks the candidates to present to the NGO operators for final validation/post-editing.

The author-reviewer architecture that we propose differs from previous studies in two respects: (i) it is used for data collection rather than for NLG, (ii) we modified the original configuration by adding a human reviewer and a final post-editing step.

We first tested four different author configurations, then three reviewer configurations, keeping the best author configuration constant. A representation of the architecture is shown in Figure 1.

Figure 1: The author-reviewer configuration. The author module produces HS-CN candidates and the reviewer(s) filter them. Finally, an NGO operator validates and eventually post-edits the filtered candidates.

6 The Author: Generation Approaches

In order to obtain competent models that can provide automatic counter-narrative hints and suggestions to NGO operators, we have to overcome data bottlenecks/limitations, i.e. either the limited amount of training data in NICHE or the repetitiveness of CROWD, especially when using neural NLP approaches. Pre-trained Language Models (LMs) have achieved promising results when fine-tuned on challenging generation tasks such as chit-chat dialog (Wolf et al., 2019; Golovanov et al., 2019). In this respect, we propose using the recent large-scale unsupervised language model GPT-2 (Radford et al., 2019), which is capable of generating coherent text and can be fine-tuned and/or conditioned on various NLG tasks. It is a large transformer-based (Vaswani et al., 2017) LM trained on a dataset of 8 million web pages. We used the medium model, which was the largest available during our experimentation and contains 345 million parameters, with 24 layers, 16 attention heads, and an embedding size of 1024. We fine-tuned two models with GPT-2, one on the NICHE and one on the CROWD dataset, for counter-narrative generation.

NICHE - Training and test data. We split off 5366 HS-CN pairs for training and kept the rest (1288 pairs) for testing. In particular, the original HS-CN pairs, one HS paraphrase, and the pairs translated from FR and IT were kept for training, while the other HS paraphrases were used for testing. See Chung et al. (2019) for further details.

CROWD - Training and test data. Although the CROWD dataset was created for dialogue-level HS-CN, we could extract HS-CN pairs by selecting the dialogues in which only one utterance was labeled as HS. Therefore, we could guarantee that the crowd-produced CNs are exactly for the labeled utterance. We then applied an 80/20 training and test split, obtaining 26320 and 6337 pairs.

Generation Models. We fine-tuned GPT-2³ with a batch size of 1024 tokens and a learning rate of 2e-5. The training pairs are represented as [HS start token] HS [HS end token] [CN start token] CN [CN end token]. While we empirically selected the model checkpoint at the 3600th step of fine-tuning on the NICHE dataset, for the CROWD dataset we selected the checkpoint at the 5000th step. After fine-tuning the models, the generation of CNs for the test HSs was performed using Nucleus Sampling (Holtzman et al., 2019) with a p value of 0.9, which provides enhanced diversity in the generation, in comparison to likelihood-maximization decoding methods, while preserving coherence by truncating the less reliable tail of the distribution. At test time, the input HSs are fed into the models as conditions, i.e. as the initial contexts from which the next tokens are sampled. Given an input HS, the models produce a chunk of text that is a list of HS-CN pairs, of which the first sequence, marked with [CN start token] CN [CN end token], is the generated output.

³ We adopted the fine-tuning implementation from https://github.com/nshepperd/gpt-2
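To make the serialization and decoding concrete, here is an illustrative sketch using the HuggingFace transformers API rather than the nshepperd fork actually used (footnote 3); the delimiter strings <hs>, </hs>, <cn>, </cn> are placeholders, since the exact HS/CN start/end tokens are not specified in the paper.

```python
# Illustrative re-implementation, not the authors' code. In practice the
# delimiter strings would be added to the tokenizer vocabulary and the
# model would be loaded from the fine-tuned checkpoint.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

def serialize(hs, cn):
    # Training pairs: [HS start] HS [HS end] [CN start] CN [CN end]
    return f"<hs> {hs} </hs> <cn> {cn} </cn>"

def generate_cn(hs, p=0.9, max_new_tokens=128):
    # Condition on the HS plus an opening CN delimiter, then decode
    # with nucleus sampling (p = 0.9), as in the paper.
    prompt = f"<hs> {hs} </hs> <cn>"
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, do_sample=True, top_p=p, top_k=0,
                         max_new_tokens=max_new_tokens,
                         pad_token_id=tok.eos_token_id)
    gen = tok.decode(out[0][ids.shape[1]:])
    # The model over-generates further HS-CN pairs; keep only the first CN.
    return gen.split("</cn>")[0].strip()
```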

Baselines. In addition to the fine-tuned GPT-2 models, we also evaluate two baseline models. Considering the benefits of transformer architectures over recurrent models in terms of parallelization and of learning long-term dependencies (Vaswani et al., 2017), we implemented the baseline models using the transformer architecture. The models were trained similarly to the base model described by Vaswani et al. (2017), with 6 transformer layers, a batch size of 64, 100 epochs, 4000 warmup steps, input/output dimension of 512, 8 attention heads, inner-layer dimension of 2048, and a drop-out rate of 0.1. We used Nucleus Sampling, again with a p value of 0.9, also for the baselines during decoding.

In brief, we trained four different configurations/models as authors:

1. TRFcrowd: baseline on the CROWD dataset
2. GPTcrowd: fine-tuned GPT-2 on the CROWD dataset
3. TRFniche: baseline on the NICHE dataset
4. GPTniche: fine-tuned GPT-2 on the NICHE dataset

Metrics. We report both standard metrics (BLEU (Papineni et al., 2002), BertScore (Zhang et al., 2019)) concerning the lexical and semantic generation performance, and a specific diversity metric (RR) regarding generation quality. As a second quality metric, we report Novelty (Wang and Wan, 2018), based on the Jaccard similarity function (a variant of the same metric is also used by Dziri et al. (2018)). While diversity is used to measure the ability of the model to produce diverse/varied responses with respect to the given input HS, novelty is used to measure how different the generated sequences are with regard to the training corpus (Wang and Wan, 2018).
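One plausible reading of the novelty metric (the exact tokenization and aggregation may differ from what was used for Table 4) is: one minus the maximum word-level Jaccard overlap between a generated CN and any training CN, averaged over the generated set.

```python
def jaccard(a, b):
    # Word-level Jaccard similarity between two sentences.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def novelty(generated, training):
    # Novelty of each generated CN = 1 - max Jaccard similarity to any
    # training CN; corpus novelty is the average over all generations.
    return sum(1 - max(jaccard(g, t) for t in training)
               for g in generated) / len(generated)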
Results. The results of the author model experiments are shown in Table 4. In terms of BLEU and BertScore, the baseline models yield a better performance. However, a few peculiarities of the CN generation task and of the experimental settings hinder a direct and objective comparison of the presented scores among the models. First, gathering a finite set of all possible counter-narratives for a given hate speech is a highly unrealistic target. Therefore, we only have a sample of proper CNs for each HS, which is a possible explanation of the very low scores obtained with the standard metrics. Second, the train-test splits of the NICHE dataset contain the same CNs, since the splitting was done using one paraphrase for each HS together with all its original CNs, while the CROWD train-test splits have a similar property, since an exact same CN can be found for many different HSs. Consequently, the non-pretrained transformer models, which are more prone to generating an exact sequence of text from the training set, show a relatively better performance on the standard metrics in comparison to the advanced pre-trained models. Some randomly sampled CNs, generated by the various author configurations, are provided in the Appendix.

Author      RR     Novel.   BLEU    BertS.
TRFcrowd    8.93   0.04     0.305   0.485
GPTcrowd    5.89   0.46     0.270   0.482
TRFniche    4.89   0.10     0.569   0.457
GPTniche    3.23   0.70     0.316   0.445

Table 4: Evaluation results of the best author configurations with different datasets. Novelty is computed w.r.t. the corresponding training set, RR on the produced test output.

Regarding generation quality, we observe that the baseline models cannot achieve the diversity achieved by the GPT-2 models in terms of RR, both for NICHE and CROWD (4.89 vs 3.23, and 8.93 vs. 5.89). Moreover, GPT-2 provides an impressive boost in novelty (0.04 vs 0.46 and 0.10 vs 0.70). Among the GPT-2 models, the quality scores (in terms of RR and novelty) of the CNs generated by GPTniche are more than double in comparison to those generated with GPTcrowd.

With regard to the overall results, GPTniche is the most promising configuration to be employed as the author. In fact, we observed that, after the output CN, the over-generated chunk of text consists of semantically coherent brand-new HS-CN pairs, marked with proper HS/CN start and end tokens consistent with the training data representation. Therefore, on top of CN generation for a given HS, we can also take advantage of the over-generation capabilities of GPT-2, so that the author module can continuously output plausible HS-CN pairs without the need to provide an HS for which to generate the CN response. This expedient allows us to avoid the ephemerality problem for HS collection as well.

To generate HS-CN pairs with the author module, we basically exploited the model test setting and conditioned the fine-tuned model on each HS in the NICHE test set. After removing the CN output for the test HS, we could obtain new pairs of HS-CN. In this way, we generated 2700 HS-CN pairs that we used for our reviewer-configuration experiments.

7 The Reviewer

The task of the reviewer is sentence-level Confidence Estimation (CE), similar to that of Machine Translation (Blatz et al., 2004). In this task, the reviewer must decide whether the author output is correct/suitable for a given source text, i.e. a hate speech. Consistently with the MT scenario, one application of CE is filtering candidates for possible human post-editing, which is conducted by the NGO operator by validating the CN. We tested three reviewer configurations:

1. expert-reviewer: the author output is directly presented to NGO operators.
2. non-expert-reviewer: the author output is filtered by human reviewers, then validated by operators.
3. machine-reviewer: filtering is done by a neural classifier architecture before operator validation.
7.1 Human Reviewer Experiment

In this section we describe the annotation procedure for the non-expert reviewer configuration.

Setup. We administered the 2700 generated HS-CN pairs to three non-expert annotators, and instructed them to evaluate each pair in terms of the CN's 'suitableness' with regard to the corresponding hate speech.

Instructions. We briefly described what an appropriate and suitable CN is, then we instructed the annotators not to overthink during evaluation, but to give a score based on their intuition. We also provided a list of 20 HS-CN pairs exemplifying proper evaluation.

Measurement. We opted for a scale of 0-3, rather than a binary CE response, since it allows us to study various thresholds for better data selection. In particular, the meanings of the scores are as follows: 0 is not suitable; 1 is suitable with small modifications, such as grammar or semantics; 2 is suitable; and 3 is extremely good as a CN. We also asked the annotators to discard the pairs in which the hate speech was not well formed. For each pair we gathered two annotator scores.

Filtered Data. After the non-expert evaluation, we applied two different thresholds to obtain the pairs to be presented to the expert operators: (i) at least a score of 2 from both annotators (Reviewer≥2), yielding high quality data where no post-editing is necessary; (ii) at least a score of 1 from both annotators (Reviewer≥1), providing reasonable quality with a possible need for post-editing.

The statistics reported in Table 5 show that high quality pairs (Reviewer≥2) account for only a small fraction (10%) of the produced data and only one third is of reasonable quality (Reviewer≥1), while the vast majority was discarded. Some randomly selected filtered pairs are provided in the Appendix.

Threshold         Count   Percentage
Reviewer≥2          276        10.0%
Reviewer≥1          902        32.6%
at least one 0     1723        62.2%
bad HS              145         5.2%
Reviewermachine       -        40.2%

Table 5: Percentage of filtered pairs according to various filtering conditions.
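A minimal sketch of the two thresholds just described (with a hypothetical data layout; the paper does not publish its filtering script): a pair is kept only if both annotator scores reach the threshold.

```python
def passes(scores, threshold):
    # Reviewer>=t keeps a pair only if BOTH annotators scored >= t.
    return min(scores) >= threshold

pairs = [("HS ...", "CN ...", (2, 3)), ("HS ...", "CN ...", (1, 0))]
reviewer_ge2 = [(h, c) for h, c, s in pairs if passes(s, 2)]
reviewer_ge1 = [(h, c) for h, c, s in pairs if passes(s, 1)]
```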
7.2 Machine Reviewer Experiment

As the machine reviewer we implemented two neural classifiers tasked with assessing whether a given HS-CN is a proper data pair. The two models are based on the BERT (Devlin et al., 2019) and ALBERT (Lan et al., 2019) architectures.

Training data. We created a balanced dataset with 1373 positive and 1373 negative examples for training purposes. The positive pairs come both from the NICHE dataset and from the examples annotated in the human reviewer setting (Reviewer≥2). The negative pairs consist of the examples annotated in the human reviewer setting in the 'at least one 0' bin. In addition, 50 random HSs from NICHE-training are used with verbatim repetition as HS-HS pairs, to discourage the same text appearing as both HS and CN in a pair, and 50 random HSs are paired with other random HSs, simulating the condition of inappropriate CNs consisting of hateful text.

Test data. We collected a balanced test set with 101 positive and 101 negative pairs. Both positive and negative examples were created by replicating the non-expert reviewer annotation described in Section 7.1 for new CNs generated from the NICHE test set with the author model GPTniche.

Models. For the first model, we follow the standard sentence-pair classification fine-tuning schema of the original BERT study. First, the input HS-CN is represented as [CLS] HS tokens [SEP] CN tokens [SEP] and fed into BERT. Using the final hidden state of the first token [CLS], originally denoted as C ∈ R^H, we obtain a fixed-dimensional pooled representation of the input sequence. Then, a classification layer is added with parameter matrix W ∈ R^{K×H}, where K denotes the number of labels, i.e. 2 for HS-CN classification. The cross-entropy loss is used during fine-tuning. We conducted a hyperparameter tuning phase with a grid search over the batch sizes 16 and 32, the learning rates [4,3,2,1]e-5, and the number of epochs in the range of 3 to 8. We obtained the best model by fine-tuning uncased BERT-large with a learning rate of 1e-5 and a batch size of 16, after 6 epochs, at the 1029th step, on a single GPU.

The second model is built by fine-tuning ALBERT, which shows better performance than BERT on inter-sentence coherence prediction by using a sentence-order prediction loss instead of next-sentence prediction. In the sentence-order prediction loss, while the positive examples are created, as in BERT, from consecutive sentences within the same document, the negative examples are created by swapping sentences, which leads the model to better capture discourse-level coherence properties (Lan et al., 2019). This objective is particularly suitable for the HS-CN pair classification task, since the order of HS and CN and their coherence are crucial for our task. We fine-tuned ALBERT similarly to the BERT model, by adding a classification layer after the last hidden layer. We applied the same grid search used for the BERT model to fine-tune ALBERT-xxlarge, which contains 235M parameters. We saved a checkpoint every 200 steps and finally obtained the best model with a learning rate of 1e-5 and a batch size of 16, at the 1200th step.⁴

Metrics. To find the best model for the machine reviewer, we compared the BERT and ALBERT models on the test set. Although it seems more intuitive to focus on precision, since we search for an effective filter over many possible solutions, we observed that a model with very high precision tends to overfit on generic responses, such as "Evidence please?". Therefore, we aim to keep a balance between precision and recall, and we opted for the F1 score for model selection. We report the best configurations for each model in Table 6, and the percentage of filtered pairs in Table 5. The ALBERT classifier outperformed the BERT model in all three metrics: F1, Precision, and Recall. Considering the 6% absolute F1 improvement with respect to the BERT model, we employed the ALBERT model as the Machine Reviewer.

Reviewermachine   F1     Precision   Recall
ALBERT            0.73   0.74        0.73
BERT              0.67   0.69        0.65

Table 6: F1, Precision and Recall results for the two main classifier configurations we tested.

⁴ All the experiments were conducted on a single GeForce RTX 2080 Ti GPU. Only the ALBERT classifier model was trained with 8 TPU cores on Google Cloud.
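As an illustration of the sentence-pair schema, the sketch below scores an HS-CN pair with a BERT classifier via the HuggingFace transformers API; this is an assumed re-implementation, and the grid-searched fine-tuning described above is omitted.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tok = BertTokenizer.from_pretrained("bert-large-uncased")
# num_labels=2: the library adds the classification layer with parameter
# matrix W in R^{K x H} over the pooled [CLS] representation.
clf = BertForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2)

def suitability(hs, cn):
    # Passing two texts yields the pair encoding:
    # [CLS] HS tokens [SEP] CN tokens [SEP]
    inputs = tok(hs, cn, return_tensors="pt",
                 truncation=True, max_length=512)
    with torch.no_grad():
        logits = clf(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # P(proper pair)
```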

8 NGO Operators Experiments

To verify that the author-reviewer approach can boost HS-CN data collection, we ran an experiment with 5 expert operators from an NGO. We compared the filtering strategies to reveal the best one depending on several metrics/constraints.

Within-Subject Design. We administered lists of HS-CN pairs from each filtering condition to the 5 operators, and instructed them to evaluate/modify each pair in terms of the 'suitableness' of the CN to the corresponding HS.

Approach          NGOtime   Crowdtime   RR     Novelty   Pairsselec   Pairsfinal
no suggestion     480       -           2.72   -         -            -
Reviewerexpert    76        -           3.56   0.73      100%         45%
Reviewer≥1        72        215         4.31   0.70      33%          54%
Reviewermachine   68        -           4.48   0.68      40%          63%
Reviewer≥2        49        703         5.70   0.65      10%          72%

Table 7: Results for CN collection under various configurations. RR for 'no suggestion' is computed on the NICHE dataset and the time needed is the one reported in (Chung et al., 2019). Time is expressed in seconds. Pairsselec indicates the percentage of original author pairs that were passed to the experts for reviewing; Pairsfinal indicates the percentage of selected pairs that were accepted or modified by the experts themselves. Crowdtime is computed considering that annotators gave a score every 35 seconds, and we required two judgments per pair.

Instructions. For each HS-CN pair we asked the operators: (a) if the CN is a perfect answer, to validate it without any modification; (b) if the CN is not perfect, but a good answer can be obtained with some editing, to modify it; (c) if the CN is completely irrelevant and/or needs to be completely rewritten to fit the given HS, to discard it.

Measurement. The main goal of our effort is to reduce the time needed by experts to produce training data for automatic CN generation. Therefore, the primary evaluation measure is the average time needed to obtain a proper pair. The other measurements of interest are diversity and novelty, to understand how the reviewing procedure affects the variability of the obtained pairs.

Procedure and material. We gave the instructions along with a list of 20 HS-CN exemplar pairs for each condition (i.e. Reviewer≥1, ≥2, machine, expert). The condition order was randomized to avoid primacy effects. In total, each NGO operator evaluated 80 pairs. Pairs were sampled from the pool of 2700 pairs described before (apart from the automatic filtering condition). To guarantee that the sample was representative of the corresponding condition, we performed stratified sampling and avoided repeating pairs across subjects.

Results and Discussion. As shown in Table 7, there is a substantial decrease in data collection time (NGOtime) when automatic generation mechanisms are introduced (no suggestion vs. Reviewerexpert). If crowd filtering is applied (Reviewer≥1, ≥2), the amount of time can be further reduced, and the more stringent the filtering criterion, the higher the time saved. Conversely, the more stringent the filtering criterion, the higher the time needed to obtain a filtered pair from non-expert annotators (Crowdtime). For instance, to obtain a single pair with at least a score of 2 from both annotators, 700 seconds (around 12 minutes) are needed on average: two judgments at ≈35 seconds each, divided by the 10% yield of the ≥2 condition. The results indicate that providing an automatic generation tool meets the first goal of increasing the efficiency of the operators in data collection.

Regarding the diversity and novelty metrics, pre-filtering the author's output (Reviewer≥1, ≥2 and machine) has a negative impact: the more stringent the filtering condition, the higher the RR and the lower the novelty of the filtered CNs. We performed some manual analysis of the selected CNs and observed that, especially for the Reviewer≥2 case (which was the most problematic in terms of RR and novelty), there was a significantly higher ratio of "generic" responses, such as "This is not true." or "How can you say this about an entire faith?", for which reviewer agreement is easier to attain. Therefore, the higher agreement on generic CNs reveals itself as a negative impact on the diversity and novelty metrics. Conversely, the percentage of pre-filtered pairs that are accepted by the experts increases as the filtering condition becomes more stringent, the baseline being 45% for the Reviewerexpert condition.

As for the amount of operator effort, we observed a slight decrease in HTER⁵ with increasingly strict pre-filtering conditions, indicating an improvement in the quality of the candidates. However, the HTER scores were all between 0.1 and 0.2, much below the 0.4 acceptability threshold defined by Turchi et al. (2013), indicating that operators modified CNs only if they were "easily" amendable.

⁵ Human-targeted Translation Edit Rate, a measure of post-editing effort at the level of sentence translations (Specia and Farzindar, 2010).
For instance to obtain 5Human-targeted Translation Edit Rate is a measure of a single pair with at least a score of 2 by both an- post-editing effort at sentence level translations (Specia and notators, 700 sec (around 12 min) are needed on Farzindar, 2010). (Reviewermachine) is a viable solution since (i) it study. Dangerous Speech Project. Available helps the NGO operators save time better than hu- at: https://dangerousspeech.org/counterspeech-on- man filter ≥1, (ii) it better preserve diversity and twitter-a-field- study/. novelty as compared to Reviewer≥2 and in line Nicola Bertoldi, Mauro Cettolo, and Marcello Federico. with Reviewer≥1. 2013. Cache-based online adaptation for machine translation enhanced computer assisted translation. 9 Conclusions In MT-Summit, pages 35–42. To counter hatred online and avoid the undesired Heiner Bielefeldt, Frank La Rue, and Githu Muigai. 2011. Ohchr expert workshops on the prohibition effects that come with content moderation, inter- of incitement to national, racial or religious hatred. vening in the discussion directly with textual re- In Expert workshop on the Americas. sponses is considered as a viable solution. In this scenario, automation strategies, such as nat- John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto San- ural language generation, are necessary to help chis, and Nicola Ueffing. 2004. Confidence esti- NGO operators in their countering effort. How- mation for machine translation. In Coling 2004: ever, these automation approaches are not ma- Proceedings of the 20th international conference on ture yet, since they suffer from the lack of suffi- computational linguistics, pages 315–321. cient amount of quality data and tend to produce Pete Burnap and Matthew L Williams. 2015. Cyber generic/repetitive responses. Considering the afore- hate speech on twitter: An application of machine mentioned limitations, we presented a study on classification and statistical modeling for policy and how to reduce data collection effort, using a mix decision making. Policy & Internet, 7(2):223–242. of several strategies. To effectively and efficiently Pete Burnap and Matthew L Williams. 2016. Us and obtain varied and novel data, we first propose the them: identifying cyber hate on twitter across mul- generation of silver counter-narratives – using large tiple protected characteristics. EPJ Data Science, 5(1):11. scale unsupervised language models – then a filter- ing stage by crowd-workers and finally an expert Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. validation/post-editing. We also show promising 2014. The repetition rate of text as a predictor of the results obtained by replacing crowd-filtering with effectiveness of machine translation adaptation. In Proceedings of the 11th Biennial Conference of the an automatic classifier. As a final remark, we be- Association for Machine Translation in the Americas lieve that the proposed framework can be useful for (AMTA 2014), pages 166–179. other NLG tasks such as paraphrase generation or text simplification. Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, and Marco Guerini. 2019. CONAN - COunter NArratives through nichesourcing: a mul- Acknowledgments tilingual dataset of responses to fight online hate speech. In Proceedings of the 57th Annual Meet- This work was partly supported by the HATEME- ing of the Association for Computational Linguis- TER project within the EU Rights, Equality and Cit- tics, pages 2819–2829, Florence, Italy. 
Association izenship Programme 2014-2020. We are grateful to for Computational Linguistics. Stop Hate UK that provided us with the experts for Fabio Del Vigna, Andrea Cimino, Felice DellOrletta, the evaluation. Finally, there are also many people Marinella Petrocchi, and Maurizio Tesconi. 2017. we would like to thank for their help and useful sug- Hate me, hate me not: Hate speech detection on face- gestions: Eneko Agirre, Simone Magnolini, Marco book. Turchi, Sara Tonelli and the reviewers Jacob Devlin, Ming-Wei Chang, Kenton Lee, and among others. Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language under- standing. In Proceedings of the 2019 Conference of References the North American Chapter of the Association for Computational Linguistics: Human Language Tech- Susan Benesch. 2014. Countering dangerous speech: nologies, Volume 1 (Long and Short Papers), pages New ideas for genocide prevention. Washington, 4171–4186. DC: United States Holocaust Memorial Museum. Nouha Dziri, Ehsan Kamalloo, Kory W Mathewson, Susan Benesch, Derek Ruths, Kelly P Dillon, and Osmar Zaiane. 2018. Augmenting neural re- Haji Mohammad Saleem, and Lucas Wright. sponse generation with context-aware topical atten- 2016. Counterspeech on twitter: A field tion. arXiv preprint arXiv:1811.01063. Paula Fortuna and Sergio´ Nunes. 2018. A survey on au- Ruli Manurung, Graeme Ritchie, and Henry Thomp- tomatic detection of hate speech in text. ACM Com- son. 2008. An implementation of a flexible author- puting Surveys (CSUR), 51(4):85. reviewer model of generation using genetic algo- rithms. In Proceedings of the 22nd Pacific Asia Con- Ona de Gibert, Naiara Perez, Aitor Garcıa-Pablos, and ference on Language, Information and Computation, Montse Cuadros. 2018. Hate speech dataset from a pages 272–281, The University of the Philippines white supremacy forum. EMNLP 2018, page 11. Visayas Cebu College, Cebu City, Philippines. De Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura La Salle University, Manila, Philippines. Damien, and Jun Long. 2015. A lexicon-based Binny Mathew, Navish Kumar, Pawan Goyal, Animesh approach for hate speech detection. International Mukherjee, et al. 2018. Analyzing the hate and Journal of Multimedia and Ubiquitous Engineering, counter speech accounts on twitter. arXiv preprint 10(4):215–230. arXiv:1812.02712. Sergey Golovanov, Rauf Kurbanov, Sergey Nikolenko, Binny Mathew, Punyajoy Saha, Hardik Tharad, Sub- Kyryl Truskovskyi, Alexander Tselousov, and ham Rajgaria, Prajwal Singhania, Suman Kalyan Thomas Wolf. 2019. Large-scale transfer learning Maity, Pawan Goyal, and Animesh Mukherjee. 2019. for natural language generation. In Proceedings of Thou shalt not hate: Countering online hate speech. the 57th Annual Meeting of the Association for Com- In Proceedings of the International AAAI Confer- putational Linguistics, pages 6053–6058. ence on Web and Social Media, volume 13, pages 369–380. Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degener- Kevin Munger. 2017. Tweetment effects on the ation. arXiv preprint arXiv:1904.09751. tweeted: Experimentally reducing racist harassment. Political Behavior, 39(3):629–649. Homa Hosseinmardi, Sabrina Arredondo Mattson, Ra- hat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Jon Oberlander and Chris Brew. 2000. Stochastic text Mishra. 2015. Detection of incidents generation. Philosophical Transactions of the Royal on the social network. arXiv preprint Society of London. 
Series A: Mathematical, Physical arXiv:1503.03909. and Engineering Sciences, 358(1769):1373–1387. Xinyu Hua, Zhe Hu, and Lu Wang. 2019. Argument Kishore Papineni, Salim Roukos, Todd Ward, and Wei- generation with retrieval, planning, and realization. Jing Zhu. 2002. Bleu: a method for automatic eval- arXiv preprint arXiv:1906.03717. uation of machine translation. In Proceedings of the 40th annual meeting on association for compu- David Jurgens, Libby Hemphill, and Eshwar Chan- tational linguistics, pages 311–318. Association for drasekharan. 2019. A just and comprehensive strat- Computational Linguistics. egy for using NLP to address online . In Pro- ceedings of the 57th Annual Meeting of the Asso- Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Beld- ciation for Computational Linguistics, pages 3658– ing, and William Yang Wang. 2019. A bench- 3666, Florence, Italy. Association for Computa- mark dataset for learning to intervene in online hate tional Linguistics. speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing Filip Klubickaˇ and Raquel Fernandez.´ 2018. Examin- and the 9th International Joint Conference on Natu- ing a hate speech corpus for hate speech detection ral Language Processing (EMNLP-IJCNLP), pages and popularity prediction. In LREC. 4757–4766, Hong Kong, China. Association for Ritesh Kumar, Atul Kr Ojha, Shervin Malmasi, and Computational Linguistics. Marcos Zampieri. 2018. Benchmarking aggression Alec Radford, Jeffrey Wu, Rewon Child, David Luan, identification in social media. In Proceedings of the Dario Amodei, and Ilya Sutskever. 2019. Language First Workshop on Trolling, Aggression and Cyber- models are unsupervised multitask learners. OpenAI bullying (TRAC-2018), pages 1–11. , 1(8). Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Brian Richards. 1987. Type/token ratios: What do they Kevin Gimpel, Piyush Sharma, and Radu Soricut. really tell us? Journal of child language, 14(2):201– 2019. Albert: A lite bert for self-supervised learn- 209. ing of language representations. arXiv preprint arXiv:1909.11942. Bjorn¨ Ross, Michael Rist, Guillermo Carbonell, Ben- jamin Cabrera, Nils Kurowsky, and Michael Wo- Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, jatzki. 2017. Measuring the reliability of hate and Bill Dolan. 2016. A diversity-promoting ob- speech annotations: The case of the european jective function for neural conversation models. In refugee crisis. arXiv preprint arXiv:1701.08118. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computa- Carla Schieb and Mike Preuss. 2016. Governing hate tional Linguistics: Human Language Technologies, speech by means of counterspeech on facebook. In pages 110–119, San Diego, California. Association 66th ica annual conference, at fukuoka, japan, pages for Computational Linguistics. 1–23. Anna Schmidt and Michael Wiegand. 2017. A survey Zeerak Waseem and Dirk Hovy. 2016. Hateful sym- on hate speech detection using natural language pro- bols or hateful people? predictive features for hate cessing. In Proceedings of the Fifth International speech detection on twitter. In Proceedings of the Workshop on Natural Language Processing for So- NAACL student research workshop, pages 88–93. cial Media, pages 1–10. Thomas Wolf, Victor Sanh, Julien Chaumond, and Leandro Araujo´ Silva, Mainack Mondal, Denzil Cor- Clement Delangue. 2019. Transfertransfo: A rea, Fabr´ıcio Benevenuto, and Ingmar Weber. 2016. 
transfer learning approach for neural network Analyzing the targets of hate in online social media. based conversational agents. arXiv preprint In ICWSM, pages 687–690. arXiv:1901.08149. Guang Xiang, Bin Fan, Ling Wang, Jason Hong, and Tanya Silverman, Christopher J Stewart, Jonathan Carolyn Rose. 2012. Detecting offensive tweets Birdwell, and Zahed Amanullah. 2016. The im- via topical feature discovery over a large scale twit- pact of counter-narratives. Institute for Strate- ter corpus. In Proceedings of the 21st ACM inter- gic Dialogue, London. https://www. strategicdi- national conference on Information and knowledge alogue. org/wp-content/uploads/2016/08/Impact-of- management, pages 1980–1984. ACM. Counter-Narratives ONLINE. pdf–73. Jingjing Xu, Xuancheng Ren, Junyang Lin, and Lucia Specia and Atefeh Farzindar. 2010. Estimating Xu Sun. 2018. Diversity-promoting GAN: A cross- machine translation post-editing effort with hter. In entropy based generative adversarial network for di- Proceedings of the Second Joint EM+/CNGL Work- versified text generation. In Proceedings of the 2018 shop Bringing MT to the User: Research on Integrat- Conference on Empirical Methods in Natural Lan- ing MT in the Translation Industry (JEC 10), pages guage Processing, pages 3940–3949, Brussels, Bel- 33–41. gium. Association for Computational Linguistics.

Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, and Enrico Piras. 2018. Creating a whatsapp dataset to study pre-teen cyberbullying. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 51–59.

Scott R Stroud and William Cox. 2018. The varieties of feminist counterspeech in the misogynistic online world. In Mediating Misogyny, pages 293–310. Springer.

Marco Turchi, Matteo Negri, and Marcello Federico. 2013. Coping with the subjectivity of human judgements in MT quality estimation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 240–251.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Ke Wang and Xiaojun Wan. 2018. SentiGAN: Generating sentimental texts via mixture adversarial networks. In IJCAI, pages 4446–4452.

William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pages 19–26. Association for Computational Linguistics.

Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93.

Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.

Guang Xiang, Bin Fan, Ling Wang, Jason Hong, and Carolyn Rose. 2012. Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 1980–1984. ACM.

Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun. 2018. Diversity-promoting GAN: A cross-entropy based generative adversarial network for diversified text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949, Brussels, Belgium. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.

A Appendices

HS:   I love the Pres also he is a real man not a commie cunt like the last muslim in there.
GPT2: - Use of the cword here is totally unacceptable because it is hurtful and offensive.
      - That is very interesting, but they already have that kind of man who is always saying something different. No need for your comment.
      - Hi there, please refrain from using derogatory comments in the thread. They are hurtful and unwanted. If you continue Admin will be alerted.
TRF:  - Please refrain from using derogatory terms for females.
      - It's really hard to respect someone who uses racial slurs in an attempt at comedy.
      - Please don't use hateful words, please use your words or else removal will take place.

HS:   What cha doin America? Oh...just waiting to hang the cunt the Muslim nigga.
GPT2: - Name calling speaks volumes about one's character.
      - To act like those of a race are the same is to admit to your narrow mindedness.
      - Hey there, racist and derogatory views are hurtful and uncalled for. Please refrain from this type of behavior or you will be reported. Thx.
TRF:  - It doesn't make sense to name call someone you've never met based on what the media says.
      - I'd be more inclined to listen to your side if you didn't use that slur.
      - Please muslims are not so two as a high everyone has no need to get an ignorant little elected official further.

HS:   "Turkish Nazi Muslims" huh retard. Hey, try forming real sentences instead of half of your worlds being fucking hashtags, fuckwit.
GPT2: - It is acceptable for you to talk like this but not okay to physically attack others with it.
      - Some people find the Nazi Holocaust story appalling, but we should not use those slurs.
      - Please refrain from using derogatory terms for the mentally handicapped.
TRF:  - Please do not use offensive language.
      - Use of the r word is unacceptable because it hurts, insults and demeans people with mental disabilities.
      - That word is offensive to the intellectually disabled. Furthermore, there is no shame in being intellectually disabled.

HS:   The muslim nigger terrorist obama needs death the hussein needs it's limbs ripped off and his tongue seared with a hot poker.
GPT2: - You don't know that, but I do.
      - I understand your point but please watch the language. You are throwing a lot of hatred and anti-Semite in to this, which is not good.
      - I'd be inclined to agree with you if you didn't use slurs.
TRF:  - Your reply isn't need, not big or clearer. Think before you type next time please.
      - All religions, ethnicities, and social groups deserve tolerance.

Table 8: Randomly sampled CNs generated by the GPT-2 and TRF models trained on the CROWD dataset.

HS:   White women converted to Islam are the saddest and most selfish persons that you can encounter. They praise happily Islam beliefs and become part of a sisterhood of tollerance and understanding, but they do not know the situation of women in Pakistan and other islamic states.
GPT2: - Do you know that in many Muslim-majority countries, women's rights are recognized and actively promoted? Laws are passed that guarantee the right to have children, and foreign charities work to ensure that women have access to quality education and health services.
      - Are you a believing woman who has known and respected great men like Mohammed, Albert and Winston Churchill. You have forgotten about those who sacrificed so much for your benefit.
      - If white women who love Islam were doing such things why would their profile be so public?
TRF:  - Honestly, i see around many more muslim women without burqa or veil than women dressed like europeans.
      - Are you sure? muslim nobel prize winners would suggest otherwise.
      - Sounds like you are generalizing what a very small percentage of extremists do to an entire religion.

HS:   We are every day in constant danger, because Muslims can blow us all up. For this fact we should put all Muslims in a hole in the ground.
GPT2: - If you had said this about all the non-Muslim groups who may espouse hateful or violent views, how would you feel about being labelled a bigot?
      - Apart from being offensive and ignorant, is also the belief that Muslims only create problems. In fact, many Muslims are working to solve some of these social problems.
      - Muslims have never sought to destroy our world, rather they have offered their goods and services in return for peace and unity. Nostalgia may have something to offer us, but not for taking over countries and nations.
TRF:  - People who commit offenses are punished by law. Muslims are not all terrorists. muslims are not trying to generalize on their own community, indeed anything.
      - If someone would like you, but you are peaceful muslim countries like which, please.
      - No, a person can be immigrants for multiple allegiances and responsibilies.
      - Why do you think that muslim criminals will receive different treatment or even privileges?

HS:   Koran is a book that should be burnt and mosques should be demolished. If we do all these things, all problems in our community will be solved.
GPT2: - All book burning is prohibited under the Universal Declaration of Human Rights, which says: 'Everyone has the right to book his escape from cruelty and oppression wherever it may be found.'?
      - Yes, because this book encourages people to respect for life and to respect God. Glad you like it! Praise be to God that we have these humble requests.
      - Can you give us some figures on the number of people who would like to migrate to your country if you could only send em home?
TRF:  - If you dislike diversity so much, why do not you go off and live in a country where there are no muslims, only people like you?
      - Mgm fgm have been in britain for hundreds of years. it is never acceptable but it does not come from just one country or ideology.
      - The idea of blow someone up is itself crazy.

Table 9: Randomly sampled CNs generated by the GPT-2 and TRF models trained on the NICHE dataset.

HS:              Girls and boys are brainwashed by the same people.
No filter:       That's why most religions end well.
Reviewer≥1:      Mindless Hate and fear mongering divide and conquer.
Reviewer≥2:      Mindless Hate and fear mongering do not help anyone.

HS:              Everything will be better if Muslims did not exist. Their only contribute to our society is violence.
No filter:       Did you know that the greatest child abusers in British history were white men? That is if they weren't SO EXTREMELY LIE!?!?!
Reviewer≥1:      - Why would you say that? I would say that all the people using social media and whatever to vent their anger and frustration are clearly not serious about their religion. Some may be, and are angry, but that does not mean they are responsible for anything.
                 - The world would be a better place without forced baptisms and child rapists. What exactly are you suggesting?
Reviewer≥2:      - The world would actually be a very dark place without Muslims who contribute a lot to our society.
                 - How about a place of peace, tolerance and understanding for all?

HS:              Europe is civilised, Muslims should not stay there.
No filter:       That is a misunderstanding. On the one hand, Europe is a civilized place, on the other, Muslims cannot excel at anything.
Reviewer≥2:      How can you say that about an entire faith of 1.6 billion people?
Reviewermachine: Muslims have been a part of the European society for centuries, they have been a part of its institutions since the middle ages, they have been active in politics during the 19th century and have been crucial in the development of Europe during the last decades.

HS:              Multiculturalism has brought us nothing but disaster.
Reviewermachine: The multiethnic society has produced many smart and talented people, who have gone on to work in prominent public offices. No one is asking us to sacrifice our culture or our identity for the greater good of the community.

Table 10: Randomly sampled CNs, generated from GPT-2, filtered by various reviewer configurations.