arXiv:2106.01625v1 [cs.CL] 3 Jun 2021

Generate, Prune, Select: A Pipeline for Counterspeech Generation against Online Hate Speech

Wanzheng Zhu and Suma Bhat
University of Illinois at Urbana-Champaign, USA
[email protected], [email protected]

Abstract

Warning: this paper contains content that may be offensive or upsetting.

Countermeasures to effectively fight the ever increasing hate speech online without blocking freedom of speech are of great social interest. Natural Language Generation (NLG) is uniquely capable of developing scalable solutions. However, off-the-shelf NLG methods are primarily sequence-to-sequence neural models and they are limited in that they generate commonplace, repetitive and safe responses regardless of the hate speech (e.g., “Please refrain from using such language.”) or irrelevant responses, making them ineffective for de-escalating hateful conversations. In this paper, we design a three-module pipeline approach to effectively improve the diversity and relevance. Our proposed pipeline first generates various counterspeech candidates by a generative model to promote diversity, then filters the ungrammatical ones using a BERT model, and finally selects the most relevant counterspeech response using a novel retrieval-based method. Extensive experiments on three representative datasets demonstrate the efficacy of our approach in generating diverse and relevant counterspeech.

Hate Speech: I am done with Islam and isis. All Muslims should be sent to their homeland. Britain will be better without their violence and ideology.
Expert: I agree that ISIS is an evil aberration, but to extend this to include up to 3 million people just in the UK is just plain silly.
Commonplace: Hate speech is not tolerated. Please review our user policies. Thank you for your cooperation.
Not relevant: Use of the r-word is unacceptable as it demeans and insults people with disabilities.

Table 1: An illustrative example of hate speech and counterspeech.

1 Introduction

Hate speech is any form of expression through which speakers intend to vilify, humiliate, or incite hatred against a group or a class of persons on the basis of some characteristics, including race, religion, skin color, sexual identity, gender identity, ethnicity, disability, or national origin (Ward, 1997; Nockleby, 2000). Its ever-growing increase on the Internet makes it a problem of significant societal concern (Williams, 2019); effective countermeasures call for not blocking freedom of speech by means of censorship or active moderation (Gagliardone et al., 2015; Strossen, 2018). A very promising countermeasure is counterspeech, a response that provides non-negative feedback through fact-bound arguments and broader perspectives to mitigate hate speech and foster a more harmonious conversation in social platforms (Schieb and Preuss, 2016; Munger, 2017; Mathew et al., 2018; Shin and Kim, 2018). Counterspeech as a measure to combat abusive language online is also promoted in active campaigns such as “Get The Trolls Out” (https://getthetrollsout.org/stoppinghate).

What makes an effective counterspeech? Informed by psychosocial and linguistic studies on counterspeech (Mathew et al., 2019b) and the large number of effective counterspeech examples created by crowdsourcing (Qian et al., 2019) and by experts (Chung et al., 2019), we identify that effective counterspeech should be diverse and relevant to the hate speech instance. Diversity is the requirement that a collection of counterspeech should not be largely commonplace, repetitive and safe responses without regard to the target or type of hate speech (e.g., “Please refrain from using such language.”). Relevance refers to the property that counterspeech should directly address and target the central aspects of the hate speech, enabling coherent conversations rather than irrelevant or off-topic ones (e.g., the hate speech instance targets an ethnic group, while the counterspeech talks about people with disabilities). Comparative examples are shown in Table 1, where we list some counterspeech that lack diversity or relevance.

While NLG systems (in particular, sequence-to-sequence models) offer much promise for generating text at scale (Sutskever et al., 2014; Zhu et al., 2018; Lewis et al., 2020), the quality of the outputs is modest in the context of the requirements identified above. Indeed, Qian et al. (2019), the only existing quality work on counterspeech generation, has highlighted their limitations: the responses are largely commonplace and sometimes irrelevant. These limitations apply more broadly to general conversational language generation tasks, arising primarily due to the intrinsic end-to-end training nature of a single sequence-to-sequence architecture (Sordoni et al., 2015; Li et al., 2016; Serban et al., 2017; Jiang and de Rijke, 2018). Model refinements to account for these limitations have been addressed individually: improved diversity (Li et al., 2016; Xu et al., 2018) or improved relevance (Gao et al., 2019; Li et al., 2020). However, combining these improvements into a single model is not straightforward. Such is the goal of this paper.

We tackle the problem from an entirely novel angle by proposing a three-module pipeline approach, Generate, Prune, Select (denoted as “GPS”), to ensure the generated sentences adhere to the required properties of diversity and relevance. First, the Candidate Generation module generates a large number of diverse response candidates using a generative model. As such, a large candidate pool is made available for selection, which accounts for improved diversity. Second, the Candidate Pruning module prunes the ungrammatical candidates from the candidate pool. Last, from the pruned counterspeech candidate pool, the Response Selection module selects the most relevant counterspeech for a given hate speech instance by a novel retrieval-based response selection method.

We demonstrate the efficacy of GPS, the first pipeline approach for counterspeech generation, by a systematic comparison with other competitive NLG approaches in generating diverse and relevant counterspeech. We derive new state-of-the-art results on three benchmark datasets by showing improved diversity and relevance using both automatic and human evaluations.

2 Proposed Model

We assume access to a corpus of labeled pairs of conversations $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is a hate speech and $y_i$ is the appropriate counterspeech as decided by experts or by crowdsourcing. The goal is to learn a model that takes as input a hate speech $x$ and outputs a counterspeech $y$. A motivating example is shown in Table 1. Most importantly, we aim at generating diverse and relevant counterspeech. We present an overview of the model in Figure 1 and describe each module in detail below.

2.1 Candidate Generation

The main goal of this module is to create a diverse candidate pool for counterspeech selection. We extract all available counterspeech instances $Y = [y_1, y_2, \ldots, y_n]$ from the training dataset and enlarge the counterspeech pool by a generative model.

Specifically, we utilize an RNN-based variational autoencoder (Bowman et al., 2016) that incorporates the global distributed latent representations of all sentences to generate candidates. Both the encoder and the decoder have two layers with 512 nodes each, and we use two highway network layers (Srivastava et al., 2015) to facilitate robust training. Like all other generative models, it aims to maximize the lower bound of the likelihood $\mathcal{L}$ of generating the training data $Y$,

$$\mathcal{L} = -\mathrm{KL}\big(q_\theta(z \mid y) \,\|\, p(z)\big) + \mathbb{E}_{q_\theta(z \mid y)}\big[\log p_\theta(y \mid z)\big]$$

where $\theta$ denotes all parameters of the generative model, $z$ is a latent variable having a Gaussian distribution with a diagonal covariance matrix, $p$ denotes the prior distribution, $q$ denotes the posterior distribution, and KL denotes the KL-divergence (Kullback and Leibler, 1951). In the training process, we apply the KL annealing technique (Bowman et al., 2016) to prevent the undesirable stable equilibrium problem (i.e., the first term of the likelihood function, $\mathrm{KL}(q_\theta(z \mid y) \,\|\, p(z))$, becomes zero). Upon the completion of the training, we generate candidates by simply decoding from noise sampled from a standard Gaussian distribution (i.e., $\sim \mathcal{N}(0, 1)$).

As demonstrated by Bowman et al. (2016) (and as inferred from our own experiments described in Section 3), the generative model not only captures holistic properties of sentences such as style, topic, and high-level syntactic features, but also produces diverse candidates.

[Figure 1: Overview of GPS. The red ovals correspond to the individual modules (Candidate Generation, Candidate Pruning, Response Selection). Panels: Initial Candidate Pool, Grammatical Candidate Pool, Counterspeech.]

two response selection methods, but we find that neither of them is well-suited for our task.
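To make the training objective of Section 2.1 concrete, the following is a minimal sketch of the annealed lower bound and of sampling a latent code for candidate generation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gaussian_kl`, `kl_weight`, `annealed_objective`, `sample_latent`) are hypothetical, the linear warm-up schedule is an assumption (the text only states that KL annealing is applied), and the reconstruction term $\log p_\theta(y \mid z)$ is passed in as a plain number instead of being computed by an RNN decoder.

```python
import math
import random


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    This corresponds to the KL term of the lower bound when the posterior
    is a diagonal Gaussian and the prior is a standard Gaussian.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def kl_weight(step, warmup_steps):
    """KL annealing weight, rising linearly from 0 to 1 (assumed schedule)."""
    return min(1.0, step / warmup_steps)


def annealed_objective(log_likelihood, mu, logvar, step, warmup_steps):
    """Lower bound with the KL term scaled by the annealing weight.

    log_likelihood stands in for E_q[log p(y|z)]; in a real model it would
    come from the decoder (assumption: supplied by the caller here).
    """
    return log_likelihood - kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)


def sample_latent(dim, rng):
    """Draw a latent code from a standard Gaussian, as done at generation time."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_latent(4, rng)  # would be fed to the trained decoder
    print(len(z))
    # Early in training the KL term is ignored; after warm-up it counts fully.
    print(annealed_objective(-10.0, [1.0], [0.0], step=0, warmup_steps=100))
    print(annealed_objective(-10.0, [1.0], [0.0], step=100, warmup_steps=100))
```

With a small weight at the start of training, the decoder is pushed to use the latent code before the KL penalty takes full effect, which is the usual motivation for annealing in this setting.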
