arXiv:2010.12820v2 [cs.CL] 12 Apr 2021
“Nice Try, Kiddo”: Investigating Ad Hominems in Dialogue Responses

Emily Sheng¹, Kai-Wei Chang², Premkumar Natarajan¹, Nanyun Peng¹,²
¹ Information Sciences Institute, University of Southern California
² Computer Science Department, University of California, Los Angeles
{ewsheng,pnataraj}@isi.edu, {kwchang,violetpeng}@cs.ucla.edu

Abstract

Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.

1 Introduction

Ad hominems attack an opponent’s character or identity instead of the points the opponent is making, and can exist in any conversational setting between two or more entities. From an argumentation perspective, ad hominems are fallacies, and fallacies rely on faulty reasoning to advance a point (Hansen, 2020). These ad hominem fallacies are related to abusive language, toxicity, and microaggressions, and can be expressed with both subtle and explicitly offensive language. Table 1 presents examples of ad hominem responses to Twitter posts. Undesirable in any response, ad hominems are unproductive in furthering a meaningful discussion and can reinforce falsehoods. However, these attacks appeal to emotions and implicit biases to argue a point, and are thus often effectively harmful regardless of whether the attacks are true, recognized, or retracted (Yap, 2013).

Post: Many are trying to co-opt and mischaracterize the #blacklivesmatter movement. We won’t allow it!
Resp: I hate how much of a victim complex you guys have.

Post: You’re the reason we need the #MeToo movement.
Resp: Nice try, kiddo.

Post: Stop eating them if you don’t want them to go extinct! #govegan
Resp: I don’t like your username

Table 1: Ad hominem responses to Twitter posts.

Our work is motivated by this fallacy’s potential to amplify the spread of harmful societal biases. For communities that are already disproportionately harmed by societal power inequalities, ad hominems further amplify the power imbalance. Tone policing is a type of ad hominem that seeks to regulate the emotions that a person (usually of a marginalized population) can use to deliver their points (e.g., not too angrily), thereby altogether invalidating the style of delivery, the person’s competence, and the points being conveyed. Besides directly experiencing ad hominem attacks, marginalized groups could also be disproportionately discouraged from using technologies that propagate these attacks, since abusive language from a technology can deter people from using the technology (Sood et al., 2012b).

The goal of this study is to analyze ad hominems in dialogue system- and human-generated responses for topics that vary in impact to marginalized populations. Through analysis, we formulate techniques to reduce ad hominem responses and thus the associated harms, which is especially important for dialogue systems since these systems directly interact with users.

We analyze responses from DialoGPT (Zhang et al., 2020a) and humans to English Twitter posts. Specifically, we compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH). Through human annotation and trained classifiers, we find that ad hominems exist in both human and DialoGPT responses. Across response sources, there are more ad hominems in #BlackLivesMatter- and #MeToo-related responses, fewer in #Vegan-related responses, and even fewer in #WFH-related responses. The presence of more ad hominems in responses to social issues that concern marginalized groups has troubling implications about the amplified harms toward these groups.

Given our analysis, we further propose a constrained decoding algorithm to reduce the amount of ad hominems generated by dialogue systems. By using salient n-gram similarity to apply soft constraints to top-k sampling, our proposed technique is simple, extensible to reducing other harms, and does not require much additional computation. At each decoding time step, the technique compares the similarity between the current generated output and salient ad hominem versus non-ad hominem n-grams, possibly selecting alternative token candidates to generate. This technique is effective at reducing the amount of ad hominems generated across topics while maintaining coherence and relevance.

Our main contribution is a novel analysis of ad hominem responses generated by humans and DialoGPT across topics varying in impact to marginalized communities. For this analysis, we propose empirically-derived ad hominem categories that are further verified through annotation. Furthermore, we build a new dataset of Twitter posts paired with human- and DialoGPT-generated responses, where the responses have ad hominem-related labels. Finally, we devise a constrained decoding technique that uses salient n-gram similarity to steer top-k sampling away from ad hominem responses. We release data and code at https://github.com/ewsheng/ad-hom-in-dialogue.

2 Related Work

This work is related to a broad spectrum of topics, including prior definitions of ad hominems and how ad hominems facilitate biases. Also, analyzing ad hominems in dialogue systems is related to examining offensive language and other harms. Lastly, we discuss existing constrained decoding methods.

Ad Hominems In the argumentation literature, theoretical ad hominems include the abusive (attack on the opponent’s character), tu quoque (“he did it first”), circumstantial (accusation of hypocrisy), and guilt by association (associating the opponent with someone with low credibility) (Walton, 1998; Woods, 2007). Wijze (2003) criticizes that these textbook examples are not realistic in conversation. For more empirical categories, Habernal et al. (2018) propose ad hominem types based on analysis of Reddit’s ChangeMyView discussion threads, and Delobelle et al. (2019) analyze the name-calling and abusive categories. Moreover, Wulczyn et al. (2017) use classifiers for a large-scale analysis of personal attacks in Wikipedia comments. We build upon prior works to define and analyze ad hominems in a conversational setting.

Additionally, Yap (2013) discusses the harmful effects of implicit biases in forming and evaluating ad hominems. They emphasize that ad hominem attacks can be harmful to a person’s credibility and expertise even if the attack is recognized as fallacious and irrelevant to the argument. In particular, because societal norms allow biases and stereotypes to detract from a person’s credibility or expertise, the use of ad hominems can further diminish the rhetorical credibility (Govier, 1993) of marginalized groups.

Offensive Language Detection Ad hominems occur in many forms and are related to different types of offensive language, including abusive language (Yin et al., 2009; Chen et al., 2012; Nobata et al., 2016), hate speech (Warner and Hirschberg, 2012; Kwok and Wang, 2013; Djuric et al., 2015), profanity (Sood et al., 2012a), and the more subtle forms of microaggressions (Breitfeller et al., 2019) and projecting biases and stereotypes through power differentials in language (Sap et al., 2020). Ranging from outright insults to condescension, ad hominems are a form of offensive language that is difficult to comprehensively and objectively define. Nonetheless, these responses are important to characterize, since they can irreparably damage a person’s credibility. It is also generally important to identify these subtle forms of offensive language, since it is unclear if existing offensive language detection techniques are equally effective for these subtle forms.

Topic    Polarizing topic   Affects marginalized group   # [post, human resp] pairs
BLM      yes                yes                           4,037
MeToo    yes                yes                           2,859
Vegan    yes                no                            3,697
WFH      no                 no                            3,992
Total    -                  -                            14,585

Table 2: Topics, rationales, and statistics for the human response subset

Harms in Dialogue Systems Conversational systems are known to perpetuate several types of harms. Ruane et al. (2019) caution about harms that can result from using conversational systems and propose striving for trust and transparency; Roller et al. (2020) suggest techniques for chatbot safety. For analysis, Sheng et al. (2019) evaluate societal biases in language generation, Curry and Rieser (2018) study how conversational systems respond