Do Women Perceive Hate Differently: Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments
Michael Wojatzki, Tobias Horsmann, Darina Gold, Torsten Zesch
Language Technology Lab, University of Duisburg-Essen, Germany
{michael.wojatzki,tobias.horsmann,darina.gold,torsten.zesch}@uni-due.de

Abstract

Understanding hate speech remains a significant challenge for both creating reliable datasets and automated hate speech detection. We hypothesize that being part of the targeted group or personally agreeing with an assertion substantially affects hate speech perception. To test these hypotheses, we create FEMHATE, a dataset containing 400 assertions that target women. These assertions are judged by female and male subjects (i) for how hateful they are and (ii) for whether the subjects agree with them. We find that women and men evaluate extreme cases of hate speech consistently. We also find a strong relationship between hate speech and agreement judgments, showing that a low agreement score is a prerequisite for hate speech. We show how this relationship can be used for automatic hate speech detection. Our best system, based on agreement judgments, outperforms a baseline SVM classifier equipped with n-grams by a wide margin.

1 Introduction

Hate speech can be defined as hateful or threatening communication targeted towards people deserving protection. For instance, A disciplinary slap in the face never hurts, even for the (own) wife is hate speech against women.[1] The effects of this unpleasant form of communication range from poisoning the atmosphere in social media to psychological or physical violence in the real world (Mantilla, 2013).

To counteract the massive scale at which hate speech can occur in social media, automatic methods for (pre-)identifying potentially hateful or threatening utterances are required. However, even for humans, the decision whether an utterance is hate speech or not is often difficult (Ross et al., 2016; Benikova et al., 2017).

We hypothesize that hate speech perception is substantially influenced by whether one belongs to the targeted group and by whether one agrees or disagrees with the statement under evaluation. For instance, it is likely that women perceive the statement women have lower IQ than men as more hateful than men do. Conversely, a person who actually holds the belief that women have a lower IQ than men will likely not attribute much hate to this statement.

To study these hypotheses, we create the FEMHATE dataset, containing 400 German assertions about women that have been collected through a web survey.[2] As a first step, we limit ourselves to self-contained, explicit statements, to which we will refer as assertions. Subsequently, we let 40 female and 40 male subjects annotate (i) how hateful the assertions are and (ii) whether they agree or disagree with the assertions. However, indicating the amount of hatefulness on a numerical scale is a hard task that is associated with inter-rater inconsistencies (Ross et al., 2016; Benikova et al., 2017). For the hatefulness annotation, we therefore use the Best–Worst Scaling (BWS) approach by Louviere (1991), which has been shown to produce more reliable scores on other opinion-related tasks (Kiritchenko and Mohammad, 2017). The intuition underlying BWS is that although humans do not share a common absolute scale for a topic, they still tend to agree when picking the worst and the best item from a tuple of choices. We make all data publicly available.[3]
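The scoring step of BWS is not spelled out at this point in the paper; the sketch below shows the simple counting procedure commonly used with BWS (e.g., by Kiritchenko and Mohammad (2017)): each item's score is the number of times it was picked as best minus the number of times it was picked as worst, normalized by how often it appeared. The 4-tuple setup, the data layout, and all names are illustrative assumptions, not the paper's actual format.

```python
from collections import Counter

def bws_scores(annotations):
    """BWS counting: score(a) = (#best(a) - #worst(a)) / #appearances(a).

    `annotations` is a list of (items, best, worst) triples, where `items`
    is the tuple of assertions shown to an annotator (commonly a 4-tuple)
    and `best`/`worst` are the ones picked as most/least hateful.
    (Illustrative format, not the paper's actual data layout.)
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, picked_best, picked_worst in annotations:
        seen.update(items)
        best[picked_best] += 1
        worst[picked_worst] += 1
    # Scores lie in [-1, 1]: +1 = always the most hateful, -1 = always the least.
    return {a: (best[a] - worst[a]) / seen[a] for a in seen}

# Toy example with two annotated 4-tuples of assertion IDs:
judgments = [(("a1", "a2", "a3", "a4"), "a1", "a4"),
             (("a2", "a3", "a4", "a5"), "a2", "a5")]
print(bws_scores(judgments))  # {'a1': 1.0, 'a2': 0.5, 'a3': 0.0, 'a4': -0.5, 'a5': -1.0}
```

Computing such scores separately for the female and the male annotators would then allow the hatefulness rankings of the two groups to be compared.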
We find that for extreme cases of hate speech against women, it does not matter whether women or men judge the assertions. For less extreme instances, however, there are clear differences, e.g., when evaluating female quotas. In addition, we find a strong correlation between agreement and hate speech scores, and that a low agreement score is a necessary prerequisite for an assertion to contain hate speech. This means that an assertion with a high agreement score is most likely not hate speech. Furthermore, we show how the relationship between agree/disagree judgments and hate speech scores can be exploited by automatic systems for predicting hate speech. We develop an automatic approach that is based on how similarly assertions are judged by all of our subjects, and we demonstrate that it outperforms a baseline system (an SVM equipped with n-grams) by a large margin.
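For orientation, here is a minimal scikit-learn sketch of an n-gram SVM baseline of the kind mentioned above; the toy data, the n-gram range, and the comment on agreement-based features are our assumptions, not the paper's exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for assertions and binary hate speech labels (invented data;
# in the paper, gold labels would be derived from the hatefulness annotations).
assertions = ["a clearly hateful assertion ...",
              "a harmless assertion ...",
              "another hateful assertion ...",
              "another harmless assertion ..."]
labels = [1, 0, 1, 0]

# Baseline: a linear SVM over word n-grams.
baseline = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LinearSVC())
baseline.fit(assertions, labels)
print(baseline.predict(["a new, unseen assertion"]))

# The agreement-based system would instead represent each assertion by the
# agree/disagree judgments of all 80 subjects, e.g.
#   X = [[judgment[s][a] for s in subjects] for a in assertions]
# and train the same classifier on X (a sketch of the idea, not the paper's
# exact feature construction).
```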
2 Related Work

We now shed light on hate speech research and related methods, as well as on the various facets that make a formalization of hate speech difficult. Furthermore, we motivate our focus on misogyny in this study.

2.1 Hate Speech in General

In recent years, the increasing number of potentially offensive, hurtful, or abusive utterances on the Internet has shifted into the focus of research. Such utterances can be roughly summarized under the umbrella term hate speech (Warner and Hirschberg, 2012; Silva et al., 2016), which is itself only vaguely defined (Schmidt and Wiegand, 2017). In earlier work, hate speech has been framed as abusive or hostile messages, or flames (Spertus, 1997). Other commonly used terms are abusive language (Waseem et al., 2017) or offensive language (Razavi et al., 2010), as well as sub-issues such as cyberbullying (Xu et al., 2012) or trolling (Mantilla, 2013). Research on this topic has focused on the analysis of such utterances and on methods for their automatic detection.

2.2 Misogyny as a Form of Hate Speech

As noted by Mondal et al. (2017), hate speech exists on many social media channels, which has led to many efforts to detect or eliminate hate speech and related phenomena (Agarwal and Sureka, 2015; Bartlett et al., 2014; Gitari et al., 2015; Ting et al., 2013), often focusing on one specific form of hate speech, e.g., racism (Chaudhry, 2015; Waseem and Hovy, 2016a).

In this study, we decided to focus on another target group of hate speech, namely women. Although there are some works on misogyny as a subform of hate speech (Mantilla, 2013; Bartlett et al., 2014; Cole, 2015), there is no dataset that serves as a gold standard for hate speech detection against women. The misogynist variant of hate speech was coined Gendertrolling by Mantilla (2013), who argues that it is even more dangerous and destructive than regular trolling, as it often contains credible threats of physical and psychological violence. Bartlett et al. (2014) collected a corpus of Tweets containing terms such as rape and slut in order to analyze their usage and origin. While this is a fruitful approach for analyzing misogynist behavior, it is also limited to Tweets containing these terms.

2.3 Hate Speech and Opinions

Benikova et al. (2017) defined hate speech as expressing a very negative stance against a given target. Following this definition, we position hate speech among studies that use opinion expressions to predict or rationalize stance, including implicit statements towards the given target (Boltužić and Šnajder, 2014; Sobhani et al., 2015; Wojatzki and Zesch, 2016).

We hypothesize that there is a direct connection between the perception of hate speech and whether one agrees or disagrees with an assertion. For instance, whether the statement Women cannot live up to the demands of the male Olympic hundred meter run is hate speech or not depends heavily on whether one thinks that women do not meet the requirements of the male Olympic hundred meter run. Consider how this perception changes if we replace male Olympic hundred meter run with sitting in the parliament: most people would likely disagree with the modified statement, and the statement is then probably more likely to be labeled as hate speech.

Furthermore, we hypothesize that being part of the target group influences the perception of hate speech as well as whether one agrees or disagrees with notions concerning this group. In a small survey with three participants, Kwok and Wang (2013) found indications that, for racially targeted hate speech, race has an influence on the perception of hate speech.

2.4 Annotating Hate Speech

The construction of a hate speech corpus basically involves two steps: 1) the collection of potential hate speech and 2) the rating of these instances.

Most current studies rely on lists of offensive words and phrases for collecting potential hate speech (Mantilla, 2013; Njagi et al., 2015; Waseem and Hovy, 2016b). However, such a collection inevitably introduces biases due to the limited number of query terms. For example, if one collects tweets by searching for the term bitch, it is not surprising that hate speech annotations in this collection are strongly associated with this term.

As shown by Benikova et al. (2017), Waseem et al. (2017), and Ross et al. (2016), annotating hate speech using a numeric or binary scale on such data […]

                          Number
  Assertions                 400
  Agreement Judgments     32,000
  BWS Judgments            4,800

Table 1: Overview of the collected dataset (the 32,000 agreement judgments correspond to the 400 assertions each judged by all 80 subjects).

[…] asked participants to come up with assertions that are relevant to our topic.

Assertion Generation. To narrow the topic down for the subjects, we presented them with a list of (sub)-topics.

[1] Some of the examples in this paper are insulting, distressing, or offensive to some readers. They in no way correspond to the opinions of the authors.
[2] All examples shown in this paper have been freely translated from German by the authors.
[3] github.com/muchafel/femhate