
Multi-Dimensional Gender Bias Classification

Emily Dinan∗, Angela Fan∗†, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams
Facebook AI Research
†Laboratoire Lorrain d'Informatique et Applications (LORIA)
∗Joint first authors.

Abstract

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information. In addition, we collect a novel, crowdsourced evaluation benchmark of utterance-level gender rewrites. Distinguishing between gender bias along multiple dimensions is important, as it enables us to train finer-grained gender bias classifiers. We show our classifiers prove valuable for a variety of important applications, such as controlling for gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of genderedness.

[Figure 1: Framework for Gender Bias in Dialogue. We propose a framework separating gendered language based on who you are speaking ABOUT, speaking TO, and speaking AS.]

1 Introduction

Language is a social behavior, and as such, it is a primary means by which people communicate, express their identities, and socially categorize themselves and others. Such social information is present in the words we write and, consequently, in the text we use to train our NLP models. Models can unwittingly learn negative associations about protected groups present in their training data and propagate them; in particular, NLP models often learn biases against others based on their gender (Bolukbasi et al., 2016; Hovy and Spruit, 2016; Caliskan et al., 2017; Rudinger et al., 2017; Garg et al., 2018; Gonen and Goldberg, 2019; Dinan et al., 2019a). Since unwanted gender biases can affect downstream applications, sometimes even leading to poor user experiences, understanding and mitigating gender bias is an important step towards making NLP tools and models safer, more equitable, and more fair. We provide a finer-grained framework for this purpose, analyze the presence of gender bias in models and data, and empower others by releasing tools that can be employed to address these issues for numerous text-based use-cases.

While many works have explored methods for removing gender bias from text (Bolukbasi et al., 2016; Emami et al., 2019; Maudslay et al., 2019; Dinan et al., 2019a; Kaneko and Bollegala, 2019; Zmigrod et al., 2019; Ravfogel et al., 2020), no extant work on classifying gender or removing gender bias has incorporated facts about how humans collaboratively and socially construct our language and identities. We propose a pragmatic and semantic framework for measuring bias along three dimensions that builds on knowledge of the conversational and performative aspects of gender, as illustrated in Figure 1. Recognizing these dimensions is important, because gender along each dimension can affect text differently, for example, by modifying word choice or imposing different preferences in how we construct sentences.
Decomposing gender into separate dimensions also allows for better identification of gender bias, which subsequently enables us to train a suite of classifiers for detecting different kinds of gender bias in text. We train several classifiers on freely available data that we annotate with gender information along our dimensions. We also collect a new crowdsourced dataset (MDGENDER) for better evaluation of gender classifier performance. The classifiers we train have a wide variety of potential applications. We evaluate them on three: controlling the genderedness of generated text, detecting gendered text, and examining the relationship between gender bias and offensive language. In addition, we expect them to be useful in the future for many text applications, such as detecting gender imbalance in newly created training corpora or model-generated text.

In this work, we make four main contributions: we propose a multi-dimensional framework (ABOUT, AS, TO) for measuring and mitigating gender bias in language and NLP models, we introduce an evaluation dataset for performing gender identification that contains utterances re-written from the perspective of a specific gender along all three dimensions, we train a suite of classifiers capable of labeling gender in both a single-task and multitask setup, and finally we illustrate our classifiers' utility for several downstream applications. All datasets, annotations, and classifiers will be released publicly to facilitate further research into the important problem of gender bias in language.

2 Related Work

Gender affects myriad aspects of NLP, including corpora, tasks, algorithms, and systems (Chang et al., 2019; Costa-jussà, 2019; Sun et al., 2019). For example, statistical gender biases are rampant in word embeddings (Jurgens et al., 2012; Bolukbasi et al., 2016; Caliskan et al., 2017; Garg et al., 2018; Zhao et al., 2018b; Basta et al., 2019; Chaloner and Maldonado, 2019; Du et al., 2019; Gonen and Goldberg, 2019; Kaneko and Bollegala, 2019; Zhao et al., 2019), even multilingual ones (Gonen et al., 2019; Zhou et al., 2019), and affect a wide range of downstream tasks, including coreference resolution (Zhao et al., 2018a; Cao and Daumé, 2019; Emami et al., 2019), part-of-speech and dependency parsing (Garimella et al., 2019), unigram language modeling (Qian et al., 2019), appropriate turn-taking classification (Lepp, 2019), relation extraction (Gaut et al., 2019), identification of offensive content (Sharifirad and Matwin, 2019; Sharifirad et al., 2019), and machine translation (Stanovsky et al., 2019). Furthermore, translations are judged as having been produced by older and more male speakers than the original was (Hovy et al., 2020).

For dialogue text particularly, gender biases in training corpora have been found to be amplified in machine learning models (Lee et al., 2019; Dinan et al., 2019a; Liu et al., 2019). While many of the works cited above propose methods of mitigating the unwanted effects of gender on text, Maudslay et al. (2019), Zmigrod et al. (2019), and Dinan et al. (2019a) in particular rely on counterfactual data to alter the training distribution to offset gender-based statistical imbalances (see §4.1 for more discussion of training set imbalances). Also relevant is Kang et al. (2019, PASTEL), which introduces a parallel style corpus and shows gains on style transfer across binary genders. In this work, we provide a clean new way to understand gender bias that extends to the dialogue use-case by independently investigating the contribution of author gender to data created by humans.

Most relevant to this work, Sap et al. (2019b) proposes a framework for modeling pragmatic aspects of many social biases in text, such as intent to offend, for guiding discovery of new instances of social bias. These works focus on complementary aspects of a larger goal, namely making NLP safe and inclusive for everyone, but they differ in several ways. Here, we treat statistical gender bias in human- or model-generated text specifically, allotting it the focused and nuanced attention that such a complicated phenomenon deserves. Sap et al. (2019b) takes a different perspective, and aims to characterize the broader landscape of negative stereotypes in social media text, an approach which can make parallels apparent across different types of socially harmful content. Moreover, they consider different pragmatic dimensions than we do: they target negatively stereotyped commonsense implications in arguably innocuous statements, whereas we investigate pragmatic dimensions that straightforwardly map to conversational roles (i.e., topics, addressees, and authors of content). As such, we believe the two frameworks to be fully compatible.

Also relevant is the intersectionality of gender identity, i.e., when gender non-additively interacts with other identity characteristics. Negative gender stereotyping is known to be weakened or reinforced by the presence of other social factors, such as dialect (Tatman, 2017), class (Degaetano-Ortlieb, 2018), and race (Crenshaw, 1989). These differences have been found to affect gender classification in images (Buolamwini and Gebru, 2018) and also in sentence encoders (May et al., 2019). We acknowledge that these are crucial considerations, but set them aside for follow-up work.

3 Dimensions of Gender Bias

Gender infiltrates language differently depending on the conversational role played by the people using that language (see Figure 1). We propose a framework for decomposing gender bias into three separate dimensions: bias when speaking ABOUT someone, bias when speaking TO someone, and bias from speaking AS someone. In this section, we first define bias and gender, and then motivate and describe our three dimensions.

3.1 Definitions of Bias and Gender
Bias. In an ideal world, we would expect little difference between texts describing men, women, and people with other gender identities, aside from the use of explicitly gendered words, like pronouns or names. A machine learning model, then, would be unable to pick up on statistical differences among gender labels (i.e., gender bias), because such differences would not exist. Unfortunately, we know this is not the case. For example, Table 1 provides examples of adjectives, adverbs, and verbs that are more common in Wikipedia biographies of people of certain genders. This list was generated by counting all verbs, adjectives, and adverbs (using a part-of-speech tagger from Honnibal and Montani (2017)) that appear in a large section of Wikipedia biographies. We then computed P(word | gender)/P(word) for words that appear more than 500 times. The top over-represented verbs, adjectives, and adverbs under this metric are displayed for each gender.

In an imagined future, a classifier trained to identify gendered text would have (close to) random performance on non-gender-biased future data, because the future would be free of the statistical biases plaguing current-day data. These statistical biases are what make it possible for current-day classifiers to perform better than random chance. We know that current-day classifiers are gender biased, because they achieve much better than random performance by learning distributional differences in how current-day texts use gender; we show this in §5. These classifiers learn to pick up on these statistical biases in text in addition to explicit gender markers (like she).[1]

[1] We caution the reader that "the term bias is often used to refer to demographic disparities in algorithmic systems that are objectionable for societal reasons" (Barocas et al., 2020, 14); we restrict our use of bias to its traditional definition here.
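To make the computation concrete, here is a minimal sketch of the over-representation metric described above. It is not the authors' released code: the tagged_bios input (an iterable of (word, POS tag, biography-gender) triples) and the function name are our own assumptions, while the part-of-speech restriction and the 500-occurrence threshold follow the text.

from collections import Counter, defaultdict

def overrepresented_words(tagged_bios, pos_set=("VERB", "ADJ", "ADV"),
                          min_count=500, top_k=10):
    """Rank words by P(word | gender) / P(word), separately per gender.

    tagged_bios: iterable of (word, pos, gender) triples, where `gender`
    is the label (e.g. M / F / N) of the biography the token came from.
    """
    word_total = Counter()                 # counts for P(word)
    word_by_gender = defaultdict(Counter)  # counts for P(word | gender)
    tokens_total = 0
    tokens_by_gender = Counter()

    for word, pos, gender in tagged_bios:
        if pos not in pos_set:
            continue
        word = word.lower()
        word_total[word] += 1
        word_by_gender[gender][word] += 1
        tokens_total += 1
        tokens_by_gender[gender] += 1

    top_words = {}
    for gender, counts in word_by_gender.items():
        scores = {}
        for word, count in counts.items():
            if word_total[word] <= min_count:
                continue  # keep only words appearing more than 500 times
            p_word_given_gender = count / tokens_by_gender[gender]
            p_word = word_total[word] / tokens_total
            scores[word] = p_word_given_gender / p_word
        top_words[gender] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return top_words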

Gender. Gender manifests itself in language in numerous ways. In this work, we are interested in gender as it is used in English when referring to people and other sentient agents, or when discussing their identities, actions, or behaviors. We annotate gender with four potential values: masculine, feminine, neutral, and unknown, which allows us to go beyond the oppositional male-female gender binary. We take the neutral category to contain characters with either a non-binary gender identity or an identity which is unspecified for gender by definition (say, for a magic tree).[2] We also include an unknown category for when there might be a gender identity at play, but the gender is not known or readily inferrable by crowdworkers from the text (e.g., in English, one would not be able to infer gender from just the short text "Hello!").

[2] We fully acknowledge the existence and importance of all chosen gender identities (including, but not limited to, non-binary, gender fluid, poly-gender, pan-gender, alia-gender, and agender) for the end goal of achieving accessible, inclusive, and fair NLP. However, these topics require a more nuanced investigation than is feasible using naïve crowdworkers.

Table 1: Bias in Wikipedia. We look at the most over-represented words in biographies of men and women, respectively, in Wikipedia. We also compare to a set of over-represented words in gender-neutral pages. We use a part-of-speech tagger (Honnibal and Montani, 2017) and limit our analysis to words that appear at least 500 times.

VERBS
  M: finance, presiding, oversee, survives, disagreed, obliged, filling, reassigned, pledged, agreeing
  F: steamed, actor, kisses, towed, guest, modelling, cooking, kissing, danced, studies
  N: increases, range, dissipated, vary, engined, tailed, excavated, forested, upgraded, electrified
ADJECTIVES
  M: akin, vain, descriptive, bench, sicilian, 24-hour, optimistic, weird, ordained, factual
  F: feminist, lesbian, uneven, transgender, feminine, female, reproductive, sexy, blonde, pregnant
  N: optional, tropical, volcanic, glacial, abundant, variable, malay, overhead, variant, sandy
ADVERBS
  M: ethnically, intimately, soundly, upstairs, alongside, artistically, randomly, hotly, lesser, convincingly
  F: romantically, aground, emotionally, sexually, happily, socially, anymore, really, positively, incredibly
  N: westward, inland, low, automatically, typically, faster, normally, round, usually, slightly

3.2 Gender in Multiple Dimensions

Exploring gender's influence on language has been a fruitful and active area of research in many disciplines, each of which brings its own unique perspectives to the topic (Lakoff, 1973; Butler, 1990; Cameron, 1990; Lakoff, 1990; Swann, 1992; Crawford, 1995; Weatherall, 2002; Sunderland, 2006; Eckert and McConnell-Ginet, 2013; Mills, 2014; Coates, 2015; Talbot, 2019). In this section, we propose a framework that decomposes gender's contribution along three conversational dimensions to enable finer-grained classification of gender's effects on text from multiple domains.

Speaking About: Gender of the Topic. It is well known that we change how we speak about others depending on who they are (Hymes, 1974; Rickford and McNair-Knox, 1994) and, in particular, based on their gender (Lakoff, 1973). People often change how they refer to others depending on the gender identity of the individual being spoken about (Eckert and McConnell-Ginet, 1992). For example, adjectives which describe women have been shown to differ from those used to describe men in numerous situations (Trix and Psenka, 2003; Gaucher et al., 2011; Moon, 2014; Hoyle et al., 2019), as do verbs that take nouns referring to men as opposed to women (Guerin, 1994; Hoyle et al., 2019). Furthermore, metaphorical extensions, which can shed light on how we construct conceptual categories (Lakoff and Johnson, 1980), differ starkly for men and women (Fontecha and Catalan, 2003; Holmes, 2013, 325; Amery et al., 2015).

Speaking To: Gender of the Addressee. People often adjust their speech based on who they are speaking with, their addressee(s), to show solidarity with their audience or express social distance (Wish et al., 1976; Bell, 1984; Hovy, 1987; Rickford and McNair-Knox, 1994; Bell and Johnson, 1997; Eckert and Rickford, 2001). We expect the addressee's gender to have a similar effect: for example, the way a man might communicate with another man about styling their hair would differ from how he might communicate with a woman about the same topic.

Speaking As: Gender of the Speaker. People react to content differently depending on who created it.[3] For example, Sap et al. (2019a) find that naïve annotators are much less likely to flag as offensive certain content referring to race if they have been told the author of that content speaks a dialect that signals in-group membership (i.e., is less likely to be intended to offend). Like race, gender is often described as a "fundamental" category for self-identification and self-description (Banaji and Prentice, 1994, 315), with men, women, and non-binary people differing in how they actively create and perceive their own gender identities (West and Zimmerman, 1987). Who someone is speaking as strongly affects what they may say and how they say it, down to the level of their choices of adjectives and verbs in self-descriptions (Charyton and Snelbecker, 2007; Wetzel et al., 2012). Even children as young as two dislike when adults misattribute a gender to them (Money and Ehrhardt, 1972; Bussey, 1986), suggesting that gender is indeed an important component of identity.

Our Speaking As dimension builds on prior work on author attribution, a concept purported to hail from English logician Augustus de Morgan (Mendenhall, 1887), who suggested that authors could be distinguished based on the average word length of their texts. Since then, sample statistics and NLP tools have been used for applications such as settling authorship disputes (Mosteller and Wallace, 1984), forensic investigations (Frantzeskou et al., 2006; Rocha et al., 2016; Peng et al., 2016), and extracting a stylistic fingerprint from text that enables the author to be identified (Stamatatos et al., 1999; Luyckx and Daelemans, 2008; Argamon et al., 2009; Stamatatos, 2009; Raghavan et al., 2010; Cheng et al., 2011; Stamatatos, 2017). More specifically, automatic gender attribution has reported many successes (Koppel et al., 2002; Koolen and van Cranenburgh, 2017; Qian, 2019), often driven by the fact that authors of specific genders tend to prefer producing content about topics that belie those genders (Sarawgi et al., 2011). Given this, we might additionally expect differences between genders along our Speaking As and Speaking About dimensions to interact, further motivating them as separate dimensions.

[3] We will interchangeably use the terms speaker and author here to refer to a creator of textual content throughout.

4 Creating Gender Classifiers

Previous work on gender bias classification has been predominantly single-task, often supervised on the task of analogy, and has relied mainly on word lists that are binarily (Bolukbasi et al., 2016; Zhao et al., 2018b, 2019; Gonen and Goldberg, 2019), and sometimes also explicitly (Caliskan et al., 2017; Hoyle et al., 2019), gendered. While wordlist-based approaches provided a solid start on attacking the problem of gender bias, they are insufficient for multiple reasons. First, they conflate different conversational dimensions of gender bias, and are therefore unable to detect the subtle pragmatic differences that are of interest here. Further, all existing gendered word lists for English are limited, by construction, to explicitly binarily gendered words (e.g., sister vs. brother). Not only is binary gender wholly inadequate for the task, but restricting to explicitly gendered words is itself problematic, since we know that many words are not explicitly gendered but are strongly statistically gendered (see Table 1). Rather than solely relying on a brittle binary gender label from a global word list, our approach also allows for gender bias to be determined flexibly over multiple words in context. (This will be crucial for examples that only receive gendered interpretations in a particular context; for example, 'bag' disparagingly refers to an elderly woman, but only in the context of 'old', and 'cup' hints at masculine gender only in the context of 'wear'.)

Instead, we develop classifiers that can decompose gender bias over full sentences into semantic and/or pragmatic dimensions (about/to/as), additionally including gender information that (i) falls outside the male-female binary, (ii) can be contextually determined, and (iii) is statistically as opposed to explicitly gendered. In the subsequent sections, we provide details for training these classifiers as well as details regarding the annotation of data for such training.

Table 2: Dataset Statistics. Dataset statistics on the eight training datasets and the new evaluation dataset, MDGENDER, with respect to each label (M = masculine, F = feminine, N = neutral, U = unknown).

Dataset        M     F     N     U     Dim
Training Data
  Wikipedia    10M   1M    1M    -     ABOUT
  Image Chat   39K   15K   154K  -     ABOUT
  Funpedia     19K   3K    1K    -     ABOUT
  Wizard       6K    1K    1K    -     ABOUT
  Yelp         1M    1M    -     -     AS
  ConvAI2      22K   22K   -     86K   AS
  ConvAI2      22K   22K   -     86K   TO
  OpenSub      149K  69K   -     131K  AS
  OpenSub      95K   45K   -     209K  TO
  LIGHT        13K   8K    -     83K   AS
  LIGHT        13K   8K    -     83K   TO
Evaluation Data
  MDGENDER     384   401   -     -     ABOUT
  MDGENDER     396   371   -     -     AS
  MDGENDER     411   382   -     -     TO

4.1 Models

We outline how these classifiers are trained to predict gender bias along the three dimensions, providing details of the classifier architectures as well as how the data labels are used. We train single-task and multi-task classifiers for different purposes: the former leverage gender information from each contextual dimension individually, and the latter should have broad applicability across all three.

Single Task Setting. In the single-task setting, we predict masculine, feminine, or neutral for each dimension, allowing the classifier to predict any of the three labels for the unknown category.

Multitask Setting. To obtain a classifier capable of multi-tasking across the about/to/as dimensions, we train a model to score and rank a set of possible classes given textual input. For example, given Hey, John, I'm Jane!, the model is trained to rank elements of both the sets {TO:masculine, TO:feminine, TO:neutral} and {AS:masculine, AS:feminine, AS:neutral} and produce the appropriate labels TO:masculine and AS:feminine. Models are trained and evaluated on the annotated datasets.
Model Architectures. For single-task and multitask models, we use a pretrained Transformer (Vaswani et al., 2017) to find representations for the textual input and the set of classes. Classes are scored, and then ranked, by taking a dot product between the representations of the textual input and a given class, following the bi-encoder architecture (Humeau et al., 2019) trained with cross entropy. The same architecture and pre-training as BERT (Devlin et al., 2018) are used throughout. We use ParlAI for model training (Miller et al., 2017). We will release data and models.
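The scoring step can be sketched as follows. This is an illustration of the bi-encoder idea rather than the released ParlAI implementation: encoder stands in for the pretrained Transformer (assumed to return one pooled vector per input) and tokenize is an assumed helper.

import torch
import torch.nn as nn

class BiEncoderScorer(nn.Module):
    """Score candidate classes via dot products of Transformer representations."""

    def __init__(self, encoder: nn.Module, tokenize):
        super().__init__()
        self.encoder = encoder    # pretrained Transformer: token ids -> (1, dim)
        self.tokenize = tokenize  # str -> LongTensor of token ids, shape (1, seq_len)

    def score(self, text: str, candidate_classes: list) -> torch.Tensor:
        text_repr = self.encoder(self.tokenize(text))                        # (1, dim)
        class_reprs = torch.cat(
            [self.encoder(self.tokenize(c)) for c in candidate_classes], 0)  # (n, dim)
        return (text_repr @ class_reprs.t()).squeeze(0)                      # (n,)

    def predict(self, text: str, candidate_classes: list) -> str:
        return candidate_classes[int(self.score(text, candidate_classes).argmax())]

# e.g. scorer.predict("Hey, John, I'm Jane!",
#                     ["TO:masculine", "TO:feminine", "TO:neutral"])
# Training minimizes cross entropy over these candidate scores.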

Model              About                To                   As                   All Avg.
                   Avg.    M      F     Avg.    M      F     Avg.    M      F
SingleTask ABOUT   70.43   63.54  77.31 44.44   36.25  52.62 67.75   69.19  66.31 60.87
SingleTask TO      50.12   99.74  0.5   49.39   95.38  3.4   50.41   100    0.81  49.97
SingleTask AS      46.97   51.3   42.4  57.27   67.15  47.38 78.21   70.71  85.71 60.82
MultiTask          62.59   64.32  60.85 78.25   73.24  83.25 72.15   66.67  77.63 67.13

Table 3: Accuracy on the novel evaluation dataset MDGENDER, comparing single-task classifiers to our multitask classifier. We report accuracy on the masculine and the feminine classes, as well as the average of these two metrics. Finally, we report the average (of the M-F averages) across the three dimensions. MDGENDER was collected to enable evaluation on the masculine and feminine classes, for which much of the training data is noisy.
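For reference, the per-class numbers in Table 3 correspond to a computation along these lines (a hypothetical helper we write for illustration, not the released evaluation code):

def per_class_accuracy(predictions, gold_labels, classes=("masculine", "feminine")):
    """Accuracy restricted to examples with a given gold class, plus the
    average of those per-class accuracies (the 'Avg.' columns in Table 3)."""
    accs = {}
    for cls in classes:
        pairs = [(p, g) for p, g in zip(predictions, gold_labels) if g == cls]
        accs[cls] = sum(p == g for p, g in pairs) / len(pairs) if pairs else 0.0
    accs["avg"] = sum(accs[cls] for cls in classes) / len(classes)
    return accs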

Model Labels. Many of our annotated datasets contain cases where the ABOUT, AS, or TO labels are unknown. We retain these examples during training, but use two techniques to handle them. If the true label is unknown (for example, in Wikipedia, we do not know the gender of the author, so the AS dimension is unknown), we either impute it or provide a label at random. For data for which the ABOUT label is unknown, we impute it using a classifier trained only on data for which this label is present. For data for which the TO or AS label is unknown, we provide a label at random, choosing between masculine and feminine. From epoch to epoch, we switch these arbitrarily assigned labels so that the model learns to assign the masculine and feminine labels with roughly equal probability to examples for which the gender is unknown. This label flipping allows us to retain greater quantities of data by preserving unknown samples. Additionally, we note that during training, we balance the data across the masculine, feminine, and neutral classes by oversampling from classes with fewer examples. We do this because much of the data is highly imbalanced: for example, over 80% of examples from Wikipedia are labeled masculine (Table 2). We also early stop on the average accuracy across all three classes.
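A minimal sketch of this label handling, under our own assumptions about the data format (a dict with id, text, and per-dimension labels) and a hypothetical about_classifier trained only on examples whose ABOUT label is present:

import random

def arbitrary_binary_label(example_id, dim, epoch):
    """Fixed pseudo-random masculine/feminine choice per example, flipped
    from epoch to epoch so neither label dominates unknown-gender text."""
    base = random.Random(f"{example_id}:{dim}").random() < 0.5
    flipped = base ^ (epoch % 2 == 1)
    return "masculine" if flipped else "feminine"

def fill_unknown_labels(example, about_classifier, epoch):
    labels = dict(example["labels"])  # e.g. {"about": None, "to": None, "as": "feminine"}
    if labels.get("about") is None:
        # impute ABOUT with a classifier trained only on data that has this label
        labels["about"] = about_classifier(example["text"])
    for dim in ("to", "as"):
        if labels.get(dim) is None:
            labels[dim] = arbitrary_binary_label(example["id"], dim, epoch)
    return labels

# During training, classes are additionally balanced by oversampling the
# smaller masculine/feminine/neutral classes, as described above.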
We use data from multiple domains to represent Multi-Task 87.4 86.65 55.2 77.22 different styles of text—from formal writing to chitchat—and different vocabularies. Further, sev- Wikipedia Only 88.65 88.22 68.58 81.82 eral datasets are known to contain statistical im- -gend words 86.94 74.62 74.33 78.63 balances and biases with regards to how people -gend words and names 82.10 82.52 55.21 73.28 of different genders are described and represented, Table 5: Ablation of gender classifiers on the such as Wikipedia and LIGHT. Table 2 presents Wikipedia test set. We report the model accuracy on dataset statistics; the full detailed descriptions and the masculine, feminine, and neutral classes, as well as more information on how labels were inferred or the average accuracy across them. We train classifiers imputed in Appendix A. (1) on the entire text (2) after removing explicitly gen- Some of the datasets contain gender annotations dered words using a word list and (3) after removing provided by existing work. For example, classifiers gendered words and names. While masking out gen- trained for style transfer algorithms have previously dered words and names makes classification more chal- lenging, the model still obtains high accuracy. annotated the gender of Yelp reviewers (Subrama- nian et al., 2018). In other datasets, we infer the gender labels. For example, in datasets where users utterance to make it very clear that they are speak- are first assigned a persona to represent before ing ABOUT a man or a woman, speaking AS a man chatting, often the gender of the persona is pre- or a woman, and speaking TO a man or a woman. determined. In some cases gender annotations are For example, given the utterance Hey, how are you not provided. In these cases, we sometimes impute today? I just got off work, a valid rewrite to make the label if we are able to do so with high confi- the utterance ABOUT a woman could be: Hey, I dence. More details regarding how this is done can went for a coffee with my friend and her dog as the be found in Appendix A. her indicates a woman. A rewrite such as I went for a coffee with my friend is not acceptable as it does Collected Evaluation Dataset. We use a variety not mention that the friend is a woman. After each of datasets to train classifiers so they can be reliable rewritten utterance, evaluators label how confident on all dimensions across multiple domains. How- they are that someone else would predict that the ever, this weakly supervised data provides some- text is spoken about, spoken as, or spoken to a man what noisy training signal – particularly for the or woman. For the rewritten utterance I just got masculine and feminine classes – as the labels are back from football practice, many people would automatically annotated or inferred. To enable re- guess that the utterance was said by a man, as more liable evaluation, we collect a specialized corpus, men play football then women, but one cannot be MDGENDER, which acts as a gold-labeled dataset. certain (as women also play or coach football). An First, we collect conversations between two example instance of the task is shown in Table 9 speakers. Each speaker is provided with a per- and the interface is shown in Appendix Figure2. sona description containing gender information, then tasked with adopting that persona and having 5 Results a conversation.5 They are also provided with small sections of a biography from Wikipedia as the con- 5.1 about/to/as Gender Classification versation topic. 
5 Results

5.1 about/to/as Gender Classification

Quality of Classification Models. We compare models that classify along a single dimension to one that multitasks across all three. To enable high quality evaluation along our proposed three dimensions, we use MDGENDER to evaluate. We measure the percentage accuracy for the masculine, feminine, and neutral classes. We do not evaluate on the unknown class, as it is not modeled. Classifier results on MDGENDER are shown in Table 3.

We find that the multitask classifier has the best average performance across all dimensions, with a small hit to single-task performance in the ABOUT and AS dimensions. As expected, the single-task models are unable to transfer to other dimensions: this is another indication that gender information manifests differently along each dimension. Training for a single task allows models to specialize to detect and understand the nuances of text that indicate bias along one of the dimensions. However, in a multitask setting, models see additional data along the other dimensions and can possibly learn to generalize to understand what language characterizes bias across multiple dimensions.

Performance by Dataset. The gender classifiers along the TO, AS, and ABOUT dimensions are trained on a variety of different existing datasets across multiple domains. We analyze which datasets are the most difficult to classify correctly in Table 4. We find that ABOUT is the easiest dimension, particularly data from Wikipedia or based on Wikipedia, such as Funpedia and Wizard of Wikipedia, achieving almost 80% accuracy. The TO and AS directions are both more difficult, likely as they involve more context clues rather than relying on textual attributes and surface forms such as she and he to predict correctly. We find that generally the datasets have similar performance, except Yelp restaurant reviews, which has a 70% accuracy on predicting AS.
Analysis of Classifier Performance. We break down choices made during classifier training by comparing different models on Wikipedia (ABOUT dimension). We train a single classifier for ABOUT, with variations that mask out gendered words and names. As gendered words such as her and names are very correlated with gender, masking can force models into a more challenging but nuanced setting where they must learn to detect bias from the remaining text. We present the results in Table 5. As expected, masking out gendered words and names makes it harder to classify the text, but the model is still able to obtain high accuracy.

Table 5: Ablation of gender classifiers on the Wikipedia test set. We report the model accuracy on the masculine, feminine, and neutral classes, as well as the average accuracy across them. We train classifiers (1) on the entire text, (2) after removing explicitly gendered words using a word list, and (3) after removing gendered words and names. While masking out gendered words and names makes classification more challenging, the model still obtains high accuracy.

Model                      M      F      N      Avg.
Multi-Task                 87.4   86.65  55.2   77.22
Wikipedia Only             88.65  88.22  68.58  81.82
  -gend words              86.94  74.62  74.33  78.63
  -gend words and names    82.10  82.52  55.21  73.28

6 Applications

We demonstrate the broad utility of our multi-task classifier by applying it to three different downstream applications. First, we show that we can use the classifier to control the genderedness of generated text. Next, we demonstrate its utility in biased text detection by applying it to Wikipedia to find the most gendered biographies. Finally, we evaluate our classifier on an offensive text detection dataset to explore the interplay between offensive content and genderedness.

6.1 Controllable Generation

By learning to associate control variables with textual properties, generative models can be controlled at inference time to adjust the generated text based on the desired properties of the user. This has been applied to a variety of different cases, including generating text of different lengths (Fan et al., 2017), generating questions in chit-chat (See et al., 2019), and reducing bias (Dinan et al., 2019a). Previous work in gender bias used word lists to control bias, but found that word lists were limited in coverage and applicability to a variety of domains (Dinan et al., 2019a). However, by decomposing bias along the TO, AS, and ABOUT dimensions, fine-grained control models can be trained to control these different dimensions separately. This is important in various applications; for example, one may want to train a chatbot with a specific personality, leaving the AS dimension untouched, but want the bot to speak to and about everyone in a similar way. In this application, we train three different generative models, each of which controls generation for gender along one of the TO, AS, and ABOUT dimensions.

Methods. We generate training data by taking the multi-task classifier and using it to classify 250,000 textual utterances from Reddit, using a previously existing dataset extracted and obtained by a third party and made available on pushshift.io. This dataset was chosen as it is conversational in nature, but not one of the datasets that the classifier was trained on. We then use the labels from the classifier to prepend the utterances with tokens that indicate the gender label along the dimension. For example, for the ABOUT dimension, we prepend utterances with a control token of the form ABOUT:label, where label denotes the label assigned to the utterance by the classifier. At inference time, we choose control tokens to manipulate the text generated by the model.

We also compare to a baseline for which the control tokens are determined by a word list: if an utterance contains more masculine-gendered words than feminine-gendered words from the word list, it is labeled as masculine (and vice versa for feminine); if it contains no gendered words or an equal number of masculine and feminine gendered words, it is labeled as neutral. Following Dinan et al. (2019a), we use several existing word lists (Zhao et al., 2018b, 2019; Hoyle et al., 2019).

For training, we fine-tune a large Transformer sequence-to-sequence model pretrained on Reddit. At inference time, we generate text via top-k sampling (Fan et al., 2018), with k = 10, a beam size of 10, and 3-gram blocking. We force the model to generate a minimum of 20 BPE tokens.
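A sketch of the control-token setup just described, using hypothetical helpers (the released models were trained with ParlAI on a Reddit-pretrained sequence-to-sequence model):

def add_control_token(utterance: str, dimension: str, classifier) -> str:
    """Prepend a control token such as 'ABOUT:feminine', predicted by the
    multitask classifier, so the generative model learns to associate the
    token with the genderedness of its training text."""
    label = classifier(utterance, dimension)  # e.g. "feminine"
    return f"{dimension.upper()}:{label} {utterance}"

# Training: classify ~250K conversational utterances, prepend the resulting
# tokens, and fine-tune one generative model per dimension (TO, AS, ABOUT).
# Inference: prepend the desired token (e.g. "ABOUT:masculine") to the input
# and decode with top-k sampling (k = 10), beam size 10, and 3-gram blocking.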
Qualitative Results. Example generations from various control tokens (as well as the word list baseline) are shown in Table 10 in the Appendix. These examples illustrate how controlling for gender over different dimensions yields extremely varied responses, and why limiting control to word lists may not be enough to capture these different aspects of gender. For example, adjusting AS to 'feminine' causes the model to write text such as Awwww, that sounds wonderful, whereas setting AS to masculine generates You can do it bro!

Quantitative Results. Quantitatively, we evaluate by generating 1000 utterances seeded from ConvAI2 using both masculine and feminine control tokens and counting the number of gendered words from a gendered word list that appear in the generated text. Results are shown in Table 6. Utterances generated using ABOUT control tokens contain many more gendered words. One might expect this, as when one speaks about another person, one may refer to them using gendered pronouns. We observe that for the control tokens TO:feminine and AS:feminine, the utterances contain a roughly equal number of masculine-gendered and feminine-gendered words. This is likely due to the distribution of such gendered words in the training data for the classifier in the TO and AS dimensions. The ConvAI2 and OpenSubtitles data show similar trends: on the ConvAI2 data, fewer than half of the gendered words in SELF:feminine utterances are feminine-gendered, and on the OpenSubtitles data, the ratio drops to one-third.[6] By design, the word list baseline has the best control over whether the generations contain words from this word list. These results, as well as the previously described qualitative results, demonstrate why evaluating and controlling with word lists is insufficient: word lists do not capture all aspects of gender.

[6] The OpenSubtitles data recalls the Bechdel test, which asks "whether a work [of fiction] features at least two women who talk to each other about something other than a man" (Wikipedia contributors, 2020).

Table 6: Word statistics measured on text generated from 1000 different seed utterances from ConvAI2 for each control token, as well as for our baseline model trained using word lists. We measure the number of gendered words (from a word list) that appear in the generated text as well as the percentage of masculine-gendered words among all gendered words. Sequences are generated with top-k sampling, k = 10, with a beam size of 10 and 3-gram blocking.

Control Token          # Gend. words   Pct. masc.
TO:feminine            246             48.0
AS:feminine            227             51.0
ABOUT:feminine         1151            19.72
Word list, feminine    1158            18.22
TO:masculine           372             75.0
AS:masculine           402             71.6
ABOUT:masculine        800             91.62
Word list, masculine   1459            94.8
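The Table 6 statistics boil down to a count over a gendered word list; a small sketch (the word lists themselves come from the cited resources and are assumed as inputs):

def generation_gender_stats(generations, masculine_words, feminine_words):
    """Count gendered words in a batch of generations and the share of
    them that are masculine, as reported in Table 6."""
    n_masc = n_fem = 0
    for text in generations:
        for token in text.lower().split():
            n_masc += token in masculine_words
            n_fem += token in feminine_words
    total = n_masc + n_fem
    return {"gendered_words": total,
            "pct_masculine": 100.0 * n_masc / total if total else 0.0}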
6.2 Bias Detection

Creating classifiers along different dimensions can be used to detect gender bias in any form of text, beyond dialogue itself. We investigate using the trained classifiers to detect the most gendered sentences and paragraphs in various documents, and analyze what portions of the text drive the classification decision. Such methods could be very useful in practical applications such as detecting, removing, and rewriting biased writing.

Methods. We apply our classification models by detecting the most gendered biographies in Wikipedia. We use the multitask model to score each paragraph among a set of 65,000 Wikipedia biographies, where the score represents the probability that the paragraph is masculine in the ABOUT dimension. We calculate a masculine genderedness score for the page by taking the median among all paragraphs in the page.
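A sketch of the page-level score, assuming a hypothetical prob_masculine_about helper that exposes the classifier's probability for the ABOUT:masculine class:

import statistics

def masculine_genderedness(page_paragraphs, prob_masculine_about):
    """Median over paragraphs of P(ABOUT:masculine) for one Wikipedia page,
    used to rank biographies from most feminine- to most masculine-gendered."""
    scores = [prob_masculine_about(paragraph) for paragraph in page_paragraphs]
    return statistics.median(scores)

# most_masculine_bio = max(biographies, key=lambda paragraphs:
#                          masculine_genderedness(paragraphs, prob_masculine_about))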

Quantitative Results. We report the average and median masculine genderedness scores for all biographies in the set of 65,000 that fit this criterion, and for biographies of men and women, in Table 8. We observe that while, on average, the biographies skew largely toward masculine (the average score is 0.74), the classifier is more confident in the femininity of pages about women than it is in the masculinity of pages about men: the average feminine genderedness score for pages about women is 1 − 0.042 = 0.958, while the average masculine genderedness score for pages about men is 0.90. This might suggest that biographies about women contain more gendered text on average.

Table 8: Masculine genderedness scores of Wikipedia bios. We calculate a masculine genderedness score for a Wikipedia page by taking the median p_x = P(x ∈ ABOUT:masculine) among all paragraphs x in the page, where P is the probability distribution given by the classifier. We report the average and median scores for all biographies, as well as for biographies of men and women respectively.

Biographies   Average   Median
All           0.74      0.98
Men           0.90      0.99
Women         0.042     0.00085

Qualitative Results. We show the pages, containing a minimum of 25 paragraphs, with the minimum score (most feminine-gendered biographies) and the maximum score (most masculine-gendered biographies) in Table 11 in the Appendix. We observe that the most masculine-gendered biographies are mostly composers and conductors, likely due to the historical gender imbalance in these occupations. Amongst the most feminine-gendered biographies, there are many popular actresses from the mid-20th century. By examining the most gendered paragraph in these biographies, anecdotally we find these are often the paragraphs describing the subject's life after retirement. For example, the most gendered paragraph in Linda Darnell's biography contains the line Because of her then-husband, Philip Liebmann, Darnell put her career on a hiatus, which clearly reflects negative societal stereotypes about the importance of women's careers (Hiller and Philliber, 1982; Duxbury and Higgins, 1991; Pavalko and Elder Jr, 1993; Byrne and Barling, 2017; Reid, 2018).

6.3 Offensive Content

Finally, the interplay and correlation between gendered text and offensive text is an interesting area for study, as many examples of gendered text, be they explicitly or contextually gendered, are disparaging or have negative connotations (e.g., "cat fight" and "doll"). There is a growing body of research on detecting offensive language in text. In particular, there has been recent work aimed at improving the detection of offensive language in the context of dialogue (Dinan et al., 2019b). We investigate this relationship by examining the distribution of labels output by our gender classifier on data that is labeled for offensiveness.

Methods. For this application, we use the Standard training and evaluation dataset created and described in Dinan et al. (2019b). We examine the relationship between genderedness and offensive utterances by labeling the gender of utterances (along the three dimensions) in both the "safe" and "offensive" classes in this dataset using our multitask classifier. We then measure the ratio of utterances labeled as masculine-gendered among utterances labeled as either masculine- or feminine-gendered.

Quantitative Results. Results are shown in Table 7. We observe that, on the self and partner dimensions, the safe data is more likely to be labeled as feminine and the offensive data is more likely to be labeled as masculine. We test the hypothesis that these distributions are unequal using a t-test, and find that these results are significant.

Table 7: Genderedness of offensive content. We measure the percentage of utterances in both the "safe" and "offensive" classes that are classified as masculine-gendered, among utterances that are classified as either masculine- or feminine-gendered. We test the hypothesis that the safe and offensive classes' distributions of masculine-gendered utterances differ using a t-test and report the p-value for each dimension.

Percentage of masculine-gendered text
Dim     Safe    Offensive   t-statistic   p-value
ABOUT   81.03   70.66       5.49          5.19e-08
TO      44.68   60.15       -22.02        1.94e-46
AS      42.29   65.12       -14.56        1.05e-99
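The significance test in Table 7 can be run along these lines with SciPy; the two input lists (one indicator per utterance that was classified masculine or feminine) are hypothetical inputs:

from scipy import stats

def compare_masculine_rates(safe_is_masculine, offensive_is_masculine):
    """Two-sample t-test on the rate of masculine-gendered utterances in the
    safe vs. offensive classes (1 = classified masculine, 0 = feminine;
    utterances classified neutral are excluded)."""
    return stats.ttest_ind(safe_is_masculine, offensive_is_masculine)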
Qualitative Results. To explore how offensive content differs when it is ABOUT women and ABOUT men, we identified utterances for which the model had high confidence (probability > 0.70) that the utterance was feminine or masculine along the ABOUT dimension. After excluding stop words and words shorter than three characters, we hand-annotated the top 20 most frequent words as being explicitly gendered, a swear word, and/or bearing sexual connotation. For words classified as masculine, 25% of the masculine words fell into these categories, whereas for words classified as feminine, 75% of the words fell into these categories.

7 Conclusion

We propose a general framework for analyzing gender bias in text by decomposing it along three dimensions: (1) the gender of the person or people being spoken about (ABOUT), (2) the gender of the addressee (TO), and (3) the gender of the speaker (AS). We show that classifiers can detect bias along each of these dimensions. We annotate eight large existing datasets along our dimensions, and also contribute a high quality evaluation dataset for this task. We demonstrate the broad utility of our classifiers by showing strong performance on controlling bias in generated dialogue, detecting genderedness in text such as Wikipedia, and highlighting gender differences in offensive text classification.

References

Fran Amery, Stephen Bates, Laura Jenkins, and Heather Savigny. 2015. Metaphors on women in academia: A review of the literature, 2004-2013. At the Center: Feminism, Social Science and Knowledge, 20:247–267.

Shlomo Argamon, Moshe Koppel, James W Pennebaker, and Jonathan Schler. 2009. Automatically profiling the author of an anonymous text. Communications of the ACM, 52(2):119–123.

David Bamman and Noah A Smith. 2014. Unsupervised discovery of biographical structure from text. Transactions of the Association for Computational Linguistics, 2:363–376.

Mahzarin R. Banaji and Deborah A. Prentice. 1994. The self in social contexts. Annual Review of Psychology, 45(1):297–332.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2020. Fairness in Machine Learning: Limitations and Opportunities.

Christine Basta, Marta R Costa-jussà, and Noe Casas. 2019. Evaluating the underlying gender bias in contextualized word embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing.

Allan Bell. 1984. Language style as audience design. Language in Society, 13(2):145–204.

Allan Bell and Gary Johnson. 1997. Towards a sociolinguistics of style. University of Pennsylvania Working Papers in Linguistics, 4(1):2.

Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357.

Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, pages 77–91, New York, NY, USA. PMLR.

Kay Bussey. 1986. The first socialization. In Australian Women: New Feminist Perspectives, pages 90–104. Oxford University Press.

Judith Butler. 1990. Gender Trouble, Feminist Theory, and Psychoanalytic Discourse. Routledge, New York.

Alyson Byrne and Julian Barling. 2017. When she brings home the job status: Wives' job status, status leakage, and marital instability. Organization Science, 28(2):177–192.

Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Deborah Cameron. 1990. The Feminist Critique of Language: A Reader.

Yang Trista Cao and Hal Daumé. 2019. Toward gender-inclusive coreference resolution. arXiv preprint arXiv:1910.13913.

Kaytlin Chaloner and Alfredo Maldonado. 2019. Measuring gender bias in word embeddings across domains and discovering new gender bias word categories. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 25–32.

Kai-Wei Chang, Vinod Prabhakaran, and Vicente Ordonez. 2019. Bias and fairness in natural language processing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts, Hong Kong, China. Association for Computational Linguistics.

Christine Charyton and Glenn E Snelbecker. 2007. Engineers' and musicians' choices of self-descriptive adjectives as potential indicators of creativity by gender and domain. Psychology of Aesthetics, Creativity, and the Arts, 1(2):91.

Na Cheng, Rajarathnam Chandramouli, and KP Subbalakshmi. 2011. Author gender identification from text. Digital Investigation, 8(1):78–88.

Jennifer Coates. 2015. Women, Men and Language: A Sociolinguistic Account of Gender Differences in Language. Routledge.

Marta R Costa-jussà. 2019. An analysis of gender bias studies in natural language processing. Nature Machine Intelligence, pages 1–2.

Mary Crawford. 1995. Talking Difference: On Gender and Language. Sage.

Kimberle Crenshaw. 1989. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, page 139.

Stefania Degaetano-Ortlieb. 2018. Stylistic variation over 200 years of court proceedings according to gender and social class. In Proceedings of the Second Workshop on Stylistic Variation, pages 1–10, New Orleans. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, and Jason Weston. 2019a. Queens are powerful too: Mitigating gender bias in dialogue generation.

Emily Dinan, Samuel Humeau, Bharath Chintagunta, and Jason Weston. 2019b. Build it break it fix it for dialogue safety: Robustness from adversarial human attack. arXiv preprint arXiv:1908.06083.

Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, et al. 2019c. The second conversational intelligence challenge (ConvAI2). arXiv preprint arXiv:1902.00098.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019d. Wizard of Wikipedia: Knowledge-powered conversational agents. In Proceedings of the International Conference on Learning Representations (ICLR).

Yupei Du, Yuanbin Wu, and Man Lan. 2019. Exploring human gender stereotypes with word association test. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6135–6145.

Linda E Duxbury and Christopher A Higgins. 1991. Gender differences in work-family conflict. Journal of Applied Psychology, 76(1):60.

Penelope Eckert and Sally McConnell-Ginet. 1992. Communities of practice: Where language, gender and power all live. In Locating Power: Proceedings of the Second Berkeley Women and Language Conference, volume 1, pages 89–99. Berkeley, CA: Berkeley University.

Penelope Eckert and Sally McConnell-Ginet. 2013. Language and Gender. Cambridge University Press.

Penelope Eckert and John R Rickford. 2001. Style and Sociolinguistic Variation. Cambridge University Press.

Ali Emami, Paul Trichelair, Adam Trischler, Kaheer Suleman, Hannes Schulz, and Jackie Chi Kit Cheung. 2019. The KnowRef coreference corpus: Removing gender and number cues for difficult pronominal anaphora resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3952–3961.

Angela Fan, David Grangier, and Michael Auli. 2017. Controllable abstractive summarization. arXiv preprint arXiv:1711.05217.

Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833.

Almudena Fernández Fontecha and Rosa María Jiménez Catalán. 2003. Semantic derogation in animal metaphor: A contrastive-cognitive analysis of two male/female examples in English and Spanish. Journal of Pragmatics, 35(5):771–797.

Georgia Frantzeskou, Efstathios Stamatatos, Stefanos Gritzalis, and Sokratis Katsikas. 2006. Effective identification of source code authors using byte-level information. In Proceedings of the 28th International Conference on Software Engineering, pages 893–896.

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644.

Aparna Garimella, Carmen Banea, Dirk Hovy, and Rada Mihalcea. 2019. Women's syntactic resilience and men's grammatical luck: Gender bias in part-of-speech tagging and dependency parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3493–3498.

Danielle Gaucher, Justin Friesen, and Aaron C Kay. 2011. Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology, 101(1):109.

Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, et al. 2019. Towards understanding gender bias in relation extraction. arXiv preprint arXiv:1911.03642.

Hila Gonen and Yoav Goldberg. 2019. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 609–614, Minneapolis, Minnesota. Association for Computational Linguistics.

Hila Gonen, Yova Kementchedjhieva, and Yoav Goldberg. 2019. How does grammatical gender affect noun representations in gender-marking languages? arXiv preprint arXiv:1910.14161.

Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First women, second sex: Gender bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media, pages 165–174.

Bernard Guerin. 1994. Gender bias in the abstractness of verbs and adjectives. The Journal of Social Psychology, 134(4):421–428.

Dana V Hiller and William W Philliber. 1982. Predicting marital and career success among dual-worker couples. Journal of Marriage and the Family, pages 53–62.

Janet Holmes. 2013. An Introduction to Sociolinguistics. Routledge.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

Dirk Hovy, Federico Bianchi, and Tommaso Fornaciari. 2020. Can you translate that into man? Commercial machine translation systems include stylistic biases. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Dirk Hovy and Shannon L Spruit. 2016. The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 591–598.

Eduard Hovy. 1987. Generating natural language under pragmatic constraints. Journal of Pragmatics, 11(6):689–719.

Alexander Miserlis Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, and Ryan Cotterell. 2019. Unsupervised discovery of gendered language through latent-variable modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1706–1716, Florence, Italy. Association for Computational Linguistics.

Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2019. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv preprint arXiv:1905.01969.

Dell Hymes. 1974. Ways of speaking. In R. Bauman and J. Sherzer, editors, Explorations in the Ethnography of Speaking, volume 1, pages 433–451. Cambridge: Cambridge University Press.

David Jurgens, Saif Mohammad, Peter Turney, and Keith Holyoak. 2012. SemEval-2012 task 2: Measuring degrees of relational similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 356–364, Montréal, Canada. Association for Computational Linguistics.

Masahiro Kaneko and Danushka Bollegala. 2019. Gender-preserving debiasing for pre-trained word embeddings. arXiv preprint arXiv:1906.00742.

Dongyeop Kang, Varun Gangal, and Eduard Hovy. 2019. (male, bachelor) and (female, Ph.D) have different connotations: Parallelly annotated stylistic language dataset with multiple personas. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1696–1706, Hong Kong, China. Association for Computational Linguistics.

Maximilian Klein, Harsh Gupta, Vivek Rai, Piotr Konieczny, and Haiyi Zhu. 2016. Monitoring the gender gap with Wikidata human gender indicators. In Proceedings of the 12th International Symposium on Open Collaboration, pages 1–9.

Maximilian Klein and Piotr Konieczny. 2015. Wikipedia in the world of global gender inequality indices: What the biography gender gap is measuring. In Proceedings of the 11th International Symposium on Open Collaboration, pages 1–2.

Corina Koolen and Andreas van Cranenburgh. 2017. These are not the stereotypes you are looking for: Bias and fairness in authorial gender attribution. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, pages 12–22, Valencia, Spain. Association for Computational Linguistics.

Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni. 2002. Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4):401–412.

George Lakoff and Mark Johnson. 1980. Metaphors We Live By. Chicago, IL: University of Chicago.

Robin Lakoff. 1973. Language and woman's place. Language in Society, 2(1):45–79.

Robin Lakoff. 1990. Talking Power: The Politics of Language.

Nayeon Lee, Andrea Madotto, and Pascale Fung. 2019. Exploring social bias in chatbots using stereotype knowledge. In Proceedings of the 2019 Workshop on Widening NLP, pages 177–180.

Haley Lepp. 2019. Pardon the interruption: Automatic analysis of gender and competitive turn-taking in United States Supreme Court hearings. In Proceedings of the 2019 Workshop on Widening NLP, pages 143–145, Florence, Italy. Association for Computational Linguistics.

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles.

Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu, and Jiliang Tang. 2019. Does gender matter? Towards fairness in dialogue systems. CoRR, abs/1910.10486.

Kim Luyckx and Walter Daelemans. 2008. Authorship attribution and verification with many authors and limited data. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 513–520.

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It's all in the name: Mitigating gender bias with name-based counterfactual data substitution. CoRR, abs/1909.00871.

Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 622–628, Minneapolis, Minnesota. Association for Computational Linguistics.

Thomas Corwin Mendenhall. 1887. The characteristic curves of composition. Science, 9(214):237–249.

Alexander H Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, and Jason Weston. 2017. ParlAI: A dialog research software platform. arXiv preprint arXiv:1705.06476.

Sara Mills. 2014. Language and Gender: Interdisciplinary Perspectives. Routledge.

John Money and Anke A Ehrhardt. 1972. Man and Woman, Boy and Girl: Differentiation and Dimorphism of Gender Identity from Conception to Maturity.

Rosamund Moon. 2014. From gorgeous to grumpy: Adjectives, age and gender. Gender & Language, 8(1).

Frederick Mosteller and David L Wallace.

Eliza K Pavalko and Glen H Elder Jr. 1993. Women behind the men: Variations in wives' support of husbands' careers. Gender & Society, 7(4):548–567.

Jian Peng, Kim-Kwang Raymond Choo, and Helen Ashman. 2016. User profiling in intrusion detection: A review. Journal of Network and Computer Applications, 72:14–27.

Yusu Qian. 2019. Gender stereotypes differ between male and female writings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 48–53.

Yusu Qian, Urwa Muaz, Ben Zhang, and Jae Won Hyun. 2019. Reducing gender bias in word-level language models with a gender-equalizing loss function. arXiv preprint arXiv:1905.12801.

Sindhu Raghavan, Adriana Kovashka, and Raymond Mooney. 2010. Authorship attribution using probabilistic context-free grammars. In Proceedings of the ACL 2010 Conference Short Papers, pages 38–42, Uppsala, Sweden. Association for Computational Linguistics.

Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, and Yoav Goldberg. 2020. Null it out: Guarding protected attributes by iterative nullspace projection. arXiv.

Joseph Reagle and Lauren Rhue. 2011. Gender bias in Wikipedia and Britannica. International Journal of Communication, 5:21.

Erin M Reid. 2018. Straying from breadwinning: Status and money in men's interpretations of their wives' work arrangements. Gender, Work & Organization, 25(6):718–733.

John R Rickford and Faye McNair-Knox. 1994. Addressee- and topic-influenced style shift: A quantitative sociolinguistic study. Sociolinguistic Perspectives on Register, pages 235–276.

Anderson Rocha, Walter J Scheirer, Christopher W Forstall, Thiago Cavalcante, Antonio Theophilo, Bingyu Shen, Ariadne RB Carvalho, and Efstathios Stamatatos. 2016. Authorship attribution for social media forensics. IEEE Transactions on Information Forensics and Security, 12(1):5–33.

Rachel Rudinger, Chandler May, and Benjamin Van Durme. 2017. Social bias in elicited natural language inferences. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, pages 74–79, Valencia, Spain. Association for Computational Linguistics.

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A Smith. 2019a. The risk of racial bias
1984. Ap- in hate speech detection. In Proceedings of the plied Bayesian and classical inference: the case of 57th Annual Meeting of the Association for Compu- the Federalist papers. Springer Verlag. tational Linguistics, pages 1668–1678. Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Ju- 2019. Mitigating gender bias in natural language rafsky, Noah A Smith, and Yejin Choi. 2019b. processing: Literature review. In Proceedings of Social bias frames: Reasoning about social and the 57th Annual Meeting of the Association for Com- power implications of language. arXiv preprint putational Linguistics, pages 1630–1640, Florence, arXiv:1911.03891. Italy. Association for Computational Linguistics.

Ruchita Sarawgi, Kailash Gajulapalli, and Yejin Choi. Jane Sunderland. 2006. Language and gender: An ad- 2011. Gender attribution: Tracing stylometric evi- vanced resource book. Routledge. dence beyond topic and genre. In Proceedings of the Fifteenth Conference on Computational Natural Joan Swann. 1992. Girls, boys, and language. Black- Language Learning, pages 78–86, Portland, Oregon, well Publishers. USA. Association for Computational Linguistics. Mary Talbot. 2019. Language and gender. John Wiley Abigail See, Stephen Roller, Douwe Kiela, and Jason & Sons. Weston. 2019. What makes a good conversation? how controllable attributes affect human judgments. Rachael Tatman. 2017. Gender and dialect bias in arXiv preprint arXiv:1902.08654. YouTube’s automatic captions. In Proceedings of the First ACL Workshop on Ethics in Natural Lan- Sima Sharifirad, Alon Jacovi, Israel Bar Ilan Univesity, guage Processing, pages 53–59, Valencia, Spain. As- and Stan Matwin. 2019. Learning and understand- sociation for Computational Linguistics. ing different categories of sexism using convolu- tional neural networks filters. In Proceedings of the Frances Trix and Carolyn Psenka. 2003. Exploring the 2019 Workshop on Widening NLP, pages 21–23. color of glass: Letters of recommendation for fe- male and male medical faculty. Discourse & Soci- Sima Sharifirad and Stan Matwin. 2019. Using ety, 14(2):191–220. attention-based bidirectional lstm to identify differ- ent categories of offensive language directed toward Jack Urbanek, Angela Fan, Siddharth Karamcheti, female celebrities. In Proceedings of the 2019 Work- Saachi Jain, Samuel Humeau, Emily Dinan, Tim shop on Widening NLP, pages 46–48. Rocktaschel,¨ Douwe Kiela, Arthur Szlam, and Ja- son Weston. 2019. Learning to speak and act in Kurt Shuster, Samuel Humeau, Antoine Bordes, and a fantasy text adventure game. In Proceedings Jason Weston. 2018. Engaging image chat: Model- of the 2019 Conference on Empirical Methods in ing personality in grounded dialogue. arXiv preprint Natural Language Processing and the 9th Interna- arXiv:1811.00945. tional Joint Conference on Natural Language Pro- cessing (EMNLP-IJCNLP), pages 673–683, Hong E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 1999. Kong, China. Association for Computational Lin- Automatic authorship attribution. In Ninth Confer- guistics. ence of the European Chapter of the Association for Computational Linguistics, Bergen, Norway. Asso- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob ciation for Computational Linguistics. Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all Efstathios Stamatatos. 2009. A survey of modern au- you need. In Advances in neural information pro- thorship attribution methods. Journal of the Ameri- cessing systems, pages 5998–6008. can Society for information Science and Technology, 60(3):538–556. Claudia Wagner, David Garcia, Mohsen Jadidi, and Markus Strohmaier. 2015. It’s a man’s wikipedia? Efstathios Stamatatos. 2017. Authorship attribution us- assessing gender inequality in an online encyclope- ing text distortion. In Proceedings of the 15th Con- dia. In Ninth international AAAI conference on web ference of the European Chapter of the Association and social media. for Computational Linguistics: Volume 1, Long Pa- pers, pages 1138–1149, Valencia, Spain. Associa- Claudia Wagner, Eduardo Graells-Garrido, David Gar- tion for Computational Linguistics. cia, and Filippo Menczer. 2016. 
Women through the glass ceiling: gender asymmetries in wikipedia. Gabriel Stanovsky, Noah A Smith, and Luke Zettle- EPJ Data Science, 5(1):5. moyer. 2019. Evaluating gender bias in machine translation. arXiv preprint arXiv:1906.00591. Ann Weatherall. 2002. Gender, language and dis- course. Psychology Press. Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Candace West and Don H Zimmerman. 1987. Doing Marc’Aurelio Ranzato, and Y-Lan Boureau. gender. Gender & society, 1(2):125–151. 2018. Multiple-attribute text style transfer. arXiv preprint arXiv:1811.00552. Eunike Wetzel, Benedikt Hell, and Katja Passler.¨ 2012. Comparison of different test construction strategies Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, in the development of a gender fair interest inven- Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth tory using verbs. Journal of Career Assessment, Belding, Kai-Wei Chang, and William Yang Wang. 20(1):88–104. Wikipedia contributors. 2020. Bechdel test — Wikipedia, the free encyclopedia. [Online; accessed 3-April-2020]. Myron Wish, Morton Deutsch, and Susan J Kaplan. 1976. Perceived dimensions of interpersonal rela- tions. Journal of Personality and social Psychology, 33(4):409. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual atten- tion. In International conference on machine learn- ing, pages 2048–2057. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cot- terell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender bias in contextualized word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 629–634, Minneapolis, Minnesota. Association for Computa- tional Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Or- donez, and Kai-Wei Chang. 2018a. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, Louisiana. Association for Computa- tional Linguistics. Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai- Wei Chang. 2018b. Learning gender-neutral word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Process- ing, pages 4847–4853, Brussels, Belgium. Associa- tion for Computational Linguistics. Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang. 2019. Examining gender bias in languages with grammatical gender. In Proceedings of the 2019 Conference on Empirical Methods in Natu- ral Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5275–5283, Hong Kong, China. Association for Computational Linguistics. Ran Zmigrod, Sebastian J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data aug- mentation for mitigating gender stereotypes in lan- guages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Com- putational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics. A Existing Data Annotation personas contain sentences such as I am a old woman or My name is Bob which allows an- We describe in more detail how each of the eight notators to annotate the gender of the speaker training datasets is annotated: (AS) and addressee (TO) with some confidence. 1. Wikipedia - to annotate ABOUT, we use a Many of the personas have unknown gender. Wikipedia dump and extract biography pages. We impute ABOUT labels on this dataset using We identify biographies using named entity a classifier trained on the datasets 1-4. recognition applied to the title of the page 7. OpenSubtitiles - OpenSubtitles9 (Lison and (Honnibal and Montani, 2017). We label Tiedemann, 2016) contains subtitles for pages with a gender based on the number of movies in different languages. We retain En- gendered pronouns (he vs. she vs. they) and glish subtitles that contain a character name or label each paragraph in the page with this la- identity. We annotate the character’s gender bel for the ABOUT dimension.7 Wikipedia is using gender kinship terms such as daugh- well known to have gender bias in equity of ter and gender probability distribution calcu- biographical coverage and lexical bias in noun lated by counting the masculine and feminine references to women (Reagle and Rhue, 2011; names of baby names in the United States10. Graells-Garrido et al., 2015; Wagner et al., Using the character’s gender, we get labels for 2015; Klein and Konieczny, 2015; Klein et al., the AS dimension. We get labels for the TO 2016; Wagner et al., 2016), making it an inter- dimension by taking the gender of the next esting test bed for our investigation. character to speak if there is another utter- 2. Funpedia - Funpedia (Miller et al., 2017) con- ance in the conversation; otherwise, we take tains rephrased Wikipedia sentences in a more the gender of the last character to speak. We conversational way. We retain only biogra- impute ABOUT labels on this dataset using a phy related sentences and annotate similar to classifier trained on the datasets 1-4. 
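The pronoun-count labeling in item 1 is simple enough to sketch in a few lines. The snippet below is our own illustrative reconstruction, not the released implementation: the exact pronoun inventory, the tie-breaking rule, and the choice of spaCy's small English model for the biography check are assumptions on our part.

```python
# Illustrative sketch of the ABOUT-labeling heuristic from Appendix A, item 1.
# Assumptions (not from the paper): the pronoun inventory, the tie-breaking
# rule, and the spaCy model used for the biography check.
import re
from collections import Counter

import spacy  # used only for the NER-based biography check

MASC = {"he", "him", "his", "himself"}
FEM = {"she", "her", "hers", "herself"}
NEUT = {"they", "them", "their", "theirs", "themselves"}


def looks_like_biography(title: str, nlp) -> bool:
    """Rough biography filter: the page title parses as a PERSON entity."""
    return any(ent.label_ == "PERSON" for ent in nlp(title).ents)


def about_label(paragraph: str) -> str:
    """Label a paragraph masculine/feminine/neutral by raw pronoun counts,
    with no scaling; ties or pronoun-free text fall back to neutral."""
    counts = Counter(re.findall(r"[a-z']+", paragraph.lower()))
    masc = sum(counts[w] for w in MASC)
    fem = sum(counts[w] for w in FEM)
    neut = sum(counts[w] for w in NEUT)
    if masc > fem and masc > neut:
        return "masculine"
    if fem > masc and fem > neut:
        return "feminine"
    return "neutral"


if __name__ == "__main__":
    nlp = spacy.load("en_core_web_sm")
    if looks_like_biography("Maureen O'Hara", nlp):
        print(about_label("She was an Irish actress and singer. Her career began ..."))
```

In this sketch, ties and pronoun-free paragraphs fall back to a neutral label; the paper itself specifies only that basic counts are used without scaling.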
B New Evaluation Dataset

The interface for our new evaluation dataset, MDGENDER, can be seen in Figure 2. Examples from the new dataset can be found in Table 9.

C Applications

Example generations for various control tokens, as well as for our word list baseline, are shown in Table 10. See §6.1 on Controllable Generation in the main paper for more details.

The top 10 most gendered Wikipedia biographies are shown in Table 11. See §6.2 on Detecting Bias in the main paper for more details.

Figure 2: Annotation interface for collecting MDGENDER. Annotators were shown an utterance from a conversation and asked to re-write it such that it is clear they were speaking about/to/as a man or a woman. They were then asked for their confidence level.
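To make the OpenSubtitles procedure in item 7 of Appendix A concrete, the sketch below assigns a character's gender from aggregated U.S. baby-name counts and derives TO labels from the surrounding turns. It is a hypothetical sketch: the CSV layout and file name, the 0.9 confidence threshold, and the Turn structure are our assumptions rather than details given in the paper.

```python
# Illustrative sketch of the name-based speaker-gender heuristic and the
# TO-labeling rule from Appendix A, item 7. The data file layout, the
# confidence threshold, and the Turn structure are assumptions.
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


def load_name_counts(path: str = "us_baby_names.csv") -> Dict[str, Dict[str, int]]:
    """Aggregate name -> {'M': count, 'F': count} from a CSV assumed to have
    columns name,sex,count with sex values 'M'/'F' (e.g. a flattened export
    of the Social Security baby-name data cited in Appendix A)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"M": 0, "F": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["name"].lower()][row["sex"]] += int(row["count"])
    return counts


def name_gender(name: str, counts: Dict[str, Dict[str, int]],
                threshold: float = 0.9) -> Optional[str]:
    """AS label for a character: masculine/feminine if the name is strongly
    gendered in the counts, otherwise None (unknown)."""
    c = counts.get(name.lower())
    total = (c["M"] + c["F"]) if c else 0
    if total == 0:
        return None
    p_fem = c["F"] / total
    if p_fem >= threshold:
        return "feminine"
    if p_fem <= 1.0 - threshold:
        return "masculine"
    return None


@dataclass
class Turn:
    speaker: str
    text: str


def to_labels(turns: List[Turn], counts: Dict[str, Dict[str, int]]) -> List[Optional[str]]:
    """TO label per utterance: gender of the next speaker if another utterance
    follows; otherwise the gender of the last character to speak."""
    labels = []
    for i in range(len(turns)):
        # Degenerate single-turn conversations fall back to the speaker themselves.
        j = i + 1 if i + 1 < len(turns) else max(i - 1, 0)
        labels.append(name_gender(turns[j].speaker, counts))
    return labels
```

Characters whose names are not strongly gendered in the counts are simply left unlabeled in this sketch.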

Original: That's interesting. I am a Chef. What are your hobbies
Rewrite: that's interesting. i am a chef and nfl player what are your hobbies
Dim.: AS; Gender: masc.; Confidence: certain

Original: My name is Zachery but most call me Z. What's your name miss?
Rewrite: My name is Tina but most call me T. What's your name miss?
Dim.: AS; Gender: fem.; Confidence: pretty sure

Original: I said I like to wrestle with my kids for fun. What do you do for work?
Rewrite: i said i like to wrestle with my son for fun. what do you do for work?
Dim.: ABOUT; Gender: masc.; Confidence: certain

Original: just checking up on you by saying hi
Rewrite: Hanna said she was just checking up on you by saying hi
Dim.: ABOUT; Gender: fem.; Confidence: pretty sure

Original: How are you doing today Jahsim?
Rewrite: How are you doing today Mr. Jahsim?
Dim.: TO; Gender: masc.; Confidence: certain

Original: My name is Alexsandro and I am a bee farmer!
Rewrite: My name is alexsandro and I am a bee farmer! How about you ma'am?
Dim.: AS; Gender: fem.; Confidence: certain

Table 9: Examples from MDGENDER. Crowdworkers were asked to re-write dialogue utterances such that most people would guess that the utterance was either said to, said by, or about a man or a woman. Afterwards, they were asked to give a confidence level in their re-write, meant to capture the difference between statistical bias (more men play football than women) and fact (you do not have to be a man to play football).

Input: Awww! Cute! Our are a handful, but they are my babies! 4 wiener dogs.
TO:feminine: I can't wait for your children to get married.. I want one so badly! : )
AS:feminine: Awwww, that sounds wonderful. Can't wait to see them in person! : )
ABOUT:feminine: You're my hero. That's a cute dog. How long has she been with you?
Word list, feminine: That's pretty awesome! She was a little bit overweight but she's a sweet girl now

Input: Does vegan mean you do not want to be a super hero?
TO:feminine: I'd love to be vegan, but it just isn't something I have any interest in doing.
AS:feminine: I don't want to become super heroes! I just think it would be cool to be vegan.
ABOUT:feminine: She's a girl. She can't be a superhero!!! 11! 1! 1 1!
Word list, feminine: It does. She is a vegan and I am a super - hero. I am not vegan, and I don't want to become any sort of superhero.

Input: That is an good. Maybe you can make friends in a club or on a job?
TO:feminine: I was going to do that, but then I realized I didn't want to be alone.
AS:feminine: No, I've been working. I don't think I can make friendships online.
ABOUT:feminine: I think the problem is she's a girl, so there's not a lot of opportunity to make friends.
Word list, feminine: I have a girlfriend and I live in a small town. I don't know what clubs she can make any friends in.

Input: Its been 2 years I am a grease monkey and been trying to stay busy
TO:masculine: I'm not sure this is true. I've seen a lot of grease monkeys on reddit. I don't think there's any evidence it's true.
AS:masculine: You can do it bro! Just keep at it! You got this!!! : d )
ABOUT:male: I'd love to be a grease guy but I just can't make it to the gym. : (
Word list, masculine: You're not alone bro. I'm trying to be that guy, but it's a little hard sometimes.

Input: Have you ever almost died? I nearly did from urinary cancer when I was 18.
TO:masculine: No one in my family has died. I have only seen a handful of people die. I am sure you will be fine.
AS:masculine: No I haven't. But it's a lot better now, thanks to all the help I've received.
ABOUT:masculine: He said * nearly *. It's more likely to happen when you're older than the average person.
Word list, masculine: I've only been alive for 3 days. I have no idea what I'm going to do next, but I'll be okay.

Input: I totally believe you. I love working out.
TO:masculine: I'm going to start doing this, thanks! If it works, I'll post it on r / fitness!
AS:masculine: Thanks! I've always been really into exercise, too, and it's definitely helped me.
ABOUT:masculine: Thank you! I'm really glad he does : ). I've been working out since october.
Word list, masculine: Me too! I love being able to go to the gym without feeling like I'm just a little kid. It's so rewarding when you get back in the swing of things.

Table 10: Example generations from a generative model trained using controllable generation, with control tokens determined by the classifier. Sequences are generated with top-k sampling, k = 10, with a beam size of 10 and 3-gram blocking. Input is randomly sampled from the ConvAI2 dataset.

Most Feminine
1. Edie Sedgwick: was an American actress and fashion model...
2. Linda Darnell: was an American film actress...
3. Maureen O'Hara: was an Irish actress and singer...
4. Jessica Savitch: was an American television news presenter and correspondent,...
5. Patsy Mink: Mink served in the U.S. House of Representatives...
6. Shirley Chisholm: was an American politician, educator, and author...
7. Mamie Van Doren: is an American actress, model, singer, and sex symbol who is...
8. Jacqueline Cochran: was a pioneer in the field of American aviation and one of t...
9. Chloë Sevigny: is an American actress, fashion designer, director, and form...
10. Hilda Solis: is an American politician and a member of the Los Angeles Co...

Most Masculine
1. Derek Jacobi: is an English actor and stage director...
2. Bohuslav Martinů: was a Czech composer of modern classical music...
3. : was an Italian conductor...
4. : is an Indian conductor of Western classical music...
5. : was a British conductor and cellist...
6. : was an Italian conductor...
7. Ed Harris: is an American actor, producer, director, and screenwriter...
8. Richard Briers: was an English actor...
9. : was an Austrian classical pianist, who also composed and tau...
10. Charles Mackerras: was an Australian conductor...

Table 11: Most gendered Wikipedia biographies. We ran our multi-task classifier over 68 thousand Wikipedia biographies. After selecting biographies with a minimum number of paragraphs (resulting in 15.5 thousand biographies), we scored them to determine the most masculine- and feminine-gendered.