
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

On Measuring and Mitigating Biased Inferences of Word Embeddings

Sunipa Dev, Tao Li, Jeff M. Phillips, Vivek Srikumar
School of Computing, University of Utah, Salt Lake City, Utah, USA
{sunipad, tli, jeffp, svivek}@cs.utah.edu

Abstract

Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings (ELMo, BERT).

Introduction

Word embeddings have become the de facto feature representation across NLP (Parikh et al. 2016; Seo et al. 2017, for example). Their usefulness stems from their ability to capture background information about words using large corpora, either as static vector embeddings—e.g., word2vec (Mikolov et al. 2013), GloVe (Pennington, Socher, and Manning 2014)—or as contextual encoders that produce embeddings—e.g., ELMo (Peters et al. 2018), BERT (Devlin et al. 2019).

However, besides capturing word meaning, these embeddings also encode real-world biases about gender, age, ethnicity, etc. To discover biases, several lines of existing work (Bolukbasi et al. 2016; Caliskan, Bryson, and Narayanan 2017; Zhao et al. 2017; Dev and Phillips 2019) employ measurements intrinsic to the vector representations, which, despite their utility, have two key problems. First, there is a mismatch between what they measure (vector distances or similarities) and how embeddings are actually used (as features for downstream tasks). Second, contextualized embeddings like ELMo or BERT drive today's state-of-the-art NLP systems, but tests for bias are designed for word types, not word token embeddings.

In this paper, we present a general strategy to probe word embeddings for biases. We argue that biased representations lead to invalid inferences, and that the number of invalid inferences supported by word embeddings (static or contextual) measures their bias. To concretize this intuition, we use the task of natural language inference (NLI), where the goal is to ascertain if one sentence—the premise—entails or contradicts another—the hypothesis—or if neither conclusion holds (i.e., they are neutral with respect to each other).

As an illustration, consider the sentences:

(1) The rude person visited the bishop.
(2) The Uzbekistani person visited the bishop.

Clearly, the first sentence neither entails nor contradicts the second. Yet, the popular decomposable attention model (Parikh et al. 2016) built with GloVe embeddings predicts that sentence (1) entails sentence (2) with a high probability of 0.842! Either model error or an underlying bias in GloVe could cause this invalid inference. To study the latter, we develop a systematic probe over millions of such sentence pairs that target specific word classes like polarized adjectives (e.g., rude) and demonyms (e.g., Uzbekistani).

A second focus of this paper is bias attenuation. As a representative of several lines of work in this direction, we use the recently proposed projection method of Dev and Phillips (2019), which identifies the dominant direction defining a bias (e.g., gender) and removes it from all embedded vectors. This simple approach thus avoids the trap of residual information (Gonen and Goldberg 2019) seen in the hard debiasing approach of Bolukbasi et al. (2016), which categorizes words and treats each category differently. Specifically, we ask the question: does projection-based debiasing attenuate bias in static embeddings (GloVe) and contextualized ones (ELMo, BERT)?
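To make the projection step concrete, here is a minimal sketch in Python, assuming word vectors are available in a NumPy dictionary keyed by word. The helper names (bias_direction, remove_direction), the he/she and man/woman seed pairs, and the normalize-and-average estimator are illustrative choices rather than the exact recipe of Dev and Phillips (2019); the essential operation is removing the component along the learned direction, v' = v - <v, d> d.

```python
import numpy as np

def bias_direction(emb, word_pairs):
    """Estimate a bias direction from difference vectors of word pairs,
    e.g. [("he", "she"), ("man", "woman")]: normalize each difference and
    average (one simple estimator; other estimators are possible)."""
    diffs = [emb[a] - emb[b] for a, b in word_pairs]
    diffs = [d / np.linalg.norm(d) for d in diffs]
    v = np.mean(diffs, axis=0)
    return v / np.linalg.norm(v)

def remove_direction(vectors, d):
    """Project every row onto the hyperplane orthogonal to d: v' = v - <v, d> d."""
    return vectors - np.outer(vectors @ d, d)

# Hypothetical usage with a small word -> vector dictionary `emb`:
# d = bias_direction(emb, [("he", "she"), ("man", "woman")])
# debiased = remove_direction(np.stack([emb[w] for w in emb]), d)
```

As discussed later in the paper, for contextualized encoders this projection is applied only to the static part of the representation (the first layer in ELMo, subword embeddings in BERT), not to the full contextual vectors.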
Our contributions. Our primary contribution is the use of natural language inference to design probes that measure the effect of specific biases. It is important to note here that vector distance based methods of measuring bias pose two problems. First, they assume that the interaction between word embeddings can be captured by a simple distance function. Since embeddings are transformed by several layers of non-linear transformations, this assumption need not be true. Second, the vector distance method is not applicable to contextual embeddings because there are no single 'driver', 'male', or 'female' vectors; instead, the vectors depend on the context. Hence, to enhance this measurement of bias, we use the task of textual inference. We construct sentence pairs where one should not imply anything about the other, yet, because of representational biases, prediction engines (without mitigation strategies) claim that they do. To quantify this, we use model probabilities for entailment (E), contradiction (C) or neutral association (N) for pairs of sentences. Consider, for example,

(3) The driver owns a cabinet.
(4) The man owns a cabinet.
(5) The woman owns a cabinet.

Sentence (3) neither entails nor contradicts sentences (4) and (5). Yet, with sentence (3) as premise and sentence (4) as hypothesis, the decomposable attention model predicts probabilities E: 0.497, N: 0.238, C: 0.264; the model predicts entailment. Whereas, with sentence (3) as premise and sentence (5) as hypothesis, we get E: 0.040, N: 0.306, C: 0.654; the model predicts contradiction. Each premise-hypothesis pair differs only by a gendered word.

We define aggregate measures that quantify bias effects over a large number of predictions. We discover substantial bias across GloVe, ELMo and BERT embeddings. In addition to the now commonly reported gender bias (e.g., Bolukbasi et al. 2016), we also show that the embeddings encode polarized information about demonyms and religions. To our knowledge, this is among the first demonstrations (Sweeney and Najafian 2019; Manzini et al. 2019) of national or religious bias in word embeddings.

Our second contribution is to show that simple mechanisms for removing bias on static word embeddings (particularly GloVe) work. The projection approach of Dev and Phillips (2019) has been shown effective for intrinsic measures; we show that its effectiveness extends to the new NLI-based probes. Specifically, we show that it reduces gender's effect on occupations. We further show similar results for removing subspaces associated with religions and demonyms.

Our third contribution is that these approaches can be extended to contextualized embeddings (ELMo and BERT), but with limitations. We show that the most direct application of learning and removing a bias direction from the full representations fails to reduce bias measured by NLI. However, learning and removing a gender direction from the non-contextual part of the representation (the first layer in ELMo, and subword embeddings in BERT) can reduce NLI-measured gender bias. Yet, this approach is ineffective or inapplicable for religion or nationality.

Measuring Bias with Inference

Our construction of a bias measure uses the NLI task, which has been widely studied in NLP, starting with the PASCAL RTE challenges (Dagan, Glickman, and Magnini 2006; Dagan et al.). We will describe this process using how gender biases affect inferences related to occupations. Afterwards, we will extend the approach to polarized inferences related to nationalities and religions.

Experimental Setup

We use GloVe to study static word embeddings and ELMo and BERT for contextualized ones. Our NLI models for GloVe and ELMo are based on the decomposable attention model (Parikh et al. 2016) with a BiLSTM encoder instead of the original projective one (Cheng, Dong, and Lapata 2016). We use the GloVe vectors pretrained on the Common Crawl dataset with dimension 300. For ELMo, as is standard, we first linearly interpolate the three layers of embeddings before the LSTM encoder. We use BERT-base, and follow the NLI setup in the original work. Specifically, our final predictor is a linear classifier over the embedding of the first token in the input (i.e., [CLS]). Our models are trained on the SNLI training set.

The extended version of this paper lists further hyperparameters and network details.
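To illustrate the BERT setup just described, the sketch below builds a linear E/N/C classifier over the [CLS] embedding. The HuggingFace transformers API, the bert-base-uncased checkpoint name, and the E/N/C label ordering are assumptions for illustration only; the paper follows the original BERT NLI recipe and trains the full model on SNLI, whereas the classifier head here is left untrained.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 3)  # assumed order: E, N, C

def nli_probabilities(premise, hypothesis):
    """Encode the pair as [CLS] premise [SEP] hypothesis [SEP] and classify
    from the [CLS] embedding (head is untrained in this sketch; in the paper
    the whole model is trained on SNLI)."""
    enc = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**enc).last_hidden_state[:, 0]    # first token = [CLS]
    return torch.softmax(classifier(cls), dim=-1)       # [P(E), P(N), P(C)]

print(nli_probabilities("The driver owns a cabinet.", "The man owns a cabinet."))
```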
Occupations and Genders

Consider the following three sentences:

(6) The accountant ate a bagel.
(7) The man ate a bagel.
(8) The woman ate a bagel.

Sentence (6) should neither entail nor contradict sentences (7) and (8): we do not know the gender of the accountant. For these, and many other sentence pairs, the correct label should be neutral, with prediction probabilities E: 0, N: 1, C: 0. But a gender-biased representation of the word accountant may lead to a non-neutral prediction. We expand these anecdotal examples by automatically generating a large set of entailment tests, populating templates constructed using subject, verb and object fillers. All our templates are of the form:

The subject verb a/an object.

Here, we use a set of common activities for the verb and object slots, such as ate a bagel, bought a car, etc. For the same verb and object, we construct an entailment pair using subject fillers from sets of words. For example, to assess gender bias associated with occupations, the premise of the entailment pair would use an occupation word, while the hypothesis would use a gendered word. The extended version of the paper has all the word lists we use in our experiments.

Only the subject changes between the premise and the hypothesis in any pair. Since we seek to construct entailment pairs whose bias-free label should be neutral, we removed all gendered words from the occupations list (e.g., nun, salesman).
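As a sketch of this template-based generation, the snippet below enumerates premise-hypothesis pairs of the form "The subject verb a/an object." and averages the model's neutral probability over them. The short word lists are illustrative samples rather than the paper's full lists, the averaged neutral mass is only one simple aggregate (the paper defines its own aggregate measures), and nli_probabilities refers to the hypothetical sketch in the previous section.

```python
import itertools

OCCUPATIONS = ["accountant", "driver", "nurse", "doctor"]                  # sample premise subjects
GENDERED    = ["man", "woman"]                                             # hypothesis subjects
ACTIVITIES  = [("ate", "bagel"), ("bought", "car"), ("owns", "cabinet")]   # verb/object fillers

def article(noun):
    return "an" if noun[0] in "aeiou" else "a"

def template(subject, verb, obj):
    return f"The {subject} {verb} {article(obj)} {obj}."

def entailment_pairs():
    """Yield premise/hypothesis pairs that differ only in the subject slot;
    an unbiased model should label every pair neutral."""
    for (verb, obj), occ, gen in itertools.product(ACTIVITIES, OCCUPATIONS, GENDERED):
        yield template(occ, verb, obj), template(gen, verb, obj)

# Illustrative aggregate: average neutral probability over all generated pairs.
# neutral_mass = [nli_probabilities(p, h)[0, 1].item() for p, h in entailment_pairs()]
# print(sum(neutral_mass) / len(neutral_mass))
```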