Arxiv:1912.11078V2 [Cs.CL] 12 Sep 2020
Total Page:16
File Type:pdf, Size:1020Kb
Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview Deven Shah H. Andrew Schwartz Dirk Hovy Stony Brook University Bocconi University {dsshah,has}@cs.stonybrook.edu [email protected] Abstract the training process. In the case of domains, these non-generalizations are words, phrases, or senses An increasing number of natural language that occur in one text type, but not another. processing papers address the effect of bias However, this kind of variation is not just re- on predictions, introducing mitigation tech- niques at different parts of the standard NLP stricted to text domains: it is a fundamental prop- pipeline (data and models). However, these erty of human-generated language: we talk differ- works have been conducted individually, with- ently than our parents or people from a different out a unifying framework to organize efforts part of our country, etc. (Pennebaker and Stone, within the field. This situation leads to repet- 2003; Eisenstein et al., 2010; Kern et al., 2016). itive approaches, and focuses overly on bias In other words, language reflects the diverse de- symptoms/effects origins , rather than on their , mographics, backgrounds, and personalities of the which could limit the development of effective countermeasures. In this paper, we propose a people who use it. While these differences are of- unifying predictive bias framework for NLP. ten subtle, they are distinct and cumulative (Trudg- We summarize the NLP literature and suggest ill, 2000; Kern et al., 2016; Pennebaker, 2011). general mathematical definitions of predictive Similar to text domains, this variation can lead bias. We differentiate two consequences of models to pick up on patterns that do not gener- bias: outcome disparities and error dispari- alize to other author-demographics, or to rely on ties, as well as four potential origins of bi- undesirable word-demographic relationships. ases: label bias, selection bias, model over- Bias may be an inherent property of any NLP amplification, and semantic bias. Our frame- work serves as an overview of predictive bias system (and broadly any statistical model), but in NLP, integrating existing work into a single this is not per se negative. In essence, biases are structure, and providing a conceptual baseline priors that inform our decisions (a dialogue sys- for improved frameworks. tem designed for elders might work differently than one for teenagers). Still, undetected and 1 Introduction unaddressed, biases can lead to negative conse- Predictive models in NLP are sensitive to a variety quences: There are aggregate effects for demo- of (often unintended) biases throughout the devel- graphic groups, which combine to produce predic- opment process. As a result, fitted models do not tive bias. I.e., the label distribution of a predictive arXiv:1912.11078v2 [cs.CL] 12 Sep 2020 generalize well, incurring performance and relia- model reflects a human attribute in a way that di- bility losses on unseen data. They also have so- verges from a theoretically defined “ideal distribu- cially undesirable effects by systematically under- tion.” For example, a Part Of Speech (POS) tag- serving or mispredicting certain user groups. ger reflecting how an older generation uses words The general phenomenon of biased predictive (Hovy and Søgaard, 2015) diverges from the pop- models in NLP is not recent. The community ulation as a whole. has long worked on the domain adaptation prob- A variety of papers have begun to address lem (Jiang and Zhai, 2007; Daume III, 2007): countermeasures for predictive biases (Li et al., models fit on newswire data do not perform well 2018; Elazar and Goldberg, 2018; Coavoux et al., on social media and other text types. This prob- 2018).1 Each identifies a specific bias and counter- lem arises from the tendency of statistical mod- 1An even more extensive body of work on fairness exists els to pick up on non-generalizable signals during as part of the FAT* conferences, which goes beyond the scope Error disparity Example Dep, is_older? Matched / uniform: outcome - 0 (young) : .001 origin over-amplification label bias outcome disparity ideal: - 1 (old) : .001 The model discriminates on Biased annotations, The distribution of outcomes, given attribute A, 0, 0 : .30 consequence a given human attribute interaction, or latent bias is dissimilar than the ideal distribution: 0, 1 : .35 From predicted beyond its source base-rate. from past classifications. Q(Ŷ |A) ≁ P(Y |A) t t 1, 0 : .20 - 0 (young) : 1, 1 : .15 mean=.0005 Embedding - 1 (old) : Corpus Source Population Target Population From predicted mean=.002 biased 0, 0 : .25 features features outcomes features 0, 1 : .40 Q(ϵ |A) ≁ Uniform => there fit predict outcomes t 휃 X Y X 1, 0 : .25 exists an i,j such that embedding source source target Ŷ target ϵ ϵ (Pre-trained Side) (Model Side) (Application Side) 1, 1 : .20 Q( t|Ai) =/= Q( t|Aj) semantic bias selection bias error disparity Non-ideal associations between attributed The sample of observations The distribution of error (ϵ) over at least two lexeme (e.g. gendered pronouns) and themselves are not representative different values of an attribute A( ) are unequal: non-attributed lexeme (e.g. occupation). of the application population. ϵ ϵ Q( t|Ai) ≁ Q( t|Aj) Figure 1: The Predictive Bias Framework for NLP: Depiction of where bias mayerror originate disparity within a standard supervised NLP pipeline. Evidence of bias is seen in y^ via outcome disparityTheand distributionerror disparityof error (ϵ) .is inconsistent over different values of an attribute A( ): ϵ Q( t|A) ≁ Uniform measure on their terms, but it is often not explic- lected works in social science and adjacent fields. itly clear which bias is addressed, where it orig- We identify four distinct sources of bias: selec- inates, or how it generalizes. There are multi- tion bias, label bias, model overamplification, ple sources from which bias can arise within the and semantic bias. We can express all of these as predictive pipeline, and methods proposed for one differences between (a) a “true” or intended distri- specific bias often do not apply to another. As bution (e.g., over users, labels, or outcomes), and a consequence, much work has focused on bias (b) the distribution used or produced by the model. effects and symptoms rather than their origins. These cases arise at specific points within a typical While it is essential to address the effects of bias, it predictive pipeline: embeddings, source data, la- can leave the fundamental origin unchanged (Go- bels (human annotators), models, and target data. nen and Goldberg, 2019), requiring researchers to We provide quantitative definitions of predictive rediscover the issue over and over. The “bias” dis- bias in this framework intended to make it easier cussed in one paper may, therefore, be quite dif- to: (a) identify biases (because they can be clas- ferent than that in another.2 sified), (b) develop countermeasures (because the A shared definition and framework of predictive underlying problem is known), and (c) compare bias can unify these efforts, provide a common ter- biases and countermeasures across papers. We minology, help to identify underlying causes, and hope this paper will help researchers spot, com- allow coordination of countermeasures (Sun et al., pare, and address bias in all its various forms. 2019). However, such a general framework had Contributions Our primary contributions in- yet to be proposed within the NLP community. clude: (1) a conceptual framework for identify- To address these problems, we suggest a joint ing and quantifying predictive bias and its origins conceptual framework, depicted in Figure1, out- within a standard NLP pipeline, (2) a survey of bi- lining and relating the different origins of bias. ases identified in NLP models, and (3) a survey We base our framework on an extensive survey of methods for countering bias in NLP organized of the relevant NLP literature, informed by se- within our conceptual framework. of this biased-focused paper. Note also that while bias is an 2 Definition - Two Types of Disparities ethical issue and contributes to many papers in the ethics in NLP area, the two should not be conflated: ethics covers more Our definition of predictive bias in NLP builds on than bias. 2Quantitative social science offers a background for its definition within the literature on standardized bias (Berk, 1983). However, NLP differs fundamentally in testing (i.e., SAT, GRE, etc.) Specifically, Swinton analytic goals (namely out-of-sample prediction for NLP ver- (1981) states: sus parameter inference for hypothesis testing in social sci- ence) that bring about NLP-specific situations: biases in word embeddings, annotator labels, or predicting over-amplified By “predictive bias," we refer to a situation in demographics. which a [predictive model] is used to predict a specific criterion for a particular population, and pronouns associated with computer science). Our is found to give systematically different predic- framework should enable its users to apply evolv- tions for subgroups of this population who are in ing standards and norms across NLP’s many ap- fact identical on that specific criterion.3 plication contexts. A prototypical example of outcome disparity We generalize Swinton’s definition in two ways: is gender disparity in image captions. Zhao et al. First, to align notation with standard supervised (2017) and Hendricks et al.(2018) demonstrate modeling, we say there are both Y (a random a systematic difference with respect to gender in variable representing the “true” values of an out- the outcome of the model, Y^ even when taking come) and Y^ (a random variable representing the the source distribution as an ideal target distribu- predictions. Next, we allow the concept to apply tion: Q(Y^ jgender) Q(Y jgender) ∼ to differences associated with continuously-valued target target Q(Y jgender).