Arxiv:2106.11410V2 [Cs.CL] 15 Jul 2021 Inferiority of Racialized People’S Language Practices Underrepresented in STEM and Academia

A Survey of Race, Racism, and Anti-Racism in NLP Anjalie Field Su Lin Blodgett Carnegie Mellon University Microsoft Research [email protected] [email protected] Zeerak Waseem Yulia Tsvetkov University of Sheffield University of Washington [email protected] [email protected] Abstract While researchers and activists have increasingly Despite inextricable ties between race and lan- drawn attention to racism in computer science and guage, little work has considered race in NLP academia, frequently-cited examples of racial bias research and development. In this work, we in AI are often drawn from disciplines other than survey 79 papers from the ACL anthology that NLP, such as computer vision (facial recognition) mention race. These papers reveal various (Buolamwini and Gebru, 2018) or machine learn- types of race-related bias in all stages of NLP ing (recidivism risk prediction) (Angwin et al., model development, highlighting the need for 2016). Even the presence of racial biases in search proactive consideration of how NLP systems engines like Google (Sweeney, 2013; Noble, 2018) can uphold racial hierarchies. However, per- sistent gaps in research on race and NLP re- has prompted little investigation in the ACL com- main: race has been siloed as a niche topic munity. Work on NLP and race remains sparse, and remains ignored in many NLP tasks; most particularly in contrast to concerns about gender work operationalizes race as a fixed single- bias, which have led to surveys, workshops, and dimensional variable with a ground-truth label, shared tasks (Sun et al., 2019; Webster et al., 2019). which risks reinforcing differences produced In this work, we conduct a comprehensive sur- by historical racism; and the voices of histor- vey of how NLP literature and research practices ically marginalized people are nearly absent in NLP literature. By identifying where and how engage with race. We first examine 79 papers from NLP literature has and has not considered race, the ACL Anthology that mention the words ‘race’, especially in comparison to related fields, our ‘racial’, or ‘racism’ and highlight examples of how work calls for inclusion and racial justice in racial biases manifest at all stages of NLP model NLP research practices. pipelines (§3). We then describe some of the limi- 1 Introduction tations of current work (§4), specifically showing that NLP research has only examined race in a nar- Race and language are tied in complicated ways. row range of tasks with limited or no social context. Raciolinguistics scholars have studied how they are Finally, in §5, we revisit the NLP pipeline with a fo- mutually constructed: historically, colonial pow- cus on how people generate data, build models, and ers construct linguistic and racial hierarchies to are affected by deployed systems, and we highlight justify violence, and currently, beliefs about the current failures to engage with people traditionally arXiv:2106.11410v2 [cs.CL] 15 Jul 2021 inferiority of racialized people’s language practices underrepresented in STEM and academia. continue to justify social and economic exclusion While little work has examined the role of race 1 (Rosa and Flores, 2017). Furthermore, language in NLP specifically, prior work has discussed race is the primary means through which stereotypes in related fields, including human-computer in- and prejudices are communicated and perpetuated teraction (HCI) (Ogbonnaya-Ogburu et al., 2020; (Hamilton and Trolier, 1986; Bar-Tal et al., 2013). Rankin and Thomas, 2019; Schlesinger et al., However, questions of race and racial bias 2017), fairness in machine learning (Hanna et al., have been minimally explored in NLP literature. 2020), and linguistics (Charity Hudley et al., 2020; 1We use racialization to refer the process of “ascribing and Motha, 2020). We draw comparisons and guid- prescribing a racial category or classification to an individual ance from this work and show its relevance to NLP or group of people . based on racial attributes including but not limited to cultural and social history, physical features, research. Our work differs from NLP-focused re- and skin color” (Charity Hudley, 2017). lated work on gender bias (Sun et al., 2019), ‘bias’ generally (Blodgett et al., 2020), and the adverse U.S. However, as race and racism are global con- impacts of language models (Bender et al., 2021) structs, some aspects of our analysis are applicable in its explicit focus on race and racism. to other contexts. We suggest that future studies In surveying research in NLP and related fields, on racialization in NLP ground their analysis in the we ultimately find that NLP systems and research appropriate geo-cultural context, which may result practices produce differences along racialized lines. in findings or analyses that differ from our work. Our work calls for NLP researchers to consider the social hierarchies upheld and exacerbated by 3 Survey of NLP literature on race NLP research and to shift the field toward “greater 3.1 ACL Anthology papers about race inclusion and racial justice” (Charity Hudley et al., 2020). In this section, we introduce our primary survey data—papers from the ACL Anthology3—and we 2 What is race? describe some of their major findings to empha- size that NLP systems encode racial biases. We It has been widely accepted by social scientists that searched the anthology for papers containing the race is a social construct, meaning it “was brought terms ‘racial’, ‘racism’, or ‘race’, discarding ones into existence or shaped by historical events, social that only mentioned race in the references section forces, political power, and/or colonial conquest” or in data examples and adding related papers cited rather than reflecting biological or ‘natural’ differ- by the initial set if they were also in the ACL An- ences (Hanna et al., 2020). More recent work has thology. In using keyword searches, we focus on criticized the “social construction” theory as circu- papers that explicitly mention race and consider lar and rooted in academic discourse, and instead papers that use euphemistic terms to not have sub- referred to race as “colonial constituted practices”, stantial engagement on this topic. As our focus including “an inherited western, modern-colonial is on NLP and the ACL community, we do not in- practice of violence, assemblage, superordination, clude NLP-related papers published in other venues exploitation and segregation” (Saucier et al., 2016). in the reported metrics (e.g. Table1), but we do The term race is also multi-dimensional and draw from them throughout our analysis. can refer to a variety of different perspectives, in- Our initial search identified 165 papers. How- cluding racial identity (how you self-identify), ob- ever, reviewing all of them revealed that many do served race (the race others perceive you to be), not deeply engage on the topic. For example, 37 and reflected race (the race you believe others per- papers mention ‘racism’ as a form of abusive lan- ceive you to be) (Roth, 2016; Hanna et al., 2020; guage or use ‘racist’ as an offensive/hate speech Ogbonnaya-Ogburu et al., 2020). Racial catego- label without further engagement. 30 papers only rizations often differ across dimensions and depend mention race as future work, related work, or mo- on the defined categorization schema. For exam- tivation, e.g. in a survey about gender bias, “Non- ple, the United States census considers Hispanic binary genders as well as racial biases have largely an ethnicity, not a race, but surveys suggest that been ignored in NLP” (Sun et al., 2019). After 2/3 of people who identify as Hispanic consider discarding these types of papers, our final analysis 2 it a part of their racial background. Similarly, set consists of 79 papers.4 the census does not consider ‘Jewish’ a race, but Table1 provides an overview of the 79 papers, some NLP work considers anti-Semitism a form manually coded for each paper’s primary NLP task of racism (Hasanuzzaman et al., 2017). Race de- and its focal goal or contribution. We determined pends on historical and social context—there are task/application labels through an iterative process: no ‘ground truth’ labels or categories (Roth, 2016). listing the main focus of each paper and then col- As the work we survey primarily focuses on the lapsing similar categories. In cases where papers United States, our analysis similarly focuses on the 3The ACL Anthology includes papers from all official 2https://www:census:gov/mso/ ACL venues and some non-ACL events listed in AppendixA, www/training/pdf/race-ethnicity- as of December 2020 it included 6; 200 papers onepager:pdf/, https://www:census:gov/ 4We do not discard all papers about abusive language, only topics/population/race/about:html, ones that exclusively use racism/racist as a classification label. https://www:pewresearch:org/fact-tank/ We retain papers with further engagement, e.g. discussions 2015/06/15/is-being-hispanic-a-matter- of how to define racism or identification of racial bias in hate of-race-ethnicity-or-both/ speech classifiers. Detect Bias Debias Total Collect Corpus Analyze Corpus Develop Model Survey/Position Abusive Language 6 4 2 5 2 2 21 Social Science/Social Media 2 10 6 1 - 1 20 Text Representations (LMs, embeddings) - 2 - 9 2 - 13 Text Generation (dialogue, image captions, story gen. ) - - 1 5 1 1 8 Sector-specific NLP applications (edu., law, health) 1 2 - - 1 3 7 Ethics/Task-independent Bias 1 - 1 1 1 2 6 Core NLP Applications (parsing, NLI, IE) 1 - 1 1 1 - 4 Total 11 18 11 22 8 9 79 Table 1: 79 papers on race or racism from the ACL anthology, categorized by NLP application and focal task. could rightfully be included in multiple categories, vealing under-representation in training data, some- we assign them to the best-matching one based on times tangentially to primary research questions: stated contributions and the percentage of the paper Rudinger et al.(2017) suggest that gender bias may devoted to each possible category.

Arxiv:2106.11410V2 [Cs.CL] 15 Jul 2021 Inferiority of Racialized People’S Language Practices Underrepresented in STEM and Academia

Ai Governance in 2019 a Year in Review

Our Digital Future in a Divided World 22Nd APRU ANNUAL JUNE 24-26, 2018 PRESIDENTS’ MEETING National TAIWAN UNIVERSIT Y

The Black Box, Unlocked PREDICTABILITY and UNDERSTANDABILITY in MILITARY AI

1 Mona T. Diab

The Miracle Continues School of Engineering Celebrates HKUST’S Th Anniversary

Kathleen Mckeown

Big Data Institute Educational Technology Programs Hong Kong and Nation’S Transformation Big Data Research Center