Quantifying 60 Years of Gender Bias in Biomedical Research with Word Embeddings
Anthony Rios1, Reenam Joshi2, and Hejin Shin3
1Department of Information Systems and Cyber Security
2Department of Computer Science
3Library Systems
University of Texas at San Antonio
Anthony.Rios, Reenam.Joshi, [email protected]

Abstract

Gender bias in biomedical research can have an adverse impact on the health of real people. For example, there is evidence that heart disease-related funded research generally focuses on men. Health disparities can form between men and at-risk groups of women (i.e., elderly and low-income) if there is not an equal number of heart disease-related studies for both genders. In this paper, we study temporal bias in biomedical research articles by measuring gender differences in word embeddings. Specifically, we address multiple questions, including: How has gender bias changed over time in biomedical research, and what health-related concepts are the most biased? Overall, we find that traditional gender stereotypes have reduced over time. However, we also find that the embeddings of many medical conditions are as biased today as they were 60 years ago (e.g., concepts related to drug addiction and body dysmorphia).

1 Introduction

It is important to develop gender-specific best-practice guidelines for biomedical research (Holdcroft, 2007). If research is heavily biased towards one gender, then the biased guidance may contribute to health disparities because the evidence drawn on may be questionable (i.e., not well studied). For example, there is more research funding for the study of heart disease in men (Weisz et al., 2004). Therefore, the at-risk population of older women in low economic classes is not as well investigated, which opens up the possibility of an increase in the health disparities between genders.

Among informatics researchers, there has been increased interest in understanding, measuring, and overcoming bias associated with machine learning methods. Researchers have studied many application areas to understand the effect of bias. For example, Kay et al. (2015) found that the Google image search application is biased. Specifically, they found an unequal representation of gender stereotypes in image search results for different occupations (e.g., all police images are of men). Likewise, ad-targeting algorithms may include characteristics of sexism and racism (Datta et al., 2015; Sweeney, 2013). Sweeney (2013) found that the names of black men and women are likely to generate ads related to arrest records. In healthcare, much of the prior work has studied the bias in the diagnosis process made by doctors (Young et al., 1996; Hartung and Widiger, 1998). There have also been studies of ethical considerations regarding the use of machine learning in healthcare (Cohen et al., 2014).

It is possible to analyze and measure the presence of gender bias in text. Garg et al. (2018) analyzed the presence of well-known gender stereotypes over the last 100 years. Hamberg (2008) showed that gender blindness and stereotyped preconceptions are key causes of gender bias in medicine. Heath et al. (2019) studied the gender-based linguistic differences in physician trainee evaluations of medical faculty. Salles et al. (2019) measured the implicit and explicit gender bias among health care professionals and surgeons. Feldman et al. (2019) quantified the exclusion of females in clinical studies at scale with automated data extraction. Recently, researchers have studied methods to quantify gender bias using word embeddings trained on biomedical research articles (Kurita et al., 2019). Kurita et al. (2019) showed that the resulting embeddings capture some well-known gender stereotypes. Moreover, the embeddings exhibit the stereotypes at a lower rate than embeddings trained on other corpora (e.g., Wikipedia). However, to the best of our knowledge, there has not been an automated temporal study of the change in gender bias.

In this paper, we look at the temporal change of gender bias in biomedical research. To study social biases, we make use of word embeddings trained on different decades of biomedical research articles. The two main questions driving this work are: In what ways has bias changed over time, and are there certain illnesses associated with a specific gender? We leverage three computational techniques to answer these questions: the Word Embedding Association Test (WEAT) (Caliskan et al., 2017), the Embedding Coherence Test (ECT) (Dev and Phillips, 2019), and Relational Inner Product Association (RIPA) (Ethayarajh et al., 2019). To the best of our knowledge, this is the first temporal analysis of bias in word embeddings trained on biomedical research articles. Moreover, to the best of our knowledge, this is the first analysis that measures the gender bias associated with individual biomedical words.
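To make this type of measurement concrete, the snippet below is a minimal sketch of the WEAT effect size as defined by Caliskan et al. (2017), written in Python with NumPy; the emb lookup table and the example word lists are illustrative placeholders, not the word sets or embeddings used in this paper.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # s(w, A, B): mean similarity of word w to attribute set A minus set B.
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Effect size: difference in mean associations of the two target sets,
    # normalized by the standard deviation over all target words.
    s_X = [association(x, A, B, emb) for x in X]
    s_Y = [association(y, A, B, emb) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Illustrative usage with hypothetical gendered target words and
# Career vs. Family attribute words (emb: dict of word -> np.ndarray):
# d = weat_effect_size(["he", "him", "his"], ["she", "her", "hers"],
#                      ["career", "salary", "office"],
#                      ["family", "home", "children"], emb)

A positive effect size here indicates that the first target set is more strongly associated with the first attribute set. ECT and RIPA take different approaches (comparing similarity rankings against group mean vectors, and projecting words onto a relation vector, respectively), so agreement across the three measures suggests that an observed bias is robust to the choice of measurement.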
Our work is most similar to Garg et al. (2018), who study the temporal change of both gender and racial biases using word embeddings. Our work differs in three substantial ways. First, this paper is focused on biomedical literature, not general text corpora. Second, we analyze gender stereotypes using three distinct methods to see if the bias is robust to various measurement techniques. Third, we extend the study beyond gender stereotypes. Specifically, we look at bias in sets of occupation words, as well as bias in mental health-related word sets. Moreover, we quantify the bias of individual occupational and mental health-related words.

In summary, the paper makes the following contributions:

• We answer the question: How has the usage of gender stereotypes changed in the last 60 years of biomedical research? Specifically, we look at the change in well-known gender stereotypes (e.g., Math vs Arts, Career vs Family, Intelligence vs Appearance, and occupations) in biomedical literature from 1960 to 2020.

• The second contribution answers the question: What are the most gender-stereotyped words for each decade during the last 60 years, and have they changed over time? This contribution is more focused than simply looking at traditional gender stereotypes. Specifically, we analyze two groups of words: occupations and mental health disorders. For each group, we measure the overall change in bias over time. Moreover, we measure the individual bias associated with each occupation and mental health disorder.

2 Related Work

In this section, we discuss research related to the three major themes of this paper: gender disparities in healthcare, biomedical word embeddings, and bias in natural language processing (NLP).

2.1 Gender Disparities in Healthcare.

There is evidence of gender disparities in the healthcare system, from the diagnosis of mental health disorders to differences in substance abuse. An important question is: Do similar biases appear in biomedical research? In this work, while we explore traditional gender stereotypes (e.g., Intelligence vs Appearance), we also measure potential bias in the occupations and mental health-related disorders associated with each gender.

With regard to mental health, for example, major depression is one of the most common mental illnesses, affecting more than 17 million adults in the United States (US) alone (Pratt and Brody, 2014). Depression can cause people to lose pleasure in daily life, complicate other medical conditions, and possibly lead to suicide (Pratt and Brody, 2014). Moreover, depression can occur in anyone, at any age, and among people of any race or ethnic group. While treatment can help individuals suffering from major depression, or mental illness in general, only about 35% of individuals suffering from severe depression seek treatment from mental health professionals. It is common for people to resist treatment because of the belief that depression is not serious, that they can treat themselves, or that it would be seen as a personal weakness rather than a serious medical illness (Gulliver et al., 2010). Unfortunately, while depression can affect anyone, women are almost twice as likely as men to have had depression (Albert, 2015). Moreover, depression is generally higher among certain demographic groups, including, but not limited to, Hispanic, non-Hispanic black, low-income, and low-education groups (Bailey et al., 2019). The focus of this paper is to understand the impact of these mental health disparities on word embeddings trained on biomedical corpora.

2.2 Biomedical Word Embeddings.

Word embeddings capture the distributional nature of words (i.e., words that appear in similar contexts have similar vector encodings). Over the years, there have been multiple methods of producing word embeddings, including, but not limited to, latent semantic analysis (Deerwester et al., 1990), Word2Vec (Mikolov et al., 2013a,b), and GloVe (Pennington et al., 2014). Moreover, pre-trained word embeddings have been shown to be useful for a wide variety of downstream biomedical NLP tasks (Wang et al., 2018), such as text classification (Rios and Kavuluru, 2015), named entity recognition (Habibi et al., 2017), and relation extraction (He et al., 2019). In Chiu et al. (2016), the authors study a standard methodology to train good biomedical word embeddings.

Year        # Articles
1960-1969   1,479,370
1970-1979   2,305,257
1980-1989   3,322,556
1990-1999   4,109,739
2000-2010   6,134,431
2010-2020   8,686,620
Total       26,037,973

Table 1: The total number of articles in each decade.
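As a rough illustration of how decade-specific embeddings like those summarized in Table 1 can be produced, the following is a minimal sketch using gensim's Word2Vec (4.x API). The load_decade_sentences helper and all hyperparameters are illustrative assumptions, not the paper's actual training configuration.

from gensim.models import Word2Vec

decades = ["1960-1969", "1970-1979", "1980-1989",
           "1990-1999", "2000-2010", "2010-2020"]

models = {}
for decade in decades:
    # Hypothetical helper returning tokenized sentences (lists of strings)
    # from the articles published in the given decade.
    sentences = load_decade_sentences(decade)
    # Skip-gram with common default-style settings.
    models[decade] = Word2Vec(sentences, vector_size=100, window=5,
                              min_count=10, sg=1, workers=4)
    models[decade].save(f"biomedical_w2v_{decade}.model")

Bias measures such as WEAT can then be applied to each decade's models[decade].wv to trace how gender associations change over time.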
There has also been work on measuring bias in sentence embeddings (May et al., 2019). Furthermore, there has been a significant amount of research that explores different ways to measure bias in word embeddings (Caliskan et al., 2017; Dev and Phillips, 2019).
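Since the paper also scores individual occupation and mental health terms, a per-word measure is useful; below is a minimal sketch in the spirit of RIPA (Ethayarajh et al., 2019), where a word's bias is its inner product with a normalized gender direction. The word lists and the emb lookup are illustrative placeholders, not the paper's exact sets.

import numpy as np

def gender_direction(emb, male_words=("he", "him", "his"),
                     female_words=("she", "her", "hers")):
    # Average of normalized difference vectors between paired gendered words;
    # the word pairs here are illustrative, not the paper's exact pairs.
    diffs = [emb[m] - emb[f] for m, f in zip(male_words, female_words)]
    diffs = [d / np.linalg.norm(d) for d in diffs]
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)

def ripa_score(word, direction, emb):
    # Signed projection: positive leans toward the "male" end of the
    # direction as constructed above, negative toward the "female" end.
    return float(np.dot(emb[word], direction))

# Example: score a few hypothetical occupation and disorder terms.
# b = gender_direction(emb)
# scores = {w: ripa_score(w, b, emb) for w in ["nurse", "surgeon", "depression"]}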