Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia
Jiao Sun (1,2) and Nanyun Peng (1,2,3)
(1) Computer Science Department, University of Southern California
(2) Information Sciences Institute, University of Southern California
(3) Computer Science Department, University of California, Los Angeles
[email protected], [email protected]

arXiv:2106.01601v1 [cs.CL] 3 Jun 2021

Abstract

Human activities can be seen as sequences of events, which are crucial to understanding societies. Disproportional event distribution for different demographic groups can manifest and amplify social stereotypes, and potentially jeopardize the ability of members in some groups to pursue certain goals. In this paper, we present the first event-centric study of gender biases in a Wikipedia corpus. To facilitate the study, we curate a corpus of career and personal life descriptions with demographic information consisting of 7,854 fragments from 10,412 celebrities. Then we detect events with a state-of-the-art event detection model, calibrate the results using strategically generated templates, and extract events that have asymmetric associations with genders. Our study discovers that Wikipedia pages tend to intermingle personal life events with professional events for females but not for males, which calls for the awareness of the Wikipedia community to formalize guidelines and train the editors to mind the implicit biases that contributors carry. Our work also lays the foundation for future work on quantifying and discovering event biases at the corpus level.

Name               | Wikipedia Description
Loretta Young (F)  | Career: In 1930, when she was 17, she eloped with 26-year-old actor Grant Withers; they were married in Yuma, Arizona. The marriage was annulled the next year, just as their second movie together (ironically entitled Too Young to Marry) was released.
Grant Withers (M)  | Personal Life: In 1930, at 26, he eloped to Yuma, Arizona with 17-year-old actress Loretta Young. The marriage ended in annulment in 1931 just as their second movie together, titled Too Young to Marry, was released.

Table 1: The marriage events are under the Career section for the female on Wikipedia. However, the same marriage is in the Personal Life section for the male. The yellow background highlights events in the passage.

1 Introduction

Researchers have been using NLP tools to analyze corpora for various tasks on online platforms. For example, Pei and Jurgens (2020) found that female-female interactions are more intimate than male-male interactions on Twitter and Reddit. Different from social media, open collaboration communities such as Wikipedia have slowly won the trust of the public (Young et al., 2016). Wikipedia has been trusted by many, including professionals in work tasks such as scientific journals (Kousha and Thelwall, 2017) and public officials in powerful positions of authority such as court briefs (Gerken, 2010). Implicit biases in such knowledge sources could have a significant impact on audiences' perception of different groups, thus propagating and even amplifying societal biases. Therefore, analyzing potential biases in Wikipedia is imperative.

In particular, studying events in Wikipedia is important. An event is a specific occurrence under a certain time and location that involves participants (Yu et al., 2015); human activities are essentially sequences of events. Therefore, the distribution and perception of events shape the understanding of society. Rashkin et al. (2018) discovered implicit gender biases in film scripts using events as a lens. For example, they found that events with female agents are intended to be helpful to other people, while events with male agents are motivated by achievements. However, they focused on the intentions and reactions of events rather than the events themselves.

In this work, we propose to use events as a lens to study gender biases and demonstrate that events are more efficient for understanding biases in corpora than raw texts. We define gender bias as the asymmetric association of events with females and males, which may lead to gender stereotypes.[1] For example, females are more associated with domestic activities than males in many cultures (Leopold, 2018; Jolly et al., 2014).

To facilitate the study, we collect a corpus that contains demographic information, personal life descriptions, and career descriptions from Wikipedia.[2] We first detect events in the collected corpus using a state-of-the-art event extraction model (Han et al., 2019). Then, we extract gender-distinct events with a higher chance to occur for one group than the other. Next, we propose a calibration technique to offset the potential confounding of gender biases in the event extraction model, enabling us to focus on the gender biases at the corpus level. Our contributions are three-fold:

• We contribute a corpus of 7,854 fragments from 10,412 celebrities across 8 occupations, including their demographic information and Wikipedia Career and Personal Life sections.

• We propose using events as a lens to study gender biases at the corpus level, discover a mixture of personal life and professional life for females but not for males, and demonstrate the efficiency of using events in comparison to directly analyzing the raw texts.

• We propose a generic framework to analyze event gender bias, including a calibration technique to offset the potential confounding of gender biases in the event extraction model.

2 Experimental Setup

In this section, we introduce our collected corpus and the event extraction model used in our study.

Dataset. Our collected corpus contains demographic information and description sections of celebrities from Wikipedia. Table 2 shows the statistics of the number of celebrities with a Career or a Personal Life section in our corpora, together with all the celebrities we collected. In this work, we only explored celebrities with Career or Personal Life sections, but there are more sections (e.g., Politics and Background and Family) in our collected corpus. We encourage interested researchers to further utilize our collected corpus and conduct studies from other perspectives. In each experiment, we select the same number of female and male celebrities from one occupation for a fair comparison.

Occupation   Career (F / M)   Personal Life (F / M)   Collected (F / M)
Acting         464 / 469        464 / 469                464 / 469
Writer         455 / 611        319 / 347              1,372 / 2,466
Comedian       380 / 655        298 / 510                642 / 1,200
Artist         193 /  30         60 /  18                701 / 100
Chef            81 / 141         72 /  95                176 / 350
Dancer         334 / 167        286 / 127                812 / 465
Podcaster       87 / 183         83 / 182                149 / 361
Musician        39 / 136         21 /  78                136 / 549
All (F + M)       4,425            3,429                   10,412

Table 2: Statistics showing the number of celebrities with a Career section or a Personal Life section, together with all the celebrities we collected. Not all celebrities have Career or Personal Life sections.
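As a minimal sketch of the balanced selection described above (not the authors' released code), the snippet below downsamples the larger gender group so that an experiment compares equal numbers of female and male celebrities within one occupation; the record layout, the "F"/"M" labels, and the helper name are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Illustrative record layout (assumption): one dict per celebrity, e.g.
# {"name": "...", "gender": "F", "occupation": "Acting",
#  "career": "...", "personal_life": "..."}

def sample_balanced_by_occupation(celebrities, occupation, seed=0):
    """Pick the same number of female and male celebrities for one occupation."""
    by_gender = defaultdict(list)
    for person in celebrities:
        if person["occupation"] == occupation:
            by_gender[person["gender"]].append(person)

    n = min(len(by_gender["F"]), len(by_gender["M"]))
    rng = random.Random(seed)  # fixed seed keeps the comparison reproducible
    # Downsample the larger group so both genders contribute n celebrities.
    return rng.sample(by_gender["F"], n) + rng.sample(by_gender["M"], n)
```

For example, applied to the Artist row of Table 2 for Career sections (193 females vs. 30 males), this procedure would keep 30 celebrities of each gender.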
Event Extraction. There are two definitions of events: one defines an event as the trigger word (usually a verb) (Pustejovsky et al., 2003b); the other defines an event as a complex structure including a trigger, arguments, time, and location (Ahn, 2006). Corpora following the former definition usually have much broader coverage, while the latter can provide richer information. For broader coverage, we choose a state-of-the-art event detection model that focuses on detecting event trigger words by Han et al. (2019).[3] We use the model trained on the TB-Dense dataset (Pustejovsky et al., 2003a) for two reasons: 1) the model performs better on the TB-Dense dataset; 2) the annotation of the TB-Dense dataset comes from news articles, which is also where most of the content of Wikipedia comes from.[4] We extract and lemmatize events $e$ from the corpora and count their frequencies $|e|$. Then, we separately construct dictionaries $E^m = \{e^m_1 : |e^m_1|, \ldots, e^m_M : |e^m_M|\}$ and $E^f = \{e^f_1 : |e^f_1|, \ldots, e^f_F : |e^f_F|\}$ mapping events to their frequencies for males and females, respectively.
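To make the dictionary construction above concrete, the sketch below counts lemmatized event triggers per gender. It is an illustrative assumption rather than the authors' code: the detector is passed in as a callable, standing in for the Han et al. (2019) model (see footnote [3]), and NLTK's WordNet lemmatizer is one possible choice for the lemmatization step.

```python
from collections import Counter

from nltk.stem import WordNetLemmatizer  # one-time setup: nltk.download("wordnet")

_lemmatizer = WordNetLemmatizer()

def build_event_dictionary(descriptions, detect_event_triggers):
    """Build E = {e: |e|}, mapping each lemmatized event trigger to its frequency.

    `detect_event_triggers` is a callable returning the trigger words found in
    one description, e.g. a wrapper around the detector of Han et al. (2019)
    trained on TB-Dense (https://github.com/rujunhan/EMNLP-2019).
    """
    counts = Counter()
    for text in descriptions:
        for trigger in detect_event_triggers(text):
            # Lemmatize as a verb so "married" and "marries" both count toward "marry".
            counts[_lemmatizer.lemmatize(trigger.lower(), pos="v")] += 1
    return counts

# Usage sketch (variable names are illustrative):
# E_m = build_event_dictionary(male_descriptions, detector)
# E_f = build_event_dictionary(female_descriptions, detector)
```

Running this once over the male descriptions and once over the female descriptions yields the $E^m$ and $E^f$ dictionaries used in the corpus-level comparison.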
Event Extraction Quality. To check the model's performance on our corpora, we manually annotated events in 10,508 sentences (female: 5,543, male: 4,965) from the Wikipedia corpus. Table 3 shows that the model performs comparably on our corpora and on the TB-Dense test set.

Metric      TB-D    S      S-F    S-M
Precision   89.2    93.5   95.3   93.4
Recall      92.6    89.8   87.1   89.8
F1          90.9    91.6   91.0   91.6

Table 3: The performance of the off-the-shelf event extraction model on both the common event extraction dataset TB-Dense (TB-D) and our corpus with manual annotation. S represents the sampled data from the corpus; S-F and S-M represent the sampled data for female career descriptions and male career descriptions separately.

Although the model performs well overall, its errors may still differ between genders and confound the corpus-level comparison. We therefore calibrate the extraction results by 1) generating data that contains target events; 2) testing the model performance for females and males separately in the generated data; and 3) using the model performance to estimate real event occurrence frequencies.

We aim to calibrate the top 50 most skewed events in females' and males' Career and Personal Life descriptions after using the OR (odds ratio) separately. First, we follow two steps to generate a synthetic dataset:

1. For each target event, we select all sentences where the model successfully detected the target event. For each sentence, we manually verify the correctness of the extracted event

Footnotes:
[1] In our analysis, we limit to binary gender classes, which, while unrepresentative of real-world diversity, allows us to conduct a more in-depth analysis.
[2] https://github.com/PlusLabNLP/ee-wiki-bias
[3] We use the code at https://github.com/rujunhan/EMNLP-2019 and reproduce the model trained on the TB-Dense dataset.
[4] According to Fetahu et al. (2015), more than 20% of the references on Wikipedia are news articles.