Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia

Total Page:16

File Type:pdf, Size:1020Kb

Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia Jiao Sun1;2 and Nanyun Peng1;2;3 1Computer Science Department, University of Southern California 2Information Sciences Institute, University of Southern California 3Computer Science Department, University of California, Los Angeles [email protected], [email protected] Abstract Name Wikipedia Description Human activities can be seen as sequences Career: In 1930, when she was 17, she eloped of events, which are crucial to understanding with 26-year-old actor Grant Withers; they were married in Yuma, Arizona. The marriage was societies. Disproportional event distribution Loretta for different demographic groups can mani- Young annulled the next year, just as their second fest and amplify social stereotypes, and po- (F) movie together (ironically entitled Too Young tentially jeopardize the ability of members in to Marry) was released. some groups to pursue certain goals. In this pa- Personal Life: In 1930, at 26, he eloped to per, we present the first event-centric study of Yuma, Arizona with 17-year-old actress Loretta gender biases in a Wikipedia corpus. To facili- Grant Young. The marriage ended in annulment in tate the study, we curate a corpus of career and Withers 1931 just as their second movie together, titled (M) personal life descriptions with demographic in- Too Young to Marry, was released. formation consisting of 7,854 fragments from 10,412 celebrities. Then we detect events with Table 1: The marriage events are under the Career sec- a state-of-the-art event detection model, cali- tion for the female on Wikipedia. However, the same brate the results using strategically generated marriage is in the Personal Life section for the male. templates, and extract events that have asym- yellow background highlights events in the passage. metric associations with genders. Our study discovers that Wikipedia pages tend to inter- mingle personal life events with professional could have a significant impact on audiences’ per- events for females but not for males, which ception of different groups, thus propagating and calls for the awareness of the Wikipedia com- munity to formalize guidelines and train the even amplifying societal biases. Therefore, analyz- editors to mind the implicit biases that contrib- ing potential biases in Wikipedia is imperative. utors carry. Our work also lays the foundation In particular, studying events in Wikipedia is im- for future works on quantifying and discover- portant. An event is a specific occurrence under ing event biases at the corpus level. a certain time and location that involves partici- pants (Yu et al., 2015); human activities are essen- 1 Introduction tially sequences of events. Therefore, the distribu- Researchers have been using NLP tools to ana- tion and perception of events shape the understand- arXiv:2106.01601v1 [cs.CL] 3 Jun 2021 lyze corpora for various tasks on online platforms. ing of society. Rashkin et al.(2018) discovered For example, Pei and Jurgens(2020) found that implicit gender biases in film scripts using events female-female interactions are more intimate than as a lens. For example, they found that events with male-male interactions on Twitter and Reddit. Dif- female agents are intended to be helpful to other ferent from social media, open collaboration com- people, while events with male agents are moti- munities such as Wikipedia have slowly won the vated by achievements. However, they focused on trust of public (Young et al., 2016). Wikipedia has the intentions and reactions of events rather than been trusted by many, including professionals in events themselves. work tasks such as scientific journals (Kousha and In this work, we propose to use events as a lens Thelwall, 2017) and public officials in powerful to study gender biases and demonstrate that events positions of authority such as court briefs (Gerken, are more efficient for understanding biases in cor- 2010). Implicit biases in such knowledge sources pora than raw texts. We define gender bias as the asymmetric association of events with females and Career Personal Life Collected 1 males, which may lead to gender stereotypes. For Occ F M F M F M example, females are more associated with domes- Acting 464 469 464 469 464 469 tic activities than males in many cultures (Leopold, Writer 455 611 319 347 1,372 2,466 2018; Jolly et al., 2014). Comedian 380 655 298 510 642 1,200 Artist 193 30 60 18 701 100 To facilitate the study, we collect a corpus that Chef 81 141 72 95 176 350 contains demographic information, personal life de- Dancer 334 167 286 127 812 465 scription, and career description from Wikipedia.2 Podcaster 87 183 83 182 149 361 Musician 39 136 21 78 136 549 We first detect events in the collected corpus using a state-of-the-art event extraction model (Han et al., All 4,425 3,429 10,412 2019). Then, we extract gender-distinct events with Table 2: Statistics showing the number of celebrities a higher chance to occur for one group than the with Career section or Personal Life section, together other. Next, we propose a calibration technique to with all celebrities we collected. Not all celebrities offset the potential confounding of gender biases have Career or Personal Life sections. in the event extraction model, enabling us to fo- cus on the gender biases at the corpus level. Our contributions are three-fold: corpus. We encourage interested researchers to fur- ther utilize our collected corpus and conduct studies • We contribute a corpus of 7,854 fragments from other perspectives. In each experiment, we from 10,412 celebrities across 8 occupations select the same number of female and male celebri- including their demographic information and ties from one occupation for a fair comparison. Wikipedia Career and Personal Life sections. Event Extraction. There are two definitions of • We propose using events as a lens to study events: one defines an event as the trigger word gender biases at the corpus level, discover a (usually a verb) (Pustejovsky et al., 2003b), the mixture of personal life and professional life other defines an event as a complex structure includ- for females but not for males, and demonstrate ing a trigger, arguments, time, and location (Ahn, the efficiency of using events in comparison 2006). The corpus following the former definition to directly analyzing the raw texts. usually has much broader coverage, while the lat- ter can provide richer information. For broader • We propose a generic framework to analyze coverage, we choose a state-of-the-art event detec- event gender bias, including a calibration tech- tion model that focuses on detecting event trigger nique to offset the potential confounding of words by Han et al.(2019). 3 We use the model gender biases in the event extraction model. trained on the TB-Dense dataset (Pustejovsky et al., 2003a) for two reasons: 1) the model performs 2 Experimental Setup better on the TB-Dense dataset; 2) the annota- In this section, we will introduce our collected cor- tion of the TB-Dense dataset is from the news pus and the event extraction model in our study. articles, and it is also where the most content of Wikipedia comes from.4 We extract and lemma- Dataset. Our collected corpus contains demo- tize events e from the corpora and count their fre- graphics information and description sections of quencies jej. Then, we separately construct dic- m m m m m celebrities from Wikipedia. Table2 shows the tionaries E = fe1 : je1 j; :::; eM : jeM jg and statistics of the number of celebrities with Career f f f f f E = fe1 : je1 j; :::; eF : jeF jg mapping events to or Personal Life sections in our corpora, together their frequency for male and female respectively. with all celebrities we collected. In this work, we only explored celebrities with Career or Personal Event Extraction Quality. To check the model Life sections, but there are more sections (e.g., Pol- performance on our corpora, we manually anno- itics and Background and Family) in our collected tated events in 10,508 sentences (female: 5,543, 3 1In our analysis, we limit to binary gender classes, which, We use the code at https://github.com/ while unrepresentative of the real-world diversity, allows us to rujunhan/EMNLP-2019 and reproduce the model trained focus on more depth in analysis. on the TB-Dense dataset. 2https://github.com/PlusLabNLP/ 4According to Fetahu et al.(2015), more than 20% of the ee-wiki-bias references are news articles on Wikipedia. Metric TB-D S S-F S-M generating data that contains target events; 2) test- ing the model performance for females and males Precision 89.2 93.5 95.3 93.4 separately in the generated data, 3) and using the Recall 92.6 89.8 87.1 89.8 model performance to estimate real event occur- F1 90.9 91.6 91.0 91.6 rence frequencies. We aim to calibrate the top 50 most skewed Table 3: The performance for off-the-shelf event ex- traction model in both common event extraction dataset events in females’ and males’ Career and Per- TB-Dense (TB-D) and our corpus with manual annota- sonal Life descriptions after using the OR sepa- tion. S represents the sampled data from the corpus. rately. First, we follow two steps to generate a S-F and S-M represent the sampled data for female ca- synthetic dataset: reer description and male career description separately. 1. For each target event, we select all sentences where the model successfully detected the tar- male: 4,965) from the Wikipedia corpus. Table3 get event. For each sentence, we manually shows that the model performs comparably on our verify the correctness of the extracted event corpora as on the TB-Dense test set.
Recommended publications
  • Florida State University Libraries
    )ORULGD6WDWH8QLYHUVLW\/LEUDULHV 2020 Wiki-Donna: A Contribution to a More Gender-Balanced History of Italian Literature Online Zoe D'Alessandro Follow this and additional works at DigiNole: FSU's Digital Repository. For more information, please contact [email protected] THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS & SCIENCES WIKI-DONNA: A CONTRIBUTION TO A MORE GENDER-BALANCED HISTORY OF ITALIAN LITERATURE ONLINE By ZOE D’ALESSANDRO A Thesis submitted to the Department of Modern Languages and Linguistics in partial fulfillment of the requirements for graduation with Honors in the Major Degree Awarded: Spring, 2020 The members of the Defense Committee approve the thesis of Zoe D’Alessandro defended on April 20, 2020. Dr. Silvia Valisa Thesis Director Dr. Celia Caputi Outside Committee Member Dr. Elizabeth Coggeshall Committee Member Introduction Last year I was reading Una donna (1906) by Sibilla Aleramo, one of the most important ​ ​ works in Italian modern literature and among the very first explicitly feminist works in the Italian language. Wanting to know more about it, I looked it up on Wikipedia. Although there exists a full entry in the Italian Wikipedia (consisting of a plot summary, publishing information, and external links), the corresponding page in the English Wikipedia consisted only of a short quote derived from a book devoted to gender studies, but that did not address that specific work in great detail. As in-depth and ubiquitous as Wikipedia usually is, I had never thought a work as important as this wouldn’t have its own page. This discovery prompted the question: if this page hadn’t been translated, what else was missing? And was this true of every entry for books across languages, or more so for women writers? My work in expanding the entry for Una donna was the beginning of my exploration ​ ​ into the presence of Italian women writers in the Italian and English Wikipedias, and how it relates back to canon, Wikipedia, and gender studies.
    [Show full text]
  • Blind Trust Is Not Enough”: Considering Practical Verifiability and Open Referencing in Wikipedia
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Journal of Critical Library and Information Studies Perspective “Blind Trust is Not Enough”: Considering Practical Verifiability and Open Referencing in Wikipedia Cal Murgu and Krisandra Ivings ABSTRACT This article draws attention to the often-unseen information inequalities that occur in the way that Wikipedia content is referenced. Drawing on digital information control and virtual gatekeeping scholarship, we contend that by not considering the degree to which references are practically accessible and verifiable, Wikipedia editors are implicitly promoting a control mechanism that is limiting the potential of Wikipedia to serve, as Willinsky puts it, “as a gateway to a larger world of knowledge.” We question the widespread practice of referencing peer-reviewed literature that is obscured by prohibitive paywalls. As we see it, two groups are disadvantaged by this practice. First, Wikipedia editors who lack access to the most current scholarly literature are unable to verify a reference or confirm the veracity of a fact or figure. Second, and perhaps most important, general readers are left with two unsatisfactory options in this scenario: they can either trust the authority of the citation or pay to access the article. Building on Don Fallis’ work on epistemic consequences of Wikipedia, this paper contends that promoting practically verifiable references would work to mitigate these inequalities and considers how relatively new initiatives could be used to improve the quality and utility of Wikipedia more generally. Murgu, Cal and Krisandra Ivings. “’Blind Trust is Not Enough’: Considering Practical Verifiability and Open Referencing in Wikipedia,” in “Information/Control: Control in the Age of Post-Truth,” eds.
    [Show full text]
  • Gender Imbalance on Wikipedia- an Insider’S Perspective Saskia Ehlers This Paper Will Focus on the Gender Gap Within Wikipedia
    November 17 Gender Imbalance on Wikipedia- An Insider’s perspective Saskia Ehlers This paper will focus on the gender gap within Wikipedia. For this the current state of research will be summarized. Effect on participation and content of the encyclopedia resulting from the gender imbalance will be analyzed. An Insider’s perspective will be given on the current issues and dynamics within the German community and the Wikimedia foundation also including narratives from different community members from all over the world that are part of the global movement. Saskia Ehlers, Center for interdisciplinary gender studies , TU Berlin Table of contents LIST OF TABLES II LIST OF FIGURES II PREFACE 1 INTRODUCTION 1 CURRENT STATE OF RESEARCH ON THE TOPIC OF GENDER IMBALANCE ON WIKIPEDIA 2 ARGUMENTS IN FAVOR OF DIVERSITY 2 ARGUMENTS AGAINST DIVERSITY 3 FINDING OF STUDIES 3 REASONS FOR LOW FEMALE PARTICIPATION 4 CONTENT DISTORTION 5 STRATEGIES TO INCREASE FEMALE PARTICIPATION 9 SURVEY OF DIVERSITY EXPERTS IN THE WIKIPEDIA COMMUNITY 10 SURVEY RESPONSES 10 BEST PRACTICES FOR INCREASING FEMALE EDITORSHIP 16 OBSTACLES AND FAILED PROJECTS FOR INCREASING FEMALE EDITORSHIP 17 CURRENT PROJECTS ON THE GENDER GAP IN THE WIKIPEDIA COMMUNITY 19 CONCLUSION 20 REFERENCES 22 List of tables Table 1: Percentage of female editors found in different surveys (own illustration) Table 2: English language version gender- specific Likelihood Ratios (Wagner, Claudia It’s a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia Proceedings of the Ninth International AAAI Conference on Web and Social Media. P. 454-463) List of figures Figure 1: Estimated proportion of female contributors in different language versions (Charting Diversity (2014) Charting Diverstiy- Working together towards diversity in Wikipedia by Wikimedia Foundation Germany and Beuth university for applied sciences.
    [Show full text]
  • Why Wikipedia Often Overlooks Stories of Women in History Lara Nicosia Rochester Institute of Technology
    Rochester Institute of Technology RIT Scholar Works Articles 3-16-2018 Why Wikipedia Often Overlooks Stories of Women in History Lara Nicosia Rochester Institute of Technology Tamar Carroll Rochester Institute of Technology This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 License. Follow this and additional works at: http://scholarworks.rit.edu/article Part of the History of Gender Commons, and the Information Literacy Commons Recommended Citation Nicosia, Lara and Carroll, Tamar, "Why Wikipedia Often Overlooks Stories of Women in History" (2018). The Conversation,Accessed from http://scholarworks.rit.edu/article/1847 This Article is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Articles by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. https://theconversation.com/why-wikipedia-often-overlooks-stories-of-women-in-history-92555 Academic rigor, journalistic flair Why Wikipedia often overlooks stories of women in history March 16, 2018 6.25am EDT Less than a third of biographical entries on Wikipedia are about women. aradaphotography/shutterstock.com Why Wikipedia often overlooks stories of women in history March 16, 2018 6.25am EDT Movements like #MeToo are drawing increased attention to the systemic discrimination Authors facing women in a range of professional fields, from Hollywood and journalism to banking and government. Discrimination is also a problem on user-driven sites like Wikipedia. Wikipedia is the fifth most popular website worldwide. In January, the English-language version of the online Tamar Carroll Associate Professor of History, Rochester encyclopedia had over 7.3 billion page views, more than 2000 percent higher than other Institute of Technology online reference sites such as IMDb or Dictionary.com.
    [Show full text]
  • View / Open Ada07-Wpthr-Pea-2015
    2/26/2021 WP:THREATENING2MEN: Misogynist Infopolitics and the Hegemony of the Asshole Consensus on English Wikipedia - Ada New Media ISSUE NO. 7 WP:THREATENING2MEN: Misogynist Infopolitics and the Hegemony of the Asshole Consensus on English Wikipedia Bryce Peake When schools discourage reporting, they collude with many societal forces to cover up sexual violence. Sexual violence thrives on secrecy. – Jennifer Freyd, ’Official Campus Statistics for Sexual Violence Mislead’ (http://america.aljazeera.com/opinions/2014/7/college-campus- sexualassaultsafetydatawhitehousegender.html) Spring 2014 was a long season, marked by a campus-wide anti-rape movement that took off at the University of Oregon (UO). In the wake of a high profile case, administrators callously and robotically rehearsed the “one time is too many” catch phrase that, through its rhetorical singularity, renders campus sexual violence an “isolated issue.” Arguments that UO was somehow unique or unusual in its unsafe environment and unethical public relations approach to public safety became rampant in public forums and the comment sections of online articles. The idea of writing campus sexual violence into Wikipedia was born of these circumstances, growing out of a conversation with campus activists about universities’ efforts to keep campus sexual violence invisible. By increasing the amount of freely available information on the long history of campus sexual violence across the United States, we could provide information for people looking to learn about the ways Oregon was
    [Show full text]
  • Reader Responsibility in Knowledge Organisation Readers Between Reference Books, Commonplace Books, Wikipedia, and Pinterest
    Reader Responsibility in Knowledge Organisation Readers between Reference Books, Commonplace Books, Wikipedia, and Pinterest Lea Maria Ferguson (B.A.), s1283464 Master’s Thesis Book and Digital Media Studies, Leiden University First reader: Prof. A. H. van der Weel Second reader: P. Verhaar Date of Submission: 1st July 2015 Words: 19.530 (incl. footnotes) Abstract The possibilities for exerting reader responsibility as an expression of democracy on the Internet are manifold. Reader responsibility may be seen as the process of finding, accessing, making sense of, interpreting, judging, and putting to use the information that can be gained through reading, and thereby turning it into knowledge. However, the 'democratisation discourse' on the Internet is often (mis-)used to promote mere economic aims, even under the guise of ‘digital socialism’. This is not only an obstacle to understanding reader responsibility and the process of reading, but also to gaining access to knowledge and making use of information resources successfully and effectively. It is encouraging to see though that the potential presented by reading and writing is (partly) being implemented. Historically, this occurred through commonplace and reference books, and currently it is happening through Pinterest and Wikipedia - despite some problematic economic and political circumstances. Embracing the responsibilities of the reading process will be conducive to developing the political potential of reading and as such be beneficial for society at large. Despite the connected and implied struggles of this, it is important to realise that readers can only turn information into knowledge and utilise it in their lives by being active and informed. Keywords Reader responsibility, textual knowledge navigation, commonplace books, reference books, Wikipedia, Pinterest Contents Abstract...............................................................................................................................................................................
    [Show full text]
  • Gender Inequality in New Media: Evidence from Wikipedia
    ISSN 2279-9362 Gender Inequality in New Media: Evidence from Wikipedia Marit Hinnosaar No. 411 May 2015 www.carloalberto.org/research/working-papers © 2015 by Marit Hinnosaar. Any opinions expressed here are those of the authors and not those of the Collegio Carlo Alberto. Gender Inequality in New Media: Evidence from Wikipedia∗ Marit Hinnosaary May 2015 Abstract Media is considered to be critical for gender equality. I analyze Wikipedia, one of the prominent examples of new media. I study why women are less likely to contribute to Wikipedia, the implications of the gender gap, and what can be done about it. I find that: (1) gender differences in the frequency of Wikipedia use and in beliefs about one's competence explain a large share of the gender gap in Wikipedia writing; (2) the gender gap among contributors leads to unequal coverage of topics; (3) providing information about gender inequality has a large effect on contributions. JEL codes: L86; L82; J16; H41 Keywords: Gender, Internet, Media, Public goods 1 Introduction United Nations is engaged in an initiative to increase female presence in the media.1 The initiative was established \in recognition that the media are critical to the achievement of gender equality and women's empowerment".2 In most traditional media outlets, men vastly outnumber women, and several major media organizations have taken steps to change that. Examples include Bloomberg introducing gender quotas for news stories3 and BBC for panel ∗I'm grateful to Ainhoa Aparicio, Gabriele Camera, Matthias Doepke, Martin Dufwenberg, Avi Goldfarb, Shane Greenstein, Toomas Hinnosaar, Ryan McDevitt, David McKenzie, Mario Pagliero, Andrea Prat, Stephan Seiler, Alex Tetenov, and Michael Zhang for helpful comments and discussions.
    [Show full text]
  • Writing Women in Mathematics Into Wikipedia
    Writing Women in Mathematics into Wikipedia Marie A. Vitulli University of Oregon Eugene, OR 97403-1222 October 20, 2017 Abstract In this article I reflect upon the problemsconnected with writing women in mathematics into Wikipedia. I discuss some of the current projects and efforts aimed at increasing the visibility of women in mathe- matics on Wikipedia. I present the rules for creating a biography on Wikipedia and relate my personal experiences in creating such articles. I hope to provide the reader with the background and resources to start editing existing Wikipedia articles and the confidence to create new articles. I would also like to encourage existing editors to look out for and protect new articles about women mathematicians and submit new articles. 1 Introduction The twin problems of the paucity of women subjects and scarcity of women editors on Wikipedia are well known but no solutions are on the horizon. We can nevertheless take small steps to address these problems. In response to my concerns about these problems and in order to gain first-hand experience, I became a Wikipedia editor in 2013. In this article I will update and expand upon my article Writing Women in Mathematics into Wikipedia, which appeared in the May–June 2014 issue of the Association for Women in Mathematics Newsletter [8]. Since 2013 some things have improved on Wikipedia and others have remained more or less the same. I will first report on these and then relate some of my personal experiences as a Wikipedia editor. Briefly, this is how this article is organized.
    [Show full text]
  • Digital Citizenship Toolkit
    Digital Citizenship Toolkit Digital Citizenship Toolkit MICHELLE SCHWARTZ RYERSON UNIVERSITY TORONTO Digital Citizenship Toolkit by Michelle Schwartz is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted. This book is licensed CC BY unless otherwise noted. Cover image by Gerd Altmann from Pixabay This book was produced with Pressbooks (https://pressbooks.com) and rendered with Prince. Introduction Hello and welcome to the Digital Citizenship Toolkit. Have you ever wondered if your phone is listening to you? Do you ever look to the Internet for the answer to a question, and hours later, find that you are more confused than before? Have you argued with a friend or relative about a meme? Have you been tempted to share your own thoughts and feelings online, but resisted for fear of trolls? This book delves into these issues and more. Finding What You’re Looking For If you spend any time at all plugged into the digital realm, whether through social media apps, streaming services, or websites, it’s easy to become overwhelmed by the sheer volume of content that exists – you can scroll all you want, but you’ll never reach the end of the Internet. All of that content, from the silliest YouTube video to the most serious news reports, has to be critically evaluated by you, the viewer. When you engage with the digital world, are you able to find the answers to your questions? When there are a thousand different responses to the same event, how do you know who to trust? Can you identify whose voice is missing? These are some of the components that make up digital literacy, the topic of Chapter 1.
    [Show full text]
  • Gender and Equity in Openness: Forgotten Spaces
    4 Gender and Equity in Openness: Forgotten Spaces Sonal Zaveri Introduction: Why Is Gender Equity Important? Gender equity1 is recognized by many to be critical to ensure an inclusive society that benefits all people. A gap in gender parity across the world has been well documented. The World Economic Forum’s The Global Gender Gap Report 2017 (Leopold, Ratcheva, and Zahidi 2017), which measures the gap between men and women in four key areas of health, education, economy, and politics, indicated that in 2017, an average gap of 32 percent remained to be closed between men and women. Some countries showed a reversal, and others some positive steps forward, indicating that context and sustained efforts are important while tracking gender measures. Yet these persistent gender gaps have focused global attention and stimulated a call for action to track progress and close these gaps (Leopold, Ratcheva, and Zahidi 2016). Although the promise of open development is that it will be neutral, inclusive, and gender- fair, operationalizing openness to ensure that men and women benefit equally can be problematic. This is largely because gender asymmetries and the inherent social structures that fuel them are often not addressed adequately, even when technological and operational concerns are attended to. Addressing technological challenges faced by women is just that— a service delivery that reaches out to women. But participation is much more than numbers and attendance; it is about speaking up, sharing ideas, and being heard. Gender asymmetries in participation and voice have been attributed to a failure to address entrenched cultural and traditional biases and a lack of gender- sensitive policies (Neuman 2016).
    [Show full text]
  • Writing Women in Mathematics Into Wikipedia
    COMMUNICATION Writing Women in Mathematics into Wikipedia Marie A. Vitulli Communicated by Harriet Pollatsek The Paucity of Women Subjects ABSTRACT. In this article I reflect upon the problems connected with writing women in mathematics into There have been numerous allegations that Wikipedia Wikipedia. I hope to persuade the reader to start editing suffers from systemic gender bias with respect to both existing Wikipedia articles and to create new articles. content and editors and that the attempts to increase 1 I also hope to persuade existing editors to contribute women’s participation have failed. new articles about women mathematicians. One disturbing manifestation of the underrepresenta- tion of women in Wikipedia is the dearth of biographies of notable women. According to WikiProject Women in Red, The twin problems of the paucity of women subjects and only 16.36% of the biographies in English Wikipedia were scarcity of women editors on Wikipedia are well known, about women as of August 7, 2016. This is up from just but no solutions are on the horizon. We can nevertheless over 15% in November 2014. Wikipedia has guidelines on take small steps to address these problems. In response academic notability (also called the professor test) that to my concerns about these problems and in order to gain a subject must meet to merit a Wikipedia page. The first first-hand experience, I became a Wikipedia editor in 2013. two of the nine criteria of academic notability are: the In the first section, I discuss gender bias on Wikipedia and person’s research has made significant impact in their the underrepresentation of women subjects.
    [Show full text]
  • A Machine Learning Model to Detect Gender Bias in Wikipedia
    Wikigender: A Machine Learning Model to Detect Gender Bias in Wikipedia Natalie Bolón Brun Sofia Kypraiou École Polytechnique Fédérale de Lausanne (EPFL) École Polytechnique Fédérale de Lausanne (EPFL) Lausanne, Switzerland Lausanne, Switzerland [email protected] [email protected] Natalia Gullón Altés Irene Petlacalco Barrios École Polytechnique Fédérale de Lausanne (EPFL) École Polytechnique Fédérale de Lausanne (EPFL) Lausanne, Switzerland Lausanne, Switzerland [email protected] [email protected] ABSTRACT In this work, we model the problem as a prediction task to infer The way Wikipedia’s contributors think can influence how they the gender of the person described by using the set of words used describe individuals resulting in a bias based on gender. We use a in the article. We base our prediction on a logistic regression model machine learning model to prove that there is a difference in how that provides interpretable insights on the importance of its features. women and men are portrayed on Wikipedia. Additionally, we use The difference is manifested as the presence of different words the results of the model to obtain which words create bias in the given the sex of the person being described. This bias is also studied overview of the biographies of the English Wikipedia. Using only along with different occupations. Finally, we analyze those words adjectives as input to the model, we show that the adjectives used that appear as most predictive for each gender and quantify their to portray women have a higher subjectivity than the ones used subjectivity and strength. to describe men. Extracting topics from the overview using nouns Results show that there is actually a distinction in the usage and adjectives as input to the model, we obtain that women are of words based on gender.
    [Show full text]