Towards an On-Demand Simple Portuguese Wikipedia

Arnaldo Candido Junior (Institute of Mathematics and Computer Sciences, University of São Paulo), arnaldoc at icmc.usp.br
Ann Copestake (Computer Laboratory, University of Cambridge), Ann.Copestake at cl.cam.ac.uk
Lucia Specia (Research Group in Computational Linguistics, University of Wolverhampton), l.specia at wlv.ac.uk
Sandra Maria Aluísio (Institute of Mathematics and Computer Sciences, University of São Paulo), sandra at icmc.usp.br

Abstract

The Simple English Wikipedia provides a simplified version of Wikipedia's English articles for readers with special needs. However, there are fewer efforts to make the information in Wikipedia in other languages accessible to a large audience. This work proposes the use of a syntactic simplification engine with high-precision rules to automatically generate a Simple Portuguese Wikipedia on demand, based on user interactions with the main Portuguese Wikipedia. Our estimates indicated that a human can simplify about 28,000 occurrences of the analysed patterns per million words, while our system can correctly simplify 22,200 occurrences, with an estimated f-measure of 77.2%.

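For reference, the f-measure quoted above is the usual harmonic mean of precision and recall. The sketch below shows the standard definition together with one possible reading of the abstract's counts; treating the roughly 28,000 human-simplifiable occurrences per million words as the reference set is an assumption made here for illustration, not a breakdown given in the paper.

```latex
% F-measure as the harmonic mean of precision (P) and recall (R).
% The recall estimate assumes the ~28,000 human-simplifiable occurrences
% per million words form the reference set (an assumption for illustration,
% not a figure stated in the paper).
F = \frac{2\,P\,R}{P + R},
\qquad
R \approx \frac{22{,}200}{28{,}000} \approx 0.79,
\qquad
F \approx 0.772 .
```
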
1 Introduction

The Simple English Wikipedia (http://simple.wikipedia.org/) is an effort to make the information in Wikipedia (http://www.wikipedia.org/) accessible to less competent readers of English by using simple words and grammar. Examples of intended users include children and readers with special needs, such as users with learning disabilities and learners of English as a second language.

Simple English (or Plain English), used in this version of Wikipedia, is a result of the Plain English movement that arose in Britain and the United States in the late 1970s as a reaction to the unclear language used in government and business forms and documents. Some recommendations on how to write and organize information in Plain Language (the set of guidelines for writing simplified texts) concern both the syntactic and the lexical level: use short sentences; avoid hidden verbs; use the active voice; use concrete, short, simple words.

A number of resources, such as lists of common words (e.g. http://simple.wiktionary.org/), are available for the English language to help users write in Simple English. These include lexical resources like the MRC Psycholinguistic Database (http://www2.let.vu.nl/resources/elw/resource/mrc.html), which helps identify difficult words using psycholinguistic measures. However, such resources do not exist for Portuguese. An exception is a small list of simple words compiled as part of the PorSimples project (Aluísio et al., 2008).

Although the Plain Language guidelines can in principle be applied to many languages and text genres, for Portuguese there are very few efforts using Plain Language to make information accessible to a large audience. To the best of our knowledge, the solution offered by Portugues Claro (http://www.portuguesclaro.pt/) to help organizations produce European Portuguese (EP) documents in simple language is the only commercial option in this direction. For Brazilian Portuguese (BP), a Brazilian law (10098/2000) tries to ensure that content in e-Gov sites and services is written in simple and direct language, in order to remove barriers to communication and to ensure citizens' rights to information and communication access. However, as shown in Martins and Filgueiras (2007), content in such websites still needs considerable rewriting to follow the Plain Language guidelines.

A few efforts from the research community have recently resulted in natural language processing systems to simplify Portuguese and make it clearer. ReEscreve (Barreiro and Cabral, 2009) is a multi-purpose paraphraser that helps users simplify their EP texts by reducing their ambiguity, number of words and complexity. The linguistic phenomena currently paraphrased are support-verb constructions, which are replaced by stylistic variants. In the case of BP, the lack of simplification systems led to the development of the PorSimples project (Aluísio and Gasperin, 2010), which uses simplification at different linguistic levels to provide simplified texts to poor-literacy readers.

For English, automatic text simplification has been exploited to help readers with poor literacy (Max, 2006) and readers with other special needs, such as aphasic people (Devlin and Unthank, 2006; Carroll et al., 1999). It has also been used in bilingual education (Petersen, 2007) and for improving the accuracy of Natural Language Processing (NLP) tasks (Klebanov et al., 2004; Vickrey and Koller, 2008).

Given the general scarcity of human resources to manually simplify large content repositories such as Wikipedia, simplifying texts automatically can be the only feasible option. The Portuguese Wikipedia, for example, is the tenth largest Wikipedia (as of May 2011), with 683,215 articles and approximately 860,242 contributors (http://meta.wikimedia.org/wiki/List_of_Wikipedias#Grand_Total).

In this paper we propose a new rule-based syntactic simplification system to create a Simple Portuguese Wikipedia on demand, based on user interactions with the main Portuguese Wikipedia. We use a simplification engine to change passive voice into active voice and to break down and change the syntax of subordinate clauses. We focus on these operations because they are more difficult to process for readers with learning disabilities than other constructions such as coordination and complex noun phrases (Abedi et al., 2011; Jones et al., 2006; Chappell, 1985). User interaction with Wikipedia can be handled by a system like Facilita (http://nilc.icmc.usp.br/porsimples/facilita/) (Watanabe et al., 2009), a browser plug-in developed in the PorSimples project to allow the automatic adaptation (summarization and syntactic simplification) of any web page in BP.

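The paper does not reproduce the engine's rules, but the two operation types just described can be illustrated with a deliberately minimal sketch. Everything in it is a hypothetical stand-in: the regular-expression patterns, the tiny participle lexicon and the example sentences are illustrative only, and the engine described in the paper applies high-precision rules over syntactic constructions rather than surface patterns of this kind.

```python
"""Toy illustration of two simplification operations: passive-to-active
conversion and splitting of a non-restrictive relative clause.  This is a
sketch under simplifying assumptions, not the authors' engine."""

import re

# Hypothetical lexicon: participle -> active third-person singular past form.
PARTICIPLE_TO_ACTIVE = {
    "escrito": "escreveu",
    "aprovado": "aprovou",
    "criado": "criou",
}

# "X foi <participle> por Y."  (toy passive-voice pattern)
PASSIVE = re.compile(r"^(?P<patient>.+?) foi (?P<part>\w+) por (?P<agent>.+?)\.$")
# "HEAD, que CLAUSE, REST"      (toy non-restrictive relative clause pattern)
RELATIVE = re.compile(r"^(?P<head>[^,]+), que (?P<clause>[^,]+), (?P<rest>.+)$")


def lowercase_first(text: str) -> str:
    return text[:1].lower() + text[1:]


def passive_to_active(sentence: str) -> str:
    """Rewrite 'X foi <participle> por Y.' as 'Y <verb> x.' when the rule applies."""
    m = PASSIVE.match(sentence)
    if not m or m.group("part") not in PARTICIPLE_TO_ACTIVE:
        return sentence  # rule does not apply: leave the sentence untouched
    verb = PARTICIPLE_TO_ACTIVE[m.group("part")]
    return f"{m.group('agent')} {verb} {lowercase_first(m.group('patient'))}."


def split_relative_clause(sentence: str) -> list[str]:
    """Split 'HEAD, que CLAUSE, REST' into two shorter sentences."""
    m = RELATIVE.match(sentence)
    if not m:
        return [sentence]
    head = m.group("head")
    return [f"{head} {m.group('rest')}", f"{head} {m.group('clause')}."]


if __name__ == "__main__":
    print(passive_to_active("O artigo foi escrito por Machado de Assis."))
    # -> Machado de Assis escreveu o artigo.
    for out in split_relative_clause(
        "O projeto, que foi criado em 2007, simplifica textos para leitores."
    ):
        print(out)
    # -> O projeto simplifica textos para leitores.
    # -> O projeto foi criado em 2007.
```

Note that both rules fail safe: when a pattern does not match, the sentence is returned unchanged, which is the behaviour that favours precision over coverage in a rule-based setting such as the one adopted in the paper.
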
This paper is organized as follows. Section 2 presents related work on syntactic simplification. Section 3 presents the methodology used to build and evaluate the simplification engine for BP. Section 4 presents the results of the engine evaluation. Section 5 presents an analysis of simplification issues and discusses possible improvements. Section 6 contains some final remarks.

2 Related work

Given the dependence of syntactic simplification on linguistic information, successful approaches are mostly rule-based. Approaches using operations learned from corpora have not been shown to be able to perform complex operations such as the splitting of sentences with relative clauses (Chandrasekar and Srinivas, 1997; Daelemans et al., 2004; Specia, 2010). On the other hand, the use of machine learning techniques to predict when to simplify a sentence, i.e. to learn the properties of language that distinguish simple from normal texts, has achieved relative success (Napoles and Dredze, 2010). Therefore, most work on syntactic simplification still relies on rule-based systems to simplify a set of syntactic constructions. This is also the approach we follow in this paper. In what follows we review relevant work on syntactic simplification.

The seminal work of Chandrasekar and Srinivas (1997) investigated the induction of syntactic rules from a corpus annotated with part-of-speech tags augmented by agreement and subcategorization information. They extracted syntactic correspondences and generated rules aiming to speed up parsing and improve its accuracy, but did not work on naturally occurring texts.

Daelemans et al. (2004) compared machine learning and rule-based approaches for the automatic generation of TV subtitles for hearing-impaired people. In their machine learning approach, a simplification model is learned from parallel corpora of TV programme transcripts and the associated subtitles. Their method used a memory-based learner and features such as words, lemmas, POS tags, chunk tags, relation tags and proper name tags, among other features (30 in total). However, this approach did not perform as well as the authors expected, making errors such as removing sentence subjects or deleting part of a multi-word unit.

More recently, Specia (2010) presented a new approach for text simplification based on the framework of Statistical Machine Translation. Although the results are promising for lexical simplification, syntactic rewriting was not captured by the model to address long-distance operations, since syntactic information was not included in the framework.

Inui et al. (2003) proposed a rule-based system for text simplification aimed at deaf people. Using about one thousand manually created rules, the authors generate several paraphrases for each sentence and train a classifier to select the simpler ones. Promising results were obtained, although different types of errors on the paraphrase […]

[…] by a Link Grammar parser. Their motivation is to improve the performance of systems that extract Protein-Protein Interactions automatically from biomedical articles by automatically simplifying sentences. Besides treating the syntactic phenomena described in Siddharthan (2006), they remove describing phrases occurring at the beginning of sentences, such as "These results suggest that" and "As reported previously". While they focus on the scientific genre, our work focuses on the encyclopedic genre.

In order to obtain a text easier to understand […]