The Sum of Human Knowledge? Not in One Wikipedia Language Edition
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Librarians As Wikimedia Movement Organizers in Spain: an Interpretive Inquiry Exploring Activities and Motivations
Librarians as Wikimedia Movement Organizers in Spain: An interpretive inquiry exploring activities and motivations by Laurie Bridges and Clara Llebot Laurie Bridges Oregon State University https://orcid.org/0000-0002-2765-5440 Clara Llebot Oregon State University https://orcid.org/0000-0003-3211-7396 Citation: Bridges, L., & Llebot, C. (2021). Librarians as Wikimedia Movement Organizers in Spain: An interpretive inquiry exploring activities and motivations. First Monday, 26(6/7). https://doi.org/10.5210/fm.v26i3.11482 Abstract How do librarians in Spain engage with Wikipedia (and Wikidata, Wikisource, and other Wikipedia sister projects) as Wikimedia Movement Organizers? And, what motivates them to do so? This article reports on findings from 14 interviews with 18 librarians. The librarians interviewed were multilingual and contributed to Wikimedia projects in Castilian (commonly referred to as Spanish), Catalan, Basque, English, and other European languages. They reported planning and running Wikipedia events, developing partnerships with local Wikimedia chapters, motivating citizens to upload photos to Wikimedia Commons, identifying gaps in Wikipedia content and filling those gaps, transcribing historic documents and adding them to Wikisource, and contributing data to Wikidata. Most were motivated by their desire to preserve and promote regional languages and culture, and a commitment to open access and open education. Introduction This research started with an informal conversation in 2018 about the popularity of Catalan Wikipedia, Viquipèdia, between two library coworkers in the United States, the authors of this article. Our conversation began with a sense of wonder about Catalan Wikipedia, which is ranked twentieth by number of articles, out of 300 different language Wikipedias (Meta contributors, 2020). -
The Wikipedia Diversity Observatory a Project to Identify and Bridge Content Gaps in Wikipedia
The Wikipedia Diversity Observatory A Project to Identify and Bridge Content Gaps in Wikipedia Marc Miquel-Ribé (Universitat Pompeu Fabra, Barcelona, Catalonia), David Laniado (Eurecat, Centre Tecnològic de Catalunya) [email protected] [email protected] ABSTRACT In this paper we present the Wikipedia Diversity Observatory, a project aimed to increase diversity within Wikipedia language editions. The project includes dashboards with visualizations and tools which show the gaps in terms of concepts not represented or not shared across languages. The dashboards are built on datasets generated for each of the more than 300 language editions, with features that label each article according to different categories relevant to overall content diversity. Through various examples, we show how the tools encourage and help editors to bridge the gaps in Wikipedia content. Finally, we discuss the project's impact on the communities and implications for the Wikimedia movement, in a moment in which covering diversity is considered strategic. 1 Introduction Wikipedia is among the largest information repositories on the Internet that are both multilingual and created through collaborative effort. Its prime objective is to "give free access to the sum of all human knowledge" and, consequently, it exists in as many as 309 languages. Even though the language communities make the projects grow on a constant basis, the content does not represent the existing diversity in peoples, places, and cultures of the world; furthermore, there is a gap between language editions and articles often are not shared, or remain even unique to one language [1]. The creation of articles in Wikipedia language editions is spontaneous and non-directed. Several studies showed that cultural and geographical -
Master Thesis
MASTER THESIS TITLE: Cultural configuration of Wikipedia: measuring Autoreferentiality in different languages MASTER DEGREE: Master of Science in Telecommunication Engineering & Management AUTHOR: Marc Miquel Ribé DIRECTOR: Horacio Rodríguez Hontoria TUTOR: Sebastià Sallent Ribes DATE: March 31, 2011 Abstract "Wikipedia is a multilingual, collaborative, web-based, non-profit encyclopedia project driven by the Wikimedia Foundation": this is how Wikipedia describes itself in the definition of the article that bears its name. This means that the encyclopedia can be modified at any time, by anyone, and from anywhere. These premises and its high level of participation make it an excellent social object of study which, being at the same time a technological artifact, also allows the use of natural language processing, data retrieval, and data mining techniques. However, current research clearly lacks software that can approach it in an integral way. Given this gap, we carry out a characterization of Wikipedia with the aim of understanding in depth which elements and information structures it contains and how they can then be obtained with an analytical tool. We start from the existing API called wikAPIdia, which we develop further to include new functionality and prepare it to face multiple scenarios and problems in the social sciences. -
Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories
Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories Brent Hecht† and Darren Gergle†‡ †Dept. of Electrical Engineering and Computer Science ‡Dept. of Communication Studies Northwestern University [email protected], [email protected] ABSTRACT Self-focus is a novel way of understanding a type of bias in community-maintained Web 2.0 graph structures. It goes beyond previous measures of topical coverage bias by encapsulating both node- and edge-hosted biases in a single holistic measure of an entire community-maintained graph. We outline two methods to quantify self-focus, one of which is very computationally inexpensive, and present empirical evidence for the existence of self-focus using a “hyperlingual” approach that examines 15 different language editions of Wikipedia. We suggest applications of our methods and discuss the risks of ignoring self-focus bias in technological applications. Categories and Subject Descriptors H.5.3 [Information Systems]: Group and Organization Interfaces – collaborative computing, computer-supported cooperative work, theory and models.
In this paper, we explore this question by introducing and measuring self-focus, a new way of understanding a type of bias in the graph structures that underlie many of these community-maintained repositories, including one of the largest: Wikipedia. We define self-focus bias as occurring when contributors to a knowledge repository encode information that is important and correct to them and a large proportion of contributors to the same repository, but not important and correct to contributors of similar repositories. Self-focus bias is similar to topical coverage biases [8, 11] in that it seeks to describe the semantic makeup of knowledge repositories. Topical coverage bias studies explicitly or implicitly compare the distribution of articles (or a similar measure) in particular semantic categories in Wikipedia to that of a more traditional knowledge repository, generally in an effort to show that Wikipedia describes in more detail semantic areas that are of -
The Case of 13 Wikipedia Instances
Interaction Design and Architecture(s) Journal - IxD&A, N.22, 2014, pp. 34-47 The Impact of Culture On Smart Community Technology: The Case of 13 Wikipedia Instances Zinayida Petrushyna1, Ralf Klamma1, Matthias Jarke1,2 1 Advanced Community Information Systems Group, Information Systems and Databases Chair, RWTH Aachen University, Ahornstrasse 55, 52056 Aachen, Germany 2 Fraunhofer Institute for Applied Information Technology FIT, 53754 St. Augustin, Germany {petrushyna, klamma}@dbis.rwth-aachen.de [email protected] Abstract Smart communities provide technologies for monitoring social behaviors inside communities. The technologies that support knowledge building should consider the cultural background of community members. Studies of the influence of culture on knowledge building are limited. Just a few works consider digital traces of individuals that they explain using cultural values and beliefs. In this work, we analyze 13 Wikipedia instances where users with different cultural backgrounds build knowledge in different ways. We compare edits of users. Using social network analysis we build and analyze co-authorship networks and watch the networks' evolution. We explain the differences we have found using Hofstede dimensions and Schwartz cultural values and discuss implications for the design of smart community technologies. Our findings provide insights into requirements for technologies used for smart communities in different cultures. Keywords: Social network analysis, Wikipedia communities, Hofstede dimensions, Schwartz cultural values 1 Introduction People prefer to live in smart cities where their needs are satisfied [1]. The development of smart cities depends on the collaboration of individuals. The investigation of the flow [1] of knowledge created by the individuals allows the monitoring of city smartness. -
DLDP Digital Language Survival Kit
The Digital Language Diversity Project Digital Language Survival Kit The DLDP Recommendations to Improve Digital Vitality Imprint The DLDP Digital Language Survival Kit Authors: Klara Ceberio Berger, Antton Gurrutxaga Hernaiz, Paola Baroni, Davyth Hicks, Eleonore Kruse, Valeria Quochi, Irene Russo, Tuomo Salonen, Anneli Sarhimaa, Claudia Soria This work has been carried out in the framework of The Digital Language Diversity Project (www.dldp.eu), funded by the European Union under the Erasmus+ Programme (Grant Agreement no. 2015-1-IT02-KA204-015090) © 2018 This work is licensed under a Creative Commons Attribution 4.0 International License. Cover design: Eleonore Kruse Disclaimer This publication reflects only the authors’ view and the Erasmus+ National Agency and the Commission are not responsible for any use that may be made of the information it contains. www.dldp.eu www.facebook.com/digitallanguagediversity [email protected] www.twitter.com/dldproject Recommendations at a Glance Digital Capacity Recommendations Indicator Level Recommendations Digital Literacy 2,3 Increasing digital literacy among your native language-speaking community 2,3 Promote the upskilling of language mentors, activists or disseminators 2,3 Establish initiatives to inform and educate speakers about how to acquire and use particular communication and content creation skills 2 Teaching digital literacy to children in your language community through -
Gebiotoolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4081–4088 Marseille, 11–16 May 2020 © European Language Resources Association (ELRA), licensed under CC-BY-NC GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet∗ TALP Research Center, Universitat Politècnica de Catalunya, Barcelona ∗ DFKI GmBH and Saarland University, Saarbrücken [email protected], [email protected], [email protected] Abstract We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies. Despite the gender inequalities present in Wikipedia, the toolkit has been designed to extract corpus balanced in gender. While our toolkit is customizable to any number of languages (and to other domains than biographical entries), in this work we present a corpus of 2,000 sentences in English, Spanish and Catalan, which has been post-edited by native speakers to become a high-quality dataset for machine translation evaluation. While GeBioCorpus aims at being one of the first non-synthetic gender-balanced test datasets, GeBioToolkit aims at paving the path to standardize procedures to produce gender-balanced datasets. Keywords: corpora, gender bias, Wikipedia, machine translation 1. Introduction Gender biases are present in many natural language processing applications (Costa-jussà, 2019). This comes as an undesired characteristic of deep learning architectures [...] by volunteers. The toolkit is customizable for languages and gender-balance. We take advantage of Wikipedia multilinguality to extract a corpus of biographies, being each biography a document available in all the selected languages. -
Mining Cross-Cultural Relations from Wikipedia - a Study of 31 European Food Cultures
Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures Paul Laufer (Graz University of Technology, Graz, Austria), Claudia Wagner (GESIS & U. of Koblenz, Cologne, Germany), Fabian Flöck (GESIS, Cologne, Germany), Markus Strohmaier (GESIS & U. of Koblenz, Cologne, Germany) [email protected] [email protected] fabian.fl[email protected] [email protected] ABSTRACT For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural practices. Unveiling diverging representations of cultures provides an important insight, since they may foster the formation of cross-cultural stereotypes, misunderstandings and potentially even conflict. In this work, we explore to what extent the descriptions of cultural practices in various European language editions of Wikipedia differ on the example of culinary practices and propose an approach to mine cultural relations between different language communities through their description of and interest in their own and other communities' food culture. [...] the editor community of the Romanian-language Wikipedia could either have a deviant mental picture of the French cuisine, or it might estimate the priorities of Romanian-speaking readers to rather be on meat-based French delicatessen than on wine and baking goods. Further, the general interest of the Romanian-speaking readers in the French cuisine (for example measured by the number of views of the article about French cuisine in the Romanian language edition) might serve to potentially displease any Francophile, since the Romanian speaking community might show notably less interest in the French kitchen than in the Russian or Hungarian one. This hypothetical scenario serves as an example for numerous similar real-world cases (which can- -
Proceedings of Rely on Different Character Sets Such As MATMT2008 Workshop: Mixing Approaches to CJK Or Arabic
9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages” LREC 2014, Reykjavík, Iceland, 27 May 2014 Workshop Programme 09:00 – 09:30 Welcoming address by Workshop co-chair Mikel L. Forcada 09:30 – 10:30 Oral papers Iñaki Alegria, Unai Cabezon, Unai Fernandez de Betoño, Gorka Labaka, Aingeru Mayor, Kepa Sarasola and Arkaitz Zubiaga Wikipedia and Machine Translation: killing two birds with one stone Gideon Kotzé and Friedel Wolff Experiments with syllable-based English-Zulu alignment 10:30 – 11:00 Coffee break 11:00 – 13:00 Oral papers Inari Listenmaa and Kaarel Kaljurand Computational Estonian Grammar in Grammatical Framework Matthew Marting and Kevin Unhammer FST Trimming: Ending Dictionary Redundancy in Apertium Hrvoje Peradin, Filip Petkovski and Francis Tyers Shallow-transfer rule-based machine translation for the Western group of South Slavic languages Alex Rudnick, Annette Rios Gonzales and Michael Gasser Enhancing a Rule-Based MT System with Cross-Lingual WSD 13:00 – 13:30 General discussion 13:30 Closing Editors Mikel L. Forcada Universitat d’Alacant, Spain Kepa Sarasola Euskal Herriko Unibertsitatea, Spain Francis M. Tyers UiT Norgga árktalaš universitehta, Norway Workshop Organizers/Organizing Committee Mikel L. Forcada Universitat d’Alacant, Spain Kepa Sarasola Euskal Herriko Unibertsitatea, Spain Francis M. Tyers UiT Norgga árktalaš universitehta, Norway Workshop Programme Committee Iñaki Alegria Euskal Herriko Unibertsitatea, Spain Lars Borin Göteborgs Universitet, Sweden Elaine Uí Dhonnchadha Trinity College Dublin, Ireland Mikel L. 
Forcada Universitat d’Alacant, Spain Michael Gasser Indiana University, USA Måns Huldén Helsingin Yliopisto, Finland Krister Lindén Helsingin Yliopisto, Finland Nikola Ljubešić Sveučilište u Zagrebu, Croatia Lluís Padró Universitat Politècnica de Catalunya, Spain Juan Antonio Pérez-Ortiz Universitat d’Alacant, Spain Felipe Sánchez-Martínez Universitat d’Alacant, Spain Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain Kevin P. -
Whatsupcat1.Pdf
What’s up with Catalonia? “. the causes which impel them to the separation . .” Edited by Liz Castro Catalonia Press What’s up with Catalonia? The causes which impel them to the separation Translated and edited by Liz Castro Published by Catalonia Press http://www.cataloniapress.com Ashfield, Massachusetts, USA Copyright © 2013 Each writer maintains the copyright for his or her respective article. Cover design: Andreu Cabré © 2013 All rights reserved Proofreading: Margaret Trejo Notice of rights All rights reserved. No part of this book may be reproduced or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, except by the collaborators themselves, each of whom may publish his or her own article on his or her own website. For information on getting permission for reprints and excerpts, contact [email protected] ISBN: Print: 978-1-61150-032-5 EPUB: 978-1-61150-033-2 Kindle: 978-1-61150-034-9 Library of Congress Control Number: 2013901821 Contents Editor’s note 7 Liz Castro Prologue: A new path for Catalonia 9 Artur Mas i Gavarró President of Catalonia Catalonia, a new state in Europe 13 Carme Forcadell Lluís 2013: The transition year toward the referendum on independence 19 Oriol Junqueras Premeditated asphyxia 23 Elisenda Paluzie It’s always been there 31 F. Xavier Vila Catalonia, land of immigration 39 Andreu Domingo Opening the black box of secessionism 45 Laia Balcells Schooling in Catalonia (1978–2012) 51 Pere Mayans Balcells The view from Brussels 59 Ramon Tremosa i Balcells Keep Calm and Speak Catalan 67 Josep Maria Ganyet Wilson, Obama, Catalonia, and Figueres 75 Enric Pujol Casademont News from Catalonia 79 Josep M. -
Conference Abstracts
NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Patronage of UNESCO, the United Nations Educational, Scientific and Cultural Organization MAY 26 – 31, 2014 HARPA CONFERENCE CENTER REYKJAVIK, ICELAND CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis. Assistant Editors: Sara Goggi, Jérémy Leixa, Hélène Mazo The LREC 2014 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License LREC 2014, NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2014 Conference Abstracts Distributed by: ELRA – European Language Resources Association 9, rue des Cordelières 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] ISBN 978-2-9517408-8-4 EAN 9782951740884 Introduction of the Conference Chair and ELRA President Nicoletta Calzolari I wish to express to Mrs. Irina Bokova, Director-General of UNESCO, the gratitude of the Program Committee and of all LREC participants, as well as my personal gratitude, for her Distinguished Patronage of LREC 2014. Languages – mentioned in the first article of the UNESCO Constitution – have been at the heart of UNESCO's mission and programmes throughout its history. I am also especially grateful to Madame Vigdís Finnbogadóttir, UNESCO’s Goodwill Ambassador for languages and former President of Iceland (1980-1996), the first woman in the world elected as head of state in a democratic election, for the continuous personal support she has granted to LREC since our first visit to Reykjavík in 2012. -
Wikitrip: Animated Visualization Over Time of Gender and Geo-Location of Wikipedians Who Edited a Page
WikiTrip: animated visualization over time of gender and geo-location of Wikipedians who edited a page Paolo Massa, Maurizio Napolitano, Federico Scrinzi {massa, napolita, fscrinzi}@fbk.eu Bruno Kessler Foundation Trento, Italy WikiTrip lets you take a trip through the creation process of any Wikipedia page from any language edition of Wikipedia. WikiTrip is an interactive web tool that gives its users an insightful visualization of two kinds of information about the Wikipedians who edited the selected page: their location in the world and their gender. If you want to investigate, for example, where in the world the Wikipedians who edited the page “Peace” are, WikiTrip is the right tool. You can also check the origin of edits for the equivalent page “سلم” in the Arabic Wikipedia or “Amani” in the Swahili Wikipedia. Moreover, if you have ever wondered whether a specific page was edited more by male or female Wikipedians, WikiTrip lets you explore this information as well. Both kinds of information are visualized over time, so that you can appreciate the evolution of the page over the years, from its creation up to the present. WikiTrip is available at http://sonetlab.fbk.eu/wikitrip/ Locations in the world are visualized on a zoomable and scrollable map and, in order to deal with large datasets of points, they are clustered at runtime in bubbles of varying dimensions depending on the number of points in that location. The 10 countries from which most edits came are also visualized in a dedicated bar plot. The location is visualized only for edits made by anonymous users, since they are identified by their IP address and this can be mapped by WikiTrip to the place in the world from which the user edited Wikipedia.
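The runtime clustering into bubbles and the top-10 country ranking described in the WikiTrip excerpt can be sketched roughly as follows. This is a minimal illustration in Python, not WikiTrip's actual implementation: the function names `cluster_points` and `top_countries`, the fixed 10-degree grid, and the sample edits are all assumptions made for the example.

```python
from collections import Counter


def cluster_points(points, cell_deg=10.0):
    """Group (lat, lon) points into square grid cells.

    Returns a dict mapping a cell index to the number of points in it;
    that count could drive the size of the bubble drawn for the cell.
    The grid approach and cell size are illustrative assumptions.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return dict(counts)


def top_countries(edit_countries, n=10):
    """Return the n countries with the most edits, as (country, count) pairs."""
    return Counter(edit_countries).most_common(n)


# Hypothetical anonymous edits, already geolocated: (lat, lon, country code).
edits = [
    (46.07, 11.12, "IT"),
    (46.50, 11.35, "IT"),
    (41.90, 12.50, "IT"),
    (48.85, 2.35, "FR"),
    (-1.29, 36.82, "KE"),
]

bubbles = cluster_points([(lat, lon) for lat, lon, _ in edits])
ranking = top_countries([country for _, _, country in edits])
```

A coarser `cell_deg` merges more points into each bubble, which is one simple way to keep a zoomed-out map readable; a real map widget would more likely recompute clusters per zoom level.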