The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context
Brent Hecht* and Darren Gergle*†
Northwestern University
*Dept. of Electrical Engineering and Computer Science, †Dept. of Communication Studies
[email protected], [email protected]

ABSTRACT
This study explores language’s fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create “culturally-aware applications” and “hyperlingual applications”.

Author Keywords
Wikipedia, knowledge diversity, multilingual, hyperlingual, Explicit Semantic Analysis, semantic relatedness

[...] the goal of this research to illustrate the splintering effect of this “Web 2.0 Tower of Babel”1 and to explicate the positive and negative implications for HCI and AI-based applications that interact with or use Wikipedia data.

We begin by suggesting that current technologies and applications that rely upon Wikipedia data structures implicitly or explicitly espouse a global consensus hypothesis with respect to the world’s encyclopedic knowledge. In other words, they make the assumption that encyclopedic world knowledge is largely consistent across cultures and languages. To the social scientist this notion will undoubtedly seem problematic, as centuries of work have demonstrated the critical role culture and context play in establishing knowledge diversity (although no work has yet measured this effect in Web 2.0 user-generated content (UGC) on a large scale).
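The concept-level side of this measurement can be illustrated with a small sketch. It is not the authors' method: it assumes that concepts have already been aligned across language editions (for example via interlanguage links), and the function names and the three-edition toy data are hypothetical. The idea is that under the global consensus hypothesis most concepts would appear in every edition, so the share of concepts found in all editions should be close to 1.

```python
from collections import Counter


def edition_counts(editions):
    """Map each concept to the number of language editions that include it.

    `editions` is a dict: language code -> set of already-aligned concept IDs
    (the alignment step itself is assumed, not shown).
    """
    counts = Counter()
    for concepts in editions.values():
        counts.update(set(concepts))
    return counts


def share_in_all_editions(editions):
    """Fraction of all observed concepts that appear in every edition."""
    counts = edition_counts(editions)
    if not counts:
        return 0.0
    n_editions = len(editions)
    in_all = sum(1 for c in counts.values() if c == n_editions)
    return in_all / len(counts)


# Hypothetical toy data, for illustration only.
toy_editions = {
    "en": {"Tower of Babel", "Wikipedia", "Chicago"},
    "de": {"Tower of Babel", "Wikipedia", "Aachen"},
    "es": {"Tower of Babel", "Sevilla"},
}

if __name__ == "__main__":
    print(edition_counts(toy_editions))
    print(share_in_all_editions(toy_editions))
```

A low share on real data would count against the global consensus hypothesis at the concept level. The second level named in the abstract, how shared concepts are described, would need a separate comparison of article content and is not covered by this sketch.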