Locating Foci of Translation on Wikipedia: Some Methodological Proposals1
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Cultural Anthropology Through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-Based Sentiment
Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment Peter A. Gloor, Joao Marcos, Patrick M. de Boer, Hauke Fuehres, Wei Lo, Keiichi Nemoto [email protected] MIT Center for Collective Intelligence Abstract In this paper we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history, comparing them in the different Wikipedias and assessing cultural chauvinism. We also identify the most influential female leaders of all times in the English, German, Spanish, and Portuguese Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews. 1 Introduction Over the last ten years the Web has become a mirror of the real world (Gloor et al. 2009). More recently, the Web has also begun to influence the real world: Societal events such as the Arab spring and the Chilean student unrest have drawn a large part of their impetus from the Internet and online social networks. In the meantime, Wikipedia has become one of the top ten Web sites1, occasionally beating daily newspapers in the actuality of most recent news. Be it the resignation of German national soccer team captain Philipp Lahm, or the downing of Malaysian Airlines flight 17 in the Ukraine by a guided missile, the corresponding Wikipedia page is updated as soon as the actual event happened (Becker 2012. -
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages Dwaipayan Roy, Sumit Bhatia, Prateek Jain GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi [email protected], [email protected], [email protected] Abstract Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap 1. Introduction other with respect to the coverage of topics as well as Wikipedia is the largest web-based encyclopedia covering the amount of information about overlapping topics. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Title of Thesis: ABSTRACT CLASSIFYING BIAS
ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr. -
International Journal of Computational Linguistics
International Journal of Computational Linguistics & Chinese Language Processing Aims and Scope International Journal of Computational Linguistics and Chinese Language Processing (IJCLCLP) is an international journal published by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This journal was founded in August 1996 and is published four issues per year since 2005. This journal covers all aspects related to computational linguistics and speech/text processing of all natural languages. Possible topics for manuscript submitted to the journal include, but are not limited to: • Computational Linguistics • Natural Language Processing • Machine Translation • Language Generation • Language Learning • Speech Analysis/Synthesis • Speech Recognition/Understanding • Spoken Dialog Systems • Information Retrieval and Extraction • Web Information Extraction/Mining • Corpus Linguistics • Multilingual/Cross-lingual Language Processing Membership & Subscriptions If you are interested in joining ACLCLP, please see appendix for further information. Copyright © The Association for Computational Linguistics and Chinese Language Processing International Journal of Computational Linguistics and Chinese Language Processing is published four issues per volume by the Association for Computational Linguistics and Chinese Language Processing. Responsibility for the contents rests upon the authors and not upon ACLCLP, or its members. Copyright by the Association for Computational Linguistics and Chinese Language Processing. All rights reserved. No part of this journal may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical photocopying, recording or otherwise, without prior permission in writing form from the Editor-in Chief. Cover Calligraphy by Professor Ching-Chun Hsieh, founding president of ACLCLP Text excerpted and compiled from ancient Chinese classics, dating back to 700 B.C. -
The Culture of Wikipedia
Good Faith Collaboration: The Culture of Wikipedia Good Faith Collaboration The Culture of Wikipedia Joseph Michael Reagle Jr. Foreword by Lawrence Lessig The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0 Purchase at Amazon.com | Barnes and Noble | IndieBound | MIT Press Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old Author Bio & Research Blog pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Foreword Preface to the Web Edition Praise for Good Faith Collaboration Preface Extended Table of Contents "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the 1. Nazis and Norms project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community 2. The Pursuit of the Universal makes him uniquely situated to describe this culture." —Cory Doctorow , Boing Boing Encyclopedia "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to 3. Good Faith Collaboration produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are 4. The Puzzle of Openness well presented. -
An Analysis of Contributions to Wikipedia from Tor
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor Chau Tran Kaylea Champion Andrea Forte Department of Computer Science & Engineering Department of Communication College of Computing & Informatics New York University University of Washington Drexel University New York, USA Seatle, USA Philadelphia, USA [email protected] [email protected] [email protected] Benjamin Mako Hill Rachel Greenstadt Department of Communication Department of Computer Science & Engineering University of Washington New York University Seatle, USA New York, USA [email protected] [email protected] Abstract—User-generated content sites routinely block contri- butions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Fig. 1. Screenshot of the page a user is shown when they attempt to edit the Our analysis suggests that although Tor users who slip through Wikipedia article on “Privacy” while using Tor. Wikipedia’s ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users. -
China Date: 8 January 2007
Refugee Review Tribunal AUSTRALIA RRT RESEARCH RESPONSE Research Response Number: CHN31098 Country: China Date: 8 January 2007 Keywords: China – Taiwan Strait – 2006 Military exercises – Typhoons This response was prepared by the Country Research Section of the Refugee Review Tribunal (RRT) after researching publicly accessible information currently available to the RRT within time constraints. This response is not, and does not purport to be, conclusive as to the merit of any particular claim to refugee status or asylum. Questions 1. Is there corroborating information about military manoeuvres and exercises in Pingtan? 2. Is there any information specifically about the military exercise there in July 2006? 3. Is there any information about “Army day” on 1 August 2006? 4. What are the aquatic farming/fishing activities carried out in that area? 5. Has there been pollution following military exercises along the Taiwan Strait? 6. The delegate makes reference to independent information that indicates that from May until August 2006 China particularly the eastern coast was hit by a succession of storms and typhoons. The last one being the hardest to hit China in 50 years. Could I have information about this please? The delegate refers to typhoon Prapiroon. What information is available about that typhoon? 7. The delegate was of the view that military exercises would not be organised in typhoon season, particularly such a bad one. Is there any information to assist? RESPONSE 1. Is there corroborating information about military manoeuvres and exercises in Pingtan? 2. Is there any information specifically about the military exercise there in July 2006? There is a minor naval base in Pingtan and military manoeuvres are regularly held in the Taiwan Strait where Pingtan in located, especially in the June to August period. -
THE CASE of WIKIPEDIA Shane Greenstein Feng Zhu
NBER WORKING PAPER SERIES COLLECTIVE INTELLIGENCE AND NEUTRAL POINT OF VIEW: THE CASE OF WIKIPEDIA Shane Greenstein Feng Zhu Working Paper 18167 http://www.nber.org/papers/w18167 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 June 2012 The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research. NBER working papers are circulated for discussion and comment purposes. They have not been peer- reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications. © 2012 by Shane Greenstein and Feng Zhu. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Collective Intelligence and Neutral Point of View: The Case of Wikipedia Shane Greenstein and Feng Zhu NBER Working Paper No. 18167 June 2012 JEL No. L17,L3,L86 ABSTRACT We examine whether collective intelligence helps achieve a neutral point of view using data from a decade of Wikipedia’s articles on US politics. Our null hypothesis builds on Linus’ Law, often expressed as “Given enough eyeballs, all bugs are shallow.” Our findings are consistent with a narrow interpretation of Linus’ Law, namely, a greater number of contributors to an article makes an article more neutral. No evidence supports a broad interpretation of Linus’ Law. Moreover, several empirical facts suggest the law does not shape many articles. The majority of articles receive little attention, and most articles change only mildly from their initial slant. -
How to Contribute Climate Change Information to Wikipedia : a Guide
HOW TO CONTRIBUTE CLIMATE CHANGE INFORMATION TO WIKIPEDIA Emma Baker, Lisa McNamara, Beth Mackay, Katharine Vincent; ; © 2021, CDKN This work is licensed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits unrestricted use, distribution, and reproduction, provided the original work is properly credited. Cette œuvre est mise à disposition selon les termes de la licence Creative Commons Attribution (https://creativecommons.org/licenses/by/4.0/legalcode), qui permet l’utilisation, la distribution et la reproduction sans restriction, pourvu que le mérite de la création originale soit adéquatement reconnu. IDRC Grant/ Subvention du CRDI: 108754-001-CDKN knowledge accelerator for climate compatible development How to contribute climate change information to Wikipedia A guide for researchers, practitioners and communicators Contents About this guide .................................................................................................................................................... 5 1 Why Wikipedia is an important tool to communicate climate change information .................................................................................................................................. 7 1.1 Enhancing the quality of online climate change information ............................................. 8 1.2 Sharing your work more widely ......................................................................................................8 1.3 Why researchers should -
Semantically Annotated Snapshot of the English Wikipedia
Semantically Annotated Snapshot of the English Wikipedia Jordi Atserias, Hugo Zaragoza, Massimiliano Ciaramita, Giuseppe Attardi Yahoo! Research Barcelona, U. Pisa, on sabbatical at Yahoo! Research C/Ocata 1 Barcelona 08003 Spain {jordi, hugoz, massi}@yahoo-inc.com, [email protected] Abstract This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) community and the Information Retrieval (IR) community. Although NLP technology for processing Wikipedia already exists, not all researchers and developers have the computational resources to process such a volume of information. Moreover, the use of different versions of Wikipedia processed differently might make it difficult to compare results. The aim of this work is to provide easy access to syntactic and semantic annotations for researchers of both NLP and IR communities by building a reference corpus to homogenize experiments and make results comparable. These resources, a semantically annotated corpus and a “entity containment” derived graph, are licensed under the GNU Free Documentation License and available from http://www.yr-bcn.es/semanticWikipedia. 1. Introduction 2. Processing Wikipedia1, the largest electronic encyclopedia, has be- Starting from the XML Wikipedia source we carried out a come a widely used resource for different Natural Lan- number of data processing steps: guage Processing tasks, e.g. Word Sense Disambiguation (Mihalcea, 2007), Semantic Relatedness (Gabrilovich and • Basic preprocessing: Stripping the text from the Markovitch, 2007) or in the Multilingual Question Answer- XML tags and dividing the obtained text into sen- ing task at Cross-Language Evaluation Forum (CLEF)2. -
City Enabling Environment Rating: Assessment of the Countries in Asia and the Pacific © 2018 UCLG ASPAC Cities Alliance
City Enabling Environment Rating: Assessment of the Countries in Asia and the Pacific © 2018 UCLG ASPAC Cities Alliance This Report includes the Introduction, Methodology, Findings and Conclusion of the City Enabling Environment Rating: Assessment of the Countries in Asia and the Pacific. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Publishers United Cities and Local Governments Asia-Pacific Jakarta’s City Hall Complex, Building E, 4th Floor Jl. Medan Merdeka Selatan 8-9 Jakarta, Indonesia www.uclg-aspac.org Cities Alliance Rue Royale 94, 3rd Floor 1000 Brussels, Belgium www.citiesalliance.org [email protected] DISCLAIMERS Cities Alliance and UCLG ASPAC do not represent or endorse the accuracy, reliability, or timeliness of the materials included in the report or of any advice, opinion, statement, or other information provided by any information provider or content provider, or any user of this website or other person or entity. Reliance upon the materials in the report or any such opinion, advice, statement, or other information shall be at your own risk. Cities Alliance and UCLG Asia-Pacific will not be liable in any capacity for damages or losses to the user that may result from the use of or reliance on the materials or any such advice, opinion, statement, or other information. The terms used to describe the legal status of any country, territory, city or area, or of its authorities, or concerning the delimitation of its frontiers or boundaries, or regarding its economic system or degree of development do not necessarily reflect the opinion of UCLG ASPAC.