Omnipedia: Bridging the Wikipedia Language
Recommended publications
Christians and Jews in Muslim Societies
Arabic and its Alternatives Christians and Jews in Muslim Societies Editorial Board Phillip Ackerman-Lieberman (Vanderbilt University, Nashville, USA) Bernard Heyberger (EHESS, Paris, France) VOLUME 5 The titles published in this series are listed at brill.com/cjms Arabic and its Alternatives Religious Minorities and Their Languages in the Emerging Nation States of the Middle East (1920–1950) Edited by Heleen Murre-van den Berg Karène Sanchez Summerer Tijmen C. Baarda LEIDEN | BOSTON Cover illustration: Assyrian School of Mosul, 1920s–1930s; courtesy Dr. Robin Beth Shamuel, Iraq. This is an open access title distributed under the terms of the CC BY-NC 4.0 license, which permits any non-commercial use, distribution, and reproduction in any medium, provided no alterations are made and the original author(s) and source are credited. Further information and the complete license text can be found at https://creativecommons.org/licenses/by-nc/4.0/ The terms of the CC license apply only to the original material. The use of material from other sources (indicated by a reference) such as diagrams, illustrations, photos and text samples may require further permission from the respective copyright holder. Library of Congress Cataloging-in-Publication Data Names: Murre-van den Berg, H. L. (Hendrika Lena), 1964– illustrator. | Sanchez-Summerer, Karene, editor. | Baarda, Tijmen C., editor. Title: Arabic and its alternatives : religious minorities and their languages in the emerging nation states of the Middle East (1920–1950) / edited by Heleen Murre-van den Berg, Karène Sanchez, Tijmen C. Baarda. Description: Leiden ; Boston : Brill, 2020. | Series: Christians and Jews in Muslim societies, 2212–5523 ; vol. -
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. Dwaipayan Roy, Sumit Bhatia, Prateek Jain. GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi. Abstract: Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap. 1. Introduction: Wikipedia is the largest web-based encyclopedia covering [...] other with respect to the coverage of topics as well as the amount of information about overlapping topics.
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
Information (journal), Article. Modeling Popularity and Reliability of Sources in Multilingual Wikipedia. Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz. Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland. Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020. Abstract: One of the most important factors impacting the quality of content in Wikipedia is the presence of reliable sources. By following references, readers can verify facts or find more details about the described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to the use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different language versions of Wikipedia.
Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling
ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr. -
An Analysis of Contributions to Wikipedia from Tor
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor. Chau Tran (Department of Computer Science & Engineering, New York University, New York, USA), Kaylea Champion (Department of Communication, University of Washington, Seattle, USA), Andrea Forte (College of Computing & Informatics, Drexel University, Philadelphia, USA), Benjamin Mako Hill (Department of Communication, University of Washington, Seattle, USA), Rachel Greenstadt (Department of Computer Science & Engineering, New York University, New York, USA). Abstract: User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia’s ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users. Fig. 1. Screenshot of the page a user is shown when they attempt to edit the Wikipedia article on “Privacy” while using Tor.
On the Use of the Terms “(Anti-)Semitic” and “(Anti-)Zionist” in Modern Middle Eastern Discourse
On the use of the terms “(anti-)Semitic” and “(anti-)Zionist” in modern Middle Eastern discourse. Lutz Edzard, University of Oslo. Abstract: Like political and cultural discourse in general, modern Middle Eastern discourse is at times characterized by a great deal of hostility, not only between different states or religious denominations, but also state-internally among various ethnic, political, or religious groups. This short article focuses on the use of the attributes “Semitic” and “Zionist,” as well as their negative counterparts “anti-Semitic” and “anti-Zionist,” respectively, in examples of both Arabic and Israeli critical to hostile discourse. The focus of the discussion will lie on how the original meanings of these terms, especially in their negated forms, tend to be distorted in engaged political and cultural discourse. Keywords: Semitic, Anti-Semitic, Zionist, lā-sāmī, ṣahyūnī, ʾanṭi-šemi, ṣiyyōn. The term “Semitic”: Brief overview of the term “Semitic”. As is well known, the term “Semitic” derives from the name of one of the sons of Noah, Shem, and was suggested for the language family in question by the encyclopedic German historian and polymath August Ludwig von Schlözer in 1781. In linguistic contexts, the term “Semitic” is, generally speaking, non-controversial. Together with Ancient Egyptian, as well as the language families Berber, Cushitic, Chadic, and possibly Omotic, the Semitic language family is part of the larger Afroasiatic macrofamily (formerly also referred to as “Hamito-Semitic”). This usage of the term “Semitic” must be kept apart from the usage of the term in the compound adjective “anti-Semitic,” a term only coined in 1879 in a pamphlet by the journalist Wilhelm Marr (if not already in 1860 by the bibliographer and Orientalist Moritz Steinschneider), referring to prejudices against or hatred of Jews.
Texts from De.Wikipedia.Org, De.Wikisource.Org
Texts from http://www.gutenberg.org/files/13635/13635-h/13635-h.htm, de.wikipedia.org, de.wikisource.org
BRITISH AND GERMAN AUTHORS AND INTELLECTUALS CONFRONT EACH OTHER IN 1914
The material here collected consists of altercations between British and German men of letters and professors in the first months of the First World War. Remarkably they present themselves as collective bodies and as an authoritative voice on behalf of their nation. The English-language materials given here were published in, and have been quoted from, The New York Times Current History: A Monthly Magazine ("The European War", vol. 1: "From the beginning to March 1915"; New York: The New York Times Company 1915), online on Project Gutenberg (www.gutenberg.org). That same source also contains many interventions written à titre personnel by individuals, including G.B. Shaw, H.G. Wells, Arnold Bennett, John Galsworthy, Jerome K. Jerome, Rudyard Kipling, G.K. Chesterton, H. Rider Haggard, Robert Bridges, Arthur Conan Doyle, Maurice Maeterlinck, Henri Bergson, Romain Rolland, Gerhart Hauptmann and Adolf von Harnack (whose open letter "to Americans in Germany" provoked a response signed by 11 British theologians). The German texts given here can be found, with backgrounds, further references and more precise datings, in the German wikipedia article "Manifest der 93" and the German wikisource article “Erklärung der Hochschullehrer des Deutschen Reiches” (in a version dated 23 October 1914, with French parallel translation, along with the names of all 3000 signatories).
Contents:
1. British authors defend England’s war (17 September)
2. Appeal of the 93 German professors “to the world of culture” (4 October 1914)
   2a. German version
   2b. English version
3. Declaration of the German university professors (16 October 1914)
   3a. German version
   3b. English version
4. Reply to the German professors, by British scholars (21 October 1914)
Florida State University Libraries
Florida State University Libraries, 2020. Wiki-Donna: A Contribution to a More Gender-Balanced History of Italian Literature Online. Zoe D'Alessandro. Follow this and additional works at DigiNole: FSU's Digital Repository. THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS & SCIENCES. WIKI-DONNA: A CONTRIBUTION TO A MORE GENDER-BALANCED HISTORY OF ITALIAN LITERATURE ONLINE. By ZOE D’ALESSANDRO. A Thesis submitted to the Department of Modern Languages and Linguistics in partial fulfillment of the requirements for graduation with Honors in the Major. Degree Awarded: Spring, 2020. The members of the Defense Committee approve the thesis of Zoe D’Alessandro defended on April 20, 2020. Dr. Silvia Valisa, Thesis Director; Dr. Celia Caputi, Outside Committee Member; Dr. Elizabeth Coggeshall, Committee Member. Introduction: Last year I was reading Una donna (1906) by Sibilla Aleramo, one of the most important works in Italian modern literature and among the very first explicitly feminist works in the Italian language. Wanting to know more about it, I looked it up on Wikipedia. Although there exists a full entry in the Italian Wikipedia (consisting of a plot summary, publishing information, and external links), the corresponding page in the English Wikipedia consisted only of a short quote derived from a book devoted to gender studies, but that did not address that specific work in great detail. As in-depth and ubiquitous as Wikipedia usually is, I had never thought a work as important as this wouldn’t have its own page. This discovery prompted the question: if this page hadn’t been translated, what else was missing? And was this true of every entry for books across languages, or more so for women writers? My work in expanding the entry for Una donna was the beginning of my exploration into the presence of Italian women writers in the Italian and English Wikipedias, and how it relates back to canon, Wikipedia, and gender studies.
The Case of Croatian Wikipedia: Encyclopaedia of Knowledge Or Encyclopaedia for the Nation?
The Case of Croatian Wikipedia: Encyclopaedia of Knowledge or Encyclopaedia for the Nation? Authorial statement: This report represents the evaluation of the Croatian disinformation case by an external expert on the subject matter, who after conducting a thorough analysis of the Croatian community setting, provides three recommendations to address the ongoing challenges. The views and opinions expressed in this report are those of the author and do not necessarily reflect the official policy or position of the Wikimedia Foundation. The Wikimedia Foundation is publishing the report for transparency. Executive Summary: Croatian Wikipedia (Hr.WP) has been struggling with content and conduct-related challenges, causing repeated concerns in the global volunteer community for more than a decade. With support of the Wikimedia Foundation Board of Trustees, the Foundation retained an external expert to evaluate the challenges faced by the project. The evaluation, conducted between February and May 2021, sought to assess whether there have been organized attempts to introduce disinformation into Croatian Wikipedia and whether the project has been captured by ideologically driven users who are structurally misaligned with Wikipedia’s five pillars guiding the traditional editorial project setup of the Wikipedia projects. Croatian Wikipedia represents the Croatian standard variant of the Serbo-Croatian language. Unlike other pluricentric Wikipedia language projects, such as English, French, German, and Spanish, Serbo-Croatian Wikipedia’s community was split up into Croatian, Bosnian, Serbian, and the original Serbo-Croatian wikis starting in 2003. The report concludes that this structure enabled local language communities to sort by points of view on each project, often falling along political party lines in the respective regions.
Parallel Creation of Gigaword Corpora for Medium Density Languages – an Interim Report
Parallel creation of gigaword corpora for medium density languages – an interim report. Péter Halácsy, András Kornai, Péter Németh, Dániel Varga. Budapest University of Technology, Media Research Center, H-1111 Budapest, Stoczek u 2, {hp,kornai,pnemeth,daniel}@mokk.bme.hu. Abstract: For increased speed in developing gigaword language resources for medium resource density languages we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline has surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated. 1. Introduction: So far only a select few languages have been fully integrated in the bloodstream of modern communications, commerce, research, and education. Most research, development, and even the language- and area-specific professional societies in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR), target English, the FIGS languages (French, Italian, German, Spanish),
[...] pages (http://technorati.com/weblog/2007/04/328.html, April 2007) as a proxy for widely available material. While the picture is somewhat distorted by artificial languages (Esperanto, Volapük, Basic English) which have enthusiastic wikipedia communities but relatively few webpages elsewhere, and by regional languages in the European Union (Catalan, Galician, Basque, Luxembourgish, Breton, Welsh, Lombard, Piedmontese,
Forgotten Palestinians
THE FORGOTTEN PALESTINIANS: A History of the Palestinians in Israel. Ilan Pappé. YALE UNIVERSITY PRESS, NEW HAVEN AND LONDON. In memory of the thirteen Palestinian citizens who were shot dead by the Israeli police in October 2000. Copyright © 2011 Ilan Pappé. The right of Ilan Pappé to be identified as author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. This book may not be reproduced in whole or in part, in any form (beyond that copying permitted by Sections 107 and 108 of the U.S. Copyright Law and except by reviewers for the public press) without written permission from the publishers. For information about this and other Yale University Press publications, please contact: U.S.
Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices
Discussion Paper No. 15-089. Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices. Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko. First version: December 2015. This version: September 2017 (September 25, 2017). Download this ZEW Discussion Paper from our ftp server: http://ftp.zew.de/pub/zew-docs/dp/dp15089.pdf. Discussion Papers are intended to make results of ZEW research promptly available to other economists in order to encourage discussion and suggestions for revisions. The authors are solely responsible for the contents, which do not necessarily represent the opinion of the ZEW. Abstract: We document a causal influence of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional information on Wikipedia about cities affects tourists’ choices of overnight visits.