Natural Language Processing for Similar Languages, Varieties, and Dialects: a Survey
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Omnipedia: Bridging the Wikipedia Language
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*† *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences Northwestern University {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu ABSTRACT language edition contains its own cultural viewpoints on a We present Omnipedia, a system that allows Wikipedia large number of topics [7, 14, 15, 27]. On the other hand, readers to gain insight from up to 25 language editions of the language barrier serves to silo knowledge [2, 4, 33], Wikipedia simultaneously. Omnipedia highlights the slowing the transfer of less culturally imbued information similarities and differences that exist among Wikipedia between language editions and preventing Wikipedia’s 422 language editions, and makes salient information that is million monthly visitors [12] from accessing most of the unique to each language as well as that which is shared information on the site. more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with In this paper, we present Omnipedia, a system that attempts a multilingual Wikipedia experience. These include to remedy this situation at a large scale. It reduces the silo visualizing content in a language-neutral way and aligning effect by providing users with structured access in their data in the face of diverse information organization native language to over 7.5 million concepts from up to 25 strategies. We present a study of Omnipedia that language editions of Wikipedia. At the same time, it characterizes how people interact with information using a highlights similarities and differences between each of the multilingual lens. -
Title of Thesis: ABSTRACT CLASSIFYING BIAS
ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr. -
Does Wikipedia Matter? the Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Discus Si On Paper No
Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko First version: December 2015 This version: September 2017 Download this ZEW Discussion Paper from our ftp server: http://ftp.zew.de/pub/zew-docs/dp/dp15089.pdf Die Dis cus si on Pape rs die nen einer mög lichst schnel len Ver brei tung von neue ren For schungs arbei ten des ZEW. Die Bei trä ge lie gen in allei ni ger Ver ant wor tung der Auto ren und stel len nicht not wen di ger wei se die Mei nung des ZEW dar. Dis cus si on Papers are inten ded to make results of ZEW research prompt ly avai la ble to other eco no mists in order to encou ra ge dis cus si on and sug gesti ons for revi si ons. The aut hors are sole ly respon si ble for the con tents which do not neces sa ri ly repre sent the opi ni on of the ZEW. Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices ∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ First version: December 2015 This version: September 2017 September 25, 2017 Abstract We document a causal influence of online user-generated information on real- world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional information on Wikipedia about cities affects tourists’ choices of overnight visits. -
Flags and Banners
Flags and Banners A Wikipedia Compilation by Michael A. Linton Contents 1 Flag 1 1.1 History ................................................. 2 1.2 National flags ............................................. 4 1.2.1 Civil flags ........................................... 8 1.2.2 War flags ........................................... 8 1.2.3 International flags ....................................... 8 1.3 At sea ................................................. 8 1.4 Shapes and designs .......................................... 9 1.4.1 Vertical flags ......................................... 12 1.5 Religious flags ............................................. 13 1.6 Linguistic flags ............................................. 13 1.7 In sports ................................................ 16 1.8 Diplomatic flags ............................................ 18 1.9 In politics ............................................... 18 1.10 Vehicle flags .............................................. 18 1.11 Swimming flags ............................................ 19 1.12 Railway flags .............................................. 20 1.13 Flagpoles ............................................... 21 1.13.1 Record heights ........................................ 21 1.13.2 Design ............................................. 21 1.14 Hoisting the flag ............................................ 21 1.15 Flags and communication ....................................... 21 1.16 Flapping ................................................ 23 1.17 See also ............................................... -
Wikipedia Matters∗
Wikipedia Matters∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ September 29, 2017 Abstract We document a causal impact of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional content on Wikipedia pages about cities affects tourists’ choices of overnight visits. Our treatment of adding information to Wikipedia increases overnight visits by 9% during the tourist season. The impact comes mostly from improving the shorter and incomplete pages on Wikipedia. These findings highlight the value of content in digital public goods for informing individual choices. JEL: C93, H41, L17, L82, L83, L86 Keywords: field experiment, user-generated content, Wikipedia, tourism industry 1 Introduction Asymmetric information can hinder efficient economic activity. In recent decades, the Internet and new media have enabled greater access to information than ever before. However, the digital divide, language barriers, Internet censorship, and technological con- straints still create inequalities in the amount of accessible information. How much does it matter for economic outcomes? In this paper, we analyze the causal impact of online information on real-world eco- nomic outcomes. In particular, we measure the impact of information on one of the primary economic decisions—consumption. As the source of information, we focus on Wikipedia. It is one of the most important online sources of reference. It is the fifth most ∗We are grateful to Irene Bertschek, Avi Goldfarb, Shane Greenstein, Tobias Kretschmer, Thomas Niebel, Marianne Saam, Greg Veramendi, Joel Waldfogel, and Michael Zhang as well as seminar audiences at the Economics of Network Industries conference in Paris, ZEW Conference on the Economics of ICT, and Advances with Field Experiments 2017 Conference at the University of Chicago for valuable comments. -
Conferenceabstracts
TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Honorary Patronage of His Excellency Mr. Borut Pahor, President of the Republic of Slovenia MAY 23 – 28, 2016 GRAND HOTEL BERNARDIN CONFERENCE CENTRE Portorož , SLOVENIA CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik , Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis. Assistant Editors: Sara Goggi, Hélène Mazo The LREC 2016 Proceedings are licensed under a Creative Commons Attribution- NonCommercial 4.0 International License LREC 2016, TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2016 Conference Abstracts Distributed by: ELRA – European Language Resources Association 9, rue des Cordelières 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] ISBN 978-2-9517408-9-1 EAN 9782951740891 ii Introduction of the Conference Chair and ELRA President Nicoletta Calzolari Welcome to the 10 th edition of LREC in Portorož, back on the Mediterranean Sea! I wish to express to his Excellency Mr. Borut Pahor, the President of the Republic of Slovenia, the gratitude of the Program Committee, of all LREC participants and my personal for his Distinguished Patronage of LREC 2016. Some figures: previous records broken again! It is only the 10 th LREC (18 years after the first), but it has already become one of the most successful and popular conferences of the field. We continue the tradition of breaking previous records. We received 1250 submissions, 23 more than in 2014. We received 43 workshop and 6 tutorial proposals. -
Critical Point of View: a Wikipedia Reader
w ikipedia pedai p edia p Wiki CRITICAL POINT OF VIEW A Wikipedia Reader 2 CRITICAL POINT OF VIEW A Wikipedia Reader CRITICAL POINT OF VIEW 3 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink and Nathaniel Tkacz Editorial Assistance: Ivy Roberts, Morgan Currie Copy-Editing: Cielo Lutino CRITICAL Design: Katja van Stiphout Cover Image: Ayumi Higuchi POINT OF VIEW Printer: Ten Klei Groep, Amsterdam Publisher: Institute of Network Cultures, Amsterdam 2011 A Wikipedia ISBN: 978-90-78146-13-1 Reader EDITED BY Contact GEERT LOVINK AND Institute of Network Cultures NATHANIEL TKACZ phone: +3120 5951866 INC READER #7 fax: +3120 5951840 email: [email protected] web: http://www.networkcultures.org Order a copy of this book by sending an email to: [email protected] A pdf of this publication can be downloaded freely at: http://www.networkcultures.org/publications Join the Critical Point of View mailing list at: http://www.listcultures.org Supported by: The School for Communication and Design at the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam DMCI), the Centre for Internet and Society (CIS) in Bangalore and the Kusuma Trust. Thanks to Johanna Niesyto (University of Siegen), Nishant Shah and Sunil Abraham (CIS Bangalore) Sabine Niederer and Margreet Riphagen (INC Amsterdam) for their valuable input and editorial support. Thanks to Foundation Democracy and Media, Mondriaan Foundation and the Public Library Amsterdam (Openbare Bibliotheek Amsterdam) for supporting the CPOV events in Bangalore, Amsterdam and Leipzig. (http://networkcultures.org/wpmu/cpov/) Special thanks to all the authors for their contributions and to Cielo Lutino, Morgan Currie and Ivy Roberts for their careful copy-editing. -
Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Encyclopaedias
Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias A preliminary comparative study across disciplines in English, Spanish and Arabic ˃ Imogen Casebourne ˃ Dr. Chris Davies ˃ Dr. Michelle Fernandes ˃ Dr. Naomi Norman Casebourne, I., Davies, C., Fernandes, M., Norman, N. (2012) Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias: A comparative preliminary study across disciplines in English, Spanish and Arabic. Epic, Brighton, UK. Retrieved from: http://commons.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported Licence http://creativecommons.org/licences/by-sa/3.0/ Supplementary information Additional information, datasets and supplementary materials from this study can be found at: http://meta.wikimedia.org/wiki/Research:Accuracy_and_quality_of_Wikipedia_entries/Data 2 Table of Contents Table of Contents .......................................................................................................................................... 2 Executive Summary ....................................................................................................................................... 5 1. Introduction .......................................................................................................................................... 5 2. Aims and Objectives ............................................................................................................................. -
Dictionary of Slovenian Exonyms Explanatory Notes
GEOGRAFSKI INŠTITUT ANTONA MELIKA ZNANSTVENORAZISKOVALNI CENTER SLOVENSKE AKADEMIJE ZNANOSTI IN UMETNOSTI ANTON MELIK GEOGRAPHICAL INSTITUTE OF ZRC SAZU DICTIONARY OF SLOVENIAN EXONYMS EXPLANATORY NOTES Drago Kladnik Drago Perko AMGI ZRC SAZU Ljubljana 2013 1 Preface The geocoded collection of Slovenia exonyms Zbirka slovenskih eksonimov and the dictionary of Slovenina exonyms Slovar slovenskih eksonimov have been set up as part of the research project Slovenski eksonimi: metodologija, standardizacija, GIS (Slovenian Exonyms: Methodology, Standardization, GIS). They include more than 5,000 of the most frequently used exonyms that were collected from more than 50,000 documented various forms of these types of geographical names. The dictionary contains thirty-four categories and has been designed as a contribution to further standardization of Slovenian exonyms, which can be added to on an ongoing basis and used to find information on Slovenian exonym usage. Currently, their use is not standardized, even though analysis of the collected material showed that the differences are gradually becoming smaller. The standardization of public, professional, and scholarly use will allow completely unambiguous identification of individual features and items named. By determining the etymology of the exonyms included, we have prepared the material for their final standardization, and by systematically documenting them we have ensured that this important aspect of the Slovenian language will not sink into oblivion. The results of this research will not only help preserve linguistic heritage as an important aspect of Slovenian cultural heritage, but also help preserve national identity. Slovenian exonyms also enrich the international treasury of such names and are undoubtedly important part of the world’s linguistic heritage. -
Wikipedia Matters∗
Wikipedia Matters∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ July 14, 2019 Abstract We document a causal impact of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional content on Wikipedia pages about cities affects tourists’ choices of overnight visits. Our treatment of adding information to Wikipedia in- creases overnight stays in treated cities compared to non-treated cities. The impact is largely driven by improvements to shorter and relatively incomplete pages on Wikipedia. Our findings highlight the value of content in digital public goods for informing individual choices. JEL: C93, H41, L17, L82, L83, L86 Keywords: field experiment, user-generated content, Wikipedia, tourism industry ∗We are grateful to Irene Bertschek, Avi Goldfarb, Shane Greenstein, Tobias Kretschmer, Michael Luca, Thomas Niebel, Marianne Saam, Greg Veramendi, Joel Waldfogel, and Michael Zhang as well as seminar audiences at the Economics of Network Industries conference (Paris), ZEW Conference on the Economics of ICT (Mannheim), Advances with Field Experiments 2017 Conference (University of Chicago), 15th Annual Media Economics Workshop (Barcelona), Conference on Digital Experimentation (MIT), and Digital Economics Conference (Toulouse) for valuable comments. Ruetger Egolf, David Neseer, and Andrii Pogorielov provided outstanding research assistance. Financial support from SEEK 2014 is gratefully acknowledged. †Collegio Carlo Alberto and CEPR, [email protected] ‡Collegio Carlo Alberto, [email protected] §Georgia Institute of Technology & University of East Anglia, [email protected] ¶ZEW, [email protected] 1 1 Introduction Asymmetric information can hinder efficient economic activity (Akerlof, 1970). In recent decades, the Internet and new media have enabled greater access to information than ever before. -
A Wikipedia Reader
UvA-DARE (Digital Academic Repository) Critical point of view: a Wikipedia reader Lovink, G.; Tkacz, N. Publication date 2011 Document Version Final published version Link to publication Citation for published version (APA): Lovink, G., & Tkacz, N. (2011). Critical point of view: a Wikipedia reader. (INC reader; No. 7). Institute of Network Cultures. http://www.networkcultures.org/_uploads/%237reader_Wikipedia.pdf General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:05 Oct 2021 w ikipedia pedai p edia p Wiki CRITICAL POINT OF VIEW A Wikipedia Reader 2 CRITICAL POINT OF VIEW A Wikipedia Reader CRITICAL POINT OF VIEW 3 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink