Wikimedia Foundation Metrics Meeting 1 October 2015 Agenda
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski *, Krzysztof Węcel and Witold Abramowicz Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science
(forthcoming in the Journal of the Association for Information Science and Technology) Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science Misha Teplitskiy Grace Lu Eamon Duede Dept. of Sociology and KnowledgeLab Computation Institute and KnowledgeLab University of Chicago KnowledgeLab University of Chicago [email protected] University of Chicago [email protected] (773) 834-4787 [email protected] (773) 834-4787 5735 South Ellis Avenue (773) 834-4787 5735 South Ellis Avenue Chicago, Illinois 60637 5735 South Ellis Avenue Chicago, Illinois 60637 Chicago, Illinois 60637 Abstract With the rise of Wikipedia as a first-stop source for scientific knowledge, it is important to compare its representation of that knowledge to that of the academic literature. Here we identify the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles in total) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal’s academic status (impact factor) and accessibility (open access policy) both strongly increase the probability of its being referenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to paywall journals. One of the implications of this study is that a major consequence of open access policies is to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad audience. Word count: 7894 Introduction Wikipedia, one of the most visited websites in the world, has become a destination for information of all kinds, including information about science (Heilman & West, 2015; Laurent & Vickers, 2009; Okoli, Mehdi, Mesgari, Nielsen, & Lanamäki, 2014; Spoerri, 2007). -
100125-Wikimedia Feb Board Background Final2 Objectives for This Document
Wikimedia Foundation Board Meeting Pre-read January 2010 Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Contents • Strategic planning process update • Goal setting • Priority 1: Building the platform • Priority 2: Strengthening the editing community • Priority 3: Accelerating impact through innovation and experimentation • Implications for the Foundation TBG 100125-Wikimedia_Feb Board Background_Final2 Objectives for this document • Provide reminder on design of strategic planning process and current stage of work • Highlight research and analysis that informed the development of priorities for the Wikimedia Foundation • Introduce action items, including priorities for 2010-11 annual plan to be discussed at Board meeting in February A reminder: Two interdependent objectives for the planning work • Define a strategic direction that advances the Wikimedia vision over the next five years • Broader reach and participation • Improved quality and scope of content • Defined community roles and partnerships • Develop a business plan to guide Wikimedia Foundation in executing this direction • Organization, capabilities, and governance • Technology strategy and infrastructure • Economics, cost structure, and funding models Our frame for developing the strategic direction and Wikimedia Foundation’s business plan • Wikimedia’s vision is “a world in which every single human being can -
Language-Agnostic Relation Extraction from Abstracts in Wikis
information Article Language-Agnostic Relation Extraction from Abstracts in Wikis Nicolas Heist, Sven Hertling and Heiko Paulheim * Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany; [email protected] (N.H.); [email protected] (S.H.) * Correspondence: [email protected] Received: 5 February 2018; Accepted: 28 March 2018; Published: 29 March 2018 Abstract: Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph. Keywords: relation extraction; knowledge graphs; Wikipedia; DBpedia; DBkWik; Wiki farms 1. Introduction Large-scale knowledge graphs, such as DBpedia [1], Freebase [2], Wikidata [3], or YAGO [4], are usually built using heuristic extraction methods, by exploiting crowd-sourcing processes, or both [5]. -
Defending Democracy & Kristin Skare Orgeret Nordicom-Information
Defending Democracy: Nordic and Global Diversities in Media and Journalism. Oslo 8–11 August 2013. Edited by Harald Hornmoen & Kristin Skare Orgeret. The 2013 NordMedia conference in Oslo marked the 40 years that had passed since the very first Nordic media conference. To acknowledge this 40-year anniversary, it made sense to have a conference theme that dealt with a major and important topic: Defending Democracy. Nordic and Global Diversities in Media and Journalism. Focusing on the relationship between journalism, other media practices and democracy, the plenary sessions raised questions such as: • What roles do media and journalism play in democratization processes and what roles should they play? • How does the increasingly complex and omnipresent media field affect conditions for freedom of speech? This special issue contains the keynote speeches of Natalie Fenton, Stephen Ward and Ib Bondebjerg. A number of the conference papers have been revised and edited to become articles. Together, the articles presented should give the reader an idea of the breadth and depth of current Nordic scholarship in the area. SPECIAL ISSUE Nordicom Review | Volume 35 | August 2014 Nordicom-Information | Volume 36 | Number 2 | August 2014 University of Gothenburg Box 713, SE 405 30 Göteborg, Sweden Telephone +46 31 786 00 00 (op.) | Fax +46 31 786 46 55 www.nordicom.gu.se | E-mail: [email protected] Nordicom Review Journal from the Nordic Information Centre for Media and Communication Research Editor Ulla Carlsson NORDICOM invites media researchers to contribute scientific articles, reviews, and debates. -
A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks
WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks Cristian Consonni (DISI, University of Trento, [email protected]), David Laniado (Eurecat, Centre Tecnològic de Catalunya, [email protected]), Alberto Montresor (DISI, University of Trento, [email protected]) Abstract Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph which includes also links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and [...] [...] and result in a huge conceptual network. According to Wikipedia policies (Wikipedia contributors 2018e), when a concept is relevant within an article, the article should include a link to the page corresponding to such concept (Borra et al. 2015). Therefore, the network between articles may be seen as a giant mind map, emerging from the links established by the community. Such graph is not static but is continuously growing and evolving, reflecting the endless collaborative process behind it. The English Wikipedia includes over 163 million connections between its articles. This huge graph has been exploited for many purposes, from natural language processing (Yeh et al. -
A Focus on Swahili
MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI BY ARTHUR BULIVA UNITED STATES INTERNATIONAL UNIVERSITY - AFRICA SUMMER 2017 MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI BY ARTHUR BULIVA A Project Report Submitted to the School of Science and Technology in Partial Fulfillment of the Requirement for the Degree of Master of Science in Information Systems and Technology UNITED STATES INTERNATIONAL UNIVERSITY - AFRICA SUMMER 2017 Declaration I, the undersigned, declare that this is my original work and has not been submitted to any other college, institution or university other than the United States International University in Nairobi for academic credit. Signed: Date: Arthur Buliva (ID 645381) This project has been presented for examination with my approval as the appointed supervisor. Signed: Date: Leah Mutanu Signed: Date: Dean, School of Science and Technology ii Copyright All Rights Reserved. No part of this dissertation report may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without prior permission of United States International University - Africa or the author. Copyright © 2017 Arthur Buliva. iii Abstract The government of Kenya has undertaken an ambitious project to equip children with laptops and tablets for the purposes of facilitating electronic based learning. This initiative can only bear fruit provided that there is content relevant to the studies being undertaken. Many Kenyans learn English as a second language. Swahili or other African languages is the mother tongue. Therefore, with content in Swahili, a better and deeper understanding of subject matter takes place. -
Complete Issue
Culture Unbound: Journal of Current Cultural Research Thematic Section: Changing Orders of Knowledge? Encyclopedias in Transition Edited by Jutta Haider & Olof Sundin Extraction from Volume 6, 2014 Linköping University Electronic Press Culture Unbound: ISSN 2000-1525 (online) URL: http://www.cultureunbound.ep.liu.se/ © 2014 The Authors. Culture Unbound, Extraction from Volume 6, 2014 Thematic Section: Changing Orders of Knowledge? Encyclopedias in Transition Jutta Haider & Olof Sundin, Introduction: Changing Orders of Knowledge? Encyclopaedias in Transition; Katharine Schopflin, What do we Think an Encyclopaedia is?; Seth Rudy, Knowledge and the Systematic Reader: The Past and Present of Encyclopedic Reading; Siv Frøydis Berg & Tore Rem, Knowledge for Sale: Norwegian Encyclopaedias in the Marketplace; Vanessa Aliniaina Rasoamampianina, Reviewing Encyclopaedia Authority; Ulrike Spree, How readers Shape the Content of an Encyclopedia: A Case Study Comparing the German Meyers Konversationslexikon (1885-1890) with Wikipedia (2002-2013); Kim Osman, The Free Encyclopaedia that Anyone can Edit: The -
Wikimatrix: Mining 135M Parallel Sentences
Under review as a conference paper at ICLR 2020 WIKIMATRIX: MINING 135M PARALLEL SENTENCES IN 1620 LANGUAGE PAIRS FROM WIKIPEDIA Anonymous authors Paper under double-blind review ABSTRACT We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages. We do not limit the extraction process to alignments with English, but systematically consider all possible language pairs. In total, we are able to extract 135M parallel sentences for 1620 different language pairs, out of which only 34M are aligned with English. This corpus of parallel sentences is freely available. To get an indication on the quality of the extracted bitexts, we train neural MT baseline systems on the mined data only for 1886 languages pairs, and evaluate them on the TED corpus, achieving strong BLEU scores for many language pairs. The WikiMatrix bitexts seem to be particularly interesting to train MT systems between distant languages without the need to pivot through English. 1 INTRODUCTION Most of the current approaches in Natural Language Processing (NLP) are data-driven. The size of the resources used for training is often the primary concern, but the quality and a large variety of topics may be equally important. Monolingual texts are usually available in huge amounts for many topics and languages. However, multilingual resources, typically sentences in two languages which are mutual translations, are more limited, in particular when the two languages do not involve English. An important source of parallel texts are international organizations like the European Parliament (Koehn, 2005) or the United Nations (Ziemski et al., 2016). -
Critical Point of View: a Wikipedia Reader
CRITICAL POINT OF VIEW: A Wikipedia Reader. Edited by Geert Lovink and Nathaniel Tkacz. INC Reader #7. Critical Point of View: A Wikipedia Reader Editors: Geert Lovink and Nathaniel Tkacz Editorial Assistance: Ivy Roberts, Morgan Currie Copy-Editing: Cielo Lutino Design: Katja van Stiphout Cover Image: Ayumi Higuchi Printer: Ten Klei Groep, Amsterdam Publisher: Institute of Network Cultures, Amsterdam 2011 ISBN: 978-90-78146-13-1 Contact Institute of Network Cultures phone: +3120 5951866 fax: +3120 5951840 email: [email protected] web: http://www.networkcultures.org Order a copy of this book by sending an email to: [email protected] A pdf of this publication can be downloaded freely at: http://www.networkcultures.org/publications Join the Critical Point of View mailing list at: http://www.listcultures.org Supported by: The School for Communication and Design at the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam DMCI), the Centre for Internet and Society (CIS) in Bangalore and the Kusuma Trust. Thanks to Johanna Niesyto (University of Siegen), Nishant Shah and Sunil Abraham (CIS Bangalore), Sabine Niederer and Margreet Riphagen (INC Amsterdam) for their valuable input and editorial support. Thanks to Foundation Democracy and Media, Mondriaan Foundation and the Public Library Amsterdam (Openbare Bibliotheek Amsterdam) for supporting the CPOV events in Bangalore, Amsterdam and Leipzig. (http://networkcultures.org/wpmu/cpov/) Special thanks to all the authors for their contributions and to Cielo Lutino, Morgan Currie and Ivy Roberts for their careful copy-editing. -
A Wikipedia Reader
UvA-DARE (Digital Academic Repository) Critical point of view: a Wikipedia reader Lovink, G.; Tkacz, N. Publication date 2011 Document Version Final published version Link to publication Citation for published version (APA): Lovink, G., & Tkacz, N. (2011). Critical point of view: a Wikipedia reader. (INC reader; No. 7). Institute of Network Cultures. http://www.networkcultures.org/_uploads/%237reader_Wikipedia.pdf General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date: 05 Oct 2021 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink -
Primary-School-Feasibility Study
WikiAfrica Primary School Feasibility Study Draft, November 2012. The WikiAfrica Primary School Feasibility Study was produced in 2012 within the frame of WikiAfrica, a cross-continental collaboration that aims to increase the quantity and quality of African content on the world’s most referenced online encyclopaedia, Wikipedia. WikiAfrica is promoted by lettera27 Foundation and the Africa Centre and it was initiated by lettera27 Foundation in 2006. WikiAfrica Primary School Feasibility Study is edited by Iolanda Pensa, scientific director WikiAfrica; WikiAfrica project manager for lettera27 Cristina Perillo; WikiAfrica project manager for the Africa Centre Isla Haddow-Flood. WikiAfrica Primary School is a project conceived by Iolanda Pensa under Creative Commons attribution share alike license. The WikiAfrica Primary School Feasibility Study was made possible thanks to volunteers and the support of lettera27 Foundation. WikiAfrica Primary School Feasibility Study is under Creative Commons attribution share alike license. WikiAfrica, 2012. “Education is the most powerful weapon which you can use to change the world” Nelson Mandela Table of contents 1. Introduction: Key findings 2. Wikipedia: Outreach projects and education; Languages used in the African educational systems; General overview of involved Wikipedia projects; General overview on the