Position Description Addenda
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Danish Resources
Danish resources
Finn Årup Nielsen
November 19, 2017
Abstract
A range of different Danish resources, datasets and tools, are presented. The focus is on resources for use in automated computational systems and free resources that can be redistributed and used in commercial applications.
Contents
1 Corpora
1.1 Wikipedia
1.2 Wikisource
1.3 Wikiquote
1.4 ADL
1.5 Gutenberg
1.6 Runeberg
1.7 Europarl
1.8 Leipzig Corpora Collection
1.9 Danish Dependency Treebank
1.10 Retsinformation
1.11 Other resources
2 Lexical resources
2.1 DanNet
2.2 Wiktionary
2.3 Wikidata
2.4 OmegaWiki
2.5 Other lexical resources
2.6 Wikidata examples with medical terminology extraction
3 Natural language processing tools
3.1 NLTK
3.2 Polyglot
3.3 spaCy
3.4 Apache OpenNLP
3.5 Centre for Language Technology
3.6 Other libraries -
On the Evolution of Wikipedia
On the Evolution of Wikipedia
Rodrigo B. Almeida, Barzan Mozafari, Junghoo Cho
UCLA Computer Science Department, Los Angeles, USA
Abstract
A recent phenomenon on the Web is the emergence and proliferation of new social media systems allowing social interaction between people. One of the most popular of these systems is Wikipedia that allows users to create content in a collaborative way. Despite its current popularity, not much is known about how users interact with Wikipedia and how it has evolved over time. In this paper we aim to provide a first, extensive study of the user behavior on Wikipedia and its evolution. Compared to prior studies, our work differs in several ways. First, previous studies on the analysis of the user workloads (for systems such as peer-to-peer systems [10] and Web servers [2]) have mainly focused on understanding the users who are accessing
... time. So far, several studies have focused on understanding and characterizing the evolution of this huge repository of data [5, 11]. Recently, a new phenomenon, called social systems, has emerged from the Web. Generally speaking, such systems allow people not only to create content, but also to easily interact and collaborate with each other. Examples of such systems are: (1) Social network systems such as MySpace or Orkut that allow users to participate in a social network by creating their profiles and indicating their acquaintances; (2) Collaborative bookmarking systems such as Del.icio.us or Yahoo’s MyWeb in which users are allowed to share their bookmarks; and (3) Wiki systems that allow collaborative management of Web sites. -
Wikibase Knowledge Graphs for Data Management & Data Science
Business and Economics Research Data Center Baden-Württemberg (https://www.berd-bw.de)
Wikibase knowledge graphs for data management & data science
Dr. Renat Shigapov, 23.06.2021, @shigapov @_shigapov
Motivation. DATA MANAGEMENT: 1. people, 2. processes, 3. technology. DATA SCIENCE: data, information, knowledge. KNOWLEDGE GRAPHS: linking things.
Flow. Definitions; Wikidata & Tools; Local Wikibase; Wikibase Ecosystem; Summary. Timeline: 29.10.2012, 2021, 2030.
DATA SCIENCE. Example: Named Entity Linking (https://commons.wikimedia.org/wiki/File:Entity_Linking_-_Short_Example.png). Rule-based problems, Machine Learning, Deep Learning. Learn data science at https://www.kaggle.com (https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png)
DATA MANAGEMENT. Example: general research data. Data silos, data fabric, data mesh, data space, data marketplace, data lake, data swamp. Research data lifecycle: https://www.reading.ac.uk/research-services/research-data-management/about-research-data-management/the-research-data-lifecycle and https://www.dama.org/cpages/body-of-knowledge
KNOWLEDGE GRAPH = ONTOLOGY + THINGS
✔ “Things, not strings” by Google, 2012
✔ A knowledge graph links things in different datasets
✔ A knowledge graph can link people & processes and enhance technologies
Software: https://www.mediawiki.org + https://www.wikiba.se + https://mariadb.org (relational database) + https://blazegraph.com (graph database)
The main example: “THE KNOWLEDGE GRAPH COOKBOOK RECIPES THAT WORK” by ANDREAS BLUMAUER & HELMUT NAGY, 2020. https://www.wikidata.org -
Wiki-Reliability: a Large Scale Dataset for Content Reliability on Wikipedia
Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia
KayYen Wong∗ (Outreachy, Kuala Lumpur, Malaysia), Miriam Redi (Wikimedia Foundation, London, United Kingdom), Diego Saez-Trumper (Wikimedia Foundation, Barcelona, Spain)
ABSTRACT
Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors’ manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia “templates”. Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of “non-neutral point of view” or “contradictory articles”, and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article
Figure 1: Example of an English Wikipedia page with several template messages describing reliability issues.
1 INTRODUCTION
Wikipedia is one of the largest and most widely used knowledge repositories in the world. People use Wikipedia for studying, fact checking and a wide set of different information needs [11]. -
News Release
NEWS RELEASE For immediate release Sue Gardner to deliver 16th annual LaFontaine-Baldwin Lecture Former head of CBC.ca and Wikimedia Foundation to open 6 Degrees Toronto TORONTO, August 13, 2018—6 Degrees announces that the 2018 LaFontaine-Baldwin Lecture will be delivered by leading digital pioneer Sue Gardner. The lecture will be given on September 24 as part of 6 Degrees Toronto, a project of the Institute for Canadian Citizenship. As senior director of CBC.ca, Gardner reinvented the Canadian Broadcasting Corporation’s place in the world of digital news. Later, as executive director of the Wikimedia Foundation, she played a crucial role in the explosive growth of Wikipedia. The San Francisco–based Gardner continues to be a sought-after global thought-leader: she currently advises media and technology companies, and serves on the boards of Privacy International and the Organized Crime and Corruption Reporting Project. “Sue Gardner is on the forefront of ideas on technology, democracy, and women’s roles in our society,” said ICC Co-founder and Co-chair John Ralston Saul. “As we witness the alarming erosion of democratic institutions, I can’t think of a more relevant voice to address the challenges ahead. I can’t wait to welcome Sue home to Toronto to deliver this year’s LaFontaine-Baldwin Lecture.” Titled Dark Times Ahead: Taking Back Truth, Freedom, and Technology, the interactive event will include Gardner in conversation with John Ralston Saul. Gardner joins an illustrious list of past LaFontaine-Baldwin lecturers, including His Highness the Aga Khan, Naomi Klein, Shawn A-in-chut Atleo, Michael Sandel, and Naheed Nenshi. -
Wikipedia Workshop: Learn How to Edit and Create Pages
Wikipedia Workshop: Learn how to edit and create pages
Pamela Carson, Web Services Librarian, May 12, 2015. Handout based on “Guide d’aide à la contribution sur Wikipédia” by Benoît Rochon.
Part A: Your user account
Log in with your user name and password. OR If you don’t have a user account already, click on “Create account” in the top right corner. Once you’re logged in, click on “Beta” and enable the VisualEditor. The VisualEditor is the tool for editing or creating articles. It’s like Microsoft Word: it helps you create headings, bold or italicize characters, add hyperlinks, etc. It’s also possible to add references with the VisualEditor.
Part B: Write a sentence or two about yourself
Click on your username. This will lead you to your user page. The URL will be: https://en.wikipedia.org/wiki/User:[your user name]
Exercise: Click on “Edit source” and write about yourself, then enter a description of your change in the “Edit summary” box and click “Save page”.
Part C: Edit an existing article
To edit a Wikipedia article, click on the tab “Edit” or “Edit source” (for more advanced users) available at the top of any page. These tabs are also available beside any section title within an article. [Screenshots: Editing an entire page; Editing just a section]
Need help? https://en.wikipedia.org/wiki/Wikipedia:Tutorial/Editing
Exercise: Go to http://www.statcan.gc.ca/ and find a statistic that interests you. -
The Culture of Wikipedia
Good Faith Collaboration: The Culture of Wikipedia
Joseph Michael Reagle Jr. Foreword by Lawrence Lessig. The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0
Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy.
Contents: Author Bio & Research Blog; Foreword; Preface to the Web Edition; Preface; Extended Table of Contents; 1. Nazis and Norms; 2. The Pursuit of the Universal Encyclopedia; 3. Good Faith Collaboration; 4. The Puzzle of Openness
Praise for Good Faith Collaboration
"Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself; rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community makes him uniquely situated to describe this culture." (Cory Doctorow, Boing Boing)
"Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are well presented. -
Supporting Organizational Effectiveness Within Movements
Supporting Organizational Effectiveness Within Movements
A Wikimedia Foundation Case Study
Funders supporting movements face a unique set of challenges when trying to support organizations working for the same cause. In their efforts to increase organizational effectiveness, foundations invariably face questions of credibility and control, and sometimes encounter diverging understandings of success. This case study profiles how TCC Group – with its partner, the Wikimedia Foundation – developed an innovative, participatory approach to these challenges, resulting in increased organizational effectiveness in the Wikimedia movement.
While organizational effectiveness is a common concern with nonprofit organizations and networks, especially ones that are growing, the de-centralized culture of Wikimedia presented some unique challenges and opportunities. In a movement that so strongly valued the autonomy of individual organizations, what did it mean for the Foundation to attempt to increase their effectiveness – and how could it do this in a way that was not top-down? Furthermore, the Foundation was concerned that its financial support was fueling momentum toward larger-budget, “traditional” or staffed nonprofit structures which might not be consistent with its history and mission as a global free-knowledge movement created and driven largely by individual volunteers.
The Foundation was grappling with two fundamental questions:
1. How could the Wikimedia Foundation help Wikimedia groups make a clearer connection between their strategies and their desired impacts without being directive about those strategies and desired impacts?
Anecdotally WMF knew that some chapters were “more effective” than others, but without clear criteria for success it was difficult to determine how or why some groups were thriving. -
The Importance of Female Voices
The Importance of Female Voices
Goals
Collaborate to read and understand a text, and examine and use topic-specific vocabulary; identify and define individual and collective knowledge, differentiating knowledge of girls and boys.
Essential Questions
Are women today participating in Canadian public life in percentages equal to that of men? Have efforts to increase women’s participation in public life succeeded in Canada? Does it matter if female voices are not heard in more equal proportions?
Enduring Understandings
Even after much progress in the past 30 years, women are vastly under-represented in many aspects of Canadian public life. This imbalance even showed up in Wikipedia, the open online encyclopedia, where only 15-20% of content contributors are women. Efforts to increase female participation in public life in Canada have succeeded but growth is slow. The executives of the Wikimedia Foundation viewed the small number of female contributors as a problem. Many believe that increasing the percentage of female voices will enrich diversity of opinions, and better reflect the user community.
Materials
Define Gender Gap? (2011)
Making the edit: why we need more women in Wikipedia (2019)
Wikipedia is a mirror of the world's gender biases (2018)
Vocabulary
intractable [ in-trak-tuh-buhl ] (adjective) not easily controlled or directed; not docile or manageable; stubborn; obstinate.
nuance [ noo-ahns ] (noun) a subtle difference or distinction in expression or meaning.
disparity [ di-ˈsper-ə-tee ] (noun) difference; containing or made up of fundamentally different or unequal elements.
egalitarian [ ih-gal-i-tair-ee-uhn ] (adjective) characterized by a belief in the equal status of all people in political, economic, or social life. -
Mathematical Formulae in Wikimedia Projects 2020”
Preprint from https://www.gipp.com/pub/ M. Schubotz et al. “Mathematical Formulae in Wikimedia Projects 2020”. In: Proc. ACM/IEEE JCDL. 2020. doi: 10.1145/3383583.3398557
arXiv:2003.09417v2 [cs.DL] 6 May 2020
Mathematical Formulae in Wikimedia Projects 2020
Moritz Schubotz1,2, André Greiner-Petter1, Norman Meuschke1,3, Olaf Teschke2, and Bela Gipp1,3
1University of Wuppertal, Germany
2FIZ-Karlsruhe, Germany ({first.last}@fiz-karlsruhe.de)
3University of Konstanz, Germany
May 8, 2020
Abstract
This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern semantically enriched language-independent MathML formulae in 2020. Additionally, we describe our plans to improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects further.
1 Introduction
Mathematical formulae are an integral part of Wikipedia and other projects of the Wikimedia Foundation. The MediaWiki software is the technical backbone of many Wikimedia projects, including Wikipedia. Since 2003, wikitext, the markup language of MediaWiki, supports mathematical content [9]. For example, MediaWiki converts the wikitext code <math>E=mc^2</math> to the formula E = mc². While the markup for mathematical formulae has remained stable since 2003, MediaWiki's pipeline for processing wikitext to render formulae has changed significantly. Initially, MediaWiki used LaTeX to convert math tags in wikitext to PNG images. The rendering process was slow, the images did not integrate well into the text, were inaccessible to screen readers for visually impaired users, and scaled poorly for both small and high-resolution screens. -
Building a Visual Editor for Wikipedia
Building a Visual Editor for Wikipedia
Trevor Parscal and Roan Kattouw, Wikimania D.C. 2012
(Introduce yourself) (Introduce yourself) We’d like to talk to you about how we’ve been building a visual editor for Wikipedia.
The People
Trevor Parscal: Lead Designer and Engineer, Wikimedia
Roan Kattouw: Data Model Engineer, Wikimedia
Rob Moen: User Interface Engineer, Wikimedia
Inez Korczynski: Edit Surface Engineer, Wikia
Christian Williams: Edit Surface Engineer, Wikia
James Forrester: Product Analyst, Wikimedia
We are only 2/6ths of the VisualEditor team. Our team includes 2 engineers from Wikia; they also use MediaWiki. They also fight crime in their off time.
Parsoid Team
Gabriel Wicke: Lead Parser Engineer, Wikimedia
Subbu Sastry: Parser Engineer, Wikimedia
There are also two remote people working on a new parser. This parser makes what we are doing with the VisualEditor possible.
The Project
You might recognize this, it’s a Wikipedia article. You should edit it! Seems simple enough, just hit the edit button and be on your way...
The Complexity Problem
Or not... What is all this nonsense, you may ask? Well, it’s called Wikitext! Even really smart people who have a lot to contribute to Wikipedia find it confusing. The truth is, Wikitext is a lousy IQ test, and it’s holding Wikipedia back, severely.
[Chart: Active Editors, axis from 0 to 20k, 2001 to 2007 to today, with phases labeled Growth and Stagnation]
The internet has normal people on it now, not just geeks and weirdoes. Normal people like simple things, and simple things are growing fast. We must make editing Wikipedia easier to use, not just to grow, but even just to stay alive. -
Wikimedia Foundation 2010-11 Plan
Wikimedia Foundation: The Year Ahead, and the Year in Review
SUE GARDNER, WIKIMANIA – WASHINGTON DC – 14 JULY 2012
The purpose of this presentation is two things: tell you what the WMF did in 2011-12, and tell you what the WMF plans to do in 2012-13.
34.8 million dollars
You wrote the projects. You earned the goodwill. Your work is what donors are supporting.
Recapping 2011-12 Activities
Supported 25% increase in readership from 400 to 500 million UVs;
Visual Editor: released with save/edit capability & new parser, in restricted namespace on MediaWiki.org;
Global Ed work is being done in 12 countries, and in eight the work is driven by volunteers: 19 million characters added to Wikipedia, with 50% female editors;
The mobile platform has been redeveloped, Android app released, mobile PVs up 187%. Wikipedia Zero launched with new deals with two companies covering 28 countries, and more underway;
Editor recruitment activity is underway in India and Brazil;
Wikimedia Labs is launched and exceeding participation targets;
Internationalization team has made significant improvements, particularly for Indic languages. It has also created new translation tools;
New editor engagement: MoodBar/Feedback Dashboard, AFTv5, New Pages Feed tool, plus many research projects and small experiments (e.g., warnings & welcoming study, lapsed editor e-mail experiment, Teahouse). Nonetheless, editors are down to 85K from 89K;
Upload wizard increased image uploads 27% over the year, with WLM causing a visible spike in September 2011.
Strategy Plan 2015 Goals (for reference)
1) Increase readership.
2) Increase the quantity of material we offer.