Edit Filters on English Wikipedia


Edit Filters on English Wikipedia
Master Thesis at the Computer Science Department of the Freie Universität Berlin, Human-Centered Computing (HCC)
You shall not publish: Edit filters on English Wikipedia
Lyudmila Vaseva
Supervisor and first examiner: Prof. Dr. Claudia Müller-Birn
Second examiner: Prof. Dr. Lutz Prechelt
Berlin, 25.07.2019

Statutory Declaration (Eidesstattliche Erklärung)
I hereby declare in lieu of oath that this thesis was written by no one other than myself. All aids used, such as reports, books, websites and the like, are listed in the bibliography, and quotations from the work of others are marked as such. The thesis has not previously been submitted in the same or a similar form to any other examination board, nor has it been published.
Berlin, September 12, 2019 <Name>

Abstract
The present thesis offers an initial investigation of a quality control mechanism of Wikipedia that has so far not been explored by scientific research: edit filters. It analyses how edit filters fit into the quality control system of English Wikipedia, why they were introduced, and what tasks they take over. Moreover, it discusses why rule-based systems like these still seem to be popular today, when more advanced machine learning methods are available. The findings indicate that edit filters were implemented to take care of obvious but persistent types of vandalism, disallowing these from the start so that (human) resources can be used more efficiently elsewhere (i.e. for judging less obvious cases). In addition to disallowing such vandalism, edit filters appear to be applied in ambiguous situations where an edit is disruptive but the motivation of the editor is not clear. In such cases, the filters take an "assume good faith" approach and seek via warning messages to guide the disrupting editor towards transforming their contribution into a constructive one. There is also a smaller number of filters taking care of isolated maintenance tasks, above all tracking a certain bug or other behaviour for further investigation. Since the current work is just a first exploration into edit filters, a comprehensive list of open questions for future research is compiled at the end.

Zusammenfassung
The present thesis offers a first investigation of a quality control mechanism of Wikipedia that has so far not been studied by research: edit filters ("Bearbeitungsfilter" in German). It analyses how edit filters fit into the quality assurance system of the English-language Wikipedia, why they were introduced, and which tasks they take over. Furthermore, it discusses why rule-based systems like this one are still popular today, when more advanced machine learning methods are available. The findings suggest that edit filters were implemented to deal with obvious but persistent kinds of vandalism. The Wikipedia community's motivation was that if such vandalism is prohibited from the outset, (human) resources can be used more efficiently elsewhere (e.g. for judging less obvious cases). Edit filters also appear to be applied in ambiguous situations in which an edit is disruptive but the motivation of the editing person cannot clearly be identified as malicious. In such cases the filters internalise Wikipedia's "assume good faith" guideline and try, via warning messages, to steer the editor towards a constructive contribution. There is also a smaller number of filters that take care of isolated maintenance tasks, among them attempts to trace a particular bug or to track some other behaviour in order to investigate it further. Since the present work is only a first look into Wikipedia's edit filters, a more comprehensive list of open questions for the future study of the mechanism is compiled at the end.

Contents
1. Introduction
  1.1. Subject and Context
  1.2. Contributions
  1.3. Structure
2. Quality-Control Mechanisms on Wikipedia
  2.1. Automated
    2.1.1. Bots
    2.1.2. ORES
    2.1.3. Page Protection, TitleBlacklist, SpamBlacklist
  2.2. Semi-Automated
  2.3. Manual
  2.4. Conclusion
3. Methods
  3.1. Trace Ethnography
  3.2. Emergent Coding
4. Edit Filters As Part of Wikipedia's Socio-Technical Infrastructure
  4.1. Data
  4.2. Edit Filter Definition
    4.2.1. Example of a Filter
  4.3. The AbuseFilter MediaWiki Extension
  4.4. History
  4.5. Building a Filter: the Internal Perspective
    4.5.1. How Is a New Filter Introduced?
    4.5.2. Who Can Edit Filters?
  4.6. Filters during Runtime: the External Perspective
  4.7. Edit Filters' Role in the Quality Control Ecosystem
    4.7.1. Wikipedia's Algorithmic Quality Control Mechanisms in Comparison
    4.7.2. Collaboration of the Mechanisms
    4.7.3. Conclusions
5. Descriptive Overview of Edit Filters on the English Wikipedia
  5.1. Data
  5.2. Types of Edit Filters: Manual Classification
    5.2.1. Coding Process and Challenges
    5.2.2. Vandalism
    5.2.3. Good Faith
    5.2.4. Maintenance
    5.2.5. Unknown
  5.3. Filter Characteristics
    5.3.1. General Traits
    5.3.2. Public and Hidden Filters
    5.3.3. Filter Actions
    5.3.4. What Do Filters Target
    5.3.5. Who Trips Filters
  5.4. Filter Activity
    5.4.1. Filter Hits per Month
    5.4.2. Most Active Filters Over the Years
  5.5. Conclusions
6. Discussion
  6.1. Q1: What is the role of edit filters among existing quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?
  6.2. Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?
  6.3. Q3: Which type of tasks do filters take over?
  6.4. Q4: How have these tasks evolved over time (are there changes in the type, number, etc.)?
  6.5. Limitations
    6.5.1. Limitations of the Data
    6.5.2. Limitations in the Research Process
  6.6. Directions for Future Studies
7. Conclusion
References
Appendices
  A. Code Book
  B. Extra Figures and Tables

List of Figures
2.1. State of the scientific literature: edit filters are missing from the quality control ecosystem
4.1. Detailed page of edit filter #365
4.2. Tagged edits are marked as such in a page's revision history
4.3. Abuse Log showing all filter matches by User:Schnuppi4223
4.4. Editor gets two warnings upon erasing an entire page
4.5. EN Wikipedia: Number of editors over the years
4.6. EN Wikipedia: Number of edits over the years
4.7. Edit filters' role in the quality control ecosystem
5.1. Overview of active, disabled and deleted filters on EN Wikipedia
5.2. EN Wikipedia edit filters: Filter actions for all filters
5.3. EN Wikipedia edit filters: Filter actions for enabled public filters
5.4. EN Wikipedia edit filters: Filter actions for enabled hidden filters
5.5. Manual tags parent categories distribution: all filters
5.6. Manual tags parent categories distribution: enabled filters (January 2019)
5.7. Edit filters manual tags distribution
5.8. EN Wikipedia edit filters: Hits per month
5.9. EN Wikipedia edit filters: Hits per month according to manual tags
5.10. EN Wikipedia: Reverts for July 2001–April 2017
5.11. EN Wikipedia edit filters: Hits per month according to filter action
5.12. EN Wikipedia edit filters: Hits per month according to triggering editor's action
B.1. abuse_filter schema
B.2. abuse_filter_log schema
B.3. abuse_filter_history schema
B.4. abuse_filter_action schema

List of Tables
2.1. Wikipedia's algorithmic quality control mechanisms in comparison
4.1. Timeline: Introduction of algorithmic quality control mechanisms
4.2. Wikipedia's algorithmic quality control mechanisms in comparison
5.1. Filters aimed at unconfirmed users
5.2. What do most active filters do?
5.3. 10 most active filters in 2009
5.4. 10 most active filters in 2010
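The abstract above describes edit filters as rule-based checks that are evaluated against every incoming edit and can then log, tag, warn, or disallow it. As a purely illustrative aid (not the thesis's or MediaWiki's actual AbuseFilter code), the following minimal Python sketch shows how such a rule table might be evaluated; the filter names, condition patterns, thresholds, and the Edit fields are all invented for this example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Edit:
    user_editcount: int   # edits made by the author so far
    page_namespace: int   # 0 = article namespace
    added_text: str       # text added by this edit
    size_delta: int       # change in page size (bytes)

@dataclass
class Filter:
    name: str
    condition: callable   # Edit -> bool
    actions: list = field(default_factory=lambda: ["log"])

# Illustrative filters only; real filters are configured on-wiki.
FILTERS = [
    Filter("page blanking by new user",
           lambda e: e.user_editcount < 10 and e.size_delta < -5000,
           actions=["warn", "tag"]),
    Filter("obvious vandalism phrase",
           lambda e: re.search(r"\b(u suck|is stupid)\b", e.added_text, re.I) is not None,
           actions=["disallow"]),
]

def check_edit(edit: Edit) -> list[str]:
    """Return the actions triggered by all matching filters (every hit is logged)."""
    triggered = ["log"]
    for f in FILTERS:
        if f.condition(edit):
            triggered.extend(f.actions)
    return triggered

print(check_edit(Edit(user_editcount=2, page_namespace=0,
                      added_text="", size_delta=-9000)))
```

In the real AbuseFilter extension, conditions are written in a dedicated rule language and configured on-wiki via Special:AbuseFilter rather than hard-coded as above.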
Recommended publications
  • Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information (Jacob Rogers)
    THE YALE LAW JOURNAL FORUM, October 9, 2017. Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information. Jacob Rogers. Abstract: The problem of fake news impacts a massive online ecosystem of individuals and organizations creating, sharing, and disseminating content around the world. One effective approach to addressing false information lies in monitoring such information through an active, engaged volunteer community. Wikipedia, as one of the largest online volunteer contributor communities, presents one example of this approach. This Essay argues that the existing legal framework protecting intermediary companies in the United States empowers the Wikipedia community to ensure that information is accurate and well-sourced. The Essay further argues that current legal efforts to weaken these protections, in response to the “fake news” problem, are likely to create perverse incentives that will harm volunteer engagement and confuse the public. Finally, the Essay offers suggestions for other intermediaries beyond Wikipedia to help monitor their content through user community engagement. Introduction: Wikipedia is well-known as a free online encyclopedia that covers nearly any topic, including both the popular and the incredibly obscure. It is also an encyclopedia that anyone can edit, an example of one of the largest crowd-sourced, user-generated content websites in the world. This user-generated model is supported by the Wikimedia Foundation, which relies on the robust intermediary liability immunity framework of U.S. law to allow the volunteer editor community to work independently. Volunteer engagement on Wikipedia provides an effective framework for combating fake news and false information. It is perhaps surprising that a project open to public editing could be highly reliable.
  • Wikipedia Ahead
    Caution: Wikipedia might not be what you think it is… What is Wikipedia? Wikipedia is an online encyclopedia that is written, updated, rewritten, and edited by registered site users around the world known as “Wikipedians”. The concept of a freely accessible online “work in progress” is known as a “wiki” (Lin, 2004). While it is possible to view the user profile of individual contributors, one of the prime features of a wiki is that the information is collectively owned, shared, and changed by Internet users who have access to that wiki. Wikipedia is accessible to anyone with an Internet connection. Why Wikipedia? The main philosophy behind Wikipedia is that the sum total of the ideas of Internet users is as credible and valid as the published views of experts who have advanced degrees and extensive experience in their field. Like any wiki, Wikipedians do not “own” or assume intellectual property of the ideas and text, as the information is shared or altered by any/all contributors. I’m not sure what you mean… An example may be most helpful. Consider Wikipedia’s article on global warming. This text has been continually written, edited, and rewritten by hundreds of Internet users around the world over the past few years. When you open and read Wikipedia’s global warming article, you are reading a text that: • is the sum of all the contributions to this article until that moment. Explanation: The information in the article, in terms of both the content and language, will likely change in the next few hours, days, or months as Wikipedians continue to contribute to/modify the article.
  • Decentralization in Wikipedia Governance
    Decentralization in Wikipedia Governance Andrea Forte1, Vanessa Larco2 and Amy Bruckman1 1GVU Center, College of Computing, Georgia Institute of Technology {aforte, asb}@cc.gatech.edu 2Microsoft [email protected] This is a preprint version of the journal article: Forte, Andrea, Vanessa Larco and Amy Bruckman. (2009) Decentralization in Wikipedia Governance. Journal of Management Information Systems. 26(1) pp 49-72. Publisher: M.E. Sharpe www.mesharpe.com/journals.asp Abstract How does “self-governance” happen in Wikipedia? Through in-depth interviews with twenty individuals who have held a variety of responsibilities in the English-language Wikipedia, we obtained rich descriptions of how various forces produce and regulate social structures on the site. Our analysis describes Wikipedia as an organization with highly refined policies, norms, and a technological architecture that supports organizational ideals of consensus building and discussion. We describe how governance on the site is becoming increasingly decentralized as the community grows and how this is predicted by theories of commons-based governance developed in offline contexts. We also briefly examine local governance structures called WikiProjects through the example of WikiProject Military History, one of the oldest and most prolific projects on the site. 1. The Mechanisms of Self-Organization Should a picture of a big, hairy tarantula appear in an encyclopedia article about arachnophobia? Does it illustrate the point, or just frighten potential readers? Reasonable people might disagree on this question. In a freely editable site like Wikipedia, anyone can add the photo, and someone else can remove it. And someone can add it back, and the process continues.
  • Community and Communication
    Community and Communication (Chapter 12 of How Wikipedia Works, © 2008 by Phoebe Ayers, Charles Matthews, and Ben Yates). A large, diverse, and thriving group of volunteers produces encyclopedia articles and administers Wikipedia. Over time, members of the Wikipedia community have developed conventions for interacting with each other, processes for managing content, and policies for minimizing disruptions and maximizing useful work. In this chapter, we’ll discuss where to find other contributors and how to ask for help with any topic. We’ll also explain ways in which community members interact with each other. Though most discussion occurs on talk pages, Wikipedia has some central community forums for debate about the site’s larger policies and more specific issues. We’ll also talk about the make-up of the community. First, however, we’ll outline aspects of Wikipedia’s shared culture, from key philosophies about how contributors should interact with each other to some long-running points of debate to some friendly practices that have arisen over time. Although explicit site policies cover content guidelines and social norms, informal philosophies and practices help keep the Wikipedia community of contributors together. Wikipedia’s Culture: Wikipedia’s community has grown spontaneously and organically—a recipe for a baffling culture rich with in-jokes and insider references. But core tenets of the wiki way, like Assume Good Faith and Please Don’t Bite the Newcomers, have been with the community since the beginning. Assumptions on Arrival: Wikipedians try to treat new editors well. Assume Good Faith (AGF) is a fundamental philosophy, as well as an official guideline (shortcut WP:AGF) on Wikipedia.
  • A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
    Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. Dwaipayan Roy, Sumit Bhatia, Prateek Jain. GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi. [email protected], [email protected], [email protected]. Abstract: Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap. Introduction: Wikipedia is the largest web-based encyclopedia covering … other with respect to the coverage of topics as well as the amount of information about overlapping topics.
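One simple way to quantify the coverage gap this abstract describes is to compare the sets of topics covered by two language editions. The sketch below is an assumed, simplified illustration rather than the paper's methodology: it takes pre-aligned topic identifiers per edition (hypothetical Wikidata-style IDs here) and computes their Jaccard overlap.

```python
def coverage_overlap(topics_a: set[str], topics_b: set[str]) -> float:
    """Jaccard overlap between the topic sets covered by two language editions."""
    if not topics_a and not topics_b:
        return 1.0
    return len(topics_a & topics_b) / len(topics_a | topics_b)

# Hypothetical aligned topic identifiers (e.g., Wikidata items) per edition.
en = {"Q42", "Q1", "Q7251", "Q937"}
hi = {"Q42", "Q937"}
print(f"EN-HI coverage overlap: {coverage_overlap(en, hi):.2f}")
```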
  • Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
    Modeling Popularity and Reliability of Sources in Multilingual Wikipedia. Włodzimierz Lewoniewski *, Krzysztof Węcel and Witold Abramowicz. Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland; [email protected] (K.W.); [email protected] (W.A.). * Correspondence: [email protected]. Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020. Abstract: One of the most important factors impacting quality of content in Wikipedia is the presence of reliable sources. By following references, readers can verify facts or find more details about the described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users; therefore, information about the same topic may be inconsistent. This also applies to the use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different language versions of Wikipedia.
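The paper's ten models are not reproduced here, but a rough flavour of a popularity measure can be sketched: count citations of each source domain across articles and weight them by article page views. Everything in this snippet (field names, the weighting scheme, the sample data) is an assumption for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlparse

def source_popularity(references, page_views):
    """
    references: iterable of (article_title, cited_url) pairs
    page_views: dict mapping article_title -> view count
    Returns a dict mapping source domain -> views-weighted citation score.
    """
    scores = defaultdict(float)
    for article, url in references:
        domain = urlparse(url).netloc.lower()
        scores[domain] += page_views.get(article, 0)
    return dict(scores)

# Hypothetical sample data.
refs = [("Climate change", "https://www.ipcc.ch/report"),
        ("Climate change", "https://doi.org/10.1000/xyz"),
        ("Berlin", "https://www.ipcc.ch/other")]
views = {"Climate change": 120_000, "Berlin": 80_000}
print(source_popularity(refs, views))
```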
  • Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling
    ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr.
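The abstract mentions placing articles on a Russian-English viewpoint spectrum based on similarity in topic distribution. The following is a hedged sketch of that general idea only; the topic vectors, the centroid construction, and the scoring rule are invented for illustration and do not reproduce the team's actual models or data.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def viewpoint_position(doc_topics, ru_centroid, en_centroid):
    """Score in [-1, 1]: negative leans toward the Russian reference distribution,
    positive toward the English one (an assumed, simplified scoring)."""
    return cosine(doc_topics, en_centroid) - cosine(doc_topics, ru_centroid)

# Hypothetical 4-topic distributions.
doc = [0.10, 0.60, 0.20, 0.10]
ru  = [0.05, 0.70, 0.15, 0.10]
en  = [0.40, 0.20, 0.20, 0.20]
print(viewpoint_position(doc, ru, en))
```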
  • The Culture of Wikipedia
    Good Faith Collaboration: The Culture of Wikipedia. Joseph Michael Reagle Jr. Foreword by Lawrence Lessig. The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr., CC-NC-SA 3.0. Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Contents: Foreword; Preface to the Web Edition; Praise for Good Faith Collaboration; Preface; Extended Table of Contents; 1. Nazis and Norms; 2. The Pursuit of the Universal Encyclopedia; 3. Good Faith Collaboration; 4. The Puzzle of Openness. "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community makes him uniquely situated to describe this culture." —Cory Doctorow, Boing Boing. "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are well presented."
  • An Analysis of Contributions to Wikipedia from Tor
    Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor. Chau Tran (Department of Computer Science & Engineering, New York University, New York, USA, [email protected]); Kaylea Champion (Department of Communication, University of Washington, Seattle, USA, [email protected]); Andrea Forte (College of Computing & Informatics, Drexel University, Philadelphia, USA, [email protected]); Benjamin Mako Hill (Department of Communication, University of Washington, Seattle, USA, [email protected]); Rachel Greenstadt (Department of Computer Science & Engineering, New York University, New York, USA, [email protected]). Abstract—User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia's ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users. (Fig. 1: Screenshot of the page a user is shown when they attempt to edit the Wikipedia article on "Privacy" while using Tor.)
  • Building a Visual Editor for Wikipedia
    Building a Visual Editor for Wikipedia. Trevor Parscal and Roan Kattouw, Wikimania D.C. 2012. The people: Trevor Parscal (Lead Designer and Engineer, Wikimedia), Roan Kattouw (Data Model Engineer, Wikimedia), Rob Moen (User Interface Engineer, Wikimedia), Inez Korczynski (Edit Surface Engineer, Wikia), Christian Williams (Edit Surface Engineer, Wikia), James Forrester (Product Analyst, Wikimedia). Parsoid team: Gabriel Wicke (Lead Parser Engineer, Wikimedia), Subbu Sastry (Parser Engineer, Wikimedia). The Complexity Problem (chart: active editors, 0–20k, 2001 to 2007 to today; growth, then stagnation). The Review Problem (chart: balancing the ecosystem; difficulty of editing vs. reviewing). The Expert Problem (charts: wikitext enthusiasts; exit strategy, preference for wikitext vs. capabilities of visual tools, 100%–0%). Slide image credits: CC-BY-SA-3.0, http://commons.wikimedia.org/wiki/File:Usfa-heston.gif; http://commons.wikimedia.org/wiki/File:TriMet_MAX_Green_Line_Train_on_Portland_Transit_Mall.jpg; http://commons.wikimedia.org/wiki/File:TriMet_1990_Gillig_bus_carrying_bike.jpg
  • Classifying Wikipedia Article Quality with Revision History Networks
    Classifying Wikipedia Article Quality With Revision History Networks. Narun Raman*, Nathaniel Sauerberg*, Jonah Fisher, Sneha Narayan (Carleton College); [email protected], [email protected], [email protected], [email protected]. ABSTRACT: We present a novel model for classifying the quality of Wikipedia articles based on structural properties of a network representation of the article's revision history. We create revision history networks (an adaptation of Keegan et al.'s article trajectory networks [7]), where nodes correspond to individual editors of an article, and edges join the authors of consecutive revisions. Using descriptive statistics generated from these networks, along with general properties like the number of edits and article size, we predict which of six quality classes (Start, Stub, C-Class, B-Class, Good, Featured) articles belong to, attaining a classification accuracy of 49.35% on a stratified sample of articles. These results suggest that structures of collaboration underlying the creation of articles, and not just the content of the article, should be considered for accurate quality classification. … long been interested in maintaining and investigating the quality of its content [4][6][12]. Editors and WikiProjects typically rely on assessments of article quality to focus volunteer attention on improving lower quality articles. This has led to multiple efforts to create classifiers that can predict the quality of a given article [3][4][18]. These classifiers can assist in providing assessments of article quality at scale, and help further our understanding of the features that distinguish high and low quality Wikipedia articles. While many Wikipedia article quality classifiers have focused on assessing quality based on the content of the latest version of an article [1, 4, 18], prior work has suggested that high quality articles are associated with more intense collaboration among editors.
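To make the network construction concrete, here is a small sketch, under the assumption of a simple chronological list of revision authors, that builds a revision-history-style network (nodes are editors, edges join authors of consecutive revisions) and derives a few descriptive statistics of the kind such a classifier could use. It uses networkx and is illustrative only, not the authors' implementation or feature set.

```python
import networkx as nx

def revision_history_network(revision_authors):
    """revision_authors: list of editor names in chronological revision order."""
    g = nx.DiGraph()
    for earlier, later in zip(revision_authors, revision_authors[1:]):
        # Edge joins the authors of consecutive revisions; weight counts repeats.
        if g.has_edge(earlier, later):
            g[earlier][later]["weight"] += 1
        else:
            g.add_edge(earlier, later, weight=1)
    return g

def network_features(g, num_edits, article_size):
    """A few illustrative structural features plus general article properties."""
    return {
        "num_editors": g.number_of_nodes(),
        "num_edges": g.number_of_edges(),
        "density": nx.density(g) if g.number_of_nodes() > 1 else 0.0,
        "num_edits": num_edits,
        "article_size": article_size,
    }

authors = ["Alice", "Bob", "Alice", "Carol", "Alice"]
g = revision_history_network(authors)
print(network_features(g, num_edits=len(authors), article_size=2048))
```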
  • How to Contribute Climate Change Information to Wikipedia: A Guide
    HOW TO CONTRIBUTE CLIMATE CHANGE INFORMATION TO WIKIPEDIA. Emma Baker, Lisa McNamara, Beth Mackay, Katharine Vincent; © 2021, CDKN. This work is licensed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits unrestricted use, distribution, and reproduction, provided the original work is properly credited. IDRC Grant/Subvention du CRDI: 108754-001-CDKN knowledge accelerator for climate compatible development. How to contribute climate change information to Wikipedia: A guide for researchers, practitioners and communicators. Contents: About this guide; 1 Why Wikipedia is an important tool to communicate climate change information; 1.1 Enhancing the quality of online climate change information; 1.2 Sharing your work more widely; 1.3 Why researchers should …