Master's Thesis at the Computer Science Department

of the Freie Universität Berlin

Human-Centered Computing (HCC)

You shall not publish: Edit filters on English Wikipedia

Lyudmila Vaseva

Supervisor and First Examiner: Prof. Dr. Claudia Müller-Birn
Second Examiner: Prof. Dr. Lutz Prechelt

Berlin, 25.07.2019

Eidesstattliche Erklärung

Ich versichere hiermit an Eides Statt, dass diese Arbeit von niemand anderem als meiner Person verfasst worden ist. Alle verwendeten Hilfsmittel wie Berichte, Bücher, Internetseiten oder Ähnliches sind im Literaturverzeichnis angegeben, Zitate aus fremden Arbeiten sind als solche kenntlich gemacht. Die Arbeit wurde bisher in gleicher oder ähnlicher Form keiner anderen Prüfungskommission vorgelegt und auch nicht veröffentlicht.

Berlin, den 12. September 2019

Abstract

The present thesis offers an initial investigation of a quality control mechanism of Wikipedia previously unexplored by scientific research: edit filters. It is analysed how edit filters fit into the quality control system of the English Wikipedia, why they were introduced, and what tasks they take over. Moreover, it is discussed why rule-based systems like these still seem to be popular today, when more advanced machine learning methods are available. The findings indicate that edit filters were implemented to take care of obvious but persistent types of vandalism, disallowing these from the start so that (human) resources can be used more efficiently elsewhere (i.e. for judging less obvious cases). In addition to disallowing such vandalism, edit filters appear to be applied in ambiguous situations where an edit is disruptive but the motivation of the editor is not clear. In such cases, the filters take an “assume good faith” approach and seek, via warning messages, to guide the disrupting editor towards transforming their contribution into a constructive one. There is also a smaller number of filters taking care of miscellaneous maintenance tasks, above all tracking a certain bug or other behaviour for further investigation. Since the current work is just a first exploration into edit filters, at the end a comprehensive list of open questions for future research is compiled.

Zusammenfassung

Die vorliegende Arbeit bietet eine erste Untersuchung eines bisher von der Wissenschaft unerforschten Qualitätskontrollmechanismus' von Wikipedia: Bearbeitungsfilter (“edit filters” auf Englisch). Es wird analysiert, wie sich Bearbeitungsfilter in das Qualitätssicherungssystem der englischsprachigen Wikipedia einfügen, warum sie eingeführt wurden und welche Aufgaben sie übernehmen. Darüber hinaus wird diskutiert, warum regelbasierte Systeme wie dieses noch heute beliebt sind, wenn fortgeschrittenere Machine-Learning-Methoden verfügbar sind. Die Ergebnisse deuten darauf hin, dass Bearbeitungsfilter implementiert wurden, um sich um offensichtliche, aber hartnäckige Sorten von Vandalismus zu kümmern. Die Motivation der Wikipedia-Community war, dass, wenn solcher Vandalismus von vornherein verboten wird, (Personal-)Ressourcen an anderen Stellen effizienter genutzt werden können (z.B. zur Beurteilung weniger offensichtlicher Fälle). Außerdem scheinen Bearbeitungsfilter in uneindeutigen Situationen angewendet zu werden, in denen eine Bearbeitung zwar störend ist, die Motivation der editierenden Person allerdings nicht klar als boshaft identifiziert werden kann. In solchen Fällen verinnerlichen die Filter die “Geh von guten Absichten aus”-Richtlinie und versuchen, die editierende Person über Warnmeldungen zu einem konstruktiven Beitrag anzuleiten. Es gibt auch eine kleinere Anzahl von Filtern, die sich um vereinzelte Wartungsaufgaben kümmern. Hierunter fallen die Versuche, einen bestimmten Bug nachzuvollziehen oder ein anderes Verhalten zu verfolgen, um es dann weiter untersuchen zu können. Da die aktuelle Arbeit nur einen ersten Einblick in Wikipedias Bearbeitungsfilter darstellt, wird am Ende eine umfassendere Liste mit offenen Fragen für die zukünftige Erforschung des Mechanismus' erarbeitet.

Contents

1. Introduction
   1.1. Subject and Context
   1.2. Contributions
   1.3. Structure

2. Quality-Control Mechanisms on Wikipedia
   2.1. Automated
      2.1.1. Bots
      2.1.2. ORES
      2.1.3. Page Protection, TitleBlacklist, SpamBlacklist
   2.2. Semi-Automated
   2.3. Manual
   2.4. Conclusion

3. Methods
   3.1. Trace Ethnography
   3.2. Emergent Coding

4. Edit Filters As Part of Wikipedia's Socio-Technical Infrastructure
   4.1. Data
   4.2. Edit Filter Definition
      4.2.1. Example of a Filter
   4.3. The AbuseFilter MediaWiki Extension
   4.4. History
   4.5. Building a Filter: the Internal Perspective
      4.5.1. How Is a New Filter Introduced?
      4.5.2. Who Can Edit Filters?
   4.6. Filters during Runtime: the External Perspective
   4.7. Edit Filters' Role in the Quality Control Ecosystem
      4.7.1. Wikipedia's Algorithmic Quality Control Mechanisms in Comparison
      4.7.2. Collaboration of the Mechanisms
      4.7.3. Conclusions

5. Descriptive Overview of Edit Filters on the English Wikipedia
   5.1. Data
   5.2. Types of Edit Filters: Manual Classification
      5.2.1. Coding Process and Challenges
      5.2.2. Vandalism

      5.2.3. Good Faith
      5.2.4. Maintenance
      5.2.5. Unknown
   5.3. Filter Characteristics
      5.3.1. General Traits
      5.3.2. Public and Hidden Filters
      5.3.3. Filter Actions
      5.3.4. What Do Filters Target
      5.3.5. Who Trips Filters
   5.4. Filter Activity
      5.4.1. Filter Hits per Month
      5.4.2. Most Active Filters Over the Years
   5.5. Conclusions

6. Discussion
   6.1. Q1: What is the role of edit filters among existing quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?
   6.2. Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?
   6.3. Q3: Which type of tasks do filters take over?
   6.4. Q4: How have these tasks evolved over time (are there changes in the type, number, etc.)?
   6.5. Limitations
      6.5.1. Limitations of the Data
      6.5.2. Limitations in the Research Process
   6.6. Directions for Future Studies

7. Conclusion

References

Appendices
   A. Code Book
   B. Extra Figures and Tables

List of Figures

2.1. State of the scientific literature: edit filters are missing from the quality control ecosystem

4.1. Detailed page of edit filter #365
4.2. Tagged edits are marked as such in a page's revision history
4.3. Abuse Log showing all filter matches by User:Schnuppi4223
4.4. Editor gets two warnings upon erasing an entire page
4.5. EN Wikipedia: Number of editors over the years
4.6. EN Wikipedia: Number of edits over the years
4.7. Edit filters' role in the quality control ecosystem

5.1. Overview of active, disabled and deleted filters on EN Wikipedia
5.2. EN Wikipedia edit filters: Filters actions for all filters
5.3. EN Wikipedia edit filters: Filters actions for enabled public filters
5.4. EN Wikipedia edit filters: Filters actions for enabled hidden filters
5.5. Manual tags parent categories distribution: all filters
5.6. Manual tags parent categories distribution: enabled filters (January 2019)
5.7. Edit filters manual tags distribution
5.8. EN Wikipedia edit filters: Hits per month
5.9. EN Wikipedia edit filters: Hits per month according to manual tags
5.10. EN Wikipedia: Reverts for July 2001–April 2017
5.11. EN Wikipedia edit filters: Hits per month according to filter action
5.12. EN Wikipedia edit filters: Hits per month according to triggering editor's action

.1. abuse_filter schema
.2. abuse_filter_log schema
.3. abuse_filter_history schema
.4. abuse_filter_action schema


List of Tables

2.1. Wikipedia's algorithmic quality control mechanisms in comparison

4.1. Timeline: Introduction of algorithmic quality control mechanisms
4.2. Wikipedia's algorithmic quality control mechanisms in comparison

5.1. Filters aimed at unconfirmed users
5.2. What do most active filters do?
5.3. 10 most active filters in 2009
5.4. 10 most active filters in 2010
5.5. 10 most active filters in 2011
5.6. 10 most active filters in 2012
5.7. 10 most active filters in 2013
5.8. 10 most active filters in 2014
5.9. 10 most active filters in 2015
5.10. 10 most active filters in 2016
5.11. 10 most active filters in 2017
5.12. 10 most active filters in 2018

.1. Code book


1. Introduction

In May 2014, the US American magazine The New Yorker published a story called “How a Raccoon Became an Aardvark” in its column “Annals of Technology” [Ran14]. It tells an anecdote about a student from New York who, some six years before, edited the Wikipedia article on “coati” (a member of the raccoon family native to South and Central America) to state that the coati is “also known as [...] Brazilian aardvark” [Wik08]. This simple action is a mundane example of how Wikipedia works: Anyone can edit, and small contribution by small contribution the world's largest knowledge base is created. Except, the student made the whole thing up and published on Wikipedia an inside joke he had with his brother on their holiday trip to Brazil. Unsourced pieces of information are not supposed to survive long on Wikipedia and he thought that the edit would be swiftly deleted. Fast-forward to 2014: not only had this part of the “coati” entry not changed, but it cited a 2010 article by the newspaper the Telegraph as evidence [Wik14]. In the meantime, apparently several newspapers, a YouTube video and a book published by the University of Chicago [Hen13] had claimed that the coati was known as Brazilian aardvark. It proved not trivial to erase the snippet from Wikipedia since there were all these other sources affirming the statement. By then, it was not exactly false either: The coati was known as “Brazilian aardvark”, at least on the internet.

Now, despite accounts that Wikipedia seems to be similarly accurate and more complete than the online version of Britannica [CDFN12], [Gil05], stories like the one above are precisely why it is still maintained that information on Wikipedia cannot be trusted or used as a serious bibliographic reference.

The Wikipedian community is well aware of their project's poor reliability reputation and has a long-standing history of quality control processes. Not only hoaxes, but profanities, malicious vandals, and spammers have been there since the very beginning, and their numbers have increased with the rise of the project to prominence. At the latest with the exponential surge in the numbers of users and edits around 2006, the community began realising that it needed more automated means of quality control. The same year, the first anti-vandal bots were implemented, followed by semi-automated tools facilitating revision verification such as Twinkle [Wik19bd] (in 2007) and Huggle [Wik19ak] (at the beginning of 2008). In 2009, yet another mechanism dedicated to quality control was introduced. Its core developer, Andrew Garrett, known on Wikipedia as User:Werdna [Wik19bf], called it “abuse filter”, and according to EN Wikipedia's community newspaper, The Signpost, its purpose was to “allow [...] all edits to be checked against automatic filters

and heuristics, which can be set up to look for patterns of vandalism including page move vandalism and juvenile-type vandalism, as well as common newbie mistakes” [Sig09]. This mechanism is the focus of the current project.

1.1. Subject and Context

The present thesis can be embedded in the field of (algorithmic) quality control on Wikipedia and in the more general research area of algorithmic governance [DHN+17], [MBDH13]. There is a whole ecosystem of actors struggling to keep the anyone-can-edit encyclopedia as accurate and as free of malicious content as possible. The focus of this work is edit filters, the mechanism initially introduced by User:Werdna under the name of “abuse filters”, previously unexplored by the scientific community. The goal of this project is to better understand the role of edit filters in the vandal-fighting network of humans, bots, semi-automated tools, and the Wikipedian machine learning framework ORES. After all, edit filters were introduced to Wikipedia at a time when the majority of the aforementioned mechanisms already existed and were involved in quality control.1

1.2. Contributions

The aim of this work is to find out why edit filters were introduced on Wikipedia and what role they have assumed in Wikipedia's quality control ecosystem, since there is a gap in the academic research on the topic. Further, it is analysed what tasks are taken over by filters and, as far as practicable, how these tasks have evolved over time (are there changes in type, numbers, etc.?). Moreover, it is discussed why a classic rule-based system such as the filters is still operational today when more sophisticated machine learning (ML) approaches exist. Since this is just an initial exploration of the features, tasks and repercussions of edit filters, a framework for future research is also offered.

To this end, a three-part approach is pursued. Firstly, the academic contributions on Wikipedia's quality control mechanisms are reviewed in order to gain a better understanding of the different quality control mechanisms, their tasks, and the challenges they face. Secondly, the documentation of the MediaWiki AbuseFilter extension is studied, together with the guidelines for its use, various noticeboards, and discussion archives prior to its introduction, in an attempt to understand how and why filters were introduced and how they function. Thirdly, I look into the filters implemented on English Wikipedia2

1Edit filters were introduced in 2009. The page of the semi-automated tool Twinkle [Wik19bd] was created in January 2007, the one of the tool Huggle [Wik19ak] at the beginning of 2008. Bots have been around longer, but the first records of vandal-fighting bots come from 2006.
2Throughout the work, the abbreviated form “EN Wikipedia” is used to denote the English language version of Wikipedia.

themselves, as well as their log data, in order to determine what they actually do.

First results show that edit filters were implemented to take care of obvious but persistent types of vandalism such as mass moves of pages to nonsense URLs. The community was willing to disallow this kind of edit from the very start, reasoning that the effort spent cleaning up such cases could be used more efficiently elsewhere (i.e. for judging whether less obvious cases were malicious or not). In addition to disallowing such vandalism, edit filters appear to be applied in ambiguous situations where an edit in itself is disruptive but the motivation of the editor is not clear. For example, deleting the entire content of a page could be malignant, but it could also be the result of a new editor not familiar with the proper procedures for deleting or moving pages. In such cases, the filters take an “assume good faith” approach and seek, via warning messages, to guide the disrupting editor towards transforming their contribution into a constructive one: In the page blanking example, a warning contains links to the documentation for redirects and the Articles for Deletion process [Wik19j], and advises the editor to revert the page to the last uncompromised version in case it has been vandalised, and to use the sandbox for test edits. There is also a smaller number of filters which take care of various maintenance tasks, above all tracking a certain bug or other behaviour for further investigation. Since the current work is just a first exploration into edit filters, at the end a comprehensive list of open questions for future research is compiled.

1.3. Structure

This thesis is organised in the following manner: Chapter 2 situates the topic in the academic discourse by examining the role of different quality control mechanisms on Wikipedia hitherto studied by the scientific community. In chapter 3, I present the methodological frameworks on which this research is based. Next, the edit filter mechanism in general is described: how and why it was conceived, how it works and how it is embedded in Wikipedia's quality control ecosystem (chapter 4). A detailed analysis of the current state of all implemented edit filters on English Wikipedia is presented in chapter 5. Chapter 6 discusses the findings and limitations of the present work, as well as directions for future investigations. Finally, the research is wrapped up in chapter 7.

2. Quality-Control Mechanisms on Wikipedia

The present chapter studies the scientific literature on Wikipedia's quality control mechanisms in order to better understand the role of edit filters in this ecosystem.

Before 2009, academic studies on Wikipedia tended to ignore algorithmic agents altogether. The number of their contributions to the encyclopedia was found to be low and therefore their impact was considered insignificant [KCP+07]. This has gradually changed since around 2009, when the first papers specifically dedicated to bots (and later semi-automated tools such as Huggle and Twinkle) were published. In 2010, Geiger and Ribes emphatically highlighted that the scientific community could no longer neglect these mechanisms as unimportant or as noise in the data [GR10]. For one, the mechanisms' relative usage has continued to increase since they were first introduced [Gei09]. What is more, Geiger and Ribes argue, the algorithmic quality control mechanisms change the system not only in a matter of scale (using bots/tools is faster, hence more reverts are possible) but in a matter of substance: the very way in which all actors interact with one another is transformed [GR10]. In the area of quality control specifically, the introduction of algorithmic mechanisms was fairly revolutionary: They enabled efficient patrolling of articles by users with little to no knowledge about the particular topic. Thanks to Wikipedia's idiosyncratic software architecture, this is possible even in the most “manual” quality control work (i.e. using watchlists to patrol articles): Representing information changes via diffs allows editors to quickly spot content that deviates from its immediate context [GR10]. Others were worried it was becoming increasingly opaque how the encyclopedia functions, and not only “[k]eeping traces obscure help[ed] the powerful to remain in power” [FG12], but entry barriers for new users were gradually set higher [HGMR13]: They had to learn to interact with a myriad of technical tools and learn syntax, but also navigate their ground in a complex system with a decentralised socio-technical mode of governance [Gei17]. Ford and Geiger even cite a case in which an editor was not sure whether it was a person or a bot that deleted their articles [FG12].

Quality control mechanisms on Wikipedia can be categorised into the following three groups according to their level of automation: fully automated, semi-automated, and manual. Fully automated tools include bots, the edit filters which are the focus of the present thesis, and other MediaWiki features such as the mechanism for page protection [Med19e] (which allows restricting editing of a particular page to certain user groups), and the title [Med19g] and spam [Med19f] blacklists, which operate on a regular expression basis to disallow specific titles or the publication of links perceived as spam. There is

also the automatically functioning Wikipedian machine learning framework ORES [ORE19], which computes quality scores per article or revision. ORES has a somewhat different status compared to the other technologies listed here: It is a meta tool whose scores can be employed by other mechanisms. Semi-automated tools still need some sort of user interaction/confirmation in order to operate. Into this category fall the tools Huggle [Wik19ak], Twinkle [Wik19bd] and STiki [Wik19bb]. There are also some semi-automated bots (although the most prominent anti-vandalism bots discussed here are fully automated). Manual quality control work is done by human editors without the help of any particular software program.

The following sections discuss what the scientific community already knows about the different mechanisms in order to be able to situate edit filters in Wikipedia's quality control ecosystem.

2.1. Automated

2.1.1. Bots

According to the literature, bots constitute the first “line of defence” against malicious edits [GH13]. They are also undoubtedly the quality control mechanism studied most in-depth by the scientific community. Geiger and Ribes [GR10] define bots as “fully-automated software agents that perform algorithmically-defined tasks involved with editing, maintenance, and administration in Wikipedia”1.

Different aspects of bots and their involvement in quality control have been investigated: In the paper referenced above, the researchers employ their method of trace ethnography (more on it in chapter 3) to follow a disrupting editor around Wikipedia and comprehend the measures taken in collaboration by bots (ClueBot [Wik19m] and HBC AIV helperbot7 [Wik19aj]) as well as humans using semi-automated tools (Huggle [Wik19ak] and Twinkle [Wik19bd]) until the malicious editor in question was eventually banned [GR10]. Halfaker and Riedl offer a historical review of bots and semi-automated tools and their involvement in vandal fighting [HR12], assembling a comprehensive list of tools and touching on their working principle (rule vs. machine learning based). They also develop a bot taxonomy classifying bots into one of the following three groups according to their task area: content injection, monitoring or curating; augmenting MediaWiki functionality; and protection from malicious activity. In [GH13], Geiger and Halfaker conduct an in-depth analysis of ClueBot NG, ClueBot's machine learning based successor, and its place within Wikipedia's vandal-fighting infrastructure, concluding that quality

1Not all bots are completely automated: There are batch scripts started manually and there are also bots that still need a final click by a human. However, the ones the present work focuses on—the rapid response anti-vandalism agents such as ClueBot NG [Wik19n] and XLinkBot [Wik19bj]—work in a fully automated fashion.

control on Wikipedia is a robust process and most malicious edits eventually get reverted, even with some of the actors (temporarily) inactive, although at a different speed. They discuss the mean times to revert of the different mechanisms, their observations coinciding with figure 2.1, and also comment on the (un)reliability of the external infrastructure bots rely upon (they run on private computers, which causes downtimes).

Further bots involved in vandal fighting (besides ClueBot [GR10] and ClueBot NG [GH13], [HR12]) discussed in the literature include: XLinkBot (which reverts edits containing links to domains blacklisted as spam) [HR12]; HBC AIV Helperbots (responsible for various maintenance tasks which help to keep entries on the Administrator intervention against vandalism (AIV) dashboard up-to-date) [HR12], [GR10]; MartinBot [Wik19ap] and AntiVandalBot [Wik19h] (one of the first rule-based bots which detected obvious cases of vandalism) [HR12]; and DumbBOT [Wik19q] and EmausBot [Wik19ag] (which do batch cleanup tasks) [GH13].

Very crucial for the current analysis will also be Livingstone's observation in the preamble to his interview with the first large-scale bot operator Ram-Man that “[i]n the Wikimedia software, there are tasks that do all sorts of things [...]. If these things are not in the software, an external bot could do them. [...] The main difference is where it runs and who runs it” [Liv16]. This thought is also scrutinised by Geiger [Gei14], who examines in detail what the difference and repercussions are of code that is part of the core software and code that runs alongside it (such as bots), which he calls “bespoke code”. Geiger pictures Wikipedia as a big socio-technical assemblage of software pieces and social processes, often completely opaque for an outside observer who is not able to identify the single components of this system and how they interact with one another to provide the end result to the public. He underlines that components which are not strictly part of the server-side codebase but are run by various volunteers on their private infrastructure (which is true for most parts of Wikipedia; it is a community project) constitute a major part of Wikipedia, and also that they can experience downtime at any moment. The vital tasks they perform, such as vandalism fighting, are often taken for granted, much to their developers' aggravation.

A final aspect of the bot discussion relevant here are the concerns of the community. People have long been sceptical (and some still are) about the employment of fully automated agents such as bots within Wikipedia (some have called this fear “botophobia” [Gei11]). Above all, there is a fear of bots (especially those with admin permissions) running rampant and their operators not reacting fast enough to prevent the damage. This led to the social understanding that “bots ought to be better behaved than people” [Gei11], which still plays a crucial role in bot development today.

2.1.2. ORES

ORES [ORE19] is an API-based free libre and open source (FLOSS) machine learning service “designed to improve the way editors maintain the quality of Wikipedia” [HT15] and to increase the transparency of the quality control process. It uses learning models to predict a quality score for each article and edit based on edit/article quality assessments manually assigned by Wikipedians. Potentially damaging edits are highlighted, which allows editors who engage in vandal fighting to examine them in greater detail. The service was officially introduced in November 2015 by Aaron Halfaker2 (principal research scientist at the Wikimedia Foundation3) and Dario Taraborelli4 (Head of Research at the Wikimedia Foundation at the time) [HT15]. Its development is ongoing, coordinated and advanced by Wikimedia's Scoring Platform team.

Since ORES is API based, in theory a myriad of services can be developed that use the predicted scores, or new models can be trained and made available for everyone to use. As already mentioned, the tool has a meta status, since it does not fight vandals on its own but rather can be employed by other mechanisms for determining the probability that a particular edit is disruptive. The Scoring Platform team reports that popular quality control tools such as Huggle (see next section) have already adopted ORES scores for the compilation of their queues [HT15]. What is unique about ORES is that all the algorithms, models, training data, and code are public, so everyone (with sufficient knowledge of the matter) can scrutinise them and reconstruct what is going on. Halfaker and Taraborelli express the hope that ORES will help hone quality control mechanisms on Wikipedia and, by decoupling the damage prediction from the actual decision how to deal with an edit, make the encyclopedia more welcoming towards newcomers. This last aim is crucial, since there is a body of research demonstrating how reverts in general [HKR11] and reverts by (semi-)automated quality control mechanisms in particular drive new editors away [HGMR13]. The same authors also signal that these tools still tend to reject the majority of newcomers' edits as made in bad faith. The researchers also warn that wording is tremendously important for the perception of edits and the people who authored them: labels such as “good” or “bad” are not helpful.
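Since ORES exposes its scores via a public web API, its predictions can be retrieved with a single HTTP request. The following minimal Python sketch (not part of this thesis's own analysis pipeline) queries the “damaging” model for one revision of the English Wikipedia; the revision ID is an invented placeholder, and the response is accessed defensively since the exact JSON layout may vary between ORES versions.

import requests

REV_ID = 123456789  # hypothetical revision ID, purely for illustration

# Public ORES scoring endpoint, "enwiki" context, "damaging" model.
url = "https://ores.wikimedia.org/v3/scores/enwiki/"
params = {"models": "damaging", "revids": REV_ID}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Drill down to the score for this revision; .get() guards against
# error entries or layout differences in the response.
score = (
    response.json()
    .get("enwiki", {})
    .get("scores", {})
    .get(str(REV_ID), {})
    .get("damaging", {})
    .get("score", {})
)
print("prediction:", score.get("prediction"))
print("probabilities:", score.get("probability"))

A tool such as Huggle can use exactly this kind of probability to decide how high an edit should be placed in its review queue.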

2.1.3. Page Protection, TitleBlacklist, SpamBlacklist

Page protection is a MediaWiki template-based functionality which allows administrators to restrict edit access to a particular page temporarily (the most

2https://wikimediafoundation.org/role/staff-contractors/
3The Wikimedia Foundation is a non-profit organisation dedicated to collecting and disseminating free knowledge [Wik19a]. Beside Wikipedia, it provides and maintains the infrastructure for a family of projects such as Wikimedia Commons (a collection of freely usable media), Wiktionary (a free dictionary), or Wikidata (a free structured knowledge base) [Wik19b].
4http://nitens.org/taraborelli/cv

common periods are 7 and 30 days) or permanently [Wik19aw], [GH17]. The mechanism is suitable for handling a higher number of incidents concerning single pages [Wik19r]. Only one study dedicated specifically to page protection on Wikipedia was found: [HS15]. In this paper, Hill and Shaw maintain that the mechanism is highly configurable: It is available in more than ten varieties, including the most popular “full protection” (only administrators can edit) and “semi-protection” (only registered, autoconfirmed users can edit). Moreover, it is found that pages are protected for various reasons, e.g. to prevent edit warring or vandalism, to enforce a policy or the law, or because it is an established process to protect articles on the front page. The researchers also look into the historical development of protected pages on Wikipedia and discuss the repercussions of the mechanism for affected users [HS15]. If a user doesn't have the permissions needed to edit a protected page, the “edit” link is simply not displayed at all.

The rule-based MediaWiki extensions TitleBlacklist [Med19g] and SpamBlacklist [Med19f] are employed for disallowing disruptive page titles or link spam. The only more extensive account found on these mechanisms discusses link spam on Wikipedia and identifies the SpamBlacklist as the first mechanism to be activated in the spam removal pipeline [WCV+11].
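At their core, both blacklist extensions maintain lists of regular expressions against which incoming page titles or added external links are matched. The snippet below is not the extensions' actual code or rule syntax, but a minimal Python sketch of the underlying idea, using two invented example patterns.

import re

# Invented example patterns; the real on-wiki blacklists are much longer
# and are maintained by administrators.
TITLE_BLACKLIST = [
    r"(?i).*\bbuy\s+cheap\b.*",  # promotional page titles
    r".*[!?]{3,}.*",             # titles with excessive punctuation
]

def title_is_blacklisted(title: str) -> bool:
    # Return True if the proposed page title matches any blacklist rule.
    return any(re.fullmatch(pattern, title) for pattern in TITLE_BLACKLIST)

print(title_is_blacklisted("Buy cheap watches online"))  # True
print(title_is_blacklisted("Coati"))                     # False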

2.2. Semi-Automated

Semi-automated quality control tools are similar to bots in the sense that they provide automated detection of potentially low-quality edits. The difference, however, is that with semi-automated tools humans do the final assessment and decide what happens with the edits in question.

There is a scientific discussion of several tools: Huggle [Wik19ak], which is probably the most popular and widely used one, is studied in [GH13], [HR12], and [GR10]. Another very popular tool, Twinkle [Wik19bd], is commented on by [GH13], [GR10], and [HGMR13]. STiki [Wik19bb] is presented by its authors in [WKL10] and also discussed by [GH13]. Various older (and partially inactive) applications are mentioned in the literature as well: Geiger and Ribes [GR10] touch on Lupin's Anti-vandal tool [Wik19ao], and Halfaker and Riedl talk about VandalProof [HR12].

Some of these tools are more automated than others: Huggle and STiki, for instance, are able to revert an edit, issue a warning to the offending editor, and post a report on the AIV dashboard (if the user has already exhausted the warning limit) upon a single click. The browser extension Twinkle, on the other hand, adds contextual links to other parts of Wikipedia, which facilitates tasks such as rolling back multiple edits, reporting problematic users to AIV, or nominating an article for deletion [GR10]. The main feature of Huggle and STiki is that they both compile a central queue of potentially harmful edits for all their users to check. The difference between the two programs lies in the heuristics they use for their queues: By default, Huggle sends edits by users with warnings on their user talk page to the top of the

queue, places edits by IP editors higher, and ignores edits made by bots and other Huggle users altogether [GR10]. In contrast, STiki relies on the “spatio-temporal properties of revision metadata” [WKL10] for deciding the likelihood that an edit is vandalism. Huggle's queue can be reconfigured; however, some technical savvy and motivation are needed for this, and thus, as [GR10] warn, the defaults make certain paths of action easier to take than others. Another common trait of both programs is that, as a standard, editors need the “rollback” permission in order to be able to use them [HR12].

Some critique that has been voiced regarding semi-automated anti-vandalism tools compares them to massively multiplayer online role-playing games (also known as MMORPGs) [HR12]. The concern is that some of the users of said tools see themselves as vandal fighters on a mission to slay the greatest number of monsters (vandals) possible and by doing so to excel in the ranks 5. For one, this is a harmful way to view the project, neglecting the “assume good faith” guideline [Wik19k]. It also leads such users to seek out easy-to-judge instances from the queues in order to move on to the next entry more swiftly and gather more points, leaving the more subtle cases which really require human judgement to others. Transparency-wise, one can criticise that the heuristics these tools use to compile the queues of potentially malicious edits in need of attention are oftentimes obfuscated by the user interface, so the editors using them are not aware why exactly these and not other edits are displayed to them. The heuristics are configurable to an extent; however, one needs to be aware of this option [GR10].
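To make the notion of such review queues more tangible, the following toy Python sketch ranks incoming edits roughly along the default heuristics described above for Huggle: prior warnings on the editor's talk page push an edit towards the top, anonymous IP edits rank higher, and edits by bots or other tool users are skipped entirely. It is an illustration only, not Huggle's or STiki's actual scoring logic.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Edit:
    editor: str
    is_ip: bool         # anonymous (IP) editor?
    is_bot: bool        # made by a bot account?
    is_tool_user: bool  # made by another tool user?
    warnings: int       # warnings currently on the editor's talk page

def review_priority(edit: Edit) -> Optional[float]:
    # Higher score = reviewed sooner; None = not queued at all.
    if edit.is_bot or edit.is_tool_user:
        return None
    score = 10.0 * edit.warnings  # previously warned editors go to the top
    if edit.is_ip:
        score += 5.0              # anonymous edits rank higher
    return score

edits = [
    Edit("198.51.100.7", True, False, False, warnings=2),
    Edit("NewAccount42", False, False, False, warnings=0),
    Edit("SomeBot", False, True, False, warnings=0),
]
queue = sorted(
    (e for e in edits if review_priority(e) is not None),
    key=review_priority,
    reverse=True,
)
print([e.editor for e in queue])  # ['198.51.100.7', 'NewAccount42']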

2.3. Manual

For completeness, it should be noted at this point that despite the steady increase in the proportion of fully and semi-automated tool usage for fighting vandalism [Gei09], some of the quality control work is still done “manually” by human editors. These are, on the one hand, editors who use the “undo” functionality from within a page's revision history. On the other hand, there are also editors who engage with the classic encyclopedia editing mechanism (click the “edit” button on an article, enter changes in the dialog which opens, write an edit summary for the edit, click “save”) rather than using further automated tools to aid them. When Wikipedians use these mechanisms for vandalism fighting, oftentimes they have not noticed the vandalising edits by chance but rather have been actively watching the pages in question via so-called watchlists [AH18]. This also gives us a hint as to what type of quality control work humans take over: the less obvious and less rapid cases, requiring more complex judgement [AH18]. Editors who patrol pages via watchlists often have some relationship to or deeper expertise on the topic.

5STiki actually has a leaderboard: https://en.wikipedia.org/w/index.php?title=Wikipedia:STiki/leaderboard&oldid=905145147


2.4. Conclusion

For clarity, the various aspects of algorithmic quality control mechanisms learnt by studying related works are summarised in table 4.2. Their work can be fittingly illustrated by figure 2.1, proposed in a similar fashion also by [AH18]. What is striking about this diagram is that it foregrounds the temporal dimension of quality control work done on Wikipedia, demonstrating that as a general rule bots are the first mechanisms to intercept a potentially harmful edit, less obviously disruptive edits are often caught by semi-automated quality control tools, and really subtle cases are uncovered by manually reviewing humans or sometimes not at all.

One thing is certain: So far, on grounds of literature review alone, it remains unclear what the role of edit filters is. The mechanism is conspicuously missing from the studied accounts. In the remainder of the current thesis, I try to remedy this gap in research by exploring the following questions:

Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?

Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?

Q3: Which type of tasks do filters take over?

Q4: How have these tasks evolved over time (are there changes in the type, number, etc.)?

In order to be able to answer them, various Wikipedia pages, among them policies, guidelines, documentation, and discussions, are studied in chapter 4, and filter data from the English Wikipedia is analysed in chapter 5. But first, chapter 3 introduces the applied methodology.


Figure 2.1.: State of the scientific literature: edit filters are missing from the quality control ecosystem

Table 2.1.: Wikipedia's algorithmic quality control mechanisms in comparison

Properties:
   Page protection: per page; part of MediaWiki (MediaWiki is open source); one cannot edit at all
   Blacklists: rule-based; part of MediaWiki; trigger before an edit is published
   Bots: rule/ML based; run on user's infrastructure ("bespoke code"); no requirement for code to be public; latency varies (ClueBot NG needs only a couple of seconds); mostly single dev/operator (recently: bot frameworks)
   Semi-automated tools: rule/ML based; extra infrastructure; most popular are open source (but not a requirement); heuristics obfuscated by the interface; generally higher latency than bots; few devs
   ORES: ML framework; not used directly, can be incorporated in other tools; open source; few devs

Necessary permissions:
   Page protection: admin access
   Blacklists: admin access
   Bots: no special rights needed (except for admin bots); bot gets a "bot flag"
   Semi-automated tools: rollback permission needed

People involved:
   Page protection: admins
   Blacklists: admins (anyone can report problems)
   Bots: bot operators
   Semi-automated tools: editors with rollback permission
   ORES: mostly Scoring Platform team

Hurdles to participate:
   Blacklists: understand regexes
   Bots: get approval from the BAG; programming knowledge, understand APIs, ...
   Semi-automated tools: get a rollback permission; get familiar with the tool
   ORES: understand ML

Concerns:
   Bots: "botophobia"
   Semi-automated tools: gamification
   ORES: general ML concerns: hard to understand

Areas of application:
   Page protection: problems with a single page
   Blacklists: abusive titles / link spam
   Bots: mostly obvious vandalism
   Semi-automated tools: less obvious cases that require human judgement

3. Methods

This chapter provides the theoretical background for the study of edit filters. I make use of trace ethnography, described in the following section, in the study of documentation and discussion archives conducted in chapter 4 in order to understand the role of edit filters in the quality control ecosystem of English Wikipedia. The emergent coding introduced in section 3.2, combined with trace ethnography, is employed in chapter 5 for determining what tasks edit filters take care of.

The whole work tries to adhere to the principles of open science and reproducible research. According to the definition of Bartling and Friesike provided in their book Opening Science [BF14], open science is primarily characterised, unsurprisingly, by its openness. There is an open communication of the methods and results in every stage of the research project, allowing, importantly, for an easier disclosure of negative results. The code for all data processing and computational analyses I have done, as well as other artefacts I have used or compiled, have been openly accessible in the project's repository since the beginning of the present research [Git19] and can be re-used under a free license. Anyone interested can follow the process and/or use the data or scripts to verify my computations or run their own and thus continue this research along one of the directions suggested in section 6.6 or in a completely new one.

3.1. Trace Ethnography

Trace ethnography constitutes the main theoretical framework for the analysis presented in chapters 4 and 5. The concept was coined by Geiger and Ribes in their 2010 paper “The work of sustaining order in Wikipedia: the banning of a vandal” [GR10] and introduced in detail in a 2011 article [GR11] by the same authors. They define trace ethnography as a methodology which “combines the richness of participant-observation with the wealth of data in logs so as to reconstruct patterns and practices of users in distributed sociotechnical systems”. It extends classic documentary ethnography, which can rely on any written artefact such as archive records, diaries, manuals and handbooks, correspondence, standards, protocols, or trading records, by the extensive use of logs and other records generated by digital environments in an automated manner. The method is supposedly especially useful for research in such distributed sociotechnical systems (such as Wikipedia), since there direct participant observation is often impractical and costly, and tends to miss phenomena which manifest themselves in the communication between spatially separated sites rather than in a single location.

In [GR10], the scholars use documents and document traces: MediaWiki revision data, more specifically the edit summary fields of single revisions and markers left programmatically within the edit summaries; documentation of semi-automated software tools; and even the tools (Huggle and Twinkle) themselves, to observe what traces these leave, in order to reconstruct quite exactly individual strands of action and comprehend how different agents on Wikipedia work together towards the blocking of a single malicious user. They refer to “turn[ing] thin documentary traces into “thick descriptions” of actors and events” and to “inverting traces” in order to reconstruct sequences of events and actions. What is more, these traces are used by Wikipedians themselves in order to do their work efficiently: For example, after seeing an entry on the Administrator Intervention against Vandalism noticeboard [Wik19f], an admin would probably look up the latest actions of a user, as well as check their user talk page for recent warnings, before deciding to block the user in question. Geiger and Ribes underline the importance of insider knowledge when reconstructing actions and processes based on the traces, the need for “an ethnographic understanding of the activities, people, systems, and technologies which contribute to their production”.

They alert that via trace ethnography only that which is recorded by the system can be observed, and that records are always incomplete. This consideration is elaborated on in more detail in [GH17], where Geiger and Halfaker make the point that “found data” generated by a system for a particular purpose (e.g. a revision history whose purpose is to keep track of who edited what and when, and possibly to revert (to) a particular revision) rarely fits ideally as a dataset to answer the particular research question of a scientist. The importance of interpreting data in their corresponding context and the pitfalls of tearing analysis out of context are also underlined by Charmaz in [Cha06]. She cites intersecting data from multiple sources/of different types as a possible remedy for this problem.

Last but not least, Geiger and Ribes [GR11] also warn of possible privacy breaches through thickening traces: Although the records they use to reconstruct paths of action are all open, the thick descriptions compiled can suddenly expose a lot of information about single users which never existed in this form before, and the individuals concerned never gave their informed consent for their data to be used this way.

3.2. Emergent Coding

In order to gain a detailed understanding of what edit filters are used for on English Wikipedia, in chapter 5 all filters are scrutinised and labeled via a technique called emergent coding. Coding is the process of labeling data in a systematic fashion in an attempt to comprehend it. It is about seeking patterns in data and later trying to understand these patterns and the relationships between them. Emergent coding is one possible approach for making sense

of data in content analysis [Ste01]. Its key characteristic is letting the codes emerge during the process, as opposed to starting with a set of preconceived codes (also known as “a priori codes”). Scholars regard this as useful because it reduces the danger of trying to press data into predefined categories while potentially overlooking other, better fitting codes [Cha06, p.17]. Instead, the codes stem directly from observations of the data.

Traditionally in content analysis, there are at least two researchers involved in an emergent coding process. During an initial examination of the data, they independently come up with preliminary codes,1 which are then compared and discussed until a consolidated code book is developed. Then, all researchers involved use this code book to, again independently, label the data. At the end, their labelings are compared and the reliability of the coding is verified. If the results don't reach a pre-defined agreement level, differences are discussed and the previous steps are repeated. It has to be mentioned here that for the present project only one coder was available: me. Therefore, unfortunately, the prescribed validation steps could not be realised. The exact form of the applied coding process and the resulting limitations are described in sections 5.2 and 6.5 respectively.

1I use the words “codes”, “labels”, “tags”, and “categories” interchangeably.

4. Edit Filters As Part of Wikipedia's Socio-Technical Infrastructure

“Abuse Filter is enabled” reads the title of one of the eight stories of the 23 March 2009 issue of English Wikipedia's community newspaper, The Signpost [Sig09]. “The extension allows all edits to be checked against automatic filters and heuristics, which can be set up to look for patterns of vandalism including page move vandalism and juvenile-type vandalism, as well as common newbie mistakes,” the article proclaims. The extension, or at least its end-user-facing parts, was later renamed to “edit filter” in order not to characterise false positives as “abuse” and thus alienate good faith editors striving to improve the encyclopedia [Wik19r], [Wik19af].

The aim of this chapter is to understand how edit filters work, who implements and runs them and, above all, how and why they were introduced in the first place and what the qualitative difference is between them and other algorithmic quality control mechanisms. The analysed data is presented in the following section. Section 4.2 defines what an edit filter is. The AbuseFilter MediaWiki extension is introduced in section 4.3. After this common understanding of the state of the art of edit filters has been established, section 4.4 looks back and traces the historical debate that led to the filters' introduction. Sections 4.5 and 4.6 take, respectively, an internal (edit filter managers') and an external (all other users') perspective towards the edit filters. The findings are then compared with the results from chapter 2 and discussed in section 4.7.

4.1. Data

The foundations for the present chapter lie in EN Wikipedia's policies and guidelines. The following pages were analysed in depth:

https://en.wikipedia.org/wiki/Wikipedia:Edit_filter
https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Documentation
https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Instructions
https://en.wikipedia.org/wiki/Wikipedia:Edit_filter_noticeboard
https://en.wikipedia.org/wiki/Wikipedia_talk:Edit_filter/Archive_1

4.2. Edit Filter Definition

Every edit filter defines a pattern1 against which every edit made to Wikipedia is checked. If there is a match, the edit in question is logged and, potentially, additional actions such as tagging the edit summary, issuing a warning or disallowing the edit are invoked. Both the patterns and the possible edit filter actions are investigated in greater detail in the following sections.

According to EN Wikipedia's own definition, an edit filter is “a tool that allows editors in the edit filter manager group to set controls mainly to address common patterns of harmful editing” [Wik19r]. A couple of keywords arouse interest here: Who is in the edit filter manager group and how did they become part of it? What controls exactly can be set? What does “mainly” mean, are there other patterns addressed? And what are the patterns of harmful editing addressed by the filters?

At least the “mainly” question is swiftly answered by the paragraph itself, since there is a footnote stating that “[e]dit filters can and have been used to track or tag certain non-harmful edits, for example addition of WikiLove” [Wik19r]. The controls that can be set are looked into in the sections that follow. The edit filter manager group and its members are discussed in section 4.5.2, and the patterns of harmful editing (as well as some further non-harmful edit patterns) are inspected in detail in the next chapter.

4.2.1. Example of a Filter

For illustration purposes, let us have a closer look at what a single edit filter looks like. Edit filter with ID 365 is public2 and currently enabled (as of 30 June 2019). This means the filter is working and everyone interested can view the filter's details. Its description reads “Unusual changes to featured or good content”. The filter pattern is:

page_namespace == 0 &
!("confirmed" in user_groups) &
old_size > 20000 &
(
    "#redirect" in lcase(added_lines) |
    edit_delta < -15000 |
    edit_delta > 15000
) &
old_wikitext rlike "\{\{([Ff]eatured|[Gg]ood)\s?article\}\}"

The currently configured filter action is: “disallow”.

1These patterns consist of one or more conditions, e.g. matching the edit's content against a regular expression or checking the usergroups of the contributing editor.
2There are also private (hidden) filters. The distinction is discussed in more detail in sections 4.4 and 5.3.2.


So, if a user whose account is not yet confirmed3 tries to edit a page in the article namespace whose wikitext contains a {{Featured article}} or {{Good article}} template, and they either insert a redirect, delete 3/4 of the content or add 3/4 on top, the edit is automatically disallowed. Note that an edit filter manager can easily change the action of the filter (or the pattern, as a matter of fact). The filter was last modified on 23 October 2018. All these details can be viewed on the filter's detail page [Wik19s] or on the screenshot thereof (figure 4.1) that I created for convenience. Further information displayed on the filter's detail page includes: the number of filter hits; some statistics (the average time the filter takes to check an edit, the percentage of hits, and how many conditions from the condition limit it consumes4); comments (left by edit filter managers, generally to log and explain changes); flags (“Hide details of this filter from public view”, “Enable this filter”, “Mark as deleted”); links to the last modification (with diff and the user who modified it), the edit filter's history, and a tool for exporting the filter to another wiki; and the actions to take when the filter's pattern matches.
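For readers unfamiliar with the AbuseFilter rule syntax, the following Python sketch re-expresses the logic of filter 365 shown above. The variable names mirror the filter's variables; the function is an illustrative translation, not code that is actually executed by the extension.

import re

FEATURED_OR_GOOD = re.compile(r"\{\{([Ff]eatured|[Gg]ood)\s?article\}\}")

def filter_365_matches(page_namespace, user_groups, old_size,
                       edit_delta, added_lines, old_wikitext):
    # Returns True if the edit would trip the filter (and thus be disallowed).
    # The substring test on user_groups mirrors the filter's "in" operator,
    # which also covers the "autoconfirmed" group.
    is_confirmed = any("confirmed" in group for group in user_groups)
    return (
        page_namespace == 0                     # article namespace
        and not is_confirmed                    # editor not (auto)confirmed
        and old_size > 20000                    # page was reasonably large
        and (
            "#redirect" in added_lines.lower()  # page turned into a redirect
            or edit_delta < -15000              # large removal
            or edit_delta > 15000               # large addition
        )
        and FEATURED_OR_GOOD.search(old_wikitext) is not None
    )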

4.3. The AbuseFilter5 MediaWiki Extension

In the end, from a technical perspective, Wikipedia's edit filters are a MediaWiki plugin that allows every edit (and some other editor's actions) to be checked against a specified pattern before it is published.

The extension introduces the following database tables where all data generated by it is stored: abuse_filter, abuse_filter_log, abuse_filter_action, and abuse_filter_history [Gar19b]. abuse_filter contains detailed information about every filter. abuse_filter_action stores the currently configured actions for each filter and their corresponding parameters. Every update of a filter's actions, pattern, comments or other flags (whether the filter is enabled, hidden, deleted), etc. is recorded in abuse_filter_history. And every time a filter matches, the editor's action that triggered it as well as further data such as the user who triggered the filter, their IP address, a diff of the edit (if it was an edit), a timestamp, the title of the page the user was looking at, etc. are logged in abuse_filter_log.

Most frequently, edit filters are triggered upon new edits; there are however

3A confirmed user can do following five things a non-confirmed user cannot: create pages; move pages; edit semi-protected pages; upload files; vote in certain elections (a different minimum can be required for certain elections). An account can be explicitly confirmed, most accounts are autoconfirmed though. Generally, accounts are autoconfirmed when they have been registered for at least four days and have made a minimum of ten edits [Wik19d]. The requirements are adjustable. 4According to various community comments, both of these numbers are not particularly reliable and should be treated with caution [Med19b]. 5Note that the user facing elements of this extension were renamed to “edit filter”, however the extension itself, as well as its corresponding permissions, database tables etc. still reflect the original name.

Figure 4.1.: Detailed page of edit filter #365

further editor's actions that can trip an edit filter. As of 30 June 2019, these include: edit, move, delete, createaccount, autocreateaccount, upload, stashupload.6 Historically, further editor's actions such as feedback, gatheredit and moodbar could trigger an edit filter. These are in the meantime deprecated.

When a filter's pattern is matched, beside logging the event in the abuse_filter_log table (the only filter action which cannot be switched off), a further filter action may be invoked as well. The plugin defines the following possible filter actions: tag, throttle, warn, blockautopromote, block, degroup, rangeblock, disallow.7 The documentation of the AbuseFilter extension provides comprehensive definitions of these [Med19a]:

• tag: The contribution is tagged with a specific tag (which can be defined and styled by the edit filter manager) which then appears on Recent Changes, contributions, logs, history pages, etc. and allows aggregations of lists for dashboards and similar.

• throttle: The filter is activated upon the tripping of a rate limit. Configurable parameters are the allowed number of actions, the period of time in which these actions must occur, and how those actions are grouped. Actions can be grouped by user, IP address, /16 IP range, creation date of the user account, page, site, the edit count of the user, or a combination thereof. (A simple example for throttling is something like “do this if page X is edited more than Y times in Z seconds”.)

• warn: A warning is displayed that the edit may not be appreciated. (The warning message is configurable by the edit filter manager.) The editor who tripped the filter is provided with the opportunity to revise their edit and re-submit it. A link to the false positives page [Wik19aa] is also provided.

• blockautopromote: The user whose action matched the filter’s pattern is banned from receiving extra groups from $wgAutopromote for a random period of 3 to 7 days.

• block: The user who triggered the filter is blocked indefinitely. An error message is displayed to inform the user of this action.

• degroup: The user whose action matched the filter’s pattern is removed from all privileged groups (sysop, bureaucrat, etc). An error message is displayed to inform them of this action.

6See line 181 in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/special/SpecialAbuseLog.
7See line 2808 in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/AbuseFilter.php

23 • rangeblock: The entire /16 IP range from which the filter was triggered is blocked for a week.

• disallow: An error message is shown to the editor informing them their edit was considered unconstructive and will not be saved. They are provided the opportunity to report a false positive.

rangeblock, block, and degroup have never been used on the EN Wikipedia, at least according to the logs. These severer actions were discussed controversially by the community before introducing the extension, and a lot of Wikipedians felt uncomfortable with a fully automated mechanism blocking users indefinitely or removing them from privileged groups [Wik19ac]; see also section 4.4. As far as I can tell, the functionality has been implemented but never activated (at least on the EN Wikipedia). The last time filter actions other than log, tag, warn or disallow were triggered on the EN Wikipedia was in 2012, and these were blockautopromote and aftv5flagabuse.8

Guidelines specifically call for careful use of disallow. Only severe cases which “substantially all good-faith editors would agree are undesirable”, or specific cases for which consensus has been reached, should be disallowed [Wik19r].

The following new user permissions are introduced by the AbuseFilter plugin:

• abusefilter-modify: “Modify abuse filters”

• abusefilter-view: “View abuse filters”

• abusefilter-log: “View the abuse log”

• abusefilter-log-detail: “View detailed abuse log entries”

• abusefilter-private: “View private data in the abuse log”

• abusefilter-modify-restricted: "Modify abuse filters with restricted actions"

• abusefilter-modify-global: “Create or modify global abuse filters”

• abusefilter-revert: “Revert all changes by a given abuse filter”

• abusefilter-view-private: “View abuse filters marked as private”

• abusefilter-log-private: "View log entries of abuse filters marked as private"

• abusefilter-hide-log: “Hide entries in the abuse log”

8 aftv5flagabuse is a deprecated action related to the now deprecated Article Feedback MediaWiki extension (or Article Feedback Tool, Version 5), whose purpose was to involve readers more actively in article quality assessment [Wik19i]. However, during the testing phase the majority of reader feedback was found not particularly helpful and hence the extension was discontinued.


• abusefilter-hidden-log: “View hidden abuse log entries”

• abusefilter-private-log: “View the AbuseFilter private details access log”

For additional reference, the format for the rules [Med19d], the general documentation of the extension [Med19c], as well as its source code [Gar19a] can be consulted.

4.4. History

Now that there is a general understanding of what edit filters look like today, let us take a step back and investigate how they came to be this way. In order to comprehend the consensus building on the functionality of the extension, I sifted through the archives of the Edit Filter talk page [Wik19ac] for the period between the announcement that the extension was planned and the voting process preceding its introduction. For a while at the beginning of the discussion, there was some confusion among editors regarding the intended functionality of the edit filters. Participants invoked various motivations for the introduction of the extension (which sometimes contradicted each other) and argued for or against the filters depending on these. The discussion reflects a mix of ideological and practical concerns. The biggest controversies lay along the lines of filters being public vs. private (hidden from public view) 9 and the actions the filters were to invoke upon a match. An automated rights revocation or a block of the offending editor with no manual confirmation by a real person was of particular concern to a lot of editors (they were worried that the filters would not be able to understand context, thus resulting in too many false positives and blocking many legitimate edits and editors). As far as I understood, these features were technically implemented but never really used on the English Wikipedia. As to the public vs. private debate, the initial plan was that all filters would be hidden from public view and only editors with special permissions (the edit filter managers) would be able to view and modify the patterns and consult the logs. The core developer of the extension reasoned that its primary purpose was to fend off really persistent vandals with reasonable technical understanding who were ready to invest time and effort to circumvent anti-vandal measures, and that it was therefore unwise to make circumvention easier for them by allowing them to view the pattern according to which their edits were suppressed. This was however met with serious resistance by the community, who felt that such a secret extension contradicted Wikipedia's values of openness and transparency. Besides, opponents of the filters being completely private were concerned that the tool was quite powerful and that hiding everything would prevent the community from monitoring for errors and abuse.

9 The terms "private" and "hidden" are used interchangeably for such filters throughout the thesis.

Related to the above discussion, there was also disagreement regarding who was to have access to the newly developed tool. Some felt access had to be granted more broadly in order for the tool to be effectively used. They reasoned that at least all administrators should be able to use it, since they already had the trust of the community. Others feared that the admin group was quite broad already and that access should be granted as carefully as possible, since the extension had the potential to cause quite a lot of damage if used maliciously and it was "not beyond some of our more dedicated trolls to 'work accounts up' to admins, and then use them for their own purpose" [Wik19ac]. This narrower option is how the right ended up being governed 10. Another debated point was what the difference to (admin) bots was and whether the extension was needed at all. Apparently, there was some discontent with bot governance mirrored in the arguments for introducing the extension. It was underlined that, in contrast to admin bots, the extension's source code was to be publicly available and well tested, with more people (i.e. the edit filter managers) using and monitoring its functionality than the (usually) single bot operator responsible for a bot, who, apparently, was oftentimes not responding to community concerns and emergencies fast enough (or at all). On the other hand, there were yet again arguments that the extension was indeed supposed to target the really malicious vandals not deterred by anti-vandalism measures already in force, preferably by blocking them on the spot. Others asked what additional advantages the extension offered compared to semi-protection of pages, which requires users to be autoconfirmed in order to be able to edit (normally meaning they have to have been registered for at least 4 days and have made at least 10 edits, but the restrictions are adjustable). Here, User:Werdna reasoned that the edit filters allow for fine-tuning of such restrictions and for targeting offending editors in particular, without making constraints unnecessarily strict for all users. Although there were some diverging opinions on what the extension was supposed to target, in a nutshell, the motivation for its introduction seems to have been as follows: Bots weren't reverting some kinds of vandalism fast enough, or, respectively, these vandalism edits required human intervention and took more than a single click to get reverted. These were mostly obvious but pervasive cases of vandalism (e.g. moving a lot of pages to some nonsensical name), possibly introduced in a (semi-)automated fashion, that took some time and effort to clean up. The motivation of the extension's developers was that if a filter just disallows such vandalism, vandal fighters could use their time more productively and check less obvious cases for which more background knowledge/context is needed in order to decide whether an edit is vandalism or not. According to the discussion archives, the contributions of the following accounts exemplify the types of edits the extension was supposed to target:
https://en.wikipedia.org/wiki/Special:Contributions/Omm_nom_nom_nom

10 Although motivated trolls can potentially work up an account to any user rights.

https://en.wikipedia.org/wiki/Special:Contributions/AV-THE-3RD
https://en.wikipedia.org/wiki/Special:Contributions/Fuzzmetlacker

4.5. Building a Filter: the Internal Perspective

4.5.1. How Is a New Filter Introduced?

Only edit filter managers have the permissions necessary to implement filters, but anybody can propose new ones. Every editor who notices some problematic behaviour they deem to need a filter can raise the issue at the Edit Filter Requested page [Wik19ay]. The request can then be approved and implemented by an edit filter manager (mostly after a discussion/clarification of the details). The Edit Filter Requested page asks users to go through the following checklist before requesting a filter:
• problems with a single page are not suitable for an edit filter, since filters are applied to all edits;
• filters, as they add up, make editing slower, so the usefulness of every single filter and condition has to be carefully considered;
• in-depth checks should be done by separate software that users run on their own machines;
• no trivial errors should be caught by filters (e.g. concerning style guidelines);
• there are the Titles Blacklist [Med19g] and the Link/Spam Blacklist [Med19f], which should be used if the issue at hand has to do with a problematic title or link.
For edit filter managers, the best-practice way of introducing a new filter is described on the Edit Filter Instructions page [Wik19al]. According to it, these steps should be followed:
1. read the documentation [Med19d],

2. test with the debugging tools: https://en.wikipedia.org/wiki/Special:AbuseFilter/tools (visible only for users who are already in the edit filter managers user group),
3. test with the batch testing interface (also available to edit filter managers only),
4. create a logging-only filter,
5. announce the filter at the edit filter noticeboard [Wik19y], so other edit filter managers can comment on it,

6. finally, fully enable the filter by adding an appropriate additional edit filter action.

According to the documentation, step 4 of the list above can be skipped in "urgent situations", and the corresponding filters can have more severe actions enabled directly. In such cases, the editor introducing the filter has to closely monitor the logs for potential filter misconduct. However, the guidelines do not elaborate on what exactly constitutes an "urgent situation". Unfortunately, investigating potential pitfalls of this provision is beyond the scope of the present work and is one of the directions for further studies suggested in section 6.6. Edit filter managers often introduce filters based on phenomena they have observed being caught by other filters, by other algorithmic quality control mechanisms, or through general experience. Like all newly implemented filters, these are initially enabled in logging-only mode until enough log entries are generated to evaluate whether the incident is severe and frequent enough to need a filter. It is not uncommon that the action(s) a particular filter triggers change over time. Sometimes, when a wave of particularly persistent vandalism arises, a filter is temporarily set to "warn" or "disallow", and the actions are removed again as soon as the filter is not tripped very frequently anymore. Such action changes, updates to an edit filter's pattern or warning template, as well as problems with a filter's behaviour are discussed on the Edit Filter Noticeboard [Wik19y]. Last but not least, performance seems to be fairly important for the edit filter system: on multiple occasions, there are notes on the recommended order of operations so that a filter evaluates in as resource-sparing a way as possible [Wik19al], or invitations to consider whether an edit filter is the most suitable mechanism for solving a particular issue at all [Wik19r], [Wik19ay]. To optimise performance, the edit filter system uses the so-called condition limit. According to the documentation [Wik19u], the condition limit is a hard-coded threshold on the total number of conditions that can be evaluated by all active filters per incoming edit. Currently, it is set to 1,000. The motivation for this heuristic is to avoid performance issues: every incoming edit is checked against all currently enabled filters, which means that the more filters are active, the longer the checks take. However, the page also warns that counting conditions is not the ideal metric of filter performance, since there are simple comparisons that take significantly less time than, for example, a check against the all_links variable (which needs to query the database) [Wik19u]. Nevertheless, the condition limit still seems to be the heuristic used for filter performance optimisation today.
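The following sketch, written in plain Python rather than in the extension's own rule language, is meant to illustrate why the recommended order of operations and the condition limit matter: conditions are evaluated left to right against a shared budget, so placing cheap checks first means that expensive ones are skipped for most edits. The predicates, the example edit and the budget handling are simplified assumptions, not MediaWiki's actual accounting.

    # Illustrative sketch of short-circuit evaluation under a condition budget;
    # a simplification, not the extension's real accounting logic.
    CONDITION_LIMIT = 1000   # the hard-coded threshold mentioned above
    conditions_used = 0

    def evaluate(conditions, edit):
        """Short-circuit AND over a list of predicates, cheapest first."""
        global conditions_used
        for predicate in conditions:
            if conditions_used >= CONDITION_LIMIT:
                return False           # budget exhausted: stop evaluating
            conditions_used += 1
            if not predicate(edit):
                return False           # cheap early exit: costly checks never run
        return True

    # Cheapest first: namespace check, then user state, then a text scan
    # (a stand-in for something expensive like matching against all_links).
    filter_conditions = [
        lambda e: e["namespace"] == 0,
        lambda e: not e["user_is_confirmed"],
        lambda e: "http://" in e["added_text"],
    ]

    edit = {"namespace": 2, "user_is_confirmed": True, "added_text": ""}
    print(evaluate(filter_conditions, edit), conditions_used)   # prints: False 1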

4.5.2. Who Can Edit Filters?

In order to be able to set up an edit filter on their own, an editor needs to have the abusefilter-modify permission (which makes them part of the edit filter manager group). According to [Wik19r], this right is given only to editors

who "have the required good judgment and technical proficiency". Further down on the page it is clarified that it is administrators who can assign the permission to users (also to themselves), and that they should only assign it to non-admins in exceptional cases, "to highly trusted users, when there is a clear and demonstrated need for it". If editors wish to be given this permission, they can hone and prove their skills by helping with requested edit filters and false positives [Wik19r]. The formal process for requesting the abusefilter-modify permission is to raise the request at the Edit Filter Noticeboard [Wik19y]. A discussion is held there, usually for 7 days, before a decision is reached [Wik19r] 11. As of 2017, when the "edit filter helper" group was introduced (editors in this group have the abusefilter-view-private permission) [Wik19w], the usual process seems to be that editors request this right first and only later the full abusefilter-modify permission 12. According to the edit filter managers list for the EN Wikipedia [Wik19x], as of 10 May 2019 there are 154 users in this group 13. Out of the 154 edit filter managers, only 11 are not administrators (most of them do have other privileged roles such as "rollbacker", "pending changes reviewer", "extended confirmed user" and similar, though). The edit filter managers group is quite stable, with only 4 users having become edit filter managers since November 2016 (according to the archives of the edit filter noticeboard where the permission is requested) [Wik19y]. Since the edit filter helper group was created in September 2017, only 11 users have been granted the corresponding permissions, and only one of them has subsequently been "promoted" to edit filter manager 14. Moreover, a number of the 154 edit filter managers on English Wikipedia have a kind of "not active at the moment" banner on their user page, which suggests that the edit filter managers group is aging. Some of the edit filter managers are also bot operators. The interesting patterns of collaboration between the two technologies are discussed in section 4.7.2.

11 According to the documentation, the Edit Filter Noticeboard is also the place to discuss potential permission withdrawals in cases of misuse where raising the issue directly with the editor concerned has not resolved the problem.
12 That is the tendency observed at the Edit Filter Noticeboard [Wik19y].
13 For comparison, as of 9 March 2019 there are 1181 admins [Wik19am]. The role does not exist at all on the German, Spanish and Russian Wikipedias, where all administrators have the abusefilter-modify permission [Wik19t], [Wik19v], [Wik19ab].
14 Interestingly, as of July 2019 there are 19 people in the edit filter helpers group, so apparently some of them have received the right although no records could be found on the noticeboard.

Figure 4.2.: Tagged edits are marked as such in a page's revision history

4.6. Filters during Runtime: the External Perspective

So what happens when an editor's action matches the pattern of an edit filter? Do they notice this at all? As described in section 4.3, a variety of different actions may occur when a filter's pattern matches. Of these, only tag, throttle, warn, and disallow seem to be used today (and log, which is always enabled). If a filter is set to warn or disallow, the editor is notified that they hit a filter by a warning message (see figure 4.4). These warnings describe the problem that occurred and present the editor with possible paths of action: complain on the False Positives page [Wik19aa] in case of disallow (the edit is not saved), or complain on the False Positives page 15 and publish the change anyway in case of warn. (Of course, in case of a warning, the editor can also modify their edit before publishing it.) Possible alternative paths of action an editor may wish to consult are also listed. On the other hand, when the filter action is set to tag or log only, the editor doesn't really notice they tripped a filter unless they are looking more closely. Tagged edits are marked as such in the page's revision history, for example (see figure 4.2), and all edits that trigger an edit filter are listed in the Abuse Log [Wik19e] (see figure 4.3).

15 Edit filter managers and other interested editors monitor the False Positives page and verify or disprove the reported incidents. Edit filter managers use actual false positives to improve the filters, give advice to good faith editors who tripped a filter and discourage authors of vandalism edits who reported these as false positives from continuing with their disruption.


Figure 4.3.: Abuse Log showing all filter matches by User:Schnuppi4223

4.7. Edit Filters’ Role in the Quality Control Ecosystem

The purpose of the present section is to review what has been learnt so far about edit filters and summarise how they fit into Wikipedia's quality control ecosystem. As the timeline in table 4.1 shows, the time span in which algorithmic quality control mechanisms (first vandal fighting bots and semi-automated tools, and later filters) were introduced coincides with the period after the exponential growth of Wikipedia took off in 2006 (compare figures 4.5 and 4.6). The surge in the numbers of editors and contributions implied a rapidly increasing workload for community members dedicated to quality assurance, which could not feasibly be handled manually anymore, and thus the community turned to technical solutions. As shown elsewhere [HGMR13], this shift had a lot of repercussions. One of the most severe was that newcomers' edits were reverted more strictly than before (accepted or rejected on a yes-no basis with the help of automated tools, instead of manually seeking to improve the contributions and "massage" them into an acceptable form), which in consequence drove a lot of them away.

4.7.1. Wikipedia's Algorithmic Quality Control Mechanisms in Comparison

As we can read from the timeline in table 4.1, edit filters were introduced at a moment when bots and semi-automated tools were already in place. Thus, the question arises: why were they implemented when these other mechanisms already existed? Here, the salient features of the different quality control mechanisms and the motivation for the filters' introduction are reviewed. A concise summary of this discussion is offered in table 4.2. Since edit filters are a fully automated mechanism, above all a comparison

Oct 2001: automatically import entries from Easton's Bible Dictionary by a script
29 Mar 2002: First version of https://en.wikipedia.org/wiki/Wikipedia:Vandalism (WP Vandalism is published)
Oct 2002: RamBot
2006: BAG was first formed
13 Mar 2006: 1st version of Bots/Requests for approval is published: some basic requirements (also valid today) are recorded
28 Jul 2006: VoABot II ("In the case were banned users continue to use sockpuppet accounts/IPs to add edits clearly rejected by consensus to the point were long term protection is required, VoABot may be programmed to watch those pages and revert those edits instead. Such edits are considered blacklisted. IP ranges can also be blacklisted. This is reserved only for special cases.")
21 Jan 2007: Twinkle page is first published (empty), filled with a basic description by the beginning of Feb 2007
24 Jul 2007: Request for Approval of the original ClueBot
16 Jan 2008: Huggle page is first published (empty)
18 Jan 2008: Huggle page is first filled with content
23 Jun 2008: 1st version of the Edit Filter page is published: User:Werdna announces they are currently developing the extension
2 Oct 2008: https://en.wikipedia.org/wiki/Wikipedia_talk:Edit_filter was first archived; its last topic was the voting for/against the extension, which seems to have ended at the end of Sep 2008
Jun 2010: STiki initial release
20 Oct 2010: ClueBot NG page is created
11 Jan 2015: 1st commit to the github ORES repository
30 Nov 2015: ORES paper is published

Table 4.1.: Timeline: Introduction of algorithmic quality control mechanisms


Figure 4.4.: Editor gets two warnings upon erasing an entire page: one for page blanking and another one for removing all categories from an article. The warnings list possible pages the editor may want to consult and actions they can take.

to bots seems obvious. The main argument for introducing the extension was the use cases it was supposed to take care of: the obvious, persistent vandalism (often automated itself) which was easy to recognise but more difficult to clean up. Filters were going to do the job more neatly than bots by reacting faster, since the extension was part of the core software, and since they are triggered before an edit is published, by not allowing abusive content to become public at all. By being able to disallow such malicious edits from the beginning, the extension was to reduce the workload of other mechanisms and free up resources for vandal fighters using semi-automated tools or monitoring pages manually, so that they could work on less obvious cases that required human judgement, reasoned proponents of the filters. The rest of the arguments for edit filters vs. bots touched on in the discussion prior to introducing the filters [Wik19ac] were more of an infrastructural/soft nature. The plugin's developers optimistically announced that it was going to be open source, the code well tested, with a framework for testing single filters before enabling them, and with edit filter managers being able to collaboratively develop and improve filters. They viewed this as an improvement compared to (admin) bots, which would be able to cover similar cases but whose code was mostly private, not tested at all, and with a single developer/operator taking care of them who was often not particularly responsive in emergency cases 16.

16 For the sake of completeness, it should be mentioned here that the most popular semi-automated anti-vandalism tools are also open source. Their focus, however, lies somewhat differently, since a final human decision is required, which is probably why they are not mentioned at all in this discussion. ORES is open source as well; it is a kind of meta tool that can be employed by the other mechanisms, though, which is why a direct comparison is also not completely feasible. Besides, it was introduced some 6-7 years after the edit filters, so obviously people were not discussing it at the time.

Figure 4.5.: EN Wikipedia: Number of editors over the years (source: https://stats.wikimedia.org/v2/)

Another apparent comparison is the one between edit filters and MediaWiki's page protection mechanism [Med19e]. As pointed out in section 2.1.3, page protection is reasonable when a rise in disruptive activity on a particular page occurs. Similarly to applying an edit filter aimed at the specific page, page protection would simply disallow edits to it from the start. The difference, however, is that edit filters can target a specific malicious user (or users) directly, without imposing restrictions on the vast majority of editors. Of all the mechanisms, it is probably hardest to become engaged with edit filters. As signaled in section 4.5.2, the permissions are only granted to very carefully selected editors who have a long history of participation on Wikipedia and mostly also various other special permissions. The numbers also demonstrate that this is the most exclusive group: as mentioned in section 4.5.2, there are currently 154 edit filter managers on EN Wikipedia, compared to at least 232 bot operators [Wik19l] (most likely not all bot operators are listed in the category [Wik19ah]) and 6130 users who have the rollback permission [Wik19az]. As to the difficulty/competences needed, it is probably easiest to learn to use semi-automated tools, where one "only" has to master the user interface of the software. Bots presumably require the most background knowledge, since one has to not only be familiar with a programming language


but also learn to interact with Wikipedia's API, etc. Filters, on the other hand, are arguably easier to use: here, "only" an understanding of regular expressions is required.

Figure 4.6.: EN Wikipedia: Number of edits over the years (source: https://stats.wikimedia.org/v2/)

As already summarised in chapter 2, critical voices worry about various aspects of the individual quality control mechanisms (see also table 4.2). Concerns about filters resemble somewhat the concerns expressed about bots: namely, the apprehension of a fully automated mechanism taking (potentially erroneous) decisions about excluding editors from participation. In consequence, community consensus on using filter actions such as rangeblock, block, and degroup never happened. According to the discussion archives [Wik19ac], others feared that edit filters were placing a lot of power in the hands of very few people.


Properties:
• Edit filters: rule-based; part of the "software" (a MediaWiki plugin); open source; public filters are directly visible for anyone interested; trigger before an edit is published; collaborative effort
• Page protection: per page; part of MediaWiki (which is open source); one cannot edit the protected page at all
• Blacklists: rule-based; part of MediaWiki (which is open source); trigger before an edit is published
• Bots: rule/ML based; run on their operator's own infrastructure ("bespoke code"); no requirement for the code to be public; latency varies (ClueBot NG needs only a couple of seconds); mostly a single developer/operator (recently: bot frameworks)
• Semi-automated tools: rule/ML based; extra infrastructure; the most popular ones are open source (but this is not a hard requirement); heuristics obfuscated by the interface; generally higher latency than bots; few developers
• ORES: ML framework; not used directly, but can be incorporated in other tools; open source; few developers

Necessary permissions:
• Edit filters: abusefilter-modify permission
• Page protection: admin access
• Blacklists: admin access
• Bots: no special rights needed (except for admin bots); a bot gets a "bot flag"
• Semi-automated tools: rollback permission needed

People involved:
• Edit filters: edit filter managers (on the EN Wikipedia)
• Page protection: admins
• Blacklists: admins (anyone can report problems)
• Bots: bot operators, who get approval from the BAG
• Semi-automated tools: editors with the rollback permission
• ORES: mostly the Scoring Platform team

Hurdles to participate:
• Edit filters: gain community trust to become an edit filter manager; understand regexes
• Blacklists: understand regexes
• Bots: programming knowledge; understand APIs, ...
• Semi-automated tools: get familiar with the tool
• ORES: understand ML

Concerns:
• Edit filters: automated agents blocking/desysoping human users; hidden filters lack transparency and accountability; censorship infrastructure
• Bots: "botophobia"
• Semi-automated tools: gamification; the interface makes some paths of action easier than others
• ORES: general ML concerns: hard to understand

Areas of application:
• Edit filters: persistent vandals with a known modus operandi and a history of circumventing prevention methods (obvious vandalism which takes time to clean up)
• Page protection: problems with a single page
• Blacklists: abusive titles / link spam
• Bots: mostly obvious vandalism
• Semi-automated tools: less obvious cases that require human judgement

Table 4.2.: Wikipedia's algorithmic quality control mechanisms in comparison

4.7.2. Collaboration of the Mechanisms

So far, the individual quality control mechanisms have been juxtaposed and compared separately. It is however worth mentioning that they not only operate alongside each other but also cooperate on occasions. Such collaborations are studied for instance by Geiger and Ribes [GR10], who go as far as describing them as "distributed cognition". They follow a particular series of abuse throughout Wikipedia, along the traces the disrupting editor and the quality control mechanisms deployed against their edits left. The researchers demonstrate how a bot (ClueBot) and several editors using the semi-automated tools Huggle and Twinkle all collaborated up until the malicious editor was banned by an administrator.
During the present study, I have also observed various cases of edit filters and bots mutually facilitating each other's work. DatBot, Mr.Z-bot and MusikBot are all examples of bots conducting support tasks for filters. DatBot [Wik19p] monitors the Abuse Log [Wik19e] and reports users tripping certain filters to WP:AIV (Administrator intervention against vandalism) [Wik19f] and WP:UAA (usernames for administrator attention) [Wik19be]. It is the successor of Mr.Z-bot [Wik19aq], which used to report users from the abuse log to WP:AIV, but has been inactive since 2016 and was therefore recently deactivated. MusikBot also has several tasks dedicated to monitoring different aspects of edit filter behaviour and compiling reports for anyone interested: The FilterMonitor task "[r]eports functional changes of edit filters to the watchable page User:MusikBot/FilterMonitor/Recent changes. The template {{recent filter changes}} formats this information and can be transcluded where desired" [Wik19as]. The StaleFilter task "[r]eports enabled filters that have not had any hits in over 30 days, as specified by /Offset" [Wik19at]. The AbuseFilterIRC task "[r]elays all edit filter hits to IRC channels and allows you to subscribe to notifications when specific filters are tripped. See #wikipedia-en-abuse-log-all for the English Wikipedia feed" [Wik19ar]. (A minimal sketch of such Abuse Log monitoring is given at the end of this subsection.)
On the other hand, there are also examples of filters supporting bot work: Filter 323 ("Undoing anti-vandalism bot") tags edits reverting revisions by XLinkBot and ClueBot NG. Although filter 603 is hidden, so no details can be viewed by an unauthorised user, it is named "Special case of reverting XLinkBot reverts", so it is probably safe to assume that it is filtering what it claims to be. And there are several filters (historically) configured to ignore particular bots: filter 76 ("Adding email address") exempting XLinkBot, filter 28 ("New user redirecting an existing substantial page or changing a redirect") exempting Anybot, and filter 532 ("Interwiki Addition") exempting Cydebot are some examples thereof. There are also filters configured to ignore all bots: filter 368 ("Making large changes when marking the edit as minor"), filter 702 ("Warning against clipboard hijacking"), and filter 122 ("Changing Username malformed requests").
On occasions, data from the Abuse Log is used for (semi-)protecting frequently disrupted pages.

And as discussed in chapter 2, ORES scores can be employed by bots or semi-automated tools as a heuristic to detect potentially harmful edits. Note that edit filters cannot use ORES, since the service computes scores according to different models for already published revisions.
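As an illustration of the kind of Abuse Log monitoring described above, the following sketch polls recent hits of a public filter via the MediaWiki API's abuselog list. It is not the actual code of DatBot, Mr.Z-bot or MusikBot; the chosen filter id and the selected properties are merely examples, and hits of hidden filters or private log details would not be returned to an unauthorised client.

    # Hedged sketch of polling the Abuse Log (Special:AbuseLog) via the API;
    # not the actual implementation of any of the bots mentioned above.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def recent_filter_hits(filter_id, limit=10):
        params = {
            "action": "query",
            "list": "abuselog",
            "aflfilter": filter_id,
            "afllimit": limit,
            "aflprop": "user|title|action|result|timestamp",
            "format": "json",
        }
        response = requests.get(API, params=params, timeout=30)
        response.raise_for_status()
        return response.json()["query"]["abuselog"]

    for hit in recent_filter_hits(61):      # 61 is just an example filter id
        print(hit["timestamp"], hit["user"], hit["title"], hit["result"])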

4.7.3. Conclusions

In short, this chapter studied edit filters' documentation and community discussions and worked out the salient characteristics of this mechanism. Moreover, the filters were compared to other quality control technologies on Wikipedia such as bots, semi-automated anti-vandalism tools and the machine learning framework ORES. Edit filters were considered in the context and time of their introduction, and it was concluded that the community implemented them as a means to fight obvious, particularly persistent, and cumbersome-to-remove vandalism by disallowing it on the spot. Other, "softer" arguments, such as dissatisfaction with bot development processes (poorly tested, non-responsive operators), seemed to encourage the introduction as well. It was found that the individual filters are implemented and maintained by edit filter managers, a special, highly restricted user group. Revising the quality control ecosystem diagram 2.1 introduced in chapter 2, filters can now be properly placed on it (see figure 4.7). It seems that the claims in the literature (see section 2.1.1) should be revised: in terms of temporality, it is not bots but edit filters that are the first mechanism to actively fend off a disruptive edit.

Figure 4.7.: Edit filters' role in the quality control ecosystem: They are the first mechanism to get active.

5. Descriptive Overview of Edit Filters on the English Wikipedia

After tracing the debate surrounding the introduction of edit filters and piecing together how they work and what their supposed purpose was in chapter 4, here I explore the edit filters currently existent on the English Wikipedia. I want to gather an understanding of what types of tasks these filters take over, in order to compare them to the declared aim of the filters, and, as far as feasible, trace how these tasks have evolved over time. The data upon which the analysis is based is described in section 5.1, and the methods used in chapter 3. The manual classification of EN Wikipedia's edit filters which I have undertaken in an attempt to understand what it is that they actually filter is presented in section 5.2. Section 5.3 studies characteristics of the edit filters in general, whereas their activity is analysed in section 5.4.

5.1. Data

A big part of the present analysis is based upon the abuse filter table from enwiki_p (the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof, which was downloaded on 6 January 2019 via Quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases 1. The complete dataset can be found in the repository for the present paper [Git19]. This table, along with abuse filter actions, abuse filter log, and abuse filter history, is created and used by the AbuseFilter MediaWiki extension ([Gar19b]), as discussed in section 4.3. Selected queries have been run via Quarry against the abuse filter log table as well. These are the foundation for the filter activity analysis undertaken in section 5.4. Unfortunately, the abuse filter history table, which would be necessary for a complete historical analysis of the edit filters, is currently not exposed to the public due to security/privacy concerns [Pla16b] 2. A comprehensive historical analysis is therefore one of the directions for future research discussed in section 6.6. A concise description of the tables has been offered in section 4.3, which discusses the AbuseFilter MediaWiki extension in more detail. For further reference, the schemas of all four tables can be viewed in figures .1, .2, .3 and .4 in the appendix.

1 https://quarry.wmflabs.org/
2 A patch was submitted to Wikimedia's operations repository where the replication scripts for all publicly exposed databases are hosted [Mes19]. It is in the process of review, so hopefully, historical filter research will be possible in the future.
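To give an impression of the kind of query that can be run on Quarry against the abuse filter log table, here is a hypothetical example, wrapped in Python only for consistency with the analysis scripts; it aggregates hits per filter for one year and is not necessarily one of the queries actually used for the activity analysis in section 5.4.

    # Hypothetical Quarry query against abuse_filter_log (not necessarily one of
    # the queries used in this thesis): number of hits per filter in 2018.
    QUERY = """
    SELECT afl_filter, COUNT(*) AS hits
    FROM abuse_filter_log
    WHERE afl_timestamp LIKE '2018%'
    GROUP BY afl_filter
    ORDER BY hits DESC
    LIMIT 20;
    """
    print(QUERY)   # paste into https://quarry.wmflabs.org/ and run against enwiki_p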

5.2. Types of Edit Filters: Manual Classification

In order to get a better understanding of what exactly it is that edit filters are filtering, I applied emergent coding (see section 3.2) to all filters, scrutinising their names, patterns, comments, and actions. Three big clusters of codes were identified, namely "vandalism", "good faith", and "maintenance", as well as the auxiliary cluster "unknown". These are discussed in more detail later in this section, but first the coding itself is presented.

5.2.1. Coding Process and Challenges

As already mentioned, I applied emergent coding to the dataset from the abuse filter table and let the labels originate directly from the data. I looked through the data paying special attention to the name of the filters (the "af_public_comments" field of the abuse filter table), the comments ("af_comments"), the pattern constituting the filter ("af_pattern"), and the designated filter actions ("af_actions"). The assigned codes emerged from the data: some of them are literal quotes of terms used in the description or comments of a filter, while others summarise the perceived filter functionality. In addition to that, for vandalism-related labels, I used some of the vandalism types elaborated by the community in [Wik19bh]. However, this typology was regarded more as an inspiration than adopted 1:1, since some of the types were quite general whereas more specific categories seemed to render more insights. For instance, I haven't applied the "addition of text" category, since it seemed more useful to have more specific labels such as "hoaxing" or "silly vandalism" (check the code book in appendix A for definitions). Moreover, I found some of the proposed types redundant. For example, "sneaky vandalism" seems to overlap partially with "hoaxing" and partially with "sockpuppetry", and for some reason, "personal attacks" are listed twice.
Based on the emergent coding method described in section 3.2, I labeled the dataset twice. I let potential labels emerge during the first round of coding. Then, I scrutinised them, merging labels that seemed redundant and letting the most descriptive code stay. At the same time, the codes were also sorted and unified into broader categories which seemed to relate the individual labels to each other. Thereby, a code book with the conclusive codes was defined (see appendix A). Subsequently, I labeled the whole dataset again using the code book. Unfortunately, the validation steps proposed by the method could not be realised, since no second researcher was available for the labeling. This is one of the limitations discussed in section 6.5, and something that can and should be remedied in future research.
The following challenges were encountered during the first round of labeling: There were some ambiguous cases which I either tagged with the code I deemed most appropriate and a question mark, or assigned all possible labels (or both). There were also cases for which I could not gather any insight relying on

the name, comments and pattern, since the filters were hidden from public view and the name was not descriptive enough. However, upon some further reflection, I think it is safe to assume that all hidden filters target a form of (more or less grave) vandalism, since the guidelines suggest that filters should not be hidden in the first place unless they deal with cases of persistent and specific vandalism where it could be expected that the vandalising editors will actively look for the filter pattern in their attempts to circumvent the filter [Wik19r]. Therefore, during the second round of labeling I tagged all hidden filters for which there weren't any more specific clues (for example in the name of the filter) as "hidden vandalism". And then again, there were also cases, not necessarily hidden, where I could not determine any suitable label, since I didn't understand the pattern, and/or none of the existing categories seemed to fit, and/or I couldn't think of an insightful new category to assign. During the first labeling, these were labeled "unknown", "unclear" or "not sure". For the second round, I unified all of them under "unclear".
For a number of filters, it was particularly difficult to determine whether they were targeting vandalism or good faith edits. The only thing that would have distinguished between the two would have been the contributing editor's motivation, which no one could have known (but the editor in question themself). During the first labeling session, I tended to label such filters with "vandalism?, good faith?". For the second labeling, I stuck to the "assume good faith" guideline [Wik19k] myself and only labeled as vandalism cases where good faith was definitely out of the question. One feature which guided me here was the filter action, which represents the judgement of the edit filter manager(s). Since communication is crucial when assuming good faith, all ambiguous cases which have a less "grave" filter action such as "tag" or "warn" (which seeks to give feedback and thereby influence a constructive contribution) received a "good faith" label. On the other hand, filters set to "disallow" were tagged as "vandalism" or a particular type thereof, since this filter action is a clear sign that at least the edit filter managers have decided that seeking a dialogue with the offending editor is no longer an option.
For the second round of labeling, I tagged the whole dataset again using the compiled code book (see appendix A) and assigned to every filter exactly one label, the one deemed most appropriate (although oftentimes alternative possibilities were listed as notes), without looking at the labels I assigned the first time around. I intended to compare the labels from both coding sessions and focus on more ambiguous cases, re-evaluating them using all available information (patterns, public comments, labels from both sessions, as well as any notes I made along the way). Unfortunately, time was scarce, so the analysis of the present section is based upon the second round of labeling. Comparing the codes from both labeling sessions and refining the coding, or, respectively, having another person label the data, should be done in the future. The datasets developed during both labeling sessions are available in the project's repository [Git19].

As signaled at the beginning of the section, the following four parent categories of codes were identified: "vandalism", "good faith", "maintenance", and "unknown". The subsections that follow discuss the salient properties of each of them.

5.2.2. Vandalism

The vast majority of edit filters on the EN Wikipedia could be said to target (different forms of) vandalism, i.e. maliciously intended disruptive editing (or other activity) [Wik19bg]. Some examples thereof are filters for juvenile types of vandalism (inserting swear or obscene words or nonsense sequences of characters into articles), for hoaxing (inserting obvious or less obvious false information into articles), for template vandalism (modifying a template in a disruptive way, which is quite severe, since templates are displayed on various pages), or for spam (inserting links to promotional content, often not related to the content being edited). All codes belonging to the vandalism category, together with a definition and examples, can be consulted in the code book attached in appendix A.
Some vandalism types seem to be more severe than others (e.g. sock puppetry 3 or persistent long-term vandals). It is mostly in these cases that the implemented filters are hidden. Labels referring to such types of vandalism form their own subcategory: "hardcore vandalism". It should be mentioned at this point that I also classified "harassment" and "personal attacks" as "hardcore vandalism", since these types of edits are highly harmful and often dealt with by hidden filters, although according to [Wik19bg] both behaviours constitute disruptive editing rather than vandalism and should generally be handled differently.

5.2.3. Good Faith

The second biggest category identified comprises filters targeting edits that are (mostly) disruptive but not necessarily made with bad intentions. The adopted name "good faith" is a term utilised by the Wikipedia community itself, most prominently in the guideline "assume good faith" [Wik19k]. Filters from this category are frequently aimed at unconstructive edits made by new editors who are not familiar with syntax, norms, or guidelines, which results in broken syntax, disregard of established processes (e.g. deleting something without running it through an Articles for Deletion process) or norms (e.g. copyright violations), or unencyclopedic edits (e.g. without sources/with improper sources; badly styled; or with a skewed point of view). The focus of these filters lies in the communication with the disrupting editors: a lot of the filters issue warnings intending to guide the editors towards

3 Sock puppetry denotes the creation and employment of several accounts for various purposes such as pushing a point of view, or circumventing bans. For more information, see the code book in appendix A.

ways of modifying their contribution to become a constructive one (compare with section 4.6). Codes from this category often take into consideration the area the editor was intending to contribute to, or, respectively, the area they (presumably) unintentionally disrupted.

5.2.4. Maintenance

Some of the encountered edit filters on the EN Wikipedia were targeting neither vandalism nor good faith edits. Rather, they had their focus on (semi-)automated routine (clean-up) tasks. These filters form the "maintenance" category. Some of them target, for instance, bugs like broken syntax caused by a faulty browser extension. There are also some which simply track particular behaviours (such as mobile edits or edits made by unflagged bots) for various purposes. The "maintenance" category differs conceptually from the "vandalism" and "good faith" ones insofar as the logic behind it isn't the editors' intention, but rather "side" occurrences that mostly went wrong. I have also grouped here various test filters (used by individual editors or jointly used by all editors).

5.2.5. Unknown

This is an auxiliary category comprising the "unknown" and "misc" codes, used to code all filters whose functionality stayed completely opaque to the observer, or where, although it was comprehensible what the filter was doing, no better fitting label emerged.

5.3. Filter Characteristics

This section explores some general features of the edit filters on the English Wikipedia based on the data from the abuse filter table. The scripts that generate the statistics discussed here can be found in the Jupyter notebook in the project's repository [Git19].

5.3.1. General Traits

As of 6 January 2019 there are 954 filters in the abuse filter table. It should be noted that if a filter gets deleted, merely a flag is set to indicate so; no entries are removed from the database. So the above-mentioned 954 filters are all filters ever created up to this date. This doesn't mean that what the individual filters do never changes: edit filter managers can freely modify filter patterns, so at one point a filter could be doing one thing and the next moment it could be filtering a completely different phenomenon. There are cases of filters being "re-purposed" or modified to filter, for example, a more

Figure 5.1.: There are 954 edit filters on EN Wikipedia: roughly 21% of them are active, 16% are disabled, and 63% are deleted

general occurrence. This doesn't happen very often though. Mostly, if a filter is not useful anymore, it is just disabled and eventually deleted, and new filters are implemented for current problems.
361 of all filters are public, the remaining 593 are hidden. Of the public ones, 110 are active, 35 are disabled but not marked as deleted, and 216 are flagged as deleted. Out of the 593 hidden filters, 91 are active, 118 are disabled (not deleted), and 384 are deleted. The relative proportion of these groups to each other can be viewed in figure 5.1.
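The breakdown above can be reproduced with a few lines of pandas over the snapshot of the abuse filter table. The sketch below assumes the snapshot has been exported from Quarry as a CSV file (the file name is invented) and relies on the af_enabled, af_deleted and af_hidden columns.

    # Sketch of the counting behind these figures; the CSV file name is invented.
    import pandas as pd

    filters = pd.read_csv("abuse_filter_snapshot_2019-01-06.csv")

    filters["state"] = "disabled"
    filters.loc[filters["af_enabled"] == 1, "state"] = "active"
    filters.loc[filters["af_deleted"] == 1, "state"] = "deleted"   # deletion wins

    print(len(filters))                                    # total number of filters
    print(filters.groupby(["af_hidden", "state"]).size())  # public vs. hidden breakdown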

5.3.2. Public and Hidden Filters

As signaled in section 4.4, historically it was planned to make all edit filters hidden from the general public. The community discussions rebutted that, so a guideline was drafted calling for hiding filters "only where necessary, such as in long-term abuse cases where the targeted user(s) could review a public filter and use that knowledge to circumvent it" [Wik19r]. This is however not always complied with, and edit filter managers do end up hiding filters that target general vandalism despite consensus that these should be public [Wik19ax]. Such cases are usually made public eventually (examples hereof are filters 225 "Vandalism in all caps", 260 "Common vandal phrases", or 12 "Replacing a page with obscenities"). Also, oftentimes when a hidden filter is marked as "deleted", it is made public. Further, caution in filter naming is suggested for hidden filters, and editors are encouraged to give such filters just a simple description of the overall disruptive

behaviour rather than naming a specific user that is causing the disruptions. (The latter is not always complied with; there are indeed filters named after the accounts causing a disruption.) Still, it is striking that currently nearly 2/3 of all edit filters are not viewable by the general public (compare figure 5.1). Unfortunately, without the full abuse filter history table there is no way to know how this ratio has developed historically. However, the numbers fit the assertion of the extension's core developer, according to whom edit filters target particularly determined vandals (filters aimed at whom are, as a general rule, hidden in order to make circumvention more difficult). On the other hand, if we look at the enabled filters only, there are actually more or less the same number of public enabled and hidden enabled filters (110 vs. 91). This leads to the hypothesis that hidden filters rather have higher fluctuation rates, i.e. that they target specific phenomena that subside after a particular period of time, after which the filters get disabled and eventually deleted. This again makes sense when compared to the hidden vs. public filter policy: hidden filters for particular cases and very determined vandals, public filters for general, more timeless patterns.

5.3.3. Filter Actions

Another interesting parameter observed here is the currently configured filter actions for each filter. Figure 5.2 depicts the actions set up for all enabled filters, and figures 5.3 and 5.4 show the actions of all enabled public and hidden filters respectively. It is noticeable that the most common action for the enabled hidden filters is "disallow", whereas most enabled public filters are set to "tag" or "tag,warn