Edit Filters on English Wikipedia
Total Page:16
File Type:pdf, Size:1020Kb
Master Thesis at the Computer Science Department of the Freie Universit¨at Berlin Human-Centered Computing (HCC) You shall not publish: Edit filters on English Wikipedia Lyudmila Vaseva Supervisor and first examiner: Prof. Dr. Claudia Mu¨ller-Birn Second Examiner: Prof. Dr. Lutz Prechelt Berlin, 25.07.2019 Eidesstattliche Erkl¨arung Ich versichere hiermit an Eides Statt, dass diese Arbeit von niemand an- derem als meiner Person verfasst worden ist. Alle verwendeten Hilfsmittel wie Berichte, Bucher,¨ Internetseiten oder ¨ahnliches sind im Literaturverzeich- nis angegeben, Zitate aus fremden Arbeiten sind als solche kenntlich gemacht. Die Arbeit wurde bisher in gleicher oder ¨ahnlicher Form keiner anderen Pru-¨ fungskommission vorgelegt und auch nicht ver¨offentlicht. Berlin, den September 12, 2019 <Name> Abstract The present thesis offers an initial investigation of a previously unexplored by scientific research quality control mechanism of Wikipedia|edit filters. It is analysed how edit filters fit in the quality control system of English Wikipedia, why they were introduced, and what tasks they take over. Moreover, it is discussed why rule based systems like these seem to be still popular today, when more advanced machine learning methods are available. The findings indicate that edit filters were implemented to take care of obvious but persistent types of vandalism, disallowing these from the start so that (human) resources can be used more efficiently elsewhere (i.e. for judging less obvious cases). In addition to disallowing such vandalism, edit filters appear to be applied in ambiguous situations where an edit is disruptive but the motivation of the editor is not clear. In such cases, the filters take an \assume good faith" approach and seek via warning messages to guide the disrupting editor towards transforming their contribution to a constructive one. There are also a smaller number of filters taking care of haphazard maintenance tasks|above all tracking a certain bug or other behaviour for further investigation. Since the current work is just a first exploration into edit filters, at the end, a comprehensive list of open questions for future research is compiled. Zusammenfassung Die vorliegende Arbeit bietet eine erste Untersuchung eines bisher von der Wis- senschaft unerforschten Qualit¨atskontrollmechanismus' von Wikipedia: Bear- beitungsfilter (\edit filters" auf Englisch). Es wird analysiert, wie sich Bear- beitungsfilter in das Qualit¨atssicherungssystem der englischsprachigen Wikipedia einfugen,¨ warum sie eingefuhrt¨ wurden und welche Aufgaben sie ubernehmen.¨ Daruberhinaus¨ wird diskutiert, warum regelbasierte Systeme wie dieses noch heute beliebt sind, wenn fortgeschrittenere Machine Lerning Methoden verfug-¨ bar sind. Die Ergebnisse deuten darauf hin, dass Bearbeitungsfilter implemen- tiert wurden, um sich um offensichtliche, aber hartn¨ackige Sorten von Vandal- ismus zu kummern.¨ Die Motivation der Wikipedia-Community war, dass wenn solcher Vandalismus von vornherein verboten wird, (Personal-)Ressourcen an anderen Stellen effizienter genutzt werden k¨onnen (z.B. zur Beurteilung weniger offensichtlicher F¨alle). Außerdem scheinen Bearbeitungsfilter in uneindeutigen Situationen angewendet zu werden, in denen eine Bearbeitung zwar st¨orend ist, die Motivation der editierenden Person allerdings nicht klar als boshaft iden- tifiziert werden kann. In solchen F¨allen verinnerlichen die Filter Wikipedias \Geh von guten Absichten aus" Richtlinie und versuchen uber¨ Warnmeldun- gen einen konstruktiven Beitrag anzuleiten. Es gibt auch eine kleinere Anzahl von Filtern, die sich um vereinzelte Wartungsaufgaben kummern.¨ Hierunter fallen die Versuche, einen bestimmten Bug nachzuvollziehen oder ein anderes Verhalten zu verfolgen, um es dann weiter untersuchen zu k¨onnen. Da die ak- tuelle Arbeit nur ein erster Einblick in Wikipedias Bearbeitungsfilter darstellt, wird am Ende eine umfassendere Liste mit offenen Fragen fur¨ die zukunftige¨ Erforschung des Mechanismus' erarbeitet. Contents 1. Introduction1 1.1. Subject and Context........................2 1.2. Contributions............................2 1.3. Structure..............................3 2. Quality-Control Mechanisms on Wikipedia5 2.1. Automated.............................6 2.1.1. Bots.............................6 2.1.2. ORES............................8 2.1.3. Page Protection, TitleBlacklist, SpamBlacklist......8 2.2. Semi-Automated..........................9 2.3. Manual................................ 10 2.4. Conclusion.............................. 11 3. Methods 15 3.1. Trace Ethnography......................... 15 3.2. Emergent Coding.......................... 16 4. Edit Filters As Part of Wikipedia's Socio-Technical Infrastructure 19 4.1. Data................................. 19 4.2. Edit Filter Definition........................ 20 4.2.1. Example of a Filter..................... 20 4.3. The AbuseFilter MediaWiki Extension.............. 21 4.4. History................................ 25 4.5. Building a Filter: the Internal Perspective............ 27 4.5.1. How Is a New Filter Introduced?............. 27 4.5.2. Who Can Edit Filters?................... 28 4.6. Filters during Runtime: the External Perspective........ 30 4.7. Edit Filters' Role in the Quality Control Ecosystem....... 31 4.7.1. Wikipedia's Algorithmic Quality Control Mechanisms in Comparison......................... 31 4.7.2. Collaboration of the Mechanisms............. 39 4.7.3. Conclusions......................... 40 5. Descriptive Overview of Edit Filters on the English Wikipedia 43 5.1. Data................................. 43 5.2. Types of Edit Filters: Manual Classification........... 44 5.2.1. Coding Process and Challenges.............. 44 5.2.2. Vandalism.......................... 46 i 5.2.3. Good Faith......................... 46 5.2.4. Maintenance......................... 47 5.2.5. Unknown.......................... 47 5.3. Filter Characteristics........................ 47 5.3.1. General Traits........................ 47 5.3.2. Public and Hidden Filters................. 48 5.3.3. Filter Actions........................ 49 5.3.4. What Do Filters Target.................. 49 5.3.5. Who Trips Filters...................... 54 5.4. Filter Activity............................ 55 5.4.1. Filter Hits per Month.................... 55 5.4.2. Most Active Filters Over the Years............ 59 5.5. Conclusions............................. 60 6. Discussion 67 6.1. Q1 What is the role of edit filters among existing quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?.............................. 67 6.2. Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist? 69 6.3. Q3: Which type of tasks do filters take over?........... 70 6.4. Q4: How have these tasks evolved over time (are they changes in the type, number, etc.)?..................... 71 6.5. Limitations............................. 71 6.5.1. Limitations of the Data.................. 72 6.5.2. Limitations in the Research Process............ 72 6.6. Directions for Future Studies.................... 73 7. Conclusion 75 References 78 Appendices 91 A. Code Book.............................. 91 B. Extra Figures and Tables...................... 96 List of Figures 2.1. State of the scientific literature: edit filters are missing from the quality control ecosystem...................... 12 4.1. Detailed page of edit filter #365.................. 22 4.2. Tagged edits are marked as such in a page's revision history.. 30 4.3. Abuse Log showing all filter matches by User:Schnuppi4223... 31 4.4. Editor gets two warnings upon erasing an entire page...... 33 4.5. EN Wikipedia: Number of editors over the years......... 34 4.6. EN Wikipedia: Number of edits over the years.......... 35 4.7. Edit filters' role in the quality control ecosystem......... 41 5.1. Overview of active, disabled and deleted filters on EN Wikipedia 48 5.2. EN Wikipedia edit filters: Filters actions for all filters...... 50 5.3. EN Wikipedia edit filters: Filters actions for enabled public filters 50 5.4. EN Wikipedia edit filters: Filters actions for enabled hidden filters 51 5.5. Manual tags parent categories distribution: all filters...... 52 5.6. Manual tags parent categories distribution: enabled filters (Jan- uary 2019).............................. 52 5.7. Edit filters manual tags distribution................ 53 5.8. EN Wikipedia edit filters: Hits per month............ 56 5.9. EN Wikipedia edit filters: Hits per month according to manual tags................................. 56 5.10. EN Wikipedia: Reverts for July 2001{April 2017......... 56 5.11. EN Wikipedia edit filters: Hits per month according to filter action................................ 59 5.12. EN Wikipedia edit filters: Hits per month according to trigger- ing editor's action.......................... 60 .1. abuse filter schema......................... 97 .2. abuse filter log schema....................... 98 .3. abuse filter history schema..................... 99 .4. abuse filter action schema..................... 100 iii List of Tables 2.1. Wikipedia's algorithmic quality control mechanisms in comparison 14 4.1. Timeline: Introduction of algorithmic quality control mechanisms 32 4.2. Wikipedia's algorithmic quality control mechanisms in comparison 38 5.1. Filters aimed at unconfirmed users................ 54 5.2. What do most active filters do?.................. 61 5.3. 10 most active filters in 2009.................... 62 5.4. 10 most active filters in 2010...................