The Use of Software Tools and Autonomous Bots Against Vandalism: Eroding Wikipedia's Moral Order?


Ethics Inf Technol
DOI 10.1007/s10676-015-9366-9

ORIGINAL PAPER

Paul B. de Laat¹
[email protected]
¹ University of Groningen, Groningen, The Netherlands

© The Author(s) 2015. This article is published with open access at Springerlink.com

Abstract: English-language Wikipedia is constantly being plagued by vandalistic contributions on a massive scale. In order to fight them its volunteer contributors deploy an array of software tools and autonomous bots. After an analysis of their functioning and the 'coactivity' in use between humans and bots, this research 'discloses' the moral issues that emerge from the combined patrolling by humans and bots. Administrators provide the stronger tools only to trusted users, thereby creating a new hierarchical layer. Further, surveillance exhibits several troubling features: questionable profiling practices (concerning anonymous users in particular), the use of the controversial measure of reputation (under consideration), 'oversurveillance' where quantity trumps quality, and a prospective loss of the required moral skills whenever bots take over from humans. The most troubling aspect, though, is that Wikipedia has become a Janus-faced institution. One face is the basic platform of MediaWiki software, transparent to all. Its other face is the anti-vandalism system, which, in contrast, is opaque to the average user, in particular as a result of the algorithms and neural networks in use. Finally it is argued that this secrecy impedes a much needed discussion from unfolding; a discussion that should focus on a 'rebalancing' of the anti-vandalism system and the development of more ethical information practices towards contributors.

Keywords: Bots · Disclosive ethics · Profiling · Surveillance · Vandalism · Wikipedia

Introduction

Communities that thrive on the contributions from their respective 'crowds' have been with us for several decades now. The contents involved may be source code, text, or pictures; the products created may be software, news, reference works, videos, maps, and the like. As argued before (de Laat 2012b, using Dutton 2008), the basic parameters of such open content communities are twofold: on the one hand their technological web design, which may range from sharing (1.0), co-contributing (2.0), to co-creation (3.0); on the other hand their conditions of admission to the collaborative work process, which may be fully open or more restricted (say, only for experts). Some telling examples are Linux, Slashdot, Reddit, NowPublic, Wikipedia, and YouTube.

Open channels of communication invite all kinds of contributions to the collective process. Inevitably, they also solicit contents that are disruptive and damaging to the goals of the community: off-topic, inappropriate, improper, offensive, malicious content and the like. Obviously, this issue is most urgent in those open-content communities that focus on co-contribution or co-creation without any restrictions on entry. Most of such 'open collaboration' projects (a term coined by Forte and Lampe 2013), therefore, have had no choice but to develop systems that detect improper contributions and subsequently eliminate them in appropriate ways. In short, anti-intrusion systems have been introduced. Several varieties can be discerned.¹

A default solution is that professional editors of the site police contributions for improper content, before and/or after publication. Most social news sites and citizen journals are a case in point. As a rule, though, the incoming flow of content is simply too large to be handled in this way alone; therefore the crowds are called to the rescue. Such help may take various forms. First, contributors can be asked to scout for improper content as well. Any purported transgressions are to be reported to the managing editors, who will deal with them. If necessary, additional workers are hired for the moderation of reported contents. Facebook and YouTube reportedly outsource this kind of moderation globally to small companies that all together employ thousands of lowly-paid workers. Many of the call centres involved, for example, are located in the Philippines (Chen 2014). Secondly, contributors can be invited to vote on new content: usually a plus or a minus. It was, of course, Digg that pioneered this scheme ('digging'). As a result of this voting, contributions rise to public prominence (or visibility) more quickly, or fade away more quickly. Quality gets sorted out. Both reporting and voting are common practice now in social news sites and citizen journals. Thirdly, contributors can be invited to change contributions that are deemed inappropriate. Usually, of course, this means deleting the content in question. This is common practice in communities that operate according to a wiki format that allows contributors unrestricted write-access. Cases in point are Citizendium, Wikinews, and Wikipedia. Finally, contributors can be invited to spot improper contributions—but only those who have proved themselves trustworthy are allowed to act accordingly. This hierarchical option is common practice in communities of open-source software: after identifying them as genuine bugs, only developers (with write-access) may correct such edits by eliminating or fixing them in the main code tree.

In order to present a unified framework in the above I have been talking indiscriminately about improper and/or disruptive content. Such neutral language of course conceals a lot of variety between communities concerning what is to be counted as such. Moreover, impropriety and disruption only exist in the eyes of the particular beholder. These observations immediately raise many associated questions. What are the definitions of impropriety in use? Who effectively decides on what is to count as such? Is it the crowds that have been solicited or only a select few? If the latter, how did the select few obtain their position of power: by appointment from above or by some democratic procedure? Ultimately the crucial question is: do such decisions on what counts as proper content bring us any closer to the particular 'truth' being sought?

Obviously, the answers are bound to vary depending on which type of community is under scrutiny. Let me just state here a few, select generalities. As concerns the kind of content being pursued, some communities are after an exchange of opinions about topical issues (social news, citizen journals), others are on a quest for the 'facts' about certain fields of interest (encyclopedias) or for source code that meets certain technical criteria (open-source software); their criteria for propriety obviously differ widely. Correspondingly, coming closer to the 'truth' is much harder for the latter types of community than for the former. Further, the size of the community involved matters: the larger it becomes, the more tendencies towards stratification and differentiation of power are likely to manifest themselves. Moreover, a multitude of elites may be forming, each with their own agendas.

Keeping these questions in mind, let us return to the phenomenon of disruptive contributions and focus on their scale. It can safely be asserted that in the largest of all open-content encyclopedias, Wikipedia, disruption has reached gigantic proportions. For the English-language version of the encyclopedia in particular, estimates hover around 9000 malicious edits a day. A few reasons behind this vandalism on a large scale can easily be identified. On the one hand, the Wikipedia corpus involved has assumed large proportions (almost five million entries), making it a very attractive target for visitors of bad intent; on the other, Wikipedia allows co-creation (3.0) with full write-access for all, a collaborative tool that is very susceptible to disruptive actions by mala fide visitors.

From the beginning of Wikipedia, in 2001, several approaches to curb vandalism have been tried. Some of them have been accepted and endured; others have been discarded along the way. In the following, the various tools and approaches are spelled out, with a particular emphasis on the ones that are still being used. It is shown that within Wikipedia a whole collection of anti-vandalism tools has been developed. These instruments are being unfolded in a massive campaign of collective monitoring of edits. This vigilance is exercised continuously, on a 24/7 basis. In the process, human Wikipedians are being mobilized as patrollers, from administrators at the top to ordinary users at the bottom; in addition, several autonomous bots are doing their part. In a short detour, this system of surveillance is analysed from the viewpoint of robotic ethics, and shown to conform to the recent trend of 'coactivity' between humans and bots.

In the central part of this article, subsequently, the moral questions that emerge from this mixed human/bot surveillance are discussed. The analysis is to be seen as an exercise in 'disclosive ethics' (Brey 2000): uncovering features of the anti-vandalism system in place that are largely opaque and appear to be morally neutral—but, in actual fact, are not so. Based on a careful reading of Wikipedian pages that relate to vandalism fighting and a period of participant observation as a practising patroller, I draw the following conclusions. As for power, next to administrators, strong anti-vandalism tools are shown to be distributed only to trusted users, thus creating dominance for a new hierarchical layer. As far as surveillance is concerned, filtering of edits becomes questionable when it focuses on editor characteristics like anonymity or reputation.

… namespace only (the entries themselves). Further, users may compose a list of specific entries they are interested in (their own personal 'watchlist'), and then select all recent …

¹ The next two paragraphs mostly follow de Laat (2012a).
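The patrolling described above rests, at bottom, on scoring incoming edits for suspicious features before a human or bot decides to revert. As a minimal illustrative sketch only (the patterns, weights, and threshold below are invented for illustration and do not reproduce the logic of any actual Wikipedia tool), a rule-based filter of the kind early anti-vandalism bots employed might look like this:

```python
import re

# Hypothetical signals, invented for illustration; real tools use far
# larger pattern lists and, more recently, learned models.
OBSCENITIES = re.compile(r"\b(poop|stupid)\b", re.IGNORECASE)
SHOUTING = re.compile(r"\b[A-Z]{10,}\b")  # long all-caps runs (case-sensitive)

def score_edit(added_text, is_anonymous=False, blanks_page=False):
    """Return a vandalism score in [0, 1]; higher means more suspicious."""
    score = 0.0
    if OBSCENITIES.search(added_text):
        score += 0.4
    if SHOUTING.search(added_text):
        score += 0.3
    if blanks_page:       # wholesale blanking is a classic vandal signature
        score += 0.4
    if is_anonymous:      # profiling on anonymity, a practice the article questions
        score += 0.2
    return min(score, 1.0)

def should_revert(added_text, **signals):
    """Flag the edit for (automatic) reversion above an arbitrary threshold."""
    return score_edit(added_text, **signals) >= 0.5
```

Note that `is_anonymous` scores the editor rather than the edit: precisely the kind of characteristic-based filtering whose moral standing the article goes on to examine.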