Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers Abstract
THE YALE LAW JOURNAL FORUM O CTOBER 9 , 2017 Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers abstract. The problem of fake news impacts a massive online ecosystem of individuals and organizations creating, sharing, and disseminating content around the world. One effective ap- proach to addressing false information lies in monitoring such information through an active, engaged volunteer community. Wikipedia, as one of the largest online volunteer contributor communities, presents one example of this approach. This Essay argues that the existing legal framework protecting intermediary companies in the United States empowers the Wikipedia community to ensure that information is accurate and well-sourced. The Essay further argues that current legal efforts to weaken these protections, in response to the “fake news” problem, are likely to create perverse incentives that will harm volunteer engagement and confuse the public. Finally, the Essay offers suggestions for other intermediaries beyond Wikipedia to help monitor their content through user community engagement. introduction Wikipedia is well-known as a free online encyclopedia that covers nearly any topic, including both the popular and the incredibly obscure. It is also an encyclopedia that anyone can edit, an example of one of the largest crowd- sourced, user-generated content websites in the world. This user-generated model is supported by the Wikimedia Foundation, which relies on the robust intermediary liability immunity framework of U.S. law to allow the volunteer editor community to work independently. Volunteer engagement on Wikipedia provides an effective framework for combating fake news and false infor- mation. 358 wikipedia and intermediary immunity: supporting sturdy crowd systems for producing reliable information It is perhaps surprising that a project open to public editing could be highly reliable. -
Initiativen Roger Cloes Infobrief
Wissenschaftliche Dienste Deutscher Bundestag Infobrief Entwicklung und Bedeutung der im Internet ehrenamtlich eingestell- ten Wissensangebote insbesondere im Hinblick auf die Wiki- Initiativen Roger Cloes WD 10 - 3010 - 074/11 Wissenschaftliche Dienste Infobrief Seite 2 WD 10 - 3010 - 074/11 Entwicklung und Bedeutung der ehrenamtlich im Internet eingestellten Wissensangebote insbe- sondere im Hinblick auf die Wiki-Initiativen Verfasser: Dr. Roger Cloes / Tim Moritz Hector (Praktikant) Aktenzeichen: WD 10 - 3010 - 074/11 Abschluss der Arbeit: 1. Juli 2011 Fachbereich: WD 10: Kultur, Medien und Sport Ausarbeitungen und andere Informationsangebote der Wissenschaftlichen Dienste geben nicht die Auffassung des Deutschen Bundestages, eines seiner Organe oder der Bundestagsverwaltung wieder. Vielmehr liegen sie in der fachlichen Verantwortung der Verfasserinnen und Verfasser sowie der Fachbereichsleitung. Der Deutsche Bundestag behält sich die Rechte der Veröffentlichung und Verbreitung vor. Beides bedarf der Zustimmung der Leitung der Abteilung W, Platz der Republik 1, 11011 Berlin. Wissenschaftliche Dienste Infobrief Seite 3 WD 10 - 3010 - 074/11 Zusammenfassung Ehrenamtlich ins Internet eingestelltes Wissen spielt in Zeiten des sogenannten „Mitmachweb“ eine zunehmend herausgehobene Rolle. Vor allem Wikis und insbesondere Wikipedia sind ein nicht mehr wegzudenkendes Element des Internets. Keine anderen vergleichbaren Wissensange- bote im Internet, auch wenn sie zum freien Abruf eingestellt sind und mit Steuern, Gebühren, Werbeeinnahmen finanziert oder als Gratisproben im Rahmen von Geschäftsmodellen verschenkt werden, erreichen die Zugriffszahlen von Wikipedia. Das ist ein Phänomen, das in seiner Dimension vor dem Hintergrund der urheberrechtlichen Dis- kussion und der Begründung von staatlichem Schutz als Voraussetzung für die Schaffung von geistigen Gütern kaum Beachtung findet. Relativ niedrige Verbreitungskosten im Internet und geringe oder keine Erfordernisse an Kapitalinvestitionen begünstigen diese Entwicklung. -
Community Or Social Movement? Piotr Konieczny
Wikipedia: Community or social movement? Piotr Konieczny To cite this version: Piotr Konieczny. Wikipedia: Community or social movement?. Interface: a journal for and about social movements, 2009. hal-01580966 HAL Id: hal-01580966 https://hal.archives-ouvertes.fr/hal-01580966 Submitted on 4 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Interface: a journal for and about social movements Article Volume 1 (2): 212 - 232 (November 2009) Konieczny, Wikipedia Wikipedia: community or social movement? Piotr Konieczny Abstract In recent years a new realm for study of political and sociological phenomena has appeared, the Internet, contributing to major changes in our societies during its relatively brief existence. Within cyberspace, organizations whose existence is increasingly tied to this virtual world are of interest to social scientists. This study will analyze the community of one of the largest online organizations, Wikipedia, the free encyclopedia with millions of volunteer members. Wikipedia was never meant to be a community, yet it most certainly has become one. This study asks whether it is something even more –whether it is an expression of online activism, and whether it can be seen as a social movement organization, related to one or more of the Internet-centered social movements industries (in particular, the free and open-source software movement industry). -
Reliability of Wikipedia 1 Reliability of Wikipedia
Reliability of Wikipedia 1 Reliability of Wikipedia The reliability of Wikipedia (primarily of the English language version), compared to other encyclopedias and more specialized sources, is assessed in many ways, including statistically, through comparative review, analysis of the historical patterns, and strengths and weaknesses inherent in the editing process unique to Wikipedia. [1] Because Wikipedia is open to anonymous and collaborative editing, assessments of its Vandalism of a Wikipedia article reliability usually include examinations of how quickly false or misleading information is removed. An early study conducted by IBM researchers in 2003—two years following Wikipedia's establishment—found that "vandalism is usually repaired extremely quickly — so quickly that most users will never see its effects"[2] and concluded that Wikipedia had "surprisingly effective self-healing capabilities".[3] A 2007 peer-reviewed study stated that "42% of damage is repaired almost immediately... Nonetheless, there are still hundreds of millions of damaged views."[4] Several studies have been done to assess the reliability of Wikipedia. A notable early study in the journal Nature suggested that in 2005, Wikipedia scientific articles came close to the level of accuracy in Encyclopædia Britannica and had a similar rate of "serious errors".[5] This study was disputed by Encyclopædia Britannica.[6] By 2010 reviewers in medical and scientific fields such as toxicology, cancer research and drug information reviewing Wikipedia against professional and peer-reviewed sources found that Wikipedia's depth and coverage were of a very high standard, often comparable in coverage to physician databases and considerably better than well known reputable national media outlets. -
Towards Content-Driven Reputation for Collaborative Code Repositories
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science 8-28-2012 Towards Content-Driven Reputation for Collaborative Code Repositories Andrew G. West University of Pennsylvania, [email protected] Insup Lee University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/cis_papers Part of the Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, and the Software Engineering Commons Recommended Citation Andrew G. West and Insup Lee, "Towards Content-Driven Reputation for Collaborative Code Repositories", 8th International Symposium on Wikis and Open Collaboration (WikiSym '12) , 13.1-13.4. August 2012. http://dx.doi.org/10.1145/2462932.2462950 Eighth International Symposium on Wikis and Open Collaboration, Linz, Austria, August 2012. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_papers/750 For more information, please contact [email protected]. Towards Content-Driven Reputation for Collaborative Code Repositories Abstract As evidenced by SourceForge and GitHub, code repositories now integrate Web 2.0 functionality that enables global participation with minimal barriers-to-entry. To prevent detrimental contributions enabled by crowdsourcing, reputation is one proposed solution. Fortunately this is an issue that has been addressed in analogous version control systems such as the *wiki* for natural language content. The WikiTrust algorithm ("content-driven reputation"), while developed and evaluated in wiki environments operates under a possibly shared collaborative assumption: actions that "survive" subsequent edits are reflective of good authorship. In this paper we examine WikiTrust's ability to measure author quality in collaborative code development. We first define a mapping omfr repositories to wiki environments and use it to evaluate a production SVN repository with 92,000 updates. -
Checkuser and Editing Patterns
CheckUser and Editing Patterns Balancing privacy and accountability on Wikimedia projects Wikimania 2008, Alexandria, July 18, 2008 HaeB [[de:Benutzer:HaeB]], [[en:User:HaeB]] Please don't take photos during this talk. What are sockpuppets? ● Wikipedia and sister projects rely on being open: Anyone can edit, anyone can get an account. ● No ID control when registering (not even email address required) ● Many legitimate uses for multiple accounts ● “Sockpuppet” often implies deceptive intention, think ventriloquist What is the problem with sockpuppets? ● Ballot stuffing (some decision processes rely on voting, such as request for adminships and WMF board elections) ● “Dr Jekyll/Mr Hyde”: Carry out evil or controversial actions with a sockpuppet, such that the main account remains in good standing. E.g. trolling (actions intended to provoke adversive reactions and disrupt the community), or strawman accounts (putting the adversarial position in bad light) ● Artificial majorities in content disputes (useful if the wiki's culture values majority consensus over references and arguments), especially circumventing “three-revert-rule” ● Ban evasion What is the problem with sockpuppets? (cont'd) ● Newbies get bitten as a result of the possibility of ban evasion: ● Friedman and Resnick (The Social Cost of Cheap Pseudonyms, Journal of Economics and Management Strategy 2001): Proved in a game-theoretic model that the possibility of creating sockpuppets leads to distrust and discrimination against newcomers (if the community is otherwise successfully -
The Case of Wikipedia Jansn
UvA-DARE (Digital Academic Repository) Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system Niederer, S.; van Dijck, J. DOI 10.1177/1461444810365297 Publication date 2010 Document Version Submitted manuscript Published in New Media & Society Link to publication Citation for published version (APA): Niederer, S., & van Dijck, J. (2010). Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society, 12(8), 1368-1387. https://doi.org/10.1177/1461444810365297 General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:27 Sep 2021 Full Title: Wisdom of the Crowd or Technicity of Content? Wikipedia as a socio-technical system Authors: Sabine Niederer and José van Dijck Sabine Niederer, University of Amsterdam, Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands [email protected] José van Dijck, University of Amsterdam, Spuistraat 210, 1012 VT Amsterdam, The Netherlands [email protected] Authors’ Biographies Sabine Niederer is PhD candidate in Media Studies at the University of Amsterdam, and member of the Digital Methods Initiative, Amsterdam. -
Wikipedia's Labor Squeeze and Its Consequences Eric Goldman Santa Clara University School of Law, [email protected]
Santa Clara Law Santa Clara Law Digital Commons Faculty Publications Faculty Scholarship 1-1-2010 Wikipedia's Labor Squeeze and Its Consequences Eric Goldman Santa Clara University School of Law, [email protected] Follow this and additional works at: http://digitalcommons.law.scu.edu/facpubs Part of the Internet Law Commons Recommended Citation 8 J. on Telecomm. & High Tech. L. 157 (2010) This Article is brought to you for free and open access by the Faculty Scholarship at Santa Clara Law Digital Commons. It has been accepted for inclusion in Faculty Publications by an authorized administrator of Santa Clara Law Digital Commons. For more information, please contact [email protected]. WIKIPEDIA'S LABOR SQUEEZE AND ITS CONSEQUENCES ERIC GOLDMAN* INT RO D U CTIO N ................................................................................... 158 I. MEASURING WIKIPEDIA'S SUCCESS ....................................... 159 II. THREATS TO WIKIPEDIA ......................................................... 161 III. WIKIPEDIA'S RESPONSE TO THE VANDAL AND SPAMMER T H REA T S ................................................................................... 164 A. Increased TechnologicalBarriers to Participation.................... 164 B. Increased Social Barriersto Participation............................... 167 IV. WIKIPEDIA'S LOOMING LABOR SUPPLY PROBLEMS ............. 170 A . E ditor Turnover ................................................................... 170 B. Wikipedia's Limited Toolkit to Attract New Editors.............. -
A Cultural and Political Economy of Web 2.0
A CULTURAL AND POLITICAL ECONOMY OF WEB 2.0 by Robert W. Gehl A Dissertation Submitted to the Graduate Faculty of George Mason University in Partial Fulfillment of The Requirements for the Degree of Doctor of Philosophy Cultural Studies Committee: Director Program Director Dean, College of Humanities and Social Sciences Date: Spring Semester 2010 George Mason University Fairfax, VA A Cultural and Political Economy of Web 2.0 A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University. By Robert W. Gehl Master of Arts Western Michigan University, 2003 Director: Hugh Gusterson, Professor Department of Sociology and Anthropology Spring Semester 2010 George Mason University Fairfax, VA Copyright © 2010 Robert W. Gehl Creative Commons Attribution-Noncommercial-Share Alike 3.0 All trademarks, service marks, logos and company names mentioned in this work are the property of their respective owners. ii Dedication This dissertation is dedicated to one TJ, Teddy, who kindly slept through most of it and danced through the rest, and another TJ, Jesse, who works so hard for our little family. It is also dedicated to my parents, who filled my childhood house with books, computers, and love. Thank you. iii Acknowledgments Singlehandedly authoring a dissertation on Web 2.0 – a phenomenon which can at its best be about multiauthor collaboration in creative projects – is an ironic act. While I might claim authorship of this dissertation, it really is the result of many people working together. Specifically, my wonderful dissertation committee. The chair, Hugh Gusterson, came to GMU at just the right time to set me off on new paths of inquiry. -
Editing Wikipedia: a Guide to Improving Content on the Online Encyclopedia
wikipedia globe vector [no layers] Editing Wikipedia: A guide to improving content on the online encyclopedia Wikimedia Foundation 1 Imagine a world in which every single human wikipedia globebeing vector [no layers] can freely share in the sum of all knowledge. That’s our commitment. This is the vision for Wikipedia and the other Wikimedia projects, which volunteers from around the world have been building since 2001. Bringing together the sum of all human knowledge requires the knowledge of many humans — including yours! What you can learn Shortcuts This guide will walk you through Want to see up-to-date statistics about how to contribute to Wikipedia, so Wikipedia? Type WP:STATS into the the knowledge you have can be freely search bar as pictured here. shared with others. You will find: • What Wikipedia is and how it works • How to navigate Wikipedia The text WP:STATS is what’s known • How you can contribute to on Wikipedia as a shortcut. You can Wikipedia and why you should type shortcuts like this into the search • Important rules that keep Wikipedia bar to pull up specific pages. reliable In this brochure, we designate shortcuts • How to edit Wikipedia with the as | shortcut WP:STATS . VisualEditor and using wiki markup • A step-by-step guide to adding content • Etiquette for interacting with other contributors 2 What is Wikipedia? Wikipedia — the free encyclopedia that anyone can edit — is one of the largest collaborative projects in history. With millions of articles and in hundreds of languages, Wikipedia is read by hundreds of millions of people on a regular basis. -
Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach
Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach Koen Smets and Bart Goethals and Brigitte Verdonk Department of Mathematics and Computer Science University of Antwerp, Antwerp, Belgium {koen.smets,bart.goethals,brigitte.verdonk}@ua.ac.be Abstract by users on a blacklist. Since the end of 2006 some vandal bots, computer programs designed to detect and revert van- Since the end of 2006 several autonomous bots are, or have dalism have seen the light on Wikipedia. Nowadays the most been, running on Wikipedia to keep the encyclopedia free from vandalism and other damaging edits. These expert sys- prominent of them are ClueBot and VoABot II. These tools tems, however, are far from optimal and should be improved are built around the same primitives that are included in Van- to relieve the human editors from the burden of manually dal Fighter. They use lists of regular expressions and consult reverting such edits. We investigate the possibility of using databases with blocked users or IP addresses to keep legit- machine learning techniques to build an autonomous system imate edits apart from vandalism. The major drawback of capable to distinguish vandalism from legitimate edits. We these approaches is the fact that these bots utilize static lists highlight the results of a small but important step in this di- of obscenities and ‘grammar’ rules which are hard to main- rection by applying commonly known machine learning al- tain and easy to deceive. As we will show, they only detect gorithms using a straightforward feature representation. De- 30% of the committed vandalism. So there is certainly need spite the promising results, this study reveals that elemen- for improvement. -
Why Wikipedia Is So Successful?
Term Paper E Business Technologies Why Wikipedia is so successful? Master in Business Consulting Winter Semester 2010 Submitted to Prof.Dr. Eduard Heindl Prepared by Suresh Balajee Madanakannan Matriculation number-235416 Fakultät Wirtschaftsinformatik Hochschule Furtwangen University I ACKNOWLEDGEMENT This is to claim that all the content in this article are from the author Suresh Balajee Madanakannan. The resources can found in the reference list at the end of each page. All the ideas and state in the article are from the author himself with none plagiary and the author owns the copyright of this article. Suresh Balajee Madanakannan II Contents 1. Introduction .............................................................................................................................................. 1 1.1 About Wikipedia ................................................................................................................................. 1 1.2 Wikipedia servers and architecture .................................................................................................... 5 2. Factors that led Wikipedia to be successful ............................................................................................ 7 2.1 User factors ......................................................................................................................................... 7 2.2 Knowledge factors .............................................................................................................................. 8