Automatically Labeling Low Quality Content on Wikipedia by Leveraging Patterns in Editing Behavior
Recommended publications
Wikipedia: An Effective Anarchy Dariusz Jemielniak, Ph.D.
Wikipedia: An Effective Anarchy. Dariusz Jemielniak, Ph.D., Kozminski University. Paper presented at the Society for Applied Anthropology conference in Baltimore, MD (USA), 27-31 March, 2012 (work in progress). This paper is the first report from a virtual ethnographic study (Hine, 2000; Kozinets, 2010) of the Wikipedia community conducted 2006-2012, using participative methods and relying on a narrative analysis of Wikipedia organization (Czarniawska, 2000; Boje, 2001; Jemielniak & Kostera, 2010). It serves as a general introduction to the Wikipedia community, and is also a basis for a discussion of a book in progress, which is going to address the topic. Contrary to a common misconception, Wikipedia was not the first “wiki” in the world. “Wiki” (from the Hawaiian word for “quick” or “fast”, and named after the “Wiki Wiki Shuttle” at Honolulu International Airport) is a website technology based on a philosophy of tracking changes added by the users, with a simplified markup language (allowing easy additions of, e.g., bold, italics, or tables, without the need to learn full HTML syntax). It was originally created and made public in 1995 by Ward Cunningham, as WikiWikiWeb. WikiWikiWeb was an attractive choice among enterprises and was used for communication, collaborative idea development, documentation, intranets, knowledge management, etc. It was growing steadily in popularity when Jimmy “Jimbo” Wales, then the CEO of Bomis Inc., started up his encyclopedic project in 2000: Nupedia. Nupedia was meant to be an online encyclopedia, with free content, and written by experts. In an attempt to meet the standards set by professional encyclopedias, the creators of Nupedia based it on a peer-review process, and not on wiki-type software.
Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers Abstract
THE YALE LAW JOURNAL FORUM, OCTOBER 9, 2017. Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information. Jacob Rogers. Abstract. The problem of fake news impacts a massive online ecosystem of individuals and organizations creating, sharing, and disseminating content around the world. One effective approach to addressing false information lies in monitoring such information through an active, engaged volunteer community. Wikipedia, as one of the largest online volunteer contributor communities, presents one example of this approach. This Essay argues that the existing legal framework protecting intermediary companies in the United States empowers the Wikipedia community to ensure that information is accurate and well-sourced. The Essay further argues that current legal efforts to weaken these protections, in response to the “fake news” problem, are likely to create perverse incentives that will harm volunteer engagement and confuse the public. Finally, the Essay offers suggestions for other intermediaries beyond Wikipedia to help monitor their content through user community engagement. Introduction. Wikipedia is well-known as a free online encyclopedia that covers nearly any topic, including both the popular and the incredibly obscure. It is also an encyclopedia that anyone can edit, an example of one of the largest crowd-sourced, user-generated content websites in the world. This user-generated model is supported by the Wikimedia Foundation, which relies on the robust intermediary liability immunity framework of U.S. law to allow the volunteer editor community to work independently. Volunteer engagement on Wikipedia provides an effective framework for combating fake news and false information. It is perhaps surprising that a project open to public editing could be highly reliable.
'Anyone Can Edit', Not Everyone Does: Wikipedia and the Gender
Heather Ford and Judy Wajcman. ‘Anyone can edit’, not everyone does: Wikipedia and the gender gap. Article (Accepted version) (Refereed). Original citation: Ford, Heather and Wajcman, Judy (2017) ‘Anyone can edit’, not everyone does: Wikipedia and the gender gap. Social Studies of Science, 47 (4). pp. 511-527. ISSN 0306-3127 DOI: 10.1177/0306312717692172 © 2017 The Authors. This version available at: http://eprints.lse.ac.uk/68675/ Available in LSE Research Online: September 2017. LSE has developed LSE Research Online so that users may access research output of the School. Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Users may download and/or print one copy of any article(s) in LSE Research Online to facilitate their private study or for non-commercial research. You may not engage in further distribution of the material or use it for any profit-making activities or any commercial gain. You may freely distribute the URL (http://eprints.lse.ac.uk) of the LSE Research Online website. This document is the author’s final accepted version of the journal article. There may be differences between this version and the published version. You are advised to consult the publisher’s version if you wish to cite from it. ‘Anyone can edit’, not everyone does: Wikipedia’s infrastructure and the gender gap. Heather Ford, School of Media and Communication, University of Leeds, UK. Judy Wajcman, Department of Sociology, London School of Economics, UK. Abstract. Feminist STS has long established that science’s provenance as a male domain continues to define what counts as knowledge and expertise.
Wikipedia Citations: a Comprehensive Data Set of Citations with Identifiers Extracted from English Wikipedia
RESEARCH ARTICLE. Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia. Harshdeep Singh (1), Robert West (1), and Giovanni Colavizza (2). (1) Data Science Laboratory, EPFL; (2) Institute for Logic, Language and Computation, University of Amsterdam. An open access journal. Keywords: citations, data, data set, Wikipedia. Citation: Singh, H., West, R., & Colavizza, G. (2020). Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia. Quantitative Science Studies, 2(1), 1–19. https://doi.org/10.1162/qss_a_00105 Received: 14 July 2020. Accepted: 23 November 2020. Corresponding Author: Giovanni Colavizza. ABSTRACT. Wikipedia’s content is based on reliable and published sources. To this date, relatively little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive data set of citations extracted from Wikipedia. We extracted 29.3 million citations from 6.1 million English Wikipedia articles as of May 2020, and classified them as books, journal articles, or Web content. We were thus able to extract 4.0 million citations to scholarly publications with known identifiers (including DOI, PMC, PMID, and ISBN) and further equip an extra 261 thousand citations with DOIs from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI, and that Wikipedia cites just 2% of all articles with a DOI currently indexed in the Web of Science.
Community Or Social Movement? Piotr Konieczny
Wikipedia: Community or social movement? Piotr Konieczny. To cite this version: Piotr Konieczny. Wikipedia: Community or social movement?. Interface: a journal for and about social movements, 2009. hal-01580966. HAL Id: hal-01580966 https://hal.archives-ouvertes.fr/hal-01580966 Submitted on 4 Sep 2017. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Interface: a journal for and about social movements. Article. Volume 1 (2): 212 - 232 (November 2009). Wikipedia: community or social movement? Piotr Konieczny. Abstract. In recent years a new realm for study of political and sociological phenomena has appeared, the Internet, contributing to major changes in our societies during its relatively brief existence. Within cyberspace, organizations whose existence is increasingly tied to this virtual world are of interest to social scientists. This study will analyze the community of one of the largest online organizations, Wikipedia, the free encyclopedia with millions of volunteer members. Wikipedia was never meant to be a community, yet it most certainly has become one. This study asks whether it is something even more: whether it is an expression of online activism, and whether it can be seen as a social movement organization, related to one or more of the Internet-centered social movements industries (in particular, the free and open-source software movement industry).
Wikipedia, Myths and Realities
Wikipedia, Myths and Realities. David Monniaux, Wikimédia France, 28 January 2011. What is Wikipedia? http://www.wikipedia.org/ A website; presenting a collection of encyclopedic articles; editable by anyone (via an Internet connection); with no editorial board; in multiple languages: http://fr.wikipedia.org/ for French, http://en.wikipedia.org/ for English. Outline: Legal aspects; Editorial aspects; In practice; Articles to have... or not; The Wikipedia danger; The CIA and the Vatican manipulate Wikipedia; Google+Wikipedia leads young people astray; Copy-and-paste culture; Wikipedia, strongest on popular culture; One truth?; Conclusion. Legal aspects: Hosting. Initially, the project ran on a few machines hosted at Bomis, Jimmy Wales's company. Wikipedia is now a major site: Comscore, December 2010: 12th site in the USA; Comscore, 2010: 5th site in the USA (after Google, Microsoft, Yahoo!, Facebook and ahead of AOL, eBay, Ask, Amazon...); Médiamétrie, November 2010: 6th site in France (after Google, Facebook, Microsoft, Orange, Youtube and ahead of Free, Yahoo!, Pages Jaunes...). By far the leading non-commercial site, and the leading cultural and educational site. Legal aspects: High-bandwidth hosting. Up to 90,000 HTTP requests per second. Wikipedia outages are reported in the press! This requires: solid hosting, sufficient hardware..
Genre Analysis of Online Encyclopedias. the Case of Wikipedia
Anna Tereszkiewicz. Genre analysis of online encyclopedias: The case of Wikipedia. Wydawnictwo Uniwersytetu Jagiellońskiego. Publication co-financed by the Faculty of Philology of the Jagiellonian University from the faculty's reserve for in-house research, and by the Institute of English Studies. Cover design: Bartłomiej Drosdziok. Cover photo: Łukasz Stawarski. © Copyright by Anna Tereszkiewicz & Wydawnictwo Uniwersytetu Jagiellońskiego. First edition, Kraków 2010. All rights reserved. Neither this book nor any part of it may be reprinted without the written consent of the Publisher. Requests for permission to reprint should be addressed to Wydawnictwo Uniwersytetu Jagiellońskiego. ISBN 978-83-233-2813-1 www.wuj.pl Wydawnictwo Uniwersytetu Jagiellońskiego. Editorial office: ul. Michałowskiego 9/2, 31-126 Kraków, tel. 12-631-18-81, 12-631-18-82, fax 12-631-18-83. Distribution: tel. 12-631-01-97, tel./fax 12-631-01-98, mobile 0506-006-674. Bank account: PEKAO SA, no. 80 1240 4722 1111 0000 4856 3325. Table of Contents: Acknowledgements; Introduction; Materials and Methods; 1. Genology as a study
Citations Needed: Build Your Wikipedia Skills While Building the World’S Encyclopedia
A companion guide to deepen your learning during the WebJunction webinar on January 10, 2018, at 3:00 pm EST. Citations Needed: Build Your Wikipedia Skills While Building the World’s Encyclopedia. A glimpse into the inner workings of English Wikipedia for information professionals. The Five Pillars of Wikipedia: 1. Wikipedia is an encyclopedia. 2. It is written from a Neutral Point of View (NPOV). 3. It’s free content that anyone can use, edit, and distribute. 4. Editors should treat each other with respect and civility. 5. Wikipedia has no firm rules. What are the ways in which the five pillars of Wikipedia align with the mission of libraries? Learn about what U.S. public library staff are doing with Wikipedia in the WebJunction series Librarians Who Wikipedia. List two (or more) insights you’ve gained about how Wikipedia editing works, such as the color-coded peer assessments that are shown in the chart below. 1. 2. How does learning about Wikipedia’s inner workings help you evaluate the quality of articles? Wikipedia’s articles are in a constant state of development; learn more about quality assessments made by other editors. OCLC Wikipedia + Libraries: Better Together. About the #1lib1ref campaign (and how you and your library can participate). What is the #1lib1ref campaign? The Wikipedia Library’s annual #1lib1ref (“One Librarian, One Reference”) global campaign. How can you participate? It’s easy! Follow the steps on pages three and four to insert a reference as a footnote citation. How can your library participate? Plan a #1lib1ref event for your library, Wikipedia is better with
The Missing Wikipedia Ads
The missing Wikipedia ads: Designing targeted acquisition campaigns. Dario Taraborelli, Wikimedia Foundation. Wikimania 2014, London, 9 August 2014. Q: How to use gaps and biases in Wikipedia to engage new and more diverse contributors? A: Adsense for Wikipedia: targeted acquisition/contribution campaigns. Overview. Rationale (and debunking a few myths...): 1. scaling outreach campaigns; 2. turning gaps into hooks; 3. targeted outreach. Proposal: 1. applications; 2. infrastructure needed. Outreach campaigns work. [Chart: monthly active editors by project (Commons, Wikidata), annotated with Wiki Loves * and Wiki Loves Pride.] Q1: If outreach campaigns work, how do we make them cheap to programmatically run at scale? No shortage of work. On the English Wikipedia only: 2.5M articles assessed as stubs; 20K articles need cleanup; hundreds of missing articles sought by at least 1K readers every week. Eric Fischer: "A sidewalk is not just some hunk of concrete. It is something that somebody made. It humanizes the city." Q2: If there is a large backlog of work to do, how can we make it programmatically accessible? Targeted outreach. [Diagram: reader, registered user, first-time contributor (acquire first, activate later); reader, first-time contributor (activate first).] 30-day new editor activation by referral. Q3: How do we programmatically reach out to subject matter experts who are likely to become future Wikimedians? Q1: If outreach campaigns work, how do we make them cheap to run at scale? Q2: If there’s a large backlog of work to do, how can we make it programmatically accessible? Q3: How do we programmatically reach out to subject matter experts who are likely to become future Wikimedians? Targeted acquisition campaigns: broadcast, engage, measure. Applications: Embeddable calls to action. Women in Science: Wikipedia needs your help. The English Wikipedia article Women in Science needs contributors from a more global perspective.
Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
University of Pennsylvania ScholarlyCommons. Departmental Papers (CIS), Department of Computer & Information Science. 2-2011. Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features. B. Thomas Adler, University of California, Santa Cruz; Luca de Alfaro, University of California, Santa Cruz -- Google; Santiago M. Mola-Velasco, Universidad Politécnica de Valencia; Paolo Rosso, Universidad Politécnica de Valencia; Andrew G. West, University of Pennsylvania. Follow this and additional works at: https://repository.upenn.edu/cis_papers Part of the Other Computer Sciences Commons. Recommended Citation: B. Thomas Adler, Luca de Alfaro, Santiago M. Mola-Velasco, Paolo Rosso, and Andrew G. West, "Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features", Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing 6609, 277-288. February 2011. http://dx.doi.org/10.1007/978-3-642-19437-5_23 CICLing '11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, Tokyo, Japan, February 20-26, 2011. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_papers/457 Abstract. Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features.
Editing Wikipedia: a Guide to Improving Content on the Online Encyclopedia
Editing Wikipedia: A guide to improving content on the online encyclopedia. Wikimedia Foundation. Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment. This is the vision for Wikipedia and the other Wikimedia projects, which volunteers from around the world have been building since 2001. Bringing together the sum of all human knowledge requires the knowledge of many humans, including yours! What you can learn: This guide will walk you through how to contribute to Wikipedia, so the knowledge you have can be freely shared with others. You will find: • What Wikipedia is and how it works • How to navigate Wikipedia • How you can contribute to Wikipedia and why you should • Important rules that keep Wikipedia reliable • How to edit Wikipedia with the VisualEditor and using wiki markup • A step-by-step guide to adding content • Etiquette for interacting with other contributors. Shortcuts: Want to see up-to-date statistics about Wikipedia? Type WP:STATS into the search bar as pictured here. The text WP:STATS is what’s known on Wikipedia as a shortcut. You can type shortcuts like this into the search bar to pull up specific pages. In this brochure, we designate shortcuts as | shortcut WP:STATS. What is Wikipedia? Wikipedia, the free encyclopedia that anyone can edit, is one of the largest collaborative projects in history. With millions of articles and in hundreds of languages, Wikipedia is read by hundreds of millions of people on a regular basis.
Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach
Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach. Koen Smets and Bart Goethals and Brigitte Verdonk, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium. {koen.smets,bart.goethals,brigitte.verdonk}@ua.ac.be Abstract. Since the end of 2006 several autonomous bots are, or have been, running on Wikipedia to keep the encyclopedia free from vandalism and other damaging edits. These expert systems, however, are far from optimal and should be improved to relieve the human editors from the burden of manually reverting such edits. We investigate the possibility of using machine learning techniques to build an autonomous system capable of distinguishing vandalism from legitimate edits. We highlight the results of a small but important step in this direction by applying commonly known machine learning algorithms using a straightforward feature representation. Despite the promising results, this study reveals that elemen- [...] by users on a blacklist. Since the end of 2006 some vandal bots, computer programs designed to detect and revert vandalism, have seen the light on Wikipedia. Nowadays the most prominent of them are ClueBot and VoABot II. These tools are built around the same primitives that are included in Vandal Fighter. They use lists of regular expressions and consult databases with blocked users or IP addresses to keep legitimate edits apart from vandalism. The major drawback of these approaches is the fact that these bots utilize static lists of obscenities and ‘grammar’ rules which are hard to maintain and easy to deceive. As we will show, they only detect 30% of the committed vandalism. So there is certainly a need for improvement.