An Analysis of Wikipedia
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages Dwaipayan Roy, Sumit Bhatia, Prateek Jain GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi [email protected], [email protected], [email protected] Abstract Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap 1. Introduction other with respect to the coverage of topics as well as Wikipedia is the largest web-based encyclopedia covering the amount of information about overlapping topics. -
Wiki-Reliability: a Large Scale Dataset for Content Reliability on Wikipedia
Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia KayYen Wong∗ Miriam Redi Diego Saez-Trumper Outreachy Wikimedia Foundation Wikimedia Foundation Kuala Lumpur, Malaysia London, United Kingdom Barcelona, Spain [email protected] [email protected] [email protected] ABSTRACT Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors’ manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia “templates”. Tem- plates are tags used by expert Wikipedia editors to indicate con- Figure 1: Example of an English Wikipedia page with several tent issues, such as the presence of “non-neutral point of view” template messages describing reliability issues. or “contradictory articles”, and serve as a strong signal for detect- ing reliability issues in a revision. We select the 10 most popular 1 INTRODUCTION reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions Wikipedia is one the largest and most widely used knowledge as positive or negative with respect to each template. Each posi- repositories in the world. People use Wikipedia for studying, fact tive/negative example in the dataset comes with the full article checking and a wide set of different information needs [11]. -
Omnipedia: Bridging the Wikipedia Language
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*† *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences Northwestern University {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu ABSTRACT language edition contains its own cultural viewpoints on a We present Omnipedia, a system that allows Wikipedia large number of topics [7, 14, 15, 27]. On the other hand, readers to gain insight from up to 25 language editions of the language barrier serves to silo knowledge [2, 4, 33], Wikipedia simultaneously. Omnipedia highlights the slowing the transfer of less culturally imbued information similarities and differences that exist among Wikipedia between language editions and preventing Wikipedia’s 422 language editions, and makes salient information that is million monthly visitors [12] from accessing most of the unique to each language as well as that which is shared information on the site. more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with In this paper, we present Omnipedia, a system that attempts a multilingual Wikipedia experience. These include to remedy this situation at a large scale. It reduces the silo visualizing content in a language-neutral way and aligning effect by providing users with structured access in their data in the face of diverse information organization native language to over 7.5 million concepts from up to 25 strategies. We present a study of Omnipedia that language editions of Wikipedia. At the same time, it characterizes how people interact with information using a highlights similarities and differences between each of the multilingual lens. -
How Trust in Wikipedia Evolves: a Survey of Students Aged 11 to 25
PUBLISHED QUARTERLY BY THE UNIVERSITY OF BORÅS, SWEDEN VOL. 23 NO. 1, MARCH, 2018 Contents | Author index | Subject index | Search | Home How trust in Wikipedia evolves: a survey of students aged 11 to 25 Josiane Mothe and Gilles Sahut. Introduction. Whether Wikipedia is to be considered a trusted source is frequently questioned in France. This paper reports the results of a survey examining the levels of trust shown by young people aged eleven to twenty- five. Method. We analyse the answers given by 841 young people, aged eleven to twenty-five, to a questionnaire. To our knowledge, this is the largest study ever published on the topic. It focuses on (1) the perception young people have of Wikipedia; (2) the influence teachers and peers have on the young person’s own opinions; and (3) the variation of trends according to the education level. Analysis. All the analysis is based on ANOVA (analysis of variance) to compare the various groups of participants. We detail the results by comparing the various groups of responders and discuss these results in relation to previous studies. Results. Trust in Wikipedia depends on the type of information seeking tasks and on the education level. There are contrasting social judgments of Wikipedia. Students build a representation of a teacher’s expectations on the nature of the sources that they can use and hence the documentary acceptability of Wikipedia. The average trust attributed to Wikipedia for academic tasks could be induced by the tension between the negative academic reputation of the encyclopedia and the mostly positive experience of its credibility. -
An Analysis of Contributions to Wikipedia from Tor
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor Chau Tran Kaylea Champion Andrea Forte Department of Computer Science & Engineering Department of Communication College of Computing & Informatics New York University University of Washington Drexel University New York, USA Seatle, USA Philadelphia, USA [email protected] [email protected] [email protected] Benjamin Mako Hill Rachel Greenstadt Department of Communication Department of Computer Science & Engineering University of Washington New York University Seatle, USA New York, USA [email protected] [email protected] Abstract—User-generated content sites routinely block contri- butions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Fig. 1. Screenshot of the page a user is shown when they attempt to edit the Our analysis suggests that although Tor users who slip through Wikipedia article on “Privacy” while using Tor. Wikipedia’s ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users. -
Wikipedia Citations: a Comprehensive Data Set of Citations with Identifiers Extracted from English Wikipedia
RESEARCH ARTICLE Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia Harshdeep Singh1 , Robert West1 , and Giovanni Colavizza2 an open access journal 1Data Science Laboratory, EPFL 2Institute for Logic, Language and Computation, University of Amsterdam Keywords: citations, data, data set, Wikipedia Downloaded from http://direct.mit.edu/qss/article-pdf/2/1/1/1906624/qss_a_00105.pdf by guest on 01 October 2021 Citation: Singh, H., West, R., & ABSTRACT Colavizza, G. (2020). Wikipedia citations: A comprehensive data set Wikipedia’s content is based on reliable and published sources. To this date, relatively little of citations with identifiers extracted from English Wikipedia. Quantitative is known about what sources Wikipedia relies on, in part because extracting citations Science Studies, 2(1), 1–19. https:// and identifying cited sources is challenging. To close this gap, we release Wikipedia doi.org/10.1162/qss_a_00105 Citations, a comprehensive data set of citations extracted from Wikipedia. We extracted DOI: 29.3 million citations from 6.1 million English Wikipedia articles as of May 2020, and https://doi.org/10.1162/qss_a_00105 classified as being books, journal articles, or Web content. We were thus able to extract Received: 14 July 2020 4.0 million citations to scholarly publications with known identifiers—including DOI, PMC, Accepted: 23 November 2020 PMID, and ISBN—and further equip an extra 261 thousand citations with DOIs from Crossref. Corresponding Author: As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with Giovanni Colavizza [email protected] an associated DOI, and that Wikipedia cites just 2% of all articles with a DOI currently indexed in the Web of Science. -
Does Wikipedia Matter? the Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Discus Si On Paper No
Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko Dis cus si on Paper No. 15-089 Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices Marit Hinnosaar, Toomas Hinnosaar, Michael Kummer, and Olga Slivko First version: December 2015 This version: September 2017 Download this ZEW Discussion Paper from our ftp server: http://ftp.zew.de/pub/zew-docs/dp/dp15089.pdf Die Dis cus si on Pape rs die nen einer mög lichst schnel len Ver brei tung von neue ren For schungs arbei ten des ZEW. Die Bei trä ge lie gen in allei ni ger Ver ant wor tung der Auto ren und stel len nicht not wen di ger wei se die Mei nung des ZEW dar. Dis cus si on Papers are inten ded to make results of ZEW research prompt ly avai la ble to other eco no mists in order to encou ra ge dis cus si on and sug gesti ons for revi si ons. The aut hors are sole ly respon si ble for the con tents which do not neces sa ri ly repre sent the opi ni on of the ZEW. Does Wikipedia Matter? The Effect of Wikipedia on Tourist Choices ∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ First version: December 2015 This version: September 2017 September 25, 2017 Abstract We document a causal influence of online user-generated information on real- world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional information on Wikipedia about cities affects tourists’ choices of overnight visits. -
Reliability of Wikipedia 1 Reliability of Wikipedia
Reliability of Wikipedia 1 Reliability of Wikipedia The reliability of Wikipedia (primarily of the English language version), compared to other encyclopedias and more specialized sources, is assessed in many ways, including statistically, through comparative review, analysis of the historical patterns, and strengths and weaknesses inherent in the editing process unique to Wikipedia. [1] Because Wikipedia is open to anonymous and collaborative editing, assessments of its Vandalism of a Wikipedia article reliability usually include examinations of how quickly false or misleading information is removed. An early study conducted by IBM researchers in 2003—two years following Wikipedia's establishment—found that "vandalism is usually repaired extremely quickly — so quickly that most users will never see its effects"[2] and concluded that Wikipedia had "surprisingly effective self-healing capabilities".[3] A 2007 peer-reviewed study stated that "42% of damage is repaired almost immediately... Nonetheless, there are still hundreds of millions of damaged views."[4] Several studies have been done to assess the reliability of Wikipedia. A notable early study in the journal Nature suggested that in 2005, Wikipedia scientific articles came close to the level of accuracy in Encyclopædia Britannica and had a similar rate of "serious errors".[5] This study was disputed by Encyclopædia Britannica.[6] By 2010 reviewers in medical and scientific fields such as toxicology, cancer research and drug information reviewing Wikipedia against professional and peer-reviewed sources found that Wikipedia's depth and coverage were of a very high standard, often comparable in coverage to physician databases and considerably better than well known reputable national media outlets. -
Academic Research About the Reliability of Wikipedia.Odp
Academic research about the reliability of Wikipedia 台灣維基人冬聚 Taipei, January 7, 2012 T. Bayer Perspective of this talk ● About myself: – Wikipedian since 2003 (User:HaeB on de:, en:) – Editor of The Signpost on en:, 2010-11 – Working for the Foundation since July 2011 (contractor, supporting movement communications) – Editor of the Wikimedia Research Newsletter (together with WMF research analyst Dario Taraborelli): Monthly survey on recent academic research about Wikipedia Reliability of Wikipedia ● Standard criticism: “Wikipedia is not reliable because anyone can edit” – … is fallacious: ● Yes, one traditional quality control method (restrict who can write) is totally missing ● But there is a new quality control method: Anyone can correct mistakes ● But does the new method work? ● Need to examine the content, not the process that leads to it Anecdotes vs. systematic studies ● Seigenthaler scandal (2005, vandalism in biography article). Public opinion problem: Unusual, extreme cases are more newsworthy and memorable – and because of “anyone can edit”, these tend to be negative for WP. ● Scientists are trained not to rely on anecdotes: The 2005 Nature study ● “first [study] to use peer review to compare Wiki- pedia and Britannica’s coverage of science” ● 42 reviews by experts ● Errors per article: Wikipedia 4 , Britannica 3 ● Britannica protested, Nature stood by it ● 6 years later, still the most frequently cited study about Wikipedia's reliability ●Brockhaus – “the” German encyclopedia ● “generally regarded as the model for the development of many encyclopaedias in other languages” (Britannica entry “Brockhaus Enzyklopädie”) ● 1796: First edition ● 2005: 21st edition (30 volumes) ● 2009: Editorial staff dismissed, brand sold Stern (news magazine) study, 2007 ● Compared 50 random articles in German Wikipedia and Brockhaus ● Not peer-reviewed, but conducted by experienced research institute ● Wide range of topics ● Wikipedia more accurate: Brockhaus 2.3 vs. -
Wikipedia Matters∗
Wikipedia Matters∗ Marit Hinnosaar† Toomas Hinnosaar‡ Michael Kummer§ Olga Slivko¶ September 29, 2017 Abstract We document a causal impact of online user-generated information on real-world economic outcomes. In particular, we conduct a randomized field experiment to test whether additional content on Wikipedia pages about cities affects tourists’ choices of overnight visits. Our treatment of adding information to Wikipedia increases overnight visits by 9% during the tourist season. The impact comes mostly from improving the shorter and incomplete pages on Wikipedia. These findings highlight the value of content in digital public goods for informing individual choices. JEL: C93, H41, L17, L82, L83, L86 Keywords: field experiment, user-generated content, Wikipedia, tourism industry 1 Introduction Asymmetric information can hinder efficient economic activity. In recent decades, the Internet and new media have enabled greater access to information than ever before. However, the digital divide, language barriers, Internet censorship, and technological con- straints still create inequalities in the amount of accessible information. How much does it matter for economic outcomes? In this paper, we analyze the causal impact of online information on real-world eco- nomic outcomes. In particular, we measure the impact of information on one of the primary economic decisions—consumption. As the source of information, we focus on Wikipedia. It is one of the most important online sources of reference. It is the fifth most ∗We are grateful to Irene Bertschek, Avi Goldfarb, Shane Greenstein, Tobias Kretschmer, Thomas Niebel, Marianne Saam, Greg Veramendi, Joel Waldfogel, and Michael Zhang as well as seminar audiences at the Economics of Network Industries conference in Paris, ZEW Conference on the Economics of ICT, and Advances with Field Experiments 2017 Conference at the University of Chicago for valuable comments. -
How Does Mediawiki Work? 1 How Does Mediawiki Work?
How does MediaWiki work? 1 How does MediaWiki work? Fundamental Introduction to MediaWiki Contents • How does MediaWiki work? • Documentation • Customization • Versions & Download • Installation • Support & Contact • Development All other topics • See navigation on the left You probably know Wikipedia, the free encyclopedia, and may possibly be a little bit confused by similar, but different, words such as Wiki, Wikimedia or MediaWiki. To avoid a possible confusion between the words you may first want to read the article about the names where the differences are explained. General Overview MediaWiki is free server-based software which is licensed under the GNU General Public License (GPL). It's designed to be run on a large server farm for a website that gets millions of hits per day. MediaWiki is an extremely powerful, scalable software and a feature-rich wiki implementation, that uses PHP to process and display data stored in its MySQL database. MediaWiki can be used in large enterprise server farms as in the Wikimedia Foundation cluster. Pages use MediaWiki's wikitext format, so that users without knowledge of XHTML or CSS can edit them easily. When a user submits an edit to a page, MediaWiki writes it to the database, but without deleting the previous versions of the page, thus allowing easy reverts in case of vandalism or spamming. MediaWiki can manage image and multimedia files, too, which are stored in the filesystem. For large wikis with lots of users, MediaWiki supports caching and can be easily coupled with Squid proxy server software. How does MediaWiki work? 2 Try out Wikitext Yes, you can easily modify pages and you can (temporarily) publish dummy sentences, and you can even (temporarily) completely destroy a page in a wiki. -
Situating Wikipedia As a Health Information Resource in Various Contexts: a Scoping Review
Western University Scholarship@Western FIMS Publications Information & Media Studies (FIMS) Faculty Winter 2-19-2020 Situating Wikipedia as a health information resource in various contexts: A scoping review Denise Smith Western University, [email protected] Follow this and additional works at: https://ir.lib.uwo.ca/fimspub Part of the Library and Information Science Commons Citation of this paper: Smith, Denise, "Situating Wikipedia as a health information resource in various contexts: A scoping review" (2020). FIMS Publications. 341. https://ir.lib.uwo.ca/fimspub/341 RESEARCH ARTICLE Situating Wikipedia as a health information resource in various contexts: A scoping review 1,2 Denise A. SmithID * 1 Health Sciences Library, McMaster University, Hamilton, Ontario, Canada, 2 Faculty of Information & Media Studies, Western University, London, Ontario, Canada * [email protected] a1111111111 a1111111111 Abstract a1111111111 a1111111111 Background a1111111111 Wikipedia's health content is the most frequently visited resource for health information on the internet. While the literature provides strong evidence for its high usage, a comprehen- sive literature review of Wikipedia's role within the health context has not yet been reported. OPEN ACCESS Objective Citation: Smith DA (2020) Situating Wikipedia as a health information resource in various contexts: A To conduct a comprehensive review of peer-reviewed, published literature to learn what the scoping review. PLoS ONE 15(2): e0228786. existing body of literature says about Wikipedia as a health information resource and what https://doi.org/10.1371/journal.pone.0228786 publication trends exist, if any. Editor: Stefano Triberti, University of Milan, ITALY Received: October 17, 2019 Methods Accepted: January 22, 2020 A comprehensive literature search in OVID Medline, OVID Embase, CINAHL, LISTA, Wil- Published: February 18, 2020 son's Web, AMED, and Web of Science was performed.