Implementation and Evaluation of a Framework to Calculate Impact Measures for Wikipedia Authors


Bachelor Thesis by Sebastian Neef
[email protected]

Submitted to the Faculty IV, Electrical Engineering and Computer Science, Database Systems and Information Management Group, in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science at the Technische Universität Berlin, June 29, 2017.

arXiv:1709.01142v1 [cs.DL] 26 Aug 2017

Thesis Advisor: Moritz Schubotz
Thesis Supervisors: Prof. Dr. Volker Markl, Prof. Dr. Odej Kao

Eidesstattliche Erklärung (Declaration of Authorship)

I hereby declare that I produced this thesis independently and by my own hand, without unauthorized outside help, and using only the sources and aids listed. The independent and autonomous preparation is affirmed in lieu of an oath: Berlin, June 29, 2017, Sebastian Neef.

Zusammenfassung (German Abstract)

Wikipedia is an open, collaborative website that can be edited by anyone, even anonymously, and can therefore fall victim to unwanted changes. Using the continuous versioning of all changes, and rankings based on the calculation of impact measures, unwanted changes can be detected and reputable users identified [4]. However, processing many millions of revisions on a single system takes a long time. The author implements an open source framework to compute such rankings on distributed systems in the MapReduce style, and evaluates its performance on datasets of various sizes. The reimplementation of the contribution measures by Adler et al. demonstrates the framework's extensibility and usability, as well as the problems of handling huge datasets and possible approaches to solving them. The results discuss the various optimizations and show that horizontal scaling can reduce the total processing time.

Abstract

Wikipedia, an open collaborative website, can be edited by anyone, even anonymously, and thus becomes a victim of ill-intentioned changes. Ranking Wikipedia authors by calculating impact measures based on the edit history can therefore help to identify reputable users or harmful activity such as vandalism [4]. However, processing millions of edits on one system can take a long time. The author implements an open source framework to calculate such rankings in a distributed way (MapReduce) and evaluates its performance on datasets of various sizes. A reimplementation of the contribution measures by Adler et al. demonstrates its extensibility and usability, as well as the problems of handling huge datasets and their possible resolutions. The results put different performance optimizations into perspective and show that horizontal scaling can decrease the total processing time.

Contents

1 Introduction
  1.1 Background
  1.2 Related work
  1.3 Problem statement
  1.4 Keywords and abbreviations
  1.5 Setup
2 Implementation
  2.1 Investigating compressed datasets
  2.2 Plan of implementation
  2.3 Functionality and extensibility
  2.4 Problems and solutions
3 Evaluation
  3.1 WikiTrust vs. framework
  3.2 Performance comparison
  3.3 Framework optimizations
  3.4 Pageviews parsing
4 Discussion
  4.1 Performance and ranking results
  4.2 Framework use cases
  4.3 Future work
5 Conclusion
6 References
7 Appendix
  7.1 Source code on CD and GitHub
  7.2 Java, Flink, WikiTrust and framework installation
  7.3 WikiTrust and framework bash-loop
  7.4 WikiTrust and framework author reputations
  7.5 List of figures, tables and listings

1 Introduction

In this first section the thesis' topic is introduced.
It starts with the background and related work, then continues with the problem statement and goal. Important keywords, formulas and abbreviations are defined, followed by general information about the software and hardware setup used.

1.1 Background

People publish articles or contribute to open source projects without an immediate reward, but hope for indirect rewards like extending their skill set, their own marketability, or peer recognition [37, p. 253]. This may be one of the reasons for using platforms like ResearchGate1. It uses an unknown algorithm [30] to calculate an individual's RG Score2 and position them in the community, leading to said indirect rewards. Such impact measures have been used to accept or reject applicants [15, p. 391]. Another well-known measure of the quality of a scientist's work is the h-index (see 1.4.1) [15, p. 392].

Unlike ResearchGate, the free encyclopedia Wikipedia3 does not display author performance measures besides edit counts, making it harder for authors to turn their contributions into indirect rewards. In fact, Wikimedia relied on the edit count and votes to nominate new moderators [39]. Another point which makes researching this topic interesting is the high user and edit counts. In 2008, Wikipedia had more than 300,000 authors with at least ten edits, and the numbers have been growing by 5 to 10 percent per month since then [37, p. 243-244]. Even at the time of writing in March 2017, Wikipedia has more than 10,000 active authors with more than 100 edits in that month [44].

1 https://www.researchgate.net/
2 https://www.researchgate.net/publicprofile.RGScoreFAQ.html
3 https://wikipedia.org

1.2 Related work

Previous research focused only on the edit count [46, 29, 40] or the length of changed text [38, 3]. Schwartz (2006) discovered high discrepancies between some users' edit and text counts: he noticed that top contributors by edit count are not necessarily top contributors by text count [38]. Adler et al. conclude that quantitative measures such as edit or text count can be manipulated easily. Therefore, more weight should be put on the content's quality through qualitative measures. One measure they introduced was the longevity of a change [4, p. 1f]. Their concepts have been used in more recent publications such as the WikiTrust program [22, p. 38] and as a consideration for another rating system [36, p. 75].

A service that makes use of Wikipedia's history to detect vandalism by classifying an edit's quality is the "Objective Revision Evaluation Service (ORES)" by MediaWiki [32]. Its approach focuses on machine learning, but manual classification work by the community is needed to get accurate results. Furthermore, it appears that only a limited number of Wikipedia sites have this service enabled [32].

1.3 Problem statement

The work by Adler et al. is a promising starting point on the impact measure topic, due to their proposal of several formulas that calculate an author ranking based on the impact of an author's edits. Adler et al. call those formulas "contribution measures" [4]. Their terminology is adopted here to reduce confusion and credit their work. Unfortunately, the authors do not discuss how to efficiently analyze the around 2.6 billion4 edits. This is the bachelor thesis' starting point. It will introduce and discuss an open source framework which prepares Wikipedia's edit history and page views and facilitates the development of distributed impact measures and rankings for Wikipedia authors. Such a framework could lead to better reproducibility of author rankings. The thesis will try to provide reasonable information to keep all tests and results reproducible. Therefore, the following research objective was defined: Implement and evaluate a framework for distributed calculation of impact measures for Wikipedia authors.

4 https://tools.wmflabs.org/wmcounter
The following tasks were set to achieve this objective:

• Task 1: Analyze the input datasets with regard to their layout and format.
• Task 2: Design and implement the distributed framework.
• Task 3: Implement Adler et al.'s contribution measures as described in [4].
• Task 4: Evaluate the framework's processing speed by comparing it to Adler et al.'s WikiTrust program [18].

The first problem to address is the parsing of Wikipedia XML dumps (see 1.4.2; [21]). For example, the English Wikipedia is more than 0.5 TB compressed and extracts to multiple (>= 10) TB of XML data. The student cluster (see 1.5.1) does not have enough disk space to store such amounts of data, so processing the data in its compressed format will be explored in the section "Investigating compressed datasets". This amount of data has to be transformed from its XML state into a usable representation for further processing, for example objects with the attributes as member variables and references between correlated objects, so that distributed algorithms can operate on them. That also covers the calculation of differences between two revisions of a page. Different types of processing techniques will be tested on multiple datasets and evaluated based on the execution time.

Wikipedia's page views (see 1.4.3; [34, 27]) are an additional source of information which was not used by Adler et al., but might be an important factor for rating authors. This data needs to be parsed as well and incorporated into the framework's representation of a Wikipedia page.

The key functionality of the framework is to hide the initial processing of the Wikipedia XML or page view data from the user, who uses the framework to calculate author rankings for Wikipedia sites. It should also hide the parallel data processing mechanisms as far as possible. Another important point is extensibility, which will make the framework customizable in almost every aspect.
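The idea of processing a dump in its compressed format can be illustrated with a minimal sketch: decompress on the fly and parse the XML as a stream, so the multi-terabyte extracted form never touches disk. This is an illustrative Python sketch, not the thesis' framework (which is written in Java on top of Flink); the function name and the miniature sample dump are hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET

def iter_revisions(stream):
    """Stream <revision> elements from a MediaWiki XML export without
    loading the whole file into memory; elements are cleared after use
    so memory stays bounded."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        # Strip a namespace prefix like '{http://...}revision' if present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "revision":
            yield {
                "id": elem.findtext("id"),
                "text": elem.findtext("text") or "",
            }
            elem.clear()

# Hypothetical miniature dump standing in for a real pages-meta-history file.
sample = b"""<mediawiki>
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first version</text></revision>
    <revision><id>2</id><text>second version</text></revision>
  </page>
</mediawiki>"""

compressed = bz2.compress(sample)
# BZ2File decompresses lazily, so the XML is never fully extracted to disk.
revisions = list(iter_revisions(bz2.BZ2File(io.BytesIO(compressed))))
print([r["id"] for r in revisions])  # -> ['1', '2']
```

The same streaming pattern applies regardless of the processing engine: each worker can be handed a compressed split and decode it locally, which is the trade-off (CPU for disk space) examined in the evaluation.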
1.4 Keywords and abbreviations

This section explains and defines important abbreviations and keywords which will be used throughout the thesis.

1.4.1 H-index

The h-index: An author $A$ has a non-empty set $P_A = \{p_1, \dots, p_n\}$ of $n \in \mathbb{N}$ publications. Let $c_A : P_A \to \mathbb{N}$ be a function which returns a publication's citation count. Let $P_{A_i}$ be the tuple of publications, sorted by decreasing citation count.
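The definition above breaks off mid-sentence in this excerpt. As a sketch of the standard h-index computation implied by these definitions (an author's h is the largest $h$ such that $h$ of their publications each have at least $h$ citations), assuming the usual formulation rather than the thesis' exact wording:

```python
def h_index(citations):
    """Largest h such that the author has h publications with
    at least h citations each; 0 if there is none."""
    # Sort citation counts in decreasing order: c(p_1) >= c(p_2) >= ...
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4 (four papers with >= 4 citations)
print(h_index([25, 8, 5, 3, 3]))  # -> 3
```

A single highly cited paper thus cannot inflate the score, which is the same robustness argument the thesis later makes for qualitative contribution measures over raw edit counts.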