Classifying Wikipedia Article Quality with Revision History Networks

Total Page:16

File Type:pdf, Size:1020Kb

Classifying Wikipedia Article Quality with Revision History Networks Classifying Wikipedia Article Quality With Revision History Networks Narun Raman∗ Nathaniel Sauerberg∗ Carleton College Carleton College [email protected] [email protected] Jonah Fisher Sneha Narayan Carleton College Carleton College [email protected] [email protected] ABSTRACT long been interested in maintaining and investigating the quality We present a novel model for classifying the quality of Wikipedia of its content [4][6][12]. articles based on structural properties of a network representation Editors and WikiProjects typically rely on assessments of article of the article’s revision history. We create revision history networks quality to focus volunteer attention on improving lower quality (an adaptation of Keegan et. al’s article trajectory networks [7]), articles. This has led to multiple efforts to create classifiers that can where nodes correspond to individual editors of an article, and edges predict the quality of a given article [3][4][18]. These classifiers can join the authors of consecutive revisions. Using descriptive statistics assist in providing assessments of article quality at scale, and help generated from these networks, along with general properties like further our understanding of the features that distinguish high and the number of edits and article size, we predict which of six quality low quality Wikipedia articles. classes (Start, Stub, C-Class, B-Class, Good, Featured) articles belong While many Wikipedia article quality classifiers have focused to, attaining a classification accuracy of 49.35% on a stratified sample on assessing quality based on the content of the latest version of of articles. These results suggest that structures of collaboration an article [1, 4, 18], prior work has suggested that high quality arti- underlying the creation of articles, and not just the content of the cles are associated with more intense collaboration among editors article, should be considered for accurate quality classification. [8, 10, 21]. To this end, we introduce a novel model for article qual- ity classification, inspired by the network-based article trajectory CCS CONCEPTS model proposed by Keegan et al. [7], in order to predict an article’s quality by analyzing the structure of its collaboration network. • Human-centered computing ! Empirical studies in collab- Using statistics generated from each article’s revision history orative and social computing; Wikis. network, in conjunction with basic article characteristics like article KEYWORDS size and edit count, we run a multinomial logisitic regression (MLR) and attain a classification accuracy of 49.35% on a stratified sample Wikipedia, network analysis, quantitative methods, article quality, of 6000 articles. classification, collaboration ACM Reference Format: 2 COLLABORATION AND QUALITY Narun Raman, Nathaniel Sauerberg, Jonah Fisher, and Sneha Narayan. 2020. Classifying Wikipedia Article Quality With Revision History Networks. Structures of collaboration have been shown to affect outcomes in In 16th International Symposium on Open Collaboration (OpenSym 2020), traditional organizations, as well as volunteer groups online [13]. August 25–27, 2020, Virtual conference, Spain. ACM, New York, NY, USA, Structural properties of internal collaborative networks can influ- 7 pages. https://doi.org/10.1145/3412569.3412581 ence outcomes across a wide variety of contexts, from the creation of Broadway musicals [16], to open-source software development 1 INTRODUCTION [15]. In particular, small-world networks (i.e. highly clustered net- Wikipedia has become the largest and most popular online ref- works with small characteristic path length [19]) are believed to erence encyclopedia in the world1, and each of its articles is the enhance productivity and creativity [13, 15, 16]. product of collaboration between many volunteer editors working On Wikipedia in particular, leadership behavior and an interme- together to create a coherent and accurate resource [7][11]. Due diate level of small-wordliness in the social networks of WikiProject to Wikipedia’s widespread adoption, editors and researchers have talk pages have been found to be positively related to the efficiency and productivity of a project (measured in terms of edit count and ∗Both authors contributed equally to this research. edit longevity) [13]. Additionally, prior work has shown that arti- 1https://www.alexa.com/siteinfo/wikipedia.org cle quality is positively impacted by more communication among editors on talk pages, as well as a more centralized collaboration OpenSym 2020, August 25–27, 2020, Virtual conference, Spain structure [8]. While articles with many editors are more likely to © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not be of higher quality than articles with fewer editors [8, 9, 20], the for redistribution. The definitive Version of Record was published in 16th International addition of editors to a page only seems to improve its quality Symposium on Open Collaboration (OpenSym 2020), August 25–27, 2020, Virtual confer- ence, Spain, https://doi.org/10.1145/3412569.3412581. when those editors are collaborating appropriately [8]. Preliminary results also suggest that structural parameters of collaboration OpenSym 2020, August 25–27, 2020, Virtual conference, Spain Narun Raman, Nathaniel Sauerberg, Jonah Fisher, and Sneha Narayan networks generated from article revision histories can help dif- quality is subjective and classification judgments may vary among ferentiate controversial articles from featured articles [2]. Given editors. Second, approximately 7.52% (as of June 13, 2020) of all that these structures of collaboration have been related to individ- Wikipedia articles have not been assessed 4. This may impact the dis- ual article quality [2][8][10], as well as the overall productivity of tribution of assessed Wikipedia articles. Finally, because Wikipedia WikiProjects [13], we believe that characteristics of the collabora- is a constantly evolving platform, an article’s quality class rating tion networks of individual articles may be beneficial in predicting could be outdated because of new edits and revisions that occurred the quality level of a Wikipedia article. after its assessment. Despite these limitations, we use them as the ground-truth for 3 PREDICTING ARTICLE QUALITY ON article quality in our classifier because of their frequent use in WIKIPEDIA discussions of article quality on Wikipedia [5][10] as well as in similar classification models [4][6][14][18]. Existing models for predicting article quality on Wikipedia have typically used features related to the content of the article at a 4.2 The Revision History Network Model particular moment in time [1, 4, 18]. One such feature is the length of the article. A model with article word count as its only predictor The page history of a Wikipedia article can be viewed as an ordered was able to achieve a 97.15% accuracy rate in completing a binary list of revisions, each of which stores the state of the article at a classification task of featured and non-featured articles, beating particular point in time. Each revision is associated with an editor, out several more complex models [1]. Warncke-Wang et al. then identified by a username or IP address, depending on whether the introduced the ‘Actionable Model’, which uses article size along editor created an account. with features like the number of internal links, headings, images We define the revision history network of an article as an undi- and references to perform a more fine-grained classification task rected network whose nodes are the editors of the corresponding on all seven quality classes (including the now uncommon A-Class) article, and whose edges join the editors of consecutive revisions. with 42.5% accuracy on an imbalanced corpus of articles [18]. ORES, In other words, we can construct the revision history network of a machine-learning based web service that estimates Wikipedia an article from its revision history by iterating over the revisions article quality, uses a modified version of the Actionable Model chronologically and, for each revision 8 after the first, creating an with some additional predictor variables, including the number edge between the editors of revisions 8 − 1 and 8 if one does not of “[citation needed]” templates and the number of “Main article” already exist (see Figure 1). Note that this holds even if an editor linking templates [4]. This version was able to achieve an accuracy authors two consecutive revisions; in this case, we create an edge of 62.9% on their own corpus of articles using the six current quality joining the corresponding node to itself. Our model is adapted from classes. the article trajectory network model introduced in [7], which rep- However, these models do not take into account how the ar- resents each article’s revision history as a directed network with ticle was developed by its editors and the way they collaborated multiple edges. with one another. In light of work outlined in the previous section that suggests a relationship between the structure of editor collab- orations and article quality, we present a novel model that uses characteristics of an article’s revision history network to predict the quality of an article. 4 MODELING APPROACH 4.1 Operationalizing Article Quality In order to measure the baseline quality of an article, we make use of Wikipedia’s content
Recommended publications
  • Proceedings of the 11Th International Symposium on Open Collaboration
    Proceedings of the 11th International Symposium on Open Collaboration August 19-21, 2015 San Francisco, California, U.S.A. General chair: Dirk Riehle, Friedrich-Alexander University Erlangen-Nürnberg Research track chairs: Kevin Crowston, Syracuse University Carlos Jensen, Oregon State University Carl Lagoze, University of Michigan Ann Majchrzak, University of Southern California Arvind Malhotra, University of North Carolina at Chapel Hill Claudia Müller-Birn, Freie Universität Berlin Gregorio Robles, Universidad Rey Juan Carlos Aaron Shaw, Northwestern University Sponsors: Wikimedia Foundation Google Inc. University of California Berkeley In-cooperation: ACM SIGSOFT ACM SIGWEB Fiscal sponsor: The John Ernest Foundation The Association for Computing Machinery 2 Penn Plaza, Suite 701 New York New York 10121-0701 ACM COPYRIGHT NOTICE. Copyright © 2014 by the Association for Computing Machin- ery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be hon- ored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or [email protected]. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750- 4470 (fax).
    [Show full text]
  • Community and Communication
    Community and 12 Communication A large, diverse, and thriving group of volun- teers produces encyclopedia articles and administers Wikipedia. Over time, members of the Wikipedia community have developed conventions for interacting with each other, processes for managing content, and policies for minimizing disruptions and maximizing use- ful work. In this chapter, we’ll discuss where to find other contributors and how to ask for help with any topic. We’ll also explain ways in which community members interact with each other. Though most discussion occurs on talk pages, Wikipedia has some central community forums for debate about the site’s larger policies and more specific issues. We’ll also talk about the make-up of the community. First, however, we’ll outline aspects of Wikipedia’s shared culture, from key philosophies about how contributors How Wikipedia Works (C) 2008 by Phoebe Ayers, Charles Matthews, and Ben Yates should interact with each other to some long-running points of debate to some friendly practices that have arisen over time. Although explicit site policies cover content guidelines and social norms, informal philosophies and practices help keep the Wikipedia community of contributors together. Wikipedia’s Culture Wikipedia’s community has grown spontaneously and organically—a recipe for a baffling culture rich with in-jokes and insider references. But core tenets of the wiki way, like Assume Good Faith and Please Don’t Bite the Newcomers, have been with the community since the beginning. Assumptions on Arrival Wikipedians try to treat new editors well. Assume Good Faith (AGF) is a funda- mental philosophy, as well as an official guideline (shortcut WP:AGF) on Wikipedia.
    [Show full text]
  • Understanding the Challenges of Collaborative Evidence-Based
    Do you have a source for that? Understanding the Challenges of Collaborative Evidence-based Journalism Sheila O’Riordan Gaye Kiely Bill Emerson Joseph Feller Business Information Business Information Business Information Business Information Systems, University Systems, University Systems, University Systems, University College Cork, Ireland College Cork, Ireland College Cork, Ireland College Cork, Ireland [email protected] [email protected] [email protected] [email protected] ABSTRACT consumption has evolved [25]. There has been a move WikiTribune is a pilot news service, where evidence-based towards digital news with increasing user involvement as articles are co-created by professional journalists and a well as the use of social media platforms for accessing and community of volunteers using an open and collaborative discussing current affairs [14,39,43]. As such, the boundaries digital platform. The WikiTribune project is set within an are shifting between professional and amateur contributions. evolving and dynamic media landscape, operating under Traditional news organizations are adding interactive principles of openness and transparency. It combines a features as participatory journalism practices rise (see [8,42]) commercial for-profit business model with an open and the technologies that allow citizens to interact en masse collaborative mode of production with contributions from provide new avenues for engaging in democratic both paid professionals and unpaid volunteers. This deliberation [19]; the “process of reaching reasoned descriptive case study captures the first 12-months of agreement among free and equal citizens” [6:322]. WikiTribune’s operations to understand the challenges and opportunities within this hybrid model of production. We use With these changes, a number of challenges have arisen.
    [Show full text]
  • The Culture of Wikipedia
    Good Faith Collaboration: The Culture of Wikipedia Good Faith Collaboration The Culture of Wikipedia Joseph Michael Reagle Jr. Foreword by Lawrence Lessig The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0 Purchase at Amazon.com | Barnes and Noble | IndieBound | MIT Press Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old Author Bio & Research Blog pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Foreword Preface to the Web Edition Praise for Good Faith Collaboration Preface Extended Table of Contents "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the 1. Nazis and Norms project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community 2. The Pursuit of the Universal makes him uniquely situated to describe this culture." —Cory Doctorow , Boing Boing Encyclopedia "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to 3. Good Faith Collaboration produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are 4. The Puzzle of Openness well presented.
    [Show full text]
  • Using Shape Expressions (Shex) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Repositorio Institucional de la Universidad de Oviedo Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S. Stupp3, Jose Emilio Labra Gayo4, Daniel Mietchen5, Eric Prud’hommeaux6, and Andra Waagmeester7 1 Yale University, New Haven, CT, USA [email protected] 2 Johns Hopkins University, Baltimore, MD, USA [email protected] 3 The Scripps Research Institute, San Diego, CA, USA [email protected] 4 University of Oviedo, Oviedo, Spain [email protected] 5 Data Science Institute, University of Virginia, Charlottesville, VA, USA [email protected] 6 World Wide Web Consortium (W3C), MIT, Cambridge, MA, USA [email protected] 7 Micelio, Antwerpen, Belgium [email protected] Abstract. We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case for all such subjects within a given graph or subgraph. There are currently five actively maintained ShEx implementations. We discuss how we use the JavaScript, Scala and Python implementa- tions in RDF data validation workflows in distinct, applied contexts. We present examples of how ShEx can be used to model and validate data from two different sources, the domain-specific Fast Healthcare Interop- erability Resources (FHIR) and the domain-generic Wikidata knowledge base, which is the linked database built and maintained by the Wikimedia Foundation as a sister project to Wikipedia.
    [Show full text]
  • E-Posters in Academic/Scientific Conferεnces – Guidelines, Comparative Study & New Suggestions
    University of the Aegean Department of Product and Systems Design Engineering DISSERTATION: E-POSTERS IN ACADEMIC/SCIENTIFIC CONFERΕNCES – GUIDELINES, COMPARATIVE STUDY & NEW SUGGESTIONS Pissaridi Maria Anastasia 511/2004041 Supervisor: Paraskevas Papanikos Committee Members: Nikolaos Zacharopoulos Maria Simosi Syros, June 2015 DISSERTATION: E-POSTERS IN ACADEMIC/SCIENTIFIC CONFERENCES – GUIDELINES, COMPARATIVE STUDY & NEW SUGGESTIONS Supervisor: Paraskevas Papanikos Committee Members: Nikolaos Zacharopoulos Maria Simosi Pissaridi Maria Anastasia Syros, June 2015 3 ABSTRACT Conferences play a key role in getting people interested in a field together to network and exchange knowledge. The poster presentation is a commonly used format for communicating information within the academic scientific conference sector. Paper posters were the beginning but as technology and the way people work changes, posters have to be developed and implemented in order to achieve successful knowledge transfer. Incorporating aspects of information technology into poster presentations can promote an interactive learning environment for users and counter the current passive nature of poster design as an integrated approach with supplemental material is required to achieve changes in user knowledge, attitude and behaviour. After conducting literature review, research of existing e-poster providers, interviews with 5 of them and a personal evaluation in a real time conference environment results show a gradual turn towards e-posters with the medical sector pioneering. Authors, viewers and organisers embrace this new format, and the features and functions it offers, although objections exist since people have different preferences and the e-poster sector is relatively young, an average of 5 years. Pissaridi Maria Anastasia dpsd04041 4 SUMMARY Η ανάγκη των ανθρώπων να συγκεντρώνονται για να ανταλλάσσουν ιδέες, ευρήματα, έρευνες και απόψεις υπήρχε από τη δημιουργία των πρώτων πόλεων όπου φτιάχνονται με χώρους συγκέντρωσης.
    [Show full text]
  • Edit Filters on English Wikipedia
    You Shall Not Publish: Edit Filters on English Wikipedia Lyudmila Vaseva Claudia Müller-Birn Human-Centered Computing | Freie Universität Berlin Human-Centered Computing | Freie Universität Berlin [email protected] [email protected] Figure 1: Warning Message of an edit filter to inform the editor that their edit is potentially non-constructive. ABSTRACT ACM Reference Format: Ensuring the quality of the content provided in online settings Lyudmila Vaseva and Claudia Müller-Birn. 2020. You Shall Not Publish: is an important challenge today, for example, for social media or Edit Filters on English Wikipedia. In 16th International Symposium on Open Collaboration (OpenSym 2020), August 25–27, 2020, Virtual conference, Spain. news. The Wikipedia community has ensured the high-quality ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3412569.3412580 standards for an online encyclopaedia from the beginning and has built a sophisticated set of automated, semi-automated, and manual 1 INTRODUCTION quality assurance mechanisms over the last fifteen years. The scien- tific community has systematically studied these mechanisms but The public heatedly debated so-called "upload filters" in the context 1 one mechanism has been overlooked — edit filters. Edit filters are of the EU copyright law reform in 2019 . Upload filters are con- syntactic rules that assess incoming edits, file uploads or account ceived as a form of copyright protection. They should check for creations. As opposed to many other quality assurance mechanisms, possible infringements of copyrighted material even before a con- edit filters are effective before a new revision is stored in the online tribution is published online — and the contributions affected can encyclopaedia.
    [Show full text]
  • R. Stuart Geiger Curriculum Vitae September 2021 [email protected] | | @Staeiou | Google Scholar | Github
    Geiger CV 1/14 R. Stuart Geiger Curriculum Vitae September 2021 [email protected] | http://stuartgeiger.com | @staeiou | Google Scholar | GitHub Education University of California at Berkeley, School of Information Ph.D., Information Management & Systems, designated emphasis in New Media December 2015 Georgetown University M.A., Communication, Culture, and Technology May 2009 University of Texas at Austin B.A., Humanities Honors May 2007 Employment University of California at San Diego Assistant Professor, Dept of Communication & Halıcıoğlu Data Science Institute Jul 2020 – present University of California at Berkeley, Berkeley Institute for Data Science Postdoctoral scholar and staff ethnographer Jan 2016 – June 2020 Wikimedia Foundation Research intern (full-time) May 2011 – Aug 2011 Georgetown University Research associate, Communication, Culture, and Technology (full-time) May 2009 – Aug 2010 Publications & Other Scholarly Outputs: Peer-Reviewed Journal Publications: 1. Geiger, R. S., Cope, D., Ip, J., Lotosh, M., Shah, A., Weng, J., & Tang, R. (2021). “Garbage In, Garbage Out” Revisited: What Do Machine Learning Application Papers Report About Human- Labeled Training Data?. Quantitative Science Studies, Online First. 1-32. https://doi.org/10.1162/qss_a_00144 2. Geiger, R. S., Howard, D., & Irani, L. (2021). The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proceedings of the ACM on Human-Computer Interaction, 5 (CSCW1), 1-28. https://dl.acm.org/doi/pdf/10.1145/3449249 Geiger CV 2/14 3. Scroggins, M.J., Pasquetto, I.V., Geiger, R.S., Boscoe, B.M., Darch, P.T., Cabasse-Mazel, C., Thompson, C. and Borgman, C.L. (2020). “Thorny Problems in Data (-Intensive) Science.” Communications of the ACM.
    [Show full text]
  • Controlled Analyses of Social Biases in Wikipedia Bios
    Controlled Analyses of Social Biases in Wikipedia Bios Anjalie Field Chan Young Park Yulia Tsvetkov [email protected] [email protected] [email protected] Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University ABSTRACT as explanatory variables in a regression model, which restricts anal- Social biases on Wikipedia, a widely-read global platform, could ysis to regression models and requires explicitly enumerating all greatly influence public opinion. While prior research has examined confounds [52]. man/woman gender bias in biography articles, possible influences In contrast, we develop a matching algorithm that enables ana- of confounding variables limit conclusions. In this work, we present lyzing different demographic groups while reducing the influence a methodology for reducing the effects of confounding variables in of confounding variables. Given a corpus of Wikipedia biography analyses of Wikipedia biography pages. Given a target corpus for pages for people that contain target attributes (e.g. pages for cis- analysis (e.g. biography pages about women), we present a method gender women), our algorithm builds a matched comparison corpus for constructing a comparison corpus that matches the target cor- of biography pages for people that do not (e.g. for cisgender men). pus in as many attributes as possible, except the target attribute The comparison corpus is constructed so that it closely matches the (e.g. the gender of the subject). We evaluate our methodology by de- target corpus on all known attributes except the targeted one. Thus, veloping metrics to measure how well the comparison corpus aligns examining differences between the two corpora can reveal content with the target corpus.
    [Show full text]
  • Submission Data for 2020-2021 CORE Conference Ranking Process (Submitted As a Comparator for International Conference on Testing Software and Systems )
    Submission Data for 2020-2021 CORE conference Ranking process (Submitted as a comparator for International Conference on Testing Software and Systems ) Mercedes G. Merayo, Ana Rosa Cavalli Conference Details Conference Title: International Symposium on Open Collaboration (was International Symposiums on Wikis) Acronym : OPENSYM Rank: B DBLP Link DBLP url: https://dblp.uni-trier.de/db/conf/wikis Recent Years Proceedings Publishing Style Proceedings Publishing: series Link to most recent proceedings: https://dl.acm.org/doi/proceedings/10.1145/3412569 Further details: Proceedings are published in the ACMâĂŹs International Conference Proceeding Series (ICPS). In addition to Full Research Papers, Emerging Research Papers are included in the proceedings. They are labelled as Short Papers. Also posters have been included in the proceedings (OpenSym 2017) in a dedicated section, POSTER SESSION: Posters. Most Recent Years Most Recent Year Year: 2019 URL: https://opensym.org/os2019/ Location: SkÃűvde, Sweden Papers submitted: 23 Papers published: 17 Acceptance rate: 74 Source for numbers: https://dl.acm.org/doi/proceedings/10.1145/3306446#acceptance-rates General Chairs Name: BjÃűrn Lundell Affiliation: University of SkÃűvde, Sweden Gender: M H Index: 23 GScholar url: https://scholar.google.com/citations?user=HzSO6UQAAAAJ&hl=en DBLP url: https://dblp.org/pid/37/6147.html Name: Jonas Gamalielsson Affiliation: University of SkÃűvde, Sweden Gender: M H Index: 10 GScholar url: DBLP url: https://dblp.org/pid/95/49.html Program Chairs 1 Name: Lorraine Morgan
    [Show full text]
  • Project Talk: Coordination Work and Group Membership in Wikiprojects
    Project talk: Coordination work and group membership in WikiProjects Jonathan T. Morgan*, Michael Gilbert*, David W. McDonald**, Mark Zachry* *Human Centered Design & Engineering **The Information School University of Washington Seattle, WA USA {jmo25, mdg, dwmc, zachry} @uw.edu ABSTRACT Groups emerge in online collaborations as individuals organize their WikiProjects have contributed to Wikipedia’s success in important productive activities around shared goals, interests, tasks and work- ways, yet the range of work that WikiProjects perform and the way spaces. These groups can provide important benefits for their mem- they coordinate that work remains largely unexplored. In this study, bers and perform valuable work for the community they belong to. we perform a content analysis of 788 work-related discussions from Lave & Wenger [16] assert that the most effective way to under- the talk pages of 138 WikiProjects in order to understand the role stand working groups like these is to examine the work activities WikiProjects play in collaborative work on Wikipedia. We find that their members engage in. But, as the scenario above illustrates, the editors use WikiProjects to coordinate a wide variety of work identifying the members of an online group and the work the group activities beyond content production and that non-members play an performs can be difficult for an outsider—whether they are a new active role in that work. Our research suggests that WikiProject user, a researcher or a system designer. collaboration is less structured and more open than that of many virtual teams and that WikiProjects may function more like FLOSS Research on the behavior of Wikipedia editors has informed our projects than traditional groups.
    [Show full text]
  • Browsing the Free Software Commons
    Browsing the Free Soware Commons Stefano Zacchiroli University Paris Diderot & Inria — [email protected] 22 August 2018 OpenSym 2018 Paris, France THE GREAT LIBRARY OF SOURCE CODE Stefano Zacchiroli Browsing the Free Soware Commons 22/08/2018, OpenSym 1 / 23 Outline 1 The Soware Commons 2 Soware Heritage 3 Accessing the archive 4 Geing involved Stefano Zacchiroli Browsing the Free Soware Commons 22/08/2018, OpenSym 2 / 23 (Free) Soware is everywhere Software Stefano Zacchiroli Browsing the Free Soware Commons 22/08/2018, OpenSym 2 / 23 Soware source code is special Harold Abelson, Structure and Interpretation of Computer Programs “Programs must be wrien for people to read, and only incidentally for machines to execute.” ake 2 source code (excerpt) Net. queue in Linux (excerpt) Len Shustek, Computer History Museum “Source code provides a view into the mind of the designer.” Stefano Zacchiroli Browsing the Free Soware Commons 22/08/2018, OpenSym 3 / 23 Source code is a precious part of our commons are we taking care of it? Our Soware Commons Definition (Commons) The commons is the cultural and natural resources accessible to all members of a society, including natural materials such as air, water, and a habitable earth. These resources are held in common, not owned privately. https://en.wikipedia.org/wiki/Commons Definition (Soware Commons) The soware commons consists of all computer soware which is available at lile or no cost and which can be altered and reused with few restrictions. Thus all open source soware and all free soware are part of the [soware] commons. [...] https://en.wikipedia.org/wiki/Software_Commons Stefano Zacchiroli Browsing the Free Soware Commons 22/08/2018, OpenSym 4 / 23 Our Soware Commons Definition (Commons) The commons is the cultural and natural resources accessible to all members of a society, including natural materials such as air, water, and a habitable earth.
    [Show full text]