The Writing Process in Online Mass Collaboration: NLP-Supported Approaches to Analyzing Collaborative Revision and User Interaction
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
- Short-Format Workshops Build Skills and Confidence for Researchers to Work with Data
Kari L. Jordan, Ph.D. (The Carpentries); Marianne Corvellec (Institute for Globally Distributed Open Research and Education, IGDORE); Elizabeth D. Wickes (University of Illinois at Urbana-Champaign); Dr. Naupaka B. Zimmerman (University of San Francisco); Jonah M. Duckles (Software Carpentry).
Dr. Kari L. Jordan is the Director of Assessment and Community Equity for The Carpentries, a non-profit that develops and teaches core data science skills for researchers. Marianne Corvellec has worked in industry as a data scientist and a software developer since 2013; since then, she has also been involved with The Carpentries, pursuing interests in community outreach, education, and assessment, and she holds a PhD in statistical physics from École Normale Supérieure de Lyon, France (2012). Elizabeth Wickes is a Lecturer at the School of Information Sciences at the University of Illinois, where she teaches foundational programming and information technology courses; she was previously a Data Curation Specialist for the Research Data Service at the University Library of the University of Illinois and the Curation Manager for Wolfram|Alpha, currently co-organizes the Champaign-Urbana Python user group, has been a Carpentries instructor since 2015 and trainer since 2017, and is an elected member of The Carpentries' Executive Council for 2018. Jonah Duckles works to accelerate data-driven inquiry by catalyzing digital skills and building organizational capacity; as part of the leadership team, he helped to grow Software and Data Carpentry into a financially sustainable non-profit with a robust organization membership in 10 countries.
- A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Dwaipayan Roy (GESIS, Cologne), Sumit Bhatia (IBM Research, Delhi), Prateek Jain (IIIT Delhi). Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC.
Abstract: Wikipedia is the largest web-based open encyclopedia, covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (the most exhaustive) and the Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap.
Keywords: Wikipedia, Knowledge base, Information gap
- Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz. Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland. Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020.
Abstract: One of the most important factors impacting the quality of content in Wikipedia is the presence of reliable sources. By following references, readers can verify facts or find more details about a described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users; therefore, information about the same topic may be inconsistent. This also applies to the use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and to find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources, based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata, we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability over time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different language versions of Wikipedia.
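As a toy illustration of the simplest popularity signal mentioned in the abstract (counting how often a source domain appears in references), the Python sketch below pulls URLs out of wikitext and tallies their domains. The regular expression, helper name, and sample data are simplified assumptions for illustration, not the authors' extraction pipeline, whose models use much richer meta information.

```python
# Sketch: a naive "popularity of sources" count over reference URLs in wikitext.
# Simplified illustration; not the paper's actual models or pipeline.
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s|\]}<]+")

def reference_domains(wikitext: str):
    """Extract the domain of every URL appearing in an article's wikitext."""
    return [urlparse(url).netloc.lower() for url in URL_PATTERN.findall(wikitext)]

if __name__ == "__main__":
    articles = [
        "{{cite web |url=https://www.nature.com/articles/x |title=Example}}",
        "{{cite news |url=https://www.bbc.com/news/y}} and <ref>https://www.nature.com/articles/z</ref>",
    ]
    counts = Counter(domain for text in articles for domain in reference_domains(text))
    print(counts.most_common())  # e.g. [('www.nature.com', 2), ('www.bbc.com', 1)]
```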
- Operational Transformation in Co-Operative Editing
Mandeep Kaur, Manpreet Singh, Harneet Kaur, Simran Kaur. International Journal of Scientific & Technology Research, Volume 5, Issue 01, January 2016, ISSN 2277-8616.
Abstract: Real-time cooperative editing systems allow a virtual team to view and edit a shared document at the same time. The shared document must be synchronized to ensure consistency for all participants. This paper describes Operational Transformation, the evolution of its techniques, its various applications, major issues, and achievements. In addition, the paper presents the working of a platform where two users can edit a code (programming) file at the same time.
Index Terms: CodeMirror, code sharing, consistency maintenance, Google Wave, Google Docs, group editors, operational transformation, real-time editing, Web Sockets
1 Introduction: Collaborative editing enables mobile professionals to share and edit the same documents at the same time. Fundamentally, it is based on operational transformation; that is, it regulates the order of operations according to the transformed execution order. The idea is to transform a received update operation before its execution on a replica of the object. Operational Transformation consists of a centralized/decentralized integration procedure and a transformation function. Collaborative writing can be defined as making changes to one document by a number of persons. The writing can be organized into sub-tasks assigned to each group member, with the latter part of the…
1.1 Main issues related to real-time co-operative editing. Three inconsistency problems:
Divergence: contents at all sites in the final document are different.
Causality violation: the execution order is different from the cause-effect order (e.g., O1 → O3).
Intention violation: the actual effect is different from the intended effect.
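To make the transformation idea concrete, here is a minimal Python sketch of the classic insert/insert case (an illustration, not code from the paper): each site transforms the remote operation against the operation it has already applied locally, so both replicas converge to the same text. The function names and the single-character operation format are assumptions made for this example.

```python
# Minimal sketch of character-wise operational transformation (insert-only).
# Hypothetical helper names; not the API of any particular OT library.

def apply_op(doc, op):
    """Apply an insert operation {pos, char} to a document string."""
    return doc[:op["pos"]] + op["char"] + doc[op["pos"]:]

def transform(op, against, priority):
    """Shift `op` so it can be applied after `against` has already been applied.
    `priority` breaks ties when both sites insert at the same position."""
    transformed = dict(op)
    if op["pos"] > against["pos"] or (op["pos"] == against["pos"] and priority):
        transformed["pos"] += len(against["char"])
    return transformed

if __name__ == "__main__":
    base = "abc"
    op_alice = {"pos": 1, "char": "X"}  # Alice inserts "X" at index 1
    op_bob = {"pos": 2, "char": "Y"}    # Bob concurrently inserts "Y" at index 2

    # Site A applies Alice's op, then Bob's op transformed against Alice's.
    site_a = apply_op(apply_op(base, op_alice), transform(op_bob, op_alice, priority=True))
    # Site B applies Bob's op, then Alice's op transformed against Bob's.
    site_b = apply_op(apply_op(base, op_bob), transform(op_alice, op_bob, priority=False))

    assert site_a == site_b == "aXbYc"  # both replicas converge
    print(site_a)
```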
- Omnipedia: Bridging the Wikipedia Language Gap
Patti Bao, Brent Hecht, Samuel Carton, Mahmood Quaderi, Michael Horn, Darren Gergle. Communication Studies, Electrical Engineering & Computer Science, and Learning Sciences, Northwestern University.
Abstract: We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens.
From the introduction: Each language edition contains its own cultural viewpoints on a large number of topics [7, 14, 15, 27]. On the other hand, the language barrier serves to silo knowledge [2, 4, 33], slowing the transfer of less culturally imbued information between language editions and preventing Wikipedia's 422 million monthly visitors [12] from accessing most of the information on the site. In this paper, we present Omnipedia, a system that attempts to remedy this situation at a large scale. It reduces the silo effect by providing users with structured access in their native language to over 7.5 million concepts from up to 25 language editions of Wikipedia. At the same time, it highlights similarities and differences between each of the…
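As a rough illustration of the kind of cross-language concept alignment Omnipedia depends on (this is not the authors' pipeline), the Python sketch below queries the public MediaWiki API's langlinks property to map an English article title to its counterparts in other language editions. The function name is invented; the endpoint and parameters are the standard MediaWiki query API.

```python
# Sketch: fetch interlanguage links for an article via the MediaWiki API.
# Illustrative only; Omnipedia's actual alignment pipeline is more involved.
import requests

def interlanguage_links(title, source_lang="en", limit=500):
    """Return {language_code: title} for the given article's counterparts."""
    resp = requests.get(
        f"https://{source_lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "langlinks",
            "titles": title,
            "lllimit": limit,
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    links = {}
    for page in resp.json()["query"]["pages"].values():
        for ll in page.get("langlinks", []):
            links[ll["lang"]] = ll["*"]
    return links

if __name__ == "__main__":
    print(interlanguage_links("Encyclopedia"))
```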
- Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation
Jian Chen, Minghao Zhao, and Zhenhua Li (Tsinghua University); Ennan Zhai (Alibaba Group); Feng Qian (University of Minnesota, Twin Cities); Hongyi Chen (Tsinghua University); Yunhao Liu (Michigan State University & Tsinghua University); Tianyin Xu (University of Illinois Urbana-Champaign). In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST '20), February 25–27, 2020, Santa Clara, CA, USA. https://www.usenix.org/conference/fast20/presentation/chen
Abstract: This paper studies how today's cloud storage services support collaborative file editing. As a tradeoff for transparency/user-friendliness, they do not ask collaborators to use version control systems but instead implement their own heuristics for handling conflicts, which however often lead to unexpected…
Observed failure patterns (affecting all studied cloud storage services):
Pattern 1: Losing updates. Alice is editing a file. Suddenly, her file is overwritten by a new version from her collaborator, Bob. Sometimes, Alice can even lose her edits on the older version.
Pattern 2: Conflicts despite coordination. Alice coordinates her edits with Bob through emails to avoid conflicts by enforcing a sequential order…
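The title's "operation inference" refers to recovering edit operations from file versions rather than having clients report them. As a loose stand-in for that step (not the authors' system), the Python sketch below uses difflib to infer insert/delete/replace operations between a base version and an edited version of a plain-text file; the function name and operation tuple format are assumptions made for this example.

```python
# Sketch: infer edit operations between two versions of a text file.
# Toy illustration of "operation inference"; not the paper's actual system.
import difflib

def infer_operations(base: str, edited: str):
    """Return a list of (op, base_start, base_end, new_text) edits."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=edited)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            ops.append(("insert", i1, i1, edited[j1:j2]))
        elif tag == "delete":
            ops.append(("delete", i1, i2, ""))
        elif tag == "replace":
            ops.append(("replace", i1, i2, edited[j1:j2]))
    return ops

if __name__ == "__main__":
    base = "hello world\n"
    alice = "hello brave world\n"   # Alice inserts "brave "
    bob = "hello world!\n"          # Bob appends "!"
    print(infer_operations(base, alice))  # e.g. [('insert', 6, 6, 'brave ')]
    print(infer_operations(base, bob))    # e.g. [('insert', 11, 11, '!')]
```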
- Good Faith Collaboration: The Culture of Wikipedia
Joseph Michael Reagle Jr., foreword by Lawrence Lessig. The MIT Press, Cambridge, MA. Web edition, copyright © 2011 by Joseph Michael Reagle Jr., CC-NC-SA 3.0.
Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy.
Praise for Good Faith Collaboration: "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community makes him uniquely situated to describe this culture." —Cory Doctorow, Boing Boing. "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are well presented.…
Contents: Foreword; Preface to the Web Edition; Praise for Good Faith Collaboration; Preface; Extended Table of Contents; 1. Nazis and Norms; 2. The Pursuit of the Universal Encyclopedia; 3. Good Faith Collaboration; 4. The Puzzle of Openness…
- Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science
Misha Teplitskiy (Dept. of Sociology and KnowledgeLab, University of Chicago), Grace Lu (KnowledgeLab, University of Chicago), Eamon Duede (Computation Institute and KnowledgeLab, University of Chicago). Forthcoming in the Journal of the Association for Information Science and Technology.
Abstract: With the rise of Wikipedia as a first-stop source for scientific knowledge, it is important to compare its representation of that knowledge to that of the academic literature. Here we identify the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles in total) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal's academic status (impact factor) and accessibility (open access policy) both strongly increase the probability of its being referenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to paywall journals. One of the implications of this study is that a major consequence of open access policies is to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad audience.
Introduction: Wikipedia, one of the most visited websites in the world, has become a destination for information of all kinds, including information about science (Heilman & West, 2015; Laurent & Vickers, 2009; Okoli, Mehdi, Mesgari, Nielsen, & Lanamäki, 2014; Spoerri, 2007).…
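To make the headline number concrete, the short Python sketch below shows how an odds ratio around 1.47 (that is, roughly 47% higher odds) arises from a 2x2 table of journal counts. The counts are invented purely for illustration, and the paper's actual figure comes from a regression that controls for field and impact factor rather than from a raw table.

```python
# Sketch: computing an odds ratio for "referenced on Wikipedia" by access type.
# The counts below are made up purely to illustrate the arithmetic.

def odds_ratio(referenced_oa, not_referenced_oa, referenced_paywall, not_referenced_paywall):
    """Odds of being referenced for open-access journals relative to paywall journals."""
    odds_oa = referenced_oa / not_referenced_oa
    odds_paywall = referenced_paywall / not_referenced_paywall
    return odds_oa / odds_paywall

if __name__ == "__main__":
    # Hypothetical counts of journals in each cell of the 2x2 table.
    ratio = odds_ratio(referenced_oa=300, not_referenced_oa=200,
                       referenced_paywall=2040, not_referenced_paywall=2000)
    print(f"odds ratio = {ratio:.2f}")  # ~1.47, i.e. about 47% higher odds
```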
- Linfield College, Computer Science Department. COMP 161: Basic Programming. Syllabus, Spring 2015. Prof. Daniel Ford.
Course description: Introduces the basic concepts of programming: reading and writing unambiguous descriptions of sequential processes. Emphasizes introductory algorithmic strategies and corresponding structures. (QR). COMP 160 is required for a major in electronic arts and is a prerequisite for COMP 161 and all other required courses for a major or minor in computer science. Extensive hands-on experience with programming projects. Lecture/discussion, lab.
Prerequisites: COMP 160. Credits: 3.
Lectures: Renshaw 211, Section 1, MWF 2:00–2:50 PM. Workshops: Renshaw 211, Mondays and Wednesdays, 8–9 PM. Additional help and tutoring is available by request and through the learning center.
Professor: Daniel Ford. Office: Renshaw 210. Phone: x2706. Office hours: MWF 10:30–11:30 AM, TTh 2:00–3:00 PM, and by appointment.
Recommended books:
§ Cay S. Horstmann, Big Java, http://www.horstmann.com/bigjava.html. Recommended for COMP 160 & 161. Wiley 5th Edition (2012, Early Objects) ~$90; 4th Edition, 3rd Edition, …, 2nd Edition (2005) ~$10.
§ Y. Daniel Liang, Introduction to Java Programming, Comprehensive Version, http://cs.armstrong.edu/liang/intro9e/. Recommended for COMP 160 & 161. Faster, more advanced, but with fewer tips than Big Java. 9th Edition (2012) ~$90; 8th Edition, …, 6th Edition (2006) ~$10.
§ Cay S. Horstmann & Gary Cornell, Core Java™, Volume I – Fundamentals and Volume II – Advanced Features (9th Edition), 2012, http://www.horstmann.com/corejava.html. The best reference manuals for Java. Recommended for CS majors and professional Java programmers.
- Wikipedia Editing History in DBpedia: Extracting and Publishing the Encyclopedia Editing Activity as Linked Data
Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin. Université Côte d'Azur, Inria, CNRS, I3S; Wimmics, Sophia Antipolis, France. IEEE/WIC/ACM International Joint Conference on Web Intelligence (WI '16), October 2016, Omaha, United States. HAL Id: hal-01359575, https://hal.inria.fr/hal-01359575, submitted 2 September 2016.
Abstract: DBpedia is a huge dataset essentially extracted from the content and structure of Wikipedia. We present a new extraction producing a linked data representation of the editing history of Wikipedia pages. This supports custom querying and combining with other data, providing new indicators and insights.
From the paper: For example, the French editing history dump represents 2 TB of uncompressed data. This data extraction is performed by stream in Node.js with a MongoDB instance. It takes 4 days to extract 55 GB of RDF in Turtle on 8 Intel(R) Xeon(R) CPU E5-…
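As a rough idea of what "editing history as linked data" can look like, here is a small Python/rdflib sketch that turns revision metadata into RDF triples serialized as Turtle. The EX namespace, property names, and example revision are invented for illustration; they are not the vocabulary or data used by the authors, whose extraction runs as a Node.js stream over the full dumps.

```python
# Sketch: representing Wikipedia revision metadata as RDF triples with rdflib.
# The EX namespace and property names are hypothetical, not the authors' vocabulary.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/wikihistory/")

def revision_to_triples(graph, page_title, rev_id, author, timestamp, comment):
    """Add one revision of a Wikipedia page to the graph."""
    page = URIRef(f"http://en.wikipedia.org/wiki/{page_title}")
    rev = URIRef(f"http://example.org/revision/{rev_id}")
    graph.add((rev, RDF.type, EX.Revision))
    graph.add((rev, EX.revises, page))
    graph.add((rev, EX.author, Literal(author)))
    graph.add((rev, EX.timestamp, Literal(timestamp, datatype=XSD.dateTime)))
    graph.add((rev, EX.comment, Literal(comment)))

if __name__ == "__main__":
    g = Graph()
    g.bind("ex", EX)
    revision_to_triples(g, "Encyclopedia", 1234567, "ExampleUser",
                        "2016-09-02T12:00:00Z", "fixed a typo")
    print(g.serialize(format="turtle"))
```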
- The Case of Croatian Wikipedia: Encyclopaedia of Knowledge or Encyclopaedia for the Nation?
Authorial statement: This report represents the evaluation of the Croatian disinformation case by an external expert on the subject matter who, after conducting a thorough analysis of the Croatian community setting, provides three recommendations to address the ongoing challenges. The views and opinions expressed in this report are those of the author and do not necessarily reflect the official policy or position of the Wikimedia Foundation. The Wikimedia Foundation is publishing the report for transparency.
Executive summary: Croatian Wikipedia (Hr.WP) has been struggling with content and conduct-related challenges, causing repeated concerns in the global volunteer community for more than a decade. With the support of the Wikimedia Foundation Board of Trustees, the Foundation retained an external expert to evaluate the challenges faced by the project. The evaluation, conducted between February and May 2021, sought to assess whether there have been organized attempts to introduce disinformation into Croatian Wikipedia and whether the project has been captured by ideologically driven users who are structurally misaligned with Wikipedia's five pillars, which guide the traditional editorial setup of Wikipedia projects. Croatian Wikipedia represents the Croatian standard variant of the Serbo-Croatian language. Unlike other pluricentric Wikipedia language projects, such as English, French, German, and Spanish, the Serbo-Croatian Wikipedia community was split into Croatian, Bosnian, Serbian, and the original Serbo-Croatian wikis starting in 2003. The report concludes that this structure enabled local language communities to sort by points of view on each project, often falling along political party lines in the respective regions.
- Language-Agnostic Relation Extraction from Abstracts in Wikis
Nicolas Heist, Sven Hertling and Heiko Paulheim. Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany. Received: 5 February 2018; Accepted: 28 March 2018; Published: 29 March 2018.
Abstract: Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph.
Keywords: relation extraction; knowledge graphs; Wikipedia; DBpedia; DBkWik; Wiki farms
1. Introduction: Large-scale knowledge graphs, such as DBpedia [1], Freebase [2], Wikidata [3], or YAGO [4], are usually built using heuristic extraction methods, by exploiting crowd-sourcing processes, or both [5].…
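To illustrate the flavor of the modeling step described in the abstract, the sketch below trains a scikit-learn random forest on a toy matrix of language-independent features, where each row stands for a candidate (subject, object) pair from an abstract and the label records whether the knowledge graph already asserts the relation (the distant-supervision signal). The feature names and data are invented for this example and are not the features or results from the paper.

```python
# Sketch: distant-supervision-style relation classification with a random forest.
# Feature values and labels below are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical language-independent features per candidate (subject, object) pair:
# [token offset of object entity, ontology type match (0/1),
#  number of relations the pair already shares, sentence index]
X = np.array([
    [0, 1, 3, 0],
    [5, 1, 1, 1],
    [9, 0, 0, 2],
    [2, 1, 2, 0],
    [7, 0, 0, 3],
    [1, 1, 4, 0],
    [8, 0, 1, 2],
    [3, 1, 2, 1],
])
# Distant-supervision labels: 1 if the knowledge graph already contains the relation.
y = np.array([1, 1, 0, 1, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
print("predicted relation for a new pair:", clf.predict([[4, 1, 2, 1]])[0])
```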