Automatic Subject Indexing of Library Records with Wikipedia Concepts

Total Page:16

File Type:pdf, Size:1020Kb

Automatic Subject Indexing of Library Records with Wikipedia Concepts View metadata, citation and similar papers at core.ac.uk brought to you by CORE Accepted for Publication provided by University of Limerick Institutional Repository By the Journal of Information Science: http://jis.sagepub.co.uk Article Journal of Information Science 1–11 Towards Linking Libraries and © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav Wikipedia: Automatic Subject DOI: 10.1177/0165551510000000 Indexing of Library Records with jis.sagepub.com Wikipedia Concepts Journal of Information Science 1–11 © The Author(s) 2013 Reprints and permission: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1550059413486272 Arash Joorabchi jis.sagepub.com Department of Electronic and Computer Engineering, University of Limerick, Ireland Abdulhussain E. Mahdi Department of Electronic and Computer Engineering, University of Limerick, Ireland Abstract In this article, we first argue the importance and timely need of linking libraries and Wikipedia for improving the quality of their services to information consumers, as such linkage will enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources which are currently overlooked to a large degree. We then describe the development of an automatic system for subject indexing of library metadata records with Wikipedia concepts as an important step towards Library-Wikipedia integration. The proposed system is based on first identifying all Wikipedia concepts occurring in the metadata elements of library records. This is then followed by training and deploying generic machine learning algorithms to automatically select those concepts which most accurately reflect the core subjects of the library materials whose records are being indexed. We have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a dataset consisting of 100 library metadata records manually indexed with a total of 469 Wikipedia concepts. The evaluation results show that the developed system is capable of achieving an averaged F-score as high as 0.92. Keywords Text mining; metadata generation; subject metadata; library metadata; bibliographic records; Wikipedia 1. Introduction Over the last decade, libraries have experienced a steady decline in the number of users of their online catalogues and websites, which in turn translates into a substantial decrease in the number of information consumers such as students and scholars who use library resources for their information needs. Reportedly, very few information searches (~1%) start from library websites, while the great majority of the rest of information seeking activities (84%) start from search engines such as Google (62%) [1]. On the other hand, Google-Wikipedia is becoming a prevalent information-seeking paradigm in which the information seeker submits an informational query, i.e., query on a particular topic, subject, or concept, to Google and then follows one of the search results redirecting to a relevant article on Wikipedia. Reportedly, Wikipedia appears on page one of the Google search results for 60% of informational queries and in 66% of such cases it appears in top-visibility positions (1-3) of the results page, where majority of clicks occur [2]. Wikipedia is the world’s largest web-based free-content encyclopedia project. The English Wikipedia alone currently contains more than four million articles [3]. Wikipedia articles are written, edited, and kept up-to-date and accurate (to a large degree) by a vast community of volunteer contributors, editors, and administrators who are collectively called Wikipedians. An investigation conducted by Nature in 2005 [4] suggested that Wikipedia comes close to Encyclopaedia Britannica in terms of the accuracy of its science entries, which was later disputed by the Britannica [5]. However, regardless of occasional controversies around the accuracy of its articles, Wikipedia is serving a significant role in Corresponding author: Abdulhussain E. Mahdi, Department of Electronic and Computer Engineering, University of Limerick, Limerick, Republic of Ireland. Email: [email protected] Accepted for Publication By the Journal of Information Science: http://jis.sagepub.co.uk Joorabchi et al 2 fulfilling public information needs. For example, results of a nationwide survey conducted in the U.S. in 2007 showed that Wikipedia attracted six times more traffic than the next closest website in the ‘educational and reference’ category and preceded websites such as Google Scholar and Google Books with a large margin [6]. In the context described above, linking Wikipedia articles to the records of relevant library materials will give information seekers the option to readily acquire lists of library resources which provide them with more in-depth knowledge on their subject of interest. In this paradigm each Wikipedia article will be linked to the records of relevant materials in a global union catalogue of libraries around the world, WorldCat.org, which in turn provides bibliographic information on the materials of interest and directs information seekers to their local libraries, where they can access those materials. Introduction of this new Wikipedia-Library information seeking paradigm will consequently improve the visibility of library resources which are currently overlooked to a large extend by those information consumers with lower information literacy skills. Looking at it from another perspective, in this paradigm Wikipedia plays the role of a new controlled vocabulary for subject indexing of library materials as an alternative (and/or compliment) to the traditional controlled vocabularies currently used in libraries, such as the Library of Congress Subject Headings (LCSH). As a controlled vocabulary, Wikipedia offers a number of unique advantages over traditional controlled vocabularies: (1) Extensive coverage and comprehensiveness: the English Wikipedia currently contains over 4 million articles covering subjects in all aspects of human knowledge and growing. Whereas, the latest version of LCSH (33rd edition), which is the de facto standard controlled vocabulary used in libraries around the world, contains approximately a total of 337,000 subject headings and references [7]. (2) Up-to-dateness: due to the crowd-sourced nature of Wikipedia and its large pool of editors, Wikipedia articles are generally well-maintained and kept quite up-to-date. For example, a recent study examining the potential of combining Twitter and Wikipedia data for event detection shows that in case of major events Wikipedia lags Twitter only by about three hours [8]. (3) Rich description: Wikipedia articles provide rich descriptive content for the represented concepts. Whereas, traditional library controlled vocabularies offer very little or no description for their subject headings. For example, the LCSH authority record for the subject heading ‘Metadata’1 offers only two pieces of information about this subject: (a) the subject may also be referred to by ‘data about data’ and ‘meta-data’; (b) the subject is related to two more specific subjects ‘Dublin Core’ and ‘Preservation metadata’. In contrast, the Wikipedia article for the concept of ‘metadata’2 provides a rich description of the concept including definition, variations, and applications complemented with links and references to relevant materials and related concepts. Therefore, extending the metadata records of library materials with Wikipedia concepts gives the users of online library catalogues the option to find descriptive information about unfamiliar indexing subjects that they may encounter as they browse and explore the collection. (4) Semantic richness: like the LC subject headings, each Wikipedia article has a descriptor which is the preferred and most commonly used term for the represented concept and it is also assigned a set of non-descriptors which are the less common synonyms and alternative lexical forms for the concept. Also, similar to the concept of ‘Related Terms’ in library controlled vocabularies, in Wikipedia, related articles are connected via hyperlinks. Furthermore, each Wikipedia article is classified according to the Wikipedia’s own community-built classification scheme into one or more broader categories which resembles the concept of ‘Broader Terms/Narrower Terms’ used in the library controlled vocabularies. Finally, similar to the library controlled vocabularies, Wikipedia addresses the problem of word-sense ambiguity by allowing an ambiguous term to correspond to multiple concepts each representing a different sense of the term, e.g., Java (programming language), Java (town), Java (band), etc. (5) Multilingual: as of September 2013 Wikipedia exists in more than 285 languages. Wikipedia has more than one million articles in each of the 8 most populated languages and more than one hundred thousand articles in each of the 38 less populated languages [9]. This high level of multilingualism in Wikipedia would allow the design and development of effective multilingual information retrieval systems for libraries once their metadata records are indexed with Wikipedia concepts. According to a report published by OCLC (Online Computer Library Centre) in 2009 [10], the majority of end users of online library catalogues surveyed considered adding more subject information to the metadata records of library materials to be the most helpful enhancement in relation to improving the discovery-related data quality
Recommended publications
  • Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers Abstract
    THE YALE LAW JOURNAL FORUM O CTOBER 9 , 2017 Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers abstract. The problem of fake news impacts a massive online ecosystem of individuals and organizations creating, sharing, and disseminating content around the world. One effective ap- proach to addressing false information lies in monitoring such information through an active, engaged volunteer community. Wikipedia, as one of the largest online volunteer contributor communities, presents one example of this approach. This Essay argues that the existing legal framework protecting intermediary companies in the United States empowers the Wikipedia community to ensure that information is accurate and well-sourced. The Essay further argues that current legal efforts to weaken these protections, in response to the “fake news” problem, are likely to create perverse incentives that will harm volunteer engagement and confuse the public. Finally, the Essay offers suggestions for other intermediaries beyond Wikipedia to help monitor their content through user community engagement. introduction Wikipedia is well-known as a free online encyclopedia that covers nearly any topic, including both the popular and the incredibly obscure. It is also an encyclopedia that anyone can edit, an example of one of the largest crowd- sourced, user-generated content websites in the world. This user-generated model is supported by the Wikimedia Foundation, which relies on the robust intermediary liability immunity framework of U.S. law to allow the volunteer editor community to work independently. Volunteer engagement on Wikipedia provides an effective framework for combating fake news and false infor- mation. 358 wikipedia and intermediary immunity: supporting sturdy crowd systems for producing reliable information It is perhaps surprising that a project open to public editing could be highly reliable.
    [Show full text]
  • The Case of Wikipedia Jansn
    UvA-DARE (Digital Academic Repository) Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system Niederer, S.; van Dijck, J. DOI 10.1177/1461444810365297 Publication date 2010 Document Version Submitted manuscript Published in New Media & Society Link to publication Citation for published version (APA): Niederer, S., & van Dijck, J. (2010). Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society, 12(8), 1368-1387. https://doi.org/10.1177/1461444810365297 General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:27 Sep 2021 Full Title: Wisdom of the Crowd or Technicity of Content? Wikipedia as a socio-technical system Authors: Sabine Niederer and José van Dijck Sabine Niederer, University of Amsterdam, Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands [email protected] José van Dijck, University of Amsterdam, Spuistraat 210, 1012 VT Amsterdam, The Netherlands [email protected] Authors’ Biographies Sabine Niederer is PhD candidate in Media Studies at the University of Amsterdam, and member of the Digital Methods Initiative, Amsterdam.
    [Show full text]
  • Critical Point of View: a Wikipedia Reader
    w ikipedia pedai p edia p Wiki CRITICAL POINT OF VIEW A Wikipedia Reader 2 CRITICAL POINT OF VIEW A Wikipedia Reader CRITICAL POINT OF VIEW 3 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink and Nathaniel Tkacz Editorial Assistance: Ivy Roberts, Morgan Currie Copy-Editing: Cielo Lutino CRITICAL Design: Katja van Stiphout Cover Image: Ayumi Higuchi POINT OF VIEW Printer: Ten Klei Groep, Amsterdam Publisher: Institute of Network Cultures, Amsterdam 2011 A Wikipedia ISBN: 978-90-78146-13-1 Reader EDITED BY Contact GEERT LOVINK AND Institute of Network Cultures NATHANIEL TKACZ phone: +3120 5951866 INC READER #7 fax: +3120 5951840 email: [email protected] web: http://www.networkcultures.org Order a copy of this book by sending an email to: [email protected] A pdf of this publication can be downloaded freely at: http://www.networkcultures.org/publications Join the Critical Point of View mailing list at: http://www.listcultures.org Supported by: The School for Communication and Design at the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam DMCI), the Centre for Internet and Society (CIS) in Bangalore and the Kusuma Trust. Thanks to Johanna Niesyto (University of Siegen), Nishant Shah and Sunil Abraham (CIS Bangalore) Sabine Niederer and Margreet Riphagen (INC Amsterdam) for their valuable input and editorial support. Thanks to Foundation Democracy and Media, Mondriaan Foundation and the Public Library Amsterdam (Openbare Bibliotheek Amsterdam) for supporting the CPOV events in Bangalore, Amsterdam and Leipzig. (http://networkcultures.org/wpmu/cpov/) Special thanks to all the authors for their contributions and to Cielo Lutino, Morgan Currie and Ivy Roberts for their careful copy-editing.
    [Show full text]
  • Robots.Txt: an Ethnographic Investigation of Automated Software Agents in User-Generated Content Platforms
    robots.txt: An Ethnographic Investigation of Automated Software Agents in User-Generated Content Platforms By Richard Stuart Geiger II A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy In Information Management and Systems and the Designated Emphasis in New Media in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Jenna Burrell, Chair Professor Coye Cheshire Professor Paul Duguid Professor Eric Paulos Fall 2015 robots.txt: An Ethnographic Investigation of Automated Software Agents in User-Generated Content Platforms © 2015 by Richard Stuart Geiger II Freely licensed under the Creative Commons Attribution-ShareAlike 4.0 License (License text at https://creativecommons.org/licenses/by-sa/4.0/) Abstract robots.txt: An Ethnographic Investigation of Automated Software Agents in User-Generated Content Platforms by Richard Stuart Geiger II Doctor of Philosophy in Information Management and Systems with a Designated Emphasis in New Media University of California, Berkeley Professor Jenna Burrell, Chair This dissertation investigates the roles of automated software agents in two user-generated content platforms: Wikipedia and Twitter. I analyze ‘bots’ as an emergent form of sociotechnical governance, raising many issues about how code intersects with community. My research took an ethnographic approach to understanding how participation and governance operates in these two sites, including participant-observation in everyday use of the sites and in developing ‘bots’ that were delegated work. I also took a historical and case studies approach, exploring the development of bots in Wikipedia and Twitter. This dissertation represents an approach I term algorithms-in-the-making, which extends the lessons of scholars in the field of science and technology studies to this novel domain.
    [Show full text]
  • Operationalizing Conflict and Cooperation Between Automated Software Agents in Wikipedia: a Replication and Expansion of “Even Good Bots Fight”
    49 Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of “Even Good Bots Fight” R. STUART GEIGER, Berkeley Institute for Data Science, University of California, Berkeley AARON HALFAKER, Wikimedia Research, Wikimedia Foundation This paper replicates, extends, and refutes conclusions made in a study published in PLoS ONE (“Even Good Bots Fight”), which claimed to identify substantial levels of conflict between automated software agents (or bots) in Wikipedia using purely quantitative methods. By applying an integrative mixed-methods approach drawing on trace ethnography, we place these alleged cases of bot-bot conflict into context and arrive at a better understanding of these interactions. We found that overwhelmingly, the interactions previously characterized as problematic instances of conflict are typically better characterized as routine, productive, even collaborative work. These results challenge past work and show the importance of qualitative/quantitative collaboration. In our paper, we present quantitative metrics and qualitative heuristics for operationalizing bot-bot conflict. We give thick descriptions of kinds of events that present as bot-bot reverts, helping distinguish conflict from non-conflict. We computationally classify these kinds of events through patterns in edit summaries. By interpreting found/trace data in the socio-technical contexts in which people give that data meaning, we gain more from quantitative measurements, drawing deeper understandings about the governance of algorithmic systems in Wikipedia. We have also released our data collection, processing, and analysis pipeline, to facilitate computational reproducibility of our findings and to help other researchers interested in conducting similar mixed-method scholarship in other platforms and contexts.
    [Show full text]
  • How English Wikipedia Moderates Harmful Speech
    Content and Conduct: How English Wikipedia Moderates Harmful Speech The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Clark, Justin, Robert Faris, Urs Gasser, Adam Holland, Hilary Ross, and Casey Tilton. Content and Conduct: How English Wikipedia Moderates Harmful Speech. Berkman Klein Center for Internet & Society, Harvard University, 2019. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:41872342 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA CONTENT AND CONDUCT: HOW ENGLISH WIKIPEDIA MODERATES HARMFUL SPEECH NOV 2019 JUSTIN CLARK ROBERT FARIS URS GASSER ADAM HOLLAND HILARY ROSS CASEY TILTON Acknowledgments The authors would like to thank the following people for their critical contributions and support: The 16 interview participants for volunteering their time and knowledge about Wikipedia content moderation; Isaac Johnson, Jonathan Morgan, and Leila Zia for their guidance with the quantitative research methodology; Patrick Earley for his guidance and assistance with recruiting volunteer Wikipedians to be interviewed; Amar Ashar, Chinmayi Arun, and SJ Klein for their input and feedback throughout the study; Jan Gerlach and other members of the Legal and Research teams at the Wikimedia Foundation for their guidance in scoping the study and for providing feedback on drafts of the report; and the Wikimedia Foundation for its financial support. Executive Summary In this study, we aim to assess the degree to which English-language Wikipedia is successful in addressing harmful speech with a particular focus on the removal of deleterious content.
    [Show full text]
  • Wikipedia and Wikis Haider, Jutta
    Wikipedia and Wikis Haider, Jutta; Sundin, Olof Published in: The Handbook of Peer Production DOI: 10.1002/9781119537151.ch13 2021 Document Version: Peer reviewed version (aka post-print) Link to publication Citation for published version (APA): Haider, J., & Sundin, O. (2021). Wikipedia and Wikis. In M. O'Neil, C. Pentzold, & S. Toupin (Eds.), The Handbook of Peer Production (pp. 169-184). (Wiley Handbooks in Communication and Media Series ). Wiley. https://doi.org/10.1002/9781119537151.ch13 Total number of authors: 2 Creative Commons License: CC BY-NC-ND General rights Unless other specific re-use rights are stated the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Read more about Creative commons licenses: https://creativecommons.org/licenses/ Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. LUND UNIVERSITY PO Box 117 221 00 Lund +46 46-222 00 00 This is the author’s version of a chapter accepted for publication in the Handbook of Peer Production.
    [Show full text]
  • The Handbook of Peer Production Chapter 13: Wikipedia and Wikis
    This is the author’s version of a chapter accepted for publication in the Handbook of Peer Production. Changes resulting from the publishing process such as copy-editing, typesetting, and other quality control mechanisms may not be reflected in this document. This author manuscript version is available for personal, non-commercial and no derivative uses only. Citation: Haider, J. & Sundin, O. (2021). Wikipedia and wikis. In: M. O’Neil, C. Pentzold & S. Toupin (Eds.), The Handbook of Peer Production (pp. 169-184). Malden, MA: Wiley-Blackwell. ISBN 9781119537106 Available at: https://www.wiley.com/en-au/The+Handbook+of+Peer+Production-p-9781119537090 The Handbook of Peer Production Chapter 13: Wikipedia and Wikis Jutta Haider, University of Borås, Sweden & Olof Sundin, Lund University, Sweden Chapter 13 – Wikis and Wikipedia 2 1. Introduction WikiWikiWebs or wikis constitute the core platform of peer production. Wikis are software programs allowing for flexible collaboration without necessarily having a defined content owner or leader. A wiki is a user-editable website or content management system. Wikis might use different programming languages and licenses, but they apply the same model for cooperation, which means that collaborators can modify content, insert hyperlinks, and change the structure of a document directly in any web browser. Edits are usually archived and open to revision for all collaborators. The most popular and successful wiki-based project by far is Wikipedia. Established in 2001, today this encyclopedia is one of the most popular sites on the Web; what is more, with its data supporting other applications and commercial platforms like Google, Wikipedia has taken on an infrastructural role in the contemporary internet.
    [Show full text]
  • The Roles Bots Play in Wikipedia
    The Roles Bots Play in Wikipedia LEI (NICO) ZHENG, Stevens Institute of Technology, USA CHRISTOPHER M. ALBANO, Stevens Institute of Technology, USA NEEV M. VORA, Stevens Institute of Technology, USA FENG MAI, Stevens Institute of Technology, USA JEFFREY V. NICKERSON, Stevens Institute of Technology, USA Bots are playing an increasingly important role in the creation of knowledge in Wikipedia. In many cases, editors and bots form tightly knit teams. Humans develop bots, argue for their approval, and maintain them, performing tasks such as monitoring activity, merging similar bots, splitting complex bots, and turning off malfunctioning bots. Yet this is not the entire picture. Bots are designed to perform certain functions and can acquire new functionality over time. They play particular roles in the editing process. Understanding these roles is an important step towards understanding the ecosystem, and designing better bots and interfaces between bots and humans. This is important for understanding Wikipedia along with other kinds of work in which autonomous machines affect tasks performed by humans. In this study, we use unsupervised learning to build a nine category taxonomy of bots based on their functions in English Wikipedia. We then build a multi-class classifier to classify 1,601 bots based on labeled data. We discuss different bot activities, including their edit frequency, their working spaces, and their software evolution. We use a model to investigate how bots playing certain roles will have differential effects on human editors. In particular, we build on previous research on newcomers by studying the relationship between the roles bots play, the interactions they have with newcomers, and the ensuing survival rate of the newcomers.
    [Show full text]
  • Wikipedia: a Republic of Science Democratized
    CHEN_ART_FINAL.DOCX 5/18/2010 1:29 PM WIKIPEDIA: A REPUBLIC OF SCIENCE DEMOCRATIZED * Shun-Ling Chen TABLE OF CONTENTS I. INTRODUCTION ........................................................................... 249 II. WHAT IS WIKIPEDIA AND HOW DOES IT WORK? ...................... 252 III. THE COMMONS-COLLABORATIVE MODEL AND ITS ORIGIN..... 255 IV. THE COMMONS-COLLABORATIVE MODEL IN WIKIPEDIA AND ITS NEW REPUBLIC ...................................................... 262 A. Anyone Can Edit ............................................................ 266 B. Anonymity and User Privacy ........................................ 269 C. Copyright, Collaboration and a Free Encyclopedia ...... 270 D. Community Self-Governance and Meritocracy ............ 271 E. Content Policy and Quality Control .............................. 273 V. WIKIPEDIA AND ITS BOUNDARY-WORK..................................... 279 A. Defense Against External Pressure .............................. 282 1. Criticism: Wikipedia is Vulnerable to Vandalism .. 282 2. Criticism: Wikipedia Is Untrustworthy For Academic Citation ................................................... 284 3. Criticism: Wikipedia is Prone to be Abused For Tolerating Anonymity ............................................. 285 4. Criticism: Wikipedia Disrespects Expertise ........... 288 * S.J.D candidate at Harvard Law School; LL.M.. 2005, Harvard Law School. She thanks editors of the Albany Law Journal of Science & Technology for their meticulous work, and thanks T.B., Gabriella Coleman, William W. Fisher,
    [Show full text]
  • Largescale Communication Is More Complex and Unpredictable with Automated Bots
    BOTS AND PREDICTABILITY OF COMMUNICATION DYNAMICS Largescale Communication Is More Complex and Unpredictable with Automated Bots Martin Hilbert* & David Darmon^ * Communication, DE Computational Social Science; University of California, Davis, CA 95616; [email protected] ^ Department of Mathematics; Monmouth University, West Long Branch, NJ 07764, USA ABSTRACT Automated communication bots follow deterministic local rules that either respond to programmed instructions or learned patterns. On the micro-level, their automated and reactive behavior makes certain parts of the communication dynamic more predictable. Studying communicative turns in the editing history of Wikipedia, we find that on the macro-level the overall emergent communication process becomes both more complex and less predictable. The increased presence of bots is the main explanatory variable for these seemingly contradictory tendencies. In short, individuals introduce bots to make communication more simple and predictable, but end up with a largescale dynamic that is more complex and more uncertain. We explain our results with the information processing nature of complex systems. The article also serves as a showcase for the use of information theoretic measures from dynamical systems theory to assess changes in communication dynamics provoked by algorithms. Keywords: bots, predictability, complexity, Wikipedia, dynamical systems theory Acknowledgements: We would like to thank Summer Mielke, who explored the adequacy of a first version of this data source as part of an undergraduate research project. We would also like to thank Ryan James, who always had an open door and helped us in countless discussions with ideas and by removing doubts. 1 BOTS AND PREDICTABILITY OF COMMUNICATION DYNAMICS Largescale Communication Is More Complex and Unpredictable with Automated Bots Communication bots have become an integral part of today’s online landscape.
    [Show full text]
  • The Good, the Bot, and the Ugly: Problematic Information and Critical Media Literacy in the Postdigital Era
    Postdigital Science and Education https://doi.org/10.1007/s42438-019-00069-4 ORIGINAL ARTICLES The Good, the Bot, and the Ugly: Problematic Information and Critical Media Literacy in the Postdigital Era Jialei Jiang1 & Matthew A. Vetter1 # Springer Nature Switzerland AG 2019 Abstract This paper explores Wikipedia bots and problematic information in order to consider implications for cultivating students’ critical media literacy. While we recognize the key role of Wikipedia bots in addressing and reducing problematic information (misinforma- tion and disinformation) on the encyclopedia, it is ultimately reductive to construe bots as merely having benign impacts. In order to understand bots and other algorithms as more than just tools, we turn towards a postdigital theorization of these as ‘agents’ that co- produce knowledge in conjunction with human editors and actors. This paper presents case studies of three specific bots on Wikipedia, including ClueBot NG, AAlertbot, and COIBot, each of which engages in some type of information validation in the encyclo- pedia. The activities involving these bots, illustrated in these case studies, ultimately support our argument that information validation processes in Wikipedia are complicated by their distribution across multiple human-computer relations and agencies. Despite the programming of these bots for combating problematic information, their efficacy is challenged by social, cultural, and technical issues related to misogyny, systemic bias, and conflict of interest. Studying the function of Wikipedia bots makes space for extending educational models for critical media literacy. In the postdigital era of prob- lematic information, students should be on the alert for how the human and the nonhu- man, the digital and the nondigital, interfere and exert agency in Wikipedia’s complex and highly volatile processes of information validation.
    [Show full text]