The Impact of Research Data Sharing and Reuse on Data Citation in STEM Fields

University of Wisconsin Milwaukee
UWM Digital Commons
Theses and Dissertations
12-1-2018

The Impact of Research Data Sharing and Reuse on Data Citation in STEM Fields
Hyoungjoo Park, University of Wisconsin-Milwaukee

Follow this and additional works at: https://dc.uwm.edu/etd
Part of the Library and Information Science Commons

Recommended Citation
Park, Hyoungjoo, "The Impact of Research Data Sharing and Reuse on Data Citation in STEM Fields" (2018). Theses and Dissertations. 2005. https://dc.uwm.edu/etd/2005

This Dissertation is brought to you for free and open access by UWM Digital Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of UWM Digital Commons.

THE IMPACT OF RESEARCH DATA SHARING AND REUSE ON DATA CITATION IN STEM FIELDS
by
Hyoungjoo Park

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information Studies at The University of Wisconsin-Milwaukee
December 2018

ABSTRACT
THE IMPACT OF RESEARCH DATA SHARING AND REUSE ON DATA CITATION IN STEM FIELDS
by
Hyoungjoo Park
The University of Wisconsin-Milwaukee, 2018
Under the Supervision of Dr. Dietmar Wolfram

Despite the open science movement and mandates by major funding agencies and influential journals for sharing research data, citing shared and reused data has not become standard practice in the various science, technology, engineering and mathematics (STEM) fields. Advances in technology have lowered some barriers to data sharing, but data sharing is a socio-technical phenomenon, and the impact of the ongoing evolution in scholarly communication practices has yet to be quantified.
Furthermore, there is a need for a deeper and more nuanced understanding of author self-citation and recitation, the most frequently cited types of data, disciplinary differences in data citation, and the extent of interdisciplinarity in data citation.

This study employed a mixed-methods approach that combined coding with semi-automatic text-searching techniques to assess the impact of data sharing and reuse on data citation in STEM fields. The research considered over 500,000 open research data entities, such as datasets, software and data studies, from over 350 repositories worldwide. I also examined 705 bibliographic publications with a total of 15,261 instances of data sharing, reuse, and citation at the data, article, discipline and interdisciplinary levels. More specifically, I measured data sharing in terms of formal data citation, frequently cited data types, and author self-citation; I explored recitation at both the data and bibliographic levels; and I examined data reuse practices in bibliographies, associations among disciplines, and interdisciplinary contexts. The results of this research revealed, first of all, disciplinary differences in the impact of data sharing and reuse on data citation in STEM fields.
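The measures named above (author self-citation and frequently cited data types, compared across disciplines) can be made concrete with a few lines of code. The abstract does not say how these were operationalized, so the following Python sketch rests on stated assumptions: a data citation counts as an author self-citation when the citing publication shares at least one author name with the cited data entity, and data-type frequency is each type's share of all data citations. The `DataCitation` record and the function names are hypothetical, not the dissertation's own instruments.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """One link from a citing publication to a cited data entity (hypothetical schema)."""
    citing_authors: frozenset
    data_authors: frozenset
    discipline: str   # discipline of the citing publication
    data_type: str    # e.g. "dataset", "software", "data study"


def is_self_citation(c: DataCitation) -> bool:
    # Assumption: any shared author name makes the citation a self-citation.
    return not c.citing_authors.isdisjoint(c.data_authors)


def self_citation_rates(citations) -> dict:
    """Fraction of data citations that are author self-citations, per discipline."""
    totals, self_cites = Counter(), Counter()
    for c in citations:
        totals[c.discipline] += 1
        if is_self_citation(c):
            self_cites[c.discipline] += 1
    return {d: self_cites[d] / totals[d] for d in totals}


def data_type_shares(citations) -> list:
    """(data_type, share) pairs, most frequently cited type first."""
    counts = Counter(c.data_type for c in citations)
    total = sum(counts.values())
    return [(t, n / total) for t, n in counts.most_common()]


# Toy records with placeholder author names.
citations = [
    DataCitation(frozenset({"A"}), frozenset({"A", "B"}), "astronomy", "dataset"),
    DataCitation(frozenset({"C"}), frozenset({"D"}), "astronomy", "dataset"),
    DataCitation(frozenset({"E"}), frozenset({"E"}), "chemistry", "software"),
]
print(self_citation_rates(citations))  # {'astronomy': 0.5, 'chemistry': 1.0}
print(data_type_shares(citations))
```

Computing the rate per discipline rather than pooling all records is what makes a disciplinary comparison possible. Matching raw name strings is a deliberate simplification; real author disambiguation is considerably harder.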
This research also yielded the following additional findings regarding the citation of data by STEM researchers:
1) data sharing practices were diverse across disciplines;
2) data sharing has been increasing in recent years;
3) each discipline made use of major digital repositories;
4) these repositories took various forms depending on the discipline;
5) certain data types were cited more often in each discipline, so that the frequency distribution of data types was highly skewed;
6) author self-citation and recitation followed similar trends at the data and bibliographic levels, but specific practices varied within each discipline;
7) associations between and across data and author self-citation and recitation at the bibliographic level were observed, with the self-citation rate differing significantly among disciplines;
8) data reuse in bibliographies was rare yet diverse;
9) informal citation of data sharing and reuse at the bibliographic level was more common in certain fields, with astronomy/physics showing the highest proportion (98%) and technology the lowest (69%);
10) within bibliographic publications, data sharing and reuse were documented mainly in the main text;
11) publications in certain disciplines, such as chemistry, computing and engineering, did not attract citations from more than one field (i.e., showed no diversity); and
12) publications in other fields, by contrast, attracted a wide range of interdisciplinary data citations.

This dissertation thus contributes to the understanding of two key aspects of current citation systems. First, the findings have practical implications for individual researchers, decision makers, funding agencies and publishers with regard to giving due credit to those who share their data. Second, this research has methodological implications for reducing the labor required to analyze the full text of associated articles in order to identify evidence of data citation.
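The second implication concerns reducing the labor of reading full texts to find evidence of data citation. The abstract does not spell out the dissertation's text-searching procedure, so the following is only a minimal Python sketch of the general semi-automatic idea: pattern matching flags candidate sentences (a DOI, a repository name, or a data-availability phrase), and a human coder then confirms or rejects each one. The cue lists and the function `flag_data_mentions` are hypothetical illustrations, not the study's coding scheme, and the DOI in the sample text is a placeholder.

```python
import re

# Hypothetical cue lists; a real study would derive these from its own coding scheme.
REPOSITORY_NAMES = ["Dryad", "Zenodo", "figshare", "GenBank", "PANGAEA", "ICPSR"]
DATA_PHRASES = [
    r"data (?:are|is) available",
    r"data availability",
    r"deposited in",
    r"accession (?:numbers?|codes?)",
]

DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
REPO_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, REPOSITORY_NAMES)) + r")\b", re.IGNORECASE
)
PHRASE_PATTERN = re.compile("|".join(DATA_PHRASES), re.IGNORECASE)


def flag_data_mentions(full_text: str) -> list:
    """Return (sentence, reasons) pairs for sentences that may document data use.

    Flagged sentences are candidates for manual coding, not confirmed citations.
    """
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    flagged = []
    for sentence in sentences:
        reasons = []
        if DOI_PATTERN.search(sentence):
            reasons.append("doi")
        if REPO_PATTERN.search(sentence):
            reasons.append("repository")
        if PHRASE_PATTERN.search(sentence):
            reasons.append("phrase")
        if reasons:
            flagged.append((sentence, reasons))
    return flagged


text = (
    "Sequences were deposited in GenBank. We thank the reviewers. "
    "The dataset is available from Zenodo (https://doi.org/10.1000/xyz123)."
)
for sentence, reasons in flag_data_mentions(text):
    print(reasons, sentence)
```

Keeping a coder in the loop matters because cues such as repository names also occur in sentences that merely mention a resource without citing data; the script only narrows down what has to be read.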
© Copyright by Hyoungjoo Park, 2018
All Rights Reserved

To my parents and my sister

TABLE OF CONTENTS
Chapter 1 INTRODUCTION
  1.1. Research Problem and Motivation
  1.2. Significance of the Research
  1.3. Research Questions and Purpose
  1.4. Scope
  1.5. Definition of Terms
  1.6. Dissertation Structure
Chapter 2 LITERATURE REVIEW
  2.1. Introduction
  2.2. Metric Studies of Scientific Communication
    2.2.1. Scientometrics
    2.2.2. Citation Analysis
    2.2.3. Citation Counts
    2.2.4. Direct Citation
    2.2.5. Co-citation and Literature Mapping
    2.2.6. Bibliographic Coupling
    2.2.7. Scholarly Impact Assessment
    2.2.8. Journal Impact Factor
    2.2.9. Co-word Analysis
    2.2.10. Citer-based Analysis
  2.3. Open Science
    2.3.1. Open Access
    2.3.2. Open Access Journals
    2.3.3. Open Peer Review
    2.3.4. Open Data
  2.4. Data Sharing, Reuse and Citation
    2.4.1. Data Sharing
    2.4.2. Data Reuse
    2.4.3. Data Citation
  2.5. Software Sharing, Reuse and Citation
    2.5.1. Software Sharing
    2.5.2. Software
Recommended publications
  • How to Cite Datasets and Link to Publications
    A Digital Curation Centre ‘working level’ guide
    How to Cite Datasets and Link to Publications
    Alex Ball (DCC) and Monica Duke (DCC)
    Please cite as: Ball, A., & Duke, M. (2015). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides
    Digital Curation Centre, 2015. Licensed under Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/
    Introduction. This guide will help you create links between your academic publications and the underlying datasets, so that anyone viewing the publication will be able to locate the dataset and vice versa. It provides a working knowledge of the issues and challenges involved, and of how current approaches seek to address them. This guide should interest researchers and principal investigators working on data-led research, as well as the data repositories with which they work.
    Why cite datasets and link them to publications? The motivation to cite datasets arises from a recognition that data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs. […] mechanisms allowing authors to be open about their research while still receiving due credit; metrics used to translate such attributions into rewards for authors and their institutions; and archives ensuring that the work is permanently available for reference and reuse.⁵ If datasets are to be regarded as first-class records of research, as they need to be, a similar set of control […]
  • Theory and Practice of Data Citation
    Theory and Practice of Data Citation
    Gianmaria Silvello, Department of Information Engineering, University of Padua, Via Gradenigo 6/b, Padua, Italy; tel. +39 049 827 7500
    Abstract. Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming “data-intensive”, where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.
    Introduction. Citations are the cornerstone of knowledge propagation in science, the principal means of assessing the quality of research and directing investments in science as well as one of the pillars of the scholarship architecture.
  • Access to Citation Data: Cost-Benefit And
    Theme: Infrastructure
    Access to Citation Data: Cost-benefit and Risk Review and Forward Look
    Dr Max Hammond, Professor Charles Oppenheim, Dr Geoff Curtis
    September 2013
    Contents
    Executive summary
    1 Introduction
      1.1 Aim, scope and focus of the study
      1.2 Study approach
    2 Background - citation data and citation metrics
      2.1 Introduction
      2.2 What is a citation?
      2.3 The uses of citation data
      2.4 Citation data services
    […]
      5.2 Alternative approaches to process elements
      5.3 Analysis of future business model
    6 Conclusions and way forward
      6.1 Context
      6.2 Overview of the current situation
      6.3 Current approach
      6.4 Future systems
      6.5 Next steps
    7 References
  • A Data Citation Roadmap for Scholarly Data Repositories
    www.nature.com/scientificdata
    ARTICLE (OPEN)
    A data citation roadmap for scholarly data repositories
    Martin Fenner, Mercè Crosas, Jeffrey S. Grethe, David Kennedy, Henning Hermjakob, Phillippe Rocca-Serra, Gustavo Durand, Robin Berjon, Sebastian Karcher, Maryann Martone & Tim Clark
    Received: 2 October 2017; Accepted: 12 March 2019; Published: xx xx xxxx
    This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories. We describe the early adoption of these recommendations 18 months after they have first been published, looking specifically at implementations of machine-readable metadata on dataset landing pages.
    Introduction. The Joint Declaration of Data Citation Principles (JDDCP) published in 2014¹ and endorsed by a large number of scholarly and academic publishing organizations, lays out a set of principles on purpose, function and attributes of data citations. The first of these principles stresses that data should be considered legitimate, citable products of research². The JDDCP condenses the results of substantial prior studies on science policy and practice³–⁵.
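The DCC guide and the Fenner et al. roadmap excerpted above both concern linking datasets and publications in ways that people and machines can follow. As a hedged illustration only, the Python sketch below takes one hypothetical dataset record and produces (a) a human-readable citation whose element order (creator, year, title, version, publisher, persistent identifier) loosely follows common data-citation recommendations and (b) schema.org `Dataset` JSON-LD of the kind that can be embedded in a landing page, using schema.org's `citation` property to point at a related article. `DatasetRecord`, its fields, the function names, and the `10.1000/...` DOIs are placeholders invented for this example; the roadmap's own list of required and recommended metadata is not reproduced here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for one deposited dataset."""
    creators: List[str]            # "Family, Given" form
    year: int
    title: str
    publisher: str                 # usually the repository
    doi: str                       # bare DOI, e.g. "10.1000/xyz123"
    version: Optional[str] = None
    related_article_doi: Optional[str] = None


def format_citation(r: DatasetRecord) -> str:
    """Creator (Year). Title. Version. Publisher. Persistent identifier."""
    parts = [f"{'; '.join(r.creators)} ({r.year}). {r.title}."]
    if r.version:
        parts.append(f"Version {r.version}.")
    parts.append(f"{r.publisher}.")
    parts.append(f"https://doi.org/{r.doi}")
    return " ".join(parts)


def landing_page_jsonld(r: DatasetRecord) -> str:
    """schema.org Dataset markup to embed in the dataset's landing page."""
    doi_url = f"https://doi.org/{r.doi}"
    data = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": doi_url,
        "identifier": doi_url,
        "name": r.title,
        "creator": [{"@type": "Person", "name": c} for c in r.creators],
        "datePublished": str(r.year),
        "publisher": {"@type": "Organization", "name": r.publisher},
    }
    if r.version:
        data["version"] = r.version
    if r.related_article_doi:
        # Link from the dataset to the article associated with it.
        data["citation"] = f"https://doi.org/{r.related_article_doi}"
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


record = DatasetRecord(
    creators=["Doe, Jane"], year=2018, title="Example survey data",
    publisher="Example Repository", doi="10.1000/xyz123",
    version="2", related_article_doi="10.1000/abc456",
)
print(format_citation(record))
# Doe, Jane (2018). Example survey data. Version 2. Example Repository. https://doi.org/10.1000/xyz123
print(landing_page_jsonld(record))
```

Generating both outputs from a single record keeps the human-readable citation and the machine-readable markup from drifting apart. Which fields a given repository must actually expose is governed by its own policies and by the roadmap, not by this sketch.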