The Impact of Research Data Sharing and Reuse on Data Citation in STEM Fields
University of Wisconsin Milwaukee
UWM Digital Commons
Theses and Dissertations, 12-1-2018

Hyoungjoo Park, University of Wisconsin-Milwaukee

Part of the Library and Information Science Commons

Recommended Citation
Park, Hyoungjoo, "The Impact of Research Data Sharing and Reuse on Data Citation in STEM Fields" (2018). Theses and Dissertations. 2005. https://dc.uwm.edu/etd/2005

This Dissertation is brought to you for free and open access by UWM Digital Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of UWM Digital Commons.

THE IMPACT OF RESEARCH DATA SHARING AND REUSE ON DATA CITATION IN STEM FIELDS
by Hyoungjoo Park
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information Studies at The University of Wisconsin-Milwaukee
December 2018

ABSTRACT
THE IMPACT OF RESEARCH DATA SHARING AND REUSE ON DATA CITATION IN STEM FIELDS
by Hyoungjoo Park
The University of Wisconsin Milwaukee, 2018
Under the Supervision of Dr. Dietmar Wolfram

Despite the open science movement and mandates for the sharing of research data by major funding agencies and influential journals, the citation of data sharing and reuse has not become standard practice in the various science, technology, engineering and mathematics (STEM) fields. Advances in technology have lowered some barriers to data sharing, but it is a socio-technical phenomenon and the impact of the ongoing evolution in scholarly communication practices has yet to be quantified. Furthermore, there is a need for a deeper and more nuanced understanding of author self-citation and recitation, the most often cited types of data, disciplinary differences regarding data citation and the extent of interdisciplinarity in data citation.
This study employed a mixed methods approach that combined coding with semi-automatic text-searching techniques in order to assess the impact of data sharing and reuse on data citation in STEM fields. The research considered over 500,000 open research data entities, such as datasets, software and data studies, from over 350 repositories worldwide. I also examined 705 bibliographic publications with a total of 15,261 instances of data sharing, reuse, and citation at the data, article, discipline and interdisciplinary levels. More specifically, I measured the phenomenon of data sharing in terms of formal data citation, frequently cited data types, and author self-citation, and I explored recitation at both the data and bibliographic levels, as well as data reuse practices in bibliographies, associations of disciplines, and interdisciplinary contexts. The results of this research revealed, to begin with, disciplinary differences with regard to the impact of data sharing and reuse on data citation in STEM fields.
This research also yielded the following additional findings regarding the citation of data by STEM researchers: 1) data sharing practices were diverse across disciplines; 2) data sharing has been increasing in recent years; 3) each discipline made use of major digital repositories; 4) these repositories took various forms depending on the discipline; 5) certain data types were more often cited in each discipline, so that the frequency distribution of the data types was highly skewed; 6) author self-citation and recitation followed similar trends at the data and bibliographic levels, but specific practices varied within each discipline; 7) associations between and across data and author self-citation and recitation at the bibliographic level were observed, with the self-citation rate differing significantly among disciplines; 8) data reuse in bibliographies was rare yet diverse; 9) informal citation of data sharing and reuse at the bibliographic level was more common in certain fields, with astronomy/physics showing the highest proportion (98%) and technology the lowest (69%); 10) within bibliographic publications, the documentation of data sharing and reuse occurred mainly in the main text; 11) publications in certain disciplines, such as chemistry, computing and engineering, did not attract citations from more than one field (i.e., showed no diversity); and, on the other hand, 12) publications in other fields attracted a wide range of interdisciplinary data citations. This dissertation, then, contributes to the understanding of two key aspects of current citation systems. First, the findings have practical implications for individual researchers, decision makers, funding agencies and publishers with regard to giving due credit to those who share their data. Second, this research has methodological implications in terms of reducing the labor required to analyze the full text of associated articles in order to identify evidence of data citation.
© Copyright by Hyoungjoo Park, 2018
All Rights Reserved

To my parents and my sister

TABLE OF CONTENTS
Chapter 1 INTRODUCTION
1.1. Research Problem and Motivation
1.2. Significance of the Research
1.3. Research Questions and Purpose
1.4. Scope
1.5. Definition of Terms
1.6. Dissertation Structure
Chapter 2 LITERATURE REVIEW
2.1. Introduction
2.2. Metric Studies of Scientific Communication
2.2.1. Scientometrics
2.2.2. Citation Analysis
2.2.3. Citation Counts
2.2.4. Direct Citation
2.2.5. Co-citation and Literature Mapping
2.2.6. Bibliographic Coupling
2.2.7. Scholarly Impact Assessment
2.2.8. Journal Impact Factor
2.2.9. Co-word Analysis
2.2.10. Citer-based Analysis
2.3. Open Science
2.3.1. Open Access
2.3.2. Open Access Journals
2.3.3. Open Peer Review
2.3.4. Open Data
2.4. Data Sharing, Reuse and Citation
2.4.1. Data Sharing
2.4.2. Data Reuse
2.4.3. Data Citation
2.5. Software Sharing, Reuse and Citation
2.5.1. Software Sharing
2.5.2. Software