An Investigation of Descriptive Information Facilitators and Inhibitors
DATA SHARING AND DATA REUSE: AN INVESTIGATION OF DESCRIPTIVE INFORMATION FACILITATORS AND INHIBITORS

Angela Patricia Murillo

A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Information and Library Science.

Chapel Hill
2016

Approved by:
Jane Greenberg
Mohammed Hossein Jarrahi
Robert Losee
William Michener
Reagan Moore
Arcot Rajasekar

© 2016 Angela Patricia Murillo
ALL RIGHTS RESERVED

ABSTRACT

Angela Patricia Murillo: Data Sharing And Data Reuse: An Investigation Of Descriptive Information Facilitators And Inhibitors
(Under the direction of Jane Greenberg)

This dissertation examines how descriptive information inhibits or facilitates data sharing and reuse. DataONE serves as the test environment. The objective is to identify descriptive information made discoverable through DataONE and subsequently determine which of this descriptive information helps scientists determine data reusability. This study uses a mixed-methods approach, which includes a data profiling assessment in the form of a quantitative and qualitative content analysis and a quasi-experiment think-aloud. A quantitative and qualitative content analysis was conducted on a stratified sample of data extracted from DataONE to examine the types of descriptive information made available through the shared data. Participants searched a quasi-experiment interface and thought aloud about what information inhibited or facilitated their determination of data reusability. Additionally, participants completed a post result usefulness survey, post search rank order survey, and a post search factors survey. The quantitative and qualitative content analysis shows that the records in the shared data contain 30 unique pieces of descriptive information.
The quasi-experiment think-aloud indicates that scientists found certain pieces of descriptive information particularly useful for determining data reusability. These include: (a) the data description, (b) the attribute table, and (c) the research methods. In conclusion, metadata schema, member node standards, and community standards impact what types of descriptive information are provided through the shared data. Attribute and unit lists, research methods information, and succinctly written abstracts facilitate data reuse. However, long abstracts, the same information appearing in multiple places, and the exclusion of data descriptions inhibit data reuse. The findings and recommendations assist funding agencies and scientific organizations in understanding the current state of data being shared and prioritizing how to meet the needs of scientists regarding data reuse. This dissertation provides guidance to developers of current and future data sharing environments and infrastructures, research data management and scientific communities, scientific data managers, creators of data management plans, and funding agencies; and has implications beyond DataONE.

Dedicated To My Parents: Eddie William Murillo And Bertha Murillo

ACKNOWLEDGEMENTS

I am truly thankful and grateful to my committee: Drs. Jane Greenberg, Mohammed Hossein Jarrahi, Robert Losee, William Michener, Reagan Moore, and Arcot Rajasekar. I deeply appreciate all of your time, thoughts, ideas, and encouragement. Thank you to the rest of the SILS professors, particularly Dr. Diane Kelly and Dr. Barbara Wildemuth, for your helpful advice and encouragement along the way; thank you for your generosity, time, and guidance. Thank you to the community for the many research opportunities and researcher communities that I have had the opportunity to be a part of. Thank you to the SILS community and to the many opportunities for research, teaching, and service you’ve provided me.
Thank you to DataONE for all of the research opportunities, the opportunity to explore your data, and for your generous funding. Thank you to the Metadata Research Center, CODATA, the National Consortium for Data Science, and the Earth Science Information Partners for all of your generous research and funding opportunities. A special thank you to the DigCCurr Fellowship for funding so many years of my doctorate studies; thank you for the wonderful opportunity to work with you. And lastly, thank you UNC-Writing Center! Thank you to all the many great friends and colleagues that have travelled this long and wonderful path with me, particularly my doctoral cohort friends. I miss seeing all of your smiling, encouraging faces and I’m so glad we are still able to connect from time to time. It has been so wonderful to have you by my side during this long journey and to see where this journey has taken all of you too. Hugs and love :)

Thank you to my writing buddies, especially Rachael Clemens, Ericka Patillo, and Leslie Thomson; I will miss coffee shopping with you all. Thank you Dr. Ashlee Edwards, Sami Kaplan, Debbie Maron, Sarah Ramdeen, and Jewel Ward; I definitely couldn’t have finished this without each of your support. To those I’ve forgotten to acknowledge, sorry and thanks to you too! To triangle area coffee shops and libraries, I definitely couldn’t have done this without you. To all past friends and colleagues, I wish you all well too! To my outside of doctoral studies friends: Greg, Trish, Chris, Megan, Katelyn, Rachel, and Mindy, thank you all who listened to me endlessly talk about my research. Special thanks to Kjersti Kyle and Emily Zaentz; love you both! To my best girl, Xan, I could not have done this without you; thanks for putting up with endless hours of watching me work. To my family, especially my mom and sister, thank you for everything; I love you all always!
Lastly again, thank you to my wonderful committee for helping me through this process. And lastly, to Jane Greenberg, you are an amazing mentor, advisor, and friend and I will forever be in your debt.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I: INTRODUCTION
CHAPTER II: LITERATURE REVIEW
    DataNet and DataONE
        The DataNet
        Data Observation Network for Earth (DataONE)
        Research Studies Specific to DataONE
        Conclusion
    Data Sharing and Reuse in the Sciences
        Themes and Factors Associated with Data Sharing
        Data Sharing Research Studies
        Conclusion
    Data Management in the Sciences
        The Data Deluge
        Changes in Scientific Process and the Fourth Paradigm
        Emerging Concerns of Data Management in the Sciences
        Conclusion
    Selected Infrastructure and Interoperability Factors
        Data Tools and Applications
        Provenance
        Metadata
        Ontologies
        Data and Data Models
    Literature Review Conclusions
CHAPTER III: RELEVANT RESEARCH METHODS AND THEORETICAL FRAMEWORKS
    Theoretical Research Specific to the