DATA SHARING AND DATA REUSE: AN INVESTIGATION OF DESCRIPTIVE INFORMATION FACILITATORS AND INHIBITORS

Angela Patricia Murillo

A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Information and Library Science.

Chapel Hill 2016

Approved by:

Jane Greenberg

Mohammed Hossein Jarrahi

Robert Losee

William Michener

Reagan Moore

Arcot Rajasekar

© 2016 Angela Patricia Murillo ALL RIGHTS RESERVED

ABSTRACT

Angela Patricia Murillo: Data Sharing And Data Reuse: An Investigation Of Descriptive Information Facilitators And Inhibitors (Under the direction of Jane Greenberg)

This dissertation examines how descriptive information inhibits or facilitates data sharing and reuse. DataONE serves as the test environment. The objective is to identify the descriptive information made discoverable through DataONE and then determine which of this information helps scientists determine data reusability. This study uses a mixed-methods approach, which includes a data profiling assessment in the form of a quantitative and qualitative content analysis and a quasi-experiment think-aloud. The content analysis was conducted on a stratified sample of data extracted from DataONE to examine the types of descriptive information made available through the shared data.

Participants searched a quasi-experiment interface and thought aloud about what information inhibited or facilitated their ability to determine data reusability. Additionally, participants completed a post-result usefulness survey, a post-search rank-order survey, and a post-search factors survey.

The quantitative and qualitative content analysis shows that the shared data records contain 30 unique pieces of descriptive information. The quasi-experiment think-aloud indicates that scientists found three pieces of descriptive information particularly useful for determining data reusability: (a) the data description, (b) the attribute table, and (c) the research methods. In conclusion, metadata schema, member node standards, and community standards impact what types of descriptive information are provided through the shared data. Attribute and unit lists, research methods information, and succinctly written abstracts facilitate data reuse. However, long abstracts, the same information repeated in multiple places, and the exclusion of data descriptions inhibit data reuse. The findings and recommendations assist funding agencies and scientific organizations in understanding the current state of data being shared and in prioritizing how to meet the needs of scientists regarding data reuse. This dissertation provides guidance to developers of current and future data sharing environments and infrastructures, research data management and scientific communities, scientific data managers, creators of data management plans, and funding agencies, and it has implications beyond DataONE.

Dedicated To My Parents: Eddie William Murillo And Bertha Murillo

ACKNOWLEDGEMENTS

I am truly thankful and grateful to my committee: Drs. Jane Greenberg, Mohammed Hossein Jarrahi, Robert Losee, William Michener, Reagan Moore, and Arcot Rajasekar. I deeply appreciate all of your time, thoughts, ideas, and encouragement. Thank you to the rest of the SILS professors, particularly Dr. Diane Kelly and Dr. Barbara Wildemuth, for your helpful advice and encouragement along the way; thank you for your generosity, time, and guidance.

Thank you to the community for the many research opportunities and researcher communities that I have had the opportunity to be a part of. Thank you to the SILS community and to the many opportunities for research, teaching, and