
FLORIDA STATE UNIVERSITY

COLLEGE OF COMMUNICATION & INFORMATION

EXPLORING THE DATA MANAGEMENT AND CURATION (DMC) PRACTICES

OF SCIENTISTS IN RESEARCH LABS WITHIN A RESEARCH UNIVERSITY

By

PLATO L. SMITH II

A Dissertation submitted to the School of Information in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Summer Semester, 2014


Plato L. Smith II defended this dissertation on June 23, 2014. The members of the supervisory committee were:

Paul Marty Professor Directing Dissertation

Helen Burke University Representative

Besiki Stvilia Committee Member

Lorri Mon Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.


This dissertation is dedicated to my God, grandfather (Plato Smith), grandmother (Lula Smith), mother (Joyce C. Smith), son (Daryl), and grandson (Lil Daryl).


ACKNOWLEDGEMENTS

I thank God for my Lord and Savior, Jesus Christ, through whom His Holy Spirit equipped me with the strength, knowledge, skills, abilities, and fortitude to complete this dissertation.

The completion of this dissertation was made possible by the participation of scientists at several high-profile research laboratories at the Florida State University and scientists associated with the National Science Foundation (NSF) EarthCube project.

The following individuals have been significant in their encouragement of my efforts to complete this dissertation, and for that I am forever humbled, grateful, and thankful. I especially thank Dr. Christie Koontz and Dr. Corrine Jörgensen, who encouraged me to apply to the doctoral program in the School of Library & Information Studies at Florida State University. I would also like to thank Dean Larry Dennis, Dr. Kathleen Burnett, Dr. Paul Marty, Dr. Lorri Mon, Dr. Besiki Stvilia, and Dr. Helen Burke for providing the necessary academic support, guidance, and technical acumen required for successful doctoral matriculation. The following is a partial list of the individuals who, over the course of my academic and professional career, have contributed directly or indirectly to my success in various facets of my life: Dr. Greg Newby, Dr. Reginald Stevens, Dr. Althea Jenkins, Dr. Hannelore Radar, Dr. A.K.S.K. Prasad, Dr. Cheryl Ward, Dr. Peter Lazarevich, Dr. Lawrence Abele, Dr. Dominic Farace, Dr. Ann Thistle, Dr. Sylvia Spengler, Irene Lombardo, Dr. Barbara Ransom, Dr. Bob Chadduck, Dr. Marco Tedesco, Dr. Joel Cutcher-Gershenfeld, Mimi McClure, Dr. Ruth Swan, Jeneice Smith, Brenda Wright, Lula Smith, Regina Harris, Teresa Sargent, Diana DeBoer, Ken Baldauf, Geoffery Miller, Dr. Judith Devine, Dr. Nancy Marcus, Dr. Melissa Johnston, Dr. Gary Burnett, Chuck Thomas, Giesele Towels, Willa Patterson, Dr. Alicia Viera, Vanessa Reyes, FSU Libraries DLC OPS students, Dr. Laurie Taylor, Mark Sullivan, Jeanette Dummer, Michael Kaplan, Wei Dei, Sharon Schwerzel, Tamara Weatherholt, Priscilla Caplan, Dr. Robin Sellars, Dr. Jenny Ma, Dr. Benjamin Speller, Susan Nutter, John Little, Dr. Courtney Ferguson, Carmen Dorsey, Prof. Shearer, Lionell Parker, Prof. Chapman, Dr. Gibbs, Mary Lou Norwood, Chaplain Barry, Mark Lazarus, Dr. Amoateng, Phyllis Carrington, Portia McQueen, Shanthi Natesan, Dwayne McCoy, Tabatha, Harry James, Tangy (Big Sis), Cherry Richardson, Delores Littlejohn, my dad, and Loretta Rhyne.

The following organizations have been instrumental in my professional development: The National Science Foundation (NSF), Florida State University School of Information, Florida A&M University (FAMU) Libraries, the Digital Curation Centre (DCC), International Conference on Grey Literature (GreyNet), UNC DigCCur, JCDL Doctoral Consortium, Cornell University Library/ICPSR Management Workshop, UIUC Summer Institute for Data Curation in the Humanities, Association of Research Libraries (ARL), North Carolina State University Libraries, Libraries, Emerald Publishing, FSU Libraries, the Council on Library and Information Resources (CLIR)/DLF Postdoctoral Fellowship Program, and The University of New Mexico College of University Libraries & Learning Sciences (UL & LS).

“In their hearts humans plan their course, but the Lord establishes their steps.” – Proverbs 16:9


TABLE OF CONTENTS

List of Tables ...... ix
List of Figures ...... xi
Abstract ...... xii

1. INTRODUCTION ...... 1
1.1 Statement of the Problem ...... 1
1.2 Research Purpose ...... 2
1.3 Research Questions ...... 2
1.4 Significance of Research ...... 3
1.5 Assumptions ...... 3
1.6 Definitions ...... 4
1.7 Overview of Theory ...... 7
1.8 Overview of Conceptual Framework ...... 7
1.9 Overview of Method ...... 7
1.10 Summary ...... 8

2. LITERATURE REVIEW ...... 9
2.1 Data Management and Curation (DMC) ...... 9
2.1.1 Data Management Planning, Data Curation, Digital Curation, and Digital Preservation ...... 9
2.1.2 Data Management Planning ...... 10
2.1.3 Data Curation ...... 12
2.1.4 Digital Curation ...... 12
2.1.5 Digital Preservation ...... 13
2.1.6 Summary ...... 14
2.2 Research Issues ...... 15
2.3 Relevant Research ...... 16
2.3.1 Domain ...... 16
2.3.2 Institutional ...... 18
2.3.3 National ...... 20
2.3.4 International ...... 21
2.4 Models ...... 24
2.4.1 Open Archival Information System (OAIS) - Preservation ...... 24
2.4.2 Levels 1 – 3 Curation ...... 25
2.4.3 The DCC Curation Lifecycle Model ...... 26
2.5 Relevant Standards, Guidelines, & Best Practices ...... 28
2.5.1 Metadata Standards ...... 29
2.5.2 Disciplinary Domains ...... 29
2.5.3 Repository Standards ...... 30
2.5.4 Best Practices ...... 31
2.5.5 Guidelines ...... 32
2.6 Open Access ...... 33
2.7 Tools ...... 35
2.7.1 Data Management Planning Tools ...... 35
2.7.2 Curator Tools ...... 37
2.7.3 Researcher Tools ...... 37
2.7.4 A Relevant Metadata Standards Use Case Example ...... 38
2.7.5 Disciplinary Specific Issues, Perspectives, & Use Cases ...... 40
2.8 Literature Review Online Searches ...... 43
2.8.1 LIS Journals Resources ...... 44
2.8.2 Indexing Problems ...... 45
2.8.3 Publication Errors ...... 46
2.8.4 Displacement of Concepts ...... 47
2.9 Introduction of Metatriangulation Theory ...... 48
2.9.1 Metatriangulation ...... 48
2.9.2 Foundations of Lewis & Grimes’ Metatriangulation ...... 50
2.9.3 Background and Early Developments ...... 50
2.9.4 Lessons from the Trenches of Metatriangulation Research ...... 52
2.9.5 Metatriangulation Study of Digital Cross-Organizational Collaboration ...... 52
2.9.6 Summary ...... 53
2.10 Data Management and Curation: A Metatriangulation Review ...... 54
2.10.1 Application of Metatriangulation to Data Management and Curation ...... 55
2.10.2 Interdisciplinary Research Compliments Metatriangulation ...... 57
2.10.3 Metatriangulation as a Theoretical Perspective ...... 57
2.10.4 Limitations of Metatriangulation ...... 58
2.11 Conceptual Framework for Analyzing Methodological Suppositions ...... 59
2.11.1 How has the Conceptual Framework been used? ...... 60
2.11.2 Conceptual Framework Scenario #1 ...... 62
2.11.3 Conceptual Framework Scenario #2 (Preliminary Study) ...... 62
2.12 Major Findings from Literature Review ...... 65
2.13 Summary ...... 66

3. METHODS ...... 67
3.1 Methodology ...... 67
3.2 Research Purpose ...... 67
3.3 Research Questions ...... 68
3.4 Theoretical Framework ...... 69
3.5 Methodological Approach ...... 71
3.6 Research Design ...... 72
3.7 Data Asset Framework (DAF) ...... 73
3.7.1 Data Asset Framework (DAF) Use Cases ...... 73
3.8 Mixed Methods ...... 74
3.9 Methods of Data Collection ...... 75
3.9.1 DAF Survey (online) ...... 76
3.9.2 DAF Interview (online second survey with option for face-to-face) ...... 77
3.10 Population and Sampling ...... 79
3.11 Data Analysis ...... 80
3.11.1 Preliminary Pilot Study Results ...... 80
3.11.2 DAF Survey Questionnaire Data Analysis ...... 82
3.12 Qualitative Data Analysis ...... 82
3.12.1 Units of Analysis for Interview ...... 83
3.12.2 Multiple Paradigms Perspectives ...... 83
3.12.3 Multiple Paradigms Perspectives Significance ...... 84
3.13 Data Management ...... 85
3.14 Validity and Reliability ...... 86
3.14.1 Mixed-Methods ...... 86
3.14.2 Quantitative Survey ...... 87
3.14.2.1 Face validity ...... 87
3.14.2.2 Content validity ...... 87
3.14.2.3 Concurrent validity ...... 87
3.14.2.4 Construct validity ...... 87
3.14.2.5 Reliable validity ...... 87
3.14.2.6 Test-retest reliability ...... 88
3.14.3 Qualitative Interview ...... 88
3.14.3.1 Credibility ...... 89
3.14.3.2 Transferability ...... 89
3.14.3.3 Dependability ...... 89
3.14.3.4 Confirmability ...... 89
3.14.3.5 Member Checks ...... 89
3.15 Significance and Limitations ...... 90
3.15.1 Significance ...... 90
3.15.2 Limitations ...... 91
3.16 Ethical Considerations ...... 91
3.17 Proposed Timeline for Dissertation Research ...... 92
3.18 Conclusions ...... 92

4. RESULTS ...... 94
4.1 Findings from DAF Survey Data Analysis ...... 94
4.1.1 DAF Survey ...... 94
4.1.2 About You ...... 95
4.1.3 General Data Management ...... 99
4.1.4 Barriers ...... 104
4.1.5 Your Data Assets ...... 107
4.1.6 Final Comments ...... 113
4.2 Findings from DAF Interview Data Analysis ...... 117
4.2.1 DAF Interview ...... 117
4.2.2 Area of Research ...... 119
4.2.3 Disciplinary Domain Research Data Management Perspectives ...... 119
4.2.4 Exemplar Research Project – The Funding Application Stage ...... 130
4.2.5 Exemplar Research Project – Data Collection Stage ...... 132
4.2.6 Exemplar Research Project – Data Storage and Backups ...... 133
4.2.7 Exemplar Research Project – Sharing and Security ...... 136
4.2.8 Exemplar Research Project – Archiving Data ...... 139
4.2.9 Expected Support ...... 140
4.2.10 Effective Use of Infrastructure ...... 141
4.2.11 Research Data Management Confidence and Awareness ...... 143
4.2.12 Concluding Questions & Comments ...... 144

5. CONCLUSIONS ...... 145
5.1 Discussions ...... 145
5.2 Research Q #1 – How Do Researchers Manage, Store, And Preserve Research Data? ...... 145
5.3 Research Q #2 – How Can The Identification And Clarification Of Key Concepts Of Data Management And Curation (DMC) Be Better Articulated Within And Across Disciplines? ...... 150
5.4 Research Q #3 – What Are Some Of The Theories, Practices, And Methods Multiple Disciplines Use To Address Research Data Management In Your Discipline? ...... 156
5.5 Research Q #4 – How Can Multiple Paradigms Perspectives On Data Management And Curation Practices Within And Across Disciplinary Domains Contribute To Building DMC Research & Theory? ...... 155
5.6 Implications ...... 158
5.6.1 Research Implications ...... 158
5.6.2 Practical Implications ...... 159
5.6.3 Social Implications ...... 160
5.7 Conclusions ...... 161
5.8 Recommendations ...... 163
5.9 Significance ...... 164
5.10 Future Research ...... 164

APPENDICES ...... 168
A. DATA MANAGEMENT AND CURATION SERVICES (DMCS) OPINION SURVEY QUESTIONNAIRE ...... 168
B. PRELIMINARY STUDY IRB APPROVAL MEMORANDUM ...... 177
C. LETTER OF INVITATION TO PARTICIPATE IN WEB-BASED SURVEY ...... 179
D. PRIMARY STUDY IRB APPROVAL MEMORANDUM – PHASE 1 ...... 180
E. LETTER OF INVITATION TO PARTICIPATE IN WEB-BASED SURVEY ...... 182
F. DATA ASSET FRAMEWORK (DAF) SURVEY QUESTIONNAIRE ...... 183
G. PRIMARY STUDY IRB APPROVAL MEMORANDUM – PHASE 2 ...... 189
H. LETTER OF INVITATION TO PARTICIPATE IN INTERVIEW ...... 191
I. DATA ASSET FRAMEWORK (DAF) INTERVIEW QUESTIONS ...... 192
J. DATA ASSET FRAMEWORK (DAF) INTERVIEW TRANSCRIPTS ...... 200
K. COPYRIGHT PERMISSION FOR SELECT FIGURES AND TABLE ...... 227

REFERENCES ...... 229
BIOGRAPHICAL SKETCH ...... 242

LIST OF TABLES

Table 1: Data Curation Continua ...... 28

Table 2: LIS Journals Search Results of DMC Key Concepts ...... 44

Table 3: Theory-Building Processes of Traditional Induction and Metatriangulation ...... 55

Table 4: Preliminary Study Survey Results of Existing DMC Perspectives ...... 56

Table 5: Ontological Assumptions and Epistemological Stances ...... 61

Table 6: Proposed Timeline for Dissertation Research ...... 92

Table 7: Primary Research Role ...... 95

Table 8: Primary Disciplinary Domain ...... 97

Table 9: Primary Data Type ...... 98

Table 10: Primary Data Funding Agencies ...... 99

Table 11: Collectors of Secondary Data ...... 101

Table 12: Data Types of Secondary Data ...... 102

Table 13: Data Storage Locations ...... 103

Table 14: Data Management Issues ...... 105

Table 15: Amount of Electronic Research Data Currently Maintained ...... 108

Table 16: Location of Data Backup ...... 110

Table 17: Frequency of Data Backup ...... 110

Table 18: Primary Research Role, Discipline, and Research Lab ...... 119

Table 19: The DCC Curation Lifecycle Model – Interview Heat Map Table ...... 124

Table 20: Level Three Curation Model – Interview Heat Map Table ...... 125

Table 21: Adapted Conceptual Framework Model – Interview Heat Map Table ...... 127

Table 22: Disciplinary Domain Research Data Management Perspectives ...... 128


Table 23: Ranking of Storage Device for Research Data ...... 141

Table 24: Research Data Management Confidence and Awareness ...... 143


LIST OF FIGURES

Fig. 1: Data Management and Curation (DMC) Key Concepts ...... 5

Fig. 2: OAIS Reference Model – Functional ...... 25

Fig. 3: Level Three Curation – Information Flow With Data Curation ...... 26

Fig. 4: The DCC Curation Lifecycle Model ...... 27

Fig. 5: Conceptual Framework For Analyzing Methodological Suppositions ...... 61

Fig. 6: Methodological Suppositions For The Data Management And Curation ...... 64

Fig. 7: Adapted Conceptual Framework Model ...... 64

Fig. 8: Data Asset Framework (DAF) Methodology ...... 75

Fig. 9: Participant Primary Research Roles ...... 79

Fig. 10: Primary Disciplinary Domain ...... 80

Fig. 11: DAF Surveys and Interviews Completion Rate ...... 94

Fig. 12: Primary Disciplinary Domain ...... 96

Fig. 13: Barriers to Research Data Management ...... 105

Fig. 14: Responsible for Research Data Management ...... 108

Fig. 15: Use of Standards, Best Practices, and Guidelines ...... 112

Fig. 16: Research Lab Benefit from Data Curator ...... 113

Fig. 17: The DCC Curation Lifecycle Model – Interview Heat Map ...... 123

Fig. 18: Level Three Curation Model – Interview Heat Map ...... 125

Fig. 19: Adapted Conceptual Framework Model – Interview Heat Map ...... 127


ABSTRACT

Beginning January 18, 2011, proposals submitted to the National Science Foundation (NSF) must include a supplementary Data Management Plan (DMP) of no more than two pages. The NSF DMP requirement has significantly redefined the role of scientists, researchers, and practitioners in the United States of America (USA) by presenting the opportunity to engage in effective data management planning and practices for current and future use. In order for data to be useful to research, science, scholarship, and education, data must be identified, described, shared, discovered, extended, stored, managed, and consulted over its lifecycle (Bush, 1945; Lord & Macdonald, 2003; Hunter, 2005; JISC, 2006; UIUC GSLIS, 2006/2010; NSF, 2011).

Within the scope of this research study, data management planning is defined as the planning of policies for the management of data types, formats, metadata, standards, integrity, privacy, protection, confidentiality, security, intellectual property rights, dissemination, reuse/re-distribution, derivatives, archives, preservation, and access (NSF, 2011). The management of data includes analog [physical], digitized [made electronic] & born digital [no physical surrogate] data. NSF’s data management plan requirements have incentivized the development of a multitude of programs, projects, and initiatives aimed at promoting and providing data management planning knowledge, skills, and abilities for compliance with the NSF data management plan requirements. Without the specification, clarification, & definition of key concepts; assessment of current data management practices, experiences, & methods; interrelationships of key concepts; and utilization of multiple methodological approaches, data management will be problematic, fragmented, and ineffective. The accomplishment of effective data management is contingent on funders, stakeholders, and users’ investment and support in Infrastructure, Cultural Change, Economic Sustainability, Data Management Guidelines, and Ethics and Internet Protocol (Blatecky, 2012, p. 5) across organizations, institutions, & domains.

One of the goals of the researcher “is to select a theory or combine [multiple theoretical perspectives] so they resonate with the guiding research questions, data-collection methods, analysis procedures, and presentation of findings” (Bodner & Orgill, 2007, p. 115) within a conceptual framework that “places its assumptions in view for practitioners” (Crotty, 1998). The introduction of the Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993) to gather competing approaches and paradigmatic assumptions for multiple paradigm integration and crossing via interplay (Schultz & Hatch, 1996) is an attempt by the researcher to build theory from multiple paradigms through Metatriangulation (Lewis & Grimes, 1999), a theory-building approach. Within this study, the Data Asset Framework (DAF) is framed as a sequential mixed methods explanatory research design (Creswell and Plano Clark, 2011) and applies social science research to facilitate scientific inquiry. The purpose of this study is to investigate the data management and curation practices of scientists at several research laboratories at the Florida State University and select scientists associated with the National Science Foundation (NSF) EarthCube project. The goal of this research is not to provide an extensive literature review to prove the need for effective data management practices but to provide empirical evidence to support current data management and curation practices. Within the scope of this dissertation, data management and curation practices will be generally defined as the effective aggregation, organization, representation, dissemination, and preservation of data. Data refers to analog and digital objects, databases, data sets, and research data. For purposes of discussions in this study, data is both singular and plural. Data management and curation practices include four key concepts: (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation. The literature review suggests that these key concepts, when applied with relevant standards, best practices, and guidelines, can assist scientists in ensuring the integrity, accessibility, and stewardship of research data throughout its lifecycle. The combination of the conceptual framework for analyzing methodological suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993), Metatriangulation (Lewis & Grimes, 1999), and the Data Asset Framework (DAF) (JISC, 2009) contributes to the development of an interdisciplinary conceptual framework model capable of addressing the data management and curation issues common across disciplines. For the purpose of this dissertation, “research data are being understood as both primary input into research and first order results of that research1” (ESRC, 2010, p. 2).

1 The definition was inspired by Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information, Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, February 2010.

CHAPTER ONE

INTRODUCTION

“If we are to do an adequate job of conceptualizing, we must do more than just think up some definitions, any definitions, for our concepts. We have to turn to social theory and prior research to review appropriate definitions. We may need to distinguish sub-concepts, or dimensions, of the concept. We should understand how the definitions we choose fits within the theoretical framework guiding the research and what assumptions underlie this framework” (Schutt, 2006, p. 93).

The challenge of how to effectively manage research data affects all scientists and researchers within and across multiple domains. Managing research data through data management and curation (DMC) services (including data management planning, data curation, digital curation, and digital preservation) is a common topic in the Academic Research Libraries (ARL) and Library and Information Science (LIS) literature. Multiple perspectives on how researchers from different domains store, manage, and use research data are necessary to understand the challenge of managing research data effectively—a problem common to multiple disciplines.

Researchers report that they struggle unsuccessfully with storage and management of their burgeoning volume of documents and data sets that they need and that result from their work. While some universities have devised new services to better manage data and other information derived from research, many researchers flounder in a disorganized way and rising accumulation of useful findings may be lost or unavailable when conducting future research. (Kroll & Forsman, 2010, p. 5)

1.1 Statement of the Problem

The challenge of effectively managing data within and across disciplines is compounded by the additional research problems of: (1) definitional confusion and lack of clarification of key concepts and (2) lack of a systematic approach for the development of theory of data management and curation services. The literature suggests (Merton, 1968) that the identification, clarification, and linking of the underlying suppositions of key concepts are necessary to develop theory. Sometimes key concepts are not identified, clarified, or linked. Data management key concepts are not consistently identified, clarified, and linked. Data management and curation key concepts are frequently used interchangeably, which creates confusion, inconsistent application, and disjointed learning outcomes. Digital preservation is one of the four key concepts of data management and curation introduced in this dissertation. The other three key concepts include (1) data management planning, (2) data curation, and (3) digital curation. These key concepts represent various stages and processes in the storage, management, and preservation of data over its lifecycle. A theory of digital preservation or digital curation should begin with the identification, clarification, and linking of these key concepts within an interdisciplinary conceptual framework model. Currently there is a need for interdisciplinary conceptual framework models (Whyte, 2012; Parsons et al., 2011) that clarify, identify, and link the underlying suppositions of these key concepts to build data management and curation theory. The identification, clarification, and linking of these key concepts were addressed by the data management and curation framework (see Fig. 1) developed from the preliminary study. Metatriangulation, a multiple paradigm theory-building approach, was introduced in this dissertation to contribute to data management and curation theory.

1.2 Research Purpose

The purpose of this research study is to (1) identify and clarify the key concepts in data curation, (2) link those key concepts of data curation into a framework, (3) integrate the underlying suppositions of those key concepts within a paradigm conceptual model, and (4) use the Data Asset Framework (DAF) methodology to explore the data management and curation practices of scientists at research labs. The implications from this study can lead to recommendations to improve the current data management and curation practices of participants.

1.3 Research Questions

The following research questions were developed to gather information on how data is currently stored, managed, and preserved by scientists at select research labs at FSU. In order to effectively manage research data, scientists need to identify, classify, assess, and organize their data. The Data Asset Framework (DAF) methodology is one approach for auditing data assets to improve current data management and curation practices. These research questions were answered using the DAF methodology as an approach to perform an assessment of data assets. It is assumed that the adoption of relevant data management standards, best practices, and guidelines, where appropriate, will lead to improving the data management and curation practices of scientists at research labs at FSU.

1. How do researchers create, manage, store, and preserve research data?
2. How can the identification and clarification of key DMC concepts be resolved within and across disciplines?
3. What are some of the theories, practices, and methods disciplines use to address research data management in your discipline?
4. How can multiple paradigms’ perspectives on data management and curation practices within and across disciplinary domains contribute to building DMC research & theory?

The research questions were addressed through the use of Metatriangulation for data management and curation theory development and the DAF methodology operationalized within a conceptual framework for integrating multiple paradigm perspectives (see Fig. 7). The results from this dissertation study have met the goal of developing learning outcomes with implications that articulate the significance of improving data management and curation practices where feasible, permissible, and beneficial.

1.4 Significance of Research

The significance of this research includes: (1) identification and classification of research data assets, (2) articulation of the need to address data management challenges, (3) development of data management and curation services policies and procedures, (4) improvement of current data management practices, and (5) contribution to the future development of data management plans that meet and/or exceed funding agencies’ data management plan requirements. Also, this research introduces the key concepts of data management and curation for concept analysis and data management and curation theory development considerations.

1.5 Assumptions

The research project assumed: (1) there is a need for scientists at research labs at FSU to identify and clarify their research data, (2) there is a need to improve the current data management and curation practices of scientists at research labs at FSU, (3) there is a need to develop a theory of data management and curation based on theory and practice, (4) there is a need for the clear articulation and introduction of the key concepts of data management and curation to disciplinary domains beyond the dominant disciplinary domains in which most of the current data management and curation research takes place (i.e., ARL, LIS), and (5) scientists who participated in this research study will reevaluate their data management and curation practices.

1.6 Definitions

The identification, clarification, definition, and concatenation of the underlying suppositions of the four key concepts of data management and curation are fundamental to this research study. As Denzin noted: “Herbert Blumer traced sociologists’ inability to develop sound theory to a misunderstanding of concepts” (Denzin, 1970/2009, p. 33). Theory is more than simply identifying and defining key concepts but also requires the linking of underlying suppositions (Merton, 1968). Within the scope of this study, the definitions of the key concepts are:

1. Data management planning is a data lifecycle management process comprised of departmental, institutional, or organizational policies and procedures governing the creation, organization, dissemination, preservation, and comprehensive lifecycle management of research data and information in accordance with relevant standards, best practices, and guidelines. Data management planning includes data curation, digital curation, and digital preservation. Data management and curation services include data management planning.
2. Data curation is a data lifecycle management process of providing descriptive, annotative, and representative information for research data through metadata.
3. Digital curation is a data lifecycle management process of managing and storing curated research data within a repository or digital content management system.
4. Digital preservation is a data lifecycle management process of maintaining the authenticity, integrity, and security of curated research data within a standards-based repository or digital content management system for long-term archival preservation.

The key concepts are defined in literature as follows:

1. Data Management Planning [DMP] is the planning of policies for the management of data types, formats, metadata, standards, integrity, privacy, protection, confidentiality, security, intellectual property rights, dissemination, reuse/re-distribution, derivatives, archive, preservation, and access (NSF, 2011);
2. Data Curation [DaC] is the “active and ongoing management of data through its lifecycle of interest and usefulness to [research], science, scholarship, and education” (UIUC GSLIS, circa 2006) (includes analog, digitized, & born digital research data);
3. Digital Curation [DiC] is the “maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle” (JISC, 2006) (includes digitized & born digital research data);
4. Digital Preservation [DP] is “the series of technical, strategic, and organizational actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value” (JISC, 2006) (includes digitized & born digital research data).

Fig. 1 – Data Management and Curation (DMC) Key Concepts


Additional concepts and terms relevant to this research are defined in literature as follows:

1. Concept – “A mental image that summarizes a set of similar observations, feelings, or ideas” (Schutt, 2006, p. 92);
2. Cyberinfrastructure – The integration of personnel, services, organizations, computing hardware, data and networks, digitally enabled sensors, observatories, and experimental facilities with base technology (computer and information science and engineering-CISE) and discipline-specific science (NSF CI Council, 2006, p. 6; Atkins et al., 2003) for “supporting and enabling large increases in multi-disciplinary science while reducing duplication of effort and resources across scientific domains” (Bowker et al., 2010, p. 100);
3. Data – “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing” (CCSDS, 2002/2012, p. 20);
4. Methodological – “This assumptions holds that a qualitative researcher conceptualizes the research process in a certain way. For example, a qualitative inquirer relies on views of participants, and discusses their own views within the context in which they occur, to inductively develop ideas from particulars to abstractions” (Creswell, 1994; Creswell, 2007, p. 248);
5. Paradigm – The philosophical stance of a researcher whereby a basic set of beliefs guides action (Denzin & Lincoln, 1994; Creswell, 2007, p. 248). “A paradigm is a set of generalizations, beliefs, and values of a community of specialists” (Creswell & Clark Plano, 2011, p. 39);
6. Model – “A model is a representation of an idea, object, event, process, or system, which concentrates attention on certain aspects of the system – thus facilitating scientific inquiry” (Briggs, 2007, p. 73);
7. Reference Model – “A reference model is an abstract framework for understanding significant relationships among the entities of some environment, and for the development of consistent standards or specifications supporting that environment. A reference model is based on a small number of unifying concepts and may be used as a basis for education and explaining standards to a non-specialist” (CCSDS, 2002/2012);
8. Theoretical perspective – The philosophical stance informing the methodology and providing a context for the process and grounding its logic and criteria (Crotty, 1998);
9. Theory – “A theory must contain a set of propositions [stated relationship] or hypothesis that combine descriptive and relational concepts” (Denzin, 1970/2009, p. 34).

1.7 Overview of Theory

Metatriangulation (Lewis & Grimes, 1999) is a theory-building approach that builds theory from the integration of multiple paradigm perspectives (Gioia & Pitre, 1990) through conceptualized triangulation (Denzin, 1970) that includes (1) multiparadigm reviews, (2) multiparadigm research, and (3) metaparadigm theory building based on multiparadigm exemplars. Building on previous research in organizational theory, Lewis & Grimes developed a meta-theory-building approach that is most suited for disciplinary domains with inconsistent theories, such as the field of data management and data curation within the ARL and LIS disciplines. DMC research in the LIS field is primarily dominated by ethnographic and pragmatic studies, as evidenced by the preliminary study. A multiple paradigm perspective theory-building approach such as Metatriangulation can facilitate an improved understanding of the significance of data management and curation common to multiple disciplinary domains.

1.8 Overview of Conceptual Framework

The conceptual framework for analyzing methodological suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993, p. 595) (see Fig. 5) is a general metatheoretical framework (Solem, 1993, p. 594) that provides a two-dimensional perspective for the methodological analysis of social theory (Solem, 1993) within a domain to facilitate scientific inquiry. This framework contains major suppositions such as the ontology, epistemology, frame of reference, concepts, and methods involved in the analysis of a problem. This conceptual framework accommodates the analysis, comparison, and integration of multiple perspectives in the investigation of a problem common across multiple domains.

1.9 Overview of Method

The DAF is a methodology composed of survey and interview research methods. The purpose of the DAF is to identify, assess, classify, and organize data assets in order to develop recommendations that will improve research data management. An adapted DAF methodology was operationalized within this study as a two-phase sequential mixed-methods explanatory research design (Creswell and Plano Clark, 2011). The mixed methods consisted of survey questionnaires administered in Phase 1 and semi-structured interviews conducted in Phase 2. It was assumed that the application of social science research methods to the DAF methodology would produce empirical data with implications to improve the data management practices of scientists at research labs at FSU and the wider DMC, ARL, and LIS communities.

1.10 Summary

This dissertation proposed the use of Metatriangulation of data management and curation as a theory-building approach, operationalization of the Data Asset Framework (DAF) methodology as an approach to investigate data management practices, and integration of a conceptual framework that explored the methods, concepts, frames of reference, and ways in which multiple disciplinary domains manage, store, and preserve their research data. Scientists from several research labs at Florida State University and select scientists associated with the National Science Foundation (NSF) EarthCube project were purposively selected to participate in this dissertation research. The data generated from this research study will serve as references for exploring current and future data management and curation practices. The learning outcomes from this study included research, practical, and social implications resulting in recommendations that contribute to improving the ways scientists manage, store, and preserve research data. The preliminary study2 (see 3.11.1 Preliminary Pilot Study Results) on the data management and curation opinions of multiple stakeholders across multiple disciplinary domains conducted in December 2012 preceded and contributed to the development of this dissertation.

2 GL15 Conference Proceedings. (2014). https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:57106

CHAPTER TWO

LITERATURE REVIEW

“Information science would do well to develop more and better theories, for without adequate theoretical support, we may do a technically brilliant job of solving the wrong problems.” (Dow, 1977, p. 323)

2.1 Data Management and Curation (DMC)

2.1.1 Data Management Planning, Data Curation, Digital Curation, and Digital Preservation

A major aim of the data management and curation services community is to improve the management of research data for current and future use. Within the scope of this study, “research data” is data that you currently hold, that has been collected and/or used in the course of your research. Research data can be primary data collected by you or your research group or secondary data provided by a third party. It may be quantitative or qualitative (e.g. survey results, interview transcripts, databases compiled from documentary sources, images or audiovisual files) (McGowan & Gibbs, 2009). Currently, the research data management and digital preservation/curation literature promotes the need for theory development in the area of data management and curation practices within the disciplinary domain of library and information science (LIS). Recently, there have been national and international calls for proposals for systematic and interdisciplinary approaches to improve data management, data curation, research on data, and digital preservation within and across disciplinary domains. These recent calls for proposals for theory of digital preservation and theory of digital curation research represent a growing trend. Digital preservation, theory of digital preservation, data curation, theory of data curation, data management and curation developments, and conceptual framework models/tools for integrating all of these concepts within and across multiple disciplinary domains continue to be phenomena of interest in the profession, academia, and data management research. Multiple disciplinary domains, particularly data-intensive scientific disciplinary domains, will benefit from the development of a theory of digital preservation and theory of digital curation that embraces multiple paradigmatic perspectives, such as the development of data management and curation (DMC) theory proposed in this dissertation. Even though the origins of data curation and digital curation started in the academic research libraries, data management and curation, e-science, and LIS research communities, the scope and reach of data management and curation now includes all disciplinary domains. To better address the data management and curation research issues common across multiple disciplinary domains, the main key concepts of data management and curation must be identified, clarified, defined, articulated, linked, and associated with an established theory (i.e., social theory, organizational theory, systems theory, etc.) for improved understanding, comprehension, and knowledge transfer within and across multiple disciplines.

The four key concepts of data management and curation (DMC) are:

1. Data Management Planning (Entire data lifecycle – DCC Curation Lifecycle Model)
2. Data Curation (Level 1 Curation - Traditional academic information flow)
3. Digital Curation (Level 2 Curation - Information flow with data archiving)
4. Digital Preservation (Level 3 Curation - Information flow with data curation) (Lord and Macdonald, 2003)

DMC practices include four major data lifecycle management processes that:

1. Fulfill departmental, institutional, organizational policies & data management requirements;
2. Provide data creation (primary, secondary, tertiary data), data publication, minimal data description;
3. Facilitate added value (metadata), management & storage of archived data over data lifecycle;
4. Integrate a series of technical & strategic actions and consultations to ensure continual data authenticity.
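Restated compactly, the concept-to-level mapping and the associated lifecycle processes can be kept side by side in a small lookup structure. The sketch below is purely illustrative; the field names are hypothetical, and the wording simply paraphrases the lists above (Lord and Macdonald, 2003).

```python
# Illustrative only: the four DMC key concepts, the curation level each maps
# to, and the lifecycle process each covers, restated as a lookup structure.
DMC_KEY_CONCEPTS = {
    "data management planning": {
        "level": "entire data lifecycle (DCC Curation Lifecycle Model)",
        "process": "fulfill departmental, institutional, and organizational "
                   "policies and data management requirements",
    },
    "data curation": {
        "level": "Level 1 Curation - traditional academic information flow",
        "process": "data creation, data publication, minimal data description",
    },
    "digital curation": {
        "level": "Level 2 Curation - information flow with data archiving",
        "process": "added value (metadata), management and storage of archived data",
    },
    "digital preservation": {
        "level": "Level 3 Curation - information flow with data curation",
        "process": "technical and strategic actions ensuring continual data authenticity",
    },
}

if __name__ == "__main__":
    for concept, detail in DMC_KEY_CONCEPTS.items():
        print(f"{concept}: {detail['level']}")
```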

2.1.2 Data Management Planning

Progressive research and development of data management plans began to permeate throughout the wider research and learning communities with the announcement of the National Science Foundation (NSF) data management plan (DMP) requirement that became effective January 18, 2011. In brief, the NSF DMP requirement calls for proposals seeking NSF funding to provide a supplemental document, not to exceed two pages, detailing the dissemination and sharing of research results congruent with the NSF policy on managing research data as outlined in the NSF Grant Proposal Guide (GPG) Chapter II.C.2.j. The special information and supplementary documentation section in the NSF data management policy includes the plans for the management and sharing of data, including but not limited to: data management planning, data curation, digital curation, and digital preservation. One of the guiding purposes of this research study was the development of the DMC framework that links and interrelates the key concepts of data management and curation. The development of a proposed theory of DMC was first based on the development of the DMC framework. The following components of data management planning, as defined by the NSF, shall define data management planning within the context of this research study. Data management planning includes:

1. The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
2. The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
3. Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
4. Policies and provisions for re-use, re-distribution, and the production of derivatives; and
5. Plans for archiving data, samples, and other research products, and for preservation of access to them (NSF 13-1, 2013).
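As a concrete aid, the five components can be laid out as a skeletal plan outline. The sketch below is a minimal, hypothetical example: the section titles paraphrase the NSF language, the function and variable names are invented, and it should not be read as an official NSF template.

```python
# Hypothetical helper that prints a skeletal data management plan covering the
# five components listed above (NSF 13-1, 2013). Section titles paraphrase the
# GPG language; this is an illustrative aid, not an official NSF template.
DMP_SECTIONS = [
    "Types of data, samples, physical collections, software, and other materials",
    "Standards for data and metadata format and content",
    "Policies for access and sharing (privacy, confidentiality, security, IP)",
    "Policies and provisions for re-use, re-distribution, and derivatives",
    "Plans for archiving data and preserving access to them",
]

def render_dmp_outline(project_title: str) -> str:
    """Return a short DMP skeleton that a proposal team could fill in."""
    lines = [f"Data Management Plan: {project_title}", ""]
    for number, section in enumerate(DMP_SECTIONS, start=1):
        lines.append(f"{number}. {section}")
        lines.append("   [Describe the project's approach here.]")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_dmp_outline("Example NSF Proposal"))
```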

The NSF DMP is supported by the National Science Board Committee on Strategy and Budget Task Force on Data Policies statement of principles outlined in NSB-11-20 (February 16, 2011 document), particularly the dissemination and sharing of research results as it appears in Chapter VI, Section D, of the NSF Proposal and Award Policies and Procedures Guide (pages VI-8 and VI-9 of NSF Document 10-1; NSB, 2011). The NSF DMP includes elements of the ICPSR Elements of a Data Management Plan (ICPSR, n.d.), levels of curation (Lord & Macdonald, 2003), the DCC Curation Lifecycle Model (DCC, 2007/2014), the Data Seal of Approval Assessment Guidelines (DSA, 2009/2014), data curation, digital curation, and digital preservation. The NSF DMP served as the foundational key data management planning reference model that guided this research in pursuit of answering the research questions as they related to the data management practices of scientists sampled for this study.

3 NSF 13-1. (January 2013). Chapter II – Proposal Preparation Instructions: j. Special Information and Supplementary Documentation.
4 NSB. (2011). National Science Board Committee on Strategy and Budget Task Force on Data Policies Statements and Principles. Retrieved July 10, 2013 from http://www.nsf.gov/nsb/committees/dp/principles.pdf.

2.1.3 Data Curation

Data curation is a data lifecycle management process of providing descriptive, annotative, and representative information for research data through the application of metadata. Data curation facilitates the adoption of existing metadata and format standards where appropriate to aid in the description, representation, organization, aggregation, access, discovery, and storage of data. Data curation can be applied to analog, digital, digitized, and born digital data existing offline or online. The various processes involved in data curation are easily articulated and comprehended through the Level 1 Curation Model (traditional academic information flow model) (Lord & Macdonald, 2003, p. 42). The Level 1 Curation Model graphically represents some of the processes involved in data curation, such as the basic research process of scientists, including but not limited to the development of: (1) web content, (2) primary data, (3) secondary (derived) data, (4) patent data, and (5) tertiary data for publication - leading to the publication process of (6) preprints & e-prints flowing from secondary data, and interacting with tertiary data for publication and peer review; (7) peer review which leads to (8) primary publication leading to (9) secondary publication leading to (10) tertiary publication. Within the scope of this study, the concept of data curation is mapped to level-one curation (traditional academic information flow) (Lord & Macdonald, 2003, p. 42). The concept of data curation as defined and used within this field was primarily addressed in the Digital Data Curation Task Force Report from the Digital Data Curation Task Force Strategy Discussion Day on November 26, 2002 (Macdonald & Lord, 2002).
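As a purely illustrative companion to this description, the sketch below shows the kind of descriptive metadata record that data curation attaches to a data set. The field names loosely follow common general-purpose descriptive elements (title, creator, date, format, rights), and the values are invented; neither is taken from any single disciplinary metadata standard.

```python
# Illustrative only: a minimal descriptive metadata record of the kind data
# curation attaches to a data set before it moves further along the lifecycle.
# Field names and values are invented for this sketch.
import json

record = {
    "title": "Surface water temperature measurements, 2013 field season",
    "creator": "Example Research Lab, Florida State University",
    "date_collected": "2013-06-01/2013-08-31",
    "data_category": "secondary (derived) data",
    "format": "text/csv",
    "description": "Hourly readings from three coastal monitoring stations.",
    "rights": "To be released under the terms stated in the project DMP.",
}

# Keeping the record alongside the data file (for example, as a small JSON
# sidecar) is one lightweight way to preserve description with the data
# prior to repository ingest.
print(json.dumps(record, indent=2))
```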

2.1.4 Digital Curation

The DCC Curation Lifecycle Model, developed by the Digital Curation Centre (DCC) and the University of Edinburgh, was introduced to the ARL and LIS profession at the 3rd International Digital Curation Conference (IDCC) in Washington, DC in December 2007. The DCC Curation Lifecycle Model includes elements of data management planning, data curation, digital curation, and digital preservation. Within the scope of this study, the concept of digital curation is mapped to level-two curation (information flow with data archiving) (Lord, 2003, p. 43). The concept of digital curation was first used at the Digital Curation: digital archives, libraries and e-science seminar sponsored by the Digital Preservation Coalition (DPC) and the British National Space Centre in London on October 19, 2001 (Beagrie, 2006). Digital curation is a data lifecycle management process of managing and storing curated research data within a standards-based technical environment such as an open archival information system (OAIS) data repository. Digital curation facilitates the storage and management of existing data that has undergone data curation, preferably in accordance with relevant standards, best practices, and guidelines where appropriate, to aid in the archiving and long-term preservation of data throughout its lifecycle. Digital curation is applied to digital, digitized, and born digital data (primarily online, but can be applied to offline data assets as well). Digital curation, when conducted in accordance with standards (metadata), guidelines (principles), and best practices (metrics), allows for the effective registration, certification, awareness, archiving, and rewarding functions of scholarly communication for data over time (Roosendaal & Geurts, 1997; Van de Sompel et al., 2004). Digital curation allows users to make current and future use of research data.
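One small, concrete piece of the repository side of digital curation is bundling a curated data file together with its metadata record before deposit. The sketch below is a minimal, hypothetical example; the directory layout, file names, and function name are invented and do not follow the ingest format of any particular OAIS-based repository platform.

```python
# Hypothetical sketch: assembling a simple submission package (data file plus
# metadata record) prior to deposit in an OAIS-style repository. The layout
# and names are illustrative, not a specific platform's ingest format.
import json
import shutil
from pathlib import Path

def build_submission_package(data_file: Path, metadata: dict, out_dir: Path) -> Path:
    """Copy the data file and write its metadata side by side for ingest."""
    package = out_dir / f"{data_file.stem}_package"
    (package / "data").mkdir(parents=True, exist_ok=True)
    shutil.copy2(data_file, package / "data" / data_file.name)
    (package / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return package

if __name__ == "__main__":
    sample = Path("temperature_2013.csv")            # stand-in curated data file
    sample.write_text("station,timestamp,temp_c\n")
    pkg = build_submission_package(
        sample,
        {"title": "Surface water temperature measurements, 2013 field season"},
        Path("outgoing"),
    )
    print(f"Package assembled at: {pkg}")
```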

2.1.5 Digital Preservation

The methods, practices, and tools involved in digital preservation have continued to advance since the inception of the Task Force on Digital Archiving, which was created by the Commission on Preservation and Access and the Research Libraries Group in December 1994 (Garrett & Waters, 1996, p. iii). Digital preservation is a data lifecycle management process of maintaining the authenticity, integrity, usability, and security of curated research data within a standards-based repository or digital content management system. Within the scope of this study, the concept of digital preservation is mapped to level-three curation (information flow with data curation) (Lord & Macdonald, 2003). Digital preservation requires periodic consultation (Bush, 1945) and managed activities (Beagrie & Jones, 2001; Beagrie, 2006; DPC Handbook, 2008) for checking the authenticity, integrity, and validity of preserved data over time to prevent or reduce obsolescence (Hedstrom & Montgomery, 1998; Russell, 2000; Lee et al., 2002) throughout the lifecycle of data. The concept of digital preservation is articulated through the Level Three Curation Model (information flow with data curation model) (Lord & Macdonald, 2003, p. 43). The Level Three Curation information flow with data curation model (Lord & Macdonald, 2003, p. 44) is the Level Two Curation Model with a data curator (formerly an archivist position) taking the data from curation to the data repository, accepting input from metadata and synchronous interaction with research based on data, while also receiving inputs from secondary (derived) data ultimately contributing to archived data. Even though digital preservation is associated with digital curation, it should be noted that digital preservation is a stage within the curation process that adds preservation management functions (OAIS, 2002/2012) to the curation process and should be viewed in contrast to the concept of level-three data curation (Lord, 2003) with data archiving, and not used synonymously with the concept of digital curation (DCC, 2004; JISC, 2006) or data curation.
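In practice, the periodic checks of authenticity and integrity described above are commonly implemented as fixity (checksum) verification of archived files. The sketch below is an illustrative example of that activity using only the Python standard library; the file names are hypothetical and the code is not drawn from any specific preservation system.

```python
# A minimal sketch of one routine digital preservation activity: periodic
# fixity (checksum) verification of archived files. Standard library only;
# file names are hypothetical.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(manifest: dict) -> dict:
    """Compare digests recorded at ingest against current file contents."""
    return {name: sha256_of(Path(name)) == expected
            for name, expected in manifest.items()}

if __name__ == "__main__":
    archived = Path("temperature_2013.csv")
    archived.write_text("station,timestamp,temp_c\n")   # stand-in archived file
    manifest = {str(archived): sha256_of(archived)}      # recorded at ingest
    print(verify_fixity(manifest))                       # re-checked on a schedule
```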

2.1.6 Summary

Data management planning, data curation, digital curation, and digital preservation are independent yet interrelated key concepts in data management and curation. These key concepts should not be used interchangeably. The key concepts must be properly identified, clarified, and defined when conducting data management and curation research within and across multiple disciplines. The underlying suppositions of the key concepts must be concatenated and interrelated when conducting data management and curation theory development within and across multiple disciplines. “Long term curation and preservation represent a complex set of challenges, which are exceptionally difficult for data centers and institutions to address individually” (JISC, 2003). Even though data management planning has figured prominently in research proposals and the literature since the data management planning requirement announced by the National Science Foundation (NSF) in 2011, data management and curation research that includes data curation, digital curation, and digital preservation has been progressively developing since 1996 and is receiving more comprehensive development as a result of the NSF DMP requirement. In addition to the NSF DMP announcement, the White House Office of Science and Technology Policy (OSTP) memorandum on increasing access to the results of federally funded research (OSTP, 2013), announced February 22, 2013, presents another opportunity for scientists to develop and implement data management practices congruent with emerging data management and curation trends.

5 OSTP. (2013). Memorandum for the heads of executive departments and agencies: increasing access to the results of federally funded scientific research. Retrieved July 10, 2013 from OSTP Memorandum on Access - 2013.

Investment in the preservation of digital assets does present new twists to the problem, especially in terms of immediacy, scale, and uncertainty. In the analog world, the rate of degradation or depreciation of an asset is usually not swift, and consequently, decisions about long-term preservation of these materials can often be postponed for a considerable period, especially if they are kept in settings with appropriate climate controls. The digital world affords no such luxury; digital assets can be extremely fragile and ephemeral, and the need to make preservation decisions can arise as early as the time of the asset’s creation (Blue Ribbon Task Force on Sustainable Digital Preservation and Access, 2008, p. 9).

Since data lifecycle management and preservation are common to multiple disciplines, data management and curation perspectives from different domains are relevant, appropriate, and necessary for this research study and the wider data management research community.

2.2 Research Issues

The research issues addressed in this study focus on (1) definitional confusion of key concepts of data management and (2) the need for multiple perspectives on the management of research data to contribute to the development of data management and curation theory. Multiple definitions of the key concepts of data management exist in literature and practice without the clear separation, clarification, and distinction of the specific data management concepts under investigation. Additionally, the interchangeable use of the key concepts, such as data curation for digital curation, digital preservation for digital curation, or digital curation for digital preservation, along with the use of inconsistent theories based on episodic, ethnographic, or pragmatic studies, leaves new investigators and users confused and unsure of any logical, practical, or systematic approach for implementing quality data management and curation within and across disciplinary domains. The concepts, methods, and tools employed within disciplinary domains need to be integrated into new conceptual tools for wide-ranging applications outside the principal domain in which they originated when investigating data management and curation (Microsoft, 2006, p. 22). Research discipline-specific preferred concepts and theoretical and methodological biases (Markus & Robey, 1988) include unclear definitions and measures (Bakopoulos, 1985) and organizational structure (Frey, 1982); similarities, differences, interrelationships (Gioia & Pitre, 1990), and tensions (Poole & Van de Ven, 1989); case study research (Eisenhardt, 1989); and interplay (Schultz & Hatch, 1996) that, when investigated and integrated into a multiple paradigm conceptual framework model, can provide a broader and deeper understanding of a common phenomenon in aggregate than in isolation. Interplay is the recognition of multiple views of a discipline’s paradigmatic insights and biases (Schultz & Hatch, 1996; Lewis & Grimes, 1999, p. 676). For the purposes of this study, the perspectives of scientists represented the multiple views of data management and curation practices from several research labs across multiple disciplinary domains. These research issues were addressed by the (1) identification, clarification, distinction, and definition of DMC key concepts through the use of models and (2) introduction of a conceptual framework that accommodates the integration of multiple perspectives in the study of data management and curation across multiple disciplinary domains.

2.3 Relevant Research

The purposively selected DMC projects relevant to this study were grouped into (1) domain, (2) institutional, (3) national, (4) international, (5) models, (6) relevant standards, guidelines, & best practices, (7) open access, and (8) tools categories. These eight categories are pertinent for scientists and researchers at research labs at FSU interested and/or involved in developing data management plans, improving data management practices, and facilitating the dissemination, preservation, and sharing of scientific results; specifically, federally funded scientific data. These categories are generic in context, broad in scope, and applicable to research within and across multiple disciplinary domains. The accumulation, aggregation, and interpretation of multiple views on how multiple disciplinary domains store, manage, and share data benefit multiple stakeholders, including funders, institutions, and users. Scientists responsible for the management of scientific data can benefit from the different perspectives on DMC within and outside their respective scientific disciplines through interdisciplinary collaboration, research, and resource sharing in pursuit of effective research data management.

2.3.1 Domain

The following domain-specific research projects include data management practices, policies, and strategies that could be useful to scientists in need of improving data management and curation services (including data management planning, data curation, digital curation, and

digital preservation functions) in order to increase the discovery, use, and re-use of their data for research, teaching, and education. The concepts, methods, and models used to store, manage, disseminate, and preserve the various data types and research data specific to disciplinary domains represent paradigmatic perspectives that can be referenced and integrated into a conceptual framework, such as the conceptual framework for analyzing methodological suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993), to develop, explore, and improve data management and curation within and across domains.

1. Integrated Earth Data Applications (IEDA) is a National Science Foundation funded project that produced the IEDA Data Management Plan Tool v.2, developed at the Lamont-Doherty Earth Observatory data management center at Columbia University with partnerships between EarthChem and the Marine Geoscience Data System (MGDS). The data management plan (DMP) tool allows scientists to create NSF-compliant data management plans, with capabilities that allow the funding agency to check that research data is being managed according to the specifications outlined in the data management plans. While this DMP tool focuses on solid earth data disciplinary domains (marine, terrestrial, and polar environments), the tool allows generic DMP creation for other disciplinary domains. Source: http://www.iedadata.org/compliance/plan
2. The Morphbank project allows scientists to store, provide access to, manage, and share images of biological specimens online for other scientists. In order for scientists to successfully contribute and participate in Morphbank, they must properly describe, document, and annotate research specimens for contribution, a task enabled by good data management practices, particularly data management planning, data curation, and digital curation. Scientists applying for funding from the National Science Foundation (NSF) could include Morphbank in the data management plan as the platform that will allow public access to federally funded research while also providing the required metadata and policies on data lifecycle management. High-performance grid technology is used for long-term preservation and storage of Morphbank data. The Morphbank Biological Imaging project is a continuously growing database of images that scientists use for international collaboration, research, and education. Source: http://www.morphbank.net/


The examples in the domain category have three functions in common. They (1) identify, describe, and represent the data; (2) store, manage, and disseminate the data; and (3) provide long-term preservation and archiving. Even though these three functions are common to multiple disciplinary domains, the concepts, methods, models, and workflows differ. Scientists can leverage multiple data management perspectives to strengthen areas of data management practice where data management standards, best practices, and guidelines are not fully realized, utilized, or capitalized upon. Examining how other domains store, manage, and preserve data is therefore useful.

2.3.2 Institutional

Data management and curation services are typically developed for particular purposes or research proposals, in pursuit of goals specific to an institution, according to articulated policies and procedures. Some institutional initiatives have broader impacts beyond the scope of their institutions, contributing to a wider scientific community that benefits diverse external stakeholders and users. With this in mind, the following sample of institutional initiatives is included to build upon disciplinary domain research in an effort to promote cogent DMC research and education for diverse users.

1. The Graduate School of Library and Information Science (GSLIS) Center for Informatics Research in Science and Scholarship (CIRSS) at the University of Illinois at Urbana-Champaign has contributed, and continues to contribute, to data curation research. The CIRSS6 project is an exemplar of collaboration and partnership among scientists from multiple disciplinary domains “to advance the work of scientists and scholars, the curation of research data, and the integration of information within and across disciplines and research communities” (CIRSS, 2013). The Summer Institutes on Data Curation provide data curation training to scientists, researchers, and practitioners involved in the management of research data, resulting in learning outcomes that contribute to developing effective data management practices through the adoption of data management standards, best practices, and

6 CIRSS. (2013). Center for Informatics Research in Science and Scholarship. Retrieved July 11, 2013 from http://cirssweb.lis.illinois.edu/index.php.

guidelines. CIRSS conducts research on information problems that impact scientific and scholarly inquiry. The CIRSS project can provide useful information to researchers developing data collaborations and partnerships between the social sciences and scientific disciplinary domains. The Summer Institute on Data Curation workshop offers relevant standards, guidelines, and best practices on data skills covering the collections and curation, e-science, socio-technical data analytics, and digital humanities domains. The workshop is a joint project between the Maryland Institute for Technology in the Humanities and GSLIS. Even though the workshop focuses on the social sciences and humanities, particularly information science, the topic of data curation is common across multiple disciplinary domains. Source: http://cirss.lis.illinois.edu/
2. The University of North Carolina at Chapel Hill (UNC) School of Information and Library Science (SILS) DigCCurr project is a national and international digital curation curriculum project targeted at educating and preparing professionals and practitioners in digital curation. The DigCCurr Project provides digital curation workshops and training with project learning outcomes that can contribute to educating scientists on digital curation research and its potential benefits for managing their research data. Source: http://ils.unc.edu/digccurr/
3. The Florida Digital Archive (FDA) is an open archival information system (OAIS) compliant repository that stores and manages data. The FDA requires data depositors to submit archival information packages (AIP) that contain a metadata file (XML) using the Metadata Encoding and Transmission Standard (METS), a content description file, and data set files that are digital preservation compliant (i.e., in non-proprietary file formats); a simplified sketch of such a metadata wrapper follows this list. The FDA includes all of the OAIS Reference Model Functional digital preservation elements (see Fig. 2). The Florida Virtual Campus (FLVC) Florida Digital Archive provides cost-effective preservation and archiving of digital materials in support of teaching and learning, scholarship, and research in the state of Florida. The FLVC FDA website provides a wealth of digital preservation information and documents for educating scientists and researchers at research labs at FSU on effective data management and curation practices through the use of standards, best practices, and guidelines. Source: http://fclaweb.fcla.edu/fda


4. The University of California Curation Center (UC3) helps researchers and the various UC libraries manage, preserve, and provide access to their important digital assets. The UC3 provides information on data management tools and resources such as Merritt (a data repository), EZID (persistent IDs to link data), a web archiving service (archive website content), DataUp (tabular/spreadsheet data management, i.e., Microsoft® Excel), and the DMPTool (helps researchers create and manage data management plans). The UC3 provides data management and curation services that extend beyond the scope of UC, with potential implications and applications for improving the data management practices of scientists at FSU. Source: http://www.cdlib.org/services/uc3/
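To make the archival packaging requirement described in the FDA example above more concrete, the sketch below (using Python's standard library) assembles a highly simplified METS-style XML wrapper. It is a minimal illustration only: the element structure is abbreviated, the attributes shown do not constitute a complete or valid METS profile, and the data set title and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # the METS namespace URI
ET.register_namespace("mets", METS_NS)

def simple_mets_wrapper(title, data_files):
    """Build a heavily abbreviated, illustrative METS-style wrapper for a data set."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"LABEL": title})

    # Descriptive-metadata stub; a real METS document embeds a full metadata record here.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "OTHER"}).text = title

    # One file entry per data file in the submission.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "data"})
    for i, name in enumerate(data_files, start=1):
        entry = ET.SubElement(group, f"{{{METS_NS}}}file", {"ID": f"FILE{i}"})
        ET.SubElement(entry, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "filename"}).text = name

    return ET.tostring(mets, encoding="unicode")

# Hypothetical data set and file names for a deposit.
print(simple_mets_wrapper("Diatom image data set",
                          ["image_001.tif", "image_002.tif", "readme.txt"]))
```

In practice, a depositor would follow the repository's published METS profile and validate the wrapper before submission; the point of the sketch is only to show that the "metadata wrapper" is itself a structured, machine-readable document.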

Even though the above-listed institutional DMC research deals primarily with data management and curation in the LIS and ARL disciplinary domains, these examples have research data management implications for all disciplinary domains, including the domains in this research study.

2.3.3 National

Among the national initiatives involved with data management, the following resources have been selected as appropriate and suitable references in contributing support for the development of effective data management strategies and practices of scientists at research labs at FSU.
a) The Board on Research Data and Information (BRDI) is a national initiative to improve the stewardship, policy, and use of digital data and information for science and the broader research and learning communities. BRDI sponsors events and symposia for scientists, researchers, academic deans, provosts, practitioners, funding agencies, and commercial, non-profit, and other stakeholders interested in the research issues involving the management of research data and in developing strategies for curriculum development aimed at addressing these research issues from now into the future. BRDI offers invaluable meetings and events at the National Academies of Sciences in Washington, DC. Scientists at research labs at FSU can significantly benefit from BRDI publications targeted at improving strategic and organizational policy development governing the stewardship, scholarship, and sustainability of research data in an era of massive amounts


of data, data-intensive scientific discovery, and the long tail of science (dark data). Source: http://sites.nationalacademies.org/PGA/brdi/index.htm
b) The HathiTrust is an outstanding example of a data repository that acquired Trusted Repository Audit & Checklist (TRAC) compliance from the Center for Research Libraries (CRL) in March 2011, before TRAC became the ISO 16363:2012 standard. HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world. HathiTrust is a good example of digitized physical objects being stored in a data repository that adopted the TRAC standard. Scientists who manage data in their respective repositories use HathiTrust as both a TRAC-compliant resource and a reference. Source: http://www.hathitrust.org/
c) The Integrative Biology Project (Donnelly, Boyd, & Spellman, 2008, p. 9-10) provides a good case study example of the use of high-performance computing by biologists and other scientists in the development of multi-scale models for the provision of improved data management, data sharing, data storage, and data reuse of biological research data output. These selected user tasks provide relevant insights that support the goal of this study.
• Manage and curate both input and output datasets securely.
• Create appropriate metadata information for future reference and re-use.
• Store intermediate and end user results in images for publications, including animations, as data files, using structured data management on the SRB. (Note: SRB7 is used to manage the wide variety of datasets generated at multiple locations in the project.)
• Assist in collaborative, distributed work.
• Use established standards where appropriate.
7 SRB. (2012). The DICE Storage Resource Broker. Retrieved June 24, 2013 from http://www.sdsc.edu/srb/index.php/Main_Page.

2.3.4 International

The following organizations have provided, and continue to provide, leading research on data management and curation for application across multiple disciplinary domains. Though the list is not exhaustive, these international supporters of and stakeholders in data curation and digital preservation are useful resources.
a) The Australian National Data Services (ANDS) provides resources that enable improved use, management, publication, discovery, accessibility, and organization of data. Source: http://www.ands.org.au/about-ands.html
b) The Alliance for Permanent Access (APA) is an international organization that aims to develop a shared vision and framework for a sustainable organizational infrastructure for permanent access to scientific information. Source: http://www.alliancepermanentaccess.org/index.php/about/
c) The Digital Curation Centre (DCC), founded in 2004 in Edinburgh, is a world-leading center of expertise in digital information curation with a focus on building capacity and skills for research data management across the UK’s higher education institutions (HEIs) research community. The DCC’s reach, influence, and impact extend well beyond UK HEIs. Its Curation Lifecycle Model, tools, publications, research, and resources contain an abundance of data management and curation information beneficial to all disciplinary domains and the research and learning communities. Source: http://www.dcc.ac.uk/about-us
d) The Digital Preservation Coalition (DPC) is an advocate and catalyst for digital preservation, enabling members to deliver resilient long-term access to content and services, and helping members derive enduring value from digital collections. Source: http://www.dpconline.org/about
e) Jisc, formerly known as the Joint Information Systems Committee, is a registered charity working on behalf of UK higher education, further education, and skills to champion the use of digital technologies. Source: http://www.jisc.ac.uk/aboutus.aspx
f) The National Science Foundation (NSF) funds numerous research projects within and across multiple scientific disciplinary domains. The NSF has many program directorates specific to particular disciplinary domains. NSF structures that span multiple disciplinary domains include the Division of Advanced Cyberinfrastructure (ACI), formerly the Office of Cyberinfrastructure (OCI), and interdisciplinary research proposals. Source: http://www.nsf.gov/od/iia/additional_resources/interdisciplinary_research/


Of the above-listed organizations, the NSF is the most relevant to this study and to research scientists at FSU, followed by the Digital Curation Centre (DCC). The following NSF definition of interdisciplinary research is particularly relevant and significant to this study for three reasons: (1) it provides a generally accepted and adopted definition of interdisciplinary research that can be used across multiple institutions and organizations; (2) it addresses the need for a multiple-paradigm conceptual model to address research topics common to multiple disciplinary domains; and (3) it supports the need for the integration of multiple paradigmatic perspectives for improved comprehension of research issues that extend beyond the scope of a single disciplinary domain or paradigm.

Interdisciplinary research is a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice. (Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy, 2004, p. 2)

In addition to providing the adopted definition of interdisciplinary research underlying this study, the NSF made a significant impact on the research and learning communities with its data management plan (DMP) requirement in 2011. The NSF DMP requirement obliges any scientist seeking NSF support to submit a two-page DMP as part of his/her research proposal. Providing useful research data descriptions, representations, and preservation information with federally funded research is thus no longer optional for scientists. The DMP requirement is an opportunity for scientists to prolong the usefulness of their research to the science, scholarly, and research communities by providing the crucial and necessary information that allows machines, computers, and operating systems to search, retrieve, deliver, and preserve research data in homogeneous, heterogeneous, and distributed technical environments. A DMP allows scientists to prevent their research data from becoming unusable, non-accessible (dark), non-discoverable, offline, and difficult to manage. The international organizations listed above all provide resources and access to data management and curation projects, research, publications, resources, and tools that could benefit any scientist wanting to improve education, knowledge, skills, and understanding of data management planning, data curation, digital curation, and digital preservation. Whether

fulfilling university and/or funding agencies’ data management requirements, all of these organizations provide access to research, resources, and tools to educate scientists and/or professionals responsible for managing research data.

2.4 Models

The following reference models are foundational to the definitions of data curation, digital curation, and digital preservation in the ARL and LIS communities. These models figure prominently in the basic understanding of, and introduction to, data management and curation services. They are articulated here to illustrate some of the roles and complexities involved in these DMC processes. For the purposes of this study, a reference model is defined as follows:
• A reference model is an abstract framework for understanding significant relationships among the entities of an environment, and for the development of consistent standards or specifications supporting that environment;
• A reference model is based on a small number of unifying concepts and may be used as a basis for education and explaining standards to a non-specialist;
• A reference model is not directly tied to any standards, technologies, or other concrete implementation details, but does seek to provide a common semantic that can be used unambiguously across, and between, different implementations (Fox et al., The Digital Libraries Reference Model section, 2011).

2.4.1 Open Archival Information System (OAIS) Reference Model – Preservation

The OAIS Reference Model Functional diagram is a graphical representation of several preservation management functions that include aspects of data management planning, data curation, digital curation, and digital preservation. The OAIS Reference Model functional activities include the producer (data creator), the submission information package (SIP), the archival information package (AIP), and the dissemination information package (DIP) to preserve and archive data for resource discovery, use, and reuse. The open archival information system (OAIS) reference model, particularly the functional digital preservation diagram, illustrates the processes involved in digital preservation. The diagram illustrates the workflow of an archival information package (AIP), which comprises a metadata wrapper, a data description, and data files. The AIP is necessary for the

repository (the platform that stores the data) to read and understand the data that is deposited for data storage management. The AIP is generally ingested into the repository platform by a data archivist, data curator, or data manager, but could also be ingested by the scientist. The AIP must adhere to appropriate metadata and repository standards for successful data storage, digital curation, and digital preservation. The OAIS reference model is at the foundation of multiple definitions of data curation, digital curation, and digital preservation, and contributes significantly to many institutional repository developments, models, and frameworks, as well as to data management and curation research. The OAIS reference model provides an easy way to articulate to others the major digital preservation processes involved in data management planning.

Fig. 2 – OAIS Reference Model – Functional (CCSDS, 2002/2012) (Used with permission)
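As a rough illustration of the information-package idea described above (a metadata wrapper, a data description, and data files traveling together), the following sketch records a checksum and size for each data file and writes a small package manifest. The manifest layout and file names are hypothetical assumptions for illustration; an actual ingest workflow would follow the repository's own OAIS/METS profile rather than this simplified structure.

```python
import hashlib
import json
from pathlib import Path

def build_package_manifest(title, description, data_files, out_path="manifest.json"):
    """Write a toy information-package manifest: a description plus per-file sizes and checksums."""
    entries = []
    for path in map(Path, data_files):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"file": path.name, "bytes": path.stat().st_size, "sha256": digest})
    manifest = {"title": title, "description": description, "files": entries}
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest

# Hypothetical usage: package two files produced by a lab instrument.
# build_package_manifest("Cruise report scans",
#                        "Digitized technical cruise reports",
#                        ["report_1978.pdf", "report_1979.pdf"])
```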

2.4.2 Levels 1-3 Curation

The three levels of curation are (1) Level 1 Curation – traditional academic information flow (data curation), (2) Level 2 Curation – information flow with data archiving (digital curation), and (3) Level 3 Curation – information flow with data curation (digital preservation; Lord, 2003). These levels of curation were introduced in the 2003 e-Science Curation Report: Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision, a report seen by many data management and curation professionals as an important document in the development of data curation, digital curation, and digital preservation. The Level 3 Curation diagram (see Fig. 3), which includes Level 1 and Level 2 Curation, captures three of

the key data management concepts, while the overall diagram comprises the fourth, and most important, key concept: data management planning.

Fig. 3 – Level Three Curation – Information Flow with Data Curation (Lord, 2003) (Fair Use)

2.4.3 The DCC Curation Lifecycle Model

The DCC Curation Lifecycle Model was introduced to the research and learning community at the 3rd International Digital Curation Conference (IDCC) in Washington, DC in December 2007. The model includes many important stages needed for the effective planning, management, and preservation of data over its lifecycle of usefulness to research, science, and education. The model is meant for use as a reference guide, not as a definitive or exhaustive checklist of the processes involved in data management and curation. The stages range from the description and representation of data and databases, to curation, storage, and preservation, to dissemination, access, use, and re-use, to disposition. The DCC Curation Lifecycle Model is mapped to the different data management and curation stakeholders in the Data Asset Framework (DAF) implementation guide (JISC, HATII, DCC, 2009, p. 3). The DCC Curation Lifecycle Model provides a high-level graphical representation of the key concepts of DMC for easy articulation, assimilation, adoption, and integration in the development of planning policies and strategies for effective data management across disciplines.


Fig. 4 – The DCC Curation Lifecycle Model (Digital Curation Centre, University of Edinburgh, 2007/2014) (Used with permission)

These three models are included in this dissertation due to their significance in the development of data management planning, data curation, digital curation, and digital preservation concepts, definitions, education, and research. Each of the models has been included, cited, or extended in numerous publications and has been highlighted at national and international conferences. The purpose of including these models is to support the need for the identification, clarification, and differentiation of the key concepts of data management and curation services (DCMS; see Fig. 1), to reduce definitional confusion and the interchangeable use of key concepts, and to promote future data management and curation services theory development. These models assist in conveying key DMC concepts beyond the primary disciplinary domain of DMC services in academic research libraries (ARL), iSchools, and library and information sciences (LIS), not for the unification, standardization, or assimilation of these key concepts across multiple disciplinary domains, but to provide a DMC paradigmatic framework for use within a conceptual model when investigating DMC across multiple disciplines. Since DMC is common to all disciplines, other disciplinary domains’ perspectives,

understandings, and interpretations of these models are necessary for interdisciplinary research, collaboration, and DMC theory development. Research issues involving the management of data within and across disciplinary domains require a more systematic and comprehensive approach for promoting good data management practices, one that meets funding agencies’ data management requirements and ensures data curation, data access, data discovery, and data archiving. As mentioned earlier, the current and future use of research data is contingent on the use of relevant standards, best practices, and guidelines where appropriate, as developed through global expert opinion, multi-stakeholder processes, and consensus (ISO8, 2013). The DCC Curation Lifecycle Model was used as a referent model during the interviews in Phase 2 of this study to allow interview participants to click on areas of the model that they believed were important in the management and curation of their research data. The Data Curation Continua (Treloar, Groenewegen, & Harboe-Ree, 2007) is another reference model with which to assess and audit the (1) object, (2) management, and (3) access dimensions of the data lifecycle on a continuum.

Table 1: Data Curation Continua (Treloar, Groenewegen, & Harboe-Ree, 2007) (Used with permission)

2.5 Relevant Standards, Guidelines, & Best Practices

The following sample of standards, best practices, and guidelines is included to support the position that disciplinary domains should explore, investigate, and adopt standards, best practices, and guidelines specific to their research work and disciplinary domain when appropriate. Since most scientists do not have the time to explore disciplinary domain-specific standards, best practices, and data management guidelines, it is recommended that scientists engage a data curator, data archivist, or data manager to perform the data management research and due diligence necessary to exploit the applicable DMC resources and tools for data management planning, data curation, and quality research data management. Below are a few examples that support exploring DMC practices in multiple disciplinary domains within a research university.
8 How does ISO develop standards? Retrieved May 26, 2014 from ISO standards.

2.5.1 Metadata Standards

Metadata comprises elements that provide information describing and representing data such as data objects, databases, data sets, or information resources. Metadata ensures that a data object can be appropriately identified, discovered, and used. Categorized metadata include topic headings with defined functions. Examples of categorized metadata and their accompanying functions contributing to the development of a metadata standard are: (1) descriptive metadata – enables identification, location, and retrieval; (2) technical metadata – describes technical data capture processes; (3) administrative metadata – manages copyrights, acquisition, and versioning; (4) use metadata – manages user access, tracking, and multiple versioning information; and (5) preservation metadata – documents preservation, migration, and validation information (Higgins, 2007). An excellent source on metadata standards for various communities and domains is “Seeing Standards: A Visualization of the Metadata Universe9” by Jenn Riley (2009/2010). A brief, general introduction to metadata and metadata standard elements is necessary for basic understanding before transitioning to the specific metadata standards relevant to scientific disciplinary domains, particularly the research labs at FSU selected for this study.
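To ground the descriptive-metadata category above, here is a minimal sketch that writes a handful of elements from the Dublin Core element set. Only the Dublin Core namespace and element names are standard; the enclosing record element and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Wrap a dictionary of Dublin Core element names and values in a simple record element."""
    record = ET.Element("record")  # illustrative container, not part of any standard
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")

# Invented sample values for a hypothetical image collection.
print(dublin_core_record({
    "title": "Sadana Island Shipwreck images",
    "creator": "FSU Department of Anthropology",
    "subject": "Maritime archaeology",
    "date": "2008",
    "format": "image/tiff",
}))
```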

2.5.2 Disciplinary Domains

The problem of managing data varies across fields, disciplines, and domains. The conceptual models, methods, and tools used are specific and applicable to the types and formats of data within the respective fields, disciplines, and domains. For the purposes of this study, disciplinary domains will refer to both disciplines and domains in order to accommodate multi-disciplinary, interdisciplinary, and cross-disciplinary domains such as the several research laboratories at Florida State University and the National Science Foundation (NSF) EarthCube project. This study will not delve deeply into the distinctions among multi-disciplinary, interdisciplinary, and cross-disciplinary research, but will treat each classification as a perspective representative of multiple disciplinary domains, appropriate and relevant for investigating DMC practices in multiple research labs. The purpose of this section is to introduce disciplinary domain-specific DMC research as a reference guide for stimulating the discussion, exploration, and implementation of metadata standards, metadata standard extensions, tools, and uses. The following disciplinary domain metadata examples are relevant standards for applying DMC processes to research data within disciplinary domains in order to facilitate data access, discovery, and use of the research data beyond those domains. It is assumed that the following disciplinary domain examples can serve as resources for those interested and/or involved in adopting disciplinary domain metadata to facilitate data sharing.
1. Biology - http://www.dcc.ac.uk/resources/subject-areas/biology
2. Earth Science - http://www.dcc.ac.uk/resources/subject-areas/earth-science
3. General Research Data - http://www.dcc.ac.uk/resources/subject-areas/general-research-data
4. Physical Science - http://www.dcc.ac.uk/resources/subject-areas/physical-science
5. Social Science & Humanities - http://www.dcc.ac.uk/resources/subject-areas/social-science-humanities
9 Riley, J. (2009/2010). Seeing Standards: A Visualization of the Metadata Universe.

These disciplinary domain metadata categories include metadata standards, tools, and use cases that enable data management planning, data curation, digital curation, and digital preservation specific to disciplinary domains. In aggregate, the examples offer multiple worldviews and perspectives on the representation and description of scientific data, accommodating data lifecycle management across the levels of data management and curation. Data management planning must include disciplinary-specific metadata standards where appropriate, or at least some form of technical documentation, annotation, or description of the scientific data, to support reproducibility, interoperability, and resource discovery now and in the future.

2.5.3 Repository Standards

The following two repository standards are included as reference guidelines for ensuring the storage, management, and long-term preservation of data. These standards ensure that a data repository meets or exceeds the minimum criteria deemed necessary for data

management planning compliance. In particular, repositories developed using ISO guidelines are considered exemplars and can serve as reference models for organizations developing repository standards for strategic and comprehensive data management implementation.
1. ISO 14721:2012 defines the reference model for an open archival information system (OAIS) (see Fig. 2). Repositories that are OAIS-compliant (e.g., the FLVC FDA) are considered exemplars and can effectively be used as reference models for developing OAIS-compliant repositories to store, manage, and preserve research data as part of effective data management and curation practices.
2. ISO 16363:2012 defines recommended practices for assessing the trustworthiness of digital repositories, formerly the Trusted Repository Audit Checklist (TRAC). Any repository that is TRAC-compliant (e.g., HathiTrust) is considered an exemplar and can effectively serve as a reference model for accomplishing the requirements needed to acquire TRAC compliance and thus exhibit data management and curation practices congruent with the appropriate standards (i.e., metadata and repository).

2.5.4 Best Practices

This small sample of data management best practices is included to support recommendations for the development of data management best practices, compliant with established procedures and workflows, that support the adoption of metadata and data repository standards in pursuit of effective data management and curation services and practices. The Australian National Data Services (ANDS) Principles is a good example of multiple stakeholders’ engagement, support, and collaboration coalescing into the quality development of organizational data management and curation services and practices. The Data Seal of Approval Assessment Guidelines, included in the preliminary study, provide very good criteria for data repository assessment. Similar to data repositories that acquire ISO compliance, data repositories that acquire Data Seal of Approval compliance are exemplars. Griffith University’s best practice guidelines for researchers: managing research data and primary materials10 (consultation draft, July 2013) addresses and supports some of the statements outlined in this section. Lastly, the Oak Ridge National Laboratory (ORNL) Distributed

10 Griffith University. (August 2013). Best practice guidelines for researchers: Managing research data and primary material. See: Best practice guidelines for researchers.

Active Archive Center (DAAC) for Biogeochemical Dynamics Best Practices provides a list of best practices specific to a disciplinary domain but with data management implications for multiple disciplinary domains, particularly the research labs at FSU.

2.5.5 Guidelines

The following guidelines provide useful information for developing data policy, general principles, and procedures for accommodating standards and best practices relevant to data management planning, leading to effective data management and curation practices.
1. The Research Information Network stewardship of digital research data principles and guidelines is a framework promoting the access, discovery, and use of data from publicly funded research. The document provides guidelines on the responsibilities of research institutions and funders, data managers, learned societies, and publishers (RIN, 2008). Of all the guidelines identified in this study, this is one of the most relevant: the document, which supports public access to publicly funded research data, predates by five years the 2013 OSTP memorandum on public access to publicly funded research in the US and the 2013 NEH Office of Digital Humanities data management plan requirement.
2. The OECD Principles and Guidelines for Access to Research Data from Public Funding (Source: OECD 2007 http://www.oecd.org/science/sci-tech/38500813.pdf)
3. Promoting Access to Public Research Data for Scientific, Economic and Social Developments (Source: Data Science Journal, Volume 3, 29 November 2004)

The ability of curation and preservation management systems to assure that users can render and use preserved content is realized through sound practices such as the Trustworthy Digital Object (TDO) Methodology principles of preservation, in conjunction with the adoption of the OAIS standard; a minimal fixity-check sketch follows the list below. The TDO Methodology principles include:
1. Content servers that store and provide search and access to data;
2. Replication mechanisms that protect data from loss of the last copy of any data;
3. A method for packaging a work together with metadata that includes provenance assertion, reliable linking of related works, ontologies, rendering software, and package pieces with one another;


4. Standard bibliographic metadata and topic-specific ontologies defined, standardized, and maintained by the professional communities; 5. A bit-string encoding scheme to represent each content piece in language insensitive to irrelevant and ephemeral aspects of its current computer environment (Gladney, 2004, p.7).
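The replication and bit-level integrity ideas in the TDO principles above can be made concrete with a routine that verifies stored copies against recorded checksums. This is a generic fixity-checking sketch rather than Gladney's TDO implementation; the manifest format and directory paths are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

def verify_copies(manifest, replica_dirs):
    """Check that every manifest entry exists, with a matching SHA-256, in each replica directory."""
    problems = []
    for file_name, expected_sha256 in manifest.items():
        for replica in map(Path, replica_dirs):
            target = replica / file_name
            if not target.exists():
                problems.append(f"missing copy: {target}")
            elif hashlib.sha256(target.read_bytes()).hexdigest() != expected_sha256:
                problems.append(f"checksum mismatch: {target}")
    return problems

# Hypothetical manifest and replica locations (e.g., a local archive plus an off-site copy).
# issues = verify_copies({"diatom_scan_001.tif": "<recorded sha-256 value>"},
#                        ["/archive/local", "/archive/offsite"])
```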

The DMC key concepts, together with the selection of relevant research, standards, best practices, guidelines, and tools, permit the dissemination and sharing of data via open access. Open access benefits funders, scientists, researchers, and the scholarly research communities.

2.6 Open Access

Open access (OA) means free, unrestricted access to resources. The decision to allow open access to scientific research is no longer optional for scientists; it is a requirement for some federally funded research projects, particularly those with DMP requirements. This creates both a challenge and an opportunity for scientists to develop data management and curation practices that improve the management of data and produce plans compliant with the emerging trends of data management plan requirements, digital preservation, and open access. Open access resources carry limited restrictions and/or permissive user access terms that allow online access. Public access is not the same as open access. Generally, public access refers to access granted to the public after an embargo (restricted or limited access) has expired. Open access allows use and re-use of freely available resource materials without risk or harm to the users. Open access is free of most copyright and licensing restrictions and provides unrestricted online public access to electronic literature, including the results of scholarly research (BOAI, 2002; Suber, 2004/2013; IOAW, 2013). As mentioned previously, research data must include metadata in order for the resource to be discovered and accessed. The same is true for all open access materials. Indeed, open access materials without proper metadata succumb to the same access and discovery issues as research data without metadata: they become inaccessible (dark/offline). Research, development, and support for open access continue to progress. The White House’s Office of Science and Technology Policy (OSTP) announced via a memorandum to all department heads in February 2013 that federal funding agencies must develop open access policies that provide information about access to federally-funded research.


On 22 February 2013, the Office of Science and Technology Policy (OSTP) issued a memorandum to the heads of executive departments and agencies, directing them to ‘develop a plan to support increased public access to the results of research funded by the Federal Government. This includes any results published in peer-reviewed scholarly publications that are based on research that directly arises from Federal funds, as defined in relevant OMB circulars (e.g., A-21 and A-11). It is preferred that agencies work together, where appropriate, to develop these plans.’ (DBASSE, 2013, ¶ 2)

The OSTP 2013 announcement quickly made the rounds on all the national and international research data management listservs, and with good reason: it is reminiscent of the National Science Foundation (NSF) data management plan (DMP) requirement announcement in 2011. Before these two announcements, scientists and researchers on federally funded research projects did not have to submit information as to how the data would be managed and/or provide access to research data from federally funded research projects. There are many complexities and dimensions involved in the provision of open access, ranging from domain/subject-based content, to types of content, to levels of access, to the object, management, and access dimensions of the institutional repository of published research (Blinco & McLean, 2004). The development and continuity of open access initiatives require agreement and active participation among program promoters (i.e., president, provost, funding agency), stakeholders (i.e., disciplines, departments, scientists), and users (i.e., scholars, campus, public). The program evaluation11 of open access literacy programs must include the following to remain effective:
1. The need for the program (i.e., OSTP memorandum, open access policy)
2. The design of the program (i.e., infrastructure)
3. Program implementation (i.e., institutional repository (IR) platform)
4. Program impact or outcomes (i.e., scholarship, scholarly communication)
5. Program efficiency (i.e., success metrics)
Today, scientists must develop data management plans and provide access to research data from federally funded research projects. Providing access to research data involves (1) technological, (2) institutional and managerial, (3) financial and budgetary, (4) legal and policy, and (5) cultural and behavioral issues (Arzberger et al., 2004, p. 136). The data management and

curation models, relevant standards, best practices, and guidelines discussed in this chapter address these issues and can help scientists at research labs at FSU share data effectively. The OSTP open access and NSF data management plan requirements oblige data producers, particularly scientists and those responsible for managing research data at research labs at FSU, to be cognizant of relevant data management standards, best practices, and guidelines involving data management and open access to federally-funded research.
11 Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Sage.

2.7 Tools

There are many data management and curation tools available to scientists. The following examples are a sample of useful tools for managing data.
1. The Collaborative Assessment of Research Data Infrastructure and Objectives (CARDIO) facilitates data management requirements assessments, builds consensus among the various roles involved in data management, and identifies practical data management goals, support needs, operational inefficiencies, cost-saving opportunities, and stakeholder support (CARDIO, 2013).
2. The Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) allows for an internal audit of a data repository to assess capabilities, opportunities, and strengths. DRAMBORA has been administered to several types of organizations ranging from national libraries and scientific data centers to cultural and heritage data archives.
3. Workflow4Ever is a toolkit for addressing the challenges of preserving scientific experiments in data-intensive science (http://www.wf4ever-project.org/).

Data management planning requires tools to advance the data management lifecycle processes of data curation, digital curation, and digital preservation. Data management and curation tools are developed to address the multitude of complex challenges involved in the management of data over its lifecycle. The following data curation tools are practical and useful for curators, researchers, and scientists handling data.

2.7.1 Data Management Planning Tools

A good data management planning tool allows scientists to create data management plans that meet or exceed the minimum data management plan requirements set by the


NSF. Even if researchers are not submitting proposals for funding to the NSF, data management planning tools are useful for identifying, organizing, and strategizing about many of the complex challenges involved in the management of data throughout the life of a research project. These data management planning tools are suitable resources for scientists both familiar and unfamiliar with developing data management plans to guide the management of data assets; a minimal plan-skeleton sketch follows the tool list below.

1. The Integrated Earth Data Applications (IEDA) Data Management Plan Tool v.2 is a practical data management planning (DMP) software application for developing data management plans for geosciences, earth, ocean, and polar sciences disciplinary domain data. The DMP tool has generic form capabilities for creating DMPs for multiple disciplines, such as the research labs at FSU. The IEDA allows data access, analysis, compliance, publication, and contribution congruent with the National Science Foundation (NSF) DMP requirement. The DMP Tool v.2 was released on April 17, 2012. Source: http://www.iedadata.org/compliance/plan
2. The DCC Data Management Planning Tool (DMPonline), developed by the Digital Curation Centre, is a particularly good data management plan tool that exceeds the NSF DMP requirements. The DCC DMP tool prompts users to address many of the stages of preservation illustrated in the DCC Curation Lifecycle Model, the elements of the NSF DMP, and the ICPSR Data Management & Curation elements of a data management plan. Similar to the IEDA Data Management Plan Tool, the DCC DMP tool is applicable to multiple disciplines. DMPonline v.4 is scheduled for release in August 2013. Source: https://dmponline.dcc.ac.uk/
3. The University of California Curation Center (UC3) California Digital Library DMPTool allows researchers from partner institutions to create institution-specific data management plans for NSF DMP compliance. The DMPTool v.2 is currently under development as of July 2013. Source: https://dmp.cdlib.org/
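For readers without access to the tools above, the sketch below shows one way to draft the skeleton of a data management plan programmatically. The section headings follow the elements commonly requested in NSF-style plans (types of data; data and metadata standards; access and sharing; re-use and redistribution; archiving and preservation), but the exact wording, ordering, and sample answers are assumptions rather than any tool's output.

```python
DMP_SECTIONS = [
    "Types of data produced",
    "Data and metadata standards",
    "Policies for access and sharing",
    "Policies for re-use and redistribution",
    "Plans for archiving and preservation",
]

def draft_dmp(project_title, answers):
    """Render a plain-text data management plan skeleton from per-section answers."""
    lines = [f"Data Management Plan: {project_title}", ""]
    for section in DMP_SECTIONS:
        lines.append(section)
        lines.append(answers.get(section, "TO BE COMPLETED"))
        lines.append("")
    return "\n".join(lines)

# Hypothetical usage for a lab drafting a plan before proposal submission.
print(draft_dmp("Antarctic sediment core imaging", {
    "Types of data produced": "Core images (TIFF) and instrument logs (CSV).",
    "Plans for archiving and preservation": "Deposit in an OAIS-compliant repository.",
}))
```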

Whether developing data management plans for funding agency or institutional compliance, scientists can use data management planning tools to make informed decisions about the creation, organization, dissemination, citation, and utilization of data. Data management planning is an essential element of the data lifecycle that precedes and leads into

the data lifecycle management processes of data curation, digital curation, and digital preservation. Metadata, metadata standards, best practices, guidelines, open access, and tools are important elements of successful data management planning that contribute to an environment that encourages the dissemination and sharing of data.

2.7.2 Curator Tools

The data curator tools focus on (1) depositing and ingesting digital objects, (2) archiving and preserving information packages, and (3) managing and administering repositories (DCC, 2013). Depositing and ingesting digital objects includes the creating and manipulating metadata; data transfer and deposit; metadata extraction; normalization and migration; and web archiving categories. Archiving and preserving information packages includes access platforms; backup and storage management; creating and manipulating metadata; emulation; file format ID and validation; metadata harvest and exposure; normalization and migration; persistent ID assignment; and repository platform themes. Managing and administering repositories includes the administrative and rights documentation; assessment and audit; costing; data management planning; and preservation planning divisions. Popular curator tools include (1) Archivematica, (2) the JSTOR/Harvard Object Validation Environment (JHOVE), (3) Xena Software, (4) Curator’s Workbench, (5) the KEEP Emulation Framework, and (6) FITS (DCC, 2013). Creative Commons is a popular administration and rights documentation tool for data curators. Curator tools are valuable resources for data curators working with researchers in the management of data.
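As a trivial stand-in for the format-identification and fixity work performed by tools such as JHOVE and FITS, the sketch below walks a deposit directory and records a best-guess media type (by file extension) and a checksum for each file. It illustrates the category of work only; it is not a substitute for the tools named above, and the directory path is hypothetical.

```python
import hashlib
import mimetypes
from pathlib import Path

def survey_deposit(directory):
    """Record a naive format guess and a SHA-256 checksum for each file in a deposit directory."""
    report = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            media_type, _ = mimetypes.guess_type(path.name)
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            report.append({"file": str(path),
                           "guessed_type": media_type or "unknown",
                           "sha256": digest})
    return report

# Hypothetical deposit directory received from a researcher.
# for row in survey_deposit("/incoming/deposit_42"):
#     print(row)
```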

2.7.3 Researcher Tools

The researcher tools focus on (1) managing active research data and (2) sharing output and tracking impact (DCC, 2013). Managing active research data includes the active data storage, data management planning, persistent ID assignment, and workflow and lab notebook management categories. Sharing output and tracking impact includes academic social networking and citation and impact tracking. The researcher tools include DMPonline, DataCite, DataVerse, DMPTool, Kepler, LabTrove, myExperiment, WebCite, and the Open Planets


Foundation’s preservation planning tool for organizations (Plato12), SCAPE13 (DCC, 2013), and Workflow4Ever. The data management and curation tools allow other domain experts such as data curators, digital archivists, academic research digital librarians, and data managers (i.e., non-scientists) outside the scientific disciplinary domains to assist scientists in linking data in a reliable, scientifically meaningful way, which is very difficult for scientists to do within their disciplinary domains (Microsoft, 2006, p. 19). The curator and researcher tools14 support the scientist and curator functional roles in the OAIS preservation planning (CCSDS, 2002/2012), the Levels 1-3 Curation (Lord & Macdonald, 2003), and the DCC Curation Lifecycle Model (DCC, 2007/2014), which include aspects of data curation, digital curation, and digital preservation to ensure the integrity, accessibility, and stewardship of data in multiple disciplinary domains. At the core of the DCC Curation Lifecycle Model, and of any model for the dissemination and sharing of data, is metadata.

2.7.4 A Relevant Metadata Standards Use Case Example

Even though a great deal of research exists on the development, application, and implementation of data management metadata and metadata standards in the archival sciences, social sciences & humanities, e-science, and digital libraries communities, the current metadata standards research and development at the National Oceanic and Atmospheric Administration (NOAA) National Coastal Data Development Center (NCDDC) has implications for improving and aligning the data management practices of scientists at FSU research labs from multiple disciplinary domains as they pursue standards, best practices, and guidelines within their respective disciplines. The NOAA NCDDC (1) endorses ISO metadata standards, (2) offers a metadata standards aid (MERMAid), (3) conducts metadata training, and (4) utilizes XML transformations to make its data discoverable, usable, and understandable (NOAA NCDDC, 2013). The NOAA NCDDC’s current transition to International Organization for Standardization (ISO) metadata standards is the outcome of NOAA’s Environmental Data Management

12 Plato: The Preservation Planning Tool. Retrieved April 10, 2014 from http://www.ifs.tuwien.ac.at/dp/plato/intro.html.
13 SCAPE. Scalable Preservation Environments. http://www.scape-project.eu/.
14 DCC. (2013). Digital curation resources from outside the DCC: tools & services. Retrieved April 10, 2014 from http://www.dcc.ac.uk/resources/external/tools-services.

Committee’s (EDMC) Data Documentation Planning Directive and the NOAA Administrative Order (NAO) 212-15: Management of Environmental Data and Information15. The NAO establishes the Department of Commerce (DOC) NOAA environmental data management policy that guides the procedures, decisions, and actions regarding the full data lifecycle of all domains of NOAA environmental data, information, and records (Lubchenco, 2010). The NOAA NCDDC endorses the following ISO metadata standards: 1. ISO Technical Committee 211 – Geographic information/Geomatics "aims to establish a structured set of standards for information concerning objects or phenomena that are directly or indirectly associated with a location relative to the earth." 2. ISO 19115:2003 Geographic Information – Metadata Order Information; 3. ISO 19115:2003 Geographic Information - Metadata Workbook - Guide to Implementing ISO 19115:2003 (E), the North American Profile (NAP), and ISO 19110 Feature Catalogue; 4. ISO 19115:2003 Geographic Information – Metadata Biological Extensions Work – Guide to Implementing ISO 19115:2003 (E), the North American Profile (NAP), and ISO 19110 Feature Catalogue with Biological Extensions; 5. ISO 19115-2:2009 Geographic Information – Metadata – Part 2: Extensions for imagery and gridded data workbook – Guide to Implementing ISO 19115-2:2009(E), the North American Profile (NAP), and ISO 19110 Feature Catalogue (NOAA NCDDC, 2013).

The NOAA NCDDC MERMAid supports Machine-Readable Cataloging (MARC) export from the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) for the description, representation, and communication of environmental data and information. MERMAid is the secure, online, Web-based Metadata Enterprise Resource Management Aid application of the NOAA NCDDC “that allows users to establish unlimited metadata databases to organize their metadata records any way they see fit (i.e., by project, data type, personnel, etc.)” (NOAA NCDDC MERMAid, 2013). The FGDC is an interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis. The FGDC Content Standard for Digital Geospatial Metadata (CSDGM),
15 NOAA Administrative Order 212-15, retrieved July 14, 2013 from http://www.corporateservices.noaa.gov/ames/administrative_orders/chapter_212/212-15.html.

Ver. 2 is the current federal standard in support of Executive Order 12906, though recent guidance from the FGDC encourages agencies to transition to ISO metadata as they are able to do so (NOAA NCDDC, 2013).

The NOAA NCDDC is a clear example of high-level data management policy document development with support among all stakeholders throughout NOAA. The purposes, plans, and ideas of making environmental data, information, and records available for dissemination and sharing are realized through the adoption and implementation of metadata standards, leading ultimately to ISO standards. In addition to the adoption and implementation of metadata standards to advance resource discovery, metadata training and XML transformations16 (eXtensible Markup Language, for document organization, structure, and dissemination) are essential for scientists and users to take advantage of the data made available to them through description, representation, and communication via metadata standards. The NOAA NCDDC’s transition to ISO metadata standards, development of MERMAid, metadata training, and XML transformation education reflect an organization’s commitment to effectively manage data. The NCDDC’s XML transformation initiatives recognize that standards are needed for data discovery, access, use, and preservation. The NCDDC’s development and use of several XML transformations (FGDC CSDGM to ISO Transform, FGDC CSDGM to ISO Crosswalk, ISO XML to HTML View, FGDC BIO to ISO Transform, FGDC BIO to ISO Crosswalk, and FGDC RSE to ISO Transform, to name a few) to disseminate and share environmental data and information is one domain’s perspective on how to effectively manage data, with implications for improving how scientists at research labs at FSU disseminate and share their data.
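To illustrate what an FGDC-to-ISO crosswalk of the kind listed above does, the following sketch copies two fields (title and abstract) from an FGDC CSDGM record into a simplified ISO-flavored structure. The FGDC element paths shown are the standard CSDGM ones, but the output uses flattened, illustrative element names rather than the namespaced ISO 19115/19139 schema that a production transform targets, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

FGDC_SAMPLE = """
<metadata>
  <idinfo>
    <citation><citeinfo><title>Coastal habitat survey</title></citeinfo></citation>
    <descript><abstract>Invented abstract for an example coastal data set.</abstract></descript>
  </idinfo>
</metadata>
"""

def fgdc_to_simple_iso(fgdc_xml):
    """Map FGDC CSDGM title and abstract into a flattened, ISO-flavored record (illustrative only)."""
    fgdc = ET.fromstring(fgdc_xml)
    iso = ET.Element("MD_Metadata")            # simplified: real ISO 19139 output is namespaced
    ident = ET.SubElement(iso, "identificationInfo")
    ET.SubElement(ident, "title").text = fgdc.findtext("idinfo/citation/citeinfo/title")
    ET.SubElement(ident, "abstract").text = fgdc.findtext("idinfo/descript/abstract")
    return ET.tostring(iso, encoding="unicode")

print(fgdc_to_simple_iso(FGDC_SAMPLE))
```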

2.7.5 Disciplinary-Specific Issues, Perspectives, & Use Cases

The National Research Council publication titled For Attribution – Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop (NRC, 2012) includes brief domain perspectives from the (1) life sciences, (2) earth and physical sciences, (3) social sciences, and (4) humanities on issues covering the types of data, data risks, citation, replication, and publication important in data sharing. The NRC publication highlights

the significance of multiple disciplinary perspectives on data management issues common across disciplines. The NOAA NCDDC partners that employ XML technologies, including (a) the West Coast Observing System, (b) NOAA’s Office of National Marine Sanctuaries, (c) the Florida Geospatial Assessment of Marine Ecosystems, (d) the Louisiana Department of Natural Resources, (e) Applied Coastal Engineering and Science, and (f) the Strategic Online Natural Resources Information System, also represent an interdisciplinary research collaboration infrastructure with multiple perspectives on data management that could benefit scientists at the Antarctic Marine Geology Research Facility and Center for Ocean-Atmospheric Prediction Studies laboratories in managing, storing, and preserving diverse data.
16 NOAA NCDDC XML Transformations. Retrieved July 17, 2013 from http://www.ncddc.noaa.gov/metadata-standards/metadata-xml/.

The principal investigator for this study has experience managing digitized and born-digital research data for the departments of Anthropology, Biological Science, and Oceanography at Florida State University. Below are three brief use cases, along with comments from scientists at FSU who have benefited from the application of data curation, digital curation, and digital preservation to their research data.

Department of Anthropology

One project involved an anthropologist whose research data images of the Sadana Island Shipwreck in Egypt existed offline on DVDs in her office and who wanted to develop an online collection and make it accessible to the research and learning communities. After a series of data management consultations, the research images were made accessible through the FSU Libraries online public access catalog (OPAC) via the MARC and Dublin Core metadata standards, the digital asset management system DigiTool, and the institutional repository (IR). Below is the researcher’s comment on the data curation and digital curation applied to her research data images.

This is just a tremendous project and I’m so grateful to all the work [the digital librarian and OPAC cataloger] has put in to make the digital archive of some of the Sadana Island artifacts a reality [of digital content] within an institutional repository (IR). It’s going to be a tremendous asset to people doing research, for scholarly and personal reasons, into this area, and I will be sure to pass on any comments I receive about it. I look forward to continuing this project with you and again, thank you for all your efforts to make [access to this collection] happen. – FSU Anthropologist (Smith II, 2008, p. 174)


Department of Biological Science

Another project involved a biological scientist with scanning electron microscope images from his diatom research that existed offline on DVDs in his office; he wanted this research made available online. His research was made available online in the FSU Libraries OPAC via MARC and in the digital content management system DigiTool via Dublin Core. A sample of the scientist's peer-reviewed journal articles was also made available online from the FSU Libraries BePress Digital Commons IR based on the publishers' copyright permissions. The biological scientist was the first FSU scientist to have a portion of his research data preserved in both the Florida Digital Archive (FDA) and MetaArchive preservation systems. FDA is a local preservation strategy and MetaArchive is a national/international distributed preservation strategy. The Diatomscapes digital collection was the first FSU scientist's research to be made available through the FSU Libraries OPAC, digital content management system, and IR through data curation and digital curation. Data curation and digital curation of the Diatomscapes collection facilitated digital preservation in FDA and MetaArchive.

I am very pleased that my diatom images are now digitally archived as part of the pilot program for [Florida Digital Archive -FDA] digital preservation. I am honored to be partnering with [FSU Digital Library] and your colleagues [Florida Center for Library Automation - FCLA] on this innovative program. Kindly extend my thanks, on my behalf, to Ms. Motyka and Ms. Caplan for their contributions to the success of this important aspect of our collaboration. I hope it is just the beginning for a long and mutually beneficial partnership between scientists and digital technologists. Thank you again for this exciting news. You made my day. – FSU Biological Scientist (Smith II, 2011)

Department of Oceanography

Yet another project involved a Department of Oceanography scientist who requested that a sample of his legacy, aging, and deteriorating analog (physical) research data, in the form of technical cruise reports, be digitized and made available. Similar to the other disciplinary domains, the technical reports were made available online in the OPAC and the digital content management system. Below is the scientist's comment on the data management and curation applied to his research reports.

I am pleased to have our reports on the FSU digital library collection. It is a worthwhile service for our group, as it is a suitable location for our technically oriented reports. While we also maintain a collection of our reports on our own website, the extra visibility of a central collection should make it easier for potential readers to find it on the web. – FSU Oceanography Scientist (Smith II, 2010, p. 82)

The following comments from the former FSU Provost/Biological Scientist on the digitization, dissemination, sharing, and preservation of a sample of his scholarly works reflect how the application of metadata, metadata standards, best practices, guidelines, and tools can affect a scientist's attitude, behavior, and participation with respect to research data management.

Thank you so much for all of work on this. It is a great service to the crustacean community to have these available. Please let me know if I can help with anything on the project. – FSU Provost/Biological Scientist (Smith II, 2010, p. 82)

Literature review and experience working with scientists to build, develop, and extend their scholarly works beyond their disciplinary domains reveal both a willingness and a need for data curators to assist scientists with the management and curation of their data. This research study seeks to explore, introduce, and articulate steps for successful research data management to scientists by triangulating multiple perspectives.

2.8 Literature Review Online Searches

The online search interface technique of "berry picking" (Bates, 1989, p. 410) was employed, satisfying the search "by a series of selections of individual references and bits of information at each stage of the ever-modifying search" (ibid). The keyword/title search results for the data management planning (DMP), data curation (DaC), digital curation (DiC), and digital preservation (DP) concepts from 9 LIS and 10 additional resources are not surprising and support the continued identification, clarification, and definition of key data management and curation services concepts. There were also more publications on models than on theory across technology, digital preservation, digital library, and digital curation publications. The search results yielded more publications on digital preservation and data management planning than on data curation and digital curation; however, the results for data curation and digital curation were comparable and overlapping. The multiparadigm review of the literature encountered (1) indexing problems, (2) publication errors, and (3) displacement of concepts. These problems are briefly discussed prior to the discussion of the application of Metatriangulation to this study.


2.8.1 LIS Journals Resources

The following table includes 9 LIS journals and 9 additional resources used in the search of the four key concepts of data management. These searches were used to support the researcher's assumption that the availability of research articles on the key concepts is uneven. Focusing on the LIS journals (LIS 1 – LIS 9), the literature review suggests there are more articles on digital preservation than on the other key concepts. The literature review also supports the researcher's assumption that non-theory-based research publications predominate. Some of the articles returned by theory searches merely mentioned theory without applying a well-established theory with a rich history of theoretical research, application, and implementation by many theorists. It is true that research and development for practical solutions to data management and curation issues dominate, but the study of these issues through a paradigm lens is needed to bridge the divides and gaps in the literature.

Table 2: LIS Journals Search Results of DMC Key Concepts

Resource   DMP    DaC   DiC   DP    Model   Theory
LIS 1         0     0     1     5
LIS 2        16     6     2    41
LIS 3        19     0     0     7
LIS 4         1    11    19     2
LIS 5      3823    18    12   370
LIS 6        73    17    18    73    173      70
LIS 7        91     7    12   113
LIS 8         2    53    74    78
LIS 9      1749    47    43   256

Below is the listing of the (9) Library and Information Science (LIS) journals used as part of the multiparadigm review of data management and curation (DMC) literature:

i. [LIS 1] American Archivist (Freely Accessible Arts & Humanities Journals – from 1938 to 3 years ago)


ii. [LIS 2] Information Sciences (ScienceDirect Freedom Collection 2012 – from 01/01/1995 to present)
iii. [LIS 3] Information Technology and Libraries (Academic OneFile – from 12/01/1991 to 14 days ago)
iv. [LIS 4] Information Today (Library Literature & Information Science Full Text (H.W. Wilson) – from 07/01/1998 to present)
v. [LIS 5] ERIC (Education Resources Information Center)
vi. [LIS 6] International Journal on Digital Libraries (SpringerLink Contemporary (1997 – Present) – from 04/01/1997 to present)
vii. [LIS 7] Journal of Education for Library and Information Science (Library Literature & Information Science Full Text (H.W. Wilson) – from 01/01/2004 to present)
viii. [LIS 8] Journal of the American Society for Information Science (Business Source Complete – from 01/01/1970 to 12/31/2000)
ix. [LIS 9] Journal of the American Society for Information Science and Technology (FCLA - Blackwell Titles – from 1987 to present)

During the course of this research, it was decided to substitute the preliminary study survey results on theoretical frameworks/perspectives for the many articles with inconsistent, fragmented, or missing theoretical perspectives. The preliminary study was useful in supporting the researcher's assumption of the need for more data management and curation theory development. Pragmatism and ethnography were the top two theoretical perspectives chosen by survey participants in the preliminary study of DMC.

2.8.2 Indexing Problems

The literature review of DMC encountered information retrieval problems that impacted the multiparadigm research process. The selected journals' indexes were ineffective in yielding accurate retrieval of the key concepts exactly as entered in the search forms of the information retrieval systems. "SHERA (1965) observes that effective retrieval depends on congruence between the cognitive organization imposed on knowledge by the individual and the representational structure imposed on documents by the indexer" (Jacob & Shaw, 1998, p. 131).


The searches on the number of key data management and curation concepts in LIS publications and peer-reviewed journals were introduced in support of the preliminary study results that data curation, digital curation, and digital preservation are independent yet interrelated concepts. Based on the preliminary study and the literature review, this research supports the position that data curation is not digital curation, digital curation is not digital preservation, digital preservation is not digital archiving, and data management planning includes data curation, digital curation, and digital preservation. Hutchins (1978) explains indexing problems of the kind encountered during the multiparadigm reviews.

It is assumed that 'key' words can be identified as those that occur most frequently in the text, disregarding the 'function words' (articles, prepositions, conjunctions, etc.) and other common words of high frequency in similar texts… We are all aware of the inadequacies of present indexing practice, and despite impressive achievements, there is not much sign that automatic systems can or will do much better than human ones. (Hutchins, 1978, p. 173)

Many of the journals' indexes used in this small study treated each word as a separate search term, thus inflating the search results (i.e., the search string data management planning was treated as the separate terms 'data', 'management', and 'planning' rather than as a complete phrase). This was also true for the other concepts. The most reliable conclusion that can be drawn from the results is that publications exist for all four key concepts, with more publications on data management planning and digital preservation. Further research is encouraged to study 'aboutness' in subject indexing (Hutchins, 1978), relevance assessment (Kekäläinen & Järvelin, 2002), and the quality of information (Stvilia et al., 2007) of data management and curation key concepts in publications.
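The effect of term-by-term matching on result counts can be illustrated with a small sketch. The titles below are invented for illustration and are not drawn from the journals searched; matching any single term retrieves far more items than matching the exact phrase.

```python
# Toy illustration of why treating each word as a separate term inflates
# result counts relative to an exact-phrase search. Titles are invented.
titles = [
    "Data management planning for research libraries",
    "Strategic planning and library management",
    "Curation of scientific data",
    "Planning digital preservation workflows",
]

query = "data management planning"
terms = query.split()

any_term_hits = [t for t in titles if any(term in t.lower() for term in terms)]
phrase_hits = [t for t in titles if query in t.lower()]

print(len(any_term_hits), "titles match at least one term")  # 4
print(len(phrase_hits), "titles match the exact phrase")     # 1
```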

2.8.3 Publication Errors

Literature review suggests that many published DMC articles have publication and key concept errors. Some authors used data curation, digital curation, and digital preservation interchangeably, thus contributing to definitional confusion, which is more prevalent among the earlier publications from 2007 to 2011. The interchangeable use of key concepts and the lack of identification of key concepts and foundational models made it difficult to understand which key concept was under study.

Librarians performed some of the research with little to no knowledge of the key concepts, the history of the key concepts, foundational models, and/or sociological or methodological suppositions. There is a need to analyze and separate DMC content by level of expertise, from novice to intermediate to faculty, similar to the way computer science books identify a target audience (i.e., beginner, novice, expert). Many projects did not employ scientific inquiry to provide empirical evidence that allowed reproducible research, leaving some practical articles to languish in obscurity as outcomes of expired grant-funded projects. Some "rush to publication" articles on the then-emerging trends of data curation and digital curation, which should have been complementary to digital preservation, actually promoted the concepts as synonymous with each other, thus contributing to a proliferation of the displacement of concepts. Publications from 2011 to the present make more of a distinction among the key concepts than earlier literature, but more work is needed.

2.8.4 Displacement of Concepts

Literature review suggests that some authors used digital curation in the title, followed by the use of data curation throughout the article, with reference to digital preservation in the explanation of the storage and management of digital assets. Numerous articles failed to clearly articulate "the proper balance between theory being built around practice and practice flowing from theory" (Tibbo, 2012) as a result of the displacement of key concepts. Digital curation has been described as an emerging field (Botticelli et al., 2011) and as a stage in the curation process (Lord & Macdonald, 2003) rather than as one of the four key concepts of data management. The profession can address this displacement of key concepts in the literature through the education, articulation, and publication of the key concepts by clearly differentiating, defining, and clarifying them in scholarly works. "Preservation is an aspect of archiving and archiving is an activity needed for curation with all three concerned with managing change over time" (Lord & Macdonald, 2003; Beagrie, 2006, p. 6). Whether a study addresses one or more of the key concepts of data management, all of the key concepts should be clearly defined, clarified, and articulated at the beginning of the project before the specific concept or concepts are applied to the investigation of a phenomenon. It was decided that the case studies for this research would stem from the DAF research of this study and not from case study research in the DMC literature.


2.9 Introduction of Metatriangulation Theory

The recent topics of interest for the International Conference on Preservation of Digital Objects/International Conference on Dublin Core and Metadata Applications (iPRES 2013/DC-2013) and the 2013 call for papers for the special issue on digital curation in the Archival Science (Springer) journal include calls for research on a theory of digital preservation and a theory of digital curation, respectively. These two calls are indicative of the need for more theory development in the field of data management and curation, which is dominated by practical, non-theory-based research. Competing frameworks and models supported by episodic and inconsistent theories within single disciplinary domains have created a proliferation of fragmented concepts, methods, and practices in the literature. Disciplinary silos that claim special expertise on data management and curation topics such as data curation, digital curation, and digital preservation should not deny, oppose, or compete with the expertise of others (inside or outside the home domain) in normal attempts to professionalize (Abbott, 1988; Bowker et al., 2010, p. 112) the home discipline. Viewpoints from multiple domains must be deliberated to address the issue of how to effectively manage, disseminate, and share data beyond one's home domain. "One interprets one's own research so that it is useful beyond one's own disciplinary boundaries and can be integrated into a larger body of knowledge" (Nibert, 2008). With this in mind, the dominant disciplinary domain in which the topic originates must first establish a clear, cogent, and consistent theoretical foundation on which to build, develop, and extend to others for a more comprehensive understanding of how to effectively manage data. For this reason, Metatriangulation is an ideal theory-building approach for this study.

2.9.1 Metatriangulation

Metatriangulation (Lewis & Grimes, 1999) is a theory-building approach whereby the paradigmatic differences, similarities, and interrelationships (Gioia & Pitre, 1990) of a phenomenon of interest common to multiple disciplinary domains are analyzed from multiple perspectives rather than from a single perspective. A single paradigm perspective represents the worldview, frame of reference, concepts, theories, methods, and practices operationalized by an individual, team, or disciplinary domain. It is possible for multiple perspectives to exist among individuals and research teams within the same disciplinary domain.

Metatriangulation is conceptualized theoretical triangulation (Denzin, 1970) and multiple-perspectives analysis of a phenomenon of interest based on ostensive exemplars, paradigms (Wienberger, 2012). Metatriangulation includes (1) multiparadigm reviews, (2) multiparadigm research, and (3) metaparadigm theory building. Building theory from multiple paradigms can enhance "cross-domain ventures that accelerate discovery, highlight new connections, and suggest unforeseen links that will speed science forward" (Dirk, 2009) by achieving a more comprehensive view of the phenomenon in organizational realities (Burrell & Morgan, 1979; Frost, 1980). In situations where theories are not readily available or are insufficiently useful to guide research, one can use "metatriangulation as a conceptualized theoretical triangulation (Denzin, 1970) process of building theory from multiple paradigms analogous to [the] traditional single-paradigm triangulation" (Gioia & Pitre, 1990; Lewis & Grimes, 1999, p. 676). "It is conventionally assumed that triangulation is the set of multiple methods in the study of the same object" (Campbell & Fiske, 1959; Webb, 1966; Denzin, 1970/2009, p. 301). The multiple-paradigm theory-building approach of Metatriangulation includes the following three significant research processes.

1. Multiparadigm reviews – involve recognition of divides and bridges in existing theory to reveal the impact of theorists' underlying, and often taken-for-granted, assumptions on their understandings of organizational phenomena [perform an extensive literature review]
   • Paradigm bracketing – differentiates paradigms
   • Paradigm bridging – suggests transition zones
2. Multiparadigm research – the theorist applies divergent lenses empirically via
   • Parallel studies – preserve theoretical conflicts
   • Sequential studies – cultivate diverse representations
3. Metaparadigm theory building – the theorist manages bounded rationality and accommodates opposing views within a metaparadigm perspective. "Metaparadigm theory building strives to juxtapose and link conflicting paradigm insights (X and Y) within a novel understanding (Z)" (Lewis & Grimes, 1999, pp. 673-675).


The multiparadigm reviews, multiparadigm research, and metaparadigm theory building are represented as Phase I: Groundwork, Phase II: Data Analysis, and Phase III: Theory Building in the application of Metatriangulation to DMC (See Table 3).

2.9.2 Foundations of Lewis & Grimes' Metatriangulation

Metatriangulation (Lewis & Grimes, 1999) is founded on research in the study of paradigms in organizational theory (Burrell & Morgan, 1979; Pondy & Boje, 1981; Zey-Ferrell & Aiken, 1981), theoretical triangulation (Denzin, 1970; Fielding & Fielding, 1986; Rennie, Venville, & Wallace, 2010), triangulation (Schutt, 2006; Brewer & Hunter, 1989; Sechrest & Sidani, 1995), multiparadigm inquiry (Deetz, 1996; Reed, 1996; Schultz & Hatch, 1996; Sherer, 1998), paradigm interplay (Schultz & Hatch, 1996), multiparadigm perspectives (Gioia & Pitre, 1990), social theory paradoxes (Poole & Van de Ven, 1989), and theorists' common sense and experiential intuition (Eisenhardt, 1989; Glaser & Strauss, 1967; Mintzberg, 1979; Weick, 1989). "Paradigms include the assumptions, practices, and agreements among a scholarly community" (Lewis & Grimes, 1999, p. 672). Theoretical triangulation is the combination of one or more theories, and triangulation is the combination of one or more methods, in the study of the same phenomenon. A paradigm may contain multiple theories and methods. Theoretical triangulation and triangulation traditionally involve the study of the same phenomenon of interest from a single paradigmatic perspective, whereas Metatriangulation involves the study of the same phenomenon from paradigmatic perspectives that include multiple theories and methods.

2.9.3 Background and Early Developments

The "application of a relatively untried research approach called Metatriangulation" (Saunders et al., 2003, p. 245) has been used in three studies: (1) Jasperson et al.'s (2002) study of power and information technology research, (2) Saunders et al.'s (2003) study of management information systems (MIS), and (3) Madlberger & Roztocki's (2010) study of digital cross-organizational collaboration. These studies were influenced by the first application of Metatriangulation, in the study of advanced manufacturing technology (AMT) (Lewis & Grimes, 1999), which serves as a theoretical illustration for practical application in this research.

Lewis & Grimes (1999) studied the problematic and controversial complexities of AMT in organizational technologies, resulting in the recognition of issues involving self-managed work teams, total quality management, just-in-time inventory, trust, authority, and control. Metatriangulation was used to unify AMT exemplars that depicted only a portion of the theory-building process, thereby facilitating direct implication and illustration for future application. Paradigm transition zones were located via functionalist-interpretivist views, and paradigm lens biases were critiqued to explore objective-subjective paradigms for complementarity and disparity of paradigm lenses (Gioia & Pitre, 1990). The AMT assumptions were bracketed using Burrell & Morgan's (1979) typology (Grint, 1991; Lewis & Grimes, 1999). The typology includes (1) radical humanist, (2) radical structuralist, (3) interpretivist, and (4) functionalist quadrants, with a subjective-to-objective continuum on the X-axis and a regulation-to-radical-change continuum on the Y-axis. The data source included 100 case studies, each a comprehensive study of AMT in a specific setting (Eisenhardt, 1989), with 20 cases selected for detailed analysis. The cases selected for analysis represented extremes (i.e., from manual to automated), were open to interpretation, were accompanied by detailed description to allow creative theorizing, included multiparadigm analysis, and represented each of the typology's four paradigm lenses (Burrell & Morgan, 1979; Lewis & Grimes, 1999). The paradigm itinerary selected by Lewis & Grimes (1999) ran from functionalist to radical structuralist to interpretivist to radical humanist to enhance researchers' learning experiences by traveling from the dominant AMT (functionalist) paradigm toward its antithesis (radical humanist) (Lewis & Grimes, 1999, p. 681). The multiple-paradigm data were coded, broken down, interpreted, and conceptualized (Glaser & Strauss, 1967; Lewis & Grimes, 1999). Even though exemplar multiple-paradigm coding schemes such as Graham-Hill (1996), Yin (1989), Mangham & Overington (1983), and Martin (1992) were available, Lewis & Grimes (1999) opted to code cases using the "case authors' focus, language, and methods" (Lewis & Grimes, 1999, p. 682). The multiparadigm analyses yielded four sets of codes addressing distinct and interrelated elements of AMT: conceptualizations, implementations, perceptions, and tensions. The multiple paradigm perspectives provide the following AMT insights: "production system for enabling efficiency and adaptability (functionalist), ongoing construction of inter-subjective experiences (interpretivist), tool for labor domination and control (radical structuralist) and vehicle for communicative distortion (radical humanist)" (Lewis & Grimes, 1999, p. 680).


2.9.4 Lessons from the Trenches of Metatriangulation Research

Saunders et al.'s (2003) investigation of the contradictory, confusing, and fragmented collection of management information systems (MIS) literature supported the authors' application of Metatriangulation to the study of MIS. The authors recognize the need and "challenge to conduct research that synthesizes multiple theoretical perspectives" (Saunders et al., 2003, p. 246) in pursuit of a broader and deeper understanding of a phenomenon of interest. Saunders et al. (2003) used Metatriangulation to study power and information technology (IT) in the MIS disciplinary domain. Saunders et al. (2003) investigated the "phenomenon of interest [of] the role of power in the management and use of IT" (p. 247); the paradigm brackets included the IT paradigm lenses of (1) technological imperative, (2) organizational imperative, and (3) emergent perspective, and the power paradigm lenses of (1) rational, (2) pluralist, (3) interpretive, and (4) radical. The metatheoretical sample of literature included 68 articles published in 10 leading MIS and Management journals from 1980-1999 and 14 articles from non-North American journals (Saunders et al., 2003). A multiparadigm coding scheme, coding, and re-coding were used to capture authors' IT paradigms and resolve coding differences. Patterns in data from multiparadigm perspectives were identified, conceptualized, and studied to synthesize power and IT theoretical insights. Paradigm theoretical perspective differences, similarities, and metaconjectures were articulated. The researchers examined metaconjectures for new observations and for consistency with previous conclusions to gain deeper insight. Saunders et al. (2003) used paradigm lenses introduced by Markus and Robey (1988) and a modified version of the Burrell & Morgan (1979) typology rather than the adapted Burrell & Morgan (1979) multiple paradigm lenses used by Lewis & Grimes (1999) and Hassard (1991).

2.9.5 Metatriangulation Study of Digital Cross-Organizational Collaboration

Madlberger & Roztocki (2010) conducted a Metatriangulation study of digital cross-organizational collaboration using paradigm lenses adapted from Markus and Robey (1988) in the examination of 80 articles published in six information systems journals between 2000 and 2007. The authors concluded that the paradigm lens, whether (1) technological imperative, (2) organizational imperative, or (3) emergent perspective, affects the selection of an underlying theory. Theoretical gaps and opportunities were discovered through multiparadigm lens analysis.

The authors defined digital cross-organizational collaboration as "the integration of people, information systems, and business process across various organizations" (Madlberger & Roztocki, 2010, p. 1). The common phenomenon of interest was the use of information technology (IT) as an enabler in digital cross-organizational collaboration (digital COC). The goal of the investigation was to develop a theoretical understanding of digital COC through the study of extant digital COC research and the "adoption of a highly abstract perception that prescinds from individual theoretical lenses" (ibid). Each article in the sample was analyzed for its theoretical viewpoint and designated as technological imperative, organizational imperative, or emergent perspective for coding, analysis, and interpretation. Multiple rounds of coding and re-coding were executed for reliability and validity of the data. The multiple-paradigm analysis of the papers yielded emergent (35), technological imperative (20), and organizational imperative (25) perspectives. The papers were coded by reference (paradigm perspective), topic studied, data source, and theoretical frameworks/perspectives for analysis. The written paradigm accounts included how authors defined digital COC, identified the unit of analysis, applied theories, and explained paradigmatic differences. The units of analysis across the groups of articles were the individual, the organization, and the group of organizations. Results yielded different levels of digital COC, with conclusions and understandings influenced by the unit of analysis. There were 42 underlying theories across the 80 articles used to explain various aspects of digital COC. The integration of paradigm lenses into a theory of digital COC framework was an outcome.

2.9.6 Summary

Metatriangulation (Lewis & Grimes, 1999) is a multiple-phase, qualitative theory-building process comprising groundwork (Phase I), data analysis (Phase II), and theory building (Phase III). Founded on Burrell & Morgan's (1979) multiparadigm model, Metatriangulation enables the investigation, from multiple paradigm perspectives, of a phenomenon of interest common to disciplines whose literature contains inconsistent theories. The Saunders et al. (2003) study used the Bradshaw-Camball and Murray (1991) framework and Markus and Robey's (1988) multiple-paradigm work as an adapted form of Metatriangulation. Madlberger & Roztocki (2010) also used an adapted form of Metatriangulation.

Both studies support the need for and significance of Metatriangulation as a multiparadigm approach to address disciplines with publications and projects "having frequently inconsistent results and conflicting interpretations" (Madlberger & Roztocki, 2010, p. 2). Hassard's (1991) multiple-paradigm analysis provides a case that supports Lewis & Grimes (1999) and offers suggestions for exploring variations of multiple-paradigm research applicable to the needs of an investigation, similar to the Metatriangulation adaptations undertaken by Saunders et al. (2003) and Madlberger & Roztocki (2010) that lend support to this study.

2.10 Data Management and Curation: A Metatriangulation Review

The complex problem facing the non-systematic development of a theory of digital preservation and a theory of digital curation within the field of data management and curation is the same frontier problem that confronted theorists within the field of organizational theory: "how to conduct inquiry based on several paradigms" (Pondy & Boje, 1981; Lewis & Grimes, 1999). "Metatheorizing techniques [such as Metatriangulation] help theorists explore patterns that span conflicting understanding" (Lewis & Grimes, 1999, p. 675), and "information science needs metatheory in addition to theory because of the nature of information and the problems that information science addresses" (Dow, 1977, p. 323). The study of data management and curation requires inquiry based on several paradigms. Literature review suggests researchers have used the tensions and oppositions of competing models and frameworks to fragment practices rather than "use them to stimulate the development of more encompassing theories" (Poole & Van de Ven, 1989; Lewis & Grimes, 1999). The data management and curation literature can be viewed as a compendium of contradictory perspectives underscored by episodic, non-theory-based research that is further complicated by provincial and political organizational practices. Provincial and organizational practices can create a cultural environment whereby the triangulation of erroneous practices or fragmented theories in research data management does not properly explain or predict a phenomenon. The integration of theoretical perspectives from other disciplines with consistent theoretical analysis of phenomena is paramount to building data management and curation theory and to minimizing future errors and inconsistencies within the library and information science (LIS) field.


2.10.1 Application of Metatriangulation to Data Management and Curation

The groundwork phase of Metatriangulation requires recognition of the author's paradigm during multiparadigm research. "Recognizing an author's paradigm, however may be an arduous and arguable task. Smircich (1983) noted that not only do authors rarely state their paradigm but, often, make the choice unconsciously" (Lewis & Grimes, 1999, p. 679). For purposes of this study, researchers will be asked to state their paradigms during the interviews in Phase 2 of this study. The theory-building process of data management and curation (DMC) for this research is guided by the application of Metatriangulation (Lewis & Grimes, 1999) (See Table 3). The portions of the table shown in grey and green were completed as part of the preliminary study. The data source for the case studies of varied DMC contexts and theoretical views will come from the data source for the Data Asset Framework (DAF) discussed in Chapter 3.

Table 3: Theory-Building Processes of Traditional Induction and Metatriangulation (Lewis & Grimes, 1999, p. 677) (Used with Fair Use)

Traditional Inductive Activity (Single Paradigm) | Variation of Inductive Activity in Metatriangulation (Multiparadigm) | Purpose | Implications for DMC Study

Phase I: Groundwork
Specify research questions | Define phenomenon of interest | Provide focus, yet enable interpretative flexibility | Encompassed diverse DMC types and theory
Review relevant literature | Focus paradigm lenses – bracket paradigms and locate transition zones | Gain multiparadigm understanding and cognizance of home paradigm | Recognize divides and bridges between existing perspectives (Preliminary Study)
Choose data source | Collect metatheoretical sample (data interpretable from multiple lenses) | Aim lenses at common empirical referent | Interview case studies of varied DMC contexts and theoretical views

Phase II: Data Analysis
Design analytical process | Plan paradigm itinerary (ordered use of lenses) | Recognize paradigmatic influences; emphasize contrast and retain balance | Move away from home and dominant paradigm
Systematically code data | Conduct multiparadigm coding | Cultivate diverse data interpretations; accent distinct paradigm insights | Detail contrasting views of DMC and its implementation
Tabulate and/or exhibit analyses | Write paradigm accounts | Experience paradigm language-in-use; manage accumulating insights | Recognize conflicts and overlaps in images of DMC tensions

Phase III: Theory Building
Develop and test propositions | Explore metaconjectures | Conduct mental experiments; juxtapose paradigm insights | Examine patterns and discrepancies across accounts
Build theory | Attain a metaparadigm perspective | Encompass disparity and complementarity; motivate interplay | Use "individual" and "team" to accommodate differing explanations
Evaluate resulting theory | Articulate critical self-reflection | Assess theory quality and theory-building process | Track tensions and paradoxes experienced in own work


Multiparadigm research requires recognition of the divides and bridges between existing perspectives (See Table 4) during the groundwork process in phase 1 of Metatriangulation. Using Martin’s (2002) theoretical frameworks/perspectives, researchers’ theoretical perspectives on DMC were obtained from survey data from the Data Management and Curation Services Opinions Survey (See Appendix A).

Table 4: Preliminary Study Survey Results of Existing DMC Perspectives

Question #5 – Preliminary Study: ***Theoretical Frameworks/Perspectives ≥30%
1. Autoethnography (15: 36%)
2. Constructivism (13: 31%)
3. Critical Theory (6: 14%)
4. Ethnography (27: 64%) – the study of the culture of a group
5. Ethnomethodology (8: 19%)
6. Feminism (2: 5%)
7. Grounded Theory (20: 48%)
8. Hermeneutics (5: 12%)
9. Narratology (10: 24%)
10. Phenomenology (14: 33%)
11. Phenomenography (2: 5%)
12. Positivist/Realist/Analytic Approaches (14: 33%)
13. Pragmatism (28: 67%) – answering practical questions that are not theory-based
14. Symbolic Interactionism (8: 19%)
15. Triangulation/Metatriangulation (13: 31%)

Question #6 – Preliminary Study: *Elements of Data Management Plan ≥90%
1. Data description (47: 92%)
2. Existing data (41: 80%)
3. Format (48: 94%)
4. Metadata (48: 94%)
5. Storage & backup (48: 94%)
6. Security (45: 88%)
7. Responsibility (47: 92%)
8. Intellectual property rights (47: 92%)
9. Access and sharing (50: 98%)
10. Audience (29: 57%)
11. Selection and retention periods (43: 84%)
12. Archiving and preservation (48: 94%)
13. Ethics and privacy (46: 90%)
14. Budget (39: 76%)
15. Data organization (44: 86%)
16. Quality assurance (46: 90%)
17. Legal requirements (44: 86%)

Question #7 – Preliminary Study: **Data Seal of Approval Assessment Guidelines ≥80%
1. Research data in repository with sufficient access information (44: 90%)
2. Research data in recommended formats (37: 76%)
3. Research data with metadata (42: 86%)
4. Data repository has explicit digital archiving mission (39: 78%)
5. Ensure legal regulations compliance and human subjects protection (39: 80%)
6. Documented processes & procedures for managing data storage (43: 88%)
7. Long-term digital assets preservation plan (46: 94%)
8. Archiving takes place according to workflows across data life cycle (36: 73%)
9. Access and availability of the digital objects (41: 84%)
10. Enables users to utilize and reference research data (40: 82%)
11. Ensure integrity of digital objects and the metadata (40: 82%)
12. Technical infrastructure supports archival standards like OAIS (34: 69%)
13. Data consumer complies with access regulations (36: 73%)
14. Data consumer conforms to codes of conduct (38: 78%)
15. Data consumer respects applicable licenses (38: 78%)


2.10.2 Interdisciplinary Research Complements Metatriangulation

The definition of interdisciplinary research adopted by the National Science Foundation (NSF) contains elements of theoretical triangulation that lend support to the use of a theory-building approach such as Metatriangulation in the study of data management and curation for discussions of a theory of digital preservation and a theory of digital curation.

“Interdisciplinary research is a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice” (Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy, 2004, p. 2).

The use of theory-building tools such as Metatriangulation can benefit the digital curation community by contributing to the development of good theory. "Good theory guides research, which, when applied, increases the likelihood that information technology will be employed with desirable consequences for users, organizations, and other interested parties" (Markus & Robey, 1988, p. 583). Metatriangulation can be used to clarify, define, and analyze practices (methods) and theories (concepts) to build a theory of digital preservation or a theory of digital curation within the body of data management and curation literature, which is comprised of inconsistent theories. The phenomenon of interest for this study is how to effectively manage data, and the multiple paradigm perspectives include the data management and curation practices of scientists from interdisciplinary, multidisciplinary, and cross-disciplinary research labs at FSU.

2.10.3 Metatriangulation as a Theoretical Perspective

Metatriangulation can lead to the development of a new theoretical perspective for data management and curation by introducing broader "[views] for constructively changing or refining concepts and methods" (Committee on National Statistics et al., 1985, p. 12) within and outside disciplines to develop new theories. Developing and applying new theories to existing data may lead not only to new knowledge but also to improvements in future data collections (Committee on National Statistics et al., 1985, p. 13). A single research paradigm is too narrow a view of the multifaceted nature of organizational reality (Burrell & Morgan, 1979; Frost, 1980; Gioia & Pitre, 1990, p. 584) for investigating interdisciplinary problems.

Digital curation, just like organizational theory in the past, faces the problem of how to conduct inquiry based on several paradigms (Pondy & Boje, 1981; Lewis & Grimes, 1999, p. 672). Metatriangulation encourages theorists to recognize the complementarity and disparity (Gioia & Pitre, 1990; Poole & Van de Ven, 1989; Ybema, 1996; Lewis & Grimes, 1999) of multiple paradigms to "bridge the gap between image of the phenomena and the phenomena itself" (Morgan, 1983, p. 21).

2.10.4 Limitations of Metatriangulation

Since Metatriangulation uses multiple paradigms to build theory and "existing multiparadigm approaches are ambiguous and fragmented" (Lewis & Grimes, 1999, p. 673), there is potential for the results of a Metatriangulation study to multiply ambiguous or paradoxical practices parallel to the home paradigm without fostering "greater insight and creativity" (ibid., p. 672). There is also a potential for paradigm bracketing to maintain dualistic boundaries (Deetz, 1996) if the biases of each paradigm are not critiqued and transition zone perspectives are not located (Lewis & Grimes, 1999, p. 686). Fielding & Fielding (1986) offer a limitation of theoretical triangulation with implications for Metatriangulation.

"Theoretical triangulation does not necessarily reduce bias, nor does methodological triangulation necessarily increase validity. Theories are generally the product of quite different traditions, so when they are combined one may get a fuller picture, but not a more 'objective' one. Similarly, different methods have emerged as a product of different theoretical traditions, and therefore combining them can add range and depth, but not accuracy. We should combine theories and methods carefully and purposely with the intention of adding breadth or depth to our analysis, but not for the purpose of pursuing objective truth" (Fielding & Fielding, 1986, p. 33).

Critics challenge the approach of embracing other perspectives by "noting the potential for ethnocentric bias-contamination of paradigm accounts from the theorist's home culture" (Deetz, 1996; Lewis & Grimes, 1999, p. 687). However, one theorist explains the goal of Metatriangulation as follows.

"[The passion for research encourages researchers to] move beyond reproduction of the differences that divide us to an appreciation of why we are divided. In doing so, we arrive at the only powerful means of assessing the nature and limitations of research practice – by acquiring a capacity for knowing what we are doing, why we are doing it, and how we might do it differently if we choose" (Morgan, 1983; Lewis & Grimes, 1999, p. 686).


2.11 Conceptual Framework for Analyzing Methodological Suppositions

There is a growing need for interdisciplinary conceptual framework models that incorporate and integrate multiple theories and methods from multiple paradigms to solve interdisciplinary data challenges. Conceptual frameworks need to improve the management of data, not simply prove the need for conceptual frameworks to guide scientists in the effective management, use, and preservation of data. The following quote supports the need for a conceptual framework with implications for multiple disciplines.

“Good conceptual tools are needed, whether to participate in these developments, evaluate their progress, guide researchers to make good use of the results [interpretations], or critically engage with the very idea of data being usable ‘any time, any place’” 17 (Whyte, 2012, p. 205).

The Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993, p. 595) (See Fig. 5) is a general, two-dimensional metatheoretical perspective for the analysis of social theory (Solem, 1993) within a domain to facilitate scientific inquiry. The conceptual framework accommodates, for analysis and comparison, the perspective on a problem common to multiple disciplinary domains through the investigation of the (1) ontology, (2) epistemology, (3) frame of reference, (4) concepts, and (5) methods involved in scientific inquiry (Briggs, 2007, p. 73). The elements in the conceptual framework are defined specific to this research as:

1. Ontology – objective to subjective typology continuum (Morgan & Smircich, 1980) (theories of reality) – worldview
2. Epistemology – positivism to anti-positivism typology continuum (Morgan & Smircich, 1980) (understanding & knowledge transfer) – discipline knowledge
3. Frame of Reference – "determines the corresponding methods and concepts accompanying a specific framework" (Solem, 1993)
4. Method – principal ways of acting on the environment (Denzin, 1970)
5. Concept – "constitutes the definitions (or prescriptions) of what is to be observed" (Merton, 1968)

17 Whyte, A. (2012). Emerging infrastructure and services for research data management and curation in the UK and Europe. In G. Pryor (Ed.), Managing Research Data (pp. 205-234). London: Facet Publishing.

6. Practice – research made public and reproducible by others (Denzin, 1970), use cases, scenarios, prototypes
7. Theory – interrelated concepts within a framework (Merton, 1968) that "provide explanations of the phenomenon under analysis" and generate new images of reality (Denzin, 1970)
8. Problem – phenomenon under observation or analysis

A paradigm's perspective on ontology, epistemology, methodology, and assumptions about human nature (Burrell & Morgan, 1979; Schultz & Hatch, 1996, p. 532) creates boundaries, barriers, and competing approaches and assumptions that are consistent with paradigm incommensurability (i.e., Jackson & Carter, 1991, 1993). However, paradigm integration (Wilmott, 1993; Reed, 1985) and paradigm crossing (Schultz & Hatch, 1996, p. 532) allow for the assessment of paradigm contributions and the confrontation of divergent competing approaches and paradigmatic assumptions, respectively, thus ignoring paradigm barriers and boundaries, as does paradigm bridging (Gioia & Pitre, 1990).

The conceptual framework selected for this study provides one possible approach to the need for research data management conceptual frameworks identified by Whyte (2012) while also encompassing elements of interdisciplinary research as defined by the Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy (2004). The combination of the conceptual framework for analyzing methodological suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993), Metatriangulation (Lewis & Grimes, 1999), and the Data Asset Framework (DAF) (JISC, 2009) contributes to the development of an interdisciplinary conceptual framework model capable of addressing data management and curation issues common across disciplines.

2.11.1 How has the Conceptual Framework been used?

The conceptual framework for the analysis of a phenomenon in a specific scientific discipline has been used as a model for two scenarios: (1) Solem's (1993) study of the rational analytic tradition within the strategic management area and (2) the study of data management and curation services opinions as part of the preliminary study for this project.


"A model relates to a target system or phenomenon with which we have a common experience or set of experiences" (Briggs, 2007, p. 73). The practical application of the conceptual framework starts with two basic questions, followed by the standards used to address those questions.

1. What are the core ontological suppositions underlying its frame of reference? [How does the discipline look at and understand reality?]
2. What are the basic epistemological stances for its frame of reference? [How does the discipline learn about reality?] (Solem, 1993, p. 596)

Fig. 5: Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993, p. 595) (Used with permission)

Table 5: Ontological Assumptions and Epistemological Stances (Morgan & Smircich, 1980)

Core Ontological Assumptions | Basic Epistemological Stances

Objectivist approaches to social science: Realism | Objectivist approaches to social science: Positivism
• Reality as a concrete structure | • To construct a positivist science
• Reality as a concrete process | • To study systems, processes, changes
• Reality as a contextual field of information | • To map contexts
• Reality as a realm of symbolic discourse | • To understand patterns of symbolic discourse
• Reality as a social construction | • To understand how social reality is created
• Reality as a projection of human imagination | • To obtain phenomenological insight, revelation
Subjectivist approaches to social science: Nominalism | Subjectivist approaches to social science: Anti-positivism

The core ontological assumptions are the standard used to answer question 1 and the basic epistemological stances (See Table 5) are the standard used to answer question 2.

The ontological assumptions and epistemological stances provide a simplified overview for “basic comparative analysis between different scientific disciplines to elaborate on their metatheoretical suppositions” (Solem, 1993, p. 597). The metaphor concept is used in epistemology as “an image, or a description of one thing for another” (Morgan, 1986; Solem 1993). The ontological assumptions, epistemological stance, and metaphor concept are necessary to this investigation of multiple perspectives on how to effectively manage data to enable theory development, scientific inquiry, and interdisciplinary research. Interview participants will be asked the core ontological assumptions and basic epistemological stances questions during the interviews in Phase 2 of this study. Below are two practical scenarios of the application of the conceptual framework with implications for the study of data management and curation practices of scientists within research labs at FSU.

2.11.2 Conceptual Framework Scenario #1

The study of the methodological suppositions for the rational analytic tradition within the strategic management area (Ansoff, 1965; Porter, 1980; Bengtsson & Skärvad, 1988) yielded the following results:

1. Ontology – Reality mainly as a concrete structure, concrete practice, and a system of information, decision;
2. Epistemology – To study structures, processes, systems, changes;
3. Metaphors – The machine metaphor, organism metaphor, and brain metaphor;
4. Frame of Reference – Dominating contributions from the brain metaphor, especially decision theory;
5. Concepts – Strategies: stability, expansion, retrenchment, combination;
6. Methods – Descriptive and Prescriptive: product/market matrix, gap analysis, product life-cycle, portfolio theory (Solem, 1993, p. 597)

2.11.3 Conceptual Framework Scenario #2 (Preliminary Study)


The results from an analysis of the methodological suppositions underlying the data management and curation services opinions survey within the data management and curation field are shown in Fig. 6. The data management and curation services opinions survey was the preliminary study conducted for this research. The preliminary study included questions on the four key concepts of data management, elements of a data management plan, theoretical frameworks/perspectives, and data seal of approval assessment guidelines.

1. Ontology – Reality mainly as a contextual field of information, symbolic discourse, a projection of human imagination, social construction;
2. Epistemology – To study models/frameworks, systems, processes, changes, to map context, to understand patterns of symbolic discourse;
3. Metaphors – The machine metaphor, organism metaphor, and brain metaphor;
4. Frame of Reference – Dominating contributions from the brain metaphor, especially pragmatism (See Preliminary Study Results Sec 3.12.2);
5. Concepts – Data management planning, data curation, digital curation, digital preservation, cyberinfrastructure;
6. Methods – Descriptive and Prescriptive: ethnography, grounded theory, autoethnography, positivist, constructivism

Solutions to data management and curation issues require “continued organic adaption by technology, people, organizations, and society” (Parsons et al., 2011) through “multiple worldviews (e.g. constructivism and participatory) during the study instead of using a single worldview, such as pragmatism” (Creswell & Plano Clark, 2011, p. 45). Multiple worldviews include a paradigm worldview, theoretical lens, methodological approach, and methods of collection (Crotty, 1998; Creswell & Plano Clark, 2011, p. 39).


Fig. 6: Methodological Suppositions For The Data Management And Curation

Fig. 7: Adapted Conceptual Framework Model (ACFM)

The conceptual framework (Fig. 5) was adapted to create the Adapted Conceptual Framework Model (ACFM) (See Fig. 7). The ACFM includes theory and practice functional extensions to illustrate the need for concepts to inform theory and for methods to inform practice. Concepts must be defined and linked to inform theory. Methods must include standards, best practices, and guidelines to inform practice. The goal of the adapted framework is to balance theory and practice research developments by extending the framework to accommodate the theories and practices from multiple perspectives. "Theory and method must be brought closer together and that both must be interpreted from a common perspective if sociologists are to narrow the breach that presently exists between their theories and their methods" (Denzin, 1970/2009, preface). The conceptual framework was adapted for this study to bring theory and method closer together for a common perspective.

The adapted conceptual framework will be used as a referent model during the interviews in Phase 2 of this study. The model will be enabled with the new Qualtrics heat map feature to allow interview participants to click on the areas of the heat map model they believe are important, in an attempt to quantify participants' discipline-specific methodological suppositions. The heat map will record the areas of the data management and curation and conceptual framework images that participants deem important to their research. This process will allow the collection and aggregation of regions of the heat map for mapping to the key concepts of DMC.
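Once exported, the heat map clicks could be aggregated by mapping click coordinates to labeled regions of the model image. The sketch below is a hypothetical illustration of that aggregation step; the region boundaries, click coordinates, and region names are invented and do not reflect the actual Qualtrics export format.

```python
# Hedged sketch: tally (hypothetical) heat-map clicks by named region of the
# adapted conceptual framework image. All coordinates and regions are invented.
from collections import Counter

# Region name -> (x_min, y_min, x_max, y_max) in image pixel coordinates.
regions = {
    "ontology":           (0,   0, 200, 100),
    "epistemology":       (200, 0, 400, 100),
    "frame_of_reference": (0, 100, 200, 200),
    "concepts_theory":    (200, 100, 400, 200),
    "methods_practice":   (0, 200, 400, 300),
}

clicks = [(50, 40), (220, 60), (310, 150), (150, 250), (90, 120)]  # example export

def region_for(x, y):
    """Return the region containing the click, or a fallback label."""
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return "outside_model"

tally = Counter(region_for(x, y) for x, y in clicks)
print(tally.most_common())
```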

2.12 Major Findings from Literature Review

Some major findings from the literature review include:
• The four key concepts of data management are not clearly distinguished, defined, and clarified.
• Data management plans are required for funding from the NSF (2011).
• The White House OSTP issued a public access notice (2013).
• The NEH Office of Digital Humanities has established a DMP requirement (2013) with guidance from the NSF Directorate for Social, Behavioral, & Economic Sciences.
• Researchers/scientists need DMC education and resources to ensure data integrity and stewardship.
• DMC activities and practices vary across disciplines.
• Multiple disciplines face massive data storage management issues.
• It is "impossible to define all the terms of one theory in the vocabulary of the other" (Kuhn, 1982, p. 669).


2.13 Summary

The ever-increasing data-intensive and technologically advancing 21st-century cyber-environment has extended the responsibilities of scientists to manage, store, and provide access to their data with greater accountability. Literature review suggests that definitional confusion, underdeveloped or inconsistent theory, and distributed and fragmented data management approaches keep scientists unprepared for learning and executing effective data management practices. Lewis & Grimes (1999) suggest that research start with a literature review for theoretical perspectives and then explore multiple paradigms and metaparadigm theory building when identifying exemplars. Paradigms need to be based on exemplars (Wienberger, 2012). Multiple paradigm perspectives on data management require diligently, continually, and confidently exploring the data management practices of scientists across disciplines, cognizant of applicable standards and guidelines. This chapter presented a literature review of data management and curation resources in support of utilizing the theory-building approach of Metatriangulation (Lewis & Grimes, 1999) and the conceptual framework for analyzing methodological suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993) for investigating and improving data management and curation practices in disciplinary domains. The conceptual framework is necessary to establish the paradigm within a domain, and multiple instances of the conceptual framework will represent multiple perspectives. The research data management practices of scientists in research labs at FSU will be investigated through multiple perspectives and the Data Asset Framework (DAF). The DAF, data collection & analysis, interpretations, and limitations are discussed in Chapter 3.


CHAPTER THREE

METHODS

It will be difficult to improve your institutional infrastructure without an overall understanding of the data you currently hold and how researchers at your institution are managing their data. – CARDIO v.2 (Collaborative Assessment of Research Data Infrastructure and Objectives)

3.1 Methodology

Social science research involves the generation of knowledge from the use of scientific, logical, systematic, and documented research methods to investigate individuals, societies, and social processes (Schutt, 2006, p. 9). The research problem and the research questions determined the research method chosen for this study, the mixed-methods DAF methodology. The chosen research method defines how the researcher will collect, analyze, and interpret data in the study (Creswell, 2009; Johnston, 2011). Within this study, the researcher used a mixed-methods research approach to investigate the data management and curation practices of scientists at select research labs at FSU and scientists involved with the NSF EarthCube project. This chapter provides the rationale for this study, reviews the research questions, and identifies Metatriangulation as the theoretical lens and frame of reference within an adapted conceptual framework model. The adapted conceptual framework model was used with an adapted Data Asset Framework (DAF) methodology in this study. The chapter concludes with the data analysis, significance, research study limitations, and the proposed timeline for completion of the dissertation research. The FSU Human Subjects Committee IRB approved the Data Asset Framework (DAF) Survey Questionnaire (See Appendix D) in Summer 2013. The informed consent and letter of invitation to participate in the web-based DAF survey (See Appendix E) for Phase 1 of this study are included in the Appendices. The FSU Human Subjects Committee IRB approved the Data Asset Framework (DAF) Interview for Phase 2 of this study. Phase 2 was conducted and completed in Fall 2013.

3.2 Research Purpose

The research study investigated the current data management and curation practices of scientists in several research labs at FSU and the NSF EarthCube project to identify, classify, and assess data assets in order to develop recommendations to improve current data management

practices. The research labs included the (1) Center for Advanced Power Systems (CAPS), (2) Antarctic Marine Geology Research Facility, (3) Center for Ocean-Atmospheric Prediction Studies (COAPS), (4) Geophysical Fluid Dynamics Institute (GFDI), (5) Coastal & Marine Laboratory, and (6) National High Magnetic Field Laboratory (NHMFL). The investigation focused on DMC perspectives from multiple domains.

DMC practices are defined within this dissertation as the effective aggregation, organization, representation, dissemination, and preservation of data throughout its lifecycle. Data management and curation practices include the four key concepts: (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation. The literature review suggests that these key concepts, when applied with appropriate relevant standards, best practices, and data management guidelines, help to ensure the integrity, accessibility, usability, and stewardship of research data throughout its lifecycle.

3.3 Research Questions

The effective management of research data requires answering (1) how the data is created, (2) what types of data are created, and (3) how the data will be managed. Funders, stakeholders, scientists, and data users all benefit from the answers to these questions. The following research questions addressed these concerns and guided this study.

1. How do researchers create, manage, store, and preserve research data?
2. How can the identification and clarification of key DMC concepts be resolved within and across disciplines?
3. What are some of the theories, practices, and methods used to address research data management in your discipline?
4. How can multiple paradigm perspectives on data management and curation practices within and across disciplinary domains contribute to building DMC research and theory?

The Data Asset Framework (DAF) methodology was used along with an adapted conceptual framework (see Fig. 7) to answer the research questions. The DAF consisted of a survey questionnaire administered in Phase 1 of this study followed by an online interview

conducted in Phase 2 of this study. Qualtrics survey software was used for both phases of the study. The online interview format was selected to accommodate the remote and multiple geographic locations of some scientists, including those on assignment and/or conducting research outside the state of Florida. Skype video with chat was not selected due to confidentiality, privacy, information, and security concerns. The DAF survey and interview addressed RQ1. The survey data analysis and interpretation created the themes (i.e., data management and curation services, resources, support, and barriers) for the DAF interview questions. The DAF survey and literature review addressed RQ2. The online DAF interview questions addressed RQ1, RQ2, RQ3, and RQ4. In theory, each RQ complemented the others and progressed toward a better understanding of research data management by assessing current data practices and workflows. Below are the RQs mapped to the phases of Lewis & Grimes' Metatriangulation and the DAF methodology:
1. RQ #1 > Phase I Groundwork > DAF Survey & Interview
2. RQ #2 > Phase I Groundwork > DAF Survey & Interview
3. RQ #3 > Phase II Data Analysis > DAF Interview
4. RQ #4 > Phase III Theory Building > DAF Interview
The DAF methodology consisted of four major categories: (1) planning the audit, (2) identifying and classifying data assets, (3) assessing the management of data assets, and (4) reporting findings and/or recommendations for improvements (DAF, 2009).

3.4 Theoretical Framework

The theory selected for this study was Metatriangulation (Lewis & Grimes, 1999). Metatriangulation is a three-phased theory-building process that incorporates (1) multiparadigm reviews, (2) multiparadigm research, and (3) metaparadigm theory building. Metatriangulation requires intensive research efforts to reconcile paradigmatic differences, tensions, and relationships in the analysis of a phenomenon of interest common across multiple domains. In the case of this study, the phenomenon of interest, "how to effectively manage research data," is a topic common across multiple disciplinary domains. The phenomenon of interest was investigated across the previously listed research labs at FSU and select scientists affiliated with the NSF EarthCube project. The participants designated their primary research domain as interdisciplinary, multidisciplinary, or another disciplinary domain. These designations were

appropriate for the multiple paradigm perspectives study of data management and curation practices. Within the scope of this study, a paradigm is defined as "the assumptions, practices, and agreement among a scholarly community" (Lewis & Grimes, 1999, p. 672). The preliminary study and LIS literature review provided the (1) multiparadigm review of literature; the DAF Survey provided the (2) multiparadigm research; and the DAF Interview supported multiparadigm research and provided the (3) metaparadigm theory building of Metatriangulation. "Because a theoretical framework has great influence on the design, data collection, and data analysis of qualitative studies, each qualitative researcher must make explicit the framework he or she has chosen for a particular study" (Bodner & Orgill, 2007, p. vii).

A theoretical framework is only useful if it is able to explain a phenomenon and to predict future phenomena (Briggs, 2007, p. 83). There is no existing literature or research on applying sociological research methods to the DAF method for use in building data management and curation theory from multiple paradigm perspectives as proposed in this research study. For purposes of introduction, articulation, and definition, the proposed theory of data management and curation is the study of data management and curation from multiple paradigmatic assumptions, approaches, and perspectives, complementary and non-complementary. Metatriangulation was influenced by the works of previous theorists such as Denzin (1970), Burrell & Morgan (1979), Pondy & Boje (1981), Zey-Ferrell & Aiken (1981), Gioia & Pitre (1990), Bouchikhi (1998), Reed (1996), Deetz (1996), Scherer (1998), Schultz & Hatch (1996), Feyerabend (1979), Hassard (1991), and Van de Ven (1983), to name a few. Since it is "impossible to define all the terms of one theory in the vocabulary of the other" (Kuhn, 1982, p. 669), paradigm integration (Willmott, 1993; Reed, 1985) and paradigm crossing (Schultz & Hatch, 1996, p. 532) are referenced to assess paradigm contributions and confront competing approaches and paradigmatic assumptions (Schultz & Hatch, 1996) for paradigm commensurability. Even though the conceptual framework selected for this study maintains paradigm incommensurability, a multiparadigm approach such as Burrell & Morgan's typology (Lewis & Grimes, 1999, p. 680) seeks to cross paradigm boundaries through interplay (Schultz & Hatch, 1996, p. 533). Metatriangulation accommodates the metatheoretical positions of (1) paradigm incommensurability, (2) integration, and (3) crossing (Schultz & Hatch, 1996, p. 532).


Metatriangulation represents a history of organizational science and theory research spanning more than a quarter century. Metatriangulation was the appropriate theory for this study and for its contribution to the literature on the development of data management and curation theory.

3.5 Methodological Approach

The research investigated the data management practices of scientists in research labs with the aim to (1) collect data for multiple paradigm analysis, (2) reflect on the validity of the method, and (3) reflect on and improve multiple paradigm application (Beers & Bots, 2009). The research project applied Martin's (1992) three-perspective organizational culture model to conduct an empirical analysis of data management and curation practices in select research labs at FSU and the NSF EarthCube project via an adapted Data Asset Framework (DAF) methodology. The multiple paradigm perspectives model includes three perspectives as defined by Martin (1992):
1. Integration Perspective – cultural manifestations are consistent with fostering innovation and expressing concern about employees' physical and mental well-being, and there is little dissent or ambiguity; consistency, consensus, and clarity are evident (p. 28);
2. Differentiation Perspective – cultural manifestations expose inconsistencies, view organization-wide consensus as a myth, and replace homogeneity and harmony with difference and conflict (p. 71);
3. Fragmentation Perspective – cultural manifestations entertain a variety of interpretations, and concerns do not coalesce into shared opinions, either in the form of agreement or disagreement; thus uncertainty, multiplicity, flux, and ambiguity are pervasive (p. 118).
Previous studies on the integration perspective (Barley, Meyer, & Gash, 1988; Calas & Smircich, 1987; Pettigrew, 1979; Gagliardi, 1991; Pfeffer, 1981; Pondy, Frost, Morgan, & Dandridge, 1983) (Martin, 1992, p. 69) influenced Martin's (1992) definition of the integration perspective used in this study. Studies on the differentiation perspective (Van Maanen & Barley, 1985; Louis, 1985; Riley, 1983; Gregory, 1983; Trice & Morand, 1991; Lucas, 1987) (Martin, 1992, p. 97) provided the context for the differentiation perspective used in this study. Previous studies on the fragmentation perspective (Weick, 1985; Levitt & Nass, 1989; Collins, 1986;


Feral, 1985; Moi, 1985; Sabrosky, Thompson, & McPherson, 1982) provided evidence of the fragmentation perspective referenced by Martin (1992) and used for this study. This study used Martin's (1992) three perspectives to analyze the data collected from the DAF Surveys and DAF Interviews for a better understanding of the DMC practices of scientists working in multiple disciplinary domains across several research labs and projects. The analysis examined the qualitative survey data from the integration, differentiation, and fragmentation perspectives to facilitate "the systematic accumulation of data from micro to macro levels of analysis, whilst including opportunities to criticize and re-interpret the methods and findings" (Hassard, 1991, p. 296). The qualitative survey data was coded, broken down, interpreted, and conceptualized (Glaser & Strauss, 1967; Lewis & Grimes, 1999, p. 681) into themes, questions, and key issues for data analysis. The DCC Curation Lifecycle Model, Levels 1-3 Curation, and an Adapted Conceptual Framework Model were presented to interview participants as part of the DAF interview process, with the required task of clicking on areas of the diagrams that correspond to their current DMC practices. This activity recorded participants' responses via a Qualtrics feature called "heat map." The heat map recorded participants' responses as clicks on areas of the models. The heat maps allowed key DMC concepts to be conceptualized and understood through the processes graphically represented by the models. The DAF interviews represented case studies of DMC practices and perspectives supported by findings from the DAF Survey.
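As an illustration only, the following minimal Python sketch shows one way the exported heat-map clicks could be aggregated and mapped to the four key DMC concepts. The region labels, the region-to-concept mapping, and the sample clicks are hypothetical assumptions for illustration, not the actual Qualtrics export format or data from this study.

```python
from collections import Counter

# Hypothetical mapping of heat-map regions to the four key DMC concepts;
# real region labels exported from the survey tool may differ.
REGION_TO_CONCEPT = {
    "conceptualise": "data management planning",
    "create_receive": "data curation",
    "appraise_select": "digital curation",
    "preservation_action": "digital preservation",
    "store": "digital preservation",
    "access_use_reuse": "data curation",
}

def tally_concepts(clicks):
    """Aggregate participants' region clicks into counts per DMC concept."""
    counts = Counter()
    for participant_id, region in clicks:
        concept = REGION_TO_CONCEPT.get(region)
        if concept:
            counts[concept] += 1
    return counts

# Example: (participant, clicked region) pairs from an interview heat map.
sample_clicks = [
    ("I1", "store"),
    ("I1", "create_receive"),
    ("I2", "preservation_action"),
    ("I3", "conceptualise"),
]

print(tally_concepts(sample_clicks))
```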

3.6 Research Design

The mixed methods sequential explanatory research design (Creswell & Plano Clark, 2011) selected for this study included the quantitative DAF survey method followed by the qualitative DAF interview method. Greater weight was placed on the quantitative survey data findings for the explanation and interpretation of the DMC practices and perspectives investigated in this study. Hence, the social science research notation used to describe this mixed methods sequential explanatory research design is QUAN → qual = explain significant factors (Morse, 1991/2003; Creswell & Plano Clark, 2011, p. 109). Phase 1 utilized a survey questionnaire method and Phase 2 utilized a qualitative interview method. The sequential explanatory mixed methods research design was combined with an adapted Data Asset Framework (DAF) methodology for scientific inquiry. The

process of social science research was applied to the adapted DAF methodology for the purposes of guiding this study, answering the research questions, and investigating the DMC practices of scientists at several research laboratories at Florida State University and scientists associated with the National Science Foundation (NSF) EarthCube project. The DAF was framed in the context of a sequential explanatory mixed methods research design (Creswell & Plano Clark, 2011) and Martin's (1992) perspectives for data analysis within this study to facilitate scientific inquiry and application.

3.7 Data Asset Framework (DAF)

The JISC, the University of Glasgow Humanities Advanced Technology & Information Institute (HATII), and the Digital Curation Centre (DCC) developed the Data Asset Framework (DAF) methodology in 2009. The DAF "provides organizations with the means to identify, locate, describe and assess how they are managing their research data assets" (JISC et al., 2009) through findings and recommendations resulting from online survey (quantitative) and interview (qualitative) mixed-methods research. The adapted DAF methodology used for this study combined quantitative and qualitative research methods in the form of questionnaires and interviews to enable the persons responsible for managing research data to gather data asset information and report findings and applicable recommendations for improving research data management. The combination of survey and interview research methods revealed aspects of the problem of managing research data that the strongest single method would overlook (Fielding & Fielding, 1986, p. 69). The DAF was "a useful tool to engage researchers in data curation and to scope their data management requirements. It can also be applied in non-HEI contexts to investigate or build on existing approaches to information management" (JISC et al., 2009, p. 3). The FSU Data Asset Framework (DAF) Survey Questionnaire developed for this dissertation was adapted from previous DAF work by McGowan, T. & Gibbs, T. A. (2009), Southampton Data Survey: Our Experiences & Lessons Learned [unpublished], University of Southampton, UK, as well as the University of West of England (UWE), Imperial College, and University of Southampton DAF questionnaires from the DAF Implementation Guide 2009 and later versions.

3.7.1 Data Asset Framework (DAF) Use Cases

The pragmatic and ethnographic DAF studies of data management at Queen Mary (2011), University of London (2011), University of West of England (2012), University of


Hertfordshire (2012), University of Southampton (2010), and University of Newcastle (2012) contributed to the development of the DAF used in this study. The DAF use cases shared the aim of investigating current DMC issues, practices, and perspectives varying across multiple disciplines at their respective institutions. These practical, non-scientific DAF studies inspired the idea of applying social science research, sociological concepts and terms, and multiple paradigm perspectives to the investigation of DMC practices at research labs at FSU.

I would certainly endorse DAF as a method of exposing current practice. In addition to the fixed responses, the 'other' box was used a lot by people doing our DAF survey, often to voice their difficulties, so it was an excellent way to gather a diversity of practice and experience. (Worthington, University of Hertfordshire, 2013)

The DAF use cases exhibited some of the thematic research data management issues previously discussed in the open access section in Chapter 2 (See Sec 2.6).

3.8 Mixed Methods

The research design selected for this study was the sequential explanatory mixed methods research design. The mixed methods research design served purposes beyond traditional triangulation (Creswell, 1995), within- and across-methods triangulation (Jick, 1979; Tashakkori & Teddlie, 1998, p. 42), theoretical triangulation (Denzin, 1970/2009), and Metatriangulation (Lewis & Grimes, 1999). Even though various definitions of mixed methods exist in the literature, mixed methods is defined in this dissertation as the blending of quantitative and qualitative research methods in the methodology of a study (Tashakkori & Teddlie, 1998). Mixed methods are not connected to any specific paradigmatic inquiry (Greene et al., 1989) or domain (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993, p. 595), and they make sense of the social world in multiple ways (Greene, 2007), through multiple paradigms (Gioia & Pitre, 1990), or through case study (Eisenhardt, 1989) research approaches. Mixed methods research was appropriate for the study of the phenomenon of DMC practices because the scope, concepts, and methods inherent to a single paradigm, perspective, disciplinary domain, or traditional form of triangulation are not adequate for a more complete investigation or analysis of the phenomenon of DMC practices. The combination of quantitative and qualitative data collection, analysis, and interpretation in a mixed methods research approach provided an improved depth and breadth of understanding (Creswell & Plano


Clark, 2011) in this study that is not available through a single quantitative or qualitative research method. Mixed methods research focuses on methods and philosophy (Greene et al., 1989; Creswell & Plano Clark, 2007/2011), methodology (Tashakkori & Teddlie, 1998), qualitative and quantitative research and purpose (Johnson et al., 2007), multiple ways of making sense (Greene, 2007), and research design (Creswell & Plano Clark, 2011). The mixed methods research approach leveraged the strengths of multiple viewpoints, findings, and inferences derived from combining quantitative and qualitative data collection and analysis in this study.

Fig. 8 Data Asset Framework (DAF) Methodology

3.9 Methods of Data Collection

The data was collected from an online DAF survey questionnaire and an online DAF interview questionnaire, both created using the Qualtrics survey software application. Scientists were given the option of face-to-face interviews but chose online interviews due to their current research activities and availability. "Due to the perceived security potential and legal risks from using third-party services [such as Skype, Google Hangouts/Drive/Docs/Office, Facebook, or free host providers]" (Jones, Pryor, & Whyte, 2013, p. 14), dropbox-like services can provide a level of control, security, and reduced risk over third-party services. The FSU Dropbox Service was used to transfer large data files for the biological scientist's diatom research, previously discussed in the Disciplinary Specific Issues, Perspectives, & Use Cases section in Chapter 2 (See Section 2.7.5).

The Florida State University’s instance of Qualtrics was utilized as the data collection software application for the quantitative survey data collection in phase 1 and the qualitative interview data collection in phase 2 of this study. The Qualtrics survey software was used to develop and distribute the interview questions due to the greater control and validity features it provides in addition to the lower perceived security risks. Qualtrics is a product approved for use by the FSU research community and compliance to the FSU’s Use of University Information Technology Resources and the FSU’s Human Subjects Committee data collection policies. Third-party services are not subject to these policies adopted by FSU and FSU IRB and are thus vulnerable to potential security threats, confidentiality, ethical, privacy, information, and legal risks if used as part of FSU IRB-approved research. The data collection plan for phase 1 and phase 2 of this study are outlined in sec. 3.9.1 and sec 3.9.2, respectively.

3.9.1 DAF Survey (online)

An online DAF survey was used to collect data asset information from a sample population of scientists (Schutt, 2006, p. 234) at several research labs at FSU and affiliated with the NSF EarthCube project. The survey questions were adapted from previous DAF surveys administered in higher education institutions (HEIs) in the UK (See Sec. 3.7.1). The DAF survey was a quantitative research method designed to record variations in the data management practices of individuals within and across research labs at FSU and the NSF EarthCube project. The FSU DAF survey (See Appendix F), approved by the FSU IRB on July 10, 2013, comprised 25 questions aimed at investigating and identifying the management of data assets. The survey was separated into several content blocks: (1) Block 1: About You, (2) Block 2: General Data Management, (3) Block 3: Barriers, (4) Block 4: Your Data Assets, and (5) Block 5: Final Comments. The survey results were analyzed using the Qualtrics survey statistics and JMP statistical software. The unit of analysis for the DAF survey was the individual role (i.e., Senior Researcher (1), Principal Investigator (2), Research Assistant (3), Research Technician (4), Research Support (5), Research Student (6), and Other (7)). The individuals are members of one or more disciplinary domains' research communities. The individuals' responses to questions represented the context of their social worlds. The DAF survey conducted in Phase 1 of this

study included an invitation in Q25 for survey participants to participate in the DAF interview in Phase 2.

3.9.2 DAF Interview (online second survey with option for face-to-face)

The semi-structured online interviews began with informed consent, an introduction, and interview logistics. The interview instrument included open-ended questions to stimulate conversation-like responses (Patton, 2002) and required tasks to click on diagrams that recorded responses via heat map recognition (concentration of responses). The qualitative interviews enhanced "the quantitative measurement techniques used in this study's research design" (Schutt, 2006, p. 278), as previously noted in QUAN → qual = explain significant factors (Morse, 1991/2003; Creswell & Plano Clark, 2011, p. 109). The DAF interview was a qualitative research method designed to record participants' descriptions, comments, and specific reactions to the data management and curation questions and concept map images in order to arrive "at meanings on a concrete level" (Kvale & Brinkman, 2009, p. 30). The interview participants were emailed invitations to participate in this study via their respective disciplinary domains' email list serves. Participants were asked to voluntarily participate in the interview after the directors of their research labs approved the participation and the dissemination of the interview protocol to faculty in participating labs. The interview participants were given 30 days to complete the interviews.

The DAF interviews included questions developed from the University of Hertfordshire DAF work, referent concept map models, and ontological/epistemological questions. The DCC Curation Lifecycle Model (See Sec. 2.4.3), the Levels 1-3 Curation Model (See Sec. 2.4.2), and the adapted conceptual framework (See Sec. 2.11.3) images were used as referent concept map models during the interviews. The referent concept maps were introduced during the interviews to acquire domain worldviews. The interview participants' sense-making or meaning-making of the referent concept map images was probed to understand their respective disciplinary domains' perspectives (Bodner & Orgill, 2007) with respect to research data management. The referent concept map models were enabled with the then-new Qualtrics heat map feature, which allowed interview participants to click on areas of the referent concept maps and record their responses via a heat map. Participants clicked on the areas of the models they believed were important to their research, identifying participants' discipline-specific DMC practices

and perspectives. The use of concept maps to probe understanding (e.g., Markow & Lonning, 1998; Nakhleh, 1994; Nakhleh & Krajcik, 1994; Nicoll, Francisco, & Nakhleh, 2001b; Pendley, Bretz, & Novak, 1994; Regis, Albertazzi, & Roletto, 1996; Stensvold & Wilson, 1990) (Bodner & Orgill, 2007, p. 38) and as a means of expressing concepts through graphical representations or images was introduced in the interviews to stimulate DMC comprehension and understanding from multiple disciplinary domains' perspectives. Concept maps have been used in research in the field of chemistry (Bodner & Orgill, 2007). Interview participants were asked to click on the areas of the DCC Curation Lifecycle Model important to their research; the heat-map-enabled model recorded the clicks to aid data analysis. A follow-up question to the DCC Curation Lifecycle Model asked participants to describe the activities before, during, and after the events on which they clicked in the concept map. The same procedure applied to the Levels 1-3 Curation model and the adapted conceptual framework model. The follow-up questions to the referent concept map models included (1) in which areas are you actively involved and (2) describe how your disciplinary domain addresses the problem of data management. The results were coded, conceptualized, and analyzed using Martin's (1992) technique to expose perspectives. The DCC Curation Lifecycle Model was organized into 10 heat map regions representing different stages of the data curation lifecycle that relate to (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation. The Levels 1-3 Curation model was organized into 3 heat map regions representing the (1) research process, (2) publication process, and (3) curation process, reflective of (1) Level 1 Curation – traditional academic flow of information (data curation), (2) Level 2 Curation – information flow with data archiving (digital curation), and (3) Level 3 Curation – information flow with data preservation (Lord & Macdonald, 2003) (digital preservation). The coding of both concept maps allowed interview participants to select aspects of data management and curation from their perspectives outside of the researcher's dominant domain. Both models contained broad terms and underlying data management activities and curation processes familiar to all disciplines. The interview findings were coded "using the [participants'] focus, language, and methods" (Lewis & Grimes, 1999, p. 682). Additional coding applied Martin's (1992) perspectives technique, assigning the codes of integration (unified culture), differentiation (varied subculture), and fragmentation (conflicting feelings and ambiguity) to participants'

quotes based on their perceptions of organizational culture (Martin, 1992; Lewis & Grimes, 1999). Prior to the additional coding, the first round of coding was peer-reviewed, edited, and corrected for accuracy by colleagues via member check. The interview data was coded and recoded during each subsequent analysis to produce categories and subcategories (Lewis & Grimes, 1999; Strauss, 1987). The DAF interviews were executed in Phase 2 of this study.
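To make the perspective-coding step concrete, the following minimal Python sketch groups coded quotes under Martin's (1992) three perspective codes. The participant identifiers, quote excerpts, and code assignments are hypothetical stand-ins for illustration, not quotes or codings from this study's interviews.

```python
from collections import defaultdict

# Martin's (1992) three perspectives used as codes in the additional coding round.
PERSPECTIVE_CODES = ("integration", "differentiation", "fragmentation")

# Hypothetical coded excerpts: (participant, quote excerpt, assigned code).
coded_quotes = [
    ("I1", "Everyone in our lab backs data up the same way.", "integration"),
    ("I2", "Each sub-group stores its data differently.", "differentiation"),
    ("I3", "I am never sure who is responsible for archiving.", "fragmentation"),
    ("I3", "We all agree the data policy matters.", "integration"),
]

def group_by_perspective(quotes):
    """Group coded quotes into categories keyed by perspective code."""
    categories = defaultdict(list)
    for participant, text, code in quotes:
        if code not in PERSPECTIVE_CODES:
            raise ValueError(f"Unknown perspective code: {code}")
        categories[code].append((participant, text))
    return dict(categories)

for code, items in group_by_perspective(coded_quotes).items():
    print(code, "->", len(items), "quote(s)")
```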

3.10 Population and Sampling

A stratified sample of 10 scientists from a targeted population of 129 scientists was interviewed in Phase 2 of this study. The population of 129 scientists comprised scientists from interdisciplinary research labs at FSU and other research labs throughout the United States. The target population was a purposive sample of 129 scientists at select research labs; a minimal sketch of a proportional allocation across labs follows the list below.

Research Labs and Participant Primary Research Roles
• 58 research centers and institutes at FSU; 6 research labs (centers/institutes) selected
• 6/58 ≈ 0.10
• 10% of research labs at FSU were sampled for this study: (1) Center for Advanced Power Systems (CAPS), (2) Antarctic Marine Geology Research Facility, (3) Center for Ocean-Atmospheric Prediction Studies (COAPS), (4) Geophysical Fluid Dynamics Institute (GFDI), (5) Coastal & Marine Laboratory, and (6) National High Magnetic Field Laboratory (NHMFL)
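The proportional allocation of interview slots across labs could be reproduced along the following lines. This is a minimal Python sketch; the per-lab population counts are illustrative assumptions, not the actual sampling frame of 129 scientists.

```python
# Hypothetical sampling frame: scientists per selected lab (illustrative counts
# only; the real frame totaled 129 scientists across the labs and EarthCube).
frame = {
    "CAPS": 20,
    "Antarctic Marine Geology": 15,
    "COAPS": 25,
    "GFDI": 18,
    "Coastal & Marine Lab": 21,
    "NHMFL": 30,
}

def allocate_stratified_sample(frame, total=10):
    """Allocate interview slots per lab roughly in proportion to lab size."""
    population = sum(frame.values())
    allocation = {}
    for lab, count in frame.items():
        # At least one participant per stratum; simple rounding may need a
        # manual adjustment if allocations do not sum exactly to `total`.
        allocation[lab] = max(1, round(total * count / population))
    return allocation

print(allocate_stratified_sample(frame))
```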

Fig. 9 Participant Primary Research Roles

The population and sample for this study were scientists from several research laboratories at Florida State University and scientists associated with the National Science Foundation (NSF) EarthCube project. These research labs are designated as interdisciplinary,

multidisciplinary, and cross-disciplinary programs. Interdisciplinary research is susceptible to paradigm incommensurability (e.g., Jackson & Carter, 1991, 1993; Schultz & Hatch, 1996, p. 532) when constructing interdisciplinary knowledge "because scientific knowledge is situated in its discipline" (Beers & Bots, 2009, p. 259). The researcher included interdisciplinary, multidisciplinary, and other designations as options for participants' primary research domain on the survey in the sampling of scientists from these research labs. The following definition of, and difference between, interdisciplinary and multidisciplinary research were used in the survey data analysis:

…multidisciplinary is simply the idea of a number of disciplines working together on a problem, an educational program, or research study. Interdisciplinary research or education typically refers to those situations in which the integration of the work goes beyond the mere concatenation of disciplinary contribution (Petrie, 1992, p. 303).

Fig. 10 Primary Disciplinary Domain

Scientists from different backgrounds who cross multiple disciplines contributed to the integration of knowledge from different areas. Scientists from interdisciplinary and multidisciplinary research labs were sampled based on their designation and the rich context of multiple disciplinary perspectives.

3.11 Data Analysis

3.11.1 Preliminary Pilot Study Results

A preliminary pilot study of stakeholders' opinions on data management and curation included a survey of participants comprised of federal funding agencies, faculty, researchers,

academic research librarians, and practitioners throughout the US and around the world. The data management and curation (DMC) services opinions survey, approved by the FSU IRB, was distributed from 11/5/12 to 12/5/12 to gather preliminary data on varied DMC theoretical views, opinions, and perspectives that supported further DMC research such as this dissertation. The data on survey participants' roles, institutional roles, theoretical frameworks/perspectives (See Table 4), and dominant disciplinary domains are included as contributions to the required groundwork in Phase 1 of Metatriangulation (See Table 3) and the selection of the adapted DAF survey (See Appendix F) for this study. Similar disciplinary domains were grouped together (i.e., LIS, library science). There were 64 starts and 53 completions for an 83% completion rate.

Survey participants' individual roles:
• Contractor (2), data curator (2), data manager (5), data producer (3), faculty (10), librarian (9), practitioner (1), program funding officer (3), research fellow (1), scientist (5), senior leadership (5), other (6)

Survey participants' institutional roles:
• Commercial organization (2), government funding organization (4), higher education institution (35), iSchool (5), non-profit organization (1), publisher (1), research center (3), other (1)

Survey participants' theoretical frameworks/perspectives:
• Autoethnography (15), constructivism (13), ethnography (27), ethnomethodology (8), feminism (2), grounded theory (20), hermeneutics (5), narratology (10), phenomenology (14), phenomenography (2), positivist/realist/analytic (14), pragmatism (28), symbolic interactionism (8), triangulation/Metatriangulation (13)

Survey participants' dominant disciplinary domains:
• Grey literature (1), sociology (1), large scale (big data) data management (1), data management and curation (6), economics (1), oceanography (1), biomedical informatics (1), LIS (11), environmental (1), social science (1), digital humanities (1), information management (1), computer science (2), engineering (1), biology/biodiversity (1), scientific and technical information (1), biological systematics (1)

Survey responses relevant to this research:
• 55% disagree that data curation is the same as digital curation
• 61% disagree that digital curation is the same as digital preservation


" 80% agree data curation, digital curation, and digital preservation are independent yet interrelated concepts " 77% agree data management and curation services include data curation, digital curation, and digital preservation " 59% agree there is a need to develop data curation theory from similarities, differences, and interrelationships from multiple competing models or frameworks " 70% agree there is a need to develop interdisciplinary undergraduate data management and curation services programs The survey participants also provided responses on the importance of (1) the elements of data management plans (DMP) and (2) the data seal of approval assessment (DSA) guidelines. Survey participants were specifically asked these two questions (1) to promote DMP and DMC and (2) to introduce DSA to multiple disciplines for articulation, education, and indoctrination. The dataset is available online via https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:57106. According to Sampson (2004) and Creswell (2007), pilot testing [preliminary study] can be used to refine interview questions, procedures (Sampson, 2004; Creswell, 2007), data collection plans, and develop relevant lines of questions (Yin, 2003; Creswell, 2007, p. 133). The DAF survey questionnaire results were used to define and refine the DAF interview questions.

3.11.2 DAF Survey Questionnaire Data Analysis

The units of analysis were the survey content blocks: (1) Block 1: About You, (2) Block 2: General Data Management, (3) Block 3: Barriers, (4) Block 4: Your Data Assets, and (5) Block 5: Final Comments. Descriptive statistics generated by the Qualtrics survey software application were used to analyze the survey data. The mean, median, and mode were used to analyze bimodal and multimodal data.
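As an illustration of these descriptive statistics, the short Python sketch below computes the mean, median, and mode(s) of a response distribution, reporting every mode when the data are bimodal or multimodal. The response values are hypothetical examples, not results from the DAF survey.

```python
from statistics import mean, median, multimode

# Hypothetical Likert-style responses to one survey question
# (1 = strongly disagree ... 5 = strongly agree); not actual DAF survey data.
responses = [2, 4, 4, 5, 3, 2, 4, 2, 5, 1]

print("mean:", mean(responses))        # arithmetic mean
print("median:", median(responses))    # middle value
print("modes:", multimode(responses))  # all most-frequent values (handles bimodal data)
```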

3.12 Qualitative Data Analysis

The research used a systematic series of analyses to track down patterns in the data, conceptualize the data, and align multiple paradigm interpretations (Eisenhardt, 1989; Lewis & Grimes, 1999; Glaser & Strauss, 1967; Reed, 1997) for sense-making and meaning-making of the qualitative data using Martin's (1992) three perspectives.


3.12.1 Units of Analysis for Interview

The units of analysis for the interview data were the individual researchers' responses collected under the DAF interview content themes: (1) Area of Research, (2) Disciplinary Domain Research Data Management, (3) Exemplar Research Project – The Funding Application Stage, (4) Exemplar Research Project – Data Collection Stage, (5) Exemplar Research Project – Data Storage and Backup, (6) Exemplar Research Project – Sharing and Security Research, (7) Exemplar Research Project – Archiving Data, (8) Expected Support, (9) Effective Use of Infrastructure, (10) Data Management Confidence and Awareness, and (11) Concluding Questions and Comments (comparing variations within and across research labs; as part of a research team within and across organizations; worldviews). The Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993) (See Sec. 2.11) was used to gather interview participants' (1) ontology, (2) epistemology, (3) frame of reference, (4) concepts, and (5) methods from disciplinary domain-specific perspectives. The qualitative data analysis required the discovery of patterns, relationships, and meanings from the interviews, resulting in themes, multiple perspectives, and interview reports. The interview reports included all forms of observational data, memos, respondent feedback, and transcripts of the DAF interviews to capture as much context as necessary for analysis and interpretation of findings.

3.12.2 Multiple Paradigms Perspectives

The interview data was conceptualized, coded, and analyzed using the three perspectives (Martin, 1992) as paradigms. Each paradigm analysis generated a paradigm perspective: (1) culture, (2) practice, (3) key issues, and (4) DMC context. The paradigm itinerary moved from the dominant DMC paradigm to foreign paradigms to accommodate competing approaches and paradigmatic assumptions (Schultz & Hatch, 1996). The movement through multiple paradigms allowed multiple worldviews of DMC, a phenomenon common across multiple domains. The analysis of interview data from multiple perspectives recognized paradigmatic conflicts, overlaps, and tensions (Lewis & Grimes, 1999). Paradigm transition zones were accommodated by interplay (Schultz & Hatch, 1996) for paradigm integration and crossing. The Burrell & Morgan (1979) typology referenced for multiple paradigm analysis enabled a deeper understanding of the phenomenon of interest through the assimilation of multiple paradigms'

differences, similarities, and interrelationships (Gioia & Pitre, 1990). The polarized interrelationships of uncertainty and complexity as paradoxes for comprehension were recognized as divergent views (Lewis & Grimes, 1999) during the literature review and as fragmentation and differentiation (Martin, 1992) perspectives in this research study. Lewis & Grimes' (1999) Metatriangulation included Phase 1: Groundwork, Phase 2: Data Analysis, and Phase 3: Theory Building (See Table 3). The multiple paradigm data analysis and theory building executed the following processes in Phase 2 of this project:
1. Plan paradigm itinerary – recognize paradigmatic influences
2. Conduct multiparadigm coding – cultivate diverse data interpretations
3. Write paradigm accounts – experience paradigms' language-in-use
4. Explore metaconjectures – conduct mental experiments (Weick, 1989)
5. Attain a metaparadigm perspective – encompass disparity and complementarity
6. Articulate critical reflection – assess theory quality and the theory-building process

The researcher explored the use of HyperRESEARCH "to code and retrieve, build theories, and conduct analysis of the data" (Creswell, 2007, p. 167). Phases 2 and 3 of Metatriangulation were conducted in Phase 2 of this research study in Fall 2013.

3.12.3 Multiple Paradigms Perspectives Significance

Data from interview participants' data management and curation experiences involving exemplar research projects (former or current) was collected. During the interviews, participants were asked to view and click on the regions of images of three conceptual models that were important in their research. The three conceptual models were: (1) the DCC Curation Lifecycle Model (see Fig. 17), (2) Levels 1-3 Curation (see Fig. 19), and (3) the Adapted Conceptual Framework Model (see Fig. 21). The images were enabled with the Qualtrics heat map feature, which recorded interview participants' image clicks for data collection and analysis. The interview participants were asked the following ontological and epistemological questions to acquire disciplinary paradigm perspectives (worldviews) in domain-specific language.

Q10 How does your discipline look at and understand reality? (i.e., What are the core ontological suppositions underlying its frame of reference?)


Q11 How does the discipline learn about reality? (i.e., What are the basic epistemological stances for its frame of reference?) Q12 What are the concepts, methods, theory, and practices that you use to address “research data management” in your discipline?

The purpose of the introduction of the conceptual models was to (1) introduce dominant domain concepts as models and (2) ask non-dominant domains to describe data management from their domain perspective. The multiple paradigm perspectives allowed identification, comparison, and interpretation of paradigm incommensurability in efforts to approach paradigm commensurability, integration (transition zones – where paradigms collide), and crossing (confront distinctions) (Schultz & Hatch, 1996) with respect to understanding the different realities of data management and curation across multiple disciplines. It was expected that parallel and divergent data management paradigm perspectives from this study would aid in developing underdeveloped DMC theory through a broader and deeper articulation, application, and dissemination of multiple domains, perspectives, and practices. It was assumed that interview participants’ exemplar research projects required and/or had some type of data management plan and that investigating how different domains applied disciplinary domain-specific theories, practices, and methods in addressing data management of exemplar research projects would prove essential in addressing and answering this study’s research questions.

3.13 Data Management

The Qualtrics survey data and interview data results were stored and managed in digital format by the researcher on the researcher's personal laptop and an external hard disk drive. The survey data will be kept in Microsoft Excel (.xls/.xlsx) and Portable Document Format (.pdf) formats. The interview data will be kept in Microsoft Word (.doc/.docx), Microsoft Excel (.xls/.xlsx), and Portable Document Format (.pdf) formats. A password-protected and encrypted disk image was created to store this dissertation and all supporting research data. The password has an "Excellent" strength rating and is not remembered by the computer or stored in a keychain, leaving the data accessible only to the researcher. The encrypted disk image will contain data files, results, and folders corresponding to specific stages of the research process. The proposed

interview coding scheme discussed in Sec. 3.9.2 will be kept in a HyperRESEARCH project (.hs2) file at the parent level of the encrypted disk image. The encrypted disk image that will manage the research data for this dissertation has (1) a 40 MB volume size, (2) a Mac OS Extended (Journaled) volume format, and (3) 128-bit AES encryption. The filenames of the datasets include data description and representation information. The data description includes the methodology, phase, order, participant, and year. An example of a dataset filename: DAF-P2_I1-PF_2013.pdf
• DAF: Data Asset Framework
• P2: Phase 2
• I1: Interview Participant #1
• PF: Participant initials
• 2013: Year of DAF Interview (Phase 2)
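To illustrate the naming convention, the minimal Python sketch below builds and parses filenames of the form described above. The helper names are hypothetical; only the component order (methodology, phase, order, participant initials, year) follows the scheme in the text.

```python
def build_filename(methodology, phase, order, initials, year, ext="pdf"):
    """Compose a dataset filename such as DAF-P2_I1-PF_2013.pdf."""
    return f"{methodology}-P{phase}_I{order}-{initials}_{year}.{ext}"

def parse_filename(name):
    """Split a dataset filename back into its descriptive components."""
    stem, ext = name.rsplit(".", 1)
    methodology_phase, order_initials, year = stem.split("_")
    methodology, phase = methodology_phase.split("-")
    order, initials = order_initials.split("-")
    return {
        "methodology": methodology,
        "phase": int(phase.lstrip("P")),
        "order": int(order.lstrip("I")),
        "initials": initials,
        "year": int(year),
        "ext": ext,
    }

print(build_filename("DAF", 2, 1, "PF", 2013))   # DAF-P2_I1-PF_2013.pdf
print(parse_filename("DAF-P2_I1-PF_2013.pdf"))
```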

Weekly backups of the password-protected and encrypted data following each phase of the study were stored on an external hard drive at the researcher's residence. All research data pertaining to this research will be disposed of five years after the completion of the dissertation, or by June 23, 2019, whichever comes sooner. All future dissemination and sharing of the dissertation, research data, and data analysis will be congruent with the publication policies set forth by the FSU Graduate School, the FSU IRB, and applicable scholarly publishers. The dissertation may carry an Attribution 3.0 Unported Creative Commons License (Free Culture License).
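A weekly backup routine like the one described could be scripted along these lines. This is a minimal Python sketch; the source path, external-drive mount point, and date-stamped naming are illustrative assumptions, not the actual setup.

```python
import shutil
from datetime import date
from pathlib import Path

# Hypothetical locations: the encrypted disk image on the laptop and the
# mounted external backup drive (paths are illustrative only).
SOURCE_IMAGE = Path.home() / "Research" / "dissertation_data.dmg"
BACKUP_DIR = Path("/Volumes/ExternalBackup/dissertation")

def weekly_backup():
    """Copy the encrypted disk image to the external drive with a date stamp."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    target = BACKUP_DIR / f"dissertation_data_{date.today().isoformat()}.dmg"
    shutil.copy2(SOURCE_IMAGE, target)  # copy2 preserves file timestamps
    return target

if __name__ == "__main__":
    print("Backed up to:", weekly_backup())
```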

3.14 Validity and Reliability

3.14.1 Mixed-Methods

The validity and reliability of mixed-methods studies was assessed by asking experts to assist in judging the instrument (judgmental validity) and by collecting empirical data from measurement outcomes (empirical validation) (Tashakkori & Teddlie, 1998, p. 81). According to Tashakkori & Teddlie (1998), measurements were associated/correlated with other measures of the same construct or with measures theoretically related to the construct (convergent validity). The research study's quantitative and qualitative measurements represent different measures of the same construct and are theoretically related to the same construct. Reliability refers to accurate and repeatable results of the test measurement of the construct over time. This research used an

established theoretical lens and practical methodology to attempt to balance the theory and practice of data management and curation as the construct. The use of the established works of Lewis & Grimes (1999), Burrell & Morgan (1979), Martin (1992), and other theorists introduced in this dissertation enhanced the reliability of this research and its ability to be tested over time.

3.14.2 Quantitative Survey

The validity and reliability test measurements for quantitative research defined by Tashakkori & Teddlie (1998) include:

3.14.2.1 Face validity – “Face validity is not an indicator of the validity of an instrument” (ibid);

3.14.2.2 Content validity – A group of judges (experts) evaluate a test measurement’s goal. The survey results were reviewed and critiqued by participants/colleagues for validity.

3.14.2.3 Concurrent validity – The instrument was administered with a validated construct. The survey was adapted from a survey that was well established among groups of faculty/professionals in HEIs in the UK.

3.14.2.4 Construct validity – The degree to which the test measures the construct includes (1) convergent validity and (2) discriminant validity (ibid, p. 83). The mixed-methods approach and Metatriangulation fulfilled convergent validity, but there was the potential for the measurement to yield test results that differed between groups/disciplines that have different degrees of the construct.

3.14.2.5 Reliability – A measurement instrument is reliable if results are consistent over time (test-retest reliability), across a range of items (internal consistency reliability), and/or across different raters/observers (interobserver or interrater reliability) (Tashakkori & Teddlie, 1998, p. 80). Inconsistencies in reliability may be due to the definition of the construct. The construct was clearly identified and defined at the start of


the investigations to elicit a common understanding across raters for consistent interpretation. The multiple paradigm perspectives and triangulation of data provided deeper insight into and understanding of the phenomenon than a single method would and should improve reliability and validity.

3.14.2.6 Test-retest reliability – The results of repeated administration differentiate members of a group in a consistent manner. The DAF survey has been administered to different groups/disciplines in the UK with consistency and was consistent in this study (Tashakkori & Teddlie, 1998, p. 85). This study used criteria and techniques cited by Tashakkori & Teddlie (1998) to ensure the validity and reliability of the quantitative phase of this research.

3.14.3 Qualitative Interview

The validity of the qualitative data required the use of researcher, participant, and reviewer standards in conjunction with validation strategies such as triangulation, member checks, and a reflexive journal (Tashakkori & Teddlie, 1998; Creswell & Plano Clark, 2011). Trustworthiness (Lincoln & Guba, 1985), credibility (e.g., Eisner, 1991; Janesick, 1994; Lincoln & Guba, 1985; Patton, 1990; Tashakkori & Teddlie, 1998, p. 90), transferability, dependability, and confirmability all contributed to the validation of this qualitative research. Credibility, transferability, dependability, and confirmability collectively contributed to trustworthiness, and trustworthiness was determined by different methods such as (1) prolonged engagement (e.g., Freeman, 1983; Gardner, 1993), (2) persistent observation, (3) triangulation (e.g., Denzin, 1970; Patton, 1990), (4) peer debriefing, (5) negative case analysis (e.g., Glaser & Strauss, 1967; Kidder, 1981; Lincoln & Guba, 1985; Patton, 1990; Yin, 1994), (6) referential adequacy (e.g., Eisner, 1975, 1991; Lincoln & Guba, 1985), and (7) member checks (e.g., Lincoln & Guba, 1985; Spradley, 1979). For the purposes of this study, the criteria and techniques of prolonged engagement (Freeman, 1983; Gardner, 1993), triangulation (Denzin, 1970; Patton, 1990), and member checks (Lincoln & Guba, 1985; Spradley, 1979) were adapted to ensure the validity and reliability of the qualitative phases of this study. The applicable criteria and methods are briefly discussed below to support the trustworthiness of this research inquiry.


3.14.3.1 Credibility – The sequential explanatory multiphase, mixed-methods design and the Metatriangulation (Lewis & Grimes, 1999) theoretical approach created credibility for the study through prolonged engagement and the use of triangulation techniques (Tashakkori & Teddlie, 1998). Using the Burrell & Morgan (1979) typology together with the multiple paradigm analysis technique of Lewis & Grimes (1999), the conceptual framework model (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993), and the three perspectives (Martin, 1992) provided breadth and depth to this research. The multiphase mixed-methods research design was discussed in Sec. 3.6, the triangulation technique in Sec. 2.10, and the conceptual framework in Sec. 2.11.

3.14.3.2 Transferability – The recording of notes and memos in the form of a reflexive journal throughout the entire study provided context for transparency. The reflexive technique “provides information for all four criteria of trustworthiness (i.e., credibility, transferability, dependability, and confirmability)” (Tashakkori & Teddlie, 1998, p. 93).

3.14.3.3 Dependability – The project maintained full disclosure to participants in Phase 1 and Phase 2 of this study. No data was collected prior to IRB approval, and changes to approved IRB phases of the project were amended and vetted for IRB approval. The survey participants from Phase 1 of the study who agreed to participate in Phase 2 provided prolonged engagement, insight, and relevant findings leading to saturation of new knowledge for analysis. The guidance of the major professor and the committee's concerns about "the process of the inquiry, including the appropriateness of inquiry decisions and methodological shifts" (Tashakkori & Teddlie, 1998, p. 93) were critical and foundational to this research. This project was made possible with their support, and that approval led to the dependability of this dissertation.

3.14.3.4 Confirmability – The data supported the findings and interpretations for internal coherence. The principal investigator used Qualtrics software to record detailed data and progress reports and to perform error checking and consistent validity checking during the study.


3.14.3.5 Member checks – "This occurs either during the investigation or at its conclusion, and constitutes the most important credibility check" (Tashakkori & Teddlie, 1998, p. 92). Participant feedback was sought during the investigations to ensure validity.

3.15 Significance and Limitations

3.15.1 Significance

With the recent calls for proposals for a theory of digital preservation and a theory of digital curation in international conference proceedings and publications, this dissertation contributed to data management and curation theory development, which is currently under-represented in the literature. The combination of Metatriangulation, multiple paradigm approaches and analysis, organizational theory, the Data Asset Framework, and a conceptual framework that promoted scientific inquiry into data management and curation was original, with research, practical, and social implications. The significance of this study included:
• The first study to use Metatriangulation, a conceptual framework, and the DAF to investigate data management across disciplines.
• The study applied theoretical and practical knowledge to underdeveloped research on the theory of data management and curation (DMC).
• The study may help improve DMC practices across multiple disciplines; 50% of the interview participants found this study to be useful.

Other aspects of this study's significance are reflected in the following responses to important questions about its methodology:
• What does this research want to discover? This research investigates how scientists manage, store, provide access to, and preserve research data and explores relevant data management and curation standards, best practices, and guidelines.
• Why are the research questions important? The research questions are important because they address funding agencies' data management and curation requirements; plan to collect, aggregate, and analyze current data management practices; and plan to educate, articulate, and promote the need for data access, use, and reuse.


• How is this research going to answer the research questions? This research is going to discover, map, and correlate data management synergies across disciplines and introduce/share data management concepts & models across disciplines.

3.15.2 Limitations

The limitations of this research study included:
• The study may multiply ambiguous or paradoxical practices in parallel with the home paradigm without fostering greater insight and creativity (Lewis & Grimes, 1999, p. 672).
• There is potential for maintaining boundaries if paradigm biases are not critiqued.
• The study may lack transferability and generalizability (i.e., to non-scientific disciplines).
• Methodological approach – Even though the DAF has been used in national and international HEIs, there exists a lack of application of the scientific method to the DAF. Most DAF examples have been case studies within an ethnographic and/or pragmatic context, not studies with a theoretical lens and scientific method applications. The application of sociological and other research methods is necessary to develop the DAF as a methodology representative of the scientific method for scientific inquiry across domains.
• The survey and interview completion/drop-out rates adversely affected the findings.

3.16 Ethical Considerations

The FSU Institutional Review Board (IRB) approved the study's DAF Survey on July 10, 2013. There were no foreseeable risks to participants, and the informed consent (See Appendix D) was presented to participants at the start of the survey. Participants had to agree to the informed consent in order to participate in this study. The survey included a required validation question that asked participants whether they agreed to the informed consent. Below is an excerpt from the DAF Survey informed consent:

There is no foreseeable risk to participants involved in completing this survey. Your participation in the study may lead to the development of strategies for your organization to develop data management plans, data curation practices, professional development, educational graduate programs and practices, and data management plans in agreement with standards, guidelines, and best practices to address the data deluge program across disciplines, institutions, and organizations.


The ethical considerations for this mixed-methods research study included (1) maintaining confidentiality, (2) protecting anonymity, (3) reducing biases, (4) causing no harm to participants, (5) sustaining honesty, and (6) demonstrating responsible conduct of research in compliance with university and government policies and procedures regulating research.

3.17 Proposed Timeline for Dissertation Research

Table 6. Proposed Timeline for Dissertation Research

Date                              Research Activity
November 2012                     IRB approval awarded (Preliminary Study)
July 2013                         DAF Survey Questionnaire IRB approval awarded (Phase 1)
September 2013                    Defended Prospectus
October 2013                      DAF Interview Questionnaire IRB approval awarded (Phase 2)
October 2013 – November 2013      DAF Survey Data Collection (Phase 1)
November 2013 – December 2013     DAF Interview Data Collection (Phase 2)
December 2013 – February 2014     DAF Survey Data Analysis
February 2014 – March 2014        DAF Interview Data Analysis
March 2014 – May 2014             Dissertation Writing
June 2014                         Dissertation Defense

Conclusions

The DAF and multiple paradigm perspectives on data management and curation provided insight into an organization's culture while also promoting the seven rules for successful research data management in universities. The seven rules for successful research data management (RDM) in universities are: (1) understand how your institution deals with research data, (2) build a case for RDM and gather support, (3) define your institution's position on RDM to establish policy and strategy, (4) ensure researchers are aware of what data is available, (5) provide easy to use, robust data storage, (6) make it easy for others to find and cite research data, and (7) stay ahead of your peers (Hodson & Jones, 2013) to enable data dissemination and sharing.

This chapter presented a case in support of multiple paradigm perspectives and multiple paradigm analysis using Metatriangulation as the theoretical lens, DAF as the methodology, and mixed methods as the research design, as an approach to investigate and improve the research data management and curation practices of scientists in research labs at FSU. The data analysis and findings provided empirical evidence in support of this approach to DMC theory development. The justifications given by the University of Northampton for undertaking a DAF project were also the justifications adopted by the researcher for undertaking this DAF project for research labs at FSU. The justification for this research project included the following:
• There is little known information about university researchers' data creation, data storage requirements, and data management research workflows that incorporate long-term preservation [across multiple disciplinary domains and research labs]
• No university-wide data storage policy or procedure currently exists
• Research funders are beginning to demand that data as well as published research outputs are made openly available
• There is infrastructure to store and preserve digital data
• Previous studies have noted that the process of undertaking DAF has been valuable in itself, even if the resulting inventory of data is only partial
• There have been a number of previous implementations of DAF; these could be consulted or adapted to meet [FSU's] needs (Pickton, 2010)

This study aimed to discover how scientists store, manage, share, and preserve data in contrast to emerging data management and curation trends and relevant standards, best practices, and guidelines. The research questions were important because funding agencies continue to require data management plans and access to data from federally funded research. Scientists need to adopt and implement effective data management and curation practices to meet the demands of funding agencies and the research and learning communities. This research answered the research questions by identifying and investigating the perspectives, practices, and experiences of scientists from different disciplinary domains who are responsible for managing research data. The goal of this research project was to develop recommendations and implications to improve the current data management and curation practices of scientists in several research labs at Florida State University.


CHAPTER FOUR

RESULTS

A methodology in theory differs from a methodology in practice. - Watson & Wood-Harper (1993, p. 611)

4.1 Findings from DAF Survey Data Analysis

Using a mixed-methods research approach comprising adapted Data Asset Framework (DAF) surveys and interviews, this study analyzed the data management practices of scientists from several research labs at Florida State University and of scientists affiliated with the National Science Foundation (NSF) EarthCube project. This chapter presents findings from the DAF survey and DAF interview data analysis.

Fig. 11 DAF Surveys and DAF Interviews Completion Rate

4.1.1 DAF Survey

The DAF Survey, FSU IRB HSC#2013.3073, was adapted from McGowan, T. & Gibbs, T. A. (2009), Southampton Data Survey: Our Experiences & Lessons Learned [unpublished]. It consisted of 26 questions covering five categories focused on research data management: (1) About You, (2) General Data Management, (3) Barriers, (4) Your Data Assets, and (5) Final Comments. Each category contained survey questions. The survey was administered to scientists via email and remained active from October 1, 2013 to October 31, 2013. This section discusses the analysis of data from each of the five categories in the DAF Survey with n = 129.

4.1.2 About You

This category contained three questions about (1) whether the participant holds research data, (2) primary research role, and (3) primary disciplinary domain. Participants were asked in Q2, "Do you currently hold any research data?", as a qualifier for participation: if a participant held research data, they continued with the survey; otherwise, survey skip logic took the participant to the end and terminated the survey.

Q2. Do you currently hold any research data? – Out of the 129 participants who agreed to the informed consent in order to begin the DAF Survey, 106 (83%) currently hold research data and 21 (17%) do not. Thus, 106 of the 129 original participants qualified to participate in this study.
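To make the qualifier arithmetic concrete, the sketch below tallies the Q2 split from a raw survey export. This is a minimal illustration assuming a hypothetical CSV column name (q2_hold_data); it is not the actual Qualtrics export schema used in this study.

```python
# Minimal sketch: tallying the Q2 qualifier split from a raw survey export.
# The column name "q2_hold_data" is a hypothetical placeholder.
import csv
from collections import Counter

def q2_split(path):
    with open(path, newline="") as f:
        answers = [row["q2_hold_data"].strip().lower() for row in csv.DictReader(f)]
    counts = Counter(answers)                      # e.g. Counter({"yes": 106, "no": 21})
    total = sum(counts.values())
    return {answer: (n, round(100 * n / total)) for answer, n in counts.items()}

# Example: q2_split("daf_survey_export.csv") might return
# {"yes": (106, 83), "no": (21, 17)}
```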

Q3. What is your primary research role? – Little demographic information was collected beyond (1) primary research role and (2) primary disciplinary domain in the 'About You' category. Future DAF studies will include more demographic information, but this study focused on primary research role and primary disciplinary domain with respect to the management of data across disciplines. For primary research role, participants were given seven options, including 'Other' with a text input box. The results from Q3, "What is your primary research role?", are displayed in Table 7.

Table 7. Primary Research Role

Role                      Frequency   Percent   *Other (text responses)
Senior Researcher         23          23%       IT Support
Principal Investigator    29          29%       Postdoctoral research associate
Research Assistant        26          26%       Research associate
Research Technician       3           3%        Operation project manager
Research Support          3           3%        Data management
Research Student          10          10%       Postdoctoral research associate
*Other                    7           7%        Postdoc
Total                     101         100%


The data revealed that 29% (29) of the participants identified themselves as Principal Investigators, followed by 26% (26) as Research Assistants and 23% (23) as Senior Researchers. Participants who selected 'Other' identified themselves as IT support, postdoctoral research associate, research associate, operation project management, data management, post-doctoral research associate, and postdoc, with roughly half of the 'Other' responses identifying as postdocs. This question is important for identifying the key individuals involved in research data management (RDM) and for understanding the audience, titles, and academic positions for which to develop proposed recommendations for RDM articulation and education, where applicable and feasible.

Q4. What is your primary disciplinary domain? – Scientists who participated in this study classified their primary disciplinary domain as (1) multidisciplinary 52% (51), (2) interdisciplinary 25% (25), or (3) Other 23% (23) (see Table 8). However, some scientists listed the same disciplines as multidisciplinary and interdisciplinary without explaining how they distinguished between the two designations. To define, clarify, and articulate the difference between multidisciplinary and interdisciplinary within this study, this research project adapted Petrie's (1992) definitions of multidisciplinarity and interdisciplinarity, respectively:

Multidisciplinarity is simply the idea of a number of disciplines working together on a problem, an education program, or research study. Interdisciplinarity research refers to those situations in which integration of the work goes beyond the mere concatenation of disciplinary contribution. (Petrie, 1992, p. 303)

Fig. 12 Primary Disciplinary Domain


Table 8. Primary Disciplinary Domain

Domain              Frequency   Percent   Please specify (included)
Multidisciplinary   51          52%       Physics, engineering, educational psychology, oceanography, meteorology, material science, biology, chemistry, cryogenics, AI, cellular biology, geoscience, high field magnet science and technology, mathematics, mechanical and electrical engineering, thermal physics, computer science, condensed matter physics
Interdisciplinary   25          25%       Condensed matter physics, oceanography, computer science, neuroscience, geochemistry, biophysics, earth science, marine biology, ecology, electrical engineering, chemistry, mechanical, material science, proteomics/biostatistics
Other               23          23%       (Note: input text box was not enabled)
Total               99          100%

The data revealed that research data management cuts across multiple disciplines and sub-disciplines and supports the assumption that 'how to effectively manage data' is a phenomenon common across multiple disciplines, given that the disciplines identified in this study are multidisciplinary and/or interdisciplinary. Some scientists listed the same discipline and/or combinations of specified disciplines as both multidisciplinary and interdisciplinary. Some of these combinations listed (1) Oceanography and Meteorology, (2) Marine Meteorology/Oceanography, (3) Marine Biology/Ecology, (4) Geophysics, (5) Physics, and (6) Materials Sciences as multidisciplinary. Other combinations listed (1) Ocean/Atmosphere/Agriculture/Power/Water, (2) Physical Oceanography, (3) Marine/Fisheries/Ecology/Evolution, (4) Marine Ecology and Marine Policy, (5) Physics, (6) Material Science and Engineering, and (7) Oceanography as interdisciplinary. The varied and diverse multidisciplinary and interdisciplinary responses revealed multiple sub-disciplines worthy of further exploration. Further research is needed to explore how and why scientists self-identified the same disciplines and/or combinations of disciplines as multidisciplinary, interdisciplinary, or other.


4.1.3 General Data Management

This category contained seven questions about (1) primary data, (2) secondary data, (3) data types, (4) access, and (5) storage, and focused on general data management practices as a starting point for a data assessment. It is important to know some basic characteristics of the data, how the data are created, the data types, and current data practices in order to effectively explore options and recommendations for improving data practices where appropriate and applicable.

Q5. Thinking about the primary data you hold, what type of data is your primary data? – Participants were asked what type of primary data they hold to distinguish it from secondary data. It is important to know the type of data in order to better manage it and to apply standards, best practices, and guidelines appropriate to that type. Because participants could select more than one type, percentages sum to more than 100%: 78% (74) of the participants selected 'experimental data' as their primary data type, followed by 61% (58) 'derived data', 51% (48) 'computer code', 44% (42) 'observational data', 27% (26) 'reference data', 3% (3) 'don't hold any primary data', and 2% (2) 'other'. The participants' primary data types are listed in Table 9.

Table 9. Primary Data Type

Type of primary data                                                                    Frequency   Percent
I don't hold any primary data                                                           3           3%
Computer code (including model or simulation source code, where it may be more important to preserve the model and associated metadata than computational data arising from the model)   48   51%
Derived (resulting from processing or combining 'raw' or other data, where care may be required to respect the rights of the owner of the raw data)   58   61%
Experimental (scientific experiments and computational results, which may in principle be reproduced although it may in practice prove difficult or not cost-effective to do so)   74   78%



Observational (of scientific phenomena at a specific time or location where the data will usually constitute a unique and irreplaceable record) 42 44%

Reference (canonical or reference data relating, for example, to gene sequences, chemical structures or literary texts)   26   27%
Other - (1) videos, images, audio files; (2) project funding, cost and budget analysis, personnel and materials resources, grants, schedules   2   2%

Q6. Who funded your primary data? – Participants were asked who funded their primary data to determine which agencies fund the primary research data, and in particular to explore the proportion of funding from the National Science Foundation (NSF) and the National Institutes of Health (NIH), which require data management plans for research funding. The text responses named numerous and varied funding agencies, and some participants listed multiple funding agencies specifically, generally, or vaguely. There were 79 individual text responses, many naming more than one agency. Specific funding agencies (e.g., NSF, NOAA, NIH) were grouped by specific names, general funding agencies (e.g., government agencies, lab, utility) were grouped by general names, and unique funding agencies were listed individually.
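As an illustration only, the sketch below shows one way such free-text funder responses could be normalized into named agencies; the keyword map is hypothetical and is not the coding scheme actually used in this study.

```python
# Minimal sketch: grouping free-text funder responses into named agencies.
# The keyword map is an illustrative example, not the study's actual scheme,
# and the naive substring matching would need refinement for real coding.
from collections import Counter

AGENCY_KEYWORDS = {
    "National Science Foundation (NSF)": ["nsf", "national science foundation"],
    "National Institutes of Health (NIH)": ["nih", "national institutes of health"],
    "Department of Energy (DoE)": ["doe", "department of energy"],
}

def group_funders(free_text_responses):
    counts = Counter()
    for response in free_text_responses:
        text = response.lower()
        matched = False
        for agency, keywords in AGENCY_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[agency] += 1
                matched = True
        if not matched:
            counts["Unique/other funding agencies"] += 1
    return counts

# Example: group_funders(["NSF and DOE grant", "Pew Charitable Trust"])
# -> one NSF count, one DoE count, and one "Unique/other" count
```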

Table 10. Primary Data Funding Agencies

Funding agency                                                Frequency   Percent
National Science Foundation (NSF)                             44          33%
National Oceanic and Atmospheric Administration (NOAA)        9           7%
Department of Energy (DoE)                                    13          10%
Office of Naval Research (ONR)                                8           6%
NASA                                                          6           4%
National High Magnetic Field Laboratory (NHMFL)               7           5%
National Institutes of Health (NIH)                           5           4%
Federal Government (non-specific)                             5           4%
National Nuclear Security Administration (NNSA)               4           3%



Unique funding agencies (e.g., Pew Charitable Trust, AT&T, MacArthur Foundation)   14   10%

Deep-C, Gulf of Mexico Research Initiative (GoMRI) - British Petroleum (BP)   4     3%
State of Florida (FL)                                                         4     3%
Private (non-specific)                                                        3     2%
Others (non-specific)                                                         3     2%
Florida State University (FSU)                                                2     1%
Free                                                                          2     1%
Utility (non-specific)                                                        2     1%
Total                                                                         135   100%

The top funding agencies identified as funding the primary data used by the scientists who participated in this study included 36% (44) National Science Foundation (NSF), 11% (13) Department of Energy (DOE), 10% (14) unique funding agencies, 7% (9) National Oceanic and Atmospheric Administration (NOAA), 7% (8) Office of Naval Research (ONR), 6% (7) National High Magnetic Field Laboratory (NHMFL), 5% (6) NASA, 4% (5) National Institutes of Health (NIH), and 3% (4) National Nuclear Security Administration (NNSA). The unique funding agencies identified as funding primary data included Munir Humayan, Ke Han, the Pew Charitable Trust, the Florida Fish and Wildlife Conservation Commission, Maryland SeaGrant, Louisiana SeaGrant, institutional fellowships, AT&T, USGS, AHA, BOEM, VIMS, external/international universities, and the MacArthur Foundation.

Q7. Thinking about the secondary data you hold, who collected the data? – In order to properly distinguish between primary and secondary (derived) data, as identified in the research process of the traditional academic information flow - Level 1 Curation (Lord & Macdonald, 2003, p. 43), scientists were asked who collected the secondary data they currently hold (see Table 11). This is important for targeting key data management professionals for the proper and effective articulation of research data management and of data management and curation education, resources, and recommendations for stakeholders and developers of data management policies. The participants themselves primarily collected the secondary data, at 25% (27), followed by graduate

students/research assistants/research staff/technicians at 22% (23), research groups (team, project) at 9% (10), and principal investigators at 7% (7).

Table 11. Collectors of Secondary Data

Responsible for secondary data collection                               Frequency   Percent
Myself (me)                                                             27          25%
Graduate students, research assistants, research staff, technicians    23          22%
Research group (team, project)                                          10          9%
Various collaborators & groups                                          10          9%
Postdocs                                                                8           8%
Colleagues                                                              5           5%
Principal Investigators (PIs)/researchers (various)                     7           7%
National Science Foundation (NSF)                                       3           3%
Scientists working in research lab                                      2           2%
National High Magnetic Field Laboratory (NHMFL) & NHMFL users           3           3%
Derived from primary data/automatic                                     2           2%
NASA                                                                    1           1%
DOE                                                                     1           1%
NNSA                                                                    1           1%
Deep-C Scientists                                                       1           1%
FSU Office of Undergraduate Research/FSU Institutional Research         1           1%
Suppliers                                                               1           1%
Total                                                                   106         100%

Q8. Which of the following data types make up your secondary data? – 83 scientists identified the data types of their secondary data, supporting assessment and understanding of the various data types that are subject to research data management standards, best practices, and guidelines, where appropriate and applicable. The data types of secondary data are displayed in Table 12. Scientists identified the data types of the secondary data they hold primarily as 60% (50) data collected from sensors (including questionnaires), 55% (46) Excel sheets, 52% (43) data automatically generated from computer programs, 48% (40) laboratory notes, 45% (37)

computer software code, 37% (31) MS Word, 36% (30) MS PowerPoint, 28% (23) images/scans/photos/x-rays, 11% (9) digital video files, and 11% (9) websites. This diverse selection of data types will require format migration, mapping, and standards for preservation.

Table 12. Data Types of Secondary Data

Secondary data types                                                       Frequency   Percent
Audio tapes                                                                1           1%
Computer software source code                                              37          45%
Data automatically generated from or by computer programs                  43          52%
Data collected from sensors or instruments (including questionnaires)      50          60%
Digital audio files                                                        3           4%
Digital video files                                                        9           11%
Excel sheets or equivalent spreadsheet software                            46          55%
Fieldwork data                                                             12          14%
Images, scans, photos, x-rays                                              23          28%
Laboratory notes                                                           40          48%
MS Access or equivalent database software                                  6           7%
MS PowerPoint or equivalent presentation software                          30          36%
MS Word or equivalent word processing software                             31          37%
Slides - physical media                                                    4           5%
SPSS files or equivalent statistical software                              6           7%
Video tapes                                                                4           5%
Websites                                                                   9           11%
Other - please specify: (1) online budget and cost systems, MS Project; (2) previously published experimental data   2   2%
Total responses                                                            83

Q9. Do you allow others to access your data once the project is finished? – Allowing access to data extends its usefulness for science, research, and education, both for

current and future use. 89% (73) of the scientists who participated in this survey allow others to access their data once the project is finished, whereas 11% (9) do not. These data support open access and resource discovery via data sharing.

Q10. What are your concerns with providing access to your research data? – 63% (52) of the scientists identified confidentiality or data protection issues, 55% (45) identified that the data are not fully documented, 17% (14) identified license agreements prohibiting access, 16% (13) identified that the data are no longer in a format that is widely readable/accessible, and 13% (11) identified other concerns with providing access to their research data. Other concerns included (1) research work underway, (2) getting proper credit when others use the data, (3) data have to be obtained correctly, (4) some data can be shared, (5) cost, (6) PIs and scientists not wanting to share, (7) not published yet, (8) understanding of the experimental conditions, and (9) liability.

Q11. Where do you store your data (excluding backup copies)? – Where research data are stored is vitally important for current and future use. Improper and/or insecure data storage locations can lead to data loss, theft, deletion, and a plethora of complexities if data are not stored in stable, secure, and reliable locations. The most common data storage locations included 78% (68) local computer, 56% (49) external hard disk, and 37% (32) other provided file server. The other file servers are stable, secure, and reliable locations; the only concerns would be storage locations that rely heavily on Google products and other third-party solutions that may or may not have transparent data management policies and data privacy statements. Additional research is needed to identify standards, best practices, and guideline policies for data storage locations.

Table 13. Data Storage Locations

Storage location of data                                          Frequency   Percent
CD/DVD                                                            14          16%
External/commercial/web data storage facility - give details     15          17%
External hard disk                                                49          56%
Local computer                                                    68          78%
My documents on research lab PC                                   30          34%
Paper/file records                                                24          28%
Technology vendor file server                                     5           6%


Other provided file server (e.g. by School/unit - please specify)   32   37%
Other - give details                                                 7    8%
Total responses                                                      87

External data storage facilities included Dropbox (7), Google Drive/Docs (3), collaborating institution (1), commercial web hosting (1), federal government web server (1), FigShare (1), and NHMFL data storage (1). Other provided file servers included NHMFL drives (9), MagLab (10), IT servers/HPC/Center/University/SVN (8), Deep-C petabyte server (1), and Dropbox (1). Other (7) included dedicated Linux servers, tapes, project web server, own group file servers, own lab web server with public access, personal computer, and a shared Dropbox with the whole lab. Participants stored data in multiple locations: those who stored data on a local computer or in My Documents on a research lab PC also stored data on external hard disks and external/commercial/web data storage. Several participants used Dropbox as complementary storage alongside data stored on a local computer or in My Documents on a research lab PC.

4.1.4 Barriers

This category contained five questions on potential barriers to research data management.

Q12. Can you estimate, as a percentage of your research time, how much time you have lost re-organizing, re-formatting or trying to remember details about data? – Limited time and resources can affect the management of research data, particularly the amount of time spent re-organizing, re-formatting, or trying to remember details about the data. Scientists were asked to estimate the time they spend on these activities; reducing that time and/or addressing inefficiencies in the management of data could free time and effort for other activities. 73% (62) of participants estimated that less than 20% of their time is lost re-organizing, re-formatting, or trying to remember details about data, followed by 18% (15) who estimated 20%-40% is lost, 4% (3) who estimated 0% is lost, and 6% (5) who were not sure how much time is lost.


Q13. What are some barriers for you with regard to managing and storing your research data? – Various types of barriers to research data management prevent the development of appropriate standards, best practices, and guidelines, and even of the culture or infrastructure needed to support data management and curation of research data. Participants were asked about their barriers to managing and storing research data. 53% (39) of the participants identified infrastructure/resources as the primary barrier, followed by 44% (32) storage/technology, 38% (28) budget/funding, 23% (17) other, and 14% (10) stakeholders.

Fig. 13 Barriers to Research Data Management

Q14. Which of the following data management issues have you experienced? – Anyone involved with data will encounter some type of data management issue. This study examined the data management issues encountered by scientists to explore ways to address them and to develop possible recommendations for ameliorating them. Data management issues encountered by survey participants are displayed in Table 14.

Table 14. Data Management Issues

Issues with data management                                                                                                     Frequency   Percent
Finding files which may be either colleagues' or your earlier version (e.g. problems with file names, file/folder structure)   50          62%
Locating where data files are stored (e.g. on external hard drives, USB, CD/DVDs, networked storage)                           54          67%



Non-standard file formats which are difficult to work with in current FSU systems   29   36%
Legal issues arising from international transfer of data                            6    7%

Problems establishing ownership of data 4 5%

Finding or accessing research data from former colleagues (e.g. former PhD students or research staff)   45   56%
Security and protection of files                                                                          18   22%
Other - provide details                                                                                   6    7%
Total responses                                                                                           81

Other data management issues experienced included the following:
• Volume of data collected over time; absence of metadata
• Transferring physical notes to digital form; digital search is difficult without standardized index terms
• Complex data creation environment

Q15. Have you ever been asked by a funder to produce a Data Management Plan? – Data management plans have become increasingly common and increasingly required since the National Science Foundation (NSF) began requiring, in 2011, that all scientists/researchers seeking NSF funding develop a two-page data management plan. Since 2011, other funding agencies, such as the National Endowment for the Humanities (NEH) in 2013, have also required data management plans. However, some funding agencies still do not require a data management plan for research. Out of 86 total responses, 44% (38) of participants have been asked by a funder to produce a data management plan, whereas 56% (48) have not. The funders requesting the production of a data management plan included the National Science Foundation (NSF), the Navy, NOAA, the Gulf of


Mexico Research Initiative (Deep-C Consortium), and the National High Magnetic Field Laboratory (NHMFL).

Q16. Are there any data preservation policies in place within your School (e.g. data preservation policy, data lifecycle management policy, or data disposal policy)? – A university or school with data preservation policies in place benefits stakeholders, researchers, and users of research data. Out of 86 total responses, 22% (19) of the participants answered affirmatively that there are data preservation policies in place within their school, 17% (15) answered negatively, and 60% (52) did not know whether any data preservation policies are in place. Data preservation policies are crucial to the lifecycle management of data from creation to description to representation to dissemination to preservation. The Center for Ocean-Atmospheric Prediction Studies (COAPS), NHMFL, MagLab, and NSF have data preservation policies that are applicable to the participants associated with these research labs and/or receiving research funding from NSF.

4.1.5 Your Data Assets

This category contained eight questions on data assets, ranging from (1) who is responsible for managing research data and (2) how much data is currently maintained to (3) how data is tracked and (4) when, (5) where, and (6) how often research data is backed up.

Q17. Who is responsible for managing your research data? – The individual responsible for managing research data must be competent, proficient, and cognizant of various data management practices, policies, and procedures to remain a benefit to stakeholders, researchers, and users. There are many complexities involved in the management of research data. Even though one may not be an expert in all things relating to the lifecycle management of data, the person responsible for managing data needs to know not only domain-specific models and practices but also resources, tools, and services from other disciplines that may prove useful in advancing, optimizing, and maximizing research data management capabilities given the current budget, technology, infrastructure, and available resources. The key data professional is also responsible for the articulation, education, and integration of data management standards, best practices, and guidelines, where appropriate and applicable.


Out of a total of 85 responses, 82% (70) identified themselves as responsible for managing their research data, followed by 24% (20) project manager, 20% (17) research assistant, 19% (16) research groups, 9% (8) national data center or data archive, and 7% (6) other. The national data centers or data archives included the National Oceanographic Center, ORNL DAAC, NSIDC, NCDC, and NGDC. Other included the Deep-C Data Team, student, postdoc, supervisors, department heads, and self.

Fig. 14 Responsible for Research Data Management

Q18. What is the estimated amount of electronic research data you currently hold/maintain? – The amount of electronic research data currently maintained is important for maintaining, developing, and extending the resources, budget, technology, and infrastructure necessary to properly and effectively support research data management. The amounts of electronic research data participants currently maintain are shown in Table 15. Out of 86 total responses, 30% (26) selected 1 - 50 Gigabyte, followed by 22% (19) 1 - 50 Terabyte, 10% (9) 50 - 100 Gigabyte, 9% (8) 100 - 500 Gigabyte, 9% (8) < 1 Gigabyte, 8% (7) don't know, 7% (6) 500 Gigabyte - 1 Terabyte, 2% (2) 50 - 100 Terabyte, and 1% (1) 100 Terabyte - 1 Petabyte. The estimated amount of electronic research data scientists currently hold/maintain is important for properly assessing and developing recommendations to improve data management. The various amounts of research

data affect the necessary resources and infrastructure to maintain current and future data output, storage, management, and preservation.

Table 15. Amount of Electronic Research Data Currently Maintained

Amount of electronic research data   Frequency   Percent
< 1 Gigabyte                         8           9%
1 - 50 Gigabyte                      26          30%
50 - 100 Gigabyte                    9           10%
100 - 500 Gigabyte                   8           9%
500 Gigabyte - 1 Terabyte            6           7%
1 - 50 Terabyte                      19          22%
50 - 100 Terabyte                    2           2%
100 Terabyte - 1 Petabyte            1           1%
Don't know                           7           8%
Total responses                      86

Q19. How do you keep track of where your data is stored and the relationships between data? – The way in which the individual responsible for storing and managing research data keeps track of it can determine the kinds of data management issues encountered, especially if logical, standardized, or generally accepted best practices and guidelines for tracking data are not consistently and effectively implemented, maintained, and updated. Out of a total of 85 responses, 49% (42) track their data in a local database (e.g. research group database), 38% (32) in an electronic logbook, 35% (30) in a paper logbook, 28% (24) other, 25% (21) in a spreadsheet, and 8% (7) in a remote database (e.g. national archive/data center). Other responses included memory (brain/my head), website documentation, logical folder structures, dataset file structures, directory structure, wiki, and combinations of all of the above.

Q20. How long do you keep your data? – In order to keep data for a specified length of time, the elements, stages, and factors involved in the lifecycle management of data become crucial. Without proper data description, representation, preservation planning, curation, and digital preservation, data may not remain available for the intended length of time. Out of 86 total responses, 40% (34) keep data forever, followed by 19% (16) who don't know, 13% (11) 10 - 25

years, 12% (10) 5 - 10 years, 5% (4) until the end of a project, 5% (4) 25 - 50 years, and 1% (1) 50 years or more with a defined timeline.

Q21. Where do you back up your data? – The location of data backup is just as important as the storage location of research data; the same stable, secure, and reliable requirements that apply to data storage apply equally to data backup. The locations of data backup are displayed in Table 16. Out of a total of 86 responses, the top data backup locations included 59% (51) external hard disk, 47% (40) another computer, and 41% (35) school/unit - provided file server. These results were expected; however, the rotation and/or replacement cycle of hardware/software is crucial for sustainability, reliability, and security. External/commercial/web data storage facilities included data backups at NODC and NCAR, Dropbox, Box, Google Drive, a disaster recovery site, a commercial web host, and Google Docs. Other included a national archive, a colleague's institution, a group backup file server, and a local computer. It would be useful to know Google Drive's and Dropbox's data backup policies.

Table 16. Location of Data Backup

Backup data location                                               Frequency   Percent
Another computer                                                   40          47%
CD/DVD                                                             17          20%
External hard disk                                                 51          59%
My documents on research lab PC                                    15          17%
My own tape backup system                                          4           5%
Paper/file records via photocopy or similar                        6           7%
School/unit - provided file server                                 35          41%
USB/Flash Drive/Memory Stick                                       23          27%
To technology vendor file server                                   4           5%
To technology vendor backup system                                 4           5%
To external/commercial/web data storage facility - give details    9           10%
Other - give details                                               5           6%
Total responses                                                    86

Q22. How frequently do you back up your data? – The frequency of data backups is important to ensure that the most recent and valid data is accessible, retrievable, and readable.


36% (31) of participants do not have a fixed schedule, followed by 22% (19) who back up automatically via a vendor solution nightly and 19% (16) who back up at least monthly. There is an opportunity to develop more frequent and regular data backups to reduce the likelihood of data loss or disaster.

Table 17. Frequency of Data Backup

Frequency of data backup                                                             Frequency   Percent
I do not backup my data                                                              4           5%
No fixed schedule - when I remember                                                  31          36%
At the end of a project/body of work                                                 3           3%
At least annually                                                                    3           3%
At least quarterly                                                                   4           5%
At least monthly                                                                     16          19%
After every update                                                                   6           7%
Automatically via vendor solution nightly (retained for defined period of months)    19          22%
Total responses                                                                      86

Q23. Do you use standards, best practices, and guidelines to manage your research data? – This is one of the most important questions of this study, and the results support the assumption that there is a need for standards, best practices, and guidelines in managing research data. Out of 81 total responses, 35% (28) of participants use standards, best practices, and guidelines in managing their research data, whereas 65% (53) do not. There is a great opportunity for articulation and education on improving data management practices and for introducing data management standards, best practices, and guidelines where applicable. There is also an opportunity to connect scientists from research labs and/or consortia that have data management plans in place with scientists from research labs and/or consortia without them, to increase data management awareness, practices, and guidelines across disciplines, labs, and institutions. Collaboration between scientists resulting from this study could benefit those scientists who may need research data management education on the benefits and importance of using standards, best practices, and guidelines in the lifecycle management of data for current and future use. Some of the standards, best practices, and guidelines used include netCDF,


THREDDS, ISO, a range of international marine data standards, OAIS, ITIL, ORNL DAAC, IEEE, locally developed standards, NSF policies, internal guidelines, and formal management.

Fig. 15 Use of Standards, Best Practices, and Guidelines

Q24. Would you and/or your research lab benefit from having a "data curator", in this context the person/organization responsible for all the activities connected with the management/curation of digital data, in particular of research data? – Depending on the organization, its culture, and the amount of data being produced, managed, and curated, a data curator may be helpful in assisting those individuals responsible for the management and storage of research data. Out of 85 total responses, 27% (23) answered yes, that they or their research lab could benefit from a data curator, 48% (41) answered maybe, and 25% (21) answered no. The data reveal an opportunity to articulate the benefits of a data curator and/or to develop proposals for workshops, programs, and services that educate scientists in research labs and across multiple disciplines on the benefits of having data curators assist data producers and data managers with managing data throughout its lifecycle, from creation through the research, publication, curation, and preservation processes.


Fig. 16 Research Lab Benefit from Data Curator

4.1.6 Final Comments

This last section of the DAF Survey contained two questions: (1) an invitation to participate in interviews in Phase 2 of this research study and (2) a solicitation of comments on how the University may assist scientists in the management and storage of research data.

Q25. Would you be willing to participate in a follow up interview to explore data management issues in more depth (max. 1 hr.)? – Out of 85 total responses, 16% (14) answered yes and 84% (71) answered no. Of the 14 who answered yes, only 10 provided contact information, resulting in a total of 7 participants who took part in interviews.

Q26. How can the University (including your School, research lab cyberinfrastructure, and campus resources such as high performance computing (HPC)) make data management and storage easier for you? – The purpose of this question was to solicit feedback from the participants on ways to address barriers to research data management and to gather perspectives on ways to help participants better manage their research data in the future. There were 54 text responses to this open-ended question; they were analyzed using qualitative data analysis techniques such as coding and memoing (Strauss & Corbin, 1998) to identify common themes and map those themes to Martin's (1992) three perspectives on cultures in organizations. "Strauss (1987) described coding as provisionally conceptualizing the data and producing categories and subcategories" (Bodner & Orgill, 2007, p. 38). "Strauss and Corbin (1998)

defined memoing (or the act of making memos) as 'the researcher's record of analysis, thoughts, interpretations, questions, and directions of further data collection' (p. 110)" (ibid, p. 39).

The qualitative survey results from the open-ended Q26 introduced the participants’ perspectives on how to improve data management and curation in their respective research labs and environments. Analysis of this open-ended question in the DAF Survey revealed themes that could be classified into three perspectives on cultures in organizations (Martin, 1992). The three perspectives are (1) integration (harmony and homogeneity), (2) differentiation (separation and conflict), and (3) fragmentation (multiplicity and flux) (Martin, 1992). The data from Q26 can be logically mapped to one of these three perspectives as the following direct quotes from the participants will reveal in this last section of the DAF Survey data analysis. The identification of common themes followed by the classification of the themes into the three perspectives developed the framework for qualitative data analysis that was used in the DAF Interviews.
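As a simple illustration of this kind of theme-to-perspective mapping, the sketch below tallies coded themes into Martin's three perspectives. The theme labels and the mapping are hypothetical examples, not the study's actual codebook.

```python
# Minimal sketch: tallying coded Q26 themes into Martin's (1992) three
# perspectives. Theme labels and the mapping are hypothetical examples,
# not the study's actual codebook.
from collections import Counter

PERSPECTIVE_OF_THEME = {
    "university never helps": "differentiation",
    "need for education/training": "fragmentation",
    "satisfied with current support": "integration",
}

def tally_perspectives(coded_responses):
    """coded_responses: list of (respondent_id, theme) pairs."""
    counts = Counter()
    for respondent_id, theme in coded_responses:
        counts[PERSPECTIVE_OF_THEME.get(theme, "uncoded")] += 1
    return counts

# Example:
# tally_perspectives([(1, "university never helps"), (46, "need for education/training")])
# -> Counter({'differentiation': 1, 'fragmentation': 1})
```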

The very first response to this open-ended question was surprising and important, and it exemplified the differentiation (separation and conflict) perspective while underscoring the need for inclusion, representation, and collaboration across research labs and multiple disciplines. This respondent expressed a subculture perspective in conflict with organization-wide consensus (Martin, 1992, p. 6).

“YOU’RE KIDDING!! I HAVE BEEN HERE SINCE 1968 AND THE UNIVERSITY NEVER HELPS.” – Respondent #1

Other respondents expressed similar separation or conflict perspectives representative of the differentiation perspective while also proposing solutions for improvement as themes.

“I can do data management myself. Just keep the system stable and reliable.” – Respondent #2

“Provide a cost effective backup solution. The disaster recovery option (sending data up to Atlanta) does not fit within our budget.” – Respondent #4

“By providing more storage and better transfer speed.” – Respondent #5


“Relax policies for purchasing computer infrastructure to allow computers to be used across multiple projects.” – Respondent #6

“Providing resources (financial and personnel) to document the data. I prefer to have my own file data servers (so oversight and management resides with me) so hardware is not an issue.” – Respondent #11

“I can’t imagine them doing anything really useful.” – Respondent #12

“My lab group uses Mathematica as a way to access data files. We have ‘summary notebooks’ which link to all of our raw data as well as our data analysis. It can be hard to share information since we are using a programming language, which is not free or available to everyone at Florida State University. A university license for a programming language could bridge this gap.” – Respondent #40

There were emerging themes around (1) the need for education, (2) a data curator, and (3) training on data management and curation. The following quote from Respondent #46 supports the purpose and rationale for this research study and can be categorized as a fragmentation (multiplicity and flux) perspective due to its articulation of known data management inefficiencies resulting from several factors affecting good data management, including the need for education, training, assistance (a data curator), and adherence to best practices.

“I think education is important, i.e., educating students about how important this is so that they take it more seriously. This is the weakest link in my own data management plan. If a student does not take this seriously, then their data is not properly managed. Their data is my data. I cannot personally manage all of the data generated within my group. I have to rely on students following practices that we have developed within my group; these are practices that I developed quite a few years ago - long before data management became a major issue. But we are not following these practices as well as we could.” – Respondent #46

There is a need for workshops, symposia, and tutorials on research data management within and across multiple disciplinary domains. Other respondents expressed views similar to Respondent #46's, calling for education, training, dedicated curators/data professionals, and information on data management practices and guidelines.

“Provide infrastructure and possibly expertise in data management to provide advice when necessary. For example, I am not familiar with access, so if someone could take my excel files and enter them into access that would be useful.” – Respondent #7

“Provide more guidance on best practices.” – Respondent #15

“I really wouldn't want anyone else involved in the storage of my research data. While it is fine if other access it, it would be a big drawback if someone else organized it.


Perhaps a small seminar on good data management practices would be helpful, however. Such practices would seem highly dependent on the type of data being managed, but likely could be general enough to be useful to many research groups. Also, it would be helpful to have a small seminar or workshop concerning version management systems for source code, as most research groups I have come in contact with have little concern for this…” – Respondent #18

“Dedicated curator.” – Respondent #19

“…help by providing the person-hours to reformat, add metadata, index content, update file format as needed.” – Respondent #22

“Providing funding for a technician.” – Respondent #24

“By providing standard best practices, provide automatic backup to offsite locations. We are located at a remote laboratory that does not yet have the same capabilities that reside on campus. This is about to be changed so that at least our personal files will be held. However, having a data manager who takes care of everyone's files would be preferable. Then we would have easier access to each other's data bases (if necessary) and standardize storage practices.” – Respondent #28

“Hire a permanent data manager. Make networked storage solutions easier to navigate.” – Respondent #29

“Develop, communicate, and require conformance to reporting and storage standards. Assign the task to a ‘data manager’, who can support the work full-time.” – Respondent #31

“Two separate issues: I distinguish between the core activities of a university from the ancillary support functions. So, I expect Engineering to work on future means of energy production, not build a wind farm just for FSU's use. Likewise, I don't have an expectation that FSU would build its own data center, but to encourage research on future data management issues). 1. Research: What a university does is research and education (not provide infrastructure per se), so I would expect more investment in data management research, and thinking about ways to bring data management into the graduate curriculum more explicitly. In particular, I am desperate for ontology of experimental procedures and results to provide some standard classification of proposed and completed experiments and results. 2. Infrastructure: Given the immaturity of digital data storage and classification, and the lack of clear standards, I think the best thing FSU could do is provide generic data storage and archiving capability, e.g., arrange with a third party to provide a quota of accessible disk space in the cloud, and let individual investigators do what they want with it. If there is an NIH or NSF standardized interface that could also be implemented, then do so by all means. But the attempts by FSU to implement "institutional" content management systems have been pretty appalling to use (Blackboard, the new faculty vita system). Just give me a few gigabytes running a github or other versioning system and I would be happy.” – Respondent #32


“Tutorial on technology, implementation and architecture.” – Respondent #43

Some respondents provided responses with themes indicating that everything is fine, that the university is doing a great job, or that the university provides ample support for the current management and storage of data. The following three responses can be classified as integration perspectives, representing harmony or a perspective neither in conflict with nor indifferent to the cultures of the university/organization.

“Storage is fine- I keep my data. However, creating the spread sheets is a huge deal, so we usually use a previous spread sheet. In doing that, we must alter the existing one, which is quite confusing with many different variables, each being an element, which all have different formulas to go from intensities measured to the final concentrations, after correcting for errors, normalizing, etc. But if there was already a sheet with everything, then we could delete what we don't need, keep what we do, and plug in what we have. It would be a one time fix, but very tedious, and would have to be done by someone very familiar with the instrumentation and practice. So, probably no person could help other than the very busy professionals that work with the instruments.” – Respondent #34

“Think they’re doing a great job, but if there was funding available for custom, automated data backup solutions, that would be even better.” – Respondent #39

“I am happy with what I have now.” – Respondent #49

There are opportunities for advanced scientists to work with research libraries and iSchools to develop data management programs, education, and outreach that address (1) advanced, (2) intermediate, and (3) novice scientists while optimizing collaboration and integration between and among the various levels. Advanced scientists can train intermediate scientists, and intermediate scientists can train novice scientists, while research libraries and iSchools facilitate and complement research, education, and training where appropriate.

4.2 Findings from DAF Interview Data Analysis

4.2.1 DAF Interview

The DAF Interview, FSU IRB HSC#2013.11347, was adapted from the University of Hertfordshire RDM Interview Protocol (6/1/2012). It consisted of 45 questions and covered eleven categories focused on research data management in a domain. The categories included: (1) Area of Research, (2) Disciplinary Domain Research Data Management, (3) Exemplar Research Project – The Funding Application Stage, (4) Exemplar Research Project – Data

Collection Stage, (5) Exemplar Research Project - Data Storage and Backup, (6) Exemplar Research Project - Sharing and Security Research, (7) Exemplar Research Project - Archiving Data, (8) Expected Support, (9) Effective Use of Infrastructure, (10) Data Management Confidence and Awareness, and (11) Concluding Questions and Comments. Each category contained interview questions. The interview was administered to scientists online via Qualtrics and remained active from November 1, 2013 to December 23, 2013. This section discusses the analysis of data obtained from each of the categories in the DAF Interview. The qualitative data analysis techniques of coding and memoing (Strauss & Corbin, 1998) were used to analyze the qualitative interview data. The (1) DCC Curation Lifecycle Model, (2) Level 3 Curation Model, and (3) an adapted conceptual framework model were introduced as referent model images. These referent model images were enabled with the Qualtrics heat map feature, which captured responses via clicks on the images. This feature allowed interview participants to provide data by clicking on the elements of the images that best represent their involvement in research and their perspectives on elements necessary when conducting research within and across disciplines. P represents participant and the number represents the order (i.e., P1 = participant 1, P2 = participant 2, P3 = participant 3, etc.). Not all of the participants answered all of the questions. There were 7 total interview participants, and six completed the interview. However, all of the interview participants answered the most important questions, which included the image heat maps (see Fig. 17 - 22).

The interview participants were sent the interviews via email and given 45 days to complete them. The interview participants were survey participants who were asked in Q25 of the DAF Survey to participate in the interview. Fourteen (14) survey participants agreed to participate in the interview, ten (10) provided contact information, 7 out of 10 responded and started the interview, and 6 out of 7 completed the interview. The one participant who did not complete the interview stopped after seven questions. Since all the interview participants had already completed general DMC questions in the DAF Survey, some interview participants did not want to answer specific DMC questions related to exemplar research projects. The directors of the research labs were sent the IRB approval and the DAF Survey and Interview to invite scientists' participation in this study. After the directors approved their research labs' participation in this study, the directors allowed the DAF Survey to

be administered to the respective research labs' faculty listservs and invited scientists to participate in the DAF interviews.

4.2.2 Area of Research

Respondents were asked their primary research role, discipline, and lab to collect basic demographic information on their area of research. Since this study focused on the data management and curation practices of scientists in research labs within a university, it was necessary to collect these three elements from the individuals responsible for managing and storing data in order to explore similarities, differences, and interrelationships in data management and curation practices across roles, disciplines, and research labs. Table 18 shows that the interview respondents provide a good representation of the distribution of respondents who participated in the DAF Surveys in Phase 1 of this study.

Table 18. Primary Research Role, Discipline, and Research Lab

P    Primary Research Role                                   Primary Discipline                                                          Primary Research Lab
P1   Software Application Support                            Computer Science                                                            International Ocean Discovery Program
P2   Research Professor, Principal Investigator              Marine Ecology, Fisheries Science                                           FSU Coastal and Marine Lab
P3   Data Stewardship                                        Meteorology                                                                 Center for Ocean-Atmospheric Prediction Studies
P4   Data Management                                         Biology and Oceanography                                                    MBL and ASU
P5   Faculty member at an institution of higher education    Boundary-layer Meteorology and Biogeochemical cycles of water and carbon    Biomicrometeorology Group
P6   PI and Senior Collaborator                              Condensed Matter Physics                                                    National High Magnetic Field Lab, FSU
P7   Principal Investigator                                  Materials science and physics                                               National High Magnetic Field Laboratory

4.2.3 Disciplinary Domain Research Data Management Perspectives

The Disciplinary Domain Research Data Management section followed the Area of Research section in the DAF Interview. This section included questions on the area of research

and on the elements of the referent models in which participants are involved in their research. This is the most important section of the DAF Interviews and was therefore positioned at the beginning of the interview in anticipation of non-responses and/or interview drop-off. The quantitative data obtained from the surveys are supported by the qualitative data captured in this section.

Q5. Could you briefly explain your area of research and the types of research questions, with examples, that you try to answer? Respondents were asked to briefly describe their area of research before being introduced to the referent models, providing a frame of reference to reflect upon when answering subsequent questions about which elements of the referent models are important to their research (see Table 18).
1. P1 – Is concerned with technology and techniques for efficient data management and dissemination;
2. P2 – Studies the biology and ecology of marine fishes and their movements, migration, and habitat use using tagging, acoustic telemetry and satellite telemetry, and studies population dynamics and community structure using fishery-independent sampling…;
3. P3 – Seeks to identify changes in atmospheric and near-surface ocean conditions in the global oceans and to understand the mechanisms of exchange of heat, moisture, and momentum across the air-sea interface;
4. P4 – Seeks to understand plankton dynamics and better ways to manage data;
5. P5 – Is interested in the breathing of land-based ecosystems and how to develop better environmental sensor networks to account for the spatial heterogeneity of our natural environment;
6. P6 – Works with several different areas involving novel superconductivity, low dimensional conductors, and spin systems at high magnetic fields and low temperatures to attempt to understand the role of magnetism in materials;
7. P7 – Works in the area of materials characterization and development and measures the physical properties of materials used for constructing high field magnets.

Q6. I am interested in learning more about those research activities that contain some form of data management. It may be easier to do this by going through a particular research project that you carried out, and looking at its 'research lifecycle', from funding application, data collection & processing, all the way to publishing. Thinking of your research projects, could you select one of them as an exemplar and tell me (1) about that project, (2) the name of the project, and (3) the project outcome? The themes of the data curation lifecycle and research data management of exemplar projects from data creation to representation to publication to preservation were most evident in the following projects.

My group is constantly involved in data management. We create and support the applications used on the JOIDES Resolution research vessel. Data is gathered from over 50 instruments in real time; including physical properties, chemistry, and visual observations. The information is organized, stored, analyzed and made available to scientists in various tabular file formats and transmitted to NGDC for term storage. – P1, Software Application Support, Computer Science, International Ocean Discovery Program (IODP)

The project is known as the Shipboard Automated Meteorological and Oceanographic Systems (SAMOS) initiative. The initiative aims to improve the quality of meteorological and near-surface oceanographic observations collected in-situ on research vessels (R/Vs). Scientific objectives of SAMOS include (1) Creating quality estimates of the heat, moisture, momentum, and radiation fluxes at the air-sea interface, (2) improving our understanding of the biases and uncertainties in global air-sea fluxes, (3) benchmarking new satellite and model products, and (4) providing high quality observations to support modeling activities (e.g., reanalysis) and global climate programs. The initiative developed procedures to recruit research vessel operators that were willing to routinely submit 1-minute averaged navigational, meteorological, and oceanographic observations to our data center. The observations undergo common data formatting, quality assessment and quality control, and metadata augmentation prior to being distributed to the user community. Outcomes are quality evaluated meteorology and oceanographic observations that are delivered to the research and operational communities and ultimately submitted to the National Oceanographic Data Center for long term archival and preservation. – P3 Data Stewardship, Meteorology, Center for Ocean-Atmospheric Prediction Studies (COAPS)

1) This project studies the ecosystem carbon and water exchange at 3 locations in the Pacific Northwest and is part of the AmeriFlux network. It requires year-round observations of atmospheric exchange of momentum, mass, and energy; the observations have been carried out continuously since 2002 at one location, and since 2006 at the other two locations. 2) Title: "The effects of disturbance and climate on carbon storage and the exchanges of carbon dioxide, water vapor and energy of coniferous forests in the Pacific Northwest: integration of measurements at a cluster of super sites", 3) We have published many papers resulting from this study; the main paper summarizing the first 8 years of observations found that i) the seasonal hydrology is the main driver of ecosystem carbon uptake, ii) drought plays a significant role in the carbon sequestration potential at a site, iii) only by proposing a novel concept called "Hydro-ecological years" were we able to delineate functional seasonality in an ecologically meaningful way and explain the inter-annual and intra-annual variability; this concept takes the plants' perspective, rather than the human perspective. – P5 Faculty member at an institution of higher education, Boundary-layer Meteorology and Biogeochemical cycles of water and carbon, Biomicrometeorology Group

Project: Verification tests for ITER TF strands

ITER are a multi-billion dollar international nuclear fusion project. Large quantities of superconductor wires are needed for the project. Our role in the project is to do the quality control test of Nb3Sn superconductor wires made by US manufacturers. Outcome: The deliverable is the Nb3Sn wire property data measured at both room temperature and cryogenic temperatures. – P7 Principal Investigator, Materials science and physics, National High Magnetic Field Laboratory (NHMFL)

Participants were then introduced to the following models: (1) the DCC Curation Lifecycle Model in question #7 (See Fig. 17), (2) the Level 3 Curation Model in question #8 (See Fig. 18), and (3) an Adapted Conceptual Framework Model (See Fig. 19) in question #9, in order to develop contextual perspectives in which to associate their research experiences with data management and curation concepts and a conceptual framework model (See Table 22). These models contain (1) referents (semiotic symbols), (2) relationships among referents, (3) a set of rules or syntax, (4) results (derive new knowledge), and (5) operation (acts on referents via relations and rules to produce results) constituents (Lesh et al., 2000; Bodner & Orgill, 2007, pp. 74-75) with which to convey models from one discipline as mental models in other disciplines. These graphical models were introduced during the interview to stimulate mental recognition of some of the data management and curation processes, along with metatheoretical concepts involved in research data management within and across disciplines. The process of introducing the DCC Curation Lifecycle Model to scientists and researchers across multiple disciplines is not new, yet it is effective in providing visual representations and recognition of key data management and curation concepts. These models were introduced without explanation to obtain honest responses from scientists and researchers outside of the disciplines of library and information science, digital libraries, and data management and curation, who may not be familiar with, or may not have had adequate education and training in, data management planning, data curation, digital curation, and digital preservation in relation to research data management practices, as evidenced by data from the DAF Surveys.

In order to capture respondents' responses through clicking on the images, the images were separated into regions. Whenever respondents clicked on a region, their responses were captured. When multiple respondents clicked on the same area, the concentration of the region where the respondents clicked intensified in color, with the color closest to red symbolizing a higher concentration of responses, hence the term heat map. Below, the responses from the image heat maps are presented in aggregate, whereas the respondents' individual responses are recorded in Fig. 17.
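The aggregation behind such a heat map can be sketched in a few lines of Python. The block below is a minimal illustration only, not the survey tool's actual implementation: the region names, click records, and the red-intensity scaling are assumptions made purely for demonstration.

    from collections import Counter

    # Hypothetical click records: one model region name per respondent click
    clicks = ["Data", "Store", "Data", "Conceptualize", "Appraise", "Data", "Store"]

    counts = Counter(clicks)          # clicks per region
    max_count = max(counts.values())  # the most-clicked region sets the scale

    # Scale each region's click count to a red intensity between 0 and 255;
    # regions with more clicks render closer to full red on the heat map
    for region, n in counts.most_common():
        intensity = round(255 * n / max_count)
        print(f"{region:15s} clicks={n}  red_intensity={intensity}")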

Q7. Which of the elements of the DCC Curation Lifecycle Model are you involved in? (Please click on the sections that apply.) (See Fig. 17 and Table 19) The respondents selected sections mainly from data (digital objects or databases) to description to representation to preservation. Respondents also selected the create or receive, access, use & reuse, curation, and transform sections of the DCC Curation Lifecycle Model, which were previously mentioned in the brief descriptions of their exemplar projects. Scientists from across multiple research disciplines are involved in many aspects of the DCC Curation Lifecycle Model, and the model could be used for mapping data management concepts, education, training, and teaching.

Fig. 17 The DCC Curation Lifecycle Model – Interview Heat Map


Table 19. The DCC Curation Lifecycle Model – Interview Heat Map Table

The DCC Curation Lifecycle Model heat map (Fig. 17 & Table 19) provided evidence of the areas of the DCC Curation Lifecycle Model in which the scientists are actively involved. The higher concentrations of clicks, represented by responses and percentages (See Table 19), indicate the areas of greater significance in the scientists' current data curation lifecycle management activities, even if the scientists were not familiar with the key concepts of data management and curation as defined in the LIS/ARL disciplines. A total of 6 interview participants answered this specific question. The regions of (1) Conceptualize, (2) Data, (3) Appraise, and (4) Store all have 100% or greater participation. These regions are associated with data management planning. Thus, all the interview participants that answered this question are actively involved in some aspect of data management planning.

Q8. Which of the processes from the Levels of Curation Model are you involved in? (Please click on processes that apply.) (See Fig. 18 and Table 20) The respondents selected many of the processes in Level 1 curation – traditional academic information flow (Lord & Macdonald, 2003, p. 43), Level 2 curation – information flow with data archiving (ibid, p. 44), and Level 3 curation – information flow with data curation (ibid, p. 45). Each level of curation can be mapped to (1) data curation (Level 1), (2) digital curation (Level 2), and (3) digital preservation (Level 3), respectively. This question was asked to correlate scientists' current data management and curation practices from multiple disciplines to concepts in the author's dominant domain.

Fig. 18 Level Three Curation Model – Interview Heat Map

Table 20. Level Three Curation Model – Interview Heat Map Table


The Level Three Curation Model (See Fig. 18 & Table 20) provided evidence of the various levels of curation in which scientists are currently involved in the management of their research data. A total of 7 interview participants answered this question. The regions included in the model represent (1) data curation, (2) digital curation, and (3) digital preservation. All the scientists are involved in some aspects of data curation, particularly the research process, primary data, secondary data, tertiary data, and the publication process, as represented by the responses and frequencies (See Table 20). Scientists are most involved in aspects of data curation at 429%, followed by publication and access at 143%. The model was separated into three major categories: (1) Research Process, (2) Publication, and (3) Curation (includes data curation and digital curation). The processes from the model (See Fig. 18) were mapped to the concepts (See Table 20) to help convey the key concepts across multiple disciplines.

Q9. In your opinion, which of the following elements of the Adapted Conceptual Framework Model are important when conducting research within and across multiple disciplines? (Please click on the elements in the framework image that apply.) (See Fig. 19 and Table 21) The scientists' responses to this question were interesting for two reasons: (1) the model was developed in the field of organizational theory in social science, and (2) the theory and practice elements were added to the original model developed by Burrell & Morgan (1979) as a result of the author's assumptions that theory and interrelated concepts (Merton, 1968) and practice and interrelated methods are important for informing each other within a paradigm. The integration of the theory and practice elements in the model placed the author's assumptions in view of the practitioner, thus making the theoretical framework even more useful (Crotty, 1998; Bodner & Orgill, 2007, p. 76). The author's assumptions are supported by Moschkovich and Brenner's (2000) view that "theory and methods are intricately related, mutually constructive, and informing each other (p. 459)" (Bodner & Orgill, 2007, p. 73). The scientists mainly selected concepts, methods, and problem, but also selected ontology, epistemology, frame of reference, theory, and practice when conducting research within and across disciplines. This question is important for providing perspectives on the elements necessary when conducting research data management within and across disciplines. Also, there is a need for a conceptual model that contributes to data management and curation theory development. The respondents' responses to this question support the author's assumptions that (1) this model can be used to develop data management and curation theory, (2) theory and practice processes are necessary elements of a conceptual framework, and (3) multiple disciplinary paradigm perspectives are necessary for interdisciplinary research in research data management and curation.

Fig. 19 Adapted Conceptual Framework Model – Interview Heat Map

Table 21. Adapted Conceptual Framework Model – Interview Heat Map Table

The Adapted Conceptual Framework Model (See Fig. 19 & Table 21) provided evidence of which elements are important to scientists when conducting research within and across multiple disciplines. A total of 6 participants answered this question and identified (1) Concepts and (2) Methods as most important, each at 86%, followed by Frame of Reference at 57%. It is important to note that the Practice (43%) and Theory (29%) regions (suppositions) were added to the conceptual model by the author as a result of the research conducted in this dissertation.

Table 22. Disciplinary Domain Research Data Management Perspectives

P1
  DCC Curation Lifecycle Model: (1) Data, (2) Description, (3) Representation Information, (4) Preservation Planning, (5) Curate, (6) Preserve, (7) Reappraise, (8) Migrate, (9) Transform, (10) Create or Receive
  Level 3 Curation Model: (1) Web Content, (2) Primary data, (3) Metadata, (4) Secondary (derived) data, (5) Tertiary data for publication, (6) Patent data, (7) Curation, (8) Data repositories, (9) Archived data
  Adapted Conceptual Framework: (1) Ontology, (2) Concepts, (3) Methods

P2
  DCC Curation Lifecycle Model: (1) Conceptualize, (2) Create or Receive, (3) Appraise & Select, (4) Ingest, (5) Preservation Action, (6) Store, (7) Access, Use & Reuse, (8) Transform, (9) Curate
  Level 3 Curation Model: (1) Primary data, (2) Metadata, (3) Secondary (derived) data, (4) Tertiary data for publication, (5) Peer Review, (6) Primary publication, (7) Curator, (8) Curation, (9) Data repositories, (10) Archived data
  Adapted Conceptual Framework: (1) Theory, (2) Practice, (3) Concepts, (4) Methods, (5) Problem

P3
  DCC Curation Lifecycle Model: (1) Data, (2) Description, (3) Representation Information, (4) Preservation Planning, (5) Curate, (6) Reappraise, (7) Preservation Action, (8) Store, (9) Access, Use & Reuse, (10) Transform
  Level 3 Curation Model: (1) Research Process, (2) Web Content, (3) Primary data, (4) Secondary (derived) data, (5) Metadata, (6) Data repositories, (7) Archived data, (8) Curation, (9) Curator, (10) Library - Peers - Public - Industry
  Adapted Conceptual Framework: (1) Ontology, (2) Epistemology, (3) Frame of Reference, (4) Theory, (5) Practice, (6) Concepts, (7) Methods, (8) Problem

P4
  DCC Curation Lifecycle Model: (1) Create or Receive, (2) Appraise & Select, (3) Access, Use & Reuse, (4) Transform
  Level 3 Curation Model: (1) Web Content, (2) Primary data, (3) Secondary (derived) data, (4) Tertiary data for publication, (5) Peer Review, (6) Primary publication, (7) Research based on data
  Adapted Conceptual Framework: (1) Ontology, (2) Frame of Reference, (3) Concepts, (4) Methods


Table 22. (Continued)

P5
  DCC Curation Lifecycle Model: (1) Data, (2) Curate, (3) Create or Receive, (4) Ingest, (5) Store, (6) Access, Use & Reuse
  Level 3 Curation Model: (1) Secondary (derived) data, (2) Tertiary data for publication, (3) Peer Review, (4) Primary publication, (5) Secondary publication, (6) Tertiary publication, (7) e-Print, (8) Metadata, (9) Data repositories, (10) Archived data
  Adapted Conceptual Framework: (1) Frame of Reference, (2) Concepts, (3) Methods

P6
  DCC Curation Lifecycle Model: No response.
  Level 3 Curation Model: (1) Scientist, (2) Research Process, (3) Primary data, (4) Secondary (derived) data, (5) Tertiary data for publication, (6) Primary publication, (7) Metadata, (8) Research based on data
  Adapted Conceptual Framework: (1) Practice, (2) Concepts, (3) Methods, (4) Problem

P7
  DCC Curation Lifecycle Model: (1) Data (Digital Objects or Databases), (2) Create or Receive, (3) Store
  Level 3 Curation Model: (1) Research Process
  Adapted Conceptual Framework: (1) Frame of Reference

Q10. How does your discipline look at and understand reality? (i.e., What are the core ontological suppositions underlying its frame of reference?) The themes of organization, classification, analysis, and measurement of the relationships between and among data (constituents) via observations, experiments, and pattern recognition emerged as how the disciplines look at and understand reality.

Q11. How does your discipline learn about reality? (i.e., what are the basic epistemological stances for its frame of reference?) The themes of modeling (numerical), sampling, controlled experiments, observations, and sensors emerged as how disciplines learn about reality.

Primarily through observation and modeling. Meteorology is based on the physical observation of our world. Through observation patterns emerge that support the development of conceptual models for atmospheric systems (e.g., Bjerknes (sp?) cyclone model of the early 1900s). As our knowledge has grown we have been able to connect observations to physical, chemical, and mathematical concepts that in turn have led to the numerical (computer) modeling that dominates the field today. Through a series of observations and numerical trials, we gain a better understanding of the realities of our atmosphere. – P3 Data Stewardship, Meteorology, Center for Ocean-Atmospheric Prediction Studies (COAPS)

Q12. What concepts, methods, theories, and practices do you use to address 'research data management' in your discipline? The respondents stated that there are no theories, only the individual responsibilities of the person responsible for managing and storing the data during the data lifecycle and its usefulness and value to the researcher. Aspects of the data curation lifecycle model emerged from the responses even though respondents were not familiar with the concept.

We collect many gigabytes of observations every month since studying atmospheric transport requires very fast-response sensors collecting observations 20 to 50 times per second (20 to 50 Hz). We archive these 'raw' data streams, and process and analyze them to produce secondary and tertiary data products. Managing the data servers, making sure data doesn't get lost, can be served to the larger research group, documenting the data well enough to be useful to others, and sharing the secondary and tertiary data with other external users is key. I am not familiar with any specific theories or concept for research data management, it all evolved as best practice out of experience. – P5 Faculty member at an institution of higher education, Boundary-layer Meteorology and Biogeochemical cycles of water and carbon, Biomicrometeorology Group

4.2.4 Exemplar Research Project – The Funding Application Stage

This section is interested in the funding agencies, data management planning, and the formal and informal data management planning that may or may not have influenced data management practices during the development of research proposals for exemplar projects.

Q13. What agency funded your exemplar research project? Funding agencies of the exemplar projects included the National Science Foundation (NSF) (2), Louisiana Sea Grant, Department of Energy (DOE) (2), and the National Oceanic and Atmospheric Administration (NOAA) Climate Observing Division.

Q14. Was a data management plan required by the funding agency at the application stage? 50% (3) of the participants responded yes that the funding agency required a data management plan at the application stage and 50% (3) responded no that the funding agency did not require a data management plan at the application stage.


Q15. If YES, (1) how did you develop the data management plan, (2) what resources did you use to seek help, and (3) what was included in your data management plan? The themes of established data management plans, guidelines, and practices emerged as the resources used to develop respondents' data management plans.

Data Management Plan has been in place for over 10 years. We added to it for the current funding / application cycle. Was done internally. Includes both initial data capture planning, retention, providing back to the community and long term archival. – P1

1,2) We were required to share the secondary data with the data network servers within one year of completion of an annual data set. We followed the networks' data submission guidelines and also outlined our strategy. 3) An outline of how the primary data is stored, backed up, and processed to produce the secondary data sets. – P5

I discussed current practices with facility heads and drafted a statement covering current practices. – P6

Q16. If NO, did you have any thought or informal plan for managing data at the application stage? Different themes of informal data management planning via electronic organization, retrieval, limited access, and backups of data emerged from the respondents that were not required by funding agencies to produce a data management plan.

Not other than making backups. – P4

Absolutely. The entire proposal was a data management plan. – P3

I did. The data are organized in electronic form for easy retrieval. Granting limited access controls data. Data are backed up periodically. – P7

Q17. How much did formal or informal initial data management planning actually influence your data management practice? Four respondents indicated that formal or informal initial data management planning influenced their data management practice very little or not at all. However, two respondents indicated that formal or informal data management planning influenced their data management practice a lot or in nearly every way.


A lot. – P1

In most every way. Clearly, we have deviated from the original proposed plan over the past 10 years as we have learned about best practices in other data management groups, but the proposed plan was the blueprint for our present operations. – P3

4.2.5 Exemplar Research Project - Data Collection Stage

This section is interested in the series of research data management activities during the create or receive stage of the DCC Curation Lifecycle involving digital objects or databases.

Q18. Thinking of your exemplar research project, please describe the nature (range, scope, origin) of your research data and the process by which you capture & create new data. The themes of Level 1 curation – traditional academic information flow (Lord & Macdonald, 2003, p. 43) and Level 2 curation – information flow with data archiving emerged as respondents described data capture during the research processes and primary and secondary data representation and dissemination during the publication processes. Understanding the lifecycle of data across different disciplines is important for research, education, and learning.

The process by which data is captured and maintained continues to evolve and mature as scientific needs change. – P1

High-frequency turbulence observations (20 Hz) of wind speeds, carbon dioxide, water vapor, methane concentrations, and air temperature, collected from fast-response environmental sensors in multiple locations per location. Data were recorded with onsite data loggers and primary data manually harvested, while summary statistics were harvested via cell phone remote links and displayed on webpages automatically. Once on the data server on campus, the primary data are then processed using statistical tools, which produce the secondary data sources (ecosystem fluxes). Those are then further aggregated into tertiary data products, which typically are contained in publications. – P5

SAMOS data are typically derived from a computerized data logging system that continuously records navigational (ship position, course, speed, and heading), meteorological (winds, air temperature, pressure, moisture, rainfall, and radiation), and near-surface oceanographic (sea temperature, conductivity, and salinity) parameters from underway research vessels. Measurements are recorded at high-temporal sampling rates (typically 1 minute or less). A SAMOS comprises scientific instrumentation deployed by the research vessel operator and typically differs from instruments provided by national meteorological services for routine marine weather reports. Presently we had 32 vessels recruited and each delivers their data via daily (containing all 1440 1-min records from the previous day) ship-to-shore emails. – P3

Q19. Thinking of your exemplar research project, please describe any difficulties that you encountered during the data collection stage. The themes of change, duplication of efforts (repetition), incomplete documentation, technical difficulties (power failure and software development), and poor data quality emerged as difficulties encountered during the data collection stage.

Our biggest issue was dealing with change and the rate of change. With different major science objectives every two months, we often have to adapt the capture and management system to new types of instruments and analysis on a very short time frame. This was part of what drove the design of our current management system. – P1

Primary difficulty is in the recruitment of new vessels. There is a learning curve on the part of the operator to organize their data into a suitable format for our daily email deliveries. Some software development is likely needed on the part of the operator or they must become familiar with existing software to submit SAMOS records. So basically, most of the difficulties are technical. – P3

There were issues trying to figure out how we were going to capture and express rejection of the algae. The experiments were difficult and had to be repeated many times over several years. This made comparison difficult. – P4

Power failure leading to data loss; insufficient or incomplete documentation of sensor information, exchanges, collection schedule / protocol; fast turnover of students/technicians leading to different data harvesting schedules and protocols. – P5

Sometimes, the data quality is poor. The measurement has to be repeated. – P7

4.2.6 Exemplar Research Project - Data Storage and Backups

This section is interested in the data storage and backup aspects of research data management outcomes with respect to respondents' exemplar and/or grant-funded projects.

Q20. What are the format(s) of your research data in the short term after acquisition? The formats of respondents' research data in the short term after acquisition included: (1) relational database – Oracle (both during and after acquisition), (2) hand-written data sheets, Excel files, and photos, (3) binary numerical format then converted to ASCII-readable floating point 4-byte numbers, (4) key-value paired ASCII files, (5) ASCII text files and Igor data processing files, and (6) either ASCII text format or JPEG image format. Understanding the different file formats helps to identify the right tools and resources to aid scientists in data curation and management.

Q21. Where do you store your data in the short term after acquisition? The storage locations of data after acquisition include (1) relational database – Oracle (both during and after acquisition), (2) filing cabinet and PC, (3) compact flash cards then converted on a laptop and stored to HDD then transferred to a data server, (4) Linux servers at COAPS, (5) desktop backed up by a local external disk and by a lab-wide server, and (6) hard disk of a local computer or a network drive. The various storage locations of data carry varying degrees of stability, reliability, and security. Choosing the proper storage location will help ensure future access to the data.

Q22. How much data do you generate or expect to generate during the life-cycle of the research project? The amount of data generated or expected to be generated during the lifecycle of the research projects of the six respondents ranged from less than 1 GB to 100 GB, 160 GB, and 1 TB. The data generated varies across disciplines and projects based on research data capture. This is important in developing robust tools and resources that scale with growth in data capture.

Q23. How often do you structure and name your folders and files? One theme that emerged was structuring the data automatically or once at the beginning, during data creation and acquisition, without frequent restructuring and renaming of folders and files.

We don't. Files are cataloged and maintained in a digital file catalog cross- referenced in the Oracle database associated with the measurements. A backup copy of files is kept as copied from each instrument system at the end of each two month expedition. – P1

Just once in the beginning. - P4

I don't restructure often, but create structure as data is acquired. Usually by material studied, measurement type and date. - P6


The other theme that emerged was continuously and frequently structuring and naming folders and files.

Continuously. We have a fully automated file and folder naming system. – P3

Every time we download them we follow a very strict naming protocol. – P5

More than 20 new folders will be created under a predetermined scheme every day. – P7
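Several respondents describe strict, automated naming protocols of this kind. The following Python sketch illustrates one such convention, assuming a hypothetical project/station/date layout chosen purely for demonstration; it is not any participant's actual scheme.

    from datetime import date
    from pathlib import Path

    def make_daily_folder(root: str, project: str, station: str, day: date) -> Path:
        """Create (if needed) and return a folder following a fixed
        <root>/<project>/<station>/<year>/<YYYY-MM-DD> naming convention."""
        folder = Path(root) / project / station / f"{day:%Y}" / f"{day:%Y-%m-%d}"
        folder.mkdir(parents=True, exist_ok=True)
        return folder

    # Example: today's folder for a hypothetical flux-tower site
    print(make_daily_folder("data", "ameriflux", "site_A", date.today()))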

Q24. What types of data about your data (metadata) do you need? What types of metadata standards do you use for your data? The respondents all collect metadata, but none of the respondents use a metadata standard. The themes of local metadata and project- and disciplinary-domain-specific metadata, without adherence to a recognized metadata standard, emerged. Adherence to established and recognized metadata standards facilitates interoperability and data sharing between heterogeneous systems, platforms, and data repositories.

As much as possible. We record how, who, when, where, raw instrument readings, calibration information for the instrument and final computed results. This is not recorded in standards such as Dublin core, but collected within the relational database as discrete information. – P1

We collect a wide range of metadata related to the individual ships, instrumentation, and observations. These include, but are not limited to instrument make and model, location on the vessel, units of measurement, calibration date, sampling rate, etc. For more details, look at section 4 of our most recent annual data report: http://samos.coaps.fsu.edu/html/docs/2012SAMOSAnnualReport_final.pdf. – P3

There is no specific standards we adhere to, we have online searchable online logs for field / data harvesting activities, in which we document all changes/ downloads etc. – P5

Lab note book covering power levels, material, masses, connections, drive voltages, configuration of leads etc. Some metadata is recorded as free text in data files and/or in the Igor data analysis program as the data is collected. – P6
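To make concrete what adherence to a recognized standard such as Dublin Core (mentioned by P1 above) might look like, the Python sketch below builds a minimal Dublin Core record for a hypothetical dataset using the standard library; the title, creator, and identifier values are invented for illustration and do not describe any participant's data.

    import xml.etree.ElementTree as ET

    DC = "http://purl.org/dc/elements/1.1/"
    ET.register_namespace("dc", DC)

    # A minimal set of Dublin Core elements describing a hypothetical dataset
    record = ET.Element("metadata")
    for element, value in [
        ("title", "Example flux-tower observations, Site A, 2013"),
        ("creator", "Example Research Group"),
        ("type", "Dataset"),
        ("format", "text/csv"),
        ("date", "2013-12-31"),
        ("identifier", "doi:10.0000/example"),  # placeholder identifier
    ]:
        ET.SubElement(record, f"{{{DC}}}{element}").text = value

    print(ET.tostring(record, encoding="unicode"))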

Q25. How will you back up the data during the project's lifetime, and at what frequency? If you do not back up your data, then please provide your reasons. Data back-ups are a necessary data management function and are required for protecting data against loss, damage, or disaster. Data back-ups should be regular, efficient, and data policy-driven. The theme of daily and/or automatic back-ups emerged from the responses.


Backups on the ship are tape based, daily incremental and weekly full with a rotating 6 week cycle. Backups on shore are the same rotation and monthly one set of full backup tapes is moved offsite. – P1

The data are backed up locally by using Raid storage disks. This ensures integrity of our data storage system. We also routinely submit (monthly) the data to the National Oceanographic Data Center. Finally, we run a daily sync to an offsite location - the National Center for Atmospheric Research. – P3

Backed up automatically via contracting resources provided by the college, and using our own RAID system and via Time Machine. – P5

We have daily back-up the data to our network drive. In addition, we back up data to a local hard disk once per month. – P7
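The daily incremental plus periodic full rotation that P1 describes is a common pattern. The Python sketch below illustrates one simplified way a daily incremental copy of that kind could be scripted; the source and backup paths are hypothetical, and this is not any respondent's actual tooling.

    import shutil
    from datetime import date
    from pathlib import Path

    def daily_backup(source: str, backup_root: str) -> Path:
        """Copy files that are new or changed since the most recent dated
        backup into a folder named for today (a simple daily incremental)."""
        src, root = Path(source), Path(backup_root)
        root.mkdir(parents=True, exist_ok=True)
        dated = sorted(p for p in root.iterdir() if p.is_dir())
        last = dated[-1] if dated else None          # most recent backup folder
        today = root / f"{date.today():%Y-%m-%d}"
        today.mkdir(exist_ok=True)
        for f in src.rglob("*"):
            if not f.is_file():
                continue
            rel = f.relative_to(src)
            old = last / rel if last else None
            # Crude change detection: copy if no earlier copy exists or source is newer
            if old is None or not old.exists() or f.stat().st_mtime > old.stat().st_mtime:
                dest = today / rel
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest)
        return today

    # Example (hypothetical paths): daily_backup("raw_data", "backups")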

Q26. Please describe your experiences with data loss, formatting, or file size issues. Some barriers to research data management can include data loss due to access, formatting, and file size issues. The themes of file size and file naming issues emerged as data management issues.

We have had to restore sections of information from tape multiple times. Usually due to human error. Formatting is mostly for report output from the database and those reports continue to evolve with science needs. File size is a continuing discussion of how we are going to continue to support the digital repository at the rate of 5 TB a year growth. However we have a planned capacity in excess of 10 years growth, which is our current contract. – P1

We have issues with the length of the file name. Since we have multiple levels of folder/sub-folders, the total length of the file name can be longer than 255 characters. So we had to compromise the clarity to make the file name less than 255 characters. - P7

4.2.7 Exemplar Research Project - Sharing and Security

This section is interested in the data sharing and data security aspects of respondents' data practices.

Q27. Who owns the data? This was the only question to which respondents replied "good question". Beyond those responses that clearly stated the funding agency, the institution, or the PI, the themes of uncertainty, or of a belief that the institution and/or the funding agency own the data, emerged from the majority of responses.

NSF. – P1


I suppose I do. I'm sure the institutions where I work do, technically, but none of my research will yield a profit, so they don't care. – P4

The funding agency and the PI. – P5

Good question. We consider the data to be public access and distribute them with no restrictions or holds. I would assume that the original data are "owned" by the originating vessel and the subsequent reformatted and quality processed data are covered by FSU intellectual property rules. In my world, data ownership is not a concern - we want everyone to benefit from them. – P3

Good question. At the moment we state the PI but the NSF seems to be changing the rules. – P6

Our funding agency, I suppose. – P7

Q28. Who needed access to the data? How did you share and provide access to the data? Access to the data may or may not be required, and policies may be in place that restrict access to data. This question is interested in who has access to the data in order to identify the users of the data. The themes of scientists, research groups, and collaborators needing access to the data emerged.

Access is initially provided to only the scientists who participated in the expedition. This is called a moratorium access and the data is only available to them, no matter where they are in the world. After moratorium (typically one year) the information becomes publicly available and is used by scientists worldwide. Data access is provided via reporting tools available on the web. – P1

Our users range from atmospheric and ocean modelers, researchers developing and deploying satellites for space-based ocean observation, and a wide range of marine climatologists. Tracking data use in an open distribution model is nearly impossible. Data are distributed via web pages, ftp services, and a THREDDS catalog server. Very rarely we will provide data on digital media for special requests. Also the data are reserved by NODC after submission to that archive. – P3

My own group, research collaborators on campus, and the larger network (AmeriFlux). – P5

Our group members need access to the data, because we are constantly updating them. The project funding agency will also need access to the data. Our group members have full access to the data (including editing and deleting rights). The processed data are submitted to the funding agency on regular bases. The raw data are available to the funding agency upon request. – P7


Q29. Thinking of security and confidentiality of your data, what security measures have been taken to preserve the security and integrity of your data? The security and confidentiality of data is a constant issue that requires vigilance, evaluation, and benchmarks for success. Without clearly defined research data management policies for enforcing and preserving the security, confidentiality, and integrity of data, data will be subject to breach, theft, and loss. There was not much concern about threats to the security and confidentiality of data. The themes (1) that most of the data is publicly accessible and (2) that restricted data is limited to dedicated users emerged.

The ability to add / change data is restricted to a control account that is not used by any person. It is managed via web services, which authenticate the user with their personal credentials, but use the control account to modify the database. Audit records are kept of changes made through these services and who did them. – P1

Again the data are public access, so confidentiality is not a concern. We do house the data within our COAPS servers which are protected by a range of Linux security protocols. In fact, I forgot to mention earlier that the public copy of the data (the one on our public FTP server) is a copy of the final data, which is stored within our firewall. So if the public copy was in some way corrupted (less security in the ftp location) it could be restored from our more secure internal copy. – P3

Only our group member has the access to the data, which is stored in the network drive. – P7

Q30. How will you publish and provide public or open access to your data? If not, why? The White House Office of Science and Technology Policy (OSTP) issued a 2013 memorandum to the heads of all federal agencies directing policy development for public access to federally funded research. The theme that access is required as part of data access policies emerged. Some data was published via the web and FigShare.

Done via the web. – P1

I have put some of it on FigShare. – P4

The secondary data is required by the funding agency to be made publicly available, and we have a fair-use policy in place. – P5


See above. Our data are public access. – P3

Not to the original data since without the complicated curation of the original metadata it is incomprehensible and useless. There is no forum or funding for such curation of original data in our field. – P6

The data may be published with consent of the funding agency. – P7

4.2.8 Exemplar Research Project - Archiving Data

This section is interested in the Level 2 curation – information flow with data archiving (Lord & Macdonald, 2003, p. 44) of the data.

Q34. Please describe the stewardship of any raw data after the life of the project, such as when data is archived for long-term preservation and when the files migrate into an archive. The long-term preservation of data is a very complex function with multiple stakeholders (e.g., funding agencies and IT), and stringent requirements must be met for it to be effective. Out of the six respondents, two have clearly articulated long-term preservation of data, whereas the others do not have clearly defined requirements but are interested in suggestions from experts at FSU on long-term preservation.

All data captured is formatted per NGDC requirements and delivered to them for long term maintenance and archival. – P1

Every original data file received from a vessel operator is preserved and included in our monthly submissions to the National Oceanographic Data Center (NODC). – P3

We have not yet discuss this with the funding agency. I would like to have suggestions from experts in FSU. – P7

Q35. What should be (or was) archived beyond the end of your project? Who makes this decision? Determining what data is archived beyond the end of the project is a decision that needs to be thought through in advance, when developing a data management plan, and executed at the end of a research project. The themes (1) that all data should be archived and (2) that the decision is made by the funding agency and/or the PI emerged.

All data we capture per our NSF contract is archived as part of our agreement. So I guess NSF made the decision. – P1


We archive everything from primary to tertiary data. We, the PIs of the project make that decision. A copy of the secondary data lives with the network servers, so they decide what happens with this data. - P5

Q36. How long should exemplar research project data be stored, managed, and preserved? Exemplar research project data should be stored, managed, and preserved indefinitely to allow future scientists and researchers to build, develop, and extend new research from exemplar projects. All the respondents are in agreement, and the themes that exemplar project data should be (1) stored, (2) managed, and (3) preserved indefinitely emerged.

See above. Length of time to be held in NGDC is as long as that Data Center exists. – P1

Ideally, forever.... – P4

It's a continuing long-term ecological study, so the horizon is 30+ years? – P5

We submit data to NODC because they are a 100+ year archive. Preservation is their responsibility, but the data will have value for 100s of years. – P3

Until I retire. – P6

Indefinitely. – P7

4.2.9 Expected Support

This section is interested in respondents' experiences and perspectives on support for the data management and curation of research data in their respective research labs/environments.

Q37. Are there any challenges, concerns, and barriers in managing your research data that have not been addressed? If so, then please describe them and the services that would help you deal more effectively with those issues. Oftentimes barriers to research data management are beyond the control of scientists/researchers. The themes that emerged were that (1) limited resources (time and money), (2) needed staff (personnel) and/or staff turnover, and (3) proper documentation of data (description and representation) pose challenges and create barriers to managing research data. These qualitative themes support the quantitative findings on barriers to research data management identified in Sec. 4.1.4, particularly Q13.


I think it all comes down to time and money. A service that would do this for you would help. – P1

The number one challenge is resources. Research data management always takes a back seat to research data collection. The work of a data management center is meticulous and requires personnel with very strong computer skills. It is very difficult to find, employ, and retain talented database managers and system architects in a university system (with its limited compensation system). Securing both external and institutional support for research data management continues to be a challenge. – P3

4.2.10 Effective Use of Infrastructure

This section is interested in the current use of infrastructure in managing research data.

Q38. Please rank which of the following storage devices you use for storing your research data (please rank them from 1 to 6, where 1 is the one you use least and 6 is the one you use the most). Participants were asked to rank from 1 to 6 (with 1 representing the least used and 6 representing the most used) the following devices used in the storage of research data. Not all of the participants used all of these storage devices. 40% of respondents ranked laptop/PC as 6, 40% ranked USB or external hard drive as 5, 40% ranked laptop/PC as 4, and 33% ranked both My personal network storage and Dropbox or other cloud based storage as 3. The top storage devices are (1) laptop/PC, (2) USB or external hard drive, and (3) personal network storage and Dropbox or other cloud based storage.

Table 23. Ranking of Storage Device for Research Data

Storage devices | 1 | 2 | 3 | 4 | 5 | 6 | Total Responses
USB or external hard drive | 0 | 20% (1) | 20% (1) | 20% (1) | 40% (2) | 0 | 5
My personal network storage | 0 | 0 | 33% (1) | 33% (1) | 33% (1) | 0 | 3
Your departmental shared area | 33% (2) | 17% (1) | 0 | 17% (1) | 17% (1) | 0 | 6
Dropbox or other cloud based storage | 0 | 0 | 33% (1) | 33% (1) | 33% (1) | 0 | 3
Laptop/PC | 0 | 20% (1) | 0 | 40% (2) | 0 | 40% (2) | 5
Other | 67% (2) | 0 | 0 | 0 | 0 | 33% (1) | 3

Q39. How would you describe the security of your research data? The general theme was that research data is considered reasonably secure, though respondents were not completely sure it is secure, except for P1, who states the security is "Excellent".

Excellent. Controlled in an Oracle data base. – P1

Secure as in no one can get to it? Probably not very. – P4

Well, I hope its fine. – P5

As secure as is possible on networked Linux computers. Network security is the responsibility of our systems administration group. – P3

Reasonably secure. – P7

Q40. How would you describe the effectiveness of the university infrastructure in managing your research data? This was one of the most important questions, with the most unexpected responses. The general themes on the effectiveness of the university infrastructure in managing research data ranged from the university not being involved in the infrastructure used to manage research data, to mediocre, to reasonably satisfied. Scientists/researchers had to develop their own strategies for building the infrastructure necessary to manage their research data.

The University is not involved in managing our information. – P1

Mediocre. – P4

Very little support, each group needs to come up with its own strategy. Costly, but at least I am in charge…. – P5

Most of our infrastructure was purchased in house on grants. The university does provide a high performance network link in our building that does improve data access performance, but overall, the university does not provide us much in the way of support. – P3

The lab has infrastructure separate from the university and manages. – P6

Reasonably satisfied by the network server provided by NHMFL. – P7


4.2.11 Research Data Management Confidence and Awareness

This section is interested in respondents' confidence in and awareness of research data management during different stages of the research process, from the development of research proposals, to the successful awarding of proposals, to the execution of proposal objectives, including the management of the data from the research projects.

Q41. On a scale of 1-3, please rate your level of confidence and awareness of the following research data management (RDM) and data management plan (DMP) matters. Respondents were asked to rate their level of RDM confidence and awareness during different stages of research proposal development and the execution of research proposal goals.

Table 24. Research Data Management Confidence and Awareness

Research data management (RDM) function | Not very confident | Confident | Very confident
Consideration of the RDM requirements at the bidding stage | 50% (3) | 0 | 50% (3)
Expertise in developing DMP for bidding applications | 33% (2) | 0 | 67% (4)
Fulfilling research grant RDM obligations | 17% (1) | 0 | 83% (5)
Awareness of university RDM facilities and DMP policies | 33% (2) | 33% (2) | 33% (2)
Ease of access to the research data during the lifecycle of the research project by yourself | 0 | 17% (1) | 83% (5)
Ease of access to the research data by collaborators | 0 | 50% (3) | 50% (3)
Re-accessing the data after the project life-time | 17% (1) | 50% (3) | 33% (2)

Q42. Did you find this interview useful? Was it helpful in increasing your awareness of my project and the requirements of RDM? 50% (3) found this interview useful and helpful in increasing their awareness of the author's research project and RDM requirements, whereas 50% (3) did not find this interview useful and helpful. P3 (Data Stewardship, Meteorology, Center for Ocean-Atmospheric Prediction Studies (COAPS)), P5 (Faculty member at an institution of higher education, Boundary-layer Meteorology and Biogeochemical cycles of water and carbon, Biomicrometeorology Group), and P7 (Principal Investigator, Materials science and physics, National High Magnetic Field Laboratory (NHMFL)) found this interview useful and helpful.

Q43. As a result of participation in this interview, do you think you will re-evaluate your RDM practice? 33% (2) think they will re-evaluate their RDM practice as a result of participating in this interview, and 67% (4) will not re-evaluate their RDM practices as a result of participating in this interview. The two participants that will re-evaluate their RDM practices as a result of participating in this interview are P5 and P7. This interview was a success in that 33% of the participants will re-evaluate and improve their data management practices.

4.2.12 Concluding Questions & Comments

Q44. Do you have any questions regarding this interview? One respondent stated that they did not understand some of the questions. Most likely, the respondent is referring to the questions involving the models, for which there was no brief explanation of the models before asking participants to click on them (i.e., DCC Curation Lifecycle Model, Level 3 Curation).

Q45. What comments, feedback, or suggestions do you have to help improve future research data management interviews? The themes of better explanation, identification, clarification, education, and articulation of the interrelation of the key concepts that comprise data management and curation and of the theoretical models emerged from the only response to this question. These themes support the purpose of this study and address a need in this discipline.

A brief explanation of the theoretical models and concepts (earlier Questions) would have been helpful. I was unfamiliar with any of those, so I clicked on the elements that seemed right, but no guarantee… – P5 Faculty member at an institution of higher education, Boundary-layer Meteorology and Biogeochemical cycles of water and carbon, Biomicrometeorology Group


CHAPTER FIVE

CONCLUSIONS

What you do not see you cannot describe. What you cannot describe you cannot interpret. But because you can describe something does not mean you can interpret it. – Patton (2002, p. 429)

The process by which data is captured and maintained continues to evolve and mature as scientific needs change. – DAF Interview P1 Participant (2013, Q18)

5.1 Discussions

This chapter discusses the study’s findings and research questions, followed by the study’s implications, conclusions, recommendations, and future research. The research questions for this study are:

1. How do researchers create, manage, store, and preserve research data?
2. How can the identification and clarification of key DMC concepts be resolved within and across disciplines?
3. What are some of the theories, practices, and methods disciplines use to address research data management in your discipline?
4. How can multiple paradigms perspectives on data management and curation practices within and across disciplinary domains contribute to building DMC research & theory?

The DAF Surveys and DAF Interviews address research questions #1 and #2. The DAF Interviews address research questions #3 and #4.

5.2 Research Q #1 – How Do Researchers Create, Manage, Store, And Preserve Research Data?

Researchers create, manage, store, and preserve research data in multiple ways that vary across disciplines, research labs, and institutions. Researchers that participated in this study created primary data generated from: (1) computer model/modeling or simulation source code; (2) derived data resulting from processing or combining raw or other data; (3) experimental data from scientific experiments, controlled experiments, and computational results; (4) observations of scientific phenomena at a specific time or location, where the data will usually constitute a unique and irreplaceable record; (5) reference data relating to gene sequences, chemical structures, or literary texts; and (6) videos, images, audio files, and grant proposal project data. The resulting datasets from these data sources "are becoming 'the new instruments of science' (Atkins et al., 2010) ranging from the globally accessible results of high-throughput gene sequencing to the datasets emanating from vast scientific facilities in the physics domain" (Whyte, 2012, p. 206), such as datasets generated from the National High Magnetic Field Laboratory (NHMFL). Scientists that participated in this study provided examples of data-intensive science in the brief descriptions of their exemplar projects (See Sec. 4.2.3). According to Bell (2009), "data-intensive science consists of three basic activities: capture, curation, and analysis. Data comes in all scales and shapes, covering large international experiments; cross-laboratory, single-laboratory, and individual observations." Researchers that participated in the study created primary data from the data sources listed above and created secondary (derived) data in the form of computer software source code, data automatically generated from or by computer programs, data collected from sensors or instruments, lab notebooks, Excel, PowerPoint, Word, computer code, images, scans, photos, video, audio, and other formats.

Though not comprehensive, Section 4.4 Disciplinary Domain Research Data Management Perspectives attempts to explore the data-intensive science of the interview participants from individual perspectives on experiences with exemplar projects, from the funding application to data collection to data archiving, that is, from data capture to curation to analysis. Many of the participants cited funding and limited resources as barriers to research data management. Data from both the surveys and interviews support the view that "funding is needed to create a generic set of tools that covers the full range of activities – from capture and data validation through curation, analysis, and ultimately permanent archiving" (Bell, 2009, p. xv). Although at times not explicitly stated, the respondents described the creation and supply of scientific knowledge from their research projects as being of interest only to the scientific community and not necessarily to those outside the academic community, and as focused on individual science disciplines and on rewards for the individual scientist's efforts through publications and the promotion and tenure process (Abbott, 2009, p. 111). Some respondents prefer to work alone, in isolation, and separate from others for reasons outside the scope of this study. However, the social, cultural, and organizational factors influencing scientists' behavior with respect to how they create, manage, and store research data are worthy of investigation.

It was clear from the responses that some researchers manage and store research data through radical personalization, with personal access to computing, terabytes of storage, and personal compute clouds (Abbott, 2009, p. 114), while others manage research data through high-performance computing, networked services, external/commercial/vendor data storage services, cloud-based solutions, data centers, and shared infrastructure (departmental, local, consortia, regional, national, and international). Some researchers take an informal approach, managing their research data on local PCs, lab laptops, external hard disks, and DropBox according to local or self-defined structures and data storage practices, while other researchers described a more formal approach to the management of their research data. Some researchers manage their research data according to best practices and guidelines developed by data centers, labs, and organizations, including:
• Deep-C Consortium Data Management Plan http://deep-c.org.images/documents/Deep-C_DataMgmtPlan.pdf
• ITL®
• National Center for Atmospheric Research (NCAR)
• National Climatic Data Center (NCDC)
• National Geophysical Data Center (NGDC)
• National High Magnetic Field Laboratory (NHMFL) Data Management Plan http://users.magnet.fsu.edu/Documents/DataManagementPlanPolicy.pdf
• National Oceanographic Data Center (NODC)
• National Snow & Ice Data Center (NSIDC)
• National Science Foundation (NSF) Data Management Plan
• Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) for Biogeochemical Dynamics

Some of the standards and services used to manage research data identified by the participants include:
• FTP Servers
• Institute of Electrical and Electronics Engineers (IEEE)
• International Organization for Standardization (ISO)
• Live Access Server (LAS)
• NODC Geoportal Server
• Open Archival Information System (OAIS)
• OPeNDAP (Data Access Protocol) Hyrax Server
• Network Common Data Form (NetCDF) – "a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data." Source: www.unidata.ucar.edu/software/netcdf/ (illustrated in the sketch following this list)
• THREDDS Data Server (TDS) – "a web server that provides metadata and data access for scientific datasets, using a variety of remote data access protocols." Source: www.unidata.ucar.edu/software/thredds/current/tds/
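As a brief illustration of the NetCDF format listed above, the Python sketch below writes and reads a tiny self-describing file, assuming the netCDF4 package is installed; the variable name, units, and values are invented for demonstration only and do not come from any participant's data.

    import numpy as np
    from netCDF4 import Dataset

    # Write a tiny self-describing NetCDF file: one dimension, one variable
    with Dataset("example.nc", "w") as nc:
        nc.createDimension("time", 3)
        temp = nc.createVariable("air_temperature", "f4", ("time",))
        temp.units = "degC"                    # metadata travels with the data
        temp[:] = np.array([21.5, 22.0, 21.8])

    # Read it back: structure and units come from the file itself
    with Dataset("example.nc") as nc:
        var = nc.variables["air_temperature"]
        print(var.units, var[:])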

The researchers that (1) worked in a research lab that had a data management plan in place (i.e., the National High Magnetic Field Laboratory (NHMFL)), (2) worked in a research lab that was part of a consortium (i.e., the Marine Coastal Laboratory as part of the Deep-C Consortium), or (3) worked in a research lab that contributed to a national data center (i.e., NCDC, NGDC, NSIDC) had good data management and storage practices and represented large scientific projects. Despite the variety and differences in the way that respondents store and manage their research data, “the most common methods of storing or backing-up data are via desktop or laptop computer hard drives, external hard drives (including USB drives), and university or departmental-based servers” (Akers & Doty, 2013, p. 9). It was clear from the interview responses that while most, if not all, of the participants were confident about the storage and management of their research data, some participants were unsure about the long-term preservation of their research data. While most participants with very large projects and research data in data centers were confident that “great care is frequently devoted to the collection, preservation, and reuse of data” (Heidorn, 2008, p. 280), many participants from research labs consisting of smaller research teams and projects were not as confident about the long-term preservation of their research data. Some respondents that participated in this study acknowledged that their data have not been made available to the rest of the scientific community. Data that have not been made available to the rest of the scientific community are called ‘dark data’ and constitute a large portion of scientific output contributing to the ‘long-tail of science’ (Heidorn, 2008). It is this long-tail of science data, evident from participants in this study, that is in need of

long-term preservation, as evident from some of the responses in the DAF Survey. According to DAF Survey Q16, 22% of the participants are aware of long-term preservation policies at their respective organizations, 17% are not aware, and 60% do not know of any, responses that support the presence of long-tail scientists. However, the DAF Interview participants were confident in the long-term preservation of their research data, as evident from their selection of the store, curate, and preservation action elements in the DCC Curation Lifecycle Model (See Table 19) and by the higher concentration of responses on the archive data processes in the Level 3 curation heat map (See Fig. 18). The respondents’ selection of store, curate, preservation action, and archived data from the two models corresponds to long-term preservation of research data. The DAF Survey Q9 responses revealed that 89% (73) of the participants allow access to research data after the project is finished whereas 11% (9) do not. The DAF Survey Q10 responses revealed that 63% (52) of respondents were concerned with confidentiality or data protection issues and 55% (45) were concerned that the data were not fully documented. The DAF Interview Q28 responses named no one, other scientists, the respondents’ own research group or team, collaborators, the respondents themselves, and the funding agency as those needing access to the data. The DAF Survey responses on allowing access to data, and the concerns about allowing access to the research data, are complemented by the DAF Interview responses on who needed access to the data. Both the survey and interview data on data sharing are supported by the key principles underlying the BBSRC data sharing policy: (1) Few Restrictions, (2) Timely, (3) Appropriate Data Quality, (4) Metadata, (5) Use of Standards, (6) Use of Existing Resources, (7) Appropriate to Discipline, and (8) Regulatory Requirements (ethics) (Whyte & Pryor, 2010, p. 4). Data from both the DAF Surveys and DAF Interviews provide examples of “data arising from high volume experimentation, low throughput data arising from long time series or cumulative approaches, and models generated using systems approaches” (BBSRC, 2010, p. 8) that are strong cases for scientific data sharing. The BBSRC data sharing policy addresses the concerns of some of the participants in this research study in that:

“Researchers have a legitimate interest in benefitting from their own time and effort in producing the data but not in prolonged exclusive use of these data. Timescales for data sharing will be influenced by the nature of the data but it is expected that timely release


would generally be no later than the release through publication of the main findings and should be in-line with established best practice in the field” (BBSRC, 2010, p. 5).

In addition to the data from this study supporting the underlying principles of the BBSRC data sharing policy, results from this study also reflected the difference between ‘big science’ and ‘little science’ as defined in the literature.

‘Big science’ fields such as physics and astronomy that collaborate around expensive instrumentation have constructed shared digital libraries to manage their data and documents, while ‘little science’ research areas that gather data through hand-crafted fieldwork continue to manage their data locally. Borgman, Wallis, & Enyedy, 2007, p. 17

‘Big science’ was evident in one respondent being part of a billion-dollar research project, while other respondents did not generate enough data to warrant the development of a data management plan. Regardless of the discipline, the size of the research project, or whether the research team is a team of one, all individuals responsible for the creation, storage, management, and preservation of data should have a data management plan and, if they do not, should collaborate with colleagues and/or research labs that have data management plans. Without data management plans, scientific output and knowledge can become undiscoverable, closed, unlinked, non-useful, and unsafe “when the [data] custodian makes no plans for long-term retention in a changing technical environment” (Garrett & Waters, 1996, p. 3), and research collections may lack established or standardized data systems and remain poorly integrated with more standardized resource and reference collections (NSB, 2005; Parsons et al., 2011, p. 556).

5.3 Research Q #2 – How Can The Identification And Clarification Of Key Concepts Of Data Management And Curation (DMC) Be Better Articulated Within And Across Disciplines?

The identification and clarification of key concepts of data management and curation (DMC) can be better articulated within and across disciplines through the use of frameworks and models rendered as heat maps, as demonstrated in this study. Even if some of the scientists did not understand the concepts, they understood the processes involved in research data management when articulated in general terms. The complementary curation models used in this study included

general descriptions of some of the processes involved in data curation, digital preservation, and overall data lifecycle management. Combining complementary curation models and utilizing theory development approaches such as Merton (1968) and Lewis and Grimes (1999) for addressing paradigmatic differences, similarities, and interrelationships of competing frameworks and models in heat-map-type approaches can better articulate key DMC concepts within and across multiple disciplines. “Good data practices in all phases of the data lifecycle such as generating and collecting the data, managing the data, analyzing the data, and sharing [the data]” (Tenopir, et al., 2011, p. 2) begin with the clear identification and clarification of key concepts of data management and curation. With a specialization in data curation (UIUC, 2014), a digital curation specialization (UMD iSchool, 2014), a certificate in digital curation (UNC, 2014), a graduate certificate in digital curation (UMaine, 2014), a certificate in digital curation (Johns Hopkins, 2014), a digital curation and data management graduate academic certificate (University of North Texas, 2014), a master of science in records management and digital preservation (University of Dundee, 2014), and a master of library and information science in digital preservation (Kent State University, 2014), there is a need to properly identify, clarify, and distinguish between the key concepts of data management and curation. What is the difference between data curation, digital curation, and digital preservation? Many of the definitions on these programs’ websites described data curation and/or digital curation with descriptions that mixed data curation, digital curation, and digital preservation concepts, except for Kent State University. Kent State University defines the digital preservation concept as defined by the ALCTS Preservation and Reformatting Section, Working Group on Defining Digital Preservation (ALA, 2007)20. The ALCTS definition of digital preservation includes most of the descriptions from the definition of digital preservation and some of the descriptions of digital curation in the 2006 JISC Digital Preservation briefing paper by Maureen Pennock. It appears most US definitions of data curation, digital curation, and digital preservation include descriptions that define one concept in terms of another concept. If this confusion exists within the field of library and information science and digital libraries in the US, then how can the discipline properly and effectively articulate data management and curation research,

education, and learning within and across other disciplines? This question was the motivation behind my 2012 pilot study on Data Management & Curation Services: Exploring Stakeholders’ Opinions21, which resulted in the development of the Data Management and Curation (DMC) Framework22 that maps underlying curation models and concepts. What is data management and curation? For purposes of this research study, data management and curation is broadly defined as a research data lifecycle management concept comprising four underlying key concepts: (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation. The underlying definitions will change, evolve, and develop in tandem with changes in scientific, educational, and scholarly research communities. The following description of curation, stewardship, and LTER data practices by Karasti et al. (2006, p. 352) provides a good foundation on which to develop current and future understanding of these key concepts.

Curation and stewardship both focus on the data but have different views about the nature of data, their lifecycles, and relations with their environments of science conduct. As portrayed in e-Science literature data curation in organizing and overseeing data holdings deal with the guidelines and procedures for data ingestion, archive, and delivery. Data stewardship as practices in LTER provides a large conceptual framework, an overarching process occurring now but attending to the past and taking into account and influencing the future, stretching from data planning to sampling, from data archive to use and reuse – including both data care and information infrastructure work. Such work involves data definitions, data requirements, and quality assurance as well as user feedback, redesign, and data exchange. Karasti et al., 2006

20 Definitions of Digital Preservation. Prepared by the ALCTS Preservation and Reformatting Section, Working Group on Defining Digital Preservation. Source: http://www.ala.org/aclts/resources/preserv/defdigpers0408.

While Karasti et al. (2006) provided a good understanding of the differences in the nature of data and the stages of data lifecycles and referenced the Levels 1-3 Curation (Lord & Macdonald, 2003), Pennock’s (2006) definition of digital curation further explicitly clarifies and distinguishes the difference between digital curation and digital preservation concepts while acknowledging that digital curation includes underlying data management and curation concepts.

21 Data Management & Curation Services: Exploring Stakeholders’ Opinions. Retrieved April 6, 2014 from https://easy.dans.knaw.nl/ui/datasets/id/easyGdataset:57106.
22 Data Management and Curation (DMC). Retrieved April 6, 2014 from http://platosmith.com/research/research_datamgmtcurate.

Digital curation is all about maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active

management and appraisal of data over the entire lifecycle. Digital curation builds upon the underlying concepts of digital preservation whilst emphasizing opportunities for added value and knowledge through annotation and continuing resource management. Preservation is a curation activity, although both are concerned with managing digital resources with no significant [or controlled] changes over time. Pennock, 2006

While current US definitions of data curation, digital curation, and digital preservation define the concepts without clear differentiation of the underlying concepts, earlier UK definitions provide a better basis for properly clarifying the differences in key data management and curation concepts and processes through conceptual models like the Level 1-3 Curation Model and the DCC Curation Lifecycle Model. Interview participants were introduced to the DCC Curation Lifecycle Model and the Level 3 Curation Model, which also includes Level 1 Curation and Level 2 Curation, as a means of conveying the concepts of curation and stewardship to scientists without having to define the underlying key concepts (See Fig. 18 and Table 20). Interview participants were asked to click on the elements of the DCC Curation Lifecycle Model in which they were involved as part of their research within their disciplinary domains. The model was broken into regions so that click responses could be recorded. Each region was mapped to a key concept (i.e., the center of the model, consisting of Data (digital objects or databases), Description, and Representation Information, was mapped to data curation) and the responses were recorded (a brief illustrative sketch of this region-to-concept mapping follows at the end of this section). The DCC Curation Lifecycle Model (See Fig. 17) was introduced to scientists following the DAF methodology (University of Glasgow, 2009), and many scientists could easily identify with the stages of curation and preservation. The Level 3 Curation Model (See Fig. 18) was then introduced, and many scientists clicked on processes in the research, publication, and curation stages representing Level 1, Level 2, and Level 3 Curation. These levels can be mapped to (1) data curation, (2) digital curation, and (3) digital preservation, whereas the models in their entirety represent (4) data management planning. In conclusion, the identification and clarification of key concepts of data management and curation (DMC) can be better articulated within and across disciplines through conceptual models rather than by defining one key concept in terms of other key concepts. There is a need to clarify concepts and use them consistently within and across fields.
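Purely as an illustration of the mapping-and-tallying step described above (the region names and click responses below are hypothetical, not the study’s data), the relationship between clicked model regions, key DMC concepts, and heat map counts can be expressed as a simple lookup and tally in Python:

# Hypothetical sketch: map clicked model regions to key DMC concepts and
# tally the clicks into the counts a heat map visualization would shade.
from collections import Counter

REGION_TO_CONCEPT = {
    "data_digital_objects": "data curation",
    "description": "data curation",
    "representation_information": "data curation",
    "store": "digital preservation",
    "preservation_action": "digital preservation",
    "access_use_reuse": "digital curation",
}

# Example click-on responses from participants (made-up values).
clicks = ["store", "description", "preservation_action", "store", "access_use_reuse"]

tally = Counter(REGION_TO_CONCEPT[region] for region in clicks)
for concept, count in tally.most_common():
    print(concept, count)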

Although the exact terms used have varied (from “curation” to “digital preservation”, to “digital curation”) a range of commentators have been trying to convey the concept that we now need a new approach to creating and


managing digital assets. This approach often confounds attempts to neatly categorize activities and demands involvement from and interaction between, a far wider group of individuals, roles and organizations. This involvement and interaction extends across authors and researchers, publishers and curators, and information and data management specialists (Beagrie & Jones, 2001; Gray, Szalay, Thakar, Stoughton, & vandenBerg, 2002). Beagrie, 2006, p. 4

The proper identification, clarification, definition, and linking of key concepts through conceptual models and frameworks are necessary for DMC research and theory development.

5.4 Research Q #3 – What Are Some Of The Theories, Practices, And Methods Multiple Disciplines Use To Address Research Data Management In Your Discipline?

There were a total of six interview participants that addressed this research question in Q11 and Q12 of the DAF Interviews. Scientists generally did not use theories; rather, they relied on controlled experiments, observations, scientific experiments, and various methods of data collection that primarily produced (1) experimental and computational, (2) derived, (3) computer code and software, (4) observational, and (5) reference data. Participants used numerical models, field sampling, controlled experiments, sensors, observation and modeling, conceptual models for atmospheric systems, and experimental and observational methods to address research data management issues across their respective science disciplines. Scientists used satellites, sensors, telemetry, cell phones, high magnetic fields and magnets, and multiple advanced technologies via data centers, high performance computing, and grid technology to address research data management in their respective disciplines. The responses reflected scientific output representing the (1) experimental, (2) theoretical, (3) computer simulation, and (4) data-intensive science research paradigms, the last referred to as the ‘fourth paradigm’ (Hey et al., 2009; Lynch, 2009; Parsons et al., 2011). The fourth paradigm “provides an integrating framework that allows the first three [research paradigms] to interact and reinforce each other” (Lynch, 2009, p. 177). The majority of research methods and practices of the interview participants fell within the experimental, computer simulation, and data-intensive science research paradigms. The interview responses in this study also reflected the five complementary elements of ‘digital science’ methods and data practices:
1. Collection of data from the physical world (using distributed sensors and instruments);
2. Distributed and remote access to organized repositories of such data;


3. Computation using theoretical models and experimental data;
4. Presentation of results for scientific visualization and interpretation; and
5. Support for collaboration among scientists (Messerschmitt, 2003).

Below are the responses to DAF Interview Q12, which are supported by the aforementioned literature and directly address RQ #3:

Capture of original measurement information and metadata about the process and tools for QAQC purposes. Lifespan management of information, including versions and change history as corrections, additions or analysis occurs. – P1

At this point, mostly having good documentation and putting data in a repository or making it discoverable and citable in some way. That's about as much as one can do. – P4

We collect many gigabytes of observations every month since studying atmospheric transport requires very fast-response sensors collecting observations 20 to 50 times per second (20 to 50 Hz). We archive these 'raw' data streams, and process and analyze them to produce secondary and tertiary data products. Managing the data servers, making sure data doesn't get lost, can be served to the larger research group, documenting the data well enough to be useful to others, and sharing the secondary and tertiary data with other external users is key. I am not familiar with any specific theories or concept for research data management, it all evolved as best practice out of experience. – P5

Not quite clear what you are seeking with this question. I work with research vessel operators to ensure that they are using appropriate instrumentation to measure the quantities desired by the marine science community. We ensure that they have best practices for instrument exposure on the vessel and seek to collect sufficient metadata to understand the observations being made. – P3

There is very little theory involved. Basically there is no funding or reward for such an effort so data is only held or managed until publication. Other than that it is up to individual PIs as to how valuable data is for future research and if it is worth effort to manage. – P6

No theories of systematic approach. Only manually sorted and saved electronically only, and backed up regularly. – P7

5.5 Research Q #4 – How Can Multiple Paradigms Perspectives On Data Management And Curation Practices Within And Across Disciplinary Domains Contribute To Building DMC Research & Theory?


Within the scope of this dissertation, “the elements of [data management and curation] are concepts or representations of concepts” (Dahlberg, 1978, p. 9). Bolton (1977, p. 23) defines concept as “a stable organization in the experience of reality which is achieved through the utilization of rules of relation and to which can be given a name.”

A paradigm can represent a set of concepts, methods, theories, practices, frames of reference, epistemology, and ontology (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993; Smith II) (See Fig. 10) of a discipline. A ‘paradigm’ seeks “to define the legitimate problems and methods of a research field for succeeding generations of practitioners” through theoretical perspectives or frameworks “to attract an enduring group of adherents away from competing models of scientific activity” and “to leave all sorts of problems for the redefined group of practitioners to solve” (Kuhn, 1996, p. 10). Multiple paradigm perspectives on data management and curation practices within and across disciplinary domains contribute to building DMC research and theory through interdisciplinary research that promotes the interplay (Schultz & Hatch, 1996) where paradigms’ similarities, differences, and interrelationships (Gioia & Pitre, 1990) intersect. For purposes of this research study, the definition of interdisciplinary research adapted from the Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy and accepted by the National Science Foundation (NSF) is used within the scope of this dissertation.

Interdisciplinary research is a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice. Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy, 2004, p. 2

This definition of interdisciplinary research addresses the need for increased collaborative research within the library and information science (LIS) and digital preservation disciplines and across multiple disciplines, including the sciences, that integrates multiple paradigm perspectives in pursuit of DMC research and theory development. As research funding becomes more competitive and researchers are forced to do more research with limited resources, “there is a need towards collaborative research projects that involve large scale inter-disciplinary

partnerships; among researchers and research users, companies and communities with shared interests in the problems to be addressed” (Whyte, 2012, p. 207) that is “important to the long-term management, preservation, and use of scientific research” (Hey & Trefethen, 2003; Messerschmitt, 2003). Research participants mentioned this need in Q26 of the DAF Survey, noting a need for curators, training, education, and collaborations with data management experts at FSU. Data from both the DAF Surveys and DAF Interviews support the need for increased interdisciplinary research and collaborative research with respect to the management of research data within and across multiple disciplines, research groups, and collaborators. Even though some participants preferred to work alone, many participants worked in research teams that “produce exceptionally high-impact research, even where that distinction was once the domain of solo authors” (Wuchty, Jones, & Uzzi, 2007, p. 1036; Whyte, 2012, p. 208). It is evident from the data and literature that interdisciplinary research and collaborative research will continue to grow as scientific needs change, the challenge of managing and storing data across all disciplines continues to proliferate, and “businesses and policymakers ask questions that are far more interdisciplinary than in the past” (Abbott, 2009, p. 112). The following characteristics of interdisciplinary research, present in some of the respondents’ responses, provide a framework in which to develop, implement, and promote interdisciplinary research within and across multiple disciplines.

IDR has the following characteristics as generalized by numerous writers (Alpert, 1969; Birnbaum, 1981; Blackwell, 1955; Caudill & Roberts, 1951; Luszki, 1958) (1) different bodies of knowledge are represented in research groups, (2) group members used different approaches in attempting to solve problems, (3) members of the group perform different roles in solving problems, (4) members of the group work on common problem, (5) there is a group responsibility for the final product, (6) the group shares common facilities, (7) the nature of the problem determines the selection of group personnel, and (8) members are influenced by how others perform their tasks. Qin, Lancaster, & Allen, 1997, pp. 893-894

In aggregate, the DAF Interview participants selected all the processes (theoretical suppositions) in the Adapted Conceptual Framework (See Fig. 10) as important when conducting research within and across multiple disciplines. Thus, the scientists’ perspectives on the use of the Adapted Conceptual Framework Model in conducting research within and across multiple disciplines support the use of this conceptual model for DMC research and theory

development. Broadening the scope and reach of data management and curation research to include the practices, perspectives, and experiences of researchers across multiple disciplines through interdisciplinary research contributes to DMC research and theory development through the integration and synergy of various scientific disciplinary domains’ concepts, methods, practices, and theories in addressing the problem of data management common to all disciplines.

5.6 Implications

The participants provided their current data management and curation practices, perspectives, experiences, and recommendations on how organizations can help scientists, researchers, and individuals responsible for the management and storage of data improve the management of research data across multiple disciplines and through the lifecycle of data. This study provided implications not only for the scientists from research labs and multiple disciplines that participated in this study but also for the wider research and data management community.

Below is a list of research, practical, and social implications stemming from this research project that complement the recommendations to improve the current data management and curation practices of the scientists that participated in this research study.

5.6.1 Research Implications

Good DMC practices stimulate organized research data management awareness. Organized research data management awareness allows stakeholders, institutions, and users to be responsive to emerging trends, resources, and tools that facilitate improvements in data lifecycle management. Being cognizant of and responsive to emerging research data management trends, resources, and tools allows organizations to adopt appropriate and applicable opportunities for improvement more readily than organizations without established, good DMC practices.

Data management education across multiple disciplines and departments raises data management cognition. The proper articulation of the benefits of good DMC practices, followed by education tailored to specific audiences (i.e., advanced, intermediate, and novice), ensures broader exposure than a rigid, stoic, and inflexible education program. By raising

research data management cognition through multifaceted and flexible education programs, the levels of confidence in and awareness of good DMC practices will rise across disciplines and units.

Adherence to best practices, standards, and guidelines fosters cogent data policies, promotes good DMC practice, and enables new research built on accessible and existing data. The adoption of best data management practices, metadata and preservation standards, and research data management policies, where appropriate and cost effective, improves the lifecycle of data for current and future use. Research data that has been created, stored, managed, and preserved according to generally accepted practices, standards, and guidelines promotes data sharing and interoperability of data across distributed, heterogeneous platforms, systems, and technologies.

5.6.2 Practical Implications

Data standards improve departmental and institutional level data management accountability and advance the development of relevant data management policies. Departmental and institutional level data management accountability is essential for adherence to funding agencies’ data management requirements. Institutions, organizations, and departments that have relevant data management policies and procedures ensure that they will meet the growing demand for data management plans by funding agencies and in some cases may even exceed funding agencies’ minimal data management plan requirements, advancing data management standards and benchmarks.

Good data policies support funding agencies’ data management plan requirements. The National Science Foundation (NSF), National Endowment for the Humanities (NEH), NOAA, JISC, and many other funding agencies now require data management planning as part of research funding. Organizations with already developed and established organizational, institutional, or departmental data policies can more easily develop data management plans when seeking research funding.

An established DMP in one lab can serve as a model or catalyst for other units/labs. Research labs with large budgets and established data management plans can assist research labs with smaller budgets and without established data management plans to develop

appropriate and applicable DMPs. The sharing of data management and curation resources, whether time, talent, or money, across disciplines and research labs enables more interdisciplinary research and collaborative environments that benefit multiple stakeholders and users.

Research labs with established DMPs and/or that are part of consortiums representing big science and/or research projects with extensive resources could leverage their resources to support smaller sciences and/or research labs without established DMPs that are not part of consortiums with extensive resources. Since most scientists identified limited resources, time, and expertise for DMC training, education, and research, research libraries should work in collaboration with iSchools to develop research coordination networks (RCNs) with available scientists (preferably scientists from both the big and smaller sciences) to address the needs and gaps in resources for collaboration in education across research labs and disciplinary domains.

5.6.3 Social Implications

Proper data lifecycle management increases data access, discovery, use, and reuse. Understanding the differences in the stages of data management and curation, in conjunction with applicable data management and curation models and frameworks, facilitates effective data access, discovery, and preservation for current and future use. The current and future use of data allows users and the research and teaching communities to study, duplicate, and/or extend existing research, thus creating new and/or derivative research. New and derivative research allows the testing, retesting, and validating of previous or existing research, which improves research, teaching, and learning.

Metadata standards document the origin and nature of research data and extend the usefulness of data to science, research, and education. Metadata standards allow a uniform, organized, and systematic approach to the description, representation, and dissemination of data. Whether nationally or locally developed, metadata standards are useful, necessary, and effective, where appropriate and applicable, in the availability, discovery, and delivery of data.

Data sharing contributes to the wider research data management and scholarly communities. Sharing data elevates the body and level of research in the research data

management and scholarly communities through the creation of new ideas, perspectives, and practices. Sharing data brings in multiple perspectives that build broader impact and practice.

Without established policies on data management and curation that encompass the lifecycle of data, including data sharing, storage, management, and long-term preservation, research data and …information still becomes lost in the system, directed to the wrong people, or both. Similarly, during a crisis, the wrong people may try to solve a problem because of their prowess at bureaucratic gamesmanship, or the right people (because of mismanagement or oversight) may be overlooked or sent elsewhere. Sabrosky, Thompson, & McPherson, 1982, p. 142; Martin, 1992, p. 143

Whether or not a research lab, department, or unit has established data policies governing the storage, management, and long-term preservation of research data, organizational and social structures impact DMC practices and perspectives within and across multiple disciplines, as evident from this research study. This study revealed that identified barriers to RDM impact DMC articulation, integration, and collaboration. The data from this study revealed that scientists from research labs with a DMP and/or that were part of a consortium with a DMP exhibited RDM Integration Perspectives (Martin, 1992), whereas scientists from research labs without a DMP exhibited RDM Differentiation and Fragmentation Perspectives (Martin, 1992). Scientists from all research labs could benefit from improved RDM in which there are allocated opportunities and resources for universities and research labs to collaborate on developing robust RDM and DMP education, guidelines, strategies, and policies that benefit multiple research labs and disciplines. Research labs can leverage, utilize, maximize, and incentivize campus and consortium assets to promote interdisciplinary research, collaboration, and sharing. This research study also revealed that further DMC and RDM research requires scientific inquiry and organizational, cultural, and sociological paradigm analysis when investigating data management and curation practices and perspectives across multiple disciplines.

5.7 Conclusions

In the future, funding agencies such as the National Science Foundation (NSF) may strongly recommend that principal investigators (PIs) clearly articulate and include relevant theories,

concepts, and frameworks/models in the data management plans of research proposals seeking funding regarding existing and/or new data. Currently, some PIs believe that a comprehensive data management plan [a plan that includes the key elements of data management plans] is not necessary because they are working with existing data and not creating new data. Regardless of whether PIs are working with existing data and are not in fact creating new data, a data management plan that describes the metadata, description, representation, aggregation, dissemination, and preservation of existing data used in grant-funded research is necessary and not discretionary.

This chapter answers this study’s research questions, provides implications, and suggests recommendations for future study and research. Even though some researchers “report that they struggle unsuccessfully with storage and management of their burgeoning volume of documents and data sets that they need and that result from their work” (Kroll & Forsman, 2010, p. 5) and other researchers choose “to practice data management in isolation, disconnected from the technology available to scientists collecting similar types of data” (Douglass et al., 2014, p. 254), many researchers manage their research data in the way they are most accustomed to, whether traditional, contemporary, or a combination of both. Some traditional ways of managing data include keeping local copies on paper, laptop/PC, external hard drives/disks, and USB/memory sticks/flash drives, while some contemporary ways include Google Drive/Docs, Dropbox, Box, networked data storage, commercial/vendor networked data storage, consortium (shared infrastructure) data storage, and cloud-based storage. The ways in which researchers store and manage research data vary across disciplines, research labs, and institutions and are influenced by the structure, organization, and culture of their environments. The goal of most, if not all, researchers, despite their current data management and curation practices, is to have research data that is “organized, categorized, indexed, catalogued, archived, mined, retrieved, curated, preserved, conserved, consumed, and evaluated” (Bias, Marty, & Douglas, 2012, p. 277), to name a few. The literature suggests it is better to manage research data in accordance with available standards, best practices, and guidelines where appropriate and applicable, but informal data management practices are equally important and provide the framework on which to develop or integrate formal data management policies and practices.


This study revealed the following cultures in organizations (Martin, 1992):
1. Integration Perspectives - scientists from research labs with established RDM & DMP and/or part of a consortium - exhibited harmony, consensus, consistency (homogeneity)
2. Fragmentation Perspectives - scientists from research labs without established RDM & DMP and not part of a consortium - exhibited multiplicity, flux (ambiguity)
3. Differentiation Perspectives - scientists from research labs without established RDM & DMP and not part of a consortium - exhibited separation, conflict (harmonious or indifferent sub-cultures)

5.8 Recommendations

The data from this research study support the literature on the differences in the data management practices of scientists in ‘big science’ and ‘little science’ and the accompanying infrastructure supporting data capture, curation, management, analysis, and visualization across scientific domains. It is recommended that a gateway to research, similar to the Research Councils UK’s Gateway to Research Portal (GtR)23, be developed to help bridge the divide between ‘big science’ and ‘little science’ data management and curation infrastructure and the data lifecycle management of scientific outputs. Within this gateway to research there would be a Gateway for Higher Education that provides the tools, information, resources, and connections on key collaborations within the institution and across multiple disciplinary domains to enable more interdisciplinary and collaborative research, while also identifying and connecting with HEI, commercial, and government partners to better align the supply of scientific output with the demand for scientific knowledge from the commercial, private, and government sectors (Abbott, 2009; Research Councils UK Gateway to Research). This gateway to research would utilize an OPeNDAP (Data Access Protocol) type framework with an Open Government License that provides the necessary APIs (i.e., produces outputs in ASCII, XML, JSON, and multiple other formats and allows metadata crosswalks/mapping) to allow access and scientific networking to ‘big science’ and ‘little science’ data and scientific knowledge for multiple stakeholders (i.e., the scientific community, wider academic community, commercial sector, government sector, and private sector).
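Purely as an illustrative sketch of the kind of multi-format output such a gateway API might return (the record fields and the crosswalk below are assumptions for illustration, not an existing service), a single metadata record can be serialized to JSON and to simple XML, with a small crosswalk mapping local field names onto Dublin Core style element names:

# Hypothetical sketch: one metadata record emitted as JSON and as XML,
# using a simple crosswalk from local field names to Dublin Core style names.
import json
import xml.etree.ElementTree as ET

record = {"title": "Example coastal transport dataset",
          "creator": "Example Research Lab",
          "format": "NetCDF",
          "rights": "Open Government License"}

crosswalk = {"title": "dc:title", "creator": "dc:creator",
             "format": "dc:format", "rights": "dc:rights"}

print(json.dumps(record, indent=2))            # JSON output for API consumers

root = ET.Element("metadata")                  # XML output built via the crosswalk
for field, value in record.items():
    ET.SubElement(root, crosswalk[field]).text = value
print(ET.tostring(root, encoding="unicode"))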

23 Research Councils UK Gateway to Research. Introducing the Gateway to Research Portal. Retrieved April 6, 2014 from http://gtr.rcuk.ac.uk/resources/about.html.

It is recommended to leverage current resources, collaborations, and consortium partnerships and memorandums of understanding to expand the use of existing tools, resources, and infrastructure across projects, disciplines, and scientific environments to diversify risks, leverage resources, maximize scientific outputs, and increase ROI. Leveraging current resources, collaborations, and consortium partnerships for data-intensive discovery and data management and curation will “foster the development of software tools, support, new document authoring tools and publication models, and research into scientific data management” (Gray, 2007; Hey, Tansley, & Tolle, 2009, pp. 227-228). The start of such an initiative would include partnerships among FSU funding agencies, government funding agencies, industry, research labs/centers, and data from funding agencies and the disciplinary domains that produced the scientific knowledge.

5.9 Significance

This study is relevant to the research data management community and:
• Addresses the challenge of managing research data that affects all disciplines
• Identifies scientists’ needs for data managers, education, and training
• Aggregates, integrates, and interrelates complementary DMC models within a study
• Proposes that competing DMC models must specify and identify key DMC concepts
• Discovers that RDM barriers and supporters vary across big and smaller sciences/labs
• Benefits some scientists that participated in this study
• Identifies University/elsewhere managed/non-managed data stores (labs & physical)
• Supports some of the Board on Research Data and Information (BRDI) primary mission areas, such as mission area #1 “Address emerging issues in the management, policy, and use of research data and information at the national and international levels” and mission area #3 “Encourage and facilitate collaboration across disciplines, sectors, and nations with regard to common interests in research data and information activities” (Board on Research Data and Information, 2014)
• Explores some of the DMC concerns, interests, and issues affecting the LIS, ARL, and research data management and curation communities

5.10 Future Research

Research libraries can learn from the varied DMC practices, utilization of data centers, and successful DMC integration and collaboration of advanced scientists and PIs using data

centers, HPC, and grid technology to manage their research data. Advanced scientists and PIs working with non-advanced researchers/assistants/postdocs on developing resources, tools, education, and outreach programs within and across multiple disciplines can benefit research libraries’ efforts in working with scientists. Research labs that are part of consortiums and have established DMPs and successful RDM practices complement research libraries’ current efforts. This study revealed varying levels of DMC practices among scientists, resulting in (1) advanced, (2) intermediate, and (3) novice categories. Each of these categories included scientists, researchers, and research assistants/postdocs with different levels of research data management confidence, awareness, and experience that warrant different approaches if research libraries and iSchools are to effectively collaborate with scientists from multiple disciplines across multiple research labs with respect to data management and curation. DMC education, workshops, and outreach must meet the needs and experience levels of the various categories. Thus a data assessment is required to properly determine how best to develop, deliver, and measure DMC practices/programs within and across multiple disciplines for advanced, intermediate, and novice scientists. Senior faculty with advanced DMC experience did not see much value in this study, whereas intermediate and non-advanced scientists found some value in this study. The intermediate scientists identified the need for training for the research assistants that manage their research data. Non-advanced and novice scientists identified the need for education, workshops, and training on DMC best practices, standards, and guidelines. There are opportunities for advanced scientists to work with research libraries and iSchools to develop data management programs, education, and outreach that address (1) advanced, (2) intermediate, and (3) novice scientists while optimizing collaboration and integration between and among the various levels. Advanced scientists can train intermediate scientists, and intermediate scientists could train novice scientists, while research libraries and iSchools facilitate and complement research, education, and training where appropriate. Senior faculty and advanced scientists representing big science used national data centers with extensive resources (managed repositories) to manage research data, whereas some intermediate and novice scientists representing smaller sciences used non-managed repositories (local/free/open-source). This difference was reflected in how well research data was managed and revealed opportunities to improve how research data is managed. Future research on the differences in the RDM practices of advanced, intermediate, and non-advanced scientists, along with a comparison of the analysis of

changes in (1) data management and (2) data curation concepts from 2001 to present across the data management and data curation communities are possible areas to extend this research study.

The reason for the concatenation of data management and curation into a research data management concept comprised of (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation is to contribute to a theory of data management and curation. According to Merton (1968), theory begins to emerge when key concepts are specified, clarified, and interrelated within a scheme. Another reason for the data management and curation concept is to allow for the aggregation of multiple perspectives of data management and data curation communities into a broader concept for theory development that is currently in an underdeveloped state. Further research and survey of the (1) data management and (2) data curation communities are necessary to further develop the DMC framework in this study.

There is an opportunity to extend this research to compare the definitions of the key concepts of DMC against the results from the heat maps to identify gaps, divides, and opportunities to improve information literacy on concepts that scientists did not understand.

The DAF methodology was used as a scoping project and not as a comprehensive audit of all data assets of the participating research labs. The DAF surveys and interviews can be improved with more demographic questions and specific project outcomes such as providing data management and curation resources information, management of sample data, curation of sample data, preservation of sample data, and conducting introductory workshops and training on data management planning, data curation, and long-term preservation. This research study focused on scientists from research labs at Florida State University and several scientists from other universities associated with the National Science Foundation (NSF) EarthCube project. Some of the scientists that participated in the study worked in research labs or environments that were part of a consortium and included other higher education institutions and funding organizations. This study can be expanded to include scientists beyond FSU and from other universities affiliated with the Deep-C Consortium, NHMFL (Los Alamos National Laboratory and University of Florida), Florida Climate Institute, Antarctic Data Consortium, Integrated Earth Data Applications (IEDA), Marine Geoscience Data System (MGDS),


Data Observation Network for Earth (DataONE), the National Snow & Ice Data Center (NSIDC), and the Extreme Science and Engineering Discovery Environment (XSEDE), to name a few.


APPENDIX A

DATA MANAGEMENT AND CURATION SERVICES (DMCS) OPINION SURVEY

Data Management and Curation Services Opinion Survey

Intro The purpose of this survey is to gather information about data management and curation services such as data management planning, data collection methods, and data preservation practices. You have been selected as part of a purposive sample based on your experiences and interests with these issues. This survey is looking for your personal opinions, and not the opinions of your organization, institution, department, or program. Your participation in this survey is completely voluntary and confidential. Your email address, name, and any personal identifying information will not be recorded or included in the survey or in the survey results.

A. Are you aware this survey is voluntary, confidential, and you can stop at any time without risk or harm? # Yes (1) # No (2)

If No Is Selected, Then Skip To Do you agree to participate in this r...

B. Do you agree to participate in this research? # Yes (1) # No (2)

If No Is Selected, Then Skip To End of Survey


Q1 What is the extent to which you agree or disagree with the following statements?
(Response scale: Strongly Agree (1), Agree (2), Neither Agree nor Disagree (3), Disagree (4), Strongly Disagree (5))
It is important to have a culture that encourages effective data management planning. (1)
It is important to have a culture that encourages data management and curation services. (2)
It is important to have a culture that encourages data management and curation services within and across disciplines. (3)
It is important to have a culture that encourages interdisciplinary research and collaboration. (4)
It is important to have a culture that encourages the capacity for sustainability of cyberinfrastructure development and modeling. (5)
It is important to have a culture that encourages best practices, standards, and evaluations. (6)


Q2 What is the extent to which you agree or disagree with the following statements?
(Response scale: Strongly Agree (1), Agree (2), Neither Agree nor Disagree (3), Disagree (4), Strongly Disagree (5))
Currently, funders and stakeholders encourage effective data management. (1)
Currently, funders and stakeholders encourage interdisciplinary collaboration and research. (2)
Currently, funders and stakeholders encourage the use of best practices, standards, and evaluations. (3)
Currently, funders and stakeholders encourage education, outreach, and learning outcomes. (4)


Q3 What is the extent to which you agree or disagree with the following statements?
(Response scale: Strongly Agree (1), Agree (2), Neither Agree nor Disagree (3), Disagree (4), Strongly Disagree (5))
An organization should invest in data management cyberinfrastructure development. (1)
An organization should invest in resources that assist scientists and researchers in domain-specific and interdisciplinary data management. (2)
An organization should support data management research, teaching, and learning. (3)
An organization should provide resources for data management and curation services. (4)


Q4 What is the extent to which you agree or disagree with the following statements?
(Response scale: Strongly Agree (1), Agree (2), Neither Agree nor Disagree (3), Disagree (4), Strongly Disagree (5))
Data curation is the same as digital curation. (1)
Digital curation is the same as digital preservation. (2)
Data curation, digital curation, and digital preservation are independent yet interrelated concepts. (3)
Data management and curation services include data curation, digital curation, and digital preservation. (4)
There is a need to develop data curation theory from similarities, differences, and interrelationships from multiple and competing models or frameworks. (5)
There is a need to develop interdisciplinary undergraduate data management and curation services programs. (6)


Q5 With respect to data management and curation services research and evaluation, what "theoretical frameworks or perspectives" have you referenced, used, or developed? (i.e., *examples of theoretical frameworks)
$ Autoethnography - Insights that can be extracted from analysis of one's own experiences. (1)
$ Constructivism - Focuses on individuals making sense of their experiences. (2)
$ Critical Theory - Overcoming the uneven balance of power between groups of individuals. (3)
$ Ethnography - The study of the culture of a group. (4)
$ Ethnomethodology - The study of people making sense of their experiences to behave in socially acceptable ways. (5)
$ Feminism - An example of an "orientational" inquiry theory that seeks to understand women's perception of a phenomenon. (6)
$ Grounded Theory - Analysis of fieldwork that is used to generate a theory. (7)
$ Hermeneutics - Providing a voice to individuals or groups who either cannot speak for themselves or are traditionally ignored. (8)
$ Narratology - Analysis of a narrative story to reveal something about the world from which the individual comes. (9)
$ Phenomenology - The search for the common thread or essence of a shared experience. (10)
$ Phenomenography - The description of different ways people interpret shared experiences. (11)
$ Positivist/Realist/Analytic Approaches - The search for the "truth" about the real world, insofar as we can get at it. (12)
$ Pragmatism - Answering practical questions that are not theory-based. (13)
$ Symbolic Interactionism - The search for a common set of meanings that emerge from interactions within a group. (14)
$ Triangulation/Metatriangulation - The study of a phenomenon from multiple theoretical perspectives or frameworks. (15)

Q6 In your opinion, which of the following elements should be included in developing a data management plan? (i.e., **elements of a data management plan)

$ Data description (data collection and generation information) (1)
$ Existing data (existing data integration assessment) (2)
$ Format (data generation output formats) (3)
$ Metadata (standards-based information description of output data) (4)
$ Storage and backup (physical and cyber storage methods, backup, and research data recovery) (5)
$ Security (technical, procedural, confidential information, permissions, and information access protection) (6)
$ Responsibility (research project data manager(s)) (7)
$ Intellectual property rights (copyright ownership management) (8)
$ Access and sharing (resource sharing policies, procedures, and restrictions) (9)

$ Audience (primary, secondary, tertiary users of research data outputs) (10)
$ Selection and retention periods (data selection and possession policies) (11)
$ Archiving and preservation (data life-cycle management and digital preservation) (12)
$ Ethics and privacy (informed consent, privacy, and participant confidentiality protection) (13)
$ Budget (data preparation, documentation, curation, management, and preservation costs) (14)
$ Data organization (data description and representation management) (15)
$ Quality assurance (data quality, validation, security, and integrity checking during the project) (16)
$ Legal requirements (relevant local, state, federal or funder data management and sharing requirements) (17)

Q7 In your opinion, which of the following guidelines should be included in managing a data repository? (i.e., ***data repository guidelines)
$ The data producer deposits the research data in a data repository with sufficient information for others to assess the scientific and scholarly quality of the research data and compliance with disciplinary and ethical norms. (1)
$ The data producer provides the research data in formats recommended by the data repository. (2)
$ The data producer provides the research data together with the metadata requested by the data repository. (3)
$ The data repository has an explicit mission in the area of digital archiving and promulgates it. (4)
$ The data repository uses due diligence to ensure compliance with legal regulations and contracts including, when applicable, regulations governing the protection of human subjects. (5)
$ The data repository applies documented processes and procedures for managing data storage. (6)
$ The data repository has a plan for long-term preservation of digital assets. (7)
$ Archiving takes place according to explicit workflows across the data life cycle. (8)
$ The data repository assumes responsibility from the data producers for access and availability of the digital objects. (9)
$ The data repository enables the users to utilize the research data and refer to them. (10)
$ The data repository ensures integrity of the digital objects and the metadata. (11)
$ The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS (Open Archival Information System). (12)
$ The data consumer complies with access regulations set by the data repository. (13)
$ The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and research for the exchange and proper use of knowledge and information. (14)


$ The data consumer respects the applicable licenses of the data repository regarding the use of the research data. (15)

Q8 What is the primary designation of the organization where you conduct your research, practice, or teaching? (i.e., institutional/organization role)
# Commercial Organization (1)
# Government Funding Organization (2)
# Higher Education Institution (HEI) (3)
# iSchool (4)
# Non-profit Organization (5)
# Professional Association (6)
# Publisher (7)
# Research Center (8)
# Other (9)

Q9 With respect to data management and curation programs, projects, and services, how would you best describe your primary role? (i.e., individual role)
# Contractor (Consultant) (1)
# Data Curator (processes data for access) (2)
# Data Manager (manages data for discovery) (3)
# Data Producer (produces data for research) (4)
# Data User (consumes data for usefulness) (5)
# Entrepreneur (Self-Employed) (6)
# Faculty (Tenured and Non-Tenured) (7)
# Graduate Student (Doctoral, Masters, Advanced Study) (8)
# Librarian (University, Associate, Assistant) (9)
# Practitioner (Professional) (10)
# Post-doctoral (Post-doc appointment) (11)
# Program Officer (Funding Program Director, Asst. Dir., Funder) (12)
# Program Evaluator (13)
# Research Fellow (14)
# Scientist (Researcher) (15)
# Senior Leadership (Academic Dean, Provost, VP, CIO) (16)
# Other (17) ______

Q10 What is your primary discipline or domain of expertise?


APPENDIX B

PRELIMINARY STUDY IRB APPROVAL MEMORANDUM

Office of the Vice President for Research
Human Subjects Committee
Tallahassee, Florida 32306-2742

(850) 644-8673 · FAX (850) 644-4392
APPROVAL MEMORANDUM

Date: 11/02/2012

To: Plato Smith II

Address: College of Communication and Information - 142 Collegiate Loop

Dept.: INFORMATION STUDIES
From: Thomas L. Jacobson, Chair
Re: Use of Human Subjects in Research
Data Management and Curation Services Opinion Survey

The application that you submitted to this office in regard to the use of human subjects in the proposal referenced above has been reviewed by the Secretary, the Chair, and two members of the Human Subjects Committee. Your project is determined to be Expedited per 45 CFR § 46.110(7) and has been approved by an expedited review process.

The Human Subjects Committee has not evaluated your proposal for scientific merit, except to weigh the risk to the human participants and the aspects of the proposal related to potential risk and benefit. This approval does not replace any departmental or other approvals, which may be required.

If you submitted a proposed consent form with your application, the approved stamped consent form is attached to this approval notice. Only the stamped version of the consent form may be used in recruiting research subjects.

If the project has not been completed by 11/01/2013 you must request a renewal of approval for continuation of the project. As a courtesy, a renewal notice will be sent to you prior to your expiration date; however, it is your responsibility as the Principal Investigator to timely request renewal of your approval from the Committee.

You are advised that any change in protocol for this project must be reviewed and approved by the Committee prior to implementation of the proposed change in the protocol. A protocol change/amendment form is required to be submitted for approval by the Committee. In addition, federal regulations require that the Principal Investigator promptly report, in writing any unanticipated problems or adverse events involving risks to research subjects or others.


By copy of this memorandum, the chairman of your department and/or your major professor is reminded that he/she is responsible for being informed concerning research projects involving human subjects in the department, and should review protocols as often as needed to insure that the project is being conducted in compliance with our institution and with DHHS regulations.

This institution has an Assurance on file with the Office for Human Research Protection. The Assurance Number is IRB00000446.

Cc: Paul Marty, Advisor
HSC No. 2012.9198


APPENDIX C

LETTER OF INVITATION TO PARTICIPATE IN WEB-BASED SURVEY

October 31, 2012

Dear Survey Participant,

I am a doctoral graduate student under the direction of Dr. Paul Marty in the College of Communication and Information at Florida State University. As part of my work toward a doctorate degree, I am conducting a study of professionals’ opinions of various data management and curation services issues such as data management planning, data collection methods, and data preservation practices. YOU HAVE BEEN SELECTED AS PART OF A PURPOSIVE SAMPLE OF PROFESSIONALS WITH EXPERTISE AND/OR INTEREST IN RESEARCH DATA MANAGEMENT AND CURATION ISSUES. The goal of the project is to gather information about research data management and curation services to inform effective steps toward improved data management.

Your participation involves answering twelve (12) questions online via the Data Management and Curation Services Opinion Survey and should take less than 10 minutes to complete. Your involvement in the study is voluntary, and you may choose not to participate or to stop at any time. Your email address and/or personal identifying information will not be released to anyone or included as part of the final results. The results of the study may be published, but your name will not be linked to responses in publications that are released from the project. All information you provide will remain strictly confidential to the extent allowed by law.

There is no foreseeable risk to participants involved in completing this survey. Your participation in the study may lead to the development of strategies for your organization to develop digital curation practices, professional development, educational graduate programs and practices, and data management plans to address the data deluge problem across disciplines, institutions, and organizations.

If you have any questions about your rights as a subject/participant in this research, or if you feel you have been placed at risk, you can contact me, the Chair of the Human Subjects Committee, Institutional Review Board, through the Vice President for the Office of Research at (850) 644-8633, or my faculty advisor, Dr. Paul Marty, at (850) 644-5775 and/or email [email protected].

By completing the online questionnaire, you will be agreeing to participate in the above-proposed project.

Thank you in advance for your support and consideration.
Respectfully,
Plato L. Smith II
College of Information Doctoral Student, The Florida State University


APPENDIX D

PRIMARY STUDY IRB APPROVAL MEMORANDUM – PHASE 1

Office of the Vice President for Research
Human Subjects Committee
Tallahassee, Florida 32306-2742
(850) 644-8673 · FAX (850) 644-4392
APPROVAL MEMORANDUM

Date: 07/11/2013

To: Plato Smith II
Address: College of Communication and Information - 142 Collegiate Loop
Dept.: INFORMATION STUDIES

From: Thomas L. Jacobson, Chair
Re: Use of Human Subjects in Research

FLORIDA STATE UNIVERSITY DATA ASSET FRAMEWORK (DAF) SURVEY QUESTIONNAIRE

The application that you submitted to this office in regard to the use of human subjects in the proposal referenced above has been reviewed by the Secretary, the Chair, and two members of the Human Subjects Committee. Your project is determined to be Expedited per 45 CFR § 46.110(7) and has been approved by an expedited review process.

The Human Subjects Committee has not evaluated your proposal for scientific merit, except to weigh the risk to the human participants and the aspects of the proposal related to potential risk and benefit. This approval does not replace any departmental or other approvals, which may be required.

If you submitted a proposed consent form with your application, the approved stamped consent form is attached to this approval notice. Only the stamped version of the consent form may be used in recruiting research subjects.

If the project has not been completed by 07/10/2014 you must request a renewal of approval for continuation of the project. As a courtesy, a renewal notice will be sent to you prior to your expiration date; however, it is your responsibility as the Principal Investigator to timely request renewal of your approval from the Committee.

You are advised that any change in protocol for this project must be reviewed and approved by the Committee prior to implementation of the proposed change in the protocol. A protocol change/amendment form is required to be submitted for approval by the Committee. In addition, federal regulations require that the Principal Investigator promptly report, in writing any unanticipated problems or adverse events involving risks to research subjects or others.


By copy of this memorandum, the chairman of your department and/or your major professor is reminded that he/she is responsible for being informed concerning research projects involving human subjects in the department, and should review protocols as often as needed to insure that the project is being conducted in compliance with our institution and with DHHS regulations.

This institution has an Assurance on file with the Office for Human Research Protection. The Assurance Number is IRB00000446.

Cc: Paul Marty, Advisor
HSC No. 2013.3073


APPENDIX E

LETTER OF INVITATION TO PARTICIPATE IN WEB-BASED SURVEY

July 10, 2013 Dear Survey Participant,

I am a doctoral graduate student under the direction of Dr. Paul Marty in the College of Communication and Information at Florida State University. As part of my work toward a doctorate degree, I am conducting a study of researchers’ practices of various data management and curation activities such as research data planning, management, and preservation practices. You have been randomly selected to participate as part of a random sample of researchers involved with the accumulation, aggregation, and dissemination of research data. Your email address and name will not be recorded or included as part of the survey or in the survey results. The goal of the project is to gather information about research data management and curation services for contribution to strategies and research toward effective planning, management, and preservation of research data for current and future use.

Your participation involves answering 25 questions on DATA MANAGEMENT online via the Florida State University Data Asset Framework (DAF) Survey Questionnaire and should take less than 12 minutes to complete. Your involvement in the study is voluntary, and you may choose not to participate or to stop at any time. YOUR EMAIL ADDRESS, NAME, AND ANY PERSONAL IDENTIFYING INFORMATION will not be released to anyone or included as part of the final results. The results of the study may be published, but your name will not be linked to responses in publications that are released from the project. All personal identifying information will be coded to obfuscate your identity and only coded survey data responses will be published. Information you provide will remain strictly confidential to the extent allowed by law. There is no foreseeable risk to participants involved in completing this survey. Your participation in the study may lead to the development of strategies for your organization to develop data management plans, data curation practices, professional development, and educational graduate programs and practices in agreement with standards, guidelines, and best practices to address the data deluge problem across disciplines, institutions, and organizations.

If you have any questions about your rights as a subject/participant in this research, or if you feel you have been placed at risk, you can contact me, the Chair of the Human Subjects Committee, Institutional Review Board, through the Vice President for the Office of Research at (850) 644-8633, or my faculty advisor, Dr. Paul Marty, at (850) 644-5775 and/or email [email protected]. By completing the online questionnaire, you will be agreeing to participate in the above- proposed project.

Thank you in advance for your support and consideration.
Respectfully,
Plato L. Smith II
College of Information Doctoral Student, The Florida State University


APPENDIX F

DATA ASSET FRAMEWORK (DAF) SURVEY QUESTIONNAIRE

Florida State University Data Asset Framework (DAF) Survey Questionnaire

Q1 Acknowledgements: JISC, University of Glasgow Humanities Advanced Technology & Information Institute (HATII), and the Digital Curation Centre (DCC) developed The Data Asset Framework (DAF) methodology. This survey was adapted from McGowan, T. & Gibbs, T. A. (2009) Southampton Data Survey: Our Experiences & Lessons Learned [unpublished]. University of Southampton: UK; the University of the West of England (UWE), Imperial College, and University of Southampton questionnaires in the DAF Implementation Guide, October 2009.

FLORIDA STATE UNIVERSITY DATA ASSET FRAMEWORK (DAF) SURVEY QUESTIONNAIRE Thank you for participating in this survey, which aims to find out about research data held by researchers in randomly selected research labs at FSU. The purpose of this survey is to build a better understanding of research data held in your Department, to inform strategic planning for data management at Florida State University and to inform the wider FSU data management community. Within the scope of this study 'research data' is data that you currently hold that has been collected and/or used in the course of your research at the Florida State University. Research data can be primary data collected by you or your research group or secondary data provided by a third party. It may be quantitative or qualitative (e.g. survey results, interview transcripts, databases compiled from documentary sources, images or audiovisual files). Data that you 'currently hold' is all research data that you currently store anywhere (i.e., in your 'My Documents' folder, on a shared drive, a PC or laptop, cloud storage, on portable media such as CDs/DVDs, external hard drives, paper, or USB). The questionnaire is a maximum of 25 questions and should take about 12 minutes to complete. It would help my dissertation research greatly if you respond to this questionnaire 'even if you do not currently hold any research data'. Thank you for your time.

PARTICIPANT CONSENT FORM Please read the following statements carefully before agreeing to take part in this study. I have read and understand the participant information sheet (attached to the email in which you received this link). I understand that;(1) All results from this study will be anonymous. Information extracted from this questionnaire and any subsequent interview will not, under any circumstances, contain names or identifying characteristics of participants. (2) I am free to withdraw from this study at any time without penalty. (3) I am free to decline to answer particular questions. (4) Whether I participate or not there will be no effect on my progress in employment in any way. Do you consent to take part in this study on the terms described above in the Participant Consent Form? # Yes (1) # No (2)

If No Is Selected, Then Skip To End of Survey


Q2 Do you currently hold any research data? # Yes (1) # No (2)

If No Is Selected, Then Skip To End of Survey

Q3 What is your primary research role?
# Senior Researcher (1)
# Principal Investigator (2)
# Research Assistant (3)
# Research Technician (4)
# Research Support (5)
# Research Student (6)
# Other (7) ______
Q4 What is your primary disciplinary domain?
# Multidisciplinary - please specify (1) ______
# Interdisciplinary - please specify (2) ______
# Other - please specify (3)
Q5 Thinking about the primary data you hold, what type of data is your primary data? [Please select all that apply.]
$ I don't hold any primary data (1)
$ Computer code (including model or simulation source code, where it may be more important to preserve the model and associated metadata than computational data arising from the model) (2)
$ Derived (resulting from processing or combining 'raw' or other data (where care may be required to respect the rights of the owners of the raw data)) (3)
$ Experimental (scientific experiments and computational results, which may in principle be reproduced although it may in practice prove difficult or not cost-effective to do so) (4)
$ Observational (of scientific phenomena at a specific time or location where the data will usually constitute a unique and irreplaceable record) (5)
$ Reference (canonical or reference data relating for example to gene sequences, chemical structures or literary texts) (6)
$ Other (7) ______

If I don't hold any primary data Is Selected, Then Skip To Thinking about the secondary data you...
Q6 Who funded your primary data?
Q7 Thinking about the secondary data you hold, who collected this data?


Q8 Which of the following data types makes up your secondary data? [Please select all that apply]
$ Audio tapes (1)
$ Computer software source code (2)
$ Data automatically generated from or by computer programs (3)
$ Data collected from sensors or instruments (including questionnaires) (4)
$ Digital audio files (5)
$ Digital video files (6)
$ Excel sheets or equivalent presentation software (7)
$ Fieldwork data (8)
$ Images scans photos or X-rays (9)
$ Laboratory notes (10)
$ MS Access or equivalent database software (11)
$ MS PowerPoint or equivalent presentation software (12)
$ MS Word or equivalent word processing software (13)
$ Slides - physical media (14)
$ SPSS files or equivalent statistical software (15)
$ Video tapes (16)
$ Websites (17)
$ Other - please give details (18) ______
Q9 Do you allow others to access your data once the project is finished?
# Yes (1)
# No (2)

If No Is Selected, Then Skip To What are your concerns with granting ...
Q10 What are your concerns with providing access to your research data? [Please select all that apply]
$ Confidentiality or data protection issues (1)
$ License agreements prohibiting sharing (2)
$ The data is not fully documented (3)
$ The data is no longer in a format that is widely readable/accessible (4)
$ Other - provide details (5) ______


Q11 Where do you store your data (excluding back up copies)? [Please select all that apply]
$ CD/DVD (1)
$ External/commercial/web data storage facility - give details (2) ______
$ External Hard Disk (3)
$ Local computer (4)
$ My documents on research lab PC (5)
$ Paper/file records (6)
$ Technology vendor file server (7)
$ Other provided file server (e.g. by School/unit) - please specify (8) ______
$ Other - give details (9) ______
Q12 Can you estimate, as a percentage % of your research time, how much time you have lost re-organizing, re-formatting or trying to remember details about data?
# 0% (1)
# < 20% (2)
# 20% - 40% (3)
# 40% - 60% (4)
# 60% - 80% (5)
# > 80% (6)
# Not sure (7)
Q13 What are some barriers for you with regard to managing and storing your research data?
$ Budget/funding (1)
$ Infrastructure/resources (2)
$ Stakeholders (3)
$ Storage/technology (4)
$ Other - provide details (5)
Q14 Which of the following data management issues have you experienced? [Please select all that apply]
$ Finding files which may be either colleagues' or your earlier versions (e.g. problems with file names, file/folder structure) (1)
$ Locating where data files are stored e.g. on external hard drives, USB, CDs/DVDs, networked storage (2)
$ Non standard file formats which are difficult to work with in current FSU systems (3)
$ Legal issues arising from international transfer of data (4)
$ Problems establishing ownership of data (5)
$ Finding or accessing research data from former colleagues (e.g. former PhD students or research staff) (6)
$ Security and protection of files (7)
$ Other - provide details (8) ______
Q15 Have you ever been asked by a funder to produce a Data Management Plan?
# Yes - please provide details (1) ______
# No (2)
Q16 Are there any data preservation policies in place within your School (e.g. data preservation policy, data lifecycle management policy, or data disposal policy)?


# Yes - please provide details (1) ______
# No (2)
# Don't know (3)
Q17 Who is responsible for managing your research data? [Please select all that apply]
$ Project manager (1)
$ Research assistant (2)
$ Research groups (3)
$ National data center or data archive - provide details (4) ______
$ You (5)
$ Other - provide details (6) ______
Q18 What is the estimated amount of electronic research data that you currently hold / maintain?
# < 1 Gigabyte (1)
# 1 - 50 Gigabyte (2)
# 50 - 100 Gigabyte (3)
# 100 - 500 Gigabyte (4)
# 500 Gigabyte - 1 Terabyte (5)
# 1 - 50 Terabyte (6)
# 50 - 100 Terabyte (7)
# 100 Terabyte - 1 Petabyte (8)
# Don't know (9)
Q19 How do you keep track of where your data is stored and the relationships between data? [Please select all that apply]
$ In a paper logbook (1)
$ In an electronic logbook (2)
$ In a spreadsheet (3)
$ In a local database (e.g. research group) (4)
$ In a remote database (e.g. national archive/data center) (5)
$ Other - please provide details (6) ______
Q20 How long do you keep your data?
# Until the end of a project/body of work/when results are published (1)
# 1 - 5 years (2)
# 5 - 10 years (3)
# 10 - 25 years (4)
# 25 - 50 years (5)
# 50 years or more with a defined lifetime (6)
# Forever (7)
# Don't know (8)
Q21 Where do you back up your data? [Please select all that apply]
$ Another computer (1)
$ CD/DVD (2)
$ External hard disk (3)
$ My documents on research lab PC (4)
$ My own tape backup system (5)


$ Paper/file records via photocopy or similar (6)
$ School/unit - provided file server (7)
$ USB/Flash Drive/Memory Stick (8)
$ To technology vendor file server (9)
$ To technology vendor backup system (10)
$ To external/commercial/web data storage facility - give details (11) ______
$ Other - give details (12) ______
Q22 How frequently do you back up your data?
# I do not back up my data (1)
# No fixed schedule - when I remember (2)
# At the end of a project/body of work (3)
# At least annually (4)
# At least quarterly (5)
# At least monthly (6)
# After every update (7)
# Automatically via vendor solution nightly backup (retained for defined period of months) (8)
Q23 Do you use standards, best practices, and guidelines to manage your research data?
# Yes - please provide details (1) ______
# No - please provide details (2) ______
Q24 Would you and/or your research lab benefit from having a "data curator", in this context the person/organization responsible for all the activities connected with the management/curation of digital data, in particular of research data?
# Yes (1)
# Maybe (2)
# No - provide details (3) ______
Q25 Would you be willing to participate in a follow up interview to explore data management issues in more depth (max. 1 hr.)?
# Yes (please include your name and email for follow up interview) (1) ______
# No (2)
Q26 How can the University (including your School, research lab cyberinfrastructure, and campus resources such as high performance computing (HPC)) make data management and storage easier for you?


APPENDIX G

PRIMARY STUDY IRB APPROVAL MEMORANDUM – PHASE 2

The Florida State University
Office of the Vice President For Research
Human Subjects Committee
Tallahassee, Florida 32306-2742
(850) 644-8673 · FAX (850) 644-4392

APPROVAL MEMORANDUM

Date: 10/8/2013

To: Plato Smith II
Address: College of Communication and Information - 142 Collegiate Loop
Dept.: INFORMATION STUDIES
From: Thomas L. Jacobson, Chair
Re: Use of Human Subjects in Research

FLORIDA STATE UNIVERSITY DATA ASSET FRAMEWORK (DAF) INTERVIEW QUESTIONNAIRE

The application that you submitted to this office in regard to the use of human subjects in the proposal referenced above has been reviewed by the Secretary, the Chair, and one member of the Human Subjects Committee. Your project is determined to be Expedited per 45 CFR § 46.110(7) and has been approved by an expedited review process.

The Human Subjects Committee has not evaluated your proposal for scientific merit, except to weigh the risk to the human participants and the aspects of the proposal related to potential risk and benefit. This approval does not replace any departmental or other approvals, which may be required.

If you submitted a proposed consent form with your application, the approved stamped consent form is attached to this approval notice. Only the stamped version of the consent form may be used in recruiting research subjects.

If the project has not been completed by 10/7/2014 you must request a renewal of approval for continuation of the project. As a courtesy, a renewal notice will be sent to you prior to your expiration date; however, it is your responsibility as the Principal Investigator to timely request renewal of your approval from the Committee.

You are advised that any change in protocol for this project must be reviewed and approved by the Committee prior to implementation of the proposed change in the protocol. A protocol change/amendment form is required to be submitted for approval by the Committee. In addition, federal regulations require that the Principal Investigator promptly report, in writing any unanticipated problems or adverse events involving risks to research subjects or others.


By copy of this memorandum, the Chair of your department and/or your major professor is reminded that he/she is responsible for being informed concerning research projects involving human subjects in the department, and should review protocols as often as needed to insure that the project is being conducted in compliance with our institution and with DHHS regulations.

This institution has an Assurance on file with the Office for Human Research Protection. The Assurance Number is FWA00000168/IRB number IRB00000446.

Cc: Paul Marty, Advisor HSC No. 2013.11347


APPENDIX H

LETTER OF INVITATION TO PARTICIPATE IN INTERVIEW

September 19, 2013 Dear Survey Participant,

I am a doctoral graduate student under the direction of Dr. Paul Marty in the College of Communication and Information at Florida State University. As part of my work toward a doctorate degree, I am conducting a study of researchers’ practices of various data management and curation activities such as research data planning, management, and preservation practices. You have been purposively selected to participate as part of a purposive sampling of researchers involved with the accumulation, aggregation, and dissemination of research data. Your email address and name will not be recorded or included as part of the survey or in the survey results. The goal of the project is to gather information about research data management and curation services for contribution to strategies and research toward effective planning, management, and preservation of research data for current and future use.

Your participation involves answering 45 questions on DATA MANAGEMENT online via the Florida State University Data Asset Framework (DAF) Interview Questionnaire and should take less than 90 minutes to complete. Your involvement in the study is voluntary, and you may choose not to participate or to stop at any time. YOUR EMAIL ADDRESS, NAME, AND ANY PERSONAL IDENTIFYING INFORMATION will not be released to anyone or included as part of the final results. The results of the study may be published, but your name will not be linked to responses in publications that are released from the project. All personal identifying information will be coded to obfuscate your identity and only coded survey data responses will be published. Information you provide will remain strictly confidential to the extent allowed by law. There is no foreseeable risk to participants involved in completing this survey. Your participation in the study may lead to the development of strategies for your organization to develop data management plans, data curation practices, professional development, and educational graduate programs and practices in agreement with standards, guidelines, and best practices to address the data deluge problem across disciplines, institutions, and organizations.

If you have any questions about your rights as a subject/participant in this research, or if you feel you have been placed at risk, you can contact me, the Chair of the Human Subjects Committee, Institutional Review Board, through the Vice President for the Office of Research at (850) 644-8633, or my faculty advisor, Dr. Paul Marty, at (850) 644-5775 and/or email [email protected]. By completing the online questionnaire, you will be agreeing to participate in the above proposed project.

Thank you in advance for your support and consideration.
Respectfully,
Plato L. Smith II
College of Information Doctoral Student, The Florida State University


APPENDIX I

DATA ASSET FRAMEWORK (DAF) INTERVIEW QUESTIONS

Florida State University Data Asset Framework (DAF) Interview Questionnaire

Q1 Acknowledgements: The Florida State University Data Asset Framework (DAF) Interview Protocol is adapted from the University of Hertfordshire RDM Interview Protocol 6/1/2012. The additional sources contributing to the development of this interview protocol include: (1) University of Southampton generic interview schedule, (2) University of Oxford Interview Framework (from DAF Implementation Guide), (3) Data Management Plan Checklist, (4) University of Bath Postgraduate DMP template, (5) Twenty Questions for Research Data Management (Oxford DMPonline Project), (6) The DCC Curation Lifecycle Model (DCC, 2007), (7) Level Three Curation Model (Lord, 2003) from the 2003 eScience Curation Report, and (8) the Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993).

FLORIDA STATE UNIVERSITY DATA ASSET FRAMEWORK (DAF) INTERVIEW PROTOCOL Thank you for taking the time to participate in this interview. My name is Plato L. Smith II. I am a doctoral candidate in the FSU College of Communication and Information studying data management and curation. The title of my dissertation is "Exploring the Data Management and Curation (DMC) Practices of Scientists in Research Labs within a Research University". The dissertation aims to investigate current data management and curation practices in order to deliver recommendations for researchers, which will define a number of opportunities that can be used for the improvement of data management and curation throughout the lifecycle of a research project. Therefore, this interview must first examine the current data management and curation (DMC) practices and associated activities of scientists at select research labs to assess the effectiveness of data management planning (DMP). DMP includes DMC, and DMC includes data curation, digital curation, and digital preservation. For the purposes of this interview and dissertation, these concepts are defined as follows:
1. Data Management Planning [DMP] is the planning of policies for the management of data types, formats, metadata, standards, integrity, privacy, protection, confidentiality, security, intellectual property rights, dissemination, reuse/re-distribution, derivatives, archive, preservation, and access (NSF, 2011).
2. Data curation is a data lifecycle management process of providing descriptive, annotative, and representative information for research data through metadata and standards.
3. Digital curation is a data lifecycle management process of storing and managing curated research data within a platform environment.
4. Digital preservation is a data lifecycle management process of maintaining the authenticity, integrity, and security of curated research data within a platform environment.
The purpose of this interview is to find out more information about your DMP and DMC practices during the course of your research in the university environment, your experience of managing data, and what can be done to aid you and other staff in the use and management of research data. There are no right or wrong answers; this study is just interested in your perspectives, activities, and operating procedures with respect to how you store, manage, preserve, and provide access to your data. Please view this interview as a platform to voice your thoughts, ideas, and perspectives rather than simply another question and answer session.

This interview includes 6 thematic headings and several sub-questions with the goal of investigating what is important to you and your research. If there is any question that you do not understand, then please provide a comment in the corresponding text box for future explanation. If there is anything you want to ask, then there is a comment box at the end of the interview for questions. The interview questionnaire includes 45 questions and should take less than 90 minutes to complete. Your participation is significant to this study, and it will help my dissertation greatly if you answer all the questions. Thank you again for your time, participation, and support.

PARTICIPANT CONSENT FORM Please read the following statements carefully before agreeing to take part in this study. I have read and understand the participant information sheet (attached to the email in which you received this link). I understand that;(1) All results from this study will be anonymous. Information extracted from this questionnaire and any subsequent interview will not, under any circumstances, contain names or identifying characteristics of participants.(2) I am free to withdraw from this study at any time without penalty.(3) I am free to decline to answer particular questions.(4) Whether I participate or not there will be no effect on my progress in employment in any way. Do you consent to take part in this study on the terms described above in the Participant Consent Form? # Yes (1) # No (2) If No Is Selected, Then Skip To End of Survey


Q2 What is your primary research role?

Q3 What is your primary disciplinary domain?

Q4 In which research lab do you work?

Q5 Could you briefly explain your area of research and the types of research questions, with examples, that you try to answer?

Q6 I am interested in learning more about those research activities that contain some form of data management. It may be easier to do this by going through a particular research project that you carried out, and look at its "research life-cycle", from funding application, data collection & processing, all the way to publishing. Thinking of your research projects, could you select one of them as an exemplar and tell me (1) about that project, (2) the name of the project, and (3) the project outcomes?

Q7 Which of the elements of the DCC Curation Lifecycle Model are you involved in? (Please click on sections that apply).


Q8 Which of the processes from the Levels of Curation Model are you involved in? (Please click on processes that apply).

Q9 In your opinion, which of the following elements of the Adapted Conceptual Framework Model are important in conducting research within and across multiple disciplines? (Please click on the elements in the framework image that apply.)


Q10 How does your discipline look at and understand reality? (i.e., What are the core ontological suppositions underlying its frame of reference?)

Q11 How does your discipline learn about reality? (i.e., What are the basic epistemological stances for its frame of reference?)

Q12 What concepts, methods, theories, and practices do you use to address 'research data management' in your discipline?

Q13 What agency funded your exemplar project?

Q14 Was a data management plan required by the funding agency at the application stage?
# Yes (1)
# No (2)
If Yes Is Selected, Then Skip To If YES, (1) how did you develop the d...
If No Is Selected, Then Skip To If NO, did you have any thought or in...

Q15 If YES, (1) how did you develop the data management plan, (2) from what resources did you seek help, and (3) what was included in your data management plan?

Q16 If NO, did you have any thought or informal plan for managing data at the application stage?


Q17 How much did formal or informal initial data management planning actually influence your data management practice?

Q18 Thinking of your exemplar research project, please describe the nature (range, scope, origin) of your research data and the process by which you capture & create new data.

Q19 Thinking of your exemplar research project, please describe your experience of any difficulty that you encountered during the data collection stage.

Q20 What are the format(s) of your research data in the short term after acquisition?

Q21 Where do you store your data in the short term after acquisition?

Q22 How much data do you generate or expect to generate during the life-cycle of the research project?

Q23 How often do you structure and name your folders and files?

Q24 What type of data about your data (metadata) do you record? What type of metadata standards do you use for your data?

Q25 How will you back-up the data during the project's lifetime and what is the frequency? If you do not back-up your data, then please provide your reasons.

Q26 Please describe your experiences with data loss, formatting, or file size issues.

Q27 Who owns your research data?

Q28 Who needed access to the data? How did you share and provide access to the data?

Q29 Thinking of security and confidentiality of your data, what security measures have been taken to preserve security and integrity of your data?

Q30 How will you publish and provide public or open access to your data? If not, why?

Q31 What type of data about your data (metadata) do you record? What type of metadata standards do you use for your data?

Q32 How will you back-up the data during the project's lifetime and what is the frequency? If you do not back-up your data, then please provide your reasons.

Q33 Please describe your experiences with data loss, formatting, or file size issues.

Q34 Please describe the stewardship of any raw data after the lifetime of the project, such as when data is archived for long-term preservation and when the files migrate into an archive.


Q35 What should be (or was) archived beyond the end of your project? Who makes this decision?

Q36 How long should the exemplar research project data be stored, managed, and preserved?

Q37 Are there any other challenges, concerns, and barriers in managing your research data that have not been addressed? If so, then please describe them and the services that would help you deal more effectively with those issues.

Q38 Please rank which of the following storage devices you use for storing your research data (please rank them from 1 to 6, where 1 is the one that you use the least and 6 is the one you use the most):
______ USB or external hard drive (1)
______ My personal network storage (2)
______ Your departmental shared area (3)
______ Dropbox or other cloud based storage (4)
______ Laptop/PC (5)
______ Other (Please specify) (6)

Q39 How would you describe the security of your research data?

Q40 How would you describe the effectiveness of the university infrastructure in managing your research data?


Q41 On the scale of 1-3, please rate your level of confidence and awareness of the following research data management (RDM) and data management plan (DMP) matters:
Scale: Not very confident (1) / Confident (2) / Very confident (3)
Consideration of the RDM requirements/costs at the bidding stage (1)
Expertise in developing DMP for bidding applications (2)
Fulfilling research grant RDM obligations (3)
Awareness of university RDM facilities and DMP policies (4)
Ease of access to the research data during the life-time of the research project by yourself (5)
Ease of access to the research data by collaborators (6)
Re-accessing the data after the project life-time (7)

Q42 Did you find this interview useful? Was it helpful in increasing your awareness of my project and the requirements of RDM? # Yes (1) # No (2)

Q43 As a result of your participation in this interview, do you think you will re-evaluate your RDM practice?
# Yes (1)
# No (2)
Q44 Do you have any questions regarding this interview?
Q45 What comments, feedback, or suggestions do you have to help improve future research data management interviews?


APPENDIX J

DATA ASSET FRAMEWORK (DAF) INTERVIEW TRANSCRIPTS

My Report Last Modified: 03/04/2014 1. Acknowledgements: The Florida State University Data Asset Framework (DAF) Interview Protocol is adapted from the University of Hertfordshire RDM Interview Protocol 6/1/2012. The additional sources contributing to the development of this interview protocol include: (1) University of South Hampton generic interview schedule, (2) University of Oxford Interview Framework (from DAF Implementation Guide), (3) Data Management Plan Checklist, (4) University of Bath Postgraduate DMP template, (5) Twenty Questions for Research Data Management (Oxford DMPonline Project), (6) The DCC Curation Lifecycle Model (DCC, 2007), (7) Level Three Curation Model (Lord, 2003) from the 2003 eScience Curation Report, and (8) the Conceptual Framework for Analyzing Methodological Suppositions (Burrell & Morgan, 1979; Morgan & Smircich, 1980; Morgan, 1983; Solem, 1993). FLORIDA STATE UNIVERSITY DATA ASSET FRAMEWORK (DAF) INTERVIEW PROTOCOL Thank you for taking the time to participate in this interview. My name is Plato L. Smith II. I am a doctoral candidate in the FSU College of Communication and Information studying data management and curation. The title of my dissertation is "Exploring the Data Management and Curation (DMC) Practices of Scientists in Research Labs within a Research University". The dissertation aims to investigate current data management and curation practices in order to deliver recommendations for researchers, which will define a number of opportunities that can be used for the improvement of data management and curation throughout the lifecycle of a research project. Therefore, this interview must first examine the current data management and curation (DMC) practices and associated activities of scientists at select research labs to assessment the effectiveness of data management planning (DMP). DMP includes DMC and DMC includes data curation, digital curation, and digital preservation. For purposes of this interview and dissertation these concepts are defined as follows: 1. Data Management Planning [DMP] is the planning of policies for the management of data types, formats, metadata, standards, integrity, privacy, protection, confidentiality, security, intellectual property rights, dissemination, reuse/re-distribution, derivatives, archive, preservation, and access (NSF, 2011). 2. Data curation is a data lifecycle management process of providing descriptive, annotative, and representative information for research data through metadata and standards. 3. Digital curation is a data lifecycle management process of storing, managing, and storing curated research data within a platform environment. 4. Digital preservation is a data lifecycle management

200! ! ! ! ! ! process of maintaining the authenticity, integrity, and security of curated research data within a platform environment. The purpose of this interview is to find out more information about your DMP and DMC practices during the course of your research in the university environment, your experience of managing data and what can be done to aid you and other staff in the use and management of research data. There are no right or wrong answers; this study is just interested in your perspectives, activities, and operating procedures with respect to how you store, manage, preserve, and provide access to your data. Please view this interview as a platform to voice your thoughts, ideas, and perspectives rather than simply another questionnaire. This interview includes 6 thematic headings and several sub-questions with the goal of investigating what is important to you and your research. If there is any question that you do not understand than please provide comment in the corresponding text box for future explanation. If there is anything you want to ask, then there is a comment box at the end of the interview for questions. The interview questionnaire includes 40 questions and should take less than 60 minutes to complete. Your participation is significant to this study and it will help my dissertation greatly if you answer all the questions. Thank you again for your time, participation, and support. PARTICIPANT CONSENT FORM Please read the following statements carefully before agreeing to take part in this study. I have read and understand the participant information sheet (attached to the email in which you received this link). I understand that; (1) All results from this study will be anonymous. Information extracted from this questionnaire and any subsequent interview will not, under any circumstances, contain names or identifying characteristics of participants. (2) I am free to withdraw from this study at any time without penalty. (3) I am free to decline to answer particular questions. (4) Whether I participate or not there will be no effect on my progress in employment in any way. Do you consent to take part in this study on the terms described above in the Participant Consent Form? # Answer Response %

1 Yes 7 100%

2 No 0 0%

Total 7 100%

2. What is your primary research role?
Text Response
Software application support
Data management
Faculty member at an institution of higher education
Data stewardship
PI and Senior Collaborator
Principal investigator
Research professor, Principal Investigator


Statistic Value
Total Responses 7

3. What is your primary disciplinary domain?
Text Response
Computer Science
Biology and oceanography
Boundary-layer Meteorology and Biogeochemical cycles of water and carbon
Meteorology
Condensed Matter Physics
Materials science and physics
Marine ecology, fisheries science

Statistic Value
Total Responses 7

4. Which research lab do you work?
Text Response
International Ocean Discovery Program
MBL and ASU
This question in unclear to me. I have my own research group called the 'Biomicrometeorology Group'.
Center for Ocean-Atmospheric Prediction Studies
National High Magnetic Field Lab, FSU
National High Magnetic Field Laboratory
FSU Coastal and Marine Lab

Statistic Value
Total Responses 7


5. Could you briefly explain your area of research and the types of research questions, with examples, that you try to answer? Text Response I don't do research per se, I support research done by scientists. My "research" is concerned with technology and techniques for efficient data management and dissemination. Understanding plankton dynamics. Understanding ways to manage data better. I am interested in the breathing of land-based ecosystems. The breathing entails the 'How? (physical transport paths)' , the 'How much? (seasonal to inter annual budgets of carbon, water, and energy exchange between the atmosphere and the vegetation), and the 'How can we best observe it ? (Innovating observational techniques to studying the breathing of ecosystems). Examples of concrete questions are: - What generates atmospheric turbulence and transport in forested ecosystems that are prone to weak winds and spatial heterogeneity? - What drives the seasonal and interannual variability of ecosystem carbon, water, and energy fluxes, and how is it impacted by climate change? - How do Pacific Northwest forests respond to changes in drought severity, duration, and timing? - How can we develop better environmental sensor networks to account for the spatial heterogeneity of our natural environment? My primary area of research is marine climatology. I seek to identify changes in the atmospheric and near surface ocean conditions in the global oceans. I also seek to understand the mechanisms of exchange of heat, moisture, and momentum across the air-sea interface. Both of these efforts require collection, analysis, and stewardship of a wide array of in-situ and remotely-sensed atmospheric and ocean parameters in the marine environment. Finally, I work (with my colleagues) to improve observation of the marine environment through improvements in in-situ and remotely-sensed data collection. I work with several different areas involving novel superconductivity, low dimensional conductors and spin systems at high magnetic fields and low temperatures. We are attempting to understand the role of magnetism in materials. Our group work in the area of materials characterization and development. We measure physical properties of materials used for constructing high field magnets. These measurements are typically performed at cryogenic temperatures. The properties we are interested in are, electrical conductivity, thermal conductivity, specific heat, magnetization, and critical current of superconductors. I study the biology and ecology of marine fishes. I have a great many active projects with dozens of research questions. These include studies of movements, migration and habitat use using tagging, acoustic telemetry and satellite telemetry, studies of population dynamics and community structure using fishery-independent sampling, studies of trophic ecology using diet and stable isotope analyses, life history studies (age, growth, reproduction), and studies of taxonomy and phylogenetics.

Statistic Value
Total Responses 7

6. I am interested in learning more about those research activities that contain some form of data management. It may be easier to do this by going through a particular research project that you carried out, and look at its "research life-cycle", from funding application, data collection & processing, all the way to publishing. Thinking of your research projects,

203! ! ! ! ! ! could you select one of them as an exemplar and tell me (1) about that project, (2) the name of the project, and (3) the project outcomes? Text Response My group is constantly involved in data management. We create and support the applications used on the JOIDES Resolution research vessel. Data is gathered from over 50 instruments in real time; including physical properties, chemistry, and visual observations. The information is organized, stored, analyzed and made available to scientists in various tabular and graphic forms. Long term the data is brought back to shore and accumulated with other expeditions in a large repository. All data is organized into specific file formats and transmitted to NGDC for long term storage. I did a project wherein I exposed oysters to toxic algae in the laboratory to see 1) if they ate the algae and 2) if they accumulated toxin in their tissues. The outcome of the project was that the oysters rejected the algae as pseudofeces. They did not ingest it. They did not accumulate toxin. A paper and two posters were made from this research. The actual data are in the form of printed photos and handwritten data sheets. Some calculations are in excel. Some of the photo prints are in FigShare. 1) This project studies the ecosystem carbon and water exchange at 3 location in the Pacific Northwest and is part of the AmeriFlux network. It requires year-round observations of atmospheric exchange of momentum, mass, and energy, the observations have been carried out continuously since 2002 at one location, and since 2006 at the other two locations. 2) Title: "The effects of disturbance and climate on carbon storage and the exchanges of carbon dioxide, water vapor and energy of coniferous forests in the Pacific Northwest: integration of measurements at a cluster of super sites", 3) We have published many papers resulting from this study; The main paper summarizing the first 8 years of observations found that i) the seasonal hydrology is the main driver of ecosystem carbon uptake, ii) drought plays a significant role in the carbon sequestration potential at a site, iii) only by proposing a novel concept called "Hydro-ecoogical years' we were able to delineate functional seasonality in an ecologically meaningful and explain the inter annual and intraannual variability, this concept takes the plants' perspective, rather than the human perspective. The project is known as the Shipboard Automated Meteorological and Oceanographic Systems (SAMOS) initiative. The initiative aims to improve the quality of meteorological and near- surface oceanographic observations collected in-situ on research vessels (R/Vs). Scientific objectives of SAMOS include (1) Creating quality estimates of the heat, moisture, momentum, and radiation fluxes at the air-sea interface, (2) improving our understanding of the biases and uncertainties in global air-sea fluxes, (3) benchmarking new satellite and model products, and (4) providing high quality observations to support modeling activities (e.g., reanalysis) and global climate programs. The initiative developed procedures to recruit research vessel operators that were willing to routinely submit 1-minute averaged navigational, meteorological, and oceanographic observations to our data center. The observations undergo common data formatting, quality assessment and quality control, and metadata augmentation prior to being distributed to the user community. 
Outcomes are quality evaluated meteorology and oceanographic observations that are delivered to the research and operational communities and ultimately submitted to the National Oceanographic Data Center for long term archival and preservation. Projects are on going and not easy to separate out. I will pick one that involved a material showing a novel superconducting state. This project did not have a specific title but I will call it


"Search for the FFLO state in a heavy fermion superconductor". This involved working with several research groups to look for this proposed transition. The outcome was a paper published in Nature. Project: Verification tests for ITER TF strands ITER is a multi-billion dollar international nuclear fusion project. Large quantity of superconductor wires are needed for the project. Our role in the project is to do the quality control test of Nb3Sn superconductor wires made by US manufacturers. Outcome: The deliverable is the Nb3Sn wire property data measured at both room temperature and cryogenic temperatures. Smalltooth sawfish are the only native marine fish listed on the US Endangered SPecies Act. As part of the listing, critical habitat must be designated. We conducted a study to examine activity space, habitat use and migration patterns of juvenile sawfish in one of the only known nursery areas in the U.S. We tracke juvenile sawfish using active and passive telemetry and determined that they have very small, but mobile daily activity spaces, the use shallow muddy habitats associated with red mangroves, and they remain in the backcountry habitats for much of the first year of like, later moving into adjacent shallow bays for up to two years.

Statistic Value
Total Responses 7


7. Which of the elements of the DCC Curation Lifecycle Model are you involved? (Please click on sections that apply).


Region Response %
Conceptualize - DMP 6 100%
Data - DMP 10 167%
Curate - Data Curation 4 67%
Community - DMP - 0%
Preserve - Digital Curation 1 17%
Appraise - Digital Curation 6 100%
Store - Digital Preservation 6 100%
Transform - Digital Curation 5 83%
Access, Use & Reuse - DC 4 67%
Dispose - DMP - 0%
Other - 0%

Statistic Value
Total Responses 6


8. Which of the processes from the Levels of Curation Model are you involved? (Please click on processes that apply).

Region Response %
Research Process – Data Curation 30 429%
Curation – Digital Curation 7 100%
Curation – Digital Preservation 8 114%
Publication - Access 10 143%
Other – Library – Peers – Public – Industry 1 14%

Statistic Value
Total Responses 7


9. In your opinion, which of the following elements of the Adapted Conceptual Framework Model are important in conducting research within and across multiple disciplines? (Please click on the elements in the framework image that apply.)

Region Response %
Region #1 - Ontology 3 43%
Region #2 - Epistemology 1 14%
Region #3 – Frame of Reference 4 57%
Region #4 - Practice 3 43%
Region #5 - Theory 2 29%
Region #6 - Concepts 6 86%
Region #7 - Methods 6 86%
Region #8 - Problem 3 43%
Other - 0%


Statistic Value Total Responses 7

10. How does your discipline look at and understand reality? (i.e., What are the core ontological suppositions underlying its frame of reference?)

Text Response
- Mostly geological based: all measurements are of a physical entity
- I'm not sure how to answer this....Most of the data in my domain are spatio-temporally organized.
- It analyzes reality by making observations, i.e., seeing the natural environment through sensors targeting specific characteristics of the environment, such as heat, water, and carbon content, light availability, etc.
- In meteorology, we seek patterns in a chaotic system. Through organization and classification, patterns emerge that subsequently support understanding of underlying physical relationships. Over time enough knowledge is gathered to support causal relationships that lead to our ability to predict future outcomes based on past pattern recognition.
- Not sure how to reply to this question.
- Physics is the study of reality so a supposition that there is an objective reality is the core of the discipline. We use experimental methods to reveal the true reality.

Statistic Value Total Responses 6

11. How does your discipline learn about reality? (i.e., What are the basic epistemological stances for its frame of reference?)

Text Response
- I don't participate in this.
- I'm not sure how to answer this....My discipline uses numerical models, field sampling and controlled experiments to learn about reality.
- I guess from the sensors it employs
- Primarily through observation and modeling. Meteorology is based on the physical observation of our world. Through observation patterns emerge that support the development of conceptual models for atmospheric systems (e.g., Bjerknes (sp?) cyclone model of the early 1900s). As our knowledge has grown we have been able to connect observations to physical, chemical, and mathematical concepts that in turn have led to the numerical (computer) modeling that dominates the field today. Through a series of observations and numerical trials, we gain a better understanding of the realities of our atmosphere.
- Experiments and observations
- Via carefully controlled experiment.

Statistic Value Total Responses 6


12. What concepts, methods, theories, and practices do you use to address "research data management" in your discipline?

Text Response
- Capture of original measurement information and metadata about the process and tools for QAQC purposes. Lifespan management of information, including versions and change history as corrections, additions or analysis occurs.
- At this point, mostly having good documentation and putting data in a repository or making it discoverable and citable in some way. That's about as much as one can do.
- We collect many gigabytes of observations every month since studying atmospheric transport requires very fast-response sensors collecting observations 20 to 50 times per second (20 to 50 Hz). We archive these 'raw' data streams, and process and analyze them to produce secondary and tertiary data products. Managing the data servers, making sure data doesn't get lost, can be served to the larger research group, documenting the data well enough to be useful to others, and sharing the secondary and tertiary data with other external users is key. I am not familiar with any specific theories or concept for research data management, it all evolved as best practice out of experience.
- Not quite clear what you are seeking with this question. I work with research vessel operators to ensure that they are using appropriate instrumentation to measure the quantities desired by the marine science community. We ensure that they have best practices for instrument exposure on the vessel and seek to collect sufficient metadata to understand the observations being made. There is very little theory involved.
- Basically there is no funding or reward for such an effort so data is only held or managed until publication. Other than that it is up to individual PIs as to how valuable data is for future research and if it is worth effort to manage.
- No theories or systematic approach. Only manually sorted and saved electronically, and backed up regularly.

Statistic Value Total Responses 6

13. What agency funded your exemplar research project?

Text Response
- NSF
- I think Louisiana Sea Grant. That was long ago.
- DOE
- NOAA Climate Observing Division
- NSF
- US Department of Energy

Statistic Value Total Responses 6


14. Was a data management plan required by the funding agency at the application stage? # Answer Response %

1 Yes 3 50%

2 No 3 50%

Total 6 100%

15. If YES, (1) how did you develop the data management plan, (2) what resources did you use to seek help, and (3) what was included in your data management plan?

Text Response
- Data Management Plan has been in place for over 10 years. We added to it for the current funding / application cycle. Was done internally. Includes both initial data capture planning, retention, providing back to the community and long term archival.
- 1,2) We were required to share the secondary data with the data network servers within one year of completion of an annual data set. We followed the networks' data submission guidelines and also outlined our strategy. 3) An outline of how the primary data is stored, backed up, and processed to produce the secondary data sets.
- I discussed current practices with facility heads and drafted a statement covering current practices.

Statistic Value Total Responses 3

16. If NO, did you have any thought or informal plan for managing data at the application stage?

Text Response
- Not other than making back ups.
- Absolutely. The entire proposal was a data management plan.
- I did. The data are organized in electronic form for easy retrieval. Data are controlled by granting limited access. Data are backed up periodically.

Statistic Value Total Responses 3


17. How much did formal or informal initial data management planning actually influence your data management practice?

Text Response
- A lot.
- Not at all
- Not much, we had a system in place beforehand, so it was more describing of what resources we already had available.
- In most every way. Clearly, we have deviated from the original proposed plan over the past 10 years as we have learned about best practices in other data management groups, but the proposed plan was the blueprint for our present operations.
- very little. Data management planning was drafted to fit current practice.
- We only have an informal data management plan. No one in our group has formal training on data management.

Statistic Value Total Responses 6


18. Thinking of your exemplar research project, please describe the nature (range, scope, origin) of your research data and the process by which you capture & create new data.

Text Response
- We don't have an exemplar research project, we support many research projects from a common laboratory basis. Anywhere from 4-6 primary projects a year involving dozens of different scientists for each project. The process by which data is captured and maintained continues to evolve and mature as scientific needs change.
- That particular project was just lab experiments. There were data sheets with a measurement taken at given time intervals. There were samples that were analyzed via microscopy. There were data gathered from the oysters at the end.
- High-frequency turbulence observations (20 Hz) of wind speeds, carbon dioxide, water vapor, methane concentrations, and air temperature, collected from fast-response environmental sensors in multiple locations per location. Data were recorded with onsite data loggers and primary data manually harvested, while summary statistics were harvested via cell phone remote links and displayed on webpages automatically. Once on the data server on campus, the primary data are then processed using statistical tools, which produce the secondary data sources (ecosystem fluxes). Those are then further aggregated into tertiary data products, which typically are contained in publications.
- SAMOS data are typically derived from a computerized data logging system that continuously records navigational (ship position, course, speed, and heading), meteorological (winds, air temperature, pressure, moisture, rainfall, and radiation), and near-surface oceanographic (sea temperature, conductivity, and salinity) parameters from underway research vessels. Measurements are recorded at high-temporal sampling rates (typically 1 minute or less). A SAMOS comprises scientific instrumentation deployed by the research vessel operator and typically differs from instruments provided by national meteorological services for routine marine weather reports. Presently we have 32 vessels recruited and each delivers their data via daily (containing all 1440 1-min records from the previous day) ship-to-shore emails.
- My data is measurements of low temperature heat capacity data as ascii files. These are captured by standard data acquisition software.
- Our data are collected at a steady pace. The data include the raw data which include arrays of numbers and images directly coming from our testing instruments, as well as the processed data. We are required to provide weekly updates of the database to our funding agency.

Statistic Value Total Responses 6


19. Thinking of your exemplar research project, please describe any difficulties that you encountered during the data collection stage.

Text Response
- Our biggest issue was dealing with change and the rate of change. With different major science objectives every two months, we often have to adapt the capture and management system to new types of instruments and analysis on a very short time frame. This was part of what drove the design of our current management system.
- There were issues trying to figure out how we were going to capture and express rejection of the algae. The experiments were difficult and had to be repeated many times over several years. This made comparison difficult.
- Power failure leading to data loss; insufficient or incomplete documentation of sensor information, exchanges, collection schedule / protocol; fast turnover of students/technicians leading to different data harvesting schedules and protocols.
- Primary difficulty is in the recruitment of new vessels. There is a learning curve on the part of the operator to organize their data into a suitable format for our daily email deliveries. Some software development is likely needed on the part of the operator or they must become familiar with existing software to submit SAMOS records. So basically, most of the difficulties are technical.
- Not in managing data. Collecting data is easy once the experiment functions.
- Sometimes, the data quality is poor. The measurement has to be repeated.

Statistic Value Total Responses 6

20. What are the format(s) of your research data in the short term after acquisition?

Text Response
- Relational data base - Oracle (both during and after acquisition)
- Hand-written data sheets, excel files and photos
- Initially in binary numerical format, then converted to ascii-readable floating point 4-byte numbers
- Key: value paired ASCII files
- Ascii text files and Igor data processing files.
- In either ascii text format or JPEG image format.

Statistic Value Total Responses 6


21. Where do you store your data in the short term after acquisition?

Text Response
- Relational data base - Oracle (both during and after acquisition)
- Filing cabinet and PC
- On Compact flash cards, then converted on laptop and stored to HDD, then transferred to data server
- On Linux servers at COAPS
- My desktop backed up by local external disk and backed up by lab wide server.
- In a hard disk of a local computer, or in a network drive.

Statistic Value Total Responses 6

22. How much data do you generate or expect to generate during the life-cycle of the research project?

Text Response
- Each two month expedition generates around 1 TB of data. This is raw capture level information, it grows by roughly 20% in the following year as post-capture analysis information is added to the system.
- Usually less than 1GB
- In one year, we produce about 160 GB of compressed, primary data for 6 sensor systems deployed at 3 locations.
- Presently we have ~100 GB of data. This is an operational project, so the data continue to grow each day. The life-cycle of the project is unknown at present, as it depends on continual funding.
- couple of hundred megabytes.
- Small amount. 10 GB, and about 10,000 files.

Statistic Value Total Responses 6

23. How often do you structure and name your folders and files?

Text Response
- We don't. Files are cataloged and maintained in a digital file catalog cross-referenced in the Oracle database associated with the measurements. A backup copy of files is kept as copied from each instrument system at the end of each two month expedition.
- Just once in the beginning
- Every time we download them we follow a very strict naming protocol.
- Continuously. We have a fully automated file and folder naming system.
- I don't restructure often, but create structure as data is acquired. Usually by material studied, measurement type and date.
- more than 20 new folders will be created under a predetermined scheme every day.

Statistic Value Total Responses 6


24. What type of data about your data (metadata) do you record? What type of metadata standards do you use for your data?

Text Response
- As much as possible. We record how, who, when, where, raw instrument readings, calibration information for the instrument and final computed results. This is not recorded in standards such as Dublin core, but collected within the relational database as discrete information.
- At the time, none, now I am more careful about such things.
- Usually a time and space component along with method information. There are no specific standards we adhere to, we have online searchable logs for field / data harvesting activities, in which we document all changes/ downloads etc.
- We collect a wide range of metadata related to the individual ships, instrumentation, and observations. These include, but are not limited to instrument make and model, location on the vessel, units of measurement, calibration date, sampling rate, etc. For more details, look at section 4 of our most recent annual data report: http://samos.coaps.fsu.edu/html/docs/2012SAMOSAnnualReport_final.pdf
- lab note book covering power levels, material, masses, connections, drive voltages, configuration of leads etc. Some metadata is recorded as free text in data files and/or in the Igor data analysis program as the data is collected.
- Weekly update, so we have data on the number of newly created folders every week.

Statistic Value Total Responses 6

25. How will you back-up the data during the project's lifetime and what is the frequency? If you do not back-up your data, then please provide your reasons.

Text Response
- Backups on the ship are tape based, daily incremental and weekly full with a rotating 6 week cycle. Backups on shore are the same rotation and monthly one set of full backup tapes is moved offsite.
- I have not done anything with this data in ages. The paper was published, so I haven't paid much attention to it.
- i) Backed up automatically via contracting resources provided by the college, and ii) using our own RAID system and via Time Machine.
- The data are backed up locally by using Raid storage disks. This ensures integrity of our data storage system. We also routinely submit (monthly) the data to the National Oceanographic Data Center. Finally, we run a daily sync to an offsite location - the National Center for Atmospheric Research.
- hourly to local external hard disk and weekly to lab wide server.
- We do a daily back-up of the data to our network drive. In addition, we back up data to a local hard disk once per month.

Statistic Value Total Responses 6


26. Please describe your experiences with data loss, formatting, or file size issues.

Text Response
- We have had to restore sections of information from tape multiple times. Usually due to human error. Formatting is mostly for report output from the database and those reports continue to evolve with science needs. File size is a continuing discussion of how we are going to continue to support the digital repository at the rate of 5 TB a year growth. However we have a planned capacity in excess of 10 years growth, which is our current contract.
- Personally, I haven't lost much data. There are data sheets in previous labs I no longer have access to. The most common issue is having to toss samples for lack of space or having a culture die or something like that.
- Never had any issues since the college takes backups off site, and the RAID system has enough redundancy to prevent data loss. So far we have been lucky… File sizes aren't an issue since we keep them small intentionally.
- Data loss has not been a problem to date (knock on wood). Since we enforce a common data submission format, incoming data are in a known format. Problems primarily occur when the operator fails to include an essential component of the format, or includes an unexpected parameter without informing us of the change. On rare occasions, ship-to-shore communication errors can garble data files, resulting in data loss. File size is not an issue as our granules are small (~200kbyte/day) and we have a Petabyte storage system at COAPS.
- Have been able to handle all file formats, back to 3 1/2 inch floppies. Have some 5 1/4 inch floppies but I believe I transferred all data before losing ability to read them. Have transferred and discarded all data from older formats, 9 track tape, 8 inch floppies, punch cards, paper tape, Jaz disks. All primary data formats are ascii as the most durable format.
- We have issues with the length of the file name. Since we have multiple levels of folders/sub-folders, the total length of the file name can be longer than 255 characters. So we had to compromise the clarity to make the file name less than 255 characters.

Statistic Value Total Responses 6

27. Who owns your research data?

Text Response
- NSF
- I suppose I do. I'm sure the institutions where I work do, technically, but none of my research will yield a profit, so they don't care.
- the funding agency and the PI
- Good question. We consider the data to be public access and distribute them with no restrictions or holds. I would assume that the original data are "owned" by the originating vessel and the subsequent reformatted and quality processed data are covered by FSU intellectual property rules. In my world, data ownership is not a concern - we want everyone to benefit from them.
- good question. At the moment we state the PI but the NSF seems to be changing the rules.
- Our funding agency, I suppose.

Statistic Value Total Responses 6


28. Who needed access to the data? How did you share and provide access to the data?

Text Response
- Access is initially provided to only the scientists who participated in the expedition. This is called a moratorium access and the data is only available to them, no matter where they are in the world. After moratorium (typically one year) the information becomes publicly available and is used by scientists worldwide. Data access is provided via reporting tools available on the web.
- No one, actually.
- My own group, research collaborators on campus, and the larger network (AmeriFlux)
- Our users range from atmospheric and ocean modelers, researchers developing and deploying satellites for space-based ocean observation, and a wide range of marine climatologists. Tracking data use in an open distribution model is nearly impossible. Data are distributed via web pages, ftp services, and a THREDDS catalog server. Very rarely we will provide data on digital media for special requests. Also the data are reserved by NODC after submission to that archive.
- My collaborators and myself. Through web downloads, CDs, flash drives and direct network copies.
- Our group members need access to the data, because we are constantly updating them. The project funding agency will also need access to the data. Our group members have full access to the data (including editing and deleting rights). The processed data are submitted to the funding agency on a regular basis. The raw data are available to the funding agency upon request.

Statistic Value Total Responses 6

29. Thinking of security and confidentiality of your data, what security measures have been taken to preserve security and integrity of your data?

Text Response
- The ability to add / change data is restricted to a control account that is not used by any person. It is managed via web services, which authenticate the user with their personal credentials, but use the control account to modify the database. Audit records are kept of changes made through these services and who did them.
- None
- The primary data is of very little use to the broader research community, since it's hard to read and understand without knowing exactly how and where it was collected. We do not employ any encryption or so. We limit access to our data servers via user network management.
- Again the data are public access, so confidentiality is not a concern. We do house the data within our COAPS servers which are protected by a range of Linux security protocols. In fact, I forgot to mention earlier that the public copy of the data (the one on our public FTP server) is a copy of the final data, which is stored within our firewall. So if the public copy was in some way corrupted (less security in the ftp location) it could be restored from our more secure internal copy.
- backups.
- Only our group members have access to the data, which is stored in the network drive.

Statistic Value Total Responses 6


30. How will you publish and provide public or open access to your data? If not, why?

Text Response
- Done via the web.
- I have put some of it on FigShare
- The secondary data is required by the funding agency to be made publicly available, and we have a fair-use policy in place. See above.
- Our data are public access.
- Not to the original data since without the complicated curation of the original meta data it is incomprehensible and useless. There is no forum or funding for such curation of original data in our field.
- The data may be published with consent of the funding agency.

Statistic Value Total Responses 6

31. Please describe the stewardship of any raw data after the lifetime of the project, such as when data is archived for long-term preservation and when the files migrate into an archive.

Text Response
- All data captured is formatted per NGDC requirements and delivered to them for long term maintenance and archival.
- I put some stuff from old projects on FigShare, but that's it.
- We will hold on to it as long as we can and make it available to anyone who has a sincere scientific interest and is aware of authorship and ownership.
- Every original data file received from a vessel operator is preserved and included in our monthly submissions to the National Oceanographic Data Center (NODC).
- None
- We have not yet discussed this with the funding agency. I would like to have suggestions from experts at FSU.

Statistic Value Total Responses 6


32. What should be (or was) archived beyond the end of your project? Who makes this decision?

Text Response
- All data we capture per our NSF contract is archived as part of our agreement. So I guess NSF made the decision.
- I would like to archive everything, but there are only 24 hours in a day.
- We archive everything from primary to tertiary data. We, the PIs of the project make that decision. A copy of the secondary data lives with the network servers, so they decide what happens with this data.
- As I noted, the project is ongoing. Myself and our data team in consultation with the archival experts at NODC made decisions regarding what to archive. We have a written submission agreement with NODC that outlines these procedures.
- No one is going to try to reanalyze my data so there is only my personal archiving of the data. I make this decision myself.
- The funding agency and the PI should make the decision. It is desirable that the data be preserved indefinitely beyond the end of the project.

Statistic Value Total Responses 6

33. How long should the exemplar research project data be stored, managed, and preserved?

Text Response
- See above. Length of time to be held in NGDC is as long as that Data Center exists.
- Ideally, forever....
- It's a continuing long-term ecological study, so the horizon is 30+ years?
- We submit data to NODC because they are a 100+ year archive. Preservation is their responsibility, but the data will have value for 100s of years.
- Until I retire.
- Indefinitely

Statistic Value Total Responses 6


34. Are there any other challenges, concerns, and barriers in managing your research data that have not been addressed? If so, then please describe them and the services that would help you deal more effectively with those issues.

Text Response
- I think it all comes down to time and money. A service that would do this for you would help.
- Serving large primary data sets to outside users is a challenge. Proper documentation of primary and secondary data is still a challenge, since personnel turns over so fast.
- The number one challenge is resources. Research data management always takes a back seat to research data collection. The work of a data management center is meticulous and requires personnel with very strong computer skills. It is very difficult to find, employ, and retain talented database managers and system architects in a university system (with its limited compensation system). Securing both external and institutional support for research data management continues to be a challenge.
- The data is in a structure that is difficult to query, unlike in a database software.

Statistic Value Total Responses 4

35. Please rank which of the following storage devices you use for storing your research data (please rank them from 1 to 6, where 1 is the one that you use the least and 6 is the one you use the most):

# | Answer | 1 | 2 | 3 | 4 | 5 | 6 | Total Responses
1 | USB or external hard drive | 0 | 1 | 1 | 1 | 2 | 0 | 5
2 | My personal network storage | 0 | 0 | 1 | 1 | 1 | 0 | 4
3 | Your departmental shared area | 2 | 1 | 0 | 1 | 0 | 1 | 6
4 | Dropbox or other cloud based storage | 0 | 0 | 1 | 1 | 1 | 0 | 4
5 | Laptop/PC | 0 | 1 | 0 | 2 | 0 | 2 | 5
6 | Other (Please specify) | 2 | 0 | 0 | 0 | 0 | 1 | 3
Total | | 4 | 3 | 3 | 6 | 4 | 4 | -

Other (Please specify): Group data servers


USB or external hard drive: Min Value 2, Max Value 5, Mean 3.80, Variance 1.70, Standard Deviation 1.30, Total Responses 5
My personal network storage: Min Value 0, Max Value 5, Mean 3.00, Variance 4.67, Standard Deviation 2.16, Total Responses 4
Your departmental shared area: Min Value 0, Max Value 6, Mean 2.33, Variance 5.07, Standard Deviation 2.25, Total Responses 6
Dropbox or other cloud based storage: Min Value 0, Max Value 5, Mean 3.00, Variance 4.67, Standard Deviation 2.16, Total Responses 4
Laptop/PC: Min Value 2, Max Value 6, Mean 4.40, Variance 2.80, Standard Deviation 1.67, Total Responses 5
Other (Please specify): Min Value 1, Max Value 6, Mean 2.67, Variance 8.33, Standard Deviation 2.89, Total Responses 3
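As a check, these summary statistics can be reproduced from the ranking matrix in question 35. For the Laptop/PC item, the five ranks given were 2, 4, 4, 6 and 6, and the ordinary sample mean and sample (n-1) variance give

\[ \bar{x} = \frac{2+4+4+6+6}{5} = 4.40, \qquad s^{2} = \frac{(2-4.4)^{2}+2(4-4.4)^{2}+2(6-4.4)^{2}}{5-1} = 2.80, \qquad s = \sqrt{2.80} \approx 1.67, \]

which agrees with the tabulated values and indicates that the survey tool reports the sample (rather than population) variance.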

36. How would you describe the effectiveness of the university infrastructure in managing your research data?

Text Response
- The University is not involved in managing our information.
- Mediocre
- Very little support, each group needs to come up with its own strategy. Costly, but at least I am in charge….
- Most of our infrastructure was purchased in house on grants. The university does provide a high performance network link in our building, which does improve data access performance, but overall, the university does not provide us much in the way of support.
- The lab has infrastructure separate from the university and manages backups. The university structure is basically useless.
- Reasonably satisfied by the network server provided by NHMFL

Statistic Value Total Responses 6


37. Click to write Column 1

# | Question | Not very confident | Confident | Very confident | Total Responses | Mean
1 | Consideration of the RDM requirements/costs at the bidding stage | 3 | 0 | 3 | 6 | 2.00
2 | Expertise in developing DMP for bidding applications | 2 | 0 | 4 | 6 | 2.33
3 | Fulfilling research grant RDM obligations | 1 | 0 | 5 | 6 | 2.67
4 | Awareness of university RDM facilities and DMP policies | 2 | 2 | 2 | 6 | 2.00
5 | Ease of access to the research data during the life-time of the research project by yourself | 0 | 1 | 5 | 6 | 2.83
6 | Ease of access to the research data by collaborators | 0 | 3 | 3 | 6 | 2.50
7 | Re-accessing the data after the project life-time | 1 | 3 | 2 | 6 | 2.17
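The Mean column is consistent with coding the three confidence levels as 1 (Not very confident), 2 (Confident) and 3 (Very confident), presumably the default numeric coding in the survey tool; for the first item, for example,

\[ \bar{x} = \frac{3(1) + 0(2) + 3(3)}{6} = 2.00 . \]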


Consideration of the RDM requirements/costs at the bidding stage: Min Value 1, Max Value 3, Mean 2.00, Variance 1.20, Standard Deviation 1.10, Total Responses 6
Expertise in developing DMP for bidding applications: Min Value 1, Max Value 3, Mean 2.33, Variance 1.07, Standard Deviation 1.03, Total Responses 6
Fulfilling research grant RDM obligations: Min Value 1, Max Value 3, Mean 2.67, Variance 0.67, Standard Deviation 0.82, Total Responses 6
Awareness of university RDM facilities and DMP policies: Min Value 1, Max Value 3, Mean 2.00, Variance 0.80, Standard Deviation 0.89, Total Responses 6
Ease of access to the research data during the life-time of the research project by yourself: Min Value 2, Max Value 3, Mean 2.83, Variance 0.17, Standard Deviation 0.41, Total Responses 6
Ease of access to the research data by collaborators: Min Value 2, Max Value 3, Mean 2.50, Variance 0.30, Standard Deviation 0.55, Total Responses 6
Re-accessing the data after the project life-time: Min Value 1, Max Value 3, Mean 2.17, Variance 0.57, Standard Deviation 0.75, Total Responses 6

38. Did you find this interview useful? Was it helpful in increasing your awareness of my project and the requirements of RDM? # Answer Response %

1 Yes 3 50%

2 No 3 50%

Total 6 100%

Statistic | Value
Min Value | 1
Max Value | 2
Mean | 1.50
Variance | 0.30
Standard Deviation | 0.55
Total Responses | 6


39. As a result of your participation in this interview, do you think you will re-evaluate your RDM practice? # Answer Response %

1 Yes 2 33%

2 No 4 67%

Total 6 100%

Statistic | Value
Min Value | 1
Max Value | 2
Mean | 1.67
Variance | 0.27
Standard Deviation | 0.52
Total Responses | 6

40. Do you have any questions regarding this interview?

Text Response
- I didn't understand some of the questions.
- None at the moment

Statistic Value Total Responses 2

41. What comments, feedback, or suggestions do you have to help improve future research data management interviews? Text Response A brief explanation of the theoretical models and concepts (earlier Questions) would have been helpful. I was unfamiliar with any of those, so I clicked on the elements that seemed right, but no guarantee….

Statistic Value Total Responses 1


APPENDIX K

COPYRIGHT PERMISSION FOR SELECT FIGURES AND TABLE

Copyright Permission for Fig. 2

Re: CCSDS OAIS Functional Preservation diagram - CCSDS 650.0-M-2
Thomas Gannett
Sat 5/3/2014 3:41 PM
To: Smith II, Plato;

You may, provided you cite the source.
Sent from my iPhone

On May 3, 2014, at 3:21 PM, "Smith II, Plato" wrote:

Good afternoon Tom,

May I Fair Use the attached CCSDS OAIS functional preservation diagram in my dissertation?
Thanks in advance for your response.
Respectfully,
Plato

Copyright Permission for Fig. 4

RE: DCC Curation Lifecycle Model copyright permission for dissertation
contact
Mon 5/5/2014 11:43 AM
To: Smith II, Plato;
Edinburgh University charitable status;

Dear Plato

Thank you for your email.
Everything on the site is licensed with a CC-BY licence, unless it explicitly says otherwise, so although you don't need our permission to use the model you do need to acknowledge the source.
Please attribute to "Digital Curation Centre, University of Edinburgh http://www.dcc.ac.uk/resources/curation-lifecycle-model"

Kind regards,
Lorna Brown
DCC Administrator
Digital Curation Centre
University of Edinburgh
Appleton Tower
11 Crichton Street
Edinburgh
EH8 9LE
0131 6511239
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Copyright Permission for Table 1

Re: Data Curation Continuum copyright permission
Andrew Treloar
Sun 5/4/2014 4:31 AM
To: Smith II, Plato;

Sure!
If you want an updated version with a CC-BY license, go to http://andrew.treloar.net/research/diagrams/data_curation_continuum.pdf

From: Smith II, Plato
Reply: Smith II, Plato
Date: 4 May 2014 at 05:56:40
To: [email protected] [email protected]
Subject: Data Curation Continuum copyright permission

Good afternoon Andrew,

May I Fair Use your Data Curation Continuum diagram from The Data Curation Continuum: managing data objects in institutional repositories. D-Lib Magazine, 13(9/10) in my dissertation?

Thanks in advance.
Respectfully,
Plato

Copyright Permission for Fig. 5

RE: Copyright permission request
Olav Solem
Sun 5/4/2014 3:33 AM
To: Smith II, Plato;

Dear Plato,
You are welcome to use the figure.
Regards
Olav Solem

From: Smith II, Plato
Sent: 3 May 2014 21:44
To: Olav Solem
Subject: Copyright permission request

Good afternoon Dr. Olav Solem,
May I have copyright permission to use your Conceptual Framework for Analyzing Methodological Suppositions on page 595 of Systems Science: Addressing Global Issues edited by Frank A. Stowell, Daune West, and James G. Howell in my dissertation?
Thanks in advance.
Respectfully,
Plato


REFERENCES

1. Abbott, M. R. (2009). A new path for science. In. T. Hey, S. Tansley, and K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, pp. 111–115.

2. Akers, K. & Doty, J. (2013). Disciplinary differences in faculty research data management practices and perspectives. The International Journal of Digital Curation, 8(2): pp. 5-26.

3. Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., Moorman, D., Uhlir, P., & Wouters, P. (2004). Promoting access to public research data for scientific, economic, and social development. Data Science Journal, 3(29), pp. 130-152.

4. Atkins, D., Borgman, C., Bindoff, N., Ellisman, M., Feldman, S., Foster, I. et al. (2010). Building a UK Foundation for the Transformative Enhancement of Research and Innovation: report of the international panel for the 2009 review of the UK research councils e-science programme. Retrieved April 5, 2014 from http://www.epsrc.ac.uk/siteCollectionDocuments/Publications/reports/RCUKe-ScienceReviewReport.pdf.

5. Atkins, D. E., Droegemeier, K. K., Feldman, S. L., Garcia-Molina, H., Klein, M. L., Messerschmitt, D. G., et al. (2003). Revolutionizing science and engineering through cyberinfrastructure. Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. Washington, DC: National Science Foundation.

6. Bailey, C. W. (2013). Research data curation bibliography. Retrieved May 20, 2013 from http://digital-scholarship.org/rdcb/rdcb.htm.

7. Bates, M. (1989). The design of browsing and berrypicking techniques for online search interface. Online Information Review, 13(5): 407-424.

8. Bates, M. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50(12): 1043-1050.

9. Bates, M. J. (2005/2009). An introduction to metatheories, theories, and models. In K. Fisher, S. Erdelez, & L. McKechnie (Eds.), Theories of Information Behavior. American Society for Information Science and Technology.

10. BBSRC. (2010). BBSRC Data Sharing Policy. Retrieved April 5, 2014 from http://www.bbsrc.ac.uk/organisation/policies/position/policy/data-sharing-policy.aspx.

11. Beagrie, N. (2006). Digital curation for science, digital libraries, and individuals. The International Journal of Digital Curation, 1(1): p. 3-16.


12. Bell, G. (2009). The fourth paradigm: a focus on data-intensive systems and scientific communication. In. T. Hey, S. Tansley, and K. Tolle (Eds.), The Fourth Paradigm: Data- Intensive Scientific Discovery. Microsoft Research, p. xv – xvii.

13. Bias, R. G., Marty, P. F., & Douglas, I. (2012). Usability/user-centered design in the iSchools: justifying a teaching philosophy. Journal of Education for Library and Information Science, 53(4): pp. 274 – 289.

14. BOAI. (2002). Budapest Open Access Initiative. Read the Budapest open Access Initiative. Retrieved May 2, 2013 from http://www.opensocietyfoundations.org/openaccess/read.

15. Borgman, C. (2007). Scholarship in the digital age: information, infrastructure, and the Internet. MIT Press: Cambridge, MA.

16. Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7, 17-30. doi:10.1007/s00799-007-0022-9.

17. Blatecky, A. (2012). Open Remarks by Project Sponsors. In The future of scientific knowledge discovery in open networked environments: summary of a workshop, p. 3-5. National Research Council of the National Academies. The National Academies Press: Washington, DC.

18. Blue Ribbon Task Force on Sustainable Digital Preservation and Access. (2008). Sustaining the digital investment: issues and challenges of economically sustainable digital preservation. Interim report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

19. Blinco, K. & McLean, N. (2004). The Wheel of Fortune: A “Cosmic” View of Repositories Space.

20. Board on Research Data and Information. (2014). Statement of task. Retrieved June 10, 2014 from http://sites.nationalacademies.org/PGA/brdi/index.htm.

21. Bolton, N. (1977). Concept formation. Pergamon Press Inc., Maxwell House, Fairview Park, Elmsford, 10523, U.S.A.

22. Boyer, E. (1990). Scholarship reconsidered: priorities of professoriate. The Carnegie Foundation for the Advancement of Teaching. San Francisco, CA: Jossey-Bass.

23. Bowker, G. C., Baker, K., Millerand, F., & Ribes, D. (2010). Toward information infrastructure studies: ways of knowing in a networked environment. In J. Hunsinger et al. (Eds.), International Handbook of Internet Research, pp. 97-117. doi: 10.1007/978-1-4020-9789-8_5.


24. Brewer, J. & Hunter, A. (1989). Multimethod research: A synthesis of styles. Newbury Park, CA: Sage.

25. Briggs, M. (2007). Models and modeling: a theory of learning. In G. Bodner & M. Orgill (Eds.), Theoretical frameworks for research in chemistry/science education, pp. 77-85. Pearson Prentice Hall: Pearson Education, Inc.

26. Burnett, K. & Bonnici, L. (2006). Contested terrain: accreditation and the future of the profession of librarianship. Library Quarterly, 76(2), pp. 193-219.

27. Burrell, G. & Morgan, G. (1979). Sociological paradigms and organizational analysis. London: Heineman.

28. Bush, V. (1945). As we may think. Retrieved May 8, 2013 from http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/.

29. CCSDS. (2002/2012). The Consultative Committee for Space Data Systems. Recommendation for space data system practices. Reference model for an open archival information system (OAIS): recommended practice, CCSDS 650.0-M-2. Magenta Book, June 2012. Retrieved April 5, 2014 from http://public.ccsds.org/publications/archive/650x0m2.pdf.

30. Callaos, N. & Callaos, B. (2002). Toward a systemic notion of information: practical consequences. Informing Science, 5 (1). Retrieved May 28, 20 from http://www.inform.nu/Articles/Vol5/v5n1p001-011.pdf.

31. Campbell, D. & Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56: 81-105.

32. Center for Research Libraries (CRL). (2013). Archiving & preservation. Retrieved May 11, 2013 from http://www.crl.edu/archiving-preservation.

33. Cerf, V. (2013). Open Access. Communications of the ACM, April 2013. v. 56(4).

34. CLIR. (2011). CLIR Press Release: CLIR/DLF receives Sloan Foundation Grant for research on building data curation skills. Retrieved May 20, 2013 from http://www.clir.org/news/pressrelease/11sloanpr1.html.

35. Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age, Committee on Science, Engineering, and Public Policy. (2009). Ensuring the integrity, accessibility, and stewardship of research data in the digital age. National Academies of Sciences, National Academy of Engineering, and Institute of Medicine. The National Academies Press.


36. Committee on Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy. (2004). Facilitating interdisciplinary research. National Academies. Washington: National Academy Press, p. 2.

37. Cooper, R. (1983). The other: a model of human structuring. In G. Morgan (Ed.). Beyond Method: Strategies for Social Research. Sage. pp. 202-218.

38. Cragin, M. (2009). Data curation education program: situating data curation at GSLIS. Summer Institute on Data Curation in the Humanities.

39. Creswell, J. W. (1994). Research design: Qualitative and quantitative approaches. Thousands Oaks, CA: Sage.

40. Creswell, J. W. (2007). 2nd ed. Qualitative inquiry & research design: choosing among five approaches. Thousands Oaks, CA: Sage.

41. Creswell, J. W. & Plano Clark, V. L. (2011). 2nd ed. Designing and conducting mixed methods research. Thousands Oaks: Sage.

42. Crotty, M. (1998). The foundations of social research: meaning and perspective in the research process. London: Sage.

43. Dahlberg, I. (1978). Ontical structures and universal classification. Sarada Ranganathan Endowment for Library Science. Bangalore.

44. DBASSE. (2013). Division of Behavioral and Social Sciences and Education. Public Access to Federally-Supported Research and Development Data and Publications: Two Planning Meetings. National Academy of Sciences.

45. Digital Curation Centre, University of Edinburgh. (2007/2014). DCC Curation Lifecycle Model. Retrieved May 5, 2014 from http://www.dcc.ac.uk/resources/curation-lifecycle-model.

46. DCC. (2010). Digital Curation Centre. DCC Digital Curation 101.

47. DCC. (2010). What is digital curation? Retrieved April 18, 2013 from DCC - What is digital curation?

48. Denzin, N. K. (1970/2009). The research act: a theoretical introduction to sociological methods. New Brunswick, NJ: Transaction Publishers.

49. Denzin, N. K. & Lincoln, Y. S. (1994). The handbook of qualitative research. Thousand Oaks, CA: Sage.

50. DigCCurV. (2010). Digital curator vocational education Europe. Retrieved April 18, 2013 from Europe - DigCurV.


51. Dollar, C. M. (1992). Archival Theory and Information Technologies: The Impact of Information Technologies on Archival Principles and Methods. Macerata: University of Macerata Press.

52. Douglass, K., Allard, S., Tenopir, C., Wu, L., & Frame, M. (2014). Managing scientific data as public assets: data sharing practices and policies among full-time government employees. Journal of the Association for Information Science and Technology, 65(2): pp. 251-262.

53. Dow, T. (1977). A metatheory for the development of a science of information. Journal of the American Society for Information Science, 28(6): pp. 323-332.

54. ESRC. (2010). Economic & Social Research Council. ESRC Research Data Policy. Retrieved July 22, 2013 from ESRC Research Data Policy.

55. Eisenhardt, K. M. (1989). Building theories from case study research. The Academy of Management Review, 14(4): pp. 532-550.

56. FGDC. (2013). Federal geographic data committee. Geospatial metadata standards, FGDC endorsed ISO metadata standards. Retrieved May 11, 2013 from FGDC Endorsed ISO Metadata Standards.

57. Fox, E., Yang, S., Ewers, J., Wildermuth, B., Pomerantz, J., & Oh, S. (2011). Digital libraries/conceptual framework, models, theories, definitions. Retrieved May 12, 2013 from Digital Libraries frameworks, models, theories, and definitions.

58. Gioia, D. & Pitre, E. (1990). Multiparadigm perspectives on theory building. Academy of Management Review, 15(4): 584-602.

59. Gladney, H. M. (2004). Principles of digital preservation. Retrieved July 20, 2013 from http://eprints.erpanet.org/70/.

60. Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11, pp. 255-274.

61. Harmon, G. (1987). The interdisciplinary study of information: a review essay. The study of information: interdisciplinary messages by Fritz Machlup; Una Mansfield. The Journal of Library History (1974-1987), 22(2): pp. 206-277.

62. Hassard, J. (1991). Multiple paradigms and organizational analysis: a case study. Organizational Studies, 12(2): pp. 275 – 298.

63. Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2). Fall.


64. Heidorn, P. B., Palmer, C. L., Cragin, M. H., & Smith, L. C. (2007). Data curation education and Biological Information Specialists. DigCCurr2007: An international symposium on Digital Curation, April 18-20, 2007, Chapel Hill, NC.

65. Hey, T., Tansley, & Tolle, K. (2009). Conclusions. In. T. Hey, S. Tansley, and K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. pp. 227-230.

66. Hey, T., & Trefethen, A. (2003). The data deluge: an e-science perspective. In F. Berman, G. Fox, & A.J.G. Hey (Eds.), Grid computing: Making the global infrastructure a reality. New York: John Wiley and Sons.

67. Hodson, S. & Jones, S. (2013). Seven rules of successful research data management in universities. The Guardian: Higher Education Network. Retrieved July 16, 2013 from 7 Rules of successful RDM in universities.

68. Hutchins, W. J. (1978). The concept of ‘aboutness’ in subject indexing. Aslib Proceedings, v. 30(5): pp. 172-181.

69. Hunter, P. (2005). A Tradition of Scholarly Documentation for Digital Objects: The Launch of the Digital Curation Centre. Retrieved December 20, 2013 from http://www.ariadne.ac.uk/issue42/dcc-rpt.

70. ICPSR. (circa 2006). Digital Curation. Retrieved May 12, 2013 from Digital Curation.

71. ICPSR. (2012). Applied data science: managing research data for re-use. Retrieved April 18, 2013 from ICPSR - Summer 2012 Workshop: Managing data for re-use.

72. International Digital Curation Education and Action. (2009). IDEA Working Group. Retrieved April 18, 2013 from International - IDEA.

73. IOAW. (2013). International Open Access Week. Open access: Redefining impact. Retrieved May 2, 2013 from http://www.openaccessweek.org/page/about.

74. Jacobs, H. H. (1989). The growing need for interdisciplinary curriculum content. In H. Jacobs (Ed.). Interdisciplinary Curriculum: Design and Implementation. Alexandria, VA: Association for Supervision and Curriculum Development. pp. 1-13.

75. Jacob, E. & Shaw, D. (1998). Sociocognitive perspectives on representation. Annual Review of Information Science and Technology (ARIST), v. 33, pp. 131-185.

76. Jick, T. D. (1979). Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly, 24, 602-611.


77. JISC. (2003). JISC Circular 6/03 (Revised). An invitation for expressions of interest to establish a new Digital Curation Centre for research into and support of the curation and preservation of digital data and publications.

78. JISC. (2006). Digital Preservation briefing paper. Retrieved May 28, 2013 from JISC Digital Preservation brief paper.

79. JISC, University of Glasgow Humanities Advanced Technology & Information Institute (HATII), & Digital Curation Center (DCC). (2009). Data Asset Framework [formerly Data Audit Framework] Implementation Guide. Retrieved May 30, 2013 from http://www.data-audit.eu/docs/DAF_Implementation_Guide.pdf.

80. Karasti, H., Baker, K. S., & Halkola, E. (2006). Enriching the notion of data curation in e-Science: data managing and information infrastructure in long-term ecological research (LTER) network. Computer Supported Cooperative Work, 15: 321-358. doi: 10.1007/s10606-006-9023-2.

81. Kuhn, T. (1996). The structure of scientific revolutions, 3rd edition. The University of Chicago Press, Chicago.

82. Johnston, M. (2011). School librarians as technology integration leaders: enablers and barriers to leadership enactment. Dissertation. Retrieved May 29, 2013 from http://etd.lib.fsu.edu/theses/available/etd-05312011-083825/.

83. Jones, S., Pryor, G. & Whyte, A. (2013). 'How to develop research data management services – a guide for HEIs'. DCC How-to Guides. Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides.

84. Joseph, H. (2012). The impact of open access on research and scholarship: reflections on the Berlin 9 Open Access Conference. College & Research Libraries News, v. 73(2), p. 83-87. Retrieved May 3, 2013 from http://crln.acrl.org/content/73/2/83.full.

85. Kekäläinen, J. & Järvelin, K. (2002). Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science, 53(13): 1120-1129.

86. Karasti, H., Baker, K. S., & Halkola, E. (2006). Enriching the notion of data curation in eScience: data managing and information infrastructure in the long-term ecological research (LTER) network. In M. Jirotka, R. Procter, T. Rodden, & G. Bowker (Eds.), Computer Supported Cooperative Work: An International Journal. Special Issue: Collaborative in eResearch, 15(4), 321-358.

87. Keyser, C. J. (1916). Scientific method in philosophy. Reviewed Works. Bertrand Russell, Our Knowledge of the External World as a Field for Scientific Method in Philosophy. Bulletin of the American Mathematical Society, v. 23 (2), 91-97.


88. Kroll, S. & Forsman, R. (2010). A slice of research life: information support for research in the United States. Retrieved April 7, 2014 from http://www.oclc.org/research/publications/library/2010/2010-15.pdf.

89. Kuhn, T. (1982). Commensurability, Comparability, Communicability. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, Vol. 1982, Volume Two: Symposia and Invited Papers (1982), pp. 669-688.

90. Krathwohl, D. R. (1993). Methods of educational and social science research: An integrated approach. White Plains, NY: Longman.

91. Lesh, R., Hoover, M., Hole, B., Kelly, A., & Post, T. (2000). Principles for developing thought-revealing activities for students and teachers. In A. Kelly & R. Lesh (Eds.), Handbook of research design in mathematics and science education (pp. 457-486). Mahway, NJ: Lawrence Erlbaum.

92. Lesk, Michael. (1990). Image Formats for Preservation and Access: A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access Washington, D.C.: Commission on Preservation and Access. Retrieved April 18, 2013 from http://www.clir.org/pubs/reports/pub5/lesk.html.

93. Levins, R. (1966). The strategy of model building in population biology. In Brewer and Collins 1981.

94. Lewis, M., & Grimes, A. (1999). Metatriangulation: building theory from multiple paradigms. Academy of Management Review, 24(4): 672-690.

95. Lord, P., & Macdonald, A. (2003a). Digital Data Curation Task Force. Report of the Task Force Strategy Discussion Day Tuesday, 26th November 2002 Centre Point, London WC1. Retrieved April 7, 2014 from: http://www.jisc.ac.uk/uploaded_documents/CurationTaskForceFinal1.pdf.

96. Lord, P. & Macdonald, A. (2003b). E-Science Curation Report: data curation for e-Science in the UK: an audit to establish requirements for future curation and provision. Retrieved April 7, from e-Science Curation Report (2003).

97. Lubchenco, J. (2010). NAO 212-15: Management of Environmental Data and Information. Retrieved July 14, 2013 from NOAA Administrative Order 212-15.

98. Lynch, C. (2009). Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record. In. T. Hey, S. Tansley, and K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. pp. 177-183.

99. Mallinson, John C. (1986) “Preserving Machine-Readable Archival Records for the Millenia.” Archivaria 22(Summer): 147-52.


100. Macdonald, A., & Lord, P. (2002). Digital data curation task force: report of the Task force strategy discussion day, p. 1-11. Retrieved May 4, 2014 from Data Curation Taskforce Report - 2002.

101. McGowan, T. & Gibbs, T. A. (2009) Southampton Data Survey: Our Experiences & Lessons Learned [unpublished].

102. Messerschmitt, D. (2003). Opportunities for research libraries in the NSF Cyberinfrastructure Program. ARL Bimonthly Report, 229. Retrieved April 7, 2014 from: http://old.arl.org/resources/pubs/br/br229/br229cyber.shtml.

103. Microsoft. (2006). Towards 2020 Science. Microsoft Research: UK.

104. Moravcsik, J. M. (1977). On understanding. In: Intern. Workshop on cognitive viewpoint. University of Ghent, March 24-26, 1977, p. 73-82.

105. Morgan, G. (1983). Beyond method: strategies for social research. Sage Publications.

106. Morgan, G. & Smircich, L. (1980). The case for qualitative research. Academy of Management Review, 5(4), pp. 491-500.

107. NOAA NCDDC. (2013). National Oceanic and Atmospheric Administration National Coastal Data Development Center. Retrieved May 4, 2014 from http://www.ncddc.noaa.gov/metadata-standards/.

108. NOAA NCDDC MERMAid. (2013). Metadata Enterprise Resource Management Aid. Retrieved May 4, 2014 from http://www.ncddc.noaa.gov/metadata-standards/mermaid/.

109. National Research Council. (1995). Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation’s Scientific Information Resources. Washington, D.C.: The National Academy Press.

110. National Research Council. (2012). For Attribution – Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop. Washington, D.C.: The National Academy Press.

111. NSF. (2011). Data Management Plan Requirements. National Science Foundation. Retrieved April 6, 2014 from http://www.nsf.gov/eng/general/dmp.jsp.

112. National Science Foundation. (2012). What is interdisciplinary research? Retrieved April 18, 2013 from NSF - What is interdisciplinary research?.

113. Nibert, M. (2008). “Boyer’s Model of Scholarship”, Faculty Guidebook, 4th Edition, Pacific Crest: Lisle, IL.


114. NRC. (2012). The case for international sharing of scientific data. A focus on developing countries: proceeding of a symposium. National Research Council of the National Academies. The National Academy Press: Washington, DC.

115. NSB (National Science Board). (2005). Long-lived digital data collections: enabling research and education in the 21st century. Washington, DC: National Science Foundation.

116. OASIS. (2010). Open Access Scholarly Information Sourcebook. Open access impact: a briefing paper for researchers, universities and funders. Retrieved May 2, 2013 from Open Access impact.

117. OSTP. (2013). Executive Office of the President Office of Science and Technology Policy. Memorandum for the heads of the executive departments and agencies. Retrieved May 4, 2014 from OSTP 2013 Open Access Memorandum.

118. Palmer, C. (2010). Support available for doctoral work in data curation. Retrieved May 20, 2013 from Doctoral work in data curation.

119. Pampel, H. (2013). re3data.org – Registry of Research Data Repositories launched. Post to Research Data Management Discussion list serve on May 4, 2014. Retrieved May 28, 2013 from Research-DataMan archives - May 2013.

120. Parsons, M. A., Godoy, Ø., LeDrew, E., de Bruin, T. F., Danis, B., Tomlinson, S., & Carlson, D. (2011). A conceptual framework for managing very diverse data for complex, interdisciplinary science. Journal of Information Science, 37(6), 555-569.

121. Patton, M. Q. (2002). Qualitative research and evaluation (3rd ed.). Thousand Oaks, CA: Sage Publications.

122. Pennock, M. (2006). Digital preservation. Continued access to authentic digital assets. Retrieved May 4, 2014 from http://www.jisc.ac.uk/media/documents/publications/digitalpreservationbp.pdf.

123. Petrie, H. G. (1992). Interdisciplinary Education: Are We Faced with Insurmountable Opportunities? Review of Research in Education. v. 18, pp. 299-333. American Educational Research Association.

124. Pickton, M. (2010). Diary of a repository preservation project. Retrieved April 6, 2014 from http://blog.soton.ac.uk/keepit/2010/02/07/nectar-and-the-data-asset-framework-first-thoughts/.


125. Pondy, L. & Boje, D. (1981). Bring the mind back in. In W. Evan (Ed.), Frontiers in organization and management: 83-101. New York: Praeger.

126. Power, R. K., Miles, B. B., Peruzzi, A., & Voerman, A. (2011). Building bridges: a practical guide to developing and implementing a subject-specific peer-to-peer academic mentoring program for first-year higher education students. Asian Social Science, v. 7(11). doi: 10.5539/ass.v7n11p75.

127. Qin, J., Lancaster, F. W., & Allen, B. (1997). Types of and levels of collaboration in interdisciplinary research in the sciences. Journal of the American Society for Information Science, 48(10): pp. 893-916.

128. Reed, M. (1985). Reflections in organizational analysis. London: Tavistock.

129. Rosenthal, D. S. H., Robertson, T., Lipkis, T., Reich, V., & Morabito, S. (2005). Requirements for digital preservation systems: a bottom-up approach. Retrieved May 19, 2013 from http://arxiv.org/pdf/cs/0509018.pdf.

130. Ross, D. (1951). Plato’s theory of ideas. Oxford: The Clarendon Press.

131. Rusbridge, C., Burnhill, P., Ross, S., Buneman, P., Giaretta, D., Lyon, L., & Atkinson, M. (2005). The Digital Curation Centre: A Vision for Digital Curation. In Proceedings of Local to Global Data Interoperability - Challenges and Technologies. Mass Storage and Systems Technology Committee of the IEEE Computer Society, June 20-24, Sardinia, Italy.

132. Sabrosky, A., Thompson, J., & McPherson, K. (1982). Organized anarchies: Military bureaucracy in the 1980s. Journal of Applied Behavioral Science, 18(2): pp. 137-53.

133. Saunders, C., Carte, T. A., Jasperson, J., & Butler, B. S. (2003). Lessons learned from the trenches of metatriangulation research. Communications of the Association for Information Systems, v. 11, pp. 245-270.

134. Schon, D. A. (1959/1963). Displacement of concepts. London: Tavistock Publications Limited.

135. Schultz, H. & Hatch, M. J. (1996). Living with multiple paradigms: the case of paradigm interplay in organizational studies. Academy of Management Review, 21(2), pp. 529-537.

136. Schutt, R. K. (2006). Investigating the social world. The process and practice of research. 5th ed. Sage Publications.

137. Sennett, R. (2004). Respect. London: Penguin.

138. Silverman, D. (1970). The theory of organizations: a sociological framework. Heinemann: London.

139. Smith II, P. L. (2008). Where IR you? Using “open access” to extend the reach and richness of faculty research within a university. OCLC Systems & Services: International digital library perspectives - Open access and scholarly communication: part 1, v. 24(3). Emerald.

140. Smith II, P. L. (2010). Diatomscapes expose: how faculty and digital librarians collaborate to promote and preserve the passion of research (CP3R) for digital future. World Digital Library, 3(1). TERI.

141. Smith II, P. L. (2011). Using preservation of faculty research as a demo preservation use case for developing a digital preservation strategy within a research university. 1st US ETD Association 2011 Conference. Retrieved July 16, 2013 from USETDA2011.

142. Solem, O. (1993). Integrating scientific disciplines: an evergreen challenge to Systems science. In F. Stowell, D. West, & J. Howell (Eds.), Systems Science: Addressing Global Issues (pp. 593-598). New York: Plenum Press.

143. Star, S., & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: design and access for large information spaces. Information Systems Research, 7(1):111-134.

144. Steneck, N. H. (2007). ORI introduction to the responsible conduct of research. U.S. Government Printing Office: Washington, DC.

145. Strauss, A., & Corbin, J. (1998). Basics of qualitative research. Thousand Oaks, CA: Sage.

146. Stvilia, B., Gasser, L., Twidale, M. B., & Smith, L. C. (2007). A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58(12): 1720-1733.

147. Suber, P. (2004/2013). Open Access Overview. Retrieved May 4, 2014 from http://legacy.earlham.edu/~peters/fos/overview.htm.

148. Tashakkori, A. & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches. Thousand Oaks, CA: Sage.

149. Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A., Wu, L., Read, E., Manoff, M. & Frame, M. (2011). Data sharing by scientists: practices and perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101.

150. Tibbo, H. (2012). Placing the horse before the cart: conceptual and technical dimensions of digital curation. Retrieved May 19, 2013 from http://www.cceh.uni-koeln.de/files/Tibbo_final.pdf.

151. Torbert, W. R. (1983). Initiating collaborative inquiry, pp. 272-291. In Beyond Method. Sage Publications, Inc.

152. Treloar, A., Groenewegen, D., & Harboe-Ree, C. (2007). The Data Curation Continuum: managing data objects in institutional repositories. D-Lib Magazine, 13(9/10).

153. Teunissen, J. (1996). Paradoxes in social science and research. In W. Koot, I. Sabelis, & S. Ybema (Eds.), Contradictions in context: 17-38. Amsterdam: Vrije Universiteit.

154. UIUC Graduate School of Library and Information Science (GSLIS), The iSchool at Illinois. (2010). Master of Science: Specialization in Data Curation. Retrieved April 18, 2013 from Specialization in Data Curation.

155. Van de Sompel, H., Payette, S., Erickson, J., Lagoze, C., & Warner, S. (2004). Rethinking scholarly communication. D-Lib Magazine, 10(9).

156. Vaughan, L. (2008). Statistical methods for the information professional: a practical, painless approach to understanding, using, and interpreting statistics. American Society for Information Science and Technology Monograph Series. Information Today, Inc. Medford, New Jersey.

157. Watry, P. (2007). Digital preservation theory and application: transcontinental persistent archives and testbed activity. The International Journal on Digital Curation, 2(2): pp. 41-68. Retrieved May 4, 2014 from http://ijdc.net/index.php/ijdc/article/view/43/28.

158. Watson, H. & Wood-Harper, T. (1993). Hermeneutic approaches to learning methodology. In F. Stowell, D. West, & J. Howell (Eds.), Systems Science: Addressing Global Issues (pp. 611-615). New York: Plenum Press.

159. Whyte, A. & Pryor, G. (2010). Appendices to the report. Open to All? Case studies of openness in research. A study sponsored by the RIN and NESTA, Project Reference RIN/P27. Retrieved April 5, 2014 from http://www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies.

160. Wuchty, S., Jones, B. F., & Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science, 316(5827): pp. 1036-1039.

BIOGRAPHICAL SKETCH

Plato L. Smith II earned a Master of Science in Information Science from North Carolina Central University in 2001, a second Bachelor of Science in Computer Information Science from North Carolina Central University in 2004, and a first Bachelor of Science in Marketing from North Carolina A&T State University. He worked as a technical/paraprofessional librarian at Florida A&M University from Spring 2013 to Spring 2014, as a teaching assistant in the College of Communication and Information from Fall 2009 to Summer 2013, and as the Digital Library Center Department Head at Florida State University’s Strozier Library from Summer 2005 to Spring 2012. Smith enrolled in the doctoral program at the Florida State University College of Communication & Information, School of Information (formerly the School of Library & Information Studies) in 2007. His research interests include data management and curation, digital libraries, digital preservation, and open access & scholarly publishing.
