Research Data Management Best Practices
Introduction
Planning & Data Management Plans
Naming and Organizing Your Files
Choosing File Formats
Working with Tabular Data
Describing Your Data: Data Dictionaries
Describing Your Project: Citation Metadata
Preparing for Storage and Preservation
Choosing a Repository
Glossary

Research Data Management Best Practices 2018-02-28

INTRODUCTION

The following best practices are intended for Smithsonian researchers and affiliated staff who plan for, create, and/or work with digital research data. Additional information about available tools, policies, and resources for managing research data can be found at https://library.si.edu/research/data-management.

There are many phases in the research data lifecycle, and they do not always occur in the tidy order pictured in the accompanying diagram. These best practices are designed to improve overall management of data at each point in the lifecycle, resulting in published data that are not only easy to care for long after the project is complete, but also findable, accessible, interoperable, and reusable.

The Smithsonian has other resources that can contribute to effective data management, including software and high-performance computing provided by OCIO, and training and planning consultation services from the Libraries. Contact [email protected] for more information.

SI also has two locally managed repositories that accept research data for publication and/or archiving:

• SIdora – best for larger or more complicated datasets, including actively updated datasets. To deposit data in SIdora, contact Beth Stern or email [email protected]
• Smithsonian Research Online (SRO) – best for smaller (<50GB), fixed (inactive) datasets that accompany or support publications deposited in SRO. To deposit data and publications in SRO, you can self-deposit using the forms at http://staff.research.si.edu/input_forms.cfm or contact [email protected]

Actively managing your data throughout the research process enables reproducibility, reusability, and discovery, and can help maximize the impact of your research into the future.

PLANNING & DATA MANAGEMENT PLANS

Many granting agencies, such as NSF and the Alfred P.
Sloan Foundation, require a formal data management plan (DMP) as part of a grant proposal. Even if a granting agency does not require a DMP, SI strongly recommends that PIs create a planning document before starting any project that will create digital research data. DMPs are valuable tools for addressing issues that affect not only the collection and use of your data, but also their long-term viability. A written data management plan can:

• provide continuity on projects if staff join or leave
• allow for future validation or reproduction of results
• enable reuse of your data in potentially novel ways

SI Libraries staff are available for consultation on creating DMPs and are happy to review draft DMPs before they are submitted with a proposal. Contact [email protected] for more information.

Proposals

The Smithsonian Office of Sponsored Projects (OSP) provides administrative and financial services for externally funded grants and contracts, and is available to assist PIs with technical and procedural questions related to managing grants and awards. OSP also provides training in proposal development, writing, and editing, as well as compliance oversight for areas such as Institutional Animal Care & Use, Export Control, Human Subjects in Research, and Responsible Conduct of Research. A list of OSP's online and in-person learning opportunities is available on its PRISM site.

Planning checklist

Any plan should, at a minimum, answer the following questions in bold for each stage in the data management lifecycle. More specific guidance for questions in the data collection, publishing, and archiving stages is available at https://library.si.edu/research/data-management.

PROPOSAL/PLANNING STAGE
⃞ What type of data is being collected/generated?
⃞ Who is involved in data collection?
⃞ Who "owns" the rights to the data?
⃞ Are there restrictions on sharing and reuse?
⃞ Are there applicable institutional policies on how the data is handled, shared, or archived?
⃞ Who will be using the data?
⃞ If a collaborative project, are there MOUs that define roles and responsibilities?
⃞ How do the outcomes need to be reported, e.g., to a sponsor or publisher?

DATA COLLECTION STAGE
⃞ How will data be acquired/collected?
⃞ What metadata standards and schema will be used?
⃞ What are the file and data field naming conventions?
⃞ What are the temporary storage requirements (size, cost, media)?
⃞ How, where, and how frequently will data be backed up?
⃞ Are there existing standards for data structure and vocabularies, or will they be developed?
⃞ Are there existing workflows for collecting, processing, describing, and storing the data, or will they need to be developed?
⃞ Is there a data model for the project?
⃞ Will your data be versioned, and if so, how will versioning be handled?
⃞ What is your quality assurance/quality control process?

PUBLISHING STAGE
⃞ What repository or platform will be used to share the data?
⃞ Who will be responsible for deposit and archiving after the project ends?
⃞ If the data is to be shared publicly, what license should be applied? Are there any use restrictions?
⃞ If the data is embargoed, what is the embargo period, and who will manage it?
⃞ If the data is not public, how will access be restricted?
⃞ What costs are associated with publishing?
⃞ What unique identifier will be assigned to the data (DOI, etc.)?

ARCHIVING STAGE
⃞ Who is responsible for preserving the datasets in the future?
⃞ What data should be retained?
⃞ Where will the data be archived?
⃞ How much storage will be needed?
⃞ How long should the data be maintained, and why?
⃞ What are the risks to future access to the data, e.g., proprietary file formats, specialty software needed to interpret the data, or password-protected systems?
⃞ How should the data be maintained in the future?
⃞ Is there a cost associated with archiving the data?
⃞ How will the data be found?
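
For teams that keep their planning notes in a script or notebook, the checklist above can be tracked as simple structured data. The following is only an illustrative sketch, not a Smithsonian tool: the names `ChecklistItem`, `STAGES`, and `unanswered_by_stage` are hypothetical, the questions are a small subset of the checklist, and the sample answers are invented.

```python
from dataclasses import dataclass

# Lifecycle stages, in the order used by the checklist above.
STAGES = ("proposal/planning", "data collection", "publishing", "archiving")


@dataclass
class ChecklistItem:
    """One checklist question and the plan's answer to it (hypothetical structure)."""
    stage: str
    question: str
    answer: str = ""  # stays empty until the plan addresses the question

    @property
    def answered(self) -> bool:
        return bool(self.answer.strip())


def unanswered_by_stage(items):
    """Group the questions that still lack an answer by lifecycle stage."""
    gaps = {stage: [] for stage in STAGES}
    for item in items:
        if not item.answered:
            gaps[item.stage].append(item.question)
    # Keep only the stages that still have open questions.
    return {stage: questions for stage, questions in gaps.items() if questions}


# Illustrative subset of the checklist; the answers are invented examples.
plan = [
    ChecklistItem("proposal/planning",
                  "What type of data is being collected/generated?",
                  "Tabular field observations"),
    ChecklistItem("data collection",
                  "How, where, and how frequently will data be backed up?"),
    ChecklistItem("publishing",
                  "What unique identifier will be assigned to the data (DOI, etc.)?",
                  "DOI"),
    ChecklistItem("archiving",
                  "How much storage will be needed?"),
]

if __name__ == "__main__":
    for stage, questions in unanswered_by_stage(plan).items():
        print(stage.upper())
        for question in questions:
            print("  [ ]", question)
```

Running the script lists the stages that still have open questions, which can serve as a quick review before a draft plan is sent for consultation.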
Funder-specific DMP Requirements

Some funding agencies require that plans submitted with grant proposals include specific elements or follow specific formatting. Below is a list of links to those requirements for selected granting organizations, in alphabetical order by funder.

* = sample plans available on their site

• Alfred P. Sloan Foundation
• BCO-DMO NSF OCE: Biological and Chemical Oceanography
• Department of Energy – DOE: Generic *
• Gordon and Betty Moore Foundation (pdf)
• Institute for Museum and Library Services IMLS: guidelines for datasets (Word doc) *
• National Aeronautics and Space Administration NASA *
• National Endowment for the Humanities NEH-ODH (pdf)
• National Oceanic and Atmospheric Administration NOAA *
• National Science Foundation NSF-Generic DMP
  o NSF-Atmospheric and Geo
  o NSF-Astronomy (pdf)
  o NSF-Biology (pdf)
  o NSF-Earth Sciences
  o NSF-Education and Human Resources (pdf)
• United States Geological Survey USGS

Tools and templates

SMITHSONIAN SPECIFIC TEMPLATES

The Data Management Team has developed boilerplate (temporarily located on an internal Confluence site) that can be used when applying for an NSF grant if you plan to deposit data in either SRO or SIdora. The boilerplate addresses the specifics of data archiving, dissemination, policies, and roles and responsibilities within the SI data management ecosystem.

DMPTOOL

One of the major tools for creating data management plans is the DMPTool, hosted by the University of California Curation Center (UC3). The Smithsonian was one of the original partner institutions involved in creating the DMPTool. The DMPTool website includes templates and requirements for a large number of granting bodies, including NSF, DOE, and NIH. Any researcher at the Smithsonian can create an account and log in to the DMPTool by selecting "Smithsonian Institution" from the list of institutions