Data Curation Profiles Access. Knowledge. Success
A Brief Introduction to the Data Curation Profiles

Jake Carlson Data Services Specialist

Setting the stage

Since 2004: Purdue Interdisciplinary Research Initiative revealed many data needs on campus

What faculty said…
• Not sure how or whether to share data
• Lack of time to organize data sets
• Need help describing data for discovery
• Want to find new ways to manage data
• Need help archiving data sets/collections

“Investigating Data Curation Profiles across Research Domains”

This led to a research award from IMLS
• Goals of the project:
– A better understanding of the practices, attitudes and needs of researchers in managing and sharing their data.
– Identify possible roles for librarians and the skill sets they will need to facilitate data sharing and curation.
– Develop “data curation profiles”: a tool for librarians and others to gather information on researcher needs for their data and to inform curation services.
• Project team:
– Purdue: Brandt (PI), Carlson (Project Manager), Witt (co-PI)
– GSLIS UIUC: Palmer (co-PI), Cragin (Project Manager)

Data Curation Profiles Methodology
• Initial interview, based on the “Conducting a Data Interview” poster, then a pre-profile based on a model of the data in a few published papers
• Interviews were transcribed and reviewed
• Coding showed that more detail was needed, which led to a second interview
• Developed and revised a template for the Profile

“In our experience, one of the most effective tactics for eliciting datasets for the collection is a simple librarian-researcher interview. In this poster, we share a set of ten questions…”

What the interviews asked…

• Research Data Lifecycle (story of the data)
• / Storage
• Disposition of the Data
• Data Dissemination and Sharing
• and Repositories
• Roles for Libraries and Librarians

Interview areas: 20 faculty, 12 disciplines

Agronomy & Soil Science (Purdue & UIUC), Anthropology (UIUC), Biochemistry (Purdue), Biology (Purdue), Civil Engineering (Purdue), Earth & Atmospheric Sciences (Purdue & UIUC), Electrical & Computer Engineering (Purdue), Food Science (Purdue), Geology (UIUC), Horticulture & Plant Science (Purdue & UIUC), Kinesiology (UIUC), Speech and Hearing (UIUC)

Prioritize your needs for the following types of services (n=19)

• The ability for researchers within my discipline to easily find this dataset
• The ability for researchers outside of my discipline to easily find this dataset
• The ability to cite this dataset in my publications
• The ability for people to easily discover this dataset using Google

Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.

Prioritize your needs for the following types of services (n=19)

• The ability of the repository to provide version control for the data
• The process of submitting this dataset to a repository is automated
• The ability for me to submit this dataset to a repository myself
• The ability to make these data accessible in multiple formats

Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.

• Cragin, M.H., Palmer, C.L., Carlson, J.R., & Witt, M. (2010). “Data Sharing, Small Science, and Institutional Repositories.” Philosophical Transactions of the Royal Society A, 368(1926).

• “If cyberinfrastructure is ‘principally about data: how to get it, how to share it, how to store it, and how to leverage it for scientific discovery and learning’ (Edwards, Jackson, Bowker and Knobel 2007, p. 31), then advancing cyberinfrastructure is dependent on our understanding of how to support data practices and needs.”

The Data Curation Profile
• A means to capture requirements for specific data generated by a single scientist or lab, based on their reported needs and preferences for these data.
• Represents the researcher’s needs and perspectives.
• A concise, structured document suitable for sharing and annotation.
• A resource for Librarians, Archivists, IT Professionals, Data Managers, and others.

Characteristics of the DCP

• Tells “the story” of the data
• Focused on a specific data set – provides depth not breadth
• Interview based
• Meant to be “discipline neutral” and widely applicable to different types of data
• Modular – allows for flexibility and tailoring to specific situations and uses

DCP Sections
• Information about the Data and its Context
– Overview of the Research
  • Focus
  • Intended Audience
  • Funding
– Data Kinds and Stages
  • Data Narrative (data lifecycle)
  • Target Data for Sharing
  • Use/re-use Value
  • Contextual Narrative

Primary Data

| Data Stage | Output | Typical File Size | Format | Other / Notes |
|---|---|---|---|---|
| Raw | Sensor data | 100k in 1 file per day | proprietary to the sensor | FTP downloads are mostly automated. |
| Processing Stage 1 | Sensor data – open/accessible format | Roughly 6kb | .csv / .xls | Data are formatted into .csv before being reformatted into a mySQL database. |
| Processed | Data vectors | 800 records per intersection per day | SQL / .xls | Data are extracted from the mySQL database for analysis purposes. |
| Analyzed | charts/graphs | | .xls / .emf | Charts and graphs used for interpretation. |
| Published | charts/graphs | | .ppt | Data are presented via PowerPoint. |

Ancillary Data

| Data Stage | Output | Typical File Size | Format | Other / Notes |
|---|---|---|---|---|
| Image | Stills taken from video | | .gif / .jpg / .ppt | Images generated from video. |

More DCP Sections
• Information about Needs
– Intellectual Property
– Organization and description of data
– Ingest
– Access
– Discovery
– Tools
– Interoperability
– Measuring Impact
– Data Management
– Preservation

The DCP Toolkit

The Data Curation Profile Toolkit consists of four components:
• User Guide
• Interviewer’s Manual
• Interview Worksheet
• DCP Template


Scenarios for using DCP across campus

• Research proposal development: data mgmt plans
• Research data sets: local data management
• Research data sets: local data assessment
• Resource data collections: data of use to community
• Reference data collections: data useful in long term

• Helping faculty in preparation of sharing data
• Helping faculty in preparation of archiving data in repository
• In general liaison role, seeking to understand specific faculty research
• Used during a collaboration or research project to understand data mgmt needs
• Participating with VP for Research Office to understand NSF “mandate” needs

How does Purdue use the Profiles?
• Strategy 1: D2C2 provides expert consultant/collaborator reputation with researchers.
• Strategy 2: D2C2 creates tools to help liaison librarians engage researchers; Data Curation Profiles help give structure to conversations about data and facilitate information gathering.


Jake Carlson Data Services Specialist