DataONE: Ensuring access, use and reuse of science data

Rebecca Koskela

DataONE, University of New Mexico

Virtualizing infrastructures for Data e-Infrastructures & RDA for data intensive science
22 September 2015

Developing Sustainable Data Discovery and Interoperability Solutions

Three major components for a flexible, scalable, sustainable network

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data
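The division of labor between Coordinating Nodes and Member Nodes can be illustrated with a small sketch. This is a toy in-memory model, not the DataONE API: the class names, method names, and the identifier `example-dataset-1` are all hypothetical, and real nodes communicate over network services rather than Python objects. It shows only the roles listed above: Member Nodes hold the data, while a Coordinating Node keeps the metadata catalog, indexes it for search, and tracks replicas.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical repository holding data objects for its local community."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: metadata catalog, search index, replicas."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata
    index: dict = field(default_factory=dict)      # keyword -> set of identifiers
    locations: dict = field(default_factory=dict)  # identifier -> list of node names

    def register(self, node, identifier, data, metadata):
        """Store data on a member node and record its metadata centrally."""
        node.objects[identifier] = data
        self.catalog[identifier] = metadata
        holders = self.locations.setdefault(identifier, [])
        if node.name not in holders:
            holders.append(node.name)
        # Index each word of the title so the object can be found by keyword.
        for word in metadata["title"].lower().split():
            self.index.setdefault(word, set()).add(identifier)

    def search(self, keyword):
        """Return identifiers whose title contains the keyword."""
        return sorted(self.index.get(keyword.lower(), set()))

    def replicate(self, identifier, source, target):
        """Copy an object to another member node and record the new replica."""
        target.objects[identifier] = source.objects[identifier]
        if target.name not in self.locations[identifier]:
            self.locations[identifier].append(target.name)


# Usage: one object registered at one member node, then replicated to another.
cn = CoordinatingNode()
mn_a = MemberNode("mn-a")
mn_b = MemberNode("mn-b")
cn.register(mn_a, "example-dataset-1", b"site,temp\nA,12.5\n",
            {"title": "Stream temperature survey"})
cn.replicate("example-dataset-1", mn_a, mn_b)
hits = cn.search("Temperature")
holders = cn.locations["example-dataset-1"]
```

In this sketch the Coordinating Node never stores the data itself, only the catalog entry and the list of Member Nodes holding a copy, which mirrors the split between "retain complete metadata catalog" and "retain copies of data" on the slide.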

DataONE Coordinating Nodes and Member Nodes

Coordinating Nodes at UNM, UCSB, and ORC
Member Nodes: current and in development

Enabling Science Through Tools and Services

Data life cycle: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze


Phase II Goal: Facilitate reproducible science

• Track data derivation history
• Track data inputs and outputs of analyses
• Track analysis and model executions
• Preserve and document
• Link all of these to publications

New Tools
• Provenance indexing and search
• Web user interface for browsing provenance
• MATLAB tool for generating provenance
• { tool for generating provenance}

Semantics: enabling improved discovery

Goal: Improve precision and recall of search
• Annotating data with semantic types
• Extending search systems to incorporate semantics
• Display / edit semantic annotations in the web UI

New first-class objects in DataONE
• Semantic Annotations

New Tools
• New model for linking semantics to data
• New KR models for Carbon Cycle and related areas
• Ontology repository
• Automated classifier to assign semantics to variables
• Web UI for browsing and editing semantics

Collaborations

Challenges

• Diverse data and standards

• Diverse institutions

• Diverse software
