DataONE: Ensuring Access, Use and Reuse of Science Data
Rebecca Koskela
DataONE, University of New Mexico
Virtualizing infrastructures for Data e-Infrastructures & RDA for data intensive science
22 September 2015

Developing Sustainable Data Discovery and Interoperability Solutions
Three major components for a flexible, scalable, sustainable network

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

DataONE Coordinating Nodes and Member Nodes
• Coordinating Nodes at UNM, UCSB, and ORC
• Member Nodes: current and in development
[Figure: map of Coordinating Nodes and Member Nodes]

Enabling Science Through Tools and Services
[Figure: data life cycle: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze]

Provenance
Phase II Goal: Facilitate reproducible science
• Track data derivation history
• Track data inputs and outputs of analyses
• Track analysis and model executions
• Preserve and document software
• Link all of these to publications
New Tools
• Provenance indexing and search
• Web user interface for browsing provenance
• Matlab tool for generating provenance
• {R tool for generating provenance}

Semantics: enabling improved discovery
Goal: Improve precision and recall of search
• Annotating data with semantic types
• Extending search systems to incorporate semantics
• Display / edit semantic annotations in the web UI
New first-class objects in DataONE
• Annotating data with semantic types
• Semantic Annotations
New Tools
• New model for linking semantics to data
• New KR models for Carbon Cycle and related areas
• Ontology repository
• Automated classifier to assign semantics to variables
• Web UI for browsing and editing semantics

Collaborations

Challenges
• Diverse data and standards
• Diverse institutions
• Diverse software
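
Note on the semantics goal: the slides state the search goal in terms of precision and recall without defining them. A minimal sketch of how the two measures are computed for a single query follows. The dataset identifiers and both result lists are invented for illustration and are not DataONE results; `precision_recall` is a hypothetical helper, not part of any DataONE tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers: the datasets that truly answer the query.
relevant = {"ds1", "ds2", "ds3", "ds4"}

# Hypothetical result lists for a keyword search and a
# semantically annotated search over the same catalog.
keyword_hits = ["ds1", "ds2", "ds7", "ds8", "ds9"]
semantic_hits = ["ds1", "ds2", "ds3", "ds7"]

keyword_pr = precision_recall(keyword_hits, relevant)
semantic_pr = precision_recall(semantic_hits, relevant)

print("keyword :", keyword_pr)
print("semantic:", semantic_pr)
```

In this made-up case the annotated search returns fewer irrelevant datasets (higher precision) and finds more of the relevant ones (higher recall), which is the kind of improvement the semantics work aims for.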