Use Case: Nanoscopy Open Research Data Repository (NORDR)
RDA Repository Platforms for Research Data Interest Group
Author(s): A. Prabhune, T. Jejkal, V. Hartmann, R. Stotzka
1. Scientific Motivation and Outcomes
Novel imaging methods enable new insights, but the images are very often hard to interpret, and potential users and scientists need to be trained on reference data and its already known interpretation. These reference data need to be publicly shared in an open reference data repository. Furthermore, new insights into the reference data will be gained with growing experience. Therefore, appropriate tools must enable open discussion of the data and the associated analysis results. This necessitates data sharing on the one hand and annotation capabilities on the other. The nanoscopy research data repository supports the complete data life cycle by providing services such as long-term archival, large-scale data processing, automated metadata modelling and storage, annotation, and data publication.
2. Functional Description
High-resolution microscopes generate raw datasets in the range of hundreds of terabytes, which are ingested into the NORDR. Different raw data files are generated depending on the microscope. Based on the data type of each file, the metadata is automatically extracted, modelled and stored in the metadata storage. Each dataset is assigned a persistent identifier (PID). For systematic management, the captured metadata is categorised as Administrative Metadata (AM), Descriptive Metadata (DM) or Technical Metadata (TM). Once a raw dataset is ingested, community-defined data processing workflows are executed. The results of all workflow steps, including intermediate results, are ingested into the NORDR for reuse and are linked to the dataset they originate from. In addition, a corresponding provenance graph is created and stored. Experts from the Nanoscopy Research Community evaluate and annotate the results. Depending on the evaluation results and new insights from the community, processing algorithms and workflows are improved and the datasets have to be reprocessed. To enable data discovery, the NORDR provides various services built on top of the metadata storage. The NORDR supports the METS metadata standard for metadata interoperability. Metadata mining services make it possible to analyse and compare different workflow results.
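The ingest flow described above can be illustrated with a minimal sketch. It is not the NORDR implementation: the class `DatasetRecord`, the functions `ingest` and `provenance_chain`, the extractor table, the file suffixes and the PID format are all hypothetical names chosen for illustration. The sketch only shows the general idea of selecting a metadata extractor by file type, sorting metadata into AM, DM and TM, and linking a workflow result back to the dataset it originates from.

```python
import uuid
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical record: a PID plus metadata split into AM, DM and TM."""
    pid: str
    administrative: dict = field(default_factory=dict)  # AM
    descriptive: dict = field(default_factory=dict)     # DM
    technical: dict = field(default_factory=dict)       # TM
    derived_from: Optional[str] = None  # PID of the source dataset, if any


def extract_tiff(filename):
    # Placeholder extractor: a real one would read the file header.
    return {"format": "TIFF"}


def extract_hdf5(filename):
    # Placeholder extractor: a real one would inspect the HDF5 attributes.
    return {"format": "HDF5"}


# Assumed mapping from file suffix to extractor (illustrative only).
EXTRACTORS = {".tif": extract_tiff, ".tiff": extract_tiff, ".h5": extract_hdf5}


def ingest(filename, owner, title, derived_from=None):
    """Build a record for one file, choosing the extractor by file type."""
    suffix = PurePath(filename).suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"no metadata extractor for file type {suffix!r}")
    return DatasetRecord(
        pid=f"hypothetical-pid/{uuid.uuid4()}",  # stand-in for a real PID service
        administrative={"owner": owner},
        descriptive={"title": title},
        technical=extractor(filename),
        derived_from=derived_from,
    )


def provenance_chain(pid, records):
    """Follow derived_from links from a result back to the raw dataset."""
    chain = []
    while pid is not None:
        chain.append(pid)
        pid = records[pid].derived_from
    return chain
```

A workflow result would be ingested with `derived_from` set to the PID of its input, so that `provenance_chain` can walk from any result back to the raw dataset. A full provenance graph would also record the workflow step and its parameters, which this sketch leaves out.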
3. Achieved Results
A first version of the NORDR is installed. The data ingest and access workflow has been implemented, tested and made available to the community. A generic metadata framework for extracting, modelling and storing heterogeneous metadata has been defined and is partly implemented.
4. Requirements
| Requirement | Description | Motivation from Use Case | Importance (1 - very important to 5 - not at all important) |
| --- | --- | --- | --- |
| Data Annotation | The novel nanoscopy result images have to be annotated to capture valuable insights | Remotely located researchers can share their insights | 1 |
| Vocabulary Service | Scientific terms have to be consistent for allowing future reuse | Integrating vocabulary for allowing systematic annotations | 2 |
| Data and Metadata quality control | Bit preservation, checksum of data and metadata, completeness, accuracy, correctness, etc. | Metadata is the backbone for data discovery and hence quality assessment is necessary | 1 |
| High performance computing (HPC) integrating with Data repository | For efficient processing of large datasets | Frequent (re-)processing must not be triggered by users but should be seamlessly integrated into the repository system | 2 |
| DOIs assignment | For publishing the results there has to be a clear mapping between the PID and the DOI | DOIs will allow data sharing and, with PID mapping, enable reproducibility of results | 2 |
| Data Policies | Data policies are used to define what happens when to which dataset | Especially for processing and quality control, regularly enforced policies are helpful | 2 |
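The quality control requirement names checksums of data and metadata as a means of bit preservation. The sketch below shows one common way such a check can be done. It is an assumption, not the NORDR's documented method: the source does not name a checksum algorithm, so SHA-256 is used here as an example, and the function names `sha256_of` and `is_intact` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks
    so that very large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """Return True if the file still matches the checksum recorded at ingest."""
    return sha256_of(path) == expected_hex
```

A regularly enforced data policy could call `is_intact` for each archived file, using the digest stored at ingest time, and flag any mismatch for repair from a replica.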