D4.3 – Data Sets, Formats and Models (Public Version)

Project Acronym: DataBio
Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)
Project Full Title: Data-Driven Bioeconomy
Project Coordinator: INTRASOFT International

DELIVERABLE D4.3 – Data sets, formats and models (Public version)

Dissemination level: PU - Public
Type of Document: Report
Contractual date of delivery: M20 – 31/8/2018
Deliverable Leader: SINTEF
Status - version, date: Final – v1.0-Public, 12/12/2018
WP / Task responsible: WP4 (T4.5 and T4.6)
Keywords: data set, metadata, datastream

This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Executive Summary

This is a public version of Deliverable D4.3 “Data sets, formats and models”. Confidential information from the original document has been omitted. The D4.3 document starts with an introduction to the DataBio project and to other documents related to D4.3, followed by an introduction to data sharing and the data economy in the context of DataBio. The FAIR principles (Findable, Accessible, Interoperable, Reusable) are introduced as a foundation for finding, accessing, interoperating and reusing data, and as further motivation for metadata and for the discovery of datasets through data registries, in particular the DataBio Hub. Options for further support of data sharing and data exchange are also discussed, in particular through the use of linked data and industrial data platforms. The context of datasets in DataBio is then presented, including external drivers for data sharing and data exchange, stakeholders and license models.
Data interoperability through ontologies, models, formats and standards, and data access through standard services and APIs, are introduced in relation to DataBio's standardisation engagement, in particular in the geospatial and Earth Observation areas. Furthermore, an overview of the requirements for datasets and datastreams in DataBio is presented, grouped by pilot and by the platform itself. This is followed by a detailed description of the existing, improved, new and other relevant datasets in DataBio, using the metadata template from the dataset descriptions in the DataBio Hub. The final section gives an example of how a dataset can be used for application development, followed by concluding remarks. The deliverable also comprises contributions from WP5 on the EO datasets and from the WP4 tasks T4.5 Big Data Variety Management and T4.6 Data Acquisition with Security support.

The first phase of the DataBio project has focused on the use and creation of datasets based on the needs and requirements of the DataBio pilots. The next phase will continue this work, but will also focus more on the interoperability of datasets through the use of ontologies, potential standard data models, and access mechanisms such as services and APIs. There will also be an increased focus on secure data sharing and data exchange beyond the individual pilots, to support a growing data economy in the DataBio areas of agriculture, forestry and fishery.

Relation with Other DataBio Platform Deliverables

The DataBio project includes three piloting work packages (WP1-3) and two related platform work packages that support the pilots (Figure 1): WP4, which handles data in general including IoT data, and WP5, which focuses on Earth Observation and geospatial data. The DataBio platform provides Big Data capabilities to the pilots by forming software pipelines of components through which data flows from the sources in agriculture, forestry and fishery, passing through the data management, analytics and visualization stages in the pilots.

Figure 1: Work packages and their roles in DataBio (WP1-3: Agro Pilots 1-13, Forest Pilots 1-7 and Fishery Pilots 1-6; WP4: components and IoT datasets, Deliverables D4.1, D4.2, D4.3, Milestone M7; WP5: Earth Observation components and datasets, Deliverables D5.1, D5.2, D5.3, Milestone M9; together forming the DataBio platform with big data components)

The platform developed in DataBio is described in Deliverables D4.1, D4.2 and D4.3 (WP4) and D5.1, D5.2 and D5.3 (WP5) (Figure 1). Deliverables D4.1-3 define Milestone M7 “Service ready for Trial 1”, whereas Deliverables D5.1-3 define Milestone M9 “EO Services ready for integration”. The platform services and pipelines have been in trials since April 2018 (M16).

More specifically, the public deliverable D4.1 “Platforms and interfaces” describes the software components to be used by the pilots. Most of these components are already in use in the first pilot trials. D4.1 also reports the outcome of a matchmaking process in which the pilots selected which components to deploy. Deliverable D4.2 “Services for tests” builds on D4.1 and provides an overview of the component pipelines as identified at month 16 (M16) of the project. It also provides guidelines for the successful implementation and deployment of the pipelines. This deliverable, D4.3 “Datasets, formats and models”, is due at the end of August 2018. While the two earlier reports deal with software modules, this report focuses on the datasets and data streams employed in DataBio. It covers the data formats, standards and models that enable easy findability, accessibility, interoperability and reusability of data (the FAIR principles).
Thus, this deliverable addresses topics beyond the scope of individual pilots.

Deliverable D5.1 “EO component specification” includes an analysis of the EO dataset and component requirements provided by the pilots. It was published at the end of 2017 and contains an overview of best practices for EO data access and the initial component and dataset requirements based on the needs of the DataBio pilots. Deliverable D5.2 “EO component and interfaces”, building on D5.1, describes the Earth Observation component pipelines in the same way that D4.2 does for the IoT components. It also includes examples of data experiments with the pipelines. Deliverable D5.3 “EO services and tools” builds on D5.1 and D5.2 and describes how the technical components from DataBio can be scaled up to services and tools installed either as Software as a Service (SaaS) or on premises. It further describes how, and under which conditions, these services and tools can be accessed externally.

Deliverable Leader: Arne-Jørgen Berre (SINTEF)

Contributors: Ståle Walderhaug (SINTEF), Pekka Siltanen (VTT), Caj Södergård (VTT), Miguel Ángel Esbrí (ATOS), Javier Hitado Simarro (ATOS), Ephrem Habyarimana (CREA), Iason Kastanis (CSEM), Margus Freudenthal (CYBER), Allan Aasbjerg Nielsen (DTU), Marco Corsi (e-geos), Kostas Akasoglou (EXUS), Ioannis Komnios (EXUS), Adamantios Maragkos (EXUS), Anuj Sharma (EXUS), Charikleia Stefanou (EXUS), Dimitris Vassiliadis (EXUS), Petr Lukes (FMI), Eva Klien (Fraunhofer), Ivo Senner (Fraunhofer), Fabiana Fournier (IBM), Inna Skarbovsky (IBM), Christian Zinke (InfAI), George Bravos (INTRASOFT), Vassilis Chatzigiannakis (INTRASOFT), Karel Charvat (LESPRO), Karel Charvat, jr (LESPRO), Tomas Reznik (LESPRO), Anu Kosunen (METSAK), Virpi Stenman (METSAK), Seppo Huurinainen (MHGS), Veli-Matti Plosila (MHGS), Panagiotis Elias (NP), Kostas Karalas (NP), Stamatis Krommidas (NP), Kostas Mastrogiannis (NP), Natassa Miliaraki (NP), Ilias Panos (NP), Menelaos Perdikeas (NP), Savvas Rogotis (NP), Pavlos Tsagkis (NP), Marco Folegani (MEEO), Ingo Simonis (OGCE), Soumya Brahma (PSNC), Raul Palma (PSNC), Juliusz Pukacki (PSNC), Jarkko Vähäkangas (Senop), Andrey Sadovykh (Softeam), Marc Gilles (Spacebel), Yves Coene (Spacebel), Anca Liana Costea (TerraS), Adrian Stoica (TerraS), Delia Teleaga (TerraS), Jesus Estrada Villegas (TRAGSA), Asuncion Roldan Zamarron (TRAGSA), Michal Kepka (UWB), Karel Jedlička (UWB), Tomas Mildorf (UWB), Erwin Goor (VITO), Jarmo Kalaoja (VTT), Tuomas Paaso (VTT), Kari Rainio (VTT), Renne Tergujef (VTT), Per Gunnar Auran (SINTEF Fishery)

Reviewers: Tomas Mildorf (UWB), Virpi Stenman (METSAK)

Approved by: Athanasios Poulakidas (INTRASOFT)

Document History

Version | Date | Contributor(s) | Description
0.1 | 05.06.2018 | Ståle Walderhaug | Initial ToC
0.2 | 21.06.2018 | Ståle Walderhaug / Arne J. Berre | ToC with section assignments
0.3 | 01.08.2018 | | Datasets included from partners
0.4 | 15.08.2018 | Adrian Stoica, Terrasigna | D5.i2 datasets included. Added FAIR data. Examples of use included. Updated with license policy information.
0.5 | 20.08.2018 | Ståle Walderhaug | Added concerns section
0.6 | 24.08.2018 | Caj Södergård, Ståle Walderhaug | Requirements in place. Dataset updates
0.7 | 28.08.2018 | Arne J. Berre, Ståle Walderhaug, Caj Södergård | Version for internal review
0.8 | 31.08.2018 | Ståle Walderhaug, Arne | Version updated after internal review
