Project Acronym: DataBio
Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)
Project Full Title: Data-Driven Bioeconomy
Project Coordinator: INTRASOFT International

DELIVERABLE D4.3 – Data sets, formats and models (Public version)

Dissemination level: PU - Public
Type of Document: Report
Contractual date of delivery: M20 – 31/8/2018
Deliverable Leader: SINTEF
Status - version, date: Final – v1.0-Public, 12/12/2018
WP / Task responsible: WP4 (T4.5 and T4.6)
Keywords: data set, metadata, datastream

This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Executive Summary

This is a public version of Deliverable D4.3 “Data sets, formats and models”. Confidential information from the original document has been omitted.

The D4.3 document starts with an introduction to the DataBio project and to other documents related to D4.3, followed by an introduction to data sharing and the data economy in the context of DataBio. The FAIR principles are introduced as a foundation for finding, accessing, interoperating with and reusing data, and as a further motivation for metadata and for the discovery of datasets through data registries, in particular the DataBioHub. Options for further support of data sharing and data exchange are also presented, in particular through the use of linked data and industrial data platforms.

The context of datasets in DataBio is then presented, including external drivers for data sharing and data exchange, stakeholders and license models. Data interoperability through ontologies, models, formats and standards, and data access through standard services and APIs, are introduced in relation to the DataBio standardisation engagement, in particular in the geospatial and Earth Observation areas. Furthermore, an overview is given of the requirements for datasets and datastreams in DataBio, grouped by pilots and the platform itself. This is followed by a detailed description of the existing, improved, new and other relevant datasets in DataBio, using a metadata template taken from the dataset descriptions in the DataBioHub. The final section gives an example of how a dataset can be used for application development, followed by concluding remarks.

The deliverable also comprises contributions from WP5 on the EO datasets and from the WP4 tasks T4.5 Big Data Variety Management and T4.6 Data Acquisition with Security support.

The first phase of the DataBio project has focused on the usage and creation of datasets based on the needs and requirements of the DataBio pilots. The next phase will continue this work, but with an increased focus on the interoperability aspects of datasets through the use of ontologies and potential standard data models, access mechanisms/services and APIs. There will also be an increased focus on secure data sharing and data exchange beyond the individual pilots, to support a growing data economy in the DataBio areas of agriculture, forestry and fishery.

Relation with Other DataBio Platform Deliverables

The DataBio project includes three piloting work packages (WP1-3) and two related platform work packages (WP4, handling data in general including IoT data, and WP5, focusing on Earth Observation and geospatial data) that support the pilots (Figure 1). The DataBio platform provides Big Data capabilities to the pilots by forming software pipelines of components through which data flows from the sources in agriculture, forestry and fishery through data management, analytics, and visualization stages in the pilots.

[Figure 1: Work packages and their roles in DataBio. The diagram shows the pilot work packages WP1-3 (Agro Pilots 1-13, Forest Pilots 1-7, Fishery Pilots 1-6) on top of the DataBio platform with big data components and datasets: WP4 components and IoT datasets (Deliverables D4.1, D4.2, D4.3; Milestone M7) and WP5 Earth Observation components and datasets (Deliverables D5.1, D5.2, D5.3; Milestone M9).]

The platform developed in DataBio is described in the Deliverables D4.1, D4.2, D4.3 (WP4) and D5.1, D5.2, D5.3 (WP5) (Figure 1). Deliverables D4.1-3 define the Milestone M7 Service ready for Trial 1, whereas Deliverables D5.1-3 define the Milestone M9 EO Services ready for integration. The platform services and pipelines have been in trials since April 2018 (M16).

More specifically, the public deliverable D4.1 Platforms and interfaces describes the software components to be utilized by the pilots. Most of the components are already in use in the first pilot trials. In addition, this deliverable reports the outcome of a matchmaking process in which the pilots selected the components to deploy.

Deliverable D4.2 Services for tests builds on D4.1 and provides an overview of the component pipelines as identified at month 16 (M16) of the project. It also provides guidelines for successful implementation and deployment of the pipelines.

This deliverable, D4.3 Datasets, formats and models, is due at the end of August 2018. While the two earlier reports deal with software modules, this report focuses on the data sets and streams employed in DataBio. It deals with data formats, standards and models that enable easy findability, access, interoperability and reusability of data (the FAIR principles). Thus, this deliverable addresses topics beyond the coverage of single pilots.

Deliverable D5.1 EO component specification includes an analysis of the EO dataset and component related requirements provided by the pilots. It was published at the end of 2017 and contains an overview of best practices of EO access and initial component and dataset requirements based on the DataBio pilot needs.


Deliverable D5.2 EO component and interfaces builds on D5.1 and describes the Earth Observation component pipelines, much as D4.2 does for the IoT components. It also includes examples of data experiments with the pipelines.

Deliverable D5.3 EO services and tools builds on D5.1 and D5.2 and describes how the technical components from DataBio can be scaled up to services and tools that are installed as Software as a Service (SaaS) or on premise. It further explains how and under which conditions these services and tools can be accessed externally.


Deliverable Leader: Arne-Jørgen Berre (SINTEF)

Contributors: Ståle Walderhaug (SINTEF), Pekka Siltanen (VTT), Caj Södergård (VTT), Miguel Ángel Esbrí (ATOS), Javier Hitado Simarro (ATOS), Ephrem Habyarimana (CREA), Iason Kastanis (CSEM), Margus Freudenthal (CYBER), Allan Aasbjerg Nielsen (DTU), Marco Corsi (e-geos), Kostas Akasoglou (EXUS), Ioannis Komnios (EXUS), Adamantios Maragkos (EXUS), Anuj Sharma (EXUS), Charikleia Stefanou (EXUS), Dimitris Vassiliadis (EXUS), Petr Lukes (FMI), Eva Klien (Fraunhofer), Ivo Senner (Fraunhofer), Fabiana Fournier (IBM), Inna Skarbovsky (IBM), Christian Zinke (InfAI), George Bravos (INTRASOFT), Vassilis Chatzigiannakis (INTRASOFT), Karel Charvat (LESPRO), Karel Charvat, jr (LESPRO), Tomas Reznik (LESPRO), Anu Kosunen (METSAK), Virpi Stenman (METSAK), Seppo Huurinainen (MHGS), Veli-Matti Plosila (MHGS), Panagiotis Elias (NP), Kostas Karalas (NP), Stamatis Krommidas (NP), Kostas Mastrogiannis (NP), Natassa Miliaraki (NP), Ilias Panos (NP), Menelaos Perdikeas (NP), Savvas Rogotis (NP), Pavlos Tsagkis (NP), Marco Folegani (MEEO), Ingo Simonis (OGCE), Soumya Brahma (PSNC), Raul Palma (PSNC), Juliusz Pukacki (PSNC), Jarkko Vähäkangas (Senop), Andrey Sadovykh (Softeam), Marc Gilles (Spacebel), Yves Coene (Spacebel), Anca Liana Costea (TerraS), Adrian Stoica (TerraS), Delia Teleaga (TerraS), Jesus Estrada Villegas (TRAGSA), Asuncion Roldan Zamarron (TRAGSA), Michal Kepka (UWB), Karel Jedlička (UWB), Tomas Mildorf (UWB), Erwin Goor (VITO), Jarmo Kalaoja (VTT), Tuomas Paaso (VTT), Kari Rainio (VTT), Renne Tergujef (VTT), Per Gunnar Auran (SINTEF Fishery)

Reviewers: Tomas Mildorf (UWB), Virpi Stenman (METSAK)

Approved by: Athanasios Poulakidas (INTRASOFT)


Document History

Version | Date | Contributor(s) | Description
0.1 | 05.06.2018 | Ståle Walderhaug | Initial ToC
0.2 | 21.06.2018 | Ståle Walderhaug / Arne J. Berre | ToC with section assignments
0.3 | 01.08.2018 | | Datasets included from partners
0.4 | 15.08.2018 | Adrian Stoica, Terrasigna | D5.i2 datasets included. Added FAIR data. Examples of use included. Updated with license policy information.
0.5 | 20.08.2018 | Ståle Walderhaug | Added concerns section
0.6 | 24.08.2018 | Caj Södergård, Ståle Walderhaug | Requirement in place. Datasets updates
0.7 | 28.08.2018 | Arne J. Berre, Ståle Walderhaug, Caj Södergård | Version for internal review
0.8 | 31.08.2018 | Ståle Walderhaug, Arne J Berre | Version updated after internal review
1.0 | 31.08.2018 | Athanasios Poulakidas | Final version for submission
1.0-Public | 12.12.2018 | Caj Södergård | Public version of the document


Table of Contents

EXECUTIVE SUMMARY
RELATION WITH OTHER DATABIO PLATFORM DELIVERABLES
TABLE OF CONTENTS
TABLE OF FIGURES
LIST OF TABLES
DEFINITIONS, ACRONYMS AND ABBREVIATIONS
INTRODUCTION
  1.1 PROJECT SUMMARY
  1.2 DOCUMENT SCOPE
  1.3 DOCUMENT STRUCTURE
BACKGROUND
  2.1 DATA SHARING AND DATA ECONOMY IN DATABIO
  2.2 FAIR PRINCIPLES
  2.3 METADATA AND DISCOVERY OF DATASETS
  2.4 DATA REGISTRIES, DATA SHARING AND DATA EXCHANGE
    2.4.1 DataBioHub
    2.4.2 Linked Data and Open Micka
    2.4.3 Industrial data spaces
    2.4.4 Openness and payment
    2.4.5 UXP – Exchange Platform - Cybernetica
  2.5 INDUSTRIAL DATA SPACES AND CONNECTORS
    2.5.1 EU Data Portal
    2.5.2 GEOSS
    2.5.3 DCAT and GeoDCAT
    2.5.4 CKAN
  2.6 OTHERS
CONTEXT VIEW
  3.1 EXTERNAL DRIVERS FOR DATA SHARING AND DATA EXCHANGE
  3.2 DATA INTEROPERABILITY THROUGH ONTOLOGIES, MODELS, FORMATS AND STANDARDS
    3.2.1 Geospatial and Earth Observation ontologies and standards
    3.2.2 Agricultural ontologies and standards
    3.2.3 Forestry ontologies and standards
    3.2.4 Fishery ontologies and standards
  3.3 DATA ACCESS THROUGH STANDARD SERVICES AND APIS
    3.3.1 Geospatial Standards, Data Types and Services
    3.3.2 Sensor Standards, ontologies, data representations
    3.3.3 API approach
  3.4 STAKEHOLDERS AND CONCERNS
  3.5 LICENSE MODELS FOR DATA REUSE
REQUIREMENTS VIEW
  4.1 TYPES OF EO DATA AND SENSORS USED IN THE DATABIO PILOTS AND THEIR CHARACTERISTICS
  4.2 DATASETS AND DATASTREAM REQUIREMENTS FROM PLATFORM
  4.3 DATASETS AND DATASTREAM REQUIREMENTS FROM AGRICULTURE PILOTS


  4.4 DATASETS AND DATASTREAM REQUIREMENTS FROM FORESTRY PILOTS
  4.5 DATASETS AND DATASTREAM REQUIREMENTS FROM FISHERY PILOTS
DATASETS: EXISTING, IMPROVED, NEW AND OTHERS
  5.1 EXISTING DATASETS UTILIZED BY DATABIO PILOTS
    5.1.1 Open Transport Map (UWB - D03.02)
    5.1.2 Forest resource data (METSAK - D18.01)
    5.1.3 Landsat 8 OLI data
    5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data
    5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)
    5.1.6 MODIS data
    5.1.7 Proba-V data
    5.1.8 Global Precipitation Measurement (GPM) mission data
    5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data
    5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data
    5.1.11 Sentinel 2A (ESA D11.01)
    5.1.12 Sentinel-2 Data
    5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data
    5.1.14 Sentinel 3 MWR (Microwave Radiometer) data
  5.2 DATASETS IMPROVED BY DATABIO
    5.2.1 RPAS (Remotely Piloted Aircraft Systems) data
    5.2.2 Orthophotos
    5.2.3 gaiasense field (D13.01)
    5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02)
    5.2.5 Customer and forest estate data (METSAK - D18.02)
  5.3 NEW DATASETS CREATED DURING DATABIO
    5.3.1 Canopy height map (FMI - D14.05)
    5.3.2 Orthophotos - (IGN - D11.02)
    5.3.3 GEOSS sources (D11.03)
    5.3.4 RPAS data (Tragsa - D11.04)
    5.3.5 MFE Spanish Forest Map (D11.06)
    5.3.6 Field data - pilot B2 (Tragsa - D11.07)
    5.3.7 Forest damage (FMI - D14.07)
    5.3.8 Open Forest Data (METSAK - D18.01)
    5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)
    5.3.10 Leaf area index (FMI - D14.06)
    5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)
    5.3.12 Ontology for (Precision) Agriculture (PSNC - D09.01)
    5.3.13 Open Land Use (Lespro - D02.01)
    5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)
    5.3.15 Quality control data (METSAK - D18.04)
    5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)
    5.3.17 SigPAC (Tragsa - D11.05)
    5.3.18 Smart POI dataset (Lespro - D02.01)
    5.3.19 Stand age map (FMI - D14.04)
    5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)
    5.3.21 Forest road condition observations (METSAK - D18.03b)
    5.3.22 Tree species map (FMI - D14.03)
    5.3.23 Wuudis data (MHGS - D20.01)
  5.4 RECOMMENDED INTERACTION STRUCTURES: ATOS


CONCLUDING REMARKS
REFERENCES
APPENDIX A METADATA TEMPLATE TABLE

Table of Figures

FIGURE 1: WORK PACKAGES AND THEIR ROLES IN DATABIO
FIGURE 2: HOW DISTRIBUTED STORAGE AND PAYMENTS WORK
FIGURE 3: FUNCTIONAL ARCHITECTURE OF THE INDUSTRIAL DATA SPACE
FIGURE 4: OPENAIRE
FIGURE 5: DRYAD
FIGURE 6: THE FLUX STANDARDS AND STATUS (FROM UN ESCAP PRESENTATION OF DR HEINER LEHR) [REF-37]
FIGURE 7: ARCHIMATE STRATEGY DIAGRAM SHOWING HOW THE PILOT SYSTEM WILL REALIZE THE DEFINED GOALS
FIGURE 8: ARCHIMATE BUSINESS DIAGRAM SHOWING THE DATA PROCESSING, DATASETS AND ACTORS INVOLVED
FIGURE 9: ARCHIMATE DATA VIEW FOR ONE OF THE FISHERY PILOTS (B2)
FIGURE 10: THE B2 FISHERY PILOT LIFECYCLE VIEW SHOWING HOW DATA IS PROVIDED AS INPUT TO PROCESSING STEPS
FIGURE 11: THE B2 FISHERY PILOT PIPELINE VIEW SHOWING HOW DATASETS ARE INTERFACED
FIGURE 12: EO DATA COLLECTION CONTEXT

List of Tables

TABLE 1: THE DATABIO CONSORTIUM PARTNERS
TABLE 2: TYPES OF DATA USED IN DATABIO PILOT PROJECTS


Definitions, Acronyms and Abbreviations

Acronym/Abbreviation | Title
ADES | Application Deployment and Execution Service
AMS | Application Management Client
API | Application programming interface
ArchiMate | ArchiMate® Specification, modelling language for Enterprise Architecture
ATOM | ATOM (Syndication Format)
BDVA | Big Data Value Association
CAP | Common Agricultural Policy
CCSDS | Consultative Committee for Space Data Systems
CEOS | Committee on Earth Observation Satellites
CEP | Complex Event Processing
CETL | Connect Extract Transform and Load
CKAN | Comprehensive Knowledge Archive Network
CMEMS | Copernicus Marine Environment Monitoring Service
CMR | Common Metadata Repository
CPS | Cyber Physical Systems
CSW | Catalogue Service for the Web
DCAT | Data Catalog Vocabulary
DDS | Data Distribution System
DEI | Digitising European Industry
DIAS | Data and Information Access Services
DSL | Domain Specific Language
DWG | Domain Working Group
ECMWF | European Centre for Medium-Range Weather Forecasts
ECSS | European Cooperation for Space Standardization
EO | Earth Observation
ERS | European Remote Sensing Satellite
ESA | European Space Agency
FAD | Fish Aggregating Devices
FTP | File Transfer Protocol
GEMET | GEneral Multilingual Environmental Thesaurus
GEO | Group on Earth Observations
GSCDA | GMES Space Component Data Access
GUI | Graphical User Interface
HLA | High Level Architecture
HPC | High-Performance Computing
HTTP | Hypertext Transfer Protocol
IAS | Invasive Alien Species
IDP | Industrial Data Platform
IETF | Internet Engineering Task Force
INSPIRE | Infrastructure for Spatial Information in Europe
IoT | Internet of Things
ISO | International Organization for Standardization
JSON | JavaScript Object Notation
KMI | Koninklijk Meteorologisch Instituut
KML | Keyhole Markup Language
KNMI | Koninklijk Nederlands Meteorologisch Instituut
LPIS | Land Parcel Identification System
NASA | National Aeronautics and Space Administration
NG | Next Generation
NIST | National Institute of Standards and Technology
NN | Nearest Neighbors
OAIS | Open Archival Information System
OASIS | Organization for the Advancement of Structured Information Standards
ODBC | Open Database Connectivity
OGC | Open Geospatial Consortium
OLCI | Ocean and Land Colour Instrument
OLU | Open Land Use
OTM | Open Transport Map
PaaS | Platform as a Service
PDP | Research Data Platform
PPP | Public-Private Partnership
PROTON | IBM PROactive Technology ONline
RDF | Resource Description Framework
REST | REpresentational State Transfer
RMSE | Root Mean Square Error
RPAS | Remotely Piloted Aircraft Systems
SaaS | Software as a Service
SAFE | Standard Archive Format for Europe
SIG | Special Interest Groups
SLSTR | Sea and Land Surface Temperature Radiometer
SRIA | Strategic Research and Innovation Agenda
STIM | Smart Transducer Interface Module (from IEEE standard)
SVM | Support Vector Machines
SWG | Standards Working Group
TCP | Transmission Control Protocol
TEP | Thematic Exploitation Platform
TMS | Tile Map Service
UDP | Urban/City Data Platform
UI | User Interface
UMM | Unified Metadata Model
URL | Uniform Resource Locator
W3C | World Wide Web Consortium
WCPS | Web Coverage Processing Service
WCS | Web Coverage Service
WFS | Web Feature Service
WGISS | Working Group on Information Systems and Services
WMS | Web Map Service
WMTS | Web Map Tile Service
WP | Work Package
WPS | Web Processing Service
WTZ | Warning time horizon
XFDU | XML Formatted Data Units
XML | eXtensible Markup Language

Term | Definition

Commercial Mission: The products from high resolution and very high-resolution commercial missions are purchased on the market. The term “commercial” is used to denote both optical and radar missions.

Dataset: Identifiable collection of data. In the EO Community, a dataset is typically called “product”.

Dataset Series: Collection of datasets sharing the same product specification. In the EO Community, a dataset series is also called “collection” or “dataset” (in GSCDA).

Exploitation Platform: An Exploitation Platform is a virtual workspace, providing the user community with access to (i) large volume of data (EO/non-space data), (ii) algorithm development and integration environment, (iii) processing software and services (e.g. toolboxes, retrieval baselines, visualization routines), (iv) computing resources (e.g. hybrid cloud/grid), (v) collaboration tools (e.g. forums, wiki, knowledge base, open publications, social networking), (vi) general operation capabilities (e.g. user management and access control, accounting, etc.).

SAFE Format: The SAFE (Standard Archive Format for Europe) has been designed to act as a common format for archiving and conveying data within ESA Earth Observation archiving facilities. Special attention has been taken to ensure that SAFE conforms to the ISO 14721:2003 OAIS (Open Archival Information System) reference model and related standards such as the emerging CCSDS/ISO XFDU (XML Formatted Data Units) packaging format.

Sentinel-1: The Copernicus Sentinel-1 earth observation mission developed by ESA provides continuity of data from the ERS and Envisat missions, with further enhancements in terms of revisit, coverage, timeliness and reliability of service. The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites, operating day and night and performing C-band synthetic aperture radar imaging, which enables them to acquire imagery regardless of the weather. The two-satellite constellation offers a 6-day revisit time. A summary of the mission objectives is:
● Monitoring sea ice zones and the Arctic environment, and surveillance of the marine environment;
● Monitoring land surface motion risks;
● Mapping of land surfaces: forest, water and soil;
● Mapping in support of humanitarian aid in crisis situations.
Spatial resolution: 5m, 20m, 40m.
Source: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Sentinel-2: The Copernicus Sentinel-2 earth observation mission developed by ESA provides continuity to services relying on multi-spectral high-resolution optical observations over global terrestrial surfaces. Sentinel-2 sustains the operational supply of data for services such as forest monitoring, land cover change detection or natural disaster management. The Sentinel-2 mission offers an unprecedented combination of the following capabilities:
● Multi-spectral information with 13 bands in the visible, near infra-red and short wave infra-red part of the spectrum;
● Systematic global coverage of land surfaces: from 56°South to 84°North, coastal waters and all of the Mediterranean Sea;
● High revisit: every 5 days at the equator under the same viewing conditions;
● High spatial resolution: 10m, 20m and 60m;
● Wide field of view: 290 km.
Source: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Sentinel-3: The main objective of the Copernicus Sentinel-3 earth observation mission developed by ESA is to measure sea-surface topography, sea- and land-surface temperature and ocean- and land-surface colour. A pair of Sentinel-3 satellites will enable a short revisit time of less than two days for the OLCI instrument and less than one day for SLSTR at the equator. Mission objectives are:
● Measure sea-surface topography, sea-surface height and significant wave height;
● Measure ocean and land-surface temperature;
● Measure ocean and land-surface colour;
● Monitor sea and land ice topography;
● Sea-water quality and pollution monitoring;
● Inland water monitoring, including rivers and lakes;
● Aid marine weather forecasting with acquired data;
● Climate monitoring and modelling;
● Land-use change monitoring;
● Forest cover mapping;
● Fire detection;
● Weather forecasting;
● Measuring Earth's thermal radiation for atmospheric applications.
Sentinel-3A has now reached full operational capacity and preparations for the Sentinel-3B launch are ongoing (mission status on 6 December 2017).
Sources: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Third Party Mission: ESA uses its multi-mission ground systems to acquire, process, archive and distribute data from other satellites - so called Third Party Missions. Source: http://earth.esa.int/missions/thirdpartymission/.


Introduction

1.1 Project Summary

The data-intensive target sector selected for the DataBio project is the Data-Driven Bioeconomy. DataBio focuses on utilizing Big Data to contribute to the production of the best possible raw materials from agriculture, forestry and fishery/aquaculture for the bioeconomy industry, in order to output food, energy and biomaterials, while also taking into account various responsibility and sustainability issues.

DataBio will deploy state-of-the-art big data technologies and existing partners’ infrastructure and solutions, linked together through the DataBio Platform. These will aggregate Big Data from the three identified sectors (agriculture, forestry and fishery), intelligently process them and allow the three sectors to selectively utilize numerous platform components, according to their requirements. The execution will be through continuous cooperation of end user and technology provider companies, bioeconomy and technology research institutes, and stakeholders from the big data value PPP programme.

DataBio is driven by the development, use and evaluation of a large number of pilots in the three identified sectors, in which associated partners and additional stakeholders are also involved. The selected pilot concepts will be transformed into pilot implementations using co-innovative methods and tools. The pilots select and utilize the most suitable market-ready or almost market-ready ICT, Big Data and Earth Observation methods, technologies, tools and services to be integrated into the common DataBio Platform.

Based on the pilot results and the new DataBio Platform, new solutions and new business opportunities are expected to emerge. DataBio will organize a series of trainings and hackathons to support its take-up and to enable developers outside the consortium to design and develop new tools, services and applications based on and for the DataBio Platform.

The DataBio consortium is listed in Table 1. For more information about the project see www.databio.eu.

Table 1: The DataBio consortium partners

Number | Name | Short name | Country
1 (CO) | INTRASOFT INTERNATIONAL SA | INTRASOFT | Belgium
2 | LESPROJEKT SLUZBY SRO | LESPRO | Czech Republic
3 | ZAPADOCESKA UNIVERZITA V PLZNI | UWB | Czech Republic
4 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | Fraunhofer | Germany
5 | ATOS SPAIN SA | ATOS | Spain
6 | STIFTELSEN SINTEF | SINTEF ICT | Norway
7 | SPACEBEL SA | SPACEBEL | Belgium
8 | VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. | VITO | Belgium
9 | INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK | PSNC | Poland
10 | CIAOTECH Srl | CiaoT | Italy
11 | EMPRESA DE TRANSFORMACION AGRARIA SA | TRAGSA | Spain
12 | INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV | INFAI | Germany
13 | NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION | NP | Greece
14 | Ústav pro hospodářskou úpravu lesů Brandýs nad Labem | UHUL FMI | Czech Republic
15 | INNOVATION ENGINEERING SRL | InnoE | Italy
16 | Teknologian tutkimuskeskus VTT Oy | VTT | Finland
17 | SINTEF FISKERI OG HAVBRUK AS | SINTEF Fishery | Norway
18 | SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL | METSAK | Finland
19 | IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD | IBM | Israel
20 | MHG SYSTEMS OY - MHGS | MHGS | Finland
21 | NB ADVIES BV | NB Advies | Netherlands
22 | CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA | CREA | Italy
23 | FUNDACION AZTI - AZTI FUNDAZIOA | AZTI | Spain
24 | KINGS BAY AS | KingsBay | Norway
25 | EROS AS | Eros | Norway
26 | ERVIK & SAEVIK AS | ESAS | Norway
27 | LIEGRUPPEN FISKERI AS | LiegFi | Norway
28 | E-GEOS SPA | e-geos | Italy
29 | DANMARKS TEKNISKE UNIVERSITET | DTU | Denmark
30 | FEDERUNACOMA SRL UNIPERSONALE | Federu | Italy
31 | CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT | CSEM | Switzerland
32 | UNIVERSITAET ST. GALLEN | UStG | Switzerland
33 | NORGES SILDESALGSLAG SA | Sildes | Norway
34 | EXUS SOFTWARE LTD | EXUS | United Kingdom
35 | CYBERNETICA AS | CYBER | Estonia
36 | GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION | GAIA | Greece
37 | SOFTEAM | Softeam | France
38 | FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE | CITOLIVA | Spain
39 | TERRASIGNA SRL | TerraS | Romania
40 | ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS | CERTH | Greece
41 | METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL | MEEO | Italy
42 | ECHEBASTAR FLEET SOCIEDAD LIMITADA | ECHEBF | Spain
43 | NOVAMONT SPA | Novam | Italy
44 | SENOP OY | Senop | Finland
45 | UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA | EHU/UPV | Spain
46 | OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG | OGCE | United Kingdom
47 | ZETOR TRACTORS AS | ZETOR | Czech Republic
48 | COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA | CAC | Italy
49 | SINTEF AS | SINTEF | Norway

1.2 Document Scope

This is a public version of Deliverable D4.3 “Data sets, formats and models”. Confidential information from the original document has been omitted.

The main objective of this deliverable is to describe the datasets utilized, improved and created in the DataBio project. A secondary objective is to show how the datasets are identified through a model-driven design process based on ArchiMate, involving the 26 pilot systems in the DataBio project. In addition to this deliverable, the datasets will be provided through the DataBioHub, including important ArchiMate design diagrams.

1.3 Document Structure

This document comprises the following chapters:

Chapter 1 presents an introduction to the project and the document.

Chapter 2 introduces data sharing and the data economy in the context of DataBio.

Chapter 3 presents the context view of datasets in DataBio, including external drivers, stakeholders and license models.

Chapter 4 provides an overview of the requirements for datasets and datastreams in DataBio, grouped by pilots and the platform itself.

Chapter 5 presents the datasets in DataBio: existing, improved, new and other relevant datasets. The final subsection gives an example of how a dataset can be used for application development.

Chapter 6 presents the concluding remarks.


Background 2.1 Data sharing and data economy in DataBio As part of the Digital Single Market strategy and building a European data economy, the European Commission adopted the Communication ‘Towards a common European data space’ in April 2018 [REF-01]. The document proposes a roadmap to “a common data space in the EU - a seamless digital area with the scale that will enable the development of new products and services based on data.” The DataBio domains, agriculture, forestry and fishery, are key areas where the Commission expects that businesses can utilize the data sharing through the data space to improve products and productivity. The Commission document identifies reuse of public and publicly funded data to be a cornerstone in the dataspace and has launched the “European Open Data Portal” to stimulate the development [REF-02]. An important factor in realizing a common data space is to stimulate to private businesses and public agencies to share both private and public datasets. The guide to “Building a European data economy” states that digital data “is an essential resource for economic growth, competitiveness, innovation, job creation and societal progress in general” [REF-03]. Digital data should be shared in both business to business (B2B) and business to government (B2G) contexts. The DataBio pilots involves many private stakeholders that produce, consume and share datasets/datastreams. The pilots will demonstrate how data can be shared and utilized in order to improve the quality and efficiency of pilot systems. All datasets and datastreams involved in the pilot systems’ realization are identified documented in the platform and pilot ArchiMate models. These models relate the datasets to the pilots and interfaces, providing traceability from pilot to data, components and pipelines. 
The DataBio datasets and datastreams are examples of B2B and B2G data sharing, and are documented here in terms of:

1) Rich metadata: each dataset is described with relevant metadata elements following best practice and harmonized with e.g. Transforming Transport datasets [REF-04].
2) Data portal - the DataBioHub: each dataset and datastream is registered in the DataBioHub, a data portal from DataBio.
3) Examples: relevant examples of how to utilize datasets from DataBio are provided in this document.

2.2 FAIR Principles

Most datasets from publicly funded research are still inaccessible to the majority of scientists in the same discipline, not to mention other potential users of the data, such as company R&D departments. About 80% of research data is not in a trusted repository. However, even if the data is openly available in repositories, this is not always enough. As a current example, only 18% of the data in open repositories is reusable [REF-05]. This leads to inefficiencies and delays; in recent surveys, data scientists reported that collecting and cleaning data sources made up 80% of their work [REF-06]. These figures can be assumed to be valid also for the bioeconomy sector.


In response to these challenges, the Commission has launched a large effort with the objective of creating “a European Open Science Cloud to make science more efficient and productive and let millions of researchers share and analyse research data in a trusted environment across technologies, disciplines and borders” [REF-07]. The initial outline for the European Open Science Cloud (EOSC) was laid out in the report from the High Level Expert Group (Moons et al., 2016). This report promotes the FAIR Data Principles, a set of guiding principles intended to promote maximum use of research data (Wilkinson et al., 2016).

The FAIR principles were created in a workshop in 2014 and are intended to give “a minimal set of community-agreed guiding principles and practices” [REF-08]. Both humans and machines should be enabled to find (F), access (A), interoperate (I) and re-use (R) research data and metadata in an effortless but confined fashion. These principles provide guidance for scientific data management and stewardship and are relevant to all stakeholders in the current digital ecosystem. Since 2017, a data management plan based on FAIR has been mandatory in all EU Horizon projects [REF-09]. The FAIR principles are advanced by the Go Fair initiative (https://www.go-fair.org/) [REF-10]. Currently, Germany, France and the Netherlands are part of this initiative.

As for DataBio, the project implemented the Data Management Plan (DMP) that was part of the project proposal. The plan, which constitutes Deliverable D6.2¹, covers descriptions of the DataBio datasets, data standards, data sharing and long-term preservation of data. The DMP is also an important tool for the dissemination and exploitation activities. Data privacy and ownership are essential elements, which are dealt with in T4.6. The DataBioHub [REF-11], described below in Section 2.4.1, is a central tool for our project in realising data management and data sharing.
In addition to offering searchable public and private dataset descriptions, it also contains descriptions of DataBio components, pipelines and pilots as well as of their mutual relations. The hub makes the DataBio data findable by publishing the metadata according to best practices and standards (geospatial and others) and by applying search keywords (tags) to the digital objects. The data is also accessible through the DataBioHub, although in some cases only indirectly, by consulting the dataset owner, when the hub contains only the metadata. DataBioHub typically contains information about the APIs, the data model and formats, and the access methods. The hub also promotes interoperability, as the metadata and data in many cases (but not always) follow established standards, e.g. in the Earth Observation field. Finally, for reusability, the licensing schemes are essential to permit the widest reuse possible. Relevant questions are: When will restricted data be made available for reuse? Are the data produced and/or used in the project usable by third parties, in particular after the end of the project? How long is it intended that the data remains re-usable? The DataBio data management plan related to FAIR principles is described in chapter 3 of [REF-20].
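As an illustration of how the four FAIR facets map onto a dataset description, the following minimal sketch checks a record for the elements mentioned above (tags, access information, format and licence). The record structure and field names are assumptions made for this example; they are not the DataBioHub schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical, simplified dataset record (not the DataBioHub schema)."""
    title: str
    keywords: List[str] = field(default_factory=list)  # search tags
    access_url: str = ""      # endpoint or download location, if published
    owner_contact: str = ""   # fallback when only the metadata is registered
    data_format: str = ""     # name of the (ideally standard) format
    license: str = ""         # licensing scheme governing reuse


def fair_gaps(d: DatasetDescription) -> List[str]:
    """Return the FAIR facets for which the description lacks information."""
    gaps = []
    if not (d.title and d.keywords):
        gaps.append("findable")
    # Access may be direct (URL) or indirect (via the dataset owner).
    if not (d.access_url or d.owner_contact):
        gaps.append("accessible")
    if not d.data_format:
        gaps.append("interoperable")
    if not d.license:
        gaps.append("reusable")
    return gaps
```

A record with a title, tags and an owner contact but no format or licence would be reported as lacking the "interoperable" and "reusable" facets.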

1 https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.2-Data-Management-Plan_v1.0_2017-06-30_CREA.pdf


2.3 Metadata and discovery of datasets

Discoverability of (open) geo-information is vital to increase the use of geospatial data within and outside the geospatial expert community. This is also supported by experience originating from Europe. In 2003, Directive 2003/98/EC (also known as PSI – Public Sector Information) established a minimum set of rules governing both the re-use and the practical means of facilitating the re-use of existing documents held by public sector bodies in the European Union. In the end, Directive 2003/98/EC had only a partial impact in the field of data re-use: it was hard even to discover that re-usable data existed. In 2007, Directive 2007/2/EC (also known as INSPIRE – INfrastructure for SPatial InfoRmation in Europe) was established, chiefly to make it easier to discover available spatial data and services. Moreover, discovery mechanisms represent one of the bridges between geospatial and non-geospatial approaches for metadata management.

Metadata in the DataBio project are extraordinarily diverse in terms of their structure, encodings, the kinds of resources they describe, their handling and their publication. “Big metadata” approaches need to be developed, since metadata also meet three of the four Vs: variety, veracity and velocity. Volume is not an issue, as metadata are typically small, on the scale of kilobytes or at most megabytes. Nevertheless, traditional metadata approaches are based on assumptions of static resources and long-term durability of metadata records, which is a limitation from a variety and velocity point of view. Veracity of metadata has always been an issue, at least due to a loose integration of data and metadata updates. The DataBio approach therefore aims at the following goals for metadata and discovery:

1. Tie data and metadata together: ensure updated metadata despite Big Data velocity updates.
2. Support metadata heterogeneity: enable discovery of static resources (e.g. datasets) as well as mobile/other resources (e.g. sensors active during agricultural machinery fleet tracking) in a unified platform.
3. Use efficient encodings: support XML-based formats for backwards compatibility, while also using forward-looking lightweight and semantics-based formats.
4. Integrate metadata in other tools: the best metadata platform is the one where a user does not notice that (s)he works with metadata.

2.4 Data registries, data sharing and data exchange

The datasets of DataBio are registered in the DataBio Hub. It is also relevant to register datasets in other data registries such as GEOSS. Earth Observation (EO) datasets are of major importance for the DataBio project, and the management of and access to these have been described in more detail in the deliverables D5.1 and D5.i2. As an example, Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01), which returns metadata for Sentinel collections and datasets (including the product download URL) via an OGC 13-026r8 OpenSearch interface.
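OpenSearch services advertise URL templates in which placeholders such as {searchTerms} are replaced by query values, and a trailing "?" marks a placeholder as optional. The sketch below shows how a client might fill such a template. The host name and the exact set of placeholders are assumptions made for illustration; a real client would read the template from the OpenSearch description document of the catalogue it queries.

```python
import re
from urllib.parse import quote

# Matches OpenSearch template placeholders such as {searchTerms} or {geo:box?}.
_PARAM = re.compile(r"\{([A-Za-z][\w:]*)(\?)?\}")


def fill_template(template: str, values: dict) -> str:
    """Substitute placeholder values into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional placeholders may be left empty
        raise KeyError(f"missing required OpenSearch parameter: {name}")
    return _PARAM.sub(substitute, template)


# Hypothetical template; placeholder names follow the Geo, Time and EO
# extension naming style but are not taken from an actual service description.
TEMPLATE = (
    "https://catalogue.example.org/search?q={searchTerms}"
    "&bbox={geo:box?}&start={time:start?}&end={time:end?}"
    "&platform={eo:platform?}"
)

url = fill_template(TEMPLATE, {
    "searchTerms": "crop monitoring",
    "geo:box": "14.0,49.0,15.0,50.0",       # west,south,east,north
    "time:start": "2018-05-01T00:00:00Z",
    "eo:platform": "Sentinel-2",
})
```

The response of such a request would typically be a feed of metadata entries, each of which may carry a link to the product download location.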


Industrial data platforms that support data sharing, data exchange and data access are now also emerging, and the DataBio project aims to take advantage of these in the next phase of the project. The DataBioHub is described below, followed by a description of other relevant data registries and data platforms.

2.4.1 DataBioHub

DataBioHub [REF-11] provides a registry for the project components, pipelines and pilots for easy search of the different project entities. The hub is dynamic and is being updated with more functionalities and resources as the project evolves. The datasets applied in the project will be added to this hub, so that they can be searched in combination with the other project resources. It is important to note that the DataBio Hub does not offer a repository or operating environment for the service instances and datasets themselves, as those instances will be running on the service providers’ servers or cloud infrastructure (or a DataBio-provided cloud). Regardless of the running environment of the service instances, DataBioHub offers descriptions and endpoints for all DataBio platform-compatible services and components (and possibly applications) in a single location and with a coherent description.

Initially, two publicly available instances of the complete digital service registry existed: one as a project deliverable at icare.erve.vtt.fi/ServiceRegistryWeb and one public and free for non-commercial R&D usage at www.digitalserviceshub.com. A new service registry instance has been provided for the DataBio project and can be found at http://www.databiohub.eu/. The instance has been installed on a virtual machine on Microsoft Azure’s cloud computing service. Infrastructure as a service (IaaS) allows easy server management and makes it possible to increase computing power and resources if needed. The virtual machine runs on the Ubuntu Linux platform, and the whole machine is backed up in a geographically redundant recovery services vault. The digital service registry has been tailored for DataBio use, which includes the following developments:

• As the service registry was initially developed to register digital services with mainly machine-readable interface descriptions, vocabulary support needed to be added for new categories of software components, such as applications with both human and technical interfaces.
• New interface technologies, such as OpenSearch for satellite image services, have been added to the service hub interface description vocabularies.
• DataBio pilot descriptions, data processing pipelines developed in pilots and component descriptions are now also included in the registry. Dataset descriptions will be added while submitting deliverable D4.3.
• Specific rules for keyword use in DataBio have been enforced to link descriptions to the BDVA reference architecture and to help link component and service descriptions to the overall architecture of the DataBio platform and pilots.


• New fields for human-readable descriptions have been added to improve linking them to pilot development, data models and DataBio deliverables, with the possibility to include, as images, the component diagrams exported from the DataBio architecture models.
• The Service Hub UI and its website have been tailored for DataBio and linked with other websites of the DataBio project.
• Registration mechanisms for new users outside the DataBio consortium have been restricted during DataBio platform development.

2.4.2 Linked Data and Open Micka

The best practices for the publication of Linked Data were described in the previous deliverable D4.i1, Section “Linked Data Publication Pipeline”. In this section, we summarize the most relevant practices, which have been applied during the DataBio project. Linked Data refers to a set of best practices for publishing and interlinking structured data, thereby enabling it to be accessed by both humans and machines. The data interchange follows the RDF family of standards, and SPARQL is used for querying. The key technologies that support Linked Data are:

• URIs, which identify any concept or entity.
• HTTP, for retrieving resources or their descriptions.
• RDF, a generic graph-based data model used for structuring and linking data that describes concepts or entities in the real world.
• SPARQL, the standard RDF query language.

Due to the growing popularity of Linked Data, more detailed guidelines for the development and delivery of open data as Linked Data were defined. For instance, for open government data, the recommended best practices include the following (more detailed information was given in D4.i1):

• To prepare the stakeholders.
• To select a reusable dataset.
• To model data objects and their relations to represent Linked Data.
• To specify an appropriate license to ease data reuse.
• To use a well-considered URI naming strategy and implementation plan.
• To describe the objects with previously defined vocabularies.
• To convert data into a Linked Data representation by scripting or other automated processes.
• To provide machine access to the Linked Data.
• To announce new datasets on authoritative domains to initiate an implicit social contract.
• To maintain the Linked Data once it is published.

Note that even though these best practices were conceived for open government data, they apply generally in other domains. Regarding the publication process, there are at least three well-known life cycle models (Hyland et al., Hausenblas et al., Villazón-Terrazas et al.) for publishing Linked Data. All of these models identify common needs of specifying, modelling and publishing data in the standard open Web format. However, even though all of the models deal with similar tasks involved in the process of publishing Linked Data, they differ in how these tasks are defined. A detailed description of these models is available in D4.i1. For our work, we are mainly interested in the model proposed by Villazón-Terrazas et al., which includes the following activities:

• Specification:
  o Identification and analysis of the data sources to be published.
  o Reusing or leveraging the data that has already been opened/published.
  o Assigning meaningful URIs rather than opaque ones whenever possible.
  o Definition of the license of the data sources, reusing existing ones whenever possible.
• Modelling:
  o Ontologies are to be expressed in either OWL or RDF(S).
  o Reusing the existing and available vocabularies.
  o Reusing the available non-ontological resources.
• Generation:
  o Transformation of the specified data sources into RDF according to the modelled vocabulary, using tools for sources such as CSV and spreadsheets, RDB or XML.
  o Pre-processing and/or post-processing tasks for fixing accessibility issues, reasoning issues etc.
  o Linking with suitable datasets and discovering suitable relationships between the other data items with valid properties.
• Publishing:
  o Dataset publication by using tools for storing RDF (e.g. Openlink Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM etc.) and using a SPARQL endpoint and Linked Data front end (e.g. Pubby, Talis Platform, Fuseki).
  o Metadata publication by using VoID, which allows expressing metadata about RDF datasets, and by OPM (Open Provenance Model).
  o Dataset discovery by registering the datasets in the CKAN² registry and generating sitemap files for the dataset by using sitemap4rdf.
• Exploitation:
  o Application and exploitation of the Linked Data for various purposes and applications across different Web technology platforms.

Open Micka [REF-14] is a web application for the management and discovery of geospatial metadata (open source under the BSD license). It has been extended and applied in the DataBio project, in particular in the Agriculture pilot 1 pipeline on "Metadata, linked data and graph data". Features of the application:

• OGC Catalogue service (CSW 2.0.2)

2 https://ckan.org/


• Transactions and harvesting
• Metadata editor
• Multilingual user interface
• ISO AP 1.0 profile
• Feature catalogue (ISO 19110)
• Interactive metadata profiles - management
• WFS/Gazetteer for defining metadata - extent
• GEMET thesaurus built-in client
• INSPIRE registry built-in client
• OpenSearch
• INSPIRE ATOM download service - automatic creation from metadata

2.4.3 Industrial data spaces

The Industrial Data Space (IDS), renamed in April 2018 to International Data Space(s), is both a research project and a non-profit user association (IDSA). IDS extends a data marketplace with the ability to run services inside the IDS, e.g. data analysis and processing operations. The core requirements for an IDS related to data access are as described in the Industrial Data Space whitepaper³.

• Data sovereignty: It is always the data owner that specifies the terms and conditions of use of the data provided.
• Decentral data management: Data management remains with the respective data owner, if desired.
• Data economy: Data is viewed as an economic asset. It can be distinguished into three categories: private data, so-called »club data« (i.e. data belonging to a specific value creation chain, which is available to selected companies only), and public data (weather information, traffic information, geo data etc.).
• Easy linkage of data: Linked-data concepts and common vocabularies facilitate the integration of data between participants.
• Trust: All participants, data sources, and data services of the Industrial Data Space are certified against commonly defined rules.
• Secure data supply chain: Data exchange is secure across the entire data supply chain, i.e. from data creation to data capture to data usage.

A “Data User” that wants to access data in an IDS must comply with a set of requirements specified by the “Data Provider” and IDS. These requirements may include payment, standards for data protection, use period, restrictions on aggregation levels and sharing with other parties.
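The kinds of requirements listed above can be expressed as a machine-checkable usage policy. The following is a minimal sketch under stated assumptions: the classes, field names and the numeric aggregation scale are hypothetical and do not reproduce the IDS usage control model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class UsagePolicy:
    """Terms set by the Data Provider (hypothetical structure)."""
    price_eur: float
    valid_from: date
    valid_until: date
    min_aggregation_level: int  # assumed scale: 0 = raw records, higher = coarser
    allow_resharing: bool


@dataclass(frozen=True)
class AccessRequest:
    """What the Data User offers and intends to do (hypothetical structure)."""
    paid_eur: float
    on_date: date
    aggregation_level: int
    will_reshare: bool


def violations(policy: UsagePolicy, request: AccessRequest) -> List[str]:
    """Return the policy requirements that the request does not satisfy."""
    failed = []
    if request.paid_eur < policy.price_eur:
        failed.append("payment")
    if not (policy.valid_from <= request.on_date <= policy.valid_until):
        failed.append("use period")
    if request.aggregation_level < policy.min_aggregation_level:
        failed.append("aggregation level")
    if request.will_reshare and not policy.allow_resharing:
        failed.append("resharing")
    return failed
```

Access would be granted only when the returned list is empty; a non-empty list tells the Data User which terms still need to be met.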

3 http://www.industrialdataspace.org/wp-content/uploads/2016/09/whitepaper-industrial-data-space-eng.pdf


An example of an IDS solution is the Estonian Cybernetica platform⁴. Cybernetica provides solutions for sea surveillance, customs declaration management, data sharing, voting and a number of other applications.

2.4.4 Openness and payment

Openness with respect to data is not a binary concept: there can be degrees of openness when it comes to data access (eligible parties, conditions under which data can be accessed). With the diffusion of IoT-enabled sensors and machines, blockchain technologies have been adopted for the storage of, and payment for, data. In addition to secure storage, this approach enables data consumer services that purchase data from providers using blockchain payment. Datum (https://datum.org) is an example of a data marketplace following this approach, as illustrated below.

Figure 2: How distributed storage and payments work

2.4.5 UXP – Exchange Platform - Cybernetica

Unified eXchange Platform (UXP) is a technology that enables peer-to-peer data exchange over encrypted and mutually authenticated channels. It is based on a decentralised architecture where each peer has an information system that will be connected with other peers’ systems.

4 https://cyber.ee/en/


UXP is created by the authors of the world-renowned e-Government system of Estonia, the X-Road, which according to the World Bank Development Report is what allowed Estonia to become a truly digital society. UXP-based solutions have been implemented across four continents to enable running online government services for 35 million people from different countries and cultures. We make this possible by fitting our technology naturally into your existing ecosystem, with full integration support and minimal changes required.

Seamless Data Exchange: UXP connects any number of databases in an efficient and secure way, helping you build a network of agreements that allows controlled exchange between any members in your ecosystem.

UXP benefits:

• Less is More. UXP means less paperwork, less bureaucracy, less time spent on futility. In Estonia, digital services save every citizen one work-week per year. What would you do with your week?
• Affordable. UXP can be implemented into any ecosystem – be it a tiny country or a supranational association. With very low maintenance cost and marginal implementation investment, UXP is cost-effective and allows you to move ahead one step at a time.
• Reliable. UXP has been heavily tried and tested since its launch as Estonia’s X-Road in 2001. No downtime has been observed since and the system survived the world’s first cyber conflict in 2007.
• Secure. We use extensive security measures to guarantee the protection and integrity of your data. UXP is secure-by-design, as its decentralised architecture has no single point-of-failure. All traffic is encrypted with 2048-bit keys. These are minimal requirements of the system – cryptographic algorithms can be altered to provide even stronger encryption at the request of our customers.
• Scalable. UXP is scalable to any size of infrastructure. An unlimited number of security servers can be linked together, making it fit for local and international applications.
• Private. We use a distributed architecture, eliminating the creation of a superdatabase, which could be prone to exploitation. All transactions are signed and timestamped, making it possible to monitor all queries made by officials against private citizens.

2.5 Industrial data spaces and connectors

The “Industrial Data Space” is a virtual data space using standards and common governance models to facilitate the secure exchange and easy linkage of data in business ecosystems. It thereby provides a basis for creating and using smart services and innovative business processes, while at the same time ensuring digital sovereignty of data owners. The following section introduces the concept of the Industrial Data Space Connector by citing from the Reference Architecture Model for the Industrial Data Space published by the Fraunhofer-Gesellschaft in cooperation with the Industrial Data Space Association. We introduce Connectors on the functional level⁵. Figure 3 shows the Functional Architecture of the Industrial Data Space. It defines, irrespective of existing technologies and applications, the functional requirements of the Industrial Data Space and the resulting features to be implemented.

Figure 3: Functional Architecture of the Industrial Data Space

5 For further details related to the other layers of the Reference Architecture Model please refer to the official document: https://www.fraunhofer.de/content/dam/zv/de/Forschungsfelder/industrial-data-space/Industrial-Data-Space_Reference-Architecture-Model-2017.pdf

The Connector is the central functional entity of the Industrial Data Space. It facilitates the exchange of data between participants. The Connector is basically a dedicated communication server for sending and receiving data in compliance with the Connector specification (see Section 3.5.1 in the Reference Architecture Model). A single Connector can be understood as a node in the peer-to-peer architecture of the Industrial Data Space. This means that a central authority for data management is not required. Connectors can be installed, managed and maintained both by Data Providers and Data Consumers. Typically, a Connector is operated in a secure environment (e.g., behind a firewall). This means that internal systems of an enterprise cannot be directly accessed. However, the Connector can, for example, also be connected to a machine or a transportation vehicle. Each company participating in the Industrial Data Space may operate several Connectors. As an option, intermediaries (i.e., the Service Provider) may operate Connectors on behalf of one or several participating organizations. The data exchange with the enterprise systems must be established by the Data Provider or the Data Consumer.

Data Providers can offer data to other participants of the Industrial Data Space. The data therefore has to be described by metadata. The metadata contains information about the Data Provider, syntax and semantics of the data itself, and additional information (e.g., pricing information or usage policies). To support the creation of metadata and the enrichment of data with semantics, vocabularies can be created and stored for other participants in the Vocabulary and Metadata Management component. If the Data Provider wants to offer data, the metadata will automatically be sent to one or more central metadata repositories hosted by the Broker. Other participants can browse and search data in this repository.

Connectors can be extended with software components that help transform and/or process data. These Data Apps constitute the App Ecosystem. Data Apps can either be purchased via the App Store or developed by the participants themselves. App Providers may implement and provide Data Apps using the App Store. Every participant possesses identities required for authentication when communicating with other participants. These identities are managed by the Identity Management component. The Clearing House logs each data exchange between two Connectors.
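The interplay between Connectors, the Broker and the Clearing House described above can be pictured with a small in-memory model. This is a conceptual sketch only: the class and method names are invented for illustration, and identity management, usage policy enforcement, Data Apps and network communication are all left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Metadata:
    """Self-description of an offered dataset (hypothetical fields)."""
    dataset_id: str
    provider: str
    description: str
    usage_policy: str


class Broker:
    """Central metadata repository that participants can search."""
    def __init__(self) -> None:
        self._records: Dict[str, Metadata] = {}

    def register(self, metadata: Metadata) -> None:
        self._records[metadata.dataset_id] = metadata

    def search(self, term: str) -> List[Metadata]:
        return [m for m in self._records.values()
                if term.lower() in m.description.lower()]


class ClearingHouse:
    """Logs each data exchange between two Connectors."""
    def __init__(self) -> None:
        self.log: List[Tuple[datetime, str, str, str]] = []

    def record(self, provider: str, consumer: str, dataset_id: str) -> None:
        self.log.append((datetime.now(timezone.utc), provider, consumer, dataset_id))


class Connector:
    """Node operated by a participant; the data stays with its owner."""
    def __init__(self, owner: str, broker: Broker, clearing_house: ClearingHouse) -> None:
        self.owner = owner
        self._broker = broker
        self._clearing_house = clearing_house
        self._data: Dict[str, bytes] = {}

    def offer(self, metadata: Metadata, payload: bytes) -> None:
        # Only the metadata is sent to the Broker; the payload stays local.
        self._data[metadata.dataset_id] = payload
        self._broker.register(metadata)

    def fetch_from(self, provider: "Connector", dataset_id: str) -> bytes:
        # Peer-to-peer transfer between Connectors, logged by the Clearing House.
        payload = provider._data[dataset_id]
        self._clearing_house.record(provider.owner, self.owner, dataset_id)
        return payload
```

In this model a Data Consumer would first search the Broker for a matching description and then fetch the payload directly from the provider's Connector, leaving one entry in the Clearing House log.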

2.5.1 EU Data Portal

The European Union Open Data Portal (EU ODP) [REF-02] gives access to open data published by EU institutions and bodies. All the data available via this catalogue are free to use and reuse for commercial or non-commercial purposes. They can be reused in databases, reports or projects. A variety of digital formats are available from the EU institutions and other EU bodies. As of July 2018, a total of 12418 datasets were available. The goal of providing easy access to data, free of charge, is to help organizations use the data in innovative ways and unlock their economic potential. The portal is also designed to make the EU institutions and other bodies more open and accountable. The data concerned include: geographic, geopolitical and financial data; statistics; election results; legal acts; data on crime, health, the environment, transport and scientific research. The portal provides:

• a standardised catalogue, giving easier access to EU open data;
• a list of apps and web tools reusing these data;
• a SPARQL endpoint query editor;
• REST API access;
• tips on how to make best use of the site (see the Search and SPARQL manuals).

2.5.2 GEOSS

The Group on Earth Observations (GEO) [REF-12] works to connect the demand for sound and timely environmental information with the supply of data and information about the Earth that is collected through observing systems and made available by the GEO community. GEOSS (Global Earth Observation System of Systems) is a set of coordinated, independent Earth Observation, information and processing systems that interact and provide access to diverse information for a broad range of users in both public and private sectors. It facilitates the sharing of environmental data and information collected from the large array of observing systems contributed by countries and organizations within GEO. The ‘GEOSS Portal’ offers a single Internet access point for users seeking data, imagery and analytical software packages relevant to all parts of the globe. It connects users to existing databases and portals and provides reliable, up-to-date and user-friendly information, which is vital for the work of decision makers, planners and emergency managers. It is an objective that DataBio datasets suitable for GEOSS will be added to the GEOSS portal.

2.5.3 DCAT and GeoDCAT

GeoDCAT is a geospatial extension to DCAT-AP (DCAT application profile for data portals in Europe). DCAT-AP is a metadata profile meant to provide an interchange format for data portals operated by EU Member States. It is based on and compliant with the W3C Data Catalog (DCAT) vocabulary. The Data Catalog Vocabulary (DCAT) is an RDF vocabulary designed to facilitate interoperability between data catalogues published on the Web. By using DCAT to describe datasets in catalogues, publishers increase discoverability and enable applications to consume metadata from multiple catalogues. It enables decentralized publishing of catalogues and facilitates federated dataset search across them.

GeoDCAT was developed in the framework of the EU Programme “Interoperability Solutions for European Public Administrations” (ISA). GeoDCAT-AP is meant to provide a DCAT-AP compliant representation for the set of metadata elements included in INSPIRE metadata and the core profile of ISO 19115:2003. GeoDCAT objectives:

• The GeoDCAT-AP specification does not replace the INSPIRE Metadata Regulation nor the INSPIRE Metadata Technical Guidelines based on ISO 19115:2003 and ISO 19119 [REF-13].
• Its purpose is to give owners of geospatial metadata the possibility to achieve more by providing an additional RDF syntax binding.
• Its basic use case is to make spatial datasets, data series and services searchable on general data portals, thereby making geospatial information easier to find across borders and sectors.

2.5.4 CKAN

CKAN [REF-15] is one of the world’s leading open source data portal platforms. It is a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. Once the data is published, users (whether they are developers, journalists, researchers, NGOs or citizens) can use its faceted search features to browse and find the data they need, and preview it using maps, graphs and tables.

2.6 Others

OpenAire (https://www.openaire.eu/) is a place to put open data and get a Digital Object Identifier (DOI) for the dataset. EU funded projects are expected to add the open datasets they create to this portal, and this is also the intention of DataBio. An example of an OpenAire dataset is shown in the Figure below.


Figure 4: OpenAire

Dryad (https://datadryad.org), the Dryad Digital Repository, is a curated resource that makes the data underlying scientific publications discoverable, freely reusable and citable. Dryad provides a general-purpose home for a wide diversity of datatypes. Dryad’s vision is to promote a world where research data is openly available, integrated with the scholarly literature, and routinely re-used to create knowledge. The Dryad mission is to provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.

Dryad is governed by a non-profit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries and funding organizations. Publishers are encouraged to facilitate data archiving by coordinating the submission of manuscripts with the submission of data to Dryad. Dryad originated from an initiative among a group of leading journals and scientific societies in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their publications, and from the recognition that easy-to-use, sustainable, community-governed data infrastructure was needed to support such a policy. An example from Dryad is shown in Figure 5.


Figure 5: DRYAD


3 Context view
The datasets, formats and models are identified, described and used within the context of the DataBio project.

3.1 External drivers for data sharing and data exchange
Data sharing involves at least two stakeholders that provide and/or consume mostly structured data about an entity (person, business, property or event). External regulations may set the rules and conditions of data sharing, that is, how data is provided or consumed. These conditions can have a conducive or restrictive impact on data sharing processes. The most prominent regulation (legislation) is the GDPR, which sets the conditions for processing personal data. Personal data is any information relating to an identified or identifiable natural person. For example, to process personal data, the purpose of processing has to be defined (and validated), and the data consumer has to make sure that the data is only processed for the defined purpose. In a business context, rules are set not only by legislation: contracts also define specific rulesets for processing data.

Such regulations and rules have two main impacts. Firstly, to enable data sharing, the infrastructure (software) must ensure compliance with external requirements and rules, such as the GDPR. Secondly, the data sharing process needs to be defined and specified according to those regulations. While both initially imply costs and effort, in the longer run they enable trust and long-term collaboration within a community such as the bioeconomy. Furthermore, regulations and activities of public bodies, such as Open Data policies, can create a trustworthy environment for data sharing.

Besides regulations and rules, data sharing also depends on the knowledge domain, application scenario and intended use, since data is represented, stored and published differently in each. Data may be intended for human users or for machine processing. Data can come in very diverse formats, and its modality (text, image, video, audio) as well as its structural level (unstructured, semi-structured and structured) can be geared to a specific purpose. Both have an impact on data providing and consuming processes. Furthermore, data, datasets, knowledge bases and knowledge building blocks are often not stable: they are successively expanded and versioned, and increasingly developed collaboratively and in a decentralized manner. Depending on the stability and size of the datasets, data is materialized or computed by processing routines and made available via APIs.

Insofar as data is made accessible, for example under Open Data principles, the target group of data users must be identified, possible business models defined, license requirements provided or used, and the provenance and trustworthiness of the data disclosed. Data that is untrustworthy and whose usability is in question is hardly usable, at least in a professional environment. This heterogeneity presents data publishers, data owners and data users with major problems. Various initiatives (W3C, Go-FAIR, DCMI) recommend the use of specially designed metadata. These initiatives are important external drivers that have an impact on data sharing in the data economy and especially in the bioeconomy. The clearer the process and requirements of data sharing are, the more likely users are to succeed.
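The purpose-limitation rule mentioned in this section (personal data may only be processed for a purpose that was defined beforehand) can be illustrated with a simple check. This is a minimal sketch and not a compliance mechanism; the class `SharingAgreement`, the function `may_process` and all dataset, consumer and purpose names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingAgreement:
    """Hypothetical record of what a data provider and a consumer agreed on."""
    dataset: str
    consumer: str
    allowed_purposes: frozenset


def may_process(agreement: SharingAgreement, consumer: str, purpose: str) -> bool:
    """Allow processing only for the agreed consumer and an agreed purpose."""
    return consumer == agreement.consumer and purpose in agreement.allowed_purposes


# Illustrative values only; none of these names come from DataBio.
agreement = SharingAgreement(
    dataset="farm-logs",
    consumer="advisory-service",
    allowed_purposes=frozenset({"yield-prediction"}),
)

print(may_process(agreement, "advisory-service", "yield-prediction"))  # True
print(may_process(agreement, "advisory-service", "marketing"))         # False
```

In a real data sharing infrastructure such a check would sit in front of every access path and would be backed by the contract or consent record, rather than by an in-memory object.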


As a rough measure of the quality and sustainability of the published data, the 5-star scheme according to Tim Berners-Lee can be used [REF-16]:

★ Make your data available on the web under an open license. The format does not matter

★★ Provide data in a structured format (e.g. Excel instead of a scanned image of a spreadsheet)

★★★ Use open, non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to label things so your data can be linked

★★★★★ Link your data with other data to create contexts

For example, the W3C offers a large set of recommendations on which formats, languages and vocabularies to use to design and link data as well as metadata (RDF, RDFS, OWL, SPARQL, SHACL). Furthermore, the W3C offers best practices for dealing with data to be published (https://www.w3.org/TR/dwbp/). These recommend, for example, providing metadata for both human users and computer applications, describing the overall features of the dataset as well as the schema and internal structure of its distributions.

Further, as described in Section 2.2, the Go-FAIR initiative [REF-17] developed a structured guideline for publishing data sustainably. It uses four categories: firstly, “To be Findable”, which mainly sets out recommendations on identifiers and “rich metadata”; secondly, “To be Accessible”, which refers to the usage of standards and well-designed protocols; thirdly, “To be Interoperable”, which provides guidelines to ensure quality and transparent representations; and fourthly, “To be Re-usable”, which makes sure the data can be accessed and provided sustainably.

In addition to these guides, the Dublin Core Metadata Initiative [REF-18] offers a variety of vocabularies in different formats for describing metadata related to raw data and data aggregates. Particular emphasis is placed on the provision of:

• Authors and contributors
• Description of the data in text
• Categorization
• License information
• Versioning and updating rules

How concrete license information has to be designed is currently not defined and is the subject of different research approaches. One structured definition of a license can be found in [REF-19].

Due to the domain-specific complexity and heterogeneity of data representation, there is no single truth that leads the data economy of an application scenario to success. Rather, the above is seen as a collection of recommendations that address dedicated aspects of the design of data to ensure the sustainable usability of the data provided, and thus to provide users with the greatest possible support.

3.2 Data interoperability through ontologies, models, formats and standards
DataBio aims to support data interoperability through the use of suitable standard ontologies and data models in the general domains of Geospatial and Earth Observation data and in the specific domains of Agriculture, Forestry and Fishery, and also to influence future standardisation where this is found feasible.

3.2.1 Geospatial and Earth Observation ontologies and standards
In the Geospatial domain, the DataBio project aims to use and extend the standards of OGC, ISO/TC211 and INSPIRE, in particular with regard to the requirements of Big Data. In the Earth Observation domain, the objective is to use and extend the international Earth Observation standards and services/APIs, as described further in D5.1.

3.2.2 Agricultural ontologies and standards
In the spring of 2018, the DataBio project engaged in the newly established Agriculture Working Group of OGC, the Open Geospatial Consortium. The mission of the OGC Agriculture Working Group is to identify geospatial interoperability issues and challenges within the agriculture domain, and then to examine ways in which those challenges can be met through the application of existing OGC standards or through the development of new geospatial interoperability standards under the auspices of OGC. Its activities include:

• Examination of the possibilities for agricultural information exchange standard alignment and harmonization between UN/CEFACT, ISO TC 23, ISOBus, AgroXML, OGC, W3C, etc.
• Development of a reference architecture for use of OGC encoding and interface standards in common agricultural activities.
• Renewal of MOU with IUSS WGSIS for coordination on SoilML / ISO 28258 and related standards.
• Coordination with the agricultural interest groups within ESIP and RDA.
• Coordination and exchange with other related initiatives such as GEOSS, GODAN, CGIAR, GlobalGAP, Open Ag Data Alliance, etc.
• Organization of Agricultural Geoinformatics Summits at OGC Technical Committee meetings.

Through previous projects, DataBio partners have been engaged in the creation of ontologies and data models such as FOODIE⁶ and SENSLOG⁷. The FOODIE ontology extends the INSPIRE data model for the Agriculture and Aquaculture Facilities theme. These ontologies and data models will be used in the DataBio project and related to the emerging standardisation interest in the Agriculture area.

6 http://foodie-cloud.github.io/model/FOODIE.html
7 https://sdi4apps.eu/2016/11/opensensorsnetwork-pilot-senslog-api-for-farmtelemetry-module/

3.2.3 Forestry ontologies and standards
Forest information is standardised so that actors engaged in the forest sector can develop and use harmonised information systems. There are several parallel and successive actors in the forest sector value chain who have to exchange information when implementing measures. Although the basic concepts and measurement units of forestry have been quite carefully defined for decades, until recent years almost every actor implemented them differently in their information systems. As a result, it has been difficult or almost impossible to convert information and transfer it from one system to another. Forest information standards facilitate the use of open materials and data transfer between actors, which in turn improves operational efficiency in the forest sector. The forest information standards website is maintained by the Finnish Forest Centre and Forestry Development Centre Tapio⁸.

The forest information standards used by the information systems have been published as XML schema documentation. The schema defines the structure and content of information so that different information systems can exchange standardized information. Available Forest Information Standards include a standard forestry data model, a standard for special features data, a standard for forestry and micro stand forestry information, a standard set for wood and forestry trading, a standard for wood and timber statistics, as well as Forest Centre messages for official use. The new official standard messages published in 2018 include a message mix for wood harvesting and forest management, as well as self-monitoring messages. The standardization forum is currently working on a forest data update message, and the first official version of the message is to be released during 2018.

Additionally, in autumn 2018, the compliance of the forest information standard with the Y platform developed by the Population Register Center will be explored, a redesign of a wide-ranging special feature code will be planned, and the interface between the digitized forest management recommendations and the forest information standardization will be considered.

8 Forestry oriented standards - https://www.metsatietostandardit.fi/en/.

3.2.4 Fishery ontologies and standards
There are fewer established ontologies and standards in the Fishery domain, but in particular FAO, the Food and Agriculture Organization of the United Nations, has established a Fisheries Glossary. The FAO Fisheries Glossary has been jointly upgraded by the Fisheries and Aquaculture Department and the Meeting Programming and Documentation Service. This upgrade stems from the need to make it an integral part of the FAO Term Portal. It includes additional features, languages, and access to alternative definitions for currently existing terms in the FAO Term Portal. As of October 2014, the FAO Fisheries Glossary consists of approximately 1580 terms and definitions, grouped by subject areas, with relevant language equivalents being developed when new terms are added (http://www.fao.org/faoterm/collection/fisheries/en/).

In addition, there is a recent CEN CENELEC Workshop on Aquaculture that might also be relevant for some of the DataBio activities (https://www.cen.eu/News/Workshops/Pages/WS-2016-14.aspx).

The UN/CEFACT FLUX (Fisheries Language for Universal eXchange) standard for information exchange is designed to overcome the barriers created by diverse national reporting standards.

Figure 6: The FLUX standards and status (from UN ESCAP presentation of Dr Heiner Lehr) [REF-37].

The types of data exchanged include:

• Information between stakeholders on stocks, quotas and catches
• Real time monitoring of vessel positions (VMS) and on-going fishing activities
• Reporting of fish landed and sales
• Vessel data and characteristics
• License and fishing authorisation requests


3.3 Data access through standard services and APIs
Besides taking advantage of existing standard services and interfaces in the Geospatial and Earth Observation area, DataBio will also look into the usage and promotion of suitable APIs for data access and other services.

3.3.1 Geospatial Standards, Data Types and Services

3.3.1.1 OGC View Services
View services make it possible to display, navigate, zoom to, or overlay spatial datasets and to display legend information and any relevant content of metadata (EU Commission DIRECTIVE 2007/2/EC, Art. 11.1 b).

A Web Map Service (WMS) provides geodata in the form of georeferenced image data in raster or vector image formats, such as Portable Network Graphics (PNG), Graphics Interchange Format (GIF) or Scalable Vector Graphics (SVG). If configured accordingly, a WMS can also be queried for the attribute information behind an image coordinate.

The Web Map Tile Service (WMTS) enables applications to serve map tiles of spatially referenced data using tile images with predefined content, extent and resolution. It can be used to develop scalable, high performance services for web-based distribution of cartographic maps.
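To make the WMS description more concrete, the following sketch assembles a GetMap request as a key-value URL, assuming a WMS 1.3.0 endpoint. The endpoint `https://example.org/wms` and the layer name `fields` are hypothetical placeholders, and `wms_getmap_url` is an illustrative helper rather than part of any DataBio component.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of `crs`;
    CRS:84 uses longitude/latitude order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/wms", "fields",
                     (22.0, 40.0, 23.0, 41.0), 512, 512)
print(url)

# Decoding the query string shows the parameters the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["BBOX"])  # ['22.0,40.0,23.0,41.0']
```

The image returned for such a request covers exactly the requested bounding box at the requested pixel size, which is what distinguishes a WMS from the predefined tiles of a WMTS.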

3.3.1.2 OGC Download Services
Download services enable copies of spatial datasets, or parts of such sets, to be downloaded and, where practicable, accessed directly (EU Commission DIRECTIVE 2007/2/EC, Art. 11.1 c). A download service supports either the complete transfer of a geodataset or access to individual objects. The downloaded data is available to users on their own IT systems and can be further processed if appropriate rights have been granted.

A Web Feature Service (WFS) provides web-based access to vector-based objects or data. New data models should be based exclusively on GML version 3.2. This service may be limited to the download of predefined datasets without any further possibility to query or select the contents individually (see http://www.opengeospatial.org/standards/wfs).

The Web Coverage Service (WCS) provides georeferenced raster data, in particular multi-dimensional data stocks which represent phenomena with spatial or temporal variability. Examples include earth observation data, height models or temperature distributions (see http://www.opengeospatial.org/standards/wcs).


3.3.1.3 Other Services
In addition to the OGC services and interfaces already mentioned, there are services dealing with geospatial data that do not implement these standards. In particular, when it comes to semi-structured or even unstructured data, different approaches might be more feasible.

Representational State Transfer (REST) does not describe a specific standard but rather an architectural style for distributed hypermedia systems. REST does not prescribe any specific protocol or data format. Nevertheless, HTTP and JSON are widely used for such services.

Vector Data Formats
Geography Markup Language (GML) is a format based on the Extensible Markup Language (XML) that focuses on, but is not limited to, describing vector data. Since version 3 it is possible to use extensions, e.g. for coverages. GML does not limit the description of geospatial objects to 2D and 3D data, but also allows the inclusion of other information such as temporal data. This is the preferred data format to be served by an OGC Web Service.

GeoJSON allows geospatial information to be described and exchanged based on the JavaScript Object Notation (JSON). While limited to 2D data, it supports a variety of geometry types such as Points, Lines and Polygons. Besides the geometric information, an object can hold additional properties to describe features. These objects are called feature objects. Furthermore, GeoJSON allows the definition of so-called FeatureCollections containing a set of different features.

Well-known text (WKT) is a simple text-based markup language to describe geospatial information. Originally described by the OGC, the current standard is specified by ISO/IEC 13249-3:2016 and ISO 19162:2015. Unlike GeoJSON, it can describe not only 2D features but 3D features as well. This format is widely used to add geospatial information to table-structured data such as SQL databases or CSV (comma-separated values) files.
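The relationship between the GeoJSON and WKT representations can be sketched with the Python standard library alone. The helper names below (`point_feature`, `feature_collection`, `geometry_to_wkt`) and the station name are hypothetical, and the conversion covers only 2D Point, LineString and Polygon geometries; a production system would more likely rely on an established geospatial library.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON feature object with a Point geometry."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap feature objects in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


def _position(coords):
    # WKT separates the ordinates of one position with spaces.
    return " ".join(str(value) for value in coords)


def _ring(positions):
    # WKT separates the positions of a line or ring with commas.
    return "(" + ", ".join(_position(p) for p in positions) + ")"


def geometry_to_wkt(geometry):
    """Convert a GeoJSON Point, LineString or Polygon geometry to WKT."""
    kind, coords = geometry["type"], geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({_position(coords)})"
    if kind == "LineString":
        return f"LINESTRING {_ring(coords)}"
    if kind == "Polygon":
        return "POLYGON (" + ", ".join(_ring(r) for r in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


# Hypothetical sensor station, for illustration only.
station = point_feature(22.95, 40.35, {"name": "station-1"})
collection = feature_collection([station])

print(json.dumps(collection))
print(geometry_to_wkt(station["geometry"]))  # POINT (22.95 40.35)
```

The WKT string produced this way can be stored in a single column of a CSV file or database table, which is the typical use mentioned above.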

3.3.2 Sensor Standards, ontologies, data representations

3.3.2.1 OGC Sensor Observation Service
The Sensor Observation Service (SOS) is an OGC standard that describes web services for storing and querying real-time sensor data and sensor data time series. SOS is part of the OGC Sensor Web Enablement (SWE) framework. The offered sensor data comprises descriptions of the sensors themselves, encoded in the Sensor Model Language (SensorML, see below), and the measured values, encoded in the Observations and Measurements (O&M) format. The web service as well as both encodings are open standards defined by the Open Geospatial Consortium (OGC). If the SOS supports the transactional profile (SOS-T), new sensors can be registered through the service interface and measured values can be inserted. An SOS implementation can be used for data from both in-situ and remote sensing sensors. Furthermore, the sensors can be either mobile or stationary.


3.3.2.2 OGC Sensor Model Language
SensorML is an OGC standard and provides standard models and an XML encoding for describing sensors and measurement processes. SensorML can be used to describe a wide range of sensors, including both dynamic and stationary platforms and both in-situ and remote sensors. It provides a provider-centric view of information in a sensor web, which is complemented by Observations and Measurements (O&M), which provides a user-centric view. Functions supported include:

• sensor discovery,
• sensor geolocation,
• processing of sensor observations,
• a sensor programming mechanism,
• subscription to sensor alerts.

The latest version of the standard is 2.0, published in the year 2012 (see http://www.opengeospatial.org/standards/sensorml).

3.3.2.3 OGC SensorThings API
SensorThings API is an OGC standard providing an open and unified framework to interconnect IoT sensing devices, data, and applications over the Web. It is an open standard addressing the syntactic and semantic interoperability of the Internet of Things. It complements existing IoT networking protocols such as CoAP, MQTT, HTTP and 6LowPAN. While these protocols address the ability of different IoT systems to exchange information, the OGC SensorThings API addresses the ability of different IoT systems to use and understand the exchanged information. As an OGC standard, SensorThings API also allows easy integration into existing Spatial Data Infrastructures or Geographic Information Systems. The latest version of the standard is 1.0, published in the year 2015.
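As an illustration of how applications address SensorThings resources, the sketch below builds a query URL for the latest observations of a datastream and a JSON body for creating a new observation. The service root `https://example.org/v1.0` and the datastream identifier are hypothetical; the entity names, the `$top` and `$orderby` query options and the `@iot.id` link follow the SensorThings API 1.0 conventions, and the two helper functions are illustrative only.

```python
import json
from urllib.parse import quote


def observations_url(service_root, datastream_id, top=10):
    """URL for the most recent observations of one datastream."""
    order = quote("phenomenonTime desc")  # the space must be percent-encoded
    return (f"{service_root}/Datastreams({datastream_id})/Observations"
            f"?$top={top}&$orderby={order}")


def observation_payload(datastream_id, phenomenon_time, result):
    """JSON body for creating an Observation linked to an existing Datastream."""
    return {
        "phenomenonTime": phenomenon_time,
        "result": result,
        "Datastream": {"@iot.id": datastream_id},
    }


# Hypothetical service root and identifiers, for illustration only.
url = observations_url("https://example.org/v1.0", 7, top=5)
body = observation_payload(7, "2018-06-01T12:00:00Z", 21.5)

print(url)
print(json.dumps(body))
```

A client would send the body with an HTTP POST to the `Observations` collection of the service and issue the URL as an HTTP GET.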

3.3.2.4 ISO 19156:2011 Geographic information - Observations and measurements
The O&M standard defines a conceptual schema for observations, and for features involved in sampling. The standard provides models for the exchange of information describing observation acts and their results, both within and between different scientific and technical communities. Observations commonly involve sampling of a feature-of-interest. The standard defines a common set of sampling feature types classified primarily by topological dimension, as well as samples for ex-situ observations. The schema includes relationships between sampling features (sub-sampling, derived samples). The standard concerns only externally visible interfaces and places no restriction on the underlying implementations other than what is needed to satisfy the interface specifications in the actual situation. The latest version of the standard was published in the year 2011.


3.3.2.5 W3C Semantic Sensor Network Ontology
This W3C ontology describes sensors and observations, and related concepts. It does not describe domain concepts, time, locations, etc.; these are intended to be included from other ontologies via OWL imports. The ontology was developed by the W3C Semantic Sensor Networks Incubator Group (SSN-XG). It is based around the concepts of systems, processes, and observations. It supports the description of the physical and processing structure of sensors. Sensors are not constrained to physical sensing devices: rather, a sensor is anything that can estimate or calculate the value of a phenomenon, so a device, a computational process or a combination of both can play the role of a sensor. The representation of a sensor in the ontology links together what it measures (the domain phenomena), the physical sensor (the device) and its functions and processing (the models). The latest version of the SSN ontology was published in the year 2011.

3.3.2.6 NGSI-9/10
The FI-WARE version of the Open Mobile Alliance (OMA) NGSI-9 interface is a RESTful API via HTTP. Its purpose is to exchange information about the availability of context information. The three main interaction types are:

• one-time queries for discovering hosts (also called 'agents' here) where certain context information is available
• subscriptions for context availability information updates (and the corresponding notifications)
• registration of context information, i.e. announcements that certain context information is available (invoked by context providers).

The FI-WARE version of the OMA NGSI-10 interface is a RESTful API via HTTP. Its purpose is to exchange context information. The three main interaction types are:

• one-time queries for context information
• subscriptions for context information updates (and the corresponding notifications)
• unsolicited updates (invoked by context providers).
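The one-time query and the unsolicited update of NGSI-10 can be sketched as JSON request bodies. The sketch assumes the JSON layout of the FI-WARE NGSI v1 REST binding (operations `queryContext` and `updateContext`); the field names should be checked against the documentation of the context broker in use, and the entity and attribute names are hypothetical.

```python
import json


def query_context(entity_id, entity_type, attributes):
    """Body of a one-time query for context information (queryContext)."""
    return {
        "entities": [
            {"type": entity_type, "isPattern": "false", "id": entity_id}
        ],
        "attributes": list(attributes),
    }


def update_context(entity_id, entity_type, name, value_type, value):
    """Body of an unsolicited update sent by a context provider (updateContext)."""
    return {
        "contextElements": [
            {
                "type": entity_type,
                "isPattern": "false",
                "id": entity_id,
                "attributes": [
                    {"name": name, "type": value_type, "value": value}
                ],
            }
        ],
        "updateAction": "APPEND",
    }


# Hypothetical greenhouse entity, for illustration only.
query = query_context("Greenhouse1", "Greenhouse", ["temperature"])
update = update_context("Greenhouse1", "Greenhouse", "temperature", "float", "23.4")

print(json.dumps(query))
print(json.dumps(update))
```

Subscriptions follow the same pattern but additionally carry a callback address to which the broker sends notifications.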

3.3.2.7 IoT Architecture - Thing, Resource, Entity
IoT-Lite Ontology (http://iot.ee.surrey.ac.uk/fiware/ontologies/iot-lite).

Surprisingly, there are no standards with regard to events. As a result, each event processing tool has its own programming model and semantics. The same goes for the data representation of events.

3.3.3 API approach
The API approach is largely tested and relatively well used. There are many categories of APIs: web-based systems (e.g. REST), operating systems (e.g. Cocoa), database systems (e.g. Django) and hardware systems. APIs typically include three elements: access control, request (operation and parameters) and response (data/service).
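The three elements of an API named above (access control, request and response) can be illustrated with a small in-process sketch. It is not tied to any real web framework; the API key, the dataset name and its content, and the function `handle_request` are all hypothetical.

```python
# Hypothetical API keys and data, for illustration only.
VALID_KEYS = {"demo-key"}
DATASETS = {"catch-reports": [{"vessel": "V1", "tonnes": 12.5}]}


def handle_request(api_key, operation, parameters):
    """Process one API call and return a response dictionary."""
    # 1. Access control: reject callers without a known key.
    if api_key not in VALID_KEYS:
        return {"status": 403, "error": "access denied"}

    # 2. Request: dispatch on the operation and read its parameters.
    if operation == "list_datasets":
        data = sorted(DATASETS)
    elif operation == "get_dataset":
        name = parameters.get("name")
        if name not in DATASETS:
            return {"status": 404, "error": f"unknown dataset: {name}"}
        data = DATASETS[name]
    else:
        return {"status": 400, "error": f"unknown operation: {operation}"}

    # 3. Response: return the requested data or service result.
    return {"status": 200, "data": data}


print(handle_request("demo-key", "list_datasets", {}))
print(handle_request("wrong-key", "list_datasets", {}))
print(handle_request("demo-key", "get_dataset", {"name": "catch-reports"}))
```

In a web-based API the same three steps are usually mapped to an authentication header, the HTTP method and path with query parameters, and the HTTP status code with a response body.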


Lately, businesses have changed their view of APIs from a technology to a business enabler. Gartner⁹ introduces the concept of the “API economy” together with “Digital business”. APIs allow data to be provided and consumed across platforms, systems and services using standards in a secure and reliable manner. However, there are some challenges with security and efficiency related to the use of APIs as a data access mechanism. A successful example of API-based data access is Transport for London, which shares 200 data elements through an API¹⁰. The API is used by 600 different apps, which in turn are used by 42% of London’s population¹¹.

3.4 Stakeholders and concerns
Using ArchiMate as a specification tool in the DataBio project, each dataset/datastream is related explicitly to a set of pilot systems, stakeholders, components and/or pipelines. The ArchiMate motivation and strategy diagrams specify the goals, drivers and outcomes of each pilot system, indicating the relevance and use of the datasets/streams. Figure 7 shows a strategy diagram from the B2 fishery pilots where the goals and outcomes are realized through extensive data collection and processing.

Figure 7: ArchiMate strategy diagram showing how the pilot system will realize the defined goals

Furthermore, ArchiMate is used to model pilot applications that realize outcomes. Figure 8 shows how the outcome “Provide decision support for pelagic fisheries planning” (shown in Figure 7) is supported by a set of process steps, including datasets, stakeholders and interactions. The application diagram identifies EO Data, Vessel Operation Data, Meteorological Forecast and Catch reports as required datasets/streams.

9 https://www.gartner.com/smarterwithgartner/
10 https://api.tfl.gov.uk/
11 https://tfl.gov.uk/info-for/open-data-users/open-data-policy

Figure 8: ArchiMate business diagram showing the data processing, datasets and actors involved

Each dataset can then be broken down into subsets (from ArchiMate Business Objects to ArchiMate DataObject) as shown in Figure 9.


Figure 9: ArchiMate data view for one of the fishery pilots (B2)

The pilots are realized both from datasets and DataBio components. Each pilot system utilizes a set of components to implement the required big data processing steps: collection, preparation, analysis, visualization and access. Figure 10 shows how the B2 fishery pilot is designed.

Figure 10: The B2 fishery pilot lifecycle view showing how data is provided as input to processing steps


To further specify the application architecture, a pipeline view is created for each pilot system. The pipeline shows the component and dataset interfaces. Figure 11 shows the B2 fishery pilot pipeline with all its components and datasets.

Figure 11: The B2 fishery pilot pipeline view showing how datasets are interfaced

All pilots in DataBio are modelled in ArchiMate following this methodology. This allows for traceability from stakeholder and goal to application realization:

• A stakeholder has a goal that will have an outcome
• An outcome is created from a set of actions
• An action requires a set of resources
• A resource can be a dataset or component (processing)
• Datasets and components are combined in an architecture through interfaces and responsibilities.
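The traceability chain listed above can be sketched as a small data model. This is an illustration of the chain only, not the ArchiMate metamodel or a Modelio API; the class names and the function `datasets_for` are hypothetical, as are the stakeholder, goal, action and component names. The outcome and dataset names are taken from the B2 fishery pilot description.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str  # "dataset" or "component"


@dataclass
class Action:
    name: str
    resources: list = field(default_factory=list)


@dataclass
class Outcome:
    name: str
    actions: list = field(default_factory=list)


@dataclass
class Goal:
    name: str
    outcomes: list = field(default_factory=list)


@dataclass
class Stakeholder:
    name: str
    goals: list = field(default_factory=list)


def datasets_for(stakeholder):
    """Trace from a stakeholder down to the sorted names of the datasets it depends on."""
    return sorted({
        resource.name
        for goal in stakeholder.goals
        for outcome in goal.outcomes
        for action in outcome.actions
        for resource in action.resources
        if resource.kind == "dataset"
    })


# Stakeholder, goal, action and component names are hypothetical.
planning = Action("Plan fishing trip", [
    Resource("EO Data", "dataset"),
    Resource("Vessel Operation Data", "dataset"),
    Resource("Meteorological Forecast", "dataset"),
    Resource("Catch reports", "dataset"),
    Resource("Analysis component", "component"),
])
outcome = Outcome("Provide decision support for pelagic fisheries planning", [planning])
planner = Stakeholder("Fisheries planner", [Goal("Efficient fisheries", [outcome])])

print(datasets_for(planner))
```

Walking the same structure in the other direction, from a dataset up to the stakeholders that depend on it, gives the impact view that the ArchiMate models provide.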

Using Softeam’s Modelio software, users can navigate through the DataBio ArchiMate models for pilots and components to understand, compare and document the system/subsystems.


3.5 License models for data reuse
There exists a wide range of licensing schemes for publishing datasets. For example, data.world lists 13 common schemes ranging from the most open to the most restrictive [REF-21]. These licenses are typically Creative Commons (CC) licenses, which originate from the Open Source domain. In addition to the more or less open models, there are several models for commercial licensing of closed datasets for business-to-business (B2B) and business-to-government (B2G) purposes, including International Data Spaces (IDS), Unified eXchange Platform (UXP) and Sharemind from Cybernetica.


4 Requirements view

4.1 Types of EO data and sensors used in the DataBio pilots and their characteristics
Remote sensing is one of the most common ways to extract relevant information about the Earth and our environment. Remote sensing acquisitions, done through both active (synthetic aperture radar, LiDAR) and passive (optical and thermal range, multispectral and hyperspectral) sensors, provide a variety of information about land and ocean processes. Different types of Earth Observation data have been developed over the last forty years, bringing significant changes in the context of the Big Data concept. A typical Big Data application chain may require EO input data in addition to other sensor data, as depicted in Figure 12.

Figure 12: EO Data Collection Context

A significant part of the 26 DataBio pilots use EO (Earth Observation) data as input for their specific purposes, in the context of efficient resource use and increasing productivity in agriculture, forestry and fishery. The general data types, including EO data, used in DataBio pilot projects are listed in the table below.


Table 2: Types of data used in DataBio pilot projects

Category Name of Partn AOI Data used in the pilot

pilot ers

Leader

No. pilot of

1 A1. Precision A1.1 NP, Greece (Pilot Site A: data directly from the field, agriculture in NP Precision GAIA Chalkidiki - 600 ha, collected from a network olives, fruits, agriculture Epiche Pilot Site B: Stimagka of telemetric IoT stations grapes and in olives, irein - 3 000 ha, Pilot Site called GAIAtrons; remotely

vegetables fruits, C: Veria - 10 000 ha) with image sensors on in- AGRICULTURE grapes orbit platforms; and by monitoring the application of inputs and outputs in the farm (e.g. in-situ measurements, farm logs, farm profile)

2 A1.2 C.A.C., Eastern Italy. satellite imagery, weather Precision VITO Location: 5 farms, and soil data and agriculture Emilia Romagna yield/seed maturity in vegetable Region, for the total predictions

A. A. PrecisionHorticulture including vine and olives seed crops acreage of 14,79 hectares in the first year. To be expanded to other crops in the same Region and in Region Marche.

3 A1.3 NB Veenkoloniën region historical yield data - field Precision Advies in the Netherlands characteristics (sample agriculture , VITO data yield data, potato in varieties, planting data vegetables - etc.), historical earth 2 (Potatoes) observation data

4 A2. Big Data A2.1 Big CREA, greenhouse experimental data: whole management Data CERTH horticulture in the genome genotypic data, in manageme Thessali Region, metabolomics and greenhouse nt in Greece phenomic (lab) data; eco-systems greenhouse observational data: eco- phenomics (field), sensor systems data, environmental indoor and outdoor


AGRICULTURE - Category B. Arable Precision Farming (Leader: VITO)

5 | B1. Cereals and biomass crops | B1.1 Cereals and biomass crops 1 | TRAGSA | Cabreros del Río, Castile - Leon, Spain, “Ribera del Porma” Farmers Community: 24.270 ha | high resolution (Sentinel-2 type) satellite images, complemented with sensor data and, in some specific cases, with RPAS (Remotely Piloted Aircraft Systems) data and external data

6 | B1. Cereals and biomass crops | B1.2 Cereals and biomass crops 2 | NP, GAIA Epicheirein | Elassona, Greece - 2500 ha of maize as targeted crops | data directly from the field, collected from a network of telemetric IoT stations called GAIAtrons; remotely with image sensors on in-orbit platforms; and by monitoring the application of inputs and outputs in the farm (e.g. in-situ measurements, farm logs, farm profile)

7 | B1. Cereals and biomass crops | B1.3 Cereals, biomass crops 3 (Biomass crops monitoring and performance predictions) | CREA, VITO, NOVAMONT | 24 sites in Emilia Romagna, Italy (120 ha) - CREA sorghum pilot; 3 sites in Emilia Romagna and Veneto, Italy (6 ha) - CREA fiber hemp pilot; 4 sites in North and South-Western Sardinia, Italy (65 ha) - NOVAMONT cardoon pilot | satellite imagery, telemetry IoT data (air temperature, air moisture, solar radiation, leaf wetness, rainfall, wind speed and direction, soil moisture, soil temperature, soil EC / salinity, PAR, barometric pressure), phenotypic data collected for each cropping season

8 | B1. Cereals and biomass crops | B1.4 Cereals, biomass crops 4 (Cereal crop monitoring) | LESPRO | 8300 ha - Rostenice (Vyskov, Czech Republic); target crops: cereals - winter wheat, spring barley, grain maize | EO data: Landsat 8 - Landsat data repository (https://espa.cr.usgs.gov), Sentinel 2A/B (https://scihub.copernicus.eu/), Google Earth Engine platform for fast viewing EO data (https://earthengine.google.com/); field boundaries from Czech LPIS database as shp or xml (http://eagri.cz/public/app/eagriapp/lpisdata/); orthophotos, topography maps, cadastral maps – as WMS service; farm data - crop rotation, crop treatments records, yield maps, soil maps

9 | B2. Machinery management | B2.1 Machinery management | LESPRO, ZETOR, Federu | AOI in Czech Republic | telemetry data from machinery, other farm data and environmental issue

AGRICULTURE - Category C. Subsidies and Insurance (Leader: e-GEOS)

10 | C1. Insurance | C1.1 Insurance | NP | 12000 ha in North Greece - targeted crops: 7 types (wheat, stone fruits etc.) | EO data, field data (soil temperature, humidity - multi-depth, ambient temperature, humidity, barometric pressure, solar radiation, leaf wetness, rainfall volume, wind speed and direction), historical and current weather data, via the IoT stations network, enriched with yield data information extracted from the work calendar and stored in the NP’s cloud infrastructure

11 | C1. Insurance | C1.2 Farm Weather Insurance Assessment | e-GEOS | AOI in Italy | Copernicus satellite data series, meteorological data, other ground available data

12 | C2. CAP Support | C2.1 CAP Support | e-GEOS, TerraS, Tragsa | AOI in Northern Italy (50.000 ha) - 2 targeted crop types; AOI in Southeastern Romania (10.000 sqkm.) - 3 - 10 crop types | data related to parcel information and provided by the users, satellite optical and SAR data, in-situ / field data

13 | C2. CAP Support | C2.2 CAP Support - Greece | NP, GAIA Epicheirein | AOI in Northern Greece (50.000 ha) - 2 targeted crops: dry beans, peaches | data directly from the field, collected from a network of telemetric IoT stations called GAIAtrons; remotely with image sensors on in-orbit platforms (EO data), anonymized IACS data


FORESTRY - Category A. Multisource and data crowdsourcing / e-services (Leader: MHGS)

14 | A1. Easy data sharing and networking | 2.2.1 Easy data sharing and networking | MHGS, VTT, SPACEBEL, METSAK, FMI | two estates, called “Rangunkorven yhteismetsä” and “Taipale”, both located in Central Finland; follows up the implementation according to the defined specifications in Czech Republic and Belgium by WP2 partners | forestry data transferred via the Finnish forestry standard XML format, real time updates from field measurements, forest owners’ forest management plans and other notifications from forest owners, forestry operators and other stakeholders; processed data: forest estate, geometry of compartments, type of the forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, measurement date and user information; control significant vegetation changes, such as clear-cuts and forest damage areas to act in time

15 | A2. Monitoring and control tools for forest owners | 2.2.2 Monitoring and control tools for forest owners | MHGS, FMI, TRAGSA, METSAK | AOI in Finland | forestry data, real time updates from field measurements, forest owners’ forest management plans and other notifications from forest owners, forestry operators and other stakeholders; processed data: forest estate, geometry of compartments, type of the forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, measurement date and user information; control significant vegetation changes, such as clear-cuts and forest damage areas to act in time


FORESTRY - Category B. Forest Health / Remote / Crowdsensing, Invasive species damage (Leader: TRAGSA)

16 | B1. Forest damage remote sensing | 2.3.1 Forest damage remote sensing | MHGS, VTT, SENOP, METSAK, SPACEBEL | the main demonstration areas are the Hippala and Rangunkorpi forest plots in South-Eastern Finland; in Wallonia, FMI, with the support of Spacebel, aims to develop a remote sensing service to provide a spatial distribution of the vulnerability and risk exposure to diseases and other potential hazards based on Sentinel-2 or Sentinel-1+Sentinel-2 | EO data (in particular optical Sentinel-2 satellite data), precise data from airborne and field measurements, used to train and validate the method

17 | | 2.3.2-FH Monitoring of forest health | TRAGSA, SENOP, CSEM, CiaoT, FMI, VTT | large areas in the Iberian Peninsula - Spain (Extremadura, Andalucia, Castilla y León, Castilla La Mancha, Madrid) | remote sensing images (satellite + aerial + UAV), field data

18 | B2. Invasive alien species control – plagues – forest management | 2.3.2 IAS - Invasive alien species control and monitoring | TRAGSA, SENOP, CSEM, CiaoT, FMI, VTT | Spain - the Iberian Peninsula, the Canary Islands and the Balearic Islands | EO data (Sentinel 2, Landsat 8), several alphanumeric Big Data databases - centralized data - WORLDCLIM dataset (provided by the International Journal of Climatology - 19 bioclimatic raster layers with a resolution of 1 km), foreign trade database from Spanish Finance Ministry, Immigration Database by Spanish Statistical Institute, tourism dataset from Ministry of Energy, Tourism and Digital Agenda, GHS - population grid (developed by JRC), Spanish terrestrial transport network (ESRI shp, provided by the National Geographic Institute), NUTS-2, NUTS-3, Municipalities maps from GADM - Global Administrative Areas

FORESTRY - Category C. Forest data management services (Leader: METSAK)

19 | C1. Web-mapping service for government decision making | 2.4.1 Web-mapping service for government decision making | FMI, VTT, SPACEBEL | Czech Republic, Wallonia (Belgium) | Sentinel-2 satellite data, distributed by European Space Agency, forest management maps, in-situ LAI (Leaf-area index) observations - in-situ data from total 189 forest plots with varying species composition and structures

20 | C2. Shared multiuser data environment | 2.4.2 Shared multiuser data environment | METSAK, VTT | Finland | centralized forest resource data - original data source for forest resource data can be laser scanning, field measurement, growth modelling or notification from forest owner or forestry operator. Other data sources for Kemera financing data, forest use declarations, access and authorization.

FISHERY - Category A. Fishing vessels immediate operational choices (Leader: SINTEF Fishery)

21 | A1. Oceanic tuna fisheries immediate operational choices | A1. Oceanic tuna fisheries immediate operational choices | EHU-UPV | South Atlantic, Indian Ocean | EO data (Sentinel 3, CMEMS products), data from on board monitoring systems / fleet sensor observations (vessel engines sensors - velocity and heading, position of the vessel, fish catches - species, weight), weather and sea condition information

22 | A2. Small pelagic fisheries immediate operational choices | A2. Small pelagic fisheries immediate operational choices | SINTEF Ocean | small pelagic fishing fleet, covering the North Atlantic Ocean | time series measurements collected from a variety of sources (power system, navigation system, weather sensors, deck machinery), sonar / hydroacoustic data; EO data evaluated for inclusion in the pilot


FISHERY - Category B. Fishing vessel trip and fisheries planning (Leader: AZTI)

23 | B1. Oceanic tuna fisheries planning | B1. Oceanic tuna fisheries planning | AZTI | South Atlantic, Indian Ocean | EO data, large datasets of historical data (logbooks, VMS, GPS, Buoys, Observers), fuel consumption data, captures data, weather forecast

24 | B2. Small pelagic fisheries planning | B2. Small pelagic fisheries planning | SINTEF Ocean | small pelagic fishing fleet, typically covering the North Atlantic Ocean | extensive datasets within fisheries activity and catch statistics, combined with information from that time and history of the same, such as meteorological and oceanographic data (meteorological and oceanographic hindcasts and forecasts), moon phase, time of day, time of year, sonar data

FISHERY - Category C. Fisheries sustainability and value (Leader: SINTEF Fisheries)

25 | C1. Pelagic fish stock assessments | C1. Pelagic fish stock assessments | SINTEF Fisheries | northeast Atlantic; Norwegian coast | hydroacoustics, oceanographic and meteorological data (ocean surface currents, temperatures etc.), collected in-situ or through remote sensing, estimates of fish species and densities, catch reports, oceanographic simulations, stock simulations

26 | C2. Small pelagic market predictions and traceability | C2. Small pelagic market predictions and traceability | SINTEF Fisheries | the small pelagic fisheries in the North Atlantic Ocean | centralized data: market trends by the World bank and Norwegian Seafood Council (market insight and data, statistics, trade information, consumption and consumer insight), pelagic auction data (a database containing information about all pelagic catches landed in Norway in the last decades), provided by Norges Sildesalgslaget; distributed/local data: fish stock observations (hydroacoustic and sonar instruments), quality measurements, vessel operations data (motion and cost of operation)

As datasets, they can also be grouped as:
1. Existing datasets utilized by DataBio pilots: datasets that are available and have relevance for the pilot systems in DataBio. The DataBio project demonstrates their usefulness and provides recommendations for use.
2. Existing datasets that the DataBio project has improved in terms of easier or better findability, accessibility, interoperability or reusability.
3. New datasets created by the DataBio project by combining or processing existing data sources.

Subsequently, the types of EO data and sensors (classified into optical and SAR data) used in the DataBio pilots are presented in terms of their main features: objectives of the mission, spatial, temporal and radiometric resolution, coverage, data access etc., with special regard to the aim of using these EO data in pilots, including derived EO products/results.

4.2 Datasets and datastream requirements from Platform

This section describes the platform requirements that are related to EO datasets and datastreams. Each requirement (EO-xxxxxx) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID Requirement

EO-441020 The DataBio Platform shall discover EO metadata through interfaces compliant with the OGC 13-026r8 specification.

Implementations N/A


Derived from

EO-441031 Discover available historical EO products

Implementations

Derived from

EO-441032 Discover extreme weather data

Implementations N/A

Derived from

EO-441040 The Proba-V data shall be discoverable using an Opensearch interface which can be integrated in FedEO.

Implementations N/A

Derived from

EO-442020 The interface to access the catalog where the Sentinel-2 data is stored (if stored remotely) shall be granted to the pilots.


Implementations N/A

Derived from

EO-442030 The Proba-V data shall be accessible using the product URLs as returned in the OpenSearch responses (discovery step).

Implementations N/A

Derived from

EO-444010 The Proba-V MEP platform should provide a processing cluster allowing parallel computing and data analytics on Proba-V data and selected Sentinel-2 derived vegetation indices at country/region range.

Implementations N/A

Derived from
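The discovery and access steps named in EO-441040 and EO-442030 can be illustrated with a minimal sketch. It assumes an OpenSearch endpoint that returns Atom responses with the product URL in an `enclosure` link; the endpoint, the collection name and the query parameter names below are hypothetical placeholders, since a real service publishes its own parameter names in its OpenSearch description document.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def build_search_url(endpoint, collection, bbox, start, end, count=10):
    """Compose a discovery query URL (parameter names are assumptions)."""
    params = {
        "collection": collection,
        "bbox": ",".join(str(v) for v in bbox),  # west,south,east,north
        "start": start,
        "end": end,
        "count": count,
    }
    return f"{endpoint}?{urlencode(params)}"


def extract_product_urls(atom_xml):
    """Return the href of every 'enclosure' link found in an Atom feed."""
    root = ET.fromstring(atom_xml)
    urls = []
    for entry in root.iter(f"{ATOM_NS}entry"):
        for link in entry.findall(f"{ATOM_NS}link"):
            if link.get("rel") == "enclosure":
                urls.append(link.get("href"))
    return urls


# Hand-written stand-in for a discovery response (not real service output).
SAMPLE_RESPONSE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>product-1</title>
    <link rel="enclosure" href="https://example.org/data/product-1.tif"/>
    <link rel="alternate" href="https://example.org/meta/product-1"/>
  </entry>
  <entry>
    <title>product-2</title>
    <link rel="enclosure" href="https://example.org/data/product-2.tif"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_search_url("https://example.org/opensearch", "example-collection",
                           (5.0, 50.0, 6.0, 51.0), "2018-06-01", "2018-06-30"))
    for url in extract_product_urls(SAMPLE_RESPONSE):
        print(url)
```

Keeping discovery (building the query) separate from access (following the returned product URLs) mirrors the split between the two requirements.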

4.3 Datasets and datastream requirements from Agriculture pilots

This section describes the Agriculture pilots’ requirements that are related to datasets and datastreams. Each requirement (R1.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID


R1.2.1_6 A pilot needs the growth model

Implementations

Derived platform requirements

R1.2.1_7 A pilot needs EO data (historical and current).

Implementations

Derived platform requirements

R1.2.1_8 A pilot needs weather data (historical and current)

Implementations

Derived platform requirements

R1.3.1_4 A pilot needs the current solution to be improved, developed and scaled from 34KHa to several municipalities and NUTS-2 level


Implementations

Derived platform requirements

R1.3.1_6 A pilot needs availability of historical and actual EO data (including vegetation indices, e.g., NDVI, EVI, NDRE, NDMI)

Implementations

Derived platform requirements

R1.3.1_7 A pilot needs analysis on EO, DEM, soil and crop data by applying machine learning algorithms to identify management zones within the fields, and the export of these zones in vector format (shp, isoxml)

Implementations

Derived platform requirements

R1.3.1_8 A pilot needs analysis of spatial variability of crop status and alerting service

Implementations


Derived platform requirements

R.1.3.1_9 A pilot needs reporting - by field or aggregated for crop type

Implementations

Derived platform requirements

R.1.3.1_11 A pilot needs:

(1) Components enabling to harness satellite data for applications in farm telemetry, with particular interest in Crop Monitoring and Predictions.
(2) Components for crop monitoring and real-time analytics using real-time streaming data from wireless sensor networks; capability to trigger alarm/notifications/recommendations in order to improve farm operations and productivity

Implementations

Derived platform requirements

R.1.4.1_3 A pilot needs availability of current and historical EO data (including for example vegetation indices such as NDVI, LAI)


Implementations

Derived platform requirements

R.1.4.1_4 A pilot needs availability of weather data (integrated together with weather stations data). Parameters will be temperature, rainfall and humidity.

Implementations

Derived platform requirements

R.1.4.1_5 A pilot needs analysis on historical EO and weather data by applying machine learning algorithms to assess the impact of the bad weather conditions

Implementations

Derived platform requirements
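Several of the requirements above refer to vegetation indices (NDVI, EVI, NDRE, NDMI). As an illustration only, the sketch below computes them from per-pixel surface reflectance values using the commonly published band-ratio formulas; the function names and the sample reflectances are assumptions and do not describe any pilot's actual processing chain.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return None
    return (a - b) / denom


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


def ndmi(nir, swir):
    """Normalized Difference Moisture Index."""
    return normalized_difference(nir, swir)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    if denom == 0:
        return None
    return 2.5 * (nir - red) / denom


if __name__ == "__main__":
    # Illustrative reflectances for one vegetated pixel (values in 0..1).
    pixel = {"blue": 0.05, "red": 0.10, "red_edge": 0.30, "nir": 0.50, "swir": 0.20}
    print("NDVI", ndvi(pixel["nir"], pixel["red"]))
    print("NDRE", ndre(pixel["nir"], pixel["red_edge"]))
    print("NDMI", ndmi(pixel["nir"], pixel["swir"]))
    print("EVI ", evi(pixel["nir"], pixel["red"], pixel["blue"]))
```

Which sensor bands map to red, red edge, NIR and SWIR depends on the instrument (for example Sentinel-2 or Landsat 8) and has to be taken from the respective product documentation.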


4.4 Datasets and datastream requirements from Forestry pilots

This section describes the Forestry pilot requirements that are related to datasets and datastreams. Each requirement (R2.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID

R2.2.2_1 A pilot needs damage & quality reporting features to the Wuudis mobile app (MHG), Needs standard development (METSAK), Integrations

Implementations

Derived platform requirements

R2.3.1_1 A pilot needs new satellite and RS map layers provided via WMS/WMTS interface, Customizable map layers development to the Wuudis (MHG), Real-time forest management service development based on multiple forest big data sources

Implementations


Derived platform requirements

R2.3.2_1 A pilot needs learning about methodologies to assess and monitor forest health status

Implementations

Derived platform requirements

R2.4.1_1 A pilot needs shared repository of Sentinel-1 and Sentinel-2 satellite images.

Implementations

Derived platform requirements

R.2.4.1_2 A pilot needs cloud environment with components for satellite data pre-processing (components FMI 1-4)

Implementations


Derived platform requirements

R2.4.2_2 A pilot needs XML standard development (METSAK) for the forest damages and forest stand information update, Integrations and X-road approach for data transfer services as well as development of the data visualization/ map service for the forest damage information

Implementations

Derived platform requirements
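Requirement R2.3.1_1 asks for map layers delivered via a WMS/WMTS interface. A minimal sketch of how a client could compose a WMS 1.3.0 GetMap request is given below; the server URL and the layer name are hypothetical, and only the standard GetMap parameters are used.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (minx, miny, maxx, maxy) in the axis order of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(wms_getmap_url("https://example.org/wms", "forest_damage",
                         (2800000, 8400000, 2900000, 8500000), 512, 512))
```

EPSG:3857 is used as the default here because WMS 1.3.0 follows the axis order defined by the CRS, which for EPSG:4326 means latitude first; a projected CRS avoids that common source of error.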

4.5 Datasets and datastream requirements from Fishery Pilots

This section describes the Fishery pilots’ requirements that are related to datasets and datastreams. Each requirement (R3.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID Requirement Implementations

R3.3.1_1 A pilot needs satellite data streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.

Implementations


Derived platform requirements

R3.3.1_2 A pilot needs ocean current simulation data streams.

Implementations

Derived platform requirements

R3.3.1_3 A pilot needs buoys data and position of the vessel

Implementations

Derived platform requirements


R3.3.2_1 A pilot needs meteorological data to be available in the vessel power system

Implementations

Derived platform requirements

R3.3.2_2 A pilot needs meteorological data to be collected by interfacing with existing sensors, or new sensors provided

Implementations

Derived platform requirements

R3.3.2_3 A pilot needs meteorological data to be collected by interfacing with existing sensors, or new sensors provided

Implementations

Derived platform requirements

R3.3.2_4 A pilot needs satellite data streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.

Implementations

Derived platform requirements

R3.4.1_1 A pilot needs missing data sources including fishery-dependent data, fishery-independent data, oceanography.

Implementations

Derived platform requirements

R3.4.1_2 A pilot needs fishery-dependent data: landed catch (Sildes, ICES), scientific surveys (IMR), ERS (Norwegian directorate of fisheries)

Implementations

Derived platform requirements

R3.4.1_3 A pilot needs fishery-independent data: Publicly available scientific survey data, hydro acoustics from fishing vessels (perhaps through ratatosk C17.01)


Implementations

Derived platform requirements

R3.4.1_4 A pilot needs oceanographic data: Satellite streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations. (ICES, met.no, SPACEBEL, ..)

Implementations

Derived platform requirements

R3.4.2_2 A pilot needs machine learning & data analysis components for finding covariations (multivariate/PCA analysis) and estimating price prediction models.

Implementations

Derived platform requirements


5 Datasets: existing, improved, new and others

This section presents the datasets identified by the DataBio project as relevant for the selected domains: agriculture, forestry and fishery. The datasets are grouped into four sections based on their availability:
1. Existing datasets utilized by DataBio pilots: datasets that are available and have relevance for the pilot systems in DataBio. The DataBio project demonstrates their usefulness and provides recommendations for use.
2. Existing datasets that the DataBio project has improved in terms of easier or better findability, accessibility, interoperability or reusability.
3. New datasets created during the DataBio project by collecting new data or combining or processing existing data sources.
4. Other datasets that might be of (future) relevance to DataBio pilots or similar systems.

Please note that many datasets are missing some parameters in the description. The datasets are continuously being added to the DataBioHub and most of the parameters will be included as they are harvested automatically from the data source.

The datasets are presented with the available metadata. The full metadata template structure is provided in Appendix A.

5.1 Existing datasets utilized by DataBio Pilots

5.1.1 Open Transport Map (UWB - D03.02)

Field | Value
Internal Name of the Dataset | D03.02
Name of the Dataset/API Provider | Open Transport Map
Short Description | The Open Transport Map displays a road network which: is suitable for routing; visualizes average daily Traffic Volumes for the whole EU; visualizes time related Traffic Volumes (in OTN Pilot Cities - Antwerp, Birmingham, Issy-le-Moulineaux, Liberec region). Technically, the Open Transport Map: can serve as a map itself as well as a layer embedded in your map; is derived from the most popular open dataset - OpenStreetMap; is accessible via both GUI and API; covers the whole European Union.
Version | 1.0
Initial Availability Date | 07.03.2017
Data Type | geographic data
Personal Data | no
Rightsholder | Plan4all
Other Rights Information | Open Data Commons Open Database License (ODbL)
Dataset/API Owner/Responsible | UWB
Dataset/API Owner/Responsible Contacts | [email protected]
Technology |
Name of the System | Open Transport Map
Dataset Data Model/API Interface | GUI, WMS, WFS, shapefile, all described at http://opentransportmap.info
Data Model: Standards, Glossaries and metadata standards | WMS, WFS, shapefile, PostGIS
Data Identifier - Standard used |
Data Model - Specific Data Model | http://opentransportmap.info/img/OTM_physicalModelAndCodelists.svg
Data Volume | 20 Gb
Update Frequency | irregularly
Data Archiving and preservation |
Geographical Coverage | European Union
Timespan | 2015-present
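Because many entries still lack some parameters, a simple completeness check over the metadata template can be useful. The sketch below is an assumption about how such a check could look, not a description of the DataBio Hub: it holds a subset of the template fields and the Open Transport Map values from the table above in a dictionary and lists the fields that are empty.

```python
# Subset of the metadata template fields used in the dataset tables.
TEMPLATE_FIELDS = [
    "Internal Name of the Dataset",
    "Name of the Dataset/API Provider",
    "Version",
    "Data Type",
    "Rightsholder",
    "Technology",
    "Data Identifier - Standard used",
    "Data Volume",
    "Update Frequency",
    "Data Archiving and preservation",
    "Geographical Coverage",
    "Timespan",
]

# Values copied from the Open Transport Map entry; empty strings mark
# fields that carry no value in the table.
otm_record = {
    "Internal Name of the Dataset": "D03.02",
    "Name of the Dataset/API Provider": "Open Transport Map",
    "Version": "1.0",
    "Data Type": "geographic data",
    "Rightsholder": "Plan4all",
    "Technology": "",
    "Data Identifier - Standard used": "",
    "Data Volume": "20 Gb",
    "Update Frequency": "irregularly",
    "Data Archiving and preservation": "",
    "Geographical Coverage": "European Union",
    "Timespan": "2015-present",
}


def missing_fields(record, fields=TEMPLATE_FIELDS):
    """Return the template fields that are absent or blank in a record."""
    return [name for name in fields if not str(record.get(name, "")).strip()]


if __name__ == "__main__":
    for name in missing_fields(otm_record):
        print("missing:", name)
```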

5.1.2 Forest resource data (METSAK - D18.01)

Field | Value
Internal Name of the Dataset | D18.01
Name of the Dataset/API Provider | Forest resource data / METSAK
Short Description | The pilot uses METSAK’s forest resource data concerning privately owned Finnish forests from METSAK’s forest resource data system. The forest resource data consists of basic data of tree stands (development class, dominant tree species, scanned height, scanned intensity, stand measurement date), strata of tree stands (mean age, basal area, number of stems, mean diameter, mean height, total volume, volume of logwood, volume of pulpwood), growth place data (classification, fertility class, soil type, drainage state, ditching year, accessibility, growth place data source, growth place data measurement date), geometry and compartment numbering. The forest resource data is available in a standard format for external use with consent of a forest owner.
Extended Description | The forest resources are inventoried once in a decade per certain area using remote sensing (airborne laser scanning) and aerial photographs. The new data is analysed and in some parts measured in the field. Other updates on the forest resource data are yearly growth calculations, possible notifications of forest use or other forestry operations or so called Kemera financing operations and possible new aerial photographs to be interpreted.
Version | Oracle database and data model version 2.5.2.
Initial Availability Date | from year 2010 onwards
Data Type | Oracle database model
Personal Data | User information
Rightsholder | METSAK
Dataset/API Owner/Responsible | Forest resource data / METSAK / Aapo Lindberg
Dataset/API Owner/Responsible Contacts | Oracle database model/ [email protected]
Technology | Oracle database
Name of the System | Aarni
Data Model: Standards, Glossaries and metadata standards | Oracle database model for forest resource data
Data Identifier - Standard used | N/A
Data Model - Specific Data Model | Oracle database model
Data Volume | 1984 GB
Update Frequency | Online
Data Archiving and preservation | Real time backup procedures as well as database copy once a month
Geographical Coverage | Finland
Timespan | Data available from year 2010 onwards
Access Level | METSAK users
Access Mechanism | Active directory user management

5.1.3 Landsat 8 OLI data

Field Value

Internal Name of the Landsat 8 OLI Dataset

Name of the Dataset/API NASA and the U.S. Geological Survey Provider

Extended Description Landsat 8 (formerly the Landsat Data Continuity Mission, LDCM), a collaboration between NASA and the U.S. Geological Survey, provides moderate-resolution measurements of the Earth’s terrestrial and polar regions in the visible, near- infrared, short wave infrared, and thermal infrared. Landsat 8 provides continuity with the more than 40-year long Landsat land imaging dataset. Landsat 8 carries two push-broom instruments: The Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). The spectral bands of the OLI sensor provides enhancement from prior Landsat instruments, with the addition of two additional spectral bands: a deep blue visible channel (band 1) specifically designed for water resources and coastal zone investigation, and a shortwave infrared channel (band 9) for the detection of cirrus clouds. The TIRS instrument collects two spectral bands for the wavelength covered by a single band on the previous TM and ETM+ sensors.

Dissemination level: PU -Public Page 74

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Landsat 8 mission’s objectives are: · to provide data continuity with Landsats 4, 5, and 7; · to offer 16-day repetitive Earth coverage, an 8- day repeat with a Landsat 7 offset; · · to build and periodically refresh a global archive of sun-lit, substantially cloud-free, land images.

Data Type Level 0 (L0) Data Products Description of the products: L0 data products are image data with all data transmission and formatting artefacts removed. These products are time provided, spatial, and band- sequentially ordered multispectral digital image data. Level 1 Radiometric (L1R) Data Products Description of the products: L1R data products consist of radiometrically corrected image data derived from L0 data scaled to at-aperture spectral radiance or reflectance. Level 1 Systematic (L1G) Data Products Description of the products: L1G data products consist of L1R data products with systematic geometric corrections applied and resampled for registration to a cartographic projection, referenced to the World Geodetic System 1984 (WGS84). Level 1 Gt (L1Gt) Data Products Description of the products: L1Gt data products consist of L1R data products with systematic geometric and terrain corrections applied and resampled for registration to a cartographic projection, referenced to the WGS84. Level 1 Terrain (L1T) Data Products Description of the products: L1T data products consist of L1R data products with systematic geometric corrections applied, using Ground Control Points (GCPs) or onboard positional information to resample the image data for registration to a cartographic projection, referenced to the WGS84. The data are also terrain corrected for relief displacement. Level-2 Data Products Description of the products: Surface Reflectance are available on demand, courtesy of the USGS (U.S. Geological Survey).

Dissemination level: PU -Public Page 75

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

They provide an estimate of the surface spectral reflectance as it would be measured at ground level in the absence of atmospheric scattering or absorption. Landsat 8 Tier 1 data Description of the products: They are the Landsat scenes with the highest available data quality and are considered suitable for time-series analysis. Tier 1 includes Level-1 Precision and Terrain (L1TP) corrected data that have well-characterized radiometry and are inter-calibrated across the different Landsat instruments. Landsat 8 Tier 2 data Description of the products: Landsat 8 Tier 2 products are the ones that do not meet the Tier 1 criteria during processing. Tier 2 includes Systematic Terrain (L1GT) and Systematic (L1GS) processed data, as well as any L1TP data that do not meet the Tier 1 specifications due to significant cloud cover, insufficient ground control, and other factors. Landsat 8 Real-Time data Description of the products: The Real-Time Tier contains data immediately after acquisitions that use estimated parameters. Real-Time data are reprocessed and assessed for inclusion into Tier 1 or Tier 2 as soon as final parameters are available.

Access Mechanism Landsat Level-1 data products are available for immediate download. There are several ways of accessing Landsat-8 Level 1 products:
· EarthExplorer (https://earthexplorer.usgs.gov/) – provides a graphical user interface to define areas of interest (AOI) by place name, address, zip code or by creating an AOI on the interactive map. Queries can be applied to multiple collections simultaneously. The Bulk Download Application is an easy-to-use tool for downloading large quantities of satellite imagery and geospatial data from EarthExplorer. Once scenes are added to a Bulk Order via EarthExplorer, the Bulk Download Application retrieves them with little to no user interaction, iterating through the scene list and downloading each scene until all have been processed.
· GloVis (https://glovis.usgs.gov/);
· LandsatLook Viewer (https://landsatlook.usgs.gov/).
Surface Reflectance and other Level-2 science products are available on request through:
· USGS Earth Resources Observation and Science (EROS) Center Science Processing Architecture (ESPA) On Demand Interface;
· ESPA Application Programming Interface (API);
· EarthExplorer – allows ordering of surface reflectance (SR) data products only.
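The processing levels (L1TP, L1GT, L1GS) and tiers (Tier 1, Tier 2, Real-Time) described above are normally encoded in the scene's product identifier. The following is a minimal sketch of how such an identifier could be parsed in order to select only scenes suitable for time-series analysis. The identifier layout `LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX` is an assumption taken from the USGS Collection 1 naming convention, which is not described in this document, and the function and class names are illustrative only.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Assumed layout (USGS Collection 1 Level-1 product identifier):
#   LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX
_PRODUCT_ID = re.compile(
    r"^L(?P<sensor>[A-Z])(?P<satellite>\d{2})_"
    r"(?P<level>L1TP|L1GT|L1GS)_"
    r"(?P<path>\d{3})(?P<row>\d{3})_"
    r"(?P<acquired>\d{8})_(?P<processed>\d{8})_"
    r"(?P<collection>\d{2})_(?P<category>RT|T1|T2)$"
)

# Mapping from the category code to the tier names used in the text.
TIER_NAMES = {"T1": "Tier 1", "T2": "Tier 2", "RT": "Real-Time"}


class LandsatScene(NamedTuple):
    satellite: int
    level: str
    path: int
    row: int
    acquired: date
    tier: str


def parse_product_id(product_id: str) -> LandsatScene:
    """Split a Level-1 product identifier into its main components."""
    match = _PRODUCT_ID.match(product_id)
    if match is None:
        raise ValueError(f"not a recognised Level-1 product identifier: {product_id!r}")
    return LandsatScene(
        satellite=int(match["satellite"]),
        level=match["level"],
        path=int(match["path"]),
        row=int(match["row"]),
        acquired=datetime.strptime(match["acquired"], "%Y%m%d").date(),
        tier=TIER_NAMES[match["category"]],
    )


def suitable_for_time_series(scene: LandsatScene) -> bool:
    """Only Tier 1 scenes are described as suitable for time-series analysis."""
    return scene.tier == "Tier 1"
```

A pilot that builds multi-temporal stacks could filter a list of downloaded identifiers with `suitable_for_time_series(parse_product_id(pid))` before further processing.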

5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data

Field Value

Internal Name of the Dataset Sentinel 3 OLCI

Name of the Dataset/API Provider ESA

Short Description The Sentinel-3 mission carries multiple instruments to measure sea-surface topography, sea and land-surface temperature, ocean- and land-surface colour, contributing to the Copernicus marine, land, atmosphere, emergency, security and cryosphere applications. It is based on a constellation of two identical satellites, Sentinel-3A and Sentinel-3B, launched separately.

Extended Description Primary geophysical products provided by the Sentinel-3 mission are:
· global coverage Sea Surface Height (SSH) for ocean and coastal areas;
· enhanced resolution SSH products in coastal zones and sea-ice regions;
· global coverage Sea Surface Temperature (SST) and sea-Ice Surface Temperature (IST);
· global coverage ocean colour and water quality products;
· global coverage ocean surface wind speed measurements;
· global coverage significant wave height measurement;
· global coverage atmospheric aerosol consistent over land and ocean;
· global coverage total column water vapour over land and ocean;
· global coverage vegetation products;
· global coverage land ice/snow surface temperature products;
· ice products (e.g., ice surface topography, extent, concentration).
Secondary geophysical products provided by the Sentinel-3 mission are:
· global coverage fire monitoring products (e.g. fire radiated power, burned area, risk maps);
· inland water (lakes and rivers) surface height data.

Geographical Coverage The orbit of one Sentinel-3 satellite has a repeat cycle of 27 days (385 orbits). OLCI’s field of view, with a swath width of 1270 km, allows global coverage at the equator to be provided in 2–4 days with one satellite and in less than two days with two satellites.

Access Mechanism On 9 March 2018, Level-1 and Level-2 Sentinel-3 OLCI PDUs, full and reduced resolution, began to be released through the Sentinel-3 Pre-Operational Data Hub.
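The coverage figures quoted for OLCI (27 days, 385 orbits, 1270 km swath) can be related to each other with a little arithmetic. The sketch below is a simplified illustration only: the equatorial circumference of 40 075 km is an assumption not taken from this document, and the calculation ignores daylight-only imaging, instrument tilt and swath overlap, so it gives a rough plausibility check rather than the actual mission coverage model.

```python
# Assumption: commonly quoted equatorial circumference of the Earth, in km.
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40075.0


def orbits_per_day(orbits_per_cycle: int, cycle_days: int) -> float:
    """Average number of orbits completed per day."""
    return orbits_per_cycle / cycle_days


def orbital_period_minutes(orbits_per_cycle: int, cycle_days: int) -> float:
    """Duration of one orbit in minutes."""
    return 24 * 60 / orbits_per_day(orbits_per_cycle, cycle_days)


def daily_equator_fraction(swath_km: float, orbits_per_cycle: int, cycle_days: int) -> float:
    """Fraction of the equator covered per day by one swath per orbit (idealised)."""
    daily_track_spacing_km = EARTH_EQUATORIAL_CIRCUMFERENCE_KM / orbits_per_day(
        orbits_per_cycle, cycle_days
    )
    return swath_km / daily_track_spacing_km


if __name__ == "__main__":
    print(round(orbits_per_day(385, 27), 2))                # about 14.26 orbits per day
    print(round(orbital_period_minutes(385, 27), 1))        # about 101.0 minutes
    print(round(daily_equator_fraction(1270, 385, 27), 2))  # about 0.45 of the equator per day
```

Under these assumptions a single satellite images somewhat less than half of the equator per day, which is consistent with the stated 2–4 days for full equatorial coverage.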

5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)

Field Value

Internal Name of the Dataset Sentinel 3 SLSTR

Name of the Dataset/API Provider ESA

Short Description The Sentinel-3 mission carries multiple instruments to measure sea-surface topography, sea and land-surface temperature, ocean- and land-surface colour, contributing to the Copernicus marine, land, atmosphere, emergency, security and cryosphere applications.

Extended Description The sensors / main instruments of the Sentinel-3 mission are:
· Ocean and Land Colour Instrument (OLCI);
· Sea and Land Surface Temperature Radiometer (SLSTR);
· SAR Radar Altimeter (SRAL);
· MicroWave Radiometer (MWR);
· Precise Orbit Determination (POD), which consists of 3 instruments: DORIS, a Doppler Orbit Radio positioning system; GNSS, a GPS receiver providing precise orbit determination and tracking multiple satellites simultaneously; LRR, a Laser Retro-Reflector system used to accurately locate the satellite in orbit.
The Sea and Land Surface Temperature Radiometer (SLSTR) is a dual scan temperature radiometer, which has been selected for the low Earth orbit (800 - 830 km altitude) ESA Sentinel-3 operational mission as a part of the Copernicus (Global Monitoring for Environment and Security) programme. SLSTR is the successor of the (A)ATSR series (aboard the ERS and ENVISAT missions). The main objective of SLSTR products is to provide global and regional Sea and Land Surface Temperature (SST, LST) to a very high level of accuracy (better than 0.3 K for SST) for both climatological and meteorological applications. SLSTR is mostly known for its marine applications (SST – Sea Surface Temperature), but it also provides information related to biomass burning (fire detection and classification). SLSTR also contributes to climate studies by bringing several of the required Essential Climate Variables (ECVs) to the scientific community.

Geographical Coverage The mean global coverage revisit time for dual view SLSTR observations is 1.9 days at the equator (one operational spacecraft) or 0.9 days (in constellation with a 180° in-plane separation between the two spacecraft) with these values increasing at higher latitudes due to orbital convergence.

Timespan

Access Level Sentinel-3 SLSTR products are made available systematically and free of charge to all data users including the general public, scientific and commercial users.

Access Mechanism Sentinel-3A SLSTR data products are available via the Copernicus Open Access Hub.

5.1.6 MODIS data

Field Value

Internal Name of the Dataset MODIS

Name of the Dataset/API Provider NASA

Short Description The Moderate-resolution Imaging Spectroradiometer (MODIS) is a scientific instrument (radiometer) on board the NASA Terra and Aqua satellite platforms, launched in 1999 and 2002 respectively to study global dynamics of the Earth atmosphere, land, ice and oceans.

Extended Description MODIS captures data in 36 spectral bands ranging in wavelength from 0.4 µm to 14.4 µm and at varying spatial resolutions (2 bands at 250 m, 5 bands at 500 m and 29 bands at 1 km), providing complete global coverage of the Earth every 1 to 2 days. Both the Terra and Aqua platforms are in sun-synchronous, near-polar (98 degree) orbits at 705 km altitude, with a descending local equatorial crossing time of 10:30 am for Terra and an ascending crossing time of 1:30 pm for Aqua. MODIS Terra Global Level 3 Mapped Thermal SST products consist of sea surface temperature (SST) data derived from the 11 and 12 µm thermal infrared (IR) bands (MODIS channels 31 and 32). Daily, weekly (8-day), monthly and annual MODIS SST products are available at both 4.63 and 9.26 km spatial resolution and for both daytime and night-time passes.

Rightsholder MODIS products are available courtesy of GSFC – NASA.

Geographical Coverage The Terra satellite crosses the equator from north to south in the morning, and Aqua crosses it from south to north in the afternoon, resulting in global coverage every 1 to 2 days.

5.1.7 Proba-V data

Field Value

Internal Name of the Dataset Proba-V

Name of the Dataset/API Provider VITO

Short Description The Proba-V mission provides multispectral images to study the evolution of the vegetation cover on a daily and global basis. The 'V' stands for Vegetation. This mission is extending the dataset of the long-established Vegetation instrument, flown as a secondary payload aboard France's SPOT-4 and SPOT-5 satellites launched in 1998 and 2002 respectively. The Proba-V mission has been developed in the frame of the ESA General Support Technology Program (GSTP). The Contributors to the Proba-V mission are Belgium, Luxembourg and Canada.

Extended Description Proba-V’s main applications are related to monitoring plant and forest growth, as well as inland water bodies. The Vegetation instrument can distinguish between different land cover types and plant species, including crops, to reveal their health, as well as detect water bodies and vegetation burn scars. The VEGETATION instrument is pre-programmed with an indefinite repeated sequence of acquisitions. This nominal acquisition scenario allows a continuous series of identical products to be generated, aiming to map land cover and vegetation growth across the entire planet every two days.

Geographical Coverage The mission, developed as part of ESA's Proba Programme, is an ESA EO mission providing global coverage every two days, with latitudes 35-75°N and 35-56°S covered daily, and between 35°N and 35°S every 2 days

Timespan

Access Level

Access Mechanism PROBA-V products can be ordered and downloaded from the PROBA-V Product Distribution Portal (PDP) at http://www.vito-eodata.be/. Products are usually available within 24 hours after sensing time (max 48 hours). Figure 8 shows the portal’s main page.

URI https://www.vito-eodata.be/PDF/portal/Application.html#Home

5.1.8 Global Precipitation Measurement (GPM) mission data

Field Value

Internal Name of the Dataset

Name of the Dataset/API Provider

Short Description Global Precipitation Measurement (GPM) is an international satellite mission to provide next-generation observations of rain and snow worldwide every three hours.

Extended Description NASA and the Japanese Aerospace Exploration Agency (JAXA) launched the GPM Core Observatory satellite on February 27th, 2014, carrying advanced instruments that set a new standard for precipitation measurements from space. The foundation of the GPM mission is the Core Observatory satellite provided by NASA and JAXA. Data collected from the Core satellite serves as a reference standard that unifies precipitation measurements from research and operational satellites launched by a consortium of GPM partners in the United States, Japan, France, India, and Europe. The Core satellite measures rain and snow using two science instruments: the GPM Microwave Imager (GMI) and the Dual-frequency Precipitation Radar (DPR). The GMI captures precipitation intensities and horizontal patterns, while the DPR provides insights into the three-dimensional structure of precipitating particles. Together these two instruments provide a database of measurements against which other partner satellites’ microwave observations can be meaningfully compared and combined to make a global precipitation dataset.

Rightsholder NASA

Update Frequency The GPM constellation of satellites can observe precipitation over the entire globe every 2-3 hours

Geographical Coverage The GPM constellation of satellites can observe precipitation over the entire globe every 2-3 hours

Access Mechanism https://pmm.nasa.gov/data-access/downloads/gpm

URI https://pmm.nasa.gov/GPM

5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data

Field Value

Internal Name of the Dataset KNMI

Name of the Dataset/API Provider KNMI Data Centre (KDC)

Short Description The KNMI Data Centre (KDC) provides access to weather, climate and seismological datasets of KNMI (Koninklijk Nederlands Meteorologisch Instituut).

Extended Description The primary tasks of KNMI are weather forecasting, monitoring of climate changes and monitoring seismic activity. KNMI is also the national research and information centre for climate, climate change and seismology.

Rightsholder KDC

Geographical Coverage KNMI Products cover the Netherlands and surrounding areas.

Access Mechanism Access to most datasets is unrestricted and provided under the 'OpenData' policy of the Dutch government. For the specific KNMI precipitation dataset described in this document, access is free but registration is required. The Multisensor Evolution Analysis (MEA) technology (DataBio component C41.01) provides access to this KNMI precipitation data.

5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data

Field Value

Internal Name of the Dataset CMEMS

Name of the Dataset/API Provider Copernicus

Short Description The CMEMS (Copernicus Marine Environment Monitoring Service) provides regular and systematic core reference information on the state of the physical oceans and regional seas. The observations and forecasts produced by the service support all marine applications.

Extended Description Since May 2015, the Copernicus Marine Environment Monitoring Service (CMEMS) has been running in operational mode. It follows the MyOcean demonstration phase, during which the service was open in pre-operational mode for 6 years. The service is meant for any user requesting generic information on the ocean, and especially downstream service providers who use this information as an input to their own value-added services to end-users. The CMEMS can be defined as:

• An integrated service;
• An open and free service;
• Providing access to a single catalogue of products;
• A reliable service;
• A sustainable service.

Data Type Copernicus Marine products are delivered in netCDF format (.nc). They can be downloaded through the CMEMS interface. Data are directly available through services such as CSW (Catalogue Service for the Web), WMS (Web Map Service), Subsetter, Direct Get File and FTP.

Access Mechanism In order to provide standard access to the CMEMS products, the FedEO Gateway (C07.01) has been extended with an additional connector for the CMEMS web service interface. In this way, CMEMS products can be retrieved and downloaded through the same components, the FedEO Gateway (C07.01) and the Data Manager (C07.04), via the same standard OpenSearch interface, compliant with OGC 13-026r8, as other EO products such as Sentinel products.

URI
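An OpenSearch interface such as the one mentioned under Access Mechanism publishes URL templates whose placeholders (for example `{geo:box?}` or `{time:start?}`) are filled in by the client. The following is a minimal sketch of that substitution step under stated assumptions: the template, host name and collection identifier in the usage note are hypothetical and are not the actual FedEO Gateway endpoint, and only the general OpenSearch template syntax (a trailing `?` marks an optional parameter) is relied upon.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches {name} and {name?}; a trailing "?" marks an optional parameter.
_PLACEHOLDER = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Fill an OpenSearch URL template with percent-encoded parameter values."""

    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"required OpenSearch parameter missing: {name}")

    return _PLACEHOLDER.sub(substitute, template)
```

For example, with the hypothetical template `https://example.org/opensearch?parentIdentifier={eo:parentIdentifier}&bbox={geo:box?}&startDate={time:start?}`, calling `fill_template(template, {"eo:parentIdentifier": "EXAMPLE_COLLECTION", "geo:box": "-10,35,5,44"})` yields a request URL restricted to the given collection and bounding box (west, south, east, north), with the optional start date left empty.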

5.1.11 Sentinel 2A (ESA D11.01)

Field Value

Internal Name of the Dataset D11.01

Name of the Dataset/API Provider Sentinel 2A

Short Description Sentinel 2B data provided by ESA. Multiple geographical areas and various times.

Extended Description https://sentinel.esa.int/web/sentinel/sentinel-data-access

https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription

Data Type EO data

Rightsholder ESA. License CC-BY.

Dataset/API Owner/Responsible ESA

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System Sentinel

Dataset Data Model/API Interface REST

Data Model: Standards, Glossaries and metadata standards SENTINEL SAFE

Data Volume ~GB

Update Frequency Every 5 days

Data Archiving and preservation Locally on TRAGSATEC Premises

Geographical Coverage Extremadura, Galicia

Timespan 2016 - End of the Project
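Products delivered in the SENTINEL SAFE format carry most of their key metadata in the product folder name. The sketch below shows how such a name could be decomposed, for instance to index a local archive by tile and sensing date. It assumes the compact Sentinel-2 naming convention (`MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.SAFE`), which is not described in this document; products generated early in 2016 use an older, longer naming scheme that this sketch does not handle.

```python
import re
from datetime import datetime

# Assumed compact Sentinel-2 product naming convention.
_SAFE_NAME = re.compile(
    r"^(?P<mission>S2[AB])_MSI(?P<level>L1C|L2A)_"
    r"(?P<sensing_start>\d{8}T\d{6})_N(?P<baseline>\d{4})_"
    r"R(?P<relative_orbit>\d{3})_T(?P<tile>\d{2}[A-Z]{3})_"
    r"(?P<discriminator>\d{8}T\d{6})\.SAFE$"
)


def parse_safe_name(name: str) -> dict:
    """Extract mission, level, sensing time, orbit and tile from a SAFE folder name."""
    match = _SAFE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a compact Sentinel-2 SAFE name: {name!r}")
    return {
        "mission": match["mission"],
        "level": match["level"],
        "sensing_start": datetime.strptime(match["sensing_start"], "%Y%m%dT%H%M%S"),
        "processing_baseline": match["baseline"],
        "relative_orbit": int(match["relative_orbit"]),
        "tile": match["tile"],
    }
```

Grouping the parsed records by `tile` and sorting by `sensing_start` gives a simple per-tile time series index of a locally stored archive.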

5.1.12 Sentinel-2 Data

Field Value

Internal Name of the Dataset D14.01, D14.02

Name of the Dataset/API Provider Sentinel-2 Data

Short Description
· Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03). Sentinel-2 archive. European Space Agency (ESA). Global coverage

Extended Description NP has the data for its pilot areas (T1.2.1, T1.4.1, T1.4.2) corresponding to 6 tiles. Thematic Exploitation Platforms, such as the Forestry TEP (C16.10), are available for online analytics.

Rightsholder ESA

Data Model: Standards, Glossaries and metadata standards SENTINEL-SAFE format

Data Volume L1 data: Approximately 6Gb per scene

Update Frequency L1: 10 days revisit time, up to 5 days in Q2 2017

Geographical Coverage
· Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03). Sentinel-2 archive. European Space Agency (ESA). Global coverage.

Timespan L1: June 2015 - now

5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data

5.1.14 Sentinel 3 MWR (Microwave Radiometer) data

5.2 Datasets improved by DataBio

This section presents datasets that are improved by DataBio through processing or other data management mechanisms.

5.2.1 RPAS (Remotely Piloted Aircraft Systems) data

Field Value

Internal Name of the Dataset RPAS

Name of the Dataset/API Provider TRAGSA

Short Description RPAS data, property of TRAGSA, are provided according to the pilot needs. The images acquired are provided in 6 spectral bands: RGB, Red Edge, NIR, Thermal, as well as point-cloud

Extended Description The delivery of RPAS imagery started in October 2017. The areas covered are small parcels (hectares) within pilot areas in the Iberian Peninsula, Spain (Extremadura, Andalucia, Castilla y León, Castilla La Mancha, Madrid). RPAS imagery is stored on TRAGSA premises.

Timespan From 2017

5.2.2 Orthophotos

Field Value

Internal Name of the Dataset Orthophotos

Name of the Dataset/API Provider TRAGSA

Short Description The National Geographic Information Centre of Spain provides a mosaic of the latest orthophotos of the National Plan for Aerial Orthophotography.

Extended Description They are delivered in the ETRS89 (European Terrestrial Reference System 1989) datum for the Iberian Peninsula, Balearic Islands, Ceuta and Melilla, and in WGS84 for the Canary Islands, in UTM projection in the corresponding zone. Each unit (mosaic) covers an MTN50 sheet (National Topographic Map at 1:50 000 scale). All datasets are processed by TRAGSA to produce improved images. Specifically, orthophotos will be transformed by an orthorectification method developed under WP5. Component C11.03 – Radiometric Corrections is a tool that provides colour correction and homogenization of orthophotos from different areas and/or dates. This tool increases the homogeneity of the orthophotos and improves their subsequent usability, for both agrarian and environmental purposes, using automated image analysis processes.

Access Mechanism Orthophotos are provided by the Spanish National Geographic Institute at http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do
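The Extended Description states that the orthophotos use ETRS89 (or WGS84 for the Canary Islands) with a UTM projection "in the corresponding zone". As a minimal sketch, the nominal UTM zone can be derived from a longitude and mapped to an EPSG code. The EPSG ranges used here (25828 to 25838 for ETRS89 / UTM zones 28N to 38N, and 326xx for WGS 84 / UTM northern zones) are assumptions not stated in this document, and actual products may use a single zone for a whole sheet rather than the nominal zone of each point.

```python
def utm_zone(longitude_deg: float) -> int:
    """Nominal UTM zone number (1-60) for a longitude in degrees east."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180] degrees")
    # Zones are 6 degrees wide, starting at 180 degrees west; 180 east falls in zone 60.
    return min(int((longitude_deg + 180.0) // 6) + 1, 60)


def epsg_code(longitude_deg: float, canary_islands: bool = False) -> int:
    """EPSG code of the UTM coordinate reference system for the given longitude.

    Assumption: ETRS89 / UTM for the Peninsula, Balearic Islands, Ceuta and
    Melilla, and WGS 84 / UTM (northern hemisphere) for the Canary Islands.
    """
    zone = utm_zone(longitude_deg)
    if canary_islands:
        return 32600 + zone
    if not 28 <= zone <= 38:
        raise ValueError("ETRS89 / UTM codes are only defined for zones 28 to 38")
    return 25800 + zone
```

For instance, a point near Madrid (longitude about -3.7) falls in zone 30 and maps to EPSG:25830, while a point in the Canary Islands (longitude about -15.4) maps to EPSG:32628.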

5.2.3 gaiasense field (D13.01)

Field Value

Internal Name of the Dataset D13.01

Name of the Dataset/API Provider gaiasense field

Short Description Dataset composed of measurements from NP’s telemetric IoT agro-climate stations called GAIAtrons.

Extended Description Dataset composed of field-sensing measurements from NP’s network of telemetric IoT stations, called GAIAtrons. GAIAtrons offer configurable data collection and transmission rates and come in two variants. The GAIAtron Atmo stations measure atmospheric parameters (e.g. ambient temperature, humidity, wind speed and direction, solar irradiance) whereas the GAIAtron Soil stations measure soil parameters (e.g. multi-depth soil temperature, humidity). The coverage area for each station varies and their spatial distribution is influenced by the microclimatic variability of the monitored area.

Version 1.0

Initial Availability Date Beginning of 2016

Data Type Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.)

Personal Data No personal data is being recorded and/or stored

Rightsholder NP

Dataset/API Owner/Responsible NP

Dataset/API Owner/Responsible Contacts [email protected]

Technology NODEJS, Python, Apache, Linux, MySQL, JSON

Name of the System GAIAtrons (IoT telemetry stations for in-field measurements collection)

GAIABus DataSmart RealTime Subcomponent (for cloud-based monitoring, validation, parsing and cross-checking of the incoming data streams)

Dataset Data Model/API Interface

Data Model: Standards, Glossaries and metadata standards No standards are being used in glossaries and metadata

Data Identifier - Standard used No standards are being used

Data Model - Specific Data Model Custom data model that is designed to optimally address the needs of the offered smart farming applications

Data Volume several GBs/year

Update Frequency The update frequency depends on the velocity of the incoming data streams. GAIAtrons offer configurable data collection and transmission rates, per station and monitored parameter, based on the needs of the application

Data Archiving and preservation Data is preserved in local warehouses

Geographical Coverage Greek Pilot Areas (DataBio Pilots A1.1, B1.2, C1.1, C2.2)

Timespan 2016 until now

Access Level Restricted

Access Mechanism Query

5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02) Anonymised IACS data

Field Value

Internal Name of the Dataset D13.02

Name of the Dataset/API Provider Land use and properties - Greek agriculture pilots

Short Description Dataset comprised of agricultural parcel positions expressed in vectors along with several attributes and extracted multi-temporal vegetation indices associated with them.

Extended Description Dataset comprised of thousands of agricultural parcel positions expressed in vectors along with several attributes including cultivating crop types, variety codes, and description. Further, each object/parcel has been assigned with several extracted statistical descriptors of different vegetation indices such as NDVI, NDWI and SAVI that capture its status in various temporal instances.

Version 1.0

Initial Availability Date Beginning of 2016

Data Type Parcel Geometries (WKT), alphanumeric parcel-related data and metadata (e.g. timestamps)

Personal Data The dataset has been pseudonymized: the most revealing fields within a data record (farmers’ identifiers) have been replaced by artificial identifiers (parcel id). Pseudonymization still allows the data to be traced back to its origin, which is needed because the goal is to provide smart farming services to the farmers. However, after this process personal data can no longer be attributed to a specific data subject without the use of additional information. In line with the GDPR, NP keeps this additional information separately, and technical and organizational measures have been established to ensure that the personal data are not directly attributed to an identified or identifiable natural person.
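The pseudonymization principle described above (replace the identifying field with an artificial identifier and keep the mapping apart from the data) can be sketched as follows. This is an illustrative sketch only and not NP's actual procedure: the field names `farmer_id` and `pseudo_id`, the identifier format and the function name are all hypothetical.

```python
import secrets
from typing import Dict, List, Tuple


def pseudonymize(
    records: List[dict], id_field: str = "farmer_id"
) -> Tuple[List[dict], Dict[str, str]]:
    """Replace the identifying field by an artificial identifier.

    Returns the pseudonymized records and the lookup table. The lookup table
    is the "additional information" and must be stored separately.
    """
    lookup: Dict[str, str] = {}  # real identifier -> artificial identifier
    used = set()
    cleaned_records = []
    for record in records:
        real_id = record[id_field]
        if real_id not in lookup:
            pseudo = "P-" + secrets.token_hex(8)
            while pseudo in used:  # regenerate in the unlikely case of a collision
                pseudo = "P-" + secrets.token_hex(8)
            used.add(pseudo)
            lookup[real_id] = pseudo
        cleaned = {key: value for key, value in record.items() if key != id_field}
        cleaned["pseudo_id"] = lookup[real_id]
        cleaned_records.append(cleaned)
    return cleaned_records, lookup
```

Because the same real identifier always maps to the same artificial one, records belonging to one farmer remain linkable for service delivery, while re-identification requires access to the separately stored lookup table.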

Rightsholder NP

Dataset/API Owner/Responsible NP

Dataset/API Owner/Responsible Contacts [email protected]

Technology PostgreSQL, Python

Name of the System

Dataset Data Model/API Interface

Data Model: Standards, Glossaries and metadata standards No standards are being used in glossaries and metadata

Data Identifier - Standard used No standards are being used

Data Model - Specific Data Model Custom data model that is designed to optimally address the needs of the offered smart farming applications

Data Volume several GBs/year

Update Frequency Periodically. The update frequency depends on the velocity of the incoming EO data streams and the assignment of vegetation indices statistics to each parcel. Currently, new Sentinel-2 products are available every 5 days approximately and the dataset is updated in regular intervals

Data Archiving and preservation

Geographical Coverage Several areas within the Greek territory, including DataBio Pilots A1.1, B1.2, C1.1 and C2.2.

Timespan 2016 until now

Access Level Restricted

Access Mechanism Query

URI
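The vegetation indices named in the Extended Description of this dataset (NDVI, NDWI, SAVI) and the per-parcel statistical descriptors can be illustrated with their textbook definitions. The sketch below is not the pipeline used to build the dataset: the choice of the green/NIR form of NDWI, the soil adjustment factor of 0.5 for SAVI and the set of descriptors are assumptions, and the inputs are assumed to be positive surface reflectances so that no denominator is zero.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index (green/NIR form, an assumption here)."""
    return (green - nir) / (green + nir)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index with soil adjustment factor L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def parcel_descriptors(index_values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of one index over the pixels of one parcel."""
    return {
        "min": min(index_values),
        "max": max(index_values),
        "mean": mean(index_values),
        "std": pstdev(index_values),  # population standard deviation
    }
```

Applying `parcel_descriptors` to the NDVI values of the pixels inside a parcel for each acquisition date gives one row of descriptors per parcel and date, which is the kind of multi-temporal attribute the dataset description refers to.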

5.2.5 Customer and forest estate data (METSAK - D18.02)

Field Value

Internal Name of the Dataset D18.02

Name of the Dataset/API Provider Customer and forest estate data / Metsään.fi

Short Description The forest resource data is connected with the customer and forest estate data of METSAK. The essential part of the Metsään.fi eService use is the information on who owns certain forest estates and who has the rights to read and to use the forest resource data of a certain forest owner. The pilot uses METSAK’s customer information system, which contains all this data.

Version XML-file versions 1.4, 1.5, 1.6, 1.7

Initial Availability Date Year 2012 onwards

Data Type Relational database

Personal Data Private Forest Owners and Forest Service Providers

Rightsholder METSAK

Dataset/API Owner/Responsible Metsään.fi data / Anu Kosunen

Dataset/API Owner/Responsible Contacts XML data / Anu Kosunen/[email protected]

Technology XML writer provides the standardized XML data from the Forest Resource Database

Name of the System Metsään.fi

Dataset Data Model/API Interface Metsään.fi user interface, with Web Service and SOAP interfaces in the background.

Data Model: Standards, Glossaries and metadata standards XML standards https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume 450 GB

Update Frequency Constant updates when needed.

Data Archiving and preservation N/A

Geographical Coverage Finland

Timespan from 2012 onwards

Access Level Registered users: Private Forest Owners and Forest Service Providers

Access Mechanism https://tunnistaminen.suomi.fi

URI https://www.metsaan.fi/

5.3 New Datasets created during DataBio

5.3.1 Canopy height map (FMI - D14.05)

Field Value

Internal Name of the Dataset D14.05

Name of the Dataset/API Provider Canopy height map

Short Description Stand age (growth stages) according to a canopy height model derived from aerial stereo-orthophoto interpretation of the Czech Land Survey (data available countrywide every second year). Spatial resolution 5 m. Four growth stages and the absolute canopy height are distinguished.

Extended Description Canopy height map, 20m resolution, pixel value corresponds to the height of dominant tree species

Initial Availability Date Will be prepared in Q3 2017

Data Type GeoTiff

Rightsholder Property of FMI

Dataset/API Owner/Responsible Raster dataset

Dataset/API Owner/Responsible Contacts [email protected]

Data Volume 4 GB

Update Frequency Fixed

Geographical Coverage Czech Republic

Timespan 2017

5.3.2 Orthophotos - (IGN - D11.02)

Field Value

Internal Name of the Dataset D11.02

Name of the Dataset/API Provider IGN Orthophotos

Short Description Orthophotos provided by Spanish National Geographic Institute.

Extended Description Multiple geographical areas and various times. Orthophotos from PNOA (Spanish coverage). RGB and NIR bands. GSD = 25 cm. RMSE < 0.5 m

Initial Availability Date 2006

Data Type Images (WMS, PNG…)

Personal Data No

Rightsholder IGN. License CC-BY.

Dataset/API Owner/Responsible IGN

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System Sentinel

Dataset Data Model/API Interface REST

Data Volume TB

Update Frequency Yearly

Geographical Coverage Whole Spanish Surface

Timespan 2016 - End of the Project

Access Level Free

URI http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do#selectedSerie

5.3.3 GEOSS sources (D11.03)

Field Value

Internal Name of the Dataset D11.03

Name of the Dataset/API Provider GEOSS Sources

Data Type EO data

Dataset/API Owner/Responsible TRAGSA-TRAGSATEC

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System GEOSS

5.3.4 RPAS data (Tragsa - D11.04)

Field Value

Internal Name of the Dataset D11.04

Name of the Dataset/API Provider RPAS

Short Description RPAS data and Images

Extended Description RGB & Multispectral (6 bands: RGB+Red Edge + NIR) & Thermal & point-cloud. Spatial features TBD according to the pilot needs

Version

Initial Availability Date October 2017

Data Type Images: TIFF and JPEG

Personal Data No

Rightsholder Under agreement. Property of TRAGSA Group

Other Rights Information N/A

Dataset/API Owner/Responsible TRAGSA Group

Dataset/API Owner/Responsible Contacts [email protected]

Dataset Data Model/API Interface No API interface. Files in local folders.

Data Identifier - Standard used TIFF. JPEG.

Data Model - Specific Data Model .TIFF, .JPG, .LAS

Data Volume 60 GB

Update Frequency 1-2 times a year.

Data Archiving and preservation Stored in TRAGSA Premises

Geographical Coverage Small parcels within pilots areas. Hectares.

Timespan Meeting pilot needs.

Access Level Private.

Access Mechanism Under Request.

URI No.

5.3.5 MFE Spanish Forest Map (D11.06)

Field Value

Internal Name of the Dataset D11.06

Name of the Dataset/API Provider MFE50

Short Description Mapa Forestal Español (MFE) - Spanish Forestry Map

Extended Description MAPAMA (Spanish Ministry of Agriculture, Fisheries and Environment)

Initial Availability Date From 1997

Data Type ESRI Shapefile

Personal Data No

Rightsholder Free

Other Rights Information MAPAMA (Spanish Ministry of Agriculture, Fisheries and Environment)

Dataset/API Owner/Responsible TRAGSA-TRAGSATEC

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System MFE (Spanish Forest Map)

Dataset Data Model/API Interface http://www.mapama.gob.es/es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/mfe50_descargas_comunidad_madrid.aspx

Data Model: Standards, Glossaries and metadata standards Cartography, vectors

Data Model - Specific Data Model ESRI Shape File

Data Volume ~Mb

Update Frequency Every 10 years

Geographical Coverage Spain

Timespan From 1997, updated every 10 years

Access Level Open Access, Specific license not defined

5.3.6 Field data - pilot B2 (Tragsa - D11.07)

Field Value

Internal Name of the Dataset D11.07

Name of the Dataset/API Provider Field data - pilot B2

Short Description Data acquired by IoT Sensors. Scientific data from field samples.

Extended Description Direct observations and Direct & Lab measurements: Chlorophyll content, morphology, green & dry weight, hydric potential, Leaf Area Index (LAI), visual classification of damages. Features TBD according to the pilot needs

Rightsholder Under agreement. Property of TRAGSA Group

Dataset/API Owner/Responsible Contacts [email protected]

Data Identifier - Standard used CSV

Data Volume ~Mb

Update Frequency Daily

Data Archiving and preservation TRAGSA Premises

Geographical Coverage Study sites TBD in: Extremadura, Galicia

Timespan Specific dates TBD according to the pilot needs: 2017-2019

Access Level TRAGSA-TRAGSATEC

5.3.7 Forest damage (FMI - D14.07)

Field Value

Internal Name of the Dataset D14.07

Name of the Dataset/API Provider Forest damage

Short Description In-situ observations of forest damage. FMI. Czech Republic. Forestry statistics for selected plots - information about the amount of salvage cutting.

Extended Description Derived from Wuudis mobile application

Initial Availability Date 2017

Data Type Photography, numeric values

Rightsholder FMI

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System

Dataset Data Model/API Interface SQL, REST

Data Model: Standards, Glossaries and metadata standards GeoTiff, CSV

Data Volume Gigabytes

Update Frequency Based on field campaigns

Geographical Coverage Czech Republic

5.3.8 Open Forest Data (METSAK - D18.01)

Field Value

Internal Name of the Dataset D18.01

Name of the Dataset/API Provider Open Forest Data / METSAK

Short Description The pilot uses METSAK’s forest resource data concerning privately owned Finnish forests from METSAK’s forest resource data system. The forest resource data consists of basic data of tree stands (development class, dominant tree species, scanned height, scanned intensity, stand measurement date), strata of tree stands (mean age, basal area, number of stems, mean diameter, mean height, total volume, volume of logwood, volume of pulpwood), growth place data (classification, fertility class, soil type, drainage state, ditching year, accessibility, growth place data source, growth place data measurement date), geometry and compartment numbering. The forest resource data is available in a standard format for external use with consent of a forest owner.

Extended Description The forest resources of a given area are inventoried once per decade using remote sensing (airborne laser scanning) and aerial photographs. The new data is analysed and, in some parts, measured in the field. Other updates to the forest resource data are yearly growth calculations, notifications of forest use or other forestry operations, so-called Kemera financing operations, and any new aerial photographs to be interpreted.

Version OGC GeoPackage with 1.2 RTree XML version 1.7

Initial Availability Date 1.3.2018 (download services), Q2/2018 (APIs)

Data Type Open forest data including forest resource data as well as GIS data

Personal Data N/A

Rightsholder METSAK

Dataset/API Owner/Responsible METSAK Open forest data/ Juha Inkilä

Dataset/API Owner/Responsible Contacts METSAK Open forest data (Avoin metsätieto)/METSAK / [email protected]

Technology WMS, WFS and REST

Name of the System Open forest data (Avoin metsätieto)

Dataset Data Model/API Interface XML standard/REST, OGC GeoPackage standard / WFS, WMS from Oracle database

Data Model: Standards, Glossaries and metadata standards https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML, OGC, WFS, WMS, REST

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume 276.8 GB as of June 2018

Update Frequency Daily

Geographical Coverage Finland

Timespan From 1.3.2018 onwards

Access Level Open

URI https://www.metsaan.fi/rajapinnat
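Since the open forest data is offered in the OGC GeoPackage format, and a GeoPackage is an SQLite database, a downloaded file can be inspected with standard tooling. The following is a minimal sketch in Python, not METSAK's own tooling. It reads the `gpkg_contents` table that the GeoPackage standard defines; the file name in the usage note and any layer names are hypothetical.

```python
import sqlite3


def list_geopackage_layers(conn):
    """Return (table_name, data_type) pairs registered in a GeoPackage.

    `conn` is an open sqlite3 connection to a GeoPackage file. The
    gpkg_contents table is required by the OGC GeoPackage standard.
    """
    rows = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    ).fetchall()
    return [(name, data_type) for name, data_type in rows]
```

For a downloaded file, the function could be called as `list_geopackage_layers(sqlite3.connect("open_forest_data.gpkg"))`, where `open_forest_data.gpkg` is a placeholder file name. Rows with data type `features` are vector layers such as stand geometries.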

5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)

Field Value

Internal Name of the Dataset D44.02

Name of the Dataset/API Provider Hyperspectral image orthomosaic

Short Description Orthorectified hyperspectral mosaic, n-bands, band-matched.

Data Type ENVI / multipage TIF / single-band TIF

5.3.10 Leaf area index (FMI - D14.06)

Field Value

Internal Name of the Dataset D14.06

Name of the Dataset/API Provider Leaf area index

Short Description Leaf area index and canopy closure for selected National forest inventory sites in Czech Republic. Based on interpretation of digital hemispherical photos (in total 2457 images collected for 189 sites). Provided as input hemispherical photos and vector point layer with centroid of forest plot and LAI values in attribute table.

Extended Description In-situ sampling of digital hemispherical photographs (DHP) followed a scheme that takes into account the spatial resolution of the Sentinel-2 satellite (20 m pixel size). The number of photos and their spatial layout were selected according to Majasalmi et al. (2012): a star shape with 13 sampled points along the four principal azimuths (north, south, east and west), with sampled points positioned 3 metres apart. [Figure: Sampling scheme for digital hemispherical photography.] The images were taken with a Nikon D5500 digital SLR camera with a Sigma 4.5 mm circular fisheye lens. The camera was placed on a Vanguard Espod CX203 AP tripod and aligned horizontally with a two-axis level. All photos were shot with the lens facing north and taken as RAW uncompressed images. In total, 189 forest plots were sampled: 79 stands dominated by coniferous trees (42% of the samples) and 110 stands dominated by deciduous trees (58% of the samples). All field plots were visited during the period of maximum foliage, from June to August, in 2016 and 2017. The year 2015 was a test period in which photos were taken only for evergreen coniferous plots, mostly in October.

All DHP photos were analysed in the Hemisfer software (WSL, Switzerland). The software derives LAI values by inversion from the angular distribution of canopy gaps over a statistically representative set of images.
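The star-shaped layout described above can be sketched in Python as follows. The sketch assumes that the 13 points consist of one centre point plus three points on each of the four arms; the text does not state this split, so it is an assumption, as are the function name and the point labels.

```python
def star_sampling_offsets(points_per_arm=3, spacing_m=3.0):
    """Return (label, east_m, north_m) offsets relative to the plot centre.

    Assumption: one centre point plus `points_per_arm` points along each of
    the four principal azimuths, spaced `spacing_m` metres apart.
    """
    # Unit vectors (east, north) for the four principal azimuths.
    directions = {
        "north": (0.0, 1.0),
        "east": (1.0, 0.0),
        "south": (0.0, -1.0),
        "west": (-1.0, 0.0),
    }
    offsets = [("centre", 0.0, 0.0)]
    for name, (ux, uy) in directions.items():
        for k in range(1, points_per_arm + 1):
            distance = k * spacing_m
            offsets.append((f"{name}-{k}", ux * distance, uy * distance))
    return offsets


def share_percent(part, total):
    """Rounded percentage, as used for the conifer/deciduous plot shares."""
    return round(100 * part / total)
```

With the default arguments the function yields 13 points, the outermost lying 9 m from the centre. `share_percent(79, 189)` and `share_percent(110, 189)` reproduce the 42% and 58% shares quoted in the description.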

Version 1.0

Initial Availability Date 1.1.2018

Data Type Image, numeric values

Rightsholder Property of FMI

Dataset/API Owner/Responsible FMI

Dataset/API Owner/Responsible Contacts [email protected]

Technology Digital hemispherical photography

Data Model: Standards, Glossaries and metadata standards GeoTiff, CSV

Data Volume Approx 10 GB

Update Frequency Based on field campaigns, three dedicated field campaigns conducted in 2015, 2016 and 2017

Data Archiving and preservation Local file storage

Geographical Coverage Czech Republic

Timespan 2015-2017

5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)

Field Value

Internal Name of the Dataset D07.02

Name of the Dataset/API Provider NASA CMR Landsat Datasets via FedEO Gateway

Short Description All datasets and collections metadata (including Landsat-8 collections) provided by the NASA Common Metadata Repository (CMR), around 32000 collections, are accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway

Extended Description All dataset and collection metadata (including Landsat-8 collections) provided by the NASA Common Metadata Repository (CMR), around 32000 collections, are accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway (C07.01). The available geographical area and the temporal coverage of the datasets/products are specified in each collection's metadata. For Landsat-8, coverage is global, starting in April 2013. To download Landsat-8 products, an account is needed on the EROS Registration System (ERS) at the following URL: https://ers.cr.usgs.gov/register/. The download URL is included in the catalog search response. Collection metadata and product metadata, including the product download URL, can be accessed via the component C07.05 FedEO Portlet, which acts as a client of the FedEO Gateway (C07.01). The following picture illustrates the retrieval of Landsat-8 datasets through the FedEO Portlet (C07.05).

Initial Availability Date April 2013

Dataset/API Owner/Responsible NASA/USGS - access point via Spacebel/ESA FedEO Gateway to collections including Landsat-8

Dataset/API Owner/Responsible Contacts [email protected], [email protected]

Dataset Data Model/API Interface Mission/collection specific. Product metadata returned is OGC 10-157r4 compliant. Metadata contains download URL. OGC 13-026r8 OpenSearch.

Geographical Coverage Global

Timespan Landsat-8 Starts 2013-04, other collections have other temporal extents which can be found in the metadata and Atom dc:date elements.

Access Mechanism Accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway. To download Landsat-8 products, an account is needed on EROS Registration System (ERS) at the following URL https://ers.cr.usgs.gov/register/. The download URL is included in the catalog search response. Requires having a username and password at Sentinels Scientific Data Hub which is to be used inside the OpenSearch request to the FedEO Gateway (geo.spacebel.be).

5.3.12 Ontology for (Precision) Agriculture (PSNC - D09.01)

Field Value

Internal Name of the Dataset D09.01

Name of the Dataset/API Provider Gateway Ontology for (Precision) Agriculture

Short Description The (FOODIE) ontology enables the representation of data compliant with FOODIE data model in semantic format and their interlinking with established vocabularies and ontologies (e.g., AGROVOC).

Extended Description Thus, in line with the FOODIE data model, different agriculture-related concepts can be described and represented, including agricultural facilities, crop and soil data, treatments, interventions, agricultural machinery, etc. Also in line with the FOODIE data model, the ontology is based on the INSPIRE directive, ISO standards (e.g. 19156, 19157) and OGC standards. The ontology can be used for different semantic tasks, such as data semantization for the transformation of (semi-)structured data (e.g., tabular, relational) to semantic format; ontology-based data access, e.g., accessing relational databases as virtual, read-only RDF graphs; and publication of linked data, including the discovery of links with relevant datasets in the Linked Open Data cloud.

Rightsholder Creative Commons Attribution 3.0

Dataset/API Owner/Responsible Contacts [email protected]

Dataset Data Model/API Interface SPARQL

Data Model: Standards, Glossaries and metadata standards OWL

Data Volume Dataset 100Kb

Geographical Coverage Agnostic

Timespan Agnostic

5.3.13 Open Land Use (Lespro - D02.01)

Field Value

Internal Name of the Dataset D02.01

Name of the Dataset/API Provider Open Land Use

Short Description Open Land Use Map is a composite map intended to provide detailed land-use maps of various regions, based on pan-European datasets such as CORINE Land Cover and Urban Atlas, enriched with available regional data. The dataset is derived from available open data sources at different levels of detail and coverage. These data sources include:

1) Digital cadastral maps, if available

2) Land Parcel Identification System, if available

3) Urban Atlas (European Environment Agency)

4) CORINE Land Cover 2006 (European Environment Agency)

5) OpenStreetMap

The data sources are listed in order of level of detail and, therefore, of priority for data integration.

Extended Description

The Open Land Use (OLU) data model joins the two basic data models of the INSPIRE Land Use specification: existing land use and planned land use. The main difference between the INSPIRE data models and the OLU model is that the OLU data model connects planned and existing land use data. In OLU, different attributes are used for the two types of land use data.

Land use involves the management and modification of the natural environment or wilderness into a built environment such as fields, pastures and settlements. It has also been defined as "the arrangements, activities and inputs people undertake in a certain land cover type to produce, change or maintain it" (FAO, 1997a; FAO/UNEP, 1999). Land use practices vary considerably across the world. The United Nations' Food and Agriculture Organization Water Development Division explains that "Land use concerns the products and/or benefits obtained from use of the land as well as the land management actions (activities) carried out by humans to produce those products and benefits."

The OLU model also follows the INSPIRE land use specification (it uses the same data attributes; the set of attributes used is larger than in the Land Use Database Schema), but it works with a simpler view of the data. The two models can be transformed into each other, and data can also be migrated from these models to or from other datasets that are in harmony with the INSPIRE specification. The main reason for the above-mentioned differences is the different usage of the data and data models: OLU will be used for any land use (and land cover) data, whereas the Land Use Database Schema serves only spatial planning data as a special part of land use data. Several datasets could be used for creating a harmonised land use dataset. Land use data is used in many specialisms, including agriculture, spatial or urban planning, environmental protection, and the maintenance and restoration of environmental functions.

Currently, Open Land Use covers the whole EU with different levels of accuracy:

Europe

The base European dataset is derived from the available data sources that help identify the land use in a particular locality. The sources used so far at the pan-European level are:

1) Urban Atlas

2) CORINE Land Cover 2012

The sources are listed in the order in which they were combined to create the map (1 has the highest geometrical and semantic precedence, and so on).

Czech Republic

The dataset is derived from the available data sources that help identify the land use in a particular locality. The sources used so far are:

1) Digital Cadastre

2) LPIS (Land Parcel Identification System)

3) Urban Atlas

4) CORINE Land Cover

Austria

The dataset is derived from the available data sources that help identify the land use in a particular locality. The sources used so far are:

1) LPIS (Land Parcel Identification System)

2) Urban Atlas

3) CORINE Land Cover

Flanders

The dataset is derived from the available data sources that help identify the land use in a particular locality. The sources used so far are:

1) GRBGis Large Scale Reference Database

2) Urban Atlas

3) CORINE Land Cover
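The precedence rule behind these lists can be illustrated with a small Python sketch. It is not the actual OLU integration procedure, which also has to merge geometries. It only shows how the highest-precedence source available for a locality might be chosen. The dictionary and function names are hypothetical; the source lists are copied from the text above.

```python
# Source lists per region, highest precedence first (taken from the text).
OLU_SOURCE_PRIORITY = {
    "Europe": ["Urban Atlas", "CORINE Land Cover 2012"],
    "Czech Republic": ["Digital Cadastre", "LPIS", "Urban Atlas", "CORINE Land Cover"],
    "Austria": ["LPIS", "Urban Atlas", "CORINE Land Cover"],
    "Flanders": ["GRBGis Large Scale Reference Database", "Urban Atlas", "CORINE Land Cover"],
}


def pick_source(region, available_sources):
    """Return the highest-precedence source that covers a locality.

    `available_sources` is the set of sources that have data for the
    locality; None is returned when none of the listed sources applies.
    """
    for source in OLU_SOURCE_PRIORITY[region]:
        if source in available_sources:
            return source
    return None
```

For a Czech locality covered by both LPIS and Urban Atlas, `pick_source("Czech Republic", {"Urban Atlas", "LPIS"})` returns `"LPIS"`, because LPIS is listed before Urban Atlas.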

Version

Initial Availability Date 2015

Rightsholder Plan4all

Dataset/API Owner/Responsible LESP

Dataset/API Owner/Responsible Contacts [email protected]

Technology GML

Name of the System Open Land Use

Dataset Data Model/API Interface REST, OGC WMS, WFS

Data Model: Standards, Glossaries and metadata standards GML

Data Volume Hundreds of GB

Update Frequency Semi-annually

Geographical Coverage Europe

Timespan 2015 - present

URI Open Land Use is available on http://sdi4apps.eu/open_land_use/

5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)

Field Value

Internal Name of the Dataset DS40.01

Name of the Dataset/API Provider Phenomics, metabolomics, genomics and environmental datasets

Short Description This dataset includes phenomics, metabolomics, genomics and environmental data. Genomic prediction and selection data are also included.

Data Type Raw text, CSV data

Dataset/API Owner/Responsible Contacts [email protected], [email protected]

Data Volume 1-12 MB

Geographical Coverage Regions of Thessalia

5.3.15 Quality control data (METSAK - D18.04)

Field Value

Internal Name of the Dataset D18.04

Name of the Dataset/API Provider Quality control data

Short Description The quality control data consists of forest estate, number of the financing conclusion, geometry of compartments, type of the forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, measurement date and user information. The quality control data will be added to the existing forest data standard during 2017.

Extended Description The quality control data on work done in the forest is part of the Best Practice Guidelines for Forest Management. The data is already being collected and saved in METSAK’s information systems, but the amount of data needs to be increased. The data is also planned to be collected through a mobile application. This pilot is about presenting the quality control data in the Metsään.fi eService for forest owners and forestry operators, and supporting the requirement specification of a new mobile application and its interfaces. In Metsään.fi, forest owners should be able to follow the quality of work done in their forests and compare it to the national average. Forestry operators have access in Metsään.fi to the quality data on their own forest work and can also compare it to the national average.

Version v1.0.0

Initial Availability Date Q3/2018

Data Type Quality control data for young stand improvement and tending of seedling stands.

Personal Data End user information and personal data.

Rightsholder METSAK

Dataset/API Owner/Responsible Mobile app. dataset owner MHGS/ Seppo Huurinainen

Dataset/API Owner/Responsible Contacts METSAK forest resource database (KantoRiihi) / Aki Hostikka / [email protected]

Technology Mobile app. in JSON, Quality control data in XML, SOAP, REST

Name of the System Laatumetsä mobile app., METSAK Forest Resource DataBase (KantoRiihi)

Dataset Data Model/API Interface Laatumetsä Mobile app. user interface, REST SOAP METSAK Forest Resource Database (KantoRiihi)

Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML - https://www.metsatietostandardit.fi/en/

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume Expected to be 200 GB together with Storm and Forest Damages dataset

Update Frequency Online

Data Archiving and preservation METSAK Forest Resource Database (KantoRiihi)

Geographical Coverage Finland

Timespan From Q3/2018 onwards

Access Level Available for registered users.

Access Mechanism https://tunnistaminen.suomi.fi

URI https://www.wuudis.com/fi/laatumetsa/

5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)

Field Value

Internal Name of the Dataset D07.01

Name of the Dataset/API Provider Sentinels Scientific Hub Datasets via FedEO Gateway

Short Description Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01), which returns Sentinel collection and dataset metadata (including the product download URL) via an OGC 13-026r8 OpenSearch interface. Geographical coverage is global, and temporal coverage starts in April 2014 for Sentinel-1 and in June 2015 for Sentinel-2. Access to the dataset metadata and the products requires an account (user/password), which can be obtained at https://scihub.copernicus.eu/dhus/#/self-registration. Sentinel products and metadata can also be accessed via the user interface of the FedEO Portlet (C07.05).

Extended Description All datasets (collections) available through the Sentinels Scientific Hub are accessible through standard protocols via the Spacebel component C07.01 FedEO Gateway. These collections include: Sentinel-1, Sentinel-2, … Detailed collection information is published by ESA/Spacebel in the FedEO Collection Catalog and can be made available in various metadata flavours including ISO19139, ISO19139-2, ISO MENDS, DIF-10 or visualised on a user interface. Examples are shown below. http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:SENTINEL_1, http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:S2MSI1C

Dataset/API Owner/Responsible Spacebel

Dataset/API Owner/Responsible Contacts [email protected], [email protected]

Dataset Data Model/API Interface OGC 13-026r8 OpenSearch.

Geographical Coverage Global

Timespan Sentinel-1 Starts 2014-04 (see below), Sentinel-2 starts 2015-06 (See below), other collections have other temporal extents.

Access Mechanism Requires having a user name and password at Sentinels Scientific Data Hub which is to be used inside the OpenSearch request to the FedEO Gateway (geo.spacebel.be).
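The collection requests quoted in the Extended Description can be assembled programmatically. The following Python sketch only builds the request URL and sends nothing over the network. The base URL and the `uid` parameter come from the examples in this table; the function name is hypothetical, and the names of any further search parameters (and the way credentials are passed) are not given here and would have to be taken from the gateway's OpenSearch description document.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example requests in this section.
FEDEO_REQUEST_URL = "http://geo.spacebel.be/opensearch/request"


def build_collection_request(uid, extra_params=None):
    """Build a FedEO collection metadata request URL for a collection uid.

    `extra_params` is an optional dict of additional query parameters; their
    names are assumptions to be checked against the service description.
    """
    params = {"uid": uid}
    if extra_params:
        params.update(extra_params)
    # Keep ':' unescaped so the URL matches the form used in the examples.
    return FEDEO_REQUEST_URL + "?" + urlencode(params, safe=":")
```

Calling `build_collection_request("EOP:ESA:S2MSI1C")` reproduces the Sentinel-2 example URL given in the Extended Description.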

5.3.17 SigPAC (Tragsa - D11.05)

The CAP Information System is a land parcel identification system. It is provided by the Junta de Castilla y León (regional government).

Field Value

Internal Name of the Dataset D11.05

Name of the Dataset/API Provider SigPAC

Short Description LPIS - Land parcel identification system.

Extended Description A land-parcel identification system (LPIS) is a system to identify land use for a given country. It utilises orthophotos – basically aerial photographs and high precision satellite images that are digitally rendered to extract as much meaningful spatial information as possible. A unique number is given to each land parcel to provide a unique identification in space and time. This information is updated regularly to monitor the evolution of the land cover and the management of the crops.

Initial Availability Date Starting date of the project

Data Type ESRI Shape, SQLITE Databases

Personal Data No

Rightsholder FEGA - CAP payment Agency in Spain

Dataset/API Owner/Responsible Contacts www.mapama.gob.es

Data Model: Standards, Glossaries and metadata standards More information at: https://ec.europa.eu/jrc/en/research-topic/agricultural-monitoring

Data Model - Specific Data Model There are some commonalities among the European countries, but the LPIS model is different in each member state.

Data Volume Less than 1 GB

Update Frequency Yearly

Geographical Coverage Spain

Access Level Free in some regions. Private in others.

5.3.18 Smart POI dataset (Lespro - D02.01)

Field Value

Internal Name of the Dataset D02.01

Name of the Dataset/API Provider Smart POI dataset

Extended Description The Smart Points of Interest (SPOI) dataset is a seamless and open resource of POIs that is available for all users to download, search or reuse in applications and services. SPOI’s principal target is to provide information as Linked Data together with other datasets containing the road network. The added value of the Smart approach in comparison with similar solutions lies in the implementation of linked data, the use of standardized and respected datatype properties, and the development of a completely harmonized dataset with a uniform data model and common classification. The SPOI dataset is created as a combination of global data (selected points from OpenStreetMap) and local data provided by the SDI4Apps partners or available on the web. The dataset can be reached via a SPARQL endpoint (http://data.plan4all.eu/sparql); for detailed information, see http://sdi4apps.eu/spoi.

Rightsholder It is available under Open Data Commons Open Database License (ODbL ~ http://opendatacommons.org/licenses/odbl/)
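As an illustration of how the SPARQL endpoint named in the Extended Description might be addressed, the following Python sketch builds a query URL following the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter of a GET request. It sends nothing over the network. The sample query is deliberately generic and makes no assumption about the SPOI vocabulary; the function and constant names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as given in the Extended Description.
SPOI_ENDPOINT = "http://data.plan4all.eu/sparql"

# Generic query returning a handful of arbitrary triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_get_url(query, endpoint=SPOI_ENDPOINT):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL from `build_sparql_get_url(SAMPLE_QUERY)` can be opened with any HTTP client. Which result formats the endpoint offers is not stated in this document and would need to be checked against the service.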

5.3.19 Stand age map (FMI - D14.04)

Field Value

Internal Name of the Dataset D14.04

Name of the Dataset/API Provider Stand age map

Short Description Vector layer based on Czech forest management plans and stand age based on detailed forest inventory. It is countrywide with 10 years update interval.

5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)

Field Value

Internal Name of the Dataset D18.03a

Name of the Dataset/API Provider Storm and forest damage observations and possible risk areas

Short Description One of the new datasets in this pilot consists of storm and forest damage observations, which are planned to be crowdsourced. The storm damage observations consist of location, type of damage, evaluation of the extent of the damage, tree species and distance from the road. The storm and forest damage data supplements the forest resource data. Possible storm and forest damage areas are evaluated based on the damage observations collected. The possible risk areas are presented to the users on a map layer.

Extended Description This information is currently gathered by METSAK through the forest use declaration process. To improve the overall management of storm damage and to prevent possible further damage, it is extremely important to get the field data and information as soon as possible. One way to gather this type of information is to provide a mobile app that allows anyone to report their observations to the forestry experts at the Finnish Forest Centre. Based on the crowdsourced information, forestry experts are able to react faster than before, which can prevent further damage, for instance damage caused by pest attacks. The damaged wood material could also be routed faster to the most suitable place for further processing.

Version v1.0.0

Initial Availability Date Q4/2018

Data Type XML

Personal Data No personal data gathered

Rightsholder METSAK

Other Rights Information MHGS provides the mobile app for data collection

Dataset/API Owner/Responsible Mobile app dataset owner: MHGS / Seppo Huurinainen; METSAK / Virpi Stenman

Dataset/API Owner/Responsible Contacts METSAK forest damages database / Mikko Kesälä / [email protected]

Technology Mobile app. in JSON, Storm and forest damages data in XML, SOAP, REST

Name of the System Laatumetsä mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)

Dataset Data Model/API Interface WMS-maps; XML standardization is ongoing. METSAK user interface. Laatumetsä mobile app user interface, REST, SOAP

Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML, OGC and WMS-maps XML - https://www.metsatietostandardit.fi/en/

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/


Data Volume Expected to be 200 GB together with Quality Control dataset (Laatumetsä)

Update Frequency Online

Data Archiving and preservation The data is stored and backed up in the METSAK map service database

Geographical Coverage Finland

Timespan Q4/2018 onwards

Access Level open

Access Mechanism open

URI Mobile app: https://www.wuudis.com/fi/laatumetsa/

METSAK map service: https://metsakeskus.maps.arcgis.com/home/index.html
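The observation fields listed in the Short Description above (location, damage type, extent, tree species, distance from the road) could be modelled as a small record before being sent as JSON. The following is a minimal sketch only: the class name, the field names and the example values are hypothetical and are not taken from the Laatumetsä app or the METSAK schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StormDamageObservation:
    """One crowdsourced observation; all field names are hypothetical."""
    latitude: float              # WGS84 degrees
    longitude: float             # WGS84 degrees
    damage_type: str             # e.g. "windthrow"
    extent_estimate: str         # reporter's evaluation of the extent
    tree_species: str
    distance_from_road_m: float

    def validate(self) -> None:
        # Reject coordinates outside the valid WGS84 range.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.distance_from_road_m < 0:
            raise ValueError("distance from road must be non-negative")


def to_json(obs: StormDamageObservation) -> str:
    """Validate an observation and serialize it to a JSON string."""
    obs.validate()
    return json.dumps(asdict(obs), sort_keys=True)


# Illustrative values only.
example = StormDamageObservation(61.5, 23.8, "windthrow", "moderate",
                                 "Norway spruce", 120.0)
print(to_json(example))
```

Validating before serialization keeps obviously malformed reports out of the collected data; a real app would likely enforce further rules that are not described in this document.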

5.3.21 Forest road condition observations (METSAK - D18.03b)

Field Value

Internal Name of the Dataset D18.03b

Name of the Dataset/API Provider Forest road condition observations / Roads.ML


Short Description This pilot introduces forest road condition observations as a new dataset, planned to be crowdsourced. Each observation consists of the location, the type of the road based on the Digiroad map, an evaluation of the condition of the road, possible limitations or obstacles on the road, and the forest development classes for the road surroundings. The road and forest felling potential data supplements the open forest resource data. In the future, priorities for road improvement activities might be evaluated based on the collected road condition observations. Both the observed condition of the road and the related felling potential are presented to the users on a map layer, which is openly available.

Extended Description This information is not currently gathered by METSAK. Increasing the knowledge of the current condition and availability of the road network is extremely important for the logistics chain of the forest industry and for ensuring the wood supply. Crowdsourcing through a mobile app can be used to collect field data and information as soon as possible. Based on the crowdsourced information, forestry experts within the forest industry sector are able to react faster than before, which can prevent disruptions in the wood supply chain.

Version v1.0.0

Initial Availability Date Q4/2018

Data Type WMS


Personal Data No personal data gathered

Rightsholder METSAK

Other Rights Information Roads.ML provides the mobile app for data collection

Dataset/API Owner/Responsible Mobile app dataset owner: Roads.ml / Jussi-Pekka Martikainen; METSAK map service / Mikko Kesälä

Dataset/API Owner/Responsible Contacts METSAK forest road map / Mikko Kesälä / [email protected]

Technology Mobile app data in Postgres; forest road data provided through a GIS interface

Name of the System Roads.ml mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)

Dataset Data Model/API Interface WMS-maps. METSAK user interface for WMS map. Roads.ml mobile app user interface, REST

Data Model: Standards, Glossaries and metadata standards REST, Postgres


Data Identifier - Standard used OGC and WMS-maps

Data Model - Specific Data Model Based on OGC standard

Data Volume Expected to be around 20 GB

Update Frequency Online

Data Archiving and preservation Postgres database

Geographical Coverage Finland

Timespan Q4/2018 onwards

Access Level open

Access Mechanism open

URI Mobile app: www.roads.ml

METSAK map service: https://metsakeskus.maps.arcgis.com/home/index.html
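Since the road condition layer is described as a WMS map, a client could request a rendered image with a standard OGC WMS 1.3.0 GetMap call. The sketch below only assembles such a request URL. The endpoint and layer name are placeholders, because the actual WMS address and layer identifiers of the METSAK service are not given in this document; the bounding box is a rough, illustrative extent for Finland.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(endpoint, layer, bbox, width=800, height=800):
    """Assemble a WMS 1.3.0 GetMap URL.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude-first axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value selects the default style
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and layer name (assumptions, not real identifiers).
url = build_getmap_url("https://example.invalid/wms", "forest_roads",
                       (59.5, 19.0, 70.1, 31.6))
print(url)
```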

5.3.22 Tree species map (FMI - D14.03)

Field Value

Internal Name of the Dataset D14.03


Name of the Dataset/API Provider Tree species map

Short Description Tree species map. Raster dataset based on the classification of multi-temporal Sentinel-2 data and the National Forest Inventory of the Czech Republic. It has a 20 m spatial resolution and distinguishes the six most abundant tree species in the Czech Republic.

Data Type Raster dataset

Rightsholder Property of FMI

Dataset/API Owner/Responsible Contacts lukes.petr@@uhul.sz

Data Model: Standards, Glossaries and metadata standards GeoTIFF

Data Volume 1 Gb

Update Frequency Fixed

Geographical Coverage Czech Republic

Timespan 2017
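Because the raster has a 20 m spatial resolution, each pixel covers 20 m × 20 m = 400 m², or 0.04 ha, so per-species areas follow directly from pixel counts. A minimal sketch of that arithmetic is shown below; the species names and pixel counts are made-up illustration values, not figures from the dataset.

```python
def pixels_to_hectares(pixel_count, resolution_m=20.0):
    """Convert a pixel count to hectares for square pixels (1 ha = 10,000 m^2)."""
    return pixel_count * resolution_m * resolution_m / 10_000.0


def area_by_species(pixel_counts, resolution_m=20.0):
    """Map each species name to its mapped area in hectares."""
    return {name: pixels_to_hectares(count, resolution_m)
            for name, count in pixel_counts.items()}


# Hypothetical per-class pixel counts, for illustration only.
print(area_by_species({"spruce": 250, "beech": 50}))
```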


5.3.23 Wuudis data (MHGS - D20.01)

Wuudis uses the Finnish forest information standard as its basic data import/export format, and the Wuudis service data model is based on the same standard. All development activities during the DataBio project that affect the Wuudis data model are based on the Finnish forest information standard. The standard includes a set of standardized schemas (such as timber sales and logistics). Some of these schemas can be used in DataBio, and some new specifications are developed during the project.

Basic information about the forest information standard: http://www.metsatietostandardit.fi/en . The base forest information standard XML schema description can be found here: https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html . This schema includes basic forest property data, stands, operations and tree strata. Everything is based on this basic real estate information. The whole schema repository can be found here: https://www.bitcomp.fi/metsatietostandardit/

Wuudis also has an open REST API that uses plain JSON, which is faster than standard-based XML data transfer. With the JSON interface, different kinds of query parameters can also be used and data can be fetched in parts (such as a single stand or operation). All available resources are listed in the WADL documentation: https://wuudis.com/api/application.wadl

Map layers are another important dataset for Wuudis. Wuudis uses global map services such as Google and Microsoft (Bing) to provide worldwide satellite map layers to end users. Wuudis also provides map layers from the National Land Survey of Finland’s WMS/WMTS service. More information about the National Land Survey of Finland map services can be found here: http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps .
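A WADL document such as the one referenced above for the Wuudis REST API can be inspected programmatically to list the available resources and HTTP methods. The sketch below parses a WADL string with the Python standard library. The embedded sample is entirely hypothetical (the paths `stands` and `{id}` are invented for illustration) and does not reproduce the actual Wuudis API; only the WADL namespace is the standard one.

```python
import xml.etree.ElementTree as ET

WADL_NS = {"w": "http://wadl.dev.java.net/2009/02"}

# Hypothetical WADL fragment, NOT the real Wuudis API description.
SAMPLE = """<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://example.invalid/api/">
    <resource path="stands">
      <method name="GET"/>
      <resource path="{id}">
        <method name="GET"/>
      </resource>
    </resource>
  </resources>
</application>"""


def list_resources(wadl_xml):
    """Return (HTTP method, resource path) pairs, including nested resources."""
    root = ET.fromstring(wadl_xml)
    found = []

    def walk(node, prefix):
        for res in node.findall("w:resource", WADL_NS):
            path = prefix.rstrip("/") + "/" + res.get("path", "").strip("/")
            for method in res.findall("w:method", WADL_NS):
                found.append((method.get("name"), path))
            walk(res, path)  # descend into child resources

    for resources in root.findall("w:resources", WADL_NS):
        walk(resources, "")
    return found


print(list_resources(SAMPLE))
```

In practice the WADL text would first be downloaded from the documented URL; that step is omitted here so the sketch stays independent of network access.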

Field Value

Internal Name of the Dataset D20.01

Name of the Dataset/API Wuudis data Provider

Short Description Wuudis uses the Finnish forest information standard as its basic data import/export format. The Wuudis service data model is based on the Finnish forest information standard. All development activities during the DataBio project that affect the Wuudis data model are based on the Finnish forest information standard.

Extended Description The forest information standard includes a set of standardized schemas (such as timber sales and logistics). Some of these schemas can be used in DataBio, and some new specifications are developed during the project. Basic information about the forest information standard: http://www.metsatietostandardit.fi/en . The base forest information standard XML schema description can be found here: https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html . This schema includes basic forest property data, stands, operations and tree strata. Everything is based on this basic real estate information. The whole schema repository can be found here: https://www.bitcomp.fi/metsatietostandardit/ Wuudis also has an open REST API that uses plain JSON, which is faster than standard-based XML data transfer. With the JSON interface, different kinds of query parameters can also be used and data can be fetched in parts (such as a single stand or operation). All available resources are listed in the WADL documentation: https://wuudis.com/api/application.wadl Map layers are another important dataset for Wuudis. Wuudis uses global map services such as Google and Microsoft (Bing) to provide worldwide satellite map layers to end users. Wuudis also provides map layers from the National Land Survey of Finland’s WMS/WMTS service. More information about the National Land Survey of Finland map services can be found here: http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps .

5.4 Recommended interaction structures: ATOS

As presented in the previous sections of this document, each of DataBio’s pilots requires a heterogeneous set of datasets that are made available in different remote systems, formats and encodings, as well as at different spatial and temporal resolutions.

This section describes, by way of example, how some of the most commonly used datasets in the pilots are managed and used by DataBio’s components in an interoperable manner through standardized interfaces and protocols (APIs).

Dataset name: DATASET NAME

Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices

Component: C05.01 Rasdaman

API/Operation: OGC WCS - GetCoverage

Example: Retrieve a subset area, encoded as GML, from the variable [variable name] covering the whole Indian Ocean for a specific date.

Request:

http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=mlotst&SUBSET=Lat(13.41,14.82)&SUBSET=Long(76.67,78.14)&SUBSET=ansi(%222018-06-26T00:00:00.000Z%22,%222018-06-26T00:00:00.000Z%22)&FORMAT=application/gml+xml

Response:

Envelope lower corner: 13.35467349552 76.618871415346 "2018-06-26T00:00:00.000Z"
Envelope upper corner: 14.8527528809232 78.2007400554937 "2018-06-26T00:00:00.000Z"
Grid limits: low 182 620 21, high 199 638 21
Axis labels: Lat Long ansi
Origin: 14.811139564662 76.66049953745515 "2018-06-26T00:00:00.000Z"
Offset vectors: -0.0832266325224 0 0 (Lat, Linear); 0 0.0832562442183 0 (Long, Linear); 0 0 1 (ansi, Linear, coefficient "2018-06-26T00:00:00.000Z")
Tuple list: -32767,-32767,-32767,...,-32767 (every cell of the returned subset holds -32767)
Sequence rule: Linear, start point 182 620 21
Range type: Gray, nil value -32767
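The GetCoverage request shown above can be assembled from its parts instead of being written by hand, which avoids mistakes in the percent-encoding of the quoted timestamps. The following is a minimal sketch: the function name and its parameters are illustrative, and only the endpoint, coverage name and subset values are taken from the example request.

```python
from urllib.parse import quote


def build_getcoverage_url(endpoint, coverage_id, lat, lon, timestamp,
                          fmt="application/gml+xml"):
    """Assemble a WCS 2.0.1 GetCoverage URL with Lat/Long/ansi subsets.

    lat and lon are (min, max) pairs; timestamp selects a single time slice.
    """
    # The time subset is a quoted string, so the quotes must be encoded as %22.
    t = quote('"%s"' % timestamp, safe=":")
    parts = [
        "SERVICE=WCS",
        "VERSION=2.0.1",
        "REQUEST=GetCoverage",
        "COVERAGEID=%s" % coverage_id,
        "SUBSET=Lat(%s,%s)" % (lat[0], lat[1]),
        "SUBSET=Long(%s,%s)" % (lon[0], lon[1]),
        "SUBSET=ansi(%s,%s)" % (t, t),
        "FORMAT=%s" % fmt,
    ]
    return endpoint + "?" + "&".join(parts)


url = build_getcoverage_url("http://150.254.165.231:8080/rasdaman/ows",
                            "mlotst", (13.41, 14.82), (76.67, 78.14),
                            "2018-06-26T00:00:00.000Z")
print(url)
```

The repeated SUBSET key is why the query string is joined manually here rather than built from a dictionary.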

Dataset name: DATASET NAME

Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices

Component: C05.01 Rasdaman

API/Operation: OGC WCPS - ProcessCoverages

Example: Calculate the mean value of the variable “mlotst” for the whole Indian Ocean over all time periods and return it as text.


Request:

http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query=for $s in ( mlotst ) return encode( avg($s), "text/csv" )

Response:

38.206644819155315
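The WCPS query in the request above contains spaces, quotes and a dollar sign, so it generally needs to be percent-encoded before it is sent over HTTP. A minimal sketch of building such a ProcessCoverages URL follows; the helper name is illustrative, and the endpoint and query text are those of the example.

```python
from urllib.parse import quote, unquote


def build_wcps_url(endpoint, query):
    """Build a ProcessCoverages request with the WCPS query percent-encoded."""
    return (endpoint
            + "?SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query="
            + quote(query, safe=""))


query = 'for $s in ( mlotst ) return encode( avg($s), "text/csv" )'
url = build_wcps_url("http://150.254.165.231:8080/rasdaman/ows", query)
print(url)
```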

Dataset name: DATASET NAME

Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices

Component: C05.01 Rasdaman

API/Operation: OGC WCPS - ProcessCoverages

Example: Produce a colorized map (in PNG format) of the whole Indian Ocean area depending on the values of the “mlotst” variable for a specific time period.

Request:

http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query=for $c in ( mlotst ) return encode(switch case $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] = 99999 return {red: 255; green: 255; blue: 255} case 18 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 0; green: 0; blue: 255} case 23 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 255; blue: 0} case 30 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0} , "image/png")


Response: [PNG image of the colorized map]
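The colour-classification query in the request above repeats the same subset expression in every case, so it can be convenient to generate it from a small table of thresholds and colours. The sketch below does this with plain string formatting. The function and parameter names are illustrative; the thresholds, colours and subset are the ones used in the example request.

```python
def rgb(red, green, blue):
    """Format a colour as a WCPS RGB struct literal."""
    return "{red: %d; green: %d; blue: %d}" % (red, green, blue)


def build_colour_query(coverage, subset, nodata, nodata_colour,
                       breaks, default_colour):
    """Build a WCPS switch/case query.

    breaks is a list of (upper_bound, colour); cells below upper_bound
    get that colour, evaluated in order.
    """
    cell = "$c[%s]" % subset
    cases = ["case %s = %s return %s" % (cell, nodata, rgb(*nodata_colour))]
    for upper, colour in breaks:
        cases.append("case %s > %s return %s" % (upper, cell, rgb(*colour)))
    return ('for $c in ( %s ) return encode(switch %s default return %s , "image/png")'
            % (coverage, " ".join(cases), rgb(*default_colour)))


QUERY = build_colour_query(
    "mlotst",
    'ansi("2018-05-30"), Lat(-35:30), Long(25:115)',
    99999, (255, 255, 255),
    [(18, (0, 0, 255)), (23, (255, 255, 0)), (30, (255, 140, 0))],
    (255, 0, 0),
)
print(QUERY)
```

Keeping the thresholds in one list makes it easier to adjust the class boundaries without editing four copies of the subset expression.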

Dataset name: DATASET NAME

Pilot: Pilot 1.3.1.B1.1: Cereals and biomass crop

Component: C05.02 FIWARE IoT Hub

API/Operation: CRUD Operations under RESTful API

Example: Device Registration:


End-point URL: http://{$host_url:$host_port}/iot/devices

Payload example in JSON format:

{
  "devices": [
    {
      "device_id": "raspberryPI1",
      "entity_name": "Field1",
      "entity_type": "Field",
      "protocol": "MQTT",
      "timezone": "Europe/Madrid",
      "attributes": [
        { "name": "leaf_condensation", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "temperature", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "soil_humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "Device", "type": "string" }
      ],
      "commands": [
        { "name": "ping", "type": "command" }
      ]
    }
  ]
}

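The device registration payload above can be generated from a list of sensor names and wrapped in an HTTP POST request. The sketch below only builds the request object and does not send it. The base URL and port are placeholders, and the `Fiware-Service` and `Fiware-ServicePath` headers are an assumption based on common FIWARE IoT Agent usage (with the values "DataBio" and "/Tragsa" borrowed from the data handling example); the document itself does not specify any headers.

```python
import json
import urllib.request


def sensor_attribute(name):
    """Attribute entry for a numeric sensor reading with a 'units' metadata slot."""
    return {"name": name, "type": "double",
            "metadatas": [{"name": "units", "type": "string"}]}


def build_registration_request(base_url, device_id, entity_name, sensors,
                               service="DataBio", service_path="/Tragsa"):
    """Build (but do not send) a POST request registering one device."""
    device = {
        "device_id": device_id,
        "entity_name": entity_name,
        "entity_type": "Field",
        "protocol": "MQTT",
        "timezone": "Europe/Madrid",
        "attributes": [sensor_attribute(s) for s in sensors]
                      + [{"name": "Device", "type": "string"}],
        "commands": [{"name": "ping", "type": "command"}],
    }
    body = json.dumps({"devices": [device]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Fiware-Service": service,          # assumed header, see lead-in
        "Fiware-ServicePath": service_path,  # assumed header, see lead-in
    }
    return urllib.request.Request(base_url.rstrip("/") + "/iot/devices",
                                  data=body, headers=headers, method="POST")


# Placeholder host and port; the request is constructed but not sent.
req = build_registration_request(
    "http://localhost:4041", "raspberryPI1", "Field1",
    ["leaf_condensation", "temperature", "humidity", "soil_humidity"])
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`, which is left out so the sketch has no network dependency.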
Data Handling Management:

End-point URL: http://{$host_url:$host_port}/v1/admin/config

Payload example in JSON format:

{
  "service": "DataBio",
  "servicePath": "/Tragsa",
  "host": "http://localhost:8080",
  "in": [
    {
      "id": "Field1",
      "type": "Field",
      "providers": [ "http://localhost:8081" ],
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ]
    }
  ],
  "out": [
    {
      "id": "DataBioEvent1",
      "type": "DataBioEvent",
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ],
      "brokers": [
        { "url": "http://localhost:1026", "serviceName": "DataBio", "servicePath": "/Tragsa" }
      ]
    }
  ],
  "statements": [
    "INSERT INTO DataBioEvent SELECT leaf_condensation, temperature, humidity, soil_humidity, Device FROM Field Where leaf_condensation < 90 AND temperature > 15 AND 20 < humidity < 90 AND 0 < soil_humidity > 50"
  ]
}

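In the data handling configuration above, the input entity, the output event and the SELECT list all repeat the same attribute names. A minimal sketch of generating the payload from a single attribute list is given below, so the three places cannot drift apart. The function names are illustrative, and the filter condition used in the usage example is a simplified placeholder rather than the condition from the document.

```python
import json

ATTRIBUTES = [
    ("leaf_condensation", "double"),
    ("temperature", "double"),
    ("humidity", "double"),
    ("soil_humidity", "double"),
    ("Device", "string"),
]


def select_statement(attributes, condition):
    """Build the INSERT ... SELECT statement from the attribute names."""
    columns = ", ".join(name for name, _ in attributes)
    return ("INSERT INTO DataBioEvent SELECT %s FROM Field WHERE %s"
            % (columns, condition))


def build_config(attributes, condition):
    """Build the configuration payload; host and broker URLs follow the example."""
    attrs = [{"name": n, "type": t} for n, t in attributes]
    return {
        "service": "DataBio",
        "servicePath": "/Tragsa",
        "host": "http://localhost:8080",
        "in": [{"id": "Field1", "type": "Field",
                "providers": ["http://localhost:8081"],
                "attributes": attrs}],
        "out": [{"id": "DataBioEvent1", "type": "DataBioEvent",
                 "attributes": [dict(a) for a in attrs],
                 "brokers": [{"url": "http://localhost:1026",
                              "serviceName": "DataBio",
                              "servicePath": "/Tragsa"}]}],
        "statements": [select_statement(attributes, condition)],
    }


# Placeholder condition, simpler than the one in the document's example.
config = build_config(ATTRIBUTES, "temperature > 15")
print(json.dumps(config, indent=2))
```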

Concluding remarks

The DataBio project is an EU lighthouse project with twenty-six pilots running at over a hundred piloting sites across Europe in the three main bioeconomy sectors: agriculture, forestry and fishery. These sectors utilize, process and produce many datasets and datastreams that create value for both businesses and governments. This deliverable provides an overview of datasets in the context of the DataBio platform and pilots, allowing the reader to gain insight into why the data is needed, what the data provides and how it can be retrieved. The requirements from the pilots and the platform identify the datasets that are needed for the pilot applications. The ArchiMate models provide trace links to the relevant components, requirements and application goals, allowing users to carry out coverage and orphan analysis as well as traditional trace navigation.

The overview of datasets shows that DataBio pilots currently utilize 14 existing datasets, improve 6 datasets by processing them or enriching them with other data points, and are creating a total of 23 datasets. Each dataset is described with metadata in the DataBioHub. The numbers are expected to grow during the project’s lifetime.

The first phase of the DataBio project has focused on the usage and creation of datasets based on the needs and requirements of the DataBio pilots. The next phase will continue this work, with an increased focus on the interoperability of datasets through the use of ontologies, potential standard data models, and standard access mechanisms, services and APIs. There will also be an increased focus on secure data sharing and data exchange beyond the individual pilots, to support a growing data economy in the DataBio areas of agriculture, forestry and fishery.


References Reference Name of document (include authors, version, date etc. where applicable) [REF-01] European Commission, 2018: https://eur-lex.europa.eu/legal- content/EN/ALL/?uri=COM:2018:0232:FIN

[REF-02] European Open Data Portal: https://data.europa.eu/euodp/data/

[REF-03] European Commission, January 2017: (https://ec.europa.eu/digital-single- market/en/policies/building-european-data-economy

[REF-04] Transforming Transport (web): https://data.transformingtransport.eu/

[REF-05] Dunning, A.(2017). ‘Are FAIR data principles FAIR?’ LIBER Webinar. http://www.ijdc.net/article/view/567. Retrieved 2018-08-21

[REF-06] Press, G. (2016). ‘Cleaning Big Data: most time-consuming, least enjoyable data science task, survey says’, Forbes [Internet]. https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most- time-consuming-least-enjoyable-data-science-task-survey- says/#3cfa77426f63. Retrieved 2018-08-21.

[REF-07] Moons, B. et al. (2016). Realising the European Open Science Cloud. https://ec.europa.eu/research/openscience/pdf/realising_the_european_ope n_science_cloud_2016.pdf. Retrieved 2018-08-21

[REF-08] Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Nature Scientific Data, 3, 2016. doi:10.1038/sdata.2016.18.

[REF-09] FORCE 11 (2014) https://www.force11.org/fairprinciples, Retrieved 2018-08- 21.

[REF-10] European Commission (2016): H2020 Programme Guidelines on FAIR Data Management in Horizon 2020. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/ oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 2018-08-21. [REF-11] DataBioHub: https://www.databiohub.eu

[REF-12] https://www.earthobservations.org/geoss.php

[REF-13] https://inspire.ec.europa.eu/sites/default/files/geodcat -ap.pdf

[REF-14] http://micka.bnhelp.cz/

[REF-15] https://c kan.org/

[REF-16] 5 -star scheme, Tim Berners Lee: https://5stardata.info/de/

Dissemination level: PU -Public Page 150

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

[REF-17] Go -Fair Initiative (https://www.go-fair.org/)

[REF-18] Dublin Core MetaData Initiative http://dublincore.org/

[REF-19] Creative Commons: https://creativecommons.org/ns

[REF-20] DataBio deliverable D6.2 “Data Management Plan”, June 30, 2017

[REF-21] Common license types for datasets (https://help.data.world/hc/en- us/articles/115006114287-Common-license-types-for-datasets, retrieved 2019-08-21).

[REF-22] DataBio deliberable D5.i2 “EO data sets, formats and sets”, https://rid- redmine.intrasoft-intl.com/projects/databio/dmsf?folder_id=1685


Appendix A Metadata template table

Field Value

Internal Name of the Dataset

Name of the Dataset/API Provider

Short Description

Extended Description

Version

Initial Availability Date

Data Type

Personal Data

Rightsholder

Other Rights Information

Dataset/API Owner/Responsible

Dataset/API Owner/Responsible Contacts


Technology

Name of the System

Dataset Data Model/API Interface

Data Model: Standards, Glossaries and metadata standards

Data Identifier - Standard used

Data Model - Specific Data Model

Data Volume

Update Frequency

Data Archiving and preservation

Geographical Coverage

Timespan

Access Level

Access Mechanism


URI
