3436/R0-Copernicus/EEA/IDM/57292

Technical specifications for implementation of a new land-monitoring concept based on EAGLE

Public Consultation document for CLC+ Core

Version 1.0

22.04.2020

1 | P a g e

Version history

Version Date Author Status and Distribution description 2.0 04/10/2017 S. Kleeschulte, G. Draft for review by EEA, NRC LC Banko, G. Smith, S. NRC LC Arnold, J. Scholz, B. Kosztra, G. Maucha Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin- Pihlatie 3.0 10/11/2017 S. Kleeschulte, G. Updated draft For distribution at Banko, G. Smith, S. integrating the Copernicus User Day Arnold, J. Scholz, B. comments Kosztra, G. Maucha following the NRC LC workshop Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin- Pihlatie 4.0 31/01/2018 S. Kleeschulte, G. Updated draft For EEA distribution / Banko, G. Smith, S. integrating the consultations Arnold, J. Scholz, B. comments Kosztra, G. Maucha following the Brussels Reviewers: COPERNICUS Land G-H Strand, G. monitoring Hazeu, M. Bock, M. workshop Caetano, L. Hallin- Pihlatie 4.1 07/03/2018 S. Kleeschulte, G. Updated draft in EEA internal Banko, G. Smith, S. parallel to EEA Arnold, J. Scholz, B. consultation Kosztra, G. Maucha process on D4 (no EEA comments

received) Mainly updated chapters: 2, 3, 5, 6 and 7

5.0 30/06/2018 S. Kleeschulte, G. Refined details on Banko, G. Smith, S. CLC+ Backbone Arnold, J. Scholz, B. specifications, Kosztra, G. criteria for CLC+

2 | P a g e

Version Date Author Status and Distribution description Maucha, N. Backbone data Valcarcel, J. selection, Delgado

5.1 07/08/2018 H. Dufourmont EEA comments on Added HD Comments to 4.5 v. 4.5 from 5/07/2018 by Michael Bock 5.2 07/09/2018 S. Kleeschulte, G. Review and update Banko, G. Smith, S. based on EEA Arnold, J. Scholz, C. comments Pleschberger, B. Kosztra, G. Maucha, N. Valcarcel, J. Delgado

5.3 February 2019 H. Dufourmont Selection and Call for tender CLC+ structuring of Backbone production elements that are essential to organise the CLC+ Backbone procurement 5.4 July 2019 H. Dufourmont Minor error corrections; Last gaps in tables filled out 5.5 January J. Andresen CLC+ Core Call for tender CLC+ Core functional requirements 1.0 2. April 2020 Jes Bruun Olsen Document revised Consultation for Public Consultation process

3 | P a g e

Table of Contents

1 Context ...... 8 2 Introduction to the CLC+ concept ...... 10 2.1 Background ...... 10 2.2 Concept ...... 11 2.3 Role of Member States ...... 15 2.4 Engagement with stakeholder community ...... 16 2.5 Structure of this document ...... 17 3 Requirements analysis ...... 18 4 CLC+ Backbone – the new view on Europe’s land cover ...... 23 4.1 Introduction ...... 23 4.2 Description of CLC+ Backbone ...... 25 4.2.1 Level 1 Objects – hard bones ...... 26 4.2.2 Level 2 Objects – soft bones ...... 26 5 CLC+ Core – the grid approach...... 28 5.1 Background ...... 28 5.2 Overview ...... 29 5.3 Populating the database ...... 30 5.4 Using the EAGLE ontology ...... 32 5.4.1 Maintaining and using the EAGLE ontology ...... 35 5.4.2 Supporting spatiotemporal analysis in CLC+ Core ...... 36 5.5 Data Integrity ...... 37 5.6 Data flow ...... 39 5.6.1 Ingesting data ...... 39 5.6.2 Extracting data ...... 42 5.7 The hybrid approach ...... 43 6 PART III (INFORMATION) - CLC+ - the long-term vision ...... 44 7 CLC+ Legacy (Example) ...... 50 7.1 Experiences to be considered ...... 52 8 Annex 1 – CLC nomenclature ...... 53 9 Annex 2: Available ancillary data for CLC+ Backbone ...... 54 9.1 Open Street Map data (linear networks) ...... 54 9.1.1 Introduction ...... 54 9.1.2 OSM road nomenclature ...... 55

4 | P a g e

9.1.3 OSM roads: completeness assessment ...... 58 9.1.4 OSM completeness and quality aspects ...... 59 9.2 INSPIRE national portals: roads & railways ...... 61 9.3 Rivers and lakes ...... 62 9.3.1 WISE: ...... 62 9.3.2 EU-Hydro: ...... 63 9.4 Priority geospatial datasets for the European Commission ...... 63 9.5 Conclusions from evaluation of datasets ...... 64 9.5.1 Conclusions roads and railways ...... 64 9.5.2 Conclusions rivers and lakes ...... 68 9.6 Remark: land parcel identification system (LPIS) ...... 70 10 Annex 3: LPIS data in Europe ...... 72 11 Annex 4: EAGLE ontology ...... 75 11.1 LCC code entries ...... 75 11.2 LUA code entries ...... 78 11.3 LCH code entries ...... 80

List of Figures

Figure 1-1: The relationship of the CLC+ products to the existing CLMS components...... 8 Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved European land monitoring (CLC+ product suite)...... 12 Figure 2-2 Conceptual design for the products and stages required to deliver improved European land monitoring (CLC+)...... 14 Figure 2-3 A scale versus format schematic for the current CLMS and proposed CLC+ products...... 15 Figure 4-1 Illustration of a merger of current local component layers covering (with overlaps) 26,3% of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)...... 24 Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components on raster basis and (4) attribution of Level 2 objects based on spectral segment based statistics, pixel based classification, additional Sentinel-1 information and VHR-texture parameters...... 25 Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 ha...... 29 Figure 5-2: Representation of real world data in the CLC+ Core...... 31 Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are correct...... 51

5 | P a g e

Figure 7-2: Result of changing the CLC implementation method in Germany ...... 52 Figure 9-1 Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania (near Parta) and Serbia (near Indija) (top to bottom). Roads are displayed according to their functional road class (FRC). The legend of FRC is given in the figure below. © Bing/Google and EOX Sentinel-2 cloud free services as Background (left to right)...... 55 Figure 9-2: Legend for functional road class (FRC) as defined in OSM data ...... 55 Figure 9-3 Global OSM Road density and completeness map - Barrington-Leigh and Millard-Ball study ...... 60 Figure 9-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: 7. June 2018)...... 61 Figure 9-5 Distribution of number of downloadable datasets for INSPIRE transport service across member countries differentiated according to national and subnational coverage (Status: June 2018) ...... 62 Figure 9-6: Overview of the river data sources per country for the currently suggested hybrid river dataset (based on the EU-Hydro public beta, not yet EU-Hydro final v1.0 !!!) ...... 70 Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit ...... 73 Figure 10-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project LandSense ...... 74

List of Tables

Table 2-1: Overview of key characteristics proposed for the four elements / products of the CLC+ based on initial discussions with EEA...... 13 Table 2-2: Summary (matrix) of potential roles associated with each element / product...... 16 Table 4-1: Main characteristic of Level-1 objects (hard bone) and Level-2 objects (soft bones)...... 26 Table 6-1: List of current CLC classes and requirements for external information ...... 49 Table 9-1: Grouping of INSPIRE national datasets for roads and railways according to CLC+ Backbone requirements and availability ...... 66

6 | P a g e

Introduction The European Environment Agency (EEA) and European Commission DG Internal Market, Industry, Entrepreneurship and SMEs (DG GROW), now under the remit of DG DEFIS (Directorate General Defence Industry and Space) have determined to develop and design a conceptual strategy and associated technical specifications for a new series of products within the Copernicus Land Monitoring Service (CLMS) portfolio, which should meet the current and future requirements for European Land Use Land Cover (LC/LU) monitoring and reporting obligations. This process of development is nominally called the "2nd generation CORINE Land Cover (CLC)" and the suite of products resulting are referred to as “CLC+”. In order to support the future CLC+ Core implementation it is decided to organise an open online consultation process, before finalisation of the technical and requirement specifications for the upcoming call for tender. The consultation will be open for a limited time The document at hand represents the ideas, analysis and the objectives behind the CLC+ product suite and a short description of requirements to realise the products in the suite. It is derived from a consolidated and more detail technical specification based on the results the collaboration between EIONET and EEA Stakeholders.

7 | P a g e

1 CONTEXT The European Environment Agency (EEA) and European Commission DG Internal Market, Industry, Entrepreneurship and SMEs (DG GROW) have determined to develop and design a conceptual strategy and associated technical specifications for a new series of products within the Copernicus Land Monitoring Service (CLMS) portfolio (Figure 1-1), which should meet the current and future requirements for European Land Use Land Cover (LC/LU) monitoring and reporting obligations. This process of development is nominally called the "2nd generation CORINE Land Cover (CLC)" and the suite of products resulting are referred to as “CLC+”.

Figure 1-1: The relationship of the CLC+ products to the existing CLMS components. After a call for tender in 2017, the EEA has tasked the Environment Information and Observation Network (EIONET) Action Group on Land monitoring in Europe (EAGLE Group) with developing an initial response to fulfilling the needs listed in the following chapters through the establishment of a conceptual design and a set of technical specifications. The approach adopted within the 2nd generation CLC for the development of the CLC+ products represented a sequence of stages, where separate single elements or products of the whole concept could be developed relatively independently and at different rates, to allow time for broad consultation with stakeholders. This also allowed the continuous inclusion of Member State (MS) input, the exploitation of industrial production capacity, and the necessary feedback, lessons learnt and refinement to reach the ultimate objective of a coherent and harmonized European Land Monitoring Framework. The first stage of this development process was to engage with the EEA to refine their requirements and build on the ideas presented by the EAGLE Group in the proposal. These first developments were undertaken within a set of constraints outlined by EEA and consolidated by DG GROW for the initial products to be delivered as part of CLC+.

They focus on: • Industrial production by service providers, • Final product in vector format, • Highly automated production process, • Short timeframe of production phase,

8 | P a g e

• Use of Earth Observation data (High spatial Resolution (HR) Sentinel-2 and Very High spatial Resolution (VHR) from Copernicus Contributing Missions (CCM)), and • Complete the picture started by the Hot Spot Monitoring1 products (EEA-39).

These developments resulted in a first draft deliverable (circulated informally) which outlined the conceptual strategy and proposed a technical specification for the first CLC+ product (CLC+ Backbone, see later details) to be developed. This stage also involved a presentation of the concepts contained in the draft deliverable to the EIONET National Reference Centres (NRCs) Land Cover in Copenhagen in October 2017. The NRCs feedback collected during the October meeting and afterwards was carefully taken into consideration for the continuation of the development process into the next stage. The conceptual strategy and technical specifications were all refined and extended to produce the next publicly distributed deliverable (D3)2. The concepts and specification for the CLC+ products contained in D3 were presented to the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017. This meeting generated lively debate and a set of survey results from the participants. The penultimate deliverable (D4) of the project represented the outcomes of the development of the CLC+ products guided by the feedback from the Brussels meeting and subsequent discussions. This involved a further expansion on the conceptual strategy and extended technical specifications for the proposed products. The aim of deliverable D4 was to continue to communicate these developments to the stakeholders involved in the field of European land monitoring and elicit their feedback. D4 was published by the EEA on the Copernicus Land website3 on 5th March 2018 and attendees of the Workshop on CORINE Land Cover+ in Brussels in November 2017 have been given an opportunity to provide feedback on that version until the end of March 2018. A final product - deliverable (D5.3) was delivered focusing on the future CLC+ product suite. Deliverable D5.3 has subsequently been revised and updated with information from March 2018 consultation process. The updated document describes the overall concept of CLC+ to provide guidance to future tenderers on technical choices for the production of the CLC+ Backbone and CLC+ Core. The informative part comprises a feasibility study and first design of the underlying database- structure (CLC+ Core), CLC+ Instance and the links to the existing CLC (CLC+ Legacy). The LULUCF- instance will be the first “visible” instance, derived from the CLC+ Core database, once it will be sufficiently populated with both Land Cover data, mainly originating from the CLMS, and Land Use data, mainly originating from in situ data provided by participating countries. This part is still under development by the EAGLE group and therefore subject to future changes. The Copernicus User Forum and Committee, as well as DG GROW and Eionet NRC/Land Cover are being informed on a regular basis on the progress of the development of the CLC+ product specifications and encouraged to take stock of the specification documents as made available in the technical library of the CLMS portal land.copernicus.eu.

1 Hot Spot monitoring products are formerly known as “Local components” and were renamed within 2018. 2 https://land.copernicus.eu/user-corner/technical-library/technical-specifications-for-clc-v3 3 https://land.copernicus.eu/user-corner/technical-library/clcplus-draft-technical-specifications-v4

9 | P a g e

2 INTRODUCTION TO THE CLC+ CONCEPT This chapter provides the background to the concept, an outline of the current proposed approach to be adopted within the 2nd generation Corine Land Cover (CLC) work, an overview of the individual elements of the concept and the status of the developments.

2.1 Background Monitoring of Land Cover and Land Use (LC/LULC/LU) is among the most fundamental environmental survey efforts required to support policy development and effective environmental management4. Information on LC/LU plays a key role in many European environmental directives and regulations. Many current environmental issues are directly related to the land surface, such as habitats, biodiversity, phenology and distribution of plant species, ecosystem services, as well as other issues relevant to population growth and climate change. Human activities and behaviour in space (living, working, production, education, supplying, recreation, mobility & communication, socializing) have significant impacts on the environment through settlements, transportation and industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The land surface, who´s negative change of state can only – if at all – be reversed with huge efforts, is therefore a crucial ecological factor, an essential economic resource, and a key societal determinant for all spatially relevant basic functions of human existence and, not least, nations’ sense of identity. Land thus plays a central role in all three factors of sustainable development: ecology, economy, and society. LC/LU products have so far tended to be produced independently of each other at the global, European, national and sub-national levels, as well as by different disciplines and sectors, each of them focussing on a fairly similar end product but with different emphases on thematic content and spatial detail. This independency and uncoordinated production are explained and justified by the large variation in objectives and requirements of these products, but it leads to reduced interoperability and sometimes also duplication of work, and thereby, inefficient use of resources. At the European level, CLC is the flagship programme for long-term land monitoring. It is now part of the Copernicus Land Monitoring Service (CLMS) and has been produced for the reference years of 1990, 2000, 2006 and 2012 and 2018 and expected to be fully completed by mid-2019. CLC is produced by MS with technical coordination of the EEA and guided by well-established specifications and methodological guidelines. The CLC specification aims to provide consistent localized geographical information on LC/LU using 44 classes at level-3 in the hierarchical nomenclature (see Section 8, (Annex 1)). The vector databases have a minimum mapping unit (MMU) of 25 ha and minimum mapping width (MMW) of 100 m with a single thematic class attribute per land parcel. At the European level, the database is also made available on a 100 x 100 m and 250 x 250 m raster products, which have been aggregated from the original vector data at 1:100 000 scale. CLC also includes a directly mapped (i.e. not creating by intersecting CLC status layers) change layer, which records changes between any two of the 44 thematic classes between two dates with a MMU of 5 ha. Although CLC has become well established and has been successfully used, mainly at the pan- European level, there are some deficiencies and limitations that restrict its wider exploitation, particularly at the MS level and below. This is partly due to the fact that the MMU of CLC (25 ha) is too coarse to capture the fine details of the landscape at local and regional scales, but also to the

4 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.

10 | P a g e fact that some MS have access to more detailed, precise and timely information from national mapping programmes. In consequence of the first point, features of smaller size that represent landscapes diversity and complexity are not mapped either because of geometric generalization or because they are absorbed by thematic classes with rather broad definitions. Moreover, landscape dynamics that are highly relevant to locally decided but globally effective policy, such as small-scale forest rotation, changes in agricultural practices and urban in-filling, may be missed due to the low spatial resolution and / or thematic depth of CLC class definitions. The successful use of CLC in combination with higher spatial resolution products to detect and document urban in-filling has given a clear indication of the required direction of development for European land monitoring. To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to include the High Resolution Layers (HRLs) and Hot Spot monitoring products. The HRLs provide pan- European wall-to-wall information on selected surface characteristics in a 20 m raster format and are made available as products aggregated to 100 m raster cells. They provide information on basic surface properties such as imperviousness, tree cover density and grassland and can be described as intermediate products. The Hot Spot monitoring products, in vector format, are based on Very High Resolution (VHR) Earth Observation (EO) data and ancillary information tailored to a specific landscape monitoring purpose (e.g. Urban Atlas), which provide more detailed thematic LC/LU information on polygon level with a MMU in a range of 0.25 to 1 ha. However, these Hot Spot monitoring products altogether do not provide wall to wall coverage of the EEA-39 countries, even when combined. Their nomenclatures and geometries are only partly harmonised (e.g. the 2017 fine-tuning between Riparian Zones and Natura2000 products) and have thus caused interoperability problems which are now being addressed.

2.2 Concept Given the above issues and the known reporting obligation listed in the next chapter, a revised concept for European Land Monitoring is required, which both provides improved spatial and thematic performance and builds on the existing heritage. Also, recent evolutions in the field of land monitoring (i.e. improved EO input data due to e.g. Sentinel programme, national bottom-up approaches, processing methods, etc.) and desktop and cloud-computing capacities offer opportunities to deliver these improvements more effectively and efficiently. The EEA in conjunction with DG GROW has identified this need and now aims to harmonise and integrate some of the CLMS activities by investigating the concept and technical specifications for a set of higher performance pan-European mapping products developed within the "2nd generation CLC" and under the banner of CLC+ product suite (see Figure 1-1). The EEA determined that such a CLC+ product suite should build upon recent conceptual development and expertise, while still guaranteeing backwards compatibility with the conventional CLC datasets. Also, the proposed approach should be suitable to answer and assist the needs of recent evolution in European policies like reporting obligations on land use, land use change and forestry (LULUCF), plans for an upcoming Energy Union and long-term climate mitigation objectives, as well as for monitoring purposes under the evolution of the Common Agricultural Policy (CAP). After a successful establishment of the CLC+ products there would then also be knock-on benefits for a broad range of other European policy requirements, land monitoring activities and reporting obligations. However, given these requirements, no single product or specification would address all the needs and the complexity of the solution would require a stage development and implementation.

11 | P a g e

The conceptual strategy therefore consists of a number of interlinked elements (Error! Reference source not found.) which represent separate CLC+ datasets / products and therefore can be delivered independently in discrete production phases. Each of the datasets / products has its own technical specification and production methodology and can be produced through its own funding / resourcing mechanisms. The structure of the conceptual design proposed by the EAGLE Group within the 2nd generation CLC is based on the elements shown in Error! Reference source not found.. The reports associated with this work have expanded incrementally giving more details on the conceptual design and provide technical specifications of increasing elaboration for the individual products.

Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved European land monitoring (CLC+ product suite).

Although the four new elements / products of the conceptual strategy will be described in more detail later in this report, they can be summarised as follows to aid the understanding of the conceptual design: 1. CLC+ Backbone is a spatially detailed, large scale inventory in vector format providing a geometric spatial structure attributed with raster data for landscape features with limited, but robust, EO-based land cover thematic detail on which to build other products. 2. CLC+ Core is a consistent multi-use grid5 database repository for environmental land monitoring information populated with a broad range of land cover (including but not limited to CLC+ Backbone), land use and ancillary data from the CLMS and other sources, forming the information engine to deliver and support tailored thematic information requirements.

5 A data structure whose grid cells are linked to a data model that can be populated with the information from the different sources.

12 | P a g e

3. CLC+ Instance is the “nominal” end point or final product in the establishment of CLC+ product suite. There may potentially be multiple “instances” with different specifications and for different purposes. It is a derived raster / grid product from the CLC+ Core and will be a LC/LU monitoring product with improved spatial and thematic performance, relative to the current CLC, for reporting and assessment. It can be tailored towards different types of application domains. An illustration of the conceptual design is the ability to continue to produce the existing CLC, which as a CLC+ Instance may be referred to as CLC+ Legacy in the future, which already has a well-established and agreed specification. It is one out of possibly many instances derived from the CLC+ Core database, once populated with the CLC+ Backbone and some ancillary data on LU. However, priority will be given to develop a LULUCF Instance. Table 2-1 provides the current overview of the main characteristics of the three elements to allow the reader to make comparisons between the key characteristics of the CLC+ products. Table 2-1: Overview of key characteristics proposed for the three elements / products of the CLC+ based on initial discussions with EEA. CLC+ Legacy is used as one example for a CLC+ Instance. CLC+ Backbone CLC+ Core CLC+ Instances …. CLC+ Legacy (under (Call for Tender 2Q (example) implementation) 2020) Description Detailed wall to All-in-one data Thematically (and A generalised wall (EEA-39) container for geometrically) LC/LU product geometric vector environmental land detailed LC/LU consistent with reference layer monitoring product. the CLC with basic thematic information specification. content and a according to the raster attribution. EAGLE data model. Role / Support to CLMS Thematic Support to EU Maintain the time purpose products and characterisation, and national series (backwards services at from including flexible reporting and compatibility) and pan-European to extension of CLMS policy support legacy local scale levels. products and requirements. systems. services from pan- European to regional scale levels. Format Vector and raster. GRID database. GRID database. Raster and vector. Thematic 10 - 12 basic land Rich attribution of High thematic CLC- detail cover classes, LC, LU and further detail including LC nomenclature some spectral- characteristics (full and LU with with 44 classes temporal EAGLE data model). improvements + changes attributes. compared to CLC. between the 44. Geometric 0.5 ha for the 1.0 ha, 100 x 100 m 1.0 ha, 100 x 100 25 ha for status, 5 detail vector layer grid. m grid for status ha for changes (MMU, grid 10x10 m for the and changes. (raster: 100 x size) raster product 100 m). Update cycle 3 - 6 years. Dynamic update as 1 - 3 years. Standard 6 years new information or 5 years becomes available. depending on the needs for SOER). Reference 2018 2018 2018 2018 / 2024 year

13 | P a g e

Production 2019 2020 2021 2021 year Method Geospatial data EAGLE-Grid Derived by SQL Derived from integration and approach: from CLC+ Core CLC+ and CLC+ image population and Core segmentation for attribution of geometric regular GRID boundaries and attribution / labelling by pixel- based EO derived land cover Input data Geospatial data, CLC+ Backbone, all CLC+ Core CLC+ and / or EO images and available HRLs, CLC+ Core ancillary data hotspot monitoring selected from products ancillary hotspot monitoring data, national data products (with provisions (e.g. LU), visual control) and photointerpretation. HRL products.

To manage the project efficiently and effectively, when developing and specifying these new products it has been necessary to stage the work. Throughout the documents delivered by this project the staging of the work given in Error! Reference source not found. has been used as a key graphic to identify the product and stage being described.

Figure 2-2 Conceptual design for the products and stages required to deliver improved European land monitoring (CLC+).

In figure 2-3 a schematic description of the conceptual design showing the relationship of the new elements / products of CLC+ to the existing CLMS products in terms of their format and level of spatial detail. In this representation: 1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map with fewer geometric details. 2. The Hot Spot monitoring products (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more geometric and thematic details. 3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific surface characteristics with high spatial details. 4. The proposed CLC+ Core, a grid product with a MMU of 1 ha. 5. The proposed CLC+ Backbone, (a 10x10m raster and a vector product with MMU 0.5 ha) 6. The proposed CLC+ Instance, a grid product with a MMU of 1 ha.

14 | P a g e

Figure 2-3 A scale versus format schematic for the current CLMS and proposed CLC+ products.

In any monitoring system the requirement for updates is central to the implementation of the activity. In this document we are focusing on the technical specifications, therefore the update cycle will only be addressed in terms of the suggested time between repeat productions. Details of the update process are more associated with the methodologies, which still need to be finalised by the service providers, national teams and / or the EEA in a later phase of development. However, particularly in relation to the CLC+ Legacy, reference will be made to the issues associated with the update cycle and change mapping where appropriate. Given the context, requirements and issues, the aim should be to find a means to deliver the concept and its proposed products with an efficient mixture of industrially produced material, backed up by ancillary (e.g. land use) information from various national programmes if they are in existence. It is important to build a viable system in which there is flexibility to adapt t feasibility and practicality once the implementation of the earlier steps is underway. For instance, the outcomes of the CLC+ Backbone production should be able to influence thematic content of the CLC+ Core and / or the technical specifications of the CLC+ Instance. In a similar way, the need to create instances as LULUCF, Legacy and CAP2020 will also drive the thematic requirements for CLC+ Backbone and CLC+ Core. 2.3 Role of Member States The MS have always been intimately linked with the CLC as its production has been the responsibility of the nominated authorities through EIONET, i.e. NRC Land Cover. The MS have provided the production teams for the actual mapping, allowing the process to exploit local knowledge and familiarity with native landscape types, and providing access to national datasets, which may not otherwise have been possible. The bottom-up production approach has been the key to successful delivery of this important time series of LC/LU data. With respect to the non-CLC products within the CLMS, the MS involvement has so far been limited to verification and, in the case of the 2012 products, enhancement. Recently, the EEA started a product enrichment task for MS as well, which aims at enriching Hot Spot monitoring products with LU information originating from national / regional in situ data. Although the MS have provided

15 | P a g e valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have not yet been fully exploited within the CLMS. It is therefore of the utmost importance to ensure a balanced contribution to the CLC+ suite of products between industry and MS via EIONET NRC/LCs. Table 2-2 shows the potential for a greater role for the MS across all of the new elements / products within the CLC+. It is important that the MS experts have oversight of the specifications and an understanding of the opportunities to provide feedback on the monitoring process and products. This includes the opportunities to contribute with data, experience and local knowledge to the production and validation activities where appropriate. The MS are currently involved in verification, but not in the validation of CLMS products. Verification is the evaluation of whether the product corresponds to the reality of the landscape by comparing the product to the regional / local expertise of the terrain, and hence test that it meets the expectation of the stakeholders. Validation is the independent geostatistical control on the accuracy with reference to the product specifications. The intended use of CLMS products and services at the national level calls for verification against national user requirements – not as an acceptance test but as a documentation of the usefulness, a promotional tool, and possibly also as assistance to improve the products for this intended use. Table 2-2: Summary (matrix) of potential roles associated with each element / product. CLC+ Legacy is used as one example for a CLC+ Instance.

CLC+ Backbone CLC+ Core (Call CLC+ Instances CLC+ Legacy (Under for Tender 2Q … (example) implementation) 2020) EEA Definition, Definition, Definition, Definition, coordination, coordination, coordination, coordination, main user. main user. main user. main user. MS Review of Review of Review of Production, specification, specification, specification, verification, provision of population with support to validation, geometric data national production, user. for future datasets, land verification, updates, use information, validation, user. verification, user. validation, user. Industry Production. Implementation Support Validation. and maintenance production, of DB validation. infrastructure.

2.4 Engagement with stakeholder community The success of the project and the long-term development of land monitoring in Europe is intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant stakeholders are aware of this activity and contribute their requirements and opinions to the 2nd generation CLC process. This will be obtained through public consultation before tendering CLC+ products.

16 | P a g e

2.5 Structure of this document This document presents the conceptual strategy and the technical specifications for the CLC+ product suite, status of the CLC+ Backbone implementation and CLC+ Core specifications for a subsequent series of CLC+ development steps and products. The latter aims at supporting the call for tender on the development and implementation of the CLC+ Core, and to continue dialog on the 2nd generation CLC approach and CLC+ suite of products with the stakeholders involved in European land monitoring to elicit feedback, comments and questions. The technical specifications presented in this document includes the feedback from the NRC LC meeting in Copenhagen in October 2017, the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017, dedicated written feedback from various MS on the draft versions D3 and D4 as well as additional feedback from EEA. This version of the technical specification documentation is primarily aimed at presenting the final details for the CLC+ Backbone and has also enhanced and refined the definitions of CLC+ Core. However, information on technical specifications for the future products CLC+ Instance and CLC+ Legacy are also presented. It has also focused on backwards engineering from what is required to ensure the CLC+ Legacy can be produced by considering what must be in a CLC+ Instance and the CLC+ Core to support this. Key issues are the provision of the required land use information and the link to the established HILUCS6 nomenclature. Finally, CLC+ Core presents a mechanism that allows for the maintenance of the EAGLE ontology, which allows for object-oriented descriptions in the form of code assignments.

6 Hierarchical INSPIRE Land Use Classification System

17 | P a g e

3 REQUIREMENTS ANALYSIS In line with the principle of Copernicus, this analysis in support of the development of the CLC+ products within the CLMS, is driven by user needs rather than the current technical capabilities of EO sensors and processing systems. The technical issues will be dealt with in the chapters related to the individual products focusing on their specification and, to a lesser extent, methodological development. This analysis was initially based around the requirements set out in the original call for tender but has been extended through inclusion of recent work by the EC and ETC to provide a broader range of stakeholder needs. As this project has developed, further requirements have been identified through meetings and stakeholder feedback to support the increasingly detailed specifications of a complete CLC+ product suite. The MMU and nomenclature of the current CLC have been operational for three decades. Backward compatibility is therefore an essential issue when a new monitoring framework is considered. Still, the specification for a future, improved and harmonised European land monitoring product is now open for revision. The proposed products should both support current European policies through continuity of current monitoring activities, but also support new policies. Examples are the reporting obligations (on the European level) on LULUCF, plans for an upcoming Energy Union, Greening of the CAP, the Urban Agenda, Biodiversity and long-term climate mitigation objectives. A confounding element is that many of the policies that could exploit EO-based land monitoring information rarely give clear quantitative requirements that can provide a basis for spatial, temporal or thematic specifications. Also, even though land monitoring information is crucial in many domains, so far, there has been no Land Framework Directive in Europe. The higher-level strategic policies, such as the EU Energy Union often refer to the monitoring linked to other activities such as REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, they tend to rely on currently available products. For instance, LULUCF assessments in Europe (independently from national reporting obligations) use CLC (with a 25 ha MMU), while monitoring at the global scale is using Global Land Cover 2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU). Reference was made to future monitoring in Decision No 529/2013/EU of the European Parliament and of the Council on accounting rules on greenhouse gas emissions and removals resulting from activities relating to LULUCF, but this only refers to MS activities with MMUs between 0.05 ha and 1 ha7. Many MS also employ sample-based field surveys (e.g. national forest inventories) to collect the required LULUCF data. Some of the initiatives associated with LULUCF have explored the potential of improved monitoring performance, such as the use of SPOT4 imagery with a 20 m spatial resolution to show land-use change between 2000 and 2010 for REDD+ reporting, but until very recently it has not been practical for these types of monitoring to become operational globally. A more fruitful avenue for user requirements in support of the CLC+ products is to consider the initiatives that are now addressing habitats, biodiversity and ecosystem services. The EU Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of biodiversity and improve the state of Europe’s species, habitats, ecosystems and the services. It defines six major targets with the second target focusing on maintaining and enhancing ecosystem services, and restoring degraded ecosystems across the EU, in line with the global goal set in 2010. Within this target, Action 5 is directed at improving knowledge of ecosystems and their services in the EU. Member States, with the assistance of the Commission, will map and assess the state of

7 Regulation (EU) 2018/841 of the European Parliament and of the Council of 30 May 2018, on the inclusion of greenhouse gas emissions and removals from land use, land use change and forestry in the 2030 climate and energy framework, and amending Regulation (EU) No 525/2013 and Decision No 529/2013/EU

18 | P a g e ecosystems and their services in their national territory, assess the economic value of such services, and promote the integration of these values into accounting and reporting systems. The Mapping and Assessment of Ecosystems and their Services (MAES) initiative is supporting the implementation of Action 5 and, although much of the early work in this area on European level has used inputs such as CLC, it is obvious that to fully address this action an improved approach is required especially for a better characterisation of land, freshwater and coastal habitats, particularly in watershed and landscape approaches. The spatial resolution at which ecosystems and services should be mapped and assessed will vary depending on the context and the purpose for which the mapping/assessment is carried out. However, information from a more detailed thematic characterisation and classification and at higher spatial resolution are required which are compatible with the European-wide classification and could be aggregated in a consistent manner if needed. To that purpose, and as a first step, the CLMS developed the Riparian Zones product at a 0.5 ha MMU, using a tailored nomenclature based on CLC and to a large extent as well on EUNIS8. A first version of a European ecosystem map covering spatially explicit ecosystem types for land and freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of the current HRL imperviousness with 20 m resolution, JRC forest layer with 25 m spatial resolution plus a range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting of the Habitats Directive). It is clear that ecosystem mapping and assessment could make more in depth use of the recently available Copernicus Sentinel data and CLMS products and thus move to a higher spatial resolution. The primary goal of the CLC+ product suite is to underpin European policies and, where possible, fulfil their needs. Furthermore, the EEA also wants to make the new products useful at national, regional and even local levels. To this end the requirements gathering exercise was extended to a broader stakeholder community at the national level and below, including representatives of commercial service providers, NGOs and academia. This work drew on existing surveys (e.g. MS CLC survey conducted by ETC-ULS9) and the feedback from the two events where the 2nd generation CLC approach and CLC+ products were promoted. The recent EC-funded NextSpace project collected user requirements for the next generation of Sentinel satellites to be launched around 2030. These requirements were analysed on a domain basis. From the land domain, those requirements that were given a context of land cover (including vegetation), glaciers, lakes, above-ground biomass, leaf area index and snow were considered. A broad range of spatial resolutions were requested from sub 2.5 m to 10 km. Two thirds of the collated requirements wanted pixel resolutions in the 10 – 30 m range. The requested MMUs were also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field / city block level for much of Europe. The temporal resolution / update frequency was dominated by yearly revisions, although some users could accept 5 yearly updates. Some users also wished for monthly updates, although the reason was unclear. The users were less specific about the thematic detail required and the few references given were to CLC, EUNIS, LCCS and “basic land cover”. The acceptable accuracy of the products was in the range from 85 to 90 % and above. The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of EIONET members involved with the CLMS and CLC production considering a range of topics in

8 European Nature Information System 9 Service Contract No 3436/R0-Copernicus/EEA.56586, Task 3: Planning of cooperation with EEA member and cooperating countries. Final report in preparation of the CLC2018 exercise

19 | P a g e advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the potential improvements that could be made towards CLC+ product. As expected, it was noted that some CLC classes cause problems because of their mixed nature (mixed LC and LU components) and elements in the reality of the landscape composition and use that are sometimes complex and difficult to disentangle. It was suggested that this situation could be improved using a smaller MMU so that the mapped features have more homogeneous characteristics. However, it should be noted that a mere refinement of the MMU leads to the next level of mixed signals and classification challenges. More realistic is to link the feasibility of feature detection and identification to the spatial resolution of the input EO data, in casu: 10 m for Sentinel-2 data in combination with 2.5 m for the VHR data made available in the Copernicus data warehouse managed by ESA. Also, a MMW reduction, particularly for highways, would allow linear features to be represented more realistically. Of the 32 countries, who responded to the questionnaire, 25 of them would support a finer spatial resolution, showing that there is, in general, a national demand for high spatial resolution LC/LU data. The proposals ranged from 0.05 ha to remaining at the current 25 ha, but the majority favoured solutions in the range of 0.510 to 5 ha. Thematic refinement is also supported by around one third of the respondents, who requested improved thematic detail, separation of land cover from land use, splitting formerly mixed CLC classes and the addition of further attributes to the spatial polygons. During the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017 several MS representatives were asked to provide their requirements for land monitoring and their thoughts on the CLC+ proposals as presentations to the whole event. Their required spatial detail ranged from 0.5 ha MMU down to 5 m pixel resolutions, which are at the most detailed end of what had been suggested to date and potentially the smaller spatial resolutions are out of scope for CLC+ products. For the required thematic detail, it was suggested to go beyond the current CLC 44 class nomenclature and aim for either a full level-4 classification or split the more vague or ambiguous classes. There was a unanimous request for yearly updates and a general feeling that the quality of the data should be improved relative to the Hot Spot monitoring products. As the final session in the workshop, a survey of the attendees (via an online voting tool) was undertaken against a set of questions designed to understand the audience’s feeling towards the CLC+ products and assess their preferences for some of the key technical specifications. There was strong support for CLC+ products with 88 % of respondents considering the products to have relevance at European and national levels with 35 % also thinking they would be useful at sub- national levels. Similar levels of support were found for a spatially detailed land cover product with limited thematic content, i.e. CLC+ Backbone. On the technical specification of CLC+ Backbone, 80 % felt a MMU of 1.0 ha or smaller would be appropriate, but this was almost evenly split across 0.25 ha, 0.5 ha and 1 ha. The CLC+ Backbone MMW was evenly split between 10 m and 20 m and a lively debate was produced around the impact of the input EO data. The dominant level of the preferred thematic detail was around 10 classes with 51 %, although 27 % voted for an even less detailed nomenclature of around 5 classes. In terms of the update cycle for CLC+ Backbone, the majority of the audience thought 3 years as sufficient, however almost a quarter wanted yearly updates. For the CLC+ Core the preferred grid size was less clear, ranging from 10x10 m to 100x100 m, with the extremes dominating and representing almost equal numbers of votes. Further issues regarding the

10 0.5 ha being required by LULUCF

20 | P a g e data sources and contents of the CLC+ products were also surveyed, but these will be dealt with in the relevant chapters. During the project, the EEA and the EAGLE Group have received a large amount of written feedback from MS and other stakeholders, which has been addressed where appropriate in the relevant product chapters below. Some of the feedback, although important, is out of scope for this project and is being dealt with separately by the EEA and DG-DEFIS.

The key themes of the feedback have been: • Clarification of the overall 2nd generation CLC approach and the CLC+ product concept. The team appreciates that this is quite a departure from current practice within the CLMS and during the iterations of the deliverables the description of the concept has been clarified and improved, particularly in visualising the relationship with the exiting CLMS components and the links between the individual products of the CLC+ suite. • Suggestions regarding the technical specifications. Many technical issues have been identified from the feedback from the stakeholders. These issues have been reviewed and incorporated where appropriate, practical and feasible within the revision of the technical specifications given in the later chapters of this deliverable. • Use cases for the CLC+ products. The requirements for an improved version of CLC has been known for some time and the detailed specification for CLC+ is now well in line with a range of global and European policy and environmental management needs. The new products within the CLC+ suite, such as CLC+ Backbone and CLC+ Core, have a vital role to play in the delivery of CLC+ Instance. Still, there are so far, no admissible use cases. The required use cases should be developed as end users become more familiar with the capabilities. • Inputs to the products from MS and other sources. Inclusion of MS data in the production / updates of the CLC+ products matches the bottom-up approaches promoted by the EEA and EAGLE and may help make the products more relevant for use at the national and sub- national level. There are, however, some technical and organisation challenges, which need to be addressed, and which led to the decision to use European wide datasets for the first production run of CLC+, in attendance of realistic alternatives for future updates. This report includes in annex a detailed review of national data sets, their suitability and access conditions for use in the project. • Quality of the CLC+ products. A further issue of improved spatial and thematic detail is the need to increase the quality of the products so that they will be accepted by the user community. The conceptual strategy aims to support high quality products by allowing CLC+ Backbone to focus on improvement of the spatial detail and CLC+ Core to focus on the enrichment of thematic detail. • Change detection, update cycle and maintenance. The aim of this project was to propose the initial technical specification for the CLC+ products, not to design a methodology for the updating and maintenance of the product in an operational monitoring environment. These issues have been noted where relevant in the technical specifications and are likely to feature more prominently in later development work. Where feasible, though, it is expected that tendering service providers elaborate on potential update mechanisms.

21 | P a g e

• Feasibility studies. The project was designed as a desk study and there was no requirement or resources to undertake any testing of the specifications proposed or methods that may be relevant. Where appropriate existing feasibility study results were incorporated into the reviews and technical specification development, such as the work undertaken in Hungary11. • Governance and implementation of the CLC+ production. As was expected, the proposal of a set of products more closely aligned to MS activities, yet integral to the CLMS, has raised some issues related to governance and implementation of work. These issues are out of scope for this document but have been noted and are being dealt with by the EEA. This chapter has summarised the user requirements and therefore set boundary conditions within which the CLC+ product suite can be developed and seen to be valid. Further work in the subsequent chapters will refine the technical specifications for each of the products. From the requirements review above, the feedback from the workshops and considering the specific requirement associated with LULUCF, it is suggested that the high-performance products of the CLC+ suite should have a MMU of 0.5 ha, a MMW of 20 m in vector format (CLC+ Backbone), and a 1 ha MMU in GRID format (CLC+ Core). The coming CLC+ Backbone will be based on EU-wide reference data for segmentation and on EO data for the LC thematic content. The EO data will have a spatial resolution of between 10 and 20 m (HR, multiple coverages), complemented with 2.5m data (VHR, single coverage). The thematic content would be a refinement of the current CLC nomenclature to cope with changing MMU, separation of land cover and land use for the needs of most application domains, not the least for ecosystem mapping and assessment. The temporal update could come down from 6 years to 3 years in the short-term, and potentially to 1 year in the longer-term. To reach this ultimate objective and initiate implementation of the CLC+ products, the requirements review supports the role of the CLC+ Backbone and CLC+ Core in contributing to the development and delivery of the required land monitoring system.

11 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. FINAL REPORT: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country

22 | P a g e

4 CLC+ BACKBONE – THE NEW VIEW ON EUROPE’S LAND COVER

CLC+ Backbone is under implementation after a successful call for tender in July 2020 and is planned to be implemented end of 2021. The following presentation and requirements for CLC+ Backbone only serves as background information for understanding the thoughts and requirements for realisation of CLC+ Core. 4.1 Introduction CLC+ Backbone has been conceived as a new high spatial resolution vector product. It represents a baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to- wall coverage of CLC+ Backbone shall draw on, complete and amend the incomplete picture formed by the current hotspot monitoring products (Urban Atlas, Riparian Zones and Natura 2000) which cover approximately a quarter of total area as shown in Error! Reference source not found.. CLC+ Backbone will provide a comprehensive and detailed coverage of the whole EEA39. The objective of the CLC+ Backbone is to provide: • A spatially detailed geometric base for CLC+ Core and where appropriate for CLC+ instances; • Basic homogeneous land cover units (EAGLE terminology) as thematic input to CLC+ Core and CLC+ instances; • A potential standalone product as it reflects and amends the thematic content of HR Layers and is available as vector and raster product (TBD); • A useful product to enhance and accelerate the update of the hot-spot monitoring products (local component)

23 | P a g e

Figure 4-1 Illustration of a merger of current local component layers covering (with overlaps) 26,3% of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC).

CLC+ Backbone is initially being produced by a mostly automated industrial approach divided into three distinct steps (Error! Reference source not found.). The spatial units of CLC+ Backbone, called “landscape objects”, are vector polygons. They are classified based on their spectral characteristics from multiple sensors (Sentinel 1&2, VHR) and attributed using a wall-to-wall pixel-based 10 m spatial resolution land cover classification.

24 | P a g e

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components on raster basis and (4) attribution of Level 2 objects based on spectral segment based statistics, pixel based classification, additional Sentinel-1 information and VHR-texture parameters.

For implementation of CLC+ Backbone the following principles are taken into account to contribute to a harmonized European land cover monitoring framework: • Heritage: The concept takes into account the existing Copernicus pan-European and hot- spot products. The technical specifications to be provided by this contract are therefore based on a thorough review of these datasets and their geometric and thematic content. • Integration: CLC+ Backbone will integrate selected geometric information from ancillary data and provide information for the update of existing COPERNICUS products.

4.2 Description of CLC+ Backbone CLC+ Backbone will consist of two sub-products: • Vector product: o object-based, 0.5 ha MMU, derived from linear networks and image segmentation, attributed with aggregated statistics from the raster product and additional characteristics from VHR and Sentinel-1; • Raster product: o pixel-based (multi-temporal Sentinel-2 input imagery), 10m spatial resolution, basic land cover classes.

25 | P a g e

4.2.1 Level 1 Objects – hard bones The idea behind the Level 1 objects (“hard bones”) is to derive a partition of the landscape that reflects relatively stable (persistent) borders based on artificial and natural features. In the scale of Sentinel-2 images these features are represent as lines (linear networks). Roads and railways represent therefore the transportation network and rivers represent the hydrographic network. When building the spatial framework, there would be a potential long list of sources for persistent object geometries. However, throughout the consultation phase the decision was taken to solely concentrate on linear networks for Level-1 and provide all other object geometries for Level-2 directly from image segmentation. The Level 1 input data are available as line geometry, mostly representing centre lines of roads, railways and rivers. From the intersection of these line geometries Level-1 object polygons are derived. Within an aggregation step all polygons smaller than the final MMU must be merged. Hard bones, based on EU-wide reference data, will be using Open Street Map (OSM), EU-hydro and the EU Water Framework Directive (WFD) data to derive roads, railways and hydrography. The categorisation of linear structures and how these will be considered in production (distinction between centre lines only, polygons or neglect) will be proposed. In case availability, access conditions or technical heterogeneity of in situ data jeopardise the homogeneity of the EEA39 coverage of the CLC+ Backbone product or its further propagation and use under the Copernicus data and information policy principles, it is recommended to use sources of European wide in situ data that overcome the above risks, and this in attendance of the fulfilment of requirements for in situ data as provided by distributed data sources according to INSPIRE principles.

4.2.2 Level 2 Objects – soft bones The aim of the Level-2 segmentation is to derive objects that can be differentiated by their spectral response and variation throughout a year. With the increasing number of observation dates per pixel throughout a year the likelihood of finding homogeneous management units increases, but at the same time pixel based noise is increased as environmental conditions within a field parcel may vary significantly throughout a year (e.g. soil moisture content). Service providers will have to balance the number of acquisitions to maximize the identification of single field parcels and on the other side to minimize the “salt and pepper” effects due to varying spectral response and short term variations due to climate conditions (, snow, moisture). The Level-2 landscape objects represent land cover units with a unique vegetation cover / surface property and homogeneous dynamics throughout a year. As an example, different settlement structures (e.g. single houses or building blocks) should appear spectrally different (due to the different levels of sealing, shape, spatial extent and spatial built-up pattern) and will therefore be delineated as separate polygons. The actual cultivation measures applied on the land will also influence the type of delineation as e.g. in agriculture the field structure is visible in multi-temporal images due to the differences in temporal-spectral response according to the applied management practices on field level (mowing, ploughing and sowing). Homogeneous management units (e.g. field parcels) that may not differ from the neighbouring unit in terms of main land cover but can differ only in terms of land cover dynamics over time. Table 4-1: Main characteristic of Level-1 objects (hard bone) and Level-2 objects (soft bones).

26 | P a g e

Level-1 objects Level-2 objects Name Hard bones Soft bones Method Partition of landscape according to Image segmentation existing spatial data Input data Persistent landscape objects: Sentinel-2 complete time-stacks of one Transport infrastructure (roads & vegetation period (amended by Landsat railways excluding tunnels); river 8 data in case of clouds/haze) & canal network Special issue Geometric transformation of line Pre-processing (atmospheric and geometry into vector-polygons topographic) of Sentinel-2 images and further transformation of vector-polygons into raster based regions Advantage Very good cost/benefit ratio; no Independent data source; doubling of efforts, reuse existing Clearly defined observation period European geospatial data (2018), free and open access infrastructure Disadvantage Data might be outdated; large Reliability and reproducibility of vector processing facilities segmentation algorithms needed; effort to integrate and collect appropriate data Concerns European data vs. national data Technical feasibility (big data processing); cooperation with data centres (e.g. DIAS12) necessary

12 COPERNICUS Data and Information Access Service

27 | P a g e

5 CLC+ CORE – THE GRID APPROACH

The CLC+ Core product should be an underpinning semantically and geometrically harmonized “data container” and “data modeller” that holds, or allows referencing of, land monitoring information from different sources in a grid-based information system. The grid database will essentially store geospatial and thematic data from sources such as CLC+ Backbone, CLC, Local Component data, HRLs, in-situ as well as information provided by the MS. The contents of the grid database can be exploited directly within grid-based analyses or alternatively subsets of the grid database can be extracted and analysed by spatial queries driven by objects from an external vector dataset. In this way, the CLC+ Core can support the other elements of the 2nd generation CLC, for instance, a CLC+ Backbone object could gain additional thematic information by aggregating or analysing the equivalent portion of the CLC+ Core as a starting point for CLC+ production (see next chapter).

5.1 Background Textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 199213, Congalton 199714). The “vector GIS” does, like the paper map, attempt to represent individual objects that can be identified, measured, classified and digitized. The “raster GIS” attempts to represent continuous spatial fields approximated by a partition of the land into small spatial units with uniform size and shape. The “raster GIS” therefore have structural properties that make them particularly suitable for use in modelling exercises. A grid is a spatial model falling somewhere between the vector model and the raster model. The grid looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is characterized by how much it contains of each land cover class or other information. The grid is, as a result, a more information-rich data model than the raster. The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units of e.g. in this case 1 X 1 km. The result can either be a “raster” where a single CLC class is assigned to each pixel (representing for example the dominant land cover class inside the pixel; or the CLC class found at the centre of the pixel unit). Alternatively, the result is a “grid” where an attribute vector representing the complete composition of land cover classes inside the unit is assigned to each grid cell. Geometric information on a more detailed scale than the size of the

13 Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg 14 Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434.

28 | P a g e spatial unit is in both cases lost, but the overall loss of information is less in the grid than in the raster. The loss of information when using a grid structure can be further reduced if the grid size is selected to be less than the minimum mapping unit of the input product. For instance, with a MMU of 0.25 ha (50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products derived from the grid-based information must also consider grid size in relation to its requirements.

Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 ha. A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with thematic information from (one or more) existing sources (Figure 5-2). One method frequently used to populate grids with LC/LU information is to calculate spatial coverage of each LC/LU class based on a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes can furthermore involve counting or statistical processing of registered data for observations made inside each grid cell (for instance, not only the area of a class, but also the number of objects it represented). Ancillary databases, which provide important and relevant information about the content and context of the grid cells, can also be added. For instance, the number of buildings, length of roads or number of people found within the grid cells. 5.2 Overview CLC+ Core will be the effective “engine” to model, process and integrate the individual input data sets to create new, value added information. The modelling will make use of synergies (the same

29 | P a g e information provided for one cell based on different sources) as well as disagreements (different information provided for one cell by different sources) during the data processing. We would like to refer to the modelling and combination of different layers of the database, including the modelling parameters as an “instance” of the database. In this understanding, any output of a database modelling process using different input layers and different parameters will create a specific instance with associated metadata. Technically CLC+ Core shall: • Allow further characterisation (e.g. by additional input data / knowledge, see also chapter 5.3 - Populating the database). • Allow for (fully or partial) inclusion of Copernicus High Resolution Layers and hot spot monitoring products. • Allow the best possible transformation of National datasets into CLC+ by implementing the EAGLE concept and allowing for ingestion using national rulesets (capturing the wealth of local knowledge existing in the countries). • Allow for combination of several EAGLE codes into entries in predefined classification systems using EAGLE barcodes. Storing information in a grid structure will also help to overcome the problem of different geometries: as the data are converted to grid cells, differences in vector geometries will have a lesser impact.

5.3 Populating the database The grid as a spatial model consists of square spatial units of uniform size and shape that can have an (internally) heterogeneous thematic composition, i.e. the grids are “geometrically uniform polygons”. Each grid cell has a unique ID that is related to a database containing the attribute data. In a first step, CLC+ Core will be populated with existing thematic information from the CLMS: • Hot Spot monitoring data (Urban Atlas, N2000, Riparian Zones) • High Resolution Layers (Imperviousness, Forest, Grassland, Wetness & Water) • CLC+ Backbone • Any other data, such as CLC (25 ha), MS data Upon agreement, Member States are invited to provide also their national information to populate the information system, mainly on land cover components, land use, agricultural themes and further characteristics according to the EAGLE data model.

30 | P a g e

Figure 5-2: Representation of real world data in the CLC+ Core. The minimum thematic content of CLC+ Core will be determined by the information which is mandatory to create the final product, i.e. CLC+ Legacy and its traditional 44 classes. The CLC+ Core will inherit • from CLC+ Backbone: o the homogeneous landscape units, o and the basic land cover classes at pixel resolution, • from the hot spot monitoring products: o the information on land cover/land use • from the high resolution layers: o Information on elementary parts of the EAGLE land cover data model But for deriving CLC this will not be sufficient: the basic land cover classes and the HRL provide only land cover information and some parameters (like the sealing degree or tree cover density), while the land use information – which is a crucial part of the CLC nomenclature – is not addressed in an exhaustive manner. The hotspot monitoring products data do contain land use information, but in a spatially not complete manner (only covering parts of the European territory) and therefore cannot fulfil the requirements regarding information on land use. The detailed land use information can only be provided by the Member States, either from existing databases on land use, or – as last option when no such information has already been collected – from the CLC database itself and/or via dedicated mapping campaigns for specific land use classes. The standardized list of land use classes has been initially developed in the frame of INSPIRE as so called HILUCS15-classes. These classes were amended within the EAGLE data model. For CLC+ Backbone agricultural land use classes (arable land, grassland, permanent crops) are additionally included into this system, as the main focus of HILUCS classes is to comply with socio-economic statistical assessments rather than description of observed land use classes (e.g. instead of the land use production type in agriculture the focus is on usage of products: alimentary, industrial instead of cropland, grassland, …). Crop types as main information regarding agricultural production are

15 Hierarchical INSPIRE Land Use Classification System (HILUCS)

31 | P a g e foreseen as additional characteristic amending the general land use classes in agriculture. It is essential to stick to the already defined Code lists under the EAGLE model for populating the CLC+ Core database to allow further semantic transformation of these classes and harmonization of input data. From this perspective it should be clear that data ingested into CLC+ Core should give reference to common vocabulary as formulated by the EAGLE model. We will denote this vocabulary the EAGLE ontology. 5.4 Using the EAGLE ontology By themselves, the associated nomenclatures for the Copernicus monitoring products and the member state datasets supports semantic reasoning for each dataset. These nomenclatures could be directly expressed as ontologies and used in combination for the CLC+ Core. But when trying to combine datasets using classification systems, we encounter overlaps in the form of thematic content. This is where the EAGLE model steps in. The EAGLE model allows for object-oriented descriptions of land cover components via one or more parametric characterizations using EAGLE codes (as opposed to traditional classification using a single value). The intention is that by employing a more fine-grained characterisation of the land areas inside each grid cell we can establish a mechanism where we can obtain descriptions from several source and use these in combination from the CLC+ instances.

Code

Hierarchical code attribution allowing object oriented parametric descriptions of thematic content (See Annex 8 for the current listing of LCC, LUA and CH codes in the EAGLE ontology)

Definition 1 EAGLE Code.

Content being assigned EAGLE LCC codes will typically originate from the CLC+ Backbone, LUA codes will typically originate from member state datasets – and Characteristics will originate from a mix of the two. The combination of EAGLE codes from instances can then be described using an extraction rules using EAGLE Barcodes for each entry in the nomenclature for the instances:

Barcode

An EAGLE barcode describes the decomposition of a specific classification entry based into the relevant entries in the Eagle Ontology as described in the EAGLE transition matrix. All relevant information is contained within the EAGLE model.

Definition 2 Barcode

One example could be the attribution of “Forest land” for the LULUCF instance, looking for information from “Trees (LCC: 211)” and requiring the absence of Agricultural Management practices (one of the subclasses of LCH 011). The “Forest land” classification could then be delivered using extraction rules noted for the “Forest Land” code in the LULUCF nomenclature.

32 | P a g e

Figure 5-3 illustrating the usage of EAGLE codes (LCC, LUA, CH) in CLC+ core

A nomenclature for each CLC+ instance should be constructed. E.g. a LULUCF nomenclature could be constructed - with one of the entries in the nomenclature being “Forest Land”, with an associated EAGLE barcode. The idea would be that the nomenclature would hold across the entire EEA39 area – extracting data from CLC+ core. The input data would be loaded in with possible contributions from all possible sources. It should be noted that the EAGLE barcodes only denote how to combine EAGLE codes from existing entries in the EAGLE model into a classification code for an external system. If we are to import data into CLC+ Core, we will need to decompose the incoming nomenclature for the classification system using knowledge that potentially is not available beforehand. Firstly, the EAGLE barcode for the nomenclature should be expressed and secondly, additional rules for importing data should be expressed by NRC experts:

33 | P a g e

Figure 5-4 illustrating the use of Ingestion and extraction Rulesets

This figure should illustrate that we require an ingestion procedure that transforms chosen information element from input datasets to the EAGLE model – and in the case of country data sets we will rely on the assistance of NRC experts to supply information about where the data is available and with what priority it should be mapped (NRC Ingestion Rulesets).

In addition to the attribution of entries in code lists, EAGLE codes (especially Characteristics) can give reference to a value in a continuous range. We will denote the codes that references numbers as a “Parameter”:

Parameter

Parametric values are specified EAGLE characteristics (CH) that references a continuous value range

Current examples of specific parameters in a continuous range include: ▪ “Soil Sealing Degree” ▪ “Crown Cover Density”

Definition 3 EAGLE Parameter

The EAGLE model also employs the concept of General and Temporal parameters:

1 General Parameters 11

34 | P a g e

Height 2 in [m] 112 2 % cover 113 1 Temporal Parameters 12 2 Duration 121

Figure 5-5 EAGLE General and Temporal parameters

For the purpose of modelling, we can attach the Temporal Parameters to the concept of “Spatial Composition” (described in chapter 5.4.2) and the general parameter “CH: % Cover: 113” to the specific EAGLE code attachment to the spatial composition. “CH: Height in [m]: 112” could be extracted from a central data source and made available in 100x100m from CLC+ Core. When we are aiming to ingest data from different data sources with different resolutions, we will introduce the abstraction “Spatial Composition” that could be used to refer to singular Land cover components or anything with a geometrical description and a given timeframe. For the purpose of CLC+ core we will restrict it to singular grid cell references within a given temporal layer (e.g. we could talk about GridCell 1234 for “CLC LEGACY 2020”)

Spatial Composition A concept allowing parametric description related to a given area related to a given observation period and the geometrical area. Definition 4 Spatial Composition

In addition to describing the area which the observation relates to, each spatial composition should also allow for characterization of observations using the temporal parameters from the EAGLE model. These could be e.g. be “duration” or “period”. Note that we are aiming at allowing the delivery of data that is no synchronized with the CLC delivery cycle (so member state data could e.g. be supplied bi-annually, whereas CLC could be annually).

5.4.1 Maintaining and using the EAGLE ontology

The EAGLE ontology identifies each of the relevant EAGLE codes and their internal hierarchical structure using a vocabulary that identifies their relations – and allows for a description of the ingestion of data into CLC+ Core using NRC Rules. The implementation of CLC+ Core shall allow for running updates of the EAGLE ontology by the EAGLE group in a manner that maintains backwards compability for existing NRC Rulesets.

In general, the following should be considered, when using the EAGLE ontology: - Entries in the EAGLE ontology contains object-oriented descriptions, and not classification codes as such. Therefore, we cannot expect to use EAGLE barcodes to guide import of national datasets directly. This introduces the need for NRC Ingestion Rulesets to aid the import of data.

35 | P a g e

- Each EAGLE code shall have a description in a structured format that the supports the existing description of the EAGLE model (including both the hierarchical relationships, as well as the textual descriptions). - Every time an EAGLE characteristic (LCH) is relevant for a LCC or LUA code, then the relationship should be expressed, so that an application or user can be guided towards characteristics that are associated with the given code16. - By using the EAGLE ontology, it should be possible to obtain a name for each possible place to store values with reference to the EAGLE model. Combining the concept of an EAGLE code and the concept of spatial composition we should identify how to store a given value for a given EAGLE code in given point in time for the physical implementation.

Practically this means the following for the implementation of CLC+ Core: - The import of CLC+ instance data shall be guided by Ingestion Rulesets maintained by NRC Experts. - The export of CLC+ instance data shall be guided by the Extraction Rulesets. The queries supporting the extraction of data for the instances can be formulated with reference to the EAGLE barcode – therefore it does not have to be expressed in SPARQL but could as well be created using e.g. SQL or WCS. - Coverage data can be stored outside the ontology, allowing storage of timeseries data for biophysical parameters in a separate database (a hybrid solution).

Note that the hierarchical relationships and nomenclature descriptions for the EAGLE ontology could be described using the SKOS17 ontology to aid interoperability for supplying catalogue records for DCAT18.

5.4.2 Supporting spatiotemporal analysis in CLC+ Core

Traditionally, the thematic attributions for Corine Land Cover have been associated with land cover units on a pan European scale – in the same resolution. When we are trying to combine Land Use datasets from member states, we expect that we must aggregate the thematic attributions to a common nomenclature as well as a common resolution in grid cell and give a description of the valid duration of each temporal layer. We expect the existing land cover characteristics and codes to be directly related to Land Cover Units, whereas the Land Use codes and characteristics is to be supplied in a lower resolution attributed to grid EEA39 100x100m grid references. This means that the practical combination of Land Cover and Land Use will take place at 100x100, thus requiring aggregation of the linked datasets, before a practical analysis can make sense. It should also be noted that e.g. if an ingested national dataset LU “industry” with 25 ha resolution will not have improved resolution, even though it is not stored in 100x100 cells. To identify the precision in the aggregation we will note the

16 Note that the relationship is on a schema class level, not instance level 17 SKOS is described at https://www.w3.org/TR/2008/WD-skos-reference-20080829/skos.html 18 DCAT is described at https://www.w3.org/TR/vocab-dcat-2/

36 | P a g e percentage of the Grid Cell having the noted thematic attribution related to LLC, LU entries in the EAGLE model. During aggregation it should be noted that the EAGLE ontology provides linkage with relevant EAGLE parameters that supports the thematic code attribution at the given spatial composition. This should help the comparison of results from different attribution schemes using the same source data – as well as offer support for timeseries analysis – but during aggregation it should be observed that the implied coupling between codes are intended to hold at the given spatial composition ,and not necessarily after aggregation. Over time we expect the spatial extent of the land cover units to vary – so we aggregate to Grid Cells in combination with a timestamp in CLC+ Core to support a possible change in reference data. The changes in EAGLE code attributions will be tracked with reference to the spatial compositions of the EAGLE codes for the analysed time periods. In contrast to the existing CLC nomenclature, the EAGLE data model supports multiple EAGLE codes associated with the same area – thus supporting both land cover and land use in combination. This means that a change from one observation period to another observation period could possibly consist of one attribution to several associated attributions in the next observation period. Given that we expect to be using aggregated values in CLC+ Core, we will store a percentage of the area noted to be covered the code during the aggregation. This should help identify the dominant attribution during the attribution period, as well as supporting Q/A analysis when combining datasets. Since we do not expect to have deliveries in the same update intervals across Europe then we need a way to identify the start and stop date for the data observations to allow the spatiotemporal coupling across the datasets. This introduces the need for spatial composition, where we link observation using knowledge of the observation period and the given location for the observation.

5.5 Data Integrity

In overall we expect to store a value related for each EAGLE code noted for a given grid cell. Every time we encounter an EAGLE parameter, we expect to store the actual value – and every time we encounter an EAGLE code, we expect to store a % value, marking how much of the grid cell the given code covers.

5.5.1.1 Value ranges for Parameter types Parameter values should be within the given range for the EAGLE parameter (given for the specific parameter type).

5.5.1.2 Hierarchical sum rule Due to the hierarchical nature of the EAGLE codes, then the % value stored for a broader code should equal to or greater than the sum of % values for the narrower codes stored. E.g. given the following hierarchy of codes: BIOTIC / VEGETATION 2 Woody Vegetation 21

37 | P a g e

Trees 211 Bushes, Shrubs 212 regular bushes 2.121 dwarf shrubs 2.122 Herbaceous Vegetation (grasses and forbs) 22

If we store 20% “trees” and 20% “Bushes, Shrubs”, then we would expect 40% “Woody Vegetation”. From this it should follow that updates to values should be coordinated in a manner that also potentially updates broader code values.

5.5.1.3 Supporting code and parameter coupling During the construction of Rulesets, attention should be paid to parameters that is intended to support code attributions in CLC+ instances. As an example, member states will have different definitions of “Forest Land” (a reference to the IPCC category) definitions that could depend on the following parameters • LCH 0925 : “Crown Cower Density” • LCH 111 : “width” • LCH 112 : “height” This would imply that the possible ingestion of member state characteristics “Crown cover density”, “width” and “height” values that supports a country specific interpretation of the IPCC category “Forest land” in combination. The attribution of “Forest land” could involve additional data (e.g. it could include that Forestry management practices have been attributed to the area, as well as not having certain agricultural practices attributed) The original intent of the usage of Characteristics is expressed as19: “The Characteristics, which can be chosen to describe further details of the Land Cover Components, are arranged around the Land Cover Components (UML-LandCoverClasses) as additional boxes. Depending on their applicability, they are only valid for specific part of the Land Cover Components and therefore are linked respectively. E.g. the attribute “soil sealing degree” can be applied for any abiotic spatial unit ( - even any biotic in theory), while e.g. the status attribute “clear cut” only relates to vegetation or abiotic natural surfaces materials. Other characteristics like “spatial landscape patterns” could be used to describe a spatial pattern composed by one element or a mosaic of several landscape components in larger scale.”

From this it should follow that the value for a specific characteristic is intended as a means to further describe a specific land cover component – so at the CLC+ Backbone level when we have the specific values at a 10x10m resolution, this would work – but attention should be given during ingestion of datasets to a 100x100m level that contains aggregated values.

19 Page 16 in https://land.copernicus.eu/eagle/files/explanatory-documentation/eagle-concept_explanatory- documentation_v2-3.1

38 | P a g e

Parameter type data that is intended to support CLC+ instances will be uploaded in a form that is valid for the entirety of the grid cell – thereby potentially decreasing the precision of the stored value as compared to the scenario where an average parametric value is stored with relation to specific code assignment. E.g. if the 20% of the 100x100m grid cell is covered by “LCC: 211 Trees” and 70% is covered by “LCC 212: Bushes, Shrubs”, then we do not expect the value of “LCH 0925: Crown Cover Density” to express a direct relationship to “LCC: 211 Trees”. The averaged value will likely express a relationship with the dominant value. This calls for the need for the option of preserving the relationship between the code assignment and the relation to the supporting parametric values when aggregating values for the percentage being covered by the “Forest land” code assignment. Supporting the preservation of the relationship between the code attribution and the relevant parameters would require a mechanism identifying that the parameter is “locked” or “prioritized” in the case of data is already present in the CLC+ Core. When constructing new rulesets – the NRC expert should be guided that this parameter is relevant for e.g. “LCC: 211 Trees” (this could be done be inquiring the EAGLE ontology) and given the right credential/rights for the user present an option to override / break this relationship - depending on the given CLC+ instance. If there is an expressed need to break the relationship, then a specific instance could be constructed – E.g. “CLC+ LULUCF” – potentially allowing “LCH 0925: Crown Cover Density” to be overwritten – and “CLC+ LEGACY” that would maintain the original data.

5.6 Data flow

Data shall be stored in CLC+ Core with a reference to an EAGLE code for each entry and combined into CLC+ instances using EAGLE barcodes. The intention is that each CLC+ instance data shall be available using a fixed nomenclature for each specific instance, allowing the continuation of existing use cases using common GIS software (such as QGIS20 or ArcGIS21). 5.6.1 Ingesting data Data shall be ingested with reference to a ruleset and stored with reference to the relevant EAGLE codes in a manner that identifies the % covered by the code assignment (CH: 113) and the actual values for EAGLE codes note of the Parameter Type. When ingesting data into CLC+ core we need more data than the EAGLE barcode. We will denote the combination of the EAGLE barcode and additional information required an “Ingestion Ruleset” and expect these rulesets to be formulated by NRC experts, aided by a user interface provided by the CLC+ Core system. Ingestion Ruleset

An Ingestion ruleset will identify a data ingestion procedure marking how a selected information element in a member state dataset can be expressed as one or more entries in the EAGLE ontology in the CLC+ Core

20 Available from https://qgis.org/ 21 Available from https://www.arcgis.com/

39 | P a g e

The ingestion ruleset will consist of • The country or organisation responsible for the ruleset • The name of the chosen information element in the source data set • Documentation of decomposition rules for the chosen information element towards one or more EAGLE codes in CLC+ Core (an EAGLE barcode) • Knowledge about the spatiotemporal coverage of the dataset (potentially obtained from the catalogue) • Knowledge about where to obtain the source data at a central storage location. • Knowledge about additional data required for supporting the decomposition (supported by the EAGLE ontology and other rulesets) • A priority for the ruleset (clearly marking if it should overwrite already present data)

The intention is that the Ruleset should contain all the necessary information needed to translate one information element from a dataset to one or more entries in the EAGLE ontology for a clear semantic match in a repeatable manner.

Note that in situations where more than one 1 possible decomposition into an EAGLE code exists, the ingestion Ruleset should identify one is correct (identifying rules for selecting possible relevant entries in a code list). Definition 5 Ingestion Ruleset

The intention is that there should be an ingestion ruleset for each chosen information element type in the datasets to be ingested (note that we do not intend to ingest the entire dataset by default – a NRC expert will be given the opportunity to select which information elements to be ingested).

Figure 5-6 obtaining EAGLE codes from country datasets using Ingestion Rulesets.

In this manner the EAGLE ontology could be used both as a way of characterizing land areas, as well as providing a vessel for describing the relation with existing member state classification systems in the form of EAGLE barcode and additionally supporting data ingestion using explicitly formulated ingestion rulesets. The concrete formulation of the Ingestion Rulesets would be depending on how the EAGLE ontology is formulated and how the CLC+ Core is chosen to be implemented. Note that an Ingestion Ruleset could refer to already loaded information available in CLC+ Core (in the form of linked EAGLE parameters or codes for the Spatial Composition). The intention is that the Ingestion Rulesets should hold the logical expressions and values used to express queries in the query language chosen for the CLC+ Core (e.g. SQL, WCPS or SPARQL). Note that Spatial and temporal coverage of datasets

40 | P a g e could be described by DCAT222 in the catalogue – and that logical expressions written directly in SPARQL could refer to raster datasets by employing an OBDA approach23 to refer to the source data. The intention is that - after ingestion of data to fill out CLC+ Core using the EAGLE model, an additional step exporting data with reference to the EAGLE barcodes and additional information will be performed (using extraction rulesets). The Ingestion Rulesets are used for ingestion, whereas extraction rulesets are used for the assembly of entries in the instance nomenclature. An implementation should provide considerations about chaining operations of the ingestion and extraction rulesets in an efficient manner. 5.6.1.1 Describing the usage of Ingestion Rulesets NRC experts shall guide the construction of rulesets that relate information elements from national datasets with entries in the EAGLE ontology. The NRC experts should have the following information available before starting to formulate the ingestion rulesets: • The EAGLE ontology describing the EAGLE codes and their relationships • The name of the Information element to be ingested • The nomenclature for the dataset to be ingested. • The dataset name and the accompanying metadata information from the catalogue o Original resolution o Temporal extent o Geographic projection / Reference System

Therefore, NRC Expert should strive to update the catalogue with the relevant information in a structured manner before starting to formulate the Ingestion Rulesets (since the ingestion of data will require this information).

The following information shall be provided by NRC experts and be available usage when ingesting data: • The version of the EAGLE ontology supported • The original dataset name with a description of the given format • The chosen Information Element Name to be translated • The priority of the Ingestion Ruleset on a given CLC+ instance. • The country or organisation responsible for the ruleset • The required semantic match of the Information Element name with a possible decomposition into one or more related EAGLE codes. • Information on what action should be taken in case of incomplete data or in situ data • Information about potential mitigation actions if the ingestion of the information element fails (should the entire ingestion be stopped, or can it be ignored?) • Information on how to possibly aggregate the information element so that it can be stored with reference to the EEA39 reference grid (100x100m – and later potentially 10x10m)

22 DCAT2 is described at https://www.w3.org/TR/vocab-dcat-2/ 23 An OBDA approach is described in http://cgi.di.uoa.gr/~koubarak/publications/2018/igarss2018.pdf

41 | P a g e

A user interface should be constructed to guide the entry of this information and perform trial runs of the Ingestion Rulesets before allowing the user to save it for use when ingesting. Note that the information stored in the NRC Ingestion Rulesets is supposed to aid an automated translation procedure, so the information should be expressed in a machine-readable manner. The intention is that queries should be expressed that inserts data into CLC+ core during the execution of an import job. These queries should be expressed using the knowledge captured in the NRC Ingestion Rulesets. After the import job has finished, we will run an export job that combines the entries in the EAGLE data model so that it fills out the nomenclature in the instance. 5.6.2 Extracting data A CLC+ Core instance data shall combine CLC+ Core elements, derived from existing data sources, national datasets, that has been ingested into and stored with reference to EAGLE ontology. Depending on which instance and what priorities exists for the given instance, data will be ingested into the instance using specific rulesets. Extraction of data for the CLC+ instances will then be performed using extraction rulesets that references EAGLE barcodes.

Extraction Ruleset

An extraction ruleset will identify a query formulated to combine LCC, LUA and CH elements from the CLC+ Core to a specific information element in a CLC+ instance given query rules. The combination of several Extraction rulesets will be used to construct a CLC+ instance.

The extract ruleset shall consist of: • The name of the chosen target information element • Knowledge about relevant EAGLE codes that relates to the target information elements (captured in the EAGLE barcode). • The ability to capture logical expressions on combinations of the of the marked data in the EAGLE barcode • A priority for the ruleset (clearly marking which ruleset should be run in case of overlapping rulesets).

The intention is that the Extraction Ruleset should contain all the necessary information needed to obtain the value for a given information element in the target nomenclature.

Definition 6 Extraction Ruleset

In the case of differences in code attributions schemes it should be possible to create separate CLC+ instances, clearly identifying which attribution schemes created what instance and how. The following CLC+ instances are currently planned: CLC+ LULUCF

Combining the attributions of Land Use form the CLC+ Backbone with additions of Land use data from member states using the EAGLE ontology and allowing for comparison across observation

42 | P a g e periods. The CLC+ LULUCF instance will capture data that specific to member state needs related to LULUCF reporting – potentially supplying data with higher resolution or with more recent updates than is centrally available. Definition 7 CLC+ LULUCF

CLC+ LEGACY

Mapping the existing CLC 2018 nomenclature from the EAGLE ontology to a resolution that is comparable with the existing products CLC products. This will include a combination of Land Cover data from CLC+ Backbone and Land Use and Characteristics data from the member states. This data should allow for the continuation of existing reporting efforts (supporting the Accounting Layers).

Definition 8 CLC+ LEGACY

5.7 The hybrid approach As a guidance to the structure of the CLC+ Core a hybrid approach is introduced. The purpose of the hybrid approach is to combine the power of the graph database with the established knowledge of traditional GIS procedures. The role of the graph database could be to maintain the structure of the EAGLE model, metadata, lineage and data access, ingestion rules and extraction rules. One could imagine a system where the graph database is used to query the references to the individual layers of a dataset. These layers are then loaded into a data cube where a series of operational steps are executed, and a final layer is presented as the resulting layer. In this situation the graph database is used to store and retrieve all information related to the datasets including a reference to the gridded data set but not the actual dataset. One could even consider the situation where the graph database stores a reference to a vector dataset and a method for translating the vector data to the grid of the data cube. Please note, that this implies that the actual datasets can be stored in several different locations and formats, as long as there is a method for importing the data into the data cube and that this method can be described in the graph structure and executed automatically. An example could be to construct ‘Forestland’ of LULUCF. The graph is queried to find information about land cover, land use and characteristics that are defining the Forestland. The required data sources, the temporal extent and the logic for combining the information is returned and the system loads the layers into a cube and executes the returned operations.

43 | P a g e

6 PART III (INFORMATION) - CLC+ - THE LONG-TERM VISION

CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in Europe. CLC+ shall meet the requirements of today and tomorrow for European LC/LU information needs, building upon the EAGLE concept and expertise. CLC+ shall be understood as a suite of products addressing and replying to various user needs and policy demands. These individual products will be derived from CLC+ Core and will represent different “instances” (i.e. realisations), related to specific nomenclatures e.g.: • a CLC+ Instance with a 1 ha MMU (or another required MMU value); • a MAES-Instance; • LULUCF Instance; • CLC+ Legacy Instance; • CAP2020 Instance; • a national nomenclature instance (e.g. a SIOSE-Instance, a LISA-Instance); or related to ecosystem types: • a wetland-Instance; • a forest-Instance. Basic considerations in creating a new version were: • The CLC nomenclature instances should be able to serve as basis for creating CLC+ Legacy with simple aggregation or simple query steps; • CLC+ Instances nomenclatures should be able to give a map output, • Preferably all types of features and real life instances in CLC nomenclature (see: https://land.copernicus.eu/user-corner/technical-library/corine-land-cover-nomenclature- guidelines/html/ ) should find their place in CLC+, preferably under the same class (not always fulfilled, for logical reasons); • In general class descriptions of old CLC should be applicable, unless otherwise noted (e.g. in mines also all related infrastructure and areas are included); • In subclasses and attributes compliance with latest version of EAGLE data model wherever possible. The current proposal has been elaborated by the EAGLE team, most of whose members are NRCs LC in the Eionet. The design of CLC+ also aims at: • Evolution of the new Copernicus EO and In-situ datasets that can support an enhanced extraction of LC/LU classes, attributes and parameters (even if they give only partial

44 | P a g e

coverage in some cases). The times series of Sentinel data, improved quality of VHR data and the upcoming improved EU DEM or alternatively the Copernicus programme level DEM; • Recent scientific development and operational achievement for enhanced classification depth by making effective use of the evolution of the Copernicus Space Component, third party missions and In-situ datasets; • The capacities, but also the rather slow evolution of availability and access to existing data of Member States, which in particular are needed to classify functional, but also other land use units, that can only partly be assigned by automated means or visual interpretation, often requiring a high amount of labour input; • The mapping guidelines of the Pan European and Hot Spot Land monitoring components; • Additional operational mapping guidelines and recent relevant developments, e.g. the German System for the Survey of biotopes and Land use types and the SWOS modifications to the wetland ecosystems in MAES nomenclature and others. Technically the CLC+-1ha-Instance is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m), created from CLC+ Core, i.e. already including information from CLC+ Backbone (e.g. pixel classification), existing HRL, hot spot monitoring products and national data stored in CLC+ Core. Such an instance could: • address and overcome the limitations of the spatial resolution of CLC; • overcome the problems created by using heterogeneous classes, while still allowing their creation in CLC+ Legacy to maintain the “landscape vision” of CLC; • provide a better discrimination of specific landscape features by introducing a limited number of new or sub-level classes; • keep the CLC nomenclature unchanged to a large degree, while allowing it to address known major shortcomings in the classical CLC nomenclature (by attribution and/ or introduction of subclasses); • make use of a consistent and reproducible method of mapping changes at the 1 ha level for future land monitoring at a much higher spatial resolution than with the traditional CLC. A CLC+ Legacy-Instance would then guarantee the backwards compatibility with the original CLC datasets, implying: • the use of a 25 ha MMU; • the use of the standard 44 class CLC legend.

The information provided here on CLC+ and its different instances is given to set the context for the current and future work on a revised and improved CLC nomenclature. Finalization of such a revised nomenclature is not the scope of the current document, but it provides an initial proposal for discussion. Nonetheless, CLC+ Core and the CLC+ product suite needs to ensure the provision of a minimum set of parameters to allow the creation of CLC+ Legacy (for 2018). The 2018 reference year provides a unique opportunity for creating both a traditional CLC update (by the MS), meanwhile established over 38 countries and a CLC+ Legacy (via CLC+ Backbone, CLC+ Core and optionally, but not

45 | P a g e necessarily via CLC+ Instance24). The ability to create CLC+ Legacy will therefore be a prime requirement. The proposed nomenclature and structure of CLC+ and its CLC+ Instances does not deviate too much from the classical CLC to allow a comparison of both approaches. With respect to its thematic content, CLC+ will need to allow the creation of CLC+ Legacy, building on the information of CLC+ Core. To allow a straightforward creation of the traditional 44 classes of CLC+ Legacy the nomenclature of CLC+ is proposed basically as a level 4 version of the standard CLC nomenclature including additional attribution. In this way CLC+ Legacy can be derived mostly by simple thematic aggregation (partly by combination of attributes), while more thematically more detailed information is available for further assessments. An initial proposal for a level 4 classification has been prepared, considering user needs expressed e.g. in the CLC survey conducted by ETC-ULS in 201625, former CLC production experience, CLC bottom up national test case for Hungary, as well as feasibility. The current proposal is still open for internal discussion and a widespread external consultation, particularly among NRCs Land Cover and European institutional stakeholders. Basis of current proposal are the outcomes and findings of the above mentioned CLC survey, during which the following classes were marked as primary candidate for division into subclasses: • 121 – division of industry from services and utilities, green energy industry separated; • 133 – “construction of artificial features” and “nature construction"; • 142 – cemeteries and sport/recreation (subclasses e.g. sport arenas, sport fields and racecourses, golf courses, ski slopes, parks (outside urban fabric), allotment gardens, archaeological sites and museums, holiday villages (temporary residential areas), sport airfields); • 231 – agricultural grasslands (pastures, meadows) and non-agricultural grasslands under strong human impact; • 324 – forests growth areas (succession, afforestation, reforestation) and decline of forests (clearcutting, damage); • 512 – natural and man-made water surfaces.

Level-4 subclasses mainly focus on: • LU issues for classes 1xx; • LU intensity / cultivation method / LU history for classes 2xx; • habitat/geography for classes 3xx and 4xx; • naturalness for classes 5xx. This is mostly in line with the proposal of MS in the CLC survey and practices of MS states in national LM initiatives. Attributes that can be applied for several classes are proposed to have a priority in introduction to the CLC+ product suite. These mostly aim to tackle issues related to LC percentages, pattern,

24 Given the fact that CLC+ nomenclature is not yet consolidated and that collecting necessary input information for traditional CLC already poses a big challenge; in a stepwise implementation of the CLC+ concept, first the bottom-up creation of CLC+ Legacy could be tested (directly from CLC+ Core, leaving out the CLC+ step). 25 Service Contract No 3436/R0-Copernicus/EEA.56586 Task 3: Planning of cooperation with EEA member and cooperating countries. FINAL report in preparation of the CLC2018 exercise.

46 | P a g e parameters, crop, species, bio-geographical characteristics. A first tentative list of attributes is presented in Section 14 (Error! Reference source not found.). Obviously, many other land monitoring products with various nomenclatures (e.g. MAES instances) and spatial frameworks may also be defined on the basis of CLC+ Core in the future, also addressing the separation of land cover and land use (already foreseen in CLC+ Core), the use of attributes in the object characterisation or the inclusion of ecosystem or habitat information. But this will be topic of future discussions and developments, not part of the current work. Despite the 1 ha MMU of a CLC-1ha-Instance, the information provided from the pixel based classification of CLC+ Backbone and stored in CLC+ Core will allow the reporting of fractions of 1 ha units to respond to more detailed (i.e. 0.5 ha) reporting obligations, via the inclusion of attributes. Due to the higher geometric resolution, the use of some of the often-debated heterogeneous classes, like 2.4.2 (complex cultivation pattern) and 2.4.3 (land principally occupied by agriculture, with significant areas of natural vegetation) will be significantly reduced, if not disappear altogether. Other conceptually heterogeneous classes like 2.4.4 (agro-forestry) will remain widespread as they also have a meaning at the 1 ha level. As the original CLC nomenclature is not a pure land cover classification there will be the need to make use of more information than provided by the CLC+ Backbone and Copernicus Land Monitoring Service. Table 6-1 provides an overview of the traditional 44 CLC classes and information needed in addition to the information provided by the Local Components, the High Resolution Layers and the land cover information collected by CLC+ Backbone in order to derive a specific CLC class. The Hungarian test case26 for implementing the EAGLE concept in a bottom-up CLC production, as well as the EAGLE-based analysis of CLC) has clearly shown that content and quality of a bottom up derived CLC is ultimately dependent on input data on: • Land Use (critical); • Crop types (e.g. from LPIS) and agricultural use intensity; • Habitat (vegetation composition) information. The situation is different for each CLC class and Table 6-1 clearly shows that many CLC classes cannot be derived without additional information on the land use aspect of the area. The current hot spot monitoring products are only partly able to overcome this lack of land use information as they are only available in selected areas, not covering all the European territory. In almost all cases this information gap can be closed with data from MS (databases or photo-interpretation). In case MS data is not available, a few classes can be approximated via European level information, but most certainly at the expense of data accuracy and precision. The MS information should be provided as input to CLC+ Core. In case of missing national land use information as well as no other European- level replacement data27, traditional CLC data or production methods (e.g. photo interpretation of specific land use classes) might need to be used as back-up solution.)

26 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. Final report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country 27 Wall-to-wall HRL can be used to address density issues or leaf types, but not they cannot be used as replacement information of land use.

47 | P a g e

The new CLC+ Instance nomenclature will demand similar, but thematically much more detailed input information on the pan-European level. The following options for the origin of information may be considered in the CLC+ concept: • Automated extraction from EO data (spectral properties and time series); • DSM data (using the EU-DEM) or DTM data (a future European DEM?); • Visual interpretation from VHR EO or Airborne data (by MS); • Open source Databases e.g. OSM; • Commercial databases and maps; • EU Member state data through INSPIRE, EuroGeographics, Open databases or direct bi- lateral consultations; • LUCAS; • Existing CLC maps (especially for LU information).

Having in mind the challenges of collecting information required for creating CLC+ Instances, EAGLE experts suggest to: • Develop optional roadmaps and scenarios depending on the timely access to spatial database of the member states, e.g. the LPIS, INSPIRE and national geodatabases. Hereby it is important to develop use cases how to handle the availability, completeness and quality of datasets of different EU member states and harmonized integration with European wide input dataset delivered by Copernicus or other sources; • To tender out a specific study reviewing recent scientific developments regarding enhanced information extraction from operational Earth observation data and complementary spatial data products as e.g. the EU-DEM; • Analyse the mapping guidelines of the Pan European and Hotspot Land components as well as other operational mapping guidelines and recent relevant developments. Several options for (stepwise) implementation of CLC+ product suite is possible for land monitoring actors in the coming period: a) Basic, geometrically enhanced (1ha) CLC+ Legacy instance; b) Geometrically and to an extent thematically enhanced 1ha CLC+ Legacy instance with a limited number of attributes derived from CLC+ Backbone; c) Geometrically and to an extent thematically enhanced 1ha CLC+ Legacy with a limited number of additional level-4 classes derived by Earth observation and combination of existing and upcoming Copernicus products; d) Same as b) with input from MS to derive existing and new LU classes, instead of photointerpretation. Before operational implementation of these new CLC+ Instances, prototyping tenders should develop processors to test feasibility and validate results.

48 | P a g e

Table 6-1: List of current CLC classes and requirements for external information

49 | P a g e

7 CLC+ LEGACY (EXAMPLE)

The history of European land monitoring is a story of permanent evolution of approaches and concepts, resulting in different solutions for individual needs and specifications. In this context, the CLC specifications provided the first set of European-wide accepted de-facto standards, i.e. a nomenclature of land cover classes, a geometric specification, an approach for land cover change mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC represents a unique European-wide, if not global, consensus on land monitoring, supported by 39 countries. Being the thematically most detailed wall to wall European LC/LU dataset and representing a time- series started in the early 1990’s CLC data became the basis of several analyses and indicator developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that the planned new European land monitoring framework (2nd generation CLC) has to be able to ensure a possible maximum of backwards compatibility with standard CLC data. This comparability will be ensured by CLC+ Legacy – a data set with 25 ha MMU and the standard 44 thematic CLC classes but derived by spatial and thematic generalisation from CLC+ Core (and maybe CLC+ Backbone, TBD). Considering today’s already existing different CLC production methods, i.e. traditional visual photo-interpretation vs. national bottom-up solutions, shows that those specific methodologies result in CLC data with slightly different characteristics in terms of LC/LU statistics or fine geometric detail, but still respecting and adhering to the general CLC specifications. Although original CLC vector data are still used mainly for cartographic purposes, in practice the most commonly used versions of CLC data are CLC status and CLC change raster layers. That is why CLC+ Legacy is also proposed as a 100 x 100m raster data set. As already mentioned in chapter 6, the year 2018 offers the unique opportunity of comparing two different, i.e. a traditional and a modernised approach making use of capacities from the Copernicus service industry and MS. Comparing CLC2018 and CLC_Legacy_2018 will allow to assess the strengths and weaknesses of both approaches. The ability to compare traditionally produced CLC2018 and CLC_Legacy_2018 will be an important benchmark for assessing the success of this new production method. The production method is, however, not the only source of variation. The specification of the nomenclature, combined with the variation in land cover and landscape across the European continent, is necessarily leading to large variability within each class and partial overlap between classes (Figure 7-1). Variation is inherent in the CLC methodology and variation due to the production method should not cause any major concern.

50 | P a g e

Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are correct. For the practical creation of a CLC+ Legacy 201828, the production process is facing a classical dilemma that is associated with a change of methodology: How to derive a CLC+ Legacy 2018? The production process of CLC+ Backbone, CLC+ Core and CLC+ will only provide status information for 2018, not any changes relative to CLC2012. A possible solution to this dilemma (i.e. to map changes relative to the last update) is the approach used by Germany that moved from a traditional CLC mapping approach to an automated approach in 2012, using a “status layer first” approach instead of “change mapping first”. The result of that methodological change was a delineation of CLC2012 objects based on national high spatial resolution data with a different geometry than the previous CLC2006 objects. For an illustration, see also the areas marked in Figure 7-2. In a subsequent step it was then necessary to separate the differences between CLC2006 and the new CLC2012 into “technical changes” (due to the new mapping approach) and “real changes”. This step involved a mostly manual revision and the constitution of the change layer 2006-2012. In subsequent update cycles it may be possible to derive changes from the new versions of CLC+, by creating a difference layer, then manually separating real changes from differences between datasets (resulting from differences in CLC+ Core input data). Additional manual change detection might be needed for changes that are not automatically detectable due to lack of up-to-date input information (e.g. changes of land use without land cover change) or for classes that are traditional not part of national datasets (e.g. constructions sites or real extension of active mineral extraction sites).

28 The process of deriving CLC+ Legacy from CLC+ Backbone, CLC+ Core and finally CLC+ should not be confused with the traditional creation of CLC2018 by the MS.

51 | P a g e

CLC2006 CLC2012 Figure 7-2: Result of changing the CLC implementation method in Germany For a future CLC2024 (assuming the 6-year update cycle) the change mapping approach could be the combination of the traditional “change mapping first” approach and the ‘update first’ approach. • Information from 1 ha CLC+2018 and CLC+2024 would be used to derive 1 ha changes with strong manual control, to separate differences from real changes;

• These 1 ha changes would be aggregated to 5 ha change objects (i.e. CLC-Changes2018-2024); • The 5 ha change objects (2018-2014) would then be added to CLC2018 to create a CLC2024. Methods and approaches for how to obtain 25 ha status layers and 5 ha changes from higher spatial resolution data exist already in some MS.

7.1 Experiences to be considered The following experiences can be considered for establishing the grounds for creating CLC+ Legacy: • CLC accounting layers are used as input for EEAs Land and Ecosystem Accounting (LEAC) system. The methodology was developed by ETC/ULS to create these layers, as well as for the raster generalization of CLC accounting layers29; • Methodologies for bottom-up creation of CLC data have been developed by several CLC national teams. This includes spatial and thematic aggregation of in-situ based high spatial resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain). In most cases, rules were developed to allow thematic conversion in an automated or semi- automated fashion. For spatial generalisation different techniques are available, depending on the starting point in each of the countries; • CLC+ national test case for Hungary30.

29 Generalization of adjusted CLC layers. ETC/ULS report 2016. 30 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. FINAL REPORT: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country

52 | P a g e

8 ANNEX 1 – CLC NOMENCLATURE

53 | P a g e

9 ANNEX 2: AVAILABLE ANCILLARY DATA FOR CLC+ BACKBONE

Three different possible data sources are identified for transport network: • Open Street Map data • National reference data • Commercially available navigation data For rivers two European datasets are available • Water framework directive – WISE • EU-hydro (COPERNICUS product) 9.1 Open Street Map data (linear networks) 9.1.1 Introduction Throughout the last 10-15 years the Open Street Map initiative has turned into a major and comprehensive provider for road and railway networks worldwide. Data is collected by volunteers and the product is provided with an open-content license. One major criticism when using crowd-sourced data is their incompleteness and varying quality. For OSM road dataset several recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball, 201731) have shown a completeness in Europe of more than 96 % for most the EEA 39 countries. According to studies only Serbia, Bosnia-Herzegovina and Turkey have a completeness below 90 %. These three countries may be subject for a refinement of roads using additional national data sources. In figure 10.1 below the completeness of OSM in four countries (Estonia, Portugal, Romania and Serbia) is visualised on aerial images and Sentinel-2 background data.

31 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698

54 | P a g e

Figure 9-1 Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania (near Parta) and Serbia (near Indija) (top to bottom). Roads are displayed according to their functional road class (FRC). The legend of FRC is given in the figure below. © Bing/Google and EOX Sentinel-2 cloud free services as Background (left to right).

Figure 9-2: Legend for functional road class (FRC) as defined in OSM data

9.1.2 OSM road nomenclature The following table identifies all OSM road types that should be considered for CLC+ Backbone Level 1 delineation. In many countries, most roads are tagged as “unclassified”. Only highway = motorway/motorway_link implies anything about quality. (Source: https://wiki.openstreetmap.org/wiki/Key:highway)

55 | P a g e

CLC+ Potential Key Value Comment Backbone 20m width y wide highway motorway A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc.. y highway trunk The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.) y wide highway primary The next most important roads in a country's system. (Often link larger towns.) y highway secondary The next most important roads in a country's system. (Often link towns.) y highway tertiary The next most important roads in a country's system. (Often link smaller towns and villages) y highway unclassified The least most important through roads in a country's system – i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.) y highway residential Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing. y highway service For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances. Link roads

Y wide highway motorway_link The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions. y highway trunk_link The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway.

56 | P a g e

CLC+ Potential Key Value Comment Backbone 20m width y highway primary_link The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway. y highway secondary_link The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway. y highway tertiary_link The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway. Special road types y highway track Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track.

57 | P a g e

9.1.3 OSM roads: completeness assessment Estimation of OSM completeness according to parametric estimation in study of Barrington-Leigh and Millard-Ball (2017)32. A sigmoid curve was fitted to the cumulative length of contributions within OSM data and used to estimate the saturation level for each country. Country ISO-Code frcComplete_fit OSM length [km] Serbia SRB 74,3% 61.233 Bosnia and Herzegovina BIH 86,5% 37.877 Turkey TUR 89,7% 501.384 Sweden SWE 92,7% 280.107 Norway NOR 93,6% 132.527 Hungary HUN 93,8% 90.670 Romania ROU 94,3% 162.844 Ireland IRL 96,0% 107.263 Spain ESP 96,2% 509.937 Liechtenstein LIE 96,3% 359 Croatia HRV 96,8% 60.119 Bulgaria BGR 96,9% 80.983 Cyprus CYP 97,0% 12.091 Albania ALB 97,6% 22.875 Italy ITA 97,9% 594.384 Belgium BEL 98,3% 103.778 Lithuania LTU 98,5% 69.885 Poland POL 98,5% 358.733 Iceland ISL 98,6% 17.871 France FRA 99,9% 1.164.238 Denmark DNK 100,3% 96.656 Estonia EST 100,3% 42.878 Portugal PRT 100,4% 150.176 United Kingdom GBR 100,4% 443.265 Netherlands NLD 100,5% 136.118 Greece GRC 100,8% 156.586

32 Barrington-Leigh C, Millard-Ball A (2017) The world’s user-generated road map is more than 80 % complete. PLOS ONE 12(8): e0180698. https://doi.org/10.1371/journal.pone.0180698

58 | P a g e

Country ISO-Code frcComplete_fit OSM length [km] Slovakia SVK 101,4% 41.566 Austria AUT 101,4% 128.934 Luxembourg LUX 101,5% 6.210 Germany DEU 101,6% 744.304 Latvia LVA 101,7% 61.066 Montenegro MNE 102,2% 12.346 Macedonia (FYROM) MKD 102,4% 15.174 Switzerland CHE 103,0% 73.066 Czech Republic CZE 104,0% 112.441 Finland FIN 105,3% 231.298 Kosovo XKO 107,4% 13.297 Malta MLT 109,8% 2.241 Slovenia SVN NoData 36.041

9.1.4 OSM completeness and quality aspects An analysis of the OSM completeness and quality was undertaken as part of the ETC ULS Action Plan 2017 Task 1.8.1.4 – landscape fragmentation indicator fact sheet (ETC-ULS, Dec. 2017, M. Kopecky, T. Soukup). As mentioned the OSM dataset has been collaboratively improved already for 12 years33, so it is expected that completeness and consistency of OSM will continue to gradually improve, especially in Europe. This is confirmed also by the growing number of commercial solutions seen in navigation and web-mapping using the OSM data. Nevertheless, the real answer on essential questions like “How matured is the OSM actually?” or “From which point in time the OSM can be considered as mature for specific purposes?” are still far from trivial. Many studies aiming at assessing the quality of OSM data originated in previous years, and the quality of OSM data is still being monitored by the scientific community. These studies evaluated data from a variety of aspects - from a folksonomy (completeness and relevance of descriptive tags) point of view34, from the point of view of completeness by comparison with reference data35,36, track changes in the number of contributors and changes made in the dataset (these authors derive an

33 See Evolution of European OpenStreetMap coverage at https://www.youtube.com/watch?v=vUKKGgN1Dc8 34 Mocnik F.B, Zipf A., Raifer M. - The OpenStreetMap folksonomy and its evolution (Geo-spatial Information Science Volume 20, 2017 - Issue 3: Special Issue: Volunteered Geographic Information (VGI)-Analytics. http://www.tandfonline.com/doi/full/10.1080/10095020.2017.1368193 35 Neis P - Analysis of User-generated Geodata Quality for the Implementation of Disabled People Friendly Route P Planning (dissertation thesis). https://www.geog.uniheidelberg.de/md/chemgeo/geog/gis/dissertation_pascal_neis_reduced.pdf 36 Zielstra D., Zipf A. - A Comparative Study of Proprietary Geodata and Volunteered Geographic Information for Germany. http://www.geog.uni-heidelberg.de/md/chemgeo/geog/gis/agile2010_zielstra_zipf_final5.pdf

59 | P a g e increase in the number of mapped elements and their refinements from these changes) etc. For our needs, most of these studies37 38 39have several disadvantages:

• they are focused on the total number of elements in the database without their division; • study only data from a specific territory (city, region or country)40. The most extensive analysis of data in terms of completeness globally is provided by the Barrington- Leigh and Millard-Ball study that works with data for the whole world.41 This study combines two research methods – visual comparison with aerial photography, and fitting parametric models to the historical OSM street network. Figure 9-3 Global OSM Road density and completeness map - Barrington-Leigh and Millard-Ball study

However, it only estimates actual status at the time of the study creation. Information on the development of OSM data (with respect to road network) can’t be gained from it. The Barrington- Leigh and Millard-Ball study mentioned above was used also for the OSM roads completeness estimation for the CLC+ purposes as mentioned in the CLC+ specification draft report42. A sigmoid curve was fitted to the cumulative length of contributors within OSM data and used to estimate the saturation level of each country. There are also some reports available from prior ETC-SIA/ETC-ULS activities, focused on the OSM, Eurogeographics EuroRegionalMaps database and TeleAtlas MULTINET database comparison in the previous years43. OSM stability and completeness was analysed there to some extent but using only

37 https://wiki.openstreetmap.org/wiki/Completeness 38 https://wiki.openstreetmap.org/wiki/Research 39 https://osmstats.stevecoast.com/dashboard/cz/raceway 40 Evolution of road network length in Flanders https://www.openstreetmap.org/user/joost%20schouppe/diary/40472 41 Barrington-Leigh CH., Millard-Ball A. – The world’s user-generated road map is more than 80 % complete. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698 42 EEA/IDM/R0/17/003 - Technical specifications for implementation of a new land-monitoring concept based on EAGLE; D2: Draft design concept and CLC+ Backbone technical specifications, including requirements review. 43 https://forum.eionet.europa.eu/etc-sia- consortium/library/2012_subvention/261_emerging_issues/landscape_fragmentation/critical-analysis- data/download/en/1/gst_sia12_fragmentation_data_v2.1.doc

60 | P a g e limited number of sample countries or regions and the work didn’t have an ambition to cover all EEA39 countries. 9.2 INSPIRE national portals: roads & railways The project ELF – European Location Framework44 – has compiled overviews for existing national services mainly for INSPIRE Annex I themes (roads, buildings, hydrology). They try to serve as single point of access for harmonised reference data from National Mapping, Cadastre and Land Registry Authorities. The INSPIRE directive and subsequent regulation will certainly lead to an increased availability and accessibility for European reference data, but at the current point in time the data situation even for core national data is still quite patchy and requires large efforts for searching, finding, compiling and harmonizing (semantically and geometrically) the various dataset. The recent survey (status: 7. June 2018) in the official INSPIRE portal 45 reveals that in total 103 datasets are available from 14 member countries for download in the data theme “transport networks”. Some countries only provide separate sub-nationally distributed datasets, but not a singular nationwide download service. By 7th June all 28 countries provide metadata for at least one dataset in the Annex theme “transport”, but only 14 MC provide downloadable datasets. From these downloadable datasets only 9 MC provide datasets with national coverage. As the INSPIRE theme “transport” provides a frame for a large variety of features, a downloadable dataset does not mean automatically that a national reference road and railway transportation network is available.

Figure 9-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: 7. June 2018).

44 http://locationframework.eu/ 45 http://inspire-geoportal.ec.europa.eu/thematicviewer/Theme.action?themecode=tn

61 | P a g e

Figure 9-5 Distribution of number of downloadable datasets for INSPIRE transport service across member countries differentiated according to national and subnational coverage (Status: June 2018)

The advantage of using national data instead of a European wide dataset should be carefully evaluated regarding costs and benefit. First, it is still a challenge to find the appropriate dataset. Although it is known that other national services exist (e.g. Spain), they cannot be found using this central access portal (INSPIRE thematic viewer). The analysis of the INSPIRE thematic portal is revised using expert knowledge in the separate EEA- study to deepen the knowledge on ancillary data input. Experts in the field of INSPIRE and national data sources have established a list of downloadable transport data per country. The list is available as Table 11.7 to this report. The complete results are available as separate EEA-report (D146). The currently available national data are not harmonized across member countries regarding, for instance, the definition of road-types or completeness of road types and provide a large variety of licenses. For future activities EuroGeographics has announced that they will provide European reference datasets. In the long term, it is expected that (upon full implementation of INSPIRE in EU28) national in-situ data will become available and accessible as consistent pan-European coverage in appropriate quality at harmonized access conditions.

9.3 Rivers and lakes 9.3.1 WISE: One example for a joint European (EU 28) dataset is the reference spatial data set of the water framework directive (WFD) that is part of the water information system for Europe (WISE)47. They compile national information reported by the EU Member states to one homogeneous dataset.

46 3436/R0-Copernicus/EEA/IDM/57292: Task 1: Deepen knowledge on hard bone selection, Sept. 2018 47 https://www.eea.europa.eu/data-and-maps/data/wise-wfd-spatial

62 | P a g e

According to the reporting cycle of the WFD that requires an update ever 6 years, the dataset is available in a version 2010 and 2016. Such datasets may provide a valuable data source for the generation of “hard bones”, but still do not provide the optimal solution. In the case of the WISE WFD dataset the coverage is in principal restricted to EU 28, but in practice not even complete within EU 28, as countries like UK, Ireland, Slovenia, Lithuania and Greece are not included yet (UK and Slovenia are only available for EEA and EC for visualisation purposes). Compared to the EU-Hydro dataset they provide in some parts a higher geometric quality, but are less complete, as smaller rivers of catchments less than 10 km2 are not included by definition. 9.3.2 EU-Hydro: EU-Hydro is Pan-European hydrographic datasets developed in the frame of the GMES RDA Preparatory Action in 2009-2012. Based on satellite imagery and other ancillary data sources they are fully in line with the Copernicus data and information policy regulation. • The geometric quality of EU-HYDRO shows a high degree of variation in geometric positional quality. In some countries this quality is even higher than the comparable WISE dataset, whereas in other countries the quality not appropriate for the CLC+ Backbone use case. • Data source: o EU-HYDRO exists already in various versions. For the assessment under this task the latest version available from the official land monitoring COPERNICUS website was used ▪ EU-HYDRO Public Beta ▪ https://land.copernicus.eu/pan-european/satellite-derived-products/eu- hydro/eu-hydro-public-beta

9.4 Priority geospatial datasets for the European Commission In the 8th meeting of the INSPIRE MIG-P the European Commission has presented a paper on “priority geospatial datasets for the European Commission”48. The goal of this paper is to provide a consolidated and consistent overview of Commission user needs and set out cross-cutting and domain specific requirements of the European Commission for EU wide geospatial information and data products from Member States in support to Sustainable Development and other EU policies. The themes covered are: Buildings (BU), Cadastral Parcels (CP), Addresses (AD), Administrative units (AU), Statistical Units (SU), and Transport Networks (TN) They state that the current offer from Member States as distributed via EuroGeographics is no longer sufficient to meet the additional user requirements in terms of content and spatial resolution. The Commission recognises that Member States via EuroGeographics under the European Location Services (ELS) project are working on future products that may represent a valuable offer to meet also the Commission requirements. The work of the commission is aligned with the work of UN-GGIM that undertakes similar work to define the specifications for 14 INSPIRE data themes. Regarding current state of play, the EC mentions that a huge implementation gap between the number of metadata records and the number of download services to obtain these data exists. Also,

48 https://ies-svn.jrc.ec.europa.eu/projects/mig-p/wiki/8th_meeting_of_the_MIG

63 | P a g e the analysis of the INSPIRE thematic viewer clearly show that several Member States are lagging behind in the implementation of certain INSPIRE themes. In terms of technical harmonisation, a study by EEA shows that the above-mentioned core reference data are not yet available from the INSPIRE geoportal with harmonised physical data models and functioning services. Similar results have been observed by Eurostat.

9.5 Conclusions from evaluation of datasets With regard to current state of play of MS geospatial datasets the European Commission3 understands that there is an implementation gap between the number of metadata records and the number of download services to actually obtain these data that exist, and that several Member States are lagging behind in the implementation of certain INSPIRE themes, at least as it can be understood after analysing what is available in the Inspire Geoportal and Thematic Viewer. In non-technical terms, varying data access and use conditions remain a huge issue for EU wide use of harmonised data from official Member States data sources. There is a growing trend to ‘open data’ is evident on the European scale. This is fully in line with the principles of INSPIRE and its main requirement to make data accessible. At national level, however there is still a diversity of license conditions between countries and in some cases even among national public authorities. In addition, organisations have adopted different technical means to restrict access to data services, impacting on the ease and time taken to download data when needed from several sources. The Commission recognises that Member States via EuroGeographics under the European Location Services (ELS) project are working on future products that may represent a valuable offer to meet also the Commission requirements. In the 8th meeting of the INSPIRE MIG-P the European Commission has presented a paper on “priority geospatial datasets for the European Commission”49. From the analysis carried out in this task, it is considered that these initiatives (and the provision of harmonised national data) could eventually become useful for the approach towards a 2nd generation CLC and in general for other monitoring initiatives of the European territory. However, taking into account and trying to satisfy the maximum scoring for the evaluation criteria used in this assessment these multiple sources are not mature in terms of harmonisation to support this first iteration of CLC+ Backbone in a cost effective and time efficient manner. As the specifications for CLC+ Backbone are to be turned into an operational contract already in 2019 the current data situation must be considered for this service contract. All potential future updates should be evaluated based on potential new reference data and/or new conditions and accessibility to existing reference data that might be available at the future date of the update.

9.5.1 Conclusions roads and railways Open Street map data should be used in general for roads and railways. When comparing the scoring values of OSM and national dataset on a country-per-country level it turned out that OSM reached higher scoring values in 38 countries. Spain is the only country with equal scoring values. For an operational service OSM turns out to be the best current single source solution as no major quality restrictions could be observed that would call for a hybrid solution (mixing OSM and national

49 https://ies-svn.jrc.ec.europa.eu/projects/mig-p/wiki/8th_meeting_of_the_MIG

64 | P a g e data sources on a country-by-country decision). However, the integration of national data would provide additional advantages, e.g. regarding the acceptance of final results within a country. As any hybrid solution requests additional efforts for data integration the principal decision to invest in a hybrid solution may be taken based on a critical area threshold for countries that can provide such data with the necessary open license. The definition, if the critical threshold is reached already with a single country, or if larger area coverage (and thus a group of countries) is required out of a European perspective (e.g. at least 25 % of EEA 39 area) has to be taken by the tendering institution. Future updates of CLC+ Backbone will each time require an ad hoc assessment of the progress made under INSPIRE and the cost-effectiveness of a hybrid approach with regards to the most appropriate decision on choice of transport network data to be used. In future a hybrid solution may be followed, once a critical mass of countries is in a position to provide a homogeneous road dataset derived from national sources. In general, larger countries justify their inclusion due to their larger area as the additional efforts to integrate additional data sources within a hybrid approach may be acceptable. Based on our assessment, in 35 out of 39 EEA countries national data are not able to fulfil the necessary perquisites for use in CLC+ Backbone. As mentioned above, the main bottlenecks are related to incompatible license conditions and pricing policies. From the 39 EEA member and cooperating countries only 4 provide (at the current stage) road datasets with an open and free data license and sufficient quality for CLC+ Backbone. From these 4 countries two countries (Spain and UK) present larger areas whereas two further countries (Austria and Luxembourg) rather represent smaller area fraction compared to the necessary coverage of EEA member country territory. When suggesting the ingestion of national data in the process, the effort (cost/benefit) needs to be evaluated, taking into account elements as area covered, number of data models to be adapted (e.g. when dealing with regional differences) etc. As stated above any hybrid approach would need a critical area threshold in the sense of proportion of the total European coverage, or contiguity of groups of countries, etc., to justify the additional efforts needed for integration of various data sources. Whereas Spain turns out in the evaluation to provide the same scoring values for national datasets and OSM, the evaluation of the national UK is marginally lower compared to OSM. Luxemburg and Austria are examples for countries with marginally lower scores for national data, but they could be used in a demo-case study as the existing national datasets come along with open and free licenses comparable to Copernicus. In general, it should be noted that the acceptance of the final product (CLC+ Backbone) will most likely be higher, if official national data would be used in the production process and this national data is in line with the Copernicus free and open data policy. But at the same time, we must respect that only a minor fraction of countries can currently provide national data in such a format. Unless a critical mass of countries can provide these data with sufficient area coverage a hybrid approach will request higher costs compared to the achievable benefits. The consideration of national data for the creation of CLC+ Backbone should consider following messages: • The use of national data may fit well with object shapes and geometries available in the country, thus potentially improving cross-scale compliance.

65 | P a g e

• The CLC+ Backbone may provide an incentive to countries to provide fully INSPIRE compliant data with matching Copernicus free and open license policies. Therefore, it is recommended to conduct in parallel to the operational tender for CLC+ Backbone a separate study where road dataset from national data sources are compiled and compared to OSM data to evaluate the consequences and the influence on the geometric structure of CLC+ Backbone. This would pave the way for the replacement of OSM data by national data for future updates, once the critical area coverage of national data is reached. In parallel it should be noted that from a technical viewpoint the predefined dataset (ATOM-feed) are still the preferred download service for INSPIRE datasets. They are easy to use in most GIS applications. Whereas WFS-Services and GML files require additional knowledge and expertise and come relatively often with restrictions (e.g. maximum number of features per request). However, they will gain major importance throughout the next years and should ultimately not establish any operational technical obstacles anymore. The current availability of download services, which have been tested in this study is provided in Table 10-2. The national data are evaluated according to the evaluation table. In addition, they have been grouped into four classes, as data availability and licensing conditions are currently undergoing rapid changes. Many countries are in a transition phase to provide national data more and more as open and free data. This might come too late for the CLC+ Backbone in spring 2019, but it should be considered for future updates. From the table below it becomes clear that only: • 4 countries (as already stated above) provide currently freely available data without restrictions; • 6 countries are considered to have their national data as open data already available, however either no download link could be found (not available) or the data is not delivered as single country file; • All other countries still struggle either with licensing conditions or geometrical issues regarding the requirements for CLC+ Backbone. Table 9-1: Grouping of INSPIRE national datasets for roads and railways according to CLC+ Backbone requirements and availability

Group group country ID 1 Data freely available: download Spain working and no Austria licensing/geometry constraint Luxembourg United Kingdom 2 Data should be freely available: Finland but download link not Denmark known/or technical problems and no licensing/geometry Hungary constraint Netherlands

66 | P a g e

Group group country ID Slovakia Malta 3 download link not known/or Sweden technical problems AND current France licensing constraints Belgium Czech Republic Romania Poland Italy 4 download link not known/or Norway technical problems AND Portugal licensing/geometric constraints Lithuania Latvia Estonia Slovenia Germany Iceland

Download links for data sources: OSM-download • https://download.geofabrik.de/europe.html • 1 file per country (Format: ESRI shape file OR .osm.bz2) • Total size: 18,2 GB

National datasets on a country-by-country base: • Same national evaluation scores AND no licensing constraints: o Spain o http://centrodedescargas.cnig.es/CentroDescargas/index.jsp ▪ data are divided into 54 provinces o upon request directly from IGN Spain: 1 country file

• Only marginal lower national evaluation scores AND no licensing constraints: o UK:

67 | P a g e

▪ https://www.ordnancesurvey.co.uk/opendatadownload/products.html ▪ OS Open Roads, 340 MB, Email-registration o Luxembourg ▪ https://download.data.public.lu/resources/inspire-annex-i-theme-transport- networks-roads-roadlink-2/20180621- 230150/TN.RoadTransportNetwork.RoadLink.gml.zip • Roads, 700 MB, GML-file ▪ https://download.data.public.lu/resources/inspire-annex-i-theme-transport- networks-rail-railwaylink-2/20180621- 230219/TN.RailTransportNetwork.RailwayLink.gml.zip • Railways, 5 MB, GML-file o Austria: ▪ Roads: GIP - https://www.data.gv.at/katalog/dataset/3fefc838-791d-4dde- 975b-a4131a54e7c5 • 1 file for Austria: linknetz + documentation (only in german) ▪ Railway: http://gis.bmnt.gv.at/wmsgw-ds/?alias=9d6ecb34-12b8- 4&request=GetData&id=ffed9fb4-4977-491b-9489- 474c13fbbd07&parentId=ac865dc1-8fa4-4f42-b843- 17c48b9360d2&filename=2017_line.gml • 1 file for AT: Streckennetz ÖBB (4,8 MB)

9.5.2 Conclusions rivers and lakes Currently additional quality assessments and improvements for EU-HYDRO have been delivered to EEA in February 2019. These include following actions as compared to the previously available public beta: • topological consistency of the River Network improved; • numerous missing polygons, in particular in the features classes InlandWaters and River_net_p added, and respective modifications of linear features and nodes made; • line segmentation and geographical displacements corrected; • missing culverts added; • erroneous flow directions corrected; • description of the braided hydrological system through the revision of specific attribute (MC) in the feature class River_Net_l added; • more than 4 000 features throughout all European river basins integrated or revised; • z-values and Strahler codes recalculated; • logical consistency improvements made: all fields from all feature classes from EU- Hydro have been reviewed and then kept or removed or updated; • usability improvements made: the user guide is updated, a detailed description of the attributes is provided, INSPIRE compliant metadata are provided for all feature classes, and all feature classes are consistently provided in geodatabase as well as shapefile-format. The EU-Hydro Public Beta version that is available from the COPERNICUS land monitoring portal will therefore be available in an improved, final version 1.0. This work should be considered, when taking the final decision per country on whether to amend the EU-HYDRO data source with national WISE data.

68 | P a g e

From the current evaluation of the river network based on the EU-hydro public beta, it turned out that a hybrid approach should be followed as no single data source guarantees a sufficient and complete dataset. 21 countries are found to have WISE data available in sufficient quality. 12 countries were found to have sufficient quality of the EU-Hydro public beta, and one national data source (Iceland) was identified, for which the EU-HYDRO had not sufficient positional accuracy. For the operational implementation of CLC+ Backbone, a prior re-assessment is required of the latest EU-Hydro v1.0 against WISE dataset to update the below listed results based on the former public beta: WISE – all available national dataset (20 countries) • EXCEPT: BG, RO, SK, MT (too low positional accuracy) EU-HYDRO (12 countries): • Within EU (3 countries): o replacing low positional WISE data: ▪ BG, RO, SK o Replacing missing WISE data (4 countries): ▪ UK, SI, Lithuania, IRL • outside EU (5 countries) o TK, NO, MK, RS, CH • Too low EU-HYDRO geometric quality – leave these countries without any dataset (4 countries) o AL, ME, B&H, MT

National dataset (not part of WISE and too low quality in EU-HYDRO, 1 country): • Iceland o https://gatt.lmi.is/geonetwork/srv/eng/catalog.search#/metadata/%7BF9A7 A75F-76AE-4C66-A36C-4D98051E9487%7D o INSPIRE Hydrography Iceland, Watercourses, WFS-Service

69 | P a g e

Figure 9-6: Overview of the river data sources per country for the currently suggested hybrid river dataset (based on the EU-Hydro public beta, not yet EU-Hydro final v1.0 !!!)

9.6 Remark: land parcel identification system (LPIS) The land parcel identification system (LPIS) in Europe is at the current stage not suited to be integrated into the Level-1 “hard bone” process, as the data is currently either: • not completely available on national level (approx. 50 % of countries provide data), • partly only available on regional level (Spain, Germany), • do not represent the same spatial scale across countries, • are not available with the same semantic information (e.g. grassland/arable land differentiation), • if data are harmonized under INSPIRE, they are provided either under Annex II land cover or Annex III land use (but not in one homogeneous data format) or • not accessible for all users / uses. It is expected that the availability and homogeneity of IACS/LPIS data in Europe is improving in the next years. DG AGRI has started a legal investigation, together with the EC Secretariat-general, into the opportunity of opening up on systematic basis parts of the LPIS data, amongst other for use under Copernicus. The conclusion of this investigation is in favour of disclosing the geometric and land cover information for broader use, and the implementation of this measure will be prepared by

70 | P a g e

DG AGRI. However, except for some pilot countries, a generalised access is not yet to be expected for the first production round of CLC+. For a future update of the CLC+ Backbone this improved availability of LPIS data will play an important role, especially as the European Commission has noted the LPIS as one of their priority geospatial datasets50. It is expected that the availability and homogeneity of IACS/LPIS data in Europe is improving in the next years. DG AGRI has started a legal investigation, together with the EC Secretariat-general, into the opportunity of opening up on systematic basis parts of the LPIS data, amongst other for use under Copernicus. The conclusion of this investigation is in favour of disclosing the geometric and land cover information for broader use, and the implementation of this measure will be prepared by DG AGRI. However, except for some pilot countries, a generalised access is not yet to be expected for the first production round of CLC+. For a future update of the CLC+ Backbone this improved availability of LPIS data will play an important role, especially as the European Commission has noted the LPIS as one of their priority geospatial datasets51.

50 [DOC11]_Priority Geospatial Datasets for the European Commission.docx: https://ies- svn.jrc.ec.europa.eu/projects/mig-p/wiki/8th_meeting_of_the_MIG 51 [DOC11]_Priority Geospatial Datasets for the European Commission.docx: https://ies- svn.jrc.ec.europa.eu/projects/mig-p/wiki/8th_meeting_of_the_MIG

71 | P a g e

10 ANNEX 3: LPIS DATA IN EUROPE The land parcel identification system is the geographic information system component of IACS (Integrated Agricultural Control System). It contains the delineation of the reference areas, ecological focus areas and agricultural management areas that are subject for receiving CAP- payments. Due to different national priorities in the implementation of the CAP the LPIS-data are not homogeneous throughout Europe. Some countries use rather aggregated physical blocks as geometric reference and other countries use single parcel management units as smallest geographic entity within LPIS. Three types of spatial objects can in principal be differentiated within LPIS (Error! Reference source not found.): • Physical block • Field parcel • Single management unit A physical block is defined as the area that is surrounded by permanent non-agricultural objects like roads, rivers, settlement, forest or other features. A physical block normally contains several field management units that may belong to various farmers. A field parcel is owned by a single farmer and is the dedicated management unit for a specific agricultural main category. It contains either grassland or arable land, but never both categories. It may be split into several single sub-management units. A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific grassland management type (e.g. one cut meadow, two cut meadow, 3 and more cut meadow). The relation between field parcels and single management units is a 1 to many relations, where 1 management unit (field) can contain 1 or many sub-management units (single parcels). Remark to the relation with the real estate database: Land properties are recorded in many countries in the real estate database. An elementary part of the real estate database is the cartographic representation of real estate parcel boundaries. These parcels should not be mixed up with the field parcels. The parcels in the real estate database constitutes owner ship rights and are documented as official boundaries. The boundaries in agricultural management units however are delineated to derive correct area figures as base for payments under the CAP. In most cases the parcel boundaries between the real estate database and the LPIS will be the same, but not in all cases.

(a)

72 | P a g e

(b)

(c) Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit

From year to year the availability of national LPIS data is increasing. An overview of available and downloadable datasets has been created based on a previous ETC-ULS report by Synergise within the Horizon 2020 project LandSense (Error! Reference source not found.). The minimum thematic detail that should be harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3) permanent crops, (4) ecological focus areas and (5) others. As these categories contain quite heterogeneous classes on national level a harmonization is quite challenging, as they are applied quite inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any level could already provide valuable input given that LPIS is available for the large majority of countries. Unless a critical number of countries with available LPIS data is reached the inclusion of LPIS will add more bias in the resulting geometric CLC+ Backbone compared to a method solely based on pure image segmentation.

73 | P a g e

Figure 10-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project LandSense

74 | P a g e

11 ANNEX 4: EAGLE ONTOLOGY

All CLC+ Core database content shall refer to codes maintained in the EAGLE ontology once the system has been established. This chapter shall list the codes available in tabular form that were available at the time of writing (January 2020). The initial list of Code entries will be provided at the kick-off meeting – and maintained in the form of the EAGLE ontology.

11.1 LCC code entries

LCC_name LCC_code

ABIOTIC / NON-VEGETATED SURFACES AND OBJECTS 1 Artificial Surfaces and Constructions 11 Sealed Artificial Surfaces and Constructions 111 Buildings 1.111 conventional buildings 11.111 specific buildings 11.112 Other Constructions 1.112 specific structures and facilities 11.121 open sealed surfaces 11.122

Non-Sealed Artificial Surfaces and Constructions 112 waste materials 1.121 open non-sealed artificial surfaces 1.122 Natural Material Surfaces 12 Consolidated Surfaces 121 bare rock 1.211 hard pan 1.212 Un-Consolidated surfaces 122 Mineral Fragments 1.221 boulders, stones 12.211 pebbles, gravel 12.212 sand, grit 12.213 clay, silt 12.214

75 | P a g e

LCC_name LCC_code mixed unsorted material 12.215 bare soils 1.222 Natural Deposits 1.223 inorganic deposits 12.231 organic deposits (peat) 12.232 BIOTIC / VEGETATION 2 Woody Vegetation 21 trees 211 Bushes, Shrubs 212 regular bushes 2.121 dwarf shrubs 2.122 Herbaceous Vegetation (grasses and forbs) 22 Graminaceous (grass-like) 221 regular graminaceous 2.211 reeds, bamboos and canes 2.212 non-graminaceous (forbs) 222 succulents and cacti 23 Lichens, Mosses and Algae 24 lichens 241 mosses 242 Algae 243 macro algae 2.431 micro algae 2.432 WATER 3 Liquid waters 31 Inland Waters 311 water courses 3.111 standing waters 3.112 Marine Waters 312 Solid waters 32

76 | P a g e

LCC_name LCC_code Permanent Snow 321 Ice, Glaciers 322

77 | P a g e

11.2 LUA code entries

Needed for CLC-Legacy Additionally needed for LULUCF

LUA_name LUA_code No characteristic LUA 0 Primary Production Sector 1 Agriculture 11 Commercial Crop Production 111 alimentary crop production 1.111 fodder crop production 1.112 industrial crop production 1.113 energy crop production 1.114 Farming Infrastructure 112 Production for own consumption 113 Forestry 12 short rotation 121 intermediate / long rotation 122 continuous cover 123 Mining and Quarrying extraction sites 13 salines 131 surface mining 132 Industries (Secondary sector) 2 Services (Tertiary sector) 3 Commercial Services 31 Accommodation and Food Services 35 Financial, Professional and Information Services 32 Community Services 33 cemetery 3.343

78 | P a g e

LUA_name LUA_code other community services 335 Cultural, Entertainment and Recreational Services 34 Cultural services 341 Entertainment 342 Sports Infrastructure 343 Open Air Recreational Areas 344 urban greenery and parks 3.441 semi-natural areas and national parks 3.442 Other Recreational Services 345 allotment garden (Schrebergarten) 3.451 Transport networks, Logistics, Utilities 4 Transport networks 41 road network (incl. parking lots) 411 railway network 412 air transport 413 water transport 414 other transportation networks 415 Logistics and Storage 42 Utilities 43 Power Distribution Services 431 Water infrastructure 432 waste treatment 433 dump sites (solid / liquid) 4.331 recycling facilities 4.332 Other utilities 434 Residential 5 Other Uses 6

79 | P a g e

LUA_name LUA_code nature protection 61 flood protection 62 renaturation 63 abandoned 64 no economic use 65 Inland Water Function 7 nature protection 76 no economic use 77

11.3 LCH code entries

Needed for CLC-Legacy Additionally needed for LULUCF

LCH_name LCH_code Land Management 01 Agricultural Management 011 Agricultural Land Type 0111 arable crop land 01111 permanent crop land 01112 permanent grassland 01113 Cultivation Practices 0112 crop rotation 01121 no crop rotation 01122 plantation 01123 extensive orchards 01124 agroforestry 01125 shifting cultivation (slash&burn) 01126 intercropping 01127 paddy field cultivation 01128

80 | P a g e

LCH_name LCH_code Cultivation Measures 0113 Fertilizing Activity (yes/no) 011301 Irrigation (yes/no) 011303 Mowing 011307 none (natural) 0113071 1 time (semi-natural, extensive) 0113072 2 times (medium intensity) 0113073 > 2 x (intensive) 0113074 uknown 0113075 Grazing 011308 intensive (>2 livestock unit/ha) 0113081 extensive, freerange (<2 livestock unit/ha) 0113082 unknown intensity 0113083 Forestry Management 012 Forestry Practice 0121 even-aged monoculture plantation 01211 even-aged mixed forest 01212 uneven-aged mixed forest (selection forest) 01213 natural forest (non-homogenized) 01214 Forestry Harvesting Method 0122 clearcutting 01221 selective logging 01222 Forestry Measures 0123 coppicing 01231 thinning 01232 Forest History Type 0125 endemic/primary 01251 re-forestation 01252 af-forestation 01253 Spatial Patterns 02 Spatial Distribution Patterns 021

81 | P a g e

LCH_name LCH_code homogenous 0211 mixed 0212 mosaic 0213 scattered 0214 Spatial Context 025 urban 0251 coastal 0254 inland 0257 Crop Type (PlotActivityValue) 05 Arable crops 051 rice 051017 hops 0510602 Pasture / meadow 052 Permanent crops 053 Fruit and berry plantations 0531 Olive plantations 0533 Vineyards 0534 Ecosystem Types 07 Constructed, industrial and other artificial 0701 Regularly or recently cultivated agricultural 0702 Grassland and tall forb 0703 Agricultural mosaics 0704 Woodland and forest 0705 Heathland, shrub and tundra 0706 Transitional woodland 0707 Inland unvegetated or sparsely vegetated 0708 Mire, bog, fen 0709 inland marshes 07091 Peatbogs 07092 Coastal salt marshes 0710 Intertidal flats 0711 Mangroves 0712

82 | P a g e

LCH_name LCH_code Coastal lagoons 0713 Estuaries 0714 Inland surface water 0715 Marine 0716 Biotic / Vegetation Characteristics 092 Leaf Form 0921 needle leaved 09211 broad leaved 09212 palm leaved 09213 non-leafy 09214 Leaf character 0923 sclerophyte 09231 regular 09232 Phenology 0924 perenial 09243 Foliage Persistence 0922 evergreen 09221 winter deciduous 09222 Water Characteristics 093 Inland Water Origin 0931 natural 09311 controlled/regulated 09312 man-made 09313 Hydrological Persistence 0932 dry 09324 ephemeral 09323 intermittent 09322 perennial 09321 Wetness 0933 surface water 09331 saturated ground 09332 Salinity (Water or Soil) 0934

83 | P a g e

LCH_name LCH_code saline 09342 brine 09341 Tidal Influence (yes/no) 0935 Snow Height 0936 Status 10 under construction 1001 Damaged (yes/no) 1006 Damage Reason 1010 snow avalanche 10103 fires 10106 tornados, hurricanes, strong winds 10107 biological 10108 unknown status 1011 General Parameters 11 Height in [m] 112 Temporal Parameters 12 duration 121

84 | P a g e