03aee3f91559b02c5d2bab131ec161e6.docx 01/08/2016 Page 1 of 16

Summary

Introduction

This document describes how:

1. the CLC source data[1], already used in the EAGLE project[2] and available as an sqlite file[3], was transformed into a GML dataset file compliant with the INSPIRE Land Cover Vector application schema using the HALE software. The same EAGLE demo source dataset (the first 10 records of each CLC code for 3 pilot countries) was chosen so that results and encountered issues could be compared.

2. WFS and WMS services were set up in the Deegree[4] environment, serving the abovementioned Corine Land Cover geospatial dataset compliant with the INSPIRE LCV application schema[5].

3. the issues encountered in the GeoServer environment hindered the set-up of GeoServer WFS and WMS services for the LC GML dataset.

It’s worth noting that:

• the mapping rules provided by the EAGLE reference project were used for the transformation process. Nevertheless, a different ‘encoding of the mapping’ was used for the member relationship linking the Land Cover dataset to its Land Cover units. As explained later in this document, this change was needed to overcome issues related to the correct access to, and visualisation of, Corine Land Cover GML features in a GIS environment.[6]

• in the Land Cover harmonisation process, the huge amount of data to be transformed and served is clearly a limitation, both in producing the GML dataset and in setting up the web services.

[1] SC56088_EAGLE6_FinalReport: “Three countries are chosen for the test dataset: Finland, Portugal and Austria. The selection of these countries is made from the clc06_spatialite.rar archive available on http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-3. To limit the size of the resulting GML, for each country the first 10 records of each CLC-code are chosen. If there are less than 10 records within a country with a specific CLC-code, then all of them are included.“
[2] http://land.copernicus.eu/eagle
[3] clc06_selection.sqlite, provided by the EEA as reference material
[4] Deegree (http://www.deegree.org/) is open source software for the creation and management of spatial data infrastructures. The version referred to in this document is 3.2.
[5] http://inspire.ec.europa.eu/schemas/lcv/4.0/LandCoverVector.xsd
[6] SC56088_EAGLE6_FinalReport, page 67

To overcome the issue of ingesting very large files into Deegree SQL feature stores, the experimental HALE WFS-T function was tested; the results are reported later in this document.

• to solve the “orientation of polygons not counterclockwise” data validation issue, which concerns compliance with ISO 19107[7], the new HALE “Unify Winding order” feature was tested and the result is reported.

[7] ISO 19107 requires the exterior polygon boundary to be oriented counterclockwise and interior boundaries clockwise.

Source data transformation

Source data were mapped into the INSPIRE Land Cover Vector application schema structure according to the LC_Matching_Table_CLC_pub.xlsx mapping rules provided by the EAGLE project, making use of the HALE transformation software. As already stated in the introduction, a different encoding was used for the member relationship (see Figure 1) linking the LandCoverDataset to its LandCoverUnits. More specifically, the ‘referenced encoding’ was used instead of the ‘embedded encoding’ (the latter was used in the EAGLE project).

Figure 1 ‘Referenced’ vs ‘Embedded’ encoding

The LandCoverDataset can refer to each of its LandCoverUnits by means of:
• an external link to the unit (@xlink:href);
• a description embedded in the LandCoverDataset feature itself.

Figure 2 illustrates the structure of the target schema (i.e. INSPIRE LandCoverVector) as displayed in the HALE workbench. As can be seen, two choices are available to map the member property of the LandCoverDataset feature type:

1. reference choice (the mapping option highlighted in green within the red box in Figure 2), in which the LandCoverUnits are addressed by means of xlink:href. This way, each LandCoverUnit is described/mapped as a separate GML feature member.

2. embedded choice (the mapping option highlighted in light blue within the red box), in which the description of the LandCoverUnits is embedded in the LandCoverDataset feature, i.e. the units are not available as a stand-alone feature type.

Figure 2
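The difference between the two member encodings can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under simplifying assumptions: the `lcv` namespace URI is assumed to be the INSPIRE LCV 4.0 one, the gml:id values (`LCD_1`, `LCU_1`, ...) are hypothetical, and all mandatory LandCoverUnit properties are omitted, so the output is not schema-valid. The sketch only shows where the units end up in each case.

```python
import xml.etree.ElementTree as ET

# GML 3.2 and XLink namespaces are standard; the lcv URI is assumed (INSPIRE LCV 4.0).
GML = "http://www.opengis.net/gml/3.2"
XLINK = "http://www.w3.org/1999/xlink"
LCV = "http://inspire.ec.europa.eu/schemas/lcv/4.0"

for prefix, uri in (("gml", GML), ("xlink", XLINK), ("lcv", LCV)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def land_cover_unit(unit_id: str) -> ET.Element:
    """A heavily simplified LandCoverUnit: mandatory properties are omitted."""
    return ET.Element(q(LCV, "LandCoverUnit"), {q(GML, "id"): unit_id})


def encode_dataset(unit_ids: list[str], referenced: bool) -> list[ET.Element]:
    """Return the top-level features written for one LandCoverDataset.

    referenced=True:  each lcv:member only carries xlink:href="#<id>" and every
                      unit is returned as a separate, stand-alone feature.
    referenced=False: each unit is nested inside its lcv:member property, so
                      the dataset is the only top-level feature.
    """
    dataset = ET.Element(q(LCV, "LandCoverDataset"), {q(GML, "id"): "LCD_1"})
    features = [dataset]
    for uid in unit_ids:
        member = ET.SubElement(dataset, q(LCV, "member"))
        if referenced:
            member.set(q(XLINK, "href"), f"#{uid}")
            features.append(land_cover_unit(uid))
        else:
            member.append(land_cover_unit(uid))
    return features


# Usage: compare the number of top-level features and the serialised dataset.
for mode in (True, False):
    top = encode_dataset(["LCU_1", "LCU_2"], referenced=mode)
    print(len(top), ET.tostring(top[0], encoding="unicode"))
```

In the referenced case each unit is its own top-level feature. This is consistent with what the next paragraph reports: such units can be requested individually from a WFS.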

Although both encodings are formally correct, only the ‘referenced’ approach allows:
1. correct visualisation of the INSPIRE GML dataset (imported as a GML file or by means of a WFS GetFeature operation) in a GIS environment;
2. downloading LandCoverUnits from a WFS without having to download the entire dataset (this is not possible with the embedded approach).

GML dataset validation results

Different tools were used to assess the conformity of the transformed GML dataset to:
• the INSPIRE Land Cover Vector target schema;
• the GML encoding.

It is important to remark that the HALE and Oxygen tools only validate against those schema requirements that can be expressed by means of XML Schema grammar (i.e. XSD). Conversely, the eENVplus Validation Service goes deeper, since it also validates other GML specification requirements (which cannot be expressed through XSDs) by means of the OGC GML schematron file[8] and the OGC GML Test Suite Java methods (TestNG). Note that the produced GML files were also validated against the Land Cover Vector v4.0 application schema requirements included in the Land Cover schematron[9], developed by Epsilon Italia within the eENVplus project and modified by the EAGLE project developers.

Table 1 reports the validation results.

Validated GML file | HALE validation | Oxygen validation | eENVplus VS 2.0 validation (based on OGC CITE test suite rel. 1.22) | Schematron validation (Oxygen + LandCoverVector_v3.4.sch schematron) | Schematron validation (eENVplus VS + LandCoverVector_v3.4.sch schematron)
clc_completeDataset | Passed | Out of memory | Out of memory | Out of memory | Out of memory
clc_1000 | Passed | Passed | Schema validation test = passed; overall validation result = failed | Passed | Passed

Table 1

The overall eENVplus Validation Service validation fails because some Corine Land Cover polygons in the dataset (for example, those corresponding to the island of Madeira) fall outside the area of validity of the Coordinate Reference System used by the dataset.
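One simplified way to spot such polygons before validation is to test their vertices against the area of use of the CRS. The document does not name the CRS. In this Python sketch the bounding box `MAINLAND_PT` is therefore a hypothetical, rough approximation of mainland Portugal, not an official EPSG extent, and coordinates are assumed to be (longitude, latitude) in degrees. Real area-of-use extents should be taken from the EPSG registry.

```python
# Hypothetical area of use as (lon_min, lat_min, lon_max, lat_max) in degrees:
# a rough approximation of mainland Portugal, NOT an official EPSG extent.
MAINLAND_PT = (-9.6, 36.9, -6.1, 42.2)


def outside_area(ring: list[tuple[float, float]],
                 area: tuple[float, float, float, float]) -> bool:
    """True if any (lon, lat) vertex of the ring lies outside the area bbox."""
    lon_min, lat_min, lon_max, lat_max = area
    return any(
        not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max)
        for lon, lat in ring
    )


# Usage: a small ring near Madeira (about 32.7 N, 16.9 W) is flagged.
madeira_like = [(-17.0, 32.7), (-16.8, 32.7), (-16.8, 32.8), (-17.0, 32.7)]
print(outside_area(madeira_like, MAINLAND_PT))  # True
```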

Initially, the eENVplus Validation Service reported the validation issue “orientation of polygons not counterclockwise”. This issue concerns compliance with ISO 19107, which requires the exterior polygon boundary to be oriented counterclockwise and interior boundaries clockwise. The issue was later solved thanks to the new HALE “Unify Winding order” feature.
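The orientation rule can be checked with the shoelace formula: the signed area of a ring is positive when its vertices run counterclockwise. The following Python sketch computes the signed area and unifies a polygon so that the exterior ring is CCW and the interior rings are CW, the rule the HALE option is described as applying. It is not HALE's code. It assumes coordinates are ordered (x, y) = (easting, northing); with (northing, easting) axis order the axes are mirrored, so the apparent orientation is reversed.

```python
Point = tuple[float, float]
Ring = list[Point]


def signed_area(ring: Ring) -> float:
    """Shoelace formula; positive for counterclockwise rings (x east, y north)."""
    # Pair each vertex with the next one, wrapping around to the first vertex.
    pairs = zip(ring, ring[1:] + ring[:1])
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)


def is_ccw(ring: Ring) -> bool:
    return signed_area(ring) > 0


def unify_winding(exterior: Ring, interiors: list[Ring]) -> tuple[Ring, list[Ring]]:
    """Exterior ring counterclockwise, interior rings clockwise (ISO 19107 rule)."""
    ext = exterior if is_ccw(exterior) else exterior[::-1]
    holes = [ring[::-1] if is_ccw(ring) else ring for ring in interiors]
    return ext, holes


# Usage: a clockwise unit square with a counterclockwise hole gets both rings flipped.
square_cw = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
hole_ccw = [(0.2, 0.2), (0.8, 0.2), (0.8, 0.8), (0.2, 0.8), (0.2, 0.2)]
print(signed_area(square_cw))  # -1.0
ext, holes = unify_winding(square_cw, [hole_ccw])
print(is_ccw(ext), is_ccw(holes[0]))  # True False
```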

[8] http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.ogc/ogc-schemas/2.2.0/ogc/gml/3.2.1/SchematronConstraints.xml
[9] LandCoverVector_v3.4.sch

In the GML export phase, this new feature allows the user to leave the winding order of the exterior polygon boundary ‘as-is’ or to unify it to clockwise (CW) or counterclockwise (CCW). Inner polygon boundaries use the reverse order with respect to the exterior one. The clc_1000.gml dataset was re-exported selecting “Unify Winding order to counter clockwise”, and no issue related to polygon orientation was reported.

Steps to the creation of the Deegree CLC WFS instance

Step 1: A CLC workspace was created and activated in Deegree. An empty PostGIS DB[10] was created to serve as a repository for the CLC data. Deegree uses this empty database to create tables and views according to the INSPIRE LC application schema (see Step 2 below). A JDBC connection to the CLC PostGIS DB was established in Deegree.

Step 2: A CLC SQL feature store was set up, selecting the ‘schema-driven’ mode + ‘relational mapping’. This way, Deegree used the detailed feature type definitions and property declarations from the CLC application schema to create the relevant tables in the abovementioned PostGIS DB (the Deegree wizard provides an ad-hoc configuration dialog where the user can enter all the settings required to establish a connection to the PostGIS database).

Step 3: The transformed INSPIRE Land Cover GML dataset[11] was used to populate the PostGIS tables by means of the feature store Loader function.

Step 4: A WFS service was set up, and a slight modification of the automatically produced configuration file was necessary:
1. to use the urn encoding for GML 3.2 CRS information. This was necessary to avoid incorrect display of the served datasets in some GIS environments (for example in QGIS)[12].

2. to introduce the INSPIRE extended capabilities
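Two common ways of writing an EPSG CRS identifier in GML 3.2 are the OGC URN form (`urn:ogc:def:crs:EPSG::<code>`) and the HTTP URI form (`http://www.opengis.net/def/crs/EPSG/0/<code>`). The actual Deegree configuration change is not reproduced in this document. The following Python sketch only illustrates normalising an identifier to the URN form used in Step 4. EPSG code 3035 in the usage line is an arbitrary example, not necessarily the CRS of the CLC dataset.

```python
import re

# Patterns for two other common spellings of an EPSG CRS identifier.
_EPSG_FORMS = (
    re.compile(r"https?://www\.opengis\.net/def/crs/EPSG/0/(\d+)"),  # HTTP URI
    re.compile(r"EPSG:(\d+)"),                                        # short code
)


def to_ogc_urn(crs: str) -> str:
    """Rewrite an EPSG CRS identifier as urn:ogc:def:crs:EPSG::<code>.

    Identifiers that match neither form (including URNs) are returned unchanged.
    """
    for pattern in _EPSG_FORMS:
        match = pattern.fullmatch(crs.strip())
        if match:
            return f"urn:ogc:def:crs:EPSG::{match.group(1)}"
    return crs


print(to_ogc_urn("http://www.opengis.net/def/crs/EPSG/0/3035"))
# urn:ogc:def:crs:EPSG::3035
```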

[10] PostgreSQL (version 9.4) with PostGIS extension, https://www.postgresql.org/
[11] File available in a Google Drive zip folder at https://drive.google.com/a/epsilon-italia.it/folderview?id=0BygRFIfN1Pe6STBVVnNTM1BPUG8&usp=sharing#
[12] For more information on this issue, see http://cloud.epsilon-italia.it/iisstart_files/QGISTips.html

File size issue for the SQL feature store

The main issue encountered in setting up the WFS server for the CLC data is the size limit of the GML file that can be ingested into the data store by means of the Loader function (Step 3). A ‘Java heap space’ error prevented the ingestion of the complete CLC dataset (more than 255000 features, almost 1.5 GB), while the ingestion of smaller datasets (up to 10000 features) was successful.

Possible solution to the file size issue

To overcome the issue of ingesting very large files into Deegree SQL feature stores, the experimental HALE WFS-T function was tested.

HALE WFS-T experimental function

The HALE WFS-T feature allows GML to be published to a transactional Web Feature Service, for instance to publish INSPIRE data to a Deegree server. There are two options for publishing to a WFS:
• Direct upload: performs a single request that starts as soon as the first transformed features are available.
• Partitioned upload: splits the transformed data before uploading and performs multiple requests to the WFS-T (recommended for large data sets). The partitioning is based on the references between features.

We used the ‘WFS-T Partitioned Upload’ option to publish the CLC dataset to the Deegree WFS instance, with partitions of 15000 instances per transaction.
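A partitioned upload can be pictured as splitting the feature stream into fixed-size batches and sending one WFS 2.0 Transaction, containing an Insert action, per batch. The Python sketch below is a simplified illustration, not HALE's implementation. Unlike HALE, it ignores the references between features, so a LandCoverDataset and the units it references through xlink:href could end up in different transactions. With the 15000-instance setting, the 255619 features mentioned in the Conclusions would need ceil(255619 / 15000) = 18 transactions. The 60000 features ingested correspond to 4 complete ones.

```python
import math
import xml.etree.ElementTree as ET
from typing import Iterator

WFS = "http://www.opengis.net/wfs/2.0"


def batches(features: list[ET.Element], size: int) -> Iterator[list[ET.Element]]:
    """Split features into consecutive batches of at most `size` instances."""
    for start in range(0, len(features), size):
        yield features[start:start + size]


def transaction(batch: list[ET.Element]) -> ET.Element:
    """Wrap one batch in a WFS 2.0 Transaction with a single Insert action."""
    tx = ET.Element(f"{{{WFS}}}Transaction", {"service": "WFS", "version": "2.0.0"})
    ET.SubElement(tx, f"{{{WFS}}}Insert").extend(batch)
    return tx


def transaction_count(total: int, size: int) -> int:
    """Number of transactions needed for `total` features in batches of `size`."""
    return math.ceil(total / size)


print(transaction_count(255619, 15000))  # 18
print(transaction_count(255619, 10000))  # 26
```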

After about 24 hours, 60000 features had been ingested into the database (i.e. 4 transactions completed successfully). Then the running HALE task got stuck, although no error message appeared in the HALE workbench or in the HALE log, and we aborted the execution.

In the Deegree log we found the following message:

“java.io.IOException: com.ctc.wstx.exc.WstxIOException: java.io.IOException at org.deegree.services.controller.utils.HttpResponseBuffer.flushBuffer(HttpResponseBuffer.java:301) ..”

We also tried re-running the ‘WFS-T Partitioned Upload’ task from HALE with partitions of 10000 instances per transaction, and got the same java.io.IOException.

The issue is being reported to the HALE developers, although we are not sure whether it depends on HALE or on the Deegree server configuration or memory management.

Loading WFS-served CLC features in QGIS

Using the WFS Client 2.0 QGIS plugin and specifying the server URL http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/services/EEA_LC_WFS?, the LC.LandCover layer can be correctly imported, displayed and queried. Moreover, it correctly overlays the CLC SpatiaLite source data layer (see Figure 3 below).

Figure 3

WFS-served CLC features: validation results

It was found that, when creating the SQL datastore from the INSPIRE LandCoverVector application schema, Deegree did not create the elements related to LandCoverNomenclature.xsd and gmd.xsd, both of which are imported by LandCoverVector.xsd.

This is because, when element structures exceed a certain complexity (nesting and recursion), Deegree simply omits them to prevent the configuration files from blowing up.[13]

Manual intervention was needed to modify the SQL datastore configuration schema automatically created by Deegree. More specifically, the missing elements in the LandCoverDataset feature type were mapped as follows:

[13] See the bug reported at https://sourceforge.net/p/deegree/mailman/message/32578671/

toColumns="id_nomenclature_parent"/>

Steps to the creation of the Deegree CLC WMS instance

Manual configuration was needed to create the CLC WMS, according to the following steps:
1. Creation of a CLC style store and the styles contained therein

2. Creation of a CLC layer store and the layers contained therein

3. Creation of a CLC theme (a theme can be thought of as a collection of layers organized in a tree structure)

The WMS CLC layer can be visualised at http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/console/wms/wms.html by clicking on the ‘+’ icon on the right and selecting the LC.LandCover layer.

Loading WMS LC.LandCover layer in QGIS

The WMS can be added to a QGIS map and is correctly displayed.

Deegree Webservices endpoints

The WFS can be accessed at:

http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/services/EEA_LC_WFS?

Below are WFS GetFeature examples for LandCoverUnit and LandCoverDataset, respectively:

http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/services/EEA_LC_WFS?service=WFS&version=2.0.0&request=GetFeature&count=100&typename=lcv:LandCoverUnit

http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/services/EEA_LC_WFS?service=WFS&version=2.0.0&request=GetFeature&count=100&typename=lcv:LandCoverDataset
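With the referenced encoding, LandCoverUnits are stand-alone features, so they can be fetched page by page rather than all at once. The following minimal Python sketch builds such GetFeature URLs. It assumes the server supports WFS 2.0 result paging through the standard `count` and `startIndex` parameters; whether this Deegree instance advertises paging in its capabilities is not verified here. The sketch uses the WFS 2.0 parameter name `typeNames`, whereas the example URLs above use `typename`. It only builds URLs and sends no request.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above; no request is sent by this sketch.
WFS_BASE = ("http://cloud.epsilon-italia.it:8085/"
            "deegree-webservices-3.3.18/services/EEA_LC_WFS")


def get_feature_url(type_names: str, count: int, start_index: int = 0,
                    base: str = WFS_BASE) -> str:
    """Build a WFS 2.0 GetFeature KVP request for one page of features."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_names,
        "count": count,
        "startIndex": start_index,  # requires server-side result paging
    }
    return f"{base}?{urlencode(params)}"


def page_urls(type_names: str, total: int, page_size: int) -> list[str]:
    """One GetFeature URL per page, covering `total` features."""
    return [get_feature_url(type_names, page_size, start)
            for start in range(0, total, page_size)]


for url in page_urls("lcv:LandCoverUnit", total=300, page_size=100):
    print(url)
```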

The WMS can be accessed at http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/services/EEA_WMS?

The LC.LandCover layer can be visualised at http://cloud.epsilon-italia.it:8085/deegree-webservices-3.3.18/console/wms/wms.html by clicking on the ‘+’ icon on the right and selecting the relevant layer.
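Outside the Deegree console, the same layer can be requested with a standard WMS 1.3.0 GetMap request. In this minimal Python sketch the CRS and bounding box passed by the caller are hypothetical placeholders, because the document does not state which CRS the service advertises. In WMS 1.3.0 the BBOX values must follow the axis order of the chosen CRS; the sketch passes them through unchanged.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above; no request is sent by this sketch.
WMS_BASE = ("http://cloud.epsilon-italia.it:8085/"
            "deegree-webservices-3.3.18/services/EEA_WMS")


def get_map_url(layer: str, crs: str, bbox: tuple[float, float, float, float],
                width: int, height: int, fmt: str = "image/png",
                base: str = WMS_BASE) -> str:
    """Build a WMS 1.3.0 GetMap request.

    bbox must already be in the axis order of `crs` (WMS 1.3.0 rule);
    it is passed through unchanged.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base}?{urlencode(params)}"
```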

Issues related to the creation of the GeoServer LC Webservices

The HALE 2.9.4 GeoServer App-Schema Integration feature was used to automatically convert the HALE alignment into an app-schema configuration for GeoServer and to set up INSPIRE-compliant services by means of the GeoServer App-Schema plugin.

Initially we tried the App-Schema Configuration [Direct Upload] (i.e. the configuration is generated and immediately uploaded to GeoServer via its REST API) but we got a “org.apache.http.client.HttpResponseException: Internal Server Error” message.

It is worth noting that the GeoServer compatibility mode was selected in the HALE workbench and reported no issues (green check mark).

We then used the App-Schema Configuration function and manually copied the produced files into the GeoServer data directory.

The LandCoverVector store as well as the LandCoverDataset and LandCoverUnit layers were automatically created by GeoServer after the relevant app-schema file was uploaded.

The WFS serving LandCoverDataset works correctly at:

http://cloud.epsilon-italia.it:8083/geoserver/ows?service=WFS&version=2.0.0&request=GetFeature&typeNames=lcv:LandCoverDataset

This WFS request successfully validates in the eENVplus Validation Service.

Conversely, the following error arises when invoking the WFS 2.0 for LandCoverUnit, for example with the following request:

http://cloud.epsilon-italia.it:8083/geoserver/ows?service=WFS&version=2.0.0&request=GetFeature&typeNames=lcv:LandCoverUnit&count=10

ServiceException> java.lang.RuntimeException: Error applying mapping with targetAttribute lcv:geometry/gml:Surface/gml:polygonPatches Error applying mapping with targetAttribute lcv:geometry/gml:Surface/gml:polygonPatches Could not find working property accessor for attribute (the_geom) in object

It seems that GeoServer is not able to map the gml:Surface/gml:polygonPatches geometry, which is instead required by the geometryIsKindOfGM_PointOrGM_Surface constraint contained in the INSPIRE Data Specification on Land Cover (in OCL: inv: self.geometry->forAll(l | l.oclIsKindOf(GM_Surface) or l.oclIsKindOf(GM_Point))).
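The OCL invariant can be approximated on an encoded GML file by checking which GML element each lcv:geometry property contains. This Python sketch assumes that GM_Surface is encoded as gml:Surface and GM_Point as gml:Point. Other surface encodings, such as gml:Polygon, may also be acceptable depending on the validator and can be added to the `ALLOWED` set. The `lcv` namespace URI is likewise an assumption, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
LCV = "http://inspire.ec.europa.eu/schemas/lcv/4.0"  # assumed lcv namespace URI

# Assumed GML encodings of GM_Surface and GM_Point; extend if needed.
ALLOWED = {f"{{{GML}}}Surface", f"{{{GML}}}Point"}


def geometry_violations(root: ET.Element) -> list[str]:
    """gml:id of each LandCoverUnit with an lcv:geometry outside ALLOWED."""
    bad = []
    for unit in root.iter(f"{{{LCV}}}LandCoverUnit"):
        for geometry in unit.iter(f"{{{LCV}}}geometry"):
            kinds = [child.tag for child in geometry]
            if not kinds or any(kind not in ALLOWED for kind in kinds):
                bad.append(unit.get(f"{{{GML}}}id", "?"))
                break  # one violation per unit is enough
    return bad


SAMPLE = f"""<root xmlns:lcv="{LCV}" xmlns:gml="{GML}">
  <lcv:LandCoverUnit gml:id="u1"><lcv:geometry><gml:Surface/></lcv:geometry></lcv:LandCoverUnit>
  <lcv:LandCoverUnit gml:id="u2"><lcv:geometry><gml:LineString/></lcv:geometry></lcv:LandCoverUnit>
</root>"""
print(geometry_violations(ET.fromstring(SAMPLE)))  # ['u2']
```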

Further investigation is needed. The issue has been reported to GeoServer developers.

Conclusions

The work described in the present document uses the same source dataset already used in the EAGLE project (the first 10 records of each CLC code for 3 pilot countries), so that results and encountered issues could be compared. A few considerations follow, with reference to:

1. the transformation of the source dataset into a GML file compliant with the INSPIRE LandCoverVector schema, and the choice between the ‘Referenced’ and ‘Embedded’ encoding: the referenced choice is recommended because it allows:
   o the correct visualisation of the INSPIRE GML dataset (whether imported as a GML file or by means of a WFS GetFeature operation) in a GIS environment;
   o downloading LandCoverUnits from a WFS without having to download the entire dataset (this is not possible when using the embedded approach).

2. file size issue: huge file sizes are easily reached with Land Cover datasets. For example, the EAGLE source dataset (10 records of each CLC code for 3 countries) already contains 255619 polygons and turns into a huge 1.5 GB GML file. With reference to the Deegree software, the automatic ingestion of huge numbers of features into the data store does not seem to be possible. Therefore, for now, one should consider transforming a single CLC class dataset or a single Member State dataset and ingesting them one at a time.

3. WFS-served features validation issue: Deegree is capable of creating WFS and WMS web services for Land Cover datasets. Nevertheless, manual intervention is needed to modify the SQL datastore configuration schema automatically created by Deegree, in order to avoid validation issues on the served features.
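Point 2 suggests transforming and ingesting one CLC class or one Member State at a time. A preprocessing step along those lines can be sketched in Python with `xml.etree.ElementTree.iterparse`, which streams the GML instead of loading the whole 1.5 GB tree into memory. The grouping `key` is a hypothetical caller-supplied function, for example one that reads the CLC code or the country of each unit. `header`/`footer` stand for whatever container element the Deegree Loader expects, which is not specified in this document. Whether the resulting files stay within the Loader's heap limits is an open assumption.

```python
import os
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

LCV = "http://inspire.ec.europa.eu/schemas/lcv/4.0"  # assumed lcv namespace URI
UNIT = f"{{{LCV}}}LandCoverUnit"


def iter_units(source) -> Iterator[ET.Element]:
    """Stream LandCoverUnit elements from a GML file path or binary file object.

    The caller must finish with each yielded element before requesting the
    next one, because processed content is then discarded to bound memory.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == UNIT:
            yield elem
            elem.clear()  # free the unit's own subtree
            root.clear()  # detach already-processed top-level members


def split_units(source, key: Callable[[ET.Element], str], out_dir: str,
                header: str, footer: str) -> dict[str, int]:
    """Write the units of each key group to <out_dir>/<key>.gml.

    `key` is a hypothetical grouping function (e.g. by CLC class or country);
    its results are assumed to be safe file names. `header`/`footer` wrap each
    output file. Returns the number of units written per key.
    """
    files, counts = {}, {}
    try:
        for unit in iter_units(source):
            k = key(unit)
            if k not in files:
                path = os.path.join(out_dir, f"{k}.gml")
                files[k] = open(path, "w", encoding="utf-8")
                files[k].write(header)
            files[k].write(ET.tostring(unit, encoding="unicode"))
            counts[k] = counts.get(k, 0) + 1
    finally:
        for handle in files.values():
            handle.write(footer)
            handle.close()
    return counts
```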