Computers & Geosciences 45 (2012) 270–283


Towards a standard for soil and terrain data exchange: SoTerML

Amir Pourabdollah a,*, Didier G. Leibovici a, Daniel M. Simms b, Piet Tempel c, Stephen H. Hallett b, Mike J. Jackson a

a Centre for Geospatial Science (CGS), the University of Nottingham, Triumph Road, Nottingham NG7 2TU, UK
b National Soil Resources Institute (NSRI), Cranfield University, Bedfordshire MK43 0AL, UK
c ISRIC – World Soil Information, 6708 PB Wageningen, the Netherlands

Article history: Received 24 November 2010; Received in revised form 14 November 2011; Accepted 23 November 2011; Available online 12 January 2012

Keywords: XML; GML; Soil data; Terrain data; SOTER; Standard; OGC

Abstract

Soil and landform information is needed for a wide range of applications but available data are often inaccessible, incomplete, or out of date. Within the European FP7 project ‘e-SOTER’, following recognised harmonisation principles, we developed an XML schema to serve as an exchange format for soil and terrain data derived from e-SOTER methodologies (SoTerML). It encompasses existing SOTER database conceptual modelling as well as the WRB (World Reference Base of soil resources) and the FAO soil data structures and classifications, therefore covering major soil and terrain databases such as the European Soil Database (ESD). The flexibility of the modelling achieved is demonstrated from legacy data integrated in the new scheme and made available using an OGC Web Feature Service. Along with the description of SoTerML, the paper aims at pointing out the modelling approach and the modelling principles used for soil and terrain observations, which, extracted from our proposal, could prove useful for emerging initiatives towards defining an exchange standard framework for the soil theme.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +441159515449.
E-mail addresses: [email protected] (A. Pourabdollah), [email protected] (D.G. Leibovici), d.m.simms@cranfield.ac.uk (D.M. Simms), [email protected] (P. Tempel), s.hallett@cranfield.ac.uk (S.H. Hallett), [email protected] (M.J. Jackson).

0098-3004/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2011.11.026

1. Introduction

Many initiatives are addressing the goal of establishing a standard model and exchange format for soil-related data at local, continental and global scales. At the current time there is much activity underway across the EU Member States concerning development of the INSPIRE (Infrastructure for Spatial Information in Europe) Common Implementing Rules for the Interoperability of Spatial Data Sets and Services. In particular, soil is identified in Annex 3 of the INSPIRE Directive as a discrete spatial theme (EU, 2007). Other related emergent data specification initiatives also concern soil, such as Soil-ML (Montanarella et al., 2010), the ISO group on soil data (not yet published), GlobalSoilMap.net (GSM.net, 2010) and the GS Soil project (GSSoil, 2010). These emerging proposals of standards are relatively convergent and a consensus is expected on formal data specifications for application development within a multidisciplinary context.

This paper aims to contribute to this goal by expressing the main principles which helped us in building up a data model and a mark-up language called SoTerML (Soil and Terrain Mark-up Language) as a part of the European FP7 project e-SOTER (e-SOTER, 2008). SoTerML is an XML-based data exchange format derived from the modelling principles and main contextual approaches that we analysed (see details in Section 4). We describe the current development status of SoTerML, which can contribute to establishing a consensus for an OGC or ISO standard for this theme. We believe these principles to be applicable to other geo-science domains, and they can be used by the above-mentioned initiatives as well. SoTerML is not devoted to soil data alone: it also covers terrain data components, and we believe that no other exchange format exists that fits the needs of soil and terrain information management.

The area of SoTerML applicability is not limited to the e-SOTER platform. An example of a future application is the assessment of soil erosion risk, which represents one of the foremost interpretative uses of soil information. Without adequate vegetative cover and protection, certain soils are prone to erosion, with impacts on soil productivity and fertility. Erosion occurs from a range of physical forces, such as rainfall, flowing water, wind, ice, temperature change, gravity and other abrading factors (Kibblewhite et al., 2007). Accelerated erosion due to inappropriate management and landuse can be countered through the use of computer-based erosion modelling tools. One such tool, MESALES (Modèle d'Evaluation Spatiale de l'ALéa Erosion des Sols), offers regional, homogenous assessments of soil erosion risk, applicable at a national and EU scale (Le Bissonnais et al., 2005). The MESALES model is currently being adapted to operate via soils data served in SoTerML form, combined with other relevant data themes such as regional climate, drainage density and landuse. It is anticipated that this will ultimately permit development of web services addressing regional and national soil management and protection issues. As a concrete example, if a SOTER database exists holding information falling across two or three countries which would form an input for the MESALES model, the SoTerML exchange format will facilitate seamless combination of this soils data together with other required data themes.

After summarising the context within the e-SOTER project for European contribution to GEOSS in Section 2, the paper describes in Section 3 the related initiatives for soil data exchange. Based on the analysis of these descriptions, Section 4 explains the conceptual and modelling approach to establish SoTerML. Before concluding by envisioning the web services applications, Section 5 discusses the main principles that guided the SoTerML modelling.

2. The context of the modelling approach

Contributing to the GEOSS framework (Global Earth Observation System of Systems) (Lautenbacher, 2006), the e-SOTER project addresses the need for a global soil and terrain database and for standardising the soil data (van Engelen, 2011). e-SOTER aims at delivering a web-based regional pilot platform with data, methodology, and applications using remote sensing to validate, augment and extend existing data. This project has required an XML-based mark-up language for exchanging the soil and terrain related data over the Web, called SoTerML.

Why should an XML-based standard be needed for soil data? XML has been used widely in Web data exchange and data storage applications because of its semi-structured nature and because, being text-based, it is readily human- as well as machine-readable. Its recent applications have been extended to many scientific domains including geo-science applications (Babaie and Ramachandran, 2005) and particularly to the environmental sciences. More and more scientists and environmental modellers work with data sources for their modelling purposes that are fetched directly over the web using interoperability principles. This saves a considerable amount of time and allows for easy updating of models or changes to study areas. The Open Geospatial Consortium (OGC) along with the International Standards Organisation (ISO) ensures standard interfaces for spatial data format and queries using web services. Besides this technical interoperability, a semantic interoperability structure is also needed to encompass in a seamless way legacy data and newly developed measurements. These semantic issues are based on harmonisation of soil and terrain data structures and on integration of existing format structures that can be embedded and referenced in a standard way in an XML file.

The e-SOTER project aims at providing web services for soil and terrain data based on an enhanced SOTER methodology derived from the project. This enhanced methodology has been developed to include: quantitative mapping of landforms; soil parent material and soil attribute characterisation using pattern recognition on remote sensing data; and standardisation of methods and measures of soil attributes to convert legacy data. There are two major research thrusts involved: (1) improvement of the current SOTER methodology using moderate-resolution optical remote sensing in combination with existing parent material/geology and soil information; (2) development of an advanced remote sensing application within pilot areas, e.g., geomorphic landscape analysis, geologically re-classified remote sensing, and remote sensing of soil attributes.

The e-SOTER project aims at delivering a pilot platform and web portal that provides open access to: (1) methodologies; (2) enhanced SOTER databases and an artefact-free 90 m digital elevation model; (3) dedicated applications related to major threats to soil quality and performance.

The design of SoTerML had to take into account two competing aspects. The first is the various soil attribute data profiles or classifications already used in SOTER databases. The three main data designs are: the SOTER profile (Oldeman and van Engelen, 1993), the WRB (World Reference Base of Soil Resources) (WRB, 2006) and the FAO classification schemas (FAO, 1988). The second aspect is related to the interoperability developments led by OGC (Open Geospatial Consortium) concerning geographical datasets. On the one hand the goal is to facilitate understanding of, and transfer from, existing formats and data models, whilst on the other the aim is to provide exchange of datasets, which necessitates harmonisation and standardisation in compliance with existing standards.

The intention is that the soil and terrain related data will be provided via an OGC-compliant web feature service capable of placing both the geometric and attribute information as complex objects into the client applications. The OGC's GML application profile (OGC, 2004), which is also distributed as ISO 19136, is the natural choice to manage the geometric aspects of SoTerML. The GML standard allows for dissemination of complex feature and attribute data and is the exchange format for geospatial data under the INSPIRE Directive. SoTerML provides a GML application schema for the delivery of SOTER data via an OGC Web Feature Service (WFS).

The context of SoTerML development, especially with the ongoing ISO/TC190/SC 1 N140 “Recording and Exchange of Soil-Related Data”, requires the data model to be adaptable to different soil profiles so that SoTerML can in principle be compliant with the future ISO 28258 standard, which has recently reached the DIS status (Draft of International Standard).

3. Related works

Data models using XML and the GML application profiles are still recent but are now becoming widely accepted in the geosciences community (Babaie and Ramachandran, 2005). The modelling efforts within each particular theme have been boosted by the seamless use of OGC web services to discover, retrieve and deliver data over the Internet using this format. In this section, the previous and most relevant developments of international databases for soil attributes and existing related data models and mark-up languages for global soil classification systems are summarised, along with their implications for the modelling approach of SoTerML.

3.1. INSPIRE data specification

INSPIRE is a Directive of the European Community establishing a Spatial Data Infrastructure in the European Community, which entered into force in May 2007. The INSPIRE initiative is responsible for overseeing the rules and standards governing spatial data infrastructures (SDI). These rules are founded upon Open Geospatial Consortium (OGC) standards, adapted in various ISO and CEN rules concerning the management of geographic information, and serve as the framework for developing the geospatial web-based infrastructures in e-SOTER with close connection to the GEOSS initiative.

The INSPIRE Directive addresses 34 spatial data themes in three annexes, in which Soil is included. The INSPIRE Thematic Working Group (TWG) on Soil is a recently established TWG which released the INSPIRE Data Specification on Soil – Draft Guidelines (D2.8.III.3) in June 2011 (INSPIRE TWG/Soil, 2011). In this document, a comprehensive data model of soil has been described which fits the needs of the INSPIRE Directive. The document has a set of UML class diagrams to specify soil data, reusing the ISO-19xxx family of standards. This document provides a basis for developing an XML specification for soil data exchange in the future. Besides the UML, one of the main parts of this document is the “feature catalogue” (Section 5.2.2. of D2.8.III.3), which defines a set of attributes/value sets that can be associated with soil features. This is an interesting area to connect to SoTerML (which was developed before the release of the INSPIRE Draft Guidelines) because the feature catalogue can be used as SoTerML's “attribute patterns” (see next section). SoTerML appears compliant with the guidelines in its structure; however, the INSPIRE document mostly uses a fixed set of attribute names/values for soil (compared to the more flexible SoTerML mechanism) and it does not cover terrain data.

3.2. GeoSciML

A range of mark-up languages were evaluated for partial inclusion in the SoTerML format. An early candidate was the Geo-Science Mark-up Language (GeoSciML) (Bellier et al., 2006). The geological community across Europe has developed GeoSciML as a means to hold and convey earth sciences information for the purposes of query and exchange of digital geo-scientific information. GeoSciML offers a conceptual data model plus a mark-up language to query and exchange digital geo-scientific information between data providers and users (Sen and Duffy, 2005). Its scope is the information generally shown on geological maps along with some observations, in particular those made using boreholes. GeoSciML is based on both GML (OGC, 2004) for representation of geometrical features and the OGC Observations and Measurements standard (O&M) for the attached attributes.

It was initially considered that, given the physiological proximity and overlap of geology and soil, certain of the elements selected in GeoSciML could be reused in SoTerML. In particular, GeoSciML uses GML for geometrical attributes, so many geometric dependencies of SoTerML can be satisfied by reusing the developed GeoSciML classes. However, GeoSciML is itself a relatively heavy package compared to the SoTerML core and it defines many more elements that are not usable in SoTerML. Although this remains an objective, concerns have emerged that the semantic differences between the two domains may prove hard to accommodate; for example, a soil profile is not a ‘Borehole’ as envisaged in GeoSciML, and a “GeologicUnit” as defined does not capture the full description of soil parent materials. It is hoped that future developments should see convergence of these standards.

3.3. Soil and terrain database (SOTER)

Since 1986, ISRIC has initiated and led the development of the national and the global soil digital database at scales ranging from 1:5 M to 1:500 K, depending on the user's need (Oldeman and van Engelen, 1993). The implementation has been undertaken by FAO, UNEP and ISRIC, and has been supported by the International Union of Soil Sciences (IUSS). A main part of SOTER development is the design and methodology of coding the soil attributes. This coding scheme has been used in the SoTerML attribute design.

3.4. ENVASSO soil database (SODA)

SODA has been developed by BGR (Federal Institute for Geosciences and Natural Resources, Germany) as a general and flexible database for soil-related data (Eberhardt, 2008). The main particularity of SODA is its flexibility in accepting user-defined soil attributes. In fact SODA tries to make its table structure simple and to impose as little constraint or pre-definition of the schema as possible for accepting soil data, while keeping the database robust. Although SODA has not been widely used, its flexibility in design has been partially acknowledged in the ISO development for soil data exchange and so has inspired SoTerML in supporting user-defined soil attributes.

3.5. World reference base of soil resources (WRB)

WRB is a standard taxonomic classification of soil that has been developed by IUSS, FAO and ISRIC in a working group called WRB in 1998. The most recent editions of it (WRB, 2006, 2007) have been used in the design of SoTerML. More details about the WRB classification method used in SoTerML are described in Section 4.6.

3.6. FAO soil classification (Revised FAO legend)

The Revised FAO Legend (FAO, 1988) (corrections on the 1994 edition of the World-SMW-Legends) is another method of soil classification. The FAO Legend, unlike WRB, has a flat (as opposed to hierarchical) structure. The result of classifying a soil categorises it in one of the 147 classes. This classification has been used as one of the standards implemented in SoTerML.

3.7. The European soil database

The SOMIS service (SOil Map Internet Service, now available at http://eusoils.jrc.ec.europa.eu/wrb/) allows users to retrieve and download soil data in a raster or KML format (Panagos and Van Liedekerke, 2006). SOMIS mainly uses the WRB and FAO soil classifications, deriving this from the European Soil Database (which can also be downloaded as CD-ROM from http://eusoils.jrc.ec.europa.eu/esdb_archive/ESDB/Index.htm). The principles of the ESD, particularly for the raster distribution of soil data and the use of different scales, are described in Darrousin et al., 2006, Panagos, 2006 and Montanarella et al., 2005.

Early work to align the European Soil Database and Information System with SOTER noted the complementarity of the physiographic layer and representative profile dataset between the two approaches (Nachtergale, 2000). The contemporary SoTerML format now draws together these land facets in the mark-up language. The SOTER approach provides an “orderly arrangement of natural resources data … from the point of view of potential use and production, in relation to food requirements, environmental impact and conservation” (Nachtergale, 2000). SOTER does not simply concern soil data, but seeks a representation of soil, terrain and landscape facets. The SOTER Terrain component seeks to delineate a spatial, physiographic unit representing landforms based on homogenous gradient and relief intensity (FAO, 1995). Initial physiographic delineation is then combined first with lithological information, then with slope and mesorelief characteristics to further sub-divide the landscape. Soil components are then integrated with these resultant terrain components. In turn, soil components contain representative soil profiles, having fully described profile horizons, with respective descriptive and analytical information. The scope of SoTerML is to capture in a single coherent and OGC standards-compliant form all these member elements of the SOTER characterisation.

[Figure: UML class diagram showing SoTerML (containing 1..* SoTerUnit), SoTerUnit (SoTerUnitID), TerrainComponent (TerrainComponentNumber) and SoilComponent (SoilComponentNumber), together with the reused GeoSciML classes GeologicFeature, GeologicUnit and GeologicUnitPart.]

Fig. 1. The top level description summarising SoTerML objects in version 4 (reusing GeoSciML classes).

[Figure: UML class diagram showing SoTerML (containing 1..* SoTerUnit), SoTerUnit (SoTerUnitID), TerrainComponent (proportion, TerrainComponentNumber), SoilComponent (proportion, SoilComponentNumber) and Profile (profileID), with the GML geometry links shape (GM_Polygon) and point (GM_Point).]

Fig. 2. The top level description summarising SoTerML objects in version 5.1 (GeoSciML independent, GML dependent).
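The containment described for SOTER above (units holding terrain components, which hold soil components, which hold profiles with horizons) can be illustrated with a minimal Python sketch. The class and attribute names follow those visible in Fig. 2; the `Horizon.label` field, the XML tag layout produced by `to_xml`, and the sample values are assumptions made for illustration only and are not taken from the SoTerML schema.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Horizon:
    label: str  # hypothetical field; the paper does not list Horizon members


@dataclass
class Profile:
    profile_id: str
    horizons: List[Horizon] = field(default_factory=list)


@dataclass
class SoilComponent:
    number: int
    proportion: float
    profiles: List[Profile] = field(default_factory=list)


@dataclass
class TerrainComponent:
    number: int
    proportion: float
    soil_components: List[SoilComponent] = field(default_factory=list)


@dataclass
class SoTerUnit:
    unit_id: str
    terrain_components: List[TerrainComponent] = field(default_factory=list)


def to_xml(units):
    """Nest the objects as XML elements (assumed layout, not the real schema)."""
    root = ET.Element("SoTerML")
    for unit in units:
        u = ET.SubElement(root, "SoTerUnit", SoTerUnitID=unit.unit_id)
        for tc in unit.terrain_components:
            t = ET.SubElement(u, "TerrainComponent",
                              TerrainComponentNumber=str(tc.number),
                              proportion=str(tc.proportion))
            for sc in tc.soil_components:
                s = ET.SubElement(t, "SoilComponent",
                                  SoilComponentNumber=str(sc.number),
                                  proportion=str(sc.proportion))
                for p in sc.profiles:
                    pe = ET.SubElement(s, "Profile", profileID=p.profile_id)
                    for h in p.horizons:
                        ET.SubElement(pe, "Horizon", label=h.label)
    return root


# Invented sample data: one unit, one terrain component, one soil component.
example = to_xml([
    SoTerUnit("U1", [
        TerrainComponent(1, 1.0, [
            SoilComponent(1, 0.6, [Profile("P1", [Horizon("A"), Horizon("B")])])
        ])
    ])
])
```

Calling `ET.tostring(example, encoding="unicode")` on the result gives a nested document whose depth mirrors the SOTER characterisation; a real SoTerML instance would additionally carry GML geometry and namespace declarations, which are omitted here.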

3.8. US soil taxonomy

US Soil Taxonomy is the soil classification standard used in the USA (USDA, 1999). The classification is based on analytical methods that group soils into a hierarchy of classes, comparable to that used in WRB. SoTerML has not yet implemented the US Soil Taxonomy as one of the classifications (as SoTerML is primarily designed within a non-US context), but it is open to be extended to cover the Soil Taxonomy as required.

3.9. ISO “Recording and exchange of soil-related data”

This ISO working group commenced work in 2008 but has not yet published a standard. The draft documents (ISO/DIS 28258) of this group target a very general-purpose class design for soil-related data, on which a mark-up language can be based. Considering the consensus sought by this working group, one could anticipate a high level of acceptance from the community.

3.10. Soil-ML

A recent initiative to harmonise different soil schemas is the Soil-ML project (Montanarella et al., 2010). This initiative is planned to be governed by a working group within the IUSS (International Union of Soil Sciences), which is a strong part of the agenda to ensure recognition and sustainability. Their aim is to build upon the different approaches, including SoTerML, to propose a conceptual model implementable in XML for the global adoption needed by international programs such as GEOSS.

4. Soil and terrain mark-up language (SoTerML)

4.1. Design backgrounds

The design of the data model has been based on: (1) the development of different SOTER databases in the past; (2) a study of its overlaps with the GeoSciML data model specifications; and (3) a study of the current status of soil classification standards.

[Figure: UML class diagram showing SoTerML, SoTerUnit, TerrainComponent, SoilComponent and Profile, each optionally carrying Attribute objects (name: IDREF) defined through an AttributeReference; Profile is linked to Horizon, to SoilClassification (holder, name, version, year) and to ISO metadata (MD_Metadata).]

Fig. 3. The second level of UML hierarchy.

The main design resources are:

- SOTER Procedures Manual (van Engelen and Wen, 1995);
- SOTER Database Structure v3 (Temple, 2002);
- SOTER2009 (January and April editions): SOTER Manual Update for Discussion e-SOTER partners (Draft): SOTER Attribute Data Coding and draft of non-spatial attributes of a SOTER Unit (internal e-SOTER document);
- WRB (WRB, 2006, 2007);
- GeoSciML's online documentations, data model and schemas (GeoSciML, 2006); and
- FAO revised legend (FAO, 1988).

If geological data and soil and terrain data are different semantically, they nonetheless do have a connection at a global semantic level. A geoscientist may seek to obtain different types of information at a given spatial location. Therefore SoTerML has to be technically compliant with GeoSciML. They are both dependent on GML (OGC, 2004) and it is via an anchor on GML that the compliance is sought. Early drafts of SoTerML package (up to version 4) expressed this anchor by re-using a sub-package from GeoSciML.

It is noticeable that some links between SOTER databases, WRB and FAO soil classifications have already been implemented elsewhere, for example, in the SODA design.

4.2. SoTerML package

The SoTerML package includes a data model specification, XML schemas, the SoTerML attribute pattern, the matching table with SOTER database, as well as documentations and samples. The current and previous versions of the SoTerML package can be accessed via http://lab.maps.nottingham.ac.uk/SoTerML. The modelling approach was initially based on the potential overlaps with GeoSciML, as described in the next sub-sections, but a finer semantic interpretation has led us to drop this dependence. Since version 5 (currently 5.1) SoTerML is no longer dependent on GeoSciML, making the class structure a more appropriate semantic for soil and terrain data.

4.2.1. Data model

The data model specification, where details are included in the UML design, consists of the core SoTerML elements and soil classification sub-elements. The approach in designing the UML class diagram follows the object-oriented method of system analysis and design (Bennett et al., 2006). The CASE tool used to draw out the UML diagrams was also able to generate the XML Schema (XSD) as well as provide code engineering from the UML diagrams.

4.2.2. XML schemas

The XML Schema (XSD) is the W3C standard language for describing the structure of a data model described in XML. Along with references to GeoSciML 2.0 and GML 3.0, the main XML schema (SoTerML.xsd) includes the developed schema for the core elements and for the sub-elements (SoilClassification.xsd). The latter contains the validation rules for a part of the SoTerML that describes the soil classification (e.g., WRB). Another important “schema” is named AttributeReference.xml (detailed in the next sub-section) that contains the attribute data dictionary and vocabulary information to be used for validation purposes within a written SoTerML document.

[Figure: tree diagram of a SoTerML document. SoTerML contains SoTerUnits (1, 2), each with Terrain Components, Soil Components and Profiles; a Profile carries a Classification, Horizons (1, 2) and Laboratory information.]

Fig. 4. Needed attributes structure in the class hierarchy of SoTerML.
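As a rough illustration of how an attribute data dictionary such as AttributeReference.xml might be used for validation, the following Python sketch loads a small dictionary and checks attribute values against it. The element names `AttributeEntry` and `ValueEntry` and their `name`, `sourceElement` and `valueID` members follow the paper's UML design for attributes, but the XML layout, the source elements assigned to each attribute, and the helper functions `load_reference` and `check_attribute` are assumptions for illustration, not the project's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified dictionary. "pH-KCl" and the "vegetation" codes are
# examples named in the paper; the source elements chosen here are assumed.
ATTRIBUTE_REFERENCE = """
<AttributeReference>
  <AttributeEntry name="pH-KCl" sourceElement="Horizon"/>
  <AttributeEntry name="vegetation" sourceElement="TerrainComponent">
    <ValueEntry valueID="I"/>
    <ValueEntry valueID="IA"/>
    <ValueEntry valueID="IA1"/>
    <ValueEntry valueID="IA2"/>
  </AttributeEntry>
</AttributeReference>
"""


def load_reference(xml_text):
    """Map attribute name -> (source element, permitted value IDs or None)."""
    reference = {}
    for entry in ET.fromstring(xml_text).iter("AttributeEntry"):
        permitted = {v.get("valueID") for v in entry.iter("ValueEntry")}
        reference[entry.get("name")] = (
            entry.get("sourceElement"),
            permitted or None,  # None marks an open (numeric or literal) attribute
        )
    return reference


def check_attribute(reference, element, name, value):
    """Return a list of problems; an empty list means the value is accepted."""
    if name not in reference:
        return [f"unknown attribute {name!r}"]
    source, permitted = reference[name]
    problems = []
    if source != element:
        problems.append(f"{name!r} is not assignable to {element!r}")
    if permitted is not None and value not in permitted:
        problems.append(f"{value!r} is not a permitted value of {name!r}")
    return problems
```

For example, `check_attribute(load_reference(ATTRIBUTE_REFERENCE), "TerrainComponent", "vegetation", "IA1")` returns an empty list, whereas an unlisted code or a wrong source element yields one or more messages. Keeping the dictionary outside the class definitions means new attributes can be added without touching the core schema.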

4.2.3. Attribute patterns

This includes the table of different attributes applicable to different elements of SoTerML and their permitted values.

4.2.4. SOTER matching table

This document is aimed at those who are familiar with the legacy SOTER database schema, providing a guide for matching between the SOTER database attributes and their equivalent implementation in SoTerML.

4.3. Class hierarchy

The details and diagrams of the class hierarchy in the SoTerML data model are also described in the SoTerML documentation. The top levels of the class hierarchy in two different versions of SoTerML are shown in Figs. 1 and 2. The second level of UML diagram is shown in Fig. 3.

A brief review of Figs. 1–3 shows that:

- The root element, called SoTerML, can include a number of SoTerUnits.
- Each SoTerUnit may include a number of TerrainComponents.
- Each TerrainComponent may include a number of SoilComponents.
- A number of SoilProfiles can be associated to a SoilComponent.
- Each Profile itself includes a number of Horizons. A Profile is classified according to one of the standard SoilClassifications.
- A soil classification (e.g., WRB, 2006) instance is then associated to each profile. For the case of WRB (2006, 2007), the profile is associated to an RSG (Reference Soil Group) and to a number of Qualifiers (and, optionally, Specifiers) defined as prefixes or suffixes.
- All of the above classes may have a number of SoTerML-specific attributes.
- Each attribute above has a name and value. It may also be associated to a number of AnalyticalMethods.
- In the current version (5.1), the geometry of each SoterUnit and Profile is implemented by the links to relevant GML objects called shape and point respectively. In version 4, however, those geometries were implemented via upper classes (GeologicUnit of the GeoSciML package and SamplingPoint of the O&M package) to the relevant GML objects. As stated above, the classes reused from GeoSciML do not carry the same semantic descriptor as the ones from GeoSciML. Nonetheless, as they seek to represent the same feature type viewed either from a soil perspective or a geological perspective, they must correspond. We retained the GeoSciML semantic here, but these classes can be redefined semantically for the soil context and declared as perfectly matching with the GeoSciML ones.

4.4. Attribute pattern design

The approach in designing SoTerML data attributes was to incorporate as much flexibility and reuse as possible. Different elements in the class hierarchy of SoTerML require attributes to be associated with them without restricting their numbers or their data types or specificities. This principle is illustrated in Fig. 4. Thus a single and flexible mechanism has been designed to associate attributes and values to different elements, which possesses the following characteristics:

- The attribute assigning mechanism must be abstracted from the class hierarchy. This allows the developers to manage and maintain the two separately and with flexibility. This means that each element can have many attributes, with the permissible attributes and their characteristics for each element being defined in the Attribute Pattern, not the core SoTerML.
- Each attribute can either have a value, or itself have recursive sub-attributes.
- The permitted values for each attribute can be either open (numeric or literal) or enumerated/restricted. If it is defined as restricted, the list of permissible values is defined in the Attribute Patterns.

A data model implementing these principles and with the required characteristics for an attribute pattern is shown in Fig. 5.

[Figure: UML class diagram. AttributeReference holds AttributeEntry definitions (name: ID, sourceElement, description) with ValueEntry lists (valueID: ID, shortTerm, fullTerm, description); an Attribute (name: IDREF) carries a Value (literalValue, numericValue or valueID: IDREF, of which only one can be implemented) and may be linked to a LaboratoryAnalysis with an AnalyticalMethod (introductionYear, introductionMonth, serialNumber, description) and a Laboratory.]

Fig. 5. UML design for SoTerML attributes.

Fig. 7. Pseudo-XML code in implementation of fixed (in SOTER) vs. open (in SoTerML) attribute design showing the coding of “clayMineralogy” attribute in Fig. 6.

4.5. Matching attributes between the SoTerML and SOTER

Since SOTER databases are to be updated by e-SOTER using SoTerML attribute patterns, a matching mechanism has been implemented in order to import (or export) attributes between those two domains. The required data attributes in the specification of the SOTER database structure have been divided into the following four categories:

1. The attributes that are reused in SoTerML as derived directly from the SOTER database, such as SoTerUnitID.

Fig. 6. Part of the attribute pattern design in SoTerML.

2. The attributes that were required for the relational database design, but not necessarily needed in the SoTerML structure. "Not necessarily" because SoTerML uses a semi-structured XML design where redundant information is not always avoided; profile_ID is an example.
3. The attributes that are automatically associated with, or inherited from, GeoSciML or SoilClassification.xsd classes. This happens only if the right classes are chosen for association/generalisation in the SoTerML data model. An example is the proportion of a soil component in a terrain component, which is implemented by the proportion attribute of GeologicUnitPart in GeoSciML.
4. The attributes that are designed to be implemented using the Attribute Patterns mechanism in SoTerML. These are the attributes that are appropriate to be fed from an AttributeReference.xml bank, rather than being an explicit data member of SoTerML classes. Holding these attributes as records of a data bank, rather than as columns, gives much more flexibility in the data model, especially for current and future amendments of the attributes assignable to each SoTerML class. These attributes are themselves divided into three groups:
   4.1. Numeric or literal: these attributes can accept a number or a literal string, such as pH-KCl.
   4.2. Enumerated: the values of these attributes must be one of a set of predefined values, such as vegetation having a set of permitted values "I", "IA", "IA1", "IA2", etc.
   4.3. Sub-attributes: some of the attributes are a container for other sub-attributes of one of the two groups above, forming a recursive reference to other attributes; for example, the "dryColor" attribute has its own three attributes, "hue", "value" and "chroma".

The attributes of category 4, with their properties and possible enumerations, have been called AttributeReference in SoTerML. The XML file depends on the schema of the AttributeReference element, which is built into the SoTerML schema. As a result, any SoTerML-compliant reference must have SoTerML.xsd as its schema and a chosen AttributeReference.xml as an included XML file. For including an XML file within another XML file, as shown in the sample file presented, the XInclude mechanism is recommended as a W3C standard (Marsh et al., 2006). Note that AttributeReference may not itself be validated unless it is used within a validated SoTerML file.

In the AttributeReference file each attribute is named together with its SourceElement, Description, Reference and a list of permitted enumerated values (where applicable). SourceElement is the name of the SoTerML class (or the name of the parent attribute for the case of 4.3 above) to which this attribute can be assigned. For each enumerated value, short and full terms and a description can be defined.

Fig. 6 shows a part of SoTerML's attribute pattern table. In this figure, for example, "clay mineralogy" is an attribute name, belonging to "horizon" as the source element and having seven valid options as its value. Fig. 7 compares the fixed vs. open approach in XML implementations in both domains.

4.6. The soil classification in SoTerML

The SoTerML approach to soil classification is designed to be open, so that different standards can be included and a soil profile can be classified using those standards. The standards implemented in SoTerML to date are WRB (2006, 2007) and Revised FAO Legend

[Figure 8: UML class diagram. SoilClassification (holder, name, version, year: CharacterString [0..1]) is specialised by WRB2007, WRB2006 and FAORevisedLegend1988 (implemented in this package) and by NationalSystem, USDASoilTaxonomy or any other standards (not implemented here).]

Fig. 8. Soil Classification in SoTerML.

1988 (Fig. 8). Accommodation is also made for national soil classifications.

In summary and according to the 2006 WRB specification (WRB, 2006):

- The classification of a soil profile is based on diagnostic criteria in its Horizons, Properties and Material, each of which can be selected from a predefined list.
- The Diagnostic is then mapped in two levels of classification:
  - First, to categorise the soil profile into one of the predefined Reference Soil Groups (RSGs).
  - Second, to assign a number of qualifiers to the selected RSG.
- The Qualifiers are of three types: Typically Associated, Intergrade and Others, each being selectable from predefined contextual lists.
- Specifiers are used to define a certain characteristic of a qualifier, are written before the qualifier, and can be selected from a predefined list. Examples are Epi-, Endo-, Hyper- and Hypo-.
- Typically associated qualifiers and intergrade qualifiers are considered to be prefixes of the RSG, while other qualifiers are considered to be suffixes, with two exceptions:
  - Arenosols is of type intergrade but is a suffix qualifier.
  - For buried layers, the "thapto" specifier as an intergrade qualifier is written as a suffix.

The standard format for classifying a soil profile is given by its RSG, its qualifiers and specifiers: "[[specifier(s)]+[prefix qualifier(s)]]+RSG+[[specifier(s)]+[(suffix qualifier(s))]]"; an example is Histic Turbic Cryosol (Dystric).

In SoTerML, it is possible to code using the WRB classification in both the 2006 and 2007 (WRB, 2007) formats (Fig. 9). A substantial difference between the two versions (in terms of the underlying data model) is the ability in 2006 to make qualifiers by adding "-ic" to the end of an RSG, which can be used for classifying a buried layer using the "thapto" specifier (like thapto Podzolic). In the 2007 format, these qualifiers can no longer be made; instead the model now allows two classifications to be connected by "over" to describe buried layers. For the data model, this means: 1) fewer qualifiers; and 2) an "over" attribute for the WRB, 2007 classification, which is itself another WRB classification (a cyclic reference from WRB, 2007 to itself via the attribute "over").

4.7. Code sample

For the sake of clarity, Fig. 10 shows only a fragment of the XML coding of a sample SoTerML file. This part shows a soil

[Figure 9: UML class diagram (cut). WRB2007, a SoilClassification, has one rsg taken from the enumeration RSGList (Albeluvisol, Arenosol, Cryosol, Ferralsol, Luvisol, ...), an optional over (0..1) and any number of qualifier (0..*) of type Qualifier2007. Each Qualifier2007 has one position from the enumeration PositionList (prefix, suffix), one name from the enumeration QualifierList2007 (Abruptic, Aceric, Acric, Acroxic, Albic, Alcalic, Alic, Aluandic, Alumic, Andic, Anthraquic, Anthric, Anthrotoxic, Arenic, Areninovic, ...) and an optional specifier (0..1) from the enumeration SpecifierList (Bathy, Cumuli, Endo, Epi, Hyper, Hypo, Ortho, Para, Proto, Thapto).]

Fig. 9. WRB 2007 UML design (cut).
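The WRB 2007 structure of Fig. 9 can be mimicked in a few lines, as a sketch rather than as the SoTerML implementation. The class names Qualifier and WRB2007 follow the figure loosely, but the name method, the way a specifier is fused onto its qualifier and the use of the word "over" when printing are assumptions made here for illustration. The qualifier combinations are not checked against the WRB keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Qualifier:
    name: str                        # e.g. "Histic"
    position: str                    # "prefix" or "suffix"
    specifier: Optional[str] = None  # e.g. "Epi", "Endo", "Hyper"

    def label(self) -> str:
        # Assumption: the specifier is fused in front of the qualifier name.
        if self.specifier:
            return f"{self.specifier}{self.name.lower()}"
        return self.name


@dataclass
class WRB2007:
    rsg: str                                   # Reference Soil Group
    qualifiers: List[Qualifier] = field(default_factory=list)
    over: Optional["WRB2007"] = None           # cyclic reference used for buried soils

    def name(self) -> str:
        prefixes = [q.label() for q in self.qualifiers if q.position == "prefix"]
        suffixes = [q.label() for q in self.qualifiers if q.position == "suffix"]
        text = " ".join(prefixes + [self.rsg])
        if suffixes:
            text += " (" + ", ".join(suffixes) + ")"
        if self.over is not None:
            text += " over " + self.over.name()
        return text


cryosol = WRB2007("Cryosol", [
    Qualifier("Histic", "prefix"),
    Qualifier("Turbic", "prefix"),
    Qualifier("Dystric", "suffix"),
])
print(cryosol.name())

# Purely illustrative: one classification lying over another.
print(WRB2007("Arenosol", over=cryosol).name())
```

The first call reproduces the example given in Section 4.6, Histic Turbic Cryosol (Dystric). Because over holds another WRB2007 object, a buried soil needs no special qualifier type, which reflects the reduction in qualifiers noted for the 2007 format.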

Fig. 10. Example of SoTerML encoding (part of the whole XML file).

profile specification that includes its sampling geometry, WRB classification, and part of one of its horizon definitions with the horizon attributes.

5. Prototype implementation

SoTerML can be used as an open data exchange format for soil and terrain-related data between various sources or consumers of such information. This section presents a prototype implementation of this exchange between a database and a web service. The development of this transfer service has been undertaken within the e-SOTER project at Cranfield University's National Soil Resources Institute, UK.

The traditional SOTER methodology (van Engelen and Wen, 1995) identifies a database schema for holding the various elements of the SOTER soil and terrain classification. Implementations of this schema have been made in a variety of database management tools, with non-spatial attributes held in systems ranging from dBASE to, most recently, MS Access; the latter has the advantage of storing all the required data elements in one file, together with the requisite relationships. Spatial data elements were held in an associated ESRI GIS 'Shape file', with integer-based identifiers for the highest level Soter Unit, or 'terrain' element, being used to link between the two data repositories (Fig. 11).

The majority of SOTER databases available to date remain in this 'legacy' format. A means is therefore required to convert such data into the SoTerML format. To achieve this, a Java-based parser tool has been developed to process the input GIS Shape file and associated Access database, resulting in a single SoTerML file. Here, the existing SOTER database of Kenya has been used to demonstrate the transfer. The interface of the developed parser is shown in Fig. 12 and the resulting SoTerML file of the Kenyan database is partially shown in Fig. 13. This parser also outputs processing notes to guide the user where full compliance with the XSD is not possible, allowing the source file to be re-edited before re-export to SoTerML.

One key objective of the research is to create an e-SOTER dissemination platform, namely a web-based portal capable of holding soil profile and analytical data together with a semantic map database. Functionalities shall be provided to address missing data and to assist in data validation through the use of soil 'pedo-transfer functions and rules' to be held as database methods. Standards-compliant metadata allows for easy discovery of the data resources.

For the database platform, the object-relational database management system 'PostgreSQL/PostGIS' was selected for its spatial data handling capabilities, and also to satisfy the project objectives of producing solutions founded on open-source technologies.

In order to demonstrate how the resulting SoTerML can be fed into a web service, the Kenyan SoTerML data was loaded into the database and then made available using the open source server software GeoServer, which enables the publication of GML data via Open Geospatial Consortium (OGC, 2004) Web Services. Fig. 14 shows the resulting map returned from an OGC Web Feature Service for the Kenya SOTER dataset.

This prototype is the forerunner to a SOTER geo-portal which will allow users to view, query and download SOTER datasets using the described SoTerML.
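The conversion step described in this section (joining the spatial units to the attribute records on the integer unit identifier, and reporting processing notes where a value cannot be made compliant) can be sketched as follows. This is not the Java parser developed in the project. The function to_soterml, the element names, the landform codes and the geometries are all hypothetical; only the unit identifiers are borrowed from Fig. 11.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def to_soterml(units: Dict[int, dict], geometries: Dict[int, str],
               permitted: Dict[str, List[str]]) -> Tuple[ET.Element, List[str]]:
    """Join attribute rows and geometries on the unit ID and build one XML tree.

    Returns the tree and a list of processing notes for the user.
    """
    root = ET.Element("SoTerUnits")   # hypothetical element names throughout
    notes: List[str] = []
    for unit_id, row in sorted(units.items()):
        if unit_id not in geometries:
            notes.append(f"unit {unit_id}: no geometry found, skipped")
            continue
        unit = ET.SubElement(root, "SoTerUnit", {"id": str(unit_id)})
        ET.SubElement(unit, "geometry").text = geometries[unit_id]
        for name, value in row.items():
            allowed = permitted.get(name)
            if allowed is not None and value not in allowed:
                # Non-compliant value: leave it out and tell the user why.
                notes.append(f"unit {unit_id}: {name}={value!r} is not a permitted value")
                continue
            ET.SubElement(unit, "attribute", {"name": name}).text = str(value)
    return root, notes


# Invented sample data standing in for the Access tables and the Shape file.
attribute_rows = {319: {"landform": "LP"}, 320: {"landform": "??"}, 322: {"landform": "LP"}}
shapes = {319: "POINT (0 0)", 320: "POINT (1 1)"}

tree, processing_notes = to_soterml(attribute_rows, shapes, {"landform": ["LP", "SH"]})
print(ET.tostring(tree, encoding="unicode"))
for note in processing_notes:
    print(note)
```

Keeping the notes separate from the output tree mirrors the workflow described above, in which the user corrects the source data and re-exports.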

[Figure 11: schematic linking the attribute databases (terrain, terrain component and soil component) to the spatial units through integer identifiers such as 317, 319, 320, 321 and 322.]

Fig. 11. Relationships between spatial and non-spatial SOTER Unit elements.

Fig. 12. Running the SoTerML Parser.

Fig. 13. A SoTerML output translated from the Kenyan SOTER database.

Fig. 14. GeoServer OGC Feature Service for Kenya, viewed in Snowflake Software GML Viewer, showing complex elements for the selected SoTerUnit.
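A client would typically retrieve features such as those shown in Fig. 14 with a WFS GetFeature request. The sketch below only builds the request URL using the standard key-value parameters of WFS 1.1.0; the server address and the feature type name esoter:SoTerUnit are placeholders, since the actual endpoint and layer names of the prototype are not given here.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url: str, type_name: str, max_features: int = 10) -> str:
    """Build a WFS 1.1.0 GetFeature request URL (key-value pair encoding)."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": max_features,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = wfs_getfeature_url("http://example.org/geoserver/wfs", "esoter:SoTerUnit", 5)
print(url)
```

The response to such a request is a GML feature collection, which is what a GML viewer renders as complex elements for the selected unit.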

6. Conclusion

Developing an XML exchange format for a scientific domain which has a long history has proved an interesting yet challenging task. These historical developments incorporate a great many important features, sometimes too specific, depending on the evolution of the discipline and on the information technology existing at the time. Existing data structures for legacy data have to be taken into account to allow easy transfer and update of knowledge. The SoTerML approach with its current model version appears flexible enough to be used in this respect, and we also believe that its flexibility would allow adaptation to future needs for soil and terrain data conceptual models. Besides the implementation of SoTerML within the e-SOTER project as demonstrated in Section 5, the links between GEOSS and e-SOTER should enable sustainability of its use, as much for legacy data as for newly provided soil and terrain datasets using the e-SOTER methodologies. As with any format, if the tools already developed are adopted and perfected, more soil scientists and other GEOSS users will benefit from the interoperability provided by SoTerML.

We hope this experience will benefit the various current initiatives for harmonisation of soil data models, including the INSPIRE thematic specification. An archive containing the SoTerML package is available from the e-SOTER website and by request to the first author.

Acknowledgements

The authors acknowledge financial support from the EU FP7 programme under the e-SOTER project (contract 211578). The authors would also like to thank Vincent van Engelen, Koos Dijkshoorn and Hannes Reuter from ISRIC (World Soil Information, the Netherlands), Rainer Baritz and Einar Eberhardt from BGR (Federal Institute for Geosciences and Natural Resources, Germany), Vit Penizek and Simon Cox from JRC (Joint Research Centre, the European Commission in Ispra, Italy) and Erika Micheli from Szent Istvan University (Hungary) for their kind contributions and feedback in developing this project.

References

Babaie, H.A., Ramachandran, R., 2005. Application of XML in the geosciences. Computers & Geosciences 31 (9), 1079–1080.
Bennett, S., McRobb, S., Farmer, R., 2006. Object Oriented System Analysis and Design using UML, 3rd edn. McGraw Hill. pp. 103–126.
Bellier, C., Robida, F., Serrano, J.J., Brodaric, B., Boisvert, E., Richard, S., Johnson, B., Laxton, J., Duffy, T., Sen, M., Simons, B., Ritchie, A., Wyborn, L., Cox, S., Stolen, L., 2006. GeosciML, the geosciences markup language. In: Extended Abstracts of the third International Conference on GIS in Geology, Moscow, pp. 16–17.
Daroussin, J., King, D., Le Bas, C., Vrščaj, B., Dobos, E., Montanarella, L., 2006. Chapter 4: the soil geographical database of Eurasia at scale 1:1,000,000: history and perspective in . Developments in Soil Science 31 (55–65), 602.
Eberhardt, E., 2008. SoDa 1.0.6 – Soil Database for ENVASSO, Manual: Database Design and Selection; ENVASSO Project coordinated by Cranfield University for Scientific Support to Policy, European Commission 6th Framework Research Programme, <http://www.envasso.com/Publications/ENV_D5_WP3+Appends_prt2bk.pdf> (Accessed August 18, 2010).
e-SOTER, 2008. e-SOTER project website, <www.esoter.org> (Accessed August 18, 2010).
EU, 2007. Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Official Journal of the European Union L108 (50). April 2007.
INSPIRE TWG/Soil, 2011. D2.8.III.3 INSPIRE Data Specification on SOIL – Draft Guidelines, <http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_SO_v2.0.pdf>.
FAO, 1988. Soil Map of the World Revised Legend, World Soil Resources Rep. 60, FAO, Rome.
FAO, 1995. Global and National Soils and Terrain Digital Databases (SOTER). Procedures Manual. 74. Rev 1. Land and Water Development Division, FAO, Rome. 129 pp.
GeoSciML, 2006. GeoScience Markup Language Website (Accessed August 18, 2010).
GSM.net, 2010. GlobalSoilMap.net: New Digital Soil Map of the World, <http://www.globalsoilmap.net> (Accessed August 18, 2010).
GSSoil, 2010. eContentplus GS Soil, <http://www.gssoil.eu> (Accessed August 18, 2010).
Kibblewhite, M., Rubio, J.-L., Kosmas, C., Jones, R., Arrouways, D., Huber, S., Verheijen, F., 2007. Environmental assessment of soil for monitoring desertification in Europe. 8th Session of the Conference of the Parties (COP 8) to the United Nations Convention to Combat Desertification (UNCCD), Madrid, Spain, 3–14 September 2007.
Lautenbacher, C.C., 2006. The global earth observation system of systems: science serving society. Space Policy 22 (1), 8–11.
Le Bissonnais, Y., Jamagne, M., Lambert, J.-J., Le Bas, C., Daroussin, J., King, D., Cerdan, O., Leonard, J., Bresson, L.-M., Jones, R.J.A., 2005. Pan-European soil crusting and erodibility assessment from the European soil geographical database using pedotransfer rules. Advances in Environmental Monitoring and Modelling 2 (1), pp. 1–15.
Marsh, J., Orchard, D., Veillard, D., 2006. XML inclusions (XInclude) version 1.0 (2nd edn.). W3C Recommendation, World Wide Web Consortium (W3C), <http://www.w3.org/TR/xinclude/> (Accessed August 18, 2010).
Montanarella, L., Jones, R.J.A., Dusart, J., 2005. The European Soil Bureau Network. In: Jones, R.J.A., Houskova, B., Bullock, P., Montanarella, L. (Eds.), Soil Resources of Europe, 2nd edn. European Soil Bureau Research Report No. 9, EUR 20559EN, Office for Official Publications of the European Communities, Luxembourg, 420 pp.
Nachtergale, F.O., 2000. The Global Soil and Terrain Database and the European Soil Information System. In: The European Soil Information System. World Soil Resources Reports No. 91. European Union and Food and Agriculture Organization, Rome, 153 pp.
Montanarella, L., Wilson, P., Cox, S., McBratney, A., Ahamed, S., McMillan, Bob, Jacquier, D., Fortner, J., 2010. Developing SoilML as a global standard for the collation and transfer of soil data and information, <http://eusoils.jrc.ec.europa.eu/esdb_archive/eusoils_docs/Poster/montanarella_EGU2010_XML.pdf> (Accessed July 1, 2011).
OGC (Open Geospatial Consortium, Inc.), 2004. OpenGIS Geography Markup Language (GML) Implementation specification. In: Cox, S., Daisey, P., Lake, R., Portele, C., Whiteside, A. (Eds.), OpenGIS Recommendation Paper OGC 03-105r1 Version: 3.1.0, <http://www.opengeospatial.org/> (Accessed August 18, 2010).
Oldeman, L.R., van Engelen, V., 1993. A world soils and terrain digital database (SOTER) – an improved assessment of land resources. Geoderma 60, 309–335.
Panagos, P., 2006. The European soil database. GEOconnexion International 5 (7), 32–33.
Panagos, P., Van Liedekerke, M., 2006. Mapping services in the European soil portal. GEOconnexion International 5 (8), 42–45.
Sen, M., Duffy, T., 2005. GeoSciML: development of a generic geoscience markup language. Computers & Geosciences 31 (9), 1095–1103.
Tempel, P., 2002. SOTER – Global and National Soils and Terrain Digital Database, Database Structure v3, ISRIC Working Paper No. 02/01, <http://www.isric.org/isric/webdocs/Docs/DatabaseStructureM1.pdf> (Accessed August 18, 2010).
USDA (United States Department of Agriculture), 1999. Soil Taxonomy, 2nd edn., <ftp://ftp-fc.sc.egov.usda.gov/NSSC/Soil_Taxonomy/tax.pdf> (Accessed August 18, 2010).
van Engelen, V., Wen, T.T., 1995. Global and National Soils and Terrain Digital Databases (SOTER): Procedures Manual, Vol. (Published also as FAO World Soil Resources Report No. 74), pp. 115. UNEP, IUSS, ISRIC, FAO, Wageningen, the Netherlands.
van Engelen, V., 2011. Standardizing soil data. International Innovation, 48–49. June 2011.
WRB, 2006. World Reference Base for Soil Resources 2006, first update 2007. World Soil Resources Reports No. 103. FAO, Rome.
WRB, 2007. World Reference Base for Soil Resources 2006, World Soil Resources Reports No. 103. FAO, Rome.