Geospatial Interoperability Standards

Biology and Environment

Phillip C. Dibner Ecosystem Associates

SEE Grid II, Canberra, Australia, March 16, 2005

Motivation

• Tens (100s?) of billions of dollars worth of spatial data already archived at diverse sites, using a variety of data formats and supporting software suites
• Too large and expensive to move and convert
• Prohibitively inconvenient to store and manage centrally
• Operationally constraining and administratively challenging to mandate a single-technology supplier

The Solution: Interoperate

• Data remain in place
• No constraints on maintenance operations or policy
• Existing in-house tools and applications remain viable
• Only the interfaces need be well-defined among interoperating clients and servers

Open Geospatial Consortium (Until Recently “Open GIS Consortium”)

• The Open Geospatial Consortium (OGC) grew out of the need among federal agencies to share data.
• Currently a collaboration of more than 250 vendors, integrators, academic institutions, government and private agencies, and other end users.
• Published specifications
  – Abstract Spec in 16 volumes
  – 14 Implementation Specs published
  – Recommendations and Discussion Papers
  – More in various stages of creation or refinement
• Technology baseline is maturing. Increasing emphasis on application to specific domains.

Structural View of OGC

• Planning Committee
  – Steering, policy, general management and oversight
• Strategic Member Advisory Committee
• Technical Committee (TC)
  – Meets 4 times per year. Next: United Nations, NYC, January 17 - 20, 2005. C’mon down!
  – Develops specifications, discussion papers, recommendations
  – Working Groups (WGs)
    • Technology
    • Domain

Current (+/-) Working Groups

• Architecture (Arch WG)
• Coordinate Reference Systems (CRS WG)
• Decision Support WG (DS WG)
• Earth Observation (EO WG)
• Geo Digital Rights Management (GeoDRM WG)
• GeoAPI (GeoAPI WG)
• Geography Markup Language (GML WG)
• Image Exploitation Services (IES WG)
• Information Communities and Semantics (ICS WG)
• Location Services (LS WG)
• Metadata (Metadata WG)
• Natural Resources and Environment (NRE WG)
• Query Language (Query WG)
• Sensor Web (SWE WG)
• University (Univ WG)
• (WFS WG)
• Joint Advisory Group (JAG)

Revision Working Groups

• Focus on (revisions to!) an existing specification
• Formed and dissolved according to status of that spec
• Current (+/-):
  – Catalog RWG
  – GML RWG
  – GO-1 RWG
  – OpenLS RWG
  – RWG
  – RWGs for WCS, WFS, WMS

Process View of OGC

• Interoperability Program (IP)
  – Interoperability Initiatives, Interoperability Experiments, Pilots, etc.
  – Experimental testbeds for proposed technologies; test existing specs and implementations, discover needs, and generate requirements and proposals for new specs.
• Standards Program
  – Requirements come from WGs, IP, and other sources
  – Produces draft standards (or accepts them from IP)
  – WG votes bring candidate documents to the TC at large

Philosophy of Standards Development

• Minimal impact on existing implementations
• Base content and communications standards on implementation-independent abstractions
• Use the expertise of other information communities (“buy don’t build”)
• Interaction with and participation in other standards bodies
  – ISO TC 211:
    • ISO 19107, Feature Geometry; ISO 19128, Web Map Server spec; 19136, GML, are ISO standards or standards in process
    • Use ISO 19115, 19119, 19139, others
  – W3C, OMG, …
    • SOAP, WSDL, UDDI (testbeds), ebRIM -> OGC RIM, BPEL for service chaining (recent testbed), serious examination of XQuery
  – TDWG??

Abstract Specification

• Conceptual foundation / reference model for spec development
• http://www.opengeospatial.org/specs/?page=abstract
• 17 “books” or “Topics” (including overview)
• Each topic described in 6 sections, including:
  – Essential Model: real-world items relevant to or that define the universe of discourse
  – Abstract Model: defines classes and their relationships, typically using a graphical or lexical language
  – Well known structures, described in UML
• Two central themes: sharing geospatial information, and providing services

Abstract Spec: Sharing Geospatial Information

• Fundamental Data Types
  – Topic 5: The OpenGIS Feature. (Misleading, useful hint: think “vector data - point, line, polygon.”)
    • An abstraction of a real-world phenomenon
    • Geographic if associated with a location on the Earth
    • Geometric and non-spatial attributes
  – Topic 6: The Coverage Type. (Misleading hint: think “raster.”)
    • A specialization of feature
    • A two- (sometimes higher) dimensional metaphor for phenomena on the surface of the earth
    • Essentially a function over a contiguous area
    • Also requires a mapping from earth coordinates to the coverage extent, to provide geolocation
  – Topic 7: Earth Imagery. A special case of Coverage

Abstract Spec: Sharing Geospatial Information

• Prerequisites for sharing geospatial data
  – Topic 1: Feature Geometry
    • Geometric and topological primitives and operators
  – Topic 2: Spatial Reference Systems
  – Topic 3: Locational Geometry Structures
    • Within images, rasters, and similar entities
  – Topic 4: Stored Functions and Interpolation
    • Essential for the support of Coverages
  – Topic 8: Relationships Between Features
  – Topic 11: Metadata
    • Modeling and query

Abstract Spec: Providing Geospatial Services

• Topic 12: Service Architecture
  – comprehensive suite of services
  – relationships to each other
  – CORBA / OMG compliance and relationships
• Topic 13: Catalogs
• Topic 15: Image Exploitation Services
• Topic 16: Image Coordinate Transformation Services

Abstract Spec: Miscellaneous Topics

• Topic 9: Quality
• Topic 10: Feature Collections
  – Transient or persistent
  – Dependent on world view or “Project Structure”
  – Abstract model not yet defined
• Topic 14: Semantics & Information Communities

Implementation Specifications

• Provide basis for working software; detail the interface structure between software components
• http://www.opengeospatial.org/specs/?page=specs
• Tested and refined in Testbeds and Pilot programs; recommended changes are brought back to the Technical Committee for vote
• Many implementations in commercial and open-source software

Recommendations and Discussion Papers

• Recommendations:
  – http://www.opengeospatial.org/specs/?page=recommendation
  – Revisions of other specs (e.g., GML and WMS)
  – Units of Measure, Observation: particularly interesting to TDWG
• Discussion Papers
  – http://www.opengeospatial.org/specs/?page=discussion
  – Forum for public review of results and proposed but unvoted standards
  – Geoparser, Geocoder, Gazetteer
  – Other results from IP initiatives

OGC Data Services (“W*S”) Pattern

Client: “What can you do?” (GetCapabilities)
Server: “Here… read this.” (Capabilities Document)
Client: “Great! Give me data” (Get Map, Feature, or Coverage)
Server: “Here you are….” (Data)

Capabilities Document

• An XML document that describes the service.
• Specified by XML Schema in most recent version.
• Service element that provides general metadata for the service as a whole
  – E.g., name, human-readable title, online resource
• Capability metadata
  – Operations such as GetCapabilities and GetMap
  – Output formats for these operations (e.g., PNG or JPG for GetMap requests)
  – URL prefix for each operation
• Hierarchically nested Layer elements
  – Name, title
  – Spatial Reference System (SRS) - supports EPSG codes
  – Enclosing bounding box of layer

NOAA Hurricane Image of the Gulf of Mexico

Source: Web Map Service Implementation Specification, OGC Project Document 01-047r2

Political, Coastline, and Populated Areas, Southeastern United States

Source: Web Map Service Implementation Specification, OGC Project Document 01-047r2

Combined Hurricane Image and Population Map

Source: Web Map Service Implementation Specification, OGC Project Document 01-047r2

Cascading Map Server

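Diagram: a web browser (different viewer client in browser!) connects over the internet to a Cascading Map Server, which acts as capabilities integrator and map integrator. The Cascading Map Server in turn connects over the internet to four map servers:
• Map Server: GIF, WGS 84
• Map Server: PNG, AL St Pln
• Map Server: GIF, NAD 83
• Map Server: JPEG, WGS 84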

Adapted from Web Map Server Demonstration presentation © 2000 OGC

Web Feature Server

• Passes GML-encoded data (not pictures) between server and client.
• Basic WFS:
  – GetCapabilities
  – DescribeFeatureType
  – GetFeature
• Transactional WFS provides for remote update of datastore:
  – Transaction
  – LockFeature

OGC Data Services

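Diagram of four client / service pairs:
• WCS Client sends GetCoverage to a Web Coverage Service (backed by coverage data), which returns Coverage Data. Also shown: Coverage Portrayal Service.
• WFS Client sends GetFeature to a Web Feature Service (backed by vector data), which returns Feature Data (as GML). Also shown: SLD Service.
• WMS Client sends GetMap to a Web Map Service (backed by map layers), which returns a Rendered Map Image.
• SOS Client sends GetObservation to a Sensor Observation Service (backed by a constellation of sensors), which returns a Measurement Collection.

Planning Studies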

Diverse jurisdictions Different data sets Hazards Database

Transportation Network Design Server

Background 1
Background 2

Natural Resources Canada Protected Areas

Management of protected areas: Whose responsibility? Whose jurisdiction?

Data Service Application Protected areas from all provinces, multiple services Jurisdictional boundaries overlay

Sources: OGC User (http://www.opengeospatial.org/press/?page=ogcuser). Data view: Canadian National Forest Information System.

Analysis of an Invasion by Exotic Beetles

Green dots: collections of native beetles Yellow: exotics Red: exotics not yet established.

Guides and optimizes management practices

Sources: OGC User (http://www.opengeospatial.org/press/?page=ogcuser). Data view: Canadian National Forest Information System.

Problem Completely Solved. Right?

Another vendor's GIS

One vendor's GIS

Maybe Not

WFS Client

Road
Road

Semantics

One Information Community’s Schema
  Road is: Width, Lanes, Pavement type, ….
  Cell tower is: Owner, Height, Licensees, ….

Another Information Community’s Schema
  Highway is: Pavement thickness, Right of way, Width, ….
  Cell trans. platform is: Location, No. of antennas, Elevation, ….

Yet another Information Community’s Schema
  Traffic corridor is: No. of vehicles/hour, Limited access, Lanes, ….
  Cellular transmitter is: Cell region, Location, Transmitter type, ….

• Terminology
  – Same term can mean different things.
  – Different terms can mean the same - or nearly the same - thing.
• Information model - how are objects of interest defined? What are their components and relationships?
• Implications - what is a running piece of software to do when it encounters incompatible differences in the concept of what an object is?

Taxonomies

• The meaning of an object can be understood through its position in a hierarchical structure.
• A taxonomic organization can imply inheritance of properties that characterize parent nodes:

THING: RED THING, GREEN THING
STUFF: RED STUFF, GREEN STUFF
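As a rough illustration of inheritance in such a hierarchy, the sketch below stores each node's parent and resolves a node's properties by walking up to the root. The property names and values ("countable", "color") are invented for the example and do not come from the slide.

```python
from typing import Dict, List, Optional

# Parent links for the THING / STUFF hierarchy in the diagram.
PARENT: Dict[str, Optional[str]] = {
    "THING": None,
    "STUFF": None,
    "RED THING": "THING",
    "GREEN THING": "THING",
    "RED STUFF": "STUFF",
    "GREEN STUFF": "STUFF",
}

# Properties declared directly on each node (hypothetical values).
OWN_PROPERTIES: Dict[str, Dict[str, str]] = {
    "THING": {"countable": "yes"},
    "STUFF": {"countable": "no"},
    "RED THING": {"color": "red"},
    "GREEN THING": {"color": "green"},
    "RED STUFF": {"color": "red"},
    "GREEN STUFF": {"color": "green"},
}


def inherited_properties(node: str) -> Dict[str, str]:
    """Merge properties from the root down, so a child can override its parent."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    merged: Dict[str, str] = {}
    for name in reversed(chain):  # root first, the node itself last
        merged.update(OWN_PROPERTIES.get(name, {}))
    return merged


print(inherited_properties("RED THING"))
```

A node such as RED THING ends up with its own color plus whatever THING declares, which is the sense in which position in the hierarchy carries meaning.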

• The archetypic taxonomy: the Linnaean system for classifying biota.

The Reality of Biological Classification

(Not so ideal…)

? ? ?

Ontologies

• A richer semantic model can be built upon representation of objects, an arbitrary number of properties that they have, and their relationships.
• Objects (or “resources”) have properties, and a value for each property.
• Can produce complex webs of relationships, and correspondingly rich semantics, i.e., formal meanings for each element that are machine-interpretable.

Ontologies

Material contributed by PCI Geomatics.

Groundswell of Technologies

WFS automatically generates GML

A "DescribeFeatureType" query through a WFS interface…

… against all features in a local data model coded in a relational data base…

… yields a GML 3 "application schema" for that data model.
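A DescribeFeatureType query of this kind is commonly sent as a key-value-pair HTTP GET request. The sketch below only builds such a URL; the endpoint `http://example.org/wfs`, the version number, and the feature type name `app:Road` are placeholders chosen for illustration, not values from the presentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_kvp_request(base_url: str, service: str, request: str, **params: str) -> str:
    """Build a key-value-pair GET URL for an OGC web service operation."""
    query = {"SERVICE": service, "REQUEST": request}
    # Keyword arguments become upper-case query keys, e.g. typename -> TYPENAME.
    query.update({key.upper(): value for key, value in params.items()})
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(query)


# Hypothetical server and feature type.
url = build_kvp_request(
    "http://example.org/wfs",
    "WFS",
    "DescribeFeatureType",
    version="1.1.0",
    typename="app:Road",
)
print(url)
print(parse_qs(urlsplit(url).query))
```

The same helper would serve for GetCapabilities or GetFeature by changing the `request` argument, since the W*S operations share this request shape.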

© 2004 Open Geospatial Consortium, Inc.

GML - OGC Lingua Franca for Geospatial Data

• An XML language - now in revision 3.1
  – OGC project document OGC 03-105r1
  – ISO Committee Draft 19136
  – Currently publicly available as an OGC Recommendation Paper
• Defined in XML Schema
• Based upon the OGC Abstract Specification and ISO 19000 series
• Defines a variety of object types
  – Features
  – Coordinate reference systems
  – Geometry (simple and complex)
  – Topology
  – Time
  – Units of Measure
  – Generalized values
• The notion of Feature is fundamental
  – An abstraction of a real-world phenomenon
  – It’s a geographic feature if it is associated with a location relative to the Earth
  – Digital representation of a portion of the earth == a set of features

GML

• Really a generic object model
• Supports an object-property paradigm similar to class-property model of RDF (like entity-relationship pattern in RDBMS)
• Encoded by declaring a GML type (subtyped directly or indirectly from gml:AbstractFeatureType or gml:AbstractFeatureCollectionType)
  – Then assign properties to that type
  – Property = a {name, type, value} triple
• Semantics are provided by context of application, not inherent to the GML document (although it is good practice to name a property according to the role of its value or the relationship it defines).
• The state of a feature is defined by the values of its properties
  – A property may be geometric and geolocated
  – A GML feature may have multiple geometric properties
• GML properties are XML elements that are children of a feature, not attributes
• A GML object cannot contain another object directly - it must have a property that contains that object as a value.

GML Application Schemas

• GML schema documents provide base types only.

• Feature types for a particular application are defined in an XML Schema document that defines a language - a GML Application Schema - of interest to a particular information community.

• Conceptual role of an application schema: define a catalog of feature types relevant to the information community.

• Operational role: validate XML instance documents containing information of interest to the community.

• An application schema need not import GML in its entirety.
  – Subsetting guidelines, and a set of XSLT stylesheets for automating the process, are provided in an appendix to the GML-3.1 specification document.

GML: Example

Cold Creek 1,2 3,3 3,4 4,5 gravel

What a "Translating Web Feature Server" does:

WFS-X maps elements in one schema to elements in another schema…

Diagram: Application Schema (global, or standard – e.g., FGDC) and Application Schema (local), linked by the WFS-X.

…so a server can respond to a query for a feature of one type by delivering a feature of another nearly identical type.
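The element-to-element mapping can be pictured with a small sketch. Everything schema-specific here is an assumption made for illustration: the element names on both sides, the omission of XML namespaces, and the plain-text geometry are simplifications, and only the sample values (Cold Creek, the coordinate list, gravel) echo the example slide. A real WFS-X would work on namespaced GML and be configured per schema pair.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, hand-configured equivalences between a local schema
# and a standard schema (element names are invented for this sketch).
LOCAL_TO_STANDARD: Dict[str, str] = {
    "Road": "RoadSegment",
    "roadName": "name",
    "centerLine": "centerLineOf",
    "pavementType": "surface",
}


def translate(element: ET.Element, mapping: Dict[str, str]) -> ET.Element:
    """Copy an element tree, renaming mapped tags; unmapped tags pass through."""
    translated = ET.Element(mapping.get(element.tag, element.tag), dict(element.attrib))
    translated.text = element.text
    translated.tail = element.tail
    for child in element:
        translated.append(translate(child, mapping))
    return translated


local_feature = ET.fromstring(
    "<Road>"
    "<roadName>Cold Creek</roadName>"
    "<centerLine>1,2 3,3 3,4 4,5</centerLine>"
    "<pavementType>gravel</pavementType>"
    "</Road>"
)
standard_feature = translate(local_feature, LOCAL_TO_STANDARD)
print(ET.tostring(standard_feature, encoding="unicode"))
```

A pure renaming table only works when the two feature definitions are nearly identical; differences in structure or meaning would need more than a tag mapping.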

© 2004 Open Geospatial Consortium, Inc.

A WFS-X must be configured manually!

Diagram: Application Schema (global or generic) compared with Application Schema (local or specialized). “Is this the same as that?” “Almost!”

© 2004 Open Geospatial Consortium, Inc.

After configuration, two data sets using the two different data models can be translated "on-the-fly"…

WFS-X maps elements in one schema to elements in another schema

Diagram: Core Schema (FGDC) and Application Schema (local)

… translates as…

© 2004 Open Geospatial Consortium, Inc.

A WFS-X can reach across the web to translate data from a remote WFS:

• User's web browser to WFS-X: Query: GetFeature (Standard Schema)
• WFS-X to user's web browser: Result: a feature (Standard Schema)
• WFS-X: Translating Web Feature Server configured to translate between Standard Schema and schema "A".
• WFS-X to WFS: Query: GetFeature (Local Schema A)
• WFS to WFS-X: Result: a feature (Local Schema A)
• WFS: Web Feature Server serving Schema A data

© 2004 Open Geospatial Consortium, Inc.

Example: GOS Transportation Pilot

User's browser

• Portal has a registry to help user find data.
• Portal provides browser-accessible WFS client.
• WFS client on portal issues a standard schema query, but is able to access local schema data through a WFS-X that is configured to translate that local schema.

Web Registry Service WFS Client Generator

DOT Portal Node

WFS WFS-X WFS WFS-X

California Siskiyou County Jackson County Oregon Node Node Node (local schema) Node (std. schema)

© 2004 Open Geospatial Consortium, Inc.

Consequence: Unified View of Transportation Network


© 2004 Open Geospatial Consortium, Inc.

Biological Collections

Taxonomic Databases Working Group

Mission:
• International forum for biological data projects.
• Develop and promote use of standards
• Facilitate data exchange

• Affiliate of the International Union of Biological Sciences
• Only international body developing / promoting standards for exchange of biocollections data
• Data sets (specimens & observations) the primary source for biodiversity information
• Location of a collection is fundamental
• They want to use standards for describing location
• Raises issues similar to many other domains

GML Integration 1

“Lollipop” approach: traditional data model + a bit of GML

TDWG Schema GML “field”

One schema, Access to Biological Collections Data (“ABCD”), already includes a GML “field” as a placeholder

GML Integration 1: Implications

• Minimal impact on existing schemas
• Immediately available whenever GML descriptions of locations are produced
• Entails duplication of information - &/or multiple access methods
• Interoperability with data sets from other communities - especially GIS, remote sensing, and other groups with strong connections to geography - might be more challenging

GML Integration 2

TDWG GML Application Schema(s)

“But I’ll have to change my root data type!!!!!!!”

GML Integration 2: Implications

• Could be a LOT of work, with attendant delays in usability. (Disruptive!)
• Might involve rehashing difficult issues that have long been settled.
• (On the other hand - might be simple.)
• Would facilitate interoperability with data services from other information communities
  – Inherent compatibility with WFS
  – Suitable for other services in process

GML Integration 3

Diagram: TDWG Schema, Map, GML Application Schema

GML Integration 3: Implications

• Can develop as a parallel effort - no need to delay implementations using current or emerging TDWG schemas and technologies
• Foundation work for the mapping likely essential to creating a GML application schema for collections and related data
• We could define a strategy to reduce or eliminate need for future translation of non-GML, TDWG-compliant data instances:
  – Find feature definitions in the two schemas that match, or nearly match. Then configure software – akin to a "Translating Web Feature Server" (WFS-X) – to translate data based on these equivalencies.
  – There are some risks - the process may reveal that some feature definitions are too dissimilar to translate.

GML Integration - The Ideal

Diagram: TDWG Schema, Isomorphic, GML Application Schema

Then there is a natural mapping. What’s the chance?

Happenstance that GML is GML

• Recall GML is an implementation of a generic object model
• Should it have been implemented in RDF?
  – RDF wasn’t sufficiently mature
  – XML Schema provides fine-grained primitive types
  – GML could be written in RDF, DAML, OWL. Earlier versions have been written in DTD, RDF, and XML Schema
• If a schema is built upon an implementation-independent, abstract object model (can be expressed as UML), it can be converted to GML.

Portions of this material were contributed by Galdos, Inc.

Automated UML-to-GML Translation: “MDA”

Automatically generate GML Application Schemas from the model using a UGAS (UML to GML Application Schema) tool. The component schemas are inherently harmonized, since their models are. (Such a tool was created in the GOS-TP initiative, used in OWS-1.2, and is released under GPL.)

Diagram: UML Model, UGAS Tool, GML Application Schema (SchemaGlobal)

Diagram © 2004 Open Geospatial Consortium, Inc.

There’s a Great Chance

Diagram: TDWG’s Darwin Core, XSLT Transform, GML Application Schema

Not for ABCD, but for Darwin Core: a simpler but still very broadly useful schema for biocollections that is implementable in UML and follows the Object/Property paradigm. Development of a suitable transform is in process.

Barriers

• Isn’t GML really complex? A barrier for nonspecialists.
  – Complex because domain is complex: 28 schemas in GML 3.
• However:
  – Increasingly, a matter of “pushing buttons.” (+/- …)
  – There is nothing normative about the way GML is packaged. Can extract what is needed as basis for application schemas.
  – There are tools (including a script in the GML 3.1 doc) for extracting the required components.
  – There is a substantial community with an interest in providing assistance.

Georeferencing Biocollections

2.5 to 3.5 BILLION specimens worldwide.
Even more species observation records from the last 250 years.
Substantially unusable.
Less than 0.25% georeferenced worldwide.
No common form for data.
No standard way to collate the info.
Too big a job for 1 institution to do it all.
Solution: create a toolset that can be accessed by collections managers of the world.

BioGeomancer (!)

© Moore Foundation 2005

Where might standards-based interfaces fit in?

Data Dissemination

The obvious opportunity … both Client and Server Side

WMS / WFS Client?

WMS Server?

WFS Server?

© Moore Foundation 2005

Georeferencing Engine

Geoparser? (a WFS profile, and an OGC Discussion Paper). Dialog w/ OGC started.

Gazetteer? (a WFS profile, and part of the OGC Adopted Baseline).

Notional input record:
recID:String
locality:String
country:String
adm1:String
adm2:String
taxon:String
elevation:String
context:String
date:String
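The notional input record can be written down as a simple data structure. The sketch below keeps the field names and the all-string typing from the slide; the sample values and the helper `place_terms` are invented for illustration and are not part of BioGeomancer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputRecord:
    """Notional georeferencing input record; every field is a free-text string."""
    recID: str
    locality: str
    country: str
    adm1: str
    adm2: str
    taxon: str
    elevation: str
    context: str
    date: str


def place_terms(record: InputRecord) -> List[str]:
    """Hypothetical helper: place-name parts, most specific first, skipping blanks."""
    parts = [record.locality, record.adm2, record.adm1, record.country]
    return [part.strip() for part in parts if part.strip()]


# Sample values are made up for the sketch.
record = InputRecord(
    recID="example-001",
    locality="5 km N of Cold Creek bridge",
    country="USA",
    adm1="California",
    adm2="",
    taxon="Quercus agrifolia",
    elevation="300 m",
    context="",
    date="1957-04-12",
)
print(place_terms(record))
```

Keeping every field as a string reflects the slide's record; interpreting the locality text or the elevation would be the job of a geoparser or gazetteer lookup downstream.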

WPS?

© Moore Foundation 2005

Other Components, Other Opportunities

WFS-T?

WCS?

CTS? WCTS?

© Moore Foundation 2005

OGC Sensor Web

• All sensors connected to the Network
• All sensors report position & observations
• Sensor modeling and encoding: SensorML
• Observation modeling and encoding: O&M (GML-based)
• Access observations through Sensor Observation Services
• Plan collections through Sensor Planning Services
• Access sensor-related metadata through Web Registry Services
• User notification through Web Notification Services
• Alerts through Web Alert Services
• ?? … and the Common Alerting Protocol ??

Diagram: Industrial Process Monitor, Environmental Monitor, Airborne Imaging Device, Traffic Monitor, Stored Sensor Data, Strain Gauge, Webcam, Health Monitor, Satellite-borne Imaging Device

© 2004 Open Geospatial Consortium, Inc.

Overview of Sensor Web Access

• Historical and real-time measurements from meteorological, water quality, air quality, and seismic Sensor Observation Services (SOS).
• Some sensors are not always available. Must be queried as to when / if they can acquire data. Satellites. Ground or airborne mobile assets.
• Sensor Planning Service (SPS) executes GetFeasibility operation. Then may execute SubmitRequest operation.
• Requires notification.
• Sensor Web Enablement (SWE) a major focus of new initiative: OWS-3.

Diagram: sensor probes for water quality, meteorology, and air quality, each served by an SOS.
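The two-step planning interaction (ask whether a collection is feasible, then submit it) can be sketched with an in-memory stand-in. The class `StubPlanningService`, its method names, and the hour-window availability model are all hypothetical; this is not an SPS client library and does not reproduce the actual SPS message formats.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CollectionRequest:
    sensor_id: str
    start_hour: int
    end_hour: int


class StubPlanningService:
    """In-memory stand-in for a planning service (illustrative only)."""

    def __init__(self, availability: Dict[str, Tuple[int, int]]) -> None:
        # sensor_id -> (first available hour, last available hour)
        self._availability = availability
        self._next_task = 1

    def get_feasibility(self, request: CollectionRequest) -> bool:
        """Mimics GetFeasibility: can the sensor acquire data in this window?"""
        window = self._availability.get(request.sensor_id)
        if window is None:
            return False
        return window[0] <= request.start_hour and request.end_hour <= window[1]

    def submit_request(self, request: CollectionRequest) -> str:
        """Mimics SubmitRequest: accept a feasible request and return a task id."""
        if not self.get_feasibility(request):
            raise ValueError("request is not feasible")
        task_id = f"task-{self._next_task}"
        self._next_task += 1
        return task_id


def plan_collection(service: StubPlanningService, request: CollectionRequest) -> Optional[str]:
    """Check feasibility first; submit only if the sensor can do the job."""
    if service.get_feasibility(request):
        return service.submit_request(request)
    return None


service = StubPlanningService({"uav-imager": (6, 18)})
print(plan_collection(service, CollectionRequest("uav-imager", 8, 10)))
print(plan_collection(service, CollectionRequest("uav-imager", 20, 22)))
```

In a deployed system the accepted task would finish later, which is why the slide pairs planning with a notification mechanism.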

SOS and SPS Client

Western States Fire Mission

• Demonstration UAV flights Q3 ‘05
• New 12-channel multi-spectral imager, built upon prior work with AIRDAS; higher spectral & spatial resolution
• Telemetry using KU-band onboard communications with satellite relay link, C-band down-facing link for local control.
• PPP connection over each for assessing performance on other band.
• Limited radius and 24-hour fire reconnaissance flights.

Source: NASA Ames Research Center and USDA Forest Service

Western States Fire Mission

Opportunities for standards experimentation:
• Mobile SOS for aerosols or combustion products
• Geovideo - a new OGC service experiment
• Federated SPS for more sophisticated planning, using multiple assets
• Actuate features / aim sensor

Planned OWS-3 involvement: SPS deployment

Source: NASA Ames Research Center and USDA Forest Service

Another Application: Ecological Sampling

What? Individual and Summary Observations of Biota
• Biomass
• Dimensions
• Presence
• Population
• Condition

OF

(Geolocated) Geometry
• Quadrats
• Belt or Line Transects
• Point Transects and Arrays
• Etc.

The Observations and Measurements (O&M) Pattern

• An OGC Recommendation Paper: OGC document #03-022r3. Publicly available. (Editor: Simon Cox, CSIRO)
• An Observation is a GML Feature
• Incorporates prior work on theory of measurement
• Observation or measurement abstraction is the result of a process. Recent ruminations regard it increasingly as a workflow in which the observations provide evidence for a classification or other type of value
• Used in XMML - eXploration and Mining Markup Language (Simon Cox, CSIRO)

Observations and Measurements

Interoperability for Generic Measurements with a Spatial Component

Reference Collection? Name/Concept?

Species in Field Biology

• Geometry (and time) falls neatly out of sampling protocol.
• Metadata less complex than for museum collections. Not specimen oriented; want to characterize a community. TDWG Darwin Core really is overkill for this.
• Not the name, but a name: scientific binomial (with subspecies designations, other modifiers) + an “according to” property (technical flora, literature citation, …) should be adequate to identify a species observation formally and unambiguously.
• More complex structures like species lists and population vectors, and derived measures like species count or diversity measures then also have a formal basis.
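One way to picture the "a name plus an according-to" idea is a small value type from which species lists and derived measures follow directly. The class and function names below are assumptions for the sketch, the sample names and citation are illustrative, and the Shannon index is used only as one familiar example of a diversity measure; the slide does not name a specific one.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaxonName:
    """A scientific name qualified by the authority it is used 'according to'."""
    binomial: str
    according_to: str


def species_list(observations: List[TaxonName]) -> List[TaxonName]:
    """Distinct names observed, in order of first appearance."""
    return list(dict.fromkeys(observations))


def species_count(observations: List[TaxonName]) -> int:
    return len(set(observations))


def shannon_diversity(observations: List[TaxonName]) -> float:
    """Shannon index H = -sum(p * ln p) over the proportion p of each name."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Illustrative observations from a single hypothetical quadrat.
flora = "Jepson Manual (1993)"
oak = TaxonName("Quercus agrifolia", flora)
bay = TaxonName("Umbellularia californica", flora)
observations = [oak, oak, bay, bay]

print(species_count(observations))
print(shannon_diversity(observations))
```

Because the authority is part of the value, the same binomial cited from two different floras counts as two distinct names, which is the formal unambiguity the bullet asks for.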