The OGC Web Coverage Processing Service (WCPS) Standard

Peter Baumann Jacobs University Bremen 28759 Bremen Germany [email protected]

ABSTRACT

Imagery is more and more becoming an integral part of geo services. More generally, an increasing variety of sensors is generating massive amounts of data whose quantized nature frequently leads to rasterized data structures. Examples include 1-D time series, 2-D imagery, 3-D image time series and x/y/z spatial cubes, and 4-D x/y/z/t spatio-temporal cubes. The massive proliferation of such raster data through a rapidly growing number of services makes open, standardized service interfaces increasingly important.

Geo service standardization is undertaken by the Open GeoSpatial Consortium (OGC). The core raster service standard is the Web Coverage Service (WCS) which specifies retrieval based on subsetting, scaling, and reprojection. In 2008, OGC issued a companion standard which adds flexible, open-ended coverage processing capabilities. This Web Coverage Processing Service (WCPS) specifies a coverage processing language allowing clients to send requests of arbitrary complexity for evaluation by the server.

This contribution reports on the WCPS standard by giving an introduction to its coverage model and processing language. Further, design rationales are discussed, as well as background and relation to other OGC standards. 1-D to 4-D use case scenarios illustrate intended use and benefits for different communities. Although the paper focuses on conceptual issues, the WCPS reference implementation, PetaScope, is briefly addressed. The author is co-chair of the coverage-related working groups in OGC.

Categories and Subject Descriptors

E.1 [DATA STRUCTURES]: Arrays; H.2.8 [DATABASE MANAGEMENT]: Database Applications - Image databases, Scientific databases, Spatial databases and GIS

1. MOTIVATION

More and more, imagery is becoming an integral part of geo services. In a more general perspective, a variety of remote and in situ sensors generate massive amounts of data where the quantized nature of the measurements frequently leads to a rasterized data structure. Examples include 1-D time series, 2-D imagery, 3-D image time series and x/y/z spatial cubes, and 4-D x/y/z/t spatiotemporal cubes. In parallel to the massive proliferation of such raster data through a rapidly growing number of services the demand arises for not just simple data extraction and download services, but more flexible retrieval functionality.

This foreseeably will soon go beyond just offering subsetting from some server-side imagery. Actually, we face the transition from mere data services to value-adding information services, including adequate data processing capabilities; in other words, we face a paradigm shift from data stewardship to service stewardship.

In this contribution we adopt a standards-based perspective, introducing a recently published standard for a high-level language allowing complex server-side processing of multi-dimensional raster data. Standardization of open, interoperable geo services is performed by the Open GeoSpatial Consortium (OGC, www.opengeospatial.org) in collaboration with ISO, OASIS Open, W3C, and other relevant bodies. One of the historically first and still most prominent standards is the Web Map Service (WMS) Implementation Standard [9] which defines map image generation based on client-side parameter specification controlling server-side rendering ("portrayal"). While the results of such portrayal are ready for, e.g., immediate display in a Web browser, such rendered images usually are not suitable for further processing, e.g., in some analysis tool. For this purpose, the Web Coverage Service (WCS) Standard has been developed [39]: "Unlike the WMS, which portrays spatial data to return static maps (rendered as pictures by the server), the Web Coverage Service provides available data together with their detailed descriptions; defines a rich syntax for requests against these data; and returns data with its original semantics (instead of pictures) which may be interpreted, extrapolated, etc. - and not just portrayed." WCS basically allows retrieval based on subsetting, scaling, and reprojection. As such, WCS is the central OGC standard for simple, easy-to-use coverage services. The term "coverages" is used by ISO and OGC to denote a "space-varying phenomenon", which in practice today boils down to raster data. A more detailed discussion of the term will be given in Section 5.1.

Effective December 2008, OGC has issued an extension to WCS which adds flexible, open-ended coverage processing capabilities to WCS. This Web Coverage Processing Service (WCPS) Interface Standard [2] specifies a processing language as client/server interface which can be characterized as "SQL for coverages". WCPS allows for manifold ad-hoc processing of coverages, such as deriving the vegetation index, determining statistical evaluations, and generating different kinds of plots, like classification, histograms, etc.

Figure 1: Some of the most basic OGC services

The purpose of this contribution is to present the WCPS standard to the geo and GIS community. To this end, the remainder is organized as follows. Section 2 puts WCPS into the context of OGC's work. The main considerations and requirements that have guided WCPS development are discussed in Section 3, followed by a review of the state of the art in Section 4. Section 5, the core part, presents WCPS model and language concepts. For a practical assessment, a cross-dimensional set of use case scenarios is discussed in Section 6. Section 7 gives a summary and outlook.

2. OGC

OGC issues a family of modular geo service standards which are accessible free of cost. While the number of individual specifications sometimes is perceived as a disadvantage, the complexity of the matter actually mandates such modularization. Of course, maintaining harmonization between the specifications represents a continuous challenge which remains on the agenda of the specification writing working groups.

A possible grouping of OGC specifications runs as follows:

• core services: for the classical triad of geo data (vector, raster, and metadata), the Web Feature Service (WFS) [29], WCS [39], and Catalog Service (CS-W) [25] are provided.

• value-adding services: on top of the core services, additional service specifications allow for browser-based navigation (such as WMS), processing (such as the Web Processing Service (WPS) [34]), and additional service features (such as Digital Rights Management, GeoRM [36]), to name but a few.

• topical specification families, such as the Sensor Web Enablement (SWE) [7].

All specifications are based on a common architecture in support of OGC's vision of geospatial technology and data interoperability. The Abstract Specification document series provides the conceptual foundation for OGC specification development activities. Open interfaces and protocols are built and referenced against the Abstract Specification, thus enabling interoperability between different brands and different kinds of spatial processing systems. Several of the Abstract Specification documents are adopted from the corresponding standards developed by ISO TC211.

In addition to the high-level, abstract reference model the OWS Common specification [37] delineates technical cornerstones to which OGC standards need to adhere. For example, OWS Common mandates a GetCapabilities request type for every service which allows clients to retrieve information about data sets offered and service capabilities provided by a server implementing the standard. Further, common sets of metadata as well as canonical structures for the request XML schemas are laid down there.

Figure 1 sketches the WFS, WCS, and CS-W standards suites, plus WMS. WFS-T, WCS-T, and CS-T represent so-called transactional services which allow for updating of data sets on a server. Filter Encoding (FE in Figure 1) and OWS Common Query Language (CQL) serve for predicate-based retrieval on vector and meta data, respectively, a role which WCPS takes on for coverages. The core difference between WMS and the other interfaces mentioned is that WMS delivers portrayed feature and coverage data which are suitable for human viewing, but not for further processing; WFS, WCS, and CS-W, on the other hand, deliver data without semantic loss so that the output can be fed into further processing tools or pipelines (such as GISs).

In this contribution we exclusively focus on service interface standards for coverages, which center around WCS. These are dealt with by the WCS Standards Working Group, in short: WCS.SWG, and the WCPS Working Group. In addition, the Coverages Discussion Working Group (Coverages.DWG) acts as a platform for the exchange and discussion of coverage-related topics, such as features and their formulations in forthcoming specification versions. The author co-chairs these working groups (footnote 1). One very actively contributing interest group is GALEON (Geo-Interface to Atmosphere, Land, Earth, Ocean, NetCDF) which regularly reports about implementation experience as well as their view on requirements for future WCS versions; see www.ogcnetwork.net/galeon.

Footnote 1: Opinions expressed in this article are those of the author, not necessarily official OGC position.

Historically, the WCS 1.0 specification has been the first attempt to standardize raster services. Some shortcomings perceived with respect to clarity and conciseness have been remedied in version 1.1, however at the cost of an overall harder to understand specification, as it turned out. The current version is WCS 1.1 Corrigendum 2, in short: WCS 1.1.2 [39]. Additionally, beyond the original scope of mainly remote sensing imagery meantime a lot more communities and their particular data types need to be considered, such as 1-D time series and 4-D climate simulation results. Overall, complexity tends to grow unacceptably, a phenomenon not particular to WCS, but observed with other OGC standards as well. Recently, therefore, OGC has developed the so-called core/extension paradigm which allows for a guarded modularization of specifications. It mandates that in future OGC standards shall consist of a core specification, which identifies the smallest common denominator, plus an open-ended set of extensions where each extension adds some well-defined feature to the core. An implementor of such a specification set must include the core and can include any subset of the extensions while respecting interdependencies stated by the extensions chosen.

Figure 2: Possible WCS reorganization following the core/extension paradigm

Currently there are two WCS extensions already existing, WCS-T and WCPS. WCS-T (where "T" stands for transactional) extends the delivery service with a data upload facility, thereby allowing services to offer standards-based data ingest facilities [38]. WCPS we will address below. Among the possible further extensions under consideration are generalized coordinate support (to allow any combination of the spatiotemporal axes plus so-called abstract axes), factoring out fully general CRS support into an extension, support for an open-ended list of data exchange formats, and advanced interpolation methods during reprojection and scaling.

In addition and as part of ongoing harmonization between WCS, WFS, SWE, and other standards, the WCS group plans to go one step further and split the core WCS into a Grid Coverages Common (tentative title), which specifies a service-independent coverage structure, and the WCS service as such. The expected benefit is that the coverage concept can be used by other service standards, independently from WCS. For example, in a future scenario a Sensor Observation Service (SOS) might deliver coverages for ingestion through a WCS-T interface feeding a database with a WCS/WCPS front-end. Figure 2 shows a possible lineup of WCS core and extensions.

The next section discusses the design goals and considerations which have guided development of WCPS.

3. REQUIREMENTS

WCS tentatively restricts itself to a few simple operations, mainly: spatio-temporal subsetting; range subsetting (in some domains also referred to as "band selection"), reprojection, scaling, and data format encoding. This makes WCS relatively simple to implement and helps communities to rapidly get their data assets online accessible through a service complying with open standards. However, among the inputs brought into the WCS.SWG there have been several requests for functionality beyond such simple coverage access, such as deriving the Normalized Difference Vegetation Index (NDVI) from a multi-spectral satellite image. Rather than opening a can of worms by adding an open-ended set of sometimes underspecified algorithms or functions used in different flavours by different communities the WCS group decided to develop WCPS as an extension to WCS which allows users and/or service providers to phrase a large class of operations themselves on demand.

The intended use of WCPS can be summarized as navigation, extraction, and server-side analysis of large, possibly multi-dimensional coverage repositories. Navigation of coverage data requires capabilities at least like WMS (meaning subsetting, scaling, overlaying, and styling), but on objects of different dimensionalities and often without an intrinsic visual nature (such as elevation or classification data). Versatile portrayal and rendering capabilities, therefore, play an important role. Extraction and download involve tasks like retrieving satellite image bands, performing band combinations, or deriving vegetation index maps and classification; hence, they likewise require subsetting, summarization, and processing capabilities. Analysis mainly includes n-dimensional spatiotemporal statistics. In summary, a range of imaging, signal processing, and statistical operations should be expressible; this has been studied to some extent in [13].

Additionally, the language should not be too distant in its conceptual model from existing geo data processing tools (such as the ones listed in the next section) so that it is economically feasible for vendors to implement the standard as an additional layer on top of their existing products. On a side note, still such implementations obviously can differentiate in terms of performance, scalability, and other factors.

Further, it should be possible for some deployed service to accept new, unanticipated request types without extra programming, in particular not on server side. The rationale behind is that both lay users and experts frequently come up with new desires; however, it is not feasible for a service provider to continuously invest into programming of new service functionality. Ideally the service interface paradigm offers open-ended expressiveness available without client-side or server-side programming. This calls for a language approach where users (or client developers) can flexibly combine existing building blocks into new functionality.

From databases we learn that it is advantageous to craft such a language in a way that it is declarative and safe (footnote 2). "Safe in evaluation" in database speak means that every admissible request will terminate after finite time; the effect is that no Denial of Service (DoS) attack is possible on the level of a single request. In languages like SQL this is achieved by avoiding explicit loop and recursion constructs. Obviously this property requires a tradeoff with respect to the overall expressive power of the language. On the one hand, a large set of statistics, signal, and image processing algorithms need to be supported. On the other hand, a client must not be given unlimited power over what is executed on the server. We consciously maintain that the processing language be safe in evaluation, thereby retaining, for example, convolutions, but losing, for example, matrix inversion.

Footnote 2: Maybe the most prominent example of a safe and declarative language is SQL.

Declarative languages, as opposed to imperative, procedural ones, allow the user to specify what the result should look like rather than telling the system in what steps this result is computed. Declarativeness not just makes request formulation easier to the user, but also opens up free space for the server to optimize requests, i.e., rearrange evaluation for achieving the same result faster. As our experience with array databases tells that there is a wide range of effective optimization methods on coverage manipulation [40], optimizability of expressions is an important requirement.

Notably we do not demand minimality of the language. We will come back on this aspect in the conclusions when we can give concrete examples where minimality has adverse effects on usability.

Coverage support no longer is constrained to 2-D imagery and 3-D image timeseries. Since some time now within OGC coverages are seen as 1-D to 4-D entities, and several times "abstract", i.e., non-spatiotemporal axes have been brought into discussion, such as atmospheric pressure. Hence, coverage expressions should allow to freely walk through the dimensions, in any combination of spatial, temporal, and abstract axes. For example, 2-D coverages with x and z axes can well occur as slicing results from 3-D or 4-D coverages. On a side note, such considerations for WCPS actually to some extent have driven generalization of the WCS coverage model.

Further, the language should be semantic web ready in that coverage access and manipulation is described in a completely machine-readable form, independent from human intervention when it comes to service discovery and orchestration.

Finally, given that an international standard is aiming at a large and diverse community and stands to assure semantic interoperability between heterogeneous clients and servers, a formal specification of syntax and semantics seems indispensable. Still, the resulting specification document needs to be understandable, in particular to programmers not necessarily familiar with mathematical semantics definition. While the many attempts of combining both properties in a model have shown that this seems close to impossible, a suitable compromise should be aimed at.

On a side note, ease of comprehension also rules out a pure XML encoding; languages like XQuery and XPath show how compact language style can be combined with XML.

4. STATE OF THE ART

For the design of a standardized coverage processing language suitable for use in a Web environment we have investigated existing OGC standards, image processing, and image databases for finding a suitable basis.

4.1 WPS

The OGC Web Processing Service (WPS) is a standard which specifies a geo service interface for any sort of GIS functionality across a network [34]. A WPS may offer calculations as simple as subtracting one set of spatially referenced numbers from another (e.g., determining the difference in influenza cases between two different seasons), or as complicated as a global climate change model. This model makes WPS especially suitable for "webifying" legacy applications.

Figure 3: Excerpt from a WPS process specification

Essentially, this XML based model specifies remote method invocation in the spirit of RPC, Corba, and Java RMI, but additionally with explicit geospatial semantics in the XML schema. As such, it brings along all the concerns of similar approaches, such as SOAP, one of them being security: a malevolent WPS implementor can implement any kind of server resource access and manipulation, without any control by the system administrator where the service ultimately is deployed.

Another grave shortcoming of WPS is its low semantic level. To understand this better let us inspect an example. A server-side routine provides a function Buffer which accepts an input polygon and returns a buffered feature. In the corresponding service description which is based on the standardized WPS XML Schema (see Figure 3), function name as well as input and output parameters are described. This represents the function signature, i.e., operation syntax; looking for the semantics specification we find XML elements Title and Abstract containing human-readable text. Hence, there is no way for any automated agent to use it without human interference at some point.

This has a number of serious drawbacks:

• WPS consists only of a low-level syntactic framework for procedural invocation without any coverage specific operations; in other words, the processing functionality itself is not specified, hence any high-level services implemented on top of WCS per se are not standardized and interoperable (footnote 3);

• SOAP offers only syntactic service interoperability, as opposed to the semantic interoperability of WCPS;

• adding any new functionality to a WPS installation requires new programming on both client and server side;

• such a service cannot be detected by an automatic agent, as the semantics is not machine understandable;

• for the same reason, automatic service chaining and orchestration cannot be achieved - for example, it is unclear to an automatic agent how to connect output parameters of one processing step with the input parameters of the next step due to the missing semantics information;

• with a similar argument, server-internal optimization such as dynamic load balancing is difficult at least.

Footnote 3: The WPS specification already mentions that it requires specific profiles to achieve fully-automated interoperability.

Hence, WPS foresees that focused application profiles are defined based on the core specification; these profiles, then, are supposed to be crafted so as to allow for interoperability indeed. Following this approach, a WCPS application profile is currently under development in close collaboration with the re-established WPS working group [6].

4.2 Image Processing

We immediately rule out computer vision and image understanding, as these disciplines work on a different semantic level than WCPS is aiming at. Further, answers generated in these domains normally are of probabilistic nature, whereas for WCPS the goal is to allow precise responses whenever possible.

Many image processing languages have been proposed, such as MatLab [23], Erdas Imagine [20], and Envi [17]. These to some extent imperative languages offer a wide range of proven functionality. Matlab is generic and does not offer generic support for geo services (add-on packages accomplish that). This is different for Erdas Imagine and, in particular, for Envi which offers strong and comprehensive GIS functionality. It seems hard, though, to factor out some self-contained functionality set which is small enough for a standard but still rich enough for multi-dimensional coverage services. Moreover, these imaging systems appear not immediately suitable for very large multi-dimensional imagery, where "very large" means Terabyte to Petabyte object sizes. Rather, they traditionally are limited to main memory sizes. Recent efforts to support "out-of-memory processing" and swapping of image parts hint into the right direction; however, database technology deals with high-volume data since long, with remarkable success. In particular, request optimization has been studied intensively there. Consequently, we next consider image databases.

4.3 Image Databases

The requirement for further coverage processing capabilities distinguishes array databases from multimedia databases. In multimedia databases images are interpreted using some hand-picked built-in algorithm to obtain feature vectors; subsequently, search is performed on the feature vectors, not the images. The WCPS language, conversely, does not attempt to interpret the coverage data, but always operates on the cell values themselves. Hence, search results are not subject to a probability and are not depending on some random, hidden interpretation algorithm.

Another distinguishing criterion, albeit on architectural level, is the potentially large amount of data which implementations of the standard need to process efficiently. Image processing systems traditionally are constrained to image sizes less than main memory available; to some extent, "out of memory" algorithms have been designed which essentially perform partial access and swapping of parts. A systematic approach to processing extremely large raster data volumes is addressed by the research field of array databases.

Historically, the first rigorous treatment is provided by the rasdaman array algebra [4][5] which has been inspired by studying several formal models in imaging, in particular AFATL Image Algebra, a rigid formalization of image and signal processing with proven comprehensiveness and expressive power [32]. AFATL Image Algebra has been chosen as the basis for rasdaman earlier [5] because of the convergence in algebra which is the usual mathematical basis for conceptual modeling in databases. Further array algebrae are AQL [21], AML [22], and RAM [8]. They all have in common, however, that on principle they are domain-independent and deal with abstract arrays (in a programming language sense), but without intrinsic geo-spatiotemporal semantics - in particular, coordinate system handling is not seamlessly integrated.

The only approach which has been implemented comprehensively and is in operational use under industrial conditions is rasdaman. An advantage is that rasdaman combines a rigid formal semantics on the levels of query language design, architecture description, and expression of optimization rules. Further, its implementation is proven in earth and life sciences and with multi-Terabyte databases in operational use. Therefore, the WCPS concept has strongly been influenced by the experiences made with rasdaman, but offers an intrinsic geo semantics and adapts them to the coverage model of WCS.

5. WCPS

In this section we first introduce the WCS conceptual coverage model, then briefly summarize WCS functionality, and then discuss the main language constructs of WCPS.

5.1 Coverage Model

The term "coverage", in ISO and OGC definition, denotes some "space-varying phenomenon", i.e., a geographic object with some extent whose values depend on the location (and time) of probing. This very general definition stems from OGC Abstract Specification Topic 6 (Schema for coverage geometry and functions) [27] which is adopted from and identical with ISO 19123 [26]. It foresees several different coverage types: discrete coverages, Thiessen polygons, quadrilateral grid coverages, hexagonal grid coverages, triangulated irregular networks (TINs), and segmented curve coverages. In current practice, however, this general view boils down to discrete raster data; widening WCS scope to further coverage types listed in ISO 19123 is under consideration for future versions. Hence, it is safe to say that WCS in its current version [39] defines an open access protocol for multi-dimensional raster data.

A coverage basically is a function which maps coordinate locations to values. It is materialized as a multi-dimensional value array, containing cells ("pixels", "voxels") at the grid locations. The set of admissible coordinate values is called the coverage's domain, which is spanned by a number of axes (or dimensions) defining the coverage's dimensionality. For each axis, the coverage is delimited by some lower and upper bound, expressed in some coordinate reference system (CRS). Each coverage has a list of CRSs associated in which it can be queried; requesting values in another CRS than the one in which the coverage is stored (or in the image coordinate system, directly using pixel coordinates) obviously will involve reprojection.

Currently a coverage array can be of two, three, or four dimensions, containing mandatory x and y axes and optional z and time axes. For the next version it is foreseen to additionally allow so-called abstract axes with application-defined semantics (such as products offered). Coverages, then, will be allowed to have any combination of axes, including, for example, 1-D time-only sensor time series, 2-D x/z planes, or 5-D x/y/z/time/pressure cubes.

The structure of a coverage's cell values (denoting the set of all possible values associated with a cell) is given by its range. Range values can be atomic, or a list of named components called range fields (commonly known as "bands", "channels"). Range fields, in turn, can be atomic or can consist of multi-dimensional arrays of values themselves (footnote 4). Additionally, a coverage may know one or more null values to denote cell values that are unknown or undefined.

Footnote 4: The latter feature is recognized as being relatively complex to implement and handle; hence, it is optional now and is likely to be factored out into a bespoke extension in the next WCS version.

For scaling or reprojection performed in the course of request evaluation usually resampling and interpolation have to be applied. WCS defines the following list of standard interpolation methods, adopted from ISO 19123: nearest (neighbor), linear, quadratic, cubic, and the pseudo method none which indicates that no interpolation is admissible; further methods may be added by a particular WCS implementation. A subset of these methods is assigned individually to each coverage range field, with one of them being designated as default. During request processing a client may choose among the interpolation methods offered, or simply assume the default method. The effect of null values on interpolation can be controlled via the so-called null resistance parameter.

Further, a coverage is addressed through an identifier which is unique within the server's repository.

Finally, some metadata are provided, part of which are optional.

For formalization of the model, so-called probing functions are used. Each probing function extracts one aspect from the coverage; for example, crsSet(C) delivers a set of all CRSs in which coverage C can be addressed. Table 1 lists all probing functions. An example for rigorous use of probing functions will be given in Section 5.3.6, although for convenience we will occasionally employ them already before.

5.2 WCS Request Types

The common request structure for OGC standards, which WCS follows, initiates client/server conversation with a GetCapabilities request to learn about a service's offerings and capabilities. Subsequently, the client performs retrieval based on this information; in the case of WCS, this is accomplished with the GetCoverage request type. In addition, WCS foresees a DescribeCoverage request type which delivers detail information about coverages, such as extent and CRSs supported, as for transfer volume reasons in face of very large numbers of server-side coverages the GetCapabilities request essentially only lists the names of the coverages offered, but no details.

The historically first (or legacy, as some voices say) protocol type for WCS requests is HTTP GET using key-value pair (KVP) notation. Alternatively, XML syntax based on XML Schema definitions is laid down in the standard, with both HTTP POST and SOAP communication being supported.

GetCoverage offers a fixed set of operations which can be combined freely in a request. These operators allow for spatial, temporal, and band subsetting, scaling, reprojection, and final result packaging, including data format encoding. One GetCoverage request always addresses exactly one coverage.

The following example shows a sample GetCoverage request against some satellite image time series coverage ModisCube, expressed in KVP notation:

    http://myServer/wcsServlet?
    SERVICE=WCS & VERSION=1.1.0 &
    REQUEST=GetCoverage &
    COVERAGE=ModisCube &
    RANGESUBSET=nir;red &
    SRS=EPSG:31464 &
    BBOX=4636000.0,5717000.0,4687000.0,5768000.0 &
    TIME=max &
    WIDTH=246 & HEIGHT=300 & DEPTH=1 &
    FORMAT=HDF-EOS &
    EXCEPTIONS=application/vnd.ogc.se_xml

The request extracts data along the given spatial bounding box, which is expressed in CRS EPSG:31464, and fetches the most recent time slice of the cube. The result is scaled to a size of 246x300x1 pixels and delivered in the HDF-EOS format. Any eventual error is to be reported back in XML.

5.3 WCPS Coverage Processing Language

Based on the coverage model presented above WCPS offers a language to retrieve information from one or more coverages stored on a server. It is a functional language, i.e., without any side effects (except for one case to be detailed below).

5.3.1 Processing Coverages

The basic request structure consists of a - possibly nested - loop over a list of coverages offered by the server, an optional filter predicate, and an expression indicating the desired processing of each coverage.

Footnote 5: Prefixing variables with a "$" character is not mandatory, but used here to resemble a more XQuery-style syntax.

Let be given some n > 0, a set of variable names {$v_1, ..., $v_n} (footnote 5), a list of n nonempty, not necessarily disjoint coverage identifier sets cov_i = {cov_{i,j} : 1 ≤ i_j ≤ i_n for some i_n > 0,
Table 1: List of coverage probing functions Coverage Probing function Comment characteristic for some coverage C Identifier identifier(C) For original coverages only, Grid point values value(C, p) coverage cells, of data type rangeT ype(C) ∀p ∈ imageCrsDomain(C) Domain dimension list dimensionList(C) List of all of the coverages axis names, in their proper sequence Domain dimension type dimensionT ype(C, a) Dimension type ∀a ∈ dimensionList(C) Image CRS imageCRS(C) Image CRS, allowing direct array addressing Domain extent of coverage imageCrsDomain(C) Extent of the coverage in (integer) grid coor- expressed in Image CRS dinates, relative to the coverage’s Image CRS; essentially, the set of all point coordinates Domain extent imageCrsDomain(C, a) Extent of the coverage in (integer) of coverage ∀a ∈ dimensionList(C) grid coordinates, relative to the along dimension, coverages Image CRS, for a given expressed dimension; essentially, the set of in Image CRS all values inside the extent interval CRS set crsSet(C, a) Set of all CRSs from the supported CRS ∀a ∈ dimensionList(C) extent of coverage along domain(C, a, c) domain of the coverage, expressed in one of dimension, expressed in ∀a ∈ dimensionList(C), its CRSs, for a given (spatial, temporal, arbitrary CRS c ∈ crsSet(C) or abstract) dimension Range data type rangeT ype(C) The data type of the coverage’s grid point rangeT ype(C) values Range field type rangeF ieldT ype(C, f) The data type of one ∀f ∈ rangeF ieldNames(C) coverage range field Range field rangeF ieldNames(C) Set of all of the coverage’s name set range fields names Null value set nullSet(C, r) The set of all values that represent ∀r ∈ rangeT ype(C) null as coverage range field value Default inter- interpolationDefault(C, r) Default interpolation method, polation method ∀r ∈ rangeT ype(C) per coverage field Interpolation interpolationSet(C, r) All interpolation methods applicable to the method set ∀r ∈ rangeT ype(C) particular coverage range field; must 
list at least the default interpolation method Interpolation type interpolationT ype(im) Interpolation type of a ∀im ∈ interpolationList(C) particular interpolation method Null resistance nullResistance(im) Null resistance level of a ∀im ∈ interpolationList(C) particular interpolation method

covi,j identifier of some coverage known by the server} for tribute to the overall response. The result of such a request 1 ≤ i ≤ n, a predicate B, and a processing expression P is a (possibly empty) list of items. which both may contain occurrences of variable names vi. Then, a WCPS request has the general format: The type of expression P determines the overall response structure. A scalar-valued expression leads to a list of result values. The following example returns a single scalar repre- for $v1 in ( cov_1_1, cov_1_2, ... ), senting the maximum value occurring in elevation coverage ..., Elevation. $vn in ( cov_n_1, cov_n_2, ... ), [ where for $e in ( Elevation ) B( $v1, $v2, ... ) ] return return max( $e ) P( $v1, $v2, ... )

The result, a single floating point number, will be encoded The return clause is evaluated for each variable combina- in XML for transfer to the client. tion, unless predicate B evaluates to false in which case P is skipped and the current variable combination does not con- Coverage-valued expressions are treated differently. Let us assume that expression C evaluates to a coverage, as dis- over x $i (0,255), cussed in the later sections, and f is the name of a suitable y $j (0,255) coverage data exchange format. Then, P has one of the two values (char) ($i+$j)/2 forms

Note that the sequence in which axes are indicated in an encode( C, f ) expression is completely independent from the sequence of store( encode( C, f ) ) linearization as stored on the server (such as row-major or- der). This is part of WCPS’s hiding of storage internals.

In the first case, the response is a list of encoded coverages; in the second case, the encoding results are stored serverside 5.3.3 Condensers and URLs for download are passed back to the client instead. This operation class, which is similar to SQL aggregates, The store() function with its side effect is the only exception consolidates the grid point values of a coverage along se- to the functional style of WCPS. lected axes into a scalar value based on the condensing op- eration indicated. Example: ”The difference between red and near-infrared channel in coverage ModisScene, encoded in TIFF and stored Let be given some operation op ∈ {+, ∗, max, min, and, or}, on the server for later fetching”: n > 0, a set {$n1, ..., $nn} of axis names, a set {t1, ..., tn} of axis types, a set of pairs {(l1, h1), ..., (ln, hn)} with li < hi for i ∈ 1, ..., n, a boolean expression p which may contain for $m in ( ModisScene ) occurrences of $n1, ..., $nn, and an expression e which may return contain occurrences of $ni and evaluates to a scalar value. store( encode( abs( $m.red - $m.nir ), "TIFF" ) ) Then, the syntax for the condenser is as follows:

Next, we take a closer look at the operational capabilities condense op of the processing expression. At the core of the language over t1 $n1 (l1,h1), are basic operations for constructing coverages and summa- ..., rization over coverages. Further operation classes include tn $nn (ln,hn) convenience shorthands and special operations like scaling [ where p ] and reprojection. using e

5.3.2 Coverage Constructor The coverage constructor expression allows to create a d- The operator iterates over the given domain while combining dimensional coverage and assign values to its cells. The do- the result values of e through operator op. The where clause main definition consists, for each dimension, of a unique axis allows to exclude cells based on their coordinates as well as name plus lower and upper bound of the coverage, expressed their values. in a fixed image CRS and using integer coordinates. No other CRS is supported initially, however, the setter func- The following expression delivers the sum of all values’ abso- tion setCRS() allows adding a further supported CRS. The lutes inside the x/y bounding box (0,0)/(99,99) of coverage coverage’s content is defined by a general expression with expression C: some scalar result type, which at the same time will deter- mine the result range type. condense + over x $i (0,99), Let be given a unique coverage name f, some n > 0, a set y $j (0,99) t , .., t of axis types, a set $n , ..., $n of names, a set of 1 n 1 n using abs( C[x($i),y($j)] ) pairs (l1, h1), ..., (ln, hn) with li < hi for i ∈ 1, ..., n, and an expression e which may contain occurrences of the $ni and evaluates to a scalar value. Then, the syntax for the 5.3.4 Shorthands constructor is as follows: The previously introduced operations allow to compose a wide range of operations, however sometimes with quite some syntactic burden. To make simple things simple short- coverage F hands are introduced for important operation classes. over t1 $n1 (l1,h1), ..., Subsetting can be subdivided into sectioning and slicing. A tn $nn (ln,hn) section operation receives a coverage, an axis, and an in- values e terval on this axis. This interval will determine the new coverage’s extent along this axis. Hence, the coverage’s ex- For example, a 2-D greyscale image aligned in the x/y plane tent is reduced while the dimension remains the same. 
Slic- containing a diagonal shade can be written as below: ing, on the contrary, reduces dimensionality. This operation extracts a spatial slice (i.e., a hyperplane) from a given cov- erage expression along its axes, specified by one or more coverage Greyshade slicing axes and a slicing position thereon. By default, the coverage’s image CRS is used for address- ing. However, a qualifier with each axis may change this to express location in some other supported CRS. An example we will encounter in Section 6.2.
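The coverage constructor (Section 5.3.2) and the condenser (Section 5.3.3) can be mimicked in a few lines of Python. This is only a minimal sketch of the semantics, not how a WCPS server evaluates requests: a coverage is assumed to be a plain dictionary from integer coordinate tuples to cell values, the helper names coverage, condense, and _points are hypothetical, and the (char) cast of the Greyshade example is approximated by integer division.

```python
from functools import reduce
from itertools import product
import operator


def _points(axes):
    """Yield one {axis name: coordinate} dict per grid point; bounds are inclusive."""
    names = [name for name, _, _ in axes]
    ranges = [range(lo, hi + 1) for _, lo, hi in axes]
    for coords in product(*ranges):
        yield dict(zip(names, coords))


def coverage(axes, values):
    """Constructor sketch: evaluate `values` once per grid point."""
    return {tuple(p.values()): values(p) for p in _points(axes)}


def condense(op, axes, using, where=lambda p: True):
    """Condenser sketch: fold `using` over all points that pass `where`."""
    return reduce(op, (using(p) for p in _points(axes) if where(p)))


# coverage Greyshade over x $i (0,255), y $j (0,255) values (char) ($i+$j)/2
greyshade = coverage([("x", 0, 255), ("y", 0, 255)],
                     lambda p: (p["x"] + p["y"]) // 2)

# condense + over x $i (0,99), y $j (0,99) using abs( C[x($i),y($j)] )
abs_sum = condense(operator.add,
                   [("x", 0, 99), ("y", 0, 99)],
                   lambda p: abs(greyshade[(p["x"], p["y"])]))
```

Both helpers take the axis list in the order written in the request; as noted in Section 5.3.2, this order says nothing about how a server linearizes the cells internally.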

Example: The following expression subsets 4-D x/y/z/time coverage expression C by cutting out an interval from 100 to 200 along the x axis and slicing at z position 42; the result is a 3-D x/y/t coverage:

    C[ x(100,200), z(42) ]

For an assumed y extent of (y0, y1), the coverage constructor expression equivalent to the above shorthand is

    coverage Slice
    over x $cx (100,200),
         y $cy (y0,y1)
    values C[ x($cx), y($cy), z(42) ]

Induced operations lift operations available on the range type to coverage level by applying them simultaneously to all cells of a given coverage. We abbreviate the corresponding marray expression by the base operation. For example, cell-wise addition of coverage expressions C and D with same extent as before is written as:

    C + D

The usual arithmetic, boolean, logarithmic, and trigonometric operations are supported as induced operations, likewise record access and type cast known from programming languages.

Example: The expression below evaluates to a boolean coverage:

    ( C.red + D.red ) > 127

Likewise, common condensers can be abbreviated. If iteration uniformly goes over all cells of a coverage, and the expression evaluated at each location is based only on the cell value, but not its coordinate values, then a shorthand operation can be applied.

Example: "The maximum value in the temperature variable of coverage ClimateRun":

    max( ClimateRun.temperature )

Finally, some specification is needed as to what happens if one of the operands contains null values. Whenever a cell value is encountered which is listed in its coverage's null value set, the result of the value combination will be set to one of these null values (the default null if defined). In case of a binary operation, the situation is more complicated. If one operand has a null value as per the coverage's null value set then the overall result will be null if there is some null value available in the intersection of both participating coverages' null value sets. If this is not the case, then an exception will be thrown.

For filter predicates in the where clause we decided to adopt a rigid approach: a boolean null value will be interpreted as false, thereby effectively dropping the element on hand from the result list.

The Sobel filter, a well-known edge detector, may serve as a final, more complex example. Let the kernel be given by the matrix shown in Figure 4; for simplicity we assume it is stored as a coverage as well, named Kernel3x3. Then, the following expression returns a coverage with same extent as the original one, but values replaced by the edge detector.

Figure 4: 3x3 Sobel edge detection filter kernel

    for $img in ( Image ),
        $k in ( Kernel3x3 )
    return
      encode(
        coverage filteredImage
        over x $ix ( imageCrsDomain( $img, x ) ),
             y $iy ( imageCrsDomain( $img, y ) )
        values
          ( condense +
            over x $fx ( -1, +1 ),
                 y $fy ( -1, +1 )
            using $img[ x($ix+$fx), y($iy+$fy) ] * $k[ $fx, $fy ]
            +
            condense +
            over x $fx ( -1, +1 ),
                 y $fy ( -1, +1 )
            using $img[ x($ix+$fx), y($iy+$fy) ] * $k[ $fx, $fy ]
          ) / 9,
        "png"
      )

Figure 5: Edge detector as a filter kernel example

5.3.5 Further Operations
In addition to the abovementioned operations there are further ones for scaling and reprojection. We omit discussion for the sake of brevity, as they mimic the standard behavior anyway, but provide a scaling example in Section 6.2.

All operations can be nested arbitrarily as long as data types match. For convenience, type coercion and extension as known from programming languages is provided. Parenthesizing and implicit precedence rules are available for the syntax representation as used above, but obviously not needed for the XML expression encoding.

5.3.6 Semantics Specification
The specification approach for WCPS can be characterized as semi-formal: a fixed framework is followed which lends itself towards usual semantics specification; however, it resorts to informal description in cases where a formalization would constitute an inappropriate burden while the concepts are well known in the GIS community anyway.

The semantics of each operation is defined through its precondition (such as only positive cells when applying a logarithm) and postcondition. The previously introduced probing functions serve to describe the operation postcondition. Similar to algebraic specification of Abstract Data Types, the effect of applying an operation to a coverage expression is described by applying every probing function to the resulting coverage.

We illustrate the semantics definition by means of the language element binaryInducedExpr, i.e., the binary induced comparison of values; it belongs to the class of coverageExprs. Specification relies on the probing functions introduced earlier which we present below (comments in the table have been added for this article).

Let C1, C2 be coverageExprs where
    imageCrsDomain(C1, a) = imageCrsDomain(C2, a),
    imageCrs(C1, a) = imageCrs(C2, a),
    domain(C1, a) = domain(C2, a),
    ∀a ∈ dimensionList(C2): crsSet(C1, a) = crsSet(C2, a),
    rangeFieldNames(C1) = rangeFieldNames(C2),
    ∀f ∈ rangeFieldNames(C1): rangeType(C1, f) is cast-compatible⁶ with rangeType(C2, f) or rangeType(C2, f) is cast-compatible with rangeType(C1, f).

Then, for any coverageExpr C3 of structure

    C1 > C2

the semantics of C3 is defined as follows:

• identifier(C3) = "" ("derived coverage has no name - it is not stored and, hence, inaccessible by name.")

• ∀p ∈ imageCrsDomain(C3): value(C3, p) = value(C1, p) > value(C2, p) ("value is given by performing operation cellwise.")

• dimensionList(C3) = dimensionList(C1)

• ∀a ∈ dimensionList(C3): crsSet(C3, a) = crsSet(C1, a)

• ∀a ∈ dimensionList(C3): dimensionType(C3, a) = dimensionType(C1, a)

• imageCrs(C3) = imageCrs(C1) ∩ crsSet(C2) ("CRSs supported are the ones which both input coverages share")

• imageCrsDomain(C3) = imageCrsDomain(C1)

• ∀a ∈ dimensionList(C3), c ∈ crsSet(C3, a): domain(C3, a, c) = domain(C1, a, c) ("extent is that of input coverages for each axis and in each of its CRSs")

• ∀r ∈ rangeFieldNames(C3): rangeFieldType(C3, r) = Boolean ("for all range fields: result cell type is Boolean")

• ∀r ∈ rangeFieldNames(C3): nullSet(C3, r) = {} ("for all range fields: result has no null values")

• ∀r ∈ rangeFieldNames(C3): interpolationDefault(C3, r) = none, interpolationSet(C3, r) = {none} ("result coverage does not allow interpolation")

Essentially, this specification says that the resulting coverage has the same extent as the original coverages, is addressable in only those CRSs supported by both input coverages, and its values are derived from cellwise comparison. Some constituents are not set, such as identifier and applicable interpolation methods; setter functions exist which can change these subsequently, for example, to set the identifier of the expression to ComparisonResult:

    setIdentifier( C > D, "ComparisonResult" )

Note that this does not lead to server-side storage and subsequent accessibility; it merely changes metadata transferred to the client as part of the overall response. Setting the identifier might make sense when the coverage result is reinserted into some (same or different) server during a WCS-T upload [38].

⁶ see [2], Section 7.2.5

5.4 The WCPS Reference Implementation
In the PetaScope project, Jacobs University is undertaking the reference implementation of WCPS. PetaScope consists of a service stack as shown in Fig. 6. A Java servlet accepts XML requests, which must conform to the WCPS schema, and returns coverage results. Coverage results are returned as a multipart HTTP response containing an XML document (the so-called "manifest") holding the metadata and one or more files holding the binary coverage data in the requested encoding format.

Figure 7: Search across a set of timeseries (source: EarthLook, www.earthlook.org)

Figure 6: WCPS reference implementation architecture

The service uses the array database system rasdaman as its backend, as rasdaman is already capable of storing and querying multi-dimensional raster data over any C/C++ cell type [3]. The WCPS web service component translates a WCPS request into the rasdaman query language, rasql [28], and hands this to rasdaman for processing. The results obtained from rasdaman are MIME-encoded and shipped back to the client, together with the XML-encoded manifest describing them. Rasdaman utilizes a relational DBMS as its persistent storage layer. Large arrays are partitioned into smaller ones, so-called tiles, which then go into one BLOB (Binary Large OBject, i.e., a byte string maintained in the database) each. The WCPS component itself additionally stores metadata information about the coverages that it serves. In PetaScope, rasdaman makes use of the PostgreSQL open-source DBMS to physically store data and metadata.

As performance evaluations are not yet available, only preliminary observations can be made. The translation from the WCPS request into a rasql query appears to take only a few milliseconds. Rasdaman, which performs the main workload in the end, has been benchmarked, e.g., in [40, 5, 31, 10, 1].

Upon sufficient completion, the source code will be made available under a free license at www.petascope.org. The package eventually will consist of a comprehensive WCS suite, offering WCS, WCPS, and WCS-T.

6. SAMPLE USE CASE SCENARIOS
OGC standards stand out in that they are thoroughly evaluated for practicality, usability, and adequateness before official release. Central concepts of WCPS have proven successful in rasdaman during its many years of operational use. Further practical assessment has been performed using PetaScope; some of the use cases inspected are now publicly accessible through www.earthlook.org, including a sandbox for hands-on experimenting on sample data sets.

In this Section we discuss some hand-picked application use cases addressing both typical current and future expected scenarios. Among the fields recently brought into the WCS working group are areas as diverse as sensors (in the broadest sense of the term), exploration, atmospheric and hydrospheric modeling, environmental monitoring, marine biology, biodiversity, and aerosol chemistry; certainly this list is neither representative nor exhaustive.

6.1 1-D Sensor Time Series
One-dimensional time series form a kind of coverage which only recently has received attention by the WCS group. In particular this was induced by harmonization work with the developers where on the one hand sensor timeseries obviously play a central role, and on the other hand data structures are grounding on WCS when it comes to coverages.

The first use case searches within a given time series to flag whenever a threshold T is exceeded. The WCPS request below returns a standardized time series, encoded as comma-separated values (CSV), with value true whenever threshold T is exceeded, and false otherwise.

    for $ts in ( TimeSeries )
    return encode( ( $ts > T ), "csv")

For the second use case, the following request picks only those time series objects where the difference between maximum and minimum value is below threshold T (Figure 7).

    for $ts in ( TimeSeries_1, ..., TimeSeries_n )
    where abs( max($ts) - min($ts) ) < T
    return identifier( $ts )

The response in this case consists of a list of (locally unique) coverage names, hence no encoding needs to be applied.

In some disaster mitigation scenario it might be of interest to quickly learn about status changes. A simple standing query like the one below can deliver, for any time T, the cumulative average, for example:

    for $ts in ( TimeSeries )
    return $ts[ time( imageCrsDomain( $ts, T ) ) ]

    for $ts in ( TimeSeries )
    return avg( $ts )

Figure 8: Alerter functionality implemented through standing queries in a browser (source: EarthLook)
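To make the three time series use cases concrete, the following minimal Python sketch mirrors them on plain lists. It is an illustration under assumptions, not part of WCPS: a time series is assumed to be a list of numbers indexed by time step, and the function names flag_exceedances, stable_series, and cumulative_average as well as the sample readings are hypothetical.

```python
def flag_exceedances(series, threshold):
    """Induced comparison ($ts > T): one boolean per time step."""
    return [value > threshold for value in series]


def stable_series(named_series, threshold):
    """Filter predicate: keep names whose max-min spread is below the threshold."""
    return [name for name, series in named_series.items()
            if abs(max(series) - min(series)) < threshold]


def cumulative_average(series, t):
    """Average of all values observed up to and including time step t."""
    seen = series[: t + 1]
    return sum(seen) / len(seen)


# Hypothetical sample data, keyed by coverage identifier.
readings = {"gauge_a": [1.0, 1.5, 1.2], "gauge_b": [0.5, 4.0, 2.0]}
flags = flag_exceedances(readings["gauge_b"], 1.0)
quiet = stable_series(readings, 1.0)
```

The second function returns identifiers rather than data, which corresponds to the request above whose response needs no encoding.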

Figure 8 shows how these requests are used in an alerter script where client-side Javascript is used to continuously resend the request and color the result according to some threshold value.

6.2 2-D Web Map Service
The OGC Web Map Service (WMS) Implementation Standard [9] is the most widely used OGC standard, probably due to the user friendliness of the interactive clients which can be built on top of this client/server protocol. WMS provides the basis for what has been termed Web-GIS functionality: via their Web browser users can navigate a map dynamically composed of different layers, with each layer rendered according to some chosen style definition. Usually some interactive client on the client side accomplishes convenient map navigation, such as interactive zoom and pan.

To show the versatility of WCPS we show that WMS-type queries can be expressed in it. The following is a typical WMS GetMap request in KVP syntax, taken from [9]:

    http://a-map-co.com/mapserver.cgi?
    VERSION=1.2.0 & REQUEST=GetMap &
    CRS=CRS:4326 & BBOX=-97.105,24.913,-78.794,36.358 &
    WIDTH=560 & HEIGHT=350 &
    LAYERS=AVHRR_09_27 & STYLES= &
    FORMAT=image/png

The request accesses layer AVHRR_09_27 and retrieves a cutout given by bounding box (-97.105,24.913,-78.794,36.358) expressed in the coordinate reference system (CRS) identified by EPSG code 4326 and using OGC's URN-style syntax [39]. As no style is specified, the default will be applied. The resulting image is scaled to size 560x350 and then delivered in PNG format.

Assuming that the AVHRR coverage is already stored as a color image such a request can be formulated in WCPS immediately:

    for $a in ( AVHRR_09_27 )
    return
      encode(
        scale(
          $a[ x:"urn:ogc:def:crs:EPSG:4326" (-97.105,-78.794),
              y:"urn:ogc:def:crs:EPSG:4326" (24.913,36.358)
            ],
          { x(0:559), y(0:349) },
          {}
        ),
        "png"
      )

The expression is best understood by walking it inside out. The coverage, represented by variable $a, is subset with the coordinates indicated for each axis, along with the CRS in which coordinates are expressed. Next, the resulting image is scaled to the x and y extent indicated; the lower bound is set to 0 here, but could be any integer value. The third scaling parameter allows indicating the interpolation method to be applied. As WMS does not allow stating such details, the list is left empty, meaning that the server will apply the default interpolation. Finally, the result is encoded in the PNG format.

This request translation technique has been used successfully for many years in rasgeo, the rasdaman WMS. On the EarthLook website, www.earthlook.org, several demonstration WMS instances are provided, some of which are based on WCPS, and some on the rasdaman query language, rasql. All services ultimately maintain their data via rasdaman. Figure 9 shows a screenshot using the rasdaman WMS client.

Figure 9: Browser-based WMS navigation (source: EarthLook)

Finally we consider deriving summary data from maps. While this is not within the range of WMS it is indeed relevant for imaging and GIS data analysis. The following WCPS code derives the histogram for an 8-bit greyscale satellite image channel:

    for $ls in ( LandsatScene )
    return
      encode(
        coverage LandsatRedHistogram
        over abstract $n( 0, 255 )
        values count( $ls.red = $n ),
        "csv"
      )

The induced comparison $ls.red = $n establishes a boolean matrix with a value of true iff the red band's intensity value corresponds to the current bucket number, $n. The count operator inspects this matrix and counts the occurrences of true. The results are cast into a new coverage which is 1-D over an abstract dimension running from 0 to 255. An appropriate data format for shipping this 1-D coverage is CSV.

6.3 3-D Remote Sensing Time Series
In an early WCS experiment a 3-D satellite image time series has been established based on rasdaman and Oracle by the Remote Sensing Data Center (DFD) of the German Aerospace Agency (DLR) [11]. IDL on the Net has been used for building the Web interface. AVHRR imagery representing land / sea surface temperature has been mosaicked into a map of Europe and the Mediterranean, and then has been extended into time for an interval of several years. Altogether, the database consists of about 10,000 AVHRR images collated into one x/y/t data cube.

Figure 10: DFD-DLR WCS demonstration service with 1-D to 3-D extraction results

Figure 10 shows the Web interface together with some retrieval results. Users can draw a bounding box for spatial selection and additionally indicate time intervals. The result, then, is a 3-D subcube. Alternatively, 2-D time slices can be extracted and 1-D drill-through time series. For example, the middle-right image shows the temperature curve over Moscow for one year. In this use case it might be advantageous to not only select the cell identified by the coordinate location but to average over some region to eliminate atmospheric distortions and other potential effects. The following WCPS request constructs the temperature time series by averaging over a 3x3 area around the chosen location x0/y0 for time interval t0 to t1:

    for $a in ( AVHRR_cube )
    return
      encode(
        coverage TemperatureTimeSeries
        over time $t(t0,t1)
        values avg( $a[ x( x0-1, x0+1 ),
                        y( y0-1, y0+1 ),
                        time( $t ) ] ),
        "csv"
      )

Note that the data volume shipped over the net is about 1 kB, in contrast to the 10,000 image data cube. By expressing the user's need concisely the data volume can be reduced, as will also be discussed in the next section.

The image bottom-right shows the Normalized Difference Vegetation Index (NDVI). Rasdaman would have allowed deriving this from suitable satellite data (such as the Landsat instruments), but WCS doesn't offer such processing capability. In WCPS the NDVI extraction from near-infrared and red Landsat channels can be phrased as follows:

    for $lm in ( LandsatMosaic )
    return
      encode(
        ( $lm.nir - $lm.red ) / ( $lm.nir + $lm.red ),
        "tiff"
      )

6.4 4-D climate
The four-dimensional use case is chosen from climate modeling. Query-based access to multi-dimensional earth science data has been investigated earlier in the EU-funded ESTEDI project; see also [3].

Figure 11: 2-D and 3-D Slices from a 4-D ECHAM T42 climate data set (horizontal wind speed)

The ECHAM T42 model is used for atmospheric simulation. It generates relatively low-resolution data with a spatial resolution of 128 x 64 cells for the complete earth surface. Temporal resolution is 24 minutes per time slice. Over a simulation period of 200 years this accumulates to roughly 2 million slices, corresponding to approximately 2.5 TB. This holds for one physical parameter ("variable"), such as temperature, wind speed in x and y direction, CO2 concentration, etc.; up to and over 50 variables can occur. Figure 11 shows 2-D and 3-D slices of the x wind speed component obtained from an ECHAM T42 model run.

Interestingly it has been observed that users (in this case mostly scientists) download about 10 times more data than they actually need [19]. This results from unwieldy FTP archives where users have to find their way through large files which they have to download, followed by writing their own code for extracting the pieces of interest. Conversely, this means that by offering extraction and preprocessing capabilities on an adequate semantic level bandwidth usage and transfer times potentially can be reduced by a factor of 10, not to speak of the enhanced quality of service.

Figure 12: Sample aggregated view of car repair data

Again, we discuss some typical operations on ECHAM T42-like data sets. The first use case requests wind speed in x direction at location x0/y0, expressed in CRS EPSG:4326 at height 0 over ground for time interval t0 to t1. This obviously returns a 1-D time series. An appropriate format for delivering such values is CSV.

    for $e in ( ECHAM_T42 )
    return
      encode(
        $e.windspeedX [ x:"CRS:4326"( x0 ),
                        y:"CRS:4326"( y0 ),
                        z( 0 ),
                        time( t0, t1 ) ],
        "csv"
      )
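The request above combines two steps, selecting the windspeedX component and subsetting the cube. The following minimal Python sketch illustrates that both steps can be applied in either order with the same result. It rests on assumptions made only for illustration: the cube is a small dictionary from (x, y, z, t) tuples to records, the cell values are made up, and the helper names extract_field and subset are hypothetical.

```python
from itertools import product


def extract_field(cov, field):
    """Record access lifted to coverage level: keep one component per cell."""
    return {key: cell[field] for key, cell in cov.items()}


def subset(cov, x, y, z, t0, t1):
    """Slice at x, y, z and trim time to [t0, t1]; the result is keyed by t only."""
    return {t: cell for (cx, cy, cz, t), cell in cov.items()
            if (cx, cy, cz) == (x, y, z) and t0 <= t <= t1}


# Tiny made-up 4-D cube with two components per cell.
cube = {(x, y, z, t): {"windspeedX": x + 10 * t, "temperature": 280 + z}
        for x, y, z, t in product(range(3), range(3), range(2), range(5))}

# Component extraction first, then subsetting ...
field_first = subset(extract_field(cube, "windspeedX"), 1, 2, 0, 1, 3)
# ... or subsetting first, then component extraction.
subset_first = extract_field(subset(cube, 1, 2, 0, 1, 3), "windspeedX")
```

Which order is cheaper depends on how the cells are laid out in storage, which is a decision the request syntax leaves to the server.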

Note that the syntax does not prescribe evaluation sequence; based on provable semantic equivalence a server can decide whether it first performs the subsetting (which is better in face of a voxel-interleaved storage) or the wind speed component extraction (which yields faster results with band-interleaved storage).

The next use case asks for the average temperature at ground level for all time slices. Again, the result is a one-dimensional time series of float values.

    for $e in ( ECHAM_T42 )
    return
      encode(
        coverage AverageTemperature
        over time $t ( imageCrsDomain( $e, time ) )
        values avg( $e[ time( $t ) ] ),
        "csv"
      )

This time we need a coverage constructor because for each time step some processing is to be applied, for which a time position variable is required. The coverage clause generates a 1-D time series by iterating over the time axis of ECHAM_T42. Note the typing of the domain axis which allows the resulting coverage later on to know about the semantics of its axis. For each slice the avg operation summarizes its values.

6.5 Data Warehousing
Suitability of WCPS for OLAP-style queries is an important prerequisite for cross-domain services where statistical (such as business) data are merged with geospatial coverage data (such as remote sensing imagery). To demonstrate the capabilities of WCPS to model multi-dimensional data beyond spatio-temporal semantics we choose a data warehousing / OLAP scenario. Following the classical definition of [16] a data warehouse is a topical, time-aware excerpt from one or more operative databases. Usually a data warehouse is organized as a so-called data cube where dimensions, defined by measures, span a data space in which facts sit. Typically a data cube has between three and twelve dimensions of which one usually is time, as aspects of enterprise behavior over time are modelled.

Figure 13: ME/R schema of sample OLAP cube

Consider a sample miniworld about car repair frequency observed by their garage visits [33]. An event is a garage visit, identified by vehicle, customer, date, and garage. Each such event is additionally described by repair costs, number of garage employees involved, and the duration of the repair. Figure 12 shows an aggregated tabular view of such events. For the modeling we use the graphical ME/R notation introduced by Sapia [33] as shown in Figures 13 and 14. The data cube, named repair, has measures vehicle, customer, garage (resp. their identifiers), and date of arrival. Each fact has attribute values cost, number of employees involved, and repair duration. The resulting structure is termed a star schema in case of a single cube like in our example. In presence of several cubes the structures arising have been described as snowflake and galaxy schemes; see, e.g., [24] where several further variants and extensions have been proposed in addition.

Figure 14: ME/R schema, with dimension hierarchies

Among the common operations on such cubes is aggregation along a multitude of different criteria. Such aggregation is defined on the dimensions by dimension hierarchies which offer stepwise coarser views on the data. Users operate with spreadsheets as frontend which generate queries and return the results in tabular or graphical view.

We pick two typical OLAP query types, slicing and roll-up. The slicing query asks for a list of all repair events of vehicle brand B in garage G within the last ten days. Assuming T as the maximum time coordinate we obtain:

    for $cube in ( RepairCube )
    return
      encode(
        $cube[ vehicle( B ),
               garage( G ),
               time( T-10, T )
             ],
        "csv"
      )

The roll-up scenario requests a summarization, the number of repairs per garage and year; the result is a 2-D cube with remaining dimensions garage and time. For syntactic simplification we assume a cube extent of t0 to t1 for the time dimension, g0 to g1 for garage, v0 to v1 for vehicle, and c0 to c1 for customer, resp. For simplicity we construct a new abstract axis for the years, instead of using the normal time axis. Again, the result is returned in CSV.

    for $cube in ( RepairCube )
    return
      encode(
        coverage RollupCube
        over abstract $g(g0,g1),
             abstract $y(t0:t1/365)
        values
          condense +
          over day $d(0,364),
               vehicle $v(v0,v1),
               customer $c(c0,c1)
          using
            $cube[ time( $d+($y-t0)*365 ),
                   garage( $g ),
                   vehicle( $v ),
                   customer( $c )
                 ],
        "csv"
      )

Obviously this operation is structurally close to a scaling along one dimension using linear interpolation. Actually, on a side note we claim that there is much similarity between OLAP and spatio-temporal raster data. This gave rise to one research strand of ours where we work on extending the OLAP concept of dimension hierarchies in a way suitable also for geospatial semantics. The expected benefit is not only in the conceptual unification, but possibly also in new internal optimization techniques. For example, one of our research activities investigates applying OLAP preaggregation to multi-dimensional raster images with the aim of extending the concept of image pyramids in a manner suitable for more than two dimensions [13].

7. CONCLUSION AND OUTLOOK
We presented the Web Coverage Processing Service as the new OGC standard for flexible, high-level coverage processing services. WCPS has been approved by OGC as an official standard in December 2008. It bridges WCS, which it extends with coverage processing, and WPS, which it extends with a well-defined processing semantics. Both benefit from the flexibility of allowing ad-hoc formulation of complex requests without any server or client side code recompilation.

In the requirements analysis we mentioned that minimality is not among our goals, but without further justification. In retrospect this can be detailed now. A minimal operational set would abandon the trim and slice shorthand, all the induced operations, and the condensers. For example, the request

    for $c in ( MyCoverage )
    return
    all( $c[ x(0:99), y(0:99) ] > 127 )

can be expressed as

    for $c in ( MyCoverage )
    return condense and
    over cx x(0:99),
         cy y(0:99)
    using (coverage MyBand
           over dx x(0:99),
                dy y(0:99)
           values $c[ x(dx), y(dy) ] > 127
          )[ x(cx), y(cy) ]

The second phrasing obviously is not just three times longer (9 lines versus 3), but also much more error prone. We believe that such "syntactic sugar" is beneficial for code developers, following the old rule "code lines which don't exist cannot contain errors". Additionally, as WCPS code usually will be generated by tools, writing these tools needs to be straightforward and intuitive to minimize programming errors which are hard to detect later on. The duplicate explicit coordinate addressing in the condense and trim operation above is a nice example for this. But there is more to it: many of the "shorthand" operations lend themselves particularly well to optimization - in other words, the particular syntax is a kind of optimizing hint to the server. Explicit index addressing as above requires costly access operations, while in the compact formulation indexing is left to the system. This makes the rasdaman optimizer switch to a strategy of simply inspecting each cell in turn by iterating linearly over each tile. Cell access, then, effectively is reduced from evaluating a Horner scheme with several additions and multiplications to a simple pointer increment. Finally, some operations are generic enough with respect to domain and range to allow for creating libraries of generic operations. For example, the short query version above is completely agnostic to dimension and extent of the input coverages. In summary, from a practical viewpoint many reasons speak against minimalistic language design, although the underlying model of course should be minimal in its basic concepts.

Further, optimization and intelligent orchestration are on the research agenda. Optimization has proven highly effective for speeding up coverage access and processing. Concerning storage optimization, adaptive tiling [12] and compression [10] turn out advantageous. Transparent integration of tertiary storage with emphasis on spatial clustering in tape cabinets has been investigated in [30].

WCPS focuses on coverage processing;

• WPS consists only of a low-level framework for procedural invocation, whereas WCPS gives a high-level, concrete, and concise service specification;

• WPS specifies static services, whereas WCPS provides the flexibility of dynamic ad-hoc query formulation; in other words, WPS extension requires client and server side programming, whereas with WCPS this means composing a new string on client side, without any change to the server;

• WCPS allows phrasing of analytically expressible algorithms; WPS, on the other hand, by definition is Turing complete;

• As experience shows, WCPS offers a high potential for automatic chaining and optimization; WPS, on the other hand, typically requires manual server-side intervention, such as code tuning in supercomputing centers.

Hence, any tool implementor and, subsequently, service pro-
As for processing op- vider can choose between WPS’s syntactic interoperability timization, several techniques have been shown to speed up and WCPS’s semantic interoperability. Large systems with response times. Request rewriting exploits algebraic equiva- algorithms too complex to be described analytically (such as lences to substitute query fragments by semantically equiv- climate simulations) or legacy systems best use WPS; when alent fragments which execute faster; in [31] 150 algebraic flexibility, high-level semantics, and scalability in local or equivalence rules have been developed, of which 40 are used distributed environments specifically for coverage data are in the rasdaman system for achieving a canonical query rep- at stake - such as in decision support - then WCPS offers a resentation and 110 are optimizing. Parallel request pro- suitable interface. cessing in distributed environments has been implemented and tested in a Beowulf cluster [15]. Transposing OLAP Our experience shows that such a kind of service does not preaggregation to imagery for fast multi-dimensional scal- compete with imaging packages. Rather, there is an advan- ing and summarization is among our ongoing research [14]. tage in combining both: a WCPS-based server can perform Recently, we have started to study just-in-time compilation, data extraction and reduction through its preprocessing ca- with promising first results [18][35]. pabilities, say, reducing data size from an overall several Terabytes to several 100 Megabytes; a client-side special- A research project just launched investigates into automatic purpose data analysis tool, then, can undertake further anal- orchestration and service dispatching. This requires dy- ysis on the extracted data. An example where such a combi- namic analysis of the request and comparing against re- nation has proven useful is the WCS 3-D timeseries use case sources available. 
We intend to use cost-based models for presented earlier; in this case the combination consisted of distributed processing in heterogeneous environments. rasdaman as raster database and IDL on the Net for image processing and Web serving; similar experiences have been On conceptual level, the language is planned to be extended gathered with a Khoros coupling. with manipulation functionality - if the current return clause corresponds to SQL’s select then the equivalents to QL’s in- In summary, today navigational interfaces for large coverage sert, update, and delete are useful constructs. For example, archives are emerging already; the next step will consist of an application may want to update part of a map by replac- advancing from coverage data stewardship to service stew- ing an area given by a a bounding box, a bounding polygon, ardship based on open, flexible access interfaces for value- or a mask. adding processing, analysis, and mining. Application exam- ples are manifold: Sensor and streaming databases will allow While the WPS model as such was not found suitable for data subsetting, on-demand processing and summarisation, a tight semantic coupling between client and server WCPS, as well as standing queries for alerting. Hyperspectral satel- meantime work has started towards a WPCS protocol em- lite imagery will not just be served as is, but derived prod- bedding into the WPS framework. Formally, this is foreseen ucts like vegetation index or snow index will be computed on to become a WPS Applicaton Profile [6]. The WPS process- the fly and without redundant storage. Human brain imag- ing signature for WCPS is defined such that the input is a ing will benefit from analyzing thousands of brain activity WCPS expression in string or XML representation and the maps simultaneously. Multi-Petabyte statistical datacubes output is a set of either coverages or scalar, XML encoded can be leveraged for online analysis. This obviously poses values. 
Hence, the features of both standards augment each new challenges on the design of open, interoperable services other: and their efficient implementation. WCPS is OGC’s flexible, unified interface for semantic coverage services within and • WPS supports any kind of geo processing, whereas across domains. 8. ACKNOWLEDGEMENT March 1999. The author gratefully acknowledges is indebted to Arliss [13] A. G. Gutierrez and P. Baumann. Modeling Whiteside, with whom he co-chairs the WCS.SWG. Steven fundamental geo-raster operations with array algebra. Keens, with whom the author co-chairs the WCS.SWG, and In IEEE International Workshop in Spatial and Arliss Whiteside have contributed substantial suggestions Spatio-Temporal Data Mining, October 2007. for improvement during their proofreading of the WCPS [14] A. G. Gutierrez and P. Baumann. Computing draft. Ben Domenico continuously provides invaluable in- aggregate queries in raster image databases using put, discussion, and insight as initiator and leader of the pre-aggregated data. In International Conference on GALEON network. A big ”thank you”goes to the rasdaFolks Computer Science and Applications (ICCSA’08), for their great work in implementing rasdaman, PetaScope, 22-24 October, 2008. and EarthLook. The reviewers’ insightful comments have [15] K. Hahn, B. Reiner, G. Hoefling, and P. Baumann. allowed to significantly improve the paper. Parallel query support for multidimensional data: Inter-object parallelism. September 2002. 9. REFERENCES [16] W. H. Inmon. Building the Data Warehouse. Wiley, [1] Intra-query parallelism for multidimensional array 1996. data. In 28th International Conference on Very Large [17] ITT. www.rsinc.com/envi, last seen: 2009-apr-25. Data Bases (VLDB) 2002, August 20, 2002. [18] C. Jucovschi. Precompiling Queries in a Raster [2] P. Baumann, editor. Web Coverage Processing Service Database System. Bachelor thesis, Jacobs University (WCPS) Implementation Specification. Number Bremen, 2008. 08-068. 
OGC, 2008. [19] K. Kleese and P. Baumann. Intelligent support for [3] P. Baumann. Large-scale raster services: A case for high i/o requirements of leading edge scientific codes databases (invited keynote). In 3rd Intl Workshop on on high-end computing systems - the estedi project. In Conceptual Modeling for Geographic Information Proceedings of the Sixth European SGI/Cray MPP Systems (CoMoGIS), volume Lecture Notes on Workshop, 7-8 September 2000. Computer Science 4231, pages 75 – 84. Springer, 6 - 9 [20] Leica Geosystems. November 2006. gi.leica-geosystems.com/LGISub1x33x0.aspx, last seen: [4] P. Baumann. Language support for raster image 2009-apr-25. manipulation in databases. In Proc. Int. Workshop on [21] L. Libkin, R. Machlin, and L. Wong. A query Graphics Modeling, Visualization in Science and language for multidimensional arrays: design, Technology, April 13 - 14, 1992. implementation and optimization techniques. In Proc. [5] P. Baumann. A database array algebra for International Conference on Management of Data spatio-temporal data and beyond. In Proc. 4th (SIGMOD’96), pages 228–239. International Workshop on Next Generation [22] A. P. Marathe and K. Salem. Query processing Information Technologies and Systems (NGITS ’99), techniques for arrays. The VLDB Journal, volume Lecture Notes on Computer Science 1649, 11(1):68–91, 2002. pages 76 – 93. Springer Verlag, July 5-7, 1999. [23] The Mathworks. www.mathworks.com, last seen: [6] P. Baumann and M. Owonibi, editors. Web Processing 2009-apr-25. Service (WPS) Application Profile Extension for Web [24] D. L. Moody and M. A. Kortink. From enterprise Coverage Processing Service (WCPS). Number 09-045. models to dimensional models: A methodology for OGC, 2009. data warehouse and data mart design. In M. Jeusfeld, [7] M. Botts, A. Robin, J. Davidson, and I. Simonis, H. Shu, M. Staudt, and G. Vossen, editors, Proc. editors. Sensor Web Enablement Architecture. 
Number International Workshop on Design and Management 06-021r1. OGC, 2006. of Data Warehouses (DMDW 2000), June 5-6, 2000. [8] R. Cornacchia, S. Heman, M. Zukowski, A. de Vries, [25] D. Nebert, A. Whiteside, and P. Vretanos, editors. and P. Boncz. Flexible and efficient IR using array Catalogue Service Implementation Specification. databases. Number Report INS-E0701. CWI, January Number 07-006r1. OGC, 2007. 2007. [26] n.n. Geographic Information - Coverage Geometry and [9] J. de la Beaujardiere, editor. OGC Web Map Service Functions. Number 19123:2005. ISO, 2005. (WMS) Implementation Specification. Number 06-042. [27] N.n. Abstract Specification Topic 6: Schema for OGC, 2004-01-20. coverage geometry and functions. Number 07-011. [10] A. Dehmel. A Compression Engine for OGC, 2007. Multidimensional Array Database Systems. Phd thesis, [28] n.n. rasdaman query language guide. rasdaman GmbH, 2001. 7.0 edition, 2008. [11] E. Diedrich, B. Buckl, D. Dietrich, and P. Seifert. [29] V. Panagiotis, editor. Web Feature Service (WFS) Www-based information retrieval from full resolution Implementation Specification. Number 04-094. OGC, satellite images using a multi-dimensional data 2005. management system. In : Online proceedings of [30] B. Reiner, K. Hahn, and G. H”ofling. Tertiary storage EOGEO Workshop 2001, http://eogeo.net, 27.06.2001. support for large-scale multidimensional array [12] P. Furtado and P. Baumann. Storage of database management systems. In 28th International multidimensional arrays based on arbitrary tiling. In Conference on Very Large Data Bases (VLDB) 2002, Proceedings of the 15th International Conference on 20.08.2002. Data Engineering. IEEE Computer Society, 23-26 [31] R. Ritsch. Optimization and Evaluation of Array Queries in Database Management Systems. Phd thesis, 2002. [32] G. Ritter, J. Wilson, and J. Davidson. Image algebra: An overview. Computer Vision, Graphics, and Image Processing, 49(1):297–336, 1994. [33] C. Sapia. 
On modeling and predicting query behavior in olap systems. In Proceedings of the Intl. Workshop on Design and Management of Data Warehouses, DMDW’99, June 14-15, 1999. [34] P. Schut, editor. Web Processing Service Implementation Specification. Number 05-007r7. OGC, 2007-06-08. [35] S. Stancu-Mara. Using Graphic Cards for Accelerating Raster Database Query Processing. Bachelor thesis, Jacobs University Bremen, 2008. [36] G. Vowles, editor. Geospatial Digital Rights Management Reference Model. Number 06-004r3. OGC, 2004-01-20. [37] A. Whiteside, editor. OGC Web Services Common Specification. Number 06-121r3. OGC, 2007. [38] A. Whiteside, editor. Web Coverage Service (WCS) Transaction Operation Extension. Number 07-068r4. OGC, 2008. [39] A. Whiteside and J. Evans, editors. Web Coverage Service (WCS) Implementation Specification. Number 07-067r5. OGC, 2008. [40] N. Widmann and P. Baumann. Efficient execution of operations in a DBMS for multidimensional arrays. In Statistical and Scientific Database Management, pages 155–165, 1998.