Technical Report

A Java Implementation of the OpenGIS™ Feature Abstract Specification (ISO 19107 Spatial Schema)

Sanjay Dominik Jena, Jackson Roehrig

University of Applied Sciences Cologne, Institute for Technology in the Tropics

Version 0.1, July 2007

ABSTRACT

The Open Geospatial Consortium's (OGC) Feature Geometry Abstract Specification (ISO/TC211 19107) describes a geometric and topological data structure for two- and three-dimensional representations of vector data. GeoAPI, an OGC working group, defines interface APIs derived from ISO 19107. GeoTools provides an open source Java code library which implements OGC specifications in close collaboration with the GeoAPI project. This work describes a partial but serviceable implementation of the ISO 19107 specification and its corresponding GeoAPI interfaces, taking into account previous implementations and related specifications. It is intended as a first step for the GeoTools project towards a full implementation of the Feature Geometry Abstract Specification. It focuses on aspects of spatial operations such as robustness, precision, persistence and performance. A JUnit test suite was developed to verify the compliance of the implementation with GeoAPI. ISO 19107 is discussed and proposals for improving GeoAPI are presented.

© Copyright by Sanjay Dominik Jena and Jackson Roehrig 2007

ACKNOWLEDGMENTS

Our appreciation goes to the whole GeoTools and GeoAPI communities, in particular to Martin Desruisseaux, Bryce Nordgren, Jody Garnett and Graham Davis for their extensive support and many discussions, and to the JTS developers, the JTS developer mailing list and to those who will make use of and continue the implementation accomplished in this work. We also wish to acknowledge the support offered by Prof. Dr. Marco Antônio Casanova of PUC-Rio, and would like to thank all the students and researchers of the TecGraf institute for their support and assistance.

Contents

1 Introduction
1.1 Standards and Specifications
1.1.1 The Open Geospatial Consortium (OGC)
1.1.2 ISO/TC211
1.1.3 GeoAPI and GeoTools
1.2 Objectives of this work
1.3 Document Structure

2 Implementation Aspects
2.1 Dimension Model
2.2 Robustness and Performance
2.3 Precision and Robustness
2.4 Data Storage and Persistence

3 Geometry Data model
3.1 Geometry x Topology
3.2 Geometry Root Object
3.3 Primitives
3.4 Complex
3.5 Aggregates
3.6 Coordinates

4 Spatial analysis operators
4.1 Data Structure for the topological graph
4.2 Geometric Predicates and Basic Algorithms
4.3 Set Operations
4.3.1 General Discussions
4.3.2 Map Overlay
4.4 Relational Boolean Operators
4.4.1 The Intersection Matrix
4.4.2 The Boolean Operators
4.4.3 Algorithm
4.5 Constructive Operations
4.5.1 Buffer
4.5.2 Centroid
4.5.3 Convex Hull
4.6 Metric Operations
4.6.1 Distance

5 Testing Suite
5.1 Test Environment
5.2 Test Methodology

6 Conclusions and Recommendations
6.1 Conclusions
6.2 Future Work

A Glossary

B Technical definitions

C Notations

D Observations and Recommendations
D.1 Abstract Specification Issues
D.2 Recommendations for the GeoAPI
D.2.1 Naming issues
D.2.2 Interface and method modifications

E Geometric objects and relations
E.1 Types
E.2 Types in the plane

F Implementation Overview
F.1 Implemented Classes
F.2 Implemented Methods

Bibliography

List of Figures

1.1 Overview of the OGC Abstract Specifications (from [OGC05])

2.1 Geometry in the conceptual model
2.2 The polyline seems locally straight, but is not straight at all
2.3 Exact geometric computation in map overlay [BH98]
2.4 Metric operations are based on elementary operations, which optionally use floating-point arithmetic or exact integer arithmetic

3.1 Geometric objects can be represented in three different dimensions: 2D (a), 2.5D (b) [imgb] and 3D (c)
3.2 Example of vector representation
3.3 Neighbourhood relationships in topological representations
3.4 Non-manifold: more than two faces connected to an edge
3.5 Object hierarchy of the Feature Geometry
3.6 : simple (a), simple closed (b), non-simple (c), non-simple closed (d), self-tangent (e)
3.7 : simple convex (a), simple concave (b), simple with hole (c), complex (non-simple) (d)
3.8 orientation (a), no surface created (b), surface splitting (c)
3.9 Curve segments building curves, curves composing a composite curve
3.10 Spline and Bezier curves
3.11 Composite surface generated by two surfaces
3.12 Composite solid generated by two solids (cubes)
3.13 Examples of Complexes: CompositeCurves and CompositeSurfaces
3.14 Geometric objects do or do not contain their boundaries
3.15 Geometry Collections: MultiPoint (a), MultiLine (b), MultiPolygon (c) and Aggregate (d)
3.16 Examples of envelopes

4.1 The Doubly-Connected Edge List
4.2 Vector and point translation to the origin of the coordinate system for the orientation test of a point and a segment
4.3 Ring orientation test: if the highest point, its predecessor and its successor are ccw oriented, the whole ring is ccw oriented (a); in case of collinearity, the x value order of its predecessor and successor decides (b)
4.4 The Point-In-Polygon Test determines the number of intersections of rings with the semi-infinite straight line starting from the given point: P1 and P4 lie outside the polygon and have an even number of intersections. P2 and P3 lie within the polygon and have an odd number of intersections. P5 and P6 are special cases of collinear and tangent intersection lines
4.5 Example of a Map Overlay: Three thematic maps with country, city and river information are overlaid into one map [imga]
4.6 Set operations between two surfaces
4.7 Different types of noding: The union of two curves (1) can be noded partially in topologically equivalent representations (2) or completely (3)
4.8 Theoretically, set operations like difference can produce non-closed sets
4.9 Merging ambiguousness: The lines in (a) and (b) are topologically equivalent, but are defined by a different sequence of control points. The union of the curves in (c) is shown in (d). The object type of its final representation is ambiguous.
4.10 The difference between two CompositeSurfaces can result in objects which are not representable in conformance with the Abstract Specification
4.11 The sweep line moves down to the next event point: As the event point is an intersection point, the involved segments sk and sl must be tested against their new neighbours
4.12 A noded graph with directed edges
4.13 Finding all intersections in the overlay operation using the plane sweep algorithm
4.14 Internal graph representation
4.15 The 4-Intersection matrix: its different configurations describe eight topological relations between two regions (from [ECF94])
4.16 Interior, Boundary and Exterior of two Polygons: Polygon A contains Polygon B
4.17 The 9-Intersection matrix: regions with holes can be separated exactly into the eight specified topological relations by additional distinction of the polygon exterior (from [EH91])
4.18 Disjoint geometric objects
4.19 Examples of Touches relationships: Surface/Surface (a), Surface/Line (b), Surface/Point (c), Curve/Curve (d), Curve/Point (e)
4.20 Examples of the Within / Contains relationship: Surface/Surface (a), Surface/Curve (b), Surface/Point (c), Curve/Curve (d) and Curve/Point (e)
4.21 Examples of the overlaps relationship
4.22 Examples of the crosses relationship
4.23 Examples of buffers: a positive buffer can be performed on a curve (c2) and surface (s2), a negative buffer only on a surface (s3)
4.24 Examples of centroids of geometric objects: a point set (a), a straight line (b), a curve (c), a (d) and a simple polygon with a hole (e)
4.25 Centroid of a triangle (a) and centroid of a surface computed by the centroids of the surface triangulation
4.26 Triangulation by choosing a fixed point instead of partitioning the surface
4.27 Triangulation of a hole within a surface
4.28 Examples of convex hulls: Surface (a), MultiSurface (b), MultiPoint (c), straight Curve (d), Curve (e) and MultiCurve (f)
4.29 Graham's scan algorithm computes the upper and lower hull separately from left to right and deletes points which do not result in a right turn

5.1 Hierarchy of TestSuites and TestCases

B.1 Examples of Monotone Chains (from [viv03b])

D.1 Current and suggested relation between an OrientableCurve, a Curve and the interface GenericCurve

1 Introduction

1.1 Standards and Specifications

1.1.1 The Open Geospatial Consortium (OGC)

The Open Geospatial Consortium (OGC) is an international non-profit industry organization, which is currently composed of more than 330 companies, government agencies and universities. OGC supports the development of interoperable geographic information systems through the definition of specifications and through the certification of implementations that comply with them. Implementations of the OpenGIS Specifications can be categorized into Compliant Products and Implementing Products. Compliant Products are software products which comply with OGC's OpenGIS Specifications; they have been tested and verified through the OGC Testing Program and are registered as compliant on the OGC website. Implementing Products are software products which implement OGC's OpenGIS Specifications but have not yet been verified. Most products are Implementing Products, since compliance tests are not available for all OGC specifications.

There are two types of specifications defined by the OGC: Abstract Specifications and Implementation Specifications. Abstract Specifications provide the conceptual foundation for many OGC specification development activities. They are independent of platform and programming language; they support implementable interfaces and provide a reference model for the development of Implementation Specifications. Implementation Specifications are based on the Abstract Specifications and describe the semantics, structure and technology aspects necessary for an unambiguous implementation. Therefore, independently developed implementations of an Implementation Specification should be interoperable. In order to manage the complexity of the subject, Abstract Specifications are separated into different topics handled by different OGC working groups. Figure 1.1 gives an overview of the OpenGIS Abstract Specifications and illustrates these topics and their hierarchy.
The specifications center on two themes: information sharing and service access. Topics 12, 13, 15 and 16 cover geospatial services; the remainder is concerned with sharing geographic information. In OGC terms, real world objects are called features. Topics 5, 6 and 7 define the representation of features through geometry, coverage and imagery. Topic 1, the Feature Geometry, provides a geometric data model for features. The remaining information sharing topics address supporting issues for topics 5, 6 and 7.

Figure 1.1: Overview of the OGC Abstract Specifications (from [OGC05])

Feature Geometry Abstract Specification. The geometry layer is located at the lowest level of a GIS. It translates geographic entities, which represent features (i.e. real world objects), into geometric entities and allows spatial analysis functions on and between them. The Feature Geometry Abstract Specification (FGAS) [OGC97] is the OGC document which defines the reference model of this layer. It separates geometric objects into primitives, aggregates and complexes in up to three Euclidean dimensions. In contrast to traditional GIS, it provides a great diversity of objects to describe spatial objects (such as curvilinear curve segments) and hence has more capabilities than the Simple Feature Implementation Specification (SFS) to describe real world objects. ISO approved the FGAS as an international standard (ISO/TC211 19107). As a result, the standard has attracted increasing research interest [PvOV00][Kuh05]. Throughout this document, the FGAS is also referred to as the Abstract Specification.

Simple Feature Implementation Specification. The Simple Feature Implementation Specification (SFS) [SFS05a] is an Implementation Specification for the Feature Geometry Abstract Specification (FGAS) (topic 1) [OGC97]. It focuses on Distributed Computing Platforms (DCP) and specifies a simplified data model based on points, lines and polygons in two Euclidean dimensions. It defines a general architecture [SFS05a] and three implementation specifications for the DCPs SQL [SFS05b], CORBA and OLE/COM.

The SFS has become a widely adopted geometry model in two-dimensional GIS. However, there are applications which require the representation and analysis of three-dimensional data, such as the calculation of drainage areas or flood and weather prediction. The only serviceable GIS solutions for such applications are commercial ones (such as ArcGIS). Free or open source three-dimensional GIS are rare (for instance, GRASS or SAGA) and are either not based on open standards or functionally limited. The FGAS specifies a wide variety of geometric objects to represent real world entities more realistically, in up to three Euclidean dimensions. As the FGAS is an international ISO standard and provides codeable interfaces, it has great potential to become an important data model definition for three-dimensional geographic information.

A complete implementation of the SFS is provided by Vivid Solutions Inc.: the Java Topology Suite (JTS) [viv03b][viv03a] is a Java API of 2D spatial predicates and functions for fundamental geometric operations. Its geometry implementation is OGC compliant to the SFS. It is a robust and efficient implementation and is used worldwide in several open source projects such as JUMP, uDig and GeoTools. However, JTS does not support any type of persistence; consequently, it sometimes faces memory problems when loading large data sets. Another implementation of the SFS was developed in the master's thesis of Fei [Chu01]. Other implementations may be available; however, JTS is the most widely used one.

1.1.2 ISO/TC211

The International Organization for Standardization (ISO) [wwwd] is the world's largest developer of international standards. International standards are normally prepared by ISO technical committees, each dedicated to a particular subject. ISO/TC211 Geographic information/Geomatics [wwwe] is the ISO technical committee dedicated to the standardization of digital geographic information, i.e. information concerning objects or phenomena that are directly or indirectly associated with a location relative to the Earth. In 1998, ISO/TC211 and OGC signed an agreement so that both organizations could take full advantage of mutual contributions. One part of their collaboration is the inclusion of ISO/TC211 documents into the Abstract Specifications of OGC. In turn, OGC can submit Implementation Specifications as proposals for international standards ([OGC05], page 3).

1.1.3 GeoAPI and GeoTools

Abstract Specifications do not provide the same level of detail in the semantic description of classes as Implementation Specifications do. A clear definition of the semantics is necessary in order to provide interoperability between implementations of Abstract Specifications. The GeoAPI project [wwwb] specifies and publishes Java interfaces according to OpenGIS specifications and ISO standards, which support interoperability between further implementations by the GIS community. The interfaces published by the GeoAPI project were created based on TC211 documentation. Attributes and some analysis operators are accessed through getter and setter methods to conform with the JavaBeans conventions. Currently, there is no open source implementation of the FGAS GeoAPI interfaces; this work is also an attempt towards a 3D implementation of this standard.

The GeoTools project [wwwc] provides an internationally known open source (LGPL) Java library with classes used for geospatial analysis. It is a large open source GIS library which implements OGC specifications and works in close collaboration with the GeoAPI project, implementing its interfaces wherever possible. GeoTools is used in many open source projects, including Web Feature Servers, Web Map Servers and desktop applications, and has a vital community of developers. As there is a high potential for projects to benefit from the efforts and contributions of other developers and groups, hosting a new implementation of the FGAS as a subproject in the namespace of GeoTools provides optimal conditions for the project's success.

1.2 Objectives of this work

An assessment of existing open source approaches to an FGAS implementation demonstrated a demand for a serviceable FGAS implementation. Since the FGAS supports both 2D and 3D object representations, the objective of this work is to provide an implementation comparable to the SFS (as the SFS defines a 2D subset of the FGAS) without constraining further full 3D developments. The current implementation permits the representation of 3D data, but not their spatial analysis. This work is based on the results of the diploma thesis of Sanjay Dominik Jena [Jen07]. The Java Topology Suite (see section 1.1.1) played a major role in the implementation of spatial analysis operators. Part of the JTS source code was adopted and modified under LGPL terms. The objectives of this work can be summarized as follows:

• Implementation of basic FGAS data types, particularly those comparable to the SFS polygons, lines and points.

• Implementation of the GeoAPI interfaces for the FGAS, including factory classes to instantiate FGAS objects.

• Extensibility for 3D analysis operations and concurrent models of different dimensions.

• Numeric precision analysis and selection of an appropriate precision model.

• Implementation of robust numerical algorithms.

• Analysis of persistence related issues concerned with FGAS objects.

This work has at least the following limitations:

• No topological data structures are implemented (chapter 7 of the ISO 19107).

• The FGAS defines a number of subtypes of GM_CurveSegment, such as splines, arcs and Bezier curves. Only GM_LineSegment and GM_LineString are implemented here.

1.3 Document Structure

The remainder of this work is organized as follows: Chapter 2 discusses general implementation aspects and design decisions. The topics are separated into programming language, implementation interoperability, design of a dimension and precision model, as well as data storage and persistence. Chapter 3 discusses selected aspects of the data model defined by the Feature Geometry Abstract Specification; it is intended to be a complement to the FGAS. Chapter 4 discusses structural and algorithmic aspects of spatial analysis, such as an appropriate data structure for the topological graph, geometric predicates, basic algorithms, set theoretic operations, relational boolean operators, and constructive and metric operations. Design decisions and suggestions for the FGAS regarding set operations and relational boolean operators are presented. Chapter 5 comments on the testing methodology used to verify the implementation of the data types and spatial analysis operations. Chapter 6 suggests modifications addressed to the FGAS and GeoAPI.

2 Implementation Aspects

The implementation of the geometry data model described in the FGAS requires a thorough understanding of the semantics of the geometry objects, computational geometry, software development and the particularities of programming languages. The implementation of the Feature Geometry Abstract Specification presented in this document adopts the publicly accessible and accepted interfaces from GeoAPI (see section 1.1.3).

Figure 2.1: Geometry in the conceptual model

Factories. Geometric objects are constructed in factories using Java native objects or elementary data types, combined with auxiliary objects (supporting lightweight classes) defined by the geometry. GeoAPI defines the factories PrimitiveFactory and GeometryFactory to instantiate primitives and auxiliary coordinates. The additional factories ComplexFactory and AggregateFactory were added in this implementation to encapsulate the instantiation of complexes and aggregates and will be submitted to the GeoAPI project. All four factories are part of the FeatureGeometryFactory. Figure 2.1 shows the possible context of a geometry within a GIS. In this work, the geometry is accessed directly through the GeoAPI layer by the application above it.

Packages. GeoAPI deploys the Feature Geometry interfaces under the namespace org.opengis.spatialschema.geometry. This package contains four subpackages, one for each package specified in the Abstract Specification: aggregate, complex, primitive and geometry (corresponding to the coordinate package in the FGAS). The implementation presented in this work is hosted by GeoTools under org.geotools.geometry.iso.

2.1 Dimension Model

The dimension of geometric data was one of the main aspects in the design of this implementation, which should be easily extendable to 3D data. The geometry must be able to distinguish between 2D, 2.5D and 3D data (see examples in figure 3.1), since spatial analysis depends mostly on the dimension and becomes more complex with increasing dimensionality. The number of ordinates can be used to distinguish between 2D and 2.5D or between 2D and 3D models, but not between 2.5D and 3D models, since both store three ordinates. Therefore, a dimension model was developed and associated with the FeatureGeometryFactory; all geometries created with a FeatureGeometryFactory and its underlying factories are related to the same dimension model. The current implementation stores 2D, 2.5D and 3D models in three separate instances of FeatureGeometryFactory and does not support spatial analysis between objects of different dimensions, since this would considerably increase the complexity of the implementation. The three dimension models, each associated with its own FeatureGeometryFactory instance, are:

• 2D models work in two dimensional Euclidean spaces with the coordinates x and y. Geometric objects in these models do not store height information.

• 2.5D models work in three-dimensional Euclidean spaces with the coordinates x, y and z, where z is a function of x and y, f(x, y) → z, i.e. there is exactly one z value for each tuple (x, y).

• 3D models work in three dimensional Euclidean spaces with the coordinates x, y and z.
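To illustrate why the ordinate count alone cannot separate 2.5D from 3D data, the following Java sketch models the three dimension models as an enum and checks the 2.5D condition (exactly one z per (x, y)). The names DimensionModelDemo, DimensionModel and isFunctionOfXY are hypothetical and are not part of GeoAPI or of the GeoTools implementation; keying a map by the string form of (x, y) is a simplification that assumes exactly repeated ordinate values.

```java
import java.util.HashMap;
import java.util.Map;

public class DimensionModelDemo {

    /** Hypothetical enum for the three dimension models (not a GeoAPI/GeoTools type). */
    enum DimensionModel {
        TWO_D(2), TWO_AND_A_HALF_D(3), THREE_D(3);

        final int ordinates; // number of ordinates stored per position

        DimensionModel(int ordinates) {
            this.ordinates = ordinates;
        }
    }

    /** Checks the 2.5D condition: every (x, y) pair maps to exactly one z value. */
    static boolean isFunctionOfXY(double[][] points) {
        Map<String, Double> zByXY = new HashMap<>();
        for (double[] p : points) {
            Double previous = zByXY.putIfAbsent(p[0] + "," + p[1], p[2]);
            if (previous != null && previous.doubleValue() != p[2]) {
                return false; // same (x, y) with a different z: valid 3D, invalid 2.5D
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] terrain = { {0, 0, 10}, {1, 0, 12}, {0, 1, 11} }; // one z per (x, y)
        double[][] wall = { {0, 0, 0}, {0, 0, 5}, {1, 0, 0} };       // two z values at (0, 0)
        System.out.println(isFunctionOfXY(terrain)); // true: valid 2.5D
        System.out.println(isFunctionOfXY(wall));    // false: requires the 3D model
        // Both models store three ordinates, so the ordinate count cannot tell them apart.
        System.out.println(DimensionModel.TWO_AND_A_HALF_D.ordinates == DimensionModel.THREE_D.ordinates); // true
    }
}
```

Because the distinction is a property of the whole data set rather than of a single coordinate tuple, attaching the dimension model to the factory, instead of inferring it from each geometry, is a natural design choice.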

2.2 Robustness and Performance

Precision. It is important to distinguish between precision and accuracy. Accuracy is the degree of conformity between a real world quantity and its representation in a GIS, which depends on measurement or calculation methods. Precision is the degree of repeatability of digital information (for example, of a concrete floating-point or integer arithmetic implementation). This section focuses on computational or numeric precision. Different approaches to achieve precision can be implemented in precision models. The most common precision models are:

• Fixed precision. The value is rounded according to a scale factor:

    point.x = round( point.x * scale ) / scale
    point.y = round( point.y * scale ) / scale

A scale greater than one means that the precision point is to the right of the decimal point, i.e. we have a higher precision (more bits). A scale smaller than one means that the precision point is to the left of the decimal point, i.e. we have a lower precision (fewer bits). Results of computations are rounded by the rules above and have the same number of digits as their input values. Raster models always use fixed precision, because they use a homogeneous unit scale.

• Floating-point precision. The values use the full precision provided by the corresponding floating-point data types based on the IEEE floating-point standard (see IEEE Floating-Point Standard in Appendix B). In Java, these are the elementary data types float and double. Computation results may have more digits than the input values.

• Exact precision. Exact arithmetic is the general term for arithmetic which produces exact results for the input data, without loss of precision by rounding. In general, this is achieved by integer and rational arithmetic. Exact geometric computation is discussed in detail below (see Exact geometric computation).
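A minimal Java sketch of the fixed precision rule above, applied to a single ordinate. The class and method names are illustrative only; the sketch assumes the half-up rounding of Math.round, and the result is the double closest to the grid value rather than an exact decimal.

```java
public class FixedPrecisionDemo {

    /** Snaps an ordinate to the grid defined by scale: round(value * scale) / scale. */
    static double makePrecise(double value, double scale) {
        return Math.round(value * scale) / scale; // Math.round(double) returns a long
    }

    public static void main(String[] args) {
        double x = 12.34567;
        System.out.println(makePrecise(x, 1000.0)); // scale > 1: three decimal places, 12.346
        System.out.println(makePrecise(x, 0.1));    // scale < 1: rounded to tens, 10.0
    }
}
```

A scale of 1000 keeps three digits after the decimal point, while a scale of 0.1 moves the precision point one position to the left of the decimal point.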

Algorithm Robustness. Floating-point arithmetic (see Appendix B) is inherently imprecise and its naive use can violate basic axioms of arithmetic [Sch98]. For instance, the evaluation of the term

3 · 0.4 = 1.2000000000000002

(evaluated with JDK 1.5) is a typical floating-point rounding error. However, floating-point computation is hardware-supported (see IEEE Floating-Point Standard, Appendix B) and hence very fast. A comprehensive work about floating-point arithmetic and its problems is presented by Goldberg [Gol91].

Rounding errors are responsible for robustness problems. Inexact results used in combinatorial computations can make the complete algorithm fail. Schirra gives the following definition: "... implementation of an algorithm is considered to be robust if it produces the correct result for some perturbation of the input. If the perturbation is small an implementation is called stable ..." [Sch98]. Since stability is related to numerical computation, it is also called numerical stability. Schirra and [LY01] review robustness and precision issues in a broad and detailed manner.

A widespread approach to implementing stable algorithms is the use of a threshold value ε. This technique follows the rule of thumb "If something is close to zero, it is zero". A tolerance ε is used to test whether a numerical value is (almost) equal to another numerical value. The original intention is that the difference between two numerical values can be so small that, in practice, both values can be assumed to be equal. The problem is the choice of ε, which in practice is found by trial and error until all current tests work for the input data and no errors occur. The technique of finding an ε value is called epsilon-tweaking [Sch98]. Because of rounding errors in floating-point arithmetic, the use of an epsilon value is a popular and easy technique. Its disadvantage is the risk of processing correctly in local tests, but producing faulty assumptions in relation to the general context. A naively implemented epsilon condition might result in errors, as the following basic example illustrates: the program decides that A = B and B = C, but in reality A ≠ C, so the program leads to incorrect results (|A − B| < ε and |B − C| < ε only guarantee |A − C| < 2ε). Figure 2.2 shows another example from practice: a test using a small ε value might conclude that adjacent segments have the same direction, and one might not notice that their directions differ. Taken together, however, the polyline is obviously not straight.

Figure 2.2: The polyline seems locally straight, but is not straight at all
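The following Java sketch reproduces the 3 · 0.4 rounding error and shows why epsilon-based equality is not transitive. The tolerance EPS = 1e-9 and the helper almostEqual are hypothetical choices for illustration, not values used by the implementation.

```java
public class EpsilonDemo {

    static final double EPS = 1e-9; // hypothetical tolerance, as found by "epsilon-tweaking"

    /** Epsilon-based equality test: |a - b| < EPS. */
    static boolean almostEqual(double a, double b) {
        return Math.abs(a - b) < EPS;
    }

    public static void main(String[] args) {
        System.out.println(3 * 0.4);        // 1.2000000000000002
        System.out.println(3 * 0.4 == 1.2); // false

        double a = 0.0, b = 0.7e-9, c = 1.4e-9;
        System.out.println(almostEqual(a, b)); // true
        System.out.println(almostEqual(b, c)); // true
        System.out.println(almostEqual(a, c)); // false: the relation is not transitive
    }
}
```

Any fixed ε produces such chains, which is why an algorithm that treats "almost equal" as an equivalence relation can reach inconsistent combinatorial decisions.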

Consequently, the use of epsilons can help in some situations, but may result in later robustness problems. Typical examples of robustness problems occur in the computation of the intersection point between two line segments, the Point-Line Orientation test and the Point-In-Ring test (see Geometric Predicates in section 4.2). Another problem which occurs in practice is dimensional collapse: due to missing precision or rounding errors, the topological dimension of an operation result is lower than the expected dimension. For instance, if the precision is not high enough, the intersection between two very small regions may result in a single point or a line instead of a region (see [viv03b] for further explanations).

Exact geometric computation. The field of exact geometric computation has become one of the biggest issues in computational geometry (CG) research. There are many approaches to realizing exact computation:

• Integer arithmetic. Integer arithmetic is based on binary representation and binary arithmetic operations. It can result in an overflow, but not in rounding errors. The use of arbitrary precision integers eliminates the overflow problem. Since integral input is usually bounded in size (such as the 32-bit elementary data type int in Java), some approaches use multiple-precision integers with a fixed precision according to the binary size of the input data.

• Rational arithmetic. Exact rational arithmetic is the exact representation of a number by a numerator and a denominator (both in integer arithmetic). In general, divisions are typical causes of rounding errors. Rational arithmetic avoids divisions, or rather, postpones them: $\frac{a}{b} \div \frac{c}{d} = \frac{a \cdot d}{b \cdot c}$.

• Homogeneous Coordinates. Homogeneous coordinates can be used to represent the input data. Homogeneous coordinates use an additional ordinate, which can serve as a common denominator, and therefore avoid divisions.

• Symbolic and implicit representation. The result data is not computed directly, but only represented by its original input data. A number resulting from a complex computation can be represented by an expression tree which reflects the history of the computation of this number [MNU97]. An intersection point, for example, can be represented by the two line segments which intersect.
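As an illustration of combining integer arithmetic with homogeneous coordinates, the Java sketch below intersects two lines given by integer points. It assumes integer input coordinates and uses the standard projective construction (a line is the cross product of two points, an intersection is the cross product of two lines) with java.math.BigInteger, so neither overflow nor division occurs; the result is kept as a homogeneous triple (X, Y, W). The class and method names are hypothetical.

```java
import java.math.BigInteger;

public class HomogeneousIntersection {

    /** Cross product of two homogeneous triples (x, y, w); exact with BigInteger. */
    static BigInteger[] cross(BigInteger[] u, BigInteger[] v) {
        return new BigInteger[] {
            u[1].multiply(v[2]).subtract(u[2].multiply(v[1])),
            u[2].multiply(v[0]).subtract(u[0].multiply(v[2])),
            u[0].multiply(v[1]).subtract(u[1].multiply(v[0]))
        };
    }

    /** Embeds an integer point (x, y) as the homogeneous triple (x, y, 1). */
    static BigInteger[] point(long x, long y) {
        return new BigInteger[] { BigInteger.valueOf(x), BigInteger.valueOf(y), BigInteger.ONE };
    }

    /** Intersection of line(p1, p2) and line(p3, p4) as (X, Y, W); W == 0 means parallel lines. */
    static BigInteger[] intersect(BigInteger[] p1, BigInteger[] p2, BigInteger[] p3, BigInteger[] p4) {
        return cross(cross(p1, p2), cross(p3, p4));
    }

    public static void main(String[] args) {
        // y = x and x + y = 1 meet at (1/2, 1/2), which has no integer representation.
        BigInteger[] r = intersect(point(0, 0), point(2, 2), point(0, 1), point(1, 0));
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "-2 -2 -4"
    }
}
```

The triple (-2, -2, -4) represents (1/2, 1/2) exactly; the division X/W is postponed until the result actually has to be rounded to the geometry precision.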

Figure 2.3 shows the map overlay process suggested by Brinkmann and Hinrichs [BH98]. As the algorithm uses the exact input data, result data must afterwards be rounded to the internal geometry precision.

Figure 2.3: Exact geometric computation in map overlay [BH98]

Exact Integer Arithmetic. Exact integer arithmetic can be implemented by, amongst others, two approaches [BH98]. The first approach uses an abstract data type representing an arbitrary integer value. That data type must support exact computation by offering robust operators (addition, subtraction, multiplication, division, square root, etc.). The advantage of this approach is that any geometric operation can be based on this robust data type. There exist several software packages based on this approach ([SVH89], [She97], [BBP95], [MNU97]). The greatest disadvantage of this technique is the efficiency of elementary operations. Karasick [KLN91] reports that replacing a floating-point arithmetic package by a rational-arithmetic package in a Delaunay triangulation implementation made the computation 10,000 times slower. However, Güting presents geometric algorithms based on integer arithmetic, called REALM [GS93] [RHG], that are reported to be efficient.

Special algorithms for precision and robustness are not necessary for all operations in computational geometry; in fact, only a few operations need to be implemented in a robust manner. This is the foundation of the second approach, which does not implement exact computation for all elementary arithmetic operations (addition, multiplication, etc.). The algorithm packages of Fortune and van Wyk ([FW96]) demonstrate the advantage of this approach in comparison to the first one.
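The first approach, an exact abstract data type, can be sketched in Java with a minimal rational number built on java.math.BigInteger, whose operators never round. This Rational class is a hypothetical illustration, not one of the packages cited above; division follows the postponement rule (a/b) ÷ (c/d) = (a·d)/(b·c).

```java
import java.math.BigInteger;

/** Hypothetical minimal exact rational type built on BigInteger (not a GeoTools class). */
public final class Rational {
    final BigInteger num;
    final BigInteger den; // always positive; the fraction is kept in lowest terms

    Rational(BigInteger num, BigInteger den) {
        if (den.signum() == 0) {
            throw new ArithmeticException("zero denominator");
        }
        if (den.signum() < 0) { // move the sign into the numerator
            num = num.negate();
            den = den.negate();
        }
        BigInteger g = num.gcd(den); // >= 1 because den != 0
        this.num = num.divide(g);    // exact: g divides both values
        this.den = den.divide(g);
    }

    static Rational of(long num, long den) {
        return new Rational(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    Rational add(Rational o) {
        return new Rational(num.multiply(o.den).add(o.num.multiply(den)), den.multiply(o.den));
    }

    Rational multiply(Rational o) {
        return new Rational(num.multiply(o.num), den.multiply(o.den));
    }

    /** (a/b) / (c/d) = (a*d) / (b*c): the division is postponed into the denominator. */
    Rational divide(Rational o) {
        return new Rational(num.multiply(o.den), den.multiply(o.num));
    }

    @Override
    public boolean equals(Object x) {
        if (!(x instanceof Rational)) {
            return false;
        }
        Rational r = (Rational) x;
        return num.equals(r.num) && den.equals(r.den);
    }

    @Override
    public int hashCode() {
        return 31 * num.hashCode() + den.hashCode();
    }

    @Override
    public String toString() {
        return num + "/" + den;
    }

    public static void main(String[] args) {
        System.out.println(of(3, 1).multiply(of(4, 10))); // 6/5, exactly 1.2
        System.out.println(of(1, 3).divide(of(2, 7)));    // 7/6
    }
}
```

Unlike 3 * 0.4 in double arithmetic, the product here is exactly 6/5. The price is that numerators and denominators grow with every operation, which is the source of the efficiency problem described above.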

Adaptive Evaluation. As mentioned above, geometric algorithms based on floating-point arithmetic compute correct results most of the time and fail only in special cases. Hence, substituting all floating-point computations by exact computations would result in unnecessary performance losses. Adaptive evaluation tries to evaluate exact results only when needed. Schirra ([Sch98]) gives a detailed explanation and a broad overview of implementations of this technique, which is also called lazy evaluation. A simple form of lazy evaluation is a floating-point filter, which has become a well-established approach in geometric computation. A floating-point filter calculates a tight error bound for an operation on exact input data and compares it with the floating-point result (for instance, of a line intersection point computation). If the floating-point result is certified by the error bound (for a sign test: if its magnitude exceeds the bound), it is used. Otherwise, the result is computed using exact integer arithmetic. In [BH98], Brinkmann and Hinrichs implement a floating-point filter and show how to combine imprecise floating-point arithmetic with exact integer arithmetic to achieve exact computation of signs. Their implementation experiments showed that the integer arithmetic implementation is about 50 times slower. Thus, accepting the overhead of the error-bound computation is by far the better solution. There are further approaches to exact computation, such as interval arithmetic, which is based on approximation and error bounds and defines an interval containing the exact result (see [Sch98]).

The selection of the algorithms to be implemented robustly depends on the application. It might not be noticeable when a single point of a convex hull is slightly shifted aside, but a test which determines whether a point lies on the left or right side of a line can result in extensive errors if it is not calculated correctly. Generally, implementing an algorithm in a robust way is recommended if the procedure creates decisive input data for another algorithm.
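The filter idea can be sketched for the left/right test mentioned above. The sketch assumes Shewchuk's published error bound for the 2D orientation determinant, (3 + 16ε)ε with ε = 2⁻⁵³, and inputs far from overflow and underflow. It is not the filter of [BH98]. The floating-point sign is returned only when the determinant is larger than the bound. Otherwise the predicate falls back to exact `BigDecimal` arithmetic. Class and field names are hypothetical, and the counter exists only to show when the fallback fires.

```java
import java.math.BigDecimal;

/** Sketch of a floating-point filter for the orientation predicate (Shewchuk-style static bound). */
final class FilteredOrientation {
    private static final double EPS = Math.ulp(1.0) / 2;              // 2^-53, unit roundoff of double
    private static final double ERR_BOUND = (3.0 + 16.0 * EPS) * EPS; // relative bound for this determinant
    static int exactFallbacks = 0;                                    // how often the filter was inconclusive

    /** Sign of (b - a) x (c - a): 1 = c lies left of line ab, -1 = right, 0 = on the line. */
    static int orient(double ax, double ay, double bx, double by, double cx, double cy) {
        double detLeft = (bx - ax) * (cy - ay);
        double detRight = (by - ay) * (cx - ax);
        double det = detLeft - detRight;
        double bound = ERR_BOUND * (Math.abs(detLeft) + Math.abs(detRight));
        if (det > bound) return 1;     // floating-point sign is certified
        if (det < -bound) return -1;
        exactFallbacks++;              // too close to call: recompute exactly
        return exactOrient(ax, ay, bx, by, cx, cy);
    }

    private static int exactOrient(double ax, double ay, double bx, double by, double cx, double cy) {
        BigDecimal xA = new BigDecimal(ax), yA = new BigDecimal(ay);   // exact conversions
        BigDecimal l = new BigDecimal(bx).subtract(xA).multiply(new BigDecimal(cy).subtract(yA));
        BigDecimal r = new BigDecimal(by).subtract(yA).multiply(new BigDecimal(cx).subtract(xA));
        return l.subtract(r).signum();
    }
}
```

In typical data most calls end in the first two comparisons, so the exact path is rarely taken. This is the effect the adaptive approach relies on.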

Computational Performance. Algorithms can be implemented robustly, but at the cost of performance. The robust implementation of single basic operations results in a similar problem: their performance directly affects the performance of the algorithms using them. In general, robust algorithms perform worse than most non-robust algorithms. Hence, the trade-off between performance and precision (or even robustness) is an important issue. Another factor in computational performance is the data structure the algorithms work on. In general, topological structures have considerable performance advantages.

2.3 Precision and Robustness

The precision model defines the precision with which data is stored, i.e. fixed precision, floating single precision or floating double precision (see Precision in section 2.2). The model is represented by the class PrecisionModel and is instantiated by FeatureGeometryFactory. Currently, floating double precision with all digits of the elementary data type double is used. Consequently, computation results may have more digits than the input data. As discussed in section 2.2, elementary operations (e.g. multiplication, division) in floating-point arithmetic may result in rounding errors, which are acceptable in many cases. However, in order to avoid failure due to rounding errors, the algorithms were implemented in a stable manner, i.e. most of the basic operations are implemented robustly so that the complete algorithm will not fail. Most of the robustly implemented operations are geometric predicates, such as the Evaluation of Signs or the Line Segment Intersection Point Computation (see Geometric Predicates and Basic Algorithms in section 4.2). In constructive operations, such as the calculation of a convex hull or the intersection between two surfaces, it may be of little practical importance if a control point is slightly shifted aside. In contrast, the precision and accuracy of the results of metric operations (length of a curve, area of a surface or parameterization calculation) may be more important. Such operations still depend on the inexact elementary operators (∗, /, +, −) of floating-point arithmetic and may contain rounding errors. Exact computation can easily be achieved in Java using BigDecimal. The calculation is slower, but avoids rounding errors in addition, subtraction and multiplication [MCD06] [Sch05]; a division whose decimal expansion does not terminate still has to be rounded to a chosen precision. The class DoubleOperation contains the implementation of the four elementary operators and is used in all metric operations (as well as in their underlying basic operations in Algorithm2D and AlgorithmND) within the data model implementation.
The class ExactDoubleOperation implements the same methods, but is based on the BigDecimal type and can be used instead to perform exact computation in elementary operators. The conversion from a double to a BigDecimal and vice versa can also yield rounding errors.

Figure 2.4: Metric operations are based on elementary operations, which optionally use floating-point arithmetic or exact integer arithmetic
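The exchangeable-operator design can be sketched as one interface with two implementations. The interface and class names below are hypothetical stand-ins for DoubleOperation and ExactDoubleOperation, and the method names are assumptions rather than the classes' actual signatures. The decimal variant converts with `BigDecimal.valueOf`, which uses the double's `Double.toString` form. This is one possible conversion choice. It also shows where rounding re-enters: at division, and when converting back to double.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Hypothetical operator interface; metric operations would call only these four methods. */
interface ElementaryOperations {
    double add(double a, double b);
    double subtract(double a, double b);
    double multiply(double a, double b);
    double divide(double a, double b);
}

/** Plain floating-point arithmetic: fast, but every operation may round. */
final class FloatingPointOperations implements ElementaryOperations {
    public double add(double a, double b) { return a + b; }
    public double subtract(double a, double b) { return a - b; }
    public double multiply(double a, double b) { return a * b; }
    public double divide(double a, double b) { return a / b; }
}

/** BigDecimal arithmetic: +, -, * are exact; the conversion back to double may still round. */
final class DecimalOperations implements ElementaryOperations {
    // valueOf uses Double.toString, e.g. 0.1 becomes the decimal 0.1 (one possible conversion choice)
    private static BigDecimal dec(double v) { return BigDecimal.valueOf(v); }

    public double add(double a, double b) { return dec(a).add(dec(b)).doubleValue(); }
    public double subtract(double a, double b) { return dec(a).subtract(dec(b)).doubleValue(); }
    public double multiply(double a, double b) { return dec(a).multiply(dec(b)).doubleValue(); }

    // a quotient such as 1/3 does not terminate, so division needs a MathContext and rounds again
    public double divide(double a, double b) {
        return dec(a).divide(dec(b), MathContext.DECIMAL128).doubleValue();
    }
}
```

A metric operation written against `ElementaryOperations` can switch between the two strategies without changes to its own code.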

2.4 Data Storage and Persistence

Data efficiency is an important aspect of a geometry implementation, since thousands of objects may be instantiated in the same project. It is a typical GIS problem that large datasets cannot be loaded into memory due to their size. There are generally two approaches to address this problem:

• All data is held in memory. The use of primitive Java types (double, int, boolean, char, etc.) is preferred over referenced objects. For instance, a coordinate sequence stores double arrays instead of point lists.

• Data is made persistent (i.e. stored in files or a database) and loaded when necessary, e.g. for rendering or spatial analysis.
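The first approach can be sketched for the coordinate-sequence example. This is a minimal illustration assuming 2D coordinates, and the class name is hypothetical. All ordinates live in one `double[]` instead of one object per point. Derived properties such as the length are computed on demand and are not stored.

```java
/** Sketch: 2D coordinates packed into one primitive array {x0, y0, x1, y1, ...}. */
final class PackedCoordinateSequence {
    private final double[] xy;

    PackedCoordinateSequence(double... xy) {
        if (xy.length % 2 != 0) throw new IllegalArgumentException("expected x/y pairs");
        this.xy = xy.clone();                  // defensive copy; no per-point objects are created
    }

    int size() { return xy.length / 2; }
    double x(int i) { return xy[2 * i]; }
    double y(int i) { return xy[2 * i + 1]; }

    /** Length of the polyline through the positions, computed on demand rather than stored. */
    double length() {
        double sum = 0;
        for (int i = 1; i < size(); i++) sum += Math.hypot(x(i) - x(i - 1), y(i) - y(i - 1));
        return sum;
    }
}
```

A `List<Point>` with n entries holds n separate objects, each with its own header and references. The packed array needs a single object, whatever the number of positions.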

The first approach is used in the SFS. It specifies a Well-Known Structure (WKS) which defines how to encapsulate elementary and geometric data within geometric objects. [Chu01] reports that this is an effective technique to enhance the capacity of a geometry implementation, i.e. to use the computer memory more efficiently. Another design decision concerns the storage of geometric properties like area, perimeter, centroid, length and bounding box, which depends on the importance and the frequency of use of these attributes. Objects in this implementation (except points) store only their envelope; other properties are calculated on demand.

Most of today’s GIS follow the second approach. The computer’s memory is relieved by storing data in files or databases. [KBS91] discuss modern access methods on spatial databases and distinguish between two types of data which can either be held in memory or made persistent:

• spatial properties and attributes of geometric objects

• temporary data generated by spatial analysis operations

Persistence is not implemented in this work. For future development, the implementation provides two possibilities for realizing persistence in order to relieve memory:

• Within the factories: All instantiations of geometric objects are performed through the factories. The method implementations of those could consider persistence by storing the object’s properties and attributes in a file or database.

• Use of collection factories: At some points in the implementation, lists or collections are fetched from a collection factory which implements the interface CollectionFactory. The point array, for instance, receives the list holding its positions from such a collection factory. The methods in these factories, which actually instantiate the lists in memory, could be implemented in a persistent manner. The Java List interface supports fetching list elements individually.
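The second possibility can be sketched with `java.util.AbstractList`. A collection factory could return such a list instead of an `ArrayList`, and callers would still see an ordinary `List`. This is an assumption-based illustration. It does not reproduce the report's CollectionFactory interface, and the class name is hypothetical. The `loader` function stands in for a file or database read. A real implementation would also evict cached elements.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch: a read-only List whose elements are fetched one at a time on first access. */
final class LazyLoadingList<E> extends AbstractList<E> {
    private final int size;
    private final IntFunction<E> loader;              // stands in for a file or database read
    private final Map<Integer, E> cache = new HashMap<>();
    private int loadCount = 0;

    LazyLoadingList(int size, IntFunction<E> loader) {
        this.size = size;
        this.loader = loader;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        // load the element only when it is first requested, then serve it from the cache
        return cache.computeIfAbsent(index, i -> { loadCount++; return loader.apply(i); });
    }

    @Override
    public int size() { return size; }

    int loadCount() { return loadCount; }
}
```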

3 Geometry Data Model

3.1 Geometry x Topology

Vector models consist of individual or combined geometric primitives: points, curves, surfaces and solids. Figure 3.2 shows an example of a two-dimensional vector model. The ISO 19107 refers to geometric objects in $E^2$ and $E^3$, i.e. with geometric dimensions 2 and 3. Besides 2- and 3-tuples for two- and three-dimensional coordinates respectively, a two-point-five-dimensional (2.5D) model can also be defined, in which coordinates defined by 2-tuples (x, y) are extended with height values (z). Figure 3.1 shows examples of the different models. A 2.5D geometry is actually a 3D geometry, which can either be treated as a 2D geometry by simply neglecting heights, or analysed by using 2D operations as a preliminary step and then comparing height properties. Two objects in a 2.5D model can only intersect if they intersect in the 2D projection. Therefore, many 2.5D spatial analyses can first be performed with 2D algorithms, making spatial analysis for 2.5D models much more efficient than full 3D computational geometry.
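The two-step strategy for 2.5D data can be sketched for a pair of segments. This is a minimal illustration with a hypothetical class name, and parallel projections are not handled. The 2D projections are intersected first. Heights are interpolated and compared only when the projections cross. A road passing under a bridge is the typical case where the 2D test succeeds and the height test fails.

```java
/** Sketch for 2.5D data: segments are tested in their (x, y) projection before heights are compared. */
final class Segment25D {
    final double x1, y1, z1, x2, y2, z2;

    Segment25D(double x1, double y1, double z1, double x2, double y2, double z2) {
        this.x1 = x1; this.y1 = y1; this.z1 = z1;
        this.x2 = x2; this.y2 = y2; this.z2 = z2;
    }

    /** Returns {x, y, z} of the intersection, or null. Parallel projections are not handled here. */
    static double[] intersect(Segment25D a, Segment25D b, double zTolerance) {
        double dxA = a.x2 - a.x1, dyA = a.y2 - a.y1;
        double dxB = b.x2 - b.x1, dyB = b.y2 - b.y1;
        double denom = dxA * dyB - dyA * dxB;
        if (denom == 0) return null;                        // parallel or collinear projections: omitted
        double ex = b.x1 - a.x1, ey = b.y1 - a.y1;
        double t = (ex * dyB - ey * dxB) / denom;           // parameter along a
        double s = (ex * dyA - ey * dxA) / denom;           // parameter along b
        if (t < 0 || t > 1 || s < 0 || s > 1) return null;  // projections miss: no height work needed
        double zA = a.z1 + t * (a.z2 - a.z1);               // heights interpolated at the 2D crossing
        double zB = b.z1 + s * (b.z2 - b.z1);
        if (Math.abs(zA - zB) > zTolerance) return null;    // e.g. a bridge passing over a road
        return new double[] { a.x1 + t * dxA, a.y1 + t * dyA, zA };
    }
}
```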

Figure 3.1: Geometric objects can be represented in three different dimensions: 2D (a), 2.5D (b) [imgb] and 3D (c)

Figure 3.2: Example of vector representation

Geometric objects are associated with topological dimensions, which are 0D for points, 1D for curves, 2D for surfaces and 3D for solids. The topological dimension is the lowest Euclidean dimension necessary to represent the points of a geometric object in a parameterized function (see the discussion of curve parameterization below). A curve can be parameterized by $f : \mathbb{R}^1 \to \mathbb{R}^n$ and a surface by $g : \mathbb{R}^2 \to \mathbb{R}^n$ (see 3.9). Topology describes the structure of space mainly through neighbourhood relationships between different topological elements, thereby representing the subset of the geometric information which does not include coordinates and positions. Topological relationships remain invariant under geometric operations like rotation, translation and scaling. The topological elements for 0D, 1D and 2D geometries are nodes, edges and faces respectively. A point is the coordinate representation of a node, a curve contains the point set of an edge, and a surface that of a face. Topological structures offer computational advantages over pure geometric structures in terms of robustness and performance. A simple example of a topological representation is a ring, which is composed of a sequence of curves and points. The elements of a ring defined by the ISO19107 can be traversed without any geometric operations. There are different levels of topological information to represent geometric objects. The level 0 topology, also known as the spaghetti model, has no explicit information on boundaries and neighbourhood relationships: a line consists of a set of points and has neither explicit boundaries nor neighbourhood associations. The level 1 topology defines a node-edge data structure, in which edges have start and end nodes, and nodes are associated with one or more edges. Edges in the level 1 topology intersect only at nodes, and the level 1 topology can be imagined as a 3D wireframe model.
In a level 2 topology the same restrictions as in a level 1 topology are imposed on the planar projection of the geometry, i.e. edges in the planar projection of the model intersect only at nodes. A level 3 topology defines faces, edges and nodes. An edge is associated with start and end nodes as well as with left and right faces. Not only edge-based data structures, but also those based on vertices and faces are used to build a level 3 topology (full topology). Figure 3.3 shows different associations between topological elements, where e.g. E{F} means that edges connect to faces. Cardinalities, sequences and other association properties are not symbolized in this figure for the sake of simplicity. The ISO19107 uses the relationships E{N} for curve boundaries and F{{E}} for surface boundaries, where {E} is a ring and {{E}} the surface boundary, which is composed of rings. The constant number of nodes (2) and faces (2) connected to an edge makes edge-based data structures preferable to node- and face-based ones, because the latter hold dynamic lists of associated elements. Furthermore, the short "topological distance" from edges to both nodes and faces makes the edge a central element of topological representations in terms of query and storage efficiency. The Doubly-Connected Edge List (DCEL) (Preparata [PS85]), the half-edge structure (Mäntylä [Män87]), and the winged-edge and radial-edge data structures are some examples. For special geometries like TINs other data structures can be more appropriate, e.g. triangle-node (F{N}) and triangle-triangle (F{F}) structures defined without using edges.
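The fixed-size edge record that makes edge-based structures attractive can be sketched as follows. This is an illustration only, not a DCEL or half-edge implementation, and the class names are hypothetical. Each edge stores exactly two nodes and two faces, with `null` for the outside. Relations such as F{F} and N{E} are derived from E{F} and E{N}. The linear scans keep the sketch short; a real structure would index or link the edges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of an edge-centred level 3 structure: fixed-size edge records, no lists in nodes or faces. */
final class EdgeTopology {
    /** E{N} and E{F}: two nodes and two faces per edge (null = outside the modelled area). */
    record Edge(int startNode, int endNode, Integer leftFace, Integer rightFace) {}

    private final List<Edge> edges = new ArrayList<>();

    void add(Edge e) { edges.add(e); }

    /** F{F}: faces sharing an edge with the given face, derived from E{F}. */
    Set<Integer> neighbourFaces(int face) {
        Set<Integer> result = new TreeSet<>();
        for (Edge e : edges) {
            if (Objects.equals(e.leftFace(), face) && e.rightFace() != null) result.add(e.rightFace());
            if (Objects.equals(e.rightFace(), face) && e.leftFace() != null) result.add(e.leftFace());
        }
        return result;
    }

    /** N{E}: number of edges incident to a node, derived from E{N}. */
    int degree(int node) {
        int d = 0;
        for (Edge e : edges) if (e.startNode() == node || e.endNode() == node) d++;
        return d;
    }
}
```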

Figure 3.3: Neighbourhood relationships in topological representations

Level 3 topology applies not only to planar and curved surfaces but also to solids, or generally speaking to two-manifolds. In a simple cube, each edge is associated with two nodes and two faces and can therefore be represented by a level 3 topology (two-manifold or two-dimensional manifold). In contrast, a model consisting of two cubes sharing an edge cannot be represented by a level 3 topology, since the shared edge is associated with 4 faces, two from each cube (Figure 3.4). In this case, a non-manifold data representation is necessary. [Rös98] thoroughly discusses topological relationships in GIS. Section 4.4 illustrates examples which involve this implementation.
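The two-cube example can be checked mechanically by counting how many faces use each edge. The sketch below assumes faces are given as closed loops of vertex ids, and the class name is hypothetical. Every edge must be shared by exactly two faces. This is a necessary condition for a closed two-manifold, not a sufficient one: two cubes touching only at a vertex would pass this test.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: edge-use counts as a (necessary, not sufficient) test for a closed two-manifold. */
final class ManifoldCheck {
    /** Each face is a closed loop of vertex ids; returns the number of faces using each undirected edge. */
    static Map<String, Integer> facesPerEdge(int[][] faces) {
        Map<String, Integer> count = new HashMap<>();
        for (int[] f : faces) {
            for (int i = 0; i < f.length; i++) {
                int a = f[i], b = f[(i + 1) % f.length];          // wrap around to close the loop
                String key = Math.min(a, b) + "-" + Math.max(a, b);
                count.merge(key, 1, Integer::sum);
            }
        }
        return count;
    }

    /** True if every edge is shared by exactly two faces. */
    static boolean everyEdgeHasTwoFaces(int[][] faces) {
        return facesPerEdge(faces).values().stream().allMatch(c -> c == 2);
    }
}
```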

Figure 3.4: Non-manifolds: more than two faces connected to an edge

Topology and the relationship between geometric elements are important in three aspects of the ISO19107 implementation: 1) there is an inherent topological relationship between interior and boundary of geometric complexes; 2) topology is the base of spatial operators described in chapter 4 and 3) the standard defines a full topological data structure ([OGC97], chapter 7) (which is not being treated in this work).

3.2 Geometry Root Object

A Geometry (Geometry in GeoAPI, GM_Object in FGAS) represents a geometric entity and is the root class of all geometric objects. It is a direct subclass of TransfiniteSet. Its subtypes fall into three categories: primitives, aggregates and complexes. Figure 3.5 shows the hierarchy of the data model and its relationships.

Simplicity. A geometric object can be simple or non-simple. Simple means that there is no interior point intersection or self-tangency. The only intersection allowed is the intersection of boundary points, for instance at the end points of the line segments which form a curve. Examples of simple and non-simple curves and surfaces are given in Figure 3.6 and Figure 3.7.

Figure 3.5: Object Hierarchy of the Feature Geometry

Figure 3.6: Curves: simple (a), simple closed (b), non-simple (c), non-simple closed (d), self-tangent (e)

Figure 3.7: Polygons: simple convex (a), simple concave (b), simple with hole (c), complex (non-simple) (d)

Cycle. Geometries are cyclic if their boundaries are empty. Other specifications use the term closed, but this term is ambiguous since it could be confused with the concept of closed sets as applied to complexes. Boundaries themselves do not have a boundary and are hence cycles. Subcomplexes of boundary objects (Ring and Shell) are cycles. Point, CompositePoint and MultiPoint are always cyclic, because their boundary is empty. In all these cases the following semantics hold: isCycle() = boundary().isEmpty().
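For a curve, the rule isCycle() = boundary().isEmpty() can be sketched as follows, assuming a polyline given by its positions. The class name is hypothetical. The boundary is {start point, end point} unless the two coincide, in which case it is empty and the curve is a cycle, as for a ring. Coordinates are compared exactly here; an implementation working with a precision model might use a tolerance instead.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of isCycle() = boundary().isEmpty() for a polyline curve. */
final class PolylineCurve {
    private final double[][] points;

    PolylineCurve(double[]... points) {
        if (points.length < 2) throw new IllegalArgumentException("a curve needs at least two points");
        this.points = points;
    }

    /** Boundary of a curve: its start and end point, or the empty set if both coincide. */
    List<double[]> boundary() {
        double[] start = points[0], end = points[points.length - 1];
        return Arrays.equals(start, end) ? List.of() : List.of(start, end);   // exact comparison
    }

    boolean isCycle() { return boundary().isEmpty(); }
}
```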

Closure. The closure of a geometric object o1 is the union of the interior and the boundary of this object: o1.closure = o1.interior ∪ o1.boundary. The method Complex getClosure() converts primitives into complexes. Since a complex contains its boundary, a closure operation on complexes will return the complex itself. Furthermore, composites without boundaries (ring, shell) are identical with their closures.

Representative Point. The representative point of an object is a position from the set of direct positions of the geometry. The representative point is defined as follows:

• Point and CompositePoint: the point itself.

• Curve: the point located in the middle of the curve, i.e. the point along the curve with the constructive parameter 0.5.

• Surface: in general, the centroid of the surface can be used as the representative point if it lies within the object. If the geometry does not contain its centroid, another point which lies on the surface has to be found. One possibility to find such a point is a heuristic that tests points close to the boundary of the surface or points between the boundary and the centroid. Unfortunately, such approaches do not guarantee a bounded running time. The implementation in this work does not provide such an approach, but returns the start point of the exterior ring of the surface boundary (which is not part of the object, since surfaces do not contain their boundary).

• Aggregates: the first element of the set.

• Composites: the representative point of the first generator.
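The surface rule can be sketched for a polygon with a single exterior ring. Holes are ignored, and the class name is hypothetical. The sketch computes the area centroid with the shoelace-based formula and tests it with an even-odd ray-casting test. If the centroid falls outside, it returns the start point of the exterior ring, as the implementation does.

```java
/** Sketch: centroid if inside the exterior ring, otherwise the ring's start point (holes ignored). */
final class SurfaceRepresentativePoint {
    /** Area centroid of a simple polygon; the ring is closed (first point equals last point). */
    static double[] centroid(double[][] ring) {
        double twiceArea = 0, cx = 0, cy = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            double cross = ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
            twiceArea += cross;
            cx += (ring[i][0] + ring[i + 1][0]) * cross;
            cy += (ring[i][1] + ring[i + 1][1]) * cross;
        }
        return new double[] { cx / (3 * twiceArea), cy / (3 * twiceArea) };   // 6A = 3 * (2A)
    }

    /** Even-odd ray casting towards +x; points exactly on the boundary are not treated specially. */
    static boolean contains(double[][] ring, double[] p) {
        boolean inside = false;
        int n = ring.length - 1;                               // distinct vertices of the closed ring
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double xi = ring[i][0], yi = ring[i][1], xj = ring[j][0], yj = ring[j][1];
            if ((yi > p[1]) != (yj > p[1]) && p[0] < (xj - xi) * (p[1] - yi) / (yj - yi) + xi) {
                inside = !inside;                              // the ray crosses edge (j, i)
            }
        }
        return inside;
    }

    static double[] representativePoint(double[][] exteriorRing) {
        double[] c = centroid(exteriorRing);
        return contains(exteriorRing, c) ? c : exteriorRing[0].clone();
    }
}
```

For a U-shaped polygon the centroid lies in the notch, outside the surface, so the fallback is used.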

Set operators. In addition to these spatial and geometric characteristics, the FGAS defines operators to determine the spatial relationship (such as equals, contains, intersects) and set theoretic operations (such as intersection, union, difference, symmetric difference) between two geometric objects. Those analysis functions are thoroughly discussed in chapter 4.

3.3 Primitives

A primitive is an open set of points that does not contain its boundary points: curves do not contain their end points, surfaces do not contain their boundary curves, and solids do not contain their bounding surfaces. The primitive subtypes point, curve, surface and solid can be instantiated.

Orientation. Curves and surfaces are oriented geometries with positive or negative orientations. The orientation of a curve is positive in the direction of its parameterization. Oriented curves are positive if they have the same orientation as their corresponding curve and negative if they have the opposite orientation. In many data structures, instances of OrientedCurve appear as a pair associated with their corresponding primitive, i.e. an instance of OrientedCurve for the positive orientation and an instance of OrientedCurve for the negative orientation of the same Curve object. The FGAS presents an elegant approach by making Curve a subtype of OrientedCurve, where the positive OrientedCurve is the curve itself, which is semantically different from "the positive OrientedCurve has a corresponding curve". Each positive OrientedCurve has a corresponding negative OrientedCurve implemented as a proxy. Since the negative OrientedCurve is also a curve, for each curve there is also a proxy curve mirroring it in terms of its internal local coordinate system. The Java implementation uses a decorator pattern to create the proxy curve. It has a reference to the positive curve and mirrors the parameterization, setting e.g. curve.startPoint = proxy.endPoint.

The oriented curves composing a ring bound a surface to the "left", according to the right-hand rule (Figure 3.8 (a)). Surfaces have positive orientation in the direction of their up-normal, which also follows the right-hand rule. Shells are composed of oriented faces in such a way that the solid being bounded is below the surface, i.e. the up-normal of the surface points to the outside of the solid.

Curve orientation is often used to verify the closure of surfaces during construction processes. Figure 3.8 shows an example of a surface composed of an inner and an outer boundary (see (a)). The construction process of this example consists of the insertion of two curves, as shown in Figures (b) and (c). Figure (b) shows that the curve insertion does not split the surface, in contrast to Figure (c), where the curve insertion splits the original polygon into two polygons. The algorithm to verify polygon closure is built upon curve orientations (represented by the blue arrows) and consists of traversing the oriented curves starting at the inserted curve until it is visited again. In Figure (b), the curve is reached from the opposite side (no closure), and in Figure (c) the curve is visited from the same side, resulting in polygon closure.
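The decorator-based proxy described above can be sketched with a deliberately reduced curve interface. The interface and classes are hypothetical and much smaller than the GeoAPI types. The parameter t is assumed to run from 0 to 1. The negative curve stores only a reference to the positive curve and evaluates it at 1 − t, so its start point is the positive curve's end point and vice versa.

```java
/** Hypothetical minimal curve interface (not the GeoAPI one): positions for t in [0, 1]. */
interface SimpleCurve {
    double[] forConstructiveParam(double t);
    default double[] startPoint() { return forConstructiveParam(0.0); }
    default double[] endPoint() { return forConstructiveParam(1.0); }
}

/** A straight curve from p to q; plays the role of the positive orientation. */
record StraightCurve(double[] p, double[] q) implements SimpleCurve {
    public double[] forConstructiveParam(double t) {
        return new double[] { p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]) };
    }
}

/** Decorator: the negative orientation references the positive curve and mirrors its parameterization. */
record ReversedCurve(SimpleCurve positive) implements SimpleCurve {
    public double[] forConstructiveParam(double t) {
        return positive.forConstructiveParam(1.0 - t);   // no coordinates are copied
    }

    /** Reversing the proxy yields the original curve object itself. */
    SimpleCurve negate() { return positive; }
}
```

Because the proxy holds no geometry of its own, changes to the positive curve are visible through the negative one.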

Figure 3.8: Surface orientation (a), no surface created (b), surface splitting (c)

It should be noted that the example discussed above is not valid according to the Abstract Specification, since the boundary ring in Figure (b) is not simple (because the oriented curves of the inserted curve overlap). Hence, such construction processes are not supported by the current geometry implementation.

Curve Parametrization

A curve is defined as a list of CurveSegments, where the end point of each curve segment must equal the next segment’s start point. Figure 3.9 shows an example of curve segments parameterized from zero to their lengths, which were merged into curves. Both the Curves and the CompositeCurve composed of them are also parameterized from 0 to their length. The implementation of the Curve object verifies the continuity of its curve segments.
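The continuity check and the chaining of segment parameters can be sketched for straight segments. The class names are hypothetical. Consecutive segments must meet exactly; an implementation could compare within its precision model instead. Each segment's start parameter is the accumulated length of its predecessors.

```java
import java.util.List;

/** Sketch: a curve of line segments that rejects gaps and chains the segment parameterizations. */
final class SegmentedCurve {
    record Seg(double x1, double y1, double x2, double y2) {
        double length() { return Math.hypot(x2 - x1, y2 - y1); }
    }

    private final List<Seg> segments;

    SegmentedCurve(List<Seg> segments) {
        for (int i = 1; i < segments.size(); i++) {
            Seg prev = segments.get(i - 1), next = segments.get(i);
            // continuity: each segment must start exactly where the previous one ends
            if (prev.x2() != next.x1() || prev.y2() != next.y1()) {
                throw new IllegalArgumentException("segment " + i + " does not start where segment " + (i - 1) + " ends");
            }
        }
        this.segments = List.copyOf(segments);
    }

    /** Start parameter of each segment: the total length of all previous segments. */
    double[] segmentStartParams() {
        double[] starts = new double[segments.size()];
        double acc = 0;
        for (int i = 0; i < starts.length; i++) {
            starts[i] = acc;
            acc += segments.get(i).length();
        }
        return starts;
    }

    /** The curve is parameterized from 0 to this value. */
    double length() {
        double sum = 0;
        for (Seg s : segments) sum += s.length();
        return sum;
    }
}
```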

Figure 3.9: Curve segments building curves, curves composing a composite curve

Both Curves and CurveSegments share a set of operations defined in the interface GenericCurve, providing access to a curve’s start and end point, its tangent and its parameterization. The parameterized function $c(t) \to E^n$ describes the set of points of a curve, where t is a real value. The constructive parameterization is normalized to 1. Within a curve, the parameterization of a curve segment starts where the parameterization of its previous curve segment ends. The ISO 19107 defines the method forConstructiveParam(double t), which transforms a parameter t into 2D or 3D direct positions, depending on the dimension of the coordinate reference system in use. A 3D curve can also be derived from a 2D parametric space through a function $c_0(u, v) \to E^3$, i.e. each point of the curve can be determined from the (local) coordinates (u, v) of a surface. This is helpful to enforce that a curve fits the shape of a given surface s(u, v), e.g. to embed the curve in a digital elevation model. This kind of parameterization is not defined in the standard.

The segments of a curve are not necessarily line segments or line strings. The rail in Figure 3.2 could be represented by one curve formed by splines, Bézier curves, arc segments or their combination. Splines interpolate a sequence of control points and are defined piecewise by polynomials (see Polynomial expressions in Appendix B). Splines obey the rule that the value of the function and its first derivative must be equal at the end point of each polynomial (except the last one) and at the start point of the following polynomial. This characteristic makes the spline look continuous and smooth. Figure 3.10 illustrates a cubic spline (additionally, its second derivative has to be continuous) and Bézier curves, a special curve type which passes through its two end points and approximates the sequence of the remaining control points. B-splines (basis splines) are splines expressed as combinations of piecewise polynomial basis functions.
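A forConstructiveParam-like evaluation on a line string can be sketched as follows. This assumes, as in the description of the representative point, that the normalized parameter is proportional to arc length: t = 0 is the start, t = 1 the end, and t = 0.5 the middle of the curve. The class name is hypothetical. The result has as many ordinates as the input positions, so 2D and 3D work alike.

```java
/** Sketch: normalized, arc-length based parameterization of a line string (t in [0, 1]). */
final class LineStringParam {
    static double[] forConstructiveParam(double[][] pts, double t) {
        if (t < 0 || t > 1) throw new IllegalArgumentException("t must lie in [0, 1]");
        double total = 0;
        for (int i = 1; i < pts.length; i++) total += dist(pts[i - 1], pts[i]);
        double target = t * total;                        // arc length still to walk
        for (int i = 1; i < pts.length; i++) {
            double len = dist(pts[i - 1], pts[i]);
            if (target <= len || i == pts.length - 1) {   // the position lies on segment i
                double f = len == 0 ? 0 : Math.min(target / len, 1.0);
                double[] a = pts[i - 1], b = pts[i];
                double[] r = new double[a.length];        // 2D or 3D, as given
                for (int k = 0; k < a.length; k++) r[k] = a[k] + f * (b[k] - a[k]);
                return r;
            }
            target -= len;                                // the next segment starts where this one ends
        }
        return pts[0].clone();                            // single-position input
    }

    private static double dist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) s += (b[k] - a[k]) * (b[k] - a[k]);
        return Math.sqrt(s);
    }
}
```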

Figure 3.10: Spline and Bezier Curves

Surface Parameterization

Surfaces are 2D primitives composed of surface patches. In the simplest case, the surface has a single patch, which is a planar polygon. Closed surfaces are homeomorphic to a sphere and have no boundary; otherwise they have exactly one boundary composed of one or more rings. A ring is a closed curve, i.e. a curve without boundary. If a surface has a boundary, it is assumed to have one and only one ring representing the exterior boundary of the surface and 0 to n rings representing interior holes (see fig. 3.8 (c)). In the simplest case mentioned above, the surface has a boundary composed of one ring, which is the exterior one. The ISO 19107 does not permit interior rings to degenerate into open curves or points. One can distinguish between simple and non-simple surfaces. Simple surfaces (fig. 3.8 (a), (b) and (c)) have no self-intersecting boundaries and can be either convex or concave, whereas non-simple surfaces are always concave. Simple surfaces are also called vertex-complete [MK89]. A typical non-planar surface is a triangulated irregular network (TIN) used to represent a digital elevation model (DEM). Since surface boundaries are composed of rings (closed curves) and hence of curve segments, they can also be represented by curvilinear segments like splines.

Regions can be represented by Surfaces. A Surface can be constructed either from a SurfaceBoundary or from a list of SurfacePatches. A surface boundary consists of one exterior Ring, which delimits the region, and a list of interior Rings, which represent the region’s holes. According to the Jordan curve theorem [wwwf], the exterior ring of a surface boundary divides the plane into an interior and an exterior of the surface. If the surface is constructed from surface patches, an implementation should verify their continuity, i.e. that every patch shares a common boundary edge with at least one other patch. Figure 3.8 illustrates examples of polygons, which are a special type of surface.
Appendix E.2 lists all possible polygon variations and shows which of them cannot be represented by a Surface or by another data type representing regions.
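Combined with the orientation rule that a ring bounds its surface to the "left", the exterior/interior distinction can be checked with signed areas. The sketch assumes planar rings viewed from the up-normal, closed with the first point repeated, and a hypothetical class name. A counter-clockwise ring has positive shoelace area. The surface lies to the left of every ring if the exterior ring is counter-clockwise and each interior ring is clockwise.

```java
/** Sketch: ring orientation via the shoelace formula (rings closed: first point equals last). */
final class RingOrientation {
    /** Signed area: positive for counter-clockwise, negative for clockwise rings. */
    static double signedArea(double[][] ring) {
        double twice = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            twice += ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
        }
        return twice / 2;
    }

    /** Surface to the left of every ring: exterior counter-clockwise, interior rings clockwise. */
    static boolean isSurfaceLeftOfRings(double[][] exterior, double[][]... interiors) {
        if (signedArea(exterior) <= 0) return false;
        for (double[][] hole : interiors) {
            if (signedArea(hole) >= 0) return false;
        }
        return true;
    }
}
```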

Generic Surface Interface. Surfaces and SurfacePatches represent sections of surface geometries. They share a set of common operations, which are defined in the interface GenericSurface. These operations access characteristics such as the surface’s upNormal vector (i.e. a unit vector perpendicular to the surface), the value of its area or its perimeter (i.e. the sum of the lengths of its exterior ring and all interior rings which define its boundary).
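Three of these GenericSurface characteristics can be sketched for a planar polygon in 3D. Newell's method is swapped in here as one common technique; the report does not say how it computes the normal. Class and method names are hypothetical. The Newell vector follows the exterior ring's orientation (right-hand rule), and its length is twice the enclosed area. Holes are assumed to lie inside the exterior ring.

```java
/** Sketch: upNormal, area and perimeter of a planar polygon with closed 3D rings {x, y, z}. */
final class PlanarPolygonMetrics {
    /** Newell's method: un-normalized normal whose length equals twice the ring's area. */
    static double[] newellNormal(double[][] ring) {
        double nx = 0, ny = 0, nz = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            double[] a = ring[i], b = ring[i + 1];
            nx += (a[1] - b[1]) * (a[2] + b[2]);
            ny += (a[2] - b[2]) * (a[0] + b[0]);
            nz += (a[0] - b[0]) * (a[1] + b[1]);
        }
        return new double[] { nx, ny, nz };
    }

    static double norm(double[] v) { return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]); }

    /** Unit normal following the exterior ring's orientation. */
    static double[] upNormal(double[][] exterior) {
        double[] n = newellNormal(exterior);
        double len = norm(n);
        return new double[] { n[0] / len, n[1] / len, n[2] / len };
    }

    static double ringArea(double[][] ring) { return norm(newellNormal(ring)) / 2; }

    static double ringLength(double[][] ring) {
        double sum = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            double dx = ring[i + 1][0] - ring[i][0], dy = ring[i + 1][1] - ring[i][1], dz = ring[i + 1][2] - ring[i][2];
            sum += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return sum;
    }

    /** Area of the exterior ring minus the areas of the holes. */
    static double area(double[][] exterior, double[][]... holes) {
        double a = ringArea(exterior);
        for (double[][] h : holes) a -= ringArea(h);
        return a;
    }

    /** Perimeter: length of the exterior ring plus the lengths of all interior rings. */
    static double perimeter(double[][] exterior, double[][]... holes) {
        double p = ringLength(exterior);
        for (double[][] h : holes) p += ringLength(h);
        return p;
    }
}
```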

3.4 Complex

A complex is a collection of geometrically disjoint primitive elements and their boundaries. The subclasses of Complex are Boundary and Composite. A Boundary is part of a complex and is a (sub)complex on its own. For any primitive in a complex (apart from Point), there are primitives of lower dimension in the complex corresponding to its boundary. A boundary of a geometry of dimension d is a collection of disjoint primitive elements of dimension less than d: surfaces, curves and points are the elements of a solid boundary, a surface boundary is a complex with elements of type curve and point, and the elements of a curve boundary are the start and end points. Since boundaries are complexes, they are also composed of a collection of geometrically disjoint primitive elements and their boundaries. A complex is composed of subcomplexes, which are complexes themselves and subsets of the primitive elements of that complex. According to this definition, a boundary of a complex is also a subcomplex of that complex. The attribute elements of a complex contains the generator primitives of the complex and the boundaries of each of these primitives.

Composites. Composites are generated by primitives sharing common boundaries: a composite solid is generated by solids, a composite surface by surfaces, and so on. A composite surface is "a collection of oriented surfaces that join in pairs on common boundary curves and which, when considered as a whole, form a single surface" [OGC97]. As in the case of boundaries, the primitive elements in the composite surface are not only the oriented surfaces generating the composite, but also the curves composing the boundary rings of each individual (primitive) surface in the composite surface, as well as the points corresponding to the boundaries of each of those curves. Figure 3.11 shows a composite surface generated by two surface primitives (fig. 3.11 (a)). This composite surface has three subcomplexes. The first is generated by surface 1, by the two curves forming its boundary ring and by both points connecting the ring curves (fig. 3.11 (b)). The second subcomplex is created from surface 2 in the same way. The third subcomplex is the boundary (fig. 3.11 (c)). Figure 3.11 (d) shows the relationships between complexes and subcomplexes. The curve shared by both surfaces is included in complexes::elements, although it is neither a generating primitive nor a primitive contained in the boundary of the composite surface. This curve is a generator of a composite curve, which is a subcomplex of both rings in surface 1 and surface 2. Rings are composite curves and subcomplexes of the surface boundary. Like all boundaries, a surface boundary is also a complex. In the case of this example (a composite surface), it is generated by surface 1 and surface 2. These composite surfaces are subcomplexes of a composite surface. In this example, it is the maximal complex. The implementation uses the maximal complex to aggregate all primitives within this complex. This aggregation in the maximal complex is used by subcomplexes to reference primitives within it. Figure 3.11 also shows that the set of subcomplexes in a complex must not contain boundaries, as in the case of the rings illustrated in the figure, and also for shells (not shown in the figure).

Figure 3.11: Composite surface generated by two surfaces

Figure 3.12 shows an equivalent example for composite solids, where the shared face in Figure 3.12 (b) is a primitive which is neither part of the composite solid’s boundary (Figure 3.12 (c)) nor a generating primitive. The generating primitives of the composite solid in this example are both cubes in Figure 3.12 (a). Figure 3.12 (d) gives a more general illustration of overlapping shared faces.

Boundaries. According to the ISO 19107, complexes shall be used in application schemas where sharing of geometry is important, such as in computational topology. If two topological faces share an edge, the edge must be a persistent entity in a topological representation. In contrast to complexes, primitives create equal but not identical boundaries "on the fly" whenever necessary and are therefore not appropriate for topological representations, where persistent boundary objects are referenced.

Figure 3.12: Composite solid generated by two solids (cubes)

Figure 3.13: Examples of Complexes: CompositeCurves and CompositeSurfaces

Since the boundary of a point is the empty set, the simplest boundary element is the 0-dimensional complex representing a curve boundary, which is formed by the two end points of the curve. The FGAS classes PrimitiveBoundary and ComplexBoundary do not have semantic differences in the current implementation. Primitives and complexes differ from each other in the following way: the former create their boundary when necessary, whereas the latter contain it as a referenced object. Boundaries are retrieved from both primitives and complexes and used, for example, to perform the Egenhofer operators described in chapter 4. Topological models also require persistent boundaries in order to define neighbourhood relationships between elements sharing common boundaries. Figure 3.14 illustrates examples of a surface and a curve containing (a) and not containing their boundary ((b), drawn with dashed boundary lines).
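The difference between "equal" and "identical" boundaries can be made concrete. In this sketch all class names are hypothetical and unrelated to PrimitiveBoundary/ComplexBoundary. The primitive builds a new boundary object on every call. The complex builds it once and always returns the same reference, which is what a topological structure needs for sharing.

```java
import java.util.List;

/** Hypothetical sketch contrasting on-the-fly and persistent boundaries. */
record BoundaryPoints(List<double[]> points) {}

/** Primitive: creates an equal but new boundary object on every call. */
final class PrimitiveCurve {
    private final double[] start, end;

    PrimitiveCurve(double[] start, double[] end) { this.start = start; this.end = end; }

    BoundaryPoints boundary() { return new BoundaryPoints(List.of(start, end)); }
}

/** Complex: creates its boundary once and keeps it as a referenced, persistent object. */
final class ComplexCurve {
    private final BoundaryPoints boundary;

    ComplexCurve(double[] start, double[] end) { this.boundary = new BoundaryPoints(List.of(start, end)); }

    BoundaryPoints boundary() { return boundary; }
}
```

Two faces that refer to the same edge can only recognise each other as neighbours through such a shared, identical object. Equality of coordinates is not enough.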

Figure 3.14: Geometric objects do or do not contain their boundaries

3.5 Aggregates

Modelling real-world objects requires grouping different geometries into one entity, for instance a set of points representing trees, disconnected curves representing a street deviation, or disconnected surfaces representing forest fragments. The geometries gathered into a single aggregate geometry can intersect, overlap or even be equal to each other (see fig. 3.15). MultiPrimitive is the only subtype of Aggregate and restricts the aggregation to primitives of the same topological dimension by means of the subclasses MultiPoint, MultiCurve, MultiSurface and MultiSolid (see figure 3.15 (a), (b) and (c)). According to the ISO 19107, a geometry collection that holds primitives and complexes of different topological dimensions is also possible (see figure 3.15 (d)). The standard does not impose any restrictions concerning the intersection of element sets (see figure 3.15 (c) and (d)). MultiPrimitives do not contain their boundaries, since they are composed of primitives of the same dimension. The definition of MultiPrimitives in FGAS differs from the definition of GeometryCollection in the SFS. The SFS does not permit overlapping primitives in a geometric collection, whereas no such constraint is imposed in the FGAS.

Figure 3.15: Geometry Collections: MultiPoint (a), MultiLine (b), MultiPolygon (c) and Aggregate (d)

It is an open issue in the current implementation whether Aggregate should be an instantiable class in order to allow the aggregation of complexes.

3.6 Coordinates

Coordinates describe the internal structure of primitives, and therefore of aggregates and complexes. Coordinates themselves do not represent an entire geometric object, but are part of the geometry.

DirectPosition and Position. These classes represent the coordinates of a position in the Euclidean coordinate system. A DirectPosition holds the values of the ordinates x, y and, if necessary, also z. A position holds either a DirectPosition or a Point. These classes are used by all geometric data types to describe positions.

Envelope. The envelope of a geometric object is the smallest axis-parallel rectangle that contains the object. An Envelope is internally represented by two points: the lower left corner and the upper right corner of the rectangle. Envelopes are used in many operations for approximate calculations, such as intersection tests: if the envelopes of two objects are disjoint, the objects are also disjoint. The minimal bounding region of a geometric object is a primitive representation of its envelope.

Figure 3.16: Examples of envelopes
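The envelope construction and the disjointness filter described above can be sketched as follows. Env and its methods are hypothetical names, and only the 2D case is shown.

```java
/** Hypothetical 2D envelope stored as its lower-left and upper-right corners. */
final class Env {
    final double minX, minY, maxX, maxY;

    Env(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    /** Smallest axis-parallel rectangle containing all given (x, y) points. */
    static Env of(double[][] pts) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : pts) {
            minX = Math.min(minX, p[0]); minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]); maxY = Math.max(maxY, p[1]);
        }
        return new Env(minX, minY, maxX, maxY);
    }

    /** True if the rectangles share no point; the enclosed objects are then disjoint as well. */
    boolean disjoint(Env o) {
        return o.minX > maxX || o.maxX < minX || o.minY > maxY || o.maxY < minY;
    }
}
```

The filter only works in one direction: overlapping envelopes do not imply that the objects intersect, so an exact test is still needed in that case.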

CurveSegment, LineString and LineSegment. Curves are composed of a list of CurveSegments. Each CurveSegment represents a part of the curve and ends at the same coordinates at which the following curve segment starts. The FGAS defines several types of curve segments: ArcString, ArcStringByBulge, SplineCurve, Clothoid, GeodesicString, LineString and Conic. Most of them are curvilinear segments defined by mathematical functions (see figure 3.10 for examples). Curvilinear curve segments are a considerable advantage over the straight lines specified by the SFS, because they can represent real world objects more realistically. However, the implementation in this work is limited to LineStrings, which are sequences of straight LineSegments. LineSegment is a subclass of LineString and simply connects its two end points by a straight line. In practice, curvilinear real world objects can be approximated well by line strings if the segments are small enough. A curve segment has certain characteristics. The attribute numDerivatives specifies the type of continuity between neighbouring segments at its end points. The curve interpolation attribute reflects the type of interpolation realized by the curve segment. As this implementation always uses LineStrings, numDerivatives is always 0 and the curve interpolation is linear.
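A minimal sketch of a curve made of LineStrings, assuming each LineString is stored as an array of (x, y) control points. It checks the continuity condition stated above (each segment starts where its predecessor ends) and computes the curve length as the sum of the straight LineSegments. The class CurveSegments is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers for a curve whose segments are LineStrings given as (x, y) control points. */
final class CurveSegments {
    /** Each segment must start at the control point where its predecessor ends. */
    static boolean isContinuous(List<double[][]> segments) {
        for (int i = 1; i < segments.size(); i++) {
            double[][] prev = segments.get(i - 1);
            if (!Arrays.equals(prev[prev.length - 1], segments.get(i)[0])) return false;
        }
        return true;
    }

    /** Length: sum of the straight LineSegments between consecutive control points. */
    static double length(List<double[][]> segments) {
        double sum = 0;
        for (double[][] ls : segments)
            for (int j = 1; j < ls.length; j++)
                sum += Math.hypot(ls[j][0] - ls[j - 1][0], ls[j][1] - ls[j - 1][1]);
        return sum;
    }
}
```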

SurfacePatches, TINs and Triangles. The vector model defined in the FGAS also includes spatial decompositions. In a spatial decomposition, objects are composed of non-overlapping parts (of the same topological dimension), which can be regularly or irregularly distributed. ISO 19107 permits the representation of Triangulated Irregular Networks (TIN), a special case of spatial decomposition, and generalizes it to irregular networks composed of planar or curved surface patches other than triangles. There is no such counterpart in the specification of solids, e.g. a spatial decomposition into tetrahedrons or isoparametric volume patches. In the FGAS, the only way to model solids is boundary representation. Regularly distributed cell decompositions like rasters, or spatial decompositions using quadtrees or octrees, are not in the scope of the FGAS.

Surfaces are formed by SurfacePatches. Each surface patch shares at least one common boundary edge with another surface patch of the same surface. A special type of surface patch is the Polygon. Polygons form PolyhedralSurfaces, which are a subclass of Surface. The data type Triangle inherits from Polygon and is defined by its three corners. It is used to form TriangulatedSurfaces (a direct subclass of PolyhedralSurface) such as a Tin. A Tin is a surface composed of a set of Triangles, which are its surface patches. TINs are commonly used in terrain models, especially in 2.5D data models. A TIN is usually constructed from a set of control points, and there are several algorithms to generate triangles from those points. One of the best-known techniques is the Delaunay Triangulation.

4 Spatial analysis operators

Spatial analysis operations can be classified as selective or constructive ([Sch98]). The former do not create new data; instead, they select a subset of the input data as result. The latter compute new data based on the input data. Problems involving only the elementary arithmetic operations (+, −, · and ÷) are called rational. This chapter is divided into geometric predicates (Section 4.2) and geometric operations. The latter are divided into relational boolean operators (Section 4.4), constructive set operations (Section 4.3) such as set intersection and union, and constructive operations (Section 4.5) such as buffer calculation and convex hull.

4.1 Data Structure for the topological graph

Many operators, for instance set-theoretic operations, need to store complex information such as the relationships between geometric objects and between the parts of a single geometric object. Maps have a complex structure; for instance, they can be divided into smaller regions called subdivisions. Algorithms and their performance depend strongly on the data structure on which they operate. Hence, it is of vital importance to have a data structure that contains all necessary information about subdivisions and their relationships, and that allows fast access to this information in order to avoid costly searches.

The Doubly-Connected Edge List (DCEL) [PS85] describes a topological graph and contains a record for each vertex, edge and face. Figure 4.1 illustrates a planar graph in terms of a doubly-connected edge list. An edge is defined by two vertices. A closed set of edges surrounds a face. Each vertex stores its coordinates and the edges which end in the vertex. Each edge stores its two vertices, its right and left faces and usually its previous and following edges. Instead of undirected edges, the list can use directed edges (see illustration (b)). In this case each directed edge stores its start point, its twin (i.e. the directed edge pointing in the opposite direction), the following edge (the previous edge can be accessed through its twin) and the face on its left side (the face on its right side can be accessed through its twin). Examples of records within such an edge list can be found in [dBvKOS97].

With respect to the Feature Geometry Abstract Specification, the faces of a planar subdivision are the polygons themselves. Therefore, a surface comprises only two faces: one representing the Interior of the surface and another representing its Exterior (see illustration (c)). The Exterior is the region surrounding the polygon together with the regions described by the holes within the polygon.

Figure 4.1: The Doubly-Connected Edge List
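The directed-edge records described above might look as follows in Java. This is a hypothetical minimal sketch, not the data structure of this implementation or of the JTS: each half-edge stores its start point, its twin, the following edge and its left face, while its right face and end point are derived from the twin. The helper polygon builds the two faces of a simple polygon, assuming its vertices are given counter-clockwise so that the Interior lies to the left of the inner half-edges.

```java
/** Hypothetical minimal half-edge records of a doubly-connected edge list. */
final class Dcel {
    static final class Vertex {
        final double x, y;
        HalfEdge incident;                 // one directed edge starting at this vertex
        Vertex(double x, double y) { this.x = x; this.y = y; }
    }

    static final class Face {
        final String name;
        HalfEdge outer;                    // one directed edge on the face's boundary
        Face(String name) { this.name = name; }
    }

    static final class HalfEdge {
        Vertex origin;                     // start point
        HalfEdge twin;                     // same edge, opposite direction
        HalfEdge next;                     // following edge around the left face
        Face left;                         // face on the left side
        Face right() { return twin.left; }           // right face via the twin
        Vertex destination() { return twin.origin; } // end point via the twin
    }

    /** Builds Interior and Exterior of a simple polygon whose vertices are given counter-clockwise. */
    static HalfEdge[] polygon(double[][] ccw, Face interior, Face exterior) {
        int n = ccw.length;
        Vertex[] v = new Vertex[n];
        HalfEdge[] in = new HalfEdge[n], out = new HalfEdge[n];
        for (int i = 0; i < n; i++) v[i] = new Vertex(ccw[i][0], ccw[i][1]);
        for (int i = 0; i < n; i++) {
            in[i] = new HalfEdge();
            out[i] = new HalfEdge();
            in[i].origin = v[i];                   // v[i] -> v[i+1]
            out[i].origin = v[(i + 1) % n];        // v[i+1] -> v[i]
            in[i].twin = out[i];
            out[i].twin = in[i];
            in[i].left = interior;                 // counter-clockwise: Interior on the left
            out[i].left = exterior;
            v[i].incident = in[i];
        }
        for (int i = 0; i < n; i++) {
            in[i].next = in[(i + 1) % n];          // around the Interior
            out[i].next = out[(i - 1 + n) % n];    // around the Exterior (clockwise around the polygon)
        }
        interior.outer = in[0];
        exterior.outer = out[0];
        return in;
    }
}
```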

4.2 Geometric Predicates and Basic Algorithms

Geometric primitives are the basic operations of geometric algorithms; they test properties of basic geometric objects. A small set of such operations covers most of the computations needed in CG algorithms. This minimal set of geometric primitives is defined in [FH95] and [BH98]. These primitives are explained below, together with further ones used in this implementation, most of which were adapted from the JTS.

Lexicographic comparison of two points. Some algorithms, such as Graham's scan for computing the convex hull, need to sort coordinates into a sequence from left to right in the coordinate system. This is called the lexicographic order. Sorting a set of coordinates in lexicographic order is based on the comparison of two points p and q, where p precedes q if:

(p.x < q.x) OR ((p.x = q.x) AND (p.y < q.y))

The implementation of this comparison is robust by nature, since it does not create new data: it only compares input coordinates and performs no arithmetic that could introduce rounding errors.
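The lexicographic order maps directly onto a java.util.Comparator. The class Lexicographic is a hypothetical name; the comparator only compares input values, which is why no rounding can occur.

```java
import java.util.Arrays;
import java.util.Comparator;

final class Lexicographic {
    /** p precedes q iff p.x < q.x, or p.x == q.x and p.y < q.y. */
    static final Comparator<double[]> ORDER =
            Comparator.<double[]>comparingDouble(p -> p[0]).thenComparingDouble(p -> p[1]);

    /** Returns a lexicographically sorted copy of the given (x, y) points. */
    static double[][] sorted(double[][] pts) {
        double[][] copy = pts.clone();
        Arrays.sort(copy, ORDER);
        return copy;
    }
}
```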

Point-Line Orientation Test. A line segment s with bounding vertices $p_1$ and $p_2$ can be represented by the point set $\{p_1 + \lambda (p_2 - p_1) \mid \lambda \in [0, 1]\}$. Let $l(s)$ be the supporting straight line of s, defined by $\{p_1 + \lambda (p_2 - p_1) \mid \lambda \in \mathbb{R}\}$. The 2D orientation test determines whether a point q lies on the left or right side of $l(s)$ or on $l(s)$ itself. Equivalently, it tests whether the sequence of three points $(p_1, p_2, q)$ is oriented clockwise or counter-clockwise, or whether the points are collinear (i.e. lie on the same straight line).

Figure 4.2: Vector and point translation to the origin of the coordinate system for the orientation test of a point and a line segment

Translating the origin of the coordinate system to $p_1$, the new line starts at $dp_1 = \vec{0}$ and ends at $dp_2 = p_2 - p_1$; the point q is translated to $dq = q - p_1$. Figure 4.2 illustrates this translation.

The determinant D of $dq$ and $dp_2$ can now be used to determine whether $dq$ lies on the left or right side of the new line, because translation does not change the relationship between the line and the point (i.e. whether q lies on the left or the right side of line s). Let α be the angle, measured counter-clockwise, from the vector $\overrightarrow{dp_1 dp_2}$ to the vector $\overrightarrow{dp_1 dq}$ (that is, from $dp_2$ to $dq$, because $dp_1$ lies on the origin). With

$$D = \det\begin{pmatrix} dq_x & dq_y \\ dp_{2,x} & dp_{2,y} \end{pmatrix} = dq_x \, dp_{2,y} - dq_y \, dp_{2,x} = -\lvert dp_2 \rvert \, \lvert dq \rvert \sin\alpha,$$

the following rules hold for the sign of D:

• D < 0: 0 < α < π, the vectors are positively (counter-clockwise) oriented, q lies on the left side of s

• D > 0: π < α < 2π, the vectors are negatively (clockwise) oriented, q lies on the right side of s

• D = 0: q is collinear with s

In the example shown in fig. 4.2 the determinant is positive, since the point q lies on the right side of the line segment s. Given that the algorithm is based on the robust evaluation of signs of determinants, it is also implemented in a robust manner. The Point-Line Orientation Test has a constant running time of O(1), because the determinant evaluation works in constant time as well.
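The orientation test can be sketched in Java following the sign convention of D stated above (D < 0: left, D > 0: right). This sketch evaluates D in plain double arithmetic and is therefore not robust; the implementation described here relies on exact sign evaluation of determinants instead. The class and constant names are hypothetical.

```java
/** Point-line orientation using D = dq.x * dp2.y - dq.y * dp2.x (plain doubles, not robust). */
final class Orientation {
    static final int LEFT = -1, COLLINEAR = 0, RIGHT = 1;

    /** Side of q relative to the directed line p1 -> p2. */
    static int side(double[] p1, double[] p2, double[] q) {
        double dp2x = p2[0] - p1[0], dp2y = p2[1] - p1[1];   // translate p1 to the origin
        double dqx = q[0] - p1[0], dqy = q[1] - p1[1];
        double d = dqx * dp2y - dqy * dp2x;
        return d < 0 ? LEFT : d > 0 ? RIGHT : COLLINEAR;
    }
}
```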

Ring Orientation Test. This algorithm tests whether a ring is oriented clockwise (cw) or counter-clockwise (ccw) and is based on the Point-Line Orientation Test described above. The algorithm searches for the highest point hp of the ring, as well as its predecessor prev and its successor next. The whole ring has the same orientation as the curve defined by the sequence prev, hp and next. Example (a) in figure 4.3 illustrates a ccw oriented ring. If all three points are highest points (i.e. they are collinear), the ring is cw oriented if prev.x < next.x and ccw oriented if prev.x > next.x, as illustrated in example (b). Since finding the highest point in a ring with n points requires n iterations, the test is performed in linear time O(n).

Figure 4.3: Ring orientation test: if the highest point, its predecessor and its successor are ccw oriented, the whole ring is ccw oriented (a); in case of collinearity the x value order of its predecessor and successor decides (b)
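A sketch of the highest-point ring orientation test, assuming the ring is given as an array of (x, y) points whose last point repeats the first, with no repeated consecutive points. It uses the common cross-product sign, where a positive value means counter-clockwise. RingOrientation is a hypothetical name, and the arithmetic is plain double, so it is not robust.

```java
/** Ring orientation via the highest point; the last point of the ring repeats the first. */
final class RingOrientation {
    static boolean isCCW(double[][] ring) {
        int n = ring.length - 1;                    // number of distinct points
        int hi = 0;
        for (int i = 1; i < n; i++)
            if (ring[i][1] > ring[hi][1]) hi = i;   // first point with maximal y
        double[] prev = ring[(hi - 1 + n) % n], h = ring[hi], next = ring[(hi + 1) % n];
        // > 0: prev, h, next turn counter-clockwise (next lies left of prev -> h)
        double cross = (h[0] - prev[0]) * (next[1] - prev[1]) - (h[1] - prev[1]) * (next[0] - prev[0]);
        if (cross == 0) return prev[0] > next[0];   // collinear highest points, example (b)
        return cross > 0;
    }
}
```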

Evaluation of signs of determinants. Many geometric algorithms, for example the Point-Line Orientation Test, are based on the evaluation of determinants. This is one of the few parts of such algorithms that involve numerical precision and robustness. As previously illustrated, floating-point arithmetic can lead to errors that may affect the sign of a determinant. Because this basic operation is so important, exact determinant evaluation has received considerable attention from researchers. [BY] reviews the two principal algorithms, by Clarkson and by Avnaim et al. Clarkson's algorithm [Cla92] evaluates d × d determinants. The algorithm used in this implementation is the one presented by Avnaim et al. [ABD+97] for 2 × 2 and 3 × 3 determinants, which is sufficient for the Feature Geometry Abstract Specification. The implementation of this work was adapted from JTS and contains only the 2 × 2 evaluation. Its running time is sensitive to the number of bits of the input data: for a 2 × 2 matrix with b-bit integer entries, the algorithm requires at most b iterations, each performed in O(1). Despite its worse asymptotic worst-case complexity compared to Clarkson's algorithm, it is competitive in practice (see [ABD+95]), since b is a constant for a fixed number format and the evaluation therefore runs in O(1); moreover, it is much simpler than Clarkson's algorithm.
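What an exact sign evaluation guarantees can be illustrated with java.math.BigDecimal, which represents every double exactly and multiplies and subtracts without rounding. This is a different, simpler and slower technique than the algorithm of Avnaim et al. used in this implementation. The method name signOfDet2x2 mirrors the JTS method name, but this code is a hypothetical sketch and not taken from the JTS.

```java
import java.math.BigDecimal;

/** Exact sign of the determinant |a b; c d| = a*d - b*c for double inputs. */
final class ExactDet {
    static int signOfDet2x2(double a, double b, double c, double d) {
        BigDecimal ad = new BigDecimal(a).multiply(new BigDecimal(d));  // exact product
        BigDecimal bc = new BigDecimal(b).multiply(new BigDecimal(c));
        return ad.subtract(bc).signum();                                // -1, 0 or 1
    }
}
```

For example, with a = d = 1 + 2^-30, b = 1 + 2^-29 and c = 1, evaluating a·d − b·c in double precision yields 0.0, because the term 2^-60 of a·d is rounded away, while the exact determinant is 2^-60 > 0 and signOfDet2x2 returns 1.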

Vertical relationship between two line segments. Let a and b be two line segments and let l be a vertical straight line that intersects a and b. The predicate determines the y-order of the intersection point $a_l$ between a and l and the intersection point $b_l$ between b and l, i.e. it evaluates $a_l.y < b_l.y$. The algorithm can be used in the same manner with the axes interchanged: the test is then against a horizontal line and determines the x-order of the intersection points between the two segments and the line l. This algorithm runs in constant time and is used by the Plane Sweep algorithm to sort the current set of segments that intersect the sweep line.
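The vertical-order predicate can be sketched as below, assuming segments stored as {x1, y1, x2, y2} that are not vertical and span the x-coordinate of the line l. The sketch computes the intersection y-values with a division, so it is not robust; VerticalOrder is a hypothetical name.

```java
/** Hypothetical y-order of two non-vertical segments {x1, y1, x2, y2} along the vertical line X = x. */
final class VerticalOrder {
    /** y-coordinate where the segment crosses X = x (the segment must span x). */
    static double yAt(double[] s, double x) {
        double t = (x - s[0]) / (s[2] - s[0]);
        return s[1] + t * (s[3] - s[1]);
    }

    /** Negative if a lies below b at X = x, positive if above, 0 if they meet there. */
    static int compareAt(double[] a, double[] b, double x) {
        return Double.compare(yAt(a, x), yAt(b, x));
    }
}
```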

Point-In-Polygon Test. This basic operation verifies whether a point p lies inside or outside a polygon. The operator is used in the overlay operation to determine whether isolated components¹ lie inside or outside the other polygon, in order to label them properly. The algorithm casts a horizontal straight line (parallel to the x-axis) from p to positive infinity and determines the number of its intersections with the polygon boundaries (rings, see figure 4.4). The point lies inside the area if the number of intersections is odd. The whole algorithm, as well as the special cases where the horizontal line is collinear with or tangent to a ring segment, are explained in [PS85] (see fig. 4.4, P5 and P6). The current implementation iterates over all segments defining the polygon's boundaries and tests each of them for intersection with the straight line². The implementation is robust due to the use of the Point-Line Orientation Test: an intersection is counted only if the point lies on the right side of a segment that points downwards or on the left side of a segment that points upwards. Since the Point-Line Orientation Test works in constant time, the Point-In-Polygon algorithm works in linear time O(n), where n is the number of segments that define the ring.

Figure 4.4: The Point-In-Polygon Test determines the number of intersections of the rings with the semi-infinite straight line starting from the given point: P1 and P4 lie outside the polygon and have an even number of intersections. P2 and P3 lie inside the polygon and have an odd number of intersections. P5 and P6 are special cases of collinear and tangent intersection lines.

¹ Isolated components are points or lines which do not contain any intersections with the other input geometry.
² (segment.start.y > p.y and segment.end.y <= p.y) or (segment.end.y > p.y and segment.start.y <= p.y)
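A sketch of the crossing-number test with the counting rule described above, assuming a polygon given as an array of closed rings (outer ring and holes, each with its first point repeated at the end). The half-open condition on y, as in footnote 2, prevents a vertex on the ray from being counted twice. The orientation value uses the common cross-product sign (positive: p lies left of the directed segment), and points exactly on the boundary are not treated specially. PointInPolygon is a hypothetical name, and the arithmetic is plain double.

```java
/** Crossing-number point-in-polygon test over all rings of a polygon (outer ring and holes). */
final class PointInPolygon {
    static boolean contains(double[][][] rings, double px, double py) {
        int crossings = 0;
        for (double[][] ring : rings) {
            for (int i = 0; i + 1 < ring.length; i++) {
                double[] s = ring[i], e = ring[i + 1];
                // > 0: p lies left of the directed segment s -> e, < 0: right, 0: collinear
                double orient = (e[0] - s[0]) * (py - s[1]) - (e[1] - s[1]) * (px - s[0]);
                boolean upward = e[1] > py && s[1] <= py;
                boolean downward = s[1] > py && e[1] <= py;
                // count only crossings of the ray that starts at p and goes to +x
                if ((upward && orient > 0) || (downward && orient < 0)) crossings++;
            }
        }
        return crossings % 2 == 1;   // odd number of crossings: inside
    }
}
```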

Line Intersection Test. The Line Intersection Test verifies whether two line segments intersect. It uses the Point-Line Orientation Test. Let s1 and s2 be two line segments. s1 and s2 intersect if s1.start and s1.end lie on opposite sides of the supporting line of s2 (i.e. s1.start and s1.end do not lie on the same side of s2) and s2.start and s2.end lie on opposite sides of the supporting line of s1 (i.e. s2.start and s2.end do not lie on the same side of s1). If these conditions are not satisfied, s1 and s2 do not intersect.
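A sketch of the Line Intersection Test using orientation signs, assuming non-degenerate segments (distinct end points). A zero orientation counts as "not on the same side", so touching segments are reported as intersecting; when all four points are collinear, the sign test is inconclusive and the sketch falls back to an overlap check of the coordinate ranges. SegmentTest is a hypothetical name, and the arithmetic is plain double.

```java
/** Line Intersection Test for the closed segments a-b and c-d via orientation signs. */
final class SegmentTest {
    /** Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear. */
    static int orient(double[] p, double[] q, double[] r) {
        double d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]);
        return d > 0 ? 1 : d < 0 ? -1 : 0;
    }

    static boolean intersects(double[] a, double[] b, double[] c, double[] d) {
        int o1 = orient(c, d, a), o2 = orient(c, d, b);   // sides of a, b w.r.t. line c-d
        int o3 = orient(a, b, c), o4 = orient(a, b, d);   // sides of c, d w.r.t. line a-b
        if (o1 == 0 && o2 == 0 && o3 == 0 && o4 == 0) {
            // all four points collinear: intersect iff the coordinate ranges overlap
            return Math.max(a[0], b[0]) >= Math.min(c[0], d[0]) && Math.max(c[0], d[0]) >= Math.min(a[0], b[0])
                && Math.max(a[1], b[1]) >= Math.min(c[1], d[1]) && Math.max(c[1], d[1]) >= Math.min(a[1], b[1]);
        }
        // end points not strictly on the same side, for both segments
        return o1 * o2 <= 0 && o3 * o4 <= 0;
    }
}
```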

Line Segment Intersection. This algorithm computes the actual intersection point between two line segments. Broad overviews of different techniques for computing line intersections are given in [BGHV99], [vO94] and [BH98]. Some approaches address the robustness problem, for instance the one of [BH98], which is based on the idea of representing an intersection point between two line segments by the input data, i.e. by the original line segments. However, most techniques in use compute the intersection point directly in floating-point arithmetic.

The computation of an intersection point between two line segments in this implementation is based on homogeneous coordinates (see Exact Geometric Computation in section 2.2). It uses floating-point arithmetic, but is robust as it avoids division.
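In homogeneous coordinates, the line through two points is the cross product of the points (x, y, 1), and the intersection of two lines is the cross product of the lines. The following minimal sketch shows this formulation; it is an illustration, not the code of this implementation, and the class name is hypothetical. It returns the homogeneous triple (x, y, w) without any division; the Cartesian point is (x / w, y / w), and w = 0 indicates parallel lines. Since it intersects the supporting lines, whether the segments themselves intersect must be checked separately.

```java
/** Intersection of the lines through p1-p2 and q1-q2 in homogeneous coordinates (x, y, w). */
final class HomogeneousIntersection {
    /** Cross product of two 3-vectors. */
    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    /** Returns (x, y, w); no division is performed, w == 0 means the lines are parallel. */
    static double[] intersect(double[] p1, double[] p2, double[] q1, double[] q2) {
        double[] l1 = cross(new double[] {p1[0], p1[1], 1}, new double[] {p2[0], p2[1], 1});
        double[] l2 = cross(new double[] {q1[0], q1[1], 1}, new double[] {q2[0], q2[1], 1});
        return cross(l1, l2);
    }
}
```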

4.3 Set Operations

GIS can be better understood as a set of maps representing different topics of the real world [vO94]. Figure 4.5 illustrates three thematic maps with different information (countries, capitals and rivers) and their overlay. Map Overlay³ is the combination of two or more maps into a single one according to given criteria. The FGAS supports map overlay through the definition of the set operations intersection, union, difference and symmetric difference between the point sets of two or more transfinite sets (see Figure 4.6):

TransfiniteSet intersection(TransfiniteSet pointSet)
TransfiniteSet union(TransfiniteSet pointSet)
TransfiniteSet difference(TransfiniteSet pointSet)
TransfiniteSet symmetricDifference(TransfiniteSet pointSet)

Algorithms for set operations have a common structure, based on the calculation of the intersection points between the input geometries of both maps. A spatial overlay join of two maps m1 and m2 returns the set of pairs of objects (o1, o2) where o1 ∈ m1, o2 ∈ m2 and o1 intersects o2 [BGHV99]. In contrast, a map overlay operation results in a completely noded graph (see Noding in Section 4.3.1) which contains the geometries of both input graphs. Hence, the result consists of:

• all objects from m1 that do not intersect with objects from m2

• all objects from m2 that do not intersect with objects from m1

• all objects produced by two intersecting objects from m1 and m2

Since polygon overlay is more complex than line and point overlay, the operation is often known as polygon overlay. A polygon overlay can be determined in several ways and has been the subject of numerous publications ([PS85], [dBvKOS97], [BGHV99], [KBS91], [vO94], [MK89], [FH95], [Sch95], [BH98] and [Chu01]). Unlike the boolean operators, the Abstract Specification does not impose restrictions on the implementation of set operators.

3 A formal definition of the map overlay operation was given in 1991 by Kriegel and can be found in [KBS91]

Figure 4.5: Example of a Map Overlay: Three thematic maps with country, city and river information are overlaid into one map [imga]

4.3.1 General Discussions

Noding. The Map Overlay operation results in a completely noded graph of the two input geometries, that is, a graph that contains both geometric objects and in which line segments intersect only at their end points. The process of eliminating the remaining intersections is called noding. Noding can be realized in several ways, as figure 4.7 illustrates. The most common and unambiguous solution is to split all line segments at the intersection points. This solution is used in this implementation (fig. 4.7 (3)).
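Splitting a segment at its intersection points, as in the complete noding of fig. 4.7 (3), can be sketched as follows, assuming the intersection points have already been computed and lie on the segment. The points are ordered by their projection onto the segment direction, and duplicates are skipped. Noding.split is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical noding helper: splits segment a-b at given points lying on it. */
final class Noding {
    static List<double[][]> split(double[] a, double[] b, List<double[]> nodes) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        List<double[]> pts = new ArrayList<>(nodes);
        // order the nodes by their position along a -> b (projection onto the direction vector)
        pts.sort(Comparator.<double[]>comparingDouble(p -> (p[0] - a[0]) * dx + (p[1] - a[1]) * dy));
        List<double[][]> parts = new ArrayList<>();
        double[] start = a;
        for (double[] p : pts) {
            if (p[0] == start[0] && p[1] == start[1]) continue;   // skip duplicates and the start point
            parts.add(new double[][] {start, p});
            start = p;
        }
        if (start[0] != b[0] || start[1] != b[1]) parts.add(new double[][] {start, b});
        return parts;
    }
}
```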

Closed sets. In set theory, some set operations can produce non-closed sets when they are applied to overlapping objects, as illustrated in fig. 4.8, where curve L overlaps polygon P and hence the subtraction of L from P results in a non-closed set (fig. 4.8 b). Since the current implementation always creates closed sets (fig. 4.8 c), such operations have no effect.

Figure 4.6: Set operations between two surfaces

Geometry Merging. The result of a set operation comprises parts of the noded graph according to the operation applied. There are several possibilities to handle potentially unnecessary intersection points. For instance, a line segment between two points may be noded into several line segments by the overlay operation. A resulting geometric object containing this line (i.e. a curve, or part of a surface boundary) may store it as a single line segment (as given by the input geometry) or as several split line segments. This issue arises in two situations, which are explained using the example in figure 4.9:

1. In the control points of a LineString: set operations calculate the noded graph of the input geometries. The intersection points in the noded graph can be part of the resulting line, but they are not always necessary to represent the result line in a topologically correct way. The line in (b) contains the control points p6 and p7, which are produced by noding. A topologically equivalent line like (a) can also be represented without these points.

2. In the sequence of LineStrings which defines a Curve, or in the sequence of LineSegments which defines a LineString: the curves in (c) intersect in p8. The result type of the union of the two curves is ambiguous; it can be a Curve consisting of one big LineString or of two or more LineStrings, or it can be a MultiCurve consisting of two Curves.

Figure 4.7: Different types of noding: The union of two curves (1) can be noded partially in topologically equivalent representations (2) or completely (3)

Figure 4.8: Theoretically, set operations like difference can produce non-closed sets

There may be several possibilities to merge segments within a (noded) geometric object. Figure 4.7 ((2) and (3)) exemplifies this case. Merging may be ambiguous and require quadratic running time, so it is not recommended in practice. In the unambiguous case demonstrated by figure 4.9, merging geometries like line segments and curves requires linear time. However, as most applications do not need merged geometries, this implementation does not merge geometries automatically after an overlay operation, but provides a merge operation to merge geometries manually if desired and possible. This is done with the merge operator of a Curve or LineString. For example, curve1.merge(curve2) joins curve2 to curve1 if the two curves together form a continuous curve.

Figure 4.9: Merging ambiguity: The lines in (a) and (b) are topologically equivalent, but are defined by different sequences of control points. The union of the curves in (c) is shown in (d). The object type of its final representation is ambiguous.
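The linear-time merge of the unambiguous case can be sketched on control-point lists. CurveMerge.merge is a hypothetical stand-in for curve1.merge(curve2) and assumes that "continuous" means curve1 ends exactly where curve2 starts. The second helper, also hypothetical, removes control points that lie on a straight continuation of the line, such as points introduced by noding (compare p6 and p7 in figure 4.9 (b)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical merge helpers on (x, y) control-point lists. */
final class CurveMerge {
    /** Appends curve2 to curve1 if curve1 ends where curve2 starts; returns null otherwise. */
    static List<double[]> merge(List<double[]> curve1, List<double[]> curve2) {
        if (!Arrays.equals(curve1.get(curve1.size() - 1), curve2.get(0))) return null;
        List<double[]> merged = new ArrayList<>(curve1);
        merged.addAll(curve2.subList(1, curve2.size()));   // shared point only once
        return merged;
    }

    /** Drops interior control points that continue the line straight on (collinear, same direction). */
    static List<double[]> removeCollinear(List<double[]> pts) {
        List<double[]> out = new ArrayList<>();
        out.add(pts.get(0));
        for (int i = 1; i < pts.size() - 1; i++) {
            double[] a = out.get(out.size() - 1), b = pts.get(i), c = pts.get(i + 1);
            double cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
            double dot = (b[0] - a[0]) * (c[0] - b[0]) + (b[1] - a[1]) * (c[1] - b[1]);
            boolean straightOn = cross == 0 && dot > 0;     // b lies between a and c
            if (!straightOn) out.add(b);                    // keep real bends
        }
        out.add(pts.get(pts.size() - 1));
        return out;
    }
}
```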

Result types. Equivalent or similar geometries resulting from set operations can be represented by different data types, for instance by a Surface, a MultiSurface or a CompositeSurface. The most appropriate representation of result types is still an open issue in this report. Figure 4.10 illustrates the operation A.difference(B) on two CompositeSurfaces A and B, which splits the surface A into two surfaces; these should contain their boundaries since they are composites. According to the implementation of the FGAS presented here, one object consisting of disjoint geometries (in this case disjoint CompositeSurfaces) can only be modelled by MultiSurfaces, since Complexes are abstract (non-instantiable) classes and the boundaries of the generating surfaces in CompositeSurfaces may not be disjoint. This work is an effort to provide a consistent implementation of set operations applied to Primitives, Complexes and Aggregates under the assumption that complexes are not instantiable. Set operations currently return Primitives (i.e. Point, Curve or Surface) for connected geometric objects and Aggregates for split geometries (i.e. MultiPoint, MultiCurve, MultiSurface or MultiPrimitive). A future approach to generating complexes for split geometries would be to apply the method MultiPrimitive.closure(), which is currently not implemented.

Figure 4.10: The difference between two CompositeSurfaces can result in objects which are not representable in conformance with the Abstract Specification

4.3.2 Map Overlay

Research in the last decade has yielded different algorithms for polygon overlay (see Set Operations, Section 4.3). Most of these algorithms are based on three basic steps:

1. Find all intersections between the two input geometries by building a topological graph of the geometries and searching intersection points between the components, resulting in a completely noded graph

2. Label the components with their position in relation to the other geometry

3. Select the components according to the desired operation (intersection, union, difference, symmetric difference) to construct the output geometry

The above mentioned steps, especially the first two, can be carried out in various ways. The possibilities for computing the intersections of line segments, which are needed to construct a noded graph, are discussed in the paragraph Intersections between segment sets below (section 4.3.2). The data structure used to store this graph was discussed above (section 4.1). The underlying data structure considerably affects the performance of the third step, i.e. the construction of the resulting geometric object. The second step determines which line segments will become part of the result geometry. The techniques for labelling edges and vertices (see the literature mentioned above and [Sch95]) are based on the same principle: the positions of the edges and vertices of the objects are marked in relation to one another and in relation to the geometric object itself. The technique applied here is explained in detail in the paragraph Labelling below (see 4.3.2).

Intersections between segment sets. A map overlay needs to compute all intersections between two sets of line segments in order to create a noded graph. A naive brute-force test between all pairs of line segments runs in O(n²) time, which is not acceptable for large datasets. [FPM] presents a review of different methods for computing intersections between segment sets, such as the segment tree based approach of Palazzi and Snoeyink [PS93] and various sweep line based algorithms.

Plane sweep algorithms have been specifically designed to search for intersections in sets of objects, but are also used to solve problems such as polygon triangulation. Plane sweep algorithms are commonly used in map overlay operations, since they perform well on GIS data and offer a competitive running time. The first algorithms based on this paradigm were published in [SH76], [LP76] and [BO79]. Plane sweep is an output-sensitive algorithm, i.e. its running time depends not only on the size of the input data, but also on the size of the output data. Given that the size of the output is the number of intersections between the line segments, such an algorithm can also be called intersection-sensitive [dBvKOS97].

The basic idea of the algorithm is to move a horizontal line, the sweep line, from the top to the bottom of the plane (which holds the two input graphs) and to monitor all segments which intersect the sweep line. Testing all pairs of segments that intersect the sweep line at a given position would still not be intersection-sensitive: in the worst case (when the line is at a position where it intersects all segments), all segments are tested against each other. The key observation is that two intersecting segments are adjacent in at least one position of the down-moving sweep line. Hence, segments need to be tested only against their neighbours. Figure 4.13 shows the use of the algorithm within the map overlay operation. Note that the algorithm maintains the invariant that all intersections above the sweep line have already been found.

At each position of the down-moving sweep line, the line holds a status, which is the sequence of the segments that intersect the line, ordered from left to right. For example, the status in figure 4.11 holds all four segments sj, sk, sl and sm, since they all intersect the sweep line l. The status only changes where a new segment begins, where a segment ends or where two segments intersect. These positions (where the status of the sweep line changes) are called event points and are stored in an ordered sequence. Therefore, it suffices to move the sweep line from one event point to the next and to update its status accordingly:

• A new segment begins (the event point is the upper end point of the segment): we insert the segment into the status and test it for intersections with its two neighbours within the segment sequence. If they intersect, the intersection point is inserted as a new event point.

• A segment ends (event point is lower end point of the segment): We delete the segment from the status. The former left neighbour and the former right neighbour of the deleted segment are then adjacent and it is necessary to test if they intersect. If the segments do intersect, the new intersection point will be inserted as an event point.

• At an intersection point: the order within the status changes. At this event point the two intersecting segments must be tested against their new neighbours. Figure 4.11 illustrates the sweep line at an intersection point between sk and sl. Above the intersection point, sk has already been tested against sj and sl against sm. Just below the intersection point, sl and sk invert their positions in the status sequence and must be tested against sj and sm, respectively.

Figure 4.11: The sweep line moves down to the next event point: As the event point is an intersection point, the involved segments sk and sl must be tested against their new neighbours

Note that event points are only visited once. Geographic data in particular contain many structures with connected segments (i.e. segments intersecting in their end points), such as rings, so the same event point would otherwise be inserted several times. Therefore, event points are only inserted if they are not already present in the event point sequence.
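The event point sequence with duplicate suppression can be sketched with java.util.TreeSet, whose add method returns false when an element comparing equal is already present. The ordering assumed here (decreasing y, then increasing x) matches a sweep line moving from top to bottom. EventQueue is a hypothetical name; the status structure and the segment bookkeeping of a full plane sweep are omitted.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical event queue for a top-to-bottom sweep: decreasing y, then increasing x. */
final class EventQueue {
    private final TreeSet<double[]> events = new TreeSet<>(
            Comparator.<double[]>comparingDouble(p -> p[1]).reversed().thenComparingDouble(p -> p[0]));

    /** Inserts an event point unless it is already queued; returns whether it was new. */
    boolean insert(double x, double y) {
        return events.add(new double[] {x, y});
    }

    /** Removes and returns the next (highest, then leftmost) event point, or null if empty. */
    double[] next() {
        return events.pollFirst();
    }

    int size() {
        return events.size();
    }
}
```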

The running time of the plane sweep algorithm for a set S of n line segments in the plane is O(n log n + I log n), where I is the number of intersection points of segments in S. The algorithm starts by constructing an event queue; implemented as a balanced binary search tree, this takes O(n log n) time. Each deletion and insertion in this tree takes O(log n) time. If there are m event points, the algorithm needs O(m log n) time, because the intersection test and the computation of a segment intersection are performed in constant time. Since m ≤ 2n + I, this yields the bound stated above. See [dBvKOS97] for a detailed proof (page 28) and pseudo code of the algorithm (page 20 et seqq.). The implementation of this work uses the Plane Sweep algorithm implemented robustly within the JTS. Since a robust line segment intersection is expensive, the algorithm uses Monotone Chain indexing (see Monotone Chains in Appendix B) to avoid a large part of the unnecessary line segment intersection tests.

Labelling. The algorithm uses labels to mark the position of edges and nodes in relation to both geometries. Nodes hold one position attribute (on); edges hold three position attributes: one for the left side of the edge (left), one for the right side of the edge (right) and one for the edge itself (on). Each attribute takes a value from the set {Interior, Boundary, Exterior} to describe the position.

Figure 4.12: A noded graph with directed edges

Figure 4.12 shows a simple example. The labels for the nodes are:

Nodes         Location on (A)   Location on (B)
n1, n2, n4    boundary          exterior
n8, n9        exterior          boundary
n3, n5, n7    boundary          boundary
n6            interior          boundary

The following table gives some examples for the edge labels. For example, edge ~e3,5 is part of both polygon A and polygon B. Thus, it is equally labelled in relationship to A and B. Its labelled with the values exterior (for the left side of the edge), boundary (for the position the edge is located) and interior (for the right side of the edge). Its twin edge ~e5,3 has the same label but with the left and right sides interchanged.

Edges | A: left / on / right | B: left / on / right
~e1,2, ~e4,1, ~e6,4, ~e2,7 | exterior / boundary / interior | exterior / exterior / exterior
~e2,1, ~e1,4, ~e4,6, ~e7,2 | interior / boundary / exterior | exterior / exterior / exterior
~e7,8, ~e8,9, ~e9,3 | exterior / exterior / exterior | exterior / boundary / interior
~e8,7, ~e9,8, ~e3,9 | exterior / exterior / exterior | interior / boundary / exterior
~e3,5 | exterior / boundary / interior | exterior / boundary / interior
~e5,3 | interior / boundary / exterior | interior / boundary / exterior
... | ... | ...
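The labelling scheme can be pictured as a small data structure. The following is a hypothetical sketch, not the JTS `Label` class: it stores, for each input geometry (index 0 for A, 1 for B), the positions ON, LEFT and RIGHT of an edge, and derives the label of the twin edge by swapping left and right, as described for ~e3,5 and ~e5,3. A node label would only need the ON position.

```java
/** Hypothetical edge label: positions relative to geometry A (index 0) and geometry B (index 1). */
final class EdgeLabel {
    enum Location { INTERIOR, BOUNDARY, EXTERIOR }

    static final int ON = 0, LEFT = 1, RIGHT = 2;

    // loc[geometryIndex][ON | LEFT | RIGHT]; null means "not yet computed"
    private final Location[][] loc = new Location[2][3];

    void set(int geometryIndex, Location left, Location on, Location right) {
        loc[geometryIndex][LEFT] = left;
        loc[geometryIndex][ON] = on;
        loc[geometryIndex][RIGHT] = right;
    }

    Location get(int geometryIndex, int position) {
        return loc[geometryIndex][position];
    }

    /** Label of the oppositely directed twin edge: left and right are interchanged. */
    EdgeLabel flip() {
        EdgeLabel twin = new EdgeLabel();
        for (int g = 0; g < 2; g++) {
            twin.set(g, loc[g][RIGHT], loc[g][ON], loc[g][LEFT]);
        }
        return twin;
    }
}
```

Storing the twin's label as a flipped copy keeps both directed edges consistent without recomputing any positions.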

Algorithm in this Implementation. The implementation of this work does not strictly follow a single published algorithm; instead, it combines several approaches. It follows the three basic steps explained above and was modified to improve performance. The algorithm and all underlying predicates have been adapted from the JTS to the needs of the Feature Geometry: JTS Points correspond to Points, LineStrings to Curves and Polygons to Surfaces.

The main steps of the algorithm are:

1. Create two topological graphs A and B, one for each input geometry, and copy the edges and boundary nodes into the appropriate data structures of the two graphs. The labels are initialized with the positions relative to the graph's own geometry. The rings of polygons are assumed to be oriented cw (clockwise). If a polygon ring is oriented ccw (counterclockwise), its label is inverted, so that the algorithm works for both cw and ccw oriented rings. The part of the label which represents the relationship with the other geometry is left empty.

2. Create a new empty topological graph C.

3. Compute the self-intersections of each input geometry in A and B and store the intersection points as new nodes. (Note that figure 4.13 does not show this step, but only the complete intersection computation between both input objects. This step is performed to reduce the number of segments which have to be tested for intersection in the next step.)

Figure 4.13: Finding all intersections in the overlay operation using the plane sweep algorithm

4. Compute the intersections between the two input geometries in A and B and store the intersection points (as new nodes) (see figure 4.13 (b)).

5. Compute new edges by noding A and B at the intersection points, i.e. replace each intersected edge by the new split edges. Existing labels are carried over; if an edge occurs in both graphs, it is inserted only once and the two labels are merged.

6. Add all edges as directed edges (twice: positive oriented and negative oriented) and intersection nodes to the new graph C (see figure 4.14).

7. Label edges and nodes of C in relation to the input geometries in A and B.

Figure 4.14: Internal graph representation

8. Compute the labelling for isolated components of the graph. Add the isolated compo- nents to the resultant graph.

9. Build the result polygons based on the labelled noded graph C. Depending on the selected operation, select the edges located in the interior/boundary or in the exterior of the geometries of A and B (edges with INTERIOR in the right-hand label are the CW oriented edges of an originally exterior ring and the CCW oriented edges of an originally interior ring):

• Intersection: interior of A and interior of B
• Union: interior of A or interior of B
• Difference: interior of A and exterior of B
• Symmetric difference: (interior of A and exterior of B) or (exterior of A and interior of B)

10. Build the result lines based on the labelled noded graph C.

11. Build the result points based on the labelled noded graph C.

12. Build the result geometry based on the result polygons, lines and points.
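The selection rule of step 9 can be expressed as a predicate over the location of a face (for example, the region to the right of a directed edge) relative to A and B. The sketch below is hypothetical and mirrors the four bullet rules listed under step 9; treating BOUNDARY like INTERIOR when selecting areas is an assumption made for illustration.

```java
/** Hypothetical sketch of the overlay selection rule of step 9. */
final class OverlaySelection {
    enum Location { INTERIOR, BOUNDARY, EXTERIOR }
    enum Op { INTERSECTION, UNION, DIFFERENCE, SYM_DIFFERENCE }

    /** True if a face located at locA w.r.t. A and locB w.r.t. B belongs to the result of op. */
    static boolean isInResult(Location locA, Location locB, Op op) {
        // Assumption: a BOUNDARY location counts as "inside" for area selection.
        boolean inA = locA != Location.EXTERIOR;
        boolean inB = locB != Location.EXTERIOR;
        switch (op) {
            case INTERSECTION:   return inA && inB;
            case UNION:          return inA || inB;
            case DIFFERENCE:     return inA && !inB;
            case SYM_DIFFERENCE: return inA != inB;   // exactly one of the two
            default: throw new IllegalArgumentException("unknown operation " + op);
        }
    }
}
```

Keeping the rule in one predicate lets all four set operations share the same graph construction and differ only in this final test.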

Let n be the complexity of both geometries, that is, the sum of all segments and vertices, and I the complexity of the overlay, that is, the number of all segment intersections within and between the two geometric objects. Then the complete set operation by map overlay is performed in O(n · log(n) + I · log(n)) time. Copying the segments and points into the new graphs takes O(n) time. Computing the self-intersections and the intersections between both geometries is performed in O(n · log(n) + I · log(n)) time by the plane sweep algorithm. The computation of new edges requires time linear in I, and the creation of directed edges time linear in the number of noded edges, i.e. O(n + I). The labelling of edges and vertices in relation to the other geometry can be done in O(n · log(n) + I · log(n)) time (see [dBvKOS97], page 39). The final collection of segments and points in order to create new polygons, lines and points needs time linear in the size of C. Further documentation of the algorithm can be found in [viv03b]. Similar map overlay algorithms are explained in [dBvKOS97] and [MK89].

4.4 Relational Boolean Operators

The analysis of topological relationships between geometric objects is a common requirement in the processing of spatial queries such as "Retrieve all cities that are within 10 kilometres of Cologne" or "Find all highways in the states adjacent to Rio de Janeiro" [EH91]. Topological relationships can be processed easily if they are stored explicitly. In practice, however, due to the overwhelming volume of data and the maintenance effort required whenever spatial objects are modified, such a strategy is virtually infeasible. Hence, instead of storing all relationships among the objects, it is more efficient to compute them on demand. This requires a deep understanding of the evaluation of spatial relationships.

4.4.1 The Intersection Matrix

The 4-Intersection Matrix. The analysis of relationships between spatial objects has gained considerable research interest in the last decades. Max J. Egenhofer, Eliseo Clementini, Di Felice, Robert D. Franzosa et al. have focused on this topic and established an approach to distinguishing spatial relations ([CF95], [CFvO93], [CF96], [CFC95], [EF91], [EMH94], [EH91], [ECF94], [Ege93], [EF95]). The first research result of Egenhofer and Clementini was the 4-Intersection Model [ECF94], which characterizes the topological relationship between two regions with connected boundaries by four intersections. The 4-Intersection Model is based on the distinction between the interior of a region A (A◦) and the boundary of a region A (∂A) (see Figure 4.16). All possible combinations of their intersections are described in the 4-Intersection Matrix:

$$R(A, B) = \begin{pmatrix} \partial A \cap \partial B & \partial A \cap B^{\circ} \\ A^{\circ} \cap \partial B & A^{\circ} \cap B^{\circ} \end{pmatrix}$$

By considering the values empty (φ) and non-empty (¬φ) for each of the four intersections, one can distinguish between sixteen (2⁴) binary topological relations (see Figure 4.15). Eight of these can be realized for regions with connected boundaries in ℝ²; they are called disjoint, meet, equal, inside, contains, covers, coveredBy and overlap. These relations are mutually exclusive. The remaining eight combinations cannot be realized by two such regions and are therefore topologically invalid.

Figure 4.15: The 4-Intersection matrix: its different configurations describe eight topological relations between two regions (From [ECF94])
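The eight valid relations can be read off the four empty/non-empty entries. The following hypothetical sketch hard-codes the configurations as we understand them from [ECF94]; the argument order follows the 4-Intersection Matrix R(A, B) defined above, read row by row (∂A∩∂B, ∂A∩B◦, A◦∩∂B, A◦∩B◦).

```java
/** Hypothetical classifier for the 4-Intersection Model of two regions with connected boundaries. */
final class FourIntersection {

    /**
     * Each argument is true if the corresponding intersection is non-empty (¬φ).
     * Returns the relation name, or "invalid" for the eight configurations that
     * cannot occur between two such regions.
     */
    static String classify(boolean bb, boolean bi, boolean ib, boolean ii) {
        int code = (bb ? 8 : 0) | (bi ? 4 : 0) | (ib ? 2 : 0) | (ii ? 1 : 0);
        switch (code) {
            case 0b0000: return "disjoint";
            case 0b1000: return "meet";
            case 0b1001: return "equal";
            case 0b0101: return "inside";      // A inside B
            case 0b0011: return "contains";    // A contains B
            case 0b1101: return "coveredBy";   // A covered by B
            case 0b1011: return "covers";      // A covers B
            case 0b1111: return "overlap";
            default:     return "invalid";
        }
    }
}
```

Encoding the four entries as a 4-bit code makes the sixteen (2⁴) configurations explicit: eight map to named relations, the other eight fall through to "invalid".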

In [CFvO93], Clementini extended the 4-Intersection Model by differentiating the intersections not only as empty or non-empty, but also by their dimension. In that way the matrix was able to handle relationships between 1-dimensional objects such as lines, as well as between 0-dimensional objects such as points. This extension made it possible to distinguish between 53 relationships. Since such a fine-grained differentiation was not useful in practice, the eight more general relationships mentioned above were retained.

The 9-Intersection Matrix. The 4-Intersection matrix makes it possible to evaluate spatial relationships between points, lines and regions in two-dimensional space. However, modern GIS are capable of storing and analysing more complex structures such as regions with holes. The 4-Intersection matrix cannot distinguish such spatial objects adequately: different topological relations of regions with holes are mapped to the same configuration of the 4-Intersection matrix (see [ECF94] for examples).

Holes represent exterior regions within the original region. Thus, in order to make the model compatible with regions with holes, Egenhofer and Clementini extended the 4-Intersection Model by an additional distinction, the exterior of region A (A−). This new model is called the 9-Intersection Model ([ECF94], [EMH94], [EH91]):

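$$R(A, B) = \begin{pmatrix} A^{\circ} \cap B^{\circ} & A^{\circ} \cap \partial B & A^{\circ} \cap B^{-} \\ \partial A \cap B^{\circ} & \partial A \cap \partial B & \partial A \cap B^{-} \\ A^{-} \cap B^{\circ} & A^{-} \cap \partial B & A^{-} \cap B^{-} \end{pmatrix}$$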

Each different configuration of the nine intersections describes a different topological relation. All pairs of objects with the same configuration are considered topologically equivalent. The simplest and most general topological invariants, empty (φ) and non-empty (¬φ), are used as the contents of the 9-Intersection matrix. As an example, we will analyse the topological relation in which region A contains region B (see Figure 4.16).

Figure 4.16: Interior, Boundary and Exterior of two Polygons: Polygon A contains Polygon B

The 9-Intersection matrix for the contains relationship is:

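$$R(A, B) = \begin{pmatrix} A^{\circ} \cap B^{\circ} = \neg\phi & A^{\circ} \cap \partial B = \neg\phi & A^{\circ} \cap B^{-} = \neg\phi \\ \partial A \cap B^{\circ} = \phi & \partial A \cap \partial B = \phi & \partial A \cap B^{-} = \neg\phi \\ A^{-} \cap B^{\circ} = \phi & A^{-} \cap \partial B = \phi & A^{-} \cap B^{-} = \neg\phi \end{pmatrix} = \begin{pmatrix} \neg\phi & \neg\phi & \neg\phi \\ \phi & \phi & \neg\phi \\ \phi & \phi & \neg\phi \end{pmatrix}$$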

The rows refer to the interior, boundary and exterior of A (from top to bottom), and the columns to the interior, boundary and exterior of B (from left to right).

Figure 4.17: The 9-Intersection matrix: regions with holes can be separated exactly into the eight specified topological relations by the additional distinction of the polygon exterior (from [EH91]).

Today's GIS provide operators for evaluating the relationship between geometric objects, as this is a common task in the analysis of geospatial data. These operators, also known as binary predicates or Boolean operators, test whether a certain topological relationship between two spatial objects exists. The Feature Geometry Abstract Specification ([OGC97], chapter eight) specifies three general Boolean operators, equals, contains and intersects, and restricts the implementation to the use of the Egenhofer intersection matrix or the dimensionally extended intersection matrix of Clementini.

Symbol | Non Empty? | Meaning
T | TRUE | The intersection at this position is non-empty
F | FALSE | The intersection at this position is empty
N | NULL | Do not test the intersection at this position

Table 4.1: Values of the Egenhofer Intersection Pattern Matrix

Symbol | Non Empty? | Meaning
0 | 0 | The intersection at this position contains only points
1 | 1 | The intersection at this position contains only points and curves
2 | 2 | The intersection at this position contains only points, curves and surfaces
3 | 3 | The intersection at this position contains only points, curves, surfaces and solids
T | TRUE | The intersection at this position is non-empty
F | FALSE | The intersection at this position is empty
N | NULL | This operator does not test the intersection at this position

Table 4.2: Values of the Clementini Intersection Pattern Matrix

The Egenhofer intersection matrix (see Table 4.1) can hold the values TRUE (non-empty), FALSE (empty) and NULL (wildcard) to describe the intersection at a position in the matrix. This is sufficient to represent the relations in figure 4.17 unambiguously. The Clementini intersection matrix adds the dimension of the intersection to the possible values (see Table 4.2). As demonstrated by the Extended 4-Intersection Model, the additional distinction of the intersection dimension makes it possible to analyse the relationships between objects of the same geometric type (e.g. Curve/Curve) and between objects of different geometric types (e.g. Curve/Surface or Point/Curve). This matrix is also called the Dimensionally Extended Nine-Intersection Model (DE-9IM) matrix.

The example in figure 4.16 expressed in the DE-9IM is:

$$R(A, B) = \begin{pmatrix} 2 & 1 & 2 \\ \phi & \phi & 1 \\ \phi & \phi & 2 \end{pmatrix}$$

The Simple Feature Implementation Specification [SFS05a] offers a more detailed separation into equals, disjoint, intersects, touches, crosses, within, contains and overlaps. However, in contrast to the Abstract Specification, it requires the 9-Intersection Model to be used with the pattern matrix of Clementini.

Although the majority of Egenhofer's and Clementini's publications on the intersection matrix refer to two-dimensional space, the intersection matrix is generally valid in three-dimensional space as well. However, in three-dimensional space more relationships between the objects occur, and computing the matrix is more complex.

4.4.2 The Boolean Operators

The Relate Operator. The Abstract Specification specifies a relate operator boolean eRelate(Geometry, Geometry, intersectionPattern) for the Egenhofer intersection pattern matrix and boolean cRelate(Geometry, Geometry, intersectionPattern) for the Clementini intersection pattern matrix. This operator tests the intersections between the interiors, boundaries and exteriors of the two input geometries and returns TRUE if and only if the two objects are spatially related according to the values specified in the intersectionPattern parameter. For example, the test whether a Primitive A contains another Primitive B can be performed by A.contains(B) or eRelate(A, B, "NFNNTNNFT").
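Testing a computed matrix against an intersection pattern is a character-by-character comparison. The hypothetical sketch below assumes the matrix is stored as a nine-character string in row-major order (interior, boundary, exterior of the first geometry against those of the second), with F for an empty intersection and a digit for its dimension. The pattern may contain T, F, the digits 0 to 3, and both N and * as wildcards; requiring an exact dimension for a digit follows the SFS convention and is an assumption here.

```java
/** Hypothetical matcher for Egenhofer (T/F/N) and Clementini (dimension) intersection patterns. */
final class IntersectionPattern {

    /** Does a single matrix entry ('F' or '0'..'3') satisfy one pattern symbol? */
    static boolean matchesEntry(char entry, char symbol) {
        switch (symbol) {
            case '*': case 'N':
                return true;                     // wildcard: position is not tested
            case 'T':
                return entry != 'F';             // any non-empty intersection
            case 'F':
                return entry == 'F';             // empty intersection
            case '0': case '1': case '2': case '3':
                return entry == symbol;          // exact dimension required (assumption)
            default:
                throw new IllegalArgumentException("bad pattern symbol " + symbol);
        }
    }

    /** Tests a 9-character row-major matrix string against a 9-character pattern. */
    static boolean matches(String matrix, String pattern) {
        if (matrix.length() != 9 || pattern.length() != 9)
            throw new IllegalArgumentException("matrix and pattern need 9 entries");
        for (int i = 0; i < 9; i++) {
            if (!matchesEntry(matrix.charAt(i), pattern.charAt(i))) return false;
        }
        return true;
    }
}
```

For instance, the DE-9IM example of A containing B, written row by row as 212FF1FF2, matches the SFS contains pattern T*****FF* but not the disjoint pattern FF*FF****.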

Selection of the operators. The operators equals, intersects and contains are specified by the Abstract Specification in the class TransfiniteSet:

    boolean equals(TransfiniteSet pointSet)
    boolean intersects(TransfiniteSet pointSet)
    boolean contains(TransfiniteSet pointSet)
    boolean contains(DirectPosition point)

Since Geometry is the only subclass of TransfiniteSet, the methods are embedded into the Geometry implementation GeometryImpl and apply to GeometryImpl instances (and, for the second contains method, to DirectPosition instances).

The definitions of the topological relationships given by Egenhofer and Clementini (see Figure 4.17) differ from the ones specified by the SFS. The former distinguish between contains and covers, whereas the latter handles both cases in a single operator, contains. On the other hand, the SFS divides the relationship overlaps into two operations, overlaps and crosses. The operators disjoint, overlaps, touches, within (which corresponds to containedBy) and crosses are not specified by the FGAS, but are implemented to provide a complete set of topological relationships. The definitions of the relationships used are based on the ones specified by the SFS.

The remainder of this section explains the different relationships in detail and discusses their formal definitions. In the pattern strings, the "*" character is a placeholder equivalent to "N" (the intersection at this position is not tested).

Disjoint. Disjoint is the opposite of Intersects. It returns TRUE if the two geometries do not have any points in common. Figure 4.18 shows a set of pairwise disjoint geometric objects.

Figure 4.18: Disjoint geometric objects

Geometric objects are disjoint if neither their interiors nor their boundaries intersect and if the interior of neither object intersects the boundary of the other. Expressed in terms of the DE-9IM for two geometric objects a and b:

$$a.\mathrm{disjoint}(b) \Leftrightarrow (I(a) \cap I(b) = \phi) \wedge (I(a) \cap B(b) = \phi) \wedge (B(a) \cap I(b) = \phi) \wedge (B(a) \cap B(b) = \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"FF*FF****"})$$

Intersects. The intersects operator verifies whether the two objects are not disjoint, i.e. whether at least one of the intersections between their interiors and boundaries is non-empty. The implementation uses the disjoint operator to determine whether two geometric objects intersect: a.intersects(b) = ¬a.disjoint(b)

Touches. The touches operator verifies whether two geometric objects spatially touch each other, i.e. their interiors are disjoint, but the boundary of at least one object intersects the other object. Note that the boundary of a (non-closed) curve consists of its start and end points. Figure 4.19 illustrates examples of the touches relationship. Mathematically, this operator is defined as follows:

$$a.\mathrm{touches}(b) \Leftrightarrow (I(a) \cap I(b) = \phi) \wedge \big((B(a) \cap I(b) \neq \phi) \vee (I(a) \cap B(b) \neq \phi) \vee (B(a) \cap B(b) \neq \phi)\big) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"FT*******"}) \vee a.\mathrm{Relate}(b, \texttt{"F**T*****"}) \vee a.\mathrm{Relate}(b, \texttt{"F***T****"})$$

Figure 4.19: Examples of Touches relationships: Surface/Surface(a), Surface/Line (b), Surface/Point (c), Curve/Curve(d), Curve/Point(e)

Note that a point can only touch a line if it intersects the line's boundary. A point does not touch a line if it lies in the interior of the line, because then the two interiors intersect. Since a point does not have a boundary, a point can never touch another point.

Contains. Contains tests whether the object spatially contains the other object, i.e. whether the point set which defines the other object is a subset of the point set that defines this object. The implementation of this operator is based on the inversion of the within operator: a.contains(b) = b.within(a)

Figure 4.20: Examples of the Within / Contains relationship: Surface/Surface (a), Surface/Curve (b), Surface/Point (c), Curve/Curve (d) and Curve/Point (e)

Figure 4.20 illustrates some examples of this relationship. The boundary of the containing object may intersect the boundary of the contained object.

Within. This operator is the inverse of contains. It returns TRUE if the object is spatially within the other object. The boundaries of both geometric objects may intersect. Expressed in terms of the DE-9IM for two Geometry objects a and b:

$$a.\mathrm{within}(b) \Leftrightarrow (I(a) \cap I(b) \neq \phi) \wedge (I(a) \cap E(b) = \phi) \wedge (B(a) \cap E(b) = \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"T*F**F***"})$$

Figure 4.21: Examples of the overlaps relationship

Overlaps. The overlaps operator tests whether two objects with the same topological dimension, e.g. two multipoints, two curves or two surfaces, overlap. Intersections between objects with different dimensions are never identified as overlapping. If both objects are points or both objects are areas, the mathematical description in terms of the DE-9IM is:

$$a.\mathrm{overlaps}(b) \Leftrightarrow (I(a) \cap I(b) \neq \phi) \wedge (I(a) \cap E(b) \neq \phi) \wedge (E(a) \cap I(b) \neq \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"T*T***T**"})$$

If both a and b are lines:

$$a.\mathrm{overlaps}(b) \Leftrightarrow \dim(I(a) \cap I(b)) = 1 \wedge (I(a) \cap E(b) \neq \phi) \wedge (E(a) \cap I(b) \neq \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"1*T***T**"})$$

Figure 4.22: Examples of the crosses relationship

Crosses. The crosses relationship applies to point/line (i.e. curves or rings), point/region, line/line and line/region pairs. If a is a point and b is a line, if a is a point and b is an area, or if a is a line and b is an area, the mathematical rule in terms of the DE-9IM is:

$$a.\mathrm{crosses}(b) \Leftrightarrow (I(a) \cap I(b) \neq \phi) \wedge (I(a) \cap E(b) \neq \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"T*T******"})$$

If both a and b are lines:

$$a.\mathrm{crosses}(b) \Leftrightarrow \dim(I(a) \cap I(b)) = 0 \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"0********"})$$

Equals. The equals operator tests whether two geometric objects are topologically equal, i.e. whether they describe the same set of points and therefore also have the same boundary. Formally:

$$a.\mathrm{equals}(b) \Leftrightarrow (I(a) \cap I(b) \neq \phi) \wedge ((I(a) \cup B(a)) \cap E(b) = \phi) \wedge ((I(b) \cup B(b)) \cap E(a) = \phi) \Leftrightarrow a.\mathrm{Relate}(b, \texttt{"T*F**FFF*"})$$

Note that this operator does not determine whether two objects are instances of the same class. The test of equality is limited to the point sets of the objects.
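Taken together, the definitions above reduce each named predicate to one or more pattern tests on the DE-9IM of a and b. The sketch below is hypothetical: it assumes the matrix is given as a nine-character row-major string (F for empty, a digit for the dimension) and that the caller supplies the topological dimensions of a and b, which overlaps and crosses need to select the applicable case. contains is derived by transposing the matrix, mirroring a.contains(b) = b.within(a).

```java
/** Hypothetical DE-9IM predicates following the pattern definitions in the text. */
final class SpatialPredicates {

    /** '*' is a wildcard, 'T' means non-empty, 'F' or a digit must match exactly. */
    static boolean matches(String im, String pattern) {
        for (int i = 0; i < 9; i++) {
            char m = im.charAt(i), p = pattern.charAt(i);
            if (p == '*') continue;
            if (p == 'T' ? m == 'F' : m != p) return false;
        }
        return true;
    }

    /** Transposes a row-major 3x3 matrix string, i.e. swaps the roles of a and b. */
    static String transpose(String im) {
        StringBuilder sb = new StringBuilder(9);
        for (int col = 0; col < 3; col++)
            for (int row = 0; row < 3; row++)
                sb.append(im.charAt(3 * row + col));
        return sb.toString();
    }

    static boolean disjoint(String im)   { return matches(im, "FF*FF****"); }
    static boolean intersects(String im) { return !disjoint(im); }
    static boolean within(String im)     { return matches(im, "T*F**F***"); }
    static boolean contains(String im)   { return within(transpose(im)); }   // b.within(a)
    static boolean equalsTopo(String im) { return matches(im, "T*F**FFF*"); }

    static boolean touches(String im) {
        return matches(im, "FT*******") || matches(im, "F**T*****") || matches(im, "F***T****");
    }

    static boolean overlaps(String im, int dimA, int dimB) {
        if (dimA != dimB) return false;                  // different dimensions never overlap
        return dimA == 1 ? matches(im, "1*T***T**") : matches(im, "T*T***T**");
    }

    static boolean crosses(String im, int dimA, int dimB) {
        if (dimA == 1 && dimB == 1) return matches(im, "0********");
        if (dimA < dimB) return matches(im, "T*T******");  // point/line, point/area, line/area
        return false;                                      // remaining cases are not covered in the text
    }
}
```

With the matrix 212FF1FF2 of the contains example, contains returns true while within, disjoint and touches return false.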

4.4.3 Algorithm

In order to calculate the Dimensionally Extended 9-Intersection matrix, it is necessary to compute the intersections between the line segments of the two input geometries. Hence, a large part of the overlay operation code is reused in this implementation.

1. Create two topological graphs A and B, one for each input geometry (the same as step 1 in map overlay)

2. Compute the self intersections of each input geometry in A and B (the same as step 3 in map overlay)

3. Compute the intersections between the two input geometries in A and B (the same as step 4 in map overlay)

4. Label edges and nodes of the two geometries in A and B in relation to the other input geometry (uses the same code as step 7 in map overlay)

5. Compute the labelling for isolated components of the graph (uses the same code as step 8 in map overlay)

6. Compute the intersection matrix by evaluating the labels of all nodes and edges. Each component sets the matrix entry at the position given by its label attributes to its topological dimension, unless the entry already holds a higher value:

• A node with the attributes a = label[0][on] (a value from the set {Interior, Boundary, Exterior}) for its label in relation to A and b = label[1][on] for its label in relation to B sets the value of the intersection matrix at position [a][b] to at least 0, because the intersection at this position is a point.
• An edge with the attributes o0 = label[0][on] (position in relation to A) and o1 = label[1][on] (position in relation to B) sets the value of the intersection matrix at position [o0][o1] to at least 1, because the intersection at this position is a line. If the edge was created by a polygon boundary, i.e. it bounds an area, the left side (l0 = label[0][left] and l1 = label[1][left]) and the right side (r0 = label[0][right] and r1 = label[1][right]) of the edge are evaluated as well: the values at positions [l0][l1] and [r0][r1] are set to at least 2, because the intersections at these positions are areas.

The last step of the algorithm uses the following functions, which are described in pseudo code, for each node and edge [viv03b]:

    function Node.computeIM(im : IntersectionMatrix)
        if (label[0] != null and label[1] != null) then
            im.setAtLeast(label[0][On], label[1][On], 0)
        end if
    end function

    function Edge.computeIM(im : IntersectionMatrix)
        if (label[0] != null and label[1] != null) then
            im.setAtLeast(label[0][On], label[1][On], 1)
            im.setAtLeast(label[0][Left], label[1][Left], 2)
            im.setAtLeast(label[0][Right], label[1][Right], 2)
        end if
    end function

Listing 4.1: Functions for nodes and edges to set the intersection matrix
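A runnable Java counterpart of Listing 4.1 might look as follows. This is a hypothetical sketch, not the GeoTools or JTS code: locations are encoded as array indices (0 interior, 1 boundary, 2 exterior), a missing label is modelled as null, and the empty intersection is stored as -1 so that setAtLeast simply keeps the maximum. Following the description in step 6, the left and right sides are only evaluated for edges that bound an area.

```java
import java.util.Arrays;

/** Hypothetical DE-9IM accumulator mirroring Listing 4.1. */
final class IntersectionMatrixSketch {
    static final int INTERIOR = 0, BOUNDARY = 1, EXTERIOR = 2;
    static final int EMPTY = -1;                         // printed as 'F'

    private final int[][] dim = new int[3][3];

    IntersectionMatrixSketch() {
        for (int[] row : dim) Arrays.fill(row, EMPTY);
    }

    /** Raises entry [locA][locB] to the given dimension unless it already holds a higher value. */
    void setAtLeast(int locA, int locB, int dimension) {
        if (dim[locA][locB] < dimension) dim[locA][locB] = dimension;
    }

    /** Node: ON positions w.r.t. A and B, null if that part of the label is missing. */
    void computeNodeIM(Integer onA, Integer onB) {
        if (onA != null && onB != null) setAtLeast(onA, onB, 0);
    }

    /** Edge: labels are {on, left, right} w.r.t. A and B; sides count only for area edges. */
    void computeEdgeIM(int[] labelA, int[] labelB, boolean isArea) {
        if (labelA == null || labelB == null) return;
        setAtLeast(labelA[0], labelB[0], 1);             // the edge itself is a line
        if (isArea) {
            setAtLeast(labelA[1], labelB[1], 2);         // left side is an area
            setAtLeast(labelA[2], labelB[2], 2);         // right side is an area
        }
    }

    /** Row-major string, e.g. "212FF1FF2". */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(9);
        for (int[] row : dim)
            for (int d : row) sb.append(d == EMPTY ? 'F' : (char) ('0' + d));
        return sb.toString();
    }
}
```

As a check, feeding it one edge of a clockwise square A (on A's boundary with A's interior to the right, entirely in B's exterior) and one edge of a clockwise square B lying strictly inside A reproduces the matrix 212FF1FF2 of the DE-9IM example above.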

Once the intersection matrix is calculated, it only needs to be compared with the configuration of the desired Boolean operator (see section 4.4.2, The Boolean Operators). Since the relational Boolean operators are based on the map overlay algorithms, the asymptotic running time of each relational Boolean operator is O(n · log(n) + I · log(n)), where n is the complexity of both geometries, that is, the sum of all segments and vertices, and I the complexity of the overlay, that is, the number of all segment intersections within and between the two geometric objects. Once all segments are labelled completely, the intersection matrix is computed in time linear in the number of labelled nodes and edges, and the test against the intersection pattern takes constant time. However, before the map overlay is performed, the envelopes of the two objects are tested for intersection. Hence, if the envelopes are disjoint, the result is returned in constant time.
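The envelope pre-check is cheap because it compares only four pairs of coordinates. A minimal sketch with a hypothetical Envelope class, assuming the envelope is computed once per geometry (a linear-time step) so that each later test takes constant time:

```java
/** Hypothetical axis-aligned envelope used for the constant-time pre-check. */
final class Envelope {
    final double minX, minY, maxX, maxY;

    Envelope(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    /** Envelope of coordinates given as x0, y0, x1, y1, ... (linear time, computed once). */
    static Envelope of(double... xy) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i + 1 < xy.length; i += 2) {
            minX = Math.min(minX, xy[i]);
            maxX = Math.max(maxX, xy[i]);
            minY = Math.min(minY, xy[i + 1]);
            maxY = Math.max(maxY, xy[i + 1]);
        }
        return new Envelope(minX, minY, maxX, maxY);
    }

    /** Constant-time test on closed boxes, so touching envelopes count as intersecting. */
    boolean intersects(Envelope o) {
        return !(o.minX > maxX || o.maxX < minX || o.minY > maxY || o.maxY < minY);
    }
}
```

If this test fails, a predicate such as disjoint can return TRUE immediately without building any graph.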

4.5 Constructive Operations

4.5.1 Buffer

The buffer operation extends a geometric object by a given distance. This operation is not currently implemented, but is described for the sake of completeness. Buffers can be distinguished into positive and negative buffers. A positive buffer enlarges the point set, a negative buffer shrinks it. Figure 4.23 illustrates an example of a curve (c1) and its positive buffer (c2). A surface (s1) can be enlarged by a positive buffer (s2) or reduced by a negative buffer (s3).

Figure 4.23: Examples of buffers: a positive buffer can be performed on a curve (c2) and surface (s2), a negative buffer only on a surface (s3)

The Abstract Specification defines the syntax for a positive buffer: Geometry getBuffer(double distance)

4.5.2 Centroid

The centroid of a geometric object is its centre of mass. Figure 4.24 shows examples of centroids of different geometric objects. The centroid of a point is the point itself, the centroid of a straight line lies at its midpoint, the centroid of a point set is the average of its points, and so on. As the examples demonstrate, the centroid does not necessarily lie on the object. The Abstract Specification specifies the centroid operation for all Geometry objects by the method DirectPosition getCentroid(). The centroid calculation of Points and MultiPoints is trivial: for a Point it is the position of the Point itself, and for a MultiPoint it is the average of all contained points. The centroid of a Curve is computed as the average of the centroids of its straight CurveSegments, weighted by their lengths:

$$\mathrm{centroid}(C) = \frac{\sum_{i=1}^{N} \mathrm{centroid}(s_i) \cdot \mathrm{length}(s_i)}{\mathrm{length}(C)}$$

where $s_1, \dots, s_N$ are the line segments of the curve $C$. A Ring and a MultiCurve are handled like a curve by splitting their curve elements into LineSegments. The centroid of a CurveBoundary is the average of the start and end points of the boundary, and the centroid of a SurfaceBoundary is computed in the same manner as the centroid of a Ring, considering not only the LineSegments of the exterior ring but also the LineSegments of all interior rings.

Figure 4.24: Examples of centroids of geometric objects: a point set (a), a straight line (b), a curve (c), a simple polygon (d) and a simple polygon with a hole (e)

In contrast to the previously discussed geometric objects, the centroid computation of a Surface is not trivial. A simple approach is to divide the surface into triangles, compute the sum of the centroids of the triangles weighted by their areas, and then normalize this sum by the total surface area (see (b) in figure 4.25). This procedure is similar to the computation of the centroid of a curve. The centroid of a triangle is simply the average of its three vertices (see (a) in figure 4.25):

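$$\begin{pmatrix} \mathit{centroid}.x \\ \mathit{centroid}.y \end{pmatrix} = \frac{1}{3} \left( \begin{pmatrix} p_1.x \\ p_1.y \end{pmatrix} + \begin{pmatrix} p_2.x \\ p_2.y \end{pmatrix} + \begin{pmatrix} p_3.x \\ p_3.y \end{pmatrix} \right)$$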

The area of a triangle with vertices p1, p2 and p3 is:

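$$\mathit{Area}_{\mathit{Triangle}} = \frac{1}{2}\left[(p_2.x - p_1.x)(p_3.y - p_1.y) - (p_3.x - p_1.x)(p_2.y - p_1.y)\right]$$

This is a signed area: its sign indicates the orientation of the triangle (positive for counterclockwise and negative for clockwise vertex order, with the y-axis pointing up), and its absolute value is the area. The factor 1/2 cancels when the weighted sum of centroids is normalized by the total area.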

Due to the running-time cost of surface triangulation, other methods are preferred. A simpler method is based on the fact that the triangulation does not have to be a partition of the surface. By building a triangle from each pair of consecutive polygon vertices and a fixed point, it is possible to calculate the centroid by weighting each triangle's area positively or negatively. In the example in figure 4.26, the point p1 was selected as the base point for the triangulation.

Figure 4.25: Centroid of a triangle (a) and centroid of a surface computed by the centroids of the surface triangulation

Figure 4.26: Triangulation by choosing a fixed point instead of partitioning the surface

Each pair of consecutive vertices forms a triangle with the base point, i.e. (p1, p2, p3), (p1, p3, p4), (p1, p4, p5) and (p1, p5, p6). As the example shows, triangles can overlap (t1/t2 and t2/t3) and contain parts which do not belong to the original surface (t2 and t3). As in the first approach, the sum of the centroids weighted by area is computed for all triangles, with positive weights for clockwise and negative weights for counterclockwise oriented triangles. In the example, the triangles t1, t3 and t4 are oriented clockwise and are therefore added to the centroid sum. The area of triangle t2 is subtracted, as it is oriented counterclockwise. This subtracts the area that was added twice because of the overlap of t1 and t3, as well as the fragment of t3 which has been added but is not part of the surface. Holes are handled in the same manner, but always with the opposite sign, i.e. triangles which are generated by points of interior rings of the surface boundary are added to the sum if they are counterclockwise oriented and subtracted from the sum if they are clockwise oriented.

Figure 4.27: Triangulation of a hole within a surface

Figure 4.27 shows the triangulation of a hole within a surface. The triangulation with the same base point as before generates the triangles (p1, p7, p8), (p1, p8, p9) and (p1, p9, p7). As t5 is clockwise oriented, it is subtracted from the sum. The fragment which has been subtracted from the sum, but forms part of the surface, is compensated by the addition of the counterclockwise oriented triangles t6 and t7. The computation of a centroid has an asymptotic running time of O(n), where n is

• the number of points in case of a point or point set (that is Point, MultiPoint, CurveBoundary, CompositePoint)

• the number of line segments in case of a Curve, Ring, MultiCurve or SurfaceBoundary

• the number of control points in case of a Surface or MultiSurface

The algorithm is an extension of the approach described in [wwwa], which does not consider holes. Like most of the implemented operations, the current configuration of the centroid implementation uses the floating-point arithmetic class for the elementary arithmetic operations needed to compute the centroid coordinates. Hence, floating-point rounding errors may occur, which may influence the accuracy of the result. However, the algorithm is still robust. Given that the centroid computation of points and lines is simply based on the (weighted) average of all components, the algorithm works in 2D, 2.5D and 3D for Points, MultiPoints, Curves, MultiCurves, Rings, CurveBoundaries and SurfaceBoundaries. Geometric objects which represent an area, such as Surface, MultiSurface and CompositeSurface, are only supported in two dimensions, without elevation.
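The curve and surface centroid rules above can be sketched in a few lines. This is a hypothetical two-dimensional sketch with plain double arithmetic, not the report's implementation: a curve is a polyline whose segment midpoints are weighted by segment length, and a surface is an exterior ring plus optional interior rings, each fanned from the fixed base point p1 as in Figures 4.26 and 4.27. Instead of the report's clockwise/counterclockwise sign rules, the sketch uses the signed triangle area directly and then normalizes each ring to a positive area, so that the exterior ring is always added and holes are always subtracted regardless of stored orientation; this normalization is an assumption of the sketch.

```java
/** Hypothetical 2D centroid sketch for polylines and polygons with holes. */
final class CentroidSketch {

    /** Centroid of a polyline {x0,y0,x1,y1,...}: segment midpoints weighted by segment length. */
    static double[] curveCentroid(double[] xy) {
        double sx = 0, sy = 0, total = 0;
        for (int i = 0; i + 3 < xy.length; i += 2) {
            double len = Math.hypot(xy[i + 2] - xy[i], xy[i + 3] - xy[i + 1]);
            sx += len * (xy[i] + xy[i + 2]) / 2;
            sy += len * (xy[i + 1] + xy[i + 3]) / 2;
            total += len;
        }
        return new double[] { sx / total, sy / total };
    }

    /**
     * Fans a closed ring {x0,y0,...,x0,y0} (first point repeated at the end) from (bx, by).
     * Returns {area, momentX, momentY}, normalized so that area >= 0.
     */
    static double[] ringMoments(double[] ring, double bx, double by) {
        double area = 0, mx = 0, my = 0;
        for (int i = 0; i + 3 < ring.length; i += 2) {
            double x1 = ring[i], y1 = ring[i + 1], x2 = ring[i + 2], y2 = ring[i + 3];
            // signed area of triangle (base, p_i, p_i+1); overlapping parts cancel out
            double a = ((x1 - bx) * (y2 - by) - (x2 - bx) * (y1 - by)) / 2;
            area += a;
            mx += a * (bx + x1 + x2) / 3;     // triangle centroid weighted by signed area
            my += a * (by + y1 + y2) / 3;
        }
        return area < 0 ? new double[] { -area, -mx, -my } : new double[] { area, mx, my };
    }

    /** Centroid of a surface: exterior ring minus interior rings, any ring orientation. */
    static double[] surfaceCentroid(double[] exterior, double[]... holes) {
        double bx = exterior[0], by = exterior[1];     // fixed base point p1
        double[] m = ringMoments(exterior, bx, by);
        double area = m[0], mx = m[1], my = m[2];
        for (double[] hole : holes) {
            double[] h = ringMoments(hole, bx, by);
            area -= h[0];                              // holes always count negatively
            mx -= h[1];
            my -= h[2];
        }
        return new double[] { mx / area, my / area };
    }
}
```

Both functions make a single pass over the vertices, in line with the O(n) running time stated above.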

4.5.3 Convex Hull

The convex hull of a set of points S, also known as the convex envelope, is the minimal convex set containing S. In the plane it may be easily visualized by imagining an elastic band stretched around the geometric object. Figure 4.28 shows examples of convex hulls of a point set, curves and surfaces. The convex hull is used, for example, in collision detection, especially in computer games: the collision between two convex objects can be computed much faster than between two non-convex objects. Only if the convex hulls collide does the more costly collision test between the original objects have to be performed.

Figure 4.28: Examples of convex hulls: Surface (a), MultiSurface (b), MultiPoint (c), straight Curve (d), Curve(e) and MultiCurve(f)

There are several ways to compute the convex hull of a geometric object represented by a point set. One of the most widely used algorithms is Graham's scan, which is based on [And79] and the first algorithm of Graham [Gra72]. Graham's scan is an incremental algorithm, i.e. it adds the points in S one by one, updating the solution after each addition. The points are added from left to right in lexicographic order (see Lexicographic Comparison of two Points in 4.2). Since it is convenient for the resulting convex hull vertices to be ordered from left to right as well, i.e. in clockwise order, the algorithm first calculates the upper part of the hull, which begins at p1 and ends at pn, and then, in a second scan, the lower part of the hull, which begins at pn and ends at p1 (see illustration (b) in figure 4.29). The basic step in the algorithm is the update of the current hull after each addition. We check the last three vertices in order to determine whether the new hull is still convex. If the last three vertices describe a right turn, we can proceed to the next point. If they describe a left turn, we delete the middle point of the three from the result list, because it lies within the convex hull and is not a convex hull vertex. As the previous points may still describe a left turn, we repeat this check for the last three points until they make a right turn or only two points are left. Figure 4.29 (a) shows this procedure. The points p1, p2, p3 and p4 have been added. Now we add pi. The last three vertices p3, p4 and pi make a left turn, so the middle point p4 is deleted. As p2, p3 and pi still make a left turn, the middle point p3 is deleted as well. Afterwards, the last three points p1, p2 and pi describe a convex chain, so the algorithm can proceed with the next point after pi. In case of collinearity of the last three points, the algorithm behaves as in the case of a left turn. After the upper hull has been calculated, the algorithm calculates the lower hull by adding the points from right to left. The result of the algorithm is a clockwise ordered list of points that represents the convex hull of S.

Figure 4.29: The Graham's scan algorithm computes the upper and lower hull separately from left to right and deletes points which do not result in a right turn
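The two-pass scan described above can be sketched as follows. This is a minimal, non-robust sketch (plain double orientation test, hypothetical class name) of the upper/lower hull variant. Points are sorted lexicographically, and a point stays on the hull only while the last three points make a right turn. Collinear middle points are dropped, so the result is the clockwise list of hull vertices. The sketch assumes duplicate points have been removed beforehand, as the JTS coordinate filter does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the upper/lower hull scan; points are double[]{x, y}. */
final class ConvexHullSketch {

    /** > 0 for a left turn o-a-b, < 0 for a right turn, 0 if collinear. */
    static double cross(double[] o, double[] a, double[] b) {
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]);
    }

    /** Hull vertices in clockwise order, starting at the lexicographically smallest point. */
    static List<double[]> hull(double[][] input) {
        double[][] pts = input.clone();
        Arrays.sort(pts, Comparator.<double[]>comparingDouble(p -> p[0]).thenComparingDouble(p -> p[1]));
        List<double[]> h = new ArrayList<>();
        if (pts.length < 2) {
            h.addAll(Arrays.asList(pts));
            return h;
        }
        // Upper hull, left to right.
        for (double[] p : pts) {
            while (h.size() >= 2 && cross(h.get(h.size() - 2), h.get(h.size() - 1), p) >= 0)
                h.remove(h.size() - 1);   // left turn or collinear: middle point is not a hull vertex
            h.add(p);
        }
        // Lower hull, right to left; the rightmost point is already the last element of h.
        int upperSize = h.size();
        for (int i = pts.length - 2; i >= 0; i--) {
            double[] p = pts[i];
            while (h.size() > upperSize && cross(h.get(h.size() - 2), h.get(h.size() - 1), p) >= 0)
                h.remove(h.size() - 1);
            h.add(p);
        }
        h.remove(h.size() - 1);          // the scan ends at the first point again
        return h;
    }
}
```

The sort dominates at O(n · log(n)); each point is added and removed at most once per pass. For collinear input the result has two points, which corresponds to the straight Curve case listed below.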

The Abstract Specification defines the convex hull operation for all Geometry objects:

Geometry getConvexHull()

The implemented operation returns

• a Point if the convex hull is a single point. This is always the case for the convex hull of a Point or a CompositePoint.

• a straight Curve (see (d) in figure 4.29) for the convex hull of a geometry that is itself a straight curve (a Curve, MultiCurve or CompositeCurve), or of a MultiPoint with only two elements (or more, if all of them are collinear)

• a Surface for all other geometric objects with three or more non-collinear points
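Which of the three result types applies can be decided from the input points alone. The following sketch assumes plain double[] {x, y} coordinates and an exact zero test on a floating-point orientation value; the enum HullType and the method classify are hypothetical stand-ins for the GeoAPI return types, not part of the implementation.

```java
class HullResultSketch {

    /** Hypothetical stand-ins for the returned GeoAPI types Point, Curve and Surface. */
    enum HullType { POINT, CURVE, SURFACE }

    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** Classifies the convex hull of a non-empty 2D point set by the points it contains. */
    static HullType classify(double[][] pts) {
        double[] a = pts[0];
        double[] b = null; // first point distinct from a, if any
        for (double[] p : pts) {
            if (b == null) {
                if (p[0] != a[0] || p[1] != a[1]) {
                    b = p;
                }
            } else if (cross(a, b, p) != 0) {
                return HullType.SURFACE; // three non-collinear points
            }
        }
        // Only one distinct point: Point; otherwise all points lie on the line ab: straight Curve
        return b == null ? HullType.POINT : HullType.CURVE;
    }
}
```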

For a set of n points in the plane, Graham's scan computes the convex hull in O(n log n) time. The implementation adopted from the JTS uses a preliminary coordinate filter to eliminate redundant points, and a special heuristic to reduce the number of points when there are many. The heuristic determines an octagon lying within or on the convex hull and then deletes all points which lie inside the octagon, as they cannot be vertices of the convex hull. The octagon is formed by the eight extreme points with the following attributes: lowest x value, lowest y value, lowest x+y value, lowest x-y value, highest x value, highest y value, highest x+y value and highest x-y value. Computing this additional filter takes linear time, so it does not affect the asymptotic running time of the complete operation. However, it carries a certain overhead and is only worthwhile for large coordinate sets. The JTS default limit is 50 points: above it, the plain Graham's scan is expected to be slower than the optimized procedure.
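The octagon filter can be sketched as follows. This is a simplified illustration in the spirit of the JTS heuristic, not the JTS source; the class and method names are hypothetical and points are plain double[] {x, y} arrays. The eight extreme points are stored in counterclockwise order, so a point lies strictly inside the octagon if it lies strictly to the left of every non-degenerate octagon edge. Points on the octagon boundary are kept, since they may still be hull vertices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OctagonFilterSketch {

    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** Extreme points in counterclockwise order: min x, min x+y, min y, max x-y, max x, max x+y, max y, min x-y. */
    static double[][] octagon(double[][] pts) {
        double[][] oct = new double[8][];
        Arrays.fill(oct, pts[0]);
        for (double[] p : pts) {
            if (p[0] < oct[0][0]) oct[0] = p;
            if (p[0] + p[1] < oct[1][0] + oct[1][1]) oct[1] = p;
            if (p[1] < oct[2][1]) oct[2] = p;
            if (p[0] - p[1] > oct[3][0] - oct[3][1]) oct[3] = p;
            if (p[0] > oct[4][0]) oct[4] = p;
            if (p[0] + p[1] > oct[5][0] + oct[5][1]) oct[5] = p;
            if (p[1] > oct[6][1]) oct[6] = p;
            if (p[0] - p[1] < oct[7][0] - oct[7][1]) oct[7] = p;
        }
        return oct;
    }

    /** True if p lies strictly to the left of every non-degenerate edge of the counterclockwise octagon. */
    static boolean strictlyInside(double[][] oct, double[] p) {
        int edges = 0;
        for (int i = 0; i < oct.length; i++) {
            double[] a = oct[i];
            double[] b = oct[(i + 1) % oct.length];
            if (a[0] == b[0] && a[1] == b[1]) {
                continue; // repeated extreme point: zero-length edge
            }
            edges++;
            if (cross(a, b, p) <= 0) {
                return false;
            }
        }
        return edges >= 3; // fewer than three edges: the octagon has no interior
    }

    /** Keeps only the points that can still be convex hull vertices. */
    static List<double[]> reduce(double[][] pts) {
        double[][] oct = octagon(pts);
        List<double[]> kept = new ArrayList<>();
        for (double[] p : pts) {
            if (!strictlyInside(oct, p)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

For the corners of a 4 × 4 square plus three interior points, only the four corners survive the filter; for collinear input nothing is removed.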

4.6 Metric Operations

4.6.1 Distance

One of the most important attributes of the relationship between two geometric objects is the shortest distance between them. It is an essential basic function for many higher-level algorithms. The method is specified as follows:

double getDistance(Geometry geometry)

The shortest distance between two geometric objects is the distance between their two closest points. Finding these points is a typical problem in computational geometry. Due to time constraints, the distance operation has not been implemented in this work. A simple approach to finding the two closest points of two sets is to compare all pairs of segments of the two sets, which requires quadratic running time. The JTS distance operation follows such an approach and could be adapted for this implementation. Its performance can be improved by special filters that reduce the number of comparisons.
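The brute-force approach can be sketched for two line strings given as arrays of double[] {x, y} vertices, each with at least two vertices. This is a hypothetical illustration, not the JTS code: the distance between two segments is zero if they properly cross, and otherwise the minimum of the four endpoint-to-segment distances. All segment pairs are compared, giving O(nm) running time for line strings with n and m segments.

```java
class DistanceSketch {

    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** Distance from point p to segment ab, via the clamped orthogonal projection of p onto ab. */
    static double pointSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }

    /** Distance between segments ab and cd. */
    static double segmentSegment(double[] a, double[] b, double[] c, double[] d) {
        double d1 = cross(c, d, a), d2 = cross(c, d, b);
        double d3 = cross(a, b, c), d4 = cross(a, b, d);
        if (d1 * d2 < 0 && d3 * d4 < 0) {
            return 0; // proper crossing
        }
        // Touching or overlapping segments are covered here: some endpoint then lies on the other segment
        return Math.min(Math.min(pointSegment(a, c, d), pointSegment(b, c, d)),
                        Math.min(pointSegment(c, a, b), pointSegment(d, a, b)));
    }

    /** Quadratic brute force: compares every segment of line1 with every segment of line2. */
    static double distance(double[][] line1, double[][] line2) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line1.length; i++) {
            for (int j = 0; j + 1 < line2.length; j++) {
                min = Math.min(min, segmentSegment(line1[i], line1[i + 1], line2[j], line2[j + 1]));
            }
        }
        return min;
    }
}
```

A filter as mentioned above would, for instance, skip segment pairs whose envelopes are already farther apart than the current minimum.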

5 Testing Suite

5.1 Test Environment

Software development depends on the individual developer's design decisions. Standards and coding conventions may help unify source code and increase its quality, but they do not guarantee its correctness. Testing helps assess the correctness, completeness, security and quality of software. It is a means of ensuring that the developed software behaves as intended. Testing may be viewed as a sub-field of software quality assurance, but it is often seen as a separate field.

There are many ways and tools to test software. In Java software development, the most common testing tool is JUnit. It is one of the most successful testing frameworks and supports test-driven development, a technique in which a test case is written first and the code necessary to pass the test is implemented afterwards. JUnit is flexible and scalable and is not limited to any kind of software: a complex enterprise application can be tested just as well as an abstract software component such as this geometry implementation.

JUnit is based on the assertion of assumptions. The class Assert provides many methods to verify an assertion, such as assertEquals, assertNull, assertNotNull, assertNotSame, assertTrue and assertFalse. For example, the statements assertTrue(result1 == 10); assertNotNull(result2); verify that result1's value is 10 and that result2 is an initialized object and not null.

Most of the important Java IDEs (such as Eclipse) integrate JUnit as their standard testing tool and provide GUIs to run and evaluate tests. JUnit tests are organized in TestCases. TestCases contain the actual tests, i.e. the verification of assumptions through methods provided by JUnit. A TestCase can be run individually. However, the number of TestCases grows with the complexity of the software. JUnit addresses this through TestSuites, which call selected TestCases. A TestCase can be called by different TestSuites, and a TestSuite can itself be called by other TestSuites. This feature enables JUnit to represent complex hierarchies by modular and flexible test components. Figure 5.1 shows the hierarchy of the test suites and cases in this implementation.

Figure 5.1: Hierarchy of TestSuites and TestCases
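The TestCase/TestSuite hierarchy is an instance of the composite pattern. In JUnit 3 it is built with junit.framework.TestSuite, whose addTest method accepts both TestCases and other TestSuites. To keep the sketch below free of dependencies, it uses hypothetical stand-in types (TestNode, SimpleTestCase, SimpleTestSuite) instead of the JUnit classes. It shows how a suite can contain both cases and other suites, and how the same case can be reused by several suites.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for JUnit's Test interface: anything that can be run and counted. */
interface TestNode {
    int run();            // returns the number of failed checks
    int countTestCases();
}

/** Leaf of the hierarchy: a single test case wrapping one check. */
class SimpleTestCase implements TestNode {
    private final BooleanSupplier check;

    SimpleTestCase(BooleanSupplier check) {
        this.check = check;
    }

    public int run() {
        return check.getAsBoolean() ? 0 : 1;
    }

    public int countTestCases() {
        return 1;
    }
}

/** Composite: a suite may contain test cases as well as other suites. */
class SimpleTestSuite implements TestNode {
    private final List<TestNode> children = new ArrayList<>();

    SimpleTestSuite add(TestNode test) {
        children.add(test);
        return this;
    }

    public int run() {
        int failures = 0;
        for (TestNode t : children) {
            failures += t.run(); // delegates to cases and nested suites alike
        }
        return failures;
    }

    public int countTestCases() {
        int n = 0;
        for (TestNode t : children) {
            n += t.countTestCases();
        }
        return n;
    }
}
```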

5.2 Test Methodology

In general, there is one TestCase for each data type. Some data types, such as the different boundary types, are combined into one TestCase. The tests verify the correctness of:

• The constructors of the data types, tested by direct instantiation.

• The methods of the factory classes that instantiate objects. Depending on the definition of the factory method, it must in some cases be tested whether the components of the new instance are really new objects rather than references to existing objects.

• Characteristics of geometric objects. This includes the MbRegion, RepresentativePoint, Boundary, Closure, isSimple, isCycle, Dimension, CoordinateDimension and the Envelope.

• Spatial operations such as the Centroid, the Convex Hull, the set operations (intersection, union, difference, symmetric difference) and the Boolean relational operators (for instance intersects, contains, equals). For the set operations, all important combinations of topological relationships between geometric objects have been tested for each operation. These results were verified visually rather than by JUnit assertions, since visual verification is less time consuming; a minimal application was developed to visualize geometries. The relational Boolean operators were tested in a similar way: all important combinations of topological relationships between two geometric objects have been verified for each operator, but in this case the results were asserted with JUnit methods.

• The method to clone geometric objects. This test includes the verification of whether the result and the components of which it is composed really refer to new instances.

• The basic algorithms and computations provided by the util classes.

The success of these tests indicates that the implemented data types and operations conform to the GeoAPI and to its operation definitions.

6 Conclusions and Recommendations

6.1 Conclusions

The Feature Geometry Abstract Specification, one of the main topics of OGC's document series, has attracted increasing interest due to its two- and three-dimensionality and diversity of geometry types in . This prototype of the geometry implementation of the Feature Geometry Abstract Specification (ISO19107) is a first impulse towards a full implementation of the ISO19107 in Java. The main characteristics of the current implementation are:

• It provides an implementation of the basic data types and spatial analysis operations defined by the FGAS. It supports 3D data storage but only 2D spatial analysis. Spatial analysis is performed between geometries created with the same feature geometry factory. There are singleton factories for 2D, 2.5D and 3D objects. The geometry operations use robust algorithms; a possibility for exact computation in metric operations is proposed and implemented. Two approaches to persistence are suggested to avoid problems with large data sets. The work is implemented in the object-oriented language Java and conforms to the Sun coding conventions. Its data model and operations are tested and can be used in OGC-conformant GIS. This prototype can be extended to incorporate the 3D capabilities of the Implementation Specification.

• It provides OGC-conformant interoperability through the implementation of the GeoAPI interfaces, the official set of Java interfaces for the ISO19107 data model.

• It promotes the development of standardized GIS and DGIS. Thanks to its interoperability, the geometry implementation can easily be integrated into such GIS. Reusable GIS components speed up the development of applications and reduce costs. This implementation is free open source software, published under the terms of the GNU Lesser General Public License (LGPL) as a SourceForge subproject in the namespace of GeoTools.

• It helps make the GeoAPI more suitable for practical use. Based on the experience gained in this implementation, this work presents suggestions for the GeoAPI interfaces. Furthermore, interfaces which are defined by the FGAS but not yet present in the GeoAPI are provided in the format of the GeoAPI. These suggestions can be found in Appendix D.2.

• It supports GeoAPI compliance testing with a JUnit test suite which verifies the correctness of the implemented methods. The majority of these tests do not depend on the local implementation but only on the GeoAPI interfaces. Therefore, they can form part of a future GeoAPI test suite which verifies the compliance of an implementation.

• It suggests extensions to the FGAS, such as a boundary type for aggregates, as well as a way to represent complexes generated by primitives that do not share common boundaries. These suggestions are summarized in Appendix D.1.

6.2 Future Work

This research focuses on the FGAS data model and its spatial analysis operations. However, a geometry implementation is a complex task which involves many aspects. Many issues still require further investigation.

• Implementation of the missing operators: This work does not support all operations specified by the FGAS for the selected data types. These should be implemented. In particular, the complex and aggregate classes still need work.

• Implementation of 3D operations: This implementation provides most of the operations for 2D data only. Algorithms for 3D geometries are more complex. The long-term objective of this implementation is to support complete analysis of 3D data.

• Implementation of missing data types: The FGAS defines a wide variety of types to represent geometric objects, such as splines or Bézier curves for curve segments and surface patches. Currently only LineString is implemented. New curve segment types can easily be added to the data model.

• Topology: The FGAS specifies a set of objects to represent an internal topology. Topology can considerably improve the performance and robustness of spatial operations.

• Storage and Persistence: More efficient and standardized approaches for persistent data storage in files or databases should be investigated.

• I/O support: The instantiation of geometric objects from the Geography Markup Language (GML3), or from Java binding classes generated from GML schemas, should be supported.

• Practice-oriented operations: In practice, there are many typical spatial operations which should be better supported by the underlying geometry model. An overlay of more than two geometries is one example: many GIS applications support overlaying more than two layers. However, most data models' operations are binary, i.e. they only support the operation between two objects. Supporting more than two geometries in the same operation call would bring a large performance benefit for these functions. Due to the structure of the overlay algorithm, an overlay of three or more geometries (instead of only two) is possible with little extra time. Developers of the JTS community have already discussed this issue on the JTS developer mailing list jts-devel.

Appendix A

Glossary

2d two dimensional

2.5d two and a half dimensional

3d three dimensional

API Application Programming Interface

CAD Computer aided design

CAM Computer aided manufacturing

ccw counter clockwise

CG Computational Geometry

CORBA Common Object Request Broker Architecture

CRS Coordinate Reference System

cw clockwise

DCEL Doubly-Connected Edge List

DCP Distributed Computing Platform

DE-9IM Dimension extended 9-Intersection-Matrix

DEM Digital Elevation Model

DGIS Distributed GIS

DTM Digital Terrain Model

ESRI Environmental Systems Research Institute

EU European Union

FGAS Feature Geometry Abstract Specification (the ISO 19107)

FTP File Transfer Protocol

GIS Geographic Information System

GML Geography Markup Language

GNU General Public License

GUI Graphical User Interface

IEEE Institute of Electrical and Electronics Engineers

ISO International Organization for Standardization

ISO/OSI ISO Open Systems Interconnection Basic Reference Model

IT Information technology

JDK Java Development Kit

JVM Java Virtual Machine

LGPL GNU Lesser General Public License

MB Minimum Bounding; usually the minimum bounding region

OGC Open Geospatial Consortium

OLE Object Linking and Embedding

OMG Object Management Group

OMI Object Model Interface

oo Object oriented

OS Operating system

SF Simple Feature Geometry

SFS Simple Feature Geometry Implementation Specification

SQL Structured Query Language

SRS Spatial Reference System

SCID Spatial component identifiers

TCP/IP Transmission Control Protocol/Internet Protocol

TELNET Teletype Network

TIN Triangulated Irregular Network

W3C World Wide Web Consortium

XML Extensible Markup Language

Appendix B

Technical definitions

IEEE Floating-Point Standard.

The IEEE Standard 754-1985 for Binary Floating-Point Arithmetic [P7585] is the most widely adopted floating-point standard among hardware manufacturers. The standard requires that the results of +, −, ·, / and √ are exactly rounded according to a chosen rounding mode, i.e. the computed result equals the exact result rounded to a representable number. Rounding to nearest (on ties, the LSB¹ becomes zero), rounding toward zero, rounding toward +∞ and rounding toward −∞ are the rounding modes which have to be supported.

The standard was motivated by the differing behaviour of hardware on elementary floating-point operations. Source code that does not behave identically on different machines is usually of little use. The IEEE floating-point standard guarantees machine-independent results for the basic operations and thereby makes code portable.

The standard specifies floating-point computation in single, single extended, double and double extended precision. Single precision stores a 32-bit word with a mantissa length l = 24 and an exponent range of [−126..127]. Double precision stores two consecutive 32-bit words with a mantissa length l = 53 and an exponent range of [−1022..1023]. Thus, under rounding to nearest, the relative rounding errors are bounded by $2^{-24}$ and $2^{-53}$, respectively.
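The double-precision bound can be observed directly in Java, whose double and float types follow the IEEE 754 binary64 and binary32 formats. The small sketch below (class and method names are illustrative only) adds the unit roundoff $2^{-53}$ to 1.0: the exact sum lies exactly halfway between 1.0 and the next double $1 + 2^{-52}$, and rounding to nearest with ties to even returns 1.0. Twice that value is representable and is therefore not absorbed.

```java
class RoundingSketch {

    // Unit roundoff for double (l = 53) and float (l = 24), built exactly with Math.scalb
    static final double EPS_DOUBLE = Math.scalb(1.0, -53);
    static final float EPS_FLOAT = Math.scalb(1.0f, -24);

    /** True if adding x to 1.0 is rounded back to 1.0 in double precision. */
    static boolean absorbed(double x) {
        return 1.0 + x == 1.0;
    }

    /** Same check in single precision. */
    static boolean absorbed(float x) {
        return 1.0f + x == 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(absorbed(EPS_DOUBLE));           // true: tie, rounded to even (1.0)
        System.out.println(absorbed(2 * EPS_DOUBLE));       // false: 1 + 2^-52 is representable
        System.out.println(Math.ulp(1.0) == 2 * EPS_DOUBLE); // true: spacing of doubles near 1 is 2^-52
        System.out.println(absorbed(EPS_FLOAT));            // true: same effect with l = 24
    }
}
```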

Monotone Chains.

Monotone chains are a technique to gain computational performance in many spatial operations without bringing a lot more complexity to the code. The set of all line segments is divided into monotone chains of connected segments. A monotone chain has two important characteristics:

• Non-Intersection Property: the segments within a monotone chain do not intersect.

¹ Least significant bit

Figure B.1: Examples of Monotone Chains (from [viv03b])

• Endpoint Envelope Property: the envelope of any continuous subset of the segments in a monotone chain is the envelope of the endpoints of the subset.

Based on these properties, monotone chains bring effective advantages: there is no need to search for intersections within a monotone chain, so the number of intersection tests decreases drastically. Furthermore, a binary search can be used to find intersections along a monotone chain (see figure B.1 (b)). In practice, monotone chains often yield large performance gains. Further documentation of this technique can be found in [viv03b].
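A polyline can be partitioned into monotone chains by grouping consecutive segments whose direction lies in the same quadrant, so that x and y are each monotone along the chain. The sketch below follows that idea (JTS uses a similar quadrant-based criterion, but this is not the JTS code); all names are hypothetical, and it assumes the polyline has no repeated consecutive vertices. The envelope of a chain is then simply the box spanned by its two end vertices, as stated by the Endpoint Envelope Property.

```java
import java.util.ArrayList;
import java.util.List;

class MonotoneChainSketch {

    /** Quadrant of the direction from p to q: 0 = NE, 1 = NW, 2 = SW, 3 = SE (axis directions join a neighbouring quadrant). */
    static int quadrant(double[] p, double[] q) {
        double dx = q[0] - p[0], dy = q[1] - p[1];
        if (dx >= 0) {
            return dy >= 0 ? 0 : 3;
        }
        return dy >= 0 ? 1 : 2;
    }

    /** Splits a polyline into chains, each given as {startIndex, endIndex} into pts. */
    static List<int[]> chains(double[][] pts) {
        List<int[]> result = new ArrayList<>();
        int start = 0;
        while (start < pts.length - 1) {
            int q = quadrant(pts[start], pts[start + 1]);
            int end = start + 1;
            // Extend the chain while the next segment stays in the same quadrant
            while (end < pts.length - 1 && quadrant(pts[end], pts[end + 1]) == q) {
                end++;
            }
            result.add(new int[] {start, end});
            start = end; // consecutive chains share their end vertex
        }
        return result;
    }

    /** Envelope {minX, minY, maxX, maxY} of a chain, computed from its two end vertices only. */
    static double[] envelope(double[][] pts, int[] chain) {
        double[] a = pts[chain[0]], b = pts[chain[1]];
        return new double[] {Math.min(a[0], b[0]), Math.min(a[1], b[1]), Math.max(a[0], b[0]), Math.max(a[1], b[1])};
    }
}
```

For the zigzag (0,0), (1,1), (2,3), (3,1), (4,0), (5,2) the sketch yields three chains: vertices 0 to 2, 2 to 4 and 4 to 5.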

Polynomial expression.

An expression with a finite number of constants and variables, combined using only the operations addition, subtraction, multiplication and raising to a non-negative whole-number power. For example, $3 \cdot x^4 \cdot y \cdot z^3 - 3 \cdot y^2 - 5$ is a polynomial expression. But $\frac{3}{1 + x^2}$ is not a polynomial expression, because it includes a division.

79 Appendix C

Notations

Class attributes

object1.attribute1: This notation refers to the value of the attribute attribute1 of object object1.

object1.method1(): This notation refers to the return value of the method method1 of object object1.

Appendix D

Observations and Recommendations

D.1 Abstract Specification Issues

Implementation of the Interface GenericCurve

Figure D.1: Current and suggested relation between an OrientableCurve, a Curve and the interface GenericCurve

The Abstract Specification defines that a GM_Curve should implement the interface GM_GenericCurve. GM_OrientableCurve, which is the parent class of GM_Curve, does not implement that interface. This results in constraints, since the generic curve interface offers helpful methods, such as getStartPoint, that are not available for GM_OrientableCurves. A GM_CompositeCurve, for example, contains a list of GM_OrientableCurves. Since an orientable curve does not implement the generic curve interface, we have to downcast the orientable curve to a curve to access its attributes. Recommendation: GM_OrientableCurve, instead of GM_Curve, should implement the interface GM_GenericCurve (see figure D.1).
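The effect of the recommendation can be sketched with heavily simplified, hypothetical Java interfaces (positions are double[] instead of DirectPosition; these are not the actual GeoAPI signatures). Once OrientableCurve extends GenericCurve, code that walks the generators of a composite curve can call getStartPoint without downcasting to Curve.

```java
import java.util.List;

class CurveHierarchySketch {

    /** Simplified stand-in for GM_GenericCurve. */
    interface GenericCurve {
        double[] getStartPoint();
        double[] getEndPoint();
    }

    /** Proposed: the orientable curve itself offers the generic curve methods. */
    interface OrientableCurve extends GenericCurve { }

    interface Curve extends OrientableCurve { }

    /** Start point of a composite curve, read from its first generator without a downcast. */
    static double[] startPoint(List<? extends OrientableCurve> generators) {
        return generators.get(0).getStartPoint();
    }
}
```

In such a design, an orientable curve with negative orientation would presumably return the start and end points of its underlying curve in reverse order.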

Boundary of Aggregates

Aggregates are subtyped into MultiPrimitives, which are subtyped into MultiSolid, MultiSurface, MultiCurve and MultiPoint, which aggregate disjoint primitives. There is no restriction on creating boundaries from disjoint primitives; in fact, a Curve boundary is a complex with two disjoint subcomplexes whose elements are Points. The boundary of an Aggregate cannot be a PrimitiveBoundary: a CurveBoundary has one start and one end point and therefore cannot be used for MultiCurves with several start and end points; a SurfaceBoundary has one exterior and many interior rings for its corresponding surface and therefore cannot be used for a MultiSurface with more than one exterior ring and a list of interior rings for each surface. The FGAS does not specify how to represent the boundary of Aggregates. However, the boundary operation is applicable to MultiPrimitives and should return an instance of GM_Object. The following types should be introduced for an appropriate representation of Aggregate boundaries:

MultiPrimitiveBoundary: a set of Boundary objects

MultiCurveBoundary: a set of CurveBoundary objects

MultiSurfaceBoundary: a set of SurfaceBoundary objects

MultiSolidBoundary: a set of SolidBoundary objects
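A minimal sketch of the proposed MultiCurveBoundary, using hypothetical classes that are not part of the GeoAPI: curves are given as vertex arrays, each open curve contributes one CurveBoundary (its start and end point), and closed curves are skipped, since a closed curve (a cycle) has an empty boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AggregateBoundarySketch {

    /** Hypothetical boundary of one curve: its start and end point. */
    static final class CurveBoundary {
        final double[] start;
        final double[] end;

        CurveBoundary(double[] start, double[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical MultiCurveBoundary: a collection of CurveBoundary objects. */
    static final class MultiCurveBoundary {
        final List<CurveBoundary> elements;

        MultiCurveBoundary(List<CurveBoundary> elements) {
            this.elements = Collections.unmodifiableList(elements);
        }
    }

    /** Builds the boundary of a MultiCurve given as a list of vertex arrays. */
    static MultiCurveBoundary boundaryOf(List<double[][]> curves) {
        List<CurveBoundary> parts = new ArrayList<>();
        for (double[][] c : curves) {
            double[] start = c[0], end = c[c.length - 1];
            if (!Arrays.equals(start, end)) { // closed curves have an empty boundary
                parts.add(new CurveBoundary(start, end));
            }
        }
        return new MultiCurveBoundary(parts);
    }
}
```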

Split Complexes

As shown in figure 4.10, set operations on complexes may result in complexes for which the closures of the generating primitives are disjoint. Therefore, they cannot be represented by Composites (CompositeSolid, CompositeSurface and CompositeCurve). The following type should be introduced for an appropriate representation of a geometry composed of disjoint complexes:

MultiComplex: a set of complexes

D.2 Recommendations for the GeoAPI

The following part of this appendix contains recommendations for the GeoAPI that resulted from best practice in this implementation. The SVN branch of the GeoAPI used for this implementation is: https://svn.sourceforge.net/svnroot/geoapi/branches/geometry/ The following recommendations are based on the changes made between revisions 898 and 946.

D.2.1 Naming issues

Geometry Package

The package org.opengis.spatialschema.geometry.geometry corresponds to the Coordinate package in the FGAS. The term Geometry is confusing, since it suggests geometric objects. However, the classes in this package are only auxiliary support classes. Hence, the term Coordinates (i.e. the package name coordinate) is more appropriate.

Interface GeometryFactory

Same issue as in the Geometry package. The correct name for this factory interface (located in the package org.opengis.spatialschema.geometry.geometry) is CoordinateFactory.

D.2.2 Interface and method modifications

The following interfaces were added (in case they were missing in the current GeoAPI version) or modified:

MultiPoint.java

The following method was added in order to return an appropriately typed result: public Set getElements();

MultiCurve.java

This interface represents a MultiCurve and was added, since it was not present in the current GeoAPI. The methods are: public Set getElements(); to return an appropriately typed result. public double length(); to access the attribute specified in ISO19107.

MultiSurface.java

This interface represents a MultiSurface and was added, since it was not present in the current GeoAPI. The methods are: public Set getElements(); to return an appropriately typed result.

public double getArea(); to access the attribute specified in ISO19107.

AggregateFactory.java

This interface represents the factory for the Aggregate package and was added, since it was not present in the current GeoAPI. The methods are: public MultiPrimitive createMultiPrimitive(Set primitives); to create a MultiPrimitive instance. public MultiPoint createMultiPoint(Set points); to create a MultiPoint instance. public MultiCurve createMultiCurve(Set curves); to create a MultiCurve instance. public MultiSurface createMultiSurface(Set surfaces); to create a MultiSurface instance.

Aggregate.java

Method changed from public Set getElements(); into public Set getElements(); to allow for later downcast within sets.
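Assuming these changes concern Java generic type parameters, their intent can be sketched with a covariant return type: Aggregate declares its elements with a wildcard type, and MultiPoint overrides getElements to return a set typed as Points, which is a legal override. Callers then obtain Points without casting. The interfaces below are simplified, hypothetical stand-ins, not the actual GeoAPI declarations.

```java
import java.util.Set;

class AggregateTypingSketch {

    interface Geometry { }

    interface Point extends Geometry { }

    /** Aggregate exposes its elements with a wildcard type. */
    interface Aggregate {
        Set<? extends Geometry> getElements();
    }

    /** Covariant override: a MultiPoint returns its elements typed as Points. */
    interface MultiPoint extends Aggregate {
        @Override
        Set<Point> getElements();
    }
}
```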

CompositePoint.java

This interface represents a CompositePoint and was added, since it was not present in the current GeoAPI. It does not contain any methods.

ComplexFactory

This interface represents the factory for the Complex package and was added, since it was not present in the current GeoAPI. The methods are:

public CompositePoint createCompositePoint(Point generator); to create a CompositePoint instance. public CompositeCurve createCompositeCurve(List generator); to create a CompositeCurve instance. public CompositeSurface createCompositeSurface(List generator); to create a CompositeSurface instance.

Complex.java

Method changed from public Set getElements(); into public Collection getElements(); to allow for later downcast within collections.

PrimitiveFactory.java

Method changed from Ring createRing(List curves) throws... into Ring createRing(List curves) throws... to allow for later downcast within lists.

SurfaceBoundary.java

Method changed from public Ring[] getInteriors(); into public List getInteriors(); because implementations should avoid arrays where possible. Data types compatible with iterators are preferred.

GeometryFactory.java

The following method was added: public Position createPosition(DirectPosition dp);

Reason: There was no method specified to instantiate Positions. However, there are specified methods which need Positions as a parameter. The method was added to resolve this inconsistency.

Appendix E

Geometric objects and relations

E.1 Curve Types

Simple curve Representable by GM_Curve, GM_CompositeCurve, GM_MultiCurve

Non-simple curve Representable by GM_Curve, GM_CompositeCurve, GM_MultiCurve

Branched curve Representable by GM_MultiCurve

Curve with non-connected elements Representable by GM_MultiCurve

E.2 Polygon Types in the plane

Representable polygons

Simple Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Simple Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with non-intersecting holes Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with islands Representable by GM_MultiSurface

Polygon with hole that touches the polygon shell in a vertex (vertex/vertex-intersection) Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with hole which touches an edge of the polygon's shell (vertex/edge-intersection) Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with holes which touch in vertices (vertex/vertex-intersection) Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with holes which touch in vertex and edge (vertex/edge-intersection) Representable by GM_Surface, GM_CompositeSurface, GM_MultiSurface

Polygon with hole which touches the shell in more than one point; the polygon is divided into two or more regions. Representable by GM_MultiSurface

Polygon with hole which touches the shell in more than one point; the polygon is divided into two or more regions. Representable by GM_MultiSurface

Divided polygons whose boundaries intersect in a point (vertex/edge-intersection) Representable by a GM_MultiSurface

Divided polygons whose boundaries intersect in a point (vertex/vertex-intersection) GM_MultiSurface

Divided polygons whose boundaries intersect in an edge (edge/edge-intersection) GM_MultiSurface or as a single region as GM_Surface or GM_CompositeSurface

Divided overlapping polygons (edge/edge-intersection) GM_MultiSurface

Divided non-intersecting polygons GM_MultiSurface

Not representable polygons

Polygon with two touching boundary vertices (intersection in point) Not representable

Polygon with intersecting boundary (intersection in point set) Not representable

Polygon with hole that intersects the polygon shell in an edge (edge/edge-intersection) Not representable

Polygon with holes which intersect in edges (edge/edge-intersection) Not representable

Polygon with hole which overlaps with the shell Not representable

Polygon with overlapping holes Not representable

Appendix F

Implementation Overview

This Appendix gives an overview of the implemented classes and methods, specified by the ISO19107 and the GeoAPI.

F.1 Implemented Classes

The following data types were implemented:

• Geometry root object

• Primitives: PrimitiveFactory, Primitive, Point, OrientableCurve, Curve, OrientableSurface, Surface, Ring, Boundary, PrimitiveBoundary, CurveBoundary, SurfaceBoundary, OrientablePrimitive

• Aggregates: AggregateFactory, Aggregate, MultiPrimitive, MultiPoint, MultiCurve, MultiSurface

• Complexes: ComplexFactory, Complex, Composite, CompositePoint, CompositeCurve, CompositeSurface, ComplexBoundary

• Coordinates: CoordinateFactory, Envelope, DirectPosition, Position, CurveSegment, LineString, LineSegment, PointArray, PointGrid, PolyhedralSurface, TriangulatedSurface, SurfacePatch, Polygon, Triangle

F.2 Implemented Methods

The following methods of the GeoAPI were implemented for the Primitives, and in some cases also for Complexes and Aggregates. The individual methods of each primitive, coordinate, aggregate and complex type are not listed.

Method in class TransfiniteSet | Implemented
boolean contains(DirectPosition point) | √
boolean contains(TransfiniteSet pointSet) | √
boolean intersects(TransfiniteSet pointSet) | √
boolean equals(TransfiniteSet pointSet) | √
TransfiniteSet intersection(TransfiniteSet pointSet) | √
TransfiniteSet difference(TransfiniteSet pointSet) | √
TransfiniteSet symmetricDifference(TransfiniteSet pointSet) | √
TransfiniteSet union(TransfiniteSet pointSet) | √

Method in class Geometry | Implemented
Geometry clone() | √
Boundary getBoundary() | √
Geometry getBuffer(double distance) | -
DirectPosition getCentroid() | √
Complex getClosure() | √
Geometry getConvexHull() | √
int getCoordinateDimension() | √
int getDimension(DirectPosition point) | √
CoordinateReferenceSystem getCoordinateReferenceSystem() | -
double getDistance(Geometry geometry) | -
Envelope getEnvelope() | √
Set getMaximalComplex() | -
Geometry getMbRegion() | √
DirectPosition getRepresentativePoint() | √
boolean isCycle() | √
boolean isMutable() | -
boolean isSimple() | √
Geometry toImmutable() | -
Geometry transform(CoordinateReferenceSystem newCRS) | -
Geometry transform(CoordinateReferenceSystem newCRS, MathTransform transform) | -

Method in class GenericCurve | Implemented
LineString asLineString(double maxSpacing, double maxOffset) | -
DirectPosition forConstructiveParam(double cp) | √
DirectPosition forParam(double s) | √
double getEndConstructiveParam() | √
double getEndParam() | √
DirectPosition getEndPoint() | √
ParamForPoint getParamForPoint(DirectPosition p) | -
double getStartConstructiveParam() | √
double getStartParam() | √
DirectPosition getStartPoint() | √
double[] getTangent(double s) | √
double length(double cparam1, double cparam2) | √
double length(Position point1, Position point2) | -

Method in class GenericSurface | Implemented
double getArea() | -
double getPerimeter() | -
double[] getUpNormal(DirectPosition point) | -

Bibliography

[ABD+95] F. Avnaim, J.-D. Boissonnat, O. Devillers, F. P. Preparata, and M. Yvinec. Evaluation of a new method to compute signs of determinants. In SCG ’95: Proceedings of the eleventh annual symposium on Computational geometry, pages 416–417, New York, NY, USA, 1995. ACM Press.

[ABD+97] Francis Avnaim, Jean-Daniel Boissonnat, Olivier Devillers, Franco Preparata, and Mariette Yvinec. Evaluating signs of determinants using single precision arithmetic. Algorithmica, 17(2):111–132, 1997.

[And79] A.M. Andrew. Another efficient algorithm for convex hulls in two dimensions. Inform. Process. Lett., 9:216–219, 1979.

[BBP95] J. Buchmann, I. Biehl, and T. Papanikolaou. Lidia - a library for computational number theory, 1995.

[BGHV99] Ludger Becker, Andre Giesen, Klaus H. Hinrichs, and Jan Vahrenhold. Algorithms for performing polygonal map overlay and spatial join on massive data sets. Lecture Notes in Computer Science, 1651:270–??, 1999.

[BH98] A. Brinkmann and K. Hinrichs. Implementing exact line segment intersection in map overlay. Proceedings of the Eighth International Symposium on Spatial Data Handling, pages 569–579, 1998.

[BO79] Jon Louis Bentley and Thomas Ottmann. Algorithms for reporting and counting geometric intersections. IEEE Trans. Computers, 28(9):643–647, 1979.

[BY] Hervé Brönnimann and Mariette Yvinec. Efficient exact evaluation of signs of determinants.

[CF95] Eliseo Clementini and Paolino Di Felice. A comparison of methods for representing topological relationships. Inf. Sci. Appl., 3(3):149–178, 1995.

[CF96] Eliseo Clementini and Paolino Di Felice. A model for representing topological relationships between complex geometric features in spatial databases. Inf. Sci., 90(1-4):121–136, 1996.

[CFC95] Eliseo Clementini, Paolino Di Felice, and Gianluca Califano. Composite regions in topological queries. Inf. Syst., 20(7):579–594, 1995.

[CFvO93] Eliseo Clementini, Paolino Di Felice, and Peter van Oosterom. A small set of formal topological relationships suitable for end-user interaction. In SSD ’93: Proceedings of the Third International Symposium on Advances in Spatial Databases, pages 277–295, London, UK, 1993. Springer-Verlag.

[Chu01] Fei Chuanyun. A Java implementation for Open GIS simple feature specification. Master’s thesis, The University of Calgary, 2001.

[Cla92] K. L. Clarkson. Safe and effective determinant evaluation. In Proc. 31st IEEE Symposium on Foundations of Computer Science, pages 387–395, Pittsburgh, PA, October 1992.

[dBvKOS97] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, 1997.

[ECF94] Max J. Egenhofer, Eliseo Clementini, and Paolino Di Felice. Topological relations between regions with holes, 1994.

[EF91] Max J. Egenhofer and Robert D. Franzosa. Point set topological relations. International Journal of Geographical Information Systems, 5:161–174, 1991.

[EF95] Max J. Egenhofer and Robert D. Franzosa. On the equivalence of topological relations. International Journal of Geographical Information Systems, 9(2):133–152, 1995.

[Ege93] M. Egenhofer. Topological relations between regions in R2 and Z2. In D. J. Abel and B. C. Ooi, editors, Proceedings of the 3rd International Symposium on Large Spatial Databases (SSD), volume 692, pages 316–336, Berlin, 1993. Springer-Verlag.

[EH91] M.J. Egenhofer and J.R. Herring. Categorizing binary topological relationships between regions, lines, and points in geographic databases. Technical report, Dept. of Surveying Engineering, Univ. of Maine, Orono, 1991.

[EMH94] M. Egenhofer, D. Mark, and J. Herring. The 9-intersection: Formalism and its use for natural language spatial predicates, 1994.

[FH95] Ulrich Finke and Klaus H. Hinrichs. Overlaying simply connected planar subdivisions in linear time. In SCG ’95: Proceedings of the eleventh annual symposium on Computational geometry, pages 119–126, New York, NY, USA, 1995. ACM Press.

[FPM] Leila De Floriani, Enrico Puppo, and Paola Magillo. Applications of computational geometry to geographic information systems.

[FW96] Steven Fortune and Christopher J. Van Wyk. Static analysis yields efficient exact integer arithmetic for computational geometry. ACM Trans. Graph., 15(3):223–248, 1996.

[Gol91] David Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv., 23(1):5–48, 1991.

[Gra72] R.L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Inform. Process. Lett., 1:132–133, 1972.

[GS93] Ralf Hartmut Güting and Markus Schneider. Realms: A foundation for spatial data types in database systems. In Symposium on Large Spatial Databases, pages 14–35, 1993.

[imga] Draft: Sanjay Dominik Jena, drawing: Wolfram Lange, source: ArcGIS, 2006.

[imgb] Draft: Sanjay Dominik Jena, drawing: Wolfram Lange, source: Ipp, 2006.

[Jen07] Sanjay Dominik Jena. A geometry implementation of the Feature Geometry Abstract Specification (ISO 19107) of the Open Geospatial Consortium. Diploma Thesis, Fachhochschule Köln, Jan 2007.

[KBS91] H. Kriegel, T. Brinkhoff, and R. Schneider. An efficient map overlay algorithm based on spatial access methods and computational geometry, 1991.

[KLN91] Michael Karasick, Derek Lieber, and Lee R. Nackman. Efficient Delaunay triangulation using rational arithmetic. ACM Trans. Graph., 10(1):71–91, 1991.

[Kuh05] Werner Kuhn. Geospatial semantics: Why, of what, and how? pages 1–24, 2005.

[LP76] D. T. Lee and F. P. Preparata. Location of a point in a planar subdivision and its applications. In STOC ’76: Proceedings of the eighth annual ACM symposium on Theory of computing, pages 231–235, New York, NY, USA, 1976. ACM Press.

[LY01] C. Li and C. Yap. Recent progress in exact geometric computation, 2001.

[MCD06] Joshua Bloch, Mike Cowlishaw, and Joseph D. Darcy. Fixed, floating, and exact computation with Java’s BigDecimal. http://www.ddj.com/dept/java/184405721, December 2006.

[MK89] Avraham Margalit and Gary D. Knott. An algorithm for computing the union, intersection or difference of two polygons. Computers & Graphics, 13(2):167–183, 1989.

[Män87] Martti Mäntylä. An introduction to solid modeling. Computer Science Press, Inc., New York, NY, USA, 1987.

[MNU97] Kurt Mehlhorn, Stefan Näher, and Christian Uhrig. The LEDA platform of combinatorial and geometric computing. In ICALP ’97: Proceedings of the 24th International Colloquium on Automata, Languages and Programming, pages 7–16, London, UK, 1997. Springer-Verlag.

[OGC97] The OpenGIS Abstract Specification, Topic 1: Feature Geometry (ISO 19107 Spatial Schema), version 5, document number 01-101. Technical report, Open Geospatial Consortium, 1997.

[OGC05] The OpenGIS Abstract Specification, Topic 0: Abstract Specification Overview, version 5, document number 04-084. Technical report, Open Geospatial Consortium, 2005.

[P7585] IEEE Task P754. IEEE Standard 754-1985 for Binary Floating-Point Arithmetic. August 12 1985. A preliminary draft was published in the January 1980 issue of IEEE Computer, together with several companion articles. Available from the IEEE Service Center, Piscataway, NJ, USA.

[PS85] Franco P. Preparata and Michael I. Shamos. Computational geometry: an introduc- tion. Springer-Verlag New York, Inc., New York, NY, USA, 1985.

[PS93] Larry Palazzi and Jack Snoeyink. Counting and reporting red/blue segment intersections. In Workshop on Algorithms and Data Structures, pages 530–540, 1993.

[PvOV00] Theo Tijssen, Peter van Oosterom, Wilko Quak, and Edward Verbree. The architecture of the geo-information infrastructure. Delft University of Technology, Department of Geodesy, The Netherlands, 2000.

[RHG] Ralf Hartmut Güting, Thomas de Ridder, and Markus Schneider. Implementation of the ROSE algebra: Efficient algorithms for realm-based spatial data types.

[Rös98] Norbert Rösch. Topologische Beziehungen in Geo-Informationssystemen. PhD thesis, Universität Fridericiana zu Karlsruhe (TH), Geodätisches Institut, 1998.

[Sch95] Klamer Schutte. An edge labeling approach to concave polygon clipping, 1995.

[Sch98] Stefan Schirra. Robustness and precision issues in geometric computation. Research Report MPI-I-98-1-004, Max-Planck-Institut für Informatik, Im Stadtwald, D-66123 Saarbrücken, Germany, January 1998.

[Sch05] Holger Schulz. Sicherer Umgang mit Zahlen. Java Spektrum, 6:44–46, 2005.

[SFS05a] The OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture. Technical report, Open Geospatial Consortium, 2005.

[SFS05b] The OpenGIS Implementation Specification for Geographic Information - Simple Feature Access - Part 2: SQL Option. Technical report, Open Geospatial Consortium, 2005.

[SH76] Michael Ian Shamos and Dan Hoey. Geometric intersection problems. In FOCS, pages 208–215, 1976.

[She97] Jonathan Richard Shewchuk. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete & Computational Geometry, 18(3):305–363, October 1997.

[SVH89] Bernard Serpette, Jean Vuillemin, and Jean-Claude Hervé. BigNum: a portable and efficient package for arbitrary-precision arithmetic. Technical Report 2, 1989.

[viv03a] JTS Topology Suite developer’s guide, version 1.4. Technical report, Vivid Solutions Inc., 2003.

[viv03b] JTS Topology Suite technical specifications, version 1.4. Technical report, Vivid Solutions Inc., 2003.

[vO94] Peter van Oosterom. An R-tree based map-overlay algorithm. In Proceedings EGIS/MARI’94: Fifth European Conference on Geographical Information Systems, pages 318–327. EGIS Foundation, March 1994.

[wwwa] Algorithms FAQ - http://www.faqs.org/faqs/graphics/algorithms-faq/ - last checked 01.09.2006.

[wwwb] GeoAPI - http://docs.codehaus.org/display/geo/home - last checked 01.09.2006.

[wwwc] GeoTools project - http://docs.codehaus.org/display/geot/home - last checked 01.09.2006.

[wwwd] International Organization for Standardization (ISO) - http://www.iso.org/ - last checked 01.09.2006.

[wwwe] ISO/TC 211 - http://www.isotc211.org/ - last checked 01.09.2006.

[wwwf] Jordan curve theorem and its generalizations - http://www.math.ohio-state.edu/ - last checked 01.09.2006.
