Relational Geographic Databases
Dehua Zhao1, Byunggu Yu (corresponding author)1, Bong H. Hong2, and Dan Randolph1

1Department of Computer Science, University of Wyoming, PO Box 3315, Laramie, Wyoming 82071, USA
phone: (307) 766-2440, fax: (307) 766-4036
[email protected]

2Department of Computer Science and Engineering, Pusan National University, San 30, Jangjeon-dong, Gumjeong-gu, Pusan, 609-735, Korea
phone: +82-51-510-2424, fax: +82-51-517-2431
[email protected]

Abstract. This paper proposes a generic relational-database schema that can efficiently accommodate various types of GIS data. The proposed schema complies with the OpenGIS Simple Features Specification for SQL developed by OGC (OpenGIS Consortium) and can be used for any geographic application whose geographic objects are represented based on 2D geometry with linear interpolation between vertices. The generic schema that we propose in this paper makes it possible to automatically generate a relational database schema for any existing or new 2D GIS dataset. This facilitates the migration and deployment of GIS data in well-established relational database environments. Consequently, sharing and integrating GIS data become much more feasible. In addition, since any relational database management system (DBMS) can be used, developing a GIS application system on existing GIS data is facilitated. We verified the proposed schema and automatic schema generation mechanism by developing and testing a relational geographic information system.

Keywords. Geographic Information System, databases, relational databases, schema design.

1 Introduction

Just a few decades ago, paper maps were the principal means of synthesizing and representing geographic information. Paper maps are limited to manual manipulation and fail to meet the increasing demand for interactive manipulation and analysis of geographic data.
The rapid development of new computer software and hardware technologies has made meeting this demand possible: various types of geographic information systems (GISs) that can replace traditional paper maps have been developed. In recent years, a GIS has become more than a cartographic tool for producing digital maps. A GIS provides storage, management, and retrieval of geographic spatial data (e.g., the boundaries of lakes) and related non-spatial data (e.g., names, sizes, and average water temperatures of lakes). The GIS application domain spans many areas, including Urban Planning, Route Optimization, Public Utility Network Management, Demography, Cartography, Agriculture, Natural Resources Management, Coastal Monitoring, Fire Control, and Epidemic Monitoring. In recent years, there has been an increasing demand for database-supported GISs that are streamlined for handling complex statistical or analytical queries.

Most modern GISs have been developed on top of a file system. As a result, each GIS has its own logical data formats and file structures. Unfortunately, these file-system-based GISs suffer from several problems that are well known in the database field: data sharing, data redundancy and inconsistency, transaction control and recovery, concurrency control, and security. The most feasible approach to these problems is to build a GIS on a well-established database model [6]. Perhaps the most successful database model that has been proven to address the problems of early file systems in a reliable manner is the relational database model: almost all database management systems (DBMSs) support it.

Previous work on developing a GIS based on database technology falls into two approaches: the hybrid approach [15, 16] and the integration approach [12, 13, 14, 21, 22].
The hybrid approach uses a DBMS to store and manage non-spatial data, while spatial data is managed separately by either a proprietary file system (e.g., ARC/INFO) [15] or a spatial data manager (e.g., Papyrus) [16]. The integration approach, on the other hand, extends the ER-model (the relational database model) by adding new data types and operations to capture spatial semantics [12, 13, 21, 22] and requires the DBMS to support user-defined ADTs (Abstract Data Types) and operations [14, 21, 22]. In these systems, the major problem is data sharing and migration among heterogeneous systems.

This paper proposes a highly flexible and portable GIS schema called the Generic Relational-Geographic-Information Schema (GRGIS) that can be automatically specialized to accommodate any GIS dataset whose data objects are represented based on 2D geometry with linear interpolation between vertices. The schema is based on the relational database model and a widely used standard SQL (SQL92) [6], and it is fully compatible with the OpenGIS Simple Features Specification for SQL developed by OGC (OpenGIS Consortium) [5]. This paper also proposes a technique called the automatic Schema Generation mechanism for GIS applications (SGGIS) that can automatically generate a relational database schema, given a GIS dataset.

The GRGIS and SGGIS facilitate the migration and deployment of GIS data in well-established relational database environments. Consequently, sharing and integrating GIS data become much more feasible. In addition, since any database management system (DBMS) that supports the basic relational database model and SQL can be used, developing a GIS application system on an existing DBMS and reusing existing sets of geographic data are facilitated. An experimental GIS called the RGIS (Relational Geographic Information System) has been developed to verify the GRGIS and the SGGIS.
The remainder of this paper is organized as follows: Section 2 gives an overview of commonly used GIS data models. Section 3 introduces the GRGIS and SGGIS. Section 4 shows our experimental results. Finally, we provide the summary and discuss our future work in Section 5.

2 Background

In many ways, a GIS presents a simplified view of the real world. Each geographic data object is associated with two kinds of data: non-spatial data and spatial data. The non-spatial data of a geographic object consists of alphanumeric values describing (or associated with) the object. The spatial data of a geographic object represents its geometric properties, which define the geometry (i.e., geometric figure) of the object by defining its interior, boundary, and exterior [8].

Existing spatial data models can be classified into two groups depending on how they view the real world: the field model and the object model [1, 2, 3, 4]. The field model views the world as a continuous surface over which features vary in a continuous distribution (e.g., atmospheric pressure). In this model, the world (i.e., a field) is partitioned into areas, and the emphasis is on the contents of these areas. The object model views the world as a surface littered with recognizable objects.

Another classification of spatial data is based on how the data is represented: raster representation and vector representation. Typically, the field model is developed based on the raster representation. The vector representation, on which the object model is implemented, explicitly stores the geometric features of the identified geographic objects (typically obtained from raster data). It takes much less storage space and provides efficient geometrical and topological operations.
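The contrast between the raster and vector representations can be illustrated with a minimal Python sketch. The grid, the `Polygon` class, and the lake coordinates are hypothetical values chosen for illustration and do not come from the proposed schema. The sketch stores the same square lake once as a grid of cells and once as a ring of vertices with linear interpolation between them.

```python
from dataclasses import dataclass

# Hypothetical 5x5 raster: each cell stores a value (1 = lake, 0 = land).
RASTER = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


@dataclass(frozen=True)
class Polygon:
    """Vector geometry: a closed ring of (x, y) vertices joined by straight segments."""
    ring: tuple

    def area(self) -> float:
        # Shoelace formula over consecutive vertex pairs of the closed ring.
        total = 0.0
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# The same lake as a vector object: four vertices instead of a grid of cells.
LAKE = Polygon(ring=((1, 1), (4, 1), (4, 4), (1, 4)))

raster_area = sum(cell for row in RASTER for cell in row)  # number of lake cells
vector_area = LAKE.area()                                  # area from geometry
```

In this toy case both representations yield the same area, but the vector form stores only the boundary vertices and supports geometric computation directly on them.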
Although the field model is still used in some applications, such as atmospheric and environmental GIS applications, the object model is becoming widely accepted because geometrical and topological operations are necessities for an increasing number of emerging GIS applications. In this paper, we focus on the object model and the vector representation.

2.1 Object Models based on Vector Representation

In the vector representation, objects are constructed from points as primitives. A point is represented by a pair of X, Y coordinates, whereas more complex linear and region objects are represented by structures (lists, sets) built on the point representation. For collections of geographic objects, attention is also given to the representation of topological relationships among the objects. Depending on how topology is expressed, representations of geographic-object collections are usually classified into two models: the spaghetti representation and the topological representation.

In the spaghetti representation, the geometric properties of any spatial object are described independently of other objects. No topological relations are stored, and all topological relations are computed on demand. The topological representation, on the other hand, describes geometrical properties in terms of nodes, arcs, polygons, and regions, together with the topological relations among them. For example, a node is represented by a point and a list of arcs starting (or ending) at this node; an arc is represented by its end nodes and the polygons having the arc as a common boundary; a polygon is represented by a list of arcs.

The main advantage of the spaghetti representation is its simplicity. The drawbacks of this model are mainly due to the lack of explicit information about topological relations among spatial objects. In addition, the spaghetti representation implies data redundancy.
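As a minimal sketch of the two representations, assuming two hypothetical adjacent unit squares A and B, the following Python fragment stores the same pair of regions both ways. All identifiers and coordinates are illustrative, and the per-node arc lists of the topological model are omitted for brevity because they can be derived from the arc table.

```python
# Two unit squares A and B that share the boundary from (1, 0) to (1, 1).

# Spaghetti representation: each polygon carries its own full vertex list.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

# Topological representation: each coordinate pair is stored once as a node.
nodes = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 0), 6: (2, 1)}

# arc id -> (start node, end node, polygons bounded by this arc)
arcs = {
    "a1": (1, 2, ("A",)),
    "a2": (2, 3, ("A", "B")),  # the shared boundary is stored once
    "a3": (3, 4, ("A",)),
    "a4": (4, 1, ("A",)),
    "a5": (2, 5, ("B",)),
    "a6": (5, 6, ("B",)),
    "a7": (6, 3, ("B",)),
}

# polygon id -> list of arcs forming its boundary
polygons = {"A": ["a1", "a2", "a3", "a4"], "B": ["a5", "a6", "a7", "a2"]}


def edges(ring):
    """Return the undirected edges of a closed ring of vertices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def adjacent_spaghetti(p, q):
    # Adjacency is computed on demand by comparing geometry.
    # This simple check only detects edges that are identical in both rings.
    return bool(edges(spaghetti[p]) & edges(spaghetti[q]))


def adjacent_topological(p, q):
    # Adjacency is read directly from the stored arc-polygon relation.
    return any(p in polys and q in polys for _, _, polys in arcs.values())


stored_points_spaghetti = sum(len(ring) for ring in spaghetti.values())
stored_points_topological = len(nodes)
```

In this sketch the spaghetti form stores the endpoints of the shared boundary twice, once per polygon, whereas the topological form stores each node once and answers the adjacency question from the arc table without any geometric comparison.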
For example, the coordinate values representing a boundary shared by two adjacent regions are duplicated. The topological representation can efficiently support some topological queries. For example,