Exploring PostGIS with a Full Analysis Example

J.C. MARTINEZ, E. COLL, J. IRIGOYEN Department of Cartographic Engineering, Geodesy and Photogrammetry Polytechnic University of Valencia Camino de Vera, 46022, Valencia SPAIN

Abstract: - This paper explains how PostGIS works through a small spatial analysis. The analysis is first performed in the traditional way with a desktop GIS program (ArcGIS, from ESRI), and then repeated using SQL and the spatial extensions included in PostGIS, which follow the ‘Simple Features for SQL’ guidelines of the Open Geospatial Consortium. Finally, the results of the two analyses are assessed and compared.

Key-Words: - PostGIS, PostgreSQL, ArcGIS, SQL Spatial, Open Geospatial Consortium

1 Introduction

PostGIS [8], developed by Refractions Research Inc., adds support for storing geographic objects to the relational database PostgreSQL. In this way, a PostgreSQL server can be used as a backend for a Geographic Information System (GIS). Since spatial analysis is one of the most common tasks in GIS work, this paper first describes the necessary steps and then discusses the problems that appear when carrying out a spatial analysis with PostGIS as the spatial database. For each stage of the analysis, the corresponding query statements are described and commented.

To check the results, the same analysis is also performed with a commercial program: ArcGIS, from ESRI [1]. The main difference between the two is that PostGIS is distributed under the GPL licence, whereas ArcGIS is proprietary software. Furthermore, PostGIS is only a spatial database, while ArcGIS is also a complete desktop GIS solution. Visualizing or designing PostGIS layers therefore requires additional software, such as the programs used in this article: JUMP 1.1.2 [5] and QGIS 0.6 [10].

The hardware and software used for the analysis described in this paper are:

- ArcGIS 9.0 running on Windows XP SP2.
- PostGIS 0.9.0, PostgreSQL 7.4.6 and GEOS 2.0.1 running on SuSE 9.1.
- Hardware: notebook with an Intel Pentium Centrino 1.5 GHz processor and 512 MB of RAM.

2 Spatial analysis principles

The purpose is to find the optimum location for a laboratory that meets the following requirements:

- Located in a brush area (land-use code tuso equal to 300).
- Located on soil suitable for building (soil type code tsuelo greater than 0).
- Located close to the sewer network (less than 300 m away).
- Located away from rivers (more than 20 m from small rivers and more than 40 m from big ones).
- Area larger than 5 000 m2.

2.1 Available Cartography

The datasets provided for the spatial analysis consist of four layers in shape format [4] and one table in dbf format:

- Layer ‘suelos’ (Fig. 1). Polygon layer consisting of 43 features that describe soil types. Attribute table schema: (shape: polygon, tsuelo: short int {0, 1, 2, 3}, {unsuitable, low suitability, medium suitability, high suitability}).
- Layer ‘usos’ (Fig. 2). Polygon layer consisting of 76 features that describe the different kinds of land use. Attribute table schema: (shape: polygon, tuso: short int {100, 200, 300, 400, 500, 600, 700}, {Urban, Agriculture, Thicket, Wood, Water, Wetland, Badlands}).
- Layer ‘rios’ (Fig. 3). Line layer consisting of 106 features that describe the existing rivers. Attribute table schema: (shape: line, trio: short int {1, 2}, {Minor, Major}).

- Layer ‘alcanta’ (Fig. 4). Line layer consisting of 6 features that describe the sewer network. Attribute table schema: (shape: line, id: short int {0}). The attribute id is not used; it was created because a layer in shape format must have at least one field apart from the geometry (in ArcGIS 9).

Fig. 4. Layer ‘alcanta’

- Table ‘riodist’, which maps each river type to the buffer distance used in the proximity analysis:

trio   dist
1      40
2      20

Fig. 1. Layer ‘suelos’
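The attribute requirements of Section 2 and the ‘riodist’ table can be read together as a simple filter plus a lookup. The JavaScript sketch below is illustrative only: the helper names (bufferDistance, meetsAttributeCriteria) and the sample parcels are hypothetical, not part of ArcGIS or PostGIS, and the geometric conditions (distance to sewers and rivers) are left out because they need a spatial engine.

```javascript
// Lookup table 'riodist': river type (trio) -> buffer distance in metres.
const riodist = { 1: 40, 2: 20 };

// Return the buffer distance for a river type, or throw if the type is unknown.
function bufferDistance(trio) {
  if (!(trio in riodist)) {
    throw new Error("unknown river type: " + trio);
  }
  return riodist[trio];
}

// Attribute-only part of the site requirements:
// land use (tuso) must be 300, soil type (tsuelo) must be > 0, area > 5000 m2.
function meetsAttributeCriteria(parcel) {
  return parcel.tuso === 300 && parcel.tsuelo > 0 && parcel.area > 5000;
}

// Hypothetical sample parcels, invented for this illustration.
const parcels = [
  { id: "a", tuso: 300, tsuelo: 2, area: 7000 }, // satisfies all three conditions
  { id: "b", tuso: 300, tsuelo: 0, area: 9000 }, // soil unsuitable for building
  { id: "c", tuso: 200, tsuelo: 3, area: 6000 }, // wrong land use
  { id: "d", tuso: 300, tsuelo: 1, area: 500 }   // too small
];

const candidates = parcels.filter(meetsAttributeCriteria);
console.log(candidates.map((p) => p.id));
console.log(bufferDistance(1), bufferDistance(2));
```

In the analysis itself the same lookup is done as a join between the rivers layer and ‘riodist’ on the trio field.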

The first step in both analyses, ArcGIS and PostGIS alike, was to convert the original shape files to the corresponding spatial database (a geodatabase in ArcGIS, PostGIS tables in PostGIS). In ArcGIS the conversion to a geodatabase is not compulsory, since the analysis operations can be performed on either format. It nevertheless makes the comparison more consistent, because both programs then work from a spatial database.

Fig. 2. Layer ‘usos’

3 ArcGIS Analysis

The spatial analysis was performed with version 9.x, which adds the new ArcInfo analysis engine, made available in ArcGIS through ArcToolbox. The ModelBuilder tool [3] was used because it allows the flowchart to be designed visually (Fig. 5). The resulting model, which defines all the spatial operations to perform, makes it simple to run, modify or repeat the whole analysis.

Fig. 3. Layer ‘rios’

The parameters used in each analysis operation can be seen in the JScript code exported from the ArcGIS analysis model, listed below:

// ------
// ArcGIS generated script
// modified by authors
// ------

// Create the Geoprocessor object
var gp = WScript.CreateObject("esriGeoprocessing.GPDispatch.1");

// Set the necessary product code
gp.SetProduct("ArcInfo");

// Load required toolboxes...
gp.AddToolbox("C:/ArcGIS/ArcToolbox/Toolboxes/Conversion Tools.tbx");
gp.AddToolbox("C:/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx");
gp.AddToolbox("C:/ArcGIS/ArcToolbox/Toolboxes/Analysis Tools.tbx");

// Local variables...
var test = "C:\\test";
var bd_mdb = test + "\\bd.mdb";

// Original Data
var alcanta_shp = test + "\\alcanta.shp";
var rios_shp = test + "\\rios.shp";
var suelos_shp = test + "\\suelos.shp";
var usos_shp = test + "\\usos.shp";
var riodist_dbf = test + "\\riodist.dbf";

// Geodatabase Featureclasses
var alcanta = bd_mdb + "\\alcanta";
var rios = bd_mdb + "\\rios";
var suelos = bd_mdb + "\\suelos";
var usos = bd_mdb + "\\usos";
var riodist = bd_mdb + "\\riodist";

// Intermediate Data
var alcantabuf = bd_mdb + "\\alcantabuf";
var riosbuf = bd_mdb + "\\riosbuf";
var vrios_lyr = test + "\\vrios.lyr";
var vrios_lyr2 = test + "\\vrios.lyr";
var vrios = bd_mdb + "\\vrios";
var inter = bd_mdb + "\\inter";
var difbuf = bd_mdb + "\\difbuf";
var final = bd_mdb + "\\final";
var vinter = test + "\\vinter.lyr";

// Process: Create Personal GDB...
gp.CreatePersonalGDB_management(test, "bd");
// Process: Feature Class To Feature Class...
gp.FeatureClassToFeatureClass_conversion(alcanta_shp, bd_mdb, "alcanta", "", "id id VISIBLE", "DISABLED", "DISABLED", "", "0");
// Process: Buffer...
gp.Buffer_analysis(alcanta, alcantabuf, "300,000000 Meters", "FULL", "ROUND", "ALL", "");
// Process: Feature Class To Feature Class (2)...
gp.FeatureClassToFeatureClass_conversion(rios_shp, bd_mdb, "rios", "", "trio trio VISIBLE", "DISABLED", "DISABLED", "", "0");
// Process: Make Feature Layer...
gp.MakeFeatureLayer_management(rios, vrios_lyr, "", "", "trio trio VISIBLE");
// Process: Table To Table...
gp.TableToTable_conversion(riodist_dbf, bd_mdb, "riodist", "", "trio trio VISIBLE;dist dist VISIBLE", "");
// Process: Add Join...
gp.AddJoin_management(vrios_lyr, "trio", riodist, "trio", "KEEP_COMMON");
// Process: Copy Features...
gp.CopyFeatures_management(vrios_lyr2, vrios, "", "0", "0", "0");
// Process: Buffer (2)...
gp.Buffer_analysis(vrios, riosbuf, "riodist_dist", "FULL", "ROUND", "ALL", "");
// Process: Erase...
gp.Erase_analysis(alcantabuf, riosbuf, difbuf, "");
// Process: Feature Class To Feature Class (3)...
gp.FeatureClassToFeatureClass_conversion(suelos_shp, bd_mdb, "suelos", "", "tsuelo tsuelo VISIBLE", "DISABLED", "DISABLED", "", "0");
// Process: Feature Class To Feature Class (4)...
gp.FeatureClassToFeatureClass_conversion(usos_shp, bd_mdb, "usos", "", "tuso tuso VISIBLE", "DISABLED", "DISABLED", "", "0");
// Process: Intersect...
gp.Intersect_analysis(suelos + ";" + usos, inter, "ALL", "", "INPUT");
// Process: Make Feature Layer (2)...
gp.MakeFeatureLayer_management(inter, vinter, "[tsuelo] > 0 AND [tuso] = 300", "", "FID_suelos FID_suelos VISIBLE;tsuelo tsuelo VISIBLE;FID_usos FID_usos VISIBLE;tuso tuso VISIBLE");
// Process: Intersect (2)...
gp.Intersect_analysis(difbuf + ";" + vinter, final, "ALL", "", "INPUT");

Fig. 5. Spatial analysis (modelbuilder)

4 PostGIS Analysis

To follow this article, the reader needs a working knowledge of SQL (Structured Query Language) for querying and manipulating data [2]; familiarity with the SFS specification is advisable.

PostGIS has been developed by Refractions Research Inc. as a research project in open source spatial database technology. PostGIS is released under the GNU General Public License. It is an extension to the object-relational database PostgreSQL [9] that provides the following:

- ‘Simple Features’ (SF) as defined by the Open Geospatial Consortium (OGC).
- Support for Well-Known Text (WKT) and Well-Known Binary (WKB) representations of GIS objects.
- Fast spatial indexing using GiST.
- Geospatial analysis functions using the GEOS library.
- PostgreSQL JDBC extension objects corresponding to the geometries.
- OGC access functions defined by the Simple Features Specification (SFS).

PostGIS conforms to the OGC [7] specification called SFS, ‘Simple Features for SQL’ [6], the document that describes the spatial predicates and operators. Spatial predicates are functions that compare two spatial objects and return a boolean (true/false) result indicating the existence or absence of a particular spatial relationship. Examples of spatial predicates are Contains(), Intersects(), Touches() and Crosses(). Spatial operators take one or two geometries and return a new derived geometry. Examples of operators include Difference(), Union(), Buffer() and Intersection(). The current SFS version is 1.1; it is based on SQL92, so the object-relational concepts integrated in SQL99 are not included. PostgreSQL would nevertheless be able to work with them, as it is an object-relational database.

4.1 Spatial analysis

Before the spatial analysis itself, some preliminary tasks are necessary, such as creating the database and importing the layers from shape format. A script was written to perform all these tasks:

#!/bin/bash
createdb test                          # create the database
createlang plpgsql test                # install the plpgsql language
psql -f postgis.sql -d test            # PostGIS functionality
psql -f spatial_ref_sys.sql -d test    # reference systems table

# Layer conversion from shp to sql format
shp2pgsql rios.shp rios > rios.sql
shp2pgsql suelos.shp suelos > suelos.sql
shp2pgsql usos.shp usos > usos.sql
shp2pgsql alcanta.shp alcanta > alcanta.sql

# Load the layers into PostGIS
psql -d test -f rios.sql
psql -d test -f alcanta.sql
psql -d test -f suelos.sql
psql -d test -f usos.sql

psql -d test -f test.sql               # spatial analysis

4.2 Test.sql file

-- Buffer of the layer ‘alcanta’
create table tmp1 (gid serial);
select addgeometrycolumn('','tmp1','the_geom',-1,'POLYGON',2);
alter table only tmp1 add constraint tmp1_pkey primary key (gid);
insert into tmp1(the_geom) select buffer(the_geom,300) from alcanta;

-- Dissolve the buffer boundaries
create table alcantabuf (gid serial);
select addgeometrycolumn('','alcantabuf','the_geom',-1,'POLYGON',2);
alter table only alcantabuf add constraint alcantabuf_pkey primary key (gid);
insert into alcantabuf(the_geom) select geomunion(the_geom) from tmp1;

-- Buffer of the layer ‘rios’
create table riodist (trio integer primary key, dist float);
insert into riodist values (1,40);
insert into riodist values (2,20);
create table tmp2 (gid serial);
select addgeometrycolumn('','tmp2','the_geom',-1,'POLYGON',2);
alter table only tmp2 add constraint tmp2_pkey primary key (gid);
insert into tmp2(the_geom) select buffer(r.the_geom,d.dist) from rios as r, riodist as d where r.trio = d.trio;

-- Dissolve the buffer boundaries
create table riosbuf (gid serial);
select addgeometrycolumn('','riosbuf','the_geom',-1,'MULTIPOLYGON',2);
alter table only riosbuf add constraint riosbuf_pkey primary key (gid);
insert into riosbuf(the_geom) select geomunion(the_geom) from tmp2;

-- Intersection of the ‘suelos’ and ‘usos’ layers. Use of spatial indexes
create index suelos_the_geom_idx on suelos using gist (the_geom GIST_GEOMETRY_OPS);
select update_geometry_stats('suelos','the_geom');
create index usos_the_geom_idx on usos using gist (the_geom GIST_GEOMETRY_OPS);
select update_geometry_stats('usos','the_geom');
create table inter (gid serial, tsuelo integer, tuso integer);
select addgeometrycolumn('','inter','the_geom',-1,'MULTIPOLYGON',2);
alter table only inter add constraint inter_pkey primary key (gid);
insert into inter (tuso,tsuelo,the_geom) select * from (select u.tuso, s.tsuelo, multi(intersection(u.the_geom,s.the_geom)) as ge from usos as u, suelos as s where u.the_geom && s.the_geom and intersects(u.the_geom,s.the_geom)) as foo;
create view vinter as select i.gid as gid, i.the_geom as the_geom from inter as i where i.tuso = 300 and i.tsuelo > 0;

-- Difference between ‘alcantabuf’ and ‘riosbuf’
create table difbuf (gid serial);
select addgeometrycolumn('','difbuf','the_geom',-1,'MULTIPOLYGON',2);
alter table only difbuf add constraint difbuf_pkey primary key (gid);
insert into difbuf (the_geom) select multi(difference(c1.the_geom,c2.the_geom)) from alcantabuf as c1, riosbuf as c2 where intersects(c1.the_geom,c2.the_geom);

-- Intersection of the ‘difbuf’ and ‘vinter’ layers. Use of spatial indexes. ‘Final’ layer (Fig. 6)
create table final (gid serial);
select addgeometrycolumn('','final','the_geom',-1,'MULTIPOLYGON',2);
alter table only final add constraint final_pkey primary key (gid);
create index inter_the_geom_idx on inter using gist (the_geom GIST_GEOMETRY_OPS);
select update_geometry_stats('inter','the_geom');
create index difbuf_the_geom_idx on difbuf using gist (the_geom GIST_GEOMETRY_OPS);
select update_geometry_stats('difbuf','the_geom');
insert into final (the_geom) select multi(intersection(v.the_geom,b.the_geom)) from vinter as v, difbuf as b where v.the_geom && b.the_geom and intersects(v.the_geom,b.the_geom);

-- Area calculation
select gid, area(the_geom) as area from final order by area desc;

Fig. 6. ‘Final’ layer (12 records).
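The statements that fill ‘inter’ and ‘final’ pair the && operator with intersects(). In PostGIS, && compares only bounding boxes, which the GiST index can answer cheaply, while intersects() performs the exact geometric test. The JavaScript sketch below mimics that two-step filter on circles, chosen only because their exact test fits in one line. The helper names (bbox, bboxOverlap, circlesIntersect, intersectingPairs) and the sample data are hypothetical, and no index is used.

```javascript
// A circle is {id, x, y, r}. Its bounding box is the enclosing axis-aligned square.
function bbox(c) {
  return { minX: c.x - c.r, minY: c.y - c.r, maxX: c.x + c.r, maxY: c.y + c.r };
}

// Cheap first pass, analogous to the && operator: do the boxes overlap?
function bboxOverlap(a, b) {
  return a.minX <= b.maxX && b.minX <= a.maxX &&
         a.minY <= b.maxY && b.minY <= a.maxY;
}

// Exact second pass, analogous to intersects(): do the circles share a point?
function circlesIntersect(a, b) {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return dx * dx + dy * dy <= (a.r + b.r) * (a.r + b.r);
}

// Run the exact test only on pairs that survive the bounding-box test.
function intersectingPairs(left, right) {
  const pairs = [];
  let exactTests = 0;
  for (const a of left) {
    for (const b of right) {
      if (!bboxOverlap(bbox(a), bbox(b))) continue;
      exactTests++;
      if (circlesIntersect(a, b)) pairs.push([a.id, b.id]);
    }
  }
  return { pairs, exactTests };
}

const left = [{ id: "u1", x: 0, y: 0, r: 10 }];
const right = [
  { id: "s1", x: 15, y: 0, r: 10 },    // circles overlap
  { id: "s2", x: 19, y: 19, r: 10 },   // boxes overlap, circles do not
  { id: "s3", x: 100, y: 100, r: 10 }  // rejected by the box test alone
];
const result = intersectingPairs(left, right);
console.log(result.pairs, result.exactTests);
```

The bounding-box pass alone would wrongly accept the pair (u1, s2), which is why the SQL keeps the exact predicate after it.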

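The buffer() calls in Test.sql return polygons, so every rounded end is approximated by straight segments. As a rough illustration of how the segment count affects area, the JavaScript sketch below compares a full circle of radius 300 m with a regular polygon whose vertices lie on that circle. The function name inscribedPolygonArea is hypothetical, and treating a buffer as a full circle with 8 segments per quarter (32 in total) is a simplifying assumption, not a description of the GEOS algorithm.

```javascript
// Area of a regular polygon with n vertices placed on a circle of radius r.
function inscribedPolygonArea(r, n) {
  return 0.5 * n * r * r * Math.sin((2 * Math.PI) / n);
}

const r = 300;                   // buffer distance in metres
const circle = Math.PI * r * r;  // exact circle area in m2

// Relative area lost (in percent) for a given number of segments per quarter circle.
function lossPercent(segmentsPerQuarter) {
  const area = inscribedPolygonArea(r, 4 * segmentsPerQuarter);
  return (100 * (circle - area)) / circle;
}

for (const q of [8, 16, 64]) {
  console.log(q + " segments per quarter: " + lossPercent(q).toFixed(3) + " % smaller");
}
```

With 8 segments per quarter the polygon comes out about 0.64 % smaller than the circle, and the gap shrinks quickly as segments are added.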
5 Conclusion

The two spatial analyses, performed with different software, produced the same result: 12 areas, 3 of them larger than 5 000 m2. Table 1 shows the areas calculated by each program:

PostGIS        ArcGIS         Difference
7282.540207    7337.198493    54.658286   *
5576.057298    5576.057198    -0.000100
5525.807471    5525.80769     0.000219
3534.371886    3534.372359    0.000472
3053.161178    3053.161189    0.000011
2680.707293    2683.57273     2.865437    *
2644.200245    2680.546649    36.346404   *
2635.5         2635.500241    0.000241
1730.5         1730.499998    -0.000002
574.0386461    574.0386348    -0.000011
117.0016841    115.4272253    -1.574459   *
0.751031182    0.751034653    0.000003

Table 1. Analysis results (m2). Rows marked * differ beyond the third decimal.

As the table shows, except for the rows marked *, the calculated areas are practically the same, agreeing at worst to the third decimal. The differences in the marked rows are explained by the number of segments each program uses to approximate the rounded ends in buffer operations: ArcGIS uses many more segments than PostGIS, which uses 8. In the next PostGIS version, 0.9.1, the buffer operator accepts a third parameter that specifies the number of segments.

Regarding execution time, PostGIS took 5 seconds to complete the whole process (the script of Section 4.1, run from bash), while ArcGIS took 25 seconds. This comparison is not representative of the time spent by the algorithms of each program, however, because the layers used are too small and contain few geometric features.

Regarding layer visualization, both JUMP and QGIS worked well. The advantage of QGIS is that, unlike JUMP, it does not load the complete layer into memory: QGIS queries the PostGIS spatial database directly, while JUMP imports the layers. QGIS also renders faster on screen. The advantages of JUMP are its larger number of options, which makes it more similar to a desktop GIS, and the fact that it is based on the Java Topology Suite (JTS).

PostGIS can also be used as a data source for a map server, for example UMN Mapserver [11]. With this combination, fast viewers and complete Web GIS applications can be created.

6 Acknowledgements

We want to thank all the users who unselfishly contribute to open source projects, especially the PostGIS, PostgreSQL and JUMP user lists, as well as Refractions Research Inc. and all its developers, for their fantastic open source solutions. We also want to thank all the institutions that take part in this kind of project. This work has been partially supported by the research project “Information and Management in Local Administration” BIA2003-07914 from the Spanish Government (CICYT) and the European Union (ERDF funds).

References:
[1] ArcGIS 9.x, http://www.esri.com/software/arcgis/index.html
[2] Celma, M. et al, Bases de datos relacionales, Pearson Prentice Hall, 2003
[3] ESRI, ArcGIS 9. Geoprocessing in ArcGIS, ESRI Digital Book, 2004
[4] ESRI, ESRI Shapefile Technical Description, ESRI White Paper, 1998
[5] JUMP (JUMP Unified Mapping Platform), http://www.jump-project.org/
[6] OGC, Simple Features SQL (SFS), http://www.opengis.org/docs/99-049.pdf
[7] OGC (Open Geospatial Consortium), http://www.opengeospatial.org/
[8] PostGIS, http://postgis.refractions.net/
[9] PostgreSQL, http://www.postgresql.org
[10] QGIS (Quantum GIS), http://qgis.sourceforge.net/
[11] UMN Mapserver, http://mapserver.gis.umn.edu/index.html