Extending an Open Source Spatial Database with Geospatial Image Support: an Image Mining Perspective
Muhammad Imran
March, 2009

Thesis submitted to the International Institute for Geo-information Science and Earth Observation in partial fulfilment of the requirements for the degree of Master of Science in Geoinformatics.

Degree Assessment Board
Thesis advisors: Dr. Ir. R.A. (Rolf) de By, Dr. Ir. W. (Wietske) Bijker
Thesis examiners: Chair: Prof. Dr. A. Stein; External examiner: Dr. B.G.H. Gorte

INTERNATIONAL INSTITUTE FOR GEO-INFORMATION SCIENCE AND EARTH OBSERVATION
ENSCHEDE, THE NETHERLANDS

Disclaimer

This document describes work undertaken as part of a programme of study at the International Institute for Geo-information Science and Earth Observation (ITC). All views and opinions expressed therein remain the sole responsibility of the author, and do not necessarily represent those of the institute.

Abstract

The nature of vector data is relatively constant, and it is revised less frequently than remotely sensed earth observation data. Remote sensing images are now collected every 15 minutes from satellites such as Meteosat. In the coming years, very high spatial resolution data is expected to be available freely and frequently. Integrated GIS and remote sensing spatial analysis methods can incorporate different data sources to find attribute associations and patterns of change for knowledge discovery and change detection. GIS-based data such as vector data and DEMs are overlaid with image data, and the results are taken up in a GIS for further processing and analysis. A platform is required to efficiently store, retrieve and manipulate such image data as layers, just like other GIS data layers, for hybrid GIS/RS analysis. In principle, spatial databases are the most suitable candidates for such a platform.
Our work aims to investigate the open source spatial database PostgreSQL/PostGIS (PG/PG) as such a platform, and to provide a solution for image support and an overall framework for integrated remote sensing and GIS analysis. This goes beyond mere storage and retrieval of images in spatial databases. The requirements and available open source libraries were studied extensively to provide such image support. The TerraLib library was proposed and analysed to extend the PG database with image support. To demonstrate the application developed in this study, Meteosat Second Generation (MSG) image data for a large part of Europe was extracted from the ITC data receiver. An application programme was written to construct a time-series image database for the extracted image data with the PG/PG DBMS. A mining application to detect cloud patterns from time-series image and vector data stored in the PG database was developed using the TerraLib conceptual schema. For this, an extensive study of data mining methods was carried out. A statistical data mining method based on principal components analysis was adopted to extract cloud features for the Netherlands from the time-series image data.

Using this research platform and the cloud pattern detection case application, various image mining scenarios were conducted to provide a framework for integrated image and vector data analysis on top of DBMS technology. This framework is particularly useful for studying spatio-temporal phenomena with seasonal or long intervals, and for region-based studies where the regions on a remote sensing image are extracted by vector data.

Keywords

spatial database image support, integrated remote sensing and GIS analysis, data mining, cloud pattern detection, image analysis

Contents

Abstract
List of Figures
List of Tables
Acknowledgements
1 Introduction
  1.1 Motivation and problem statement
  1.2 Research objective
    1.2.1 Research sub-objectives
    1.2.2 Research questions
    1.2.3 Background
  1.3 Project set-up
    1.3.1 Method 1
    1.3.2 Method 2
  1.4 Thesis structure
2 Literature review
  2.1 Introduction
  2.2 Raster data model
  2.3 Data model for image storage inside PG/PG
    2.3.1 Functions for integrated vector/raster analyses
  2.4 Proposed platform for image mining
3 Data mining methods
  3.1 Introduction
  3.2 Classical data mining
    3.2.1 Statistics
    3.2.2 Database-oriented approaches to data mining
    3.2.3 Machine learning approaches for data mining
  3.3 Spatial data mining
    3.3.1 Spatial statistics
    3.3.2 Spatial database approach to data mining
  3.4 Image mining
    3.4.1 Low-level image analysis to feature extraction
    3.4.2 High-level knowledge discovery
    3.4.3 Image mining from integrated image/GIS data analysis
  3.5 Summary
4 A database application development method using TerraLib
  4.1 Introduction
  4.2 TerraLib, TerraView and PostgreSQL/PostGIS set-up for image mining
    4.2.1 TerraLib dependencies on open source third party libraries
  4.3 TerraLib application development
  4.4 Conceptual data model
    4.4.1 Data model for storage
    4.4.2 Data model for visualization
    4.4.3 Image data handling in TerraLib Database
  4.5 Summary
5 Application scenarios for image mining: Results and Discussions
  5.1 Introduction
  5.2 Image mining guided by GIS data
    5.2.1 Introduction
    5.2.2 The Data preparation
    5.2.3 Method and results
    5.2.4 Discussion
  5.3 Extending TerraView for a temporal image query
    5.3.1 Introduction
    5.3.2 Method and results
  5.4 Database-oriented approaches to data mining
    5.4.1 Introduction
    5.4.2 Method and results
  5.5 Conclusion
6 Conclusions and Recommendations
  6.1 Conclusions
  6.2 Recommendations
A Source code for creating time-series image database in PG
B Source code for image mining application scenario Section 5.2
C Source code for image mining application scenario Section 5.3
D Source code for image mining application scenario Section 5.4
Bibliography

List of Figures

1.1 Metadata and Data types with some important fields
1.2 Flow diagram for providing raster support extending PostGIS with CHIP datatype
2.1 Design levels and associated design issues
2.2 The current open source software and related Libraries [32]
4.1 A set-up for cloud detection image mining application
4.2 Singleton design pattern adopted for TerraLib
4.3 Factory design pattern adopted for TerraLib
4.4 Strategy design pattern adopted for TerraLib
4.5 Iterator design pattern adopted for TerraLib
4.6 TerraLib software architecture [79]
4.7 Conceptual data model related to source domain for image and vector data storage in PG, modified from [29]
5.1 Clipping of research area from MSG satellite data with vector data
5.2 An image mining process with integrated image and vector analysis with TerraLib on top of the DBMS technology
5.3 The resulting principal components for two dates at 14:00 hours
5.4 Comparison of image size on disk with size in database using compression
5.5 The views/themes populated as a result of temporal query
5.6 A sequence of steps in a mining process to generate attribute data in the PG database
5.7 Attribute data for PC images as a result of PCA algorithm applied on time-series MSG data
5.8 Time-series cloud patterns analysis for December 13, 2008
5.9 Time-series cloud patterns analysis for December 16, 2008

List of Tables

3.1 Statistical methods for data mining.
3.2 Statistical methods for spatial data mining.
4.1 Third-party libraries used by TerraLib for image support in PG
5.1 Image size on the disk and in the database

Acknowledgements

I would like to sincerely thank Dr. Ir. R.A. (Rolf) de By and Dr. Ir. W. (Wietske) Bijker for their support and guidance throughout this work. I would like to cordially thank Prof. Dr. A. Stein for encouraging and motivating me from my first day at ITC until today. I would like to thank Dr. Javier Morales for helping me handle Linux-specific issues. I would like to affectionately thank all teachers