
Clustering and cartographic simplification of point data set

Atta Rabbi and Epameinondas Batsos

Master of Science Thesis in Geoinformatics TRITA-GIT EX 12-001

Division of Geodesy and Geoinformatics Royal Institute of Technology (KTH) 100 44 Stockholm

February 2012

Abstract

As a key aspect of the mapping process, clustering and cartographic simplification play a vital role in assessing the overall utility of Geographic Information Systems. Within the digital environment, a number of research efforts have been undertaken to define this process. Clustering and cartographic simplification are primarily related to visualization, but they are also important tools in decision making. The underlying process is mostly embedded in the system rather than leaving options to the user. Alternative, locally controllable methods are therefore useful, compared with closed, pre-programmed, embedded methods, for a better understanding of the details of this process. In this research an attempt has been made to develop a method for cartographic simplification through clustering. A point data set has been segmented into two different cluster groups by using two classic clustering techniques (the k-means clustering algorithm and the agglomerative hierarchical algorithm), and the cluster groups have then been simplified to avoid point congestion in a map of smaller scale than the original. The produced results show the data segmented into two different cluster groups and a simplified form of them. Some visual disturbances remain in the simplified data, which leaves scope for further research.

Keywords: clustering, k-means clustering algorithm, agglomerative hierarchical clustering algorithm, cartographic simplification.

Acknowledgements

During our master thesis we were fortunate enough to have two supervisors. Dr. Irene Rangel acted as a supervisor/adviser during the initial part of the work and introduced us to her ideas and thought-provoking impulses, especially regarding the thesis topic and the structure of the literature review. We are very grateful to our chief adviser, PhD student Bo Mao, who patiently tried to understand the ideas presented in this thesis when they first evolved and who provided valuable guidance as the work matured. We also thank him for all the endless discussions and suggestions, for acting as a supervisor/adviser of this work, and for always being there to support us.

In addition, we would also like to thank Prof. Yifang Ban for her encouraging support, for serious and open discussions, for the liberties offered in the thesis work, and for proof-reading this thesis.

A few months of our work were spent at the Geoinformatics department on the KTH campus.

Last but not least we owe thanks to our classmates and all the others in the department for supporting us and providing a friendly atmosphere. Their feedback greatly improved this thesis. Finally, we would like to thank our families and friends for the moral and mental support in good as well as in less good periods of our work.

With greatest joy, Atta Rabbi & Epameinondas Batsos

Table of Contents

Abstract
Acknowledgements
Table of Contents
1. Introduction
1.1 Background
1.2 Objectives
1.3 Thesis organization
2. Literature review
2.1 Cluster analysis
2.1.1 Clustering
2.1.2 Categorization of clustering techniques
2.1.3 K-means clustering algorithm
2.1.4 Agglomerative hierarchical clustering algorithm
2.2 Cartographic Generalization
2.2.1 Historical overview and existing definitions
2.2.2 Definition, needs, usefulness and advantages
2.2.3 Relation between map scale and generalization
2.3 Generalization in digital systems
2.3.1 Development
2.3.2 Operators
2.4 Related topics on data clustering & cartographic generalization
3. Methodology of the research
3.1 Data and computing environment
3.2 Clustering data with k-means algorithm
3.2.1 Preliminary parameters of k-means algorithm
3.2.2 Architecture of the basic k-means algorithm
3.2.3 Implementation of k-means algorithm
3.3 Clustering data with agglomerative hierarchical algorithm
3.3.1 Architecture of the agglomerative hierarchical clustering algorithm
3.3.2 Implementation of agglomerative hierarchical clustering algorithm
3.4 Cartographic simplification of clustered point data set
4. Results & Discussion
4.1 K-means clusters
4.2 Agglomerative hierarchical clusters
4.3 Comparison between k-means & agglomerative hierarchical clusters
4.4 Cartographic simplification
Conclusion
Appendix A
Appendix B
Appendix C
References

1. Introduction

1.1 Background

The term “generalization” is used in many contexts of science and everyday life. In cartography, generalization retains the notion of abstraction and of extracting the general overriding principles, similar to its usage in other disciplines. However, it is extended from the purely semantic to the spatial (and possibly temporal) domain. In cartography, map generalization is the process of deriving, from a detailed source spatial database, a map or database whose contents and complexity are reduced, while retaining the major semantic and structural characteristics of the source data appropriate to a required purpose (Weibel R. & Jones C.B, 1998). It is typically associated with a reduction in the scale at which the data are displayed, a classic example being the derivation of a topographic map at 1:100,000 from a source map at 1:50,000, but it should never be equated merely with scale reduction. It is evident that a small-scale map contains less detailed information than a large-scale map of the same area. Not only the scale but also the theme of the map specifies the density of the represented data. The process of reducing the amount of data and adjusting the information to the given scale and theme is called cartographic generalization (Muller, 1991; Weibel & Dutton, 1999). This procedure abstracts and reduces the information from reality while meeting cartographic specifications and maintaining the significant characteristics of the mapped area. It can easily be seen that this process is very complex and thus time-consuming. At the same time, it represents a process of informed extraction and emphasis of the essential while suppressing the unimportant, maintaining logical and unambiguous relations between map objects, maintaining the legibility of the map image, and preserving accuracy as far as possible. Mainly with the spread of new media, the speed of the information delivery process has become increasingly important. Today real-time generalization is required, since no user expects to wait more than a few seconds for a visualization of a personalized map over the internet (Feringa, 2001).

Generalization processes require maintaining the overall structures and patterns present in the source map or databases. Thus the recognition of structure, mostly implicit and previously unknown, has been an important task for achieving a better generalization outcome. Over the past two decades, tremendous research efforts have been spent in the GIS community on knowledge acquisition for automatic generalization based on rule-based systems (Buttenfield & McMaster, 1991). Parallel to cartographic knowledge, geographic knowledge has been emerging as a similar concept in the context of spatial data mining with large databases (Miller & Han, 2001). Such a geographic knowledge discovery process appears to have potential applications to generalization processes, for instance using spatial clustering techniques to find clusters in a multi-dimensional dataset.

On the other hand, interactive visual tools integrated with clustering techniques provide tremendously powerful means to conduct knowledge discovery tasks (Seo & Shneiderman, 2002). Spatial clustering is a way of detecting groups of spatial objects, or clusters, so that the objects in the same cluster are more similar to each other than to those in different clusters. Clusters in large datasets can be used for visualization, in order to help human analysts in identifying groups and subgroups that have similar characteristics. Spatial clustering has been an important technique for spatial data mining and knowledge discovery (Han, 2001).

One of the most frequently addressed generalization tasks has to do with the simplification of point clusters (Lu et al., 2001). In the existing approaches, point distributions are usually described by means of convex hulls (Wu, 1997), Delaunay triangles (Ai & Liu, 2002; Qian, 2006), the relationship between the fractal dimension and the scale (Wang, 1998), etc. While these approaches have been successfully implemented for a number of GIS applications, they are hardly able to simultaneously satisfy the two challenging requirements raised by the growing applications of WebGIS: real-time data generalization and preservation of distributional characteristics.

1.2 Objectives

Every research project has a final goal to be accomplished. This research intends to represent a point data set without visual complexity when the scale is changed. To reach this goal, we have set a number of objectives. These are:

• Segmentation of the sample point data set into an optimum number of clusters, with an equilibrium distribution of points among the clusters

• Cartographic simplification of the points in each cluster, carried out methodically, to reduce the complexity of point visualization at the transformed scale (a smaller scale than the original)


1.3 Thesis organization

The research work presented here is divided into four chapters with the following content:

• Chapter 1: The introductory chapter presents the theoretical background, the main objectives and the organization of the thesis.

• Chapter 2: Presents the literature review related to the thesis. It starts with a presentation of different approaches to data clustering and contains descriptions of cluster analysis with particular reference to two classic clustering techniques. It continues with definitions and basic theory of cartographic and digital generalization. Related topics on data clustering and cartographic generalization close the chapter.

• Chapter 3: Presents each step of the methodology of the research analytically. It starts with a short presentation of the point data and the computing environment, continues with the architecture and the implementation of each selected clustering algorithm, and finally develops the methodology for cartographic simplification using the two different clustering results.

• Chapter 4: The last chapter contains the clustering results of the two clustering algorithms with the corresponding discussions and a comparison of cluster quality. It ends with the results of cartographic simplification using the clustering results of the two different clustering techniques.

Following these chapters, three appendices can be found at the end of this work. Appendix A presents the full code of the k-means algorithm in Matlab, Appendix B presents the full 303-by-4 inconsistency coefficient matrix which is generated to select the number of desired clusters in hierarchical clustering, and Appendix C shows the full implementation code of the agglomerative hierarchical clustering.

2. Literature review

This chapter initially presents the concept of clustering and the categorization of clustering techniques, focusing on the k-means clustering algorithm and the agglomerative hierarchical clustering algorithm. It then gives a historical overview of the development of cartographic generalization and of its path towards automation. It analyzes the concept and parameters of generalization in digital systems and refers to the relationship between map scale and generalization, then presents map generalization operators (with more emphasis on cartographic simplification for point data) and geometric operators, and finally concludes with some related topics on data clustering and cartographic generalization.

2.1 Cluster analysis

2.1.1 Clustering

The concept of clustering has been around for a long time. It is one of the important techniques in data mining and geographic knowledge discovery. It has several applications, particularly in the context of information retrieval and in organizing web resources. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with many attributes of different types. This imposes unique computational requirements on relevant clustering algorithms. There are many different algorithms available for clustering. The development of improved clustering algorithms has received a lot of attention in the last decade, and there is an increasing interest in the use of clustering methods in pattern recognition, image processing and information retrieval.


Clustering is often confused with classification, but there is a difference between the two procedures. In classification the objects are assigned to predefined classes, whereas in clustering the classes are formed. The term "class" is in fact frequently used as a synonym for the term "cluster".

• What cluster analysis is

Cluster analysis groups objects based on the information found in the data describing the objects or their relationships. The goal is to group the objects of a database into meaningful subclasses (that is, clusters) so that the members of a cluster are as similar as possible to each other, whereas members of different clusters differ as much as possible from each other.

Clusters in large databases can be used for visualization, in order to help human analysts in identifying groups and subgroups that have similar characteristics.

What constitutes a cluster is not well defined, and in many applications clusters are not well separated from one another. Nonetheless, most cluster analysis seeks as a result a crisp classification of the data into non-overlapping groups.

To better understand the difficulty of deciding what constitutes a cluster, consider Figure 2.1, which shows twenty points and three different ways that they can be divided into clusters. If we allow clusters to be nested, then the most reasonable interpretation of the structure of these points is that there are two clusters, each of which has three sub-clusters. However, the apparent division of the two larger clusters into three sub-clusters may simply be an artifact of the human visual system. Finally, it may not be unreasonable to say that the points form four clusters. Thus we stress once again that the definition of what constitutes a cluster is imprecise, and the best definition depends on the type of data and the desired results.


Figure 2.1: Different ways of clustering the same set of points (source: Pang Ning Tan, M. Steinbach, V. Kumar, 2006)

• What cluster analysis is not

Cluster analysis is a classification of objects from the data, where by classification we mean a labeling of objects with class (group) labels. As such, clustering does not use previously assigned class labels, except perhaps for verification of how well the clustering worked. Thus, cluster analysis is distinct from pattern recognition or the areas of statistics known as discriminant analysis and decision analysis, which seek to find rules for classifying objects given a set of pre-classified objects.

While cluster analysis can be useful in the previously mentioned areas, either directly or as a preliminary means of finding classes, there is much more to these areas than cluster analysis. For example, the decision of what features to use when representing objects is a key activity of fields such as pattern recognition. Cluster analysis typically takes the features as given and proceeds from there.


Thus, cluster analysis, while a useful tool in many areas (as described later), is normally only part of the solution to a larger problem which typically involves other steps and techniques.

2.1.2 Categorization of clustering techniques

Many diverse techniques have appeared in order to discover cohesive groups in large datasets. In the following sections, we first present the two classic clustering techniques and then some specific clustering techniques which focus on particular problems and data sets.

Basic clustering techniques

Hierarchical (nested) approach: The hierarchical approach produces a nested sequence of partitions, with a single, all-inclusive cluster at the top and singleton clusters of individual points at the bottom. Each intermediate level can be viewed as combining (splitting) two clusters from the next lower (next higher) level. Hierarchical algorithms create a hierarchical decomposition of the objects. They are either agglomerative (bottom-up) or divisive (top-down). The agglomerative algorithms start with each object being a separate cluster, and successively merge groups according to a distance measure. The clustering may stop when all objects are in a single group or at any other point the user wants. The divisive algorithms follow the opposite strategy: they start with one group of all objects and successively split groups into smaller ones, until each object forms its own cluster, or until stopped as desired.

The following figures indicate different ways of graphically viewing the hierarchical clustering process. Figures 2.2a and 2.2b illustrate the more "traditional" view of the hierarchical technique as a process of merging two clusters or splitting one cluster into two. Figures 2.2c and 2.2d show a different hierarchical clustering, one in which points p1 and p2 are grouped together in the same step that also groups points p3 and p4 together.

Figure 2.2a: Hierarchical clustering 1(source: L.Kaufman & P. J. Rousseeuw, 1990), Figure 2.2b: Traditional dendrogram 1(source: Periklis Andritsos, 2002)

Figure 2.2c: Hierarchical clustering 2 (source: L.Kaufman & P. J. Rousseeuw, 1990), Figure 2.2d: Traditional dendrogram 2 (source: Periklis Andritsos, 2002)

Hierarchical algorithms: AGNES, DIANA, BIRCH, ROCK, CHAMELEON


Partitioning (unnested) approach: The partitional approach (Figure 2.3) creates a division of the data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. Contrast this with traditional hierarchical schemes, which bisect a cluster to get two clusters or merge two clusters to get one.

Figure 2.3: Partitional Clustering (source: L.Kaufman & P. J. Rousseeuw, 1990)

Partitional algorithms: k-means, k-medoids, CLARANS (based on randomized search)

Partitional and hierarchical methods can be integrated. This would mean that a result given by a hierarchical method can be improved via a partitional step, which refines the result via iterative relocation of points. Other classes of clustering algorithms are given in the next subsection.
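As a concrete illustration of this integration, the following MATLAB sketch (not the implementation used in this thesis) first cuts an agglomerative hierarchy into k groups and then refines that partition with an iterative-relocation (k-means) step. The data, the linkage method and the value of k are illustrative assumptions, and the Statistics Toolbox functions pdist, linkage, cluster and kmeans are used.

    % Hybrid sketch: a hierarchical clustering result refined by a partitional step.
    X = rand(200, 2);                       % illustrative 2-D point data set
    k = 4;                                  % assumed number of clusters

    % Hierarchical stage: build the tree and cut it into k clusters.
    Z      = linkage(pdist(X), 'average');  % agglomerative hierarchy
    labels = cluster(Z, 'maxclust', k);     % initial partition taken from the tree

    % Use the centroid of each hierarchical cluster as a seed.
    C0 = zeros(k, size(X, 2));
    for i = 1:k
        C0(i, :) = mean(X(labels == i, :), 1);
    end

    % Partitional stage: iterative relocation of points, seeded with C0.
    [refinedLabels, refinedCentroids] = kmeans(X, k, 'Start', C0);

Because the relocation step only moves points between clusters when this reduces the within-cluster squared error, the refined partition can never be worse, in that respect, than the hierarchical cut it started from.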

Data mining clustering techniques

Apart from the two main categories of partitional and hierarchical clustering approaches, many other methods have emerged in cluster analysis; they are mainly focused on specific problems or on the specific data sets available. These methods are:

Density-Based Clustering: These algorithms group objects according to specific density objective functions. Density is usually defined as the number of objects in a particular neighborhood of a data object. In these approaches a given cluster continues growing as long as the number of objects in the neighborhood exceeds some parameter. This is considered to be different from the idea in partitional algorithms that use iterative relocation of points given a certain number of clusters.

Typical methods: DBSCAN, OPTICS, DenClue
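To make the density idea concrete (a cluster keeps growing as long as the neighborhood of its points contains enough objects), the following MATLAB sketch implements a simplified, DBSCAN-like region growing. It is only an illustration, not any of the cited algorithms verbatim; the radius eps, the threshold minPts and the function name are assumptions, and pdist/squareform come from the Statistics Toolbox.

    % Simplified density-based clustering sketch (DBSCAN-like region growing).
    function labels = densityClusterSketch(X, eps, minPts)
        n      = size(X, 1);
        labels = zeros(n, 1);                 % 0 = unassigned (noise)
        D      = squareform(pdist(X));        % full pairwise distance matrix
        clusterId = 0;
        for i = 1:n
            if labels(i) ~= 0
                continue;                     % already assigned
            end
            neighbours = find(D(i, :) <= eps);
            if numel(neighbours) < minPts
                continue;                     % not dense enough to seed a cluster
            end
            clusterId = clusterId + 1;        % seed a new cluster at a dense point
            labels(i) = clusterId;
            queue = neighbours(:)';
            while ~isempty(queue)             % grow while dense neighbourhoods remain
                p = queue(1);
                queue(1) = [];
                if labels(p) == 0
                    labels(p) = clusterId;
                    pNeighbours = find(D(p, :) <= eps);
                    if numel(pNeighbours) >= minPts
                        queue = [queue, pNeighbours(:)'];   %#ok<AGROW>
                    end
                end
            end
        end
    end

A call such as labels = densityClusterSketch(rand(300, 2), 0.05, 5) returns one label per point; points left with label 0 are treated as noise rather than being forced into a cluster.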

Grid‐Based Clustering: The main focus of these algorithms is spatial data, i.e., data that model the geometric structure of objects in space, their relationships, properties and operations. The objective of these algorithms is to quantize the data set into a number of cells and then work with objects belonging to these cells. They do not relocate points but rather build several hierarchical levels of groups of objects. In this sense, they are closer to hierarchical algorithms but the merging of grids, and consequently clusters, does not depend on a distance measure but it is decided by a predefined parameter.

Typical methods: STING, WaveCluster, CLIQUE

Model‐Based Clustering: These algorithms find good approximations of model parameters that best fit the data. They can be either partitional or hierarchical, depending on the structure or model they hypothesize about the data set and the way they refine this model to identify partitioning. They are closer to density‐based algorithms, in that they grow particular clusters so that the preconceived model is improved. However, they sometimes start with a fixed number of clusters and they do not use the same concept of density.

Typical methods: EM, SOM, COBWEB

Categorical Data Clustering: These algorithms are specifically developed for data where Euclidean, or other numerical-oriented, distance measures cannot be applied.

Typical methods: pCluster


2.1.3 K‐means clustering algorithm

K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem, and it is one of the popular partitioning clustering techniques, which attempt to decompose the data set into disjoint clusters. The method was developed by MacQueen in 1967 and later by J. A. Hartigan and M. A. Wong around 1975. MacQueen suggested the name k-means for describing his algorithm, which assigns each item to the cluster having the nearest centroid (mean) (Li, Z.L, 1997). The basic k-means algorithm is based on a squared-error minimization method and it works as follows.

Figure 2.4: Function of k‐means algorithm


• Randomly choose k data points from the whole data set as initial cluster centers (centroids) and determine the centroid coordinates (Figure 2.5).

Figure 2.5: Assign k initial cluster centers randomly (source: L.Kaufman & P. J. Rousseeuw, 1990)

• Calculate the Euclidean distance of each data point from each cluster center and assign each data point to its nearest cluster center (centroid) (Figure 2.6).

Figure 2.6: Assign each item to closest center (source: L.Kaufman & P. J. Rousseeuw, 1990)

• Calculate the new cluster center of each newly assembled cluster so that the squared error distance within each cluster is minimized (Figure 2.7).

Figure 2.7: Move each center to the mean of the cluster (source: L.Kaufman & P. J. Rousseeuw, 1990)


Figure 2.8: Reassign points to closest center and iterate (source: L.Kaufman & P. J. Rousseeuw, 1990)

• Repeat steps 2 & 3 until the cluster centers (centroids) do not change.

• The algorithm stops when the changes in the cluster seeds from one stage to the next are close to zero or smaller than a pre-specified value (no object moves group). Every object is assigned to only one cluster.

The k-means procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. K-means is an iterative algorithm, and the accuracy of the k-means procedure is basically dependent upon the choice of the initial seeds. To obtain better performance, the initial seeds should be very different from one another.
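The full Matlab implementation used in this work is given in Appendix A. As a minimal, self-contained sketch of the steps listed above (random initial centroids, nearest-centroid assignment, centroid update, iteration until the centroids stop moving), one could write something like the following; the data matrix X, the value of k and the stopping tolerance are illustrative assumptions, and only base MATLAB is used.

    % Basic k-means sketch following the steps described above.
    function [labels, C] = kmeansSketch(X, k, tol)
        n   = size(X, 1);
        idx = randperm(n);
        C   = X(idx(1:k), :);                  % step 1: k random points as initial centroids
        while true
            % Step 2: assign every point to its nearest centroid (squared
            % Euclidean distances give the same assignment as Euclidean ones).
            D = zeros(n, k);
            for j = 1:k
                D(:, j) = sum(bsxfun(@minus, X, C(j, :)).^2, 2);
            end
            [~, labels] = min(D, [], 2);

            % Step 3: move each centroid to the mean of its cluster.
            Cnew = C;
            for j = 1:k
                members = (labels == j);
                if any(members)
                    Cnew(j, :) = mean(X(members, :), 1);
                end
            end

            % Stop when the centroids move less than the tolerance.
            if max(abs(Cnew(:) - C(:))) < tol
                C = Cnew;
                break;
            end
            C = Cnew;
        end
    end

A call such as [labels, C] = kmeansSketch(X, 2, 1e-6) returns a cluster label for every point and the final centroids; as noted above, different random seeds can yield different partitions.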

2.1.4 Agglomerative hierarchical clustering algorithm

Agglomerative hierarchical clustering is a bottom-up clustering method where clusters have sub-clusters, which in turn have sub-clusters, etc. The classic example of this is species taxonomy. Gene expression data might also exhibit this hierarchical quality (e.g. neurotransmitter gene families). Agglomerative hierarchical clustering starts with every single object (gene or sample) in a single cluster. Then, in each successive iteration, it agglomerates (merges) the closest pair of clusters satisfying some similarity criteria, until all of the data is in one cluster. Figures 2.9 and 2.10 show how hierarchical clustering works.

Figure 2.9: Hierarchical clustering

Suppose there are six sample data points which will be put into clusters with hierarchical clustering. These points are a, b, c, d, e and f. When they are plotted, the distances between the points can be seen as shown in the figure above. If the user runs the hierarchical agglomerative algorithm to make clusters out of this sample data set, it will create six individual clusters in the first step, where each point is treated as a single cluster. In the second step, it will try to merge the closest clusters; in this case it will merge 'b' and 'c' together and 'd' and 'e' together, resulting in two new clusters, 'bc' and 'de'. In the next step, it will merge 'f' with 'de', as they are closer than the other clusters, and a new cluster 'def' is produced. In the next step, as the clusters 'bc' and 'def' are the closest, they will be merged into one cluster 'bcdef'. In the final step, 'a' will be merged with 'bcdef' and all the data points lie in one cluster 'abcdef'. Here it can be seen that in agglomerative hierarchical clustering, the process of clustering goes on until the whole data set becomes one cluster.


Figure 2.10: Agglomerative hierarchical clustering

The hierarchy within the final cluster has the following properties: 1. Clusters generated in early stages are nested in those generated in later stages. 2. Clusters with different sizes in the tree can be valuable for discovery.

A Matrix Tree Plot visually demonstrates the hierarchy within the final cluster, where each merger is represented by a binary tree. The general process of the agglomerative hierarchical clustering algorithm is:

1. Assign each object to a separate cluster
2. Evaluate all pair-wise distances between clusters (distance metrics are described in the Distance Metrics part of the methodology)
3. Construct a distance matrix using the distance values
4. Look for the pair of clusters with the shortest distance
5. Remove the pair from the matrix and merge them
6. Evaluate all distances from this new cluster to all other clusters, and update the matrix
7. Repeat until the distance matrix is reduced to a single element
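The implementation used in this thesis is listed in Appendix C. As a hedged sketch of the same process, the MATLAB Statistics Toolbox expresses steps 2-7 through pdist, linkage and cluster, and the inconsistency coefficients (cf. the matrix reproduced in Appendix B) can be inspected with inconsistent. The data set, the 'single' linkage choice and the cut level k below are illustrative assumptions.

    % Agglomerative hierarchical clustering sketch (Statistics Toolbox).
    X = rand(100, 2);                    % example point data set

    d = pdist(X);                        % steps 2-3: all pair-wise distances
    Z = linkage(d, 'single');            % steps 4-7: repeatedly merge the closest pair
    dendrogram(Z);                       % binary-tree view of the merging sequence

    Y = inconsistent(Z);                 % inconsistency coefficients for each merger
    k = 2;                               % desired number of clusters
    labels = cluster(Z, 'maxclust', k);  % cut the hierarchy into k clusters

Replacing 'single' with 'complete' or 'average' changes the between-cluster distance metric and, as noted below, may generate different results.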


Figure 2.11: Flow chart of hierarchical agglomerative clustering

Advantages of hierarchical agglomerative clustering are that it can produce an ordering of the objects, which may be informative for data display, and that smaller clusters are generated, which may be helpful for discovery. This clustering process has some disadvantages as well.

There are no provisions for a relocation of objects that may have been 'incorrectly' grouped at an early stage. Use of different distance metrics for measuring distances between clusters may generate different results.


2.2 Cartographic Generalization

2.2.1 Historical overview and existing definitions

Cartographic generalization has been discussed and analyzed by various geographers and cartographers since the early 1900s. They have struggled for centuries with the difficulties of map generalization and the representation of Earth features. In attempting to explain the process, each author has approached the topic from a different viewpoint. Some have methodically outlined what they perceive to be the proper steps for the cartographer to take when generalizing from large- to small-scale maps. Others have admitted their inability to describe accurately what cartographers do when generalizing a map. The definitions in this section illustrate the wide variety of viewpoints adopted by geographers and cartographers during the past century.

It could be argued that the first published work that addressed the problem of cartographic generalization was produced in the early twentieth century by Max Eckert (Die Kartenwissenschaft, 1921). In his writing, Eckert asserted that "cartographic generalization depends on personal and subjective feelings" and therefore "bridged between the artistic and scientific side of the field".

It was not until the early 1960s that other significant writings on the cartographic generalization process appeared in the geographical literature. Like Max Eckert, J.K. Wright discussed the scientific integrity of maps (Wright, 1942). Cartographic generalization, as described by Wright, distinctly affects this scientific integrity and consists of two components: simplification (identified as the manipulation of raw information that was too intricate or abundant to be fully reproduced) and amplification (explained as the manipulation of information that is too scanty). These terms may in fact represent one of the first attempts to isolate and define the precise elements within the comprehensive activity of generalization. Erwin Raisz (General Cartography, 1948) presented an overly simplistic view of generalization. In a later version of the book (General Cartography, 1962), Raisz's discussion of generalization had been greatly expanded. Generalization had no rules according to Raisz, but consisted of the processes of combination, omission and simplification.

Work by Arthur H. Robinson traced developments in generalization. From 1953 to 1984, Robinson's textbook (Elements of Cartography, 1953) summarized most of the significant research in generalization. The second edition of this seminal book (Elements of Cartography, 1960) identified the generalization process as having three significant components: (1) selection of the objects to be shown, (2) simplification of their form, and (3) evaluation of the relative significance of the items being portrayed. Robinson also speculated on the significance of subjectivity in the generalization process. Despite attempts to analyze the process of generalization, Robinson proposed in 1960 that it would be impossible to set forth a consistent set of rules that could prescribe exactly the procedures for unbiased cartographic generalization.

More recently, the "Multilingual Dictionary of Technical Terms in Cartography" prepared by the International Cartographic Association (ICA) defines cartographic generalization as "the selection and simplified representation of detail appropriate to the scale and/or purpose of the map", while Brophy (1973) maintains that "generalization is an ambiguous process which lacks definite rules, guidelines or systematization". Keates (1973), on the other hand, explains the outcome of the generalization process by describing it as "that which affects both location and meaning… as the amount of space available for showing features on the map decreases in scale, less location information can be given about features, both individually and collectively".

By the fourth edition of the text by Arthur H. Robinson, Randall Sale and Joel Morrison (Elements of Cartography, 1978), one entire chapter had been devoted to the topic of cartographic generalization, covering both the four elements of the process: simplification (determine important characteristics of the data, eliminate unwanted detail, retain and possibly exaggerate important characteristics), classification (ordering, scaling and grouping of data), symbolization (graphic coding of scaled and grouped essential characteristics, comparative significances and relative positions) and induction (application of the logical process of inference); and the four controls: objective (purpose of the map), scale (ratio of map to earth), graphic limit (capability of the system used to make the map and perceptual capabilities of viewers) and quality of the data (reliability and precision of the various kinds of data being mapped). This formal structure of generalization as developed by Robinson and his colleagues became the standard reference for cartographers in the 1970s and early 1980s. Traylor (1979: 24) also contributed to the ICA definition by stating that generalization consists of "the selection and simplified representation of the phenomena being mapped, in order to reflect reality in its basic, typical aspects and characteristic peculiarities in accordance with the purpose, the subject matter and the scale of the map". In addition, Koeman and Van der Weiden (1970) examined another aspect of the generalization process by considering the amount of information at the cartographer's disposal and the skill of the cartographer.

Finally, some important later definitions were asserted by Goodchild (1991): "Cartographic generalization is the simplification of observable spatial variation to allow its representation on a map"; Müller (1991): "Cartographic generalization is an information-oriented process intended to universalize the content of a spatial database for what is of interest"; and Jones and Ware (1998): "Cartographic generalization is the process by which small scale maps are derived from large scale maps. This requires the application of operations such as simplification, selection, displacement and amalgamation to map features subsequent to scale reduction".

2.2.2 Definition, needs, usefulness and advantages

Cartographic generalization, both as a definition and as a procedure, is misinterpreted by a large percentage of people as a simple zoom in or zoom out of the map. This is far from the true and correct meaning. Zooming out to a small-scale map would result in overcrowding of cartographic objects, because the cartographic surface is no longer large enough to include them in a way that is distinct and understandable (Figure 2.12).

Figure 2.12: Difference between scaling map & generalizing map (source: Bader M., 2001)


Each map scale is created for a different purpose; this means that a topographic map at scale 1:25.000 must include different cartographic information and cartographic objects compared with the generalized map at scale 1:50.000. In the simple example below (Figure 2.13) we can discern the difference. Note that several streets of the 1:25.000 map disappear on the 1:50.000 map, where only the main streets remain. If we showed all the roads of the 1:25.000 map on the 1:50.000 map, the cartographic objects would become crowded and would render the map illegible and obscure.

Figure 2.13: Cartographic generalization (source: Batsos E. & Politis P., 2006)

Cartographic generalization is an integral part of the map production process and one of the basic features of cartographic representation; its content is to extract and generalize the elements of geographical phenomena and objects, in accordance with cartographic principles and expert knowledge, in order to obtain representations at different scales. At the same time, appropriate generalization should guarantee that the map is a reflection of the spatial variability of the Earth's surface and of the characteristics of the represented objects most important to the map user. Cartographic generalization is a composite process encompassing the wide range of relations between the geographical area (with all its aspects being the subject of investigation in various disciplines) and the great diversity of maps that constitute its reflection. It is also a specific, composite set of processes, primarily based on logic, which is reflected in the graphic design of the map and in turn makes possible the correct perception and interpretation of the cartographic image.

It aims to move from larger-scale maps to smaller-scale maps (Figure 2.14). It describes the criteria, procedures and transformations (through which the change of scale is ensured in a qualitative rather than a quantitative manner), focusing on the features and information of the map that the scale ordains to highlight. It also aims at reducing the complexity of the map while conveying its important information, maintaining the logical and categorical relations between its cartographic objects and the visual quality, with the objective of maintaining the legibility of the graphic so that the map is readable and comprehensible and clarifies the information which the reader wants.


Figure 2.14: Transition from large scale map to small scale map (source: Batsos E. & Politis P., 2006)

The significance of cartographic generalization becomes clear when one considers that the two perhaps most important criteria for producing a map are associated with the purpose it serves and with the scale. Deductively we conclude that the main aims of cartographic generalization are:

• The modification of the qualitative and quantitative data
• Reducing the number of depicted details
• Simplification of graphic forms

Various definitions of cartographic generalization have been formulated over time, as we saw in the previous subchapter. According to the International Cartographic Association, "Generalization is a selected and simplified representation of details appropriate to the scale or purpose of the map. It is also the procedure that, according to the principles of selection, molding and composition, represents on the map the basic features of the real world" (E. Batsos & P. Politis, 2006).

Overall, the cartographic generalization criteria are:

The scale of the map: Map scale is defined as the ratio of a distance measured on the map to the same distance measured on the earth. The map scale has a decisive role in the process of cartographic generalization and defines the generalization operators and the algorithms used for it.

The purpose of the map: A correct map should reflect those spatial entities that are necessary for the needs of the users in relation to the purpose of the map, while maintaining their priority in proportion to their importance.

The specificity of the cartographic region: Cartographic generalization is affected differently in a rural area than in urban areas, where more distinct information needs to be displayed. Some techniques have been successfully applied to the generalization of urban areas and others to rural or suburban areas, where the crowding of cartographic objects is clearly lower.

The quality of the data: Cartographic generalization must manage the data so as to produce generalized maps in accordance with the quality or reliability of the data. The data may have come from various sources: aerial photographs, satellite images, land surveys, GPS measurements, digitized maps and diagrams. It is necessary to examine their quality.

The specifications of the symbols on the map: The cartographic objects are represented geometrically by spot, linear and surface symbols using visual variables:
• Spot symbols: shape, size, color
• Linear symbols: shape, line width, color
• Surface symbols: pattern, color

In conclusion, the advantages generated by cartographic generalization are many and important for both the cartographer and the reader. Among them are: reducing the complexity of the cartographic symbols and eliminating unwanted details, retaining and possibly exaggerating important characteristics, maintaining spatial and attribute accuracy, and providing more information or more efficient communication.

2.2.3 Relation between map scale and generalization

Map scale selection has very important consequences for the map's appearance and its potential as a communication device. The selection of map scale is a very important design consideration because it will affect other map elements. Map scale is the amount of reduction that takes place in going from real-world dimensions to the mapped area on the map plane, and it also expresses the relationship between the size of a feature on the map and its actual size on the ground. Technically, the map scale controls the amount of detail and the extent of area that can be shown. It is also defined as the ratio of distances measured on the map to the actual distances which they represent on the ground; in general, the denominator of this ratio is a round number. Map scale is often confused or interpreted incorrectly, perhaps because the smaller the map scale, the larger the reference number and vice versa (e.g. a 1:100.000 map scale is considered a larger scale than a 1:250.000 map scale).

Three terms are frequently used to classify map scales: large scale, intermediate scale and small scale. A large scale shows the detail of a small area, a small map scale shows less detail but a larger area, and the medium map scale is something between them. According to this categorization, large-scale maps show small portions of the earth's surface and small-scale maps show large areas, so only limited detail can be carried on the map. Table 2.1 shows the map scale categorization and the scale conversion.

Map scale category | Map scale | One cm on the map represents | One km on the earth is represented on the map by | One inch on the map represents
Large map scale | 1:10.000 | 100 m | 10 cm | 833.33 feet
Large map scale | 1:25.000 (Local) | 250 m | 4 cm | 2,083.33 feet
Medium map scale | 1:50.000 (Local) | 500 m | 2 cm | 0.789 miles
Medium map scale | 1:100.000 (Regional scale) | 1 km | 1 cm | 1.58 miles
Medium map scale | 1:250.000 (Regional scale) | 2.5 km | 0.40 cm | 3.95 miles
Small map scale | 1:1 million | 10 km | 0.10 cm | 15.78 miles
Small map scale | 1:3.5 million | 35 km | 0.028 cm | 55.24 miles
Small map scale | 1:5 million | 50 km | 0.02 cm | 78.91 miles
Small map scale | 1:10 million | 100 km | 0.01 cm | 157.82 miles

Table 2.1: Categorization of the map scales & scale conversion
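The entries of Table 2.1 follow directly from the scale ratio: at a scale of 1:D, one centimetre on the map represents D centimetres on the ground, and one kilometre on the ground is drawn as 100,000/D centimetres on the map. The short MATLAB sketch below reproduces the conversion columns for an arbitrary denominator; the function name is an assumption made purely for illustration.

    % Reproduce the unit conversions of Table 2.1 for a map scale 1:denom.
    function scaleConversion(denom)
        cmOnMapInMetres  = denom / 100;            % 1 cm on the map, in metres on the ground
        kmOnGroundInCm   = 100000 / denom;         % 1 km on the ground, in cm on the map
        inchOnMapInFeet  = denom / 12;             % 1 inch on the map, in feet on the ground
        inchOnMapInMiles = inchOnMapInFeet / 5280; % the same length, in miles
        fprintf('1:%d  1 cm -> %.2f m, 1 km -> %.3f cm, 1 inch -> %.2f ft (%.3f miles)\n', ...
            denom, cmOnMapInMetres, kmOnGroundInCm, inchOnMapInFeet, inchOnMapInMiles);
    end

Calling scaleConversion(50000), for example, reproduces the 1:50.000 row of the table: 500 m, 2 cm and about 0.789 miles.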

A fundamental issue is to decide at which scale the information should be generalized. Ideally it would be useful to be able to vary the scale according to the level of precision required. Naturally occurring features often require a larger scale for their portrayal than cultural features. This raises intriguing problems of metric representation, but the idea of variable or elastic scaling within a single map is not new.


Generalization is not simply making little things look like big things. Map features at small scales should not slavishly mimic their shapes at large scales. Generalization is typically associated with a reduction in the scale at which the data are displayed, a classic example being the derivation of a topographic map at 1:100.000 from a source map at 1:50.000. Map generalization should never be equated merely with simplification and scale reduction. On the contrary, it represents a process of informed extraction and emphasis of the essential while suppressing the unimportant, maintaining logical and unambiguous relations between map objects, maintaining the legibility of the map image, and preserving accuracy as far as possible. In addition, contextual factors may also influence shape representation, such that simplifying features can require more than merely removing vertices; sometimes entire shapes should be deleted at a certain scale. In other instances, entire features (points, polylines, polygons or sets of them) will need to be transformed or eliminated. But long before it vanishes, a feature will tend to lose much of its character as a consequence of being reduced.

The smaller the scale of the map, the more simplification and generalization is needed.

When small‐scale maps are compiled from two or more large‐scale maps covering an area, boundaries may be moved to join boundaries on adjacent maps. Complicated areas may be simplified. Units covering small areas on the large‐scale map may be removed entirely. Areas without much detail may be that way because no one has spent much time mapping there.

When we are considering the use of the map, we keep in mind the accuracy of the boundaries between units. We realize that the boundaries on smaller scale maps are generalized and may not be in the "exact" location of the contact on the ground. Commonly used numerical criteria for evaluating solutions do not necessarily provide useful guidance, in part because they do not reflect the imperatives of map scale, in part because they are too global, and because the geometric properties they preserve may be undesirable.

There are various techniques of generalisation, which can be used for different types of data and different objectives. There are also different methods for applying these techniques in order to improve a point data set for a range of scales. The broad question explored is whether it is better to apply generalization techniques in an incremental fashion, where the dataset at each scale is a generalization of the previous, larger-scale point dataset (ladder approach), or whether it is better to derive the point dataset at each scale by applying the generalization technique to the original point dataset (star approach). In this particular work we use the star approach to generalization: the process derives the dataset for a smaller scale (1:10.000.000) from the original largest scale (1:3.500.000) at which the point data is available (Figure 2.15; a small sketch of the two approaches follows the figure).

Figure 2.15: Ladder and Star approaches for generalization (source: Stoter J.E, 2005)
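The difference between the two approaches lies only in how the generalization operation is chained, as the following MATLAB sketch shows. Here generalizeToScale is a hypothetical stand-in for any point generalization routine (e.g. the simplification developed in Chapter 3), implemented below as a toy thinning function purely so that the example runs; the scales and data are likewise illustrative.

    % Toy stand-in for a point generalization routine: keep a fraction of the
    % points that shrinks as the target scale gets smaller (hypothetical).
    generalizeToScale = @(P, denom) P(1:ceil(2 * denom / 3.5e6):end, :);

    originalPoints = rand(500, 2);        % illustrative source data at 1:3.500.000
    scales = [5e6, 10e6];                 % smaller target scale denominators

    % Ladder approach: each scale is derived from the previous, generalized result.
    ladder    = cell(size(scales));
    ladder{1} = generalizeToScale(originalPoints, scales(1));
    for i = 2:numel(scales)
        ladder{i} = generalizeToScale(ladder{i-1}, scales(i));
    end

    % Star approach (used in this work): each scale is derived from the original data.
    star = cell(size(scales));
    for i = 1:numel(scales)
        star{i} = generalizeToScale(originalPoints, scales(i));
    end

One practical consequence visible in the sketch is that each ladder result inherits whatever was already removed at the previous scale, whereas the star variant always starts from the full-detail source data.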


2.3 Generalization in digital systems

2.3.1 Development

According to Meng [1997] and Sarjakoski [2007], the first steps towards automated generalization were taken in the period 1960-1980, oriented towards the development of algorithms and methods of geometrical calculation. In that period researchers focused more on the resolution of specific problems of generalization [linear: Douglas & Peucker, 1973; point, polygonal: Töpfer & Pillewizer, 1966] rather than on holistic generalization.

The precise definition and determination of the generalization elements led cartographers to develop analytic theoretical conceptual models [Ratajski model 1967, Morrison model 1974]. In the period 1980-1990 there was a turn towards higher-level processes, aiming at simulating the way in which manual generalization is applied.

Simultaneously, efforts were made to collect cartographic knowledge from books and from cartographers, and various conceptual models were proposed [Brassel & Weibel 1988, Nickerson & Freeman 1986] for the better comprehension of the generalization process. An attempt was made to standardize this knowledge and transfer it, via programming languages, into the computer systems of that period. However, the results were not encouraging, because human ability is not easily transmitted through programming. Eventually, in the early 1990s, scientists came to believe that the best solution can only be achieved through interactive generalization systems, in which the human factor takes the decisions.

From 1990 until 1995 an attempt was made to identify the critical issues that resulted from the previous efforts. Furthermore, the need for quantitative and qualitative control was recognized, not only of the cartographic results but also of the model produced. Finally, at that time a new way of tackling the problem of automated generalization began to be developed, based on the AGENT technology (Automatic Generalization New Technology). From 1995 onwards, the growth of the internet and GIS has played a decisive role, and running speed has become an important factor in evaluating the effectiveness of mapping solutions. The need for, and utility of, improved databases was also recognized; such databases contribute to system efficiency and significantly affect the speed of implementation of various applications. In the same period the use of object-oriented (O-O) models in spatial databases began, which offer flexibility in the implementation of generalization and in the creation of databases.

The transition from conventional to digital cartography constitutes a reason to review cartographic concepts and actions, both in terms of understanding and in terms of evaluation. The great abilities of computer systems can give impetus to evolve cartography to levels which cannot be achieved via conventional procedures. But the passive type of these systems and their executive character are at variance with the foundations of cartography, which are based on the human capabilities of perception, aesthetics and judgment. The use of computer technology, and the development and application of digital methods in cartography, gave it the additional designation "digital". This resulted in the transfer into the digital sector of one of the main cartographic processes (generalization), renamed "digital generalization". McMaster and Shea (1992) formulated that: "Digital generalization can be defined as the process of deriving from a data source a symbolically or digitally encoded cartographic data set through the application of spatial data and attribute transformation" (Robert B. McMaster, K. Stuart Shea, 1992; Robert B. McMaster, K. Stuart Shea, 1989). In particular, through the development of geographic information systems, a way had to be found to produce generalized maps at multiple scales quickly, accurately and with the least possible intervention by the cartographer.


In comparison to generalization in conventional cartography, generalization in digital systems has to be understood in a wider sense: each transition from one model of the real world to another, which comes with a loss of information, requires generalization. Below we can see how transitions take place in three different areas along the database and map production workflow (Brassel K.E. & Weibel R., 1988; Mueller J.C., Weibel R., Lagrange J.P. & Salge, 1995).

• Object generalization: This process takes place whenever a database is created as a representation of the real world. Since our world presents us with an infinite reservoir of details and resultant data, a representation of all data is impossible. Each database can only hold a selection of the real-world data, and usually only a fraction of the captured data. This selection must reflect the intended purpose of the data and will be limited by computer memory.

• Model generalization: While the process of object generalization has had to be carried out in much the same way as in the preparation of data for a traditional map, model generalization is new and specific to the digital domain (Weibel & Dutton, 1999). The goal of model generalization is a controlled reduction of data. The reduction of data is desirable in order to save storage and to increase computational efficiency.

• Cartographic generalization: This is the term commonly used to describe the generalization of spatial data for cartographic visualization. It is the process most people typically think of when they hear the term "generalization" (Weibel & Dutton, 1999). The difference between this and model generalization is that it is aimed at generating visualizations and brings about the graphical symbolization of data objects. Therefore, cartographic generalization must also encompass operations to deal with problems created by symbology, such as feature displacement.

Figure 2.16: Generalization as a sequence of modeling operations (source: Bader M., 2001)

2.3.2 Operators

During the transition from a large-scale map to a small-scale map it is imperative to apply generalization to the spatial data of the map in order to change their geometry or properties. This is achieved by using map generalization operators, which modify the position, the shape or the symbolization of spatial data in order to classify the data into distinct groups.

Operators were identified initially by studies of the human cartographer and later enriched to decompose the task of generalization into more detail for automation. The ideas introduced by algorithm developers led to a further fragmentation of these operators. Several researchers/cartographers have proposed various map generalization operators. The first research came from Robinson et al. (1984) and DeLucia & Black (1987), who identified only very few operators, and also from Keates (1989) and McMaster & Monmonier (1989), with some additional essential operators. However, these operators are too general to be computerized; that is to say, more concrete operators need to be identified. From the late 1980s researchers tried to identify more concrete operators. For example, Beard & Mackaness (1991) identified eight map generalization operators. Generalization research also produced different operator classifications based on different characteristics, for instance those of McMaster & Shea (1992), Yaolin et al. (1999) and the AGENT project of Bader et al. (1999). These classifications do not aim to be general enough to serve any generalization application, nor do they aim to be consistent. The classifications are not transparent, as they cannot be reconstructed and are not based upon a formal model. Additionally, they are incompatible with each other, as some classifications point out different operators than others. They are also internally inconsistent, as they do not apply the same criteria for each of the operators. Even today, the research community has not agreed upon a common classification of operators, nor on using the terms of individual operators to mean the same thing. An overview of all these existing operators from digital cartographic generalization is provided in Table 2.2.


Table 2.2 compares the operators identified by Robinson et al. (1984), DeLucia & Black (1987), Keates (1989), McMaster & Monmonier (1989), Beard & Mackaness (1991), McMaster & Shea (1992) and the AGENT project (Bader et al., 1999); a check mark indicates that a source proposes the operator. The operators covered are: agglomeration, aggregation, amalgamation, classification, coarsen, collapse, combination, displacement, enhancement, exaggeration, induction, merge, omission, refinement, selection/elimination, simplification, smoothing, symbolization and typification.

Table 2.2: Existing operators for digital map generalization (sources: Jiawei Han, Micheline Kamber, 2006; Zhilin Li, Meiling Wang, 2010)

Regarding the operators of geometric transformation in digital map generalization, an important part is to make the transformations clearly understood. Below, each of these map generalization operators is explained concisely with a simple definition and a graphic example (Table 2.3).


Agglomeration: to make area features bounded by thin area features into adjacent area features by collapsing the thin area boundaries into lines

Aggregation: a) to group a number of points into a simple point feature, b) to combine area features (e.g. buildings) separated by open space

Amalgamation: to combine area features (e.g. buildings) separated by another feature (e.g roads)

Classification: to concern with the grouping together of objects into categories of features sharing identical or similar attribution

Collapse: a) to make the dimension changed. As scale is reduced, many real features must eventually be symbolized as points or lines. Two types are identified e.g. ring to point and double to single line b) to make the feature represented by symbol with lower dimension

Combination: to combine a set of object to one object of higher dimensionality

Displacement: a) to move a point away from a feature or features because the distance between the point and other feature(s) are too small to be separated b) to move the line towards a given dimension c) to move the area to a slightly different position, normally to solve the conflict problem

Enhancement: to make the characteristics still clear, the shapes and sizes of features may need to be exaggerated or emphasized to meet the specific requirements of the map

Exaggeration: to make an area with small size still represented at a smaller scale maps, on which it should be too small to be represented

36

Merge: a) to combine two or more lines together, these lines require that they be merged into one positioned approximately halfway between the original two and representative of both. b) to combine two adjacent areas into one

Omission: a) to select those more important point feature to be retained and to omit less important ones, if space is not enough b) to select those more important ones to be retained

Refinement: this is accomplished by leaving out the smallest features, or those which add little to the general impression of the distribution. Through the overall initial features are thinned out, the general pattern of the features is maintained with those features that are chosen by showing them in their correct locations

Selection: to select entire feature (e.g. road), selection within feature categories

Elimination: to eliminate unimportant objects from the map

Simplification: a) to reduce the complexity of the structure of point cluster by removing some point with the original structure retained b) to make the shape simpler c) to retain the structure of area patches by selecting important ones and omitting less important ones d) to reject the redundant point considered to be unnecessary to display the line’s character

Smoothing: to make the appear to be smoother

Typification: a) to keep the typical pattern of the point feature while removing some points b) to keep the typical pattern of the line bends while removing some c) to retain the typical pattern, e.g. a group of area features (e.g. buildings) aligned in rows and columns


[Table 2.3 in the original is a graphic table: for each map generalization operator it shows the representation in the original map, at the original scale, and the corresponding representation in the generalized (small-scale) map. The operators illustrated are: Agglomeration; Aggregation; Amalgamation; Classification (e.g. the values 1 to 20 grouped into the classes 1-5, 6-10, 11-15, 16-20); Collapse (ring to point, double line to single line, area to point, area to line, partial); Displacement; Enhancement (directional thickening); Exaggeration (enlargement, widening); Merging; Omission; Refinement; Simplification; Smoothing (curve fitting, filtering); Typification.]

Table 2.3: Concise graphic depiction of map generalization operators (sources: Robert B. McMaster, K. Stuart Shea, 1992; Jiawei Han, Micheline Kamber, 2006; Robert B. McMaster, K. Stuart Shea, 1989)

Specific map generalization operators correspond to each category of cartographic data (point, line, area, or volume) (Table 2.4).


Map generalization operator: applicable data types
Simplification: Point, Line, Area, Volume
Smoothing: Line, Area, Volume
Aggregation: Point
Amalgamation: Area
Merging: Line
Collapse: Line, Area
Refinement: Line, Area, Volume
Exaggeration: Line, Area
Enhancement: Line, Area
Displacement: Point, Line, Area
Typification: Point, Line, Area
Selection: Point, Line, Area, Volume

Table 2.4: The correspondence between map generalization operators and applicable data types

Five operators are most meaningful for the generalization of point data: aggregation, displacement, typification, selection and simplification. These operators are guided by measures that provide information about spatial relationships and spatial variation that should be preserved and that define the domain of features over which the generalization operator should act. The focus of this thesis lies on cartographic simplification of a point data set.

Simplification of a point data set can be thought of as a form of selection that filters features based on spatial properties. It is often presented as an optimization technique with an objective function of finding a subset which best approximates the set of all features with respect to some defined characteristics. The size of the subset may be dictated in advance or may depend on some error bound. Simplification is usually applied globally

to a map, though it is possible to apply it more locally to clusters. The purpose of the operator is usually to relax the solution space for the conflicts rather than solve them entirely, though this requirement may also be integrated as constraints on candidate approximations. In general, simplification of point data acts to reduce the density, or level of detail, of the data. As such it can be thought of as an operator that primarily considers the first-order aspects of spatial variation. Figure 2.17 illustrates the simplification operator applied to a set of points.

Figure 2.17: Simplification operator for a point set (source: Batsos E., Politis P., 2006)

Many algorithms (Douglas & Peucker 1973, de Berg et al. 1995, Li & Openshaw 1992) that allow the manipulation of geometric shapes are available from computational geometry.

These can be easily adapted to a GIS environment, dealing mostly with vector representations such as points, lines or polygons. We can quite easily identify that generalization is a complex problem. There are no simple algorithms that can be applied to generate the generalized maps. Also, the concepts of operators and algorithms should not be confused: an operator represents a kind of transformation and an algorithm is an implementation of that transformation. Operators are identified by studying manual generalization, and the algorithms are generally modifications of algorithms from computational geometry, image analysis, etc. Most of the time, the work involved in keeping additional independent representations is reduced by the use of transformations, which allow updating to take place only in the primary representation(s). The transformations can be implemented using a collection of map generalization operators and geometric operators (Table 2.5). In the following chapters we will make a detailed analysis of some of the geometric operators that take part in this work.

Geometric operator: short explanation
Line simplification: reduce the number of vertices in a polygonal line, based on some alignment criterion
Polygon triangulation: divide a polygon into non-overlapping neighboring triangles
Centroid determination: select a point that is internal to a given polygon, usually its center of gravity
Skeletonization: build a 1-D version of a polygonal object, through an approximation of its medial axis
Convex hull: define the boundary of the smallest convex polygon that contains a given point set
Delaunay triangulation: given a point set, define a set of non-overlapping triangles in which the vertices are the points of the set
Voronoi diagram: given a set of sites (points), divide the plane into polygons so that each polygon is the locus of the points closer to one of the sites than to any other site
Isoline generation: build a set of lines and polygons that describe the intersection between a given 3-D surface and a horizontal plane
Polygon operations: determine the intersection, union, or difference between two polygons
Clustering: partition a set of points into groups according to some measure of proximity

Table 2.5: Geometric operators
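Several of these geometric operators are available directly in Matlab, the environment used later in this thesis. The following is a minimal, illustrative sketch only (the random points and variable names are made up for demonstration, not part of the thesis data) showing how a convex hull, a Delaunay triangulation and a Voronoi diagram of a point set can be obtained:

% Hedged sketch: a few of the geometric operators above applied to a
% synthetic point set (purely for illustration).
xy  = rand(30, 2);                      % 30 random points in the unit square
K   = convhull(xy(:,1), xy(:,2));       % convex hull: indices of the hull vertices
tri = delaunay(xy(:,1), xy(:,2));       % Delaunay triangulation of the point set

figure; hold on;
plot(xy(:,1), xy(:,2), 'k.');           % the points themselves
plot(xy(K,1), xy(K,2), 'r-');           % boundary of the convex hull
triplot(tri, xy(:,1), xy(:,2));         % Delaunay triangles
voronoi(xy(:,1), xy(:,2));              % Voronoi diagram of the same sites
hold off;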


2.4 Related topics on data clustering & cartographic generalization

Different research efforts have been undertaken on cartographic generalization in the digital environment using different methods. The basic part starts with clustering the data, choosing between different clustering methods, and then generalizing the clusters through different generalization operators. A glimpse of some related work is given in the following table. These studies can give an idea of the process of generalization; here only the major contribution of each study is stated.

Title: An algorithm for point cluster generalization based on the Voronoi diagram. Author: Haowen Yan & Robert Weibel (2008). Major contribution: This paper presents an algorithm for point cluster generalization. Four types of information, i.e. statistical, thematic, topological and metric information, are considered, and measures are defined to quantify each type of information. Based on these measures an algorithm for point cluster generalization is developed.

Title: CTVN: Clustering Technique using Voronoi Diagram. Author: P S Bishnu & V Bhattacherjee (2009). Major contribution: This paper presents a new clustering technique. Voronoi diagrams are used in conjunction with the K-means algorithm to identify hidden patterns in a given dataset and create the actual clusters. Noise data points are also identified by the CTVN algorithm. The CTVN algorithm was validated on four synthetic datasets and the results were compared with the K-means algorithm.

Title: Point set generalization based on the Kohonen Net. Author: CAI Yongxiang & GUO Qingsheng (2008). Major contribution: Kohonen network mapping has the characteristics of approximate spatial distribution and relative density preservation. This paper combined the Kohonen mapping model with outline polygon simplification to generalize a point set and satisfy the demands of point set generalization.

Title: Density-based clustering algorithms – DBSCAN and SNN. Author: Adriano Moreira, Maribel Santos & Sofia Carneiro (2005). Major contribution: This document describes the implementation of two density-based clustering algorithms: DBSCAN and SNN. The role of the clustering algorithms is to identify clusters of Points of Interest (POIs) and then use the clusters to automatically characterize geographic regions.

Title: Efficient Mean-shift Clustering Using Gaussian KD-Tree. Author: Chunxia Xiao & Meng Li (2010). Major contribution: This research deals with an efficient method that allows mean shift clustering to be performed on large data sets. The key of the method is a new scheme for approximating the mean shift procedure using a greatly reduced feature space. This reduced feature space is an adaptive clustering of the original data set, generated by applying an adaptive KD-tree in a high-dimensional affinity space. Several data clustering applications are presented to illustrate the efficiency of the method, including image and video segmentation and the segmentation of static geometry models and time-varying sequences.

Table 2.6: Related topics on data clustering and cartographic generalization

3. Methodology of the research

Initially, in this section we briefly explain the point data (attributes and technical information) and the computing environment for clustering. Thereafter we present in detail the parameters, architecture and implementation of the two clustering algorithms that we use to cluster the point data. In the end we concentrate on the cartographic simplification of the clustered point data sets which have been produced by the two clustering algorithms.

The aim of grouping before cartographic simplification is to take samples out of each group such that the samples represent the group. So in our case we cluster the data into different groups, where the data in a group have the same characteristics, and then we choose samples through cartographic simplification. We could have done the simplification directly, but to keep a reliable representation of each different kind of points (we differentiated points on the basis of distance) we first performed data clustering.

3.1 Data and computing environment

In this thesis, we have collected a GIS data set (RLS_BD.shp) of Bangladesh. This point data set is in the form of shape file and contains locations of rainfall measurement stations all over Bangladesh and related information (Figure 3.1).


Figure 3.1: Rainfall measurements stations in Bangladesh (point data set)

RLS_BD.shp had been produced during Flood Action Plan (FAP) 19 as a part of the Irrigation Support Project for Asia and the Near East by the Flood Plan Coordination Organization (FPCO) under the Ministry of Irrigation, Water Development and Flood Control in May 1993. The attributes of the point data and some technical information are summarized in Table 3.1 below.


Attributes of the point data
Name: Explanation
ST_NAME: Name of the rainfall station
DISTRICT: Name of the district
TYPE: NR/R (No recorder or recorder)
PWL_EL: Platform elevation in meters
FORECAST: Y/N (Yes or No, used for forecasting of flood hazard)

Technical information of the point data
Characteristics: Explanation
Type of coverage: Point
Projection: Bangladesh Transverse Mercator (BTM)
Projection parameters: units: meters; Xshift: 500000; Yshift: 2000000; Spheroid: Everest; Scale factor: 0.9996; False Easting: 90 00 00; False Northing: 00 00 00
Scale: 1:3.500.000

Table 3.1: Attributes & technical information of the GIS data

In this work we focus in particular on the applications of Matlab in the area of clustering. Specifically, we work with two different kinds of algorithms. The first one is the k-means clustering algorithm, which belongs to the partitional approach, and the second one is the agglomerative hierarchical clustering technique, which belongs to the agglomerative hierarchical approach.

Matlab is developed by the MathWorks company. A numerical analyst called Cleve Moler wrote the first version of Matlab in the 1970s. It has since evolved into a successful commercial software package. Matlab relieves you of a lot of the mundane tasks associated with solving problems numerically. This allows you to spend more time thinking, and encourages you to experiment. Powerful operations can be performed using just one or two commands. You can build up your own set of functions for a particular application. It is an interactive system whose basic data element is an array that does not require dimensioning.

This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran. It is also an efficient numerical computing language using the matrix as the basic programming unit, and it is a highly integrated system containing scientific computing, image processing and audio processing.


3.2 Clustering data with k‐means algorithm

3.2.1 Preliminary parameters of k‐means algorithm

The k‐means algorithm is well known for its efficiency in clustering large data sets.

However, working only on numeric values prohibits it from being used to cluster real-world data containing categorical values. Before running the k-means algorithm, some preliminary steps are necessary for the proper functioning of the algorithm and for reliable results.

The first preliminary parameter is to read the input data into the Matlab program as an n-by-p matrix. The k-means clustering algorithm takes the n-by-p matrix X and partitions it into k clusters. The second preliminary parameter is the determination of the number of clusters in the data set, a quantity often labeled k as in the k-means algorithm; this is a frequent problem in data clustering and is a distinct issue from the process of actually solving the clustering problem. The k-means algorithm gives no guidance about what the number of clusters k should be; we have to know it in advance. K is always a positive integer, but finding the correct number of clusters k is one of the big problems in k-means: a wrong value of k can give a suboptimal result.

For a certain class of clustering algorithms (in particular, k-means), there is a parameter commonly referred to as k that specifies the number of clusters to detect. Other algorithms, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure), do not require the specification of this parameter, and hierarchical clustering avoids the problem altogether.

The correct choice of k is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in the data set and the desired clustering resolution of the user. In addition, increasing k without penalty will always reduce the amount of error in the resulting clustering, down to the extreme case of zero error if each data point is considered its own cluster (i.e., when k equals the number of data points, n). Intuitively, then, the optimal choice of k will strike a balance between maximum compression of the data using a single cluster and maximum accuracy by assigning each data point to its own cluster. If an appropriate value of k is not apparent from prior knowledge of the properties of the data set, it must be chosen somehow. There are several categories of methods for making this decision. Some of them are: the rule of thumb, the elbow method, the Akaike information criterion, the Bayesian information criterion, the silhouette, cross validation and the kernel matrix. In this particular work we use the silhouette method to determine the number of clusters and also the separation between them. Cluster initialization is the third preliminary parameter for the k-means algorithm. Different initializations can lead to different final clusterings because k-means only converges to local minima. One way to overcome the local minima is to run the k-means algorithm, for a given k, with several different initial partitions and choose the partition with the smallest value of the squared error. The last preliminary step is to decide how the k-means clustering algorithm should compute the distance between points. There are two common methods for calculating distance for this algorithm: Euclidean and correlation distance. K-means is typically used with the Euclidean metric for computing the distance between points and cluster centers. We use the Euclidean distance, which is also compatible with the silhouette method.
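As an illustration of how the silhouette method can guide the choice of k, the following is a minimal, hedged sketch (not the code of Appendix A; xy is assumed to be the m-by-2 coordinate matrix and the range of k values is arbitrary):

% Hedged sketch: evaluating candidate values of k with the silhouette criterion.
for k = 2:6
    idx = kmeans(xy, k, 'distance', 'sqEuclidean', 'replicates', 5);
    s   = silhouette(xy, idx);                 % silhouette value of every point
    fprintf('k = %d, mean silhouette = %.3f\n', k, mean(s));
end
% The k with the largest overall average silhouette width would be retained.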


3.2.2 Architecture of the basic k‐means algorithm

This section describes the architecture of the k-means algorithm. The algorithm operates on a set of m spatial objects X = {x_1, x_2, ..., x_m}, where each x_i ∈ R^n is an n-dimensional vector in Euclidean space and i = 1, 2, ..., m. The data set can be represented as an m-by-n matrix

\[ X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}, \]

where each row represents an object with n attributes. Based on the notion of similarity, similar objects are clustered together as clusters. A cluster c_j is a subset of the m input objects. The m objects are partitioned into k different clusters C = {c_1, c_2, ..., c_k}, with j = 1, 2, ..., k, based on some similarity measure. Here k (the number of clusters) is smaller than m (the number of objects), otherwise we would have zero error because each data point would be its own cluster. Each observation is assigned to one and only one cluster, and each cluster is identified by a centroid μ_j (with initial value μ_j^{(0)}). As mentioned in the previous chapter, k-means is an algorithm for partitioning (or clustering) the m data points into k disjoint subsets c_j containing m_j data points, and its goal is to minimize the squared error function

\[ J = \sum_{j=1}^{k} \sum_{x_i \in c_j} \left\| x_i - \mu_j \right\|^2 , \]

where ||x_i - μ_j|| is a chosen distance measure between a data point x_i and the cluster center μ_j, and J is an indicator of the distance of the m data points from their respective cluster centers.

The algorithm is initialized by picking k points in R^n as the initial k cluster representatives or "centroids". Techniques for selecting these initial seeds include sampling at random from the dataset, setting them as the solution of clustering a small subset of the data, or perturbing the global mean of the data k times. Then the algorithm iterates between two steps until convergence. The first step (data assignment) is to take each point of the data set and associate it with the nearest centroid according to the nearest-mean rule

\[ c_j^{(t)} = \left\{ x_i : D\!\left(x_i, \mu_j^{(t)}\right) \le D\!\left(x_i, \mu_l^{(t)}\right) \ \text{for all } l = 1, \dots, k \right\}, \]

where μ_j^{(t)} denotes the cluster centroid of the j-th cluster in the t-th iteration and D is the distance measurement function. The Euclidean distance is chosen in this work; for two vectors x_i and x_j it is defined as

\[ D\!\left(x_i, x_j\right) = \left\| x_i - x_j \right\| = \sqrt{\sum_{l=1}^{n} \left( x_{il} - x_{jl} \right)^2 } . \]

When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step, using

\[ \mu_j^{(t+1)} = \frac{1}{N_j} \sum_{x_i \in c_j^{(t)}} x_i , \]

where N_j is the number of input vectors assigned to cluster c_j. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid.

A loop is thus generated. As a result of this loop we may notice that the k centroids change their location step by step until no more changes are made, in other words until the centroids do not move any more. In conclusion, the algorithm is composed of the following steps:

1. Place k points into the space represented by the objects that are being clustered; these points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

Although it can be proved that the procedure will always terminate, the k-means algorithm does not necessarily find the optimal configuration, corresponding to the global minimum of the objective function. The algorithm is also significantly sensitive to the initially selected (random) cluster centers. The k-means algorithm can be run multiple times to reduce this effect.
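To make the two-step iteration concrete, the following is a hedged sketch of the loop just described (not the implementation of Appendix A; xy is the m-by-2 data matrix, k is given, empty clusters are not handled):

% Hedged sketch of the basic k-means iteration (assignment + update).
mu = xy(1:k, :);                              % e.g. first k points as initial centroids
for iter = 1:100
    D        = pdist2(xy, mu);                % distance from every point to every centroid
    [~, idx] = min(D, [], 2);                 % assignment step: nearest centroid
    muNew    = zeros(k, 2);
    for j = 1:k
        muNew(j,:) = mean(xy(idx == j, :), 1);   % update step: new barycenters
    end
    if isequal(muNew, mu), break; end         % stop when the centroids no longer move
    mu = muNew;
end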

3.2.3 Implementation of k‐means algorithm

Based on the above k-means clustering algorithm, we implemented the code in the Matlab language environment using several optional input parameters, which help us obtain better and more reliable clustering results. Below we present in detail the particular code (Appendix A) for the clustering of the point data and analyze each part of the code step by step with the individual results. More explanation of the clustering results is given in the last chapter (Results & Discussion).

Initially, we import into the Matlab program the shapefile ('RSL_BD.shp') which contains the point data, encoding the coordinates of the points along with non-geometrical attributes (3). The shaperead function reads vector features and attributes from a shapefile and returns a geographic data structure array (3). It also determines the names of the attribute fields at run-time from the shapefile xBASE table or from optional, user-specified parameters. If a shapefile attribute name cannot be directly used as a field name, shaperead assigns the field an appropriately modified name, usually by substituting underscores for spaces. Format long controls the output format of numeric values displayed in the command window and not how Matlab computes or saves them (4).

The coordinates x and y of the points are in m-by-1 matrix form in decimal degrees (6)(7), and they have been merged into an m-by-n matrix of point data where m denotes the number of points and n the coordinates x (first column) and y (second column) in decimal degrees (8).

In this form the data can be used by the k-means algorithm. We try to cluster the point data with an arbitrary number k of clusters and predefined data points (seeds) as initial points (11).

The predefined data points (seeds) are a list of n-element vectors (data points) denoting the initial cluster centers. For example, if the number of clusters is k=2 we could use the first two rows of the xy data as "seeds". In this way we obtain the same results every time we run the algorithm, whereas with random initial points the algorithm gives different results.

Next follows the determination of the number k of clusters (10). As mentioned in a previous subchapter, there are several categories of methods for making this decision. In this particular work we use the silhouette validation method, which calculates the silhouette value for each point, the silhouette value for each cluster and the overall average silhouette width for the total point data set. Using this approach each cluster can be represented by a so-called silhouette, which is based on the comparison of its tightness and separation (how well separated the resulting clusters are). The overall average silhouette width can be applied for the evaluation of clustering validity and can also be used to decide how good the number of selected clusters is (24). The silhouette value is a measure of how close each point in one cluster is to points in the neighboring clusters. It ranges from +1, indicating points that are very distant from neighboring clusters, through 0, indicating points that are not distinctly in one cluster or another, to -1, indicating points that are probably assigned to the wrong cluster.

To determine the appropriate number of clusters we increase their number to see if k-means can find a better grouping of the data. We use the "display" parameter to present information about each iteration (21). The "iter" column represents the number of iterations, the "phase" column indicates the algorithm phase, the "num" column provides the number of exchanged points and the "sum" column provides the total sum of the distances. In the end we compare each overall average silhouette width to arrive at the best result (28). The overall average silhouette width value "ans" can be interpreted as follows: 0.70 - 1, a strong structure has been found; 0.50 - 0.70, a reasonable structure has been found; 0.25 - 0.50, the structure is weak and could be artificial; <0.25, no substantial structure has been found.

In the last part of the algorithm implementation, the k-means clustering function partitions the points in the n-by-p data matrix X into k clusters (12). This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of xy correspond to points, columns correspond to variables. K-means returns an n-by-1 vector IDX containing the cluster index of each point. When xy is a vector, k-means treats it as an n-by-1 data matrix, regardless of its orientation. K-means uses squared Euclidean distances and the predefined data points ("seeds") as initial points. To present the clustering results (including the centroid of each cluster) we use the function gscatter from the Statistics Toolbox (15)(16). This function creates a scatter plot of x and y (x and y are vectors of the same size), grouped by IDX, where points from each group have a different color.
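A hedged sketch of this final step follows; it uses the seed convention described above (the first two rows of xy) and the standard kmeans and gscatter calls, and is meant only as a compact illustration of the workflow, not a reproduction of the Appendix A code:

% Hedged sketch: k-means with fixed seeds and a grouped scatter plot.
seeds    = xy(1:2, :);                                 % two predefined seed points
[IDX, C] = kmeans(xy, 2, 'start', seeds, 'distance', 'sqEuclidean');
gscatter(xy(:,1), xy(:,2), IDX);                       % points colored by cluster index
hold on;
plot(C(:,1), C(:,2), 'kx', 'MarkerSize', 12);          % cluster centroids
hold off;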

To define the map axes into which vector geographical data can be projected, we use the "axesm" function with the Mercator projection and angle units in "degrees" (1). This is a projection with parallel spacing calculated to maintain conformality. It is not equal-area, equidistant, or perspective. Scale is true along the standard parallels and constant between two parallels equidistant from the Equator. It is also constant in all directions near any given point. The Mercator, which may be the most famous of all projections, has the special feature that all rhumb lines or loxodromes are straight lines. This makes it an excellent projection for navigational purposes.

*Every number in parentheses corresponds to the numbered line of the k-means algorithm Matlab code in Appendix A.
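For reference, a minimal sketch of such a map-axes definition is shown below (it assumes the Mapping Toolbox is available and that lat and lon hold the station coordinates; both assumptions are illustrative):

% Hedged sketch: Mercator map axes before projecting the point data.
figure;
axesm('mercator', 'AngleUnits', 'degrees');
plotm(lat, lon, 'r.');        % lat/lon are assumed vectors of station coordinates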

3.3 Clustering data with agglomerative hierarchical algorithm

3.3.1 Architecture of the agglomerative hierarchical clustering algorithm

To create clusters of the dataset using the defined parameters, we used Matlab, a high-level programming language and interactive environment for computationally intensive tasks. A set of commands has been used to create the clusters from the data set. These commands are explained in detail in the respective steps.

The hierarchical clustering algorithm is straightforward. The basic steps of the algorithm are:

a) Start with each point in a cluster of its own
b) Until there is only one cluster:
   I. Find the closest pair of clusters
   II. Merge them


In this study we are working with a set of coordinates. If we break down the steps stated above further, the sequence becomes:

a) Calculate the distance between pairs of coordinates
b) Create the linkage and define a tree of hierarchical clusters from the calculated distances
c) Check preliminary parameters for a better result
d) Create the final clusters

a) Calculate the distance between pairs of coordinates

From the coordinates of the point data set, distance between each point has been calculated and listed in a table. Syntax to calculate distance among pair of points is “pdist”.

"pdist" computes the distance between pairs of objects in an m-by-n data matrix X. Rows of X correspond to observations and columns correspond to variables. The result of "pdist" is a row vector of length m(m-1)/2, corresponding to pairs of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m, m-1). This vector is commonly used as a dissimilarity matrix in clustering. The full command for point-to-point distance calculation is pd = pdist(x, distance), where pd is the row vector of distances among pairs of points, x contains the coordinates of the point data set and distance is the distance metric (for more about distances refer to the preliminary parameters).

b) Create the linkage, define a tree of hierarchical clusters from the calculated Euclidean distances and create the desired number of clusters

Once the proximity between objects in the data set has been computed, it is time to determine how the objects should be grouped into clusters, using the "linkage" function. The "linkage" function takes the distance information generated by "pdist" and links pairs of objects that are close together into binary clusters. The "linkage" function then links these newly formed clusters to each other and to other objects to create bigger clusters, until all the objects in the original data set are linked together in a hierarchical tree. These clusters, linked in a hierarchical tree, are shown below in what is also known as a dendrogram.

Figure 3.2: A sample dendrogram

A dendrogram consists of many U-shaped lines connecting objects in a hierarchical tree.

The height of each U represents the distance between the two objects being connected.

Each leaf in the dendrogram corresponds to one data point. It can be seen from the dendrogram that the algorithm merges the nearest data points into clusters and then keeps merging the nearest clusters, until it reaches the defined number of clusters. Basically it follows the rule of agglomerative hierarchical clustering. With the visualization of the dendrogram it is possible to pre-assess the quality of the clustering, which can lead to a change of the input parameters.

These input parameters are checked on a trial-and-error basis to get the best clustering result in both the distance and the linkage calculation stage. A brief introduction to the primary parameters is given in the next step. The full command for defining the linkage is Li = linkage(pd, method), where Li is the linkage created from the row vector of distances pd, pd is the row vector of distances among pairs of points and method is the linkage method for measuring the distance between clusters (for more about linkage methods refer to the preliminary parameters).
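Putting the two steps together, a minimal, hedged sketch of the basic pipeline might look as follows (xy is assumed to be the m-by-2 coordinate matrix; the centroid linkage and the cut into two clusters are illustrative choices, discussed further below):

% Hedged sketch: distance calculation, linkage, dendrogram and clustering.
pd = pdist(xy, 'euclidean');        % row vector of m(m-1)/2 pairwise distances
Li = linkage(pd, 'centroid');       % binary cluster tree built from the distances
dendrogram(Li);                     % visual check of the hierarchy
T  = cluster(Li, 'maxclust', 2);    % e.g. cut the tree into two clusters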

For example, given the distance vector pd generated by “pdist” from the sample data set of x‐and‐y ‐coordinates, the linkage function generates a hierarchical cluster tree, returning the linkage information in a matrix, Li.

Li = linkage (pd)

Li =

4.0000 5.0000 1.0000

1.0000 3.0000 1.0000

6.0000 7.0000 2.0616

2.0000 8.0000 2.5000

In the above result for the generated linkage, the first two columns indicate the objects that have been linked and the third column indicates the distance between them. In the first row, clusters number 4 and 5 have been linked and the distance between them is 1.000. In the second row, clusters number 1 and 3 have been linked. But the original sample data contain only objects number 1, 2, 3, 4 and 5, so the question is where objects number 6, 7 and 8 came from. The explanation is that objects 4 and 5 form a new cluster and, according to the rule, the new cluster is numbered m+1, where m is the number of objects; so the number of the new cluster is 5+1 = 6. Similarly, objects 1 and 3 form a new cluster with number 7. In the next step, as clusters 6 and 7 are closest, they form the new cluster number 8, and then cluster 2 and cluster 8 form the final cluster. The following figure shows the process of linkage.


Figure 3.3: Process of linkage from the sample data

c) Check preliminary parameters

For a better result in data segmentation, some parameters are good to check. Changing these parameters can give different results for the same data set. This algorithm can handle a number of parameters for clustering. These are,

1. The distance method
2. The linkage method
3. Determine the number of clusters

1. The distance method

This measure defines how the distance between two data points is measured in general.

The available options are: Euclidean (default), standardized Euclidean distance, Mahalanobis distance, City block metric, Minkowski metric, Chebychev distance, Cosine distance, Correlation distance, Hamming distance, Jaccard distance and Spearman distance. Different distances are calculated in different ways.

The following paragraphs define the different kinds of distance metrics together with their computation. Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors x_1, x_2, ..., x_m, the various distances between the vectors x_s and x_t are defined as follows.


The Euclidean distance between points is the length of the line segment connecting them. This distance is calculated as

\[ d_{st}^{2} = (x_s - x_t)(x_s - x_t)' . \]

The standardized Euclidean distance is the Euclidean distance after each column of observations has been divided by its standard deviation, and is calculated as

\[ d_{st}^{2} = (x_s - x_t)\, V^{-1} (x_s - x_t)' , \]

where V is the n-by-n diagonal matrix whose j-th diagonal element is the variance of the j-th variable.

In statistics, the Mahalanobis distance is based on correlations between variables, by which different patterns can be identified and analyzed. It is a useful way of determining the similarity of an unknown sample set to a known one. It differs from the Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. not dependent on the scale of the measurements. The Mahalanobis distance is calculated in Matlab as

\[ d_{st}^{2} = (x_s - x_t)\, C^{-1} (x_s - x_t)' , \]

where C is the covariance matrix.

The City block distance is always greater than or equal to zero. The measurement is zero for identical points and high for points that show little similarity. The figure below shows an example of two points called a and b. Each point is described by five values. The dotted lines in the figure are the distances (a1-b1), (a2-b2), (a3-b3), (a4-b4) and (a5-b5), which enter the City block sum.

Figure 3.4: Example city block distance


In most cases, this distance measure yields results similar to the Euclidean distance.

Note, however, that with the City block distance the effect of a large difference in a single dimension is dampened (since the distances are not squared). The name City block distance (also referred to as Manhattan distance) is explained if one considers two points in the xy-plane. The shortest distance between the two points is along the hypotenuse, which is the Euclidean distance. The City block distance is instead calculated as the distance in x plus the distance in y, which is similar to the way you move in a city (like Manhattan), where one has to move around the buildings instead of going straight through. In Matlab the City block distance is calculated as

\[ d_{st} = \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right| . \]

Importantly, the City block distance is a special case of the Minkowski metric with p = 1, where p is the Minkowski order. The Minkowski distance is a metric on Euclidean space which can be considered a generalization of the Euclidean distance, the City block (Manhattan) distance and the Chebychev distance. It is calculated as

\[ d_{st} = \left( \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|^{p} \right)^{1/p} , \]

where p is the Minkowski order. For the special case p = 1 the Minkowski metric gives the City block metric, for p = 2 it gives the Euclidean distance, and for p = ∞ it gives the Chebychev distance.


Figure 3.5: Example Minkowski distance

The Chebychev distance is simply the maximum distance between two vectors taken over any of the coordinate dimensions. Say we have two points, q = (0,0) and p = (1,5). The Chebychev distance between the two is the greater of |0 - 1| and |0 - 5|, i.e. of 1 and 5; therefore the Chebychev distance between p and q is 5. In Matlab this distance is calculated as

\[ d_{st} = \max_{j} \left| x_{sj} - x_{tj} \right| . \]

The Chebychev distance is a special case of the Minkowski metric with p = ∞. The Cosine distance of two vectors is calculated as one minus the scalar product of the vectors divided by the product of their lengths:

\[ d_{st} = 1 - \frac{x_s x_t'}{\sqrt{x_s x_s'}\,\sqrt{x_t x_t'}} . \]

The Correlation distance is a measure of statistical dependence between two random variables or two random vectors of arbitrary dimension. The sample correlation between the points is subtracted from one, and thus the distance is calculated as

\[ d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}} , \]

where \bar{x}_s = \frac{1}{n}\sum_{j} x_{sj} and \bar{x}_t = \frac{1}{n}\sum_{j} x_{tj}.


The Hamming distance between any two variables is the number (or percentage) of components by which the variables differ. This distance is calculated in Matlab as

\[ d_{st} = \frac{\#\left( x_{sj} \ne x_{tj} \right)}{n} . \]

The Jaccard coefficient of two sets is the ratio of the size of their intersection to the size of their union, and the Jaccard distance is one minus this coefficient. In Matlab the distance is calculated as one minus the Jaccard coefficient, here understood as the percentage of nonzero coordinates that differ:

\[ d_{st} = \frac{\#\left[ \left( x_{sj} \ne x_{tj} \right) \wedge \left( \left( x_{sj} \ne 0 \right) \vee \left( x_{tj} \ne 0 \right) \right) \right]}{\#\left[ \left( x_{sj} \ne 0 \right) \vee \left( x_{tj} \ne 0 \right) \right]} . \]

The Spearman distance is based on the squared Euclidean distance between two rank vectors. In Matlab the distance is calculated as one minus the sample Spearman rank correlation between observations (treated as sequences of values):

\[ d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}} , \]

where r_{sj} is the rank of x_{sj} taken over x_{1j}, x_{2j}, ..., x_{mj}, as computed by tiedrank, r_s and r_t are the coordinate-wise rank vectors of x_s and x_t, i.e. r_s = (r_{s1}, r_{s2}, ..., r_{sn}), and \bar{r}_s = \bar{r}_t = (n+1)/2.


2. The linkage method

This defines how the distance between two clusters is measured. When different methods for the distance among clusters are used for the linkage, the results also vary for the same data set. It can be confusing that distances are calculated in both cases: in the distance metric, which is the first step of hierarchical clustering, and in the linkage, which is the second step. The question may arise why distance is considered twice. The answer is that the distance metric indexes the samples according to their pairwise distances, while the distance measurement in the linkage joins the nearest clusters into a link, making new clusters, and this process is followed until the desired number of clusters is reached. So the distance used for the linkage is not only a point-to-point distance but also a distance between clusters, whether a cluster is made of one point or of a number of points. In the next paragraphs a brief description of the linkage methods is given. In the notation below, cluster r is formed from clusters p_r and q_r and cluster s is formed from p_s and q_s; n_r is the number of objects in cluster r and n_s is the number of objects in cluster s; x_{ri} is the i-th object in cluster r and x_{sj} is the j-th object in cluster s.

In the average distance method, the distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters. This method is also known as the unweighted pair-group method, and the distance between r and s is calculated as

\[ d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \operatorname{dist}\!\left( x_{ri}, x_{sj} \right) . \]

The centroid of a cluster is the average point in the multidimensional space defined by the dimensions; in a sense, it is the center of gravity of the respective cluster. In this method, centroid linkage uses the Euclidean distance between the centroids of the two clusters and is calculated as

\[ d(r,s) = \left\| \bar{x}_r - \bar{x}_s \right\|_2 , \quad \text{where} \quad \bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri} . \]

Complete linkage, also called furthest neighbor, uses the longest distance between objects in the two clusters, i.e. the largest distance among the pairs of points from the two clusters. This method usually performs quite well when the objects actually form naturally distinct clusters. If the clusters tend to be elongated or of a chain-type nature, the method is inappropriate. This distance is calculated as

\[ d(r,s) = \max\left( \operatorname{dist}\!\left( x_{ri}, x_{sj} \right) \right), \quad i = 1, \dots, n_r, \; j = 1, \dots, n_s . \]

Median linkage uses the Euclidean distance between weighted centroids of the two clusters. This method is similar to centroid linkage except that weighting is introduced into the computations to take into consideration differences in cluster sizes (i.e. the number of objects contained in them). The computation is done as

\[ d(r,s) = \left\| \tilde{x}_r - \tilde{x}_s \right\|_2 , \]

where \tilde{x}_r and \tilde{x}_s are the weighted centroids of the clusters r and s. If cluster r was created by combining clusters p and q, \tilde{x}_r is defined recursively as

\[ \tilde{x}_r = \tfrac{1}{2}\left( \tilde{x}_p + \tilde{x}_q \right) . \]


Single linkage, also called nearest neighbor, uses the smallest distance between objects in the two clusters. This method will, in a sense, string objects together to form clusters, and the resulting clusters tend to represent long chains. The underlying formula is

\[ d(r,s) = \min\left( \operatorname{dist}\!\left( x_{ri}, x_{sj} \right) \right), \quad i = 1, \dots, n_r, \; j = 1, \dots, n_s . \]

Ward's linkage uses the incremental sum of squares, that is, the increase in the total within-cluster sum of squares as a result of joining two clusters. The within-cluster sum of squares is defined as the sum of the squares of the distances between all objects in the cluster and the centroid of the cluster. The sum-of-squares measure is equivalent to the following distance measure d(r,s), which is the formula the linkage uses:

\[ d(r,s) = \sqrt{ \frac{2\, n_r n_s}{n_r + n_s} } \, \left\| \bar{x}_r - \bar{x}_s \right\|_2 , \]

where \bar{x}_r and \bar{x}_s are the centroids of clusters r and s, and n_r and n_s are the numbers of objects they contain.

Weighted average linkage uses a recursive definition for the distance between two clusters. If cluster r was created by combining clusters p and q, the distance between r and another cluster s is defined as the average of the distance between p and s and the distance between q and s:

\[ d(r,s) = \frac{ d(p,s) + d(q,s) }{2} . \]


3. Determine the number of clusters

When the distance method and the linkage method have been selected and the linkage among the samples has been created, it is important to determine the number of clusters, i.e. how many segments should be created out of the sample points. As far as clustering goes, the right number of clusters is the one which generalizes best to new data. There are two ways of finding the number of clusters:

• Finding natural division in the data
• Specifying arbitrary clusters

Finding natural division in the data

A natural division can be found in the data by investigating the strength of the binary clusters in the linkage. This can be done by verifying the cluster tree, statistically, in two ways: verifying dissimilarity and verifying consistency. In a hierarchical cluster tree, any two objects in the original data set are eventually linked together at some level. The height of the link represents the distance between the two clusters that contain those two objects. This height is known as the cophenetic distance between the two objects. One way to measure how well the cluster tree generated by the "linkage" function reflects the data is to compare the cophenetic distances with the original distance data generated by the "pdist" function. If the clustering is valid, the linking of objects in the cluster tree should have a strong correlation with the distances between objects in the distance vector. The "cophenet" function compares these two sets of values and computes their correlation, returning a value called the cophenetic correlation coefficient. The cophenetic correlation coefficient changes with the choice of distance metric and linkage method used to create the linkage. The higher the value of this coefficient, the better the quality of the tree. The command to calculate this coefficient is c = cophenet(Li, pd), where Li is the matrix output by the "linkage" function and pd is the distance vector output by the "pdist" function.

In cluster analysis, inconsistent links can indicate the border of a natural division in a data set. The “cluster” function uses a quantitative measure of inconsistency to determine where to partition the data set into clusters. It is the comparison of height of each link in a cluster tree with the heights of neighboring links below it in the tree. A link that is approximately the same height as the links below it indicates that there are no distinct divisions between the objects joined at this level of the hierarchy. These links are said to exhibit a high level of consistency, because the distance between the objects being joined is approximately the same as the distances between the objects they contain. On the other hand, a link whose height differs noticeably from the height of the links below it indicates that the objects joined at this level in the cluster tree are much farther apart from each other than their components were when they were joined. This link is said to be inconsistent with the links below it. The following dendrogram illustrates inconsistent links.

Figure 3.6: Example dendrogram showing consistency of data


The relative consistency of each link in the tree can be quantified and expressed with the inconsistency coefficient. To generate a listing of the inconsistency coefficient for each link in the cluster tree, we use the "inconsistent" function in Matlab. The full command to compute the inconsistency coefficient is I = inconsistent(Li), where Li is the linkage created by the "linkage" function. It creates an (m-1)-by-4 matrix where the first column shows the mean of the heights of all the links included in the calculation, the second column shows the standard deviation of those links, the third column shows the number of links included in the calculation and the fourth column shows the inconsistency coefficient. The higher the inconsistency coefficient chosen as a cutoff, the smaller the number of clusters. So it is a matter of decision which value should be taken for this coefficient; this part is tricky and the user should take the value which generalizes the data best. When the user has decided on the inconsistency coefficient value, it is possible to create clusters with the "cluster" function, which takes the inconsistency coefficient into account for clustering. The full command for clustering with a natural break is T = cluster(Li, 'cutoff', c), where T contains the clusters created from the linkage Li and c is the inconsistency coefficient value.

Specifying arbitrary clusters

Instead of letting the "cluster" function create clusters determined by the natural divisions in the data set, it is possible to specify the number of clusters the user wants created. There is no guideline in this case on how many clusters the user can specify but, by investigating the linkage pattern, looking at the dendrogram and considering the visualization, the user can define any number of clusters that best generalizes the data. The command for an arbitrary number of clusters is T = cluster(Li, 'maxclust', n), where n is the number of clusters to be created.
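The two ways of cutting the tree can be summarized in a short, hedged sketch (c and n are placeholders that the user chooses by inspecting the tree and the inconsistency listing; Li is the linkage computed earlier):

% Hedged sketch: natural division versus an arbitrary number of clusters.
I  = inconsistent(Li);              % (m-1)-by-4 matrix; 4th column = inconsistency coefficient
T1 = cluster(Li, 'cutoff', c);      % natural division, guided by the inconsistency coefficient
T2 = cluster(Li, 'maxclust', n);    % arbitrary, user-specified number of clusters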


d) Create final clusters

Most of the considerations about creating the clusters have been stated in the previous part; up to the clustering itself, the user has to go through a number of steps and decisions. If all the parameters have been selected correctly, at this stage we only need to produce the desired number of clusters.

3.3.2 Implementation of agglomerative hierarchical clustering algorithm

In this part, a brief description of the implementation of the agglomerative hierarchical clustering algorithm on the selected sample data is given. The data set has been imported into the Matlab environment using the "shaperead" command, which reads the .shp file as a matrix of dimension 304-by-2, where the columns represent the x and y coordinates respectively. To keep the original formation of the geographic data, it has been imported into the programming environment under the Mercator projection system.

The selection of the input parameters before running the algorithm is very important. As has been said, different methods will create different results on the same data set, so the main objective of choosing appropriate parameters is to get the optimum clustering result. The first investigation concerns the cophenetic coefficient, which is computed for different combinations of distance metrics and linkage methods. Though it is not always true, it is taken into account that the higher the value of the cophenetic coefficient, the better the linkage among the pairs of coordinates will be. The following table shows the cophenetic coefficient for different combinations of distance metrics and linkage methods.


Distance \ Linkage: Average, Centroid, Complete, Median, Single, Ward, Weighted
Euclidean: 0.6704, 0.6714, 0.5847, 0.6092, 0.4602, 0.6305, 0.6529
St. Euclidean: 0.6484, 0.6462, 0.6604, 0.6242, 0.4582, 0.6286, 0.6577
Cityblock: 0.6439, 0.6315, 0.6069, 0.5902, 0.4514, 0.5950, 0.6068
Minkowski: 0.6704, 0.6714, 0.5847, 0.6092, 0.4602, 0.6305, 0.6529
Chebychev: 0.6583, 0.6454, 0.6363, 0.6000, 0.4094, 0.6223, 0.6174
Mahalanobis: 0.6379, 0.6277, 0.6303, 0.6306, 0.4564, 0.6304, 0.6067
Cosine: 0.5885, 0.5808, 0.5764, 0.5761, 0.5339, 0.5790, 0.5758
Correlation: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
Spearman: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
Hamming: 1.0000, 0.2611, 1.0000, 0.8129, 1.0000, 0.0642, 1.0000
Jaccard: 1.0000, 0.2611, 1.0000, 0.8129, 1.0000, 0.0642, 1.0000

Table 3.2: Cophenetic coefficient for different combinations of distance metrics and linkage methods

After investigating the cophenetic coefficient for the different combinations, the combination of the Euclidean distance and the centroid linkage method has been chosen as the input parameters. A detailed discussion of this choice is given in the results and discussion part.
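A hedged sketch of how such a table can be produced follows; the subsets of metrics and methods listed are illustrative only, and xy is assumed to hold the 304-by-2 coordinate matrix:

% Hedged sketch: cophenetic coefficients for combinations of distance and linkage.
distances = {'euclidean', 'cityblock', 'chebychev'};
linkages  = {'average', 'centroid', 'single', 'weighted'};
for i = 1:numel(distances)
    for j = 1:numel(linkages)
        pd = pdist(xy, distances{i});
        Li = linkage(pd, linkages{j});
        fprintf('%-10s %-9s %.4f\n', distances{i}, linkages{j}, cophenet(Li, pd));
    end
end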

Figure 3.7: Dendrogram generated from the linkage method where the cophenetic coefficient was 0.6529

The above figure shows a dendrogram generated from the row vector of the linkage; it shows the clusters grouped with the leaf nodes. From this dendrogram it can be chosen where the data should be segmented. More detail is given in the results and discussion part.

The next task is to select the desired number of clusters. For this purpose, as stated earlier, the inconsistency coefficient has been generated. The command for the inconsistency coefficient produced a 303-by-4 matrix. A sample of this matrix, including the higher values of the inconsistency coefficient, is shown below; the full inconsistency matrix is given in Appendix B. The first column of the matrix shows the mean height among the leaf nodes in the clusters and the second column shows the standard deviation of the nodes included in the clusters. The third column shows the number of leaves included in the clusters; these leaves can themselves be clusters which may contain a number of sub-clusters. The fourth column is the inconsistency coefficient, which is the value under consideration. The highest value for this coefficient is 1.15470038235335.

69325,76148  20215,79073  3  1,15470033964278
18332,88805  4277,417461  3  1,15443184240775
64358,76017  19959,01397  3  1,15300162180035
18693,7361  8460,767261  3  1,15127547046019
31211,38634  11397,13078  3  1,15005148523173
14895,75197  3088,451111  3  1,14988903698257
34635,705  10471,76757  3  1,14951925078247
29907,0995  8553,189608  3  1,14951354539457

Table 3.3: Sample matrix of inconsistency coefficients, including the higher coefficient values, from the combination of the Euclidean distance metric and the weighted linkage method

But it is not possible to take the highest value for the natural cut of the dendrogram, because it would create only one cluster containing all leaves. So the second one, i.e. 1.15443184240775, has been taken for the natural cut of the data, which creates 15 clusters out of the sample data. A detailed discussion and evaluation of the choice of the number of clusters is given in the results and discussion part. The following figure shows the clusters generated from the natural division of the data set. The sequence of the full implementation code is given in Appendix C.
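For orientation, a condensed, hedged sketch of the implementation just described is given below (the field names X and Y in the shapefile structure are assumptions about the data layout; the authoritative code is in Appendix C):

% Hedged sketch of the agglomerative pipeline on the station data.
S  = shaperead('RLS_BD.shp');
xy = [[S.X]' [S.Y]'];                             % 304-by-2 coordinate matrix
pd = pdist(xy, 'euclidean');
Li = linkage(pd, 'centroid');
T  = cluster(Li, 'cutoff', 1.15443184240775);     % natural cut chosen above (15 clusters)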

3.4 Cartographic simplification of clustered point data set

An important component in the area of cartography is the ability to present and visualize the distribution or density of some characteristic, such as, in this particular work, the rainfall stations over a certain region. The most common technique to achieve that is the dot map. The term dot map is self-explanatory: it refers to the use of points (or dots) placed on a map to represent a given distribution. There are many issues involved in the use of dot maps as a tool for representing distributions.

In this part of the work which is the final step of the methodology we concentrate on cartographic simplification of clustered point data sets which have been produced from two previous different clustering algorithms, (k‐means clustering algorithm & agglomerative hierarchical clustering algorithm). Simplification is a basic data reduction technique and is often confused with the broader process of generalization. Simplification algorithms do not modify, transform or manipulate x‐y coordinates they simply eliminate those coordinates not considered critical for retaining the characteristic shape of a feature. Specifically, the feature simplification occurs when many points of the same class are present in an area.

Certain of the points are retained while others are omitted during the reduction from the original scale (1:3.500.000) to a smaller-scale (1:10.000.000) representation (star approach).

In this process the number of points in the map has to decrease when the map scale is decreased, otherwise it would become too cluttered.


We gradually develop the idea, which is based on cartographic simplification and the previous clustering results. During the transition from the original scale (1:3.500.000) to the smaller transition scale (1:10.000.000), many of the data points become indiscernible because some of them are very close together and others overlap. This situation makes the point data unclear, so, while retaining the original structure, we have to decrease the density and complexity of the structure of each point cluster, or more generally of the whole point data set. This theoretical idea is implemented through the following steps:

• Grouping the points based on nearest neighboring distance
• Defining a minimum threshold distance for cartographic simplification
• Simplifying each group of points

Grouping the points based on nearest neighboring distance

First of all, we calculated the distances among the pairs of points in each cluster. Each pair consists of closest neighbors in the cluster (we have selected one cluster of the whole point data set to show the process and the results, because the steps are the same in each cluster).

The distances come out in the form of a table. A sample distance table has been shown below.


Table 3.4: Closest neighbors in the cluster

In the table, UID and UID1 are the IDs of points in the same cluster. DIST1 is the distance between the specified points. This distance is calculated based on the closest neighboring points; for example, point 2 is the closest point to point 20, so point 20 and point 2 form a group of closest neighbors. Our main interest is in those groups of points whose mutual distance is equal to or less than a specific threshold distance. This threshold distance is decided on specific criteria and is discussed in the next part. In our case the threshold distance is 15500 meters. In the table above, all pairs of points are selected whose mutual distance is less than or equal to the threshold distance. When we are done with selecting pairs of points, we proceed to grouping the points. The first criterion for grouping is the nearest neighbor: a group includes those points which have the least distance among them. In some cases one point is closest to more than one neighboring point. For example, point 12 is closest to point 8, point 7 and point 16; in this case we consider points 7, 8, 12 and 16 as one group. In the same way, points 5, 19 and 10 form another group.

Figure 3.8: Choice of closest neighboring group of points

In the figure above we can see all the groups whose points lie within the threshold distance of each other. All other points, whose distances are greater than the threshold distance, will not be considered in any group and will not be simplified; they will appear in the transition map as they are.
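A hedged sketch of this grouping step is shown below. It only identifies, for each point of one cluster, its nearest neighbor and whether that neighbor lies within the threshold; merging the resulting pairs into larger groups (e.g. 7, 8, 12 and 16 above) would require an additional union step not shown here. The matrix xyC is an assumed m-by-2 coordinate matrix of one cluster.

% Hedged sketch: nearest-neighbor pairs within the threshold distance.
thr = 15500;                                  % threshold distance in meters
D   = squareform(pdist(xyC));                 % full pairwise distance matrix
D(logical(eye(size(D)))) = Inf;               % ignore self-distances
[dmin, nn] = min(D, [], 2);                   % nearest neighbor of every point
pairs = find(dmin <= thr);                    % points whose nearest neighbor is within thr
for p = pairs'
    fprintf('point %d pairs with its nearest neighbor %d (%.0f m)\n', p, nn(p), dmin(p));
end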

Defining a minimum threshold distance for cartographic simplification

A threshold distance is the minimum acceptable distance for an application, calculated based on some criteria. In our case, this threshold distance is the minimum distance between points that causes point congestion or overlap at the transition scale (1:10.000.000).

At the scale of 1:10.000.000, 1 cm of map distance refers to 100 km of ground distance.

We want our generalized map to be clearly readable down to the millimeter level. That means that in the generalized map no points will be closer than 0.1 cm to each other. But the mechanism beneath the map works in meters regardless of the visualized unit, which means that to maintain a distance of 1 cm between points on the screen, we have to keep a distance of 100000 meters in the underlying mapping system. The point markers also have a thickness: if the points were 0.5 cm thick (diameter), their boundaries would overlap in the transformed map at the threshold distance estimated above. To avoid this situation, we have set the threshold distance to 15500 meters and the point marker size to 0.14 cm on the screen.
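For reference, the scale arithmetic behind this choice, restating the numbers above, is:

\[
\begin{aligned}
1\,\mathrm{cm} \text{ on the map at } 1{:}10.000.000 &\;\equiv\; 100\,\mathrm{km} = 100\,000\,\mathrm{m} \text{ on the ground},\\
0.1\,\mathrm{cm} \text{ (minimum visual separation)} &\;\equiv\; 10\,000\,\mathrm{m},\\
0.14\,\mathrm{cm} \text{ (marker diameter)} &\;\equiv\; 14\,000\,\mathrm{m},\\
d_{\mathrm{threshold}} = 15\,500\,\mathrm{m} &\;>\; 14\,000\,\mathrm{m},
\end{aligned}
\]

so marker centers kept at least 15500 m apart cannot overlap at the transition scale.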

Simplify group of points

After selecting the groups of points whose mutual distances are less than or equal to the threshold distance, we proceed to simplify the groups. As the points in these groups would be congested in the transition map, we prefer to remove points from each group. Point removal can be done in many ways, but in our case we remove a number of points because the transition scale is more than two times smaller than the original map scale. For this purpose, first we calculated the centroid of the cluster and then the distance from the centroid to each point in each group. Then, in each group, only the furthest point from the centroid is kept to appear in the transition map; i.e. in a group, the points closer to the centroid are removed. This leaves only one point from each group, namely the one furthest from the cluster centroid.
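A hedged sketch of this rule follows (xyC is the cluster's m-by-2 coordinate matrix and group is an assumed index vector of one congested group; both names are illustrative):

% Hedged sketch: keep only the group member farthest from the cluster centroid.
centroid  = mean(xyC, 1);                                       % centroid of the whole cluster
dCentroid = sqrt(sum((xyC(group,:) - repmat(centroid, numel(group), 1)).^2, 2));
[~, iMax] = max(dCentroid);                                     % farthest member of the group
keepPoint = group(iMax);                                        % the only point of the group retained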


Figure 3.9: Removing points closer to the centroid

In the above figure we can see how the groups of points have been simplified by removing points. For example, in the group formed by points 7, 8, 12 and 16, the furthest point from the cluster centroid is point 16, so only point 16 will appear in the generalized map. The reason for choosing the furthest point from the centroid is to keep the border of the cluster unchanged. The border is not always kept exactly unchanged, but by retaining the point furthest from the centroid we try not to remove points near the border. A side effect of this decision is possible congestion between the borders of neighboring clusters, but we expect it will not create a very big visual disturbance. After the simplification of the groups, these points will appear in the transition visualization, i.e. in the 1:10.000.000 scale map, together with the points that were not considered for cartographic simplification. In the following set of figures the left-side image is the original point data of a cluster and the right-side image is the generalized point data set.


Figure 3.10: Difference between the original and simplified point data in zoom level

From the figures below it can be seen that the original data is more or less clear at the 1:3.500.000 scale but becomes hazy and overlapping with the transition to the smaller 1:10.000.000 scale. When we apply the simplification operator, the point data is no longer unclear at the 1:10.000.000 scale, so the generalization has been achieved with the help of the simplification operator. The following figures show the results for the generalized cluster.


Figure 3.11: Display of original and generalized point data (cluster) in transition scale 1:10.000.000

4. Results & Discussion

This chapter first presents the results of the two clustering algorithms (the k-means clustering algorithm and the agglomerative hierarchical clustering algorithm) on the same point data and discusses each result in detail. We then carry out an analytical comparison between the best results of the two methods. Finally, we use each of these results in cartographic simplification to produce the final results of this work.

4.1 K‐means clusters

As mentioned in the previous chapter, the k-means algorithm gives no guidance about what the number of clusters k should be; it has to be known in advance. To deal with the selection of an appropriate k, we first choose an initial number of clusters and then increase this number gradually. With each increase (one by one) in the number of clusters we examine and analyze the silhouette values (separation between the clusters) and the overall average silhouette width (cluster structure), until we end up at the best clustering result. For practical reasons (it is impossible to present the results of every step) we present only a few of the clustering results to illustrate the individual differences, up to the final number of clusters that is used in the cartographic simplification.
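
A minimal sketch of this evaluation loop is given below; it reads the point data as in Appendix A, and the candidate values of k are purely illustrative.

p  = shaperead('RSL_BD.shp');            % point data, as in Appendix A
xy = [[p(:).X]', [p(:).Y]'];             % x,y coordinates of the points
candidateK = [2 12 20 40 70];            % illustrative candidate numbers of clusters
avgWidth = zeros(size(candidateK));
for i = 1:numel(candidateK)
    idx = kmeans(xy, candidateK(i), 'dist', 'sqEuclidean');
    s = silhouette(xy, idx, 'sqEuclidean');  % silhouette value of every point
    avgWidth(i) = mean(s);                   % overall average silhouette width for this k
end
disp([candidateK; avgWidth])             % compare cluster structure across k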

At one extreme, we could put every data point in its own cluster. The clusters would then be perfectly informative about the point data, but the downside is that this makes cluster analysis pointless, and such clusters would not help us in the subsequent step of cartographic simplification. At the other extreme, we could decide that all the data points form one single cluster, which might look widely irregular and have an oddly lumpy distribution; it would also make the cartographic simplification that follows far more complicated (one centroid for many points).

We perform k-means clustering with two clusters (k=2). As can be seen in the following results (Figure 4.1), the first cluster contains several data points with negative silhouette values, which suggests mixing/overlapping of the cluster boundaries (points that are probably assigned to the wrong cluster). The more overlap there is, the less clear the clustering structure. This is also verified by the average silhouette width, which is equal to 0.49, meaning that the clustering structure is weak and could be artificial. Ideally we would like the clusters to be far apart, but when working with a high-density data set like this one, that is unlikely to happen.


Figure 4.1: Silhouette and scatter plot, information about the iterations and the average silhouette width for two clusters (k=2)

We gradually increase the number of clusters to see whether k-means can find a better grouping of the point data. A k-means clustering with twelve clusters (k=12) gives the following results (Figure 4.2).


Figure 4.2: Silhouette and scatter plot, information about the iterations and the average silhouette width for twelve clusters (k=12)

In this case most of the silhouette values are positive (between 0.2 and 0.8) for all the clusters; there is no mixing/overlapping of the cluster boundaries, and the points are well separated from neighbouring clusters. The eighth cluster contains just a few points with negative silhouette values, but compared with the whole point data set this quantity is negligible. With real data sets it is almost impossible to avoid negative values entirely.

The average silhouette width value is 0.61, which means that a reasonable structure has been found. We continue to increase the number of clusters, and the next result shown is for k=20, to see whether k-means can find a better grouping of the point data.


Figure 4.3: Silhouette and scatter plot, information about the iterations and the average silhouette width for twenty clusters (k=20)

The silhouette plot above shows that some of the clusters contain points with negative silhouette values, which suggests mixing/overlapping of the cluster boundaries (points that are probably assigned to the wrong cluster). Another impression is that most of the silhouettes in the figure are rather narrow, which indicates a relatively weak cluster structure. This is also confirmed by the average silhouette value, which is equal to 0.50.

We continue to gradually increase the number of clusters and observe that the results remain weak until the number of clusters is around seventy (k=70). At each step of this increase we observed that several data points have negative silhouette values (mixing/overlapping of cluster boundaries, points probably assigned to the wrong cluster) and that the average silhouette width stays around 0.50, which means the clustering structures are weak and possibly artificial. Most of the silhouettes in the figures are also very narrow, which again indicates relatively artificial cluster structures.

The following figures show some selected results from this range of cluster numbers. Figure 4.4 shows a k-means clustering with forty clusters (k=40) and Figure 4.5 shows a k-means clustering with seventy clusters (k=70).

Figure 4.4: Silhouette and scatter plot, information about the iterations and the average silhouette width for forty clusters (k=40)


Figure 4.5: Silhouette and scatter plot, information about the iterations and the average silhouette width for seventy clusters (k=70)

We continue and try out k-means clustering with more than seventy clusters (k=70) to see whether k-means can find a better grouping of the point data than the results so far.

On the one hand, most of the silhouettes in the figures are rather narrow, which indicates a relatively weak cluster structure. On the other hand, very few data points have negative silhouette values, and the average silhouette width shows a continuous increase (>0.50), which would suggest that a reasonable structure has been found; this, however, is misleading. It happens because more and more data points end up in their own clusters, so there is no mixing or overlapping of the cluster boundaries, and the average silhouette width keeps rising as the number of clusters increases, until every data point corresponds to its own cluster. Cluster tightness does increase with the number of clusters (the best intra-cluster tightness occurs when every point is in its own cluster), but this makes the cluster analysis pointless, and such clusters would not help us in the subsequent step of cartographic simplification.

Figure 4.6 below shows a selection of k-means clustering results with more than seventy clusters, and Table 4.1 shows the corresponding average silhouette width values. The white gaps in the silhouette plots are due to the limited visualization of the silhouette plot and to the large number of data points that form their own single-point clusters.

Figure 4.6: Silhouette plots for a) one hundred and twenty clusters k=120 b) one hundred and eighty clusters k=180 c) two hundred and forty clusters k=240 and d) two hundred and eighty clusters k=280


Number of clusters (k)    Average silhouette width value
a) 120                    0.59
b) 180                    0.71
c) 240                    0.81
d) 280                    0.92
Table 4.1: Correlation between the number of clusters and the average silhouette width value

In conclusion, after the whole selection process for the appropriate number of clusters, we ended up with twelve clusters (k=12) as the best solution; these are used in the next step of this work, the cartographic simplification.

4.2 Agglomerative hierarchical clusters

In this part we discuss the clustering results obtained with the agglomerative hierarchical clustering algorithm, including the process for choosing the input parameters of the clustering.

The Euclidean distance is a special case of the Minkowski distance: when the Minkowski order p equals 2, the Minkowski distance reduces to the Euclidean distance, and in our case the data is two-dimensional. Consequently, the cophenetic coefficient for the Minkowski distance is the same as for the Euclidean distance for every combination of linkage methods. Combining the linkage methods with the Correlation, Spearman, Hamming and Jaccard distances gives very high cophenetic coefficients, in most cases 1.00, which means the distortion among the variables is such that each variable falls into a single leaf; this is logically unacceptable. Therefore, these combinations cannot be used for the linkage evaluation.
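
A minimal sketch of how such combinations can be compared is given below; it reads the point data as in Appendix C, and the particular lists of distance metrics and linkage methods are illustrative only.

p  = shaperead('RSL_BD.shp');            % point data, as in Appendix C
xy = [[p(:).X]', [p(:).Y]'];             % x,y coordinates of the points
distances = {'euclidean','correlation','spearman','hamming','jaccard'};
linkages  = {'average','weighted','centroid','median'};
for i = 1:numel(distances)
    for j = 1:numel(linkages)
        D = pdist(xy, distances{i});     % pairwise distances between the points
        Z = linkage(D, linkages{j});     % hierarchical cluster tree
        c = cophenet(Z, D);              % cophenetic coefficient of this combination
        fprintf('%-12s %-10s %.4f\n', distances{i}, linkages{j}, c);
    end
end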


Figure 4.7: Dendrogram generated from a) correlation distance metrics and weighted linkage method, b) Hamming distance metrics and median linkage method, c) Euclidean distance metrics and Centroid linkage method and d) Hamming distance metrics and Centroid linkage method.

In figure 4.7 above it can be seen that different combinations of distance metric and linkage method create different linkage results on the same data. The combination of the correlation distance metric and the weighted linkage method, with a cophenetic coefficient of 1.00, creates the dendrogram in figure 4.7(a). In this dendrogram a very small number of data points forms one cluster on the right-hand side with a very large distance difference, while the rest of the data, which makes up most of the distribution, falls into one cluster with a very small distance difference. This kind of distribution is very unnatural. Figure 4.7(b) shows another dendrogram, generated from the Hamming distance metric and the median linkage method, where the cophenetic coefficient was 0.8129. In this dendrogram the data is distributed such that each variable falls into a single cluster, and these clusters do not form any hierarchical tree; all the clusters have the same distance difference, shown by the U-shaped lines. Therefore this combination cannot be used to define the linkage within the data set. The next dendrogram, in figure 4.7(c), results from the combination of the Hamming distance metric and the centroid linkage method, with a very low cophenetic coefficient of 0.2611. In this dendrogram there are no hierarchical clusters; the result differs from the dendrogram in figure 4.7(b) only in the distance differences among the variables.

Figure 4.8: Dendrogram generated from a) Euclidean distance metric and Centroid linkage method and b) Euclidean distance and Weighted linkage method.

The Euclidean distance metric and the centroid linkage method were combined and a dendrogram was generated with a cophenetic coefficient of 0.6714. In this dendrogram the data distribution and the linkage look properly agglomerative, following the bottom-up approach: all the leaf nodes fall into clusters, these clusters merge into further clusters, and the process continues until everything falls under a single cluster, as shown in figure 4.8(a) above. Figure 4.8(b) shows another dendrogram, generated from the combination of the Euclidean distance metric and the weighted linkage method, where the cophenetic coefficient was 0.6529. This dendrogram shows the same characteristics as the one in figure 4.8(a) but is more balanced, and this balanced character indicates a more even clustering of the data set. Therefore the combination of the Euclidean distance and the weighted linkage method has been chosen for the linkage definition; the resulting cophenetic coefficient of 0.6529 is not very high, but it is considered acceptable for an optimal result.

Discussion on choosing number of clusters

As mentioned in the implementation part, there are two ways of choosing the number of clusters: natural division and arbitrary cluster specification. To obtain an optimal number of clusters, we choose the natural division within the data set, which can be verified statistically. To find the natural divisions in the data we computed the inconsistency coefficient; with it, the inconsistency of each link in the hierarchical cluster tree can be measured, which indicates where to cut off, i.e. segment, the data.
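
A minimal sketch of this natural-division step, following the calls used in Appendix C, is shown below; the choice of cutoff value here is illustrative.

p  = shaperead('RSL_BD.shp');            % point data, as in Appendix C
xy = [[p(:).X]', [p(:).Y]'];             % x,y coordinates of the points
D = pdist(xy, 'Euclidean');              % pairwise Euclidean distances
Z = linkage(D, 'Weighted');              % weighted-linkage hierarchical cluster tree
I = inconsistent(Z);                     % inconsistency coefficient of every link (4th column)
cutoff = max(I(:,4));                    % e.g. cut at the highest inconsistency value
T = cluster(Z, 'cutoff', cutoff);        % natural division of the data at that cutoff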

Earlier we compared the dendrograms generated from different combinations of distance metrics and linkage methods. Now we verify whether the selected combination of distance metric and linkage method gives the best linkage among the data set, and whether this linkage leads to an optimal clustering of the data. First we consider the combination of the Euclidean distance metric and the centroid linkage method for clustering the data; this continues from the linked data shown in the dendrogram of figure 4.8(a). We take the three highest inconsistency coefficient values in turn to generate natural clusters and compare the differences.


Figure 4.9: Clusters generated from the Euclidean distance metric and the centroid linkage method

In figure 4.9 above we can see the clusters in the sample data. Figure 4.9(a) shows only one cluster, because the highest inconsistency coefficient value was taken for the natural division; using the highest inconsistency coefficient always results in a single cluster containing all of the data. Figure 4.9(b) shows the clustering obtained with the second highest inconsistency coefficient: seven clusters are created, but cluster number seven is too big and the result is imbalanced. We therefore continue with the third highest inconsistency coefficient value to improve the result. Table 4.2 shows a sample of the inconsistency coefficient matrix for the Euclidean distance metric and the centroid linkage method.

Mean height               Standard deviation    Number of leaf nodes    Inconsistency coefficient
62529,23302881390000      21584,55425           3                       1,15470038235335
61878,37266092120000      10813,05514           3                       1,15469581277327
61901,36745715480000      20839,8916            3                       1,15466950700162
29634,97787642230000      7121,172177           3                       1,15466603377398
25533,94776432980000      3931,047206           3                       1,15442520375897
17170,62601915950000      2541,924183           3                       1,15428666232495
17160,36949899240000      3581,532712           3                       1,15380525893599
Table 4.2: Sample matrix of inconsistency coefficients, including the highest coefficient values, from the combination of the Euclidean distance metric and the centroid linkage method


Figure 4.9(c) shows eleven clusters after applying the third highest inconsistency coefficient. In this picture clusters number ten and eleven are still considerably larger and capture most of the data, which is very imbalanced. The reason for this imbalanced clustering with the Euclidean distance metric and the centroid linkage method is the imbalanced linkage segmentation of the data, which can be seen in figure 4.8(a). In that dendrogram three of the initial clusters (blue, cyan and green) form one big cluster and one initial cluster (red) forms another cluster, and this red cluster is almost as big as the other three combined. In contrast, the dendrogram in figure 4.8(b), generated from the Euclidean distance metric and the weighted linkage method, shows four initial clusters forming two big clusters (blue and red form one cluster, green and cyan the other) that are almost equal in size.

Figure 4.10: Clusters generated from the Euclidean distance metric and the weighted linkage method

In figure 4.10(a) above, the clustering was generated with the highest inconsistency coefficient value and, as usual, it produced only one cluster containing the whole sample data. We then used the second highest inconsistency coefficient value for the natural division, which created fifteen clusters with the Euclidean distance metric and the weighted linkage method, as shown in figure 4.10(b). If we compare this result (figure 4.10b) with the clusters generated with the Euclidean distance metric and the centroid linkage method, we can see that the clusters are more compactly distributed and more balanced than those in figure 4.9(c).

4.3 Comparison between k‐means & agglomerative hierarchical clusters

We have created clusters using both the k-means clustering algorithm and the agglomerative hierarchical clustering algorithm, which produced twelve and fifteen clusters respectively out of the same sample data. In this step we compare the quality of the clusters produced by the two methods.

Cluster size

First we estimated the standard error of the mean of each cluster to evaluate the quality of the clusters. All tests were performed at the 95% confidence level. Table 4.3 shows the standard error for each cluster.

Cluster number    K-means clustering    Agglomerative hierarchical clustering
1                 6850.71               24825.5
2                 5958.45               34848.7
3                 6850.71               20419
4                 7328                  14749.9
5                 6713.33               9200.3
6                 6850.71               12950.6
7                 6713.33               8248.1
8                 5049.97               8518.3
9                 6351.25               7709.5
10                6850.71               6288.8
11                6713.33               7170.5
12                7716.1                7305.5
13                -                     6934.9
14                -                     24825.5
15                -                     9641.2
Table 4.3: Standard error for each cluster produced by the K-means and agglomerative clustering methods respectively.

This standard error is the standard error of the mean, i.e. the standard deviation of a sampling distribution. The standard error of the mean is calculated as the standard deviation of the sample divided by the square root of the sample size; hence, the larger the sample, the smaller the standard error, which indicates an acceptable sample size. In the table above it can be seen that the standard errors of the means for the K-means clusters are smaller than those for the hierarchical clusters. The main reason for this is that the cluster sizes in K-means clustering are larger than in the hierarchical clustering. In the case of hierarchical clustering, clusters number 1, 2, 3, 4, 6 and 14 have very high errors due to a very small number of samples. In this regard, the clusters from the K-means method are more acceptable than the clusters from the hierarchical method for this sample data set.
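
As a minimal sketch of this calculation (the thesis does not state which cluster attribute the mean is taken over, so the sample vector here is purely illustrative):

v = randn(100, 1);              % illustrative sample of 100 values from one cluster
sem = std(v) / sqrt(numel(v))   % standard deviation divided by the square root of the sample size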

Cluster segregation

We performed an analysis of variance (ANOVA) test to see whether there is a significant difference among the cluster means. Our null hypothesis is that there is no significant difference among the cluster means, and the test was done at the 95% confidence level. Figure 4.11(a) shows the ANOVA table for the K-means clustering and figure 4.11(b) the ANOVA table for the hierarchical clustering. In the ANOVA table two quantities are of interest to us: the F statistic and the probability, or P value.

Figure 4.11: ANOVA table for clusters, a) for K‐means clustering and b) for Hierarchical clustering

The F statistic can be used in a hypothesis test to find out whether the cluster means are the same. In both cases the P value is very small, which strongly indicates that the cluster means are very unlikely to be equal. The null hypothesis is therefore rejected: the K-means clusters have different mean values and are separated from each other, and the same holds for the hierarchical clusters.
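
A minimal sketch of such a one-way ANOVA in Matlab is shown below; the data and cluster labels are illustrative placeholders, not the thesis data.

% Illustrative placeholder data: values v grouped by cluster labels idx.
idx = repmat((1:3)', 40, 1);             % three clusters, 40 samples each
v   = randn(numel(idx), 1) + idx;        % samples whose means differ per cluster
[p, tbl, stats] = anova1(v, idx, 'off'); % one-way ANOVA across the clusters
disp(p)                                  % a very small p rejects the hypothesis of equal cluster means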

To gain further statistical confidence about the cluster means, we compared the clusters generated by the two methods using their T values. Table 4.4 shows the T values for each cluster from the K-means and hierarchical clustering respectively. The T value indicates how extreme the estimate is relative to a zero-valued coefficient, i.e. how probable it is that the true value of the coefficient is really zero. The t-statistic for a cluster is the ratio of the coefficient to its standard error: the hypothesized value is reasonable when the t-statistic is close to zero, too small when the t-statistic is a large positive number, and too large when the t-statistic is a large negative number. In this case the null hypothesis is that the cluster coefficient is zero. Looking at the T values, we notice that the clusters from the K-means clustering have larger negative and positive values than the clusters from the hierarchical clustering. Roughly speaking, this means that the probability that the sample means in the K-means clustering are the same is lower, while that probability is higher for the samples in the hierarchical clustering.

Cluster number    K-means clustering    Agglomerative hierarchical clustering
1                 -22.41                2.27
2                 -6.95                 2.57
3                 31.47                 1.89
4                 -10.09                6.28
5                 9.33                  11.96
6                 -3.27                 -14.56
7                 10.24                 6.52
8                 -33.13                -0.9
9                 -28.88                -18.06
10                23.53                 -10.48
11                9.23                  28.04
12                9.31                  -24.27
13                -                     8.76
14                -                     2.87
15                -                     -20.24
Table 4.4: T value for each cluster produced by the K-means and agglomerative clustering methods respectively.

Cluster occurrence

The probability that each sample will return the same clusters, i.e. the probability of sample overlapping, is another indicator for comparison. We calculated this probability for each cluster for both the K-means and the hierarchical clustering methods; Table 4.5 shows the probability for each cluster. It can be seen that more clusters in the hierarchical method than in the K-means clustering have a non-zero probability of sample overlapping, although this probability is not very high.


Cluster number    K-means clustering    Agglomerative hierarchical clustering
1                 0                     0.0242
2                 0                     0.0108
3                 0                     0.0601
4                 0                     0
5                 0                     0
6                 0.0012                0
7                 0                     0
8                 0                     0.3683
9                 0                     0
10                0                     0
11                0                     0
12                0                     0
13                -                     0
14                -                     0.0044
15                -                     0
Table 4.5: Probability for each cluster produced by the K-means and agglomerative clustering algorithms respectively

Frequency Distribution

We also examined the frequency distribution. As mentioned earlier, K-means distributed the data points more evenly than the agglomerative hierarchical clustering. From the frequency distribution in figure 4.12 below, it can be seen that a number of the clusters in the hierarchical clustering are formed from only small numbers of points.


Figure 4.12: Frequency distribution in a) K‐means clustering algorithm and b) Agglomerative hierarchical clustering algorithm

4.4 Cartographic simplification

The process of cartographic simplification was described in detail in the methodology part. We applied that method to each cluster from both the K-means clustering algorithm and the agglomerative hierarchical clustering algorithm. When the scale change is combined with the simplification, the data is transformed to a generalized form; this generalization is only a visualization that represents the data in a simplified way at the smaller scale.

Figure 4.13: Comparison among generalized data from both clustering techniques

The figure above shows the difference between the generalized data of the K-means clusters and the agglomerative hierarchical clusters. Some overlapping can be noticed in the data. This overlapping occurred because, during simplification, we removed the points closer to the centroid; the points around the boundary of each cluster were not removed, in order to keep the shape of the cluster unchanged. Had we removed points around the boundary, the shape of the cluster, and of the data set as a whole, could have changed. This has therefore become a limitation of simplifying the data cluster by cluster. There are more cases of point overlapping in the K-means clusters than in the agglomerative hierarchical clusters. This is because in agglomerative hierarchical clustering the data segmentation is based on distance, which means the resulting clusters were already naturally separated from each other, whereas in K-means the partitions were defined by the analyst and did not depend entirely on distance.

A question that arises is what would happen if we removed the points that are farthest from the centroid of the cluster instead. We can look at one example: figure 4.14 shows how the boundary of the cluster is destroyed if the point closest to the centroid is kept in the simplification process, which can also destroy the form of the whole data set. It is, however, very important to keep the boundary of the data more or less the same in the generalization.


Figure 4.14: Choosing between nearest and farthest point from centroid for simplification

Another question that can arise is why we decided to simplify the data cluster by cluster instead of simplifying the whole data set at once. In the latter case there would be only one centroid for all the data points, and selecting points for removal would have become more complex; it would also have destroyed the balance of the point distribution over the whole data set, which could be visually disturbing. Finally, the figure below shows the generalized view of the study area, which is more readable than the original data at the transformed smaller scale.


Figure 4.15: Final results of cartographic simplification

Conclusion

This research started with the objective of representing a point data set without complexity at a smaller scale than the original map scale. The literature review showed that no fully automated method for cartographic generalization has been developed yet; the reason is that a number of steps have to be carried out before generalization can be achieved. A major step of generalization is data segmentation, and the results and quality of the generalization vary with the choice of segmentation method. There is some point overlapping in the result, and the reason for it has been stated. Selecting the points within a cluster for cartographic simplification is another complex task: as we have eliminated a number of points to simplify the data, there are some empty areas in the final result. The selection process for cartographic simplification must be developed further to avoid this emptiness.

Trying more advanced methods, such as density-based clustering, may be helpful in this matter. Further research can be conducted to deal with the data overlapping between the clusters, and there is also great scope for further research into the automation of this generalization process.


Appendix A

Code of k-means clustering algorithm in Matlab

axesm('mercator','AngleUnits','degrees'); % creation of map axes
hold on
p = shaperead('RSL_BD.shp'); % import point data from shapefile
format long
x = [p(:).X]'; % x coordinates of points in decimal degrees
y = [p(:).Y]'; % y coordinates of points in decimal degrees
xy = [x, y]; % x,y coordinates of points
km = deg2km(xy); % conversion of coordinates from decimal degrees to kilometres
k = 12; % number of clusters
m = xy(1:k,1:2); % predefined data points (seeds)
IDX = kmeans(xy,k,'start',m); % k-means clustering with predefined data points (seeds) as initial centroids
hold on
gscatter(xy(:,1),xy(:,2),IDX); % scatter plot of the clusters
[idx,ctrs] = kmeans(xy,k,'start',m); % clustering returning the cluster centroids
plot(ctrs(:,1),ctrs(:,2),'ko','MarkerSize',12,'LineWidth',2); % plot cluster centroids (circles)
plot(ctrs(:,1),ctrs(:,2),'kx','MarkerSize',12,'LineWidth',2); % plot cluster centroids (crosses)
idxk = kmeans(xy,k,'start',m,'dist','sqEuclidean','display','iter'); % iteration report used for determining the number of clusters
idxk = kmeans(xy,k,'start',m,'dist','sqEuclidean'); % clustering used to evaluate cluster separation
[silhk,h] = silhouette(xy,idxk,'sqEuclidean'); % silhouette plot of the clustering
set(get(gca,'Children'),'FaceColor',[.8 .8 1])
xlabel('Silhouette Value')
ylabel('Cluster')
mean(silhk) % average silhouette width


Appendix B

Inconsistency coefficient matrix

Mean Height Standard deviation Number of leaf Inconsistency coefficient 69325,76148 20215,79073 3 1,15470033964278 18332,88805 4277,417461 3 1,15443184240775 64358,76017 19959,01397 3 1,15300162180035 18693,7361 8460,767261 3 1,15127547046019 31211,38634 11397,13078 3 1,15005148523173 14895,75197 3088,451111 3 1,14988903698257 34635,705 10471,76757 3 1,14951925078247 29907,0995 8553,189608 3 1,14951354539457 27356,67293 8193,688104 3 1,14903999074805 33261,26699 7059,949494 3 1,14776774807858 66698,16088 20710,33856 3 1,14683857778635 32754,77535 10886,51247 3 1,14680205160751 30972,41714 11560,81085 3 1,14513586385365 29868,30668 7536,303393 3 1,14308392511707 15416,786 8980,271532 3 1,14279952025889 19270,737 4305,275271 3 1,14059105694530 56769,29033 15183,83145 3 1,13919090422917 27341,30196 8543,766081 3 1,13887623201452 38644,02159 7839,779541 3 1,13702213926539 20876,28252 5603,540377 3 1,13587072687936 21401,3547 7559,957678 3 1,13503197913397 13511,2815 8996,020482 3 1,13458080827367 19832,31959 3203,300738 3 1,13234287485201 10325,09109 5284,097296 3 1,13073256565558 105360,8174 36800,2573 3 1,12679619510274 20440,35984 4265,975037 3 1,12539975398353 27751,57789 12995,90352 3 1,12529564915645 28479,59982 8719,649112 3 1,12503848617549 5675,885991 3288,233832 3 1,12459626067857 17068,17275 5966,588489 3 1,12252228766653 19368,08801 5512,558909 3 1,11930428908560 22815,54156 7417,070438 3 1,11572110964370 172187,8034 50414,00385 3 1,11510423858027 24553,8743 7528,451559 3 1,11328942435712 46668,90823 13660,22201 3 1,11292259353736 41060,06647 14426,55281 3 1,10914789047148 72633,826 28202,32716 3 1,10457212867439 24497,95039 10009,16478 3 1,09183428232711 36575,24419 8305,047708 3 1,09152092549476 92083,14203 23386,30672 3 1,08539556140336 106

20678,5528 7699,428184 3 1,08411694973800 56522,97169 13744,32903 3 1,08323827036525 269678,0723 46454,52868 3 1,08295765006192 60486,23248 17693,31841 3 1,08263565751665 9949,86954 3202,909219 3 1,07772132076157 23110,4371 6208,199612 3 1,07708656458135 49442,31934 11308,25486 3 1,07284814973917 146104,3206 46585,81023 3 1,07153300607908 24920,12212 9179,330026 3 1,07110706615355 76787,77972 16899,38683 3 1,07074098689907 28189,87081 7721,875611 3 1,06995180593992 18493,15473 4527,298413 3 1,06848405425579 40875,85825 9132,505404 3 1,06488436208691 13973,40412 4763,930193 3 1,06419430239724 21506,23026 13203,48437 3 1,06391430680324 38195,73825 13841,68288 3 1,05702225433935 39609,45444 12734,03246 3 1,05445084347394 32476,97717 9706,130165 3 1,05415544250912 15707,13349 6815,382745 3 1,05071369675243 26538,82718 8863,978152 3 1,04963894275476 84214,0169 44666,13112 3 1,04724639870927 36432,48334 8392,864444 3 1,04119184447096 43782,74483 18126,6117 3 1,04086498424280 91424,57322 12416,85835 3 1,03694620175763 42043,48778 9405,472474 3 1,03414153370273 31728,03337 9959,749146 3 1,03234583910902 34777,89522 14971,5845 3 1,02537400804556 27512,43181 5177,708051 3 1,02516446258673 46113,63613 10856,09261 3 1,02444379238253 46486,4974 10921,47726 3 1,02362842386995 16963,26894 6773,748479 3 1,02195954680061 53170,01283 9135,389788 3 1,01826025296185 9265,171488 3599,219336 3 0,99278286280806 21345,48347 6242,444877 3 0,98989470549206 17378,06974 5036,532029 3 0,98941553599547 21358,34606 10105,11235 3 0,98604296276934 38012,46752 8080,289058 3 0,98241397446473 23868,25039 9188,522028 3 0,97559418502623 60332,1901 22566,45638 3 0,96414313409948 18605,65104 10098,57413 3 0,96121306544899 24199,87376 12198,32185 3 0,95827306629646 30002,1507 12530,13547 3 0,95583385935900 35235,63485 7310,16593 3 0,95229577200692 52143,28815 9748,385356 3 0,95101207047474 60578,42673 19959,91965 3 0,94917864564667 49237,78604 8883,933936 3 0,94603636265516 107

30478,42484 5545,227722 3 0,94018680238503 28695,49716 7288,456037 3 0,93963026565198 35352,30104 10988,1456 3 0,92914344867930 22675,85932 7714,56527 3 0,92558997988520 27493,59517 10216,61506 3 0,92416454029227 183849,4615 83548,05321 3 0,91915634516693 112687,0323 28509,86723 3 0,90558278733435 25260,36792 12235,56146 3 0,89890726970753 43133,65043 10368,98306 3 0,89460685294125 11383,94031 7670,575847 3 0,88758767060919 44514,36352 10664,87888 3 0,88330152973379 17377,15744 7005,77937 3 0,85966252370040 21358,60123 7529,508718 3 0,84299587799909 21036,1479 4103,024682 3 0,84292334947848 17079,48347 12298,06208 3 0,84253170117314 26989,03037 7243,228332 3 0,82931337715437 62012,78336 17658,37879 3 0,82530337618619 39460,03371 17930,01335 3 0,80194323731515 12499,33859 4167,103951 3 0,78636543244283 136098,5567 28019,39766 3 0,75196901364485 54969,86607 16435,93488 3 0,73657930158516 19469,62416 6687,919417 3 0,72246834305245 32756,93084 9586,183289 3 0,70807003459265 20246,63659 483,7977256 2 0,70710678118668 4168,76387 571,4854162 2 0,70710678118656 10876,70876 1357,471855 2 0,70710678118656 30126,07846 2564,741393 2 0,70710678118656 26439,3672 4480,924694 2 0,70710678118655 20358,98999 2703,7811 2 0,70710678118655 12334,50409 2724,731494 2 0,70710678118655 13368,84276 2798,863846 2 0,70710678118655 15751,32296 2884,562515 2 0,70710678118655 16804,62038 3596,967589 2 0,70710678118655 21867,70716 6552,549433 2 0,70710678118655 12845,8503 3213,461049 2 0,70710678118655 15421,14228 2608,212194 2 0,70710678118655 17995,48096 4913,415399 2 0,70710678118655 18577,15333 4612,829286 2 0,70710678118655 20484,56106 5894,180616 2 0,70710678118655 23067,83939 3168,919843 2 0,70710678118655 26391,38466 6579,680274 2 0,70710678118655 26602,02241 8315,592612 2 0,70710678118655 30925,67644 10125,5925 2 0,70710678118655 5031,448387 3685,829489 2 0,70710678118655 6131,695172 5352,990715 2 0,70710678118655 8973,719006 3642,587072 2 0,70710678118655 108

9801,883036 3202,469584 2 0,70710678118655 10972,06922 4076,196109 2 0,70710678118655 11616,39841 3353,168279 2 0,70710678118655 9523,606907 7803,922592 2 0,70710678118655 12541,27032 5339,328886 2 0,70710678118655 13903,2156 4710,942553 2 0,70710678118655 16470,57637 2598,179541 2 0,70710678118655 15487,57021 6246,538489 2 0,70710678118655 18244,73978 4760,525957 2 0,70710678118655 16927,85575 6803,770662 2 0,70710678118655 17467,03928 6567,70961 2 0,70710678118655 17616,28544 8226,263372 2 0,70710678118655 17655,12133 8458,309748 2 0,70710678118655 21794,54227 2727,944537 2 0,70710678118655 21141,05492 5685,07105 2 0,70710678118655 22311,36466 4624,019274 2 0,70710678118655 20479,21512 7525,993599 2 0,70710678118655 20536,58386 7497,314335 2 0,70710678118655 22409,77587 5964,483897 2 0,70710678118655 20628,75668 10118,46058 2 0,70710678118655 23081,65205 8225,384619 2 0,70710678118655 22427,71672 10012,31857 2 0,70710678118655 33359,71024 8251,743299 2 0,70710678118655 37433,94835 10772,86333 2 0,70710678118655 43660,78728 7322,590417 2 0,70710678118655 8858,834909 1726,912439 2 0,70710678118655 9804,308412 4925,819467 2 0,70710678118655 10751,22266 5100,677943 2 0,70710678118655 12308,33522 3089,866305 2 0,70710678118655 11730,18469 3947,051564 2 0,70710678118655 11548,00502 6234,081075 2 0,70710678118655 14070,20332 3439,997977 2 0,70710678118655 13404,89345 5204,427455 2 0,70710678118655 14593,73681 3574,235012 2 0,70710678118655 15301,78967 2955,151307 2 0,70710678118655 15890,95012 4616,982675 2 0,70710678118655 15147,13686 5972,241211 2 0,70710678118655 17030,79834 4635,157802 2 0,70710678118655 16479,34345 6000,037256 2 0,70710678118655 16477,4539 6151,292701 2 0,70710678118655 14405,46623 11152,31875 2 0,70710678118655 16592,63338 9561,485687 2 0,70710678118655 21976,84568 9136,325252 2 0,70710678118655 24862,4254 5918,388791 2 0,70710678118655 26147,25601 11086,71203 2 0,70710678118655 16466,48381 3505,888276 2 0,70710678118655 109

23390,69035 7751,839314 2 0,70710678118655 24252,57023 7185,078055 2 0,70710678118655 26149,18289 6929,338866 2 0,70710678118655 18249,49167 4787,47795 2 0,70710678118655 38794,15975 8431,033132 2 0,70710678118655 16204,2098 2026,847857 2 0,70710678118654 23426,57877 5000,617364 2 0,70710678118654 32953,73839 5772,352925 2 0,70710678118654 18636,97465 2885,728101 2 0,70710678118654 19469,23561 3283,108675 2 0,70710678118654 17647,4025 1905,526235 2 0,70710678118654 26098,6416 2160,084711 2 0,70710678118654 6811,362264 371,53059 2 0,70710678118653 57722,35242 10890,03914 3 0,69376742577546 2346,559138 0 1 0,00000000000000 2425,173361 0 1 0,00000000000000 3073,368317 0 1 0,00000000000000 3080,971428 0 1 0,00000000000000 3489,109928 0 1 0,00000000000000 3764,662657 0 1 0,00000000000000 4005,400322 0 1 0,00000000000000 5640,526674 0 1 0,00000000000000 6266,611358 0 1 0,00000000000000 6321,228064 0 1 0,00000000000000 6398,020987 0 1 0,00000000000000 6519,586016 0 1 0,00000000000000 6548,650465 0 1 0,00000000000000 6735,904408 0 1 0,00000000000000 7139,844013 0 1 0,00000000000000 7144,498697 0 1 0,00000000000000 7537,395077 0 1 0,00000000000000 7809,452446 0 1 0,00000000000000 8089,763309 0 1 0,00000000000000 8156,360128 0 1 0,00000000000000 8408,669924 0 1 0,00000000000000 8765,794654 0 1 0,00000000000000 8939,197761 0 1 0,00000000000000 8999,464092 0 1 0,00000000000000 9245,350382 0 1 0,00000000000000 9300,085318 0 1 0,00000000000000 9316,573026 0 1 0,00000000000000 9350,737364 0 1 0,00000000000000 9589,672113 0 1 0,00000000000000 9688,557422 0 1 0,00000000000000 9724,807507 0 1 0,00000000000000 9831,642008 0 1 0,00000000000000 110

10123,46981 0 1 0,00000000000000 10348,7768 0 1 0,00000000000000 10407,82797 0 1 0,00000000000000 10572,07618 0 1 0,00000000000000 10573,5902 0 1 0,00000000000000 10924,1246 0 1 0,00000000000000 11070,60048 0 1 0,00000000000000 11117,8674 0 1 0,00000000000000 11389,74715 0 1 0,00000000000000 11571,45791 0 1 0,00000000000000 11637,75742 0 1 0,00000000000000 11674,19315 0 1 0,00000000000000 11799,43883 0 1 0,00000000000000 12081,11005 0 1 0,00000000000000 12116,86337 0 1 0,00000000000000 12127,83311 0 1 0,00000000000000 12236,67642 0 1 0,00000000000000 12289,86988 0 1 0,00000000000000 12320,60655 0 1 0,00000000000000 12626,25036 0 1 0,00000000000000 12822,96728 0 1 0,00000000000000 12886,20376 0 1 0,00000000000000 13039,35675 0 1 0,00000000000000 13172,21723 0 1 0,00000000000000 13212,18214 0 1 0,00000000000000 13473,92458 0 1 0,00000000000000 13576,85775 0 1 0,00000000000000 13711,62924 0 1 0,00000000000000 13753,24682 0 1 0,00000000000000 13912,35824 0 1 0,00000000000000 14470,7979 0 1 0,00000000000000 14474,58116 0 1 0,00000000000000 14633,386 0 1 0,00000000000000 14771,01194 0 1 0,00000000000000 14864,23355 0 1 0,00000000000000 14878,53959 0 1 0,00000000000000 14928,53753 0 1 0,00000000000000 14953,16567 0 1 0,00000000000000 15157,53401 0 1 0,00000000000000 15235,18205 0 1 0,00000000000000 15315,39046 0 1 0,00000000000000 15516,48814 0 1 0,00000000000000 15721,55351 0 1 0,00000000000000 15771,62315 0 1 0,00000000000000 16144,48326 0 1 0,00000000000000 16575,06693 0 1 0,00000000000000 111

16596,45674 0 1 0,00000000000000 16647,47968 0 1 0,00000000000000 16655,26415 0 1 0,00000000000000 16686,00205 0 1 0,00000000000000 16766,96783 0 1 0,00000000000000 16983,65245 0 1 0,00000000000000 17147,7272 0 1 0,00000000000000 17483,04656 0 1 0,00000000000000 17486,43226 0 1 0,00000000000000 17525,29734 0 1 0,00000000000000 17529,40039 0 1 0,00000000000000 17909,31221 0 1 0,00000000000000 18192,63765 0 1 0,00000000000000 18365,11032 0 1 0,00000000000000 18500,36216 0 1 0,00000000000000 18646,00722 0 1 0,00000000000000 18701,66551 0 1 0,00000000000000 19041,68927 0 1 0,00000000000000 19171,95282 0 1 0,00000000000000 19614,34722 0 1 0,00000000000000 19865,59419 0 1 0,00000000000000 19890,60832 0 1 0,00000000000000 20217,57158 0 1 0,00000000000000 21035,04537 0 1 0,00000000000000 21066,54592 0 1 0,00000000000000 21155,15764 0 1 0,00000000000000 21249,40039 0 1 0,00000000000000 22004,70542 0 1 0,00000000000000 22125,19576 0 1 0,00000000000000 22475,68132 0 1 0,00000000000000 24571,23105 0 1 0,00000000000000 27620,63255 0 1 0,00000000000000 39941,6787 0 1 0,00000000000000


Appendix C

Code of agglomerative hierarchical clustering algorithm in Matlab

%------Setting map projection------
clf
f = figure(1); % open the figure window
axesm('MapProjection','mercator')

%------Importing data to matlab environment------
p = shaperead('RSL_BD.shp');
x = [p(:).X]';
y = [p(:).Y]';
xy = [x, y];

%------Defining parameters for clustering------
Pd = pdist(xy,'Euclidean'); % Euclidean distance between pairs of coordinates
Li = linkage(Pd,'Weighted'); % defines a tree of hierarchical clusters of the rows of 'Pd'
c = cophenet(Li,Pd); % calculates the cophenetic coefficient
format longG
I = inconsistent(Li); % calculates the inconsistency matrix
T = cluster(Li,'cutoff',1.15443184240775); % segments the data by natural division

%------Writing result to file------
for i = 1:15
    fk1 = xy(T==i,1);
    fk2 = xy(T==i,2);
    M = [fk1,fk2];
    xlswrite('filename2.xls',M,i)
end

%------Visualizing result------
[H,Tleaf] = dendrogram(Li,'colorthreshold','default'); % plots the dendrogram (leaf assignment Tleaf is not used further)
set(H,'LineWidth',2);
gscatter(xy(:,1),xy(:,2),T) % plots the clusters


References

Azimi A. & M.R.Delavar (2006), Quality Assessment in Spatial Clustering of Data Mining, Department of Surveying and Geomatics, of , . Bader M. (2001), ‘Energy Minimization methods for feature displacement in map generalization ’, doctoral thesis, University of , University of Zurich, Switzerland Beard, M. K. (1991), Map Generalization: Making Rules for Knowledge Representation Constraints on rule formation, In: Map Generalization: Making Decisions for Knowledge Representation, Buttenfield, B. & Mcmaster, R. (ed.), Longmans, pp 121‐135. Beard M. K. & Mackaness (1991), “Generalization operators and supporting structures”, Proceedings: Tenth International Symposium on Computer ‐ Assisted Cartography (Auto Carto 10), pp 29 – 45. Brassel, K. E. & Weibel, R. (1988), A review and conceptual framework of automated map generalisation , 2nd ed., International Journal Of Geographical Information Systems, pp 229‐ 244. CAI Yongxiang & GUO Qingsheng, (2008), “Point set generalization based on the Kohonen Net’’. International journal article for Geospatial Information Science, Vol 2, Issue 3, China. Epameinondas Batsos, Politis Panagiotis (2006), ‘Creation of geographic ‐ cartographic data, multiple, continuous scale of topographic maps using satellite images VHR. Concepts, problems, suggestions’, bachelor thesis, Department of Land surveying, Technological Educational Institution of Athens, Athens – Greece. Foerster, T., Stoter, J.E. and Köbben, B.J. (2007), “Towards a formal classification of generalization operators” In: ICC 2007: Proceedings of the 23nd international cartographic conference ICC: Cartography for everyone and for you, International Cartographic Association (ICA), Moscow. Haowen Yan, RobertWeibel (2008), ‘An algorithm for point cluster generalization based on the Voronoi diagram’, doctoral thesis, Department of Geography, University of Zurich, Switzerland, pp 939 – 954. Jacquez G.M. (2008), Spatial Cluster Analysis, chapter 2 in “The Handbook of Geographic Information Science’’, S.Fotheringham & J.Wilson (eds), pp. 395 – 416. Jean ‐ Claude Muller, Jean ‐ Phillipe Lagrange, Robert Weibel (1995), GIS and Generalisation ‐ Methodology and Practise, GISDATA 1, Taylor & Francis, London. Jiawei Han, Micheline Kamber (2006), Data mining: Concepts and Techniques, 2nd ed., Elsevier Science, Sanfransico, . Katerina Lampraki (2009), ‘Developing merge algorithms turning point for generalization of natural cartographic lines’, bachelor thesis, Department of Land Surveying, National Technical University of Athens, Athens ‐ Greece, pp 3‐11. L.Kaufman & P. J. Rousseeuw (1990), Finding Groups in Data: an Introduction to Cluster Analysis, John Wiley & Sons Ltd., New York.


Li, Z.L, (1997), Digital map generalization at the age of enlightenment: A review of the first forty years, The Cartographic journal, pp 80 – 93. M.Thomas Auer, 2009, “Mapping data – specific avian distributions utilizing cartographic point aggregating generalization operators in a multi – scale framework”. Proceeding of the fourth international partners in flight conference: Tundra to Topics, USA, pp 292 – 302. M. Sester, (2005), ‘‘Optimization approaches for generalization and data abstraction’’, International journal of Geographical Information Science, Vol 19, No. 8‐9, Hannover, Germany. Mats Dunkars (2001), ‘Automatic generation of a view to a geographical database’, master thesis, Department of Geodesy & Geoinformatics, The Royal Institute of Technology, Stockholm – Sweden. Melih Basaraner (2002), “Model Generalization in GIS”, Proceedings: International Symposium on GIS, Yildiz Technical University. Department of Geodetic & Photogrammetric , Turkey. Michael Worboys, Matt Duckman (2003), GIS ‐ A computing perspective, 2nd ed, CRC Press LLC, New York, pp. 301 – 311. Mueller, J. C., Weibel, R. Lagrange, J. P. & Salge (1995), Generalization: state of the art and issues, In: GIS and generalization, F. Mueller, J. C.; Lagrange, J. P. & Weibel, R. (ed.), Taylor & Francis, London. N. Regnauld & R. McMaster (2007), A synoptic view of generalization operators, In: Generalization of Geographic Information: Cartographic Modeling and Applications, W.A Mackaness, A. Ruas, L. T. Sarjakoski, Elsevier, pp. 37 – 66. Okabe A., Boots B., Sugihara K., and Chiu S.N. (2000), Spatial tessellations, concepts and applications of Voronoi Diagram, 2nd ed. P S Bishnu & V Bhattacherjee, (2009), “CTVN: Clustering Technique using Voronoi Diagram”. International Journal of Recent Trends in Engineering, Vol 2, No. 3, Ranchi, India. Pang Ning Tan, M. Steinbach, V. Kumar (2006), Introduction to Data Mining, Addison‐Wesley, Minesota, United states. Periklis Andritsos (2002), Data Clustering Techniques, master thesis, Department of Computer Science, University of Toronto, USA. Robert B.McMaster, K.Stuart Shea (1992), Generalization in digital cartography, The Association of American Geographers, Washington. Robert B.McMaster, K.Stuart Shea (1989), “Cartographic Generalization in a Digital Enviroment: when & how to generalize”, Proceeding of 9th International Symposium on Computer ‐ Assisted Cartography, Baltimore. Schylberg L. (1993), ‘Computational Methods for Generalization of Cartographic Data in a Raster Environment’, doctoral thesis, Department of Geodesy & Photogrammetry, Royal Institute of Technology, Stockholm – Sweden. Stoter.J.E, (2005), “Generalzation: The gap between research and practise“, 8th ICA workshop on Generalization and multiple representation, Coruna, Spain.


Weibel R. & Jones C.B (1998), “Computational Perspectives on Map Generalization”,pp 307‐ 315. Wieslaw Ostrowski (2004), ‘Types or topographic map generalization: The example of the 1:50.000 map’, Master Thesis, Miscellanea Geographica, Warszawa. William A.Mackaness, Anne Ruas, L.Tina Sarjakoski (2007), Generalization of Geographic Information Cartographic Modelling and Applications, Elsevier Science, Amsterdam. Y. Sadhiro (1997), Cluster Perception in the Distribution of Point Objects, Cartographic: The international journal for Geographic Information and Geovisualization 34, pp 49 – 62. Yaolin, L., Molenaar M. & Tinghua (2001), “Frameworks for Generalization Constraints and Operations Based on Object‐Oriented Data Structure in Database Generalization”, 20th ICC, Du, H.L. (ed.). Ying Shen, Li Lin ( ) “Knowledge representation of cartographic generalization”, Proceedings: International Symposium Spatial Temporal Modeling ‐ Spatial Reasoning ‐ Data Mining & Data Fusion, China. Zhilin Li, Meiling Wang (2010), ‘Animating basic operations for digital map generalization with morphing techniques’, phd thesis, Department of Land surveying and Geo‐informatics, Hong Kong Polytechnic University, China. Zhilin Li (2007), “Essential operations and algorithms for geometric transformations in digital map generalization”, Proceedings of the International Cartographic Conference, Department of Land Surveying & Geoinformatics, Hong Kong Polytechnic University, Hong Kong.


Reports in Geodesy and Geographic Information Technology

The TRITA-GIT Series - ISSN 1653-5227

2012

12-001 Atta Rabbi & Epameinondas Batsos. Clustering and cartographic simplification of point data set. Master of Science thesis in geoinformatics. Supervisor: Bo Mao. February 2012.

TRITA-GIT EX 12-001 ISSN 1653-5227 ISRN KTH/GIT/EX--11/012-SE
