Interactive Models for Latent Information Discovery in Satellite Images
Total Page:16
File Type:pdf, Size:1020Kb
Interactive Models for Latent Information Discovery in Satellite Images DISSERTATION zur Erlangung des Grades eines Doktors der Ingenieurwissenschaften vorgelegt von Dipl.-Eng. Dragos Bratasanu eingereicht bei der Naturwissenschaftlich-Technischen Fakultät der Universität Siegen Siegen 2014 Koordinator: Prof. Dr. Otmar Loffeld Koordinator: Prof. Dr. Mihai Datcu Datum der Mündlichen Prüfung: 30 Mai 2014 To my parents 1 Abstract The recent increase in Earth Observation (EO) missions has resulted in unprecedented volumes of multi-modal data to be processed, understood, used and stored in archives. The advanced capabilities of satellite sensors become useful only when translated into accurate, focused information, ready to be used by decision makers from various fields. Two key problems emerge when trying to bridge the gap between research, science and multi-user platforms: (1) The current systems for data access permit only queries by geographic location, time of acquisition, type of sensor, but this information is often less important than the latent, conceptual content of the scenes; (2) simultaneously, many new applications relying on EO data require the knowledge of complex image processing and computer vision methods for understanding and extracting information from the data. This dissertation designs two important concept modules of a theoretical image information mining (IIM) system for EO: semantic knowledge discovery in large databases and data visualization techniques. These modules allow users to discover and extract relevant conceptual information directly from satellite images and generate an optimum visualization for this information. The first contribution of this dissertation brings a theoretical solution that bridges the gap and discovers the semantic rules between the output of state-of-the-art classification algorithms and the semantic, human-defined, manually-applied terminology of cartographic data. The set of rules explain in latent, linguistic concepts the contents of satellite images and link the low-level machine language to the high-level human understanding. The second contribution of this dissertation is an adaptive visualization methodology used to assist the image analyst in understanding the satellite image through optimum representations and to offer cognitive support in discovering relevant information in the scenes. It is an interactive technique applied to discover the optimum combination of three spectral features of a multi-band satellite image that enhance visualization of learned targets and phenomena of interest. The visual mining module is essential for an IIM system because all EO-based applications involve several steps of visual inspection and the final decision about the information derived from satellite data is always made by a human operator. To ensure maximum correlation between the requirements of the analyst and the possibilities of the computer, the visualization tool models the human visual system and secures that a change in the image space is equivalent to a change in the perception space of the operator. This thesis presents novel concepts and methods that help users access and discover latent information in archives and visualize satellite scenes in an interactive, human-centered and information-driven workflow. 2 Table of Contents 1. Introduction…………………………………………………………………………..... 5 1.1 Motivation………………………………………………………………………….. 6 1.2 Positioning this dissertation……………………………………………………….. 7 1.3 Contributions……………………………………………………………………….. 7 1.3.1 Latent information discovery in satellite images………………………... 8 1.3.2 Visual data mining……………………………………………………….. 9 1.4 Outline of this dissertation………………………………………………………… 10 2. Information Spaces in Earth Observation Data….….….….….….….….….….……. 11 2.1 Introduction….….….….….….….….….….….….….….….….….….….….….…... 11 2.1.1 Data collection & analysis procedures….….….….….….….….….….….. 13 2.1.2 Data collection….….….….….….….….….….….….….….….….….…… 13 2.2 Resolution….….….….….….….….….….….….….….….….….….….….….……. 14 2.2.1 Spectral resolution….….….….….….….….….….….….….….….….….. 14 2.2.2 Spatial resolution….….….….….….….….….….….….….….….….…… 15 2.2.3 Temporal resolution….….….….….….….….….….….….….….….…… 15 2.2.4 Radiometric resolution….….….….….….….….….….….….….….……. 16 2.2.5 Angular information….….….….….….….….….….….….….….….……. 16 2.3 Data analysis methods….….….….….….….….….….….….….….….….….……. 16 2.4 Remote sensing satellite systems….….….….….….….….….….….….….….……. 17 2.5 Spectral dimensions in satellite images….….….….….….….….….….….….……. 18 2.6 Semantic dimensions in satellite image-based cartography….….….….….….……. 22 2.6.1 Corine Land Cover….….….….….….….….….….….….….….….….….. 22 2.6.2 Urban Atlas….….….….….….….….….….….….….….….….….….…… 25 2.6.3 Rapid mapping applications ….….….….….….….….….….….….….…. 27 2.6.4 EO-based applications….….….….….….….….….….….….….….…….. 27 3. Image Information Mining: State-of-the-Art………………………………………... 30 3.1 Introduction………………………………………………………………...………. 30 3.2 Image Information Mining System Design………………………………………… 33 3.2.1 Feature extraction………………………………………............................ 33 3.2.2 Feature selection…………………………………………………………. 33 3.2.3 Feature selection review………………………………………………… 37 3.2.4 Multidimensional index…………………………………………………... 39 3.3 Content-based Information Retrieval: State-of-the-art in multimedia…………….. 41 3.3.1 Human-centered systems for multimedia information retrieval………... 43 3.3.2 Semantic learning for CBIR……………………………………………... 44 3.3.3 Relevance feedback……………………………………………………… 45 3.3.4 Content-based image retrieval – forensics……………………………… 47 3.3.5 Content-based image retrieval – medical imaging……………………… 48 3.3.6 Concluding remarks………………………………………………………. 48 3.4 Information mining systems for Earth Observation - A brief review……………… 48 3 4. Basics of Inference and Stochastic Image Analysis………………………………….. 52 4.1 Stochastic image analysis…………………………………………………………. 53 4.1.1 Probability………………………………………………………………… 53 4.1.2 Random variables………………………………………………………… 54 4.1.2.1 Distribution function and probability density function.……………... 55 4.1.2.2 Statistical moments…………………………………………………... 58 4.2 Transformation-based analysis……………………………………………………. 61 4.2.1 Principal component analysis………………………………………….…. 61 4.2.2 Independent component analysis…………………………………………. 62 4.3 Bayesian Inference…………………………………………………………………. 62 4.3.1 Parameter estimation……………………………………………………... 64 4.3.2 Case studies………………………………………………………………. 65 4.3.3 Generative probabilistic models………………………………………….. 67 4.3.3.1 Gaussian mixture models……………………………………………. 68 4.3.3.2 Latent semantic analysis …………………………………………… 70 4.3.3.3 Probabilistic latent semantic indexing……………………………… 70 4.3.3.4 Latent Dirichlet Allocation…………………………………………... 72 4.4 Information theory…………………………………………….…………………… 77 4.4.1 Shannon’s measure of information……………………………………….. 77 4.4.2 Mutual information……………………………………………………….. 78 4.4.3 Kullback-Leibler divergence……………………………………………... 79 5. Bridging the Gap for Semantic Annotation of Satellite Images…………………….. 80 5.1 Introduction………………………………………………………………………… 81 5.2 Spectral signatures and semantic content extraction……………………………... 83 5.3 Map label learning using Latent Dirichlet Allocation……………………………... 86 5.3.1 Latent Dirichlet Allocation……………………………………………….. 86 5.3.2 Document definition – matching images & words……………………… 86 5.3.3 LDA generative process for image annotation………………………….. 88 5.3.4 Semantic learning………………………………………………………... 89 5.4 Composition rules for bridging the semantic gap………………………………... 92 5.5 Case studies: rules for bridging machine and human languages………………... 93 6. Spectral Band Discovery for Advancing Multispectral Satellite Image Analysis and Photo-Interpretation……………………………………………………………… 101 6.1 Exploratory visual analysis of satellite images…………………………………... 102 6.2 Contextual information integration for spectral feature selection………………... 103 6.3 Minimum-Redundancy-Maximum-Relevance criterion for feature selection…… 105 6.4 Objective evaluation of subjective visual information…………………………… 107 6.4.1 Visual Image Analysis – elements of processing………………………... 108 6.4.2 Color metrics for satellite image analysis………………………………… 109 6.4.3 Color models for satellite image analysis………………………………… 112 6.4.4 Quantitative evaluation using color distances………………………….. 112 6.5 Experiments and results……………………………………………………………. 114 6.6 Discussion………………………………………………………………………….. 129 6.6.1 Comparison and evaluation: mRMR, PCA, ICA…………………………. 129 6.6.2 mRMR score similarity statistics and physical modeling………………… 136 7. Conclusions……………………………………………………………………………... 145 8. Appendix………………………………………………………………………………... 147 9. Acronym list…………………………………………………………………………….. 187 10. Acknowledgements……………………………………………………………………... 188 11. Bibliography…………………………………………………………………………….. 189 4 1 Introduction Our planet is going through unprecedented climate and environmental changes, affecting our society in previously unknown ways. To monitor and predict the effects of these diverse environmental challenges, new generations of space borne imaging sensors with advanced recording capabilities have been recently launched or are scheduled for the next few years (e.g. ESA Sentinels, SPOT 6, 7, Pleiades). This increase in Earth Observation (EO) missions will result in large volumes of data requiring to be processed, understood, used and stored in archives. The need for timely delivery of accurate, focused information for decision making and intervention is constantly growing and the continuous increase