A Spatial Autocorrelation Approach
Total Page:16
File Type:pdf, Size:1020Kb
Multi-labelled Image Segmentation in Irregular, Weighted Networks: A Spatial Autocorrelation Approach Raphael¨ Cere´ and Franc¸ois Bavaud Department of Geography and Sustainability, University of Lausanne, Switzerland Keywords: Free Energy, Image Segmentation, Iterative Clustering, K-means, Laplacian, Modularity, Multivariate Features, Ncut, Soft Membership, Spatial Autocorrelation, Spatial Clustering. Abstract: Image segmentation and spatial clustering both face the same primary problem, namely to gather together spatial entities which are both spatially close and similar regarding their features. The parallelism is partic- ularly obvious in the case of irregular, weighted networks, where methods borrowed from spatial analysis and general data analysis (soft K-means) may serve at segmenting images, as illustrated on four examples. Our semi-supervised approach considers soft memberships (fuzzy clustering) and attempts to minimize a free energy functional made of three ingredients : a within-cluster features dispersion (hard K-means), a network partitioning objective (such as the Ncut or the modularity) and a regularizing entropic term, enabling an itera- tive computation of the locally optimal soft clusters. In particular, the second functional enjoys many possible formulations, arguably helpful in unifying various conceptualizations of space through the probabilistic selec- tion of pairs of neighbours, as well as their relation to spatial autocorrelation (Moran’s I). 1 INTRODUCTION work clustering. In a nutshell, the purely spatial stan- dard procedures aimed at Ncut minimisation (Shi and Regional data analysis, as performed on geographic Malik, 2000; Grady and Schwartz, 2006) or modular- information systems, deals with a notion of “where” ity maximization (Newman, 2006) are enriched with (the spatial disposition of regions), a notion of “what” a features dissimilarity term, central in the K-means (the regional features), and a notion of “how much” approach, and further regularized by an entropy term, (the relative importance of regions, as given by their favoring the emergence of soft clusters, and allowing surface or the population size). The data define a a iterative computation of locally optimal solutions. marked, weighted network, generally irregular (think The fields of spatial analysis, in particular spa- e.g. of administrative units): weighted vertices rep- tial clustering, on one hand, and image segmentation resent the regions, weighted edges measure the prox- on the other hand, seem currently to be investigated imity between regions, on which uni- or multivariate by distinct, non-overlapping communities. Yet, both features (the marks) are defined. communities arguably share the same what-versus- Much the same can be said of an image made where-trading challenge, aimed at obtaining clusters of pixels, that is a collection of elements embed- both homogeneous and connected. ded in a bidimensional layout. The regularity of the In this work in progress, we first define the main setup (regular grid, uniform weights, binary regular quantities of interest and the iterative algorithm itself adjacencies) is exploited in most segmentation algo- (section 2). The definition of the canonical measure of rithms, but the latter may become inadapted, pre- image homogeneity, namely Moran’s I in the present cisely, under irregular situations, such as pixels of var- spatial context, is recalled in section 2.3. Section (3) ious sizes or importance, aggregated pixels, irregular illustrates the method on four case sudies. A conclu- boundaries or connectivites, multi-layered or partially sion (section 4) lists some further research lines, to be missing data. investigated in a close future. The appendix details This paper proposes an iterative algorithm the construction of the so-called exchange matrix, as for semi-supervised image segmentation, directly well as the test of spatial autocorrelation in a weighted adapted from regional clustering procedures in spatial setting. analysis, themselves originating in (non-marked) net- 62 Ceré, R. and Bavaud, F. Multi-labelled Image Segmentation in Irregular, Weighted Networks: A Spatial Autocorrelation Approach. DOI: 10.5220/0006322800620069 In Proceedings of the 3rd International Conference on Geographical Information Systems Theory, Applications and Management (GISTAM 2017), pages 62-69 ISBN: 978-989-758-252-3 Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved Multi-labelled Image Segmentation in Irregular, Weighted Networks: A Spatial Autocorrelation Approach 2 DEFINITIONS AND the corresponding groups obtain as ρg = ∑i fizig, and g FORMALISM the regional distribution of groups as fi = fizig/ρg = p(g i), obeying f g = 1. The region-group dependency can| be measured• by the mutual information The formalism we consider extends the spatial au- tocorrelation formalism used in Quantitative Geogra- p(i,g) zig phy and Spatial Econometrics to the case of weighted, K [Z] = ∑ p(i,g)ln = ∑ fizig ln (1) p(i)p(g) ρg irregular regions, as well as to multivariate features. ig ig It turns out to be broad enough to provide a flexible A good clustering should consist of homogeneous framework for unsupervised or semi-supervised gen- groups made of regions not too dissimilar regard- eralized image segmentation, where the ”generalized ing their features, that is insuring a low value of the images” under consideration can be made of irregular within-group inertia (Bavaud, 2009) pixels, irregularly inter-connected, and endowed with m multivariate numerical features. 1 g g ∆W [Z] = ∑ ρg∆g where ∆g = ∑ fi f j Di j (2) g=1 2 i j Space as a weighted network: the exchange matrix E A good clustering should also avoid to separate a pair Specifically, consider n regions (generalized pix- of spatially strongly connected regions, that is to in- els) with relative weights fi > 0, summing to f = n • sure a low value of the generalized cut ∑ fi = 1, together with an n n symmetric non- i=1 × negative exchange matrix E = (ei j), and weight- 1 2 n G[Z] = ∑G(ρg)∑ei j(zig z jg) (3) compatible in the sense ei = ∑ j=1 ei j = fi. The 2 g i j − exchange matrix E interprets• as a joint probability where G(ρ) 0 is non-increasing in ρ. The choice p(i, j) = ei j to select the pair of regions i and j ≥ (edges), and defines a weighted unoriented network. G(ρ) = 1/ρ amounts to the N-cut (Shi and Malik, Its margins interpret as the probability p(i) = fi to se- 2000), while the choice G(ρ) = 1, we shall adopt lect region i (vertices). here, is equivalent to the modularity criterium (New- Weight-compatible exchange matrices E define a man, 2006). continuous neighborhood relation between regions. They can be constructed from f and the adja- We consider the regularized clustering problem, cency matrix A, or from another spatial proximity consisting in finding a clustering Z minimizing the of distance matrix (see the appendix). The row- free energy functional standardized matrix of spatial weights W = (wi j) of α F [Z] = β∆W [Z] + G[Z] + K [Z] (4) spatial autoregressive models obtains as wi j = ei j/ fi, 2 and constitutes the transition matrix of a Markov The terms , respectively , behaves as a features chain with stationary distribution f . ∆W G dissimilarity energy, respectively a spatial energy, fa- voring hard partitions obeying z = 0 or z = 1. Multivariate features: the dissimilarity matrix D ig ig By contrast, the regularizing entropy term K favors Regional features can consist of univariate grey the emergence of soft clusterings. Setting α = 0 levels, multivariate color or spectral intensities, or yields the soft K-means algorithm (Gaussian mix- (in a geographical context) any regional variable such tures), where β is an inverse (dissimilarity) temper- as the proportions of specific land uses, population ature. Canceling the first-order derivative of the free density, proportion of party B voters, etc. Multivari- energy with respect to zig under the constraints zi = 1 • ate characteristics xi are suitably combined into n n yields the minimization condition 2× squared Euclidean dissimilarities Di j = xi x j . k − k ρ exp( βDg α(Lzg) ) z = g − i − i (5) ig h h ∑ ρh exp( βD α(Lz )i) h − i − 2.1 Image Segmentation (Regional g g where Di = ∑ j fi Di j ∆g is the squared Euclidean Clustering) dissimilarity from i to− the centroid of group g, and g g (Lz )i = zig ∑k wikzkg is the Laplacian of z at i, A soft regional clustering or image segmentation into comparing its− value to the average value of its neigh- m groups is described by non-negative n m mem- bourgs. × bership matrix Z = (zig) with zig = p(g i) denotes the probability that region (pixel) i belongs| to group m g, and obeys zi = ∑g=1 zig = 1. The weights ρ of • 63 GISTAM 2017 - 3rd International Conference on Geographical Information Systems Theory, Applications and Management 2.2 Iterative Procedure Table 1: Summary of the iterative algorithm. Begin Equation (5) can be solved iteratively, updating ρg, g Compute the weight vector f ( fi = 1/n for regular grids) Dig and Lz at each step, and converges towards a ∞ Compute the binary adjacency matrix A local minimum Z of F [Z], which constitutes the For given t > 0, compute the weight-compatible exchange matrix searched for soft spatial partition or image segmen- E( f ,A,t) by (9), and the matrix of spatial weights as wi j = ei j/ fi ∞ tation. Z can be further hardened by assigning i to Compute the features dissimilarity matrix D = x x 2 i j k i − jk = ∞ Initialize the n (m + 1) membership matrix Z0 as: group g argmaxh zih . × z0 = 1 if i F and g = 0 A semi-supervised implementation of the proce- ig ∈ z0 = 1 if i T and g = τ dure, imposing the membership of a few pixels (and ig) ∈ τ 0 = possibly breaking down the monotonic decrease of zig 0 otherwise.