Semi-Supervised Image Classification with Laplacian Support Vector

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. XX, NO. Y, MONTH Z 2007

Semi-supervised Image Classification with Laplacian Support Vector Machines

Luis Gómez-Chova, Gustavo Camps-Valls, Senior Member, IEEE, Jordi Muñoz-Marí, and Javier Calpe, Member, IEEE

Manuscript received July 2007. Dept. Enginyeria Electrònica, Escola Tècnica Superior d'Enginyeria, Universitat de València. C/ Dr. Moliner, 50. 46100 Burjassot (València), Spain. E-mail: [email protected]

Abstract: This paper presents a semi-supervised method for remote sensing image classification based on kernel machines and graph theory. The support vector machine (SVM) is regularized with the un-normalized graph Laplacian, thus leading to the Laplacian SVM. The method is tested in the challenging problems of urban monitoring and cloud screening, in which an adequate exploitation of the wealth of unlabeled samples is critical. Results obtained using different sensors, and with a low number of training samples, demonstrate the potential of the proposed Laplacian SVM for remote sensing image classification.

Index Terms: Semi-supervised learning, regularization, kernel methods, manifold learning, support vector machines.

I. INTRODUCTION

In remote sensing image classification, we are usually given a reduced set of labeled samples to develop the classifier. Supervised classifiers such as support vector machines (SVMs) [1], [2] excel in using the labeled information, being (regularized) maximum margin classifiers also equipped with an appropriate loss function [3], [4]. These methods, nevertheless, need to be reformulated to exploit the information contained in the wealth of unlabeled samples, which is known as semi-supervised classification. In semi-supervised learning (SSL), the algorithm is provided with some available supervised information in addition to the unlabeled data. The framework of semi-supervised learning is very active and has recently attracted a considerable amount of research [5], [6]. Essentially, three different classes of SSL algorithms are encountered in the literature:

1) Generative models involve estimating the conditional density, such as expectation-maximization (EM) algorithms with finite mixture models [7], which have been extensively applied in the context of remotely sensed image classification [8].
2) Low density separation algorithms maximize the margin for labeled and unlabeled samples simultaneously, such as the Transductive SVM (TSVM) [9], which has been recently applied to hyperspectral image classification [10].
3) Graph-based methods, in which each sample spreads its label information to its neighbors until a global stable state is achieved on the whole dataset [11].

In recent years, TSVM and graph-based methods have captured great attention. However, some specific problems are identified in both of them. In particular, TSVM is sensitive to local minima and requires convergence heuristics by using an (unknown) number of unlabeled samples. Graph-based methods are computationally demanding and generally do not yield a final decision function but only prediction labels.

In this paper, we present a recently introduced semi-supervised framework that incorporates labeled and unlabeled data in any general-purpose learner [12], [13]. We focus on a semi-supervised extension of the SVM, which introduces an additional regularization term on the geometry of both labeled and unlabeled samples by using the graph Laplacian [14], thus leading to the so-called Laplacian SVM (LapSVM). This methodology follows a non-iterative optimization procedure, in contrast to most transductive learning methods, and provides out-of-sample predictions, in contrast to graph-based approaches. In addition, hard-margin SVM, directed graph methods, label propagation methods and spectral clustering solutions are obtained for particular free parameters.

The performance of the LapSVM is illustrated in two challenging problems: the urban classification problem using multispectral (LandSat TM) and radar (ERS2 SAR) data [15], and the cloud screening problem using the MEdium Resolution Imaging Spectrometer (MERIS) instrument on board the ESA ENVIronmental SATellite (ENVISAT) [16]. On the one hand, monitoring urban areas at a regional scale, and even at a global scale, has become an increasingly important topic in the last decades in order to keep track of the loss of natural areas due to urban development. The classification of urban areas is, however, a complex problem, especially when different sensor sources are used, as they induce a highly variable input feature space. On the other hand, the amount of images acquired over the globe every day by Earth Observation (EO) satellites makes the presence of clouds inevitable. Therefore, accurate cloud masking in acquired scenes is of paramount importance to avoid errors in the study of the true ground reflectance, and to permit multitemporal studies, as it is no longer necessary to discard the whole image. However, very few labeled cloud pixels are typically available, and cloud features change to a great extent depending on the cloud type, thickness, transparency, height, or background. In addition, cloud screening must be carried out before atmospheric correction, so the input data are affected by the atmospheric conditions.

In machine learning, these problems constitute clear examples of classification in complex manifolds and ill-posed problems, respectively. A manifold is a topological space that is locally Euclidean, but in which the global structure may be more complicated. In both settings, the use of LapSVM is motivated since it permits using the labeled samples, and efficiently exploiting the information contained in the high number of available unlabeled pixels to characterize the marginal distribution of the class of interest.

The rest of the paper is outlined as follows. Section II reviews the framework of semi-supervised learning, paying attention to the minimization functional and the need for different types of regularization. Section III presents the formulation of the Laplacian SVM. Section IV shows the experimental results. Finally, Section V concludes and outlines further work.

II. SEMI-SUPERVISED LEARNING FRAMEWORK

Regularization is necessary to produce smooth decision functions and thus avoid overfitting to the training data. Since the work of Tikhonov [17], many regularized algorithms have been proposed to control the capacity of the classifier [1], [18]. The regularization framework has been recently extended to the use of unlabeled samples [13] as follows.

Notationally, we are given a set of $l$ labeled samples, $\{\mathbf{x}_i, y_i\}_{i=1}^{l}$, and a set of $u$ unlabeled samples, $\{\mathbf{x}_i\}_{i=l+1}^{l+u}$, where $\mathbf{x}_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$. Let us now assume a general-purpose decision function $f$. The regularized functional to be minimized is defined as:

$$\mathcal{L} = \frac{1}{l} \sum_{i=1}^{l} V(\mathbf{x}_i, y_i, f) + \gamma_L \|f\|_{\mathcal{H}}^2 + \gamma_M \|f\|_{\mathcal{M}}^2, \tag{1}$$

where $V$ represents a generic cost function of the committed errors on the labeled samples, $\gamma_L$ controls the complexity of $f$ in the associated Hilbert space $\mathcal{H}$, and $\gamma_M$ controls its complexity in the intrinsic geometry of the marginal data distribution. For example, if the probability distribution is supported on a low-dimensional manifold, $\|f\|_{\mathcal{M}}^2$ penalizes $f$ along that manifold $\mathcal{M}$. Note that this functional constitutes a general regularization framework that takes into account all the available knowledge.

III. LAPLACIAN SUPPORT VECTOR MACHINES

The previous semi-supervised learning framework allows us to develop many different algorithms just by playing around with the loss function, $V$, and the regularizers, $\|f\|^2$. In this paper, we focus on the Laplacian SVM formulation, which basically uses an SVM as the learner core and the graph Laplacian for manifold regularization. In the following, we review all the ingredients of the formulation.

A. Cost function of the errors

The Laplacian SVM uses the same hinge loss function as the traditional SVM:

$$V(\mathbf{x}_i, y_i, f) = \max\{0,\, 1 - y_i f(\mathbf{x}_i)\}, \tag{2}$$

where $f$ represents the decision function implemented by the selected classifier.

B. Decision function

We use as the decision function $f(\mathbf{x}_*) = \langle \mathbf{w}, \phi(\mathbf{x}_*) \rangle + b$, where $\phi(\cdot)$ is a nonlinear mapping to a higher (possibly infinite) dimensional Hilbert space $\mathcal{H}$, and $\mathbf{w}$ and $b$ define [...] without considering the mapping $\phi$ explicitly, a non-linear classifier can be constructed by selecting the proper kernel. Also, the regularization term can be fully expressed in terms of the kernel matrix and the expansion coefficients:

$$\|f\|_{\mathcal{H}}^2 = \|\mathbf{w}\|^2 = (\mathbf{\Phi}\boldsymbol{\alpha})^\top (\mathbf{\Phi}\boldsymbol{\alpha}) = \boldsymbol{\alpha}^\top \mathbf{K} \boldsymbol{\alpha}. \tag{4}$$

C. Manifold regularization

The geometry of the data is modeled with a graph in which nodes represent both labeled and unlabeled samples connected by weights $W_{ij}$ [5], [14]. Regularizing the graph follows from the smoothness (or manifold) assumption and intuitively is equivalent to penalizing the "rapid changes" of the classification function evaluated between close samples in the graph:

$$\|f\|_{\mathcal{M}}^2 = \frac{1}{(l+u)^2} \sum_{i,j=1}^{l+u} W_{ij} \left( f(\mathbf{x}_i) - f(\mathbf{x}_j) \right)^2 = \mathbf{f}^\top \mathbf{L} \mathbf{f}, \tag{5}$$

where $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the graph Laplacian, $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{W}$ given by $D_{ii} = \sum_{j=1}^{l+u} W_{ij}$, and $\mathbf{f} = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_{l+u})]^\top = \mathbf{K}\boldsymbol{\alpha}$, where we have deliberately dropped the bias term $b$.

D. Formulation

By plugging (2)-(5) into (1), we obtain the regularized function to be minimized:

$$\min_{\boldsymbol{\xi} \in \mathbb{R}^{l},\; \boldsymbol{\alpha} \in \mathbb{R}^{l+u}} \left\{ \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_L \boldsymbol{\alpha}^\top \mathbf{K} \boldsymbol{\alpha} + \frac{\gamma_M}{(l+u)^2} \boldsymbol{\alpha}^\top \mathbf{K} \mathbf{L} \mathbf{K} \boldsymbol{\alpha} \right\} \tag{6}$$

subject to:

$$y_i \left( \sum_{j=1}^{l+u} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) + b \right) \geq 1 - \xi_i, \quad i = 1, \ldots, l \tag{7}$$

$$\xi_i \geq 0, \quad i = 1, \ldots, l \tag{8}$$

where $\xi_i$ are slack variables to deal with committed errors in the labeled samples. Introducing restrictions (7)-(8) into the primal functional (6) through Lagrange multipliers, $\beta_i$ and $\eta_i$, and taking derivatives w.r.t. $b$ and $\xi_i$, we obtain:

$$\min_{\boldsymbol{\alpha}, \boldsymbol{\beta}} \left\{ \frac{1}{2} \boldsymbol{\alpha}^\top \left( 2\gamma_L \mathbf{K} + \frac{2\gamma_M}{(l+u)^2} \mathbf{K} \mathbf{L} \mathbf{K} \right) \boldsymbol{\alpha} - \boldsymbol{\alpha}^\top \mathbf{K} \mathbf{J}^\top \mathbf{Y} \boldsymbol{\beta} + \sum_{i=1}^{l} \beta_i \right\}, \tag{9}$$

where $\mathbf{J} = [\mathbf{I}\;\mathbf{0}]$ is an $l \times (l+u)$ matrix with $\mathbf{I}$ as the $l \times l$ identity matrix (the first $l$ points are labeled) and $\mathbf{Y} = \mathrm{diag}(y_1, \ldots, y_l)$.
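As a rough illustration of the ingredients in (4)-(6), the following NumPy sketch builds a kernel matrix, a graph Laplacian L = D − W, and evaluates the primal objective (6) for given expansion coefficients. It is only a sketch under stated assumptions: the RBF kernel, the binary symmetrized k-nearest-neighbor weights, and all function names (`rbf_kernel`, `knn_graph_laplacian`, `lapsvm_objective`) are choices made for this example and are not specified in the text above. The slack variables are set to their smallest feasible values under (7)-(8), which is the hinge loss (2).

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (clipped at 0)."""
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def rbf_kernel(X, sigma=1.0):
    """Kernel matrix K over labeled and unlabeled samples (RBF is an assumption)."""
    return np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))


def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph.

    Binary weights W_ij in {0, 1} are an assumption; the text only
    requires some weight matrix W.
    """
    n = X.shape[0]
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)               # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]        # k nearest neighbors per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                     # make the graph undirected
    D = np.diag(W.sum(axis=1))                 # degree matrix, D_ii = sum_j W_ij
    return D - W, W


def lapsvm_objective(alpha, b, K, L, y, gamma_L, gamma_M):
    """Value of the primal objective (6) at (alpha, b).

    The first len(y) samples are taken to be the labeled ones.
    """
    l, n = len(y), K.shape[0]                  # n = l + u
    f = K @ alpha                              # f = K alpha, bias dropped as in (5)
    xi = np.maximum(0.0, 1.0 - y * (f[:l] + b))  # smallest slacks satisfying (7)-(8)
    ambient = gamma_L * (alpha @ K @ alpha)      # gamma_L * alpha^T K alpha, eq. (4)
    manifold = gamma_M / n**2 * (alpha @ K @ L @ K @ alpha)
    return xi.mean() + ambient + manifold
```

For a symmetric W, the pairwise sum in (5) equals the quadratic form in L up to a constant factor (the sum over all ordered pairs is twice f^T L f); such constants can be absorbed into γ_M. A generic optimizer could be applied to `lapsvm_objective`, although the text instead proceeds through the Lagrangian form (9).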
