IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. XX, NO. Y, MONTH Z 2007

Semi-supervised Image Classification with Laplacian Support Vector Machines

Luis Gómez-Chova, Gustavo Camps-Valls, Senior Member, IEEE, Jordi Muñoz-Marí, and Javier Calpe, Member, IEEE

Abstract— This paper presents a semi-supervised method for remote sensing image classification based on kernel machines and graph theory. The support vector machine (SVM) is regularized with the un-normalized graph Laplacian, thus leading to the Laplacian SVM. The method is tested in the challenging problems of urban monitoring and cloud screening, in which an adequate exploitation of the wealth of unlabeled samples is of critical importance. Results obtained using different sensors, and with a low number of training samples, demonstrate the potential of the proposed Laplacian SVM for remote sensing image classification.

Index Terms— Semi-supervised learning, regularization, kernel methods, manifold learning, support vector machines.

I. INTRODUCTION

In remote sensing image classification, we are usually given a reduced set of labeled samples to develop the classifier. Supervised classifiers such as support vector machines (SVMs) [1], [2] excel in using the labeled information, being (regularized) maximum margin classifiers also equipped with an appropriate kernel function [3], [4]. These methods, nevertheless, need to be reformulated to exploit the information contained in the wealth of unlabeled samples, which is known as semi-supervised classification. In semi-supervised learning (SSL), the algorithm is provided with some available supervised information in addition to the unlabeled data. The framework of semi-supervised learning is very active and has recently attracted a considerable amount of research [5], [6]. Essentially, three different classes of SSL algorithms are encountered in the literature:

1) Generative models involve estimating the conditional density, such as expectation-maximization (EM) algorithms with finite mixture models [7], which have been extensively applied in the context of remotely sensed image classification [8].
2) Low density separation algorithms maximize the margin for labeled and unlabeled samples simultaneously, such as the Transductive SVM (TSVM) [9], which has recently been applied to hyperspectral image classification [10].
3) Graph-based methods, in which each sample spreads its label information to its neighbors until a global stable state is achieved on the whole dataset [11].

In the last years, TSVM and graph-based methods have captured great attention. However, some specific problems have been identified in both of them. In particular, TSVM is sensitive to local minima and requires convergence heuristics by using an (unknown) number of unlabeled samples. Graph-based methods are computationally demanding and generally do not yield a final decision function but only prediction labels.

In this paper, we present a recently introduced semi-supervised framework that incorporates labeled and unlabeled data in any general-purpose learner [12], [13]. We focus on a semi-supervised extension of the SVM, which introduces an additional regularization term on the geometry of both labeled and unlabeled samples by using the graph Laplacian [14], thus leading to the so-called Laplacian SVM (LapSVM). This methodology follows a non-iterative optimization procedure, in contrast to most transductive learning methods, and provides out-of-sample predictions, in contrast to graph-based approaches. In addition, hard-margin SVM, directed graph methods, label propagation methods, and spectral clustering solutions are obtained for particular values of the free parameters.

The performance of the LapSVM is illustrated in two challenging problems: the urban classification problem using multispectral (Landsat TM) and radar (ERS2 SAR) data [15], and the cloud screening problem using the MEdium Resolution Imaging Spectrometer (MERIS) instrument on board the ESA ENVIronmental SATellite (ENVISAT) [16]. On the one hand, monitoring urban areas at a regional, and even global, scale has become an increasingly important topic in the last decades in order to keep track of the loss of natural areas due to urban development. The classification of urban areas is, however, a complex problem, especially when different sensor sources are used, as they induce a highly variable input feature space. On the other hand, the amount of images acquired over the globe every day by Earth Observation (EO) satellites makes the presence of clouds inevitable. Therefore, accurate cloud masking in acquired scenes is of paramount importance to avoid errors in the study of the true ground reflectance, and to permit multitemporal studies, as it is no longer necessary to discard the whole image. However, very few labeled cloud pixels are typically available, and cloud features change to a great extent depending on the cloud type, thickness, transparency, height, or background. In addition, cloud screening must be carried out before atmospheric correction, the input data thus being affected by the atmospheric conditions. In summary, these problems constitute clear examples of classification on complex manifolds and of ill-posed problems, respectively. A manifold is a topological space that is locally Euclidean, but in which the global structure may be more complicated. In both settings, the use of the LapSVM is motivated since it permits using the labeled samples and efficiently exploiting the information contained in the high number of available unlabeled pixels to characterize the marginal distribution of the class of interest.

The rest of the paper is outlined as follows. Section II reviews the framework of semi-supervised learning, paying attention to the minimization functional and the need for different types of regularization. Section III presents the formulation of the Laplacian SVM. Section IV shows the experimental results. Finally, Section V concludes and outlines further work.

Manuscript received July 2007. Dept. Enginyeria Electrònica, Escola Tècnica Superior d'Enginyeria, Universitat de València, C/ Dr. Moliner, 50, 46100 Burjassot (València), Spain. E-mail: [email protected].

II. SEMI-SUPERVISED LEARNING FRAMEWORK

Regularization is necessary to produce smooth decision functions and thus avoid overfitting to the training data. Since the work of Tikhonov [17], many regularized algorithms have been proposed to control the capacity of the classifier [1], [18]. The regularization framework has recently been extended to the use of unlabeled samples [13] as follows.

Notationally, we are given a set of l labeled samples, \{\mathbf{x}_i, y_i\}_{i=1}^{l}, and a set of u unlabeled samples, \{\mathbf{x}_i\}_{i=l+1}^{l+u}, where \mathbf{x}_i \in \mathbb{R}^N and y_i \in \{-1, +1\}. Let us now assume a general-purpose decision function f. The regularized functional to be minimized is defined as:

    \mathcal{L} = \frac{1}{l} \sum_{i=1}^{l} V(\mathbf{x}_i, y_i, f) + \gamma_L \|f\|_{\mathcal{H}}^2 + \gamma_M \|f\|_{\mathcal{M}}^2,    (1)

where V represents a generic cost function of the committed errors on the labeled samples, \gamma_L controls the complexity of f in the associated Hilbert space \mathcal{H}, and \gamma_M controls its complexity in the intrinsic geometry of the marginal data distribution. For example, if the probability distribution is supported on a low-dimensional manifold, \|f\|_{\mathcal{M}}^2 penalizes f along that manifold \mathcal{M}. Note that this functional constitutes a general regularization framework that takes into account all the available knowledge.

III. LAPLACIAN SUPPORT VECTOR MACHINES

The previous semi-supervised learning framework allows us to develop many different algorithms just by playing around with the loss function, V, and the regularizers, \|f\|^2. In this paper, we focus on the Laplacian SVM formulation, which basically uses an SVM as the learner core and the graph Laplacian for manifold regularization. In the following, we review all the ingredients of the formulation.

A. Cost function of the errors

The Laplacian SVM uses the same hinge loss function as the traditional SVM:

    V(\mathbf{x}_i, y_i, f) = \max\{0, 1 - y_i f(\mathbf{x}_i)\},    (2)

where f represents the decision function implemented by the selected classifier.

B. Decision function

We use as the decision function f(\mathbf{x}_*) = \langle \mathbf{w}, \boldsymbol{\phi}(\mathbf{x}_*) \rangle + b, where \boldsymbol{\phi}(\cdot) is a nonlinear mapping to a higher (possibly infinite) dimensional Hilbert space \mathcal{H}, and \mathbf{w} and b define a linear regression in that space. By means of the Representer Theorem [1], the weights \mathbf{w} can be expressed in the dual problem as an expansion over both labeled and unlabeled samples, \mathbf{w} = \sum_{i=1}^{l+u} \alpha_i \boldsymbol{\phi}(\mathbf{x}_i) = \boldsymbol{\Phi}\boldsymbol{\alpha}, where \boldsymbol{\Phi} = [\boldsymbol{\phi}(\mathbf{x}_1), \ldots, \boldsymbol{\phi}(\mathbf{x}_{l+u})] and \boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_{l+u}]^\top. Then, the decision function is given by:

    f(\mathbf{x}_*) = \sum_{i=1}^{l+u} \alpha_i K(\mathbf{x}_i, \mathbf{x}_*) + b,    (3)

where \mathbf{K} is the kernel matrix formed by kernel functions K(\mathbf{x}_i, \mathbf{x}_j) = \langle \boldsymbol{\phi}(\mathbf{x}_i), \boldsymbol{\phi}(\mathbf{x}_j) \rangle. The key point here is that, without considering the mapping \boldsymbol{\phi} explicitly, a non-linear classifier can be constructed by selecting the proper kernel. Also, the regularization term can be fully expressed in terms of the kernel matrix and the expansion coefficients:

    \|f\|_{\mathcal{H}}^2 = \|\mathbf{w}\|^2 = (\boldsymbol{\Phi}\boldsymbol{\alpha})^\top (\boldsymbol{\Phi}\boldsymbol{\alpha}) = \boldsymbol{\alpha}^\top \mathbf{K} \boldsymbol{\alpha}.    (4)

C. Manifold regularization

The geometry of the data is modeled with a graph in which nodes represent both labeled and unlabeled samples, connected by weights W_{ij} [5], [14]. Regularizing over the graph follows from the smoothness (or manifold) assumption and is intuitively equivalent to penalizing "rapid changes" of the classification function evaluated between close samples in the graph:

    \|f\|_{\mathcal{M}}^2 = \frac{1}{2(l+u)^2} \sum_{i,j=1}^{l+u} W_{ij} \left( f(\mathbf{x}_i) - f(\mathbf{x}_j) \right)^2 = \frac{1}{(l+u)^2} \mathbf{f}^\top \mathbf{L} \mathbf{f},    (5)

where \mathbf{L} = \mathbf{D} - \mathbf{W} is the graph Laplacian, \mathbf{D} is the diagonal degree matrix of \mathbf{W} given by D_{ii} = \sum_{j=1}^{l+u} W_{ij}, and \mathbf{f} = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_{l+u})]^\top = \mathbf{K}\boldsymbol{\alpha}, where we have deliberately dropped the bias term b.

D. Formulation

By plugging (2)-(5) into (1), we obtain the regularized functional to be minimized:

    \min_{\xi_i \in \mathbb{R}^l,\; \boldsymbol{\alpha} \in \mathbb{R}^{l+u}} \left\{ \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_L \boldsymbol{\alpha}^\top \mathbf{K} \boldsymbol{\alpha} + \frac{\gamma_M}{(l+u)^2} \boldsymbol{\alpha}^\top \mathbf{K} \mathbf{L} \mathbf{K} \boldsymbol{\alpha} \right\}    (6)

subject to:

    y_i \left( \sum_{j=1}^{l+u} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) + b \right) \geq 1 - \xi_i, \quad i = 1, \ldots, l    (7)

    \xi_i \geq 0, \quad i = 1, \ldots, l    (8)

where \xi_i are slack variables to deal with committed errors on the labeled samples. Introducing restrictions (7)-(8) into the primal functional (6) through Lagrange multipliers, \beta_i and \eta_i, and taking derivatives w.r.t. b and \xi_i, we obtain:

    \min_{\boldsymbol{\alpha}, \boldsymbol{\beta}} \left\{ \frac{1}{2} \boldsymbol{\alpha}^\top \left( 2\gamma_L \mathbf{K} + \frac{2\gamma_M}{(l+u)^2} \mathbf{K} \mathbf{L} \mathbf{K} \right) \boldsymbol{\alpha} - \boldsymbol{\alpha}^\top \mathbf{K} \mathbf{J}^\top \mathbf{Y} \boldsymbol{\beta} + \sum_{i=1}^{l} \beta_i \right\},    (9)

where \mathbf{J} = [\mathbf{I}\;\mathbf{0}] is an l \times (l+u) matrix with \mathbf{I} the l \times l identity matrix (the first l points are labeled), and \mathbf{Y} = \mathrm{diag}(y_1, \ldots, y_l). Taking derivatives again w.r.t. \boldsymbol{\alpha}, we obtain the solution [13]:

    \boldsymbol{\alpha} = \left( 2\gamma_L \mathbf{I} + \frac{2\gamma_M}{(l+u)^2} \mathbf{L} \mathbf{K} \right)^{-1} \mathbf{J}^\top \mathbf{Y} \boldsymbol{\beta}^*.    (10)

Now, substituting (10) back into the dual functional (9), we obtain the following quadratic programming problem to be solved:

    \boldsymbol{\beta}^* = \max_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{l} \beta_i - \frac{1}{2} \boldsymbol{\beta}^\top \mathbf{Q} \boldsymbol{\beta} \right\}    (11)

subject to \sum_{i=1}^{l} \beta_i y_i = 0 and 0 \leq \beta_i \leq \frac{1}{l}, i = 1, \ldots, l, where

    \mathbf{Q} = \mathbf{Y} \mathbf{J} \mathbf{K} \left( 2\gamma_L \mathbf{I} + \frac{2\gamma_M}{(l+u)^2} \mathbf{L} \mathbf{K} \right)^{-1} \mathbf{J}^\top \mathbf{Y}    (12)

Therefore, the basic steps for obtaining the weights \alpha_i for the solution in (3) are: (i) build the weight matrix \mathbf{W} and compute the graph Laplacian \mathbf{L} = \mathbf{D} - \mathbf{W}; (ii) compute the kernel matrix \mathbf{K}; (iii) fix the regularization parameters \gamma_L and \gamma_M; and (iv) compute \boldsymbol{\alpha} using (10) after solving problem (11).

E. Relation with other classifiers

The Laplacian SVM is intimately related to other unsupervised and semi-supervised classifiers. This is because the method incorporates both the concepts of kernels and graphs in the same classifier, thus having connections with transduction, clustering, graph-based, and label propagation methods. The minimizing functional used in the standard TSVM considers a different regularization parameter for labeled and unlabeled samples, which is also the case in the proposed framework (cf. Eq. (1)). Also, the LapSVM is directly connected with the soft-margin SVM (\gamma_M = 0), the hard-margin SVM (\gamma_L \to 0, \gamma_M = 0), the graph-based regularization method (\gamma_L \to 0, \gamma_M > 0), the label-propagation regularization method (\gamma_L \to 0, \gamma_M \to 0, \gamma_M \gg \gamma_L), and spectral clustering (\gamma_M = 1). In conclusion, by optimizing parameters \gamma_L and \gamma_M over a wide enough range, the LapSVM theoretically outperforms the aforementioned classifiers. See [13] for deeper details and a theoretical comparison.

IV. EXPERIMENTAL RESULTS

This section presents the experimental results of the proposed method in two challenging scenarios: urban monitoring and cloud screening. These two examples are well-suited because efficient exploitation of unlabeled samples becomes strictly necessary to attain satisfactory results.

A. Model development and experimental setup

We used both the linear kernel, K(\mathbf{x}_i, \mathbf{x}_j) = \langle \mathbf{x}_i, \mathbf{x}_j \rangle, and the Radial Basis Function (RBF) kernel, K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2), where \sigma \in \mathbb{R}^+ is the kernel width, for both SVM and LapSVM. The graph Laplacian, \mathbf{L}, consisted of (l+u) nodes connected using k = 6 nearest neighbors, and the edge weights W_{ij} were computed using the Euclidean distance among samples. For all experiments, we generated training and validation sets consisting of l = 400 labeled samples (200 samples per class). The semi-supervised LapSVM used u = 400 unlabeled (randomly selected) samples from the analyzed images. We focus on the ill-posed scenario and varied the rate of both labeled and unlabeled samples independently in \{2, 5, 10, 20, 50, 100\}%. All classifiers are compared using the overall accuracy, OA[%], and the estimated kappa statistic, \kappa, as a measure of robustness in the classification.

Free parameters \gamma_L and \gamma_M were varied in steps of one decade in the range [10^{-4}, 10^{4}], and the Gaussian width was tuned in the range \sigma = \{10^{-3}, \ldots, 10\} for the RBF kernel. The selection of the best subset of free parameters was done by cross-validation. Figure 1 shows the kappa statistic as a function of the regularization parameters \gamma_L and \gamma_M obtained by a LapSVM on the validation set for an illustrative example. This figure clearly shows that the best classification results are obtained with \gamma_L > \gamma_M/(u+l)^2, which suggests a preference for the regularization of the classifier (supervised information) over the regularization of the geometry of the marginal data distribution (unsupervised information).

Fig. 1. Illustrative example of a kappa statistic surface over the validation set for the LapSVM as a function of regularization parameters \gamma_L and \gamma_M.

B. Urban monitoring

1) Data description: The image used in this first experiment was collected in the Urban Expansion Monitoring (UrbEx) ESA-ESRIN DUP project [19]. Results from the UrbEx project were used to perform the analysis of the selected test site and for validation purposes as well¹. The considered test site was Naples (Italy), where images from the ERS2 SAR and Landsat TM sensors were acquired in 1999. An external Digital Elevation Model (DEM) and a reference land cover map provided by the Italian Institute of Statistics (ISTAT) were also available. The ERS2 SAR 35-day interferometric pairs were selected with perpendicular baselines between 20-150 m in order to obtain the interferometric coherence from each complex SAR image pair. The available features were initially labeled as: L1-L7 for the Landsat bands; In1-In2 for the SAR backscattering intensities (0-35 days); and Co for the coherence. Since these features come from different sensors, the first step was to perform a specific processing and conditioning of the optical and SAR data, and to co-register all images. After pre-processing, features were stacked at a pixel level. For full details, see [15].

¹For further details, visit: http://dup.esrin.esa.int/ionia/projects/summaryp30.asp

2) Model comparison: Figure 2[top] shows the validation results for the SVM and LapSVM with both the linear and RBF kernels. Several conclusions can be obtained from this figure. First, LapSVM classifiers produce better classification results than SVM in all cases (note that SVM is a particular case of the Laplacian SVM for \gamma_M = 0). This gain is especially noticeable when a low number of labeled samples is available (unlabeled samples help in estimating the geometry of the manifold), and mainly for the RBF kernel; the differences are lower with the linear kernel. When the number of labeled samples is high, results among methods are numerically very similar, but the proposed method still produces better results in this situation. On the right plot, the \kappa surface for the LapSVM highlights the importance of the labeled information in this problem.
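Steps (i)-(iv) above can be sketched in code. The hinge-loss problem (11) requires a quadratic programming solver, so the sketch below instead uses a squared loss, which turns the same regularization scheme into a single linear solve (the Laplacian RLS variant discussed in [13]); the k = 6 nearest-neighbor graph and the RBF kernel follow the experimental setup. All function names are illustrative, not from the paper, and the bias term b is dropped as in Section III-C.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # K(a, b) = exp(-||a - b||^2 / (2 sigma^2)), computed pairwise.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def knn_graph(X, k=6):
    # Step (i): binary, symmetrized k-nearest-neighbor adjacency matrix W
    # built from Euclidean distances among all (labeled + unlabeled) samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0  # skip the sample itself
    return np.maximum(W, W.T)

def lap_rls_fit(X_lab, y_lab, X_unl, gamma_L=1e-2, gamma_M=1e-2, sigma=1.0, k=6):
    # Squared-loss analogue of steps (i)-(iv): one linear solve gives alpha.
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    W = knn_graph(X, k)
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian L = D - W
    K = rbf_kernel(X, X, sigma)               # step (ii): kernel matrix
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
    y = np.r_[y_lab, np.zeros(n - l)]
    # Solve (J K + gamma_L l I + (gamma_M l / n^2) L K) alpha = y.
    A = J @ K + gamma_L * l * np.eye(n) + (gamma_M * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, y)             # step (iv): expansion coefficients
    return alpha, X

def lap_rls_predict(alpha, X_train, X_new, sigma=1.0):
    # Out-of-sample decision function, cf. Eq. (3) with the bias dropped.
    return rbf_kernel(X_new, X_train, sigma) @ alpha
```

Setting gamma_M = 0 recovers a purely supervised kernel ridge classifier, mirroring the \gamma_M = 0 connection discussed in Section III-E.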


Fig. 2. Results for the urban classification (top row) and the cloud screening (bottom row) problems. Overall accuracy, OA[%], (left) and kappa statistic, \kappa, (middle) over the validation set as a function of the rate of labeled training samples used to build the LapSVM and SVM with RBF and linear kernels. Kappa statistic surface (right) over the validation set for the best RBF-LapSVM classifier as a function of the rate of both labeled and unlabeled training samples.
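As a side note, the two figures of merit reported in Fig. 2 can be computed from a binary confusion matrix. A minimal sketch (the formulas are standard and not given explicitly in the paper; the function name is illustrative):

```python
import numpy as np

def oa_and_kappa(y_true, y_pred, labels=(-1, 1)):
    # Build the 2x2 confusion matrix C[i, j]: true class i, predicted class j.
    idx = {c: i for i, c in enumerate(labels)}
    C = np.zeros((2, 2))
    for t, p in zip(y_true, y_pred):
        C[idx[t], idx[p]] += 1
    n = C.sum()
    oa = np.trace(C) / n                                  # overall accuracy
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n ** 2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)                        # Cohen's kappa
    return 100.0 * oa, kappa
```

Kappa discounts the agreement expected by chance, which is why the paper uses it as a robustness measure alongside OA[%] for unbalanced class proportions.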

3) Visual inspection: The best LapSVM classifier obtained before was used to classify the whole scene, which consists of 200×200 pixels with urban and non-urban labeled samples. The classification map is shown in Fig. 3. Excellent classification accuracy is obtained, and uniform classification covers can be observed, even with so few labeled training samples.

Fig. 3. Landsat TM RGB color composite (top left), SAR intensity map (top right), true map (bottom left), and classification map (bottom right) of the urban scene, indicating 'urban' (gray), 'non-urban' (white), and 'unknown' (black) pixel classes.

C. Cloud screening

1) Data description: Experiments were carried out using two MERIS Level 1b (L1b) images taken over Barrax (Spain), which are part of the data acquired in the framework of the SPARC 2003 and 2004 ESA campaigns (ESA-SPARC Project, contract ESTEC-18307/04/NL/FF). These two images were acquired on July 14th of two consecutive years (2003-07-14 and 2004-07-14). For our experiments, we used as input 13 spectral bands (MERIS bands 11 and 15 were removed since they are affected by atmospheric absorptions) and 6 physically-inspired features extracted from the 15 MERIS spectral bands in previous works [20], [21]: cloud brightness and whiteness in the visible (VIS) and near-infrared (NIR) spectral ranges, along with atmospheric oxygen and water vapour absorption features.

2) Model comparison: Figure 2[bottom] shows the validation results for the considered methods. Again, LapSVM models produce better classification results than SVM in all cases. In fact, LapSVM performs especially better than the standard SVM when a low number of labeled samples is available and unlabeled samples help in estimating the geometry of the manifold. In addition, the linear kernel seems to be more appropriate for the presented cloud screening application if the number of labeled samples is low, while the difference between the linear and the RBF kernels decreases as the number of labeled samples increases. It is well known that simple (linear) decision functions are more appropriate in extremely ill-posed situations. Finally, the \kappa surface for the LapSVM in Figure 2[bottom right] confirms, in general terms, the importance of the labeled information in this problem.

3) Visual inspection: We used the best LapSVM to classify the whole scenes, which consist of MERIS L1b images of 1153×1153 pixels (reduced to around 500000 useful pixels after the projection in Lat/Lon coordinates). The classification maps for the 2003 and 2004 images are shown in Fig. 4. In this figure, we use as ground truth a cloud classification made by an operator following the methodology described in [20], [21]. Excellent classification accuracies of 95.51% and 96.48% are obtained for the 2003 and 2004 images, respectively. The value of \kappa is 0.78 for both images, and reflects the misclassification of a significant number of false cloud pixels while all cloud pixels are correctly classified, suggesting that the LapSVM benefits from the inclusion of unlabeled samples and obtains a reliable estimation of the marginal cloud data distribution. The committed errors correspond to high-altitude locations and bright bare soil covers, which are not well represented in the small, randomly selected, training dataset.

Fig. 4. RGB color composite and classification maps for the analyzed MERIS multispectral images using the best LapSVM classifiers, indicating 'true cloud', 'false land', 'false cloud', 'true land', and 'background' pixels.

V. CONCLUSIONS

A semi-supervised method has been presented for the classification of urban areas and the identification of clouds. This method brings together the ideas of spectral graph theory, manifold learning, and kernel-based algorithms in a coherent and natural way to incorporate geometric structure in a kernel-based regularization framework. The solution of the LapSVM constitutes a convex optimization problem and results in a natural out-of-sample extension from the labeled and unlabeled training samples to novel examples, thus solving the problems of previously proposed methods. Results showed an increase in the classification accuracy provided by the LapSVM with respect to the standard SVM, both with linear and RBF kernels, suggesting that the considered problems hold a complex manifold.

This work has also revealed the potential of this classification method in remote sensing image classification when reduced training sets are available. In particular, it has accurately identified urban areas in multi-sensor imagery and located cloud covers in MERIS multispectral images. In both classification problems, it becomes very difficult to obtain a representative training set for all possible situations, thus motivating the introduction of a semi-supervised method exploiting the unlabeled data.

The main problem encountered is related to the computational cost, since a huge matrix consisting of labeled and unlabeled samples must be inverted. Note, however, that in this method it is not necessary to incorporate all unlabeled samples in the image, so the computational load is easily scalable. Nevertheless, we feel that smart sampling strategies to select the most informative unlabeled samples could yield improved performance, something to be considered in the future.

ACKNOWLEDGMENTS

This paper has been partially supported by the Spanish Ministry for Education and Science under project DATASAT ESP2005-07724-C05-03. The authors wish to thank ESA for the availability of the images acquired in the framework of the ESA-SPARC Project ESTEC-18307/04/NL/FF.

REFERENCES

[1] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.
[2] G. Camps-Valls, J. L. Rojo-Álvarez, and M. Martínez-Ramón, Eds., Kernel Methods in Bioengineering, Signal and Image Processing. Hershey, PA (USA): Idea Group Publishing, Jan 2007.
[3] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[4] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 6, pp. 1351-1362, June 2005.
[5] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, 1st ed. Cambridge, Massachusetts and London, England: MIT Press, 2006.
[6] X. Zhu, "Semi-supervised learning literature survey," Computer Sciences, University of Wisconsin-Madison, USA, Tech. Rep. 1530, 2005. Online document: http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf. Last modified on September 7, 2006.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[8] Q. Jackson and D. Landgrebe, "An adaptive classifier design for high-dimensional data analysis with a limited training data set," IEEE Transactions on Geoscience and Remote Sensing, pp. 2664-2679, Dec. 2001.
[9] V. N. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, 1998.
[10] L. Bruzzone, M. Chi, and M. Marconcini, "A novel transductive SVM for the semisupervised classification of remote-sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 11, pp. 3363-3373, 2006.
[11] G. Camps-Valls, T. Bandos, and D. Zhou, "Semi-supervised graph-based hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, 2007, accepted, in press.
[12] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, Special Issue on Clustering, vol. 56, pp. 209-239, 2004.
[13] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, pp. 2399-2434, 2006.
[14] M. I. Jordan, Learning in Graphical Models, 1st ed. Cambridge, Massachusetts and London, England: MIT Press, 1999.
[15] L. Gómez-Chova, D. Fernández-Prieto, J. Calpe, E. Soria, J. Vila-Francés, and G. Camps-Valls, "Urban monitoring using multitemporal SAR and multispectral data," Pattern Recognition Letters, Special Issue on "Pattern Recognition in Remote Sensing", vol. 27, no. 4, pp. 234-243, 2006.
[16] M. Rast and J. Bezy, "The ESA Medium Resolution Imaging Spectrometer MERIS: a review of the instrument and its mission," International Journal of Remote Sensing, vol. 20, no. 9, pp. 1681-1702, June 1999.
[17] A. N. Tikhonov, "Regularization of incorrectly posed problems," Sov. Math. Dokl., vol. 4, pp. 1624-1627, 1963.
[18] T. Evgeniou, M. Pontil, and T. Poggio, "Regularization networks and support vector machines," Advances in Computational Mathematics, vol. 13, no. 1, pp. 1-50, 2000.
[19] P. Castracane, F. Iavarone, S. Mica, E. Sottile, C. Vignola, O. Arino, M. Cataldo, D. Fernandez-Prieto, G. Guidotti, A. Masullo, and I. Pratesi, "Monitoring urban sprawl and its trends with EO data. UrbEx, a prototype national service from a WWF-ESA joint effort," in 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, 2003, pp. 245-248.
[20] L. Gómez-Chova, G. Camps-Valls, J. Amorós-López, L. Guanter, L. Alonso, J. Calpe, and J. Moreno, "New cloud detection algorithm for multispectral and hyperspectral images: Application to ENVISAT/MERIS and PROBA/CHRIS sensors," in IEEE International Geoscience and Remote Sensing Symposium, IGARSS'2006, Denver, Colorado, USA, July 2006.
[21] L. Gómez-Chova, G. Camps-Valls, J. Calpe, L. Guanter, and J. Moreno, "Cloud screening algorithm for ENVISAT/MERIS multispectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, accepted for publication, in press.