Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 2009, Article ID 856039, 13 pages doi:10.1155/2009/856039
Research Article On the Performance of Kernel Methods for Skin Color Segmentation
A. Guerrero-Curieses,1 J. L. Rojo-Álvarez,1 P. Conde-Pardo,2 I. Landesa-Vázquez,2 J. Ramos-López,1 and J. L. Alba-Castro2
1 Departamento de Teoría de la Señal y Comunicaciones, Universidad Rey Juan Carlos, 28943 Fuenlabrada, Spain
2 Departamento de Teoría de la Señal y Comunicaciones, Universidad de Vigo, 36200 Vigo, Spain
Correspondence should be addressed to A. Guerrero-Curieses, [email protected]
Received 26 September 2008; Revised 23 March 2009; Accepted 7 May 2009
Recommended by C.-C. Kuo
Human skin detection in color images is a key preprocessing stage in many image processing applications. Though kernel-based methods have recently been pointed out as advantageous for this setting, there is still little evidence of their actual superiority. Specifically, the binary Support Vector Classifier (two-class SVM) and one-class Novelty Detection (SVND) have only been tested on some example images or on limited databases. We hypothesize that comparative performance evaluation on a representative, application-oriented database will allow us to determine whether the proposed kernel methods exhibit significantly better performance than conventional skin segmentation methods. Two image databases were acquired for a webcam-based face recognition application, under controlled and uncontrolled lighting and background conditions. Three different chromaticity spaces (YCbCr, CIEL∗a∗b∗, and normalized RGB) were used to compare kernel methods (two-class SVM, SVND) with conventional algorithms (Gaussian Mixture Models and Neural Networks). Our results show that the two-class SVM outperforms the conventional classifiers and also the one-class SVM (SVND) detectors, especially under uncontrolled lighting conditions, with an acceptably low complexity.
Copyright © 2009 A. Guerrero-Curieses et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction

Skin detection is often the first step in many image processing man-machine applications, such as face detection [1, 2], gesture recognition [3], video surveillance [4], human video tracking [5], or adaptive video coding [6]. Although pixelwise skin color alone is not sufficient for segmenting human faces or hands, color segmentation for skin detection has been proven to be an effective preprocessing step for the subsequent processing analysis. The segmentation task in most of the skin detection literature is achieved by using simple thresholding [7], histogram analysis [8], single Gaussian distribution models [9], or Gaussian Mixture Models (GMM) [1, 10, 11]. The main drawbacks of the distribution-based parametric modeling techniques are, first, their strong dependence on the chosen color space and lighting conditions, and second, the need for selection of the appropriate model for statistical characterization of both the skin and the nonskin classes [12]. Even with an accurate estimation of the parameters in any density-based parametric model, the best detection rate in skin color segmentation cannot be ensured. When a nonparametric modeling is adopted instead, a relatively high number of samples is required for an accurate representation of skin and nonskin regions, like histograms [13] or Neural Networks (NN) [12].

Recently, the suitability of kernel methods has been pointed out as an alternative approach for skin segmentation in color spaces [14–17]. First, the Support Vector Machine (SVM) was proposed for classifying pixels into skin or nonskin samples, by stating the segmentation problem as a binary classification task [17], and later, some authors have proposed that the main interest in skin segmentation could be an adequate description of the domain that supports the skin pixels in the color space, rather than devoting effort to model the more heterogeneous nonskin class [14, 15].
According to this hypothesis, one-class kernel algorithms, known in the kernel literature as Support Vector Novelty Detection (SVND) [18, 19], have been used for skin segmentation.

However, to the best of our knowledge, few exhaustive performance comparisons have been made to date that support a significant advantage of kernel methods with respect to conventional skin segmentation algorithms. Moreover, different merit figures have been used in different studies, and even contradictory conclusions have been obtained when comparing SVM skin detectors with conventional parametric detectors [16, 17]. Furthermore, the advantage of focusing on determining the region that supports most of the skin pixels in SVND algorithms, rather than modeling skin and nonskin regions simultaneously (as done in GMM, NN, and SVM algorithms), has not been thoroughly tested [14, 15].

Therefore, we hypothesize that comparative performance evaluation on a database, with identical merit figures, will allow us to determine whether the proposed kernel methods exhibit significantly better performance than conventional skin segmentation methods. For this purpose, two image databases have been acquired for a webcam-based face recognition application, under controlled and uncontrolled lighting and background conditions. Three different chromaticity spaces (YCbCr, CIEL∗a∗b∗, and normalized RGB) are used to compare kernel methods (SVM and SVND) with conventional skin segmentation algorithms (GMM and NN).

The scheme of this paper is as follows. In Section 2, we summarize the state of the art in skin color representation and segmentation, and we highlight some recent findings that explain the apparent lack of consensus on some issues regarding the optimum color spaces, fitting models, and kernel methods. Section 3 summarizes the well-known GMM formulation and presents a basic description of the kernel algorithms that are used here. In Section 4, performance is evaluated for conventional and for kernel-based segmentations, with emphasis on the free parameter tuning. Finally, Section 5 contains the conclusions of our study.

2. Background on Color Skin Segmentation

Pixelwise skin detection in color still images is usually accomplished in three steps: (i) color space transformation, (ii) parametric or nonparametric color distribution modeling, and (iii) binary skin/nonskin decision. We present the background on the main results in the literature that are related to our work, in terms of the skin pixel representation and of the kernel methods previously used in this setting.

2.1. Color Spaces and Distribution Modeling. The first step in skin segmentation, color space transformation, has been widely acknowledged as a necessary stage to deal with the perceptual nonuniformity and with the high correlation among RGB channels, due to their mixing of luminance and chrominance information. However, although several color space transformations have been proposed and compared [7, 10, 17, 20], none of them can be considered as the optimal one. The selection of an adequate color space largely depends on factors like the robustness to changing illumination spectra, the selection of a suitable distribution model, and the memory or complexity constraints of the running application.

In recent years, experiments on highly representative datasets with uncontrolled lighting conditions have shown that the performance of the detector is degraded by those transformations which drop the luminance component. Also, color-distribution modeling has been shown to have a larger effect on performance than color space selection [7, 21]. As trivially shown in [21], given an invertible one-to-one transformation between two 3D color spaces, if there exists an optimum skin detector in one space, there exists another optimum skin detector that performs exactly the same in the transformed space. Therefore, results of skin detection reported in the literature for different color spaces must be understood as specific experiments constrained by the specific available data, the distribution model chosen to fit the specific transformed training data, and the train-validation-test split used to tune the detector.

Jayaram et al. [22] showed the performance of 9 color spaces, with and without including the luminance component, on a large set of skin pixels under different illumination conditions from a face database, and nonskin pixels from a general database. With this experimental setup, histogram-based detection performed consistently better than Gaussian-based detection, both in 2D and in 3D spaces, whereas 3D detection performed consistently better than 2D detection for histograms, but inconsistently better for Gaussian modeling. Also, regarding color space differences, some transformations performed better than RGB, but the differences were not statistically significant.

Phung et al. [12] compared more distribution models (histogram-based, Gaussian, and GMM) and decision-based classifiers (piecewise linear and NN) over 4 color spaces by using their ECU face and skin detection database. This database is composed of thousands of images with indoor and outdoor lighting conditions. The histogram-based Bayes and the MLP classifiers in RGB performed very similarly, and consistently better than the other Gaussian-based and piecewise linear classifiers. The performance over the four color spaces with high-resolution histogram modeling was almost the same, as expected. Also, mean performance decreased and variance increased when the luminance component was discarded. In [17], the performance of nonparametric, semiparametric, and parametric approaches was evaluated over sixteen color spaces in 2D and 3D, concluding that, in general, the performance does not improve with color space transformation, but instead it decreases with the absence of luminance. All these tests highlight the fact that, with a rich representation of the 3D color space, color transformation is not useful at all, but they also reveal the lack of consensus regarding the performance of different color-distribution models, even when nonparametric ones seem to work better for large datasets.
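As a concrete illustration of the intensity-normalization idea, the following sketch maps an RGB pixel to the normalized rg chromaticity plane and to the CbCr chrominance plane. The full-range BT.601 YCbCr coefficients are an assumption here, since this excerpt does not specify the exact conversion used.

```python
def rgb_to_norm_rg(R, G, B):
    # Normalized RGB chromaticity: discard intensity by dividing each
    # channel by the pixel's total brightness (r + g + b = 1, so two
    # coordinates suffice for the 2D chromaticity plane)
    s = R + G + B
    if s == 0:
        return (1 / 3, 1 / 3)  # convention for pure black
    return (R / s, G / s)

def rgb_to_cbcr(R, G, B):
    # Chrominance (Cb, Cr) of full-range BT.601 YCbCr (assumed
    # coefficients); the luminance channel Y is simply dropped
    Cb = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B
    return (Cb, Cr)

# An achromatic (gray) pixel maps to the neutral point of both planes
print(rgb_to_norm_rg(128, 128, 128))  # close to (0.333, 0.333)
print(rgb_to_cbcr(128, 128, 128))     # (128.0, 128.0)
```

Note that any achromatic pixel collapses to the neutral point of both planes, which is precisely how these transforms discard intensity.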
With these considerations in mind, and from our point of view, the design of the optimum skin detector for a specific application should consider the following situations.

(i) If there are enough labeled training data to generously fill the RGB space, at least in the regions where the pixels of that application will map, and if RAM memory is not a limitation, a simple nonparametric histogram-based Bayes classifier over any color space will do the job.

(ii) If there is not enough RAM memory or enough labeled data to produce an accurate 3D histogram, but the samples still represent skin under constrained lighting conditions, a chromaticity space with intensity normalization will probably generalize better when scarcity of data prevents modeling the 3D color space. The performance of any distribution-based or boundary-based classifier will be dependent on the training data and the color space, so a joint selection should end up with a skin detector that just works fine, but generalization could be compromised if conditions change largely.

(iii) If the spectral distribution of the prevailing light sources is heavily changing, unknown, or cannot be estimated or corrected, then it is better to switch to a gray-level-based face detector, because any attempt to build a skin detector with such a training set and conditions will yield unpredictable and poor results, unless dynamic adaptation of the skin color model in video sequences is possible (see [23] for an example with known camera response under several color illuminants).

In this paper we study more deeply the second situation, which seems to be the most typical one for specific applications, and we will focus on model selection for several 2D color spaces. We will analyze whether boundary-based models, like kernel methods, work consistently better than distribution-based models, like classical GMM.

2.2. Kernel Methods for Skin Segmentation. The skin detection problem has been previously addressed with kernel methods in the literature. In [16], a comparative analysis of the performance of SVM on the features of a segmentation based on the Orthogonal Fourier-Mellin Moments can be found. They conclude that SVM achieves a higher face detection performance than a 3-layer Multilayer Perceptron (MLP) when an adequate kernel function and free parameters are used to train the SVM. The best tradeoff between the rate of correct face detection and the rate of correct rejection of distractors by using SVM is in the 65%–75% interval for different color spaces. Nevertheless, this database does not consider different illumination conditions. A more comprehensive review of color-based skin detection methods can be found in [17], which focuses on classifying each pixel as skin or nonskin without considering any preprocessing stage. The classification performance, in terms of ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve), is evaluated by using SPM (Skin Probability Map), GMM, SOM (Self-Organizing Map), and SVM on 16 color spaces and under varying lighting conditions. According to the results in terms of AUC, the best model is SPM, followed by GMM, SVM, and SOM. This is the only work where the performance obtained with kernel methods is lower than that achieved with SPM and GMM. This work also concludes that the free parameter ν has little influence on the results, contrary to the rest of the works with kernel methods. Other works have shown that the histogram-based classifier can be an alternative to GMM [13] or even MLP [12] for skin segmentation problems. With our databases, the results obtained by the histogram-based method have not been shown to be better than those from an MLP classifier.

These previous works have considered skin detection as a skin/nonskin binary classification problem. Therefore, they used two-class kernel models. More recently, in order to avoid modeling nonskin regions, other approaches have been proposed to tackle the problem of skin detection by means of one-class kernel methods. In [14], a one-class SVM model is used to separate face patterns from others. Although it is concluded that extensive experiments show that this method has an encouraging performance, no further comparisons with other approaches are included, and few numerical results are reported. In [15], it is concluded that one-class kernel methods outperform other existing skin color models in normalized RGB and other color transformations, but again, comprehensive numerical comparisons are not reported, and no comparisons to other skin detectors are included.

Taking into account the previous works in the literature, the superiority of kernel methods for tackling the problem of skin detection should be shown by using an appropriate experimental setup and by making systematic comparisons with other models proposed to solve the problem.

3. Segmentation Algorithms

We next introduce the notation and briefly review the segmentation algorithms used in the context of skin segmentation applications, namely, the well-known GMM segmentation and the kernel methods with binary SVM and one-class SVND algorithms.

3.1. GMM Skin Segmentation. GMM for skin segmentation [11, 13] can be briefly described as follows. The a priori probability P(x, Θ) of each skin color pixel x (in our case, x ∈ R²; see Section 4) is assumed to be the weighted contribution of k Gaussian components, each being defined by a parameter vector θ_i = {w_i, μ_i, Σ_i}, where w_i is the weight value of the ith component, and μ_i, Σ_i are its mean vector and covariance matrix, respectively. The whole set of free parameters will be denoted by Θ = {θ_1, ..., θ_K}.
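As a toy numerical illustration of this weighted-mixture model, the sketch below evaluates a two-component skin-color density at a single chromaticity pixel. All parameter values are hypothetical, and diagonal covariances are used for brevity, whereas the formulation here allows full covariance matrices Σ_i.

```python
import math

def gauss2d(x, mu, var):
    # Diagonal-covariance bivariate Gaussian density: a simplified form
    # of the component density p(x | i); with d = 2 the normalization
    # constant (2*pi)^(d/2) reduces to 2*pi
    det = var[0] * var[1]
    q = (x[0] - mu[0]) ** 2 / var[0] + (x[1] - mu[1]) ** 2 / var[1]
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def gmm_density(x, weights, means, variances):
    # Mixture density P(x, Theta): weighted sum of k Gaussian components
    return sum(w * gauss2d(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component skin model in a 2D chromaticity space
# (the weights sum to 1, as the formulation requires)
weights = [0.6, 0.4]
means = [(0.45, 0.32), (0.50, 0.30)]
variances = [(0.002, 0.001), (0.004, 0.002)]

p_skin = gmm_density((0.46, 0.31), weights, means, variances)  # near a mode
p_far = gmm_density((0.20, 0.60), weights, means, variances)   # far from both
```

A pixel would then be labeled skin when this density (or the corresponding posterior probability) exceeds a decision threshold.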
Within a Bayesian approach, the probability for a given color pixel x can be written as

P(x, \Theta) = \sum_{i=1}^{k} w_i \, p(x \mid i),  (1)

where the ith component is given by

p(x \mid i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right),  (2)

and the relative weights fulfill \sum_{i=1}^{k} w_i = 1 and w_i ≥ 0. The adjustable free parameters Θ are estimated by minimizing the negative log-likelihood for a training dataset X ≡ {x_1, ..., x_l}, that is, we minimize

-\sum_{j=1}^{l} \ln P(x_j, \Theta) = -\sum_{j=1}^{l} \ln \sum_{i=1}^{k} w_i \, p(x_j \mid i).  (3)

The optimization is addressed by using the EM algorithm [24], which calculates the a posteriori probabilities as

P^t(i \mid x_j) = \frac{w_i^t \, p^t(x_j \mid i)}{P(x_j, \Theta^t)},  (4)

where superscript t denotes the parameter values at the tth iteration. The new parameters are obtained by

\mu_i^{t+1} = \frac{\sum_{j=1}^{l} P^t(i \mid x_j) \, x_j}{\sum_{j=1}^{l} P^t(i \mid x_j)}, \quad
\Sigma_i^{t+1} = \frac{\sum_{j=1}^{l} P^t(i \mid x_j)(x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{l} P^t(i \mid x_j)}, \quad
w_i^{t+1} = \frac{1}{l} \sum_{j=1}^{l} P^t(i \mid x_j).  (5)

The final model will depend on the model order K, which has to be analyzed in each particular problem for the best bias-variance tradeoff.

A k-means algorithm is often used for initialization, in order to take into account even poorly represented groups of samples. All components are initialized to w_i = 1/k, and the covariance matrices Σ_i to δ²I, where δ is the Euclidean distance from the component mean μ_i to the nearest neighbor.

3.2. Kernel-Based Binary Skin Segmentation. Kernel methods provide us with efficient nonlinear algorithms by following two conceptual steps: first, the samples in the input space are nonlinearly mapped to a high-dimensional space, known as feature space, and second, the linear equations of the data model are stated in that feature space, rather than in the input space. This methodology yields compact algorithm formulations, and leads to single-minimum quadratic programming problems when nonlinearity is addressed by means of the so-called Mercer's kernels [25].

Assume that {(x_i, y_i)}_{i=1}^{l}, with x_i ∈ R², represents a set of l observed skin and nonskin samples in a color space, with class labels y_i ∈ {−1, 1}. Let φ : R² → F be a possibly nonlinear mapping from the color space to a possibly higher-dimensional feature space F, such that the dot product between two vectors in F can be readily computed using a bivariate function K(x, y), known as Mercer's kernel, which fulfills Mercer's theorem [26], that is,

K(x, y) = \langle \varphi(x), \varphi(y) \rangle.  (6)

For instance, a Gaussian kernel is often used in support vector algorithms, given by

K(x, y) = e^{-\|x - y\|^2 / 2\sigma^2},  (7)

where σ is the kernel free parameter, which must be previously chosen according to some criteria about the problem at hand and the available data. Note that, by using Mercer's kernels, the nonlinear mapping φ does not need to be explicitly known.

In the most general case of nonlinearly separable data, the optimization criterion for the binary SVM consists of minimizing

\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i,  (8)

constrained to y_i(⟨w, φ(x_i)⟩ + b) ≥ 1 − ξ_i and to ξ_i ≥ 0, for i = 1, ..., l. Parameter C is introduced to control the tradeoff between the margin and the losses. By using the Lagrange theorem, the Lagrangian functional can be stated as

L_{pd} = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \beta_i \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i(\langle w, \varphi(x_i) \rangle + b) - 1 + \xi_i \right],  (9)

constrained to α_i, β_i ≥ 0, and it has to be maximized with respect to the dual variables α_i, β_i and minimized with respect to the primal variables w, b, ξ_i. By taking the first derivative with respect to the primal variables, the Karush-Kuhn-Tucker (KKT) conditions are obtained, where

w = \sum_{i=1}^{l} \alpha_i y_i \varphi(x_i),  (10)

and the solution is achieved by maximizing the dual functional

\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j),  (11)

constrained to α_i ≥ 0 and \sum_{i=1}^{l} \alpha_i y_i = 0. Solving this quadratic programming (QP) problem yields the Lagrange multipliers α_i, and the decision function can be computed as

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x, x_i) + b \right),  (12)

which has been readily expressed in terms of Mercer's kernels in order to avoid the explicit knowledge of the feature space and of the nonlinear mapping φ, and where sgn(·) denotes the sign function for a real number.
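As an illustration of the resulting classifier, the sketch below evaluates the kernel expansion of Eq. (12) with the Gaussian kernel of Eq. (7). The support vectors, multipliers, and bias are toy values chosen by hand (they satisfy the constraint that the α_i y_i sum to zero, but they are not the output of an actual QP solver over the paper's databases).

```python
import math

def rbf(x, y, sigma=0.1):
    # Gaussian (RBF) kernel of Eq. (7)
    return math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)
                    / (2 * sigma ** 2))

def svm_decide(x, svs, alphas, labels, b, sigma=0.1):
    # Decision function of Eq. (12): the sign of the kernel expansion
    # over the support vectors plus the bias
    s = sum(a * y * rbf(x, v, sigma)
            for a, y, v in zip(alphas, labels, svs))
    return 1 if s + b >= 0 else -1

# Toy "solution": one skin prototype (+1) and one nonskin prototype (-1)
# with equal multipliers; these values are illustrative only
svs = [(0.45, 0.32), (0.20, 0.60)]
labels = [+1, -1]
alphas = [1.0, 1.0]
b = 0.0
```

For instance, `svm_decide((0.44, 0.33), svs, alphas, labels, b)` labels a pixel near the skin prototype as +1, while a pixel near the nonskin prototype is labeled -1.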
Note from (10) that the hyperplane in F is given by a linear combination of the mapped input vectors, and accordingly, the patterns with α_i ≠ 0 are called Support Vectors. They contain all the relevant information for describing the hyperplane in F that separates the data in the input space. The number of support vectors is usually small (i.e., the SVM gives a sparse solution), and it is related to the generalization error of the classifier.

3.3. Kernel-Based One-Class Skin Segmentation. The domain description of a multidimensional distribution can be addressed by using kernel algorithms that systematically enclose the data points into a nonlinear boundary in the input space. SVND algorithms distinguish between the class of objects represented in the training set and all the other possible objects. It is important to highlight that SVND represents a very different problem than the SVM. The training of SVND only uses training samples from one single class (skin pixels), whereas an SVM approach requires training with pixels from two different classes (skin and nonskin). Hence, let X ≡ {x_1, ..., x_l} be now a set of l observed skin-only samples in a color space. Note that, in this case, nonskin samples are not used in the training dataset.

Two main algorithms for SVND have been proposed, which are based on different geometrical models in the feature space, and their schematic is depicted in Figure 1. One of them uses a maximum margin hyperplane in F that separates the mapped data from the origin of F [18], whereas the other finds a hypersphere in F with minimum radius enclosing the mapped data [19]. These algorithms are next summarized.

Figure 1: SVND algorithms make a nonlinear mapping from the input space to the feature space. A simple geometric figure (hypersphere or hyperplane) is traced therein, which splits the feature space into known domain and unknown domain. This corresponds to a nonlinear, complex geometry boundary in the input space.

3.3.1. SVND with Hyperplane. The SVND algorithm proposed in [18] builds a domain function whose value is +1 in the half region of F that captures most of the data points, and −1 in the other half region. The criterion followed therein consists of first mapping the data into F, and then separating the mapped points from the origin with maximum margin. The decision function is required to be positive for most training vectors x_i, and it is given by

f(x) = \mathrm{sgn}(\langle w, \varphi(x) \rangle - \rho),  (13)

where w, ρ are the maximum margin hyperplane and the bias, respectively. For a newly tested point x, the decision value f(x) is determined by mapping this point to F and then evaluating on which side of the hyperplane it is mapped.

In order to state the problem, two terms are simultaneously considered. On the one hand, the maximum margin condition can be introduced as usual in the SVM classification formulation [26], and then, maximizing the margin is equivalent to minimizing the norm of the hyperplane vector w. On the other hand, the domain description is required to bound the space region that contains most of the observed data, but slack variables ξ_i are introduced in order to consider some losses, that is, to allow a reduced number of exceptional samples outside the domain description. Therefore, the optimization criterion can be expressed as the simultaneous minimization of these two terms, that is, we want to minimize

\frac{1}{2}\|w\|^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i - \rho,  (14)

with respect to w, ρ, and constrained to

\langle w, \varphi(x_i) \rangle \ge \rho - \xi_i,  (15)

and to ρ > 0 and ξ_i ≥ 0, for i = 1, ..., l. Parameter ν ∈ (0, 1) is introduced to control the tradeoff between the margin and the losses.

The Lagrangian functional can be stated similarly to the preceding subsection, and now the dual problem reduces to minimizing

\frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j K(x_i, x_j),  (16)

constrained to the KKT conditions given by \sum_{i=1}^{l} \alpha_i = 1, 0 ≤ α_i ≤ 1/(νl), and w = \sum_{i=1}^{l} \alpha_i \varphi(x_i).

It can be easily shown that the samples x_i that are mapped into the +1 semispace have no losses (ξ_i = 0) and a null coefficient α_i, so that they are not support vectors. Also, the samples x_i that are mapped onto the boundary have no losses, but they are support vectors with 0 < α_i < 1/(νl), and accordingly they are called unbounded support vectors. Finally, the samples x_i that are mapped outside the domain region have nonzero losses, ξ_i > 0, their corresponding Lagrange multipliers are α_i = 1/(νl), and they are called bounded support vectors.

Solving this QP problem, the decision function (13) can be easily rewritten as

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i K(x, x_i) - \rho \right).  (17)

By now inspecting the KKT conditions, we can see that, for ν close to 1, the solution consists of all α_i being at
sphere boundary have no losses, and they are support vectors with 0 < α_i
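The hyperplane-based one-class decision rule of Eq. (17) can be sketched as follows; the support vectors, multipliers, and bias ρ are hypothetical values (the α_i sum to 1, consistent with the KKT constraints, but they do not come from a solved QP).

```python
import math

def rbf(x, y, sigma=0.1):
    # Gaussian (RBF) kernel of Eq. (7)
    return math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)
                    / (2 * sigma ** 2))

def svnd_decide(x, svs, alphas, rho, sigma=0.1):
    # One-class decision function of Eq. (17): +1 inside the learned
    # skin domain, -1 outside (novelty)
    s = sum(a * rbf(x, v, sigma) for a, v in zip(alphas, svs))
    return 1 if s - rho >= 0 else -1

# Hypothetical solution with two unbounded support vectors; the
# multipliers sum to 1, and rho is likewise illustrative
svs = [(0.45, 0.32), (0.47, 0.30)]
alphas = [0.5, 0.5]
rho = 0.5
```

A chromaticity pixel close to the skin support vectors is accepted as skin (+1), while a pixel far from the learned domain is rejected as novelty (-1), without any nonskin sample having been used at training time.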