CSIFT Based Locality-Constrained Linear Coding for Image
Total Page:16
File Type:pdf, Size:1020Kb
1 CSIFT Based Locality-constrained Linear Coding for Image Classification CHEN Junzhou 1 , LI Qing 1, PENG Qiang 1 and Kin Hong Wong 2 1 School of Information Science & Technology, Southwest Jiaotong University, Chengdu, Sichuan, 610031, China 2 Department of Computer Science & Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong Email:[email protected]; [email protected] Abstract CSIFT descriptors can achieve better performances in re- In the past decade, SIFT descriptor has been witnessed sisting some certain photometric changes. One example as one of the most robust local invariant feature de- can be found in [3], which shows that CSIFT is more scriptors and widely used in various vision tasks. Most stable than SIFT in case of illumination changes. traditional image classification systems depend on the On the other hand, the bag-of-features (BoF) [7] [8] luminance-based SIFT descriptors, which only analyze joined with the spatial pyramid matching (SPM) kernel the gray level variations of the images. Misclassification [9] has been employed to build the recent state-of-the-art may happen since their color contents are ignored. In image classification systems. In BoF, images are consid- this article, we concentrate on improving the perfor- ered as sets of unordered local appearance descriptors, mance of existing image classification algorithms by which are clustered into discrete visual words for the adding color information. To achieve this purpose, dif- representation of images in semantic classification. ferent kinds of colored SIFT descriptors are introduced SPM divides an image into 2l × 2l segments in differ- and implemented. Locality-constrained Linear Coding ent scales l =0, 1, 2, computes the BoF histogram within (LLC), a state-of-the-art sparse coding technology, is each segment, and finally concatenates all the histograms employed to construct the image classification system to build a spatial location sensitive descriptor of the im- for the evaluation. The real experiments are carried age. In order to obtain better classification performance, out on several benchmarks. With the enhancements of a codebook (a set of visual words), also named dictio- color SIFT, the proposed image classification system nary, is constructed to represent the extracted descriptors. obtains approximate 3% improvement of classification Traditional SPM uses clustering techniques like K-means accuracy on the Caltech-101 dataset and approximate 4% vector quantization (VQ) to generate the codebook. improvement of classification accuracy on the Caltech- Despite their efficiency, the obtained codebooks usually 256 dataset. suffer from several drawbacks such as distortion errors and low discriminative ability [10]. A linear SPM based on sparse coding (ScSPM) method [11] was proposed I. INTRODUCTION arXiv:1309.7484v1 [cs.CV] 28 Sep 2013 by Yang et al. to relaxing the restrictive cardinality Scale invariant feature transform (SIFT) descriptors constraint of VQ. By generalizing vector quantization [1] are widely used in many vision tasks, such as object to sparse coding followed by multi-scale spatial max- recognition, image classification, video retrieval, etc. It pooling, ScSPM significantly outperforms the traditional has been witnessed a very robust local invariant feature SPM kernel on histograms and is even better than the descriptors in respect of different geometrical changes. nonlinear SPM kernels on several benchmarks. However, SIFT was mainly developed for gray images, Yu et al. [12] demonstrated that under certain as- the color information of the objects are neglected. There- sumptions locality is more essential than sparsity for the fore, two objects with completely different colors may training of nonlinear classifiers and proposed a modifi- be regarded as the same. To overcome this limitation, cation of SC, named Local Coordinate Coding (LCC). different kinds of Colored SIFT (CSIFT) descriptors However, in both SC and LCC, the computationally were proposed and developed by researchers to utilize expensive L1-norm optimization problem is to be solved. the color information inside the SIFT descriptors [2] [3] Wang et al. developed a faster implementation of LCC, [4] [5] [6]. With the enhancement of color information, named locality-constrained linear coding (LLC) [13], which utilizes the locality constraint to project each of light hits the surface which will be reflected back in descriptor into its local-coordinate system. It achieves the every direction. state-of-the-art image classification accuracy even just Consider an image of an infinitesimal surface patch using a linear SVM classifier. of some object. Let the red, green and blue sensors According to our literature survey, although various with spectral sensitivities be fR(λ), fG(λ) and fB(λ) kinds of sparse representation (SR) based image clas- respectively. The corresponding sensor values of the sification algorithms with state-of-the-art performances surface image are [17] [18]: have been developed, most of them use only luminance- L(λ, n, s, v) = mb(n, s) fL(λ)e(λ)cb(λ) dλ+ based SIFT descriptors [11] [13] [14] [15] [16] [10]. λ ms(n, s, vR) fL(λ)e(λ)cs(λ) dλ Using color information can improve the robustness of λ R (1) traditional SIFT descriptor in respect of color variations and the geometrical changes. However, facing the diverse where L ∈{R,G,B} is the color channel of light, λ is CSIFT descriptors, the following questions are worth to the wavelength, n is the surface patch normal, s is the be studied. direction of the illumination source, and v is the direction Which CSIFT descriptor is the best for the SR based of the viewer. e(λ) is power of the incident light with • image classification system? wavelength λ, cb(λ) and cs are the the surface albedo and In what extend, the performance of SR based image Fresnel reflectance, respectively. The geometric terms • classification system can be improved by using mb and ms represent the diffuse reflection and the CSIFT? specular reflection respectively. In case white illumination and neutral interface reflec- To fully exploit the potential of CSIFT descriptors for tion model holds, the incident light energy e(λ)= e and image category recognition tasks, a CSIFT based image Fresnel reflectance term cs(λ) = cs are both constant classification system is constructed in this work. As a values independent of the wavelength λ. By assuming widely used state-of-the-art SC based encoding algo- the following holds: rithm, LLC is employed to encode the CSIFT descrip- tors for classification. Real experiments with different fR(λ)= fG(λ)= fB(λ)= f (2) kinds of CSIFT descriptors demonstrate that significant Zλ Zλ Zλ improvements can be obtained with the enhancement of Eq. (1) can be simplified: color information. The rest of this article is organized as follows. In section II, a reflectance model for color analysis is L(n, s, v)= emb(n, s)kL + ems(n, s, v)csf (3) presented. In section III, different kinds of the CSIFT where kL = λ fL(λ)cb(λ) is a variable depends only descriptors and their properties are discussed. Section on the sensorsR and the surface albedo. IV introduces the basic concepts of the LLC. In section V and VI, real experiments are carried out to study the III. COLORED SIFT DESCRIPTORS proposed algorithm in various aspects. Finally, in section On the basis of the Dichromatic Reflection Model, VII, conclusions are drawn. the stabilities and reliabilities of color spaces in regard of various photometric events such as shadows and II. DICHROMATIC REFLECTANCE MODEL specularities are studied in both theoretically and em- pirically [19] [2] [20]. Although there are many existing A physical model of reflection, named Dichromatic color space models, they are correlated to intensity; are Reflection Model, was presented by Shafer in 1985 linear combinations of RGB; or normalized with respect [17]. In which, the relationship between RGB-values of to intensity rgb [19]. In this article, we concentrate captured images and the photometric changes, such as on investigating CSIFT using essentially different color shadows and specularities, of environment was investi- spaces: RGB, HSV, YCbCr, Opponent, rg and color gated. Shafer indicated that the reflection of a incident invariant spaces. light can be divided into two distinct components: spec- ular reflection and body reflection. Specular reflection is when a ray of light hits a smooth surface at certain angle. A. SIFT The reflection of that ray will reflect at the same angle The SIFT algorithm was originally developed for as the incident ray. The effect of highlight is caused by grey images by Lowe [21] [1] for extracting highly the specular reflection. Diffuse reflection is when a ray discriminative local image features that are invariant 2 to image scaling and rotation, and partially invariant to changes in illumination and viewpoint. It has been undefined if max min used in a broad range of vision tasks, such as image = G B if max R and classification, recognition, content-based image-retrieval, 60◦ × max−min +0◦ = − G B etc. The algorithm involves two steps: 1) extraction ≥ G B if max R and of the keypoints of an image; 2) computation of the H = 60◦ × max−min + 360◦ = − feature vectors characterizing the keypoints. The first G<B G B max G step is carried out by convolving the input image with 60◦ × max−min + 120◦ if = G−B the DoG (difference of Gaussians) function in multiple 60◦ × max−min + 240◦ if max = B − scales and detecting the extremas of the outputs. The (4) second step is achieved by sampling the magnitudes and orientations of the image gradient in a patch around the 0 if max =0 detected feature. A 128-D vector of direction histograms S = max min min (5) − =1 − otherwise is finally constructed as the descriptor of each patch. max max Since the SIFT descriptor is normalized, it can invariant V = max (6) to the scale of gradient magnitude.