Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification
Total Page:16
File Type:pdf, Size:1020Kb
L2EMG AND ITS APPLICATION TO IMAGE CLASSIFICATION 1 Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification Peihua Li, Member, IEEE, Qilong Wang, Member, IEEE, Hui Zeng, Lei Zhang, Senior Member, IEEE Abstract—This paper presents a novel image descriptor to effectively characterize the local, high-order image statistics. Our work is inspired by the Diffusion Tensor Imaging and the structure tensor method (or covariance descriptor), and motivated by popular distribution-based descriptors such as SIFT and HoG. Our idea is to associate one pixel with a multivariate Gaussian distribution estimated in the neighborhood. The challenge lies in that the space of Gaussians is not a linear space but a Riemannian manifold. We show, for the first time to our knowledge, that the space of Gaussians can be equipped with a Lie group structure by defining a multiplication operation on this manifold, and that it is isomorphic to a subgroup of the upper triangular matrix group. Furthermore, we propose methods to embed this matrix group in the linear space, which enables us to handle Gaussians with Euclidean operations rather than complicated Riemannian operations. The resulting descriptor, called Local Log-Euclidean Multivariate Gaussian (L2EMG) descriptor, works well with low-dimensional and high-dimensional raw features. Moreover, our descriptor is a continuous function of features without quantization, which can model the first- and second-order statistics. Extensive experiments were conducted to evaluate thoroughly L2EMG, and the results showed that L2EMG is very competitive with state-of-the-art descriptors in image classification. Index Terms—Image descriptors, space of Gaussians, Lie group, image classification. F 1 INTRODUCTION TABLE 1 Analogy of Matrix-valued and Function-valued image representation Characterizing local image properties has been attracting great research interests in past years [1]. The local descrip- Representation Description 2 3 tors can be either sampled sparsely for describing region P11 ::: P1n Second-order moment of features which STM or 4 . 5 reflects the correlation in local patches. of interest localized by region detectors [2], or extracted at COV . .. dense, regular grids for image representation [3], [4]. Image Pn1 ::: Pnn Every voxel is associated with local descriptors play a fundamental role for the success of D11 D12 D13 a 3×3 symmetric matrix de- DTI D12 D22 D23 many middle-level or high-level vision tasks. In particular, D13 D23 D33 scribing molecules diffusion. densely sampled descriptors have proven to achieve state- We represent the local image statistics of-the-art performance in image-based classification tasks using multivariate Gaussians at a dense Ours such as scene categorization, object classification, texture grid mapped to the linear space via in- recognition, and image retrieval. However, it is challenging formation geometry. to develop image descriptors with high distinctiveness for general image classification tasks. Our goal is to present a function-valued descriptor to three directions and the correlations among these directions. effectively represent the statistics of an image local region. Our idea is to associate one pixel point with a multivariate Our work is motivated by Diffusion Tensor Imaging (DTI) Gaussian distribution (hereafter abbreviated as Gaussian [5] and the structure tensor model (STM) [6] or covariance for simplicity) to characterize the first- and second-order descriptors (COV) [7], which respectively enjoy important statistics in the local neighborhood. It extends the STM or applications in vision and medical imaging fields [8]. As COV in that both the first- and second-order moments are shown in Table 1, STM or COV computes the second-order utilized; similar to DTI, our method can be interpreted as moment of image gradients or multiple cues in a neigh- an “imaging” method, which produces a function-valued borhood, which reflects the correlation of features in local image where each point is a Gaussian describing the local image patches. In DTI each voxel is associated with a 3×3 feature statistics. symmetric matrix describing the molecular mobility along Our work is also inspired by the popular distribution- based descriptors, e.g., SIFT [9] and HoG [10]. Such de- • P. Li, Q. Wang and H. Zeng are with the School of Information and scriptors usually employ histogram to represent the local Communication Engineering, Dalian University of Technology, China. image statistics. Though effective in a variety of applica- E-mail: [email protected], qlwang, [email protected] • L. Zhang is with the Department of Computing, The Hong Kong Poly- tions, they only exploit zero-order statistics as only feature technic University. E-mail: [email protected] occurrences (frequencies) are collected [4] and there lacks This work was supported by the National Natural Science Foundation of China a natural mechanism for multiple cue fusion. In addition, (61471082), and the Hong Kong RGC GRF grant (PolyU 5313/13E). discrete histograms often suffer from the quantization prob- L2EMG AND ITS APPLICATION TO IMAGE CLASSIFICATION 2 Raw feature maps : pixel-wise Vector-valued image : embedding Function-valued image : Gaussians feature vectors consisting of Gaussians in the linear space by Input image at a dense image grid representing intensity, histogram, derivatives of DE-LogE or IE-LogE, obtaining the local image statistics varying orders, etc. final L2EMG descriptors (a) Flowchart of L 2EMG + + N( n ) A+ ( n +1) A( n +1) N( n ) A+ ( n +1) GL( n +1) SO ( n +1) Sym( n +1) Sym( n +1) π -1 -1 γ φ log φ PSO log N -Τ log(A -Τ ) N A -Τ (µ ,Σ ) Aµ,L µ,L (µ ,Σ ) µ,L P log(P ) Structure of Gaussians Linear space Structure of Gaussians Linear space (b) DE-LogE (c) IE-LogE Fig. 1. Overview of the Local Log-Euclidean Multivariate Gaussian (L2EMG) descriptor. (a) Given an input image, we compute raw features and then extract Gaussians at a dense grid using these features; finally we embed the Gaussians in the linear space to obtain vectorized L2EMG descriptors. We show that the space of n-dimensional Gaussians N(n) can be equipped with a Lie group structure, equivalent to a subgroup, denoted by A+(n + 1), of the upper triangular matrix group. It can be directly embedded in a linear space A(n + 1) by the matrix logarithm which is called DE-LogE, or indirectly by IE-LogE which first maps A+(n + 1) into the space Sym+(n + 1) of SPD matrices and then into the linear space Sym(n + 1), as shown in (b) and (c), respectively. The embedding processes preserve the algebraic and topological structures. Consequently, we can handle Gaussians with Euclidean operations instead of Riemannian ones. See Section 4 for details. lem which may bring side effects on vision tasks [11]. These tor. Given an input image, we first extract n-dimensional considerations instigate us to model image local statistics raw features, then compute multivariate Gaussians at a using Gaussian, one of the most widely used continuous dense grid, obtaining a function-valued image, and finally, probability density functions. Underlying our descriptor embed these Gaussians into a linear space to obtain the are the maximum entropy principle [12] and the success vectorized L2EMG descriptors. We develop two embedding of covariance descriptors [7]. The maximum entropy prin- methods, as shown in Figs. 1(b) and 1(c), respectively. The ciple states that in the set of trial distributions encoding first method, called direct embedding Log-Euclidean (DE- the precisely stated prior or testable information, the one LogE), maps A+(n + 1) via matrix logarithm to the linear with maximal entropy is the proper distribution. It is well- space A(n + 1). The second one, what we call indirect known that Gaussian enjoys such a property among the embedding Log-Euclidean (IE-LogE), first maps A+(n + 1) family of distributions with fixed empirical mean vector via the coset and polar decomposition into the space of and covariance matrix. The covariance matrices have proven symmetric positive definite (SPD) matrices, Sym+(n + 1), to be effective descriptors in a variety of applications [7]; and then into the linear space Sym(n + 1) by the Log- beside, the mean vectors have also proven to be important Euclidean framework [14]. DE-LogE has a simple, analytic in image classification [4] and image search [13]. expression for diagonal-covariance Gaussians, particularly The challenge of using Gaussians to model local image suitable for high-dimensional raw features. Our methods statistics lies in that the space N(n) of n-dimensional Gaus- are primarily based on Lie group isomorphisms, which re- sians N (µ; Σ), where µ is the mean vector and Σ is the spects the algebraic and topological structures of the spaces covariance matrix, is not a linear space but a manifold. We involved. study the geometry of N(n), and show, for the first time to This paper substantially extends our previous work our knowledge, that the space of Gaussians can be provided namely the local Log-Euclidean covariance matrix (L2ECM) with a Lie group structure by defining a multiplication op- [15] from three aspects. (1) We represent the local image eration on this manifold, and that it is isomporhic to a sub- statistics by using multivariate Gaussians instead of using group, denoted by A+(n+1), of the upper triangular matrix only covariance matrices as in L2ECM. L2ECM is a special group. Based on this, we develop novel methods to embed case of L2EMG where the mean vectors are always constant. this space into some linear space, which enables us to handle Hence, L2EMG can capture richer statistical information. (2) Gaussians with Euclidean operations instead of complicat- We equip the space of Gaussians with a Lie group structure ed Riemannian operations while respecting the geometry and present novel methods to deal with Gaussians by using of Gaussians.