
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Matrix Variate Gaussian Mixture Distribution Steered Robust Metric Learning

Lei Luo, Heng Huang*
Electrical and Computer Engineering, University of Pittsburgh, USA
[email protected], [email protected]

* To whom all correspondence should be addressed.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Mahalanobis Metric Learning (MML) has been actively studied recently in the machine learning community. Most existing MML methods aim to learn a powerful Mahalanobis distance for computing the similarity of two objects. More recently, multiple methods use matrix norm regularizers to constrain the learned distance matrix M to improve the performance. However, in real applications, the structure of the distance matrix M is complicated and cannot be characterized well by a simple matrix norm. In this paper, we propose a novel robust metric learning method that learns the structure of the distance matrix in a new and natural way. We partition M into blocks and consider each block as a random matrix variate, which is fitted by a matrix variate Gaussian mixture distribution. Different from existing methods, our model makes no assumption on M and automatically learns the structure of M from the real data, where the distance matrix M is often neither sparse nor low-rank. We design an effective algorithm to optimize the proposed model and establish the corresponding theoretical guarantee. We conduct extensive evaluations on real-world data. Experimental results show our method consistently outperforms the related state-of-the-art methods.

Introduction

Mahalanobis metric learning (MML) has been actively studied in the machine learning community and successfully applied to numerous applications (Kuznetsova et al. 2016). MML methods target to learn a good Mahalanobis metric to effectively gauge the pairwise distance between data objects. Particularly, the distance matrix M plays a crucial role in MML. A well-learned M can precisely reflect domain-specific connections and relationships. Toward this end, many metric learning algorithms under various problem settings have been proposed, such as pairwise constrained component analysis (PCCA) (Mignon and Jurie 2012), neighborhood repulsed metric learning (NRML) (Lu et al. 2014), large margin nearest neighbor (LMNN) (Weinberger and Saul 2009), logistic discriminant metric learning (LDML) (Guillaumin, Verbeek, and Schmid 2009), and Hamming distance learning (Zheng, Tang, and Shao 2016; Zheng and Shao 2016). Although these supervised algorithms have shown great success in many applications, finding a robust distance metric from real-world data remains a big challenge.

To enhance the performance of metric learning models, some recent methods, represented by low-rank or sparse metric learning (Huo, Nie, and Huang 2016; Ying, Huang, and Campbell 2009), actively integrate different regularization terms associated with the matrix M in modeling. On one hand, using these regularization terms can effectively prevent the overfitting problem, since it is equivalent to adding prior knowledge into M. On the other hand, these regularization terms can extract partial structure information of the matrix M. The structure of a variable corresponds to its spatial distribution. However, for many practical applications, the spatial distribution of M is generally irregular and complicated. Simple matrix norm regularizers cannot characterize this structure information well. For instance, L2- or L1-norm regularization is based on the hypothesis that the elements of M are independently distributed with a Gaussian or Laplace distribution (Luo et al. 2016b). Obviously, they cannot capture the spatial structure of M. The low-rank and Fantope regularizers (Law, Thome, and Cord 2014) overcome this limitation, but they lack generalization because the distance matrix M may not be low-rank for real data.

Many previous works make use of multiple matrix norms to jointly characterize a matrix variate with a complex distribution. Nevertheless, these strategies are limited to some special cases. For example, the Least Soft-threshold Squares method (Wang, Lu, and Yang 2013) is only suitable for a variate following a Gaussian-Laplace distribution, while the Nuclear-L1 joint regression model (Luo et al. 2015; 2016a) focuses on structural and sparse matrix variates. To adapt to more practical problems, some scholars carry out the variate estimation task under the framework of Gaussian Mixture Regression (GMR) (Cao et al. 2015). This originates from a basic fact: a Gaussian mixture distribution can construct a universal approximator to any continuous density function in theory (Bishop 2007). The experimental results show the advantages of this strategy, but most of these methods (related to GMR) are based on regression analysis. As we know, many practical problems cannot be formulated as regression-like models. Accordingly, it is expected to extend GMR to other formulations. Additionally, these GMR-based approaches assume that the elements in a matrix variate are generated independently for convenience, which overlooks the latent relationships between elements.

In this paper, we propose a novel metric learning model that utilizes the Gaussian Mixture Distribution (GMD) to automatically learn the structure information of the distance matrix M from real data. To the best of our knowledge, this is the first work to depict a matrix variate using GMD in a metric learning model. To fully exploit the structure of M, we partition M into blocks and view each block as a random matrix variate which is automatically fitted by GMD. Since each block represents the local structure of M, our method integrates the structure information of different regions of M in modeling. On the basis of GMD, we construct a convex regularization with regard to M and use it to derive a robust metric learning model with triplet constraints. A new effective algorithm is introduced to solve the proposed model. Due to the promising generalization of GMD, our model does not rely on any assumption on M, regardless of whether it holds the low-rank (or sparse) attribute or not. Therefore, compared with existing metric learning methods using matrix norm regularizers, our method can effectively learn the structure information for a general distance matrix M. Moreover, as the theoretical contribution of this paper, we analyze the convexity and generalization ability of the proposed model. A series of experiments on image classification and face verification demonstrates the effectiveness of our new method.

Figure 1: Partition a matrix M = (m_{ij})_{d×d} into p × q non-overlapping blocks: (a) the original matrix M; (b) the partitioned result, where each M_{uv} ∈ R^{d_u×d_v} (u = 1, ···, p, v = 1, ···, q).

Learning A Robust Distance Metric Using Gaussian Mixture Distribution

In this section, we will first propose a robust objective for distance metric learning using a grouped Gaussian mixture distribution. After that, we design an effective optimization algorithm to solve the proposed model.

Notation. Throughout this paper, we write matrices as bold uppercase characters and vectors as bold lowercase characters. Let y = {y_1, y_2, ···, y_n} be the label set of input (or training) samples X = {x_1, x_2, ···, x_n}, where each x_i ∈ R^d (i = 1, 2, ···, n). For example, the label of sample x_i is y_i. Let r(y_i, y_l) = 1 if y_i ≠ y_l and r(y_i, y_l) = 0 otherwise. Suppose input samples X and labels y are contained in an input space X and a label space Y, respectively. Meanwhile, we assume z := {z_i = (x_i, y_i) : x_i ∈ X, y_i ∈ Y, i ∈ N_n}, where N_n = {1, 2, ···, n}. For any x ∈ R, the function f(x) = [x]_+ is equal to x if x > 0 and zero otherwise. R_+^{d×d} defines the set of positive definite matrices on R^{d×d}. ||M||_F, M^T and Tr(M) denote the Frobenius norm, transpose and trace of the matrix M, respectively.

Problem statement. Given any two data points x_i and x_j, a Mahalanobis distance between them can be calculated as follows:

d_M(x_i, x_j) = \sqrt{(x_i - x_j)^T M (x_i - x_j)}.   (1)

The core task of metric learning is to learn an optimal positive semi-definite matrix M such that the distance between similar samples is relatively smaller than that between dissimilar samples. Meanwhile, a desired M should provide robustness to noise. To this end, the label information can be fully exploited, which leads to different weakly-supervised constraints, including pairwise, triplet and quadruplet constraints. However, the metric learning models with different constraints can be unified in the following form:

M_z = \arg\min_{M \in \mathcal{M}} \big( \varepsilon_z(M) + \lambda\,\Omega(M) \big),   (2)

where ε_z(·) is the loss function, Ω(M) is a regularization term, the balance parameter λ > 0, and M (calligraphic) denotes the domain of M. In general, \mathcal{M} ⊆ R_+^{d×d}.

The loss function ε_z(·) is generally induced by different constraints. Therefore, minimizing ε_z(·) results in minimizing the distances between data points with similar constraints and maximizing the distances between data points with dissimilar constraints. Recently, the metric learning model large margin nearest neighbor (LMNN) (Weinberger and Saul 2009) has attracted wide attention. It uses triplet constraints on training examples and the corresponding loss function can be expressed as:

\varepsilon_z^{lmnn}(M) = (1 - \mu) \sum_{i,\, j \rightsquigarrow i} D_M(x_i, x_j) + \mu \sum_{i,\, j \rightsquigarrow i,\, l} r(y_i, y_l)\,[1 + D_M(x_i, x_j) - D_M(x_i, x_l)]_+,   (3)

where D_M(x_i, x_j) = d_M(x_i, x_j)^2 and the notation j ⇝ i indicates that input x_j is a target neighbor (i.e., a same-labeled input) of input x_i. When r(y_i, y_l) = 1, x_l is an impostor neighbor (i.e., a differently labeled input) of input x_i. The parameter μ ∈ (0, 1) defines a trade-off between the above two objectives.

Although LMNN significantly improves the performance of traditional kNN classification, it is often confronted with the overfitting problem. Hence, some regularized LMNN models (Li, Tian, and Tao 2016; Lim, McFee, and Lanckriet 2013) have emerged, which enhance the generalization and robustness of LMNN. It should be noted that these methods only utilize some special matrix norm regularizers (e.g., the L1-norm or nuclear norm) to constrain M. In many practical problems, however, the distance matrix M of real data may be neither sparse nor low-rank.
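To make Eqs. (1) and (3) concrete, the following minimal Python sketch evaluates the Mahalanobis distance and an LMNN-style triplet loss for a given positive semi-definite M. It is an illustration rather than the authors' implementation: the function names, the default μ, and the assumption that target-neighbor pairs and impostor triples are precomputed are ours.

    import numpy as np

    def mahalanobis_dist(M, xi, xj):
        # Eq. (1): d_M(xi, xj) = sqrt((xi - xj)^T M (xi - xj))
        d = xi - xj
        return np.sqrt(d @ M @ d)

    def lmnn_loss(M, X, targets, impostors, mu=0.5):
        # Eq. (3): pull term over target neighbors plus hinged push term over impostor triples.
        # targets:   list of (i, j) pairs with x_j a target neighbor of x_i
        # impostors: list of (i, j, l) triples with x_l differently labeled from x_i
        D = lambda i, j: (X[i] - X[j]) @ M @ (X[i] - X[j])   # squared Mahalanobis distance
        pull = sum(D(i, j) for i, j in targets)
        push = sum(max(0.0, 1.0 + D(i, j) - D(i, l)) for i, j, l in impostors)
        return (1.0 - mu) * pull + mu * push

Here the indicator r(y_i, y_l) is handled implicitly by enumerating only triples (i, j, l) whose third index is differently labeled from x_i.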

The robust metric learning model induced by GMD. In the following, we provide a general regularization to characterize the distance matrix M so as to adapt to more practical applications. This regularization is induced by the matrix variate Gaussian mixture distribution.

First, as shown in Fig. 1, we partition the matrix M (∈ R^{d×d}) into p × q blocks, where any two blocks do not contain the same elements. For each block M_{uv} (∈ R^{d_u×d_v}), we preserve its matrix form. Each block M_{uv} can be regarded as the local structure of M, so our strategy actually merges different regions of local structures of M. Next, we use the matrix variate Gaussian mixture distribution to fit each block. For convenience, it is assumed that all blocks possess the same size. As a result, the Probability Density Function (PDF) of each block M_{uv} has the following form:

P(M_{uv}) = \sum_{k=1}^{R} \pi_k\, \mathcal{N}(M_{uv} \mid 0, \Sigma_k),   (4)

where R is the number of Gaussian components, π_k denotes the mixing weight with \pi_k > 0 and \sum_{k=1}^{R} \pi_k = 1, and N(M_{uv} | 0, Σ_k) is the zero-mean matrix variate Gaussian distribution with Σ_k denoting the covariance matrix, i.e.,

\mathcal{N}(M_{uv} \mid 0, \Sigma_k) = \frac{1}{(2\pi)^{\frac{d_u d_v}{2}} |\Sigma_k|^{\frac{d_v}{2}}} \exp\!\Big(-\frac{1}{2}\,\mathrm{tr}\big(M_{uv}^T \Sigma_k^{-1} M_{uv}\big)\Big).   (5)

Finally, we assume all random matrix variates {M_{uv}: u = 1, 2, ···, p, v = 1, 2, ···, q} are independently and identically distributed. The PDF of M can be written as:

P(M) = \prod_{u=1}^{p} \prod_{v=1}^{q} P(M_{uv}).   (6)

The main task of traditional GMR is to search for a group of mixing weights Π = {π_1, π_2, ···, π_R} and a group of covariance matrices Σ = {Σ_1, Σ_2, ···, Σ_R} such that the PDF of the regression error is maximized. In our problem, this means that the negative log-likelihood function associated with the distance matrix M,

-\ln P(M) = -\sum_{u=1}^{p} \sum_{v=1}^{q} \ln \Big( \sum_{k=1}^{R} \pi_k\, \mathcal{N}(M_{uv} \mid 0, \Sigma_k) \Big),   (7)

is minimized.

The above function not only considers the two-dimensional group structure of M, but also absorbs the advantages of GMD. If we set R = p = q = 1, π_1 = 1 and Σ_1 = I_{d×d}, where I_{d×d} denotes a d × d identity matrix, then −ln P(M) becomes the squared Frobenius norm (up to an additive constant and a scaling factor). As for other norms, e.g., the L1-norm, group sparsity norm and nuclear norm (Li et al. 2016), there exist corresponding segmentation strategies on M and parameters Π and Σ such that −ln P(·) can well approximate them (Maz'ya and Schmidt 1996). Therefore, −ln P(·) is more general as compared to these matrix norm regularizers.

Considering this merit of −ln P(·), we choose Ω(M) = −ln P(M) and ε_z(·) = ε_z^{lmnn}(·) in (2), which leads to the following metric learning model:

(M, \Pi, \Sigma) = \arg\min\ \varepsilon_z^{lmnn}(M) - \lambda \ln P(M)
\quad \text{s.t. } M \in \mathcal{M},\ \pi_k > 0,\ \Sigma_k \in \mathbb{R}_+^{d \times d},\ k = 1, 2, \cdots, R,\ \text{and } \sum_{k=1}^{R} \pi_k = 1.   (8)

Optimization algorithm for solving model (8). Compared to other metric learning models with matrix norm regularizers, solving model (8) is very challenging, because it needs to learn more model parameters and −ln P(·) is not separable for Σ and Π. Here, we regard the objective (8) as a Q-function (Zheng, Liu, and Ni 2014) in Bayesian learning and propose a simple yet effective Expectation-Maximization (EM) algorithm (Bishop 2007) to optimize it. This is divided into three steps as follows:

(1) Initialization. We need to initialize the distance matrix M, the mixing coefficients set Π and the covariance matrices set Σ.

(2) E-step. In this step, we compute the conditional expectation of the latent variate z_{uv,k} given M_{uv} by Bayes' rule, i.e.,

z_{uv,k} = \frac{\pi_k\, \mathcal{N}(M_{uv} \mid 0, \Sigma_k)}{\sum_{j=1}^{R} \pi_j\, \mathcal{N}(M_{uv} \mid 0, \Sigma_j)}.   (9)

(3) M-step. We find the optimal parameters Σ, Π and M according to the current posterior probabilities.

Recalling our model (8), each optimal Σ_k can be obtained by solving the following problem:

\min_{\Sigma_k \in \mathbb{R}_+^{d \times d}} \ -\sum_{u=1}^{p} \sum_{v=1}^{q} \ln \Big( \sum_{k=1}^{R} \pi_k\, \mathcal{N}(M_{uv} \mid 0, \Sigma_k) \Big).   (10)

Calculating the derivative of objective (10) with respect to Σ_k and setting it to 0, we have

\Sigma_k = \frac{1}{\sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}} \Big( \sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}\, M_{uv} M_{uv}^T + \xi I_{d_u \times d_v} \Big),   (11)

where ξ I_{d_u×d_v} (ξ > 0) is a perturbation term ensuring Σ_k is invertible.

To achieve the optimal π_k, k = 1, 2, ···, R, we consider the optimization problem:

\min_{\pi_k > 0} \ -\sum_{u=1}^{p} \sum_{v=1}^{q} \ln \Big( \sum_{k=1}^{R} \pi_k\, \mathcal{N}(M_{uv} \mid 0, \Sigma_k) \Big) + \alpha \Big( \sum_{k=1}^{R} \pi_k - 1 \Big),   (12)

where α > 0 is a Lagrangian multiplier. Therefore,

\pi_k = \frac{1}{pq} \sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}.   (13)

For the variate M, we consider the following problem:

M = \arg\min_{M \in \mathcal{M}} \ \varepsilon_z^{lmnn}(M) - \lambda \ln P(M).   (14)
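Before turning to the optimization, here is a hedged Python sketch of the regularizer itself: it partitions M into p × q equal-sized blocks (as assumed above) and evaluates −ln P(M) from Eqs. (4)–(7) for given mixing weights π_k and covariances Σ_k. All helper names are illustrative, and a log-sum-exp is used for numerical stability rather than the literal product in Eq. (6).

    import numpy as np

    def partition_blocks(M, p, q):
        # Split the d x d matrix M into p*q equal-sized, non-overlapping blocks (Fig. 1).
        rows = np.array_split(M, p, axis=0)
        return [blk for r in rows for blk in np.array_split(r, q, axis=1)]

    def log_matrix_gaussian(Muv, Sigma):
        # log N(Muv | 0, Sigma) from Eq. (5); Sigma is the d_u x d_u covariance of a block.
        du, dv = Muv.shape
        _, logdet = np.linalg.slogdet(Sigma)
        quad = np.trace(Muv.T @ np.linalg.solve(Sigma, Muv))
        return -0.5 * (du * dv * np.log(2 * np.pi) + dv * logdet + quad)

    def neg_log_gmd(M, p, q, pis, Sigmas):
        # -ln P(M) from Eqs. (6)-(7): sum over blocks of -ln sum_k pi_k N(Muv | 0, Sigma_k).
        total = 0.0
        for Muv in partition_blocks(M, p, q):
            logs = [np.log(pi) + log_matrix_gaussian(Muv, S)
                    for pi, S in zip(pis, Sigmas)]
            total -= np.logaddexp.reduce(logs)   # numerically stable log-sum-exp
        return total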

It is difficult to find a closed-form solution of problem (14). However, we can compute a sub-gradient of objective (14) with regard to M:

\partial_M = (1 - \mu) \sum_{i,\, j \rightsquigarrow i} C_{ij} + \mu \sum_{(i,j,l) \in \mathcal{H}} (C_{ij} - C_{il}) + \frac{d(-\lambda \ln P(M))}{dM},   (15)

where C_{ij} = (x_i - x_j)(x_i - x_j)^T and the set of triples H is defined as: (i, j, l) ∈ H if and only if the indices (i, j, l) trigger the hinge loss in the second part of Eq. (3). In addition, each block of d(−λ ln P(M))/dM can be expressed as:

\frac{d(-\lambda \ln P(M))}{dM_{uv}} = \lambda\, \frac{\sum_{k=1}^{R} GD_{uv,k}\, \Sigma_k^{-1} M_{uv}}{\sum_{k=1}^{R} GD_{uv,k}},   (16)

where GD_{uv,k} = \frac{\pi_k}{(2\pi)^{\frac{d_u d_v}{2}} |\Sigma_k|^{\frac{d_v}{2}}} \exp\!\big(-\frac{1}{2}\, \mathrm{tr}(M_{uv}^T \Sigma_k^{-1} M_{uv})\big).

Then, when Σ and Π are fixed, the optimal M can be found by solving model (14) using the subgradient method (Boyd, Xiao, and Mutapcic 2003):

M \Leftarrow P_{\mathbb{R}_+^{d \times d}} (M - \gamma\, \partial_M),   (17)

where P_{\mathbb{R}_+^{d \times d}}(·) denotes the projection operator and γ > 0 is a step size. The whole iterative procedure for solving problem (8) is summarized in Algorithm 1.

Algorithm 1 Solving Model (8) via EM
Input: training samples X, parameter λ, number of components R and threshold value ξ.
Initialization: covariance matrices set Σ = {Σ_1, Σ_2, ···, Σ_R}, coefficients Π = {π_1, π_2, ···, π_R} and the initial distance matrix M.
Output: the final parameters Σ, Π and distance matrix M.
repeat
  1. (E-step for z_{uv,k}): Update the posterior probability z_{uv,k} by
     z_{uv,k}^{new} \Leftarrow \frac{\pi_k^{old}\, \mathcal{N}(M_{uv}^{old} \mid 0, \Sigma_k^{old})}{\sum_{j=1}^{R} \pi_j^{old}\, \mathcal{N}(M_{uv}^{old} \mid 0, \Sigma_j^{old})}.
  2. (M-step for Σ_k): Update each Σ_k by
     \Sigma_k^{new} \Leftarrow \frac{1}{\sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}^{old}} (W + \xi I_{d_u \times d_v}),
     where W = \sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}^{old}\, M_{uv}^{old} M_{uv}^{old\,T}.
  3. (M-step for π_k): Update each π_k by
     \pi_k^{new} \Leftarrow \frac{1}{pq} \sum_{u=1}^{p} \sum_{v=1}^{q} z_{uv,k}^{old}.
  4. (M-step for M): Update M by using the subgradient method (17) to solve problem (14).
until convergence.

Remark 1. Computing the sub-gradient (15) is extremely expensive since the set of triples H is potentially very large. To circumvent this problem, we can use a similar technique as in (Weinberger and Saul 2009) to reduce the computation complexity of (17).

There have been many results on the convergence of the EM algorithm; here we refer the readers to (Gupta, Chen, and others 2011). In Algorithm 1, we use the following convergence conditions:

\|\Sigma_k^{new} - \Sigma_k^{old}\|_F \le \theta \quad \text{and} \quad |\pi_k^{new} - \pi_k^{old}| \le \theta,   (18)

where θ is a sufficiently small positive number.

Once the final distance matrix M is obtained by Algorithm 1, then by Eq. (1) we can construct the Mahalanobis distance to compute the similarity of two samples in the experiments.
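The following Python sketch outlines one pass of Algorithm 1 under the same assumptions as the previous snippets: the responsibility update follows Eq. (9), the covariance and mixing-weight updates follow Eqs. (11) and (13), and the metric update is a single projected subgradient step in the spirit of Eq. (17). For brevity the −λ ln P(M) part of the subgradient (Eqs. (15)–(16)) is omitted and the LMNN subgradient is passed in as a user-supplied function; none of this is the authors' code.

    import numpy as np

    def em_iteration(M, blocks_idx, pis, Sigmas, lmnn_subgrad, xi=1e-5, step=1e-2):
        # One E/M pass (sketch). blocks_idx lists the (row_slice, col_slice) of the p*q
        # equal-sized blocks of M; lmnn_subgrad(M) returns a subgradient of eps_z^lmnn.
        blocks = [M[r, c] for r, c in blocks_idx]
        R = len(pis)

        def dens(B, S):
            # Matrix variate Gaussian density of Eq. (5).
            du, dv = B.shape
            _, logdet = np.linalg.slogdet(S)
            quad = np.trace(B.T @ np.linalg.solve(S, B))
            return np.exp(-0.5 * (du * dv * np.log(2 * np.pi) + dv * logdet + quad))

        # E-step, Eq. (9): responsibility z[b, k] of component k for block b.
        z = np.array([[pis[k] * dens(B, Sigmas[k]) for k in range(R)] for B in blocks])
        z /= z.sum(axis=1, keepdims=True)

        # M-step, Eq. (11): covariance of each component (perturbed to stay invertible).
        du = blocks[0].shape[0]
        Sigmas = [(sum(z[b, k] * blocks[b] @ blocks[b].T for b in range(len(blocks)))
                   + xi * np.eye(du)) / z[:, k].sum() for k in range(R)]

        # M-step, Eq. (13): mixing weights.
        pis = z.mean(axis=0)

        # M-step for M, Eq. (17): one subgradient step (regularizer term omitted here),
        # followed by projection onto the positive semi-definite cone.
        M = M - step * lmnn_subgrad(M)
        w, V = np.linalg.eigh((M + M.T) / 2)
        M = V @ np.diag(np.clip(w, 0, None)) @ V.T
        return M, pis, Sigmas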

Theoretical Analysis

In this section, we investigate the convexity and the generalization ability of model (14).

Theorem 1. The optimization problem (14) is convex.

Proof. By (Parameswaran and Weinberger 2010), we know that ε_z^{lmnn}(M) is convex. Thus, it suffices to show that −ln P(M) is convex. Since each f_k(M) = −tr(M^T Σ_k M) is a concave matrix function and g(x) = e^x is monotonically increasing and convex, each h_k(M) = c_k e^{−tr(M^T Σ_k M)} is a concave matrix function. Therefore, h_1(M) + h_2(M) + ··· + h_R(M) is still concave. Considering that −ln(·) is a monotonically decreasing function, it is known that the regularization −ln P(M) is convex. □

Remark 2. Connecting (Boyd, Xiao, and Mutapcic 2003) and Theorem 1, it is known that iteration (17) is convergent.

Before stating the generalization guarantee of model (14), some definitions and lemmas need to be introduced. We assume that the instance space X is compact and convex with respect to the L2-norm, i.e., there exists a constant τ > 0 such that ∀ x ∈ X, ||x||_2 ≤ τ. Given a training sample T = {z_i = (x_i, y_i)}_{i=1}^n drawn i.i.d. from an unknown joint distribution P over the space Z, we denote by P_T the set of all possible pairs built from T:

P_T = \{(z_1, z_1), \cdots, (z_1, z_n), \cdots, (z_n, z_n)\}.   (19)

For convenience, we simplify the triplet constraints in (14) as pairwise constraints, and consider the more general metric learning model:

M = \arg\min_{M \in \mathcal{M}} \Big( \frac{1}{|P_T|} \sum_{(z_i, z_j) \in P_T} \omega(M, z_i, z_j) - \lambda \ln P(M) \Big),   (20)

where |P_T| denotes the number of elements of P_T and the loss function ω(M, z_i, z_j) depends on the input examples and their labels.

We call optimization problem (20) the metric learning algorithm A, which takes as input a finite set of pairs from (Z × Z)^n and outputs a metric.

Definition 1. (Bellet 2013) The model (20) (or A) is (S, ζ(·))-robust for S ∈ N and ζ(·): (Z × Z)^n → R if Z can be partitioned into S disjoint sets, denoted by {C_i}_{i=1}^S, such that the following holds for all T ∈ Z^n:

Databases   1NN      SVM      LMNN     LMNN Trace   LMNN Fantope   LMNN Cap   Our method
FGNET       10.412   42.647   47.653   54.711       55.286         57.817     59.812
VINIR       33.474   68.850   69.750   69.832       73.600         73.731     74.428
OSR         65.474   75.850   75.315   76.021       76.471         76.511     77.493
PubFig      67.556   83.797   86.353   86.411       86.576         87.125     87.883

Table 1: The classification accuracies (%) of all methods on the FGNET Aging, VINIR, OSR, and PubFig databases.

∀ (z_1, z_2) ∈ P_T, ∀ z', z'' ∈ Z and i, j ∈ [S]: if z_1, z' ∈ C_i and z_2, z'' ∈ C_j, then

|\omega(M_{P_T}, z_1, z_2) - \omega(M_{P_T}, z', z'')| \le \zeta(P_T),   (21)

where M_{P_T} is learned by model (20) and ζ(P_T) > 0.

We further assume that ω(M, z_i, z_j) = g(y_i y_j [1 − D_M(x_i, x_j)]), where g(·) is nonnegative and Lipschitz continuous with Lipschitz constant L.

Lemma 1. If σ_k is the minimum eigenvalue of Σ_k, then tr(M^T Σ_k M) ≥ σ_k tr(M^T M).

Proof. Let the eigenvalue decomposition of Σ_k be U_k^T Δ_k U_k; it can be seen that tr[M M^T U_k^T (Δ_k − σ_k I_{d×d}) U_k] ≥ 0. This implies that tr(M^T Σ_k M) ≥ σ_k tr(M^T M). □

In this paper, the loss function ω(M, z_i, z_j) is assumed to be nonnegative and uniformly bounded by a constant B_ω (B_ω > 0). Denote G_{P_T}^{\omega} = \frac{1}{|P_T|} \sum_{(z_i, z_j) \in P_T} \omega(M, z_i, z_j), \tau' = \frac{8 L \varepsilon_0 \tau \gamma}{\sqrt{\lambda}}, K = |\mathcal{Y}|\, \mathcal{N}(\gamma/2, \mathcal{X}, \|\cdot\|_2), and G^{\omega} = E_{z, z' \sim P}[\omega(M, z_i, z_j)], where E(·) denotes the expectation.

Theorem 2. Model (20) is \big(|\mathcal{Y}|\, \mathcal{N}(\gamma/2, \mathcal{X}, \|\cdot\|_2),\ \frac{8 L \varepsilon_0 \tau \gamma}{\sqrt{\lambda}}\big)-robust, where \mathcal{N}(\gamma/2, \mathcal{X}, \|\cdot\|_2) is the γ/2-covering number of X (Bellet 2013), ε_0 = ε_z(0) − λ ln P(0) and γ > 0.

Proof. Let ε(M) = \frac{1}{|P_T|} \sum_{(z_i, z_j) \in P_T} \omega(M, z_i, z_j) and M* be the optimal solution of model (20); then we have

\varepsilon_z(M^*) - \lambda \ln P(M^*) \le \varepsilon_z(0) - \lambda \ln P(0) = \varepsilon_0,   (22)

which leads to −ln P(M*) ≤ ε_0 / λ.

We partition Z into |\mathcal{Y}|\, \mathcal{N}(\gamma/2, \mathcal{X}, \|\cdot\|_2) sets such that if z and z' belong to the same set then y = y' and ||x − x'|| ≤ γ. Now, for z_1, z_2, z_1', z_2' ∈ Z, if y_1 = y_1' and y_2 = y_2', then ||x_1 − x_1'|| ≤ γ and ||x_2 − x_2'|| ≤ γ. Thus,

|g(y_1 y_2 [1 - D_{M^*}(x_1, x_2)]) - g(y_1' y_2' [1 - D_{M^*}(x_1', x_2')])|
\le L \big( \|x_1 - x_2\|_2 \|M\|_F \|x_1 - x_1'\|_2 + \|x_1' - x_2\|_2 \|M\|_F \|x_2 - x_2'\|_2 + \|x_1 - x_1'\|_2 \|M\|_F \|x_1' - x_2'\|_2 + \|x_2 - x_2'\|_2 \|M\|_F \|x_1 - x_2'\|_2 \big).   (23)

On the other hand, denoting σ = min{σ_1, σ_2, ···, σ_R} and π = min{π_1, π_2, ···, π_R} and considering Lemma 1, we have

-\ln P(M^*) \ge -\ln\big( R\, \pi\, e^{-\sigma\, \mathrm{tr}(M^T M)} \big) = -\ln(R\pi) + \sigma\, \mathrm{tr}(M^T M).   (24)

Noticing −ln(Rπ) ≥ 0, we further get

\frac{1}{\sigma}\big(-\ln P(M^*)\big) \ge \mathrm{tr}(M^T M) = \|M\|_F^2.   (25)

Combining (23) and (25), we have

|g(y_1 y_2 [1 - D_{M^*}(x_1, x_2)]) - g(y_1' y_2' [1 - D_{M^*}(x_1', x_2')])| \le \frac{8 L \varepsilon_0 \tau \gamma}{\sqrt{\lambda}}.   (26)

According to Definition 1, we can complete the proof of Theorem 2. □

Using Theorem 2 and Theorem 7.3 in (Bellet 2013), we can easily obtain the generalization bound of model (20), i.e., for any δ > 0, with probability at least 1 − δ we have:

|G^{\omega}(M_{P_T}) - G_{P_T}^{\omega}(M_{P_T})| \le \tau' + 2 B_{\omega} \sqrt{\frac{2 K \ln 2 + 2 \ln(1/\delta)}{n}},   (27)

where n denotes the number of training samples.

Experiments

In this section, five standard databases, including the FGNET Aging database (Lanitis, Taylor, and Cootes 2002), the Visible (VI) and Near-Infrared (NIR) Face Database (Shen et al. 2011), the OSR database (Parikh and Grauman 2011), the PubFig database (Kumar et al. 2009) and the LFW database (Huang et al. 2007), are selected to evaluate the effectiveness of our method. Specifically, we implement image classification experiments on the FGNET Aging, VI and NIR Face, OSR and PubFig databases, and compare our method with kNN, SVM (Fan et al. 2008), LMNN (Weinberger and Saul 2009), LMNN Trace (Huo, Nie, and Huang 2016), LMNN Fantope (Law, Thome, and Cord 2014) and LMNN Cap (Huo, Nie, and Huang 2016). Meanwhile, we carry out face verification experiments on the PubFig and LFW databases, and compare our method with LMNN Cap, LMNN Fantope, KISSME (Koestinger et al. 2012), ITML (Davis et al. 2007), LDML (Guillaumin, Verbeek, and Schmid 2009), IDENTITY (Huo, Nie, and Huang 2016) and MAHAL (Huo, Nie, and Huang 2016). For our method, we set λ = 0.001, ξ = 10^{-5} and R = 5. For other compared methods, we follow the authors' suggestions to choose the optimal parameters.

Figure 2: Results of 5 nearest neighbors when we query an image. A green line means this neighbor is in the same class as the query image, and a red line denotes that they are different. (a) Results of 5 nearest neighbors when we query an image on the OSR dataset; the first row shows the results of LMNN Cap, and the second row shows the results of our method. (b) Results of 5 nearest neighbors when we query an image on the Pubfig dataset; the first row shows the results of LMNN Cap, and the second row shows the results of our method.

Experiments on the FGNET Aging Database

We implement an experiment on the FGNET Aging Database. There are 1,002 face images from 82 subjects in this database. Each subject has 6-18 face images at different ages. Each image is labeled by its chronological age. The ages are distributed in a wide range from 0 to 69. Besides age variation, most of the age-progressive image sequences display other types of facial variations, such as significant changes in pose, illumination, expression, etc. To adapt to some metric learning methods, a subset of the FGNET database is chosen. It includes 680 face images from 68 subjects. Each subject has 10 face images. We manually cropped the face portion of each image and then normalized it to 32 × 32 pixels. Then, we randomly choose five face images from each person as training samples, while the remaining five images are utilized as test samples. This experiment is repeated ten times. We calculate the average recognition rates (%) and standard deviations of all methods as shown in the second line of Table 1. For kNN, we set k = 1 as in (Huo, Nie, and Huang 2016). From Table 1, we can find that some metric learning methods with regularization terms, such as LMNN Trace (54.711%), LMNN Fantope (55.29%) and LMNN Cap (57.817%), perform better than the other methods. Since the FGNET Aging database not only includes age variance but also face pose changes, it is very challenging to correctly characterize the distance matrix M. However, our method (59.812%) still has some advantages in handling these variances. As a result, compared to simple matrix norm regularizers, the proposed regularization (7) can fit the distance matrix M better.

Experiments on the VI and NIR Face Database

An experiment is performed on the Visible (VI) and Near-Infrared (NIR) Face Database (Shen et al. 2011). The NIR and VI database is collected using a developed dual camera system from 215 subjects. Subjects' faces were captured in four different settings, i.e., expression variations, pose variations and illumination/time variations. In our experiment, 800 face images from 80 subjects of this database are adopted. Each subject has 10 face images, which include five visible and five near-infrared face images. We manually cropped the face portion of each image and then normalized it to 32 × 32 pixels. Then, we randomly choose five face images from each person as training samples, while the remaining five images are utilized as test samples. This experiment is repeated ten times. We calculate the average recognition rates (%) of all methods as shown in the third line of Table 1. From this table, it is observed that the results of all methods are similar. Particularly, the result of SVM (68.850%) is encouraging. The classification accuracies of the other LMNN based methods are less than 74%, while the proposed method achieves the highest accuracy: 74.428%. This means that our method can effectively handle face images from two sensor types.

Experiments on the OSR Database

In this experiment, we utilize the Outdoor Scene Recognition (OSR) dataset. It includes 2688 images from 8 scene categories, which are described by high-level attribute features. The same experimental setting as (Huo, Nie, and Huang 2016) is adopted: 30 images for each category are chosen as training data, and the other images are used as testing data. To verify the robustness of our method, the 30 training images per category are selected randomly, and this procedure is repeated 5 times. We compute the average accuracies of the compared methods, including kNN, SVM, LMNN, LMNN Trace norm, LMNN Fantope, LMNN capped norm and our method, which are listed in the fourth line of Table 1. It is observed that the performance of 1NN is poor since it only involves a Euclidean metric. The results of the other methods associated with LMNN are comparative, because they take advantage of the Mahalanobis distance, which admits arbitrary linear scalings and rotations of the feature space.
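As a rough illustration of how such classification accuracies can be produced with a learned metric, the sketch below runs a Mahalanobis 1-NN evaluation over repeated random splits with a fixed number of training images per class. It is not the authors' protocol or code; in practice M would be re-learned on each training split, and all names and defaults here are assumptions.

    import numpy as np

    def mahalanobis_1nn_accuracy(M, X_train, y_train, X_test, y_test):
        # Classify each test sample by its nearest training sample under d_M (Eq. (1)).
        preds = []
        for x in X_test:
            diffs = X_train - x                                 # (n_train, d)
            dists = np.einsum('nd,de,ne->n', diffs, M, diffs)   # squared Mahalanobis distances
            preds.append(y_train[np.argmin(dists)])
        return np.mean(np.array(preds) == y_test)

    def repeated_split_eval(M, X, y, per_class=5, repeats=10, seed=0):
        # Average accuracy over random splits with `per_class` training images per subject.
        # (Here M is fixed for simplicity; it would normally be learned per split.)
        rng = np.random.default_rng(seed)
        accs = []
        for _ in range(repeats):
            train_idx = []
            for c in np.unique(y):
                idx = rng.permutation(np.where(y == c)[0])
                train_idx.extend(idx[:per_class])
            train_idx = np.array(train_idx)
            test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
            accs.append(mahalanobis_1nn_accuracy(M, X[train_idx], y[train_idx],
                                                 X[test_idx], y[test_idx]))
        return float(np.mean(accs))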

[Figure 3 image content: three ROC curves plotting True Positive Rate (TPR) against False Positive Rate (FPR) for (a) the Pubfig dataset, (b) Attribute Feature (LFW dataset) and (c) SIFT Feature (LFW dataset). The two recovered legends, matching the LFW panels, list 1-EER values: CAP 0.845/0.817, FANTOPE 0.841/0.814, KISSME 0.844/0.806, ITML 0.833/0.794, LDML 0.835/0.797, IDENTITY 0.783/0.675, MAHAL 0.817/0.748, OUR METHOD 0.848/0.821.]
Figure 3: ROC curves of face verification on the Pubfig and LFW datasets.

Some regularized LMNN methods, e.g., LMNN Trace norm, LMNN Fantope and LMNN capped norm, are superior to LMNN. However, our method (77.493%) performs better than the other methods. Accordingly, using the Gaussian mixture distribution induced regularization to characterize the distance matrix M is more appropriate than matrix norm regularizers in real applications.

Experiments on the PubFig Database

We design two experiments on the PubFig Face Dataset. In the first experiment, we choose a subset of the Pubfig database, which includes 771 images from 8 face categories. The experimental setting is similar to (Law, Thome, and Cord 2014). But here, a 512-dimensional DSIFT (Cheung and Hamarneh 2009) descriptor is used for better performance. This experiment is run 5 times, where 30 images per person in the training data are selected randomly each time, and the average classification accuracies are utilized as the evaluation criterion. We compare our method with some related methods, such as 1NN, SVM, LMNN, LMNN Trace, LMNN Fantope and LMNN Cap. The results of all methods are exhibited in the fifth line of Table 1. It can be seen that our method achieves the best result: 88.083%. Meanwhile, some methods based on LMNN, including LMNN Trace (86.411%), LMNN Fantope (86.576%) and LMNN Cap (87.125%), also give good performance. The above results show that our method is effective for face images under an uncontrolled environment.

The second experiment focuses on the face verification task using the face verification benchmark dataset, which consists of 20,000 pairs of images of 140 people from the PubFig Face Dataset. The data is divided into 10 folds with mutually disjoint sets of 14 people, and each fold contains 1,000 intra- and 1,000 extra-personal pairs. Our method is compared with some recent methods and the Equal Error Rates (EER) of all methods are computed. The ROC curves of all methods are shown in Fig. 3(a). It is evident that our method has a leading performance. For example, the 1-EER values of LMNN Cap (Huo, Nie, and Huang 2016), Fantope, KISSME (Koestinger et al. 2012), ITML (Davis et al. 2007), LDML (Guillaumin, Verbeek, and Schmid 2009), IDENTITY (Huo, Nie, and Huang 2016), MAHAL (Huo, Nie, and Huang 2016) and our method are 0.782, 0.781, 0.766, 0.725, 0.719 and 0.784, respectively. Thus, how to fit the distance M has an important effect on the face verification performance.

Experiments on the LFW Database

In this subsection, the face verification experiments are carried out on the Labeled Faces in the Wild (LFW) dataset. It contains 13,233 unconstrained face images of 5,749 individuals, and 1,680 of these pictured people appear in two or more distinct photos. We adopt two different feature representations, i.e., the LFW Attribute feature dataset and the LFW SIFT feature dataset. The experiment setting is similar to (Huo, Nie, and Huang 2016). We plot the ROC curves of all methods, including LMNN Cap, Fantope, KISSME, ITML, LDML, Identity and MAHAL, in Figure 3(b) and 3(c). Meanwhile, the Equal Error Rate for each method is calculated and the 1-EER value is used to evaluate the performance of these methods. Figure 3(b) shows the results on the LFW Attribute feature dataset. It is seen that Mahalanobis distance based methods perform better than the Euclidean distance. Compared with the Identity and Mahalanobis methods, the advantage of KISSME is obvious. The performance of the Mahalanobis distance with structural regularizers is competitive. For example, LMNN Cap and Fantope reach 84.5% and 84.1%, respectively. For the SIFT Feature dataset, a similar phenomenon can be found (see Figure 3(c)). In a word, our method consistently outperforms the other methods.
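For reference, the Equal Error Rate (and hence the 1-EER score reported above) can be estimated from verification scores and same/different labels roughly as in the Python sketch below; with a learned M, a natural pair score is the negative Mahalanobis distance −d_M(x_i, x_j). The helper is hypothetical and not part of the paper.

    import numpy as np

    def equal_error_rate(scores, labels):
        # scores: higher = more likely "same person"; labels: 1 for intra-, 0 for extra-personal pairs.
        # Sweep thresholds and return the point where FPR and FNR (1 - TPR) are closest.
        scores, labels = np.asarray(scores), np.asarray(labels)
        order = np.argsort(-scores)
        labels = labels[order]
        n_pos = labels.sum()
        n_neg = len(labels) - n_pos
        tpr = np.cumsum(labels) / n_pos          # true positive rate as the threshold decreases
        fpr = np.cumsum(1 - labels) / n_neg      # false positive rate at the same thresholds
        i = np.argmin(np.abs(fpr - (1 - tpr)))
        return (fpr[i] + (1 - tpr[i])) / 2       # report 1 - EER as in Figure 3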
Conclusions

In this paper, we propose a robust metric learning model with triplet constraints. Our method partitions the distance matrix M into several blocks and uses the matrix variate Gaussian mixture distribution to fit each block. Due to the outstanding generalization of the Gaussian mixture distribution, the proposed method can deal with more practical cases, where the distribution of M is complex and irregular. The proposed model is solved via an EM algorithm. Additionally, we provide a theoretical analysis for our method. Empirical experiments on several real-world datasets demonstrate the robustness of the proposed model.

Acknowledgement

This work was partially supported by the following grants: NSF-IIS 1302675, NSF-IIS 1344152, NSF-DBI 1356628, NSF-IIS 1619308, NSF-IIS 1633753, NIH R01 AG049371.

References

Bellet, A. 2013. Supervised metric learning with generalization guarantees. arXiv preprint arXiv:1307.4514.
Bishop, C. 2007. Pattern Recognition and Machine Learning (Information Science and Statistics), 1st edn. 2006, corr. 2nd printing edn. Springer, New York.
Boyd, S.; Xiao, L.; and Mutapcic, A. 2003. Subgradient methods. Lecture notes of EE392o, Stanford University, Autumn Quarter 2004.
Cao, X.; Chen, Y.; Zhao, Q.; Meng, D.; Wang, Y.; Wang, D.; and Xu, Z. 2015. Low-rank matrix factorization under general mixture noise distributions. In Proceedings of the IEEE International Conference on Computer Vision, 1493–1501.
Cheung, W., and Hamarneh, G. 2009. n-SIFT: n-dimensional scale invariant feature transform. IEEE Transactions on Image Processing 18(9):2012–2021.
Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; and Dhillon, I. S. 2007. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning, 209–216. ACM.
Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(Aug):1871–1874.
Guillaumin, M.; Verbeek, J.; and Schmid, C. 2009. Is that you? Metric learning approaches for face identification. In Computer Vision, 2009 IEEE 12th International Conference on, 498–505. IEEE.
Gupta, M. R.; Chen, Y.; et al. 2011. Theory and use of the EM algorithm. Foundations and Trends in Signal Processing 4(3):223–296.
Huang, G. B.; Ramesh, M.; Berg, T.; and Learned-Miller, E. 2007. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
Huo, Z.; Nie, F.; and Huang, H. 2016. Robust and effective metric learning using capped trace norm: Metric learning via capped trace norm. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1605–1614. ACM.
Koestinger, M.; Hirzer, M.; Wohlhart, P.; Roth, P. M.; and Bischof, H. 2012. Large scale metric learning from equivalence constraints. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2288–2295. IEEE.
Kumar, N.; Berg, A. C.; Belhumeur, P. N.; and Nayar, S. K. 2009. Attribute and simile classifiers for face verification. In Computer Vision, 2009 IEEE 12th International Conference on, 365–372.
Kuznetsova, A.; Hwang, S. J.; Rosenhahn, B.; and Sigal, L. 2016. Exploiting view-specific appearance similarities across classes for zero-shot pose prediction: A metric learning approach. In AAAI, 3523–3529.
Lanitis, A.; Taylor, C. J.; and Cootes, T. F. 2002. Toward automatic simulation of aging effects on face images. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(4):442–455.
Law, M. T.; Thome, N.; and Cord, M. 2014. Fantope regularization in metric learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1051–1058.
Li, J.; Kong, Y.; Zhao, H.; Yang, J.; and Fu, Y. 2016. Learning fast low-rank projection for image classification. IEEE Transactions on Image Processing 25(10):4803–4814.
Li, Y.; Tian, X.; and Tao, D. 2016. Regularized large margin distance metric learning. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, 1015–1022. IEEE.
Lim, D.; McFee, B.; and Lanckriet, G. R. 2013. Robust structural metric learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 615–623.
Lu, J.; Zhou, X.; Tan, Y.-P.; Shang, Y.; and Zhou, J. 2014. Neighborhood repulsed metric learning for kinship verification. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2):331–345.
Luo, L.; Yang, J.; Qian, J.; and Tai, Y. 2015. Nuclear-L1 norm joint regression for face reconstruction and recognition with mixed noise. Pattern Recognition 48(12):3811–3824.
Luo, L.; Chen, L.; Yang, J.; Qian, J.; and Zhang, B. 2016a. Tree-structured nuclear norm approximation with applications to robust face recognition. IEEE Transactions on Image Processing 25(12):5757–5767.
Luo, L.; Yang, J.; Qian, J.; Tai, Y.; and Lu, G.-F. 2016b. Robust image regression based on the extended matrix variate power exponential distribution of dependent noise. IEEE Transactions on Neural Networks and Learning Systems.
Maz'ya, V., and Schmidt, G. 1996. On approximate approximations using Gaussian kernels. IMA Journal of Numerical Analysis 16(1):13–29.
Mignon, A., and Jurie, F. 2012. PCCA: A new approach for distance learning from sparse pairwise constraints. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2666–2672. IEEE.
Parameswaran, S., and Weinberger, K. Q. 2010. Large margin multi-task metric learning. In Advances in Neural Information Processing Systems, 1867–1875.
Parikh, D., and Grauman, K. 2011. Relative attributes. In Computer Vision (ICCV), 2011 IEEE International Conference on, 503–510. IEEE.
Shen, L.; He, J.; Wu, S.; and Zheng, S. 2011. Face recognition from visible and near-infrared images using boosted directional binary code. In International Conference on Intelligent Computing, 404–411. Springer.
Wang, D.; Lu, H.; and Yang, M.-H. 2013. Least soft-threshold squares tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2371–2378.
Weinberger, K. Q., and Saul, L. K. 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10(Feb):207–244.
Ying, Y.; Huang, K.; and Campbell, C. 2009. Sparse metric learning via smooth optimization. In Advances in Neural Information Processing Systems, 2214–2222.
Zheng, F., and Shao, L. 2016. Learning cross-view binary identities for fast person re-identification. In IJCAI, 2399–2406.
Zheng, J.; Liu, S.; and Ni, L. M. 2014. Robust Bayesian inverse reinforcement learning with sparse behavior noise. In AAAI, 2198–2205.
Zheng, F.; Tang, Y.; and Shao, L. 2016. Hetero-manifold regularisation for cross-modal hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence.
