Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

Large Scale Similarity Learning Using Similar Pairs for Person Verification

Yang Yang, Shengcai Liao, Zhen Lei, Stan Z. Li
Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
{yang.yang, scliao, zlei, szli}@nlpr.ia.ac.cn

Abstract

In this paper, we propose a novel similarity measure and then introduce an efficient strategy to learn it by using only similar pairs for person verification. Unlike existing metric learning methods, we consider both the difference and the commonness of an image pair to increase its discriminativeness. Under a pair-constrained Gaussian assumption, we show how to obtain the Gaussian priors (i.e., the corresponding covariance matrices) of dissimilar pairs from those of similar pairs. The application of a log likelihood ratio makes the learning process simple and fast and thus scalable to large datasets. Additionally, our method is able to handle heterogeneous data well. Results on the challenging datasets of face verification (LFW and PubFig) and person re-identification (VIPeR) show that our algorithm outperforms the state-of-the-art methods.

1 Introduction

Person verification, i.e., verifying whether two unseen images contain the same person or not, has attracted increasing attention in computer vision. There are two main cues in the images - face and human body - based on which the problem of person verification can be further divided into two subproblems: face verification and person re-identification. Both are challenging due to variations in illumination, viewpoint, pose and expression. A general framework for addressing these two subproblems consists of feature extraction and matching, which address the issues of (1) how to extract efficient and robust features and (2) how to measure the similarity between an image pair based on the extracted features, respectively. This paper is mainly dedicated to the latter - similarity learning.

Recently, learning a similarity measure (Köstinger et al. 2012; Gal Chechik and Bengio 2010; Nguyen and Bai 2010; Bohne et al. 2014; Cao, Ying, and Li 2013) has been well studied and used to address the task of person verification. Among these methods, metric learning aims at learning a Mahalanobis metric, while similarity metric learning learns a bilinear similarity metric or a cosine similarity metric. However, a single metric is inadequate for heterogeneous data. To overcome this limitation, many approaches based on multiple metrics have been put forward.

In this paper, we propose a novel similarity measure which can be rewritten as a combination of a Mahalanobis metric and a bilinear similarity metric. With more metrics, it is able to handle heterogeneous data well. We also present an efficient strategy to jointly learn the similarity measure by using only similar pairs. Different from triplets (Schultz and Joachims 2003) or quadruplets (Law, Thome, and Cord 2013), we employ pairwise constraints because it is easier to specify labels in the form of equivalence constraints (Köstinger et al. 2012). Specifically, given an image pair, we first introduce the concepts of difference and commonness, defined by the subtraction and the summation of the pair, respectively. Under a pair-constrained Gaussian assumption (detailed in section 3), we then show how to calculate the Gaussian priors (or 'priors' for brevity) of dissimilar pairs from those of similar pairs. Inspired by the KISS metric (KISSME) (Köstinger et al. 2012), we employ a log likelihood ratio to compute our similarity measure directly in terms of the priors of labeled similar pairs. The time complexity of our method is O(Nd^2 + d^3), where d is the dimension of the PCA-reduced features and N is the number of similar pairs. Therefore, our method is scalable to large-scale data as long as d is small. Considering that large scale learning sometimes refers to the regime where learning is limited by computational resources rather than by the availability of data (Gal Chechik and Bengio 2010), and that our method has low time complexity, we name our approach large scale similarity learning (LSSL) using similar pairs. We validate LSSL on challenging datasets for face verification (LFW and PubFig) and person re-identification (VIPeR). Experimental results show that LSSL is both fast and accurate.

In summary, the main contributions are two-fold: (1) We propose a novel similarity measure and introduce a fast and efficient method to learn it; (2) Benefiting from the consideration of both the difference and the commonness of an image pair and from a pair-constrained Gaussian assumption, we show how to deduce the priors of dissimilar pairs from those of similar pairs. The latter contribution is interesting and important because it is useful for methods based on the Bayesian rule (Moghaddam, Jebara, and Pentland 2000) (e.g., KISSME) and avoids dealing with dissimilar pairs altogether.

The rest of this paper is organized as follows. We review the related work on similarity learning and give a brief introduction to KISSME in section 2. The proposed method and the experimental results are presented in sections 3 and 4, respectively. In section 5, we conclude the paper.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2 Related Work

According to the literature survey (Bellet, Habrard, and Sebban 2014), learning a global Mahalanobis metric has dominated the metric learning literature, and competitive results have been obtained. Based on a learned matrix M, the distance between a d-dimensional pair (x_i, y_j) is

    d_M(x_i, y_j) = (x_i − y_j)^T M (x_i − y_j),    (1)

where M ∈ R^{d×d} is a positive semi-definite matrix. How M is learned from labeled pairs gives rise to different metric learning methods. The first Mahalanobis distance learning approach, Xing (Xing et al. 2002), optimizes M by maximizing the sum of distances between dissimilar points under the constraint of keeping the overall distances between similar points small. Afterwards, Weinberger et al. (Weinberger, Blitzer, and Saul 2006) introduced one of the most widely used Mahalanobis distance learning methods, Large Margin Nearest Neighbors (LMNN), which strengthens the correlation of target neighbors while keeping instances from different classes far away. Without regularization, LMNN is prone to over-fitting during training. To overcome this problem, Davis et al. (Davis et al. 2007) propose Information Theoretic Metric Learning (ITML), which guarantees the closeness of the solution to a given distance metric prior. In contrast to these methods, KISSME (Köstinger et al. 2012), which learns a Mahalanobis metric from equivalence constraints, does not rely on complex optimization and is orders of magnitude faster. In (Law, Thome, and Cord 2014), a linear regularization term that minimizes the k smallest eigenvalues of the Mahalanobis metric is incorporated into the objective function. Under this regularization, the rank of the learned Mahalanobis metric is explicitly controlled, and recognition on both controlled and real datasets is greatly improved. Apart from the Mahalanobis metric, similarity metrics for verification problems have two main forms: the bilinear similarity metric s_M(x_i, y_j) = x_i^T M y_j (Gal Chechik and Bengio 2010) and the cosine similarity metric CS_M(x_i, y_j) = x_i^T M y_j / (sqrt(x_i^T M x_i) sqrt(y_j^T M y_j)) (Nguyen and Bai 2010).

To address the limitations of a global Mahalanobis metric or a single similarity metric in dealing with heterogeneous data (Bellet, Habrard, and Sebban 2014; Bohne et al. 2014), many local metric learning methods have been proposed recently. A generalized similarity measure, Sub-SML, is proposed in (Cao, Ying, and Li 2013); it combines a similarity function and a distance function so that the learned metrics preserve discriminative power. In (Li et al. 2013), a joint model bridging a global distance metric and a local decision rule is proposed to achieve better performance than metric learning methods. Large Margin Local Metric Learning (LMLML) (Bohne et al. 2014) introduces a local metric learning method that first computes a Gaussian Mixture Model from the labeled data and then learns a set of local metrics by solving a convex optimization problem. It is flexible and can be applied to a wide variety of scenarios.

2.1 A Brief Introduction to KISSME

Since the solution of our method is inspired by KISSME, we briefly introduce it in this subsection. Additionally, in the experiments, we also show how to improve KISSME based on our method.

From a statistical inference perspective, KISSME learns a global Mahalanobis metric (defined by Eq. 1) from equivalence constraints. As there is a bijection between the set of Mahalanobis metrics and the set of multivariate Gaussian distributions (e.g., for d-dimensional similar pairs, P(z|H_S) = N(0, Σ_zS) = (2π)^{-d/2} |Σ_zS|^{-1/2} exp{−(1/2) z^T Σ_zS^{-1} z}), the Mahalanobis metric can be computed directly in terms of the covariance matrices, without optimization. To establish this connection, the log likelihood ratio defined by Eq. 2 is employed:

    s(z) = 2 log( P(z|H_S) / P(z|H_D) ) = C + z^T (Σ_zD^{-1} − Σ_zS^{-1}) z,    (2)

where C = d log(|Σ_zD| / |Σ_zS|) is a constant (here, d is the dimension of z). In KISSME, z refers to the difference of an image pair (x_i − y_j) and is assumed to follow two different Gaussian distributions: one under H_S, the hypothesis that the pair is similar, and the other under H_D, the hypothesis that the pair is dissimilar.

A higher value of s(z) indicates that the pair is similar with higher probability. After stripping the constant C, which only provides an offset, M in Eq. 1 can be written as Σ_zD^{-1} − Σ_zS^{-1}. To make M positive semi-definite, the authors of (Köstinger et al. 2012) further re-project it onto the cone of positive semi-definite matrices, i.e., they clip the spectrum of M by eigenanalysis. Though simple, KISSME achieves surprisingly good results in person re-identification (Yang et al. 2014).
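For concreteness, the KISSME construction can be sketched in a few lines of numpy. This is only an illustrative transcription of Eqs. 1 and 2 (not the authors' released code); the names sim_diffs and dis_diffs are assumed to hold the difference vectors x_i − y_j of labeled similar and dissimilar pairs.

```python
import numpy as np

def kissme_metric(sim_diffs, dis_diffs):
    """Sketch of KISSME: M = Sigma_zD^{-1} - Sigma_zS^{-1}, then PSD clipping.

    sim_diffs, dis_diffs: (N, d) arrays of pair differences x_i - y_j
    for labeled similar and dissimilar pairs, respectively.
    """
    cov_s = sim_diffs.T @ sim_diffs / len(sim_diffs)   # Sigma_zS
    cov_d = dis_diffs.T @ dis_diffs / len(dis_diffs)   # Sigma_zD
    M = np.linalg.inv(cov_d) - np.linalg.inv(cov_s)    # Eq. 2 with the constant C stripped
    # re-project onto the PSD cone by clipping negative eigenvalues
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def kissme_distance(M, x, y):
    z = x - y
    return z @ M @ z    # Mahalanobis distance of Eq. 1
```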
3 Large Scale Similarity Learning

In this section, we first propose a novel similarity measure. Then, we demonstrate how to learn it using only similar pairs from a statistical inference perspective, under a pair-constrained Gaussian assumption stated below. Under this assumption, we further show how to preprocess the features. Finally, we discuss the parameter setting and the benefit of PCA, and compare with a previous work.

Pair-constrained Gaussian Assumption. Assume that we have a data set of N d-dimensional similar pairs {(x_1, y_1), ..., (x_N, y_N)}. Any instance x_i (or y_i), i = 1, 2, ..., N, is represented by

    x_i = μ_i + ε_i1  (or  y_i = μ_i + ε_i2),    (3)

where μ_i is an implicit variable referring to the i-th similar pair (x_i, y_i), while ε_i1 (or ε_i2) is another implicit variable denoting the variation of x_i (or y_i) within the similar pair. The independent variables μ and ε (subscripts omitted) are assumed to follow two Gaussian distributions N(0, S_μ) and N(0, S_ε), where S_μ and S_ε are two unknown covariance matrices.

Figure 1: An illustration of difference and commonness in 2-dimensional Euclidean space. Each feature is l2-normalized. (x1, y1) (red) denotes a similar pair while (x2, y3) (yellow) represents a dissimilar pair. The green circle is the unit circle.

3.1 The Proposed Similarity Measure

For an image pair (x_i, y_j), where each image is represented by a column feature vector normalized with the l2-norm (explained in the Data Preprocessing subsection of this section), the difference e_ij and the commonness m_ij are defined by Eq. 4. For brevity, we omit the subscripts when we do not treat the variable as a specific instance:

    e = x − y,    m = x + y.    (4)

Fig. 1 illustrates the difference and commonness in 2-dimensional Euclidean space. (x1, y1) (red) denotes a similar pair while (x2, y3) (yellow) represents a dissimilar pair. We find that for the similar pair (x1, y1), ||e_11||_2 is small but ||m_11||_2 is large. Meanwhile, for the dissimilar pair (x2, y3), ||e_23||_2 is large but ||m_23||_2 is small. Therefore, if we combine e and m, we can expect more discriminativeness than metric learning methods which only consider the difference of an image pair.

We then propose a new similarity measure, defined by subtracting the dissimilarity score e^T B e from the similarity score m^T A m:

    r(x, y) = m^T A m − λ e^T B e,    (5)

where A and B are two matrices parameterizing the similarity and dissimilarity scores, respectively, and the parameter λ balances the effects of the similarity and dissimilarity scores. In (Gal Chechik and Bengio 2010), it is pointed out that although adding a positive semi-definite constraint is useful for reducing overfitting when training data are scarce, with sufficient training data, which is the common case in modern applications, imposing the constraint consumes additional computation time and its benefit is limited. With this view, we do not impose the positive semi-definite constraint on the learned A or B.
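To make Eqs. 4 and 5 concrete, the following minimal sketch scores a pair given already-learned matrices A and B (the function name and default λ are illustrative; x and y are assumed to be l2-normalized, as discussed in the Data Preprocessing subsection):

```python
import numpy as np

def lssl_similarity(x, y, A, B, lam=1.0):
    """Similarity measure of Eq. 5: r(x, y) = m^T A m - lambda * e^T B e."""
    e = x - y            # difference (Eq. 4)
    m = x + y            # commonness (Eq. 4)
    return m @ A @ m - lam * (e @ B @ e)
```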

3.2 Jointly Learn A and B from Similar Pairs

Inspired by KISSME (Köstinger et al. 2012), we employ the log likelihood ratio (defined in Eq. 2) to compute A and B directly. Under the pair-constrained Gaussian assumption, m follows two Gaussian distributions under H_S and H_D, and e follows another two. Since s(z) = z^T (Σ_zD^{-1} − Σ_zS^{-1}) z (discarding the constant C) indicates that the pair is similar with high probability, s(z) can be taken as a similarity score and −s(z) as a dissimilarity score, i.e., s(m) = m^T A m and s(e) = −e^T B e. Thus, we obtain

    A = Σ_mD^{-1} − Σ_mS^{-1},    B = Σ_eS^{-1} − Σ_eD^{-1}.    (6)

For a similar pair (x_i, y_i), i = 1, 2, ..., N, we have x_i = μ_i + ε_i1 and y_i = μ_i + ε_i2. Then e_ii = ε_i1 − ε_i2 and m_ii = 2μ_i + ε_i1 + ε_i2. Under the pair-constrained Gaussian assumption, μ_i, ε_i1 and ε_i2 are independent. Then, cov(e, e) = 2S_ε and cov(m, m) = 4S_μ + 2S_ε (subscripts omitted). Thus, for similar pairs, P(m|H_S) = N(0, Σ_mS) and P(e|H_S) = N(0, Σ_eS), where

    Σ_mS = 4S_μ + 2S_ε,    Σ_eS = 2S_ε.    (7)

For a dissimilar pair (x_i, y_j), i ≠ j, i, j = 1, 2, ..., N, we have x_i = μ_i + ε_i1 and y_j = μ_j + ε_j2. Then e_ij = μ_i − μ_j + ε_i1 − ε_j2 and m_ij = μ_i + μ_j + ε_i1 + ε_j2. Under the pair-constrained Gaussian assumption, μ_i, μ_j, ε_i1 and ε_j2 are independent. Then, cov(e, e) = 2S_μ + 2S_ε and cov(m, m) = 2S_μ + 2S_ε (subscripts omitted). Thus, for dissimilar pairs, P(m|H_D) = N(0, Σ_mD) and P(e|H_D) = N(0, Σ_eD), where

    Σ_mD = 2S_μ + 2S_ε,    Σ_eD = 2S_μ + 2S_ε.    (8)

By comparing Eq. 7 with Eq. 8, we observe that Σ_mD + Σ_eD = Σ_mS + Σ_eS and Σ_mD = Σ_eD. Thus, Σ_mD and Σ_eD can be calculated directly from the priors of labeled similar pairs by Eq. 9:

    Σ_mD = Σ_eD = (1/2)(Σ_mS + Σ_eS) = Σ.    (9)

With this, it is interesting to note that (1) we can neglect the dissimilar pairs, saving the time needed to deal with them, and (2) KISSME can be improved by rewriting M as Σ^{-1} − Σ_eS^{-1}. We compare their performances in the experiments.
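The covariance relations in Eqs. 7-9 can also be checked numerically. The sketch below (all names and constants are illustrative) samples pairs under the generative model of Eq. 3 with arbitrary S_μ and S_ε and compares the empirical covariances of m and e against the predicted ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 200_000
G1 = rng.normal(size=(d, d)); S_mu  = G1 @ G1.T    # arbitrary PSD S_mu
G2 = rng.normal(size=(d, d)); S_eps = G2 @ G2.T    # arbitrary PSD S_eps

mu = rng.multivariate_normal(np.zeros(d), S_mu,  size=N)
e1 = rng.multivariate_normal(np.zeros(d), S_eps, size=N)
e2 = rng.multivariate_normal(np.zeros(d), S_eps, size=N)
x, y = mu + e1, mu + e2                            # similar pairs (Eq. 3)

cov = lambda Z: Z.T @ Z / len(Z)                   # empirical covariance (zero mean)
m, e = x + y, x - y
print(np.allclose(cov(m), 4 * S_mu + 2 * S_eps, atol=0.5))   # Eq. 7, up to sampling noise
print(np.allclose(cov(e), 2 * S_eps,            atol=0.5))   # Eq. 7

# dissimilar pairs: x_i matched with an independently generated y_j
y_d = rng.multivariate_normal(np.zeros(d), S_mu, size=N) \
      + rng.multivariate_normal(np.zeros(d), S_eps, size=N)
m_d, e_d = x + y_d, x - y_d
print(np.allclose(cov(m_d), 2 * S_mu + 2 * S_eps, atol=0.5))  # Eq. 8
print(np.allclose(cov(e_d), cov(m_d), atol=0.5))              # hence Eq. 9
```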
Learning Priors of Similar Pairs. When computing Σ_mS and Σ_eS, we do not rely on estimating S_μ and S_ε, both of which have only implicit meaning. Instead, we use the maximum likelihood estimate (MLE) based on the labeled similar pairs:

    Σ_mS = (1/N) sum_{i=1}^{N} m_ii m_ii^T,    Σ_eS = (1/N) sum_{i=1}^{N} e_ii e_ii^T,    (10)

where m_ii = x_i + y_i, e_ii = x_i − y_i, i = 1, 2, ..., N.

Learning Priors of Dissimilar Pairs from Those of Similar Pairs. Based on Eqs. 9 and 10, we can derive the priors of dissimilar pairs, Σ_mD and Σ_eD, from the learned priors of similar pairs, Σ_mS and Σ_eS:

    Σ_mD = Σ_eD = Σ = (1/(2N)) sum_{i=1}^{N} (m_ii m_ii^T + e_ii e_ii^T).    (11)

Thus, A and B in Eq. 6 are jointly learned as

    A = Σ^{-1} − Σ_mS^{-1},    B = Σ_eS^{-1} − Σ^{-1},    (12)

where Σ_mS and Σ_eS are defined by Eq. 10 and Σ by Eq. 11.

Based on Eqs. 4, 5 and 12, we can further reformulate our similarity measure as a combination of a bilinear similarity metric and a Mahalanobis metric:

    r(x, y) = x^T M_b y − (x − y)^T M_d (x − y),    (13)

with

    M_b = 4(Σ^{-1} − Σ_mS^{-1}),    M_d = Σ_mS^{-1} + λ Σ_eS^{-1} − (1 + λ) Σ^{-1},    (14)

where M_b and M_d parameterize the bilinear similarity metric and the Mahalanobis metric, respectively. Therefore, with more metrics, our LSSL is able to handle heterogeneous data better than approaches based on a single metric. In the experiments, we compare the performances of the bilinear similarity metric, the Mahalanobis metric, and LSSL, all learned with our method. The learning scheme of LSSL is described in Algorithm 1.

Algorithm 1 Large Scale Similarity Learning Using Similar Pairs
Input: A data set of N d-dimensional similar training pairs {(x_1, y_1), ..., (x_N, y_N)} after PCA.
Output: A ∈ R^{d×d} and B ∈ R^{d×d}.
1: normalize each sample with the l2-norm;
2: compute Σ_mS and Σ_eS by Eq. 10;
3: compute Σ by Eq. 11;
4: compute A and B by Eq. 12.

3.3 Data Preprocessing

According to the pair-constrained Gaussian assumption, μ and ε, which are two independent variables, follow the Gaussian distributions N(0, S_μ) and N(0, S_ε). Then, the variables e and m are independent and follow two zero-mean Gaussian distributions. Based on this, we show how to preprocess the features.

Zero Mean. Suppose that for any similar pair (x_i, y_i), i = 1, ..., N, there is a corresponding 'negative' similar pair (−x_i, −y_i), i = 1, 2, ..., N. Then the variable e or m follows a zero-mean distribution, since e_ii + (−e_ii) = m_ii + (−m_ii) = 0, i = 1, 2, ..., N. Thus, the zero-mean property of e and m always holds. Based on Eq. 10, if we discard the priors of the 'negative' similar pairs, the results of Σ_eS and Σ_mS remain unchanged. So, in practice, we do not calculate the priors of the 'negative' similar pairs.

Independence. Based on the pair-constrained Gaussian assumption, the two random variables e and m are independent and follow two zero-mean Gaussian distributions. In this case, independence is equivalent to orthogonality (Gareth James and Tibshirani 2013). We make e and m orthogonal by normalizing each feature with the l2-norm. This is because

    <e, m> = <x − y, x + y> = <x, x> − <y, y> + (<x, y> − <y, x>) = 1 − 1 + 0 = 0,    (15)

where <., .> denotes the inner product. In this way, the independence between e and m is guaranteed. Note that whatever else is done to preprocess the features (there is a slight improvement if we first subtract the mean of all samples), the last step is normalizing each feature with the l2-norm (see Algorithm 1).
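A compact numpy reading of Algorithm 1 together with Eqs. 10-14 is sketched below. It is a straightforward transcription under the stated assumptions (features already PCA-reduced, one row per image), not the authors' released implementation.

```python
import numpy as np

def lssl_train(X, Y, lam=1.0):
    """Learn the LSSL matrices from similar pairs only (Algorithm 1).

    X, Y: (N, d) arrays; row i of X and row i of Y form the i-th similar pair
    (already PCA-reduced). lam is the balance parameter of Eq. 5.
    """
    # step 1: l2-normalize every sample (see Data Preprocessing)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    M_, E_ = X + Y, X - Y
    N = len(X)
    # step 2: priors of similar pairs (Eq. 10)
    cov_mS = M_.T @ M_ / N
    cov_eS = E_.T @ E_ / N
    # step 3: prior of dissimilar pairs from those of similar pairs (Eqs. 9, 11)
    cov = 0.5 * (cov_mS + cov_eS)
    inv = np.linalg.inv
    # step 4: jointly learned A and B (Eq. 12)
    A = inv(cov) - inv(cov_mS)
    B = inv(cov_eS) - inv(cov)
    # equivalent bilinear / Mahalanobis form (Eqs. 13 and 14)
    Mb = 4.0 * A
    Md = inv(cov_mS) + lam * inv(cov_eS) - (1.0 + lam) * inv(cov)
    return A, B, Mb, Md

def lssl_score(x, y, Mb, Md):
    """Score a test pair with Eq. 13 (x and y must be l2-normalized first)."""
    return x @ Mb @ y - (x - y) @ Md @ (x - y)
```

Training cost is dominated by the two d×d covariance products and a few d×d inversions, which matches the O(Nd^2 + d^3) complexity stated in the introduction.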
3.4 Discussion

Parameter Setting. In Eq. 14, when we compute M_d, there is a parameter λ balancing the similarity and dissimilarity scores. It is not very sensitive and takes values in [0.9, 1.5]. In the experiments, we set λ to 0.9, 1.2 and 1.5 for LFW, PubFig and VIPeR, respectively.

The Benefit of PCA. It is hard to obtain reliable estimates of the priors of high-dimensional data without sufficient data. In view of this, we use PCA to reduce the dimension and represent the Gaussian distributed data efficiently.

Comparison with a Previous Work. In (Li et al. 2013), LADF learns a global Mahalanobis metric as a similarity measure, and the idea of similarity/dissimilarity is reflected in the local decision rule used at decision time. In contrast, our LSSL aims to learn a discriminative similarity measure, which is defined directly in Eq. 5.

4 Experiments

We evaluate the performance of the proposed LSSL on three publicly available challenging datasets: Labeled Faces in the Wild (LFW) (Huang et al. 2007), Public Figures Face Database (PubFig) (Kumar et al. 2009) and Viewpoint Invariant Pedestrian Recognition (VIPeR) (Gray, Brennan, and Tao 2007). The first two datasets focus on face verification, while the last one focuses on person re-identification. We first compare LSSL fairly with existing similarity learning approaches on the LFW dataset. We then test the performance of LSSL on the PubFig dataset in attribute space. Finally, we show that LSSL with improved features achieves a new state-of-the-art result on the VIPeR dataset.

4.1 Datasets and Setup

LFW. Labeled Faces in the Wild (LFW) is a widely used dataset and can be considered the current standard face recognition benchmark. It contains 13,233 unconstrained face images of 5,749 individuals collected from the web. It is very challenging because of large variations in pose, lighting, facial expression, age, gender, ethnicity, and general imaging and environmental conditions. For the face verification task, persons who appear in testing have not been seen in training.

We follow the standard 'restricted' evaluation protocol. The images are divided into 10 folds that are used for cross-validation, with 300 similar pairs and 300 dissimilar pairs in each fold. The subjects in the 10 folds are mutually exclusive. In the restricted setting, no inference on the identity of the images is allowed. Image pairs from 9 folds are used for training and the remaining fold is used for testing. The average result over the 10 trials is reported.

PubFig. The Public Figures Face Database (PubFig) is also a challenging large-scale real-world dataset. It contains 58,797 images of 200 persons collected from Google Images and Flickr. 10 folds are used for cross-validation, with 1,000 similar pairs and 1,000 dissimilar pairs in each fold. The images in each fold are sampled from 14 individuals. For the verification task, testing is conducted on individuals who have not been seen in training. As for LFW, we choose the 'restricted' setting: the identity of the subject is unknown during training. We report the average result over 10 runs.

VIPeR. Viewpoint Invariant Pedestrian Recognition (VIPeR) is one of the most popular datasets for person re-identification. It consists of 1,264 images of 632 pedestrians with a resolution of 128×48. For each person, a pair of images is taken from two disjoint cameras with different views. The dataset suffers from viewpoint, pose and illumination changes. To compare our approach with others, we randomly choose 316 image pairs for training, and the remaining 316 image pairs are used for testing. At test time, images from one camera are employed as the probe set and those from the other camera as the gallery set. Then we switch the probe and gallery sets. The average of the two results is regarded as one run. We report the final average results over 100 runs.

4.2 Face Verification: LFW

Features. To compare fairly with other approaches, we use the commonly used SIFT descriptors (Lowe 2004) (downloaded from the website of (Guillaumin, Verbeek, and Schmid 2009)) to represent the 'funneled' face images. We then employ PCA to project the original 3,456-dimensional features to a 100-dimensional subspace.

In Table 1, we compare LSSL with existing similarity learning approaches: ITML, Xing, KISSME, LDML (Guillaumin, Verbeek, and Schmid 2009), PCCA-χ2RBF (Mignon and Jurie 2012), Fantope Regularization (Law, Thome, and Cord 2014), LMLML(Intra-PCA) and Sub-SML(Intra-PCA). The results of the first four metric learning methods were generated with their original, publicly available code. The other results are copied from public reports or were released by the corresponding authors; all are based on the same SIFT descriptors under the restricted setting of LFW. LSSL(PCA) only uses PCA to reduce the dimension, while LSSL(Intra-PCA) further employs Intra-PCA to reduce the effect of large intra-pair variations. Note that Intra-PCA (which uses similar pairs for training) is also fast.

Table 1: Comparison of LSSL with existing similarity learning approaches on LFW in the restricted setting. Results are shown as mean and standard error (best in bold).

    Method                     Accuracy (%)
    ITML                       78.9 ± 0.5
    Xing                       74.4 ± 0.6
    KISSME                     80.6 ± 0.6
    LDML                       77.5 ± 0.5
    PCCA-χ2RBF                 83.8 ± 0.4
    Fantope Regularization     83.5 ± 0.5
    LMLML(Intra-PCA)           85.4 ± 0.5
    Sub-SML(Intra-PCA)         85.6 ± 0.6
    LSSL(PCA)                  85.5 ± 0.5
    LSSL(Intra-PCA)            86.2 ± 0.4

From Table 1, we can see that LSSL(PCA) outperforms all other approaches except Sub-SML(Intra-PCA), while LSSL(Intra-PCA) obtains the best result, with a verification rate of 86.2% ± 0.4%. Additionally, LSSL, Sub-SML and LMLML, which all employ more than one metric, perform better than PCCA-χ2RBF, which achieves the best result among single-metric methods. These observations validate the effectiveness of LSSL and also demonstrate that multiple metrics outperform a single metric. Even though better results can be obtained by employing Intra-PCA, we also find that in some cases it may lead to a singular covariance matrix of dissimilar pairs for KISSME, which harms its final performance. In the following experiments, to compare fairly with KISSME, we only use PCA to preprocess the features.

4.3 Face Verification: PubFig

Features. To represent the faces in PubFig, we adopt the 73-dimensional features provided by (Kumar et al. 2009). These 'high-level' visual features are extracted by automatically encoding the appearance as either nameable attributes, such as gender, race and age, or 'similes'. We employ PCA to project the features to a 65-dimensional subspace.

In Table 2, we give an overview of the state-of-the-art methods reported in recent years. Using the same features, LSSL performs better than KISSME. In addition, LSSL also outperforms other methods with different feature types and learners, including Gaussian Distribution (Parikh and Grauman 2011), Qwise+LMNN (Law, Thome, and Cord 2013) and LMNN+Fantope (Law, Thome, and Cord 2014). Thus, LSSL achieves a new state-of-the-art result (79.2% ± 0.8%) on PubFig in the restricted setting.

Table 2: An overview of the state-of-the-art results (mean and standard error) on PubFig (best in bold).

    Method                  Accuracy (%)
    Gaussian Distribution   70.6 ± 1.8
    KISSME                  77.7 ± 0.9
    Qwise+LMNN              77.6 ± 2.0
    LMNN+Fantope            77.5 ± 1.6
    LSSL                    79.2 ± 0.8

4.4 Person Re-identification: VIPeR

Features. To represent the individuals in VIPeR, we extract features based on the code provided by the authors of the salient color names based color descriptor (SCNCD) (Yang et al. 2014) and color names (Joost van de Weijer and Larlus 2009) as follows: (1) we employ ten horizontal stripes of equal size; (2) SCNCD, color names and color histograms are computed and fused; (3) the final image-foreground (Yang et al. 2014) feature representation is extracted in six color spaces: RGB, rgb, l1l2l3, HSV, YCbCr, and Lab. By doing so, the obtained features are more robust to illumination, partial occlusion and scale changes.

We employ PCA to reduce the dimension of the features to 75. All results are reported in the form of the commonly used Cumulated Matching Characteristic (CMC) curve (Wang et al. 2007), which gives the probability of finding the correct match within the first n ranks.

Using the same features, we first compare LSSL with four classic metric learning methods: Xing, LMNN, ITML and KISSME. In Fig. 2, we observe that LSSL significantly outperforms the other metric learning approaches at Ranks 1-30.

Figure 2: Comparison of LSSL with several classic metric learning methods on VIPeR.

Additionally, in Table 3 we compare the performance of our method with the state-of-the-art methods reported in the last two years, which focus on feature learning and/or matching: Final(ImgF) (Yang et al. 2014), MtMCML (Lianyang Ma and Tao 2014), semi-supervised coupled dictionary learning (SSCDL) (Xiao Liu and Bu 2014), Mid-level Filter (MLF) (Rui Zhao and Wang 2014), cross-view projective dictionary learning (CVPDL) (Sheng Li and Fu 2015), and LOMO+XQDA (Shengcai Liao and Li 2015). The best previously reported result is achieved by MLF (Rui Zhao and Wang 2014) with 43.4% at Rank 1. Our method achieves a new state-of-the-art result of 47.8% at Rank 1 (4.4% higher than MLF).

Table 3: Comparison with the state-of-the-art methods reported in the last two years on the VIPeR dataset. Best in bold.

    Method        Rank 1    Rank 5    Rank 10    Rank 20
    Final(ImgF)   37.8%     68.5%     81.2%      90.4%
    CVPDL         34.0%     64.2%     77.5%      88.6%
    LOMO+XQDA     40.0%     -         80.5%      91.1%
    MtMCML        28.8%     59.3%     75.8%      88.5%
    SSCDL         25.6%     53.7%     68.1%      83.6%
    MLF           43.4%     -         84.9%      93.7%
    LSSL          47.8%     77.9%     87.6%      94.2%
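For reference, a CMC curve of the kind reported in Fig. 2 and Table 3 can be computed from a probe-gallery score matrix as sketched below. This is a generic single-shot evaluation snippet (the true match of probe i is assumed to sit at gallery index i), not the exact evaluation code used here.

```python
import numpy as np

def cmc_curve(scores, max_rank=30):
    """scores[i, j]: similarity of probe i to gallery j; the correct match of
    probe i is assumed to be gallery entry i (single-shot setting)."""
    order = np.argsort(-scores, axis=1)                 # gallery sorted by descending similarity
    ranks = np.array([np.where(order[i] == i)[0][0]     # position of the true match
                      for i in range(scores.shape[0])])
    return np.array([(ranks < r).mean() for r in range(1, max_rank + 1)])
```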

4.5 Performance Analysis

Single Metric vs. Joint Metrics. According to Eqs. 13 and 14, the Mahalanobis metric and the bilinear similarity metric are jointly learned in our method. We therefore conduct experiments on LFW and PubFig to answer two questions: (1) which metric is more important, and (2) whether the combination of the two metrics is better than a single metric. From Table 4, we find that (1) the Mahalanobis metric M_d performs better than the bilinear similarity metric M_b, and (2) the joint metrics are more discriminative than a single metric based on M_d (or M_b).

Table 4: Results (mean and standard error (%)) of single and joint metrics on LFW and PubFig. Best in bold.

    Dataset    M_d           M_b           M_d and M_b
    LFW        82.6 ± 0.5    81.2 ± 0.5    85.5 ± 0.5
    PubFig     78.3 ± 0.9    73.9 ± 0.7    79.2 ± 0.8

KISSME vs. iKISSME. In section 3, we observed that KISSME (which defines M as Σ_eD^{-1} − Σ_eS^{-1}) can be improved by rewriting M as Σ^{-1} − Σ_eS^{-1}, where Σ is defined by Eq. 11. We call this variant improved KISSME (iKISSME). To compare their performance, we conduct experiments on LFW and PubFig; the results are shown in Table 5. iKISSME clearly performs better than KISSME on both LFW and PubFig. This observation demonstrates the feasibility and effectiveness of Eq. 11 for computing the priors of dissimilar pairs. It also suggests that an even more accurate estimate of the priors of dissimilar pairs can be expected if the dissimilar pairs are selected appropriately (e.g., if the priors are computed from more dissimilar pairs).

Table 5: Comparison of KISSME and iKISSME on LFW and PubFig (mean and standard error (%)). Best in bold.

    Dataset    KISSME        iKISSME
    LFW        80.6 ± 0.6    82.2 ± 0.6
    PubFig     77.7 ± 0.9    78.6 ± 0.9
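The iKISSME variant only changes how the dissimilar-pair covariance is obtained; a minimal sketch (illustrative names, assuming the same l2-normalized, PCA-reduced inputs as in the earlier sketches) is:

```python
import numpy as np

def ikissme_metric(X, Y):
    """iKISSME sketch: M = Sigma^{-1} - Sigma_eS^{-1}, with Sigma from Eq. 11,
    so only similar pairs (row i of X paired with row i of Y) are needed."""
    E, M_ = X - Y, X + Y
    N = len(X)
    cov_eS = E.T @ E / N
    cov = 0.5 * (M_.T @ M_ / N + cov_eS)               # Eq. 11
    M = np.linalg.inv(cov) - np.linalg.inv(cov_eS)
    w, V = np.linalg.eigh(M)                           # optional PSD clipping, as in KISSME
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```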

Training Time. In (Köstinger et al. 2012), KISSME was shown to be orders of magnitude faster in training than comparable methods such as SVM, ITML, LDML and LMNN, so we only compare the training time of LSSL and KISSME on the VIPeR dataset. Both are evaluated on a PC with a 3.40 GHz Core i7 CPU with 8 cores. The average training time of KISSME is 66.8 ms, while that of LSSL is 1.6 ms. That is, LSSL is approximately 42 times faster than KISSME, which it owes to avoiding dealing with the dissimilar pairs.

5 Conclusion

To address the task of person verification, we have presented a novel similarity measure and an efficient method to learn it. Benefiting from the consideration of both the difference and the commonness of an image pair, and from a pair-constrained Gaussian assumption, we showed how to learn the priors of dissimilar pairs from those of similar pairs.

Our proposed LSSL is very fast to train, which is important for real applications. Experimental results demonstrate the efficiency of LSSL in dealing with person verification problems.

Acknowledgments

This work was supported by the Chinese National Natural Science Foundation Projects #61203267, #61375037, #61473291, #61572501, #61572536, National Science and Technology Support Program Project #2013BAK02B01, Chinese Academy of Sciences Project No. KGZD-EW-102-2, and AuthenMetric R&D Funds.

References

Bellet, A.; Habrard, A.; and Sebban, M. 2014. A survey on metric learning for feature vectors and structured data. Technical report.
Bohne, J.; Ying, Y.; Gentric, S.; and Pontil, M. 2014. Large margin local metric learning. In ECCV.
Cao, Q.; Ying, Y.; and Li, P. 2013. Similarity metric learning for face recognition. In ICCV.
Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; and Dhillon, I. 2007. Information-theoretic metric learning. In ICML.
Gal Chechik, Varun Sharma, U. S., and Bengio, S. 2010. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research 11:1109–1135.
Gareth James, Daniela Witten, T. H., and Tibshirani, R. 2013. An Introduction to Statistical Learning. Springer.
Gray, D.; Brennan, S.; and Tao, H. 2007. Evaluating appearance models for recognition, reacquisition, and tracking. In Workshop on PETS.
Guillaumin, M.; Verbeek, J.; and Schmid, C. 2009. Is that you? Metric learning approaches for face identification. In ICCV.
Huang, G. B.; Ramesh, M.; Berg, T.; and Learned-Miller, E. 2007. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
Joost van de Weijer, Cordelia Schmid, J. V., and Larlus, D. 2009. Learning color names for real-world applications. TIP 1512–1523.
Köstinger, M.; Hirzer, M.; Wohlhart, P.; Roth, P. M.; and Bischof, H. 2012. Large scale metric learning from equivalence constraints. In CVPR.
Kumar, N.; Berg, A. C.; Belhumeur, P. N.; and Nayar, S. K. 2009. Attribute and simile classifiers for face verification. In ICCV.
Law, M. T.; Thome, N.; and Cord, M. 2013. Quadruplet-wise image similarity learning. In ICCV.
Law, M. T.; Thome, N.; and Cord, M. 2014. Fantope regularization in metric learning. In CVPR.
Li, Z.; Chang, S.; Liang, F.; Huang, T. S.; Cao, L.; and Smith, J. R. 2013. Learning locally-adaptive decision functions for person verification. In CVPR.
Lianyang Ma, X. Y., and Tao, D. 2014. Person re-identification over camera networks using multi-task distance metric learning. TIP 23:3656–3670.
Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. IJCV 60(2):91–110.
Mignon, A., and Jurie, F. 2012. PCCA: A new approach for distance learning from sparse pairwise constraints. In CVPR.
Moghaddam, B.; Jebara, T.; and Pentland, A. 2000. Bayesian face recognition. Pattern Recognition 33(11):1771–1782.
Nguyen, H. V., and Bai, L. 2010. Cosine similarity metric learning for face verification. In ACCV.
Parikh, D., and Grauman, K. 2011. Relative attributes. In ICCV.
Rui Zhao, W. O., and Wang, X. 2014. Learning mid-level filters for person re-identification. In CVPR.
Schultz, M., and Joachims, T. 2003. Learning a distance metric from relative comparisons. In NIPS.
Sheng Li, M. S., and Fu, Y. 2015. Cross-view projective dictionary learning for person re-identification. In IJCAI.
Shengcai Liao, Yang Hu, X. Z., and Li, S. Z. 2015. Person re-identification by local maximal occurrence representation and metric learning. In CVPR.
Wang, X.; Doretto, G.; Sebastian, T.; Rittscher, J.; and Tu, P. 2007. Shape and appearance context modeling. In ICCV.
Weinberger, K. Q.; Blitzer, J.; and Saul, L. K. 2006. Distance metric learning for large margin nearest neighbor classification. In NIPS.
Xiao Liu, Mingli Song, D. T. X. Z. C. C., and Bu, J. 2014. Semi-supervised coupled dictionary learning for person re-identification. In CVPR.
Xing, E. P.; Ng, A. Y.; Jordan, M. I.; and Russell, S. 2002. Distance metric learning, with application to clustering with side-information. In NIPS.
Yang, Y.; Yang, J.; Yan, J.; Liao, S.; Yi, D.; and Li, S. Z. 2014. Salient color names for person re-identification. In ECCV.
