
Learning Dissimilarities by Ranking: From SDP to QP

Hua [email protected], College of Computing, Georgia Institute of Technology
Alex Gray, [email protected], College of Computing, Georgia Institute of Technology

Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).

Abstract

We consider the problem of learning dissimilarities between points via formulations which preserve a specified ordering between points rather than the numerical values of the dissimilarities. Dissimilarity ranking (d-ranking) learns from instances like "A is more similar to B than C is to D" or "The distance between E and F is larger than that between G and H". Three formulations of d-ranking problems are presented and new algorithms are presented for two of them, one by semidefinite programming (SDP) and one by quadratic programming (QP). Among the novel capabilities of these approaches are out-of-sample prediction and scalability to large problems.

1. Introduction

Ranking, sometimes referred to as ordinal regression, is a statistical learning problem which has gained much attention recently (Cohen et al., 1998; Herbrich et al., 1999; Joachims, 2002). This problem learns from relative comparisons like "A ranks lower than B" or "C ranks higher than D". The goal is to learn an explicit or implicit function which gives ranks over a sampling space X. In most of these tasks, the sampled instances to be ranked are vector-valued data in R^D, while the ranks are real numbers which can be either discrete or continuous. If the problem is to learn a real-valued ranking function, it can be stated as: given a set S of pairs (x_i, x_j) ∈ S (which indicates that the rank of x_i is lower than that of x_j), learn a real-valued f : X → R that satisfies f(x_m) < f(x_n) if the rank of x_m is lower than that of x_n.

In this paper we investigate a special ranking problem: dissimilarity ranking (d-ranking). Unlike ranking, this problem learns from instances like "A is more similar to B than C is to D" or "The distance between E and F is larger than that between G and H". Note that the dissimilarities here are not necessarily distances. Unlike the real vectors of conventional ranking problems, the data to be ranked here are dissimilarities of pairwise data vectors. The problem can be stated as: learn an explicit or implicit function which gives ranks over a space of dissimilarities d(X, X) ∈ R. Based on different requirements of applications, this learning problem can have various formulations. We present some of them in Section 2.

D-ranking can be regarded as a special instance of dissimilarity learning (or metric learning). Different dissimilarity learning methods have different goals. We highlight some previous work below.

• In metric learning methods (Hastie & Tibshirani, 1996; Xing et al., 2002), the purpose of learning a proper Mahalanobis distance is to achieve better class/cluster separations.

• In kernel learning methods (Lanckriet et al., 2004; Micchelli & Pontil, 2005), learning a proper kernel is equivalent to learning a good inner-product function which introduces a dissimilarity in the input space. The purpose is to maximize the performance of a kernel-based learning machine.

• Multidimensional scaling (MDS) (Borg & Groenen, 2005) and Isomap (Tenenbaum et al., 2000) can also be regarded as learning an implicit function f : R^D → R^L. The purpose of learning an embedding is to preserve distances in a low-dimensional Euclidean space R^L.

In our d-ranking problems, the purpose of learning a proper dissimilarity is to preserve the ranks of dissimilarities, not their absolute values (which is the case in MDS and Isomap). For example, if we know that "the distance between A and B is smaller than that between C and D", the problem can be formulated as: find a dissimilarity function d such that d(A, B) < d(C, D).

Unlike conventional learning and ranking problems, d-ranking has not received intensive study in previous research. One of the most important pieces of related work is nonmetric multidimensional scaling (NMDS) (Borg & Groenen, 2005). Given a symmetric proximity (similarity or dissimilarity) matrix ∆ = [δ_mn], NMDS tries to find a low-dimensional embedding space R^L such that ∀ x_i, x_j, x_k, x_l ∈ R^L, ‖x_i − x_j‖² < ‖x_k − x_l‖² ⇔ δ_ij < δ_kl. NMDS was recently extended to generalized NMDS (GNMDS) (Agarwal, 2007). GNMDS does not need to know the absolute values of the proximities δ_mn.
Instead, it only needs a set S of quadruples (i, j, k, l) ∈ S, which indicate that δ_ij < δ_kl.

Both NMDS and GNMDS learn an embedding space instead of learning an explicit ranking function; thus they are unable to handle out-of-sample problems. Schultz et al. gave a solution to these problems by proposing to learn a distance metric from relative comparisons (Schultz & Joachims, 2003). They choose to learn a Mahalanobis distance which can preserve ranks of distances. Since the learned distance functions are parameterized, they can be used to handle new samples. The proposed formulation was solved in a similar manner as SVM. Nonetheless, the regularization term was not well justified.

Many applications in biology, computer vision, web search, social science, etc. can be put into the framework of d-ranking problems. Take document classification as an instance. Without adequate domain knowledge, it is hard to accurately determine the quantitative dissimilarity between two documents. However, comparing the dissimilarities among every three or four documents can easily be done, either automatically or manually. Generally speaking, d-ranking is especially useful when the quantized dissimilarities are not reliable.

In Section 2, we propose three formulations of d-ranking problems. Section 3 gives the numerical solutions for solving d-ranking by SDP. Section 4 shows how to solve d-ranking by QP. The proposed methods are evaluated in Section 5. Section 6 concludes the paper.

2. Three Formulations of D-Ranking

D-ranking problems can have various formulations depending on specific requirements or settings of applications. Next we give three formulations.

Formulation 2.1. (F1) Inputs: a set S of ordered quadruples (i, j, k, l) ∈ S, indicating that d(x_i, x_j) ≤ d(x_k, x_l), where d(·, ·) is a fixed but unknown dissimilarity function. Outputs: coefficients of embedded samples x'_i, x'_j, x'_k, x'_l ∈ R^L. Criteria: (i, j, k, l) ∈ S ⇔ ‖x'_i − x'_j‖² ≤ ‖x'_k − x'_l‖².

As proposed by Agarwal et al. (Agarwal, 2007), in F1 we neither assume any geometry of the input space, nor assume any form of dissimilarities in it. We do not need to know the coefficients of the input samples; only ordering information is provided. Nonetheless, we assume a Euclidean metric in the embedding space, which is often of low dimension (e.g. L = 2 or 3). As shown in Section 3, F1 can be formed as a semidefinite programming (SDP) problem.

Formulation 2.2. (F2) Inputs: a set S of ordered quadruples (i, j, k, l) ∈ S, indicating that d(x_i, x_j) ≤ d(x_k, x_l), where d(·, ·) is a fixed but unknown dissimilarity function, and the corresponding coefficients in the input Euclidean space x_i, x_j, x_k, x_l ∈ R^D. Outputs: a dissimilarity function d̂(·, ·) : R^D × R^D → R. Criteria: (i, j, k, l) ∈ S ⇔ d̂(x_i, x_j) ≤ d̂(x_k, x_l).

Unlike learning an embedding space as in F1, F2 learns an explicit dissimilarity function d̂(·, ·) which preserves the ranks of dissimilarities. We will show in Section 4 that F2 can be handled in a very similar manner as support vector machines, where the quadratic programming (QP) problem can be solved efficiently by specialized sequential optimization methods. If in some cases we need to find a low-dimensional Euclidean embedding of the input samples, we can then use classical multidimensional scaling (MDS) to preserve the learned dissimilarities.

Formulation 2.3. (F3) Inputs: a set S of ordered quadruples (i, j, k, l) ∈ S, indicating that d(x_i, x_j) ≤ d(x_k, x_l), where d(·, ·) is a fixed but unknown dissimilarity function, and the corresponding coefficients in the input Euclidean space x_i, x_j, x_k, x_l ∈ R^D. Outputs: a projection function f : R^D → R^L and x'_i, x'_j, x'_k, x'_l ∈ R^L. Criteria: (i, j, k, l) ∈ S ⇔ ‖x'_i − x'_j‖² ≤ ‖x'_k − x'_l‖².

Although we formulate F3 as a function learning problem, we currently have not found any efficient method to solve it. This formulation remains future work.

3. Solving F1 by SDP

F1 was studied by Agarwal et al. (Agarwal, 2007). The authors proposed GNMDS, which can be solved as an SDP, as shown in Eq. (1).

GNMDS:

    min     Σ_{(i,j,k,l)∈S} ξ_ijkl + λ·tr(K)
    s.t.    (K_kk − 2K_kl + K_ll) − (K_ii − 2K_ij + K_jj) + ξ_ijkl ≥ 1,
            Σ_{a,b} K_ab = 0,   ξ_ijkl ≥ 0,   K ⪰ 0,
            for all (i, j, k, l) ∈ S.                                        (1)

Figure 1. d-ranking toy problem: locations of ten European cities. Purple: true locations. Green: locations recovered by GNMDS. All the coefficients have been scaled before plotting.

The main idea of GNMDS is to learn a positive semidefinite Gram matrix K = X^T X which can be eigen-decomposed to recover the embedded samples. The relation between Euclidean distances and the Gram matrix is used:

    ‖x_i − x_j‖² = K_ii − 2K_ij + K_jj.                                      (2)
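As a quick sanity check of Eq. (2), the following short snippet (our own illustration, not from the paper; it uses a row-sample convention X ∈ R^{n×2} so that K = XX^T) verifies that squared Euclidean distances can be read off the Gram matrix:

```python
# Numerical check of Eq.(2): squared Euclidean distances from the Gram matrix.
# Illustration only; uses a row-sample convention (K = X X^T) rather than K = X^T X.
import numpy as np

X = np.random.default_rng(1).normal(size=(10, 2))   # 10 samples in R^2
K = X @ X.T                                          # Gram matrix of the samples
i, j = 3, 7
assert np.isclose(K[i, i] - 2 * K[i, j] + K[j, j], np.sum((X[i] - X[j]) ** 2))
```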

Figure 2. 10 sorted eigenvalues of the Gram matrix K learned by GNMDS.

Nonetheless, the constraints that contain order information of dissimilarities are not sufficient to determine a unique K, since any rotation, translation or scaling can also satisfy these constraints. To reduce these ambiguities, the constraint Σ_{a,b} K_ab = 0 is used to center all the embedded samples at the origin.

It is preferable in many applications to find a low-dimensional embedding space, e.g. a 2D or 3D Euclidean space; thus a low-rank K is desired. Unfortunately, minimizing rank(K) subject to linear inequality constraints is NP-hard (Vandenberghe & Boyd, 1996). Thus the objective is relaxed heuristically to minimizing trace(K), which is a convex envelope of the rank.

Figure 1 shows the result of GNMDS on a toy problem. The inputs are 990 pairwise ranks of distances between 10 European cities. The outputs are the recovered 2D coefficients. It can be observed that the recovered locations of the cities do not correspond to the true locations. Actually, only 789 out of 990 pairs of ranks are preserved by the learned 2D embedding, i.e. a 20.3% error rate. Figure 2 shows the 10 sorted eigenvalues of the Gram matrix K. Although the original space is a 2D Euclidean space, the first 2 eigenvalues account for only 49.4% of the total variation.

There are at least two reasons for the poor performance of GNMDS. Firstly, there is no guarantee on the quality of the solution of the relaxed problem compared with the original problem: there may exist higher-dimensional spaces which satisfy all the constraints while having smaller traces than lower-dimensional spaces. Secondly, due to the introduction of slack variables ξ_ijkl in the inequality constraints, the learned embedding tends to push all the samples to the same point. The first problem can be addressed by introducing better heuristics for the convex envelope of the rank. The second problem can be addressed by the following slight modification of GNMDS:

Modified GNMDS:

    min     Σ_{(i,j,k,l)∈S} ξ_ijkl + λ·tr(K)
    s.t.    (K_kk − 2K_kl + K_ll) − (K_ii − 2K_ij + K_jj) − ξ_ijkl ≥ 1,
            Σ_{a,b} K_ab = 0,   ξ_ijkl ≥ 0,   K ⪰ 0,
            for all (i, j, k, l) ∈ S.                                        (3)

The modified GNMDS just changes the sign of the slack variables from +ξ_ijkl to −ξ_ijkl. This simple trick ensures that all the differences between the (k, l) and (i, j) distances are larger than 1, and thus pulls the embedded samples apart. Figure 3 shows the toy problem solved by the modified GNMDS. The recovered samples are closer to the true locations than those in Figure 1. There are 850 out of 990 pairs of ranks correctly preserved, i.e. a 14.14% error rate, which is 6% lower than GNMDS. Figure 4 shows the 10 sorted eigenvalues of K learned by the modified GNMDS; the first 2 eigenvalues account for 69.8% of the total variation, which is 20% higher than GNMDS.
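The following sketch shows one way the modified GNMDS SDP of Eq. (3) could be set up. It is not the authors' implementation (which used SeDuMi); it is an illustration using cvxpy with the SCS solver, and the variable name `quads` for the quadruples in S is introduced here:

```python
# A minimal sketch of the modified GNMDS SDP of Eq.(3), assuming `quads` is a list of
# (i, j, k, l) index quadruples meaning d(x_i, x_j) < d(x_k, x_l).
import cvxpy as cp
import numpy as np

def modified_gnmds(n, quads, lam=1.0, dim=2):
    K = cp.Variable((n, n), PSD=True)          # Gram matrix K of the embedded samples
    xi = cp.Variable(len(quads), nonneg=True)  # slack variables
    cons = [cp.sum(K) == 0]                    # center embedded samples at the origin
    for r, (i, j, k, l) in enumerate(quads):
        dkl = K[k, k] - 2 * K[k, l] + K[l, l]
        dij = K[i, i] - 2 * K[i, j] + K[j, j]
        cons.append(dkl - dij - xi[r] >= 1)    # modified GNMDS: -xi instead of +xi
    prob = cp.Problem(cp.Minimize(cp.sum(xi) + lam * cp.trace(K)), cons)
    prob.solve(solver=cp.SCS)                  # any SDP-capable solver works here
    # Recover a low-dimensional embedding from the top eigenvectors of K
    w, V = np.linalg.eigh(K.value)
    top = np.argsort(w)[::-1][:dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))
```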

Figure 3. d-ranking toy problem: locations of ten European cities. Purple: true locations. Green: locations recovered by the modified GNMDS. All the coefficients have been scaled before plotting.

Figure 4. 10 sorted eigenvalues of the Gram matrix K learned by the modified GNMDS.

4. Solving F2 by QP

As introduced in Section 2, instead of learning an embedding as in F1, a dissimilarity function d(x_i, x_j) : X × X → R is learned in F2, such that all the training ranks between values of d(·, ·) are preserved and the function can generalize to new samples. This is indeed a dissimilarity learning problem.

Many previous metric learning methods (Hastie & Tibshirani, 1996; Goldberger et al., 2004; Kwok & Tsang, 2003) try to learn an alternative dissimilarity function by replacing the Euclidean metric with a properly learnt Mahalanobis metric, either globally or locally.

In this section we propose the d-ranking Vector Machine (d-ranking-VM) method. Unlike metric learning methods, d-ranking-VM is explicitly regularized; thus we have full control over the complexity of d(x_i, x_j). D-ranking-VM utilizes the technique of hyperkernel learning (Ong et al., 2005), which was originally proposed for learning a proper kernel.

D-ranking-VM is formulated as the following optimization problem:

d-ranking-VM (primal):

    min     (1/N) Σ_{(i,j,k,l)∈S} ξ_ijkl + λ‖d‖²_H
    s.t.    d(x_k, x_l) − d(x_i, x_j) − ξ_ijkl ≥ 1,
            ξ_ijkl ≥ 0,   for all (i, j, k, l) ∈ S,                          (4)

where N = |S| and H is a hyper-reproducing kernel Hilbert space (hyper-RKHS) from which the function d : X × X → R is drawn.

Like the representer theorem in RKHS (Kimeldorf & Wahba, 1971), there is also a representer theorem in hyper-RKHS (see (Ong et al., 2005) or (Kondor & Jebara, 2006) for the theorem and proofs):

    d(x) = Σ_{p=1}^{M} c_p K(x_p, x),                                        (5)

where K is a semidefinite hyperkernel, x denotes a pair of samples (x_i, x_j), and M is the number of distinct dissimilarity pairs provided by the training rank data S. We denote the set of dissimilarity pairs as D, with M = |D|. Normally we have M > N (the relation between M and N is discussed in Section 5.1).

Substituting Eq. (5) into (4), we can change the primal problem to the following form:

    min     (1/N) Σ_{p∈S} ξ_p + λ C^T K C
    s.t.    Σ_{p=1}^{M} c_p K(x_p; x_k, x_l) − Σ_{p=1}^{M} c_p K(x_p; x_i, x_j) − ξ_p ≥ 1,
            ξ_p ≥ 0,   for all p ∈ S,                                        (6)

where C ∈ R^M is the vector whose pth element is c_p, and K ∈ R^{M×M} is the hyperkernel matrix.

The dual problem of (6) can be derived by the Lagrangian technique. The solution to this optimization problem is given by the saddle point of the Lagrangian function:

    L(C, ξ_p, α_p, ζ_p) = (1/N) Σ_{p∈S} ξ_p + λ C^T K C − Σ_{p=1}^{N} ζ_p ξ_p
                          + Σ_{p=1}^{N} α_p { Σ_{s=1}^{M} c_s [K(x_s; x_i, x_j) − K(x_s; x_k, x_l)] + 1 + ξ_p },   (7)

where ζ_p and α_p are non-negative Lagrange multipliers. The primal problem is convex, thus strong duality holds between the primal and the dual. Using the KKT optimality conditions, we have:

    ∂L/∂C = 2λKC + K(P − Q)A = 0,                                            (8)

and

    ∂L/∂ξ_p = 1/N + α_p − ζ_p = 0,                                           (9)

where A ∈ R^N is the vector whose pth element is α_p, and P, Q ∈ R^{M×N} are two matrices which contain the rank information. Each column p_{·r} (r = 1, ..., N) of P and each column q_{·r} of Q contains only one 1 and M − 1 zeros. For example, if the rth training quadruple in S is (i, j, k, l), which means that d(x_i, x_j) < d(x_k, x_l), and if the pair (i, j) is the mth element in D while (k, l) is the nth element in D, then p_mr = 1 and q_nr = 1.

From Eq. (8) and (9) we have:

    C = (Q − P)A / (2λ),                                                     (10)

and

    α_p = ζ_p − 1/N.                                                         (11)

Substituting Eq. (10) and (11) into (6), we arrive at the following dual problem for d-ranking-VM:

d-ranking-VM (dual):

    max     Σ_{p=1}^{N} α_p − A^T (Q − P)^T K (Q − P) A / (4λ)
    s.t.    α_p ≥ 0,   for all (i, j, k, l) ∈ S.                             (12)

This problem is a quadratic programming (QP) problem which shares a similar form with the SVM dual. Thus the sequential optimization techniques of SVM can be readily employed for d-ranking-VM. To perform testing, we can use the learnt dissimilarity function in Eq. (5) and make pairwise comparisons.
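To make the derivation concrete, here is a sketch (our own; the names `pairs`, `quads` and `Khyp` are introduced for illustration) of assembling P and Q and solving the dual of Eq. (12) with a generic convex solver. The paper instead relies on an SMO-style sequential solver, which scales much better:

```python
# A sketch of Eqs.(8)-(12): build the indicator matrices P, Q and solve the dual QP.
# Not the authors' SMO-based implementation; `pairs` (the set D), `quads` (the set S)
# and `Khyp` (the M x M hyperkernel matrix) are illustrative names.
import numpy as np
import cvxpy as cp

def solve_dual(Khyp, pairs, quads, lam=10.0):
    M, N = len(pairs), len(quads)
    idx = {pair: m for m, pair in enumerate(pairs)}
    P = np.zeros((M, N))              # P[m, r] = 1 if pair (i, j) of quad r is the m-th pair in D
    Q = np.zeros((M, N))              # Q[n, r] = 1 if pair (k, l) of quad r is the n-th pair in D
    for r, (i, j, k, l) in enumerate(quads):
        P[idx[(i, j)], r] = 1.0
        Q[idx[(k, l)], r] = 1.0
    # Factor Khyp ~= L L^T so the quadratic term A^T (Q-P)^T Khyp (Q-P) A is a sum of squares.
    w, V = np.linalg.eigh((Khyp + Khyp.T) / 2)
    L = V * np.sqrt(np.clip(w, 0, None))
    A = cp.Variable(N, nonneg=True)   # the dual variables alpha_p of Eq.(12)
    objective = cp.Maximize(cp.sum(A) - cp.sum_squares(L.T @ (Q - P) @ A) / (4 * lam))
    cp.Problem(objective).solve()
    alpha = A.value
    c = (Q - P) @ alpha / (2 * lam)   # recover the expansion coefficients C via Eq.(10)
    return alpha, c
```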

An important problem for kernel learning methods is the selection of proper kernels. This problem also exists in hyperkernel learning methods. Here we propose some examples of hyperkernels, which are hyper-extensions of Gaussian RBF kernels and polynomial kernels. The construction of these hyperkernels is based on the following proposition.

Proposition 4.1. Let k_a(·, ·) and k_b(·, ·) be positive definite kernels. Then ∀ x_1, x'_1, x_2, x'_2 ∈ X and ∀ α, β > 0, both (k_a(x_1, x_2))^α (k_b(x'_1, x'_2))^β and α k_a(x_1, x_2) + β k_b(x'_1, x'_2) can give a hyperkernel k.

Proof. See appendix.

Example 4.2. (Gaussian symmetric product hyperkernel) Let k_a and k_b be the same Gaussian RBF kernel k(x, x') = exp(−‖x − x'‖²/(2σ²)), and let α = β = 1. The Gaussian symmetric product hyperkernel is given by:

    k((x_1, x'_1), (x_2, x'_2)) = k(x_1, x'_1) k(x_2, x'_2)
                                = exp(−(‖x_1 − x'_1‖² + ‖x_2 − x'_2‖²)/(2σ²)).           (13)

Example 4.3. (Gaussian symmetric sum hyperkernel) Under the same conditions as Example 4.2, we can construct the Gaussian symmetric sum hyperkernel as:

    k((x_1, x'_1), (x_2, x'_2)) = k(x_1, x'_1) + k(x_2, x'_2)
                                = exp(−‖x_1 − x'_1‖²/(2σ²)) + exp(−‖x_2 − x'_2‖²/(2σ²)).  (14)

Example 4.4. (Polynomial symmetric product hyperkernel) Let k_a and k_b be the same polynomial kernel k(x, x') = (⟨x, x'⟩ + q)^p, and let α = β = 1. We can construct the polynomial symmetric product hyperkernel as:

    k((x_1, x'_1), (x_2, x'_2)) = (⟨x_1, x'_1⟩ + q)^p (⟨x_2, x'_2⟩ + q)^p.                (15)

Example 4.5. (Polynomial symmetric sum hyperkernel) Under the same conditions as Example 4.4, we can construct the polynomial symmetric sum hyperkernel as:

    k((x_1, x'_1), (x_2, x'_2)) = (⟨x_1, x'_1⟩ + q)^p + (⟨x_2, x'_2⟩ + q)^p.              (16)
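As a concrete illustration (our own sketch, not the authors' code), the Gaussian symmetric product hyperkernel of Eq. (13) and the learned dissimilarity of Eq. (5) can be written as follows; `pairs` again denotes the list D of index pairs and `c` the learned coefficients:

```python
# Gaussian symmetric product hyperkernel (Eq.13) and the dissimilarity of Eq.(5).
# Illustration only; variable names `pairs` and `c` are assumptions.
import numpy as np

def gauss_sym_prod_hyperkernel(xi, xj, xk, xl, sigma=15.0):
    # k((x_i, x_j), (x_k, x_l)) = exp(-(||x_i - x_j||^2 + ||x_k - x_l||^2) / (2 sigma^2))
    return np.exp(-(np.sum((xi - xj) ** 2) + np.sum((xk - xl) ** 2)) / (2 * sigma ** 2))

def hyperkernel_matrix(X, pairs, sigma=15.0):
    # M x M matrix K with K[p, q] = k(pair_p, pair_q), pairs drawn from D
    M = len(pairs)
    K = np.empty((M, M))
    for p, (i, j) in enumerate(pairs):
        for q, (k, l) in enumerate(pairs):
            K[p, q] = gauss_sym_prod_hyperkernel(X[i], X[j], X[k], X[l], sigma)
    return K

def dissimilarity(x, y, X, pairs, c, sigma=15.0):
    # d(x, y) = sum_p c_p k(pair_p, (x, y)), Eq.(5); also applies to unseen samples
    return sum(c[p] * gauss_sym_prod_hyperkernel(X[i], X[j], x, y, sigma)
               for p, (i, j) in enumerate(pairs))
```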

5. Experiments

The proposed d-ranking methods of Section 3 and Section 4 are evaluated in several experiments.

5.1. Obtaining Ranks of Pairs from Data

To test our methods, we need to obtain pairwise distance ranks. This can be done in many ways. Generally speaking, for a problem of n data samples, the total number of available distance pairs is M = C(n, 2) = n(n − 1)/2 (if we take d(x_i, x_j) and d(x_j, x_i) as the same distance). The total number of pairwise distance ranks is N = C(M, 2) = M(M − 1)/2 = n⁴/8 − n³/4 − n²/8 + n/4 (if we take d(x_i, x_j) < d(x_k, x_l) and d(x_k, x_l) > d(x_i, x_j) as the same rank pair). Table 1 gives some examples of the relation between n and N. When n grows, the number of rank constraints increases dramatically; even a problem with n > 100 will be impossible for some optimization solvers.

Table 1. Some examples of N vs. n.

    n    2    3    4     10     20       50        100          1000
    N    0    3    15    990    17955    749700    12248775     1.2475e+11

Figure 5. Locations of the 109 largest cities in the continental United States.

Here we reduce N by considering order transitivity: if A > B and B > C, then the rank pair A > C can be ignored (it is automatically satisfied) in the optimization constraints. The method is very simple. First we sort the M distances decreasingly. Then we take every two adjacent distances to form one distance rank pair. By doing this, N can be reduced to n²/2 − n/2 − 1, and this reduced set of rank pairs still carries the full rank information of all the distances. Of course, in some applications the rank information is not fully given, and N < n²/2 − n/2 − 1.
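A minimal sketch of this reduction (our own illustration; the sorting direction and names are choices made here, with each quadruple stating d(x_i, x_j) ≤ d(x_k, x_l)):

```python
# Reduce the rank constraints to adjacent pairs of the sorted distances,
# giving N = n^2/2 - n/2 - 1 quadruples that still imply the full ordering.
import numpy as np
from itertools import combinations

def adjacent_rank_quads(X):
    pairs = list(combinations(range(len(X)), 2))            # the M = n(n-1)/2 pairs
    dists = [np.sum((X[i] - X[j]) ** 2) for i, j in pairs]
    order = np.argsort(dists)                                # sort distances (increasing here)
    # each quadruple (i, j, k, l) states d(x_i, x_j) <= d(x_k, x_l)
    return [pairs[order[t]] + pairs[order[t + 1]] for t in range(len(pairs) - 1)]
```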

We test our methods on three data sets: 1) the 2D locations of the 109 largest cities in the continental US; 2) 100 images of the handwritten digits "3" and "5" from the USPS database, each of size 16 × 16; 3) 126 face images of 4 people from the UMist database, each of size 112 × 92.

For the GNMDS and modified GNMDS methods, all ranks of distance pairs are fed to an SDP solver, and the recovered 2D embeddings are plotted. We used SeDuMi (Sturm, 1999) to obtain the results given in the following subsections. For d-ranking-VM, LibSVM (Chang & Lin, 2001) is employed as our QP solver; it is implemented with the sequential minimal optimization (SMO) technique. The learned dissimilarities are used as "pseudo-distances" and are fed to classical MDS. The recovered 2D embeddings are then plotted.
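For the MDS step, a standard classical-MDS routine suffices. The following is our own sketch of how the learned pseudo-distance matrix could be turned into a 2D embedding:

```python
# Classical MDS on a matrix of (pseudo-)distances: double-center the squared
# dissimilarities and take the top eigenvectors. Illustration only.
import numpy as np

def classical_mds(D, dim=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))
```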

5.2. Results of US Cities

In this data set, n = 109 and N = 5885. The location of each city is given by a 2D coefficient vector. Figure 5 shows the true locations. Figure 6 shows the results given by GNMDS.

Figure 6. Recovered locations of 109 US cities given by GNMDS.

It can be observed that GNMDS cannot correctly recover the embedding from the distance ranks: most of the embedded samples are pushed onto a line. Only 50.3% of the distance ranks are preserved, which is essentially what random guessing would give. This supports the analysis in Section 3. Figure 7 shows the result given by the modified GNMDS. The recovered embedding roughly reflects the geometry of the cities, and 74.5% of the distance ranks are preserved. Since the distance values themselves are not provided, there is no hope of matching the true locations exactly.

Figure 8 shows the results given by d-ranking-VM, where λ = 10 and the Gaussian symmetric product hyperkernel is used with σ = 15. Here 97.9% of the distance ranks are preserved.

Table 2 shows the runtime of the above experiments.

Table 2. Runtime comparison for the three d-ranking methods.

    Method             Runtime (minutes)
    GNMDS              124
    modified GNMDS     71
    d-ranking-VM       4.5


Figure 7. Recovered locations of 109 US cities given by the modified GNMDS.

Figure 9. Recovered locations of USPS handwritten digits "3" and "5" given by d-ranking-VM.


Figure 8. Recovered locations of 109 US cities given by d-ranking-VM, using the Gaussian symmetric product hyperkernel.

Figure 10. Recovered locations of UMist human faces given by d-ranking-VM.

5.3. Results of USPS Handwritten Digits

In this data set, n = 100 and N = 4949. The dimension of every data sample is 16 × 16 = 256. Figure 9 shows the recovered 2D results given by d-ranking-VM.

5.4. Results of UMist Human Faces

In this data set, n = 126 and N = 7874. The dimension of every data sample is 112 × 92 = 10304. Figure 10 shows the recovered 2D results given by d-ranking-VM.

6. Discussions and Conclusions

We have presented three d-ranking formulations and given numerical solutions for two of them, namely solving d-ranking by SDP and solving d-ranking by QP. Each of them has its advantages and shortcomings. We list some pros and cons from different perspectives:

• Pros for d-ranking by SDP (GNMDS and the modified version): it can recover a low-dimensional embedding directly. Only ordering information is needed; there is no need to know the values of the sample coefficients.

• Cons for d-ranking by SDP (GNMDS and the modified version): solving an SDP is hard, especially for large-scale problems. Even sophisticated SDP solvers can only handle problems with N < 10³ and fewer than 10⁵ constraints. It cannot be used to predict unseen samples.

• Pros for d-ranking by QP (d-ranking-VM): solving the QP in our case is much easier than the SDP, since it can be converted to a form similar to SVM. Using sequential methods (SMO), it can solve problems with N > 10⁵. The learned dissimilarity function can be used to predict unseen samples.

• Cons for d-ranking by QP (d-ranking-VM): it cannot recover a low-dimensional embedding explicitly; one needs to use MDS or other embedding methods after learning the dissimilarities. Learning the dissimilarity measure requires knowing the coefficients of the original samples. Like kernel methods, choosing a good hyperkernel is crucial for solving a specific problem.

To our knowledge, this is the first work which brings out-of-sample prediction capability and large-scale scalability to d-ranking problems. Note that the technique of d-ranking-VM can also be employed to solve distance-preserving problems. We will investigate the regularization properties and evaluate the performance of different hyperkernels in future research. Finding a numerical solution for formulation F3 will also be future work.

7. Appendix: Proof of Proposition 4.1

We need to prove that k is a kernel on X².

Denote by ⊗ the Kronecker product of two matrices A ∈ R^{m×n} and B ∈ R^{p×q}:

    A ⊗ B = [ a_11 B   ···   a_1n B
               ⋮        ⋱      ⋮
              a_m1 B   ···   a_mn B ],                                       (17)

and define ⊕ as:

    A ⊕ B = [ a_11·1 + B   ···   a_1n·1 + B
               ⋮             ⋱      ⋮
              a_m1·1 + B   ···   a_mn·1 + B ],                               (18)

where 1 is a matrix of all ones.

Denote the kernel matrix of k_a(x_1, x_2) as K_a, and that of k_b(x'_1, x'_2) as K_b. Denote the hyperkernel matrix of k as K.

It is easy to verify that we can construct K = K_a ⊗ K_b. Since K_a and K_b are positive definite, their eigenvalues µ_a and µ_b are positive. Thus the eigenvalues of K, ν_ij = αβ µ_ai µ_bj, are also positive. A symmetric matrix K with positive eigenvalues is positive definite. Thus k = (k_a(x_1, x_2))^α (k_b(x'_1, x'_2))^β is a valid hyperkernel.

One can also verify that αK_a ⊕ βK_b = αK_a ⊗ 1 + β 1 ⊗ K_b. Since K_a, K_b and 1 are all positive semidefinite and α, β > 0, K = αK_a ⊕ βK_b is positive semidefinite. Thus k = α k_a(x_1, x_2) + β k_b(x'_1, x'_2) is a valid hyperkernel.
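A small numerical illustration (ours, not part of the paper) of the two constructions in the proof: both K_a ⊗ K_b and αK_a ⊕ βK_b = αK_a ⊗ 1 + β 1 ⊗ K_b remain positive semidefinite when K_a and K_b are PSD kernel matrices:

```python
# Numerical check of the appendix constructions (illustration only).
import numpy as np

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))

def rbf_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

Ka, Kb = rbf_gram(X1), rbf_gram(X2)
alpha, beta = 0.7, 1.3

K_prod = np.kron(Ka, Kb)                                        # K_a (Kronecker product) K_b
K_sum = np.kron(alpha * Ka, np.ones_like(Kb)) + np.kron(np.ones_like(Ka), beta * Kb)

for K in (K_prod, K_sum):
    assert np.min(np.linalg.eigvalsh((K + K.T) / 2)) > -1e-9    # PSD up to numerical error
```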

References

Agarwal, S. (2007). Generalized non-metric multidimensional scaling. AISTATS 2007.

Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. New York: Springer.

Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cohen, W. W., Schapire, R. E., & Singer, Y. (1998). Learning to order things. NIPS.

Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2004). Neighbourhood components analysis. NIPS.

Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Trans. PAMI, 18, 607-616.

Herbrich, R., Graepel, T., & Obermayer, K. (1999). Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, 115-132.

Joachims, T. (2002). Optimizing search engines using clickthrough data. Proc. of SIGKDD '02.

Kimeldorf, G., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33, 82-95.

Kondor, R., & Jebara, T. (2006). Gaussian and Wishart hyperkernels. NIPS.

Kwok, J. T., & Tsang, I. W. (2003). Learning with idealized kernels. ICML.

Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27-72.

Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. Journal of Machine Learning Research, 6, 1099-1125.

Ong, C. S., Smola, A. J., & Williamson, R. C. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043-1071.

Schultz, M., & Joachims, T. (2003). Learning a distance metric from relative comparisons. NIPS.

Sturm, J. F. (1999). Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software, 11.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319-2323.

Vandenberghe, L., & Boyd, S. (1996). Semidefinite programming. SIAM Review, 38, 49-95.

Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2002). Distance metric learning with application to clustering with side-information. NIPS.
α 0 0 β A Global Geometric Framework for Nonlinear Dimen- inite. Thus k = ka(x1, x2) kb(x1, x2) is a valid hyperkernel. sionality Reduction. Science, 290, 2319–2323. L N Vandenberghe, L., & Boyd, S. (1996). Semidefinite Pro- can also verify that αKa βKb = αKa 1 + gramming. SIAM Review, 38, 49–95. β1 Kb. Since Ka, Kb and 1 are all positive semidef- L Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2002). inite and α, β > 0, K = αKa βKb is positive 0 0 semidefinite. Thus k = αk (x , x ) + βk (x , x ) is Distance Metric Learning with Application to Clustering a 1 2 b 1 2 with Side-Information. NIPS. a valid hyperkernel.