Journal of Machine Learning Research 13 (2012) 519-547    Submitted 2/11; Revised 7/11; Published 3/12
Metric and Kernel Learning Using a Linear Transformation
Prateek Jain [email protected]
Microsoft Research India, #9 Lavelle Road, Bangalore 560 003, India

Brian Kulis [email protected]
599 Dreese Labs, Ohio State University, Columbus, OH 43210, USA

Jason V. Davis [email protected]
Etsy Inc., 55 Washington Street, Ste. 512, Brooklyn, NY 11201

Inderjit S. Dhillon [email protected]
The University of Texas at Austin, 1 University Station C0500, Austin, TX 78712, USA
Editors: Sören Sonnenburg, Francis Bach and Cheng Soon Ong
Abstract

Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework: minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, and 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real-world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction.

Keywords: metric learning, kernel learning, linear transformation, matrix divergences, LogDet divergence
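Two objects named in the abstract recur throughout the paper: the Mahalanobis distance induced by a positive semidefinite matrix, and the LogDet matrix divergence used as a regularizer. As a brief sketch of the standard definitions (the symbols $W$, $W_0$, $x_i$, $x_j$ and $d$ are chosen here for illustration; the paper fixes its own notation in later sections):
\[
d_W(x_i, x_j) = (x_i - x_j)^\top W (x_i - x_j), \qquad W \succeq 0,
\]
\[
D_{\ell d}(W, W_0) = \operatorname{tr}(W W_0^{-1}) - \log\det(W W_0^{-1}) - d,
\]
where $W$ and $W_0$ are $d \times d$ positive definite matrices. Minimizing the LogDet divergence subject to linear constraints then amounts to choosing $W$ close to a prior $W_0$ while satisfying constraints such as $d_W(x_i, x_j) \le u$ for pairs labeled similar and $d_W(x_i, x_j) \ge \ell$ for pairs labeled dissimilar.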
1. Introduction
One of the basic requirements of many machine learning algorithms (e.g., semi-supervised clustering algorithms, nearest neighbor classification algorithms) is the ability to compare two objects to compute a similarity or distance between them. In many cases, off-the-shelf distance or similarity