Collaborative Metric Learning
Cheng-Kang Hsieh‡, Longqi Yang†, Yin Cui†, Tsung-Yi Lin†, Serge Belongie†, Deborah Estrin†
‡UCLA; †Cornell Tech
‡[email protected]; †{ly283, yc984, tl483, sjb344, destrin}@cornell.edu

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3038912.3052639

ABSTRACT

Metric learning algorithms produce distance metrics that capture the important relationships among data. In this work, we study the connection between metric learning and collaborative filtering. We propose Collaborative Metric Learning (CML), which learns a joint metric space to encode not only users' preferences but also the user-user and item-item similarity. The proposed algorithm outperforms state-of-the-art collaborative filtering algorithms on a wide range of recommendation tasks and uncovers the underlying spectrum of users' fine-grained preferences. CML also achieves significant speedup for Top-K recommendation tasks using off-the-shelf, approximate nearest-neighbor search, with negligible accuracy reduction.

1. INTRODUCTION

The notion of distance is at the heart of many fundamental machine learning algorithms, including K-nearest neighbor, K-means and SVMs. Metric learning algorithms produce a distance metric that captures the important relationships among data and is an indispensable technique for many successful machine learning applications, including image classification, document retrieval and protein function prediction [43, 45, 26, 52, 32]. In this paper we apply metric learning to collaborative filtering problems and compare it to state-of-the-art collaborative filtering approaches.

Given a set of objects in which we know certain pairs of objects are "similar" or "dissimilar," the goal of metric learning is to learn a distance metric that respects these relationships. Specifically, we would like to learn a metric that assigns smaller distances between similar pairs, and larger distances between dissimilar pairs. For instance, in face recognition, we may wish to learn a metric that assigns small distances to genuine (same identity) face pairs and large distances to impostor (different identity) pairs, based on image pixels [43, 29].

Mathematically, a metric needs to satisfy several conditions, among which the triangle inequality is the most crucial to the generalization of a learned metric [29, 51]. The triangle inequality states that for any three objects, the sum of any two pairwise distances should be greater than or equal to the remaining pairwise distance. This implies that, given the information "x is similar to both y and z," a learned metric will not only pull the stated two pairs closer, but also pull the remaining pair (y, z) relatively close to one another.¹ We can see this as a similarity propagation process, in which the learned metric propagates the known similarity information to the pairs whose relationships are unknown.

This notion of similarity propagation is closely related to collaborative filtering (CF). In collaborative filtering, we also observe the relationships between certain (user, item) pairs, i.e., the users' ratings of items, and would like to generalize this information to other unseen pairs [31, 40]. For example, the most well-known CF technique, matrix factorization, uses the dot product between user/item vectors to capture the known ratings, and uses the dot product of those vectors to predict the unknown ratings [25].

However, in spite of their conceptual similarity, in most cases matrix factorization is not a metric learning approach because the dot product does not satisfy the crucial triangle inequality [36, 42]. As we will illustrate in Section 3.6, although matrix factorization can capture users' general interests, it may fail to capture the finer-grained preference information when the triangle inequality is violated, which leads to suboptimal performance, as seen in Section 4. Also, while we can predict the ratings with a matrix factorization system, we cannot reliably determine the user-user and item-item relationships with such a system, which significantly limits the interpretability of the resulting model.²

¹ (y, z)'s distance is bounded by the sum of the distances between (x, y) and (x, z): $d(y, z) \le d(x, y) + d(x, z)$.
² A common heuristic is to measure latent vectors' cosine similarity, but this heuristic leads to misleading results when the triangle inequality is not satisfied, as shown in Section 3.6.
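The following toy check, constructed here purely for illustration (it is not an example from the paper), makes footnotes 1 and 2 concrete: Euclidean distances always satisfy the triangle inequality, whereas dot-product scores carry no such guarantee, so a point x can score highly against both y and z even though y and z are unrelated.

```python
import numpy as np

# Illustrative check (not from the paper): with a true metric, knowing that
# x is close to both y and z bounds d(y, z) via the triangle inequality.
# Dot-product scores offer no analogous guarantee for the pair (y, z).
x = np.array([1.0, 1.0])
y = np.array([10.0, 0.0])   # high dot product with x
z = np.array([0.0, 10.0])   # high dot product with x

def d(a, b):
    return float(np.linalg.norm(a - b))

print(x @ y, x @ z, y @ z)            # 10.0 10.0 0.0 -- the "similarity" does not propagate to (y, z)
print(d(y, z) <= d(x, y) + d(x, z))   # True -- the triangle inequality always holds for a metric
```

This is the failure mode described above for dot-product (and cosine-similarity) comparisons of latent factors, and it is the gap a learned metric is meant to close.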
In this paper, we propose collaborative metric learning (CML), which learns a joint user-item metric to encode not only users' preferences but also the user-user and item-item similarity. We focus on the collaborative filtering problem for implicit feedback and show that CML naturally captures such relationships and outperforms state-of-the-art algorithms in a wide range of recommendation domains, including books, news and photography, by up to 13.95% in recall rates for pure rating-based CF and 17.66% for CF with content information.

Beyond the improvement in accuracy, an exciting property of CML is its capability of uncovering fine-grained relationships among users' preferences. In Section 4.5, we demonstrate this capability using a Flickr photographic dataset. We demonstrate how the visualization of the learned metric can reveal user preferences for different scenes and objects, and uncover the underlying spectrum of users' preferences. We show CML's capability of integrating various types of item features, including images, texts, and tags, through a probabilistic interpretation of the model. Finally, we also demonstrate that the efficiency of CML's Top-K recommendation tasks can be improved by more than 2 orders of magnitude using off-the-shelf Locality-Sensitive Hashing (LSH), with only negligible reduction in accuracy [4].
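To sketch how such a retrieval pipeline can look, the minimal p-stable (random-projection) LSH index below is written here for illustration only and is not taken from the paper's implementation (the paper uses an off-the-shelf LSH library [4]); the class name, parameter values, and random data are assumptions. Once users and items live in a joint Euclidean space, Top-K recommendation reduces to approximate nearest-neighbor search around the user's vector.

```python
import numpy as np
from collections import defaultdict

class EuclideanLSH:
    """Toy p-stable LSH for Euclidean distance: hash each vector with
    h(x) = floor((a . x + b) / w) across several tables, then re-rank
    the colliding candidates exactly."""

    def __init__(self, dim, n_tables=8, n_bits=4, bucket_width=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, n_bits, dim))            # random projections
        self.b = rng.uniform(0.0, bucket_width, size=(n_tables, n_bits))
        self.w = bucket_width
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = None

    def _keys(self, x):
        codes = np.floor((self.a @ x + self.b) / self.w).astype(int)  # (n_tables, n_bits)
        return [tuple(c) for c in codes]

    def fit(self, item_vecs):
        self.items = item_vecs
        for idx, v in enumerate(item_vecs):
            for table, key in zip(self.tables, self._keys(v)):
                table[key].append(idx)

    def query(self, user_vec, k=10):
        # Gather candidates from matching buckets, then re-rank them exactly.
        cand = set()
        for table, key in zip(self.tables, self._keys(user_vec)):
            cand.update(table.get(key, []))
        if not cand:
            return []
        cand = np.fromiter(cand, dtype=int)
        dist = np.linalg.norm(self.items[cand] - user_vec, axis=1)
        return cand[np.argsort(dist)[:k]].tolist()

# Toy usage with random "embeddings" standing in for learned CML vectors.
items = np.random.randn(10000, 64).astype(np.float32)
user = np.random.randn(64).astype(np.float32)
index = EuclideanLSH(dim=64)
index.fit(items)
print(index.query(user, k=10))
```

In practice the bucket width, number of bits, and number of tables trade recall against speed, and a tuned ANN library would replace this sketch; the point is only that Top-K retrieval in a metric space needs no exhaustive scoring of every item.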
The remainder of this paper is organized as follows. In Section 2 we review relevant prior work on metric learning and collaborative filtering. In Section 3 we propose our CML model along with a suite of regularization and training techniques. In Section 4 we evaluate CML relative to the state-of-the-art recommendation algorithms and show how the learned metric can uncover users' fine-grained preferences. In Section 5 we conclude and discuss future work.

2. BACKGROUND

In this section we review the background of metric learning and collaborative filtering, with a particular focus on collaborative filtering for implicit feedback.

2.1 Metric Learning

Let $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ be a collection of data over the input space $\mathbb{R}^m$. The label information in metric learning is specified in the form of pairwise constraints, including the set of known similar pairs, denoted by

$$\mathcal{S} = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ are considered similar}\},$$

and the set of dissimilar pairs, denoted by

$$\mathcal{D} = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ are considered dissimilar}\}.$$

The most original metric learning approach attempts to learn a Mahalanobis distance metric

$$d_A(x_i, x_j) = \sqrt{(x_i - x_j)^T A (x_i - x_j)},$$

where $A \in \mathbb{R}^{m \times m}$ is a positive semi-definite matrix [51]. This is equivalent to projecting each input $x$ to a new space $\mathbb{R}^m$ in which their Euclidean distances obey the desired constraints. There are many different ways to create such a metric.

2.2 Metric Learning for kNN

The above global optimization essentially attempts to learn a distance metric that pulls all similar pairs together and pushes dissimilar pairs apart. This objective, however, is not always feasible. On the other hand, Weinberger et al. showed that if the learned metric is to be used for k-nearest neighbor classification, it is sufficient to just learn a metric that makes each object's k-nearest neighbors be the objects that share the same class label with that object (as opposed to clustering all of the similar objects together) [49]. This objective is much more attainable, and we adopt this relaxed notion of metric learning in our model, as our goal is also to find the kNN items to recommend to each user.

Specifically, given an input $x$, we refer to the data points we desire to be the closest to $x$ as its target neighbors. We can imagine $x$'s target neighbors establishing a perimeter that differently labeled inputs should not invade. The differently labeled inputs that invade the perimeter are referred to as impostors. The goal of learning, in general, is to learn a metric that minimizes the number of impostors [49].

One of the most well-known models of this type is large margin nearest neighbor (LMNN) [49], which uses two loss terms to formulate this idea. Specifically, LMNN defines a pull loss that pulls the target neighbors of an input $x_i$ closer:

$$\mathcal{L}_{pull}(d) = \sum_{j \rightsquigarrow i} d(x_i, x_j)^2,$$

where $j \rightsquigarrow i$ denotes that the input $j$ is input $i$'s target neighbor. In addition, LMNN defines a push loss that pushes away the impostors from the neighborhood and maintains a margin of safety around the kNN decision boundaries:

$$\mathcal{L}_{push}(d) = \sum_{i, j \rightsquigarrow i} \sum_{k} (1 - y_{ik})\left[1 + d(x_i, x_j)^2 - d(x_i, x_k)^2\right]_+,$$

where the indicator function $y_{ik} = 1$ if inputs $i$ and $k$ are of the same class, otherwise $y_{ik} = 0$, and $[z]_+ = \max(z, 0)$ is the standard hinge loss. The complete loss function of LMNN is a weighted combination of $\mathcal{L}_{pull}(d)$ and $\mathcal{L}_{push}(d)$ and can be optimized through semidefinite programming.

2.3 Collaborative Filtering

We now turn our attention to collaborative filtering (CF), in particular collaborative filtering with implicit feedback. Traditional collaborative filtering algorithms were based on the user similarity computed by heuristics, such as cosine similarity.
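As a minimal sketch of this heuristic family (constructed for illustration; the ratings matrix and function names below are assumptions, not the paper's code), a user-based CF system can score an unseen item for a user by a cosine-weighted average of the other users' ratings of that item:

```python
import numpy as np

# Toy user-based CF with a cosine-similarity heuristic (illustrative sketch).
# Rows are users, columns are items, 0 = unrated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

def cosine_sim(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return (u @ v) / denom if denom > 0 else 0.0

def predict(R, user, item):
    # Weight every other user's rating of `item` by their similarity to `user`.
    others = [v for v in range(len(R)) if v != user]
    sims = np.array([cosine_sim(R[user], R[v]) for v in others])
    ratings = np.array([R[v, item] for v in others])
    mask = ratings > 0                       # keep only users who rated the item
    if not mask.any() or sims[mask].sum() == 0:
        return 0.0
    return sims[mask] @ ratings[mask] / sims[mask].sum()

print(predict(R, user=0, item=2))            # predicted rating of item 2 for user 0
```

The sections that follow replace this kind of hand-crafted similarity with distances in a learned joint user-item metric space.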