Compositional Coding for Collaborative Filtering∗
Chenghao Liu (1,2), Tao Lu (2,4), Xin Wang (3), Zhiyong Cheng (5,6), Jianling Sun (2,4), Steven C.H. Hoi (1)
(1) Singapore Management University
(2) Zhejiang University
(3) Tsinghua University
(4) Alibaba-Zhejiang University Joint Institute of Frontier Technologies
(5) Shandong Computer Science Center (National Supercomputer Center in Jinan)
(6) Qilu University of Technology (Shandong Academy of Sciences)
{twinsken,3140102441,sunjl}@zju.edu.cn

arXiv:1905.03752v1 [cs.IR] 9 May 2019

ABSTRACT
Efficiency is crucial to online recommender systems, especially those that need to deal with tens of millions of users and items. Because representing users and items as binary vectors for Collaborative Filtering (CF) enables fast user-item affinity computation in the Hamming space, recent years have witnessed an emerging research effort in exploiting binary hashing techniques for CF methods. However, CF with binary codes naturally suffers from low accuracy due to the limited representation capability of each bit, which impedes it from modeling the complex structure of the data.
In this work, we attempt to improve efficiency without hurting model performance by utilizing both the accuracy of real-valued vectors and the efficiency of binary codes to represent users/items. In particular, we propose the Compositional Coding for Collaborative Filtering (CCCF) framework, which not only gains better recommendation efficiency than the state-of-the-art binarized CF approaches but also achieves even higher accuracy than the real-valued CF method. Specifically, CCCF represents each user/item with a set of binary vectors, which are associated with a sparse real-valued weight vector. Each value of the weight vector encodes the importance of the corresponding binary vector to the user/item. The continuous weight vectors greatly enhance the representation capability of binary codes, and their sparsity guarantees the processing speed. Furthermore, an integer weight approximation scheme is proposed to further accelerate the speed. Based on the CCCF framework, we design an efficient discrete optimization algorithm to learn its parameters. Extensive experiments on three real-world datasets show that our method outperforms the state-of-the-art binarized CF methods (and even achieves better performance than the real-valued CF method) by a large margin in terms of both recommendation accuracy and efficiency. We publish our project at https://github.com/3140102441/CCCF.

CCS CONCEPTS
• Information systems → Recommender System; • Human-centered computing → Collaborative filtering;

KEYWORDS
Recommendation, Collaborative Filtering, Discrete Hashing

∗Jianling Sun is the corresponding author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SIGIR'19, 2019, July 2019, Paris, France
© 2019 Copyright held by the owner/author(s). ACM ISBN 123-4567-24-567/08/06...$15.00
https://doi.org/10.475/123_4

1 INTRODUCTION
Real-world recommender systems often have to deal with large numbers of users and items, especially for online applications such as e-commerce or music streaming services [2, 3, 13, 14, 27]. For many modern recommender systems, the de facto solution is based on Collaborative Filtering (CF) techniques, as exemplified by Matrix Factorization (MF) algorithms [8]. The principle of MF is to represent users' preferences and items' characteristics as low-dimensional vectors of dimension $r$, based on the $m \times n$ user-item interaction matrix of $m$ users and $n$ items. With the user and item vectors obtained in the offline training stage, the preference of a user towards an item is computed during the online recommendation stage by the dot product of their vectors. However, when dealing with large numbers of users and items (e.g., millions or even billions), a naive implementation of typical collaborative filtering techniques (e.g., based on MF) leads to a very high computation cost for generating the ranked list of preferred items for a target user [11]. Specifically, recommending the top-$k$ preferred items for a user from those $n$ items costs $O(nr + n \log k)$ with real-valued vectors. As a result, this process becomes a critical efficiency bottleneck in practice, where recommender systems typically must respond in real time to a large number of users simultaneously. Therefore, a fast and scalable yet accurate CF solution is crucial for building real-time recommender systems.

Recent years have witnessed extensive research efforts to improve the efficiency of CF methods for scalable recommender systems. One promising paradigm is to explore hashing techniques [18, 33, 35] that represent users/items with binary codes instead of the real-valued latent factors of traditional MF methods. In this way, the dot product of a user vector and an item vector in MF can be computed by fast bit operations in the Hamming space [35]. Furthermore, by exploiting special data structures for indexing all items, the computational complexity of generating the top-$k$ preferred items is sub-linear or even constant [26, 33], which significantly accelerates the recommendation process.

However, learning the binary codes is generally NP-hard [5] due to the discrete constraints. Given this NP-hardness, a two-stage optimization procedure [18, 33, 35], which first solves a relaxed optimization problem by ignoring the discrete constraints and then binarizes the results by thresholding, has become a compromise solution. Nevertheless, this solution suffers from a large quantization loss [30] and thus fails to preserve the original data geometry (user-item relevance and user-user relationship) of the continuous real-valued vector space. As accuracy is arguably the most important evaluation metric for recommender systems, researchers have put considerable effort into reducing the quantization loss by direct discrete optimization [12, 16, 30]. In spite of the advantages of this improved optimization method, compared to real-valued vectors, CF with binary codes naturally suffers from low accuracy due to the limited representation capability of each bit, which impedes it from modeling the complex relationships between users and items.

[Figure 1 (plot and table). The plot shows the real-valued item vectors $v_1, v_2, v_3, v_4$ in the x-y plane and the binary codes at the corners of the square: $d_1 = (+1, +1)$, $d_2 = (+1, -1)$, $d_3 = (-1, +1)$, $d_4 = (-1, -1)$. The table lists the films:
  v1, d1: Star Wars (Action, Adventure)
  v2, d2: Star Wars II (Action, Adventure)
  v3, d3: The Matrix Reloaded (Action, Sci-Fi)
  v4, d4: Titanic (Drama, Romance)]

Figure 1: A toy example to illustrate the limitation of binary codes of Discrete Collaborative Filtering (DCF) [30]. $v_1, v_2, v_3$, and $v_4$ denote the real-valued vectors for item embeddings, and $d_1, d_2, d_3$, and $d_4$ denote the binary codes for item embeddings. According to the film titles and genres, $v_1$ is the most similar to $v_2$, followed by $v_3$, while they are all dissimilar to $v_4$. However, the binary codes learned by DCF cannot preserve the intrinsic similarity due to the limited representation capability of binary codes.

Figure 1 gives an example to illustrate the limitation of binary codes. From the "film title" and "genres", we can see that Star Wars is most similar to Star Wars II, followed by The Matrix Reloaded, and all of them are action movies, while Titanic, which is categorized as a drama movie, is remarkably dissimilar to them. Real-valued vectors such as $v_1, v_2, v_3$, and $v_4$ can easily preserve the original data geometry (e.g., the intrinsic movie relationships) in the continuous vector space. However, if we preserve the geometric relations of Star Wars, Star Wars II and The Matrix Reloaded by representing ...

... programming subproblems. Besides, we develop an integer approximation strategy for the weight vectors. This strategy can further accelerate the recommendation speed. We conduct extensive experiments, whose results show that the proposed CCCF method not only improves the accuracy but also boosts retrieval efficiency over state-of-the-art binary coding methods.

2 PRELIMINARIES
In this section, we first review the two-stage hashing method for collaborative filtering. Then, we introduce the direct discrete optimization method, which has been used in Discrete Collaborative Filtering (DCF) [30]. Finally, we discuss the limitation of binary codes in representation capability.

2.1 Two-stage Hashing Method
Matrix Factorization (MF) is the most successful and widely used CF-based recommendation method. It represents users and items with real-valued vectors, so that the interaction between a user and an item can be efficiently estimated by their inner product. Formally, we are given a user-item interaction matrix $R \in \mathbb{R}^{m \times n}$ with $m$ users and $n$ items. Let $u_i \in \mathbb{R}^r$ and $v_j \in \mathbb{R}^r$ denote the latent vectors for user $i$ and item $j$, respectively. Then, the predicted preference of user $i$ towards item $j$ is formulated as $\hat{r}_{ij} = u_i^\top v_j$. To learn all user latent vectors $U = [u_1, \ldots, u_m]^\top \in \mathbb{R}^{m \times r}$ and item latent vectors $V = [v_1, \ldots, v_n]^\top \in \mathbb{R}^{n \times r}$, MF minimizes the following regularized squared loss on the observed ratings:

$$\arg\min_{U, V} \sum_{(i,j) \in \mathcal{V}} \left( R_{ij} - u_i^\top v_j \right)^2 + \lambda \mathcal{R}(U, V), \tag{1}$$

where $\mathcal{V}$ denotes the set of all observed user-item pairs and $\mathcal{R}(U, V)$ is the regularization term with respect to $U$ and $V$, controlled by $\lambda > 0$. To improve recommendation efficiency, after obtaining the optimized real-valued user/item latent vectors, the two-stage hashing method uses binary quantization (rounding off [33] or rotation [18, 35]) to convert the continuous latent representations into binary codes.