Improve the “Long Tail" Recommendation Through Popularity-Sensitive Clustering
Total Page:16
File Type:pdf, Size:1020Kb
Improve the “Long Tail" Recommendation through Popularity-Sensitive Clustering Huaiyi Huang, Ying Zhao, Jiabao Yan, Lei Yang Department of Computer Science and Technology / Department of Software Engineering Tsinghua University, Beijing, 100084, China {huanghy11, yanjb11, yanglei11}@mails.tsinghua.edu.cn [email protected] ABSTRACT popular items for the lack of information. Several recom- Collaborative filtering(CF) is one of the most successful ap- mendation methods were proposed in the literature with proaches to build recommender system. In recommender the aim of improving the \long tail" recommendation per- system, \long tail" items are considered to be particularly formance [10, 21, 13, 14, 22]. They are either specific rec- valuable. Many clustering algorithms based on CF designed ommendation methods targeting only \long tail" items [10, only to tackle the \long tail" items, while others trade off 21, 22], or comprehensive ones trading-off between overall the overall recommendation accuracy and \long tail" perfor- recommendation accuracy and \long tail" performance [13, mance. Our approach is based on utilizing the item popular- 14]. ity information. We demonstrate that \long tail" recommen- dation can be inferred precisely by balanced item popularity We propose to tackle this problem from a clustering perspec- of each cluster. We propose a novel popularity-sensitive clus- tive. In addition to reaching the common goals of clustering, tering method. In terms of \long tail" and overall accuracy, such as grouping users with similar behaviors or to increase our method outperforms previous ones via experiments on the density of the user-item matrix for each cluster, our clus- MovieLens, citeUlike, and MobileApp. tering method also aims to make the popularity distribution of items within each cluster more balanced and to reduce Categories and Subject Descriptors the effect of \long tail". To the best of our knowledge, this is a novel perspective that utilizes clustering collaborative H.3.3 [Information Search and Retrieval]: Information filtering models to improve the \long tail" item recommen- filtering; H.3.5 [Online Information Services]: Web-based dation. services This paper proposes a novel popularity-sensitive clustering General Terms method to improve the \long tail" item recommendation and Algorithms, Performance makes the following two contributions. Keywords First, we propose a clustering objective function that di- rectly models item popularity in each cluster, and adaptively Collaborative Filtering, Recommender Systems, Clustering normalizes item popularity by the total item popularity asso- Algorithm, Long Tail ciated with the item and the cluster. We design such an ob- jective function that its optimal solution tends to have more 1. INTRODUCTION evenly distributed item popularity in each cluster. Conse- Collaborative filtering (CF) techniques and recommenda- quently, the less popular items in resulting clusters have a tion systems have been studied extensively in the past few better chance to be recommended correctly. decades [1, 8, 16] to fulfill users' information needs with per- sonalized recommendations. Items with very low usage are Second, we propose an incremental optimization algorithm called \long tail" items, and are very common in domains to optimize the proposed object function. We conduct a like mobile apps, reference papers, and movies. Recommen- comprehensive experimental evaluation of the proposed clus- dations from the \long tail" of the popularity distribution tering method on three real world datasets against three of items are generally considered to be particularly valuable other state-of-the-art clustering methods. Our experimen- [15], and are much more challenging than recommending tal results show that the proposed popularity-sensitive clus- tering method indeed produces clusters with more evenly distributed item popularity, and it improves not only the \long tail" recommendation accuracy, but also overall rec- ommendation accuracy for a wide range of recommendation methods. The rest of this paper is organized as follows. Section 2 provides some discussions on related work on the \long tail" problem and clustering collaborative filtering models. Sec- tion 3 describes the motivation of popularity-sensitive clus- tering. Section 4 describes the graph model of item pop- 100% MovieLens ularities in clusters and proposes our popularity-sensitive CUL clustering algorithm with a discussion on its objective func- 10% MobileApp tion and optimization method. Section 5 presents a compre- hensive experimental evaluation of the proposed clustering 1% method on three real world datasets. Finally, Section 6 pro- vides some concluding remarks. 0.1% 2. RELATED WORK Percent of Users Collaborative filtering (CF) techniques are the most widely 0.01% used and well performing algorithms that have been pro- 0 20% 40% 60% 80% 100% posed developed in the past for recommender systems [1, Percent of Items 8, 16]. Among the challenges that collaborative filtering techniques have to face, such as cold start, diversity, etc., the problem of \long tail" has been identified and studied Figure 1: Distribution of item popularities with to a large extent [10, 21, 13, 15, 14, 2, 22]. Recommenda- items sorted by popularity tions from the \long tail" of the popularity distribution of items are generally considered to be particularly valuable, but the recommendation accuracy tends to decrease towards \long tail" of the popularity distribution of items are gen- the \long tail". erally considered to be particularly valuable, while the rec- ommendation accuracy tends to decrease towards the \long The study of \long tail" came in different perspectives. [2] tail". The more unbalanced a popularity distribution is, the showed the importance of recommending\long tail"products severer the problem is. In other words, for datasets that is and how this is going to change the e-commerce. [15] pro- highly unbalanced, recommending less popular items usually posed an accuracy metric that takes the effect of \long tail" means a decrease in accuracy. into consideration. Many recommendation methods were proposed to tackle the problem of \long tail". There are In this paper, we examined three datasets MovieLens, CUL, specific recommendation methods targeting only \long tail" and MobileApp from movie, reference papers, and mobile items [10, 21], and also are ones trading-off between over- app domains, respectively. In Figure 3, we show the distribu- all recommendation accuracy and \long tail" performance tion of item popularites in terms of percentage of total users, [13, 14]. For example, Eigenapp is a recommendation al- with items sorted by popularity. The MobileApp dataset gorithm for recommending mobile apps in online mobile exhibits the \long tail" phenomenon most significantly, i.e., app stores proposed by [13], which employs an item nor- only a few popular items are used by the majority of users malization based on global popularity and a PCA analy- and the majority of apps are not being used in any signifi- sis to promote less popular items. In [14], a graph model cant sense (say being used by more than 0:1% of users). The was proposed that has an explicit \long tail" factor to trade CUL dataset exhibits the \long tail" shape as well, however, off among accuracy, diversity, and long-tail. Both [13] and with a key difference that the majority of papers (greater [14] achieved better recommendation accuracy for \long tail" than 90%) are being cited by 1% to 0:1% of users. The items with the sacrifice of overall recommendation accuracy. MovieLens dataset on the other hand is less challenging for To the best of our knowledge, there has not been methods unpopular movies, as the majority of movies has more than whose targets are to improve overall recommendation accu- 1% of viewers. Recommendations from the \long tail" of the racy and \long tail" accuracy simultaneously like ours. popularity distribution of items are particularly valuable for MobileApp and CUL. The technique of clustering has been used together collab- orative filtering from the very beginning [12, 7]. Studies We propose to tackle this problem from a clustering perspec- used clustering to improve efficiency of recommender sys- tive. In addition to reaching the common goals of clustering, tems [12], or to generate collection of user data that share such as grouping users with similar behaviors or to increase similar behaviors or close relationships [4, 18, 17, 23, 20, the density of the user-item matrix for each cluster, our clus- 7, 6], based on which collaborative filtering techniques can tering method also aims to make the popularity distribution be applied to. Some of the proposed clustering methods fo- within each cluster more balanced and to reduce the effect cused on forming clusters of users or items [12, 4, 18, 17, 23], of \long tail". In doing so, our popularity sensitive clustering and others focused on finding co-clusters [20, 7, 6], clusters method gives less popular items a better chance to be rec- of a subset of users and a subset of items. For example, [20] ommended correctly in resultant clusters without sacrificing proposed a multiclass co-clustering method that allows users the accuracy of overall recommendation. and items present in multiple clusters. To the best of our knowledge, there has not been clustering methods proposed Note that, normalizing item weights according to its global to solve the problem of \long tail". popularity in the entire dataset before performing clustering