
The Long Tail of Recommender Systems and How to Leverage It

Yoon-Joo Park, Stern School of Business, New York University, [email protected]
Alexander Tuzhilin, Stern School of Business, New York University, [email protected]

ABSTRACT
The paper studies the Long Tail problem of recommender systems when many items in the Long Tail have only few ratings, thus making it hard to use them in recommender systems. The approach presented in the paper splits the whole itemset into the head and the tail parts and clusters only the tail items. Then recommendations for the tail items are based on the ratings in these clusters and recommendations for the head items on the ratings of individual items. If such partitioning and clustering are done properly, we show that this reduces the recommendation error rates for the tail items, while maintaining reasonable computational performance.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval.
I.2.6 [Artificial Intelligence]: Learning.

General Terms: Algorithms, Performance, Experimentation.

Keywords: Long Tail, clustering, recommendation, data mining

1. INTRODUCTION
Many recommender systems ignore unpopular or newly introduced items having only few ratings and focus only on those items having enough ratings to be of real use in the recommendation algorithms. Alternatively, such unpopular or newly introduced items can remain in the system but would require special handling using various cold start methods, such as the ones described in [1].

Using the terminology introduced in [2], these unpopular or new items belong to the Long Tail of the item distribution, as shown in Figure 1 for the MovieLens dataset. Following the spirit of extensive research on the Long Tail phenomena [2], these types of items should not be discarded or ignored but gainfully utilized in recommendation methods. In this paper, we study the Long Tail of recommender systems and propose a new method of managing such items from the Long Tail. In particular, we propose to split the items into the head and tail parts and to group the items in the tail part using certain clustering methods. We show that such splitting and grouping improves recommendation performance as compared to some of the alternative non-grouping and fully-grouped methods. We demonstrate this performance improvement by running various experiments on two "real-world" datasets. Finally, we examine head/tail splitting strategies for reducing the error rates of recommendations and demonstrate that this partitioning often outperforms clustering of the whole itemset.

The Long Tail problem in the context of recommender systems has been addressed previously in [3] and [4]. In particular, [3] analyzed the impact of recommender systems on sales concentration and developed an analytical model of consumer purchases that follow product recommendations provided by a recommender system. The recommender system follows a popularity rule, recommending the bestselling products to all consumers, and they show that this process tends to increase the concentration of sales. As a result, the treatment is somewhat akin to providing product popularity information. The model in [3] does not account for consumer preferences and their incentives to follow recommendations or not. Also, [3] studied the effects of recommender systems on sales concentration and did not address the problem of improving recommendations for the items in the Long Tail, which constitutes the focus of this paper. In [4], a related question has been studied: to what extent recommender systems account for an increase in the Long Tail of the sales distribution. [4] shows that recommender systems increase the firm's profits and affect sales concentration.

Another related problem is the cold start problem [1], since our approach can be viewed as a solution to the cold start problem for the items in the Long Tail that have only very few ratings. A popular solution to the cold start problem utilizes content-based methods, where two items with no or only few ratings are inferred to be similar based on their content [1]. In our work, we use grouping of the items in the Long Tail, rather than content-based methods, to identify similar items and to leverage their combined ratings to provide better recommendations.

Our work is also related to the clustering methods used in recommender systems. In particular, [9] clusters similar users into the same cluster to overcome the data sparsity problem for collaborative filtering. Also, in [8], item clustering is used to improve the prediction accuracy of collaborative filtering: the items were divided into smaller groups, and existing CF algorithms were applied to each group separately. We use related clustering ideas but in the context of the Long Tail phenomenon, to leverage the few ratings of the items in the Long Tail.

2. BACKGROUND
In this section, we provide some background information about the Long Tail problem in recommender systems and its solutions.

2.1 Preliminaries
We assume that there is a set of items I, a set of customers C and the set of known ratings R = {rij} provided by the customers in C


for the items in I. Let Ri = {rij} be the set of ratings provided by the customers in C for item i. We order all the items in I according to the number of ratings |Ri| provided by the customers for that item. The histogram in Figure 1 presents the frequency distribution of the items' rating numbers and follows the Long Tail [2] for the MovieLens dataset [5] described in Section 4. The whole itemset I can be partitioned into the head H and the tail T parts by selecting a cutting point α along the x-axis of the histogram.

Figure 1. Histogram of the items' (movies') rating frequencies for the MovieLens data.

In this paper, we assume that the recommendations are provided as follows. First, we group the items in I according to some clustering algorithm [7]. Then for each cluster of items Ik we build a data mining model predicting the unknown ratings in the cluster Ik based on the known ratings Rk = {rij}, i ∈ Ik, and the parameters of the items in cluster Ik and the customers in C. For example, we can build a linear regression model for a cluster of items Ik using the known ratings for that cluster to estimate the unknown ratings for the items in Ik. We can also determine the error rates of these models, such as RMSE and MAE, by testing the performance of these data mining models on the holdout data.

In order to build these data mining models, we first need to specify the variables pertaining to the items I and customers C. We assume that the customers in C have certain attributes, such as name, age, gender and address, associated with them, and that the items in I have attributes, such as item name, price, size and weight, associated with them. In addition to these attributes, we also introduce derived variables DV for the customers and items that are computed from the customer and item attributes and the ratings information. Some examples of derived variables are:
1) the average rating provided by a customer for the rated items
2) the total number of ratings that the customer provided
3) the total number of ratings for an item.
The independent variables used in the aforementioned data mining models include the item-related and the derived variables.

If we do not group items, as explained above, and build data mining models for each individual item i in I to estimate the unspecified ratings for i, we call it the Each Item (EI) recommendation method. For example, in the case of the MovieLens dataset, we ordered the 1682 movies from that dataset based on the number of available ratings for each movie. Then we built a predictive model for each of the 1682 movies using the ratings of that particular movie (thus, we built 1682 models in total). The independent variables are some of the derived variables, such as the number of ratings that the customer provided, the average popularity of the movies that the customer rates highly, the popularity of the movie, etc. For example, if the movie Toy Story had 272 ratings, we can build a linear regression model to predict the unknown ratings for that movie, use RMSE to measure the performance of the model, and apply 10-fold cross validation to compute the RMSE for that movie. This process was repeated 1682 times, once for each movie in MovieLens. As Figure 1 demonstrates, the movies in its long tail have only few ratings, so the predictive models are learned from only few training examples under the EI method. Finally, since we used 10-fold cross validation, the minimal number of ratings needed for model-building purposes was 10, which was the case with MovieLens.

In contrast, when we group items by applying clustering methods to the whole itemset I, we call it the Total Clustering (TC) recommendation method. In other words, TC clusters the whole itemset I into different groups and builds rating predictive models for each resulting group. Finally, if we split the itemset I into the head H and tail T, apply clustering only to the tail T while leaving the head items un-clustered, and build data mining models for each cluster in tail T and individual models for the items in head H, we call it the Clustered Tail (CT) recommendation method.

The main problem with the Each Item (EI) recommendation method is that only few ratings are available in the Long Tail, and the performance of these models deteriorates in the Long Tail of the itemset I. We describe this problem in Section 2.2 and present various ways to address it in the rest of the paper.

2.2 The Long Tail Problem of Recommender Systems
We used the Each Item (EI) method to build rating estimation models for individual items in I, as described in Section 2.1. We used Weka [7] to repeat this model building process across two datasets (MovieLens and BookCrossing), two types of performance measures (MAE and RMSE) and nine types of predictive models: (1) SimpleLinearRegression (SLR), (2) Gaussian radial basis function network (RBF), (3) Support vector machines (SVM), (4) K-nearest neighbours (KNN), (5) Locally-weighted learning (LWL), (6) Bagging classifier (Bagging), (7) DecisionStump tree (DecisionStump), (8) M5P model tree (M5P) and (9) 0-R Rule Induction classifier (ZeroR). Furthermore, we built these individual item models using five sets of derived variables that served as independent variables in the models. Therefore, the total number of experiments for the Each Item method is 90 (2 × 9 × 5) for each dataset.

The performance results for some of these experiments are presented in Figures 2 and 3. Figures 2(a, b) show the MAE and RMSE error rates respectively for each of the aforementioned nine predictive models for the MovieLens dataset. The movies in the graphs are ordered according to the number of ratings (ranks) that are plotted on the x-axis. Figures 3(a, b) provide the same measurements, but for the BookCrossing dataset.

All four figures, Figures 2-3(a, b), show that the error rates increase in the tail of the figures (for the items with only few ratings). To demonstrate this effect more clearly, we performed a correlation analysis and computed Pearson's correlation coefficients between the rating numbers and the error rates. The results are presented in Table 1, which shows that all the 90 models for the BookCrossing dataset have significant negative correlations between the rating numbers and the error rates and that 70 (out of 90) models have significant negative correlations for the MovieLens


dataset. This result demonstrates that, as the rating numbers decrease, the error rates tend to increase.

Table 1. Correlation analysis results for the graphs in Figures 2 and 3 for the BookCrossing and MovieLens datasets

Pearson's Correlation Coefficient    # of EI models for BookCrossing    # of EI models for MovieLens
Significant correlation              90 (/90)                           70 (/90)
Less than -0.5                       70                                 43
Less than -0.6                       66                                 22
Less than -0.7                       40                                 5

In summary, we showed that for the EI rating estimation method, the error rates tend to increase for the low-ranked items in the tail of the itemset I. We call this problem the Long Tail Recommendation Problem (LTRP). It occurs because the rating prediction models do not have enough data for the less popular items in the Long Tail. We address this problem in the rest of this paper. One way to do it is to cluster the items in I so that the predictive models are learned using more data, thus decreasing the error rates for the less popular items. We describe this approach next.
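The correlation check behind Table 1 amounts to computing Pearson's r between each item's rating count and its cross-validated error rate. A small sketch, with placeholder data standing in for the real experimental output:

```python
# Sketch: correlating per-item rating counts with per-item error rates, as in
# Table 1. Both arrays are synthetic placeholders, not the paper's results.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
rating_counts = rng.integers(10, 600, size=90)                    # |R_i| per item
rmse_per_item = 1.0 - 0.001 * rating_counts + rng.normal(0, 0.05, 90)

r, p_value = pearsonr(rating_counts, rmse_per_item)
print(f"Pearson r = {r:.3f}, p = {p_value:.4f}")   # negative r: fewer ratings, higher error
```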

Figure 2. Error rates for the Each Item method for the MovieLens dataset for different predictive models. On the x-axis is the number of ratings for movies in decreasing order.

Figure 3. Error rates for the Each Item method for the BookCrossing dataset for different predictive models. On the x-axis is the number of ratings for books in decreasing order.

2.3 Solution to the Long Tail Problem: Clustering the Whole Itemset
Since the LTRP problem is caused by the lack of data for building good predictive models in the tail, clustering items using the Total Clustering (TC) method from Section 2.1 can be a reasonable solution, since it provides more data for the less popular items.

In this section we compare the EI and the TC methods. The experimental settings for the TC method are the same as for EI, as described in Section 2.2; however, the TC method has an additional factor: the number of clusters. Thus, for the TC models, we consider nine data mining methods, five sets of independent variables, two performance measurements and five clustering settings having 10, 20, 30, 40 and 50 clusters in total. We cluster items into these groups using the Expectation-Maximization (EM) clustering method [7]. Thus, the number of experiments for the TC case becomes 450.

For example, in the case of the MovieLens dataset, we cluster the 1682 movies into 10 groups using the EM method [7] and build a predictive model, e.g., a Support Vector Machine, for each group (10 SVM models in total). If we want to predict the unknown rating of the movie The Other Boleyn Girl for customer C, then the TC method first determines to which of these 10 groups that movie belongs. If the movie belongs to group G5, consisting of 30 other movies having 10000 ratings among them, then TC applies the SVM method to group G5 and computes the RMSE error rates using 10-fold cross validation on these 10000 ratings. This process was repeated 10 times on MovieLens, once for each cluster.

Figures 4 and 5 show the RMSE rates for the TC and EI methods on the MovieLens and BookCrossing datasets respectively. Figure 4 uses the Simple Linear Regression (SLR) method and 10 clusters for the TC method. Figure 5 uses the Locally-Weighted Learning (LWL) method and also 10 clusters for the TC method. These two graphs clearly show that TC outperforms the EI method, especially in the tail of the distribution, where the gap between the two lines is clearly visible.

In order to formally verify this visual observation, we performed the paired t-test. Table 2 presents the paired t-test results and shows that for the MovieLens data, in 448 (out of 450) cases the error rates of the TC models are significantly smaller than the EI error rates at the 95% confidence level. Likewise, in
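The TC loop just described can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's Weka pipeline: GaussianMixture approximates the EM step, Ridge regression stands in for the per-group predictive models, and each cluster is assumed to pool enough ratings for 10-fold cross validation.

```python
# Sketch of Total Clustering (TC): EM-cluster all items, then build and
# cross-validate one rating model per cluster. GaussianMixture and Ridge are
# illustrative stand-ins for the paper's EM step and per-group predictors.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def total_clustering_rmse(item_feats, ratings_by_item, k=10):
    """item_feats: dict item -> feature vector;
    ratings_by_item: dict item -> list of (x_vector, rating) pairs."""
    items = list(item_feats)
    labels = GaussianMixture(n_components=k, random_state=0).fit_predict(
        np.asarray([item_feats[i] for i in items]))
    rmse_per_cluster = {}
    for c in range(k):
        # Pool the ratings of all items assigned to cluster c.
        rows = [xy for i, lab in zip(items, labels) if lab == c
                for xy in ratings_by_item[i]]
        X = np.asarray([x for x, _ in rows])
        y = np.asarray([r for _, r in rows])
        scores = cross_val_score(Ridge(), X, y, cv=10,
                                 scoring="neg_root_mean_squared_error")
        rmse_per_cluster[c] = -scores.mean()   # 10-fold cross-validated RMSE
    return rmse_per_cluster
```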


440 (out of 450) cases the error rates of the TC models are significantly smaller than for the EI method on the BookCrossing data. Thus, we conclude that the TC method outperforms the EI method.

Table 2. Paired t-test results comparing the Total Clustering (TC) vs. Each Item (EI) methods.

We next studied the computational performance of the TC method to see how scalable it is to large recommendation problems. To address this issue, we ran all the 9 data mining methods described in Section 2.2 on the MovieLens dataset and clustered all the items in I. We ran this problem on a grid at the Stern School for 8 days, and it completed only 50% of the job (our job had to be terminated for technical reasons). This example clearly demonstrates that, although superior to EI, the TC method is very slow and does not scale well to large recommendation problems.

To address the Long Tail Recommendation Problem while achieving reasonable performance results, in Section 3 we present a solution that partitions the itemset I into the head and the tail and does clustering only in the tail. We also demonstrate that this approach produces smaller error rates than TC in some cases.

3. PROPOSED SOLUTION
We next describe how we split the items in I into the head H and tail T, cluster the tail items T and build predictive models on the resulting groups. How to split the items into the head and the tail should be decided carefully, because it can result in different recommendation error rates. In Section 5.2, we examine good cutting strategies; however, in this section, we assume some arbitrary partitioning of the itemset I into the head and the tail.

Having split I into head H and tail T, we cluster the items only in tail T as follows. First, for each item, we compute several derived variables from the transactional variables for that item (for example, the item's average rating and popularity, as defined in Section 4).

4. EXPERIMENTAL SETUP
In this section, we explain the experimental settings used for validating the Clustered Tail (CT) method, including an overview of the data used, the selected variables, the data mining methods, the performance measurements and the statistical tests.

Data. We used two popular datasets in our study: MovieLens [5] and BookCrossing [6]. The MovieLens dataset contains 100,000 ratings on the scale of 1 to 5 from 943 customers on 1682 movies. The BookCrossing dataset contains 1,149,780 ratings on the scale of 1 to 10 from 278,858 customers on 271,379 books.

Variables. In order to predict unknown ratings, we used the following derived variables (DV) as independent variables in our data mining models. The customer-related derived variables DV are:
1. c_aver_rating: The average rating that the customer gave to the items that he or she rated before.
2. c_quantity: The number of items the customer rated before.
3. c_seen_popularity: The average popularity of the items that the customer rated before.
4. c_seen_rating: The average rating of the items rated by the customer, each rating being an average of all the ratings provided by all the customers for that item.


5. c_like_popularity: The average popularity of the items rated higher than the customer's average rating.
6. c_like_rating: The average rating of the items rated higher than the customer's average rating, each rating being an average of all the ratings provided by all the customers for that item.
7. c_dislike_popularity: The average popularity of the items rated lower than the customer's average rating.
8. c_dislike_rating: The average rating of the items rated lower than the customer's average rating, each rating being an average of all the ratings provided by all the customers for that item.

The item-related derived variables DV are:
9. I_aver_rating: The average rating of the item.
10. I_popularity: The popularity of the item.
11. I_likability: The difference between the rating of the customer and their average rating.

Head/Tail Partitioning. We partitioned the total itemset I into the head (H) and tail (T) parts at several places in our study. In particular, we selected the head/tail cutting points at the levels α = 30, 50, 70, 90 and 110, where level α means that the items with frequency > α belong to the head H and the items with frequency < α belong to the tail T of the item distribution.

Clustering. After partitioning the items into the head and tail, as described above, we cluster the items only in tail T as follows. We first select the variables I_aver_rating (9), I_popularity (10) and I_likability (11) from the list of derived variables, then we map each item from tail T into the 3-dimensional Euclidean space formed by these three variables, and finally apply the EM clustering method [7] to the set of the resulting points. The numbers of clusters in tail T used in the EM method in our studies are 1, 10, 20, 30, 40 and 50. Note that if the number of clusters is 1, we actually do not do any clustering in the tail.

Data Mining Models. We build the models estimating ratings for the items in each cluster in T and also for each individual item in H. In particular, we use Weka [7] and deploy the following data mining models from Weka, already described in Section 2.2: 1) SimpleLinearRegression (SLR), 2) Gaussian radial basis function network (RBF), 3) Support vector machines (SVM), 4) K-nearest neighbours (KNN), 5) Locally-weighted learning (LWL), 6) Bagging classifier (Bagging), 7) DecisionStump tree (DecisionStump), 8) M5P model tree (M5P) and 9) 0-R Rule Induction classifier (ZeroR).

For each of these models, the dependent variable is Rating, and the independent variables are selected as follows. We use the following five sets of the derived variables (DV) (1) through (11) described above as independent variables in these data mining models (the numbers refer to the variable numbers (1) through (11) as specified in the Variables part of this section):
1) Used the whole DV
4) Used DV – 1, 7, 8, 9, 10, 11
5) Used DV – 3, 4, 5, 6, 9, 10, 11

Performance Measurements. We use the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) as performance measures. After building a model, we predict the unknown ratings on the holdout sample and calculate the error rates as:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n}} \quad (1) \qquad\qquad \mathrm{MAE} = \frac{\sum_{i=1}^{n} |e_i|}{n} \quad (2)$$

In this section we described the experimental settings. In the next section we present the results of our experiments.

5. EXPERIMENTAL RESULTS
In this section we present the results of the experiments described in Section 4. In Section 5.1, we focus on comparing the Clustered Tail (CT) and the Each Item (EI) methods and demonstrate that CT outperforms EI. In Section 5.2, we focus on finding the most appropriate head/tail cutting point and determining the appropriate clustering number for the CT method.

5.1 Comparing the Clustered Tail and the Each Item Methods
To compare the Clustered Tail (CT) and the Each Item (EI) methods, we performed paired t-tests across the various experimental settings described in Section 4. In particular, we perform these tests across five sets of variables, five cutting points, six clustering settings, nine data mining methods and two measurements. This results in 2700 comparisons of the CT and EI methods for each dataset.

For each of the 2700 cases, we compute the error rates of the particular predictive model for each item in the itemset I for the CT and the EI methods. To check for statistically significant differences between them, we performed paired t-tests, but only for the items in the tail T of the distribution. In other words, if E = (e1, …, en) and E' = (e'1, …, e'n) are the error rates for the items from the tail T for the methods CT and EI respectively, then we perform the paired t-test on the sets E and E' to detect statistically significant differences between the two. The reason for considering the errors only in the tail T is that the errors in the head H are the same for the two methods CT and EI, and there is no point in including them in the test.

The t-test comparison results are presented in Table 3. Table 3 shows that for the MovieLens dataset the CT model outperforms the EI model in 2464 cases (out of the total of 2700 comparisons) at the 95% confidence level. Similarly, for the BookCrossing dataset, the CT model outperforms the EI model in 2525 cases.

Table 3. Paired t-test results (statistically significant at the 95% confidence level)

Table 3 provides only overall comparison information across the CT and the EI methods, without showing any specifics of the individual comparisons. Since there are 2700 of them in total, it is impossible to present the specifics of all of them. Therefore, we decided to examine the performance differences between the CT and the EI models for some selected individual settings (out of the total of 2700).
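Equations (1) and (2), together with the tail-only paired t-test used in Section 5.1, translate directly into code. A sketch with placeholder error vectors follows; scipy's ttest_rel is one way to run the paired test, not necessarily the tool used in the study.

```python
# Sketch: RMSE/MAE per equations (1)-(2), and the paired t-test on the
# tail-item error vectors E (CT) and E' (EI). All values are placeholders.
import numpy as np
from scipy.stats import ttest_rel

def rmse(errors):
    e = np.asarray(errors)
    return np.sqrt(np.mean(e ** 2))        # equation (1)

def mae(errors):
    return np.mean(np.abs(errors))         # equation (2)

# Per-item error rates in the tail T under each method (placeholder values).
E_ct = np.array([0.91, 0.95, 1.02, 0.88, 0.97])
E_ei = np.array([1.05, 1.10, 1.21, 0.95, 1.14])

t_stat, p_value = ttest_rel(E_ct, E_ei)
if p_value < 0.05 and t_stat < 0:          # lower error is better
    print(f"CT significantly outperforms EI (t={t_stat:.2f}, p={p_value:.4f})")
```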


As an example, Figure 6 shows the RMSE error rates for the CT and EI methods using the whole set of 11 derived variables as the independent variables and building the models using the DecisionStump data mining method for the MovieLens dataset; the head/tail cutting point is 110 and the number of clusters is 10 for the CT method presented in Figure 6. Similarly, Figure 7 shows the RMSE error rates for the CT and the EI methods using the whole set of derived variables as the independent variables and building the Bagging predictive model for the BookCrossing dataset; the head/tail cutting point is 90 and the number of clusters is 10 for the CT method presented in Figure 7.

The graphs in Figures 6 and 7 clearly show that the CT method outperforms the EI method in the tail T. These representative examples provide a typical picture of the paired comparisons across the 2700 experimental settings: the CT and the EI graphs are the same in the head and the initial parts of the tail, but the error rates diverge at the end of the tail, as Figures 6 and 7 show. This divergence accounts for the overwhelming dominance of the CT over the EI method that is evident from the t-test results reported in Table 3.

Figure 6. RMSE error rates of the Each Item and Clustered Tail (110-10) methods for the MovieLens dataset.

Figure 7. RMSE error rates of the Each Item and Clustered Tail (90-10) methods for the BookCrossing dataset.

We next present the magnitudes of the performance differences between the two methods. In particular, Figure 8 presents the histogram of the average improvement rate of the CT vs. the EI methods for the RMSE errors for the MovieLens dataset, taken across all the 1350 experimental settings described in Section 4, where the improvement rate is computed as

$$\text{Improvement rate (\%)} = \frac{\mathrm{RMSE}_{EI} - \mathrm{RMSE}_{CT}}{\mathrm{RMSE}_{EI}} \times 100$$

For example, as Figure 8 demonstrates, the CT method achieves an 8% improvement rate vs. the EI method in terms of the RMSE errors on the MovieLens data in 217 cases. Similarly, Figure 9 presents the histogram of the average improvement rate of the CT vs. the EI methods for the RMSE errors for the BookCrossing dataset. As Figures 8 and 9 demonstrate, the CT method significantly outperforms the EI method in most of the cases. Only in 49 cases (out of the total of 1350) are the differences between the CT and EI performances negative for the MovieLens dataset, and in 53 cases for the BookCrossing dataset. Also, the performance improvements go as high as 11.78% for the MovieLens and 72.45% for the BookCrossing datasets.

Figure 8. Histogram of the average RMSE improvement rate of the CT models vs. the EI model for the MovieLens dataset.

Figure 9. Histogram of the average RMSE improvement rate of the CT models vs. the EI model for the BookCrossing dataset (the rightmost bar stands for > 29).

All these statistical and visual comparisons of performance results across extensive experimental conditions clearly show that the CT method produces significant performance improvements vs. the EI method. All this means that clustering the items in the Long Tail T indeed produces better recommendations. In the next subsection, we examine where to cut the itemset I into the head H and tail T parts and how to cluster the items in tail T.

5.2 Finding the Right Cutting Points
As was observed earlier in the paper, the error rates in the tail T depend on where we cut the itemset into the head and tail. In this section we empirically study where the best cutting points are and whether it makes sense to partition the items into the head H and tail T in the first place.

To study this problem, we consider the following five cutting points: i = 30, 50, 70, 90 and 110. A cutting point i means that the items with a ratings frequency of more than i belong to the head H and the items with a frequency of less than i to the tail T. In addition, we also consider the special case of a vacuous head H, where all the items appear only in tail T (and nothing in the head). Note that this is the Total Clustering (TC) case described and discussed in Section
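The improvement rate above is a one-line computation; for example (with illustrative, not reported, RMSE values):

```python
# Sketch: improvement rate of CT over EI, per the formula above.
def improvement_rate(rmse_ei, rmse_ct):
    """Percentage improvement of CT over EI; positive means CT is better."""
    return (rmse_ei - rmse_ct) / rmse_ei * 100.0

print(f"{improvement_rate(1.10, 1.012):.1f}%")   # -> 8.0% (placeholder values)
```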


2.3. As explained in Section 4, in addition to these cutting points, we also consider different numbers of clusters in tail T, ranging from 1 to 50 in increments of 10 (i.e., 6 different numbers of clusters: k = 1, 10, 20, 30, 40 and 50). However, we do not deal with the clustering number k = 1 for the Total Clustering (TC) case (i.e., placing the whole dataset into one big cluster and having no items in head H), because it turns out that this particular case is computationally very expensive. This means that we deal with 35 cases in total in this section (6 cutting points i and 6 clustering numbers k, minus the special case of k = 1 for the Total Clustering case, as just described).

We next compute the error rates for each of the 35 cut/cluster combinations described above, and we do it across the various experimental settings described in Section 4 (45 such settings for each error rate (RMSE and MAE) and each dataset (MovieLens and BookCrossing) – 180 settings in total). For instance, two examples of such cut/cluster combination graphs are presented in Figure 10 for the MovieLens and two in Figure 11 for the BookCrossing cases. These four examples demonstrate that the error rates are sensitive to the right mixture of the cut/cluster combinations in certain experimental settings. For example, in Figure 10(a), the combination of cutting point 110 and cluster number 1 (110-1) produces the minimum average error rate RMSE = 0.9295 for the MovieLens dataset; in comparison, the worst cut/cluster combination of 30-50 produces RMSE = 0.967, which constitutes a 3.75% performance improvement. Similar observations apply to the other three graphs presented in Figures 10(b) and 11(a, b).

Figure 10. Average RMSE according to the cutting point and the clustering number for the MovieLens dataset.

Figure 11. Average RMSE according to the cutting point and the clustering number for the BookCrossing dataset.

Another important question is to determine for which values of i and k the error rates reach their minimal levels (such as i = 110 and k = 1 in Figure 10(a)). We did this analysis across all the 90 experimental conditions for each of the two datasets (while only 4 of them are presented in Figures 10(a, b) and 11(a, b)).

The results of this analysis are presented with the white bars (representing the best model case) in Figure 12 for the MovieLens and in Figure 13 for the BookCrossing datasets. Both Figures 12 and 13 show that in a significant number of cases, the TC solution constitutes the best-case scenario producing the minimal error rates. For example, Figure 11(a) clearly demonstrates this point, since the minimal error rate is achieved for the Total_50 case.

However, if we examine Figure 10(b), we can see that the difference between the smallest error rate, achieved for Total_20, and the second-best point of 110-30 is insignificant (1.0218 vs. 1.0227 in this case). This means that, even though Total Clustering (TC) achieves the best error rates here, in practice it may not be the best choice, since it is usually computationally very expensive while achieving only insignificant improvements in error rates. Therefore, it may be better to replace such a TC model with the second-best, but much cheaper, CT model. We next compare the performances of the best vs. the second-best models by applying paired t-tests to the overall performances of the corresponding models. The practically best-performing model (the Practical solution) is then selected as follows: select the best-performing TC model if the second-best one is significantly worse at the 95% confidence level; alternatively, if the performance differences are statistically insignificant, select the second-best CT model. The histograms of the Practical solutions are shown with black bars in Figures 12 and 13 for the MovieLens and BookCrossing datasets respectively.

As Figures 12 and 13 demonstrate, there are significantly fewer best-performing Practical models for the Total Clustering (TC) case. In fact, most of the practically best-performing models fall within the middle region in both figures (Figures 12 and 13).
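This best-vs-second-best selection rule can be sketched as follows; the choice of significance test and all names are illustrative, not the authors' code.

```python
# Sketch of the Practical-solution rule: keep the best (TC) model only if it is
# significantly better than the second-best (CT) model; otherwise prefer the
# much cheaper CT model. Error vectors are placeholders for experimental output.
from scipy.stats import ttest_rel

def practical_solution(best_errors, second_best_errors, alpha=0.05):
    t_stat, p_value = ttest_rel(best_errors, second_best_errors)
    significantly_better = p_value < alpha and t_stat < 0   # lower error is better
    return "best (TC)" if significantly_better else "second-best (CT)"
```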


This result demonstrates that partitioning the itemset I into the head and tail and clustering the items in the tail is often the best solution in practice, dominating the Total Clustering.

Figure 12. Histogram of the best model for the MovieLens dataset.

Figure 13. Histogram of the best model for the BookCrossing dataset.

6. CONCLUSION
In this paper, we identified the Long Tail Recommendation Problem (LTRP), responsible for the increased error rates of the items in the Long Tail of the item distribution. We showed that a simple item grouping method, although effective at reducing these rates, is not practical since it does not scale well to large problems.

Therefore, we proposed to partition the itemset I into the head H and the tail T parts and to cluster the items in the tail T, while keeping the basic Each Item (EI) approach for the head items H. We showed that this solves the LTRP problem in the sense that the error rates of the items in tail T are significantly lower than for the basic EI method. Moreover, we also showed that this approach of head/tail partitioning of the itemset I and grouping the items in T is more scalable than grouping the whole itemset I.

We also studied the problem of selecting good head/tail cutting points α and identifying the right numbers of item clusters in the tail. We showed that in some cases grouping of the whole itemset I (no head/tail partitioning) results in the best performance. However, in many cases in our study, it turned out that the best cutting point α lies somewhere in the middle of the itemset distribution histogram, and therefore the TC method does not always constitute the best approach. This also means that a good cutting point α and the right number of tail clusters need to be selected carefully, since these parameters affect the performance of recommender systems significantly, and the good choices of their values depend on various factors that vary across different datasets and recommendation approaches.

In summary, the contributions of this work lie in showing that a) the item-based Long Tail of the ratings distribution does matter; b) the items in the Long Tail can be used productively by clustering them into various groups; c) the practically best head/tail cutting points often lie in the middle of the range, as Figure 13 shows; and in empirically finding such cutting points.

In the future, we plan to address the scalability issue, since some of our experiments took a long time to run, especially when the head/tail cutting point α was skewed more towards the head and the number of clusters in the tail was small. We would also like to develop incremental algorithms for determining optimal splitting points when new ratings and other data about items and users are added or changed dynamically. We would also like to combine our CT method with other recommendation approaches, such as collaborative filtering, and see if this combined method improves performance even further. Finally, we studied the binary splitting of the itemset into the head and tail; in the future, we would like to consider multiple (non-binary) partitionings of the item base, each partition having its own grouping methods.

7. REFERENCES
[1] Schein, A., Popescul, A., Ungar, L. and Pennock, D. 2002. Methods and Metrics for Cold-Start Recommendations. Proc. of the 25th ACM SIGIR Conference.
[2] Anderson, C. 2006. The Long Tail. Hyperion Press.
[3] Fleder, D.M. and Hosanagar, K. 2008. Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. NET Institute Working Paper No. 07-10.
[4] Hervas-Drane, A. 2007. Word of Mouth and Recommender Systems: A Theory of the Long Tail. NET Institute Working Paper No. 07-41, November 2007.
[5] http://movielens.umn.edu.
[6] http://www.bookcrossing.com.
[7] Witten, I.H. and Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.
[8] Truong, K.Q., Ishikawa, F. and Honiden, S. 2007. Improving Accuracy of Recommender System by Item Clustering. IEICE Transactions on Information and Systems, E90-D(9).
[9] Ungar, L.H. and Foster, D.P. 1998. Clustering Methods for Collaborative Filtering. Proceedings of the Workshop on Recommendation Systems. AAAI Press.
