Recommender Systems for Large-scale E-Commerce: Scalable Neighborhood Formation Using Clustering

Badrul M. Sarwar†∗, George Karypis‡, Joseph Konstan†, and John Riedl†
{sarwar, karypis, konstan, riedl}@cs.umn.edu
†GroupLens Research Group / ‡Army HPC Research Center
Department of Computer Science and Engineering
University of Minnesota, Minneapolis, MN 55455, USA

Abstract

Recommender systems apply knowledge discovery techniques to the problem of making personalized product recommendations during a live customer interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success in E-commerce nowadays. The tremendous growth of customers and products in recent years poses some key challenges for recommender systems: producing high quality recommendations, and performing many recommendations per second for millions of customers and products. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. We address the performance issues by scaling up the neighborhood formation process through the use of clustering techniques.

1 Introduction

The largest E-commerce sites offer millions of products for sale. Choosing among so many options is challenging for consumers. Recommender systems have emerged in response to this problem. A recommender system for an E-commerce site recommends products that are likely to fit a consumer's needs. Today, recommender systems are deployed on hundreds of different sites, serving millions of consumers. One of the earliest and most successful recommender technologies is collaborative filtering [5, 8, 9, 13]. Collaborative filtering (CF) works by building a database of preferences for products by consumers. A new consumer, Neo, is matched against the database to discover neighbors, which are other consumers who have historically had similar taste to Neo. Products that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.

The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern E-commerce systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual consumers for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of product preference, it may have thousands of data points for its most valuable customers. These "long customer rows" slow down the number of neighbors that can be searched per second, further reducing scalability. The second challenge is to improve the quality of the recommendations for the consumers. Consumers need recommendations they can trust to help them find products they will like. If a consumer trusts a recommender system, purchases a product, and then finds out he does not like the product, the consumer will be unlikely to use the recommender system again. In some ways these two challenges are in conflict: the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously, so that the solutions discovered are both useful and practical.

The focus of this paper is two-fold. First, we introduce the basic concepts of a collaborative filtering based recommender system and discuss its various limitations. Second, we present a clustering-based algorithm that is suited to the large data sets common in E-commerce applications of recommender systems. This algorithm has characteristics that make it likely to be faster in online performance than many previously studied algorithms, and we investigate how the quality of its recommendations compares to other algorithms under different practical circumstances.

The rest of the paper is organized as follows. The next section provides a brief overview of collaborative filtering based recommender systems and discusses

∗Currently with the Computer Science Department, San Jose State University, San Jose, CA 95112, USA. Email: [email protected], Phone: +1 408-245-8202

[Figure 1 here: an input ratings table over customers C1 ... Cm, together with the active customer Ca, feeds the CF algorithm; the output interface delivers a prediction Ra,j (on product pj for the active customer) and a Top-N list of products {Tp1, Tp2, ..., TpN} for the active customer.]

Figure 1: The collaborative filtering process.

some of its limitations. Section 3 describes the details of applying a clustering-based approach to address these limitations. Section 4 describes our experimental framework, experimental results, and discussion. The final section provides some concluding remarks and directions for future research.

2 Collaborative Filtering-based Recommender Systems

Collaborative filtering (CF) [8, 9, 13] is the most successful recommender system technology to date, and is used in many of the most successful recommender systems on the Web. CF systems recommend products to a target customer based on the opinions of other customers. These systems employ statistical techniques to find a set of customers known as neighbors, who have a history of agreeing with the target user (i.e., they either rate different products similarly or they tend to buy similar sets of products). Once a neighborhood of users is formed, these systems use several algorithms to produce recommendations.

In a typical E-commerce scenario, there is a list of m customers C = {c1, c2, ..., cm} and a list of n products P = {p1, p2, ..., pn}. Each customer ci expresses his/her opinions about a list of products. This set of opinions is called the "ratings" of customer ci and is denoted by Pci. There exists a distinguished customer ca ∈ C, called the active customer, for whom the task of a collaborative filtering algorithm is to find a product suggestion.

Most collaborative filtering based recommender systems build a neighborhood of like-minded customers. The neighborhood formation scheme usually uses Pearson correlation or cosine similarity as a measure of proximity [13]. The neighborhood formation process is in fact the model-building or learning process for a recommender system algorithm. The main goal of neighborhood formation is to find, for each customer C, an ordered list of k customers N = {N1, N2, ..., Nk} such that C ∉ N, sim(C, N1) is maximum, sim(C, N2) is the next maximum, and so on, where sim(C, Ni) indicates the similarity between two customers. This similarity is most often computed by finding the Pearson-r correlation between the customers C and Ni.

Once these systems determine the proximity neighborhood, they produce recommendations that can be of two types:

• Prediction is a numerical value, Ra,j, expressing the predicted opinion score of product pj for the active customer ca. This predicted value is within the same scale (e.g., from 1 to 5) as the opinion values provided by ca.

• Recommendation is a list of N products, TPr = {Tp1, Tp2, ..., TpN}, that the active user will like the most. The recommended list usually consists of products not already purchased by the active customer. This output interface of CF algorithms is also known as Top-N recommendation.

Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire m × n customer-product data as a ratings matrix, A. Each entry Ai,j in A represents the preference score (rating) of the ith customer on the jth product. Each individual rating is within a numerical scale, and it can as well be 0, indicating that the customer has not yet rated that product.

These systems have been successful in several domains, but the algorithm is reported to have shown some limitations, such as:

• Sparsity. Nearest neighbor algorithms rely upon exact matches, which causes the algorithms to sacrifice recommender system coverage and accuracy [8, 11]. In particular, since the correlation coefficient is only defined between customers who have rated at least two products in common, many pairs of customers have no correlation at all [1]. Accordingly, Pearson nearest neighbor algorithms may be unable to make many product recommendations for a particular user. This problem is known as reduced coverage, and is due to sparse ratings of neighbors.

• Scalability. Nearest neighbor algorithms require computation that grows with both the number of

[Figure 2 here: the complete dataset is partitioned by the clustering algorithm based on user-user similarity; the cluster containing the active user is then used as the neighborhood.]

Figure 2: Neighborhood formation from clustered partitions.
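The Pearson-r neighborhood formation described in Section 2 can be sketched as follows. This is an illustrative Python sketch with made-up data, not the authors' implementation; note that the correlation is computed only over co-rated products, which is why sparse data leaves many customer pairs with no defined similarity.

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson-r correlation between two customers' rating dicts
    (product -> rating), computed over co-rated products only."""
    common = set(ra) & set(rb)
    if len(common) < 2:          # correlation undefined: too few co-ratings
        return None
    ma = sum(ra[p] for p in common) / len(common)
    mb = sum(rb[p] for p in common) / len(common)
    num = sum((ra[p] - ma) * (rb[p] - mb) for p in common)
    den = sqrt(sum((ra[p] - ma) ** 2 for p in common)
               * sum((rb[p] - mb) ** 2 for p in common))
    return num / den if den else 0.0

# Neighborhood formation: rank the other customers by similarity to c.
c = {"p1": 4, "p2": 2, "p3": 5}
others = {
    "n1": {"p1": 5, "p2": 1, "p3": 4},   # agrees with c
    "n2": {"p1": 1, "p2": 5},            # disagrees with c
    "n3": {"p9": 3},                     # no co-rated products
}
sims = {u: pearson(c, r) for u, r in others.items()}
neighbors = sorted((u for u, s in sims.items() if s is not None),
                   key=lambda u: -sims[u])
print(neighbors)  # most similar first
```

Customers with fewer than two co-rated products (like `n3` here) get no correlation at all, which illustrates the reduced-coverage problem discussed in the Sparsity limitation above.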

customers and the number of products. With millions of customers and products, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.

The weakness of the Pearson nearest neighbor approach for large, sparse databases led us to explore alternative recommender system algorithms. Our first approach attempted to bridge the sparsity by incorporating semi-intelligent filtering agents into the system [11]. We addressed the scalability challenge in an earlier work [12], where we showed that forming neighborhoods in the low-dimensional eigen-space provided better quality and performance. Here we present a different dimensionality reduction approach: first clustering the data set, and then forming neighborhoods from the partitions. The application of clustering techniques reduces the sparsity and improves the scalability of recommender systems, since clustering of users can effectively partition the ratings database. Earlier studies [2, 8, 15] indicate the benefits of applying clustering in recommender systems. We outline our research approach in the next section.

3 Scalable Neighborhood Using Clustering

Clustering techniques work by identifying groups of users who appear to have similar preferences. Once the clusters are created, predictions for an individual can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters; the prediction is then an average across the clusters, weighted by degree of participation. Clustering techniques usually produce less personal recommendations than other methods, and most often lead to worse accuracy than nearest neighbor algorithms [3]. Once the clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller.

3.1 Scalable Neighborhood Algorithm

The idea is to partition the users of a collaborative filtering system using a clustering algorithm and to use the partitions as neighborhoods. Figure 2 explains this idea. A collaborative filtering algorithm using this idea first applies a clustering algorithm to the user-item ratings database A to divide it into p partitions. The clustering algorithm may generate fixed-size partitions, or, based on some similarity threshold, it may generate a requested number of partitions of varying size. In the next step, the neighborhood for the active customer ca is selected by looking into the partition to which he/she belongs. The entire partition Ai is then used as the neighborhood for that active customer ca. The prediction is generated using the basic collaborative filtering technique. We now present the algorithm formally.

Algorithm: Clustered neighborhood formation

1. Apply the clustering algorithm to produce p partitions of users from the training data set. Formally, the data set A is partitioned into A1, A2, ..., Ap, where Ai ∩ Aj = ∅ for 1 ≤ i, j ≤ p, i ≠ j, and A1 ∪ A2 ∪ ... ∪ Ap = A.

2. Determine the neighborhood for a given user u. If u ∈ Ai, then the entire partition Ai is used as the neighborhood.

3. Once the neighborhood is obtained, the classical collaborative filtering algorithm is used to generate a prediction from it. In particular, the prediction score Ra,j for a customer ca on product pj is computed by using the following formula [9]:

    R_{a,j} = P̄_{C_a} + ( Σ_{i rates j} (P_{C_i,j} − P̄_{C_i}) · r_{a,i} ) / ( Σ_i |r_{a,i}| )    (1)


Figure 3: Quality of prediction using clustered-neighborhood vs. classical CF-neighborhood approach.
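The three algorithm steps and prediction formula (1) can be sketched as follows. This is an illustrative Python sketch with hypothetical toy data, not the authors' C implementation: the hard-coded two-partition split stands in for a real clustering algorithm such as bisecting K-means, and the precomputed correlations in `sims` are assumed.

```python
def predict(active, neighbors, ratings, sims, product):
    """Formula (1): the active user's mean rating plus the mean-offset
    deviations of neighbors who rated `product`, weighted by r_{a,i}."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    num = den = 0.0
    for u in neighbors:
        if u == active or product not in ratings[u]:
            continue
        r_ai = sims[(active, u)]
        num += (ratings[u][product] - means[u]) * r_ai
        den += abs(r_ai)
    return means[active] + (num / den if den else 0.0)

# Step 1: partition the users (a trivial stand-in for bisecting K-means).
partitions = [{"u1", "u2"}, {"u3", "u4"}]

# Step 2: the active user's whole partition is the neighborhood.
active = "u1"
neighborhood = next(p for p in partitions if active in p)

# Step 3: classical CF prediction within that neighborhood.
ratings = {
    "u1": {"p1": 4.0, "p2": 2.0},
    "u2": {"p1": 5.0, "p2": 1.0, "p3": 4.0},
}
sims = {("u1", "u2"): 1.0}   # hypothetical precomputed correlations
print(predict(active, neighborhood, ratings, sims, "p3"))
```

Because the neighborhood is a precomputed partition, step 2 is a constant-time lookup rather than a scan over all users, which is the source of the throughput gains reported in Section 4.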

where r_{a,i} denotes the correlation between the active user Ca and its neighbors Ci who have rated the product Pj, P̄_{C_a} denotes the average rating of customer Ca, and P_{C_i,j} denotes the rating given by customer Ci on product Pj.

This method has two benefits: first, it reduces the sparsity of the data set, and second, due to the dimensionality reduction and the use of a static, pre-computed neighborhood, the prediction generation is much faster.

4 Experimental Evaluation

In this section we present a brief discussion of our experimental data set, evaluation metric, and experimental platform, followed by the experimental results and discussion.

4.1 Data Sets

We used data from our recommender system MovieLens (www.movielens.umn.edu), which is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies. The site now has over 50,000 users who have expressed opinions on 3,000+ different movies. We randomly selected enough users to obtain 100,000 ratings from the database (we only considered users that had rated 20 or more movies). We divided the database into an 80% training set and a 20% test set. The data set was converted into a user-movie matrix R that had 943 rows (users) and 1682 columns (movies that were rated by at least one of the users).

4.2 Evaluation Metrics

Recommender systems researchers use a number of different measures for evaluating the success of the recommendation or prediction algorithms [13, 11]. For our experiments, we use a widely popular statistical accuracy metric named Mean Absolute Error (MAE), which is a measure of the deviation of recommendations from their true user-specified values. For each ratings-prediction pair ⟨p_i, q_i⟩, this metric treats the absolute error between them, i.e., |p_i − q_i|, equally. The MAE is computed by first summing these absolute errors over the N corresponding ratings-prediction pairs and then computing the average. Formally,

    MAE = ( Σ_{i=1}^{N} |p_i − q_i| ) / N.

The lower the MAE, the more accurately the recommendation engine predicts user ratings. Our choice of MAE as the primary accuracy metric is due to the fact that it matches the goals of our experiments most closely. MAE is the most commonly used metric and the easiest to interpret directly, and there is a vast research literature on performing statistical significance testing and computing confidence intervals for MAE. Furthermore, researchers [5] in the related field have also suggested the use of MAE as a prediction evaluation metric.

4.3 Experimental Procedure

Benchmark CF system. To compare the performance of clustering-based prediction, we also entered the training ratings set into a collaborative filtering recommendation engine that employs the Pearson nearest neighbor algorithm. We tuned the algorithm to match the best published Pearson nearest neighbor configuration and to deliver the highest quality prediction without concern for performance (i.e., it considered every possible neighbor to form optimal neighborhoods).

Experimental platform. All our experiments were implemented in C and compiled with the optimization flag -O6. We ran all our experiments on a Linux-based workstation with dual 600 MHz Intel Pentium III processors and 2 GB of RAM.

Experimental steps. To experimentally evaluate the effectiveness of clustering, we use a variant of the K-


Figure 4: Throughput of clustered-neighborhood vs. classical CF-neighborhood approach.

means [7] clustering algorithm, called the bisecting K-means clustering algorithm. This algorithm is fast and tends to produce clusters of relatively uniform size, which results in good cluster quality [14]. We divide the movie data set into an 80% training and 20% test portion. For the purpose of comparison, we perform the same experiments using our benchmark CF-based recommender system, with the same train/test ratio x and number of neighbors. In the case of the clustering approach, the number of neighbors is not always fixed; one cluster may have 30 users, another may have 55 users, and so on. To make our comparisons fair, we recorded the number of neighbors used for prediction computation for each user, and forced our basic CF algorithm to use the same number of neighbors for prediction generation. We evaluated the results using the MAE metric and also noted the elapsed run time in seconds. We conducted a 10-fold cross validation of our experiments by randomly choosing different training and test sets each time and taking the average of the MAE and run time values.

4.4 Results and Discussion

Figure 3 presents the prediction quality results of our experiments for the clustering as well as the basic CF technique. In this chart, prediction quality is plotted as a function of the number of clusters. We make two important observations from this chart. First, the prediction quality is worse in the case of the clustering algorithm, but the difference is small. For instance, using 10 clusters, the clustered approach yields an MAE of 0.7665 and the corresponding CF algorithm yields an MAE of 0.7455. It can also be observed from the chart that as we increase the number of clusters, the quality tends to be inferior (increased MAE). In the case of clustering this is expected, since with a fixed number of users, increasing the number of clusters means smaller cluster sizes, and hence fewer neighbors to have opinions about items. The same trend is observed in the case of basic CF, as we force it to use the same number of neighbors as the clustered approach.

Figure 4 presents the performance results of our experiments for both techniques. We plot the throughput as a function of the number of clusters. We define the throughput of a recommender system as the number of recommendations generated per second. From the plot we see that with the clustered approach the throughput is substantially higher than with the basic CF approach. The reason is that with the clustered approach the prediction algorithm has to use only a fraction of the neighbors. The throughput increases rapidly with the number of clusters (small cluster sizes). Since the basic CF method has to scan through all the neighbors, the number of clusters does not impact its throughput.

5 Conclusion and Future Work

Recommender systems are a powerful new technology for extracting additional value for a business from its customer databases. These systems help customers find products they want to buy from a business. Recommender systems benefit customers by enabling them to find products they like, and conversely, they help the business by generating more sales. Recommender systems are rapidly becoming a crucial tool in E-commerce on the Web. They are being stressed by the huge volume of customer data in existing corporate databases, and will be stressed even more by the increasing volume of customer data available on the Web. New technologies are needed that can dramatically improve the scalability of recommender systems.

In this paper, we presented and experimentally evaluated a new approach to improving the scalability of recommender systems by using clustering techniques. Our experiments suggest that clustering-based neighborhoods provide comparable prediction quality to the basic CF approach, while at the same time significantly improving the online performance. We demonstrated the effectiveness of one particular clustering algorithm (the bisecting K-means algorithm). In the future, better clustering algorithms as well as better prediction generation schemes can be used to improve the prediction quality. Clustering techniques can also be applied as a "first step" for shrinking the candidate set in a nearest neighbor algorithm, or for distributing nearest-neighbor computation across several recommender engines. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput.

6 Acknowledgments

Funding for this research was provided in part by the National Science Foundation under grants IIS 9613960, IIS 9734442, and IIS 9978717, with additional funding by Net Perceptions Inc. This work was also supported by NSF CCR-9972519, by Army Research Office contract DA/DAAG55-98-1-0441, by the DOE ASCI program, and by Army High Performance Computing Research Center contract number DAAH04-95-C-0008. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute. We also thank the anonymous reviewers for their valuable comments.

References

[1] Billsus, D., and Pazzani, M. J. (1998). Learning Collaborative Information Filters. In Proceedings of ICML '98, pp. 46-53.

[2] Borchers, A., Leppik, D., Konstan, J., and Riedl, J. (1998). Partitioning in Recommender Systems. Technical Report CS-TR-98-023, Computer Science Dept., University of Minnesota.

[3] Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.

[4] Goldberg, D., Nichols, D., Oki, B. M., and Terry, D. (1992). Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, December.

[5] Herlocker, J., Konstan, J., Borchers, A., and Riedl, J. (1999). An Algorithmic Framework for Performing Collaborative Filtering. In Proceedings of ACM SIGIR '99. ACM Press.

[6] Hill, W., Stead, L., Rosenstein, M., and Furnas, G. (1995). Recommending and Evaluating Choices in a Virtual Community of Use. In Proceedings of CHI '95.

[7] Jain, A. K., and Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall Publishers, Englewood Cliffs, NJ.

[8] Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J. (1997). GroupLens: Applying Collaborative Filtering to Usenet News. Communications of the ACM, 40(3), pp. 77-87.

[9] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of CSCW '94, Chapel Hill, NC.

[10] Resnick, P., and Varian, H. R. (1997). Recommender Systems. Special issue of Communications of the ACM, 40(3).

[11] Sarwar, B. M., Konstan, J. A., Borchers, A., Herlocker, J., Miller, B., and Riedl, J. (1998). Using Filtering Agents to Improve Prediction Quality in the GroupLens Research Collaborative Filtering System. In Proceedings of CSCW '98, Seattle, WA.

[12] Sarwar, B. M., Karypis, G., Konstan, J. A., and Riedl, J. (2000). Analysis of Recommendation Algorithms for E-Commerce. In Proceedings of the ACM EC '00 Conference, Minneapolis, MN, pp. 158-167.

[13] Shardanand, U., and Maes, P. (1995). Social Information Filtering: Algorithms for Automating 'Word of Mouth'. In Proceedings of CHI '95, Denver, CO.

[14] Steinbach, M., Karypis, G., and Kumar, V. (2000). A Comparison of Document Clustering Techniques. In Workshop (ACM KDD '00).

[15] Ungar, L. H., and Foster, D. P. (1998). Clustering Methods for Collaborative Filtering. In Workshop on Recommender Systems at the 15th National Conference on Artificial Intelligence.