From: AAAI Technical Report WS-98-08. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Recommender Systems: A GroupLens Perspective

Joseph A. Konstan*†, John Riedl*†, Al Borchers,* and Jonathan L. Herlocker*

*GroupLens Research Project, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455
http://www.cs.umn.edu/Research/GroupLens/

†Net Perceptions, Inc., 11200 West 78th Street, Suite 300, Minneapolis, MN 55344
http://www.netperceptions.com/

ABSTRACT

In this paper, we review the history and research findings of the GroupLens Research project¹ and present the four broad research directions that we feel are most critical for recommender systems.

¹ GroupLens™ is a trademark of Net Perceptions, Inc., which develops and markets the GroupLens Recommendation Engine. Net Perceptions allows the University of Minnesota to use the name "GroupLens Research" for continuity. The ideas and opinions expressed in this paper are those of the authors and do not represent opinions of Net Perceptions, Inc.

INTRODUCTION: A History of the GroupLens Project

The GroupLens Research project began at the Computer Supported Cooperative Work (CSCW) Conference in 1992. One of the keynote speakers at the conference lectured on his vision of an emerging information economy, in which most of the effort in the economy would revolve around production, distribution, and consumption of information, rather than physical goods and services. Paul Resnick, then a student at MIT, and now a professor at the University of Michigan, and one of us (Riedl) were moved by the talk to consider the technical challenges that would have to be overcome to enable the information economy. We realized that as the amount of information increased enormously, while people's ability to process information remained stable, one of the critical challenges would be technology that would automate matching people with the information they would find most valuable.

There were two main thrusts of research activity in this area that we knew of: (1) Artificial Intelligence (AI) research to develop tools that would serve as a "knowledge robot", or knowbot, continually seeking out information, reading and understanding it, and returning with the information that the knowbot determined would be most valuable to its user. (2) Information Filtering (IF) research to develop even more efficient tools for selecting documents that contain keywords of interest to a user. These techniques were, and continue to be, fruitful, but we felt they each have one serious weakness. In the case of the knowbot, the weakness is that we are still a significant distance from technology that can understand articles in the way a human does. In the case of Information Filtering, the weakness is that identifying sets of articles by keyword does not scale to a situation in which there are thousands of articles that contain any imaginable set of keywords. Taken together, these two weaknesses represented an opportunity for a new type of filtering, that would focus on finding which available articles match human notions of quality and taste. Such a system would be able to produce a list of articles that each user would like, independent of their content.

We decided to apply our ideas in the domain of Usenet news. Usenet screams for better information filtering, with hundreds of thousands of articles posted daily. Many of the articles in each Usenet newsgroup are on the same topic, so syntactic techniques that identify topic are much less valuable in Usenet. Further, different people value very different sets of articles, with some people participating in long discussion threads that other people couldn't imagine even reading.

We developed a system that falls into the class that is now called automatic collaborative filtering. It collects ratings from people on articles, combines the ratings statistically, and produces recommendations for other people of how much they are likely to like each article.

We invited people to participate in using GroupLens from all over the Internet, and studied the effect of the system on users. Users resisted our early attempts to establish multi-dimensional rating schemes, including characteristics such as quality of the writing, and suitability of the topic for the newsgroup. Rating on multiple dimensions was too much work. We changed to single-dimension ratings, with the dimension being "What score would you have liked GroupLens to predict for you for this article?"

We found that users did change behavior in response to the recommendations, reading a much higher percentage of the articles that GroupLens predicted they would like than of either randomly selected articles, or articles GroupLens predicted they would not like. However, there were many articles for which GroupLens was unable to provide ratings, because even with two to three hundred users, there were simply too many articles in the six newsgroups we were studying. A greater density of ratings by article would have improved the usability of the system for most users. The low ratings density was compounded by the first rater problem, which is the problem that a pure collaborative filtering system cannot possibly make recommendations to the first person that reads each article. One effect of these two problems is that some beginning users of the system saw little value from GroupLens initially, and hence never developed the habit of contributing ratings, though they continued to use GroupLens-enabled news readers.

Because most users did not like most articles, and because GroupLens was effective at identifying articles users would like, users requested the ability to scan a newsgroup for the articles that were predicted to be of high interest to them. This led to our exploring a different style of interface to a collaborative filtering system, the TopN interface. Rather than predicting a score for each article, a TopN interface greedily seeks articles that are likely to have high scores for an individual user, and recommends those articles to that user. Eventually, such an interface might be able to present each of us with a list of the 20-30 most interesting articles for us from all of Usenet each morning.

Our key lesson learned was that a very high volume, low quality system like Usenet would require a very large number of users for collaborative filtering to be successful. For our research purposes, we needed a lower volume, higher density testbed. Our colleagues from Digital Equipment Corporation were closing down their research system on movie recommendations, and offered us the data to jump-start a similar system using GroupLens. We launched our system in the summer of 1997, and have been running it since at www.movielens.umn.edu. MovieLens is entirely web-based, and has several thousand regular users. Users rate movies, and MovieLens recommends other movies to them.

Over the past six years of research, we have learned that people are hungry for effective tools for information filtering, and that collaborative filtering is an exciting complement to existing filtering systems. Users value both the taste-based recommendations, and the sense of community they get by participating in a group filtering process. However, there are many open research problems still in collaborative filtering. Below we discuss our early results on some of these problems, and outline the remaining problems we feel to be most important to the [...]

[...] may yield more accurate recommendations. Even if the increased accuracy is offset by the smaller number of items available to establish user correlations, partitioning may be valuable because it can help scale the performance of the system; each partition can be run in parallel on a separate server.

To explore the potential of item partitioning, we considered three partitioning strategies for MovieLens: random partitions, partitions by movie genre, and partitions generated algorithmically by clustering based on ratings. Clustering-based partitions produced a slight loss in prediction accuracy as partitions grew smaller, but showed promise for a reasonable trade-off between performance and accuracy. Movie genre partitions yielded less accurate recommendations than cluster-based ones (though some genres were much more accurate, and others much less so). Random partitions were slightly worse still. The value of item partitions clearly depends on the domain of the recommendation system and the density of ratings within and across potential partitions (our earlier Usenet work found that mixing widely different newsgroups together reduced accuracy). One advantage of the clustering result is that it may be more broadly applicable in domains where items lack obvious attributes for partitioning.

We also looked at the value of user partitioning, starting with the extreme case of pre-computed symmetric neighborhoods based on our clustering algorithm; these were small partitions of about 200 users. If symmetric neighborhoods yield good results, time per recommendation can be reduced dramatically, since substantial per-neighborhood computation can be performed incrementally and amortized across the neighbors. We found that the accuracy of recommendations was almost as good as using the full data set, but that the coverage (i.e., the number of movies for which we could compute a recommendation) fell by 14%. To restore coverage we introduced a two-level hierarchy of users. Users from each other neighborhood were collapsed into a single composite user. Each neighborhood then had all users represented; similar users were represented at full resolution and the more distant