Getting to Know You: Learning New User Preferences in Recommender Systems

Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, John Riedl
GroupLens Research Project
Department of Computer Science and Engineering
University of Minnesota, Minneapolis, MN 55455 USA
{arashid, ialbert, cosley, lam, mcnee, konstan, riedl}@cs.umn.edu

ABSTRACT
Recommender systems have become valuable resources for users seeking intelligent ways to search through the enormous volume of information available to them. One crucial unsolved problem for recommender systems is how best to learn about a new user. In this paper we study six techniques that recommender systems can use to learn about new users. These techniques select a sequence of items for the collaborative filtering system to present to each new user for rating. The techniques include the use of information theory to select the items that will give the most value to the recommender system, aggregate statistics to select the items the user is most likely to have an opinion about, balanced techniques that seek to maximize the expected number of bits learned per presented item, and personalized techniques that predict which items a user will have an opinion about. We study the techniques through offline experiments with a large pre-existing user data set, and through a live experiment with over 300 users. We show that the choice of learning technique significantly affects the user experience, in both the user effort and the accuracy of the resulting predictions.

Keywords
Recommender systems, collaborative filtering, information filtering, startup problem, entropy, user modeling.

INTRODUCTION
People make decisions every day. "Which movie should I see?" "What city should I visit?" "What book should I read?" "What web page has the information I need?" We have far too many choices and far too little time to explore them all. The exploding availability of information that the web provides makes this problem even tougher.

Recommender systems help people make decisions in these complex information spaces. Recommenders suggest to the user items that she may value based on knowledge about her and the space of possible items. A news service, for example, might remember the articles a user has read. The next time she visits the site, the system can recommend new articles to her based on the ones she has read before.

Collaborative filtering is one technique for producing recommendations. Given a domain of choices (items), users can express their opinions (ratings) of items they have tried before. The recommender can then compare the user's ratings to those of other users, find the "most similar" users based on some criterion of similarity, and recommend items that similar users have liked in the past (a small illustrative sketch of this neighborhood approach appears below).

When new users come along, however, the system knows nothing about them. This is called the new user problem for recommender systems [1, 2, 6]. The system must acquire some information about the new user in order to make personalized predictions. The most direct way to do this is to ask for ratings directly by presenting items to the user.

However, the system must be careful to present useful items that garner information. A food recommender, for instance, probably should not ask whether a new user likes vanilla ice cream. Most people like vanilla ice cream, so knowing that a new user likes it tells you little about the user. At the same time, the recommender should ask about items the user is likely to have an opinion about. A travel recommender would probably not benefit by asking a new user if she liked Burkina Faso, for instance. The recommender is likely to learn only that, like most people, she has not visited Burkina Faso, which is of little value in forming future travel recommendations.
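To make the neighborhood-style collaborative filtering described above concrete, here is a minimal sketch, ours rather than the authors', of such a predictor: it compares a user's ratings to those of other users using cosine similarity over co-rated items and predicts a rating as a similarity-weighted average of the neighbors' ratings. The toy `ratings` table and the function names are hypothetical and stand in for a real system's data.

```python
from math import sqrt

# ratings[user][item] = rating on a 1-5 scale (toy data, purely illustrative)
ratings = {
    "alice": {"Titanic": 4, "Fargo": 5, "Alien": 2},
    "bob":   {"Titanic": 4, "Fargo": 4, "Alien": 1, "Brazil": 5},
    "carol": {"Titanic": 2, "Alien": 5, "Brazil": 1},
}

def similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    den = sqrt(sum(ratings[u][i] ** 2 for i in common)) * \
          sqrt(sum(ratings[v][i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item):
    """Similarity-weighted average of the neighbors' ratings for the item."""
    neighbors = [(similarity(user, v), r[item])
                 for v, r in ratings.items()
                 if v != user and item in r]
    total = sum(abs(s) for s, _ in neighbors)
    if total == 0:
        return None  # no overlap with anyone: nothing to predict from
    return sum(s * r for s, r in neighbors) / total

print(predict("alice", "Brazil"))
```

The `None` branch is the new user problem in miniature: a user with no ratings on record has no neighbors to compare against.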
The choice of exactly what questions to ask a new user, then, is critical. An intelligent recommender interface will minimize a new user's effort and get him to the fun part (using the system and seeing recommendations) while still learning enough to make good recommendations.

In this paper we explore approaches for choosing which items to present to new users for rating. We consider this problem in the general case of recommender systems, illustrating strategies and performing experiments using the MovieLens movie recommender. We first survey related work in the areas of decision theory and recommender systems, then consider approaches for selecting movies to present to users. We test these approaches on historical data drawn from the 7.5 million-rating MovieLens dataset. We also test three of the most promising strategies on over 300 new MovieLens users. We then discuss the results and suggest directions for future work.

RELATED WORK
We briefly mention related work in the field of decision theory and survey work that has been done on the new user problem in the area of recommender systems.

Decision theory and entropy
Decision theory has proved useful in determining models for re-ordering search results [4]. This application of utility functions has also been used in recommender systems [13, 14].

Analysis of data for entropy, its theoretical information content, has been a standard technique used in information retrieval [10], medical diagnostic systems [9], and sequential classification problems [3] for many years. Lately, researchers have extended the use of entropy into areas such as probabilistic models for information retrieval [7] and value-of-information analysis [16].

We apply decision theory techniques to a new problem: choosing the items to first present to a new user of a recommender system. Our problem is in some ways the converse of the cited research; we are selecting items as questions to present to the user, rather than choosing which answers to present for a user's question.

Recommender systems and the new user problem
There has been little work in solving the new user problem by analyzing ratings data to make smart decisions. Pennock and Horvitz proposed the use of a "value-of-information" calculation to discover the most valuable ratings information to gather next from a user [14]. To our knowledge, they have not published any implementations or evaluations of their calculations.

Kohrs and Merialdo make use of entropy and variance in their ratings data in order to generate more accurate predictions for new users [12]. Our work expands their results by using a number of strategies that we consider more suitable than variance or entropy. We also have a much larger dataset for our offline experiments and verify our findings in a live experiment.

Another approach to solving the new user problem creates pre-made user categories and quickly assigns new users to one of them. The partitioning can be accomplished by asking the user pre-determined questions that build a user preference structure. This helps jump-start the user into the system without requiring a substantial number of ratings [8, 13]. This class of approaches addresses the question of what to present first by starting with a small set of preference models (e.g. demographic models, models based on attributes of items) and asking questions that help choose an appropriate model for a user. When these models are accurate they can be quite useful, but the premise of personalized recommender systems and collaborative filtering is that a person's preferences are a better predictor of other preferences than other attributes. Category and demographic models are thus less general than the methods we present; they apply only to certain domains, and require domain-specific expertise.

Filterbots are a technique to overcome the startup problem for new items in a collaborative filtering system by injecting ratings agents that rate every item in the system according to their algorithmic analysis of the content of the item [6]. Filterbots can make sure that every item in the system has many ratings to help users find the items they are most interested in. However, filterbots do not directly attack the new user problem.

Others have integrated agents into a collaborative filtering environment to extract user preference information transparently [17]. This method has the advantage of collecting implicit information in addition to explicitly provided ratings, and should gather data for new users more rapidly. Using implicit data in addition to explicit data is a promising approach, and is complementary to our approach of carefully selecting which explicit data to collect.

STRATEGIES FOR SELECTING ITEMS TO PRESENT
There are trade-offs to be made when choosing a strategy for presenting items. As discussed in the introduction, requiring too much effort of the user will cause some users to give up, while not asking enough questions will result in poor recommendations. We identify four dimensions that a strategy might choose to support: (a) User effort: how hard was it to sign up? (b) User satisfaction: how well did the user like the signup process? (c) Recommendation accuracy: how well can the system make recommendations to the user? (d) System utility: how well will the system be able to serve all users, given what it learns from this one?

We choose to focus on user effort and accuracy. We chose these two because they are easy to measure and can be measured in both off-line and on-line experiments. User satisfaction studies are difficult to do off-line from historical data, and we believe that user satisfaction will rise as user effort falls. While we touch on a few issues related to system utility, such as the danger of introducing biases into a system's ratings database when using certain strategies, we do not focus on it since our primary focus is on factors that directly influence the user's experience.

We consider several types of strategies for presenting items, ranging from random selection, through strategies that exploit aggregate properties of the system's database such as choosing popular items, to strategies that tune themselves to individual users. (A sketch of the signup presentation loop that these strategies plug into appears below.)
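For reference in the strategy descriptions that follow, here is a sketch, under our own simplifying assumptions, of the signup presentation loop that such strategies plug into: candidate items are re-ranked by a strategy-supplied score and shown a page at a time until the user has supplied enough ratings. The `signup` and `ask_user` names are illustrative; the paper does not show MovieLens' actual signup code. The page size and rating threshold of ten match the MovieLens signup described later.

```python
from typing import Callable, Dict, Iterable, List

PAGE_SIZE = 10       # MovieLens shows ten movies per page
RATINGS_NEEDED = 10  # and asks for at least ten ratings before recommending

# A strategy scores a candidate item, optionally using the ratings gathered so far.
Strategy = Callable[[str, Dict[str, int]], float]

def signup(strategy: Strategy,
           catalog: Iterable[str],
           ask_user: Callable[[List[str]], Dict[str, int]]) -> Dict[str, int]:
    """Present pages of unrated items, highest-scoring first, until enough ratings arrive.

    `ask_user` stands in for the web form; it returns whichever of the
    presented items the user chose to rate.
    """
    remaining = set(catalog)
    collected: Dict[str, int] = {}
    while remaining and len(collected) < RATINGS_NEEDED:
        # Re-rank every page so strategies that adapt to the user's ratings can do so.
        page = sorted(remaining, key=lambda m: strategy(m, collected), reverse=True)[:PAGE_SIZE]
        collected.update(ask_user(page))
        remaining -= set(page)
    return collected
```

Re-ranking between pages is what lets an adaptive strategy such as Item-Item personalized react to the ratings gathered so far; the static strategies simply ignore the `collected` argument.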
Random strategies
Random strategies avoid bias in the presentation of items. We consider two variants.

Random. Select items to present randomly, with uniform probability over the universe of items.

MovieLens Classique. For each page of movies presented, select one movie randomly from an ad hoc manual list of popular movies and the rest randomly from all movies.

Discussion. Random strategies have the advantage of collecting user preference data over the entire universe of items. If the distribution of ratings is not uniform, however, users will likely not have seen many of the randomly selected movies. MovieLens Classique tries to boost the chance that users will have at least one success per page.

Popularity
In MovieLens, the number of movies with a given number of ratings decreases in an approximately exponential manner, deviating from this exponential form for the least- and most-rated movies.

Popularity. Rank all items in descending order of the number of ratings. Present movies in descending popularity order.

Discussion. This strategy is easy to calculate and should make it easy for users to rate movies. However, popular movies may be widely liked; if this is true, then their ratings carry little information. If everyone likes Titanic, and I say I like it too, what can the system learn from that?

Another concern when using the popularity strategy is the possibility of exacerbating the prefix bias. Popular movies are easier for the system to recommend, because similar users are more likely to have seen them. Since users can rate any movie the system recommends, popular movies garner still more ratings. Unpopular movies suffer from the same problem in reverse: they are hard to recommend, so users see them and rate them less often than they otherwise might. This bias may explain the deviation from exponential form for the popularity distribution of movies.

Pure entropy
An alternative approach is to ask users about movies that will give us the most information for each rating. Informally, a movie that has some people who hate it and others who like it should tell us more than a movie where almost everyone liked it. Kohrs and Merialdo used both variance and entropy to get at this notion [12].

Pure Entropy. For each movie, calculate its entropy using the relative frequency of each of the five possible ratings. Sort the movies in descending order of entropy. Present the movies with the highest entropy that the user has not seen.

Discussion. There are several choices to make when using an entropy-based strategy. The first is how to handle missing ratings. We choose to ignore them in the calculation because the information content of a missing rating is hard to measure: a user may have chosen not to see the movie, never heard of it, or seen it but not thought to rate it. Another choice is whether to compute the entropy over each rating individually, or whether to convert ratings into a binary "like vs. dislike" model where ratings of 4 or 5 indicate like while ratings of 1 to 3 indicate dislike.

Finally, "most information" in the technical sense meant by entropy does not necessarily translate into information usable by the system. A movie with only two ratings, a 1 and a 5, has high entropy but little value in finding similar users or making recommendations. Similarly, a recommender may present high entropy movies, but if the user has not seen any of them, the system will gain no information at all. We performed a pilot study for the on-line experiment using a pure entropy strategy, and it turned out to be unusable. Two users had to view several hundred movies each before finding ten to rate.

Balanced strategies
Popularity-based strategies tend to get many ratings from users, but each rating may have low information-theoretic value to the recommender system. Conversely, entropy-based techniques get a lot of value from each rating, but users may find relatively few items to rate. In this section we consider balanced techniques that attempt to obtain many ratings, each of which has a relatively high value. In a sense, these techniques are working to obtain as many expected bits of information as possible from each item presented for the user to possibly rate.

Figure 1. Entropy vs. the popularity of a movie, smoothed by using a moving average entropy of the previous 50 most popular movies.

Popularity*Entropy (Pop*Ent). Rank items by the product of popularity and entropy. Entropy is the number of bits of information if the user rates the item, and popularity is the probability that this user will rate this item. Using Bayes' theorem in this way assumes that popularity and entropy are independent, which is unlikely to be strictly true, but the approach is likely to be a good approximation to the expected number of bits. Further, our experience with MovieLens ratings suggests that popularity and entropy are not strongly correlated. Figure 1 plots the average entropy of movies with a given popularity ranking using a moving average over the 50 prior most popular movies. This figure shows that there is little correlation between a movie's popularity and entropy except for movies with few ratings.

Log Popularity*Entropy (log Pop*Ent). As above, but take the log of the number of ratings before computing popularity. In studying Entropy and Popularity we observed that their distributions in our dataset were such that Popularity almost completely dominated Pop*Ent, with both strategies producing nearly the same sets of rankings. Taking the log of the number of ratings nearly linearized popularity, making it a better match for entropy.

Discussion. Figure 1 suggests that entropy alone may not be an effective strategy, since entropy is nearly independent of popularity. Thus, entropy alone will sometimes choose items that users have low probability of having an opinion about. Balanced techniques directly combine entropy and popularity to increase both the odds that a user will be able to rate movies that the recommender presents and the expected value of that rating.
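The aggregate scores above are straightforward to compute from a ratings table. The following sketch is our own illustration on toy data (none of the identifiers come from the paper): entropy over the observed five-point rating distribution with missing ratings ignored, popularity as a raw rating count, and the balanced scores that multiply the two, with log Pop*Ent log-scaling the popularity term.

```python
import math
from collections import Counter
from typing import Dict, List

# item -> list of ratings on the 1-5 scale (toy data)
item_ratings: Dict[str, List[int]] = {
    "Titanic":   [4, 5, 4, 5, 4, 5, 4],  # popular and widely liked: low entropy
    "Fargo":     [1, 5, 2, 5, 1, 4],     # divisive: high entropy
    "Obscurity": [1, 5],                 # high entropy, but too few ratings to be useful
}

def entropy(item: str) -> float:
    """Shannon entropy (bits) of the observed rating distribution; missing ratings ignored."""
    counts = Counter(item_ratings[item])
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def popularity(item: str) -> int:
    """Number of ratings the item has received."""
    return len(item_ratings[item])

def pop_ent(item: str) -> float:
    """Balanced score: popularity (as a proxy for the chance the user can rate it) times entropy."""
    return popularity(item) * entropy(item)

def log_pop_ent(item: str) -> float:
    """Same, but with log-scaled popularity so the count does not swamp the entropy term."""
    return math.log(popularity(item)) * entropy(item)

for score in (entropy, popularity, pop_ent, log_pop_ent):
    ranking = sorted(item_ratings, key=score, reverse=True)
    print(f"{score.__name__:12s} -> {ranking}")
```

On data shaped like MovieLens', the raw Pop*Ent product is dominated by the popularity term, which is why the log form matters in practice.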
Personalized
The strategies above all use aggregate statistics. However, the overall popularity of an item is only a rough approximation for the chance that a particular user has seen it. Ideally, the movies presented to a user could be tailored to that user as soon as we have some information about that user. Once we know that a user has rated Ghostbusters, we might want to show other movies rated by people who have rated Ghostbusters. The goal is to home in on movies that the user is likely to have seen in order to make the signup process easier and require less user effort. A simple personalized strategy uses item-item similarity.

Item-Item personalized. Present movies using any strategy until the user has given at least one rating. Then use a recommender that computes similarity between items to select other items that the user is likely to have seen. Update the list of similar movies whenever the user submits more ratings, remembering movies that the user has already seen so that they are not presented again. For our experiments, we present the initial screen of movies as a random selection from the top 200 movies ranked by the log Pop*Ent strategy.

Discussion. Personalizing the movies we present is similar to, but not the same as, recommending movies. When we recommend movies, we try to identify movies the user will like; when presenting movies, we only care whether he has seen a movie. The SUGGEST recommender, used as the item-item recommender in our experiments, was developed with e-commerce in mind and uses binary ratings (e.g. the user bought the item) [11]. It accepts a list of items the user has bought and returns a list of other items the user would be most likely to buy. This is exactly the task we face: given a list of movies the user has seen, what other movies is he most likely to have seen?

One possible disadvantage of the item-based personalized strategy is that seeing a movie is probably correlated with liking a movie. The average rating in the MovieLens dataset, for example, is close to 4 on a 1 to 5 scale. This means that we may get mostly positive ratings for the new user, which is not as useful as knowing both some movies that the user likes and some that she dislikes.
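The paper uses the SUGGEST item-based recommender [11] for this step. As a rough stand-in for the idea rather than for SUGGEST itself, the sketch below scores unrated movies by how often they co-occur, across users, with the movies the new user has already rated; in effect it asks what people who saw those movies also tend to have seen. The data and identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

# user -> set of movies that user has rated ("seen"); binary data, toy example
seen: Dict[str, Set[str]] = {
    "u1": {"Ghostbusters", "Alien", "Star Trek II"},
    "u2": {"Ghostbusters", "Star Trek II", "Star Trek III"},
    "u3": {"Titanic", "Fargo"},
}

# co[a][b] = number of users who have seen both a and b
co: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
for movies in seen.values():
    for a, b in combinations(movies, 2):
        co[a][b] += 1
        co[b][a] += 1

def likely_seen(rated: Iterable[str], top_n: int = 5) -> List[str]:
    """Rank movies the user has NOT rated by co-occurrence with the ones they have."""
    rated = set(rated)
    scores: Dict[str, int] = defaultdict(int)
    for r in rated:
        for other, count in co[r].items():
            if other not in rated:
                scores[other] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(likely_seen({"Ghostbusters"}))  # e.g. ['Star Trek II', 'Alien', 'Star Trek III']
```

A production implementation would normalize these counts so that globally popular movies do not dominate every list.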

Other plausible strategies
There are a number of other plausible strategies that we do not consider in this paper. The system might ask attribute-based questions of the user, although as mentioned earlier such strategies are domain-dependent. The system might also ask for the names of items a user likes. CDnow.com does this, explicitly asking for album titles the user liked. A recommender system could also preferentially display new items, or items that have recently been added to its database. Finally, it could perform a more sophisticated analysis of entropy and personalization than we attempt in this paper and try to select items with high independent entropy. We focus on domain-independent strategies, and within these on the simplest ones; exploring more complex strategies would be fertile ground for future work.

OFFLINE EXPERIMENTS
We decided to first explore the strategies mentioned above by performing off-line experiments that use historical data to simulate the signup process for new MovieLens users. The benefit of these offline experiments is that we can quickly test a variety of strategies without bothering actual users with strategies that turn out in practice to work poorly. A disadvantage of these offline experiments, described in detail below, is that biases in our existing data may bias the results for or against particular approaches. We identify the biases as carefully as we can, and interpret our results in that context. Still, these experiments were invaluable to us in ruling out several algorithms that would have been painful for actual users.

Experimental Design
To build the dataset for the off-line experiments, we took a snapshot of the MovieLens ratings database and eliminated users who had fewer than 200 ratings. This left 7,335 users, 4,117 movies and more than 2.7 million ratings. The cutoff of 200 is both high and somewhat arbitrary. However, we needed a large number of ratings for each user, as in the historical data it is hard to know which movies the user might have seen other than through their ratings. We needed many ratings for each user so we had a good sample of movies they were able to rate.

We tested the Pure Entropy, Random, Popularity, Pop*Ent, and Item-Item personalized strategies. We did not test the MovieLens Classique strategy because the historical data were gathered with the Classique strategy and we feared possible bias.

To mimic the on-line sign-up process, we used each strategy to "present" a total of 30, 45, 60, or 90 movies to each user. We varied the number of movies presented so that we could see how the strategies performed as the system attempted to gather more information. When we started a run, we withheld all of that user's ratings from the system. As we presented the movies, users "rated" the movies they had "seen" (i.e. those for which we had ratings in the database).

Once we had presented the correct number of movies, we counted the number of movies the user was able to actually rate. More ratings implied that we did a better job of showing items the user could rate. This is good: it means that we wasted less of the user's time looking at unratable items and that we can present fewer items to get the information the system needs, saving the user effort. After counting the rated movies, we used these as training data for that user and made predictions for all of the other movies the user had rated in the original dataset. We then calculated the Mean Absolute Error (MAE) for these predictions. MAE is the sum of the absolute differences between each prediction and its corresponding rating, divided by the number of ratings. We performed the entire procedure for each strategy for every user in the test set, and computed an average MAE across all users. Computing average MAE in this way counts all users equally, rather than biasing the results towards users with more ratings.

Biases in the reduced dataset
The reduced dataset inherits several biases from the full MovieLens dataset. In particular, it has the prefix bias, where popular movies are easier to recommend and are shown (and rated) more often. This might give strategies that incorporate popularity an advantage in the number of movies they allow a user to rate. Our decision to remove users with fewer than 200 ratings also introduces possible bias. One bias is that our results may be most meaningful for active users. It is also possible that removing users with fewer ratings might artificially impact prediction accuracy. Excluding these users also resulted in a denser data set.

Results
Figure 2 shows that the Item-Item personalized strategy did the best in picking movies users can rate, while Pure Entropy was the worst.

Figure 2. Number of movies seen versus number of movies presented to a user.

Figure 3 shows the effect different strategies have on MAE. Pop*Ent performs best for a given number of presented movies, with Popularity close behind. Again, Pure Entropy is shockingly poor.

Figure 3. Mean Absolute Error (MAE) vs. the number of movies presented by each strategy.

The poor performance of Pure Entropy in both metrics is directly related. Figure 1 shows a slight increase in entropy for less popular movies. Since popularity directly relates to the chance that a new user has seen a movie, this strategy presents movies that users are less likely to have seen, resulting in poor performance in the movies-seen metric. Moreover, with fewer rated movies to base predictions on, the MAE for Pure Entropy also suffered.

The Item-Item personalized strategy has the most interesting behavior. We expected it to win in the movies-seen metric, and it in fact trounced the competition. This did not translate into better recommendations, however. It was hard to believe that the Random strategy could get an error rate with eight ratings as training data comparable to the item-item personalized strategy with 57 ratings.

One possible reason is that the item-item strategy presented movies that it could otherwise have made accurate predictions for. Imagine that the system presents Star Trek 23: the Bowels of Kirk to a Star Trek fan, who rates it. The system looks and finds that most people who have seen Bowels have also seen Star Trek N, and can therefore make accurate predictions for those movies, thus lowering MAE.

The problem seems to be that the item-item personalized strategy does not do a good job of sampling the entire space of movies. Item-item methods tend to find loose clusters of similar items and home in on those clusters. This may cause poor exploration of the universe of items: the recommender may become an expert on Star Trek movies at the expense of others. This also helps explain why the random strategy does well despite finding many fewer movies, as it samples from all genres and all levels of popularity.

We will further discuss the merits of the strategies after we present the results of our online experiment.

ONLINE EXPERIMENT
We followed up our off-line experiment by deploying several strategies on the live MovieLens site. By using live users, we could verify the results of the off-line experiment while removing the bias induced by only considering users who had at least 200 ratings. We also wanted to compare these strategies to the MovieLens Classique strategy.

We had planned to investigate all of the strategies in our online experiments. However, after our pilot study, we decided against the Random and Entropy strategies, as the average number of movies a user would have to see before rating enough to get recommendations would be prohibitively high. Reading through hundreds of movie titles can be a frustrating process that would surely turn many users away. The pilot study also led us to use the log Pop*Ent strategy instead of Pop*Ent, since Pop*Ent and Popularity alone chose almost the same set of movies.

Experimental Design
When a new user joins MovieLens, the system presents pages of ten movies until the user rates ten or more movies. We altered the signup process to ask users if they were willing to let the system use an experimental method for selecting movies for them to rate. Users who consented were assigned to one of three groups, which used the Popularity, log Pop*Ent, or Item-Item personalized strategy to present movies. Those who did not consent received the MovieLens Classique strategy. This self-selection introduces a bias, so we use the Classique strategy only as a baseline.

MovieLens had a total of 351 new users during the ten-day experimental period. Table 1 shows the number of users in each experimental group. Some users gave up before completing the sign-up process. Our results below are based on the users who completed the signup process.

Table 1. Population of the experimental groups.

Strategy      Total Users   Dropouts   Completed
Popularity         91           10          81
Item-item          92           10          82
logPop*Ent         92           13          79
Classique          76           16          60
Total             351           49         302

Our primary goal for the online experiment is to measure the effectiveness of the signup process: how many pages of movies must a user see before they have rated enough to get started? We believe this is a suitable proxy for the effort we require of the user, with fewer pages equaling less effort. We would like to measure prediction accuracy as well, but we do not have a good basis for computing MAE immediately after a user signs up. We could compute it on the movies they rated during the signup process (MovieLens logs predictions for these movies to support retrospective analysis). However, since the purpose of the signup process is to gather information, judging error during the signup process does not make much sense.

User interaction is quite difficult to foresee, let alone quantify. Some users rated all the movies on the first page of a random sample, a highly unexpected event, while others waded through dozens of pages of popular movies, seemingly unable to rate a single one of them. We included all of these users without prejudice.

Expectations
Both the Popularity and the log Pop*Ent approaches are expected to show a slow decrease in the number of movies matched per page, up to the point where most of the users finish the signup. This is a natural consequence of the fact that we pre-sorted the movies and presented them in descending order of the corresponding parameter. We also expected the Item-Item personalized strategy to perform no better than log Pop*Ent on the first page, since it uses that strategy to select the initial set of movies. We did expect Item-Item to outstrip the other strategies on subsequent pages, showing that it was successfully finding movies that users had seen.

Results
Figure 4 shows the number of movies per page an average user was able to rate with each of the strategies. Popularity and log Pop*Ent exhibit the decay we expected, although they both rose slightly after three pages. When the Item-Item recommender kicks in on the second page, the users are able to rate more movies than with any other strategy, and more movies than they did on the first page. Classique was approximately constant across all pages.

Figure 4. Number of movies users could rate per page using different movie presentation strategies.

From a user's point of view, the ease of the signup process is probably best expressed as the number of pages of ten movies he must see before starting to get recommendations. The mean number of pages ranged from around two for the Popularity (1.9) and Item-Item (2.3) strategies up to 4.0 for the log Pop*Ent and 7.0 for the Classique strategies.

Figure 5, which plots the cumulative percentages of users ending their signup on the nth page, shows that these means hide long tails, especially in the case of the log Pop*Ent strategy. This figure shows that the Popularity and Item-Item strategies are by far the most effective, with over 90% of users being able to sign up in five pages or less. The other two strategies fare much worse, with a number of users requiring more than five pages. Since we only consider users who completed the signup process, all four strategies eventually reach 100 percent; we truncated the graph at five pages because all of the strategies except for Popularity had outliers that viewed over 200 movies before they found ten movies to rate.

Figure 5. Cumulative percentage of users who finished signing up after a given number of pages.

Table 2. Evaluation of strategies over both experiments on user effort and accuracy metrics (strategies compared: Random/Classique, Popularity, (log) Pop*Ent, and Item-Item).

DISCUSSION
We consider both the on-line and the off-line results in this section. In evaluating the techniques we focus on two dimensions of the user experience: user effort and recommendation accuracy. The best strategy for eliciting information from users depends on the dimension along which you wish to optimize. Some algorithms do well at minimizing user effort, at the cost of accuracy, while other algorithms provide very good accuracy, at the cost of additional effort from the user. The best algorithms perform well by both measures. Popularity, Pop*Ent, and Item-Item personalized strategies all give reasonable performance on both metrics and provide the system designer an easy way to choose trade-offs. Popularity provides a good balance between effort and accuracy. Pop*Ent trades effort for more accuracy; Item-Item personalized trades accuracy for less effort. Item-Item does sacrifice more in accuracy than the other methods.

The results of the off-line and on-line experiments support each other. Random, Entropy, and Classique performed poorly at helping users rate movies; Popularity performed well in both cases, and Item-Item successfully found movies for users to rate in both experiments. Table 2 compares the overall performance of our algorithms on our two primary dimensions of minimizing user effort and making good predictions. Choosing an intelligent strategy for presenting items to rate can dramatically improve usability. The Classique strategy required over three times the effort of the best strategies, Popularity and Item-Item personalized, and, based on the off-line results for the Random strategy, probably delivers worse recommendations.

These results should generalize to any set of ratings where the popularity of an item decays exponentially and the relative entropy of most items is in a fairly narrow range. We expect that most real-world ratings data sets have these properties.

An application's requirements also matter. An e-commerce recommender might have to start making recommendations with no data at all about the current user [15]. In this case, we suggest recommending the most popular items rather than the highest-rated ones, and then using Item-Item strategies to personalize the recommendations as quickly as possible.

We also have anecdotal evidence about another dimension of user experience: users in our research group much preferred techniques that allowed them to rate several movies per page, especially compared to the techniques that required them to go many pages between ratings. The reaction was so strong that we modified our experimental design to include only one technique with very low ratings density (Classique). Exploiting intelligence about the user may lead to improved satisfaction.

However, using methods that exploit intelligence about the user may induce biases in ratings distributions. The Popularity strategy might exacerbate the bias we described earlier, where more popular movies get more chances to be recommended and rated. Over time, the system might become a "winner takes all" recommender that only recommends generically popular items.

The Item-Item strategy might create the opposite problem. Each user would see a set of items to rate that are predicted to be of interest to him. Over time, users may become clustered in small groups with very little overlap, leading to the balkanization of the user population.

Both of these potential long-term dangers can be combated in practice by including some randomness in the set of items suggested for rating. Too much randomness leads to excessive user effort, but a small amount of randomness may help to extend the space over which the recommender understands the user's interests and ensure that all items are occasionally presented to users.

CONCLUSION AND FUTURE WORK
We conclude that the proper strategy for eliciting information from users depends on the dimension of user experience along which you are trying to optimize. In general, strategies that make good guesses about what items a user is likely to be able to rate do well both at reducing user effort and at producing acceptable recommendations. We believe these results will hold for many similar recommender systems.

We studied the techniques we considered in three ways: through analysis, through simulation studies on previously collected user data, and through live user trials. We found the three methods complementary. The analysis helped suggest techniques that might be useful. The simulation studies enabled us to consider a very large number of users quickly, and to explore techniques that would have been frustrating for live users. The live study helped avoid the problems of data bias in our simulations, and increased our confidence in the applicability of the results to real systems. We believe that all three techniques are important in successfully developing intelligent user interfaces.

In this paper we focused on minimizing user effort while still being able to make accurate predictions. It would be useful to perform a more thorough investigation of the system's needs for diverse ratings across all items, and how to balance these needs with the user experience. More direct measurements of user satisfaction, such as longer-term statistics on usage and surveying users, would complement our attempts to minimize user effort.

ACKNOWLEDGEMENTS
We thank members of the GroupLens Research Group, past and present, for their contributions to this research. We would also like to thank the members of the MovieLens system for their support in our research efforts, and our anonymous reviewers for their comments on this paper.

This work was supported by grants from the National Science Foundation (IIS 96-13960, IIS 97-34442, and IIS 99-78717) and by Net Perceptions, Inc.

REFERENCES
1. Avery, C., Resnick, P., and Zeckhauser, R. The Market for Evaluations. American Economic Review, 89(3), 564-584.
2. Balabanovic, M., and Shoham, Y. Fab: Content-based, Collaborative Recommendation. Communications of the ACM, 40(3), 1997, 66-72.
3. Ben-Bassat, M. Myopic Policies in Sequential Classification. IEEE Transactions on Computers, 27(2), 170-174.
4. Glover, E. J., and Birmingham, W. P. Using Decision Theory to Order Documents. Proceedings of ACM Digital Libraries 1998, 285-286.
5. Goldberg, K., Roeder, T., Gupta, D., and Perkins, C. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval Journal, 4(2), 133-151.
6. Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. Combining Collaborative Filtering with Personal Agents for Better Recommendations. Proceedings of AAAI-99, 439-446.
7. Greiff, W. R., and Ponte, J. The Maximum Entropy Approach and Probabilistic IR Methods. ACM Transactions on Information Systems, 18(3), 246-287.
8. Ha, V., and Haddawy, P. Towards Case-Based Preference Elicitation: Similarity Measures on Preference Structures. Proceedings of UAI 1998, 193-201.
9. Horvitz, E., Heckerman, D., Ng, K., and Nathwani, B. Towards Normative Expert Systems: Part I, Pathfinder Project. Methods of Information in Medicine, 31, 90-105.
10. Kantor, P. B., and Lee, J. J. The Maximum Entropy Principle in Information Retrieval. Proceedings of ACM SIGIR 1986, 269-274.
11. Karypis, G. Evaluation of Item-Based Top-N Recommendation Algorithms. Proceedings of CIKM 2001.
12. Kohrs, A., and Merialdo, B. Improving Collaborative Filtering for New Users by Smart Object Selection. Proceedings of the International Conference on Media Features (ICMF) 2001 (oral presentation).
13. Nguyen, H., and Haddawy, P. The Decision-Theoretic Video Advisor. Proceedings of the AAAI Workshop on Recommender Systems, 1998, 76-80.
14. Pennock, D., and Horvitz, E. Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-based Approach. Proceedings of UAI 2000, 473-480.
15. Schafer, J.B., Konstan, J., and Riedl, J. Electronic Commerce Recommender Applications. Journal of Data Mining and Knowledge Discovery, January 2001.
16. Vetschera, R. Entropy and the Value of Information. Central European Journal of Operations Research, 8, 2000, 195-208.
17. Wasfi, A. M. A. Collecting User Access Patterns for Building User Profiles and Collaborative Filtering. Proceedings of IUI 1999, 57-64.