Social Choice Theory and Recommender Systems: Analysis of the Axiomatic Foundations of Collaborative Filtering

From: AAAI-00 Proceedings. Copyright © 2000, AAAI (www.aaai.org). All rights reserved. Social Choice Theory and Recommender Systems: Analysis of the Axiomatic Foundations of Collaborative Filtering David M. Pennock Eric Horvitz C. Lee Giles NEC Research Institute Microsoft Research NEC Research Institute 4 Independence Way One Microsoft Way 4 Independence Way Princeton, NJ 08540 Redmond, WA 98052-6399 Princeton, NJ 08540 [email protected] [email protected] [email protected] Abstract recommendation services. Alexa3 is a web browser plug-in The growth of Internet commerce has stimulated the use of that recommends related links based in part on other collaborative filtering (CF) algorithms as recommender people’s web surfing habits. Several CF tools originally systems. Such systems leverage knowledge about the developed at Microsoft Research are now included with behavior of multiple users to recommend items of interest to the Commerce Edition of Microsoft’s SiteServer,4 and are individual users. CF methods have been harnessed to make currently in use at multiple sites. recommendations about such items as web pages, movies, The effectiveness of any CF algorithm is ultimately books, and toys. Researchers have proposed several predicated on the underlying assumption that human variations of the technology. We take the perspective of CF preferences are correlated—if they were not, then informed as a methodology for combining preferences. The preferences predicted for the end user is some function of prediction would be impossible. There does not seem to be all of the known preferences for everyone in a database. a single, obvious way to predict preferences, nor to Social Choice theorists, concerned with the properties of evaluate effectiveness, and many different algorithms and voting methods, have been investigating preference evaluation criteria have been proposed and tested. Most aggregation for decades. At the heart of this body of work is comparisons to date have been empirical or qualitative in Arrow's result demonstrating the impossibility of nature [Billsus and Pazzani, 1998; Breese et al., 1998; combining preferences in a way that satisfies several Konstan and Herlocker, 1997; Resnick and Varian, 1997; desirable and innocuous-looking properties. We show that Resnick et al., 1994; Shardanand and Maes, 1995], though researchers working on CF algorithms often make similar some worst-case performance bounds have been derived assumptions. We elucidate these assumptions and extend [Freund et al., 1998; Nakamura and Abe, 1998; Cohen et results from Social Choice theory to CF methods. We show that only very restrictive CF functions are consistent with al., 1999] and some general principles have been desirable aggregation properties. Finally, we discuss advocated [Freund et al., 1998; Cohen et al., 1999]. Initial practical implications of these results. methods were statistical, though several researchers have recently cast CF as a machine learning problem [Billsus and Pazzani, 1998; Freund et al., 1998; Nakamura and Abe Introduction 1998]. We take instead an axiomatic approach, informed by The goal of collaborative filtering (CF) is to predict the results from Social Choice theory. First, we identify preferences of one user, referred to as the active user, several properties that a CF algorithm might ideally posses, based on the preferences of a group of users. For example, and describe how existing CF implementations obey given the active user’s ratings for several movies and a subsets of these conditions. We show that, under the full database of other users’ ratings, the system predicts how set of conditions, only one prediction strategy is possible: the active user would rate unseen movies. The key idea is The ratings of the active user are derived solely from the that the active user will prefer those items that like-minded ratings of only one other user. This is called the nearest people prefer, or even that dissimilar people don’t prefer. neighbor approach [Freund et al., 1998]. The analysis CF systems have seen growing use in electronic commerce mirrors Arrow’s celebrated Impossibility Theorem, which applications on the World Wide Web. For example, the shows that the only voting mechanism that obeys a similar University of Minnesota’s GroupLens and MovieLens1 2 set of properties is a dictatorship [Arrow, 1963]. Under research projects spawned Net Perceptions, a successful slightly weaker demands, we show that the only possible Internet startup offering personalization and form for the prediction function is a weighted average of the users’ ratings. We also provide a second, separate Copyright © 2000, American Association for Artificial Intelligence axiomatization that again admits only the weighted (www.aaai.org). All rights reserved. 3 1 http://movielens.umn.edu http://www.alexa.com 4 2 http://www.netperceptions.com http://www.microsoft.com/DirectAccess/ products/sscommerce average. The weighted average method is used in practice number of titles. Denote the n×m matrix of all users’ in many CF applications [Breese et al., 1998; Resnick et ratings for all titles as R. More specifically, the rating of ∈ ℜ∪ ⊥} al., 1994; Shardanand and Maes, 1995]. One contribution user i for title j is Rij, where each Rij { is either a ⊥ of this paper is to provide a formal justification for it. real number or , the symbol for “no rating”. Let ui be an Stated another way, we identify a set of properties, one of n-dimensional row vector with a 1 in the ith position and ⋅ which must be violated by any non-weighted-average CF zeros elsewhere. Thus ui R is the m-dimensional (row) 1 method. On a broader level, this paper proposes a new vector of all of user i’s ratings. Similarly, define tj to be an connection between theoretical results in Social Choice m-dimensional column vector with a 1 in the jth position ⋅ theory and in CF, providing a new perspective on the task. and zeros elsewhere. Then R tj is the n dimensional This angle of attack could lead to other fruitful links (column) vector of all users’ ratings for title j. Note that ⋅ ⋅ ∈ between the two areas of study, including a category of CF ui R tj = Rij. Distinguish one user a {1, 2, …, n} as the algorithms based on voting mechanisms. The next section active user. Define NR ⊂ T to be the subset of titles that the covers background on CF and Social Choice theory. The active user has not rated, and thus for which we would like remaining sections present, in turn, the three to provide predictions. That is, title j is in the set NR if and ⊥ axiomatizations, and discuss the practical implications of only if Raj = . Then the subset of titles that the active user our analysis. has rated is T-NR. In general terms, a collaborative filter is a function f that takes as input all ratings for all users, and replaces some or Background all of the “no rating” symbols with predicted ratings. Call this new matrix P. In this section, we briefly survey previous research in ≠ ⊥ collaborative filtering, describe our formal CF framework, Raj : if Raj P = (1) and present relevant background material on utility theory aj f (R) : if R = ⊥ and Social Choice theory. a aj For the remainder of this paper we drop the subscript on Collaborative Filtering Approaches f for brevity; the dependence on the active user is implicit. A variety of collaborative filters or recommender systems have been designed and deployed. The Tapestry system Utility Theory and Social Choice relied on each user to identify like-minded users manually Theory [Goldberg et al., 1992]. GroupLens [Resnick et al., 1994] Social choice theorists are also interested in aggregation and Ringo [Shardanand and Maes, 1995], developed functions f similar to that in (1), though they are concerned independently, were the first CF algorithms to automate with combining preferences or utilities rather than ratings. prediction. Both are examples of a more general class we Preferences refer to ordinal rankings of outcomes. For call similarity-based approaches. We define this class example, Alice’s preferences might hold that sunny days loosely as including those methods that first compute a (sd) are better than cloudy days (cd), and cloudy days are matrix of pairwise similarity measures between users (or better than rainy days (rd). Utilities, on the other hand, are between titles). A variety of similarity metrics are possible. numeric expressions. Alice’s utilities v for the outcomes Resnick et al. [1994] employ the Pearson correlation sd, cd, and rd might be vsd = 10, vcd = 4, and vrd = 2, coefficient for this purpose. Shardanand and Maes [1995] respectively. If Alice’s utilities are such that vsd > vcd, then test a few measures, including correlation and mean Alice prefers sd to cd. Axiomatizations by Savage [1954] squared difference. Breese et al. [1998] propose a metric and von Neumann and Morgenstern [1953] provide called vector similarity, based on the vector cosine persuasive postulates which imply the existence of utilities, measure. All of the similarity-based algorithms cited and show that maximizing expected utility is the optimal predict the active user’s rating as a weighted sum of the way to make choices. If two utility functions v and v′ are others users’ ratings, where weights are similarity scores. positive linear transformations of one another, then they Yet there is no a priori reason why the weighted average are considered equivalent, since maximizing expected should be the aggregation function of choice. Below, we utility would lead to the same choice in both cases. provide two possible axiomatic justifications. Now consider the problem of combining many peoples’ Breese et al. [1998] identify a second general class of preferences into a single expression of societal preference. CF algorithms called model-based algorithms. In this Arrow proved the startling result that this aggregation task approach, an underlying model of user preferences (for is simply impossible, if the combined preferences are to example, a Bayesian network model) is first constructed, satisfy a few compelling and rather innocuous-looking from which predictions are inferred.

Load more