<<

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

On and Choice Models

Shivani Agarwal Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, USA Indian Institute of Science, Bangalore, India [email protected]

Abstract how to design new algorithms for ranking form pairwise com- parisons that achieve desirable goals under various conditions In today’s big era, huge amounts of ranking [Rajkumar and Agarwal, 2014; 2016a; Rajkumar et al., 2015; and choice data are generated on a daily basis, and Rajkumar and Agarwal, 2016b]. consequently, many powerful new computational In marketing, when presenting products to users in a store tools for dealing with ranking and choice data have or on a website, it is important to categorize related products emerged in recent years. This paper highlights together so that users can quickly find what they are looking recent developments in two areas of ranking and for. These categories should ideally be based on how users choice modeling that cross traditional boundaries themselves make choices, so that products that tend to be and are of multidisciplinary interest: ranking from liked or disliked together are grouped together. In the second pairwise comparisons, and automatic discovery of part of the paper (Section 3), we describe a new approach for latent categories from choice survey data. automatic discovery of categories from choice data, which brings together ideas from random utility choice models and 1 Introduction topic models in order to automatically discover latent cate- In today’s big data era, huge amounts of data are generated in gories from choice survey data [Agarwal and Saha, 2016]. the form of and choices on a daily basis: restaurant ratings, product choices, employer ratings, hospital rankings, 2 Ranking from Pairwise Comparisons and so on. Given the increasing universality of such ranking and choice data, many powerful new computational tools for Ranking from pairwise comparisons is a ubiquitous problem dealing with such data have emerged over the last few years that arises in a variety of applications. The basic types of in areas related to AI, including in particular machine learn- questions of interest here are the following: Say there are n items, denoted [n]= 1,...,n , and we observe the out- ing, , operations research, and computational social { } choice. Here we briefly highlight such developments in two comes of a number of pairwise comparisons among them broad areas: ranking from pairwise comparisons, and auto- (such as pairwise preferences among movies, pairwise judg- matic discovery of latent categories from choice data. ments among job candidates, or pairwise game outcomes As is well known, humans generally find it easier to ex- among teams). Based on these pairwise comparisons, press preferences in the form of comparisons between two can we find a good ranking of the n items, or identify a ‘best’ items, rather than directly ranking a large number of items. item or a ‘good’ set of items among them? How many com- Indeed, in many domains, one is given outcomes of pairwise parisons do we need? What sorts of algorithms can we use? comparisons among some set of items (such as movies, can- Under what conditions do these algorithms succeed? didates in an election, or sports teams), and must estimate a A natural statistical framework for analyzing such ques- ranking of all the items from these observed pairwise compar- tions assumes that for each pair of items (i, j), there is an underlying pairwise preference probability P [0, 1] such isons. Due to the ubiquitous nature of the problem, several ij 2 different algorithms have been developed for ranking from that whenever items i and j are compared, item i beats item pairwise comparisons in different communities, including j with probability Pij and j beats i with probability Pji = e.g. maximum likelihood estimation methods [Bradley and 1 Pij. Collectively, these pairwise preference probabilities n n Terry, 1952; Luce, 1959], spectral ranking methods [Kendall, form an underlying pairwise preference matrix P [0, 1] ⇥ (with P = 1 i). Different statistical models for2 pairwise 1955; Keener, 1993; Negahban et al., 2012], noisy sorting ii 2 8 methods [Braverman and Mossel, 2008], and many others; comparisons lead to different conditions on P. For exam- however, little has been understood about when one algo- ple, if the pairwise preference probabilities follow the well- known Bradley-Terry-Luce (BTL) for pair- rithm should be preferred over another. In the first part of the n paper (Section 2), we discuss recent developments in under- wise comparisons, then there is a score vector w R++ wi 2 such that Pij = i, j [Bradley and Terry, 1952; standing these issues, including understanding the conditions wi+wj 8 under which different ranking algorithms succeed or fail, and Luce, 1959]; if they follow the ‘noisy permutation’ (NP)

4050 Condition on P Property satisfied by P n wi Bradley-Terry-Luce (BTL) w : Pij = R++ wi+wj 9 2 1 Low-noise (LN) Pij > 2 k Pik > k Pjk 1 ) Pik Pjk Logarithmic LN (LogLN) Pij > ln( ) > ln( ) 2 Pk Pki P k Pkj 1 ) Markov consistency (MC) Pij > ⇡i >⇡j 2 P P 1 ) 1 1 Stochastic transitivity (ST) Pij > 2 ,Pjk > 2 Pik > 2 1 ) Condorcet winner (CW) i : Pij > 2 j = i 9 81 6 1 p if i j Noisy permutation (NP) n,p< : Pij = 9 2S 2 p otherwise Low rank (LR( , r)) rank( (P)) r ( :[0,n1] R,r [n])  ! 2

Figure 1: Conditions on the matrix of underlying pairwise preference probabilities P and relationships among them. These conditions play an important role in determining the success of different algorithms for ranking from pairwise comparisons.

model, then there is a permutation n and noise param- Thus the input pairwise comparison data here is of the form eter p [0, 1 ) such that i = j, P =12S p if (i) <(j) y1 ,...,yK , with yk =1denoting that the k-th com- 2 2 8 6 ij { ij ij }i

4051 n Table 1: Properties of different ranking algorithms when all 2 pairs are compared. Ranking Algorithm Finds optimal ranking? Finds Finds Finds BTL LN LogLN MC ST Condorcet winner? top cycle? Copeland set? Matrix Borda XX BTL-MLE ⇥⇥⇥ ⇥ ⇥ ⇥ XX ⇥⇥⇥ ⇥ ⇥ ⇥ Least Squares Ranking X X Rank Centrality ⇥ ⇥⇥ ⇥ ⇥ ⇥ X ⇥⇥X ⇥⇥ ⇥ ⇥ SVM-Based Ranking XX X XX Topological Sort ⇥ ⇥ ⇥ XX X XX ⇥ ⇥ ⇥ Matrix Copeland XX X XX X X X

In general, when P does not satisfy ST, finding an optimal ranking is hard even under exact knowledge of P; indeed, Table 2: Conditions under which different ranking algorithms this corresponds to the minimum weighted feedback arc set are known to have associated performance guarantees when (MWFAS) problem, which is known to be NP-hard.3 In such only O(n log n) randomly selected pairs are compared. cases, one may still like to be able to find a ranking that places Ranking Algorithm Condition for ‘good’ items at the top; we discuss this next. Performance Guarantee Rank Centrality BTL Balanced Rank Estimation NP 2.2 Finding Good Items at the Top Low-Rank Pairwise Ranking LR( , r) ST \ As discussed above, when the underlying pairwise probabil- ities P do not satisfy stochastic transitivity, then even un- O(n log n) der exact knowledge of P, finding an optimal ranking of 2.3 Ranking from Comparisons of Pairs n all n items as above is NP-hard. In such settings, it is nat- The above discussion focused on the setting when all 2 ural to ask that an algorithm return a ranking that places pairs can be compared. However, when n is large, comparing ‘good’ items at the top. For example, when P admits a all pairs is impractical. A natural question that arises then is: Condorcet winner (an item that beats each other item with Under what conditions on P can we find a good ranking from probability greater than half), it is natural to ask this item comparisons of only O(n log n) (randomly sampled) pairs? be placed at the top. More generally, when there is no Con- Previous work has shown this is possible under the BTL con- dorcet winner, one can consider various notions of tourna- dition via the Rank Centrality algorithm [Negahban et al., ment solutions, each of which gives a different way to define 2012] and under the NP condition via the Balanced Rank Es- sets of winners in a tournament with cycles [Moulin, 1986; timation algorithm [Wauthier et al., 2013]; in recent work, Brandt et al., 2016], and ask that the items in the correspond- we show that this is in fact possible under a broader family of ing tournament solution be ranked at the top. For example, ‘low-rank’ conditions (which includes the BTL condition as a the top cycle is the smallest set of items W [n] for which special case, but is considerably more general; see Figure 1), each item in the set beats each item outside the✓ set with prob- via a low-rank matrix completion based algorithm that we call 1 ability greater than half (i.e. for which Pij > 2 i W, j / Low-Rank Pairwise Ranking. See Table 2 for a summary, and W ); the Copeland set is the set of items C 8[n]2that beat2 [Rajkumar and Agarwal, 2016b] for further details. the largest number of items with probability greater✓ than half C = argmax 1(P > 1 ) (i.e. i j ij 2 ); and so on. 3 Discovery of Categories from Choice Data [ ] As we showed recentlyP Rajkumar et al., 2015 , when the Consider now a marketing situation with n items or products, underlying preference probabilities P do not satisfy ST, most which need to be grouped into categories based on their being commonly used algorithms for ranking from pairwise com- liked (or disliked) together. Say we have choice survey data parisons are unable to find even the Condorcet winner when from M users, in which each user m is shown some subset it exists, and more generally, fail to rank items in various nat- of items Sm [n] and is asked to indicate his or her prefer- ural tournament solutions at the top (see Table 1). However, ences among✓ these items, e.g. by indicating his or her top few it is possible to design ranking algorithms that do find such choices in the set, or by assigning a rating (say 1–5) to each tournament solutions at the top. For example, given enough item in the set. Can we automatically discover meaningful pairwise comparisons (i.e. for large enough K), the Matrix categories from such choice data? Copeland algorithm successfully places both the top cycle Suppose there are K unknown categories to be discovered. 4 [ and the Copeland set at the top (Table 1). See Rajkumar We model each category k via a random utility model (RUM) et al., 2015] for further examples and discussion. that associates n random variables Xk1,...,Xkn with the n items. In order to allow different users to have different pref- 3When P satisfies ST, the graph induced by P is acyclic, and in erences among these categories, we posit that each user m m this case the MWFAS problem under knowledge of P is easy. has a hidden preference vector ✓ K indicating his or 4The Copeland set is always a subset of the top cycle. her preferences among the K categories.2 We then posit a

4052 cinating questions that remain open, and ranking and choice models will likely continue to be active areas of research in AI and related fields for many years to come.

Acknowledgments Figure 2: Categories discovered from sports choice survey Parts of the work described here are joint with Arun Rajku- data in which 253 respondents each rated the following 7 mar, Suprovat Ghoshal, Lek-Heng Lim, and Aadirupa Saha. sports on a of 1–5 based on how much they liked them: Parts of this work have been supported by a Ramanujan Fel- , , Cycling, Football, Jogging, Swimming, lowship from DST; an Indo-US Joint Center Award from the . The first category emphasizes individual sports; the Indo-US Science & Technology Forum; and the William & second category emphasizes team sports; the third category Flora Hewlett Foundation Fellowship at the Radcliffe Insti- separates out jogging. Categories can overlap: e.g. the above tute for Advanced Study, Harvard University. categories suggest cycling is likely to be co-enjoyed both with swimming and tennis and with jogging. References [Agarwal and Saha, 2016] S. Agarwal and A. Saha. Automatic dis- covery of categories from choice data. Forthcoming, 2016. for the observed choice data, whose param- eters are the parameters of the K RUMs (i.e. parameters of [Bradley and Terry, 1952] R. A. Bradley and M. E. Terry. Rank the distributions of X ), as well as parameters of a shared analysis of incomplete block designs: I. The method of paired ki comparisons. Biometrika, pages 324–345, 1952. Dirichlet prior on the preference vectors ✓m; given the ob- served choice data, we then fit these parameters to the data [Brandt et al., 2016] F. Brandt, M. Brill, and P. Harrenstein. Tour- using a variational expectation-maximization procedure. nament solutions. In Handbook of Computational Social Choice. Cambridge University Press. Forthcoming, 2016. Under our generative model, given a set of items Sm [n], user m picks her first choice as follows: she draws a category✓ [Braverman and Mossel, 2008] M. Braverman and E. Mossel. m Noisy sorting without . In SODA, 2008. k1 according to her preference vector ✓ , then draws util- ity values x1 for all items i S according to X : [Busa-Fekete and Hullermeier,¨ 2014] R. Busa-Fekete and i 2 m { k1,i i Sm and picks an item with the highest drawn util- E. Hullermeier.¨ A survey of preference-based online learn- 2 } 1 ing with bandit algorithms. In ALT, 2014. ity: i1 arg maxi Sm xi . Having picked her first choice 2 2 i1, in order to pick her second choice, she draws a cate- [Keener, 1993] J. P. Keener. The Perron-Frobenius theorem and the m 2 ranking of football teams. SIAM Review, 35(1):80–93, 1993. gory k2 according to ✓ , then draws utility values xi for all items i in Sm except the previously chosen item i1 ac- [Kendall, 1955] Maurice G. Kendall. Further contributions to the cording to Xk2,i : i Sm i1 , and then picks an item theory of paired comparisons. Biometrics, 11(1):43–62, 1955. { 2 2 \{ }} i2 arg maxi Sm i1 xi . This process is repeated until the [Luce, 1959] R. D. Luce. Individual Choice Behavior: A Theoreti- desired2 number2 of\{ choices} have been made. To model ratings cal Analysis. Wiley, 1959. data, which can be viewed as partial ranking data, we average [Marden, 1995] J. I. Marden. Analyzing and Modeling Rank Data. over all choices that are consistent with the observed ratings. Chapman & Hall, 1995. As one example, Figure 2 shows the results obtained by [McFadden, 1974] D. McFadden. Conditional logit analysis of applying our method to sports choice survey data in which qualitative choice behavior. In P. Zarembka, editor, Frontiers in 253 respondents each rated 7 sports on a scale of 1–5 based , pages 105–142. Academic Press, 1974. 5 on how much they enjoyed each . As can be seen, [Moulin, 1986] H. Moulin. Choosing from a tournament. Social the method accurately discovers two categories correspond- Choice and Welfare, 3(4):271–291, 1986. ing to individual and team sports, and identifies a third cate- [Negahban et al., 2012] S. Negahban, S. Oh, and D. Shah. Iterative gory for jogging; this is consistent with previous studies that ranking from pair-wise comparisons. In NIPS, 2012. have found that people tend to have strong likes or dislikes for jogging independent of their liking for other sports [Marden, [Rajkumar and Agarwal, 2014] A. Rajkumar and S. Agarwal. A ] [ statistical convergence perspective of algorithms for rank aggre- 1995 . Further examples and details can be found in Agar- gation from pairwise data. In ICML, 2014. wal and Saha, 2016]. [Rajkumar and Agarwal, 2016a] A. Rajkumar and S. Agarwal. Ranking from pairwise comparisons: The role of the pairwise 4 Conclusion preference matrix. Forthcoming, 2016. Given the increasing availability of ranking and choice data, [Rajkumar and Agarwal, 2016b] A. Rajkumar and S. Agarwal. there is considerable need for new and innovative computa- When can we rank well from comparisons of O(n log n) non- tional approaches to ranking and choice modeling. In this actively chosen pairs? Forthcoming, 2016. brief paper, we have highlighted some recent developments [Rajkumar et al., 2015] A. Rajkumar, S. Ghoshal, L.-H. Lim, and in ranking from pairwise comparisons and in automatic dis- S. Agarwal. Ranking from stochastic pairwise preferences: Re- covery of categories from choice data. There are many fas- covering Condorcet winners and tournament solution sets at the top. In ICML, 2015. 5Here we modeled each category via a multinomial logit choice [Wauthier et al., 2013] F. L. Wauthier, M. I. Jordan, and N. Jojic. model, which corresponds to taking X to be independent Gumbel ki Efficient ranking from pairwise comparisons. In ICML, 2013. random variables [McFadden, 1974].

4053