Collaborative Filtering: A Machine Learning Perspective

by

Benjamin Marlin

A thesis submitted in conformity with the requirements
for the degree of Master of Science
Graduate Department of Computer Science
University of Toronto

Copyright © 2004 by Benjamin Marlin


Abstract

Collaborative Filtering: A Machine Learning Perspective
Benjamin Marlin
Master of Science
Graduate Department of Computer Science
University of Toronto
2004

Collaborative filtering was initially proposed as a framework for filtering information based on the preferences of users, and has since been refined in many different ways. This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering. We analyze existing methods for the task of rating prediction from a machine learning perspective. We show that many existing methods proposed for this task are simple applications or modifications of one or more standard machine learning methods for classification, regression, clustering, dimensionality reduction, and density estimation. We introduce new prediction methods in all of these classes. We introduce a new experimental procedure for testing stronger forms of generalization than have been used previously. We implement a total of nine prediction methods, and conduct large-scale prediction accuracy experiments. We show interesting new results on the relative performance of these methods.


Acknowledgements

I would like to begin by thanking my supervisor Richard Zemel for introducing me to the field of collaborative filtering, for numerous helpful discussions about a multitude of models and methods, and for many constructive comments about this thesis itself. I would like to thank my second reader Sam Roweis for his thorough review of this thesis, as well as for many interesting discussions of this and other research. I would like to thank Matthew Beal for knowledgeably and enthusiastically answering more than a few queries about graphical models and variational methods. I would like to thank David Blei for our discussion of LDA and URP during his visit to the University this fall. I would also like to thank all the members of the machine learning group at the University of Toronto for comments on several presentations relating to collaborative filtering.

Empirical research into collaborative filtering methods is not possible without suitable data sets and large amounts of computer time. The empirical results presented in this thesis are based on the EachMovie and MovieLens data sets that have been generously made available for research purposes. I would like to thank the Compaq Computer Corporation for making the EachMovie data set available, and the GroupLens Research Project at the University of Minnesota for use of the MovieLens data set. I would like to thank Geoff Hinton for keeping us all well supplied with computing power.

On a personal note, I would also like to thank my good friends Horst, Jenn, Josh, Kevin, Liam, and Rama for many entertaining lunches and dinners, and for making the AI lab an enjoyable place to work. I would like to thank my parents, who have taught me much, but above all the value of hard work. Finally, I would like to thank my fiancée Krisztina, who has given me boundless support, encouragement, and motivation. She has graciously agreed to share me with the University while I pursue doctoral studies, and I thank her for that as well.


Contents

1 Introduction
2 Formulations
  2.1 A Space of Formulations
    2.1.1 Preference Indicators
    2.1.2 Additional Features
    2.1.3 Preference Dynamics
  2.2 Pure, Non-Sequential, Rating-Based Formulation
    2.2.1 Formal Definition
    2.2.2 Associated Tasks
3 Fundamentals
  3.1 Probability and Statistics
  3.2 Complexity Analysis
  3.3 Experimentation
    3.3.1 Experimental Protocols
    3.3.2 Error Measures
    3.3.3 Data Sets
    3.3.4 The Missing at Random Assumption
4 Classification and Regression
  4.1 K-Nearest Neighbor Classifier
    4.1.1 Neighborhood-Based Rating Prediction
    4.1.2 Complexity
    4.1.3 Results
  4.2 Naive Bayes Classifier
    4.2.1 Naive Bayes Rating Prediction
    4.2.2 Complexity
    4.2.3 Results
  4.3 Other Classification and Regression Techniques
5 Clustering
  5.1 Standard Clustering
    5.1.1 Rating Prediction
    5.1.2 Complexity
    5.1.3 Results
  5.2 Hierarchical Clustering
    5.2.1 Rating Prediction
6 Dimensionality Reduction
  6.1 Singular Value Decomposition
    6.1.1 Weighted Low Rank Approximations
    6.1.2 Learning with Weighted SVD
    6.1.3 Rating Prediction with Weighted SVD
    6.1.4 Complexity
    6.1.5 Results
  6.2 Principal Components Analysis
    6.2.1 Rating Prediction with PCA
  6.3 Factor Analysis and Probabilistic PCA
    6.3.1 Rating Prediction with Probabilistic PCA
7 Probabilistic Rating Models
  7.1 The Multinomial Model
    7.1.1 Learning
    7.1.2 Rating Prediction
    7.1.3 Complexity
    7.1.4 Results
  7.2 Mixture of Multinomials Model
    7.2.1 Learning
    7.2.2 Rating Prediction
    7.2.3 Complexity
    7.2.4 Results
  7.3 The Aspect Model
    7.3.1 Learning
    7.3.2 Rating Prediction
    7.3.3 Complexity
    7.3.4 Results
  7.4 The User Rating Profile Model
    7.4.1 Variational Approximation and Free Energy
    7.4.2 Learning Variational Parameters
    7.4.3 Learning Model Parameters
    7.4.4 An Equivalence Between The Aspect Model and URP
    7.4.5 URP Rating Prediction
    7.4.6 Complexity
    7.4.7 Results
  7.5 The Attitude Model
    7.5.1 Variational Approximation and Free Energy
    7.5.2 Learning
    7.5.3 Rating Prediction
    7.5.4 Binary Attitude Model
    7.5.5 Integer Attitude Model
    7.5.6 Complexity
    7.5.7 Results
8 Comparison of Methods
  8.1 Complexity
  8.2 Prediction Accuracy
9 Conclusions
  9.1 Summary
  9.2 Future Work
    9.2.1 Existing Models
    9.2.2 The Missing at Random Assumption
    9.2.3 Extensions to Additional Formulations
    9.2.4 Generalizations to Additional Applications
  9.3 The Last Word
Bibliography


List of Tables

3.1 EachMovie and MovieLens Data Set Statistics
4.1 PKNN-Predict: EachMovie Results
4.2 PKNN-Predict: MovieLens Results
4.3 NBClass-Predict: EachMovie Results
4.4 NBClass-Predict: MovieLens Results
5.1 K-Medians Clustering: EachMovie Results
5.2 K-Medians Clustering: MovieLens Results
6.1 wSVD-Predict: EachMovie Results
6.2 wSVD-Predict: MovieLens Results
7.1 MixMulti-Predict: EachMovie Results
7.2 MixMulti-Predict: MovieLens Results
7.3 Aspect-Predict: EachMovie Results
7.4 Aspect-Predict: MovieLens Results
7.5 URP: EachMovie Results
7.6 URP: MovieLens Results
7.7 AttBin-Predict: EachMovie Results
7.8 AttBin-Predict: MovieLens Results
8.1 Computational Complexity of Learning and Prediction Methods
8.2 Space Complexity of Learned Representation
8.3 EachMovie: Prediction Results
8.4 MovieLens: Prediction Results


List of Figures

3.1 EachMovie Rating Distributions
3.2 MovieLens Rating Distributions
4.1 Naive Bayes Classifier
4.2 Naive