Improvements in Holistic Recommender System Research
A DISSERTATION
SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA
BY

Daniel Allen Kluver

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

Joseph A. Konstan

August, 2018

© Daniel Allen Kluver 2018
ALL RIGHTS RESERVED

Dedication

This dissertation is dedicated to my family, my friends, my advisers John Riedl and Joseph Konstan, my colleagues, both at GroupLens research and at Macalester College, and everyone else who believed in me and supported me along the way. Your support meant everything when I couldn't support myself. Your belief meant everything when I couldn't believe in myself. I couldn't have done this without your help.

Abstract

Since the mid-1990s, recommender systems have grown to be a major area of deployment in industry and of research in academia. A through-line in this research has been the pursuit, above all else, of the perfect algorithm. With this admirable focus has come a neglect of the full scope of building, maintaining, and improving recommender systems. In this work I outline a system deployment and a series of offline and online experiments dedicated to improving our holistic understanding of recommender systems. This work explores the design, algorithms, early performance, and interfaces of recommender systems within the scope of how they are interconnected with other aspects of the system. This work explores many individual aspects of a recommender system while keeping in mind how they are connected to other aspects of the system.
The contributions of this thesis are: an exploration of the design of the BookLens system, a prototype recommender system for library-item recommendation; a methodology and exploration of algorithm performance for users with very few ratings, which shows that the popular Item-Item recommendation algorithm performs very poorly in this context; an exploration of the issues faced by Item-Item, as well as fixes for these issues confirmed by both an offline and an online analysis; and finally, the preference bits model for measuring the amount of noise and information contained in user ratings, as well as a rating support interface capable of reducing the noise in user ratings, leading to superior algorithm performance.

Supporting these contributions are the following specific methodological improvements: a bias-free methodology for measuring algorithm performance over a range of profile sizes; a prototype user-study design for investigating new-user recommendation through Amazon Mechanical Turk; the preference bits model, as well as derived measurements of preference bits per rating, per impression, and per second; and finally a sound experimental design that can be used to empirically measure preference bits values for a given interface. It is our hope that these methodological contributions can help researchers in the recommender systems field ask new questions and further the holistic study of recommender systems.

Contents

Dedication
Abstract
List of Tables
List of Figures

1 Introduction

2 Rating-Based Collaborative Filtering: Algorithms and Evaluation
  2.1 Introduction
    2.1.1 Examples of recommender systems
    2.1.2 Taxonomies of Recommendation Algorithms
  2.2 Concepts and Notation
  2.3 Baseline Predictors
  2.4 Nearest Neighbor Algorithms
  2.5 Matrix Factorization Algorithms
  2.6 Learning to Rank
  2.7 Other Algorithms
  2.8 Combining Algorithms
    2.8.1 Ensemble Recommendation
    2.8.2 Recommending for Novelty and Diversity
  2.9 Metrics and Evaluation

3 A Case Study: the BookLens Recommender System
  3.1 Past Work in Library Item Recommendation
  3.2 Development of the BookLens System
    3.2.1 Design Constraints
  3.3 The Federated Recommender
    3.3.1 The Centralized Recommender
    3.3.2 The Decentralized Recommender
    3.3.3 The Federated Recommender
  3.4 Case Study: The Design of BookLens
    3.4.1 Data Organization
    3.4.2 Structural Organization
  3.5 Case Study: Exploration of the BookLens Dataset
    3.5.1 Data on early use
    3.5.2 Investigation of catalog overlap
    3.5.3 What is the effect of rating pooling?
    3.5.4 Comparison of BookLens and MovieLens Rating Data
  3.6 Conclusions and Future work

4 Understanding Recommender Behavior for Users With Few Ratings
  4.1 Evaluation Methodologies
    4.1.1 Past Approaches
    4.1.2 Iterated Retain-n method
    4.1.3 Subsetting Retain-n method
  4.2 Evaluation of Common Algorithms
    4.2.1 Dataset
    4.2.2 Algorithms
    4.2.3 Metrics
    4.2.4 Results
  4.3 Conclusion

5 Improving Item-Item CF for Small Rating Counts
  5.1 Introduction
  5.2 Improving Item-Item - the Frequency Optimized Model
  5.3 Improving Item-Item - Damped Item-Item
  5.4 Offline Evaluation of the Improved Item-Item
    5.4.1 Small Profile Evaluation Results
    5.4.2 Large Profile Evaluation Results
    5.4.3 Damping Parameter Tuning Analysis
    5.4.4 Discussion
  5.5 A User Evaluation of Damped Item-Item Algorithms
    5.5.1 Experiment Design
    5.5.2 Results
  5.6 Summary

6 Preference Bits and the Study of Rating Interfaces
  6.1 Background
    6.1.1 Preference Elicitation
    6.1.2 Recommender Systems Research on Rating
    6.1.3 Alternate Designs for Preference Elicitation
    6.1.4 Information Theory
  6.2 Measuring Information Contained in Ratings
    6.2.1 Related Measurements
  6.3 Studies of Rating Scales
    6.3.1 Study 1: Simulated Rating System
    6.3.2 Study 2: Re-analysis of a past re-rating experiment
  6.4 Study 3: An Experiment With Rating Support Interfaces
    6.4.1 Rating Interfaces
    6.4.2 Measurements
    6.4.3 Experiment Structure
    6.4.4 Results
    6.4.5 Discussion
  6.5 Conclusions

7 Conclusion

References

Appendix A. Output of SEM model for user-centered evaluation of Item-Item improvements

Appendix B. Outline of the BookLens Core Web server API
  B.1 API overview
  B.2 Authentication
    B.2.1 Tokens
    B.2.2 Signing Requests
  B.3 Book
    B.3.1 Web Requests
  B.4 Opus
    B.4.1 Web Requests
  B.5 User
    B.5.1 Web Requests
  B.6 Review
    B.6.1 Web Requests
  B.7 Batch Requests
  B.8 Web Requests
  B.9 User Login

List of Tables

2.1 Summary of mathematical notation
2.2 A confusion matrix
2.3 Information about the MovieLens datasets
3.1 An overview of the MELSA partner libraries
3.2 A comparison of multi-community recommender designs
3.3 Overall usage statistics for the BookLens system, broken out by source of data
3.4 Measurement of catalog overlap
3.5 Measurements of catalog 1-coverage (percent of opuses at each library with at least one rating) with and without pooled ratings for each library
3.6 Average increase in monthly ratings at each library
4.1 Summary of algorithms
4.2 Summary of metrics
4.3 Summary of algorithm behavior for new users
5.1 Summary of algorithms including reason for inclusion in this experiment
5.2 AverageRating@20, AILS@20, and recommendation time in seconds for the algorithms in a standard large-profile evaluation
5.3 Summary of results of our analysis of improvements to the Item-Item Algorithm
5.4 The list of movies used in the user evaluation of the damped Item-Item algorithm
5.5 The questions in the first phase of our survey
5.6 The questions in our survey and their factor loadings
5.7 Regression coefficients between factors, and for conditions (relative to random recommendation) on goodness and diversity factors
5.8 Conditional probability of item response based on seen or unseen
5.9 Conditional probability of item response based on user condition
6.1 The probability distribution of ratings on the movie Titanic (1997) by users with known gender on MovieLens
6.2 Results from the re-analysis of the SCALES dataset
6.3 Mean rating time per item in seconds for each scale and domain based on Sparling and Sen's study
6.4 Our quantitative results for four interfaces
6.5 Items presented in the questionnaires

List of Figures

2.1 Screenshot of the Pandora music streaming service
2.2 Screenshot of the Jester joke recommender
2.3 Screenshot of the MovieLens home page
2.4 Screenshot of the Amazon home page
2.5 Examples of various rating scales used in the wild
2.6 An example ROC curve. Image used with permission from [41].
3.1 A timeline of the development of the BookLens system
3.2 Three recommender system designs capable of working with multiple communities of interest
3.3 A simplified data model for the BookLens system
3.4 The architecture of the BookLens System
3.5 The BookLens system (as deployed in one of our catalogs)
3.6 A time-line of the deployments of the BookLens system
3.7 Number of opuses with each rating count
3.8 Number of monthly rates per client with and without rating pooling
3.9 Number of ratings made by users of MovieLens and BookLens