Leveraging Collective Intelligence in Recommender System
A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA

BY

Shuo Chang

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

Advisor: Loren G. Terveen

August 2016

© Shuo Chang 2016. ALL RIGHTS RESERVED.

Acknowledgements

Throughout the five years of my graduate school, I have received a great deal of support from my family, friends and colleagues. This thesis would not have been possible without them.

First and foremost, I thank my advisor, Dr. Loren Terveen, for sharing his wisdom and helping me grow. He gave me the freedom to pursue my research interests and challenged me to be better. During difficult times, he consoled me with his kindness and patience. I am forever grateful for what I have learned from him during these five years.

I would also like to thank my committee members, Dr. Joseph Konstan, Dr. George Karypis and Dr. Yuqing Ren, for providing valuable feedback.

I thank Ed Chi, Peng Dai and Elizabeth Churchill for giving me internship opportunities and being awesome mentors. During the internships, I was fortunate to receive guidance and support from Atish Das Sarma, Lichan Hong, Jilin Chen and Zhiyuan Cheng.

I am very thankful to be a member of the GroupLens research lab, which will always have a special place in my heart. I'd like to thank Dr. John Riedl for welcoming me to the GroupLens family. I truly enjoyed working and being friends with Max Harper, Vikas Kumar, Daniel Kluver, Tien Nguyen, Hannah Miller, Jacob Thebault-Spieker, Aditya Pal, Isaac Johnson, Qian Zhao, Dr. Brent Hecht and other members of GroupLens.

Last but not least, I would like to thank my parents for giving me the inspiration to pursue a PhD, and Steve Gartland and Merlajean Gartland for making me feel at home in a foreign country. Finally, I am thankful to my wife, Jingnan Zhang, for her encouragement, patience and love. I am lucky to have her to share all the laughter and tears while pursuing my dream.

Dedication

To John Riedl, a greatly missed mentor and role model.

Abstract

Recommender systems, since their introduction 20 years ago, have been widely deployed in web services to alleviate information overload for users. Driven by the business objectives of their applications, the focus of recommender systems has shifted from accurately modeling and predicting user preferences to offering a good personalized user experience. The latter is difficult because many factors, e.g., a user's tenure, the context of a recommendation and the transparency of the recommender system, affect users' perception of recommendations. Many of these factors are subjective and not easily quantifiable, posing challenges to recommender algorithms.

When purely algorithmic solutions reach their limits in providing a good user experience in recommender systems, we turn to the collective intelligence of humans and computers. Computers and humans complement each other: computers are fast at computation and data processing and have accurate memory; humans are capable of complex reasoning, being creative and relating to other humans. In fact, such close collaboration between humans and computers has precedent: after chess master Garry Kasparov lost to the IBM computer "Deep Blue", he invented a new form of chess, advanced chess, in which a human player and a computer program team up to play against other such pairs.
In this thesis, we leverage the collective intelligence of humans and computers to tackle several challenges in recommender systems and demonstrate designs of such hybrid systems. We make contributions to the following aspects of recommender systems: providing a better new user experience, enhancing the topic modeling component for items, composing better recommendation sets and generating personalized natural language explanations. These four applications demonstrate different ways of designing systems with collective intelligence, applicable to domains other than recommender systems. We believe the collective intelligence of humans and computers can power more intelligent, user-friendly and creative systems, worthy of continued research effort in the future.

Contents

Acknowledgements
Dedication
Abstract
List of Tables
List of Figures

1 Introduction
  1.1 What is Collective Intelligence?
  1.2 Challenges in Recommender System
  1.3 Leverage Collective Intelligence in Recommender Systems
  1.4 Research Platforms
    1.4.1 MovieLens
    1.4.2 Google+
  1.5 Thesis Overview

2 Interactive New User Onboarding Process for Recommender Systems
  2.1 Introduction
  2.2 Related work
  2.3 Design Space Analysis
    2.3.1 Design space
    2.3.2 Design challenges
  2.4 Feasibility study
    2.4.1 Data
    2.4.2 Method
    2.4.3 Baseline
    2.4.4 Metrics
    2.4.5 Results
  2.5 User Experiment
    2.5.1 Overview of the cluster-picking interface
    2.5.2 Method
    2.5.3 Results
  2.6 Discussion

3 Automatic Topic Labeling for Multimedia Social Media Posts
  3.1 Introduction
  3.2 Related Work
    3.2.1 Crowdsourcing
    3.2.2 Topic Extractors and Annotators
    3.2.3 Topic Labeling for Twitter
  3.3 Our Approach
    3.3.1 Single-Source Annotators
    3.3.2 Crowdsourcing Training Labels
    3.3.3 Supervised Ensemble Model
  3.4 Evaluation
    3.4.1 Evaluation Setup
    3.4.2 Binary Classification for Main Topic Labels
    3.4.3 Multiclass Classification of Topic Labels
  3.5 Discussion

4 CrowdLens: Recommendation Powered by Crowd
  4.1 Introduction
  4.2 Related work
  4.3 The CrowdLens Framework
    4.3.1 Recommendation Context: Movie Groups
    4.3.2 Crowd Recommendation Workflow
    4.3.3 User Interface
  4.4 Experiment
    4.4.1 Obtaining Human Quality Judgments
  4.5 Study: Recommendations
    4.5.1 Different pipelines yield different recommendations
    4.5.2 Measured Quality: Algorithm slightly better
    4.5.3 Judged Quality: Crowdsourcing pipelines slightly preferred
    4.5.4 Diversity: A trend for crowdsourcing
    4.5.5 Crowdsourcing may give less common recommendations
    4.5.6 Recency: Little difference
    4.5.7 Discussion
  4.6 Study: Explanations
    4.6.1 Explanation quality
    4.6.2 Features of Good Explanations
    4.6.3 Discussion
  4.7 Summary Discussion

5 Personalized Natural Language Recommendation Explanations Powered by Crowd
  5.1 Introduction
  5.2 Design Space and Related Work
  5.3 System Overview
  5.4 Experiment Platform
  5.5 Overview of System Processes
    5.5.1 Select Key Topical Dimensions of Items
    5.5.2 Generate Natural Language Explanations for The Key Topical Dimensions
    5.5.3 Model Users' Preferences and Present Explanations in a Personalized Fashion
  5.6 User Evaluation
    5.6.1 Survey Design
    5.6.2 Results
  5.7 Discussion

6 Conclusion
  6.1 Summary of Contributions
  6.2 Implications
  6.3 Future Work

References

Appendix A. Glossary and Acronyms
  A.1 Additional Figures

List of Tables

2.1 Example of weighted aggregation of ranked item lists. List 1 has 2 points and list 2 has 1 point. We assign scores to items in the lists based on ranking. We re-rank items based on the weighted average of item scores in the two lists and take the top 3 (a brief code sketch of this procedure follows this list).

2.2 User-expressed familiarity with recommended movies and prediction accuracy. Familiarity is represented by the average number of movies (out of 10) that users had heard of or seen. Prediction accuracy is measured by average RMSE.

3.1 Comparison of two answer statistics for tasks with and without verifiable questions ("VQ" and "No VQ"). Workers spend more time on tasks and are more likely to reach agreement when verifiable questions are present.

3.2 Unbalanced class distribution in the gold standard data set.

3.3 Comparison of the best performing ensemble model and single-source annotators, listed above. "All annotator" predicts the union of topic labels from single-source annotators as positive. The best performing algorithm is GBC trained on all five features, which achieves the best overall F1-score here.

3.4 The F1-scores of the best-performing model of every classification algorithm for multiclass classification, along with their feature combinations. GBC performs the best. All ensemble learning algorithms outperform the baseline algorithm, which consistently predicts the most popular category, "Main or Important".

4.1 Final recommendation lists from the five pipelines for the movie group shown in Figure 4.1.

4.2 Average number of overlapping movies recommended between any pair of pipelines across 6 movie groups. Each pipeline generates 5 movies as final recommendations. Standard deviations are included in parentheses.

4.3 Measured and human-judged quality of recommended movies. For a movie group in a pipeline, measured quality is computed as the average of ratings (on a 5-star scale) on recommended movies from MovieLens users who indicated a preference for the movie group. User judgments from the online evaluation range from -2 (very inappropriate) to 2 (very appropriate). Both columns show the average for all pipelines across six movie groups.

4.4 Extracted features of recommendation explanations, along with their effect and statistical significance in a logistic regression model predicting whether an explanation is evaluated as good or bad by MovieLens users.

5.1 Summary of questions asked in the 3-stage user survey.
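As a rough illustration of the weighted aggregation of ranked item lists described in the caption of Table 2.1, the following is a minimal sketch. The linear position score, the function name aggregate_ranked_lists, and the example movie titles are assumptions made for illustration; the thesis does not specify the exact scoring function, so this should be read as one plausible instantiation rather than the method itself.

```python
# Minimal sketch of weighted rank aggregation (cf. Table 2.1).
# Assumption: each item's position score is (list length - rank index);
# each list's scores are weighted by its points, then averaged.

def aggregate_ranked_lists(ranked_lists, weights, top_k=3):
    """Combine several ranked item lists into one list of top_k items."""
    total_weight = sum(weights)
    scores = {}
    for items, weight in zip(ranked_lists, weights):
        for rank, item in enumerate(items):
            position_score = len(items) - rank  # earlier rank -> higher score
            scores[item] = scores.get(item, 0.0) + weight * position_score
    # Weighted average of position scores across the input lists.
    averaged = {item: s / total_weight for item, s in scores.items()}
    return sorted(averaged, key=averaged.get, reverse=True)[:top_k]

# Hypothetical example: list 1 carries 2 points, list 2 carries 1 point.
list_1 = ["Toy Story", "Up", "Inside Out", "Cars"]
list_2 = ["Up", "Frozen", "Toy Story", "Brave"]
print(aggregate_ranked_lists([list_1, list_2], weights=[2, 1]))
```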