Serendipity in Recommender Systems JYVÄSKYLÄ STUDIES in COMPUTING 281
Total Page:16
File Type:pdf, Size:1020Kb
JYVÄSKYLÄ STUDIES IN COMPUTING 281 Denis Kotkov Serendipity in Recommender Systems JYVÄSKYLÄ STUDIES IN COMPUTING 281 Denis Kotkov Serendipity in Recommender Systems Esitetään Jyväskylän yliopiston informaatioteknologian tiedekunnan suostumuksella julkisesti tarkastettavaksi yliopiston Agora-rakennuksen Alfa-salissa kesäkuun 7. päivänä 2018 kello 12. Academic dissertation to be publicly discussed, by permission of the Faculty of Information Technology of the University of Jyväskylä, in building Agora, Alfa hall, on June 7, 2018 at 12 o’clock noon. UNIVERSITY OF JYVÄSKYLÄ JYVÄSKYLÄ 2018 Serendipity in Recommender Systems JYVÄSKYLÄ STUDIES IN COMPUTING 281 Denis Kotkov Serendipity in Recommender Systems UNIVERSITY OF JYVÄSKYLÄ JYVÄSKYLÄ 2018 Editors Marja-Leena Rantalainen Faculty of Information Technology, University of Jyväskylä Pekka Olsbo, Ville Korkiakangas Publishing Unit, University Library of Jyväskylä Permanent link to this publication: http://urn.fi/URN:ISBN:978-951-39-7438-1 URN:ISBN:978-951-39-7438-1 ISBN 978-951-39-7438-1 (PDF) ISBN 978-951-39-7437-4 (nid.) ISSN 1456-5390 Copyright © 2018, by University of Jyväskylä Jyväskylä University Printing House, Jyväskylä 2018 ABSTRACT Kotkov, Denis Serendipity in Recommender Systems Jyväskylä: University of Jyväskylä, 2018, 72 p. (+included articles) (Jyväskylä Studies in Computing ISSN 1456-5390; 281) ISBN 978-951-39-7437-4 (nid.) ISBN 978-951-39-7438-1 (PDF) Finnish summary Diss. The number of goods and services (such as accommodation or music streaming) offered by e-commerce websites does not allow users to examine all the avail- able options in a reasonable amount of time. Recommender systems are auxiliary systems designed to help users find interesting goods or services (items) on a website when the number of available items is overwhelming. Traditionally, rec- ommender systems have been optimized for accuracy, which indicates how often a user consumed the items recommended by system. To increase accuracy, rec- ommender systems often suggest items that are popular and suitably similar to items these users have consumed in the past. As a result, users often lose inter- est in using these systems, as they either know about the recommended items already or can easily find these items themselves. One way to increase user satis- faction and user retention is to suggest serendipitous items. These items are items that users would not find themselves or even look for, but would enjoy consum- ing. Serendipity in recommender systems has not been thoroughly investigated. There is not even a consensus on the concept’s definition. In this dissertation, serendipitous items are defined as relevant, novel and unexpected to a user. In this dissertation, we (a) review different definitions of the concept and evaluate them in a user study, (b) assess the proportion of serendipitous items in a typical recommender system, (c) review ways to measure and improve serendip- ity, (d) investigate serendipity in cross-domain recommender systems (systems that take advantage of multiple domains, such as movies, songs and books) and (e) discuss challenges and future directions concerning this topic. We applied a Design Science methodology as the framework for this study and developed four artifacts: (1) a collection of eight variations of serendipity definition, (2) a measure of the serendipity of suggested items, (3) an algorithm that generates serendipitous suggestions, (4) a dataset of user feedback regarding serendipitous movies in the recommender system MovieLens. These artifacts are evaluated using suitable methods and communicated through publications. Keywords: recommender systems, serendipity, relevance, novelty, unexpected- ness, personalization, evaluation, recommendation algorithms, eval- uation metrics, offline experiments, user study, serendipity metrics Author Denis Kotkov Faculty of Information Technology University of Jyväskylä Finland Supervisors Professor Dr. Jari Veijalainen Faculty of Information Technology University of Jyväskylä Finland Professor Dr. Pertti Saariluoma Faculty of Information Technology University of Jyväskylä Finland Dr. Shuaiqiang Wang Data Science Lab JD.COM China Reviewers Postdoctoral Researcher, Dr. Fedelucio Narducci Department of Computer Science University of Bari “Aldo Moro” Italy Senior Lecturer, Dr. Derek Bridge Department of Computer Science University College Cork Ireland Opponent Associate Professor, Dr. Konstantinos Stefanidis Faculty of Natural Sciences University of Tampere Finland ACKNOWLEDGEMENTS Many people have helped me conduct the research presented in this dissertation. I would first like to thank my supervisors, Prof. Jari Veijalainen, Dr. Shuaiqiang Wang and Prof. Pertti Saariluoma for their support during this research. I am thankful to the external reviewers and the opponent for agreeing to review my dissertation and providing their comments. I would like to thank my coauthors, Prof. Joseph A. Konstan, Qian Zhao, Dr. Alexander Semenov, Dr. Aleksandr Farseev and Prof. Tat-Seng Chua. I learned a lot from working with them. I also appreciate the help from people who are not coauthors of this disserta- tion and who might not have been mentioned, but who provided their comments or guidance and contributed to this research. I would like to express my gratitude to Dr. Max Harper and the GroupLens team at the University of Minnesota, who participated in my pilot experiments and guided me during my research visit. I am thankful to the members of the Somea team and the MineSocMed project at the University of Jyväskylä for their comments and guidance, and I would like to thank Gaurav Pandey, Dr. Sergey Chernov, Alexandr Maslov, Dr. Oleksiy Mazhelis and Daria Wadsworth for their comments and suggestions. I wish to thank my family and my friends for their support and interest in my work. Finally, I would like to acknowledge the grants that financially supported my research: Academy of Finland grant #268078 (MineSocMed), the grant from the KAUTE Foundation and the grants provided by the University of Jyväskylä (including the mobility grant). Jyväskylä, May 2018 Denis Kotkov “Recommendations and personalization live in the sea of data we all create as we move through the world, including what we find, what we discover, and what we love. We’re convinced the future of recommendations will further build on intelligent computer algorithms leveraging collective human intelligence. The future will continue to be computers helping people help other people.” – Smith, B. & Linden, G. 2017. Two decades of recommender systems at amazon.com. IEEE Internet Computing 21 (3), 12–18. LIST OF ACRONYMS IBCF Item-Based Collaborative Filtering kFN k-Furthest Neighbor kNN k-Nearest Neighbor MAE Mean Absolute Error MAP Mean Average Precision MF Matrix Factorization NDCG Normalized Discounted Cumulative Gain RMSE Root Mean Squared Error RWR Random Walk With Restarts RWR-KI Random Walk With Restarts Enhanced With Knowledge Infusion SOG Serendipity-Oriented Greedy SVD Singular Value Decomposition TD Topic Diversification UBCF User-Based Collaborative Filtering LIST OF FIGURES FIGURE 1 A MovieLens survey page ..................................................... 24 LIST OF TABLES TABLE 1 Design science research methodology process for this disser- tation................................................................................... 22 TABLE 2 Notations ............................................................................. 28 TABLE 3 An example of user-item rating matrix.................................... 32 TABLE 4 Definitions of serendipity. The sign “+” indicates inclusion of a component to the definition of serendipity. ........................... 39 TABLE 5 Statements that we asked users to rate regarding each movie. ... 40 CONTENTS ABSTRACT ACKNOWLEDGEMENTS LIST OF ACRONYMS LIST OF FIGURES LIST OF TABLES CONTENTS LIST OF INCLUDED ARTICLES 1 INTRODUCTION ............................................................................ 15 1.1 Motivation and research questions ............................................. 17 1.2 Research process ...................................................................... 20 1.3 Research methods..................................................................... 22 1.3.1 Literature review ........................................................... 22 1.3.2 The user study............................................................... 23 1.3.3 Offline evaluations......................................................... 24 1.4 The dissertation structure .......................................................... 25 2 RELATED WORK ............................................................................ 26 2.1 Recommender systems.............................................................. 26 2.1.1 History of recommender systems .................................... 26 2.1.2 Evaluation strategies...................................................... 27 2.1.3 Accuracy metrics ........................................................... 28 2.1.4 Diversity metrics ........................................................... 30 2.1.5 Accuracy-oriented algorithms......................................... 31 2.2 Serendipity: history and definition ............................................. 33 2.3 Cross-domain recommender systems ......................................... 34 3 MAIN RESULTS AND CONTRIBUTIONS.......................................... 37 3.1 Serendipity in recommender systems ......................................... 37 3.1.1