A User-Centric Evaluation of the Netflix Recommender System

User experience beyond algorithm accuracy

Abstract

This study focuses on the user experience of recommender systems. User experience can be defined as how a user evaluates the interaction with a system or service. Recommender systems incorporate machine learning algorithms in order to provide recommendations from a large set of items. As a result, they are often evaluated by their objective recommendation accuracy. However, research on the topic states that user-centric, subjective evaluations should always be considered as well. In this study, the user experience of a real-world recommender is analyzed: the Netflix platform. The goal of this study is to characterize Netflix’s view on user experience and to compare it to the perspective of actual Netflix users. This is achieved by analyzing the Netflix recommender system using existing literature on user experience in recommender systems, and by executing a controlled experiment among a selection of Netflix users. The controlled experiment features a task scenario for the participants, followed by a user experience survey. The results of this study provide insight into how the Netflix platform is evaluated by its users, what they believe to be important in their user experience, and how this compares to Netflix’s vision.

Robin Schouten 10743294
Supervisor: dr. J.A.C. Sandberg
Second examiner: drs. A.L. van Pappelendam
Bachelor thesis Information Science
June, 2019
Faculty of Science | University of Amsterdam

Table of contents

1. Introduction
2. Theoretical framework
2.1 User experience in recommender systems
2.2 A user-centric framework for the evaluation of recommender systems
2.3 Netflix’s view on user experience
2.3.1 Netflix’s focus on algorithm accuracy
2.3.2 The Netflix recommender system
2.4 Hypotheses/predictions
3. Methodology
3.1 Participants
3.2 Task scenario
3.3 User experience survey
3.4 Procedure
4. Results
5. Conclusion and discussion
6. Limitations and further research
6.1 Limitations
6.2 Further research
References
Appendix: User experience survey

1. Introduction

We live in a time in which recommender systems are found everywhere, from social media to e-commerce to entertainment platforms. More than ever, people want fewer but more personalized options, in order to save time and make better choices. Recommender systems help people achieve this by providing personalized recommendations from a large catalog of items, generated by intricate machine learning algorithms. For some time now, developers of recommender systems have worked to design increasingly accurate algorithms in order to provide users with recommendations that are even more closely tailored to their preferences. As a result, it is often assumed that more accurate recommendations directly lead to a better user experience. To explore this assumption further, this study focuses on the user experience of a real-world recommender system: Netflix.

Research on the user experience of recommender systems has seen a paradigm shift from mainly considering algorithm accuracy to acknowledging that user experience is a more complex concept. Users want more than just a precise representation of their preferences in their recommendations (Ricci, Rokach, & Shapira, 2015). Consequently, evaluating user experience is a challenging effort that requires an approach incorporating all of its relevant aspects (Konstan & Riedl, 2012). Several researchers have explored the dynamics of user experience in recommender systems. Knijnenburg, Willemsen, & Kobsa (2011) created a pragmatic framework for the evaluation of recommender systems that goes beyond algorithm accuracy and emphasizes users’ self-reported experience.
Pu, Chen, & Hu (2011) designed a similar framework. It also treats user experience in recommender systems as multidimensional but, unlike the aforementioned framework, does not include personal and situational characteristics. Both frameworks provide methodologies for evaluating recommender systems from the perspective of the user.

The academic relevance of this study is that it applies a conceptual recommender system evaluation framework to a real-world recommender system (Netflix). Its practical relevance is that it may provide valuable findings for Netflix as a company by evaluating the user experience of its recommender system in a user-centric manner, going beyond algorithm accuracy. The main goal is to characterize Netflix’s view on user experience and to compare it to the experience of actual users of the platform. This is done by analyzing the Netflix recommender system using available literature and by evaluating its user experience through a controlled experiment. For this study, the following research questions have been formulated:

- How can Netflix’s view on user experience be characterized?
- How do Netflix users evaluate the Netflix recommender system?
- How does Netflix’s view on user experience compare to the experience of Netflix users?

The first question is answered in the theoretical framework. The second question is answered partly in the theoretical framework and partly through the results of a controlled experiment. Once the results of the experiment have been analyzed, the final question can be answered.

2. Theoretical framework

2.1 User experience in recommender systems

A recommender system is a system designed to provide personalized recommendations to a user based on machine learning algorithms. Higher algorithm accuracy leads to more accurate recommendations.
The purpose of a recommender is to help a user make better, faster and more relevant choices from a large set of items, such as a web shop product catalog or a movie database. Without a recommender, all users would be presented with the same catalog items. In most cases, the recommendations feature some form of personalization. This personalization is often based on a user’s previous interaction with the platform and the choices (s)he has made in the past (Ricci et al., 2015).

As soon as a person starts using a platform with a recommender, a personal profile is built up that keeps track of their interaction and choice behavior. When the profile is first created, there is naturally no past data on which to base recommendations. This is called the cold start problem. It is often mitigated by offering generalized recommendations at first and steadily increasing personalization as more user data becomes available (Chang, Harper, & Terveen, 2015).

To provide users with relevant, personalized recommendations, their preferences must therefore first be characterized. Platforms achieve this through preference elicitation, a method of obtaining feedback data from users. This can be done in two ways: implicitly and explicitly (Rashid, Karypis, & Riedl, 2008). Implicit feedback means that the system gathers behavioral user data, such as viewing history and clicking patterns, to estimate the user’s preferences. Explicit feedback relies on users expressing their preferences through a feedback system, for example by leaving reviews or ratings on items in the recommender’s catalog. The recommender system continually learns from this feedback data, which allows it to make increasingly accurate predictions.

There are two main types of recommender systems: collaborative filtering recommenders and content-based recommenders (Ricci et al., 2015). A combination of the two is called a hybrid recommender.
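The following is a minimal Python sketch of the two types and their combination. A collaborative filtering score reflects how strongly users with similar viewing behavior chose an item. A content-based score reflects how much an item's characteristics (here, genres) overlap with items the user already chose.

The sketch rests on several assumptions. It uses Jaccard overlap as the similarity measure and a simple weighted blend, controlled by a hypothetical parameter `alpha`, as the hybrid. The users, items and genre tags are invented toy data. None of this describes how Netflix's own recommender is implemented.

```python
# Hypothetical toy data (not Netflix data): implicit feedback as the set of
# items each user has watched, plus descriptive genre tags per item.
WATCHED = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"D", "E"},
}
GENRES = {
    "A": {"sci-fi", "thriller"},
    "B": {"sci-fi", "drama"},
    "C": {"romance", "comedy"},
    "D": {"sci-fi", "documentary"},
    "E": {"drama", "sci-fi", "romance"},
}


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]: 0 = nothing shared, 1 = identical sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collaborative_score(user: str, item: str) -> float:
    """User-based collaborative filtering: the similarity-weighted share of
    other users who watched `item`; users count as similar when their
    viewing histories overlap."""
    mine = WATCHED[user]
    sims = {other: jaccard(mine, theirs)
            for other, theirs in WATCHED.items() if other != user}
    total = sum(sims.values())
    if total == 0:
        return 0.0  # no similar users, so nothing to base a CF score on
    return sum(s for other, s in sims.items() if item in WATCHED[other]) / total


def content_score(user: str, item: str) -> float:
    """Content-based filtering: the best genre overlap between `item` and
    any item the user has already watched."""
    return max((jaccard(GENRES[item], GENRES[seen]) for seen in WATCHED[user]),
               default=0.0)


def hybrid_score(user: str, item: str, alpha: float = 0.5) -> float:
    """Assumed linear hybrid: alpha=1 is pure CF, alpha=0 is pure content-based."""
    return (alpha * collaborative_score(user, item)
            + (1 - alpha) * content_score(user, item))


def recommend(user: str, alpha: float = 0.5, n: int = 2) -> list[str]:
    """Rank the items the user has not watched yet by hybrid score."""
    unseen = [item for item in GENRES if item not in WATCHED[user]]
    return sorted(unseen, key=lambda item: hybrid_score(user, item, alpha),
                  reverse=True)[:n]


if __name__ == "__main__":
    print(recommend("u1", alpha=1.0, n=1))  # pure collaborative filtering
    print(recommend("u1", alpha=0.0, n=1))  # pure content-based filtering
    print(recommend("u1"))                  # equal-weight hybrid
```

With this toy data the two approaches disagree for user u1. Pure collaborative filtering ranks C first, because the users whose histories overlap with u1's also watched C. Pure content-based filtering ranks E first, because E shares the most genres with B, which u1 has watched. The equal-weight hybrid returns C followed by E.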
A collaborative filtering recommender provides recommendations based on how similar users have interacted with catalog items, as captured through implicit and/or explicit feedback. An example is the e-commerce company Amazon, which suggests items based on the purchase history and/or reviews of similar customers. In contrast, a content-based recommender bases its recommendations not on the behavior of similar users, but on the similarity between the characteristics of the items in the content catalog. An example of this is the Internet Movie Database (IMDb), which recommends movies and series based on criteria such as overlapping cast members or a shared genre.

User experience (UX) has proven difficult for researchers to define, which has resulted in a variety of definitions. An early, general approach to defining UX is that of Hassenzahl (2008): “[UX is] a momentary, primarily evaluative feeling (good-bad) while interacting with a product or service.” Knijnenburg, Willemsen, Gantner, Soncu, & Newell (2012) define user experience in recommender systems as “the users’ evaluation of their interaction with the system”. Konstan & Riedl (2012) define it as “the delivery of the recommendations to the user and the interaction of the user with those recommendations”, a definition that focuses more on the recommendations provided by a recommender system. For this study, the definition of Knijnenburg et al. is used, as it emphasizes the perspective of the user and how (s)he experiences the recommender system.

2.2 A user-centric framework for the evaluation of recommender systems

Knijnenburg et al. (2012) state that UX should inherently be considered from the perspective of the user. Additionally, they stress that UX does not exist in a vacuum and that several other aspects influence it. As a result, they have suggested a comprehensive
