Addressing Present Bias in Movie Recommender Systems and Beyond
Total Page:16
File Type:pdf, Size:1020Kb
Addressing Present Bias in Movie Recommender Systems and Beyond Kai Lukoffa aUniversity of Washington, Seattle, WA, USA Abstract Present bias leads people to choose smaller immediate rewards over larger rewards in the future. Recom- mender systems often reinforce present bias because they rely predominantly upon what people have done in the past to recommend what they should do in the future. How can recommender systems over- come this present bias to recommend items in ways that match with users’ aspirations? Our workshop position paper presents the motivation and design for a user study to address this question in the domain of movies. We plan to ask Netflix users to rate movies that they have watched in the past for thelong- term rewards that these movies provided (e.g., memorable or meaningful experiences). We will then evaluate how well long-term rewards can be predicted using existing data (e.g., movie critic ratings). We hope to receive feedback on this study design from other participants at the HUMANIZE workshop and spark conversations about ways to address present bias in recommender systems. Keywords present bias, cognitive bias, algorithmic bias, recommender systems, digital wellbeing, movies 1. Introduction such as Schindler’s List [2]. Recommender systems (RS), algorithmic People often select smaller immediate re- systems that predict the preference a user wards over larger rewards in the future, a would give to an item, often reinforce present phenomenon that is known as present bias bias. Today, the dominant paradigm of rec- or time discounting. This applies to deci- ommender systems is behaviorism: recom- sions such as what snack to eat [1,2], how mendations are selected based on behavior much to save for retirement [3], or which traces (“what users do”) and they largely ne- movies to watch [2]. For example, when peo- glect to capture explicit preferences (“what ple choose a movie to watch this evening users say”) [4]. Since “what users do” reflects they often choose guilty pleasures like The a present bias, RS that rely upon such actions Fast and The Furious, which are enjoyable in- to train their recommendations will priori- the-moment, but then quickly forgotten. By tize items that offer high short-term rewards contrast, when they choose a movie to watch but low long-term rewards. In this way, rec- next week, they are more likely to choose ommender systems may reinforce what the films that are challenging but meaningful, current self wants rather than helping peo- ple reach their ideal self [5]. Joint Proceedings of the ACM IUI 2021 Workshops, April This position paper for the HUMANIZE 13-17, 2021, College Station, USA workshop proposes a study design to address " [email protected] (K. Lukoff) these topics in the domain of movies. In ~ https://kailukoff.com (K. Lukoff) Study 1, a survey of Netflix users, we in- 0000-0001-5069-6817 (K. Lukoff) © 2020 Copyright for this paper by its authors. Use permit- vestigate: How should a RS make recom- ted under Creative Commons License Attribution 4.0 Inter- national (CC BY 4.0). mendations by asking ordinary users about CEUR Workshop Proceedings CEUR http://ceur-ws.org Workshop ISSN 1613-!!" the rather academic concept of "long-term Proceedings (CEUR-WS.org) rewards"? And can long-term rewards be to transform their behavior [9]. Lukoff et predicted based on existing data (e.g., movie al. previously explored how experience sam- critic ratings)? In Study 2, a participatory de- pling can be used to measure how meaning- sign exercise with a movie RS, we ask: How ful people find their interactions with smart- do users want a RS to balance short-term phone apps immediately after use [10]. How- and long-term rewards? And what controls ever, building such reflection into RSs re- would users like to have over such a RS? mains a major challenge because it is un- We expect that our eventual findings will clear how and when a system should ask a also inform the design of recommender sys- user about the "long-term rewards" of an ex- tems that address present bias in other do- perience. It may be that the common ap- mains such as news, food, and fitness. proach of asking users to rate items on a "5- star" scale reflects a combination of short- term and long-term rewards, and that a dif- 2. Related Work ferent prompt is required to capture evalua- tions of long-term rewards more specifically. The social psychologist Daniel Kahneman It is also an open question how well such describes people as having both an experi- long-term rewards can be inferred from ex- encing self, who prefers short-term rewards isting data. like pleasure, and a remembering self, who The third perspective leverages the wis- prefers long-terms rewards like meaningful dom of the crowd, by using the collective experiences [6]. Lyngs et al. describe three elicited preferences of similar users with more different approaches to the thorny question experience to make recommendations. Rec- of how to measure a user’s “true preferences" ommender systems today tend to use the "be- [5]. The first approach aligns with the expe- havior of the crowd" as input into their mod- riencing self, the second with the remember- els, in the form of behavioral data of similar ing self, and the third with the wisdom of the users, but largely neglect elicited preferences crowd. [4]. The first approach follows the experienc- Finally, Ekstrand and Willemsen propose ing self and asserts that what users do is what participatory design as a general corrective they really want, which many in the Silicon to the behaviorist bias of recommender sys- Valley push one step further to what we can tems [4]. Harambam et al. explored using get users to do is what they really want [7]. participatory methods to evaluate a recom- Social media that are financed by advertising mender system for news, suggesting that giv- are "compelled to find ways to keep users en- ing users control might mitigate filter bub- gaged for as long as possible" [8]. To achieve bles in news consumption [11]. Participa- this, social media services often give the ex- tory design is a promising way to investigate periencing self exactly what it wants, know- how users want a RS to balance short-term ing that it will override the preferences of the and long-term rewards and the controls they remembering self and lead the user to stay would like to have. engaged for longer than they had intended. The second approach prioritizes the re- membering self, calling for systems to 3. Proposed Study Design prompt the user to reflect on their ideal self. In this vein, Slovak et al. propose to design In what follows, we propose a study design to to help users to reflect upon how they wish better understand how to measure the long- term rewards of items in the context of movie • For long-term rewards: How reward- recommendations. We hope to receive feed- ing was this movie after you watched back on this study design from other par- it? ticipants at the HUMANIZE workshop and prompt conversations about ways to address Participants will rate all questions on a 1-5 present bias in recommender systems more scale, from "Not at all" to "Very." broadly. We are also interested in understanding what other constructs are correlated with the both short-term and long-term rewards. To 3.1. Study 1: Eliciting user this end, we are also asking about related preferences for long-term constructs, such as: rewards • How enjoyable was this movie while Our first study is a survey of Netflix users you were watching it? that we are currently piloting, which ad- dresses two research questions: • How interesting was this movie while you were watching it? • RQ 1a: What wording should be used to ask users to rate the “short-term re- • How meaningful was this movie after wards” and “long-term rewards” of a you watched it? movie? In other words, what wording • How memorable was this movie after captures the right construct and makes you watched it? sense to users? We are currently piloting all questions us- • RQ 1b: How well can a recommender ing a talk-aloud protocol in which partici- system predict the long-term rewards pants explain their thinking to us as they of a movie for an individual using data complete the survey. We are checking to other than explicit user ratings? make sure that the wording makes sense to participants and to identify constructs that 3.1.1. Study 1 Methods are related to short-term and long-term re- We will ask Netflix users who watch at least wards. The constructs that are most closely one movie per month to download their past related to these rewards in the piloting will viewing history and share it with us. We will be included in the final survey, so that each ask them to rate 30 movies: 10 watched in movie will be rated for a cluster of constructs the past year, 10 watched 1-2 years ago, and all related to short-term and long-term re- 10 watched 2-3 years ago. Participants will wards. For the final survey, we plan to recruit rate each movie for short-term reward, long- about 50 Netflix users to generate a total of term reward, and other constructs that might 1,500 movie ratings. be correlated with these rewards (e.g., mean- ingfulness, memorability). 3.1.2. Study 1 Planned Analysis The current wording of our questions is: To address RQ 1a, we will report the qual- • For short-term rewards: How reward- itative results of our talk-aloud piloting of ing was this movie while you were the survey wording.