Towards Conversational Recommender Systems

Konstantina Christakopoulou*
University of Minnesota, United States
[email protected]

Filip Radlinski, Katja Hofmann
Microsoft, Cambridge, UK
{filiprad, katja.hofmann}@microsoft.com

*This work was done during an internship at Microsoft.

ABSTRACT

People often ask others for restaurant recommendations as a way to discover new dining experiences. This makes restaurant recommendation an exciting scenario for recommender systems, and has led to substantial research in this area. However, most such systems behave very differently from a human when asked for a recommendation. The goal of this paper is to begin to reduce this gap.

In particular, humans can quickly establish preferences when asked to make a recommendation for someone they do not know. We address this cold-start recommendation problem in an online learning setting. We develop a preference elicitation framework to identify which questions to ask a new user to quickly learn their preferences. Taking advantage of latent structure in the recommendation space using a probabilistic latent factor model, our experiments with both synthetic and real world data compare different types of feedback and question selection strategies. We find that our framework can make very effective use of online user feedback, improving personalized recommendations over a static model by 25% after asking only 2 questions. Our results demonstrate dramatic benefits of starting from offline embeddings, and highlight the benefit of bandit-based explore-exploit strategies in this setting.

Keywords

online learning; recommender systems; cold-start

1. INTRODUCTION

Recommendation is an everyday process that frequently touches people's lives. Hence, it has seen tremendous research interest (such as [9, 14]). Most work in recommendation falls into two broad classes: Collaborative Filtering starts with a set of user/item affinity scores and assumes that two users who agree about one item are more likely to agree about another item; Content-Based Filtering models users by the characteristics of the items they like or dislike. We note that neither model represents how real people make recommendations, particularly in a cold-start setting where the person making a recommendation does not know a lot about the person asking for one.

Consider what would happen if a conference attendee in your home town, whom you have never met before, asked for a recommendation on where to eat dinner today. Most likely, you would start with one or two clarifying questions, perhaps whether the person likes seafood, or whether they have a car. These questions would depend on the context; for instance, if there are great restaurants around the corner, then whether they have a car would be irrelevant.

We argue that such an interaction can be represented using online learning, where two types of learning are occurring. First, the person making the recommendation is learning about the preferences of the person asking. However, the attributes learned will be contextual, based on the likely follow-on answers (such as the car question earlier). Second, the person making the recommendation is learning about which questions allow them to quickly reach a good recommendation in the current context. In this paper, we present a recommendation system that exhibits these two types of learning. Further, the learning is online: it immediately impacts future recommendations for this user, rather than requiring a batch reprocessing of information learned.

We present a bandit-based approach for online recommendation, applied to restaurant recommendation so as to ground it in a specific application. Our approach builds on top of prior work, such as [33], but learns to adapt the recommendation space (user-item embedding) to its users throughout their interactions. We use generalized Thompson Sampling to systematically sample questions to ask the user and to incorporate observed feedback.
We further propose and compare a range of alternative question selection strategies to identify characteristics of approaches that most effectively learn users' preferences. We use a matrix factorization approach [21, 29] to learn and adapt the embedding.

Our main contributions are four-fold. (1) We propose a novel view of human-like recommenders that converse with new users to learn their preferences. (2) We successfully demonstrate a fully online learning approach for recommendation, using both absolute and relative feedback. (3) We propose a systematic approach to incorporating offline data to initialize online learning recommenders, and demonstrate performance improvements, even using weakly labeled offline data. (4) We propose a set of item selection strategies for deciding what question to ask a cold-start user to most quickly infer preferences, and demonstrate benefits of bandit-based strategies. Our extensive experiments on both synthetic and real data evaluate each step of our approach.

Importantly, we note that this work is applicable to a wide variety of recommendation scenarios. Here, we focus on one such application, restaurant recommendation, where (1) we study search logs to understand the space of real user needs; (2) we use the insights to collect preference data in a user study; (3) we use online behavioral data to initialize the recommendation system; and (4) we use both synthetic and user study data to evaluate our approaches.

We now present related work (Section 2), followed by an analysis of real-life restaurant search (Section 3). We describe our model (Section 4) and empirical setup (Section 5), and validate the model on synthetic data (Section 6). Finally, we evaluate on real data to further empirically analyze our learning framework (Section 7).
2. RELATED WORK

This paper builds on work in multiple active research areas. We review recent work in the most closely related areas.

Offline Recommenders. The wide interest in personalized recommendations has sparked substantial research in this area [14]. The most common approaches are content-based approaches [24] and collaborative filtering (CF) [9, 21]. Collaborative filtering, which powers most modern recommenders, uses an a-priori available set of user-item ratings to learn the interdependencies among users and items. It predicts a user's rating on an item either via the neighboring items' ratings (neighbor-based [9, 28]) or by inferring latent factors in a low-dimensional embedding (latent factor-based [21, 29]). The majority of the work in recommendation has focused on offline recommenders, i.e., building an offline model based on past user-item interactions, periodically retrained to incorporate new observations.

Online Recommenders. Recently, it has been recognized that offline recommender approaches (i) suffer from the cost of retraining the model, (ii) are built to optimize offline performance, which does not necessarily match online user behavior, and (iii) fail to capture preference drifts. These realizations have initiated developments towards building continuously learning (online) recommenders.

A recent line of work poses the problem in a contextual bandit formulation [16, 31, 32], where items are seen as arms, users as contexts, and the goal is to explore the arm space in order to exploit the best performing arm for a given context. Most work in this area relies on the availability of user/item features and assumes that the reward of an arm for a given context is a linear/logistic function of the concatenated arm-context feature [16, 31]. [32] uses Gaussian processes to allow feedback sharing among similar contexts and arms.

Taking the CF point of view, [4] introduced an ε-greedy online user-user neighbor-based CF method. The first latent-factor online recommender was introduced in [33], which uses bandit strategies on top of Probabilistic Matrix Factorization (PMF) [21]. This is the most closely related work to ours. However, while [33] fixes the item latent factors to those learned offline, formulating the problem as a linear bandit, our method learns all user and item parameters, including the biases, fully online; [33] can be seen as a special case of our framework. In [13], the authors extend Thompson Sampling for PMF with a Kalman filter to track changing preferences over time. This work is orthogonal to ours.

Preference Elicitation. The problem of eliciting user feedback has long been of interest for a variety of tasks (e.g. [8]). To elicit the preferences of an existing or new user (cold-start), a range of methods have been proposed, varying from interview-based strategies (e.g. [30]), to asking users to rate some items, to active learning [26], entropy minimization [27], picture-based approaches [22], and explore-exploit strategies on top of a latent factor model [33]. Our work is the first to elicit users' preferences by utilizing either absolute or relative feedback in a fully online setting.

Interactive Recommenders. There have been many works (critique-based [7], constraint-based [10], dialog and utility-based recommenders [19]) emphasizing the importance of interactivity in recommenders, so that the user has a more active role over the recommendations. However, these works rely on prior modeling of the items' features, preventing flexibility in adaptation to a different domain; thus a comparison with them is out of the scope of this work.

In contrast, our work, in the same spirit as a recent line of work [18, 11] initiated by [33], learns the latent factors from PMF online and uses these to do interactive preference elicitation. These methods, although close in principle, have significant differences from our work.¹ Briefly, in [18], the authors focus on set-based feedback and propose a progressive building of the user's latent factor in each question. In contrast, our work focuses on absolute and relative feedback on (pairs of) items, and uses a preference elicitation phase fully integrated with the online updates of the PMF model. The method of [11] focuses on choice-based feedback and updates only the user's latent factor online. Neither of [11, 18] balances the exploration-exploitation tradeoff.

¹We plan to extend our framework to set- and choice-based feedback, to allow the comparison with [18, 11].

3. UNDERSTANDING REAL USERS

A recommendation system that aims to satisfy real people should reflect how real people look for recommendations. Therefore, we start by analyzing restaurant-related search behavior in a commercial Web search engine. Our goals are two-fold. First, to gain insight into our target domain, namely to understand the criteria people use to choose restaurants.² Second, to identify questions we must ask users to construct ground truth data for evaluating our system.

Given a large sample of all queries issued between 07/2014 and 07/2015, we filtered for those that contain restaurant or dining, and terms such as for, near(by), next (to), close (to), with and in. Let Q be the most frequent 10,000 such queries.

²Keeping in mind that these criteria to some degree reflect users' perception of the search engine and its capabilities.
3.1 Query Annotation with Entities

We tagged the top several hundred queries from Q as containing a location, name, cuisine, food type, or terms such as best and menu. The dictionary of tags was not pre-specified, to avoid biasing our annotations due to prior beliefs about categories people should be looking for. Rather, our methodology used content analysis [25] to develop a coding that captures the dominant behaviors involved.

We found that 39% of queries specify a location, 19% a restaurant name, 9% cuisine constraints, and 7% have the term best or another superlative. More rarely, queries specified food ethnicity (2%), an adjective describing the restaurant (2%), a dish name (2%), decoration, etc. The most common term co-occurrences are location and name in the same query (29%), cuisine and location (10%), and best and location (8%).
3.2 Understanding Common Phrases

The above statistics show that location is an important factor in restaurant search. Hence we zoom in to discover the specific ways in which people constrain locations. Similarly, the context under which users search for a restaurant is a second common constraint. We analyzed the specific terms appearing after a variety of prepositions, and constructed a dictionary of the most frequent contextual constraints.

A sample of these is shown in Table 1. For example, the top places people search for restaurants in are nyc, chicago, las vegas. Using the preposition 'near' to indicate location, the majority of terms shows users want a restaurant near me. Queries can also be very specific, e.g. searching for a restaurant near points of interest (times square), airports (miami airport, lax), postal codes or entire addresses. The contexts under which users search for restaurants vary from an occasion (wedding reception), to dietary restrictions, to type of meal or a group of people. People also search for restaurants with live music, piano, a view, etc.

Table 1: Contextual terms in restaurant related queries.

IN              | NEAR           | FOR                      | WITH
nyc             | me             | wedding receptions       | party room
chicago         | by             | large group              | small private room
las vegas       | times square   | rich                     | piano
houston         | here           | dessert                  | live music
miami           | lax            | your 21st birthday       | someone
los angeles     | miami airport  | steak at lunch time      | a view
atlanta         | airport        | gluten free food         | play area
brooklyn        | fenway park    | valentine day in dallas  | banquet room

3.3 Restaurant Feature Dictionaries

As user queries can be very specific (e.g., combining constraints, 'adjective & dish & great & location'), we now study the terms that people use to describe restaurants. We annotated the top 1,013 phrases appearing before the word 'restaurant' or 'dining' in 5,000 queries from Q with a number of traits. The traits identified are related to food (cuisine, meal, dietary restrictions, quality, e.g. organic, healthy, buffet, menu), rating (e.g. michelin star), atmosphere (view, fancy or romantic), time/location (opening hours, decade theme, time of the day), etc.

For every trait, we collected the most common terms. Table 2 shows a few common examples, giving us a glimpse into the most searched cuisines, food types, and adjectives. Though not shown, the top amenities searched are garden, jazz, patio, ocean, and the top restaurant types are bar, fast food, cafe.

Table 2: Restaurant feature dictionaries (note: frequency counts have been rescaled).

Cuisine   Freq. | Adj.      Freq. | Food     Freq.
mexican   475   | good      27    | seafood  139
chinese   407   | famous    13    | sushi    40
italian   381   | nice      12    | steak    32
thai      174   | romantic  11    | bbq      29
indian    144   | upscale   6     | fish     24
japanese  125   | small     6     | tapas    19
3.4 Outlook

Besides motivating our application scenario, this search log analysis forms the basis of the user study in Section 7. Given this understanding of what to ask people who look for a restaurant, we also focus on how to ask it. In this work, among the various combinations of feedback type (explicit/implicit, absolute/relative/list) on the available content (explicit features/items/latent features), we elicit users' preferences using explicit feedback from absolute and relative questions about explicit items (e.g. restaurants).

4. MODEL

We next present our approach for determining 'what to ask' and 'how to ask' as the key pieces of a conversational recommender, starting with a high-level picture of the entire algorithm (Section 4.1). Then, we describe the pieces in detail, as we require (i) a model exploiting implicit structure among items and users to efficiently propagate feedback (Section 4.2); (ii) an explore-exploit approach to probe the space of items, to allow continuous learning (Section 4.3); and (iii) a feedback elicitation mechanism for selecting absolute (Section 4.3) and relative questions (Section 4.4).

4.1 Overview

Our recommendation pipeline can be summarized as follows (see the sketch after this list):

1. Pick a model (Absolute/Pairwise) (Sec. 4.2) and a preference elicitation mechanism: Abs (Sec. 4.3) / Abs Pos / Abs Pos & Neg / Pairwise (Sec. 4.4).
2. Initialize model parameters using offline data.
3. A new user arrives. Now iterate for a few questions:³
   (a) The mechanism selects a question to ask.
   (b) The user answers the question.
   (c) All model parameters are updated.
   (d) Remove the question from the allowed questions.
4. The system presents the final recommended list.

The inner loop represents the 'human in the loop' present in all interactive systems, i.e., we make an intervention which affects the user and thus the future system decisions.

³A study of the right stopping criterion is left for the future.
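To make the control flow concrete, the following minimal Python sketch renders the pipeline above as code. The model interface (select_question, incorporate, mean_affinity) and the ask_user callback are hypothetical placeholders for the components of Sections 4.2-4.4, not the paper's actual implementation.

```python
# Sketch of the conversational loop of Section 4.1 (hypothetical API).
# `question` is assumed to be a tuple of one (absolute) or two (relative)
# item ids; `ask_user` stands in for the human in the loop.

def converse(model, user_id, items, ask_user, max_questions=5):
    allowed = set(items)                                    # questions still allowed
    for _ in range(max_questions):
        question = model.select_question(user_id, allowed)  # (a) pick item(s) to ask
        answer = ask_user(question)                         # (b) user responds
        model.incorporate(user_id, question, answer)        # (c) update all parameters
        allowed -= set(question)                            # (d) never re-ask
    # Final list: remaining items ranked by posterior mean affinity.
    return sorted(allowed, key=lambda j: model.mean_affinity(user_id, j),
                  reverse=True)
```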
4.2 Latent Factor Recommendation

Consider how people make recommendations when a friend asks them to suggest a restaurant. They strategically ask each question with the goal of eliminating or confirming strong candidates for dining out. Similarly, when designing a conversational recommender that asks questions about explicit items, a principled way of selecting items is desired.

The implicit structure of the item space that allows us to learn quickly is motivated by collaborative filtering. Items that have been co-rated similarly (liked/disliked) by users will lie close in a low-dimensional embedding. For our model we use a simplified version of the Matchbox Recommender model [29], equivalent to Probabilistic Matrix Factorization (PMF) [21]. This is a generative model, in that it specifies a probabilistic procedure by which the observed likes/dislikes of users on items are generated on the basis of latent variables. The model variables are learned so that the model can explain the observed training data.

Formally, throughout the paper we use the convention that i denotes the index over M users, forming the set I, and j denotes the index over N items, forming the set J. Every user i ∈ I is modeled by a user bias variable α_i ∈ R, accounting for users who tend to like/dislike most of the items, and a d-dimensional trait vector u_i ∈ R^d. The trait vector represents the latent embedding of user i in a d-dimensional space, where d ≪ M, N. Every item j ∈ J is modeled with latent variables β_j ∈ R (the item bias, which accounts for item popularity) and a trait vector v_j ∈ R^d that represents the latent embedding of item j in the same d-dimensional space.

Given that both user and item trait vectors lie in the same latent space, the similarity between a user i and an item j can be measured with the inner product u_i^T v_j of their corresponding trait vectors. We now present two models for estimating the latent variables, depending on the type of observations we obtain from users, i.e., absolute or pairwise.

Absolute Model. First, let us assume that we have observed tuples of the form (user i, item j, 1/0).⁴ The model estimates the affinity of user i to item j based on the biases and traits. The generative procedure is:

1. User i has traits $u_i \sim \mathcal{N}(0, \sigma_1^2 I)$ and bias $\alpha_i \sim \mathcal{N}(0, \sigma_2^2)$.
2. Item j has traits $v_j \sim \mathcal{N}(0, \sigma_1^2 I)$ and bias $\beta_j \sim \mathcal{N}(0, \sigma_2^2)$.
3. (a) The (unobserved) affinity is
$$y_{ij} = \alpha_i + \beta_j + u_i^T v_j. \quad (1)$$
Observations are modeled as the noisy estimate $\hat{y}_{ij} \sim \mathcal{N}(y_{ij}, \epsilon_{ij})$, where $\epsilon_{ij}$ models the affinity variance, accounting for noise in user preferences. This yields an observation of whether the user likes an item ($\hat{r}_{ij}$):
$$\hat{r}_{ij} = \mathbb{1}[\hat{y}_{ij} > 0]. \quad (2)$$

The hyper-parameters $\sigma_1, \sigma_2$ model the variance in traits and biases. The model variables are learned by maximizing the log-posterior over the item and user variables with fixed hyper-parameters, given the training observations.

Pairwise Model. People are often better at giving relative preferences over items than absolute judgments. Such preferences yield tuples of the form (user i, item j, item h) when user i prefers item j over item h (both j and h are indices over J). We can adapt the generative model to such ground truth data by modifying the third step of the Absolute model to yield a pairwise model:

3. (b) For each observation (i, j, h) compute the noisy difference
$$\hat{y}_{ijh} = \hat{y}_{ij} - \hat{y}_{ih}, \quad (3)$$
where the noisy item affinities are defined as before. Using $\hat{y}_{ijh}$, we estimate whether user i prefers j over h:
$$j \succ_i h : \hat{r}_{ijh} = \mathbb{1}[\hat{y}_{ijh} > 0]. \quad (4)$$

⁴We use the convention that 0 denotes dislike and 1 like.
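To make Equations 1-4 concrete, here is a small numpy sketch that draws users and items from the generative procedure and produces both absolute and pairwise observations. The sizes and variances are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 3, 5, 2                     # illustrative sizes, not the paper's settings
sigma1, sigma2, eps = 1.0, 0.5, 0.1   # illustrative hyper-parameters

# Steps 1-2: latent user/item traits and biases.
U = rng.normal(0.0, sigma1, (M, d)); alpha = rng.normal(0.0, sigma2, M)
V = rng.normal(0.0, sigma1, (N, d)); beta = rng.normal(0.0, sigma2, N)

# Step 3(a), Eq. (1): affinity y_ij = alpha_i + beta_j + u_i^T v_j.
Y = alpha[:, None] + beta[None, :] + U @ V.T
Y_hat = rng.normal(Y, np.sqrt(eps))   # noisy estimates with variance eps
R_hat = (Y_hat > 0).astype(int)       # Eq. (2): like = 1, dislike = 0

# Step 3(b), Eqs. (3)-(4): a pairwise observation for user i on items (j, h).
i, j, h = 0, 1, 2
y_hat_ijh = Y_hat[i, j] - Y_hat[i, h]  # noisy difference
prefers_j_over_h = y_hat_ijh > 0
print(R_hat[i], prefers_j_over_h)
```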
4.3 Continuous Learning Recommender

To generate recommendations, most latent factor based recommender systems use an offline trained model based on past interactions, such as the one described above, which they periodically update to incorporate new data. The parameters for new users are typically initialized based on a combination of explicit attributes and past ratings (e.g. [1]). The items with the highest predicted noisy affinity means comprise the recommendations presented to the user.

In contrast, recent work has moved towards continuously learning recommenders [4, 16, 33], optimizing for online performance using explore-exploit (EE) strategies. We present our fully online updated recommendation approach with the embedded preference elicitation in Algorithm 1. In the following we detail the components of this algorithm for the case of asking absolute questions (Abs). In Section 4.4 we show how we extend this framework to relative questions.

Algorithm 1 Preference Elicitation Algorithm
Input: ∀i ∈ I: u_i, α_i; ∀j ∈ J: v_j, β_j (offline embedding)
1: A new user i arrives. Initialize the prior based on Eq. 5.
2: ∀j ∈ J, infer the noisy affinity ŷ_ij (Eq. 1).
3: while fewer than the allowed questions have been asked do
4:   Pick item(s) for an absolute (relative) question (Sec. 4.3 / 4.4).
5:   Incorporate feedback according to (i) Abs (Sec. 4.3), (ii) Abs Pos, (iii) Abs Pos & Neg, or (iv) Pairwise (Sec. 4.4).
6:   Update u_i, α_i, v, β (Section 4.3.3) and infer the noisy affinity distribution ŷ_i (Eq. 1) or noisy difference (Eq. 3).
7: end while

4.3.1 Initialization from Offline Data

We propose to initialize online learning models using an initial embedding that is learned offline. We hypothesize that such an initial embedding will allow the system to learn a new user's preferences more quickly. Effective learning from few questions is crucial for conversational recommenders.

We start by learning the offline embedding of items from logged observations. Then, we initialize the prior of every item j ∈ J by setting the trait v_j and bias β_j from the corresponding offline posterior, assuming that all items in the online phase appeared in the offline data.

For the initialization of the user parameters, we focus on the case when the user is new to the system. Without additional information, we can assume that the new user is similar to the average offline user. We implement this by using as trait and bias the mean value over all offline users:

$$u^{cold} \sim \mathbb{E}_{i=1,\dots,M}[u_i] \qquad \alpha^{cold} \sim \mathbb{E}_{i=1,\dots,M}[\alpha_i] \quad (5)$$
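The initialization of Eq. 5 amounts to copying the offline item posteriors and centring the new user on the population average. A minimal numpy sketch, assuming arrays of offline posterior means are available:

```python
import numpy as np

def init_cold_start_user(U_offline, alpha_offline):
    """Eq. (5): centre a new user's prior on the average offline user.

    U_offline:     (M, d) array of offline user trait means.
    alpha_offline: (M,) array of offline user bias means.
    """
    u_cold = U_offline.mean(axis=0)    # average trait vector
    alpha_cold = alpha_offline.mean()  # average bias
    return u_cold, alpha_cold

# Item priors are carried over unchanged from the offline posterior:
# v_j and beta_j for every item j seen offline (assumed to cover the online pool).
```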
Each selects tive questions and incorporating feedback, to identify the an item j⇤ to obtain user feedback on. While the first few are best way of asking relative questions. Importantly, in every self-explanatory baselines, we discuss three in more detail. such formulation there are two choices: (i) the underlying (1) MaxT approximates maximizing information gain, in the model (Absolute vs. Pairwise) and (ii) how the user’s re- vein of active learning: As user-item ratings are observed, sponse to the question is incorporated back into the model. the PMF model adds information to the corresponding user For the first choice, Abs Pos and Abs Pos & Neg use the Ab- and item trait vectors. In the opposite extreme case, if all solute model, while Pairwise uses the Pairwise model. The item trait elements are close to 0, the corresponding item second choice is applicable only for Abs Pos and Abs Pos carries no information. As MinT does the opposite from &Neg, as they represent two ways of incorporating relative MaxT, it is hypothesized to have low performance and is feedback into the absolute model. employed to establish a lower bound on question selection performance. (2) UCB [3] is a popular bandit algorithm 4.4.1 Absolute Model, Relative Questions that selects the items with the highest confidence bound to First, we present Abs Pos and Abs Pos & Neg. Both use avoid missing preferences for promising items. (3) Thomp- the same mechanism to generate the question “A vs. B” for son Sampling (TS) is a bandit algorithm that balances explo- user i: ration and exploitation by selecting items using a sampling 1. Select item A as in Abs (Equation 6). strategy [5]. It samples from its current posterior belief over 2. Virtual observation: Assume user i did not like A. noisy anities, and then acts optimally according to this 3. Virtual update: Incorporate the tuple (i, A, 0) into sampled belief. TS focuses on items with high mean anity, the Absolute model, infer the posteriors for all model but is also likely to select items with high variance. parameters and set them as the virtual new prior. 4. Select item B, again according to Abs, but this time 4.3.3 Online Updating using the virtual prior as prior. After posing a question to the user, the observed response The insight behind this mechanism of constructing the needs to be incorporated into to the recommender to allow relative question is that the two items the user is asked to for continued learning. As shown in [23], the questions can give a relative preference on should be relatively far apart be incorporated into the model by setting the probability in the latent embedding, so that (i) the system can learn of the question to 1 and incorporating the user’s response users’ preferences e↵ectively and (ii) the user is not forced following standard probability theory. to choose among very similar items. This diversity enforcing The user’s response thus becomes a new observation that mechanism is inspired by the approach in [6]. the system uses to update the posterior distributions of all The two methods introduced here di↵er only in the way latent model parameters related to the incoming user i and the feedback is incorporated into the Absolute model. Abs the item j asked about, i.e., ↵i,j , ui, vj (but a↵ecting only Pos incorporates only positive information while Abs Pos user i’s interaction session). Due to space constraints, we &Negincorporates both positive and negative information. 
4.3.3 Online Updating

After posing a question to the user, the observed response needs to be incorporated into the recommender to allow for continued learning. As shown in [23], the questions can be incorporated into the model by setting the probability of the question to 1 and incorporating the user's response following standard probability theory.

The user's response thus becomes a new observation that the system uses to update the posterior distributions of all latent model parameters related to the incoming user i and the item j asked about, i.e., α_i, β_j, u_i, v_j (but affecting only user i's interaction session). Due to space constraints, we refer the reader to [29] for the specific Expectation Propagation updates. To select the next question for user i, we use the updated posteriors as the new priors to infer the user's noisy affinity distribution ŷ_ij for all items j ∈ J, denoted by ŷ_i. As the system keeps asking questions to user i and incorporates his/her feedback, the beliefs about the user, and about the item in the question, change. This allows the model to move towards the true underlying affinity distribution. All online updates were implemented in Infer.NET [20].

Abs: Absolute Model, Absolute Questions

So far we have presented the entire framework for the case where the system poses absolute questions. Before turning to relative feedback, we describe the high-level approach for asking absolute questions (Abs). Using TS for illustration purposes, Abs asks user i about the item with the largest sampled noisy affinity as inferred by the Absolute model:

$$j^* = \arg\max_{j \in \mathcal{J}} \hat{y}_{ij} \quad (6)$$

Based on whether the user (dis)liked item j*, a new observation (i, j*, 1/0) is incorporated into the Absolute model.

4.4 Extension to Relative Preferences

An alternative is for the system to ask for a preference over a pair of items, i.e., does the user prefer item A (j*) or item B (h*)? Therefore, we present here the extension of our framework to the case of asking relative questions.

We consider three separate formulations (referred to as Abs Pos, Abs Pos & Neg and Pairwise) for selecting relative questions and incorporating feedback, to identify the best way of asking relative questions. Importantly, in every such formulation there are two choices: (i) the underlying model (Absolute vs. Pairwise) and (ii) how the user's response to the question is incorporated back into the model. For the first choice, Abs Pos and Abs Pos & Neg use the Absolute model, while Pairwise uses the Pairwise model. The second choice is applicable only to Abs Pos and Abs Pos & Neg, as they represent two ways of incorporating relative feedback into the absolute model.

4.4.1 Absolute Model, Relative Questions

First, we present Abs Pos and Abs Pos & Neg. Both use the same mechanism to generate the question "A vs. B" for user i (see the sketch at the end of this section):

1. Select item A as in Abs (Equation 6).
2. Virtual observation: assume user i did not like A.
3. Virtual update: incorporate the tuple (i, A, 0) into the Absolute model, infer the posteriors for all model parameters, and set them as the virtual new prior.
4. Select item B, again according to Abs, but this time using the virtual prior as the prior.

The insight behind this mechanism for constructing the relative question is that the two items the user is asked to give a relative preference on should be relatively far apart in the latent embedding, so that (i) the system can learn users' preferences effectively and (ii) the user is not forced to choose among very similar items. This diversity enforcing mechanism is inspired by the approach in [6].

The two methods introduced here differ only in the way the feedback is incorporated into the Absolute model. Abs Pos incorporates only positive information, while Abs Pos & Neg incorporates both positive and negative information. For example, assume that the user preferred item B to A. Then, Abs Pos incorporates only the observation (i, B, 1), interpreting the relative preference for the preferred item B as a like for B. In contrast, Abs Pos & Neg incorporates two observations: (i, B, 1) for the preferred item and (i, A, 0) for the less preferred item. This can be seen as a variant of the "sparring" approach to dueling bandits [2], which samples item pairs and updates the model for both items as if an absolute reward were observed.

4.4.2 Pairwise Model, Relative Questions

The third method for selecting relative questions, Pairwise, uses the Pairwise model, which directly takes pairwise preferences as input, to generate the relative question and incorporate the observations. Thus, the user's relative feedback is incorporated into the model without any intermediate transformation, i.e., as an observation (i, B, A) when B is preferred over A.

The Pairwise method picks A exactly as in Abs; for item B, inspired by the dueling bandit approach in [34], it picks the item with the largest probability of being preferred to item A. We instantiate the latter by selecting the item with the maximum noisy difference from item A (j*):

$$\text{item B} = h^* = \arg\max_{h \in \mathcal{J}} \hat{y}_{ihj^*} \quad (7)$$

For the selection of item B, any question selection strategy of Table 3 besides the TS illustrated here (except for MinT and MaxT) can be employed, with the difference that the noisy difference distribution should be used.

Incorporating the 'Neither' Option. Preliminary experiments showed that when the method asks the user to give a relative preference on two items that he/she dislikes, forcing him/her to choose one could mislead the model about the user's preferences. Thus, we adjusted all methods so that in such a case the user can specify that he/she likes neither. We implemented this by (i) incorporating two dislikes in Abs Pos & Neg and (ii) omitting the update in Abs Pos and Pairwise.
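The virtual-update mechanism of Section 4.4.1 can be sketched as follows, reusing the hypothetical model interface from the earlier loop sketch, here assumed to expose deep-copyable state, a select method for TS-style selection, and an update(user, item, liked) method. A virtual dislike of A pushes the selection of B towards a distant part of the embedding.

```python
import copy

def pick_relative_pair(model, user_id):
    """Section 4.4.1: construct an 'A vs. B' question (sketch, hypothetical API)."""
    item_a = model.select(user_id)              # 1. select A as in Abs (Eq. 6)
    virtual = copy.deepcopy(model)              # keep the real posterior untouched
    virtual.update(user_id, item_a, liked=0)    # 2-3. virtual observation (i, A, 0)
    item_b = virtual.select(user_id)            # 4. select B under the virtual prior
    return item_a, item_b

def incorporate_relative(model, user_id, preferred, other, mode="abs_pos"):
    """Feedback handling for Abs Pos vs. Abs Pos & Neg. A 'neither' answer would
    instead skip the update (Abs Pos) or add two dislikes (Abs Pos & Neg)."""
    model.update(user_id, preferred, liked=1)   # like for the preferred item
    if mode == "abs_pos_neg":
        model.update(user_id, other, liked=0)   # dislike for the less preferred one
```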
5. EXPERIMENTAL SETUP

We now describe our overall empirical setup, used for the experiments described in the next two sections.

Setting. One main difficulty of evaluating conversational recommenders is that it requires access to user reactions to any possible system action. We address this requirement using generative user models. The first user model is constructed synthetically and is used to validate our model (Section 6). The second is instantiated from real user preferences, collected via a user study (Section 7).

All experiments consist of an offline and an online phase. During the offline phase, the model is presented with data where M users interact with N items. In the subsequent online phase, the model is used to interact with cold-start users, asking questions using the pool of the offline N items. We varied the number of questions from 0 to 15 and report the model's performance after each question. In practice, recommendations could be given after fewer questions, could be integrated with initial recommendations, or could be spread out over several interactions with a given user.

Research Questions. Our experiments are designed to address the following research questions:

RQ 1. Can our model adapt to the user's preferences?
RQ 2. Does our model learn effectively under either absolute or relative feedback?
RQ 3. Which relative question method performs better?
RQ 4. Is absolute or relative feedback more effective?
RQ 5. Does the offline initialization step help?
RQ 6. Which question selection strategy is more effective?

To answer each of these questions, we need some measure of the effectiveness of our framework. Given that the goal of preference elicitation is a good recommendation list adhering to the user's preferences, we use Average Precision@k (AP@k) as our evaluation metric.

Metric. AP@k is a widely used, precision-oriented metric [15] for capturing accuracy in the top k (we set k = 10). Formally, for user i, we obtain the user's predicted recommendation list by sorting all items by decreasing mean of the inferred noisy affinities y_i. We evaluate this list against the ground truth r_i^true, i.e., capturing whether the user liked/disliked each item. AP@k is defined as the average of the precisions computed at each liked position in the top k items of the user's ranked list; P@ℓ (Precision@ℓ) is the fraction of liked items out of the top ℓ + 1 ranked items. Thus,

$$AP@k = \frac{\sum_{\ell=0}^{k-1} P@\ell \cdot r_i^{true}[\ell]}{\min(k, \#\text{ of liked items})} \quad (8)$$

where [ℓ] represents the index of the item present at rank ℓ, with [ℓ] = 0 corresponding to the index of the top recommended item. A higher value of AP@k (closer to 1) implies a better recommendation list. In our results, we report the average and 95% confidence intervals of AP@10 over all cold-start users. Results on additional metrics, such as the ratio of correctly ranked pairs and mean reciprocal rank, are omitted as they showed similar trends to AP@10.
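Equation 8 is straightforward to compute from a ranked list. A minimal implementation, where rel holds the ground-truth like labels (1/0) of the user's ranked items, from the top down:

```python
def ap_at_k(rel, k=10):
    """AP@k (Eq. 8): average of P@l over the liked positions in the top k."""
    hits, total = 0, 0.0
    for l, r in enumerate(rel[:k]):
        if r:                            # liked item at rank l
            hits += 1
            total += hits / (l + 1)      # P@l: liked fraction of the top l+1
    n_liked = sum(rel)                   # all liked items for this user
    return total / min(k, n_liked) if n_liked else 0.0

# Likes at ranks 0 and 2, two liked items overall -> (1/1 + 2/3) / 2 = 0.833
print(round(ap_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]), 3))
```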
6. LEARNING SYNTHETIC USER TYPE PREFERENCES

We begin our experiments with an intentionally simplistic example of restaurant recommendation, as real world high-dimensional data is difficult to visualize. The example is meant to demonstrate concepts of our model and to illustrate the effectiveness of our approach in unlearning initial prior beliefs and tailoring recommendations to specific user types, answering RQ 1 affirmatively.

In our framework, we first use observations to learn an offline embedding for users and items in the same low-dimensional space. Here, we generated the offline observations by considering types of users and restaurants as follows:

Restaurant types     %   | User types            %
expensive            15% | Like expensive        20%
cheap & spicy        5%  | Like spicy            15%
cheap & not-spicy    10% | Like not-spicy        25%
only cheap           35% | Like cheap            30%
only not-spicy       15% | Like only not-spicy   5%
only spicy           20% | Like only spicy       5%

We generated N = 200 restaurants and M = 200 users. For each offline user, according to their type, we sampled 10 items from their liked category as likes and 10 items from the rest of the categories as dislikes. We used this intentionally simple synthetic setup to evaluate various parameter choices, and we show results for σ₁² = 10, σ₂² = 0.5, σ_ε = 0.1. To allow visualization, we considered only two latent traits (d = 2) for this example. We see that in the learned embedding, the first trait indicates spiciness, while the second indicates price.

In the same space, an embedding for users is also learned (not shown to avoid clutter). The average trait vector over all users is shown with a red cross in Figure 1. This becomes the initial trait vector for the cold-start user. Considering also the learned item biases and the average user bias (not shown here), the system constructs an initial estimate of the noisy affinity distribution of the incoming user over all items.

Based on the offline observations, the learned prior for this affinity distribution favors user types which were popular in the offline data. The task of selecting restaurants for online users resembling the mean offline user is easy, as the prior already captures valuable information. As the bottom right panel of Fig. 1 shows, even with no questions, a close to perfect recommendation list can be given for Like not-spicy users. The trend is similar for the Like cheap users (not shown).

Figure 1: Results on synthetic restaurant data, across user types (left), and two of the user types (right).

In contrast, when the user is of a type that was rarely seen during the offline phase (e.g., expensive, shown in Fig. 1, top right panel), the online recommender has to collect observations that move away from more popular types. The trends for the Like spicy, Like only-spicy, and Like only not-spicy user types are similar to Like expensive. For these types, the model (all four approaches) starts with AP@10 close to 0, but after every question asked, unlearns the initial wrong prior beliefs and learns the specific user type's preferences. For the results reported, we considered 60 cold-start users of each type and used TS for question selection.

All methods learn effectively across all user types, with minor differences (Fig. 1, left), answering RQ 2 positively.
7. RESULTS ON REAL DATA

Having positively answered RQ 1 and RQ 2 above, we turn our attention to a real-world setting to address all research questions in the context of 'where to dine in Cambridge, UK'. For the offline initialization of our framework, we use real users' online search behavior in a commercial search engine (Section 7.1). We use the insights from Section 3 to design a user study that is used to collect real restaurant preferences for Cambridge (Section 7.2). The collected responses serve as a basis for evaluating our online recommendation methods. We sketch a novel two-step approach to infer ratings for all restaurants, apart from those asked about in the study (Section 7.3). We extensively evaluate our choices for the recommendation pipeline (Section 7.4).

7.1 Search Data for Offline Initialization

We start by describing the data which serve as our offline user-restaurant observations, based on which we learn the embedding used to warm-start the question-asking phase.

This data is obtained by identifying restaurant review pages for restaurants in Cambridge (UK) on a major restaurant review service provider. Next, we filter the query and click logs from a major commercial search engine to find (anonymous) cookies that mark a single PC accessing a sequence of these restaurant pages. Each cookie is taken to be a distinct user, and all visits to restaurant review pages are considered to indicate the user liking the restaurant.⁶

In particular, taking logs from 26 March to 26 April 2015, we identified 3,549 cookies (users) who accessed at least one of the 512 distinct Cambridge restaurant review pages identified on the review service provider. This resulted in an index of Cambridge restaurants, each one visited by at least one user. Augmenting each of the restaurants with all known links and metadata associated with it in a proprietary restaurant index, and selecting a further three months back in each of those users' search histories, we recorded every interaction of these users with these restaurant links. During the four month search history of the users, we recorded interactions with 289 unique restaurants out of the 512 Cambridge restaurants. The total number of unique user-restaurant interactions recorded is 9,330.

Thus, our offline data consists of M = 3,549 users, N = 289 restaurants, and 9,330 positive observations (1). To introduce dislikes (0) into the rating matrix as well, for every user i who has liked n_i⁺ items, we sampled uniformly at random n_i⁻ = min(10, n_i⁺) restaurants as dislikes.

Parameter Setting. To learn the offline embedding, we set the hyper-parameters to the combination that achieved the highest pairwise accuracy on the offline observations: d = 4 (i.e., 4 latent traits), σ₁² = σ₂² = 10, σ_ε = 0.1.

⁶Though URL visitation is a weak positive signal, the experiments indicate it is a reasonable proxy for interest.

7.2 User Study as Basis for Online Evaluation

One of the issues in evaluating a conversational recommender is that one needs to know the user's ground truth over the space of all items (and questions). To obtain this, one needs to implement an interactive recommender asking questions to real users and receiving their responses. As an intermediate step, we conducted a user study and used the collected responses as ground truth for online users.

In the user study, each participant filled in an anonymous web questionnaire about their preferences on a pool of restaurants in Cambridge, UK. The participants were asked "would you consider restaurant X for your next Friday night dinner?", labeling each restaurant with a binary label (yes/no). These responses comprise our ground truth.

For the pool of Cambridge restaurants we carefully selected ten restaurants, diverse in various features (as identified in Section 3). We recruited twenty-eight individuals for the study. Given the anonymity of the questionnaire (in order to encourage truthfulness in the responses), demographic information was not recorded. However, the larger pool of individuals from which the participants were drawn (65 people working in a research lab) varied in factors such as age, job level, income, and time spent in Cambridge.

Each participant was presented with the questionnaire about the same restaurants, but in varying order to avoid presentation bias. The participants were advised to visit the restaurant webpage when unfamiliar with the restaurant.
The results are restaurant webpage when unfamiliar with the restaurant. shown in the right panel of Figure 2. Until 2 questions, both methods have almost identical performance. But, af- 7.3 Obtaining Ground Truth ter 5 questions, Abs performs significantly better than the In our user study, we obtained restaurant labels for 10 out relative feedback method, and achieves close to perfect per- of the 289 Cambridge restaurants present in the o✏ine data, formance after 15 questions (AP @10 = .975). We hypoth- for 28 participants. However, for reliable evaluation we need esize that this result can be explained by the fact that our labels for the entire item space (rather than limiting our o✏ine embedding favored absolute feedback. methods to ask about just these 10 restaurants). Therefore, Although our result shows that very high accuracy can be we introduce an approach to fill in complete ground truth achieved when users provide accurate feedback on absolute labels. Also, to increase the diversity of the user space, questions, in practice this may not always be possible. Psy- inspired by [17], we used bootstrapping to obtain 50 cold- chological e↵ects such as anchor bias [12] can lead users to start users based on the 28 participants’ ground truth. implicitly compare items, lowering the quality of absolute In particular, for each cold-start user: feedback. When this is the case, our result shows that high 1. Randomly sample one of the 28 participants. performance can be achieved with relative feedback as well. 2. Observe the sampled user’s labels on the pool of 10 restau- Future work could involve hybrid models that automati- rants asked in the user study. cally learn whether absolute or relative feedback, or a com- 3. Infer user’s traits ui (prior= learned embedding in 7.1). bination, is more accurate in a given setting. 4. Sample uˆi ui. Set this to be the new prior of ui. ⇠ 5. With this prior, infer the ratings ri distribution. Does offline initialization help? (RQ 5) 6. Sample ratings from their distribution ˆri ri. In this way, for each bootstrapped user we⇠ obtain a com- Next, we investigate the impact of model initialization, i.e., plete rating list for all 289 restaurants that is consistent with initializing the online recommender with an initial o✏ine the user study labels of some user, yet is perturbed to ac- embedding, compared to starting from a generic prior (us- count for variety in real user populations. ing the optimized hyper-parameters specified in 7.1). We As far as we are aware, this approach for filling in the present the results of this comparison in Figure 3 both for missing ratings is novel. It gives us ground truth as close to the absolute (Abs) and the best relative feedback model (Abs real as possible given the resources available. Alternatives Pos), in the left and right panel correspondingly. would be exhaustive labels (not feasible for our study), or re- Our hypothesis is that the o✏ine embedding learned from jection/importance sampling (only e↵ective when leveraging the weakly labeled data of a search log captures sucient logged exploration data from large-scale systems, e.g [16]). information to helpfully constrain continued learning, even if it does not exactly match the structure that underlies 7.4 Results online users. Indeed, Figure 3 demonstrates great perfor- mance improvements when initializing the models from this Having obtained the o✏ine embedding and the online users’ embedding over generic prior initialization. 
By placing the ground truth for all items, we now present our experiments new users as the average o✏ine user, performance increases on a real restaurant recommendation scenario with the focus from .217 to .584, even without asking any questions. As of answering the remaining research questions RQ 3 - RQ 6. the recommender collects more responses, performance con- Each of these questions investigates a separate component tinues to improve in both cases. of our continuously learning . One observation is that the uninitialized system can ulti- Which method for relative questions is better? (RQ 3) mately achieve high performance, and for Abs Pos (Prior) even pass the o✏ine initialized system. However, this is only Recall that we proposed three approaches for modeling rel- achieved after many questions (here: 14). The phenomenon ative feedback. The first two incorporate feedback in an ab- solute model, (a) by incorporating information that the user 7Studying the e↵ect of alternatives for introducing dislikes in liked the preferred item (ignoring the non-preferred one, Abs the o✏ine data on Abs Pos’s success is left for future work. Figure 3: Impact of o✏ine initialization on performance for absolute (Abs, left) and relative feedback (Abs Pos, right). is nevertheless interesting, as it may point to a bias-variance trade-o↵. Starting from the generic prior allows the system to eventually perfectly fit a given user’s preferences, how- ever, learning takes a long time because there is no initial structure to constrain learning outcomes. We conclude that using o✏ine initialization is highly beneficial. Which question selection strategy is best? (RQ 6) In Figure 4 we report the comparison results of the various question selection strategies of Section 4.3.2 (except MaxV whose performance almost coincides with UCB), for the Abs and Abs Pos methods initialized with the o✏ine embedding. For the Abs model (Figure 4, top), we observe that: (i) as expected, lowest AP@10 is achieved by MinT,(ii)Ran- dom learns slowly, likely because it fails to focus on more promising items for e↵ective learning, and (iii) all remaining Figure 4: Performance of question selection strategies for strategies perform equally well. We hypothesize that Greedy absolute (Abs, top) and relative (Abs Pos, bottom ) models. performs well thanks to the o✏ine embedding, along with them. It learns online both the user’s and the items’ latent the online updating of all parameters after each response. traits. Together with [18] and [33], our findings corroborate Turning to Abs Pos (Figure 4, bottom), the best perform- the e↵ectiveness of latent-feature interactive recommenders. ing strategies are those that encourage more diversity across A novel insight is that taking into account the uncertain- the questions of the interactive session, namely the bandit- ties in the learning of both the item and the user embedding, based ones and Random. Greedy and MaxT are the worst we can adapt to the specific user’s preferences, under a cer- performing ones, following MinT. Our insight why this is the tain context. This answers to the question posed by [33] and case is that they tend to select questions AvsB, followed is key for allowing the system to learn both the user’s profile by AvsC, etc. when A is preferred. Given that there is and the questions’ e↵ectiveness in a contextual manner. 
Which question selection strategy is best? (RQ 6)

In Figure 4 we report the comparison of the various question selection strategies of Section 4.3.2 (except MV, whose performance almost coincides with UCB), for the Abs and Abs Pos methods initialized with the offline embedding.

Figure 4: Performance of question selection strategies for absolute (Abs, top) and relative (Abs Pos, bottom) models.

For the Abs model (Figure 4, top), we observe that: (i) as expected, the lowest AP@10 is achieved by MinT; (ii) Random learns slowly, likely because it fails to focus on more promising items for effective learning; and (iii) all remaining strategies perform equally well. We hypothesize that Greedy performs well thanks to the offline embedding, along with the online updating of all parameters after each response.

Turning to Abs Pos (Figure 4, bottom), the best performing strategies are those that encourage more diversity across the questions of the interactive session, namely the bandit-based ones and Random. Greedy and MaxT are the worst performing ones, following MinT. Our insight into why this is the case is that they tend to select questions A vs. B, followed by A vs. C, etc., when A is preferred. Given that there is no construction encouraging B and C to be diverse (such as sampling or taking into account uncertainties), the questions focus on comparing parts of the embedding which are similar across questions, thus preventing truly effective learning.

Overall, we find that the bandit-inspired strategies perform the most robustly, achieving top performance across models. In the classic bandit setting, these approaches systematically balance the need to explore new solutions with the need to reap the rewards of what has already been learned. Here, we find that similar principles allow these strategies to collect user feedback that balances the need not to discard any areas of the latent space prematurely with the need to focus questions on the most promising areas. This novel insight is important because it shows that bandit-based question selection strategies can lead to benefits that go beyond the typical bandit problem.

Discussion. All our results show that our methods enable effective learning from interactions with their users, under either feedback type, answering RQ 1 and RQ 2 positively. We show substantial recommendation performance improvements, by 25% after only 2 questions, over the performance we would get without adapting to the user.

Although our underlying model can be augmented with external features [29], one key advantage is that it does not need them. It learns online both the user's and the items' latent traits. Together with [18] and [33], our findings corroborate the effectiveness of latent-feature interactive recommenders. A novel insight is that by taking into account the uncertainties in the learning of both the item and the user embedding, we can adapt to the specific user's preferences under a certain context. This answers the question posed by [33] and is key to allowing the system to learn both the user's profile and the questions' effectiveness in a contextual manner.

We have demonstrated effective learning from feedback present in many interactive settings. Thus, our approach is a good candidate for online learning across domains.

8. CONCLUSIONS

In this paper we proposed a novel view of recommendation as an interactive process. Like human recommenders, we envision recommender systems that can converse with new users to learn their preferences.

We developed such a conversational recommender, using restaurant recommendation as our motivating example. We started by examining restaurant-related queries issued to a commercial search engine. Using these insights, we conducted a user study to elicit ground truth for evaluating our system. We proposed a conversational recommender model that is theoretically well anchored in probabilistic matrix factorization models [21, 29]. We showed how such models can be extended to support continuous learning. We empirically evaluated our approach using the ground truth based on real dining preferences elicited through our user study.

Our results have important implications for the development of conversational recommender systems. First, we demonstrated that the best performance can be achieved with absolute questions. However, even in settings where only relative feedback is available, effective learning is possible. Second, we proposed a systematic approach to initializing conversational recommenders with an offline learned embedding, greatly boosting performance even when only weakly supervised data is available. Third, we identified question selection strategies that can elicit feedback for very effective learning. Together, these insights pave the way towards conversational recommenders.
A promising future direction is to extend conversational recommenders to use reinforcement learning approaches, to capture longer-term dependencies [17]. The modular structure of our framework allows various choices in a plug-and-play manner, considering different feedback types, underlying probabilistic models, etc., with the goal of building a suite of conversational recommenders for a variety of settings.

Acknowledgments. We would like to thank Tom Minka and Yordan Zaykov for the very insightful discussions on the model and evaluation. We also thank the participants of our user study.

9. REFERENCES

[1] D. Agarwal and B.-C. Chen. Regression-based latent factor models. In KDD, 19–28, 2009.
[2] N. Ailon, Z. Karnin, and T. Joachims. Reducing dueling bandits to cardinal bandits. In ICML, 856–864, 2014.
[3] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 3:397–422, 2003.
[4] G. Bresler, G. H. Chen, and D. Shah. A latent source model for online collaborative filtering. In NIPS, 3347–3355, 2014.
[5] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, 2249–2257, 2011.
[6] H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 429–436, 2006.
[7] L. Chen and P. Pu. Critiquing-based recommenders: survey and emerging trends. User Modeling and User-Adapted Interaction, 22(1-2):125–150, 2012.
[8] I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos. The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments. IEEE Transactions on Image Processing, 9(1):20–37, 2000.
[9] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, 271–280, 2007.
[10] A. Felfernig, G. Friedrich, D. Jannach, and M. Zanker. Developing constraint-based recommenders. In Recommender Systems Handbook, 187–212, 2011.
[11] M. P. Graus and M. C. Willemsen. Improving the user experience during cold start through choice-based preference elicitation. In RecSys, 273–276, 2015.
[12] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 263–291, 1979.
[13] J. Kawale, H. H. Bui, B. Kveton, L. Tran-Thanh, and S. Chawla. Efficient Thompson sampling for online matrix-factorization recommendation. In NIPS, 1297–1305, 2015.
[14] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
[15] H. Li. A short introduction to learning to rank. IEICE TIS, 94(10):1854–1862, 2011.
[16] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, 661–670, 2010.
[17] E. Liebman, M. Saar-Tsechansky, and P. Stone. DJ-MC: A reinforcement-learning agent for music playlist recommendation. In AAMAS, 591–599, 2015.
[18] B. Loepp, T. Hussein, and J. Ziegler. Choice-based preference elicitation for collaborative filtering recommender systems. In CHI, 3085–3094, 2014.
[19] T. Mahmood and F. Ricci. Improving recommender systems with adaptive conversational strategies. In Hypertext, 73–82, 2009.
[20] T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.5. Microsoft Research Cambridge, 2012.
[21] A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In NIPS, 1257–1264, 2007.
[22] J. Neidhardt, R. Schuster, L. Seyfang, and H. Werthner. Eliciting the users' unknown preferences. In RecSys, 309–312, 2014.
[23] P. A. Ortega and D. A. Braun. Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling, 2(1):2, 2014.
[24] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, 325–341, 2007.
[25] J. Ritchie and J. Lewis. Qualitative Research Practice. Sage, 2003.
[26] N. Rubens, D. Kaplan, and M. Sugiyama. Active learning in recommender systems. In Recommender Systems Handbook, 735–767, 2011.
[27] T. Salimans, U. Paquet, and T. Graepel. Collaborative learning of preference rankings. In RecSys, 261–264, 2012.
[28] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, 285–295, 2001.
[29] D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In WWW, 111–120, 2009.
[30] M. Sun, F. Li, J. Lee, K. Zhou, G. Lebanon, and H. Zha. Learning multiple-question decision trees for cold-start recommendation. In WSDM, 445–454, 2013.
[31] L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li. Personalized recommendation via parameter-free contextual bandits. In SIGIR, 323–332, 2015.
[32] H. P. Vanchinathan, I. Nikolic, F. De Bona, and A. Krause. Explore-exploit in top-N recommender systems via Gaussian processes. In RecSys, 225–232, 2014.
[33] X. Zhao, W. Zhang, and J. Wang. Interactive collaborative filtering. In CIKM, 1411–1420, 2013.
[34] M. Zoghi, S. A. Whiteson, M. de Rijke, and R. Munos. Relative confidence sampling for efficient on-line ranker evaluation. In WSDM, 73–82, 2014.