Sequential Learning of Product Recommendations with Customer Disengagement

Hamsa Bastani
Wharton School, Operations Information and Decisions, [email protected]

Pavithra Harsha
IBM Thomas J. Watson Research, [email protected]

Georgia Perakis
MIT Sloan School of Management, Operations Management, [email protected]

Divya Singhvi
MIT Operations Research Center,
[email protected]

We consider the problem of sequential product recommendation when customer preferences are unknown. First, we present empirical evidence of customer disengagement using a sequence of ad campaigns from a major airline carrier. In particular, customers decide to stay on the platform based on the relevance of recommendations. We then formulate this problem as a linear bandit, with the notable difference that the customer's horizon length is a function of past recommendations. We prove that any algorithm in this setting achieves linear regret. Thus, no algorithm can keep all customers engaged; however, we can hope to keep a subset of customers engaged. Unfortunately, we find that classical bandit learning as well as greedy algorithms provably over-explore, thereby incurring linear regret for every customer. We propose modifying bandit learning strategies by constraining the action space upfront using an integer program. We prove that this simple modification allows our algorithm to achieve sublinear regret for a significant fraction of customers. Furthermore, numerical experiments on real movie recommendation data demonstrate that our algorithm can improve customer engagement with the platform by up to 80%.

Key words: bandits, online learning, recommendation systems, disengagement, cold start

History: This paper is under preparation.

1. Introduction

Personalized customer recommendations are a key ingredient in the success of platforms such as Netflix, Amazon and Expedia.