Diversifying Music Recommendations
Total Page:16
File Type:pdf, Size:1020Kb
Diversifying Music Recommendations Houssam Nassif [email protected] Kemal Oral Cansizlar [email protected] Mitchell Goodman [email protected] Amazon, Seattle, USA S. V. N. Vishwanathan [email protected] Amazon, Palo Alto, USA, and University of California, Santa Cruz, USA Abstract It is common for music recommendations to be rendered in list form, which makes it easy for users to peruse on We compare submodular and Jaccard meth- desktop, mobile or voice command devices. Naively rank- ods to diversify Amazon Music recommenda- ing recommended songs by their personalized score results tions. Submodularity significantly improves rec- in lower user satisfaction because similar songs get recom- ommendation quality and user engagement. Un- mended in a row. Duplication leads to stale user experi- like the Jaccard method, our submodular ap- ence, and to lost opportunities for music content providers proach incorporates item relevance score within wanting to showcase their content selection breadth. This its optimization function, and produces a relevant impact is amplified on devices with limited interaction ca- and uniformly diverse set. pabilities. For example, smart phones have a limited screen real estate, and it is usually more onerous to navigate be- tween screens or even scroll down the page (see Figure1). 1. Motivation In fact, other factors besides accuracy contribute towards With the rise of digital music streaming and distribution, recommendation quality. Such factors include diversity, and with online music stores and streaming stations dom- novelty, and serendipity, which complement and often inating the industry, automatic music recommendation is contradict accuracy (Zhang et al., 2012). Since we also becoming an increasingly relevant problem. Various rec- deal with user-constructed libraries, we focus on exploring ommender systems have been proposed, including models methods to diversify music recommendations. based on collaborative filtering (Xing et al., 2014), con- tent (van den Oord et al., 2013; Soleymani et al., 2015), This work presents experiments to diversify Amazon Music context and emotions (Song et al., 2012). Most of those mobile app recommendations. Amazon offers Prime mem- recommender systems focus on improving recommenda- bers a free Prime Music benefit, with access to millions of tion accuracy and user preference modeling, in order to songs and thousands of expert-programmed playlists. Cus- produce individually more enjoyable items. tomers can also upload their own music to their library, and mix it with Prime Music content to create personal Unlike other digital and physical products, music content playlists. Amazon Music developed a recommender sys- tends to have explicit clusters. An album contains multiple tem that assigns a personalized score to each music content. songs, all of which share the same album cover graphic, title and description. Furthermore, songs within the same album tend to belong to the same genre, and are usually 2. Diversity Methods played back to back. Due to their similar features, recom- Similar to cases in visual discovery (Teo et al., 2016), im- mender systems tend to score same-album songs similarly. age search (van Leuken et al., 2009), blog posts (El-Arini et al., 2009), and news articles (Ahmed et al., 2012), we c Nassif, Cansizlar, Goodman, Vishwanathan, Licensed under a apply diversification to alleviate recommendation redun- Creative Commons Attribution 4.0 International License (CC BY dancy. We consider two different diversity methods, one 4.0). Attribution: Nassif, Cansizlar, Goodman, Vishwanathan, based on Jaccard distance, and the other on submodularity. “Diversifying Music Recommendations” Machine Learning for Music Discovery Workshop at the 33 rd International Conference on Machine Learning, New York, NY, 2016. Diversifying Music Recommendations 2.1. Jaccard Swap Diversity the Amazon Prime Music app track, album and playlist rec- ommendations (Figure1). We run a 3-way A/B test on The Jaccard distance measures dissimilarity between two top of the Amazon Music recommender system: baseline, finite sample sets A and B: Jaccard Swap diversity, and submodular diversity. Recom- jA [ Bj − jA \ Bj mender uses item-to-item collaborative filtering and pro- d (A; B) = : (1) J jA [ Bj vides item score and explanatory set (Linden et al., 2003). We use the artist and album features as attributes/categories For each candidate music recommendation, we use an for the Jaccard/submodular methods. Baseline ranks by explanation-based diversification method to generate a set recommender score and lacks diversity. The experiment of weighted corresponding explanatory items (Yu et al., lasted 4 weeks, with equal allocation of at least 700,000 2009). The explanatory items are latent features of the customers per treatment. We evaluate the treatment im- candidate, generated from content and behavioral features. pact on engagement by tracking the number of minutes We compute the Jaccard distance between two music rec- streamed. We compare the method’s lift (Nassif et al., ommendations by applying Equation1 to their underlying 2013) via Welch’s t-test. explanatory items (Clarkson, 2006). We generate a list of k = 40 recommendations using the Algorithm Swap Table 1. Increase in number of minutes streamed. method (Yu et al., 2009), which iteratively maximizes top k Treatment Jaccard Swap Submodular pair-wise Jaccard distance, conditioned on score relevance. Baseline 0:40% (p = 0:18) 0:64% (p = 0:03) 2.2. Submodular Diversity Jaccard Swap 0:24% (p = 0:41) Alternatively, we formulate the selection and ranking of a diverse musical subset as a submodular optimization prob- Table1 shows experimental results. Based on minutes lem (Fujishige, 2005). Submodular functions are charac- streamed, both diversity measures fare better than baseline. terized by a diminishing returns property. For set S, sub- This result reinforces the notion that diversity affects rec- set A ⊆ S, elements x; y 2 S, and submodular function ommendation quality (Zhang et al., 2012). Only submodu- larity’s 0:64% lift improvement is significant (Figure1). f : f0; 1gS ! R, we have: f(A [ fxg) − f(A) ≥ f(A [ fx; yg) − f(A [ fyg): (2) We divide all musical content into C categories, according to the same content attributes. Each scored candidate gets mapped to multiple categories based on its content and be- havioral features. Each category c has its own submodular function fc. To ensure that a candidate contributes no more than its personalized score, we use: ! X fc(A) = log 1 + score(ai) : (3) i We diversify by maximizing the sum ρ of all category func- tions fc, which is itself submodular: C ! X (a) Baseline recommendation (b) Submodular diversity argmax ρ(A) = fc(A) : (4) A c Figure 1. Effect of diversification on Amazon Prime Music mo- bile app personalized album recommendations. A near-optimal solution is achieved by an iterative greedy procedure (Nemhauser et al., 1978): Our submodular solution outperforms the Jaccard ap- ( ) proach. One reason may be that maximizing the submod- A0 := ; and Ai+1 := Ai [ argmax ρ(Ai [ fag) : (5) ular function ρ (Equation4) by iteratively picking the cat- a2AnAi egory with the highest gain (Equation5) produces a uni- formly diverse stream, as the diminishing returns property 3. Results and Discussion holds for any contiguous part of the recommended list. Jac- card Swap doesn’t guarantee such a smooth list. We test the effectiveness of our diversity methods using an online experiment to improve customer engagement with Another possible reason is due to Algorithm Swap, which Diversifying Music Recommendations does not necessarily retain the most relevant content at the personalized diversity for visual discovery. In Proceed- head of the list. The swap can sacrifice a highly relevant ings of the 10th ACM Conference on Recommender Sys- song in order to increase overall diversity (Yu et al., 2009). tems (RecSys), 2016. The submodular approach ensures that the most relevant song appears first, followed by a mix of the most relevant van den Oord, A., Dieleman, S., and Schrauwen, B. Deep songs within each category. content-based music recommendation. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 2643–2651, 2013. References van Leuken, R. H., Garcia, L., Olivares, X., and van Zwol, Ahmed, A., Teo, C. H., Vishwanathan, S. V. N., and Smola, R. Visual diversification of image search results. In A. Fair and balanced: Learning to present news sto- Proceedings of International Conference on World Wide ries. In Proceedings of ACM International Conference Web (WWW), pp. 341–350, 2009. on Web Search and Data Mining (WSDM), pp. 333–342, 2012. Xing, Z., Wang, X., and Wang, Y. Enhancing collabora- tive filtering music recommendation by balancing explo- Clarkson, K. L. Nearest-neighbor searching and metric ration and exploitation. In Proceedings of International space dimensions. In Shakhnarovich, Gregory, Darrell, Society for Music Information Retrieval Conference (IS- Trevor, and Indyk, Piotr (eds.), Nearest-Neighbor Meth- MIR), pp. 445–450, 2014. ods for Learning and Vision: Theory and Practice, pp. 15–59. MIT Press, 2006. Yu, C., Lakshmanan, L., and Amer-Yahia, S. It takes va- riety to make a world: Diversification in recommender El-Arini, K., Veda, G., Shahaf, D., and Guestrin, C. Turn- systems. In Proceedings of the International Conference ing down the noise in the blogosphere. In Proceedings of on Extending Database Technology (EDBT), pp. 368– SIGKDD Conference on Knowledge Discovery and Data 378, 2009. Mining (KDD), pp. 289–298, 2009. Zhang, Y. C., Seaghdha,´ D. O.,´ Quercia, D., and Jambor, Fujishige, S. Submodular functions and optimization. El- T. Auralist: Introducing serendipity into music recom- sevier, 2005. mendation. In Proceedings of ACM International Con- ference on Web Search and Data Mining (WSDM), pp. Linden, G., Smith, B., and York, J. Amazon.com recom- 13–22, 2012. mendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003. Nassif, H., Kuusisto, F., Burnside, E. S., and Shavlik, J. Uplift modeling with ROC: An SRL case study. In Inter- national Conference on Inductive Logic Programming (ILP), pp.