Matrix Co-Factorization for Cold-Start Recommendation Olivier Gouvert, Thomas Oberlin, Cédric Févotte
To cite this version: Olivier Gouvert, Thomas Oberlin, Cédric Févotte. Matrix co-factorization for cold-start recommendation. 19th International Society for Music Information Retrieval Conference (ISMIR 2018), Sep 2018, Paris, France. pp.1-7. hal-02279385
HAL Id: hal-02279385
https://hal.archives-ouvertes.fr/hal-02279385
Submitted on 17 Sep 2019
Author's version published in: http://oatao.univ-toulouse.fr/22489 (ISMIR 2018, 23 September 2018 - 27 September 2018, Paris, France)

Olivier Gouvert, Thomas Oberlin, Cédric Févotte
IRIT, Université de Toulouse, CNRS, France

© Olivier Gouvert, Thomas Oberlin, Cédric Févotte. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

ABSTRACT

Song recommendation from listening counts is now a classical problem, addressed by different kinds of collaborative filtering (CF) techniques. Among them, Poisson matrix factorization (PMF) has raised a lot of interest, since it seems well-suited to the implicit data provided by listening counts. Additionally, it has proven to achieve state-of-the-art performance while being scalable to big data. Yet, CF suffers from a critical issue, usually called the cold-start problem: the system cannot recommend new songs, i.e., songs which have never been listened to. To alleviate this, one should complement the listening counts with another modality. This paper proposes a multi-modal extension of PMF applied to listening counts and tag labels extracted from the Million Song Dataset. In our model, every song is represented by the same activation pattern in each modality but with possibly different scales. As such, the method is not prone to the cold-start problem, i.e., it can learn from a single modality when the other one is not informative. Our model is symmetric (it equally uses both modalities) and we evaluate it on two tasks: new song recommendation and tag labeling.

1. INTRODUCTION

New albums and songs are released every day and are instantly available on streaming platforms. An important issue for streaming companies is therefore to develop recommender systems which are able to handle such new songs [13, 20]. More generally, additional information on those songs is needed to enrich the catalog, allowing users to efficiently explore and find the songs they might like. In this perspective, tag labeling has proven to be very useful. The labels can be attributed by experts or by the user, and algorithms can complement this information with automatic labeling [7].

For both tasks (song recommendation and tag labeling), matrix factorization (MF) techniques [12, 17], and in particular Poisson MF (PMF), achieve strong performance. Unfortunately, these techniques suffer from the well-known cold-start problem: such a recommender system cannot recommend songs which have never been listened to, and similarly it cannot label untagged songs. A joint modeling of both modalities can achieve cold-start recommendation, as soon as at least one modality is observed for every song [8, 22].

In this paper, we propose a new matrix co-factorization model based on PMF, which performs those two tasks jointly. Our model is robust to the cold-start problem for both modalities. It can recommend a song which has never been listened to, based on its associated tags. Symmetrically, it can associate tags with a song based on who listened to it. To do that, we separately model the scale (popularity) of each song according to each modality, while the patterns across the topics are shared.

The state of the art of co-factorization techniques is presented in Section 2, along with some background on PMF. Then, in Section 3, we present our new model and explain its properties. In Section 4, we provide a majorization-minimization (MM) algorithm for solving our optimization problem and underline its scalability. Finally, in Section 5, we test our model on song recommendation and tag labeling in various settings.

2. RELATED WORKS

In this paper, we focus on works based on so-called hybrid techniques [1] and Poisson matrix factorization. Note that recommendation tasks can also be addressed with other techniques such as factorization machines [19].

2.1 Poisson matrix factorization

PMF is a non-negative MF (NMF) technique [14]. Let $\mathbf{Y}$ be a matrix of size $F \times I$, where each column represents an item (song) $i$ according to $F$ features. MF approximates the observed matrix $\mathbf{Y}$ by a low-rank product of two matrices: $\mathbf{Y} \approx \mathbf{W}\mathbf{H}^T$, where $\mathbf{W} \in \mathbb{R}_+^{F \times K}$ represents a dictionary matrix, and $\mathbf{H} \in \mathbb{R}_+^{I \times K}$ represents a matrix of attributes (activations), with $K \ll \min(F, I)$.

When observed data are in the form of counts, i.e., $\mathbf{Y} \in \mathbb{N}^{F \times I}$, a classical hypothesis is to assume that each observation is drawn from a Poisson distribution:

$$y_{fi} \sim \mathrm{Poisson}([\mathbf{W}\mathbf{H}^T]_{fi}). \tag{1}$$

The maximum likelihood (ML) estimator of $\mathbf{W}$ and $\mathbf{H}$ is therefore obtained by minimizing the cost function defined by:

$$C(\mathbf{W}, \mathbf{H}) = -\log p(\mathbf{Y} \mid \mathbf{W}, \mathbf{H}) = D_{KL}(\mathbf{Y} \mid \mathbf{W}\mathbf{H}^T) + cst \quad \text{s.t. } \mathbf{W} \geq 0,\ \mathbf{H} \geq 0, \tag{2}$$

where $cst$ is a constant with respect to (w.r.t.) $\mathbf{W}$ and $\mathbf{H}$, and where $D_{KL}$ is the generalized Kullback-Leibler (KL) divergence defined by:

$$D_{KL}(\mathbf{Y} \mid \mathbf{X}) = \sum_{f,i} \left( y_{fi} \log \frac{y_{fi}}{x_{fi}} - y_{fi} + x_{fi} \right). \tag{3}$$

This low-rank approximation is known as KL non-negative matrix factorization (KL-NMF) [9, 15].

The cost function $C$ is scale invariant, i.e., for any diagonal non-singular matrix $\mathbf{\Lambda} \in \mathbb{R}_+^{K \times K}$, we have $C(\mathbf{W}, \mathbf{H}) = C(\mathbf{W}\mathbf{\Lambda}^{-1}, \mathbf{H}\mathbf{\Lambda})$. To avoid degenerate solutions, a renormalization such that $\sum_f w_{fk} = F$ is often used, where $w_{fk} = [\mathbf{W}]_{fk}$.

Several extensions based on Bayesian formulations have been proposed in the literature [3, 5, 6, 10, 17]. In [10], the authors developed a hierarchical Poisson factorization (HPF) by introducing new variables: the popularity of the items and the activity of the users. These variables play a significant role in recommendation tasks.

2.2 Co-factorization

A way of circumventing the cold-start problem is to introduce new modalities [8, 11, 16]. Co-factorization frameworks have been developed to jointly factorize two matrices of observations (two modalities): $\mathbf{Y}^A \approx \mathbf{W}^A(\mathbf{H}^A)^T$ and $\mathbf{Y}^B \approx \mathbf{W}^B(\mathbf{H}^B)^T$, with shared information between the activation matrices: $\mathbf{H}^A \approx \mathbf{H}^B$.

2.2.1 Hard co-factorization

Hard co-factorization [8, 21] posits that the link between activations is an equality constraint: $\mathbf{H}^A = \mathbf{H}^B = \mathbf{H}$. This is equivalent to concatenating the observations $\mathbf{Y}^A$ and $\mathbf{Y}^B$, and the dictionaries $\mathbf{W}^A$ and $\mathbf{W}^B$:

$$D_{KL}(\mathbf{Y}^A \mid \mathbf{W}^A \mathbf{H}^T) + \gamma D_{KL}(\mathbf{Y}^B \mid \mathbf{W}^B \mathbf{H}^T)$$

A popular choice for this penalty is the $\ell_1$-norm: $\mathrm{Pen}(\mathbf{H}^A, \mathbf{H}^B) = \lVert \mathbf{H}^A - \mathbf{H}^B \rVert_1$. It is appropriate when both modalities are likely to share the same activations, except at some sparse locations where they can differ significantly.

2.2.3 Offset models

Bayesian formulations of the soft co-factorization problem have also been developed through the introduction of an offset latent variable [11, 22]. The link between activations is therefore given by:

$$h^B_{ik} = h^A_{ik} + \varepsilon_{ik}, \tag{6}$$

where $\varepsilon$ is a latent random variable.

In particular, in [11], a co-factorization model is developed based on PMF, with $\varepsilon_{ik} \sim \mathrm{Gamma}(\alpha, \beta)$. This choice is motivated by the conjugacy property of the gamma distribution with the Poisson distribution. Nevertheless, the model is not symmetric w.r.t. the activations $\mathbf{H}^A$ and $\mathbf{H}^B$, as $h^B_{ik} > h^A_{ik}$ by construction. Thus, it can solve the cold-start problem only for the modality A and not for B.

3. PROPOSED MODEL

3.1 Notations

In this article, we work with two different modalities. The first modality, denoted by A, corresponds to the listening counts of $U$ users on $I$ songs. The second modality, denoted by B, corresponds to the tags assigned to these $I$ songs, among a set of $V$ tags. $\mathbf{W}^A$ and $\mathbf{W}^B$ thus denote the preferences of users and the atoms of tags across the $K$ patterns, respectively.

3.2 Link between attributes

We propose an equality constraint on normalized activations. We denote by $n^A_i = \sum_k h^A_{ik}$ and $n^B_i = \sum_k h^B_{ik}$ the sums of the rows of the activations. We impose, for each item $i$:

$$\frac{h^A_{ik}}{n^A_i} = \frac{h^B_{ik}}{n^B_i} = d_{ik}, \tag{7}$$

when $n^A_i > 0$ and $n^B_i > 0$.
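To make the constraint of Eq. (7) concrete, the following minimal Python sketch computes the scales $n_i$ and the normalized patterns $d_{ik}$ for two toy activation matrices. It is an illustration only and is not taken from the paper: the function name `normalize_activations`, the handling of zero rows, and all numerical values are assumptions introduced here.

```python
def normalize_activations(H):
    """Return (scales, patterns) for an activation matrix H given as a list of rows.

    scales[i]      = n_i  = sum_k h_ik
    patterns[i][k] = d_ik = h_ik / n_i
    Rows with n_i == 0 are returned as all zeros (an assumption made here),
    since Eq. (7) is only stated for strictly positive scales.
    """
    scales = [sum(row) for row in H]
    patterns = [
        [h / n for h in row] if n > 0 else [0.0] * len(row)
        for row, n in zip(H, scales)
    ]
    return scales, patterns


# Toy example with made-up numbers: I = 2 songs, K = 3 patterns.
# H_B uses the same per-song patterns as H_A, but different per-song scales.
H_A = [[2.0, 1.0, 1.0], [0.0, 3.0, 1.0]]
H_B = [[1.0, 0.5, 0.5], [0.0, 6.0, 2.0]]

n_A, D_A = normalize_activations(H_A)
n_B, D_B = normalize_activations(H_B)

print("scales in modality A:", n_A)
print("scales in modality B:", n_B)
print("shared patterns:", D_A == D_B)
```

In this toy case the two modalities assign different scales to each song while the normalized rows coincide, which is the situation Eq. (7) describes.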