Attentive Autoencoders for Multifaceted Preference Learning in One-class Collaborative Filtering

Zheda Mai§, Ga Wu§, Kai Luo and Scott Sanner
Department of Mechanical & Industrial Engineering, University of Toronto, Canada
[email protected], {wuga, kluo, ssanner}@mie.utoronto.ca

Abstract—Most existing One-Class Collaborative Filtering (OC-CF) algorithms estimate a user's preference as a latent vector by encoding their historical interactions. However, users often show diverse interests, which significantly increases the learning difficulty. In order to capture multifaceted user preferences, existing recommender systems either increase the encoding complexity or extend the latent representation dimension. Unfortunately, these changes inevitably lead to increased training difficulty and exacerbate scalability issues. In this paper, we propose a novel and efficient CF framework called Attentive Multi-modal AutoRec (AMA) that explicitly tracks multiple facets of user preferences. Specifically, we extend the Autoencoding-based recommender AutoRec to learn user preferences with multi-modal latent representations, where each mode captures one facet of a user's preferences. By leveraging the attention mechanism, each observed interaction can have different contributions to the preference facets. Through extensive experiments on three real-world datasets, we show that AMA is competitive with state-of-the-art models under the OC-CF setting. We also demonstrate how the proposed model improves interpretability by providing explanations using the attention mechanism.

Index Terms—One-class Collaborative Filtering, Attention Model, Multifaceted Preference Recommendation

arXiv:2010.12803v1 [cs.IR] 24 Oct 2020

I. INTRODUCTION

Collaborative Filtering (CF) is one of the most prevalent approaches for personalized recommendation that relies only on historical user-item interaction data. Many CF systems in the literature perform well when provided with explicit user feedback, such as 1-5 user ratings of a product. In many applications, however, only positive feedback is available, such as view counts of a video or purchases of an item. One challenge of working with implicit feedback data is the lack of negative signals. Although we can safely presume that purchases indicate a customer's preference for a product, we should not construe unobserved interactions as negative feedback, since a customer would not be aware of all the items on a platform and could not purchase every single item they like. The recommendation setting with only positive feedback observed is known as One-Class Collaborative Filtering (OC-CF) [1].

In the OC-CF setting, many CF algorithms learn the preference representation of each user as a latent vector by encoding their historical interactions. However, users often have diverse preferences, which significantly increases the difficulty of modeling multifaceted preferences. In order to capture complicated user preferences, existing recommender systems can either increase the encoding complexity to enhance the modeling capability or extend the latent representation to a higher dimension. Unfortunately, models with complex encoding modules are fraught with increased training difficulty, while models with high-dimensional latent spaces require strong regularization to prevent overfitting and prove challenging to scale to large, real-world datasets.

However, learning complex user preferences in low-dimensional latent spaces is not easy. Most popular recommender systems aim to model user preferences as either a single point or a uni-modal distribution in the low-dimensional latent representation space, which we denote as uni-modal estimation. This approach can capture the preferences of users who have a single preference type. However, for users with multifaceted preferences, uni-modal estimation may not suffice, since it is hard to capture the characteristics of all user preferences with one mode. As we can see in Figure 1, for users with multifaceted preferences, uni-modal estimation will try to "average" the user-item interactions and lead to an inaccurate estimate. Therefore, we should model user preferences with more than one mode in the latent space, which we denote as multi-modal estimation.¹ With this approach, each mode in the latent representation explicitly captures one facet of the user's preferences and can better handle the multifaceted preference situation mentioned above (Figure 1). Nevertheless, directly modeling multifaceted preferences is still non-trivial, as the ground-truth mappings between item interactions and the facets of user preferences are not available, and individual item interactions may contribute differently to different facets.

To address this problem, we propose a novel and efficient CF framework called Attentive Multi-modal AutoRec (AMA) to track multifaceted user preferences explicitly. AMA is based on AutoRec [3], one of the earliest Autoencoder-based recommender systems, which aims to learn a low-dimensional latent representation that can reconstruct users' observed interactions. But traditional AutoRec adopts a uni-modal estimation approach and uses a single embedding vector to represent diverse user preferences.

To capture multifaceted user preferences, we replace the uni-modal estimation in AutoRec with multi-modal estimation, where we have more than one latent embedding for each user (i.e., one for each preference facet). Additionally, we take advantage of the attention mechanism to assign different weights to each item interaction when we estimate the different latent embeddings. During the decoding phase, each latent embedding creates a prediction, and the model selects the prediction with the highest output value. Apart from the benefit of capturing multifaceted user preferences, AMA can also provide a reasonable interpretation for its recommendations, since we know which latent embedding a recommendation comes from and which items contribute to that latent embedding. We evaluate AMA extensively on three real-world datasets, and the results show AMA is competitive with many state-of-the-art models with much fewer parameters.

Our contributions are summarized as follows:
• We propose a novel and efficient Autoencoder-based recommender method called Attentive Multi-modal AutoRec (AMA) to accommodate users with multifaceted preferences in OC-CF.
• We employ the attention mechanism to generate multi-modal latent embeddings that capture multifaceted user preferences, as well as to provide a reasonable interpretation of the recommendations with a localized k-nearest neighbor method.
• Through extensive experiments on three real-world datasets, we demonstrate that AMA is competitive with many state-of-the-art methods, albeit with many fewer parameters and much lower computational requirements.

§ Equal contribution.
¹ The term multi-modal in our context means the latent representation has multiple modes in the latent space. Specifically, it is different from its use in multi-modal learning [2], where data consists of multiple input modalities.

Fig. 1: A 2-dimensional latent space conceptualization of uni-modal user preference estimation vs. multi-modal user preference estimation. Uni-modal user-preference estimation results in poor performance because of the difficulty in capturing common properties of user-item interactions. Multi-modal user-preference estimation mitigates the issue by learning several significant potential preference modes. (Panel labels: User Representation, Item Representation, Uni-Modal Estimation, Multi-Modal Estimation.)

II. PRELIMINARY

A. Notation

Before proceeding, we define our notation as follows:
• r^(i): The implicit feedback vector of user i with shape n × 1, where n is the number of items. Each entry r_j^(i) is either 1, which indicates there is an interaction between user i and item j, or 0 otherwise (no interaction).
• r̂^(i): Reconstruction of the original vector r^(i).
• r̃^(i): Randomly corrupted implicit feedback vector r^(i) whose entries are randomly reset to 0.
• v_j: Latent embedding vector of item j with shape h × 1, where h is the size of the embedding.
• ṽ_j: Value vector of item j in attention, with shape h × 1.
• U^(i): Latent embedding matrix of user i with shape d × h, where d is the number of user embedding vectors. u_l^(i) denotes the l-th latent embedding vector for user i, with shape h × 1.
• c^(i): Loss weighting vector of user i, which we will define shortly.

B. AutoRec

AutoRec [3] is a state-of-the-art Autoencoder-based collaborative filtering system. It embeds a user's sparse preference observations in a latent space and reconstructs a dense version of those preferences from the embedding to enable personalized recommendations.

The architecture of the original AutoRec is straightforward: a fully connected neural network with one hidden layer and a sigmoid activation function, which is no different from conventional Autoencoders. Formally,

u^(i) = f_ϑ(r^(i)) and r̂^(i) = f_θ(u^(i)), (1)

where u^(i) is the user representation vector that encodes user preferences, and ϑ, θ are the weights of the encoder and decoder networks, respectively.

However, since unobserved interactions do not deliver information about user preferences, AutoRec modifies its objective function as follows:

arg min_{θ,ϑ} Σ_{i=1}^{m} ||r^(i) − f_θ(f_ϑ(r^(i)))||²_O + (λ/2)(||θ||²_F + ||ϑ||²_F), (2)

where ||·||²_O means that the training only takes observed ratings into consideration.

The performance of Autoencoder-based recommender systems relies heavily on the accuracy of the user preference embedding, where the preferences of a user could be diverse. In order to better capture complicated user preferences, many previous works propose to increase model complexity by either expanding the latent representation space or training deeper encoder/decoder neural networks (see [4]–[6]). Unfortunately, these models must trade off between model complexity and data sparsity in recommendation tasks. Specifically, complex models overfit the training data due to insufficient observed user-item interactions. In contrast, simple models fail to capture fine-grained user preferences and introduce biases in prediction (e.g., popularity bias).

C. Attention Mechanism

The attention mechanism in a deep neural network is a knowledge retrieval method that automatically computes a latent representation z from various weighted source features V = [v_1 ··· v_n] with attention weights a as follows:

z = Σ_{j=1}^{n} a_j v_j. (3)

The distribution of the attention weights a reflects the quantified interdependence between a query q and a list of feature keys K = [k_1 ··· k_n]. Usually, a higher attention weight a_j indicates a higher value of the corresponding feature v_j to the task.
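As a concrete illustration of the AutoRec formulation in Equations 1 and 2, the following minimal NumPy sketch (our own illustration with randomly initialized weights, not the authors' code) runs one encode/decode pass and computes the reconstruction error restricted to observed entries:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 6, 3                              # n items, latent size h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder/decoder weights (random here; learned in practice)
W_enc = rng.normal(scale=0.1, size=(h, n))
W_dec = rng.normal(scale=0.1, size=(n, h))

r = np.array([1., 0., 1., 0., 0., 1.])   # implicit feedback vector r^(i)

u = sigmoid(W_enc @ r)                   # Eq. 1: u^(i) = f_theta(r^(i))
r_hat = sigmoid(W_dec @ u)               # Eq. 1: reconstruction r-hat^(i)

# Eq. 2: squared error over observed entries only (the ||.||_O norm)
mask = (r == 1)
loss = np.sum((r[mask] - r_hat[mask]) ** 2)

print(r_hat.shape, float(loss) >= 0.0)
```

Note how the mask implements the one-class aspect: the three unobserved entries of r contribute nothing to the loss, so the model is never told that they are negatives.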

The most commonly used attention mechanism is the Scaled Dot-Product Attention introduced along with the Transformer architecture [7], as defined in Equation 4:

a = softmax(Kq / √κ), (4)

where κ denotes the length of the key and query vectors.

D. Attentive Recommender Systems

Much recent research shows that the attention mechanism can significantly improve recommendation performance (see [8]–[12]). Especially in content-based recommender systems, the attention mechanism aims to extract features from various content sources and has proven to be effective. Attentive Collaborative Filtering (ACF) [8] is one of the most expressive model representations, which extracts informative components from multimedia items.

In addition, the attention mechanism also contributes to session-based recommendation tasks, where the recommender system actively tracks a user's preferences as they dynamically evolve. ATRank [10] is a representative model for session-based recommenders, which adopts self-attention to model heterogeneous observations of user behaviour.

Unfortunately, less effort has been made on incorporating the attention mechanism into traditional collaborative filtering tasks, where the system is only provided with the user-item interaction history. Latent Relational Metric Learning (LRML) [13] is a pioneering collaborative filtering approach that uses attention to model the relationship between user-item interactions. However, similar to other Metric Learning based CF algorithms, LRML suffers from exceptionally slow convergence due to negative sampling.

We note that some content-based filtering systems could support collaborative filtering tasks with negligible changes by replacing content-based item embeddings with collaborative item embeddings. For example, we can obtain the collaborative item embeddings simply from an SVD decomposition of the interaction matrix R. Through this method, content-based systems such as Attentive Collaborative Filtering (ACF) [8] are capable of tackling collaborative filtering tasks without side information. However, as those systems were not intentionally invented for pure collaborative filtering tasks, the advantage of their sophisticated architecture design turns out to be a burden when encountering highly sparse observations.

Fig. 2: Multiple user preference facets (5) that are used to demonstrate user preference in a movie recommendation task, where each vertex represents a preference degree of the corresponding facet. The figure shows an example of three users. (Vertex labels include Action, Adventure, Comedy, Horror, and Drama.)

III. ATTENTIVE MULTI-MODAL AUTOREC

In this section, we present an alternative attention-based collaborative filtering architecture: Attentive Multi-modal AutoRec. The proposed model aims to 1) reduce the computation and storage requirements for collaborative filtering, 2) improve interpretability, and 3) track user preferences from multiple perspectives for each user to improve recommendation performance.

User preferences are often a combination of multiple hidden characteristic facets in recommendation tasks. Instead of representing the user preference as a single vector, it is probably more intuitive to represent it as a group of facets, as demonstrated in Figure 2. Formally, we want to create an encoding function f_ϑ such that

U^(i) = f_ϑ(r^(i)) and U^(i) = [u_1^(i), u_2^(i) ··· u_d^(i)], (5)

where each user i has d diverse preference facets that jointly represent the user's preference. Since the user preference is distributed over multiple vectors, the length of each vector can be substantially reduced.

With multiple user representation vectors, one can make recommendations to the user by jointly considering each of the characteristic facets (or preference modes) the user has. Formally, there exists a decoder function f_θ such that

r̂^(i) = f_θ(U^(i)). (6)

However, in order to achieve our purpose of estimating multiple user preference facets, we still face a few challenges:
• How do we estimate multiple preference facets when facet side information is not available in the collaborative filtering setting?
• How do we control the model complexity so that it can be trained on real-world datasets with very sparse observations?

We describe our approach in detail in the following subsections. To provide an overview of our approach, we also show the proposed model architecture in Figure 3.

A. Multi-head Attention Encoder

As demonstrated in Equation 5, all user preference facets u_l^(i) ∈ U^(i) are estimated from the same interaction vector r^(i) of user i. To allow the preference facets to capture different information, we need to let each preference facet pay attention to a different part of the interactions such that

u_l^(i) = Σ_j a_{j,l}^(i) ṽ_j + b_l, (7)

Fig. 3: Attentive Multi-modal AutoRec architecture (panels: Multi-head Attentive Encoder, Maxout Decoder). Colors are used to distinguish preference modes. Gray boxes show fixed embeddings that are not in training scope. Dashed arrows and boxes show the track of dropped items. In this particular example architecture, we have four items in the recommendation scope. The latent embedding dimension is five. Key and query dimensions are both two. Each user has a maximum of three potential preferences.

where ṽ_j denotes item information that contributes to estimating user facets, a_{j,l}^(i) denotes the attention score from user-item interaction (i, j) to preference facet l, and b_l denotes the bias of user preference facet l.

To establish the attention mechanism, we expect each item j to have two components: an item key k_j and an item value ṽ_j. Correspondingly, each preference facet l has a query q_l that is shared across all users. We describe how we obtain these values later. Based on the Scaled Dot-Product Attention described in Section II-C, we can calculate the attention a_l^(i) mentioned in Equation 7 through

a_l^(i) = (1/Z(·)) [ r^(i) ⊙ exp(q_l K / √κ) ], (8)

where the partition function Z(·) sums over all values in the numerator.

It is important to mention that this attention mechanism is slightly different from its original form. Concretely, we introduce the historical interaction indicator r^(i) as a mask in the above function, so that the attention is limited to the observed interactions (r_j^(i) = 1) for each user.

Queries, Keys and Values. Now we describe how to gather the query, key and value components for the attention computation. Given item embeddings for all items V ∈ R^{n×h}, the key and value for each item are linear transformations of its corresponding item embedding. Formally, we assume there are two linear functions such that

k_j = v_j W_k and ṽ_j = W_v v_j, (9)

where the coefficients W_k ∈ R^{h×κ} and W_v ∈ R^{h×h}.

While the queries could also be correlated with potential user embeddings, we believe the preference facet categories for a recommendation task are sharable across all users. Therefore, we simply set each query q_l ∈ R^κ as a trainable random variable. During model training, the query variables automatically look for the best facet that could represent user preferences. Although it is possible to place an orthogonality regularizer on the query variables to encourage learning diverse preference facets, we note that overlapping facets do not have a significant negative impact on recommendation performance. Thus, we omit our attempts to regularize the queries.

Item Embedding. Since we only track one item embedding matrix V during the encoding stage, we can presume the item embedding matrix comes from various sources without training. However, as we limit our information accessibility to only users' historical interactions, we should obtain the item embedding matrix from the interaction matrix R. One simple solution is the SVD decomposition such that

R = UΣVᵀ. (10)

While one could also adopt state-of-the-art approaches such as Graph Convolutional Neural Networks (GCNs) to optimize the item embeddings, we stick to the simple SVD solution to clarify the contributions of our proposed model.

B. Maxout Decoder

A conventional Autoencoder-based recommender system predicts the score of a future interaction by decoding a user embedding into the item observation space through

r̂^(i) = f_θ(u^(i)). (11)

In a simple linear decoding network setting, a dot product between the user embedding u^(i) and item-specific decoding weights s_j is sufficient for the task:

r̂_j^(i) = u^(i)ᵀ s_j. (12)

These straightforward methods, while simple, demonstrate strong prediction performance in practice.

In the multi-modal user preference estimation setting of this paper, we keep the dot-product prediction approach and select the maximum prediction score among the predictions from the different user preference modes. Specifically, we have

r̂_j^(i) = max(U^(i) s_j), (13)

where we only maintain one set of decoding weights for all estimated preference facets.
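To make the masked attentive encoder (Equations 7–9) and the maxout decoder (Equation 13) concrete, here is a minimal NumPy sketch (our own illustration with randomly initialized parameters, not the authors' released code) using the example shapes from Figure 3: n = 4 items, embedding size h = 5, key/query size κ = 2, and d = 3 preference facets:

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, kappa, d = 4, 5, 2, 3        # items, embedding, key/query, facets (Fig. 3)

V   = rng.normal(size=(n, h))      # fixed item embeddings (e.g., from SVD of R)
W_k = rng.normal(size=(h, kappa))  # key projection
W_v = rng.normal(size=(h, h))      # value projection
Q   = rng.normal(size=(d, kappa))  # one shared trainable query per facet
B   = rng.normal(size=(d, h))      # facet biases b_l
S   = rng.normal(size=(h, n))      # decoding weights s_j as columns

r = np.array([1., 0., 1., 1.])     # observed interactions of one user

K      = V @ W_k                   # Eq. 9: item keys,   shape (n, kappa)
V_tild = V @ W_v.T                 # Eq. 9: item values, shape (n, h)

# Eq. 8: scaled dot-product scores, masked so only observed items attend
logits = Q @ K.T / np.sqrt(kappa)            # (d, n)
num    = r * np.exp(logits)                  # r^(i) acts as the mask
A      = num / num.sum(axis=1, keepdims=True)  # partition function Z

U = A @ V_tild + B                 # Eq. 7: facet embeddings U^(i), shape (d, h)

scores = U @ S                     # per-facet item scores, shape (d, n)
r_hat  = scores.max(axis=0)        # Eq. 13: maxout over preference facets
facet  = scores.argmax(axis=0)     # which facet explains each recommendation
```

The `facet` trace is what enables the interpretability argument: every predicted score can be attributed to exactly one preference facet, and the attention row `A[l]` shows which observed items built that facet.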
The maxout prediction decoder is the query variables to encourage learning diverse preference beneficial from two perspectives: • Items are recommended for different reasons. For exam- 143, 169 parameters to achieve the optimal recommendation ple, item A may be recommended because it fit preference performance while matrix factorization, a model with similar facet 1, whereas item B is recommended due to the performance as AMA, has 1, 914, 200 parameters, which is matching with user preference facet 2. far more than our proposed model. Fewer parameters does • The model behaviour is easy to interpret. We can easily not sacrifice the modelling flexibility of the proposed model trace which of the user preference facet dominated the since it supports multiple hyper-parameters. Table A.1 in recommendation. Appendix A2 shows all tune-able hyper-parameters of the AMA model. C. Training Objective As a standard Autoencoder based collaborative filtering IV. EXPERIMENTSAND EVALUATION method, the proposed model could support various type of In this section, we perform experiments aiming at answering objectives from simple Mean Squared Error (MSE) to the the following research questions: ranking losses such as Bayesian Pair-wise Ranking (BPR). We • RQ1 How does AMA perform as compared with state- choose Weighted MSE in this paper as it is efficient during of-the-art CF methods? training and shows reasonable recommendation performance. • RQ2 How effective is the proposed multi-modal prefer- Formally, the objective function is ence estimation? m • RQ3 How does the interpretability of the model benefit X (i) (i) (i) 2 arg maxθ,ϑ hc , (r − fθ(fϑ(˜r ))) i + λ kθk , (14) from the attention mechanism and multi-modal prefer- i=1 ence representation? where (i) is randomly corrupted interaction history and (i) is r˜ c A. Experiment Settings the loss weighting vector of user i. Here, the random corrup- tion means we randomly remove some observed interactions Datasets. 
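The attention encoder and maxout decoder described above (Equations 8-13) can be sketched in a few lines of NumPy. All sizes, initializations, and variable names below are our illustrative assumptions, not the authors' implementation; a trained model would learn W_k, W_v, Q, and S by minimizing the objective in Equation 14.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 50          # users, items (toy sizes, our assumption)
h, kappa, d = 16, 8, 3  # embedding size h, key/query size kappa, d facets

R = (rng.random((m, n)) < 0.1).astype(float)  # binary interaction matrix
R[0, :5] = 1.0          # make sure the example user has observed items

# Item embeddings from a truncated SVD of R (Eq. 10)
_, s, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt[:h].T * s[:h]    # n x h item embedding matrix

# Parameters (random here; learned in the actual model)
W_k = rng.normal(scale=0.1, size=(h, kappa))  # key projection
W_v = rng.normal(scale=0.1, size=(h, h))      # value projection
Q = rng.normal(scale=0.1, size=(d, kappa))    # one query per facet
S = rng.normal(scale=0.1, size=(n, h))        # decoder weights shared by facets

K = V @ W_k          # item keys  k_j = v_j W_k     (Eq. 9)
V_tilde = V @ W_v.T  # item values v~_j = W_v v_j   (Eq. 9)

def encode(r):
    """Masked scaled dot-product attention (Eq. 8): one embedding per facet."""
    scores = r * np.exp(Q @ K.T / np.sqrt(kappa))   # mask to observed items
    A = scores / scores.sum(axis=1, keepdims=True)  # partition function Z
    return A @ V_tilde                              # d x h facet embeddings

def decode(U_i):
    """Maxout decoder (Eq. 13): keep the best facet score for every item."""
    return (U_i @ S.T).max(axis=0)

r_hat = decode(encode(R[0]))   # length-n prediction vector for user 0
```

Because the decoder takes a max over facets, `(encode(R[0]) @ S.T).argmax(axis=0)` also reveals which facet is responsible for each item's score, which is the basis of the interpretability analysis in Section IV-D.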
A. Experiment Settings

Datasets. We evaluate the candidate algorithms on three publicly available rating datasets: Movielens-1M, Amazon Digital Music, and Amazon Video Games. The details of the datasets, the preprocessing method, and the train/validation/test splitting are presented in Appendix B².

Evaluation protocols. We evaluate recommendation performance using five metrics: Precision@K, Recall@K, MAP@K, R-Precision, and NDCG, where R-Precision is an order-insensitive metric, NDCG is order-sensitive, and Precision@K and Recall@K are semi-order-sensitive since the K values are given.

Baselines. To demonstrate the effectiveness of our approach, we compare the proposed AMA with nine state-of-the-art scalable models with the ability to handle millions of users and transactions. For a fair comparison, we only compare models with similar running time.
• POP: Most popular items. Not user-personalized, but an intuitive baseline to test the claims of this paper.
• AutoRec [3]: A neural Autoencoder based recommendation system with one hidden layer and a ReLU activation function.
• ACF [8]: Attentive Collaborative Filtering, the most well-known attention based recommender system.
• BPR [15]: Bayesian Personalized Ranking, one of the first recommendation algorithms that explicitly optimizes pairwise rankings.
• CDAE [16]: Collaborative Denoising Autoencoder, which is specifically optimized for implicit feedback recommendation tasks.
• CML [17]: Collaborative Metric Learning, a state-of-the-art metric learning based recommender system.
• PLRec [18], [19]: Projected linear recommendation approach.
• PureSVD [20]: A similarity based recommendation method that constructs a similarity matrix through SVD decomposition of the implicit matrix R.
• VAE-CF [21]: Variational Autoencoder for Collaborative Filtering, a state-of-the-art deep learning based recommender system.

More hyper-parameter details are provided in Appendix A².

²Please find the appendix in our extended version on arXiv.

Fig. 4: Statistics of the number of user preference modes used for producing the Top-10 recommendations for each user. (a) MovieLens; (b) Amazon Digital Music.

B. Performance Comparison (RQ1)

In this experiment, we compare the proposed model against several collaborative filtering algorithms on one-class recommendation tasks. Since some of the recommender systems were not originally designed for OC-CF tasks, we extended and enhanced them for the one-class task from various perspectives; we discuss the modifications in more detail in Appendix D².

Tables I, II, and III show the Top-N recommendation performance of the proposed model and the baselines on the benchmark Movielens-1M dataset and the two real-world Amazon datasets. From the tables³, we obtain the following observations:
• Compared to the Autoencoder based recommender systems (AutoRec++ and CDAE), the proposed Attentive Multi-modal AutoRec not only shows better performance on the benchmark dataset but also demonstrates better robustness, producing consistently strong recommendations on real data that is extremely sparse and noisy.
• Compared to the recommender systems that leverage the Noise Contrastive Estimation technique (ACF-BPR and PLRec), the proposed model shows better recommendation performance. This provides concrete evidence that the improvement of the proposed model does not merely rely on the quality of the prefixed item embeddings.
• Comparing the attention based recommender systems (ACF-BPR and AMA), we note that the proposed model has a consistent advantage on the Precision@K metric in general. However, on one of the three datasets, ACF does better than the proposed model on Recall@K.
• On the two Amazon datasets, the algorithms equipped with the BPR objective show significantly better performance than the other recommender systems, which indicates the practical advantage of methods that directly optimize ranking. However, to isolate the contribution of the proposed architecture, we do not let the proposed model take advantage of the BPR loss.

C. Effect of Multi-modal Estimation (RQ2)

To evaluate the effectiveness of the multi-modal estimation in our proposed model, we answer the following questions: (a) How many users benefit from using more than one user preference? (b) How does the number of user preferences estimated affect the Precision and Recall of the model? (c) What are the characteristics of the top-10 items that each preference mode pays the greatest attention to?

1) As Figure 4 depicts, in a setting where three user preference modes are applied, more than 82% of users in MovieLens-1M and 74% of users in Amazon Digital Music take advantage of more than one user preference mode during recommendation. This observation supports our hypothesis that most users have more than one distinct preference mode and could benefit from explicitly modelling multiple preference modes.
2) To understand the influence of modelling multiple preference modes on overall recommendation performance, we compare AMA models with one, three, and five prefixed modes. As shown in Figure 7, the AMA model achieves the best Precision@K score when it is initialized with a single mode. In contrast, it achieves the best Recall@K score when it leverages three modes. From this perspective, the AMA model provides a way to control the precision-recall trade-off depending on the scenario requirements.
3) Taking the Movielens-1M dataset as an example, it is easy to find that each preference mode captures one or more movie genres. Although the preferences are not explicitly bound to any side information, they are automatically captured by the attention mechanism. The detailed analysis for all datasets can be found in Appendix C².

D. Attentive Interpretation Analysis (RQ3)

A key advantage of AMA is its improved interpretability, since we can visualize how AMA generates recommendations. With the attention mechanism, we have the weighted importance of each observed interaction for different user embedding vectors, and with the Maxout Decoder, we know which user embedding vector a recommendation comes from. This not only helps us understand what the model has learned but also provides a reasonable explanation of each recommendation.

We present qualitative examples for four users from Movielens-1M to demonstrate the interpretability of our proposed model. In each example, there are three columns: apart from observed interactions and recommendations, we have

³We omit explicit 95% confidence intervals (CIs) in Tables I, II, and III since all CIs are smaller than 0.001.

TABLE I: Results of Movielens-1M dataset. Hyper-parameters are chosen from the validation set.

Model R-Precision NDCG MAP@5 MAP@10 MAP@20 Precision@5 Precision@10 Precision@20 Recall@5 Recall@10 Recall@20
POP 7.36% 12.25% 11.68% 10.99% 10.31% 10.85% 9.99% 9.34% 2.37% 4.34% 8.41%
AutoRec++ 9.45% 16.74% 12.54% 12.02% 11.31% 11.94% 11.15% 10.29% 3.77% 6.90% 12.15%
ACF-BPR 8.40% 14.56% 11.54% 11.10% 10.47% 11.17% 10.37% 9.46% 3.14% 5.55% 9.98%
MF-BPR 9.33% 16.75% 11.92% 11.53% 11.02% 11.41% 10.94% 10.16% 3.75% 6.92% 12.28%
CDAE 9.41% 15.88% 12.97% 12.34% 11.55% 12.26% 11.43% 10.35% 3.50% 6.18% 10.79%
CML 10.00% 17.67% 13.56% 12.99% 12.29% 12.97% 12.21% 11.14% 4.00% 7.18% 12.65%
PLRec 9.91% 17.90% 13.10% 12.50% 11.81% 12.48% 11.70% 10.71% 4.24% 7.75% 13.58%
PureSVD 9.20% 16.44% 12.12% 11.61% 10.98% 11.60% 10.79% 9.98% 3.83% 6.93% 12.30%
VAE-CF 8.92% 16.34% 10.66% 10.45% 10.06% 10.54% 10.08% 9.42% 3.76% 6.88% 12.07%
AMA 10.14% 17.91% 13.44% 13.00% 12.27% 13.02% 12.25% 11.12% 4.18% 7.62% 13.25%

TABLE II: Results of Amazon Digital Music dataset. Hyper-parameters are chosen from the validation set.

Model R-Precision NDCG MAP@5 MAP@10 MAP@20 Precision@5 Precision@10 Precision@20 Recall@5 Recall@10 Recall@20
POP 0.29% 1.10% 0.29% 0.27% 0.23% 0.28% 0.22% 0.17% 0.64% 0.92% 1.53%
AutoRec++ 0.34% 1.20% 0.36% 0.31% 0.26% 0.29% 0.24% 0.19% 0.64% 1.09% 1.68%
ACF-BPR 2.99% 7.15% 2.39% 1.91% 1.48% 1.74% 1.27% 0.92% 5.25% 7.26% 10.14%
MF-BPR 3.04% 6.94% 2.45% 1.92% 1.47% 1.71% 1.23% 0.89% 5.23% 7.11% 9.88%
CDAE 0.32% 1.17% 0.33% 0.30% 0.25% 0.32% 0.24% 0.18% 0.70% 1.09% 1.65%
CML 0.29% 1.50% 0.31% 0.28% 0.25% 0.26% 0.24% 0.23% 0.60% 1.05% 2.16%
PLRec 1.62% 4.62% 1.52% 1.25% 1.00% 1.19% 0.89% 0.68% 3.05% 4.46% 6.79%
PureSVD 1.61% 4.06% 1.47% 1.17% 0.90% 1.07% 0.78% 0.56% 2.84% 4.04% 5.71%
VAE-CF 2.94% 6.68% 2.32% 1.83% 1.41% 1.67% 1.21% 0.86% 4.99% 6.94% 9.43%
AMA 3.08% 6.90% 2.59% 2.05% 1.57% 1.85% 1.33% 0.94% 5.06% 7.04% 9.60%

TABLE III: Results of Amazon Video Games dataset. Hyper-parameters are chosen from the validation set.

Model R-Precision NDCG MAP@5 MAP@10 MAP@20 Precision@5 Precision@10 Precision@20 Recall@5 Recall@10 Recall@20
POP 0.24% 1.12% 0.21% 0.19% 0.19% 0.17% 0.19% 0.17% 0.46% 0.98% 1.74%
AutoRec++ 0.31% 1.81% 0.37% 0.36% 0.32% 0.39% 0.32% 0.25% 1.04% 1.69% 2.73%
ACF-BPR 1.19% 4.91% 1.06% 0.95% 0.83% 0.93% 0.80% 0.65% 2.90% 4.83% 7.59%
MF-BPR 1.48% 5.41% 1.27% 1.11% 0.94% 1.07% 0.88% 0.71% 3.23% 5.13% 8.19%
CDAE 0.35% 1.62% 0.36% 0.32% 0.28% 0.33% 0.25% 0.23% 0.85% 1.32% 2.48%
CML 1.18% 3.79% 0.98% 0.81% 0.67% 0.74% 0.59% 0.49% 2.08% 3.26% 5.33%
PLRec 1.94% 5.69% 1.69% 1.41% 1.14% 1.34% 1.04% 0.77% 3.95% 5.96% 8.61%
PureSVD 1.32% 4.08% 1.16% 0.98% 0.79% 0.93% 0.72% 0.54% 2.81% 4.23% 6.09%
VAE-CF 1.82% 5.62% 1.49% 1.25% 1.02% 1.19% 0.92% 0.71% 3.79% 5.71% 8.51%
AMA 2.16% 6.62% 1.83% 1.54% 1.25% 1.46% 1.13% 0.88% 4.48% 6.76% 10.18%

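The metrics reported in Tables I-III can be computed as follows. These are our own simplified implementations of the standard definitions of Precision@K, Recall@K, R-Precision, and NDCG, not the authors' evaluation code.

```python
import numpy as np

def precision_recall_at_k(ranked, relevant, k):
    """Precision@K and Recall@K: count the hits in the top-k list."""
    hits = len(set(ranked[:k]) & relevant)
    return hits / k, hits / len(relevant)

def r_precision(ranked, relevant):
    """Order-insensitive: precision at cutoff R = |relevant|."""
    r = len(relevant)
    return len(set(ranked[:r]) & relevant) / r

def ndcg_at_k(ranked, relevant, k):
    """Order-sensitive: log-discounted gain, normalized by the ideal list."""
    dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg

ranked = [3, 1, 7, 5, 2]   # items sorted by predicted score
relevant = {1, 5, 9}       # held-out test interactions
print(precision_recall_at_k(ranked, relevant, 5))  # (0.4, 0.6666666666666666)
```

Precision@K and Recall@K are "semi-order-sensitive" in the sense that the ordering inside the top-k window does not matter, only membership in it.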
Fig. 5: Attention case study for users with multiple active preference modes. Red entries in the recommendation lists represent recommendations that hit ground-truth interactions in the test set.

user preferences in the middle, where each user preference represents one user embedding vector. The lines between the movies and each user preference represent the weighted importance: the more solid the line, the more important the movie is to that preference. The line between a user preference and a recommendation indicates that the movie is recommended based on this user preference. The red entries in the recommendation lists represent the recommendations that "hit" the ground-truth interactions in the test set.

According to the experimental results, we find that different user preferences recommend very diverse movies. For the user in Figure 5(a), the crime movie Pulp Fiction contributes the most to the third user preference, which accordingly recommends a variety of related crime movies such as GoodFellas and Taxi Driver. At the same time, the second preference learned to pay more attention to comedy-drama movies such as Being John Malkovich, and this preference recommends Annie Hall, another famous comedy-drama. Both examples in Figures 5(a) and 5(b) show how we can interpret a recommendation and observe the effectiveness of attention for multi-modal preference estimation.

Fig. 6: Attention case study for users with a single active preference mode. Red entries in the recommendation lists represent recommendations that hit ground-truth interactions in the test set.

It is worth mentioning that, compared with the traditional AutoRec, AMA is more flexible and can be treated as a generalization of AutoRec, since AMA can use a different number of preferences for different users. As shown in Figure 5, for users with a broader range of interests, the
model tends to use two preferences for recommendation, while in Figure 6, the model chooses to use just one preference for users with less diverse taste.

Fig. 7: Recommendation performance (Precision@20 and Recall@20) varies based on the number of user preference modes or facets that are explicitly estimated.

V. CONCLUSION

In this paper, we proposed a novel and efficient model called Attentive Multi-modal AutoRec (AMA) that explicitly tracks multiple facets of user preferences by introducing multi-modal estimation into AutoRec. To do this, AMA leverages an attention mechanism to assign different attention weights to each item interaction when it is used to estimate different preference facets of the user. AMA not only captures multi-faceted user preferences in a latent space but also improves the interpretability of the model, since we know which latent preference facet a recommendation comes from and which items contribute to the corresponding user preference facet. Through extensive experiments on three real-world datasets, we demonstrated the effectiveness of the multi-modal estimation in our model and showed that AMA is competitive with many state-of-the-art models in the OC-CF setting.

REFERENCES

[1] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang, "One-class collaborative filtering," in ICDM'08, Eighth IEEE International Conference on Data Mining, 2008.
[2] N. Srivastava and R. R. Salakhutdinov, "Multimodal learning with deep Boltzmann machines," in Advances in Neural Information Processing Systems 25, 2012.
[3] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie, "AutoRec: Autoencoders meet collaborative filtering," in Proceedings of the 24th International Conference on World Wide Web, 2015.
[4] O. Kuchaiev and B. Ginsburg, "Training deep autoencoders for collaborative filtering," arXiv preprint arXiv:1708.01715, 2017.
[5] S. Cao, N. Yang, and Z. Liu, "Online news recommender based on stacked auto-encoder," in 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), 2017.
[6] F. Zhuang, Z. Zhang, M. Qian, C. Shi, X. Xie, and Q. He, "Representation learning via dual-autoencoder for recommendation," Neural Networks, 2017.
[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.
[8] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, "Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention," in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.
[9] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma, "Neural attentive session-based recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017.
[10] C. Zhou, J. Bai, J. Song, X. Liu, Z. Zhao, X. Chen, and J. Gao, "ATRank: An attention-based user behavior modeling framework for recommendation," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[11] W.-C. Kang and J. McAuley, "Self-attentive sequential recommendation," in 2018 IEEE International Conference on Data Mining (ICDM), 2018.
[12] M. Fu, H. Qu, D. Moges, and L. Lu, "Attention based collaborative filtering," Neurocomputing, 2018.
[13] Y. Tay, L. Anh Tuan, and S. C. Hui, "Latent relational metric learning via memory-based attention for collaborative ranking," in Proceedings of the 2018 World Wide Web Conference, 2018.
[14] Y. Hu, Y. Koren, and C. Volinsky, "Collaborative filtering for implicit feedback datasets," in 2008 Eighth IEEE International Conference on Data Mining, 2008.
[15] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009.
[16] Y. Wu, C. DuBois, A. X. Zheng, and M. Ester, "Collaborative denoising auto-encoders for top-n recommender systems," in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016.
[17] C.-K. Hsieh, L. Yang, Y. Cui, T.-Y. Lin, S. Belongie, and D. Estrin, "Collaborative metric learning," in Proceedings of the 26th International Conference on World Wide Web, 2017.
[18] S. Sedhain, H. Bui, J. Kawale, N. Vlassis, B. Kveton, A. Menon, T. Bui, and S. Sanner, "Practical linear models for large-scale one-class collaborative filtering," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), 2016.
[19] G. Wu, M. Volkovs, C. L. Soon, S. Sanner, and H. Rai, "Noise contrastive estimation for one-class collaborative filtering," in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-19), 2019.
[20] P. Cremonesi, Y. Koren, and R. Turrin, "Performance of recommender algorithms on top-n recommendation tasks," in Proceedings of the Fourth ACM Conference on Recommender Systems, 2010.
[21] D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara, "Variational autoencoders for collaborative filtering," in Proceedings of the 2018 World Wide Web Conference, 2018.
[22] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008.
[23] G. Shani and A. Gunawardana, "Evaluating recommendation systems," in Recommender Systems Handbook, Springer, 2011.

APPENDIX A
HYPER-PARAMETER DETAILS

In Table A.1, we show all the tune-able hyper-parameters of our AMA model. We tune the hyper-parameters for all candidate algorithms by evaluating on the validation sets through greedy search. The best hyper-parameter settings found for each algorithm and domain are listed in Table A.2.

TABLE A.1: Hyper-parameter set for AMA model
Symbol Function
h Size of latent representation vector
d Number of user preference facets
κ Size of key and query vector
α Loss weighting parameter
ρ Random input corruption rate (see [22])
NCE Noise Contrastive Estimation based item embedding initialization (see [19])

TABLE A.2: Best hyper-parameter setting for each algorithm.
Domain Algorithm h α λ ε γ ρ d NCE?

MovieLens-1M:
POP - - - - - - - -
AutoRec 200 - 1E-05 300 - - - -
ACF-BPR 100 - 0.0001 100 10 - - X
MF-BPR 200 - 1E-05 30 - - - -
CDAE 200 - 1E-05 300 - 0.5 - -
CML 200 - 0.1 30 - - - -
PLRec 100 - 10000 10 10 - - X
PureSVD 50 - 1 10 10 - - -
VAE-CF 200 - 1E-05 300 - 0.4 - -
AMA 40 1 1E-05 300 10 0.3 3 X

Amazon Digital Music:
POP - - - - - - - -
AutoRec 200 - 0.0001 300 - - - -
ACF-BPR 200 - 1E-05 100 10 - - X
MF-BPR 200 - 0.0001 30 - - - -
CDAE 200 - 0.0001 300 - 0.3 - -
CML 200 - 0.001 30 - - - -
PLRec 200 - 10000 - 10 - - X
PureSVD 200 - - - 10 - - -
VAE-CF 200 - 1E-05 300 - 0.2 - -
AMA 200 10 0.0001 300 10 0.4 5 X

Amazon Video Games:
POP - - - - - - - -
AutoRec 100 - 0.0001 300 - - - -
ACF-BPR 200 - 0.0001 100 10 - - X
MF-BPR 200 - 0.0001 30 - - - -
CDAE 100 - 1E-05 300 - 0.4 - -
CML 200 - 0.01 30 - - - -
PLRec 200 - 10000 - 10 - - X
PureSVD 100 - 1 - 10 - - -
VAE-CF 200 - 1E-05 300 - 0.2 - -
AMA 100 10 0.001 300 10 0.4 1 X

* In this table, λ: regularization; ε: epochs; γ: optimization iterations.
* For PureSVD, PLRec and AMA, optimization iterations means the number of randomized SVD iterations. For WRMF, it is the number of alternating closed-form optimizations.

APPENDIX B
DATASET

For each dataset, we binarize the ratings with a threshold ϑ, defined as the upper half of the rating range, so that the observed interactions correspond to positive feedback. To be specific, the threshold is ϑ > 3 for all datasets. Table B.1 summarizes the properties of the binarized matrices.

TABLE B.1: Datasets statistics.
Dataset m n |r_{i,j} > ϑ| Sparsity
MovieLens-1M 6,038 3,533 575,281 2.7×10⁻²
Amazon Digital Music 16,502 11,795 136,858 7.0×10⁻⁴
Amazon Video Games 54,721 17,365 378,949 4.0×10⁻⁴

We split the data into train, validation and test sets based on the timestamps given by the dataset to provide a recommendation evaluation setting closer to production use [23]. For each user, we use the first 50% of the data as the train set, the next 20% as the validation set, and the last 30% as the test set.

APPENDIX C
WHAT ARE THE TOP-10 ITEMS THAT EACH PREFERENCE MODE PAYS THE GREATEST ATTENTION TO?

Tables C.1, C.2, and C.3 show the top-10 items that receive the highest attention for each of the three preference modes. For the Movielens-1M dataset in Table C.1, it is easy to note that preference mode 1 captures sci-fi movies, preference mode 2 captures comedy and drama, and preference mode 3 captures horror movies. While the preferences are not explicitly bound to any side information, they are automatically captured by the attention mechanism.

However, we also note that not all of the preference modes are interpretable. The three preferences captured in Table C.2, while showing a noticeable difference in item popularity, are hard to summarize in natural language.

In Table C.3, we see that two out of three preference modes capture various outliers that have lower item popularity than the top-ranked items in the remaining mode. This observation is consistent with our hyper-parameter tuning results shown in Table A.2, where AMA performs best on the Amazon Video Games dataset when it selects only one mode. These Amazon results show that not all datasets benefit from multi-modal user preference estimation; for datasets where uni-modal estimation is sufficient, AMA is able to adapt to the characteristics of the dataset.

APPENDIX D
BASELINE MODIFICATIONS

We compare our proposed model against several collaborative filtering algorithms on one-class recommendation tasks. Since some of the recommender systems were not originally designed for OC-CF tasks, we extended and enhanced them for the one-class task from various perspectives. For example, we modified AutoRec in two aspects: 1) we replace the Sigmoid activation function with a ReLU activation, and 2) we replace the Mean Square Error with a Sigmoid Cross-entropy loss. Both of these modifications show reasonable performance improvements and even outperform some state-of-the-art algorithms on the Movielens-1M benchmark dataset. In addition, we enhanced the ACF-BPR and PLRec algorithms with the recently introduced NCE technique [19] and observed remarkable performance improvements over their vanilla forms with item embeddings from an SVD decomposition.

TABLE C.1: Top-10 movies that receive the highest attention for the three preference modes respectively (Movielens-1M)

Rank | Preference 1: Movie [Count] | Preference 2: Movie [Count] | Preference 3: Movie [Count]
1 | Terminator, The (1984) [1015] | Clerks (1994) [628] | Edward Scissorhands (1990) [476]
2 | Alien (1979) [1094] | Being John Malkovich (1999) [1270] | Professional, The (1994) [366]
3 | Matrix, The (1999) [1490] | American Pie (1999) [527] | Crying Game, The (1992) [403]
4 | Aliens (1986) [887] | South Park: Bigger, Longer and Uncut (1999) [493] | Sense and Sensibility (1995) [359]
5 | Terminator 2: Judgment Day (1991) [1419] | Election (1999) [740] | Like Water for Chocolate (1992) [297]
6 | Star Wars: Episode IV - A New Hope (1977) [1894] | Austin Powers: The Spy Who Shagged Me (1999) [421] | Interview with the Vampire (1994) [171]
7 | Star Wars: Episode V - The Empire Strikes Back (1980) [1839] | Wrong Trousers, The (1993) [585] | Nightmare Before Christmas, The (1993) [314]
8 | Die Hard (1988) [722] | Close Shave, A (1995) [412] | Rocky Horror Picture Show, The (1975) [302]
9 | Fugitive, The (1993) [955] | Weird Science (1985) [80] | Much Ado About Nothing (1993) [241]
10 | Total Recall (1990) [770] | Office Space (1999) [275] | Room with a View, A (1986) [189]

TABLE C.2: Top-10 albums that receive the highest attention for the three preference modes respectively (Digital Music)

Rank | Preference 1: Album [Count] | Preference 2: Album [Count] | Preference 3: Album [Count]
1 | Happy by Various artists [376] | Danza Kuduro by Don Omar [31] | Blurred Lines [196]
2 | Radioactive by Imagine Dragons [168] | American VI: Ain't No Grave by Johnny Cash [22] | Royals [100]
3 | A Thousand Years by Christina Perri [129] | Uncaged by Zac Brown Band [27] | Roar by Katy Perry [151]
4 | Somebody That I Used To Know by Gotye [129] | Court Yard Hounds [15] | When I Was Your Man by Bruno Mars [80]
5 | Over The Rainbow/What A Wonderful World by Israel [94] | Sultans of Swing by Dire Straits [22] | Let It Go by Idina Menzel [115]
6 | Let Her Go by Passenger [109] | Give Your Heart a Break by Demi Lovato [31] | Skyfall by Adele [85]
7 | Break Every Chain by Tasha Cobbs Leonard [88] | Charleston, SC 1966 by Darius Rucker [19] | Moves Like Jagger by Maroon 5 [119]
8 | Oh Honey by Delegation [12] | Amazing by Ricky Dillard & New G [19] | Sail by AWOLNATION [93]
9 | I Loved Her First by The Heartland [31] | You & I by Avant [25] | Honest Face by Liam Finn + Eliza Jane [89]
10 | War by Charles Jenkins & Fellowship Chicago [35] | The One by Eric Benét [21] | Thinking out Loud by Ed Sheeran [108]

TABLE C.3: Top-10 games that receive the highest attention for the three preference modes respectively (Video Games)

Rank | Preference 1: Game [Count] | Preference 2: Game [Count] | Preference 3: Game [Count]
1 | Final Fantasy VII: Sony PlayStation [205] | The Bourne Conspiracy - PlayStation 3 [2] | Turtle Beach Ear Force PX22 Gaming Headset [50]
2 | Grand Theft Auto: Vice City: Sony PlayStation 2 [182] | Guitar Hero World Tour Drum - Nintendo Wii [3] | Turtle Beach Ear Force PX22 Gaming Headset [103]
3 | The Legend of Zelda: The Wind Waker: Nintendo [140] | Datel Tool Battery PSP Slim [1] | The Wolf Among Us - Xbox 360 [2]
4 | Mass Effect - Xbox 360 [200] | Teenage Zombies: Nintendo DS [1] | Snoopy's Grand Adventure - Xbox 360 [1]
5 | Call of Duty: Modern Warfare 3 - Xbox 360 [181] | Battle of the Bands - Nintendo Wii [1] | Back To The Future 30th Anniversary Xbox One [3]
6 | Resistance: Fall Of Man - PlayStation 3 [119] | Bensussen Deutsch Power Wired Controller for PS3 [2] | Back To The Future 30th Anniversary Xbox 360 [3]
7 | Buy Halo 2 - Xbox with Ubuy Kuwait [200] | Lara Croft Tomb Raider: Anniversary [1] | BlazBlue: Chrono Phantasma EXTEND - Xbox One [6]
8 | Far Cry 3 X360 [196] | Master Of Orion 2 - Battle at Antares - PC [2] | The Amazing Spider-man vs The Kingpin [2]
9 | Resident Evil 4 - Wii: nintendo wii [164] | PLAY-IN-CASE CLASSIC for PSP SLIM [2] | Sunset Overdrive - Xbox One Digital Code [2]
10 | New Super Mario Bros [251] | Gaming Headset For PS4 Xbox One Smart Phone [1] | Dynamite Cop - Sega [3]
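As a final illustration, the binarization (ϑ > 3) and chronological 50%/20%/30% splitting described in Appendix B can be sketched as follows. The toy interaction log, field ordering, and variable names are our own assumptions, not the paper's preprocessing code.

```python
from collections import defaultdict

# Toy (user, item, rating, timestamp) log; values are illustrative only.
log = [(0, 5, 4, 1), (0, 2, 2, 2), (0, 8, 5, 3), (0, 1, 3, 4), (0, 9, 4, 5),
       (1, 3, 5, 1), (1, 4, 4, 2), (1, 6, 1, 3), (1, 7, 5, 4), (1, 0, 4, 5)]

# Binarize: keep only ratings above the threshold (> 3) as positive feedback.
positives = defaultdict(list)
for user, item, rating, ts in sorted(log, key=lambda x: x[3]):
    if rating > 3:
        positives[user].append(item)

# Chronological 50% / 20% / 30% split per user.
train, valid, test = {}, {}, {}
for user, items in positives.items():
    a, b = int(len(items) * 0.5), int(len(items) * 0.7)
    train[user], valid[user], test[user] = items[:a], items[a:b], items[b:]

print(train, valid, test)
```

Note that with few positives per user, integer truncation can leave the validation slice empty, which is one reason sufficiently active users are needed for a timestamp-based evaluation protocol.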