Sequential Recommender System Based on Hierarchical Attention

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) Sequential Recommender System based on Hierarchical Attention Network Haochao Ying1, Fuzhen Zhuang2∗, Fuzheng Zhang3, Yanchi Liu5, Guandong Xu4, Xing Xie3, Hui Xiong5 and Jian Wu1† 1College of Computer Science and Technology, Zhejiang University, China 2Key Lab of IIP of CAS, Institute of Computing Technology, CAS Beijing, China 3Microsoft Research, China 4Advanced Analytics Institute, University of Technology, Australia 5Management Science & Information Systems, Rutgers University, USA Abstract Different from traditional recommender systems, there are new challenges in sequential recommendation scenar- With a large amount of user activity data accumu- ios. First, user behaviors in the above examples only reflect lated, it is crucial to exploit user sequential behav- their implicit feedbacks (e.g., purchased or not), other than ior for sequential recommendations. Convention- explicit feedbacks (e.g., ratings). This type of data brings ally, user general taste and recent demand are com- more noises because we cannot differentiate whether users bined to promote recommendation performances. dislike unobserved items or just do not realize them. There- However, existing methods often neglect that user fore, it is not appropriate to directly optimize such one-class long-term preference keep evolving over time, and score (i.e., 1 or 0) through conventional latent factor model building a static representation for user general [Bayer et al., 2017]. Second, more and more data is orig- taste may not adequately reflect the dynamic char- inated from sessions or transactions, which form user’s se- acters. Moreover, they integrate user-item or item- quential pattern and short-term preference. For instance, item interactions through a linear way which lim- users prefer resting at hotels than sporting after they leave its the capability of model. To this end, in this the airport, while after buying a camera, customers choose paper, we propose a novel two-layer hierarchical purchasing relevant accessories rather than clothes. How- attention network, which takes the above proper- ever, previous methods mainly focus on user general taste ties into account, to recommend the next item user and rarely consider sequential information, which leads to re- might be interested. Specifically, the first attention peated recommendations [Hu et al., 2017; Ying et al., 2016; layer learns user long-term preferences based on Zhang et al., 2016]. the historical purchased item representation, while the second one outputs final user representation In the literature, researchers usually employ separate mod- through coupling user long-term and short-term els to characterize user’s long-term preference (i.e, general preferences. The experimental study demonstrates taste) and short-term preference (i.e., sequential pattern), and the superiority of our method compared with other then integrate them together [Rendle et al., 2009; Feng et state-of-the-art ones. al., 2015; He and McAuley, 2016]. For example, Rendel et al. [Rendle et al., 2010] propose factoring personalized Markov chains for next basket prediction. They factorize ob- 1 Introduction served user-item matrix to learn user’s long-term preference With the emergence of platform economy, many companies and utilize item-item transitions to model sequential informa- like Amazon, Yelp, and Uber, are creating self-ecosystems to tion, and then linearly add them to get final scores. How- retain users through interaction with products and services. ever, these models neglect the dynamics of user general taste, Users can easily access these platforms through mobile de- which means user’s long-term preference keep evolving over vices in daily life, as a result large amounts of behavior logs time. It is not adequate to learn a static low-rank vector for have been generated. For instance, 62 million user trips have each user to model her general taste. Moreover, they mainly been accumulated in July 2016 at Uber, and more than 10 bil- assign fixed weights for user-item or item-item interactions lion check-ins have been generated by over 50 million users at through linear modeling, which limits the model capability. Foursquare. With such massive user sequential behavior data, It has been shown that nonlinear models can better model the sequential recommendation, which is to recommend the next user-item interaction in user activities [He and Chua, 2017; item user might be interested, has become a critical task for Xiao et al., 2017; Cheng et al., 2016]. improving user experience and meanwhile driving new value To this end, we propose a novel approach, namely Sequen- for platforms. tial Hierarchical Attention Network (SHAN), to solve the ∗Fuzhen Zhuang is also with University of Chinese Academy of next item recommendation problem. The attention mecha- Sciences, Beijing, China nism can automatically assign different influences (weights) †Jian Wu is the corresponding author and his email is wu- of items for user to capture the dynamic property, while the [email protected]. hierarchical structure combines user’s long- and short-term 3926 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) preferences. Specifically, we first embed users and items into items in a session may not follow rigidly sequential order in low-dimensional dense spaces. Then an attention layer is em- many real scenarios, e.g., transactions in online shopping, ployed to compute different weights of items in user long- where RNN is not applicable. Beyond that, [Wang et al., term set and then compress item vectors with weights to gen- 2015] and [Hu et al., 2017] learned user hierarchical repre- erate user long-term representation. After that, we use an- sentation to combine user long- and short-term preferences. other attention layer to couple user sequential behavior with Our work follows this pipeline but contributes in that: (1) long-term representation. User embedding vector is used as Our model is built on hierarchical attention networks, which context information in both attention networks to compute can caputure dynamic long- and short-term preferences. (2) different weights for different users. To learn the parameters, Our model utilizes nonlinear modeling of user-item interac- we employ the Bayesian personalized ranking optimization tions. It is able to learn different item influences (weights) of criterion to generate a pair-wise loss function [Rendle et al., different users for the same item. 2009]. From the experiments, we can observe that our model outperforms state-of-the-art algorithms on two datasets. Fi- 3 Sequential Hierarchical Attention Network nally, our contributions are summarized as follows: In this section, we first formulate our next item recommen- • We introduce the attention mechanism to model user dy- dation problem and then introduce the details of our model. namics and personal preferences for sequential recom- Finally, we present the optimization procedures. mendations. • Through the hierarchical structure, we combine user’s 3.1 Problem Formulation long- and short-term preferences to generate a high-level Let U denote a set of users and V denote a set of items, where hybrid representation of user. |U| and |V| are the total numbers of users and items, respec- • We perform experiments on two datasets which show tively. In this work, we focus on extracting information from our model consistently outperforms state-of-the-art implicit, sequential user-item feedback data (e.g., users’ suc- methods in terms of Recall and Area Under Curve. cessive check-ins and purchase transaction records). For each user u ∈U, his/her sequential transactions (or sessions) are Lu {Su,Su, ..., Su } T denoted as = 1 2 T , where is the total num- 2 Related Work Su ⊆V t ∈ ,T ber of time steps and t ( [1 ]) represents the item To model user’s individual and sequential information jointly, set corresponding to the transaction of user u at time step t. t Su u Markov chains have been introduced by previous work for For a fixed time step , the item set t can reflect user ’s traditional recommendations. [Rendle et al., 2010] combined short-term preference at time t, which is an important fac- factorization method to model user general taste and Markov tor for predicting the next item he/she will purchase. On the chains to mine user sequential pattern. Following this idea, other hand, the set of items purchased before time step t, de- Lu Su ∪ Su ∪ ... ∪ Su u researchers have utilized different methods to extract these noted as t−1 = 1 2 t−1, can reflect user ’s two different user preferences. [Chen et al., 2012] and [Feng long-term preference (i.e., general taste). In the following, Lu Su et al., 2015] used metric embedding to project items into we name t−1 and t the long- and short-term item sets w.r.t points in a low-dimension Euclidean space for play list pre- time step t, respectively. diction and successive location recommendation. [Liang et Formally, given users and their sequential transactions L, al., 2016] utilized word embedding to extract information we aim to recommend the next items users will purchase from item-item co-occurrence to improve matrix factoriza- based on long- and short-term preferences learned from L. tion performance. However, these methods have limited ca- pacity on capturing high-level user-item interactions, because 3.2 The Network Architecture the weights of different components are fixed. We propose a novel approach based on hierarchical attention Recently, researchers turn to graphical models and neu- network, as shown in Figure 1, according to the following ral networks in recommender systems. [Liu et al., 2016] characteristics of user preference. 1) User preference is dy- proposed a bi-weighted low-rank graph construction model, namic at different time steps. 2) Different items have different which integrates users’ interests and sequential preferences influences on the next item that will be purchased.

Load more