
Reinforcement Learning based Recommender System using Biclustering Technique

Sungwoon Choi, Heonseok Ha, Uiwon Hwang (Seoul National University; Korea University); Chanju Kim, Jung-Woo Ha (Clova AI Research, NAVER Corporation); Sungroh Yoon* (Seoul National University)
*Corresponding author

ABSTRACT
A recommender system aims to recommend items that a user is interested in among many items. The need for recommender systems has grown with the information explosion, and various approaches have been suggested for providing meaningful recommendations to users. One proposed approach is to treat the recommender system as a Markov decision process (MDP) and solve it with reinforcement learning (RL). However, existing RL-based methods have an obvious drawback: to solve the MDP, they must cope with a very large number of discrete actions, which places the task in a class of problems that is difficult for RL. In this paper, we propose a novel RL-based recommender system. We formulate the recommender system as a gridworld game by using a biclustering technique that reduces the state and action space significantly. Using biclustering not only reduces the space but also improves recommendation quality by effectively handling the cold-start problem. In addition, our approach can provide users with an explanation of why the system recommends certain items. Lastly, we evaluate the proposed algorithm on a real-world dataset and achieve better performance than a widely used recommendation algorithm.

CCS CONCEPTS
• Information systems → Collaborative filtering; • Computing methodologies → Artificial intelligence;

KEYWORDS
Recommender System, Reinforcement Learning, Markov Decision Process, Biclustering

ACM Reference Format:
Sungwoon Choi, Heonseok Ha, Uiwon Hwang, Chanju Kim, Jung-Woo Ha, and Sungroh Yoon. 2018. Reinforcement Learning based Recommender System using Biclustering Technique. In Proceedings of Workshop on Multi-dimensional Information Fusion for User Modeling and Personalization (IFUP'18). ACM, New York, NY, USA, 4 pages. https://doi.org/10.475/123_4

1 INTRODUCTION
As the choices available to users increase, recommender systems that assist decision making become more important day by day. Recommender systems have been introduced in a variety of domains, and their performance is directly related to the interests of the company or the individual. Recommender systems have previously achieved great success with a method called collaborative filtering (CF), one of the most popular techniques in the recommender system domain. The objective of CF is to make a personalized prediction about a user's preferences using information about other users who have similar interests in items.

One disadvantage of CF is that it considers only one of the two dimensions (i.e., users or items), which often makes it difficult to detect important patterns that could otherwise be captured by considering both dimensions. In addition, the data matrix a typical recommender system has to handle is sparse and high-dimensional, because there are a large number of available items, many of which are never purchased or rated by the users. These two facts led to the development of biclustering-based recommender systems, some of which have shown superior performance to conventional CF approaches [1, 5, 8, 15, 17]. Biclustering, also known as co-clustering [4], two-way clustering [6], and simultaneous clustering [7], aims to find subsets of rows and columns of a given data matrix [3]. The key difference between clustering and biclustering is that clustering derives a global model, whereas biclustering produces a local model [9].

Another disadvantage of CF is that it is static, so it is usually not possible to reflect a user's responses in real time. To address this, an MDP-based recommender system was proposed [13]. Its authors use a discrete-state MDP model to maximize a utility function that takes future interactions with users into account. In their work, they suggest an n-gram predictive model for generating the initial MDP, and they treat each action of the MDP as the recommendation of an item. This leads to a large action space, which makes the MDP difficult to solve.

In this paper, we propose a new recommendation algorithm using biclustering and RL. We reduce the state and action space with a biclustering technique, which renders the MDP problem easy to solve. Using biclustering not only reduces the space but also improves recommendation quality for the cold-start problem. Moreover, the system can explain to users why it recommends certain items.

The paper is structured as follows. In Section 2 we review the necessary background on MDPs and RL. In Section 3 we define the problem. In Section 4 we describe the proposed approach. Section 5 provides an empirical evaluation of the actual recommender system based on two MovieLens datasets. We discuss the results in Section 6 and conclude in Section 7.
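To make the notion of a bicluster concrete, the following is a small hedged sketch. It is not one of the biclustering algorithms cited above; it simply brute-forces the densest all-ones submatrix (a subset of users and a subset of items) of a made-up binary user-item matrix, which is only feasible at toy scale.

```python
# Toy illustration of a bicluster: a subset of users (rows) and items (columns)
# whose submatrix is dense. Brute force is NOT the biclustering algorithm used
# in the paper; it is here purely to show the idea on a tiny, made-up matrix.
from itertools import combinations

import numpy as np

# Hypothetical 5-user x 6-item binary interaction matrix.
R = np.array([
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])

def densest_bicluster(R, min_rows=2, min_cols=2):
    """Return the (row subset, column subset) whose all-ones submatrix is largest."""
    best, best_size = None, 0
    n_rows, n_cols = R.shape
    for r in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), r):
            for c in range(min_cols, n_cols + 1):
                for cols in combinations(range(n_cols), c):
                    sub = R[np.ix_(rows, cols)]
                    if sub.all() and sub.size > best_size:
                        best, best_size = (rows, cols), sub.size
    return best

print(densest_bicluster(R))  # users (0, 1, 3) x items (0, 1, 3)
```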
2 PRELIMINARY
Markov Decision Processes: An MDP is a model for sequential stochastic decision problems [14]. An MDP is specified by a tuple of states, actions, a reward function, a transition function, and a discount factor. The agent occupies a particular state s_t ∈ S at each discrete time step t ∈ {0, 1, 2, ...}. After choosing an action a_t ∈ A, the agent moves to the next state s_{t+1} given by the transition function T(s_t, a_t). At the same time, the agent receives a reward r_t from the environment according to the reward function R(s_t, a_t, s_{t+1}). Based on a policy π(s), the action a is selected in a given state s. An MDP can be solved by RL.

RL aims to find the optimal policy π* that maximizes the expected cumulative reward G, which is called the return. In RL, the optimal policy can be learned through a state-action value function Q_π(s, a), which is the expected return G obtained from episodes starting from a certain state s with the action a. Q_π(s, a) can be expressed as follows:

    Q_\pi(s, a) = \mathbb{E}_\pi\{ G_t \mid s_t = s, a_t = a \}                                              (1)
                = \mathbb{E}_\pi\Big\{ \sum_{k=0}^{\infty} \gamma^k r_{t+k} \;\Big|\; s_t = s, a_t = a \Big\}  (2)

where γ is a discount factor (0 < γ ≤ 1).
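As a concrete illustration of how a state-action value function can be learned, the sketch below runs tabular Q-learning on a toy 3 × 3 gridworld with a single rewarding cell. The environment, grid size, and hyperparameters are invented for illustration; they are not the ones used in the proposed method.

```python
# Minimal tabular Q-learning on a toy 3x3 gridworld (illustration only).
import random

N = 3                                           # grid is N x N
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GOAL = (2, 2)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in range(4)}

def step(state, action):
    """Deterministic transition: move on the grid, reward 1 on reaching GOAL."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), N - 1)
    c = min(max(state[1] + dc, 0), N - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(2000):                           # episodes
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, x)] for x in range(4))
        # Q-learning update toward r + gamma * max_a' Q(s', a')
        Q[(s, a)] += ALPHA * (r + GAMMA * (0.0 if done else best_next) - Q[(s, a)])
        s = s2

print(max(Q[((0, 0), x)] for x in range(4)))    # approx gamma**3 = 0.729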
[Figure 1: Overview of the proposed method — biclusters extracted from the user-item matrix are mapped to a 2-D space and then remapped to n × n grid states, on which the agent acts by arg max Q_π.]

3 PROBLEM DEFINITION
We consider a recommender system as an MDP problem that can be formalized as a gridworld. Figure 1 gives an overview of this formalization. A gridworld is a 2D environment in which the agent can move in one of four directions at each step. Typically, the goal in a gridworld is for the agent to navigate to some location while maximizing the return. In our case, the agent and the state correspond to a user and a group of items, respectively. A user's movement in the gridworld means receiving new recommendations from the group of items, and the reward can be regarded as the user's satisfaction with the recommended items. First, we need to specify the environment of the MDP.

Reward Function R(s_t, a_t, s_{t+1}): The reward r_t is deterministic and is given by the proposed reward function:

    R(s_t, a_t, s_{t+1}) = \mathrm{Jaccard}(U_{s_t}, U_{s_{t+1}})                              (3)
                         = \frac{|U_{s_t} \cap U_{s_{t+1}}|}{|U_{s_t} \cup U_{s_{t+1}}|}        (4)

The agent receives a reward between 0 and 1, computed as the Jaccard index of the user sets U_{s_t} and U_{s_{t+1}} of the two states. In this environment, the reward is a deterministic function of the state-action pair. The more users the two states share, the closer the reward is to 1. The similarity of the item sets of the two states is not used as a reward, because we do not want the agent to recommend only a small number of items when moving between states.

4 PROPOSED APPROACH
The proposed method is composed of four parts: constructing the states, learning the Q-function, generating recommendations, and updating the model online.

4.1 Constructing the States
In the next step, each state is mapped to one of the n² biclusters, so that each state has an item set and a user set. The mapping is based on the distance between the user vector of a bicluster and the states of the gridworld, which can be regarded as a two-dimensional (2D) Euclidean space. However, the user vector of a bicluster is not 2D, so it is converted to a 2D space using a dimensionality-reduction step.
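The following sketch pulls these pieces together. Assuming the biclusters have already been embedded at 2-D coordinates (the embedding step itself is outside this excerpt), it assigns each grid cell the nearest remaining bicluster and exposes a gridworld whose step reward is the Jaccard index of the user sets of the source and destination cells, as in Equations (3)-(4). The class, the cell-assignment rule, and all names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a gridworld environment built from biclusters.
# Assumptions (not from the paper): each bicluster is a (user_set, item_set, xy)
# triple where xy is an already-computed 2-D embedding of its user vector; each
# grid cell is greedily assigned the nearest unassigned bicluster.
import math


class BiclusterGridworld:
    def __init__(self, biclusters, n):
        """biclusters: list of (user_set, item_set, (x, y)); n: grid side length.
        Requires len(biclusters) >= n * n."""
        self.n = n
        self.cells = {}          # (row, col) -> (user_set, item_set)
        remaining = list(biclusters)
        # Map each grid cell to the closest remaining bicluster in 2-D space;
        # cell centers are spread uniformly over the unit square.
        for row in range(n):
            for col in range(n):
                center = ((col + 0.5) / n, (row + 0.5) / n)
                best_i = min(range(len(remaining)),
                             key=lambda i: math.dist(remaining[i][2], center))
                users, items, _ = remaining.pop(best_i)
                self.cells[(row, col)] = (users, items)

    def reward(self, s, s_next):
        """Jaccard index of the user sets of two states (Eq. 3-4)."""
        u1, _ = self.cells[s]
        u2, _ = self.cells[s_next]
        return len(u1 & u2) / len(u1 | u2) if (u1 | u2) else 0.0

    def step(self, s, action):
        """Move up/down/left/right on the grid; recommend the items of the new cell."""
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = min(max(s[0] + dr, 0), self.n - 1)
        c = min(max(s[1] + dc, 0), self.n - 1)
        s_next = (r, c)
        return s_next, self.reward(s, s_next), self.cells[s_next][1]


# Toy usage with four made-up biclusters on a 2 x 2 grid.
bics = [({1, 2, 3}, {"a", "b"}, (0.2, 0.2)),
        ({2, 3, 4}, {"c"},      (0.8, 0.2)),
        ({5, 6},    {"d", "e"}, (0.2, 0.8)),
        ({3, 5},    {"f"},      (0.8, 0.8))]
env = BiclusterGridworld(bics, n=2)
state = (0, 0)
state, reward, recommended = env.step(state, action=3)   # move right
print(state, round(reward, 2), recommended)              # (0, 1) 0.5 {'c'}
```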