A Decision Tree Based Recommender System

Amir Gershman, Amnon Meisels
Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
amirger,[email protected]

Karl-Heinz Lüke
Deutsche Telekom AG, Laboratories, Innovation Development, Ernst-Reuter-Platz 7, D-10587 Berlin, Germany
[email protected]

Lior Rokach, Alon Schclar, Arnon Sturm
Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
liorrk, schclar,[email protected]

Abstract: A new method for decision-tree-based recommender systems is proposed. The proposed method includes two major innovations. First, the decision tree produces lists of recommended items at its leaf nodes, instead of single items. This reduces the amount of search needed when using the tree to compile a recommendation list for a user, and consequently enables the recommendation system to scale. The second major contribution of the paper is the splitting method for constructing the decision tree. Splitting is based on a new criterion: the least probable intersection size. The new criterion computes, for each potential split, the probability of obtaining the observed intersection size under a random split, and selects the split whose intersection size is the least probable. The proposed decision-tree-based recommendation system was evaluated on a large sample of the MovieLens dataset and is shown to outperform the quality of recommendations produced by the well-known information gain splitting criterion.

1 Introduction

Recommender Systems (RS) propose useful and interesting items to users in order to increase both the seller's profit and the buyer's satisfaction. They contribute to the commercial success of many on-line ventures such as NetFlix [Net] and are a very active research area. Examples of recommended items include movies, web pages, books, news items and more. Often an RS attempts to predict the rating a user will give to items based on her past ratings and the ratings of other (similar) users.

Decision Trees have been previously used as a model-based approach for recommender systems. The use of decision trees for building recommendation models offers several benefits, such as efficiency and interpretability [ZI02] and flexibility in handling a variety of input data types (ratings, demographic, contextual, etc.). The decision tree forms a predictive model which maps the input to a predicted value based on the input's attributes. Each interior node in the tree corresponds to an attribute, and each arc from a parent to a child node represents a possible value or a set of values of that attribute. The construction of the tree begins with a root node and the input set. An attribute is assigned to the root, and arcs and sub-nodes for each set of values are created. The input set is then split by the values so that each child node receives only the part of the input set which matches the attribute value specified by the arc to the child node. The process then repeats itself recursively for each child until splitting is no longer feasible, either because a single classification (predicted value) can be applied to each element in the divided set or because some other threshold is reached.

A major weakness in using decision trees as a prediction model in RS is the need to build a huge number of trees (either for each item or for each user). Moreover, the model can only compute the expected rating of a single item at a time. To provide recommendations to the user, we must traverse the tree(s) from root to leaf once for each item in order to compute its predicted rating. Only after computing the predicted rating of all items can the RS provide the recommendations (the items with the highest predicted ratings). Thus decision trees in RS do not scale well with respect to the number of items.

We propose a modification to the decision tree model, to make it of practical use for larger scale RS. Instead of predicting the rating of an item, the decision tree returns a weighted list of recommended items. Thus with just a single traversal of the tree, recommendations can be constructed and provided to the user. This variation of decision tree based RS is described in Section 2.

The second contribution of this paper is the introduction of a new heuristic criterion for building the decision tree. Instead of picking the split attribute to be the attribute which produces the largest information gain ratio, the proposed heuristic looks at the number of shared items between the divided sets. The split attribute with the lowest probability of producing its number of shared items, when compared to a random split, is picked as the split attribute. This heuristic is described in further detail in Section 3. We evaluate the new heuristic and compare it to the information gain heuristic used by the popular C4.5 algorithm ([Qui93]) in Section 4.

2 RS-Adapted Decision Tree

In recommender systems the input set for building the decision tree is composed of Ratings. Ratings can be described as a relation over users, items and ratings (in which the user-item pair is assumed to be a primary key). The attributes can describe the users, such as the user's age, gender and occupation. Attributes can also describe the items, for example their weight, price and dimensions. Rating is the target attribute which the decision tree classifies by. Based on the training set, the system attempts to predict the Rating of items the user does not have a Rating for, and recommends to the user the items with the highest predicted Rating.

The construction of a decision tree is performed by a recursive process. The process starts at the root node with an input set (training set). At each node an attribute is picked as the split attribute. For each possible value (or set of values) child-nodes are created, and the parent's set is split between the child-nodes so that each child-node receives as input-set all items that have the appropriate value(s) corresponding to this child-node. Picking the split-attribute is done heuristically, since we cannot know which split will produce the best tree (the tree that produces the best results for future input); for example, the popular C4.5 algorithm ([Qui93]) uses a heuristic that picks the split that produces the largest information gain out of all possible splits. One of the attributes is pre-defined as the target attribute. The recursive process continues until all the items in the node's set share the same target attribute value or the number of items reaches a certain threshold. Each leaf node is assigned a label (classifying its set of items); this label is the shared target attribute value, or the most common value in case there is more than one such value.

Decision trees can be used for different recommender systems approaches:

• Collaborative Filtering Approach - Breese et al. [BHK98] used decision trees for building a collaborative filtering system. Each instance in the training set refers to a single customer. The training set attributes refer to the feedback provided by the customer for each item in the system. In this case a dedicated decision tree is built for each item. For this purpose the feedback provided for the targeted item (for instance like/dislike) is considered to be the decision that needs to be predicted, while the feedback provided for all other items is used as the input attributes (decision nodes). Figure 1 (left) illustrates an example of such a tree, for movies.

• Content-Based Approach - Li and Yamada [LY04] and Bouza et al. [BRBG08] propose to use content features to build a decision tree. A separate decision tree is built for each user and is used as a user profile. The features of each of the items are used to build a model that explains the user's preferences. The information gain of every feature is used as the splitting criterion. Figure 1 (right) illustrates Bob's profile. It should be noted that although this approach is interesting from a theoretical perspective, the precision that was reported for this system is worse than that of recommending the average rating.

• Hybrid Approach - A hybrid decision tree can also be constructed. Only a single tree is constructed in this approach. The tree is similar to the collaborative approach in that it takes a user's attributes as attributes to split by (such as her liking/disliking of a certain movie), but the attributes it uses are general attributes that represent the user's preferences for the general case, based on the content of the items. The attributes are constructed based on the user's past ratings and the content of the items. For example, a user who rated negatively all movies of genre comedy is assigned a low value in a "degree of liking comedy movies" attribute (a sketch of this attribute construction follows the list). Similarly to the collaborative approach, the tree constructed is applicable to all users. However, it is now also applicable to all items, since the new attributes represent the user's preferences for all items and not just a single given item. Figure 2 illustrates such a hybrid tree.
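
As an illustration of this attribute construction (our sketch, not the authors' code), the following function derives a per-genre "degree of liking" value in [0, 1] from a user's past ratings; the function name, the 1-5 rating scale and the normalization are assumptions made for the example.

```python
from collections import defaultdict

def degree_of_liking(user_ratings, item_genres, min_rating=1, max_rating=5):
    """Map each genre to a value in [0, 1] reflecting how highly the user
    rated items of that genre (hypothetical attribute construction)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for item_id, rating in user_ratings.items():
        for genre in item_genres.get(item_id, []):
            sums[genre] += (rating - min_rating) / (max_rating - min_rating)
            counts[genre] += 1
    return {genre: sums[genre] / counts[genre] for genre in sums}

# A user who rated all comedies with 1 gets a low "degree of liking Comedy"
# value, as described in the text.
ratings = {"m1": 1, "m2": 1, "m3": 5}
genres = {"m1": ["Comedy"], "m2": ["Comedy"], "m3": ["Drama"]}
print(degree_of_liking(ratings, genres))   # {'Comedy': 0.0, 'Drama': 1.0}
```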

Consider a general case with a data set containing n users, m items, and an average decision tree of height h. The collaborative filtering based RS requires m trees to be constructed, one for each item. When a user would like to receive a recommendation on what movie to watch, the system traverses all trees, from root to leaf, until it finds an item the user would like to view. The time complexity in this case is therefore O(h · m). This might be too slow for a large system that needs to provide fast, on-demand recommendations to users. The content based approach requires n trees to be constructed, one for each user. When a user would like to receive a recommendation, the system needs to traverse the user's tree from root to leaf once for each item, until it finds an item the user would like to view. The time complexity in this case is therefore O(h · m). Similarly, in the hybrid approach, the tree needs to be traversed once for each item, and the time complexity is also O(h · m).


Figure 1: Left: A CF decision tree for whether users like the movie "The Usual Suspects", based on their preferences for other movies such as The Godfather, Pulp Fiction, etc. A leaf labeled "L" or "D" correspondingly indicates that the user likes/dislikes the movie "The Usual Suspects". Right: A CB decision tree for Bob.

In systems which require fast computation of recommendations and with many possible items to recommend, all the above decision tree based RS would be impractical. Therefore we propose a modification of the decision tree to better fit RS, and provide recommendations faster to users.

Our proposed algorithm is similar to the ID3 algorithm [Qui86] and uses the hybrid approach. Because we use the hybrid approach, only a single tree is needed, and the attributes to split by are only attributes that describe users. These attributes can be computed based on the user's past ratings and the content of the items, as shown in the example in Figure 2, but they can also include user profile attributes such as age, gender, etc. The major variation from the ID3 algorithm is in the leaf nodes of the tree. Instead of creating leaf nodes with a label that predicts the target attribute value (such as rating), we propose to construct a recommendation list out of the leaf's input set and save this list at the leaf node as its label.
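
A minimal sketch of the adapted tree structure (ours, not the authors' implementation): interior nodes test a user attribute, each leaf stores its pre-computed recommendation list, and serving a user is a single root-to-leaf walk of length O(h). Discrete attribute values are assumed for simplicity.

```python
class Node:
    def __init__(self, attribute=None, children=None, recommendations=None):
        self.attribute = attribute               # user attribute tested at this node
        self.children = children or {}           # attribute value -> child Node
        self.recommendations = recommendations   # ranked item list (leaf nodes only)

def recommend(root, user_attributes):
    """Traverse from the root to a leaf using the user's attribute values and
    return the recommendation list stored at that leaf (O(h) per user)."""
    node = root
    while node.recommendations is None:
        value = user_attributes[node.attribute]
        node = node.children[value]
    return node.recommendations
```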

Figure 2: An example CF-CB hybrid decision tree. [Figure: inner nodes test user attributes such as gender and the user's preference (Dislike/Like) for movies such as Pulp Fiction and Star Wars; each leaf holds a ranked list of recommended movies.]

When a user wishes to receive recommendations, the tree is traversed based on the user's attributes until a leaf node is reached. The node contains a pre-computed recommendation list, and this list is returned to the user. Thus the time complexity is reduced to only O(h).

Building the recommendation list at the leaf node can be done in various ways. A simple solution, selected here, is to compute the weighted average of the ratings in all tuples in the leaf's Ratings set and to recommend the items with the highest weighted average first. Consider the rightmost leaf in the example tree in Figure 2. For any tuple in the Ratings set (i, u, r denote an item, a user and a rating, respectively) at this leaf node we know that u has a degree of liking the genre Comedy of more than 0.5 and a degree of liking movies with the actor George Clooney of more than 0.7. All the items rated by users such as u appear in this Ratings set along with the ratings each user submitted. This leaf therefore contains the ratings of users similar to u. We assume that if we now pick the items which were rated the highest by the users similar to u, these would form a good recommendation for this user. Therefore, we order all items based on a weighted average (since items can appear more than once, when more than one user rated them) and set this list as the recommendation list of this leaf node. The algorithm is presented in detail in Algorithm 1.
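
The leaf-list construction can be sketched as follows (one possible reading of the weighted average described above, not necessarily the authors' exact weighting): average each item's ratings over the tuples at the leaf and order the items by that average, highest first.

```python
from collections import defaultdict

def recommendation_list(ratings):
    """Build a leaf's recommendation list from its Ratings tuples
    (item_id, user_id, rating): average the ratings of each item and
    return the items ordered by that average, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item_id, _user_id, rating in ratings:
        totals[item_id] += rating
        counts[item_id] += 1
    averages = {i: totals[i] / counts[i] for i in totals}
    return sorted(averages, key=averages.get, reverse=True)
```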

3 Least Probable Intersections

Decision trees seek to provide good results for new, unseen cases, based on the model constructed with the training set. To accomplish this, the construction algorithm strives for a small tree (in number of nodes) that performs well on the training set. It is believed that a small tree generalizes better, avoids overfitting, and forms a simpler representation for humans to understand [Qui86]. C4.5 [Qui93] is a popular algorithm for constructing such decision trees.

Algorithm 1 RS-Adapted-Decision-Tree(Ratings)
  create root node
  if (Ratings.size < threshold)
    root.recommendations ← recommendation_list(Ratings)
    return root
  else
    Let A be the user attribute that best classifies the input
    For each possible value v of A
      Add a branch b below root, labeled (A = v)
      Let Ratings_v be the subset of Ratings that have the value v for A
      Add RS-Adapted-Decision-Tree(Ratings_v) below this new branch b
    return root

It uses the criterion of normalized information gain [Mit97] to pick the attribute by which to split each node. The attribute with the largest normalized information gain is picked, as it provides the largest reduction in entropy. Since this is a heuristic, it does not guarantee the smallest possible tree.

In recommender systems the input set for building the decision tree is composed of Ratings. Ratings can be described as a relation over users, items and ratings (in which the user-item pair is assumed to be a primary key). The intuition behind the heuristic proposed in the present paper is as follows. Consider a split into two subsets, A and B. The fewer ItemIDs are shared between the sets A and B, the better the split is, since it forms a better distinction between the two groups. However, different splits may result in different group sizes. Comparing the size of the intersection between splits of different sub-relation sizes would not be sound, since an even split, for example (half the tuples in group A and half in group B), would probably have a larger intersection than a very uneven split (such as one tuple in group A and all the rest in group B). Instead, we look at the probability of our split's item-intersection size compared to a random (uniform) split with similar group sizes. A split which is very likely to occur even in a random split is considered a bad split (less preferred), since it is similar to a random split and probably does not distinguish the groups well from each other. A split that is the least probable to occur is assumed to be a better distinction of the two subgroups and is the split selected by the heuristic. More formally, let us denote:

• $items(S) = \pi_{ItemID}(S)$, where $\pi$ is the projection operation in relational algebra: the set of all ItemIDs that appear in a set S.

• $o_i(S) = |\sigma_{ItemID=i}(S)|$, where $\sigma$ is the selection operation in relational algebra: the number of occurrences of ItemID i in the set S.

Let $S_q$ (q denotes the number of ratings) be a random binary partition of the tuples in Ratings into two sub-relations A and B consisting of k and q−k tuples, respectively. We are interested in the probability distribution of $S_q \equiv |items(A) \cap items(B)|$. First, let us find the probability of an item belonging to the intersection. The probability that all $o_i$ occurrences of any item i fall in the set A is $(k/q)^{o_i}$. Similarly, the probability that all $o_i$ occurrences of item i fall in the set B is $((q-k)/q)^{o_i}$. In all other cases item i will appear in the intersection, thus the probability $P_i$ that item i belongs to $items(A) \cap items(B)$ is:

$$P_i = 1 - \left(\frac{k}{q}\right)^{o_i} - \left(\frac{q-k}{q}\right)^{o_i} \qquad (1)$$
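
For intuition, consider an even split ($k = q/2$) and an item with $o_i = 2$ occurrences: equation (1) gives $P_i = 1 - (1/2)^2 - (1/2)^2 = 0.5$. An item that occurs only once ($o_i = 1$) can never be shared between A and B, and indeed $P_i = 1 - k/q - (q-k)/q = 0$.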

Next, we can construct a random variable $x_i$ which takes the value 1 when item i belongs to the intersection of A and B, and the value 0 otherwise. Using the above equation, the variable $x_i$ is distributed according to a Bernoulli distribution with success probability $P_i$. Thus, $S_q$ is distributed as the sum of $|items(Ratings)|$ non-identically distributed Bernoulli random variables, which can be approximated by a Poisson distribution [Cam60]:

$$Pr(S_q = j) = \frac{\lambda^j \cdot e^{-\lambda}}{j!} \qquad (2)$$

where

$$\lambda = \sum_{i \in items(Ratings)} P_i \qquad (3)$$

The cumulative distribution function (CDF) is therefore given by:

$$Pr(S_q \le j) = \frac{\Gamma(\lfloor j \rfloor + 1, \lambda)}{\lfloor j \rfloor!} \qquad (4)$$

where $\Gamma(x, y)$ is the (upper) incomplete gamma function and $\lfloor \cdot \rfloor$ denotes the floor function. To summarize, given a binary split of Ratings into two sub-relations A and B of sizes k and q−k respectively, our proposed heuristic first computes the size of the item-intersection, $|items(A) \cap items(B)| = j$. Next, we compute the probability of obtaining such an intersection size in a similar-size random split using the probability $Pr(S_q \le j)$. Out of all possible splits of Ratings, our heuristic picks the one with the lowest probability $Pr(S_q \le j)$ to be the next split in the tree.
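
To make the scoring step concrete, here is a minimal sketch (our illustration, not the authors' code). It assumes a candidate split is given as two lists A and B of (item_id, user_id, rating) tuples, evaluates λ from equations (1) and (3), and computes the Poisson CDF of equation (4) directly; the heuristic would pick the candidate split with the smallest returned value.

```python
import math
from collections import Counter

def split_probability(A, B):
    """Score a candidate binary split (A, B) of a node's Ratings tuples
    (item_id, user_id, rating): the Poisson-approximated probability
    Pr(S_q <= j) of an item-intersection no larger than the observed one
    under a random split of the same sizes (equations 1-4)."""
    k, q = len(A), len(A) + len(B)
    occurrences = Counter(item for item, _u, _r in A + B)   # o_i over the node's set
    # lambda = sum over items of P_i = 1 - (k/q)^o_i - ((q-k)/q)^o_i  (eqs. 1 and 3)
    lam = sum(1.0 - (k / q) ** o - ((q - k) / q) ** o for o in occurrences.values())
    # observed intersection size j = |items(A) ∩ items(B)|
    j = len({item for item, _u, _r in A} & {item for item, _u, _r in B})
    if lam == 0.0:
        # happens only when every item occurs once; the intersection is empty
        return 1.0
    # Poisson CDF evaluated in log space for numerical stability
    log_terms = [n * math.log(lam) - lam - math.lgamma(n + 1) for n in range(j + 1)]
    return sum(math.exp(t) for t in log_terms)

# The heuristic computes split_probability for every candidate split of the
# node and picks the split with the lowest value as the next split in the tree.
```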

4 Experimental Evaluation

4.1 Evaluation Measures

To assess the quality of the resulting recommendation list, we evaluated the half-life utility metric ([BHK98]) on the movies that were ranked by the user but were not used in the profile construction. This metric assumes that successive items in the list are less likely to be viewed, with an exponentially decreasing rate. The utility is defined as the difference between the user's rating for an item and the "default rating" for an item. The grade is then divided by the maximal, utopian grade. Specifically, the grade of the recommendation list for user a is computed as follows:

$$R_a = \sum_j \frac{\max(r_{a,j} - d, 0)}{2^{(j-1)/(\alpha-1)}} \qquad (5)$$

where $r_{a,j}$ represents the rating of user a on item j of the ranked list, d is the default rating, and $\alpha$ is the viewing half-life, which in this experiment was set to 10. The overall score for a dataset across all users (R) is

$$R = 100 \cdot \frac{\sum_a R_a}{\sum_a R_a^{max}} \qquad (6)$$

where $R_a^{max}$ is the maximum achievable utility if the system ranked the items in the exact order that the user ranked them.
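
A small sketch of the metric (our illustration under the definitions above, not the authors' evaluation code): given one user's ranked list and her true ratings it computes $R_a$ per equation (5), and the corpus-level score follows equation (6). The default rating d is left as a parameter since its value is not specified here.

```python
def half_life_utility(ranked_items, true_ratings, d, alpha=10):
    """Half-life utility R_a of one user's ranked list (equation 5): the j-th
    item contributes max(r_aj - d, 0) / 2^((j-1)/(alpha-1)); unrated items
    contribute nothing."""
    return sum(
        max(true_ratings.get(item, d) - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))
        for j, item in enumerate(ranked_items, start=1)
    )

def max_half_life_utility(true_ratings, d, alpha=10):
    """R_a^max: the utility obtained when the items are ordered exactly by the
    user's own ratings, highest first."""
    ideal_order = sorted(true_ratings, key=true_ratings.get, reverse=True)
    return half_life_utility(ideal_order, true_ratings, d, alpha)

def overall_score(per_user_utility, per_user_max_utility):
    """Corpus-level score R (equation 6): 100 * sum_a R_a / sum_a R_a^max."""
    return 100.0 * sum(per_user_utility) / sum(per_user_max_utility)
```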

4.2 Experimental Setup

For our experimental evaluation we used the MovieLens ([RL10]) data set, which holds 1 million ratings of 6040 users on 3900 distinct movies. For the training set, about 10% of the movies were selected, together with all ratings associated with them. These ratings accumulate to roughly 50,000 ratings for half the user set (3063 users). The users' attributes consisted of age, occupation and gender. The same attribute can serve as a split attribute at multiple junctions in the tree, as we confine the tree to binary splits. The RS is asked to provide recommendations for all remaining users (2977 users), and the half-life utility metric is used to evaluate the results. In order to evaluate the new heuristic, we compare two recommendation systems that are identical in all details except for the split heuristic used. One system uses the standard information gain heuristic, and the other uses the proposed least probable intersection size heuristic. The same training set is given as input to both systems.

4.3 Results

Figure 3 compares the performance of the new intersection size criterion with the well-known information gain. The horizontal axis of this plot indicates the depth of the decision tree, as the tree is being constructed. The vertical axis indicates the average utility of the recommendations produced by the tree. The solid line shows the utility of the decision tree using the new criterion, whereas the broken line shows the utility obtained by the information gain criterion. Both lines follow the well-known over-fitting pattern, in which the utility first increases and then decreases. Note that the new criterion dominates the information gain criterion over all depths. Specifically, the best performances of the two criteria are 75.88% and 72.71% respectively, which implies a 4.35% relative improvement. To examine the effects of the tree's depth and of the splitting criterion, a two-way analysis of variance (ANOVA) with repeated measures was performed. The dependent variable was the mean utility. The results of the ANOVA showed that the main effects of the tree's depth (F = 34.2, p < 0.001) and the splitting criterion (F = 7.24, p < 0.001) were both significant. A post-hoc Duncan test was conducted in order to examine at which depths the proposed criterion outperforms the information gain criterion. With α = 0.05, from trees of four levels up to trees of 21 levels, the proposed method is significantly better than information gain.

Figure 3: Comparing the average utility of the two splitting criteria for different tree sizes.

5 Conclusions

A new decision tree based recommender technique was presented. The new technique requires only a single traversal of the tree for producing the recommendation list, by holding lists of recommended items in the tree's leaf nodes. A new splitting criterion for guiding the induction of the decision tree was also proposed. The experimental study shows that the new criterion outperforms the well-known information gain criterion. Additional issues to be further investigated include: (a) adding a pruning phase that follows the growing phase and helps to avoid over-fitting; (b) comparing the proposed technique to other decision tree based techniques; and (c) examining the proposed method on other benchmark datasets.

Acknowledgements

The project was funded and managed by Deutsche Telekom Laboratories as part of the Context-Aware Service Offering and Usage (CASOU) project.

References

[BHK98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.

[BRBG08] A. Bouza, G. Reif, A. Bernstein, and H. Gall. SemTree: ontology-based decision tree algorithm for recommender systems. In International Semantic Web Conference, 2008.

[Cam60] L. Le Cam. An Approximation Theorem for the Poisson Binomial Distribution. Pacific Journal of Mathematics, volume 10, pages 1181–1197, 1960.

[LY04] P. Li and S. Yamada. A Movie Recommender System Based on Inductive Learning. In IEEE Conference on Cybernetics and Intelligent Systems, 2004.

[Mit97] T. M. Mitchell. Machine Learning. The McGraw-Hill Companies, Inc., 1997.

[Net] NetFlix. The Netflix Prize, www.netflixprize.com.

[Qui86] J. R. Quinlan. Induction of Decision Trees. Machine Learning, pages 81–106, 1986.

[Qui93] J. R. Quinlan. C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). Morgan Kaufmann, January 1993.

[RL10] GroupLens Research Lab. "The MovieLens Data Set", http://www.grouplens.org/node/73, February 2010.

[ZI02] T. Zhang and V. S. Iyengar. Recommender Systems Using Linear Classifiers. Journal of Machine Learning Research, volume 2, pages 313–334, 2002.
