A Robust Collaborative Filtering Approach Based on User Relationships for Recommendation Systems
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2014, Article ID 162521, 8 pages
http://dx.doi.org/10.1155/2014/162521

Research Article

Min Gao,1,2 Bin Ling,3 Quan Yuan,1 Qingyu Xiong,1,2 and Linda Yang3

1 School of Software Engineering, Chongqing University, Chongqing 400044, China
2 Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing 400044, China
3 School of Engineering, University of Portsmouth, Portsmouth PO1 3AH, UK

Correspondence should be addressed to Min Gao; [email protected]

Received 12 August 2013; Revised 10 December 2013; Accepted 30 December 2013; Published 19 February 2014

Academic Editor: Xing-Gang Yan

Copyright © 2014 Min Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Personalized recommendation systems have been widely used as an effective way to deal with information overload. The common approach in these systems, item-based collaborative filtering (CF), has been shown to be vulnerable to “shilling” attacks. To improve the robustness of item-based CF, the authors propose a novel CF approach based on the most commonly used relationships between users. In the paper, the three most commonly used relationships between users are first analyzed and applied to construct several user models. DBSCAN clustering is then used to select the valid user model according to how well each model supports the detection of spam users. The selected model is used to detect the spam user group. Finally, a detection-based CF method is proposed for the calculation of item-item similarities and rating prediction, which sets different weights for suspicious spam users and normal users.
The experimental results demonstrate that the proposed approach provides better robustness than the typical item-based kNN (k Nearest Neighbor) CF approach.

1. Introduction

Nowadays, personalized recommendation systems are widely used as an effective way to help people cope with information overload [1, 2]. Such a system automatically adjusts, restructures, and presents tailored information for individuals by analyzing user information, creating one-to-one relationships, or understanding user needs in different contexts [3–6]. To date, CF is the most popular approach used in personalized recommendation systems. Approaches for CF recommendation can be grouped into two general classes [7–11]: user-based and item-based.

Both the typical user-based and item-based CF approaches, however, suffer from “shilling” attacks [12] because users of online systems can multiply their profiles and identities nearly indefinitely. Thus, systems that depend on such profiles are subject to control by an attacker bent on making the system recommend as he or she desires [12–17]. It is common knowledge that some users’ ratings in recommendation systems are more valuable than those of others. If an approach could make the credit ratings (or ranks, weights) of spam users [15] created by an attacker lower than those of normal users, the antiattack ability of recommendation systems would be improved.

There are several kinds of relationships between users that are commonly used in item-based CF, such as similarities and correlations. In this paper, an approach based on these relationships is proposed to calculate the relative weights of users and to further improve the attack resistance of typical item-based CF approaches. The proposed approach consists of four steps: three kinds of relationships between users are selected to construct user models; a density-based clustering algorithm is then used to select the best user model; the model is then applied to detect spam users; and the detection results are incorporated into an approach for the calculation of item-item similarities and rating prediction. Finally, the experimental results illustrate that the proposed approach provides better robustness (the stability of prediction and hit ratio) than (1) a widely used item-based kNN CF (similarity-based CF) recommendation approach and (2) other robust recommendation approaches.

The rest of this paper is organized as follows: Section 2 presents the background of item-based CF approaches and their related problems. Section 3 presents the proposed methods for selecting user models, detecting and marking suspicious spam users and normal users, and calculating item-item similarities and predictions according to the detection. Section 4 presents experimental results of the proposed approach on the MovieLens dataset and analyzes whether the approach is effective in comparison with the typical item-based CF approach and other robust recommendation approaches. Section 5 draws conclusions.

2. Background and Associated Problem of Item-Based CF Approaches

CF is the most widely used and most successful recommendation technique to date [18–20]. The traditional CF, user-based CF, predicts the rating of an item for a target user based on the opinions of other like-minded users. It was remarkably successful in the past, but some challenges have arisen [21], such as scalability: the computational complexity grows rapidly with the number of users. Item-based CF has been shown to solve this problem [9]. Both the user-based and item-based CF approaches, however, suffer from “shilling” attacks.

2.1. Shilling Attack Problem. An attack on a recommendation system arranges for a group of users, named shills [20] or spam users [14], to enter the system and vouch for the items in question. Their ratings are intended to mislead other users. The attacks are, therefore, called shilling attacks (or profile injection attacks [12]).

An attack consists of a set of attack profiles (also named attack ratings). An attack model is an approach to construct attack profiles. The general form of an attack profile is shown in Figure 1 [14].

[Figure 1: The general form of an attack profile. Segments: $I^F$, $I^0$, $I^S$, $I^T$; the unrated items in $I^0$ are Null.]

Suppose that there are $N$ items in total in a recommendation system; an attack profile is then an $N$-dimensional vector of ratings. The $N$-dimensional vector can be divided into 4 sets: $I^F$, $I^0$, $I^S$, and $I^T$. Here, $N = |I^F| + |I^0| + |I^S| + |I^T| = k + l + m + n$. $I^F$ ($i_1^F \sim i_k^F$) is a set of randomly selected filler items. $I^0$ ($i_1^0 \sim i_l^0$) is a set of unrated items. $I^S$ ($i_1^S \sim i_m^S$) is a set of selected items which have some relationships with the target items.

[...] characteristics: (1) $I^S$: all items in $I^S$ are the most popular items and are assigned the maximum rating (max); (2) $I^F$: all items in $I^F$ are assigned random values that follow a normal distribution; (3) $I^T$: all items in $I^T$ are assigned the maximum rating (max).

The segment attack model is designed to push an item to a targeted group (segment) of users with known or easily predicted preferences [22]. It has the following characteristics: (1) $I^S$: all items in $I^S$ are assigned the maximum rating (max); (2) $I^F$: all items in $I^F$ are assigned the minimum rating (min); (3) $I^T$: all items in $I^T$ are assigned the maximum rating (max).

Research in the area of shilling attacks has made significant advances in recent years. User-based CF makes recommendations by finding peers with similar preference profiles; consequently, profiles with biased data can easily result in biased recommendations. Item-based CF looks for items with similar profiles and makes predictions based on a user’s own ratings of the peer items; therefore, item-based CF also suffers from the attacks.

Random attack and average attack models are successful against user-based CF algorithms; however, they fall short of having a significant impact on item-based CF algorithms [13]. The newer models, the bandwagon and segment models, are quite successful against item-based CF algorithms [22]. Among these attack models, random and bandwagon attacks are low knowledge attacks [13], which need minimal knowledge of recommendation systems and user profiles. For experimental purposes, the bandwagon attack is adopted in this paper since it is a low knowledge attack and quite successful against item-based CF.

2.2. Shilling Attack Resistant CF. A number of recent studies have focused on robust CF because recommendation systems are easily attacked. O’Donovan and Smyth [23] proposed that the trustworthiness of users should be taken into consideration in recommendation systems. Their trust models can improve the predictive accuracy. Massa and Avesani [24] proposed a robust CF approach, also called trust-based CF, based on a “web of trust.” The approach increases the coverage of recommendation systems while preserving the quality of predictions, especially for new users. However, predictive accuracy and coverage are not the essential metrics for robust recommendation systems [25]. Zhang [26] proposed a trust-aware CF based on users’ multiple interests. He proposed a topic-level trust model and a CF approach based on the model. The approach improves the robustness of the recommendations. However, all three levels of the trust model are based on the number of user ratings.

The relationships and weights among users are essential to a recommendation system. Yu et al. [27] proposed a reputation-based approach for decoding information