
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Comparison of User Based and Item Based Collaborative Filtering Recommendation Services

PETER BOSTRÖM
MELKER FILIPSSON

KTH SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Abstract

With a constantly increasing amount of content on the internet, filtering algorithms are more relevant than ever. There are several methods of providing this type of filtering, and two of the more commonly used are user based and item based collaborative filtering. As each has its own pros and cons, this report seeks to evaluate in which situations one outperforms the other, and by how large a margin. The purpose is to gain insight into how these basic filtering algorithms work and how they differ from one another. An algorithm using Adjusted Cosine Similarity to calculate the similarities between users and items, and RMSE to compute the error, was executed on two different datasets with differing sizes of training and testing data. The datasets contained the same number of ratings, but the second had less spread in the number of items in the set. The results were similar, although slightly better for both user based and item based filtering on the second dataset than on the first. In conclusion, for datasets large enough for practical use, user based collaborative filtering proved superior in all reviewed cases.

Summary

With a marked increase in information and data on the internet, algorithms that filter it have become more relevant than ever. There are many different methods of providing this type of service, and some of the most widely used are item based and user based filtering algorithms. Both methods have advantages and disadvantages to varying degrees, which this report aims to examine. The goal is to gain insight into how these filtering algorithms work and how they differ from one another.

An algorithm that uses “Adjusted Cosine Similarity” to compute similarities between users and items, and “RMSE” to compute the error, was executed on two different datasets with different sizes of training and test data. The datasets differed in the spread between items and had different numbers of users, but were otherwise equal in the number of ratings. The results were similar for the two datasets, but the test on the second gave a better outcome. In summary, for datasets large enough to see practical use, user based filtering was superior in all cases considered.

Table of contents

1 Introduction
1.1 Problem statement
1.2 Scope
2 Background
2.1 User-Based Collaborative Filtering
2.2 Item-Based Collaborative Filtering
2.3 Collaborative Filtering-System Problems
2.4 Adjusted Cosine Similarity
2.5 Root Mean Square Estimation (RMSE)
3 Methods
3.1 Software and hardware
3.2 Datasets
3.3 Implementation
4 Results
4.1 Dataset 1
4.1.1 75/25
4.1.2 90/10
4.1.3 50/50
4.1.4 All Tests from Dataset 1
4.2 Dataset 2
4.2.1 75/25
4.2.2 90/10
4.2.3 50/50
4.2.4 All Tests from Dataset 2
5 Discussion
5.1 Discussion
5.1.1 Dataset 1
5.1.2 Dataset 2
5.2 Method evaluation
5.3 Conclusion
6 References

1 Introduction

Lately, the demand for recommendation services has increased sharply due to the massive flow of new content onto the internet. Competent recommendation services are extremely helpful for users trying to find the content they desire. Finding the right movie or book among the thousands released every year can be difficult, so automated recommendation services have been implemented to ease this task. There are, however, plenty of ways to implement these systems, and enterprises want to make sure they implement the ones that best fit their business.
Recommendation algorithms are used on a vast number of websites, such as the movie recommendations on Netflix, the music recommendations on Spotify, the video recommendations on YouTube and the product recommendations on Amazon. With the amount of content only increasing, research on the subject and implementations of it are in ever greater demand; in 2006 Netflix offered a prize of one million dollars to whoever could implement the best movie recommendation software for use on its service[9]. It is crucial that services recommend the correct items, as this leads to increased consumption, increased user satisfaction and increased profit, which benefits users and service providers alike. Collaborative filtering is an effective and simple approach to this problem, as it learns from users’ preferences in order to better fulfil them in the future.

1.1 Problem Statement

The goal of this thesis is to compare approaches to collaborative filtering, mainly user based and item based collaborative filtering, on datasets provided by the MovieLens database, in order to assess their performance, similarities and differences. The thesis aims to investigate the following:

● Depending on database sparsity and the sizes of the training and testing data, in which situations is each approach to collaborative filtering superior to the other?
● What are the main similarities and differences between the algorithms?

1.2 Scope

Two datasets will be used to create subsets for training the program. The aim is not to reach high performance, but to compare the approaches to one another. User based filtering is expected to be superior on large amounts of data, whereas item based collaborative filtering is expected to perform better on smaller datasets.

2 Background

There are two major approaches to collaborative filtering: item based and user based.
Item based filtering uses the similarity between items to determine whether a user would like an item or not, whereas user based filtering finds users with consumption patterns similar to yours and gives you the content that these similar users found interesting. There are also hybrid approaches, which seek to combine the strengths of both while avoiding their weaknesses[3].

Collaborative filtering methods are also divided into two main categories: model based and memory based. This paper discusses memory based collaborative filtering, as both user based and item based filtering fall under this category. The two differ mainly in what they take into account when calculating recommendations. Item based collaborative filtering finds similarity patterns between items and recommends items to users based on the computed information, whilst user based filtering finds similar users and gives recommendations based on what other people with similar consumption patterns appreciated[3].

Fig. 1: The different approaches taken by user based and item based collaborative filtering. The half-dotted lines represent recommendations, based on users’ preferences and similarities on the left and on similar items on the right.

Collaborative filtering is one of the most widely used algorithms for product recommendation, and it is considered effective[7]. Hybrid solutions can be useful for recommending content to users with unique or wide tastes, for whom it is hard to find a “close neighbour”, i.e. someone with a similar consumption pattern. A regular item based or user based solution may prove unsatisfactory in this situation[3].

2.1 User-based Collaborative Filtering

This report focuses on the “nearest neighbour” approach to recommendation, which looks at users’ rating patterns and finds the “nearest neighbours”, i.e. users with ratings similar to yours.
The algorithm then gives you recommendations based on the ratings of these neighbours[2]. In a fixed size neighbourhood, the algorithm finds the X most similar users to you and uses them as the basis for recommendations. In a threshold based neighbourhood, all users that fall within the threshold, i.e. are similar enough, are used to provide recommendations[8]. This report uses the threshold based neighbourhood, since it only uses data that is similar enough and avoids giving bad recommendations to users whose closest neighbours are far away. This will lead to some users getting better recommendations than others (as they have more similar users for the algorithm to work with), but the algorithm will at least not give bad recommendations where no recommendation would have been preferable. Nor will it neglect similar users just because other users are even more similar; it uses all the good data at its disposal.

Fig. 2: The image on the left depicts a threshold based neighbourhood. User 1 would get recommendations from users 2 and 3, but not from users 4 and 5, as they are outside the threshold. The image on the right depicts a fixed size neighbourhood. User 1 would get recommendations from users 2, 3 and 4, but not from users 5 and 6, as this example uses the three closest neighbours for recommendations[8].

2.2 Item-based Collaborative Filtering

Item based collaborative filtering was introduced by Amazon in 1998[6]. Unlike user based collaborative filtering, item based filtering looks at the similarity between items, and does this by taking note of how many of the users that bought item X also bought item Y. If the correlation is high enough, the two items can be presumed to be similar to one another. Item Y will from then on be recommended to users who bought item X, and vice versa[6].
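
The co-purchase idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in this report: the text does not specify how the correlation between two items is measured, so the sketch assumes Jaccard overlap of the buyer sets as a stand-in. The function names, the 0.5 threshold and the toy purchase data are likewise hypothetical.

```python
from itertools import combinations


def item_similarities(purchases):
    """Return the Jaccard overlap of buyer sets for every pair of items.

    `purchases` maps a user id to the list of items that user bought.
    Jaccard overlap is an assumption of this sketch, not the report's measure.
    """
    # Invert the data: for each item, collect the set of users who bought it.
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)

    sims = {}
    for x, y in combinations(sorted(buyers), 2):
        both = len(buyers[x] & buyers[y])    # users who bought X and Y
        either = len(buyers[x] | buyers[y])  # users who bought X or Y
        sims[(x, y)] = both / either
    return sims


def similar_items(item, sims, threshold=0.5):
    """Return the items whose similarity to `item` reaches the threshold."""
    result = []
    for (x, y), score in sims.items():
        if score >= threshold and item in (x, y):
            result.append(y if x == item else x)
    return sorted(result)


# Hypothetical toy data: four users and three items.
purchases = {
    "u1": ["X", "Y"],
    "u2": ["X", "Y", "Z"],
    "u3": ["X"],
    "u4": ["Z"],
}
sims = item_similarities(purchases)
print(similar_items("X", sims))  # items to recommend to buyers of X
```

Because the measure is symmetric, an item pair that passes the threshold is recommended in both directions, which matches the "vice versa" in the description above. An asymmetric measure, such as the share of X's buyers who also bought Y, would not have that property.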