Comparison of User Based and Item Based Collaborative Filtering Recommendation Services


Degree project in technology, first cycle, 15 credits
Stockholm, Sweden 2017

Peter Boström
Melker Filipsson

KTH School of Computer Science and Communication

Abstract

With a constantly increasing amount of content on the internet, filtering algorithms are more relevant than ever. There are several methods of providing this type of filtering, and two of the more commonly used are user based and item based collaborative filtering. As each has its own pros and cons, this report seeks to evaluate in which situations one outperforms the other, and by how large a margin. The purpose is to gain insight into how these basic filtering algorithms work and how they differ from one another. An algorithm using Adjusted Cosine Similarity to calculate the similarities between users and items, and RMSE to compute the error, was executed on two different datasets with differing sizes of training and testing data. The datasets had the same number of ratings, but the second had less spread in the number of items in the set. The results were similar, although slightly better for both user based and item based filtering on the second dataset than on the first. In conclusion, when dealing with datasets that are large enough for practical use, user based collaborative filtering proved superior in all reviewed cases.

Summary

With a marked increase in information and data on the internet, algorithms that filter it have become more relevant than ever. There are many different methods of providing this type of service, and some of the most widely used are item based and user based filtering algorithms. Both methods have advantages and disadvantages to varying degrees, which this report aims to investigate.
The goal is to gain insight into how these filtering algorithms work and how they differ from one another. An algorithm that uses Adjusted Cosine Similarity to compute similarities between users and items, and RMSE to compute the error, was executed on two different datasets with different sizes of training and test data. The datasets differed in the spread between items and had different numbers of users, but were otherwise equal in the number of ratings. The results were similar for the two datasets, but the test on the second gave a better outcome. In summary, when handling datasets large enough to see practical use, user based filtering was superior in all cases considered.

Table of contents

1 Introduction
1.1 Problem statement
1.2 Scope
2 Background
2.1 User-Based Collaborative Filtering
2.2 Item-Based Collaborative Filtering
2.3 Collaborative Filtering-System Problems
2.4 Adjusted Cosine Similarity
2.5 Root Mean Square Error (RMSE)
3 Methods
3.1 Software and hardware
3.2 Datasets
3.3 Implementation
4 Results
4.1 Dataset 1
4.1.1 75/25
4.1.2 90/10
4.1.3 50/50
4.1.4 All Tests from Dataset 1
4.2 Dataset 2
4.2.1 75/25
4.2.2 90/10
4.2.3 50/50
4.2.4 All Tests from Dataset 2
5 Discussion
5.1 Discussion
5.1.1 Dataset 1
5.1.2 Dataset 2
5.2 Method evaluation
5.3 Conclusion
6 References

1 Introduction

Lately, the demand for recommendation services has increased sharply due to the massive flow of new content onto the internet. For users to find the content they want, competent recommendation services are extremely helpful. Finding the right movie or book among the thousands released every year can be difficult. Automated recommendation services have therefore been implemented to ease this task.
There are, however, many ways to implement these systems, and enterprises want to make sure they implement the ones that best fit their business. Recommendation algorithms are used on a vast number of websites, such as the movie recommendations on Netflix, the music recommendations on Spotify, the video recommendations on YouTube and the product recommendations on Amazon. With the amount of content only increasing, research on the subject and implementations of it are in growing demand, and in 2006 Netflix announced a prize of one million dollars for whoever could implement the best movie recommendation software for its service[9]. It is crucial that services recommend relevant items, as this leads to increased consumption, user satisfaction and profit, which benefits everyone involved. Collaborative filtering is an effective and easy approach to this problem, as it evolves and learns from the user's preferences in order to better fulfil them in the future.

1.1 Problem statement

The goal of this thesis is to compare two approaches to collaborative filtering, user-based and item-based collaborative filtering, on datasets provided by the MovieLens database, in order to examine their performance, similarities and differences. The thesis aims to investigate the following:

● Based on database sparsity and the size of the training and testing data, in which situations is each approach to collaborative filtering superior to the other?
● What are the main similarities and differences between the algorithms?

1.2 Scope

Two datasets will be used, from which subsets are created to train the program. The aim is not to reach high performance, but to compare the different approaches to one another.
User-based filtering is expected to be superior when dealing with large amounts of data, whereas item-based collaborative filtering is expected to perform better on smaller datasets.

2 Background

There are two major approaches to collaborative filtering: item based and user based. Item based filtering uses the similarity between items to determine whether a user would like an item or not, whereas user based filtering finds users with consumption patterns similar to your own and gives you the content that these similar users found interesting. There are also hybrid approaches, which seek to utilise the strengths of both approaches whilst removing the weaknesses of each[3].

Collaborative filtering methods are also divided into two main categories: Model Based and Memory Based. This paper discusses Memory Based collaborative filtering, as user based and item based filtering both fall under this category. The two differ mainly in what they take into account when calculating recommendations. Item based collaborative filtering finds similarity patterns between items and recommends items to users based on the computed information, whilst user based filtering finds similar users and gives them recommendations based on what other people with similar consumption patterns appreciated[3].

Fig. 1: The picture depicts the different approaches that user based and item based collaborative filtering take. The half dotted lines represent recommendations, based on the users' preferences and similarities on the left, and on similar items on the right.

Collaborative filtering is one of the most widely used algorithms for product recommendation, and it is considered effective[7]. Hybrid solutions can be useful for recommending content to users with unique or wide tastes, as it is hard to find a "close neighbour", i.e. someone with a similar consumption pattern, for such a user. A regular item based or user based solution may prove unsatisfactory in this situation[3].
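To make the distinction between the two approaches concrete, the following minimal Python sketch computes a similarity between two users and between two items from the same small rating table. The rating values and the names RATINGS, cosine, user_similarity and item_similarity are hypothetical and not taken from the report. For brevity the sketch uses plain cosine similarity over co-rated entries, not the Adjusted Cosine Similarity that the report uses.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating on a 1-5 scale}).
# These values are illustrative only and do not come from MovieLens.
RATINGS = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "carol": {"A": 1, "B": 2, "C": 5},
}


def cosine(xs, ys):
    """Plain cosine similarity between two equally long rating vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0


def user_similarity(ratings, u, v):
    """User based view: compare two users over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    return cosine([ratings[u][i] for i in common],
                  [ratings[v][i] for i in common])


def item_similarity(ratings, i, j):
    """Item based view: compare two items over the users who rated both."""
    raters = sorted(u for u, r in ratings.items() if i in r and j in r)
    return cosine([ratings[u][i] for u in raters],
                  [ratings[u][j] for u in raters])
```

With this toy data, user_similarity(RATINGS, "alice", "bob") is larger than user_similarity(RATINGS, "alice", "carol"), and item_similarity(RATINGS, "A", "B") is larger than item_similarity(RATINGS, "A", "C"). Both functions read the same table; the user based view compares its rows and the item based view compares its columns.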
2.1 User-based Collaborative Filtering

This report focuses on the "nearest neighbour" approach to recommendations, which looks at users' rating patterns and finds the "nearest neighbours", i.e. users with ratings similar to yours. The algorithm then gives you recommendations based on the ratings of these neighbours[2].

In a fixed size neighbourhood, the algorithm finds the X users most similar to you and uses them as the basis for recommendations. In a threshold based neighbourhood, all users that fall within the threshold, i.e. are similar enough, are used to provide recommendations[8]. This report uses the threshold based neighbourhood, as it makes more sense to use only data that is similar enough, and not to give bad recommendations to certain users simply because their closest neighbour was very far away. This will lead to some users getting better recommendations than others (as they have more similar users for the algorithm to work with), but it will at least not give bad recommendations where no recommendation might have been preferred. It also does not neglect similar users just because other users are even more similar; it makes sense to use all the good data at our disposal.

Fig. 2: The image on the left depicts a threshold based neighbourhood. User 1 would get recommendations from users 2 and 3, but not from 4 and 5, as they are outside the threshold. The image on the right depicts a fixed size neighbourhood. User 1 would get recommendations from users 2, 3 and 4, but not from 5 and 6, as this example uses the three closest neighbours for recommendations[8].

2.2 Item-based Collaborative Filtering

Item based collaborative filtering was introduced in 1998 by Amazon[6]. Unlike user based collaborative filtering, item based filtering looks at the similarity between different items, and does this by taking note of how many users who bought item X also bought item Y.
If the correlation is high enough, the two items can be presumed to be similar to one another. From then on, item Y will be recommended to users who bought item X, and vice versa[6].
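The co-purchase idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Amazon or by this report: the purchase data, the names PURCHASES, buyers_by_item, similar_pairs and recommend, and the 0.5 threshold are all hypothetical. Because the text does not say how the correlation is measured, the sketch assumes a simple overlap ratio, namely the number of users who bought both items divided by the number who bought either.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories (user -> set of bought items).
PURCHASES = {
    "u1": {"X", "Y"},
    "u2": {"X", "Y", "Z"},
    "u3": {"X", "Y"},
    "u4": {"Z"},
    "u5": {"X"},
}


def buyers_by_item(purchases):
    """Invert the data: item -> set of users who bought it."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    return buyers


def similar_pairs(purchases, min_overlap):
    """Item pairs whose buyer overlap ratio reaches the assumed threshold."""
    buyers = buyers_by_item(purchases)
    pairs = set()
    for a, b in combinations(sorted(buyers), 2):
        both = len(buyers[a] & buyers[b])    # users who bought both items
        either = len(buyers[a] | buyers[b])  # users who bought at least one
        if both / either >= min_overlap:
            pairs.add((a, b))
    return pairs


def recommend(purchases, user, min_overlap=0.5):
    """Recommend the other item of each similar pair the user half owns."""
    owned = purchases[user]
    recommendations = set()
    for a, b in similar_pairs(purchases, min_overlap):
        if a in owned and b not in owned:
            recommendations.add(b)
        if b in owned and a not in owned:
            recommendations.add(a)
    return recommendations
```

In this toy data, three of the four users who bought X or Y bought both, so the pair passes the assumed threshold and recommend(PURCHASES, "u5") suggests Y to the user who has only bought X. The relation is symmetric, matching the "vice versa" in the text: a user who had only bought Y would be recommended X.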