Collaborative Filtering for Binary, Positive-Only Data

Total Page:16

File Type:pdf, Size:1020Kb

Collaborative Filtering for Binary, Positive-Only Data Collaborative Filtering for Binary, Positive-only Data Koen Verstrepen∗ Kanishka Bhaduriy Froomle Apple, Inc. Antwerp, Belgium Cupertino, USA [email protected] [email protected] Boris Cule Bart Goethals University of Antwerp Froomle, University of Antwerp Antwerp, Belgium Antwerp, Belgium [email protected] [email protected] ABSTRACT other. Studying recommender systems specifically, and the connection between individuals and relevant items in gen- Traditional collaborative filtering assumes the availability of eral, is the subject of recommendation research. But the explicit ratings of users for items. However, in many cases relevance of recommendation research goes beyond connect- these ratings are not available and only binary, positive-only ing users with items. Recommender systems can, for exam- data is available. Binary, positive-only data is typically as- ple, also connect genes with diseases, biological targets with sociated with implicit feedback such as items bought, videos drug compounds, words with documents, tags with photos, watched, ads clicked on, etc. However, it can also be the re- etc. sults of explicit feedback such as likes on social networking sites. Because binary, positive-only data contains no neg- 1.1 Collaborative Filtering ative information, it needs to be treated differently than Collaborative filtering is a principal problem in recommen- rating data. As a result of the growing relevance of this dation research. In the most abstract sense, collaborative problem setting, the number of publications in this field in- filtering is the problem of weighting missing edges in a bi- creases rapidly. In this survey, we provide an overview of the partite graph. existing work from an innovative perspective that allows us to emphasize surprising commonalities and key differences. The concrete version of this problem that got most atten- tion until recently is rating prediction. In rating prediction, one set of nodes in the bipartite graph represent users, the 1. INTRODUCTION other set of nodes represent items, an edge with weight r Increasingly, people are overwhelmed by an abundance of between user u and item i expresses that u has given i the choice. Via the World Wide Web, everybody has access to rating r, and the task is to predict the missing ratings. Since a wide variety of news, opinions, (encyclopedic) informa- rating prediction is a mature domain, multiple overviews ex- tion, social contacts, books, music, videos, pictures, prod- ist [23; 47; 1; 59]. Recently, the attention for rating predic- ucts, jobs, houses, and many other items, from all over the tion diminished because of multiple reasons. First, collect- world. However, from the perspective of a particular person, ing rating data is relatively expensive in the sense that it the vast majority of items is irrelevant; and the few relevant requires a non-negligible effort from the users. Second, user items are difficult to find because they are buried under a ratings do not correlate as well with user behavior as one large pile or irrelevant ones. There exist, for example, lots would expect. Users tend to give high ratings to items they of books that one would enjoy reading, if only one could think they should consume, for example a famous book by identify them. Moreover, not only do people fail to find rel- Dostoyevsky. However, they would rather read Superman evant existing items, niche items fail to be created because comic books, which they rate much lower. Finally, in many it is anticipated that the target audience will not be able to applications, predicting ratings is not the final goal, and the find them under the pile of irrelevant items. Certain books, predicted ratings are only used to find the most relevant for example, are never written because writers anticipate items for every user. Consequently, high ratings need to they will not be able to reach a sufficiently large portion of be accurate whereas the exact value of low ratings is irrel- their target audience, although the audience exists. Recom- evant. However, in rating prediction high and low ratings mender systems contribute to overcome these difficulties by are equally important. connecting individuals with items relevant to them. A good Today, attention is increasingly shifting towards collabora- book recommender system, for example, would typically rec- tive filtering with binary, positive-only data. In this version, ommend 3, previously unknown, books that the user would edges are unweighted, an edge between user u and item i ex- enjoy reading, and that are sufficiently different from each presses that user u has given positive feedback about item i, and the task is to attach to every missing edge between ∗This work was done while Koen Verstrepen was working at a user u and an item i a score that indicates the suitabil- the University of Antwerp. ity of recommending i to u. Binary, positive-only data is yThis work was done while Kanishka Bhaduri was working typically associated with implicit feedback such as items at Netflix, Inc. bought, videos watched, songs listened to, books lent from a library, ads clicked on, etc. However, it can also be the is desirable for collaborative filtering, but typically not for result of explicit feedback, such as likes on social networking association rule mining. sites. As a result of the growing relevance of this problem setting, the number of publications in this field increases 1.3 Outline rapidly. In this survey, we provide an overview of the exist- After the Preliminaries (Sec. 2), we introduce our framework ing work on collaborative filtering with binary, positive-only (Sec. 3) and review the state of the art along the three di- data from an innovative perspective that allows us to em- mensions of our framework: Factorization Models (Sec. 4), phasize surprising commonalities and key differences. To Deviation Functions (Sec. 5), and Minimization Algorithms enhance the readability, we sometimes omit the specifica- (Sec. 6). Finally, we discuss the usability of methods for tion `binary, positive-only' and use the abbreviated term rating prediction (Sec. 7) and conclude (Sec. 8). `collaborative filtering’. Besides the bipartite graph, five types of extra information 2. PRELIMINARIES can be available. First, there can be item content or item We introduced collaborative filtering as the problem of weight- metadata. In the case of books, for example, the content is ing missing edges in a bipartite graph. Typically, however, the full text of the book and the metadata can include the this bipartite graph is represented by its adjacency matrix, writer, the publisher, the year it was published etc. Meth- which is called the preference matrix. ods that exclusively use this kind of information are typi- Let be a set of users and a set of items. We are given cally classified as content based. Methods that combine this a preferenceU matrix with trainingI data R 0; 1 jU|×|Ij. kind of information with a collaborative filtering method 2 f g Rui = 1 indicates that there is a known preference of user are typically classified as hybrid. Second, there can be user u for item i . Rui = 0 indicates that there is no metadata such as gender, age, location, etc. Third, users such2 U information.2 Notice I that the absence of information can be connected with each other in an extra, unipartite means that either there exists no preference or there exists graph. A typical example is a social network between the a preference but it is not known. users. An analogous graph can exist for the items. Finally, there can be contextual information such as location, date, Collaborative filtering methods compute for every user-item time, intent, company, device, etc. Exploiting information pair (u; i) a recommendation score s(u; i) that indicates the besides the bipartite graph, is out of the scope of this sur- suitability of recommending i to u. Typically, the user-item- pairs are (partially) sorted by their recommendation scores. vey. Comprehensive discussions on exploiting information jU|×|Ij outside the user-item matrix have been presented [53; 72]. We define the matrix S R as Sui = s(u; i). Further- more, c(x) gives the count2 of x, meaning 1.2 Relation to Other Domains (P i2I Rxi if x To emphasize the unique aspects of collaborative filtering, c(x) = P 2 U Rux if x : we highlight the commonalities and differences with two re- u2U 2 I lated data science problems: classification and association Although we conveniently call the elements of users and rule mining. the elements of items, these sets can containU any type of First, collaborative filtering is equivalent to jointly solving object. In the caseI of online social networks, for example, many one-class classification problems, in which every one- both sets contain the people that participate in the social class classification problem corresponds to one of the items. network, i.e., = , and Rui = 1 if there exists a friendship In the classification problem that corresponds to item i, i link between personsU I u and i. In image tagging/annotation serves as the class, all other items serve as the features, the problems, contains images, contains words, and Rui = 1 users that have i as a known preference serve as labeled if image uUwas tagged with wordI i. In chemogenomics, an examples and the other users serve as unlabeled examples. early stage in the drug discovery process, contains active U Amazon.com, for example, has more than 200 million items drug compounds, contains biological targets, and Rui = 1 in its catalog, hence solving the collaborative filtering prob- if there is a strongI interaction between compound u and lem for Amazon.com is equivalent to jointly solving more biological target i. than 200 million one-class classification problems, which ob- Typically, datasets for collaborative filtering are extremely viously requires a distinct approach.
Recommended publications
  • The Use of Time Dimension in Recommender Systems for Learning
    The Use of Time Dimension in Recommender Systems for Learning Eduardo José de Borba1, Isabela Gasparini1 and Daniel Lichtnow2 1Graduate Program in Applied Computing (PPGCA), Department of Computer Science (DCC), Santa Catarina State University (UDESC), Paulo Malschitzki 200, Joinville, Brazil 2Polytechnic School, Federal University of Santa Maria (UFSM), Av. Roraima 1000, Santa Maria, Brazil Keywords: Recommender System, Context-aware, Time, Learning. Abstract: When the amount of learning objects is huge, especially in the e-learning context, users could suffer cognitive overload. That way, users cannot find useful items and might feel lost in the environment. Recommender systems are tools that suggest items to users that best match their interests and needs. However, traditional recommender systems are not enough for learning, because this domain needs more personalization for each user profile and context. For this purpose, this work investigates Time-Aware Recommender Systems (Context-aware Recommender Systems that uses time dimension) for learning. Based on a set of categories (defined in previous works) of how time is used in Recommender Systems regardless of their domain, scenarios were defined that help illustrate and explain how each category could be applied in learning domain. As a result, a Recommender System for learning is proposed. It combines Content-Based and Collaborative Filtering approaches in a Hybrid algorithm that considers time in Pre- Filtering and Post-Filtering phases. 1 INTRODUCTION potential to improve the quality of recommendation (Campos et al., 2014). Nowadays, there are distinct educational approaches, This work aims to identify how Time-aware RS e.g. online learning, blended learning, face-to-face (Context-aware RS that uses time context) can be learning, etc.
    [Show full text]
  • CRUC: Cold-Start Recommendations Using Collaborative Filtering in Internet of Things
    CRUC: Cold-start Recommendations Using Collaborative Filtering in Internet of Things Daqiang Zhanga,*, Qin Zoub, Haoyi Xiongc a School of Computer Science, Nanjing Normal University, Nanjing, Jiangsu Province, 210046,China b School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei Province, 430072, China c Department of Telecommunication, Institute Telecome – Telecom SudParis, Evry, 91011, France Abstract The Internet of Things (IoT) aims at interconnecting everyday objects (including both things and users) and then using this connection information to provide customized user services. However, IoT does not work in its initial stages without adequate acquisition of user preferences. This is caused by cold-start problem that is a situation where only few users are interconnected. To this end, we propose CRUC scheme --- Cold-start Recommendations Using Collaborative Filtering in IoT, involving formulation, filtering and prediction steps. Extensive experiments over real cases and simulation have been performed to evaluate the performance of CRUC scheme. Experimental results show that CRUC efficiently solves the cold-start problem in IoT. Keywords: Cold-start Problem, Internet of Things, Collaborative Filtering 1. Introduction The Internet of Things (IoT) refers to a self-configuring network in which everyday objects are interconnected to the Internet [1] [2]. IoT deploys sensors in infrastructures (e.g., rooms and buildings) to get a heightened awareness of real-time events. It also employs sensors capturing contextual information about objects (e.g., user preferences) to achieve an enhanced situational awareness [3] [21-26]. Readings from a large number of sensors for various objects are enormous, but only a few of them are useful for a specific user.
    [Show full text]
  • Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation
    Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation Daniel Valcarce, Javier Parapar, and Álvaro Barreiro Information Retrieval Lab Computer Science Department University of A Coruña, Spain {daniel.valcarce,javierparapar,barreiro}@udc.es Abstract. Recently, Relevance-Based Language Models have been dem- onstrated as an effective Collaborative Filtering approach. Nevertheless, this family of Pseudo-Relevance Feedback techniques is computationally expensive for applying them to web-scale data. Also, they require the use of smoothing methods which need to be tuned. These facts lead us to study other similar techniques with better trade-offs between effective- ness and efficiency. Specifically, in this paper, we analyse the applicability to the recommendation task of four well-known query expansion tech- niques with multiple probability estimates. Moreover, we analyse the ef- fect of neighbourhood length and devise a new probability estimate that takes into account this property yielding better recommendation rank- ings. Finally, we find that the proposed algorithms are dramatically faster than those based on Relevance-Based Language Models, they do not have any parameter to tune (apart from the ones of the neighbourhood) and they provide a better trade-off between accuracy and diversity/novelty. Keywords: Recommender Systems, Collaborative Filtering, Query Expansion, Pseudo-Relevance Feedback. 1 Introduction Recommender systems are recognised as a key instrument to deliver relevant information to the users. Although the problem that attracts most attention in the field of Recommender Systems is accuracy, the emphasis on efficiency is increasing. We present new Collaborative Filtering (CF) algorithms. CF methods exploit the past interactions betweens items and users. Common approaches to CF are based on nearest neighbours or matrix factorisation [17].
    [Show full text]
  • Combining User-Based and Item-Based Collaborative Filtering Using Machine Learning
    Combining User-Based and Item-Based Collaborative Filtering Using Machine Learning Priyank Thakkar, Krunal Varma, Vijay Ukani, Sapan Mankad and Sudeep Tanwar Abstract Collaborative filtering (CF) is typically used for recommending those items to a user which other like-minded users preferred in the past. User-based collaborative filtering (UbCF) and item-based collaborative filtering (IbCF) are two types of CF with a common objective of estimating target user’s rating for the target item. This paper explores different ways of combining predictions from UbCF and IbCF with an aim of minimizing overall prediction error. In this paper, we propose an approach for combining predictions from UbCF and IbCF through multiple linear regression (MLR) and support vector regression (SVR). Results of the proposed approach are compared with the results of other fusion approaches. The comparison demonstrates the superiority of the proposed approach. All the tests are performed on a large publically available dataset. Keywords User-based collaborative filtering · Item-based collaborative filtering Machine learning · Multiple linear regression · Support vector regression P. Thakkar (B) · K. Varma · V. Ukani · S. Mankad · S. Tanwar Institute of Technology, Nirma University, Ahmedabad 382481, India e-mail: [email protected] K. Varma e-mail: [email protected] V. Ukani e-mail: [email protected] S. Mankad e-mail: [email protected] S. Tanwar e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2019 173 S. C. Satapathy and A. Joshi (eds.), Information and Communication Technology for Intelligent Systems, Smart Innovation, Systems and Technologies 107, https://doi.org/10.1007/978-981-13-1747-7_17 174 P.
    [Show full text]
  • A Grouplens Perspective
    From: AAAI Technical Report WS-98-08. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved. RecommenderSystems: A GroupLensPerspective Joseph A. Konstan*t , John Riedl *t, AI Borchers,* and Jonathan L. Herlocker* *GroupLensResearch Project *Net Perceptions,Inc. Dept. of ComputerScience and Engineering 11200 West78th Street University of Minnesota Suite 300 Minneapolis, MN55455 Minneapolis, MN55344 http://www.cs.umn.edu/Research/GroupLens/ http://www.netperceptions.com/ ABSTRACT identifying sets of articles by keyworddoes not scale to a In this paper, wereview the history and research findings of situation in which there are thousands of articles that the GroupLensResearch project I and present the four broad contain any imaginable set of keywords. Taken together, research directions that we feel are most critical for these two weaknesses represented an opportunity for a new recommender systems. type of filtering, that would focus on finding which INTRODUCTION:A History of the GroupLensProject available articles matchhuman notions of quality and taste. The GroupLens Research project began at the Computer Such a system would be able to produce a list of articles Supported Cooperative Work (CSCW)Conference in 1992. that each user wouldlike, independentof their content. Oneof the keynote speakers at the conference lectured on a Wedecided to apply our ideas in the domain of Usenet his vision of an emerging information economy,in which news. Usenet screamsfor better information filtering, with most of the effort in the economywould revolve around hundreds of thousands of articles posted daily. Manyof the production, distribution, and consumptionof information, articles in each Usenet newsgroupare on the sametopic, so rather than physical goods and services.
    [Show full text]
  • Towards an Effective Crowdsourcing Recommendation System a Survey of the State-Of-The-Art
    2015 IEEE Symposium on Service-Oriented System Engineering Towards an Effective Crowdsourcing Recommendation System A Survey of the State-of-the-Art Eman Aldhahri, Vivek Shandilya, Sajjan Shiva Computer Science Department, The University of Memphis Memphis, USA {aldhahri, vmshndly, sshiva}@memphis.edu Abstract—Crowdsourcing is an approach where requesters can behaviors. User behavior can be interpreted as preferable call for workers with different capabilities to process a task for features.. Explicit profiles are based on asking users to complete monetary reward. With the vast amount of tasks posted every a preferable features form. Two techniques common in this area day, satisfying workers, requesters, and service providers--who are: a collaborative filtering approach (with cold start problems, are the stakeholders of any crowdsourcing system--is critical to scarcity and scalability in large datasets [6]), and a content- its success. To achieve this, the system should address three based approach (with problems due to overspecialization, etc.). objectives: (1) match the worker with a suitable task that fits the A hybrid approach that combines the two techniques has also worker’s interests and skills, and raise the worker’s rewards; (2) been used. give requesters more qualified solutions with lower cost and II. MOTIVATION time; and (3) raise the accepted tasks rate which will raise the The main objective in this study is to investigate various online aggregated commissions accordingly. For these objectives, we recommendation systems by analyzing their input parameters, present a critical study of the state-of-the-art in effectiveness, and limitations in order to assess their usage in recommendation systems that are ubiquitous among crowdsourcing systems.
    [Show full text]
  • Mobile Application Recommender System
    UPTEC IT 10 025 Examensarbete 30 hp December 2010 Mobile Application Recommender System Christoffer Davidsson Abstract Mobile Application Recommender System Christoffer Davidsson Teknisk- naturvetenskaplig fakultet UTH-enheten With the amount of mobile applications available increasing rapidly, users have to put a lot of effort into finding applications of interest. The purpose of this thesis is to Besöksadress: investigate how to aid users in the process of discovering new mobile applications by Ångströmlaboratoriet Lägerhyddsvägen 1 providing them with recommendations. A prototype system is then built as a Hus 4, Plan 0 proof-of-concept. Postadress: The work of the thesis is divided into three phases where the aim of the first phase is Box 536 751 21 Uppsala to study related work and related systems to identify promising concepts and features. During the second phase, a prototype system is designed and implemented. Telefon: The outcome and result of the first two phases is then evaluated and analyzed in the 018 – 471 30 03 third and final phase. Telefax: 018 – 471 30 00 The prototype system integrates and extends an existing recommender engine previously used to recommend media items. As a part of the system, an Android Hemsida: application is developed, which observes user actions and presents recommended http://www.teknat.uu.se/student applications to the user. In parallel to the development, the system was tested by a small group of users recruited among colleagues at Ericsson. The data generated during this test period is analyzed to show the usefulness of observed user actions over explicit ratings and the dependency on context for application usage.
    [Show full text]
  • Wiki-Rec: a Semantic-Based Recommendation System Using Wikipedia As an Ontology
    Wiki-Rec: A Semantic-Based Recommendation System Using Wikipedia as an Ontology Ahmed Elgohary, Hussein Nomir, Ibrahim Sabek, Mohamed Samir, Moustafa Badawy, Noha A. Yousri Computer and Systems Engineering Department Faculty of Engineering, Alexandria University Alexandria, Egypt {algoharyalex, hussein.nomier, ibrahim.sabek, m.samir.galal, moustafa.badawym}@gmail.com, [email protected] Abstract— Nowadays, satisfying user needs has become the Hybrid models try to overcome the limitations of each in main challenge in a variety of web applications. Recommender order to generate better quality recommendations. systems play a major role in that direction. However, as most Since most of the information on the web is present in a of the information is present in a textual form, recommender textual form, recommendation systems have to deal with systems face the challenge of efficiently analyzing huge huge amounts of unstructured text. Efficient text mining amounts of text. The usage of semantic-based analysis has techniques are, therefore, needed to understand documents in gained much interest in recent years. The emergence of order to extract important information. Traditional term- ontologies has yet facilitated semantic interpretation of text. based or lexical-based analysis cannot capture the underlying However, relying on an ontology for performing the semantic semantics when used on their own. That is why semantic- analysis requires too much effort to construct and maintain the based analysis approaches have been introduced [1] [2] to used ontologies. Besides, the currently known ontologies cover a small number of the world's concepts especially when a non- overcome such a limitation. The use of ontologies have also domain-specific concepts are needed.
    [Show full text]
  • User Relevance for Item-Based Collaborative Filtering R
    User Relevance for Item-Based Collaborative Filtering R. Latha, R. Nadarajan To cite this version: R. Latha, R. Nadarajan. User Relevance for Item-Based Collaborative Filtering. 12th International Conference on Information Systems and Industrial Management (CISIM), Sep 2013, Krakow, Poland. pp.337-347, 10.1007/978-3-642-40925-7_31. hal-01496080 HAL Id: hal-01496080 https://hal.inria.fr/hal-01496080 Submitted on 27 Mar 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License User Relevance for Item-based Collaborative Filtering R. Latha, R. Nadarajan Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore Tamil Nadu, India [email protected], [email protected] Abstract. A Collaborative filtering (CF), one of the successful recommenda- tion approaches, makes use of history of user preferences in order to make pre- dictions. Common drawback found in most of the approaches available in the literature is that all users are treated equally. i.e., all users have same im- portance. But in the real scenario, there are users who rate items, which have similar rating pattern.
    [Show full text]
  • A Systematic Review and Taxonomy of Explanations in Decision Support and Recommender Systems
    Noname manuscript No. (will be inserted by the editor) A Systematic Review and Taxonomy of Explanations in Decision Support and Recommender Systems Ingrid Nunes · Dietmar Jannach Received: date / Accepted: date Abstract With the recent advances in the field of artificial intelligence, an increasing number of decision-making tasks are delegated to software systems. A key requirement for the success and adoption of such systems is that users must trust system choices or even fully automated decisions. To achieve this, explanation facilities have been widely investigated as a means of establishing trust in these systems since the early years of expert systems. With today's increasingly sophisticated machine learning algorithms, new challenges in the context of explanations, accountability, and trust towards such systems con- stantly arise. In this work, we systematically review the literature on expla- nations in advice-giving systems. This is a family of systems that includes recommender systems, which is one of the most successful classes of advice- giving software in practice. We investigate the purposes of explanations as well as how they are generated, presented to users, and evaluated. As a result, we derive a novel comprehensive taxonomy of aspects to be considered when de- signing explanation facilities for current and future decision support systems. The taxonomy includes a variety of different facets, such as explanation objec- tive, responsiveness, content and presentation. Moreover, we identified several challenges that remain unaddressed so far, for example related to fine-grained issues associated with the presentation of explanations and how explanation facilities are evaluated. Keywords Explanation · Decision Support System · Recommender System · Expert System · Knowledge-based System · Systematic Review · Machine Learning · Trust · Artificial Intelligence I.
    [Show full text]
  • Collaborative Filtering: a Machine Learning Perspective by Benjamin Marlin a Thesis Submitted in Conformity with the Requirement
    Collaborative Filtering: A Machine Learning Perspective by Benjamin Marlin A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Computer Science University of Toronto Copyright c 2004 by Benjamin Marlin Abstract Collaborative Filtering: A Machine Learning Perspective Benjamin Marlin Master of Science Graduate Department of Computer Science University of Toronto 2004 Collaborative filtering was initially proposed as a framework for filtering information based on the preferences of users, and has since been refined in many different ways. This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering. We analyze existing methods for the task of rating prediction from a machine learning perspective. We show that many existing methods proposed for this task are simple applications or modifications of one or more standard machine learning methods for classification, regression, clustering, dimensionality reduction, and density estima- tion. We introduce new prediction methods in all of these classes. We introduce a new experimental procedure for testing stronger forms of generalization than has been used previously. We implement a total of nine prediction methods, and conduct large scale prediction accuracy experiments. We show interesting new results on the relative performance of these methods. ii Acknowledgements I would like to begin by thanking my supervisor Richard Zemel for introducing me to the field of collaborative filtering, for numerous helpful discussions about a multitude of models and methods, and for many constructive comments about this thesis itself. I would like to thank my second reader Sam Roweis for his thorough review of this thesis, as well as for many interesting discussions of this and other research.
    [Show full text]
  • User Data Analytics and Recommender System for Discovery Engine
    User Data Analytics and Recommender System for Discovery Engine Yu Wang Master of Science Thesis Stockholm, Sweden 2013 TRITA-ICT-EX-2013: 88 User Data Analytics and Recommender System for Discovery Engine Yu Wang [email protected] Whaam AB, Stockholm, Sweden Royal Institute of Technology, Stockholm, Sweden June 11, 2013 Supervisor: Yi Fu, Whaam AB Examiner: Prof. Johan Montelius, KTH Abstract On social bookmarking website, besides saving, organizing and sharing web pages, users can also discovery new web pages by browsing other’s bookmarks. However, as more and more contents are added, it is hard for users to find interesting or related web pages or other users who share the same interests. In order to make bookmarks discoverable and build a discovery engine, sophisticated user data analytic methods and recommender system are needed. This thesis addresses the topic by designing and implementing a prototype of a recommender system for recommending users, links and linklists. Users and linklists recommendation is calculated by content- based method, which analyzes the tags cosine similarity. Links recommendation is calculated by linklist based collaborative filtering method. The recommender system contains offline and online subsystem. Offline subsystem calculates data statistics and provides recommendation candidates while online subsystem filters the candidates and returns to users. The experiments show that in social bookmark service like Whaam, tag based cosine similarity method improves the mean average precision by 45% compared to traditional collaborative method for user and linklist recommendation. For link recommendation, the linklist based collaborative filtering method increase the mean average precision by 39% compared to the user based collaborative filtering method.
    [Show full text]