Recommender Systems
Sistemi Informativi M 12

Recommender Systems
Information Systems M
Prof. Paolo Ciaccia
http://www-db.deis.unibo.it/courses/SI-M/

Recommender Systems
A recommender system (RS) helps people evaluate the potentially huge number of alternatives offered by a Web site. In their simplest form, RSs recommend personalized, ranked lists of items to their users, giving consumers information that helps them decide which items to purchase.
Given a set of users and items (documents, products, …), an RS recommends items to a user based on:
• the past behavior of this and other users
• additional information on users/items
There is a multitude of applications, and a big market:
• E-commerce
• Medical applications (e.g., matching patients to doctors)
• Customer Relationship Management (e.g., matching customer problems to experts)

What book should I buy?

What movie should I watch?
• The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, video games, …
• Owned by Amazon.com since 1998
• 796,328 titles and 2,127,371 people
• More than 50M users per month

The Netflix prize (1)
Netflix is a US online movie rental service, with over 100K titles, 55 million DVDs in total, and a proprietary recommendation system called "Cinematch". Approximately 60% of Netflix members select their movies based on movie recommendations.
In October 2006, Netflix announced it would pay $1 million to whoever created a movie-recommending algorithm 10% better than Cinematch.

The Netflix prize (2)
Within two weeks, Netflix received 169 submissions, including three that were slightly superior to Cinematch. After a month, more than a thousand programs had been entered, and the top scorers were almost halfway to the goal. Three years later, on 21 September 2009, Netflix announced the winner.
12.01.2011

What news should I read?

Where should I spend my vacation?

Remarkable examples
• Amazon.com: books, movies, music
• CDNOW.com: music
• Ebay.com (feedback forms): anything
• Reel.com: movies
• Barnes & Noble: books

Technologies
[Table: which techniques are used by each of the systems Nanocrowd, YooChoose, ThinkAnalytics, Method, TasteKid, Clerkdogs, Criticker, Flixster, MovieLens, Netflix, Shazam, Pandora, LastFM, iTunes, Amazon, Jinni, IMDb. Techniques: Collaborative Filtering; Content-Based Techniques; Knowledge-Based Techniques; Ontologies and Semantic Web Technologies for Recommender Systems; Hybrid Techniques; Context-Dependent Recommender Systems.]

Inputs to an RS
• Behavior of the user in past "transactions": which items were viewed/purchased, content/attributes of those items, pages bookmarked, explicit ratings on items
• Context (used in context-based recommendations): what the user appears to be doing now
• Role/domain: additional information about users, items, …

Content-Based Recommendation
In content-based recommendation the system tries to recommend items that match the user profile. The profile is based on items that the user liked in the past or on explicit interests that s/he defines.
[Figure: the user profile is matched against new books]

Implementing content-based RSs
The basic idea is borrowed from the Vector Space Model. Each item is characterized by a set of (weighted) features: for a movie, actors, director, title, …; the weights can be computed with tf.idf. This also works for "unstructured" data (web pages, docs, etc.).
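The vector-space approach of this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's implementation: the toy catalogue and its feature tokens are hypothetical, the weights use one common tf.idf variant (tf × log(N/df)), and the user profile is assumed to be the average of the tf.idf vectors of the items the user liked; cosine similarity then ranks the remaining items.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: each movie is a bag of feature tokens
# (actors, director, genre). All names are invented for illustration.
ITEMS = {
    "movie_a": ["actor:smith", "actor:jones", "director:lee", "genre:drama"],
    "movie_b": ["actor:smith", "director:kim", "genre:comedy"],
    "movie_c": ["actor:brown", "director:lee", "genre:drama", "genre:drama"],
    "movie_d": ["actor:green", "director:kim", "genre:comedy"],
}


def tfidf_vectors(docs):
    """Map each item to {feature: tf * log(N / df)}, one common tf.idf variant."""
    n = len(docs)
    # df[f] = number of items in which feature f occurs at least once
    df = Counter(f for feats in docs.values() for f in set(feats))
    return {
        item: {f: tf * math.log(n / df[f]) for f, tf in Counter(feats).items()}
        for item, feats in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_profile(vectors, liked):
    """Assumed profile: the average of the tf.idf vectors of the liked items."""
    profile = Counter()
    for item in liked:
        for f, w in vectors[item].items():
            profile[f] += w / len(liked)
    return dict(profile)


def recommend(vectors, liked, top_n=2):
    """Rank the items not yet liked by cosine(profile, item vector)."""
    profile = build_profile(vectors, liked)
    scores = {i: cosine(profile, v) for i, v in vectors.items() if i not in liked}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


vectors = tfidf_vectors(ITEMS)
# movie_c shares director and genre with movie_a, movie_b only an actor,
# so movie_c is ranked first.
print(recommend(vectors, liked=["movie_a"]))
```

A feature that occurs in every item gets weight log(1) = 0 and therefore cannot discriminate between items, which is the intended effect of the idf factor.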
The user profile is built from the user's history, e.g., as a vector representing the relevance of features/keywords for that user; the "rating of features" can be implicit, explicit, or both. Cosine similarity can then be used to match the user profile against an item vector.

Pros and cons of content-based RSs
Pros:
• Able to recommend new and unpopular items
• No need for data on other users
• Can provide explanations of recommended items
Cons:
• Limited content analysis: it is not always easy to find the appropriate features to use
• Overspecialization: the system can only recommend items similar to those previously seen/rated; moreover, items too similar to ones the user already knows might not be of interest (e.g., news articles)
• New users: how to build a profile?

Collaborative filtering (CF)
Unlike content-based recommendation methods, CF recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users. There are two basic variants of CF:
• User-based: to predict a user's opinion of an item, use the opinions of similar users, where the similarity between users depends on their opinions of other items
• Item-based: as in content-based RSs, the assumption is that a user is likely to have the same opinion of similar items; however, the similarity between items now depends on how other users have rated them

User-based CF
          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8       1       ?       2       7
User 2      2       ?       5       7       5
User 3      5       4       7       4       7
User 4      7       1       7       3       8
User 5      1       7       4       6       5
User 6      8       3       8       3       7

Similarity between users: simple way
          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8       1       ?       2       7
User 2      2       ?       5       7       5
• Only consider items both users have rated
• For each such item j, compute the difference in the users' ratings: |rating(User 1, Item j) − rating(User 2, Item j)|
• Take the average of these differences over all common items (the lower the average, the more similar the users)

Similarity between users: more realistic
We can use either all items or only those rated by both users. We have a user-item matrix R of ratings, where $r_{a,i}$ is the rating of user a for item i and $\bar{r}_a$ is the average rating of user a. There are two major alternatives for measuring the similarity between users:
Pearson correlation:
$$\mathrm{sim}(a,b) = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{b,i} - \bar{r}_b)^2}}$$
Cosine:
$$\mathrm{sim}(a,b) = \frac{\sum_i r_{a,i}\, r_{b,i}}{\sqrt{\sum_i r_{a,i}^2}\,\sqrt{\sum_i r_{b,i}^2}}$$

Rating prediction and recommendation
To predict the rating $r_{a,i}$ of the (target) user a for item i, a weighted sum of the other users' ratings for i can be used, normalized by the sum of the weights:
$$r_{a,i} = \frac{\sum_u \mathrm{sim}(a,u) \times r_{u,i}}{\sum_u \mathrm{sim}(a,u)}$$
[Figure: the ratings given to the item by similar users are combined into a weighted sum]
Rather than considering all users, only the k users most similar to a can be used. Based on the rating predictions, the top-N items can be recommended to user a.

Problems with user-based CF
• User cold-start problem: not enough is known about a new user to decide who is similar
• Sparsity of the rating matrix: with large item sets, users will have rated only some of the items, which makes it hard to find similar users (with 2M books, rating 2K of them is only 0.1%)
• Scalability: with millions of users and items, computations become slow
• Item cold-start problem: ratings for a new item cannot be predicted until some users have rated it; this is also a problem with "esoteric" items
• Popularity bias: cannot recommend items to a user with unique tastes

Item-based CF
Pearson correlation (or cosine) is now used to measure the similarity of items. The similarity is still based on ratings, not on the items' content!
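Because item-based CF reuses the same similarity measures, the only change from user-based CF is which vectors are compared: rows of the rating matrix (users) or columns (items). The minimal Python sketch below applies one Pearson function to both, on the example matrix with "?" represented as `None`. Assumptions, not taken from the slides: means are computed over all of a vector's known ratings (following the definition of $\bar{r}_a$), sums run over co-rated positions only, and indices are 0-based (User 1 is row 0, Item 3 is column 2). The "simple way" distance is included for comparison.

```python
import math

# Rating matrix of the user-based CF example; None stands for "?".
R = [
    [8, 1, None, 2, 7],  # User 1
    [2, None, 5, 7, 5],  # User 2
    [5, 4, 7, 4, 7],     # User 3
    [7, 1, 7, 3, 8],     # User 4
    [1, 7, 4, 6, 5],     # User 5
    [8, 3, 8, 3, 7],     # User 6
]


def mean(v):
    """Average of the known ratings in a vector (a user row or an item column)."""
    known = [r for r in v if r is not None]
    return sum(known) / len(known)


def mean_abs_diff(x, y):
    """'Simple way': average |x - y| over co-rated positions (lower = more similar)."""
    diffs = [abs(a - b) for a, b in zip(x, y) if a is not None and b is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def pearson(x, y):
    """Pearson correlation: means over all known ratings, sums over co-rated positions."""
    mx, my = mean(x), mean(y)
    pairs = [(a - mx, b - my) for a, b in zip(x, y) if a is not None and b is not None]
    num = sum(dx * dy for dx, dy in pairs)
    den = math.sqrt(sum(dx * dx for dx, _ in pairs)) * math.sqrt(sum(dy * dy for _, dy in pairs))
    return num / den if den else 0.0


def columns(m):
    """Transpose the matrix so that each item becomes a vector of user ratings."""
    return [list(col) for col in zip(*m)]


# User-based: compare User 1's row with the other rows.
user_sim = {u: pearson(R[0], R[u]) for u in range(1, len(R))}

# Item-based: the very same function, applied to columns (Item 3 vs. the others).
item_cols = columns(R)
item_sim = {j: pearson(item_cols[2], item_cols[j]) for j in range(len(item_cols)) if j != 2}

print("simple distance User 1 / User 2:", mean_abs_diff(R[0], R[1]))
print("sim(User 1, User u):", user_sim)
print("sim(Item 3, Item j):", item_sim)
```

On this data, User 6 comes out as the closest neighbour of User 1, and Item 1 as the item most similar to Item 3; the simple distance between User 1 and User 2 is (6 + 5 + 2) / 3 over their three common items.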
Pearson correlation:
$$\mathrm{sim}(i,j) = \frac{\sum_u (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_u (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{u,j} - \bar{r}_j)^2}}$$
Cosine:
$$\mathrm{sim}(i,j) = \frac{\sum_u r_{u,i}\, r_{u,j}}{\sqrt{\sum_u r_{u,i}^2}\,\sqrt{\sum_u r_{u,j}^2}}$$
where $\bar{r}_i$ is the average rating of item i.

Generating predictions
As with user-based CF, we can use all items or only the k most similar ones:
$$r_{a,i} = \frac{\sum_j \mathrm{sim}(i,j) \times r_{a,j}}{\sum_j \mathrm{sim}(i,j)}$$
[Figure: User 1's ratings of Item 1 (8), Item 2 (1), Item 4 (2) and Item 5 (7) are combined into a weighted sum to predict Item 3]

Problems with item-based CF
• Item cold-start problem: this is a major problem here

Important issues
Cold start, implicit/explicit rating, sparsity, portfolio effect (the non-diversity problem), security, privacy, …
A lot of work exists on RSs, and many other alternatives have been developed:
• Hybrid RSs
• Model-based CF: develop a model of user ratings (probabilistic, based on clustering, etc.)
• Context-based RSs: vary the predictions depending on the user context
• …
See also the survey [AT05] on the web site.
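The two prediction rules of these slides (a weighted sum over similar users, and a weighted sum over similar items, each divided by the sum of the similarities) can be sketched in Python on the example matrix, predicting User 1's missing rating for Item 3. This is a minimal sketch, not the course's code: it uses the same Pearson convention as before (means over each vector's known ratings, sums over co-rated entries), and its neighbour policy, keeping only the k = 2 most similar neighbours with positive similarity, is an assumption. On this data the policy matters: if all four items rated by User 1 are used, two of them are negatively correlated with Item 3, the Σ sim denominator of the item-based rule shrinks to about 0.23, and the "prediction" lands well above the largest rating in the matrix.

```python
import math

# Rating matrix of the user-based CF example; None stands for "?".
R = [
    [8, 1, None, 2, 7],  # User 1
    [2, None, 5, 7, 5],  # User 2
    [5, 4, 7, 4, 7],     # User 3
    [7, 1, 7, 3, 8],     # User 4
    [1, 7, 4, 6, 5],     # User 5
    [8, 3, 8, 3, 7],     # User 6
]


def pearson(x, y):
    """Pearson correlation: means over all known ratings, sums over co-rated positions."""
    mx = sum(r for r in x if r is not None) / sum(r is not None for r in x)
    my = sum(r for r in y if r is not None) / sum(r is not None for r in y)
    pairs = [(a - mx, b - my) for a, b in zip(x, y) if a is not None and b is not None]
    num = sum(dx * dy for dx, dy in pairs)
    den = math.sqrt(sum(dx * dx for dx, _ in pairs)) * math.sqrt(sum(dy * dy for _, dy in pairs))
    return num / den if den else 0.0


def top_k(sims, k):
    """Assumed neighbour policy: the k most similar neighbours with similarity > 0."""
    positive = [(n, s) for n, s in sims.items() if s > 0]
    return sorted(positive, key=lambda ns: ns[1], reverse=True)[:k]


def weighted_average(neighbours, rating_of):
    """sum(sim * rating) / sum(sim); None if there are no usable neighbours."""
    if not neighbours:
        return None
    return sum(s * rating_of(n) for n, s in neighbours) / sum(s for _, s in neighbours)


def predict_user_based(R, a, i, k=2):
    """Weighted sum over the k users most similar to a who have rated item i."""
    sims = {u: pearson(R[a], R[u]) for u in range(len(R)) if u != a and R[u][i] is not None}
    return weighted_average(top_k(sims, k), lambda u: R[u][i])


def predict_item_based(R, a, i, k=2):
    """Weighted sum over the k items most similar to i that user a has rated."""
    cols = [list(c) for c in zip(*R)]
    sims = {j: pearson(cols[i], cols[j]) for j in range(len(cols)) if j != i and R[a][j] is not None}
    return weighted_average(top_k(sims, k), lambda j: R[a][j])


# Predict User 1's missing rating for Item 3 (row 0, column 2; indices are 0-based).
print(predict_user_based(R, a=0, i=2))
print(predict_item_based(R, a=0, i=2))
```

Both variants predict a value between 7 and 8 here: User 1's two nearest neighbours, User 6 and User 4, rated Item 3 with 8 and 7, and the two items closest to Item 3, Item 1 and Item 5, were rated 8 and 7 by User 1.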
