Recommender Systems Information Systems M

Prof. Paolo Ciaccia http://www-db.deis.unibo.it/courses/SI-M/

Recommender Systems

 A recommender system (RS) helps people evaluate the potentially huge number of alternatives offered by a Web site

 In their simplest form, RS’s recommend personalized, ranked lists of items to their users

 Provide consumers with information to help them decide which items to purchase

 Given a set of users and items (documents, products, …) recommend items to a user based on

 past behavior of this and other users

 additional information on users/items

 A multitude of applications, and a big market!

 E-commerce

 Medical applications (e.g., matching patients to doctors)

 Customer Relationship Management (e.g., matching customer problems to experts)

Recommender Systems Sistemi Informativi M 2

What book should I buy?


What movie should I watch?

• The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, games…
• Owned by Amazon.com since 1998
• 796,328 titles and 2,127,371 people
• More than 50M users per month


The Netflix prize (1)

 Netflix is a US online movie rental service

 Over 100K titles and 55 million DVDs total

 A proprietary recommendation system called “Cinematch”

 Approximately 60% of Netflix members select their movies based on movie recommendations

In October 2006, Netflix announced that it would pay $1 million to whoever created a movie-recommendation algorithm 10% better than Cinematch


The Netflix prize (2)

 Within two weeks, Netflix received 169 submissions, including three that were slightly superior to Cinematch

 After a month, more than a thousand programs had been entered, and the top scorers were almost halfway to the goal

 Three years later, on 21 September 2009, Netflix announced the winner

Recommender Systems Sistemi Informativi M 6 12.01.2011

What news should I read?


Where should I spend my vacation?


Remarkable examples

Amazon.com                  Books, movies, music

CDNOW.com                   Music

Ebay.com (feedback forms)   Anything

Reel.com                    Movies

Barnes & Noble              Books


Technologies

[Table: technologies used by existing recommender systems. Systems compared: Jinni, Nanocrowd, IMDb, YooChoose, ThinkAnalytics, TasteKid, Clerkdogs, Criticker, Flixster, MovieLens, Netflix, Shazam, Pandora, LastFM, iTunes, Amazon. Techniques (rows): Collaborative Filtering; Content-Based Techniques; Knowledge-Based Techniques; Ontologies and Semantic Web Technologies for Recommender Systems; Hybrid Techniques; Context-Dependent Recommender Systems.]


Inputs to an RS

 Behavior of user in past “transactions”

 which items were viewed/purchased

 content/attributes of items

 pages bookmarked

 explicit ratings on items

 Context (used in context-based recommendations)

 what the user appears to be doing

 Role/domain

 additional info about users, items, …


Content-Based Recommendation

 In content-based recommendation, the system tries to recommend items that match the user profile

 The profile is based on items that the user liked in the past or on explicit interests that s/he defines

[Figure: the user profile is matched against new books]


Implementing content-based RS’s

 The basic idea is borrowed from the Vector Space Model

 Each item is characterized by a set of (weighted) features

 Movie: actors, director, title, …

 Weight: use tf.idf

 Also works for “unstructured” data (web pages, docs, etc.)

 The user profile is built using user history

 E.g., a vector representing the relevance of features/keywords for that user

 Either implicit or explicit “rating of features” (or both)

 Cosine similarity can be used to match the user profile with an item vector
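The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-item catalogue, its feature names, and the helper names (`tfidf_vectors`, `build_profile`, `recommend`) are hypothetical, the idf is the plain log(N/df) variant, and the profile is simply the average of the vectors of the items the user liked.

```python
import math
from collections import Counter

# Hypothetical catalogue: each item is described by a bag of features.
items = {
    "A": ["scifi", "nolan", "space"],
    "B": ["scifi", "space", "aliens"],
    "C": ["romance", "paris"],
    "D": ["scifi", "nolan", "dream"],
}

def tfidf_vectors(items):
    """Map each item to a sparse {feature: tf * idf} vector."""
    n = len(items)
    df = Counter(f for feats in items.values() for f in set(feats))
    return {
        name: {f: tf * math.log(n / df[f]) for f, tf in Counter(feats).items()}
        for name, feats in items.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_profile(liked, vectors):
    """Assumed profile: the average of the vectors of the liked items."""
    profile = Counter()
    for name in liked:
        for f, w in vectors[name].items():
            profile[f] += w / len(liked)
    return dict(profile)

def recommend(liked, items):
    """Rank the items the user has not seen by similarity to the profile."""
    vectors = tfidf_vectors(items)
    profile = build_profile(liked, vectors)
    candidates = [name for name in items if name not in liked]
    return sorted(candidates,
                  key=lambda name: cosine(profile, vectors[name]),
                  reverse=True)

ranking = recommend(["A", "D"], items)
print(ranking)
```

A user who liked items A and D gets B ranked before C, because B shares features with the profile while C shares none.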


Pros and cons of content-based RS’s

 Able to recommend new and unpopular items

 No need for data on other users

 Can provide explanations of recommended items

 Limited content analysis

 Not always easy to find the appropriate features to use

 Overspecialization

 Can only recommend items similar to previously seen/rated ones

 Further, items too similar to ones the user already knows might not be of interest (e.g., news articles)

 New users

 How to build a profile?


Collaborative filtering (CF)

 Unlike content-based recommendation methods, CF recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users

 Two basic variants of CF:

 User-based: To predict a user’s opinion for an item, use the opinion of similar users, where similarity between users depends on their opinions for other items

 Item-based: as in content-based RS’s, the assumption is that a user is likely to have the same opinion for similar items; however, now similarity between items depends on how other users have rated them


User-based CF

          Item 1   Item 2   Item 3   Item 4   Item 5
User 1      8        1        ?        2        7
User 2      2        ?        5        7        5
User 3      5        4        7        4        7
User 4      7        1        7        3        8
User 5      1        7        4        6        5
User 6      8        3        8        3        7


Similarity between users: simple way

          Item 1   Item 2   Item 3   Item 4   Item 5
User 1      8        1        ?        2        7
User 2      2        ?        5        7        5

 Only consider items both users have rated

 For each item, compute the difference in the users’ ratings

 If Item j has been rated by both User 1 and User 2: | rating (User 1, Item j) – rating (User 2, Item j) |

 Take the average of these differences over all common items (the smaller the average, the more similar the two users)
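As a worked example of this simple measure, the sketch below applies it to the two rows of the table. The function name `mean_abs_difference` and the use of `None` for a missing rating are illustrative choices, not part of the slides.

```python
def mean_abs_difference(ratings_a, ratings_b):
    """Average |r_a - r_b| over the items both users have rated.

    Missing ratings are encoded as None. Returns None when the two
    users have no item in common.
    """
    common = [(x, y) for x, y in zip(ratings_a, ratings_b)
              if x is not None and y is not None]
    if not common:
        return None
    return sum(abs(x - y) for x, y in common) / len(common)

user1 = [8, 1, None, 2, 7]
user2 = [2, None, 5, 7, 5]

# Common items are 1, 4 and 5: (|8-2| + |2-7| + |7-5|) / 3 = 13/3
d12 = mean_abs_difference(user1, user2)
print(round(d12, 2))  # 4.33
```

Note that this value is a distance rather than a similarity: 0 means the two users agree on every common item.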


Similarity between users: more realistic

 Can use either all items or only those rated by both users

 We have a user-item matrix R of ratings, where $r_{a,i}$ is the rating of user a for item i, and $\bar{r}_a$ is the average rating of user a

 Two major alternatives for measuring the similarity between users:

Pearson correlation

$$\mathrm{sim}(a,b) = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{b,i} - \bar{r}_b)^2}}$$

Cosine

$$\mathrm{sim}(a,b) = \frac{\sum_i r_{a,i}\, r_{b,i}}{\sqrt{\sum_i r_{a,i}^2}\,\sqrt{\sum_i r_{b,i}^2}}$$
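A minimal sketch of the two measures on the example rating matrix follows. It makes two assumptions that the slides leave open: the sums run only over the items rated by both users, and each user's average is taken over all the items that user has rated. The function names are illustrative.

```python
import math

# Example rating matrix from the slides; None marks a missing rating.
R = {
    "User 1": [8, 1, None, 2, 7],
    "User 2": [2, None, 5, 7, 5],
    "User 3": [5, 4, 7, 4, 7],
    "User 4": [7, 1, 7, 3, 8],
    "User 5": [1, 7, 4, 6, 5],
    "User 6": [8, 3, 8, 3, 7],
}

def co_rated(a, b):
    """Indices of the items rated by both users."""
    return [i for i, (x, y) in enumerate(zip(R[a], R[b]))
            if x is not None and y is not None]

def mean_rating(u):
    """Average of all the ratings given by user u."""
    rated = [x for x in R[u] if x is not None]
    return sum(rated) / len(rated)

def pearson(a, b):
    items = co_rated(a, b)
    ma, mb = mean_rating(a), mean_rating(b)
    num = sum((R[a][i] - ma) * (R[b][i] - mb) for i in items)
    den = (math.sqrt(sum((R[a][i] - ma) ** 2 for i in items))
           * math.sqrt(sum((R[b][i] - mb) ** 2 for i in items)))
    return num / den if den else 0.0

def cosine(a, b):
    items = co_rated(a, b)
    num = sum(R[a][i] * R[b][i] for i in items)
    den = (math.sqrt(sum(R[a][i] ** 2 for i in items))
           * math.sqrt(sum(R[b][i] ** 2 for i in items)))
    return num / den if den else 0.0

for other in ("User 4", "User 5"):
    print(other, round(pearson("User 1", other), 3),
          round(cosine("User 1", other), 3))
```

On this data User 4 correlates strongly and positively with User 1, while User 5 correlates negatively. Cosine stays positive in both cases because raw ratings are never negative, which is one reason mean-centred measures such as Pearson are often preferred.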

Rating prediction and recommendation

 To predict the rating $r_{a,i}$ for the (target) user a and item i, a weighted sum can be used:

$$r_{a,i} = \sum_u \mathrm{sim}(a,u) \times r_{u,i}$$

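[Figure: the ratings given to the item by the other users (5, 7, 7, 8, 4) are combined by a weighted sum into the predicted rating]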

 Rather than considering all the users, only the k most similar to user a can be used

 Based on rating predictions, the top-N items can be recommended to user a
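The prediction and top-N steps can be sketched as follows. Several choices here are assumptions rather than part of the slides: Pearson correlation over co-rated items is used as the similarity, only neighbours with positive similarity are kept, and the weighted sum is divided by the sum of the similarities so that the result stays on the rating scale. The function names are hypothetical.

```python
import math

# Example rating matrix from the slides; None marks a missing rating.
R = {
    "User 1": [8, 1, None, 2, 7],
    "User 2": [2, None, 5, 7, 5],
    "User 3": [5, 4, 7, 4, 7],
    "User 4": [7, 1, 7, 3, 8],
    "User 5": [1, 7, 4, 6, 5],
    "User 6": [8, 3, 8, 3, 7],
}

def mean_rating(u):
    rated = [x for x in R[u] if x is not None]
    return sum(rated) / len(rated)

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    items = [i for i, (x, y) in enumerate(zip(R[a], R[b]))
             if x is not None and y is not None]
    ma, mb = mean_rating(a), mean_rating(b)
    num = sum((R[a][i] - ma) * (R[b][i] - mb) for i in items)
    den = (math.sqrt(sum((R[a][i] - ma) ** 2 for i in items))
           * math.sqrt(sum((R[b][i] - mb) ** 2 for i in items)))
    return num / den if den else 0.0

def predict(a, item, k=2):
    """Normalized weighted sum over the k most similar users who rated item."""
    neighbours = [(pearson(a, u), R[u][item])
                  for u in R if u != a and R[u][item] is not None]
    # Assumption: ignore users with zero or negative similarity.
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in neighbours) / total

def top_n(a, n=3, k=2):
    """The n unrated items with the highest predicted rating, best first."""
    preds = [(predict(a, i, k), i) for i, r in enumerate(R[a]) if r is None]
    preds = [(p, i) for p, i in preds if p is not None]
    return [(i, p) for p, i in sorted(preds, reverse=True)[:n]]

p = predict("User 1", 2)  # Item 3 has index 2
print(round(p, 2))
print(top_n("User 1"))
```

For User 1 and Item 3 the two nearest neighbours are User 6 and User 4, who rated the item 8 and 7, so the prediction falls between those two values.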


Problems with user-based CF

 User Cold-Start problem

 Not enough is known about a new user to decide who is similar

 Sparsity of the rating matrix

 With large item sets, users will have rated only some of the items (makes it hard to find similar users)

 With 2M books, rating 2K of them is only 0.1%

 Scalability

 With millions of users and items, computations become slow

 Item Cold-Start problem

 Cannot predict ratings for a new item until some users have rated it

 Also a problem with “esoteric” items

 Popularity bias

 Cannot recommend items to a user with unique tastes


Item-based CF

 Pearson correlation (or cosine) is now used to measure the similarity of items

 Still based on ratings, not on items’ content!

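Pearson correlation

$$\mathrm{sim}(i,j) = \frac{\sum_u (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_u (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{u,j} - \bar{r}_j)^2}}$$

Cosine

$$\mathrm{sim}(i,j) = \frac{\sum_u r_{u,i}\, r_{u,j}}{\sqrt{\sum_u r_{u,i}^2}\,\sqrt{\sum_u r_{u,j}^2}}$$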
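The same two measures can be computed between columns of the rating matrix instead of rows. The sketch below assumes that the sums run over the users who rated both items and that each item's average is taken over all the ratings it received; the function names are illustrative.

```python
import math

# Example rating matrix from the slides: rows are users 1..6,
# columns are items 1..5, None marks a missing rating.
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def both_rated(i, j):
    """Indices of the users who rated both item i and item j."""
    return [u for u, row in enumerate(R)
            if row[i] is not None and row[j] is not None]

def item_mean(i):
    rated = [row[i] for row in R if row[i] is not None]
    return sum(rated) / len(rated)

def item_pearson(i, j):
    users = both_rated(i, j)
    mi, mj = item_mean(i), item_mean(j)
    num = sum((R[u][i] - mi) * (R[u][j] - mj) for u in users)
    den = (math.sqrt(sum((R[u][i] - mi) ** 2 for u in users))
           * math.sqrt(sum((R[u][j] - mj) ** 2 for u in users)))
    return num / den if den else 0.0

def item_cosine(i, j):
    users = both_rated(i, j)
    num = sum(R[u][i] * R[u][j] for u in users)
    den = (math.sqrt(sum(R[u][i] ** 2 for u in users))
           * math.sqrt(sum(R[u][j] ** 2 for u in users)))
    return num / den if den else 0.0

# Compare Item 3 (index 2) with every other item.
for j in (0, 1, 3, 4):
    print(f"Item 3 vs Item {j + 1}:",
          round(item_pearson(2, j), 3), round(item_cosine(2, j), 3))
```

Only ratings enter the computation: no item feature is used, which is what distinguishes this from the content-based approach.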


Generating predictions

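 As with user-based CF, can use all items or only the k most similar ones

$$r_{a,i} = \frac{\sum_j \mathrm{sim}(i,j) \times r_{a,j}}{\sum_j \mathrm{sim}(i,j)}$$

[Figure: the rating of Item 3 is predicted as a weighted sum of the user's ratings for Item 1 (8), Item 2 (1), Item 4 (2) and Item 5 (7)]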
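The item-based prediction can be sketched on the example matrix as below. Cosine similarity over the users who rated both items is an assumed choice (Pearson would work the same way), and the function names are hypothetical.

```python
import math

# Example rating matrix from the slides: rows are users 1..6,
# columns are items 1..5, None marks a missing rating.
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def item_cosine(i, j):
    """Cosine similarity of two items over the users who rated both."""
    users = [u for u, row in enumerate(R)
             if row[i] is not None and row[j] is not None]
    num = sum(R[u][i] * R[u][j] for u in users)
    den = (math.sqrt(sum(R[u][i] ** 2 for u in users))
           * math.sqrt(sum(R[u][j] ** 2 for u in users)))
    return num / den if den else 0.0

def predict_item_based(a, i, k=None):
    """Weighted average of user a's own ratings, weighted by sim(i, j).

    If k is given, only the k items most similar to i are used.
    """
    neighbours = [(item_cosine(i, j), R[a][j])
                  for j in range(len(R[a]))
                  if j != i and R[a][j] is not None]
    neighbours.sort(reverse=True)
    if k is not None:
        neighbours = neighbours[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in neighbours) / total

# User 1 (row 0), Item 3 (column 2)
p_all = predict_item_based(0, 2)
p_top2 = predict_item_based(0, 2, k=2)
print(round(p_all, 2), round(p_top2, 2))
```

With all four rated items the low ratings for Items 2 and 4 pull the prediction down, whereas restricting to the two most similar items (Items 5 and 1) gives a prediction between 7 and 8. Because cosine on raw ratings is high for almost every pair of items, the choice of k matters a good deal here.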


Problems with item-based CF

 Item Cold-Start problem

 This is a major problem here


Important Issues

 Cold Start, Implicit/Explicit Rating, Sparsity, Portfolio Effect (lack of diversity), Security, Privacy, …

 A lot of work exists on RS’s, and many other alternatives have been developed

 Hybrid RS’s

 Model-based CF

 Develop a model of user ratings (probabilistic, based on clustering, etc.)

 Context-based RS’s

 Vary the predictions depending on user context

 …

 See also the survey [AT05] on the web site
