Recommender Systems Information Systems M
Prof. Paolo Ciaccia http://www-db.deis.unibo.it/courses/SI-M/
Recommender Systems
A recommender system (RS) helps people evaluate the potentially huge number of alternatives offered by a Web site
In their simplest form, RS’s recommend personalized, ranked lists of items to their users
Provide consumers with information to help them decide which items to purchase
Given a set of users and items (documents, products, …) recommend items to a user based on
past behavior of this and other users
additional information on users/items
A multitude of applications, and a big market!
E-commerce
Medical applications (e.g., matching patients to doctors)
Customer Relationship Management (e.g., matching customer problems to experts)
Recommender Systems Sistemi Informativi M 2
What book should I buy?
What movie should I watch?
• The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, video games…
• Owned by Amazon.com since 1998
• 796,328 titles and 2,127,371 people
• More than 50M users per month
The Netflix prize (1)
Netflix is a US online movie rental service
Over 100K titles and 55 million DVDs total
A proprietary recommendation system called “Cinematch”
Approximately 60% of Netflix members select their movies based on movie recommendations
In October 2006, Netflix announced it would pay $1 million to whoever created a movie-recommending algorithm 10% better than Cinematch
The Netflix prize (2)
Within two weeks, Netflix received 169 submissions, including three that were slightly superior to Cinematch
After a month, more than a thousand programs had been entered, and the top scorers were almost halfway to the goal
Three years later, on 21 September 2009, Netflix announced the winner
Recommender Systems Sistemi Informativi M 6 12.01.2011
What news should I read?
Where should I spend my vacation?
Remarkable examples
Amazon.com Books, movies, music
CDNOW.com Music
Ebay.com (feedback forms) Anything
Reel.com Movies
Barnes & Noble Books
Technologies
[Table: technologies for recommender systems, by system. Systems compared: Jinni, Nanocrowd, IMDb, YooChoose, ThinkAnalytics, TasteKid, Clerkdogs, Criticker, Flixster, Movielens, Netflix, Shazam, Pandora, LastFM, iTunes, Amazon. Methods: Collaborative Filtering; Content-Based Techniques; Knowledge-Based Techniques; Ontologies and Semantic Web Technologies for Recommender Systems; Hybrid Techniques; Context Dependent Recommender Systems.]
Inputs to a RS
Behavior of user in past “transactions”
which items were viewed/purchased
content/attributes of items
pages bookmarked
explicit ratings on items
Context (used in context-based recommendations)
what the user appears to be doing now
Role/domain
additional info about users, items, …
Content-Based Recommendation
In content-based recommendations the system tries to recommend items that match the user profile
The profile is based on items that the user liked in the past or on explicit interests that s/he defines
[Figure: the user profile is matched against new books]
Implementing content-based RS’s
The basic idea is borrowed from the Vector Space Model
Each item is characterized by a set of (weighted) features
Movie: actors, director, title, …
Weight: use tf.idf
Also works for “unstructured” data (web pages, docs, etc.)
The user profile is built using user history
E.g., a vector representing the relevance of features/keywords for that user
Either implicit or explicit “rating of features” (or both)
Cosine similarity can be used to match the user profile with an item vector
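The steps above can be sketched in a few lines of Python (standard library only). This is a minimal illustration under stated assumptions, not a reference implementation: the four movies and their keywords are made up, the helper names `tf_idf` and `cosine` are hypothetical, and the user profile is assumed to be simply the sum of the tf.idf vectors of the items the user liked.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical items, each described by a few keywords (features).
items = {
    "movie_a": ["space", "robot", "adventure"],
    "movie_b": ["space", "alien", "adventure"],
    "movie_c": ["romance", "paris", "comedy"],
    "movie_d": ["robot", "alien", "war"],
}
names = list(items)
vectors = dict(zip(names, tf_idf([items[n] for n in names])))

# Assumed profile construction: sum the vectors of the items the user liked.
liked = ["movie_a"]
profile = Counter()
for name in liked:
    for term, weight in vectors[name].items():
        profile[term] += weight

# Score every item the user has not seen yet and rank by similarity.
scores = {n: cosine(profile, vectors[n]) for n in names if n not in liked}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `movie_b` shares two of three keywords with the liked item and is ranked first, while `movie_c` shares none and scores zero, which also illustrates the overspecialization issue: nothing outside the profile's vocabulary can be recommended.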
Pros and cons of content-based RS’s
Able to recommend new and unpopular items
No need for data on other users
Can provide explanations of recommended items
Limited content analysis
Not always easy to find the appropriate features to use
Overspecialization
Can only recommend items similar to previously seen/rated ones
Further, items too similar to ones the user already knows might not be of interest (e.g., news articles)
New users
How to build a profile?
Collaborative filtering (CF)
Unlike content-based recommendation methods, CF recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users
Two basic variants of CF:
User-based: to predict a user’s opinion for an item, use the opinion of similar users, where similarity between users depends on their opinions for other items
Item-based: as in content-based RS’s, the assumption is that a user is likely to have the same opinion for similar items; however, now similarity between items depends on how other users have rated them
User-based CF
         Item 1  Item 2  Item 3  Item 4  Item 5
User 1     8       1       ?       2       7
User 2     2       ?       5       7       5
User 3     5       4       7       4       7
User 4     7       1       7       3       8
User 5     1       7       4       6       5
User 6     8       3       8       3       7
Similarity between users: simple way
         Item 1  Item 2  Item 3  Item 4  Item 5
User 1     8       1       ?       2       7
User 2     2       ?       5       7       5
Only consider items both users have rated
For each item, compute the difference in the users’ ratings
If Item j has been rated by both User 1 and User 2: | rating (User 1, Item j) – rating (User 2, Item j) |
Take the average of these differences over all common items
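As a worked example of the procedure above, the following Python sketch computes the average absolute difference for User 1 and User 2 from the table. The function name `mean_abs_difference` is hypothetical. Note that the result is a distance rather than a similarity: the smaller the value, the more alike the two users.

```python
# Ratings from the table; unrated items ("?") are simply left out.
ratings = {
    "User 1": {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7},
    "User 2": {"Item 1": 2, "Item 3": 5, "Item 4": 7, "Item 5": 5},
}

def mean_abs_difference(a, b):
    """Average |rating_a - rating_b| over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return None  # no co-rated items: the measure is undefined
    return sum(abs(a[i] - b[i]) for i in common) / len(common)

d = mean_abs_difference(ratings["User 1"], ratings["User 2"])
# Common items are 1, 4 and 5: (|8-2| + |2-7| + |7-5|) / 3 = 13 / 3
```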
Similarity between users: more realistic
Can use either all items or only those rated by both users
We have a user-item matrix R of ratings, where $r_{a,i}$ is the rating of user a for item i, and $\bar{r}_a$ is the average rating of user a
Two major alternatives for measuring the similarity between users:
Pearson correlation
\[ \mathrm{sim}(a,b) = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{b,i} - \bar{r}_b)^2}} \]
Cosine
\[ \mathrm{sim}(a,b) = \frac{\sum_i r_{a,i}\, r_{b,i}}{\sqrt{\sum_i r_{a,i}^2}\,\sqrt{\sum_i r_{b,i}^2}} \]
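The two measures above can be sketched in Python as follows. Two choices here are assumptions, since the slides leave them open: the sums run only over items rated by both users, and each user's average is taken over all the items that user has rated. The ratings are those of Users 1, 4 and 5 in the user-based CF table.

```python
import math

def mean(r):
    """Average rating of a user, over all items the user has rated."""
    return sum(r.values()) / len(r)

def pearson(a, b):
    """Pearson correlation between two users, summed over co-rated items."""
    common = a.keys() & b.keys()
    ma, mb = mean(a), mean(b)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (math.sqrt(sum((a[i] - ma) ** 2 for i in common))
           * math.sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0

def cosine(a, b):
    """Cosine similarity between two users, summed over co-rated items."""
    common = a.keys() & b.keys()
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(a[i] ** 2 for i in common))
           * math.sqrt(sum(b[i] ** 2 for i in common)))
    return num / den if den else 0.0

user1 = {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7}
user4 = {"Item 1": 7, "Item 2": 1, "Item 3": 7, "Item 4": 3, "Item 5": 8}
user5 = {"Item 1": 1, "Item 2": 7, "Item 3": 4, "Item 4": 6, "Item 5": 5}

sim_14 = pearson(user1, user4)  # positive: the two users rate alike
sim_15 = pearson(user1, user5)  # negative: the two users rate oppositely
cos_14 = cosine(user1, user4)
```

Pearson subtracts each user's average, so it can be negative and is insensitive to users who rate systematically high or low; cosine on raw positive ratings is always non-negative.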
Rating prediction and recommendation
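To predict the rating $r_{a,i}$ for the (target) user a and item i, a weighted sum can be used:
\[ r_{a,i} = \sum_u \mathrm{sim}(a,u) \times r_{u,i} \]
[Figure: the ratings given to the item by the other users (5, 7, 7, 8, 4) are combined by a weighted sum into the predicted rating]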
Rather than considering all the users, only the k most similar to user a can be used
Based on rating predictions, the top-N items can be recommended to user a
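A minimal Python sketch of prediction and top-N recommendation on the ratings table of the user-based CF slide. Several details are assumptions rather than part of the slides: Pearson correlation over co-rated items is used as sim, only neighbours with positive similarity are kept, and the weighted sum is divided by the sum of the weights so that the prediction stays on the rating scale. The names `predict` and `top_n` are hypothetical.

```python
import math

ratings = {
    "User 1": {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7},
    "User 2": {"Item 1": 2, "Item 3": 5, "Item 4": 7, "Item 5": 5},
    "User 3": {"Item 1": 5, "Item 2": 4, "Item 3": 7, "Item 4": 4, "Item 5": 7},
    "User 4": {"Item 1": 7, "Item 2": 1, "Item 3": 7, "Item 4": 3, "Item 5": 8},
    "User 5": {"Item 1": 1, "Item 2": 7, "Item 3": 4, "Item 4": 6, "Item 5": 5},
    "User 6": {"Item 1": 8, "Item 2": 3, "Item 3": 8, "Item 4": 3, "Item 5": 7},
}

def pearson(a, b):
    """Pearson correlation over co-rated items; means over all rated items."""
    common = a.keys() & b.keys()
    ma, mb = sum(a.values()) / len(a), sum(b.values()) / len(b)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (math.sqrt(sum((a[i] - ma) ** 2 for i in common))
           * math.sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    target = ratings[user]
    candidates = [(pearson(target, r), r[item])
                  for u, r in ratings.items() if u != user and item in r]
    # Assumption: keep only positively correlated neighbours, best k first.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None
    total = sum(s for s, _ in neighbours)
    # Assumption: normalise by the sum of the weights.
    return sum(s * r for s, r in neighbours) / total

def top_n(user, n=1, k=2):
    """Rank the items the user has not rated by predicted rating."""
    all_items = {i for r in ratings.values() for i in r}
    unseen = all_items - ratings[user].keys()
    preds = {i: predict(user, i, k) for i in unseen}
    preds = {i: p for i, p in preds.items() if p is not None}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:n]

p = predict("User 1", "Item 3")
best = top_n("User 1")
```

With k = 2 the neighbours of User 1 are Users 6 and 4, who rated Item 3 with 8 and 7, so the prediction falls between those two values.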
Problems with user-based CF
User Cold-Start problem
Not enough is known about a new user to decide who is similar
Sparsity of the rating matrix
With large item sets, users will have rated only some of the items (makes it hard to find similar users)
With 2M books, rating 2K of them is only 0.1%
Scalability
With millions of users and items, computations become slow
Item Cold-Start problem
Cannot predict ratings for a new item until some users have rated it
Also a problem with “esoteric” items
Popularity bias
Cannot recommend items to a user with unique tastes
Item-based CF
Pearson correlation (or cosine) is now used to measure the similarity of items
Still based on ratings, not on items’ content!
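Pearson correlation
\[ \mathrm{sim}(i,j) = \frac{\sum_u (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_u (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{u,j} - \bar{r}_j)^2}} \]
Cosine
\[ \mathrm{sim}(i,j) = \frac{\sum_u r_{u,i}\, r_{u,j}}{\sqrt{\sum_u r_{u,i}^2}\,\sqrt{\sum_u r_{u,j}^2}} \]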
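In code, item-based similarity amounts to applying the same formula to the columns of the rating matrix instead of its rows. The Python sketch below illustrates this on the ratings table of the user-based CF slide; the helper names `by_item` and `pearson_items` are hypothetical, and it is assumed that sums run over the users who rated both items while each item's average is taken over all of its ratings.

```python
import math

ratings = {
    "User 1": {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7},
    "User 2": {"Item 1": 2, "Item 3": 5, "Item 4": 7, "Item 5": 5},
    "User 3": {"Item 1": 5, "Item 2": 4, "Item 3": 7, "Item 4": 4, "Item 5": 7},
    "User 4": {"Item 1": 7, "Item 2": 1, "Item 3": 7, "Item 4": 3, "Item 5": 8},
    "User 5": {"Item 1": 1, "Item 2": 7, "Item 3": 4, "Item 4": 6, "Item 5": 5},
    "User 6": {"Item 1": 8, "Item 2": 3, "Item 3": 8, "Item 4": 3, "Item 5": 7},
}

def by_item(ratings):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    columns = {}
    for user, row in ratings.items():
        for item, r in row.items():
            columns.setdefault(item, {})[user] = r
    return columns

def pearson_items(ci, cj):
    """Pearson correlation between two item columns over co-rating users."""
    common = ci.keys() & cj.keys()
    mi, mj = sum(ci.values()) / len(ci), sum(cj.values()) / len(cj)
    num = sum((ci[u] - mi) * (cj[u] - mj) for u in common)
    den = (math.sqrt(sum((ci[u] - mi) ** 2 for u in common))
           * math.sqrt(sum((cj[u] - mj) ** 2 for u in common)))
    return num / den if den else 0.0

columns = by_item(ratings)
sim_15 = pearson_items(columns["Item 1"], columns["Item 5"])  # rated alike
sim_12 = pearson_items(columns["Item 1"], columns["Item 2"])  # rated oppositely
```

No item attributes appear anywhere in the computation: two items are similar only because the same users tend to rate them the same way.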
Generating predictions
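As with user-based CF, can use all items or only the k most similar ones
\[ r_{a,i} = \frac{\sum_j \mathrm{sim}(i,j) \times r_{a,j}}{\sum_j \mathrm{sim}(i,j)} \]
[Figure: User 1's ratings for Item 1 (8), Item 2 (1), Item 4 (2) and Item 5 (7) are combined by a weighted sum to predict the rating for Item 3]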
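The normalized weighted sum above can be sketched in Python for the case shown in the figure (User 1, Item 3), using the ratings table of the user-based CF slide. Cosine similarity over co-rating users is chosen here as sim, an assumption made because it is non-negative on positive ratings and so keeps the denominator well behaved. The names `by_item`, `cosine` and `predict` are hypothetical.

```python
import math

ratings = {
    "User 1": {"Item 1": 8, "Item 2": 1, "Item 4": 2, "Item 5": 7},
    "User 2": {"Item 1": 2, "Item 3": 5, "Item 4": 7, "Item 5": 5},
    "User 3": {"Item 1": 5, "Item 2": 4, "Item 3": 7, "Item 4": 4, "Item 5": 7},
    "User 4": {"Item 1": 7, "Item 2": 1, "Item 3": 7, "Item 4": 3, "Item 5": 8},
    "User 5": {"Item 1": 1, "Item 2": 7, "Item 3": 4, "Item 4": 6, "Item 5": 5},
    "User 6": {"Item 1": 8, "Item 2": 3, "Item 3": 8, "Item 4": 3, "Item 5": 7},
}

def by_item(ratings):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    columns = {}
    for user, row in ratings.items():
        for item, r in row.items():
            columns.setdefault(item, {})[user] = r
    return columns

def cosine(ci, cj):
    """Cosine similarity between two item columns over co-rating users."""
    common = ci.keys() & cj.keys()
    num = sum(ci[u] * cj[u] for u in common)
    den = (math.sqrt(sum(ci[u] ** 2 for u in common))
           * math.sqrt(sum(cj[u] ** 2 for u in common)))
    return num / den if den else 0.0

def predict(user, item, k=None):
    """Weighted average of the user's own ratings, weighted by item similarity."""
    columns = by_item(ratings)
    target = columns.get(item, {})  # empty for an item nobody has rated yet
    sims = sorted(((cosine(target, columns[j]), r)
                   for j, r in ratings[user].items() if j != item),
                  reverse=True)
    if k is not None:
        sims = sims[:k]  # keep only the k most similar items
    total = sum(s for s, _ in sims)
    if total == 0:
        return None  # item cold start: no similarity can be computed
    return sum(s * r for s, r in sims) / total

p_all = predict("User 1", "Item 3")       # uses Items 1, 2, 4 and 5
p_top2 = predict("User 1", "Item 3", k=2)  # uses the two most similar items
p_new = predict("User 1", "Item 99")       # hypothetical unrated item
```

With k = 2 only Items 5 and 1 contribute (ratings 7 and 8), so the prediction is much higher than when the dissimilar Items 2 and 4 are also averaged in.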
Problems with item-based CF
Item Cold-Start problem
This is a major problem here
Important Issues
Cold Start, Implicit/Explicit Rating, Sparsity, Portfolio Effect (non-diversity problem), Security, Privacy, …
A lot of work exists on RS’s, and many other alternatives have been developed
Hybrid RS’s
Model-based CF
Develop a model of user ratings (probabilistic, based on clustering, etc.)
Context-based RS’s
Vary the predictions depending on user context
…
See also the survey [AT05] on the web site