
The road to the Netflix challenge
Nisheeth

Lessons from the prize

• SVD
• SVD++
• Temporal SVD
• Temporal SVD++
• Highly recommended reading: Madrigal (2014), "How Netflix Reverse-Engineered Hollywood"

Netflix challenge

• Ratings on a 1-5 scale
  – 480,000 users
  – 17,770 movies

• Metric: $\text{RMSE} = \sqrt{\dfrac{\sum_{u,i}\left(\hat{r}_{ui} - r_{ui}\right)^2}{N}}$
• Reduce the baseline RMSE of 0.9514 by 10% to win a million dollars
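A minimal sketch of how this metric could be computed (NumPy assumed; `predictions` and `ratings` are hypothetical arrays of predicted and observed ratings for the same (user, item) pairs):

```python
import numpy as np

def rmse(predictions, ratings):
    """Root mean squared error over all (user, item) pairs with known ratings."""
    predictions = np.asarray(predictions, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    return np.sqrt(np.mean((predictions - ratings) ** 2))

# The prize threshold: improve on the baseline RMSE of 0.9514 by 10%.
target_rmse = 0.9 * 0.9514  # ~0.856
```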

Koren-Bell approach

• Basic insight
  – SVD captures the user-item interaction
  – Model user and item biases separately
  – Include information from temporal dynamics
  – Include the role of implicit feedback
• People rate some movies for a reason, no matter what rating they eventually assign

Modeling user and item biases

• Using simple linear models

• $b_{ui} = \mu + b_i + b_u$
• Learn the parameters using a least squares loss function
• $\min_{b_*} \sum_{u,i} \left( r_{ui} - \mu - b_i - b_u \right)^2 + \gamma \left( \sum_i b_i^2 + \sum_u b_u^2 \right)$
• What is the second term here doing? Is it necessary?
• Typically fit using alternating least squares or stochastic gradient descent
• SGD most popular
  – Iterate across all parameters
  – Calculate the prediction error
  – Shift each parameter estimate in the direction that reduces the error, in proportion to the error magnitude
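A minimal sketch (not the authors' code) of one SGD pass for this bias-only baseline, assuming `ratings` is an iterable of (user index, item index, rating) triples and `b_user`, `b_item` are NumPy arrays:

```python
import numpy as np

def sgd_baseline_epoch(ratings, mu, b_user, b_item, lr=0.005, gamma=0.02):
    """One epoch of SGD for b_ui = mu + b_i + b_u with L2 regularization gamma."""
    for u, i, r in ratings:
        err = r - (mu + b_item[i] + b_user[u])        # prediction error
        b_user[u] += lr * (err - gamma * b_user[u])   # move each bias to reduce the error
        b_item[i] += lr * (err - gamma * b_item[i])
    return b_user, b_item
```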

Adding SVD to the baseline

• Add the user-item interaction term $p_u^T q_i$ to the baseline model
• $\min_{b_*, p_*, q_*} \sum_{u,i} \left( r_{ui} - \mu - b_i - b_u - p_u^T q_i \right)^2 + \gamma \left( \sum_i b_i^2 + \sum_u b_u^2 + \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \right)$
• Parameters continue to be learned via SGD
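The corresponding SGD step once the interaction term is added, again as an illustrative sketch; `P` and `Q` are assumed to be (users x k) and (items x k) factor matrices:

```python
import numpy as np

def sgd_svd_epoch(ratings, mu, b_user, b_item, P, Q, lr=0.005, gamma=0.02):
    """One epoch of SGD for r_ui ~ mu + b_i + b_u + p_u^T q_i."""
    for u, i, r in ratings:
        err = r - (mu + b_item[i] + b_user[u] + P[u] @ Q[i])
        b_user[u] += lr * (err - gamma * b_user[u])
        b_item[i] += lr * (err - gamma * b_item[i])
        # update both factor vectors using their pre-update values
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - gamma * P[u]),
                      Q[i] + lr * (err * P[u] - gamma * Q[i]))
    return b_user, b_item, P, Q
```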

Incorporating implicit feedback

• User trait factors learned by SVD live in $p_u$
• Augment with implicit feedback information
• Implicit feedback = which items were rated at all
• Revised user model
• $p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j$

• The $y_j$ are also parameters learned from the data – interpret them as item-specific traits
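A sketch of this augmented (SVD++-style) user representation, assuming `rated_items[u]` lists the item indices user u has rated and `Y` is an (items x k) matrix of the implicit-feedback factors $y_j$:

```python
import numpy as np

def implicit_user_vector(u, P, Y, rated_items):
    """p_u + |R(u)|^(-1/2) * sum_{j in R(u)} y_j."""
    Ru = rated_items[u]
    if len(Ru) == 0:
        return P[u]
    return P[u] + Y[Ru].sum(axis=0) / np.sqrt(len(Ru))

# Predictions then use q_i^T with this augmented vector in place of q_i^T p_u.
```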

Incorporating time dynamics

• User preferences shift over time
• Account for this in the baseline predictor

• Design an item time drift model
  – Simple: just fit a histogram (bin the item's ratings over time; see the sketch below)

• Design a user time drift model
  – Not as straightforward: you have to deal with psychology
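An illustrative sketch of the "histogram" idea for item drift: give each item a per-time-bin bias correction, $b_i(t) = b_i + b_{i,\mathrm{Bin}(t)}$. The bin count and equal-width binning are assumptions made for illustration:

```python
import numpy as np

def time_bin(t, t_min, t_max, n_bins=30):
    """Map a timestamp t to one of n_bins equal-width bins over the dataset's date range."""
    frac = (t - t_min) / max(t_max - t_min, 1e-12)
    return min(int(frac * n_bins), n_bins - 1)

def item_bias_at(i, t, b_item, b_item_bin, t_min, t_max):
    """b_i(t) = b_i + b_{i,Bin(t)}; b_item_bin is an (items x n_bins) array."""
    return b_item[i] + b_item_bin[i, time_bin(t, t_min, t_max)]
```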

User time drift model

• If a user rated a movie at time $t$, model the time drift of the rating with respect to the user's mean rating time $t_u$ as
• $\mathrm{dev}_u(t) = \mathrm{sgn}(t - t_u)\,|t - t_u|^{\beta}$
• The user model becomes
• $b_u(t) = b_u + b_{u,t} + \alpha_u\,\mathrm{dev}_u(t)$

  – $b_{u,t}$ is meant to fit rating spikes in time
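A sketch of the drift term and the resulting time-dependent user bias; the value $\beta \approx 0.4$ and the day-keyed dictionary for $b_{u,t}$ are assumptions for illustration:

```python
import numpy as np

def dev_u(t, t_u, beta=0.4):
    """dev_u(t) = sign(t - t_u) * |t - t_u|^beta, with t_u the user's mean rating time."""
    return np.sign(t - t_u) * np.abs(t - t_u) ** beta

def user_bias_at(u, t, b_user, alpha, b_user_day, t_mean):
    """b_u(t) = b_u + b_{u,t} + alpha_u * dev_u(t); b_{u,t} absorbs single-day spikes."""
    return b_user[u] + b_user_day.get((u, t), 0.0) + alpha[u] * dev_u(t, t_mean[u])
```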

Combining dynamic baseline with SVD

• Make the user trait model dynamic also

• Put the pieces together to get the final prediction rule

• $\hat{r}_{ui} = \mu + b_i(t_{ui}) + b_u(t_{ui}) + q_i^T \left( p_u(t_{ui}) + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j \right)$
• Learn all the parameters using SGD
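A sketch assembling the final rule from the earlier illustrative pieces (`item_bias_at`, `user_bias_at`, `Y`, `rated_items` as above); `P_t(u, t)` stands in for an assumed time-dependent user factor function:

```python
import numpy as np

def predict(u, i, t, mu, b_item, b_item_bin, b_user, alpha, b_user_day,
            t_mean, P_t, Q, Y, rated_items, t_min, t_max):
    """r_hat_ui = mu + b_i(t) + b_u(t) + q_i^T (p_u(t) + |R(u)|^(-1/2) sum y_j)."""
    b_i_t = item_bias_at(i, t, b_item, b_item_bin, t_min, t_max)
    b_u_t = user_bias_at(u, t, b_user, alpha, b_user_day, t_mean)
    Ru = rated_items[u]
    implicit = Y[Ru].sum(axis=0) / np.sqrt(len(Ru)) if len(Ru) else 0.0
    return mu + b_i_t + b_u_t + Q[i] @ (P_t(u, t) + implicit)
```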

Outcome: micro-genres instead of parameters

• Examples:
  – Emotional Independent Sports Movies
  – Spy Action & Adventure from the 1930s
  – Cult Evil Kid Horror Movies
  – Cult Sports Movies
  – Sentimental set in Europe Dramas from the 1970s
  – Visually-striking Foreign Nostalgic Dramas
  – Japanese Sports Movies
  – Gritty Discovery Channel Reality TV
  – Romantic Chinese Crime Movies
  – Mind-bending Cult Horror Movies from the 1980s
  – Dark Suspenseful Sci-Fi Horror Movies
  – Gritty Suspenseful Revenge Westerns
  – Violent Suspenseful Action & Adventure from the 1980s
  – Time Travel Movies starring William Hartnell
  – Romantic Indian Crime Dramas
  – Evil Kid Horror Movies
  – Visually-striking Goofy Action & Adventure
  – British set in Europe Sci-Fi & Fantasy from the 1960s
  – Dark Suspenseful Gangster Dramas
  – Critically-acclaimed Emotional Underdog Movies
• Hundreds of raters
• 36-page rating manual
• Template: Region + Adjectives + Noun Genre + Based On... + Set In... + From the... + About... + For Age X to Y

• This is what a real-world recommender has to do

Netflix recommender workflow

Netflix Quantum Theory

Syntactic combination

Content-based CF

Content-boosted

• Based on content features, additional ratings are created
• E.g., Alice likes Items 1 and 3 (unary ratings)
  – Item 7 is similar to Items 1 and 3 with similarity 0.75
  – Thus Alice likes Item 7, weighted by 0.75
• Item matrices become less sparse
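A hedged sketch of how such content-boosted pseudo-ratings might be generated; `R` is a user x item matrix with 1 for liked and NaN for unrated, and `sim` is an assumed item-item content-similarity matrix with values in [0, 1]:

```python
import numpy as np

def content_boost(R, sim, threshold=0.5):
    """Fill unrated cells with the best content similarity to an item the user liked."""
    boosted = R.copy()
    for u in range(R.shape[0]):
        liked = np.where(R[u] == 1)[0]
        for i in np.where(np.isnan(R[u]))[0]:
            if liked.size and sim[i, liked].max() >= threshold:
                boosted[u, i] = sim[i, liked].max()   # e.g. Alice -> Item 7 gets 0.75
    return boosted
```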