Deep Neural Networks for Context Aware Personalized Music Recommendation
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Deep Neural Networks for Context Aware Personalized Music Recommendation
A Vector of Curation

OKTAY BAHCECI

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master in Computer Science
Date: June 29, 2017
Supervisor: Hedvig Kjellström
Examiner: Patric Jensfelt
Swedish title: Djupa neurala nätverk för kontextberoende personaliserad musikrekommendation

Abstract

Information Filtering and Recommender Systems have been used and implemented in various ways by various entities since the dawn of the Internet, and state-of-the-art approaches rely on Machine Learning and Deep Learning to create accurate and personalized recommendations for users in a given context. These models require large amounts of data with a variety of features such as time, location and user data in order to find correlations and patterns that classical models such as matrix factorization and collaborative filtering cannot. This thesis researches, implements and compares a variety of models, with a primary focus on Machine Learning and Deep Learning, for the task of music recommendation, and does so successfully by representing the task of recommendation as a multi-class extreme classification task with 100 000 distinct labels.
Across fourteen different experiments, all implemented models successfully learn features such as time, location, user features and previous listening history in order to create context-aware personalized music predictions, and solve the cold start problem by using user demographic information. The best model is capable of capturing the intended label in its top 100 list of recommended items for more than 1/3 of the unseen data in an offline evaluation on randomly selected examples from the unseen following week.

Summary (Sammanfattning)

Information filtering and recommender systems have been used and implemented in several different ways by different entities since the dawn of the Internet, and modern approaches depend on Machine Learning and Deep Learning to create precise and personal recommendations for users in a given context. These models require large amounts of data with a variety of features such as time, location and user data in order to find correlations and patterns that classical models such as matrix factorization and collaborative filtering cannot. This degree project researches, implements and compares a range of models with a focus on Machine Learning and Deep Learning for music recommendation, and does so successfully by representing the recommendation problem as an extreme multi-class classification problem with 100 000 unique classes to choose from. Across fourteen different experiments, all models learn features such as time, location, user features and listening history in order to create context-dependent personalized music predictions, and solve the cold start problem by using the demographic features of users. The best model manages to capture the target class in its recommendation list of length 100 for more than 1/3 of the unseen data in an offline evaluation, when randomly selected examples from the unseen following week are evaluated.
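The top-100 criterion mentioned above (the intended label appearing among the 100 highest-scored items) and the related reciprocal-rank measure can be illustrated with a small sketch. This is not the thesis implementation: the function names, the toy scores and the choice to break ties optimistically are assumptions made purely for illustration, and a toy vocabulary of 5 classes stands in for the 100 000 labels.

```python
from typing import List, Sequence


def rank_of_label(scores: Sequence[float], label: int) -> int:
    """1-based rank of `label` when classes are sorted by descending score.

    Assumption for this sketch: ties are broken optimistically, so only
    classes with a strictly higher score are counted ahead of the label.
    """
    target = scores[label]
    return 1 + sum(1 for s in scores if s > target)


def top_k_hit_rate(all_scores: List[Sequence[float]], labels: List[int], k: int) -> float:
    """Fraction of examples whose true label is ranked within the top k."""
    hits = sum(
        1 for scores, label in zip(all_scores, labels)
        if rank_of_label(scores, label) <= k
    )
    return hits / len(labels)


def mean_reciprocal_rank(all_scores: List[Sequence[float]], labels: List[int]) -> float:
    """Average of 1 / rank of the true label over all examples."""
    total = sum(
        1.0 / rank_of_label(scores, label)
        for scores, label in zip(all_scores, labels)
    )
    return total / len(labels)


# Hypothetical toy data: 3 evaluation examples, 5 classes each.
toy_scores = [
    [0.10, 0.50, 0.20, 0.15, 0.05],  # true label 1 is ranked 1st
    [0.30, 0.10, 0.40, 0.10, 0.10],  # true label 0 is ranked 2nd
    [0.05, 0.05, 0.10, 0.20, 0.60],  # true label 2 is ranked 3rd
]
toy_labels = [1, 0, 2]

if __name__ == "__main__":
    print(top_k_hit_rate(toy_scores, toy_labels, k=2))
    print(mean_reciprocal_rank(toy_scores, toy_labels))
```

With k = 100 and scores over the full vocabulary, `top_k_hit_rate` corresponds to the kind of figure quoted above, where a value above 1/3 means the intended label was in the top 100 for more than a third of the evaluated examples.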
Acknowledgements

Ever since I can remember, I have had a major passion for music and computers, and have been waiting for the moment to be able to combine these interests in order to maximize my potential. I would like to start by thanking Spotify for giving me the chance to do what I love to do, after presenting an idea I had been thinking of for years. Not only did you let me transform this idea into a truly successful, production-sized project and into something that holds great value, but you also provided me with all the tools to do so. I would like to thank my Spotify mentor Marcus Isaksson for his help and for guiding me in the right direction, and I want to thank Hedvig Kjellström for being my university supervisor.

Furthermore, I want to thank my university. Apart from giving me a great education and letting me excel in what I love to do, you let me teach and act as an ambassador for years, and gave me multiple opportunities of a lifetime that I never thought were possible. I want to thank all the companies that I have had the chance of working for throughout my education, which have shaped me into the engineer I always wanted to be.

I want to thank my friends from Sweden and from California for all your love and support. I would like to thank my relatives and cousins for giving me support through the good and the bad. Finally, I would like to thank my mom, Cemile Bahceci, for all her love and support. You are the strongest person I know and the coolest woman in tech there is. You have shown me that it is possible to get whatever you want in life with hard work and a positive mindset.
Notation

To simplify reading, the following notation will be used and referred to throughout this work:

v_c          play context embeddings
b_c          play context biases
v_t          track affinity embeddings
v_ci         city embeddings
v_co         country embeddings
v_curation   vector of curation, ranked vector containing the top play contexts for a user
u_p          user platform constant
u_g          user gender constant
u_a          user age constant
t_d          time of day constant
t_w          time of week constant
q            query representation
u            contextual user representation vector
V_10k        vocabulary with 10 000 play contexts
V_100k       vocabulary with 100 000 play contexts

Contents

1 Introduction
    1.1 Background
    1.2 Motivation
    1.3 Problem Definition and Objective
    1.4 Limitations
    1.5 Sustainability, Ethics, and Societal Aspects
    1.6 Methodology
    1.7 Thesis Outline
2 Related Work
    2.1 Recommender Systems
    2.2 Information Filtering
        2.2.1 Collaborative Filtering
    2.3 Content-based Recommendation Systems
    2.4 Context-aware Recommendation Systems
        2.4.1 Matrix Factorization
        2.4.2 Factorization Machines
    2.5 Hybrid Recommendation Systems
    2.6 Evaluation of Recommendation Systems
3 Background
    3.1 Vector Representation of Words
        3.1.1 Embedding
        3.1.2 Word2Vec
    3.2 Artificial Neural Networks
    3.3 Feed Forward Neural Networks
        3.3.1 Single-Layer Perceptron
        3.3.2 Multilayer Perceptron
    3.4 Convolutional Neural Networks
        3.4.1 Convolution
        3.4.2 Rectified Linear Unit
        3.4.3 Exponential Linear Unit
        3.4.4 Pooling
    3.5 Recurrent Neural Networks
        3.5.1 LSTM
    3.6 Deep Neural Networks
        3.6.1 Backpropagation
    3.7 Deep Learning
        3.7.1 Regularization Techniques
        3.7.2 Optimization Techniques
        3.7.3 Momentum
        3.7.4 Adagrad
        3.7.5 Challenges
4 Data
    4.1 Spotify
    4.2 Data Collection
        4.2.1 Scio
        4.2.2 Data Pipeline
    4.3 Data Analysis
        4.3.1 Play Contexts
        4.3.2 Listening History
        4.3.3 User Data
        4.3.4 Metadata
        4.3.5 Training and Evaluation Data
    4.4 Feature Engineering and Representation
5 Method
    5.1 Recommendation Represented as Classification
        5.1.1 Classifier Efficiency
    5.2 Model Architecture
        5.2.1 Scoring Function
        5.2.2 Diverse and Unlimited Features
        5.2.3 Weights and Priors
        5.2.4 Batch Training and Normalization
    5.3 Network Layer and Embedding Dimensions
    5.4 Vocabulary Dimension
    5.5 Hyperparameters and Tuning
        5.5.1 Loss Function
        5.5.2 Optimizer
    5.6 The Vector of Curation
    5.7 Implementation
        5.7.1 TensorFlow
        5.7.2 Data and Feature representation
        5.7.3 Training
    5.8 Baseline Algorithm
        5.8.1 Context-Aware Popularity Based Heuristic
    5.9 Metrics
        5.9.1 Accuracy
        5.9.2 Reciprocal Rank
        5.9.3 Top-K
6 Results
    6.1 Baseline Heuristic
        6.1.1 Mean Accuracy
        6.1.2 Mean Reciprocal Rank
    6.2 Models
    6.3 Experiments
        6.3.1 Base Model
    6.4 Track Vectors
    6.5 Going Deeper
    6.6 DEEPER