Content-Based Music Recommendation System: A Comparison of Supervised Machine Learning Models and Music Features
Degree Project in Computer Science and Engineering, Second Cycle, 30 Credits
Stockholm, Sweden 2020

Marine Chemeque-Rabel
Master in Computer Science
KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science

Supervisor: Bob Sturm
Examiner: Joakim Gustafsson
Tutor: Didier Giot, Aubay
Swedish title: Innehållsbaserat musikrekommendationssystem
Date: August 18, 2020

Abstract

As streaming platforms have become increasingly popular in recent years and music consumption has grown, music recommendation has become an increasingly relevant issue. Music applications are attempting to improve their recommendation systems in order to offer their users the best possible listening experience and keep them on their platform. For this purpose, two main models have emerged: collaborative filtering and the content-based model. In the former, recommendations are based on similarity computations between users and their musical tastes. The main issue with this method is known as cold start: the system performs poorly on new entries, whether songs or users. In the latter, information is extracted from the music itself in order to recommend similar music. It is this second method that has been implemented in this thesis.

The state of the art in content-based methods reveals that a large number of features can be extracted. Low-level features can be temporal (zero-crossing rate), spectral (spectral decrease) or even perceptual (loudness), and they require knowledge of physics and signal processing.
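The low-level descriptors mentioned above can be computed directly from the waveform. The following sketch, assuming NumPy is available, computes two of them, the zero-crossing rate and the spectral decrease (using Peeters' commonly cited formula), and then recommends the track whose standardised feature vector lies nearest to a query track. The helper names, the two-feature vector, the z-scoring and the Euclidean nearest-neighbour rule are illustrative assumptions, not the method evaluated in this thesis, which compares supervised models; perceptual loudness is left out because it requires a psychoacoustic model. Because such a recommender only looks at the audio, it can place a brand-new track that has no listening history, which is exactly the cold-start case that collaborative filtering handles poorly.

```python
import numpy as np


def zero_crossing_rate(x: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign differs (zeros count as positive)."""
    negative = np.signbit(x)
    return float(np.mean(negative[1:] != negative[:-1]))


def spectral_decrease(x: np.ndarray) -> float:
    """Spectral decrease: 1/k-weighted slope of the magnitude spectrum relative to
    its first bin, normalised by the total magnitude of the remaining bins."""
    a = np.abs(np.fft.rfft(x))   # magnitude spectrum; a[0] is the reference bin
    k = np.arange(1, len(a))     # distance of each later bin from the reference bin
    total = a[1:].sum()
    if total == 0.0:             # silent signal: the ratio is undefined, return 0
        return 0.0
    return float(np.sum((a[1:] - a[0]) / k) / total)


def extract_features(x: np.ndarray) -> np.ndarray:
    """Hypothetical two-dimensional feature vector: [ZCR, spectral decrease]."""
    return np.array([zero_crossing_rate(x), spectral_decrease(x)])


def recommend(query_idx: int, features: np.ndarray) -> int:
    """Index of the track whose standardised features are closest to the query's."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0              # avoid division by zero for constant features
    z = (features - mu) / sigma          # z-score so both features weigh equally
    dist = np.linalg.norm(z - z[query_idx], axis=1)
    dist[query_idx] = np.inf             # never recommend the query track itself
    return int(np.argmin(dist))


# Toy "catalogue": one-second pure tones standing in for real tracks.
SR = 22050
t = np.arange(SR) / SR
tracks = [np.sin(2 * np.pi * f * t) for f in (220, 230, 440, 3000)]
F = np.vstack([extract_features(x) for x in tracks])
print(recommend(0, F))  # → 1 (the 230 Hz tone is closest to the 220 Hz tone)
```

With real recordings, such functions would usually be applied to short frames and summarised per track (for example by their mean and variance) before any similarity or classification step.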
Middle-level features (rhythm, pitch, ...) can be understood by musical experts. Finally, higher-level features (mood, danceability, ...) are understandable by everyone. It should be underlined that the literature review also identified a large number of candidate models. Using the two datasets GTZAN and FMA, we first aim to find the best model, focusing only on supervised models, together with its hyperparameters, in order to produce relevant recommendations. It is also necessary to determine the best subset of features to characterise the music while avoiding redundant and irrelevant information. One of the main challenges is to find a way to assess the performance of our system.

Sammanfattning (Swedish abstract)

As streaming platforms have become more and more popular in recent years, and music consumption has increased, music recommendation has become an increasingly important issue. Music applications are trying to improve their recommendation systems by offering their users the best possible listening experience and keeping them on their platform. For this purpose, two main models have emerged: collaborative filtering and the content-based model. In the first, recommendations are based on similarity computations between users and their tastes. The main issue with this method is called cold start: the system will not perform well on new items, whether music or users. In the latter model, information is extracted from the music itself in order to recommend another piece. It is the second model that has been implemented in this thesis. The state of the art in content-based methods shows that the features that can be extracted are numerous. There are low-level features that can be temporal (zero-crossing rate), spectral (spectral decrease) or even perceptual (perceptual loudness), which require knowledge of physics and signal processing. There are middle-level features that can be understood by musical experts (rhythm, pitch, ...). Finally, there are higher-level features, understandable by everyone (mood, danceability, ...). It should be emphasised that the models identified during the literature review are also numerous. Using the two datasets GTZAN and FMA, the first goal is to find the best model by focusing only on supervised models, as well as their hyperparameters, in order to achieve a relevant recommendation. On the other hand, it is also necessary to determine the best subset of features to characterise the music while avoiding redundant and irrelevant information. One of the challenges is to find a way to assess the performance of our system.

Contents

1 Introduction
  1.1 Context
  1.2 Purpose and specifications
  1.3 Research question
  1.4 Overview
2 Background
  2.1 Recommendation overview
    2.1.1 Recommendation definition
    2.1.2 Music is different
    2.1.3 What is a good recommendation?
    2.1.4 Available data
  2.2 Types of recommendation systems
    2.2.1 Collaborative approach
    2.2.2 Content-based approach
    2.2.3 Context-based approach
    2.2.4 Hybrid approach
  2.3 Models for content-based recommendation
    2.3.1 Logistic Regression
    2.3.2 Decision Trees
    2.3.3 Bagging: Random Forest
    2.3.4 Boosting: Adaboost
    2.3.5 k-Nearest Neighbours
    2.3.6 Support Vector Machine
    2.3.7 Naive Bayes
    2.3.8 Linear Discriminant Analysis
    2.3.9 Neural Networks
  2.4 Features for content-based recommendation
    2.4.1 Low-level features
    2.4.2 Middle-level features
    2.4.3 High-level features
  2.5 Features selection algorithms
    2.5.1 Filter model
    2.5.2 Wrapper model
    2.5.3 Embedded model
3 Methods
  3.1 Chosen approach
  3.2 Datasets
    3.2.1 GTZAN
    3.2.2 Free Music Archive
    3.2.3 Data-augmentation
  3.3 Features extraction
    3.3.1 Preprocessing
    3.3.2 Chosen features
    3.3.3 Wrapper model for features selection
  3.4 Models
    3.4.1 Hyperparameter tuning
  3.5 Evaluation
    3.5.1 Evaluation of the classification using labels
    3.5.2 Evaluation of the prediction using confusion matrices
    3.5.3 Evaluation of the prediction based on human opinion
4 Results
  4.1 Preliminary results
    4.1.1 Tests on FMA
    4.1.2 Tests on GTZAN
  4.2 Dataset creation
  4.3 Hyperparameters tuning
    4.3.1 Logistic regression optimization
    4.3.2 Decision tree and random forest optimization
    4.3.3 Adaboost optimization
    4.3.4 K-nearest-neighbours optimization
    4.3.5 Support vector machine optimization
    4.3.6 Linear Discriminant Analysis
    4.3.7 Feed-Forward Neural Network
    4.3.8 Global results after tuning
  4.4 Feature selection
    4.4.1 Most important features
  4.5 Data augmentation
  4.6 Final examples of recommendations
5 Conclusions and discussions
  5.1 Discussion of the results
    5.1.1 Quantitative results
    5.1.2 Qualitative results
  5.2 Conclusion
    5.2.1 Research question
    5.2.2 Known limitations
  5.3 Future work
    5.3.1 Improvement suggestions
    5.3.2 Application development

1 Introduction

1.1 Context

In 1979, one of the first recommendation systems was born when Elaine Rich described her Grundy library system [1]. Grundy recommends books to users after a short interview. The user is first asked to enter their first and last name; then, in order to identify the user's preferences and classify them into "stereotypes", Grundy asks them to describe themselves in a few keywords. Once this information has been recorded, Grundy makes an initial suggestion by displaying a summary of a book. If the user does not like the suggestion, Grundy asks questions to understand which aspect of the book it got wrong and suggests another one.
However, its use remained limited, and Rich faced problems of generalisation. Recommendation systems, which truly emerged in the 1990s, have developed strongly in recent years, especially with the introduction of Machine Learning and networks. On the one hand, the growing use of today's digital environment, characterised by an overabundance of information, has made large user databases available. On the other hand, the increase in computing power has made it possible to process these data, notably with Machine Learning, at a scale where humans could no longer analyse so much information exhaustively. Unlike search engines, which receive requests containing precise information about what the user wants, a recommendation system receives no direct request; instead, it must offer the user new possibilities by learning their preferences from their past behaviour. E-commerce sites, which aim to sell as many items or services (travel, books, ...) as possible, must therefore recommend suitable goods quickly. Music and movie streaming sites, for their part, aim to keep their users on their platform as long as possible. In both cases, the recommendations must be relevant. Recent progress in this field has been considerable, and good recommendations benefit both companies, which maximise their profits, and customers, who are no longer overwhelmed by the number of options. Decision-making becomes easier, so a good recommendation saves significant time. In 2006, Netflix, then an online DVD rental service, launched the Netflix Challenge with a $1 million prize. The goal of the contest was to build a recommendation algorithm that outperformed Netflix's existing one by 10% on test data.
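The 10% target can be made concrete. The Netflix Prize scored submissions by root-mean-square error (RMSE) on held-out ratings, and "10% better" meant a 10% lower RMSE than Netflix's existing algorithm. The Python sketch below computes RMSE and checks that relative criterion; the function names and the ratings are invented for illustration and are not contest data.

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_improvement(baseline_rmse: float, candidate_rmse: float) -> float:
    """Fraction by which the candidate reduces the baseline's error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


def meets_target(baseline_rmse: float, candidate_rmse: float, target: float = 0.10) -> bool:
    """True if the candidate cuts the baseline RMSE by at least `target` (10% by default)."""
    return relative_improvement(baseline_rmse, candidate_rmse) >= target


# Made-up ratings on a 1-5 scale, purely for illustration.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]
candidate = [3.9, 3.2, 4.6, 2.3, 3.8]

b, c = rmse(baseline, actual), rmse(candidate, actual)
print(f"baseline {b:.3f}, candidate {c:.3f}, improvement {relative_improvement(b, c):.1%}")
print(meets_target(b, c))  # True
```

Because the target is relative, the absolute error reduction a submission needs grows with the RMSE of the system it is compared against.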