DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Content-based music recommendation system: A comparison of supervised Machine Learning models and music features

MARINE CHEMEQUE-RABEL

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Marine Chemeque-Rabel

[email protected]

Master in Computer Science

School of Electrical Engineering and Computer Science

Supervisor: Bob Sturm

Examiner: Joakim Gustafsson

Tutor: Didier Giot, Aubay

Swedish title: Innehållsbaserat musikrekommendationssystem

Date: August 18, 2020

Abstract

As streaming platforms have become increasingly popular in recent years and music consumption has grown, music recommendation has become an increasingly relevant problem. Music applications are trying to improve their recommendation systems in order to offer their users the best possible listening experience and keep them on their platform. For this purpose, two main models have emerged: collaborative filtering and the content-based model. In the former, recommendations are based on similarity computations between users and their musical tastes. The main issue with this method is called cold start: the system will not perform well on new items, whether music or users. In the latter, information is extracted from the music itself in order to recommend similar tracks. It is the second method that has been implemented in this thesis.

The state of the art of content-based methods reveals that the features that can be extracted are numerous. There are low-level features, which can be temporal (zero crossing rate), spectral (spectral decrease), or perceptual (loudness), and which require knowledge of physics and signal processing. There are middle-level features that can be understood by musical experts (rhythm, pitch, ...). Finally, there are high-level features, understandable by all (mood, danceability, ...). It should be underlined that the models identified during the literature review are also abundant.

Using the two datasets GTZAN and FMA, we first aim to find the best model, focusing only on supervised models and their hyperparameters, in order to achieve relevant recommendations. We then need to determine the best subset of features to characterise the music while avoiding redundant and parasitic information. One of the main challenges is to find a way to assess the performance of our system.

Sammanfattning

Med anledning till att streamingplattformar har blivit mer och mer populära under de senaste åren, och musikförbrukningen har ökat, har musikrekommendationen blivit en allt viktigare fråga. Musikapplikationer försöker förbättra sina rekommendationssystem genom att erbjuda sina användare den bästa möjliga lyssningsupplevelsen och hålla dem på sin plattform. För detta ändamål har två huvudmodeller framkommit, samarbetsfiltrering och innehållsbaserad modell. I den första är rekommendationer baserade på likhetsberäkningar mellan användare och deras smak. Huvudfrågan med denna metod kallas kallstart, den beskriver det faktum att systemet inte kommer att fungera bra på nya objekt, vare sig för musik eller användare. I den senare modellen handlar det om att extrahera information från själva musiken för att rekommendera en annan. Det är den andra modellen som har implementerats i denna avhandling.

Det senaste inom innehållsbaserade metoder avslöjar att de funktioner som kan extraheras är många. Det finns faktiskt lågnivåfunktioner som kan vara temporära (nollövergångshastighet), spektrala (spektral minskning) eller till och med perceptuella (perceptuell höghet) som kräver kunskap om fysik och signalbehandling. Det finns funktioner på medelnivå som kan förstås av musikaliska experter (rytm, tonhöjd ...). Slutligen finns det funktioner på högre nivå, förståliga för alla (humör, dansbarhet ...). Det bör betonas att de modeller som identifierats under pappersavläsningssteget också är rikliga.

Med hjälp av de två datamängderna GTZAN och FMA är målet för det första att hitta den bästa modellen genom att endast fokusera på övervakade modeller, liksom dess hyperparametrar, för att uppnå en relevant rekommendation. Å andra sidan är det också nödvändigt att bestämma den bästa delmängden av funktioner för att karakterisera musiken samtidigt som man undviker redundant och parasitisk information.
En av utmaningarna är att hitta ett sätt att bedöma prestandan i vårt system.

Contents

1 Introduction ...... 1
  1.1 Context ...... 1
  1.2 Purpose and specifications ...... 1
  1.3 Research question ...... 2
  1.4 Overview ...... 2

2 Background ...... 3
  2.1 Recommendation overview ...... 3
    2.1.1 Recommendation definition ...... 3
    2.1.2 Music is different ...... 3
    2.1.3 What is a good recommendation? ...... 3
    2.1.4 Available data ...... 4
  2.2 Types of recommendation systems ...... 5
    2.2.1 Collaborative approach ...... 5
    2.2.2 Content-based approach ...... 6
    2.2.3 Context-based approach ...... 7
    2.2.4 Hybrid approach ...... 7
  2.3 Models for content-based recommendation ...... 7
    2.3.1 Logistic Regression ...... 8
    2.3.2 Decision Trees ...... 10
    2.3.3 Bagging: Random Forest ...... 10
    2.3.4 Boosting: Adaboost ...... 11
    2.3.5 k-Nearest Neighbours ...... 12
    2.3.6 Support Vector Machine ...... 13
    2.3.7 Naive Bayes ...... 14
    2.3.8 Linear Discriminant Analysis ...... 14
    2.3.9 Neural Networks ...... 14
  2.4 Features for content-based recommendation ...... 16
    2.4.1 Low-level features ...... 17
    2.4.2 Middle-level features ...... 22
    2.4.3 High-level features ...... 23
  2.5 Feature selection algorithms ...... 24
    2.5.1 Filter model ...... 24
    2.5.2 Wrapper model ...... 25
    2.5.3 Embedded model ...... 26

3 Methods ...... 27
  3.1 Chosen approach ...... 27
  3.2 Datasets ...... 27
    3.2.1 GTZAN ...... 28
    3.2.2 Free Music Archive ...... 28
    3.2.3 Data augmentation ...... 30
  3.3 Feature extraction ...... 31
    3.3.1 Preprocessing ...... 31
    3.3.2 Chosen features ...... 31
    3.3.3 Wrapper model for feature selection ...... 31
  3.4 Models ...... 32
    3.4.1 Hyperparameter tuning ...... 32
  3.5 Evaluation ...... 33
    3.5.1 Evaluation of the classification using labels ...... 33
    3.5.2 Evaluation of the prediction using confusion matrices ...... 34
    3.5.3 Evaluation of the prediction based on human opinion ...... 34

4 Results ...... 35
  4.1 Preliminary results ...... 35
    4.1.1 Tests on FMA ...... 35
    4.1.2 Tests on GTZAN ...... 39
  4.2 Dataset creation ...... 41
  4.3 Hyperparameter tuning ...... 42
    4.3.1 Logistic regression optimization ...... 43
    4.3.2 Decision tree and random forest optimization ...... 43
    4.3.3 Adaboost optimization ...... 44
    4.3.4 K-nearest-neighbours optimization ...... 45
    4.3.5 Support vector machine optimization ...... 46
    4.3.6 Linear Discriminant Analysis ...... 47
    4.3.7 Feed-Forward Neural Network ...... 47
    4.3.8 Global results after tuning ...... 47
  4.4 Feature selection ...... 47
    4.4.1 Most important features ...... 52
  4.5 Data augmentation ...... 54
  4.6 Final examples of recommendations ...... 54

5 Conclusions and discussions ...... 57
  5.1 Discussion of the results ...... 57
    5.1.1 Quantitative results ...... 57
    5.1.2 Qualitative results ...... 57
  5.2 Conclusion ...... 58
    5.2.1 Research question ...... 58
    5.2.2 Known limitations ...... 58
  5.3 Future work ...... 59
    5.3.1 Improvement suggestions ...... 59
    5.3.2 Application development ...... 59

1 Introduction

1.1 Context

In 1979, one of the first recommendation systems was born. Elaine Rich described her Grundy library system [1]: it recommends books to users following a short interview. The user is first asked to fill in their first and last name; then, in order to identify the user's preferences and classify them into a "stereotype", Grundy asks them to describe themselves in a few key words. Once the information has been recorded, Grundy makes an initial suggestion by displaying a summary of the book. If the suggestion does not please the user, Grundy asks questions to understand on which aspect of the book it made a mistake and suggests a new one. However, its use remained limited and Rich faced problems of generalisation.

The recommendation systems that really emerged in the 1990s have developed strongly in recent years, especially with the introduction of Machine Learning and networks. On the one hand, the growing use of the current digital environment, characterised by an overabundance of information, has allowed us to obtain large user databases. On the other hand, the increase in computing power made it possible to process these data, especially thanks to Machine Learning, when human capacities were no longer able to carry out an exhaustive analysis of so much information.

Unlike search engines, which receive requests containing precise information from the user about what they want, a recommendation system does not receive a direct request from the user, but must offer them new possibilities by learning their preferences from their past behaviour. E-commerce sites that aim to sell a maximum of items or services (travel, books, ...) to customers must therefore recommend suitable goods quickly. As for sites that offer streaming music and movies, their goal is to keep their users on their platform as long as possible. The common point is that it is necessary to make adequate recommendations.
Recent progress in this field is considerable, and these recommendations are as beneficial for companies, which maximise their profits, as they are for customers, who are no longer overwhelmed by the number of possibilities. Decision-making is made easier, and a good recommendation is therefore a significant time saver. In 2006, Netflix, which was then an online DVD rental service, launched the Netflix Challenge with $1 million to be won. The goal of the contest was to build a recommendation algorithm that could surpass the current one by 10% in tests. The contest generated a lot of interest, both in the research community and among movie lovers. The prize was won 3 years later and highlighted several methods and research directions to solve this kind of problem. A recommendation system will be defined according to Burke's definition [2]: it is a system capable of providing personalised recommendations or guiding the user to interesting or useful resources (called items) within a large data space.

1.2 Purpose and specifications

The project, entitled Aubay Musical Playlist, was carried out in Aubay's "Innov" division. It is a brand new Research and Development project; its goal is to achieve a complete state of the art of the available methods in order to offer

a functional and efficient music recommendation system. This project does not have a direct client, so the training dataset is not provided and thus needs to be determined. In the long term, the goal is not only to recommend existing songs but also to generate songs adapted to the musical taste of the user. During this master thesis I focused on the recommendation part while exchanging with a colleague in charge of the generation part. The future of the project will consist in gathering these two parts in order to have a fully functional recommendation system.

The aim of this thesis is to explore the different recommendation approaches, the available datasets, the ways to take the user's preferences into account, and the machine learning methods, in order to build a suitable recommendation system. One important part was dedicated solely to determining how to evaluate this recommendation system. This project will be introduced to the members of the company and will take the form of an application. The user will be asked to upload a piece of music (mp3 or wav format) and the application will recommend tracks to be listened to afterwards.

1.3 Research question

This master thesis focuses on two aspects: determining the listener's preferences and evaluating our recommendations. The main research question is the following:

How can a music listener’s tastes be taken into consideration in order to automatically recommend music? How can one measure the tastes of a music listener?

Several points will, therefore, have to be addressed:
- How to classify the music's style?
- How to take tastes into account?
- How can the performance of such a system be measured?

1.4 Overview

This report is structured as follows. The technical background required for this project will first be described in detail: the different approaches that can be used to implement recommendation systems will be presented, the machine learning methods that will be experimented with in this thesis will be described, and the ways to evaluate our results will also be presented. The Methods section will describe the work performed: I will first introduce the chosen dataset and the reasons why it was chosen, then detail the experiments that were carried out. Finally, the Results section will highlight and give a visualisation of the main results obtained. Quantitative and qualitative interpretations of these results will allow us to reach a final unique model and to answer the research sub-questions. In the "Conclusions and Discussions" section I will discuss the future of this work.


2 Background

2.1 Recommendation overview

2.1.1 Recommendation definition

In this thesis the focus will be on recommendation systems. A recommendation system is a set of techniques and services whose purpose is to propose to users articles that are likely to interest them. They are presently implemented on multimedia content distribution platforms (Netflix, Deezer, Spotify, ...), online sales platforms (Amazon, eBay, ...), social networks (Facebook, Twitter, ...), and so on. Recommendation systems are particularly useful when the number of users and articles becomes very large: users are unlikely to know all the richness of the catalogue offered by the service, and it can be argued that it is almost impossible to make a personalised human prescription for all the users of a service. The purpose of the recommendation system is to lead users through the vast amount of data available, particularly on e-commerce platforms, filtering this data to automatically propose to each consumer the items that are likely to be of interest to them.

2.1.2 Music is different

Recommendation systems are used in more and more fields: hotels, travel, products. But the musical field has some particularities to take into account [3]. The first factor to consider is the duration of a music track. As a track is short, it is less critical to make a bad recommendation than it is for a movie or a book, for example. The user can also quickly browse through the music to see whether it suits their taste. A second specificity is the number of tracks available: the choice is very wide, and it is estimated that at least tens of millions of songs are accessible on the Internet. It is also common for repeated recommendations of the same music to be appreciated: while for trips or movies the user is looking for diversity, a listener may like to hear the same music over and over again. Moreover, it is possible that at the first listening the user was not attentive, since listening to music is often done in parallel with another activity (sport, work, ...); attentive listening requires quality hardware, the proper mood, and exclusive attention time. It is also quite easy to extract a set of features from one piece of music: information can be extracted through signal processing, musical knowledge, lyrics, or simply user feedback. Old music is as relevant as new music: recent music, music from a few decades ago, or classical music can be equally enjoyable; it is a matter of correctly understanding the user's tastes. It must also be taken into account that music listening is often passive: the listener does not necessarily listen attentively (in shops, in bars, while working, ...). The last point that distinguishes music from other recommendable items is that music is often played in sequence: as tracks are short, they are often chained together in the form of a playlist.

2.1.3 What is a good recommendation?

Taking these peculiarities into consideration, it is now necessary to make an adequate recommendation. Naturally, the main objective is to achieve a good

level of accuracy, which means predicting music that the user will like and listen to. The more the user trusts the recommendation system and knows how it works, the more effective it will be. A successful recommendation involves a trade-off between exploitation and exploration [4]. On the one hand, exploitation consists in playing safe music, music that the recommender knows the user likes: this is called the lean-back experience and it brings short-term rewards. On the other hand, exploration is about playing new music and making new discoveries: this is called the lean-in experience and it brings long-term rewards. If it is properly gauged, a little serendipity may please [5]. This implies that we need to find the appropriate balance between novelty and familiarity, diversity and similarity, as well as popularity and personalisation.

A relevant recommendation must also reflect the listener's context. It cannot be based only on music and listener properties; the mood, the activity, and so on also need to be taken into account. For example, someone who is working does not want to listen to the same music as when they are running. Finally, transparency with users is a crucial point. It has been shown that explaining how the algorithm works to the user improves their confidence and therefore the time they will spend on the platform, firstly to perfect their profile and secondly because they will get better recommendations [4].

2.1.4 Available data

The main goal of Music Information Retrieval is to extract the most relevant information from various representations of a piece of music (audio, lyrics, web, metadata, ...).

The features extracted can be split into four categories. The first one is the music content [6], which groups together three types of features. Signal processing techniques give us access to low-level features, that is, machine-interpretable features. They can be temporal (zero crossing rate) or spectral (spectral flux, spectral decrease, ...). Musical knowledge is required to extract middle-level features such as the beat, tonality, or pitch. Finally, while the previous ones are only understandable by the machine or by experts, high-level features are accessible to everyone, such as danceability or liveness.

The second category is the music context. The goal is to retrieve as much information as possible (country, related artists, genres) based on metadata, thanks to web pages, blogs, lyrics, tags, ... [6]

The usable data does not only come from the music itself, but can also be focused on the user. First of all, there are the listener properties. Everyone has tastes and preferences; these can be retrieved implicitly (plays, playlists) or explicitly (thumbs, stars) [6]. Finally, the last category is the listener context. Data can be retrieved directly from the sensors of the device the listener is using. They are useful because the desired music varies strongly according to the mood of the listener and their activity (sport, work, ...) [6].

There are different methods to get information about the user's context [4]. It can be retrieved explicitly, that is, directly by asking the user (through forms, rating polls, ...). Some information can also be deduced implicitly from

the sensors of our devices (heart rate, light intensity, accelerometer, position, weather, and so on). Another way is to infer them: Machine Learning and statistical techniques can be used to draw conclusions, for example inferring the activity from the position and movement speed.
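As an illustration of the low-level temporal features mentioned above, the zero crossing rate can be computed directly from the waveform; the sketch below uses a synthetic sine tone and plain NumPy rather than a real audio file, so the sampling rate and signal are toy values (libraries such as librosa provide an equivalent feature extractor).

```python
import numpy as np

def zero_crossing_rate(signal: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(signal)
    # A zero crossing occurs where the product of neighbouring signs is negative.
    crossings = np.sum(signs[:-1] * signs[1:] < 0)
    return crossings / (len(signal) - 1)

# Toy example: a 440 Hz sine sampled at 22050 Hz crosses zero twice per period,
# so the expected rate is about 2 * 440 / 22050 ~= 0.0399.
sr = 22050
t = np.arange(sr) / sr              # one second of "audio"
tone = np.sin(2 * np.pi * 440 * t)
zcr = zero_crossing_rate(tone)
```

A noisy or percussive track yields a much higher rate than a smooth tonal one, which is why this feature is a cheap proxy for noisiness.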

Figure 1: Factors in Music Information Retrieval [6]

2.2 Types of recommendation systems

There are three main types of recommendation systems which provide the ability to create music playlists adapted to a user: collaborative filtering, content-based information retrieval techniques, and context-based recommendation. A combination of the previous techniques is possible and is called hybrid [7].

2.2.1 Collaborative approach

This recommendation method is based on the analysis of both the behaviour of the listener and the behaviour of all the other users of the platform. The fundamental assumption here is that the opinions of other users can be used to provide a reasonable prediction of another user's preferences for an item that they have not yet rated: a user is given recommendations based on users with whom they share the same tastes. Indeed, for years, in order to choose music, restaurants, movies, and so on, we have been asking our friends, family, and colleagues to recommend something they liked, and it is this mechanism that is reproduced here. Netflix was a pioneer of this method (based on stars given by other users) but it is now widely used, including for Spotify's Discover Weekly [8].

The first family of collaborative filtering methods is called the memory-based approach. The principle is to store all the data in a Users/Songs matrix. This can be done thanks to implicit or explicit feedback. In the former, the value is 1 if the item has been listened to at least once, 0 otherwise. In the latter, the value is the number of stars if available, 0 otherwise. We end up with a large matrix. To reduce it, Spotify approximates this matrix by an inner product of two smaller matrices [9].
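The implicit-feedback Users/Songs matrix described above can be sketched as follows; the users, songs, and play log are invented toy data, using the convention from the text (1 if the song has been played at least once, 0 otherwise).

```python
import numpy as np

users = ["alice", "bob", "carol"]
songs = ["song_a", "song_b", "song_c", "song_d"]
# Hypothetical play log: one (user, song) pair per listen.
plays = [("alice", "song_a"), ("alice", "song_a"), ("alice", "song_c"),
         ("bob", "song_b"), ("carol", "song_c"), ("carol", "song_d")]

R = np.zeros((len(users), len(songs)), dtype=int)
for user, song in plays:
    # Implicit feedback: mark 1 once a song has been played at least once,
    # so repeated listens do not change the entry.
    R[users.index(user), songs.index(song)] = 1
```

Here `R[0]` is `[1, 0, 1, 0]`: alice has played song_a and song_c. It is this (much larger, much sparser) matrix that the factorization step approximates.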


Thanks to matrix factorization, we now have two types of vectors: one user vector X for each listener and one song vector Y for each track.

0 0 1 0 1 0 1 1 0 0 0 0     . 1 0 0 0 1 0 .      0 0 1 1 0 1 = X · ··· Y ··· (1)     0 1 0 0 0 0 .   . 1 0 1 1 0 0 0 0 0 1 1 1

The last step is to find the similarity between vectors in order to recommend music to listeners. To do so there are two methods [10]:
- User–user similarity: comparing the listener's vector with other users' vectors to find those who have similar tastes.
- Item–item similarity: comparing track vectors to find which one is closest to the music currently being listened to.

There is a second approach, called model-based: the goal is to predict the user's rating for missing items using machine learning models.
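The item–item similarity step can be sketched with cosine similarity between song vectors; the vectors below are made-up toy latent factors, not the output of a real factorization.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy song vectors (e.g. columns of Y after matrix factorization).
current = np.array([1.0, 0.5, 0.0])        # the track being played
catalogue = {
    "song_a": np.array([0.9, 0.6, 0.1]),
    "song_b": np.array([0.0, 0.1, 1.0]),
}
# Recommend the catalogue song whose vector is closest to the current one.
best = max(catalogue, key=lambda s: cosine_similarity(current, catalogue[s]))
```

With these values `song_a` wins by a wide margin; user–user similarity works the same way, with rows of X instead of columns of Y.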

The key advantage of the collaborative approach is that we do not need to analyse and extract features from the raw files, so there is no need to have the audio files, nor to have in-depth knowledge of music or physics. Moreover, it brings serendipity: the surprise effect the user experiences when given a relevant recommendation that they would not have found alone.

There are three major drawbacks. The first one is called cold start and designates two issues: the new user problem and the new item problem [11]. The former reflects the lack of user data needed to make a relevant recommendation, while the latter reflects the fact that we do not know who to recommend new items to. The next issue is scalability: a large number of users and items requires high computing resources. The last one is sparsity: because the number of items is large, one user can only rate a small subset of them [11].

2.2.2 Content-based approach

Content-based recommendation consists in the analysis of the content of the items that are candidates for recommendation. This approach aims to infer the user's preferences in order to recommend items that are similar in content to items they have previously liked. This method does not need any feedback from the listener; it is based only on sound similarity, which is deduced from the features extracted from the previously listened songs [8]. This method relies on the similarities between the different items. To estimate similarities, features are extracted to best describe the music. The Machine Learning algorithm then recommends the items closest to those that the user already likes. It is, therefore, necessary to create item profiles based on features extracted from the items. Moreover, this method requires user profiles based on both the users' preferences and their history on the platform. These profiles will be in the

following form: a list of weights (which reveal their importance) corresponding to each feature we have selected.

The main advantage of this approach is that an unknown piece of music is just as likely to be recommended as a currently popular one, or even a timeless one. This also allows new artists with few "views" to be brought up. Moreover, the problem of the cold start, and in particular of new items, is thus avoided: when new items are introduced into the system, they can be recommended directly, without requiring the integration time needed by recommendation systems based on a collaborative filtering approach.

The negative point is that this method limits the diversity of the recommendations; it tends to over-specialise. Moreover, the integration of a new user cannot be instantaneous: they have to listen to and evaluate a certain number of songs before being able to receive recommendations. This is the user cold start.
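The user profile described above, a list of weights over the selected features, can be sketched as the average of the feature vectors of previously liked songs, with candidate songs ranked by their distance to that profile; the feature names and all values are invented for illustration.

```python
import numpy as np

# Hypothetical per-song features: [tempo (normalised), energy, danceability].
liked = np.array([[0.8, 0.9, 0.7],
                  [0.7, 0.8, 0.9]])
# User profile: mean feature vector of the songs the user liked.
profile = liked.mean(axis=0)              # [0.75, 0.85, 0.8]

candidates = {"calm_song":   np.array([0.2, 0.1, 0.3]),
              "upbeat_song": np.array([0.7, 0.9, 0.8])}
# Recommend the candidate whose features are closest to the profile.
best = min(candidates, key=lambda s: np.linalg.norm(candidates[s] - profile))
```

This also shows the over-specialisation problem mentioned above: the "calm_song" can never win against songs resembling the existing profile.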

2.2.3 Context-based approach

Studies [12] have shown that the mood, activity, or even the location of a person influences the music they want to listen to. We listen to music at a given moment, in a predefined emotional state, and in established circumstances (party, work, ...), and these predispositions play a decisive role in the way we feel about the music. Although there are many possible applications [12] of this type of recommendation, such as tourist guide applications with adaptive ambient songs, there are not many concrete applications on this subject. Many barriers still block research in this field. The nature of the data to be taken into account is highly varied and depends on the environment (time, place, weather, culture, ...) or on the users themselves (motion speed, emotions, heart rate, device luminosity, ...). An even more significant issue is the lack of data available for research purposes. In the real world it is not easy to retrieve them either, as users do not always want to transmit that much information from their mobile phone sensors.

2.2.4 Hybrid approach

It is also possible to combine the previous, complementary methods to create a so-called hybrid recommendation system. It can also draw on other, lesser-known methods such as location-based recommendation. This approach can alleviate the problems of cold start and sparsity. Several implementations can be set up: the recommendation systems can be mixed into one; several systems can be kept separate and assigned weights, or switched between at will; finally, the results of one system can be used as input for the next one.

2.3 Models for content-based recommendation

During the state-of-the-art phase, the reading of numerous research papers showed that a variety of models can be used for recommendation. These models will be tested in this thesis and are therefore presented in this section. The models chosen are supervised machine learning algorithms. Machine learning is a type of artificial intelligence where an algorithm automatically modifies its behaviour in order to improve its performance on a task based on a

set of data. This process is called learning, since the algorithm is optimised from a set of observations and tries to extract statistical regularities from them. In supervised learning, the objective of the algorithm is to predict an explicit and known target t from the training data. The two most common types of targets are either continuous values, where t ∈ ℝ (regression problem), or discrete classes, where t ∈ {1, ..., N_C} for a problem with N_C classes (classification problem).

2.3.1 Logistic Regression

Logistic regression has proven its effectiveness in the field of music classification; although not the most efficient method [13], it has the advantage of being fast. Logistic regression is often used for multi-class classification. The goal is to find the optimal decision boundary in order to separate the different classes [14]. The easiest case is when there are only two classes (0 and 1); in that case, as logistic regression is a linear model, the score function can be written as follows:

S(X^(i)) = θ_0·x_0 + θ_1·x_1 + ... + θ_n·x_n  (2)

with:
- X^(i): an observation (from the training or test set), represented as a vector (x_1, x_2, ..., x_n)
- x_i: one of the valuable features of the predictive model
- θ_0: a constant called the bias
- θ_i: the weights (associated with the features) that have to be computed

It can be written more compactly by noting Θ the vector containing the components θ_0, θ_1, ..., θ_n and X the vector containing x_1, x_2, ..., x_n:

S(X) = ΘX (3)

Then the goal is to find coefficients θ_0, θ_1, ..., θ_n such that:
- S(X^(i)) > 0 if the sample is in the positive class (label 1)
- S(X^(i)) < 0 if the sample is in the negative class (label 0)

The sigmoid function (figure 2), sigmoid(x) = 1 / (1 + e^(−x)), is then applied to the score function, which allows us to obtain values between 0 and 1. The overall hypothesis function for logistic regression is therefore:

H(x) = Sigmoid(S(X)) = 1 / (1 + e^(−ΘX))  (4)
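Equations (3) and (4) can be checked numerically; the weights and the sample below are arbitrary toy values (with x_0 = 1 so that θ_0 acts as the bias).

```python
import numpy as np

def sigmoid(s: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

# Toy weight vector Theta (theta_0 is the bias) and a sample X with x_0 = 1.
theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, 0.4])          # x_0 = 1 multiplies the bias

score = float(np.dot(theta, x))        # S(X) = Theta . X = -1 + 1.6 + 0.2 = 0.8
prob = sigmoid(score)                  # H(x) ~= 0.69, i.e. positive class
```

A score of 0 maps to a probability of exactly 0.5, which is why the sign of S(X) is the decision boundary.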


Figure 2: Sigmoid function [15]

Classifying music according to genre is a case of multi-class classification, and the algorithm commonly used with this model is One-Versus-All: it consists in splitting the problem into several binary classification sub-problems. First class 1 is separated from all the others, then class 2 from all the others, and so on.

Figure 3: Logistic Regression for multi-class classification [16]

To prevent overfitting, the l1 and l2 regularisation methods can be used to adjust the value of the weights ω_i:

- Lasso (Least Absolute Shrinkage and Selection Operator) Regression (l1) consists of adding a regularisation term to the loss function:

  L(x, y) = Σ_{i=1}^{n} (y_i − f(x_i))² + λ Σ_{i=1}^{n} |ω_i|  (5)

  Lasso has multiple solutions and tends to shrink the less important features to zero, so it is particularly effective when there is a large number of features and the most important ones need to be selected.

- Ridge Regression (l2):

  L(x, y) = Σ_{i=1}^{n} (y_i − f(x_i))² + λ Σ_{i=1}^{n} ω_i²  (6)


Ridge regression has only one solution; it does not reduce the number of features but rather the impact of each feature.
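The difference between the penalty terms in equations (5) and (6) can be computed directly; the weight vector and λ below are arbitrary toy values.

```python
import numpy as np

weights = np.array([0.5, -2.0, 0.0, 1.5])   # toy model weights omega_i
lam = 0.1                                   # regularisation strength lambda

# Lasso (l1) penalty: lambda times the sum of absolute weights.
l1_penalty = lam * np.sum(np.abs(weights))  # 0.1 * 4.0 = 0.4
# Ridge (l2) penalty: lambda times the sum of squared weights.
l2_penalty = lam * np.sum(weights ** 2)     # 0.1 * 6.5 = 0.65
```

Note how the l2 term punishes the large weight (−2.0) disproportionately (4.0 of the 6.5), whereas the l1 term treats all magnitudes linearly, which is what drives small weights to exactly zero under Lasso.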

2.3.2 Decision Trees

The paper [17] shows that decision trees can be efficient in the classification of Latin music. They will therefore be studied in this thesis, along with ensemble methods which can potentially improve our models. With a decision tree [18], one can classify items by repeatedly separating them parameter by parameter. A decision tree is composed of three main types of elements: the nodes are the tests performed on the attributes; the edges are the results of the tests and connect one level of nodes to the next; and the leaf nodes are the last nodes of the tree and represent the final classes. There are two types of decision trees, regression trees and classification trees: the former can take continuous values as targets, for example to predict the price of a house, while the latter is composed of Yes/No questions and its targets are discrete/categorical. Training is an iterative process that consists of dividing the data into partitions and then distributing them to each of the branches.

The algorithm used to train decision trees for classification is Divide-and-Conquer; it splits the dataset into subsets at each node. The principle is to select a test for the first node, called the root node, which splits the set into two sub-parts by maximising the Information Gain or, equivalently, minimising the Gini Impurity. This action is then repeated recursively until every branch contains only instances of the same class. To avoid overfitting, a depth limit can be set. In order to define Information Gain and Gini Impurity, the concept of entropy must be specified. Entropy is a measure of the impurity, disorder, or uncertainty in a set of examples. For a dataset with C classes, with p_i being the proportion of elements of class i in the dataset:

E = - \sum_{i=1}^{C} p_i \log_2 p_i \qquad (7)

Information Gain (IG) measures how much "information" a feature gives us about the class; it can be computed as follows:

IG = Entropy(parent) − [weighted average] ∗ Entropy[children] (8)

Gini Impurity is the probability of incorrectly classifying a randomly cho- sen element in the dataset if it were randomly labelled according to the class distribution in the dataset:

G = \sum_{i=1}^{C} p_i (1 - p_i) \qquad (9)
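The two impurity measures above, Eqs. (7) and (9), can be computed directly from the class proportions (a minimal numpy sketch, not the thesis code):

```python
import numpy as np

def entropy(p):
    """Entropy E = -sum(p_i * log2(p_i)) over class proportions, Eq. (7)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

def gini(p):
    """Gini impurity G = sum(p_i * (1 - p_i)), Eq. (9)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1 - p)))

# A pure node has zero impurity; a uniform 2-class split is maximally impure.
print(entropy([0.5, 0.5]))  # 1.0
print(gini([0.5, 0.5]))     # 0.5
print(entropy([1.0]))       # 0.0
```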

2.3.3 Bagging: Random Forest

Random Forest is an ensemble learning method. According to James Surowiecki, the crowd is "wiser than any individual", which means that thanks to

their diversity, independence, decentralisation and aggregation, a combination of multiple classifiers gives a better one. For optimal results, classifiers with high variance and low bias should be grouped together in order to reduce the overall variance while maintaining a low bias. In decision trees, high variance means boundaries that are highly dependent on the training set; low bias means boundaries that are close on average to the true boundary. Random Forest is a special case of Bagging (bootstrap aggregating); the aim of such methods is to reduce the variance introduced by a single tree and thus reduce the forecasting error. To predict the result, the Random Forest algorithm averages the forecasts of several independent models (in the context of a classification it predicts the most frequent category). To build these models, several bootstrap replicates of the training set are created by sampling with replacement, so all the models can be trained in parallel. The algorithm not only uses Bagging but also randomises the features considered at each node.
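The two sources of randomness described above, bootstrap replicates and per-node feature subsets, can be sketched as follows (an illustrative numpy sketch; the sqrt(n_features) subset size is a common heuristic assumed here, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_replicates(n_samples, n_trees):
    """One bootstrap replicate (sampling with replacement) per tree;
    the trees can then be trained in parallel, each on its own replicate."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_trees)]

def random_feature_subset(n_features):
    """Random Forest also randomises the features considered at each node;
    sqrt(n_features) is a common (assumed) choice for the subset size."""
    return rng.choice(n_features, size=int(np.sqrt(n_features)), replace=False)

replicates = bootstrap_replicates(n_samples=100, n_trees=5)
subset = random_feature_subset(n_features=36)
# Each replicate keeps the training-set size but repeats some items and
# omits others, which is what decorrelates the individual trees.
```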

2.3.4 Boosting: Adaboost

Adaboost is one of the most used boosting algorithms and also builds on decision trees. Multiple binary classifiers are taken, each based on one feature, which gives a set of weak classifiers. The Adaboost principle is based on the assumption that a set of weak classifiers can give a strong one (figure 4). The idea is to loop over the classifiers with weighted samples, and when a sample is incorrectly classified its weight is increased. Specifically, the steps are:
1. Initialise weights: at the beginning they are uniform
2. Train one decision tree

3. Compute the weighted error rate e: count how many items (taking their weights into account) are misclassified
4. Compute the decision tree's weight depending on its error rate:

W_tree = learning\_rate \cdot \log\left(\frac{1 - e}{e}\right) \qquad (10)

5. Update the weights of misclassified items:

W_{item,new} = W_{item,old} \cdot \exp(W_{tree}) \qquad (11)

6. Repeat steps 2 to 5 for each tree
7. Final decision:

Pred_{final} = \sum_{t \in trees} W_t \cdot Pred(t) \qquad (12)

This means that each model is trained in a sequential way and learns from mistakes made by the previous models. While Random Forest aims to decrease variance and not bias, Adaboost aims to decrease bias but not variance.
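The weight-update steps, Eqs. (10) and (11), can be sketched on a toy example (a minimal numpy illustration, not the thesis code; the renormalisation of the item weights is an assumption, as it is standard in AdaBoost):

```python
import numpy as np

def tree_weight(error_rate, learning_rate=1.0):
    """Eq. (10): W_tree = learning_rate * log((1 - e) / e)."""
    return learning_rate * np.log((1 - error_rate) / error_rate)

def update_item_weights(weights, misclassified, w_tree):
    """Eq. (11): only misclassified items get their weight increased;
    the weights are then renormalised to sum to 1 (assumed convention)."""
    weights = weights.copy()
    weights[misclassified] *= np.exp(w_tree)
    return weights / weights.sum()

w = np.full(4, 0.25)                    # step 1: uniform weights
w_tree = tree_weight(0.2)               # step 4: e = 0.2 gives log(4)
w = update_item_weights(w, np.array([False, True, False, False]), w_tree)
# The misclassified item now dominates the training of the next tree.
```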


Figure 4: Adaboost classifier [19]

2.3.5 k-Nearest Neighbours

This method makes predictions based on the entire dataset. When a new value must be predicted, the algorithm looks for the K instances of the set closest to it, then uses the output values of these K nearest neighbours to compute the value of the variable to be predicted. [20] The parameter K has to be determined: a sufficiently high value is needed to avoid overfitting on noisy samples, but if the value is too high the model underfits and generalises poorly on unseen data; a compromise has to be found. For each new item i, the first step is to calculate its distance d to all the other values of the dataset and retain the K items for which the distance is minimal. [21] Then the optimal K is used to make the prediction: in the case of a regression, the next step is to calculate the mean (or median) of the output values of the selected K neighbours; in the case of a classification, it is to retrieve the most represented class among the K neighbours. For distance determination various formulas are available, as long as they satisfy the criteria of non-negativity, identity, symmetry and triangle inequality. Those commonly used are:
• Hamming distance: for two equal-length strings, this is the number of positions at which the characters differ.

• Manhattan distance:

d_m(x, y) = \sum_{i=1}^{n} |x_i - y_i| \qquad (13)

• Euclidean distance:

d_e(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (14)


• Minkowski distance:

d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}, \quad p \geq 1 \qquad (15)

• Tchebychev distance:

d_t(x, y) = \max_{i=1..n} |x_i - y_i| \qquad (16)

It is a method that has the advantage of being simple, transparent and intuitive while giving reliable results, but it is sensitive to redundant or useless features. [20] k-NN seems to give good results (an accuracy of 91%) when it comes to music classification, even when using MFCC features (defined in the features subsection, 2.4). [22]
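The distances of Eqs. (13)-(16) can be implemented directly (a minimal numpy sketch, not the thesis code):

```python
import numpy as np

def manhattan(x, y):               # Eq. (13)
    return float(np.sum(np.abs(x - y)))

def euclidean(x, y):               # Eq. (14)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def minkowski(x, y, p):            # Eq. (15), p >= 1
    return float(np.sum(np.abs(x - y) ** p) ** (1 / p))

def tchebychev(x, y):              # Eq. (16)
    return float(np.max(np.abs(x - y)))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(manhattan(x, y))    # 7.0
print(euclidean(x, y))    # 5.0
print(tchebychev(x, y))   # 4.0
# Minkowski generalises both: p = 1 gives Manhattan, p = 2 gives Euclidean.
```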

2.3.6 Support Vector Machine

Support Vector Machines are widely used for music classification and give good results, but the features used are often restricted to MFCCs and some rhythmic features. One approach uses several successive SVMs to improve the results. [23] The basic principle is to separate two groups of data while maximising the margin around the border (the distance between the two classes). It is based on the idea that almost everything becomes linearly separable when represented in a high-dimensional space. The two steps are thus: transforming the input into a suitable high-dimensional space, and then finding the hyperplane that separates the data while maximising the margins. In practice, kernel functions are used to reap the benefits of a high-dimensional space without actually representing anything in it; indeed, the only operation done in the high-dimensional space is the computation of scalar products between pairs of items. The commonly used kernels are:
• Linear:

K(x, y) = x \cdot y \qquad (17)

• Polynomial:

K(x, y) = (x^T y + 1)^p \qquad (18)

• Radial basis:

K(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\rho^2} \right) \qquad (19)

In some cases, it is possible to accept some outliers inside the margin in order to be able to separate the data. Although SVMs were created to deal with binary problems, there are two ways to adapt them to multi-class problems:
• One-versus-all: it consists in transforming a C-class classification problem into C binary classification problems, each using a single separator. The ranking is given by the classifier that fits best.

• One-versus-one: k(k−1)/2 binary classifiers are trained this time; the idea is that each class Ci is compared to every other class Cj ≠ Ci. The final ranking is given by majority vote.
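The three kernels of Eqs. (17)-(19) can be sketched directly (a minimal numpy illustration, not the thesis code):

```python
import numpy as np

def linear_kernel(x, y):                 # Eq. (17)
    return float(np.dot(x, y))

def polynomial_kernel(x, y, p=2):        # Eq. (18)
    return float((np.dot(x, y) + 1) ** p)

def rbf_kernel(x, y, rho=1.0):           # Eq. (19)
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * rho ** 2)))

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(linear_kernel(x, y))       # 1.0
print(polynomial_kernel(x, y))   # 4.0
print(rbf_kernel(x, x))          # 1.0: identical points have maximal similarity
```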


2.3.7 Naive Bayes

Bayes classifiers are based on a probabilistic approach employing Bayes' theorem; it gives the probability for an item to be in the class Ci knowing that the item has a set of features x = (x_1, ..., x_F).

P(C_i|x) = \frac{P(x|C_i) P(C_i)}{P(x)} = \frac{P(x|C_i) P(C_i)}{\sum_j P(x|C_j) P(C_j)} \qquad (20)

P(Ci) is called the prior, P(Ci|x) the posterior, P(x|Ci) the likelihood, and P(x) the evidence. The result to be computed is often based on several variables. Since the computation is complex, one type of classifier often used is the Naive Bayes Classifier [24]: it is assumed that these variables are independent. This is a strong assumption, which is why the word "naive" is used.
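Eq. (20) can be evaluated on a toy two-genre example (a minimal numpy sketch with hypothetical priors and likelihoods, not the thesis code):

```python
import numpy as np

def posteriors(priors, likelihoods):
    """Eq. (20): P(Ci|x) = P(x|Ci) P(Ci) / sum_j P(x|Cj) P(Cj)."""
    joint = np.asarray(priors) * np.asarray(likelihoods)
    return joint / joint.sum()  # the denominator is the evidence P(x)

# Hypothetical priors P(Ci) and likelihoods P(x|Ci) for two classes
p = posteriors(priors=[0.7, 0.3], likelihoods=[0.1, 0.5])
print(p)  # the second class wins despite its lower prior
```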

2.3.8 Linear Discriminant Analysis

This method is used to predict which predefined class an item belongs to, based on its characteristics measured through predictive variables. It was first introduced by Fisher in [25]. It achieved 71% accuracy on GTZAN [26]. Linear Discriminant Analysis is a dimensionality reduction technique, which means that it aims to reduce the number of dimensions (i.e. features) in the dataset while keeping as much relevant information as possible. It uses information from every feature in order to create a new axis and projects the data on this axis while maximising the distance between classes. For that purpose, the initial step is to compute the between class variance, which is the level of separability between classes (i.e. the distance between the means of the different classes). Then, the distance between the mean and the samples of each class must be computed; this metric is called the within class variance. Finally, the last stage is to construct the lower dimensional space which maximises the between class variance while minimising the within class variance.
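The two quantities above can be made concrete for a single feature: the ratio of between-class to within-class variance is exactly what LDA maximises when choosing its projection axis (an illustrative numpy sketch, not the thesis code):

```python
import numpy as np

def scatter_ratio(X, y):
    """Between-class over within-class variance for 1-D data; the direction
    found by LDA maximises this kind of ratio in the projected space."""
    X, y = np.asarray(X, float), np.asarray(y)
    mu = X.mean()
    between = within = 0.0
    for c in np.unique(y):
        xc = X[y == c]
        between += len(xc) * (xc.mean() - mu) ** 2   # class mean vs overall mean
        within += ((xc - xc.mean()) ** 2).sum()      # samples vs their class mean
    return between / within

# Two well-separated classes give a large ratio
X = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
y = [0, 0, 0, 1, 1, 1]
print(scatter_ratio(X, y))  # ≈ 937.5
```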

2.3.9 Neural Networks

Neural networks of all types are also widely used, including feed-forward neural networks. Results based only on high-level features give an overall accuracy of 85%. [27] A neural network (see figure 5) is a system whose architecture is inspired by the functioning of biological neurons, although nowadays it is getting closer and closer to mathematical and statistical methods.


Figure 5: Illustration of an artificial neural network [28]

A formal neuron is the elementary unit of an artificial neural network. When receiving signals from other neurons in the network, a formal neuron responds by producing an output signal which is transmitted to other neurons in the network. The signal received is a weighted sum of signals from different neurons. The final output signal is a function of this weighted sum:

y_j = f\left( \sum_{i=1}^{N} w_{i,j} x_i \right) \qquad (21)

y_j is the output of the formal neuron j. x_i for i ∈ {1, ..., N} are the signals received by neuron j from neurons i. w_{i,j} are the weights of the interconnections between neurons i and j. f, called the activation function, gives the output value; usually the identity, sigmoid, or hyperbolic tangent functions are used. For multi-class classification purposes, the simplest neural network is the Multi-Layer Perceptron (MLP) [29]. It is a network that contains several fully connected layers (each with several units). The training method, called gradient backpropagation, is used to find the weight values for each neuron that are most relevant for the subsequent classification. There are three types of neurons: the input cells are associated with the data (one for each input feature), the output neurons are each associated with a class, and the hidden neurons are in the intermediate layers. For very deep neural networks several problems appear. The first one concerns time and computing power, which very quickly become overwhelming. The training algorithm also has trouble working correctly; indeed, it often faces exploding or vanishing gradient issues. There are different ways, called regularisation methods, to deal with overfitting:

• Dropout: it consists in deactivating a percentage of units of a particular layer during training; more precisely, at each training step, neurons are either kept with probability p or dropped out with probability 1−p. This improves generalisation since it forces the layer to learn the same concept with different neurons. This method is commonly applied to fully connected layers.

15 2 BACKGROUND – 2.4 Features for content-based recommendation

Figure 6: Without/with dropout [30]

• Early stopping: the idea is to stop the training when the system starts to overfit, i.e. when the validation accuracy starts to decrease (figure 7). In order to achieve this, a validation set must be created; it allows the model to be tested at each epoch and the training to be stopped as soon as the validation accuracy decreases and overfitting appears.

Figure 7: Overfitting [31]
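The formal neuron of Eq. (21) and the dropout mechanism described above can be sketched in a few lines (a simplified numpy illustration assuming a sigmoid activation and inverted-dropout scaling; not the network used in the thesis):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def neuron_output(x, w):
    """Eq. (21): y_j = f(sum_i w_ij * x_i), here with a sigmoid activation."""
    return sigmoid(np.dot(w, x))

rng = np.random.default_rng(0)

def dropout(activations, p_keep=0.8):
    """Training-time dropout: keep each unit with probability p;
    inverted scaling (an assumed convention) keeps the expected activation."""
    mask = rng.random(activations.shape) < p_keep
    return activations * mask / p_keep

x = np.array([1.0, -1.0])
w = np.array([0.5, 0.5])
print(neuron_output(x, w))  # sigmoid(0) = 0.5
h = dropout(np.ones(1000))
# Roughly 20% of the units are zeroed; the survivors are scaled by 1/p.
```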

2.4 Features for content-based recommendation

The features that can be extracted from music, as mentioned in the papers read to carry out this thesis, are numerous and can be classified according to their level (low, middle, high). There are several representations of music. From a physical point of view, sound is a wave, i.e. an oscillation of pressure, which is generally transmitted through the ambient air. Sound is therefore a superposition of sound waves of different frequencies with different characteristics such as amplitude and phase. It is mainly from this representation that the features can be extracted. While some features use the signal in the time domain, others focus on its frequency shape. In fact, the discrete Fourier transform (DFT) can be used to decompose a digital time signal into its sinusoidal components, and thus pass it into the frequency domain.


2.4.1 Low-level features

Low-level features are those that can be computed immediately from the raw audio file using statistical, signal processing and mathematical methods. They can be grouped according to their nature: temporal, spectral, energetic, or perceptual. An audio signal is constantly changing, which is why the first step is to split the signal into short frames; this allows the hypothesis that the signal is statistically stationary within each frame. Usually the signal is framed into 20-40 ms spans: if shorter, the segment would not be long enough to give a reliable result, and if longer, the signal changes too much. Initially, the focus is on temporal features:
• Zero-crossing rate: it corresponds to the number of crossings between the signal and the zero axis within a given time frame. A high value is characteristic of a noisy sound while a low value indicates a periodic signal. [32] It is mainly used in music information retrieval to catch noise and percussive sounds. [7] As this value tends to be higher for percussive sounds, it helps, for example, to differentiate rock from metal. It can be computed for the frame t, K being the frame size, as follows: [7]

CR_t = \frac{1}{2} \sum_{k=t \cdot K}^{(t+1) \cdot K - 1} | \mathrm{sign}(s(k)) - \mathrm{sign}(s(k+1)) | \qquad (22)

where sign(s(k)) = 1 if s(k) ≥ 0, and −1 if s(k) < 0.

• Amplitude envelope: It computes the maximum amplitude among all samples for a frame t: [7]

AE_t = \max_{k=t \cdot K}^{(t+1) \cdot K - 1} s(k) \qquad (23)

• Root-mean-square energy: This feature is correlated to the perception of sound intensity, so it can be used to evaluate loudness. A low energy is particularly representative of classical music. [7] On one frame t:

RMS_t = \sqrt{ \frac{1}{K} \sum_{k=t \cdot K}^{(t+1) \cdot K - 1} s(k)^2 } \qquad (24)
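The three temporal features, Eqs. (22)-(24), translate directly into code (a minimal numpy sketch on a toy frame, not the thesis extraction pipeline):

```python
import numpy as np

def frames(s, K):
    """Split a signal into non-overlapping frames of K samples each."""
    n = len(s) // K
    return s[: n * K].reshape(n, K)

def zero_crossing_rate(frame):   # Eq. (22)
    sgn = np.where(frame >= 0, 1, -1)
    return 0.5 * float(np.sum(np.abs(np.diff(sgn))))

def amplitude_envelope(frame):   # Eq. (23)
    return float(np.max(frame))

def rms_energy(frame):           # Eq. (24)
    return float(np.sqrt(np.mean(frame ** 2)))

frame = np.array([0.5, -0.5, 0.5, -0.5, 0.5])  # toy frame alternating in sign
print(zero_crossing_rate(frame))   # 4.0: the sign changes four times
print(amplitude_envelope(frame))   # 0.5
print(rms_energy(frame))           # 0.5
fr = frames(np.arange(12.0), 4)    # 12 samples -> 3 frames of K = 4
```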

Spectral features are defined as follows: • Spectral centroid: It indicates the location of the centre of mass / barycentre of the spectrum, it represents the band where most of the energy is. [33] It is calculated as the weighted mean of the frequencies in the sound. Low Spectral Centroid usually corresponds to classical music, especially those with only piano. Other music tends to have Spectral Centroids that vary much more. [7]


In order to compute it, the spectrum is considered as a distribution: the values are the frequencies and the probabilities to observe them are nor- malised in amplitude. [32]

\mu = \int x \cdot p(x) \, dx \qquad (25)

where x are the observed data (x = freq_s(x)) and p(x) is the probability to observe x:

p(x) = \frac{ampl_s(x)}{\sum_x ampl_s(x)}

It is also possible to compute it in the following way:

SC_t = \frac{\sum_{n=1}^{N} m_t(n) \cdot n}{\sum_{n=1}^{N} m_t(n)}

• Spectral spread - Bandwidth: The spread (= bandwidth) can be defined as the variance of the distribution; it indicates how spread the spectrum is around its mean value. [32]

\sigma^2 = \int (x - \mu)^2 \cdot p(x) \, dx \qquad (26)

It is also possible to compute it in the following way:

SS_t = \frac{\sum_{n=1}^{N} m_t(n) \cdot |n - SC_t|}{\sum_{n=1}^{N} m_t(n)}

• Spectral skewness: It shows how asymmetric a distribution is around its mean value. [32] A value of 0 characterises a symmetric distribution; a higher value denotes a concentration of energy on the left, while a lower value denotes more energy on the right.

\gamma_1 = \frac{m_3}{\sigma^3} \qquad (27)

where m_3 = \int (x - \mu)^3 \cdot p(x) \, dx

• Spectral Kurtosis: It reveals how flat the distribution is around its mean value. [32] The value for a normal distribution is 3; higher means more peaked, lower means flatter.

\gamma_2 = \frac{m_4}{\sigma^4} \qquad (28)

where m_4 = \int (x - \mu)^4 \cdot p(x) \, dx


• Spectral roll-off frequency: It corresponds to the frequency value fc such that a percentage (e.g. 95%) of the signal energy is contained below this value. [32] By noting sr/2 the Nyquist frequency:

\sum_{f=0}^{f_c} a^2(f) = 0.95 \sum_{f=0}^{sr/2} a^2(f) \qquad (29)

• Band energy ratio: This indicator measures the extent to which low frequencies dominate high frequencies. It is calculated by selecting a limit value F called the split frequency. [7]

BER_t = \frac{\sum_{n=1}^{F-1} m_t(n)^2}{\sum_{n=F}^{N} m_t(n)^2} \qquad (30)

• Spectral Flux: It depicts the power change between two consecutive frames.

F_t = \sum_{n=1}^{N} (D_t(n) - D_{t-1}(n))^2 \qquad (31)

• Spectral Slope: This indicator quantifies the amplitude decay of the spectrum, it is calcu- lated by linear regression, it is thus of the following form: [32]

\hat{a} = slope \cdot f + const \qquad (32)

• Spectral Decrease: It also quantifies the amplitude decay of the spectrum but the method of calculation is more based on the perceptual part: [32]

decrease = \frac{1}{\sum_{k=2}^{K} s(k)} \sum_{k=2}^{K} \frac{s(k) - s(1)}{k - 1} \qquad (33)

• Spectral Flatness: The flatness reveals how close a sound is to white noise; a flat power spectrum (high flatness value) corresponds to white noise. It is expressed as the ratio of the geometric mean to the arithmetic mean of the power spectrum: [32]

flatness = \frac{\left( \prod_{n \in band} m_t(n) \right)^{1/K}}{\frac{1}{K} \sum_{n \in band} m_t(n)} \qquad (34)
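Two of the spectral features above can be sketched on a toy magnitude spectrum (a minimal numpy illustration using the discrete forms of the centroid and of Eq. (29); not the thesis code):

```python
import numpy as np

def spectral_centroid(magnitudes):
    """Discrete spectral centroid: SC = sum(n * m(n)) / sum(m(n)) over bins."""
    n = np.arange(1, len(magnitudes) + 1)
    return float(np.sum(n * magnitudes) / np.sum(magnitudes))

def spectral_rolloff(magnitudes, fraction=0.95):
    """Eq. (29): smallest bin below which `fraction` of the energy lies."""
    energy = np.cumsum(np.asarray(magnitudes) ** 2)
    return int(np.searchsorted(energy, fraction * energy[-1]) + 1)

m = np.array([0.0, 1.0, 4.0, 1.0, 0.0])  # energy concentrated around bin 3
print(spectral_centroid(m))  # 3.0: the centre of mass of the spectrum
print(spectral_rolloff(m))   # 4: 95% of the energy lies below bin 4
```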

The signal energy can also be taken into account:
• Global energy: It estimates the signal power at a given time. [32]


• Harmonic energy: It estimates the power of the harmonic part of the signal at a given time. [32]
• Noise energy: It estimates the power of the noise part of the signal at a given time. [32]

This last part defines the psycho-acoustic features. Their purpose is to characterise and model the hearing system. They make it possible both to evaluate values as perceived by humans and to predict the discomfort or annoyance that may be caused by certain sounds.

• MFCC: The Mel Frequency Cepstral Coefficients were introduced for speech and speaker recognition and turned out to be powerful for describing the power spectrum of an audio signal. Human vocal sounds are filtered by the shape of the vocal apparatus (mouth, tongue, teeth, ...). Determining the shape of the sound with precision should therefore give an exact representation of the phenomenon produced and of the way it is perceived. Different works proved that MFCCs can also be useful in the field of music similarity. [34]

Figure 8: Process to compute MFCC [35]

We then apply Hamming windowing to each frame in order to reduce the edge effects. [36]

The next step is simply to convert the signal into the frequency domain. For this we use the Fast Fourier Transform method to obtain the desired periodogram. This is motivated by the functioning of the human cochlea, which has the particularity of vibrating according to the frequency of the sound heard. More precisely, depending on the exact location of the vi- brating cochlea (detected by small hairs), the nerves transmit to the brain which frequencies are present.

As the cochlea cannot discern differences between close frequencies, seg- ments are summed to determine the amount of energy present in each frequency region. For this we use Mel filterbank, the first filter is very thin and indicates the concentrated energy close to 0 Hz. The more fre- quencies increase, the wider the filters become since the variations matter less.

The final step consists in computing the discrete cosine transform (DCT)


of the filter energies. The filters all overlap, so the goal of this step is to decorrelate the energies from each other.

Finally, as humans can discern small variations in pitch more easily in low frequencies than in high frequencies, the Mel scale is more suitable. Usually 13 coefficients are finally kept for each frame.

Frequencies to Mel: [7]

M(f) = 1125 ln(1 + f/700) (35)
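Eq. (35) and its inverse are easy to implement; the inverse is what places the Mel filterbank edges (a minimal numpy sketch; the 0-8000 Hz range below is an assumed example, not from the thesis):

```python
import numpy as np

def hz_to_mel(f):
    """Eq. (35): M(f) = 1125 ln(1 + f/700)."""
    return 1125.0 * np.log(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place the Mel filterbank edges in Hz."""
    return 700.0 * (np.exp(np.asarray(m) / 1125.0) - 1.0)

print(hz_to_mel(700.0))  # 1125 * ln(2) ≈ 779.79
# Filter edges spaced linearly in Mel become increasingly wide in Hz,
# matching the cochlea's coarser resolution at high frequencies:
edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 6))
```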

• Loudness: This is the first of the three sound quality descriptors: perceptual and subjective metrics that are often used to assess the noise nuisance caused by products or worksites. [37] Its value is non-linear and represents the sound volume as perceived by the human ear; it is an intensity sensation. [33] Generally, the human ear focuses on frequencies between 2000 and 5000 Hz, but this varies with age, population, culture, ... This is why songs that have the same sound pressure, physically measured in decibels (dB), but whose frequencies fall outside this range are perceived as softer by the human ear. [6] The loudness N is computed with the Zwicker and Stevens model: [37]

N = \int_{0\,Bark}^{24\,Bark} N'(z) \, dz \qquad (36)

N' is the specific loudness, i.e. the loudness density as a function of the critical band rate, measured in sone/Bark; so N'(z) is the loudness in the z-th Bark band. N, the loudness, is a value in sone and corresponds to the sound volume. One sone represents the perception of a sound volume equivalent to that of a pure 1 kHz tone at a pressure level of 40 dB; thus two sones correspond to a sound twice as intense as one sone for the average listener.
• Perceptual Sharpness: It is an indicator of the perception of a noise as high-pitched. [33] It is the perceptual equivalent of the spectral centroid and is therefore calculated from the loudness. [32] Low sharpness corresponds to "dull sounds" while high sharpness corresponds to "screeching sounds". Generally, listeners prefer dull sounds, but an extremely low value can also be annoying. One possible model, called Aures, is computed as follows: [37]

S = c \cdot \int_{0\,Bark}^{24\,Bark} \frac{N'(z) \, g_s(z)}{\ln\left(\frac{N + 20}{20}\right)} \, dz \qquad (37)

c is a correction factor, g_s(z) is the weighting function for sharpness, and S is measured in acum.


• Perceptual Spread: It is a measure of the distance between the largest specific loudness and the total loudness: [32]

S_d = \left( \frac{N - \max_z N'(z)}{N} \right)^2 \qquad (38)

• Perceptual Roughness: It evaluates the perception of time envelope modulations for frequencies between 20 and 150 Hz, maximum at 70 Hz (low and middle frequency variations). It allows to quantify the rapid variations that can be perceived as dissonant for the listener. [33] As for the loudness: [38]

R = \int_{0\,Bark}^{24\,Bark} R'(z) \, dz \qquad (39)

where R' is the specific roughness.

2.4.2 Middle-level features

Middle-level features focus on aspects that are meaningful musically and are understandable by a music expert. The first ones are focused on the harmony and melody of the music. Harmony is defined as the combined use of different pitch values and chords in music; it is called the vertical part of music. Melody is the horizontal part; it describes a sequence of pitched events that are perceived as a whole. [39] Different features allow information on harmony and melody to be extracted:
• Pitch: The pitch is related to the fundamental frequency, i.e. the frequency whose integer multiples best fit the spectral content of a signal. [40] It is used to qualify sounds as "high" or "low" in the sense associated with musical melodies. To estimate the pitch, the so-called tuning system is often estimated; it defines the tones (the choice of number and spacing of frequency values) used in the music.

• Tonality / Modality: It outlines the relationship between simultaneous and consecutive tones. [40] It indicates whether the mode of the track is major or minor. The following focus on the temporal and rhythmic properties of music:

• Duration of the track: The duration of a given music is a simple element to extract that can help us to classify music. • Onset events: Onset detection is about finding the temporal position of all sonic events in a piece of music. • Metrical levels: Metrical levels correspond to the different levels of embedded impulses


present in a piece of music; generally, higher metrical levels are multiples of lower ones. [40] The lowest level, called tatum, corresponds to the shortest durational values. The one that the listener would describe as "most important" is called tactus; it corresponds to foot tapping, or what is commonly called the beat. The tactus enables the definition of the tempo, which is the rate of the tactus pulse. [41]
• Beat: The beat is the fundamental unit of time. Usually it is between 40 and 200 beats per minute. [40]
• Rhythm: The rhythm also describes a pattern repeated in time, over longer periods than that of the beats. [40]

2.4.3 High-level features

High-level features are the ones that can be understood by any listener; they describe music as it is perceived by humans. As they require interpretation, they sometimes seem intuitive, but they are complex to extract reliably, and care must be taken since these features are not always relevant. Moreover, most of them are classified as "trade secrets" and are held by The Echo Nest (owned by Spotify), among others.

• Danceability: This parameter estimates the ability of music to make people dance. It usually takes values between 0 and 3; the higher the value, the more danceable the music is. [42] One way to calculate it could be based on the velocity v at each sample time t and the tempo of the music: [43]

D = tempo \cdot \sum_t v(t) \qquad (40)

• Liveness: It consists in determining whether or not an audience is present while recording. • Speechiness: The predominance of voices in a music makes it possible to differentiate for example slam / rap which will have very high values from jazz / classical where the values will be very low. [43] • Instrumentalness: This feature contrasts with the previous one, a strong instrumentalness value corresponds to a strong domination of the instruments. [43] • Instruments and Singer: Knowing the instruments present, and knowing if there is a singer as well as if it is a man or a woman can help to recommend the best music.

23 2 BACKGROUND – 2.5 Features selection algorithms

• Valence: The valence characterises the mood of a piece of music: a high value corresponds to joyful, lively music while a low value indicates sad, low-energy, or even depressing music.
• Lyrics: The mood of the music can also be determined through the lyrics. Natural Language Processing (NLP) is used to extract information from them; this machine learning method is used to analyse texts and extract relevant information. The first step, after retrieving the lyrics (from websites like http://www.lyrics.com) in text form, is a preprocessing step in which punctuation and stopwords ('now', 'how', 'I', 'they', ...) are removed. The words are then vectorised to extract redundant topics for each genre.

2.5 Features selection algorithms

Many features can be extracted from audio files. The task is to eliminate those that are irrelevant or less significant and would increase the complexity of the model as well as the computation time while making predictions less reliable. Feature selection is usually defined as a process of searching for a "relevant" subset of features. The selection algorithms used to evaluate a subset of features can be classified into three main categories: filter, wrapper and embedded.

2.5.1 Filter model

The aim is to assess the relevance of a feature based on measures that rely on the properties of the learning data. It is a preprocessing step that filters the features before performing the actual classification.

Let X = {xk|xk = (xk,1, xk,2, ..., xk,n), k = 1, 2, ..., m} be a set of m training values. Let Y = {yk, k = 1, 2, ..., m} be the labels of training values. To determine the relevance of a feature, there are several evaluation criteria.

• Correlation criteria: it is used in the case of a binary classification, µi and µy represent respectively the mean values of the feature i and its labels: [44]

C(i) = \frac{\sum_{k=1}^{m} (x_{k,i} - \mu_i)(y_k - \mu_y)}{\sqrt{\sum_{k=1}^{m} (x_{k,i} - \mu_i)^2 \sum_{k=1}^{m} (y_k - \mu_y)^2}} \qquad (41)

• Fisher criteria: measures the degree of separability of the classes using a given feature. n_c, µ_c^i and σ_c^i represent respectively the number of samples, the average and the standard deviation of the i-th feature within class c. µ^i is the overall average of the i-th feature. [45]

F(i) = \frac{\sum_{c=1}^{C} n_c (\mu_c^i - \mu^i)^2}{\sum_{c=1}^{C} n_c (\sigma_c^i)^2} \qquad (42)


• Mutual Information: measures the dependence between the distributions of two populations:

I(i) = \sum_{x_i} \sum_{y} P(X = x_i, Y = y) \log\left( \frac{P(X = x_i, Y = y)}{P(X = x_i) P(Y = y)} \right) \qquad (43)

• Signal-to-Noise Ratio coefficient: similar to the Fisher criterion, it is a score that measures the discriminatory power of a feature between two classes:

SNR(i) = \frac{2 \cdot |\mu_{C_1}^i - \mu_{C_2}^i|}{\sigma_{C_1}^i + \sigma_{C_2}^i} \qquad (44)

This filtering method is efficient and robust against overfitting. However, since it does not take interactions between features into account, it tends to select features carrying redundant rather than complementary information. Moreover, this method does not take into account the performance of the classification method that will be applied once the selection is made.
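As an illustration of a filter criterion, the Fisher score of Eq. (42) can be computed for a single feature (a minimal numpy sketch on toy data, not the thesis code):

```python
import numpy as np

def fisher_score(x, y):
    """Eq. (42): between-class over within-class variance for one feature."""
    x, y = np.asarray(x, float), np.asarray(y)
    mu = x.mean()
    num = den = 0.0
    for c in np.unique(y):
        xc = x[y == c]
        num += len(xc) * (xc.mean() - mu) ** 2
        den += len(xc) * xc.var()            # n_c * (sigma_c^i)^2
    return num / den

# A feature that separates the classes well gets a much higher score
x_good = [0.0, 0.2, 5.0, 5.2]
x_bad = [0.0, 5.0, 0.2, 5.2]
y = [0, 0, 1, 1]
print(fisher_score(x_good, y) > fisher_score(x_bad, y))  # True
```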

2.5.2 Wrapper model

The wrapper method was introduced by Kohavi and John [46]. In this case, the evaluation is done using a classifier that estimates the relevance of a given subset of features. This is why the subset of features selected by this method matches the classification algorithm used, but the subset is not necessarily valid if the classifier is changed. The most common implementations of the wrapper approach are:
• Forward selection: start with no features and add the most relevant one at each step:
1. Choose the significance level α (e.g. 0.05)
2. Select the feature that fits the model with the lowest p-value
3. If p-value < α, add the feature to the feature set and go back to step 2; else stop the process.
• Backward elimination: start with every feature, and remove the most insignificant one at each step:
1. Choose the significance level α (e.g. 0.05)
2. Fit the model with all features in the feature set
3. Consider the feature with the highest p-value
4. If p-value > α, remove the feature from the feature set and go back to step 2; else stop the process.
• Stepwise Selection / Bidirectional elimination: Similar to forward selection, a feature is added at each iteration, but the significance of features already added is also verified, and a feature can be removed through backward elimination if needed.
1. Choose the significance level α (e.g. 0.05)


2. Perform steps 2 and 3 of forward selection
3. Perform steps 2, 3 and 4 of backward elimination
4. Repeat 2 and 3 until the optimal feature set is found

This method is considered the best: it selects a small subset of features that work well with the classifier used. However, two main drawbacks limit it: firstly, the computation time is much longer than for the filter method, and the cross-validation often used to reduce the risk of overfitting worsens this problem; secondly, the selection mechanism must be applied anew for each classifier to test.
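The greedy structure of forward selection can be sketched independently of the statistical test used (an illustrative sketch: `score_subset` stands in for the wrapped classifier's validation score, and the toy feature names and gain threshold are assumptions, where the thesis steps use p-values):

```python
def forward_selection(features, score_subset, min_gain=0.0):
    """Greedy forward selection: start empty, repeatedly add the feature that
    improves the subset score the most, and stop when no addition helps."""
    selected, best = [], score_subset([])
    remaining = list(features)
    while remaining:
        top_score, top_f = max((score_subset(selected + [f]), f) for f in remaining)
        if top_score - best <= min_gain:
            break  # no remaining feature improves the score: stop
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected

# Toy score: "zcr" and "mfcc" are each sufficient, hence mutually redundant;
# the greedy search therefore keeps only one of them.
def toy_score(subset):
    return 1.0 if ("zcr" in subset or "mfcc" in subset) else 0.0

print(forward_selection(["tempo", "zcr", "mfcc"], toy_score))  # ['zcr']
```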

2.5.3 Embedded model

In contrast to the two previous methods, this one incorporates feature selection into the learning process itself; it can be built into an algorithm such as SVM, AdaBoost, or a decision tree. Examples include LASSO, Ridge Regression, and Elastic Net. As the selection is done during training, the overfitting risk is high.
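A minimal sketch of embedded selection via LASSO, one of the examples named above: the L1 penalty drives some coefficients to exactly zero during training, so selection happens inside the learning step. The data and the `alpha` value are illustrative assumptions, not values from this thesis.

```python
# Embedded feature selection sketch using LASSO (L1 regularisation).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 30 features, only 5 actually informative.
X, y = make_regression(n_samples=200, n_features=30,
                       n_informative=5, noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# Features whose weight survived the L1 penalty.
kept = np.flatnonzero(lasso.coef_)
print(len(kept))
```

Raising `alpha` strengthens the penalty and shrinks more coefficients to zero, i.e. selects fewer features.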


3 Methods

3.1 Chosen approach

Since the project was carried out for Research and Development purposes, the company could not provide data. As sufficient open-source data on users is difficult to find, it was decided, in agreement with the company, to develop a content-based recommendation system. The literature describes various works using different models and features, without consensus on which one is "the best". Indeed, several papers report experiments on a single model, sometimes with a very limited subset of features, which results in a lack of standardisation and comparability. My goal is therefore to determine both the model (and its optimal parameters) and the subset of features that yield the best possible recommendations.

3.2 Datasets

Various datasets are used in Music Information Retrieval, but some characteristics are necessary for a dataset to suit this thesis. The first requirement is a large amount of data: many tracks, but also a variety of genres and artists, which helps minimise the risk of overfitting. The second major requirement is that the audio files are available, so that as many features as possible can be extracted. This way the company is not limited to the music in the dataset: new tracks can be added and their features computed in the same way as for the dataset.

During the state-of-the-art phase, several datasets caught my attention. The Million Song Dataset (MSD) is one of the first large datasets to be released. It was developed by researchers from LabROSA and The Echo Nest and presented at ISMIR 2011 1. It combines data from playme.com 2, The Echo Nest 3, 7digital 4 and musicbrainz.org 5, which is why it gives access to numerous metadata and pre-computed features for more than a million tracks. [47]

The MSD can be used together with the MMTD (Million Musical Tweets Dataset), proposed at ISMIR 2013 6, which provides user data extracted from microblogs and social media. [48] However, while a large number of features are available, the raw audio is not provided, so it is not possible to compute the desired features. Also, the communities around this dataset are no longer very active and many links to access its information have expired. [49]

The LFM-1b dataset was accepted at ICMR 2016 7. It is composed of one billion music listening events generated by around 120 thousand Last.fm 8 users on 32 million tracks. It gives access to both user metadata (country, age, number of times a piece of music is played, ...) and item metadata, so its main advantage is that it can be used to experiment with collaborative filtering approaches. [50]

1http://www.ismir.net/conferences/ismir2011.html 2http://www.playme.com/ww/web/radio/ 3http://the.echonest.com/ 4https://fr.7digital.com/ 5https://musicbrainz.org/ 6http://www.ismir.net/conferences/ismir2013.html 7http://www.icmr2016.org/ 8https://www.last.fm/


However, once again, this dataset does not give access to the raw files, so feature extraction is not possible.

The MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset was created for the International Piano-e-Competition. [51] It is composed of 1200 virtuosic piano tracks whose audio can be downloaded; some metadata as well as the MIDI files are available. These are useful in particular for generating music with artificial intelligence. Although the raw audio is available, the dataset remains very small and all the music is of the same genre, so it would be complicated to offer varied content that can please all tastes.

3.2.1 GTZAN

GTZAN is a minimalist dataset containing one thousand 30-second music samples from 10 different genres. [52] Raw audio with a single label per track is provided. The music chosen for each genre is particularly typical and the tracks strongly resemble each other. This dataset is widely used in the field of music recommendation: it contains music that is typically representative of its genre and tends to give good results. The high performance of previous work also comes from the fact that this dataset contains many redundancies. Some of these repetitions are exact, i.e. the fingerprints of the tracks are identical, but there are also different versions (studio, live) of the same piece. Finally, artists are often represented by several tracks. [52] Mislabelings and distortions can also be observed in this dataset.

3.2.2 Free Music Archive

The FMA (Free Music Archive) is a large-scale dataset created for music analysis, composed of more than 100,000 tracks from 161 genres. [53] It provides various track-level (including low-level features), artist-level and album-level metadata in CSV files. For 13,000 tracks, Echo Nest features are provided. Its main advantage is that the dataset contains the raw audio in high quality and full length. Each track is labelled with a main genre (among 16 genres) but may also have several sub-genres. Moreover, it is possible to download smaller subsets. [54]

The FMA dataset seems the most suitable since it is quite large and provides the audio files, allowing us to analyse them and extract the desired features. Its downside is that it contains data exclusively about the music (as well as its artist and album). As we have no user data, we will have to rely on content-based recommendation only. The available subsets are:

• Small: 8,000 tracks - 8 genres - 30 seconds. 1,000 tracks per genre, all balanced (table 1).


Genres          Number of tracks
Electronic      1000
Experimental    1000
Folk            1000
Hip-Hop         1000
Instrumental    1000
International   1000
Pop             1000
Rock            1000

Table 1: Number of samples per class in the small dataset

• Medium: 25,000 tracks - 16 genres - 30 seconds. In the medium and larger subsets the number of samples per class is not the same, so class imbalance issues will have to be dealt with (table 2).

Genres                Number of tracks
Blues                 74
Classical             619
Country               178
Easy Listening        21
Electronic            6311
Experimental          2249
Folk                  1516
Hip-Hop               2197
Instrumental          1349
International         1018
Jazz                  384
Old-Time / Historic   510
Pop                   1186
Rock                  7097
Soul-RnB              154
Spoken                118

Table 2: Number of samples per class in the medium dataset

• Large: 103,000 tracks - 161 genres - 30 seconds (table 3)

• Full: 106,000 tracks - 161 genres - full-length


Genres                Number of tracks
Blues                 110
Classical             1230
Country               194
Easy Listening        24
Electronic            9372
Experimental          10608
Folk                  2803
Hip-Hop               3552
Instrumental          2079
International         1389
Jazz                  571
Old-Time / Historic   554
Pop                   2332
Rock                  14182
Soul-RnB              175
Spoken                423
NaN                   56976

Table 3: Number of samples per class in the large dataset

Initially, the tests were carried out on the small and medium subsets. When more disk space became available, the large subset was used.

3.2.3 Data augmentation

One of the drawbacks of the FMA dataset, visible in the previous tables, is class imbalance: some genres are over-represented compared to others. To tackle this issue one can use under- and over-sampling methods. The former consists in reducing the number of samples from over-represented classes, while the latter consists in duplicating data from under-represented classes to obtain more data. To prevent overfitting on our new dataset and to increase the robustness of the system, these methods can be combined with data augmentation; this is also useful for datasets such as GTZAN that contain very little music.

Data augmentation is widely used for images, but some techniques also exist for audio files. It makes it possible to add slightly different data by applying minor variations. I have chosen to re-implement the procedures introduced in [55]. One possible method is to add randomly generated noise to the signal. The intensity of the noise can be varied, but care must be taken to ensure that the melody remains. It is also possible to "shift the music in time", i.e. the augmented track begins at second s and the original beginning of the track is attached at the end. Another method is to change the speed of the music and thus stretch or shorten it. Finally, the pitch can also be changed randomly to put more focus on the bass or on the high-pitched sounds. It is often wise and efficient to mix several of these variations.

The research paper [55] specifies that even if the results are encouraging, they are preliminary, and the methods used must be chosen with the purpose of the project in mind. The set of features will be slightly adapted according to the methods used. Indeed, when using the method that changes the pitch, features that depend on the

pitch of a sound will not be taken into account, and the same goes for the tempo when changing the speed.
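The first two augmentations described above (noise addition and time shifting) can be sketched with plain NumPy; the noise intensity and shift amount are illustrative assumptions. Pitch shifting and time stretching are more involved and are typically delegated to a library such as librosa (`librosa.effects.pitch_shift`, `librosa.effects.time_stretch`), so they are not re-implemented here.

```python
import numpy as np

def add_noise(signal, intensity=0.005, seed=None):
    """Add low-amplitude Gaussian noise; `intensity` scales the noise
    relative to the signal's standard deviation (illustrative default)."""
    rng = np.random.default_rng(seed)
    return signal + intensity * np.std(signal) * rng.standard_normal(len(signal))

def time_shift(signal, seconds, sr):
    """Start the clip at second `seconds`; the cut-off beginning is
    re-attached at the end, as described in the text."""
    return np.roll(signal, -int(seconds * sr))

# Illustrative 1-second 440 Hz tone instead of a real track.
sr = 22050
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noisy = add_noise(tone, seed=0)
shifted = time_shift(tone, 0.25, sr)
```

Both transforms preserve the signal length, so the downstream feature extraction pipeline needs no changes.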

3.3 Features extraction

3.3.1 Preprocessing

The tracks are in MP3 format for the FMA dataset and in WAV format for GTZAN. The first step is to extract the spectral envelopes. The datasets were then split into a training set containing 90% of the data and a test set containing the remaining 10%. For the later hyperparameter optimisation, a third set, called the validation set, will be introduced. From the spectral envelope of each track, an attempt is made to extract as many useful features as possible in order to best describe all the characteristics of the music. Depending on the feature, there are three different extraction modes: at a given time t, over the whole track, or (in most cases) frame by frame.
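The 90/10 split mentioned above can be sketched as a shuffled index split; the seed and helper name are illustrative, not taken from the thesis code.

```python
import numpy as np

def train_test_split_90_10(n_samples, seed=0):
    """Shuffle sample indices and split them 90% train / 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(0.9 * n_samples)
    return idx[:cut], idx[cut:]

train_idx, test_idx = train_test_split_90_10(1000)
```

A validation set for hyperparameter tuning can be carved out of the training indices the same way.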

3.3.2 Chosen features

Once the preprocessing was done and the spectral envelopes extracted, the features were computed. For this purpose, three libraries were used. The main one is librosa (library for Recognition and Organization of Speech and Audio) [56], an open-source Python package for audio and music signal processing. Essentia [57] was used to extract danceability; this open-source C++ library is intended for audio-based music information retrieval projects. Finally, to extract perceptual features, the Yaafe (Yet Another Audio Feature Extractor) library [58] was employed.

The features extracted to perform the tests are: loudness, perceptual sharpness and roughness, danceability, spectral decrease, spectral roll-off, spectral flux, spectral centroid, spectral bandwidth, spectral flatness, spectral slope, tuning, root-mean-square energy, zero crossing rate, onset events, and the 20 MFCC coefficients.

The extracted features are of two types: global and instantaneous. A global feature has a unique value per track, computed on the whole signal. Instantaneous features are computed on a short segment of time called a frame (around 20 ms). In this second case, two methods have been studied to feed the values to our models: the first is to keep only statistical descriptors (mean, variance, skewness, kurtosis, ...) of the values obtained over the frames; the second is to keep the temporal dimension and to use an adapted neural network.
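Collapsing a frame-level feature trajectory into the statistical descriptors mentioned above can be sketched as follows. The helper name is hypothetical, and the kurtosis is computed as excess kurtosis (the normal distribution maps to 0), which is an assumed convention rather than one stated in the thesis.

```python
import numpy as np

def frame_stats(frames):
    """Collapse a 1-D frame-level feature trajectory into the
    statistical descriptors used as model inputs."""
    x = np.asarray(frames, dtype=float)
    mu, var = x.mean(), x.var()
    z = (x - mu) / np.sqrt(var)
    return {
        "mean": mu,
        "var": var,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4) - 3.0,  # excess kurtosis (assumption)
    }

# Example: descriptors of a short spectral-centroid trajectory.
stats = frame_stats([0.1, 0.2, 0.15, 0.4, 0.3])
```

Applying this to each instantaneous feature yields a fixed-length vector per track, regardless of the track's duration.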

3.3.3 Wrapper model for feature selection

To avoid information redundancy and limit parasitic features, the wrapper model, defined in the Background section, will be applied to keep only a subset of relevant features.


3.4 Models

All models defined in the state of the art have been tested in the experiments: logistic regression, decision tree, random forest, AdaBoost, k-nearest neighbours, support vector machine, naive Bayes and feed-forward neural networks. Their parameters have been varied in order to obtain the optimal settings for each model. These settings are called hyperparameters: parameters that are set in advance and are not learned from the data. To evaluate each model in an unbiased manner, the best hyperparameters are selected by computing the error on the validation set, and the generalisation performance of the model is determined by computing the error on the test set.

3.4.1 Hyperparameter tuning

For a specific model, a distinction is made between parameters and hyperparameters. The former are internal to the model: they are estimated during the training phase and used later to make predictions. Hyperparameters are external parameters determined prior to training. To determine the optimal hyperparameters, grid search was used: for each hyperparameter to be optimised, a range of values to test is specified. To avoid overfitting as much as possible, a cross-validation strategy is adopted; to obtain relevant results while maintaining reasonable computation times, 5-fold cross-validation was used.

For logistic regression, the main parameter to vary is the solver. The first one suited to multiclass classification is the nonlinear conjugate gradient method (newton-cg): based on Hessian matrices, it tends to be slow for large datasets since the second partial derivatives have to be computed. The second is limited-memory Broyden-Fletcher-Goldfarb-Shanno (lbfgs): it also requires second derivatives, resulting in slowness, but memory usage is better optimised as it only stores a few updates. Stochastic average gradient descent (sag and saga) is faster for large datasets but expensive in memory; it estimates the gradient by computing it on a randomly chosen subset. The penalty parameter can also be varied between l1, l2, and elasticnet, depending on the solver.

To find the optimal parameters of a decision tree, it is necessary to determine both the decision criterion (gini, entropy) and the maximum depth of the tree, in order to obtain good performance while avoiding overfitting. In the same way, for the random forest the criterion and the maximum depth are varied, but also the number of trees (called the number of estimators). For boosting, the number of estimators can also be tweaked.
Point weights for the k-nearest-neighbours method can be computed either uniformly (all points have the same weight) or inversely proportional to distance (the closest neighbours have a stronger influence than the farthest ones). The support vector machine method supports varying the kernel used (linear, polynomial, sigmoid, or rbf); the regularisation parameter named C is used to vary the penalty.

Two major parameters can affect the results when using linear discriminant analysis: the solver and the shrinkage intensity. The main solvers are least squares (lsqr), eigenvalue decomposition (eigen), and singular value

decomposition (svd). The shrinkage can vary between 0 and 1; to automate this process, one can use the Ledoit-Wolf lemma.

Neural network hyperparameters are numerous, which is why finding the optimal parameters is slow. First of all, the number of epochs (the number of full passes of the training data through the model during the forward and backward phases) and the batch size must be determined. Then, the optimizer that offers the best results is selected. It is also possible to use the well-known stochastic gradient descent algorithm while optimising the learning rate (which controls how much the weights are updated) and the momentum (which controls the influence of past steps); the purpose is to identify which of these two approaches is best. The way the weights are initialised before the first forward pass also influences the results. The main approaches are either to initialise all weights to the same value (usually 0 or 1) or to draw them randomly (often from a uniform distribution); to avoid exploding or vanishing gradient issues, heuristics (whose formula depends on the number of layers) are used to set the weights. Various activation functions exist to ensure the convergence of the network and control the training speed. Finally, to prevent overfitting, dropout is introduced: a percentage of neurons is randomly dropped (temporarily disabled) at each epoch.
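The grid search with 5-fold cross-validation described above can be sketched with scikit-learn's `GridSearchCV`; the SVM parameter grid and the synthetic data below are illustrative, not the grids actually used in the thesis.

```python
# Grid search over SVM hyperparameters with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative synthetic data standing in for the feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Illustrative grid: kernel type and penalty parameter C.
grid = {"kernel": ["linear", "rbf"], "C": [0.5, 1, 2]}
search = GridSearchCV(SVC(), grid, cv=5)  # 5-fold CV per combination
search.fit(X, y)

best = search.best_params_  # combination with the highest mean CV accuracy
print(best)
```

After fitting, `search.best_estimator_` is refit on the full training data with the winning combination, and only the held-out test set is used to report generalisation performance.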

3.5 Evaluation

The evaluation of the recommendation system is carried out in two stages: first, a quantitative evaluation based on mathematical and statistical techniques; second, a few people are asked to listen to the recommendations made.

3.5.1 Evaluation of the classification using labels

Each track is labelled and therefore has a genre considered as the ground truth; the model then aims to predict a genre for each track of the test set. Multiple methods [7] are used to assess the classification of songs.

The first one is precision, which indicates how many of the retrieved items are indeed relevant. [7] Let C be the set of classes, Rel_c the set of relevant items for class c, and Ret_c the set of retrieved items for class c. The precision of class c and the average precision are:

\[ P_c = \frac{|Rel_c \cap Ret_c|}{|Ret_c|}, \qquad P = \frac{1}{|C|} \sum_{c \in C} P_c \]

For recommendation systems, the goal is to ensure that the majority of the predicted items are relevant. Indeed, platforms want to keep their customers, which is why high precision is desired.

The second metric is recall, which reveals how many of the relevant items are retrieved. [7] The recall of class c and the average recall are:

\[ R_c = \frac{|Rel_c \cap Ret_c|}{|Rel_c|}, \qquad R = \frac{1}{|C|} \sum_{c \in C} R_c \]


This metric is mainly used in information retrieval (for instance in medicine, where the objective is to detect all sick patients: some healthy patients may be misclassified as sick, but the opposite must be avoided). Finally, the F-measure can also be used as a measure of the test's accuracy. The average F-measure is defined as follows:

\[ F_\beta = \frac{(1 + \beta) \cdot P \cdot R}{\beta \cdot P + R} \]

It can also be computed for one given class by considering the precision and recall of that class. The F1-score, the harmonic mean of precision and recall, is often used, but a higher β value can be chosen to give more importance to recall over precision.
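The three metrics can be computed from scratch as a sketch, following the per-class definitions given above. The F-measure here uses a linear β weighting, matching the formula in the text (the more common variant weights with β²); the function name is illustrative.

```python
def macro_scores(y_true, y_pred, beta=1.0):
    """Macro-averaged precision, recall, and F_beta over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        ret = sum(1 for p in y_pred if p == c)                     # |Ret_c|
        rel = sum(1 for t in y_true if t == c)                     # |Rel_c|
        hit = sum(1 for t, p in zip(y_true, y_pred) if t == p == c)
        precisions.append(hit / ret if ret else 0.0)
        recalls.append(hit / rel if rel else 0.0)
    P = sum(precisions) / len(classes)
    R = sum(recalls) / len(classes)
    denom = beta * P + R
    F = (1 + beta) * P * R / denom if denom else 0.0
    return P, R, F

# Toy example with two genres.
P, R, F = macro_scores(["rock", "rock", "jazz", "jazz"],
                       ["rock", "jazz", "jazz", "jazz"])
```

With β = 1 the formula reduces to the harmonic mean 2PR/(P + R), i.e. the F1-score.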

3.5.2 Evaluation of the prediction using confusion matrices

Another very useful tool is the confusion matrix. For multi-genre classification its use is essential: it shows which music genres are best recognised and, above all, how the errors are distributed. Indeed, as some genres are close to each other, the impact of some errors is low. Ideally, the matrix shows a diagonal with high accuracy percentages and all other values as low as possible. However, the visualisation of errors is also very important, since the boundaries between genres are not always well defined.
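A confusion matrix is simple to build by hand, as a sketch; rows index the true genre and columns the predicted genre, and the genre labels below are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true genre, columns = predicted genre."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

labels = ["classical", "jazz", "rock"]
cm = confusion_matrix(["rock", "rock", "jazz"],
                      ["rock", "jazz", "jazz"], labels)
```

Normalising each row by its sum gives the per-genre percentages shown in the figures, with the diagonal holding the correctly classified share.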

3.5.3 Evaluation of the prediction based on human opinion

The purpose of this project is to build a music recommendation system, so even if predicting genres is useful, it cannot be the only criterion: above all, recommendations must be relevant according to human judgment. This method has two main drawbacks: it is time-consuming and subjective. Everyone will have a different opinion according to their personal tastes and the attention they pay to rhythm, melody, lyrics, etc. Judges may be asked to give a score between 0 (very different) and 100 (very similar), or to categorise recommendations as "very similar", "somewhat similar" or "not similar at all". For this type of evaluation, care must be taken to obtain a representative sample of the population (especially of the target population of the application). As explained in [59], it is preferable to have great diversity, whether demographic (age, gender, ...), geographic (country, city, ...), or personality-based (opinions, interests, lifestyle, ...). It is also interesting to rank the judges according to their listening experience; there could be four classes: those who do not listen to music, those who listen occasionally, those for whom music is an important part of life, and those who have strong musical knowledge.


4 Results

4.1 Preliminary results

Some preliminary tests were carried out on each dataset (FMA and GTZAN) to determine which one would be used.

4.1.1 Tests on FMA

For each track there is a "genre top" label, a unique value among the 16 main genres (Blues, Classical, Country, Easy Listening, Electronic, Experimental, Folk, Hip-Hop, Instrumental, International, Jazz, Old-Time/Historic, Pop, Rock, Soul-RnB and Spoken). This value is missing for half of the large dataset but always indicated for the others. Another useful label is "genres all", which contains a set of genres (among the 161 sub-genres) for each track.

In order to achieve a supervised classification, the first step of the preprocessing is to discard the tracks that have no label (neither genre top nor genres all). The Spoken genre was discarded since we are only interested in music, and unreadable audio files were discarded as well. Once the reduced dataset was obtained, the spectral envelope and sampling rate of each track were extracted in order to compute the features.

The first results, presented in table 4 and obtained using the FMA small dataset, are not as good as expected: the maximum accuracy on the test set is 55%, using a neural network. Looking at the confusion matrices (figure 9), the expected diagonal is far from reached. After listening to some samples and studying this dataset, several weaknesses emerge. First of all, tracks classified as "experimental" correspond more to everyday noises (objects being destroyed) than to real music, which is why they were removed from our final dataset. Moreover, the "international" genre actually includes African, Latin, Indian, French music and so on. It is therefore understandable that our models find it difficult to recognise this genre and to find similarities among such different tracks.

                               Training set   Test set
Logistic Regression            58 %           42 %
Decision Tree                  100 %          30 %
Random Forest                  100 %          32 %
Adaboost                       32 %           28 %
K-Nearest-Neighbours           68 %           53 %
Support Vector Machines        70 %           51 %
Naive Bayes                    37 %           27 %
Linear Discriminant Analysis   69 %           50 %
Feed Forward Neural Network    100 %          55 %

Table 4: Training and test accuracy on FMA small


Figure 9: Confusion matrices on FMA small (in percentage)


                               Training set   Test set
Logistic Regression            55 %           28 %
Decision Tree                  100 %          28 %
Random Forest                  100 %          29 %
Adaboost                       24 %           23 %
K-Nearest-Neighbours           42 %           29 %
Support Vector Machines        43 %           29 %
Naive Bayes                    31 %           19 %
Linear Discriminant Analysis   41 %           28 %
Feed Forward Neural Network    100 %          21 %

Table 5: Training and test accuracy on FMA large

While genres are balanced in FMA's smallest subset, this is not the case in the larger ones, so the initial results are poor (table 5). When using the dataset as is, classifiers tend to classify all music into the most represented genres, here rock, experimental, and electronic. This effect is particularly noticeable on the confusion matrices (figure 10).


Figure 10: Confusion matrices on FMA large (in percentage)


4.1.2 Tests on GTZAN Regarding this dataset, the first results, both numerical (table 6) and on listening, are much more encouraging.

                               Training set   Test set
Logistic Regression            87 %           65 %
Decision Tree                  100 %          45 %
Random Forest                  100 %          58 %
Adaboost                       32 %           29 %
K-Nearest-Neighbours           80 %           64 %
Support Vector Machines        86 %           65 %
Naive Bayes                    55 %           8 %
Linear Discriminant Analysis   69 %           64 %
Feed Forward Neural Network    100 %          63 %

Table 6: Training and test accuracy on GTZAN

Considering table 6, one thing that emerges is that the results obtained with the naive Bayes model are not exploitable. As said before, since the features are highly correlated, this algorithm gives bad results and classifies every track as Classical. More interestingly, the desired darker diagonal is much more present (see figure 11) than with the FMA dataset, and this holds for all other models, in particular support vector machines. It can also be seen that most methods tend to overfit, reaching 100% training accuracy while the test accuracy is much lower; changing the parameters and adding a penalty to these models may help reduce this issue. Finally, random forest proves to be a great improvement over the classic decision tree, whereas AdaBoost does not give good enough results here.


Figure 11: Confusion matrices on GTZAN (in percentage)

However, as explained above, these preliminary results are more promising because the selected tracks (and thus their features) are very similar to each other and specifically chosen to be representative of their genres. The artists are not very diverse either. This is a minor yet real issue, as we aim to develop a recommendation system that can be generalised to

tracks not present in our dataset.

4.2 Dataset creation

Because of the previously stated issues, it was decided to create a new dataset for this project. This new dataset, used in this master's thesis, is a combination of the GTZAN and FMA datasets: the FMA dataset has too many drawbacks to be used alone, while the numerous repetitions in the GTZAN dataset and its narrowness make it difficult to generalise our model to new tracks. The new dataset is composed of the 100 tracks of each of the 10 genres present in the GTZAN dataset, to which 50 more tracks per genre coming from the FMA dataset were added. In the end, each of the 10 genres is represented by 150 tracks, varied enough to limit the overfitting of the model on the dataset. The results obtained are more exhaustive and more representative of our dataset, and cover every genre while avoiding class imbalance issues.

                               Training set   Test set
Logistic Regression            87 %           71 %
Decision Tree                  100 %          48 %
Random Forest                  100 %          60 %
Adaboost                       33 %           31 %
K-Nearest-Neighbours           81 %           65 %
Support Vector Machines        90 %           67 %
Naive Bayes                    58 %           48 %
Linear Discriminant Analysis   82 %           66 %
Feed Forward Neural Network    100 %          69 %

Table 7: Training and test accuracy on the GTZAN / FMA dataset


Figure 12: Confusion matrices on the new dataset

4.3 Hyperparameter tuning

The purpose of this part is to tweak the hyperparameters of the models in order to obtain the best achievable results. To limit the risk of overfitting and to obtain better generalisation on external data, k-fold cross-validation

was used (with k between 5 and 10). The hyperparameter selection is based on the results obtained, both the mean accuracy and the variance, as well as on the time needed to perform the computations.

4.3.1 Logistic regression optimization

On our dataset, all solvers give similar results. The maximum number of iterations is set to 100. To avoid overfitting, C is set to a relatively low value: 1. The liblinear solver is then used with an l1 penalty, as it is well adapted to small datasets such as GTZAN.

Figure 13: Logistic regression hyperparameters tuning

4.3.2 Decision tree and random forest optimization

For simple decision trees (see figure 14), the decision criterion (which measures the split quality) based on entropy and information gain gives better results: 57% accuracy compared to 54% using Gini impurity. For random forests, on the other hand, both functions give very similar results (see figure 15). A clear improvement, to 73% accuracy, is observed when using random forest instead of a decision tree, confirming the power of ensemble methods. In the following part of the project, the entropy criterion and a maximum depth of 24 are chosen in order to maximise the mean accuracy while minimising the variance.


Figure 14: Decision tree hyperparameters tuning

Figure 15: Random forest hyperparameters tuning

4.3.3 Adaboost optimization

The optimal parameters for AdaBoost are a learning rate of 0.01 and 60 estimators (figure 16). Nevertheless, the accuracy obtained remains limited, with a maximum value of 41%.


Figure 16: Adaboost hyperparameters tuning

4.3.4 K-nearest-neighbours optimization

Overall, according to figure 17, the results are better when using the Euclidean and Manhattan distances. Since the variances obtained with the Euclidean distance are lower, this distance will be preferred in future experiments. Moreover, weighting the neighbours inversely proportionally to their distance from the studied point provides better accuracy than uniform weights. The number of neighbours K is fixed at 5, achieving 72% accuracy.


Figure 17: k-nearest-neighbours hyperparameters tuning

4.3.5 Support vector machine optimization

The most appropriate kernel type appears to be rbf, which achieves 73% accuracy. To minimise overfitting, a rather low value of the penalty parameter C is taken, here 2.

Figure 18: Support vector machine hyperparameters tuning


4.3.6 Linear Discriminant Analysis

The results obtained with the lsqr and eigen solvers are similar; with the svd solver the results are not convincing. The best results are obtained with automatic shrinkage using the Ledoit-Wolf lemma, reaching 70% accuracy.

4.3.7 Feed-Forward Neural Network

The different hyperparameters of the neural network were tested in parallel. The optimised model is composed of three hidden layers (of sizes 256, 128, and 64 respectively). Training is done with a batch size of 64 over 1000 epochs. To avoid overfitting, dropout and early stopping are used. For the other main parameters, tanh is used as the activation function, the constant learning rate is set to 0.001, the momentum to 0.9, and the solver used is Adam.
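The dropout regularisation used above can be sketched in NumPy as a standalone layer. This uses the common "inverted dropout" convention, where surviving activations are rescaled during training so that inference needs no change; that convention is an assumption, not something stated in the thesis.

```python
import numpy as np

def dropout(activations, rate, seed=None, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the
    survivors by 1/(1-rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) >= rate  # keep with prob. 1-rate
    return activations * mask / (1.0 - rate)

# Example: a hidden layer of 1000 unit activations, 50% dropout.
h = np.ones(1000)
out = dropout(h, rate=0.5, seed=0)
```

At inference time (`training=False`) the layer is an identity, matching how frameworks such as Keras apply dropout only during training.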

4.3.8 Global results after tuning At the end of the hyperparameter tuning step, the results (Table 8) are as follows:

                               Before tuning   After tuning
Logistic Regression            68 %            70 %
Decision Tree                  48 %            54 %
Random Forest                  60 %            73 %
Adaboost                       31 %            41 %
K-Nearest-Neighbors            66 %            72 %
Support Vector Machines        68 %            73 %
Naive Bayes                    8 %             8 %
Linear Discriminant Analysis   67 %            70 %
Feed Forward Neural Network    70 %            74 %

Table 8: Test accuracy on the GTZAN / FMA dataset before and after hyperparameter tuning. Significant improvements are in blue.

4.4 Feature selection

In order to identify the subset of features most suitable for each model, all three wrapper-type methods were used. Performance evaluation is based on accuracy, and the tests were carried out with cross-validation (k = 5). As can be seen in table 9, for the majority of the models it is the bidirectional method, more expensive in time and computing resources, that offers the subset of features maximising the accuracy.


                               Forward selection     Backward elimination   Bidirectional elimination
                               accuracy (features)   accuracy (features)    accuracy (features)
Logistic Regression            71 % (45)             72 % (72)              73 % (36)
Decision Tree                  58 % (50)             60 % (27)              61 % (28)
Random Forest                  75 % (24)             74 % (25)              74 % (28)
AdaBoost                       40 % (9)              39 % (9)               41 % (17)
K-Nearest Neighbours           74 % (36)             75 % (36)              76 % (27)
Support Vector Machines        78 % (40)             79 % (40)              80 % (35)
Naive Bayes                    62 % (25)             62 % (24)              63 % (27)
Linear Discriminant Analysis   71 % (47)             71 % (39)              72 % (44)
Feed Forward Neural Network    75 % (41)             75 % (42)              75 % (38)

Table 9: Training and test accuracy on the GTZAN / FMA dataset. Significant improvements are in blue.

The first interesting observation is that the accuracy of Naive Bayes can be greatly increased by reducing the number of features: 63% can be achieved using less than half of them. Including all the features increases the probability of dependency between them and therefore decreases the performance of this algorithm. Looking at Figure 19, it can be seen that the accuracy decreases as the number of features increases. Since cross-validation was used, the curve on the graphs corresponds to the mean accuracy and the blue envelope around it to the variance. However, the accuracy of this method is still too low compared to the other models.


Figure 19: Feature selection - Naive Bayes

The most promising models are now Support Vector Machine (Figure 20) and k-Nearest Neighbours (Figure 21). However, Random Forest (Figure 22) and Logistic Regression (Figure 23) also provide relevant results. The number of features is carefully chosen to maximise accuracy, in order to best predict genres, while limiting the variance of the results. In order to keep only the most relevant models, Decision Tree, Adaboost and Naive Bayes will no longer be part of the comparison carried out in this thesis.


Figure 20: Feature selection - Support Vector Machine

Figure 21: Feature selection - k-Nearest Neighbours


Figure 22: Feature selection - Random Forest

Figure 23: Feature selection - Logistic Regression


Figure 24: Feature selection - Linear Discriminant Analysis

4.4.1 Most important features

Knowing the optimal subset of features for each model, it is possible to see which features are the most useful on average. The results are given in Figure 25. First of all, as indicated in the papers presenting the perceptual features (loudness, perceptual sharpness, perceptual spread [37], and MFCC [34]), these features are particularly powerful and efficient for music recommendation. Concerning MFCCs, it is noticeable that the first coefficients are the most important, while the highest coefficients are less used in the final subsets. Moreover, onset events (characterising the rhythm) and danceability are also very strong features that are always used regardless of the model. It should be noted that although the use of high-level features is very controversial, the use of danceability seems to be a relevant choice here. Spectral features, as well as the zero crossing rate and pitch, are less used.


Figure 25: Feature utilisation frequency

It is relevant to compare the frequency of appearance of features in the final subsets with the average accuracy obtained using each feature alone, presented in Table 10. Taking only the MFCCs (their mean and variance), the accuracy is 52%. Tests were carried out adding the moments of order 3 and 4, but this did not bring any improvement. Crossing the two results shows that although danceability is present in every subset, when used alone it provides only 20% accuracy. The conclusion is that this high-level feature provides information that differs from the other, lower-level features and is not redundant with them. The same is true, to a lesser extent, for root mean square energy. Although spectral features provide higher accuracy on their own, they are not always retained, since they do not provide details useful for classifying the songs.


Feature                        Average accuracy
MFCC mean & variance                52 %
MFCC mean                           48 %
MFCC variance                       35 %
Spectral slope                      33 %
Perceptual sharpness                31 %
Spectral decrease                   32 %
Spectral bandwidth                  30 %
Spectral rolloff                    29 %
Spectral flux                       28 %
Onset event                         28 %
Perceptual spread                   27 %
Spectral centroid                   26 %
Pitch tuning                        25 %
Loudness                            24 %
Spectral flatness                   23 %
Root mean square energy             21 %
Danceability                        20 %
Zero crossing rate                  18 %

Table 10: Accuracy using features alone
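The per-feature scores in Table 10 can be reproduced in outline by scoring each feature group on its own with 5-fold cross-validation; the feature names, column indices, and data below are hypothetical placeholders for the thesis feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for the extracted feature matrix and genre labels.
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=0)
# Hypothetical mapping from feature name to its column indices.
feature_groups = {"mfcc_mean": [0, 1, 2, 3], "spectral_centroid": [4],
                  "zero_crossing_rate": [5], "danceability": [6]}

# Accuracy obtained when the classifier sees one feature group alone.
for name, cols in feature_groups.items():
    scores = cross_val_score(SVC(), X[:, cols], y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.2f}")
```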

4.5 Data augmentation

A series of tests were carried out using the different data augmentation methods (adding noise, shifting the music in time, changing the speed or the pitch) separately and in combination. Table 11 only shows the results for Logistic Regression and K-Nearest-Neighbors, the findings being similar for the other models. The first conclusion that can be drawn is that shifting the music in time does not change the results on the test set. This can be explained by the fact that the extracted features are mostly averaged over time. The other three methods were therefore examined, separately and then together. Unfortunately the performance is not improved; on the contrary, overfitting is enhanced: the results on the training set are very good while they decrease on the test set. The final method will therefore be chosen without data augmentation.
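The time-shift, noise, and speed transformations can be sketched in plain NumPy (pitch shifting in the actual pipeline would rely on an audio library such as librosa; the signal below is a synthetic tone, not thesis data):

```python
import numpy as np

def add_noise(y, noise_level=0.005):
    """Mix white noise into the signal."""
    return y + noise_level * np.random.randn(len(y))

def time_shift(y, n_samples=1000):
    """Rotate the waveform in time; time-averaged features are unaffected."""
    return np.roll(y, n_samples)

def change_speed(y, rate=1.2):
    """Naive resampling: rate > 1 shortens (speeds up) the signal."""
    idx = np.arange(0, len(y), rate)
    return np.interp(idx, np.arange(len(y)), y)

y = np.sin(np.linspace(0, 100, 22050))  # one second of a synthetic tone
augmented = [add_noise(y), time_shift(y), change_speed(y)]
```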

             shift           noise           pitch           speed           last three combined
             train   test    train   test    train   test    train   test    train   test
Log Reg      92 %    73 %    98 %    72 %    99 %    68 %    98 %    65 %    100 %   65 %
k-NN         80 %    76 %    91 %    72 %    93 %    69 %    92 %    67 %     93 %   68 %

Table 11: Accuracies using data augmentation

4.6 Final examples of recommendations

Once the features that best represent the music have been extracted, recommendations can be listened to. After all these steps, it appears that the most efficient and robust method, both in terms of scores in the confusion matrices and in human-ear tests, is the Support Vector Machine. It is therefore the method used in this final part. The confusion matrix is presented in Figure 26. The percentages are better, and the errors seem quite forgivable. Some validation listening will now take place to ensure that the predictions are accurate. This model went through various tests, and the results are presented below for the Blues, Rock and Classical styles.
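The reading applied to Figure 26 (a dark diagonal, near-white off-diagonal cells) can be reproduced with scikit-learn's confusion_matrix, normalised per true genre; the labels below are illustrative, not the thesis data:

```python
from sklearn.metrics import confusion_matrix

# Illustrative true vs. predicted genre labels.
y_true = ["blues", "blues", "classical", "classical",
          "metal", "metal", "rock", "rock"]
y_pred = ["blues", "blues", "classical", "classical",
          "metal", "rock", "rock", "metal"]
labels = ["blues", "classical", "metal", "rock"]

# normalize="true" turns each row into per-genre recall,
# so the diagonal reads directly as the recovery rate of each genre.
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(cm.round(2))
```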

Figure 26: Final confusion matrix for SVM

For the first example, a classical piece was used as input. Unsurprisingly, and as expected from the confusion matrix, the genre is recognised and the recommended songs (Table 12) are all rather similar classical pieces, mostly with string instruments.

                     Title                      Artist                Genre
Initial song         Spring Allegro III         Vivaldi               Classical
Recommended songs    Spring Allegro II          Vivaldi               Classical
                     Violin Concerto            Karol Szymanowski     Classical
                     Allegro assai con spirito  F. J. Haydn           Classical
                     Sonata XIII                Giovanni Gabrieli     Classical

Table 12: Recommended songs example 1 (classical)


For the second example, a Blues song was taken. Table 13 shows that the first recommended song is from the same artist, two others are from the same genre, and the last one is Reggae. Although they belong to different genres, these recommendations seem to be quite correct.

                     Title                      Artist                Genre
Initial song         One More Night             Hot Toddy             Blues
Recommended songs    Rescue Me                  Hot Toddy             Blues
                     Hobo's Son                 Kelly Joe Phelps      Blues
                     Could You Be Loved         Bob Marley            Reggae
                     I'm Bad Like Jesse James   John Lee Hooker       Blues

Table 13: Recommended songs example 2 (blues)

Finally, taking a Rock song, the results are more heterogeneous. Indeed, one of the recommendations is disco and half of them are metal (shown in Table 14). The disco song is quite close to the initial one, and metal is a sub-genre of rock. After listening, these results seem relevant too.

                     Title                      Artist                      Genre
Initial song         Like Swimming              Morphine                    Rock
Recommended songs    Caught in the Middle       DIO                         Metal
                     I Know You Pt. III         Morphine                    Rock
                     High Energy                Evelyn Thomas               Disco
                     Freedom                    Rage Against The Machine    Metal

Table 14: Recommended songs example 3 (rock)

Most of the listening tests performed seem relevant, or at least not totally inexplicable. However, external expertise would be required for every genre in order to obtain more accurate human judgements, and thus a better assessment of the model's predictions.


5 Conclusions and discussions

5.1 Discussion of the results

5.1.1 Quantitative results

To sum up, in terms of quantitative results, the final accuracy obtained for the classification into genres with the Support Vector Machine model is 78% on the test set. However, other models, in particular the neural network and random forest, also show promising results. This value could be reached after an optimisation of the model's hyperparameters as well as a feature selection stage; these two methods make it possible to greatly increase accuracy while restricting overfitting. It has to be taken into account that this value is only an indicator for the classification part of the project. The confusion matrix (Figure 26 in the previous section) clearly gives more information about the results. What we are looking for in the confusion matrix is a dark diagonal, characterising a high recovery rate of the initial genre, while the other cells should be as white as possible. The final matrix is in line with what is desired. The matrix also shows that some genres are easier for the system to distinguish than others. Indeed, the classical genre is always retrieved. On the pop, metal, country and blues genres, the system is also rather efficient and finds the genre four times out of five. By contrast, rock and disco are more challenging to identify. Furthermore, a good classification is not the only indicator on which a recommendation should be based: these categories are not mutually exclusive, and some are sub-genres of others; for example, metal is a sub-genre of rock. The most useful features to quantify the similarity between songs are the perceptual features (loudness, sharpness, spread and MFCC), rhythm and danceability. Indeed, the perceptual features alone provide more than 60% accuracy with the SVM method.
It should also be noted that danceability, a high-level feature, provides information that is totally different from the more classical features, whether physical or musical, which allows a clear improvement in performance (about 10%).

5.1.2 Qualitative results

The results in terms of listening are encouraging. It should be specified that these listening tests were judged by only one person (the author) and that averaging over a group of individuals would allow a better assessment. However, the progression between the listening sessions at the beginning and after the optimisations is evident. While the system used to erroneously recommend blues songs for a totally different pop track, the recommendations are now more coherent and consistent. The system mainly recommends music within the same genre (sometimes from the same artist) and less frequently more distant music. In all cases, there is a similarity in the general mood or tone of the chosen music, its rhythm, melody, or even instruments.


5.2 Conclusion

5.2.1 Research question

The purpose of this section is to answer the research question. In order to take into account the tastes of a user, several approaches are possible, most of which were explained in the Background section. The content-based method was chosen and implemented for this thesis. The results obtained show that it is rather efficient to learn a user's tastes from the music they previously listened to in order to recommend new songs. The principle is, from a song in mp3 or wav format, to extract as much information as possible; the music track is seen as evidence of the user's musical preferences. The retrieved information can be musical (pitch, rhythm), physical, extracted via signal processing (spectral decrease), or higher-level (danceability). It is essential to ensure that the pieces of information are all relevant and non-redundant, using feature selection algorithms. From the best subset of features, the Machine Learning algorithm (here Support Vector Machine) classifies songs into genres. After this classification stage comes the recommendation stage. It consists in finding four other songs to recommend to the user: those that are the closest in terms of feature similarity. One of the most complicated tasks in this project is the evaluation; it is quite complex to measure the performance of the system. The initial results were based on the various classifiers studied during this project. At the same time, listening sessions took place in order to judge the recommendations and their models. Comparing the results with other research is not a simple task, for two main reasons. The dataset used is a combination of two other open-source datasets, which means that no other paper is (currently) based on the same one. Furthermore, it is quite complex to rank the performance of a recommendation system in order to compare it.
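The recommendation stage described above, returning the four tracks closest in the selected feature space, can be sketched with scikit-learn's NearestNeighbors; the feature matrix here is a random stand-in for the real library:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
library = rng.normal(size=(100, 35))  # 100 tracks x 35 selected features

# Standardise so no single feature dominates the distance metric.
scaler = StandardScaler().fit(library)
index = NearestNeighbors(n_neighbors=5).fit(scaler.transform(library))

def recommend(track_idx, k=4):
    """Return indices of the k tracks closest to the query in feature space."""
    q = scaler.transform(library[track_idx].reshape(1, -1))
    _, neighbours = index.kneighbors(q, n_neighbors=k + 1)
    # Drop the query track itself, which is its own nearest neighbour.
    return [i for i in neighbours[0] if i != track_idx][:k]

print(recommend(0))
```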
Regarding the genre classification, the results are below those obtained by researchers who only used the GTZAN dataset. Indeed, where their scores could reach 91% [22] on that dataset, our classifier peaks at 78%. This is explained by the fact that our dataset also contains songs from the FMA dataset. It is therefore more suited to generalising to yet unencountered music, since it better encompasses the possible variations within a same genre.

5.2.2 Known limitations

First of all, it should be pointed out that the way the system is evaluated is debatable. Indeed, relying on metrics such as accuracy provides little information about the performance of the recommender [60]. The article quoted above specifies that recall, accuracy and confusion matrices do not really provide a true assessment of the ability of a system to recognise a genre. Moreover, the interpretations made from the results are based on human perception: we base our interpretations on the instruments we recognise, or on the way the music is sung or rapped, whereas the current features are not able to focus on such information [60]. The GTZAN dataset also contains some errors in its labels. As the concept of genre may be ambiguous, a more in-depth analysis of the genres in FMA and GTZAN should also be done in order to determine whether music classified as "blues" in FMA, for example, would also be classified as "blues" in GTZAN.


This is why the interpretations made in this thesis are based on two assumptions. The first is that the dataset created is coherent and consistent at the label level. The second is that the recommendation system uses cues (such as the number and type of instruments, or whether the music is for dancing or relaxing) similar to those that would be used by a human trying to classify music [60].

5.3 Future work

5.3.1 Improvement suggestions

First, the way the results are evaluated could be improved. Indeed, the qualitative evaluations were conducted only by the author. Opinions should be collected from a wider and better-mixed sample of end users, varying in demographic (age, gender, ...) and geographic (country) parameters, but also in opinions and personalities. Secondly, the dataset could be enhanced. The quality of the recommendations could be improved by taking larger datasets, provided the required computational resources are available. Moreover, taking each piece of music in its entirety, and not just the first 30 seconds, should improve the results. Furthermore, basing the recommendation on a set of previously played songs instead of just one would allow a better understanding of the user's tastes, and not just of a specific style of music they enjoy. Finally, combining recommendation methods can greatly increase the accuracy of the results. A first task would be to retrieve the lyrics of the music using data mining. They could then be analysed using Natural Language Processing techniques in order to obtain a new feature that would help classify the music even more precisely. This thesis relies on the content-based method, whilst it is feasible and often wise to combine different methods. Once the application is launched, it would be advisable to collect user preference data in order to create a hybrid recommendation system combining content-based and collaborative filtering approaches.

5.3.2 Application development

The application itself is not yet released; the templates have been made by a UX/UI designer and will be used later in the project life-cycle. The final application will be split into two main features. In the recommendation part (layout shown in Figure 27), the user will be invited to enter the URL of a piece of music from a streaming platform. In a later version, the goal is for the application to also be able to record an extract of the music being played. From the music or extract, the application will offer the possibility to listen to songs recommended by the algorithm developed in this master thesis.


Figure 27: Main menu and recommendation part of the application

The generation part was developed by another trainee (see Figure 28). In this part it will be possible to generate music for a selected genre. Once the user chooses a genre, the application uses a GAN to generate brand new music.

Figure 28: Main menu and generation part of the application


References

[1] Elaine Rich. User modeling via stereotypes. Cognitive Science, 3(4):329–354, 1979.
[2] Robin Burke and Maryam Ramezani. Matching Recommendation Technologies and Domains, pages 367–386. January 2011.
[3] Markus Schedl. Deep learning in music recommendation systems. Frontiers in Applied Mathematics and Statistics, 5:44, August 2019.
[4] Markus Schedl, Peter Knees, and Fabien Gouyon. New paths in music recommender systems research. 2017.
[5] Sean M. McNee, John Riedl, and Joseph A. Konstan. Being accurate is not enough: How accuracy metrics have hurt recommender systems, pages 1097–1101. CHI EA '06. Association for Computing Machinery, New York, NY, USA, 2006.
[6] Peter Knees and Markus Schedl. Music retrieval and recommendation – a tutorial overview. In Proceedings of the 38th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Santiago, Chile, August 2015.
[7] Peter Knees and Markus Schedl. Music Similarity and Retrieval: An Introduction to Audio- and Web-based Strategies. 2016.
[8] Javier Pérez-Marcos and Vivian Batista. Recommender system based on collaborative filtering for Spotify's users. pages 214–220, June 2018.
[9] Chris Johnson. From idea to execution: Spotify's Discover Weekly. November 2015.
[10] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. Collaborative filtering recommender systems. January 2007.
[11] Mehdi Elahi, Francesco Ricci, and Neil Rubens. A survey of active learning in collaborative filtering recommender systems. Computer Science Review, June 2016.
[12] Marius Kaminskas and Francesco Ricci. Contextual music information retrieval and recommendation: State of the art and challenges. Computer Science Review, 6(2):89–119, 2012.
[13] Zhiwei Gu, Li Guo, and Tianchi Liu. Music genre classification via machine learning.
[14] Hyeoun-Ae Park. An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain. Journal of Korean Academy of Nursing, 43:154–164, April 2013.
[15] Sigmoid-function-2. https://commons.wikimedia.org/wiki/File:Sigmoid-function-2.svg.
[16] Mohammed Terry-Jack. Tips and tricks for multi-class classification. https://medium.com/@b.terryjack/tips-and-tricks-for-multi-class-classification-c184ae1c8ffc.
[17] Beatriz C. F. de Azevedo, Glaucia M. Bressan, and Elisangela Ap. S. Lizzi. A decision tree approach for the musical genres classification. 2017.
[18] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, March 1986.
[19] Zhuo Wang, Jintao Zhang, and Naveen Verma. Realizing low-energy classification systems by implementing matrix multiplication directly within an ADC. IEEE Transactions on Biomedical Circuits and Systems, 9:1–1, December 2015.
[20] Padraig Cunningham and Sarah Delany. k-nearest neighbour classifiers. Multiple Classifier Systems, April 2007.
[21] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. Proceedings of the Annual ACM Symposium on Theory of Computing, pages 604–613, October 2000.
[22] R. Thiruvengatanadhan. Speech/music classification using MFCC and KNN. International Journal of Computational Intelligence Research, 13(10):2449–2452, 2017.
[23] Changsheng Xu, Namunu Maddage, Xi Shao, Fang Cao, and Qi Tian. Musical genre classification using support vector machines, volume 5, pages V-429. May 2003.
[24] Harry Zisopoulos, Savvas Karagiannidis, Georgios Demirtsoglou, and Stefanos Antaris. Content-based recommendation systems. November 2008.
[25] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
[26] Tao Li, Mitsunori Ogihara, and Qi Li. A comparative study on content-based music genre classification, pages 282–289. January 2003.
[27] Sarfaraz Masood. Genre classification of songs using neural network. September 2014.
[28] Artificial neural network. https://en.wikipedia.org/wiki/Artificial_neural_network.
[29] B. Mehlig. Artificial neural networks, 2019.
[30] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[31] Early stopping with PyTorch to restrain your model from overfitting. https://mc.ai/early-stopping-with-pytorch-to-restrain-your-model-from-overfitting/.
[32] Geoffroy Peeters. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. January 2004.
[33] Martin McKinney and Jeroen Breebaart. Features for audio and music classification. November 2003.
[34] Jesper Jensen, Mads Christensen, Manohar Murthi, and Søren Jensen. Evaluation of MFCC estimation techniques for music similarity. September 2006.
[35] Lianzhang Zhu, Leiming Chen, Dehai Zhao, Jiehan Zhou, and Weishan Zhang. Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 2017.
[36] Xin Luo, Xuezheng Liu, Ran Tao, and Youqun Shi. Content-based retrieval of music using mel frequency cepstral coefficient (MFCC). 2015.
[37] Sung-Hwan Shin. Comparative study of the commercial software for sound quality analysis. Acoustical Science and Technology, 29:221–228, January 2008.
[38] Jan Stepanek and Ondrej Moravec. Possibility of application of objective psychoacoustic metrics on musical signals. 2006.
[39] N. Scaringella, G. Zoia, and D. Mlynek. Automatic genre classification of music content: a survey. IEEE Signal Processing Magazine, 23(2):133–141, March 2006.
[40] Igor Vatolkin and Wolfgang Theimer. Introduction to methods for music classification based on audio data. January 2020.
[41] Anssi P. Klapuri, Antti J. Eronen, and Jaakko T. Astola. Analysis of the meter of acoustic musical signals, pages 342–355. 2004.
[42] Essentia: an audio analysis library for music information retrieval. In International Society for Music Information Retrieval Conference (ISMIR'13), pages 493–498, Curitiba, Brazil, November 2013.
[43] Sunil Karamchandani, Prathmesh Matodkar, Suraj Iyer, and Nirav Gori. Score formulation and parametric synthesis of musical track as a platform for big data in hit prediction, pages 363–374. January 2018.
[44] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, March 2003.
[45] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2001.
[46] Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1):273–324, 1997.
[47] Thierry Bertin-Mahieux, Daniel Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), pages 591–596, January 2011.
[48] David Hauger, Andrej Kosir, Marko Tkalčič, and Markus Schedl. The Million Musical Tweets dataset: What can we learn from microblogs. November 2013.
[49] Anupama Aggarwal. MSD: Getting the dataset. http://millionsongdataset.com/pages/getting-dataset/, 2012.
[50] Markus Schedl. The LFM-1b dataset for music retrieval and recommendation. pages 103–110, June 2016.
[51] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. Enabling factorized piano music modeling and generation with the MAESTRO dataset. 2019.
[52] Bob Sturm. The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use. June 2013.
[53] Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. FMA: A dataset for music analysis. December 2016.
[54] Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. FMA: A Dataset For Music Analysis. https://github.com/mdeff/fma, 2017.
[55] Brian McFee, Eric J. Humphrey, and Juan Pablo Bello. A software framework for musical data augmentation. 2015.
[56] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal analysis in Python. 2015.
[57] Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, Jose Zapata, and Xavier Serra. Essentia: An open-source library for sound and music analysis. Proceedings of the 21st ACM International Conference on Multimedia, October 2013.
[58] Benoît Mathieu, Slim Essid, Thomas Fillon, Jacques Prado, and Gaël Richard. YAAFE, an easy to use and efficient audio feature extraction software. January 2010.
[59] Yading Song, Simon Dixon, and Marcus Pearce. A survey of music recommendation systems and future perspectives. June 2012.
[60] Bob L. Sturm. Classification accuracy is not enough. Journal of Intelligent Information Systems, 41(3):371–406, December 2013.

TRITA-EECS-EX-2020:847

www.kth.se