Music Recommendation and Discovery in the Long Tail
Total Page:16
File Type:pdf, Size:1020Kb
MUSIC RECOMMENDATION AND DISCOVERY IN THE LONG TAIL Oscar` Celma Herrada 2008 c Copyright by Oscar` Celma Herrada 2008 All Rights Reserved ii To Alex and Claudia who bring the whole endeavour into perspective. iii iv Acknowledgements I would like to thank my supervisor, Dr. Xavier Serra, for giving me the opportunity to work on this very fascinating topic at the Music Technology Group (MTG). Also, I want to thank Perfecto Herrera for providing support, countless suggestions, reading all my writings, giving ideas, and devoting much time to me during this long journey. This thesis would not exist if it weren’t for the the help and assistance of many people. At the risk of unfair omission, I want to express my gratitude to them. I would like to thank all the colleagues from MTG that were —directly or indirectly— involved in some bits of this work. Special mention goes to Mohamed Sordo, Koppi, Pedro Cano, Mart´ın Blech, Emilia G´omez, Dmitry Bogdanov, Owen Meyers, Jens Grivolla, Cyril Laurier, Nicolas Wack, Xavier Oliver, Vegar Sandvold, Jos´ePedro Garc´ıa, Nicolas Falquet, David Garc´ıa, Miquel Ram´ırez, and Otto W¨ust. Also, I thank the MTG/IUA Administration Staff (Cristina Garrido, Joana Clotet and Salvador Gurrera), and the sysadmins (Guillem Serrate, Jordi Funollet, Maarten de Boer, Ram´on Loureiro, and Carlos Atance). They provided help, hints and patience when I played around with the machines. During my six months stage at the Center for Computing Research of the National Poly- technic Institute (Mexico City) in 2007, I met a lot of interesting people ranging different disciplines. I thank Alexander Gelbukh for inviting me to work in his research group, the Natural Language Laboratory. Also, I would like to thank Grigori Sidorov, Tine Stalmans, Obdulia Pichardo, Sulema Torres, and Yulema Ledeneva for making my stay so wonderful. This thesis would be much more difficult to read —except for the “Spanglish” experts— if it weren’t for the excellent work of the following people: Paul Lamere, Owen Meyers, Terry Jones, Kurt Jacobson, Douglas Turnbull, Tom Slee, Kalevi Kilkki, Perfecto Herrera, Alberto Lumbreras, Daniel McEnnis, Xavier Amatriain, and Neil Lathia. They not only have helped me to improve the text, but have provided feedback, comments, suggestions, and —of course— criticism. v Many people have influenced my research during these years. Furthermore, I have been lucky enough to meet some of them. In this sense, I would like to acknowledge Elias Pampalk, Paul Lamere, Justin Donaldson, Jeremy Pickens, Markus Schedl, Peter Knees, and Stephan Baumann. I had very interesting discussions with them in several ISMIR (and other) conferences. Other researchers whom I have learnt a lot, and I have worked with, are: Massimiliano Zanin, Javier Buld´u, Rapha¨el Troncy, Michael Hausenblas, Roberto Garc´ıa, and Yves Raimond. I also want to thank some MTG veterans, whom I met and worked before starting the PhD. They are: Alex Loscos, Jordi Bonada, Pedro Cano, Oscar Mayor, Jordi Janer, Lars Fabig, Fabien Gouyon, and Enric Mieza. Also, special thanks goes to Esteban Maestre and Pau Arum´ıfor having such a great time while being PhD students. Last but not least, this work would have never been possible without the encouragement of my wife Claudia, who has provided me love and patience, and my lovely son Alex —who altered my last.fm and youtube accounts with his favourite music. Nowadays, Cri–Cri, Elmo and Barney, coexists with The Dogs d’Amour, Backyard Babies, and other rock bands. I reckon that the two systems are a bit lost when trying to recommend me music and videos!. Also, a special warm thanks goes to my parents Tere and Toni, my brother Marc and my sister in law Marta, and the whole family in Barcelona and Mexico. At least, they will understand what my work is about. hopefully. This research was performed at the Music Technology Group of the Universitat Pompeu Fabra in Barcelona, Spain. Primary support was provided by the EU projects FP6-507142 SIMAC1 and FP6-045035 PHAROS2, and by a Mexican grant from the Secretar´ıa de Rela- ciones Exteriores (Ministry of Foreign Affairs) for a six months stage at the Center for Computing Research of the National Polytechnic Institute (Mexico City). 1http://www.semanticaudio.org 2http://www.pharos-audiovisual-search.eu/ vi Abstract Music consumption is biased towards a few popular artists. For instance, in 2007 only 1% of all digital tracks accounted for 80% of all sales. Similarly, 1,000 albums accounted for 50% of all album sales, and 80% of all albums sold were purchased less than 100 times. There is a need to assist people to filter, discover, personalise and recommend from the huge amount of music content available along the Long Tail. Current music recommendation algorithms try to accurately predict what people de- mand to listen to. However, quite often these algorithms tend to recommend popular —or well–known to the user— music, decreasing the effectiveness of the recommendations. These approaches focus on improving the accuracy of the recommendations. That is, try to make accurate predictions about what a user could listen to, or buy next, independently of how useful to the user could be the provided recommendations. In this Thesis we stress the importance of the user’s perceived quality of the recom- mendations. We model the Long Tail curve of artist popularity to predict —potentially— interesting and unknown music, hidden in the tail of the popularity curve. Effective recom- mendation systems should promote novel and relevant material (non–obvious recommenda- tions), taken primarily from the tail of a popularity distribution. The main contributions of this Thesis are: (i) a novel network–based approach for recommender systems, based on the analysis of the item (or user) similarity graph, and the popularity of the items, (ii) a user–centric evaluation that measures the user’s relevance and novelty of the recommendations, and (iii) two prototype systems that implement the ideas derived from the theoretical work. Our findings have significant implications for recommender systems that assist users to explore the Long Tail, digging for content they might like. vii viii Resum Avui en dia, la m´usica est`aesbiaixada cap al consum d’alguns artistes molt populars. Per exemple, el 2007 nom´es l’1% de totes les can¸cons en format digital va representar el 80% de les vendes. De la mateixa manera, nom´es 1.000 `albums varen representar el 50% de totes les vendes, i el 80% de tots els `albums venuts es varen comprar menys de 100 vegades. Es´ clar que hi ha una necessitat per tal d’ajudar a les persones a filtrar, descobrir, personalitzar i recomanar m´usica, a partir de l’enorme quantitat de contingut musical disponible. Els algorismes de recomanaci´ode m´usica actuals intenten predir amb precisi´oel que els usuaris demanen escoltar. Tanmateix, molt sovint aquests algoritmes tendeixen a recomanar artistes famosos, o coneguts d’avantm`aper l’usuari. Aix`o fa que disminueixi l’efic`acia i utilitat de les recomanacions, ja que aquests algorismes es centren b`asicament en millorar la precisi´ode les recomanacions. Es´ a dir, tracten de fer prediccions exactes sobre el que un usuari pugui escoltar o comprar, independentment de quantutils ´ siguin les recomanacions generades. En aquesta tesi destaquem la import`ancia que l’usuari valori les recomanacions rebudes. Per aquesta ra´omodelem la corba de popularitat dels artistes, per tal de poder recomanar m´usica interessant i desconeguda per l’usuari. Les principals contribucions d’aquesta tesi s´on: (i) un nou enfocament basat en l’an`alisi de xarxes complexes i la popularitat dels productes, aplicada als sistemes de recomanaci´o, (ii) una avaluaci´ocentrada en l’usuari, que mesura la import`ancia i la desconeixen¸ca de les recomanacions, i (iii) dos prototips que implementen la idees derivades de la tasca te`orica. Els resultats obtinguts tenen clares implicacions per aquells sistemes de recomanaci´oque ajuden a l’usuari a explorar i descobrir continguts que els pugui agradar. ix x Resumen Actualmente, el consumo de m´usica est´asesgada hacia algunos artistas muy populares. Por ejemplo, en el a˜no 2007 s´olo el 1% de todas las canciones en formato digital representaron el 80% de las ventas. De igual modo, ´unicamente 1.000 ´albumes representaron el 50% de todas las ventas, y el 80% de todos los ´albumes vendidos se compraron menos de 100 veces. Existe, pues, una necesidad de ayudar a los usuarios a filtrar, descubrir, personalizar y recomendar m´usica a partir de la enorme cantidad de contenido musical existente. Los algoritmos de recomendaci´on musical existentes intentan predecir con precisi´on lo que la gente quiere escuchar. Sin embargo, muy a menudo estos algoritmos tienden a recomendar o bien artistas famosos, o bien artistas ya conocidos de antemano por el usuario. Esto disminuye la eficacia y la utilidad de las recomendaciones, ya que estos algoritmos se centran en mejorar la precisi´on de las recomendaciones. Con lo cu´al, tratan de predecir lo que un usuario pudiera escuchar o comprar, independientemente de lo ´utiles que sean las recomendaciones generadas. En este sentido, la tesis destaca la importancia de que el usuario valore las recomenda- ciones propuestas. Para ello, modelamos la curva de popularidad de los artistas con el fin de recomendar m´usica interesante y, a la vez, desconocida para el usuario. Las principales contribuciones de esta tesis son: (i) un nuevo enfoque basado en el an´alisis de redes com- plejas y la popularidad de los productos, aplicada a los sistemas de recomendaci´on, (ii) una evaluaci´on centrada en el usuario que mide la calidad y la novedad de las recomendaciones, y (iii) dos prototipos que implementan las ideas derivadas de la labor te´orica.