Universitá Degli Studi Di Milano Facoltà Di Scienze Matematiche, Fisiche E Naturali Dipartimento Di Tecnologie Dell'informazione

UNIVERSITÁ DEGLI STUDI DI MILANO FACOLTÀ DI SCIENZE MATEMATICHE, FISICHE E NATURALI DIPARTIMENTO DI TECNOLOGIE DELL'INFORMAZIONE SCUOLA DI DOTTORATO IN INFORMATICA Settore disciplinare INF/01 TESI DI DOTTORATO DI RICERCA CICLO XXIII SERENDIPITOUS MENTORSHIP IN MUSIC RECOMMENDER SYSTEMS Eugenio Tacchini Relatore: Prof. Ernesto Damiani Direttore della Scuola di Dottorato: Prof. Ernesto Damiani Anno Accademico 2010/2011 II Acknowledgements I would like to thank all the people who helped me during my Ph.D. First of all I would like to thank Prof. Ernesto Damiani, my advisor, not only for his support and the knowledge he imparted to me but also for his capacity of understanding my needs and for having let me follow my passions; thanks also to all the other people of the SESAR Lab, in particular to Paolo Ceravolo and Gabriele Gianini. Thanks to Prof. Domenico Ferrari, who gave me the possibility to work in an inspiring context after my graduation, helping me to understand the direction I had to take. Thanks to Prof. Ken Goldberg for having hosted me in his laboratory, the Berkeley Laboratory for Automation Science and Engineering at the University of California, Berkeley, a place where I learnt a lot; thanks also to all the people of the research group and in particular to Dmitry Berenson and Timmy Siauw for the very fruitful discussions about clustering, path searching and other aspects of my work. Thanks to all the people who accepted to review my work: Prof. Richard Chbeir, Prof. Ken Goldberg, Prof. Przemysław Kazienko, Prof. Ronald Maier and Prof. Robert Tolksdorf. Thanks to 7digital, our media partner for the experimental test, and in particular to Filip Denker. III Thanks to Letizia Guagnini for the graphic design of the interface of Mentor.FM and to Simone Magnaschi for the HTML/CSS/JS conversion of the design. Thanks to Filippo Losi, Leonora Fortunati and Camilla Sampo, who helped to understand better which are the different communities behind some of the musical worlds found. Thanks to the people of stackoverflow.com for their willingness to help others using their impressive skills, thanks in particular to Itamar Katz for the precious suggestions received about the optimization of the the boolean similarity computation. Thanks to Marco della Vedova for the suggestions and the fruitful discussions. Thanks to all the friends who inspired and supported me during these years. Finally, thanks a lot to my family for supporting me over the years and thanks again to my father who, when I was 8 years old, bought for me a Commodore home computer, by mistake. Considering that computer science became for me first a passion and then a job, I would say it was an excellent example of serendipity. IV Abstract Nowadays the amount of content and products easily available on-line for purchase or fruition is so high that recommender systems represent an important resource for users in order to get suggestions about items (songs, movies, books, news, products in general...) they might like. For many years, research, in the field of recommender systems focused on improving accuracy, i.e. improving the precision with which the systems predict the rate that a given user would give to a given item. In the last years, an increasing number of efforts have been directed towards other important aspects such as novelty, diversity and serendipity of recommendations. In particular, with serendipity, in this context, we refer to the ability of a recommender system to propose unexpected and liked recommendations. Serendipity is likely the aspect which has received the least attention and it is the one, in this work, we focus more on. The aim of this thesis is to propose techniques which can be adopted by recommender system designers in order to increase serendipity while keeping an acceptable level of precision of the recommendations. We work in the domain of music, which presents a particularly suitable context for trying to propose non-obvious recommendations, mainly due to the lower cost, respect to other domains, of “bad” recommendations (listening to a song a user dislikes is not much time consuming). V The work proposes a collaborative-filtering method to classify artists, based on the Affinity Propagation clustering algorithm and on listening logs as data source. The classification, together with a list of the artists a user likes, is used to detect which musical clusters (called “musical worlds”) the user is not familiar with. A technique to synthetically represent each cluster, based on freely chosen keywords (folksonomy), is also presented. A novel recommendation method based on gradual exposure and on a variation of the user-based collaborative filtering approach is proposed. The said method exploits the knowledge of the most eclectic users (we decided to call them “mentors”) to choose, from the unfamiliar musical clusters, the ones which are more likely to contain serendipitous music for the active user. Once a target musical cluster has been chosen, a playlist is created, which starts with songs by artists who tend to be borderline in respect to the user's taste and continues with songs by artists who tend to be, gradually, closer to the most representative artist of the target cluster. A real music recommendation radio has been developed, implementing the techniques proposed and a traditional top-10 item-based recommender. The radio has been used as a validation test, considering the traditional recommender as a baseline to define which recommendations were expected and which ones were unexpected. The test session suggested that the proposed approach overcomes a method which relies on randomness in terms of a novel measure, called “serendipity cost” (measured as the total number of disliked songs over total number of serendipitous songs) and in term of cohesion, maintaining a “total cost” (measured as the total number of disliked songs over total number of liked songs, which can be considered an index of precision) which is much lower than the cost related to the random approach and closer to the cost of a traditional item-based recommender systems (1.03 for the method proposed, 0.46 for the traditional recommender, 2.77 for the random). The method we proposed in order to choose and order the intermediate artists in a playlist, based on graph search techniques, is used to gradually VI expose the user to the target musical world, following the intuition that showing a connection between the target musical world and the music the user is closer to can help him to accept the (unexpected) recommendation. This method, however, can itself be considered an achievement of this work and applied not only in this context but anytime the automatic production of a playlist, having in input the first artist, the last artist and a cohesion (distance in a playlist between an artist and the following one1) constraint, is needed. 1 Note that cohesion in literature is usually defined as the average distance in a playlist between a song and the following one so in this sentence the term is used in a broader sense VII Contents 1 Introduction..................................................................................................1 1.1 Recommender Systems........................................................................1 1.1.1 Definition..................................................................................... 1 1.1.2 Aims............................................................................................. 2 1.1.3 Data sources................................................................................. 4 1.1.4 Techniques....................................................................................4 1.1.4.1 Content-based....................................................................... 5 1.1.4.2 Collaborative-filtering based................................................ 6 1.1.4.2.1 User-based.....................................................................6 1.1.4.2.2 Item-based...................................................................11 1.1.4.3 Hybrid.................................................................................12 1.1.4.4 A comparison between content-based and collaborative- filtering-based techniques...............................................................12 1.2 Recommender Systems in the Music Domain...................................14 1.2.1 Peculiarities................................................................................ 14 1.2.1.1 Huge items space................................................................14 1.2.1.2 Low cost/consumption time................................................14 1.2.1.3 Very high per-item reuse.....................................................15 1.2.1.4 Contextual and mood usage................................................15 1.2.1.5 Consumed in sequence....................................................... 16 1.2.1.6 Implicit feedback evaluation..............................................16 1.2.2 Applications................................................................................17 1.2.2.1 Pandora............................................................................... 17 1.2.2.2 Last.fm................................................................................18 1.2.2.3 Musicovery......................................................................... 18 1.2.2.4 The Echo Nest.....................................................................19 2 Motivations, goals and state of the art......................................................

Load more