Inferring Similarity Between Music Objects with Application to Playlist Generation

Proc. ACM Workshop on Multimedia Information Retrieval, 2005 Inferring Similarity Between Music Objects with Application to Playlist Generation R. Ragno, C.J.C. Burges and C. Herley ABSTRACT come far more complicated operations. Many of these The growing libraries of multimedia objects have in- multimedia applications face the di±culty of determin- creased the need for applications that facilitate search, ing when one object is \like" another. For example, to browsing, discovery, recommendation and playlist con- choose an audio example, a human listener has little dif- struction. Many of these applications in turn require ¯culty in determining that \Salisbury Hill" by Genesis some notion of distance between, or similarity of, such is closer to \With or Without You" by U2 than it is to objects. The lack of a reliable proxy for similarity of \Highway to Hell" by AC/DC. Yet, it is very di±cult to entities is a serious obstacle in many multimedia appli- determine this automatically by computer. This lack of cations. a simple objective measure of \likeness" between multi- In this paper we describe a simple way to automat- media objects has greatly complicated many multime- ically infer similarities between objects based on their dia applications. Clearly traditional signal processing occurrences in an authored stream. The method works measures such as Mean Squared Error or even percep- both for audio and video. This allows us to generate tual measures are not useful as a measure of similarity playlists by emulating a particular stream or combina- of multimedia objects. There is no expectation, for ex- tion of streams, recommend objects that are similar to a ample, that stretches of audio from two U2 songs will chosen seed, and derive measures of similarity between be any closer in, an MSE sense, than either of them associated entities, such as artists. will be to a stretch of music by ABBA. Thus, methods based on examining signal processing metrics alone to determine \likeness" appear so far to be unsuccessful. 1. INTRODUCTION Many of the growing list of multimedia applications The increased ease with which multimedia objects can require such a measure. For example: be authored and distributed has led to individual users having access to very large and varied collections of mul- ² Given a seed song, how do we construct a playlist timedia objects. A few years ago a typical consumer of songs? might have possessed several tens or hundreds of music ² Given a media object that a user likes, how do we CDs, and several favorite movies on tape. Today an av- recommend or discover others that are like it? erage consumer may have ripped his music collection, but may also have music from electronic marketplaces ² Given an artist, how do we ¯nd similar artists? such as iTunes, peer-to-peer swapping networks such as Napster and BitTorrent, and mixer CDs created by The approaches that have been explored to approxi- friends. In addition Internet Radio stations generally mating similarity1 between songs fall into three distinct play a far more varied collection than their terrestrial groups: audio content analysis, metadata similarity and broadcast counterparts, and entirely new distribution collaborative ¯ltering. Audio content analysis relies on channels such as PodCasts serve to introduce users to comparing features such as pitch, rhythm and timbre. a vastly more varied experience than before. Metadata similarity relies on comparing metadata ¯elds This increase in the quantity and variety of multi- such as genre or sub-genre. Collaborative ¯ltering re- media available to consumers has in turn engendered lies on comparing records such as purchasing patterns new needs and new applications. For example, the new among customers. We will review this related work in freedom and larger collections make generating music Section 2. playlists a lot more challenging. Discovery and recom- This paper describes a novel approach for inferring mendation has become increasingly important as users similarity between objects which have no natural idea need help in ¯nding content that matches their personal 1When we use the term \similarity" it is to be understood to tastes. Browsing, indexing and retrieval have also be- mean suitability to appear in a playlist together. 1 Proc. ACM Workshop on Multimedia Information Retrieval, 2005 of distance between them. We do this by using what we B F believe to be a previously unexploited source of infor- 1 mation: the expertise and knowledge about \likeness" 1 2 G implicit in authored streams. For example the play or- S der of a broadcast music station contains valuable infor- 1 1 mation about the songs contained in the list: their joint 1 A 1 membership of a particular genre, their relative popu- E larity, and their pairwise suitability to appear in close proximity in playlists. Our method works by exploiting 2 1 D this information for a large number of authored streams. K We will refer to any audio stream that has been generated by a professional DJ as an Expertly Authored 1 Stream (EAS). We will discuss in Section 3.1 sources of J such data. Figure 1: Graph representing the labeled stream in Our method is very simple: we construct an undi- (1). Songs represent nodes and adjacency increases the rected graph where every song is a node, and song ad- weight of the link between two nodes. jacency contributes to the weight of an arc connecting chosen object. We will describe the actual algorithm two nodes. Suppose for example the play order of one used below. of our EAS's is that of a station that plays 1980's pop The contribution of this paper is to draw attention to hits. A labeled section of the EAS might look like: a previously unexploited source of information: namely 2:37 Salisbury Hill Genesis the order information in expertly authored streams. We 2:41 With or Without You U2 show how this may be used to infer \likeness" between 2:45 Every Move You Make The Police songs based on actual play patterns, rather than on any 2:48 Summer of '69 Bryan Adams articulated rules or preferences or manually determined 2:51 Born in the USA Bruce Springsteen genre labelings. We show how to use this for an auto- In fact, we require only a label that identi¯es each matic playlist generation algorithm that scales to collec- song uniquely. If we represent the individual songs by tions of millions of songs without di±culty. Compared symbols our ordered list might take the form: to previous playlist generation methods our scheme has the advantage that it explicitly produces a measure of ABGDEABDW F GSEW JKE (1) similarity between songs; it does not require access to a where we use "W" to denote a break, or a time gap (human-generated) genre metadata database; the simi- larger than a threshold T: In the work reported here we larity measures improve automatically without human chose T to be 5 minutes. This indicates a break for com- intervention as more data becomes available, and the mercials, news, tra±c updates or another discontinuity scheme can handle very large collections easily. in the EAS. From this we produce a graph by drawing an undirected arc between all adjacent objects; for ex- 2. RELATED WORK ample B is adjacent to A, G and D in the list. We denote the arc joining nodes X and Y by the unordered pair There has been signi¯cant recent growth in the re- (X; Y ). Thus here, the arcs (B; G) and (B; D) would search activity in the areas of multimedia content index- have weight one, while the arc (B; A) would have weight ing and retrieval. We can give no more than a sampling two. This is shown in Figure 1. of recent work in this area. Since our focus is primarily This is repeated for all objects until we have repre- on song objects we will concentrate on recently reported sented all adjacencies by links on the graph. The graph work on playlist generation and music discovery. we construct in Section 4 will have tens of thousands of nodes. A simple way to estimate similarities then ﬂows 2.1 Collaborative ﬁltering from techniques to determine distances on graphs. The Collaborative ¯ltering is often used to establish simi- distance can be modeled as being inversely related to larity and predict events when there is no natural idea the weights on the graph. Thus A in Figure 1 will be of distance between objects. For example Amazon book closer to E than to other nodes on the graph. recommendations compares users' purchasing habits to A very simple way to produce a playlist also follows. recommend books to an individual based on purchases Start with a seed node, say, A. Choose the next objects made by others with similar tastes. Generally this re- from the set of objects that have arcs to A of weight at quires access to a large database of user data. Since for least one: SA = fB; Eg: For example, a subsequent ob- music this gives information at the album level rather ject may be chosen with likelihood proportional to the than the track level it does not appear to have been weight of the arc. Repeat this procedure on the newly applied successfully to playlist generation. 2 Proc. ACM Workshop on Multimedia Information Retrieval, 2005 2.2 Audio content analysis collections of a few hundred songs cannot be used as Several recently reported algorithms allow identi¯ca- the collections grow to tens of thousands and beyond. tion of music from a stream [6, 9]. These audio ¯nger- By assigning various costs to the desired constraints printing algorithms as they are known, enable identi¯- they demonstrate generating playlists from a collection cation of songs even after they have been compressed, of 200000 songs.

Load more