
HYPERGRAPH MODELS OF PLAYLIST DIALECTS

Brian McFee
Computer Science and Engineering
University of California, San Diego

Gert Lanckriet
Electrical and Computer Engineering
University of California, San Diego

ABSTRACT

Playlist generation is an important task in music information retrieval. While previous work has treated a playlist collection as an undifferentiated whole, we propose to build playlist models which are tuned to specific categories, or dialects, of playlists. Toward this end, we develop a general class of flexible and scalable playlist models based upon hypergraph random walks. To evaluate the proposed models, we present a large corpus of categorically annotated, user-generated playlists. Experimental results indicate that category-specific models can provide substantial improvements in accuracy over global playlist models.

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. (c) 2012 International Society for Music Information Retrieval.)

1. INTRODUCTION

Playlist generation, the automated construction of sequences of songs, is a central component of online music delivery services. Because users tend to consume music sequentially in listening sessions, the quality of a playlist generation algorithm can significantly impact user satisfaction.

Recently, it has been proposed that playlist generation algorithms may be best viewed as probabilistic models of song sequences [11]. This viewpoint, borrowed from the statistical natural language processing literature, enables the automatic evaluation and optimization of a model by computing the likelihood that it generates examples of user-generated playlists. For this method to work, the practitioner must provide a large collection of example playlists, both for model evaluation and for parameter optimization.

Of course, numerous subtleties and difficulties arise when working with user-generated playlist data. For example, the data is often noisy, and the author's intent may be obscure. In extreme cases, users may compose playlists by randomly selecting songs from their libraries. More generally, different playlists may have different intended uses (e.g., road trip or party mix), thematic elements (break up or romantic), or may simply contain songs of only specific genres. While previous work treats the universe of user-generated playlists as a single language, building effective global models has proven to be difficult [11].

To better understand the structure of playlists, we advocate a more subtle approach. Rather than viewing naturally occurring playlists as a single language, we propose to model playlists as a collection of dialects, each of which may exhibit its own particular structure. Toward this end, we develop dialect-specific playlist models, and evaluate them on a large corpus of annotated, user-generated playlists.

The proposed approach raises several natural questions:

  * Is it beneficial to individually model playlist dialects?
  * Are some dialects easier to model than others?
  * Which features are important for each dialect?

Answering these questions will hopefully provide valuable insight into the underlying mechanics of playlist generation.

1.1 Our contributions

In this work, our contributions are two-fold. First, we develop a flexible, scalable, and efficient class of generative playlist models based upon hypergraph random walks. Second, we present a new, large-scale, categorically annotated corpus of user-generated playlist data.

2. HYPERGRAPH RANDOM WALKS

Over the last decade, several researchers have proposed playlist generation algorithms based upon random walks [9, 11, 12]. (There are many approaches beyond random walk models; see [5, chapter 2] for a survey of recent work.) Random walk playlist models consist of a weighted graph G = (X, E, w), where the vertices X represent the library of songs, and the edges E and weights w encode pairwise affinities between songs. A playlist is then generated by following a random trajectory through the graph, where each transition x_t → x_{t+1} is sampled according to the weights on edges incident to x_t.

Random walk models, while simple and efficient, carry certain practical limitations. It is often unclear how to define the weights, especially when multiple sources of pairwise affinity are available. Moreover, relying on pairwise interactions can severely limit either the expressive power of these models (if each song has few neighbors) or their scalability and precision (if each song has many neighbors).

To overcome these limitations, we propose a new class of playlist algorithms which allow for more flexible affinities between songs and sets of songs.
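As a minimal illustration of the pairwise random-walk models described above, the following sketch samples a playlist by walking a weighted song graph. This is a toy illustration with hypothetical data structures, not the implementation of any of the cited systems:

```python
import random

def graph_walk_playlist(neighbors, weight, start, length, rng=random):
    """Sample a playlist from a weighted song graph.

    neighbors[s] lists the songs adjacent to s, and weight[(s, t)] is the
    affinity of the transition s -> t. Each step samples the next song
    proportionally to the weights on edges incident to the current song.
    """
    playlist = [start]
    while len(playlist) < length:
        cur = playlist[-1]
        cands = neighbors[cur]
        nxt = rng.choices(cands,
                          weights=[weight[(cur, t)] for t in cands],
                          k=1)[0]
        playlist.append(nxt)
    return playlist
```

The limitations noted above are visible even in this sketch: all structure is carried by the pairwise `weight` table, and combining multiple affinity sources into a single table is left entirely to the practitioner.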

2.1 The user model

To motivate our playlist generation algorithm, we propose a simple model of user behavior. Rather than selecting songs directly from the entire collection X, we assume that the user first narrows her selection to a subset e ⊆ X (e.g., jazz songs), from which a song x_0 ∈ e is chosen uniformly at random. For each subsequent transition x_t → x_{t+1}, the user selects a subset containing the current song x_t, and then selects x_{t+1} uniformly from that subset.

This user model is exactly characterized by a random walk on a hypergraph. Hypergraphs generalize undirected graphs by allowing an edge e ∈ E to be an arbitrary subset of the vertices, rather than a pair (Figure 1). For example, a hypergraph edge may be as general as jazz songs, or as specific as funk songs from 1977. Edge weights can be used to encode the importance of a subset: for example, a model of jazz playlists would assign high weight to edges containing jazz songs.

Figure 1. An example random walk on a song hypergraph (with edges such as Jazz, AUDIO-5/16, and YEAR-1977): vertices represent songs, and edges are subsets of songs. Each transition x_t → x_{t+1} must lie within an edge.

This model has several practically beneficial properties. First, it is efficient and scalable, in that the only information necessary to describe a song is its membership in the edge sets. Similarly, it naturally supports extension to new songs without having to significantly alter the model parameters (edge weights). Second, the model can easily integrate disparate feature sources, such as audio descriptors, lyrics, and tags, as long as they can be encoded as subsets. Moreover, the model degrades gracefully if a song has only a partial representation (e.g., audio but no lyrics or tags). Finally, the model is transparent, in that each transition can be explained to the user simply in terms of the underlying edge taken between songs. As we will see in Section 3, these edges often have natural semantic descriptions.

2.2 The playlist model

To formalize our model, let H = (X, E, w) denote a hypergraph over vertices (songs) X, edges E ⊆ 2^X, and non-negative weights w ∈ R_+^{|E|}. We assume that the song library X and edge set E are given, and our goal is to optimize the edge weights w. We denote by x_t^e := 1[x_t ∈ e] the indicator that the song x_t is contained in the edge e.

Because the selection of the next song x_{t+1} depends only on the previous song x_t and the edge weights w, the model is a first-order Markov process. The likelihood of a playlist s = (x_0 → x_1 → ··· → x_T) thus factors into the likelihood of the initial song and each subsequent transition:

    P(x_0 → x_1 → ··· → x_T | w) = P(x_0 | w) · ∏_{t=0}^{T−1} P(x_{t+1} | x_t, w).

Given the edge weights w, the distribution over the initial song x_0 can be characterized by marginalizing over edges:

    P(x_0 | w) := Σ_{e ∈ E} P(x_0 | e) P(e | w)
                = Σ_{e ∈ E} (x_0^e / |e|) · (w_e / Σ_{f ∈ E} w_f).

Similarly, the probability of a transition x_t → x_{t+1} is defined by marginalizing over edges incident to x_t:

    P(x_{t+1} | x_t, w) := Σ_{e ∈ E} P(x_{t+1} | e, x_t) P(e | x_t, w)
                         = Σ_{e ∈ E} (1[x_{t+1} ≠ x_t] · x_{t+1}^e x_t^e / (|e| − 1)) · (w_e / Σ_{f ∈ E} x_t^f w_f).

Finally, to promote sparsity among the edge weights and resolve scale-invariance in the model, we assume an IID exponential prior on each edge weight w_e with rate λ > 0:

    P(w_e) := λ · exp(−λ w_e) · 1[w_e ∈ R_+].

2.3 Learning the weights

Given a training sample of playlists S ⊂ X* (where X* denotes the Kleene star operation), we would like to find the maximum a posteriori (MAP) estimate of w:

    w ← argmax_{w ∈ R_+^{|E|}} log P(w | S)
      = argmax_{w ∈ R_+^{|E|}} Σ_{s ∈ S} log P(s | w) + Σ_{e ∈ E} log P(w_e).    (1)

The MAP objective (1) is not concave, and it is generally difficult to find a global optimum. Our implementation uses the L-BFGS-B algorithm [2] to solve for w, and converges quite rapidly to a stationary point. Training typically takes a matter of seconds, even for the large playlist collections and edge sets described in Section 3.
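A small self-contained sketch of the likelihood and of MAP estimation via bound-constrained L-BFGS-B is given below. This is not the authors' implementation: the function names and toy edge representation are illustrative, and gradients are approximated numerically by SciPy for brevity, so it is only suitable for toy-sized edge sets:

```python
import numpy as np
from scipy.optimize import minimize

def log_likelihood(w, edges, playlists):
    """Log-likelihood of a playlist sample under the hypergraph model.

    edges is a list of song sets, w the weight per edge. Initial songs
    and transitions both marginalize over edge memberships, as in
    Section 2.2.
    """
    total, W = 0.0, w.sum()
    for s in playlists:
        # P(x0 | w) = sum_e  x0^e / |e| * w_e / sum_f w_f
        p0 = sum(w[i] / (len(e) * W) for i, e in enumerate(edges) if s[0] in e)
        total += np.log(p0)
        for xt, xn in zip(s, s[1:]):
            # normalizer: total weight of edges incident to x_t
            z = sum(w[i] for i, e in enumerate(edges) if xt in e)
            p = sum(w[i] / ((len(e) - 1) * z) for i, e in enumerate(edges)
                    if xn != xt and xt in e and xn in e)
            total += np.log(p)
    return total

def fit_weights(edges, playlists, lam=1.0):
    """MAP estimate of w: log-likelihood plus exponential log-prior
    (rate lam), maximized with L-BFGS-B under positivity bounds."""
    def neg_map(w):
        return -(log_likelihood(w, edges, playlists) - lam * w.sum())
    w0 = np.ones(len(edges))
    res = minimize(neg_map, w0, method="L-BFGS-B",
                   bounds=[(1e-6, None)] * len(edges))
    return res.x
```

Including a uniform edge over all songs (Section 3.2) keeps every term strictly positive, so the logarithms above stay finite.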

3. DATA COLLECTION

Previous work on playlist modeling used the Art of the Mix (AotM; http://www.artofthemix.org) collection of Ellis, et al. [4]. The existing AotM dataset was collected in 2002, and consists of roughly 29K playlists over 218K songs, provided as lists of plain-text song and artist names. In this work, we expand and enrich this dataset into a new collection, which we denote AotM-2011 (http://cosmal.ucsd.edu/cal/projects/aotm2011/). This section describes our data collection, pre-processing, and feature extraction methodology.

3.1 Playlists: Art of the Mix 2011

To expand the AotM playlist collection, we crawled the site for all playlists, starting from the first indexed playlist (1998-01-22) up to the most recent at the time of collection (2011-06-17), resulting in 101343 unique playlists. Each playlist contains not only track and artist names, but also a timestamp and a categorical label (e.g., Road Trip or Reggae).

To effectively model the playlist data, the plain-text song and artist names must be resolved into a common namespace. Following previous work, we use the Million Song Dataset (MSD) as the underlying database [1, 11]. Rather than rely on the Echo Nest text-search API to resolve song identifiers, we instead implemented a full-text index of MSD song and artist names in Python with the Whoosh library (http://packages.python.org/Whoosh/). This allowed both high throughput and fine-grained control over accent-folding and spelling correction. Each (artist, song) pair in the raw playlist data was used as a query to the index, and resolved to the corresponding MSD song identifier (if one was found). In total, 98359 songs were matched to unique identifiers.

Because not every song in a raw playlist could be correctly resolved, each playlist was broken into contiguous segments of two or more matched song identifiers. Finally, playlist segments were grouped according to category. Table 1 lists each of the 25 most popular categories by size.

3.2 Edge features

To fully specify the playlist model, we must define the edges of the hypergraph. Because edges can be arbitrary subsets of songs, the model is able to seamlessly integrate disparate feature modalities. We use the following collection of edge features, all of which can be derived from MSD and its add-ons.

Audio: To encode low-level acoustic similarity, we first mapped each song i to a vector x_i ∈ R^222 using the optimized vector-quantized Echo Nest Timbre (ENT) descriptors provided by [1, 11]. Audio descriptors were clustered via online k-means, and cluster assignments were used to produce k disjoint subsets. Repeating this for k ∈ {16, 64, 256} provided multiple overlapping edges of varying degrees of granularity. All 98K songs receive audio representations.

Collaborative filter: To capture high-level similarities due to user listening patterns, we construct edges from the taste profile data used in the MSD Challenge [10]. We used the Bayesian Personalized Ranking (BPR) algorithm [6, 13] to factor the users-by-songs (1M-by-380K) feedback matrix into latent feature vectors x_i ∈ R^32. The BPR regularization parameters were set to λ1 = λ2 = 10^−4. Edges were constructed from cluster assignments following the procedure described above for audio features. 62272 songs (63%) coincide with the taste profile data.

Lyrics: Previous studies have shown the importance of lyrics in playlist composition [8]. To compute lyrical similarity, we applied online latent Dirichlet allocation (LDA) [7] with k = 32 to the musiXmatch lyrics database (http://labrosa.ee.columbia.edu/millionsong/musixmatch). We then constructed three sets of 32 edges (one edge per topic): the first matches each song to its most probable topic, the second matches each song to its top three topics, and the third to its top five topics. 53351 songs (56%) were found in the musiXmatch data.

Social tags: Previous work incorporated semantic information by using the total similarity between the bag-of-tags vectors of songs [11]. Here, we take a more flexible approach, and model each tag separately. Using the Last.fm (http://last.fm/) tags for MSD, we match each song to its top-10 most frequent tags. Each tag induces an edge: the set of songs assigned to that tag. (A similar tag-hypergraph model was proposed by Wang, et al. [14].) 80396 songs (82%) matched to tag edges.

Uniform shuffle: Because the features described above cannot model all possible transitions, we include a uniform edge that contains all songs. A transition through the uniform edge can be interpreted as a random restart of the playlist. The uniform shuffle also provides a standard baseline for comparison.

Feature conjunctions: Some of the features described above may be quite weak individually, but become highly descriptive when combined. For example, the tag rock and the era YEAR-1955 are both vague, but the conjunction of the two descriptors, rock-&-YEAR-1955, retains semantic interpretability and is much more precise. We therefore augment the above collection of edges with all pairwise intersections of features. Note that this induces general cross-modal feature conjunctions, such as Lyrics topic #4-&-Audio cluster #17, resulting in an extremely rich set of song descriptors.
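To make the edge constructions concrete, the following sketch builds era edges (half-overlapping decades, described later in this section) and pairwise conjunctions. The helper names are hypothetical, and the 50%-overlap pruning rule (Section 4) is interpreted here as a fraction of the smaller constituent edge, which is one plausible reading of that criterion:

```python
from itertools import combinations

def era_edges(years):
    """Era edges: each song maps to its YEAR edge and to the two
    half-overlapping DECADE edges covering it. E.g., a 1977 song maps
    to YEAR-1977, DECADE-1970, and DECADE-1975."""
    edges = {}
    for song, y in years.items():
        half = (y // 5) * 5  # most recent half-decade boundary <= y
        for name in ("YEAR-%d" % y,
                     "DECADE-%d" % (half - 5),
                     "DECADE-%d" % half):
            edges.setdefault(name, set()).add(song)
    return edges

def conjunction_edges(edges, max_overlap=0.5):
    """Pairwise feature conjunctions: intersect every pair of edges,
    dropping empty conjunctions and those covering more than
    max_overlap of the smaller constituent (redundancy pruning)."""
    conj = {}
    for (na, ea), (nb, eb) in combinations(edges.items(), 2):
        inter = ea & eb
        if inter and len(inter) <= max_overlap * min(len(ea), len(eb)):
            conj["%s-&-%s" % (na, nb)] = inter
    return conj
```

In this representation an edge is simply a named set of song identifiers, so any modality that can be discretized (tags, topics, audio clusters) fits the same interface.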

Era: The era in which songs are released can play an important role in playlist composition [3, 8]. To model this, we use the MSD meta-data to represent each song by its year and by half-overlapping decades. For example, the song Parliament - Flash Light maps to the edges YEAR-1977, DECADE-1970, and DECADE-1975. 77884 songs (79%) were mapped to era descriptors.

Familiarity: Previous studies have noted the importance of song- or artist-familiarity when composing playlists [3, 11]. We used the artist familiarity data provided with MSD, which maps each song to the range [0, 1] (0 being unfamiliar, 1 being very familiar). Edges were constructed by estimating the 25th and 75th percentiles of familiarity, and mapping each song to LOW, MEDIUM, or HIGH familiarity.

Category          Playlists  Segments   Songs
Mixed                 41798    101163   64766
Theme                 12813     31609   35862
Rock-Pop               4935     13661   20364
Alternating DJ         4334     10493   18083
Indie                  4528     10333   13678
Single Artist          3717      9044   17715
Romantic               2523      6269    8873
Road Trip              1846      4817    8935
Punk                   1167      3139    4936
Depression             1128      2625    4794
Break Up               1031      2512    4692
Narrative               964      2328    5475
Hip Hop                1070      1958    2505
Sleep                   675      1487    2957
Electronic Music        611      1131    2290
Dance-House             526      1117    2375
Rhythm and Blues        432      1109    2255
Country                 398       908    1756
Cover                   447       833    1384
Hardcore                268       633    1602
Rock                    215       565    1866
Jazz                    295       512    1089
Folk                    241       463    1137
Reggae                  183       403     831
Blues                   165       373     892
Top-25                86310    209485   97411

Table 1. The distribution of the top 25 playlist categories in AotM-2011. Each playlist consists of one or more segments of at least two contiguous MSD songs. 948 songs do not appear within the top 25 categories, but are included in the model.

4. EXPERIMENTS

To evaluate the proposed method, we randomly partitioned each of the top-25 categories listed in Table 1 into ten 75/25 train/test splits. For each split, the train (test) sets are collected across categories to form a global train (test) set ALL, which is used to train a global model.

After fitting a model to each training set, we compute the average (length-normalized) log-likelihood of the test set S′:

    L(S′ | w) := (1 / |S′|) Σ_{s ∈ S′} (1 / |s|) · log P(s | w).

For comparison purposes, we report performance in terms of the relative gain over the uniform shuffle model w_u (all weight assigned to the uniform edge):

    G(w) := 1 − L(S′ | w) / L(S′ | w_u).

To simplify the model and reduce over-fitting effects, we pruned all edges containing fewer than 384 (98359/256) songs. Similarly, we pruned redundant conjunction edges that overlapped by more than 50% with either of their constituent edges. Table 2 lists the number of edges retained after pruning. On average, each song maps to 76.46 ± 57.79 edges, with a maximum of 218. In all experiments, we fix the prior parameter λ = 1.

Feature                # Edges
Audio                      204
Collaborative filter        93
Era                         56
Familiarity                  3
Lyrics                      82
Tags                       201
Uniform                      1
All features               640
Feature conjunctions      6390

Table 2. Summary of edges after pruning.

4.1 Experiment 1: Does dialect matter?

In the first set of experiments, we compare the global model to category-specific models. Figure 2 illustrates the relative gain over uniform across all categories for four different model configurations: tags, tags with pairwise conjunctions, all features, and all features with conjunctions.

Several interesting trends can be observed from Figure 2. First, in all but two cases (Narrative and Rock under the all-features-with-conjunctions model), category-specific models perform at least as well as the global model, and are often substantially better. As should be expected, the effect is most pronounced for genre-specific categories that naturally align with semantic tags (e.g., Hip Hop or Punk). Note that the larger categories overlap more with ALL, leaving less room for improvement over the global model.

Not surprisingly, the Mixed category appears to be difficult to model with similarity-based features. The fact that it is the single largest category (Table 1) may explain some of the difficulties observed in previous studies when using a global model [11]. Similarly, several other categories are quite broad (Theme, Narrative, Rock), or may be inherently difficult (Alternating DJ, Mixed).

Also of note are differences across model configurations. Feature conjunctions generally provide a modest improvement, both in the global and the category-specific models. Due to the large parameter space, some over-fitting effects can be observed in the smallest categories (Folk, Reggae, Blues). Interestingly, several categories benefit substantially from the inclusion of all features compared to only tags (e.g., Hip Hop, Punk, Jazz).

4.2 Experiment 2: Do transitions matter?

Given the flexibility of the model, it is natural to question the importance of modeling playlist continuity: could a model which ignores transition effects perform as well as the random walk model? To test this, we split each playlist s = (x_0 → x_1 → ··· → x_T) into singletons s_0 = (x_0), ···, s_T = (x_T). With this modified corpus, the model treats each song in a playlist as an independent draw from the initial song distribution P(x_0 | w). Consequently, a model trained on this corpus can fit global trends across playlists within a category, but cannot enforce local continuity.

Figure 3 illustrates the relative gain for each category under the stationary distribution with all features and conjunctions. The results are qualitatively similar for alternate model configurations. Compared to Figure 2 (bottom-right), the results are substantially worse for most categories. In many cases, the stationary model performs worse than the uniform shuffle. This reflects the importance of transition effects when modeling playlists, even when the corpus is confined to genre-specific categories.
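The ablation above amounts to discarding all transition structure; together with the gain measure defined at the start of this section, it can be sketched as follows (function names are illustrative):

```python
import numpy as np

def to_singletons(playlists):
    """Split each playlist (x0, ..., xT) into the singleton playlists
    (x0), ..., (xT), removing all transition information."""
    return [[x] for s in playlists for x in s]

def avg_normalized_ll(log_probs, lengths):
    """L(S'|w): mean over playlists of (1/|s|) * log P(s|w)."""
    return float(np.mean(np.asarray(log_probs, dtype=float)
                         / np.asarray(lengths, dtype=float)))

def gain_over_uniform(L_model, L_uniform):
    """G(w) = 1 - L(S'|w) / L(S'|w_u). Since log-likelihoods are
    negative, the gain is positive exactly when the model assigns
    higher likelihood than the uniform shuffle."""
    return 1.0 - L_model / L_uniform
```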
Figure 2. The median gain in log-likelihood over the uniform shuffle model, aggregated over ten random splits of the data, for four configurations: tags, tags + conjunctions, all features, and all features + conjunctions. Error bars span the 0.25–0.75 quantiles. Category-specific models generally outperform global models.

Figure 3. Log-likelihood gain over uniform with the stationary model (all features and conjunctions). Ignoring temporal structure significantly degrades performance.

4.3 Experiment 3: Which features matter?

As illustrated in Figure 2, certain categories seem to benefit substantially from the inclusion of non-tag features. To investigate this effect, Figure 4 illustrates the aggregated weight for each feature type under each of the category models. Note that weight is aggregated across feature conjunctions, so the weight for the edge DECADE-1955-&-rock counts toward both Era and Tag.

Tags receive the most weight (64% on average) across all categories. Audio features appear to be most useful in Hip Hop, Jazz, and Blues (43%–44%, compared to a 26% average). This is not surprising, given that these styles feature relatively distinctive instrumentation and production qualities. Lyrical features receive the most weight in categories with salient lyrical content (Folk, Cover, Narrative, Hardcore, Break Up) and low weight in categories with little or highly variable lyrical content (Electronic Music, Dance-House, Jazz). Era and familiarity receive moderate weight (on average, 22% and 15% respectively), but the majority of it (20% and 14%) is due to conjunctions.

Figure 4. Distribution of learned edge weights (Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform) for each playlist category. Weight is aggregated across feature conjunctions.

4.4 Example playlists

Table 3 illustrates samples drawn from category-specific feature conjunction models. For generative purposes, the uniform edge was removed after training. The generated playlists demonstrate both consistency within a single playlist and variety across playlists. Each transition in a playlist is explained by the corresponding (incoming) edge, which provides transparency to the user: for example, Cole Porter - You're the Top follows Django Reinhardt - Brazil because both songs belong to the conjunction edge AUDIO-3/16-&-jazz, and thus share both high- and low-level similarity.

5. CONCLUSION

We have demonstrated that playlist model performance can be improved by treating specific categories of playlists individually. While the simple models proposed here work well in some situations, they are far from complete, and suggest many directions for future work. The first-order Markov assumption is clearly a simplification, given that users often create playlists with long-term interactions and global thematic properties. Similarly, the uniform distribution over songs within an edge set allows for an efficient and scalable implementation, but allowing non-uniform distributions could also be an avenue for future improvement.

Category          Edge                        Playlist
Hip Hop           AUDIO-149/256               The Conspiracy (Freestyle)
                  AUDIO-149/256               Busta Rhymes - Bounce
                  DECADE-2000-&-rap           Lil' Kim (Featuring Sisqo) - How Many Licks?
                  old school                  A Tribe Called Quest - Butter
                  DECADE-1985-&-Hip-Hop       Beastie Boys - Get It Together
                  AUDIO-12/16                 Big Daddy Kane - Raw [Edit]
Electronic Music  AUDIO-11/16-&-downtempo     Everything But The Girl - Blame
                  DECADE-1990-&-trip-hop      Massive Attack - Spying Glass
                  AUDIO-11/16-&-electronica   Björk - Hunter
                  DECADE-2000-&-AUDIO-23/64   Four Tet - First Thing
                  electronica-&-experimental  Squarepusher - Port Rhombus
                  electronica-&-experimental  The Chemical Brothers - Left Right
Rhythm and Blues  70s-&-soul                  Lyn Collins - Think
                  AUDIO-14/16-&-funk          Isaac Hayes - No Name Bar
                  DECADE-1965-&-soul          Michael Jackson - My Girl
                  AUDIO-6/16-&-soul           The Platters - Red Sails In The Sunset
                  FAMILIARITY-MED-&-60s       The Impressions - People Get Ready
                  soul-&-oldies               James & Bobby Purify - I'm Your Puppet
Jazz              AUDIO-14/16-&-jazz          Peter Cincotti - St Louis Blues
                  jazz                        Tony Bennett - The Very Thought Of You
                  vocal jazz                  Louis Prima - Pennies From Heaven
                  jazz-&-instrumental         Django Reinhardt - Brazil
                  AUDIO-3/16-&-jazz           Cole Porter - You're The Top
                  jazz                        Doris Day - My Blue Heaven

Table 3. Example playlists generated by various dialect models. Edge denotes the incoming edge to the corresponding song, which, for transitions, is shared by the previous song. Feature conjunctions are indicated by X-&-Y.
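The generation procedure behind Table 3 can be sketched as follows. A trained weight per named edge is assumed, the uniform edge is presumed already removed, and all names are illustrative rather than the authors' code:

```python
import random

def generate_playlist(edges, w, length, rng=random):
    """Sample a playlist from a trained hypergraph model: draw an edge
    proportionally to its weight, pick a song uniformly inside it, then
    repeat among edges incident to the current song (Section 2.1).
    Returns (edge, song) pairs, mirroring the Edge column of Table 3."""
    names = list(edges)
    e = rng.choices(names, weights=[w[n] for n in names], k=1)[0]
    song = rng.choice(sorted(edges[e]))
    path = [(e, song)]
    while len(path) < length:
        incident = [n for n in names if song in edges[n] and len(edges[n]) > 1]
        if not incident:  # dead end: no edge offers an outgoing transition
            break
        e = rng.choices(incident, weights=[w[n] for n in incident], k=1)[0]
        song = rng.choice(sorted(edges[e] - {song}))
        path.append((e, song))
    return path
```

Because each step reports the sampled edge alongside the song, every transition comes with the same kind of human-readable explanation shown in the table.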

6. ACKNOWLEDGMENTS

The authors acknowledge support from Qualcomm, Inc., eHarmony, Inc., Yahoo! Inc., Google, Inc., and NSF Grants CCF-0830535, IIS-1054960, and EIA-0303622.

7. REFERENCES

[1] T. Bertin-Mahieux, D.P.W. Ellis, B. Whitman, and P. Lamere. The million song dataset. In ISMIR, 2011.

[2] R.H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput., 16(5):1190–1208, Sep. 1995.

[3] S.J. Cunningham, D. Bainbridge, and A. Falconer. "More of an art than a science": Supporting the creation of playlists and mixes. In ISMIR, 2006.

[4] D. Ellis, B. Whitman, A. Berenzweig, and S. Lawrence. The quest for ground truth in musical artist similarity. In ISMIR, 2002.

[5] B. Fields. Contextualize Your Listening: The Playlist as Recommendation Engine. PhD thesis, Goldsmiths, University of London, April 2011.

[6] Z. Gantner, S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. MyMediaLite: A free recommender system library. In RecSys, 2011.

[7] M. Hoffman, D. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In NIPS, 2010.

[8] J.H. Lee. How similar is too similar?: Exploring users' perceptions of similarity in playlist evaluation. In ISMIR, 2011.

[9] B. Logan. Content-based playlist generation: exploratory experiments. In ISMIR, 2002.

[10] B. McFee, T. Bertin-Mahieux, D.P.W. Ellis, and G.R.G. Lanckriet. The million song dataset challenge. In AdMIRe, 2012.

[11] B. McFee and G.R.G. Lanckriet. The natural language of playlists. In ISMIR, 2011.

[12] R. Ragno, C.J.C. Burges, and C. Herley. Inferring similarity between music objects with application to playlist generation. In 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005.

[13] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 2009.

[14] F. Wang, X. Wang, B. Shao, T. Li, and M. Ogihara. Tag integrated multi-label music style classification with hypergraph. In ISMIR, 2009.