
Random Playlists Smoothly Commuting Between Styles

MARCOS ALVES DE ALMEIDA, Universidade Federal de Minas Gerais, Brazil
CAROLINA COIMBRA VIEIRA, Universidade Federal de Minas Gerais, Brazil
PEDRO OLMO STANCIOLI VAZ DE MELO, Universidade Federal de Minas Gerais, Brazil
RENATO MARTINS ASSUNÇÃO, Universidade Federal de Minas Gerais, Brazil

Someone enjoys listening to playlists while commuting. He wants a different playlist of n songs each day, but always starting from Locked Out of Heaven, a Bruno Mars song. The list should progress in smooth transitions between successive and randomly selected songs until it ends up at Stairway to Heaven, a Led Zeppelin song. The challenge of automatically generating random and heterogeneous playlists is to find the appropriate balance among several conflicting goals. We propose two methods for solving this problem. The first is called ROPE, and it depends on a representation of the songs in a Euclidean space. It generates a random path through a Brownian bridge that connects any two songs selected by the user in this music space. The second is STRAW, which constructs a graph representation of the music space where the nodes are songs and edges connect similar songs. STRAW creates a playlist by traversing the graph through a steered random walk that starts on a selected song and is directed towards a target song also selected by the user. Compared with state-of-the-art algorithms, ours are the only ones that satisfy all of the following quality constraints: heterogeneity, smooth transitions, novelty, scalability, and usability. We demonstrate the usefulness of our proposed algorithms by applying them to a large collection of songs, and we make a prototype available.

CCS Concepts: • Information systems → Music retrieval; • Applied computing → Sound and music computing;

Additional Key Words and Phrases: Music, Sound and Music Computing, System applications and experience, Knowledge and data engineering tools and techniques.
ACM Reference Format: Marcos Alves de Almeida, Carolina Coimbra Vieira, Pedro Olmo Stancioli Vaz De Melo, and Renato Martins Assunção. 2019. Random Playlists Smoothly Commuting Between Styles. ACM Trans. Multimedia Comput. Commun. Appl. 9, 4, Article 01 (August 2019), 20 pages. https://doi.org/0000001.0000001

Authors’ addresses: Marcos Alves de Almeida, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, marcos. [email protected]; Carolina Coimbra Vieira, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, carolcoimbra@dcc.ufmg.br; Pedro Olmo Stancioli Vaz De Melo, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, olmo@dcc.ufmg.br; Renato Martins Assunção, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1551-6857/2019/08-ART01 $15.00 https://doi.org/0000001.0000001

ACM Trans. Multimedia Comput. Commun. Appl., Vol. 9, No. 4, Article 01. Publication date: August 2019.

1 INTRODUCTION

With the rise of music streaming services such as Spotify, Apple Music, and Deezer, music playlist generation has become an important research topic [10]. With their smartphones, people have instant access to millions of songs, which can easily be compiled into playlists to be listened to anywhere, anytime. People turn to music playlists to ease monotony and provide motivation in many activities, for example in workout gyms or office spaces. A dichotomy in such scenarios is that while people’s mood gradually changes throughout the activity, the playlist usually circulates impassively over similar songs [4, 7, 16, 17, 19], so there is nothing left to the user but to manually change the current playlist for a more appropriate one. During workouts, for instance, some users may like to progressively increase the tempo of the songs in the playlist until a given point, after which the songs should gradually change to more relaxing ones [2].

Imagine a user who wants to balance two opposing wishes. On the one hand, to listen to the songs they love as often as possible, at the risk of getting tired of them. On the other hand, to listen to new and unexpected songs that eventually add to that beloved list. This user wants ever new instances of randomly generated playlists with songs revolving around his preferred styles or performers. This task becomes particularly hard if we add the requirement that the playlists should span widely diverse styles in order to accommodate mood fluctuations. The good news is that gradually changing the songs in order to connect significantly different genres naturally yields songs that are very likely unknown to the user, which is a desired property for playlists [38].

This scenario is just one illustration of the many situations where a user may desire a playlist with some specific characteristics. Although the quality of a playlist is commonly associated with its homogeneity in terms of the songs’ similarity [4, 7, 16, 19], it is well known that a playlist does not need to be entirely homogeneous if the songs fit a given context or purpose [6, 20, 34, 37]. Indeed, it is hard to imagine a homogeneous playlist that fits well in a wedding or graduation party. These events may bring together a very heterogeneous group of people in terms of age, cultural background, social status and, consequently, musical taste [14].
Thus, in such events, it is expected that all groups of people can listen to songs of their liking at some point, and a smooth transition between these likely diverse songs is appreciated [4, 7, 16, 19].

Based on the constraints discussed above, the focus of this paper is the proposal of algorithms that generate random playlists, which may be very heterogeneous at the user's discretion and which satisfy a set of desired properties. The main goal of the algorithms is to randomly select an ordered list of $k$ songs out of a large music collection. This list contains two (or more) specific songs, called anchor songs, pre-selected by the user and potentially of widely different genres. The user also selects the desired number of songs in the playlist. The motivation for letting the user define the anchor songs is that he controls the region of the music space in which he would like the playlist to be, satisfying his music taste. Also, by defining the number of songs in the playlist, he implicitly controls its duration. The generated playlist should favor smooth transitions between successive tracks, even if from different styles, and should be randomly generated in order to satisfy the novelty property (e.g., a different playlist on each commuting day). Moreover, the method should require minimal user effort and should be able to pass through different genres as fast as the user desires. Finally, the method should be fast and scalable, i.e., it should be able to generate playlists from very large music collections in real time.

One way to generate heterogeneous playlists is to simply concatenate different homogeneous playlists. But this is a poor approach, since the sudden and drastic change when switching between the playlists may annoy the users. A better solution would sort the songs in such a way that successive songs present a smooth transition, even when changing between styles [34].
However, this solution may not be feasible, since ordering a list of songs to have smooth transitions is similar to the Travelling Salesman Problem (TSP) and may not scale with the size of the playlist.

Our algorithm should be able to generate smooth and random trajectories between widely separated endpoint songs in a certain music space, for example, one starting from Elvis Presley and ending with Daft Punk without rough transitions between successive songs. The random aspect guarantees a different playlist every time the algorithm is run, avoiding boredom. The smoothness aspect creates an atmosphere where emotions are built and developed in a pleasant and steady way.

Several aspects make the automated generation of random and heterogeneous playlists a challenging task. First, there is usually a large number of available tracks that can potentially be added to the playlist [7]. If we consider a heterogeneous playlist, this number can be extremely large, requiring an efficient and scalable algorithm. Second, to better decide which songs should be added to the playlist to satisfy the user's taste, a well-defined similarity measure between tracks is desired. Even though there are many proposed methods to calculate the similarity between tracks using acoustic characteristics (such as timbre, pitch and harmony) and metadata (such as tags and popularity), this task does not yet have a standard solution [4, 7, 11]. Third, playlists should maintain smooth transitions between consecutive songs. Unfortunately, similarity-based algorithms, which are the most common approach [4, 7], generate smooth transitions while maximizing playlist homogeneity. This compromises the diversity and serendipity of the tracks, which are reported as desired properties for playlists [4, 7, 38]. Fourth, there is the problem of clustering the songs and coherently mapping them into a music space, since a heterogeneous playlist should pass through significantly different genres in an orderly and smooth way. Finally, playlists should present an appropriate balance between familiarity and novelty [19, 37], i.e., they should have both expected and unexpected songs.

The first contribution of our work is the formulation of the problem of generating high-quality random and heterogeneous playlists and the proposal of a general method for solving it. We then propose two algorithms that implement this method and that satisfy a set of quality constraints. We also demonstrate the usefulness of our proposed algorithms by applying them to the large collection of songs used in [26]. We show that our two proposed algorithms are able to generate playlists with smooth transitions and, at the same time, to cover significantly different genres selected by the user.
The algorithms are also random, consistently using new songs each time a new playlist is generated. Our proposed algorithms satisfy five quality criteria (heterogeneity, smooth transitions, novelty, scalability and usability), in contrast with the other state-of-the-art algorithms. Finally, a prototype of our method is available at http://homepages.dcc.ufmg.br/~marcos.almeida/musics/. To the best of our knowledge, this is the first work to propose a method for automatically generating random heterogeneous playlists with all those quality constraints.

The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 describes the proposed general method to generate random heterogeneous playlists and two algorithms that implement it, namely ROPE and STRAW. The dataset and how we use it to create a music space are described in Sections 4 and 5, respectively. The experiments showing the effectiveness of our proposals are presented in Section 6. Finally, Section 7 presents the conclusions.

2 RELATED WORK

Song representation. A playlist is defined as a finite sequence of songs $\{s_1, \ldots, s_n\}$. A useful step for automatically generating quality playlists is to define a similarity (or distance) metric between songs [4, 7]. Such a metric makes it possible to identify songs that can be added after a given song in a playlist without causing a harsh transition. In this case, each song $s_i$ can be represented by a numerical vector $x_i = (x_i^1, \ldots, x_i^m)$, where the values $x_i^j$ denote features of the given song. In their survey, Bonnin and Jannach [4] provide a very comprehensive taxonomy of background knowledge databases that could be used to extract such features. First, one can extract musical features from the audio signal, such as harmonic [26] or timbral [18, 26, 28] characteristics, using, for instance, the Mel-Frequency Cepstral Coefficients, to compare [9, 34] or classify [11] songs. Second, features can be derived directly from metadata and expert annotations, such as year of release, artist and genre information [28]. Third, from social web data, it is possible to extract valuable information about music listening behavior, which can be described, for instance, in Twitter posts [13]. Finally, usage data has been extensively used to characterize musical tracks, including radio station playlists [24, 36], user listening logs [27] and collections of personal playlists [5]. In [22], for example, the authors used tags given by users and audio codewords to represent a song.
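
To make the role of the distance metric concrete, the following minimal sketch represents each song by a feature vector and picks, among candidate songs, the one that gives the smoothest transition. It assumes a plain Euclidean distance, which is only one possible choice of metric; the function names and the three-feature vectors are hypothetical and made up for illustration.

```python
import math


def euclidean_distance(x, y):
    """Distance between two songs given as equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def smoothest_next(current, candidates):
    """Return the candidate vector closest to `current` (smallest distance)."""
    return min(candidates, key=lambda c: euclidean_distance(current, c))


# Hypothetical feature vectors (illustrative values only, not real data).
song_a = (0.20, 0.50, 0.10)
song_b = (0.25, 0.45, 0.15)
song_c = (0.90, 0.10, 0.80)

# song_b is the smoother follow-up to song_a because it is much closer.
next_song = smoothest_next(song_a, [song_b, song_c])
```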


Desired properties for playlists. Recent studies that analyzed playlists and their automatic generators described objective measures that could be used to evaluate the quality of playlists [4, 7, 16, 19, 20, 37]. While there is a consensus that all playlists should have smooth transitions between successive tracks, some studies indicate that diversity can be as important as homogeneity [4, 19]. As discussed in [37], in a scenario where a user enjoys two diverse music genres, the transition between songs plays an important role in user satisfaction with the playlist. According to [20], the homogeneity of track features and the diversity of artists were ranked as the most important criteria for the quality of playlists. The amount of homogeneity can vary drastically, depending on the intended duration of the playlist, the targeted audience, and the user, with some listeners having a wide taste while others are restricted in their choice. This suggests that, if one is certain that all songs available are enjoyed by the listener, then a playlist could be created with the appropriate balance between homogeneity (in order to satisfy smooth transitions) and diversity (satisfying heterogeneity).

Besides diversity and smooth transitions, playlist generators should meet other quality criteria. Taramigkou et al. [38] highlighted the importance of offering serendipitous and novel experiences to playlist listeners, i.e., playlists should always reveal new sequences of songs. Moreover, since digital music collections are growing quickly, automatic playlist generators need to be fast and scalable, as pointed out by the studies of [2, 29, 32], which were motivated by the need to improve the scalability of previous playlist generators.
Concerning the goal of this paper, we propose five quality metrics for evaluating generators of random and heterogeneous playlists:

• novelty, the generator should produce significantly different playlists each time, independently of the user input parameters;
• heterogeneity, the playlists should be heterogeneous, passing through different genres as fast as the user desires;
• scalability, the generator should be fast, independently of the size of the playlist and the database;
• smooth transitions, the generated playlists should have smooth and coherent transitions between consecutive tracks;
• usability, the generator should be as parsimonious as possible, requiring minimum user effort and input.

To satisfy all these properties, a generator faces conflicting objectives, such as guaranteeing smooth transitions while providing wide heterogeneity. Thus, a challenge is to search for the appropriate balance among such properties, finding a favorable trade-off.

Automatic generation of heterogeneous playlists. One approach to generating heterogeneous playlists is constrained optimization [1, 2]. Given a set of tracks, their characteristics, and a set of explicitly specified constraints, the goal is to create an optimal sequence of tracks that satisfies the constraints. Unfortunately, most of these approaches do not satisfy the novelty property, since they are deterministic, i.e., they generate the same optimal playlist for a given set of constraints. Moreover, they usually have high computational complexity (harming scalability), which makes the problem intractable for larger music collections [4, 32]. In [33], for instance, the authors proposed a simulated annealing algorithm to determine the best order in which to listen to a set of songs so as to satisfy some constraints. Even though this solution can satisfy the novelty criterion by adding a random factor to the algorithm, it may take a long time to converge to a feasible solution. In [34], the authors applied Travelling Salesman algorithms to audio similarity graphs in order to generate a sequential order of all songs in a database. In this case, the “playlist” is composed of all songs and the order is always the same, going against scalability and novelty, respectively. Because of that, different local search procedures were proposed, such as genetic algorithms [15]. Unfortunately, these solutions require complex user inputs for generating heterogeneous playlists (harming usability).

Random walks on graphs are also used to automatically generate playlists, but they have a disadvantage in our problem. These approaches usually start by circulating among songs close to the seed track and take a long time before switching to a different genre, generating very homogeneous playlists, which harms heterogeneity. Besides that, given only a seed track, the user cannot control the music genre the playlist should move toward, harming usability. One of the previous works that constructed music similarity graphs is that of Ragno et al. [36], who analyzed radio station playlists to construct a weighted graph where each node represents a song and the edge weight is proportional to the number of times the two songs are played consecutively. Later, Maillet et al. [24] proposed a similar approach, but including in the graph songs that were not present in the training data. Miotto et al. [30] proposed to construct a similarity graph using both acoustic characteristics and the text description of the songs. The goal is to retrieve songs that fit into a context, which is described by a query text or given by an audio example in the form of a seed song. Results showed improvements when compared with other algorithms in the literature. Although this method achieved good results in retrieving songs similar to a seed song (creating playlists with smooth transitions), it can lead to homogeneous playlists.
Contrary to the aforementioned studies, McFee and Lanckriet proposed a graph-based approach able to generate heterogeneous playlists [29], but only if the input dataset contains heterogeneous playlists, which harms usability and, potentially, scalability. Similarly to [29], several methods were proposed to automatically generate personalized playlists from user data [3, 5, 39]. Again, such methods are able to generate heterogeneous playlists only if the target user behaves accordingly. Even though these studies aim to understand user behavior, an ideal playlist generator should be able to generate heterogeneous playlists from arbitrary music collections.

To the best of our knowledge, the works of Flexer et al. [9] and Pontello et al. [35] are the most similar to our proposal, and will be used as baselines. Flexer et al. [9] propose a method that, starting from a specified song $s$, generates a playlist with $k - 2$ intermediate songs until an ending song $e$, also given by the user. Based on an audio similarity metric that defines a music space, the method generates $k - 2$ equally spaced points between songs $s$ and $e$ and selects the nearest song to each of these points to be part of the playlist. As drawbacks, this method always generates the same playlist given $s$ and $e$ (lacking novelty) and demands a parameter %p, which is related to the percentage of songs that will be removed from the topology before generating the playlist, which harms usability. More recently, inspired by the work of Pampalk et al. [31], Pontello et al. [35] proposed an online music space navigation algorithm that potentially creates heterogeneous playlists. From a starting song, a set $S$ of similar songs is retrieved and, from them, the next song is chosen probabilistically. The user can indicate the target direction $t$ to which the navigation should go (e.g., rap) and, as in [31], can skip the current song, which updates the navigation direction $v_i$.
When the latter does not occur, $v_i$ is updated online at each song $s_i$ in the playlist, being given by the average of three vectors: $\overrightarrow{s_i t}$, $\overrightarrow{s_{i-1} s_i}$ and $v_{i-1}$. The probability of a song $s_j \in S$ being chosen is given by $1 - d(v_i, \overrightarrow{s_i s_j})$. Note that this algorithm does not allow the user to define the duration or the desired number of intermediate songs of the playlist before reaching the target destination $t$ and, therefore, does not satisfy our definition of heterogeneity.

In Table 1, we summarize the comparison of our two proposed algorithms, ROPE and STRAW, with the state-of-the-art methods. Our proposals satisfy all five criteria, in contrast with the other algorithms.
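
The interpolation baseline of Flexer et al. described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes songs are points in a Euclidean space, uses a brute-force nearest-song search, forbids repeated songs, and leaves out the %p pruning parameter. The function name `interpolate_playlist` and the toy song coordinates are hypothetical.

```python
import math


def interpolate_playlist(songs, start, end, k):
    """Deterministic baseline: pick the unused song nearest to each of the
    k - 2 equally spaced points on the segment from `start` to `end`.

    `songs` maps a song name to its coordinates in the music space.
    Assumes k >= 2 and at least k songs in the collection.
    """
    s, e = songs[start], songs[end]
    playlist = [start]
    for i in range(1, k - 1):
        frac = i / (k - 1)
        point = tuple(a + frac * (b - a) for a, b in zip(s, e))
        unused = [n for n in songs if n not in playlist and n != end]
        playlist.append(min(unused, key=lambda n: math.dist(songs[n], point)))
    playlist.append(end)
    return playlist


# Toy music space (made-up coordinates).
toy_songs = {
    "a": (0.0, 0.0), "b": (1.0, 0.1), "c": (2.0, -0.1),
    "d": (3.0, 0.0), "z": (1.0, 5.0),
}
baseline = interpolate_playlist(toy_songs, "a", "d", 4)
```

Because nothing in the procedure is random, calling it twice with the same endpoints returns the same playlist, which is the lack of novelty noted in the text.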


Table 1. Comparison among state of the art methods with our two proposed algorithms, ROPE and STRAW.

                    Constrained     R. Walks  TSAs  Usage-based     Flexer et al.  Pontello et al.  ROPE          STRAW
                    optimization                    models                                          (our method)  (our method)
                    [1, 2, 15, 32]  [24, 36]  [34]  [3, 5, 29, 39]  [9]            [35]
heterogeneity       ✓               -         ✓     ✓               ✓              -                ✓             ✓
smooth transitions  ✓               ✓         ✓     ✓               ✓              ✓                ✓             ✓
novelty             -               ✓         -     ✓               -              ✓                ✓             ✓
usability           -               -         ✓     -               -              ✓                ✓             ✓
scalability         -               ✓         -     -               ✓              ✓                ✓             ✓

✓: criterion satisfied; -: criterion not satisfied.

3 METHODS

In this section we describe the proposed general model to generate random and heterogeneous playlists (Section 3.1) and two algorithms that implement this model. First, in Section 3.2, we present the Brownian Path generator (ROPE). Then, in Section 3.3, we present the Steered Random Walker (STRAW).

3.1 General Model

The objective of this paper is to propose an automatic generator of random and heterogeneous playlists. As discussed in the previous section, the generator should satisfy all of the following quality constraints: heterogeneity, smooth transitions, novelty, scalability and usability. In this section, we describe a general method that, when implemented accordingly, guarantees that the generator has all these properties.

In order to automatically generate a random and heterogeneous playlist, we first have to define a metric space $(S, d)$ for a given set of $n$ songs $S = \{s_i, i = 1, \ldots, n\}$, with $d$ being the distance function. In this metric space, each song $s_i$ is represented by a vector $s_i = (s_i^1, \ldots, s_i^m) \in \mathbb{R}^m$, where the values $s_i^j$ denote features of the given song and $d : S \times S \to \mathbb{R}^+$ measures the distance between two songs. From now on, we call the ordered pair $(S, d)$ the music space.

As discussed previously, we want this method to be as parsimonious as possible, requiring minimum user effort in order to satisfy our usability criterion. Moreover, the users also decide the time they want to spend listening to the playlist. Thus, we decided that the user has to give only three inputs. The first one is an initial anchor or seed song $s_0$. The second one is a direction anchor, which we represent by a vector $v_d$ in the music space. This direction vector may be a final song for the playlist, a target tag towards which the playlist should drift, or a genre where the playlist should end up. In all three cases, the anchor vector $v_d$ indicates in which direction the playlist should travel in the music space from the seed song $s_0$. The third input is $k$, the desired number of songs in the playlist. The value of $k$ implicitly controls the duration of the playlist. It also implicitly controls the smoothness of the transitions between songs in the playlist.
If the two songs given as input are distant in the music space and the value of $k$ is small, we would need to take bigger steps in the music space, which would probably create abrupt transitions. From now on, we abuse the notation, using the subscript to also denote the position of a given song in the playlist order, where 0 is the subscript of the first song in the playlist.

When generating a heterogeneous playlist, one of the main goals is to obtain smooth transitions between successive songs. Thus, the process of selecting songs from $s_0$ is done recursively, where a new song $s_{i+1}$ is added to the playlist if it is similar to the current song $s_i$. In order to do that, two functions are needed. First, let $F : S \to 2^S$ be a filter function over the set of songs $S$ that, from a song $s_i \in S$, selects a subset $F(s_i)$ of $S$ containing all the songs that are eligible to be selected as the following song in the playlist. We say that a song is eligible if, when it is inserted in the playlist after $s_i$, it generates a smooth transition. Second, let $c : S \times \mathbb{R}^m \times 2^S \to S$ be a function that probabilistically selects a song in $F(s_i)$ to be the next song $s_{i+1}$ in the playlist. This function is designed to satisfy the following properties:

(1) $d(s_i, v_d) > d(s_{i+1}, v_d)$, i.e., song $s_{i+1}$ should be closer to the second anchor point $v_d$ than $s_i$;
(2) $P(c(s_i, v_d, F(s_i)) = s_j) < 1, \ \forall s_j \in F(s_i)$, i.e., the algorithm has to be non-deterministic, satisfying the novelty quality constraint.

In Algorithm 1 we describe the general method to generate a heterogeneous playlist $s$ using the aforementioned formalization. Line 1 shows the input: the music collection $S$, the seed song $s_0$, the direction vector $v_d$ and the desired number of songs in the playlist $k$. The algorithm keeps adding songs to the playlist $s$ by filtering the songs eligible for position $i + 1$ through function $F(s_i)$ and then selecting the song to be added using function $c(s_i, v_d, F(s_i))$ (lines 3 to 5), and stops when the desired number of songs $k$ is reached.

Algorithm 1 Generates a heterogeneous playlist.

1: procedure Generate(S, s_0, v_d, k)
2:     i ← 0
3:     while i < k − 1 do
4:         P ← F(s_i)    ▷ set of eligible songs
5:         s_{i+1} ← c(s_i, v_d, P)
6:         i ← i + 1
       return s

The complexity of this algorithm is $k \times (O(F) + O(c))$. As we show in Sections 3.2 and 3.3, the complexity of $F$ can be negligible when using quadtrees or graphs to select the eligible nodes. Moreover, the idea of filtering the nodes before selecting one of them to add to the playlist is to significantly reduce the search space and, as a consequence, the number of times the comparison $d(s_i, v_d) > d(s_{i+1}, v_d)$ is evaluated. If we limit $F$ to retrieve a constant number of eligible nodes, then the complexity of $c$ is constant, i.e., $O(1)$. As we show in Sections 3.2 and 3.3, both algorithms we propose in this paper are highly scalable, being linear in the playlist size. With respect to the space complexity of the algorithm, we only need to store the vector representation of each song to calculate the similarity between them. Since all vector representations have the same size, the space complexity of the algorithm is $O(n)$, where $n$ is the number of songs in the music space.
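
The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the filter $F$ is taken to be "unused songs within a fixed radius of the current song", the selector $c$ picks uniformly at random among eligible songs that are closer to $v_d$ (property 1), and the loop stops early if no such song exists. The names `filter_eligible`, `choose_next`, `generate` and the `radius` parameter are hypothetical.

```python
import math
import random


def filter_eligible(songs, playlist, radius):
    """F(s_i): unused songs within `radius` of the current (last) song."""
    current = songs[playlist[-1]]
    return [name for name, vec in songs.items()
            if name not in playlist and math.dist(vec, current) <= radius]


def choose_next(songs, current, target, eligible, rng):
    """c(s_i, v_d, F(s_i)): uniform pick among eligible songs closer to v_d."""
    limit = math.dist(songs[current], target)
    closer = [n for n in eligible if math.dist(songs[n], target) < limit]
    return rng.choice(closer) if closer else None


def generate(songs, seed, target, k, radius, rng):
    """Grow the playlist one song at a time until it holds k songs."""
    playlist = [seed]
    while len(playlist) < k:
        eligible = filter_eligible(songs, playlist, radius)
        nxt = choose_next(songs, playlist[-1], target, eligible, rng)
        if nxt is None:  # no smooth step gets closer to v_d: stop early
            break
        playlist.append(nxt)
    return playlist


# Toy music space: songs on a 10 x 3 grid (made-up data).
grid = {(x, y): (float(x), float(y)) for x in range(10) for y in range(3)}
example = generate(grid, (0, 0), (9.0, 1.0), k=6, radius=1.5,
                   rng=random.Random(0))
```

Property (2) holds in this sketch only when more than one eligible song is closer to $v_d$; with a single candidate the choice is forced. Each iteration scans all songs, so the filter here costs $O(n)$ rather than the constant cost obtained with a quadtree or a graph.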

3.2 ROPE

Here we describe our first approach to generate heterogeneous playlists from music collections, called Brownian Path generator (ROPE). From an initial song $s_0$, a final song $s_{k-1}$ (or $v_d$), and the number of songs in the playlist $k$, all given by the user, ROPE generates a random path in the music space from $s_0$ to $v_d$ with $k - 2$ intermediate points. The random path is the sequence of $k - 1$ line segments connecting these points. ROPE allocates a song in the music space to each of these points, creating a playlist with exactly $k$ songs from song $s_0$ to song $s_{k-1}$.

To generate the random path, we use a Brownian bridge model with discrete steps [8]. The Brownian bridge is derived from the Brownian or Wiener process. The latter is a stochastic process $\{W(t) : t \in \mathbb{R}^+\}$ such that all its realizations are continuous functions, with $W(0) = 0$ with probability 1, and with independent increments (that is, $W(t + u) - W(t)$ is independent of $W(s)$ for $0 \le s \le t$) which are Gaussian (that is, $W(t + u) - W(t) \sim N(0, u)$). The Brownian bridge $B(t)$ is simply the Wiener process conditioned on returning to the origin at time $T$: $B(t) = \{W(t) \mid W(T) = 0\}$. It can be shown [8] that the Brownian bridge can be obtained from an unrestricted Wiener process as $B(t) = \{W(t) - (t/T)\, W(T) : t \in [0, T]\}$.
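
The construction $B(t) = W(t) - (t/T)\,W(T)$ translates directly into a discrete simulation. The sketch below is an illustration only: it assumes a uniform time grid, and the function name `brownian_bridge` and its parameters are hypothetical. A Wiener path is built from independent Gaussian increments with variance equal to the time step, and the linear term is then subtracted so that the path is pinned to zero at both ends.

```python
import math
import random


def brownian_bridge(steps, T=1.0, rng=None):
    """Sample a Brownian bridge at the grid points t_i = i * T / steps.

    Returns a list of steps + 1 values with B(0) = B(T) = 0.
    """
    rng = rng or random.Random()
    dt = T / steps
    w = [0.0]  # Wiener process, W(0) = 0
    for _ in range(steps):
        # Independent Gaussian increment: W(t + dt) - W(t) ~ N(0, dt)
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # B(t_i) = W(t_i) - (t_i / T) * W(T), with t_i / T = i / steps
    return [w[i] - (i / steps) * w[-1] for i in range(steps + 1)]


path = brownian_bridge(50, T=2.0, rng=random.Random(0))
```

Using different random seeds yields different realizations, which is what later gives a different playlist on each run.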


Fig. 1. Four independent realizations of a Brownian bridge.

Fig. 2. Illustration of ROPE algorithm in a 2-Dimensional Euclidean space.

Figure 1 shows four independent realizations of the Brownian bridge process. The random Brownian bridge path is transposed to the metric space $(S, d)$ by identifying its origin $(0, B(0)) = (0, 0)$ and its final point $(T, B(T)) = (T, 0)$ with the seed song $s_0$ and the final anchor song $s_{k-1}$, respectively. Next, we select any direction $n$ orthogonal to the direction $s_{k-1} - s_0$, which determines a two-dimensional plane $P$ in $\mathbb{R}^m$. Finally, with the two extremes of the Brownian bridge curve pinned at the two anchor songs, we lay the entire Brownian bridge curve in $P$. When the metric space is $(S, d) = (\mathbb{R}^2, d)$, the process is simpler, as the plane $P$ is the music space itself.

Figure 2 illustrates how ROPE transforms a random path in a 2D Euclidean space into a path in the music space that connects the anchor songs selected by the user. In this example, the random path starts from the origin, but does not necessarily end on the x-axis. Thus, the first step is a rotation that aligns the vector connecting the first and last points of the path with the vector connecting $s_0$ to $s_{k-1}$. Finally, we perform a scaling followed by a translation, which results in a path connecting songs $s_0$ and $s_{k-1}$. Figure 3 shows a two-dimensional music space with songs represented by colored points and one Brownian bridge path (in blue). We fully explain this figure in Section 5.

With the random curve $B(t)$ embedded into $(S, d)$, the playlist is created. Starting with $s_0$, the function $F(s_i)$ sequentially selects a subset of songs eligible to be added after song $s_i$. Take the image of the $k - 2$ equally spaced points $(t_i, B(t_i))$, with $t_i = iT/(k - 1)$ for $i = 1, \ldots, k - 2$, along the embedded Brownian bridge curve. The $i$-th intermediate point gives rise to the set $F(s_{i-1})$, which is composed of the nearest-neighbor songs of the $i$-th intermediate point. The user can select different neighborhood structures. For illustration purposes, we detail one such selection when $(S, d) = (\mathbb{R}^2, d)$. We use a quadtree to divide the two-dimensional music space into $l$ cells (in our case, $l = 100$). $F(s_{i-1})$ uses the quadtree to retrieve the songs located in the quadtree cell closest to the point $(t_i, B(t_i))$, which is the point in the path just after song $s_{i-1}$. From this set, function $c(\cdot)$ simply selects the song closest to $(t_i, B(t_i))$, which becomes the next song $s_i$ in the playlist.

In ROPE, the complexity of $F(\cdot)$ is $O(1)$ (only one cell of the quadtree is selected) and the complexity of $c(\cdot)$ is $O(n/l)$, since the expected number of songs in each cell is approximately $n/l$. Thus, the complexity of ROPE is $O(kn/l)$, i.e., linear in the size of the playlist, which satisfies our scalability criterion.

It is important to point out that the case $(S, d) = (\mathbb{R}^2, d)$ is only illustrative. For spaces of different dimensions, the description of ROPE given above can be easily adapted to create the random playlist. In our experimental results, shown in Section 6, we used a two-dimensional space. After extensive tests, we selected the t-SNE dimensionality reduction algorithm [23] for our implementation. As we show in Section 5, this technique was able to generate a very coherent and compact two-dimensional music space. In Figure 3, there are four trajectories with the same seed song $s_0$ and the same destination song.
The ones shown as black lines were generated by ROPE and the ones shown as red lines were generated by STRAW, our second algorithm, described next. The same figure also shows (in blue) a Brownian bridge trajectory transposed to the music space.

It is also important to point out the difference between ROPE and the algorithm proposed by Flexer et al. [9]. Both algorithms connect two songs selected by the user with a path crossing the music space. However, while Flexer’s algorithm is deterministic, ROPE generates a different path at each execution, satisfying the novelty criterion.
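The ROPE pipeline described in this section can be illustrated with a minimal Python sketch for the two-dimensional case. It is not the authors' implementation: the function names (`brownian_bridge`, `embed_path`, `rope_playlist`) are hypothetical, the bridge is sampled on the unit interval (T = 1), the quadtree lookup is replaced by a brute-force nearest-song search for brevity, and the rule that a song is never repeated within a playlist is an assumption added here.

```python
import math
import random


def brownian_bridge(num_steps, rng):
    """Sample B(t) at t = 0, 1/num_steps, ..., 1 with B(0) = B(1) = 0."""
    walk = [0.0]
    for _ in range(num_steps):
        # Independent Gaussian increments of a discretised Brownian motion.
        walk.append(walk[-1] + rng.gauss(0.0, math.sqrt(1.0 / num_steps)))
    # Subtracting t * W(1) pins the end of the path back to zero.
    return [w - (i / num_steps) * walk[-1] for i, w in enumerate(walk)]


def embed_path(bridge, start, end):
    """Rotate, scale and translate the points (t, B(t)) so that the path
    runs from `start` to `end` in the 2D music space."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    n = len(bridge) - 1
    # (dx, dy) is the image of the unit x-vector; (-dy, dx) is orthogonal to it.
    return [(start[0] + (i / n) * dx - b * dy,
             start[1] + (i / n) * dy + b * dx)
            for i, b in enumerate(bridge)]


def rope_playlist(songs, seed, target, k, rng=None):
    """Return k songs: seed, the songs nearest to the k - 2 interior path
    points, and target. `songs` maps a song name to its (x, y) position."""
    rng = rng or random.Random()
    path = embed_path(brownian_bridge(k - 1, rng), songs[seed], songs[target])
    playlist = [seed]
    for point in path[1:-1]:
        # Assumption: a song already in the playlist is not eligible again.
        candidates = [s for s in songs if s not in playlist and s != target]
        # Brute-force stand-in for the quadtree cell lookup.
        playlist.append(min(candidates, key=lambda s: math.dist(songs[s], point)))
    playlist.append(target)
    return playlist
```

Calling `rope_playlist(songs, "seed", "target", 10)` twice with different random generators would typically return two different playlists with the same first and last songs, which is the behavior the novelty criterion asks for.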

3.3 STRAW

ROPE is able to generate directed random paths from s0 to vd with exactly k steps. However, ROPE is not topology-aware: it may generate a path crossing a sparse or even empty region between s0 and vd in the music space (S,d), possibly populating the playlist with harsh transitions. This is the motivation behind the Steered Random Walker (STRAW), a topology-aware heterogeneous playlist generator based on similarity graphs that we present in this section.

From a similarity threshold τ, STRAW constructs a graph G(V, E) where each node vi ∈ V is associated with a song si ∈ S. There is an edge between two nodes vi and vj only if d(si, sj) < τ, where si and sj are the songs corresponding to nodes vi and vj, respectively. Then, the user inputs the seed song s0, the direction vector vd (e.g. a song) and a desired number of songs k. From these parameters, STRAW generates a directed random walk from v0 towards vd. We call this walk directed because STRAW gives higher probabilities to visiting nodes that are closer to vd.

Unlike ROPE, STRAW cannot guarantee that the destination song sd will be reached in exactly k steps. Nevertheless, it guarantees that the playlist will have exactly k songs and that sd is in it. If song sd is reached before k steps, then a random walk from node sd is initiated until k steps are reached. If sd is not reached within k steps, the directed random walk proceeds until sd is reached, which generates a playlist p′ with k′ > k songs. After that, we remove from p′ the k′ − k songs si with the smallest d(si−1, si+1) values. This minimizes the harm that removing songs does to the smooth transitions and also guarantees a playlist with exactly k songs.

For STRAW to work, the similarity graph G(V, E) must be created beforehand. This graph must be connected so that any song can be reached from any other through a path. Additionally, we want each song to have a certain minimum number κ of neighbors, so that the algorithm has more than one way out of a node. We also want this graph’s topology to represent songs that are similar to many others besides their minimum number κ, i.e., such songs should have a degree higher than κ. To obtain such a graph, we merge several minimum spanning trees (to guarantee connectivity and a minimum node degree) and add to them all edges that connect similar songs. This is done by the following procedure.

We start with a weighted complete graph that connects all pairs of songs, with d(si, sj) as the edge weight. We create a minimum spanning tree (MST) in the music space, represented by a graph MST1(V, E1). From E1, we take the maximum edge weight as τ, which is the smallest threshold value that, when used to create a similarity graph, guarantees that the graph is connected. After that, we remove all edges in E1 from the initial complete graph and generate a second minimum spanning tree MST2(V, E2) from the remaining sub-graph. We keep sequentially removing the edges Ei and generating further trees MSTi(V, Ei), 2 ≤ i ≤ κ, ending up with κ MSTs with no intersection among their edge sets Ei. We then define the similarity graph $G(V, \bigcup_{i=1}^{\kappa} E_i)$, the union of the κ MSTs. This guarantees that, independently of which song is picked as the seed s0, there are at least κ − 1 other songs connected to it that can be added to the playlist.

Finally, since many pairs of similar songs are still missing from E, we create a list le = (e1, e2,...) containing all edges e that are not in E and have weight we ≤ τ, sorted in increasing order of weight. Starting from e1, we add edge ei = (u,v) ∈ le to E if nodes u and v both have degree smaller than Ld. This upper bound guarantees that function c(.) will execute a bounded number (O(Ld)) of operations when selecting the next song in the playlist. The values of κ and Ld are parameters for constructing the music similarity graph and can be set in the development phase.
In our prototype, we used κ = 5 and Ld = 500. The similarity graph G(V, E) needs to be created only once. After its creation, STRAW works as described in Algorithm 1.

Since the navigation structure is a graph, retrieving the set of songs that could be added after song si using F(si) is straightforward. F(si) simply outputs the set P of songs that are connected to the node vi corresponding to song si in G. Since the degree is upper bounded by Ld, the complexity of F(.) is O(1). From P, function c(.) needs to select a song according to properties P1 and P2 described in Section 3.1. To do that, from the set of l eligible nodes P = {v1,...,vl}, a random variable X with domain P is created to randomly select the node corresponding to the next song in the playlist. To satisfy P1, X needs to assign higher probabilities to nodes that are located in the direction of vd. Moreover, to generate a playlist with an expected number of k steps, X should give higher probabilities to nodes that advance neither too much nor too little along the path. Therefore, we define the desired step size $\hat{\delta}$ in a playlist with k steps from s0 to vd as $\hat{\delta} = d(s_0, v_d)/k$. For each node v ∈ P, we calculate the step size $\delta_v$ that will be taken if v is selected: $\delta_v = d(s_i, v_d) - d(v, v_d)$. We can then give each node v a likelihood $\Theta_v$ of being selected by c(.), which is $\Theta_v = (1 + |\delta_v - \hat{\delta}|)^{-1}$. $\Theta_v$ is larger the closer the step $\delta_v$ is to the desired step $\hat{\delta}$. With that, the probability of selecting node v ∈ P is given by:

$$P(X = v) = \frac{\Theta_v}{\sum_{u \in P} \Theta_u}. \tag{1}$$
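As a worked illustration of Equation (1), the following Python sketch computes the selection probabilities for a handful of neighbors and draws the next node. It assumes the likelihood has the form Θv = (1 + |δv − δ̂|)^−1, with the absolute value read from the statement that Θv reflects how close δv is to δ̂. It also assumes that songs are 2D points compared with the Euclidean distance. The function names are hypothetical.

```python
import math
import random


def step_probabilities(current, neighbors, target, desired_step):
    """Return {node: P(X = node)} following Equation (1).

    `current` and `target` are (x, y) positions; `neighbors` maps the name
    of each eligible node to its (x, y) position."""
    likelihood = {}
    for name, position in neighbors.items():
        # delta_v: how much closer to the target the walk gets by moving to v.
        delta = math.dist(current, target) - math.dist(position, target)
        # Theta_v is largest when delta_v matches the desired step size.
        likelihood[name] = 1.0 / (1.0 + abs(delta - desired_step))
    total = sum(likelihood.values())
    return {name: theta / total for name, theta in likelihood.items()}


def draw_next(probabilities, rng):
    """Draw one node name according to the given probabilities."""
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

For example, with the current song at (0, 0), the target at (10, 0) and a desired step of 1, a neighbor at (1, 0) has Θ = 1, while neighbors at (−1, 0) and (3, 0) both have Θ = 1/3, so the three probabilities are 0.6, 0.2 and 0.2.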

Thus, function c(si, P, vd) selects the next song to be added to the playlist by drawing a random node from X. The song associated with that node is added to the playlist. This procedure creates a directed random walk from s0 to vd. Since the size of P is upper bounded by Ld, c(.) runs in constant time, i.e., O(1). This makes the complexity of STRAW O(k), linear in the size of the playlist, satisfying our scalability constraint. In Figure 3, two of the four trajectories with the same seed song s0 and the same destination song were generated by STRAW, using the same parameters as the ones used for ROPE, whose trajectories are also depicted in this figure.
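The construction of the similarity graph described earlier in this section (a union of κ edge-disjoint minimum spanning trees, plus extra edges of weight at most τ under a degree cap) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, songs are assumed to be 2D points, Kruskal's algorithm is one possible choice of MST algorithm, and the complete graph is built explicitly, which costs O(n²) memory. If the remaining edges no longer connect all songs in some round, this sketch silently yields a spanning forest for that round.

```python
import itertools
import math


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm; `edges` is a list of (weight, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def similarity_graph(songs, kappa, max_degree):
    """Return (edges, tau) for the union of `kappa` edge-disjoint MSTs
    plus the remaining edges of weight <= tau, subject to the degree cap."""
    remaining = [(math.dist(songs[u], songs[v]), u, v)
                 for u, v in itertools.combinations(sorted(songs), 2)]
    graph_edges, tau = [], None
    for _ in range(kappa):
        tree = minimum_spanning_tree(songs, remaining)
        if tau is None:
            tau = max(w for w, _, _ in tree)  # largest edge of the first MST
        graph_edges.extend(tree)
        used = {(u, v) for _, u, v in tree}
        remaining = [e for e in remaining if (e[1], e[2]) not in used]
    degree = {s: 0 for s in songs}
    for _, u, v in graph_edges:
        degree[u] += 1
        degree[v] += 1
    # Remaining edges are visited in increasing order of weight.
    for weight, u, v in sorted(remaining):
        if weight > tau:
            break
        if degree[u] < max_degree and degree[v] < max_degree:
            graph_edges.append((weight, u, v))
            degree[u] += 1
            degree[v] += 1
    return graph_edges, tau
```

For a collection of the size used in the paper, an implementation would probably avoid materializing all pairs, for instance by restricting candidate edges to near neighbors. That choice is outside the scope of this sketch.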


4 DATA SETS

In this work, we use three datasets: one to construct our music space and the other two to validate the space and the proposed algorithms. The first one was used by Mauch et al. [25] and comprises 17,094 songs that appeared in the US Billboard Hot 100 between 1960 and 2010. This dataset contains songs also used by [21] to study the acoustic characteristics present in popular music. It provides a diverse dataset, with a large variety of styles and a rich set of features. Every song si is characterized by a distribution hi over eight harmonic topics that capture classes of chord changes (e.g. ‘dominant-seventh chord changes’) and a distribution ti over eight timbral topics that capture particular timbres (e.g. drums, aggressive, female voice), which were extracted from audio samples and annotated by experts. Each song is associated with its year of release and with a subset of 140 genre tags, such as rock, pop and soul. The value of a tag is 1 if the song is of that genre, and 0 otherwise. A song may have several tags. We removed 19 tags associated with the decade and the country of the music (e.g. 80s and UK), since we already have the year of release as a feature, and the country may over-discriminate similar songs. We could also have used the country tags in our analysis, but we decided to measure the similarity between songs using only the year of release (as some users may prefer to listen to old songs instead of the most recent ones), the genres of the songs and the acoustic characteristics extracted from the audio. After removing these tags, all songs that were not associated with any genre tag were excluded from the dataset, resulting in 15,763 remaining songs. We call this dataset the Billboard dataset.

The second dataset is provided by Brian McFee [29] and contains all playlists published on the Art of the Mix website¹ from 1998-01-22 to 2011-06-17, a total of 101,343 unique playlists. Although one can always find some similarity between any pair of songs (e.g. by their year of release or by their appearance in the same movie soundtrack), it is well known that songs co-occurring very often in playlists created by users are likely to be similar [12]. In order to make this assessment, we removed from the set all playlists that contain three or fewer songs present in the Billboard dataset, resulting in a collection of 10,301 playlists with 7,117 distinct songs. We call this dataset the AotM dataset.

The third dataset consists of users’ playlists extracted from Spotify using their API². We first searched for Spotify users and then, for each user, extracted all of their public playlists, totaling 303,703 different playlists. Using the same approach as for the AotM dataset, we removed all playlists containing three or fewer songs present in the Billboard dataset, resulting in 54,793 playlists with 6,286 distinct songs. We call this dataset the Spotify dataset. Both the AotM dataset and the Spotify dataset are our validation datasets, used to assess the quality of the generated music space described in the next section.
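The playlist filtering applied to both validation datasets can be sketched in a few lines of Python. The function name and the `min_known` parameter are hypothetical, and two details are assumptions rather than statements from the text: songs are counted as distinct titles, and each kept playlist is reduced to the songs that exist in the Billboard dataset.

```python
def filter_playlists(playlists, catalog, min_known=4):
    """Keep playlists with more than three songs present in `catalog`.

    `playlists` is a list of lists of song identifiers and `catalog` is the
    set of identifiers available in the music space."""
    kept = []
    for playlist in playlists:
        # Assumption: only songs that exist in the catalog are retained.
        known = [song for song in playlist if song in catalog]
        # Assumption: repeated songs count once towards the threshold.
        if len(set(known)) >= min_known:
            kept.append(known)
    return kept
```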

¹ www.artofthemix.org
² developer.spotify.com

5 GENERATING THE MUSIC SPACE

As described in Section 2, there are many ways to generate a music space from data. In short, a music space is a formal representation of the songs by their characteristics, such as their acoustic characteristics, metadata and co-occurrences in playlists. Songs are positioned in the space according to a function that measures the similarity among them based on their feature representation.

In the Billboard dataset, each song is characterized by four feature groups: timbral, harmony, year of release (or freshness) and genre features. In order to put all groups on the same scale, we transformed each group into a probability distribution, which is already the case for the timbral and harmony features. For the genre group, we divided the value of each tag $g_i^j$, $1 \le j \le 121$, by the number of tags $\sum_{j=1}^{121} g_i^j$ that the song si is associated with. For the freshness group, we performed a unity-based normalization, transforming the year of release yi of song si into $\hat{y}_i$, the normalized value of yi on a scale from 0 to 1, i.e., $\hat{y}_i = \frac{y_i - 1960}{2009 - 1960}$, and made $\mathbf{y}_i = (\hat{y}_i, 1 - \hat{y}_i)$. After performing the normalization, each song si is represented by a feature vector si = (hi, ti, gi, yi), where each group of features is a probability distribution.

A music space can be trivially defined from a distance measure, e.g. cosine distance or Kullback-Leibler divergence. In order to make it visually appealing, we apply the t-SNE dimensionality reduction technique [23] to si. The t-SNE algorithm uses the Kullback-Leibler divergence as its cost function so that the distances between data points in the original space are preserved as well as possible in the reduced space. It uses a gradient descent algorithm to minimize the mismatch between the pairwise similarities of the songs in the reduced space and those in the original space. When using t-SNE to reduce the dimensionality of the data to a 2D Euclidean space, songs that are close in this new space are also similar in the original space, i.e. they should be acoustically similar. We emphasize that the dimensionality reduction is not strictly necessary, as Algorithm 1 only needs to retrieve songs similar to a seed song and randomly select from these a song that is closer to vd. Thus, it can use any similarity function for any music space, including the original multidimensional space. We decided to keep the music space constructed using t-SNE to avoid the curse of dimensionality and to better visualize the paths of the generated playlists.
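The normalization of the genre and freshness groups can be illustrated with a short Python sketch. The function names are hypothetical, and the harmonic and timbral groups are assumed to arrive already as probability distributions, as stated above for the Billboard dataset.

```python
def normalize_genres(tags):
    """Turn a 0/1 genre-tag vector into a probability distribution."""
    total = sum(tags)
    if total == 0:
        raise ValueError("song has no genre tag")
    return [t / total for t in tags]


def normalize_year(year, first=1960, last=2009):
    """Map the release year to the two-component distribution (y, 1 - y)."""
    y_hat = (year - first) / (last - first)
    return (y_hat, 1.0 - y_hat)


def feature_vector(harmonic, timbral, tags, year):
    """Concatenate the four groups: s_i = (h_i, t_i, g_i, y_i)."""
    return (list(harmonic) + list(timbral)
            + normalize_genres(tags) + list(normalize_year(year)))
```

For instance, a song tagged with two genres gets weight 0.5 on each of them, and a song released in 1960 gets the freshness pair (0.0, 1.0).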

Fig. 3. The generated music space with four heterogeneous playlists generated by our proposed methods, ROPE and STRAW. Each point represents a song, which is colored according to its genre. In blue we illustrate a Brownian bridge in the music space.

Figure 3 shows the music space generated by t-SNE, with the perplexity parameter equal to 50, from the feature vectors si. Each point represents a song and its color denotes its associated genre. Songs with more than one genre were colored with the least popular one, to better visualize the different genres. When the most popular genre is used to color the map, genres like pop, rock and soul color most of the map, hiding the position of the other genres and the distances among them. Again, to make the figure visually appealing, we only colored the 9 most popular genres, which cover ≈ 85% of the songs. The rest were colored light gray.

Songs were coherently grouped according to their genres, and similar genres are located near each other. For instance, a path from rock to hip-hop passes through pop and dance songs. However, songs from the same genre do not have to form a single cluster. In these cases, timbral or harmony features play an important role in determining the appropriate location. Isolated islands were created with songs that are similar to each other but different from the remaining songs. For example, the two islands located on the West and South borders are composed mainly of oldies and soul songs. The horizontal axis is highly correlated with the freshness of the song. Older songs are located more on the left-hand side of the space, whereas newer songs, such as those from hip-hop, are more on the right-hand side. We use the Euclidean distance over the t-SNE coordinates as our distance metric d(si, sj).

To assess the quality of the generated music space, we compared the distance between songs in our space with their co-occurrences in the playlists of the AotM dataset and the Spotify dataset. For each validation dataset, we calculated the number of times each pair of songs co-occurred in the playlists and their (Euclidean) distance in our music space. In Figure 4, we show the boxplots of the distances grouped by the number of co-occurrences for each dataset. Although we could have calculated a correlation metric between the songs’ number of co-occurrences in users’ playlists and the distance between songs in the music space, the figures clearly show that the median distance decreases as the number of co-occurrences increases. This suggests that our music space is able to group similar songs together, as similar songs tend to co-occur more frequently in playlists created by users [12].
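The validation procedure above (count pairwise co-occurrences, then look at the distances per co-occurrence count) could be reproduced along the lines of the following Python sketch. The function name is hypothetical, songs are assumed to be 2D points, and only the median per group is computed here, whereas the paper reports full boxplots.

```python
import itertools
import math
import statistics
from collections import Counter, defaultdict


def median_distance_by_cooccurrence(playlists, coordinates):
    """Map each co-occurrence count to the median distance of the song
    pairs that co-occur that many times.

    `coordinates` maps a song identifier to its (x, y) position; songs
    missing from it are ignored."""
    counts = Counter()
    for playlist in playlists:
        # Each unordered pair is counted at most once per playlist.
        known = sorted(set(playlist) & coordinates.keys())
        counts.update(itertools.combinations(known, 2))
    distances = defaultdict(list)
    for (a, b), n in counts.items():
        distances[n].append(math.dist(coordinates[a], coordinates[b]))
    return {n: statistics.median(d) for n, d in sorted(distances.items())}
```

A decreasing sequence of medians as the count grows would correspond to the trend reported for Figure 4.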

(a) Boxplot of distance given number of co-occurrence in AotM dataset
(b) Boxplot of distance given number of co-occurrence in Spotify dataset

Fig. 4. Evaluation of the constructed music space using the validation datasets.

It is important to point out that the music space could be constructed using different datasets, feature sets, and even distance metrics. Nevertheless, for the purpose of this work, the music space described in this section is highly appropriate. First, the dataset contains more than 15,000 popular songs (all of them appeared in the US Billboard Hot 100 chart) of many different genres, serving as a large-scale song collection for generating heterogeneous playlists that, very likely, will contain well-known songs. Second, the feature space is comprehensive, describing acoustic and meta characteristics of the songs, and does not rely on usage data, which avoids the cold-start problem. Finally, from the t-SNE representation, it is possible to easily visualize the path crossed by the playlist in the music space. In Figure 3, we show the paths crossed by four playlists generated by our algorithms, two by ROPE and two by STRAW, all of them receiving the same input parameters.


6 EXPERIMENTS

Evaluation Metrics. As discussed before, we proposed five desired properties for random heterogeneous playlist generators. We have already shown in Section 3.1 that our proposed algorithms satisfy both scalability and usability. To measure how effective they are in the other properties, we propose four evaluation metrics for a playlist of size k:

ST1 (Smooth Transitions 1). ST1 measures the mean distance between consecutive songs. The smaller the value of ST1, the smoother the transitions. It is calculated as:

$$ST1 = \frac{\sum_{i=0}^{k-2} d(s_i, s_{i+1})}{k - 1} \tag{2}$$

ST2 (Smooth Transitions 2). ST2 corresponds to the maximum distance between consecutive songs. This metric quantifies the most abrupt jump in the playlist, and is calculated as:

$$ST2 = \max_{0 \le i \le k-2} d(s_i, s_{i+1}) \tag{3}$$

HC (Heterogeneity Coefficient). HC measures the longest distance that the playlist traveled in the music space. To satisfy heterogeneity, HC should be as large as the user desires. It corresponds to the maximum distance between any two songs in the playlist, and is calculated as:

$$HC = \max_{0 \le i \le k-2,\; i < j \le k-1} d(s_i, s_j) \tag{4}$$
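Equations (2) to (4) translate directly into code. The Python sketch below assumes a playlist is given as a list of 2D positions and that d is the Euclidean distance, as in the music space used in this paper. The function names are hypothetical.

```python
import itertools
import math


def st1(playlist):
    """Mean distance between consecutive songs, Equation (2)."""
    steps = [math.dist(a, b) for a, b in zip(playlist, playlist[1:])]
    return sum(steps) / len(steps)


def st2(playlist):
    """Largest distance between consecutive songs, Equation (3)."""
    return max(math.dist(a, b) for a, b in zip(playlist, playlist[1:]))


def hc(playlist):
    """Largest distance between any two songs, Equation (4)."""
    return max(math.dist(a, b) for a, b in itertools.combinations(playlist, 2))
```

For the three-song playlist [(0, 0), (3, 4), (3, 0)], the consecutive distances are 5 and 4, so ST1 = 4.5, ST2 = 5 and HC = 5.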

[...] song of the playlist to be high. Since the diameter (the distance between the two furthest songs) of our music space is approximately 18, in our experiments we required the distance between s0 and sd to be at least 6, which is one third of the diameter (see Figure 3). All results come with error bars representing a 99% confidence interval, although they are small and hardly visible.

(a) smooth transitions by ST1: the lower the better.
(b) smooth transitions by ST2: the lower the better.
(c) heterogeneity by HC: the greater the better.
(d) novelty by RC: the lower the better.

Fig. 5. Evaluation of playlists generated by STRAW and ROPE. Must be viewed in color.

In Figure 5(a), through metric ST1, we show the mean distance between two consecutive songs in playlists generated by ROPE, STRAW, Flexer, Pontello, RWalk and Random. Observe that all methods are able to generate playlists with transitions significantly smoother than Random, with Pontello and RWalk achieving slightly better results when k = 10. Recall that Pontello and RWalk do not have to reach the destination song sd. As the number of songs in the playlist grows, all algorithms have enough steps to reach sd. When k grows, STRAW is able to reach sd using transitions as smooth as those of Pontello and RWalk. Since ROPE and Flexer always select the songs nearest to the intermediate points they generate, and the distance between consecutive intermediate points decreases with the size of the playlist, ST1 also decreases with k. The same behavior is observed in Figure 5(b), which depicts the average maximum distance between two consecutive songs in the playlists through metric ST2. Although the value of ST2 is (by definition) at least as high as ST1, the order of magnitude is the same for all methods but Random, which again has significantly higher values than the others.

In Figure 5(c), through metric HC, we show the average maximum distance between any pair of songs for the generated playlists. Since our goal is to generate heterogeneous playlists, the larger the value of HC, the farther the playlist traveled in the music space and, therefore, the better the method. Observe that the average HC values for both STRAW and ROPE are practically invariant with the size of the playlist. Conversely, and as expected, Pontello does not satisfy heterogeneity, being able to generate heterogeneous playlists only for large values of k. Observe that for playlists with at least 30 songs, STRAW has the largest value of HC. This happens because when STRAW reaches the destination song, it starts a random walk, which usually increases the distance reached from the seed song. Since both ROPE and Flexer always end at the destination song sd, HC is always d(s0, sd). Again, as expected, RWalk has the lowest values of HC and Random the highest.

To measure the capacity of these six methods to generate different playlists using the same parameters s0, sd and k, for each generated playlist we generate an extra one using the same input parameters. After that, we compute the RC value between the two. In Figure 5(d), we show the average values of RC for each algorithm. Since Flexer is deterministic, it always generates the same playlist, which results in RC = 1 for all playlists generated using the same input parameters. On the other hand, observe how our proposed algorithms generate significantly different playlists even when s0, sd and k are the same.

In summary, as shown in Figure 5, ROPE and STRAW are able to generate playlists with smooth transitions when compared with the Random algorithm, obtaining ST1 and ST2 values close to those of other state-of-the-art methods. But unlike Pontello, the proposed algorithms traverse the music space as fast as the user desires, satisfying heterogeneity. Moreover, unlike Flexer, both of our proposed methods are able to consistently generate significantly different playlists even when the input parameters are the same.

Considering the usefulness of our algorithms, observe in Figure 6(a) the heterogeneity of playlists created by users. Surprisingly, the average HC is ≈ 7.5.
Since the distance between the two farthest songs in the music space is approximately 18, this value indicates playlists traveling almost half of the music space, which characterizes highly heterogeneous playlists. This reveals that a large portion of Art of the Mix users do, in fact, appreciate heterogeneous playlists. In Figure 6(b), we observe that Spotify users also listen to playlists with HC ≈ 7.5. Figure 6(c) shows three playlists generated by users, one with HC = 7.33 (purple), one with HC = 9.30 (orange) and one with HC = 14.63 (green).

Although STRAW creates playlists with higher HC and smaller RC values than ROPE when we increase the size of the playlist, we want to emphasize the importance of both algorithms. They have been developed following the same structure and goal: to generate playlists satisfying the five quality criteria proposed for heterogeneous playlist generation. However, each algorithm has pros and cons depending on the situation where it is applied. While STRAW is topology-aware and its navigation avoids sparse or even empty regions of the music space, ROPE is able to generate playlists with exactly k intermediate songs in which the final song is the one specified by the user, a guarantee not given by STRAW.

To better compare the heterogeneity of the playlists created by users with that of the playlists generated by our algorithms, we performed experiments based on the playlists of the Art of the Mix dataset. We selected a set of 1000 playlists and, for each of them, we randomly selected two songs to be the first and last in a playlist of length k generated by both ROPE and STRAW. We computed the HC difference between the users’ playlists and the ones we generated. In Figure 7(a), we show the histogram of the HC differences using ROPE and, in Figure 7(b), the histogram of the HC differences using STRAW. Observe that the mode of the differences is close to 0 for both algorithms.
When comparing the histograms, ROPE tends to create playlists with heterogeneity lower than that of the users’ playlists, while STRAW can generate a playlist with higher or lower HC. We could also have plotted the same graph for the ST1 and ST2 metrics, but since not all songs of the AotM playlists are in the Billboard music space, restricting the users’ playlists to the intersection of the datasets may introduce harsh transitions in them, giving the impression that our algorithms have smoother transitions than the users’ playlists.

We made a prototype of our methods publicly available at http://homepages.dcc.ufmg.br/~marcos.almeida/musics/create.html. In this prototype, the user can generate a heterogeneous playlist from three inputs: a seed song s0, a final song sd, and the desired number of songs in the playlist k. Moreover, the user may also input intermediate songs and the number of songs between them. Finally, after the playlist is generated, the user may listen to it on YouTube. In Figure 8, we show an example of a playlist generated by our prototype. This playlist has 20 songs and was generated from Britney Spears - Lucky to The Doors - Love me Two Times using ROPE. We also provide the source code of the algorithms, implemented in Python, at https://github.com/almeida-marcos/PlaylistGenerators.

(a) Histogram of the HC metric for the AotM dataset.
(b) Histogram of the HC metric for the Spotify dataset.
(c) The path of three playlists created by Art of the Mix users on our music space.

Fig. 6. Heterogeneity of playlists generated by Art of the Mix users.

(a) Histogram of the difference between HC of users and HC of ROPE algorithm.
(b) Histogram of the difference between HC of users and HC of STRAW algorithm.

Fig. 7. Difference between HC value of users’ playlists and the playlists generated by our algorithms.

Fig. 8. A playlist generated from Britney Spears - Lucky to The Doors - Love me Two Times using ROPE.

7 CONCLUSIONS

In this paper, we formulated the problem of automatically generating high-quality random heterogeneous playlists from large collections of songs. To the best of our knowledge, this is the first work that explicitly tackles this problem. We discussed the characteristics a playlist generator algorithm should have to satisfy the users’ needs. In light of that, we proposed five quality criteria that should be considered when generating heterogeneous playlists: heterogeneity, smooth transitions, novelty, scalability and usability. Then, we described a general method to generate random heterogeneous playlists that can be used as a model to construct algorithms that connect any two songs (or genres) given by the user. Based on this general method, we proposed two algorithms named ROPE and STRAW. Using a large collection of songs described in [26], we constructed a 2-dimensional music space, where the distance (or dissimilarity) between two songs can be computed using the Euclidean distance. Using user-generated playlists, we showed that the music space is coherent and groups together songs that co-occur very often in the same playlists. We also proposed four metrics to quantify three of the quality criteria, and applied our algorithms in the Euclidean music space, comparing them with other state-of-the-art algorithms. Based on our experimental results, we showed that our proposed algorithms can effectively satisfy all five quality criteria. We also compared the playlists generated by our algorithms with users’ playlists, showing that the heterogeneity of our playlists approximates the heterogeneity of hand-crafted playlists.

For future work, we intend to analyze users’ playlists to better understand their behavior when listening to music. We also want to cluster the users by their music taste, and possibly propose algorithms to automatically generate playlists for groups of users that share the same ambiance. In this case, the playlists must satisfy the taste of all users as much as possible while also satisfying the five quality criteria presented in this paper. As for the application, we intend to expand the music database, adding more recent songs. We also want to integrate our website with Spotify, so that the playlists can be generated on their system.

REFERENCES

[1] Masoud Alghoniemy and Ahmed H Tewfik. 2000. User-defined Music Sequence Retrieval. In Proceedings of the Eighth ACM International Conference on Multimedia (MULTIMEDIA ’00). ACM, New York, NY, USA, 356–358.
[2] J.-J. Aucouturier and F. Pachet. 2002. Scaling up music playlist generation. In Proceedings. IEEE International Conference on Multimedia and Expo. IEEE, Lausanne, Switzerland, 105–108.
[3] Shay Ben-Elazar, Gal Lavee, Noam Koenigstein, Oren Barkan, Hilik Berezin, Ulrich Paquet, and Tal Zaccai. 2017. Groove Radio: A Bayesian Hierarchical Model for Personalized Playlist Generation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM). ACM, New York, NY, USA, 445–453.
[4] Geoffray Bonnin and Dietmar Jannach. 2014. Automated Generation of Music Playlists: Survey and Experiments. Comput. Surveys 47, 2 (nov 2014), 1–35.
[5] Shuo Chen, Josh L. Moore, Douglas Turnbull, and Thorsten Joachims. 2012. Playlist prediction via metric embedding. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’12. ACM Press, New York, New York, USA, 714.


[6] Sally Jo Cunningham, David Bainbridge, and Annette Falconer. 2006. More of an Art than a Science: Supporting the Creation of Playlists and Mixes. In Proceedings of 7th International Conference on Music Information Retrieval. Victoria, Canada, 240–245.
[7] Ricardo Dias, Daniel Gonçalves, and Manuel J Fonseca. 2017. From manual to assisted playlist creation: a survey. Multimedia Tools and Applications 76, 12 (2017), 14375–14403.
[8] Rick Durrett. 2010. Probability: theory and examples, 4th edition. Cambridge University Press, Cambridge, United Kingdom.
[9] Arthur Flexer, Dominik Schnitzer, Martin Gasser, and Gerhard Widmer. 2008. Playlist Generation Using Start and End Songs. In Proceedings of International Symposium on Music Information Retrieval, ISMIR 2008.
[10] Eamonn Forde. 2017. ‘They could destroy the album’: how Spotify’s playlists have changed music for ever. https://www.theguardian.com/music/2017/aug/17/they-could-destroy-the-album-how-spotify-playlists-have-changed-music-for-ever
[11] Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. 2011. A Survey of Audio-Based Music Classification and Annotation. IEEE Transactions on Multimedia 13, 2 (apr 2011), 303–319.
[12] Olga Goussevskaia, Michael Kuhn, Michael Lorenzi, and Roger Wattenhofer. 2008. From web to map: Exploring the world of music. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT’08. IEEE/WIC/ACM International Conference on, Vol. 1. IEEE, 242–248.
[13] David Hauger, Markus Schedl, Andrej Kosir, and Marko Tkalcic. 2013. The million musical tweet dataset - what we can learn from microblogs. In Proceedings of the 14th International Society for Music Information Retrieval Conference.
[14] Walt Hickey. 2016. The Ultimate Wedding Playlist. https://fivethirtyeight.com/features/the-ultimate-wedding-playlist/
[15] Jia-Lien Hsu and Shuk-Chun Chung. 2011. Constraint-based playlist generation by applying genetic algorithm. In 2011 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 1417–1422. https://doi.org/10.1109/ICSMC.2011.6083868
[16] Dietmar Jannach, Iman Kamehkhosh, and Geoffray Bonnin. 2014. Analyzing the Characteristics of Shared Playlists for Music Recommendation. In Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014) co-located with the 8th ACM Conference on Recommender Systems (RecSys 2014), Foster City, CA, USA, October 6, 2014.
[17] Dietmar Jannach, Lukas Lerche, and Iman Kamehkhosh. 2015. Beyond hitting the hits: Generating coherent music playlist continuations with the right tracks. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 187–194.
[18] Jean-Julien Aucouturier. 2003. Finding Songs That Sound The Same.
[19] Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. 2012. A Survey on Music Listening and Management Behaviours. In Proceedings of the 13th International Society for Music Information Retrieval Conference. Porto, Portugal.
[20] Iman Kamehkhosh, Geoffray Bonnin, and Dietmar Jannach. 2019. Effects of recommendations on the playlist creation behavior of users. User Modeling and User-Adapted Interaction (2019), 1–38.
[21] Junghyuk Lee and Jong-Seok Lee. 2018. Music Popularity: Metrics, Characteristics, and Audio-based Prediction. IEEE Transactions on Multimedia (2018), 1–1. https://doi.org/10.1109/TMM.2018.2820903
[22] M. Levy and M. Sandler. 2009. Music Information Retrieval Using Social Tags and Audio. IEEE Transactions on Multimedia 11, 3 (apr 2009), 383–395. https://doi.org/10.1109/TMM.2009.2012913
[23] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.
[24] François Maillet, Douglas Eck, Guillaume Desjardins, and Paul Lamere. 2009. Steerable Playlist Generation by Learning Song Similarity from Radio Station Playlists. In Proceedings of the 10th International Conference on Music Information Retrieval.
[25] Matthias Mauch, Robert M MacCallum, Mark Levy, and Armand M Leroi. 2015. The evolution of popular music: USA 1960–2010. Royal Society Open Science 2, 5 (2015), 150081.
[26] M. Mauch, R. M. MacCallum, M. Levy, and A. M. Leroi. 2015. The evolution of popular music: USA 1960–2010. Royal Society Open Science 2, 5 (may 2015), 150081.
[27] Brian McFee, Luke Barrington, and Gert R G Lanckriet. 2010. Learning Similarity from Collaborative Filters. In Proceedings of the 11th International Society for Music Information Retrieval Conference. Utrecht, The Netherlands, 345–350.
[28] Brian McFee and Gert Lanckriet. 2011. The Natural Language of Playlists. In Proceedings of the 12th International Society for Music Information Retrieval Conference. Miami (Florida), USA, 537–541.
[29] Brian McFee and Gert R G Lanckriet. 2012. Hypergraph Models of Playlist Dialects. In Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012, Mosteiro S.Bento Da Vitória, Porto, Portugal, October 8-12, 2012, 343–348.
[30] Riccardo Miotto and Nicola Orio. 2012. A probabilistic model to combine tags and acoustic similarity for music retrieval. ACM Transactions on Information Systems (TOIS) 30, 2 (2012), 8.
[31] Elias Pampalk, Tim Pohle, and Gerhard Widmer. 2005. Dynamic Playlist Generation Based on Skipping Behavior. In ISMIR, Vol. 5. ISMIR, 634–637.
[32] Steffen Pauws, Wim Verhaegh, and Mark Vossen. 2006. Fast Generation of Optimal Music Playlists using Local Search. In Proceedings of International Symposium on Music Information Retrieval, ISMIR 2006.
[33] Steffen Pauws, Wim Verhaegh, and Mark Vossen. 2008. Music playlist generation by adapted simulated annealing. Information Sciences 178, 3 (2008), 647–662.
[34] T. Pohle, P. Knees, M. Schedl, E. Pampalk, and G. Widmer. 2007. “Reinventing the Wheel”: A Novel Approach to Music Player Interfaces. IEEE Transactions on Multimedia 9, 3 (apr 2007), 567–575. https://doi.org/10.1109/TMM.2006.887991
[35] Luciana Fujii Pontello, Pedro H. F. Holanda, Bruno Guilherme, João Paulo V. Cardoso, Olga Goussevskaia, and Ana Paula Couto Da Silva. 2017. Mixtape: Using Real-Time User Feedback to Navigate Large Media Collections. ACM Trans. Multimedia Comput. Commun. Appl. 13, 4, Article 50 (Aug. 2017), 22 pages.
[36] R. Ragno, C. J. C. Burges, and C. Herley. 2005. Inferring similarity between music objects with application to playlist generation. In Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval - MIR ’05. ACM Press, New York, New York, USA, 73.
[37] Markus Schedl, Hamed Zamani, Ching-Wei Chen, Yashar Deldjoo, and Mehdi Elahi. 2018. Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval 7, 2 (2018), 95–116.
[38] Maria Taramigkou, Efthimios Bothos, Konstantinos Christidis, Dimitris Apostolou, and Gregoris Mentzas. 2013. Escape the Bubble: Guided Exploration of Music Preferences for Serendipity and Novelty. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys ’13). ACM, New York, NY, USA, 335–338.
[39] Andreu Vall. 2015. Listener-Inspired Automated Music Playlist Generation. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys ’15). ACM, New York, NY, USA, 387–390.

Received August 2019
