DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2021

Recommend Songs With Data From Spotify Using Spectral Clustering

DANIEL BARREIRA

NAZAR MAKSYMCHUK NETTERSTRÖM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES

Abstract

Spotify, which is one of the world's biggest music services, published a data set and an open-ended challenge for music recommendation research. The goal of this study is to recommend songs to playlists from the given Spotify data set using Spectral clustering. While the data set contains 1 000 000 playlists, Spectral clustering was performed on a subset of 16 000 playlists due to limited computational resources. With four different weighting methods describing the connections between playlists, the study shows reasonable clusters where playlists of similar categories were grouped together, although most results also contained one very large cluster in which many different sorts of playlists were mixed. The conclusion is that the data was overly connected as an effect of our weighting methods. While the results show that it is possible to recommend songs to a limited number of playlists, hierarchical clustering could possibly help recommend songs to a larger number of playlists, but that is left to future research.

Sammanfattning

Spotify, which is one of the world's largest music services, published data and an open challenge for research in music recommendation. The goal of this study is to recommend songs to a playlist from the given Spotify data using cluster analysis. Although the published data set contained 1 000 000 playlists, cluster analysis was performed on 16 000 playlists due to a lack of computational capacity. With four different weightings of the playlist graph, the study shows reasonable clusters where playlists of similar categories were clustered together. However, in most cases the result also contained one very large cluster in which many different types of playlists were grouped. The conclusion of this was that the data used was overly connected as an effect of the chosen weightings. Although the results show that it is possible to recommend songs to a limited number of playlists, hierarchical clustering could possibly help recommend songs to a larger number of playlists.

Acknowledgement

We would like to express our sincere gratitude to our supervisors Emil Ringh and Parikshit Upadhyaya. Parik, without your help we would probably still be stuck on the difference between the unnormalized and normalized Laplacian. Emil, without you detecting some of our computational flaws and teaching us effective ways to find them, we would probably still be running simulations. Without the encouragement and continuous feedback from you two this project would have been a lot harder, thank you.

Authors
Daniel Barreira, [email protected]
Nazar Maksymchuk Netterström, [email protected]
Degree Programme in Technology
KTH Royal Institute of Technology

Place for Project Stockholm, Sweden

Examiner Gunnar Tibert Vehicle Technology and Solid Mechanics, KTH Royal Institute of Technology

Supervisors
Emil Ringh, Parikshit Upadhyaya
Department of Mathematics, Numerical Analysis, KTH Royal Institute of Technology

Contents

1 Introduction
  1.1 The data
  1.2 Clustering
  1.3 Limitations

2 Method
  2.1 Graph Theory
  2.2 Eigenvalues and Eigenvectors
  2.3 k-means algorithm
  2.4 Spectral Clustering
  2.5 Different approaches to weighting
  2.6 Recommending songs from clustered graph

3 Results
  3.1 Results from weighting 1: Percentage of similar songs
  3.2 Results from weighting 2: Percentage of similar artists
  3.3 Results from weighting 3: A constructed function
  3.4 Results from weighting 4: A constructed function
  3.5 Results from random samples
  3.6 Song recommendation to a playlist

4 Discussion
  4.1 Conclusion
  4.2 Future work

1 Introduction

We live in an age where there is an overflow of data. All this data presents us with a spectrum of different problems as well as opportunities. Music is an important cultural part of our society, and it is one of the fields that can take advantage of the opportunities that arise with the data overflow. There are gains to be made both from the perspective of the musician and of the listener. As a musician, you probably want your music to reach as many listeners as possible, and as a listener you want a diverse pool of music that matches your interests. All this leads up to the purpose of our project, a challenge presented by AIcrowd called the "Spotify Million Playlist Dataset Challenge" [3]. The challenge is automatic playlist continuation: recommending 500 songs, ordered by decreasing relevance. In this paper the challenge is approached using Spectral clustering, a method that performs clustering in the eigenspace of a graph Laplacian. Clustering is a method for analyzing data that is widely used in many fields such as statistics and computer science. The aim of this project is to see whether one can find clusters of playlists and use these clusters not only to recommend music that directly correlates with the user, but also to find songs through connections within the cluster.

Problem formulation

From a data set given by Spotify, our study will focus on analysing how effective spectral clustering is when recommending songs.

1.1 The data

The data set is sampled from over 4 billion public playlists on Spotify and consists of one million playlists. There are about 2 million unique tracks present in the data by nearly 300 000 artists. The data set is collected from US Spotify users between the years 2010 and 2017.

Figure 1: An illustrative extraction of a part of playlist 661 from the data set.

Furthermore, the data has roughly 66 million tracks in total and 2.26 million unique tracks. The given data has a few different attributes; an example is shown in Figure 1 for a playlist with 58 tracks. It shows what is given for every playlist and for every track. The attributes mainly used in this paper are the ones marked with a red ring.

1.2 Clustering

Clustering is a way of understanding information by dividing data into different groups. The point is to define connections between data points through similarity, and thereby remove non-essential data, also known as noise. By doing this one can detect patterns and thus analyze the given data. The applications of clustering are numerous, and it is a widely used starting point for analyzing big sets of data with machine learning [8], [7]. There are a variety of clustering algorithms, for example ε-neighborhood, k-means and density-based clustering. In this paper the study revolves around Spectral clustering and how effective it is when clustering Spotify playlists.

The reason there are lots of different clustering methods is the variety of data sets. For example, consider the different sets of data points in Figures 2 and 3.

Figure 2: data set one Figure 3: data set two

As seen in Figure 2 and Figure 3, the structures of the data points are different. This means that some clustering methods will perform better than others. Density-based clustering or spectral clustering can give "correct" results on both of the data sets shown in Figures 2 and 3, while k-means clustering is only straightforward to apply to the data shown in Figure 2. To understand why, a more detailed understanding of the different algorithms is needed. Before explaining the algorithms, some theory is needed.

1.3 Limitations

The given data set consists of a lot of data. Due to lack of time and computational resources, the entire data set could not be analyzed. Furthermore, the challenge itself is not attempted; the study only shows how spectral clustering could work for the challenge and tries to understand the general structure of the data.

2 Method

As mentioned before, the main method in this paper is Spectral clustering. Before diving into the algorithm itself, some preliminary material is presented, starting with graph theory.

2.1 Graph Theory

A graph G is defined as a collection of i nodes N = {n_1, . . . , n_i} and k edges E = {e_1, . . . , e_k}. A node represents a data point and an edge represents the connection/relationship between two nodes [5]. We write

$$ G = (N, E). \qquad (1) $$

Furthermore there is such a thing as a directed and an undirected graph.

Undirected and directed graph

A graph is undirected when an edge between two arbitrary nodes $n_i$ and $n_j$ is the same regardless of the direction:

$$ e_{ij} \in E \;\Rightarrow\; e_{ji} \in E. \qquad (2) $$

An undirected graph is also called a symmetric graph. For a directed graph the direction matters. The next step is to explain how nodes are connected to each other via edges.

Connectivity of a graph

A graph is called connected when it is possible to walk from any node to every other node. Otherwise the graph is not connected and consists of several subgraphs. This is captured by the multiplicity M, where M ≥ 1 [4]:

$$ M(G) := \text{multiplicity} = \text{number of subgraphs} \qquad (3) $$

The connectivity of a graph can be represented by different sorts of matrices.

Adjacency matrix and weights

The graph can be represented by a matrix called the unweighted adjacency matrix $A_{uw}$, defined as follows:

$$ A_{uw}(k, j) = \begin{cases} 1 & \text{if there is an edge between nodes } k \text{ and } j \\ 0 & \text{if there is no edge between nodes } k \text{ and } j \end{cases} \qquad (4) $$

Figure 4: Example of an unweighted graph G and the associated adjacency matrix

A graph can either be unweighted (as seen in Figure 4) or weighted, meaning that the edge values vary. The weighted adjacency matrix $A_w$ is then defined as:

$$ A_w(k, j) = \begin{cases} w_{kj} & \text{weight of the edge } (k, j) \\ 0 & \text{if there is no edge between } (k, j) \end{cases} \qquad (5) $$

A weighted graph can now be written as follows:

$$ G = (N, E, W) \qquad (6) $$

where W is a set of weights $\{w_1, \ldots, w_k\}$. One last definition from graph theory is needed.

Degree matrix

A degree matrix D is a diagonal n × n matrix, where n is the size of the adjacency matrix, and its diagonal entries are the row sums of the adjacency matrix A:

$$ d_i = \sum_{j=1}^{n} w_{i,j}, \qquad D = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix} \qquad (7) $$

where i, j = 1, . . . , n.
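To make the definitions concrete, the following MATLAB sketch builds a small weighted adjacency matrix and the corresponding degree matrix; the matrix values are made up purely for illustration and are not taken from the data set.

    % A small, made-up weighted adjacency matrix for four nodes
    % (symmetric since the graph is undirected, zero on the diagonal).
    A = [0   0.5 0.2 0;
         0.5 0   0   0.7;
         0.2 0   0   0;
         0   0.7 0   0];

    d = sum(A, 2);   % degrees d_i = sum_j w_ij, the row sums of A as in (7)
    D = diag(d);     % the diagonal degree matrix D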

2.2 Eigenvalues and Eigenvectors

Definition: For a square matrix B, λ is an eigenvalue and ν is the corresponding eigenvector if

$$ B\nu = \lambda\nu. \qquad (8) $$

Spectrum of eigenvalues

For a square matrix B, the spectrum is the set of its eigenvalues. If a symmetric n × n matrix has n non-negative eigenvalues, then that matrix is called symmetric positive semi-definite [1].

Eigs in MATLAB

In this paper the MATLAB function eigs() is used to calculate the eigenvalues of our matrices. eigs computes a few selected eigenvalues of a square matrix and, for large sparse matrices, it is significantly faster than eig(), which computes all of them.
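A minimal sketch of how eigs can be called, assuming a sparse symmetric matrix L and a number of eigenpairs k are already defined; the option name is the one available in recent MATLAB versions.

    k = 6;                                         % number of eigenpairs sought (assumed)
    [V, Dk] = eigs(sparse(L), k, 'smallestabs');   % the k eigenvalues of smallest magnitude
    lambda = diag(Dk);                             % the eigenvalues as a vector
    % eig(full(L)) would instead compute all eigenvalues, which is much
    % slower for large matrices.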

2.3 k-means algorithm

k-means clustering is an iterative method, commonly used to cluster a set of data points. Given k center points (called centroids) in a space, the method assigns every data point to the cluster defined by its nearest centroid. There are challenges in using this method on some occasions, but it is still a fundamental part of Spectral clustering. In Figures 5 and 6 one can see the clusters it detects for two different sets of data.

Figure 5: k-means on graph 1. Figure 6: k-means on graph 2.

After initializing k centroids, the distance from each data point to each centroid is calculated. Each data point is then assigned to the cluster of its nearest centroid. Thereafter the mean of the positions of the data points in each cluster is calculated, which gives the centroid a new position for the next iteration. The same process is repeated until the centroids are stationary, i.e. they reach an equilibrium. The algorithm is presented below in Table 1.

Algorithm: k-means algorithm

1. Randomly initialize k centroids c_1, . . . , c_k.
2. Calculate the distance from each data point to every centroid.
3. Assign each data point to its nearest centroid.
4. Calculate the mean of the data points in each cluster and make that the new position of the centroid.
5. Repeat steps 2 to 4 until the difference in positions for each centroid is below a given tolerance.

Table 1: The k-means algorithm
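The steps in Table 1 can be written out as a short MATLAB sketch. This is an illustration written for this purpose, not the study's own code; it assumes the data points are the rows of X and that no cluster becomes empty during the iterations, and pdist2 comes from the Statistics and Machine Learning Toolbox.

    function idx = simple_kmeans(X, k, tol)
        % X: n-by-d matrix of data points, k: number of clusters, tol: stopping tolerance
        n = size(X, 1);
        C = X(randperm(n, k), :);                     % step 1: pick k random points as centroids
        while true
            [~, idx] = min(pdist2(X, C), [], 2);      % steps 2-3: assign each point to its nearest centroid
            Cnew = zeros(k, size(X, 2));
            for j = 1:k
                Cnew(j, :) = mean(X(idx == j, :), 1); % step 4: move each centroid to its cluster mean
            end
            if max(vecnorm(Cnew - C, 2, 2)) < tol     % step 5: stop when the centroids settle
                break
            end
            C = Cnew;
        end
    end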

2.4 Spectral Clustering

Spectral clustering uses the spectrum of eigenvalues of the graph Laplacian matrix to cluster the graph in question [5]. There are unique advantages to using the spectrum of the eigenvalues to cluster a graph, and these advantages are presented later. The method involves calculating the eigenvalues and eigenvectors of a so called Laplacian matrix. Given a graph G = (N, E) with nodes {n_1, . . . , n_m} ∈ N, the unweighted Laplacian matrix L of the graph G is an m × m matrix defined as

$$ L = D - A \qquad (9) $$

where D and A are the degree and adjacency matrices defined in (7) and (5), respectively. An important property of L is that the matrix is symmetric positive semi-definite. The spectrum of the Laplacian matrix is calculated to obtain underlying structures of the data of the graph. The number of connected components is equal to the multiplicity of the eigenvalue 0. If k different eigenvectors have eigenvalue λ_1 = · · · = λ_k = 0, the graph has k connected components. This gives information about the number of clusters. On the other hand, if the graph G is connected, λ_2 gives information about the connectivity of the graph: the greater the value of λ_2, the stronger the connectivity. In Figures 7, 8 and 9, examples illustrating these properties are shown:

Figure 7: λ_{1,2,3} = 0. Figure 8: λ_2(G1) > 0. Figure 9: λ_2(G2) ≫ 0.
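Continuing the small matrices A and D from the sketch in Section 2.1, the number of connected components can be read off from the spectrum of the Laplacian; the tolerance used to decide which eigenvalues count as zero is an assumption.

    L = D - A;                                  % unnormalized Laplacian of the small example graph
    lambda = sort(eig(L));                      % all eigenvalues (the graph is tiny, so eig is fine)
    numComponents = sum(abs(lambda) < 1e-10);   % multiplicity of the eigenvalue 0
    % For the four-node example numComponents is 1, i.e. the graph is connected,
    % and lambda(2) measures how strongly connected it is.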

The normalized Laplacians

To calibrate spectral clustering to different graphs there are strategies that involve normalizing the Laplacian matrix. One way of normalizing the Laplacian matrix is the following:

$$ L_{norm} = D^{-1/2} L D^{-1/2} \qquad (10) $$

where the degree matrix D is defined in (7). The properties of the graph Laplacians have an impact on how the graph is partitioned, in other words, how the graph is cut [5].

Partitioning a graph

Consider a graph G = (N, E, W) with i nodes. Consider also k partitions {A_1, A_2, . . . , A_k}, and use the notation A^c for the complement of a partition A. The problem in question is to find the partitions of the graph by minimizing the cut, defined as:

$$ \mathrm{Cut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, A_i^c) \qquad (11) $$

where

$$ W(A, B) = \sum_{i \in A,\, j \in B} w_{i,j}. $$

This takes into account the weights of the edges that are being cut. It is an attempt to cut in such a way that the weights of the edges connecting the different clusters are as small as possible. However, this method takes into account neither the number of nodes nor the volume of the different clusters. The risk is then that the clusters vary strongly in size, or that the graph is partitioned in a "wrong" way. Figure 10 shows an example of how the partitioning can be made.

Figure 10: The figure shows how the cut can be done on a graph with multiplicity 1. Cut 1 shows how the cut can be done if the number of nodes is not taken into account, while cut 2 shows how we probably want to cut the graph.

As shown, this method does not necessarily give the partitioning that is sought after. Other quantities are therefore needed. To make the sizes of the clusters more similar, there are two ways of measuring the size of a cluster.

Either the number of nodes in a cluster |A_i| is taken into account, or the volume of a cluster vol(A_i). This leads to minimizing either the so-called RatioCut or NCut [5].

$$ \mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, A_i^c)}{|A_i|} \qquad (12) $$

where |A_i| := the number of nodes in the partition A_i, and

$$ \mathrm{NCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, A_i^c)}{\mathrm{vol}(A_i)} \qquad (13) $$

where

$$ \mathrm{vol}(A) := \sum_{i \in A} d_i. $$

To summarize, minimizing RatioCut encourages the clusters to have a similar number of nodes, and minimizing NCut encourages the clusters to have similar volume. Solving this minimization problem exactly is NP-hard, but with the graph Laplacian it is possible to approximate a solution. Using the unnormalized Laplacian L approximates the minimization of RatioCut, and the normalized Laplacian L_norm approximates the minimization of NCut. Theorems and proofs of the connection between the eigenvectors of the Laplacian and the graph cut functions are given by von Luxburg [5].

Spectral Clustering Algorithms

In Tables 2 and 3 two different algorithms for Spectral clustering are presented.

Algorithm with approximation of RatioCut

1. Create a graph G = (N, E).
2. Compute the unnormalized Laplacian L = D − A.
3. Find the eigenvectors ν_1, . . . , ν_k of the matrix L belonging to the k smallest eigenvalues and create a matrix U = [ν_1, . . . , ν_k].
4. Treat each row in U as a data point x_1, . . . , x_n and perform k-means clustering on the points into k partitions A_1, . . . , A_k.

Table 2: Spectral clustering with the unnormalized Laplacian

Algorithm with approximation of NCut

1. Create a graph G = (N, E).
2. Compute the normalized Laplacian L_norm = D^{-1/2} L D^{-1/2}.
3. Find the eigenvectors ν_1, . . . , ν_k of the matrix L_norm belonging to the k smallest eigenvalues and create a matrix U = [ν_1, . . . , ν_k].
4. Create T = (t_{i,j}) with t_{i,j} = d_i^{-1/2} ν_{i,j}; in other words, let T contain the normalized rows of U.
5. Treat each row in T as a data point x_1, . . . , x_n and perform k-means clustering on the points into k clusters A_1, . . . , A_k.

Table 3: Spectral clustering with the normalized Laplacian
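The two algorithms can be summarized in a single MATLAB sketch. This is a minimal illustration under the assumptions that the weighted adjacency matrix A is symmetric and has no isolated nodes; it is not a reproduction of the exact code used in the study.

    function idx = spectral_clustering(A, k, useNormalized)
        % A: weighted adjacency matrix, k: number of clusters,
        % useNormalized: false for Table 2 (RatioCut), true for Table 3 (NCut).
        d = sum(A, 2);                                  % degrees (assumed positive for all nodes)
        L = diag(d) - A;                                % unnormalized Laplacian (9)
        if useNormalized
            Dinv = diag(1 ./ sqrt(d));
            L = Dinv * L * Dinv;                        % normalized Laplacian (10)
        end
        [U, ~] = eigs(sparse(L), k, 'smallestabs');     % eigenvectors of the k smallest eigenvalues
        if useNormalized
            U = U ./ vecnorm(U, 2, 2);                  % normalize the rows of U (step 4 of Table 3)
        end
        idx = kmeans(U, k, 'Replicates', 10);           % k-means on the rows of U
    end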

Spectral clustering and k-means (test results for a small dataset)

Figure 11 shows a faulty partitioning using k-means clustering. This is a problem where k-means clustering sometimes gives undesired results: because of the way the centroids are initialized, they can find an equilibrium in an unwanted way. An easy and reasonable way to get around the problem is to run the k-means clustering a few more times and use the most common outcome (the undesired result is less common than the desired one).

Figure 11: An example of an undesirable convergence of k-means

Figure 12 presents an example of using k-means clustering, as done in Figure 6, but instead of using the graph G, the k-means is done in the eigenspace of the Laplacian L. The algorithm has partitioned the data into 3 clusters in the desired manner. The set of points shown in Figure 3 is distributed in a far more complicated way than the set of points in Figure 2. With k-means alone, partitioning this data is not easy; one would have to involve transformations [7] and have a deeper understanding of the structure of the data to be able to accomplish the "correct" clustering. With Spectral clustering, the partitioning is made significantly more effective.

Figure 12: Illustration of how Spectral clustering has "correctly" clustered the data in Figure 3.

When clustering the data with three differently sized circles with our spectral clustering script, the results were correct approximately 60% of the time. This means that k-means (which is the final step in spectral clustering) found an equilibrium in the wrong spots of the eigenspace in the remaining runs. An important factor that affects the precision of k-means is the initialization of the centroids. If a centroid is initialized far away relative to the data points, it has a chance of ending up without any data points associated with it. In another case, more than one centroid may be initialized in the same cluster, which would also cause problems. By initializing the centroids in a more strategic way, instead of pure randomization, the precision increases drastically. One method of initializing starting points in a strategic manner is called k-means++.

k-means++

This algorithm proposes a specific iterative method of assigning starting positions for the centroids. By using this method the centroids are more evenly spread out over the data points [2]. Let P be the set of data points and define D(x) as the distance from a data point x to the closest centroid. The algorithm is found in Table 4:

Algorithm k-means++

1. Randomly initialize a centroid c_1 from P.
2. For each data point x, compute D(x), the distance to its nearest centroid.
3. Choose the next centroid by picking a data point x with probability P_{++}(x).
4. Repeat steps 2 and 3 until k centroids have been initialized.

Table 4: Algorithm for k-means++

where

$$ P_{++}(x) = \frac{D(x)^2}{\sum_{x' \in P} D(x')^2}. \qquad (14) $$
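A minimal MATLAB sketch of the seeding in Table 4, assuming the data points are the rows of X; the next centroid is drawn at random with the probabilities in (14). The function name is only illustrative.

    function C = kmeanspp_init(X, k)
        n = size(X, 1);
        C = X(randi(n), :);                        % step 1: first centroid chosen uniformly at random
        for j = 2:k
            D2 = min(pdist2(X, C), [], 2).^2;      % step 2: squared distance to the nearest centroid
            p = D2 / sum(D2);                      % probabilities P++ from (14)
            next = find(rand <= cumsum(p), 1);     % step 3: sample a data point according to p
            C = [C; X(next, :)];                   % add it as the next centroid
        end
    end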

17 39 61 200 100 100 100 32 68 200 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 39 61 200 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 0 100 200 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 0 100 200 100 100 100 32 68 200 100 100 100

Table 5: Results with k-means Table 6: Results with k-means++

Tables 5 and 6 represent solutions of Spectral clustering on predefined data points. The data is modeled as 100 data points in each circle-shaped cluster, as in Figure 3. Each row of the tables represents one simulation and shows the number of points assigned to each cluster found. The left table represents results from spectral clustering using k-means and the right table results from spectral clustering using k-means++. In Table 6, the advantage of k-means++ becomes obvious.

2.5 Different approaches to weighting

The weighting of the graph is a crucial part of this study. The weighting determines how the playlists are clustered, and that directly leads to the resulting list of recommended songs.

Weighting 1: Percentage of similar songs between two playlists

This weighting method goes as follows: the number of songs that belong to both playlists is found and divided by the length of the longer playlist. The similarity is thus measured from the perspective of the longer list, i.e. if list B has 100 songs and list A has 50 songs, their similarity is 0.5 and not 2.0. With this method the values lie between 0 and 1. The formula is seen below, where A and B are arbitrary playlists with their track URIs, Length(Max(A, B)) is the length of the longer list and similarity(A, B) is the number of common URIs.

$$ w_{AB} = \frac{\mathrm{similarity}(A, B)}{\mathrm{Length}(\mathrm{Max}(A, B))} \qquad (15) $$

Furthermore, the weighting also included some constraints, split into two different stages. The first constraint was that pairs under a certain threshold did not get added to the adjacency matrix; for example, the filter could be that two lists needed to have more than 20% in common for an edge to be added to the adjacency matrix. The second constraint was applied in a second stage. First of all, every row and column containing only 0's, meaning playlists without any connections, was removed. Second of all, rows and columns that had fewer or more than a specific number of edges were removed; for example, if a row only had one connection to another playlist it was removed. This was done in the hope of reducing the number of small clusters and outliers.
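A sketch of how weighting (15) and the constraints could be implemented in MATLAB, assuming playlists is a cell array where playlists{i} holds the track URIs of playlist i; the variable names and the example filter limit are assumptions, not the study's exact code. Replacing the track URIs with artist URIs gives weighting 2.

    function A = build_adjacency(playlists, threshold, minEdges)
        n = numel(playlists);
        A = zeros(n);
        for i = 1:n
            for j = i+1:n
                common = numel(intersect(playlists{i}, playlists{j}));
                w = common / max(numel(playlists{i}), numel(playlists{j}));   % equation (15)
                if w > threshold          % constraint 1: keep only pairs above the threshold
                    A(i, j) = w;
                    A(j, i) = w;          % undirected graph, so A is symmetric
                end
            end
        end
        keep = sum(A > 0, 2) >= minEdges; % constraint 2: drop playlists with too few edges
        A = A(keep, keep);                % playlists with no edges at all are removed as well
    end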

Weighting 2: Percentage of similar artists between two playlists

In the same manner as weighting function 1, this weighting is based on the data given in the data set. The percentage similarity is determined using (15), but with A and B containing artist URIs instead of track URIs. This function is also subject to the same constraints as weighting function 1.

Weighting 3 and 4: A constructed function

The third and fourth weight functions are given by Figure 13, which is a plot of (16):

$$ w_{3,4} = 3w - 6w^2 + 4w^3 \qquad (16) $$

Figure 13: A plot of the function showing how a low percentage similarity gets increased to a higher similarity and how a high percentage similarity gets decreased to a lower one.

This weight function is constructed with the goal of creating more connections with values closer to each other. As seen in Figure 13, the function amplifies low values and dampens high values. The function is applied to both artists and tracks, hence it constitutes both the third and the fourth weight function.
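A one-line sketch of how (16) can be applied elementwise to an adjacency matrix A built with (15); since the polynomial maps 0 to 0, absent edges stay absent.

    reweight = @(w) 3*w - 6*w.^2 + 4*w.^3;   % equation (16), maps [0, 1] to [0, 1]
    A34 = reweight(A);                       % amplify low similarities, dampen high ones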

2.6 Recommending songs from clustered graph

Eigenvalues and eigenvectors were computed with the function eigs in MATLAB. As mentioned before, we take every eigenvector corresponding to eigenvalues close to the value 0, and then cluster in the eigenspace. In the reduced space, each row still corresponds to one playlist in the same order. So when the clustering algorithm is done, one can find the index of a given row, go back to the original JSON data and extract essential information such as the song name corresponding to a track URI, the artist name corresponding to an artist URI, and the name of the playlist. When it comes to recommending a song from a cluster, a random number is generated from the size of the cluster. This number is then the row in the cluster you look at, and thus also a playlist. If it is the same playlist as the one that is getting songs recommended to it, a new number is randomized. If it is another playlist, another number is randomly generated from the number of songs in that playlist to suggest a song. If the song already exists in the playlist you are recommending songs to, then we randomize a new number. This algorithm is then repeated as many times as there are songs left in the cluster, or until the set number of recommended songs is reached.
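A sketch of this recommendation procedure, assuming clusterLists is a cell array with the playlists (each a cell array of track URIs) that ended up in one cluster, target is the index of the playlist receiving recommendations and nRec is the number of songs wanted; the names and the attempt limit are assumptions made for illustration.

    function recs = recommend_songs(clusterLists, target, nRec, maxTries)
        recs = {};
        targetTracks = clusterLists{target};
        tries = 0;
        while numel(recs) < nRec && tries < maxTries
            tries = tries + 1;
            p = randi(numel(clusterLists));            % pick a random playlist in the cluster
            if p == target, continue, end              % skip the target playlist itself
            tracks = clusterLists{p};
            t = tracks{randi(numel(tracks))};          % pick a random track from that playlist
            if ~any(strcmp(t, targetTracks)) && ~any(strcmp(t, recs))
                recs{end+1} = t;                       % keep it if not already known or suggested
            end
        end
    end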

3 Results

3.1 Results from weighting 1: Percentage of similar songs

Results from unnormalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix Nr. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%   -      4702      324      3578      One big cluster, one with size 145, a few with size 10 to 12 and the rest less than 7
20%   <2     3236      60       2849      One big cluster, one with size 136, a few bigger than 10 and the rest less than 5
20%   <5     1878      16       1292      One big cluster, one with size 367, one 95, one 57, the rest less than 11
30%   -      1399      259      193       20 clusters greater than 10, the biggest is 193; 240 clusters < 10, of which 200 clusters < 3
30%   <2     761       62       153       The biggest clusters are {153, 83, 70, 60, 49, 47, 40}, the rest are less than 5
30%   <5     315       19       93        Cluster sizes {93, 72, 26, 24}, 7 clusters smaller than 5
40%   -      465       116      67        The biggest clusters are {67, 45, 20, 17, 10, 10, 10}, the rest less than 3
40%   <2     210       27       59        Clusters {59, 36, 18, 14, 10}, 20 clusters less than 4
40%   <5     84        6        40        Cluster sizes {40, 19, 15, 6, 2, 2}
50%   -      192       58       25        Cluster sizes {25, 9, 8, 6, 5, 5, 5}, 43 clusters are less than 4
50%   <2     82        17       20        Cluster sizes {20, 8, 8, 6, 4}, 8 clusters less than 4
50%   <5     19        4        8         4 clusters with sizes {8, 7, 2, 2}

Table 7: Results from 16000 Spotify playlists.

Results from normalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%   -      4702      324      3578      Clusters and cluster sizes identical to the unnormalized Laplacian
20%   <2     3236      60       2849      Same as above
20%   <5     1878      16       1292      Same as above
30%   -      1399      259      193       Same as above
30%   <2     761       62       153       Same as above
30%   <5     315       19       93        Same as above
40%   -      465       116      67        Same as above
40%   <5     84        6        40        Same as above
50%   -      192       58       25        Same as above
50%   <2     82        17       20        Same as above
50%   <5     19        4        8         Same as above

Table 8: Results from 16000 spotify playlists.

Closer look at a few selected clusters

The selected clusters are from a threshold of 30% and no edge filtering. They are presented by the playlist names they have in the data. There are 259 clusters in total, of which less than 8% are greater than 10 in size. The first cluster, which is presented in Table 9, has a size of 19; this is also one of the larger clusters obtained using this particular weight function. The second one, in Table 10, is also quite large relative to the other clusters, and it has size 7.

Latin Trap, latino, Cecilia, Lily, Nueva, Latin Vibes, Spanish, spanish, Regeaton, Spanish, Party, ', Fiesta latina, Mi Gente, musica favorita, Spanish Mix, Gucci, Latin Vibes, BEANS

Table 9: Playlist names in a cluster

Yeezy, yeezy, Kenye West, Kenye West, Yeezy Taught Me, Kanye, Kanye

Table 10: Playlist names in a cluster

3.2 Results from weighting 2: Percentage of similar artists

Results from unnormalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%   -      14567     44       14463     One very large cluster, second largest is size 6
20%   <2     13780     5        13767     Same as above
20%   <5     12340     1        12340     One single cluster
30%   -      12010     114      11738     One very large cluster, second biggest is size 7
30%   <2     10483     10       10449     One gigantic, the rest less than 5
30%   <5     8438      1        8438      One single cluster
40%   -      8410      210      7881      One very large cluster, the rest are less than 5
40%   <2     6648      25       6565      Same as above
40%   <5     4790      6        4765      Same as above
50%   -      4964      244      3877      The biggest clusters have sizes {3877, 211, 100, 60}; the vast majority is less than 14
50%   <2     3591      41       3098      Clusters have sizes {3098, 181, 107, 59}, the rest are less than 9
50%   <5     2309      15       1941      Clusters have sizes {102, 98, 67, 44, 14}, the rest are less than 10

Table 11: Results from 16000 spotify playlists.

Results from normalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%   -      14567     44       14463     Same partitioning as with the unnormalized Laplacian
20%   <2     13780     5        13767     Same partitioning as with the unnormalized Laplacian
20%   <5     12340     1        12340     Same partitioning as with the unnormalized Laplacian
30%   -      12010     114      2293      Biggest cluster is 8541; 3 clusters between 100-300, a few bigger than 20 and the rest less than 10
30%   <2     10483     10       10449     Same partitioning as with the unnormalized Laplacian
30%   <5     8348      1        8348      Same partitioning as with the unnormalized Laplacian
40%   -      8410      210      7881      Same partitioning as with the unnormalized Laplacian
40%   <2     6648      25       6565      Same partitioning as with the unnormalized Laplacian
40%   <5     4790      6        4765      Same partitioning as with the unnormalized Laplacian
50%   -      4964      244      3877      Same partitioning as with the unnormalized Laplacian
50%   <2     3591      41       3098      Same partitioning as with the unnormalized Laplacian
50%   <5     2309      15       1941      Same partitioning as with the unnormalized Laplacian

Table 12: Results from 16000 spotify playlists.

Closer look at a few selected clusters

The selected clusters are obtained using a threshold of 50% and no edges filtered. The clusters we are taking a closer look at are number 127 with a size of 68 and number 70 with a size of 14. Cluster 127 is the third largest one, and cluster 70 is among the top 10% largest ones, relative to the other clusters.

Classical, Movie Soundtracks, movie scores, movie themes, Soundtrack, Movies, Harry Potter, Symphonic, Scores, Chill, homework, Classical, Orchestra, movie music

Table 13: Playlist names in a cluster

Disney’ Disney Jams’ Disney’ ’ Disney’ Disney’ Disney’ Disney’ Disney’ Disney’ Princess’ disney’ Disney’ Disney’ Disney’ Disney Music!!!!!!’ Disney’ Disney’ disney’ DISNEY ’ Disney/Pixar’ Disney’ disney’ Disney Disney’ Disney disney’ disney playlist.’ Disney Music’ Disney :) ’ DISNEY’ disney’ Disney’ Disney ’ Disney’ Disney ’ Disney Classics’ disney’ Disney’ babies ’ Disney’ Disney Favs’ Disney’ Disney’ disney’ Disney’ Disney’ Disney Princess’ Disney Best of Disney’ disney’ Disney’ Disney’ DISNEY JAMS ’ DISNEY’ Disney’ Disney Jams’ Disney’ Disney’ Disney!’ disney songs’ Disney’ Disney’ hakuna matata’ Disney’ Disney’ disney’ Olivia’

Table 14: Playlist names in a cluster

3.3 Results from weighting 3: A constructed function

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix, N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f    Size(A)   Nr. cl   max(cl)   Comments
20%   -      13689     120      13416     The second largest cluster is size 5
20%   >9     2739      569      212       A big variation in the cluster sizes; top 11 clusters {212, 103, 99, 80, 58, 58, 46, 42, 31, 36, 24}
20%   >14    4091      394      2905      Clusters after the biggest have sizes {28, 25, 25, 20}, 360 clusters less than 5
20%   >6     1802      561      33        Biggest cluster is size 33; 543 clusters are less than 10
30%   -      9984      225      9419      One very large, three clusters bigger than 10; the rest of the clusters are less than 6
30%   >9     3710      624      1461      One very large, a few with size between 10-69; 480 clusters less than 4
30%   >6     2743      733      135       Clusters have sizes {75, 34, 30, 26, 25}; 694 clusters less than 10 and 443 clusters less than 3
30%   >14    4943      429      3621      Clusters have sizes {49, 43, 20, 20}; 289 clusters less than 4
40%   -      5249      308      4125      One very large cluster, thereafter {164, 65, 26}; 252 clusters are less than 4
40%   >9     3091      511      965       One very large cluster, second biggest 125; a few between 20-90 and 305 clusters with size 2
40%   >6     2438      624      77        Largest cluster is size 77, thereafter {60, 34, 28, ...}; 343 clusters with size 2
40%   >14    3768      388      2412      One very large cluster; 353 clusters that are less than 3

Table 15: Results from 16000 spotify playlists.

Closer look at a few selected clusters

Table 16 and Table 17 show clusters that are obtained using a threshold of 30% and filtering out playlists that have more than 6 edges.

Awesome Playlist, Country, Zoned, greek, Dark Side, summer country, electro, Rock, Lindsey Stirling, smiles :), Pool, Black, woo, Relaxing, Spring 2017, 90s Rock, pump up, Chill, Gaming Songs, jjj, energy, cool beans, Perfection, 80s

Table 16: Playlist names in a cluster

This is what you came for Party playlist Me Gaming Supernatural Lit Sunshine Drive Ay ALT Rock

Table 17: Playlist names in a cluster

3.4 Results from weighting 4: A constructed function

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix, N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f    Size(A)   Nr. cl   max(cl)   Comments
20%   -      15837     5        15829     One very large cluster, the other 4 clusters have size 2
20%   >9     356       115      12        Sizes of biggest clusters {12, 11, 9, 8, ...}; 99 clusters less than 5
20%   >6     173       65       7         Sizes of biggest clusters {7, 5, 5, 4, ...}; 41 clusters with size 2
20%   >14    692       142      47        Sizes of biggest clusters {47, 36, 31, ...}; 13 clusters with size less than 5
30%   -      15621     11       15601     One very large cluster, the rest are size 2
30%   >9     913       217      101       Sizes of biggest clusters {101, 30, 25, 20, ...}; 186 clusters with size less than 5, 120 clusters with size 2
30%   >6     528       181      9         All clusters have almost similar size
30%   >14    1550      208      826       One very large cluster, second biggest 22; 120 clusters less than 2
40%   -      14817     44       14721     One very large cluster, second biggest size of 4
40%   >9     1810      388      194       One large cluster, 221 clusters less than 3
40%   >6     1086      345      33        Sizes of biggest clusters {33, 22, 15, 14, 13, ...}; 271 clusters that are less than 4
40%   >14    2906      323      1770      Sizes of biggest clusters {87, 45, 31, 30, 19, ...}; 254 clusters with size 3 or less

Table 18: Results from 16000 spotify playlists.

Closer look at a few selected clusters

Tables 19 and 20 show clusters that are obtained using a threshold of 40% and filtering out nodes that have more than 9 edges.

JAMS’ Love Music’ basic’ RUNNIN’ ”emoji music note” electronic’ Litty ’ Cruisin’ modern rock’ vibes’ pregame’ Happy Happy Happy’ ’ PARTY ’ classic’ 4th of july’ 2016’ english’ Classical’ Summer 15’ Beach Music’ rock’ 90s Rock’ Random!’ childhood’ skrt skrt’ dance’ broadway’ sad song’ Way Back When’ lift’ In the Name of Love’ TX Country’ Bruno Mars Summertime TX Country RECENT Swing

Table 19: Playlist names in a cluster

Solitude, Spanish, randoms, Julion alvarez, *** good stuff, june, Workout, Relax, Piano Guys, Brown Eyed Girl, wedding playlist, Country, MVP, Fall, ThrowBack Pop, Hawaii, gabrielle

Table 20: Playlist names in a cluster

3.5 Results from random samples

The extraction of playlists from the data set has been done on consecutive data, which means that for the analysis of 16 000 playlists, the first 16 000 playlists in the data set were extracted. A test has been done where three different sets of 16 000 random playlists were extracted from the data set, and our algorithm with weightings w1 and w2 was applied to them. The results show the same cluster-size pattern, where the resulting clusters are one very big cluster and the rest of smaller size. These tests give us no reason to believe that the given data set is in some way ordered by the publisher or that our extraction is an outlier.

3.6 Song recommendation to a playlist

In Table 21 a full playlist with 39 Disney songs is shown. This playlist was randomly chosen from the smaller clusters to show a practical example of the song recommendation procedure for a playlist. The algorithm had the potential to recommend 920 songs to the playlist, of which 48 are suggested, as seen in Table 22.

A Disney playlist Roger Bart’ Lillias White’ Bruce Adler’ Go the Distance I Won”t Say Arabian Nights’ Brad Kane’ ’ Jonathan Freeman’ One Jump Ahead’ A Whole New World’ Prince Ali (Reprise)’ Lea Salonga’ Donny Osmond’ Harvey Fierstein’ Reflection I”ll Make a Man Out of You A Girl Worth Fighting For Jason Weaver’ Carmen Twillie’ Jeremy Irons’ I Just Can”t Wait t... Circle Of Life Be Prepared Nathan Lane’ Jodi Benson’ Samuel E. Wright’ Hakuna Matata’ Part of Your World Under the Sea Chorus ’ Robby Benson’ Belle’ Be Our Guest Something There’ Angela Lansbury’ Mandy Moore’ Donna Murphy’ Beauty and the Beast’ When Will My Life Begin Mother Knows Best Mandy Moore’ Mandy Moore’ Judy Kuhn’ I”ve Got a Dream I See the Light Just Around The Riverbend’ Phil Collins’ Phil Collins’ Phil Collins’ Two Worlds’ You”ll Be In My Heart’ Son Of Man’ Rosie O”Donnell’ Phil Collins’ Heidi Mollenhauer’ Trashin” The Camp’ Strangers Like Me’ God Help The Outcasts’ Tony Jay’ Kristen Bell’ Kristen Bell’ Heaven”s Light Do You Want to Build a Snowman?’ For the First Time in Forever’ Kristen Bell’ ’ Kristen Bell’ Love Is an Open Door’ Let It Go For the First Time in Forever Ne-Yo’ Phil Collins’ Friend Like Me You”ll Be In My Heart’

Table 21: The songs, with the associated artists, from a random playlist to which songs are recommended.

32 48 SUGGESTED SONGS Maia Wilson’ Cheryl Freeman’ Opetaia Foa”i’ Fixer Upper’ The Gospel Truth I We Know The Way Jesse McCartney’ Fess Parker’ Phil Collins’ When You Wish Up...’ The Ballad Of ’ On My Way’ Miley Cyrus’ Tony Jay’ Sarah McLachlan’ Butterfly Fly Away’ The Bells Of Notre Dame’ When She Loved Me’ ’ Auli”i Cravalho’ Bryan Adams’ Whistle While You Work’ How Far I”ll Go’ You Can”t Take Me Ken Page’ Angela Lansbury’ Alessia Cara’ Oogie ”s Song’ Human Again’ How Far I”ll Go Beth Fowler’ Keith David’ Jenifer Lewis’ Honor To Us All Friends on the Other Side Dig A Little Deeper Judy Kuhn’ The Cast of M. Keali”i Ho”omalu’ Colors Of The Wind’ One of Us’ He Mele No Lilo’ Jump5’ Rhoda Williams’ Louis Prima’ Aloha, E Komo Mai The Music Lesson I Wan”Na Be Like You ’ Jeremy Jordan’ Adam Mitchell’ An Unusual Prince The World Will Know’ Days In The Sun’ Bruce Reitherman’ Anna Kendrick’ Pocahontas’ The Bare Necessities’ No One Is Alone’ Where Do I Go From Here’ Auli”i Cravalho’ Bobby Driscoll’ Shakira’ Know Who You Are’ Following The Leader’ Try Everything - From Cedar Lane Orchestra’ Jemaine Clement’ Dr. John’ The Lion King’ Shiny’ Down in New Orleans’ Adriana Caselotti’ Elvis Presley with Orchestra’ Tony Jay’ Some Day My Prince ...’ Suspicious Minds’ Out There’ Samuel E. Wright’ 98’ *NSYNC’ Kiss the Girl True To Your Heart’ Trashin” The Camp Mark Mancina’ Ferb’ Richard White’ Village Crazy Lady Backyard Beach’ Gaston Jim Cummings’ Elton John Rachel House’ Gonna Take You There Can You Feel The Love Tonight I Am Moana

Table 22: 48 recommended songs, with the associated artists, for the playlist in Table 21.

4 Discussion

As seen from almost every table in the results, there is a general theme of one big cluster and then a large number of small ones. This pattern is especially visible when data points with less than 2 and 5 edges are removed for the first and second weight functions. Sometimes there is even only one cluster present after the filtering is done. From this the conclusion was made that the problem is not that the data is not connected enough. Rather, the analysis is that the data might be overly connected, and thus the decision was made that for the third and fourth weightings we try to filter out data with too many connections.

The code is written in such a way that from the start it removes 0-rows and 0-columns, and this also shows in a good way how many playlists have 0 matches after the threshold filter. It is obvious from Table 7 that it is quite rare that a playlist has 20% in common with another playlist, and looking at the 50% threshold data, we see that the matrix is reduced from a 16000 × 16000 matrix to a 192 × 192 matrix, which leaves the undesired result of filtering out 98.8% of the data. What is important to note is that the big cluster is a partition of all the data that has a high connectivity, meaning that suggesting a song to another playlist within the cluster is almost equal to randomizing a song. When trying the normalized Laplacian an identical result is obtained, and thus the conclusion is made that for w1 there is no difference between approximating the minimization of NCut or RatioCut.

The second weight function shows that the matrices are larger in size, which is no surprise, because as stated in Section 1.1 there is a big difference between the number of unique artists and unique tracks in the data set. Even considering this, it is worse in some aspects. Consider the point made earlier that the big cluster is almost equal to randomizing a song from the data set, and assume that the second largest cluster is reliable. It is then possible to put a number on how many playlists in a given cluster are able to receive a song recommendation. Below is a summary of how many playlists are eligible for a song recommendation.

Filter   20%    <2    <5    30%    <2    <5    40%   <2    <5    50%    <2    <5
w1       1124   387   586   1206   608   222   398   151   44    167    62    11
w2       104    13    0     272    34    0     529   83    35    1087   493   368

Table 23: A table of how many playlists are able to receive a song recommendation

In Table 23 you are able to see how many playlists are (in a best case scenario) eligible for recommendations. It becomes obvious that all of the thresholds, with and without edge filtering, are lackluster. At first glance, doing the weight function with artists seemed better because of more connectivity, but Table 23 shows that w1 is better for almost every threshold. In the results part we present a closer look at a few clusters to get a grasp of what the clusters look like. It then becomes visible that the clusters that our algorithm does find are clusters that have things in common, and therefore we consider them as good candidates to be the source of a recommendation for a list within the cluster.

As expected, when using w3 and w4 the dimensions of the matrices became larger. The weight function itself amplifies edges with a low value and reduces values that are on the high end, as seen in Figure 13. From the results of w3 and w4, one can see that there are a large number of clusters when the filtering is done for data points with more than 6, 9 and 14 edges. Looking at w4, having a 20% threshold and not filtering any edges, the matrix size is 15 837. Increasing the threshold another 10% decreases the size to 15 621. Considering that the optimum would be being able to suggest song recommendations to every playlist, these sizes are desirable. The problem comes when looking at the cluster sizes, and in particular at the largest cluster, which shows the same tendencies as w1 and w2. To resolve these tendencies we filtered out playlists that we defined as overly connected, with the intent of maybe breaking up the larger clusters.

Filter   20%   >6     >9     >14    30%   >6     >9     >14    40%    >6     >9     >14
w3       273   2739   1802   1186   565   2249   2126   1322   1124   2438   2126   1356
w4       8     173    356    692    20    528    913    724    96     1086   1616   1136

Table 24: A table of how many playlists are able to receive a song recommendation

This table is done in the same manner as for w1 and w2; the difference is that sometimes the largest cluster is not too big and is thus not subtracted when calculating how many playlists can receive a suggestion. Interesting to note is that the same pattern is seen as above, that tracks behave in a more desired way than artists. Furthermore, as seen in Table 18, the intent of filtering over-connectivity also led to more clusters, and because of this it is also easier to suggest songs to a given playlist. This is illustrated by the fact that the algorithm went from being able to suggest songs to at best 1 206 playlists up to 2 739. But after taking a closer look at clusters, as seen in Tables 16, 17, 19 and 20, the clusters seem more random and are not as coherent as they were for w1 and w2. The conclusion drawn from this is that artificially modifying the weighting function does not bring a desirable result. Lastly, we have recommended songs to a playlist as seen in Table 22. The playlist we recommended songs to is a playlist with Disney songs. The recommended songs that the algorithm gave are also Disney-related songs; for example, the first suggestion is a song from the soundtrack of the movie Frozen.

4.1 Conclusion

One of the most important conclusions is that trying to manipulate the data structure by filtering on different numbers of edges is not the way to go. First of all, every time an edge filtering is done, a playlist is also removed, and because the ultimate goal is to be able to suggest songs to every playlist, this is a bad solution. Secondly, when the filtering is done for playlists that we defined as overly connected, the clusters become more random and thus also a worse source for song recommendations. When applying the simple weightings w1 and w2 without filtering edges and ignoring the large cluster, we get results which are desirable. When looking at the largest cluster, the conclusion is that some playlists are too connected and need to be dealt with in an alternative way. How well the presented method would fare in the challenge is unknown, but after taking a glance at the recommended songs in Section 3.6 it would seem that the recommendations are reasonable. To conclude, this study has shown that recommending songs using Spectral clustering might be a viable option, but further research has to be done.

4.2 Future work

One of the most important things to note is that every simulation presented in this paper is done on a 16 000 × 16 000 matrix. There is no way to deduce whether the result would be different using the intended 1 000 000 × 1 000 000 matrix, so here there is naturally room for further testing. As stated in the limitations section, we decided not to pursue larger sets of data due to limitations in computational resources. Furthermore, there are unlimited ways to design the weight functions. In our case we picked one that is simple, natural and logical, and then used another one that we deduced might address the problems we had detected in the first one. That weight function did better in the sense of being able to suggest songs to more playlists, but there were still a lot of small clusters and thus a limitation on how many songs one could recommend to some of the playlists. There was also the problem of the clusters being less structured than they had been with the original weight functions. A weight function that does not make the data overly connected but still finds meaningful clusters would be interesting. Ultimately we note that there are methods to solve the problem of one large cluster. One of them is called hierarchical clustering [6], which in a sense means that further clustering is done on the biggest cluster.

References

[1] Howard Anton and Robert C. Busby. Contemporary Linear Algebra. Hoboken, NJ: Wiley, 2003. isbn: 0471163627.
[2] David Arthur and Sergei Vassilvitskii. "K-Means++: The Advantages of Careful Seeding". In: SODA '07 (2007), pp. 1027-1035.
[3] Ching-Wei Chen et al. "Recsys Challenge 2018: Automatic Music Playlist Continuation". In: RecSys '18 (2018). doi: 10.1145/3240323.3240342.
[4] Elias Jarlebring. "Numerics for graphs and clustering". In: Lecture notes, Numerical algorithms for data science (SF2526) (2019), pp. 8-9.
[5] Ulrike von Luxburg. "A Tutorial on Spectral Clustering". In: Statistics and Computing 17(4) (2007). url: https://arxiv.org/abs/0711.0189.
[6] Frank Nielsen. "Hierarchical Clustering". In: Feb. 2016. isbn: 978-3-319-21902-8. doi: 10.1007/978-3-319-21903-5_8.
[7] Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O'Reilly Media, 2016. isbn: 1491912057.
[8] "What is Clustering". In: Machine Learning Crash Course (2020). url: https://developers.google.com/machine-learning/clustering/overview.

www.kth.se