Taste Space Versus the World: an Embedding Analysis of Listening Habits and Geography

TASTE SPACE VERSUS THE WORLD: AN EMBEDDING ANALYSIS OF LISTENING HABITS AND GEOGRAPHY Joshua L. Moore, Thorsten Joachims Douglas Turnbull Cornell University, Dept. of Computer Science Ithaca College, Dept. of Computer Science fjlmo|[email protected] [email protected] ABSTRACT Hauger et al. have matched to cities and other geographic descriptors as well. Our goal in this work is to use em- Probabilistic embedding methods provide a principled way bedding methods to enable a more thorough analysis of of deriving new spatial representations of discrete objects geographic and cultural patterns in this data by embedding from human interaction data. The resulting assignment cities and the artists from track plays in those cities into of objects to positions in a continuous, low-dimensional a joint space. The resulting taste space gives us a way to space not only provides a compact and accurate predictive directly measure city/city, city/artist, and artist/artist affini- model, but also a compact and flexible representation for ties. After verifying the predictive fidelity of the learned understanding the data. In this paper, we demonstrate how taste space, we explore the surprisingly clear segmenta- probabilistic embedding methods reveal the “taste space” tions in taste space across geographic, cultural, and lin- in the recently released Million Musical Tweets Dataset guistic borders. In particular, we find that the taste space (MMTD), and how it transcends geographic space. In par- of cities gives us a remarkably clear image of some cultural ticular, by embedding cities around the world along with and linguistic phenomena that transcend geography. preferred artists, we are able to distill information about cultural and geographical differences in listening patterns 2. RELATED WORK into spatial representations. These representations yield a Embeddings methods have been applied to many different similarity metric among city pairs, artist pairs, and city- modeling and information retrieval tasks. In the field of artist pairs, which can then be used to draw conclusions music IR, these models have been used for tag prediction about the similarities and contrasts between taste space and and song similarity metrics, as in the work of Weston et geographic location. al. [7]. However, instead of a prediction task such as this, 1. INTRODUCTION we intend to focus on data analysis tasks. Therefore, we Embedding methods are a type of machine learning algo- rely on generative models like those proposed in our previ- rithm for distilling large amounts of data about discrete ob- ous work [5, 6] and by Aizenberg et al [1]. Our prior work jects into a continuous and semantically meaningful rep- uses models which rely on sequences of songs augmented resentation. These methods can be applied even when with social tags [5] or per-user song sequences with tempo- only contextual information about the objects, such as co- ral dynamics [6]. The aim of this work differs from that of occurrence statistics or usage data, is available. For this our previous work in that we are interested in aggregate reason and due to the easy interpretability of the result- global patterns and not in any particular playlist-related ing models, embeddings have become popular for tasks task, so we do not adopt the notion of song sequences. We in many fields, including natural language processing, in- also are concerned with geographic differences in listen- formation retrieval, and music information retrieval. Re- ing patterns, and so we ignore individual users in favor of cently, embeddings have been shown to be a useful tool for embedding entire cities into the space. analyzing trends in music listening histories [6]. Aizenberg et al. utilize generative models like those in In this paper, we learn embeddings that give insight into our work for purposes of building a recommendation en- how music preferences relate to geographic and cultural gine for music from Internet radio data on the web. How- boundaries. Our input data is the Million Musical Tweets ever, their work focuses on building a powerful recommen- Dataset (MMTD), which was recently collected and cu- dation system using freely available data, and does not fo- rated by Hauger et al. [3]. This dataset consists of over a cus on the use of the resulting models for data analysis, nor million tweets containing track plays and rich geograph- do they concern themselves with geographic data. ical information in the form of globe coordinates, which The data set which we will use throughout this work was published by Hauger et al. [3]. The authors of this work crawled Twitter for 17 months, looking for tweets c Joshua L. Moore, Thorsten Joachims, Douglas Turnbull. which carried certain key words, phrases, or hashtags in Licensed under a Creative Commons Attribution 4.0 International Li- order to find posts which signal that a user is listening to a cense (CC BY 4.0). Attribution: Joshua L. Moore, Thorsten Joachims, Douglas Turnbull. “Taste Space Versus the World: an Embedding Anal- track and for which the text of the tweet could be matched ysis of Listening Habits and Geography”, 15th International Society for to a particular artist and track. In addition, the data was se- Music Information Retrieval Conference, 2014. lected for only tweets with geographical tags (in the form X of GPS coordinates), and temporal data was retained. The (X;Y;p) = max log(Pr(aijci)) X;Y;p final product is a large data set of geographically and tem- (ci;ai)2D porally tagged music plays. In their work, the authors em- X 2 = max −||X(ci)−Y (ai)jj2 +pai −log(Z(ai)): phasize the collection of this impressive data set and a thor- X;Y;p (c ;a )2D ough description of the properties of the data set. The au- i i thors do add some analyses of the data, but the geographic We solve this optimization problem using a Stochastic analysis is limited to only a few examples of coarse pat- Gradient Descent approach. First, each embedding vec- terns found in the data. The primary contribution of our tor X(·) and Y (·) is randomly initialized to a point in the work over the work presented in that paper is to greatly unit ball in Rd (for the chosen dimension d). Then, the extend the scope of the geographic analysis, presenting a model parameters are updated in sequential stochastic gra- much clearer and more exhaustive view of the differences dient steps until convergence. The partition function Z(·) in musical taste across regions, countries, and languages. presents an optimization challenge, in that a na¨ıve opti- Finally, we describe how geographic information can mization strategy requires O(jAj2) time for each pass over be useful for various music IR tasks. Knopke [4] also the data. For this work, we used our C++ implementa- discusses how geospatial data can be exploited for music tion of the efficient training method employed in [6], an marketing and musicological research. We use embedding approximate method that estimates the partition function as a tool to further explore these topics. Others, such as for efficient training. This implementation is available by Lamere’s Roadtrip Mixtape 1 app, have developed systems request, and will later be available on the project website, that use a listeners location to generate a playlist of rele- http://lme.joachims.org. vant music by local artists. 3.1 Interpretation of Embedding Space As defined above, the model gives us a joint space in which 3. PROBABILISTIC EMBEDDING MODEL both cities and artists are represented through their respec- The embedding model used in this paper is similar to the tive embedding vectors X(·) and Y (·). Related works have one used in our previous work [6]. However, the following found such embedding spaces to be rich with semantic sig- analysis focuses on geographical patterns instead of tem- nificance, compactly condensing the patterns present in the poral dynamics and trends. In particular, we focus on the training data. Distances in embedding space reveal rela- relationships among cities and artists, and so we elect to tionships between objects, and visual or spatial inspection condense the geographical information in a tweet down to of the resulting models quickly reveals a great deal of seg- the city from which it came. Similarly, we discard the track mentation in the space. In particular, joint embeddings name from each tweet and use only the artist for the song. yield similarity metrics among the various types of em- This leads to a joint embedding of cities and artists. bedded objects, even though individual dimensions in the At the core of the embedding model lies a probabilis- embedding space have no explicit meaning (e.g. the em- tic link function that connects the observed data to the beddings are rotation invariant). In our case, this specifi- underlying semantic space. Intuitively, the link function cally entails the following three measures of similarity: we use states that the probability Pr(ajc) of a given city City to Artist: this is the only similarity metric explic- c playing a given artist a is proportional to the distance itly formulated in the model, and it reflects the distribution 2 Pr(ajc) that we directly observe data for. In particular, we jjX(c) − Y (a)jj2 between that city and that artist in a Eu- clidean embedding space of a chosen dimension d. X(c) directly optimize the positions of cities and artists so that and Y (a) are the embedding locations of city c and artist cities have a high probability of listening to artists which a respectively. Similar to previous works, we also incor- they were observed playing in the dataset. This requires placing the city and artist nearby in the embedding space, porate a popularity bias term pa for each artist to model global popularity.

Load more