Exploiting Playlists for Representation of Songs and Words for Text-Based Music Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
EXPLOITING PLAYLISTS FOR REPRESENTATION OF SONGS AND WORDS FOR TEXT-BASED MUSIC RETRIEVAL Chia-Hao Chung Yian Chen Homer Chen National Taiwan University KKBOX Inc. National Taiwan University [email protected] [email protected] [email protected] ABSTRACT The web-based approach [10, 11] has been considered a good alternative because web documents have rich text As a result of the growth of online music streaming information. However, its performance may degrade in services, a large number of playlists have been created by the presence of noisy text [12]. In contrast, the playlist- users and service providers. The title of each playlist based approach has the following appealing features: 1) provides useful information, such as the theme and The rich text information conveyed by the succinct listening context, of the songs in the playlist. In this paper, playlist title is highly relevant to the songs in the playlist we investigate how to exploit the words extracted from and 2) Songs wrapped in one playlist must be related to playlist titles for text-based music retrieval. The main each other in a certain way. If the relationship can be idea is to represent songs and words in a common latent determined from the playlist, additional efforts on audio space so that the music retrieval is converted to the signal analysis [6, 7, 11] can be saved. problem of selecting songs that are the nearest neighbors Our main idea is to represent songs and words in a of the query word in the latent space. Specifically, an common latent space so that music retrieval can be unsupervised learning method is proposed to generate a converted to the problem of selecting songs sufficiently latent representation of songs and words, where the near the query word in the latent space. Specifically, we learning objects are the co-occurring songs and words in propose an unsupervised learning method to generate a playlist titles. Five metrics (precision, recall, coherence, representation of songs and words extracted from playlist diversity, and popularity) are considered for performance titles, in which the learning function is optimized based evaluation of the proposed method. Qualitative results on the co-occurrence of songs and words in playlists. As demonstrate that our method is able to capture the each song or word (an object) is represented as a vector in semantic meaning of songs and words, owning to the a latent space, the semantic similarity between two proximity property of related songs and words in the objects can be easily determined by the distance between latent space. the two corresponding vectors. By exploiting this property, we can improve the performance of text-based 1. INTRODUCTION music retrieval. Online music streaming services, such as Spotify, Apple Our contributions can be summarized as follows: Music, and KKBOX, create various playlists for the We propose an unsupervised learning method to convenience of music listening for users. Meanwhile, model the relevance between songs and words of users may create their own playlists for replay or music playlists and to represent these two kinds of objects in sharing with friends [1–5]. The use of playlist makes a common latent space. music retrieval and organization simple and easy, largely We make text-based music retrieval easier to solve by because the title of a playlist carries significant thematic formulating it as a nearest neighbor search problem in information of the songs contained in the playlist [1–3]. the latent space. The theme can be about an artist, genre, mood, or context Both qualitative and quantitative evaluations are of the playlist. Therefore, the thematic information is conducted to demonstrate the effectiveness of the useful for music retrieval. The goal of this paper is to proposed method. exploit playlists for text-based music retrieval. One critical issue of text-based music retrieval is 2. RELATED WORK how to identify and quantify the relationship between In this section, we review previous work related to words and songs (i.e. which songs and words are relevant playlist understanding, text-based music retrieval, and to each other and how much is the relevance). Most representation learning. previous approaches to text-based music retrieval rely on human-labeled datasets [6, 7] or social tags [8, 9] which 2.1 Playlist Understanding normally have a limited size of vocabulary (word set). To understand the use of playlist, Hagen [1] and © Chia-Hao Chung, Yian Chen, Homer Chen. Cunningham et al. [2] conducted user interviews to Licensed under a Creative Commons Attribution 4.0 International analyze various themes and contexts of playlists. The License (CC BY 4.0). Attribution: Chia-Hao Chung, Yian Chen, results motivated Pichl et al. [3] to mine common Homer Chen. “Exploiting Playlists for Representation of Songs and Words for Text-Based Music Retrieval”, 18th International Society for listening contexts using playlist titles for context-aware Music Information Retrieval Conference, Suzhou, China, 2017. 478 Proceedings of the 18th ISMIR Conference, Suzhou, China, October 23-27, 2017 479 Playlist Title Words Songs (Artists) The Boys of Summer (The Ataris) So Long, So Long (Dashboard Confessional) Last Days of Summer (Silverstein) Summer's Over summer Close To Home (The Get Up Kids) Always Summer (Yellowcard) … Snap Out Of It (Arctic Monkeys) happy Unbelievers (Vampire Weekend) Happy Morning morning Demons (Imagine Dragons) Chill The Mother We Share (Chvrches) chill Everybody Wants To Rule The World (Lorde) … Don't Let the Sun Go Down on Me (George Michael) Careless Whisper (George Michael) George Michael - george_michael Heal The Pain (George Michael) For the Heart heart A Different Corner (George Michael) I Can't Make You Love Me (George Michael) … Table 1. Illustration of words extracted from playlists. Only the first five songs of a playlist are shown. music recommendation. In our work, we take a step complexity. Second, it makes information retrieval or further and investigate how to exploit the words recommendation an easy task that can be efficiently extracted from playlist titles for text-based music accomplished. However, little attention has been paid to retrieval. exploit representation learning for text-based music A related issue is playlist quality measurement. retrieval. In this paper, we extend the idea of embedding Motivated by the observation that the songs in a playlist, learning [16–24], which is a typical representation although diverse, are related to each other in a certain learning approach, to model the relevance between songs way, Fields [4] introduced coherence and diversity as and words of playlists. metrics of playlist quality. It was found that popularity and freshness of songs in a playlist are also important 3. PROPOSED METHOD metrics [5]. Considering that the response to a text query We first introduce the notations used in this paper. Then, is in the form of playlist, we apply coherence, diversity, we describe the proposed method for learning a and popularity as metrics for performance evaluation. representation of songs and words and the detail of the 2.2 Text-Based Music Retrieval training processing, including optimization and data sampling. Finally, we describe how the learned To allow the retrieval of music pieces by text query, the representation is applied to text-based music retrieval. relevance between words and songs has to be identified. Turnbull et al. [6] and Chechik et al. [7] developed a 3.1 Notations multi-class classification approach to predict the Let 퐿 = {푙 , 푙 , … , 푙 } be a set of playlists and 푇 = relevance of a music piece to a query. To address the 1 2 I {푡 , 푡 , … , 푡 } be the set of corresponding playlist titles. issue that the perception of relevance is subjective, Hariri 1 2 I Each playlist 푙 , 1 ≤ ≤ I, as illustrated in Table 1, is et al. [8] and Cheng et al. [9] used a probabilistic model 푖 associated with a set of songs 푆푖 = {푠푖 , 푠푖 , … , 푠푖 } and a and listening records to personalize text-based music 1 2 |푙푖| retrieval. To extend the coverage of text queries, Knees set of words 푊푖 = {푤푖 , 푤푖 , … , 푤푖 } extracted from 푡 . 1 2 |푡푖| 푖 et al. [10–12] crawled web documents relevant to a 푖 Let 푆 = {푠1, 푠2, … , 푠N} be the union of all 푆 , and 푊 = music piece and represented the music piece by the text {푤 , 푤 , … , 푤 } be the union of all 푊푖 . The goal is to extracted from the web documents. However, most of the 1 2 M learn a representation 휃(∙) to map each 푠 ∈ 푆 or 푤 ∈ body of words contained in web documents can be 푛 푚 푊 to a vector. irrelevant to the theme of the music pieces. To solve the problem, we develop an alternative approach that seeks 3.2 Representation Learning for Songs and Words relevant words from the playlist titles. We extend the idea of embedding learning to songs and 2.3 Representation Learning words. In its basic form, the embedding learning generates a representation for a set of objects based on Representation learning has been widely applied to the co-occurrence of the objects [23]. It consists of two music recommendation [13, 16, 29], playlist recommen- stages. In the first (or initialization) stage, the dation [17], music annotation and retrieval [18], playlist representation 휐(∙) assigns a vector of random values to generation [19, 20], and listening behavior analysis [21, each object. In second (or update) stage, the vector is 22]. The popularity of representation learning is due to progressively updated in two steps. In the first step, a its two appealing features. First, it can efficiently handle large scale dataset [23, 24] because of low model conditional probability 푃(표푐|휐(표)) for each pair of objects 표 and 표푐 is created, where 표푐 is the co-occurring 480 Proceedings of the 18th ISMIR Conference, Suzhou, China, October 23-27, 2017 where 휑(∙) maps 푠푐 into a vector space.