
ANALYSIS OF SONG/ARTIST LATENT FEATURES AND ITS APPLICATION FOR SONG SEARCH

Kosetsu Tsukuda, Masataka Goto
National Institute of Advanced Industrial Science and Technology (AIST), Japan
{k.tsukuda, m.goto}@aist.go.jp

ABSTRACT

For recommending songs to a user, one effective approach is to represent artists and songs with latent vectors and predict the user's preference toward the songs. Although the latent vectors represent the characteristics of artists and songs well, they have typically been used only for computing the preference score. In this paper, we discuss how we can leverage these vectors to realize applications that enable users to search for songs from new perspectives. To this end, by embedding song/artist vectors into the same feature space, we first propose two concepts of artist-song relationships: overall similarity and prominent affinity. Overall similarity is the degree to which the characteristics of a song are similar overall to the characteristics of the artist, while prominent affinity is the degree to which a song prominently represents the characteristics of the artist. By using Last.fm play logs for two years, we analyze the characteristics of these concepts. Moreover, based on the analysis results, we propose three applications for song search. Through case studies, we demonstrate that our proposed applications are beneficial for searching for songs according to users' various search intents.

Figure 1. Overview of our proposed ideas: (a) embedding artists' latent vectors and songs' latent vectors into the same feature space and (b) concepts of overall similarity and prominent affinity.

© Kosetsu Tsukuda, Masataka Goto. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Kosetsu Tsukuda, Masataka Goto, "Analysis of Song/Artist Latent Features and Its Application for Song Search", in Proc. of the 21st Int. Society for Music Information Retrieval Conf., Montréal, Canada, 2020.

1. INTRODUCTION

Embedding songs into a feature space is beneficial for realizing various applications for music information retrieval (MIR). For example, by using the tags of each song, songs can be embedded into a tag-based feature space [1]; this enables a user to search for songs similar to her favorite song. In another example, by embedding songs into the Arousal-Valence space [2] according to their audio features [3], a user can search for the song with the highest Arousal value among her favorite artist's songs. The same can be said for artists; by embedding artists into a feature space based on the topics of lyrics, artists similar to a user's favorite artist in terms of topic similarity can be retrieved [4]. Embedding heterogeneous data into a feature space is also useful for MIR tasks: song recommendations for playlists by embedding tags and songs [5, 6], computing tag similarity by embedding artists and tags [7], etc.

In spite of the potential to embed heterogeneous data, there have been few studies that have embedded songs and artists [8]. If both songs and artists are embedded into the same feature space, a greater variety of MIR applications can be realized. For example, given an artist, we can search for songs that have similar characteristics to those of the artist (i.e., songs whose feature vectors are close to the feature vector of the artist).

To realize this, we use Factorization Machines (FM) [9], an item recommendation technique. By using FM, each song, user, and artist can be represented by a K-dimensional latent vector. Although such vectors are usually used to compute a user's preference toward a song, we leverage the latent vectors of songs and artists to embed songs and artists into the same feature space (Fig. 1 (a)).

In the embedded feature space, we propose two concepts of artist-song relationships: overall similarity and prominent affinity. Suppose the direction of the vector of artist a1 in Fig. 1 (b) represents sadness, and the length of the vector indicates the degree of sadness. In this case, song s1 has almost the same degree of sadness, so s1 is similar overall to a1. On the other hand, song s2 has a higher degree of sadness. This means that s2 prominently represents a1's characteristics and has a higher prominent affinity with a1 in terms of sadness. We can recommend s1 and s2 to a user as a1's characteristic songs because of the high overall similarity and the high prominent affinity, respectively. Such a recommendation would be useful when the user listens to one of a1's songs for the first time because she can decide whether to listen to a1's other songs after listening to a characteristic song of a1. By using these concepts, we can realize not only such song recommendations

Proceedings of the 21st ISMIR Conference, Montréal, Canada, October 11-16, 2020

but also various applications for MIR, as in Section 5.

Our main contributions in this paper are as follows.

• We propose the concepts of overall similarity and prominent affinity between an artist and a song by leveraging their latent vectors learned through FM (Section 3).

• By using Last.fm play logs for two years, we show various characteristics of overall similarity and prominent affinity. For example, we reveal that an artist's popular songs tend to have high prominent affinity with the artist (Section 4).

• We demonstrate that the concepts of overall similarity and prominent affinity can be used to realize various applications for MIR. Specifically, we show three applications: familiarity-oriented search, typicality-oriented search, and analogy search (Section 5).

2. RELATED WORK

2.1 Matrix Factorization for Song Recommendations

Matrix Factorization (MF) [10] has been widely used for item recommendations. In the context of song recommendations, too, the effectiveness of MF has been reported [11–14]. One of the characteristics of MF for song recommendations is that each user and song is represented by a K-dimensional latent vector. This enables the model to learn a user's latent preference toward songs. Although MF typically considers only interactions between users and songs, Factorization Machines (FM) [9] can include side information in addition to information about users and songs. Side information can be an artist, the category of a song, and even the weather when a user listens to a song. Because of such flexibility, FM has also been used for song recommendations [15–18]. The idea of using artist information in FM for song recommendations has already been proposed [15, 16]; in this case, each user, song, and artist is represented by a K-dimensional latent vector.

Although such latent vectors in FM represent the characteristics of users, songs, and artists well, they have typically been used for computing a user's preference score toward a song and generating a personalized ranked list of songs for each user. Different from existing studies, we analyze the latent vectors of songs and artists based on new relationships between artists and songs (overall similarity and prominent affinity) and show their potential for implementing MIR applications.

2.2 Heterogeneous Embedding for MIR

The usefulness of embedding heterogeneous data into a feature space has been reported in various MIR tasks, where data has been embedded by a Markov-model-based method [5, 19], a co-occurrence-based method [20], etc. For example, by embedding tags and songs, song recommendations for playlists have been realized [5, 6]. Other examples include computing tag similarity by embedding artists and tags [7], retrieving songs by words by embedding songs and words in playlist titles [20], and visualizing the time-dependent listening preferences of a population by embedding users and songs [19]. Our study is different from theirs in that we embed artists and songs into a feature space.

The study closest to ours is that by Weston et al. [8]. They proposed a method for embedding artists and songs into a feature space and solved tasks such as predicting songs for a given artist and retrieving similar songs for a given song. To embed songs, their method requires audio data for all songs. In contrast, we use users' play logs. Although comparing the embedding accuracy of these two approaches is beyond the scope of this paper, our approach using play logs has an advantage over theirs in terms of the applicability of its insights to the MIR community, because various kinds of large play logs are easily accessible [21–27] compared to the accessibility of large audio data.

2.3 MIR Applications for Song Search

In the MIR community, various kinds of applications for song search have been proposed. These applications have enabled users to more easily find their desired songs and to search for songs from new perspectives. For example, query by humming [28–34] and query by [35–38] enable users to search for songs even when they do not know the song title. Similarity-based song search is also beneficial for finding new songs that are similar to a user's favorite song, where similarity between songs is measured by low-level acoustic features [39, 40], voice timbres of vocals [41], tags [1], and a combination of these characteristics [42]. When a user has a specific search intent, searching for songs using metadata [43, 44] and words in lyrics [45] is also helpful for finding her desired songs.

In this paper, we propose two concepts of artist-song relationships. These concepts can be used to search for an artist's characteristic songs, as we will show in Section 4.3. In addition, in Section 5, we demonstrate application examples that can be realized by leveraging these concepts and the latent vectors of artists and songs. Our proposed applications are also helpful for searching for users' desired songs and new songs. We believe our study is beneficial for other researchers implementing MIR applications based on our proposed concepts.

3. OVERALL SIMILARITY AND PROMINENT AFFINITY

In this section, we first describe how to generate the latent vectors of artists and songs through FM. We then propose the concepts of overall similarity and prominent affinity.

3.1 Notation

Let U, I, and A denote the sets of users, songs, and artists, respectively. I_u^+ represents the set of songs preferred by user u ∈ U. Following Lim et al. [46], we define the songs played ≥ µ times by u as the preferred songs of u. By using the data, we first aim to accurately generate a personalized ranked list of songs for each user u from I \ I_u^+, which is the set of songs not included in u's preferred songs.

3.2 FM for Song Recommendation

FM [9] is a method for predicting a user's preference toward an item based on MF [10]. A typical MF deals only with interactions between users and items, while in FM, side information (e.g., the artist or category of a song) can also be included in the model. By considering artist information, we can extract new relationships between artists and songs, as we will describe in Section 3.3. Below, we describe FM considering artist information.

In FM, user u, song s, and artist a have K-dimensional latent vectors ν_u, ν_s, and ν_a, respectively. The preference score of user u toward song s based on the second-order estimator of FM is computed with the following model:

x̂_us = α + β_u + β_s + β_{a_s} + ⟨ν_u, ν_s⟩ + ⟨ν_u, ν_{a_s}⟩ + ⟨ν_{a_s}, ν_s⟩,  (1)

where α is the global offset, a_s is the artist of s, and β_u, β_s, and β_{a_s} are the user/song/artist bias terms. As can be seen in the model, the preference score is computed from the user's affinity with the song (⟨ν_u, ν_s⟩), the user's affinity with the artist (⟨ν_u, ν_{a_s}⟩), and the artist's affinity with the song (⟨ν_{a_s}, ν_s⟩). Regarding the last term ⟨ν_{a_s}, ν_s⟩, if artist a_s tends to sing calm songs and song s is also calm, the value of ⟨ν_{a_s}, ν_s⟩ becomes high, while if s is exciting music, the value becomes low. Note that because a song's popularity is reflected in β_s (i.e., more popular songs tend to have higher values of β_s), popular songs do not always have high values of ⟨ν_{a_s}, ν_s⟩. Rather, the value of ⟨ν_{a_s}, ν_s⟩ is determined purely by the affinity between a and s.

Note that although we use a simple FM just by adding artist information, our goal in this paper is not to improve recommendation accuracy. Rather, we aim to study how to leverage the latent vectors of songs and artists. Although this simple FM can learn reliable latent vectors because it achieves high enough recommendation accuracy, as we will report in Section 4.2, we could adopt a more sophisticated FM (e.g., FM considering song co-occurrence [47] or audio information [11]); we leave this for future work.

We adopt Bayesian Personalized Ranking (BPR) [48] to learn the latent vectors. BPR is a pairwise ranking optimization framework and is designed to deal with users' implicit consumption behaviors, such as playing a song, rather than explicit ones, such as rating. In BPR, the training set D used for optimizing parameters is defined as follows:

D = {(u, i, j) | u ∈ U ∧ i ∈ I_u^+ ∧ j ∈ I \ I_u^+}.

That is, a triad (u, i, j) means that user u prefers song i to song j. The optimization criterion for D is given by:

Σ_{(u,i,j)∈D} ln σ(x̂_uij) − λ_Θ ‖Θ‖²,  (2)

where σ is the sigmoid function, Θ = {β_s, β_a, ν_u, ν_s, ν_a} represents all model parameters, and λ_Θ is a regularization hyperparameter. x̂_uij represents the difference between u's preference for i and that for j, defined as x̂_uij = x̂_ui − x̂_uj. Finally, we learn the parameters by using TensorFlow [49] with the Adam optimizer [50].

3.3 Overall Similarity and Prominent Affinity

The learned model parameters are usually used for computing a user's preference score toward a song by using Eq. 1. In this paper, we propose using the learned model parameters (i.e., the latent vectors) ν_a and ν_s to capture the relationships between artists and their songs. In Eq. 1, because the affinity between ν_a and ν_s is considered, the nth (1 ≤ n ≤ K) dimension of ν_a and that of ν_s have the same meaning. For example, if the first dimension of ν_a represents calmness, the first dimension of ν_s also represents the calmness of the song. Hence, we can embed ν_a and ν_s into the same feature space, as shown in Fig. 1 (a). In the feature space, the angle and length of a vector represent its qualitative and quantitative aspects, respectively. Specifically, the difference of the angle between two vectors represents the difference of their characteristics as a song or artist, because each dimension represents a characteristic of songs and artists, while the length of a vector represents the degree of its characteristics: if the value of ν_s's nth dimension is large, ν_s represents the nth characteristic well. By using the embedded feature space, we propose two concepts of the relationships between artists and songs: overall similarity and prominent affinity.

3.3.1 Overall Similarity

Suppose artist a1's song s1 is mapped fairly close to a1 in the feature space, as shown in Fig. 1 (b). In this case, we can say that s1 is similar overall to a1. Thus, we define the closeness between an artist and a song in the feature space as the overall similarity between them. More formally, given artist a and song s, the overall similarity between a and s is computed based on the Euclidean distance between them: f_os(a, s) = 1 / (‖ν_a − ν_s‖ + 1).

The concept of overall similarity can be used to search for an artist's song that represents the artist's characteristics well. Searching for such a song and listening to it is useful for quickly understanding the artist's characteristic songs. In particular, when a user listens to an artist's song for the first time, it might be helpful for her to listen to the artist's characteristic song and then decide if she wants to listen to the artist's other songs.

3.3.2 Prominent Affinity

Now suppose a1's song s2 is mapped fairly close to the extended position of a1, as shown in Fig. 1 (b). This means that s2 prominently represents a1's characteristics because the degree of each characteristic of s2 is higher than that of a1. Thus, we define the inner product of an artist vector and a song vector as the prominent affinity between them. More formally, given artist a and a's song s, the prominent affinity between a and s is given by f_pa(a, s) = ⟨ν_a, ν_s⟩.

The concept of prominent affinity can be used to search for an artist's song that strongly represents the artist's characteristics. Similar to the overall similarity, searching for such a song would be helpful when a user listens to an artist's song for the first time.
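As a simplified sketch of Eqs. 1 and 2, the snippet below computes the second-order FM score for one (user, song, artist) triple and the BPR criterion for one training triad. All parameter values here are random stand-ins rather than learned values, and the full training loop (summing over D and optimizing with TensorFlow/Adam, as the paper does) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50      # latent dimensionality (the value used in Section 4.2)
alpha = 0.0 # global offset

# Illustrative (untrained) parameters for one user u, a preferred song i,
# a non-preferred song j, and the songs' artists a_i and a_j.
names = ("u", "i", "j", "a_i", "a_j")
bias = {n: float(rng.normal(scale=0.1)) for n in names}
vec = {n: rng.normal(scale=0.1, size=K) for n in names}

def fm_score(u, s, a):
    """Second-order FM estimator of Eq. 1:
    x_us = alpha + b_u + b_s + b_as + <v_u,v_s> + <v_u,v_as> + <v_as,v_s>."""
    return (alpha + bias[u] + bias[s] + bias[a]
            + vec[u] @ vec[s] + vec[u] @ vec[a] + vec[a] @ vec[s])

def bpr_objective(x_ui, x_uj, lam=0.01):
    """BPR criterion of Eq. 2 for a single triad (u, i, j):
    ln sigma(x_uij) - lam * ||Theta||^2, to be maximized."""
    x_uij = x_ui - x_uj
    reg = lam * (sum(b * b for b in bias.values())
                 + sum(float(v @ v) for v in vec.values()))
    return float(np.log(1.0 / (1.0 + np.exp(-x_uij)))) - reg

x_ui = fm_score("u", "i", "a_i")
x_uj = fm_score("u", "j", "a_j")
objective = bpr_objective(x_ui, x_uj)  # maximized over all triads in D
```

In training, the gradient of this objective with respect to each bias and vector is followed for every sampled triad; the sketch only evaluates it once to make the structure of the two equations explicit.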

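To make the two definitions concrete before the analysis, here is a small self-contained Python example; the two-dimensional vectors and the "sadness" reading of the vertical axis are invented purely for illustration (real latent dimensions are learned and not directly interpretable):

```python
import numpy as np

def overall_similarity(v_a, v_s):
    # f_os(a, s) = 1 / (||v_a - v_s|| + 1)   (Section 3.3.1)
    return 1.0 / (np.linalg.norm(v_a - v_s) + 1.0)

def prominent_affinity(v_a, v_s):
    # f_pa(a, s) = <v_a, v_s>                (Section 3.3.2)
    return float(v_a @ v_s)

# Toy 2-D version of Fig. 1 (b): the vertical axis plays the role of
# "sadness" (an illustrative assumption only).
a1 = np.array([0.0, 1.0])   # artist
s1 = np.array([0.1, 1.1])   # close to a1 overall
s2 = np.array([0.0, 2.5])   # along a1's direction, but much longer

print(overall_similarity(a1, s1) > overall_similarity(a1, s2))  # True
print(prominent_affinity(a1, s2) > prominent_affinity(a1, s1))  # True
```

As in the s1/s2 example of the introduction, the song mapped near the artist wins on overall similarity, while the song extending the artist's direction wins on prominent affinity.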
4. ANALYSIS

In this section, we analyze the characteristics of overall similarity (OS) and prominent affinity (PA).

4.1 Dataset

We use the LFM-1b dataset, which includes music listening logs on Last.fm [21]. Each listening log consists of a user ID, song ID, artist ID, and timestamp. The dataset also includes data for converting song IDs and artist IDs into song names and artist names, respectively. We use song logs for two years (between 1/1/2012 and 12/31/2013). The threshold µ in Section 3.1, which determines the preferred songs of a user, is set to five. To increase the reliability of the learned model parameters, songs and artists that have been listened to by less than 10 different users, as well as users who have listened to less than 10 different songs, are discarded. Table 1 shows the statistics of the dataset after the preprocessing.

Table 1. Dataset statistics.
  Number of users (|U|):                       84,708
  Number of songs (|I|):                       390,158
  Number of artists (|A|):                     32,448
  Sum of preferred songs (Σ_{u∈U} |I_u^+|):    18,478,304

4.2 Model Development

For each user, we split I_u^+ into training/validation/test sets. To this end, we randomly select one preferred song for validation and another for testing; all the remaining songs are used for training. The recommendation performance is evaluated by the AUC (Area Under the ROC Curve), which is widely used to evaluate whether model parameters are appropriately learned [52, 53]:

AUC = (1/|U|) Σ_{u∈U} (1/|T_u|) Σ_{(i,j)∈T_u} δ(x̂_ui > x̂_uj),

where T_u = {(i, j) | i is u's held-out preferred song ∧ j ∈ I \ I_u^+} and δ(z) is 1 when z is true and 0 otherwise. The AUC value ranges between 0 and 1, and a higher value represents better performance (i.e., the model parameters are appropriately learned). The hyperparameters (i.e., λ_Θ in Eq. 2 and the learning rate) are tuned in terms of the AUC on the validation set, where the hyperparameters are selected from {0.0001, 0.001, 0.01, 0.1, 1}. The latent dimensionality K is set to 50. (We evaluated the AUC on the validation dataset by changing K from 10 to 100 in increments of 10; the AUC saturated when K = 50.) We emphasize that our goal here is not to evaluate the recommendation accuracy of FM by comparing it with other recommendation methods. Rather, we aim to evaluate whether the FM parameters developed from our dataset are appropriately learned.

The AUC on the test set achieved a high value: 0.973. This result means that the model parameters are appropriately learned, and the embedded feature space and the artist-song relations based on OS and PA are reliable. We also computed the Spearman rank correlation between the values of β_s and song popularities to evaluate whether β_s certainly reflects song popularity, as mentioned in Section 3.2. The popularity of a song was measured by the number of different users who have played it, and the popularity ranking of an artist's songs is obtained by sorting them in descending order of their popularities. For each artist, we compute the correlation between the β_s values and the popularities of all of the artist's songs. The average of the correlations over all artists was relatively high: 0.502. When we define the popularity of an artist as the number of different users who have played at least one of the artist's songs, the average of the correlations became high with increasing artist popularity: artists whose popularities were ≥ 100 and ≥ 300 had correlation values of 0.682 and 0.755 on average, respectively. From these results, we can say that the bias term β_s certainly reflects a song's popularity, and the value of ⟨ν_{a_s}, ν_s⟩ in Eq. 1 is determined mainly by the affinity between the artist and the song, especially for popular artists. Hereafter, the parameter values computed in this section are used as the artist latent vectors and song latent vectors.

Figure 2. Distributions of (A) distance between ν_s and ν_a, (B) angle between ν_s and ν_a, and (C) ratio of ‖ν_s‖ to ‖ν_a‖. Each dot represents a song, where songs are sorted in ascending order of the y-values in each graph.

Table 2. Ranking results of songs in terms of OS and PA (top: "The Beatles," bottom: "Red Hot Chili Peppers").

4.3 Analysis Results

Although we showed that the model parameters are appropriately learned, if most of an artist's songs are mapped

very close to the artist, it would be useless to rank songs according to the subtle differences of their positions. To confirm whether artists' songs are well distributed, we evaluate the distributions of (A) the distance between ν_s and ν_a, (B) the angle between ν_s and ν_a, and (C) the ratio of ‖ν_s‖ to ‖ν_a‖. Fig. 2 shows the results. In all cases, the value distribution is close to a Gaussian distribution. These results indicate that songs are well distributed in the feature space and that it is meaningful to rank songs according to OS, which is affected by (A), and PA, which is affected by (B) and (C).

Next, we analyze the ranked results of songs by OS/PA. Table 2 shows the top 10 songs of "The Beatles" and "Red Hot Chili Peppers." Since the top-ranked songs are largely different between the two concepts, it would be meaningful to generate two ranked lists so that we can show one of them (or both of them) to a user according to her intent. We can also see that the top-ranked songs in PA tend to be more popular than those in OS. To evaluate this, we compute the Spearman rank correlation between the OS-based/PA-based song rankings and the popularity-based song ranking. As expected, the correlation of the PA-based ranking is high (0.577), while that of the OS-based ranking is low (-0.147). As mentioned in Section 4.2, the effect of song popularity is eliminated by the bias term β_s. Therefore, this high correlation of the PA-based ranking is caused purely by the characteristics of the songs: songs strongly reflecting the artist's characteristics tend to be popular.

5. APPLICATIONS

In Section 4.3, we showed how to directly use the concepts of OS and PA to rank an artist's songs. In this section, by leveraging the concepts and the learned parameters ν_a and ν_s, we demonstrate three applications: familiarity-oriented search, typicality-oriented search, and analogy search. Through these demonstrations, we show that this paper brings reusable insights in that our proposed concepts can be used for various kinds of MIR applications.

5.1 Familiarity-oriented Search

By considering PA and song popularity, an application to search for songs according to a user's familiarity with an artist can be realized. As we mentioned in Section 4.3, songs with a high PA tend to be popular, but some of them are unpopular. Similarly, some popular songs have a low PA. Hence, according to the degree of PA and the degree of popularity, an artist's songs can be classified into four areas, as shown in Fig. 3.

Figure 3. Properties of songs classified by the degree of PA and the degree of popularity: (a) high popularity and high PA, for users who are not familiar with the artist; (b) high popularity and low PA, for users who want to know the artist's diversity; (c) low popularity and high PA, for users who want to listen to unexpected songs; (d) low popularity and low PA, for users who want to become an artist devotee.

Searching for songs in area (a) would be beneficial especially for a user who listens to the artist's songs for the first time because these songs are popular and represent the artist's typical characteristics well; she can decide if she wants to listen to the artist's other songs by listening to the searched songs. After listening to such songs, searching for songs in area (b) would be useful to let her understand the diversity of the artist because area (b) includes songs that are popular but different from the artist's typical characteristics. Moreover, searching for songs in area (c) enables her to listen to unexpected songs in the sense that the searched songs match the artist's typical characteristics well but are not known by many people. Finally, searching for songs in area (d) would be helpful for her to become an artist devotee by listening to them.

In light of the above, given artist a, we rank all of a's songs for each area in terms of PA as follows. For area (a), the score of song s is computed by r_pa(s, I_a) + r_pop(s, I_a), where I_a is the set of all songs of a, and r_pa(s, I_a) and r_pop(s, I_a) represent the ranks of s among I_a in terms of PA and popularity, respectively. The songs are then ranked in ascending order of score. For areas (b), (c), and (d), the score is given by r_pop(s, I_a) − r_pa(s, I_a), r_pa(s, I_a) − r_pop(s, I_a), and −r_pa(s, I_a) − r_pop(s, I_a), respectively. In these areas, the songs are also ranked in ascending order. Regarding OS, songs in each area have the same meaning as with PA, and we can also make a song ranking for each area in the same manner as with PA.

Table 3 shows example results of the top five song rankings in each area for "The Black Keys" in terms of PA. It would be beneficial to show each ranking to a user according to her familiarity with "The Black Keys."

Table 3. Ranking results of familiarity-oriented search for "The Black Keys" in terms of PA.
Rank | Area (a)          | Area (b)               | Area (c)                | Area (d)
1    | Have Love Will Travel | Yearnin'           | Can't Find My Mind      |
2    | Lonely Boy        | Same Old Thing         | Keep Your Hands Off Her | Her Eyes Are A Blue Million Miles
3    | Tighten Up        | I Got Mine             | Grown So Ugly           | Howlin For You
4    | Run Right Back    | Till I Get My Way      | The Wicked Messenger    |
5    | Everlasting Light | The Baddest Man Alive  |                         |

5.2 Typicality-oriented Search

Because the parameters of all artists and all of their songs are learned by using the same optimization criterion (Eq. 2), all artists and all songs can be embedded into the same feature space. This means that given artist a, all songs in the dataset can be ranked in terms of OS or PA. In other words, we can search for songs that represent a's typical characteristics well even if they were not a's songs. By showing such songs to a user who is a fan of a, she may be willing to listen to unfamiliar songs because they are highly related to her favorite artist. This is also benefi-


These applications are beneficial for online music streaming services because they can expand users' interests and let users listen to more songs.

Given artist a, when we generate a ranked list of all songs in I_a in terms of OS, we compute the score of each song by f_os(a, s), which was defined in Section 3.3.1, and rank all songs in descending order of scores. Similarly, for PA, we use f_pa(a, s), which was defined in Section 3.3.2, and rank all songs in descending order of scores.

Table 4 shows the top five song rankings for "Lady Gaga" and "Radiohead." Although we do not use acoustic features or metadata of songs, in the case of "Lady Gaga," all artists in the table are female artists. Similarly, in the case of "Radiohead," members of the band such as "Thom Yorke" and "Jonny Greenwood" are retrieved. In addition, because "Radiohead" is a rock band, rock bands such as "Interpol," "Sigur Rós," and "Doves" are ranked at higher positions. These results show the potential of this application to enable users to find new attractive songs.

"Lady Gaga"
Rank | OS                            | PA
1    | California Gurls / Katy Perry | Circus / Britney Spears
2    | Racy Lacey / Girls Aloud      | ...Baby One More Time / Britney Spears
3    | / Britney Spears              | I Wanna Go / Britney Spears
4    | Cannibal / Ke$ha              | Hung Up / Madonna
5    | / Britney Spears              | I Kissed a Girl / Katy Perry

"Radiohead"
Rank | OS                                   | PA
1    | Cymbal Rush / Thom Yorke             | Untitled / Interpol
2    | Atoms for Peace / Thom Yorke         | The Clock / Thom Yorke
3    | And It Rained All Night / Thom Yorke | I've Seen It All / Björk
4    | Convergence /                        | Svefn-g-englar / Sigur Rós
5    | Quick Canal / Atlas Sound            | Kingdom of Rust / Doves

Table 4. Ranking results of typicality-oriented search (top: "Lady Gaga," bottom: "Radiohead").

5.3 Analogy Search

When a user searches for an object in an unfamiliar domain, it is helpful to give an example object in her familiar domain to the search system and thereby obtain the desired search results. This kind of search is known as analogy search, and its usefulness for searching for persons [54] and restaurants [55] has been studied. An analogy search for MIR can also be an attractive application, as follows. Suppose a user is a big fan of "Eminem" and likes his song "Not Afraid." She has recently become interested in "Lady Gaga." However, because she has little knowledge of "Lady Gaga" and there are too many "Lady Gaga" songs, she is at a loss as to which song she should listen to. In such a case, by using an analogy, she can ask something like "what Lady Gaga song corresponds to Not Afraid by Eminem?" If we can return search results for such a query, it will be helpful for her to try some songs of "Lady Gaga."

In an analogy search for MIR, given a query consisting of source artist a^s, source song s^s, and target artist a^t, our goal is to return a^t's song s^t, where the relation between a^t and s^t corresponds to that between a^s and s^s. We compute the relation between an artist and a song based on the angle between their latent vectors and the ratio of their lengths, because these respectively represent the qualitative and quantitative aspects of artists/songs, as we described in Section 3.3. Intuitively, if the angle between a^s and s^s is similar to that between a^t and s^t, and the ratio of ||ν_{s^s}|| to ||ν_{a^s}|| is also similar to that of ||ν_{s^t}|| to ||ν_{a^t}||, we regard s^t as a good analogy search result of the query. However, because the degree of scatter of songs in the feature space differs from one artist to another, we need to normalize the angles and ratios between an artist and its songs. Formally, given a^s, we first compute the angle between ν_{a^s} and the vector of each of a^s's songs. The angles are then normalized to fit into the interval [0, 1] by min-max normalization. Let θ_{s^s} denote the normalized angle of s^s. We also compute the ratio of the length of each of a^s's songs to the length of ν_{a^s} (i.e., ||ν_s|| / ||ν_{a^s}|| where s ∈ I_{a^s}). Again, min-max normalization is applied; let r_{s^s} be the normalized ratio of s^s. Similarly, we compute the normalized angles and ratios for each of a^t's songs. The score of s^t ∈ I_{a^t} for analogy search is then computed as follows:

f_analogy(a^s, s^s, a^t, s^t) = γ |θ_{s^s} − θ_{s^t}| + (1 − γ) |r_{s^s} − r_{s^t}|,

where γ is a parameter that determines the weights on angle similarity and length-ratio similarity. Finally, a^t's songs are ranked in ascending order of f_analogy(a^s, s^s, a^t, s^t).

In Table 5, we show example results where the top three songs are listed for each query. The value of γ is set to 0.5. In the top table, the source song is "Like a Prayer," which is a signature piece of "Madonna," and the target artist is "Lady Gaga." In this case, the signature pieces of "Lady Gaga" are ranked higher. Even when the source artist is "Eminem," whose music is largely different from that of "Lady Gaga," her signature pieces are retrieved for his signature piece "Not Afraid." When the source song is "That Means A Lot," which has a low PA with "The Beatles," songs with a low PA with "Aerosmith" are retrieved. From these results, we can say that this application can search for songs of the target artist that have a relationship similar to that between the source artist and the source song.

Query: Source artist: Madonna; Source song: Like a Prayer; Target artist: Lady Gaga
Rank | Song title
1    | Yoü and I
2    | Poker Face
3    | Telephone

Query: Source artist: Eminem; Source song: Not Afraid; Target artist: Lady Gaga
Rank | Song title
1    | The Edge of Glory
2    | Bad Romance
3    | Yoü and I

Query: Source artist: The Beatles; Source song: That Means A Lot; Target artist: Aerosmith
Rank | Song title
1    | Milk Cow
2    | Face
3    | Temperature

Table 5. Ranking results of analogy search.

6. CONCLUSION

This paper analyzed song/artist latent features by embedding them into the same feature space. Based on the analysis results, we proposed three applications for song search. We acknowledge a limitation of this paper in that we did not quantitatively evaluate the search results of those applications. Nonetheless, we believe this study is a worthwhile contribution as a first step toward leveraging the latent vectors of songs and artists. In future work, we plan to quantitatively evaluate the usefulness of our proposed applications by conducting user studies. We also hope that other researchers will leverage the concepts of OS/PA and realize useful music information retrieval systems.
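The analogy-search scoring above can be sketched in NumPy as follows. This is a minimal sketch, not the paper's implementation: the function and variable names are illustrative, and the artist/song latent vectors are assumed to be given as plain NumPy arrays (in the paper they come from the FM-based embedding).

```python
import numpy as np

def _minmax(x):
    # Min-max normalize to [0, 1]; a constant array maps to zeros.
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def _angles_and_ratios(artist_vec, song_vecs):
    # Angle between the artist vector and each of the artist's song vectors,
    # and the length ratio ||v_s|| / ||v_a||, each min-max normalized per artist.
    a = np.asarray(artist_vec, dtype=float)
    S = np.asarray(song_vecs, dtype=float)
    song_norms = np.linalg.norm(S, axis=1)
    cos = S @ a / (song_norms * np.linalg.norm(a))
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    ratios = song_norms / np.linalg.norm(a)
    return _minmax(angles), _minmax(ratios)

def analogy_rank(src_artist, src_song_idx, src_songs,
                 tgt_artist, tgt_songs, gamma=0.5):
    """Return indices of the target artist's songs, best analogy first
    (ascending f_analogy = gamma*|theta_s - theta_t| + (1-gamma)*|r_s - r_t|)."""
    theta_s, r_s = _angles_and_ratios(src_artist, src_songs)
    theta_t, r_t = _angles_and_ratios(tgt_artist, tgt_songs)
    score = (gamma * np.abs(theta_s[src_song_idx] - theta_t)
             + (1 - gamma) * np.abs(r_s[src_song_idx] - r_t))
    return np.argsort(score)  # smaller score = closer analogy
```

For instance, if the source song has the smallest angle to its artist and the largest length ratio among the source artist's songs, the top-ranked target song will be the one whose normalized angle and ratio are closest to those extremes.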

Proceedings of the 21st ISMIR Conference, Montréal, Canada, October 11-16, 2020

7. ACKNOWLEDGMENTS

This work was supported in part by JSPS KAKENHI Grant Number 20K19934 and JST ACCEL Grant Number JPMJAC1602, Japan.

8. REFERENCES

[1] M. Levy and M. Sandler, "A semantic space for music derived from social tags," in Proc. ISMIR, 2007, pp. 411–416.

[2] J. A. Russell, "A circumplex model of affect," Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.

[3] S. Chapaneri and D. Jayaswal, "Structured prediction of music mood with twin Gaussian processes," in Proc. PReMI, 2017, pp. 647–654.

[4] K. Tsukuda, K. Ishida, and M. Goto, "Lyric Jumper: A lyrics-based music exploratory web service by modeling lyrics generative process," in Proc. ISMIR, 2017, pp. 544–551.

[5] J. L. Moore, S. Chen, T. Joachims, and D. Turnbull, "Learning to embed songs and tags for playlist prediction," in Proc. ISMIR, 2012, pp. 349–354.

[6] K. Pugatschewski, T. Köllmer, and A. M. Kruspe, "Playlists from the matrix - combining audio and metadata in semantic embeddings," in Proc. ISMIR, 2016.

[7] G. Karamanolakis, E. Iosif, A. Zlatintsi, A. Pikrakis, and A. Potamianos, "Audio-based distributional representations of meaning using a fusion of feature encodings," in Proc. INTERSPEECH, 2016, pp. 3658–3662.

[8] J. Weston, S. Bengio, and P. Hamel, "Multi-tasking with joint semantic spaces for large-scale music annotation and retrieval," Journal of New Music Research, vol. 40, no. 4, pp. 337–348, 2011.

[9] S. Rendle, "Factorization machines," in Proc. ICDM, 2010, pp. 995–1000.

[10] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30–37, 2009.

[11] C.-M. Chen, M.-F. Tsai, J.-Y. Liu, and Y.-H. Yang, "Using emotional context from article for contextual music recommendation," in Proc. ACM Multimedia, 2013, pp. 649–652.

[12] A. Vall, M. Skowron, P. Knees, and M. Schedl, "Improving music recommendations with a weighted factorization of the tagging activity," in Proc. ISMIR, 2015, pp. 65–71.

[13] O. Gouvert, T. Oberlin, and C. Févotte, "Matrix co-factorization for cold-start recommendation," in Proc. ISMIR, 2018, pp. 792–798.

[14] D. Yang, T. Chen, W. Zhang, Q. Lu, and Y. Yu, "Local implicit feedback mining for music recommendation," in Proc. RecSys, 2012, pp. 91–98.

[15] F. Yuan, G. Guo, J. M. Jose, L. Chen, H. Yu, and W. Zhang, "BoostFM: Boosted factorization machines for top-N feature-based recommendation," in Proc. IUI, 2017, pp. 45–54.

[16] ——, "Optimizing factorization machines for top-N context-aware recommendations," in Proc. WISE, 2016, pp. 278–293.

[17] M. Pichl, E. Zangerle, and G. Specht, "Improving context-aware music recommender systems: Beyond the pre-filtering approach," in Proc. ICMR, 2017, pp. 201–208.

[18] G. Vigliensoni and I. Fujinaga, "Automatic music recommendation systems: Do demographic, profiling, and contextual features improve their performance?" in Proc. ISMIR, 2016, pp. 94–100.

[19] J. L. Moore, S. Chen, D. Turnbull, and T. Joachims, "Taste over time: The temporal dynamics of user preferences," in Proc. ISMIR, 2013, pp. 401–406.

[20] C. Chung, Y. Chen, and H. H. Chen, "Exploiting playlists for representation of songs and words for text-based music retrieval," in Proc. ISMIR, 2017, pp. 478–485.

[21] M. Schedl, "The LFM-1b dataset for music retrieval and recommendation," in Proc. ICMR, 2016, pp. 103–110.

[22] G. Vigliensoni and I. Fujinaga, "The music listening histories dataset," in Proc. ISMIR, 2017, pp. 96–102.

[23] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere, "The million song dataset," in Proc. ISMIR, 2011, pp. 591–596.

[24] Yahoo Labs, "R2 - Yahoo! Music user ratings of songs with artist, album, and genre metainformation, v.1.0," https://webscope.sandbox.yahoo.com/catalog.php?datatype=r.

[25] Ò. Celma, Music Recommendation and Discovery - The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer, 2010.

[26] E. Zangerle, M. Pichl, W. Gassler, and G. Specht, "#nowplaying music dataset: Extracting listening behavior from Twitter," in Proc. WISMM, 2014, pp. 21–26.

[27] D. Hauger, M. Schedl, A. Kosir, and M. Tkalcic, "The million musical tweet dataset - what we can learn from microblogs," in Proc. ISMIR, 2013, pp. 189–194.

[28] E. Pollastri, "An audio front end for query-by-humming systems," in Proc. ISMIR, 2001, pp. 65–72.

[29] T. Sorsa and K. Halonen, "Mobile melody recognition system with voice-only user interface," in Proc. ISMIR, 2002, pp. 279–280.

[30] S. Pauws, "CubyHum: A fully operational 'query by humming' system," in Proc. ISMIR, 2002, pp. 187–196.

[31] A. Ito, S. Heo, M. Suzuki, and S. Makino, "Comparison of features for DP-matching based query-by-humming system," in Proc. ISMIR, 2004, pp. 297–303.

[32] D. Little, D. Raffensperger, and B. Pardo, "A query by humming system that learns from experience," in Proc. ISMIR, 2007, pp. 335–338.

[33] P. Ferraro, P. Hanna, L. Imbert, and T. Izard, "Accelerating query-by-humming on GPU," in Proc. ISMIR, 2009, pp. 279–284.

[34] C. de la Bandera, A. M. Barbancho, L. J. Tardón, S. Sammartino, and I. Barbancho, "Humming method for content-based music information retrieval," in Proc. ISMIR, 2011, pp. 49–54.

[35] E. Molina, L. J. Tardón, I. Barbancho, and A. M. Barbancho, "The importance of F0 tracking in query-by-singing-humming," in Proc. ISMIR, 2014, pp. 277–282.

[36] C. Wang, J. R. Jang, and W. Wang, "An improved query by singing/humming system using melody and lyrics information," in Proc. ISMIR, 2010, pp. 45–50.

[37] A. Duda, A. Nürnberger, and S. Stober, "Towards query by singing/humming on audio databases," in Proc. ISMIR, 2007, pp. 331–334.

[38] J. R. Jang, C. Hsu, and H. Lee, "Continuous HMM and its enhancement for singing/humming query retrieval," in Proc. ISMIR, 2005, pp. 546–551.

[39] E. Allamanche, J. Herre, O. Hellmuth, T. Kastner, and C. Ertel, "A multiple feature model for musical similarity retrieval," in Proc. ISMIR, 2003, pp. 217–218.

[40] J. Aucouturier and F. Pachet, "Music similarity measures: What's the use?" in Proc. ISMIR, 2002, pp. 157–163.

[41] H. Fujihara and M. Goto, "A music information retrieval system based on singing voice timbre," in Proc. ISMIR, 2007, pp. 467–470.

[42] F. Vignoli and S. Pauws, "A music retrieval system based on user driven similarity and its evaluation," in Proc. ISMIR, 2005, pp. 272–279.

[43] D. R. Turnbull, L. Barrington, G. Lanckriet, and M. Yazdani, "Combining audio content and social context for semantic music discovery," in Proc. SIGIR, 2009, pp. 387–394.

[44] P. Knees, T. Pohle, M. Schedl, and G. Widmer, "A music search engine built upon audio-based and web-based similarity measures," in Proc. SIGIR, 2007, pp. 447–454.

[45] E. Brochu and N. de Freitas, "'Name that song!': A probabilistic approach to querying on music and text," in Proc. NIPS, 2002, pp. 1529–1536.

[46] D. Lim, J. McAuley, and G. Lanckriet, "Top-N recommendation with missing implicit feedback," in Proc. RecSys, 2015, pp. 309–312.

[47] D. Liang, J. Altosaar, L. Charlin, and D. M. Blei, "Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence," in Proc. RecSys, 2016, pp. 59–66.

[48] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in Proc. UAI, 2009, pp. 452–461.

[49] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: A system for large-scale machine learning," in Proc. OSDI, 2016, pp. 265–283.

[50] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, 2015.

[51] K. Tsukuda, S. Fukayama, and M. Goto, "ABCPRec: Adaptively bridging consumer and producer roles for user-generated content recommendation," in Proc. SIGIR, 2019, pp. 1197–1200.

[52] W. Pan and L. Chen, "GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering," in Proc. IJCAI, 2013, pp. 2691–2697.

[53] R. He and J. McAuley, "VBPR: Visual Bayesian personalized ranking from implicit feedback," in Proc. AAAI, 2016, pp. 144–150.

[54] Y. Zhang, A. Jatowt, and K. Tanaka, "Is tofu the cheese of Asia?: Searching for corresponding objects across geographical areas," in Proc. WWW Companion, 2017, pp. 1033–1042.

[55] M. P. Kato, H. Ohshima, and K. Tanaka, "Content-based retrieval for heterogeneous domains: Domain adaptation by relative aggregation points," in Proc. SIGIR, 2012, pp. 811–820.