
Modeling the Music Genre Perception across Language-Bound Cultures

Elena V. Epure¹, Guillaume Salha¹,², Manuel Moussallam¹, Romain Hennequin¹
¹ Deezer Research, Paris, France
² LIX, École Polytechnique, Palaiseau, France
[email protected]

Abstract

The music genre perception expressed through human annotations of artists or albums varies significantly across language-bound cultures. These variations cannot be modeled as mere translations since we also need to account for cultural differences in the music genre perception. In this work, we study the feasibility of obtaining relevant cross-lingual, culture-specific music genre annotations based only on language-specific semantic representations, namely distributed concept embeddings and ontologies. Our study, focused on six languages, shows that unsupervised cross-lingual music genre annotation is feasible with high accuracy, especially when combining both types of representations. This approach of studying music genres is the most extensive to date and has many implications in musicology and music information retrieval. Besides, we introduce a new, domain-dependent cross-lingual corpus to benchmark state-of-the-art multilingual pre-trained embedding models.

1 Introduction

A prevalent approach to culturally study music genres starts with a common set of music items, e.g. artists, albums, tracks, and assumes that the same music genres would be associated with the items in all cultures (Ferwerda and Schedl, 2016; Skowron et al., 2017). However, music genres are subjective. Cultures themselves and individual musicological backgrounds influence the music genre perception, which can differ among individuals (Sordo et al., 2008; Lee et al., 2013). For instance, a Westerner may relate funk to soul and disco, while a Brazilian to baile funk, which is a type of rap (Hennequin et al., 2018). Thus, accounting for cultural differences in music genres' perception could give a more grounded basis for such cultural studies. However, ensuring both a common set of music items and culture-sensitive annotations with broad coverage of music genres is strenuous (Bogdanov et al., 2019).

To address this challenge, we study the feasibility of cross-culturally annotating music items with music genres, without relying on a parallel corpus. In this work, culture is related to a community speaking the same language (Kramsch and Widdowson, 1998). The specific research question we build upon is: assuming consistent patterns of music genre association with music items within cultures, can a mapping between these patterns be learned by relying on language-specific semantic representations? It is worth noting that, since music genres fall within the class of Culture-Specific Items (Aixelá, 1996; Newmark, 1988), cross-lingual annotation, in this case, cannot be framed as standard translation, as one also needs to model the dissimilar perception of music genres across cultures.

Our work focuses on four language families, Germanic (English-en and Dutch-nl), Romance (Spanish-es and French-fr), Japonic (Japanese-ja) and Slavic (Czech-cs), and on two types of language-specific semantic representations, ontologies and multi-word expression embeddings.

First, ontologies are often used to represent music genres, showing how they relate conceptually (Schreiber, 2016). We identify Wikipedia [1], the online multilingual encyclopedia, to be particularly relevant to our study. It extensively documents worldwide music genres, relating them through a coherent set of relation types across languages (e.g. derivative genre, sub-genre). Though the relation types are the same per language, the actual music genres and the way they are related can differ. Indeed, Pfeil et al. (2006) have shown that Wikipedia contributions expose cultural differences aligned with the ones in the physical world.

Second, music genres can be represented from a distributional semantics perspective.

[1] https://en.wikipedia.org

Word vector spaces are generated from large corpora following the distributional hypothesis, i.e. words with similar contexts have akin meanings. As languages are passed on culturally, we assume that the language-specific corpora used to create these spaces are sufficient to convey concepts' cultural specificity into their vector representations. In our study, we focus on multiple recent multilingual pre-trained models to generate word or sentence [2] embeddings (Arora et al., 2017; Grave et al., 2018; Artetxe and Schwenk, 2019; Devlin et al., 2019; Lample and Conneau, 2019), to account for variances in the used corpora or model designs.

Lastly, we combine the semantic representations by retrofitting distributed music genre embeddings to music genre ontologies. Retrofitting (Faruqui et al., 2015) modifies each concept embedding such that the representation is still close to the distributed one, but also encodes ontology information. Initially, we retrofit music genres per language, using monolingual ontologies. Then, by partially aligning these ontologies, we apply retrofitting to learn multilingual embeddings from scratch.

The results show that we can model the cross-lingual music genre annotation with high accuracy by combining both types of language-specific semantic representations. When comparing the representations derived from multilingual pre-trained models, the smooth inverse frequency averaging (Arora et al., 2017) of aligned word embeddings outperforms the state-of-the-art approaches. To our knowledge, this simple method has rarely been used to embed multilingual sentences (Vargas et al., 2019), and we hypothesize its potential as a strong baseline on other cross-lingual datasets and tasks too. Finally, embedding learning based on retrofitting leads to better multilingual music genre representations than when inferred with pre-trained embedding models. This opens the possibility to learn embeddings for rare music genres or languages when aligned music genre ontologies are available.

Summing up, our contributions are: 1) a study on how effective language-specific semantic representations of music genres are for modeling cross-lingual annotation, without relying on a parallel music item corpus; 2) an extensive evaluation of multilingual pre-trained embedding models to derive representations for multi-word concepts in the music domain. Our study can enable complete musicological research, but also localized music information retrieval. This latter application is crucial for online music streaming platforms that leverage music genre annotations to provide worldwide, user-personalized music recommendations.

Our domain-specific study complements other works benchmarking general-language sentence representations (Conneau et al., 2018). Finally, we provide an in-depth formal analysis of the retrofitting part of our method. We prove the strict convexity of retrofitting and show that the ontology concepts' final embeddings converge to the same values despite the order in which concepts are iteratively updated, on condition that we know a single initial node embedding in each connected component of the ontology.

[2] Music genres can be multi-word expressions. Previous work (Shwartz and Dagan, 2019a) successfully embeds phrases with sentence embedding models. In particular, contextualized language models result in more meaningful representations for diverse composition tasks.

2 Related Work

Music genres are conceptual representations encompassing a set of conventions between the music industry, artists, and listeners about individual music styles (Lena, 2012). From a cultural perspective, it has been shown that there are differences in how people listen to music genres (Ferwerda and Schedl, 2016; Skowron et al., 2017). Average listening habits in some countries span across many music genres and are less diverse in other countries (Ferwerda and Schedl, 2016). Also, cultural dimensions proved strong predictors for the popularity of specific music genres (Skowron et al., 2017).

Despite the apparent agreement on the music style for which the music genres stand, conveyed in the earlier definition and implied in the related works too, music genres are subjective concepts (Sordo et al., 2008; Lee et al., 2013). To address this subjectivity, Bogdanov et al. (2019) proposed a dataset of music items annotated with English music genres by different sources. In this line of work, we address the divergent perception of music genres. Still, we focus on multilingual, unsupervised music genre annotation without relying on content features, i.e. audio or lyrics. We also complement similar studies in other domains (art: Eleta and Golbeck, 2012) with another research method.

Then, in the literature, there are other works that benchmark pre-trained word and sentence embedding models. van der Heijden et al. (2019) compare multilingual contextual language models for named-entity recognition and part-of-speech tagging.

Shwartz and Dagan (2019b) use multiple static and contextual word embeddings to represent multi-word expressions and assess their capacity to capture meaning shift and implicit meaning in compositionality. Conneau et al. (2018) formulate a new task to evaluate cross-lingual sentence representations centered on natural language inference.

Compared to these works, our benchmark is aimed at cross-lingual annotation; we target a specific domain, music, for which we try concept embedding adaptation with retrofitting; and we also test a multilingual sentence representation obtained with smooth inverse frequency averaging of multilingual word embeddings. As discussed in a recent survey on cross-lingual word embedding models (Ruder et al., 2019), there is a need to unlock domain-specific data to assess if general-language sentence representations are also accurate across domains. Our work builds towards this goal.

Language   nl      fr      es      cs     ja
en         12604   28252   32891   4772   14752
nl                 7139    7689    1885   3426
fr                         15616   3046   8622
es                                 3245   7644
cs                                        2065

Table 1: Number of music items per language pair.

Language   Corpus (Avg. per item)   Ontology
en         558 (2.12 ± 1.34)        10748
nl         204 (1.71 ± 1.06)        1529
fr         364 (1.75 ± 1.06)        2905
es         525 (2.11 ± 1.34)        3988
cs         133 (2.23 ± 1.34)        1418
ja         192 (1.51 ± 1.11)        1609

Table 2: Number of unique music genres in the corpus (Section 3.2) and in the ontology (Section 4.1).

3 Cross-lingual Music Genre Annotation

Further, we formalize the cross-lingual annotation task and the strategy to evaluate it in Section 3.1. We describe the test corpus used in this work, together with its collection procedure, in Section 3.2.

3.1 Problem Formalization

The cross-lingual music genre annotation consists of inferring, for music items, tags in a target language Lt, knowing tags in a source language Ls. For instance, knowing the English music genres of a music item (among them, alternative rock), the goal is to predict its Spanish tags (among them, rock alternativo). As shown in the example, but also in Section 1, the problem goes beyond translation and instead targets a model able to map concepts, potentially dissimilar, across languages and cultures.

Formally, given S a set of tags in language Ls, P(S) the partitions of S and T a set of tags in language Lt, a mapping scoring function f : P(S) → IR^{|T|} can attribute a prediction score to each target tag, relying on subsets of source tags drawn from S (Hennequin et al., 2018; Epure et al., 2019, 2020). The produced score incorporates the degree of relatedness of each particular input source tag to the target tag. A common approach to compute relatedness in distributional semantics relies on cosine similarity. Thus, for {s_1, ..., s_K} source tags and any target tag t, f can be defined as:

    f_t(\{s_1, s_2, \dots, s_K\}) = (1/K) \sum_{k=1}^{K} s_k^T t / (\|s_k\|_2 \|t\|_2),    (1)

where \|·\|_2 is the Euclidean norm.

3.2 Test Corpus

Wikipedia records worldwide music artists and their discographies, with a frequent mentioning of their music genres. By manually checking the Wikipedia pages of miscellaneous music items, we observed that their music genres vary significantly across languages. For instance, Knights of Cydonia, a single by Muse, was annotated in Spanish as progressive rock, while in Dutch as progressive metal, among other genres. In Figure 1, we show another example of different annotations in English, Spanish, and Japanese from Wikipedia infoboxes. As Wikipedia writing is localized, contributors' culture can lead to differences in the multilingual content on the same topic (Pfeil et al., 2006), particularly for subjective matters. Thus, Wikipedia was a suitable source for assembling the test corpus.

Using DBpedia (Auer et al., 2007) as a proxy to Wikipedia, we collected music items such as artists and albums, annotated with music genres in at least two of the six languages (en, nl, fr, es, cs and ja). We targeted the MusicalWork, MusicalArtist and Band DBpedia resource types, and we only kept music items that were annotated with music genres which appeared at least 15 times in the corpus. Our final corpus includes 63246 music items. The number of annotations for each language pair is presented in Table 1. We also show in Table 2 the number of unique music genres per language in the corpus and the average number of tags per music item.
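For concreteness, the mapping of Equation 1 reduces to a normalized matrix product over pre-computed tag embeddings. The following is a minimal sketch; the function and variable names are ours, not from the paper's released code:

```python
import numpy as np

def score_targets(source_vecs, target_vecs):
    """Equation 1: for K source-tag embeddings (rows of source_vecs,
    shape (K, d)) and |T| target-tag embeddings (rows of target_vecs,
    shape (|T|, d)), return the average cosine similarity between the
    source tags and each target tag, i.e. one score per target tag."""
    s = source_vecs / np.linalg.norm(source_vecs, axis=1, keepdims=True)
    t = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    return (s @ t.T).mean(axis=0)
```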

Figure 1: Wikipedia infoboxes of Puerto Rican artist Milton Cardona, in English (en), Spanish (es) and Japanese (ja). Some music genre annotations are culture-specific, such as World/ワールドミュージック, which is present for en and ja but not for es, or 実験音楽 (Experimental Music) for ja only.

The en and es languages use the most diverse tags. This can be because more annotations exist in these languages, in comparison to cs, which has the least annotations and the least diverse tags. However, the mean number of tags per item appears relatively high for cs, while ja has the smallest mean number of tags per item.

4 Language-specific Semantic Representations for Music Genres

This work aims to assess the possibility of obtaining relevant cross-lingual music genre annotations, able to capture cultural differences too, by relying on language-specific semantic representations. Two types of semantic representations are investigated given their popularity: ontologies to represent music genre relations (presented in Section 4.1) and distributed embeddings to represent multi-word expressions in general (presented in Section 4.2). In contrast to this unsupervised approach, mapping patterns of associating music genres with music items across cultures could have also been enabled with a parallel corpus. However, gathering a corpus that includes all music genres for each pair of languages is challenging.

4.1 Music Genre Ontology

Conceptually, music genres are interconnected entities. For example, rap west coast is a sub-genre of hiphop, or música electrónica is the origin of synthpunk. Academic and practitioner communities often use ontologies or knowledge graphs to represent music genre relations and enrich the music genre definitions (Schreiber, 2016; Lisena et al., 2018). As mentioned in Section 1, we use in this study Wikipedia-based music genre ontologies, because the multilingual Wikipedia contributions on the same topic can differ and these differences have been proven aligned with the ones in the physical world (Pfeil et al., 2006).

We further describe how we crawl the Wikipedia-based music genre ontologies for the six languages by relying on DBpedia. For each language, first, we constitute the seed list using two sources: the DBpedia resources of type MusicGenre and their aliases linked through the wikiPageRedirects relation; and the music genres discovered when collecting the test corpus (introduced in Section 3.2) together with their aliases. Then, music genres are fetched by visiting the DBpedia resources linked to the seeds through the relations wikiPageRedirects, musicSubgenre, stylisticOrigin, musicFusionGenre and derivative [3]. The seed list is updated each time, allowing the crawling to continue until no new resource is found.

[3] We present these relations by their English names, which may be translated in the DBpedia versions in other languages.
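The crawl described above is a breadth-first expansion of the seed list. A compact sketch follows; `fetch_related` is a hypothetical helper standing in for the actual DBpedia lookups (e.g. SPARQL queries), which the paper does not detail:

```python
from collections import deque

def crawl_genre_ontology(seeds, fetch_related):
    """Breadth-first expansion of the seed list: fetch_related(g) must
    return (relation, neighbor) pairs for genre g. Crawling stops when
    no new music genre resource is found."""
    discovered, queue = set(seeds), deque(seeds)
    edges = []
    while queue:
        genre = queue.popleft()
        for relation, neighbor in fetch_related(genre):
            edges.append((genre, relation, neighbor))
            if neighbor not in discovered:
                discovered.add(neighbor)
                queue.append(neighbor)
    return discovered, edges

# Toy illustration with an in-memory graph instead of live DBpedia calls.
toy = {"Funk": [("stylisticOrigin", "Soul"), ("derivative", "Baile_funk")],
       "Soul": [], "Baile_funk": []}
genres, edges = crawl_genre_ontology(["Funk"], lambda g: toy.get(g, []))
```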

In DBpedia, resources are sometimes linked to their equivalents in other languages through the relation sameAs. For most experiments, we rely on monolingual music genre ontologies. However, we also collect the cross-lingual links between music genres to include a translation baseline for cross-lingual annotation, i.e. for each music genre in a source language, we predict its equivalent in a target language using DBpedia. Besides, we try to learn aligned embeddings from scratch by relying on these partially aligned music genre ontologies, as will be discussed in Section 4.3.

The number of unique Wikipedia music genres discovered in each language is presented in Table 2. Let us note that the graph numbers are much larger than the test corpus numbers, emphasizing the challenge to constitute a parallel corpus that covers all language-specific music genres.

4.2 Music Genre Distributed Representations

As music genres are multi-word expressions, we make use of existing sentence representation models. We also inquire into word vector spaces and obtain sentence embeddings by hypothesizing that music genres are generally compositional, i.e. the sense of a multi-word expression is conveyed by the sense of each composing word (e.g. West Coast rap, jazz blues; there are also non-compositional examples like hard rock). We set our investigation scope to multilingual pre-trained embedding models, and we consider both static and contextual word/sentence representations, as described next.

Multilingual Static Word Embeddings. The classical word embeddings we study are the multilingual fastText word vectors trained on Wikipedia and Common Crawl (Grave et al., 2018). The model is an extension of the Continuous Bag of Words model (CBOW, Mikolov et al., 2013), which includes subword and word position information. The fastText word vectors are trained in distinct languages.
Thus, we must ensure that the monolingual word vectors are projected in the same space for cross-lingual annotation. We perform the alignment with the method proposed by Joulin et al. (2018), which treats word translation as a retrieval task and introduces a new loss relying on a relaxed cross-domain similarity local scaling criterion.

Multilingual Contextual Word Embeddings. Contextual word embeddings (Peters et al., 2017; Devlin et al., 2019), in contrast to the classical ones, are dynamically inferred based on the given context sentence. This type of embedding can address polysemy, as the word sense is disambiguated through the surrounding text. In our work, we include two recent contextualized language models compatible with the multilingual scope: multilingual Bidirectional Encoder Representations from Transformers (BERT, Devlin et al., 2019) and the Cross-lingual Language Model (XLM, Lample and Conneau, 2019).

BERT (Devlin et al., 2019) is trained to jointly predict a masked word in a sentence and whether two sentences are successive text segments. Similar to fastText (Grave et al., 2018), subword and word position information is also used. An input sentence is tokenized against a limited token vocabulary with a modified version of the byte pair encoding algorithm (BPE, Sennrich et al., 2016). Multilingual BERT is trained as a single-language model, fed with 104 concatenated monolingual Wikipedias (Pires et al., 2019; Wu and Dredze, 2019).

XLM (Lample and Conneau, 2019) has a similar architecture to BERT. It also shares with BERT one training objective, the masked word prediction, and the tokenization using BPE, but applied to sentences differently sampled from each monolingual Common Crawl corpus. Compared to BERT, two other objectives are introduced: to predict a word from previous words, and to predict a masked word by leveraging two parallel sentences. Thus, to train XLM, several multilingual aligned corpora are used (Lample and Conneau, 2019).

Multilingual Sentence Embeddings. Contextualized language models can be exploited in multiple ways. First, as Lample and Conneau (2019) show, by training the transformers on multilingual data, cross-lingual word vectors are obtained in an unsupervised way. The word vectors can be accessed through the model lookup table. These embeddings are merely aligned but not contextual, thus directly comparable to fastText. For these three types of cross-lingual non-contextual word embeddings, fastText (FT), the multilingual BERT lookup table (mBERT) and the XLM lookup table (XLM), we compute the sentence embedding using the standard average (avg) or the smooth inverse frequency averaging (sif) introduced by Arora et al. (2017).

Formally, let c denote a music genre composed of multiple tokens {t_1, t_2, ..., t_M}, t_m the embedding of each token t_m initialized from a given pre-trained embedding model or to the d-dimensional [4] null vector 0_d if t_m is absent from the model vocabulary, and \hat{q}_i ∈ IR^d the representation of c which we want to infer. The avg strategy computes \hat{q}_i as (1/M) \sum_{m=1}^{M} t_m. The sif strategy computes \hat{q}_i as:

    q_i = (1/M) \sum_{m=1}^{M} (a / (a + f_{t_m})) t_m    (2)

    \hat{q}_i = q_i - u u^T q_i    (3)

where f_{t_m} is the frequency of t_m, a is a hyperparameter usually fixed to 10^{-3} (Arora et al., 2017), and u is the first singular vector obtained through the singular value decomposition (Golub and Reinsch, 1971) of Q, the embedding matrix computed with Equation 2 for all music genres. Vocabulary tokens of pre-trained embedding models are usually sorted by decreasing frequency in the training corpus, i.e. the lower the rank, the more frequent the token. Thus, based on Zipf's law (Zipf, 1949), f_{t_m} can be approximated by 1/z_{t_m}, z_{t_m} being the rank of t_m. The intuition of this simple sentence embedding method is that uncommon words are semantically more informative.

Second, contextualized language models can be used as feature extractors representing sentences from the contextual embeddings of the associated tokens. Multiple strategies exist to retrieve contextual token embeddings: to use the embedding layer or the last hidden layer, or to apply min or max pooling over time (Devlin et al., 2019). To infer a fixed-length representation of a multi-word music genre, we try max and mean pooling over token embeddings (Lample and Conneau, 2019; Reimers and Gurevych, 2019), obtained with the diverse strategies mentioned before. We denote these sentence embeddings XLMCtxt and mBERTCtxt.

The contextualized language models can be further fine-tuned for particular downstream tasks, yielding better sentence representations (Eisenschlos et al., 2019; Lample and Conneau, 2019). Existing evaluations of cross-lingual sentence representations are centered on natural language inference (XNLI, Conneau et al., 2018) or classification (Eisenschlos et al., 2019). The cross-lingual music genre annotation would be closer to the XNLI task; hence we could fine-tune the pre-trained models on a parallel corpus of music genre translations or music genre annotations. However, our research investigates language-specific semantic representations. Also, using translated music genres would not model their different perception across cultures, while obtaining an exhaustive corpus of cross-lingual annotations is challenging.

Last, we explore LASER, a universal language-agnostic sentence embedding model (Artetxe and Schwenk, 2019). The model is based on a BiLSTM encoder trained on corpora in 93 languages to learn multilingual fixed-length sentence embeddings. As in other models, sentences are tokenized against a fixed vocabulary, obtained with BPE from the concatenated multilingual corpora. LASER appears highly effective without requiring task-specific fine-tuning (Artetxe and Schwenk, 2019).

[4] In Appendix B, we report d for each embedding type.

Pair    GTrans       DBpSameAs    mBERTavg     FTsif        XLMCtxt      LASER        RfituΩ FTsif
en-nl   59.9 ± 0.3   72.2 ± 0.2   86.2 ± 0.2   86.5 ± 0.1   85.4 ± 0.3   80.9 ± 0.1   90.0 ± 0.1
en-fr   58.4 ± 0.1   70.0 ± 0.2   85.2 ± 0.3   87.4 ± 0.3   86.6 ± 0.2   82.0 ± 0.2   90.8 ± 0.2
en-es   56.9 ± 0.0   65.4 ± 0.2   83.8 ± 0.1   86.9 ± 0.2   85.4 ± 0.3   81.8 ± 0.1   89.9 ± 0.1
en-cs   60.6 ± 0.6   78.4 ± 0.6   88.2 ± 0.4   88.6 ± 0.4   89.0 ± 0.5   85.4 ± 0.3   90.4 ± 0.3
en-ja   60.9 ± 0.1   70.4 ± 0.2   72.4 ± 0.2   80.8 ± 0.3   74.2 ± 0.3   73.0 ± 0.1   86.7 ± 0.3
nl-en   53.5 ± 0.1   56.7 ± 0.3   74.6 ± 0.3   79.8 ± 0.4   74.2 ± 0.7   70.5 ± 0.1   84.3 ± 0.1
nl-fr   54.4 ± 0.2   60.0 ± 0.4   76.1 ± 0.3   79.3 ± 0.8   74.9 ± 1.0   72.6 ± 0.5   81.5 ± 0.7
nl-es   53.1 ± 0.2   56.8 ± 0.2   75.0 ± 0.4   77.7 ± 0.5   73.5 ± 0.3   71.0 ± 0.3   80.5 ± 0.4
nl-cs   57.8 ± 0.1   50.0 ± 0.0   80.8 ± 0.4   80.6 ± 0.2   79.0 ± 0.4   76.8 ± 0.2   83.4 ± 0.5
nl-ja   57.5 ± 0.4   62.7 ± 0.4   65.8 ± 2.5   74.9 ± 1.0   66.0 ± 0.6   68.5 ± 0.2   80.0 ± 0.7
fr-nl   58.6 ± 0.1   65.3 ± 0.4   79.4 ± 0.1   81.9 ± 0.4   78.6 ± 0.1   74.3 ± 0.4   84.7 ± 0.3
fr-en   55.3 ± 0.0   59.7 ± 0.2   77.2 ± 0.5   83.0 ± 0.2   78.8 ± 0.5   74.2 ± 0.4   87.7 ± 0.1
fr-es   54.1 ± 0.1   59.0 ± 0.1   77.5 ± 0.4   81.8 ± 0.3   78.7 ± 0.5   75.4 ± 0.6   85.3 ± 0.2
fr-cs   59.1 ± 0.3   70.0 ± 0.6   82.7 ± 0.6   83.9 ± 0.4   83.1 ± 0.2   80.2 ± 0.5   87.2 ± 0.3
fr-ja   59.1 ± 0.2   64.7 ± 0.5   61.5 ± 0.2   77.9 ± 0.1   69.5 ± 0.3   70.5 ± 0.3   81.4 ± 0.3
es-nl   59.8 ± 0.3   67.2 ± 0.2   82.3 ± 0.3   82.8 ± 0.9   81.3 ± 0.8   76.7 ± 0.5   85.9 ± 0.6
es-fr   57.4 ± 0.2   64.8 ± 0.3   81.0 ± 0.3   85.0 ± 0.3   82.2 ± 0.5   78.1 ± 0.8   87.5 ± 0.3
es-en   57.0 ± 0.1   61.7 ± 0.0   78.4 ± 0.3   84.7 ± 0.2   79.5 ± 0.2   74.8 ± 0.4   88.8 ± 0.3
es-cs   60.3 ± 0.2   72.2 ± 0.4   85.5 ± 0.5   85.6 ± 0.6   85.9 ± 0.5   82.7 ± 0.5   88.0 ± 0.4
es-ja   60.9 ± 0.1   67.0 ± 0.5   67.7 ± 0.1   78.3 ± 0.6   72.8 ± 0.8   71.2 ± 0.4   83.1 ± 0.6
cs-nl   57.6 ± 0.6   50.0 ± 0.0   78.5 ± 0.7   78.3 ± 0.9   78.1 ± 0.4   73.1 ± 0.4   81.1 ± 1.2
cs-fr   54.2 ± 0.2   60.0 ± 0.3   77.1 ± 1.0   78.5 ± 0.2   79.9 ± 0.7   75.0 ± 0.9   81.4 ± 0.3
cs-es   53.7 ± 0.4   56.9 ± 0.3   75.8 ± 0.3   77.7 ± 0.8   78.8 ± 0.4   73.7 ± 0.3   81.6 ± 0.9
cs-en   54.2 ± 0.2   57.1 ± 0.1   74.2 ± 0.2   78.9 ± 0.1   78.3 ± 0.4   73.4 ± 0.4   84.5 ± 0.4
cs-ja   58.6 ± 0.2   64.0 ± 0.3   65.8 ± 1.4   76.9 ± 0.1   72.2 ± 1.1   72.2 ± 1.1   80.5 ± 0.5
ja-nl   54.8 ± 0.4   61.6 ± 1.1   62.1 ± 0.4   72.8 ± 1.0   63.6 ± 0.9   68.3 ± 1.2   76.9 ± 0.3
ja-fr   53.3 ± 0.2   58.4 ± 0.2   50.6 ± 0.2   73.7 ± 0.6   58.4 ± 0.4   66.7 ± 0.3   77.8 ± 0.1
ja-es   52.7 ± 0.1   55.9 ± 0.4   56.1 ± 0.4   73.9 ± 0.4   60.0 ± 0.2   67.4 ± 0.4   78.8 ± 0.5
ja-cs   56.1 ± 0.5   65.7 ± 0.7   64.7 ± 0.6   77.5 ± 0.2   64.5 ± 0.5   73.3 ± 0.6   80.7 ± 0.4
ja-en   52.5 ± 0.1   55.8 ± 0.1   49.0 ± 0.1   75.6 ± 0.3   56.8 ± 1.0   64.2 ± 0.4   81.6 ± 0.8

Table 3: Macro-AUC scores (in %, best overall in bold, best locally underlined). The first part corresponds to the translation baselines; the second to the best distributed representations; the last to the retrofitted FTsif vectors.

4.3 Retrofitting Music Genre Distributed Representations to Ontologies

Retrofitting (Faruqui et al., 2015) is a method to refine vector space word representations by considering the relations between words as defined in semantic lexicons such as WordNet (Miller, 1995). The intuition is to modify the distributed embeddings to become closer to the representations of the concepts to which they are related. Ever since the original work, many uses of retrofitting have been explored to semantically specialize word embeddings in relations such as synonyms or antonyms (Kiela et al., 2015; Kim et al., 2016), in other languages than a source one (Ponti et al., 2019) or in specific domains (Hangya et al., 2018).

Enhanced extensions of retrofitting exist, but they require supervision (Lengerich et al., 2018). The original method (Faruqui et al., 2015) is unsupervised and can simply yet effectively leverage distributed embeddings and ontologies for improved representations. Thus, we mainly rely on it, but we apply some changes, as further described. Let Ω = (C, E) be an ontology including the concepts C and the semantic relations between these concepts, E ⊆ C × C. The retrofitting goal is to learn new concept embeddings Q ∈ IR^{n×d}, with n = |C| and d the embedding dimension. The learning starts by initializing each q_i ∈ IR^d, the new embedding for concept i ∈ C, to \hat{q}_i, the initial distributed embedding, and then iteratively updates q_i until convergence as follows:

    q_i ← (\sum_{j:(i,j) ∈ E} (β_{ij} + β_{ji}) q_j + α_i \hat{q}_i) / (\sum_{j:(i,j) ∈ E} (β_{ij} + β_{ji}) + α_i)    (4)

The α and β values are positive scalars weighting the importance of the initial, respectively the related, concept embeddings in the computation. The formula was reached through the optimization of the retrofitting objective using the Jacobi method (Saad, 2003). Equation 4 is a corrected version of the original work: for a concept i, not only β_{ij} appears in it, but also β_{ji}. That is to say that, when computing the partial derivative of the retrofitting objective with respect to concept i, two non-zero terms correspond to the related concept j: when i is the source and j is the target, and vice versa (Bengio et al., 2006; Saha et al., 2016).

The further modifications that we make regard the parameters α and β. For each i ∈ C, Faruqui et al. (2015) fix α_i to 1, and β_{ij} to 1/degree(i) for (i, j) ∈ E or 0 otherwise; degree(i) is the number of related concepts i has in Ω.

While many embedding models can handle unknown words nowadays, concepts may still have unknown initial distributed vectors, depending on the model's choice. For this case, expanded retrofitting (Speer and Chin, 2016) has been proposed, considering α_i = 0 for each concept i with an unknown initial distributed vector, and α_i = 1 for the rest. Thus, q_i is initialized to 0_d and updated by averaging the embeddings of its related concepts at each iteration. Let us notice that, through retrofitting, representations are not only modified but also learned from scratch for some concepts. We adopt the same approach to initialize α.

Moreover, we also adjust the parameters β to weight the importance of each related concept embedding depending on the relation semantics in our music genre ontology (Epure et al., 2020). Specifically, we distinguish between equivalence and relatedness as follows:

    β_{ij} = 1               if (i, j) ∈ Ē ⊂ E
    β_{ij} = 1/degree(i)     if (i, j) ∈ E − Ē
    β_{ij} = 0               if (i, j) ∉ E

where Ē contains the equivalence relation types (wikiPageRedirects, sameAs), and E − Ē contains the relatedness relation types (stylisticOrigin, musicSubgenre, derivative, musicFusionGenre). We label this modified version of retrofitting Rfit.

Finally, we want to highlight a crucial aspect of retrofitting. Previous works (Speer and Chin, 2016; Hayes, 2019; Fang et al., 2019) claim that, while the retrofitting updating procedure converges, the results depend on the order in which the updates are made. We prove in Appendix A that the retrofitting objective is strictly convex when at least one initial concept vector is known in each connected component. Hence, with this condition satisfied, retrofitting always converges to the same solution, independently of the updates' order.
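A direct implementation of the update in Equation 4, with the expanded-retrofitting initialization of α and the relation-dependent β above, can be sketched as follows. This is a Jacobi-style loop with a fixed iteration budget; the names and the input format are our own simplification, and we assume the β weights are pre-symmetrized:

```python
import numpy as np

def rfit(q_hat, neighbors, alpha, n_iters=100):
    """Equation 4. q_hat: (n, d) initial embeddings (rows of zeros for
    unknown concepts); alpha: (n,) with alpha_i = 1 if concept i has a
    known initial vector, else 0; neighbors: dict mapping i to a list
    of (j, w) pairs, where w = beta_ij + beta_ji already encodes the
    equivalence/relatedness weighting."""
    q = q_hat.copy()
    for _ in range(n_iters):
        q_new = q.copy()
        for i, nbrs in neighbors.items():
            num = alpha[i] * q_hat[i]
            den = alpha[i]
            for j, w in nbrs:
                num = num + w * q[j]
                den += w
            if den > 0:
                q_new[i] = num / den
        q = q_new  # Jacobi: every concept is updated from the previous iterate
    return q
```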

5 Experiments

Cross-lingual music genre annotation, as formalized in Section 3, is a typical multi-label prediction task. For evaluation, we use the Area Under the receiver operating characteristic Curve (AUC, Bradley, 1997), macro-averaged. We report the means and standard deviations of the macro-AUC scores using 3-fold cross-validation. For each language, we apply an iterative split (Sechidis et al., 2011) of the test corpus that balances the number of samples and the tag distributions across the folds. We pre-process the music genres by either replacing special characters with a space ( -/,) or removing them (()':.!$). For Japanese, we introduce spaces between tokens obtained with Mecab [5]. Embeddings are then computed from the pre-processed tags.

We test two translation baselines, one based on Google Translate [6] (GTrans) and one on the DBpedia sameAs relation (DBpSameAs). In this case, a source music genre is mapped on a single or no target music genre, its embedding being in the form {0,1}^{|T|}. For XLMCtxt, we compute the sentence embedding by averaging the token embeddings obtained with mean pooling across all layers. For mBERTCtxt, we apply the same strategy, but by max pooling the token embeddings instead. We chose these representations as they showed the best performance experimentally, compared to the other strategies described in Section 4.2.

When retrofitting language-specific music genre embeddings, we use the corresponding monolingual ontology (RfituΩ). When we learn multilingual embeddings from scratch with retrofitting, by knowing only music genre embeddings in one language (la), we use the partially aligned DBpedia ontologies which contain the sameAs relations (Rfit^la_aΩ). For this case, we also propose a baseline representing a source concept embedding as a vector of geodesic distances in the partially aligned ontologies to each target concept (DBpaΩNNDist).

Results. Table 3 shows the cross-lingual annotation results. The standard translation, GTrans, leads to the lowest results, being over-performed by a knowledge-based translation more adapted to this domain (DBpSameAs). Also, these results show that translation methods fail to capture the dissimilar cross-cultural music genre perception.

The second part of Table 3 contains only the best [7] music genre embeddings computed with each word/sentence pre-trained model or method. When averaging static multilingual word embeddings, those from mBERT often yield the most relevant cross-lingual annotations, while when applying the sif averaging, the aligned FT word vectors are the best choice. Between the two contextual word embedding models, XLMCtxt significantly outperforms mBERTCtxt, thus we report only the former. We can notice that all distributed representations of music genres can model quite well the varying music genre annotation across languages. FTsif results in the most relevant cross-lingual annotations consistently for 5 out of 6 languages as a source. For cs though, the embeddings from XLMCtxt are sometimes slightly better. LASER under-performs for most languages but ja, for which the vectors obtained with mBERTavg are less suitable.

The last column of Table 3 shows the results of cross-lingual annotation when using the FTsif vectors retrofitted to monolingual music genre ontologies. The domain adaptation of concept embeddings inferred with general-language pre-trained models significantly improves music genre annotation modeling across all pairs of languages.

Pair    DBpaΩNNDist   Rfit^en_aΩ FTsif
en-nl   83.7 ± 0.1    90.7 ± 0.0
en-fr   82.7 ± 0.3    91.7 ± 0.1
en-es   81.1 ± 0.3    91.4 ± 0.2
en-cs   86.6 ± 0.3    91.6 ± 0.4
en-ja   81.3 ± 0.1    89.1 ± 0.2

Pair    DBpaΩNNDist   Rfit^ja_aΩ FTsif
ja-nl   68.5 ± 0.7    75.7 ± 0.3
ja-fr   71.9 ± 0.1    76.7 ± 0.5
ja-es   68.9 ± 0.4    76.1 ± 0.5
ja-cs   77.9 ± 0.8    82.2 ± 0.8
ja-en   70.4 ± 0.4    82.3 ± 0.5

Table 4: Macro-AUC scores (in %; those larger than RfituΩ FTsif in Table 3 in bold) with vectors learned by retrofitting to aligned monolingual ontologies.

Table 4 shows the results when using retrofitting to learn music genre embeddings from scratch. Here, distributed vectors are known for one language (en, respectively ja [8]) and the monolingual ontologies are partially aligned. Even though not necessarily all music genres are linked to their equivalents in the other language, the concept representations learned in this way are more relevant for cross-lingual annotation, for all pairs involving en as the source and for ja-cs and ja-en.

5https://taku910.github.io/mecab/ 7The complete results are presented in AppendixB. 6https://translate.google.com 8The results for the other languages are in AppendixB.

4772 baseline (DBpaΩNNDist) reveals that the aligned ways better than the results from L2 to L1. This ontologies stand-alone can model the cross-lingual observation explains two trends in Table3. First, annotation quite well, in particular for en. the scores achieved for en or es as the source, the languages with the largest number of music genres, Discussion. The results show that using transla- are the best. Second, the results for the same pair tion to produce cross-lingual annotations is limited of languages could vary a lot, depending on the as it does not consider the culturally divergent per- role each language plays, source, or target. ception of music genres. Instead, monolingual se- One possible explanation is that the prediction mantic representations can model this phenomenon from languages with fewer music genre tags such as rather well. For instance, from Milton Cardona’s L2 towards languages with more music genre tags music genres in es, salsa and jazz, it correctly pre- such as L1 is more challenging because the target dicts the Japanese equivalent of fusion (フュー language contains more specific or rare annotations. ジョン) in ja. Yet, while a thorough qualitative For instance, when checking the results per tag analysis requires more work, preliminary explo- from cs to en we observed that among the tags with ration suggests that larger gaps in perception might the lowest scores, we found , zeuhl, or still be inadequately modeled. For instance, for . However, other common music genres, Santana’s album Welcome tagged with jazz in es, it such as or hard rock, were also poorly does not predict pop in fr. predicted, showing that other causes exist too. Is When comparing the distributed embeddings, a the unbalanced number of music genres used in simple method that relies on a weighted average annotations a cultural consequence? Related work of multilingual aligned word vectors significantly (Ferwerda and Schedl, 2016) seems to support this outperforms the others. Although rarely used be- hypothesis. Then could we design a better mapping fore, we question if we can notice such high per- function that leverages the unbalanced numbers of formance with other multilingual data-sets. The music genres in cross-cultural annotations? We will cross-lingual annotations are further improved by dedicate a thorough investigation of these questions retrofitting the distributed embeddings to monolin- as future work. gual ontologies. Interestingly, the vector alignment does not appear degraded by retrofitting to disjoint 6 Conclusion graphs. Or, the negative impact is limited and ex- ceeded by introducing domain knowledge in rep- We have presented an extensive investigation on resentations. Further, as shown in Table4, joining cross-lingual modeling of music genre annotation, semantic representations in this way proves very focused on six languages, and two common ap- suitable to learn music genre vectors from scratch. proaches to semantically represent concepts: on- Regarding the scores per language, we obtained tologies and distributed embeddings9. the lowest ones for ja as the source. We could ex- Our work provides a methodological framework plain this by either a more challenging test corpus to study the annotation behavior across language- or still incompatible embeddings in ja, possibly bound cultures in other domains too. 
Hence, the because of the quality of the individual embedding effectiveness of language-specific concept repre- models for this language and the completeness of sentations to model the culturally diverse percep- the Japanese music genre ontology. Also, we did tion could be further probed. Then, we combined not notice any particular improvement for pairs of the semantic representations only with retrofitting. languages from the same language family, e.g. fr However, inspired by paraphrastic sentence em- and es. However, we would need a sufficiently size- bedding learning, one can also consider the music able parallel corpus exhaustively annotated in all genre relations as paraphrasing forms with different languages to reliably compare the performance for strengths (Wieting et al., 2016). Finally, the mod- pairs of languages from the same language family els to generate cross-lingual annotations should or different ones. be thoroughly evaluated in downstream music re- Finally, by closely analysing the results in Table trieval and recommendation tasks. 3, we noticed that given two languages L1 and L2, with more music genre embeddings in L1 than in L2 (from both ontology and corpus), the results 9https://github.com/deezer/ of mapping annotations from L1 to L1 seems al- CrossCulturalMusicGenrePerception

References

Javier Franco Aixelá. 1996. Culture-specific items in translation. Translation, Power, Subversion, 8:52–78.

Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations.

Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7:597–610.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735, Berlin, Heidelberg. Springer.

Parsiad Azimzadeh and Peter A. Forsyth. 2016. Weakly chained matrices, policy iteration, and impulse control. SIAM Journal on Numerical Analysis, 54(3):1341–1364.

Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. 2006. Label Propagation and Quadratic Criterion, semi-supervised learning edition, pages 193–216. MIT Press.

Dmitry Bogdanov, Alastair Porter, Hendrik Schreiber, Julián Urbano, and Sergio Oramas. 2019. The AcousticBrainz genre dataset: Multi-source, multi-level, multi-label, and large-scale. In Proceedings of the 20th Conference of the International Society for Music Information Retrieval (ISMIR 2019), pages 360–367, Delft, The Netherlands. International Society for Music Information Retrieval (ISMIR).

Andrew P. Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159.

Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.

Jon Dattorro. 2005. Convex Optimization & Euclidean Distance Geometry. Meboo Publishing.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kadras, Sylvain Gugger, and Jeremy Howard. 2019. MultiFiT: Efficient multi-lingual language model fine-tuning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5702–5707, Hong Kong, China. Association for Computational Linguistics.

Irene Eleta and Jennifer Golbeck. 2012. A study of multilingual social tagging of art images: Cultural bridges and diversity. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW '12, pages 695–704, New York, NY, USA. Association for Computing Machinery.

Elena V. Epure, Anis Khlif, and Romain Hennequin. 2019. Leveraging knowledge bases and parallel annotations for music genre translation. In Conference of the International Society of Music Information Retrieval, ISMIR 2019.

Elena V. Epure, Guillaume Salha, and Romain Hennequin. 2020. Multilingual music genre embeddings for effective cross-lingual music item annotation. In Conference of the International Society of Music Information Retrieval, ISMIR 2020.

Lanting Fang, Yong Luo, Kaiyu Feng, Kaiqi Zhao, and Aiqun Hu. 2019. Knowledge-enhanced ensemble learning for word embeddings. In The World Wide Web Conference, WWW '19, pages 427–437, New York, NY, USA. Association for Computing Machinery.

Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1606–1615, Denver, Colorado. Association for Computational Linguistics.

Bruce Ferwerda and Markus Schedl. 2016. Investigating the relationship between diversity in music consumption behavior and cultural dimensions: A cross-country analysis. In 24th Conference on User Modeling, Adaptation, and Personalization (UMAP) Extended Proceedings: 1st Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems (SOAP), Halifax, NS, Canada.

Gene H. Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. In Handbook for Automatic Computation: Volume II: Linear Algebra, pages 134–151. Springer Berlin Heidelberg, Berlin, Heidelberg.

Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Viktor Hangya, Fabienne Braune, Alexander Fraser, and Hinrich Schütze. 2018. Two methods for domain adaptation of bilingual tasks: Delightfully simple and broadly applicable. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 810–820, Melbourne, Australia. Association for Computational Linguistics.

Dmetri Hayes. 2019. What just happened? Evaluating retrofitted distributional word vectors. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1062–1072, Minneapolis, Minnesota. Association for Computational Linguistics.

Niels van der Heijden, Samira Abnar, and Ekaterina Shutova. 2019. A comparison of architectures and pretraining methods for contextualized multilingual word embeddings. Computing Research Repository, arXiv:1912.10169.

Romain Hennequin, Jimena Royo-Letelier, and Manuel Moussallam. 2018. Audio based disambiguation of music genre tags. In Conference of the International Society of Music Information Retrieval, ISMIR 2018.

Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2979–2984, Brussels, Belgium. Association for Computational Linguistics.

Douwe Kiela, Felix Hill, and Stephen Clark. 2015. Specializing word embeddings for similarity or relatedness. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2044–2048, Lisbon, Portugal. Association for Computational Linguistics.

Joo-Kyung Kim, Marie-Catherine de Marneffe, and Eric Fosler-Lussier. 2016. Adjusting word embeddings with semantic intensity orders. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 62–69, Berlin, Germany. Association for Computational Linguistics.

Claire Kramsch and H. G. Widdowson. 1998. Language and Culture. Oxford University Press.

Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Jin Ha Lee, Kahyun Choi, Xiao Hu, and J. Stephen Downie. 2013. K-pop genres: A cross-cultural exploration. In Conference of the International Society on Music Information Retrieval, ISMIR 2013.

Jennifer C. Lena. 2012. Banding Together: How Communities Create Genres in Popular Music. Princeton University Press.

Ben Lengerich, Andrew Maas, and Christopher Potts. 2018. Retrofitting distributional embeddings to knowledge graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2423–2436, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Pasquale Lisena, Konstantin Todorov, Cécile Cecconi, Françoise Leresche, Isabelle Canno, Frédéric Puyrenier, Martine Voisin, Thierry Le Meur, and Raphaël Troncy. 2018. Controlled vocabularies for music metadata. In Conference of the International Society on Music Information Retrieval, Paris, France.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13, pages 3111–3119, Red Hook, NY, USA. Curran Associates Inc.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.

Peter Newmark. 1988. A Textbook of Translation, volume 66. Prentice Hall, New York.

Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. Semi-supervised sequence tagging with bidirectional language models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1765, Vancouver, Canada. Association for Computational Linguistics.

Ulrike Pfeil, Panayiotis Zaphiris, and Chee Siang Ang. 2006. Cultural differences in collaborative authoring of Wikipedia. Journal of Computer-Mediated Communication, 12(1):88–113.

Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? Computing Research Repository, arXiv:1906.01502.

Edoardo Maria Ponti, Ivan Vulić, Goran Glavaš, Roi Reichart, and Anna Korhonen. 2019. Cross-lingual semantic specialization via lexical relation induction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2206–2217, Hong Kong, China. Association for Computational Linguistics.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2019. A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, 65(1):569–630.

Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems, 2nd edition. Society for Industrial and Applied Mathematics, USA.

Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, and Mohammad Al Hasan. 2016. Dis-S2V: Discourse informed sen2vec. Computing Research Repository, arXiv:1610.08078.

Hendrik Schreiber. 2016. Genre ontology learning: Comparing curated with crowd-sourced ontologies. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 400–406, New York, USA.

Konstantinos Sechidis, Grigorios Tsoumakas, and Ioannis Vlahavas. 2011. On the stratification of multi-label data. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III, ECML PKDD'11, pages 145–158, Berlin, Heidelberg. Springer-Verlag.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Vered Shwartz and Ido Dagan. 2019a. Still a pain in the neck: Evaluating text representations on lexical composition. Transactions of the Association for Computational Linguistics, 7:403–419.

Vered Shwartz and Ido Dagan. 2019b. Still a pain in the neck: Evaluating text representations on lexical composition. Transactions of the Association for Computational Linguistics, 7:403–419.

Marcin Skowron, Florian Lemmerich, Bruce Ferwerda, and Markus Schedl. 2017. Predicting genre preferences from cultural and socio-economic factors for music retrieval. In Advances in Information Retrieval, pages 561–567, Cham. Springer International Publishing.

Mohamed Sordo, Oscar Celma, Martin Blech, and Enric Guaus. 2008. The quest for musical genres: Do the experts and the wisdom of crowds agree? In Conference of the International Society on Music Information Retrieval.

Robyn Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. Computing Research Repository, arXiv:1604.01692.

Francisco Vargas, Kamen Brestnichki, Alex Papadopoulos Korfiatis, and Nils Hammerla. 2019. Multilingual factor analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1738–1750, Florence, Italy. Association for Computational Linguistics.

John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2016. Towards universal paraphrastic sentence embeddings. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.

Shijie Wu and Mark Dredze. 2019. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833–844, Hong Kong, China. Association for Computational Linguistics.

George Kingsley Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge.

A Strict Convexity of Retrofitting

Theorem. Let V be a finite vocabulary with |V| = n. Let Ω = (V, E) be an ontology represented as a directed graph which encodes semantic relationships between vocabulary words. Further, let V̂ ⊆ V be the subset of words which have non-zero initial distributed representations \hat{q}_i. The goal of retrofitting is to learn the matrix Q ∈ IR^{n×d}, stacking up the new embeddings q_i ∈ IR^d for each i ∈ V. The objective function to be minimized is:

    Φ(Q) = \sum_{i ∈ V̂} α_i \|q_i − \hat{q}_i\|_2^2 + \sum_{i=1}^{n} \sum_{(i,j) ∈ E} β_{ij} \|q_i − q_j\|_2^2,

where the α_i and β_{ij} are positive scalars. Assuming that each connected component of Ω includes at least one word from V̂, the objective function Φ is strictly convex w.r.t. Q.

Proof. First of all, let Q̂ denote the n × d matrix whose i-th row corresponds to \hat{q}_i if i ∈ V̂, and to the d-dimensional null vector 0_d otherwise. Let A denote the n × n diagonal matrix verifying A_{ii} = α_i if i ∈ V̂ and A_{ii} = 0 otherwise. Let B denote the n × n symmetric matrix such that, for all i, j ∈ {1, ..., n} with i ≠ j, B_{ij} = B_{ji} = −(1/2)(β_{ij} + β_{ji}) and B_{ii} = \sum_{j=1, j≠i}^{n} |B_{ij}|. With these notations, and with Tr(·) the trace operator for square matrices, we have:

    \sum_{i ∈ V̂} α_i \|q_i − \hat{q}_i\|_2^2 = Tr((Q − Q̂)^T A (Q − Q̂)) = Tr(Q^T A Q − Q̂^T A Q − Q^T A Q̂ + Q̂^T A Q̂).

Also:

    \sum_{i=1}^{n} \sum_{(i,j) ∈ E} β_{ij} \|q_i − q_j\|_2^2 = Tr(Q^T B Q).

Therefore, as the trace is a linear mapping, we have:

    Φ(Q) = Tr(Q^T (A + B) Q) + Tr(Q̂^T A Q̂ − Q̂^T A Q − Q^T A Q̂).

Then, we note that A + B is a weakly diagonally dominant (WDD) matrix as, by construction, ∀i ∈ {1, ..., n}, |(A + B)_{ii}| ≥ \sum_{j≠i} |(A + B)_{ij}|. Also, for all i ∈ V̂, the inequality is strict, as |(A + B)_{ii}| = α_i + \sum_{j≠i} |B_{ij}| > \sum_{j≠i} |(A + B)_{ij}| = \sum_{j≠i} |B_{ij}|, which means that, for all i ∈ V̂, row i of A + B is strictly diagonally dominant (SSD). Assuming that each connected component of the graph includes at least one node from V̂, we conclude that A + B is a weakly chained diagonally dominant matrix (Azimzadeh and Forsyth, 2016), i.e. that:

• A + B is WDD;
• for each i ∈ V such that row i is not SSD, there exists a walk in the graph whose adjacency matrix is A + B (two nodes i and j are connected if (A + B)_{ij} = (A + B)_{ji} ≠ 0), starting from i and ending at a node associated with an SSD row.

Such matrices are nonsingular (Azimzadeh and Forsyth, 2016), which implies that Q ↦ Tr(Q^T (A + B) Q) is a positive-definite quadratic form. As A + B is a symmetric positive-definite matrix, there exists a matrix M such that A + B = M^T M. Therefore, denoting \|·\|_F^2 the squared Frobenius matrix norm:

    Tr(Q^T (A + B) Q) = Tr(Q^T M^T M Q) = \|M Q\|_F^2,

which is strictly convex w.r.t. Q due to the strict convexity of the squared Frobenius norm (see e.g. 3.1 in Dattorro (2005)). Since the sum of strictly convex functions of Q (first trace in Φ(Q)) and linear functions of Q (second trace in Φ(Q)) is still strictly convex w.r.t. Q, we conclude that the objective function Φ is strictly convex w.r.t. Q.

Corollary. The retrofitting update procedure is insensitive to the order in which nodes are updated.

The aforementioned updating procedure for Q (Faruqui et al., 2015) is derived from the Jacobi iteration procedure (Saad, 2003; Bengio et al., 2006) and converges for any initialization. Such a convergence result is discussed in Bengio et al. (2006). It can also be directly verified in our specific setting by checking that each irreducible element of A + B, i.e. each connected component of the underlying graph constructed from this matrix, is irreducibly diagonally dominant (see 4.2.3 in Saad (2003)), and then by applying Theorem 4.9 from Saad (2003) on each of these components. Besides, due to its strict convexity w.r.t. Q, the objective function Φ admits a unique global minimum. Consequently, the retrofitting update procedure will converge to the same embedding matrix regardless of the order in which nodes are updated.
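To illustrate the corollary numerically, the following toy check (ours, not from the paper) runs sequential updates in two different node orders on a chain graph where only one node has a known initial vector, and verifies that both reach the same fixed point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
q_hat = np.zeros((n, d))
q_hat[0] = rng.normal(size=d)             # only node 0 has a known vector
alpha = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # one connected component

def neighbors(i):
    return [b if a == i else a for a, b in edges if i in (a, b)]

def solve(order, n_epochs=5000):
    # In-place (sequential) updates with beta_ij + beta_ji = 1.
    q = q_hat.copy()
    for _ in range(n_epochs):
        for i in order:
            nbrs = neighbors(i)
            q[i] = (alpha[i] * q_hat[i] + q[nbrs].sum(axis=0)) / (alpha[i] + len(nbrs))
    return q

q_fwd = solve(list(range(n)))
q_bwd = solve(list(reversed(range(n))))
assert np.allclose(q_fwd, q_bwd, atol=1e-8)  # same solution, either order
```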

B Extended Results

The following Tables 5 and 6 provide more complete results from our experiments.

Pair    FTavg        XLMavg       mBERTavg     FTsif        XLMsif       mBERTsif     XLMctxt      mBERTctxt
en-nl   75.2 ± 0.2   83.7 ± 0.1   86.2 ± 0.2   86.5 ± 0.1   86.6 ± 0.2   88.3 ± 0.2   85.4 ± 0.3   85.1 ± 0.2
en-fr   78.6 ± 0.2   84.4 ± 0.2   85.2 ± 0.3   87.4 ± 0.3   87.7 ± 0.3   87.6 ± 0.2   86.6 ± 0.2   84.9 ± 0.3
en-es   77.5 ± 0.1   82.5 ± 0.1   83.8 ± 0.1   86.9 ± 0.2   86.7 ± 0.2   86.2 ± 0.2   85.4 ± 0.3   84.7 ± 0.2
en-cs   74.5 ± 0.6   87.2 ± 0.6   88.2 ± 0.4   88.6 ± 0.4   90.6 ± 0.5   90.6 ± 0.3   89.0 ± 0.5   87.2 ± 0.1
en-ja   69.6 ± 0.5   70.9 ± 0.1   72.4 ± 0.2   80.8 ± 0.3   76.1 ± 0.3   70.5 ± 0.3   74.2 ± 0.3   68.9 ± 0.2
nl-en   73.8 ± 0.5   71.9 ± 0.4   74.6 ± 0.3   79.8 ± 0.4   77.7 ± 0.4   78.2 ± 0.3   74.2 ± 0.7   73.5 ± 0.5
nl-fr   63.9 ± 0.5   72.7 ± 0.9   76.1 ± 0.3   79.3 ± 0.8   78.4 ± 1.0   79.5 ± 0.3   74.9 ± 1.0   75.1 ± 0.7
nl-es   63.7 ± 0.4   71.6 ± 0.6   75.0 ± 0.4   77.7 ± 0.5   76.8 ± 0.3   77.6 ± 0.6   73.5 ± 0.3   75.0 ± 0.3
nl-cs   65.1 ± 0.3   77.4 ± 0.4   80.8 ± 0.4   80.6 ± 0.2   82.0 ± 0.4   83.2 ± 0.4   79.0 ± 0.4   79.6 ± 0.2
nl-ja   64.8 ± 0.2   65.2 ± 0.8   65.8 ± 2.5   74.9 ± 1.0   70.1 ± 0.4   67.6 ± 0.2   66.0 ± 0.6   65.0 ± 0.2
fr-nl   67.7 ± 1.0   77.5 ± 0.4   79.4 ± 0.1   81.9 ± 0.4   81.6 ± 0.3   82.0 ± 0.4   78.6 ± 0.1   77.7 ± 0.2
fr-en   76.2 ± 0.2   74.7 ± 0.6   77.2 ± 0.5   83.0 ± 0.2   81.5 ± 0.3   80.8 ± 0.1   78.8 ± 0.5   77.2 ± 0.5
fr-es   71.0 ± 0.2   75.6 ± 0.3   77.5 ± 0.4   81.8 ± 0.3   81.0 ± 0.5   80.0 ± 0.4   78.7 ± 0.5   78.2 ± 0.3
fr-cs   70.4 ± 0.8   80.1 ± 0.4   82.7 ± 0.6   83.9 ± 0.4   85.3 ± 0.3   85.5 ± 0.5   83.1 ± 0.2   82.0 ± 0.4
fr-ja   71.1 ± 0.3   67.4 ± 0.5   61.5 ± 0.2   77.9 ± 0.1   73.1 ± 0.1   66.9 ± 0.2   69.5 ± 0.3   64.8 ± 0.5
es-nl   68.5 ± 0.5   80.3 ± 0.6   82.3 ± 0.3   82.8 ± 0.9   83.8 ± 0.7   84.8 ± 0.2   81.3 ± 0.8   80.7 ± 0.4
es-fr   70.8 ± 0.4   79.9 ± 0.5   81.0 ± 0.3   85.0 ± 0.3   84.6 ± 0.4   84.5 ± 0.4   82.2 ± 0.5   80.3 ± 0.4
es-en   75.3 ± 0.1   76.4 ± 0.2   78.4 ± 0.3   84.7 ± 0.2   83.2 ± 0.4   82.9 ± 0.3   79.5 ± 0.2   78.1 ± 0.4
es-cs   68.9 ± 0.8   83.2 ± 0.7   85.5 ± 0.5   85.6 ± 0.6   88.2 ± 0.7   88.6 ± 0.6   85.9 ± 0.5   84.2 ± 0.4
es-ja   65.2 ± 0.4   70.4 ± 0.2   67.7 ± 0.1   78.3 ± 0.6   74.4 ± 0.2   68.6 ± 0.5   72.8 ± 0.8   65.1 ± 0.6
cs-nl   68.5 ± 1.0   75.7 ± 0.9   78.5 ± 0.7   78.3 ± 0.9   80.7 ± 0.9   80.5 ± 1.0   78.1 ± 0.4   76.8 ± 0.3
cs-fr   64.5 ± 0.9   74.7 ± 0.9   77.1 ± 1.0   78.5 ± 0.2   80.7 ± 0.3   80.7 ± 0.9   79.9 ± 0.7   76.2 ± 1.2
cs-es   65.3 ± 0.9   73.8 ± 0.7   75.8 ± 0.3   77.7 ± 0.8   79.8 ± 0.2   78.7 ± 0.3   78.8 ± 0.4   75.9 ± 0.8
cs-en   70.3 ± 0.5   70.9 ± 0.0   74.2 ± 0.2   78.9 ± 0.1   78.9 ± 0.5   79.2 ± 0.5   78.3 ± 0.4   74.0 ± 0.3
cs-ja   67.1 ± 1.1   70.8 ± 0.2   65.8 ± 1.4   76.9 ± 0.1   72.7 ± 0.3   67.3 ± 0.7   72.2 ± 1.1   68.3 ± 0.1
ja-nl   62.0 ± 0.5   59.5 ± 1.0   62.1 ± 0.4   72.8 ± 1.0   68.0 ± 0.6   67.1 ± 0.2   63.6 ± 0.9   52.4 ± 0.3
ja-fr   66.0 ± 1.1   47.6 ± 0.2   50.6 ± 0.2   73.7 ± 0.6   69.4 ± 0.7   65.6 ± 0.3   58.4 ± 0.4   43.9 ± 0.7
ja-es   63.2 ± 0.3   53.6 ± 0.3   56.1 ± 0.4   73.9 ± 0.4   67.5 ± 1.0   63.6 ± 0.6   60.0 ± 0.2   48.7 ± 0.6
ja-cs   61.7 ± 1.3   64.8 ± 0.8   64.7 ± 0.6   77.5 ± 0.2   73.3 ± 0.6   69.2 ± 0.4   64.5 ± 0.5   56.8 ± 0.9
ja-en   72.1 ± 1.0   46.3 ± 0.4   49.0 ± 0.1   75.6 ± 0.3   66.8 ± 0.6   64.1 ± 0.7   56.8 ± 1.0   44.0 ± 0.2

Table 5: Macro-AUC scores (in %, best locally underlined). The first two parts correspond to averaging or applying sif averaging to static multilingual word embeddings; the third part corresponds to the contextual sentence embeddings.


Table 6: Macro-AUC scores (in %) with vectors learned by leveraging aligned monolingual ontologies. The first column shows the results by relying on the aligned ontologies only. The second and third columns show the results obtained by retrofitting FTsif embed- dings to monolingual ontologies, respectively, aligned monolingual ontologies.

Embedding Model Dimension FT 300 mBERT 768 XLM 2048 LASER 1024

Table 7: Embedding dimensions for pre-trained models used in our study, corresponding to the values provided and optimized by each model’s authors.
