MULTI-LABEL MUSIC GENRE CLASSIFICATION FROM AUDIO, TEXT, AND IMAGES USING DEEP FEATURES

Sergio Oramas¹, Oriol Nieto², Francesco Barbieri³, Xavier Serra¹

¹Music Technology Group, Universitat Pompeu Fabra
²Pandora Media Inc.
³TALN Group, Universitat Pompeu Fabra

fsergio.oramas, francesco.barbieri,
[email protected],
[email protected]

ABSTRACT

Music genres allow us to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research has traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks.

exclusive (i.e., a song could be Pop, and at the same time have elements from Deep House and a Reggae groove). In this work we aim to advance the field of music classification by framing it as multi-label genre classification of fine-grained genres.

To this end, we present MuMu, a new large-scale multimodal dataset for multi-label music genre classification. MuMu contains information on roughly 31k albums classified into one or more of 250 genre classes. For every album we analyze the cover image, text reviews, and audio tracks, with a total of approximately 147k audio tracks and 447k album reviews. Furthermore, we exploit this dataset with a novel deep learning approach to learn