Music Genre Classification Systems

Total Page:16

File Type:pdf, Size:1020Kb

Music Genre Classification Systems Music Genre Classification Systems - A Computational Approach Peter Ahrendt Kongens Lyngby 2006 IMM-PHD-2006-164 Technical University of Denmark Informatics and Mathematical Modelling Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 45253351, Fax +45 45882673 [email protected] www.imm.dtu.dk IMM-PHD: ISSN 0909-3192 Summary Automatic music genre classification is the classification of a piece of music into its corresponding genre (such as jazz or rock) by a computer. It is considered to be a cornerstone of the research area Music Information Retrieval (MIR) and closely linked to the other areas in MIR. It is thought that MIR will be a key element in the processing, searching and retrieval of digital music in the near future. This dissertation is concerned with music genre classification systems and in particular systems which use the raw audio signal as input to estimate the corresponding genre. This is in contrast to systems which use e.g. a symbolic representation or textual information about the music. The approach to music genre classification systems has here been system-oriented. In other words, all the different aspects of the systems have been considered and it is emphasized that the systems should be applicable to ordinary real-world music collections. The considered music genre classification systems can basically be seen as a feature representation of the song followed by a classification system which predicts the genre. The feature representation is here split into a Short-time feature extraction part followed by Temporal feature integration which combines the (multivariate) time-series of short-time feature vectors into feature vectors on a larger time scale. Several different short-time features with 10-40 ms frame sizes have been exam- ined and ranked according to their significance in music genre classification. A Consensus sensitivity analysis method was proposed for feature ranking. This method has the advantage of being able to combine the sensitivities over several ii resamplings into a single ranking. The main efforts have been in temporal feature integration. Two general frame- works have been proposed; the Dynamic Principal Component Analysis model as well as the Multivariate Autoregressive Model for temporal feature integra- tion. Especially the Multivariate Autoregressive Model was found to be success- ful and outperformed a selection of state-of-the-art temporal feature integration methods. For instance, an accuracy of 48% was achieved in comparison to 57% for the human performance on an 11-genre problem. A selection of classifiers were examined and compared. We introduced Co- occurrence models for music genre classification. These models include the whole song within a probabilistic framework which is often an advantage compared to many traditional classifiers which only model the individual feature vectors in a song. Resum´e Automatisk musik genre klassifikation er et forskningsomr˚ade, som fokuserer p˚a, at klassificere musik i genrer s˚asom jazz og rock ved hjælp af en computer. Det betragtes som en af de vigtigste omr˚ader indenfor Music Information Retrieval (MIR). Det forventes, at MIR vil spille en afgørende rolle for f.eks. behandling og søgning i digitale musik samlinger i den nærmeste fremtid. Denne afhandling omhandler automatisk musik genre klassifikation og specielt systemer, som kan prædiktere genre udfra det r˚a digitale audio signal. Mod- sætningen er systemer, som repræsenterer musikken i form af f.eks. symboler, som det bruges i almindelig node-notation, eller tekst-information. Tilgangen til problemet har generelt været system-orienteret, s˚aledes at alle komponenter i systemet tages i betragtning. Udgangspunktet har været, at systemet skulle kunne fungere p˚a almindelige folks musik samlinger med blandet musik. Standard musik genre klassifikations-systemer kan generelt deles op i en fea- ture repræsentation af musikken, som efterfølges af et klassifikationssystem til mønster genkendelse i feature rummet. I denne afhandling er feature repræsen- tationen delt op i hhv. Kort-tids feature ekstraktion og Tidslig feature integra- tion, som kombinerer den (multivariate) tidsserie af kort-tids features i en enkelt feature vektor p˚a en højere tidsskala. I afhandlingen undersøges adskillige kort-tids features, som lever p˚a 10-40 ms tidsskala, og de sorteres efter hvor godt de hver især kan bruges til musik genre klassifikation. Der foresl˚as en ny metode til dette, Konsensus Sensitivitets Anal- yse, som kombinerer sensitivitet fra adskillige resamplings til en samlet vurder- ing. iv Hovedvægten er lagt p˚aomr˚adet tidslig feature integration. Der foresl˚as to nye metoder, som er Dynamisk Principal Komponent Analyse og en Multivariat Au- toregressiv Model til tidslig integration. Den multivariate autoregressive model var mest lovende og den gav generelt bedre resultater end en række state-of-the- art metoder. For eksempel gav denne model en klassifikationsfejl p˚a 48% mod 57% for mennesker i et 11-genre forsøg. Der blev ogs˚a undersøgt og sammenlignet et udvalg af klassifikationssystemer. Der foresl˚as desuden Co-occurrence modeller til musik genre klassifikation. Disse modeller har den fordel, at de er i stand til at modellere hele sangen i en prob- abilistisk model. Dette er i modsætning til traditionelle systemer, som kun modellerer hver feature vektor i sangen individuelt. Preface This dissertation was prepared at the Institute of Informatics and Mathemat- ical Modelling, Technical University of Denmark in partial fulfillment of the requirements for acquiring the Ph.D. degree in engineering. The dissertation investigates automatic music genre classification which is the classification of music into its corresponding music genre by a computer. The ap- proach has been system-oriented. Still, the main efforts have been in Temporal feature integration which is the process of combining a time-series of short-time feature vectors into a single feature vector on a larger time scale. The dissertation consists of an extensive summary report and a collection of six research papers written during the period 2003–2006. Lyngby, February 2006 Peter Ahrendt vi Papers included in the thesis [Paper B] Ahrendt P., Meng A., and Larsen J., Decision Time Horizon for Mu- sic Genre Classification using Short Time Features, Proceedings of European Signal Processing Conference (EUSIPCO), Vienna, Austria, September 2004. [Paper C] Meng A., Ahrendt P., and Larsen J., Improving Music Genre Classifi- cation by Short-Time Feature Integration, Proceedings of IEEE In- ternational Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, USA, March 2005. [Paper D] Ahrendt P., Goutte C., and Larsen J., Co-occurrence Models in Music Genre Classification, Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Mystic, Connecticut, USA, September 2005. [Paper E] Ahrendt P. and Meng A., Music Genre Classification using the Multivariate AR Feature Integration Model, Audio Genre Classi- fication contest at the Music Information Retrieval Evaluation eXchange (MIREX) (in connection with the annual ISMIR conference) [53], London, UK, September 2005. [Paper F] Hansen L. K., Ahrendt P., and Larsen J., Towards Cognitive Compo- nent Analysis, Proceedings of International and Interdisciplinary Con- ference on Adaptive Knowledge Representation and Reasoning (AKRR), Espoo, Finland, June 2005. [Paper G] Meng A., Ahrendt P., Larsen J., and Hansen L. K., Feature Integration for Music Genre Classification, yet unpublished article, 2006. viii Acknowledgements I would like to take this opportunity to first and foremost thank my colleague Anders Meng. We have had a very close collaboration which is a fact that the publication list certainly illustrates. However, most importantly, the collabora- tion has been very enjoyable and interesting. Simply having a counterpart to tear down your idea or (most often) help you build it up, has been priceless. I would also like to thank all of the people at the Intelligent Signal Processing group for talks and chats about anything and everything. It has been interesting, funny and entertaining and contributed a lot to the good times that I’ve had during last three years of studies. Last, but not least, I would like to thank my wife for help with careful proof- reading of the dissertation as well as generally being very supportive. x Contents Summary i Resum´e iii Preface v Papers included in the thesis vii Acknowledgements ix 1 Introduction 1 1.1Scientificcontributions........................ 3 1.2Overviewofthedissertation..................... 3 2 Music Genre Classification Systems 5 2.1Humanmusicgenreclassification.................. 7 2.2 Automatic music genre classification . 11 xii CONTENTS 2.3Assumptionsandchoices....................... 14 3 Music features 17 3.1Short-timefeatureextraction.................... 18 3.2Featurerankingandselection.................... 29 4 Temporal feature integration 31 4.1GaussianModel............................ 36 4.2MultivariateAutoregressiveModel................. 38 4.3DynamicPrincipalComponentAnalysis.............. 47 4.4FrequencyCoefficients........................ 48 4.5LowShort-TimeEnergyRatio................... 48 4.6HighZero-CrossingRateRatio................... 49 4.7BeatHistogram............................ 49 4.8BeatSpectrum............................ 50 5 Classifiers andPostprocessing 51 5.1GaussianClassifier.......................... 54 5.2GaussianMixtureModel....................... 56 5.3LinearRegressionclassifier....................
Recommended publications
  • M.Sc. RESEARCH THESIS LARGE-SCALE MUSIC
    LARGE-SCALE MUSIC CLASSIFICATION USING AN APPROXIMATE k-NN CLASSIFIER Haukur Pálmason Master of Science Computer Science January 2011 School of Computer Science Reykjavík University M.Sc. RESEARCH THESIS ISSN 1670-8539 Large-Scale Music Classification using an Approximate k-NN Classifier by Haukur Pálmason Research thesis submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science January 2011 Research Thesis Committee: Dr. Björn Þór Jónsson, Supervisor Associate Professor, Reykjavík University, Iceland Dr. Laurent Amsaleg Research Scientist, IRISA-CNRS, France Dr. Yngvi Björnsson Associate Professor, Reykjavík University, Iceland Copyright Haukur Pálmason January 2011 Large-Scale Music Classification using an Approximate k-NN Classifier Haukur Pálmason January 2011 Abstract Content based music classification typically uses low-level feature vectors of high-dimensionality to represent each song. Since most classification meth- ods used in the literature do not scale well, the number of such feature vec- tors is usually reduced in order to save computational time. Approximate k-NN classification, on the other hand, is very efficient and copes better with scale than other methods. It can therefore be applied to more feature vec- tors within a given time limit, potentially resulting in better quality. In this thesis, we first demonstrate the effectiveness and efficiency of approximate k-NN classification through a traditional genre classification task, achieving very respectable accuracy in a matter of minutes. We then apply the approx- imate k-NN classifier to a large-scale collection of more than thirty thousand songs. We show that, through an iterative process, the classification con- verges relatively quickly to a stable state.
    [Show full text]
  • Multimodal Deep Learning for Music Genre Classification
    Oramas, S., Barbieri, F., Nieto, O., and Serra, X (2018). Multimodal Transactions of the International Society for Deep Learning for Music Genre Classification, Transactions of the Inter- Music Information Retrieval national Society for Music Information Retrieval, V(N), pp. xx–xx, DOI: https://doi.org/xx.xxxx/xxxx.xx ARTICLE TYPE Multimodal Deep Learning for Music Genre Classification Sergio Oramas,∗ Francesco Barbieri,y Oriol Nieto,z and Xavier Serra∗ Abstract Music genre labels are useful to organize songs, albums, and artists into broader groups that share similar musical characteristics. In this work, an approach to learn and combine multimodal data representations for music genre classification is proposed. Intermediate rep- resentations of deep neural networks are learned from audio tracks, text reviews, and cover art images, and further combined for classification. Experiments on single and multi-label genre classification are then carried out, evaluating the effect of the different learned repre- sentations and their combinations. Results on both experiments show how the aggregation of learned representations from different modalities improves the accuracy of the classifica- tion, suggesting that different modalities embed complementary information. In addition, the learning of a multimodal feature space increase the performance of pure audio representa- tions, which may be specially relevant when the other modalities are available for training, but not at prediction time. Moreover, a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification not only in terms of accuracy, but also in terms of the diversity of the predicted genres, which implies a more fine-grained categorization. Finally, a qualitative analysis of the results sheds some light on the behavior of the different modalities in the classification task.
    [Show full text]
  • Pynchon's Sound of Music
    Pynchon’s Sound of Music Christian Hänggi Pynchon’s Sound of Music DIAPHANES PUBLISHED WITH SUPPORT BY THE SWISS NATIONAL SCIENCE FOUNDATION 1ST EDITION ISBN 978-3-0358-0233-7 10.4472/9783035802337 DIESES WERK IST LIZENZIERT UNTER EINER CREATIVE COMMONS NAMENSNENNUNG 3.0 SCHWEIZ LIZENZ. LAYOUT AND PREPRESS: 2EDIT, ZURICH WWW.DIAPHANES.NET Contents Preface 7 Introduction 9 1 The Job of Sorting It All Out 17 A Brief Biography in Music 17 An Inventory of Pynchon’s Musical Techniques and Strategies 26 Pynchon on Record, Vol. 4 51 2 Lessons in Organology 53 The Harmonica 56 The Kazoo 79 The Saxophone 93 3 The Sounds of Societies to Come 121 The Age of Representation 127 The Age of Repetition 149 The Age of Composition 165 4 Analyzing the Pynchon Playlist 183 Conclusion 227 Appendix 231 Index of Musical Instruments 233 The Pynchon Playlist 239 Bibliography 289 Index of Musicians 309 Acknowledgments 315 Preface When I first read Gravity’s Rainbow, back in the days before I started to study literature more systematically, I noticed the nov- el’s many references to saxophones. Having played the instru- ment for, then, almost two decades, I thought that a novelist would not, could not, feature specialty instruments such as the C-melody sax if he did not play the horn himself. Once the saxophone had caught my attention, I noticed all sorts of uncommon references that seemed to confirm my hunch that Thomas Pynchon himself played the instrument: McClintic Sphere’s 4½ reed, the contra- bass sax of Against the Day, Gravity’s Rainbow’s Charlie Parker passage.
    [Show full text]
  • The National Harmonica League Harmonicauk
    formerly- The National Harmonica League HarmonicaUK Registered Charity (England and Wales) No. 1131484 http://www.harmonica.co.uk - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Chairman Ben Hewlett +447973 284366 [email protected] Treasurer Phil Leiwy +447979 648007 [email protected] Membership Secretary David Hambley [email protected] 7 Ingleborough Way, Leyland, Lancs, PR25 4ZS, UK +447757 215047 Editor Dave Taylor +44797 901 1106 [email protected] Chromatic Davina Brazier Education Dick Powell / Eva Hurt Publicity Sam Wilkinson Secretary Dave Taylor Tremolo Simon Joy Health Rollen Flood Archivist Roger Trobridge +44771 2671940 [email protected] President Paul Jones Patrons Lee Sankey, Brendan Power, Adam Glasser US Vice President (vacancy) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Harmonica World is published every two months, and is the magazine for members of HarmonicaUK ( formerly- National Harmonica League ). Any views expressed might not be the o!cial views of HarmonicaUK. All material in this magazine is copyrighted, and may not be reproduced without the prior consent of the Editor. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cover: Apr - May 2020 design by Dave Taylor Welcome to HARMONICA WORLD Magazine... _____________________________________________________________________________-_______________________________________________________________________________
    [Show full text]
  • Music Genre Classification
    Music Genre Classification Michael Haggblade Yang Hong Kenny Kao 1 Introduction Music classification is an interesting problem with many applications, from Drinkify (a program that generates cocktails to match the music) to Pandora to dynamically generating images that comple- ment the music. However, music genre classification has been a challenging task in the field of music information retrieval (MIR). Music genres are hard to systematically and consistently describe due to their inherent subjective nature. In this paper, we investigate various machine learning algorithms, including k-nearest neighbor (k- NN), k-means, multi-class SVM, and neural networks to classify the following four genres: clas- sical, jazz, metal, and pop. We relied purely on Mel Frequency Cepstral Coefficients (MFCC) to characterize our data as recommended by previous work in this field [5]. We then applied the ma- chine learning algorithms using the MFCCs as our features. Lastly, we explored an interesting extension by mapping images to music genres. We matched the song genres with clusters of images by using the Fourier-Mellin 2D transform to extract features and clustered the images with k-means. 2 Our Approach 2.1 Data Retrieval Process Marsyas (Music Analysis, Retrieval, and Synthesis for Audio Signals) is an open source software framework for audio processing with specific emphasis on Music Information Retrieval Applica- tions. Its website also provides access to a database, GTZAN Genre Collection, of 1000 audio tracks each 30 seconds long. There are 10 genres represented, each containing 100 tracks. All the tracks are 22050Hz Mono 16-bit audio files in .au format [2].
    [Show full text]
  • California State University, Northridge Where's The
    CALIFORNIA STATE UNIVERSITY, NORTHRIDGE WHERE’S THE ROCK? AN EXAMINATION OF ROCK MUSIC’S LONG-TERM SUCCESS THROUGH THE GEOGRAPHY OF ITS ARTISTS’ ORIGINS AND ITS STATUS IN THE MUSIC INDUSTRY A thesis submitted in partial fulfilment of the requirements for the Degree of Master of Arts in Geography, Geographic Information Science By Mark T. Ulmer May 2015 The thesis of Mark Ulmer is approved: __________________________________________ __________________ Dr. James Craine Date __________________________________________ __________________ Dr. Ronald Davidson Date __________________________________________ __________________ Dr. Steven Graves, Chair Date California State University, Northridge ii Acknowledgements I would like to thank my committee members and all of my professors at California State University, Northridge, for helping me to broaden my geographic horizons. Dr. Boroushaki, Dr. Cox, Dr. Craine, Dr. Davidson, Dr. Graves, Dr. Jackiewicz, Dr. Maas, Dr. Sun, and David Deis, thank you! iii TABLE OF CONTENTS Signature Page .................................................................................................................... ii Acknowledgements ............................................................................................................ iii LIST OF FIGURES AND TABLES.................................................................................. vi ABSTRACT ..................................................................................................................... viii Chapter 1 – Introduction ....................................................................................................
    [Show full text]
  • Ethnomusicology a Very Short Introduction
    ETHNOMUSICOLOGY A VERY SHORT INTRODUCTION Thimoty Rice Sumário Chapter 1 – Defining ethnomusicology...........................................................................................4 Ethnos..........................................................................................................................................5 Mousikē.......................................................................................................................................5 Logos...........................................................................................................................................7 Chapter 2 A bit of history.................................................................................................................9 Ancient and medieval precursors................................................................................................9 Exploration and enlightenment.................................................................................................10 Nationalism, musical folklore, and ethnology..........................................................................10 Early ethnomusicology.............................................................................................................13 “Mature” ethnomusicology.......................................................................................................15 Chapter 3........................................................................................................................................17 Conducting
    [Show full text]
  • Fabian Holt. 2007. Genre in Popular Music. Chicago and London: University of Chicago Press
    Fabian Holt. 2007. Genre in Popular Music. Chicago and London: University of Chicago Press. Reviewed by Elizabeth K. Keenan The concept of genre has often presented difficulties for popular music studies. Although discussions of genres and their boundaries frequently emerge from the ethnographic study of popular music, these studies often fail to theorize the category of genre itself, instead focusing on elements of style that demarcate a localized version of a particular genre. In contrast, studies focusing on genre itself often take a top-down, Adorno-inspired, culture-industry perspective, locating the concept within the marketing machinations of the major-label music industry. Both approaches have their limits, leaving open numerous questions about how genres form and change over time; how people employ the concept of genre in their musical practices-whether listening, purchasing, or performing; and how culture shapes our construction of genre categories. In Genre in Popular Music, Fabian Holt deftly integrates his theoretical model of genre into a series of ethnographic and historical case studies from both the center and the boundaries of popular music in the United States. These individual case studies, from reactions to Elvis Presley in Nashville to a portrait of the lesser­ known Chicago jazz and rock musician Jeff Parker, enable Holt to develop a multifaceted theory of genre that takes into consideration how discourse and practice, local experience and translocal circulation, and industry and individuals all shape popular music genres in the United States. This book is a much needed addition to studies of genre, as well as a welcome invitation to explore more fully the possibilities for doing ethnography in American popular music studies.
    [Show full text]
  • Genre Classification Via Album Cover
    Genre Classification via Album Cover Jonathan Li Di Sun Department of Computer Science Department of Computer Science Stanford University Stanford University [email protected] [email protected] Tongxin Cai Department of Civil and Environmental Engineering Stanford University [email protected] Abstract Music information retrieval techniques are being used in recommender systems and automatically categorize music. Research in music information retrieval often focuses on textual or audio features for genre classification. This paper takes a direct look at the ability of album cover artwork to predict album genre and expands upon previous work in this area. We use transfer learning with VGG-16 to improve classifier performance. Our work results in an increase in our topline metrics, area under the precision recall curve, of 80%: going from 0.315 to 0.5671, and precision and recall increasing from 0.694 and 0.253 respectively to 0.7336 and 0.3812 respectively. 1 Introduction Genre provides information and topic classification in music. Additionally, the vast majority of music is written collaboratively and with several influences, which leads to multi-label (multi-genres) albums, i.e., a song and its album cover may belong to more than one genre. Music genre classification, firstly introduced by Tzanetakis et al [18] in 2002 as a pattern recognition task, is one of the many branches of Music Information Retrieval (MIR). From here, other tasks on musical data, such as music generation, recommendation systems, instrument recognition and so on, can be performed. Nowadays, companies use music genre classification, either to offer recommendations to their users (such as iTunes, Spotify) or simply as a product (for example Shazam).
    [Show full text]
  • Genre, Form, and Medium of Performance Terms in Music Cataloging
    Genre, Form, and Medium of Performance Terms in Music Cataloging Ann Shaffer, University of Oregon [email protected] OLA 2015 Pre-Conference What We’ll Cover • New music vocabularies for LC Genre & Form and LC Medium of Performance Terms • Why were they developed? • How to use them in cataloging? • What does it all mean? Genre, Form, & MoP in LCSH Music Terms When describing music materials, LCSH have often blurred the distinction between: • Subject what the work is about • Genre what the work is • Form the purpose or format of the work • Medium of performance what performing forces the work requires Genre, Form, & MoP in LCSH Music Terms Book Music Score • LCSH describe what • LCSH describe what the musical work is: subject the book is Symphonies about: Operas Symphony Bluegrass music Opera Waltzes Bluegrass music • LCSH also describe what the format is: Rap (music) $v Scores and parts $v Vocal scores $v Fake books Waltz • And what the medium of performance is: Sonatas (Cello and piano) Banjo with instrumental ensemble String quartets Genre, Form, & MoP in LCSH Music Terms Audio Recording • LCSH describes what the musical work is: Popular music Folk music Flamenco music Concertos • And what the medium of performance is: Songs (High voice) with guitar Jazz ensembles Choruses (Mens’ voices, 4 parts) with piano Concertos (Tuba with string orchestra) Why Is This Problematic? • Machines can’t distinguish between topic, form, genre, and medium of performance in the LCSH • All coded in same field (650) • Elements of headings & subdivisions
    [Show full text]
  • You Can Judge an Artist by an Album Cover: Using Images for Music Annotation
    You Can Judge an Artist by an Album Cover: Using Images for Music Annotation Janis¯ L¯ıbeks Douglas Turnbull Swarthmore College Department of Computer Science Swarthmore, PA 19081 Ithaca College Email: [email protected] Ithaca, NY 14850 Email: [email protected] Abstract—While the perception of music tends to focus on our acoustic listening experience, the image of an artist can play a role in how we categorize (and thus judge) the artistic work. Based on a user study, we show that both album cover artwork and promotional photographs encode valuable information that helps place an artist into a musical context. We also describe a simple computer vision system that can predict music genre tags based on content-based image analysis. This suggests that we can automatically learn some notion of artist similarity based on visual appearance alone. Such visual information may be helpful for improving music discovery in terms of the quality of recommendations, the efficiency of the search process, and the aesthetics of the multimedia experience. Fig. 1. Illustrative promotional photos (left) and album covers (right) of artists with the tags pop (1st row), hip hop (2nd row), metal (3rd row). See I. INTRODUCTION Section VI for attribution. Imagine that you are at a large summer music festival. appearance. Such a measure is useful, for example, because You walk over to one of the side stages and observe a band it allows us to develop a novel music retrieval paradigm in which is about to begin their sound check. Each member of which a user can discover new artists by specifying a query the band has long unkempt hair and is wearing black t-shirts, image.
    [Show full text]
  • Multi-Label Music Genre Classification from Audio, Text, and Images Using Deep Features
    MULTI-LABEL MUSIC GENRE CLASSIFICATION FROM AUDIO, TEXT, AND IMAGES USING DEEP FEATURES Sergio Oramas1, Oriol Nieto2, Francesco Barbieri3, Xavier Serra1 1Music Technology Group, Universitat Pompeu Fabra 2Pandora Media Inc. 3TALN Group, Universitat Pompeu Fabra fsergio.oramas, francesco.barbieri, [email protected], [email protected] ABSTRACT exclusive (i.e., a song could be Pop, and at the same time have elements from Deep House and a Reggae grove). In Music genres allow to categorize musical items that share this work we aim to advance the field of music classifi- common characteristics. Although these categories are not cation by framing it as multi-label genre classification of mutually exclusive, most related research is traditionally fine-grained genres. focused on classifying tracks into a single class. Further- To this end, we present MuMu, a new large-scale mul- more, these categories (e.g., Pop, Rock) tend to be too timodal dataset for multi-label music genre classification. broad for certain applications. In this work we aim to ex- MuMu contains information of roughly 31k albums clas- pand this task by categorizing musical items into multiple sified into one or more 250 genre classes. For every al- and fine-grained labels, using three different data modal- bum we analyze the cover image, text reviews, and audio ities: audio, text, and images. To this end we present tracks, with a total number of approximately 147k audio MuMu, a new dataset of more than 31k albums classified tracks and 447k album reviews. Furthermore, we exploit into 250 genre classes. For every album we have collected this dataset with a novel deep learning approach to learn the cover image, text reviews, and audio tracks.
    [Show full text]