Classifying Music with jMIR
Cory McKay
Marianopolis College and CIRMMT, Montreal, Canada

Lecture contents
Introduction to music information retrieval
Automatic classification
Overview of the jMIR software
Multimodal classification experiments
Empirical results
jSymbolic
Other jMIR components, as time and interest permit
Goals of MIR
Extract meaningful information from or about music
Facilitate music analysis, organization and access

Main sources of information
Symbolic recordings: e.g. MIDI
Audio recordings: e.g. MP3
Cultural data: e.g. web data, metadata tags, etc.
Lyrics
Others: album art, videos, etc.

A (very partial) list of MIR tasks
Automatic transcription
Automatic music analysis: harmonic analysis, structural segmentation, etc.
Query by example
Optical music recognition (OMR)
Fingerprinting (song identification)
Interfaces and visualizations
Similarity: recommendation, hit prediction, etc.
Automatic classification: genre, mood, artist, composer, instrument, etc.

Automatic music classification
Typical procedure:
Collect annotated training / testing data, with appropriate ontologies
Extract features
Reduce feature dimensionality
Train a classification model (typically supervised)
Validate the model
Most significant challenges:
Acquiring sufficiently large annotated datasets
Designing features that encapsulate relevant data

Overview of the jMIR software
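The typical classification procedure described under "Automatic music classification" above (collect labelled data, extract features, train, validate) can be sketched end-to-end with a toy nearest-centroid classifier; all feature values and class names below are invented for illustration.

```python
# Toy supervised classification pipeline: train on labelled feature
# vectors, then predict the class of an unseen instance.
# All data here is invented for illustration.

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(instances):
    """instances: list of (feature_vector, label). Returns per-class centroids."""
    by_class = {}
    for vec, label in instances:
        by_class.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}

def classify(model, vec):
    """Assign vec to the class with the nearest centroid (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(model, key=lambda label: dist(model[label], vec))

# Hypothetical 2-D features (e.g. note density, dynamic range):
training = [([3.8, 62.0], "Classical"), ([4.1, 58.0], "Classical"),
            ([29.5, 22.0], "Jazz"), ([27.0, 25.0], "Jazz")]
model = train(training)
print(classify(model, [30.0, 20.0]))  # a dense, dynamically flat piece
```

A real system would substitute jMIR feature extractors and a stronger learner, but the train / classify / validate flow is the same.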
jMIR is a software suite designed for performing research in automatic music classification
Primary tasks performed:
Feature extraction
Machine learning
Data storage file formats
Dataset management
Acquiring, correcting and organizing metadata

Characteristics of jMIR
Has a separate software component to address each important aspect of automatic music classification
Each component can be used independently
Can also be used as an integrated whole
Free and open source
Architectural emphasis on providing an extensible platform for iteratively developing new techniques and algorithms
Interfaces designed for both technical and non-technical users
Facilitates multimodal research
jMIR components
jAudio: audio feature extraction
jSymbolic: feature extraction from MIDI files
jWebMiner: cultural feature extraction
jLyrics: feature extraction from lyrical transcriptions
ACE: meta-learning classification engine
ACE XML: file formats for features, feature metadata, instance metadata and ontologies
lyricFetcher: lyric mining
Codaich, Bodhidharma MIDI and SLAC: datasets
jMusicMetaManager: metadata management
jSongMiner: metadata harvesting
jMIRUtilities: infrastructure for conducting experiments
Efficacy of multimodal approaches?
Can combining features extracted from audio, symbolic, cultural and/or lyrical sources significantly improve automatic music classification performance?
Intuitively, they each seem to contain very different kinds of information
Can this help us break the seeming music classification performance ceiling of 70% to 80% for reasonably-sized taxonomies?
This was studied empirically (McKay et al. 2010), a follow-up on a similar earlier study (McKay 2010)

Experimental methodology
Extracted features from separate audio, symbolic, cultural and lyrical sources of data
Corresponding to the same musical pieces
Using the jMIR feature extractors
Compared ACE-based genre classification performance of each of the 15 possible subsets of these 4 feature groups
Audio, Symbolic + Audio, Cultural, Symbolic + Cultural, etc.
Applied dimensionality reduction
10-fold cross-validation, with a reserved validation set
Wilcoxon signed-rank significance tests were used

Musical dataset used: SLAC
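The 15 feature-group subsets compared in the experimental methodology above can be enumerated directly; a small sketch:

```python
from itertools import combinations

# The four feature groups compared in the experiment.
groups = ["Symbolic", "Lyrical", "Audio", "Cultural"]

# Every non-empty subset of the 4 groups: 2**4 - 1 = 15 combinations.
subsets = [combo for size in range(1, len(groups) + 1)
           for combo in combinations(groups, size)]

print(len(subsets))  # 15
print(subsets[0])    # ('Symbolic',)
```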
The SLAC dataset (Symbolic, Lyrical, Audio, Cultural) was assembled for this experiment
250 recordings belonging to 10 genres
Collapsible to 5 genres
Audio and MIDI versions of each recording, acquired separately
Accompanying metadata that could be used to extract cultural features from the web
Lyrics mined with lyricFetcher

Results: 5-genre taxonomy
All feature groups involving cultural features achieved classification accuracies of 99% to 100%
Symbolic features alone performed with a classification accuracy of 85%
[Figure: Classification Performance on 5-Genre Taxonomy; average classification accuracy (%) per feature set]

Results: 10-genre taxonomy
SAC achieved the best classification accuracy of 89%
All feature groups that included cultural features achieved 81% or higher
Symbolic features alone performed at 66%
[Figure: Classification Performance on 10-Genre Taxonomy; average classification accuracy (%) per feature set]

Discussion: Combining feature types
Combining feature types tended to increase classification performance on average
However, there were exceptions
e.g. LC performed significantly less well than C in the 10-genre experiment
[Figure: Average Classification Performance Based on the Number of Feature Types; average classification accuracy (%) vs. number of feature types (1 to 4), for the 5-genre and 10-genre taxonomies]

Discussion: Feature type dominance
Cultural features significantly outperformed other feature types
For the 10-genre taxonomy, all groups including cultural features outperformed all groups of the same size that did not include cultural features
Symbolic features were useful in general
Symbolic groups all performed at 70% or above
SAC was the best group overall, at 89%
[Figure: 10-Genre Classification Performance of Feature Sets Including and not Including Cultural Features; average classification accuracy (%) vs. number of feature types]

Experimental conclusions
Excellent overall genre classification results were obtained
89% on 10 genres, compared to the best MIREX audio-only result to date of 80% on 10 genres
As a side note, jMIR holds the MIREX record (2005) for symbolic-only genre classification in a separate experiment: 84% on a 9-class taxonomy, 46% on a 38-class taxonomy
Combining feature types tended to improve results
Cultural features dominated

Important research question
Should research efforts be focused on fingerprinting and cultural feature extraction rather than bothering with extracting content-based features?
Assuming reliable fingerprinting, this could result in very high classification results
However, this marginalizes the musicological and music theoretical insights about musical categories that can be achieved from content-based analysis
Cultural features are also of no or limited utility for brand new music

Introduction to jSymbolic
Extracts features from MIDI files
111 implemented features
By far the largest existing symbolic feature catalogue
Many are original
An additional 49 features are proposed but not yet implemented
Features saved to ACE XML
jSymbolic feature types (1/2)
Instrumentation: What types of instruments are present, and which are given particular importance relative to others? Found experimentally to be the most effective symbolic feature type (McKay & Fujinaga 2005)
Texture: How many independent voices are there and how do they interact (e.g., polyphonic, homophonic, etc.)?
Rhythm: Time intervals between the attacks of different notes; durations of notes; what kinds of meters and rhythmic patterns are present? Rubato?
Dynamics: How loud are notes and what kinds of dynamic variations occur?
jSymbolic feature types (2/2)
Pitch Statistics: What are the occurrence rates of different pitches and pitch classes? How tonal is the piece? How much variety in pitch is there?
Melody: What kinds of melodic intervals are present? How much melodic variation is there? What kinds of melodic contours are used? What types of phrases are used?
Chords (planned): What vertical intervals are present? What types of chords do they represent? How much harmonic movement is there?
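For illustration, two of the pitch statistics named above can be computed directly from MIDI note numbers; this is a simplified sketch, not jSymbolic's exact feature definitions.

```python
# Toy versions of two jSymbolic-style pitch features, computed from a
# list of MIDI pitches (60 = middle C). The definitions are simplified
# sketches, not jSymbolic's exact implementations.

def pitch_class_variety(midi_pitches):
    """Number of distinct pitch classes (0-11) that occur at least once."""
    return len({p % 12 for p in midi_pitches})

def pitch_range(midi_pitches):
    """Semitones between the highest and lowest note."""
    return max(midi_pitches) - min(midi_pitches)

notes = [60, 64, 67, 72, 64, 60, 55]  # a C major arpeggio figure
print(pitch_class_variety(notes))  # 3 (C, E and G occur)
print(pitch_range(notes))          # 17 semitones
```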
More on jSymbolic
Easy to add new features
Modular plug-in design
Automatic provision of all other feature values to each new feature
Dynamic feature extraction scheduling that automatically resolves feature dependencies
A variety of histogram aggregators are used:
Beat histograms
Pitch and pitch class histograms (including wrapped)
Instrumentation histograms
Melodic interval histograms
Vertical interval histograms and chord type histograms
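As an example of the histogram aggregators listed above, a wrapped pitch class histogram folds all octaves into 12 bins; a minimal sketch:

```python
# Sketch of a wrapped pitch class histogram: fold MIDI pitches into
# 12 pitch-class bins and normalize so the bins sum to 1.

def pitch_class_histogram(midi_pitches):
    bins = [0.0] * 12
    for p in midi_pitches:
        bins[p % 12] += 1
    total = sum(bins)
    return [b / total for b in bins]

hist = pitch_class_histogram([60, 62, 64, 60, 72])  # C, D, E, C, C
print(hist[0])  # 0.6 -> pitch class C dominates
```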
Beat histogram example
Beat histograms use autocorrelation to calculate the relative strengths of different beat periodicities within a signal
'I Wanna Be Sedated' by The Ramones (top): several harmonic peaks with large spreads around them
'Round Midnight' by Thelonious Monk (bottom): only one strong peak, with a large low-level spread
[Figures: beat histograms (relative frequency vs. beats per minute, 40 to 200 BPM) for 'I Wanna Be Sedated' and 'Round Midnight']
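The autocorrelation idea behind beat histograms can be sketched on a toy onset-strength signal: lags at which the signal resembles a shifted copy of itself score highly, and those lags correspond to beat periodicities. This is a simplified sketch, not the exact jMIR algorithm.

```python
# Toy beat-periodicity detection: autocorrelate a binary onset signal
# and find the lag (in frames) with the strongest self-similarity.

def autocorrelation(signal, lag):
    """Sum of products of the signal with itself shifted by `lag` frames."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

# Onsets every 4 frames (a perfectly regular beat), 32 frames long:
onsets = [1 if i % 4 == 0 else 0 for i in range(32)]

# Score every candidate lag; the true period should score highest.
best_lag = max(range(1, 16), key=lambda lag: autocorrelation(onsets, lag))
print(best_lag)  # 4
```

A full beat histogram keeps the score at every lag (converted to beats per minute) rather than only the maximum, which is what produces the harmonic peaks visible in the figures above.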
Chopin’s Nocturne in B, Op. 32, No. 1
Average Note To Note Dynamics Change: 6.03
Orchestral Strings Fraction: 0
Chromatic Motion: 0.0769
Overall Dynamic Range: 62
Dominant Spread: 3
Pitch Class Variety: 7
Harmonicity of Two Strongest Rhythmic Pulses: 1
Range: 48
Importance of Bass Register: 0.2
Relative Strength of Most Common Intervals: 0.5
Interval Between Strongest Pitch Classes: 3
Most Common Pitch Class Prevalence: 0.433
Size of Melodic Arcs: 11
Note Density: 3.75
Stepwise Motion: 0.231
Number of Common Melodic Intervals: 3
Strength of Strongest Rhythmic Pulse: 0.321
Number of Strong Pulses: 5
Variability of Note Duration: 0.293
Variation of Dynamics: 16.4

Mendelssohn’s Piano Trio No. 2
Average Note To Note Dynamics Change: 1.46
Orchestral Strings Fraction: 0.56
Chromatic Motion: 0.244
Overall Dynamic Range: 22
Dominant Spread: 2
Pitch Class Variety: 7
Harmonicity of Two Strongest Rhythmic Pulses: 1
Range: 39
Importance of Bass Register: 0.373
Relative Strength of Most Common Intervals: 0.8
Interval Between Strongest Pitch Classes: 7
Most Common Pitch Class Prevalence: 0.39
Size of Melodic Arcs: 7.27
Note Density: 29.5
Stepwise Motion: 0.439
Number of Common Melodic Intervals: 6
Strength of Strongest Rhythmic Pulse: 0.173
Number of Strong Pulses: 6
Variability of Note Duration: 0.104
Variation of Dynamics: 5.98

Feature value comparison
Feature                              | Nocturne | Trio
Average Note To Note Dynamic Change  | 6.03     | 1.46
Overall Dynamic Range                | 62       | 22
Variation of Dynamics                | 16.40    | 5.98
Note Density                         | 3.75     | 29.50
Orchestral Strings Fraction          | 0.00     | 0.56
Variability of Note Duration         | 0.293    | 0.104
Chromatic Motion                     | 0.077    | 0.244
Range                                | 48       | 39

Work to be done on jSymbolic
Implement more features
49 proposed
Many others possible
Windowed feature extraction
Parsers for more symbolic formats: Humdrum, OSC, MusicXML, etc.
Output feature values using additional file formats, especially Weka ARFF

More details?
jAudio: audio feature extraction
jWebMiner: cultural feature extraction
lyricFetcher and jLyrics: lyric harvesting and feature extraction
ACE: meta-learning classification engine
ACE XML: file formats for features, feature metadata, instance metadata and ontologies
Codaich, Bodhidharma MIDI and SLAC: datasets
jMusicMetaManager and jSongMiner: metadata management and harvesting
General questions?

jAudio: An audio feature extractor
Implemented jointly with Daniel McEnnis
Extracts features from audio files: MP3, WAV, AIFF, AU, SND
28 bundled core features
Mainly low-level, some high-level
Can automatically generate new features using metafeatures and aggregators
e.g. the change in a feature value from window to window
Includes tools for testing new features being developed
Synthesize audio, record audio, sonify MIDI, display audio, etc.

jWebMiner: A cultural feature extractor
Extracts cultural features from the web using search engine web services
Calculates how often particular strings co-occur on the same web pages
e.g. how often does “J. S. Bach” co-occur on a web page with “Baroque”, compared to “Prokofiev”?
Results are processed to remove noise
Additional options:
Can assign weights to particular sites
Can enforce filter words
Permits synonyms
Also calculates features based on Last.FM user tag frequencies
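The co-occurrence scoring described above can be sketched as follows. The hit counts are invented, and the normalization shown (joint hits over the sum of the individual hit counts) is one simple choice for illustration, not necessarily the exact formula jWebMiner uses.

```python
# Sketch of jWebMiner-style co-occurrence scoring. Hit counts would
# normally come from search engine web services; the counts below are
# invented, and the normalization is a simplified stand-in.

def cooccurrence_score(joint_hits, hits_a, hits_b):
    """Joint page count normalized by the individual page counts."""
    return joint_hits / (hits_a + hits_b)

# Hypothetical hit counts for an artist and two candidate genre labels:
bach = 5_000_000
genre_hits = {"Baroque": 2_000_000, "Romantic": 1_500_000}
joint_hits = {"Baroque": 900_000, "Romantic": 40_000}

scores = {g: cooccurrence_score(joint_hits[g], bach, genre_hits[g])
          for g in genre_hits}
print(max(scores, key=scores.get))  # Baroque wins for J. S. Bach
```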
lyricFetcher
lyricFetcher automatically harvests lyrics from on-line lyrics repositories: LyricWiki and LyricsFly
Queries based on lists of song titles and artist names
Post-processing is applied to the lyrics in order to remove noise and make them sufficiently consistent for feature extraction
Deals with situations where sections of lyrics are abridged using keywords such as “chorus”, “bridge”, “verse”, etc.
Filters out keywords that could contaminate the lyrics
Ruby implementation

jLyrics
Extracts features from lyrics stored in text files:
Automated Readability Index
Average Syllable Count Per Word
Contains Words
Flesch-Kincaid Grade Level
Flesch Reading Ease
Function Word Frequencies
Letter-Bigram Components
Letter Frequencies
Letters Per Word Average
Letters Per Word Variance
Lines Per Segment Average
Lines Per Segment Variance
Number of Lines
Number of Segments
Number of Words
Part-of-Speech Frequencies
Punctuation Frequencies
Rate of Misspelling
Sentence Count
Sentence Length Average
Topic Membership Probabilities
Vocabulary Richness
Vocabulary Size
Word Profile Match
Words Per Line Average
Words Per Line Variance
Can also automatically generate word frequency profiles for particular classes if training data is provided
Central framework implemented in Java
Other technologies used by third-party components
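A few of the simpler lyric features listed above can be sketched directly; the definitions here are simplified readings of the feature names, not jLyrics' exact implementations.

```python
# Toy versions of three jLyrics-style text features, with simplified
# definitions (not jLyrics' exact implementations).

def words_per_line_average(lyrics):
    """Mean word count over the non-empty lines."""
    lines = [l for l in lyrics.splitlines() if l.strip()]
    return sum(len(l.split()) for l in lines) / len(lines)

def vocabulary_size(lyrics):
    """Number of distinct (lower-cased) word tokens."""
    return len({w.lower() for w in lyrics.split()})

def letters_per_word_average(lyrics):
    ws = lyrics.split()
    return sum(len(w) for w in ws) / len(ws)

sample = "twenty twenty twenty four hours to go\ni wanna be sedated"
print(words_per_line_average(sample))          # 5.5
print(vocabulary_size(sample))                 # 9
print(round(letters_per_word_average(sample), 2))  # 4.18
```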
ACE: A meta-learning engine
Evaluates the relative suitability of different dimensionality reduction and classification algorithms for a given problem
Can also train and classify with manually selected algorithms
Evaluates algorithms in terms of:
Classification accuracy
Consistency
Time complexity
Based on the Weka framework, so new algorithms can be added easily

ACE XML: MIR research file formats
Standardized file formats that can represent:
Feature values extracted from instances
Abstract feature descriptions and parameterizations
Instance labels and annotations
Class ontologies
Designed to be flexible and extensible
Able to express types of information that are particularly pertinent to music
Allow jMIR components to communicate with each other
Can also be adopted for independent use by other software
ACE XML 2.0 provides even more expressivity
e.g. potential for integration into RDF ontologies

jMIR datasets
Codaich is an MP3 research dataset
Carefully cleaned and labelled
The published 2006 version has 26,420 recordings belonging to 55 genres
Is constantly growing: currently 35,363 MP3s
Bodhidharma MIDI has 950 MIDI recordings in 38 genres of music
SLAC consists of 250 matched audio recordings, MIDI recordings, lyrical transcriptions and metadata that can be used to extract cultural features
Useful for experiments on combining features from different types of data
10 genres of music (in 5 pairs of similar genres)
jMusicMetaManager: A dataset manager
Detects metadata errors/inconsistencies and redundant copies of recordings
Detects differing metadata values that should in fact be the same
e.g. “Charlie Mingus” vs. “Mingus, Charles”
Generates HTML inventory and profile reports (39 reports in all)
Parses metadata from ID3 tags and iTunes XML
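The kind of inconsistency detection described above ("Charlie Mingus" vs. "Mingus, Charles") can be approximated by normalizing names before a fuzzy comparison; a simple sketch, noting that jMusicMetaManager's actual heuristics are more sophisticated.

```python
# Sketch of fuzzy artist-name matching: undo "Last, First" inversion,
# normalize case and token order, then compare with a string-similarity
# ratio. jMusicMetaManager's real heuristics are richer than this.
from difflib import SequenceMatcher

def normalize(name):
    """Lower-case, undo 'Last, First' inversion, sort the word tokens."""
    if "," in name:
        last, first = name.split(",", 1)
        name = first + " " + last
    return " ".join(sorted(name.lower().split()))

def probably_same_artist(a, b, threshold=0.8):
    """Flag name pairs whose normalized forms are highly similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(probably_same_artist("Charlie Mingus", "Mingus, Charles"))   # True
print(probably_same_artist("Thelonious Monk", "Mingus, Charles"))  # False
```

The 0.8 threshold is an arbitrary illustrative choice; in practice it would be tuned against known duplicates.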
jSongMiner
Software for automatically acquiring formatted metadata about songs, artists and albums
Designed for use with the Greenstone digital library software
May also be used for other purposes, such as cultural feature extraction
Identifies music files using Echo Nest fingerprinting functionality and embedded metadata
Mines a wide range of metadata tags from the Internet and collates them in a standardized way
Data extracted from The Echo Nest, Last.FM, MusicBrainz, etc.
Over 100 different fields are extracted
Data may be formatted into unqualified and/or qualified Dublin Core fields if desired
Saves the results in ACE XML or text
Can also be integrated automatically into a Greenstone collection

Acknowledgements
Collaborators at McGill and Waikato Universities
Funding, past and present:
CIRMMT
Centre for Open Software Innovation
Andrew W. Mellon Foundation
Social Sciences and Humanities Research Council of Canada
Canadian Foundation for Innovation
Contact information:
jmir.sourceforge.net
[email protected]