Harmonic Analysis for Music Transcription and Characterization
UNIVERSITÀ DEGLI STUDI DI BRESCIA
FACOLTÀ DI INGEGNERIA
Dipartimento di Ingegneria dell'Informazione

Ph.D. Program in Telecommunications Engineering, XXVII Cycle
SSD: ING-INF/03

Harmonic Analysis for Music Transcription and Characterization

Ph.D. Candidate: Ing. Alessio Degani
Ph.D. Supervisor: Prof. Pierangelo Migliorati
Ph.D. Coordinator: Prof. Riccardo Leonardi

Academic Year 2013/2014

to my family

Summary

This thesis studies methods for estimating the tonal information in a digital music recording. The work belongs to the scientific field known as Music Information Retrieval, which studies the many problems concerning the extraction of high-level information through the analysis of the audio signal. Specifically, this dissertation analyzes procedures for extracting tonal and harmonic information at different levels of abstraction.

First, we present a method for estimating the presence and the precise frequency location of short-term stationary sinusoidal components, that is, the fundamental components that identify notes and chords, and hence the tonal/harmonic information.

Next, we give an exhaustive analysis of spectral-peak-based methods for estimating the reference frequency (used to tune musical instruments). The reference frequency is usually considered standard and set to 440 Hz, but this is not always the case. We will therefore see that, to improve the performance of the various methods that rely on an estimate of the harmonic and melodic content, a consistent and robust estimate of the reference frequency is essential.
We then present a novel system for measuring the salience of a given sinusoidal frequency component in a polyphonic setting. This can be used as a front-end for a method for the automatic transcription of polyphonic scores. Next, we show how to use harmonic audio descriptors, called Pitch Class Profiles, to identify chord changes in a musical composition.

Finally, we address cover song identification (a cover being an alternative version of a song). To this end, we propose an automatic strategy for combining the results obtained by different methods, so as to improve the performance of the whole system.

Abstract

This thesis is concerned with the analysis of the digital music signal for the extraction of meaningful information about the tonal content of an audio excerpt. This work lies in the field of Music Information Retrieval, a discipline whose goal is to extract high-level, human-readable information from a musical composition. In this work we cover the retrieval of the tonal content at different levels of abstraction.

First, we present a method for detecting short-term stationary sinusoidal components and locating them precisely in frequency. Sinusoidal components are the basic atoms of a musical note or chord, and thus carry the tonal/harmonic information.

Next, we present an exhaustive comparative analysis of different spectral-peak-based tuning frequency estimation algorithms. The tuning frequency is usually set to the widely accepted standard of 440 Hz. However, several musical pieces exhibit a slight deviation in the tuning frequency. A reliable reference frequency estimation method is therefore essential to avoid degrading the performance of systems that use this information.
Then, we present a novel system to measure the salience of a given sinusoidal component in a mixture of partials generated in a polyphonic composition. This measure can be used as a front-end for an automatic music transcription system. We also show how to use the harmonic mid-level representation called Pitch Class Profile to detect musical chord boundaries in a song.

Finally, we deal with the task of cover song identification (identifying different renditions of a given song). We propose an automatic method to combine the results of several different systems in order to improve the detection accuracy of a cover song identification algorithm.

Acknowledgements

I would like to thank my supervisor, Prof. Pierangelo Migliorati, for his guidance and support throughout my three years of PhD. I would also like to thank Prof. Riccardo Leonardi, my PhD coordinator, for giving me the opportunity to work in this field and for making my experience abroad in the beautiful city of Paris possible.

My special thanks go to Ing. Marco Dalai for his precious help and his enlightening conversations. I would also like to thank HDR-Dr. Geoffroy Peeters for his supervision during one of the best experiences of these years: my stay at IRCAM, Paris, in the Sound Analysis/Synthesis team. There, I met many great people who share a passion for music and science.

Finally, a big thanks to my family and my girlfriend for believing in me during the PhD years.

Contents

1 Introduction
  1.1 Motivation
  1.2 Context
    1.2.1 Music Information Retrieval
    1.2.2 Frequency Analysis
    1.2.3 Audio Features
  1.3 Overview of the presented work
  1.4 Contributions
  1.5 Outline
2 Background
  2.1 Elements of Music Theory
    2.1.1 Pitch and Musical Notes
    2.1.2 Tuning and Temperaments
    2.1.3 MIDI Tuning Standard
    2.1.4 Musical Scales
    2.1.5 Musical Chords, Melody and Harmony
    2.1.6 Timbre and Dynamics
3 Phase-based sinusoids localization
  3.1 Time-Frequency analysis
    3.1.1 Short Time Fourier Transform
    3.1.2 Phase evolution of the STFT
  3.2 Phase coherence measure
    3.2.1 Coherence measure
    3.2.2 Coherence function
  3.3 Results and Applications
  3.4 Conclusions
4 Tuning Frequency Estimation
  4.1 Tuning Frequency Estimation Methods
    4.1.1 Frequency Deviation Histogram
    4.1.2 Circular statistics
    4.1.3 Least-Squares Estimation
  4.2 Evaluation Strategy
    4.2.1 Ideal case performances and global reference frequency estimation
    4.2.2 Speed of convergence and estimation stability
    4.2.3 Local tuning estimation
    4.2.4 Computational cost and complexity
  4.3 Data Set
    4.3.1 Cover Song 80 (covers80)
    4.3.2 MuseScore Symbolic Music Dataset (MS2012)
  4.4 Results
    4.4.1 Ideal case performances and global reference frequency estimation results
    4.4.2 Speed of convergence and estimation stability results
    4.4.3 Local tuning estimation results
    4.4.4 Computational cost and complexity
  4.5 Conclusions
5 Polyphonic Pitch Salience Function
    5.0.1 Classical approach
    5.0.2 Proposal
  5.1 Proposed Method
    5.1.1 Overview
    5.1.2 Motivations for using frequency deviations for pitch salience computation
    5.1.3 Short Time Fourier Transform
    5.1.4 Spectrum Peak Picking
    5.1.5 Reference Frequency Estimation
    5.1.6 Salience Function Computation
  5.2 Evaluation
    5.2.1 Multiple-pitch estimation: post-processing of the salience function
    5.2.2 Evaluation measures
    5.2.3 Test-Set
    5.2.4 Results
  5.3 Conclusions
6 Chord Bounds Detection
  6.1 Harmonic Change Detection Function
    6.1.1 Algorithm for HCDF calculation
  6.2 Chroma Features and Novelty calculation
    6.2.1 Other Chroma Features
    6.2.2 Distance measure
  6.3 Evaluation
  6.4 Results
  6.5 Conclusions
7 Distance Fusion for Cover Song Id.
  7.1 Audio Features and distance metrics
    7.1.1 Audio Features
    7.1.2 Distance Measures
  7.2 Distance Selection
  7.3 Results
  7.4 Conclusions
8 Conclusions
  8.1 Summary of contributions
  8.2 Future Perspectives
Bibliography

List of Figures

1.1 Outline of the dissertation
2.1 Piano keys and its note names
2.2 Examples of Western musical scales
2.3 Examples of musical chords
2.4 An excerpt of "Polonaise in G minor" by J. S. Bach
2.5 Amplitude spectrum of two different instruments
3.1 Example of Phase coherence measure
3.2 Phase Coherence Function for a single frequency component
3.3 Amplitude spectrum versus Phase Coherence Weighted Modulus (one freq. component)
3.4 Amplitude spectrum versus Phase Coherence Weighted Modulus (two freq. components)
3.5 Amplitude spectrum versus Phase Coherence Weighted Modulus (SNR = 10 dB)
3.6 Amplitude spectrum versus Phase Coherence Weighted Modulus (SNR = −10 dB)
4.1 f_ref estimation of a sawtooth sweep signal
4.2 f_ref estimation histogram (k = 5)
4.3 f_ref estimation histogram (k = 30)
4.4 Convergence results for covers80
4.5 Convergence results for MS2012
4.6 Estimated Σ for covers80
4.7 Estimated Σ for MS2012
4.8 Local tuning estimation of the song "Let It Be"
4.9 Local tuning estimation of "Variations 16-20"
4.10 Local tuning estimation of Choir performance
5.1 Frequency location of the pitches of the equal tempered scale
5.2 General scheme of the method for salience computation
5.3 Deviation of the first 20 harmonic frequencies of a complex tone
5.4 Piano roll representation obtained using our salience function
5.5 Pitch estimation results (Harmonic model)
5.6 Pitch estimation results
5.7 Pitch-Class estimation results