Object Coding of Music Using Expressive MIDI Welburn, Stephen J
Total Page:16
File Type:pdf, Size:1020Kb
Object coding of music using expressive MIDI Welburn, Stephen J. The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author For additional information about this publication click this link. https://qmro.qmul.ac.uk/jspui/handle/123456789/656 Information about this research object was correct at the time of download; we occasionally make corrections to records, please therefore check the published record when citing. For more information contact [email protected] Object Coding of Music Using Expressive MIDI Stephen J. Welburn A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the University of London. Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary University of London January 2011 The material contained within this thesis is my own, and all references are cited accordingly. The copyright of this dissertation rests with the author. Information from it may be freely used with acknowledgement. Stephen J. Welburn 12th January 2011 2 Abstract Structured audio uses a high level representation of a signal to produce audio out- put. When it was first introduced in 1998, creating a structured audio representation from an audio signal was beyond the state-of-the-art. Inspired by object coding and structured audio, we present a system to reproduce audio using Expressive MIDI, high-level parameters being used to represent pitch expression from an audio signal. This allows a low bit-rate MIDI sketch of the original audio to be produced. We examine optimisation techniques which may be suitable for inferring Expressive MIDI parameters from estimated pitch trajectories, considering the effect of data codings on the difficulty of optimisation. We look at some less common Gray codes and examine their effect on algorithm performance on standard test problems. We build an expressive MIDI system, estimating parameters from audio and syn- thesising output from those parameters. When the parameter estimation succeeds, we find that the system produces note pitch trajectories which match source audio to within 10 pitch cents. We consider the quality of the system in terms of both param- eter estimation and the final output, finding that improvements to core components { audio segmentation and pitch estimation, both active research fields { would produce a better system. We examine the current state-of-the-art in pitch estimation, and find that some estimators produce high precision estimates but are prone to harmonic errors, whilst other estimators produce fewer harmonic errors but are less precise. Inspired by this, we produce a novel pitch estimator combining the output of existing estimators. 3 Acknowledgements This work would not have been possible without the support of my supervisor, Mark Plumbley. We got there. The inhabitants of the Centre for Digital Music at Queen Mary also deserve mention, having provided knowledge, companionship and entertain- ment. Particular thanks go to those involved in \C4DM Presents" for the opportunity to take part in projects for public consumption. One more QM thankyou must go to Melissa, for making sure that deadlines were met, courses were attended, and that the necessary forms were filed. Of course, thanks go to my parents and my sister. Knowing they were rooting for me helped in the lows, and their interest in what I've been up to kept me on my toes. Hopefully I'll be less distracted during phone calls now. Without Abigail, this could not have happened. Having taken all opportunities to vent, gripe and grump to her over the past few years her meagre reward is her name in print. For proof-reading beyond the call of duty, and for just being there at home I'll double the reward. Abigail. And special thanks to two who are no longer with us: Martin Gardner for his Mathematical Puzzles and Diversions, without which I may never have enjoyed maths; and Alan Tayler, supervisor for my first degree, without whose inspiration this would not have happened. 4 Contents Abstract3 Acknowledgements4 Contents5 List of Figures 11 List of Tables 14 1 Introduction 17 1.1 Thesis Structure.............................. 18 1.2 Previously Published Work........................ 20 2 Background 21 2.1 Audio Coding............................... 21 2.2 Lossy and Lossless Codings ....................... 24 2.2.1 Lossy Coding........................... 24 2.2.2 Lossless Codings ......................... 26 2.3 Parametric Coding ............................ 28 2.3.1 Object Coding .......................... 29 2.4 MPEG-4 Synthetic Audio Coding.................... 30 5 2.5 MIDI Manufacturers Association Standards .............. 33 2.5.1 Musical Instrument Digital Interface (MIDI).......... 33 The MIDI Hardware Specification................ 33 The MIDI Data Format ..................... 34 Standard MIDI Files (SMF)................... 36 2.5.2 MMA Synthesiser Specifications................. 36 2.5.3 General MIDI (GM) ....................... 36 2.5.4 Downloadable Sounds (DLS)................... 39 2.5.5 Combining MIDI and Synthesis: XMF............. 40 2.6 Extracting MIDI from Audio....................... 41 2.6.1 Commercial Applications..................... 42 2.6.2 Conclusions............................ 42 3 An \Expressive MIDI" model of pitch 44 3.1 Representation of Pitch Trajectories in DLS.............. 45 3.1.1 dAhDSR Envelope Generator .................. 45 3.1.2 DLS Low Frequency Oscillator (LFO).............. 47 3.1.3 The DLS Pitch Trajectory Model................ 47 3.1.4 DLS Parameter Formats..................... 48 3.1.5 The Expressive MIDI Pitch Model ............... 48 3.2 Extracting a Pitch Trajectory from Audio ............... 50 3.2.1 Pitch Estimation Using Autocorrelation ............ 50 3.2.2 Pitch Estimation Using the Fourier Transform......... 51 3.2.3 YIN ................................ 52 3.3 Segmenting the Pitch Trajectory..................... 54 3.4 Estimating Pitch Trajectory Parameters ................ 58 4 Optimisation 60 4.1 Hillclimbing/Descent Methods...................... 61 6 4.1.1 Steepest Descent ......................... 62 4.1.2 Stochastic Hill-Climber (SHC).................. 63 4.2 Evolutionary Algorithms (EAs) ..................... 65 4.2.1 Simple Genetic Algorithm.................... 66 Crossover Schemes for Genetic Algorithms........... 67 4.2.2 CHC................................ 69 4.3 A Comparison of the Algorithms .................... 70 4.4 Bit-wise Codings ............................. 72 Standard Binary Code...................... 72 4.4.1 Hamming Distance and Locality................. 74 Binary Reflected Gray Code................... 74 Maximum Distance Code..................... 75 4.4.2 Transition Counts and Balancedness .............. 76 Balanced Gray Code....................... 77 4.4.3 Hamming Weight......................... 77 Monotone Gray Code....................... 77 4.5 A New Analysis of Binary Codings ................... 78 4.5.1 Sample 4-bit Encodings ..................... 78 4.5.2 Plotting 5-bit Codings...................... 79 4.5.3 Distribution of 6-bit Property Values.............. 80 4.5.4 Example Individual 6-bit Codings................ 81 4.5.5 Variation of Properties with Number of Bits.......... 83 4.6 Algorithm and Coding Performance Analysis.............. 84 4.6.1 Method .............................. 85 4.6.2 Results............................... 85 4.7 Conclusions................................ 93 5 Parameter Estimation 101 7 5.1 The DLS Pitch Trajectory Parameter Estimation Problem . 101 5.1.1 Possible Cost Functions ..................... 102 5.2 Joint Optimisation of EG and LFO Pitch Parameters . 103 5.3 Assessing CHC+BRGC for EG+LFO Parameter Estimation . 105 5.3.1 Method .............................. 105 5.3.2 Overview of Results........................ 107 Overall Summary......................... 107 By Instrument .......................... 108 By File............................... 108 Example Output ......................... 110 5.3.3 Distributions of Errors...................... 118 5.4 Conclusions................................ 121 6 (Re)Synthesis 123 6.1 Synthesiser Selection........................... 124 6.2 Implementation in Kontakt ....................... 125 6.3 Parameter Encoding ........................... 128 6.4 Input File Format............................. 130 6.5 Evaluation of Expressive MIDI Resynthesis . 132 6.5.1 Method .............................. 132 6.5.2 Results............................... 132 6.5.3 Kontakt Sample Tuning ..................... 139 6.6 Conclusions................................ 142 7 (Re)Evaluating YIN 145 7.1 Evaluating Pitch Estimators....................... 145 7.1.1 Pitch Data............................. 145 7.1.2 Pitch Databases.......................... 147 Bagshaw.............................. 147 8 Keele................................ 147 VOQUAL'03 ........................... 147 MIREX QBSH .......................... 148 7.1.3 Metrics for Evaluating Pitch Estimators . 148 (i) Gross Error Percentage (GEP) . 148 (ii) 10 Cent Threshold (10c)................... 149 (iii) Period Mismatch Percentage (PMP) . 149 (iv) MIDI Mismatch Percentage (MMP) . 150 7.2 Appraising YIN.............................. 151 7.2.1 YIN Test Implementation .................... 151 7.2.2 YIN Pitch Metric Results .................... 152 7.2.3 YIN Pitch Error Distributions.................. 153 7.2.4 Harmonic Errors and YIN...................