
Signal Processing Methods for Beat Tracking, Segmentation, and Audio Retrieval

Peter M. Grosche, Max-Planck-Institut für Informatik, Saarbrücken, Germany

Dissertation zur Erlangung des Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Naturwissenschaftlich-Technischen Fakultät I der Universität des Saarlandes

Betreuender Hochschullehrer / Supervisor: Prof. Dr. Meinard Müller, Universität des Saarlandes und MPI Informatik, Campus E1.4, 66123 Saarbrücken

Gutachter / Reviewers: Prof. Dr. Meinard Müller, Universität des Saarlandes und MPI Informatik, Campus E1.4, 66123 Saarbrücken

Prof. Dr. Hans-Peter Seidel, MPI Informatik, Campus E1.4, 66123 Saarbrücken

Dekan / Dean: Univ.-Prof. Mark Groves, Universität des Saarlandes, Saarbrücken

Eingereicht am / Thesis submitted: 22. Mai 2012 / May 22nd, 2012

Datum des Kolloquiums / Date of Defense: xx. xxxxxx 2012 / xxxxxx xx, 2012

Peter Matthias Grosche, MPI Informatik, Campus E1.4, 66123 Saarbrücken, Germany, [email protected]

Eidesstattliche Versicherung

Hiermit versichere ich an Eides statt, dass ich die vorliegende Arbeit selbstständig und ohne Benutzung anderer als der angegebenen Hilfsmittel angefertigt habe. Die aus anderen Quellen oder indirekt übernommenen Daten und Konzepte sind unter Angabe der Quelle gekennzeichnet. Die Arbeit wurde bisher weder im In- noch im Ausland in gleicher oder ähnlicher Form in einem Verfahren zur Erlangung eines akademischen Grades vorgelegt.

Saarbrücken, 22. Mai 2012

Peter M. Grosche

Acknowledgements

This work was supported by the DFG Cluster of Excellence on “Multimodal Computing and Interaction” at Saarland University and the Max-Planck-Institut Informatik in Saarbrücken. I thank Prof. Dr. Meinard Müller for the opportunity to do challenging research in such an exciting field and Prof. Dr. Hans-Peter Seidel for providing an excellent research environment. Special thanks for support, advice, and encouragement go to all colleagues in the Multimedia Information Retrieval and Music Processing Group, all members of the Computer Graphics Department at MPII, the outstanding administrative staff of the Cluster of Excellence and AG4, and the members of the working group of Prof. Dr. Clausen, University of Bonn.

Für meine zwei Mädels.

Abstract

The goal of music information retrieval (MIR) is to develop novel strategies and techniques for organizing, exploring, accessing, and understanding music data in an efficient manner. The conversion of waveform-based audio data into semantically meaningful feature representations by the use of digital signal processing techniques is at the center of MIR and constitutes a difficult field of research because of the complexity and diversity of music signals. In this thesis, we introduce novel signal processing methods that allow for extracting musically meaningful information from audio signals. As main strategy, we exploit musical knowledge about the signals’ properties to derive feature representations that show a significant degree of robustness against musical variations but still exhibit a high musical expressiveness. We apply this general strategy to three different areas of MIR: Firstly, we introduce novel techniques for extracting tempo and beat information, where we particularly consider challenging music with changing tempo and soft note onsets. Secondly, we present novel algorithms for the automated segmentation and analysis of folk song field recordings, where one has to cope with significant fluctuations in intonation and tempo as well as recording artifacts. Thirdly, we explore a cross-version approach to content-based music retrieval based on the query-by-example paradigm. In all three areas, we focus on application scenarios where strong musical variations make the extraction of musically meaningful information a challenging task.

Zusammenfassung

Ziel der automatisierten Musikverarbeitung ist die Entwicklung neuer Strategien und Techniken zur effizienten Organisation großer Musiksammlungen. Ein Schwerpunkt liegt in der Anwendung von Methoden der digitalen Signalverarbeitung zur Umwandlung von Audiosignalen in musikalisch aussagekräftige Merkmalsdarstellungen. Große Herausforderungen bei dieser Aufgabe ergeben sich aus der Komplexität und Vielschichtigkeit der Musiksignale. In dieser Arbeit werden neuartige Methoden vorgestellt, mit deren Hilfe musikalisch interpretierbare Information aus Musiksignalen extrahiert werden kann. Hierbei besteht eine grundlegende Strategie in der konsequenten Ausnutzung musikalischen Vorwissens, um Merkmalsdarstellungen abzuleiten, die zum einen ein hohes Maß an Robustheit gegenüber musikalischen Variationen und zum anderen eine hohe musikalische Ausdruckskraft besitzen. Dieses Prinzip wenden wir auf drei verschiedene Aufgabenstellungen an: Erstens stellen wir neuartige Ansätze zur Extraktion von Tempo- und Beat-Information aus Audiosignalen vor, die insbesondere auf anspruchsvolle Szenarien mit wechselndem Tempo und weichen Notenanfängen angewendet werden. Zweitens tragen wir mit neuartigen Algorithmen zur Segmentierung und Analyse von Feldaufnahmen von Volksliedern unter Vorliegen großer Intonationsschwankungen bei. Drittens entwickeln wir effiziente Verfahren zur inhaltsbasierten Suche in großen Datenbeständen mit dem Ziel, verschiedene Interpretationen eines Musikstückes zu detektieren. In allen betrachteten Szenarien richten wir unser Augenmerk insbesondere auf die Fälle, in denen auf Grund erheblicher musikalischer Variationen die Extraktion musikalisch aussagekräftiger Informationen eine große Herausforderung darstellt.

Contents

1 Introduction 1
  1.1 Contributions 3
  1.2 Included Publications 7
  1.3 Supplemental Publications 8

Part I Beat Tracking and Tempo Estimation 9

2 Predominant Local Pulse Estimation 11
  2.1 Related Work 12
  2.2 Overview of the PLP Concept 14
  2.3 Novelty Curve 14
  2.4 Tempogram 19
  2.5 Predominant Local Periodicity 20
  2.6 PLP Curve 21
  2.7 Discussion of Properties 22
  2.8 Iterative Refinement of Local Pulse Estimates 26
  2.9 Experiments 28
    2.9.1 Baseline Experiments 29
    2.9.2 Audio Datasets 34
    2.9.3 Tempo Estimation Experiments 35
    2.9.4 Confidence and Limitations 36
    2.9.5 Dynamic Programming Beat Tracking 38
    2.9.6 Beat Tracking Experiments 39
    2.9.7 Context-Sensitive Evaluation 41
  2.10 Conclusion 43

3 A Case Study on Chopin Mazurkas 45
  3.1 Specification of the Beat Tracking Problem 46
  3.2 Five Mazurkas by Frédéric Chopin 47
  3.3 Beat Tracking Strategies 48
  3.4 Evaluation on the Beat Level 49
  3.5 Experimental Results 51
  3.6 Further Notes 55


4 Tempo-Related Audio Features 57
  4.1 Tempogram Representations 58
    4.1.1 Fourier Tempogram 58
    4.1.2 Autocorrelation Tempogram 60
  4.2 Cyclic Tempograms 60
  4.3 Applications to Music Segmentation 62
  4.4 Further Notes 63

Part II Music Segmentation 65

5 Reference-Based Folk Song Segmentation 67
  5.1 Background on Folk Song Research 68
  5.2 Chroma-Based Audio Features 70
  5.3 Distance Function 73
  5.4 Segmentation of the Audio Recording 74
  5.5 Enhancement Strategies 74
    5.5.1 F0-Enhanced Chromagrams 75
    5.5.2 Transposition-Invariant Distance Function 76
    5.5.3 Fluctuation-Invariant Distance Function 77
  5.6 Experiments 77
  5.7 Further Notes 79

6 Reference-Free Folk Song Segmentation 81
  6.1 Self-Similarity Matrices 81
  6.2 Audio Thumbnailing and Segmentation Procedure 82
  6.3 Enhancement Strategies 84
    6.3.1 F0-Enhanced Self-Similarity Matrices 84
    6.3.2 Temporal Smoothing 84
    6.3.3 Thresholding and Normalization 85
    6.3.4 Transposition and Fluctuation Invariance 85
  6.4 Experiments 86
  6.5 Conclusion 87

7 Automated Analysis of Performance Variations 89
  7.1 Chroma Templates 90
  7.2 Folk Song Performance Analysis 95
  7.3 A User Interface for Folk Song Navigation 100
  7.4 Conclusion 101

Part III Audio Retrieval 103

8 Content-Based Music Retrieval 105
  8.1 Audio-Based Query-By-Example 107
  8.2 Audio Identification 109
  8.3 Audio Matching 112
  8.4 Version Identification 116
  8.5 Further Notes 119

9 Musically-Motivated Audio Fingerprints 121
  9.1 Modified Peak Fingerprints 123
  9.2 Experiments 125
    9.2.1 Dataset 125
    9.2.2 Synchronization of Fingerprints 126
    9.2.3 Experiment: Peak Consistency 126
    9.2.4 Experiment: Document-Based Retrieval 128
  9.3 Further Notes 130

10 Characteristic Audio Shingles 133
  10.1 Cross-Version Retrieval Strategy 134
  10.2 Tempo-Invariant Matching Strategies 135
  10.3 Experiments 135
    10.3.1 Dataset 136
    10.3.2 Evaluation of Query Length and Feature Resolution 136
    10.3.3 Evaluation of Matching Strategies 137
    10.3.4 Evaluation of Dimensionality Reduction 138
    10.3.5 Indexed-Based Retrieval by Locality Sensitive Hashing 139
  10.4 Conclusion 140

11 Conclusion of the Thesis 143

A Tempogram Toolbox 147

Bibliography 149

Chapter 1

Introduction

Music plays an exceptional role in our society. The everyday lives of billions of people worldwide are notably affected by the omnipresence of music, e.g., by its widespread use in mass media, its ubiquitous presence in public places, and its essential role in entertainment or social activities such as music creation and dance. In the last decades, the way music is produced, stored, accessed, distributed, and consumed has undergone a radical change. Nowadays, large music collections containing millions of audio documents in digital form are at any moment accessible from anywhere around the world. Personal music collections easily comprise tens of thousands of songs, adding up to over 1000 hours of playback time. Stored on portable audio devices, personal music collections have become the daily companion of many people. Such an abundance of digital music content, together with the relative ease of access, not only means that more music is consumed today than ever before, but also calls for novel strategies and modes of access that allow users to organize and explore large music collections as well as to discover novel songs and artists in a convenient and enjoyable way. As a consequence, information technology is now deeply interwoven with almost every aspect of music consumption and production. One main goal in the field of music information retrieval (MIR) is to develop tools that enrich the experience of users when interacting with music—be it for music production, music organization, music consumption, or music analysis. Intensive research has been conducted with the goal of developing automated methods for extracting musically meaningful information from music in all its different facets. As audio is the most natural form of music, the conversion of waveform-based audio data into semantically meaningful feature representations by the use of digital signal processing techniques is at the center of MIR. Music signal processing constitutes a difficult field of research because of the complexity and diversity of music signals. When dealing with specific audio domains such as speech or music, the understanding of acoustic, linguistic, and musical properties is of foremost importance for extracting meaningful and semantically interpretable information [125]. For example, language models play an outstanding role in speech processing and are an essential part of modern speech recognition systems. Similarly, music signals are by no means chaotic or random. Quite the contrary: music exhibits strong regularities, is highly structured, and follows certain “rules”. As a result, when analyzing music signals, one has to account for various musical dimensions such as pitch, harmony, timbre, and rhythm.


Exploiting musical knowledge and model assumptions, various mid-level representations have been proposed that robustly capture and reveal musically meaningful information concealed in the audio waveform. One key aspect of music, however, is that the rules are not strict but leave a lot of room for artistic freedom in the realization by a performer. In the case of strong musical variations, the model assumptions are often not completely satisfied or even violated. In such cases, the extraction of musically meaningful information becomes a very challenging problem. For example, the aspects of tempo and beat are of fundamental importance for understanding and interacting with music [139]. It is the beat, the steady pulse, that drives music forward and provides the temporal framework of a piece of music [166]. Intuitively, the beat can be described as a sequence of perceived pulses that are equally spaced in time. The beat corresponds to the pulse a human taps along to when listening to music [112]. The term tempo then refers to the rate of the pulse. When listening to a piece of music, most humans are able to tap to the musical beat without difficulty. Exploiting knowledge about beat and tempo, one can employ signal processing methods to transfer this cognitive process into an automated beat tracking system. Such a system can typically cope with modern pop and rock music with a strong beat and steady tempo, where the model assumptions are satisfied. For classical music, however, the rules are less strictly followed. Musicians do not play mechanically at a fixed tempo, but form their interpretation of a music piece by constantly changing the tempo, slowing down at certain positions, or accelerating to create tension. As a consequence, extracting the beat locations from highly expressive performances of, e.g., romantic piano music is a very challenging task. Another musical key concept is pitch. Pitch is a perceptual attribute which allows the ordering of sounds on a frequency-related scale extending from low to high [101; 103]. Exploiting the fact that most Western music is based on the equal-tempered scale, signal processing approaches allow for decomposing the signals into musically meaningful, logarithmically spaced frequency bands corresponding to the pitch scale [123]. Exploiting such musical knowledge about the frequency content, one again relies on the fact that the musicians stick to the rules—an unrealistic assumption. For example, in the case of field recordings of folk songs, one typically has to deal with recordings performed by non-professional elderly singers who have significant problems with intonation, with voices fluctuating by several semitones over the course of a song. In that scenario, imposing strict pitch model assumptions results in the extraction of meaningless audio features and requires a careful adaptation of the model assumptions to the actual musical content. The main challenge lies in incorporating robustness to musical variations without sacrificing the musical expressiveness of the feature representations. The overarching goal of this thesis is to introduce novel music signal processing methods that particularly address the key characteristics of music signals. Firstly, we exploit knowledge about the musical properties of the signals to derive compact and precise feature representations that reveal musically meaningful and highly expressive information.
Furthermore, in this thesis, we particularly focus on the challenging cases where musical variations lead to model assumptions that are not completely satisfied or even violated. As the main goal, we introduce compact feature representations that show a significantly increased robustness against musical variations but still exhibit a very high musical expressiveness.

1.1 Contributions

This thesis introduces various music signal processing approaches that contribute to three areas of MIR. Firstly, Part I of this thesis deals with the extraction of tempo and beat information, in particular for complex music with changing tempo and soft note onsets. Secondly, Part II contributes to the segmentation and performance analysis of field recordings of folk songs that are performed by singers with serious intonation problems under poor recording conditions. Thirdly, Part III of this thesis covers content-based music retrieval following the query-by-example paradigm. In particular, we address scalability issues in a cross-version retrieval scenario where strong musical variations occur. In Part I of the thesis, we address the aspects of tempo and beat. Because tempo and beat are of fundamental musical importance, the automated extraction of this information from music recordings is a central topic in the field of MIR. In recent years, various algorithmic solutions for automatically extracting beat positions from audio recordings have been proposed that can handle modern pop and rock music with a strong beat and steady tempo. For non-percussive music with soft note onsets, however, the extraction of beat and tempo information becomes a difficult problem. The detection of local periodic patterns in the presence of tempo changes, as typically occur in highly expressive performances of, e.g., romantic piano music, is even more challenging. In Chapter 2, as the first contribution of Part I, we introduce a novel mid-level representation that captures musically meaningful local pulse information even for the case of music exhibiting tempo changes. Our main idea is to derive for each time position a sinusoidal kernel that best explains the local periodic nature of a previously extracted (possibly very noisy) note onset representation. Then, we employ an overlap-add technique accumulating all these kernels over time to obtain a single function that reveals the predominant local pulse (PLP). Our concept introduces a high degree of robustness to noise and distortions resulting from weak and blurry onsets. Furthermore, the resulting PLP curve reveals the local pulse information even in the presence of continuous tempo changes and indicates a kind of confidence in the periodicity estimation. We show how our PLP concept can be used as a flexible tool for enhancing state-of-the-art tempo estimation and beat tracking procedures. The practical relevance is demonstrated by extensive experiments based on challenging music recordings of various genres. As it turns out, our PLP concept is capable of capturing continuous tempo changes as implied by ritardando or accelerando. However, especially in the case of expressive performances, current beat tracking approaches still have significant problems accurately capturing local tempo deviations and beat positions. In Chapter 3, as the second contribution of Part I, we introduce a novel evaluation framework for detecting critical passages in a piece of music that are prone to tracking errors. Our idea is to look for consistencies in the beat tracking results over multiple performances of the same underlying piece. Our investigation does not analyze beat tracking performance for entire recordings or even collections of recordings, but provides information about critical passages within a given piece where the tracking errors occur.
As another contribution, we further classify the critical passages by specifying musical properties of certain beats that frequently evoke tracking errors. Finally, considering three conceptually different beat tracking procedures, we conduct a case study on the basis of a challenging test set of five Chopin Mazurkas containing on average over 50 performances of each piece. Our experimental results not only make the limitations of state-of-the-art beat trackers explicit but also deepen the understanding of the underlying music material. The tempo and in particular the local changes of the tempo are a key characteristic of a music performance. Instead of playing mechanically, musicians speed up in some places and slow down in others in order to shape a piece of music. Furthermore, local changes of the tempo indicate boundaries of structural elements of music recordings. As indicated above, the detection of locally periodic patterns becomes a challenging problem when the music recording reveals significant tempo changes. Furthermore, the existence of various pulse levels such as measure, tactus, and tatum often makes the determination of absolute tempo problematic. In Chapter 4, as the third contribution of Part I, we generalize the concept of tempograms encoding local tempo information using two different methods for periodicity analysis. In particular, we avoid the error-prone determination of an explicit tempo value. As a result, the obtained mid-level representations are highly robust to extraction errors. As a further contribution, we introduce the concept of cyclic tempograms. Similar to the well-known chroma features where pitches differing by octaves are identified, we identify tempi differing by a power of two to derive the cyclic tempograms. The resulting mid-level representation robustly reveals local tempo characteristics of music signals in a compact form and is invariant to changes in the pulse level. In summary, the novel concepts introduced in Part I of the thesis enhance the state of the art in beat tracking and tempo estimation, in particular for complex music with significant musical variations, and give a better understanding of musical reasons for the shortcomings of current solutions. In Part II of this thesis, we are dealing with applications of music signal processing to automatically segmenting field recordings of folk songs. Generally, a folk song refers to a song that is sung by the common people of a region or culture during work or social activities. As a result, folk music is closely related to the musical culture of a specific nation or region. Even though folk songs have been passed down mainly by oral tradition, most musicologists study the relation between folk songs on the basis of score-based transcriptions. Due to the complexity of audio recordings, once the transcriptions are available, the original recorded tunes are often no longer considered, although they still may contain valuable information. It is the object of this part of the thesis to indicate how the original recordings can be made more easily accessible for folk song researchers and listeners by bridging the gap between the symbolic and the audio domain. In Chapter 5, as the first contribution of Part II, we introduce an automated approach for segmenting folk song recordings that consist of several repetitions of the same tune into their constituent stanzas.
As the main idea, we introduce a reference-based segmentation procedure that exploits the existence of a symbolically given transcription of an idealized stanza. Since the songs are performed by elderly non-professional singers under poor recording conditions, the main challenge arises from the fact that most singers often deviate significantly from the expected pitches and have serious problems with the intonation. Even worse, their voices often fluctuate by several semitones downwards or upwards across the various stanzas of the same recording. As one main contribution, we introduce a combination of robust audio features along with various cleaning and audio matching strategies to account for such deviations and inaccuracies in the audio recordings. As it turns out, the reference-based segmentation procedure yields accurate segmentation results even in the presence of strong deviations. However, one drawback of this approach is that it crucially depends on the availability of a manually generated reference transcription. In Chapter 6, as the second contribution of Part II, we introduce a reference-free segmentation procedure, which is driven by an audio thumbnailing procedure based on self-similarity matrices (SSMs). The main idea is to identify the most repetitive segment in a given recording, which can then take over the role of the reference transcription in the segmentation procedure. As a further contribution, for handling the strong temporal and spectral variations occurring in the field recordings, we introduce various enhancement strategies to absorb a high degree of these deviations and deformations already on the feature and SSM level. Our experiments show that the reference-free segmentation results are comparable to the ones obtained by the reference-based method. The generated relations and structural information can then be utilized to create novel navigation and retrieval interfaces which assist folk song researchers or listeners in conveniently accessing, comparing, and analyzing the audio recordings. Furthermore, the generated segmentations can also be used to automatically locate and capture interesting performance aspects that are lost in the notated form of the song. As the third contribution of Part II, in Chapter 7, various techniques are presented that allow for analyzing temporal and melodic variations within the stanzas of the recorded folk song material. It is important to note that variabilities and inconsistencies may be, to a significant extent, properties of the repertoire and not necessarily errors of the singers. To measure such deviations and variations within the acoustic audio material, we use a multimodal approach by exploiting the existence of a symbolically given transcription of an idealized stanza. Then, a novel method is proposed that allows for capturing temporal and melodic characteristics and variations of the various stanzas of a recorded song in a compact and semantically interpretable matrix representation, which we refer to as chroma template. In particular, the chroma templates reveal consistent and inconsistent aspects across the various stanzas of a recorded song. Altogether, our framework allows for capturing differences in various musical dimensions such as tempo, key, tuning, and melody.
As a further contribution, we present a user interface that assists folk song researchers in conveniently accessing, listening to, and in particular comparing the individual stanzas of a given field recording. In combination, the techniques presented in Part II of the thesis make the actual field recordings more accessible to folk song researchers and constitute a first step towards including the actual recordings and the enclosed performance aspects into folk song research. In Part III of the thesis, we are dealing with content-based music retrieval. The rapidly growing corpus of digital audio material requires novel retrieval strategies for exploring large music collections and discovering new music. Traditional retrieval strategies rely on metadata that describe the actual audio content in words. In the case that such textual descriptions are not available, one requires content-based retrieval strategies which only utilize the raw audio material. In Chapter 8, we give an overview of content-based retrieval strategies that follow the query-by-example paradigm: given an audio fragment as query, the task is to retrieve all documents that are somehow similar or related to the query from a music collection. Such strategies can be loosely classified according to their specificity, which refers to the degree of similarity between the query and the database documents. High specificity refers to a strict notion of similarity, whereas low specificity to a rather vague one. Furthermore, we introduce a second classification principle based on granularity, where one distinguishes between fragment-level and document-level retrieval. Using a classification scheme based on specificity and granularity, we identify various classes of retrieval scenarios, which comprise audio identification, audio matching, and version identification. For these three important classes, we give an overview of representative state-of-the-art approaches, which also illustrate the sometimes subtle but crucial differences between the retrieval scenarios. Finally, we give an outlook on a user-oriented retrieval system, which combines the various retrieval strategies in a unified framework. Furthermore, as the main technical contribution of Part III, we deal with the question of how to accelerate cross-version music retrieval. The general goal of cross-version music retrieval is to identify all versions of a given piece of music by means of a short query audio fragment. In particular, we address the fundamental issue of how to build efficient retrieval systems of lower specificity by employing indexing procedures that still exhibit a high degree of robustness against musical variations in the versions. In Chapter 9, we investigate to what extent well-established audio fingerprints, which aim at identifying a specific audio recording, can be modified to also deal with more musical variations between different versions of a piece of music. To this end, we exploit musical knowledge to replace the traditional peak fingerprints based on a spectrogram by peak fingerprints based on other more “musical” feature representations derived from the spectrogram. Our systematic experiments show that such modified peak fingerprints allow for a robust identification of different versions and performances of the same piece of music if the query length is at least 15 seconds.
This indicates that highly efficient audio fingerprinting techniques can also be applied to accelerate mid-specific retrieval tasks such as audio matching or cover song identification. In Chapter 10, we investigate how cross-version retrieval can be accelerated by employing index structures that are based on a shingling approach. To this end, the audio material is split up into small overlapping shingles that consist of short chroma feature subsequences. These shingles are indexed using locality sensitive hashing. Our main idea is to use a shingling approach where an individual shingle covers a relatively large portion of the audio material (between 10 and 30 seconds). Compared to short shingles, such large shingles have a higher musical relevance, so that a much lower number of shingles suffices to characterize a given piece of music. However, increasing the size of a shingle comes at the cost of increasing the dimensionality and possibly losing robustness to variations. We systematically investigate the delicate trade-off between the query length, feature parameters, shingle dimension, and index settings. In particular, we show that large shingles can still be indexed using locality sensitive hashing with only a small degradation in retrieval quality. In summary, the contributions of Part III of the thesis give valuable insights and indicate solutions that are of fundamental importance for building efficient cross-version retrieval systems that scale to millions of songs and at the same time exhibit a high degree of robustness against musical variations.
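To make the shingling idea concrete, the following Python/NumPy sketch shows how overlapping shingles could be formed from a chroma sequence; the function name, the shingle length, the hop size, and the unit-norm step are illustrative assumptions, and the subsequent locality-sensitive-hashing step is not shown.

```python
import numpy as np

def chroma_shingles(chroma, shingle_len=20, hop=10):
    """Stack overlapping subsequences of a chroma sequence (12 x N) into shingle vectors."""
    n_frames = chroma.shape[1]
    shingles = []
    for start in range(0, n_frames - shingle_len + 1, hop):
        vec = chroma[:, start:start + shingle_len].flatten(order="F")  # frame-by-frame stacking
        shingles.append(vec / (np.linalg.norm(vec) + 1e-9))            # unit norm for later indexing
    return np.array(shingles)

# With chroma features at 1 Hz resolution, shingle_len=20 corresponds to a 20-second shingle.
chroma = np.abs(np.random.randn(12, 120))      # placeholder features for a 2-minute version
print(chroma_shingles(chroma).shape)           # (11, 240): eleven 240-dimensional shingles
```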

1.2 Included Publications

The main contributions of this thesis have been previously published as articles in journals and conference proceedings related to the field of music signal processing. The contributions of Part I of the thesis have been presented in the publications [73; 72; 71] (related to Chapter 2), in [79] (Chapter 3), and in [77] (Chapter 4). Furthermore, the main functionality of the presented techniques has been released in the form of a MATLAB toolbox [74] (Appendix A).

[73] Peter Grosche and Meinard Müller. Extracting predominant local pulse information from music recordings. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1688–1701, 2011.

[71] Peter Grosche and Meinard Müller. Computing predominant local periodicity information in music recordings. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 33–36, New Paltz, NY, USA, 2009.

[72] Peter Grosche and Meinard Müller. A mid-level representation for capturing dominant tempo and pulse information in music recordings. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), pages 189–194, Kobe, Japan, 2009.

[79] Peter Grosche, Meinard Müller, and Craig Stuart Sapp. What makes beat tracking difficult? A case study on Chopin Mazurkas. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), pages 649–654, Utrecht, The Netherlands, 2010.

[77] Peter Grosche, Meinard Müller, and Frank Kurth. Cyclic tempogram – a mid-level tempo representation for music signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5522–5525, Dallas, Texas, USA, 2010.

[74] Peter Grosche and Meinard Müller. Tempogram Toolbox: MATLAB implementations for tempo and pulse analysis of music recordings. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), Miami, FL, USA, 2011, late-breaking contribution.

The contributions of Part II of the thesis have been presented in the publications [130; 132; 131; 128].

[130] Meinard Müller, Peter Grosche, and Frans Wiering. Robust segmentation and annotation of folk song recordings. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), pages 735–740, Kobe, Japan, 2009.

[131] Meinard Müller, Peter Grosche, and Frans Wiering. Towards automated processing of folk song recordings. In Eleanor Selfridge-Field, Frans Wiering, and Geraint A. Wiggins, editors, Knowledge Representation for Intelligent Music Processing, number 09051 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2009. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Germany.

[132] Meinard Müller, Peter Grosche, and Frans Wiering. Automated analysis of performance variations in folk song recordings. In Proceedings of the International Conference on Multimedia Information Retrieval (MIR), pages 247–256, Philadelphia, PA, USA, 2010.

[128] Meinard Müller and Peter Grosche. Automated segmentation of folk song field recordings. In Proceedings of the 10th ITG Conference on Speech Communication, Braunschweig, Germany, 2012.

The contributions of Part III of the thesis have been presented in the publications [80] (Chapter 8), [76] (Chapter 9), and [75] (Chapter 10).

[80] Peter Grosche, Meinard Müller, and Joan Serrà. Audio content-based music retrieval. In Meinard Müller, Masataka Goto, and Markus Schedl, editors, Multimodal Music Processing, volume 3 of Dagstuhl Follow-Ups, pages 157–174. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2012.

[76] Peter Grosche and Meinard Müller. Toward musically-motivated audio fingerprints. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 93–96, Kyoto, Japan, 2012.

[75] Peter Grosche and Meinard Müller. Toward characteristic audio shingles for efficient cross-version music retrieval. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 473–476, Kyoto, Japan, 2012.

1.3 Supplemental Publications

The following publications by the author are also related to music signal processing but are not further considered in this thesis.

[163] Joan Serrà, Meinard Müller, Peter Grosche, and Josep Lluis Arcos. Unsupervised detection of music boundaries by time series structure features. In Proceedings of the AAAI International Conference on Artificial Intelligence, Toronto, Ontario, Canada, 2012.

[81] Peter Grosche, Björn Schuller, Meinard Müller, and Gerhard Rigoll. Automatic transcription of recorded music. Acta Acustica united with Acustica, 98(2):199–215, 2012.

[93] Nanzhu Jiang, Peter Grosche, Verena Konz, and Meinard Müller. Analyzing chroma feature types for automated chord recognition. In Proceedings of the 42nd AES Conference on Semantic Audio, Ilmenau, Germany, 2011.

[129] Meinard Müller, Peter Grosche, and Nanzhu Jiang. A segment-based fitness measure for capturing repetitive structures of music recordings. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), pages 615–620, Miami, FL, USA, 2011.

[156] Hendrik Schreiber, Peter Grosche, and Meinard Müller. A re-ordering strategy for accelerating index-based audio fingerprinting. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), pages 127–132, Miami, FL, USA, 2011.

[78] Peter Grosche, Meinard Müller, and Frank Kurth. Tempobasierte Segmentierung von Musikaufnahmen. In Proceedings of the 36th Deutsche Jahrestagung für Akustik (DAGA), Berlin, Germany, 2010.

[49] Sebastian Ewert, Meinard Müller, and Peter Grosche. High resolution audio synchronization using chroma onset features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1869–1872, Taipei, Taiwan, 2009.

Part I Beat Tracking and Tempo Estimation

Chapter 2

Predominant Local Pulse Estimation

Most approaches to tempo estimation and beat tracking proceed in two steps. In the first step, positions of note onsets within the music signal are estimated. Here, most approaches capture changes of the signal’s energy or spectrum and derive a so-called novelty curve. The peaks of such a curve yield good indicators for note onset candidates [7; 23; 189]. In the second step, the novelty curve is analyzed to detect reoccurring patterns and quasi-periodic pulse trains [154; 146; 31; 44]. For non-percussive music with soft note onsets, however, novelty curves provide noisy and irregular information about onset candidates, which makes the extraction of beat and tempo information a difficult problem. The detection of local periodic patterns in the presence of tempo changes is even more challenging. In this chapter, we introduce a novel approach that allows for a robust extraction of musically meaningful local pulse information even for the case of complex music. Intuitively speaking, our idea is to construct a mid-level representation that explains the local periodic nature of a given (possibly very noisy) onset representation without determining explicit note onset positions. More precisely, starting with a novelty curve, we determine for each time position a sinusoidal kernel that best captures the local peak structure of the novelty curve. Since these kernels localize well in time, even continuous tempo variations and local changes of the pulse level can be handled. Now, instead of looking at the local kernels individually, our crucial idea is to employ an overlap-add technique by accumulating all local kernels over time. As a result, one obtains a single curve that can be regarded as a local periodicity enhancement of the original novelty curve. Revealing predominant local pulse (PLP) information, this curve is referred to as the PLP curve. Our PLP concept yields a powerful mid-level representation that can be applied as a flexible tool for various music analysis tasks. In particular, we discuss in detail how the PLP concept can be applied to improve tempo estimation as well as to validate the local tempo estimates. Furthermore, we show that state-of-the-art beat trackers can be improved when using a PLP-enhanced novelty representation. Here, one important feature of our work is that we particularly consider music recordings that reveal changes in tempo, whereas most of the previous tempo estimation and beat tracking approaches assume a


(more or less) constant tempo throughout the recording. As it turns out, our PLP concept is capable of capturing continuous tempo changes as implied by ritardando or accelerando. However, as our approach relies on the assumption of a locally quasi-periodic behavior of the signal, it reaches its limits in the presence of strong local tempo distortions as found in highly expressive music (e.g., romantic piano music). To demonstrate the practical relevance of our PLP concept, we have conducted extensive experiments based on several music datasets consisting of 688 recordings amounting to more than 36 hours of annotated audio material. The datasets cover various genres including popular, jazz, and classical music. The remainder of this chapter is organized as follows. In Section 2.1, we review related work and discuss relevant state-of-the-art concepts. In Section 2.2, we then give an overview of our PLP concept. Subsequently, we elaborate on the mathematical details of our variant of a novelty curve (Section 2.3), tempograms (Section 2.4), the determination of the optimal periodicity kernels (Section 2.5), and the computation of the PLP curves (Section 2.6). Then, we discuss general properties of PLP curves (Section 2.7) and describe an iterative approach (Section 2.8). The applications to tempo estimation and beat tracking as well as the corresponding experiments are discussed in Section 2.9. Conclusions of this chapter are given in Section 2.10.

2.1 Related Work

In general, the beat is a perceptual phenomenon, and perceived beat times do not necessarily coincide with physical beat times [42]. Furthermore, the perception of beats varies between listeners. However, beat positions typically go along with note onsets or percussive events. Therefore, in most tempo and beat tracking approaches, the first step consists in locating such events in the given signal—a task often referred to as onset detection or novelty detection. To determine the physical starting times of the note onsets occurring in the music recording, the general idea is to capture changes of certain properties of the signal to derive a novelty curve. The peaks of this curve indicate candidates for note onsets. Many different methods for computing novelty curves have been proposed, see [7; 23; 39] for an overview. When playing a note, the onset typically goes along with a sudden increase of the signal’s energy. In the case of a pronounced attack phase, note onset candidates may be determined by locating time positions where the signal’s amplitude envelope starts to increase [7; 67]. Much more challenging, however, is the detection of onsets in the case of non-percussive music, where one has to deal with soft onsets or blurred note transitions. This is often the case for classical music dominated by string instruments. As a result, more refined methods have to be used for computing a novelty curve, e.g., by analyzing the signal’s spectral content [88; 7; 189; 50], pitch [88; 189; 24], harmony [47; 61], or phase [88; 7; 86]. To handle the variety of signal types, a combination of novelty curves and signal features can improve the detection accuracy [88; 35; 189; 169; 50]. Supervised classification approaches have also been proposed [108; 50]. Furthermore, in complex polyphonic mixtures of music, simultaneously occurring events of high intensities lead to masking effects that prevent any observation of an energy increase of a low-intensity onset. To circumvent these masking effects, detection functions have been proposed that analyze the signal in a bandwise fashion [100] to extract transients occurring in certain frequency regions of the signal. As a side effect of a sudden energy increase, there appears an accompanying broadband noise burst in the signal’s spectrum. This effect is mostly masked by the signal’s energy in lower frequency regions but is well detectable in the higher frequency regions [118] of the spectrum. Here, logarithmic compression [100] and spectral whitening [167] are techniques for enhancing the high-frequency information. Some of these approaches are employed for computing our novelty curves, see Section 2.3. To derive the beat period and the tempo from a novelty curve, one strategy is to explicitly determine note onset positions and then to reveal the structure of these events. For the selection of onset candidates, one typically employs peak picking strategies based on adaptive thresholding [7]. Each pair of note onset positions then defines an inter-onset interval (IOI). Considering suitable histograms or probabilities of the occurring IOIs, one may derive hypotheses on the beat period and tempo [40; 37; 67; 159; 33]. The idea is that IOIs frequently appear at integer multiples and fractions of the beat period. Similarly, one may compute the autocorrelation of the extracted onset times [61] to derive the beat period. The drawback of these approaches is that they rely on an explicit localization of a discrete set of note onsets—a fragile and error-prone step.
In particular, in the case of weak and blurry onsets, the selection of the relevant peaks of the novelty curve that correspond to true note onsets becomes a difficult or even infeasible problem. Avoiding the explicit extraction of note onsets, the novelty curves can be directly analyzed with respect to reoccurring or quasi-periodic patterns. Here, generally speaking, one can distinguish between three different methods for measuring periodicities. The autocorrelation method allows for detecting periodic self-similarities by comparing a novelty curve with time-shifted copies [31; 44; 145; 146; 160; 36]. Another widely used method is based on a bank of comb filter resonators, where a novelty curve is compared with templates consisting of equally spaced spikes representing various frequencies [102; 154]. Similarly, one can use a short-time Fourier transform [146; 147; 187] or a non-stationary Gabor transform [89] to derive a frequency representation of the novelty curve. Here, the novelty curve is compared with sinusoidal templates representing specific frequencies. Each of the methods reveals periodicities of the underlying novelty curve, from which one can estimate the tempo or beat. The characteristics of the periodicities typically change over time and can be visualized by means of spectrogram-like representations referred to as tempogram [21], rhythmogram [92], or beat spectrogram [54]. The detection of periodic patterns becomes more challenging when the music recordings reveal significant tempo changes. This often occurs in performances of classical music as a result of ritardandi, accelerandi, fermatas, and so on [37]. Furthermore, the extraction problem is complicated by the fact that the notions of tempo and beat are ill-defined and highly subjective due to the complex hierarchical structure of rhythm [139; 66]. For example, there are various levels that contribute to the human perception of tempo and beat. Typically, previous work focuses on determining musical pulses on the tactus (the foot tapping rate or beat [112]) level [44; 146; 31], but only a few approaches exist for analyzing the signal on the measure level [61; 102; 148; 137] or the finer tatum level [159; 141; 34]. The tatum or temporal atom refers to the fastest repetition rate of musically meaningful accents occurring in the signal [13]. Various approaches have been suggested that simultaneously analyze different pulse levels [148; 160; 27; 68; 102].
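As a small illustration of the autocorrelation method mentioned above, the following Python/NumPy sketch estimates a single global tempo value from a novelty curve. The function name, the mean removal, and the tempo range handling are assumptions made for the sake of the example and do not correspond to any particular cited system.

```python
import numpy as np

def tempo_from_autocorrelation(novelty, feature_rate, tempo_range=(30, 600)):
    """Estimate one global tempo (in BPM) from a novelty curve via autocorrelation."""
    novelty = novelty - np.mean(novelty)
    ac = np.correlate(novelty, novelty, mode="full")[len(novelty) - 1:]  # non-negative lags
    lags = np.arange(1, len(ac))
    tempi = 60.0 * feature_rate / lags                 # lag (frames) -> tempo (BPM)
    valid = (tempi >= tempo_range[0]) & (tempi <= tempo_range[1])
    best_lag = lags[valid][np.argmax(ac[1:][valid])]   # strongest periodicity in range
    return 60.0 * feature_rate / best_lag

# Synthetic check: impulses every 0.5 s at a feature rate of 100 Hz correspond to 120 BPM.
rate = 100
nov = np.zeros(10 * rate)
nov[::rate // 2] = 1.0
print(tempo_from_autocorrelation(nov, rate))           # -> 120.0
```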

Figure 2.1: Flowchart of the steps involved in the PLP computation: audio signal → novelty curve → tempogram (local spectral analysis) → sinusoidal kernels (predominant periodicity estimation) → accumulated curve (overlap-add synthesis) → PLP curve (half-wave rectification).

2.2 Overview of the PLP Concept

We now give an overview of the steps involved in the PLP computation, see Figure 2.1 for a schematic overview and Figure 2.2 for an example. The input of our procedure consists of a spike-like novelty curve, see Figure 2.2a. In the first step, we derive a time- pulse representation, referred to as tempogram, by performing a local spectral analysis of the novelty curve, see Figure 2.2b. Here, we avoid the explicit determination of note onsets, which generally is an error-prone and fragile step. Then, from the tempogram, we determine for each time position the sinusoidal periodicity kernel that best explains the local periodic nature of the novelty curve in terms of period (frequency) and timing (phase), see Figure 2.2c. Since there may be a number of outliers among these kernels, one usually obtains unstable information when looking at these kernels in a one-by-one fashion. Therefore, as one main idea of our approach, we use an overlap-add technique by accumulating all these kernels over time to obtain a single curve, see Figure 2.3b. In a final step, we apply a half-wave rectification (only considering the positive part of the curve) to obtain the mid-level representation we refer to as predominant local pulse (PLP) curve, see Figure 2.3c. As it turns out, such PLP curves are robust to outliers and reveal musically meaningful periodicity information even when starting with relatively poor onset information.
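The following Python/NumPy sketch summarizes this pipeline in a simplified form. It is a minimal sketch under assumed parameter settings (the 3-second kernel size and the 5 Hz kernel rate are taken from Figure 2.2 and Section 2.4, everything else is illustrative) and is not the implementation released in the Tempogram Toolbox (Appendix A).

```python
import numpy as np

def plp_curve(novelty, feature_rate, tempi=np.arange(30, 601), kernel_len_sec=3.0):
    """Sketch of the PLP pipeline: local tempogram -> optimal kernels -> overlap-add."""
    N = int(kernel_len_sec * feature_rate) // 2
    n = np.arange(-N, N + 1)
    window = np.hanning(2 * N + 1)
    hop = max(1, int(feature_rate / 5))            # kernel rate of roughly 5 Hz (assumed)
    omegas = tempi / 60.0 / feature_rate           # tempo in cycles per novelty frame
    accumulation = np.zeros(len(novelty))
    for t in range(0, len(novelty), hop):
        idx = t + n
        valid = (idx >= 0) & (idx < len(novelty))
        seg = np.zeros(2 * N + 1)
        seg[valid] = novelty[idx[valid]]
        # local Fourier coefficients for all candidate tempi (cf. Section 2.4)
        F = (seg * window) @ np.exp(-2j * np.pi * np.outer(idx, omegas))
        k = int(np.argmax(np.abs(F)))              # predominant local periodicity
        kernel = window * np.cos(2 * np.pi * omegas[k] * idx + np.angle(F[k]))
        accumulation[idx[valid]] += kernel[valid]  # overlap-add of the optimal kernel
    return np.maximum(accumulation, 0.0)           # half-wave rectification -> PLP curve
```

The choice of the phase as the angle of the maximizing Fourier coefficient makes the windowed cosine the best-correlating sinusoidal kernel for the given frame, which is the sense in which the kernel "explains" the local peak structure.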

2.3 Novelty Curve

Our PLP concept is based on a novelty curve as typically used for note onset detection tasks. We now describe the approach for computing novelty curves used in our experiments. In our variant, we combine ideas and fundamental concepts of various state-of-the-art


Figure 2.2: Illustration of the estimation of optimal periodicity kernels. (a) Novelty curve ∆. (b) Magnitude tempogram |T| with maxima (indicated by circles) shown at five time positions t. (c) Optimal sinusoidal kernels $\kappa_t$ (using a kernel size of 3 seconds) corresponding to the maxima. Note how the kernels capture the local peak structure of the novelty curve in terms of frequency and phase.

methods [7; 100; 102; 189]. Our novelty curve is particularly designed to also reveal meaningful note onset information for complex music, such as orchestral pieces dominated by string instruments. Note, however, that the particular design of the novelty curve is not the focus of this thesis. The mid-level representations introduced in the following are designed to work even for noisy novelty curves with a poor peak structure. Naturally, the overall result may be improved by employing more refined novelty curves as suggested in [88; 189; 50]. Recall from Section 2.1 that a note onset typically goes along with a sudden change of the signal’s energy and spectral content. In order to extract such changes, given a music recording,


Figure 2.3: Illustration of the PLP computation from the optimal periodicity kernels shown in Figure 2.2c. (a) Novelty curve ∆. (b) Accumulation of all kernels (overlap-add). (c) PLP curve Γ obtained after half-wave rectification.

a short-time Fourier transform is used to obtain a spectrogram

$X = (X(k,t))_{k,t}$

with k ∈ [1 : K] and t ∈ [1 : T ]. Here, K denotes the number of Fourier coefficients, T denotes the number of frames, and X(k,t) denotes the kth Fourier coefficient for time frame t. In our implementation, the discrete Fourier transforms are calculated over Hann-windowed frames of length 46 ms with 50% overlap. Consequently, each time parameter t corresponds to 23 ms of the audio recording. Note that the Fourier coefficients of X are linearly spaced on the frequency axis. Using suitable binning strategies, various approaches switch over to a logarithmically spaced frequency axis, e.g., by using mel-frequency bands or pitch bands, see [100]. Here, we keep the linear frequency axis, since it puts greater emphasis on the high-frequency regions of the signal, thus accentuating noise bursts that are typically visible in the high-frequency spectrum. Similar strategies for accentuating the high-frequency content for onset detection are proposed in [118; 23]. In the next step, we apply a logarithm to the magnitude spectrogram |X| of the signal, yielding Y := log(1 + C · |X|) for a suitable constant C > 1, see [100; 102]. Such a compression step not only accounts for the logarithmic sensation of sound intensity but also allows for adjusting the dynamic range of the signal to enhance the clarity of weaker transients, especially in the high-frequency regions. In our experiments, we use the value C = 1000, but our results as well as the findings reported by Klapuri et al. [102] show that the specific choice of C does not affect the final result in a substantial way. The effect of this compression step is illustrated by Figure 2.4 for a recording of Beethoven’s Fifth Symphony. Figure 2.4a shows a piano-reduced version of the first 12 measures of the score. The audio recording is an orchestral version conducted by Bernstein. Figure 2.4c shows the magnitude spectrogram |X| and Figure 2.4d the compressed spectrogram Y using C = 1000.
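As a small numerical illustration of this compression (with hypothetical magnitude values, not taken from the Beethoven recording):

```python
import numpy as np

# A weak spectral magnitude of 0.001 and a strong one of 0.5 differ by a factor of 500
# before compression, but only by a factor of about 9 after Y = log(1 + C * |X|) with
# C = 1000, which is what makes soft transients visible alongside strong ones.
C = 1000.0
for magnitude in (0.001, 0.5):
    print(magnitude, np.log(1.0 + C * magnitude))   # ~0.69 and ~6.22
```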


Figure 2.4: First 12 measures of Beethoven’s Symphony No. 5 (Op. 67). (a) Score representation (in a piano-reduced version). (b) Annotated reference onsets (for an orchestral audio recording conducted by Bernstein). (c) Magnitude spectrogram |X|. (d) Logarithmically compressed magnitude spectrogram Y. (e) Novelty curve $\bar{\Delta}$ and local mean (red curve). (f) Novelty curve ∆.

As a result of the logarithmic compression, events with low intensities are considerably enhanced in Y, especially in the high-frequency range. To obtain a novelty curve, we basically apply a first-order differentiator to compute the discrete temporal derivative of the compressed spectrum Y. In the following, we only consider note onsets (positive derivative) and not note offsets (negative derivative). Therefore, we sum up only over positive intensity changes to obtain the novelty function


Figure 2.5: Illustrating the effect of the logarithmic compression on the resulting novelty curves. (a) Novelty curve based on the magnitude spectrogram |X| (see Figure 2.4c). (b) Manually annotated reference onsets. (c) Novelty curve ∆ based on the logarithmically compressed magnitude spectrogram Y (see Figure 2.4d).

$\bar{\Delta} : [1 : T-1] \to \mathbb{R}$, defined by

$$\bar{\Delta}(t) := \sum_{k=1}^{K} |Y(k,t+1) - Y(k,t)|_{\geq 0} \qquad (2.1)$$

for t ∈ [1 : T − 1], where $|x|_{\geq 0} := x$ for a non-negative real number x and $|x|_{\geq 0} := 0$ for a negative real number x. Figure 2.4e shows the resulting curve for the Beethoven example. To obtain our final novelty function ∆, we subtract the local mean (red curve in Figure 2.4e) from $\bar{\Delta}$ and only keep the positive part (half-wave rectification), see Figure 2.4f. In our implementation, we actually use a higher-order smoothed differentiator [2]. Furthermore, we process the spectrum in a bandwise fashion using 5 bands. Similar to [154], these bands are logarithmically spaced and non-overlapping. Each band is roughly one octave wide. The lowest band covers the frequencies from 0 Hz to 500 Hz, the highest band from 4000 Hz to 11025 Hz. The resulting 5 novelty curves are summed up to yield the final novelty function. The resulting novelty curve for our Beethoven example reveals the note onset candidates in the form of impulse-like spikes. Actually, this piece constitutes a great challenge for onset detection as, besides very dominant note onsets in the fortissimo section at the beginning of the piece (measures 1-5), there are soft and blurred note onsets in the piano section, which is mainly played by strings (measures 6-12). This is also reflected by the novelty curve shown in Figure 2.4f. The strong onsets in the fortissimo section result in very pronounced peaks. The soft onsets in the piano section (seconds 8-13), however, are much more difficult to distinguish from the spurious peaks not related to any note onsets. In this context, the logarithmic compression plays a major role. Figure 2.5 compares the novelty curve ∆ with a novelty curve directly derived from the magnitude spectrogram |X| without applying a logarithmic compression. Omitting the logarithmic compression (Figure 2.5a) results in a very noisy novelty curve that does not reveal musically meaningful onset information in the piano section. The novelty curve ∆ (Figure 2.5c), however, still possesses a regular peak structure in the problematic sections. This clearly illustrates the benefits of the compression step. Note that the logarithmic compression of the spectrogram gives higher weight to an absolute intensity difference within a quiet region of the signal than within a louder region, which follows the psychoacoustic principle that a just-noticeable change in intensity is roughly proportional to the absolute intensity [51]. Furthermore, the compression leads to a better temporal localization of the onsets, because the highest relative slope of the attack phase approaches the actual onset position, and it noticeably reduces the influence of amplitude changes (e.g., tremolo) in high-intensity regions. Further examples of our novelty curve are discussed in Section 2.7. The variant of a novelty curve described in this section combines important design principles and ideas of various approaches proposed in the literature. The basic idea of considering temporal differences of a spectrogram representation is well known from the spectral flux novelty curve, see [7]. This strategy works particularly well for percussive note onsets but is not suitable for less pronounced onsets (see Figure 2.5a). One well-known variant of the spectral flux strategy is the complex domain method proposed in [8]. Here, magnitude and phase information is combined in a single novelty curve to emphasize weak note onsets and smooth note transitions.
In our experiments, the logarithmic compression has a similar effect as jointly considering magnitude and phase, but it showed more robust results in many examples. Another advantage of our approach is that the compression constant C allows for adjusting the degree of compression. Combining magnitude compression and phase information did not lead to a further increase in robustness.
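To make these steps concrete, the following Python sketch (using NumPy) implements a simplified version of the novelty computation: logarithmic compression of a given magnitude spectrogram, a first-order temporal difference with half-wave rectification as in Eq. (2.1), and subtraction of a local mean. It omits the higher-order smoothed differentiator and the bandwise processing described above; the function name and the default values for C and the local-average length are illustrative assumptions, not taken from the text.

```python
import numpy as np

def novelty_curve(mag_spec, C=1000.0, local_avg_len=11):
    """Spectral-flux novelty with logarithmic compression (sketch).

    mag_spec      : magnitude spectrogram |X| of shape (K, T)
    C             : compression constant (illustrative default)
    local_avg_len : length (in frames) of the local-average filter
    """
    # Logarithmic compression Y = log(1 + C * |X|)
    Y = np.log1p(C * mag_spec)
    # Discrete temporal difference, keeping only positive values (Eq. (2.1))
    diff = np.diff(Y, axis=1)
    diff[diff < 0] = 0.0
    delta_bar = diff.sum(axis=0)
    # Subtract the local mean and half-wave rectify
    kernel = np.ones(local_avg_len) / local_avg_len
    local_mean = np.convolve(delta_bar, kernel, mode='same')
    return np.maximum(delta_bar - local_mean, 0.0)
```

In practice, the compression constant and the local-average length would be tuned together with the feature rate of the underlying spectrogram.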

2.4 Tempogram

A novelty curve typically reveals the note onset candidates in the form of impulse-like spikes. Because of extraction errors and local tempo variations, the spikes may be noisy and irregularly spaced over time. When dealing with such spiky novelty curves, autocorrelation methods [44] as well as comb filter techniques [154] may have difficulties in capturing the quasi-periodic information. This is due to the fact that spiky structures are hard to identify by means of spiky analysis functions in the presence of irregularities. In such cases, smoothly spread analysis functions such as sinusoids are better suited to detect locally distorted quasi-periodic patterns. Therefore, similar to [146], we use a short-time Fourier transform to analyze the local periodic structure of the novelty curves. The novelty curve as described in Section 2.3 is simply a function ∆ : [1 : T ] → R indicating note onset candidates in the form of peaks, where [1 : T ] := {1, 2, . . . , T }, for some T ∈ N, represents the sampled time axis with respect to a fixed sampling rate. To avoid boundary problems, we assume that ∆ is defined on Z by setting ∆(t) := 0 for t ∈ Z \ [1 : T ]. Furthermore, we fix a window function W : Z → R centered at t = 0 with support [−N : N] for some N ∈ N. In the following, we use a Hann window of size 2N + 1,

which is normalized to yield $\sum_{t \in \mathbb{Z}} W(s-t) = 1$ for all s ∈ [1 : T ]. Then, for a frequency parameter ω ∈ R≥0, the complex Fourier coefficient F(t, ω) is defined by

$$\mathcal{F}(t,\omega) = \sum_{n \in \mathbb{Z}} \Delta(n) \cdot W(n-t) \cdot e^{-2\pi i \omega n}. \qquad\qquad (2.2)$$

Note that the frequency ω corresponds to the period 1/ω. In the context of music, we rather think of tempo measured in beats per minute (BPM) than of frequency measured in Hertz (Hz). Therefore, we use a tempo parameter τ satisfying the equation τ = 60 · ω. Similar to a spectrogram, which yields a time-frequency representation, a tempogram is a two-dimensional time-pulse representation indicating the strength of a local pulse over time, see also [21; 146]. Here, intuitively, a pulse can be thought of as a periodic sequence of accents, spikes, or impulses. We specify the periodicity of a pulse in terms of a tempo value (in BPM). Now, let Θ ⊂ R>0 be a finite set of tempo parameters. Then, we model a tempogram as a function T : [1 : T ] × Θ → C defined by

$$\mathcal{T}(t,\tau) = \mathcal{F}(t, \tau/60). \qquad\qquad (2.3)$$

For an example, we refer to Figure 2.2b, which shows the magnitude tempogram |T | for the novelty curve shown in Figure 2.2a. Intuitively, the magnitude tempogram indicates for each time position how well the novelty curve can be locally represented by a pulse track of a given tempo. Note that the complex-valued tempogram contains not only magnitude information, but phase information as well. In our experiments, we mostly compute T using the set Θ = [30 : 600] covering the (integer) musical tempi between 30 and 600 BPM. Here, the bounds are motivated by the assumption that only events showing a temporal separation between roughly 100 ms (600 BPM) and 2 seconds (30 BPM) contribute to the perception of tempo [139]. This tempo range requires a spectral analysis of high resolution in the lower frequency range. Therefore, a straightforward FFT is not suitable. However, since only relatively few frequency bands (tempo values) are needed for the tempogram, computing the required Fourier coefficients individually according to Eq. (2.2) still has a reasonable computational complexity. Typically, we set W to be a Hann window with the size 2N + 1 corresponding to 4-12 seconds of the audio. The overlap of adjacent windows is adjusted to yield a frame rate of 5 Hz (five frames per second). For a more detailed explanation and a general overview of different tempogram representations, we refer to Chapter 4.
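A direct way to compute the discrete tempogram of Eqs. (2.2)-(2.3) is to evaluate the Fourier coefficients individually for the few required tempo values. The sketch below (NumPy, hypothetical function name and defaults) does this for a novelty curve sampled at a fixed feature rate; for simplicity, the window is normalized by its sum rather than by the partition-of-unity condition stated above, which only changes the overall scaling of |T|.

```python
import numpy as np

def fourier_tempogram(novelty, fs_nov, tempi_bpm=np.arange(30, 601),
                      win_len_sec=8.0, hop_sec=0.2):
    """Complex Fourier tempogram T(t, tau) = F(t, tau/60), cf. Eqs. (2.2)-(2.3)."""
    N = int(win_len_sec * fs_nov) // 2
    win = np.hanning(2 * N + 1)
    win = win / win.sum()                      # simple normalization (see lead-in)
    hop = max(1, int(hop_sec * fs_nov))        # 0.2 s hop -> 5 Hz tempogram frame rate
    pad = np.concatenate([np.zeros(N), novelty, np.zeros(N)])
    centers = np.arange(0, len(novelty), hop)  # kernel centers in novelty frames
    omega = tempi_bpm / 60.0 / fs_nov          # tempo converted to cycles per frame
    T = np.zeros((len(tempi_bpm), len(centers)), dtype=complex)
    for j, t in enumerate(centers):
        seg = pad[t:t + 2 * N + 1] * win       # windowed excerpt around frame t
        n = np.arange(t - N, t + N + 1)        # absolute frame indices
        T[:, j] = np.exp(-2j * np.pi * np.outer(omega, n)) @ seg
    return T, centers / fs_nov                 # tempogram and frame times (seconds)
```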

2.5 Predominant Local Periodicity

We now make use of both the magnitudes and the phases given by T to derive a mid-level representation that captures the predominant local pulse (PLP) of the underlying music signal. Here, the term predominant pulse refers to the pulse that is most noticeable in the novelty curve in terms of intensity. Furthermore, our representation is local in the sense that it yields the predominant pulse for each time position, thus making local tempo information explicit.

For each t ∈ [1 : T ] we compute the tempo parameter τt ∈ Θ that maximizes the magnitude of T (t, τ):

$$\tau_t := \operatorname*{argmax}_{\tau \in \Theta} |\mathcal{T}(t,\tau)|. \qquad\qquad (2.4)$$

As an example, Figure 2.2b shows the predominant local periodicity τt for five time positions t ∈ [1 : T ] of the magnitude tempogram. The corresponding phase ϕt is defined by [123]:

$$\varphi_t := \frac{1}{2\pi} \arccos\!\left(\frac{\operatorname{Re}(\mathcal{T}(t,\tau_t))}{|\mathcal{T}(t,\tau_t)|}\right). \qquad\qquad (2.5)$$

Using τt and ϕt, the optimal sinusoidal kernel κt : Z → R for t ∈ [1 : T ] is defined as the windowed sinusoid

$$\kappa_t(n) := W(n-t)\,\cos\!\big(2\pi(n \cdot \tau_t/60 - \varphi_t)\big) \qquad\qquad (2.6)$$

for n ∈ Z and the same window function W as used for the tempogram computation in Eq. (2.2). Figure 2.2c shows the five optimal sinusoidal kernels for the five time parameters indicated in Figure 2.2b using a Hann window of three seconds. Intuitively, the sinusoid κt best explains the local periodic nature of the novelty curve at time position t with respect to the set Θ. The period 60/τt corresponds to the predominant periodicity of the novelty curve and the phase information ϕt takes care of accurately aligning the maxima of κt and the peaks of the novelty curve. The properties of the kernels κt depend not only on the quality of the novelty curve, but also on the window size 2N + 1 of W and the set of frequencies Θ. Increasing the parameter N yields more robust estimates for τt at the cost of temporal flexibility. In the following, this duration is referred to as kernel size (KS) and is specified in seconds.
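The following sketch extracts, for a single tempogram column, the maximizing tempo of Eq. (2.4) and an optimal windowed sinusoid in the spirit of Eq. (2.6). It assumes the tempogram layout of the previous sketch; the phase is taken from the complex coefficient via np.angle, which recovers the same peak alignment as the arccos formulation of Eq. (2.5) up to sign conventions.

```python
import numpy as np

def optimal_kernel(T, tempi_bpm, j, center, win, fs_nov):
    """Predominant tempo and optimal sinusoidal kernel for tempogram column j.

    T      : complex tempogram, shape (len(tempi_bpm), num_columns)
    center : position of column j in novelty frames
    win    : window W used for the tempogram (length 2N+1)
    Returns the tempo tau_t (BPM), the kernel values, and their frame indices.
    """
    col = T[:, j]
    i = int(np.argmax(np.abs(col)))              # maximizing tempo, Eq. (2.4)
    tau_t = tempi_bpm[i]
    psi = np.angle(col[i])                       # phase of the best coefficient
    N = (len(win) - 1) // 2
    n = np.arange(center - N, center + N + 1)    # absolute novelty frames
    # Windowed sinusoid whose maxima align with the novelty peaks, cf. Eq. (2.6)
    kappa = win * np.cos(2 * np.pi * n * (tau_t / 60.0) / fs_nov + psi)
    return tau_t, kappa, n
```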

2.6 PLP Curve

The estimation of optimal periodicity kernels in regions with a strongly corrupted peak structure is problematic. This particularly holds in the case of small kernel sizes. To make the periodicity estimation more robust, our idea is to apply an overlap-add technique, where we accumulate these kernels over all time positions to form a single function instead of looking at the kernels in a one-by-one fashion. Furthermore, we only consider the positive part of the resulting curve (half-wave rectification). More precisely, we define a function Γ : [1 : T ] → R≥0 as follows:

$$\Gamma(n) = \Big|\sum_{t \in [1:T]} \kappa_t(n)\Big|_{\geq 0} \qquad\qquad (2.7)$$

for n ∈ [1 : T ], where |x|≥0 := x for a non-negative real number x and |x|≥0 := 0 for a negative real number x. The resulting function is our mid-level representation referred to as PLP curve. Figure 2.3b shows the accumulated curve for the five optimal periodicity kernels shown in Figure 2.2c. Note how the maxima of the periodicity kernels not only align well with the peaks of the novelty curve, but also with the maxima of neighboring kernels in the overlapping areas, which leads to constructive interference. Furthermore, note that, because of the normalization of the window W (see Section 2.4), the values of the accumulated curve lie in the interval [−1, 1], and a local maximum is close to the value one if and only if the overlapping kernels align well. From this, the final PLP curve Γ is obtained through half-wave rectification, see Figure 2.3c.
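Continuing the sketches above, the PLP curve of Eq. (2.7) can then be obtained by overlap-adding the optimal kernels of all tempogram columns and keeping only the positive part. The padded accumulation buffer and the reuse of the phase convention from the previous sketch are implementation choices, not prescribed by the text.

```python
import numpy as np

def plp_curve(T, tempi_bpm, centers, win, fs_nov, num_frames):
    """PLP curve via overlap-add of optimal windowed sinusoids, cf. Eq. (2.7)."""
    N = (len(win) - 1) // 2
    acc = np.zeros(num_frames + 2 * N)            # padded accumulation buffer
    for j, t in enumerate(centers):
        col = T[:, j]
        i = int(np.argmax(np.abs(col)))           # maximizing tempo, Eq. (2.4)
        tau, psi = tempi_bpm[i], np.angle(col[i])
        n = np.arange(t - N, t + N + 1)           # absolute novelty frames
        kappa = win * np.cos(2 * np.pi * n * (tau / 60.0) / fs_nov + psi)
        acc[n + N] += kappa                       # overlap-add of the kernels
    return np.maximum(acc[N:N + num_frames], 0.0) # half-wave rectification
```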

Note that taking the framewise maximum as in Eq. (2.4) has its assets and drawbacks. On the one hand, it allows the PLP curve to quickly adjust to even sudden changes in tempo and in the dominating pulse level, see Figure 2.6 for an example. On the other hand, taking the framewise maximum may lead to unwanted jumps such as random switches between tempo octaves in the tempo trajectory defined by the maximizing tempo parameter. Here, instead of simply using the context-independent framewise maximum, one may use optimization techniques based on dynamic programming to obtain a context-sensitive smooth tempo trajectory [146; 3]. Similarly, one may constrain the set Θ of tempo parameters in the maximization to cover only tempo parameters in a suitable neighborhood of an expected (average) tempo value. Because of the subsequent accumulation step, a small number of outliers does not affect the overall properties of the PLP curve. A larger number of outliers or unwanted switches between tempo octaves, however, may deteriorate the result. Our PLP framework allows for incorporating additional constraints and smoothness strategies in the kernel selection to adjust the properties of the resulting PLP curve according to the requirements of a specific application. The issue of kernel selection will be further discussed in Section 2.7 and Section 2.9.6.

2.7 Discussion of Properties

We now discuss various properties of PLP curves based on representative examples to demonstrate the benefits of our concept. For an extensive quantitative analysis, we refer to Section 2.9. As a first example, we consider the Waltz No. 2 from Dimitri Shostakovich's Suite for Variety Orchestra No. 1. Figure 2.6a shows an excerpt (measures 25 to 36) of a piano reduced version of the score of this piece. The audio recording in this example is an orchestral version conducted by Yablonsky. (The audio excerpt corresponding to measures 25 to 36 has a duration of ten seconds.) The manually annotated reference onset positions in this audio excerpt are indicated by the vertical lines in Figure 2.6b. The novelty curve for this excerpt is shown in Figure 2.6c. Note that the peaks of this curve strongly correlate with the onset positions. However, the first beats (downbeats) in this 3/4 waltz are played softly by non-percussive instruments, leading to relatively weak and blurred onsets, whereas the second and third beats are played staccato, supported by percussive instruments. As a result, the peaks of the novelty curve corresponding to downbeats are hardly visible or even missing, whereas peaks corresponding to the percussive beats are much more pronounced. Figure 2.6d shows the magnitude tempogram computed from the novelty curve using a kernel size KS = 3 sec. Obviously, this tempogram indicates a significant tempo at 210 BPM throughout the audio excerpt, which actually corresponds to the quarter note pulse (tactus level) of the piece. Note that this tempo is clearly indicated despite poor and missing peaks in the novelty curve. Furthermore, the magnitude tempogram additionally reveals high intensities at 420 BPM, which corresponds to the double tempo or eighth note pulse (tatum) level of the piece. Looking at the score, one would expect a predominant tempo that corresponds to the tactus level (score reveals quarter note pulse) for measures 25-28, 31/32, and 35/36 and to the tatum level (score reveals eighth note pulse) for measures 29/30 and 33/34. Indeed, this is exactly reflected by the lines


Figure 2.6: Excerpt of Shostakovich’s Waltz No.2 from the Suite for Variety Orchestra No.1. (a) Score representation of measures 25 to 36 (in a piano reduced version). (b) Annotated reference onsets (for an orchestral audio recording conducted by Yablonsky). (c) Novelty curve ∆. (d) Magnitude tempogram |T | using Θ = [30 : 600]. (e) Magnitude tempogram |T | with indication of the predominant tempo. (f) PLP curve Γ.

in Figure 2.6e, which indicate the predominant tempo (maximum intensity) for each time position. Note that there is a pulse level switch to the tatum level exactly at seconds 4-5 (measures 29/30) and seconds 7-8 (measures 33/34). The PLP curve Γ shown in Figure 2.6f is obtained from the local tempo estimates. Note that the predominant pulse positions are clearly indicated by the peaks of the PLP curve even though some of the expected peaks were missing in the original novelty curve. Also, the switches between the tactus and tatum level are captured by the PLP curve. In other words, the PLP curve can be regarded as a local periodicity enhancement of the original


Figure 2.7: Magnitude tempogram and resulting PLP curve using a constrained tempo set for the Shostakovich example shown in Figure 2.6. Left: Θ = [100 : 300] (quarter note tempo range). Right: Θ = [300 : 500] (eighth note tempo range).

novelty curve, where the predominant pulse level is taken into account. Although our concept is designed to reveal such locally predominant information, for some applications the local nature of these estimates might not be desirable. Actually, our PLP framework allows for incorporating prior knowledge on the expected tempo range to reveal information on different pulse levels. Here, the idea is to constrain the set Θ of tempo parameters in the maximization, see Eq. (2.4). For example, using a constrained set Θ = [100 : 300] instead of the original set Θ = [30 : 600], one obtains the tempogram and PLP curve shown in Figure 2.7 on the left. In this case, the PLP curve correctly reveals the quarter note (tactus) pulse positions with a tempo of 210 BPM. Similarly, using the set Θ = [300 : 500] reveals the eighth note (tatum) pulse positions and the corresponding tempo of 420 BPM, see Figure 2.7 on the right. In other words, in the case that there is a dominant pulse of (possibly varying) tempo within the specified tempo range Θ, the PLP curve yields a good pulse tracking on the corresponding pulse level.

As a second example, we again consider the orchestral version of Beethoven's Fifth Symphony conducted by Bernstein. Figure 2.8a shows the piano reduced version of the first 12 measures of the score. Recall that this piece constitutes a great challenge for onset detection as there are soft and blurred note onsets in the piano section, which is mainly played by strings. In particular, the height of a peak in the resulting novelty curve (see Figure 2.8b) is not necessarily a good indicator for the relevance of the peak. However, even though corrupted, the peak structure still possesses some local periodic regularities. These regularities are captured by the periodicity kernels and revealed in the magnitude tempogram shown in Figure 2.8c. Here, at the beginning (seconds 0 to 6), a tempo of roughly 280 BPM dominates the tempogram. During the second fermata (seconds 6-7) the tempogram does not show any pronounced tempo. However, in the piano section, the tempogram again indicates a dominating tempo of roughly 300 BPM, which actually corresponds to the eighth note pulse level. Finally, Figure 2.8d shows the PLP curve Γ. Note that the peaks of Γ align well with the musically relevant onset positions. While note onset positions in the fortissimo section can be directly determined from the original novelty curve, this becomes problematic for the onsets in the piano section. However,


Figure 2.8: First 12 measures of Beethoven's Symphony No. 5 (Op. 67). (a) Piano reduced version of the score. (b) Novelty curve ∆. (c) Magnitude tempogram |T |. (d) PLP curve Γ.

exploiting that the note onsets lie on a local rhythmic grid, the PLP curve is capable of capturing meaningful onset information even in the piano passage.

As another important property of our concept, a PLP curve not only reveals positions of predominant pulses but also indicates a kind of confidence in the estimation. Note that the amplitudes of the periodicity kernels do not depend on the amplitude of the novelty curve. This makes a PLP curve invariant under changes in dynamics of the underlying music signal. Recall that we estimate the periodicity kernels using a sliding window technique and add up the kernels over all considered time positions. Since neighboring kernels overlap, constructive and destructive interference phenomena in the overlapping regions influence the amplitude of the resulting PLP curve Γ. Consistent local tempo estimates result in consistent kernels, which in turn produce constructive interferences in the overlap-add synthesis. In such regions, the peaks of the PLP curve assume a value close to one. In contrast, random local tempo estimates result in inconsistent kernels, which in turn cause destructive interferences and lower values of Γ. In Figure 2.8d, this effect is visible in the fermata section (seconds 5 to 8). In Section 2.9.4, we show how this property of PLP curves can be used to detect problematic passages in audio recordings.

Finally, we give a first indication in which way our PLP concept is capable of capturing local tempo changes. To this end, we distinguish between two types of tempo changes. The first type concerns moderate and continuous tempo changes as typically implied by an accelerando or ritardando. To simulate such tempo changes, we generated a pulse train of increasing tempo in the first half and of decreasing tempo in the second half. Figure 2.9


Figure 2.9: Behavior of PLP curves under continuous tempo changes (accelerando, ritardando). (a) Impulse train of increasing tempo (80 to 160 BPM, first part) and decreasing tempo (160 to 80 BPM, second part). (b) Magnitude tempogram |T | for KS = 4 sec. (c) PLP curve.

shows the resulting novelty curve, the magnitude tempogram, and the PLP curve. As this example indicates, the PLP curve captures such continuous tempo changes well; even the amplitude of the PLP curve indicates a high confidence of the estimation. The second type concerns strong local tempo distortions as found in highly expressive music (e.g. romantic piano music). To simulate such tempo changes, we first generated a pulse train of constant tempo (160 BPM) and then locally displaced the impulses in a random fashion. Figure 2.10 shows the resulting novelty curve, the magnitude tempogram, and the PLP curve. As this example indicates, the PLP curve fails to capture such extreme distortions; note also the low confidence values. This is not surprising since our PLP concept relies on the assumption of a locally quasi-periodic behavior of the signal. Using a constrained tempo set Θ = [110 : 220] (similar effects are obtained by using a context-sensitive smooth tempo trajectory, see Section 2.9.6), one obtains an improved PLP curve as shown in Figure 2.10d. However, note that the quasi-periodically spaced peaks of the PLP curve often deviate from the real pulse positions. In Section 2.9, we will further discuss these issues using romantic piano music as an extreme example.

2.8 Iterative Refinement of Local Pulse Estimates

In this section, we indicate how the PLP concept can be applied in an iterative fashion to stabilize local tempo estimations. As an example, we consider Brahms's Hungarian Dance No. 5. Figure 2.11 shows a piano reduced score of measures 26-38. The audio recording is an orchestral version conducted by Ormandy. This excerpt is very challenging because of several abrupt changes in tempo. Additionally, the novelty curve is rather noisy because of many weak note onsets played by strings. Figure 2.11a-c show the extracted novelty curve, the tempogram, and the extracted tempo, respectively. Despite the poor note onset


Figure 2.10: Behavior of PLP curves under strong local tempo distortions. (a) Impulse train of constant tempo (160 BPM) with random local distortions. (b) Magnitude tempogram |T |. (c) PLP curve. (d) PLP curve for the constrained tempo set Θ = [110 : 220]. Ground-truth pulse positions are indicated by vertical lines.

information, the tempogram correctly captures the predominant eighth note pulse and the tempo for most time positions. A manual inspection reveals that the excerpt starts with a tempo of 180 BPM (measures 26-28, seconds 0-4), then abruptly changes to 280 BPM (measures 29-32, seconds 4-6), and continues with 150 BPM (measures 33-38, seconds 6-18). Due to the corrupted novelty curve and the rather diffuse tempogram, the extraction of the predominant sinusoidal kernels is problematic. However, accumulating all these kernels leads to an elimination of many of the extraction errors. The peaks of the resulting PLP curve Γ (Figure 2.12a) correctly indicate the musically relevant eighth note pulse positions in the novelty curve. Again, the lower amplitude of Γ in the region of the sudden tempo change indicates the lower confidence in the periodicity estimation. As noted above, PLP curves can be regarded as a periodicity enhancement of the original novelty curve. Based on this observation, we compute a second tempogram, now based on the PLP curve instead of the original novelty curve. Comparing the resulting tempogram (Figure 2.12b) with the original tempogram (Figure 2.11b), one can note a significant cleaning effect, where only the tempo information of the dominant pulse (and its harmonics) is maintained. This example shows how our PLP concept can be used in an iterative framework to stabilize local tempo estimations. Finally, Figure 2.13a shows the manually generated ground truth onsets as well as the resulting tempogram (using the onsets as an idealized novelty curve). Comparing the three tempograms and local tempo estimates of Figure 2.11, Figure 2.12,


Figure 2.11: Excerpt of an orchestral version of Brahms's Hungarian Dance No. 5, conducted by Ormandy. The score shows measures 26 to 38 in a piano reduced version. (a) Novelty curve ∆. (b) Tempogram derived from ∆ using KS = 4 sec. (c) Estimated tempo.

and Figure 2.13 again indicates the robustness of PLP curves to noisy input data and outliers.

2.9 Experiments

In the last sections, we have discussed various properties of the PLP concept by means of several challenging music examples. In this section, we report on various experiments to demonstrate how our PLP concept can be applied for improving and stabilizing tempo estimation and beat tracking. We start by describing two baseline experiments in the context of tempo estimation and note onset detection (Section 2.9.1). We then continue


Figure 2.12: PLP concept applied in an iterative fashion for the Brahms example as shown in Figure 2.11. (a) PLP curve Γ. (b) Tempogram derived from Γ using KS = 4 sec. (c) Estimated tempo.

by describing our datasets, which consist of real audio material and are used in the subsequent experiments (Section 2.9.2), report on our extensive tempo estimation experiments (Section 2.9.3), and show how our PLP concept can be used to measure the confidence of the estimated tempo values (Section 2.9.4). Subsequently, we address the task of beat tracking, which extends tempo estimation in the sense that it additionally considers the phase of the pulses. In Section 2.9.5, we start by reviewing a state-of-the-art beat tracker used in our experiments. Then we report on various experiments, showing that the combined usage of PLP curves with original novelty information significantly improves beat tracking results (Section 2.9.6). Finally, we introduce a novel beat tracking evaluation measure that considers beats in their temporal context (Section 2.9.7).

2.9.1 Baseline Experiments

Before describing our evaluation based on real audio data, we report on two baseline experiments. Firstly, we describe a baseline experiment using synthesized audio material, where we show that our PLP concept can locally estimate the tempo even in the presence of continuous tempo changes. This extends previous approaches to tempo estimation [47; 121], where often one global tempo for the entire recording is determined and used for the evaluation. Secondly, we describe a baseline experiment on note onset detection, showing that the PLP curves reveal musically meaningful pulse positions.


Figure 2.13: Ground truth representation for the Brahms example as shown in Figure 2.11 and Figure 2.12. (a) Ground-truth pulses. (b) Tempogram derived from these pulses using KS = 4 sec. (c) Estimated tempo.

Baseline Experiment to Tempo Estimation

In Section 2.7, we indicated that our PLP concept can handle continuous tempo changes, see also Figure 2.9. We now give a quantitative evaluation to confirm this property. To this end, we use a representative set of ten pieces from the RWC music database [64] consisting of five classical pieces, three jazz pieces, and two popular pieces, see Table 2.1 (first column). The pieces have different instrumentations containing percussive as well as non-percussive passages of high rhythmic complexity. Using the MIDI files supplied by [64], we manually determined the pulse level that dominates the piece (making the simplistic assumption that the predominant pulse does not change throughout the piece) and set the tempo to a constant value with regard to this pulse, see Table 2.1 (second and third columns). The resulting MIDI files are referred to as original MIDIs. To simulate continuous tempo changes as implied by accelerandi and ritardandi, we divided the original MIDIs into 20-second segments and alternately applied to each segment a continuous speed-up or slow-down (referred to as the warping procedure) so that the resulting tempo of the dominant pulse fluctuates between +30% and −30% of the original tempo. The resulting MIDI files are referred to as distorted MIDIs. Finally, audio files were generated from the original and distorted MIDIs using a high-quality synthesizer. To evaluate the tempo extraction capability of our PLP concept, we proceed as follows. Given an original MIDI, let τ denote the tempo and let Θ be the set of integer tempo parameters covering the tempo range of ±40% of the original tempo τ. This coarse tempo range reflects the prior knowledge of the respective pulse level (in this experiment, we do

                                original MIDI                  distorted MIDI
Piece   Tempo   Level        4      6      8     12         4      6      8     12
C003     360    1/16       74.5   81.6   83.7   85.4      73.9   81.1   83.3   86.2
C015     320    1/16       71.4   78.5   82.5   89.2      61.8   67.3   71.2   76.0
C022     240    1/8        95.9  100.0  100.0  100.0      95.0   98.1   99.4   89.2
C025     240    1/16       99.6  100.0  100.0  100.0      99.6  100.0  100.0   96.2
C044     180    1/8        95.7  100.0  100.0  100.0      82.6   85.4   77.4   59.8
J001     300    1/16       43.1   54.0   60.6   67.4      37.8   48.4   52.7   52.7
J038     360    1/12       98.6   99.7  100.0  100.0      99.2   99.8  100.0   96.7
J041     315    1/12       97.4   98.4   99.2   99.7      95.8   96.6   97.1   95.5
P031     260    1/8        92.2   93.0   93.6   94.7      92.7   93.7   93.9   93.5
P093     180    1/8        97.4  100.0  100.0  100.0      96.4  100.0  100.0  100.0
average:                   86.6   90.5   92.0   93.6      83.5   87.1   87.5   84.6
average (after iteration): 89.2   92.0   93.0   95.2      86.0   88.8   88.5   83.1

Table 2.1: Percentage of correctly estimated local tempi using original MIDI files (constant tempo) and distorted MIDI files for different kernel sizes KS = 4, 6, 8, 12 sec.

not want to deal with tempo octave confusions) and comprises the tempo values of the distorted MIDI. Based on Θ, we compute for each time position t the maximizing tempo parameter τt ∈ Θ as defined in Eq. (2.4) for the original MIDI using various kernel sizes. We consider the local tempo estimate τt correct if it falls within a 2% deviation of the original tempo τ. The left part of Table 2.1 shows the percentage of correctly estimated local tempi for each piece. Note that, even with a constant tempo, there are time positions with incorrect tempo estimates. Here, one reason is that for certain passages the pulse level or the onset information is not suited or simply not sufficient for yielding good local tempo estimations, e.g., caused by musical rests or local rhythmic offsets. For example, for the piece C003 (Beethoven's Fifth), the tempo estimation is correct for 74.5% of the time parameters when using a kernel size (KS) of 4 sec. Assuming a constant tempo, it is not surprising that the tempo estimation stabilizes when using a longer kernel. In the case of C003, the percentage increases to 85.4% for KS = 12 sec. In any case, the tempo estimates for the original MIDIs with constant tempo only serve as reference values for the second part of our experiment.

Using the distorted MIDIs, we again compute the maximizing tempo parameter τt ∈ Θ for each time position. Now, these values are compared to the time-dependent distorted tempo values that can be determined from the warping procedure. Analogous to the left part, the right part of Table 2.1 shows the percentage of correctly estimated local tempi for the distorted case. The crucial point is that even when using the distorted MIDIs, the quality of the tempo estimations only slightly decreases. For example, in the case of C003, the tempo estimation is correct for 73.9% of the time parameters when using a kernel size of 4 sec (compared to 74.5% in the original case). Averaging over all pieces, the percentage decreases from 86.6% (original MIDIs) to 83.5% (distorted MIDIs) for KS = 4 sec. This clearly demonstrates that our concept allows for capturing even significant tempo changes. As mentioned above, using longer kernels naturally stabilizes the tempo estimation in the case of constant tempo. This, however, does not hold for music with constantly changing tempo. For example, looking at the results for the distorted MIDI of C044 (Rimski-Korsakov, The Flight of the Bumble Bee), we can note a drop from 82.6% (4 sec kernel) to 59.8% (12 sec kernel).

Furthermore, we investigated the iterative approach already sketched for the Brahms example in Section 2.8. Here, we use the PLP curve as the basis for computing a second tempogram from which the tempo estimates are derived. As indicated by the last line of Table 2.1, this iteration indeed yields an improvement of the tempo estimation for the original as well as the distorted MIDI files. For example, in the distorted case with KS = 4 sec, the estimation rate rises from 83.5% (tempogram based on ∆) to 86.0% (tempogram based on Γ).

Baseline Experiment to Onset Detection

As discussed in Section 2.7, a PLP curve can be seen as a local periodicity enhancement of the original novelty curve, where the peaks of the PLP curve indicate likely pulse positions. We now describe an application of our PLP concept to the task of onset detection in order to evaluate the quality of the PLP peaks. Recall from Section 2.1 that most approaches for onset detection proceed in two steps. First, a novelty curve ∆ is extracted from the given music signal. Then, to determine the note onsets, one tries to locate the positions of relevant peaks of ∆. For music with soft note onsets or strong fluctuations such as vibrato, however, the discrimination of relevant and spurious peaks becomes nearly impossible. Here, additional model assumptions on the rhythmic nature of the music signal may be exploited to support note onset detection. The musical motivation is that the periodic structure of notes plays an important role in the sensation of note onsets. In particular, weak note onsets may only be perceptible within a rhythmic context. For example, in [34] a global rhythmic structure is determined in terms of inter-onset interval (IOI) statistics of previously extracted note onset candidates. Then, assuming constant tempo, this structure is used to determine the relevant onset positions.

Following these lines, we also exploit the quasi-periodic nature of note onsets in our PLP-based enhancement strategy. However, in our approach, we avoid the fragile peak picking step at an early stage. Furthermore, we do not presuppose a global rhythmic structure but only assume local periodicity, thus allowing tempo changes. More precisely, given a novelty curve ∆, we compute a PLP curve Γ. Being locally quasi-periodic, the peak positions of Γ define a grid of pulse positions where note onset positions are likely to occur. Furthermore, the peak structure of Γ is much more pronounced than in ∆. Actually, the peaks are simply the local maxima and peak picking becomes a straightforward task. Assuming that note onsets are likely to lie on the PLP-defined grid, we select all peak positions of Γ as detected onsets. We compare the pulse positions obtained from the PLP curve with the onsets extracted from the novelty curve using a peak picking strategy [7].

In the experiment, we use two evaluation datasets containing audio recordings along with manually labeled onset positions used as reference onsets. The first dataset is publicly available [113] and has been used in the evaluation of various onset detection algorithms, e.g., [189]. This dataset, in the following referred to as PUBLIC, consists of 242 seconds of audio (17 music excerpts of different genres) with 671 labeled onsets. The second dataset particularly contains classical music with soft onsets and significant tempo changes. This dataset, in the following referred to as PRIVATE, consists of 201 seconds of audio with 569 manually labeled onsets.
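Since the peaks of Γ are simply its local maxima, the PLP-based onset detector reduces to elementary peak picking. A minimal sketch, assuming a PLP curve sampled at the novelty feature rate:

```python
import numpy as np

def plp_onsets(gamma, fs_nov):
    """Onset candidates = local maxima of the PLP curve Gamma (sketch)."""
    left, mid, right = gamma[:-2], gamma[1:-1], gamma[2:]
    peaks = np.where((mid > left) & (mid >= right))[0] + 1   # interior local maxima
    return peaks / fs_nov                                    # onset times in seconds
```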

                        PUBLIC                       PRIVATE
Curve    KS         P      R      F             P      R      F
∆                 0.783  0.821  0.793         0.694  0.732  0.698
Γ       4 sec     0.591  0.933  0.695         0.588  0.913  0.679
Γ       6 sec     0.599  0.955  0.705         0.599  0.907  0.689
Γ       8 sec     0.597  0.944  0.701         0.588  0.877  0.674

Table 2.2: Mean precision P, recall R, and F-measure F values for the onset detection task using the novelty curve ∆ and PLP curves Γ of kernel sizes KS = 4, 6, 8 sec.

Following the MIREX 2011 Audio Onset Detection evaluation procedure1, each reference onset is considered a correct detection (CD) if there is a detected onset within an error tolerance of 50 ms, otherwise a false negative (FN). Each detected onset outside of all tolerance regions is called a false positive (FP). The corresponding numbers of onsets are denoted NCD, NFN, and NFP, respectively. From this one obtains the precision, recall, and F-measure defined by

$$P = \frac{N_{CD}}{N_{CD} + N_{FP}}, \qquad R = \frac{N_{CD}}{N_{CD} + N_{FN}}, \qquad F = \frac{2 \cdot P \cdot R}{P + R}. \qquad\qquad (2.8)$$

These values are computed separately for each piece. The final values are obtained by averaging over all pieces of the respective dataset. Table 2.2 shows the resulting average P, R, and F values for the original novelty curve ∆ as well as for the PLP curve Γ using periodicity kernels of different sizes. As the results show, our PLP concept indeed reveals musically meaningful pulse positions. For example, using the PLP curve Γ with a kernel size of 4 seconds instead of the original novelty curve ∆, the mean recall R increases from 0.821 to 0.933 for the PUBLIC set and from 0.732 to 0.913 for the PRIVATE set. This shows that a vast majority of the relevant note onsets indeed lie on the PLP-defined pulse grid. Especially for the PRIVATE set, the PLP curve allows for inferring a large number of soft note onsets that are missed when using the original novelty curve. On the other hand, the precision values for Γ are lower than those for ∆. This is not surprising, since in our PLP-based approach we select all peak positions of Γ. Even though most note onsets fall on PLP peaks, not all PLP peaks necessarily correspond to note onsets. For example, in the Shostakovich example shown in Figure 2.6, the PLP curve infers three onset positions for measures 31 and 35, respectively. In these measures, however, the second beats correspond to rests without any note onsets. Similarly, in our Beethoven example shown in Figure 2.8d, the fermata passage is periodically filled with non-relevant pulses. On the other hand, all relevant onsets in the soft piano section (seconds 8-13) are identified correctly. These string onsets cannot be recovered correctly using the original novelty curve. This experiment indicates that our PLP curve indeed reveals musically meaningful pulse positions. As a consequence, our concept allows for recovering soft note onsets.

Finally, we look at the influence of the kernel size on the detection quality. Here, note that most of the excerpts in the PUBLIC dataset have a constant tempo. Therefore, using a kernel size of 6 seconds instead of 4 seconds, the kernel estimation is more robust, leading to an increase in recall (from R = 0.933 to R = 0.955). In contrast, the PRIVATE dataset

1 http://www.music-ir.org/mirex/wiki/2011:Audio Onset Detection

Dataset         Audio [#]   Length [sec]   Beats [#]   Unannotated [sec]   Mean Tempo [BPM]   Std. Tempo [%]
BEATLES            179         28831         52729           1422               116.7               3.3
RWC-POP            100         24406         43659              0               111.7               1.1
RWC-JAZZ            50         13434         19021              0                89.7               4.5
RWC-CLASSIC         61         19741         32733            725               104.8              15.2
MAZURKA            298         45177         85163           1462               126.0              24.6

Table 2.3: The five beat-annotated datasets used in our experiments. The first four columns indicate the name, the number of audio recordings, the total length, and the total number of annotated beat positions.

contains classical music with many tempo changes. Here, kernels of smaller sizes are better suited for adjusting the local periodicity estimations to the changing tempo.
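To make the evaluation protocol of this baseline experiment concrete, the following sketch computes precision, recall, and F-measure according to Eq. (2.8) for one recording. It uses a simplified nearest-neighbor matching within the tolerance rather than the strict one-to-one matching of the MIREX protocol, so it is only an approximation of the official procedure; the function name and interface are illustrative.

```python
import numpy as np

def onset_prf(reference, detected, tol=0.05):
    """Precision, recall, F-measure with a +-tol (seconds) tolerance, cf. Eq. (2.8)."""
    reference = np.asarray(reference, dtype=float)
    detected = np.asarray(detected, dtype=float)
    if len(reference) == 0 or len(detected) == 0:
        return 0.0, 0.0, 0.0
    # Reference onsets with at least one detection within the tolerance
    dist_ref = np.min(np.abs(reference[:, None] - detected[None, :]), axis=1)
    n_cd = int(np.sum(dist_ref <= tol))
    n_fn = len(reference) - n_cd
    # Detections that are not within the tolerance of any reference onset
    dist_det = np.min(np.abs(detected[:, None] - reference[None, :]), axis=1)
    n_fp = int(np.sum(dist_det > tol))
    P = n_cd / (n_cd + n_fp) if (n_cd + n_fp) else 0.0
    R = n_cd / (n_cd + n_fn) if (n_cd + n_fn) else 0.0
    F = 2 * P * R / (P + R) if (P + R) else 0.0
    return P, R, F
```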

2.9.2 Audio Datasets

For our subsequent experiments and evaluations, we use five different datasets that consist of real audio recordings (as opposed to the synthesized audio material used in Section 2.9.1) and comprise music of various genres and complexities. For all audio recordings, manually generated beat annotations are available. The first collection BEATLES consists of the 12 studio albums by “The Beatles” containing a total number of 179 recordings2 of Rock/Pop music [119]. Furthermore, we use audio recordings from the RWC Music Database [64], which consists of subcollections of different genres. From this database, we use the three subcollections RWC-POP, RWC-CLASSIC, and RWC-JAZZ containing a total number of 211 recordings. Our fifth dataset MAZURKA consists of piano recordings taken from a collection of 2700 recorded performances of the 49 Mazurkas by Frédéric Chopin. These recordings were collected in the Mazurka Project3. For 298 of the 2700 recordings, manually generated beat annotations exist, which have been previously used for the purpose of performance analysis [152]. The dataset MAZURKA consists of exactly these 298 recordings (corresponding to five of the 49 Mazurkas).

Table 2.3 gives an overview of the five different datasets. The first four columns of the table indicate the name of the dataset, the number of contained audio recordings, the total length of all audio recordings, and the total number of annotated beat positions. Some recordings contain passages where no meaningful notion of a beat is perceivable. For example, the datasets MAZURKA and RWC-CLASSIC contain some audio files with long passages of silence. Furthermore, in BEATLES, some songs contain noise-like improvisational passages where the musicians refrain from following any rhythmic pattern. All these passages have not been annotated and are left unconsidered in our evaluation (if not stated otherwise). The fifth column (Unannotated) of Table 2.3 indicates the total length of the unannotated passages.

From the beat positions, one can directly derive the local tempo given in beats per minute

2 Actually, there are 180 songs, but for the song “Revolution 9” no annotations were available. This song is a collage of vocal and music sound without any meaningful notion of a beat.
3 http://mazurka.org.uk/

Dataset         4 sec   6 sec   8 sec   12 sec
BEATLES          94.1    95.4    95.9    96.3
RWC-POP          95.3    96.7    97.3    98.0
RWC-JAZZ         81.8    85.4    86.6    87.2
RWC-CLASSIC      70.4    70.9    70.3    68.7
MAZURKA          44.5    40.1    37.3    34.3

Table 2.4: Percentage of correctly estimated local tempi (±4% tolerance) for the five datasets using the kernel sizes KS = 4, 6, 8, 12 sec.

(BPM). The last two columns of Table 2.3 indicate the piecewise mean tempo (in BPM) and standard deviation (in percent) averaged over all recordings of the respective dataset. Note that popular music is often played with constant tempo. This is also indicated by the small values for the standard deviation (e.g., 1.1% for RWC-POP). In contrast, classical music often reveals significant tempo changes, which is indicated by higher values for the standard deviation (e.g., 15.2% for RWC-CLASSIC). These changes can be abrupt as a result of a changing tempo marking (e.g., from Andante to Allegro) or continuous as indicated by tempo marks such as ritardando or accelerando. Another source of tempo changes is the artistic freedom a musician often takes when interpreting a piece of music. In particular, for romantic piano music such as the Chopin Mazurkas, the tempo consistently and significantly changes from one beat to the next, resulting in pulse sequences similar to the one shown in Figure 2.10a.

2.9.3 Tempo Estimation Experiments

Continuing the evaluation of Section 2.9.1, we now analyze the tempo estimation capability of our approach on the basis of real audio recordings. To this end, we generate a reference tempo curve for each audio recording of our datasets from the available beat annotations. Here, we first compute the local tempo on the quarter-note level, which is determined by the given inter-beat intervals. The regions before the first beat and after the last beat are left unconsidered in the evaluation. As the tempo values on such a fine temporal level tend to be too noisy, we further smooth the resulting tempo values by considering for each time position the averaged tempo over a range of three consecutive beat intervals. Using the same sampled time axis [1 : T ] as in Section 2.4, we obtain a reference tempo curve τR : [1 : T ] → R≥0 that encodes the local reference tempo for each time position. Now, for each time position t, we compute the maximizing tempo parameter τt ∈ Θ as defined in Eq. (2.4). Leaving the problem of tempo octave confusion unconsidered, we say that an estimated local tempo τt is correct4 if it falls within ±4% of an integer multiple k ∈ {1, 2, . . . , 6} of the reference tempo τR(t). Here, we choose a tolerance of ±4% as used in [47]. For each recording, we then compute the percentage of correctly estimated tempi and average these values over all recordings of a given dataset.

4 In general, confusion with integer fractions k ∈ {1/2, 1/3, 1/4, . . .} of the tempo may occur, too. However, it can be shown that Fourier-based tempograms (as opposed to, e.g., autocorrelation-based tempograms) respond to tempo harmonics (integer multiples) but suppress tempo subharmonics (integer fractions), see Chapter 4. Since we use Fourier-based tempograms, we only consider confusion with tempo harmonics.
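A per-recording version of this evaluation can be sketched as follows, assuming an estimated and a reference tempo curve sampled on the same time axis; confusion with tempo subharmonics is ignored, as justified in the footnote above, and the function name and interface are illustrative.

```python
import numpy as np

def tempo_correct_rate(est, ref, tol=0.04, harmonics=(1, 2, 3, 4, 5, 6)):
    """Fraction of frames with a correct local tempo estimate.

    est, ref : estimated and reference tempo curves (BPM), same length.
    An estimate is correct if it lies within +-tol of k * ref(t) for some
    integer harmonic k.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    ok = np.zeros(len(est), dtype=bool)
    for k in harmonics:
        target = k * ref
        ok |= np.abs(est - target) <= tol * target
    return ok.mean()
```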

Table 2.4 shows the evaluation results of the local tempo estimation for the five datasets and for different kernel sizes. For popular music, one generally obtains high estimation rates, e.g., an average rate of 94.1% for BEATLES and 95.3% for RWC-POP when using a kernel size (KS) of 4 seconds. Having constant tempo for most parts, the rates even increase when using longer kernel sizes. For the RWC-JAZZ dataset, the rate is 81.8% (KS = 4 sec). This lower rate is partly due to passages with soft onsets and complex rhythmic patterns. Using longer kernels, the tempo can be correctly identified even for some of these passages, leading to a significantly higher rate of 87.2% (KS = 12 sec).

The situation becomes more complex for classical music, where one has much lower rates, e.g., 70.4% (KS = 4 sec) for RWC-CLASSIC. Here, a manual inspection reveals two major reasons leading to degradations in the estimation rates. The first reason is again the existence of passages with soft onsets; here, longer kernel sizes help in stabilizing the tempo estimation. The second reason is that for many recordings of classical music one has significant local tempo fluctuations caused by the artistic freedom a musician takes. In such passages, the model assumption of local quasi-periodicity is strongly violated: even within a window of 4 seconds the tempo may change by more than 50%, see also Figure 2.10. Here, it is difficult for the local periodicity kernels to capture meaningful periodic behavior. For such passages, increasing the kernel size has a negative effect on the tempo estimation. In other words, increasing the kernel size is beneficial for the first type of degradation and detrimental for the second type. For RWC-CLASSIC, these two effects neutralize each other, yielding similar estimation rates for all kernel sizes. However, for the MAZURKA dataset, one mainly has to deal with degradations of the second type. Containing highly expressive romantic piano music, the estimation rate is 44.5% when using KS = 4 sec. The rate becomes even worse when increasing the kernel size, e.g., 34.3% for KS = 12 sec. This type of music reveals the limitations of a purely onset-based tempo estimation approach; actually, for such music the notion of local tempo becomes problematic even from a musical point of view, see Section 2.9.4 for a continuation of this discussion.

2.9.4 Confidence and Limitations

The results for the local tempo estimation significantly degrade in the case that the assumption of local quasi-periodicity is violated. We now show how the PLP concept allows for detecting such problematic passages automatically. As mentioned in Section 2.7, constructive and destructive interference phenomena in the overlap-add synthesis influence the amplitude of the resulting PLP curve Γ : [1 : T ] → [0, 1]. Locally consistent tempo estimations result in amplitude values for the peaks close to one, whereas inconsistent kernel estimations result in lower values. We now exploit this property of Γ to derive a confidence measure for the tempo estimation. To this end, we fix a confidence threshold θ ∈ [0, 1] and a length parameter λ. Then, a time interval I ⊆ [1 : T ] of length λ is called reliable if all peaks (local maxima) of Γ positioned in I have a value above θ; otherwise I is called unreliable. The idea is that when I contains at least one peak of lower amplitude, there are inconsistent kernel estimates that make a tempo estimation in I unreliable. Finally, we define the subset I(θ,λ) ⊆ [1 : T ] to be the union of all reliable intervals of length λ. We show that I(θ,λ) indeed corresponds to passages yielding reliable tempo estimates by

Database       θ = 0.95      θ = 0.90      θ = 0.80      θ = 0.70      θ = 0 (All)
BEATLES       98.5 (59.4)   98.1 (62.9)   97.5 (64.3)   97.2 (66.2)   89.0 (100)
RWC-POP       99.5 (66.5)   99.2 (67.2)   99.1 (69.3)   98.8 (72.5)   92.8 (100)
RWC-JAZZ      94.2 (35.0)   91.4 (40.0)   89.8 (43.8)   89.6 (47.4)   79.0 (100)
RWC-CLASSIC   89.4 (31.4)   84.7 (38.5)   82.4 (43.7)   81.8 (47.1)   67.6 (100)
MAZURKA       74.1 (6.4)    69.2 (11.8)   65.6 (17.8)   62.4 (22.0)   42.0 (100)

Table 2.5: Percentage of correctly estimated local tempi for the five datasets using restricted regions I(θ,λ). The parameters are λ = KS = 4 sec and θ = 0.95, 0.90, 0.80, 0.70. The unrestricted case (last column) corresponds to θ = 0. The relative size of I(θ,λ) (in percent) is specified in parentheses.

conducting experiments based on the five datasets of Table 2.3. This time, we include all time positions in the evaluation, even the previously excluded regions without any beat annotations and the regions before the first and after the last beats. Since no meaningful tempo can be assigned to these regions, all estimates within these regions are considered wrong in the evaluation. Here, our motivation is that these regions should automatically be classified as unreliable. The last column of Table 2.5 shows the estimation rates using a kernel size of 4 sec. Naturally, including unannotated regions, the rates are lower than the ones reported in the first column of Table 2.4. For example, for the dataset BEATLES, one now has a rate of 89.0% instead of 94.1%.

In our experiments, we use an interval length of λ = 4 sec corresponding to the kernel size KS = 4 sec. We then compute I(θ,λ) for a fixed threshold θ and evaluate the tempo estimates only on the restricted region I(θ,λ) ⊆ [1 : T ]. Table 2.5 shows the percentages of correctly estimated local tempi within I(θ,λ) for various thresholds θ ∈ {0.95, 0.9, 0.8, 0.7}. Furthermore, the size of I(θ,λ) relative to [1 : T ] is indicated in parentheses (given in percent). For example, for the dataset BEATLES, the restricted region I(θ,λ) with θ = 0.95 covers on average 59.4% of all time positions, while the estimation rate amounts to 98.5%. Lowering the threshold θ, the region I(θ,λ) increases, while the estimation rate decreases. The values of the last column can be seen as the special case θ = 0, resulting in the unrestricted case I(θ,λ) = [1 : T ]. Also for the other datasets, the estimation rates significantly improve when using the restricted region I(θ,λ). In particular, for RWC-POP, the estimation error drops to less than one percent when using θ ≥ 0.8, while still covering more than two thirds of all time positions. Actually, for popular music, most of the unreliable regions result from pulse level changes (see Figure 2.6f) rather than poor tempo estimates. Also, for the classical music dataset RWC-CLASSIC, the estimation rates increase significantly, reaching 89.4% for θ = 0.95. However, in this case the restricted regions only cover one third (31.4%) of the time positions. This is even worse for the MAZURKA dataset, where only 6.4% are left when using θ = 0.95.

Figure 2.14 illustrates one main problem that arises when dealing with highly expressive music, where the assumption of local quasi-periodicity is often violated. The passage shows significant tempo fluctuations of the interpretation of the Mazurka Op. 30-2, as indicated by the reference tempo curve τR in Figure 2.14a. Indeed, the PLP curve allows for detecting regions of locally consistent tempo estimates (indicated by the thick blue lines in Figure 2.14b). For these regions, the local tempo estimates largely overlap with the reference tempo, see Figure 2.14a.
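A possible implementation of the confidence region I(θ,λ) is sketched below: it marks every frame that lies in at least one window of length λ whose PLP peaks all exceed θ. The cumulative-sum bookkeeping is an implementation convenience, not part of the definition.

```python
import numpy as np

def reliable_mask(gamma, fs_nov, theta=0.8, lam_sec=4.0):
    """Union I(theta, lambda) of reliable intervals of length lambda (sketch)."""
    L = int(lam_sec * fs_nov)
    T_len = len(gamma)
    # Per-frame indicator of PLP peaks whose value is below theta
    mid = gamma[1:-1]
    is_peak = (mid > gamma[:-2]) & (mid >= gamma[2:])
    low_peak = np.zeros(T_len, dtype=bool)
    low_peak[1:-1] = is_peak & (mid < theta)
    # Slide a window of length L; mark it reliable if it contains no low peak
    mask = np.zeros(T_len, dtype=bool)
    csum = np.concatenate([[0], np.cumsum(low_peak)])
    for start in range(0, T_len - L + 1):
        if csum[start + L] - csum[start] == 0:
            mask[start:start + L] = True
    return mask
```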


Figure 2.14: Tempo estimation for a highly expressive recording (pid9065-14) of Chopin's Mazurka Op. 30-2. (a) Magnitude tempogram |T | of the first 35 seconds using Θ = [50 : 250] with the reference tempo τR (cyan) and the tempo estimates (thick blue) on I(θ,λ). (b) PLP curve and restricted region I(θ,λ) (blue) for θ = 0.8 and λ = 4 sec.

2.9.5 Dynamic Programming Beat Tracking

In the following, we summarize the state-of-the-art beat tracking procedure introduced in [44], which is used in the subsequent beat tracking experiments. The input of the algorithm consists of a novelty-like function Λ : [1 : T ] → R (indicating note onset positions) as well as a number ρ ∈ Z that serves as an estimate of the global (average) beat period. Assuming a roughly constant tempo, the difference δ between two neighboring beats should be close to ρ. To measure the distance between δ and ρ, a neighborhood function Nρ : N → R defined by

$$N_\rho(\delta) := -\big(\log_2(\delta/\rho)\big)^2$$

is introduced. This function takes the maximum value of 0 for δ = ρ and is symmetric on a log-time axis. Now, the task is to estimate a sequence B = (b1, b2, . . . , bK), for some suitable K ∈ N, of monotonically increasing beat positions bk ∈ [1 : T ] satisfying two conditions. On the one hand, the value Λ(bk) should be large for all k ∈ [1 : K]; on the other hand, the beat intervals δ = bk − bk−1 should be close to ρ. To this end, one defines the score S(B) of a beat sequence B = (b1, b2, . . . , bK) by

$$S(B) = \sum_{k=1}^{K} \Lambda(b_k) + \alpha \sum_{k=2}^{K} N_\rho(b_k - b_{k-1}), \qquad\qquad (2.9)$$

where the weight α ∈ R balances out the two conditions. In our experiments, α = 5 turned out to yield a suitable trade-off. Finally, the beat sequence maximizing S yields the solution of the beat tracking problem. The score-maximizing beat sequence can be obtained by a straightforward dynamic programming (DP) approach, see [44] for details. Therefore, in the following, we refer to this procedure as DP beat tracking.
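A compact sketch of such a DP beat tracker is given below. It follows the score of Eq. (2.9) but makes several simplifying choices that are not fixed by [44]: predecessor candidates are restricted to roughly half to twice the beat period, a chain may start at any frame, and the backtrace begins at the globally best-scoring frame.

```python
import numpy as np

def dp_beat_track(novelty, rho, alpha=5.0):
    """Dynamic-programming beat tracking maximizing the score of Eq. (2.9)."""
    novelty = np.asarray(novelty, dtype=float)
    T = len(novelty)
    score = novelty.copy()                    # best score of a beat chain ending at t
    backlink = np.full(T, -1)
    for t in range(T):
        lo, hi = t - 2 * rho, t - rho // 2    # candidate predecessor range
        prev = np.arange(max(lo, 0), max(hi, 0))
        if len(prev) == 0:
            continue
        delta = t - prev
        penalty = -alpha * np.log2(delta / rho) ** 2   # alpha * N_rho(delta)
        cand = score[prev] + penalty
        best = int(np.argmax(cand))
        if cand[best] > 0:                    # only link if it improves the score
            score[t] = novelty[t] + cand[best]
            backlink[t] = prev[best]
    beats = [int(np.argmax(score))]           # backtrace from the best final beat
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return np.array(beats[::-1])              # beat positions in frames
```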

                        Peak Picking                        DP Beat Tracking
Dataset            ∆       Γ      Γ±40    ΓDP           ∆       Γ       Ψ
BEATLES          0.619   0.593   0.671   0.663        0.826   0.741   0.861
RWC-POP          0.554   0.507   0.610   0.579        0.786   0.752   0.819
RWC-JAZZ         0.453   0.411   0.407   0.407        0.514   0.573   0.533
RWC-CLASSIC      0.532   0.514   0.521   0.528        0.618   0.609   0.644
MAZURKA          0.757   0.618   0.731   0.685        0.641   0.651   0.684

Table 2.6: Average F-measures for various beat tracking approaches using an error tolerance of 70 ms.

2.9.6 Beat Tracking Experiments

We now report on various beat tracking experiments conducted on the five audio datasets described in Section 2.9.2. We consider two different approaches to beat tracking. In the first approach, which serves as a baseline, we simply perform peak picking based on an adaptive thresholding strategy [7] and define the beat positions to be the detected peak positions. In the second approach, we use the DP beat tracking procedure summarized in Section 2.9.5. For each of these two approaches, we compare the beat tracking results for five different curves: the original novelty curve ∆, the PLP curve Γ, a constrained PLP curve Γ±40, a PLP curve ΓDP based on a smooth tempo curve, as well as a combined novelty/PLP curve denoted by Ψ. Here, the PLP curve Γ is computed using Θ = [30 : 600] and KS = 4 sec. For the constrained PLP curve Γ±40, we use a tempo set covering ±40% of the mean tempo of the audio recording, where we assume that a rough estimate of this tempo is given. The curve ΓDP is obtained by first computing a smoothed tempo trajectory based on dynamic programming as described in [3; 187] and then using these tempo values in the PLP computation instead of the maximizing values, cf. Eq. (2.4). Finally, the combined curve Ψ is defined as Ψ = (∆norm + Γ)/2, where ∆norm denotes a locally normalized version of ∆ that assumes values in the interval [0, 1] (as the PLP curve does). The normalization is obtained using a sliding maximum filter of length 4 sec (as for the kernel size); a sketch of this construction is given at the end of this section.

In a first evaluation, we use the same F-measure as for the onset detection experiment (Section 2.9.1). A reference beat is considered a correct detection if there is a detected beat within an error tolerance of 70 ms. The same tolerance value is suggested in the literature [29] and used in the MIREX 2011 Audio Beat Tracking evaluation procedure5. Precision, recall, and F-measure are then defined as in Eq. (2.8).

Table 2.6 shows the F-measure values for both beat tracking approaches in combination with the different curves. Using peak picking based on ∆, one obtains an F-measure of F = 0.619 for the dataset BEATLES. Actually, for most music, beat positions go along with onset positions. Consequently, onset positions typically lie on beat positions or on positions corresponding to higher pulse levels. Therefore, even the simple onset detection procedure already yields reasonable F-measures (resulting from a very high recall and a moderate precision). At first sight, it may be surprising that when using peak picking on Γ, one

5 http://www.music-ir.org/mirex/wiki/2011:Audio Beat Tracking


Figure 2.15: Illustration of the different curves used in the beat tracking experiments with ground truth beat positions shown as vertical lines. (a) Novelty curve ∆. (b) PLP curve Γ. (c) Combined curve Ψ.

At first sight, it may be surprising that when using peak picking on Γ, one obtains slightly lower F-measure values (e.g., F = 0.593 for BEATLES). Here, note that the peak positions of Γ define a locally periodic pulse grid, where beat positions are likely to occur. As our experiments show, the number of false negatives is reduced in comparison with ∆ (leading to a higher recall). However, not all PLP peaks necessarily correspond to beats. Typically, the predominant pulse corresponds to the tatum pulse, leading to many false positives (low precision). The situation already improves when using Γ±40. Constraining the pulse to the correct pulse level reduces the number of false positives (e.g., F = 0.671 for BEATLES). Employing a smooth tempo trajectory for computing ΓDP has a very similar effect (e.g., F = 0.663 for BEATLES).

Using the DP beat tracker, the F-measures improve significantly. In general, the best results are achieved when using the DP beat tracker with the combined curve Ψ. In particular, Ψ leads to better results than the usual approach exclusively based on ∆. For example, in the case of BEATLES, the F-measure increases from F = 0.826 for ∆ to F = 0.861 for Ψ. Using Γ alone, the results seem to degrade (F = 0.741). A manual investigation shows that the PLP curve robustly provides information about likely pulse positions, typically at the tatum level, whereas the beat positions correspond to the tactus level. This often results in half-beat shifts (typically one period on the tatum level) in the beat tracking result. In other words, since the peak values are invariant to dynamics, Γ is generally not capable of discriminating between on-beats and off-beats, see Figure 2.15b. The problem of half-beat shifts is not as prominent when using the original novelty curve, since the onset peaks of ∆ on the on-beat positions are often more pronounced than the ones on the off-beat positions, see Figure 2.15a. However, the novelty curve ∆ often reveals passages with noisy and missing peaks. Here, the combination Ψ inherits the robustness from Γ and the discriminative power from ∆, yielding the best overall beat tracking results, see the last column of Table 2.6. Figure 2.15c illustrates the gain achieved through the combined usage of Γ and ∆.

The evaluation based on the simple F-measure has several weaknesses. First, even completely random false positives only slightly degrade the F-measure. One reason is that the F-measure only moderately punishes spurious peak positions, even if they are not musically meaningful. As a consequence, simple peak picking on ∆, even though ignoring any notion of a beat concept, seems to yield good results. In particular for the MAZURKA dataset, peak picking based on ∆ seems to outperform all other strategies (F = 0.757). Furthermore, this evaluation measure does not account for the issue of half-beat shifts. Finally, by evaluating the beats individually, the temporal context of beat tracking is ignored. In Section 2.9.7, we tackle these problems by introducing a novel context-sensitive evaluation measure.

2.9.7 Context-Sensitive Evaluation

In the evaluation measure considered so far, the beat positions were evaluated one by one. However, when tapping to the beat of music, a listener obviously requires the temporal context of several consecutive beats. Therefore, in evaluating beat tracking procedures, it seems natural to consider beats in their temporal context instead of looking at the beat positions individually [29]. To account for these temporal dependencies, we now introduce a context-sensitive evaluation measure. Let R = (r1, r2, . . . , rM) be the sequence of monotonically increasing reference beat positions rm ∈ [1 : T], m ∈ [1 : M]. Similarly, let B = (b1, b2, . . . , bK) be the sequence of monotonically increasing detected beat positions bk ∈ [1 : T], k ∈ [1 : K]. Furthermore, let L ∈ N be a parameter that specifies the temporal context measured in beats, and let ε be the error tolerance corresponding to 70 milliseconds. Then a reference beat rm is considered an L-correct detection if there exists a subsequence rj, . . . , rj+L−1 of R containing rm (i.e., m ∈ [j : j + L − 1]) as well as a subsequence bi, . . . , bi+L−1 of B such that

|rj+ℓ − bi+ℓ| ≤ ε

for all ℓ ∈ [0 : L − 1]. Intuitively, for a beat to be considered L-correct, one requires an entire track of L consecutive detected beats that match (up to the error tolerance ε) a track of L consecutive reference beats. Here, a single outlier in the detected beats already destroys this property. Let ML be the number of L-correct reference beats. Then, we define the context-sensitive recall RL := ML/M, precision PL := ML/K, and F-measure FL := (2 · PL · RL)/(PL + RL). In our evaluation, we use the parameter L = 4 corresponding to four consecutive beats (roughly a measure).

Table 2.7a shows the resulting context-sensitive FL-measures for the same experiments as described in the last section. Now, the weaknesses of a simple peak picking strategy based on ∆ or Γ become obvious. Compared to the previous F-measure (cf. Table 2.6), the FL-measures drop significantly for all datasets. In particular for popular music, these measures are close to zero (e.g., FL = 0.015 for RWC-POP and ∆), which indicates that basically no four consecutive beats are detected without any intervening spurious peaks. Actually, the situation already improves significantly when using the constrained PLP curve Γ±40 (e.g., FL = 0.486 for RWC-POP). This shows that the PLP curve captures meaningful local beat information when restricted to the desired pulse level. ΓDP once again obtains similar results; for example, in the case of BEATLES, FL = 0.554 for Γ±40 and FL = 0.555 for ΓDP. For MAZURKA, however, which exhibits many abrupt tempo changes, ΓDP leads to a lower FL-measure (FL = 0.484) than Γ±40 (FL = 0.539).
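The L-correct counting can be made concrete with the following sketch; beat positions are assumed to be given in seconds, and the brute-force search over reference and detection runs is an illustrative simplification rather than the original evaluation code.

import numpy as np

def context_sensitive_f(reference, detected, tol=0.07, L=4):
    # A reference beat is L-correct if it lies inside a run of L consecutive reference
    # beats that all match L consecutive detected beats within tol seconds.
    R, B = np.asarray(reference, float), np.asarray(detected, float)
    M, K = len(R), len(B)
    l_correct = np.zeros(M, dtype=bool)
    for j in range(M - L + 1):                      # start of a reference run of length L
        for i in range(K - L + 1):                  # start of a candidate run of detections
            if np.all(np.abs(R[j:j + L] - B[i:i + L]) <= tol):
                l_correct[j:j + L] = True           # every beat in the run is L-correct
                break
    ML = int(l_correct.sum())
    recall = ML / M if M else 0.0
    precision = ML / K if K else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure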

(a)          Peak Picking                      DP Beat Tracking
Dataset      ∆       Γ       Γ±40    ΓDP      ∆       Γ       Ψ
BEATLES      0.050   0.044   0.554   0.555    0.789   0.708   0.824
RWC-POP      0.015   0.005   0.486   0.444    0.757   0.743   0.808
RWC-JAZZ     0.014   0.003   0.253   0.231    0.414   0.535   0.493
RWC-CLASSIC  0.124   0.118   0.381   0.393    0.536   0.528   0.560
MAZURKA      0.238   0.225   0.539   0.484    0.451   0.479   0.508

(b)          Peak Picking                      DP Beat Tracking
Dataset      ∆       Γ       Γ±40    ΓDP      ∆       Γ       Ψ
BEATLES      0.050   0.044   0.597   0.592    0.902   0.909   0.926
RWC-POP      0.016   0.005   0.528   0.494    0.881   0.917   0.923
RWC-JAZZ     0.014   0.003   0.282   0.259    0.564   0.705   0.708
RWC-CLASSIC  0.125   0.119   0.404   0.420    0.638   0.633   0.661
MAZURKA      0.238   0.226   0.540   0.486    0.466   0.528   0.527

Table 2.7: Beat tracking results based on context-sensitive evaluation measures (L = 4). (a) FL-measures. (b) Half-shift invariant F̃L-measures.

Here, simply choosing the local maximum from the constrained tempo set allows for locally adapting to the strongly varying tempo. Actually, peak picking on Γ±40 leads to the best results for this dataset. For all other datasets, however, employing the DP beat tracking procedure improves the results. In particular, for popular music with only moderate tempo changes, the stricter FL-measures come close to the simple F-measures (e.g., FL = 0.824 compared to F = 0.861 for BEATLES and Ψ).

To investigate the role of half-beat shifts as discussed in Section 2.9.6, we make the evaluation measure invariant to such errors. To this end, we shift the sequence R of reference beats by one half-beat to the right (replacing rm by r̃m := (rm + rm+1)/2) to obtain a sequence R̃. Then the reference beat rm is considered correct if rm is L-correct w.r.t. R or if r̃m is L-correct w.r.t. R̃. As before, we define recall and precision to obtain a half-shift invariant F-measure denoted by F̃L. Table 2.7b shows the corresponding evaluation results. In particular for the DP tracking approach, one obtains a significant increase in the evaluation measures for all datasets. For example, for BEATLES and Ψ, one has F̃L = 0.926 as opposed to FL = 0.824, which shows that half-beat shifts are a common problem in beat tracking. Actually, even humans sometimes perceive beats on off-beat positions, in particular for syncopated passages with strong off-beat events. This also explains the strong increase in the case of jazz music (F̃L = 0.708 as opposed to FL = 0.493 for RWC-JAZZ and Ψ), where one often encounters syncopated elements. For DP tracking based on Γ, the improvements are most noticeable over all datasets. As Γ is invariant to dynamics, the half-shift beat confusion is very distinctive, see Figure 2.15b.

Finally, we note that the context-sensitive evaluation measures much better reveal the kind of improvements introduced by our PLP concept, which tends to suppress spurious peaks. For both approaches, peak picking and DP beat tracking, one obtains the best results when using a PLP-based enhancement.
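A corresponding sketch of the half-shift invariant variant constructs the shifted sequence R̃ and accepts a reference beat if it is L-correct with respect to R or R̃; as before, all implementation details are illustrative assumptions.

import numpy as np

def l_correct_mask(R, B, tol=0.07, L=4):
    # Boolean mask marking which reference beats in R are L-correct w.r.t. detections B
    M, K = len(R), len(B)
    mask = np.zeros(M, dtype=bool)
    for j in range(M - L + 1):
        for i in range(K - L + 1):
            if np.all(np.abs(R[j:j + L] - B[i:i + L]) <= tol):
                mask[j:j + L] = True
                break
    return mask

def half_shift_invariant_f(reference, detected, tol=0.07, L=4):
    R, B = np.asarray(reference, float), np.asarray(detected, float)
    R_shift = 0.5 * (R[:-1] + R[1:])            # half-beat shifted reference sequence R~
    ok = l_correct_mask(R, B, tol, L)
    ok_shift = l_correct_mask(R_shift, B, tol, L)
    # beat-wise 'or': a reference beat counts if it is L-correct w.r.t. R or w.r.t. R~
    ok[:len(ok_shift)] |= ok_shift
    ML = int(ok.sum())
    precision, recall = ML / len(B), ML / len(R)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0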

2.10 Conclusion

In this chapter, we introduced a novel concept for deriving musically meaningful local pulse information from possibly noisy onset information. As opposed to previous approaches that assume a constant tempo, the main benefit of our PLP mid-level representation is that it can locally adjust to changes in tempo as long as the underlying music signal possesses some quasi-periodicity. In our representation, we do not aim at extracting pulses at a specific level. Instead, a PLP curve is able to locally switch to the dominating pulse level, which typically is the tatum level. Furthermore, our concept allows for integrating additional knowledge in the form of a tempo range to enforce pulse detection on a specific level. Conducting extensive experiments based on well-known datasets of different genres, we have shown that our PLP concept constitutes a powerful tool for tempo estimation and beat tracking. Furthermore, initial experiments also revealed that PLP curves are suitable for supporting higher-level music processing tasks such as music synchronization [49], meter estimation [102], as well as pulse-adaptive feature design [45] and audio segmentation [114]. Even for classical music with soft onsets, we were able to extract useful tempo and beat information. However, for highly expressive interpretations of romantic music, the assumption of local quasi-periodicity is often violated, leading to poor results. At least, our PLP concept yields a confidence measure that reveals such problematic passages. Highly expressive music also reveals the limits of purely onset-oriented tempo and beat tracking procedures. Here, future work is concerned with jointly considering additional musical aspects regarding meter, harmony, polyphony, or structure in order to support and stabilize tempo and beat tracking, see [138; 37; 61; 27; 148] for first approaches in this direction.

Chapter 3

A Case Study on Chopin Mazurkas

In the last chapter, we introduced a novel concept for deriving musically meaningful local pulse information. As it turned out, highly expressive music reveals the limits of state-of-the-art tempo and beat tracking procedures. To better understand the shortcomings of beat tracking methods, significant efforts have been made to compare and investigate the performance of different strategies on common datasets [43; 188; 121; 69; 38]. However, most approaches were limited to comparing the different methods by specifying evaluation measures that refer to an entire recording or even an entire collection of recordings. Such globally oriented evaluations do not provide any information on the critical passages within a piece where the tracking errors occur. Thus, no conclusions can be drawn from these experiments about possible musical reasons that lie behind the beat tracking errors. A first analysis of musical properties influencing the beat tracking quality was conducted by Dixon [38], who proposed quantitative measures for the rhythmic complexity and for variations in tempo and timings. However, no larger evaluations were carried out to show a correlation between these theoretical measures and the actual beat tracking quality.

In this chapter, we introduce a novel evaluation framework that exploits the existence of different performances available for a given piece of music. In our case study, we resort to a collection of recordings of Chopin Mazurkas containing on average over 50 performances for each piece. Based on a local, beat-wise histogram, we simultaneously determine consistencies of beat tracking errors over many performances. The underlying assumption is that tracking errors consistently occurring in many performances of a piece are likely caused by musical properties of the piece, rather than physical properties of a specific performance. Such consistencies indicate musically critical passages that are prone to tracking errors in the underlying piece, rather than in a specific performance. As a further contribution, we classify the beats of the critical passages by introducing various types of beats such as non-event beats, ornamented beats, weak bass beats, or constant harmony beats. Each such beat class stands for a musical, performance-independent property that frequently evokes beat tracking errors. In our experiments, we evaluated three conceptually different beat tracking procedures on a corpus consisting of 298 audio recordings corresponding to five different Mazurkas. For each recording, the tracking results were compared with manually annotated ground-truth beat positions.


ID      Composer  Piece          #(Meas.)  #(Beats)  #(Perf.)
M17-4   Chopin    Op. 17, No. 4  132       396       62
M24-2   Chopin    Op. 24, No. 2  120       360       64
M30-2   Chopin    Op. 30, No. 2  65        193       34
M63-3   Chopin    Op. 63, No. 3  77        229       88
M68-3   Chopin    Op. 68, No. 3  61        181       50

Table 3.1: The five Chopin Mazurkas and their identifiers used in our study. The last three columns indicate the number of measures, beats, and performances available for the respective piece.

Our local evaluation framework and detailed analysis explicitly indicate various limitations of current state-of-the-art beat trackers, thus laying the basis for future improvements and research directions.

This chapter is organized as follows: In Section 3.1, we formalize and discuss the beat tracking problem. In Section 3.2, we describe the underlying music material and specify various beat classes. After summarizing the three beat tracking strategies (Section 3.3) used in our case study and introducing the evaluation measure (Section 3.4), we report on the experimental results in Section 3.5. Finally, we conclude in Section 3.6 with a discussion of future research directions.

3.1 Specification of the Beat Tracking Problem

For a given piece of music, let N denote the number of musical beats. Enumerating all beats, we identify the set of musical beats with the set B = [1 : N] := {1, 2, . . . , N}. Given a performance of the piece in the form of an audio recording, the musical beats correspond to specific physical time positions within the audio file. Let π : B → R be the mapping that assigns to each musical beat b ∈ B the time position π(b) of its occurrence in the performance. In the following, a time position π(b) is referred to as a physical beat or simply as a beat of the performance. Then, the task of beat tracking is to recover the set {π(b) | b ∈ B} of all beats from a given audio recording. Note that this specification of the beat tracking problem is somewhat simplistic, as we only consider physical beats that are defined by onset events. More generally, a beat is a perceptual phenomenon, and perceptual beat times do not necessarily coincide with physical beat times [41]. Furthermore, the perception of beats varies between listeners.

For determining physical beat times, we now discuss some of the problems one has to deal with in practice. Typically, a beat goes along with a note onset revealed by an increase of the signal's energy or a change in the spectral content. However, in particular for non-percussive music, one often has soft note onsets, which lead to blurred note transitions rather than sharp note onset positions. In such cases, there are no precise timings of note events within the audio recording, and the assignment of exact physical beat positions becomes problematic. This issue is aggravated in the presence of tempo changes and expressive tempo nuances (e.g., ritardando and accelerando).

ID      |B|   |B1|  |B2|  |B3|  |B4|  |B5|  |B∗|
M17-4   396   9     8     51    88    0     154
M24-2   360   10    8     22    4     12    55
M30-2   193   2     8     13    65    0     82
M63-3   229   1     7     9     36    0     47
M68-3   181   17    7     0     14    12    37

Table 3.2: The number of musical beats in each of the different beat classes defined in Section 3.2. Each beat may be a member of more than one class.

Besides such physical reasons, there may also be a number of musical reasons why beat tracking is a challenging task. For example, there may be beats with no note event going along with them. Here, a human may still perceive a steady beat, but the automatic specification of physical beat positions is quite problematic, in particular in passages of varying tempo where interpolation is not straightforward. Furthermore, auxiliary note onsets can cause difficulty or ambiguity in defining a specific physical beat time. In music such as the Chopin Mazurkas, the main melody is often embellished by ornaments such as trills, grace notes, or arpeggios. Also, for the sake of expressiveness, the notes of a chord need not be played at the same time, but may be slightly displaced in time. This renders a precise definition of a physical beat position impossible.

3.2 Five Mazurkas by Frédéric Chopin

The Mazurka Project1 has collected over 2700 recorded performances for 49 Mazurkas by Frédéric Chopin, ranging from the early stages of music recording (Grünfeld 1902) until today [153]. In our case study, we use 298 recordings corresponding to five of the 49 Mazurkas, see Table 3.1. For each of these recordings, the beat positions were annotated manually [153]. These annotations are used as ground truth in our experiments. Furthermore, Humdrum and MIDI files of the underlying musical scores are provided for each piece, representing the pieces in an uninterpreted symbolic format.

In addition to the physical beat annotations of the performances, we created musical annotations by grouping the musical beats B into five different beat classes B1 to B5. Each of these classes represents a musical property that typically constitutes a problem for determining the beat positions. The colors refer to Figure 3.4, Figure 3.5, and Figure 3.6.

• Non-event beats B1 (black): Beats that do not coincide with any note events, see Figure 3.1a.
• Boundary beats B2 (blue): Beats of the first measure and last measure of the piece.
• Ornamented beats B3 (red): Beats that coincide with ornaments such as trills, grace notes, or arpeggios, see Figure 3.1b.
• Weak bass beats B4 (cyan): Beats where only the left hand is played, see Figure 3.1e.

1 mazurka.org.uk


Figure 3.1: Scores of example passages for the different beat classes introduced in Section 3.2. (a) Non-event beats (B1) in M24-2, (b) Ornamented beats (B3) in M30-2, (c) Constant harmony beats (B5) in M24-2, (d) Constant harmony beats (B5) in M68-3, and (e) Weak bass beats (B4) in M63-3.

• Constant harmony beats B5 (green): Beats that correspond to consecutive repetitions of the same chord, see Figure 3.1(c-d).

Furthermore, let B∗ := B1 ∪ · · · ∪ B5 denote the union of the five beat classes. Table 3.2 details for each Mazurka the number of beats assigned to the respective beat classes. Note that the beat classes need not be disjoint, i.e., each beat may be assigned to more than one class. In Section 3.5, we discuss the beat classes and their implications for the beat tracking results in more detail.

3.3 Beat Tracking Strategies

In our experiments, we use three different beat trackers, see Section 2.1 for an overview. Firstly, we directly use the onset candidates extracted from a novelty curve as explained in Section 2.3. Figure 3.2c shows a novelty curve for an excerpt of M17-4 (identifier explained in Table 3.1). Using a peak picking strategy [7], note onsets can be extracted from this curve.


Figure 3.2: Representations for an excerpt of M17-4. (a) Score representation of beats 60 to 74. (b) Annotated ground truth beats for the performance pid50534-05 by Horowitz (1985). (c) Novelty curve (note onset candidates indicated by circles). (d) PLP curve (beat candidates indicated by circles).

In this method, referred to as ONSET in the following sections, each detected note onset is considered a beat position.

Secondly, we compute a PLP curve from the novelty curve as introduced in Section 2.6 and consider the PLP peak positions as beat positions. In the following, this approach is referred to as PLP. We use a window size of three seconds and initialize the tempo estimation with the mean of the annotated tempo. More precisely, we define the global tempo range for each performance to cover one octave around the mean tempo; e.g., for a mean tempo of 120 BPM, tempo estimates in the range [90 : 180] are valid. This prevents tempo doubling or halving errors and robustly allows for investigating beat tracking errors, rather than tempo estimation errors.

The third beat tracking method (SYNC) employs the MIDI file available for each piece. This MIDI file can be regarded as additional knowledge, including the pitch, onset time, and duration of each note. Using suitable synchronization techniques [49] on the basis of coarse harmonic and very precise onset information, we identify for each musical event of the piece (given by the MIDI file) the corresponding physical position within a performance. This coordination of MIDI events to the audio is then used to determine the beat positions in a performance and simplifies the beat tracking task to an alignment problem, where the number of beats and the sequence of note events are given as prior knowledge.
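For the SYNC strategy, the essential step of transferring MIDI beat positions to physical time positions can be sketched as follows, assuming that a MIDI-audio synchronization procedure (such as the one referred to in [49]) has already produced a warping path of corresponding time pairs; the interpolation-based transfer shown here is a simplification for illustration, not the actual alignment code.

import numpy as np

def transfer_beats(midi_beat_times, warping_path):
    """Map MIDI beat times to physical time positions in a performance.

    midi_beat_times : 1-D array of beat times on the MIDI (score) time axis, in seconds
    warping_path    : (P, 2) array of corresponding (midi_time, audio_time) pairs obtained
                      from a MIDI-audio synchronization procedure (assumed to be given)
    """
    path = np.asarray(warping_path, dtype=float)
    order = np.argsort(path[:, 0])
    midi_axis, audio_axis = path[order, 0], path[order, 1]
    # piecewise-linear interpolation of the alignment at the MIDI beat times
    return np.interp(midi_beat_times, midi_axis, audio_axis)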

3.4 Evaluation on the Beat Level

Many evaluation measures have been proposed to quantify the performance of beat tracking systems by comparing the beat positions determined by a beat tracking algorithm with annotated ground truth beats. An extensive review of evaluation measures is given in [29]. These measures can be divided into two groups: firstly, measures that analyze each beat position separately, and secondly, measures that take the tempo and metrical levels into account [31; 30; 102; 121]. While the latter give a better estimate of how well a sequence of retrieved beats correlates with the manual annotation, they do not give any insight into the beat tracking performance at a specific beat of the piece.


Figure 3.3: Illustration of the τ-neighborhood Iτ (p) and the half-beat neighborhood H(p) of a beat p = π(b), b ∈B.

In our evaluation, we consider the beat tracking quality on the beat level of a piece and combine the results of all performances available for this piece. This allows for detecting beats that are prone to errors in many performances.

For a given performance, let Π := {π(b) | b ∈ B} be the set of manually determined physical beats, which are used as ground truth. Furthermore, let Φ ⊂ R be the set of beat candidates obtained from a beat tracking procedure. Given a tolerance parameter τ > 0, we define the τ-neighborhood Iτ(p) ⊂ R of a beat p ∈ Π to be the interval of length 2τ centered at p, see Figure 3.3. We say that a beat p has been identified if there is a beat candidate q ∈ Φ in the τ-neighborhood of p, i.e., q ∈ Φ ∩ Iτ(p). Let Πid ⊂ Π be the set of all identified beats. Furthermore, we say that a beat candidate q ∈ Φ is correct if q lies in the τ-neighborhood Iτ(p) of some beat p ∈ Π and there is no other beat candidate lying in Iτ(p) that is closer to p than q. Let Φco ⊂ Φ be the set of all correct beat candidates. We then define the precision P = Pτ, the recall R = Rτ, and the F-measure F = Fτ as [29]

P = |Φco| / |Φ| ,   R = |Πid| / |Π| ,   F = (2 · P · R) / (P + R) .   (3.1)

Table 3.3 shows the results of various beat tracking procedures on the Mazurka data. As it turns out, the F-measure is a relatively soft evaluation measure that only moderately punishes additional, non-correct beat candidates. As a consequence, the simple onset-based beat tracker seems to outperform most other beat trackers. Since, for the Mazurka data, many note onsets coincide with beats, the onset detection leads to a high recall, while having only a moderate loss in precision.

We now introduce a novel evaluation measure that punishes non-correct beat candidates, which are often musically meaningless, more heavily. To this end, we define a half-beat neighborhood H(p) of a beat p = π(b) ∈ Π to be the interval ranging from (π(b − 1) + π(b))/2 (or π(b) for b = 1) to (π(b) + π(b + 1))/2 (or π(b) for b = N), see Figure 3.3. Then, we say that a beat b ∈ B has been strongly identified if there is a beat candidate q ∈ Φ with q ∈ Φ ∩ Iτ(p) and if H(p) ∩ Φ = {q} for p = π(b). In other words, q is the only beat candidate in the half-beat neighborhood of p. Let Πstid ⊂ Π be the set of all strongly identified beats. Then we define the beat accuracy A = Aτ to be

A = |Πstid| / |Π| .   (3.2)
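The quantities of Eqs. (3.1) and (3.2) can be computed as in the following sketch; the handling of boundary cases and ties is an illustrative choice and not necessarily identical to the evaluation code used for the experiments.

import numpy as np

def beat_eval(pi, phi, tau=0.07):
    """Precision, recall, F-measure (Eq. 3.1) and beat accuracy A (Eq. 3.2).

    pi  : sorted 1-D array of annotated beat times (ground truth), in seconds
    phi : sorted 1-D array of beat candidates returned by a tracker, in seconds
    """
    pi, phi = np.asarray(pi, float), np.asarray(phi, float)
    # identified beats: at least one candidate within the tau-neighborhood
    identified = np.array([np.any(np.abs(phi - p) <= tau) for p in pi])
    # correct candidates: the closest candidate inside the tau-neighborhood of some beat
    correct = np.zeros(len(phi), dtype=bool)
    for p in pi:
        inside = np.nonzero(np.abs(phi - p) <= tau)[0]
        if len(inside):
            correct[inside[np.argmin(np.abs(phi[inside] - p))]] = True
    # half-beat neighborhoods: from the midpoint to the previous beat to the midpoint to the next
    lower = np.concatenate(([pi[0]], 0.5 * (pi[:-1] + pi[1:])))
    upper = np.concatenate((0.5 * (pi[:-1] + pi[1:]), [pi[-1]]))
    strongly = np.zeros(len(pi), dtype=bool)
    for b, p in enumerate(pi):
        in_half = phi[(phi >= lower[b]) & (phi <= upper[b])]
        strongly[b] = len(in_half) == 1 and abs(in_half[0] - p) <= tau
    P = correct.sum() / len(phi) if len(phi) else 0.0
    R = identified.sum() / len(pi)
    F = 2 * P * R / (P + R) if P + R else 0.0
    A = strongly.sum() / len(pi)
    return P, R, F, A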

        SYNC       ONSET                          PLP
ID      P/R/F/A    P      R      F      A        P      R      F      A
M17-4   0.837      0.552  0.958  0.697  0.479    0.615  0.743  0.672  0.639
M24-2   0.931      0.758  0.956  0.845  0.703    0.798  0.940  0.862  0.854
M30-2   0.900      0.692  0.975  0.809  0.623    0.726  0.900  0.803  0.788
M63-3   0.890      0.560  0.975  0.706  0.414    0.597  0.744  0.661  0.631
M68-3   0.875      0.671  0.885  0.758  0.507    0.634  0.755  0.689  0.674
Mean:   0.890      0.634  0.952  0.754  0.535    0.665  0.806  0.728  0.729

            MIREX                              Our Methods
Method      DRP3    GP2     OGM2    TL         SYNC    ONSET   PLP
F           0.678   0.547   0.321   0.449      0.890   0.754   0.728

Table 3.3: Comparison of the beat tracking performance of the three strategies used in this chapter and the MIREX 2009 results based on the evaluation metrics precision P, recall R, F-measure F, and beat accuracy A.

3.5 Experimental Results

We now discuss the experimental results obtained using our evaluation framework and explain the relations between the beat tracking results and the beat classes introduced in Section 3.2.

We start by discussing Table 3.3. Here, the results of the different beat tracking approaches for all performances of the five Mazurkas are summarized, together with some results from the MIREX 2009 beat tracking task2. All beat trackers used in our evaluation yield better results for the Mazurkas than all trackers used in the MIREX evaluation. As noted before, the F-measure only moderately punishes additional beats. In consequence, ONSET (F = 0.754) seems to outperform all other methods except SYNC (F = 0.890). In contrast, the introduced beat accuracy A punishes false positives more heavily, leading to A = 0.535 for ONSET, which is significantly lower than for PLP (A = 0.729) and SYNC (A = 0.890). For SYNC, the evaluation metrics P, R, F, and A are equivalent because the number of detected beats is always correct. Furthermore, SYNC is able to considerably outperform the other strategies. This is not surprising, as it is equipped with additional knowledge in the form of the MIDI file.

There are some obvious differences in the beat tracking results of the individual Mazurkas, caused by the musical reasons explained in [38]. First of all, all methods deliver the best result for M24-2. This piece is rather simple, with many quarter notes in the dominant melody line. M17-4 is the most challenging piece for all three trackers because of the frequent use of ornaments and trills and many beat positions that are not reflected in the dominating melody line. For the ONSET tracker, M63-3 constitutes a challenge (A = 0.414), although this piece can be handled well by the SYNC tracker. Here, a large number of notes that do not fall on beat positions provoke many false positives. This also leads to a low accuracy of PLP (A = 0.631).

2 www.music-ir.org/mirex/wiki/2009:Audio Beat Tracking Results


Figure 3.4: The beat error histogram for the synchronization based beat tracking (SYNC) shows for how many performances of each of the five Mazurkas a beat b is not identified. The different colors of the bars encode the beat class B a beat is assigned to, see Section 3.2.

ID      B      B\B1   B\B2   B\B3   B\B4   B\B5   B\B∗
M17-4   0.837  0.852  0.842  0.843  0.854  0.837  0.898
M24-2   0.931  0.940  0.936  0.941  0.933  0.939  0.968
M30-2   0.900  0.900  0.903  0.931  0.905  0.900  0.959
M63-3   0.890  0.890  0.898  0.895  0.895  0.890  0.911
M68-3   0.875  0.910  0.889  0.875  0.875  0.887  0.948
Mean:   0.890  0.898  0.894  0.897  0.894  0.892  0.925

Table 3.4: Beat accuracy A results comparing the different beat classes for SYNC: for all beats B, excluding non-event beats B1, boundary beats B2, ornamented beats B3, weak bass beats B4, constant harmony beats B5, and the union B∗.

Going beyond this evaluation on the piece level, Figure 3.4, Figure 3.5, and Figure 3.6 illustrate the beat-level beat tracking results of our evaluation framework for the SYNC, PLP, and ONSET strategies, respectively. Here, for each beat b ∈ B of a piece, the bar encodes for how many of the performances of this piece the beat was not strongly identified (see Section 3.4). High bars indicate beats that are incorrectly identified in many performances, whereas low bars indicate beats that are identified in most performances without problems. As a consequence, this representation allows for investigating the musical properties leading to beat errors. More precisely, beats that are consistently wrong over a large number of performances of the same piece are likely caused by musical properties of the piece, rather than physical properties of a specific performance.
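The beat error histograms of Figures 3.4–3.6 can be obtained by aggregating, for every musical beat, the number of performances in which it is not strongly identified. The following sketch assumes that all performances of a piece share the same number N of annotated beats; it illustrates the aggregation step and is not the original evaluation code.

import numpy as np

def strongly_identified(pi, phi, tau=0.07):
    # Boolean mask: beat b is strongly identified if its half-beat neighborhood contains
    # exactly one candidate and that candidate lies within tau seconds of the beat.
    lower = np.concatenate(([pi[0]], 0.5 * (pi[:-1] + pi[1:])))
    upper = np.concatenate((0.5 * (pi[:-1] + pi[1:]), [pi[-1]]))
    mask = np.zeros(len(pi), dtype=bool)
    for b, p in enumerate(pi):
        in_half = phi[(phi >= lower[b]) & (phi <= upper[b])]
        mask[b] = len(in_half) == 1 and abs(in_half[0] - p) <= tau
    return mask

def beat_error_histogram(annotations, detections, tau=0.07):
    """For each musical beat b, count in how many performances b is not strongly identified.

    annotations : list of 1-D arrays of ground-truth beat times (one per performance, length N)
    detections  : list of 1-D arrays of beat candidates (one per performance)
    """
    errors = np.zeros(len(annotations[0]), dtype=int)
    for pi, phi in zip(annotations, detections):
        errors += ~strongly_identified(np.asarray(pi, float), np.asarray(phi, float), tau)
    return errors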


Figure 3.5: The beat error histogram for the PLP tracker shows for how many performances of each of the five Mazurkas a beat b is not identified. The different colors of the bars encode the beat class B a beat is assigned to, see Section 3.2.

ID      B      B\B1   B\B2   B\B3   B\B4   B\B5   B\B∗
M17-4   0.639  0.650  0.641  0.671  0.593  0.639  0.649
M24-2   0.854  0.857  0.862  0.857  0.856  0.854  0.873
M30-2   0.788  0.788  0.794  0.814  0.772  0.788  0.822
M63-3   0.631  0.631  0.638  0.639  0.647  0.631  0.668
M68-3   0.674  0.705  0.689  0.674  0.678  0.674  0.733
Mean:   0.729  0.735  0.734  0.739  0.723  0.729  0.751

Table 3.5: Beat accuracy A results comparing the different beat classes for PLP: for all beats B, excluding non-event beats B1, boundary beats B2, ornamented beats B3, weak bass beats B4, constant harmony beats B5, and the union B∗.

For example, for the tracking strategies SYNC and PLP and all five pieces, the first and last beats are incorrectly identified in almost all performances, as shown by the blue bars (B2). This is caused by boundary problems and adaptation times of the algorithms. For ONSET, this effect is less pronounced, as only local information is used for determining beat positions. As a result, there is no adaptation time for ONSET.

Furthermore, there are a number of noticeably high bars within all pieces. The SYNC strategy for M68-3 (see Figure 3.4) exhibits a number of isolated black bars. These non-event beats do not fall on any note event (B1). As stated in Section 3.1, especially when dealing with expressive music, simple interpolation techniques do not work to infer these beat positions automatically.


Figure 3.6: The beat error histogram for the ONSET tracker shows for how many performances of each of the five Mazurkas a beat b is not identified. The different colors of the bars encode the beat class B a beat is assigned to, see Section 3.2.

The same beat positions are problematic for the PLP strategy, see Figure 3.5, and in particular for ONSET, see Figure 3.6. For M30-2 (Figure 3.4), most of the high bars within the piece are assigned to B3 (red). These beats, which coincide with ornaments such as trills, grace notes, or arpeggios, are physically not well defined and hard to determine. For the Mazurkas, chords are often played on-beat by the left hand. However, for notes of lower pitch, onset detection is problematic, especially when played softly. As a consequence, beats that only coincide with a bass note or chord, without any note being played in the main melody, are a frequent source of errors. This is reflected by the cyan bars (B4) frequently occurring in M17-4 (Figure 3.4). Finally, B5 (green) contains beats falling on consecutive repetitions of the same chord. This constitutes a challenge for the onset detection, especially when played softly. Both M24-2 and M68-3 exhibit a region of green bars that are incorrectly tracked by the SYNC (Figure 3.4) and PLP (Figure 3.5) trackers.

As mentioned in Section 3.3, PLP cannot handle sudden tempo changes well. As a consequence, many of the beat errors for PLP that are not assigned to any beat class (e.g., M24-2 in Figure 3.5, b = [260 : 264]) are caused by sudden tempo changes appearing in many of the performances. However, these are considered a performance-dependent property, rather than a piece-dependent musical property, and are not assigned to a beat class. Tables 3.4 and 3.5 summarize the effect of each beat class on the piece-level results.

Here, the mean beat accuracy is reported for each of the five Mazurkas when excluding the beats of a certain class. For example, M30-2 contains many beats of B3. Excluding these ornamented beats from the evaluation, the overall beat accuracy increases from A = 0.900 to A = 0.931 for SYNC (Table 3.4) and from 0.788 to 0.814 for PLP (Table 3.5). For M68-3, however, the challenge lies in the non-event beats (B1). Leaving out these beats, the accuracy increases from 0.875 to 0.910 for SYNC and from 0.674 to 0.705 for PLP.

Aside from musical properties of a piece causing beat errors, physical properties of certain performances make beat tracking difficult. In the following, we compare, by way of example, the beat tracking results for the performances of M63-3. Figure 3.7 shows the beat accuracy A for all 88 performances available for this piece. In the case of the SYNC tracker, the beat accuracy for most of the performances is in the range of 0.8−0.9, with only a few exceptions that deviate significantly (Figure 3.7a). In particular, Michalowski's 1933 performance with index 39 (pid9083-16) shows a low accuracy of only A = 0.589 due to the poor condition of the original recording, which has a low signal-to-noise ratio and many clicks. The low accuracy (A = 0.716) of performance 1 (Csalog 1996, pid1263b-12) is caused by a high amount of reverberation, which makes a precise determination of the beat positions hard. The poor result for performance 81 (Zak 1951, pid918713-20) is caused by a detuning of the piano. Compensating for this tuning effect, the synchronization result and thus the beat accuracy improves from A = 0.767 to A = 0.906.

As it turns out, ONSET tends to be even more sensitive to bad recording conditions. Again, performance 39 shows an extremely low accuracy (A = 0.087); furthermore, there are more recordings with a very low accuracy (70, 71, 79, 80, 57, and 58). Further inspection shows that all of these recordings contain noise, especially clicks and crackling, which proves devastating for onset detectors and leads to a high number of false positives. Although onset detection is problematic for low-quality recordings, the PLP approach shows a different behavior. Here, the periodicity enhancement of the novelty curve [72] provides a cleaning effect, eliminates many spurious peaks caused by recording artifacts, and leads to a higher beat accuracy. However, other performances suffer from a low accuracy (performances 29, 30, and 77). As it turns out, these examples exhibit extreme local tempo changes that cannot be captured well by the PLP approach, which relies on a constant tempo within the analysis window. On the other hand, some performances show a noticeably higher accuracy (2, 5, 11, 31, 74, and 87). All of these recordings are played in a rather constant tempo.

3.6 Further Notes

Our experiments indicate that considering multiple performances of a given piece of music simultaneously for the beat tracking task yields a better understanding not only of the algorithms' behavior but also of the underlying music material. The understanding and consideration of the physical and musical properties that make beat tracking difficult is of essential importance for improving the performance of beat tracking approaches. Exploiting the knowledge of the musical properties leading to beat tracking errors, one can design more advanced audio features. For example, in the case of the Chopin Mazurkas, the tempo and beat are often revealed only by the left hand, whereas the right hand often has an improvisatory character. For this kind of music, one may achieve improvements by separating the recording into melody (right hand) and accompaniment (left hand) using source separation techniques as proposed in [48].


Figure 3.7: Beat accuracy A for the beat trackers SYNC (a), ONSET (b), and PLP (c) for all 88 performances of M63-3.

Analyzing the voices individually, the quality of the novelty curve can be enhanced to alleviate the negative effect of the ornamented beats or weak bass beats.

In [85], our concept has been adopted for investigating beat tracking results obtained by different beat tracking approaches on the same dataset. Considering inconsistencies in the beat tracking results obtained from the different procedures, the authors apply this approach also to music recordings without any manually annotated ground truth. Here, the underlying assumption is that inconsistencies in the beat tracking results indicate problematic examples, whereas consistencies across the different trackers indicate correctly tracked beats. As a result, this approach allows for detecting difficult recordings that pose a challenge for beat tracking algorithms.

Chapter 4

Tempo-Related Audio Features

Our experiments in the preceding chapters indicate that the extraction of tempo and beat information is a challenging problem. In particular, in the case of weak note onsets and significant tempo changes, determining explicit tempo values is an error-prone step. In this chapter, we introduce various robust mid-level representations that capture local tempo characteristics of music signals. Instead of extracting explicit tempo information from the music recordings (which is an error-prone step), the mid-level representations reveal information about local changes of the tempo.

First, we generalize the concept of tempograms. Tempogram representations derived from a novelty curve already played a major role in the computation of the PLP curves as introduced in Chapter 2. In addition to the tempogram based on a Fourier transform, we now introduce a second variant based on an autocorrelation function. As it turns out, the autocorrelation tempogram naturally complements the Fourier tempogram [146].

An important property of musical rhythm is that there are various pulse levels that contribute to the human perception of tempo, such as the measure, tactus, and tatum levels [102], see Section 2.1. As an analogy, these different levels may be compared to the existence of harmonics in the pitch context. Inspired by the concept of chroma features, we introduce the concept of cyclic tempograms, where the idea is to form tempo equivalence classes by identifying tempi that differ by a power of two. Originally suggested in [104], this concept is formalized and expanded in this chapter. The resulting cyclic tempo features constitute a robust mid-level representation that reveals local tempo characteristics of music signals while being invariant to changes in the pulse level. Being the tempo-based counterpart of the harmony-based chromagrams, cyclic tempograms are suitable for music analysis and retrieval tasks where harmony-based, timbre-based, and rhythm-based criteria are not relevant or applicable.

The remainder of this chapter is organized as follows. First, in Section 4.1, we generalize the concept of tempograms and describe the two variants. In Section 4.2, we introduce the novel concept of cyclic tempograms. In Section 4.3, we sketch various applications of the resulting mid-level representations and finally conclude in Section 4.4.



Figure 4.1: (a) Novelty curve of a click track of increasing tempo (110 to 130 BPM). (b) Fourier tempogram (showing harmonics). (c) Cyclic tempogram C60 induced by (b). (d) Autocorrelation tempogram (showing subharmonics). (e) Cyclic tempogram C60 induced by (d).

4.1 Tempogram Representations

In general, a tempogram is a time-tempo representation of a given time-dependent signal (similar to a spectrogram, which is a time-frequency representation). Mathematically, a tempogram is a mapping T : R × R>0 → R≥0 depending on a time parameter t ∈ R measured in seconds and a tempo parameter τ ∈ R>0 measured in beats per minute (BPM). Intuitively, the value T(t, τ) indicates to what extent a pulse of tempo τ is present at time t. For example, let us suppose that a music signal has a dominant tempo of 120 BPM around position t = 20 seconds. Then the resulting tempogram T should have a large value T(t, τ) for τ = 120 and t = 20. Because of the ambiguity concerning the pulse levels, one typically also has large values of T for the integer multiples 2τ, 3τ, . . . (referred to as harmonics of τ) and the integer fractions τ/2, τ/3, . . . (referred to as subharmonics of τ). For an illustration, we refer to Figure 4.1, which shows various tempograms for a click track of increasing tempo.

For computing tempograms, one typically first extracts a novelty curve ∆ as introduced in Section 2.3. The peaks of this curve yield good indicators for note onsets. In a second step, local periodic patterns are derived from the novelty curve. Here, we discuss two different methods that yield tempograms with harmonics (Fourier tempogram, Section 4.1.1) and with subharmonics (autocorrelation tempogram, Section 4.1.2), respectively.

4.1.1 Fourier Tempogram

As a first strategy, we analyze the novelty curve ∆ with respect to local periodic patterns using a short-time Fourier transform. To this end, we fix a window function W : Z → R centered at t = 0 with support [−N : N].


Figure 4.2: (a) Novelty curve ∆ of an audio recording of a Waltz by Shostakovich. (b) Fourier tempogram T^F. (c) Cyclic tempogram C^F_60. (d) Autocorrelation tempogram T^A. (e) Cyclic tempogram C^A_60.

In our experiments, we use a Hann window of size 2N + 1 corresponding to six seconds of the audio recording. Then, for a frequency parameter ω ∈ R≥0, the complex Fourier coefficient F(t, ω) is defined by

F(t, ω) = ∑_{n ∈ Z} ∆(n) · W(n − t) · e^{−2πiωn} .   (4.1)

In the musical context, we rather think of tempo measured in beats per minute (BPM) than of frequency measured in Hertz (Hz). Therefore, we use a tempo parameter τ satisfying the equation τ = 60 · ω. Furthermore, we compute the tempi only for a finite set Θ ⊂ R>0. In our implementation, we cover four tempo octaves ranging from τ = 30 to τ = 480. Furthermore, we sample this interval in a logarithmic fashion covering each tempo octave by M samples, where the integer M determines the tempo resolution. Then, the discrete Fourier tempogram T^F : Z × Θ → R≥0 is defined by

T^F(t, τ) = |F(t, τ/60)| .   (4.2)

As an example, Figure 4.2b shows the tempogram T^F of a recording of a Waltz by Shostakovich. In T^F, the tempo on the beat level (roughly τ = 216 BPM) and the second harmonic of this tempo are dominant. However, the tempo on the measure level of the three-quarter Waltz (roughly 72 BPM, the third subharmonic of τ = 216) is hardly noticeable. Actually, since the novelty curve ∆ locally behaves like a track of positive clicks, it is not hard to see that the Fourier analysis responds to harmonics but suppresses subharmonics, see also [145].
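The following sketch computes a discrete Fourier tempogram from a novelty curve along the lines of Eqs. (4.1) and (4.2); for simplicity it uses a linearly sampled tempo axis and a naive frame-wise loop, whereas the text samples each tempo octave logarithmically. All names and defaults are illustrative assumptions.

import numpy as np

def fourier_tempogram(novelty, feature_rate, tempi=np.arange(30, 481), window_sec=6.0):
    """Fourier tempogram T^F(t, tau) = |F(t, tau/60)| of a novelty curve.

    novelty      : 1-D novelty curve
    feature_rate : frames per second of the novelty curve
    tempi        : tempo values in BPM (the set Theta)
    """
    N = int(round(window_sec * feature_rate)) // 2
    window = np.hanning(2 * N + 1)                    # Hann window of roughly six seconds
    n = np.arange(-N, N + 1)
    T = np.zeros((len(tempi), len(novelty)))
    for t in range(len(novelty)):                     # one tempogram column per novelty frame
        idx = t + n
        valid = (idx >= 0) & (idx < len(novelty))
        segment = np.zeros(2 * N + 1)
        segment[valid] = novelty[idx[valid]]
        segment *= window
        for row, tau in enumerate(tempi):
            omega = (tau / 60.0) / feature_rate       # cycles per frame for tempo tau
            T[row, t] = np.abs(np.sum(segment * np.exp(-2j * np.pi * omega * n)))
    return T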

4.1.2 Autocorrelation Tempogram

In the context of tempo estimation, autocorrelation-based methods are also widely used to estimate local periodicities [44]. Since these methods, as it turns out, respond to subharmonics while suppressing harmonics, they ideally complement Fourier-based methods, see [145]. To obtain a discrete autocorrelation tempogram, we proceed as follows. Again, we fix a window function W : Z → R centered at t = 0 with support [−N : N], N ∈ N. This time, we use a box window of size 2N + 1 corresponding to six seconds of the underlying music recording. The local autocorrelation is then computed by comparing the windowed novelty curve with time-shifted copies of itself. More precisely, we use the unbiased local autocorrelation

A(t, ℓ) = ( ∑_{n ∈ Z} ∆(n) · W(n − t) · ∆(n + ℓ) · W(n − t + ℓ) ) / (2N + 1 − ℓ)   (4.3)

for time t ∈ Z and time lag ℓ ∈ [0 : N]. Each time parameter t ∈ Z of the novelty curve corresponds to r seconds of the audio (in our implementation, we used r = 0.023). Then, the lag ℓ corresponds to the tempo

τ = 60/(r · ℓ) in BPM. We therefore define the autocorrelation tempogram T^A by

T^A(t, τ) = A(t, ℓ)   (4.4)

for each tempo τ = 60/(r · ℓ), ℓ ∈ [1 : N]. Finally, using standard resampling and interpolation techniques applied to the tempo domain, we derive an autocorrelation tempogram T^A : Z × Θ → R≥0 that is defined on the same tempo set Θ as the Fourier tempogram T^F, see Section 4.1.1. The tempogram T^A for our Shostakovich example is shown in Figure 4.2d, which clearly indicates the subharmonics. This fact is also illustrated by comparing the Fourier tempogram shown in Figure 4.2b and the autocorrelation tempogram shown in Figure 4.2d.
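Analogously, the unbiased local autocorrelation of Eq. (4.3) and the lag-to-tempo conversion of Eq. (4.4) can be sketched as follows; the final resampling onto the logarithmic tempo set Θ is omitted, and the parameter names are illustrative.

import numpy as np

def autocorrelation_tempogram(novelty, feature_rate, window_sec=6.0):
    """Unbiased local autocorrelation A(t, l) and its tempo axis tau = 60 / (r * l).

    Returns the lag-indexed tempogram and the corresponding tempo values in BPM.
    """
    r = 1.0 / feature_rate                      # seconds per novelty frame
    N = int(round(window_sec * feature_rate)) // 2
    win_len = 2 * N + 1                         # box window of roughly six seconds
    A = np.zeros((N, len(novelty)))
    padded = np.pad(novelty, (N, N))            # zero-pad so windows stay in range
    for t in range(len(novelty)):
        segment = padded[t:t + win_len]         # box-windowed novelty curve around frame t
        for lag in range(1, N + 1):
            # sum of delta(n) * delta(n + lag) over the overlap, normalized by (2N + 1 - lag)
            A[lag - 1, t] = np.sum(segment[:win_len - lag] * segment[lag:]) / (win_len - lag)
    tempi = 60.0 / (r * np.arange(1, N + 1))    # lag-to-tempo conversion
    return A, tempi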

4.2 Cyclic Tempograms

The different pulse levels present in the audio recordings and revealed by the tempograms (either harmonics or subharmonics) lead to octave confusions when determining absolute tempo information, see Section 2.1. To reduce the impact of such tempo confusions, we apply a similar strategy as in the computation of chroma features [4]. Recall that two pitches having fundamental frequencies f1 and f2 are considered octave equivalent if they are related by f1 = 2^k f2 for some k ∈ Z. Similarly, we say that two tempi τ1 and τ2 are octave equivalent if they are related by τ1 = 2^k τ2 for some k ∈ Z. Then, for a given tempo parameter τ, the resulting tempo equivalence class is denoted by [τ]. For example, for τ = 120 one has [τ] = {. . . , 30, 60, 120, 240, 480, . . .}. Now, the cyclic tempogram C induced by T is defined by

C(t, [τ]) := ∑_{λ ∈ [τ]} T(t, λ) .   (4.5)


Figure 4.3: Cyclic tempogram C^F_60 (top) and C^A_60 (middle) with M = 15 as well as tempo-based segmentations (bottom) for In The Year 2525 by Zager and Evans. Intro and interlude are annotated.

Note that the tempo equivalence classes topologically correspond to a circle. Fixing a reference tempo ρ (e. g., ρ = 60 BPM), the cyclic tempogram can be represented by a mapping Cρ : R × R>0 → R≥0 defined by

Cρ(t,s) := C(t, [s · ρ]), (4.6)

for t ∈ R and s ∈ R>0. Note that Cρ(t, s) = Cρ(t, 2^k s) for k ∈ Z and that Cρ is completely determined by its values for s ∈ [1, 2). Here, we use the representation Cρ with ρ = 60.

As an illustration, Figure 4.1 shows various tempograms for a click track with a tempo increasing from τ = 110 to τ = 130 BPM. Figure 4.1b shows a Fourier tempogram with harmonics and Figure 4.1c the resulting cyclic tempogram. As in the pitch context, the tempo class [3τ] is referred to as the tempo dominant and corresponds to the third harmonic 3τ. In Figure 4.1c, the tempo dominant is visible as the increasing line in the middle. Similarly, Figure 4.1d shows an autocorrelation tempogram with subharmonics and Figure 4.1e the resulting cyclic tempogram. Here, the tempo class [τ/3] is referred to as the tempo subdominant and corresponds to the third subharmonic τ/3, see the increasing line in the middle of Figure 4.1e.

For computing cyclic tempograms, recall that the tempo parameter set Θ introduced in Section 4.1.1 comprises four tempo octaves ranging from τ = 30 to τ = 480, where each octave is covered by M logarithmically spaced samples. Therefore, one obtains a discrete cyclic tempogram C^F (resp. C^A) from the tempogram T^F (resp. T^A) simply by adding up the corresponding values of the four octaves as described in Eq. (4.5). Using a reference tempo of ρ = 60 BPM, we obtain the cyclic tempogram C^F_60 (resp. C^A_60). Note that these discrete cyclic tempograms are M-dimensional, where the cyclic tempo axis is sampled at M positions. For our Shostakovich example, Figure 4.2c (resp. Figure 4.2e) shows the discrete cyclic tempogram C^F_60 (resp. C^A_60), where we used a time resolution of r = 0.023 seconds and a tempo resolution of M = 120. Note that the subharmonic tempo at the measure level, corresponding to roughly 72 BPM (s = 1.2), is clearly visible in C^A_60, but not in C^F_60.
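The folding of a tempogram into a cyclic tempogram according to Eqs. (4.5) and (4.6) can be sketched as follows; the assignment of tempo bins to M tempo classes by their fractional octave is one possible discretization and not necessarily identical to the implementation used for the figures.

import numpy as np

def cyclic_tempogram(T, tempi, ref_tempo=60.0, M=15):
    """Fold a tempogram into a cyclic tempogram C_rho with M tempo classes.

    T     : tempogram of shape (len(tempi), num_frames)
    tempi : tempo axis of T in BPM
    The scale parameter s in [1, 2) is sampled at M positions relative to ref_tempo.
    """
    s_axis = 2.0 ** (np.arange(M) / M)                    # M samples of one tempo octave
    C = np.zeros((M, T.shape[1]))
    log_tempi = np.log2(np.asarray(tempi) / ref_tempo)    # tempo in octaves relative to rho
    # assign every tempo bin to its tempo class [s * rho] via its fractional octave
    cls = np.floor((log_tempi % 1.0) * M).astype(int) % M
    for row, s_class in enumerate(cls):
        C[s_class] += T[row]                              # sum octave-equivalent tempo bins
    return C, s_axis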


Figure 4.4: Cyclic tempogram C^F_60 (top) and C^A_60 (middle) with M = 15 as well as tempo-based segmentations (bottom) for the Piano Sonata Op. 13 (Pathétique) by Beethoven performed by Barenboim. All Grave parts are annotated.

4.3 Applications to Music Segmentation

As mentioned before, the cyclic tempograms are the tempo-based counterparts of the harmony-based chromagrams. Compared to usual tempograms, the cyclic versions are more robust to tempo ambiguities that are caused by the various pulse levels. Furthermore, one can simulate changes in tempo simply by cyclically shifting a cyclic tempogram. Note that this is similar to the property of chromagrams, which can be cyclically shifted to simulate modulations in pitch. As one further advantage, even low-dimensional versions of discrete cyclic tempograms still bear valuable local tempo information of the underlying music signal.

To illustrate the potential of our concept, we sketch how cyclic tempograms can be used for automated music segmentation, which is a central task in the field of music information retrieval [123; 91; 114]. Actually, there are many different strategies for segmenting music signals such as novelty-based, repetition-based, and homogeneity-based strategies. In the latter, the idea is to partition the music signal into segments that are homogeneous with regard to a specific musical property [114]. In this context, timbre-related audio features such as MFCCs or spectral envelopes are frequently used, resulting in timbre-based segmentations. Similarly, using chroma-based audio features results in harmony-based segmentations. We now indicate how our cyclic tempograms can be applied to obtain tempo-based segmentations (using a simple two-class clustering procedure for illustration; one possible realization is sketched below).

In the following examples, we use low-dimensional versions of C^F_60 and C^A_60 based on M = 15 different tempo classes. In our first example, we consider the song In The Year 2525 by Zager and Evans. This song starts with a slow intro and contains a slow interlude of the same tempo. The remaining parts (basically eight repetitions of the section) are played in a different, much faster tempo. As can be seen in Figure 4.3, both cyclic tempograms, C^F_60 and C^A_60, allow for separating the slow from the fast parts.
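The two-class clustering mentioned above can be realized, for example, by a plain k-means on the cyclic tempogram frames; the following sketch is one such realization under this assumption and is not necessarily the exact procedure used for the figures.

import numpy as np

def two_class_segmentation(C, iters=20, seed=0):
    """Assign every tempogram frame to one of two tempo classes with a basic k-means.

    C : cyclic tempogram of shape (M, num_frames); each column is one feature vector.
    Returns a label per frame (0 or 1); a median filter could be applied to smooth the result.
    """
    X = C.T                                             # frames as rows
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels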


Figure 4.5: Cyclic tempogram C^F_60 (top) and C^A_60 (middle) with M = 15 as well as tempo-based segmentations (bottom) for the Hungarian Dance No. 5 by Brahms.

As a second example, we consider a recording of the first movement of Beethoven's Piano Sonata Op. 13 (Pathétique). After a dramatic Grave introduction, the piece continues with Allegro di molto e con brio. However, it returns twice to Grave: at the beginning of the development section as well as in the coda. Using a purely tempo-based segmentation, the occurrences of the three Grave sections can be recovered, see Figure 4.4. Here, in particular the autocorrelation-based cyclic tempogram C^A_60 yields a clear discrimination. Finally, as a third example, we consider a piano version of Brahms' Hungarian Dance No. 5, a piece with many abrupt changes in tempo. This property is well reflected by the cyclic tempograms shown in Figure 4.5. In particular, the Fourier-based cyclic tempogram C^F_60 separates the slow middle part well from the other, much faster parts.

4.4 Further Notes

As shown in this chapter, tempogram representations constitute a class of powerful mid-level representations that reveal local tempo information for music with significant tempo changes, while avoiding the error-prone extraction of explicit tempo values. Being the tempo-based counterpart of the harmony-based chromagrams, cyclic tempograms are suitable for music analysis and retrieval tasks where harmony-based criteria are not relevant. In the three examples discussed in this chapter, the cyclic tempograms yield musically meaningful segmentations purely based on a low-dimensional representation of tempo. Actually, these segments cannot be recovered using MFCCs or chroma features, since the homogeneity assumption does not hold with regard to timbre or harmony.

For the future, one could integrate our concept of cyclic tempo features into a segmentation and structure extraction framework. In practice, various strategies based on different musical dimensions are needed to cope with the richness and diversity of music [91; 142]. In this context, our features reveal musically meaningful segmentations purely based on tempo information, while being invariant to rhythmic [92], harmonic [84], and timbre [114] properties. Furthermore, having low-dimensional tempo features (on the order of the 12–20 dimensions of chroma and MFCC features) makes it possible to employ index-based range and nearest-neighbor searches, which is important in view of efficient music retrieval.

Part II

Music Segmentation

Chapter 5

Reference-Based Folk Song Segmentation

Generally, a folk song is referred to as a song that is sung by the common people of a region or culture during work or social activities. As a result, folk music is closely related to the musical culture of a specific nation or region. For many decades, significant efforts have been made to assemble and study large collections of folk songs [95; 178; 94], which are not only part of the nations' cultural heritage but also allow musicologists to conduct folk song research on a large scale. Among others, researchers are interested in reconstructing and understanding the genetic relations between variants of folk songs as well as in discovering musical connections and distinctions between different national or regional cultures [178; 95; 94]. Even though folk songs were typically transmitted only orally without any fixed symbolic notation, most folk song research is conducted on the basis of notated music material, which is obtained by transcribing recorded tunes into symbolic, score-based music representations. These transcriptions are often idealized and tend to represent the presumed intention of the singer rather than the actual performance. After the transcription, the audio recordings are often no longer studied in the actual research. This seems somewhat surprising, since one of the most important characteristics of folk songs is that they are part of oral culture. Therefore, one may conjecture that performance aspects enclosed in the recorded audio material are likely to bear valuable information, which is no longer contained in the transcriptions. Furthermore, even though the notated music material may be more suitable for classifying and identifying folk songs using automated methods, the user may want to listen to the original recordings rather than to synthesized versions of the transcribed tunes. One reason for folk song researchers to focus on symbolic representations is that, due to its massive data volume and complexity, audio material is generally hard to deal with. In a specific folk song recording, musically relevant information such as the occurring notes (specified by musical onset times, pitches, and durations), the melody, or the rhythm is not given explicitly, but is somehow hidden in the waveform of the audio signal. It is the object of this chapter to indicate how the original recordings can be made more


easily accessible for folk song researchers and listeners by bridging the gap between the symbolic and the audio domain. In particular, we present a procedure for automatically segmenting a given folk song recording that consists of several repetitions of the same tune into its individual stanzas. More precisely, for most folk songs a tune is repeated over and over again with changing lyrics. A typical field recording therefore consists of a sequence A1 A2 ... AK of stanzas Ak, k ∈ [1 : K] := {1, 2, ..., K}, where each Ak corresponds to the same tune. Given a field recording, the segmentation task consists in identifying the temporal boundaries of the various stanzas. In this chapter, we introduce a reference-based segmentation algorithm that employs a manually transcribed reference stanza. The segmentation is then achieved by locally comparing the field recording with the reference stanza by means of a suitable distance function. Main challenges arise from the fact that the folk songs are performed by elderly non-professional singers under poor recording conditions. The singers often deviate significantly from the expected pitches and have serious problems with the intonation. Even worse, their voices often fluctuate by several semitones downwards or upwards across the various stanzas of the same recording. As our main contribution, we introduce a combination of robust audio features along with various cleaning and audio matching strategies to account for such musical variations and inaccuracies. Our evaluation based on folk song recordings shows that we obtain a reliable segmentation even in the presence of strong musical variations. The remainder of this chapter is organized as follows. In Section 5.1, we give an overview of computational folk song research and introduce the Dutch folk song collection Onder de groene linde (OGL). In Section 5.2, we give a short introduction to chroma features, which lay the basis for our analysis. Then, we describe a distance function for comparing chroma features (Section 5.3) and show how the segmentation of the audio recordings is derived (Section 5.4). In Section 5.5, as one main contribution of this chapter, we describe various enhancement strategies for achieving robustness to the aforementioned pitch fluctuations and recording artifacts. Then, in Section 5.6, we report on our systematic experiments conducted on the OGL collection. Finally, further notes are given in Section 5.7.

5.1 Background on Folk Song Research

Folk songs are typically performed by common people of a region or culture during work or recreation. These songs are generally not fixed by written scores but are learned and transmitted by listening to and participating in performance. Systematic research on folk song traditions started in the 19th century. At first, researchers wrote down folk songs in music notation at performance time, but from an early date onwards performances were recorded using available technologies. Over more than a century of research, enormous amounts of folk song data have been assembled. Since the late 1990s, digitization of folk songs has become a matter of course. See [25] for an overview of European collections. Digitized folk songs offer interesting challenges for computational research, and the availability of extensive folk song material requires computational methods for large-scale musicological investigation of this data. Much interdisciplinary research into such methods has been carried out within the context of music information retrieval (MIR).

An important challenge is to create computational methods that contribute to a better musical understanding of the repertoire [177]. For example, using computational methods, motivic relationships between different folk song repertoires are studied in [94]. Within individual traditions, the notion of tune family is important. Tune families consist of melodies that are considered to be historically related through the process of oral transmission. In the WITCHCRAFT project, computational models for tune families are investigated in order to create a melody search engine for Dutch folk songs [179; 186]. In the creation of such models, aspects from music cognition play an important role. The representation of a song in human memory is not literal. During performance, the actual appearance of the song is recreated. Melodies thus tend to change over time and between performers. But even within a single performance of a strophic song, interesting variations of the melody may be found. By systematically studying entire collections of folk songs, researchers try to reconstruct and understand the genetic relation between variants of folk songs with the goal to discover musical connections and distinctions between different national or regional cultures [95; 178]. To support such research, several databases of encoded folk song melodies have been assembled, the best known of which is the Essen folk song database,1 which currently contains roughly 20000 folk songs from a variety of sources and cultures. This collection has also been widely used in MIR research. For a survey of folk song research, we refer to [178]. Even though folk songs are typically orally transmitted in performance, much of the research is conducted on the basis of notated musical material and leaves potentially valuable performance aspects enclosed in the recorded audio material out of consideration. However, various folk song collections contain a considerable amount of audio data, which has not yet been explored at a larger scale. An important step in unlocking such collections of orally transmitted folk songs is the creation of content-based search engines which allow users to browse and navigate within these collections on the basis of the different musical dimensions. The engines should enable a user to search for encoded data using advanced melodic similarity methods. Furthermore, it should also be possible to not only visually present the retrieved items, but also to supply the corresponding audio recordings for acoustic playback. One way of solving this problem is to create robust alignments between retrieved encodings (for example in MIDI format) and the audio recordings. The segmentation and annotation procedure described in the following accomplishes exactly this task. Since folk songs are part of oral culture, one may conjecture that performance aspects enclosed in the recorded audio material are likely to bear valuable information, which is no longer contained in the transcriptions. Performance analysis has become increasingly important in musicological research and in music psychology. In folk song research (or, more widely, in ethnomusicological research), computational methods are beginning to be applied to audio recordings as well. Examples are the study of African tone scales [122] and Turkish rhythms [87]. Comparing the various stanzas of a folk song allows for studying performance and melodic variation within a single performance.
In the Netherlands, folk song ballads (strophic, narrative songs) have been extensively collected and studied.

1 http://www.esac-data.org/

A long-term effort to record these songs was started by Will Scheepers in the early 1950s, and it was continued by Ate Doornbosch until the 1990s [70]. Their field recordings were usually broadcast in the radio program Onder de groene linde (Under the green lime tree). Listeners were encouraged to contact Doornbosch if they knew more about the songs. Doornbosch would then record their version and broadcast it. In this manner, a collection, in the following referred to as the OGL collection, was created that not only represents part of the Dutch cultural heritage but also documents the textual and melodic variation resulting from oral transmission. At the time of the recording, ballad singing had already largely disappeared from popular culture. Ballads were widely sung during manual work until the first decades of the 20th century. The tradition came to an end as a consequence of two innovations: the radio and the mechanization of manual labor. Decades later, when the recordings were made, the mostly female, elderly singers often had to delve deeply into their memories to retrieve the melodies. The effect is often audible in the recordings: there are numerous false starts, and it is evident that singers regularly began to feel comfortable about their performance only after a few strophes. Part of the effect may have been caused by the fact that the recordings were generally made from solo performances at home, whereas the original performance setting would often have been a group of singers performing during work. The OGL collection, which is currently hosted at the Meertens Institute in Amsterdam, is available through the Nederlandse Liederenbank (NLB)2. The database also gives access to very rich metadata, including date and location of recording, information about the singer, and classification by tune family and (textual) topic. The OGL collection contains 7277 audio recordings, which have been digitized as MP3 files (stereo, 160 kbit/s, 44.1 kHz). Nearly all of the field recordings are monophonic and comprise a large number of stanzas (often more than 10). When the collection was assembled, melodies were transcribed on paper by experts. Usually only one stanza is given in music notation, but variants from other stanzas are regularly included. The transcriptions are often idealized and tend to represent the presumed intention of the singer rather than the actual performance. For a large number of melodies, transcribed stanzas are available in various symbolic formats including LilyPond3 and Humdrum [158], from which MIDI representations have been generated (with the tempo set at 120 BPM for the quarter note). To date, around 2500 folk songs from OGL have been encoded. In addition, the encoded corpus contains 1400 folk songs from written sources and 1900 instrumental melodies from written, historical sources, bringing the total number of encoded melodies to approximately 5800. A detailed description of the encoded corpus is provided in [180].

5.2 Chroma-Based Audio Features

In our segmentation procedure, we assume that we are given a transcription of a reference tune in the form of a MIDI file. Recall from Section 5.1 that this is exactly the situation we have with the songs of the OGL collection. In the first step, we transform the MIDI reference as well as the audio recording into a common mid-level representation.

2 Dutch Song Database, http://www.liederenbank.nl
3 www.lilypond.org


Figure 5.1: Representations of the beginning of the first stanza of the folk song OGL27517. (a) Score representation of the manually generated reference transcription. (b) Chromagram of the MIDI representation of the transcription. (c) Smoothed MIDI chromagram (CENS feature). (d) Chromagram of an audio recording (CENS feature). (e) F0-enhanced chromagram as introduced as the first enhancement strategy in Section 5.5.

Here, we use the well-known chroma representation, as described in this section. Chroma features have turned out to be a powerful mid-level representation for relating harmony-based music, see [4; 6; 90; 123; 143; 162; 164]. Assuming the equal-tempered scale, the term chroma refers to the elements of the set {C, C♯, D, ..., B} that consists of the twelve pitch spelling attributes as used in Western music notation. Note that in the equal-tempered scale, different pitch spellings such as C♯ and D♭ refer to the same chroma. A chroma vector can be represented as a 12-dimensional vector x = (x(1), x(2), ..., x(12))^T, where x(1) corresponds to chroma C, x(2) to chroma C♯, and so on. Representing the short-time energy content of the signal in each of the 12 pitch classes, chroma features do not only account for the close octave relationship in both melody and harmony as it is prominent in Western music, but also introduce a high degree of robustness to variations in timbre and articulation [4]. Furthermore, normalizing the features makes them invariant to dynamic variations. It is straightforward to transform a MIDI representation into a chroma representation or chromagram. Using the explicit MIDI pitch and timing information, one basically identifies


Figure 5.2: Magnitude responses in dB for some of the pitch filters of the multirate pitch filter bank used for the chroma computation. Top: Filters corresponding to MIDI pitches p ∈ [69 : 93] (with respect to the sampling rate 4410 Hz). Bottom: Filters shifted half a semitone upwards.

pitches that belong to the same chroma class within a sliding window of a fixed size, see [90]. Disregarding information on dynamics, we derive a binary chromagram assuming only the values 0 and 1.⁴ Furthermore, dealing with monophonic tunes, one has for each frame at most one non-zero chroma entry that is equal to 1. Figure 5.1 shows various representations for the folk song OGL27517. Figure 5.1b shows a chromagram of a MIDI reference corresponding to the score shown in Figure 5.1a. In the following, the chromagram of the reference transcription is referred to as reference chromagram or MIDI chromagram. For transforming an audio recording into a chromagram, one has to revert to signal processing techniques. There are many ways of computing and enhancing chroma features, which results in a large number of chroma variants with different properties [4; 58; 59; 123]. Most chroma implementations are based on short-time Fourier transforms in combination with binning strategies [4; 59]. We use chroma features obtained from a pitch decomposition using a multirate filter bank as described in [123]. A given audio signal is first decomposed into 88 frequency bands with center frequencies f_p corresponding to the pitches A0 to C8 (MIDI pitches p = 21 to p = 108), where

f_p = 440 Hz · 2^((p−69)/12) .                                    (5.1)

Then, for each subband, we compute the short-time mean-square power (i.e., the samples of each subband output are squared) using a rectangular window of a fixed length and an overlap of 50 %. In the following, we use a window length of 200 milliseconds leading to a feature rate of 10 Hz (10 features per second). The resulting features measure the local energy content of each pitch subband and indicate the presence of certain musical notes within the audio signal, see [123] for further details. The employed pitch filters possess a relatively wide passband, while still properly separating adjacent notes thanks to sharp cutoffs in the transition bands, see Figure 5.2.

4 Information about note intensities is not captured by the reference transcriptions.

Actually, the pitch filters are robust to deviations of up to ±25 cents5 from the respective note's center frequency. The pitch filters will play an important role in the following sections. We then obtain a chroma representation by simply adding up the corresponding values that belong to the same chroma. To achieve invariance in dynamics, chroma vectors are normalized with respect to the Euclidean norm (signal energy). The resulting chroma features are further processed by applying suitable quantization, smoothing, and downsampling operations, resulting in enhanced chroma features referred to as CENS (Chroma Energy Normalized Statistics) features. An implementation of these features is available online6 and described in [127]. Adding a further degree of abstraction by considering short-time statistics over energy distributions within the chroma bands, CENS features constitute a family of scalable and robust audio features and have turned out to be very useful in audio matching and retrieval applications [135; 105]. These features allow for introducing a temporal smoothing. To this end, feature vectors are averaged using a sliding window technique depending on a window size denoted by w (given in frames) and a downsampling factor denoted by d, see [123] for details. In our experiments, we average feature vectors over a window corresponding to one second of audio while keeping a feature resolution of 10 Hz (10 features per second). Figure 5.1c shows the resulting smoothed version of the reference (MIDI) chromagram shown in Figure 5.1b. Figure 5.1d shows the final smoothed chromagram (CENS) for one of the five stanzas of the audio recording. For technical details, we refer to the cited literature.
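The following Python sketch illustrates the overall feature pipeline (pitch binning, chroma folding, normalization, and temporal smoothing). It is only a rough approximation: it replaces the multirate pitch filter bank of [123] by a plain STFT with nearest-pitch binning, omits the CENS quantization step, and all parameter values are illustrative rather than taken from the actual implementation.

```python
import numpy as np

def audio_to_chroma(x, sr, win_len=0.2, hop=0.1, smooth_sec=1.0):
    """Simplified chromagram: STFT -> pitch bins -> chroma -> normalize -> smooth.

    x: mono audio signal (assumed longer than one window); sr: sampling rate.
    A 200 ms window with 100 ms hop yields a 10 Hz feature rate as in the text.
    """
    n_win, n_hop = int(win_len * sr), int(hop * sr)
    window = np.hanning(n_win)
    n_frames = 1 + max(0, (len(x) - n_win) // n_hop)
    chroma = np.zeros((12, n_frames))
    freqs = np.fft.rfftfreq(n_win, 1.0 / sr)
    valid = freqs > 20.0                              # skip DC / very low bins
    midi = np.round(69 + 12 * np.log2(freqs[valid] / 440.0)).astype(int)
    chroma_idx = np.mod(midi, 12)                     # chroma class per FFT bin
    for i in range(n_frames):
        frame = x[i * n_hop : i * n_hop + n_win] * window
        mag2 = np.abs(np.fft.rfft(frame)) ** 2        # short-time power
        np.add.at(chroma[:, i], chroma_idx, mag2[valid])
    # Normalize each frame (Euclidean norm) and smooth over roughly one second
    chroma /= np.linalg.norm(chroma, axis=0) + 1e-9
    w = max(1, int(smooth_sec / hop))
    kernel = np.ones(w) / w
    return np.vstack([np.convolve(c, kernel, mode="same") for c in chroma])

# Usage: chroma = audio_to_chroma(signal, 22050)
```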

5.3 Distance Function

On the basis of the chroma representations, the idea is to locally compare the reference with the audio recording by means of a suitable distance function. This distance function expresses the distance of the MIDI reference chromagram to suitable subsegments of the audio chromagram while being invariant to temporal variations between the reference and the various stanzas of the audio recording. More precisely, let X = (x_1, x_2, ..., x_K) be the sequence of chroma features obtained from the MIDI reference and let Y = (y_1, y_2, ..., y_L) be the one obtained from the audio recording as explained in Section 5.2. The resulting features X(k) := x_k, k ∈ [1 : K] := {1, 2, ..., K}, and Y(ℓ) := y_ℓ, ℓ ∈ [1 : L], are normalized 12-dimensional vectors. We define the distance function ∆ := ∆_{X,Y} : [1 : L] → R ∪ {∞} with respect to X and Y using a variant of dynamic time warping (DTW):

∆(ℓ) := (1/K) · min_{a ∈ [1:ℓ]} DTW(X, Y(a : ℓ)),                    (5.2)

where Y(a : ℓ) denotes the subsequence of Y starting at index a and ending at index ℓ ∈ [1 : L]. Furthermore, DTW(X, Y(a : ℓ)) denotes the DTW distance between X and Y(a : ℓ) with respect to a suitable local cost measure (in our case, the cosine distance). The distance function ∆ can be computed efficiently using dynamic programming. For details on DTW and the distance function, we refer to [123]. The interpretation of ∆ is as follows: a small value ∆(ℓ) for some ℓ ∈ [1 : L] indicates that the subsequence of

5 The cent is a logarithmic unit to measure musical intervals. The semitone interval of the equally-tempered scale equals 100 cents.
6 www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/


Figure 5.3: Top: Distance function ∆ for OGL27517 using original chroma features (gray) and F0-enhanced chroma features (black). Bottom: Resulting segmentation.

Y starting at index a_ℓ (with a_ℓ ∈ [1 : ℓ] denoting the minimizing index in Eq. (5.2)) and ending at index ℓ is similar to X. Here, the index a_ℓ can be recovered by a simple backtracking algorithm within the DTW computation procedure. The distance function ∆ for OGL27517 is shown in Figure 5.3 as a gray curve. The five pronounced minima of ∆ indicate the endings of the five stanzas of the audio recording.
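A minimal sketch of how ∆ can be computed with a subsequence variant of DTW is given below. It assumes chromagrams with frames as columns, uses the cosine distance and the step sizes (1,1), (1,0), (0,1); the implementation referred to in [123] may use different step sizes and weights. Instead of explicit backtracking, the start index belonging to each end index is tracked alongside the accumulated cost.

```python
import numpy as np

def distance_function(X, Y):
    """Compute Delta(l) = (1/K) * min_a DTW(X, Y(a:l)) via subsequence DTW.

    X: reference chromagram (12, K); Y: audio chromagram (12, L).
    Returns Delta (length L) and, for each end index l, the start index a_l
    (0-based), corresponding to a_l in Eq. (5.2).
    """
    K, L = X.shape[1], Y.shape[1]
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-9)
    C = 1.0 - Xn.T @ Yn                      # cosine distance matrix, (K, L)
    D = np.full((K, L), np.inf)
    start = np.zeros((K, L), dtype=int)      # remembers the start column a
    D[0, :] = C[0, :]                        # free start anywhere in Y
    start[0, :] = np.arange(L)
    for k in range(1, K):                    # first column: only vertical steps
        D[k, 0] = D[k - 1, 0] + C[k, 0]
        start[k, 0] = 0
    for k in range(1, K):
        for l in range(1, L):
            steps = (D[k - 1, l - 1], D[k - 1, l], D[k, l - 1])
            best = int(np.argmin(steps))
            D[k, l] = C[k, l] + steps[best]
            start[k, l] = (start[k - 1, l - 1], start[k - 1, l],
                           start[k, l - 1])[best]
    return D[K - 1, :] / K, start[K - 1, :]  # Eq. (5.2) and start indices
```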

5.4 Segmentation of the Audio Recording

Recall that we assume that a folk song audio recording basically consists of a number of repeating stanzas. Exploiting the existence of a MIDI reference and assuming the repetitive structure of the recording, we apply the following simple greedy segmentation strategy. Using the distance function ∆, we look for the index ℓ ∈ [1 : L] minimizing ∆ and compute the starting index a_ℓ. Then, the interval S_1 := [a_ℓ : ℓ] constitutes the first segment. The value ∆(ℓ) is referred to as the cost of the segment. To avoid large overlaps between the various segments to be computed, we exclude a neighborhood [L_ℓ : R_ℓ] ⊂ [1 : L] around the index ℓ from further consideration. In our strategy, we set L_ℓ := max(1, ℓ − (2/3)K) and R_ℓ := min(L, ℓ + (2/3)K), thus excluding a range of two thirds of the reference length to the left as well as to the right of ℓ. To achieve the exclusion, we modify ∆ simply by setting ∆(m) := ∞ for m ∈ [L_ℓ : R_ℓ]. To determine the next segment S_2, the same procedure is repeated using the modified distance function, and so on. This results in a sequence of segments S_1, S_2, S_3, .... The procedure is repeated until all values of the modified ∆ lie above a suitably chosen quality threshold τ > 0. Let N denote the number of resulting segments; then S_1, S_2, ..., S_N constitutes the final segmentation result, see Figure 5.3 for an illustration.
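The greedy selection itself can be sketched as follows. The inputs ∆ and the start indices a_ℓ are assumed to come from a routine such as the subsequence-DTW sketch above; the 2/3·K exclusion neighborhood and the threshold τ follow the description in the text.

```python
import numpy as np

def greedy_segmentation(delta, starts, K, tau=0.4):
    """Iteratively pick minima of the distance function (greedy strategy).

    delta:  distance function Delta, one value per audio frame (length L).
    starts: start index a_l for every end index l.
    K:      length of the reference in frames; tau: quality threshold.
    Returns a list of segments (a, l) in frame indices, in order of selection.
    """
    delta = np.asarray(delta, dtype=float).copy()
    L = len(delta)
    segments = []
    while True:
        l = int(np.argmin(delta))
        if delta[l] > tau:                       # stop: no good match left
            break
        segments.append((int(starts[l]), l))
        # Exclude a neighborhood of 2/3 of the reference length around l
        left = max(0, l - (2 * K) // 3)
        right = min(L - 1, l + (2 * K) // 3)
        delta[left : right + 1] = np.inf
    return segments
```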

5.5 Enhancement Strategies

This basic segmentation approach works well as long as the singer roughly follows the reference tune and stays in tune. However, for the field recordings, this is an unrealistic assumption. In particular, most singers have significant problems with the intonation.

Their voices often fluctuate by several semitones downwards or upwards across the various stanzas of the same recording. In this section, we show how the segmentation procedure can be improved to account for poor recording conditions, intonation problems, and pitch fluctuations. Recall that the comparison of the MIDI reference and the audio recording is performed on the basis of chroma representations. Therefore, the segmentation algorithm described so far only works well in the case that the MIDI reference and the audio recording are in the same musical key. Furthermore, the singer has to stick roughly to the pitches of the well-tempered scale. Both assumptions are violated for most of the songs. Even worse, the singers often fluctuate with their voice by several semitones within a single recording. This often leads to poor local minima or even completely useless distance functions, as illustrated in Figure 5.4. To deal with local and global pitch deviations as well as with poor recording conditions, we use a combination of various enhancement strategies.

5.5.1 F0-Enhanced Chromagrams

In our first strategy, we enhance the quality of the chroma features, similarly to [59; 46], by picking only dominant spectral coefficients, which results in a significant attenuation of noise components. Dealing with monophonic music, we can go even one step further by only picking spectral components that correspond to the fundamental frequency (F0). More precisely, we use a modified autocorrelation method as suggested in [32] to estimate the fundamental frequency for each audio frame. For each frame, we then determine the MIDI pitch p ∈ [1 : 120] having center frequency

f_p = 2^((p−69)/12) · 440 Hz that is closest to the estimated fundamental frequency. Next, in the pitch decomposition used for the chroma computation (as explained in Section 5.2), we assign energy only to the pitch subband that corresponds to the determined MIDI pitch; all other pitch subbands are set to zero within this frame. Finally, the resulting sparse pitch representation is projected onto a chroma representation and smoothed as explained in Section 5.2. The F0-based pitch assignment is capable of suppressing most of the noise resulting from poor recording conditions, and local pitch deviations caused by the singers' intonation problems as well as vibrato are compensated to a substantial degree. The cleaning effect on the resulting chromagram, which is also referred to as F0-enhanced chromagram, is illustrated by Figure 5.1e, showing the F0-enhanced variant of the audio chromagram (see Figure 5.1d). This enhancement strategy leads to audio chromagrams that exhibit a much higher similarity to the reference chromagram (see Figure 5.1c) than the original chromagrams. As a result, the desired local minima of the distance function ∆, which are crucial in our segmentation procedure, become more pronounced. This effect is also illustrated by the distance functions shown in Figure 5.3. Even though the folk song recordings are monophonic, the F0 estimation is often not accurate enough in view of applications such as automated transcription. However, using chroma representations, octave errors, which are typical in F0 estimation, become irrelevant.
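Assuming the F0 trajectory is already given (for example by an autocorrelation-based estimator as in [32], which is not reimplemented here), the quantization to MIDI pitches and the projection onto a binary chromagram can be sketched as follows.

```python
import numpy as np

def f0_to_chroma(f0, fmin=55.0):
    """Binary F0-enhanced chromagram from an F0 trajectory.

    f0: fundamental-frequency estimates in Hz, one value per frame
        (0 or a negative value marks unvoiced/unreliable frames).
    Each voiced frame receives a single non-zero chroma entry, namely the
    chroma class of the MIDI pitch whose center frequency is closest to F0.
    """
    chroma = np.zeros((12, len(f0)))
    for i, f in enumerate(f0):
        if f < fmin:                                  # skip unvoiced frames
            continue
        p = int(round(69 + 12 * np.log2(f / 440.0)))  # nearest MIDI pitch
        chroma[p % 12, i] = 1.0
    return chroma

# Example: a voice drifting slightly around A4 still maps to chroma class A
print(f0_to_chroma(np.array([438.0, 445.0, 0.0, 452.0])))
```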


Figure 5.4: Distance functions ∆ (light gray), ∆trans (dark gray), and ∆fluc (black) as well as the resulting segmentations for the song OGL25010.

5.5.2 Transposition-Invariant Distance Function

Next, we show how to deal with global pitch deviations and continuous fluctuations across several semitones. To account for a global difference in key between the MIDI reference and the audio recording, we revert to the observation by Goto [62] that the twelve cyclic shifts of a 12-dimensional chroma vector naturally correspond to the twelve possible transpositions. Therefore, it suffices to determine the shift index that minimizes the chroma distance of the audio recording and the MIDI reference and then to cyclically shift the audio chromagram according to this index. Note that instead of shifting the audio chromagram, one can also shift the MIDI chromagram in the inverse direction. The minimizing shift index can be determined either globally by using averaged chroma vectors as suggested in [162] or locally by computing twelve different distance functions for the twelve shifts, which are then minimized to obtain a single transposition-invariant distance function. We detail the latter strategy, since it also solves part of the problem of having a fluctuating voice within the audio recording. A similar strategy was used in [124] to achieve transposition invariance for music structure analysis tasks. We simulate the various pitch shifts by considering all twelve possible cyclic shifts of the MIDI reference chromagram. We then compute a separate distance function for each of the shifted reference chromagrams and the original audio chromagram. Finally, we minimize the twelve resulting distance functions, say ∆_0, ..., ∆_11, to obtain a single transposition-invariant distance function ∆trans : [1 : L] → R ∪ {∞}:

∆trans(ℓ) := min_{i ∈ [0:11]} ∆_i(ℓ) .                    (5.3)

Stanza      1    2    3    4    5    6    7    8    9    10
12 shift    5    5    5    4    4    4    4    3    3    3
24 shift    5.0  5.0  4.5  4.5  4.0  4.0  3.5  3.5  3.0  3.0

Table 5.1: Shift indices (cyclically shifting the audio chromagrams upwards) used for transposing the various stanzas of the audio recording of OGL25010 to optimally match the MIDI reference, see also Figure 5.4. The shift indices are given in semitones (obtained by ∆trans) and in half semitones (obtained by ∆fluc).

Figure 5.4 shows the resulting function ∆trans for a folk song recording with strong fluctuations. In contrast to the original distance function ∆, the function ∆trans exhibits a number of significant local minima that correctly indicate the segmentation boundaries of the stanzas.
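A sketch of the local strategy is given below: the distance function is computed once for each of the twelve cyclic shifts of the reference chromagram, and the pointwise minimum is taken, cf. Eq. (5.3). The callable `distance_function` is a stand-in for the subsequence-DTW routine sketched in Section 5.3.

```python
import numpy as np

def transposition_invariant_distance(X_midi, Y_audio, distance_function):
    """Pointwise minimum over the twelve cyclically shifted reference chromagrams.

    X_midi:  reference (MIDI) chromagram, shape (12, K).
    Y_audio: audio chromagram, shape (12, L).
    distance_function: callable returning (delta, starts) for two chromagrams.
    Returns delta_trans (length L), the per-frame best shift index, and the
    corresponding start indices.
    """
    L = Y_audio.shape[1]
    all_delta = np.empty((12, L))
    all_starts = np.empty((12, L), dtype=int)
    for i in range(12):
        X_shifted = np.roll(X_midi, i, axis=0)        # cyclic chroma shift by i bins
        all_delta[i], all_starts[i] = distance_function(X_shifted, Y_audio)
    best = all_delta.argmin(axis=0)                   # Eq. (5.3)
    idx = np.arange(L)
    return all_delta[best, idx], best, all_starts[best, idx]
```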

5.5.3 Fluctuation-Invariant Distance Function

So far, we have accounted for transpositions that refer to the pitches of the equal-tempered scale. However, the above-mentioned voice fluctuations are continuous in frequency and do not stick to a strict pitch grid. Recall from Section 5.2 that our pitch filters can cope with fluctuations of up to ±25 cents. To cope with pitch deviations between 25 and 50 cents, we employ a second filter bank, in the following referred to as half-shifted filter bank, where all pitch filters are shifted by half a semitone (50 cents) upwards, see Figure 5.2. Using the half-shifted filter bank, one can compute a second chromagram, referred to as half-shifted chromagram. A similar strategy is suggested in [59; 162], where generalized chroma representations with 24 or 36 bins (instead of the usual 12 bins) are derived from a short-time Fourier transform. Now, using the original chromagram as well as the half-shifted chromagram in combination with the respective 12 cyclic shifts, one obtains 24 different distance functions in the same way as described above. Minimization over the 24 functions yields a single function ∆fluc referred to as the fluctuation-invariant distance function. The improvements achieved by this novel distance function are illustrated by Figure 5.4. In regions with a bad intonation, the local minima of ∆fluc are much more significant than those of ∆trans. Table 5.1 shows the optimal shift indices derived from the transposition-invariant and fluctuation-invariant strategies, where the decreasing indices indicate to what extent the singer's voice rises across the various stanzas of the song.

5.6 Experiments

Our evaluation is based on a dataset consisting of 47 representative folk song recordings selected from the OGL collection described in Section 5.1. The audio dataset has a total length of 156 minutes, where each of the recorded songs consists of 4 to 34 stanzas, amounting to a total number of 465 stanzas. The recordings reveal significant deteriorations concerning the audio quality as well as the singers' performances. Furthermore, in various recordings, the tunes are overlaid with sounds such as ringing bells, singing birds, or barking dogs, and sometimes the songs are interrupted by remarks of the singers.

Strategy   F0    P      R      F      α      β      γ
∆          −     0.898  0.628  0.739  0.338  0.467  0.713
∆          +     0.884  0.688  0.774  0.288  0.447  0.624
∆trans     −     0.866  0.817  0.841  0.294  0.430  0.677
∆trans     +     0.890  0.890  0.890  0.229  0.402  0.559
∆fluc      −     0.899  0.901  0.900  0.266  0.409  0.641
∆fluc      +     0.912  0.940  0.926  0.189  0.374  0.494

Table 5.2: Performance measures for the reference-based segmentation procedure using the tolerance parameter δ = 2 and the quality threshold τ = 0.4. The second column indicates whether original (−) or F0-enhanced (+) chromagrams are used.

We manually annotated all audio recordings by specifying the segment boundaries of the stanzas' occurrences in the recordings. Since for most cases the end of a stanza more or less coincides with the beginning of the next stanza, and since the beginnings are more important in view of retrieval and navigation applications, we only consider the starting boundaries of the segments in our evaluation. In the following, these boundaries are referred to as ground truth boundaries. To assess the quality of the final segmentation result, we use precision and recall values. To this end, we check to what extent the 465 manually annotated stanzas within the evaluation dataset have been identified correctly by the segmentation procedure. More precisely, we say that a computed starting boundary is a true positive if it coincides with a ground truth boundary up to a small tolerance given by a parameter δ measured in seconds. Otherwise, the computed boundary is referred to as a false positive. Furthermore, a ground truth boundary that is not in a δ-neighborhood of a computed boundary is referred to as a false negative. We then compute the precision P and the recall R for the set of computed boundaries with respect to the ground truth boundaries. From these values, one obtains the F-measure

F := 2 · P · R / (P + R) .
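The boundary evaluation can be sketched as follows. Matching each ground truth boundary to at most one computed boundary within the ±δ tolerance is an implementation choice; the text above leaves the exact matching procedure open.

```python
def boundary_prf(computed, ground_truth, delta=2.0):
    """Precision/recall/F-measure for starting boundaries with tolerance delta.

    computed, ground_truth: boundary positions in seconds.
    Each ground truth boundary can be matched by at most one computed boundary.
    """
    unmatched_gt = sorted(ground_truth)
    tp = 0
    for b in sorted(computed):
        hits = [g for g in unmatched_gt if abs(g - b) <= delta]
        if hits:
            tp += 1
            unmatched_gt.remove(min(hits, key=lambda g: abs(g - b)))
    P = tp / len(computed) if computed else 0.0
    R = tp / len(ground_truth) if ground_truth else 0.0
    F = 2 * P * R / (P + R) if (P + R) > 0 else 0.0
    return P, R, F

print(boundary_prf([0.5, 21.0, 47.0], [0.0, 20.0, 40.0, 60.0], delta=2.0))
# -> (0.666..., 0.5, 0.571...)
```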

Table 5.2 shows the PR-based performance measures of our reference-based segmentation procedure using different distance functions with original as well as F0-enhanced chromagrams. In this first experiment, the tolerance parameter is set to δ = 2 and the quality threshold to τ = 0.4. Here, a tolerance of up to δ = 2 seconds seems to us an acceptable deviation in view of our intended applications and the accuracy of the annotations. For example, the most basic distance function ∆ with original chromagrams yields an F-measure of F = 0.739. Using F0-enhanced chromagrams instead of the original ones results in F = 0.774. The best result of F = 0.926 is obtained when using ∆fluc with F0-enhanced chromagrams. Note that all of our introduced enhancement strategies result in an improvement in the F-measure. In particular, the recall values improve significantly when using the transposition and fluctuation-invariant distance functions. A manual inspection of the segmentation results showed that most of the false negatives as well as false positives are due to deviations in particular at the stanzas' beginnings. The entry into a new stanza seems to be a problem for some of the singers, who need some seconds before getting stable in intonation and pitch. A typical example is NLB72355.

δ   P      R      F          τ     P      R      F
1   0.637  0.639  0.638      0.1   0.987  0.168  0.287
2   0.912  0.940  0.926      0.2   0.967  0.628  0.761
3   0.939  0.968  0.953      0.3   0.950  0.860  0.903
4   0.950  0.978  0.964      0.4   0.912  0.940  0.926
5   0.958  0.987  0.972      0.5   0.894  0.944  0.918

Table 5.3: Dependency of the PR-based performance measures on the tolerance parameter δ and the quality threshold τ. All values refer to the reference-based segmentation procedure with ∆fluc using F0-enhanced chromagrams. Left: PR-based performance measures for various δ and fixed τ = 0.4. Right: PR-based performance measures for various τ and fixed δ = 2.

Increasing the tolerance parameter δ, the average quality improves substantially, as indicated by Table 5.3 (left). For example, using δ = 3 instead of δ = 2, the F-measure increases from F = 0.926 to F = 0.953. Other sources of error are that the transcriptions sometimes differ significantly from what is actually sung, as is the case for NLB72395. Here, as was already mentioned in Section 5.1, the transcripts represent the presumed intention of the singer rather than the actual performance. Finally, structural differences between the various stanzas are a further reason for segmentation errors. In a further experiment, we investigated the role of the quality threshold τ (see Section 5.4) on the final segmentation results, see Table 5.3 (right). Not surprisingly, a small τ yields a high precision and a low recall. Increasing τ, the recall increases at the cost of a decrease in precision. The value τ = 0.4 was chosen since it constitutes a good trade-off between recall and precision. Finally, to complement our PR-based evaluation, we introduce a second, softer type of performance measure that indicates the significance of the desired minima of the distance functions. To this end, we consider the distance functions for all songs with respect to a fixed strategy and chroma type. Let α be the average over the cost of all ground truth segments (given by the value of the distance function at the corresponding ending boundary). Furthermore, let β be the average over all values of all distance functions. Then the quotient γ = α/β is a weak indicator of how well the desired minima (the desired true positives) are separated from possible irrelevant minima (the potential false positives). A low value for γ indicates a good separability property of the distance functions. As for the PR-based evaluation, the soft performance measures shown in Table 5.2 support the usefulness of our enhancement strategies.

5.7 Further Notes

The reference-based segmentation procedure provides robust segmentation results even in the case of strong musical variations in the stanzas. As the main ingredient, we introduced enhancement strategies for dealing with the special characteristics of the folk song recordings performed by elderly non-professional solo singers: F0-enhanced chromagrams for efficiently reducing background noise as well as transposition-invariant and fluctuation-invariant distance functions for handling local transpositions and pitch shifts. However, the presented procedure crucially depends on the availability of a manually generated

reference transcription. Recall from Section 5.1 that of the 7277 audio recordings contained in OGL, only 2500 are transcribed so far. For other folk song datasets, the situation is even worse. In Chapter 6, we deal with the question of how the segmentation can be done if no MIDI reference is available.

Chapter 6

Reference-Free Folk Song Segmentation

In this chapter, we introduce a reference-free segmentation procedure that does not rely on any reference, thus overcoming the limitations of the reference-based approach introduced in the preceding chapter. Our idea is to apply a recent audio thumbnailing approach described in [129] to identify the most "repetitive" segment in a given recording. This so-called thumbnail then takes over the role of the reference. The thumbnailing procedure is built upon suitable audio features and self-similarity matrices (SSMs). To cope with the aforementioned variations, we introduce various enhancement strategies to absorb a high degree of these deviations and deformations already on the feature and SSM level. The evaluation shows that the segmentation results of the reference-free approach are comparable to the ones obtained from the reference-based segmentation procedure introduced in Chapter 5. The remainder of this chapter is organized as follows. We first describe the self-similarity matrices (Section 6.1). Then, we summarize the audio thumbnailing procedure and explain how the segmentation is obtained (Section 6.2). In Section 6.3, as the main contribution of this chapter, we introduce various strategies for enhancing the self-similarity matrices. We report on our segmentation experiments (Section 6.4) and conclude in Section 6.5.

6.1 Self-Similarity Matrices

Most repetition-based approaches to audio structure analysis proceed as follows. In the first step, the music recording is transformed into a sequence X := (x_1, x_2, ..., x_N) of feature vectors x_n ∈ F, 1 ≤ n ≤ N, where F denotes a suitable feature space. We employ chroma features as introduced in Section 5.2. In the second step, based on a similarity measure s : F × F → R, one obtains an N × N self-similarity matrix (SSM) by comparing the elements of X in a pairwise fashion:

S(n, m) := s(x_n, x_m),

for n, m ∈ [1 : N]. Using normalized feature vectors, we simply use the inner product as similarity measure s, yielding a value between 0 and 1 (cosine measure). In the following, a tuple p = (n, m) ∈ [1 : N]^2 is called a cell of S, and the value S(n, m) is referred to as the score of the cell p. Introduced to the music context in [53], such matrices have turned out to be a powerful tool for revealing repeating patterns of X. The crucial observation is that each diagonal path (or stripe) of high similarity running in parallel to the main diagonal of S indicates the similarity of two audio segments (given by the projections of the path onto the vertical and horizontal axis, respectively), see [143].
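With normalized chroma vectors, the SSM is simply a matrix product, as the following short sketch illustrates.

```python
import numpy as np

def self_similarity_matrix(X):
    """SSM from a chromagram X of shape (12, N) using the cosine measure.

    With columns normalized to unit Euclidean norm, the inner product of two
    frames equals their cosine similarity, so S = Xn^T Xn, with values in
    [0, 1] for non-negative chroma features.
    """
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-9)
    return Xn.T @ Xn

# Usage: S = self_similarity_matrix(chroma)   # S has shape (N, N)
```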

For example, Figure 6.1a shows an SSM for the first eight stanzas A1 A2 A3 A4 A5 A6 A7 A8 of the field recording OGL19101. The highlighted path encodes the similarity between A2 and A3. If the eight segments were close to being exact repetitions, one would expect a "full" path structure as indicated by Figure 6.1f. However, due to the spectral and temporal deviations between the sung stanzas, the path structure is in general highly distorted and fragmentary. In Section 6.3, we introduce various enhancement strategies to improve on the path structure of the SSM.

6.2 Audio Thumbnailing and Segmentation Procedure

In view of our folk song segmentation task, the enhancement of the self-similarity matrix is one main step towards achieving robustness to spectral deviations. To deal with temporal deviations, we apply a segmentation approach as proposed in [129]. Since in our scenario the recording basically consists of repetitions of a single tune, the segmentation problem reduces to a thumbnailing problem. In general, the goal of audio thumbnailing is to find the most representative and repetitive segment of a given music recording, see, e.g., [4; 22; 115]. Typically, such a segment should have many (approximate) repetitions, and these repetitions should cover large parts of the recording. Let

α = [s : t] ⊆ [1 : N]

denote a segment specified by its starting point s, end point t, and length |α| := t − s + 1. In [129], a fitness measure is introduced that assigns to each audio segment α a fitness value ϕ(α) ∈ R that simultaneously captures two aspects. Firstly, it indicates how well the given segment explains other related segments and, secondly, it indicates how much of the overall music recording is covered by all these related segments. The audio thumbnail is then defined to be the segment α* having maximal fitness ϕ over all possible segments. In the computation of the fitness measure, the main technical idea is to assign to each audio segment α a so-called optimal path family over α that simultaneously reveals the relations between α and all other similar segments. Figure 6.2 shows two segments along with their optimal path families, which can be computed efficiently using dynamic programming. One main point is that each path family projected to the vertical axis induces a family of segments, where each element of this family defines a segment similar to α. The induced family of segments then defines a segmentation of the audio recording. As an example, Figure 6.2 shows path families and induced segment families (vertical axis) for two different segments (horizontal axis) for our running example OGL19101. In


Figure 6.1: Self-similarity matrices for the first eight stanzas of the folk song OGL19101. (a) SSM computed from CENS features. The highlighted path encodes the similarity of A3 and A2. (b) SSM computed from F0-enhanced CENS features. (c) Path-enhanced SSM. (d) Thresholded and normalized SSM S. (e) Transposition-invariant SSM Strans. (f) Fluctuation-invariant SSM Sfluc.

Figure 6.2a, the segment is α = [83 : 98], which corresponds to the sixth stanza A6. The induced segment family consists of eight different segments, which correspond to the eight stanzas A1, A2, ..., A8. Figure 6.2b shows the path family and induced segment family for α = [66 : 98], which corresponds to the two subsequent stanzas A5 A6. Here, the induced segment family consists of four segments corresponding to A1 A2, A3 A4, A5 A6, and A7 A8. The fitness value of a given segment is derived from the corresponding path family and the values of the underlying SSM. It is designed to slightly favor shorter segments over longer segments, see [129] for further details. In our example, it turns out that the


Figure 6.2: Path families and induced segment families for two different segments α for OGL19101. (a) α = [83 : 98] (thumbnail, maximal fitness, corresponding to stanza A6). (b) α = [66:98] (corresponding to stanzas A5A6).

fitness-maximizing segment is indeed α* = [83 : 98]. The induced segment family of the fitness-maximizing segment is taken as the final result of our segmentation procedure.

6.3 Enhancement Strategies

Similarly to the reference-based segmentation procedure, we use a combination of various enhancement strategies to deal with local and global pitch deviations as well as with poor recording conditions, see also Section 5.5.

6.3.1 F0-Enhanced Self-Similarity Matrices

Firstly, we again compute F0-enhanced chromagrams by picking only spectral coefficients corresponding to the fundamental frequency (F0) as described in Section 5.5. This results in F0-enhanced SSMs, as shown in Figure 6.1b. In comparison to the SSM computed from CENS features (Figure 6.1a), the F0-enhanced SSM exhibits increased robustness against noise and recording artifacts as well as local pitch fluctuations.

6.3.2 Temporal Smoothing

Furthermore, to enhance distorted and fragmented paths of the SSMs, various matrix enhancement strategies have been proposed [6; 134; 144; 164], where the main idea is to apply some kind of smoothing filter along the direction of the main diagonal having a gradient of (1, 1). This results in an emphasis of diagonal information in S and a denoising of other structures. This form of filtering, however, typically assumes that the tempo across the music recording is more or less constant and that repeating segments have roughly the same length. In the presence of significant tempo differences, simply smoothing along the main diagonal may smear out important structural information. To avoid this, we use a strategy that filters the SSM along various gradients as proposed in [134], covering tempo variations of roughly ±30 percent. Obviously, choosing an appropriate value for the smoothing length parameter constitutes a trade-off between enhancement capability and level of detail. A suitable parameter depends on the kind of audio material.1 See Figure 6.1c for an illustration and [134] for details.
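The following sketch shows only the constant-tempo special case, i.e., averaging the SSM along the main-diagonal direction; the multiple-gradient filtering of [134] additionally averages along several tempo directions and combines the results, which is omitted here. The smoothing length of 60 frames corresponds to the 6 seconds mentioned in the footnote at a 10 Hz feature rate.

```python
import numpy as np

def smooth_ssm_diagonal(S, length=60):
    """Average the SSM along the main-diagonal direction (gradient (1, 1)).

    S: self-similarity matrix of shape (N, N); length: smoothing length in
    frames. Entries near the lower-right border are averaged over fewer cells.
    """
    N = S.shape[0]
    acc = np.zeros_like(S)
    count = np.zeros_like(S)
    for k in range(length):
        acc[: N - k, : N - k] += S[k:, k:]    # shift S along the diagonal by k
        count[: N - k, : N - k] += 1.0
    return acc / np.maximum(count, 1.0)

# Usage: S_smoothed = smooth_ssm_diagonal(self_similarity_matrix(chroma), 60)
```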

6.3.3 Thresholding and Normalization

We further process the SSM by suppressing all values that fall below a given threshold. Using normalized chroma features and the cosine measure as similarity measure, all values of the SSM are between 0 and 1. Using a suitable threshold parameter t > 0 and a penalty parameter p ≤ 0, we first set the score values of all cells with a score below t to the value p and then linearly scale the range [t : 1] to [0 : 1], see Figure 6.1d. The thresholding introduces some kind of denoising, whereas the parameter p imposes some additional penalty on all cells of low score. Intuitively, we want to achieve that the relevant path structure lies in the positive part of the resulting SSM, whereas all other cells are given a negative score. Note that different methods can be used for thresholding such as using a predefined threshold or using a relative threshold to enforce a certain percentage of cells to have positive score [162].2 Again we denote the resulting matrix simply by S.
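A minimal sketch of the thresholding step is given below, using the relative variant with the values reported in footnote 2 (keeping 40% of the cells, penalty p = −2); the predefined-threshold variant would simply fix t instead of deriving it from a quantile.

```python
import numpy as np

def threshold_ssm(S, keep=0.4, penalty=-2.0):
    """Relative thresholding of an SSM with values in [0, 1].

    The threshold t is chosen so that a fraction `keep` of the cells stays
    positive; those cells are linearly rescaled from [t, 1] to [0, 1], while
    all other cells receive the penalty value.
    """
    t = np.quantile(S, 1.0 - keep)               # relative threshold
    out = np.full_like(S, penalty, dtype=float)
    mask = S >= t
    if t < 1.0:
        out[mask] = (S[mask] - t) / (1.0 - t)    # rescale [t, 1] -> [0, 1]
    else:
        out[mask] = 1.0
    return out

# Usage: S_thresh = threshold_ssm(S_smoothed, keep=0.4, penalty=-2.0)
```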

6.3.4 Transposition and Fluctuation Invariance

As mentioned above, the non-professional singers of the folk songs often deviate significantly from the expected pitches and have serious problems with the intonation. Even worse, their voices often fluctuate by several semitones downwards or upwards across the various stanzas of the same recording. For example, in the case of the OGL19101 recording, the singer's voice constantly increases in pitch while performing the stanzas of this song. As a result, many expected paths of the resulting SSM are weak or even completely missing, as illustrated by Figure 6.1d. One can simulate transpositions (shifts of one or several semitones) on the feature level simply by cyclically shifting a chroma vector along its twelve dimensions [63]. Based on this observation, we adopt the concept of transposition-invariant self-similarity matrices as introduced in [124]. Here, one first computes the similarity between the original feature sequence and each of the twelve cyclically shifted versions of the chromagram, resulting in twelve similarity matrices. Then, the transposition-invariant SSM, denoted by Strans, is calculated by taking the point-wise maximum over these twelve matrices. As indicated by Figure 6.1e, many of the missing paths are recovered this way. The cyclic chroma shifts account for transpositions that correspond to the semitone level of the equal-tempered scale. However, when dealing with the folk song field recordings, one may have to deal with pitch fluctuations that are fractions of semitones.

1 In our folk song experiments, we use a smoothing length corresponding to 6 seconds. This also takes into account that the length of an individual stanza is above this value.
2 In our experiments, we choose the threshold in a relative fashion by keeping 40% of the cells having the highest score and set p = −2. These values were found experimentally. Slight changes of the parameters' values did not have a significant impact on the final segmentation results.

Strategy                                  F0    P      R      F
S                                         −     0.668  0.643  0.652
S                                         +     0.734  0.704  0.717
Strans                                    +     0.821  0.821  0.821
Sfluc                                     +     0.862  0.855  0.860
Sfluc, |α| ≥ 10                           +     0.871  0.879  0.872
Sfluc, |α| ≥ 10 (modified dataset)        +     0.954  0.940  0.949
Reference-based method (see Table 5.2)    +     0.912  0.940  0.926

Table 6.1: Precision, recall, and F-measure values for the reference-based segmentation method (see Table 5.2) and the reference-free approach using δ = 2.

One strategy may be to introduce an additional tuning estimation step to adjust the frequency bands used in the chroma decomposition [59; 127] and then to compute the SSM from the resulting features. This strategy only works when one has to deal with a global de-tuning that is constant throughout the recording. For the field recordings, however, one often has to deal with local pitch fluctuations. Actually, for many recordings such as OGL19101, the singer's voice continuously drops or rises over the various stanzas. This leads to local path distortions and interruptions (see Figure 6.1e). To compensate for such local de-tunings, we further sample the space of semitones using different multirate filter banks corresponding to shifts of 0, 1/4, 1/3, 1/2, 2/3, and 3/4 semitones, respectively, see [127]. Using the resulting six different chromagrams together with the twelve cyclically shifted versions of each of them, we compute 72 similarity matrices as above and then take the point-wise maximum over these matrices to obtain a single fluctuation-invariant SSM, denoted by Sfluc. This strategy leads to further improvements, as illustrated by Figure 6.1f, which now shows the expected "full" path structure.
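The two invariance concepts can be sketched as follows. The pairing of the 72 matrices in the fluctuation-invariant case (the standard chromagram compared with the twelve cyclic shifts of each of the six differently tuned chromagrams) is our reading of the description above; the differently tuned chromagrams themselves are assumed to be precomputed with the shifted filter banks and are not derived here.

```python
import numpy as np

def transposition_invariant_ssm(X):
    """Pointwise maximum over the twelve cyclic-shift similarity matrices.

    X: chromagram of shape (12, N) with columns normalized to unit norm.
    """
    N = X.shape[1]
    S_trans = np.full((N, N), -np.inf)
    for i in range(12):
        S_i = X.T @ np.roll(X, i, axis=0)     # compare with the i-bin shift
        S_trans = np.maximum(S_trans, S_i)
    return S_trans

def fluctuation_invariant_ssm(X_ref, tuned_chromagrams):
    """Pointwise maximum over 72 matrices for sub-semitone de-tunings.

    X_ref: chromagram from the standard filter bank, shape (12, N).
    tuned_chromagrams: list of (12, N) chromagrams for filter-bank shifts of
        0, 1/4, 1/3, 1/2, 2/3, and 3/4 semitones (precomputed elsewhere).
    """
    N = X_ref.shape[1]
    S_fluc = np.full((N, N), -np.inf)
    for Y in tuned_chromagrams:
        for i in range(12):
            S_fluc = np.maximum(S_fluc, X_ref.T @ np.roll(Y, i, axis=0))
    return S_fluc
```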

6.4 Experiments

Table 6.1 shows the results obtained for our reference-free segmentation procedure as well as the results of the reference-based method (see Chapter 5) for comparison. For a detailed description of the experimental setup, we refer to Section 5.6. Using the original self-similarity matrix S derived from the original CENS features to determine the fitness-maximizing segment α*, our reference-free method yields an F-measure value of F = 0.652. Using our F0-enhanced CENS features to increase the robustness against background noise and small local pitch deviations, the F-measure increases to F = 0.717. As mentioned before, dealing with field recordings performed by non-professional singers under poor recording conditions, the matrix enhancement strategies as introduced in Section 6.3 are extremely important for obtaining robust segmentations. In particular, because of the continuous intonation and pitch shifts of the singers, the concepts of transposition and fluctuation invariance significantly improve the segmentation results. For example, using the transposition-invariant SSM Strans, the F-measure value increases to F = 0.821. Furthermore, when using the fluctuation-invariant SSM Sfluc that even accounts for shifts corresponding to fractions of a semitone, the F-measure value further increases to F = 0.860.

Assuming some prior knowledge on the minimal length of a stanza, the results can be further improved. For example, to avoid over-segmentation [116], one may consider only segments α satisfying |α| ≥ 10 seconds, which results in F = 0.872, see Table 6.1. This result is still worse than the result obtained from the reference-based approach (F = 0.926). Actually, a manual inspection showed that this degradation was mainly caused by four particular recordings, where the segmentation derived from α* was "phase-shifted" compared to the ground truth. Employing a boundary-based evaluation measure resulted in an F-measure of F = 0 for these four recordings. Furthermore, we found out that these phase shifts were caused by the fact that in all of these four recordings the singer completely failed in the first stanza (omitting and confusing entire verse lines). In these cases, the stanza transcript used in the reference-based approach corresponds to the remaining "correct" stanzas. As a result, the reference-based approach can better deal with this issue and is able to recover at least the boundaries of the remaining stanzas. In a final experiment, we simulate a similar behavior by replacing each of the four recordings with a slightly shortened version in which the first stanza is omitted. Repeating the previous experiment on this modified dataset produced an F-measure of F = 0.949, which already exceeds the quality obtained by the baseline method. However, there are still some boundaries that are incorrectly detected by our approach. A further investigation revealed that most errors correspond to boundaries that are slightly misplaced and do not fall into the ±2 seconds tolerance. In many of these cases, there is a short amount of silence between two stanzas, which also introduces some uncertainty to the manually annotated ground-truth boundaries.

6.5 Conclusion

In this chapter, we presented a reference-free approach for automatically segmenting folk song field recordings in a robust way even in the presence of significant temporal and spectral distortions across repeating stanzas. One crucial step in the overall segmentation pipeline was to employ various enhancement strategies that allow for dealing with such distortions already on the feature and SSM levels. Our experiments showed that one obtains good segmentation results having a quality similar to those obtained from the reference-based method. Future work in this direction deals with the issue of how the segmentation can be made more robust to structural differences in the stanzas. The described segmentation task is only a first step towards making the audio material more accessible to performance analysis and folk song research. In the next chapter, we introduce tools that allow a folk song researcher to conveniently screen a large number of field recordings in order to detect and locate interesting and surprising features worth being examined in more detail by domain experts.

Chapter 7

Towards Automated Analysis of Performance Variations

In this chapter, we present various techniques for analyzing the variations within the recorded folk song material. As discussed in the previous chapters, the singers often deviate significantly from the expected pitches. Furthermore, there are also significant temporal and melodic variations between the stanzas belonging to the same folk song recording. It is important to realize that such variabilities and inconsistencies may be, to a significant extent, properties of the repertoire and not necessarily errors of the singers. As the folk songs are part of the oral culture and have been passed down over centuries without any fixed notation, variations introduced by the individual singers are very characteristic of this kind of audio material (see Section 5.1 for a more detailed explanation of folk song characteristics). To measure such deviations and variations within the acoustic audio material, we use a multimodal approach by exploiting the existence of a symbolically given transcription of an idealized stanza. As one main contribution of this chapter, we propose a novel method for capturing temporal and melodic characteristics of the various stanzas of a recorded song in a compact matrix representation, which we refer to as chroma template (CT). The computation of such a chroma template involves several steps. First, we convert the symbolic transcription as well as each stanza of a recorded song into chroma representations. On the basis of these representations, we determine and compensate for the tuning differences between the recorded stanzas using the transcription as reference. To account for temporal variations between the stanzas, we use time warping techniques. Finally, we derive a chroma template by averaging the transposed and warped chroma representations of all recorded stanzas and the reference. The key property of a chroma template is that it reveals consistent and inconsistent melodic performance aspects across the various stanzas. Here, one advantage of our concept is its simplicity, where the information is given in the form of an explicit and semantically interpretable matrix representation. We show how our framework can be used to automatically measure variabilities in various musical dimensions including tempo, pitch, and melody. In particular, it allows for directly comparing the realization of different stanzas of a folk song performance. Extracting such information constitutes an important step for making the performance aspects enclosed in the audio material accessible to performance analysis and to folk song research.

The remainder of this chapter is structured as follows. First, in Section 7.1, we introduce and discuss in detail our concept of chroma templates and present various strategies that capture and compensate for variations in intonation and tuning. In Section 7.2, we describe various experiments on performance analysis while discussing our concept by means of a number of representative examples. In Section 7.3, we introduce a user interface that makes the actual folk song recordings more accessible to researchers. As one main idea, the interface allows for intuitively navigating within a folk song recording and comparing the constituent stanzas. Further notes and prospects on future work are given in Section 7.4. Related work is discussed in the respective sections.

7.1 Chroma Templates

In the following, we assume that, for a given folk song, we have an audio recording consisting of various stanzas as well as a transcription of a representative stanza in the form of a MIDI file, which will act as a reference. Recall from Section 5.1 that this is exactly the situation we have with the songs of the OGL collection. Furthermore, we assume that a segmentation of the audio recording into its constituent stanzas is available. This segmentation can be derived automatically using the approaches presented in Chapter 5 and Chapter 6. In order to compare the MIDI reference with the individual stanzas of the audio recording, we use chroma features as introduced in Section 5.2. Figure 7.1 shows chroma representations for the song NLB72246. Figure 7.1b shows the chromagram of the MIDI reference corresponding to the score shown in Figure 7.1a. Figure 7.1c shows the chromagram of a single stanza of the audio recording. In the following, we refer to the chromagram of an audio recording as audio chromagram. In our implementation, all chromagrams are computed at a feature resolution of 10 Hz (10 features per second). For details, we refer to Section 5.2. As mentioned above, most singers have significant problems with intonation. To account for poor recording conditions, intonation problems, and pitch fluctuations, we apply the enhancement strategies as described in Section 5.5. First, we enhance the audio chromagram by exploiting the fact that we are dealing with monophonic music. To this end, we estimate the fundamental frequency (F0) for each audio frame and assign energy only to the MIDI pitch with the center frequency that is closest to the estimated fundamental frequency. This results in chromagrams having exactly one non-zero entry in each time frame. The resulting binary chromagram is referred to as the F0-enhanced audio chromagram. By using an F0-based pitch quantization, most of the noise resulting from poor recording conditions is suppressed. Also, local pitch deviations caused by the singers' intonation problems as well as vibrato are compensated for to a substantial degree. This effect is also visible in Figure 7.1d showing the F0-enhanced version of the audio chromagram as shown in Figure 7.1c. To account for global differences in key between the MIDI reference and the recorded stanzas, we exploit the observation by Goto [62] that the twelve cyclic shifts of a 12-dimensional chroma vector naturally correspond to the twelve possible transpositions. Therefore, it suffices to determine the cyclic shift index ι ∈ [0 : 11]
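As an illustration of the F0-based pitch quantization described above, the following Python sketch maps frame-wise F0 estimates to a binary chromagram with at most one non-zero entry per frame. The function name and the handling of frames without a reliable F0 estimate (marked here by a value of zero) are our own assumptions; the actual procedure follows Section 5.5.

```python
import numpy as np

def f0_to_binary_chroma(f0_hz):
    """Quantize frame-wise F0 estimates (Hz) to a binary 12 x N chromagram.
    Frames with f0_hz <= 0 are treated as unvoiced and left empty in this sketch."""
    f0_hz = np.asarray(f0_hz, float)
    chroma = np.zeros((12, len(f0_hz)))
    voiced = f0_hz > 0
    # nearest MIDI pitch: p = 69 + 12 * log2(f / 440 Hz)
    midi = np.round(69 + 12 * np.log2(f0_hz[voiced] / 440.0)).astype(int)
    chroma[midi % 12, np.where(voiced)[0]] = 1.0  # fold pitch to its pitch class
    return chroma
```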

[Figure 7.1a: score of the stanza with the lyrics “Jan Alberts stond op en hij zong er een lied / En dit verhoorde een koningskind / En dit verhoorde een koningskind.”; panels (b)–(e): chroma representations as described in the caption below.]

Figure 7.1: Multimodal representation of a stanza of the folk song NLB72246. (a) Idealized transcription given in form of a score. (b) Reference chromagram of transcription. (c) Audio chromagram of a field recording of a single stanza. (d) F0-enhanced audio chromagram. (e) Transposed F0-enhanced audio chromagram cyclically shifted by eight semitones upwards (ι = 8).

(where shifts are considered upwards in the direction of increasing pitch) that minimizes the distance between a stanza's audio chromagram and the reference chromagram and then to cyclically shift the audio chromagram according to this index. Figure 7.1e shows the audio chromagram cyclically shifted by eight semitones (ι = 8) to match the key of the reference. Note the similarities between the two chroma representations after correcting the transposition. The distance measure between the reference chromagram and the audio chromagram is based on dynamic time warping as described in Section 5.3.


Figure 7.2: Tuned audio chromagrams of a recorded stanza of the folk song NLB72246. (a) Audio chromagram with respect to tuning parameter τ = 6. (b) Audio chromagram with respect to tuning parameter τ = 6.5.

So far, we have accounted for transpositions that correspond to integer semitones of the equal-tempered pitch scale. However, the above-mentioned voice fluctuations are fluent in frequency and do not stick to a strict pitch grid. To cope with pitch deviations that are fractions of a semitone, we consider different shifts σ ∈ [0, 1] in the assignment of MIDI pitches and center frequencies as given by Eq. (5.1). More precisely, for a MIDI pitch p, the σ-shifted center frequency f^σ(p) is given by

f^σ(p) = 2^{(p − 69 − σ)/12} · 440 Hz.   (7.1)

Now, in the F0-based pitch quantization as described above, one can use σ-shifted center frequencies for different values of σ to account for tuning nuances. In our context, we use the four different values σ ∈ {0, 1/4, 1/2, 3/4} in combination with the 12 cyclic chroma shifts to obtain 48 different audio chromagrams. Actually, a similar strategy is suggested in [59; 162], where generalized chroma representations with 24 or 36 bins (instead of the usual 12 bins) are derived from a short-time Fourier transform. We then determine the cyclic shift index ι and the shift σ that minimize the distance between the reference chromagram and the resulting audio chromagram. These two minimizing parameters can be expressed by a single rational number

τ := ι + σ ∈ [0, 12),   (7.2)

which we refer to as the tuning parameter. The audio chromagram obtained by applying a tuning parameter is also referred to as the tuned audio chromagram. Figure 7.2 illustrates the importance of introducing the additional rational shift parameter σ. Here, slight fluctuations around a frequency that lies between the center frequencies of two neighboring pitches lead to oscillations between the two corresponding chroma bands in the resulting audio chromagram, see Figure 7.2a. By applying an additional half-semitone shift (σ = 0.5) in the pitch quantization step, these oscillations are removed, see Figure 7.2b. We now show how one can account for temporal and melodic differences by introducing the concept of chroma templates, which reveal consistent and inconsistent performance aspects across the various stanzas.
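To illustrate the tuning estimation, the following Python sketch computes σ-shifted center frequencies as in Eq. (7.1) and searches over the 48 combinations of cyclic shift ι and sub-semitone shift σ for the tuning parameter τ. The helper `dtw_distance` and the dictionary of precomputed chromagrams are assumptions; the actual distance measure is the DTW-based one from Section 5.3.

```python
import numpy as np

def sigma_center_freq(p, sigma):
    """Sigma-shifted center frequency of MIDI pitch p, cf. Eq. (7.1)."""
    return 2.0 ** ((p - 69 - sigma) / 12.0) * 440.0

def estimate_tuning_parameter(audio_chroma_per_sigma, ref_chroma, dtw_distance):
    """Return tau = iota + sigma minimizing the distance to the reference chromagram.
    `audio_chroma_per_sigma` maps sigma in {0, 0.25, 0.5, 0.75} to a 12 x K
    F0-quantized chromagram computed with sigma-shifted center frequencies
    (rows assumed ordered C, C#, ..., B); `dtw_distance` is an assumed helper."""
    best = (np.inf, 0.0)
    for sigma, chroma in audio_chroma_per_sigma.items():
        for iota in range(12):
            shifted = np.roll(chroma, iota, axis=0)  # cyclic shift upwards by iota semitones
            best = min(best, (dtw_distance(shifted, ref_chroma), iota + sigma))
    return best[1]  # tuning parameter tau in [0, 12)
```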


Figure 7.3: Chroma template computation for the folk song NLB72246. (a) Reference chro- magram. (b) Three audio chromagrams. (c) Tuned audio chromagrams. (d) Warped audio chromagrams. (e) Average chromagram obtained by averaging the three audio chromagrams of (d) and the reference of (a). (f) Chroma template.

Our concept of chroma templates is similar to the concept of templates proposed in [136], which were applied in the context of content-based retrieval of motion capture data. For a fixed folk song, let Y ∈ {0, 1}^{d×L} denote the boolean reference chromagram of dimension d = 12 and of length (number of columns) L ∈ ℕ. Furthermore, we assume that for a given field recording of the song we know the segmentation boundaries of its constituent stanzas. In the following, let N be the number of stanzas and let X_n ∈ {0, 1}^{d×K_n}, n ∈ [1 : N], be the F0-enhanced and suitably tuned boolean audio chromagrams, where K_n ∈ ℕ denotes the length of X_n. To account for temporal differences, we temporally warp the audio chromagrams to correspond to the reference chromagram Y. Let X = X_n be one of the audio chromagrams of length K = K_n. To align X and Y, we employ classical dynamic time warping (DTW) using the Euclidean distance as local cost measure c : ℝ^{12} × ℝ^{12} → ℝ to compare two chroma vectors. (Note that when dealing with binary chroma vectors that have at most one non-zero entry, the Euclidean distance equals the Hamming distance.) Recall that a warping path is a sequence p = (p_1, ..., p_M) with p_m = (k_m, ℓ_m) ∈ [1 : K] × [1 : L] for m ∈ [1 : M] satisfying the boundary condition

p_1 = (1, 1) and p_M = (K, L)

as well as the step size condition

p_{m+1} − p_m ∈ {(1, 0), (0, 1), (1, 1)}

for m ∈ [1 : M − 1]. The total cost of p is defined as ∑_{m=1}^{M} c(X(k_m), Y(ℓ_m)). Now, let p∗ denote a warping path having minimal total cost among all possible warping paths. Then, the DTW distance DTW(X, Y) between X and Y is defined to be the total cost of p∗. It is well known that p∗ and DTW(X, Y) can be computed in O(KL) using dynamic programming, see [123; 149] for details. Next, we locally stretch and contract the audio chromagram X according to the warping information supplied by p∗. Here, we have to consider two cases. In the first case, p∗ contains a subsequence of the form

(k, ℓ), (k, ℓ + 1), ..., (k, ℓ + n − 1)

for some n ∈ ℕ, i.e., the column X(k) is aligned to the n columns Y(ℓ), ..., Y(ℓ + n − 1) of the reference. In this case, we duplicate the column X(k) by taking n copies of it. In the second case, p∗ contains a subsequence of the form

(k, ℓ), (k + 1, ℓ), ..., (k + n − 1, ℓ)

for some n ∈ ℕ, i.e., the n columns X(k), ..., X(k + n − 1) are aligned to the single column Y(ℓ). In this case, we replace the n columns by a single column by taking the component-wise AND-conjunction X(k) ∧ ... ∧ X(k + n − 1). For example, one obtains

 0   0   1   0           1  ∧  0  ∧  0  = 0  .                          1   1   1   1 

The resulting warped chromagram is denoted by X̄. Note that X̄ is still a boolean chromagram and the length of X̄ equals the length L of the reference Y, see Figure 7.3d for an example. After the temporal warping, we obtain an optimally tuned and warped audio chromagram for each stanza. Now, we simply average the reference chromagram Y with the warped audio chromagrams X̄_1, ..., X̄_N to yield an average chromagram

Z := 1/(N + 1) · ( Y + ∑_{n ∈ [1:N]} X̄_n ).   (7.3)

Note that the average chromagram Z has real-valued entries between zero and one and has the same length L as the reference chromagram. Figure 7.3e shows such an average chromagram obtained from three audio chromagrams and the reference chromagram. The important observation is that black/white regions of Z indicate periods in time (horizontal axis) where certain chroma bands (vertical axis) consistently assume the same values zero/one in all chromagrams, respectively. By contrast, colored regions indicate inconsistencies mainly resulting from variations in the audio chromagrams (and partly from inappropriate alignments). In other words, the black and white regions encode characteristic aspects that are shared by all chromagrams, whereas the colored regions represent the variations coming from different performances. To make inconsistent aspects more explicit, we further quantize the matrix Z by replacing each entry of Z that is below a threshold δ by zero, each entry that is above 1 − δ by one, and all remaining entries by a wildcard character ∗ indicating that the corresponding value is left unspecified, see Figure 7.3f. The resulting quantized matrix is referred to as the chroma template for the audio chromagrams X_1, ..., X_N with respect to the reference chromagram Y. In the following section, we discuss the properties of such chroma templates in detail by means of several representative examples.
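The complete template computation can be summarized in a few lines. The following Python sketch warps a boolean stanza chromagram onto the reference time axis using a given optimal warping path, averages the warped stanzas with the reference as in Eq. (7.3), and quantizes the result with the threshold δ. The helper names and the use of NaN as the wildcard symbol ∗ are our own conventions; the warping path is assumed to use 0-based indices.

```python
import numpy as np

def warp_to_reference(X, path):
    """Warp a boolean 12 x K audio chromagram X onto the reference time axis.
    `path` is an optimal DTW warping path [(k_1, l_1), ..., (k_M, l_M)] (0-based).
    Columns aligned to several reference columns are duplicated; several columns
    aligned to one reference column are AND-conjoined."""
    L = max(l for _, l in path) + 1
    X_bar = np.ones((X.shape[0], L), dtype=bool)
    for k, l in path:
        X_bar[:, l] &= X[:, k].astype(bool)
    return X_bar

def chroma_template(Y, warped_stanzas, delta=0.1):
    """Average the boolean reference Y with the warped stanzas (Eq. (7.3)) and
    quantize with threshold delta; np.nan plays the role of the wildcard '*'."""
    Z = (Y.astype(float) + sum(X.astype(float) for X in warped_stanzas)) / (len(warped_stanzas) + 1)
    T = np.full(Z.shape, np.nan)   # wildcard by default
    T[Z < delta] = 0.0
    T[Z > 1.0 - delta] = 1.0
    return T
```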

7.2 Folk Song Performance Analysis

The analysis of different interpretations, also referred to as performance analysis, has become an active research field [37; 111; 152; 184; 185]. Here, one objective is to extract expressive performance aspects such as tempo, dynamics, and articulation from audio recordings. To this end, one needs accurate annotations of the audio material by means of suitable musical parameters including onset times, note durations, sound intensity, or fundamental frequency. To ensure such a high accuracy, annotation is often done manually, which is infeasible in view of analyzing large audio collections. For the folk song scenario, we now sketch how various performance aspects can be derived in a fully automated fashion. In particular, we discuss how one can capture performance aspects and variations regarding tuning, tempo, and melody across the various stanzas of a field recording. For the sake of concreteness, we explain these concepts by means of our running example NLB72246 shown in Figure 7.1a. As discussed in Section 7.1, we first compensate for differences in key and tuning by estimating a tuning parameter τ for each individual stanza of the field recording. This parameter indicates to which extent the stanza's audio chromagram needs to be shifted upwards to optimally agree with the reference chromagram. Figure 7.4b shows the tuning parameter τ for each of the 25 stanzas of the field recording. As can be seen, the tuning parameter almost constantly decreases from stanza to stanza, thus indicating a constant rise of the singer's voice. The singer starts the performance by singing the first stanza roughly τ = 7.75 semitones lower than indicated by the reference transcription. Continuously going up with the voice, the singer finishes the song with the last stanza only τ = 4.5 semitones below the transcription, thus differing by more than three semitones from the beginning. Note that in our processing pipeline, we compute tuning parameters on the stanza level. In other words, significant shifts in tuning within a stanza cannot yet be captured by our methods.


Figure 7.4: Various performance aspects for a field recording of NLB72246 comprising 25 stanzas. (a) Reference chromagram. (b) Tuning parameter τ for each stanza. (c)-(f) Tempo curves for the stanzas 1, 7, 19, and 25. (g) Average chromagram. (h) Chroma template.

This may be one unwanted reason for many of the inconsistencies observed in our chroma templates. In the future, we plan to develop methods for handling such detuning artifacts within stanzas. After compensating for tuning differences, we apply DTW-based warping techniques in order to compensate for temporal differences between the recorded stanzas, see Section 7.1. Actually, an optimal warping path p∗ encodes the relative tempo difference between the two sequences to be aligned. In our case, one sequence corresponds to one of the performed stanzas of the field recording and the other sequence corresponds to the idealized transcription, which was converted into a MIDI representation using a constant tempo of 120 BPM. Now, by aligning the performed stanza with the reference stanza (on the level of chromagram representations), one can derive the relative tempo deviations between these two versions [133]. These tempo deviations can be described through a tempo curve that, for each position of the reference, indicates the relative tempo difference between the performance and the reference. In Figure 7.4c-f, the tempo curves for four recorded stanzas of NLB72246 are shown. The horizontal axis encodes the time axis of the MIDI reference (rendered at 120 BPM), whereas the vertical axis encodes the relative tempo difference in the form of a factor. For example, a value of 1 indicates that the performance has the same tempo as the reference (in our case 120 BPM).


Figure 7.5: Various performance aspects for a field recording of NLB73626 comprising 5 stanzas. (a) Reference chromagram. (b) Tuning parameter τ for each stanza. (c)-(f) Tempo curves for the first 4 stanzas. (g) Average chromagram. (h) Chroma template.

Furthermore, the value 1/2 indicates half the tempo (in our case 60 BPM) and the value 2 indicates twice the tempo relative to the reference (in our case 240 BPM). As can be seen from Figure 7.4c, the singer performs the first stanza at an average tempo of roughly 85 BPM (factor 0.7). However, the tempo is not constant throughout the stanza. Actually, the singer starts with a fast tempo, then slows down significantly, and accelerates again towards the end of the stanza. Similar tendencies can be observed in the performances of the other stanzas. As an interesting observation, the average tempo of the stanzas continuously increases throughout the performance. Starting with an average tempo of roughly 85 BPM in the first stanza, the tempo averages to 99 BPM in stanza 7, 120 BPM in stanza 19, and reaches 124 BPM in stanza 25. Also, in contrast to the stanzas at the beginning of the performance, the tempo is nearly constant for the stanzas towards the end of the recording. This may be an indicator that the singer becomes more confident in her singing capabilities as well as in her capabilities of remembering the song. Finally, after tuning and temporally warping the audio chromagrams, we compute an average chromagram and a chroma template. In the quantization step, we use a threshold δ. In our experiments, we set δ = 0.1, thus disregarding inconsistencies that occur in less than 10% of the stanzas. This introduces some robustness towards outliers.
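As an illustration of how a tempo curve can be read off from an optimal warping path, the following Python sketch estimates the local slope of the path over a short window on the reference time axis. The window length, the feature rate of 10 Hz, and the function name are illustrative assumptions; the curves shown in Figures 7.4 and 7.5 may use a different smoothing.

```python
import numpy as np

def tempo_curve(path, feature_rate=10.0, window=2.0):
    """Relative tempo curve from a DTW path [(k_1, l_1), ...] aligning a performed
    stanza (index k) to the MIDI reference rendered at constant tempo (index l).
    The local slope dl/dk gives the tempo factor; values > 1 mean the performance
    is faster than the reference."""
    k = np.array([p[0] for p in path], dtype=float)
    l = np.array([p[1] for p in path], dtype=float)
    half = int(window * feature_rate / 2)
    ref_positions = np.arange(int(l.max()) + 1)
    curve = np.empty(len(ref_positions))
    for t in ref_positions:
        idx = np.where((l >= t - half) & (l <= t + half))[0]
        dk = k[idx[-1]] - k[idx[0]]        # performance frames covered in the window
        dl = l[idx[-1]] - l[idx[0]]        # reference frames covered in the window
        curve[t] = dl / max(dk, 1e-9)      # guard against division by zero
    return ref_positions / feature_rate, curve
```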


Figure 7.6: Reference chromagram (top), average chromagram (middle) and chroma template (bottom) for the folk song recording NLB74437 comprising 8 stanzas.

The average chromagram and the chroma template for NLB72246 are shown in Figure 7.4g and Figure 7.4h, respectively. Here, in contrast to Figure 7.3, all 25 stanzas of the field recording were considered in the averaging process. As explained above, the wildcard character ∗ (gray color) of a chroma template indicates inconsistent performance aspects across the various stanzas of the field recording. Since we already compensated for tuning and tempo differences before averaging, the inconsistencies indicated by the chroma templates tend to reflect local melodic inconsistencies and inaccuracies. We illustrate this by our running example, where the inconsistencies particularly occur in the third phrase of the stanza (starting at the fifth second of the MIDI reference). One possible explanation for these inconsistencies may be as follows. In the first two phrases of the stanza, the melody is relatively simple in the sense that neighboring notes differ only by a unison or a second interval. Also, the repeating note A4 plays the role of a stabilizing anchor within the melody. In contrast, the third phrase of the stanza is more involved. Here, the melody contains several larger intervals as well as a meter change. Therefore, because of the higher complexity, the singer may have problems in accurately and consistently performing the third phrase of the stanza. As a second example, we consider the folk song NLB73626, see Figure 7.5. The corresponding field recording comprises 5 stanzas, which are sung in a relatively clean and consistent way. Firstly, the singer keeps the pitch more or less on the same level throughout the performance. This is also indicated by Figure 7.5b, which shows a tuning parameter of τ = 4 for all stanzas except the first, where τ = 3.75.


Figure 7.7: Reference chromagram (top), average chromagram (middle) and chroma template (bottom) for the folk song recording NLB73287 comprising 11 stanzas.

Secondly, as shown by Figure 7.5c-f, the average tempo is consistent over all stanzas. Also, the shapes of all the tempo curves are highly correlated. This temporal consistency may be an indicator that the local tempo deviations are a sign of artistic intention rather than a random and unwanted imprecision. Thirdly, the chroma template shown in Figure 7.5h exhibits many white regions, thus indicating that many notes of the melody have been performed in a consistent way. The gray areas, in turn, which correspond to the inconsistencies, appear mostly in transition periods between consecutive notes. Furthermore, they tend to have an ascending or descending course while smoothly combining the pitches of consecutive notes. Here, one reason is that the singer tends to slide between two consecutive pitches, which has the effect of some kind of portamento. All of these performance aspects indicate that the singer seems to be quite familiar with the song and confident in her singing capabilities. We close our discussion on performance analysis by having a look at the chroma templates of another three representative examples. Figure 7.6 shows the chroma template of the folk song NLB74437. The template shows that the performance is very consistent, with almost all notes remaining unmasked. Actually, this is rather surprising since NLB74437 is one of the few recordings where several singers perform together. Even though, in comparison to other recordings, the performers do not seem to be particularly good singers and even differ in tuning and melody, singing together seems to mutually stabilize the singers, thus resulting in a rather consistent overall performance. Also the chroma template shown in Figure 7.7 is relatively consistent.


Figure 7.8: Reference chromagram (top), average chromagram (middle) and chroma template (bottom) for the folk song recording NLB72395 comprising 12 stanzas.

Similarly to the example shown in Figure 7.5, there are inconsistencies that are caused by portamento effects. As a last example, we consider the chroma template of the folk song NLB72395, where nearly all notes have been marked as inconsistent, see Figure 7.8. This is a kind of negative result, which indicates the limitations of our concept. A manual inspection showed that some of the stanzas of the field recording exhibit significant structural differences, which are neither reflected by the transcription nor in accordance with most of the other stanzas. For example, in at least two recorded stanzas, one entire phrase is omitted by the singer. In such cases, using a global approach for aligning the stanzas inevitably leads to poor and semantically meaningless alignments that cause many inconsistencies. The handling of such structural differences constitutes an interesting research problem.

7.3 A User Interface for Folk Song Navigation

Chroma templates capture performance aspects and variations in the various stanzas of a folk song. In particular, the chroma templates give a visual impression of the variations occurring. We now present a user interface that allows for analyzing such variations by means of listening to and, in particular, comparing the different stanzas of an audio recording in a convenient way. Once having segmented the audio recording into stanzas and computed alignment paths between the MIDI reference and all audio stanzas, one can derive the temporal correspondences between the MIDI and the audio representation with the objective to associate note events given by the MIDI file with their physical occurrences in the audio recording, see [123] for details. The result can be regarded as an automated annotation of the entire audio recording with the available MIDI events. Such annotations facilitate multimodal browsing and retrieval of MIDI and audio data, thus opening new ways of experiencing and researching music. For example, most successful algorithms for melody-based retrieval work in the domain of symbolic or MIDI music. On the other hand, retrieval results may be most naturally presented by playing back the original recording of the melody, while a musical score or a piano-roll representation may be the most appropriate form for visually displaying the query results. For a description of such functionalities, we refer to [26]. Furthermore, aligning each stanza of the audio recording to the MIDI reference yields a multi-alignment between all stanzas. Exploiting this alignment, one can implement interfaces that allow a user to seamlessly switch between the various stanzas of the recording, thus facilitating a direct access to and comparison of the audio material [123]. The Audio Switcher [57] constitutes such a user interface, which allows the user to open in parallel a synthesized version of the MIDI reference as well as all stanzas of the folk song recording, see Figure 7.9. Each of the stanzas is represented by a slider bar indicating the current playback position with respect to the stanza's particular time scale. The stanza that is currently used for audio playback, in the following referred to as the active stanza, is indicated by a red marker located to the left of the slider bar. The slider knob of the active stanza moves at constant speed, while the slider knobs of the other stanzas move according to the relative tempo variations with respect to the active stanza. The active stanza may be changed at any time simply by clicking on the respective playback symbol located to the left of each slider bar. The playback of the new active stanza then starts at the time position that musically corresponds to the last playback position of the former active stanza. This has the effect of seamlessly crossfading from one stanza to another while preserving the current playback position in a musical sense. One can also jump to any position within any of the stanzas by directly selecting a position on the respective slider. Such functionalities assist the user in detecting and analyzing the differences between several recorded stanzas of a single folk song. The Audio Switcher is realized as a plug-in of the SyncPlayer system [106; 57], an advanced software audio player that offers a plug-in interface for MIR applications and provides tools for navigating within audio recordings and browsing in music collections. For further details and functionalities, we refer to the literature.
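The position mapping behind the seamless switching can be sketched as follows: a playback position in the active stanza is first mapped onto the reference time axis and from there into the newly selected stanza. The function name and the linear interpolation are our own choices, not the actual Audio Switcher implementation.

```python
import numpy as np

def map_position(t_a, path_a, path_b, feature_rate=10.0):
    """Map a playback position t_a (seconds) in stanza A to the musically
    corresponding position in stanza B. `path_a` and `path_b` are warping paths
    aligning each stanza to the MIDI reference, given as (stanza_frame, reference_frame)
    pairs sorted along the path."""
    ka, la = np.array(path_a, dtype=float).T
    kb, lb = np.array(path_b, dtype=float).T
    # keep one path point per frame so that np.interp gets increasing support points
    _, ia = np.unique(ka, return_index=True)
    _, ib = np.unique(lb, return_index=True)
    ref_frame = np.interp(t_a * feature_rate, ka[ia], la[ia])   # stanza A -> reference
    frame_b = np.interp(ref_frame, lb[ib], kb[ib])              # reference -> stanza B
    return frame_b / feature_rate
```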

7.4 Conclusion

In this chapter, we presented a multimodal approach for extracting performance parameters from folk song recordings by comparing the audio material with symbolically given reference transcriptions. As the main contribution, we introduced the concept of chroma templates that reveal the consistent and inconsistent melodic aspects across the various stanzas of a given recording. In computing these templates, we used tuning and time warping strategies to deal with local variations in melody, tuning, and tempo.

Figure 7.9: Instance of the Audio Switcher plug-in of the SyncPlayer showing the synthesized version of the MIDI reference and the five different stanzas of the audio recording of OGL27517.

The variabilities across the various stanzas of a given recording revealed and observed in this chapter may have various causes, which need to be further explored in future research. Often, these causes are related to questions in the area of music cognition. A first hypothesis is that stable notes are structurally more important than variable notes. The stable notes may be the ones that form part of the singer's mental model of the song, whereas the variable ones are added to the model at performance time. Variations may also be caused by problems in remembering the song. It has been observed that melodies often stabilize after the singer has performed a few iterations. Such effects may offer insight into the working of the musical memory. Furthermore, melodic variabilities caused by ornamentations can also be interpreted as a creative aspect of performance. Such variations may be motivated by musical reasons, but also by the lyrics of a song. Sometimes song lines have an irregular length, necessitating the insertion or deletion of notes. Variations may also be introduced by the singer to emphasize key words in the text or, more generally, to express the meaning of the song. Finally, one may study details on tempo, timing, pitch, and loudness in relation to performance, as a way of characterizing performance styles of individuals or regions.

Part III

Audio Retrieval

Chapter 8

A Review of Content-Based Music Retrieval

The way music is stored, accessed, distributed, and consumed underwent a radical change in the last decades. Nowadays, large collections containing millions of digital music documents are accessible from anywhere around the world. Such a tremendous amount of readily available music requires retrieval strategies that allow users to explore large music collections in a convenient and enjoyable way. Most audio search engines rely on metadata and textual annotations of the actual audio content [19]. Editorial metadata typically include descriptions of the artist, title, or other release information. The drawback of a retrieval solely based on editorial metadata is that the user needs to have a relatively clear idea of what he or she is looking for. Typical query terms may be a title such as “Act naturally” when searching for the song by The Beatles or a composer's name such as “Beethoven” (see Figure 8.1a).1 In other words, traditional editorial metadata only allow searching for already known content. To overcome these limitations, editorial metadata has been more and more complemented by general and expressive annotations (so-called tags) of the actual musical content [10; 97; 173]. Typically, tags give descriptions of the musical style or genre of a recording, but may also include information about the mood, the musical key, or the tempo [110; 172]. In particular, tags form the basis for music recommendation and navigation systems that make the audio content accessible even when users are not looking for a specific song or artist but for music that exhibits certain musical properties [173]. The generation of such annotations of audio content, however, is typically a labor-intensive and time-consuming process [19; 172]. Furthermore, musical expert knowledge is often required for creating reliable, consistent, and musically meaningful annotations. To avoid this tedious process, recent attempts aim at substituting expert-generated tags by user-generated tags [172]. However, such tags tend to be less accurate, subjective, and rather noisy. In other words, they exhibit a high degree of variability between users. Crowd (or social) tagging, one popular strategy in this context, employs voting and filtering strategies based on large social networks of users for “cleaning” the tags [110]. Relying on the “wisdom of the crowd” rather than the “power of the few” [99], tags assigned by many users are considered more reliable than tags assigned by only a few users.

1www.google.com (accessed Dec. 18, 2011)



Figure 8.1: Illustration of retrieval concepts. (a) Traditional retrieval using textual metadata (e.g., artist, title) and a web search engine. (b) Retrieval based on rich and expressive metadata given by tags. (c) Content-based retrieval using audio, MIDI, or score information.

Figure 8.1b shows the Last.fm2 tag cloud for “Beethoven”. Here, the font size reflects the frequency of the individual tags. One major drawback of this approach is that it relies on a large crowd of users for creating reliable annotations [110]. While mainstream pop/rock music is typically covered by such annotations, less popular genres are often scarcely tagged. This phenomenon is also known as the “long-tail” problem [20; 172]. To overcome these problems, content-based retrieval strategies have great potential as they do not rely on any manually created metadata but are exclusively based on the audio content and cover the entire audio material in an objective and reproducible way [19]. One possible approach is to employ automated procedures for tagging music, such as automatic genre recognition, mood recognition, or tempo estimation [9; 173]. The major drawback of these learning-based strategies is the requirement of large corpora of tagged music examples as training material and the limitation to queries in textual form. Furthermore, the quality of the tags generated by state-of-the-art procedures does not reach the quality of human-generated tags [173]. In this chapter, we present and discuss various retrieval strategies based on audio content that follow the query-by-example paradigm: given an audio recording or a fragment of it (used as query or example), the task is to automatically retrieve documents from a given music collection containing parts or aspects that are similar to it. As a result, retrieval systems following this paradigm do not require any textual descriptions. However, the notion of similarity used to compare different audio recordings (or fragments) is of crucial importance and largely depends on the respective application as well as the user requirements. Such strategies can be loosely classified according to their specificity, which refers to the degree of similarity between the query and the database documents. The remainder of this chapter is organized as follows. In Section 8.1, we first give an overview of the various audio retrieval tasks following the query-by-example paradigm. In particular, we extend the concept of specificity by introducing a second aspect: the granularity of a retrieval task, referring to its temporal scope. Then, we discuss representative state-of-the-art approaches to audio identification (Section 8.2), audio matching (Section 8.3), and version identification (Section 8.4). In Section 8.5, we discuss open problems in the field of content-based retrieval and give an outlook on future directions.

2www.last.fm (accessed Dec. 18, 2011)


Figure 8.2: Specificity/granularity pane showing the various facets of content-based music retrieval.

8.1 Audio-Based Query-By-Example

Many different audio content-based retrieval systems have been proposed, following different strategies and aiming at different application scenarios. Generally, such retrieval systems can be characterized by various aspects such as the notion of similarity, the underlying matching principles, or the query format. Following and extending the concept introduced in [19], we consider the following two aspects: specificity and granularity, see Figure 8.2. The specificity of a retrieval system refers to the degree of similarity between the query and the database documents to be retrieved. High-specific retrieval systems return exact copies of the query (in other words, they identify the query or occurrences of the query within database documents), whereas low-specific retrieval systems return vague matches that are similar with respect to some musical properties. As in [19], different content-based music retrieval scenarios can be arranged along a specificity axis as shown in Figure 8.2 (horizontally). We extend this classification scheme by introducing a second aspect, the granularity (or temporal scope) of a retrieval scenario. In fragment-level retrieval scenarios, the query consists of a short fragment of an audio recording, and the goal is to retrieve all musically related fragments that are contained in the documents of a given music collection. Typically, such fragments may cover only a few seconds of audio content or may correspond to a motif, a theme, or a musical part of a recording. In contrast, in document-level retrieval, the query reflects characteristics of an entire document and is compared with entire documents of the database. Here, the notion of similarity typically is rather coarse and the features capture global statistics of an entire recording. In this context, one has to distinguish between some kind of internal and some kind of external granularity of the retrieval tasks. In our classification scheme, we use the term fragment-level when a fragment-based similarity measure is used to compare fragments of audio recordings (internal), even though entire documents are returned as matches (external). Using such a classification allows for extending the specificity axis to a specificity/granularity pane as shown in Figure 8.2. In particular, we have identified four different groups of retrieval scenarios corresponding to the four clouds in Figure 8.2. Each of the clouds, in turn, encloses a number of different retrieval scenarios. Obviously, the clouds are not strictly separated but blend into each other. Even though this taxonomy is rather vague and sometimes questionable, it gives an intuitive overview of the various retrieval paradigms while illustrating their subtle but crucial differences. An example of a high-specific fragment-level retrieval task is audio identification (sometimes also referred to as audio fingerprinting [16]). Given a small audio fragment as query, the task of audio identification consists in identifying the particular audio recording that is the source of the fragment [1]. Nowadays, audio identification is widely used in commercial systems such as Shazam.3 Typically, the query fragment is exposed to signal distortions on the transmission channel [16; 107]. Recent identification algorithms exhibit a high degree of robustness against noise, MP3 compression artifacts, uniform temporal distortions, or interferences of multiple signals [56; 82].
The high specificity of this retrieval task goes along with a notion of similarity that is very close to the identity. To make this point clearer, we distinguish between a piece of music (in an abstract sense) and a specific performance of this piece. In particular for Western classical music, there typically exist a large number of different recordings of the same piece of music performed by different musicians. Given a query fragment, e.g., taken from a Bernstein recording of Beethoven's Symphony No. 5, audio fingerprinting systems are not capable of retrieving, e.g., a Karajan recording of the same piece. Likewise, given a query fragment from a live performance of “Act naturally” by The Beatles, the original studio recording of this song may not be found. The reason for this is that existing fingerprinting algorithms are not designed to deal with strong non-linear temporal distortions or with other musically motivated variations that affect, for example, the tempo or the instrumentation. At a lower specificity level, the goal of fragment-based audio matching is to retrieve all audio fragments that musically correspond to a query fragment from all audio documents contained in a given database [105; 135]. In this scenario, one explicitly allows semantically motivated variations as they typically occur in different performances and arrangements of a piece of music. These variations include significant non-linear global and local differences in tempo, articulation, and phrasing as well as differences in executing note groups such as grace notes, trills, or arpeggios. Furthermore, one has to deal with considerable dynamical and spectral variations, which result from differences in instrumentation and loudness. One instance of document-level retrieval at a similar specificity level as audio matching is the task of version identification. Here, the goal is to identify different versions of the same piece of music within a database [161]. In this scenario, one not only deals with changes in instrumentation, tempo, and tonality, but also with more extreme variations concerning the musical structure, key, or melody, as typically occurring in remixes and cover songs. This requires document-level similarity measures to globally compare entire documents.

3www.shazam.com (accessed Dec. 18, 2011)

Finally, there are a number of even less specific document-level retrieval tasks which can be grouped under the term category-based retrieval. This term encompasses the retrieval of documents whose relationship can be described by cultural or musicological categories. Typical categories are genre [174], rhythm styles [65; 157], or mood and emotions [98; 171; 182], and can be used in fragment-level as well as document-level retrieval tasks. Music recommendation or general music similarity assessments [14; 183] can be seen as further document-level retrieval tasks of low specificity. In the following, we elaborate on the aspects of specificity and granularity by means of representative state-of-the-art content-based retrieval approaches. In particular, we highlight characteristics and differences in requirements when designing and implementing systems for audio identification, audio matching, and version identification. Furthermore, we address efficiency and scalability issues. We start with discussing high-specific audio fingerprinting (Section 8.2), continue with mid-specific audio matching (Section 8.3), and then discuss version identification (Section 8.4).

8.2 Audio Identification

Of all content-based music retrieval tasks, audio identification has received the most interest and is now widely used in commercial applications. In the identification process, the audio material is compared by means of so-called audio fingerprints, which are compact content-based signatures of audio recordings [16]. In real-world applications, these fingerprints need to fulfill certain requirements. First of all, the fingerprints should capture highly specific characteristics so that a short audio fragment suffices to reliably identify the corresponding recording and distinguish it from millions of other songs. However, in real-world scenarios, audio signals are exposed to distortions on the transmission channel. In particular, the signal is likely to be affected by noise, artifacts from lossy audio compression, pitch shifting, time scaling, or dynamics compression. For a reliable identification, fingerprints have to show a significant degree of robustness against such distortions. Furthermore, scalability is an important issue for all content-based retrieval applications. A reliable audio identification system needs to capture the entire digital music catalog, which is further growing every day. In addition, to minimize storage requirements and transmission delays, fingerprints should be compact and efficiently computable [16]. Most importantly, this also requires efficient retrieval strategies to facilitate very fast database look-ups. These requirements are crucial for the design of large-scale audio identification systems. To satisfy all these requirements, however, one typically has to face a trade-off between contradicting principles. There are various ways to design and compute fingerprints. One group of fingerprints consists of short sequences of frame-based feature vectors such as Mel-Frequency Cepstral Coefficients (MFCC) [17], Bark-scale spectrograms [82; 83], or a set of low-level descriptors [1]. For such representations, vector quantization [1] or thresholding [82] techniques, or temporal statistics [150] are needed for obtaining the required robustness. Another group of fingerprints consists of a sparse set of characteristic points such as spectral peaks [52; 181] or characteristic wavelet coefficients [96]. As an example, we now describe the peak-based fingerprints suggested by Wang [181], which are now commercially used in the Shazam music identification service.4


Figure 8.3: Illustration of the Shazam audio identification system using a recording of “Act naturally” by The Beatles as example. (a) Database document with extracted peak fingerprints. (b) Query fragment (10 seconds) with extracted peak fingerprints. (c) Constellation map of database document. (d) Constellation map of query document. (e) Superposition of the database fingerprints and time-shifted query fingerprints.

The Shazam system provides a smartphone application that allows users to record a short audio fragment of an unknown song using the built-in microphone. The application then derives the audio fingerprints, which are sent to a server that performs the database look-up. The retrieval result is returned to the application and presented to the user together with additional information about the identified song. In this approach, one first computes a spectrogram from an audio recording using a short-time Fourier transform. Then, one applies a peak-picking strategy that extracts local maxima of the magnitude spectrogram: time-frequency points that are locally predominant. Figure 8.3 illustrates the basic retrieval concept of the Shazam system using a recording of “Act naturally” by The Beatles. Figure 8.3a and Figure 8.3b show the spectrogram for an example database document (30 seconds of the recording) and a query fragment (10 seconds), respectively. The extracted peaks are superimposed on the spectrograms. The peak-picking step reduces the complex spectrogram to a “constellation map”, a low-dimensional sparse representation of the original signal by means of a small set of time-frequency points, see Figure 8.3c and Figure 8.3d.
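The peak-picking step can be sketched with a local maximum filter, as in the following Python snippet. The neighborhood size and magnitude threshold are illustrative choices of our own, not the parameters of the deployed Shazam system.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def constellation_map(spectrogram, neighborhood=(15, 15), min_magnitude=1e-3):
    """Reduce a magnitude spectrogram (frequency x time) to a sparse constellation
    map by keeping only time-frequency points that are maximal within a local
    neighborhood and exceed a small magnitude threshold."""
    local_max = maximum_filter(spectrogram, size=neighborhood) == spectrogram
    peaks = local_max & (spectrogram > min_magnitude)
    freq_bins, time_frames = np.nonzero(peaks)
    return list(zip(time_frames, freq_bins))  # (time, frequency) points
```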

4www.shazam.com (accessed Dec. 18, 2011)


Figure 8.4: Illustration of the peak pairing strategy of the Shazam algorithm. (a) Anchor peak and assigned target zone. (b) Pairing of anchor peak and target peaks to form hash values.

According to [181], the peaks are highly characteristic, reproducible, and robust against many, even significant distortions of the signal. Note that a peak is only defined by its time and frequency values, whereas magnitude values are no longer considered. The general database look-up strategy works as follows. Given the constellation maps for a query fragment and all database documents, one locally compares the query fragment to all database fragments of the same size. More precisely, one counts matching peaks, i.e., peaks that occur in both constellation maps. A high count indicates that the corresponding database fragment is likely to be a correct hit. This procedure is illustrated in Figure 8.3e, showing the superposition of the database fingerprints and time-shifted query fingerprints. Both constellation maps show a high consistency (many red and blue points coincide) at a fragment of the database document starting at time position 10 seconds, which indicates a hit. However, note that not all query and database peaks coincide. This is because the query was exposed to signal distortions on the transmission channel (in this example, additive white noise). Even under severe distortions of the query, there still is a high number of coinciding peaks, thus showing the robustness of these fingerprints. Obviously, such an exhaustive search strategy is not feasible for a large database, as the run-time linearly depends on the number and sizes of the documents. For the constellation maps, as proposed in [107], one tries to efficiently reduce the retrieval time by using indexing techniques, which allow for very fast look-up operations with a sub-linear run-time. However, directly using the peaks as hash values is not possible, as the temporal component is not translation-invariant and the frequency component alone does not have the required specificity. In [181], a strategy is proposed where one considers pairs of peaks. Here, one first fixes a peak to serve as “anchor peak” and then assigns a “target zone” as indicated in Figure 8.4a. Then, pairs are formed of the anchor and each peak in the target zone, and a hash value is obtained for each pair of peaks as a combination of both frequency values and the time difference between the peaks, as indicated in Figure 8.4b. Using every peak as anchor peak, the number of items to be indexed increases by a factor that depends on the number of peaks in the target zone. This combinatorial hashing strategy has three advantages. Firstly, the resulting fingerprints show a higher specificity than single peaks, leading to an acceleration of the retrieval as fewer exact hits are found. Secondly, the fingerprints are translation-invariant as no absolute timing information is captured. Thirdly, the combinatorial multiplication of the number of fingerprints introduced by considering pairs of peaks as well as the local nature of the peak pairs increases the robustness to signal degradations. The Shazam audio identification system facilitates a high identification rate while scaling to large databases. One weakness of this algorithm is that it cannot handle time scale modifications of the audio as they frequently occur in the context of broadcast monitoring. The reason for this is that time scale modifications (also leading to frequency shifts) of the query fragment completely change the hash values. Extensions of the original algorithm dealing with this issue are presented in [52; 176].
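The following Python sketch illustrates the combinatorial hashing idea: each anchor peak is paired with a few subsequent peaks in its target zone to form (f1, f2, Δt) hashes, and matching candidates vote for a time offset between query and database track. The fan-out, target-zone size, and index layout are illustrative assumptions, not the parameters of the deployed system; in practice, `db_index` would be built offline from the constellation maps of all database tracks.

```python
from collections import Counter, defaultdict

def peak_hashes(peaks, fan_out=5, max_dt=64):
    """Pair each anchor peak with up to `fan_out` later peaks in its target zone
    and emit ((f1, f2, dt), anchor_time) items. Peaks are (time, frequency) pairs."""
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            if 0 < t2 - t1 <= max_dt:
                hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

def match(query_peaks, db_index):
    """`db_index` maps a hash to a list of (track_id, db_time) entries. For each
    track, count how many matching hashes agree on the same offset db_time - query_time;
    a large count indicates a correct hit."""
    votes = defaultdict(Counter)
    for h, tq in peak_hashes(query_peaks):
        for track_id, tdb in db_index.get(h, []):
            votes[track_id][tdb - tq] += 1
    return {track: offsets.most_common(1)[0] for track, offsets in votes.items()}
```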

8.3 Audio Matching

The problem of audio identification can be regarded as largely solved, even for large-scale music collections. Less specific retrieval tasks, however, are still mostly unsolved. In this section, we highlight the difference between high-specific audio identification and mid-specific audio matching while presenting strategies to cope with musically motivated variations. In particular, we introduce chroma-based audio features [4; 59; 123] and sketch distance measures that can deal with local tempo distortions. Finally, we indicate how the matching procedure may be extended using indexing methods to scale to large datasets [18; 105]. For the audio matching task, suitable descriptors are required that capture characteristics of the underlying piece of music while being invariant to properties of a particular recording. Chroma-based audio features [4; 123], sometimes also referred to as pitch class profiles [59], are a well-established tool for analyzing Western tonal music and have turned out to be a suitable mid-level representation in the retrieval context [18; 105; 135; 123]. Assuming the equal-tempered scale, the chroma attributes correspond to the set {C, C♯, D, ..., B} that consists of the twelve pitch spelling attributes as used in Western music notation. Capturing energy distributions in the twelve pitch classes, chroma-based audio features closely correlate to the harmonic progression of the underlying piece of music. This is the reason why basically every matching procedure relies on some type of chroma feature, see Section 5.2. There are many ways for computing chroma features. For example, the decomposition of an audio signal into a chroma representation (or chromagram) may be performed either by using short-time Fourier transforms in combination with binning strategies [59] or by employing suitable multirate filter banks [123; 127]. Figure 8.5 illustrates the computation of chroma features for a recording of the first five measures of Beethoven's Symphony No. 5 in a Bernstein interpretation. The main idea is that the fine-grained (and highly specific) signal representation given by a spectrogram (Figure 8.5c) is coarsened in a musically meaningful way. Here, one adapts the frequency axis to represent the semitones of the equal-tempered scale (Figure 8.5d). The resulting representation captures musically relevant pitch information of the underlying music piece, while being significantly more robust against spectral distortions than the original spectrogram. To obtain chroma features, pitches differing by octaves are summed up to yield a single value for each pitch class, see Figure 8.5e.


Figure 8.5: Illustration of various feature representations for the beginning of Beethoven’s Opus 67 (Symphony No. 5) in a Bernstein interpretation. (a) Score of the excerpt. (b) Waveform. (c) Spectrogram with linear frequency axis. (d) Spectrogram with frequency axis corresponding to musical pitches. (e) Chroma features. (f) Normalized chroma features. (g) Smoothed version of chroma features.

changes in timbre, as typically resulting from different instrumentations. The degree of robustness of the chroma features against musically motivated variations can be further increased by using suitable post-processing steps. See [127] for some chroma variants.5 For example, normalizing the chroma vectors (Figure 8.5f) makes the features invariant to changes in loudness or dynamics. Furthermore, applying a temporal smoothing and downsampling step (see Figure 8.5g) may significantly increase robustness against local temporal variations that typically occur as a result of local tempo changes or differences in phrasing and articulation, see also [127]. There are many more variants of chroma features comprising various processing steps. For example, applying logarithmic compression or whitening procedures enhances small yet perceptually relevant spectral components and the robustness to timbre [120; 126]. Peak picking of the spectrum's local maxima can enhance harmonics while suppressing noise-like components [59; 46]. Furthermore, generalized chroma representations with 24 or 36 bins (instead of the usual 12 bins) allow for dealing with differences in tuning [59]. Such variations in the feature extraction pipeline have a large influence, and the resulting chroma features can behave quite differently in the subsequent analysis task. Figure 8.6 shows spectrograms and chroma features for two different interpretations (by

5 MATLAB implementations for some chroma variants are supplied by the Chroma Toolbox: www.mpi-inf.mpg.de/resources/MIR/chromatoolbox (accessed Dec. 18, 2011)


Figure 8.6: Different representations and peak fingerprints extracted for recordings of the first 21 measures of Beethoven’s Symphony No. 5. (a) Spectrogram-based peaks for a Bernstein record- ing. (b) Chromagram-based peaks for a Bernstein recording. (c) Spectrogram-based peaks for a Karajan recording. (d) Chromagram-based peaks for a Karajan recording.

Bernstein and Karajan) of Beethoven's Symphony No. 5. Obviously, the chroma features exhibit a much higher similarity than the spectrograms, revealing the increased robustness against musical variations. The fine-grained spectrograms, however, reveal characteristics of the individual interpretations. To further illustrate this, Figure 8.6 also shows fingerprint peaks for all representations. As expected, the spectrogram peaks are very inconsistent for the different interpretations. The chromagram peaks, however, show at least some consistency, indicating that fingerprinting techniques could also be applicable for audio matching [12]. This strategy is further analyzed in Chapter 9.

Instead of using sparse peak representations, one typically employs a subsequence search, which is directly performed on the chroma features. Here, a query chromagram is compared with all subsequences of database chromagrams. As a result, one obtains a matching curve as shown in Figure 8.7, where a small value indicates that the subsequence of the database starting at this position is similar to the query sequence. Then the best match is the minimum of the matching curve. In this context, one typically applies distance measures that can deal with tempo differences between the versions, such as edit distances [5], dynamic time warping (DTW) [123; 135], or the Smith-Waterman algorithm [162]. An alternative approach is to linearly scale the query to simulate different tempi and then to minimize over the distances obtained for all scaled variants [105]. Figure 8.7 shows three different matching curves which are obtained using strict subsequence matching, DTW, and a multiple query strategy.

To speed up such exhaustive matching procedures, one requires methods that allow for efficiently detecting near neighbors rather than exact matches. A first approach in this direction uses inverted file indexing [105] and depends on a suitable codebook consisting of a finite set of characteristic chroma vectors. Such a codebook can be obtained in an


Figure 8.7: Illustration of the audio matching procedure for the beginning of Beethoven's Opus 67 (Symphony No. 5) using a query fragment corresponding to the first 22 seconds (measures 1-21) of a Bernstein interpretation and a database consisting of an entire recording of a Karajan interpretation. Three different strategies are shown, leading to three different matching curves. (a) Strict subsequence matching. (b) DTW-based matching. (c) Multiple query scaling strategy.

unsupervised way using vector quantization or in a supervised way exploiting musical knowledge about chords. The codebook then allows for classifying the chroma vectors of the database and for indexing the vectors according to the assigned codebook vector. This results in an inverted list for each codebook vector. Then, an exact search can be performed efficiently by intersecting suitable inverted lists. However, the performance of the exact search using quantized chroma vectors greatly depends on the codebook. This requires fault-tolerance mechanisms which partly eliminate the speed-up obtained by this method. Consequently, this approach is only applicable for databases of medium size [105].

An approach presented in [18] uses an index-based near neighbor strategy based on locality sensitive hashing (LSH). Instead of considering long feature sequences, the audio material is split up into small overlapping shingles that consist of short chroma feature subsequences. The shingles are then indexed using locality sensitive hashing, which allows for scaling this approach to larger datasets. However, to cope with temporal variations, each shingle covers only a small portion of the audio material, and queries need to consist of a large number of shingles. The high number of table look-ups induced by this strategy may become problematic for very large datasets where the index is stored on a secondary storage device. In Chapter 10, we present an investigation with the goal of reducing the number of table look-ups by representing each query (consisting of 15-25 seconds of the audio) with only a single shingle. To handle temporal variations, a combination of local feature smoothing and global query scaling is proposed.

In summary, mid-specific audio matching using a combination of highly robust chroma features and sequence-based similarity measures that account for different tempi results in a good retrieval quality. However, the low specificity of this task makes indexing much harder than in the case of audio identification. This task becomes even more challenging when dealing with relatively short fragments on the query and database side.
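As an illustration of the strict subsequence matching discussed above, the following sketch compares a query chromagram with all database subsequences of the same length using an averaged cosine distance. The concrete distance measure and the toy data are illustrative choices and not the exact measures used in [105] or [123; 135].

```python
import numpy as np

def matching_curve(query, database):
    """Strict subsequence matching between chroma sequences.

    query    : array of shape (M, 12), one chroma vector per frame
    database : array of shape (N, 12), N >= M
    Returns an array of length N - M + 1 containing, for each database
    position, the average cosine distance between the query and the
    database subsequence of length M; the minimum marks the best match.
    """
    def normalize(X):
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        return X / np.maximum(norms, 1e-9)

    Q, D = normalize(query), normalize(database)
    M, N = len(Q), len(D)
    curve = np.empty(N - M + 1)
    for t in range(N - M + 1):
        # per-frame cosine similarity, averaged over the window
        sim = np.sum(Q * D[t:t + M], axis=1).mean()
        curve[t] = 1.0 - sim
    return curve

# toy usage: plant the query at position 50 of a random "database"
rng = np.random.default_rng(0)
db = rng.random((400, 12))
q = db[50:70].copy()
print(int(np.argmin(matching_curve(q, db))))   # -> 50
```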

8.4 Version Identification

In the previous tasks, a musical fragment is used as query and similar fragments or documents are retrieved according to a given degree of specificity. The degree of specificity was very high for audio identification and more relaxed for audio matching. If we allow for even less specificity, we are facing the problem of version identification [161]. In this scenario, a user wants to retrieve not only exact or near-duplicates of a given query, but also any existing re-interpretation of it, no matter how radical such a re-interpretation might be. In general, a version may differ from the original recording in many ways, possibly including significant changes in timbre, instrumentation, tempo, main tonality, harmony, melody, and lyrics. For example, in addition to the aforementioned Karajan rendition of Beethoven's Symphony No. 5, one could also be interested in a live performance of it, played by a punk band that changes the tempo in a non-uniform way, transposes the piece to another key, and skips many notes as well as most parts of the original structure. These types of documents where, despite numerous and important variations, one can still unequivocally glimpse the original composition are the ones that motivate version identification.

Version identification is usually interpreted as a document-level retrieval task, where a single similarity measure is considered to globally compare entire documents [5; 46; 170]. However, successful methods perform this global comparison on a local basis. Here, the final similarity measure is inferred from locally comparing only parts of the documents, a strategy that allows for dealing with non-trivial structural changes. This way, comparisons are performed either on some representative part of the piece [60], on short, randomly chosen subsequences of it [117], or on the best possible longest matching subsequence [162; 164].

A common approach to version identification starts from the previously introduced chroma features; but also more general representations of the tonal content, such as chords or tonal templates, have been used [161]. Furthermore, melody-based approaches have been suggested, although recent findings suggest that this representation may be suboptimal [55]. Once a tonal representation is extracted from the audio, changes in the main tonality need to be tackled, either in the extraction phase itself, or when performing pairwise comparisons of such representations. Tempo and timing deviations have a strong effect on the chroma feature sequences, hence making their direct pairwise comparison problematic. An intuitive way to deal with global tempo variations is to use beat-synchronous chroma representations [12; 46]. However, the required beat tracking step is often error-prone for certain types of music and therefore may negatively affect the final retrieval result. Again, as for the audio matching task, dynamic programming algorithms are a standard choice for dealing with tempo variations [123], this time applied in a local fashion to identify longest matching subsequences or local


Figure 8.8: Similarity matrix for "Act naturally" by The Beatles, which is actually a cover version of a song by Buck Owens. (a) Chroma features of the version by The Beatles. (b) Score matrix. (c) Chroma features of the version by Buck Owens.

alignments [162; 164]. An example of such an alignment procedure is depicted in Figure 8.8 for our "Act naturally" example by The Beatles. The chroma features of this version are shown in Figure 8.8c. Actually, this song was not originally written by The Beatles but is a cover version of a Buck Owens song of the same name. The chroma features of the original version are shown in Figure 8.8a. Alignment algorithms rely on some sort of scores (and penalties) for matching (mismatching) individual chroma sequence elements. Such scores can be real-valued or binary. Figure 8.8b shows a binary score matrix encoding pairwise similarities between chroma vectors of the two sequences. The binarization of score values provides some additional robustness against small spectral and tonal differences. Correspondences between versions are revealed by the score matrix in the form of diagonal paths of high score. For example, in Figure 8.8, one observes a diagonal path indicating that the first 60 seconds of the two versions exhibit a high similarity. For detecting such path structures, dynamic programming strategies make use of an accumulated score matrix. In their local alignment version, where one is searching for


Figure 8.9: Accumulated score matrix with optimal alignment path for the “Act naturally” example (as shown in Figure 8.8).

subsequence correspondences, this matrix reflects the lengths and quality of such matching subsequences. Each element (consisting of a pair of indices) of the accumulated score matrix corresponds to the end of a subsequence, and its value encodes the score accumulated over all elements of the subsequence. Figure 8.9 shows an example of the accumulated score matrix obtained for the score matrix in Figure 8.8. The highest-valued element of the accumulated score matrix corresponds to the end of the most similar matching subsequence. Typically, this value is chosen as the final score for the document-level comparison of the two pieces. Furthermore, the specific alignment path can be easily obtained by backtracking from this highest element [123]. The alignment path is indicated by the red line in Figure 8.9. Additional penalties account for the importance of insertions/deletions in the subsequences. In fact, the way of deriving these scores and penalties is usually an important part of the version identification algorithms, and different variants have been proposed [5; 162; 164]. The aforementioned final score is directly used for ranking candidate documents to a given query. It has recently been shown that such rankings can be improved by combining different scores obtained by different methods [151], and by exploiting the fact that alternative renditions of the same piece naturally cluster together [109; 165].

The task of version identification allows for these and many other new avenues for research [161]. However, one of the most challenging problems that remains to be solved is to achieve high accuracy and scalability at the same time, allowing low-specific retrieval in large music collections [12]. Unfortunately, the accuracies achieved with today's non-scalable approaches have not yet been reached by the scalable ones, the latter remaining far behind the former.
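The following sketch illustrates the local alignment idea described above: an accumulated score matrix is computed from a (binary) score matrix, and the best matching subsequence is recovered by backtracking from the highest entry. The gap penalty and score values are illustrative and do not correspond to the specific variants proposed in [5; 162; 164].

```python
import numpy as np

def local_alignment(S, gap_penalty=0.5):
    """Smith-Waterman-style local alignment on a (binary) score matrix S.

    S[i, j] > 0 rewards matching frame i of version 1 with frame j of
    version 2; insertions and deletions are penalized by gap_penalty.
    Returns the accumulated score matrix, the best score, and the path
    obtained by backtracking from the maximal entry.
    """
    n, m = S.shape
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = max(0.0,
                          D[i - 1, j - 1] + S[i - 1, j - 1],
                          D[i - 1, j] - gap_penalty,
                          D[i, j - 1] - gap_penalty)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    best, path = D[i, j], []
    while i > 0 and j > 0 and D[i, j] > 0:
        path.append((i - 1, j - 1))
        if np.isclose(D[i, j], D[i - 1, j - 1] + S[i - 1, j - 1]):
            i, j = i - 1, j - 1            # diagonal (match) step
        elif np.isclose(D[i, j], D[i - 1, j] - gap_penalty):
            i -= 1                         # deletion step
        else:
            j -= 1                         # insertion step
    return D, best, path[::-1]

# toy usage: a binary score matrix with one diagonal correspondence
S = np.zeros((60, 80))
S[10:50, 20:60] += np.eye(40)              # 40-frame matching subsequence
D, best, path = local_alignment(S)
print(best, path[0], path[-1])             # 40.0 (10, 20) (49, 59)
```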


Figure 8.10: Joystick-like user interface for continuously adjusting the specificity and granularity levels used in the retrieval process.

8.5 Further Notes

The retrieval strategies presented in this chapter allow for discovering and accessing music even in cases where the user does not explicitly know what he or she is actually looking for. For designing a retrieval system that increases the user experience, however, one also has to better account for user requirements in content-based retrieval systems. For example, one may think of a comprehensive framework that allows a user to adjust the specificity level at any stage of the search process. Here, the system should be able to seamlessly change the retrieval paradigm from high-specific audio identification, over mid-specific audio matching and version identification, to low-specific genre identification. Similarly, the user should be able to flexibly adapt the granularity level to be considered in the search. Furthermore, the retrieval framework should comprise control mechanisms for adjusting the musical properties of the employed similarity measure to facilitate searches according to rhythm, melody, or harmony, or any combination of these aspects.

Figure 8.10 illustrates a possible user interface for such an integrated content-based retrieval framework, where a joystick allows a user to continuously and instantly adjust the retrieval specificity and granularity. For example, a user may listen to a recording of Beethoven's Symphony No. 5, which is first identified to be a Bernstein recording using an audio identification strategy (moving the joystick to the leftmost position). Then, being interested in different versions of this piece, the user moves the joystick upwards (document-level) and to the right (mid-specific), which triggers a version identification. Subsequently, shifting towards a more detailed analysis of the piece, the user selects the famous fate motif as query and moves the joystick downwards to perform some mid-specific fragment-based audio matching. Then, the system returns the positions of all occurrences of the motif in all available interpretations. Finally, moving the joystick to the rightmost position, the user may discover recordings of pieces that exhibit some general similarity like style or mood. In combination with immediate visualization, navigation, and feedback mechanisms, the user is able to successively refine and adjust the query formulation as well as the retrieval strategy, thus leading to novel strategies for exploring, browsing, and interacting with large collections of audio content.

Chapter 9

Musically-Motivated Audio Fingerprints

As introduced in the last chapter, the high-specific task of audio identification constitutes an important research topic and is of commercial relevance [1; 15; 181]. In particular, audio identification systems are highly efficient and can deal with music collections comprising millions of songs. However, audio identification systems are not capable of retrieving different performances of the same piece of music. The reason for this is that existing audio fingerprinting algorithms are not designed for dealing with musical variations such as strong non-linear temporal distortions or variations concerning articulation, instrumentation, or ornamentation. When such variations need to be handled, the scalability of mid-specific audio matching and version identification is still problematic.

In this chapter, we investigate to which extent the well-established audio fingerprints originally proposed in [181] and introduced in Section 8.2 can be modified to allow for retrieving musically related recordings while retaining the efficiency of index-based approaches. To this end, we replace the traditional fingerprints based on spectral peaks by fingerprints based on peaks of more musically oriented feature representations, including log-frequency and chroma representations. Our motivation for adopting this approach is that such peak structures, according to [181], are temporally localized, reproducible, and robust against many, even significant distortions of the signal. Furthermore, the spectral peaks allow for applying efficient hash-based indexing techniques. The main contribution of this chapter is to systematically analyze the resulting peak structures in view of robustness and discriminative power. Finding a good trade-off between these two principles is a non-trivial task. On the one hand, using fine-grained feature representations (such as a spectrogram) results in fingerprints that are too specific, thus not facilitating cross-version retrieval. On the other hand, using coarse feature representations (such as a chromagram) results in peak fingerprints that are too unspecific and noisy, thus not having the required discriminative power.

In our investigation, we proceed in four steps, see Figure 9.1 for an overview. For a piece of music (indicated by the excerpt of the score in Figure 9.1a) we assume to have multiple performances given in the form of audio recordings, see Figure 9.1b. In the first step, for each of the performances, we derive a feature representation (Figure 9.1c). In our



Figure 9.1: Overview of the framework used in our investigation. (a) Score of a piece. (b) Waveforms of two different recorded performances. (c) Feature representations for the performances. (d) Peak fingerprints extracted from the feature representations. (e) Temporally warped fingerprints based on a common time line. (f) Overlaid peak representations indicating peak consistencies.

experiments we actually investigate five different time-frequency representations derived from the originally used spectrogram. In the second step, we derive peak fingerprints from the different feature representations similar to [181], see Figure 9.1d. Next, we investigate which and how many of the peaks consistently appear across different performances. To compensate for temporal differences between performances, we use music synchronization techniques [49] in the third step to warp the peak fingerprints onto a common time line (Figure 9.1e). Finally, in the fourth step, the fingerprints are analyzed with respect to peak consistency across the different performances (Figure 9.1f). Our experimental results in the context of a music retrieval scenario indicate that, using suitably modified peak fingerprints, one can transfer traditional audio fingerprinting techniques to other tasks such as audio matching and cover song identification as introduced in Section 8.3 and Section 8.4, respectively.

The remainder of the chapter is organized as follows. In Section 9.1, we introduce various peak fingerprints based on different feature representations. In Section 9.2, as our main contribution, we systematically investigate the trade-off between robustness and discriminative power of the various audio fingerprints. Finally, discussions and an outlook on a modified audio fingerprinting system can be found in Section 9.3.

9.1 Modified Peak Fingerprints

Our approach is based on the concept of spectral peaks originally introduced by Wang [181] and explained in Section 8.2. In this approach, characteristic time-frequency peaks extracted from a spectrogram are used as fingerprints, thus reducing a complex spectrogram to a sparse peak representation of high robustness against signal distortions. Such peak representations allow for applying efficient hash-based indexing techniques. We transfer this approach to a more flexible retrieval scenario by considering various feature representations that are obtained by partitioning the frequency axis of the original spectrogram, while the temporal axis of all representations is fixed to yield a feature rate of 20 Hz (20 features per second), see Figure 9.2 for an illustration of the different feature representations.

The first feature representation is a magnitude spectrogram as employed in the original approach. Following [181], the audio signal is sampled at fs = 8000 Hz and discrete Fourier transforms are calculated over windows of 1024 samples. In the following, the resulting feature representation is referred to as SPEC, see Figure 9.2b. The second feature representation is a log-frequency spectrogram [15]. Using a suitable binning strategy, we group the Fourier coefficients of the original spectrogram into 33 non-overlapping frequency bands covering the frequency range from 300 Hz to 2000 Hz. Exhibiting a logarithmic spacing, the bands roughly represent the Bark scale. In the following, this feature representation is referred to as LOGF, see Figure 9.2c. As third feature representation, we consider a constant-Q transform where the frequency bins are logarithmically spaced and the ratios of the center frequencies to bandwidths of all bins are equal (Q factor). In our investigation, we employ the efficient implementation provided by the Constant-Q Transform Toolbox for Music Processing1, see [155]. Here, we set the number of frequency bins per octave to 12 (each bin corresponds to one semitone of the equal-tempered scale) and consider the frequency range from 80 Hz to 4000 Hz. In the following, this feature is referred to as CONSTQ, see Figure 9.2d. To obtain the fourth feature representation, we

1 http://www.elec.qmul.ac.uk/people/anssik/cqt/


Figure 9.2: Score and various feature representations for the first 7.35 seconds of a Hatto (2006) performance of the first 5 bars of Chopin’s Mazurka Op. 30 No. 2. One peak and the corresponding neighborhood is shown for each of the feature representations.

decompose the audio signal into 88 frequency bands with center frequencies corresponding to the pitches of the equal-tempered scale and compute the short-time energy in windows of length 100 ms. For deriving this decomposition, we use a multirate filter bank as described in [123] and denote the resulting feature as PITCH, see Figure 9.2e. The fifth feature representation is a chroma representation which is obtained from PITCH by adding up the corresponding values that belong to the same chroma. In the following, this feature is referred to as CHROMA, see Figure 9.2f. Implementations for PITCH and CHROMA are provided by the Chroma Toolbox2, see [127].

In the second step, we employ a similar strategy as proposed in [181] to extract characteristic peaks from the various feature representations. Given a feature representation F ∈ R^{T×K}, where F(t, k) denotes the feature value at frame t ∈ [1 : T] := {1, 2, ..., T} for some T ∈ N and frequency bin k ∈ [1 : K] for some K ∈ N, we select a point (t0, k0) as a peak if F(t0, k0) ≥ F(t, k) for all (t, k) ∈ [t0 − ∆time : t0 + ∆time] × [k0 − ∆freq : k0 + ∆freq], i.e., in a local neighborhood defined by ∆time and ∆freq. The size of this neighborhood allows for adjusting the peak density. In our implementation, we use an additional absolute threshold on the values F(t0, k0) to prevent the selection of more or less random peaks in regions of very low dynamics. The selected peaks are represented in the form of a binary matrix P ∈ {0, 1}^{T×K} by setting P(t0, k0) = 1 for (t0, k0) being a peak and zero elsewhere. This peak selection strategy reduces a complex time-frequency representation F to a sparse set P of time-frequency points. Note that the values of F(t, k) are no longer considered in the fingerprints.

In our experiments, we fix ∆time = 20, corresponding to one second, for all five feature representations. The range of the frequency neighborhood ∆freq, however, was experimentally determined for each feature representation. For SPEC we set ∆freq = 25 (corresponding to 200 Hz), for LOGF we set ∆freq = 2, for CONSTQ we set ∆freq = 3, for PITCH we set ∆freq = 3, and for CHROMA we set ∆freq = 1, see Figure 9.2 for an illustration of the neighborhood for each of the feature representations.
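As an illustration of this peak selection step, the following sketch implements the local-maximum criterion using a two-dimensional maximum filter. The neighborhood sizes mirror the values given above for LOGF, whereas the absolute threshold and the toy data are illustrative choices, not the exact settings of our implementation.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def select_peaks(F, delta_time=20, delta_freq=2, abs_thresh=0.1):
    """Select time-frequency peaks from a feature matrix F of shape (T, K).

    A point (t0, k0) is a peak if F[t0, k0] is maximal within the neighborhood
    [t0 - delta_time : t0 + delta_time] x [k0 - delta_freq : k0 + delta_freq]
    and exceeds the (illustrative) absolute threshold abs_thresh.
    Returns a binary matrix P of the same shape as F.
    """
    footprint = np.ones((2 * delta_time + 1, 2 * delta_freq + 1), dtype=bool)
    local_max = maximum_filter(F, footprint=footprint, mode='constant', cval=0.0)
    P = (F >= local_max) & (F > abs_thresh)
    return P.astype(np.uint8)

# toy usage: a LOGF-like representation (20 Hz, 33 bands) with two bursts of energy
rng = np.random.default_rng(1)
F = 0.01 * rng.random((200, 33))     # low-level "noise floor"
F[40, 10] = 1.0
F[120, 25] = 0.8
P = select_peaks(F, delta_time=20, delta_freq=2)
print(np.argwhere(P))                # -> [[ 40  10] [120  25]]
```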

9.2 Experiments

We now investigate the musical expressiveness of the various peak fingerprints. In Section 9.2.1, we start by introducing the datasets used in our experiments. Then, in Section 9.2.2, we sketch how the peaks of different performances are warped to a common time line. In Section 9.2.3, we discuss an experiment that indicates the degree of peak consistency across different performances depending on the underlying feature representation. Finally, in Section 9.2.4, we describe a document-based retrieval experiment.

9.2.1 Dataset

For our subsequent experiments, we use three different groups of audio recordings corresponding to pieces of classical music by three different composers, see Table 9.1. The first group Chop consists of 298 piano recordings of five Mazurkas by Fr´ed´eric Chopin collected in the Mazurka Project.3 The second group Beet consists of ten recorded performances of Beethoven's Symphony No. 5. This collection contains orchestral as well as piano performances. The third group Viva contains seven orchestral performances of the Summer from Vivaldi's Four Seasons. Table 9.1 lists the number of performances as well as the total duration of each movement/piece. The union of all groups is referred to as

2 http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/
3 http://mazurka.org.uk/

Groups  Composer   Piece            Description  #(Perf.)  Dur. (min)
Chop    Chopin     Op. 17, No. 4    Mazurka      62        269
        Chopin     Op. 24, No. 2    Mazurka      64        147
        Chopin     Op. 30, No. 2    Mazurka      34        48
        Chopin     Op. 63, No. 3    Mazurka      88        189
        Chopin     Op. 68, No. 3    Mazurka      50        84
Beet    Beethoven  Op. 67, 1. Mov.  Fifth        10        75
        Beethoven  Op. 67, 2. Mov.  Fifth        10        98
        Beethoven  Op. 67, 3. Mov.  Fifth        10        52
        Beethoven  Op. 67, 4. Mov.  Fifth        10        105
Viva    Vivaldi    RV 315, 1. Mov.  Summer       7         38
        Vivaldi    RV 315, 2. Mov.  Summer       7         17
        Vivaldi    RV 315, 3. Mov.  Summer       7         20
All                                              359       1145

Table 9.1: The groups of audio recordings used in our experiments. The last two columns denote the number of different performances and the overall duration in minutes.

All and contains 359 recordings with an overall length of 19 hours. In view of extracting peak fingerprints, these three groups are of increasing complexity. While for the piano recordings of Chop, one expects relatively clear peak structures, peak picking becomes much more problematic for general orchestral music (group Beet) and music dominated by strings (group Viva).

9.2.2 Synchronization of Fingerprints

In our retrieval scenario, there typically are tempo differences between the different interpretations of a piece. In our initial experiments, we do not want to deal with this issue and compensate for tempo differences in the performances by temporally warping the peak representations onto a common time line. To this end, we use, in a preprocessing step, a music synchronization technique [49] to temporally align the different performances of a given piece of music. More precisely, suppose we are given N different performances of the same piece yielding the peak representations Pn, n ∈ [1 : N]. Then, we take the first performance as reference and compute alignments between the reference and the remaining N − 1 performances. The alignments are then used to temporally warp the peak representations Pn for n ∈ [2 : N] onto the time axis of the peak representation P1. The resulting warped peak fingerprints are denoted by P̃n, and we set P̃1 = P1, see Figure 9.1e for an illustration.
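The following sketch indicates how the warping of peak fingerprints onto a common time line could be realized, assuming that the synchronization step yields an alignment given as a list of corresponding frame pairs. The linear interpolation used here is an illustrative simplification of the alignment procedure of [49].

```python
import numpy as np

def warp_peaks(P, alignment, T_ref):
    """Warp a binary peak matrix P (shape T x K) onto a reference time axis.

    alignment : array of shape (L, 2) with rows (t_ref, t_perf) giving
                corresponding frame indices of reference and performance
    T_ref     : number of frames of the reference performance
    Returns a binary matrix of shape (T_ref, K) with the warped peaks.
    """
    alignment = np.asarray(alignment)
    t_ref, t_perf = alignment[:, 0], alignment[:, 1]
    P_warped = np.zeros((T_ref, P.shape[1]), dtype=np.uint8)
    for t, k in np.argwhere(P):
        # map the performance frame to a reference frame by interpolation
        t_new = int(round(np.interp(t, t_perf, t_ref)))
        if 0 <= t_new < T_ref:
            P_warped[t_new, k] = 1
    return P_warped

# toy usage: a performance that is twice as slow as the reference
P = np.zeros((200, 12), dtype=np.uint8)
P[100, 5] = 1                                   # peak at frame 100 of the performance
align = np.stack([np.arange(100), np.arange(0, 200, 2)], axis=1)   # (t_ref, t_perf)
print(np.argwhere(warp_peaks(P, align, T_ref=100)))                # -> [[50  5]]
```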

9.2.3 Experiment: Peak Consistency

In a first experiment, we investigate to which extent the various peak fingerprints coincide across different performances of a piece. Here, the degree of peak consistency serves as an indicator for the robustness of the respective feature representation towards musical variations. We express the consistency of the fingerprints of two performances in terms of pairwise precision P, recall R, and F-measure F. More precisely, given two performances n, m ∈ [1 : N] of a piece, we obtain the aligned peak fingerprints P̃n and P̃m as explained

Groups  SPEC   LOGF   CONSTQ  PITCH  CHROMA
Chop    0.081  0.205  0.157   0.185  0.375
Beet    0.051  0.139  0.126   0.137  0.288
Viva    0.059  0.143  0.124   0.132  0.262
All     0.080  0.203  0.156   0.184  0.373

Table 9.2: Mean of pairwise F-measure values expressing peak consistencies for the different groups.

in Section 9.2.2. Then, a peak (t0, k0) of P̃m is called consistent relative to P̃n if there is a peak (t, k0) of P̃n with t ∈ [t0 − τ : t0 + τ]. Here, the parameter τ ≥ 0 specifies a small temporal tolerance window. Otherwise, the peak is called non-consistent. The number of consistent fingerprints is denoted by #(P̃n ∩ P̃m), and the overall number of peaks in P̃n and P̃m is denoted #(P̃n) and #(P̃m), respectively. Then, pairwise precision Pn,m, recall Rn,m, and F-measure Fn,m are defined as

P_{n,m} = \frac{\#(\tilde{P}_n \cap \tilde{P}_m)}{\#(\tilde{P}_m)}, \quad R_{n,m} = \frac{\#(\tilde{P}_n \cap \tilde{P}_m)}{\#(\tilde{P}_n)}, \quad F_{n,m} = \frac{2 \cdot P_{n,m} \cdot R_{n,m}}{P_{n,m} + R_{n,m}}. \qquad (9.1)

Note that Pn,m = Rm,n, Rn,m = Pm,n, and therefore Fn,m = Fm,n. F-measure values are computed for all N performances of a group, yielding an (N × N)-matrix of pairwise F values. Mean values for the groups are obtained by averaging over the respective F-measures. Here, as Fn,n = 1 and Fn,m = Fm,n, we only consider the values of the upper triangular part of the matrix excluding the main diagonal.

Table 9.2 shows the mean of pairwise F-measure values for the different groups of our dataset. In this experiment, we use the tolerance parameter τ = 1 (corresponding to ±50 ms), which turned out to be a suitable threshold for compensating inaccuracies introduced by the synchronization procedure, see [49]. First note that the originally used spectrogram peaks do not work well across different performances. For example, in the case of Chop, one obtains F = 0.081 for SPEC, indicating that only a few of the peak fingerprints consistently occur across different performances. The peaks extracted from the other four feature representations show a higher degree of consistency across performances, e.g., in the case of Chop, F = 0.205 for LOGF, F = 0.157 for CONSTQ, F = 0.185 for PITCH, and F = 0.375 for CHROMA. This improvement is achieved by the coarser and musically more meaningful partition of the frequency axis. Furthermore, our results show a dependency on the characteristics of the audio material. In particular, the peaks are more consistent for Chop (e.g., F = 0.375 for CHROMA) than for Beet (F = 0.288) and Viva (F = 0.262). The reason for this effect is twofold. Firstly, the piano pieces as contained in Chop exhibit pronounced note onsets leading to consistent peaks. For complex orchestral and string music as in Beet and Viva, however, the peaks are less dominant, leading to a lower consistency. Secondly, the consistency results are also influenced by the accuracy of the peak synchronization as introduced in Section 9.2.2. Typically, the synchronization technique [49] yields very precise alignments for piano music as contained in Chop. For orchestral and string pieces as in Beet and Viva, however, there are more synchronization inaccuracies, leading to lower F-measure values.

This effect is further addressed by Figure 9.3, showing mean F-measure values as a function of the tolerance parameter τ for the different groups. In general, the tolerance parameter


Figure 9.3: Dependency of the mean of pairwise F-measure values on the tolerance parameter τ for (a) group Chop, (b) group Beet, (c) group Viva, and (d) the union of all groups All.

has a large influence on the evaluation results. In particular, one observes a large increase in F-measure values when introducing a tolerance of τ = 1 (in comparison with τ = 0) regardless of feature type and group. Note that for Chop (Figure 9.3a), further increasing τ has a smaller effect than for Beet (Figure 9.3b) and Viva (Figure 9.3c). In the following experiment, we set τ = 1.
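For illustration, the following sketch computes the pairwise precision, recall, and F-measure of Eq. (9.1) for two warped binary peak matrices, applying the tolerance τ along the time axis only. The dilation-based implementation is an illustrative realization; in particular, the wrap-around behavior of np.roll at the sequence boundaries is ignored here.

```python
import numpy as np

def peak_consistency(Pn, Pm, tau=1):
    """Pairwise precision, recall, and F-measure between two warped binary
    peak matrices Pn and Pm of shape (T, K), cf. Eq. (9.1).

    A peak (t0, k0) of Pm is consistent if Pn contains a peak (t, k0)
    with |t - t0| <= tau (tolerance along the time axis only).
    """
    Pn, Pm = Pn.astype(bool), Pm.astype(bool)
    # dilate Pn along time by +/- tau frames (np.roll wraps at the borders)
    Pn_dilated = np.zeros_like(Pn)
    for s in range(-tau, tau + 1):
        Pn_dilated |= np.roll(Pn, s, axis=0)
    matches = int(np.sum(Pn_dilated & Pm))          # #(Pn ∩ Pm)
    precision = matches / max(int(Pm.sum()), 1)
    recall = matches / max(int(Pn.sum()), 1)
    f = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f

# toy usage: identical peak patterns, shifted by a single frame
Pn = np.zeros((100, 12), dtype=np.uint8); Pn[[10, 40, 70], [2, 5, 9]] = 1
Pm = np.zeros((100, 12), dtype=np.uint8); Pm[[11, 41, 69], [2, 5, 9]] = 1
print(peak_consistency(Pn, Pm, tau=1))   # -> (1.0, 1.0, 1.0)
```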

9.2.4 Experiment: Document-Based Retrieval

In the second experiment, we investigate the identification rate of the modified peak fingerprints in a document-based retrieval scenario. Given a short query extracted from one performance, the goal is to correctly retrieve all performances of the same piece from a larger dataset. Exploiting the warped peak fingerprints P̃ (see Section 9.2.2), we define a query Q and a database collection D. The database consists of |D| performances (documents) of different groups. For a query Q and a document D ∈ D, we compute the peak consistency F-measure as in Eq. (9.1) between Q and all subsegments of D having the same length as Q. High F-values indicate high degrees of peak consistency between Q and subsegments of D. Considering document-level retrieval, the similarity between Q and D is defined as the maximum F-measure over all subsegments of D.

Figure 9.4: Results for the retrieval experiment showing the dependency of MAP values ⟨ψ⟩ on the query length |Q| using queries from (a) Chop (⟨ψ⟩null = 0.190), (b) Beet (⟨ψ⟩null = 0.040), (c) Viva (⟨ψ⟩null = 0.032), and (d) the average over all queries.

In the evaluation, given the set DQ ⊂ D of documents that are relevant to the query Q (i.e., interpretations of the piece underlying the query), we follow [164] and express the retrieval accuracy using the mean of average precision (MAP) measure, denoted as ⟨ψ⟩.4 To this end, we sort the documents D ∈ D in descending order with respect to the similarity between D and Q and obtain the precision ψQ(r) at rank r ∈ [1 : |D|] as

\psi_Q(r) = \frac{1}{r} \sum_{i=1}^{r} \Gamma_Q(i), \qquad (9.2)

where ΓQ(r) ∈ {0, 1} indicates whether a document at rank r is contained in DQ. Then, the average precision ψQ is defined as

\psi_Q = \frac{1}{|D_Q|} \sum_{r=1}^{|D|} \psi_Q(r)\, \Gamma_Q(r). \qquad (9.3)

Finally, given C different queries, we compute ψQ for each Q and average over all C values to obtain the mean of average precision measure ⟨ψ⟩. In our experiments, for a fixed query length |Q|, we randomly select C = 100 queries from each group. Additionally, we estimate the accuracy level ⟨ψ⟩null expected under the null hypothesis of a randomly created sorted list, see [164] for details.
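The following sketch summarizes the evaluation measure defined in Eqs. (9.2) and (9.3): the average precision of a single ranked list is computed and then averaged over several queries to obtain ⟨ψ⟩. The toy document identifiers are, of course, only illustrative.

```python
import numpy as np

def average_precision(ranked_docs, relevant):
    """Average precision psi_Q for one query, cf. Eqs. (9.2) and (9.3).

    ranked_docs : list of document ids sorted by descending similarity
    relevant    : set of document ids relevant to the query (D_Q)
    """
    hits, precisions = 0, []
    for r, doc in enumerate(ranked_docs, start=1):
        gamma = 1 if doc in relevant else 0      # Gamma_Q(r)
        hits += gamma
        if gamma:
            precisions.append(hits / r)          # psi_Q(r) at relevant ranks
    return sum(precisions) / len(relevant)

def mean_average_precision(rankings, relevant_sets):
    """Mean of average precision <psi> over several queries."""
    return float(np.mean([average_precision(r, rel)
                          for r, rel in zip(rankings, relevant_sets)]))

# toy usage: two queries over a five-document collection
rankings = [["d1", "d3", "d2", "d5", "d4"], ["d4", "d2", "d1", "d3", "d5"]]
relevant = [{"d1", "d2"}, {"d2", "d5"}]
print(mean_average_precision(rankings, relevant))   # averages (1 + 2/3)/2 and (1/2 + 2/5)/2
```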

Figure 9.4 shows the resulting MAP values ⟨ψ⟩ as a function of the query length |Q| for the five features. The queries are taken from the different groups; the database D contains all performances of All. As the results show, the retrieval accuracy using modified peak fingerprints is much higher than using the original spectrogram peaks. In particular, using the musically motivated features PITCH, CONSTQ, and CHROMA results in the highest MAP ⟨ψ⟩ for All, see Figure 9.4d. For Viva (Figure 9.4c), the retrieval accuracy for PITCH and CONSTQ is significantly higher than for CHROMA. Here, a manual inspection revealed that the peaks of CHROMA, although more consistent across performances than the peaks of PITCH and CONSTQ (see Section 9.2.3), exhibit less discriminative power. In the case of the less pronounced peaks of Viva, this frequently results in undesired high consistency for unrelated fingerprints when using CHROMA. In contrast, the higher discriminative power of peaks from PITCH and CONSTQ (although of lower overall consistency) results in higher retrieval accuracies. Furthermore, the results show a strong dependency of the retrieval accuracy on the query length |Q|. Surprisingly, in the case of Chop (Figure 9.4a), even |Q| = 2 sec already leads to relatively high MAP values. Increasing the query length, the MAP values increase for all feature representations and groups of audio recordings. For all groups, using a query length of 20 sec in combination with peak fingerprints extracted from PITCH or CONSTQ results in MAP values ⟨ψ⟩ > 0.9. In particular for the more complex data contained in Beet (Figure 9.4b) and Viva (Figure 9.4c), using longer queries further improves the identification rate across performances.

4 The same measure is used in the MIREX Cover Song Identification task, see www.music-ir.org/mirex/wiki/2010:Audio Cover Song Identification

9.3 Further Notes

For cross-version retrieval scenarios, one needs retrieval systems that can handle variations with regard to musical properties such as tempo, articulation, timbre, or instrumentation. Dealing with a much lower specificity level than in the fingerprinting scenario, the development of efficient cross-version retrieval systems that scale to huge data collections still faces challenging problems to be researched.

In this chapter, we studied the robustness and discriminative power of modified audio fingerprints by considering peak consistencies across different versions of the same piece of music. As our experiments reveal, peak fingerprints based on musically motivated time-pitch or time-chroma representations allow for an identification of different performances of the same piece of music. In contrast to the 3-5 sec long queries considered for traditional audio fingerprinting, 15-25 sec are necessary for obtaining a robust and accurate cross-performance identification procedure. Our results indicate that, using more musical feature representations, it is possible to employ similar techniques as used by Shazam for other music retrieval tasks such as audio matching or cover song retrieval.

In our investigation, temporal differences between performances were not considered but compensated in a preprocessing step using an offline music synchronization technique. In particular, for designing an efficient and scalable system, indexing techniques based on robust and discriminative hashes that can cope with temporal differences between performances need to be researched. In [12], a peak-based strategy based on "chroma landmarks" is proposed and applied to cover song detection on the Million Song Dataset [11]. The authors address the problem of temporal differences between the cover versions by computing beat-synchronized chroma features. In the resulting feature sequence, each chroma vector corresponds to exactly one beat interval of the music recording. As a result, the extracted fingerprint peaks are (in theory) invariant against temporal distortions. As discussed in Part I of this thesis, however, automatic beat tracking is a challenging task even for pop and rock music. For classical music, as considered in our scenario, automatic beat tracking becomes even more difficult and often results in a large number of beat tracking errors.

Chapter 10

Characteristic Audio Shingles

In this chapter, we address the fundamental issue of how cross-version retrieval can be accelerated by employing index structures that are based on suitably designed elementary building blocks. Our approach is built upon ideas of two recently proposed retrieval systems [18; 105] introduced in Section 8.3. In [105], a matching procedure is described that allows for a fragment-based retrieval of all audio excerpts musically related to a given query audio fragment. To this end, the query and all database documents are converted to sequences of chroma-based audio features. To cope with temporal variations, global scaling techniques are employed to derive multiple queries that simulate different tempi. Finally, feature quantization techniques in combination with inverted file indexing are applied to speed up the retrieval process. The authors report on speed-up factors of 10-20 for medium-sized data collections. However, using a codebook of fixed size, this approach does not scale well to collections of millions of songs.

In [18], a different approach is described. Instead of considering long feature sequences, the audio material is split up into small overlapping shingles that consist of short chroma feature subsequences. These shingles are indexed using locality sensitive hashing. While being very efficient (the authors report on a speed-up factor of 100) and scalable even to large data collections, the proposed shingling approach has one major drawback. To cope with temporal variations in the versions, each shingle covers only a small portion of the audio material (three seconds in the proposed system). As a result, an individual shingle is too short to adequately characterize a given piece of music. Therefore, to obtain a meaningful retrieval result, one needs to combine the information retrieved for a large number of query shingles. As a consequence, many hash-table lookups are required in the retrieval process. This becomes particularly problematic when the index structure is stored on a secondary storage device.

Based on ideas of these two approaches, we systematically investigate how one can significantly reduce the number of hash-table lookups. Our main idea is to use a shingling approach where an individual shingle covers a relatively large portion of the audio material (between 10 and 30 seconds). Compared to short shingles, such large shingles have a higher musical relevance, so that a much lower number of shingles suffices to characterize a given piece of music. However, increasing the size of a shingle comes at the cost of increasing the dimensionality and possibly losing robustness to temporal variations. Building on well-known existing techniques, our main contribution is to systematically investigate the

delicate trade-off between the query length, feature parameters, shingle dimension, and index settings. In particular, we experimentally determine a setting that allows for retrieving most versions of a piece of music when using only a single 120-dimensional shingle covering roughly 20 seconds of the audio material. For dealing with temporal variations, we investigate two conceptually different matching strategies. Furthermore, we show that such large shingles can still be indexed using locality sensitive hashing with only a small degradation in retrieval quality.

The remainder of this chapter is organized as follows. First, we introduce the overall retrieval approach (Section 10.1) and the two matching strategies (Section 10.2). Then, in Section 10.3, we report on our systematic experiments. Conclusions are given in Section 10.4.

10.1 Cross-Version Retrieval Strategy

In our cross-version retrieval scenario, given a short fragment of a music recording as query, the goal is to retrieve from a large dataset all music recordings (documents) that contain a passage similar to the query. The retrieval result for a query is given as a ranked list of document identifiers. To this end, we proceed in three steps. Given a query Q and a document D to be compared, the first step consists in converting Q and D into sequences of feature vectors X = (x1, ..., xM) and Y = (y1, ..., yN), respectively. In our system, as in [105; 18], we use 12-dimensional chroma-based audio features, which are a powerful mid-level representation for capturing harmonic content in music recordings, while being robust to other musical aspects. See Section 8.3 for a more detailed introduction to chroma features in the audio matching context. More precisely, as in Section 5.2, we use a chroma variant referred to as CENS features1 [123], which involve a temporal smoothing by averaging chroma vectors over a window of length w and downsampling by a factor of d. In our experiments, we use a feature rate of 10 Hz for the basic chroma vectors. Then, for example, setting d = 10 and w = 41 results in one feature vector per second (a feature resolution of 1 Hz), where each vector is obtained by averaging over 41 consecutive frames, corresponding to roughly 4 sec of the audio. The resulting features CENS(w,d) show an increased robustness to local tempo changes and allow for flexibly adjusting the temporal resolution, see Figure 8.5.

In the second step, the sequence X is compared with subsequences Y_t^M := (yt, ..., yt+M−1) of length M for t ∈ [1 : N − M + 1]. Here, we adopt the idea of audio shingles [18] and reorganize the sequences of feature vectors into shingle vectors. In our system, we represent each query Q as a single shingle of dimension M × 12. Then, we use the cosine measure to obtain a similarity value between X and all subsequences of Y of length M, defined as

s(X, Y_t^M) = \frac{\langle X \mid Y_t^M \rangle}{\lVert X \rVert \cdot \lVert Y_t^M \rVert}, \qquad (10.1)

where ||·|| denotes the Euclidean norm.

1 Chroma Energy Normalized Statistics features, provided by the Chroma Toolbox: www.mpi-inf.mpg.de/resources/MIR/chromatoolbox

In the third step, we then express the document-wise similarity of Q and D as

S(Q, D) = \max_{t \in [1:N-M+1]} s(X, Y_t^M). \qquad (10.2)

Given Q and a dataset D containing |D| documents, we compute S between Q and all D ∈ D and rank the result by descending S(Q, D). In practice, however, such an exhaustive search strategy is not needed to find the relevant documents. Instead, one tries to efficiently cut down the set of candidate subsequences using index-based strategies such as locality sensitive hashing (LSH) and computes S in Eq. (10.2) using only the retrieved shingles (setting s(X, Y_t^M) = 0 for non-retrieved shingles Y_t^M).
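To illustrate Eqs. (10.1) and (10.2), the following sketch computes the cosine measure between shingles and derives the document-wise similarity by maximizing over all subsequences. The exhaustive loop stands in for the index-based candidate selection; the toy data is illustrative.

```python
import numpy as np

def shingle_similarity(X, Y_sub):
    """Cosine measure between two shingles of identical shape, cf. Eq. (10.1)."""
    x, y = X.ravel(), Y_sub.ravel()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def document_similarity(X, Y):
    """S(Q, D): maximum shingle similarity over all subsequences of Y, cf. Eq. (10.2).

    X : query feature sequence of shape (M, 12)
    Y : document feature sequence of shape (N, 12), N >= M
    """
    M = len(X)
    return max(shingle_similarity(X, Y[t:t + M]) for t in range(len(Y) - M + 1))

# toy usage: rank three "documents" against a query taken from the second one
rng = np.random.default_rng(2)
docs = [rng.random((300, 12)) for _ in range(3)]
query = docs[1][120:140]
scores = [document_similarity(query, D) for D in docs]
print(int(np.argmax(scores)))   # -> 1
```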

Given the set DQ ⊂ D of documents that are relevant to the query Q, we follow [164] and express the retrieval accuracy in terms of the mean of average precision (MAP) measure as introduced in Section 9.2.4. Using several queries, we compute ψQ (see Eq. (9.3)) for each Q and average over all values to obtain the MAP value ⟨ψ⟩. Furthermore, we determine ⟨ψ⟩null expected under the null hypothesis of a randomly created sorted list as in [164].

10.2 Tempo-Invariant Matching Strategies

Typically, there are tempo differences in the versions considered in our retrieval scenario. As a result, a musical passage represented by a query can be realized in another version with significant temporal differences. In that case, our choice of representing each query as a single shingle would require a comparison of shingles representing feature sequences of differing length. One approach to this problem is to use similarity measures based on dynamic time warping (DTW) or Smith-Waterman [162]. However, with regard to computational efficiency and an application in the indexing context, such procedures are problematic. Instead, we employ the query scaling strategy as proposed in [105]. Here, tempo differences are handled by creating R scaled variants of the query Q^{(1)}, ..., Q^{(R)}, each simulating a global change in the tempo of the query. The similarity value between D and Q is then defined as

S(Q, D) = \max_{r \in [1:R]} S(Q^{(r)}, D). \qquad (10.3)

Furthermore, as a baseline strategy, we handle tempo differences between Q and D using an offline DTW-based procedure [49] that ensures that corresponding feature sequences coincide in all versions. This idealized procedure serves as a reference in our experiments, as it provides an optimal estimate of S(Q, D) even in the case of strong non-linear temporal distortions.
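The following sketch indicates how the query scaling strategy of Eq. (10.3) could be realized by resampling the query feature sequence along the time axis and maximizing the resulting similarity values. The interpolation-based resampling and the scale factors are illustrative simplifications of the procedure described in [105].

```python
import numpy as np

def scale_query(X, factor):
    """Resample a feature sequence X (shape M x 12) along the time axis,
    simulating a global tempo change (factor > 1 stretches the query)."""
    M = len(X)
    M_new = max(int(round(M * factor)), 2)
    new_pos = np.linspace(0, M - 1, M_new)
    return np.stack([np.interp(new_pos, np.arange(M), X[:, k])
                     for k in range(X.shape[1])], axis=1)

def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def scaled_similarity(X, Y, factors=(0.8, 1.0, 1.25)):
    """Maximum over scaled query variants of the best subsequence match, cf. Eq. (10.3)."""
    best = 0.0
    for f in factors:
        Q = scale_query(X, f)
        M = len(Q)
        for t in range(len(Y) - M + 1):
            best = max(best, cosine(Q.ravel(), Y[t:t + M].ravel()))
    return best

# toy usage: the document contains a stretched (slower) rendition of the query
rng = np.random.default_rng(3)
X = rng.random((24, 12))                  # query fragment
Y = rng.random((200, 12))
Y[60:90] = scale_query(X, 1.25)           # planted version, roughly 25% slower
print(round(scaled_similarity(X, Y), 2))  # -> 1.0 (found by the 1.25-scaled variant)
```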

10.3 Experiments

In this section, we describe our systematic experiments investigating the influence certain parameters have on the trade-off between efficiency and shingle characteristic. First, in Section 10.3.1, we introduce our dataset. Then, in Section 10.3.2, we report on a first experiment investigating how long a query Q needs to be to accurately characterize all versions

Part      Composer   Piece            Description  #     Dur. (min)
Chop      Chopin     Op. 17, No. 4    Mazurka      62    269
          Chopin     Op. 24, No. 2    Mazurka      64    147
          Chopin     Op. 30, No. 2    Mazurka      34    48
          Chopin     Op. 63, No. 3    Mazurka      88    189
          Chopin     Op. 68, No. 3    Mazurka      50    84
Beet      Beethoven  Op. 67, 1. Mov.  Fifth        10    75
          Beethoven  Op. 67, 2. Mov.  Fifth        10    98
          Beethoven  Op. 67, 3. Mov.  Fifth        10    52
          Beethoven  Op. 67, 4. Mov.  Fifth        10    105
Viva      Vivaldi    RV 315, 1. Mov.  Summer       7     38
          Vivaldi    RV 315, 2. Mov.  Summer       7     17
          Vivaldi    RV 315, 3. Mov.  Summer       7     20
DQueries                                           359   1145
D                                                  2484  9725

Table 10.1: The music collection used in our experiments. The last two columns denote the number of different performances and the duration in minutes.

of the underlying piece and what a suitable feature resolution is. In Section 10.3.3, we analyze how well tempo differences between the versions can be handled by the query scaling approach (avoiding computationally problematic warping procedures). In Section 10.3.4, we investigate whether the shingle dimension can be further reduced using principal component analysis (PCA). Finally, in Section 10.3.5, we analyze how cross-version retrieval can be accelerated by indexing the resulting shingles using locality sensitive hashing (LSH) and how much accuracy is lost in this step.

10.3.1 Dataset

In our experiments, we use a dataset D of 2484 audio recordings with an overall runtime of 162 hours, see Table 10.1. A subset (denoted DQueries) of 359 recordings is used for obtaining queries. This part of the dataset corresponds to the dataset introduced in Section 9.2.1. These recordings correspond to classical music pieces by three different composers. For each piece, there are 7 to 88 different recorded versions available. More precisely, the first part Chop consists of 298 piano recordings of five Mazurkas by Fr´ed´eric Chopin.2 The second part Beet consists of ten recorded performances of Beethoven's Symphony No. 5 in orchestral as well as piano interpretations. The third part Viva contains seven orchestral performances of the Summer from Vivaldi's Four Seasons. Additionally, we add 2125 recordings of various genres to enlarge the dataset. In our experiments, we randomly select 100 queries from each of the three parts of DQueries and average the results over the resulting 300 queries.

10.3.2 Evaluation of Query Length and Feature Resolution

In a first experiment, we investigate how much of a recording needs to be captured by the query Q to robustly characterize all versions of the underlying piece. Furthermore, we

2 This data is provided by the Mazurka Project: http://mazurka.org.uk/


Figure 10.1: MAP values as a function of query length |Q| using CENS(w,d) in different feature resolutions. Null hypothesis ⟨ψ⟩null = 0.015.

analyze to what extent the temporal resolution of the features can be reduced without negatively affecting the retrieval quality. Here, we exploit the downsampling and smoothing parameters d and w of the CENS(w,d) features. The goal is to reduce the overall dimensionality of the query while retaining as much of the retrieval accuracy as possible. For the moment, we use the DTW-based procedure to account for tempo differences between the versions. Figure 10.1 shows MAP values obtained using CENS(w,d) features with seven different query lengths |Q| and five different feature resolutions. Obviously, the longer |Q|, the higher the retrieval quality. For example, for |Q| = 28 sec, one obtains MAP values of ⟨ψ⟩ ≈ 0.99, regardless of the feature resolution. Short queries, however, cannot accurately capture the characteristics of a piece, leading to significantly lower MAP values. Reducing the feature resolution, one observes lower MAP values, too, in particular in combination with short queries. For example, using |Q| = 4 sec, one obtains ⟨ψ⟩ ≈ 0.94 for CENS(5, 1) (10 Hz resolution) and ⟨ψ⟩ ≈ 0.83 for CENS(81, 20) (0.5 Hz resolution). Increasing the query length, however, this effect vanishes. In particular, for |Q| ≥ 20 sec one obtains similar MAP values, independent of the feature resolution. Using d = 10 (1 Hz) as in CENS(41, 10) with |Q| = 20 sec constitutes a good trade-off between query dimensionality and query characteristic. This setting results in shingles with a dimensionality of 240.
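The following sketch illustrates the smoothing and downsampling idea behind the CENS(w,d) features used above, assuming a 10 Hz chroma sequence as input. The energy quantization step of the actual CENS features [123] is omitted, so the sketch only mimics the temporal behavior of the features.

```python
import numpy as np

def smooth_downsample(chroma, w=41, d=10):
    """Average chroma vectors over a window of length w and keep every d-th frame.

    chroma : array of shape (T, 12) at the basic feature rate (here 10 Hz)
    Returns a sequence at 10/d Hz whose vectors each summarize roughly w/10
    seconds of audio, mimicking the temporal behavior of CENS(w, d) features
    (without the quantization step of the original definition).
    """
    kernel = np.ones(w) / w
    smoothed = np.stack([np.convolve(chroma[:, k], kernel, mode='same')
                         for k in range(12)], axis=1)
    smoothed = smoothed[::d]
    # normalize each smoothed vector to unit Euclidean norm
    norms = np.linalg.norm(smoothed, axis=1, keepdims=True)
    return smoothed / np.maximum(norms, 1e-9)

# toy usage: 30 s of 10 Hz chroma -> 30 vectors at 1 Hz; a 20 s query spans 240 values
rng = np.random.default_rng(4)
chroma_10hz = rng.random((300, 12))
cens_like = smooth_downsample(chroma_10hz, w=41, d=10)
print(cens_like.shape, cens_like[:20].size)    # -> (30, 12) 240
```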

10.3.3 Evaluation of Matching Strategies

In this experiment, we investigate how much retrieval accuracy is lost when using the query scaling approach for handling tempo differences instead of the idealized DTW-based technique. Figure 10.2a shows the retrieval quality using CENS(41, 10) for different query scaling settings. Here, we use R variants of the query with scaling factors specified by the set T. R = 1 means that only the original query is used. Furthermore, we use R = 3 with T = {0.8, 1, 1.25}, meaning that the query is also stretched by factors of 0.8 and 1.25 (thus simulating tempo changes of roughly ±25%). Similarly, we use R = 5 with T = {0.66, 0.8, 1, 1.25, 1.5} and R = 9 with T = {0.66, 0.73, 0.8, 0.9, 1, 1.1, 1.25, 1.35, 1.5}. The red line indicates the DTW-based result as shown in Figure 10.1. From these results, we draw two conclusions. Firstly, the scaling strategy (R > 1) significantly increases the retrieval quality in comparison to only using the original query (R = 1). The actual

Figure 10.2: MAP values obtained for four query scaling strategies and the DTW-based strategy using (a) CENS(41, 10) and (b) CENS(5, 1).

Figure 10.3: MAP values as a function of feature dimension obtained by PCA-based dimension reduction of CENS(w,d). (a) |Q| = 20 sec. (b) |Q| = 10 sec.

The actual choice of parameters does not seem to be crucial. In the case of our dataset, already a small number of additional queries (R = 3) seems to be sufficient. Secondly, the scaling strategy leads to results very similar to those of the computationally expensive DTW-based strategy, in particular when using a large smoothing window (e.g., w = 41 in CENS(41, 10)). In the case of the smaller smoothing window w = 5 in CENS(5, 1) (see Figure 10.2b), the difference is more significant. In summary, local feature smoothing in combination with a global scaling strategy yields a robust yet computationally simple alternative to warping procedures.
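The following sketch indicates one plausible way to realize the query scaling strategy with R = 5 and T = {0.66, 0.8, 1, 1.25, 1.5}: each scaled variant covers a correspondingly longer or shorter portion of the query features and is resampled to the fixed shingle length, so that every variant can be issued as a separate query. The exact construction used in our experiments may differ in detail; the helper names are illustrative.

```python
import numpy as np

def resample_frames(seq, num_frames):
    """Linearly resample a 12 x N feature sequence to num_frames frames."""
    n = seq.shape[1]
    src = np.linspace(0.0, n - 1.0, num_frames)
    left = np.floor(src).astype(int)
    right = np.minimum(left + 1, n - 1)
    frac = src - left
    return seq[:, left] * (1.0 - frac) + seq[:, right] * frac

def scaled_query_shingles(query_feats, num_frames,
                          scales=(0.66, 0.8, 1.0, 1.25, 1.5)):
    """Build R query shingles that simulate tempo differences: for a scale
    factor s, take roughly s * num_frames frames of the query features and
    resample them to the fixed shingle length (one lookup per variant).
    Assumes query_feats provides at least max(scales) * num_frames frames."""
    shingles = []
    for s in scales:
        take = min(query_feats.shape[1], max(2, int(round(num_frames * s))))
        shingles.append(resample_frames(query_feats[:, :take],
                                        num_frames).reshape(-1))
    return np.stack(shingles)          # shape: (R, 12 * num_frames)
```

Taking the minimum distance (or the union of index hits) over all R variants then covers tempo differences of up to roughly 50% without performing any warping during the search.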

10.3.4 Evaluation of Dimensionality Reduction

In a third experiment, we investigate to what extent statistical data reduction based on Principal Component Analysis (PCA) can be applied to CENS features to further reduce the dimensionality of the query. PCA estimates the principal components, i.e., the directions of maximum variance in the feature space, and facilitates a projection of the original data points onto a new coordinate system spanned by a subset of these principal components. Ordering the components by decreasing variance guarantees an optimal representation of the data in a feature space of reduced dimensionality. We estimate the principal components using all non-query documents of our dataset and project all feature sequences onto the most dominant components. Figure 10.3 shows MAP values obtained for PCA-reduced variants of CENS features with 1 to 12 remaining dimensions. For a query length |Q| = 20 sec (Figure 10.3a), MAP values are nearly unaffected when the number of dimensions is reduced from 12 to 4, in particular for higher feature resolutions. However, in combination with shorter queries of |Q| = 10 sec (Figure 10.3b), the retrieval quality is more affected by the dimensionality reduction. In the following, we use the first six components of CENS(41, 10) features, denoted as CENS(41, 10)-6.3 Using |Q| = 20 sec, this results in 120-dimensional shingles, which constitutes a reasonable trade-off between shingle dimensionality and shingle characteristics.
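The following sketch shows such a PCA-based reduction of 12-dimensional CENS-like frames to six dimensions, as used for CENS(41, 10)-6. It is a minimal illustration based on an eigendecomposition of the covariance matrix, not the code used in our experiments, and the stand-in data is random.

```python
import numpy as np

def fit_pca(features, num_components=6):
    """Estimate a PCA projection from a (num_frames x 12) matrix of
    CENS-like feature vectors pooled over all non-query documents."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # decreasing variance
    components = eigvecs[:, order[:num_components]]
    return mean, components

def project(features, mean, components):
    """Project feature vectors onto the retained principal components."""
    return (features - mean) @ components

# Example: reduce 12-D frames to 6 dimensions, then build a 20-frame shingle.
database_frames = np.random.rand(5000, 12)     # stand-in for pooled features
mean, comps = fit_pca(database_frames, num_components=6)
query_frames = np.random.rand(20, 12)          # |Q| = 20 sec at 1 Hz
shingle = project(query_frames, mean, comps).reshape(-1)   # 120 dimensions
```

Projecting the 20 frames of a |Q| = 20 sec query onto the first six components yields the 120-dimensional shingles used in the following.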

Figure 10.4: Illustration of the MAP values and runtimes obtained using CENS(41, 10)-6 with different parameter settings in the LSH-based retrieval experiment. (a) Retrieval quality MAP. (b) Overall runtime per query including index lookup time and document ranking time. Curves correspond to L ∈ {1, 2, 5, 10, 20, 30, 50}. Horizontal black lines indicate values obtained by the exhaustive search.

10.3.5 Index-Based Retrieval by Locality Sensitive Hashing

We now investigate whether it is possible to index shingles of this size using locality sensitive hashing (LSH) for accelerating the retrieval. LSH is a hash-based approach for finding approximate nearest neighbors based on the principle that similar shingles are indexed with the same hash value. In our experiment, we use an implementation of the Exact Euclidean LSH (E2LSH) algorithm [28]. We index all shingles of the entire dataset D using L parallel indices and K hash functions. For a query shingle Q, we retrieve all shingles from the index with the same hash value as the query. Given this (typically small) set of candidate shingles, we derive the ranked list of documents and compute MAP values as described in Section 10.1.
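The following sketch outlines the table structure underlying such an LSH index. It mimics the p-stable hashing scheme of E2LSH (L tables, each keyed by K quantized random projections with quantization parameter r), but it is only an illustration and not the E2LSH implementation [28] used in our experiments.

```python
import numpy as np
from collections import defaultdict

class ShingleLSH:
    """Minimal LSH index in the spirit of E2LSH: L hash tables, each keyed by
    K quantized random projections (p-stable scheme). Illustrative sketch."""

    def __init__(self, dim, K=12, L=5, r=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.r = r
        self.a = rng.standard_normal((L, K, dim))   # projection directions
        self.b = rng.uniform(0.0, r, size=(L, K))   # random offsets
        self.tables = [defaultdict(list) for _ in range(L)]

    def _keys(self, shingle):
        # One K-tuple of quantized projections per table.
        proj = np.floor((self.a @ shingle + self.b) / self.r).astype(int)
        return [tuple(row) for row in proj]

    def index(self, shingle, doc_id):
        for table, key in zip(self.tables, self._keys(shingle)):
            table[key].append(doc_id)

    def query(self, shingle):
        """Candidate document IDs sharing a hash value with the query shingle."""
        candidates = set()
        for table, key in zip(self.tables, self._keys(shingle)):
            candidates.update(table.get(key, ()))
        return candidates
```

Increasing K makes each key more selective (fewer candidates, lower recall), whereas increasing L adds independent tables and thus recovers recall at the cost of memory and lookup time; this is exactly the trade-off studied in the following.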

3 Further experiments revealed that CENS(41, 10)-6 is very similar to the musically motivated 6-dimensional tonal centroid proposed in [84]. This is also related to computing a Fourier transform of chroma features as proposed in [175].
4 Obtained on a Xeon X5560 CPU with 72 GB of RAM.

Figure 10.5: Comparison of two LSH-based retrieval strategies. Warping strategy (solid line) and query scaling strategy (R = 5, T = {0.66, 0.8, 1, 1.25, 1.5}) (dashed line) using CENS(41, 10)-6. (a) MAP values. (b) Overall runtime per query.

Figure 10.4 shows MAP values (Figure 10.4a) and the runtime per query in milliseconds4 (Figure 10.4b) as a function of K for different L.5 These are crucial parameters having a tremendous influence on the trade-off between retrieval quality and runtime. For example, setting K = 12 and L = 5 results in a MAP of ⟨ψ⟩ ≈ 0.90, see the black square in Figure 10.4a. This is only slightly lower than the MAP value one obtains for the exhaustive search (horizontal black line). However, the runtime for this setting is significantly faster (by a factor of 25) than for the exhaustive search, see the black square in Figure 10.4b. K and L allow for controlling the trade-off between speed and quality of the results. Setting K = 25 and L = 10, the MAP drops to ⟨ψ⟩ ≈ 0.80 (black circle). However, this goes along with a decrease of the query runtime to 5 ms, a speed-up of 100 in comparison to the exhaustive search. The results shown in Figure 10.4 are again obtained using the idealized DTW-based procedure for handling tempo differences. Figure 10.5 now shows the comparison of the warping strategy (solid line) with the query scaling approach (dashed line) for L = 5 and L = 30. Similar to the exhaustive search discussed in Section 10.3.3, using R = 5, one observes only a small drop in retrieval quality (see Figure 10.5a). Using this strategy, the runtime per query increases linearly with the number of scaled queries R (see Figure 10.5b).

10.4 Conclusion

Concluding the experiments, one can say that even when using large shingles (covering roughly 20 seconds of audio material), LSH-based indexing techniques can be applied for obtaining a significant speed-up of the retrieval process (up to a factor of 100). At the same time, most of the accuracy of an exhaustive search can be retained. To facilitate this, we determined suitable parameter settings with regard to query length, feature resolution and smoothing, as well as shingle dimension. The advantage of using shingles that represent a large audio fragment is that most versions of a given piece can be characterized and retrieved by using a single shingle. A combination of local feature smoothing and global query scaling is used to avoid any kind of complex warping operation. In future work, this can be exploited to significantly reduce the number of hash-table lookups needed for performing cross-version retrieval. The number of lookups becomes a crucial bottleneck when the index structure is stored on secondary storage devices, which is unavoidable when dealing with collections of millions of songs. Furthermore, using different hash functions may lead to improvements in retrieval quality and runtime [140]. In particular, spherical hash functions as proposed in [168] may be well suited for the characteristics of chroma shingles.

5 The quantization parameter denoted r in [28] is found as proposed in [18].

Chapter 11

Conclusion of the Thesis

In this thesis, we discussed various applications of signal processing methods to music. In particular, we focused on three central tasks in the field of music information retrieval: beat tracking and tempo estimation, music segmentation, and content-based retrieval. For all three tasks, we exploited musical knowledge about the signals' properties to derive feature representations that show a significant degree of robustness against musical variations but still exhibit a high musical expressiveness.

In Part I of the thesis, we dealt with the extraction of local tempo and beat information. In contrast to previous approaches that assume constant tempo throughout a recording, our analysis particularly focused on music with temporal variations. As one major contribution, we introduced a novel concept for deriving musically meaningful local pulse information from possibly noisy onset information. Exploiting the local quasi-periodicity of music signals, the main benefit of our PLP mid-level representation is that it can locally adjust to changes in tempo. Even for classical music with soft note onsets, we were able to extract meaningful local tempo and beat information. However, for highly expressive interpretations of romantic music, the assumption of local quasi-periodicity is often violated, leading to poor results. In such cases, our PLP concept at least yields a confidence measure to reveal problematic passages.

The understanding of physical and musical properties that make beat tracking difficult is of essential importance for improving the performance of automated approaches. As a second contribution of Part I, we introduced a novel evaluation framework for beat tracking algorithms where multiple performances of a piece of music are considered simultaneously. This approach yields a better understanding not only of the algorithms' behavior but also of the underlying music material. As a third contribution of Part I, we introduced various tempogram representations that reveal local tempo and rhythm information while being robust to extraction errors. Furthermore, exploiting musical knowledge about the different pulse levels, we introduced a class of robust mid-level features that reveal local tempo information while being invariant to pulse level confusions. Being the tempo-based counterpart of the harmony-based chromagrams, the cyclic tempograms are suitable for music analysis and retrieval tasks where harmonic or timbral properties are not relevant.


In Part II of this thesis, we introduced various signal processing methods with the goal of making folk song field recordings more easily accessible for research and analysis purposes. As folk songs are part of an oral culture, it seems plausible that by looking at the original audio recordings one may derive new insights that cannot be derived by simply looking at the transcribed melodies. As one main contribution of Part II, we presented two procedures for segmenting a given folk song recording into its individual stanzas by exploiting knowledge about the strophic structure of folk songs. In particular, we introduced a combination of various enhancement strategies to account for the intonation fluctuations, temporal variations, and poor recording conditions. Our experiments indicate that robust segmentation results can be obtained even in the presence of strong temporal and spectral variations without relying on any reference transcription. Limitations of our segmentation procedures remain in the case of structural differences across the stanzas of a song. Increasing the robustness of the segmentation procedure to handle such variations remains a challenging future problem.

Furthermore, we presented a multimodal approach for extracting performance parameters by comparing the audio material with a suitable reference transcription. As the main contribution, we introduced the concept of chroma templates that reveal the consistent and inconsistent melodic aspects across the various stanzas of a given recording. In computing these templates, we used tuning and time warping strategies to deal with local variations in melody, tuning, and tempo.

The segmentation and analysis techniques introduced in Part II of the thesis constitute only a first step towards making field recordings more accessible to performance analysis and folk song research. Only by using automated methods can one deal with vast amounts of audio material, which would be infeasible otherwise. Our techniques can be considered a kind of preprocessing to automatically screen a large number of field recordings with the goal of detecting and locating interesting and surprising passages worth being examined in more detail by domain experts. This may open up new challenging and interdisciplinary research directions not only for folk song research but also for music information retrieval and music cognition [178].

In Part III of this thesis, we discussed various content-based retrieval strategies based on the query-by-example paradigm. Such strategies can be loosely classified according to their specificity, which refers to the considered degree of similarity between the query and the database documents. As one contribution of Part III, a second classification principle based on granularity was introduced. The resulting specificity/granularity scheme gives a compact overview of the various retrieval paradigms while illustrating their subtle but crucial differences. The specificity has a significant impact on the efficiency of the retrieval system. Search tasks of high specificity can typically be realized efficiently using indexing techniques. In contrast, search tasks of low specificity need more flexible and cost-intensive mechanisms for dealing with musical variations.

As a further contribution of Part III, we presented two investigations with the goal of scaling low-specificity cross-version retrieval to large datasets. Firstly, we studied the robustness and discriminative power of modified audio fingerprints.
As our experiments reveal, modified peak fingerprints based on musically motivated time-pitch or time-chroma representations can handle a high degree of spectral variations and allow for an identification of different performances of the same piece of music. However, the question of how the temporal variations between performances can be accounted for in this approach is still open and should be subject to future research. Our second approach to efficient cross-version retrieval is based on audio shingling, where each query is represented by a single shingle that covers a long segment of the audio recording. In this approach, a combination of strategies is used to derive compact yet highly characteristic and robust audio shingles. Robustness to spectral variations is obtained using suitable variants of chroma features, whereas temporal variations are handled by a combination of local feature smoothing and global query scaling strategies. Using the resulting low-dimensional shingles, LSH-based indexing techniques can be applied for significantly speeding up the retrieval process.

Aside from efficiency and scalability issues, another major challenge in content-based music retrieval refers to cross-modal retrieval scenarios, where the query as well as the retrieved documents can be of different modalities. For example, one might use a small fragment of a musical score to query an audio database for recordings that are related to this fragment. Or a short audio fragment might be used to query a database containing MIDI files. In the future, comprehensive retrieval frameworks need to be developed that offer multi-faceted search functionalities in heterogeneous and distributed music collections containing all sorts of music-related documents.

All feature representations presented in this thesis show a significant degree of robustness against musical variations while still exhibiting a high musical expressiveness. The increased robustness is achieved by exploiting model assumptions about the analyzed music signals. These model assumptions, however, go along with a reduced generalizability. For example, in our PLP concept, we assume local quasi-periodicity, which allows us to obtain meaningful results even in the presence of weak note onsets and continuous tempo changes. In the case of local tempo distortions as found in the Chopin Mazurkas, however, this assumption is violated and the limits of our approach are reached. For such kinds of signals, a different approach (e.g., based on an explicit determination of note onset positions and an evaluation of inter-onset intervals) may lead to better results [40]. Similarly, in our folk song analysis, we assume a strophic form and obtain robust segmentation results even in the presence of significant spectral and temporal variations. The limits of this repetition-based approach, however, are reached when structural variations within the stanzas occur, i.e., when the assumption of a strophic form is violated. In the case of such variations, novelty-based approaches for detecting segment boundaries may be less fragile [143].

As these examples show, one grand challenge for music signal processing is related to the question of how the developed techniques and methods can be made more general and applicable to a wide range of music signals. Over the years of MIR research, solutions have been presented that can cope with isolated facets of music signals in restricted and well-defined scenarios.
In future research, more effort needs to be put into developing approaches that are capable of dealing with and adapting to arbitrary music signals to better reflect the various facets of music.

Appendix A

Tempogram Toolbox

The tempo- and pulse-related audio features described in Part I of this thesis are released as MATLAB implementations in a Tempogram Toolbox, which is provided under a GNU-GPL license at www.mpi-inf.mpg.de/resources/MIR/tempogramtoolbox. The functionality provided by our Tempogram Toolbox is illustrated in Figure A.1, where an audio recording of Claude Debussy's Sonata for Violin and Piano in G minor (L 140) serves as an example. The audio recording is available from the Saarland Music Data (SMD), http://www.mpi-inf.mpg.de/resources/SMD/. Analyzing this recording with respect to tempo is challenging, as it contains weak note onsets played by a violin as well as a number of prominent tempo changes. Given an audio recording (Figure A.1a), we first derive a novelty curve as described in Section 2.3 (Figure A.1b). Given such a (possibly very noisy) onset representation, the toolbox allows for deriving a predominant local pulse (PLP) curve as introduced in Section 2.6 (Figure A.1c). As a second main part, our Tempogram Toolbox provides various tempogram representations as introduced in Chapter 4 that reveal local tempo characteristics even for expressive music exhibiting tempo changes. To obtain such a representation, the novelty curve is analyzed with respect to local periodic patterns. Here, the toolbox provides Fourier-based methods (Figure A.1d,f) as well as autocorrelation-based methods (Figure A.1e,g), see Section 4.1. For both concepts, representations on a time/tempo axis (Figure A.1d,e) as well as on a time/time-lag axis (Figure A.1f,g) are available. Furthermore, resampling and interpolation functions allow for switching between tempo and time-lag axes as desired. The third main part of our toolbox provides functionality for deriving cyclic tempograms from the tempogram representations as introduced in Section 4.2. The cyclic tempo features constitute a robust mid-level representation revealing local tempo characteristics of music signals while being invariant to changes in the pulse level (Figure A.1h,i). Being the tempo-based counterpart of the chromagrams, cyclic tempograms are suitable for music analysis and retrieval tasks. Finally, the Tempogram Toolbox contains a variety of functions for the visualization and sonification of extracted tempo and pulse information.
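To make the tempogram computation concrete, the following Python sketch computes a Fourier-based tempogram from a novelty curve by correlating it with windowed complex exponentials whose frequencies correspond to tempo values in BPM. It is a generic re-implementation of the idea described above and does not reproduce the interface of the MATLAB toolbox; window length, hop size, and tempo range are illustrative choices.

```python
import numpy as np

def fourier_tempogram(novelty, feature_rate, tempi=np.arange(30, 601),
                      window_sec=8.0, hop_sec=0.5):
    """Fourier-based tempogram of a novelty curve: for each analysis window
    and each tempo value (in BPM), measure how strongly a pulse of that
    tempo is locally present in the novelty curve."""
    win_len = int(round(window_sec * feature_rate))
    hop = max(1, int(round(hop_sec * feature_rate)))
    novelty = np.pad(np.asarray(novelty, dtype=float),
                     (0, max(0, win_len - len(novelty))))
    window = np.hanning(win_len)
    n = np.arange(win_len)
    freqs = tempi / 60.0                       # BPM -> Hz
    # One windowed complex exponential per tempo value.
    kernels = np.exp(-2j * np.pi * np.outer(freqs, n) / feature_rate) * window
    starts = range(0, len(novelty) - win_len + 1, hop)
    tg = np.array([np.abs(kernels @ novelty[s:s + win_len]) for s in starts])
    return tg.T                                # shape: (len(tempi), num_frames)
```

An autocorrelation-based tempogram can in principle be obtained analogously by replacing the complex exponentials with windowed local autocorrelation, and a cyclic tempogram by aggregating tempogram values over tempo octaves.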


Figure A.1: Illustration of the functionality of the Tempogram Toolbox using an excerpt (second movement, bars 79–107) of an audio recording of Claude Debussy's Sonata for Violin and Piano in G minor (L 140). (a) Waveform of the excerpt. (b) Novelty curve extracted from the audio recording indicating note onset candidates. (c) PLP curve revealing the predominant local pulse. (d) Fourier-based tempogram. (e) Autocorrelation-based tempogram. (f) Fourier-based tempogram with time-lag axis. (g) Autocorrelation-based tempogram with time-lag axis. (h) Fourier-based cyclic tempogram. (i) Autocorrelation-based cyclic tempogram.

Bibliography

[1] Eric Allamanche, J¨urgen Herre, Oliver Hellmuth, Bernhard Fr¨oba, and Markus Cremer. AudioID: Towards content-based identification of audio material. In Proc. 110th AES Con- vention, Amsterdam, NL, 2001. [2] Miguel Alonso, Bertrand David, and Ga¨el Richard. Tempo and beat estimation of musi- cal signals. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. [3] Miguel Alonso, Ga¨el Richard, and Bertrand David. Accurate tempo estimation based on har- monic+noise decomposition. EURASIP Journal on Advances in Signal Processing, 2007:Ar- ticle ID 82795, 14 pages, 2007. [4] Mark A. Bartsch and Gregory H. Wakefield. Audio thumbnailing of popular music using chroma-based representations. IEEE Transactions on Multimedia, 7(1):96–104, 2005. [5] Juan Pablo Bello. Audio-based cover song retrieval using approximate chord sequences: testing shifts, gaps, swaps and beats. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 239–244, Vienna, Austria, 2007. [6] Juan Pablo Bello. Measuring structural similarity in music. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2013–2025, 2011. [7] Juan Pablo Bello, Laurent Daudet, Samer Abdallah, Chris Duxbury, Mike Davies, and Mark B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5):1035–1047, 2005. [8] Juan Pablo Bello, Chris Duxbury, Mike Davies, and Mark Sandler. On the use of phase and energy for musical onset detection in the complex domain. IEEE Signal Processing Letters, 11(6):553–556, 2004. [9] Thierry Bertin-Mahieux, Douglas Eck, Francois Maillet, and Paul Lamere. Autotagger: A model for predicting social tags from acoustic features on large music databases. Journal of New Music Research, 37(2):115–135, 2008. [10] Thierry Bertin-Mahieux, Douglas Eck, and Michael I. Mandel. Automatic tagging of audio: The state-of-the-art. In Wenwu Wang, editor, Machine Audition: Principles, Algorithms and Systems, chapter 14, pages 334–352. IGI Publishing, 2010. [11] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 591–596, Miami, USA, 2011. [12] Thierry Bertin-Mahieux and Daniel P.W. Ellis. Large-scale cover song recognition using hashed chroma landmarks. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 117–120, New Platz, NY, 2011.


[13] Jeff A. Bilmes. Techniques to foster drum machine expressivity. In International Computer Music Conference, Tokyo, Japan, 1993. [14] Dmitry Bogdanov, Joan Serr`a, Nicolas Wack, and Perfecto Herrera. Unifying low-level and high-level music similarity measures. IEEE Transactions on Multimedia, 13(4):687 –701, 2011. [15] Pedro Cano, Eloi Batlle, Ton Kalker, and Jaap Haitsma. A review of algorithms for audio fingerprinting. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 169–173, St. Thomas, Virgin Islands, USA, 2002. [16] Pedro Cano, Eloi Batlle, Ton Kalker, and Jaap Haitsma. A review of audio fingerprinting. Journal of VLSI Signal Processing Systems, 41(3):271–284, 2005. [17] Pedro Cano, Eloi Batlle, Harald Mayer, and Helmut Neuschmied. Robust sound modeling for song detection in broadcast audio. In Proceedings of the 112th AES Convention, pages 1–7, 2002. [18] Michael A. Casey, Christophe Rhodes, and Malcolm Slaney. Analysis of minimum distances in high-dimensional musical spaces. IEEE Transactions on Audio, Speech & Language Pro- cessing, 16(5), 2008. [19] Michael A. Casey, Remco Veltkap, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney. Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE, 96(4):668–696, 2008. [20] Oscar` Celma. Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer, 1st edition, 2010. [21] Ali Taylan Cemgil, Bert Kappen, Peter Desain, and Henkjan Honing. On tempo tracking: Tempogram representation and kalman filtering. Journal of New Music Research, 28(4):259– 273, 2001. [22] Wei Chai and Barry Vercoe. Music thumbnailing via structural analysis. In Proceedings of the ACM International Conference on Multimedia, pages 223–226, Berkeley, CA, USA, 2003. [23] Nick Collins. A comparison of sound onset detection algorithms with emphasis on psychoa- coustically motivated detection functions. In AES Convention 118, Barcelona, Spain, 2005. [24] Nick Collins. Using a pitch detector for onset detection. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 100–106, London, UK, 2005. [25] Olmo Cornelis, Micheline Lesaffre, Dirk Moelants, and Marc Leman. Access to ethnic music: advances and perspectives in content-based music information retrieval. Signal Processing, In Press, 2009. [26] David Damm, Christian Fremerey, Frank Kurth, Meinard M¨uller, and Michael Clausen. Multimodal presentation and browsing of music. In Proceedings of the 10th International Conference on Multimodal Interfaces (ICMI), pages 205–208, Chania, Crete, Greece, 2008. [27] Roger B. Dannenberg. Toward automated holistic beat tracking, music analysis and under- standing. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 366–373, London, UK, 2005. [28] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry, pages 253–262, Brooklyn, New York, USA, 2004. [29] Matthew E.P. Davies, Norberto Degara, and Mark D. Plumbley. Evaluation methods for musical audio beat tracking algorithms. Technical Report C4DM-TR-09-06, Queen Mary University, Centre for Digital Music, 2009. BIBLIOGRAPHY 151

[30] Matthew E.P. Davies, Norberto Degara, and Mark D. Plumbley. Measuring the performance of beat tracking algorithms using a beat error histogram. IEEE Signal Processing Letters, 18(3):157–160, 2011. [31] Matthew E.P. Davies and Mark D. Plumbley. Context-dependent beat tracking of musical audio. IEEE Transactions on Audio, Speech and Language Processing, 15(3):1009–1020, 2007. [32] Alain de Cheveign´eand Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America (JASA), 111(4):1917–1930, 2002. [33] Noberto Degara, Matthew E.P. Davies, Antonio Pena, and Mark D. Plumbley. Onset event decoding exploiting the rhythmic structure of polyphonic music. IEEE Journal of Selected Topics in Signal Processing, 5(6):1228–1239, 2011. [34] Norberto Degara, Antonio Pena, Matthew E.P. Davies, and Mark D. Plumbley. Note onset detection using rhythmic structure. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5526–5529, Dallas, TX, USA, 2010. [35] Norberto Degara, Antonio Pena, and Soledad Torres-Guijarro. A comparison of score-level fusion rules for onset detection in music signals. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 117–122, Kobe, Japan, 2009. [36] Norberto Degara, Enrique Argones R´ua, Antonio Pena, Soledad Torres-Guijarro, Matthew E. P. Davies, and Mark D. Plumbley. Reliability-informed beat tracking of musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):290–301, 2012. [37] Simon Dixon. Automatic extraction of tempo and beat from expressive performances. Jour- nal of New Music Research, 30:39–58, 2001. [38] Simon Dixon. An empirical comparison of tempo trackers. In Proceedings of the Brazilian Symposium on Computer Music (SBCM), pages 832–840, Fortaleza, CE, 2001. [39] Simon Dixon. Onset detection revisited. In Proceedings of the International Conference on Digital Audio Effects (DAFx), pages 133–137, Montreal, Quebec, Canada, 18–20 2006. [40] Simon Dixon. Evaluation of the audio beat tracking system beatroot. Journal of New Music Research, 36:39–50, 2007. [41] Simon Dixon and Werner Goebl. Pinpointing the beat: Tapping to expressive performances. In Proceedings of the International Conference on Music Perception and Cognition (ICMPC), pages 617–620, Sydney, Australia, 2002. [42] Simon Dixon, Werner Goebl, and Emilios Cambouropoulos. Perceptual smoothness of tempo in expressively performed music. Music Perception, 23(3):195–214, 2006. [43] J. Stephen Downie. The music information retrieval evaluation exchange (2005–2007): A win- dow into music information retrieval research. Acoustical Science and Technology, 29(4):247– 255, 2008. [44] Daniel P.W. Ellis. Beat tracking by dynamic programming. Journal of New Music Research, 36(1):51–60, 2007. [45] Daniel P.W. Ellis, Courtenay V. Cotton, and Michael I. Mandel. Cross-correlation of beat- synchronous representations for music similarity. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 57–60, Taipei, Taiwan, 2008. 152 BIBLIOGRAPHY

[46] Daniel P.W. Ellis and Graham E. Poliner. Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, Honolulu, Hawaii, USA, 2007. [47] Antti J. Eronen and Anssi P. Klapuri. Music tempo estimation with k-NN regression. IEEE Transactions on Audio, Speech, and Language Processing, 18(1):50–57, 2010. [48] Sebastian Ewert and Meinard M¨uller. Using score-informed constraints for NMF-based source separation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 129–132, Kyoto, Japan, 2012. [49] Sebastian Ewert, Meinard M¨uller, and Peter Grosche. High resolution audio synchronization using chroma onset features. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1869–1872, Taipei, Taiwan, 2009. [50] Florian Eyben, Sebastian B¨ock, Bj¨orn Schuller, and Alex Graves. Universal onset detection with bidirectional long short-term memory neural networks. In Proceedings of the Interna- tional Society for Music Information Retrieval Conference ISMIR, pages 589–594, Utrecht, The Netherlands, 2010. [51] Hugo Fastl and Eberhard Zwicker. Psychoacoustics, Facts and Models. Springer, 2007 (3rd Edition). [52] S´ebastien Fenet, Ga¨el Richard, and Yves Grenier. A scalable audio fingerprint method with robustness to pitch-shifting. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 121–126, Miami, FL, USA, 2011. [53] Jonathan Foote. Visualizing music and audio using self-similarity. In Proceedings of the ACM International Conference on Multimedia, pages 77–80, Orlando, FL, USA, 1999. [54] Jonathan Foote and Shingo Uchihashi. The beat spectrum: A new approach to rhythm analysis. In Proceedings of the International Conference on Multimedia and Expo (ICME), Los Alamitos, CA, USA, 2001. [55] R´emi Foucard, Jean-Louis Durrieu, Mathieu Lagrange, , and Ga¨el Richard. Multimodal similarity between musical streams for cover version detection. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010. [56] Dimitrios Fragoulis, George Rousopoulos, Thanasis Panagopoulos, Constantin Alexiou, and Constantin Papaodysseus. On the automated recognition of seriously distorted musical recordings. IEEE Transactions on Signal Processing, 49(4):898–908, 2001. [57] Christian Fremerey, Frank Kurth, Meinard M¨uller, and Michael Clausen. A demonstration of the SyncPlayer system. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pages 131–132, Vienna, Austria, 2007. [58] Takuya Fujishima. Realtime chord recognition of musical sound: A system using common lisp music. In Proceedings of the International Computer Music Conference (ICMC), pages 464–467, Beijing, 1999. [59] Emilia G´omez. Tonal Description of Music Audio Signals. PhD thesis, UPF Barcelona, 2006. [60] Emilia G´omez, Bee Suan Ong, and Perfecto Herrera. Automatic tonal analysis from mu- sic summaries for version identification. In Proceedings of the 121st AES Convention, San Francisco, CA, USA, 2006. [61] Masataka Goto. An audio-based real-time beat tracking system for music with or without drum-sounds. Journal of New Music Research, 30(2):159–171, 2001. BIBLIOGRAPHY 153

[62] Masataka Goto. A chorus-section detecting method for musical audio signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 437–440, Hong Kong, China, 2003. [63] Masataka Goto. A chorus section detection method for musical audio signals and its ap- plication to a music listening station. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1783–1794, 2006. [64] Masataka Goto, Hiroki Hashiguchi, Takuichi Nishimura, and Ryuichi Oka. RWC music database: Popular, classical and jazz music databases. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Paris, France, 2002. [65] Fabien Gouyon. A computational approach to rhythm description: audio features for the computation of rhythm periodicity functions and their use in tempo induction and music content processing. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2005. [66] Fabien Gouyon and Simon Dixon. A review of automatic rhythm description systems. Com- puter Music Journal, 29:34–54, 2005. [67] Fabien Gouyon, Simon Dixon, and Gerhard Widmer. Evaluating low-level features for beat classification and tracking. In Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii, 2007. [68] Fabien Gouyon and Perfecto Herrera. Pulse-dependent analysis of percussive music. In Pro- ceedings of the AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, 2002. [69] Fabien Gouyon, Anssi P. Klapuri, Simon Dixon, Miguel Alonso, George Tzanetakis, Christian Uhle, and Pedro Cano. An experimental comparison of audio tempo induction algorithms. IEEE Transactions on Audio, Speech, and Language Processing, 14(5):1832–1844, 2006. [70] Louis Peter Grijp and Herman Roodenburg. Blues en Balladen. Alan Lomax en Ate Doorn- bosch, twee muzikale veldwerkers. AUP, Amsterdam, 2005. [71] Peter Grosche and Meinard M¨uller. Computing predominant local periodicity information in music recordings. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 33–36, New Paltz, NY, USA, 2009. [72] Peter Grosche and Meinard M¨uller. A mid-level representation for capturing dominant tempo and pulse information in music recordings. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), pages 189–194, Kobe, Japan, 2009. [73] Peter Grosche and Meinard M¨uller. Extracting predominant local pulse information from mu- sic recordings. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1688– 1701, 2011. [74] Peter Grosche and Meinard M¨uller. Tempogram toolbox: Matlab implementations for tempo and pulse analysis of music recordings. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), Miami, FL, USA, 2011, late-breaking contribution. [75] Peter Grosche and Meinard M¨uller. Toward characteristic audio shingles for efficient cross- version music retrieval. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 473–476, Kyoto, Japan, 2012. [76] Peter Grosche and Meinard M¨uller. Toward musically-motivated audio fingerprints. In Pro- ceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 93–96, Kyoto, Japan, 2012. [77] Peter Grosche, Meinard M¨uller, and Frank Kurth. 
Cyclic tempogram – a mid-level tempo representation for music signals. In Proceedings of IEEE International Conference on Acous- tics, Speech, and Signal Processing (ICASSP), pages 5522–5525, Dallas, Texas, USA, 2010. 154 BIBLIOGRAPHY

[78] Peter Grosche, Meinard M¨uller, and Frank Kurth. Tempobasierte Segmentierung von Musikaufnahmen. In Proceedings of the 36th Deutsche Jahrestagung f¨ur Akustik (DAGA), Berlin, Germany, 2010. [79] Peter Grosche, Meinard M¨uller, and Craig Stuart Sapp. What makes beat tracking difficult? A case study on Chopin Mazurkas. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), pages 649–654, Utrecht, The Netherlands, 2010. [80] Peter Grosche, Meinard M¨uller, and Joan Serr`a. Audio content-based music retrieval. In Meinard M¨uller, Masataka Goto, and Markus Schedl, editors, Multimodal Music Processing, volume 3 of Dagstuhl Follow-Ups, pages 157–174. Schloss Dagstuhl–Leibniz-Zentrum f¨ur Informatik, Dagstuhl, Germany, 2012. [81] Peter Grosche, Bj¨orn Schuller, Meinard M¨uller, and Gerhard Rigoll. Automatic transcription of recorded music. Acta Acustica united with Acustica, 98(2):199–215, 2012. [82] Jaap Haitsma and Ton Kalker. A highly robust audio fingerprinting system. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 107–115, Paris, France, 2002. [83] Jaap Haitsma and Ton Kalker. Speed-change resistant audio fingerprinting using auto- correlation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 728–731, 2003. [84] Christopher Harte, Mark Sandler, and Martin Gasser. Detecting harmonic change in musical audio. In Proceedings of the ACM Workshop on Audio and Music Computing Multimedia, pages 21–26, Santa Barbara, California, USA, 2006. [85] Andr´eHolzapfel, Matthew E.P. Davies, Jose R. Zapata, Jo˜ao L. Oliveira, and Fabien Gouyon. On the automatic identification of difficult examples for beat tracking: towards building new evaluation datasets. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 89–92, Kyoto, Japan, 2012. [86] Andr´eHolzapfel and Yannis Stylianou. Beat tracking using group based onset de- tection. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2008. [87] Andr´eHolzapfel and Yannis Stylianou. Rhythmic similarity in traditional Turkish music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 99–104, Kobe, Japan, 2009. [88] Andr´eHolzapfel, Yannis Stylianou, Ali C. Gedik, and Bari¸s Bozkurt. Three dimensions of pitched instrument onset detection. IEEE Transactions on Audio, Speech, and Language Processing, 18(6):1517–1527, 2010. [89] Andr´eHolzapfel, Gino Angelo Velasco, Nicki Holighaus, Monika D¨orfler, and Arthur Flexer. Advantages of nonstationary Gabor transforms in beat tracking. In Proceedings of the 1st international ACM workshop on Music Information Retrieval with User-centered and Multi- modal Strategies (MIRUM), pages 45–50, Scottsdale, Arizona, USA, 2011. [90] Ning Hu, Roger B. Dannenberg, and George Tzanetakis. Polyphonic audio matching and alignment for music retrieval. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2003. [91] Kristoffer Jensen. Multiple scale music segmentation using rhythm, timbre, and harmony. EURASIP Journal on Advances in Signal Processing, 2007(1):11 pages, 2007. [92] Kristoffer Jensen, Jieping Xu, and Martin Zachariasen. Rhythm-based segmentation of pop- ular chinese music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), London, UK, 2005. BIBLIOGRAPHY 155

[93] Nanzhu Jiang, Peter Grosche, Verena Konz, and Meinard M¨uller. Analyzing chroma feature types for automated chord recognition. In Proceedings of the 42nd AES Conference on Semantic Audio, Ilmenau, Germany, 2011. [94] Zolt´an Juh´asz. Motive identification in 22 folksong corpora using dynamic time warping and self organizing maps. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 171–176, Kobe, Japan, 2009. [95] Zolt´an Juh´asz. A systematic comparison of different European folk music traditions using self-organizing maps. Journal of New Music Research, 35:95–112(18), June 2006. [96] Yan Ke, Derek Hoiem, and Rahul Sukthankar. Computer vision for music identification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 597–604, San Diego, CA, USA, 2005. [97] Hyoung-Gook Kim, Nicolas Moreau, and Thomas Sikora. MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, page 304. John Wiley & Sons, 2005. ISBN: 978-0- 470-09334-4. [98] Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon C. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull. Music emotion recog- nition: A state of the art review. In 11th Intl. Society for Music Information Retrieval Conference (ISMIR), pages 255–266, Utrecht, The Netherlands, 2010. [99] Aniket Kittur, Ed Chi, Bryan A. Pendleton, Bongwon Suh, and Todd Mytkowicz. Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. In Com- puter/Human Interaction Conference (Alt.CHI), San Jose, CA, 2007. [100] Anssi P. Klapuri. Sound onset detection by applying psychoacoustic knowledge. In Pro- ceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 3089–3092, Washington, DC, USA, 1999. [101] Anssi P. Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcrip- tion. Springer, New York, 2006. [102] Anssi P. Klapuri, Antti J. Eronen, and Jaakko Astola. Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech and Language Processing, 14(1):342– 355, 2006. [103] Carol L. Krumhansl. Cognitive foundations of musical pitch. Oxford University Press, 1990. [104] Frank Kurth, Thorsten Gehrmann, and Meinard M¨uller. The cyclic beat spectrum: Tempo- related audio features for time-scale invariant audio identification. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), pages 35–40, Victoria, Canada, 2006. [105] Frank Kurth and Meinard M¨uller. Efficient index-based audio matching. IEEE Transactions on Audio, Speech, and Language Processing, 16(2):382–395, 2008. [106] Frank Kurth, Meinard M¨uller, David Damm, Christian Fremerey, Andreas Ribbrock, and Michael Clausen. SyncPlayer - an advanced system for multimodal music access. In Pro- ceedings of the International Conference on Music Information Retrieval (ISMIR), pages 381–388, London, UK, 2005. [107] Frank Kurth, Andreas Ribbrock, and Michael Clausen. Identification of highly distorted au- dio material for querying large scale data bases. In Proceedings of the 112th AES Convention, 2002. [108] Alexandre Lacoste and Douglas Eck. A supervised classification algorithm for note onset detection. EURASIP Journal on Applied Signal Processing, 2007:153165, 2007. 156 BIBLIOGRAPHY

[109] Mathieu Lagrange and Joan Serr`a. Unsupervised accuracy improvement for cover song detection using spectral connectivity network. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 595–600, Utrecht, The Netherlands, 2010. [110] Paul Lamere. Social tagging and music information retrieval. Journal of New Music Research, 37(2):101–114, 2008. [111] J¨org Langner and Werner Goebl. Visualizing expressive performance in tempo-loudness space. Computer Music Journal, 27(4):69–83, 2003. [112] Fred Lerdahl and Ray Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983. [113] Pierre Leveau, Laurent Daudet, and Ga¨el Richard. Methodology and tools for the evalua- tion of automatic onset detection algorithms in music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 72–77, Barcelona, Spain, 2004. [114] Mark Levy and Mark Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech and Language Processing, 16(2):318–326, 2008. [115] Mark Levy, Mark Sandler, and Michael A. Casey. Extraction of high-level musical structure from audio data and its application to thumbnail generation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 13– 16, Toulouse, France, 2006. [116] Hanna Lukashevich. Towards quantitative measures of evaluating song segmentation. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 375–380, Philadelphia, USA, 2008. [117] Matija Marolt. A mid-level representation for melody-based retrieval in audio collections. IEEE Transactions on Multimedia, 10(8):1617–1625, 2008. [118] Paul Masri and Andrew Bateman. Improved modeling of attack transients in music analysis- resynthesis. In Proceedings of the International Computer Music Conference (ICMC), pages 100–103, Hong Kong, 1996. [119] Matthias Mauch, Chris Cannam, Matthew E.P. Davies, Simon Dixon, Christopher Harte, Sefki Kolozali, Dan Tidhar, and Mark Sandler. OMRAS2 metadata project 2009. In Late Breaking Demo of the International Conference on Music Information Retrieval (ISMIR), Kobe, Japan, 2009. [120] Matthias Mauch and Simon Dixon. Approximate note transcription for the improved iden- tification of difficult chords. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pages 135–140, Utrecht, The Netherlands, 2010. [121] Martin F. McKinney, Dirk Moelants, Matthew E.P. Davies, and Anssi P. Klapuri. Evalua- tion of audio beat tracking and music tempo extraction algorithms. Journal of New Music Research, 36(1):1–16, 2007. [122] D. Moelants, O. Cornelis, and M. Leman. Exploring African tone scales. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 489–494, Kobe, Japan, 2009. [123] Meinard M¨uller. Information Retrieval for Music and Motion. Springer Verlag, 2007. [124] Meinard M¨uller and Michael Clausen. Transposition-invariant self-similarity matrices. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pages 47–50, Vienna, Austria, 2007. [125] Meinard M¨uller, Daniel P. W. Ellis, Anssi Klapuri, and Ga¨el Richard. Signal processing for music analysis. IEEE Journal on Selected Topics in Signal Processing, 5(6):1088–1110, 2011. BIBLIOGRAPHY 157

[126] Meinard M¨uller and Sebastian Ewert. Towards timbre-invariant audio features for harmony- based music. IEEE Transactions on Audio, Speech, and Language Processing, 18(3):649–662, 2010. [127] Meinard M¨uller and Sebastian Ewert. Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proceedings of the International Soci- ety for Music Information Retrieval Conference (ISMIR), pages 215–220, Miami, FL, USA, 2011. [128] Meinard M¨uller and Peter Grosche. Automated segmentation of folk song field recordings. In Proceedings of the 10th ITG Conference on Speech Communication, Braunschweig, Germany, 2012. [129] Meinard M¨uller, Peter Grosche, and Nanzhu Jiang. A segment-based fitness measure for capturing repetitive structures of music recordings. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), pages 615–620, Miami, FL, USA, 2011. [130] Meinard M¨uller, Peter Grosche, and Frans Wiering. Robust segmentation and annotation of folk song recordings. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR), pages 735–740, Kobe, Japan, 2009. [131] Meinard M¨uller, Peter Grosche, and Frans Wiering. Towards automated processing of folk song recordings. In Eleanor Selfridge-Field, Frans Wiering, and Geraint A. Wiggins, edi- tors, Knowledge representation for intelligent music processing, number 09051 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2009. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany. [132] Meinard M¨uller, Peter Grosche, and Frans Wiering. Automated analysis of performance vari- ations in folk song recordings. In Proceedings of the International Conference on Multimedia Information Retrieval (MIR), pages 247–256, Philadelphia, PA, USA, 2010. [133] Meinard M¨uller, Verena Konz, Andi Scharfstein, Sebastian Ewert, and Michael Clausen. Towards automated extraction of tempo parameters from expressive music recordings. In Proceedings of the International Society for Music Information Retrieval Conference (IS- MIR), pages 69–74, Kobe, Japan, 2009. [134] Meinard M¨uller and Frank Kurth. Enhancing similarity matrices for music audio analysis. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 437–440, Toulouse, France, 2006. [135] Meinard M¨uller, Frank Kurth, and Michael Clausen. Audio matching via chroma-based statistical features. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 288–295, 2005. [136] Meinard M¨uller and Tido R¨oder. Motion templates for automatic classification and retrieval of motion capture data. In Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA), pages 137–146, Vienna, Austria, 2006. [137] H´el`ene Papadopoulos and Geoffroy Peeters. Simultaneous estimation of chord progression and downbeats from an audio file. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 121–124, 2008. [138] H´el`ene Papadopoulos and Geoffroy Peeters. Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):138– 152, 2011. [139] Richard Parncutt. A perceptual model of pulse salience and metrical accent in musical rhythms. Music Perception, 11:409–464, 1994. 158 BIBLIOGRAPHY

[140] Lo¨ıc Paulev´e, Herv´eJ´egou, and Laurent Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31(11):1348– 1358, 2010. [141] Jouni Paulus and Anssi P. Klapuri. Measuring the similarity of rhythmic patterns. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 150–156, Paris, France, 2002. [142] Jouni Paulus and Anssi P. Klapuri. Music structure analysis using a probabilistic fitness measure and a greedy search algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 17(6):1159–1170, 2009. [143] Jouni Paulus, Meinard M¨uller, and Anssi P. Klapuri. Audio-based music structure analysis. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), pages 625–636, Utrecht, The Netherlands, 2010. [144] Geoffrey Peeters. Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 35–40, Vienna, Austria, 2007. [145] Geoffroy Peeters. Time variable tempo detection and beat marking. In Proceedings of the International Computer Music Conference (ICMC), Barcelona, Spain, 2005. [146] Geoffroy Peeters. Template-based estimation of time-varying tempo. EURASIP Journal on Advances in Signal Processing, 2007(1):158–158, 2007. [147] Geoffroy Peeters. ”copy and scale” method for doing time-localized m.i.r. estimation:: appli- cation to beat-tracking. In Proceedings of 3rd international workshop on Machine learning and music (MML), pages 1–4, Firenze, Italy, 2010. [148] Geoffroy Peeters and H´el`ene Papadopoulos. Simultaneous beat and downbeat-tracking using a probabilistic framework: Theory and large-scale evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1754–1769, 2011. [149] Lawrence Rabiner and Bing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall Signal Processing Series, 1993. [150] Mathieu Ramona and Geoffroy Peeters. Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 477–480, 2011. [151] Suman Ravuri and Daniel P.W. Ellis. Cover song detection: from high scores to general classification. In Proceedings of the IEEE International Conference on Audio, Speech and Signal Processing (ICASSP), pages 65–68, Dallas, TX, 2010. [152] Craig Stuart Sapp. Comparative analysis of multiple musical performances. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 497–500, Vienna, Austria, 2007. [153] Craig Stuart Sapp. Hybrid numeric/rank similarity metrics. In Proceedings of the Inter- national Conference on Music Information Retrieval (ISMIR), pages 501–506, Philadelphia, USA, 2008. [154] Eric D. Scheirer. Tempo and beat analysis of acoustical musical signals. Journal of the Acoustical Society of America, 103(1):588–601, 1998. [155] Christian Sch¨orkhuber and Anssi P. Klapuri. Constant-Q transform toolbox for music pro- cessing. In Sound and Music Computing Conference (SMC), Barcelona, 2010. [156] Hendrik Schreiber, Peter Grosche, and Meinard M¨uller. A re-ordering strategy for acceler- ating index-based audio fingerprinting. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), pages 127–132, Miami, FL, USA, 2011. BIBLIOGRAPHY 159

[157] Bj¨orn Schuller, Florian Eyben, and Gerhard Rigoll. Tango or waltz?: Putting ballroom dance style into tempo detection. EURASIP Journal on Audio, Speech, and Music Processing, 2008:12, 2008. [158] Eleanor Selfridge-Field, editor. Beyond MIDI: the handbook of musical codes. MIT Press, Cambridge, MA, USA, 1997. [159] Jarno Sepp¨anen. Tatum grid analysis of musical signals. In Proceedings of the IEEE Work- shop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 131–134, 2001. [160] Jarno Sepp¨anen, Antti J. Eronen, and Jarmo Hiipakka. Joint beat & tatum tracking from music signals. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada, 2006. [161] Joan Serr`a, Emilia G´omez, and Perfecto Herrera. Audio cover song identification and similar- ity: background, approaches, evaluation and beyond. In Z. W. Ras and A. A. Wieczorkowska, editors, Advances in Music Information Retrieval, volume 16 of Studies in Computational Intelligence, chapter 14, pages 307–332. Springer, Berlin, Germany, 2010. [162] Joan Serr`a, Emilia G´omez, Perfecto Herrera, and Xavier Serra. Chroma binary similarity and local alignment applied to cover song identification. IEEE Transactions on Audio, Speech and Language Processing, 16:1138–1151, 2008. [163] Joan Serr`a, Meinard M¨uller, Peter Grosche, and Josep Lluis Arcos. Unsupervised detection of music boundaries by time series structure features. In Proceedings of the AAAI International Conference on Artificial Intelligence, Toronto, Ontario, Canada, 2012. [164] Joan Serr`a, Xavier Serra, and Ralph G. Andrzejak. Cross recurrence quantification for cover song identification. New Journal of Physics, 11(9):093017, 2009. [165] Joan Serr`a, Massimiliano Zanin, Perfecto Herrera, and Xavier Serra. Characterization and exploitation of community structure in cover song networks. Pattern Recognition Letters, 2010. Submitted. [166] William Arthur Sethares. Rhythm and Transforms. Springer, 2007. [167] Dan Stowell and Mark Plumbley. Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC), Copen- hagen, Denmark, 2007. [168] Kengo Terasawa and Yuzuru Tanaka. Spherical LSH for approximate nearest neighbor search on unit hypersphere. In Frank Dehne, J¨org-R¨udiger Sack, and Norbert Zeh, editors, Algo- rithms and Data Structures, volume 4619 of Lecture Notes in Computer Science, pages 27–38. Springer Berlin / Heidelberg, 2007. [169] Chee Chuan Toh, Bingjun Zhang, and Ye Wang. Multiple-feature fusion based onset detec- tion for solo singing voice. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 515–520, Philadelphia, PA, USA, 2008. [170] Wei-Ho Tsai, Hung-Ming Yu, and Hsin-Min Wang. Using the similarity of main melodies to identify cover versions of popular songs for music document retrieval. Journal of Information Science and Engineering, 24(6):1669–1687, 2008. [171] Emiru Tsunoo, Taichi Akase, Nobutaka Ono, and Shigeki Sagayama. Musical mood clas- sification by rhythm and bass-line unit pattern analysis. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010. [172] Douglas Turnbull, Luke Barrington, and Gert Lanckriet. Five approaches to collecting tags for music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 225–230, Philadelphia, USA, 2008. 160 BIBLIOGRAPHY

[173] Douglas Turnbull, Luke Barrington, David Torres, and Gert Lanckriet. Semantic annotation and retrieval of music and sound effects. IEEE Transactions on Audio, Speech, and Language Processing, 16(2):467–476, 2008.
[174] George Tzanetakis and Perry Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.
[175] Yushi Ueda, Yuuki Uchiyama, Takuya Nishimoto, Nobutaka Ono, and Shigeki Sagayama. HMM-based approach for automatic chord detection using refined acoustic features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5518–5521, Dallas, USA, 2010.
[176] Jan Van Balen. Automatic recognition of samples in musical audio. Master's thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2011.
[177] Peter van Kranenburg, Jörg Garbers, Anja Volk, Frans Wiering, Louis P. Grijp, and Remco C. Veltkamp. Collaboration perspectives for folk song research and music information retrieval: The indispensable role of computational musicology. Journal of Interdisciplinary Music Studies, 4:17–43, 2010.
[178] Peter van Kranenburg, Jörg Garbers, Anja Volk, Frans Wiering, Louis Grijp, and Remco Veltkamp. Towards integration of MIR and folk song research. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 505–508, Vienna, Austria, 2007.
[179] Peter van Kranenburg, Anja Volk, Frans Wiering, and Remco C. Veltkamp. Musical models for folk-song melody alignment. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 507–512, Kobe, Japan, 2009.
[180] Anja Volk, Peter van Kranenburg, Jörg Garbers, Frans Wiering, Remco C. Veltkamp, and Louis P. Grijp. The study of melodic similarity using manual annotation and melody feature sets. Technical Report UU-CS-2008-013, Department of Information and Computing Sciences, Utrecht University, 2008.
[181] Avery Wang. An industrial strength audio search algorithm. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 7–13, Baltimore, USA, 2003.
[182] Felix Weninger, Martin Wöllmer, and Björn Schuller. Automatic assessment of singer traits in popular music: Gender, age, height and race. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 37–42, Miami, FL, USA, 2011.
[183] Kris West and Paul Lamere. A model-based approach to constructing music similarity functions. EURASIP Journal on Advances in Signal Processing, 2007(1):024602, 2007.
[184] Gerhard Widmer. Machine discoveries: A few simple, robust local expression principles. Journal of New Music Research, 31(1):37–50, 2003.
[185] Gerhard Widmer, Simon Dixon, Werner Goebl, Elias Pampalk, and Asmir Tobudic. In search of the Horowitz factor. AI Magazine, 24(3):111–130, 2003.
[186] Frans Wiering, Louis P. Grijp, Remco C. Veltkamp, Jörg Garbers, Anja Volk, and Peter van Kranenburg. Modelling folksong melodies. Interdisciplinary Science Reviews, 34(2-3):154–171, 2009.
[187] Fu-Hai Frank Wu, Tsung-Chi Lee, Jyh-Shing Roger Jang, Kaichun K. Chang, Chun-Hung Lu, and Wen-Nan Wang. A two-fold dynamic programming approach to beat tracking for audio music with time-varying tempo. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pages 191–196, Miami, FL, USA, 2011.

[188] Jose R. Zapata and Emilia Gómez. Comparative evaluation and combination of audio tempo estimation approaches. In Proceedings of the 42nd AES Conference on Semantic Audio, Ilmenau, Germany, 2011.
[189] Ruohua Zhou, Marco Mattavelli, and Giorgio Zoia. Music onset detection based on resonator time frequency image. IEEE Transactions on Audio, Speech, and Language Processing, 16(8):1685–1695, 2008.