Analyzing Sample-Based Electronic Music Using Audio Processing Techniques
German title: Audioverarbeitungsmethoden zur Analyse Sample-basierter elektronischer Musik (Audio Processing Methods for the Analysis of Sample-Based Electronic Music)

Dissertation submitted to the Faculty of Engineering (Technische Fakultät) of Friedrich-Alexander-Universität Erlangen-Nürnberg for the degree of Doktor der Ingenieurwissenschaften (Dr.-Ing.)
by Patricio López Serrano Erickson, from Mexico City

Approved as a dissertation by the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg
Date of oral examination: 19 February 2019
Chair of the doctoral committee: Prof. Dr.-Ing. Reinhard Lerch
First reviewer: Prof. Dr. Meinard Müller
Second reviewer: Prof. Dr. Jason Hockman

Abstract

The advent of affordable digital sampling technology brought about great changes in music production. Experienced producers of sample-based electronic music (SBEM) understand the intricate relationship between a sampled source and its use in a new track, and they harness the source's properties to shape the structure, timbre, and rhythm of their compositions. However, automated analysis of SBEM and of the sources from which its samples are drawn is a challenge that involves many tasks from music information retrieval (MIR) and audio processing. In this thesis, we develop models and techniques to better understand SBEM at different levels. In particular, we offer four main technical contributions to retrieval and analysis tasks. First, we explore how timbral changes affect the spectral peak maps used in audio fingerprinting, as a means to identify overlapping samples. Second, we analyze the structure of typical SBEM tracks using audio decomposition based on non-negative matrix factor deconvolution (NMFD). Third, we investigate the interaction of timbre and structure by designing a mid-level audio feature based on cascaded harmonic-residual-percussive source separation (CHRP). Fourth, we apply random forests to detect the quintessential sampling source, drum breaks, in music recordings.
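As a concrete illustration of the spectral peak maps mentioned in the first contribution: fingerprinting systems commonly reduce a magnitude spectrogram to a sparse set of time-frequency points, each of which is a local maximum within a fixed neighborhood. The Python sketch below computes such a map with NumPy and SciPy. It is a minimal sketch under stated assumptions, not the method or parameters used in this thesis; the helper name `spectral_peak_map` and the window, hop, neighborhood, and threshold values are all hypothetical choices for illustration.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import stft


def spectral_peak_map(x, fs, n_fft=2048, hop=1024, neighborhood=(15, 9), min_db=-60.0):
    """Return (time_s, freq_hz) pairs of local maxima in the STFT magnitude.

    Hypothetical helper for illustration; all parameter values are assumptions.
    neighborhood: (frequency bins, frames) window in which a point must be the maximum.
    min_db: points more than this many dB below the loudest bin are discarded.
    """
    freqs, times, Z = stft(x, fs=fs, window="hann", nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)  # shape: (number of frequency bins, number of frames)

    # A time-frequency point is a peak if no point in its neighborhood is larger.
    is_local_max = mag == maximum_filter(mag, size=neighborhood, mode="constant", cval=0.0)

    # Suppress maxima in near-silent regions (threshold relative to the global maximum).
    floor = mag.max() * 10.0 ** (min_db / 20.0)
    k, m = np.nonzero(is_local_max & (mag > floor))

    # Return the peaks sorted by time.
    return sorted((float(times[j]), float(freqs[i])) for i, j in zip(k, m))


# Usage: with a -20 dB threshold, the peaks of a 1 kHz sine tone lie near 1000 Hz.
fs = 22050
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
peaks = spectral_peak_map(tone, fs, min_db=-20.0)
```

Larger neighborhoods yield sparser maps that keep only the most prominent points; timbral changes such as filtering or layering other sounds on top of a sample can move or remove individual peaks, which is why their effect on peak maps is worth examining.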
Given their prominent role in SBEM, we devote considerable attention to drum breaks. As an application to computational musicology, we formalize an algorithm for calculating the local swing ratio in drum breaks and adapt an autocorrelation-based method to analyze their microrhythmic properties. Finally, we present a creative application to music production that combines the temporal and timbral properties of two separate drum breaks (redrumming). Despite the massive commercial success of SBEM and the practice of sampling, both have remained largely outside the scope of formal academic studies and MIR research. In this thesis, our overarching goal is to identify and formalize some of the fundamental audio processing tasks related to SBEM, proposing methods that can seed further research.

Zusammenfassung (Summary)

The increasing availability of affordable digital sampling technology revolutionized music production. Experienced producers of sample-based electronic music (SBEM) can recognize the multifaceted relationship between a sample's original source material and its use in a new track, and they exploit the sample's properties to shape the structure, timbre, and rhythm of new compositions. The relationship between SBEM and the samples it uses is complex, so the automated analysis of SBEM gives rise to a range of tasks in the fields of music information retrieval (MIR) and audio processing. In this thesis, we develop models and techniques to better understand SBEM at different levels. In particular, the thesis comprises four technical contributions to retrieval and analysis tasks. First, we investigate how timbral changes affect the spectral peak maps used in audio fingerprinting to identify overlapping samples.
Second, we analyze the structure of typical SBEM tracks using audio decomposition techniques based on non-negative matrix factor deconvolution (NMFD). Third, we investigate the interaction of timbre and structure by developing a mid-level audio feature based on cascaded harmonic-residual-percussive source separation (CHRP). Given their prominent role in SBEM, we pay particular attention to drum breaks, one of the most widely used sample sources. In a fourth contribution, we apply machine learning techniques (in particular, random forests) to automatically locate drum breaks in music recordings. As an application to computational musicology, we formalize an algorithm for computing the local swing ratio; to this end, we adapt an autocorrelation-based method for analyzing the microrhythmic properties of drum breaks. Finally, we present a creative application to music production in which the temporal and timbral properties of two different drum breaks can be combined (redrumming). Despite the enormous commercial success of SBEM and of sampling, this musical practice has mostly remained outside the attention of academic studies and MIR research. An overarching goal of this thesis is to formalize novel research questions and to provide methods for the computational analysis of SBEM.

Contents

Abstract
Zusammenfassung
1 Introduction
1.1 Structure
1.2 Publications
1.3 Contributions
1.4 Acknowledgments
2 Fundamental Concepts of SBEM
2.1 Drum Breaks and Layering
2.2 Production
2.3 DJing
2.4 SBEM in MIR Literature
2.5 Further Literature
3 Fingerprinting in Electronic Music
3.1 Audio Signals and Fourier Analysis
3.2 Audio Fingerprinting
3.3 EM-Mini Dataset Description
3.4 Fingerprinting with Peak Maps
3.5 Reference Implementation
3.6 Peak Agreement
3.7 Adding White Noise
3.8 Multi-Loop Mixtures
3.9 STFT Frame Offset
4 Modeling SBEM
4.1 EM Composition Basics
4.2 Structure and Production Process
4.3 Simplified Model for EM
4.4 Fingerprint-Based EM Decomposition
4.5 NMFD-Based EM Decomposition
4.6 Conclusions and Future Work
5 Cascaded Harmonic-Residual-Percussive Features
5.1 Harmonic-Residual-Percussive Source Separation
5.2 Related Work
5.3 Cascaded Harmonic-Residual-Percussive Features
5.4 Applications
5.5 Conclusions and Future Work
6 Finding Drum Breaks in Digital Music Recordings
6.1 Drum Breaks
6.2 Task Specification
6.3 Baseline System and Experiments
6.4 Evaluation
6.5 Conclusions and Future Work
7 Estimating Sixteenth Note Swing Ratio in Drum Break Recordings
7.1 Context and Related Tasks
7.2 Annotated Dataset
7.3 Automated SR Estimation Approach
7.4 Evaluation
7.5 Conclusion
8 Summary and Future Work
A NMF Toolbox
A.1 Introduction
A.2 SBEM Example: Learning a Track Model with Minimal Information
A.3 Diagonality-Enhanced NMF
A.4 MATLAB Function Reference
B Break-Informed Audio Decomposition for Interactive Redrumming
B.1 Proposed Method
B.2 Real-World Scenario
Bibliography

Chapter 1 Introduction

Almost in parallel, technological advances fostered changes on two fronts. On the one hand, computing technology enabled emerging fields of study such as music information retrieval (MIR) and audio processing. On the other hand, music technology, particularly affordable digital sampling, democratized production by putting music-making capabilities into the eager hands of amateur musicians with small home studios. Each of these fronts has become established in its own right.
MIR and audio processing are widely recognized research fields with dedicated conferences and publications (such as ISMIR, ICASSP, and DAFx). At the same time, sample-based electronic music (SBEM) and the practice of sampling have grown beyond the genres in which they were first used (such as hip-hop, jungle, and drum’n’bass) and have become central to many genres, including contemporary (as of 2018) chart-topping pop music. Recording technology and digital sampling made possible the emergence of both MIR and SBEM. The workflows and practices of SBEM production open up interesting analysis and retrieval problems and, in particular, offer new perspectives on well-established tasks such as fingerprinting, audio decomposition, and structure analysis. However, up to now, the MIR and audio processing literature has devoted relatively little attention to SBEM.

Audio fingerprinting has typically been used to match a (potentially distorted or modified) query to a reference in a database, that is, as a means to identify the query. However, fingerprinting can also be used for structure analysis in SBEM by exploiting the fact that loops are exact copies of each other, up to differences caused by applied effects or by overlaying samples in different combinations. The knowledge that a track contains mostly identical copies of a sample also motivates interesting audio decomposition and structure analysis applications.

While making tracks, SBEM producers often have to manage large sample and track collections, seeking musical material with certain characteristics that will fit into their production. To this end, musicians can use audio features to visualize important timbral and structural changes, helping to locate and retrieve