Automatic Audio Sample Finder for Music Creation Melodic Audio Segmentation Using DSP and Machine Learning
DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Automatic audio sample finder for music creation
Melodic audio segmentation using DSP and machine learning

DAVID PITUK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in ICT Innovation
Date: October 25, 2019
Supervisor: Saikat Chatterjee
Examiner: Saikat Chatterjee
School of Electrical Engineering and Computer Science
Host company: Teenage Engineering
Swedish title: Automatisk ljudprovfinnare för musikskapande

Abstract

In the field of audio signal processing, there have always been attempts to create tools that help musicians by automating processes for music creation or analysis, and the electronic music industry still plays an important role in combining software engineering and music. In the age of sample-based synthesizers and sequencers, creating and using high-quality, unique audio sample packages is a crucial part of composing songs.

Nowadays, there are hundreds of audio applications and editors that provide sufficient tools for songwriters and DJs to find and edit audio samples and create their own signature packages for their performances. However, these applications do not offer automated solutions for extracting melodic loops or drum samples. Therefore, the whole procedure of extracting euphonious and unique samples can be quite time-consuming. Deciding which part of a song is good enough to be used as a separate loop or drum sound is highly subjective, so fully automating this mechanism is very challenging. However, a good balance between fully automated processes and freedom for additional editing can result in a useful tool that still saves users a lot of time.
In this thesis, I present the research and implementation of a cross-platform (Windows, macOS, Linux) desktop application which automatically extracts melodic motifs and percussion sections from songs for use as loop and drum samples. Furthermore, the app classifies the extracted drum samples into five categories (kick, snare, clap, open hi-hat and closed hi-hat) and allows the user to do additional editing on the samples. The software was developed as part of an internship at a Swedish audio company, Teenage Engineering; therefore the application converts the final sample package into a single file which is supported by the company's OP-Z sequencer and synthesizer device.

Sammanfattning (Swedish summary, translated)

Within the field of audio processing, there have always been attempts to produce tools that help musicians by automating processes for creating or analyzing music, and the electronic music industry plays an important role as it combines software engineering and music. In today's music industry, sample-based synthesizers, sequencers, and the use of high-quality and unique sound packages are a crucial part of composing songs.

Nowadays there are hundreds of audio applications and editing programs that give songwriters and DJs enough tools to find and edit sounds and create their own signature packages for their concerts. However, these applications do not offer automated solutions for producing melodic loops or drum samples. Therefore, the whole procedure of creating unique samples can be quite time-consuming. Deciding which part of a song is good enough to use as a separate loop or recurring drum sound is highly subjective, so fully automating music production is very challenging.
However, a good balance between the fully automated processes and the possibility of further editing can result in a useful tool that can save users a lot of time.

In this thesis, I present the research and implementation of a cross-platform (Windows, macOS, Linux) application that automatically produces melodies and percussion sections for songs. In addition, the app also provides classification of the created drum samples into five categories (kick, snare, clap, open hi-hat and closed hi-hat). This allows the user to do further editing on the samples. The software is developed as part of a project at the Swedish audio company Teenage Engineering. The application converts the final sample package into a single file that is supported by the company's OP-Z sequencer and synthesizer device.

Contents

1 Introduction
  1.1 Background
    1.1.1 Teenage Engineering and the OP-Z
    1.1.2 Sample kits for the OP-Z
  1.2 Motivation
  1.3 Goal
  1.4 Method
    1.4.1 Melodic samples
    1.4.2 Drum samples and classification
  1.5 Sustainability goals
    1.5.1 Good Health and Well-being
    1.5.2 Industry, Innovation and Infrastructure
  1.6 Outline
2 Theoretical Background
  2.1 The sound
    2.1.1 Time domain
    2.1.2 Frequency domain
    2.1.3 Spectrogram
  2.2 Melodic sample extraction
    2.2.1 Similarity measure
    2.2.2 Self similarity matrix
  2.3 Audio classification
    2.3.1 Convolutional neural network
  2.4 The OP-Z kit file
    2.4.1 The AIFF file format
    2.4.2 The OP-Z JSON Object
3 Implementation
  3.1 Loop sample extraction
    3.1.1 Preprocessing the spectrogram
    3.1.2 Self similarity matrix and thresholding
    3.1.3 Sample extraction from the matrix
    3.1.4 Algorithm overview
  3.2 Drum sample extraction and classification
    3.2.1 Beat slicing
    3.2.2 Generating the CNN model
    3.2.3 Algorithm overview
  3.3 The application structure and user interface
    3.3.1 The Electron framework
    3.3.2 The ZeroRPC module
    3.3.3 The graphical user interface
4 Conclusion
  4.1 Results
  4.2 Future work
  4.3 Final thought
Bibliography

Chapter 1
Introduction

1.1 Background

Music sequencers (also called audio sequencers, or simply sequencers) are a very important part of electronic music creation. They are devices or application software that can record, edit, or play back music by handling note and performance information in several forms. Sequencers can be categorized by the type of data they handle, such as MIDI, CV/Gate or audio data. Another way of categorizing them is by the storage and playback mechanism of the device, for example real-time, analog or step sequencers [1].

1.1.1 Teenage Engineering and the OP-Z

This thesis project is part of an internship program at Teenage Engineering (TE). TE is a Swedish consumer electronics company and manufacturer, founded in 2005 and based in Stockholm. Their products include electronics and synthesizers.

TE's OP-Z is an audio sample-based step sequencer with some additional features.¹ A step sequencer breaks down beats into 'steps'. For example, if the user breaks up a loop with 4 bars that is in standard 4/4 time, it will have 16 steps (also known as beats). With a sequencer, the user can edit each step to customize the beats or the song: tweak, add, remove or edit drum hits such as kicks, snares or hats, and add sample hits or effects. Then the user can set the desired number of steps in each beat, and change velocity, reverb and other effects.

¹ OP-Z has many more features than a regular step sequencer, such as a sample-based synthesizer mode and a visual and audio effect unit.

OP-Z has eight audio tracks which are divided into two groups, the drum group and the synth group. The drum group consists of four drum tracks. These are kick, snare, perc and sample.
Each track in this group has two-note polyphony per step. They are all sample based and consist of 24 different sounds across the musical keyboard [2]. This set of sounds is called a kit, and this thesis is about how to automatically generate whole kits for the OP-Z from different songs.

Figure 1.1: Teenage Engineering's OP-Z sequencer

1.1.2 Sample kits for the OP-Z

OP-Z has several built-in sample kits; however, users can also load their own kits into the device storage. For the device to be able to use the uploaded sounds, a drum sample kit has to meet some requirements:

• The kit has to be a single file containing the sounds for each key.
• The file's length must be 12 seconds or shorter.
• The file has to be in AIFF (Audio Interchange File Format) with a sample rate of 44.1 kHz.
• Header metadata must be part of the AIFF file's chunk section. This header contains information about the assignment between the kit samples and the OP-Z keyboard, and other settings such as pitch, LFO etc.

I provide more details about this header section in the next chapter.

1.2 Motivation

There are many sound editors and beat slicers on the software market which help users create custom single-file kits; however, each of these tools lacks certain features needed for easy OP-Z kit construction.

Figure 1.2: Audacity, a versatile and open-source audio editor

For example, editors like the open-source Audacity allow users to cut and edit parts of songs, but usually do not have an option to create AIFF files with the specific OP-Z compatible header data. Furthermore, the market lacks software that can automatically find melodic motifs and unique drum samples in an audio file.
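
As an illustration of the kit requirements listed in Section 1.1.2, the following minimal Python sketch checks a candidate kit against them. The KitCandidate class, its field names and the kit_violations function are hypothetical names chosen for this example; they are not part of the OP-Z format or of the application described in this thesis. The header check is reduced to a boolean flag, because the header layout is only described in the next chapter.

```python
from dataclasses import dataclass
from typing import List

MAX_KIT_SECONDS = 12.0          # "12 seconds or shorter"
REQUIRED_SAMPLE_RATE = 44100    # 44.1 kHz


@dataclass
class KitCandidate:
    """Hypothetical summary of an audio file that should become a kit."""
    container: str              # file format name, e.g. "AIFF"
    sample_rate_hz: int         # samples per second
    num_frames: int             # sample frames per channel
    has_header_metadata: bool   # assumed flag: OP-Z header chunk present


def kit_violations(kit: KitCandidate) -> List[str]:
    """Return the requirements from Section 1.1.2 that the kit breaks."""
    problems = []
    if kit.container.upper() not in ("AIFF", "AIF"):
        problems.append("file must be in AIFF format")
    if kit.sample_rate_hz != REQUIRED_SAMPLE_RATE:
        problems.append("sample rate must be 44.1 kHz")
    # Only compute the duration when the sample rate is usable.
    if kit.sample_rate_hz > 0:
        duration = kit.num_frames / kit.sample_rate_hz
        if duration > MAX_KIT_SECONDS:
            problems.append("length must be 12 seconds or shorter")
    if not kit.has_header_metadata:
        problems.append("header metadata is missing")
    return problems


if __name__ == "__main__":
    kit = KitCandidate("AIFF", 44100, 44100 * 10, True)
    print(kit_violations(kit) or "kit meets all listed requirements")
```

A check of this kind only covers the constraints stated in the list; whether the device accepts a file also depends on the header contents, which this sketch does not inspect.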
1.3 Goal

The main goal of this thesis project is to implement a cross-platform desktop application which has the following features:

• Finding melodic motifs in different songs with one click
• Extracting unique drum samples from songs or drum recordings automatically