Music Sound Synthesis Using Machine Learning: Towards a Perceptually Relevant Control Space

Total Page:16

File Type:pdf, Size:1020Kb

Music Sound Synthesis Using Machine Learning: Towards a Perceptually Relevant Control Space THÈSE Pour obtenir le grade de DOCTEUR DE L’UNIVERSITE GRENOBLE ALPES Spécialité : Signal, Image, Parole, Télécoms (SIPT) Arrêté ministériel : 25 mai 2016 Présentée par Fanny ROCHE Thèse dirigée par Laurent GIRIN, Professeur, Communauté Université Grenoble Alpes, GIPSA-Lab et Co-encadrée par Thomas HUEBER, Chargé de Recherche, CNRS, GIPSA-Lab et Maëva GARNIER, Chargée de Recherche, CNRS, GIPSA-Lab et Samuel LIMIER, Ingénieur, ARTURIA préparée au sein du Laboratoire Grenoble Images Parole Signal Automatique (GIPSA-Lab) et d’ARTURIA dans l'École Doctorale Electronique, Electrotechnique, Automatique, Traitement du Signal (EEATS) Music sound synthesis using machine learning: Towards a perceptually relevant control space Thèse soutenue publiquement le 29 septembre 2020, devant le jury composé de : Monsieur Christophe D’ALESSANDRO Directeur de Recherche, CNRS, Institut Jean le Rond d’Alembert, Rapporteur, Président du jury Monsieur Geoffroy PEETERS Professeur, Télécom ParisTech, LTCI, Rapporteur Madame Emilia GÓMEZ Associate Professor, Universitat Pompeu Fabra, MTG, Examinatrice Monsieur Antoine LIUTKUS Chargé de Recherche, INRIA, LIRMM, Examinateur Monsieur Laurent GIRIN Professeur, Communauté Université Grenoble Alpes, GIPSA-Lab, Directeur de Thèse Monsieur Thomas HUEBER Chargé de Recherche, CNRS, GIPSA-Lab, Co-encadrant de Thèse Madame Maëva GARNIER Chargée de Recherche, CNRS, GIPSA-Lab, Co-encadrante de Thèse, Invitée Monsieur Samuel LIMIER Ingénieur, ARTURIA, Co-encadrant de Thèse, Invité UNIVERSITÉ DE GRENOBLE ALPES ÉCOLE DOCTORALE EEATS Electronique, Electrotechnique, Automatique, Traitement du Signal THÈSE pour obtenir le titre de docteur en sciences de l’Université de Grenoble Alpes Mention : Signal, Image, Parole, Télécoms Présentée et soutenue par Fanny ROCHE Music sound synthesis using machine learning: Towards a perceptually relevant control space Thèse dirigée par Laurent GIRIN et co-encadrée par Thomas HUEBER, Maëva GARNIER et Samuel LIMIER préparée au laboratoire Grenoble Images Parole Signal Automatique (GIPSA-Lab) et ARTURIA soutenue le 29 septembre 2020 Jury : Président du jury : Christophe D’ALESSANDRO CNRS, Institut Jean le Rond d’Alembert Rapporteur : Geoffroy PEETERS Télécom ParisTech, LTCI Examinatrice : Emilia GÓMEZ Universitat Pompeu Fabra, MTG Examinateur : Antoine LIUTKUS INRIA, LIRMM Directeur : Laurent GIRIN Communauté Université Grenoble Alpes, GIPSA-Lab Encadrant : Thomas HUEBER CNRS, GIPSA-Lab Invitée : Maëva GARNIER CNRS, GIPSA-Lab Invité : Samuel LIMIER ARTURIA Abstract One of the main challenges of the synthesizer market and the research in sound synthesis nowadays lies in proposing new forms of synthesis allowing the creation of brand new sonori- ties while offering musicians more intuitive and perceptually meaningful controls to help them reach the perfect sound more easily. Indeed, today’s synthesizers are very powerful tools that provide musicians with a considerable amount of possibilities for creating sonic textures, but the control of parameters still lacks user-friendliness and may require some expert knowledge about the underlying generative processes. In this thesis, we are interested in developing and evaluating new data-driven machine learning methods for music sound synthesis allow- ing the generation of brand new high-quality sounds while providing high-level perceptually meaningful control parameters. The first challenge of this thesis was thus to characterize the musical synthetic timbre by evidencing a set of perceptual verbal descriptors that are both frequently and consensually used by musicians. Two perceptual studies were then conducted: a free verbalization test enabling us to select eight different commonly used terms for describing synthesizer sounds, and a semantic scale analysis enabling us to quantitatively evaluate the use of these terms to characterize a subset of synthetic sounds, as well as analyze how consensual they were. In a second phase, we investigated the use of machine learning algorithms to extract a high-level representation space with interesting interpolation and extrapolation properties from a dataset of sounds, the goal being to relate this space with the perceptual dimensions evidenced earlier. Following previous studies interested in using deep learning for music sound synthesis, we focused on autoencoder models and realized an extensive comparative study of several kinds of autoencoders on two different datasets. These experiments, together with a qualitative analysis made with a non real-time prototype developed during the thesis, allowed us to validate the use of such models, and in particular the use of the variational autoencoder (VAE), as relevant tools for extracting a high-level latent space in which we can navigate smoothly and create new sounds. However, so far, no link between this latent space and the perceptual dimensions evidenced by the perceptual tests emerged naturally. As a final step, we thus tried to enforce perceptual supervision of the VAE by adding a regularization during the training phase. Using the subset of synthetic sounds used in the second perceptual test and the corresponding perceptual grades along the eight perceptual dimensions provided by the semantic scale analysis, it was possible to constraint, to a certain extent, some dimensions of the VAE high-level latent space so as to match these perceptual dimensions. A final comparative test was then conducted in order to evaluate the efficiency of this additional regularization for conditioning the model and (partially) leading to a per- ceptual control of music sound synthesis. Keywords: Synthesizer sounds, synthetic timbre perceptual characterization, timbre verbal descriptors, (variational) autoencoders, perceptually controlled audio synthesis. i Résumé Un des enjeux majeurs du marché des synthétiseurs et de la recherche en synthèse sonore aujourd’hui est de proposer une nouvelle forme de synthèse permettant de générer des sons inédits tout en offrant aux utilisateurs de nouveaux contrôles plus intuitifs afin de les aider dans leur recherche de sons. En effet, les synthétiseurs sont actuellement des outils très puissants qui offrent aux musiciens une large palette de possibilités pour la création de textures sonores, mais également souvent très complexes avec des paramètres de contrôle dont la manipulation nécessite généralement des connaissances expertes. Cette thèse s’intéresse ainsi au développement et à l’évaluation de nouvelles méthodes d’apprentissage machine pour la synthèse sonore permettant la génération de nouveaux sons de qualité tout en fournissant des paramètres de contrôle pertinents perceptivement. Le premier challenge que nous avons relevé a donc été de caractériser perceptivement le timbre musical synthétique en mettant en évidence un jeu de descripteurs verbaux util- isés fréquemment et de manière consensuelle par les musiciens. Deux études perceptives ont été menées : un test de verbalisation libre qui nous a permis de sélectionner huit termes communément utilisés pour décrire des sons de synthétiseurs, et une analyse à échelles sé- mantiques permettant d’évaluer quantitativement l’utilisation de ces termes pour caractériser un sous-ensemble de sons, ainsi que d’analyser leur "degré de consensualité". Dans un second temps, nous avons exploré l’utilisation d’algorithmes d’apprentissage ma- chine pour l’extraction d’un espace de représentation haut-niveau avec des propriétés intéres- santes d’interpolation et d’extrapolation à partir d’une base de données de sons, le but étant de mettre en relation cet espace avec les dimensions perceptives mises en évidence plus tôt. S’inspirant de précédentes études sur la synthèse sonore par apprentissage profond, nous nous sommes concentrés sur des modèles du type autoencodeur et avons réalisé une étude compar- ative approfondie de plusieurs types d’autoencodeurs sur deux jeux de données différents. Ces expériences, couplées avec une étude qualitative via un prototype non temps-réel développé durant la thèse, nous ont permis de valider les autoencodeurs, et en particulier l’autoencodeur variationnel (VAE), comme des outils bien adaptés à l’extraction d’un espace latent de haut- niveau dans lequel il est possible de se déplacer de manière continue et fluide en créant de tous nouveaux sons. Cependant, à ce niveau, aucun lien entre cet espace latent et les dimensions perceptives mises en évidence précédemment n’a pu être établi spontanément. Pour finir, nous avons donc apporté de la supervision au VAE en ajoutant une régularisa- tion perceptive durant la phase d’apprentissage. En utilisant les échantillons sonores résultant du test perceptif avec échelles sémantiques labellisés suivant les huit dimensions perceptives, il a été possible de contraindre, dans une certaine mesure, certaines dimensions de l’espace latent extrait par le VAE afin qu’elles coïncident avec ces dimensions. Un test comparatif a été finalement réalisé afin d’évaluer l’efficacité de cette régularisation supplémentaire pour con- ditionner le modèle et permettre un contrôle perceptif (au moins partiel) de la synthèse sonore. Mots clés : Sons de synthétiseurs, caractérisation perceptive du timbre synthétique, de- scripteurs verbaux de timbre, autoencodeurs (variationnels), contrôle perceptif de la synthèse sonore. iii Aknowledgments Il y a de ça maintenant un peu plus de six ans dans le froid et la nuit de cette contrée chère à mon coeur, une petite idée folle a germé dans mon esprit : pourquoi pas une thèse ? Et me voilà aujourd’hui arrivée au bout de ce périple
Recommended publications
  • Minimoog Model D Manual
    3 IMPORTANT SAFETY INSTRUCTIONS WARNING - WHEN USING ELECTRIC PRODUCTS, THESE BASIC PRECAUTIONS SHOULD ALWAYS BE FOLLOWED. 1. Read all the instructions before using the product. 2. Do not use this product near water - for example, near a bathtub, washbowl, kitchen sink, in a wet basement, or near a swimming pool or the like. 3. This product, in combination with an amplifier and headphones or speakers, may be capable of producing sound levels that could cause permanent hearing loss. Do not operate for a long period of time at a high volume level or at a level that is uncomfortable. 4. The product should be located so that its location does not interfere with its proper ventilation. 5. The product should be located away from heat sources such as radiators, heat registers, or other products that produce heat. No naked flame sources (such as candles, lighters, etc.) should be placed near this product. Do not operate in direct sunlight. 6. The product should be connected to a power supply only of the type described in the operating instructions or as marked on the product. 7. The power supply cord of the product should be unplugged from the outlet when left unused for a long period of time or during lightning storms. 8. Care should be taken so that objects do not fall and liquids are not spilled into the enclosure through openings. There are no user serviceable parts inside. Refer all servicing to qualified personnel only. NOTE: This equipment has been tested and found to comply with the limits for a class B digital device, pursuant to part 15 of the FCC rules.
    [Show full text]
  • Real-Time Timbre Transfer and Sound Synthesis Using DDSP
    REAL-TIME TIMBRE TRANSFER AND SOUND SYNTHESIS USING DDSP Francesco Ganis, Erik Frej Knudesn, Søren V. K. Lyster, Robin Otterbein, David Sudholt¨ and Cumhur Erkut Department of Architecture, Design, and Media Technology Aalborg University Copenhagen, Denmark https://www.smc.aau.dk/ March 15, 2021 ABSTRACT Neural audio synthesis is an actively researched topic, having yielded a wide range of techniques that leverages machine learning architectures. Google Magenta elaborated a novel approach called Differ- ential Digital Signal Processing (DDSP) that incorporates deep neural networks with preconditioned digital signal processing techniques, reaching state-of-the-art results especially in timbre transfer applications. However, most of these techniques, including the DDSP, are generally not applicable in real-time constraints, making them ineligible in a musical workflow. In this paper, we present a real-time implementation of the DDSP library embedded in a virtual synthesizer as a plug-in that can be used in a Digital Audio Workstation. We focused on timbre transfer from learned representations of real instruments to arbitrary sound inputs as well as controlling these models by MIDI. Furthermore, we developed a GUI for intuitive high-level controls which can be used for post-processing and manipulating the parameters estimated by the neural network. We have conducted a user experience test with seven participants online. The results indicated that our users found the interface appealing, easy to understand, and worth exploring further. At the same time, we have identified issues in the timbre transfer quality, in some components we did not implement, and in installation and distribution of our plugin. The next iteration of our design will address these issues.
    [Show full text]
  • Presented at ^Ud,O the 99Th Convention 1995October 6-9
    Tunable Bandpass Filters in Music Synthesis 4098 (L-2) Robert C. Maher University of Nebraska-Lincoln Lincoln, NE 68588-0511, USA Presented at ^ uD,o the 99th Convention 1995 October 6-9 NewYork Thispreprinthas been reproducedfrom the author'sadvance manuscript,withoutediting,correctionsor considerationby the ReviewBoard. TheAES takesno responsibilityforthe contents. Additionalpreprintsmay be obtainedby sendingrequestand remittanceto theAudioEngineeringSocietY,60 East42nd St., New York,New York10165-2520, USA. All rightsreserved.Reproductionof thispreprint,or anyportion thereof,isnot permitted withoutdirectpermissionfromthe Journalof theAudio EngineeringSociety. AN AUDIO ENGINEERING SOCIETY PREPRINT TUNABLE BANDPASS FILTERS IN MUSIC SYNTHESIS ROBERT C. MAHER DEPARTMENT OF ELECTRICAL ENGINEERING AND CENTERFORCOMMUNICATION AND INFORMATION SCIENCE UNIVERSITY OF NEBRASKA-LINCOLN 209N WSEC, LINCOLN, NE 68588-05II USA VOICE: (402)472-2081 FAX: (402)472-4732 INTERNET: [email protected] Abst/act: Subtractive synthesis, or source-filter synthesis, is a well known topic in electronic and computer music. In this paper a description is given of a flexible subtractive synthesis scheme utilizing a set of tunable digital bandpass filters. Specific examples and applications are presented for realtime subtractive synthesis of singing and other musical signals. 0. INTRODUCTION Subtractive (or source-filter) synthesis is used widely in electronic and computer music applications. Subtractive synthesis general!y involves a source signal with a broad spectrum that is passed through a filter. The properties of the filter largely define the shape of the output spectrum by attenuating specific frequency ranges, hence the name subtractive synthesis [1]. The subtractive synthesis model is appropriate for the wide class of physical systems in which an input source drives a passive acoustical or mechanical system.
    [Show full text]
  • Computationally Efficient Music Synthesis
    HELSINKI UNIVERSITY OF TECHNOLOGY Department of Electrical and Communications Engineering Laboratory of Acoustics and Audio Signal Processing Jussi Pekonen Computationally Efficient Music Synthesis – Methods and Sound Design Master’s Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Technology. Espoo, June 1, 2007 Supervisor: Professor Vesa Välimäki Instructor: Professor Vesa Välimäki HELSINKI UNIVERSITY ABSTRACT OF THE OF TECHNOLOGY MASTER’S THESIS Author: Jussi Pekonen Name of the thesis: Computationally Efficient Music Synthesis – Methods and Sound Design Date: June 1, 2007 Number of pages: 80+xi Department: Electrical and Communications Engineering Professorship: S-89 Supervisor: Professor Vesa Välimäki Instructor: Professor Vesa Välimäki In this thesis, the design of a music synthesizer for systems suffering from limitations in computing power and memory capacity is presented. First, different possible syn- thesis techniques are reviewed and their applicability in computationally efficient music synthesis is discussed. In practice, the applicable techniques are limited to additive and source-filter synthesis, and, in special cases, to frequency modulation, wavetable and sampling synthesis. Next, the design of the structures of the applicable techniques are presented in detail, and properties and design issues of these structures are discussed. A major implemen- tation problem is raised in digital source-filter synthesis, where the use of classic wave- forms, such as sawtooth wave, as the source signal is challenging due to aliasing caused by waveform discontinuities. Methods for existing bandlimited waveform synthesis are reviewed, and a new approach using polynomial bandlimited step function is pre- sented in detail with design rules for the applicable polynomials.
    [Show full text]
  • Masterarbeit
    Masterarbeit Erstellung einer Sprachdatenbank sowie eines Programms zu deren Analyse im Kontext einer Sprachsynthese mit spektralen Modellen zur Erlangung des akademischen Grades Master of Science vorgelegt dem Fachbereich Mathematik, Naturwissenschaften und Informatik der Technischen Hochschule Mittelhessen Tobias Platen im August 2014 Referent: Prof. Dr. Erdmuthe Meyer zu Bexten Korreferent: Prof. Dr. Keywan Sohrabi Eidesstattliche Erklärung Hiermit versichere ich, die vorliegende Arbeit selbstständig und unter ausschließlicher Verwendung der angegebenen Literatur und Hilfsmittel erstellt zu haben. Die Arbeit wurde bisher in gleicher oder ähnlicher Form keiner anderen Prüfungsbehörde vorgelegt und auch nicht veröffentlicht. 2 Inhaltsverzeichnis 1 Einführung7 1.1 Motivation...................................7 1.2 Ziele......................................8 1.3 Historische Sprachsynthesen.........................9 1.3.1 Die Sprechmaschine.......................... 10 1.3.2 Der Vocoder und der Voder..................... 10 1.3.3 Linear Predictive Coding....................... 10 1.4 Moderne Algorithmen zur Sprachsynthese................. 11 1.4.1 Formantsynthese........................... 11 1.4.2 Konkatenative Synthese....................... 12 2 Spektrale Modelle zur Sprachsynthese 13 2.1 Faltung, Fouriertransformation und Vocoder................ 13 2.2 Phase Vocoder................................ 14 2.3 Spectral Model Synthesis........................... 19 2.3.1 Harmonic Trajectories........................ 19 2.3.2 Shape Invariance..........................
    [Show full text]
  • 11C Software 1034-1187
    Section11c PHOTO - VIDEO - PRO AUDIO Computer Software Ableton.........................................1036-1038 Arturia ...................................................1039 Antares .........................................1040-1044 Arkaos ....................................................1045 Bias ...............................................1046-1051 Bitheadz .......................................1052-1059 Bomb Factory ..............................1060-1063 Celemony ..............................................1064 Chicken Systems...................................1065 Eastwest/Quantum Leap ............1066-1069 IK Multimedia .............................1070-1078 Mackie/UA ...................................1079-1081 McDSP ..........................................1082-1085 Metric Halo..................................1086-1088 Native Instruments .....................1089-1103 Propellerhead ..............................1104-1108 Prosoniq .......................................1109-1111 Serato............................................1112-1113 Sonic Foundry .............................1114-1127 Spectrasonics ...............................1128-1130 Syntrillium ............................................1131 Tascam..........................................1132-1147 TC Works .....................................1148-1157 Ultimate Soundbank ..................1158-1159 Universal Audio ..........................1160-1161 Wave Mechanics..........................1162-1165 Waves ...........................................1166-1185
    [Show full text]
  • THE COMPLETE SYNTHESIZER: a Comprehensive Guide by David Crombie (1984)
    THE COMPLETE SYNTHESIZER: A Comprehensive Guide By David Crombie (1984) Digitized by Neuronick (2001) TABLE OF CONTENTS TABLE OF CONTENTS...........................................................................................................................................2 PREFACE.................................................................................................................................................................5 INTRODUCTION ......................................................................................................................................................5 "WHAT IS A SYNTHESIZER?".............................................................................................................................5 CHAPTER 1: UNDERSTANDING SOUND .............................................................................................................6 WHAT IS SOUND? ...............................................................................................................................................7 THE THREE ELEMENTS OF SOUND .................................................................................................................7 PITCH ...................................................................................................................................................................8 STANDARD TUNING............................................................................................................................................8 THE RESPONSE OF THE HUMAN
    [Show full text]
  • A Unit Selection Text-To-Speech-And-Singing Synthesis Framework from Neutral Speech: Proof of Concept 39 II.1 Introduction
    Adding expressiveness to unit selection speech synthesis and to numerical voice production 90) - 02 - Marc Freixes Guerreiro http://hdl.handle.net/10803/672066 Generalitat 472 (28 de Catalunya núm. Rgtre. Fund. ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de ió la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual undac F (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs. Universitat Ramon Llull Universitat Ramon ADVERTENCIA. El acceso a los contenidos de esta tesis doctoral y su utilización debe respetar los derechos de la persona autora. Puede ser utilizada para consulta o estudio personal, así como en actividades o materiales de investigación y docencia en los términos establecidos en el art. 32 del Texto Refundido de la Ley de Propiedad Intelectual (RDL 1/1996).
    [Show full text]
  • Tools for Creating Audio Stories
    Tools for Creating Audio Stories Steven Rubin Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-237 http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-237.html December 15, 2015 Copyright © 2015, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement Advisor: Maneesh Agrawala Tools for Creating Audio Stories By Steven Surmacz Rubin A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Maneesh Agrawala, Chair Professor Björn Hartmann Professor Greg Niemeyer Fall 2015 Tools for Creating Audio Stories Copyright 2015 by Steven Surmacz Rubin 1 Abstract Tools for Creating Audio Stories by Steven Surmacz Rubin Doctor of Philosophy in Computer Science University of California, Berkeley Professor Maneesh Agrawala, Chair Audio stories are an engaging form of communication that combines speech and music into com- pelling narratives. One common production pipeline for creating audio stories involves three main steps: recording speech, editing speech, and editing music. Existing audio recording and editing tools force the story producer to manipulate speech and music tracks via tedious, low-level wave- form editing.
    [Show full text]
  • Physically-Based Parametric Sound Synthesis and Control
    Physically-Based Parametric Sound Synthesis and Control Perry R. Cook Princeton Computer Science (also Music) Course Introduction Parametric Synthesis and Control of Real-World Sounds for virtual reality games production auditory display interactive art interaction design Course #2 Sound - 1 Schedule 0:00 Welcome, Overview 0:05 Views of Sound 0:15 Spectra, Spectral Models 0:30 Subtractive and Modal Models 1:00 Physical Models: Waveguides and variants 1:20 Particle Models 1:40 Friction and Turbulence 1:45 Control Demos, Animation Examples 1:55 Wrap Up Views of Sound • Sound is a recorded waveform PCM playback is all we need for interactions, movies, games, etc. (Not true!!) • Time Domain x( t ) (from physics) • Frequency Domain X( f ) (from math) • Production what caused it • Perception our image of it Course #2 Sound - 2 Views of Sound Time Domain is most closely related to Production Frequency Domain is most closely related to Perception we will see that many hybrids abound Views of Sound: Time Domain Sound is produced/modeled by physics, described by quantities of • Force force = mass * acceleration • Position x(t) actually < x(t), y(t), z(t) > • Velocity Rate of change of position dx/dt • Acceleration Rate of change of velocity dv/dt Examples: Mass+Spring+Damper Wave Equation Course #2 Sound - 3 Mass/Spring/Damper F = ma = - ky - rv - mg F = ma = - ky - rv (if gravity negligible) d 2 y r dy k + + y = 0 dt2 m dt m ( ) D2 + Dr / m + k / m = 0 2nd Order Linear Diff Eq. Solution 1) Underdamped: -t/τ ω y(t) = Y0 e cos( t ) exp.
    [Show full text]
  • Fabián Esqueda Native Instruments Gmbh 1.3.2019
    SOUND SYNTHESIS FABIÁN ESQUEDA NATIVE INSTRUMENTS GMBH 1.3.2019 © 2003 – 2019 VESA VÄLIMÄKI AND FABIÁN ESQUEDA SOUND SYNTHESIS 1.3.2019 OUTLINE ‣ Introduction ‣ Introduction to Synthesis ‣ Additive Synthesis ‣ Subtractive Synthesis ‣ Wavetable Synthesis ‣ FM and Phase Distortion Synthesis ‣ Synthesis, Synthesis, Synthesis SOUND SYNTHESIS 1.3.2019 Introduction ‣ BEng in Electronic Engineering with Music Technology Systems (2012) from University of York. ‣ MSc in Acoustics and Music Technology in (2013) from University of Edinburgh. ‣ DSc in Acoustics and Audio Signal Processing in (2017) from Aalto University. ‣ Thesis topic: Aliasing reduction in nonlinear processing. ‣ Published on a variety of topics, including audio effects, circuit modeling and sound synthesis. ‣ My current role is at Native Instruments where I work as a Software Developer for our Synths & FX team. ABOUT NI SOUND SYNTHESIS 1.3.2019 About Native Instruments ‣ One of largest music technology companies in Europe. ‣ Founded in 1996. ‣ Headquarters in Berlin, offices in Los Angeles, London, Tokyo, Shenzhen and Paris. ‣ Team of ~600 people (~400 in Berlin), including ~100 developers. SOUND SYNTHESIS 1.3.2019 About Native Instruments - History ‣ First product was Generator – a software modular synthesizer. ‣ Generator became Reaktor, NI’s modular synthesis/processing environment and one of its core products to this day. ‣ The Pro-Five and B4 were NI’s first analog modeling synthesizers. SOUND SYNTHESIS 1.3.2019 About Native Instruments ‣ Pioneered software instruments and digital
    [Show full text]
  • A Parametric Sound Object Model for Sound Texture Synthesis
    Daniel Mohlmann¨ A Parametric Sound Object Model for Sound Texture Synthesis Dissertation zur Erlangung des Grades eines Doktors der Ingenieurwissenschaften | Dr.-Ing. | Vorgelegt im Fachbereich 3 (Mathematik und Informatik) der Universitat¨ Bremen im Juni 2011 Gutachter: Prof. Dr. Otthein Herzog Universit¨atBremen Prof. Dr. J¨ornLoviscach Fachhochschule Bielefeld Abstract This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are con- sidered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of pa- rameters over time. In contrast to standard spectral modeling techniques, this repre- sentation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modula- tion, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations.
    [Show full text]