Music sound synthesis using machine learning: Towards a perceptually relevant control space
Fanny ROCHE PhD Defense 29 September 2020
Supervisor: Laurent GIRIN. Co-supervisors: Thomas HUEBER, Maëva GARNIER, Samuel LIMIER
Fanny ROCHE - PhD Defense - 29 September 2020

Context and Objectives
Music sound synthesis

Abstract algorithms → e.g. FM, waveshaping, ...
+ rich sounds − complex parameters

Processed recordings → e.g. sampling, wavetable, granular, ...
+ computationally efficient − very memory consuming

Spectral modeling → e.g. additive, subtractive, ...
+ close to human sound perception − numerous & very specific

Physical modeling → e.g. solving wave equations, modal synthesis, ...
+ physically meaningful controls − very specific sounds
Thesis project: new machine learning methods to tackle these issues and get:
◦ perceptually-meaningful control parameters ◦ independent control parameters ◦ accurate sound modeling
Challenges / Research questions
1. Define verbal descriptors adapted to synthetic sounds
2. Find a suitable method for extracting a high-level representation space & generating high-quality sounds
3. Get perceptually-meaningful control parameters for the synthesis
Content
1 Perceptual characterization of synthetic timbre State-of-the-art Free verbalization perceptual test Perceptual scaling test Conclusion
2 Unsupervised representation learning Methodology Comparative study Conclusion
3 Towards weak supervision using timbre perception Methodology Experiments Perceptual evaluation Conclusion
4 Conclusion and perspectives
Perceptual characterization of synthetic timbre
State-of-the-art

Ambiguous definition of timbre → multidimensional perceptual attribute

Timbre perception approaches:
Multidimensional scaling (MDS) studies [Grey 1977; Iverson and Krumhansl 1993; McAdams et al. 1995; Lakatos 2000]
Qualitative description of timbre:
Free verbalization [Faure 2000; Traube 2004; Garnier et al. 2007; Cance and Dubois 2015]
Free categorization [Guyot 1996; Gaillard 2000; Bensa et al. 2004; Ehrette 2004]
Semantic differential (SD) method [Faure 2000; Ehrette 2004; Zacharakis et al. 2014]

⇒ No consensus, BUT agreement on its spectro-temporal shape: temporal dynamics vs. spectral content [Schaeffer 1996; Castellengo 2015]
Context-dependent perceptual dimensions → type of sounds, listeners, language, ...
Free verbalization perceptual test
Objective: collect verbal descriptors that are frequently and transversally used to describe synthesizer sounds in French
Participants: 101 responses
Stimuli: creation of the ARTURIA dataset → 1,233 samples from ARTURIA's software synthesizers; 50 carefully chosen stimuli
Protocol:

Results analysis
Pre-processing: collected 784 different terms
Semantic clustering based on a co-occurrence matrix
Frequency & transversality analysis

Observations
Both terms commonly used for usual musical instruments and new terms:
→ e.g. brillant (bright), chaud (warm), métallique (metallic), ... see for example [Faure 2000; Traube 2004; Cheminée et al. 2005; Garnier et al. 2007; Lavoie 2013]
→ e.g. distordu (distorted), explosif (explosive), rétro-futuriste (retro-futuristic), robotique (robotic), saccadé (jerky), spatial, ...
The 5 most frequent and transversal perceptual categories are all related to the spectral content of the sound
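The semantic-clustering step can be illustrated in code. A minimal sketch with invented term names and co-occurrence counts (the actual 784-term matrix is not reproduced here): terms are grouped by hierarchical clustering of a dissimilarity derived from the co-occurrence matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical co-occurrence counts: cooc[i, j] = number of times term i
# and term j were used to describe the same stimulus (diagonal = term frequency).
terms = ["brillant", "metallique", "chaud", "doux", "agressif"]
cooc = np.array([
    [9, 7, 1, 0, 2],
    [7, 8, 0, 1, 3],
    [1, 0, 9, 8, 0],
    [0, 1, 8, 9, 0],
    [2, 3, 0, 0, 7],
], dtype=float)

# Turn co-occurrences into a dissimilarity: terms that often co-occur,
# relative to their own frequencies, are considered semantically close.
sim = cooc / np.sqrt(np.outer(np.diag(cooc), np.diag(cooc)))
dist = 1.0 - sim

# Condensed upper-triangular distance vector, as expected by linkage().
iu = np.triu_indices(len(terms), k=1)
Z = linkage(dist[iu], method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
clusters = {c: [t for t, l in zip(terms, labels) if l == c] for c in set(labels)}
```

With these toy counts, the spectral terms (brillant, metallique, agressif) separate from the "soft" terms (chaud, doux); frequency and transversality would then be analyzed per cluster.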
Verbal descriptor selection: prototypes of the scales, covering all timbre dimensions
⇒ These descriptors will serve as semantic labels of the scales for the second perceptual test

Perceptual scaling test
Objectives:
. Evaluate the consensus on the 8 highlighted perceptual dimensions
. Annotate a subset of samples along these dimensions
Participants: 71 responses
Stimuli:
ARTURIA dataset; stimuli selection: 80 carefully chosen samples
Stimuli assignment: training vs. main phase samples
Protocol:

Results analysis
Intra-subject consensus: evaluate the consistency of each participant and remove "unreliable" listeners
Figure: Intra-subject correlation matrix
Inter-subject consensus: distinguish groups of participants with a shared conception of the dimensions
Figure: Resulting dendrogram of the inter-subject correlation coefficients for the Agressif scale

Observations
No significant differences between groups of participants
Discrepancy between scales:
◦ métallique ◦ qui vibre → least consensual
◦ chaud ◦ qui résonne
◦ soufflé ◦ qui évolue
◦ percussif ◦ agressif → very consensual
Consensus degree consistent across both analyses → intra- and inter-subject correlations

Final label vector computation
Selection of the cluster with the largest number of participants
Scale-wise annotation of every sound using the mean ratings
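The two consensus analyses can be sketched numerically. A minimal illustration on simulated ratings (all numbers invented): intra-subject consistency as the correlation between two rating passes, and inter-subject consensus as the pairwise correlation matrix from which the dendrograms are built.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_sounds = 6, 20

# Hypothetical ratings on one semantic scale (0-100): five listeners share
# a common conception of the scale (noisy copies of one profile), while
# the last one answers at random ("unreliable").
profile = rng.uniform(0, 100, n_sounds)
ratings = np.vstack([profile + rng.normal(0, 5, n_sounds) for _ in range(5)]
                    + [rng.uniform(0, 100, n_sounds)])

# Intra-subject consistency: correlate each listener's ratings with a
# second rating pass (simulated); the unreliable listener re-rates at random.
second_pass = ratings + rng.normal(0, 5, ratings.shape)
second_pass[5] = rng.uniform(0, 100, n_sounds)
intra = np.array([np.corrcoef(ratings[s], second_pass[s])[0, 1]
                  for s in range(n_subjects)])

# Inter-subject consensus: pairwise correlations between listeners;
# hierarchical clustering of this matrix yields the dendrogram on the slide.
inter = np.corrcoef(ratings)
```

Listeners with low intra-subject correlation would be removed, and the largest cluster found in `inter` would provide the mean ratings used as label vectors.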
Conclusion

Identification of the most frequent/transversal terms characterizing synthesizer sounds ⇒ 8 verbal descriptors:
◦ métallique (metallic) ◦ qui vibre (vibrating)
◦ chaud (warm) ◦ qui résonne (resonant)
◦ soufflé (breathy) ◦ qui évolue (evolving)
◦ percussif (percussive) ◦ agressif (aggressive)
Shared conception of the terms ⇒ despite discrepancies between scales
Dual aspect of 2 dimensions ⇒ their use in a commercial synthesizer is open to question
Annotation of a subset of the ARTURIA dataset with perceptual scores
Unsupervised representation learning
Objective: investigate a well-suited deep learning algorithm to extract, from a dataset of sounds, a high-level representation space with interesting interpolation and extrapolation properties

Questions:
→ Is it possible to extract such a space automatically from a low-level representation of the signals?
→ Is it suitable for synthesis control and perceptually relevant?

Related work
Autoencoder-based models:
Autoencoders [Sarroff and Casey 2014; Colonel et al. 2017]
WaveNet autoencoders [Engel et al. 2017]
Variational autoencoders → speech [Blaauw and Bonada 2016; Hsu et al. 2017; Akuzawa et al. 2018]; music sounds [Esling et al. 2018]
Generative Adversarial Network (GAN)-based models:
WaveGAN & SpecGAN [Donahue et al. 2019]
GANSynth [Engel et al. 2019]

Methodology
Analysis-transformation-synthesis methodology

Autoencoder-based models (AE):
AE and deep AE (DAE) [Hinton and Salakhutdinov 2006; Bengio et al. 2007]
Recurrent AE (LSTM-AE) [Hochreiter and Schmidhuber 1997]
Variational AE (VAE) [Kingma and Welling 2014; Rezende et al. 2014]
Baseline: principal component analysis (PCA) [Bishop 2006]
Autoencoder framework (AE, DAE and LSTM-AE)
. Encoding: z = f_enc(W_enc x + b_enc)
. Decoding: x̂ = f_dec(W_dec z + b_dec)
. Training by minimizing the reconstruction error: MSE(x̂, x)
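The encoding/decoding equations above translate directly into code. A minimal NumPy sketch of one forward pass with random, untrained weights (the dimensions and tanh activation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(x_hat, x):
    return np.mean((x_hat - x) ** 2)

# Toy dimensions: a 513-bin magnitude spectrum compressed to 16 latents.
D, L = 513, 16
W_enc = rng.normal(0.0, 0.01, (L, D)); b_enc = np.zeros(L)
W_dec = rng.normal(0.0, 0.01, (D, L)); b_dec = np.zeros(D)

def encode(x):
    # z = f_enc(W_enc x + b_enc)
    return np.tanh(W_enc @ x + b_enc)

def decode(z):
    # x_hat = f_dec(W_dec z + b_dec); identity output activation here
    return W_dec @ z + b_dec

x = rng.random(D)          # one magnitude spectrum frame
z = encode(x)              # low-dimensional latent representation
x_hat = decode(z)          # reconstruction
loss = mse(x_hat, x)       # quantity minimized during training
```

Training would backpropagate `loss` through both maps; deep (DAE) and recurrent (LSTM-AE) variants replace the single affine layers with stacks or LSTM cells.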
VAE: probabilistic framework
. Parametric model: pθ(x, z) = pθ(x|z) pθ(z)
. Prior on the latent space: pθ(z) = N(z; 0, I_L)
. Probabilistic decoder: pθ(x|z) = N(x; μθ(z), σθ²(z))
. Probabilistic encoder: qφ(z|x) = N(z; μ̃φ(x), σ̃φ²(x))
. Log-likelihood decomposition: log pθ(x) = DKL(qφ(z|x) ‖ pθ(z|x)) + L(φ, θ, β, x), where DKL(·‖·) ≥ 0
. Training by maximizing the variational lower bound (VLB):
L(φ, θ, β, x) = E_qφ(z|x)[log pθ(x|z)] − β DKL(qφ(z|x) ‖ pθ(z))
(first term: reconstruction accuracy; second term: regularization)
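With the diagonal-Gaussian encoder and standard-normal prior above, the KL term of the VLB has a closed form. A sketch of the β-weighted objective, written as a minimized loss with an MSE reconstruction term (a common practical substitute for the Gaussian log-likelihood, assumed here):

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def neg_vlb(x, x_hat, mu, log_var, beta=1.0):
    """Negative VLB up to constants: reconstruction error + beta * KL.
    beta > 1 trades reconstruction accuracy for a better-regularized
    latent space, as in the beta-VAE variants compared in the thesis."""
    recon = np.mean((x_hat - x) ** 2)
    return recon + beta * gaussian_kl(mu, log_var)

# When q_phi(z|x) equals the prior N(0, I), the KL term vanishes.
kl0 = gaussian_kl(np.zeros(16), np.zeros(16))
```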
Fanny ROCHE - PhD Defense - 29 September 2020 25 / 48 Time-frequency data representation Magnitude spectrogram Phase spectrogram reconstruction → Griffin & Lim algorithm [Griffin and Lim 1984] → Linear phase unwrapping [Magron 2016] Tested models: PCA (baseline) AE and DAE (several architectures) VAE (different β values) LSTM-AE Metrics: Root mean squared error (RMSE) PEMO-Q scores [Huber and Kollmeier 2006]
Unsupervised representation learning Comparative study
Datasets:
NSynth dataset (subset of 10, 000 samples with fs = 16 kHz) [Engel et al. 2017] ARTURIA dataset (1, 233 samples with fs = 44.1 kHz)
Fanny ROCHE - PhD Defense - 29 September 2020 26 / 48 Tested models: PCA (baseline) AE and DAE (several architectures) VAE (different β values) LSTM-AE Metrics: Root mean squared error (RMSE) PEMO-Q scores [Huber and Kollmeier 2006]
Unsupervised representation learning Comparative study
Datasets:
NSynth dataset (subset of 10, 000 samples with fs = 16 kHz) [Engel et al. 2017] ARTURIA dataset (1, 233 samples with fs = 44.1 kHz) Time-frequency data representation Magnitude spectrogram Phase spectrogram reconstruction → Griffin & Lim algorithm [Griffin and Lim 1984] → Linear phase unwrapping [Magron 2016]
Fanny ROCHE - PhD Defense - 29 September 2020 26 / 48 Metrics: Root mean squared error (RMSE) PEMO-Q scores [Huber and Kollmeier 2006]
Unsupervised representation learning Comparative study
Datasets:
NSynth dataset (subset of 10, 000 samples with fs = 16 kHz) [Engel et al. 2017] ARTURIA dataset (1, 233 samples with fs = 44.1 kHz) Time-frequency data representation Magnitude spectrogram Phase spectrogram reconstruction → Griffin & Lim algorithm [Griffin and Lim 1984] → Linear phase unwrapping [Magron 2016] Tested models: PCA (baseline) AE and DAE (several architectures) VAE (different β values) LSTM-AE
Fanny ROCHE - PhD Defense - 29 September 2020 26 / 48 Unsupervised representation learning Comparative study
Datasets:
NSynth dataset (subset of 10, 000 samples with fs = 16 kHz) [Engel et al. 2017] ARTURIA dataset (1, 233 samples with fs = 44.1 kHz) Time-frequency data representation Magnitude spectrogram Phase spectrogram reconstruction → Griffin & Lim algorithm [Griffin and Lim 1984] → Linear phase unwrapping [Magron 2016] Tested models: PCA (baseline) AE and DAE (several architectures) VAE (different β values) LSTM-AE Metrics: Root mean squared error (RMSE) PEMO-Q scores [Huber and Kollmeier 2006]
Fanny ROCHE - PhD Defense - 29 September 2020 26 / 48 Unsupervised representation learning Comparative study
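Since only the magnitude spectrogram is modeled, the phase must be reconstructed at synthesis time. A compact sketch of the Griffin & Lim iteration (STFT parameters and signal length are illustrative, not those used in the thesis):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_fft=512, hop=128, length=8192, n_iter=30, seed=0):
    """Estimate a time-domain signal whose STFT magnitude matches `mag`
    by alternating inverse/forward STFTs and re-imposing the target
    magnitude at each iteration (Griffin & Lim)."""
    rng = np.random.default_rng(seed)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    nov = n_fft - hop
    for _ in range(n_iter):
        _, x = istft(spec, nperseg=n_fft, noverlap=nov)
        # Keep a fixed length so the STFT grid stays consistent across iterations.
        x = x[:length] if x.size >= length else np.pad(x, (0, length - x.size))
        _, _, S = stft(x, nperseg=n_fft, noverlap=nov)
        spec = mag * np.exp(1j * np.angle(S))  # keep target magnitude, update phase
    _, x = istft(spec, nperseg=n_fft, noverlap=nov)
    return np.real(x[:length])

# Usage: analyze a test tone, discard its phase, then re-estimate it.
t = np.arange(8192) / 16000.0
tone = np.sin(2.0 * np.pi * 440.0 * t)
_, _, S_ref = stft(tone, nperseg=512, noverlap=384)
y = griffin_lim(np.abs(S_ref))
```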
Reconstruction accuracy evaluation on NSynth
(a) PCA, AE, DAE and LSTM-AE (b) VAE for different β values (PCA recalled)
Audio examples: original vs. PCA (enc = 100), PCA (enc = 32) and VAE (enc = 32) reconstructions
Cross-correlation of latent dimensions → with a latent space of dimension 16
Sound morphing
Audio examples: source and target sounds, and VAE-generated interpolations between them
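The morphing examples correspond to decoding latent vectors sampled between the source and target codes. A minimal sketch, assuming linear interpolation (the slides do not specify the interpolation scheme):

```python
import numpy as np

def morph(z_src, z_tgt, n_steps=5):
    """Latent-space morphing: return latent vectors on a straight line
    between the source and target codes; decoding each row with the
    trained VAE decoder yields the intermediate sounds."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([(1.0 - a) * z_src + a * z_tgt for a in alphas])

z_src, z_tgt = np.zeros(16), np.ones(16)   # toy latent codes
path = morph(z_src, z_tgt)                  # rows to feed to the decoder
```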
Conclusion

Good reconstruction audio quality ⇒ on both datasets
AE-based models are well adapted to the task
Latent vectors extracted by VAEs ⇒ good candidates for control
BUT the latent space is poorly related to perceptually relevant dimensions
Towards weak supervision using timbre perception
Objective: give a perceptual meaning to the dimensions extracted by the neural model

Questions:
→ Is it possible to add perceptual supervision so as to "force" the meaning of the latent dimensions?
→ Is it possible to change the behavior of the model using very few annotated data?

Related work
Timbre-based perceptual regularization [Esling et al. 2018]:
timbre space relying on several MDS studies [Grey 1977; Krumhansl 1989; Iverson and Krumhansl 1993; McAdams et al. 1995; Lakatos 2000]
fully-labeled dataset of orchestral instruments
perceptual regularization added to the VAE → additional term in the VLB to optimize
Weakly supervised learning ⇒ 2-step semi-supervised learning procedure [Hinton and Salakhutdinov 2007]

Methodology
Proposed 2-step learning procedure (the 2 steps are repeated N times):

1. Unsupervised pre-training → optimizing the classic VLB on XU ∪ XL:
L(φ, θ, β, x) = E_qφ(z|x)[log pθ(x|z)] − β DKL(qφ(z|x) ‖ pθ(z))
(reconstruction accuracy; regularization)

2. Supervised fine-tuning → optimizing the VLB with an extra regularization on XL:
L(φ, θ, β, x) = E_qφ(z|x)[log pθ(x|z)] − β DKL(qφ(z|x) ‖ pθ(z)) + α R(z, P)
(reconstruction accuracy; regularization; perceptual regularization)

Datasets:
Unlabeled dataset XU → the 1,233 samples of the ARTURIA dataset
Labeled dataset XL → the 80 samples annotated during the perceptual scaling test
Perceptual regularization metric: R(z, P) = MSE(z_1:8, p_s)
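The fine-tuning objective with this metric can be sketched as a minimized loss. Dimensions and label values below are illustrative, and the MSE reconstruction term and minimization sign convention are practical assumptions:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def finetune_loss(x, x_hat, mu, log_var, z, p_s, beta=1.0, alpha=1.0):
    """Supervised fine-tuning objective as a minimized loss:
    reconstruction error + beta * KL + alpha * R(z, P), where
    R(z, P) = MSE(z_1:8, p_s) ties the first 8 latent dimensions to the
    8 perceptual scores of the sound."""
    recon = np.mean((x_hat - x) ** 2)
    perceptual = np.mean((z[:8] - p_s) ** 2)      # R(z, P)
    return recon + beta * gaussian_kl(mu, log_var) + alpha * perceptual

x = np.zeros(513)                                  # toy spectrum frame
mu, log_var = np.zeros(16), np.zeros(16)           # encoder outputs
z = np.concatenate([np.full(8, 0.5), np.zeros(8)]) # sampled latent vector
p_s = np.full(8, 0.5)                              # hypothetical label vector
loss = finetune_loss(x, x, mu, log_var, z, p_s)
```

During pre-training, `alpha` would be 0 and all samples (labeled or not) would be used; fine-tuning applies the full loss on the 80 labeled samples only.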
Experiments
Reconstruction accuracy evaluation
(a) Different values of α (b) Different number of 2-step procedure iterations
Latent space organization as revealed by t-SNE [van der Maaten and Hinton 2008]
Perceptual evaluation using A/B testing
Objective: assess the effectiveness of the perceptual regularization using a simple listening test
Participants: 30 responses
Stimuli:
Time-related scales removed: → percussif → qui résonne → qui évolue
Remaining 5 scales (including Métallique and Agressif)
60 pairs of generated stimuli → 2 offset values
12 source samples per scale: → 3 very representative → 3 unrepresentative → 6 from the test set
Protocol:
Logistic random effects regression analysis [Li et al. 2011]

Two-fold purpose of the analysis:
Analyze participants' perception of the VAE variations
Investigate the influence of the dataset (train or test)

Observations
The perceptually-regularized model is able to generalize
Discrepancy between scales
Towards weak supervision using timbre perception Conclusion

Fair-to-good audio quality achievable for well-chosen α values
⇒ the extra regularization slightly degrades quality, BUT this can be tackled by enlarging both XU and XL

Influence of repeating the 2-step procedure not clearly evidenced

Preliminary validation of our methodology
⇒ ability to generalize despite very few labeled data
⇒ captured the acoustic properties of Agressif & Qui vibre very well
⇒ difficulty modeling Chaud & Soufflé
⇒ results to be confirmed for Métallique
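The trade-off governed by α can be made concrete with a simplified sketch of a perceptually regularized VAE objective (an assumption for illustration, not the exact thesis loss): reconstruction plus KL divergence plus an α-weighted term tying selected latent dimensions to the participants' ratings on the small labeled subset XL.

```python
import numpy as np

def perceptual_vae_loss(x, x_hat, mu, logvar, z_perc, ratings, alpha=0.1):
    """Simplified sketch of a semi-supervised VAE objective.
    z_perc: latent dimensions designated as perceptual controls;
    ratings: perceptual annotations for the labeled examples only."""
    recon = np.mean((x - x_hat) ** 2)                           # reconstruction error
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))  # KL to N(0, I)
    perc = np.mean((z_perc - ratings) ** 2)                     # perceptual regularization
    return recon + kl + alpha * perc
```

Increasing α strengthens the mapping between latent dimensions and perceptual scales but, as the slide notes, slightly degrades audio quality; enlarging XU and XL relaxes this trade-off.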
Conclusion and perspectives Content
1 Perceptual characterization of synthetic timbre State-of-the-art Free verbalization perceptual test Perceptual scaling test Conclusion
2 Unsupervised representation learning Methodology Comparative study Conclusion
3 Towards weak supervision using timbre perception Methodology Experiments Perceptual evaluation Conclusion
4 Conclusion and perspectives
Conclusion and perspectives Main contributions

Listed 784 terms used to describe synthesizer sounds in French
Identified 8 perceptual verbal descriptors:
◦ métallique (metallic) ◦ qui vibre (vibrating) ◦ chaud (warm) ◦ qui résonne (resonant) ◦ soufflé (breathy) ◦ qui évolue (evolving) ◦ percussif (percussive) ◦ agressif (aggressive)
Evaluated the degree of consensus on these 8 dimensions
Annotated a subset of 80 synthesizer sounds
Performed an extensive comparison of AE-based models on 2 different datasets (NSynth & ARTURIA)
⇒ validation of the use of these models, and of VAEs in particular
Proposed a new methodology to perceptually regularize VAEs
⇒ validation of the 2-step learning method, both objectively and perceptually
Conclusion and perspectives Perspectives

Deeper analysis of the perceptual dimensions
→ correlation/redundancy analysis
→ analyze the semantic relationships between the terms
→ investigate the underlying acoustic correlates

Improve the VAE framework
→ consider different distance metrics for the perceptual regularization
→ explore recurrent/dynamical VAEs [Girin et al. 2020]

Improve the global framework
→ enlarge the ARTURIA dataset
→ investigate more robust real-time phase reconstruction algorithms
→ train the VAE on complex spectrograms or temporal data [Nugraha et al. 2019]
→ explore neural vocoders
→ explore conditional GANs (C-GANs)
Thank you for your attention!
List of publications:
→ F. Roche, T. Hueber, S. Limier, and L. Girin (2019). "Autoencoders for music sound modeling: A comparison of linear, shallow, deep, recurrent and variational models". In: Proceedings of the Sound and Music Computing Conference (SMC). Málaga, Spain.
→ L. Girin, T. Hueber, F. Roche, and S. Leglaive (2019). "Notes on the use of variational autoencoders for speech and audio spectrogram modeling". In: Proceedings of the International Conference on Digital Audio Effects (DAFx). Birmingham, UK.
Journal submission:
→ F. Roche, T. Hueber, M. Garnier, S. Limier, and L. Girin. Article currently under blind review for publication in an international journal.
References
[Akuzawa et al. 2018] "Expressive speech synthesis via modeling expressions with variational autoencoder". In: Proceedings of the Conference of the International Speech Communication Association (Interspeech).
[Bengio et al. 2007] "Greedy layer-wise training of deep networks". In: Advances in Neural Information Processing Systems (NIPS).
[Bensa et al. 2004] "Perceptive and cognitive evaluation of a piano synthesis model". In: Proceedings of the International Symposium on Computer Music Modeling and Retrieval (CMMR).
[Bishop 2006] Pattern recognition and machine learning. Springer.
[Blaauw and Bonada 2016] "Modeling and transforming speech using variational autoencoders". In: Proceedings of the Conference of the International Speech Communication Association (Interspeech).
[Cance and Dubois 2015] "Dire notre expérience du sonore : nomination et référenciation". In: Langue Française.
[Castellengo 2015] Ecoute musicale et acoustique : avec 420 sons et leurs sonagrammes décryptés. Eyrolles.
[Cheminée et al. 2005] "Analyses des verbalisations libres sur le son du piano versus analyses acoustiques". In: Proceedings of the Conference on Interdisciplinary Musicology (CIM05).
[Colonel et al. 2017] "Improving neural net auto encoders for music synthesis". In: Audio Engineering Society Convention.
[Donahue et al. 2019] "Adversarial audio synthesis". In: Proceedings of the International Conference on Learning Representations (ICLR).
[Ehrette 2004] "Les voix des services vocaux, de la perception à la modélisation". PhD thesis. Paris 11.
[Engel et al. 2017] "Neural audio synthesis of musical notes with wavenet autoencoders". In: Proceedings of the International Conference on Machine Learning (ICML).
[Engel et al. 2019] "GANSynth: Adversarial neural audio synthesis". In: Proceedings of the International Conference on Learning Representations (ICLR).
[Esling et al. 2018] "Bridging audio analysis, perception and synthesis with perceptually-regularized variational timbre spaces". In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR).
[Faure 2000] "Des sons aux mots, comment parle-t-on du timbre musical ?" PhD thesis. Ecole des Hautes Etudes en Sciences Sociales (EHESS).
[Gaillard 2000] "Etude de la perception des transitoires d’attaque des sons de steeldrum : particularités acoustiques, transformation par synthèse et catégorisation". PhD thesis. Toulouse 2.
[Garnier et al. 2007] "Characterisation of voice quality in Western lyrical singing: From teachers’ judgements to acoustic descriptions". In: Journal of Interdisciplinary Music Studies 1.2.
[Girin et al. 2020] "Dynamical variational autoencoders: A comprehensive review". arXiv preprint arXiv:2008.12595.
[Grey 1977] "Multidimensional perceptual scaling of musical timbres". In: The Journal of the Acoustical Society of America 61.5.
[Griffin and Lim 1984] "Signal estimation from modified short-time Fourier transform". In: IEEE Transactions on Acoustics, Speech, and Signal Processing 32.2.
[Guyot 1996] "Etude de la perception sonore en termes de reconnaissance et d’appréciation qualitative : une approche par la catégorisation". PhD thesis. Le Mans.
[Hinton and Salakhutdinov 2006] "Reducing the dimensionality of data with neural networks". In: Science 313.5786.
[Hinton and Salakhutdinov 2007] "Using deep belief nets to learn covariance kernels for Gaussian processes". In: Advances in Neural Information Processing Systems (NIPS).
[Hochreiter and Schmidhuber 1997] "Long short-term memory". In: Neural Computation 9.8.
[Hsu et al. 2017] "Learning latent representations for speech generation and transformation". In: Proceedings of the Conference of the International Speech Communication Association (Interspeech).
[Huber and Kollmeier 2006] "PEMO-Q: A new method for objective audio quality assessment using a model of auditory perception". In: IEEE Transactions on Audio, Speech, and Language Processing (TASLP) 14.6.
[Iverson and Krumhansl 1993] "Isolating the dynamic attributes of musical timbre". In: The Journal of the Acoustical Society of America 94.5.
[Kingma and Welling 2014] "Auto-encoding variational Bayes". In: Proceedings of the International Conference on Learning Representations (ICLR).
[Krumhansl 1989] "Why is musical timbre so hard to understand?". In: Structure and Perception of Electroacoustic Sound and Music.
[Lakatos 2000] "A common perceptual space for harmonic and percussive timbres". In: Perception & Psychophysics 62.7.
[Lavoie 2013] "Conceptualisation et communication des nuances de timbre à la guitare classique". PhD thesis. Université de Montréal.
[Li et al. 2011] "Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes". In: BMC Medical Research Methodology.
[Magron 2016] "Reconstruction de phase par modèles de signaux : application à la séparation de sources audio". PhD thesis. Telecom ParisTech.
[McAdams et al. 1995] "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes". In: Psychological Research 58.3.
[Nugraha et al. 2019] "A deep generative model of speech complex spectrograms". In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[Rezende et al. 2014] "Stochastic backpropagation and approximate inference in deep generative models". In: Proceedings of the International Conference on Machine Learning (ICML).
[Sarroff and Casey 2014] "Musical audio synthesis using autoencoding neural nets". In: Proceedings of the Joint International Computer Music Conference (ICMC) and Sound & Music Computing Conference (SMC).
[Schaeffer 1966] Traité des objets musicaux. Le Seuil.
[Traube 2004] "An interdisciplinary study of the timbre of the classical guitar". PhD thesis. McGill University.
[van der Maaten and Hinton 2008] "Visualizing data using t-SNE". In: Journal of Machine Learning Research 9.
[Zacharakis et al. 2014] "An interlanguage study of musical timbre semantic dimensions and their acoustic correlates". In: Music Perception: An Interdisciplinary Journal 31.4.