<<

Historical Overview of Audio Spectral Modeling

Julius Smith CCRMA, Stanford University

Music 421 Applications Lecture

April 2, 2018

Julius Smith Music 421 Applications Lecture – 1 / 59 Milestones in Audio Spectral Modeling

• Fourier’s theorem (1822) Outline • (1898) Telharmonium • Voder (1920s) Voder • Channel Vocoder Vocoder (1920s)

Phase Vocoder • (1930s)

Additive Synthesis • Phase Vocoder (1966) FM Synthesis • Digital Organ (1968) Sinusoidal Modeling • (1969) Spectrogram Synth • FM Brass Synthesis (1970) DDSP • Synclavier 8-bit FM/Additive (1975) Future • FM singing voice (1978) • Sinusoidal Modeling (1985) • Sines + Noise (1988) • Sines + Noise + Transients (1988,1996,1998,2000) • Inverse FFT synthesis (1992) • Spectrogram Synthesis (2017) • Future Directions

Julius Smith Music 421 Applications Lecture – 2 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future Telharmonium (1898)

Julius Smith Music 421 Applications Lecture – 3 / 59 Telharmonium (Cahill 1898)

U.S. patent 580,035: “Art of and Apparatus for Generating and Distributing Music Electrically”

Julius Smith Music 421 Applications Lecture – 4 / 59 Telharmonium Rheotomes

Forerunner of the Hammond Organ Tone Wheels

Julius Smith Music 421 Applications Lecture – 5 / 59 Telharmonium Rotor (early “”)

Hammond influenced: https://en.wikipedia.org/wiki/Tonewheel

Julius Smith Music 421 Applications Lecture – 6 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future The Voder (1939)

Julius Smith Music 421 Applications Lecture – 7 / 59 The Voder (Homer Dudley — 1939 Worlds Fair)

> http://davidszondy.com/future/robot/voder.htm

Julius Smith Music 421 Applications Lecture – 8 / 59 Voder Keyboard

http://www.acoustics.hut.fi/publications/files/theses/ lemmetty mst/chap2.html — (from Klatt 1987)

Julius Smith Music 421 Applications Lecture – 9 / 59 Voder Schematic

http://ptolemy.eecs.berkeley.edu/~eal/audio/voder.html

Julius Smith Music 421 Applications Lecture – 10 / 59 Voder Demos

• Video Outline • Audio Telharmonium

Voder • Voder Keyboard • Voder Schematic • Voder Demos

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth

DDSP

Future

Julius Smith Music 421 Applications Lecture – 11 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP The Channel Vocoder (1928) Future (“Voice Coder”)

Julius Smith Music 421 Applications Lecture – 12 / 59 Vocoder Analysis & Resynthesis (Dudley 1928)

Analysis: Outline

Telharmonium • Ten analog bandpass filters between 250 and 3000 Hz: Voder Bandpass → rectifier → lowpass filter → amplitude envelope Channel Vocoder • Voiced/Unvoiced decision made • Vocoder Examples • Fundamental frequency F0 measured for voiced case Phase Vocoder

Additive Synthesis Synthesis:

FM Synthesis • Ten matching bandpass filters driven by a Sinusoidal Modeling

Spectrogram Synth ◦ “buzz source” (voiced), or DDSP ◦ “hiss source” (unvoiced) Future • Bands were scaled by amplitude envelopes and summed • Said to have an “unpleasant electrical accent” Related Speech Models: • The Vocoder is an early source-filter model for speech • Linear Predictive Coding (LPC) of speech is another

Julius Smith Music 421 Applications Lecture – 13 / 59 Vocoder Filter Bank Analysis/Resynthesis

magnitude, or Outline magnitude and phase extraction Telharmonium

Voder A Channel Vocoder x0(t) xˆ0(t) • Vocoder Examples f

Phase Vocoder Data Compression, A Transmission, Additive Synthesis xˆ(t) x(t) x1(t) Storage, xˆ1(t) Manipulation, FM Synthesis f Noise reduction, ... Sinusoidal Modeling

Spectrogram Synth

DDSP

Future A xN−1 xˆN−1

f

Analysis Processing Synthesis

Julius Smith Music 421 Applications Lecture – 14 / 59 Channel Vocoder Sound Examples

Outline • Original Telharmonium

Voder • 10 channels, sine carriers Channel Vocoder • • Vocoder Examples 10 channels, narrowband-noise carriers

Phase Vocoder

Additive Synthesis • 26 channels, sine carriers FM Synthesis • 26 channels, narrowband-noise carriers Sinusoidal Modeling • 26 channels, narrowband-noise carriers, channels reversed Spectrogram Synth DDSP • Phase Vocoder: Identity system in absence of modifications Future

• The FFT Phase Vocoder next transitioned to the Short-Time Fourier Transform (STFT) (Allen and Rabiner 1977)

Julius Smith Music 421 Applications Lecture – 15 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future The Phase Vocoder (1966)

Julius Smith Music 421 Applications Lecture – 16 / 59 Phase Vocoder Analysis for Additive Synthesis (1976)

Outline Analysis Model Synthesis Model ω ∆ω t Telharmonium Channel Filter ak ak(t) k+ k( ) Voder Response A F ∆ω Sine Osc Channel Vocoder k

ω Phase Vocoder 0 ωk Out Additive Synthesis • Early “channel vocoder” implementations (hardware) only FM Synthesis

Sinusoidal Modeling measured amplitude ak(t) (Dudley 1939) • Spectrogram Synth The “phase vocoder” (Flanagan and Golden 1966) added phase

DDSP tracking in each channel Future • Portnoff (1976) developed the FFT phase vocoder, which replaced the heterodyne comb in computer-music additive-synthesis analysis (James A. Moorer) • Inverse FFT synthesis (Rodet and Depalle 1992) gave faster sinusoidal oscillator banks

Julius Smith Music 421 Applications Lecture – 17 / 59 Amplitude and Frequency Envelopes

ak(t) Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling t ωk t φ˙k t Spectrogram Synth ∆ ( )= ( )

DDSP

Future 0

t

Julius Smith Music 421 Applications Lecture – 18 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future Additive Synthesis (1969)

Julius Smith Music 421 Applications Lecture – 19 / 59 Classic Additive-Synthesis Analysis (Heterodyne Comb)

Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis • Additive Analysis • Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth

DDSP

Future

John Grey 1975 — CCRMA Tech. Reports 1 & 2 (CCRMA “STANM” reports — available online)

Julius Smith Music 421 Applications Lecture – 20 / 59 Classic Additive-Synthesis (Sinusoidal Oscillator Envelopes)

Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis • Additive Analysis • Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth

DDSP

Future

John Grey 1975 — CCRMA Tech. Reports 1 & 2 (CCRMA “STANM” reports — available online)

Julius Smith Music 421 Applications Lecture – 21 / 59 Classic Additive Synthesis Diagram (Computer Music, 1960s)

Outline A1(t) f1(t) A2(t) f2(t) A3(t) f3(t) A4(t) f4(t)

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis • Additive Analysis • Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth

DDSP noise Future FIR Σ

4 t y(t)= Ai(t)sin ωi(t)dt + φi(0) Xi=1 Z0 

Julius Smith Music 421 Applications Lecture – 22 / 59 Classic Additive-Synthesis Examples

Outline • Bb Telharmonium • Eb Clarinet Voder • Oboe Channel Vocoder • Phase Vocoder • Tenor Saxophone Additive Synthesis • Additive Analysis • Trumpet • Additive Synthesis • English Horn FM Synthesis • French Horn Sinusoidal Modeling • Spectrogram Synth

DDSP • Future All of the above • Independently synthesized set

(Synthesized from original John Grey data)

Julius Smith Music 421 Applications Lecture – 23 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Frequency Modulation Synthesis Future (1973)

Julius Smith Music 421 Applications Lecture – 24 / 59 Frequency Modulation (FM) Synthesis

Outline FM synthesis is normally used as a spectral modeling technique Telharmonium • Voder Discovered and developed (1970s) by John M. Chowning

Channel Vocoder (CCRMA Founding Director)

Phase Vocoder • Key paper: JAES 1973 (vol. 21, no. 7)

Additive Synthesis • Commercialized by Yamaha Corporation: FM Synthesis ◦ • FM Synthesis DX-7 synthesizer (1983) • FM Formula ◦ OPL chipset (SoundBlaster PC sound card) • FM Patch • FM Spectra ◦ Cell phone ring tones • FM Examples • FM Voice • On the physical modeling front, synthesis of vibrating-string Sinusoidal Modeling waveforms using finite differences started around this time: Spectrogram Synth Hiller & Ruiz, JAES 1971 (vol. 19, no. 6) DDSP

Future

Julius Smith Music 421 Applications Lecture – 25 / 59 FM Formula

Outline

Telharmonium x(t)= Ac sin[ωct + φc + Am sin(ωmt + φm)] Voder where Channel Vocoder

Phase Vocoder (Ac,ωc,φc) specify the carrier sinusoid Additive Synthesis (Am,ωm,φm) specify the modulator sinusoid

FM Synthesis • FM Synthesis Can also be called phase modulation • FM Formula • FM Patch • FM Spectra • FM Examples • FM Voice

Sinusoidal Modeling

Spectrogram Synth

DDSP

Future

Julius Smith Music 421 Applications Lecture – 26 / 59 Simple FM “Brass” Patch (Chowning 1970–)

Jean-Claude Risset observation (1964–1969): Outline Brass bandwidth ∝ amplitude Telharmonium

Voder

Channel Vocoder

Phase Vocoder fm = f0 Additive Synthesis g FM Synthesis • FM Synthesis A F • FM Formula • FM Patch • FM Spectra • FM Examples fc = f0 • FM Voice

Sinusoidal Modeling Spectrogram Synth A F DDSP

Future Out

Julius Smith Music 421 Applications Lecture – 27 / 59 FM Harmonic Amplitudes (Bessel Function of First Kind)

Outline Harmonic number k, FM index β:

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis 1 • FM Synthesis • FM Formula • FM Patch 0.5 • FM Spectra )

• FM Examples β ( k • FM Voice J 0 Sinusoidal Modeling 0

Spectrogram Synth 2 −0.5 4 DDSP 0 5 Future 10 6 15 20 8 25 Order k 30 10 Argument β

Julius Smith Music 421 Applications Lecture – 28 / 59 Frequency Modulation (FM) Examples

All examples by John Chowning unless otherwise noted: Outline Telharmonium • FM brass synthesis Voder • Channel Vocoder Low Brass example

Phase Vocoder • Dexter Morril’s FM Trumpet Additive Synthesis • FM singing voice (1978) FM Synthesis • FM Synthesis Each formant synthesized using an FM operator pair • FM Formula (two sinusoidal oscillators) • FM Patch • FM Spectra • Chorus • FM Examples • FM Voice • Voices

Sinusoidal Modeling • Basso Profundo Spectrogram Synth • Other early FM synthesis DDSP

Future • Clicks and Drums • Big Bell • String Canon

Julius Smith Music 421 Applications Lecture – 29 / 59 FM Voice

Outline FM voice synthesis can be viewed as compressed modeling of Telharmonium spectral formants Voder

Channel Vocoder Carrier 2 Phase Vocoder Carrier 1 Carrier 3 Additive Synthesis

FM Synthesis • FM Synthesis • FM Formula

• FM Patch Magnitude • FM Spectra • FM Examples • FM Voice 0 Frequency f Sinusoidal Modeling Modulation Frequency (all three) Spectrogram Synth

DDSP

Future

Julius Smith Music 421 Applications Lecture – 30 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Sinusoidal Modeling Synthesis Future (1988)

Julius Smith Music 421 Applications Lecture – 31 / 59 Tracking Spectral Peaks in the Short-Time Fourier Transform

Outline Phases Telharmonium atan

Voder s(t) Channel Vocoder FFT Frequencies Phase Vocoder Quadratic Peak Additive Synthesis dB mag Peak tracking Interpolation FM Synthesis window w(n) Amplitudes Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise • S+N Examples • STFT peak tracking at CCRMA: mid-1980s (PARSHL program) • S+N FX • S+N XSynth • Sines + Transients • Motivated by vocoder analysis of piano tones • S + N + Transients • • S+N+T TSM Influences: STFT (Allen and Rabiner 1977), • S+N+T Freq Map ADEC (1977), MAPLE (1979) • S+N+T Windows • HF Noise Modeling • Independently developed for speech coding by McAulay and • HF Noise Band Quatieri at Lincoln Labs (1985) • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 32 / 59

Future Example Spectral Trajectories

Outline f Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise • S+N Examples • S+N FX • S+N XSynth • Sines + Transients • S + N + Transients t • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 33 / 59

Future Parametric Spectral Modeling

1 1 2 2 3 3 4 4 Outline A (t) f (t) A (t) f (t) A (t) f (t) A (t) f (t)

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise • S+N Examples • S+N FX • S+N XSynth white noise • filter(t) Sines + Transients u(t) • S + N + Transients ht(τ) • S+N+T TSM P • S+N+T Freq Map • S+N+T Windows 4 • HF Noise Modeling t i i i t ∗ • HF Noise Band y(t)= A (t)cos 0 ω (t)dt + φ (0) +(h u)(t) i=1 • S+N+T Examples P hR i Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 34 / 59

Future Sines + Noise Sound Examples

Outline Xavier Serra thesis demos (Sines + Noise signal modeling) Telharmonium • Voder Piano Channel Vocoder ◦ Original Phase Vocoder ◦ Sinusoids alone Additive Synthesis ◦ Residual after sinusoids removed FM Synthesis ◦ Sines + noise model Sinusoidal Modeling • Sinusoidal Modeling • Voice • Spectral Trajectories • Sines + Noise ◦ • S+N Examples Original • S+N FX ◦ Sinusoids • S+N XSynth ◦ • Sines + Transients Residual • S + N + Transients ◦ Synthesis • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 35 / 59

Future Musical Effects with Sines+Noise Models

Outline • Piano Effects Telharmonium ◦ Voder Pitch downshift one octave ◦ Channel Vocoder Pitch flattened

Phase Vocoder ◦ Varying partial stretching Additive Synthesis • Voice Effects FM Synthesis

Sinusoidal Modeling ◦ Frequency-scale by 0.6 • Sinusoidal Modeling ◦ Frequency-scale by 0.4 and stretch partials • Spectral Trajectories • Sines + Noise ◦ Variable time-scaling, deterministic to stochastic • S+N Examples • S+N FX • S+N XSynth • Sines + Transients • S + N + Transients • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 36 / 59

Future Cross-Synthesis with Sines+Noise Models

Outline • Voice “modulator” Telharmonium Voder • Creaking ship’s mast “carrier” Channel Vocoder • Voice-modulated creaking mast Phase Vocoder • Same with modified spectral envelopes Additive Synthesis

FM Synthesis

Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise • S+N Examples • S+N FX • S+N XSynth • Sines + Transients • S + N + Transients • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 37 / 59

Future Sines + Transients Sound Examples

Outline In this technique, the sinusoidal sum is phase-matched at the Telharmonium cross-over point only (with no cross-fade). Voder

Channel Vocoder • Marimba

Phase Vocoder ◦ Original Additive Synthesis ◦ Sinusoidal model FM Synthesis ◦ Sinusoidal Modeling Original attack, followed by sinusoidal model • Sinusoidal Modeling • Spectral Trajectories • Piano • Sines + Noise • S+N Examples ◦ Original • S+N FX ◦ Sinusoidal model • S+N XSynth • Sines + Transients ◦ Original attack, followed by sinusoidal model • S + N + Transients • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 38 / 59

Future Multiresolution Sines + Noise + Transients

Outline Why Model Transients Separately? Telharmonium

Voder • Sinusoids efficiently model spectral peaks over time Channel Vocoder • Filtered noise efficiently models spectral residual vs. t Phase Vocoder • Neither is good for abrupt transients in the waveform Additive Synthesis • Phase-matched oscillators are expensive FM Synthesis • More efficient to switch to a transient model during transients Sinusoidal Modeling • • Sinusoidal Modeling Need sinusoidal phase matching at the switching times • Spectral Trajectories • Sines + Noise Transient models: • S+N Examples • S+N FX • S+N XSynth • Original waveform slice (1988) • Sines + Transients • • S + N + Transients Wavelet expansion (Ali 1996) • S+N+T TSM • MPEG-2 AAC (with short window) (Levine 1998) • S+N+T Freq Map • • S+N+T Windows Frequency-domain LPC • HF Noise Modeling (time-domain amplitude envelope) (Verma 2000) • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 39 / 59

Future Time Scale Modification of Sines + Noise + Transients Models

Outline transients transients sines+ sines+ sines+ Telharmonium originalsignal noise noise noise Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis transients transients time-scaled sines+ sines+ sines+ FM Synthesis signal noise noise noise Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise time • S+N Examples • S+N FX • S+N XSynth Time-Scale Modification (TSM) becomes well defined: • Sines + Transients • S + N + Transients • Transients are translated in time • S+N+T TSM • S+N+T Freq Map • Sinusoidal envelopes are scaled in time • S+N+T Windows • • HF Noise Modeling Noise-filter envelopes also scaled in time • HF Noise Band • Dual of TSM is frequency scaling • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 40 / 59

Future Sines + Noise + Transients Time-Frequency Map

16 Outline 14 Telharmonium 12 Voder

Channel Vocoder 10

Phase Vocoder 8

Additive Synthesis 6 frequency [kHz] FM Synthesis 4

Sinusoidal Modeling 2 • Sinusoidal Modeling • 0 Spectral Trajectories 0 50 100 150 200 250 • Sines + Noise 1 • S+N Examples 0.5 • S+N FX • S+N XSynth 0 • Sines + Transients amplitude • S + N + Transients −0.5 • S+N+T TSM −1 • S+N+T Freq Map 0 50 100 150 200 250 • S+N+T Windows time [milliseconds] • HF Noise Modeling (Levine 1998) • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 41 / 59

Future Corresponding Analysis Windows

Outline

Telharmonium transient Voder

Channel Vocoder

Phase Vocoder high octave Additive Synthesis

FM Synthesis

Sinusoidal Modeling

• Sinusoidal Modeling middle octave • Spectral Trajectories • Sines + Noise • S+N Examples low octave • S+N FX • S+N XSynth • Sines + Transients • S + N + Transients • S+N+T TSM • S+N+T Freq Map • S+N+T Windows amplitude • HF Noise Modeling • HF Noise Band 0 50 100 150 200 250 • S+N+T Examples time [milliseconds]

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 42 / 59

Future Quasi-Constant-Q (Wavelet) Time-Frequency Map

16 Outline 14 Telharmonium

Voder 12

Channel Vocoder 10

Phase Vocoder 8

Additive Synthesis 6 frequency [kHz] FM Synthesis 4

Sinusoidal Modeling 2 • Sinusoidal Modeling • Spectral Trajectories 0 0 4 50 100 150 200 250 • Sines + Noise x 10 • S+N Examples 2 • S+N FX 1 • S+N XSynth 0 • Sines + Transients

• S + N + Transients amplitude −1 • S+N+T TSM −2 • S+N+T Freq Map 0 50 100 150 200 250 • S+N+T Windows time [milliseconds] • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 43 / 59

Future Bark-Band Noise Modeling (Levine 1998)

Outline 85

Telharmonium 80 Voder

Channel Vocoder 75

Phase Vocoder 70 Additive Synthesis 65 FM Synthesis

Sinusoidal Modeling 60 • Sinusoidal Modeling • Spectral Trajectories 55 • Sines + Noise magnitude [dB] • S+N Examples 50 • S+N FX • S+N XSynth 45 • Sines + Transients • S + N + Transients 40 • S+N+T TSM • S+N+T Freq Map 35 • S+N+T Windows • HF Noise Modeling 30 • 0 500 1000 1500 2000 2500 3000 3500 4000 4500 HF Noise Band frequency [Hz] • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 44 / 59

Future Amplitude Envelope for One Noise Band

Outline 80

Telharmonium 70 Voder 60 Channel Vocoder

Phase Vocoder 50 Original Mag. [dB] Additive Synthesis 40 FM Synthesis 0 50 100 150 200 250 300 350

Sinusoidal Modeling • Sinusoidal Modeling 80

• Spectral Trajectories 75 • Sines + Noise 70 • S+N Examples 65 • S+N FX 60 • S+N XSynth 55 • Sines + Transients LSA Mag. [dB] • S + N + Transients 50 • S+N+T TSM 45 • 0 50 100 150 200 250 300 350 S+N+T Freq Map time [milliseconds] • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 45 / 59

Future Sines + Noise + Transients Sound Examples

Outline Scott Levine Thesis Demos (Sines + Noise + Transients at 32 kbps) Telharmonium (http://ccrma.stanford.edu/~scottl/thesis.html) Voder

Channel Vocoder

Phase Vocoder Additive Synthesis “It Takes Two” by Rob Base & DJ E-Z Rock FM Synthesis

Sinusoidal Modeling • Original • Sinusoidal Modeling • MPEG-AAC at 32 kbps • Spectral Trajectories • Sines + Noise • Sines+transients+noise at 32 kbps • S+N Examples • S+N FX • S+N XSynth • Multiresolution sinusoids • Sines + Transients • S + N + Transients • Residual Bark-band noise • S+N+T TSM • Transform-coded transients (AAC) • S+N+T Freq Map • S+N+T Windows • Bark-band noise above 5 kHz • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 46 / 59

Future Time Scale Modification using Sines + Noise + Transients

Outline Scott Levine Thesis Demos (Sines + Noise + Transients at 32 kbps) Telharmonium (http://ccrma.stanford.edu/~scottl/thesis.html) Voder

Channel Vocoder

Phase Vocoder Additive Synthesis Time-Scale Modification (pitch unchanged) FM Synthesis • S+N+T time-scale factors [2.0, 1.6, 1.2, 1.0, 0.8, 0.6, 0.5] Sinusoidal Modeling • Sinusoidal Modeling • Spectral Trajectories • Sines + Noise • S+N Examples S+N+T Pitch Shifting (timing unchanged) • S+N FX • S+N XSynth • Pitch-scale factors [0.89, 0.94, 1.00, 1.06, 1.12] • Sines + Transients • S + N + Transients • S+N+T TSM • S+N+T Freq Map • S+N+T Windows • HF Noise Modeling • HF Noise Band • S+N+T Examples

Spectrogram Synth

DDSPJulius Smith Music 421 Applications Lecture – 47 / 59

Future Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future Spectrogram Synthesis (2017)

Julius Smith Music 421 Applications Lecture – 48 / 59 NSynth: Neural Audio Synthesis

Outline • NSynth uses deep neural networks to generate sounds at the Telharmonium level of individual samples Voder • Audio Morphing in “neural latent space” Channel Vocoder • Google Magenta project: Phase Vocoder https://magenta.tensorflow.org/nsynth Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth • NSynth • Style Transfer

DDSP

Future

Julius Smith Music 421 Applications Lecture – 49 / 59 Neural Style Transfer for Audio Spectrograms (2017)

Outline Learned Spectrogram X(ω, t) minimizes a sum of loss terms: Telharmonium • Voder content loss = L2 distance between current activation filters and

Channel Vocoder those of “content” spectrogram

Phase Vocoder • style loss = normalized L2 distance between Gram matrix of filter Additive Synthesis activations of selected convolutional layers chosen as FM Synthesis corresponding to “style” Sinusoidal Modeling • differences in temporal and frequency energy envelopes Spectrogram Synth • • NSynth NIPS paper by Verma and Smith (2017): • Style Transfer https://arxiv.org/abs/1801.01589 DDSP • Original paper for images (2015): Gatys, Leon A., Alexander S. Future Ecker, and Matthias Bethge. “A neural algorithm of artistic style.” arXiv preprint arXiv:1508.06576(2015): https://arxiv.org/abs/1508.06576 • Nice intro (2016): http://yeephycho.github.io/2016/09/14/neural-style/

Julius Smith Music 421 Applications Lecture – 50 / 59 Spectrogram Style Transfer, Continued

• Left: Tuning-fork “style” imposed on a harp sample: Adaptive filtering down to fundamental observed • Right: Violin “style” imposed on singing voice: Bandwidth extension observed

Julius Smith Music 421 Applications Lecture – 51 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future Differentiable DSP (2019)

Julius Smith Music 421 Applications Lecture – 52 / 59 DDSP: Differentiable DSP

• Jesse Engel et al. at Google Magenta Group • Neural network analysis/synthesis for differentiable signal models • Additive Synthesis example: • Loudness normalized by A-weighted log-power spectrum • Fundamental Frequency F0 from pretrained CREPE pitch detector • Timbre vector Z from autoencoder • Timbre vector decodes to sinusoidal amplitude trajectories

Julius Smith Music 421 Applications Lecture – 53 / 59 DDSP Encoder

Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth • Loudness and F0 of Target Audio have been normalized away DDSP • MFCC = Mel Frequency Cepstral Coefficients • DDSP • DDSP Encoder • GRU = Gated Recurrent Unit (Cho 2014) - similar to LSTM = • DDSP Decoder Long/Short-Term Memory architecture • DDSP MLP • Dense = Fully Connected Linear Deep Neural Net (512-to-16 Future compression step) • F0 and Loudness normalization leave only timbre to be encoded

Julius Smith Music 421 Applications Lecture – 54 / 59 DDSP Decoder

Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth • MLP = Multi-Layer Perceptron (classical neural network) DDSP • DDSP • 250 time steps (frames) included • DDSP Encoder • • DDSP Decoder Output is additive synthesis parameters (sines + filtered noise) • DDSP MLP

Future

Julius Smith Music 421 Applications Lecture – 55 / 59 DDSP MLP

Outline

Telharmonium

Voder

Channel Vocoder

Phase Vocoder

Additive Synthesis

FM Synthesis

Sinusoidal Modeling

Spectrogram Synth • RELU = Rectified Linear Unit (half-wave rectifier) DDSP • DDSP • 3 layers and 512 Units • DDSP Encoder • • DDSP Decoder Entire model is differentiable end to end, so back-propagation • DDSP MLP can optimize everything together (ADAM optimizer used) Future • Optimization is generally Stochastic Gradient Descent

Julius Smith Music 421 Applications Lecture – 56 / 59 Outline Telharmonium Voder Channel Vocoder Phase Vocoder Additive Synthesis FM Synthesis Sinusoidal Modeling Spectrogram Synth DDSP Future Summary and Future Prospects

Julius Smith Music 421 Applications Lecture – 57 / 59 Spectral Modeling History Highlights

• Bernoulli’s modal sums (1733) Perceptual audio coding: • Fourier’s initial theorem (1822) • Princen-Bradley filterbank • Telharmonium (1906) (1986) • Hammond organ (1930s) • K. Brandenburg thesis (1989) • Channel Vocoder (1939) • Auditory masking usage • Phase Vocoder (1966) • Dolby AC2 • “Additive Synthesis” (1969) • Musicam • FFT Phase Vocoder (1976) • ASPEC • Sinusoidal Modeling • MPEG-I,II,IV (1977,1979,1985) (S+N+T “parametric sounds”) • Sines+Noise (1989) • Sines+Transients (1989) • TF Reassignment (1995) • Sines+Noise+Transients (1998)

Julius Smith Music 421 Applications Lecture – 58 / 59 Future Prospects

Observations: • Sinusoidal modeling of sound is “Unreasonably Effective” • Basic “auditory masking” discards ≈ 90% information • Interesting neuroscience observation: “... most neurons in the primary auditory cortex A1 are silent most of the time ...” (from “Sparse Time-Frequency Representations”, Gardner and Magnesco, PNAS:103(16), April 2006) • What is the right “psychospectral model” for sound? ◦ The cochlea of the ear is a real-time spectrum analyzer ◦ How is the “ear’s spectrogram” represented at higher levels?

Julius Smith Music 421 Applications Lecture – 59 / 59