ICML 2019

FloWaveNet: A Generative Flow for Raw Audio

Sungwon Kim1, Sang-gil Lee1, Jongyoon Song1, Jaehyeon Kim2, Sungron Yoon1,3

1Seoul National University, 2Kakao Corporation, 3ASRI, INMC, Institute of Engineering Research, Seoul National University

Poster 6/12 6:30 PM @Pacific Ballroom #2 WaveNet

)

log $% &':) = + log $% &, &., ,-'

https://deepmind.com/blog/wavenet-generative-model-raw-audio/ WaveNet

Sequential sampling )

log $% &':) = + log $% &, &., ,-'

https://deepmind.com/blog/wavenet-generative-model-raw-audio/ Previous parallel models

Pre-trained WaveNet

Inverse Autoregressive Flows (IAFs)

Probability Density Distillation

!" #$ % ||#' %

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on . 2018. Previous parallel speech synthesis models

Pre-trained WaveNet

Parallel sampling

Inverse Autoregressive Flows (IAFs)

Probability Density Distillation

!" #$ % ||#' %

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018. Previous parallel speech synthesis models

Pre-trained WaveNet

Parallel sampling

Inverse Autoregressive Flows (IAFs)

Probability Density Distillation Power Loss Perceptual Loss !" #$ % ||#' % + Contrastive Loss Frame Loss

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018. Our Objectives

• Simplify the training procedure for parallel sampling

• Maintain the quality of speech samples Our Objectives

• Simplify the training procedure for parallel sampling

• Maintain the quality of speech samples

Flow-based generative models for raw audio! FloWaveNet

3% 3+ ,-

Raw audio Gaussian Noise

2,- & Training phase log $ & = log $ , & + log det % ':) + - ':) 2& FloWaveNet

5% 5+ ,-

,34 Raw audio - Gaussian Noise

2,- & Training phase log $ & = log $ , & + log det % ':) + - ':) 2&

34 Sampling phase 6 = 6':) ~ 5+ 6 = 8 9, ; , &< = ,- (6) FloWaveNet

5% 5+ ,-

,34 Raw audio - Gaussian Noise

2,- & Training phase log $ & = log $ , & + log det % ':) + - ':) 2&

34 Sampling phase 6 = 6':) ~ 5+ 6 = 8 9, ; , &< = ,- (6)

34 Both the transformation ,- and ,- are designed to be computed efficiently à Efficient training & Parallel sampling FloWaveNet

7 456 7 459

4:

3 47 ⋅ 47 & log $ & = log $ , & + . log det 56 59 % ':) + ':) 3& / Mean Opinion Scores

FloWaveNet ≥ Gaussian IAF Sampling speed

FloWaveNet ≅ Gaussian IAF ≅ Parallel WaveNet >> Autoregressive WaveNet

1000s times faster Conclusion

• FloWaveNet produces high quality audio samples as well as previous distilled models.

• FloWaveNet synthesizes audio samples in parallel – w/o well pre-trained WaveNet (No distillation!) – w/o auxiliary loss terms

Demo page Code Poster 6/12 6:30 PM @Pacific Ballroom #2 16