The Development of a Parametric Real-Time Voice Source Model for use with Vocal Tract Modelling Synthesis on Portable Devices

Jacob Harrison MSc by Research University of York Electronics November 2014

Abstract

This research is concerned with the natural synthesis of the human voice, in particular the expansion of the LF-model voice source synthesis method. The LF-model is a mathematical representation of the acoustic signal produced by the vocal folds in the human speech production system. Although it has been used in many voice synthesis applications since its inception in the 1970s, the parametric capabilities of this model have remained mostly unexploited in terms of real-time manipulation. With recent advances in dynamic acoustic modelling of the human vocal tract using the two-dimensional digital waveguide mesh (2D DWM), a logical step is to include a real-time parametric voice source model rather than the static LF-waveform archetype.

This thesis documents the development of a parameterised LF-model to be used in conjunction with an iOS-based 2D DWM vocal tract synthesiser, designed with the further study of voice synthesis naturalness as well as improvements to assistive technology in mind.


Table of Contents

Abstract ...... i

List of Figures ...... v

List of Tables ...... vii

List of Accompanying Material ...... viii

Acknowledgements ...... ix

Author’s Declaration ...... x

1. Introduction ...... 1
1.1 Thesis Overview ...... 1
1.2 Thesis Structure ...... 3

2. Literature Review ...... 5
2.1 The ‘Source + Modifier’ Principle ...... 5
2.2 Physiology of the Vocal Folds ...... 8
2.3 Voice Types ...... 11
2.4 Modelling the Voice Source ...... 16
2.4.1 Existing Methods for Voice Source Synthesis ...... 17
2.4.2 The Liljencrants-Fant Glottal Flow Model ...... 20
2.5 Vocal Tract Modelling ...... 28
2.6 ‘Naturalness’ in Speech Synthesis ...... 37

3. A Parametric, Real-Time Voice Source Model ...... 44
3.1 Motivation for Design ...... 45
3.2 Specifications ...... 47
3.3 Design ...... 48
3.3.1 Choice of Parameters ...... 48
3.3.2 Wavetable Synthesis ...... 53
3.3.3 Voice Types ...... 55
3.3.4 iOS Interface ...... 58
3.3.5 Final Design ...... 59
3.4 Implementation ...... 61
3.4.1 Implementation in MATLAB ...... 62
3.4.2 Implementation in iOS ...... 69


3.5 System Testing ...... 80
3.5.1 Waveform Reproduction ...... 83
3.5.2 Fundamental Frequency ...... 86
3.5.3 ‘Vocal Tension’ Parameters ...... 86
3.5.4 Automatic Pitch-Dependent Voice Types ...... 90
3.5.5 Automatic f0 Trajectory ...... 95
3.6 Conclusions ...... 97

4. Vocal Tract Modelling ...... 98
4.1 Vocal Tract Modelling with the 2D DWM ...... 98
4.2 Implementation of the 2D DWM in MATLAB ...... 102
4.3 Implementation in iOS ...... 105
4.4 System Testing ...... 109
4.4.1 Formant Analysis ...... 109
4.4.2 Multiple Vowels ...... 111
4.4.3 System Performance ...... 113
4.5 Conclusions ...... 114

5. Summary and Analysis ...... 115
5.1 Summary ...... 115
5.2 Analysis ...... 116
5.2.1 Voice Source Synthesis using the LF-Model ...... 116
5.2.2 Extensions to the LF-Model ...... 116
5.2.3 Use within 2D DWM Vocal Tract Model ...... 119
5.2.4 Core Aims ...... 119
5.3 Future Research ...... 121
5.3.1 Issues within LF-Model implementation ...... 121
5.3.2 Further Extensions to the Voice Source Model ...... 122
5.3.3 Multi-touch, Gestural User Interfaces ...... 124
5.3.4 Implementation of Dynamic Impedance Mapping within the 2D DWM ...... 126
5.4 Conclusion ...... 126

Appendix A – ‘LFModelFull.m’ MATLAB Source Code ...... 128

Appendix B – ‘ViewController.h’ LFGen App Header File ...... 133

Appendix C – ‘ViewController.m’ LFGen App Main File ...... 134

Appendix D – ‘AudioEngine.h’ LFGen App Header File ...... 136


Appendix E – ‘AudioEngine.m’ LFGen App Main File ...... 139

References ...... 170


List of Figures

Figure 2.1 The human vocal system 7
Figure 2.2 Cross-section of the human speech system 9
Figure 2.3 Glottal flow waveform and derivative 10
Figure 2.4 Comparison between F- and L- model 22
Figure 2.5 Annotated LF-model flow derivative waveform 23
Figure 2.6 Vocal tract represented as a series of tubes 29
Figure 2.7 1D Digital waveguide structure 30
Figure 2.8 Achieving a cross-sectional area function from MRI data 31
Figure 2.9 2D Digital waveguide mesh structure 31
Figure 2.10 Raised Cosine Function 34
Figure 2.11 2D and 3D DWM topologies 36
Figure 2.12 von Kempelen’s ‘Speaking Machine’ 39
Figure 2.13 The ‘Uncanny Valley’ Effect 42
Figure 3.1 ‘Typical’ LF waveform 49

Figure 3.2 ‘Typical’ LF waveform with varying te value 50

Figure 3.3 ‘Typical’ LF waveform with varying tp value 51

Figure 3.4 ‘Typical’ LF waveform with varying ta value 52
Figure 3.5 LFGen app interface with CorePlot waveform display 55
Figure 3.6 ‘Breathy’ voice waveform 58
Figure 3.7 Black box diagram for LFGen app 59
Figure 3.8 Software diagram for LFGen app 70
Figure 3.9 LFGen app interface (final version) 79
Figure 3.10 3D-printed vocal tract model 81
Figure 3.11 Modal voice type waveform and spectrum 83
Figure 3.12 Breathy voice type waveform and spectrum 84
Figure 3.13 Vocal Fry voice type waveform and spectrum 84
Figure 3.14 Falsetto voice type waveform and spectrum 84
Figure 3.15 ‘Typical’ voice type waveform and spectrum 85
Figure 3.16 ‘Typical’ voice type waveform with varying vocal tension 87
Figure 3.17 ‘Typical’ voice type spectrum 87
Figure 3.18 ‘Typical’ voice type waveform with minimum VT 88
Figure 3.19 ‘Typical’ voice type waveform with maximum VT 88


Figure 3.20 ‘Typical’ voice type waveform with varying ta values 88
Figure 3.21 ‘Typical’ voice type spectrum 89

Figure 3.22 ‘Typical’ voice type spectrum with minimum ta 89

Figure 3.23 ‘Typical’ voice type spectrum with maximum ta 89
Figure 3.24 Waveform of an f0 sweep with ‘auto-voice’ enabled 91
Figure 3.25 Waveform between 24-52 Hz with ‘auto-voice’ enabled 92
Figure 3.26 Waveform between 52-94 Hz with ‘auto-voice’ enabled 92
Figure 3.27 Waveform between 94-207 Hz with ‘auto-voice’ enabled 93
Figure 3.28 Waveform between 207-208 Hz with ‘auto-voice’ enabled 93
Figure 3.29 Waveform above 288 Hz with ‘auto-voice’ enabled 94
Figure 3.30 Spectrogram of human /A/ vowel with varying f0 95
Figure 3.31 Spectrogram of synthesised /A/ vowel with varying f0 96
Figure 4.1 Spectrogram of synthesised /A/ vowel with ‘typical’ voice 110
Figure 4.2 English vowel chart 111
Figure 4.3 Xcode performance check 113
Figure 5.1 Idealised Vocal Fry waveform 123
Figure 5.2 HandSynth touchscreen interface 125
Figure 5.3 Proposed multitouch interface design 125


List of Tables

Table 2.1 Four voice types and their corresponding waveforms 12
Table 2.2 Four voice types with spectra, pitch range and noise amount 14
Table 2.3 Four male voice types and their timing parameter values 25
Table 3.1 Five LFGen voice types and their timing parameter values 56
Table 4.1 Synthesised formants vs average English male speech formants 112


List of Accompanying Material

The following material can be found on the accompanying data CD:

1. A PDF of this document
2. ‘Audio Examples’ folder – synthesised voice types and 2D DWM vowels:
   a. ‘Breathy110Hz.wav’ - breathy voice type at 110 Hz
   b. ‘Falsetto110Hz.wav’ - falsetto voice type at 110 Hz
   c. ‘Modal110Hz.wav’ - modal voice type at 110 Hz
   d. ‘Typical3-bird.wav’ - typical voice type with /3/ vowel
   e. ‘Typical110Hz.wav’ - typical voice type at 110 Hz
   f. ‘TypicalA-bart.wav’ - typical voice type with /A/ vowel
   g. ‘TypicalAe-Bat.wav’ - typical voice type with /Ae/ vowel
   h. ‘TypicalI-beet.wav’ - typical voice type with /I/ vowel
   i. ‘TypicalQ-bod.wav’ - typical voice type with /Q/ vowel
   j. ‘TypicalU-food.wav’ - typical voice type with /U/ vowel
   k. ‘VocalFryFu110Hz.wav’ - vocal fry voice type at 110 Hz
3. ‘Code Listings’ folder
   a. ‘LFGenMkVI.zip’ - compressed folder containing Xcode project for LFGen iOS app
   b. ‘LFModelF0Data.m’ - MATLAB script for producing a synthesised vowel for a given voice type with an f0 sweep taken from a voice recording
   c. ‘LFModelFull.m’ - MATLAB script for producing any voice type, with options for pitch, amplitude, duration, breathiness and vocal tension
4. Demonstration video – ‘LFGenDemoVideo.mp4’


Acknowledgements

To my parents, thank you for your constant love, support and encouragement throughout this project.

To my supervisor David Howard, thank you for the inspiring supervisions and general advice during this project and others throughout my time at York.

To Steve, Amelia, Laurence, Becky, Andrew, Tom, Eyal, Frank, Jude and Helena, thank you for some truly memorable crossword sessions during the Audio Lab lunch breaks, and the near-constant supply of cake.

To Jiajun, Ed and Simon, your patience and understanding with the often-frustrating life of a post-graduate researcher made our house a pleasure to come back to after many late nights in the library.

To Dimitri, your expertise and willingness to teach iOS and Core Audio helped this project materialise at a crucial point in the development stages.

Special thanks to my friends on both sides of the country, especially Benedict, Sam, JP, Ben, Mike, Annie and Rosie.


Author’s Declaration

The work presented in this thesis is entirely the author’s own, with any substantial external influences attributed in the text. None of the content in this thesis has been published by the author in any form. This work has not previously been presented for an award at this, or any other, University.



1. Introduction

The title of this thesis is The Development of a Parametric Real-Time Voice Source Model for use with Vocal Tract Modelling Synthesis on Portable Devices. The research project described herein is concerned with digital modelling of the human voice source to help improve the naturalness of existing speech synthesis technology. This thesis contains an analysis of existing voice source models, followed by a description of the development of a voice source modelling application for iOS devices.

This chapter introduces the key themes of this research, and the motivation for this specific project. An overview of the remaining chapters is given in section 1.2.

1.1 Thesis Overview

The human voice is the most expressive and versatile instrument we possess. Whether delivering a public speech, singing in a church choir or having a private conversation, the sheer flexibility of the vocal instrument allows us to convey a huge spectrum of human emotion, with the subtlest of expressive touches. It is not surprising that a totally accurate reproduction of the human vocal system has not yet been achieved. Apple’s Siri software [1] is capable of producing speech output that, on a casual listen, can sound indistinguishable from human speech; however, the software’s vocabulary is limited to pre-recorded voice sounds. The DECtalk system (commonly associated with Stephen Hawking’s communication aid) [2] is instantly recognisable as a computerised or ‘robotic’ voice but has an unlimited vocabulary, as it can produce any speech sounds. The compromises inherent in both these systems are informed by the context in which they are used – Siri users do not rely on the software to communicate, but might prefer a pleasant voice. Users of communication aids such as DECtalk rely on the ability to convey any information in an efficient and intelligible manner, with naturalness or realism being of lesser importance.

The work described in this thesis is concerned with the idea of contributing to a voice synthesis system that is both versatile and expressive. This work takes into account the importance of the voice source (discussed in Chapter 2) in human speech production, and aims to look at ways in which a more sophisticated voice source model can be incorporated in existing speech synthesis applications.

The motivation for this research comes from two places of interest. Firstly, natural voice synthesis provides a fascinating research area, with inspiration from and implications for a variety of disciplines such as engineering, psychoacoustics, linguistics, voice pathology and even philosophy. The software developed for this work was designed predominantly as a research tool that could be used in any of these fields, as an input source for a new vocal tract model, for example, or as a means of exploring the nature of voice source variation in the perception of synthesised voices.


As well as a general interest in voice synthesis, the impact of related software for assistive technology applications is considered a major motivation for improving the technology in this field. This partly informed the decision to focus on portable devices such as tablets and smartphones, which, for some users of assistive technology, have become useful and often essential items [3] [4]. Whilst the goal of this work was never to develop a fully formed communication aid, it is hoped that the research and software described herein will contribute to future developments for such an application.

1.2 Thesis Structure

Chapter 2 provides a summary of the existing literature on topics related to this work. First, the ‘source + modifier’ model of speech production and voice synthesis is explained, followed by a description of voice source physiology. The main ‘voice types’ are then introduced, and an overview of voice source modelling is given. Vocal tract modelling techniques are then described, including a description of the digital waveguide mesh, which is used to model the vocal tract in this work. Finally, previous research projects on the subject of ‘naturalness’ are recounted to set the work in context.

Chapter 3 describes the majority of the development process for a parametric, real-time voice source model. The general motivation for this design is given as well as a technical specification. The design of the software is described, followed by an implementation report and system testing results.


Chapter 4 documents the process of porting an existing 2D digital waveguide mesh model of the vocal tract, first to MATLAB and then to iOS. Chapter 5 concludes the work, with an analysis of the project as a whole, followed by a brief exploration of potential future work on the subject.


2. Literature Review

This chapter summarises the key themes of the research undertaken, and discusses existing literature on the subject. The impetus for this research came from the conclusions of two earlier research projects [5], which dealt with the concept of ‘naturalness’ in voice synthesis and made attempts to improve or explore this notion through real-time control.

During the initial stages of the current project, it was concluded that a different approach should be taken, namely improving the synthesis engine, rather than its interface. For the sake of completeness, and to place this work in context, a brief summary of these earlier studies and related literature is included. The relevant literature can, therefore, be split into four key areas:

• voice source physiology and acoustics

• speech synthesis and vocal tract modelling

• ‘naturalness’ in speech synthesis

• voice source modelling

The latter (voice source modelling) is the primary research area.

2.1 The ‘Source + Modifier’ Principle

Before discussing the physiology and acoustics of the voice source, it is necessary to define what is meant by the ‘voice source’ in relation to speech production as a whole. It is widely understood that the human speech system can be appropriately described as a sound source acting in combination with a set of sound modifiers.

Howard and Murphy [6] provide a detailed introduction to voice science, which encompasses everything from speech system physiology to speech and singing recording techniques. This book includes another component to the ‘source and modifier’ analogue: the ‘power source’, being the lungs. It is important to include the power source when considering human speech production; however, in synthesised speech, the airflow from the lungs is (usually) not incorporated into the synthesis engine, so a single ‘voice source’ can be considered as an approximation of the waveform created when the airstream resulting from lung pressure acts on the vocal folds. The sound modifiers are the acoustic cavities between the glottis and the lips (the vocal tract), and the articulators (the tongue, lips and jaw), which modify the voice source signal by acoustically filtering certain frequencies, and creating speech components such as consonants. Figure 2.1 displays a cross-section of the human vocal system, detailing the power and noise source, compared with the voice source waveform created when they act together.


Figure 2.1 – The human vocal system with waveform of voice source (equivalent to power source + noise source)

It should be noted that, in reality, the vocal folds and vocal tract do not act fully independently of each other [7], and a truly accurate model of the speech system would take into account the cross-coupled relationship between the vocal tract and vocal folds [8]. Most existing voice source models remain fairly rudimentary, staying faithful to the discretised model presented above [7] [9]. There are advantages and disadvantages to both approaches - complex, cross-coupled, physical models are able to replicate the behaviour of the vocal folds under certain conditions, at the expense of computational ease. Rudimentary mathematical models of the glottal flow waveform can be more computationally efficient, at the expense of realistic behaviour under certain conditions. However, the flexibility offered by these simpler models allows for increased functionality in terms of controlling their acoustic response to given conditions.

2.2 Physiology of the Vocal Folds

Fig. 2.2 below shows a cross-section of the voice production system in humans. Voice production begins with the diaphragm below the lungs and the intercostal muscles surrounding them. At rest, the diaphragm is bowed upwards, and flattens out when contracted. When the diaphragm is contracted and the intercostal muscles expand the ribs, air enters the lungs. Breathing out requires the lungs to be compressed in some manner, through contraction of the intercostal or abdominal muscles [6]. Airflow from the lungs then passes towards the glottis.

The glottis is the area between the vocal folds. The vocal folds are described as ‘the vibrating elements in the larynx’ [6] and are the two mucosal membranes that traverse either side of the glottis, and meet in the middle to close the larynx completely. The prevailing theory for the kinematic process of vocal fold vibration is attributed to the Bernoulli effect [10]. This is the same process that is used to describe lift in aeroplanes, helicopters and aerofoils, and occurs when an airstream passes over a curved surface, creating an area of low pressure due to the faster airstream closer to the curve. When air passes through the glottis, the vocal folds are forced open. The curvature of the open folds creates an area of low pressure in between and below them, drawing the folds back together. This process repeats, creating a constant oscillation. It should be noted that more recent research has discredited the use of the Bernoulli effect to explain phenomena such as aerofoil lift and vocal fold vibration. In [11], Babinsky explains the fallacy of invoking the Bernoulli equation, but an in-depth discussion of this is outside the scope of this thesis. To put it briefly, the Bernoulli equation can only legitimately be used when all airstreams originate from the same source. In the case of the vocal folds, where the airstreams above and below the glottis have different origins (from the lungs below and the area above the glottis), Bernoulli’s equation cannot be used to describe the behaviour of both airstreams simultaneously.

Figure 2.2 – Cross-section of the human speech system

The muscles surrounding the glottis alter the tension of the vocal folds. Like a stringed instrument, a change in tension causes slower or faster oscillations - in other words, a change in pitch or frequency. The frequency at which the vocal folds oscillate is the fundamental frequency of any voicing produced. The terms glottal flow and glottal flow derivative are used throughout the literature to describe the observed glottal pulse waveform obtained via inverse-filtering and its numerical derivative. The glottal flow derivative waveform takes into account the effects of lip radiation, which can be modelled as a first-derivative filter [9].
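As a small illustration of that relationship, the sketch below differentiates a sampled glottal flow waveform numerically. It is an assumed example rather than code from [9] or from this project: the variables ug and fs are taken to hold a glottal flow signal and its sample rate, and the scaling is illustrative.

% Lip radiation approximated as a first-derivative (high-pass) filter:
% the radiated pressure is roughly proportional to the rate of change of
% the glottal flow. ug is assumed to be a sampled glottal flow waveform.
ug_deriv = diff(ug)*fs;                    % simple numerical first derivative
% Equivalent digital filter form: y[n] = (x[n] - x[n-1]) * fs
ug_deriv_filt = filter([1 -1], 1, ug)*fs;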

Figure 2.3 displays the glottal flow waveform compared with its derivative:

Figure 2.3 – One full pitch period of the glottal flow waveform and its numerical derivative

The waveform displayed above is an approximation of the true acoustic waveform, using the Liljencrants-Fant glottal flow model [12]. This is a mathematical model of the voice source waveform, which will be discussed later in this chapter. Fant’s earlier work on the acoustic and physical properties of the voice source [7] [13] highlighted the complex, interactive nature of the role of the vocal folds within the vocal system. He showed that the voice source is not merely a function of a pitched vocal fold vibration, but was also dependent on the speaker’s physiology, impedance load from sub- and supra-glottal air pressure, and even the current vowel being spoken [7].

2.3 Voice Types

The voice type is a factor of voiced speech that is defined by the voice source. The speaker’s age, gender, physiology, mood and setting all contribute to the overall acoustic properties of the glottal flow waveform, and thus the overall speech output. Childers and Lee [14] cite six distinct voice types: modal voice, vocal fry, falsetto, breathy voice, harshness and whisper. In their study, harshness and whisper were excluded due to the lack of periodicity in both voice types. The remaining four voice types are presented in table 2.1, along with inverse-filtered voice source waveforms and their approximated LF-model fits.


Table 2.1 - Four voice types and their corresponding waveforms (LF-model fits to inverse filtered glottal source recordings from [14])

Modal – The most commonly used voice type for speech and singing. Also referred to as the ‘typical’ voice type; most cultures and languages make use of the modal voice for everyday phonation. Little to no turbulent airflow present, meaning no high frequency noise component in the waveform [14].

Vocal Fry – Commonly employed to achieve lower frequencies than is possible using a modal voice (although it can extend into the modal pitch range as well). Characterised by very short glottal bursts followed by a large closed quotient [14].

Breathy – During breathy voice phonation, the vocal folds do not fully seal the glottis, allowing an amount of turbulent airflow. This can be perceived as a high-frequency noise component during the closing and opening stages of the glottal flow cycle [14].

Falsetto – Created by vibrating only a small portion of the vocal cords, which allows the speaker/singer to achieve a much higher frequency range than modal voice. A noise component is also present due to a lack of complete closure at the glottis [14].

(The LF-model waveform column of the original table shows the corresponding waveform plots from [14].)

Childers and Lee found that the voice type could be characterised by four main factors, namely glottal pulse width, glottal pulse skewness, abruptness of glottal closure, and turbulent noise [14]. ‘Glottal pulse width’ refers to the portion of the waveform where the glottis is open, also known as the open quotient. In terms of the glottal flow derivative, the open quotient ‘is estimated by the time duration between a positive peak and the next adjacent negative peak’ [14]. Glottal pulse skewness (or the speed quotient) refers to the relationship between the lengths of the opening phase and the closing phase. Abruptness of glottal closure and turbulent noise refer to the steepness of the return phase of the waveform and the high frequency noise created by airflow through the glottis respectively.
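As a concrete illustration, using one common convention (not values or definitions taken from [14]), the open quotient (OQ) and speed quotient (SQ) can be written directly in terms of the LF timing parameters introduced in section 2.4.2:

$$\mathrm{OQ} = \frac{t_c}{T_0}, \qquad \mathrm{SQ} = \frac{t_p}{t_e - t_p}$$

For example, with the modal-voice values quoted later in Table 2.3 ($t_p = 41.3\%$, $t_e = 55.4\%$, $t_c = 58.2\%$ of $T_0$), this gives $\mathrm{OQ} \approx 0.58$ and $\mathrm{SQ} \approx 41.3/14.1 \approx 2.9$, i.e. an opening phase roughly three times longer than the closing phase.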

Table 2.2 shows the approximate spectrum, fundamental pitch range and turbulent noise properties for the four voice source types.


Table 2.2 - Four voice types with spectral content, pitch range, and noise component information

Voice Type    Range (approx. male voice)    Noise Component
Modal         ~52-207 Hz                    None
Vocal Fry     ~24-94 Hz                     None
Breathy       ~52-207 Hz                    Noise present at around 5% of total signal
Falsetto      ~207-440 Hz                   Noise present at around 5% of total signal

(The spectrum column of the original table contains spectrum diagrams taken from [14].)

The voice source type (also referred to as ‘voice quality’) has been shown to play a major role in the perception of emotion and stress in speech [15]. Whilst there have been many empirical studies analysing the nature of these voice qualities, Gobl states that ‘very few … have focussed on the voice source correlates of affective speech’. In Gobl’s study, a recording of an utterance spoken in Swedish was inverse-filtered to obtain an approximation of the voice source waveform. A voice source model was then fitted to this approximation, which allowed for parameterisation of the voice source to fit seven voice qualities. The voice source model was then used to drive a formant synthesiser, and the original recorded phrase was resynthesised for each voice quality. The resynthesised utterances were played to a number of non-Swedish speaking subjects (so that the emotional context of the words would not influence the subject’s perception of emotion). It was found that the perceived ‘tenseness’ of the voice source influenced the listener’s perception of emotional content in the voice, although this was shown to be far more effective for some emotions (relaxed/stressed, bored, intimate, content) than others (happy, friendly, sad, afraid).

Chen discusses the glottal gap phenomenon in [16]. This is a feature of the voice source that occurs when the glottis does not fully close, such as in breathy or falsetto phonations. It was found that the size of the glottal gap relative to the pitch cycle affected the overall speech output to a significant degree, in terms of the perceived voice quality. Most affected were the spectral tilt and the turbulent noise component, both of which increased proportionally with the size of the glottal gap.

Though not a distinct voice type in and of itself, vocal vibrato is a common vocal feature that originates at the voice source, primarily used in singing. Sung phrases are typically of the modal or falsetto voice types (although vocal fry is somewhat prevalent in pop singing). In [17], the perceptual benefits of vocal vibrato are discussed. One such benefit is the effective ‘gluing’ of partials, or harmonics, together. For example, while vocal sounds are generally perceived as a homogeneous blend of harmonics, it has been shown that, at a fixed pitch, it is possible to discern between separate partials present in the speech signal [17]. When the f0 is constantly varied, as in vocal vibrato, these separate partials are ‘glued’ together again. Another hypothesised perceptual effect of vocal vibrato is the increased intelligibility of vowels when vibrato is present. As Sundberg states, it is reasonable to assume that as the harmonics above the fundamental frequency undulate in time with the f0, those harmonics present around vowel formant frequencies will reinforce the perception of the formant. This is due to the increased amplitude of these harmonics as they align with the formant frequency. Counter-intuitively, further studies failed to prove this effect conclusively [18], although during the current research it was also found that subjective responses to a synthesised voice with varying pitch were much more favourable than to a constant f0.

2.4 Modelling the Voice Source

Any source/modifier approach to synthesising the human voice will employ some form of voice source model. These range from fairly rudimentary signals, such as a simple saw-wave or pulse-train [19] [20], to resynthesised human voice source waveforms obtained via inverse-filtering [21]. As Chen et al. point out, ‘few studies have attempted to systematically validate glottal models perceptually, and model development has focused more on replicating observed pulse shapes than on perceptual sufficiency’ [22]. Fitting existing models to observed pulse shapes is so far the most reliable method for achieving accurate recreations, due to the impracticality of capturing an isolated voice source waveform using conventional recording methods [15] - this has been attempted, but the highly invasive procedure involved miniature transducer microphones inserted between the vocal folds, which necessitated the use of local anaesthetic [23]. Inverse filtered glottal pulse signals and LX-waveforms obtained via laryngoscope [21] [24] are the most common references used for voice source modelling. This sub-section summarises attempts made to recreate this signal using mathematical modelling and other techniques.

2.4.1 Existing Methods for Voice Source Synthesis

In order to produce the formants that occur in natural speech, a complex source waveform with sufficient harmonics must be used. It has been recognized since at least the 1970s [25] that a source waveform approximating that found in natural speech would provide the most accurate speech output. While it is possible for very simple formant synthesisers to achieve speech-like results using saw-waves, square waves, or even white noise as an input, the spectral content of the glottal source signal is of significant importance to the overall naturalness of the synthesised speech content. Rosenberg was one of the first to compare differing methods of speech synthesis excitation using time-domain representations of the source waveform. He showed that out of six wave shapes of varying complexity, a complex trigonometric waveform, based on observations of glottal pulse movement and speech recordings, was the most preferred in a listening test when compared with a natural speech recording.


In [9], the distinction is made between:

1.) non-interactive parametric glottal models - mathematical models that assume a linear separability between the voice source and vocal tract,

2.) interactive mechanical and parametric glottal models, which are based on the interaction between the vocal source and the rest of the vocal system, either via a mechanical model or numerical simulation, and

3.) physiological glottal models, in which an attempt is made to accurately simulate the physical properties of the vocal folds in three dimensions.

Non-interactive parametric glottal models are intuitively the simplest to achieve, requiring only knowledge of the voice source waveform and its spectrum. Early studies such as Rosenberg’s [25] confirmed that as the glottal pulse shape approached similarity with that observed through inverse-filtering techniques, the perceived quality of voice synthesis improved. In these early studies, the glottal flow waveform was modeled, rather than the glottal flow derivative.

Liljencrants and Fant [12] were among the first to apply the first-derivative filter to the glottal pulse model in order to simulate the effects of lip radiation. They developed a parameterised model of the glottal flow derivative, known as the Liljencrants-Fant or LF model, which is now the most commonly used among the non-interactive parametric models [9]. Due to the model’s flexibility and ease of adaptation to existing speech source waveforms, it has been widely accepted as the standard voice source model for speech processing and analysis [14]. The LF model has provided the basis for this research, and so will be further analysed later in the chapter. Other parameterised models of the glottal flow derivative have been developed, such as Fujisaki and Ljungqvist’s model [26], which was shown to be as successful as the LF model in minimising the linear predictive error when directly compared with natural speech; however, due to the computational complexity of calculating this model, the LF model is generally favoured [9] [14].

Cummings et al. state that

‘although simple non-interactive glottal models produce intelligible synthetic speech and are adequate for many coding and analysis tasks, very high-quality speech synthesis and complex speech analysis necessitate the ability to model glottal excitation more accurately’ [9]

Cummings summarises the interactive modelling methods, which are achieved numerically or via equivalent-circuit design. Two common effects of source-tract interaction that are included in these models are the effects of low first-formant frequencies on the vocal tract’s impedance load and the glottal pulse ripple.

The most complex form of glottal source model is the physiological glottal model.

Titze and Talkin [27] [28] developed a four-parameter mathematical model of the glottis based on earlier theoretical work by Titze [8] [29]. This is essentially a mass-and-spring mathematical model of the physiology of the vocal folds, which takes into account the following:

• ‘abduction quotient, a measure indicating the amount of adduction or abduction of the vocal folds,

• shape quotient, a measure of the shape of the pre-phonatory glottis (converging, diverging, or partly converging and partly diverging),

• bulging quotient, a measure representing the amount of medial surface bulging of the vocal folds, and

• phase quotient, a measure of the phase delay between the upper and lower edges of the vocal folds.’ [9]

Physiological glottal models such as these are capable of creating a highly sophisticated representation of the glottal flow. However, a precise knowledge of glottal physiology is required in order to use models such as these, as the glottal volume velocity waveform is an indirect result of the model, as opposed to the simpler model types which attempt to recreate the volume velocity waveform directly.

2.4.2 The Liljencrants-Fant Glottal Flow Model

As discussed, the Liljencrants-Fant (or LF-) model is one of the most widely used glottal flow models in voice synthesis and speech processing applications. This is largely due to its relative computational ease and parameterisation. Earlier work by Fant [7] established a foundation for this model by observing predicted glottal flow volume velocity waveforms from inverse-filtered recordings of connected speech. Findings from this study allowed Fant to develop an early two-parameter glottal flow model (called the F-model). This early model comprised a rising and descending branch around the boundary between the opening and closing phases. The F-model contained a discontinuity at the flow peak (Fig. 2.4), so a more sophisticated model was sought. The three-parameter L-model developed by Liljencrants was used as a starting point.

The advantage of the L-model over the F-model is its continuity, which means that no secondary weak excitations are present in the acoustic waveform. The L-model also displayed less spectral ripple than the F-model. Neither model incorporated a term for the gradient of the return phase of the glottal pulse, which was found to be crucial for modelling certain voice types and phonations [12]. For example, during a voiced ‘H’ sound, the glottis remains open for most of the pitch cycle, allowing turbulent airflow to create the high-frequency noise component (also observed in breathy and falsetto voice types). In order to model voice source effects such as these, an exponential return phase, whose gradient forms a fourth parameter, was added, based on observations by Liljencrants, Fant and Ananthapadmanabha [12] [7] [13].


Figure 2.4 - Comparisons between the F- and L- glottal model waveforms (left) and their derivatives (right) with varying values of Rd – a ‘shape parameter’ based on the amplitude and position of the positive peak - taken from [12]


Figure 2.5 - LF-model glottal flow derivative waveform with timing parameter annotations. The value for ta is the distance between point te and the zero crossing of the derivative of the return curve (red line on graph).

Figures 2.4 and 2.5 show one pitch period of each of the aforementioned glottal models. The timing parameters tp, te, ta, and tc are shown on the LF-model diagram. These four timing parameters can be modified in order to fit existing glottal flow measurements for speech analysis, or to synthesise new waveforms in order to simulate different voice types in speech synthesis. These timing parameters are defined as a percentage of the overall pitch cycle length T0.

Parameter tp describes the length of the opening phase of the cycle, i.e. when the vocal folds are moving upwards and the glottis is opening; it marks the moment of maximum flow. Te gives the timing of the negative peak in the waveform, which occurs at the beginning of the return phase. Ta gives the effective duration of the return phase, calculated as the length of time between te and the zero-crossing of the derivative of the return slope at te. Tc describes the length of the open phase, or the portion of the pitch cycle during which the vocal folds are in motion. If tc is less than T0, the remainder of the waveform between tc and T0 is known as the closed phase. One requirement of the LF-model is that the overall net gain of flow during a pitch period must equal zero:

$$\int_{0}^{t_c} E(t)\,dt = 0 \qquad [2.1]$$

The waveform is calculated in two stages. The first stage involves an exponentially growing sinusoid between the moment of glottal opening (t = 0) and the negative peak at t = te. An exponential component describes the second stage - the return phase between te and tc. The two equations for the LF-model waveform can be written as

!" �� � = �!� sin �!� , 0 ≤ � ≤ �! [2.2] �! !! !!!! ! !!!!! �� � = − � − � , �! ≤ � ≤ �! ≤ �! ��! [2.3]

where E0 describes the maximum positive flow, Ee the maximum negative flow, α and ωg are respectively the exponential growth factor and the angular frequency of the sinusoidal component, and ε is the exponential time constant of the return phase. In order to maintain the area balance condition described in equation 2.1, ε, E0 and α are solved iteratively so that the following conditions hold:

$$\varepsilon = \frac{1 - e^{-\varepsilon (t_c - t_e)}}{t_a} \qquad [2.4]$$

$$E_0 = -\frac{E_e}{e^{\alpha t_e} \sin(\omega_g t_e)} \qquad [2.5]$$

(analysis of LF-Model equation based on Jack Mullen’s summary [30])
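To make the calculation concrete, the sketch below generates a single LF-model pitch period in MATLAB from a set of timing parameters. It is a minimal illustration written for this summary, not the LFModelFull.m listing in Appendix A: the modal timing values are taken from Table 2.3, while the sample rate, f0, variable names, the fixed-point iteration for ε and the coarse grid search for α are assumptions made here for simplicity.

% Minimal sketch: one LF-model pitch period (equations 2.1-2.5).
fs = 44100;                      % sampling frequency (Hz)
f0 = 110;   T0 = 1/f0;           % fundamental frequency and pitch period
tp = 0.413*T0; te = 0.554*T0;    % modal-voice timing values from Table 2.3
ta = 0.004*T0; tc = 0.582*T0;
Ee = 1;                          % amplitude of the negative peak
wg = pi/tp;                      % sinusoid frequency: flow peak falls at tp

% Solve epsilon from eq. 2.4 by fixed-point iteration
eps_ = 1/ta;
for k = 1:100
    eps_ = (1 - exp(-eps_*(tc - te)))/ta;
end

% Net flow over the period as a function of alpha (eq. 2.1), using the
% closed-form integrals of eq. 2.2 and 2.3 and E0 from eq. 2.5
E0fun   = @(a) -Ee./(exp(a*te).*sin(wg*te));
openInt = @(a) E0fun(a).*(exp(a*te).*(a*sin(wg*te) - wg*cos(wg*te)) + wg)./(a.^2 + wg^2);
retInt  = -(Ee/(eps_*ta))*((1 - exp(-eps_*(tc - te)))/eps_ - (tc - te)*exp(-eps_*(tc - te)));
area    = @(a) openInt(a) + retInt;

% Coarse grid search for the alpha value that balances the areas
aGrid = linspace(1, 10/te, 20000);
[~, idx] = min(abs(area(aGrid)));
alpha = aGrid(idx);
E0 = E0fun(alpha);

% Assemble one pitch period of the flow derivative E(t)
t = (0:round(T0*fs)-1)/fs;
E = zeros(size(t));
E(t <= te) = E0*exp(alpha*t(t <= te)).*sin(wg*t(t <= te));
ret = (t > te) & (t <= tc);
E(ret) = -(Ee/(eps_*ta))*(exp(-eps_*(t(ret) - te)) - exp(-eps_*(tc - te)));
% samples between tc and T0 remain zero (closed phase)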

By manipulating the values of the timing parameters (tc, te, tp, ta), the LF-model can be modified to describe certain voice types, or matched to pre-recorded voice source data. ‘Voice quality factors: Analysis, synthesis, and perception’ [14] is an example of one of the many studies into voice synthesis and analysis that have used the LF-model in an attempt to synthesise different voice types, as well as establish the role of various LF-parameters in terms of the perception of the synthesised voice. Beginning with inverse-filtered speech waveforms and data from electroglottographic recordings, Childers & Lee [14] analysed the spectral content and waveform characteristics of four voice types - modal, breathy, vocal fry and falsetto. From earlier studies [31], it was found that the LF-model provided a convenient and efficient basis from which to recreate the timing and spectral characteristics of the four voice types. By adjusting LF-parameters to fit the initial recordings, then optimising the LF-model estimate using a least-mean-squared error criterion, average values of the LF-parameters for different voice types were found:

Table 2.3 – Four male voice types and corresponding timing parameter values (taken from [14])

Voice Type    Te (%)    Tp (%)    Ta (%)    Tc (%)
Modal         55.4      41.3      0.4       58.2
Breathy       57.5      45.7      0.9       100
Vocal Fry     59.6      48.1      0.27      72
Falsetto      89        62        4.3       n/a


By modifying each timing parameter in turn, followed by the overall pulse width (open quotient or OQ) and pulse skewing (speed quotient or SQ), keeping all other parameters fixed, and synthesising short vowels using a Klatt formant synthesiser [32], it was possible to evaluate the perceptual effects of each parameter, and to establish which were most useful for synthesising different voice types. Criteria for simulating hypo-/hyperfunction (lax/tense vocal quality) were established in the time and frequency domains, with a high SQ creating more high frequency energy, contributing to a perceptually more tense voice quality. This study also incorporated a noise generator, in order to simulate breathiness. It was found that white noise, high-pass filtered at 2 kHz and added to the LF-model signal, contributed to the perception of breathiness. Modulating the noise signal’s amplitude so that it was present during 50% of the pitch cycle (roughly lining up with the closed part of the vocal fold oscillation), with a noise-to-signal ratio of 0.25%, provided the best results for simulating breathiness. This study confirmed the importance of a variation in voice source quality in natural speech synthesis, and concludes that ‘various intonation and stress patterns may be correlated to source parameters other than fundamental frequency and timing’ [14]. This idea is the primary concept behind the current research, which is aimed at developing a more natural, dynamic and user-configurable voice source for voice synthesis applications.
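As an illustration of the breathiness simulation just described, the sketch below adds high-pass filtered, cycle-gated noise to an LF-model signal. It is an assumed reconstruction of the idea rather than Childers and Lee's actual implementation: the filter design, gating alignment and mixing level (loosely based on the 0.25% ratio quoted above) are illustrative choices, and lf, fs and T0 are assumed to hold an LF-model waveform (as a column vector), its sample rate and the pitch period.

% Sketch: breathiness as high-pass filtered noise gated over half the cycle.
% Assumes lf is a column vector containing an LF-model signal at rate fs,
% with pitch period T0 seconds. Requires the Signal Processing Toolbox.
noise = randn(size(lf));
[b, a] = butter(2, 2000/(fs/2), 'high');          % 2nd-order high-pass at 2 kHz
noise = filter(b, a, noise);

n    = (0:numel(lf)-1)';
Np   = round(T0*fs);                              % samples per pitch cycle
gate = mod(n, Np) >= Np/2;                        % noise on for 50% of each cycle
breathy = lf + 0.0025*max(abs(lf))*gate.*noise;   % small noise-to-signal ratio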

Whilst the least-mean-squared-error technique described in [14] and [33] provides a close fit of the LF-model waveform to a glottal source recording, further research has been undertaken to optimise the timing parameter values in order to more accurately recreate voice source qualities [34] [21] [24]. One such method is described in [21], known as Extended Kalman Filtering (EKF). EKF is an iterative error correction method that makes use of a priori estimates to converge on an optimum estimate. By incorporating the EKF equations in those describing the LF-model, it is possible to calculate α and ε values to achieve an optimum model fit. Further research into EKF techniques for model fitting also generated a time-domain fitting algorithm using EKF that was shown to be far more accurate than a previously used standard algorithm [35]. The timing parameters described in [21] were originally obtained from [24], which describes the use of a pitch-synchronous model-based glottal source estimation method to obtain an accurate set of mean values for LF-parameters from an inverse-filtered glottal source waveform.

In [36] the many parameters used to describe the glottal source model are investigated and their importance in terms of vocal quality perception is explored. It is acknowledged that ‘the closing phase constitutes the main excitation of the vocal tract’. The closing phase, or normalised amplitude quotient (NAQ), describes the phase of the pitch period from the negative peak to the point of glottal closure. The authors recommend varying the NAQ for the largest and most effective perceptible variation in voice type. These findings are corroborated in [37] [38] and [39].


2.5 Vocal Tract Modelling

As well as the voice source, the physical properties of the vocal tract can be mathematically modelled in order to recreate its acoustic effects. This method of voice synthesis falls into the category of ‘articulatory speech synthesis’, for which [40] gives the following definition:

‘Articulatory speech synthesis models the natural speech production process as accurately as possible. This is accomplished by creating a synthetic model of human physiology and making it speak.’ [40]

Palo acknowledges that articulatory speech synthesis methods are less effective at creating intelligible speech than other established synthesis methods, but vastly more flexible in terms of the range of speech-like vocalisations that are available. The first example of synthesised speech created by a vocal tract model was developed by Kelly and Lochbaum in the 1960s [41]. This was a fully digitised acoustic model of the human vocal tract, achieved by discretising the vocal tract into a series of concatenated tubes (fig. 2.6). The travelling wave solution for each tube was obtained, and then digitised using Nyquist’s sampling theorem. Vocal tract area data was obtained via x-ray for several vowel sounds, and the cross-sectional area of each tube section of the model was proportional to the corresponding vocal tract area. This was one of the first and most enduring examples of physical modelling synthesis.


Figure 2.6 - Representation of the vocal tract idealized as a series of concatenated acoustic tubes, with glottis end at the left and lips at the right. Note that the ‘bend’ in the vocal tract that occurs above the glottis is not included in this representation.

The advances made in computing by the 1980s meant that new methods of physical modelling synthesis were being experimented with. One such method that had implications for vocal tract modelling was digital waveguide synthesis.

Julius Orion Smith III describes the early conception of the one-dimensional digital waveguide in [42]. As d’Alembert first pointed out, the vibration of an ideal string can be described as the sum of two travelling waves going in opposite directions [43]. The conception of the digital waveguide is based on this principle. A digital waveguide is essentially a bi-directional digital delay line, with the sample propagation travelling in opposite directions (fig. 2.7). This approach allows for an efficient discrete-time simulation of the travelling wave solution, which can be used to model ‘any one-dimensional linear acoustic system such as a violin string, clarinet bore, flute pipe, trumpet-valve pipe, or the like’ [44]. Terminations and changes in impedance along the acoustic system can be modelled using boundary conditions and scattering junctions. A termination (for example a bridge on a guitar) can be modelled simply by inverting the phase of the incoming signal, which acts as a total reflection of the displaced wave.

Changes in impedance are modelled using the Kelly-Lochbaum scattering junction. Conservation of mass and energy dictates that for a change in impedance (such as from a narrow to a wide section of tube), the pressure and volume velocity variables of the travelling wave must be continuous [44]. This means that some of the acoustic energy will be transmitted across the impedance discontinuity, and the remainder will be reflected back. This is achieved digitally via the scattering junction.
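The sketch below illustrates the scattering computation at a single Kelly-Lochbaum junction between two tube sections. The area values, variable names and sign conventions are assumptions made for this example (pressure waves, with acoustic impedance taken as inversely proportional to cross-sectional area); it is not code from the VocalModel or LFGen implementations.

% Kelly-Lochbaum scattering at the junction between tube sections 1 and 2.
% Impedance Z is proportional to 1/A, so the reflection coefficient seen
% from section 1 is k = (Z2 - Z1)/(Z2 + Z1) = (A1 - A2)/(A1 + A2).
A1 = 3.0e-4;  A2 = 1.0e-4;            % example cross-sectional areas (m^2)
k  = (A1 - A2)/(A1 + A2);             % reflection coefficient

p1_in = 0.8;                          % right-going pressure wave entering from section 1
p2_in = -0.2;                         % left-going pressure wave entering from section 2

% Part of each incoming wave is transmitted across the discontinuity and
% the remainder is reflected back the way it came.
p2_out = (1 + k)*p1_in - k*p2_in;     % transmitted into section 2 (right-going)
p1_out = k*p1_in + (1 - k)*p2_in;     % reflected back into section 1 (left-going)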

Figure 2.7 - 1D Digital Waveguide Structure. Sample delay units (marked z^-1) propagate an input signal in left and right directions, with changes in impedance modelled by attenuating the signal between delay units. Sampling points extract the current sample at a particular point along the DWG – similar to a pickup along a guitar string.

The 1D digital waveguide models changes in cross-sectional area in the vocal tract as a series of impedance changes in a 1D linear acoustic system. A 2D extension of this method, known as the 2D Digital Waveguide Mesh (DWM), models the same cross-sectional area function as a 2D plane, with width-wise delay lines of varying length, as seen in figures 2.8 and 2.9.

Figure 2.8 - Achieving a cross-sectional area function from MRI data. Note the lack of nasal cavity in the vocal tract cross-section and the ‘straightening’ of the tract when converted to an area function.

Figure 2.9 – 2D Digital Waveguide Mesh structure with impedance mapping, with glottis end at the left and lips at the right (red indicates a high impedance, creating effective ‘boundaries’ [highlighted in blue])


The 2D DWM was developed by Van Duyne and Smith in the early 1990s [45] with further development at the Audio Lab at the University of York [46] [47] [48] [49] [50] [51]. The DWM structure is ideally suited to modelling the propagation of acoustic waves across a membrane or plate, although the extra dimensionality is also an advantage over the 1D waveguide for modelling other acoustic systems. In the example of vocal tract modelling, the cross-sectional tract area can be directly modelled as a widthwise number of waveguide points, as opposed to the 1D solution, which requires a conversion from area to impedance. Inputs and outputs to the system can also be included at spatially meaningful points on the mesh, due to the analogous topography of the mesh to the modelled surface [46]. As Mullen points out,

‘it should be noted that the magnitude of vibrations is the physical variable under simulation that would be observed in the real-world system. The bi-directional travelling wave components are a hypothetical consideration to facilitate propagation’. [30]

Waveguide mesh topologies are not limited to a grid layout as illustrated above, and other arrangements of delay lines and scattering junctions have been experimented with [50].

An extensive study into vocal tract modelling using the 2D DWM is described in [30]. Mullen’s thesis describes the theory behind digital waveguide mesh modelling, and its application to vocal tract modelling. It also describes the development of a novel method of modelling dynamic area function changes in real time, known as dynamic impedance mapping. Conventional waveguide mesh structures follow the layout of the acoustic area they are modelling. Vocal tract modelling requires a more flexible method, as the layout is constantly changing depending on the current articulation. Dynamic impedance mapping allows the mesh size and shape to remain constant, while manipulating the impedances at each node to effectively alter the shape of the area through which acoustic energy can propagate. This is much less computationally expensive than altering the layout of the mesh at each sample step, and allows for real-time, dynamic articulations.

The process for vocal tract modelling using the digital waveguide mesh is as follows:

1. Obtain cross-sectional area function data of the vocal tract for a set of specific vowels. This is achieved using a magnetic resonance imaging (MRI) machine (see Figure 2.8 above).

2. Convert the area function data to a series of discrete area values at regular intervals along the tract.

3. Calculate the size of a single waveguide. This is related to the theoretical distance an acoustic wave would propagate during one sample length, and is calculated using the following formula:

$$d = \frac{\sqrt{2}\,c}{f_s} \qquad [2.6]$$

where c is the speed of sound and fs is the sampling frequency.

4. Calculate the size of the waveguide mesh in terms of the number of individual waveguides in the x and y directions. The average dimensions for a male vocal tract are 17.5 cm long and 5 cm wide.

5. Interpolate the area function data from the original number of values to the number of waveguides in the x direction. Invert each value to obtain the related impedance value for each cross-section.

6. The impedances of the width-wise waveguides (y direction) are calculated using a raised-cosine function (fig. 2.10). This was found to be the ideal solution for maintaining an open ‘channel’ in the middle of the mesh (i.e. at minimum impedance) with maximum impedance at the outer edges of the impedance map. This means that at each point in the x direction, for a DWM n waveguides wide, a raised cosine function of n samples is created. Each point in the y direction is assigned an impedance value based on the corresponding raised cosine value.

Figure 2.10 - Approximation of raised cosine function, with minimum impedance (Zmin) at the centre and maximum (Zmax) at the edges

7. The averages between adjacent points in the mesh are taken, and the impedance map is updated based on these averages. The pressure at the current junction (the average of all surrounding points) is taken, and the outgoing pressures are calculated.

8. At every timestep, the incoming pressures to each junction are calculated based on the previous pressure values at surrounding points. Boundaries are modelled in the same way as a termination in a 1D waveguide, for each outer point in the mesh. At the glottis end, the incoming pressure for each junction is excited with the current sample of the input waveform (i.e. the voice source).

9. Finally, the output pressure is taken as the sum of all rightmost junctions multiplied by the lip radiation coefficient.
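A rough MATLAB sketch of steps 3, 5 and 6 is given below. It is not taken from the 2D DWM code discussed in Chapter 4: the example area function, the Zmax ceiling and the way the raised cosine is scaled between the edge and centre impedances are all illustrative assumptions.

% Sketch: building a 2D impedance map from a 1D area function (steps 3, 5, 6).
c  = 343;  fs = 44100;
d  = sqrt(2)*c/fs;                     % waveguide length (eq. 2.6)
nx = round(0.175/d);                   % junctions along a 17.5 cm tract
ny = round(0.05/d);                    % junctions across a 5 cm width

areaFn = linspace(1e-4, 5e-4, 44);     % example area function (m^2), 44 slices
areaX  = interp1(linspace(0,1,numel(areaFn)), areaFn, linspace(0,1,nx));
impX   = 1./areaX;                     % impedance inversely proportional to area

Zmax = 50*max(impX);                                % assumed 'wall' impedance
raisedCos = (1 - cos(2*pi*(0:ny-1)/(ny-1)))/2;      % 0 at the edges, 1 at the centre

Z = zeros(ny, nx);
for x = 1:nx
    % open low-impedance channel in the middle, high impedance at the edges;
    % the centre value follows the impedance derived from the area function
    Z(:, x) = Zmax - (Zmax - impX(x)).*raisedCos';
end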

The impedance-mapped 2D DWM was excited with Gaussian noise to obtain a frequency response for several vowel area functions. The results showed that formant frequencies obtained from the 2D DWM varied in accuracy when compared with average formant frequency values for male speakers. For some vowels, the 2D DWM formant frequencies were less in line with average values than the 1D waveguide counterpart. It is acknowledged that these average formant values are not definitive, and the strongest case for vowel accuracy would be a perceivable similarity to the simulated vowel, based on subjective listening results.

The increased dimensionality introduced by the 2D DWM allows for more accurate plane-wave propagation simulations than the 1D counterpart. However, the impedance mapping of the 2D DWM is based on the same 1D area function data. The effects of the curve in the vocal tract, the addition of the nasal tract, and 3D asymmetrical cross-sections are not considered in the 1D or 2D models.

Further research has taken place which has attempted to expand on the underlying theory of 1D and 2D waveguide modelling in order to develop a 3D acoustic simulation of the vocal tract [52]. The 3D digital waveguide is a theoretically straightforward but computationally expensive extension of the 2D waveguide method, with which an acoustic resonator’s 3D properties can be modelled by simply expanding the dimensionality of the DWM. As long as the topology of the mesh remains constant throughout, any 3D acoustic resonator can be modelled. Various mesh topologies exist for 2D and 3D acoustic modelling, the most common being rectilinear (2D and 3D), triangular (2D) and tetrahedral (3D) [47] [46] (fig. 2.11).

Figure 2.11 – 2D and 3D DWM topologies [from [47]]

Speed’s method of 3D vocal tract acoustic modelling followed a similar principle to Mullen’s 2D equivalent. This time, 3D area function data had to be obtained from MRI scans using a biomedical structure segmentation algorithm [53]. These 3D structures were converted to 3D sampling grids following similar principles to the 2D DWM construction. In this case, static sampling grids were used for each vowel shape, as opposed to the dynamic impedance-mapping techniques used with 2D implementations. This is due to the exponentially increased number of calculations per sample involved with the increase in dimensionality. A dynamic implementation of the 3D method would therefore be unfeasible given the sheer amount of computational power required.

Speed also touches on the role of the voice source within articulatory voice synthesis. In one study, both vocal tract area data and inverse-filtered voice source recordings were taken for four different subjects. The voice source recordings were then convolved with the vocal tract data of a different subject, effectively splicing the vocal folds of one subject onto the vocal tract of another.

It was found that the perceived naturalness of the resulting voice synthesis was not affected when a vocal tract model was excited by a non-matching voice source – the physical correlation between the two being of less importance than ‘natural’ features of phonation such as irregular pitch, prosody, amplitude and rhythm [52].

2.6 ‘Naturalness’ in Speech Synthesis

While this research is primarily concerned with the development of a parameterised voice source model, the wider aim is to contribute towards the improved ‘naturalness’ of speech synthesis. This research follows on from two previous projects that were concerned with speech synthesis naturalness, which was explored via real-time dynamic control of Mullen’s VocalModel synthesiser [5] [30]. One of the conclusions from the earlier research was that while real-time responsive control of synthesised speech provided mostly positive subjective results, the synthesis method itself was not fully adequate. This led to the current research project, as the fixed voice source wavetable model used with the previous voice synthesiser was highlighted as an area to be improved upon. In order to set this research in context, the two previous projects will be summarised, followed by a brief review of the literature on speech synthesis ‘naturalness’.

The work began as part of a short research project with the York Centre for Complex Systems Analysis (YCCSA). The primary aim of the project was to recreate a 21st-century version of the von Kempelen 'talking machine' (fig. 2.12). The von Kempelen machine was designed by Wolfgang von Kempelen in the late 18th century and described in his book 'Mechanism of Human Speech and Language' [54] [55]. This was a system composed of a set of bellows, a reed, a compressible leather tube and several valves, which were analogous to the lungs, vocal cords, vocal tract, nasal cavities and articulators in the speech system. It was supposedly possible to create speech-like sounds using this machine, and while attempts to do so using replicas of the machine have not given convincing results, its importance as a research and demonstration device has been defended [56].


Figure 2.12 - Wolfgang von Kempelen’s ‘Speaking Machine’ [image taken from [57]]

The approach to creating a '21st-century' von Kempelen machine was to develop a controller for the MIDI-compatible VocalModel synthesiser, based on the design of the original machine. As the voice source was already provided in the software, the bellows and reed of the original machine were accounted for, so only the leather tube and articulators were considered in the design. A MIDI controller was developed, made up of a compressible foam tube with eight force-sensing resistors (FSRs) at regular intervals along the tube. Using a series of potential divider circuits, eight separate channels of varying voltage were converted to MIDI messages based on the amount of compression. Each MIDI channel controlled one of the eight on-screen sliders within the VocalModel software, so that the compression of the foam tube was directly analogous to the 'shape' of the modelled vocal tract, and thus the vowel produced. The MIDI conversion software used was Apollo Ensemble [58], and the FSR material used was QTC, provided by David Lussey of Peratech [59].


Subjective responses to this system were that the real-time control of articulations, pitch and amplitude was somewhat speech-like, but the method of control was physically very difficult to use due to the latency introduced by the intermediate Apollo software, and the lack of sensitivity of the controller device itself. This prompted the second research project, which looked to improve the comfort and usability of the controller device as part of a deeper study in improving speech synthesis naturalness. One of the improvements made to the system was the introduction of an Arduino board, which handled the voltage input, scaling and MIDI conversion at the same time; this improved latency by avoiding any intermediate software. Through consultations with David Lussey, a controller device was developed with accessible FSR layers, and different QTC types were tested, so that the sensitivity and layout of the device could be changed. A more rigorous user test was involved in this study in order to gain both subjective and objective results. Users were introduced to the system over a number of tests and allowed to explore and experiment with the controller device. They were then asked to recreate vowel sounds using diagrams of the vocal tract shape, and also to identify pre-recorded vowels using the same system. Their subjective responses to the system in terms of its control method and the overall acoustic output were also collected. It was found that while it was very difficult to create recognisable vowel sounds at first, either using the visual aids or from memory, when a vowel-like sound was achieved, the inherent spontaneity of the pitch, amplitude and rhythm of the output was deemed far more 'natural' than the predictable 'robotic' voices produced by text-to-speech systems, for example. It was also noted that using an articulatory speech synthesiser allowed the user to create articulations such as plosives and glottal stops using the control method, which many users discovered with no prior instruction. While the control method was incredibly complex, after a number of sessions experimenting with the system users were able to learn short consonant-vowel utterances such as 'ma' and 'ba'. The link between this method of learning and early speech acquisition in children was noted.

A study that was formative in the conception of these earlier projects was Newell's 'Place, Authenticity and Time: A Framework for Liveness in Synthetic Speech' [60]. This study used objective data collection as well as subjective responses to various modes of synthetic speech performance in order to determine their perceived 'liveness'. By taking an interdisciplinary approach to this research, with perspectives from the world of drama and music performance, a generic framework for what constitutes a 'natural' performance was proposed. One conclusion was that spontaneity of pitch, amplitude and pauses in synthetic speech performances gave a degree of unpredictability that was naturally associated with unprepared, conversational speech.

A commonly cited effect when discussing synthetic speech naturalness is the 'uncanny valley' phenomenon, first described in Mori's 'The Uncanny Valley' [61]. This phenomenon is a way of describing the change in attitude towards imitations of human features as they approach total realism. For example, a simple stick drawing of a human is unlikely to elicit sympathetic responses. A well-drawn cartoon or video game character, however, could convey enough human-like expression to provoke sympathy, and the audience would forgive the obvious non-human characteristics (a willing suspension of disbelief). However, the same audience might in fact find a hyper-realistic 3D rendering of a human face, or a wax figure in Madame Tussaud's, somewhat disturbing. The reason for this, Mori posits, is that the near-total realism causes the audience to give up their willing suspension of disbelief, which is replaced by a more critical viewpoint in which minor imperfections or non-human qualities are more pronounced. Figure 2.13 displays Mori's graph of human likeness vs. affinity for the object.

Figure 2.13 - Graph displaying the effect of human likeness on affinity or 'Shinwakan' with an object. The 'Uncanny Valley' refers to the large drop as the subject approaches human likeness [image taken from [62]]

Newell and others have drawn the link between this effect and perception of synthetic voices [60], which was also discussed in the previous research [5]. It was hypothesised that there was such a thing as a synthetic voice that was ‘too perfect’, with little to no spontaneity present, and that the unpredictability introduced by real-time human interaction would contribute towards more sympathetic responses.


While Mori's uncanny valley theory is concerned with audience reactions to humanoid figures, avatars, or other visual cues, it is straightforward to draw the link to speech synthesis and perception when considering Moore's Bayesian model of the uncanny valley effect [63], in which an attempt is made to quantify and model the level of 'affinity' one might have with any simulation of human qualities. His explanation leads to the conclusion that the uneasiness felt by the audience due to the uncanny valley effect is a result of the 'mis-match in sensory cues' that help the viewer/listener to identify a subject as human-like. In the context of speech synthesis, this might occur when the listener perceives a certain 'cue' for human speech in a synthesised word or phrase, alongside a non-human 'cue'. This could mean that an accurate representation of a speech signal in terms of formants and spectral content, but which lacks other cues such as prosody, pitch and timing, would evoke a similar feeling of uneasiness to Mori's example of a life-like animatronic human figure that lacks typical human characteristics such as breathing or blinking.


3. A Parametric, Real-Time Voice Source Model

This research project began as a continuation of the author's previous studies into real-time implementations of vocal tract modelling described in the previous chapter. The motivation for these earlier projects came from a general interest in the theme of 'naturalness' in voice synthesis, as well as an interest in assistive technology and human-computer interaction (HCI). It was concluded during the final stages of the previous work that while the HCI aspect of naturalness-oriented voice synthesis research was an important one, the most pressing issue was the synthesis engine itself. Developing a synthesiser capable of producing, in real time, a range of vocalisations equivalent to that of the human voice would ideally aid any future research into more HCI-focused areas.

Whilst the general motivation for the current research remains the same as for the earlier projects, the scope has been revised in order to focus on a key component of the voice synthesis software used in that work: the voice source.

This chapter describes the development of a parametric, real-time voice source modelling application for iOS. The motivation for the application design is given first, followed by the application specifications. The design and implementation stages are documented, and the results from the system testing stage are given.

As with many software development projects, the progression from the initial concept to design, implementation and testing is not a linear one. The following sub-sections are therefore not necessarily presented in chronological order (for example, results from system testing informed certain changes in the design of the app).

3.1 Motivation for Design

As stated in Chapter 1, the motivation for this project is twofold. Firstly, the application is intended to be used as a research tool in voice synthesis projects. This means the final application should be capable of producing a versatile and expressive voice source model that can be incorporated into other voice synthesis systems. In terms of versatility, the application should provide sufficient parameters to the user so that an appropriately wide range of voice source waveforms can be modelled. The term 'expressive' here means an application that is capable of a wide range of pitch, amplitude, voice types and so on that can be modified in real time, in a similar manner to 'expression' in musical performance.

The secondary motivation for the design of this application is the implications for its use within assistive technology software. This informed the decision to focus on technology for portable devices. Technologies such as Voice Output Communication Aids (VOCAs) [64] have been used by the disabled community for many years as a means of communication. Since the advent of touchscreen technology, users with limited mobility have made use of the large range of options made available by not being limited to hard-coded buttons [3]. Touchscreens allow any part of the screen to be used as a controller, from onscreen switches and sliders to more complex gestures such as pinching and multitouch swiping [65]. Devices such as Apple's iPad, often equipped with quad-core processors, have surpassed the performance of the average desktop computer from a decade ago, and so make the ideal platform for a touchscreen-based communication aid.

Modern VOCAs that make use of concatenative synthesis (a speech synthesis method where appropriate diphones are selected from a bank of pre-recorded samples and 'stitched' together to create entire phrases) are already capable of producing highly realistic speech output. This comes, however, at the cost of limited vocabularies and a lack of expressive options in terms of pitch, amplitude, speaking rate, and voice source quality. It is proposed that a more flexible speech synthesis system would incorporate some form of voice source modelling, and the application has been designed with this in mind. It should be noted that the application is by no means a completed speech synthesis tool, but is intended to be incorporated into such projects.


3.2 Specifications

The specifications for the design of the application are given below:

1. In order to allow expressive control of voice source features and voice types, the application will provide appropriate parameters for the user to control.

2. The application will run as close to real time as possible. This means any changes made to fundamental frequency, amplitude or any LF-model parameters will be processed and heard instantly. This is essential for the application both as a contribution to HCI and voice synthesis research, as well as any potential use as an assistive technology enhancement.

3. Whilst it is not essential that the final application run on portable devices at this stage, it is highly desirable that the application is at least designed to be compatible with such devices. This is so that any future iterations of the software can be easily incorporated into portable devices such as tablets and phones, which are widely used as assistive technology devices, as well as research and demonstration tools [66].

4. The application will be compatible with existing voice synthesis methods. This means specifically that the voice source model will be designed to run at the same sample rate (the rate at which audio samples are calculated and played back) as existing vocal tract modelling applications, and the two models can be integrated without compromising the quality of output.

3.3 Design

This section provides a detailed account of the design stages of the application, with focus given to some of the more crucial design choices.

3.3.1 Choice of Parameters

The choice of voice source features to parameterise was a crucial decision in the design of the software. Through many iterations of the design, various parameters were made available to the user, with each parameter's effectiveness and suitability noted. This version of the LF-model makes use of the four timing parameters detailed in section 2.4.2. To reiterate, these are notated as tp, te, ta, and tc. Figure 3.1 illustrates how these timing parameters relate to the LF-model waveform:


Figure 3.1 - Annotated ‘Typical’ LF-waveform

Tp is the moment of zero crossing of the differentiated glottal flow waveform, which corresponds to the moment of maximum flow in the original waveform. Te is the moment of the negative peak. The section of the waveform from t = 0 (where t is the current sample) to tp is the exponentially growing sinusoidal component. Ta defines the gradient of the exponential return phase; it is notated as the distance between the negative peak and the moment of zero crossing when the return slope is differentiated. Tc is the total length of the LF-waveform portion of the pitch period (note that tc is usually equal to the length of one pitch cycle). The values for these parameters are given as percentages of the total pitch period.
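To make these values concrete, the minimal MATLAB sketch below (illustrative only, and not part of the application code) converts the modal voice percentages listed later in Table 3.1 into absolute times and sample indices, assuming f0 = 110 Hz and a 44.1 kHz sample rate:

% Convert LF timing parameters (fractions of the pitch period) to
% absolute times and sample indices. f0 and Fs are assumed for illustration.
Fs = 44100;            % sample rate in Hz
f0 = 110;              % fundamental frequency in Hz
period = 1/f0;         % pitch period in seconds (~9.09 ms)

TP = 0.457; TE = 0.575; TA = 0.009; TC = 1.0;   % modal values from Table 3.1

tp = TP*period;        % time of peak glottal flow (zero crossing of the derivative)
te = TE*period;        % time of the negative peak (main excitation)
ta = TA*period;        % effective duration of the return phase
tc = TC*period;        % end of the LF cycle

% Corresponding sample indices within one pitch period:
samples = round([tp te ta tc]*Fs);   % gives approximately [183 231 4 401]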

MATLAB software was used throughout the prototype stages of the design in order to test various algorithms and methods without the added complication of working with Core Audio [67] in iOS. A basic fixed-parameter LF-model was implemented in MATLAB based on that found in the VOICEBOX toolkit [68]. Each of the four LF-parameters was then altered within an appropriate range in order to assess their acoustic function. It was found that altering each parameter independently provided notable and predictable results corresponding to their function described in the literature.
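A parameter sweep of this kind can be scripted in a few lines; the sketch below is purely illustrative and assumes a hypothetical helper function lfWaveform() that returns one pitch period of the LF-waveform for a given set of timing parameters (the full routine used in this project is listed in Appendix A):

% Illustrative parameter sweep: vary te over a range while holding the
% other timing parameters fixed, and overlay each resulting waveform.
% lfWaveform() is a hypothetical helper returning one pitch period.
f0 = 110; Fs = 44100;
tp = 0.600; ta = 0.028; tc = 1.0;        % 'typical' values, held fixed
teRange = 0.70:0.05:0.90;                % sweep te from 0.70 to 0.90

figure; hold on;
for te = teRange
    w = lfWaveform(f0, Fs, tp, te, ta, tc);   % one pitch period (hypothetical)
    plot(w);
end
hold off;
xlabel('Time (samples)'); ylabel('Relative amplitude');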

Parameter te, or the moment of the negative peak, can be varied over a range of around 20% of the pitch cycle before losing the typical LF-model waveform shape. In figure 3.2, it can be seen that at the extremes (te = 0.9 and te = 0.7), discrepancies in the waveform occur where the value of one parameter approaches the value of another. This provides a rough estimate of the appropriate ranges to be made available to the user on the iOS interface.

Figure 3.2 – 'Typical' LF-waveform with varying te value from 0.7 to 0.9 (x-axis represents time in samples, with relative amplitude on the y-axis)


The moment of zero crossing occurs at point tp. As with te, varying this parameter has an extreme effect on the opening phase as well as the closing phase, in terms of amplitude and steepness (fig. 3.3). Discontinuities occur at the upper ranges (around tp = 0.66), while the sinusoid component of the open phase is more pronounced for lower values of tp. The return phase remains constant for all values.

Figure 3.3 – 'Typical' LF-waveform with varying tp value from 0.52 to 0.665 (x-axis represents time in samples, with relative amplitude on the y-axis)

Ta, which roughly describes the length and gradient of the return phase, has the most pronounced effect on the spectrum of the waveform. This is due to the negative peak being the main excitation of the voice source [7]. A steep gradient in the return phase will produce more high-frequency harmonics than a gradual return. It was also found that ta could be set over a large range of values without causing any discrepancies in the typical LF-waveform (fig. 3.4).

Figure 3.4 – 'Typical' LF-waveform with varying ta value from 0.002 to 0.038 (x-axis represents time in samples, with relative amplitude on the y-axis)

As te, tp and ta are all calculated independently of tc, there was no audible effect from modulating this parameter. If tc is set to a value approaching te, the equation does not hold true, which in the worst case results in digital clipping in the audio output of the software. For voice types where the open phase is less than the length of the pitch cycle (i.e. where tc < 1), all four parameters reflect this. It is possible to derive values for te, tp and ta from tc rather than as a percentage of the total pitch cycle. This would, however, produce timing parameter values that are not related to any recorded function of the voice source from the literature. An example of this would be a model of the vocal fry voice type, in which tc is often set to less than half of the overall pitch cycle. The waveform in this case does not simply represent a contracted version of a typical LF-model cycle, as the gradient of the return phase is much steeper than for a modal or breathy voice.

In [14], Childers discusses the effects of voice source parameters in terms of vocal ‘tension’, which relates to the perceived effort of the speaker. It was found that varying te, tp and ta individually could produce shifts in perceived vocal tension. However, a closer correlation to perception of vocal effort was found in the speed quotient (SQ). The speed quotient is defined by a cross-coupled relationship between the four LF-parameters, and therefore reflects a more natural mode of modification of the vocal source, as opposed to altering each parameter independently of the others.
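As a numerical illustration (not part of the application code), the speed quotient can be computed for the 'typical' voice type using the SQ expression that appears later in the MATLAB implementation (section 3.4.1), and recomputed after a 'vocal tension' adjustment of the kind described there:

% Illustrative speed quotient (SQ) calculation for the 'typical' voice type,
% using the SQ expression from the MATLAB implementation in section 3.4.1.
te = 0.780; tp = 0.600; ta = 0.028;      % 'typical' timing values (fractions of the period)

SQ = tp/(te + ta - tp);                  % approximately 2.88 for the default values

% A positive 'vocal tension' adjustment shifts te and tp later and shortens ta,
% raising the SQ (a 'tenser' voice); a negative adjustment lowers it.
vocalTension = 0.2;
te2 = te + te*vocalTension;
tp2 = tp + tp*vocalTension;
ta2 = ta - ta*vocalTension;
SQ2 = tp2/(te2 + ta2 - tp2);             % approximately 3.02 with vocalTension = +0.2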

3.3.2 Wavetable Synthesis

Initially, playback of the LF-waveform as a periodic signal was achieved via wavetable synthesis. This is a method of digital synthesis in which a precomputed waveform is stored as an array of discrete values. A lookup address is calculated at each sample step depending on the pitch of playback and sampling frequency. Often, this address will be a non-integer, and so some form of interpolation is required to define a value for the current output sample.
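The lookup principle can be sketched as follows; this is illustrative only, assumes a precomputed single-cycle table lfTable, and is not the code used in the final application (which, as described below, abandons wavetable playback in favour of per-sample calculation):

% Minimal wavetable-lookup sketch with linear interpolation.
% lfTable is assumed to hold one precomputed cycle of the LF-waveform.
Fs = 44100; f0 = 110;
tableLen = numel(lfTable);
phaseInc = tableLen*f0/Fs;           % table positions advanced per output sample
phase = 0;
out = zeros(1, Fs);                  % one second of output

for n = 1:Fs
    idx = floor(phase) + 1;                      % integer part (1-based index)
    frac = phase - floor(phase);                 % fractional part
    nextIdx = mod(idx, tableLen) + 1;            % wrap to the start of the table
    out(n) = (1-frac)*lfTable(idx) + frac*lfTable(nextIdx);  % linear interpolation
    phase = mod(phase + phaseInc, tableLen);     % advance and wrap the phase
end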

Wavetable synthesis provided a stable playback method for pitch and amplitude changes in the LF-model signal. However, when modifications such as timing parameter changes were introduced, digital distortion occurred. This was due to the fact that to effect a change in the LF-model waveform, it had to be recalculated from the first sample, thereby creating discontinuities. A less computationally efficient alternative to wavetable synthesis was tested. This involved performing the LF-model equations at every sample step. It was found that this performed well in MATLAB and iOS, even when rapid changes were made to timing parameters and pitch, with no detectable discontinuities. Further additions to the software may affect the computational load, and so a more optimised wavetable synthesis method such as those described in [69] could be employed.

The advantage of using a fixed-size wavetable to represent the LF-model waveform was that it was relatively straightforward to provide visual feedback for the user in iOS. Using the Core Plot library [70], a simple on-screen graph was produced that represented the value of each sample in the wavetable (fig. 3.5).


Figure 3.5 – Screenshot from first implementation of the 'LFGen' application with CorePlot waveform display

3.3.3 Voice Types

An important feature of a voice source modelling synthesiser is the ability to synthesise different voice types. The four voice types that can be modelled with an LF waveform are modal, breathy, vocal fry and falsetto [14]. Other voice types exist, such as whisper, but these are aperiodic signals that consist largely of white noise. LF-parameter values for these four voice types are found in [24] as well as in the source code for Mullen's VocalModel synthesiser [71]. The VocalModel source code also provides a 'typical' voice type; it is assumed that this represents the archetypal waveform commonly associated with the LF-model.

These four voice types (as well as ‘typical’ voice) were made available to the user. In MATLAB, the LF-parameters found in both [24] and [30] were provided so that a preferential set of voice types could be used for the final application.

During informal subjective listening, the LF-parameter values from [24] provided the most noticeable difference between each voice type, with the modal and vocal fry voices eliciting more positive subjective responses than their counterparts from Mullen’s VocalModel code.

Table 3.1 – Five voice types used for the 'LFGen' app and their corresponding timing parameter values

Voice type    Te (%)    Tp (%)    Ta (%)    Tc (%)
Modal          57.5      45.7      0.9      100
Breathy        75.6      52.9      8        100
Vocal Fry      25.1      19        0.8      100
Falsetto       77        57        13       100
'Typical'      78        60        2.8      100

Appropriate pitch ranges were considered for each voice type. In general, breathy and modal voices are both used for typical speech and singing pitch ranges, while vocal fry is associated with low speech and singing pitches, and falsetto with high singing pitches. An f0-dependent voice type switching method would mimic this feature of the human voice, and prevent irregularities such as a vocal fry waveform appearing at high pitches.

Breathy and falsetto voice types also contain a high-frequency noise component [14]. A random number generator was implemented in the main processing loop to provide a white noise signal with values ranging between -1 and +1. The turbulent noise component of a breathy or falsetto voice features a low-frequency cutoff at around 2 kHz. Due to low-frequency masking effects when added to the LF-model signal, it is not necessary to filter the noise component first, and so unfiltered white noise was sufficient. The relative amplitude of the white noise signal is quoted as around 5% of the source signal [14]. This value was tested in MATLAB and provided a sufficient amount of 'breathiness' to the voice source. Also significant to the turbulent noise component are the duration and start time of the noise signal. Breathy voice is caused by an incomplete closure of the glottis during the 'closed phase' of the cycle. This produces a turbulent airflow through the narrow constriction, causing noise. The noise usually starts at around 75% of the pitch period and lasts for around 50% of it [14], meaning that the 'breathy' phase of the cycle actually overlaps two adjoining pitch periods. Three cycles of the breathy voice waveform with added noise are displayed in figure 3.6 below.


Figure 3.6 – Three cycles of breathy voice type waveform with turbulent noise quotient visible

3.3.4 iOS Interface

In keeping with specification 3 from section 3.2, an iOS implementation of the voice source model was developed alongside the MATLAB prototypes. This meant developing a suitable user interface in parallel with the voice source model. Several approaches were considered, including a multitouch, gesture-based interface. Early in the development stages, it was concluded that such an interface, whilst providing an added layer of cross-coupling between the many variables at play during speech production, would constitute a lengthy development process that would detract from the primary task.

3.3.5 Final Design

The final design for the application incorporates an LF-waveform generator, a noise generator, pitch and amplitude modulation, voice type selection and interpolation and a method of providing automatic f0 input. A simple iOS interface provides the user with on-screen buttons and sliders for discrete and continuous parameters. In MATLAB, the user is provided with a larger range of options such as custom voice types, noise source amount and length, and audio file playback and saving.

Figure 3.7 presents the functionality of the application in a ‘black box’ format.

Figure 3.7 – Black box diagram for ‘LFGen’ application

The app design can be summarised in pseudocode as follows:

Set initial values:
    Set voice type to TYPICAL
    Set f0 to 110 Hz
    Set amplitude to 50%
    Set auto f0 to OFF
    Set auto voice type to OFF
End

User input handling:
    Set f0, amp, voice type to user-selected values
End

Voice types:
    Set VOCAL FRY max and min f0 values
    Set FALSETTO max and min f0 values

    SWITCH voice types
        Case 0: Set to TYPICAL voice
        Case 1: Set to MODAL voice
        Case 2: Set to VOCAL FRY voice
        Case 3: Set to BREATHY voice
        Case 4: Set to FALSETTO voice
    END

    IF auto voice type is set to ON:
        IF f0 is less than min VOCAL FRY f0
            Set voice type to VOCAL FRY
        END
        IF f0 is more than min VOCAL FRY f0 and less than max VOCAL FRY f0
            INTERPOLATE between VOCAL FRY and MODAL/BREATHY/TYPICAL voice types
        END
        IF f0 is more than max VOCAL FRY f0 and less than min FALSETTO f0
            Set voice type to currently selected (MODAL/BREATHY/TYPICAL)
        END
        IF f0 is more than min FALSETTO f0 and less than max FALSETTO f0
            INTERPOLATE between MODAL/BREATHY/TYPICAL and FALSETTO voice type
        END
        IF f0 is more than max FALSETTO f0
            Set voice type to FALSETTO
        END
    END

LF-waveform calculation:
    CALCULATE LF-equation coefficients
    CALCULATE length of one full pitch period in number of samples
End

Main Processing Loop:
    IF auto f0 is set to OFF
        Perform LF-equation calculations
        Assign value of current sample to output variable
        IF BREATHY or FALSETTO voice types are selected (i.e. if white noise is needed)
            CALCULATE white noise sample from random number function
            CALCULATE portion of waveform to add noise to. If start time + duration
                is larger than one pitch period, CALCULATE the remainder to add to
                the beginning of the next pitch period
            MULTIPLY noise sample by noise amplitude
            ADD noise sample to current output variable
        END
    END

    IF auto f0 is set to ON
        IF current sample is at beginning of new pitch period
            SET length of pitch period in samples
        END
        Perform LF-equation calculations
        Assign value of current sample to output variable
        IF BREATHY or FALSETTO voice types are selected (i.e. if white noise is needed)
            CALCULATE white noise sample from random number function
            CALCULATE portion of waveform to add noise to. If start time + duration
                is larger than one pitch period, CALCULATE the remainder to add to
                the beginning of the next pitch period
            MULTIPLY noise sample by noise amplitude
            ADD noise sample to current output variable
        END
    END

    SET output sample to output variable value
End

3.4 Implementation

The implementation stage consisted of two parallel tasks. First, each feature of the application was implemented and refined in MATLAB. Once tested and approved, the feature was incorporated into the iOS application. This allowed the functionality of each parameter to be studied closely in isolation from other factors before being implemented in the final app. The MATLAB simulations are not in real time, and take the form of a step-by-step process in which a waveform of a defined length is calculated and converted to a PCM audio file. The real-time processing in iOS is achieved via Core Audio. This allows for a continuous waveform that can be modified on the fly. Despite the MATLAB and iOS development taking place in parallel, they are presented here separately for the sake of clarity.

3.4.1 Implementation in MATLAB

The MATLAB implementation provides the user with several options in the form of on-screen text prompts. The first section of code is a switch-case statement for defining the timing parameters for each voice type:

% Voice type selection:

voicetype = input('Enter a voice type (number between 1-8): ');

switch voicetype

    % for TYPICAL voice type
    case 1
        f0 = input('Enter a frequency between 94-287: ');
        period = 1/f0;
        tc = 1.000*period;
        te = 0.780*period;
        tp = 0.600*period;
        ta = 0.028*period;

    % for MODAL voice type
    case 2
        f0 = input('Enter a frequency between 94-287: ');
        period = 1/f0;
        tc = 0.582*period;
        te = 0.554*period;
        tp = 0.413*period;
        ta = 0.004*period;

    % for CUSTOM voice type
    case 9
        f0 = input('Enter a suitable frequency: ');
        period = 1/f0;
        TC = input('Enter a value for Tc (0-1): ');
        TE = input('Enter a value for Te (0-1): ');
        TP = input('Enter a value for Tp (0-1): ');
        TA = input('Enter a value for Ta (0-1): ');
        % all timing parameters are entered as fractions of the pitch period
        tc = TC*period;
        te = TE*period;
        tp = TP*period;
        ta = TA*period;
end

The timing parameters are given as percentages of the overall pitch period. The user is given nine possible options: typical, modal, vocal fry and breathy voice types taken from the VocalModel source code; modal, vocal fry, breathy and falsetto voice types obtained from [24]; and a custom voice type for user-defined timing values. The user is also presented with suggested values for fundamental frequency based on the selected voice type. These suggested frequencies are 94-287 Hz for typical, modal and breathy, 287-440 Hz for falsetto, and 24-52 Hz for vocal fry. These pitch range values were obtained from various sources, including [14], [72] and [17].

Next, the user is prompted for a 'vocal tension' value. This is a rough calculation that raises or lowers the SQ by up to 30%. This is achieved by adding/subtracting a percentage from the previously defined timing parameters. This allows the user to select, for example, a breathy voice type, but to modify the SQ to elicit a more or less 'tense' voice than the standard breathy voice values allow:

vocalTension = input('Enter a vocal tension value (+/- 0.3): ');
te = te + te*vocalTension;
tp = tp + tp*vocalTension;
ta = ta - ta*vocalTension;

SQ = tp/(te+ta-tp); % Speed Quotient


The bulk of the LF-waveform calculation is performed next. First, oversampling is performed in order to avoid waveform discontinuities produced by a lower sample rate, followed by the definition of the remaining coefficients. Next, values for ε, α and E0 are calculated iteratively in order to satisfy the area balance condition

\int_{0}^{t_c} E(t)\, dt = 0 \qquad [3.1]

where ε is the exponential time constant of the return phase, α is the exponential growth factor of the sinusoid portion (opening phase), and E0 is the maximum positive flow. These are calculated and recalculated until the positive and negative areas of the waveform are equal:

% Oversampling
period = period/overSample;
tc = tc/overSample;
te = te/overSample;
tp = tp/overSample;
ta = ta/overSample;

% LF coefficient calculation
tn = te - tp;
tb = tc - te;
wg = pi/tp;
Eo = Ee;
areaSum = 1.0;
peakChange = 0.001;
optimumArea = 1e-14;
epsilonDiff = 10000.0;
epsilonOptimumDiff = 0.1;

% solve iteratively for epsilon
epsilonTemp = 1/ta;
while abs(epsilonDiff) > epsilonOptimumDiff


    epsilon = (1/ta)*(1-exp(-epsilonTemp*tb));
    epsilonDiff = epsilon - epsilonTemp;
    epsilonTemp = epsilon;

    if epsilonDiff < 0
        epsilonTemp = epsilonTemp + (abs(epsilonDiff)/100);
    end
    if epsilonDiff > 0
        epsilonTemp = epsilonTemp - (abs(epsilonDiff)/100);
    end
end

% iterate through area balance to get Eo and alpha to give A1 + A2 = 0
while (areaSum > optimumArea)
    alpha = real(( log(-Ee/(Eo*sin(wg*te))))/te);

    Area1 = ( Eo*exp(alpha*te)/(sqrt(alpha*alpha+wg*wg)))...
        * (sin(wg*te-atan(wg/alpha)))...
        + (Eo*wg/(alpha*alpha+wg*wg));

    Area2 = ( -(Ee)/(epsilon*epsilon*(ta)))...
        * (1 - exp(-epsilon*tb*(1+epsilon*tb)));

    areaSum = Area1 + Area2;

    if areaSum > 0.0
        Eo = Eo - 1e5*areaSum;
    elseif areaSum < 0.0
        Eo = Eo + 1e5*areaSum;
    end
end

The length of one full pitch period in samples is calculated, followed by the turbulent noise coefficients. noiseDuration is the length of the turbulent noise portion, determined as a percentage of the pitch period. noiseStart is the point at which the noise begins:

% Noise parameters
noiseAmt = input('Enter a turbulent noise amount (0-1): ');
if noiseAmt > 0.0
    noiseDuration = input('Enter a turbulent noise duration: ');
    noiseStart = input('Enter a turbulent noise start position: ');
    noiseDuration = noiseDuration*dataLength;
    noiseStart = noiseStart*dataLength;


The user is then prompted to define a length in seconds for audio output. This is converted to a value in samples depending on sampling frequency, and an output waveform array is created:

% playback section
durationSeconds = input('Enter a duration (s): ');
durationSamples = durationSeconds*Fs;
waveform = zeros(1,durationSamples);

The main processing loop incorporates a sample-by-sample calculation of the LF-waveform based on the LF-model equations

E(t) = E_0 \, e^{\alpha t} \sin(\omega_g t), \quad 0 \le t \le t_e \qquad [3.2]

E(t) = -\frac{E_e}{\varepsilon t_a}\left(e^{-\varepsilon(t - t_e)} - e^{-\varepsilon(t_c - t_e)}\right), \quad t_e \le t \le t_c \qquad [3.3]

for i = 1:durationSamples

t = j*period/dataLength;

% This is where the LF waveform is calculated
if t < te
    LFSample = Eo*exp(alpha*t)*sin(wg*t);
end
if t >= te
    LFSample = -((Ee)/(epsilon*ta))*(exp(-epsilon*(t-te))...
        - exp(-epsilon*(tc-te)));
end
if t > tc
    LFSample = 0.0;
end

This is followed by the turbulent noise samples being created using a random number generator. These are added to the current sample depending on the noiseAmt variable. If no noise is selected then the waveform stays the same. If a noise amount has been defined, then a sample of random amplitude (scaled by the noiseAmt variable) is added to the current sample. Only the portions of the waveform defined by noiseDuration, noiseStart and remainder are assigned a noise sample:

if noiseAmt == 0.0
    waveform(i) = LFSample;
end

% Add noise
if noiseAmt > 0.0
    noiseSample = rand(1,1)*noiseAmt;

    % This adds white noise to LF-waveform at noiseStart for noiseDuration.
    % Need to cycle around to the start of pitch period if noiseStart +
    % noiseDuration is bigger than length of period (dataLength):
    if noiseStart + noiseDuration < dataLength
        if j >= noiseStart && j <= noiseStart+noiseDuration
            waveform(i) = LFSample + noiseSample;
        end
        if j < noiseStart || j > noiseStart+noiseDuration
            waveform(i) = LFSample;
        end
    end
    if noiseStart + noiseDuration > dataLength
        remainder = (noiseStart+noiseDuration)-dataLength;
        if j <= remainder || j >= noiseStart
            waveform(i) = LFSample + noiseSample;
        end
        if j > remainder && j < noiseStart
            waveform(i) = LFSample;
        end
    end
end

Finally, a counter j is incremented and reverted to 0 if its value is larger than the pitch period size in samples. This allows multiple pitch periods to be calculated in the same output waveform. The user is then provided with options for playback and saving of the waveform, as well as a plot for visual reference:

j=j+1;

if j > dataLength
    j = j - dataLength;
end
end

plot(waveform);

p = input('press 1 to play sound: ');
if p == 1
    sound(waveform,Fs)
end
s = input('press 1 to save sound: ');
if s == 1
    filename = input('Enter .wav filename: ');
    audiowrite(filename, waveform, Fs);
end
q = input('press 1 to start or 0 to quit: ');
end

A separate MATLAB script was created to prototype an automatic f0 trajectory obtained from a recorded phrase. Praat software was used to obtain the f0 data from a short utterance with a rising and falling f0, lasting one second. This data was then loaded into MATLAB. The sampling frequency of the f0 acquisition was 100 Hz, so the data had to be interpolated to the MATLAB sampling frequency of 44.1 kHz. An output array equivalent to the size in samples of the interpolated f0 data was created. The rest of the code is similar to the previous script, with the exception that the processing loop encompasses the entire process, allowing the waveform to be recalculated at each pitch cycle depending on the current f0 value:

f0_file = load('naturalf02.mat');
f0 = f0_file.F0_Hz;
lengthf0 = size(f0,1);
f0sampleRate = 100; % sample rate for pitch data in PRAAT
Fs = 44100;
overSample = 1000;
f0changeRate = Fs/f0sampleRate;
newf0 = interp(f0,f0changeRate);
lengthNewf0 = size(newf0, 1);
output = zeros(1,lengthNewf0);

q = 1;
k = 1;
while q == 1

    voicetype = input('Enter a number between 1-5: ');

    for i = 1:lengthNewf0

        period = 1/newf0(i);

3.4.2 Implementation in iOS

Implementation of the application in iOS involved adding to a pre-written Core Audio app template by Dimitrios Zantalis. This template provides the necessary Core Audio configurations for a real-time audio app. The RemoteIO audio unit is configured for simple playback with no input sound source. The Audio Stream Basic Description (ASBD) defines the format and size of the audio data, including the number of channels and bytes per channel. More information on Core Audio for iOS is available in [67].

The Model-View-Controller (MVC) design paradigm [73] dictates that core processes (the 'Models') are calculated in separate functions from user interface handling (the 'Views'), with communication between the two handled by 'Controller' functions. This implementation follows the MVC paradigm, with user input handling methods (the 'View') provided in the ViewController function, whilst the main audio processing (the 'Model') occurs in the AudioEngine function. In order to pass variables and coefficients between these methods and functions, a structure of type EffectState (the 'Controller') is declared in the AudioEngine header file (figure 3.8). This structure allows basic operations such as GUI handling and coefficient calculation to happen in separate threads to the high-priority DSP thread, passing variables via EffectState only when necessary.

Figure 3.8 – Software diagram for LFGen iOS application with Model-View-Controller layout highlighted.

Aside from the Core Audio implementation, the algorithm for the iOS version follows a very similar structure to the MATLAB version. First, the parameters available to the user are set based on user input. The main deviation from the MATLAB implementation is the inclusion of a set of initial values. This allows instant playback as soon as the application is opened, rather than waiting for the user to set every parameter value before any sound is processed:


-(void)initValues{
    effectState._f0 = 110.0;
    effectState._k = 0;

    //TYPICAL voice type
    effectState._tcVal = 1.0;
    effectState._teVal = 0.575;
    effectState._tpVal = 0.457;
    effectState._taVal = 0.009;

    effectState._amp = 0.5;
    effectState._pitchSlide = FALSE;
    effectState._autoVoice = TRUE;
}

This sets the initial voice type to 'typical', with automatic voice type selection enabled and the amplitude set to 50%.

The following methods handle the user input from the interface class. If a variable is used in another method, that method is called when the variable is changed. For example, if a timing parameter is altered, the main LF-waveform method is called so that a new waveform can be calculated:

-(void)setTa:(Float32)taValue{
    effectState._taVal = taValue;
    [self createLFInput];
}

The setVoiceType method defines the timing parameters and turbulent noise quotients for the current voice type. The values are passed to the createLFInput method, which calculates the LF-waveform. First, the pitch thresholds for vocal fry and falsetto voice types are defined. These define the pitch ranges over which the voice type will automatically modulate from low-f0 voice types (vocal fry) to mid-f0 voice types (breathy, modal and 'typical') to high-f0 voice types (falsetto). These values are defined in terms of the number of samples in one full pitch period:


int vocalFryMax, vocalFryMin, falsettoMax, falsettoMin;
vocalFryMax = 848;  // 52 Hz
vocalFryMin = 469;  // 94 Hz
falsettoMax = 213;  // 207 Hz
falsettoMin = 153;  // 288 Hz
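These thresholds follow directly from the relationship between the length of one pitch period in samples and the sample rate (44.1 kHz here):

dataLength = F_s / f_0

so that, for example, 44100/52 ≈ 848 and 44100/94 ≈ 469 samples for the vocal fry band, and 44100/207 ≈ 213 and 44100/288 ≈ 153 samples for the falsetto band, matching the constants above.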

The timing parameter and turbulent noise quotient values are given the same values as in the MATLAB script, with an extra statement to call the createLFInput method in order to update the waveform:

switch (effectState._voiceType) {
    case 0: // typical voice type
        effectState._tcVal = 1.0;
        effectState._teVal = 0.780;
        effectState._tpVal = 0.600;
        effectState._taVal = 0.028;
        effectState._noiseOn = FALSE;
        effectState._vocalTension = 0.0;
        [self createLFInput];
        break;

The rest of this method handles the interpolation between voice types. An ‘if’ statement checks if the ‘auto voice type’ option has been selected, and if true, performs the interpolation:

if (effectState._autoVoice == TRUE){

    // Set to Vocal Fry if dataLength is larger than vocalFryMax (i.e. if f0 < 52 Hz)
    if (effectState._dataLength > vocalFryMax) {
        effectState._tcVal = 1.0;
        effectState._teVal = 0.251;
        effectState._tpVal = 0.19;
        effectState._taVal = 0.008;
        effectState._noiseOn = FALSE;
        effectState._vocalTension = 0.0;
        [self createLFInput];
    }

    // Interpolate from Vocal Fry to Breathy/Modal if dataLength is between
    // vocalFryMax & vocalFryMin (i.e. if 52 < f0 < 94)
    if ((effectState._dataLength < vocalFryMax) && (effectState._dataLength > vocalFryMin)){

        interpFraction = ((double)vocalFryMax - (double)effectState._dataLength)/
                         ((double)vocalFryMax - (double)vocalFryMin);

        if (effectState._voiceType == 3) {
            // if BREATHY voice is selected, interpolate from vocal fry params to breathy params
            effectState._noiseOn = TRUE;
            effectState._noiseDuration = 0.5;
            effectState._noiseStart = 0.75;
            effectState._noiseAmount = 0.025*interpFraction;
            effectState._teVal = (0.251*(1-interpFraction))+(0.756*interpFraction);
            effectState._tpVal = (0.19*(1-interpFraction))+(0.529*interpFraction);
            effectState._taVal = (0.008*(1-interpFraction))+(0.082*interpFraction);
        }

        if (effectState._voiceType == 0) { // if TYPICAL voice is selected
            effectState._noiseOn = FALSE;
            effectState._teVal = (0.251*(1-interpFraction))+(0.780*interpFraction);
            effectState._tpVal = (0.19*(1-interpFraction))+(0.600*interpFraction);
            effectState._taVal = (0.008*(1-interpFraction))+(0.028*interpFraction);
        }

        if (effectState._voiceType == 1) { // if MODAL voice is selected
            effectState._noiseOn = FALSE;
            effectState._teVal = (0.251*(1-interpFraction))+(0.575*interpFraction);
            effectState._tpVal = (0.19*(1-interpFraction))+(0.457*interpFraction);
            effectState._taVal = (0.008*(1-interpFraction))+(0.028*interpFraction);
        }

        [self createLFInput];
    }

    // Set to Falsetto if dataLength is less than falsettoMin (i.e. if f0 > 288 Hz)
    if (effectState._dataLength < falsettoMin) {
        effectState._tcVal = 1.0;
        effectState._teVal = 0.770;
        effectState._tpVal = 0.570;
        effectState._taVal = 0.133;
        effectState._noiseOn = TRUE;
        effectState._noiseAmount = 0.015;
        effectState._noiseDuration = 0.5;
        effectState._noiseStart = 0.75;
        effectState._vocalTension = 0.0;

        [self createLFInput];
    }

    // Interpolate from Breathy/Modal to Falsetto if dataLength is between
    // falsettoMin and falsettoMax (i.e. if 207 Hz < f0 < 288 Hz)
    if ((effectState._dataLength >= falsettoMin) && (effectState._dataLength < falsettoMax)) {
        interpFraction = ((double)falsettoMax - (double)effectState._dataLength)/
                         ((double)falsettoMax - (double)falsettoMin);

        if (effectState._voiceType == 3) {
            // if BREATHY voice is selected, interpolate from breathy params to falsetto
            effectState._noiseOn = TRUE;
            effectState._noiseDuration = 0.5;
            effectState._noiseStart = 0.75;
            effectState._noiseAmount = (0.025*(1-interpFraction))+0.015*interpFraction;
            effectState._teVal = (0.756*(1-interpFraction))+(0.770*interpFraction);
            effectState._tpVal = (0.529*(1-interpFraction))+(0.570*interpFraction);
            effectState._taVal = (0.082*(1-interpFraction))+(0.133*interpFraction);
        }

        if (effectState._voiceType == 0) { // if TYPICAL voice is selected
            effectState._noiseOn = TRUE;
            effectState._noiseDuration = 0.5;
            effectState._noiseStart = 0.75;
            effectState._noiseAmount = 0.015*interpFraction;
            effectState._teVal = (0.780*(1-interpFraction))+(0.770*interpFraction);
            effectState._tpVal = (0.600*(1-interpFraction))+(0.570*interpFraction);
            effectState._taVal = (0.028*(1-interpFraction))+(0.133*interpFraction);
        }

        if (effectState._voiceType == 1) { // if MODAL voice is selected
            effectState._noiseOn = TRUE;
            effectState._noiseDuration = 0.5;
            effectState._noiseStart = 0.75;
            effectState._noiseAmount = 0.015*interpFraction;
            effectState._teVal = (0.575*(1-interpFraction))+(0.770*interpFraction);
            effectState._tpVal = (0.457*(1-interpFraction))+(0.570*interpFraction);
            effectState._taVal = (0.028*(1-interpFraction))+(0.133*interpFraction);
        }

        [self createLFInput];
    }

    // if dataLength is between falsettoMax and vocalFryMin (i.e. if 94 Hz < f0 < 207 Hz),
    // keep the t values the same
    if ((effectState._dataLength >= falsettoMax) && (effectState._dataLength <= vocalFryMin)) {
        if (effectState._voiceType == 3) { // if BREATHY voice is selected, use the standard breathy params
            effectState._noiseOn = TRUE;
            effectState._noiseDuration = 0.5;
            effectState._noiseStart = 0.75;
            effectState._noiseAmount = 0.025;
            effectState._teVal = 0.756;
            effectState._tpVal = 0.529;
            effectState._taVal = 0.082;
        }

        if (effectState._voiceType == 0) { // if TYPICAL voice is selected
            effectState._noiseOn = FALSE;
            effectState._teVal = 0.780;
            effectState._tpVal = 0.600;
            effectState._taVal = 0.028;
        }

        if (effectState._voiceType == 1) { // if MODAL voice is selected
            effectState._noiseOn = FALSE;
            effectState._teVal = 0.575;
            effectState._tpVal = 0.457;
            effectState._taVal = 0.028;
        }

        [self createLFInput];

The createLFInput method is largely identical to its MATLAB counterpart, with some minor differences. Due to the real-time nature of the waveform calculation, any changes made to the timing parameters that result in discontinuities or abnormal waveforms can produce unwanted digital distortion and clipping. In order to prevent timing parameter values from falling outside their appropriate ranges, the following lines of code were added:

if (te <= tp) {
    te = tp + tp*0.01;
}

if (te >= (tc-ta)){
    te = (tc-ta) - (tc-ta)*0.01;
}

This simply ensures that te never falls before the point tp or after the return phase (at tc – ta).

Once the LF-waveform parameters have been calculated, they are passed to the effectState structure to be accessed in the main processing loop. The LF-model equations 3.2 & 3.3 are calculated here, with the turbulent noise component created and added to the waveform:

if (noiseOn == TRUE) {
    // Creates white noise between -1 and +1
    noiseSample = rand() % 200;
    noiseSample = noiseSample - 100;
    noiseSample = noiseSample/100;
    // Attenuate noise signal by noise amount selected
    noiseAdd = noiseSample*noiseAmount;
    // Turbulent noise portion of waveform begins at the closing phase and ends
    // during the opening phase (i.e. crosses over two pitch cycles).
    // Need to 'wrap' noise around so that the opening phase portion of noise
    // begins at the start of the pitch cycle.
    // Calculate remainder (portion of noise that extends beyond pitch period):
    noiseRemainder = (noiseStart + noiseDuration) - dataLength;

    // if start time + duration is less than length of period (i.e. no remainder)
    if (noiseStart + noiseDuration < dataLength) {
        if (k >= noiseStart + noiseDuration){
            LFcurrentSample1 = LFcurrentSample;
        }
        if (k < noiseStart) {
            LFcurrentSample1 = LFcurrentSample;
        }
        if (k > noiseStart && k <= noiseStart + noiseDuration) {
            LFcurrentSample1 = LFcurrentSample+noiseAdd;
        }
    }

    // if start time + duration is greater than length of period (i.e. noise wraps round)
    if (noiseStart + noiseDuration >= dataLength){
        if (k > noiseRemainder && k < noiseStart) {
            LFcurrentSample1 = LFcurrentSample;
        }
        if (k <= noiseRemainder || k >= noiseStart) {
            LFcurrentSample1 = LFcurrentSample+noiseAdd;
        }
    }
}

The 'auto f0' option allows the user to select a pre-recorded pitch sequence taken from a natural speech recording using Praat [74]. The f0 data from this recording is stored in an array f0Data[]. This data is recorded at a sampling frequency of 100 Hz, while the iOS system sampling frequency is set to 44.1 kHz. This means that, in order to play back the recorded pitch sequence, the f0 must be updated every 441 samples. A variable sampleCount is incremented every loop. When this variable reaches 441, a new f0 value is taken from f0Data[]. In order to prevent waveform discontinuities, the pitch period length is only updated at the beginning of a new cycle (i.e. when the variable kk = 0):

if (kk > dataLength) {
    kk = 0;
}
if (sampleCount == f0Update) {
    sampleCount = 0;
}

// Update new value for dataLength based on natural f0 data
if (sampleCount == 0) {
    period = (1/f0Data[f0Count])/overSample;
    f0Count += 1;
    if (f0Count == 100) {
        f0Count = 0;
    }
}
if (kk == 0) {
    dataLength = floor(Fs*period*overSample);
}
kk += 1;
sampleCount += 1;

The interface for the app is a simple design with a single view (fig. 3.9). Only on-screen buttons and sliders are used to provide control of toggle events and continuous variables. The user is given the five voice type options, buttons for automatic pitch and voice type modes, pitch control, 'vocal tension' (or speed quotient), vocal fold return rate (the value for ta) and amplitude. Another button allows the user to start and stop audio processing, with a final button for switching between voice-source-only synthesis and voice source with vocal tract modelling synthesis (discussed further in the next chapter):


Figure 3.9 – Screenshot of final implementation of 'LFGen' application interface (a screen-captured demonstration video is available with the accompanying files under 'LFGenDemoVideo.mp4')


3.5 System Testing

In order to verify the functionality of this application, a set of criteria must be set out that should be fulfilled by a completed system. Based on the specifications and design given earlier in the chapter, the system testing criteria for this application are:

1. Reproduction of the glottal source waveform based on the LF-equation given earlier. The output waveform should resemble that discussed in [12] [14] [21] etc., with matching spectra (i.e. a -12 dB/octave slope for falsetto and breathy voice).

2. Fundamental frequency matches that specified by the user.

3. Appropriate manipulation of breathiness, vocal tension, and vocal fold return rate available to the user. These should alter the output spectrum whilst maintaining the overall properties of the LF-waveform.

4. Automatic pitch mode gives an accurate reproduction of the f0 slide of an existing voice recording.

5. Automatic voice type mode switches and interpolates across voice types for specific frequency ranges.

A point of interest from the testing stages is the method by which the voice source model was subjectively verified. Even a highly accurate reproduction of the voice source waveform would sound unrecognisable as a human voice, due to the effect of the vocal tract on the signal. In order to evaluate the voice source model during the testing stages, a 3D printed vocal tract model was used (fig. 3.10). These were developed at the University of York, produced from 3D MRI data. The 3D printing process converts this data into a physical object with an acoustic cavity of the exact dimensions of a human vocal tract during a single sung vowel. When attached to a loudspeaker at the glottis end, any acoustic signal can be used to excite the printed tract's acoustic chamber. Some accuracy is lost due to the rigidity of the material used to produce the model when compared with the soft tissue within the vocal tract, as well as the static nature of the vowel produced. However, it was considered that this provided a sufficient means of quickly assessing the perceived effects of voice source modulation.

Figure 3.10 – A 3D printed vocal tract model for a sung /A/ vowel. The loudspeaker is attached via a seal at the glottis end.

The process for evaluating the voice source model via the 3D printed vocal tract consisted of noting subjective responses from the author and colleagues to the various modifications made to the voice source model. This was an informal process; however, care was taken not to skew external listeners' responses with leading questions such as 'does this version sound more breathy?'. Static pitch tests were played at 110 Hz, except in the case of vocal fry and falsetto, which were also tested at 40 Hz and 300 Hz respectively. Unfortunately no acoustic recordings were made of the resulting waveforms when used in conjunction with the 3D vocal tract; however, the voice source waveforms can be found on the accompanying data CD under 'Audio'. Varying pitch was applied using a linearly rising and falling f0 between 20 and 400 Hz, followed by the pre-recorded f0 trajectory described earlier in this section. These tests were largely brief comparisons between different versions of the same voice type being modelled, in order to ascertain which elicited the most positive response.

A more quantitative study was considered for ascertaining the accuracy of each model as well as the 3D printed vocal tract. This would have consisted of taking anechoic recordings of a) the 'default' LF-waveform (from the VocalModel synth implementation) exciting the 3D vocal tract, b) the extended LF-model for each voice type exciting the 3D vocal tract, c) both versions of the LF-model used with the 2D DWM digital vocal tract model, and d) a human voice singing the same vowel and pitch as the synthesised versions. The human voice would be recorded first so that the f0 and amplitude data could also be extracted and applied to the synthesised versions. By comparing the frequency response of each recording, it would be possible to determine the effectiveness of each combination of models in simulating a spoken or sung vowel. As well as time constraints, one of the reasons for not implementing this testing methodology in this study is the fact that, in order to ensure a fair comparison across each model type, a new set of MRI data would have to be captured so that the 3D and digital models were of the same human vocal tract. Currently, the MRI data used for the 2D DWM is taken from a separate study from that of the 3D vocal tract.

3.5.1 Waveform Reproduction

In order to verify the accuracy of the LF-model equation and its corresponding audio output, plots were made of a full pitch period of each voice type, along with an amplitude spectrum of the signal. The following waveform plots were captured by plotting the data from the PCM-encoded .wav files created in MATLAB (the corresponding audio files can be found under the 'Audio Examples' folder with the accompanying media):

Figure 3.11 – Waveform and spectrum for modal voice type


Figure 3.12 – Waveform and spectrum for breathy voice type

Figure 3.13 – Waveform and spectrum for vocal fry voice type

Figure 3.14 – Waveform and spectrum for falsetto voice type


Figure 3.15 – Waveform and spectrum for ‘typical’ voice type

These plots show that the algorithm produces a very similar waveform to the LF-model. It can immediately be seen that the ‘flute-like’ falsetto voice (fig. 3.14) more closely resembles a sine wave, whereas the tense vocal fry waveform (fig. 3.13) possesses a far steeper return curve with a shorter open quotient.

Figures 3.12 and 3.14 both clearly display a small amount of amplitude-modulated white noise at the beginning and end of the waveform. It can be seen from the amplitude spectrum plots that this makes up the interharmonic white noise portion of the spectrum, consistent with the breathy voice simulations explained in [14]. As Childers et al. found, the effect of the strong low-frequency harmonics masking the noise component can be seen clearly in the spectrum. This confirms that a high-pass filter is not necessary to eliminate perceptible noise below 2 kHz.
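For illustration, a minimal MATLAB sketch of this pulse-modulated noise component is given below, applied to a single pre-computed pitch period held in lfPeriod; the window start and length used here are illustrative values rather than the exact settings used in the app:

noiseAmt   = 0.05;                      % noise amplitude: 5% of the peak amplitude
noiseStart = round(0.7*dataLength);     % window start, in samples into the period (illustrative)
noiseDur   = round(0.4*dataLength);     % window length; wraps around the period boundary

periodOut = zeros(1, dataLength);
for j = 1:dataLength
    % is sample j inside the noise window (allowing wrap-around)?
    inWindow = mod(j - noiseStart, dataLength) < noiseDur;
    if inWindow
        periodOut(j) = lfPeriod(j) + noiseAmt*rand(1,1);
    else
        periodOut(j) = lfPeriod(j);
    end
end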


3.5.2 Fundamental Frequency

The fundamental frequency for each voice type in MATLAB and iOS was analysed when set to 24 Hz, 110 Hz and 440 Hz. This was achieved using the Pitch Detection Nyquist plug-in for Audacity [75], routing the system audio from MATLAB and iOS to the DAW using the Soundflower internal soundcard plug-in [76]. This was considered the optimum method as no digital-to-analogue conversion (and vice versa) was required, leaving the PCM signal largely untouched. All voice types produced the desired pitch when set to 24 and 110 Hz, with small discrepancies (+/- 1 Hz) when set to 440 Hz. This could be purely due to the pitch detection algorithm used within the plug-in (indeed, using the spectrum plotting plug-in yields differing results from the dedicated pitch detection plug-in, and is variable with window types and sizes), but may also require further work to improve the fundamental frequency accuracy.

3.5.3 ‘Vocal Tension’ Parameters

The two parameters identified as capable of producing a noticeable change in vocal tension (i.e. relaxed or stressed voices) were the speed quotient (SQ) and the vocal fold return rate (ta). Tense voices are defined as possessing a ‘broad peak in the spectrum at high frequencies’ while relaxed voices possess a ‘steeply declining spectral slope’ [14]. To confirm that altering these parameters produces the desired spectral effects, the ‘typical’ voice source was synthesised with no vocal tension and a return rate of 0.028. The vocal tension parameter was then set to -0.3 and +0.2 (found to be the outer limits to which this value can be set before the waveform becomes distorted and clipping occurs). Vocal fold return rate variation is achieved by adding or subtracting a percentage of the default ta value; this was set to -0.9 and then +0.1. The resulting waveform plots and spectra are presented in figures 3.16 to 3.23.

Figure 3.16 – ‘Typical’ voice type waveform with minimum (black) and maximum (red) ‘vocal tension’ values superimposed

Figure 3.17 – ‘Typical’ voice type spectrum with no adjustments to ‘vocal tension’ (VT)


Figure 3.18 – ‘Typical’ voice type spectrum with minimum VT

Figure 3.19 – ‘Typical’ voice type spectrum with maximum VT

Figure 3.20 – ‘Typical’ voice type waveforms with minimum ta value (black), default (blue) and maximum (red) superimposed.


Figure 3.21 – ‘Typical’ voice type spectrum with no adjustments to ta value

Figure 3.22 – ‘Typical’ voice type spectrum with minimum ta

Figure 3.23 – ‘Typical’ voice type spectrum with maximum ta value


The spectra show that an increase in the VT parameter provides a small boost in high frequencies above 3 kHz, while a steeper high-frequency roll-off occurs with a low SQ. From the waveform plots, it is clear that the most marked effect of altering the SQ is the extension of the opening phase and a decrease in the positive peak amplitude due to the area balancing condition. While the effect on the waveform is quite extreme, the audible spectral effects are far more subtle. This is because the main excitation of the glottal source comes from the negative peak and the return slope gradient, which are only marginally affected by the SQ value and do not vary in amplitude.

Variations in the ta value provide more distinct spectral effects, in line with Childers’ definition of hypo-/hyper-tension. This also provides a more stable alternative, as explained in section 3.3: ta can be altered over a wide range of values without producing discontinuities, while maintaining the overall LF-model ‘shape’.
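For reference, these two adjustments are applied to the timing parameters as follows in the MATLAB implementation (see Appendix A), where vocalTension and vocalFoldReturnRate are the user-supplied values:

% 'Vocal tension' (SQ) adjustment: shift the peak timings in a cross-coupled manner
te = te + te*vocalTension;
tp = tp + tp*vocalTension;
ta = ta - ta*vocalTension;

% Independent vocal fold return rate adjustment
ta = ta + ta*vocalFoldReturnRate;

SQ = tp/(te + ta - tp);   % resulting speed quotient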

3.5.4 Automatic Pitch-Dependent Voice Types

A key feature of this application, designed to mimic the way the human voice source responds to pitch, is the automatic modulation between voice types, with the vocal fry voice for low frequencies, modal, breathy or ‘typical’ for the mid-range, and falsetto for high frequencies. In natural speech the voice type does not change instantly as it passes certain pitch thresholds, but modulates between voicings.
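A minimal MATLAB sketch of how this pitch-dependent crossfade can be realised is given below, using the thresholds observed in the tests that follow and one of the normalised timing-parameter sets from Appendix A; the function name and the simple linear crossfade are illustrative rather than the exact app code:

function [tc, te, tp, ta] = autoVoiceParams(f0)
% Normalised timing parameters [tc te tp ta] per voice type (from Appendix A)
fry      = [0.720 0.596 0.481 0.027];
modal    = [0.582 0.554 0.413 0.004];
falsetto = [1.000 0.770 0.570 0.133];

if f0 <= 52                      % pure vocal fry
    p = fry;
elseif f0 <= 94                  % vocal fry to modal crossfade
    a = (f0 - 52)/(94 - 52);
    p = (1 - a)*fry + a*modal;
elseif f0 <= 207                 % pure modal
    p = modal;
elseif f0 <= 288                 % modal to falsetto crossfade
    a = (f0 - 207)/(288 - 207);
    p = (1 - a)*modal + a*falsetto;
else                             % pure falsetto
    p = falsetto;
end

period = 1/f0;                   % scale to one pitch period
tc = p(1)*period; te = p(2)*period;
tp = p(3)*period; ta = p(4)*period;
end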


A recording of the output from iOS was made using Audacity, in order to closely analyse the waveforms produced by the ‘auto-voice’ function. An f0 sweep over the entire accessible range of the app (24-440 Hz) was made and the output waveform captured (fig. 3.24):

Figure 3.24 – Waveform of audio output produced with ‘auto-voice’ function enabled, with an f0 sweep from 24-440 Hz

From observing the full waveform, it is immediately obvious that the interpolation between modal and falsetto induces a much larger positive peak. This is due to the falsetto waveform’s close similarity to a sine wave, with positive and negative peaks of almost equal magnitude. This is an artefact of the area balance condition in the LF equation (equation 2.1): because the negative peak is set to -1, as the waveform becomes more sinusoidal the positive peak approaches +1. This produces a perceived level increase that future implementations of the falsetto voice type would ideally take into account.


Zooming in, it is possible to observe the vocal fry waveform from 24-52 Hz (fig. 3.25):

Figure 3.25 – Two cycles of waveform between 24-52 Hz with ‘auto voice-type’ enabled

Between 52-94 Hz, the voice type modulates from vocal fry to modal. This is evidenced by the shorter open phase and shallower return slope (fig. 3.26):

Figure 3.26 – Waveform between 52-94 Hz with ‘auto voice-type’ enabled


From 94-207 Hz, the voice type remains an unaltered modal voice (fig. 3.27):

Figure 3.27 – Waveform between 94-207 Hz with ‘auto voice-type’ enabled

From 207-288 Hz, the modal voice type modulates to a falsetto waveform, with the noise quotient increasing in amplitude over this range (fig. 3.28):

Figure 3.28 – Waveform between 207-288 Hz with ‘auto voice-type’ enabled


Finally, at 288 Hz and above, a pure falsetto waveform is produced, with the noise quotient at around 5% of the total amplitude (fig. 3.29):

Figure 3.29 – Waveform above 288 Hz with ‘auto voice-type’ enabled

An important note on the ‘auto-voice’ function: the ‘default’ voice (i.e. the voice type used for mid-range frequencies) will always revert to the most recently selected voice type. This means that if the user selects Vocal Fry on-screen, then uses the auto-voice feature, the vocal fry waveform will also be used for mid-range frequencies.


3.5.5 Automatic f0 Trajectory

Figure 3.30 below displays the waveform of the original voice recording and its corresponding spectrogram. The phrase lasts just under one second and modulates over a range from 75 Hz to 121 Hz:

Figure 3.30 – Screenshot of Praat software spectrogram produced from an audio recording of a human voice producing an /A/ vowel with a modulating f0

After synthesis using the same f0 data taken from the Praat software, the waveform was stored as a .wav sound file and loaded back into Praat. The fundamental frequency was then analysed following the same process (fig. 3.31):


Figure 3.31 – Screenshot of Praat software spectrogram produced from an audio recording of a synthesised voice producing an /A/ vowel with a modulating f0

It can clearly be seen that the resynthesised audio possesses a remarkably similar, though not identical, f0 trajectory to the original voice recording. The resynthesised audio lasts just over 0.1 seconds longer, with a peak frequency 3 Hz lower than the original recording. This indicates some minor discrepancy between the sampling frequency of the original audio and the playback in MATLAB. It is also worth noting that the synthesised audio was produced with the 2D DWM vocal tract model (discussed in detail in the next chapter) enabled. The spectrogram shows similar formant patterns to the original recording.
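As an illustration of this process, the following sketch shows how a Praat-derived f0 contour could drive the resynthesis in MATLAB; the file name and the two-column time/frequency format are assumptions made for the example:

Fs = 44100;
contour = dlmread('f0_contour.txt');            % assumed format: [time_s, f0_Hz] per row
t  = (0:1/Fs:contour(end,1))';                  % output time vector
f0 = interp1(contour(:,1), contour(:,2), t, 'linear', 'extrap');

phase    = cumsum(f0)/Fs;                       % accumulated cycles of the fundamental
cyclePos = mod(phase, 1);                       % position within the current pitch period (0-1)
% cyclePos can then either index a single-cycle LF wavetable or be scaled by
% the current period to evaluate the LF equations sample by sample.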


3.6 Conclusions

In this chapter, the key concepts to consider for the final app design have been explored. The effects of modulating each of the available LF-parameters were presented, and an appropriate set of parameters to make available to the user was chosen based on these findings. Subjective listening responses from both the author and colleagues, corroborated by literature on the subject, showed that an f0 trajectory taken from existing voice recordings provided a greater sense of realism. The acoustic attributes of each voice type were considered, including the turbulent noise portion of the breathy and falsetto voice types. The final design was set out, with key stages of the implementation process documented. System test results confirm that the voice type waveforms and spectra are sufficiently similar to those described in the literature. Alterations to perceived ‘vocal tension’ can be achieved by altering the vocal fold return rate, with the spectra associated with ‘hypo-/hyper-tension’ similar to those described in [14]. The ‘auto-voice’ function modulates voice types across the pitch range at the intended values.


4. Vocal Tract Modelling

This chapter details the vocal tract modelling techniques that were implemented as part of the final application. Although this research focuses on the voice source, a large portion of the development and design stages was devoted to vocal tract modelling. As the voice source model described in the previous chapter is designed to work with existing voice synthesis software, a suitable mode of voice synthesis had to be chosen. The 2D Digital Waveguide Mesh (DWM) vocal tract model described in [30] was selected, as a real-time articulatory synthesiser allows a similar level of flexibility and expression as is intended for the voice source model.

The fundamentals of the 2D DWM are described in detail in section 2.5. For clarity, a brief restatement of the 2D DWM vocal tract model is given below. This is followed by a technical report of the implementation stages in MATLAB and iOS, and finally the results from the system testing are presented.

4.1 Vocal Tract Modelling with the 2D DWM

The digital waveguide is a means of representing wave propagation within a digital system. In acoustic terms, a one-dimensional waveguide is implemented as a bi-directional digital delay line. 1D digital waveguides (DWGs) are useful for modelling wave propagation in one dimension, such as vibrations on a string or acoustic pressure waves in a tube [77]. Terminations, reflections and changes in acoustic impedance (such as the nut on a guitar string or the holes in a flute bore) can be modelled using 1D waveguides by altering the amplitude of the propagated signal between delay units.

Extensions to the 1D waveguide method include 2D and 3D waveguide meshes using a variety of topologies. The effectiveness of these synthesis methods can also be improved using accurate models of the input source signal (i.e. turbulent airflow models for flute synthesisers, and excitation models for guitar ‘plucks’).

The 2D digital waveguide mesh (DWM) extension has been shown to provide a computationally efficient and acoustically accurate model of the human vocal tract [30] [49] [50]. The 2D DWM adds a further dimension of reflection to the 1D DWG, with widthwise reflections allowing more complex wave propagation to be modelled. The following summarises the 2D DWM process as described in detail in section 2.5:

1. Cross-sectional area function data of the vocal tract for a set of vowels is obtained via MRI imaging techniques.

2. Area function data is stored as a set of discrete area values at regular intervals, and the size of a single waveguide is calculated based on the length of the vocal tract and the wave propagation distance over the length of one sample step.

3. The overall size of the waveguide mesh is calculated based on the size of one waveguide. The area function data is interpolated across the number of waveguides in the x direction.

4. A raised-cosine function converts the area data to width-wise impedance values (waveguide impedance in the y direction).

5. The pressure at each waveguide junction is calculated using the ‘impedance map’ (the junction pressure is an average of the incoming pressures from the surrounding junctions, weighted by the adjoining impedances). Incoming pressure values at the left-most end of the DWM (i.e. the glottis end) are taken from the current sample produced by the voice source model.

6. Output pressure is taken as the sum of all rightmost junctions multiplied by a lip radiation value.

As discussed in section 2.5, voice synthesis using vocal tract modelling techniques allows a large variety of phonations and articulations to be directly modelled. A complex enough model could theoretically reproduce any articulation possible with the human vocal tract.

A 2D DWM implementation of the vocal tract model that provided the design for this project is the VocalModel software, described in [71]. This was developed by Jack Mullen at the University of York, and was the first vocal tract model capable of producing articulations in real time, thanks to a technique known as dynamic impedance mapping. This technique allows for a static DWM structure whilst manipulating the impedance values at the boundaries to emulate variations in the area function being modelled. This was found to be far more computationally efficient than recalculating the mesh size and shape for each area function, and also allowed dynamic modulation between vowel area functions, so that diphthongs and short vowel-only phrases could be synthesised.
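As an illustration only, and not necessarily how VocalModel realises this internally, one simple way of varying an impedance map over a static mesh is to crossfade between maps pre-computed from two vowel area functions:

% zMapA and zMapB are assumed impedance maps (sizeYMax x sizeXMax) built from
% two vowel area functions as in section 4.2; blend is updated at control rate.
blend = 0.25;                            % 0 = vowel A, 1 = vowel B
zMap  = (1 - blend)*zMapA + blend*zMapB; % impedance map used for the next audio block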

Spectral analysis of the resulting synthesis output showed that the impedance-mapped 2D DWM approach could be used as a reasonably accurate formant synthesis method. The increased dimensionality provides simulation of non-planar acoustic effects such as cross-axial modes, which are not represented by the 1D DWG alternative. Concessions are made to the fact that the 2D DWM method utilises one-dimensional area function data obtained by considering the vocal tract as a straight cylindrical tube of varying diameter, as with the 1D counterpart. It is acknowledged that the 2D DWM presented in Mullen’s thesis is primarily a proof of concept that the heightened dimensionality provides an extra degree of accuracy, and that an extension to a 3D mapping of an asymmetric, non-cylindrical vocal tract model is viable. Despite these concessions, the results were comparable with formants produced by the tried-and-tested 1D DWG method, and diphthongs were reproduced in a stable and accurate manner with no discontinuities [30]. For these reasons (as well as the extensive documentation made available by Mullen), this 2D DWM implementation was chosen as the basis for the vocal tract model used with the current software.


4.2 Implementation of the 2D DWM in MATLAB

The full code listing discussed in this section can be found in the appendix under ‘LFModelStaticVowel.m’.

The original VocalModel software was written in C++ and compiled as a Microsoft Windows application. In order to port sections of this software to iOS, an interim MATLAB port was considered worthwhile. This would allow the development and testing of a stable vocal tract model within an environment suited to straightforward and in-depth debugging.

[NB: The 2D DWM MATLAB port described below is heavily based on Amelia Gully’s work, which was produced at the University of York Audio Lab at the same time as this research]

The primary difference between the MATLAB and C++ implementations is the move from object-oriented programming to MATLAB’s procedural processing. This meant that each individual function found in the VocalModel source code was consolidated into one process. The advantage of this is the removal of any processes to do with real-time articulation, meaning only the fundamentals of the 2D DWM model are required to recreate the vowel. The disadvantage is that real-time articulations are not possible in this MATLAB implementation.


The first lines of code define the size of the DWM based on the average length of the human vocal tract, and the size of one waveguide based on the sampling frequency and speed of sound:

vtLength = 0.175;                    % vocal tract length (m)
vtWidth  = 0.05;                     % vocal tract width (m)
wgSize   = sqrt(2) * 343 / Fs;       % waveguide size from the speed of sound (343 m/s) and sampling rate
sizeXMax = floor(vtLength/wgSize);   % number of waveguides along the tract
sizeYMax = floor(vtWidth/wgSize);    % number of waveguides across the tract

The waveguide mesh ‘grid’ is made up of two sets of two-dimensional arrays: pressure and impedance. The pressure arrays are made up of incoming and outgoing pressure values for each junction, whilst the impedance arrays store impedance values between junctions:

% Pressure:
pNPlus  = zeros(sizeYMax, sizeXMax);
pNMinus = zeros(sizeYMax, sizeXMax);
pEPlus  = zeros(sizeYMax, sizeXMax);
pEMinus = zeros(sizeYMax, sizeXMax);
pSPlus  = zeros(sizeYMax, sizeXMax);
pSMinus = zeros(sizeYMax, sizeXMax);
pWPlus  = zeros(sizeYMax, sizeXMax);
pWMinus = zeros(sizeYMax, sizeXMax);

% Impedance
zNorth = ones(sizeYMax, sizeXMax);
zEast  = ones(sizeYMax, sizeXMax);
zSouth = ones(sizeYMax, sizeXMax);
zWest  = ones(sizeYMax, sizeXMax);

The vowel area function data is stored in an array of size nSlices, so this data is interpolated across an array of size sizeXMax. This area data is converted to impedance data, so that a 1D array containing length-wise impedance values is stored. To convert this 1D information to two dimensions, a raised cosine function is applied to the impedance value at each data point, providing a width-wise array of impedance values, with maximum impedance at the outer edges and minimum impedance at the centre.

The width-wise raised cosine function allows for a total reflection at the outer edges of the waveguide mesh, whilst providing a central channel for signal propagation. A raised cosine of sufficient amplitude also creates an obstruction in the mesh, enabling plosives and glottal stops to be simulated [51].
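A minimal MATLAB sketch of this area-to-impedance conversion is given below; the inverse relation between area and impedance and the wall scaling constant are illustrative simplifications (the full implementation also raises the impedance values to a power, as noted in section 4.3), and nSlices and areaData are assumed to hold the loaded area function:

% Interpolate the nSlices-point area function across the mesh length
areaInterp = interp1(linspace(0,1,nSlices), areaData, linspace(0,1,sizeXMax));
zLength    = 1 ./ areaInterp;          % tube impedance varies inversely with cross-sectional area

% Raised cosine across the mesh width: 1 at the outer edges, 0 at the centre
raisedCos = 0.5 + 0.5*cos(2*pi*(0:sizeYMax-1)/(sizeYMax-1));

wallScale = 30;                        % illustrative wall impedance scaling
for x = 1:sizeXMax
    for y = 1:sizeYMax
        z = zLength(x)*(1 + wallScale*raisedCos(y));
        zNorth(y,x) = z; zEast(y,x) = z;
        zSouth(y,x) = z; zWest(y,x) = z;
    end
end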

Now that the 2D impedance map has been defined, the incoming and outgoing acoustic pressure at each junction is calculated. Nested ‘for’ loops cycle through every value in the 2D arrays and calculate the pressure value for the current junction based on the average of all incoming pressures and impedance values:

for x = 1:sizeXMax
    for y = 1:sizeYMax

        % Calculate pressure at current junction
        pJ = 2*((pNPlus(y,x)/zNorth(y,x) + ...
                 pEPlus(y,x)/zEast(y,x)  + ...
                 pSPlus(y,x)/zSouth(y,x) + ...
                 pWPlus(y,x)/zWest(y,x))) / ...
             ((1/zNorth(y,x)) + (1/zEast(y,x)) ...
            + (1/zSouth(y,x)) + (1/zWest(y,x)));

        % Calculate outgoing pressures from junction
        pNMinus(y,x) = pJ - pNPlus(y,x);
        pEMinus(y,x) = pJ - pEPlus(y,x);
        pSMinus(y,x) = pJ - pSPlus(y,x);
        pWMinus(y,x) = pJ - pWPlus(y,x);

    end;
end;


At each timestep, the incoming pressure values are updated, with reflection values applied at the mesh boundaries. At this point, the current input sample is taken from the LF-waveform output and applied to the west-going pressure values at the glottis end (i.e. when x = 1). Finally, the output pressure is taken as the sum of all right-most pressure values multiplied by the lip radiation amount, and divided by the lip impedance (the right-most impedance value).
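A minimal sketch of this per-timestep update is given below, restricted to the lengthwise (x) direction for brevity; reflectionGlottis, reflectionLips, lipRadiation and the current source sample LFsample are assumed variables, and the full four-direction version can be found in the ‘LFModelStaticVowel.m’ listing:

for y = 1:sizeYMax
    for x = 1:sizeXMax
        % Incoming pressure from the west: at the glottis end (x = 1) this is the
        % reflected outgoing pressure plus the current voice source sample;
        % elsewhere it arrives from the east-going output of the left neighbour.
        if x == 1
            pWPlus(y,x) = reflectionGlottis*pWMinus(y,x) + LFsample;
        else
            pWPlus(y,x) = pEMinus(y,x-1);
        end
        % Incoming pressure from the east: at the lip end (x = sizeXMax) this is
        % the reflected outgoing pressure; elsewhere it arrives from the
        % west-going output of the right neighbour.
        if x == sizeXMax
            pEPlus(y,x) = reflectionLips*pEMinus(y,x);
        else
            pEPlus(y,x) = pWMinus(y,x+1);
        end
        % (north/south propagation and wall reflections follow the same pattern)
    end
end

% Output: sum of the right-most outgoing pressures, scaled by the lip radiation
% amount and divided by the lip-end impedance
outputSample = lipRadiation * sum(pEMinus(:,sizeXMax)) / zEast(1,sizeXMax);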

4.3 Implementation in iOS

The full code listing discussed in this section can be found in the appendix under ‘AudioEngine.m’ and ‘AudioEngine.h’.

The iOS implementation of the 2D DWM is a fairly straightforward port from MATLAB to C (the app is written in Objective-C; however, the C language is supported and in fact recommended for audio applications [67]). The intention is to test the compatibility between the LF-model and 2D DWM synthesis techniques and to ascertain the plausibility of these two methods as a full voice synthesis package for portable devices. As such, dynamic vowel articulations are considered unnecessary at this stage of development. This allows for a far more streamlined implementation of the 2D DWM than that found in the VocalModel source code, as only one vowel area function is accounted for, with the waveguide structure also remaining static. This allows the size of the waveguide mesh to be defined as a constant in the header file, using values obtained from the MATLAB script:


#define SIZE_X_MAX 15
#define SIZE_Y_MAX 4

Using a constant vowel shape also allows the area function data array to be initialised directly, rather than loaded from a text file (found in the createDWM() method):

double areaData[] = { 0.5625, 0.4620, 0.2074, 0.2139, ... 4.2713, 4.6729, 5.0273};

The key difference between the MATLAB and iOS implementations is the use of 1D arrays to store pressure and impedance values. This is due to the fact that the arrays are defined and calculated outside of the main processing loop, and must be passed to the effectState structure in order to be referenced at each timestep. Passing a 2D array between methods requires the use of multiple pointers, whereas a 1D array can be passed straight to the structure and then to the processing loop. This allows for a much more straightforward implementation, but requires a small amount of array index manipulation.

A 2D array such as my2DArray[Y][X] can be considered equivalent to a 1D array my1DArray[Y*X]: the value at my2DArray[a][b] is stored at my1DArray[a*X + b]. In this manner, the pressure and impedance arrays are initialised:

// initialise empty arrays for pressure and impedance


// Pressure:
Float32 pNPlus[sizeYMax*sizeXMax];
Float32 pNMinus[sizeYMax*sizeXMax];
Float32 pEPlus[sizeYMax*sizeXMax];
Float32 pEMinus[sizeYMax*sizeXMax];
Float32 pSPlus[sizeYMax*sizeXMax];
Float32 pSMinus[sizeYMax*sizeXMax];
Float32 pWPlus[sizeYMax*sizeXMax];
Float32 pWMinus[sizeYMax*sizeXMax];

// Initialise pressure arrays to 0
for (int y = 0; y < sizeYMax; y++) {
    for (int x = 0; x < sizeXMax; x++) {
        pNPlus[y*sizeXMax + x]  = 0;
        pNMinus[y*sizeXMax + x] = 0;
        pEPlus[y*sizeXMax + x]  = 0;
        pEMinus[y*sizeXMax + x] = 0;
        pSPlus[y*sizeXMax + x]  = 0;
        pSMinus[y*sizeXMax + x] = 0;
        pWPlus[y*sizeXMax + x]  = 0;
        pWMinus[y*sizeXMax + x] = 0;
    }
}

// Impedance
Float32 zNorth[sizeYMax*sizeXMax];
Float32 zEast[sizeYMax*sizeXMax];
Float32 zSouth[sizeYMax*sizeXMax];
Float32 zWest[sizeYMax*sizeXMax];

// Initialise impedance arrays to 1
for (int y = 0; y < sizeYMax; y++) {
    for (int x = 0; x < sizeXMax; x++) {
        zNorth[y*sizeXMax + x] = 1;
        zEast[y*sizeXMax + x]  = 1;
        zSouth[y*sizeXMax + x] = 1;
        zWest[y*sizeXMax + x]  = 1;
    }
}

As with the MATLAB script, the reflection coefficients are defined here (lines 1416-1418), and the impedance power is defined (line 1422). Lines 1442-1461 contain the linear interpolation code as well as the conversion from area data to impedance. The minimum impedance value is then found and raised to the area power (lines 1463-1470).

Lines 1476-1517 contain the raised cosine function and its application to the impedance map:

for (int i = 0; i < sizeYMax; i++) {

    raisedCosine[i] = (0.5 +
        0.5*cos(2*MY_PI*((double)i/(double)(sizeYMax-1))));

}

Lines 1527-1553 pass all arrays and variables required in the processing loop to the effectState structure.

The waveguide synthesis portion occurs in the main processing loop. First, an ‘if’ statement checks whether the user has selected the vowel synthesis function (line 287). If false, the waveguide synthesis is skipped and the output sample is set to LFcurrentSample1. Otherwise, the same process is followed as in the MATLAB implementation: the pressure at the current junction and its outgoing pressure values are calculated in lines 288-304.

The acoustic pressure values are then propagated to their adjacent sampling points. For example, the current outgoing pressure value for east-going signals (stored in pEPlus[]) is assigned the incoming pressure value from the corresponding west-going signal from the previous sample step (stored in pWMinus[]). In this way, the acoustic pressure value is propagated in all directions across the waveguide mesh, and averages of these values are taken at the sampling points. If a boundary is reached, reflection coefficients are applied to the signal (lines 306-343).

Lines 346 to 354 sum the right-most pressure values to obtain the total output pressure, which is then divided by the right-most impedance value. It was found that, in order to maintain a consistent amplitude with the non-vowel output, the signal had to be multiplied by 125.

4.4 System Testing

The system testing criteria for the 2D DWM functionality are:

1. Formant analysis corroborates with the /A/ vowel formants from the VocalModel software and other sources.

2. The 2D DWM is capable of synthesising other vowel formants using various sets of area function data.

3. An overall system performance check should be conducted to identify any bugs or anomalies in the system. Unwanted signals and waveform discontinuities should be non-existent or sufficiently trivial so as not to affect the perceived output. CPU and memory usage should be minimal, ideally efficient enough to operate stably on an iOS device.

4.4.1 Formant Analysis

Each vowel sound has its own set of spectral peaks or ‘formants’ that distinguishes it from other vowels. Articulations in the vocal tract attenuate or augment certain frequencies present in the voice source spectrum. These formant frequencies can vary across gender, age, accent and other factors, so it is difficult to quantify whether synthesis of a specific vowel is successful. However, comparison to average formant values, such as those given in [30], allows for a reasonable evaluation of successful vowel synthesis.

NB: the notation of vowels in this subsection is given using the SAMPA alphabet [78], along with an example use in a word.

Spectral peaks in an audio file can be automatically detected using the Praat software package. A waveform lasting two seconds at a static frequency of 110 Hz, with area function data for an /A/ (as in ‘bart’) vowel, was produced using MATLAB. Figure 4.1 displays the waveform and its accompanying spectrogram, with formant values highlighted:

Figure 4.1 – Spectrogram produced from a two-second audio file of a synthesised /A/ vowel using the 2D DWM and the voice source model set to the ‘typical’ voice type


The average values for F1, F2, F3 and F4 given in [30] are 673 Hz, 1097 Hz, 2457 Hz and 3464 Hz respectively. Whilst the reproduced formants differ somewhat from the average, the high F1 and low F2 values concur with the placement of the /A/ vowel within the vowel chart described in [79], representing the low tongue position at the back of the mouth which produces this vowel.

4.4.2 Multiple Vowels

In order to confirm that this implementation of the 2D DWM vocal tract produces the expected formants for other vowel area functions, five more vowels were synthesised using area function data from the VocalModel source code. The resultant formants were examined using the same methods described above.

Figure 4.2 is a reproduction of the vowel chart found in [79], showing the six vowels that were simulated and their SAMPA symbols:

Figure 4.2 – Vowel chart for English vowels


The first four formants of each vowel are presented in table 4.1 alongside average formant values for English vowels taken from [30].

Table 4.1 – Formant frequencies produced from the 2D DWM vocal tract model versus average recorded formant values in male English speech. The corresponding audio files were created using the ‘Typical’ voice type and can be found in the ‘Audio Files’ folder with the accompanying media.

Vowel    Synthesised Formants (Hz)            Average Formants (Hz)
         F1     F2     F3     F4              F1     F2     F3     F4
/a/      819    1136   2941   4299            673    1097   2457   3464
/i/      168    2438   2910   4318            303    2172   2851   3572
/3/      447    1731   1936   3257            477    1276   1707   3201
/u/      261    826    2903   4902            342    1067   2219   3342
/Q/      298    893    2792   4578            645    1622   2357   3464
/}/      707    1954   2829   3908            N/A

It is immediately apparent that, whilst some formant values are remarkably similar to the average, most of the synthesised vowel formants deviate by up to several hundred Hz from the average (the most marked deviation being the first two formants for /u/ and /Q/). The synthesised results are, however, similar or identical to those produced using the VocalModel software, which suggests that more refinement is needed in 2D DWM vocal tract modelling methods, such as the inclusion of more sophisticated boundary reflection methods, as well as incorporating the nasal tract within the mesh model.


4.4.3 System Performance

Whilst compatibility with mobile devices is ensured through the use of the Objective-C language and the Xcode IDE, physical performance of the app is of equal importance. At the time of writing, the application has been tested solely using the built-in iOS simulator within Xcode. This is a useful tool for testing and debugging purposes, but does not provide an accurate platform for assessing the performance requirements of the variety of iOS-compatible devices. A simple way to ensure that the app will run smoothly on a physical device is to use the performance check function within Xcode (fig. 4.3). The application was run with the breathy voice selected, vocal tract modelling enabled, and the automatic f0 modulation active. This was deemed to be the most memory-intensive combination of functions. The app performed well, with a maximum CPU usage of 23% and 19.7 MB of memory used. This is well below the 1 GB+ of memory that current iPad models are equipped with, so, assuming the application is used in isolation, performance should not be an issue. This is worth considering for future iterations of the application that may include more complex graphical user interfaces (GUIs) or more computationally expensive synthesis procedures such as dynamic impedance mapping within the 2D DWM.

Figure 4.3 – Performance check for ‘LFGen’ application within Xcode


4.5 Conclusions

In this chapter, methods for acoustic modelling of the vocal tract via digital waveguides were summarised. An in-depth analysis of the MATLAB and iOS ports of the ‘VocalModel’ 2D Digital Waveguide Mesh vocal tract model was provided, and test results presented. The test results confirm that identical or similar functionality to the results described in [30] [51] and [49] was achieved, indicating a successful port of the software. Significant deviations from the average vowel formant frequencies for English male speakers suggest that further refinement of existing methods is needed; however, the key focus of this exercise was to establish similar functionality to existing software, and the viability of such methods on mobile devices. Areas for worthwhile future development have been highlighted, including improvements to the interface, accuracy of vowel reproduction, and dynamic vowel articulations.


5. Summary and Analysis

5.1 Summary

The primary focus of this research was to expand upon existing articulatory voice synthesis methods to include a more sophisticated voice source model. This was achieved via the implementation of the well-established LF-model within an iOS application. Secondary research showed that voice qualities such as type, breathiness, tension and so on are highly important in terms of voice perception [15], but are often overlooked where voice synthesis is concerned. It was proposed that a voice source model that is compatible with existing vocal tract modelling methods, but also capable of producing a wider range of vocal features than a simple LF-model or similar excitation method, would improve the perceived ‘naturalness’ of the synthesised voice. This was achieved using MATLAB and iOS implementations, with input parameters for voice type, breathiness and tension. Other features such as automatic f0 tracking based on recorded utterances and f0-dependent voice type modulation were also implemented with varying degrees of success. This provided the bulk of the research and implementation stages; however, a final task was to integrate the voice source model with the existing 2D Digital Waveguide Mesh model of the vocal tract, originally implemented in the VocalModel software [71] and described in [30]. This involved porting sections of the VocalModel source code first to MATLAB for testing and debugging, followed by an Objective-C implementation for iOS.

5.2 Analysis

5.2.1 Voice Source Synthesis using the LF-Model

Chapter 3 described the design and implementation process of an extended LF-model voice source synthesiser within MATLAB and iOS. Implementation in iOS allowed for real-time modification of various parameters. Amplitude and f0 can be altered on the fly with no detectable distortion. Voice types can be selected during audio processing with instantaneous and easily perceptible results.

The voice types available are: ‘typical’, modal, breathy, vocal fry and falsetto. The resulting spectra of each of these voice type models are sufficiently similar to those described in [14] and were deemed recognisable as their intended type during informal, subjective listening sessions, although at the extremes in pitch it was harder to differentiate between certain voice types (e.g. vocal fry and modal at low frequencies).

5.2.2 Extensions to the LF-Model

Extensions to the basic LF-model include: real-time modifications to the waveform using the LF timing parameters, an additional white noise generator to simulate turbulent airflow in breathy voice, voice type selection, a ‘vocal tension’ parameter, automatic f0-tracking and f0-dependent voice type modulation. Spectrograms taken of the resulting waveforms showed that these extensions were achieved with some success.


Modulation of the individual timing parameters (te, tp, ta and tc) was originally made possible in the first iterations of the iOS app. Using wavetable synthesis, it was found that adjusting these parameters during playback caused digital distortion. A new method of implementation was tested, which involved simply performing the LF-model equations at each sample step. This allowed the waveform to be updated at any point during the pitch period without affecting the output waveform. It was found, however, that individual alterations either had no significant effect on the output waveform, or produced waveforms that did not fit the LF-waveform archetype. This is due to the cross-coupled nature of these timing parameters. The exception is the value of ta, which adjusts the exponential return rate of the waveform, corresponding to the rate at which the vocal folds return to the closed position. This can be altered independently and was found to corroborate Childers’ assertion that it can affect the perceived tension present in the voice [14].
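For reference, a minimal sketch of this per-sample evaluation is shown below, with the LF coefficients (Eo, Ee, alpha, wg, epsilon) and timing parameters assumed to have already been solved, as in Appendix A, and t the current position within the pitch period:

if t < te
    LFsample = Eo*exp(alpha*t)*sin(wg*t);                            % open phase
elseif t <= tc
    LFsample = -(Ee/(epsilon*ta)) * ...
               (exp(-epsilon*(t - te)) - exp(-epsilon*(tc - te)));   % return phase
else
    LFsample = 0;                                                    % closed phase (tc < t < period)
end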

Another voice source parameter that has been shown to affect perceived ‘vocal tension’ is the speed quotient (SQ) [14] [15] [16]. An SQ variable called Vocal Tension was implemented in MATLAB and iOS, which altered the values of te, tp and ta to modify the timing positions of the minimum and maximum peaks in a cross-coupled manner. This has a similar, yet less pronounced, effect on the spectrum as alterations to ta alone. Simply modifying ta also proved to be a more stable method than the SQ option, due to the larger range of values that could be produced whilst maintaining a stable LF-waveform.


The addition of a random number generator to produce a relatively weak white noise signal allowed turbulent airflow to be simulated for the breathy and falsetto voices. This additional signal was pulse modulated in accordance with [14] and set to an amplitude of 5% of the peak amplitude. The resulting spectra displayed a noticeable amount of white noise above 2 kHz. Subjective listening also provided promising results, with an audible noise component that provided a distinctly noticeable ‘breathy’ quality.

Fundamental frequency tracking using the Praat software showed that resynthesis of a pre-recorded f0 trajectory was successful. It has already been suggested that constantly varying f0 signals provide an enhanced realism to vowel synthesis [17], which would appear to be the case with this feature.

It was recognised early on in the research that simply allowing various voice types to be synthesised is not sufficient for human voice source modelling. Voice quality is dependent on many factors, such as pitch and amplitude. A fairly rudimentary attempt at modelling this behaviour was implemented within the iOS app. With ‘Auto VoiceTypes’ selected, as the fundamental frequency rises from 24 to 440 Hz, the voice type modulates from Vocal Fry to Falsetto, with an interpolation stage between Vocal Fry-Modal/Breathy and Modal/Breathy-Falsetto.


5.2.3 Use within 2D DWM Vocal Tract Model

Analysis of the output spectrum showed that both the voice source and vocal tract models operated as expected, with appropriate modifications to the signal when various voice source parameters were adjusted. Formant frequencies varied in terms of their correlation to average values in English male speakers, but this was also found in the original implementation of the 2D DWM vocal tract.

At present, vocal tract modelling within the software is only applicable to a static vowel. This provided sufficient information to determine the feasibility of 2D DWM modelling within iOS, but does not allow for further extensions to the LF-model such as vowel-dependent modulations to the voice source, as documented in [7].

5.2.4 Core Aims

The primary aims of the project (as stated in sections 1.1 and 3.1) were to develop an application that could be applicable to voice science research as well as assistive technology development. It is considered that these core aims were met, at least partially, in a ‘proof-of-concept’ manner. In terms of a contribution to voice science and synthesis research, the application developed provides a versatile and expressive mode of voice source production that is capable of producing LF-model waveforms for five voice types. As far as existing research is concerned, the spectral qualities of the voice types produced are consistent with previous studies. In addition to voice type selection, ‘vocal tension’ has been explored as a parameterised feature of the voice source model, and has to some extent been successfully implemented. Real-time modifications to the vocal tension parameters produce noticeable digital distortion, which does limit the expressiveness of this particular feature; however, if a tension value is selected before a phrase is synthesised, no distortion is present. In the case of full speech synthesis, this would be sufficient for adding or removing tension on a single word or phrase. Other parameters such as ‘breathiness’, vibrato or glottal gap size could easily be made available to the user for further research using the app.

The contribution of this project towards assistive technology development is less conclusive. The implementation of a somewhat sophisticated voice synthesis engine within a portable device shows that vocal tract modelling synthesis is a viable method for assistive technology purposes. By extending the functionality of the voice source model, a larger, more expressive range of voice qualities can be modelled than with the voice synthesis techniques commonly associated with mobile devices, such as Siri [1], which makes use of concatenative synthesis, a technique far more limited in terms of voice type, as well as pitch and rhythm.

However, at this stage of development of both the voice source model and the 2D DWM vocal tract model, entire words and phrases cannot be recognisably created, and so the effectiveness of the system’s use within an assistive technology context remains speculative.


5.3 Future Research

As stated above, the primary focus of this research was to expand on existing methods of voice source modelling in order to explore more natural-sounding voice synthesis techniques. There are numerous potential avenues towards achieving this goal, many of which fall outside the scope of this project, or were considered only after the work described herein was completed. The following section summarises the main considerations for future work in this area.

5.3.1 Issues within LF-Model implementation

Some extensions to the LF-model included in this application were either intended as a proof of concept, or found to be far more complex than originally anticipated. Because of this, fairly rudimentary implementations of these extensions were used, leaving room for expansion on pre-existing methods and ideas. One such area for improvement is the f0-dependent voice type modulation technique. This method would benefit from further primary research into voice type modulation in natural speech. As no voice source recordings were made specifically for this research, voice type information such as pitch range and timing parameters was taken from secondary sources. It would be beneficial to probe the effects of pitch on the voice source through inverse filtering or similar techniques, in order to ascertain the ranges in which certain voice types are present. This would also go towards defining the behaviour of the voice source as it modulates between voice types, so that a more complex technique than linear interpolation could be employed at the thresholds between voice types.


This could also be used to develop amplitude-dependent voice source modulation.

Two parameters for ‘vocal tension’ were made available to the user: SQ and ta. Both of these were shown to have a noticeable effect on the spectral qualities associated with vocal tension, as outlined in [14]. However, the use of vocal tension in natural speech is clearly a complex issue, as it relates to the perceived emotional content of speech, as well as being affected by factors such as pitch, amplitude and vocal pathology. Further research into vocal tension, perhaps through detailed analysis of inverse-filtered speech waveforms, would allow a more complex, rule-based vocal tension parameter to be incorporated within the model. At present, the inclusion or exclusion of vocal tension in the voice source model is defined only by the user, whereas a thoroughly verified rule-based system would prevent inappropriate use of a ‘tense’ or ‘relaxed’ voice source.

5.3.2 Further Extensions to the Voice Source Model

This implementation of the voice source model focused purely on voice types that could be modelled as a single pitch period of an LF-model waveform variant. This excludes other voice types such as whisper and harsh voice [15], as well as a more accurate vocal fry representation, whose pitch period is in fact made up of several weaker LF pulses following the initial pulse (fig. 5.1).


Figure 5.1 – One pitch period of an idealised vocal fry waveform displaying two secondary pulses

It is clear that the current implementation does not cover every possible voice type, so extending the range of available voice types is an intuitive addition to the existing software. This may involve the inclusion of a wavetable synthesis method resembling that described earlier. A stable wavetable synthesis approach would allow any waveform to be stored and resynthesised, as opposed to only those which can be modelled using the LF-equation.

It has been acknowledged since the early days of voice source research that voice source qualities can be altered by supraglottal factors such as vowel choice and other articulations [7]. A worthwhile extension to the existing voice source model would be the inclusion of these dependent variations in voice quality.

Again, this would benefit from primary research including analytical recordings of the voice source under a variety of conditions.


5.3.3 Multi-touch, Gestural User Interfaces

Whilst skeuomorphic interfaces made up of on-screen buttons and sliders, usually with a one-to-one relationship between control and affected parameter, are the most immediately intuitive to use, it has been suggested many times in HCI-related research that this approach does not directly lend itself to intuitive performance in real time [80]. With a subject as complex as the vocal system, providing individual and independent user control of every possible parameter would not make sense in terms of modelling voice source behaviour in real time.

An alternative approach would be to develop a gestural touch-screen interface that departs from the instantly recognisable skeuomorphic layout in favour of a highly cross-coupled, expressive interface such as that presented in figure 5.3. In formant synthesis, this approach has already proved effective with the HandSynth [81] software and hardware, which allows a variety of vowels, pitch and amplitude to be controlled using a single touchscreen interface and a pressure-sensitive stylus (fig. 5.2). This method could conceivably be adapted to control an array of voice source features concurrently.


Figure 5.2 – HandSynth stylus-operated touchscreen interface for an articulatory synthesiser [image taken from [81]]

Figure 5.3 – Proposed design for a multitouch interface. Position on the x/y plane could control pitch and amplitude, while multitouch gestures such as ‘pinch’ (i.e. the distance between two touch points) could be used to control parameters such as vocal tension


5.3.4 Implementation of Dynamic Impedance Mapping within the 2D DWM

In [30], a method for simulating vowel articulations with the 2D DWM vocal tract model is described. This allows multiple vowels to be synthesised in software, with minimal digital distortion present. Other vocal artefacts such as diphthongs and some consonants can also be modelled using this technique. As a proof of concept, a single static vowel was modelled using the same method; however, a full implementation of the dynamic impedance mapping feature would allow multiple vowels and articulations to be used with the extended LF-model software. This would not only provide a more fully-fledged voice synthesis application, but would also allow further extensions to be implemented, such as the vowel-dependent voice source factors mentioned in section 5.3.2.

5.4 Conclusion

Any research project conducted under a time constraint will be affected by limitations of size and scope, and the research described above is no exception. The primary goal of this project was to develop an application that could contribute to voice synthesis research, via a voice source synthesis engine that is both expressive and versatile. It is considered that the voice source model developed during this project is more expressive and versatile than other comparable software. However, as discussed in section 5.3, there are many further improvements and extensions to consider.


The research provides an impetus for further voice synthesis investigation, as well as a first step towards assistive technology development. The importance of the voice source in accurate synthesis of vocal effects such as breathiness and tension has been established, and a platform for further work into the perception of such effects has been developed.


Appendix A – ‘LFModelFull.m’ MATLAB Source Code

1 % Implementation of Liljencrants-Fant glottal flow model, based on 2 % Jack Mullens 'LFInput' code from 'VocalModel' synthesiser. 3 % Expanded with timing parameters taken from (Fu 2006), 4 % HPF gaussian noise for BREATHY and FALSETTO voices (Childers 1991) 5 6 7 q = 1; % QUIT on/off 8 9 while q == 1 10 11 Fs = 44100; % Sampling rate 12 overSample = 1000; % Oversampling used for more accurate waveform calculation 13 Ee = 1.0; % Positive peak amplitude 14 15 16 % Voice type selection: 17 18 voicetype = input('Enter a voice type (number between 1-9): '); 19 20 switch voicetype 21 22 % for TYPICAL typical voice type 23 case 1 24 f0 = input('Enter a frequency between 94-287: '); 25 period = 1/f0; 26 tc = 1.000*period; 27 te = 0.780*period; 28 tp = 0.600*period; 29 ta = 0.028*period; 30 % for MODAL voice type 31 case 2 32 f0 = input('Enter a frequency between 94-287: '); 33 period = 1/f0; 34 tc = 0.582*period; 35 te = 0.554*period; 36 tp = 0.413*period; 37 ta = 0.004*period; 38 % for VOCAL FRY(?) typical voice type 39 case 3 40 f0 = input('Enter a frequency between 24-52: '); 41 period = 1/f0; 42 tc = 0.720*period; 43 te = 0.596*period; 44 tp = 0.481*period; 45 ta = 0.027*period; 46 % for BREATHY voice type 47 case 4 48 f0 = input('Enter a frequency between 94-287: '); 49 period = 1/f0; 50 tc = 0.771*period; 51 te = 0.660*period; 52 tp = 0.462*period; 53 ta = 0.027*period; 54 % for MODAL voice type (from Fu 2006) 55 case 5 56 f0 = input('Enter a frequency between 94-287: '); 57 period = 1/f0; 128 Appendices 58 tc = 1.0*period; 59 te = 0.575*period; 60 tp = 0.457*period; 61 ta = 0.009*period; 62 % for VOCAL FRY voice type (from Fu 2006) 63 case 6 64 f0 = input('Enter a frequency between 24-52: '); 65 period = 1/f0; 66 tc = 1.0*period; 67 te = 0.251*period; 68 tp = 0.19*period; 69 ta = 0.008*period; 70 % for BREATHY voice type (from Fu 2006) 71 case 7 72 f0 = input('Enter a frequency between 94-287: '); 73 period = 1/f0; 74 tc = 1.0*period; 75 te = 0.756*period; 76 tp = 0.529*period; 77 ta = 0.082*period; 78 % for FALSETT0 voice type (from Fu 2006) 79 case 8 80 f0 = input('Enter a frequency between 287-440: '); 81 period = 1/f0; 82 tc = 1.0*period; 83 te = 0.770*period; 84 tp = 0.570*period; 85 ta = 0.133*period; 86 % for CUSTOM voice type 87 case 9 88 f0 = input('Enter a suitable frequency: '); 89 period = 1/f0; 90 TC = input('Enter a value for Tc (0-1): '); 91 TE = input('Enter a value for Te (0-1): '); 92 TP = input('Enter a value for Tp (0-1): '); 93 TA = input('Enter a value for Ta (0-1): '); 94 tc = TC*period; 95 te = TE*tc; 96 tp = TP*tp; 97 ta = TA*ta; 98 end 99 100 % Vocal Tension modulation. 
This is a rough calculation of the Speed 101 % Quotient (SQ) achieved by adding/subtracting a percentage of the original 102 % timing parameters 103 104 vocalTension = input('Enter a vocal tension value (+/1 0.3): '); 105 te = te + te*vocalTension; 106 tp = tp + tp*vocalTension; 107 ta = ta - ta*vocalTension; 108 109 vocalFoldReturnRate = input('Enter a vocal fold return rate value (- 0.9 – 1.0): '); 110 ta = ta + (ta*vocalFoldReturnRate); 111 112 SQ = tp/(te+ta-tp); % Speed Quotient 113 114 % Oversampling 115 period = period/overSample; 116 tc = tc/overSample; 117 te = te/overSample; 118 tp = tp/overSample; 119 ta = ta/overSample; 129 Appendices 120 121 % LF coefficent calculation 122 tn = te - tp; 123 tb = tc - te; 124 125 wg = pi/tp; 126 Eo = Ee; 127 128 areaSum=1.0; 129 peakChange=0.001; 130 optimumArea=1e-14; 131 epsilonDiff=10000.0; 132 epsilonOptimumDiff=0.1; 133 134 % solve iteratively for epsilon 135 136 epsilonTemp = 1/ta; 137 138 while abs(epsilonDiff)>epsilonOptimumDiff 139 140 epsilon = (1/ta)*(1-exp(-epsilonTemp*tb)); 141 epsilonDiff = epsilon - epsilonTemp; 142 epsilonTemp = epsilon; 143 144 if epsilonDiff<0 145 epsilonTemp = epsilonTemp + (abs(epsilonDiff)/100); 146 end 147 if epsilonDiff>0 148 epsilonTemp = epsilonTemp - (abs(epsilonDiff)/100); 149 end 150 151 end 152 153 % iterate through area balance to get Eo and alpha to give A1 + A2 = 0 154 155 % original line in JM's code. causes MATLAB to keep running until Eo is 156 % NaN: 157 % while (areaSum<-optimumArea)||(areaSum>optimumArea) 158 159 % use this line for MATLAB 160 while (areaSum > optimumArea) 161 alpha = real(( log(-Ee/(Eo*sin(wg*te))))/te); 162 163 Area1 = ( Eo*exp(alpha*te)/(sqrt(alpha*alpha+wg*wg)))... 164 * (sin(wg*te-atan(wg/alpha)))... 165 + (Eo*wg/(alpha*alpha+wg*wg)); 166 167 Area2 = ( -(Ee)/(epsilon*epsilon*(ta)))... 168 * (1 - exp(-epsilon*tb*(1+epsilon*tb))); 169 170 areaSum = Area1 + Area2; 171 172 if areaSum>0.0 173 Eo = Eo - 1e5*areaSum; 174 elseif areaSum<0.0 175 Eo = Eo + 1e5*areaSum; 176 end 177 178 end 179 180 % Calculate length of one pitch period in samples (dataLength) 181 dataLength = floor(overSample*Fs*period); 182 output = zeros(1,dataLength); % Output waveform for one pitch period 130 Appendices 183 184 185 % Noise parameters 186 noiseAmt = input('Enter a turbulent noise amount (0-1): '); 187 if noiseAmt > 0.0; 188 noiseDuration = input('Enter a turbulent noise duration: '); 189 noiseStart = input('Enter a turbulent noise start position: '); 190 noiseDuration = noiseDuration*dataLength; 191 noiseStart = noiseStart*dataLength; 192 193 194 % Calculate filter coefficients for HPF @ 2 kHz 195 % Fc = 2000; % Filter cut-off freq. 196 % Wn = Fc/(Fs/2); % Normalised Fc between 0-1 where 1 corresponds to Nyquist 197 % [b, a] = butter(2, Wn, 'high'); % Computes b and a coefficients for 2nd order butterworth HPF 198 % b0 = b(1); 199 % b1 = b(2); 200 % b2 = b(3); 201 % a1 = a(2); 202 % a2 = a(3); 203 end 204 205 % playback section 206 durationSeconds = input('Enter a duration (s): '); 207 durationSamples = durationSeconds*Fs; 208 waveform = zeros(1,durationSamples); 209 j = 1; % counter 210 211 % checkNoise = zeros(1,durationSamples); 212 213 for i = 1:durationSamples 214 215 t = j*period/dataLength; 216 217 % This is where the LF waveform is calculated 218 if t=te 222 LFSample = -((Ee)/(epsilon*ta))*(exp(-epsilon*(t-te))... 
223 - exp(-epsilon*(tc-te))); 224 end 225 if t>tc 226 LFSample = 0.0; 227 end 228 229 LFSampleQ = trapz(LFSample); 230 231 if noiseAmt == 0.0; 232 waveform(i) = LFSample; 233 waveformQ(i) = LFSampleQ; 234 end 235 236 % Add noise 237 if noiseAmt > 0.0; 238 noiseSample = rand(1,1)*noiseAmt; 239 240 % Could HPF noise here if found necessary 241 242 % This adds white noise to LF-waveform at noiseStart for noiseDuration. 243 % need to cycle around to the start of pitch period if noiseStart + 131 Appendices 244 % noiseDuration is bigger than length of period (dataLength): 245 246 if noiseStart + noiseDuration < dataLength; 247 if j >= noiseStart && j <=noiseStart+noiseDuration 248 waveform(i) = LFSample + noiseSample; 249 end 250 if j < noiseStart || j > noiseStart+noiseDuration 251 waveform(i) = LFSample; 252 end 253 end 254 if noiseStart + noiseDuration > dataLength; 255 remainder = (noiseStart+noiseDuration)-dataLength; 256 if j <= remainder || j >= noiseStart 257 waveform(i) = LFSample + noiseSample; 258 end 259 if j > remainder && j < noiseStart 260 waveform(i) = LFSample; 261 end 262 end 263 end 264 265 % Increment counter and cycle back to start of waveform if counter 266 % > dataLength 267 268 j=j+1; 269 270 if j > dataLength 271 j = j - dataLength; 272 end 273 274 end 275 276 plot(waveform); 277 278 p = input('press 1 to play sound: '); 279 280 if p == 1 281 sound(waveform,Fs) 282 end 283 284 s = input('press 1 to save sound: '); 285 286 if s == 1 287 filename = input('Enter .wav filename: '); 288 audiowrite(filename, waveform, Fs); 289 end 290 291 292 q = input('press 1 to start or 0 to quit: '); 293 294 hold on 295 296 end

132 Appendices

Appendix B – ‘ViewController.h’ LFGen App Header File

1 // 2 // ViewController.h 3 // LFGen 4 // 5 // Created by Jacob Harrison on 11/12/2013. 6 // Copyright (c) 2013 Jacob Harrison. All rights reserved. 7 // 8 9 #import 10 #import "AudioEngine.h" 11 12 @interface ViewController : UIViewController{ 13 AudioEngine *_ae; 14 } 15 16 - (IBAction)startStop:(UIButton *)sender; 17 18 - (IBAction)setF0:(UISlider *)sender; 19 20 - (IBAction)setTension:(UISlider *)sender; 21 22 - (IBAction)setTa:(UISlider *)sender; 23 24 - (IBAction)setAmplitude:(UISlider *)sender; 25 26 @end

133 Appendices

Appendix C – ‘ViewController.m’ LFGen App Main File

1 // 2 // ViewController.m 3 // LFGen 4 // 5 // Created by Jacob Harrison on 11/12/2013. 6 // Copyright (c) 2013 Jacob Harrison. All rights reserved. 7 // 8 9 #import "ViewController.h" 10 11 @interface ViewController () 12 13 @end 14 15 @implementation ViewController 16 17 - (void)viewDidLoad 18 { 19 [super viewDidLoad]; 20 // Do any additional setup after loading the view, typically from a nib. 21 _ae=[[AudioEngine alloc] init]; 22 23 } 24 25 - (void)didReceiveMemoryWarning 26 { 27 [super didReceiveMemoryWarning]; 28 // Dispose of any resources that can be recreated. 29 } 30 31 32 //Start and Stop audio playback 33 - (IBAction)startStop:(UIButton *)sender { 34 35 if([_ae isPlaying]){ 36 [_ae stopPlayback]; 37 } 38 else{ 39 [_ae startPlayback]; 40 } 41 42 } 43 44 45 46 47 - (IBAction)setTa:(UISlider *)sender { 48 [_ae setTa:[sender value]]; 49 } 50 51 - (IBAction)setF0:(UISlider *)sender { 52 [_ae setF0:[sender value]]; 53 } 134 Appendices 54 55 - (IBAction)setTension:(UISlider *)sender { 56 [_ae setTension:[sender value]]; 57 } 58 59 - (IBAction)setAmplitude:(UISlider *)sender{ 60 [_ae setAmplitude:[sender value]]; 61 } 62 63 - (IBAction)setTypical:(UIButton *)sender { 64 [_ae setTypical]; 65 } 66 67 - (IBAction)setModal:(UIButton *)sender { 68 [_ae setModal]; 69 } 70 71 - (IBAction)setBreathy:(UIButton *)sender { 72 [_ae setBreathy]; 73 } 74 75 - (IBAction)setVocalFry:(UIButton *)sender { 76 [_ae setVocalFry]; 77 } 78 79 - (IBAction)setFalsetto:(UIButton *)sender { 80 [_ae setFalsetto]; 81 } 82 83 - (IBAction)vowelOn:(UIButton *)sender { 84 [_ae vowelOn]; 85 } 86 87 - (IBAction)pitchSlide:(UIButton *)sender { 88 [_ae pitchSlide]; 89 } 90 91 - (IBAction)autoVoice:(UIButton *)sender { 92 [_ae autoVoice]; 93 } 94 95 96 97 @end

135 Appendices

Appendix D – ‘AudioEngine.h’ LFGen App Header File

1 // 2 // AudioEngine.h 3 // LFGen 4 // 5 // Created by Jacob Harrison on 11/12/2013. 6 // Credit to Dimitrios Zantalis for initial implementation of EffectState structure 7 // Copyright (c) 2013 Jacob Harrison. All rights reserved. 8 // 9 10 #import 11 #import 12 #import 13 14 #define MY_PI 3.14159265359 15 16 // in order to declare arrays in the EffectState structure, the sizes need to be defined first (there is probably a way of declaring a variable size array here). 17 // for the pressure and impedance arrays to be made available in the processing loop, they will be declared here, meaning that the 'sizeXMax' and 'sizeYMax' variables will also be fixed here 18 19 #define SIZE_X_MAX 15 20 #define SIZE_Y_MAX 4 21 22 typedef struct{ 23 Float64 _hardwareSampleRate; 24 AudioUnit _rioAU; 25 AudioStreamBasicDescription _clientASBD; 26 27 // coefficients for LF waveform calculation 28 double _f0; 29 double _period; 30 int _dataLength; 31 double _Eo; 32 double _Ee; 33 double _alpha; 34 double _wg; 35 double _epsilon; 36 double _taVal; 37 double _tcVal; 38 double _tpVal; 39 double _teVal; 40 double _ta; 41 double _tc; 42 double _tp; 43 double _te; 44 int _k; 45 int _kk; 46 // noise coefficients 47 double _noiseAmount; 48 double _noiseDuration; 49 double _noiseStart; 50 BOOL _noiseOn; 51 // double _noiseFilter[490]; 52 53 double _vocalTension; 54 // vibrato coefficients 136 Appendices 55 double _vibratoFreq; 56 double _vibratoDepth; 57 double _vibratoOut; 58 int _WTSize; 59 BOOL _vibratoOn; 60 double _dt; 61 62 // pressure and impedance arrays for 2D DWM 63 Float32 _pNPlus[SIZE_Y_MAX*SIZE_X_MAX]; 64 Float32 _pNMinus[SIZE_Y_MAX*SIZE_X_MAX]; 65 Float32 _pEPlus[SIZE_Y_MAX*SIZE_X_MAX]; 66 Float32 _pEMinus[SIZE_Y_MAX*SIZE_X_MAX]; 67 Float32 _pSPlus[SIZE_Y_MAX*SIZE_X_MAX]; 68 Float32 _pSMinus[SIZE_Y_MAX*SIZE_X_MAX]; 69 Float32 _pWPlus[SIZE_Y_MAX*SIZE_X_MAX]; 70 Float32 _pWMinus[SIZE_Y_MAX*SIZE_X_MAX]; 71 // Impedance 72 Float32 _zNorth[SIZE_Y_MAX*SIZE_X_MAX]; 73 Float32 _zEast[SIZE_Y_MAX*SIZE_X_MAX]; 74 Float32 _zSouth[SIZE_Y_MAX*SIZE_X_MAX]; 75 Float32 _zWest[SIZE_Y_MAX*SIZE_X_MAX]; 76 77 double _impData[SIZE_X_MAX]; 78 79 double _reflectionGLottis; 80 double _reflectionLips; 81 double _reflectionWalls; 82 83 84 int _sizeXMax; 85 int _sizeYMax; 86 87 BOOL _vowelOn; 88 89 double _amp; 90 91 BOOL _pitchSlide; 92 93 int _count; 94 int _sampleCount; 95 int _f0Count; 96 97 BOOL _autoVoice; 98 99 int _voiceType; 100 101 102 103 }EffectState; 104 105 @interface AudioEngine : NSObject{ 106 AVAudioSession *_AudioSession; 107 BOOL _playing; 108 BOOL _recording; 109 110 //USING FS 111 Float64 _samplingRate; 112 113 NSFileManager *_FileManager; 114 NSURL *_audioDirURL; 115 } 116 117 @property (assign) EffectState effectState; 118 137 Appendices 119 //Public interface 120 -(OSStatus)startPlayback; 121 -(OSStatus)stopPlayback; 122 //-(OSStatus)exportAudioWithName:(NSString*)fileName; 123 -(BOOL)isPlaying; 124 125 -(void)setF0:(Float32)f0Value; 126 127 -(void)setTa:(Float32)taValue; 128 129 -(void)setModal; 130 -(void)setVocalFry; 131 -(void)setBreathy; 132 -(void)setTypical; 133 -(void)setFalsetto; 134 -(void)setAmplitude:(Float32)ampValue; 135 -(void)vowelOn; 136 -(void)pitchSlide; 137 -(void)autoVoice; 138 139 -(void)setVoiceType; 140 141 142 -(void)setTension:(Float32)tensionValue; 143 144 //-(void)makeNoise; 145 146 - (void) checkErr: (OSStatus) error 147 withMessage:(const char *) message; 148 149 @end

138 Appendices

Appendix E – ‘AudioEngine.m’ LFGen App Main File

1 2 // 3 // AudioEngine.m 4 // 'LFGen' Liljencrants-Fant (LF) glottal source modelling App. 5 // This app creates an acoustic model of the human voice source via the LF Equation 6 // A 2D Digital Waveguide Model of the vocal tract based on Jack Mullen's VocalModel 7 // software is used to synthesis vowel formants 8 // 9 // 2D DWM port based on Amelia Gully's MATLAB implementation 10 // 11 // Generic Audio App template provided by Dimitrios Zantalis 12 // (credited in code where appropriate) 13 // 14 // Created by Jacob Harrison on 11/12/2013. 15 // Copyright (c) 2013 Jacob Harrison. All rights reserved. 16 // 17 18 #import "AudioEngine.h" 19 20 // Main Processing Loop 21 22 static OSStatus playbackCallback(void *inRefCon, 23 AudioUnitRenderActionFlags *ioActionFlags, 24 const AudioTimeStamp *inTimeStamp, 25 UInt32 inBusNumber, 26 UInt32 inNumberFrames, 27 AudioBufferList *ioData 28 ) { 29 // Get the tone parameters out of the view controller 30 EffectState *fxs=(EffectState*)inRefCon; 31 Float32 output; 32 33 34 // Get LF waveform coefficients 35 double t; 36 double LFcurrentSample; 37 double LFcurrentSample1; 38 double period = fxs->_period; 39 int dataLength = fxs->_dataLength; 40 double te = fxs->_te; 41 double Eo = fxs->_Eo; 42 double alpha = fxs->_alpha; 43 double wg = fxs->_wg; 44 double Ee = fxs->_Ee; 45 double epsilon = fxs->_epsilon; 46 double ta = fxs->_ta; 47 double tc = fxs->_tc; 48 int k = fxs->_k; 49 int kk = fxs->_kk; 50 51 // Turbulent noise coefficients 52 BOOL noiseOn = fxs->_noiseOn; 53 double noiseAmount = fxs->_noiseAmount; 54 double noiseDuration = fxs->_noiseDuration*dataLength; 55 double noiseStart = fxs->_noiseStart*dataLength; 56 // double *noiseFilter = fxs->_noiseFilter; 139 Appendices 57 double noiseRemainder; 58 double noiseAdd; 59 double noiseSample; 60 61 // Pressure arrays 62 Float32 *pNPlus = fxs->_pNPlus; 63 Float32 *pNMinus = fxs->_pNMinus; 64 Float32 *pEPlus = fxs->_pEPlus; 65 Float32 *pEMinus = fxs->_pEMinus; 66 Float32 *pSPlus = fxs->_pNPlus; 67 Float32 *pSMinus = fxs->_pNMinus; 68 Float32 *pWPlus = fxs->_pSPlus; 69 Float32 *pWMinus = fxs->_pSMinus; 70 71 // Pressure arrays 72 Float32 *zNorth = fxs->_zNorth; 73 Float32 *zEast = fxs->_zEast; 74 Float32 *zSouth = fxs->_zSouth; 75 Float32 *zWest = fxs->_zWest; 76 77 78 // 2D DWM coefficients 79 double *impData = fxs->_impData; 80 int sizeXMax = fxs->_sizeXMax; 81 int sizeYMax = fxs->_sizeYMax; 82 double reflectionLips = fxs->_reflectionLips; 83 double reflectionGlottis = fxs->_reflectionGLottis; 84 double reflectionWalls = fxs->_reflectionWalls; 85 86 BOOL vowelOn = fxs->_vowelOn; 87 88 double pJ; 89 double opPressure; 90 91 int y,x; 92 93 double amp = fxs->_amp; 94 95 BOOL pitchSlide = fxs->_pitchSlide; 96 97 // Set max and min dataLength (pitch period in samples) values for automatic f0. 
98 // int dataLengthMax = 490; //490 samples = pitch period 1/90 s (90 Hz) 99 // int dataLengthMax = 1837; //1837 samples = 24 Hz 100 // int dataLengthMin = 232; //232 samples = pitch period 1/190 s (190 Hz) 101 // int dataLengthMin = 100; 102 103 int count = fxs->_count; 104 105 double f0Data[] = {86.7656730000000, 106 86.4178010000000, 107 90.9541950000000, 108 87.9545930000000, 109 88.6849350000000, 110 89.0769560000000, 111 88.8090830000000, 112 89.0420040000000, 113 89.2941220000000, 114 89.8216050000000, 115 89.1064770000000, 116 88.1318270000000, 117 87.5954940000000, 140 Appendices 118 87.6932790000000, 119 87.3959820000000, 120 87.4183820000000, 121 88.0824290000000, 122 88.3171960000000, 123 88.5990350000000, 124 88.4426470000000, 125 89.4475360000000, 126 89.8663800000000, 127 90.2665200000000, 128 90.7101400000000, 129 90.9507130000000, 130 90.9308580000000, 131 90.9280150000000, 132 91.2230760000000, 133 91.7717330000000, 134 92.1247940000000, 135 92.2978610000000, 136 92.6453840000000, 137 93.4096030000000, 138 94.1552810000000, 139 94.9410550000000, 140 95.6436490000000, 141 96.6370420000000, 142 97.3720470000000, 143 97.9702550000000, 144 98.9070940000000, 145 99.5896930000000, 146 100.512675000000, 147 102.135056000000, 148 103.595169000000, 149 104.852445000000, 150 106.072966000000, 151 107.397000000000, 152 109.136166000000, 153 110.146401000000, 154 111.279139000000, 155 112.631598000000, 156 114.156086000000, 157 115.526537000000, 158 117.073030000000, 159 117.894042000000, 160 118.334215000000, 161 118.261941000000, 162 117.169850000000, 163 115.665968000000, 164 113.811616000000, 165 111.960755000000, 166 110.023637000000, 167 107.281944000000, 168 104.210637000000, 169 100.919823000000, 170 97.4242660000000, 171 95.2202820000000, 172 93.8211850000000, 173 91.8860980000000, 174 90.6061610000000, 175 89.8349510000000, 176 88.9622060000000, 177 88.1375650000000, 178 87.7454650000000, 179 87.5657920000000, 180 87.1366340000000, 181 87.1113610000000, 141 Appendices 182 86.9435590000000, 183 86.7299700000000, 184 86.3627780000000, 185 85.9939510000000, 186 85.6657360000000, 187 84.7229040000000, 188 85.2134430000000, 189 84.7120240000000, 190 84.4130280000000, 191 84.3421640000000, 192 84.5041810000000, 193 84.2591300000000, 194 83.7980820000000, 195 83.2793240000000, 196 82.9630780000000, 197 82.6915980000000, 198 82.0845250000000, 199 80.9131000000000, 200 80.6771400000000, 201 79.3941160000000, 202 77.4358050000000, 203 75.9000770000000, 204 76.3403430000000}; // f0 data from 1s pitch slide obtained from PRAAT 205 int f0sampleRate = 100; // Sample rate at which f0 data is recorded 206 // int f0samples = sizeof(f0Data); 207 int f0Update = fxs->_hardwareSampleRate / f0sampleRate; // interval at which f0 is updated 208 int sampleCount = fxs->_sampleCount; 209 int f0Count = fxs->_f0Count; 210 int Fs = fxs->_hardwareSampleRate; 211 int overSample = 1000; 212 213 214 215 216 for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) { 217 218 for(int j = 0; j < inNumberFrames; j++) { 219 220 // If automatic f0 is not selected: 221 if (pitchSlide == FALSE) { 222 223 // 'k' is sample increment counter. 
wrap around to 0 if k > dataLength 224 if (k>dataLength) { 225 k = k - dataLength; 226 } 227 228 229 // Main LF-waveform calculation performed here: 230 t = (double)k*period/(double)dataLength; 231 232 if (t=te) { 236 LFcurrentSample = -((Ee)/(epsilon*ta))*(exp(- epsilon*(t-te)) - exp(-epsilon*(tc-te))); 237 } 238 if (t>tc) { 239 LFcurrentSample = 0.0; 240 } 142 Appendices 241 242 // If breathy or falsetto is selected, add noise 243 if (noiseOn == TRUE) { 244 // Creates white noise between -1 and +1 245 noiseSample = rand() % 200; 246 noiseSample = noiseSample - 100; 247 noiseSample = noiseSample/100; 248 // Attenuate noise signal by noise amount selected 249 noiseAdd = noiseSample*noiseAmount; 250 // Turbulent noise portion of waveform begins at the closing phase and ends during the opening phase (i.e. crosses over two pitch cycles) 251 // Need to 'wrap' noise around so that the opening phase portion of noise begins at the start of the pitch cycle 252 // Calculate remainder (portion of noise that extends beyond pitch period): 253 noiseRemainder = (noiseStart + noiseDuration) – dataLength; 254 255 // if start time + duration is less than length of period (i.e no remainder) 256 if (noiseStart + noiseDuration < dataLength) { 257 if (k >= noiseStart + noiseDuration){ 258 LFcurrentSample1 = LFcurrentSample; 259 } 260 if (k < noiseStart) { 261 LFcurrentSample1 = LFcurrentSample; 262 } 263 if (k > noiseStart & k <= noiseStart + noiseDuration) { 264 LFcurrentSample1 = LFcurrentSample+noiseAdd; 265 } 266 } 267 268 // if start time + duration is greater than length of period (i.e noise wraps round) 269 if (noiseStart + noiseDuration >= dataLength){ 270 if (k > noiseRemainder && k < noiseStart) { 271 LFcurrentSample1 = LFcurrentSample; 272 } 273 274 if (k <= noiseRemainder || k >= noiseStart) { 275 LFcurrentSample1 = LFcurrentSample+noiseAdd; 276 } 277 } 278 } 279 280 // If no noise is needed, the current sample remains the same. 
281 if (noiseOn == FALSE) { 282 LFcurrentSample1 = LFcurrentSample; 283 } 284 285 286 // This section performs the calculations for pressure and impedance arrays in 2D DWM 287 // Unlike MATLAB implementation, 1D arrays are used to store the 2D information 288 // MATLAB version: pNPlus[y][x] is equivalent to iOS version: pNPlus[y*sizeXMax + x] 143 Appendices 289 if (vowelOn == TRUE) { 290 for (y = 0; y 1.0) { 367 // printf("output clipped at %f \n", output); 368 output = 1.0; 369 } 370 371 k += 1; 372 373 // printf("%d %d %d %f %d %f\n", dataLength, sampleCount, f0Count, period, k, output); 374 375 memcpy(ioData->mBuffers[i].mData+j*fxs- >_clientASBD.mBytesPerFrame, &output, sizeof(Float32)); 376 377 } 378 379 if (pitchSlide == TRUE) { 380 // This is the same as previous section but with automatic f0 input 381 382 if (kk>dataLength) { 383 kk = 0; 384 } 385 145 Appendices 386 if (sampleCount == f0Update) { 387 sampleCount = 0; 388 } 389 390 391 // Update new value for dataLength based on natural f0 data 392 if (sampleCount == 0) { 393 period = (1/f0Data[f0Count])/overSample; 394 395 f0Count += 1; 396 397 if (f0Count == 100) { 398 f0Count = 0; 399 } 400 } 401 402 if (kk == 0) { 403 dataLength = floor(Fs*period*overSample); 404 } 405 406 t = (double)kk*period/(double)dataLength; 407 408 if (t=te) { 412 LFcurrentSample = -((Ee)/(epsilon*ta))*(exp(- epsilon*(t-te)) - exp(-epsilon*(tc-te))); 413 } 414 if (t>tc) { 415 LFcurrentSample = 0.0; 416 } 417 418 if (noiseOn == TRUE) { 419 noiseSample = rand() % 200; 420 noiseSample = noiseSample - 100; 421 noiseSample = noiseSample/100; 422 noiseAdd = noiseSample*noiseAmount; 423 noiseRemainder = (noiseStart + noiseDuration) – dataLength; 424 425 if (noiseStart + noiseDuration < dataLength) { 426 if (kk >= noiseStart + noiseDuration){ 427 LFcurrentSample1 = LFcurrentSample; 428 } 429 if (kk < noiseStart) { 430 LFcurrentSample1 = LFcurrentSample; 431 } 432 if (kk > noiseStart & kk <= noiseStart + noiseDuration) { 433 LFcurrentSample1 = LFcurrentSample+noiseAdd; 434 } 435 } 436 437 if (noiseStart + noiseDuration >= dataLength){ 438 if (kk > noiseRemainder && kk < noiseStart) { 439 LFcurrentSample1 = LFcurrentSample; 440 } 441 442 if (kk <= noiseRemainder || kk >= noiseStart) { 443 LFcurrentSample1 = 146 Appendices LFcurrentSample+noiseAdd; 444 } 445 } 446 } 447 448 if (noiseOn == FALSE) { 449 LFcurrentSample1 = LFcurrentSample; 450 } 451 452 453 if (vowelOn == TRUE) { 454 for (y = 0; y 1.0) { 520 // printf("output clipped at %f \n", output); 521 output = 1.0; 522 } 523 524 kk += 1; 525 sampleCount += 1; 526 527 528 // printf("%d %d %d %f %d %f\n", dataLength, sampleCount, f0Count, period, kk, output); 529 530 memcpy(ioData->mBuffers[i].mData+j*fxs- >_clientASBD.mBytesPerFrame, &output, sizeof(Float32)); 531 532 } 533 } 534 } 535 536 // Assign values for counter and dataLength for next sample 537 fxs->_k = k; 538 fxs->_kk = kk; 539 fxs->_count = count; 540 fxs->_dataLength = dataLength; 541 fxs->_f0Count = f0Count; 542 fxs->_sampleCount = sampleCount; 543 fxs->_period = period; 544 545 546 547 return noErr; 548 } 148 Appendices 549 550 // Implementation of AudioEngine by Dimitrios Zantalis: 551 552 @implementation AudioEngine 553 554 @synthesise effectState; 555 556 //------557 // init function 558 //------559 -(id)init{ 560 if(!(self=[super init])) return self; 561 562 OSStatus err=0; 563 564 //======565 //STEP 1 - CONFIGURE AND INITIALISE AUDIO SESSION 566 //======567 err=[self initialiseAudioSession]; 568 [self checkErr:err withMessage:"Failed to 
configure and initialise Audio Session!"]; 569 570 //======571 //STEP 2 - CONFIGURE CLIENT AUDIO STREAM BASIC DESCRIPTION 572 //======573 [self configureASBD]; 574 575 //======576 //STEP 3 - SETUP THE REMOTEIO AUDIO UNIT FOR RECORDING AND PLAYBACK 577 //======578 err=[self setupRemoteIO]; 579 [self checkErr:err withMessage:"Failed to setup RemoteIO AU!"]; 580 581 582 //======583 //STEP 4 - Initialise Wave table 584 //======585 [self initValues]; 586 [self createLFInput]; 587 [self createDWM]; 588 589 //======590 //STEP 5 - Initialise File Manager 591 //======592 //[self initFileManager]; 593 594 //======595 //STEP 6 - INITIALISE STATE 596 //======597 _playing=NO; 598 _recording=NO; 599 600 if(err){ 601 [self checkErr:err withMessage:"Failed to initialise SimpleRecorderDemo!"]; 602 } 603 else{ 604 fprintf(stdout,"Simple Recorder initialised!\n"); 605 } 606 return self; 607 } 608 609 //------149 Appendices 610 // initialiseAudioSession function 611 //------612 -(OSStatus)initialiseAudioSession{ 613 NSError *err=nil; 614 BOOL success=FALSE; 615 616 fprintf (stdout,"Configuring Audio Session..."); 617 618 //Implicit initialisation of audio session. 619 _AudioSession=[AVAudioSession sharedInstance]; 620 621 success=[_AudioSession setCategory:AVAudioSessionCategoryPlayAndRecord 622 error:&err]; 623 if(!success){ 624 [self checkErr:[err code] withMessage:"Failed to set category for AVAudioSession"]; 625 } 626 627 //Check for input availability 628 BOOL hasInput=[_AudioSession isInputAvailable]; 629 if(!hasInput){ 630 UIAlertView *alert=[[UIAlertView alloc] initWithTitle:@"No audio input available" 631 message:@"The application cannot record because no audio input has been detected!" 632 delegate:nil 633 cancelButtonTitle:@"OK" 634 otherButtonTitles:nil]; 635 [alert show]; 636 } 637 638 //Get hardware sample rate 639 effectState._hardwareSampleRate=[_AudioSession sampleRate]; 640 641 //Activate audio session 642 success=[_AudioSession setActive:YES 643 error:&err]; 644 if(!success){ 645 [self checkErr:[err code] withMessage:"Failed to activate audio session"]; 646 } 647 648 if(!err) 649 fprintf(stdout,"OK\n"); 650 651 return [err code]; 652 } 653 654 //------655 // configureASBD function 656 //------657 -(void)configureASBD{ 658 659 fprintf(stdout,"Configuring client stream format..."); 660 //Initialise ASBD structure 661 memset (&effectState._clientASBD, 0, sizeof (effectState._clientASBD)); 662 663 //Set up ASBD for stereo playback 664 effectState._clientASBD.mFormatID = kAudioFormatLinearPCM; 665 effectState._clientASBD.mFormatFlags =kAudioFormatFlagsNativeEndian|kAudioFormatFlagIsFloat|kAudioFormatF lagIsNonInterleaved; 150 Appendices 666 effectState._clientASBD.mSampleRate = effectState._hardwareSampleRate; 667 effectState._clientASBD.mChannelsPerFrame = 1; 668 effectState._clientASBD.mBitsPerChannel = 32; 669 effectState._clientASBD.mBytesPerPacket = 4; 670 effectState._clientASBD.mFramesPerPacket = 1; 671 effectState._clientASBD.mBytesPerFrame = 4; 672 effectState._clientASBD.mReserved=0; 673 674 fprintf(stdout,"OK\n"); 675 676 } 677 678 //------679 // setupRemoteIO function 680 //------681 -(OSStatus)setupRemoteIO{ 682 OSStatus err=0; 683 UInt32 propsize=0; 684 685 fprintf(stdout,"Configuring RemoteIO AU..."); 686 //...... 687 //Get RemoteIO AU from Audio Unit Component Manager. 688 //...... 689 690 //Specify RemoteIO Audio Unit Component Desciption. 
691 AudioComponentDescription RIOUnitDescription; 692 RIOUnitDescription.componentType = kAudioUnitType_Output; 693 RIOUnitDescription.componentSubType = kAudioUnitSubType_RemoteIO; 694 RIOUnitDescription.componentManufacturer = kAudioUnitManufacturer_Apple; 695 RIOUnitDescription.componentFlags = 0; 696 RIOUnitDescription.componentFlagsMask = 0; 697 698 //Get RemoteIO AU from AUdio Unit Component Manager 699 AudioComponent rioComponent=AudioComponentFindNext(NULL, &RIOUnitDescription); 700 701 err=AudioComponentInstanceNew(rioComponent, &effectState._rioAU); 702 [self checkErr:err withMessage:"Failed to create a new instance of RemoteIO AU!"]; 703 704 //...... 705 //Set up the RemoteIO AU. 706 //...... 707 //Enable output bus of RemoteIO. 708 UInt32 enableOutput = 1; // to enable output (enabled by default).To disable set this to zero. 709 AudioUnitElement outputBus = 0; //Bus 0 of RemoteIO AU is the hardware output. 710 propsize=sizeof(enableOutput); 711 712 err=AudioUnitSetProperty(effectState._rioAU, 713 kAudioOutputUnitProperty_EnableIO, 714 kAudioUnitScope_Output, 715 outputBus, 716 &enableOutput, 717 propsize); 718 [self checkErr:err withMessage:"Failed to enable output bus of RemoteIO AU."]; 719 720 /* 151 Appendices 721 //Enable input bus of RemoteIO. 722 UInt32 enableInput = 1; // to disable input (disabled by default). To enable set this to one. 723 AudioUnitElement inputBus = 1; //Bus 1 of RemoteIO AU is the hardware input. 724 propsize=sizeof(enableInput); 725 726 err=AudioUnitSetProperty(effectState._rioAU, 727 kAudioOutputUnitProperty_EnableIO, 728 kAudioUnitScope_Input, 729 inputBus, 730 &enableInput, 731 propsize); 732 [self checkErr:err withMessage:"Failed to disable input bus of RemoteIO AU."]; 733 */ 734 735 //Set the stream format of the RemoteIO AU. 736 propsize=sizeof(effectState._clientASBD); 737 738 //Set format for output (outputBus/input scope). 739 err=AudioUnitSetProperty(effectState._rioAU, 740 kAudioUnitProperty_StreamFormat, 741 kAudioUnitScope_Input, 742 outputBus, 743 &effectState._clientASBD, 744 propsize); 745 [self checkErr:err withMessage:"Failed to set StreamFormat property of RemoteIO AU (output bus/input scope)!"]; 746 747 /* 748 //Set format for input (inputBus/output scope). 749 err=AudioUnitSetProperty(effectState._rioAU, 750 kAudioUnitProperty_StreamFormat, 751 kAudioUnitScope_Output, 752 inputBus, 753 &effectState._clientASBD, 754 propsize); 755 [self checkErr:err withMessage:"Failed to set StreamFormat property of RemoteIO AU (input bus/output scope)!"]; 756 */ 757 758 //Set up render callback function for the RemoteIO AU. 759 AURenderCallbackStruct renderCallbackStruct; 760 renderCallbackStruct.inputProc=playbackCallback; 761 renderCallbackStruct.inputProcRefCon=&effectState; 762 propsize=sizeof(renderCallbackStruct); 763 764 err=AudioUnitSetProperty(effectState._rioAU, 765 kAudioUnitProperty_SetRenderCallback, 766 kAudioUnitScope_Global, 767 outputBus, 768 &renderCallbackStruct, 769 propsize); 770 [self checkErr:err withMessage:"Failed to set SetRenderCallback property for RemoteIO AU!"]; 771 772 //...... 773 //Initialise RemoteIO AU. 774 //...... 
775 err=AudioUnitInitialize(effectState._rioAU); 776 [self checkErr:err withMessage:"Failed to initialise RemoteIO AU!"]; 777 152 Appendices 778 if(!err) 779 fprintf(stdout,"OK\n"); 780 781 return err; 782 } 783 784 785 //------LF-waveform and 2D DWM implementation by Jacob Harrison------// 786 787 // Set initial values (when App is first opened) 788 // Typical voice with automatic f0 789 -(void)initValues{ 790 effectState._f0=110.0; 791 effectState._k = 0; 792 793 //TYPICAL voice type 794 effectState._tcVal = 1.0; 795 effectState._teVal = 0.575; 796 effectState._tpVal = 0.457; 797 effectState._taVal = 0.009; 798 799 effectState._vibratoOn = FALSE; 800 effectState._amp = 0.5; 801 effectState._pitchSlide = FALSE; 802 effectState._autoVoice = TRUE; 803 804 effectState._vowelOn = FALSE; 805 } 806 807 // Set f0 manually if auto-f0 is off 808 -(void)setF0:(Float32)f0Value{ 809 if (effectState._pitchSlide == FALSE) { 810 effectState._f0=f0Value; 811 [self setVoiceType]; 812 [self createLFInput]; 813 } 814 } 815 816 // Set 'vocal tension' 817 -(void)setTension:(Float32)tensionValue{ 818 effectState._vocalTension=tensionValue; 819 [self createLFInput]; 820 } 821 822 // Set 'Ta' (can also be considered as vocal tension parameter) 823 -(void)setTa:(Float32)taValue{ 824 effectState._taVal=taValue; 825 [self createLFInput]; 826 } 827 828 -(void)setAmplitude:(Float32)ampValue{ 829 effectState._amp = ampValue; 830 } 831 832 // Typical voice type (from VocalModel source code) 833 -(void)setTypical{ 834 effectState._voiceType = 0; 835 [self setVoiceType]; 836 } 837 838 // Modal voice type 839 -(void)setModal{ 840 effectState._voiceType = 1; 153 Appendices 841 [self setVoiceType]; 842 } 843 844 // Vocal Fry voice type 845 -(void)setVocalFry{ 846 effectState._voiceType = 2; 847 [self setVoiceType]; 848 } 849 850 // Breathy Voice 851 -(void)setBreathy{ 852 effectState._voiceType = 3; 853 [self setVoiceType]; 854 } 855 856 // Falsetto Voice 857 -(void)setFalsetto{ 858 effectState._voiceType = 4; 859 [self setVoiceType]; 860 } 861 862 -(void)setVoiceType{ 863 // This function sets the voice type according to pitch or selected type 864 865 866 // pitch thresholds for falsetto and vocal fry. between these values, interpolate from vocal fry -> breathy/modal -> falsetto 867 // calculated in number of samples for single pitch period 868 // to avoid confusion: 'max' here indicates maximum number of samples, which is in fact the 'minimum' pitch. 
869 // these are rough estimates based on conversations with supervisor 870 int vocalFryMax, vocalFryMin, falsettoMax, falsettoMin; 871 vocalFryMax = 848; // 52 Hz 872 vocalFryMin = 469; // 94 Hz 873 falsettoMax = 213; // 207 Hz 874 falsettoMin = 153; // 288 Hz 875 876 double interpFraction; 877 878 switch (effectState._voiceType) { 879 case 0: // typical voice type 880 effectState._tcVal = 1.0; 881 effectState._teVal = 0.780; 882 effectState._tpVal = 0.600; 883 effectState._taVal = 0.028; 884 effectState._noiseOn = FALSE; 885 effectState._vocalTension = 0.0; 886 [self createLFInput]; 887 break; 888 889 case 1: // modal voice type 890 effectState._tcVal = 1.0; 891 effectState._teVal = 0.575; 892 effectState._tpVal = 0.457; 893 effectState._taVal = 0.009; 894 effectState._noiseOn = FALSE; 895 effectState._vocalTension = 0.0; 896 [self createLFInput]; 897 break; 898 899 case 2: // vocal fry voice type 900 effectState._tcVal = 1.0; 154 Appendices 901 effectState._teVal = 0.251; 902 effectState._tpVal = 0.19; 903 effectState._taVal = 0.008; 904 effectState._noiseOn = FALSE; 905 effectState._vocalTension = 0.0; 906 [self createLFInput]; 907 break; 908 909 case 3: // breathy voice type 910 effectState._tcVal = 1.0; 911 effectState._teVal = 0.756; 912 effectState._tpVal = 0.529; 913 effectState._taVal = 0.082; 914 effectState._noiseOn = TRUE; 915 effectState._noiseAmount = 0.025; 916 effectState._noiseDuration = 0.5; 917 effectState._noiseStart = 0.75; 918 effectState._vocalTension = 0.0; 919 [self createLFInput]; 920 921 case 4: // falsetto voice type 922 effectState._tcVal = 1.0; 923 effectState._teVal = 0.770; 924 effectState._tpVal = 0.570; 925 effectState._taVal = 0.133; 926 effectState._noiseOn = TRUE; 927 effectState._noiseAmount = 0.015; 928 effectState._noiseDuration = 0.5; 929 effectState._noiseStart = 0.75; 930 effectState._vocalTension = 0.0; 931 [self createLFInput]; 932 break; 933 934 default: 935 break; 936 } 937 938 // When 'autoVoice' is TRUE, voice type automatically changes depending on pitch 939 // As f0 rises, voice moves from Vocal Fry -> interpolated Vocal Fry & Breathy/Modal -> Breathy/Modal -> interpolated Breathy/Modal & Falsetto -> Falsetto 940 if (effectState._autoVoice == TRUE){ 941 942 // Set to Vocal Fry if dataLength is larger than vocalFryMax (i.e. 
if f0 < 52 Hz) 943 if (effectState._dataLength > vocalFryMax) { 944 effectState._tcVal = 1.0; 945 effectState._teVal = 0.251; 946 effectState._tpVal = 0.19; 947 effectState._taVal = 0.008; 948 effectState._noiseOn = FALSE; 949 effectState._vocalTension = 0.0; 950 [self createLFInput]; 951 } 952 953 // Interpolate from Vocal Fry to Breathy/Modal if dataLength is between vocalFryMax & vocalFryMin (i.e if 52 < f0 < 94) 954 if ((effectState._dataLength < vocalFryMax) && (effectState._dataLength > vocalFryMin)){ 955 956 interpFraction = ((double)vocalFryMax – (double)effectState._dataLength)/((double)vocalFryMax – (double)vocalFryMin); 155 Appendices 957 958 if (effectState._voiceType == 3) { // if BREATHY voice is selected, interpolate from vocal fry params to breathy params 959 effectState._noiseOn = TRUE; 960 effectState._noiseDuration = 0.5; 961 effectState._noiseStart = 0.75; 962 963 effectState._noiseAmount = 0.025*interpFraction; 964 effectState._teVal = (0.251*(1- interpFraction))+(0.756*interpFraction); 965 effectState._tpVal = (0.19*(1- interpFraction))+(0.529*interpFraction); 966 effectState._taVal = (0.008*(1- interpFraction))+(0.082*interpFraction); 967 968 969 970 } 971 972 if (effectState._voiceType == 0) { // if TYPICAL voice is selected 973 effectState._noiseOn = FALSE; 974 effectState._teVal = (0.251*(1- interpFraction))+(0.780*interpFraction); 975 effectState._tpVal = (0.19*(1- interpFraction))+(0.600*interpFraction); 976 effectState._taVal = (0.008*(1- interpFraction))+(0.028*interpFraction); 977 978 } 979 980 if (effectState._voiceType == 1) { // if MODAL voice is selected 981 effectState._noiseOn = FALSE; 982 effectState._teVal = (0.251*(1- interpFraction))+(0.575*interpFraction); 983 effectState._tpVal = (0.19*(1- interpFraction))+(0.457*interpFraction); 984 effectState._taVal = (0.008*(1- interpFraction))+(0.028*interpFraction); 985 986 } 987 [self createLFInput]; 988 989 } 990 991 // printf("interp fraction: %f \n", interpFraction); 992 993 // Set to Falsetto if dataLength is less than falsettoMin (i.e if f0 > 288 Hz) 994 if (effectState._dataLength < falsettoMin) { 995 effectState._tcVal = 1.0; 996 effectState._teVal = 0.770; 997 effectState._tpVal = 0.570; 998 effectState._taVal = 0.133; 999 effectState._noiseOn = TRUE; 1000 effectState._noiseAmount = 0.015; 1001 effectState._noiseDuration = 0.5; 1002 effectState._noiseStart = 0.75; 1003 effectState._vocalTension = 0.0; 1004 [self createLFInput]; 1005 1006 } 1007 156 Appendices 1008 // Interpolate from Breathy/Modal to Falsetto if dataLength is between falsettoMin and falsettoMax (i.e if 207 Hz < f0 < 288 Hz) 1009 if ((effectState._dataLength >= falsettoMin) && (effectState._dataLength < falsettoMax)) { 1010 interpFraction = ((double)falsettoMax – (double)effectState._dataLength)/((double)falsettoMax – (double)falsettoMin); 1011 1012 if (effectState._voiceType == 3) { // if BREATHY voice is selected, interpolate from breathy params to falsetto 1013 effectState._noiseOn = TRUE; 1014 effectState._noiseDuration = 0.5; 1015 effectState._noiseStart = 0.75; 1016 1017 effectState._noiseAmount = (0.025*(1- interpFraction))+0.015*interpFraction; 1018 effectState._teVal = (0.756*(1- interpFraction))+(0.770*interpFraction); 1019 effectState._tpVal = (0.529*(1- interpFraction))+(0.570*interpFraction); 1020 effectState._taVal = (0.082*(1- interpFraction))+(0.133*interpFraction); 1021 1022 // printf("te = %f, tp = %f, ta = %f, interp = %f \n", effectState._teVal, effectState._tpVal, effectState._taVal, interpFraction); 1023 1024 } 
1025 1026 if (effectState._voiceType == 0) { // if TYPICAL voice is selected 1027 effectState._noiseOn = TRUE; 1028 effectState._noiseDuration = 0.5; 1029 effectState._noiseStart = 0.75; 1030 1031 effectState._noiseAmount = 0.015*interpFraction; 1032 effectState._teVal = (0.780*(1- interpFraction))+(0.770*interpFraction); 1033 effectState._tpVal = (0.600*(1- interpFraction))+(0.570*interpFraction); 1034 effectState._taVal = (0.028*(1- interpFraction))+(0.133*interpFraction); 1035 1036 } 1037 1038 if (effectState._voiceType == 1) { // if MODAL voice is selected 1039 effectState._noiseOn = TRUE; 1040 effectState._noiseDuration = 0.5; 1041 effectState._noiseStart = 0.75; 1042 1043 effectState._noiseAmount = 0.015*interpFraction; 1044 effectState._teVal = (0.575*(1- interpFraction))+(0.770*interpFraction); 1045 effectState._tpVal = (0.457*(1- interpFraction))+(0.570*interpFraction); 1046 effectState._taVal = (0.028*(1- interpFraction))+(0.133*interpFraction); 1047 1048 } 1049 1050 1051 1052 [self createLFInput]; 157 Appendices 1053 1054 } 1055 1056 // if dataLength is between vocalFryMin and falsettoMax, keep t values the same 1057 if ((effectState._dataLength >= 213) && (effectState._dataLength <= 469)) { 1058 if (effectState._voiceType == 3) { // if BREATHY voice is selected, interpolate from breathy params to falsetto 1059 effectState._noiseOn = TRUE; 1060 effectState._noiseDuration = 0.5; 1061 effectState._noiseStart = 0.75; 1062 1063 effectState._noiseAmount = 0.025; 1064 effectState._teVal = 0.756; 1065 effectState._tpVal = 0.529; 1066 effectState._taVal = 0.082; 1067 1068 } 1069 1070 if (effectState._voiceType == 0) { // if TYPICAL voice is selected 1071 effectState._noiseOn = FALSE; 1072 effectState._teVal = 0.780; 1073 effectState._tpVal = 0.600; 1074 effectState._taVal = 0.028; 1075 1076 } 1077 1078 if (effectState._voiceType == 1) { // if MODAL voice is selected 1079 effectState._noiseOn = FALSE; 1080 1081 effectState._teVal = 0.575; 1082 effectState._tpVal = 0.457; 1083 effectState._taVal = 0.028; 1084 1085 } 1086 1087 [self createLFInput]; 1088 1089 } 1090 1091 } 1092 1093 } 1094 1095 -(void)vowelOn{ 1096 if (effectState._vowelOn == FALSE) { 1097 effectState._vowelOn = TRUE; 1098 } 1099 else { 1100 effectState._vowelOn = FALSE; 1101 } 1102 } 1103 1104 -(void)pitchSlide{ 1105 if (effectState._pitchSlide == FALSE) { 1106 effectState._pitchSlide = TRUE; 1107 } 1108 else { 1109 effectState._pitchSlide = FALSE; 1110 } 1111 } 158 Appendices 1112 1113 -(void)autoVoice{ 1114 if (effectState._autoVoice == FALSE) { 1115 effectState._autoVoice = TRUE; 1116 } 1117 else { 1118 effectState._autoVoice = FALSE; 1119 } 1120 } 1121 1122 //-(void)createVibrato{ 1123 // effectState._vibratoDepth = 1; 1124 // effectState._vibratoFreq = 6; 1125 // int WTSize = effectState._hardwareSampleRate/effectState._vibratoFreq; 1126 // double dt = 1/effectState._hardwareSampleRate; 1127 // 1128 // int i; 1129 // double t; 1130 // 1131 // if (effectState._vibratoOn == TRUE) { 1132 // for (i = 0; i < WTSize; i++) { 1133 // t = i*dt; 1134 // effectState._vibratoOut = cos(2*MY_PI*effectState._vibratoFreq*t); 1135 // [self createLFInput]; 1136 // } 1137 // } 1138 // 1139 // 1140 //} 1141 1142 1143 1144 // This function calculates LF equation coefficients based on currently selected voice type 1145 -(void)createLFInput{ 1146 1147 double f0; 1148 int Fs; 1149 int overSample; 1150 Float32 period; 1151 double Ee, Eo; 1152 double tc, te, tp, ta, tn, tb; 1153 double wg; 1154 double areaSum, area1, area2; 1155 double 
optimumArea; 1156 double epsilon, epsilonTemp, epsilonDiff, epsilonOptimumDiff; 1157 double alpha; 1158 int dataLength; 1159 double vocalTension; 1160 1161 Fs = effectState._hardwareSampleRate; 1162 f0 = effectState._f0; 1163 overSample = 1000; 1164 period = 1/f0; 1165 Ee = 1.0; 1166 1167 tc = effectState._tcVal; 1168 te = effectState._teVal; 1169 tp = effectState._tpVal; 1170 ta = effectState._taVal; 1171 1172 vocalTension = effectState._vocalTension; 159 Appendices 1173 1174 te = te + te*vocalTension; 1175 tp = tp + tp*vocalTension; 1176 ta = ta - ta*vocalTension; 1177 1178 1179 1180 tc = tc*period; 1181 te = te*period; 1182 tp = tp*period; 1183 ta = ta*period; 1184 1185 // These if statements prevent timing parameter values from going outside of their range with respect to other values. 1186 1187 if (te <= tp) { 1188 te = tp + tp*0.01; 1189 } 1190 1191 if (te >= (tc-ta)){ 1192 te = tc-ta - (tc-ta)*0.01; 1193 } 1194 1195 1196 1197 // over sample values (can omit this section if it impacts real time operation) 1198 period = period/overSample; 1199 tc = tc/overSample; 1200 te = te/overSample; 1201 tp = tp/overSample; 1202 ta = ta/overSample; 1203 1204 // dependant timing parameters 1205 tn = te - tp; 1206 tb = tc - te; 1207 1208 // angular frequency of sinusoid section 1209 wg = MY_PI/tp; 1210 1211 // maximum negative peak value 1212 Eo = Ee; 1213 1214 // epsilon and alpha equation coefficients 1215 areaSum = 1.0; 1216 optimumArea = 1e-14; 1217 epsilonDiff = 10000.0; 1218 epsilonOptimumDiff = 0.1; 1219 epsilonTemp = 1/ta; 1220 1221 // solve iteratively for epsilon 1222 while (abs(epsilonDiff)>epsilonOptimumDiff) { 1223 epsilon = (1/ta)*(1-exp(-epsilonTemp*tb)); 1224 epsilonDiff = epsilon - epsilonTemp; 1225 epsilonTemp = epsilon; 1226 1227 if (epsilonDiff<0) { 1228 epsilonTemp = epsilonTemp + (abs(epsilonDiff)/100); 1229 } 1230 1231 if (epsilonDiff>0) { 1232 epsilonTemp = epsilonTemp - (abs(epsilonDiff)/100); 1233 } 1234 } 160 Appendices 1235 1236 // iterate through area balance to get Eo and alpha to give area1 + area2 = 0 1237 1238 //while ((areaSum< -optimumArea)||(areaSum>optimumArea)) { 1239 while (areaSum > optimumArea){ 1240 1241 alpha = ( log(-Ee/(Eo*sin(wg*te))))/te; 1242 1243 area1 = ( Eo*exp(alpha*te)/(sqrt(alpha*alpha+wg*wg))) * (sin(wg*te-atan(wg/alpha))) + (Eo*wg/(alpha*alpha+wg*wg)); 1244 1245 area2 = ( -(Ee)/(epsilon*epsilon*(ta))) * (1 - exp(- epsilon*tb*(1+epsilon*tb))); 1246 1247 areaSum = area1 + area2; 1248 1249 if (areaSum>0.0) { 1250 Eo = Eo - 1e5*areaSum; 1251 } 1252 1253 if (areaSum<0.0) { 1254 Eo = Eo + 1e5*areaSum; 1255 } 1256 1257 } 1258 1259 //calculate length of waveform in samples 1260 if (effectState._pitchSlide == FALSE) { 1261 dataLength = floor(overSample*Fs*period); 1262 } 1263 else{ 1264 dataLength = effectState._dataLength; 1265 } 1266 1267 //calculate length of waveform in samples without overSample 1268 // dataLength = floor(Fs*period); 1269 1270 // pass all variables needed for waveform calculation to effectState 1271 effectState._period = period; 1272 effectState._dataLength = dataLength; 1273 effectState._Eo = Eo; 1274 effectState._Ee = Ee; 1275 effectState._alpha = alpha; 1276 effectState._wg = wg; 1277 effectState._epsilon = epsilon; 1278 1279 effectState._tc = tc; 1280 effectState._te = te; 1281 effectState._tp = tp; 1282 effectState._ta = ta; 1283 1284 // printf("tc = %f, te = %f, tp = %f, ta = %f \n", tc, te, tp, ta); 1285 } 1286 1287 /* 1288 2D DWM Implementation of Vocal Tract based on Jack Mullen's VocalModel synthesiser and Amelia Gully's 
MATLAB port. This function calculates the impedance maps necessary for a static 'ah' vowel 1289 */ 1290 1291 -(void)createDWM{ 161 Appendices 1292 int nSlices; 1293 // double vtLength, vtWidth; 1294 // int fs; 1295 // double wgSize; 1296 int sizeXMax; 1297 int sizeYMax; 1298 double reflectionGlottis, reflectionLips, reflectionWalls; 1299 double zPower; 1300 double interpFraction; 1301 double count; 1302 int prevSlice, nextSlice; 1303 double zMin1, zMin, zX, zXPlus, zXY, zXPlusY, zXPlusYPlus, zXYPlus; 1304 1305 nSlices = 44; 1306 // initialise array manually to avoid reading the vowel .txt file. May need to find a way around this later if diphthongs are needed 1307 double areaData[] = { 0.5625, 1308 0.4620, 1309 0.2074, 1310 0.2701, 1311 0.2139, 1312 0.3268, 1313 0.3224, 1314 0.3771, 1315 1.0755, 1316 1.0756, 1317 0.7692, 1318 0.6199, 1319 0.3926, 1320 0.2908, 1321 0.2182, 1322 0.2829, 1323 0.3003, 1324 0.2896, 1325 0.2978, 1326 0.4969, 1327 0.8996, 1328 1.1699, 1329 1.2484, 1330 1.9049, 1331 2.3760, 1332 2.7038, 1333 2.8647, 1334 2.8949, 1335 3.4448, 1336 4.3501, 1337 4.9672, 1338 5.6775, 1339 6.5906, 1340 6.3094, 1341 6.2851, 1342 6.0513, 1343 5.3630, 1344 4.8184, 1345 3.9153, 1346 4.0989, 1347 4.2489, 1348 4.2713, 1349 4.6729, 1350 5.0273}; 1351 1352 162 Appendices 1353 /* 1354 1355 - THIS SECTION CALCULATES 'sizeXMax' and 'sizeYMax' BASED ON LENGTH OF VOCAL TRACT/WAVEGUIDE SIZE. 1356 - THEY ARE #DEFINED IN THE HEADER FILE INSTEAD SO THIS PART IS COMMENTED OUT (SEE AudioEngine.h) 1357 1358 vtLength = 0.1750; 1359 vtWidth = 0.0500; 1360 1361 fs = effectState._hardwareSampleRate; 1362 1363 // Calculate waveguide size 1364 // waveguide size = sqrt(dimensionality) * speed of sound / sampling freq. 1365 wgSize = sqrt(2) * 343 / fs; 1366 1367 // Number of waveguides in X and Y directions 1368 sizeXMax = (int)floor(vtLength/wgSize); 1369 sizeYMax = (int)floor(vtWidth/wgSize); 1370 // printf("sizeXMax = %d, sizeYMax = %d, wgSize = %f",sizeXMax, sizeYMax, wgSize); 1371 1372 */ 1373 1374 sizeXMax = SIZE_X_MAX; 1375 sizeYMax = SIZE_Y_MAX; 1376 1377 // initialise empty arrays for pressure and impedance 1378 // Pressure: 1379 Float32 pNPlus[sizeYMax*sizeXMax]; 1380 Float32 pNMinus[sizeYMax*sizeXMax]; 1381 Float32 pEPlus[sizeYMax*sizeXMax]; 1382 Float32 pEMinus[sizeYMax*sizeXMax]; 1383 Float32 pSPlus[sizeYMax*sizeXMax]; 1384 Float32 pSMinus[sizeYMax*sizeXMax]; 1385 Float32 pWPlus[sizeYMax*sizeXMax]; 1386 Float32 pWMinus[sizeYMax*sizeXMax]; 1387 // Initialise pressure arrays to 0 1388 for (int y = 0; y < sizeYMax; y++) { 1389 for (int x = 0; x < sizeXMax; x++) { 1390 pNPlus[y*sizeXMax + x] = 0; 1391 pNMinus[y*sizeXMax + x] = 0; 1392 pEPlus[y*sizeXMax + x] = 0; 1393 pEMinus[y*sizeXMax + x] = 0; 1394 pSPlus[y*sizeXMax + x] = 0; 1395 pSMinus[y*sizeXMax + x] = 0; 1396 pWPlus[y*sizeXMax + x] = 0; 1397 pWMinus[y*sizeXMax + x] = 0; 1398 } 1399 } 1400 1401 // Impedance 1402 Float32 zNorth[sizeYMax*sizeXMax]; 1403 Float32 zEast[sizeYMax*sizeXMax]; 1404 Float32 zSouth[sizeYMax*sizeXMax]; 1405 Float32 zWest[sizeYMax*sizeXMax]; 1406 // Initialise impedance arrays to 1 1407 for (int y = 0; y < sizeYMax; y++) { 1408 for (int x = 0; x < sizeXMax; x++) { 1409 zNorth[y*sizeXMax + x] = 1; 1410 zEast[y*sizeXMax + x] = 1; 1411 zSouth[y*sizeXMax + x] = 1; 1412 zWest[y*sizeXMax + x] = 1; 163 Appendices 1413 } 1414 } 1415 1416 // Set reflection parameters 1417 reflectionGlottis = 0.97; 1418 reflectionLips = -0.9; 1419 reflectionWalls = 0.94; 1420 1421 // Impedance map 1422 // Power for impedance/area function (power to which 
radius is raised) 1423 zPower = 2.5; 1424 1425 // Convert from cm sq to diameter in m 1426 double diamData1[nSlices]; 1427 1428 for (int i = 0; i < nSlices; i++) { 1429 diamData1[i] = 0.02*(sqrt(areaData[i]/MY_PI)); 1430 1431 // printf("%f\n", diamData1[i]); 1432 1433 } 1434 1435 // Might need to find way of performing zero-check here (like in MATLAB script) 1436 1437 // Interpolate areaData from original size to new size (no. of waveguides in X direction) 1438 interpFraction = (double)nSlices/(double)sizeXMax; 1439 1440 double diamData[sizeXMax]; 1441 double impData[sizeXMax]; 1442 1443 for (int i = 0; i < sizeXMax; i++) { 1444 count = (i+1)*interpFraction; 1445 prevSlice = floor(count)+1; 1446 1447 if (prevSlice > nSlices) { 1448 prevSlice = prevSlice - 1; 1449 } 1450 1451 if (prevSlice == nSlices) { 1452 nextSlice = ceil(count); 1453 } 1454 else { 1455 nextSlice = ceil(count)+1; 1456 } 1457 1458 diamData[i] = diamData1[prevSlice-1] + ((diamData1[nextSlice- 1] – diamData1[prevSlice-1]) * (count-prevSlice)); 1459 1460 1461 impData[i] = 1/diamData[i]; 1462 } 1463 1464 zMin1 = impData[0]; 1465 1466 for (int i=0; i sizeXMax) 1483 if (x == sizeXMax-1) { 1484 zXPlus = pow(impData[x], zPower); 1485 } 1486 else { 1487 zXPlus = pow(impData[x+1], zPower); 1488 } 1489 1490 // fill array with raised cosine of size 'sizeYMax' 1491 for (int i = 0; i

169 References

References

[1] “Siri,” apple.com. [Online]. Available: https://www.apple.com/ios/siri/. [Accessed: 11-Nov-2014]. [2] W. I. Hallahan, “DECtalk software: Text-to-speech technology and implementation,” Digital Technical Journal, 1995. [3] T. Guerreiro, H. Nicolau, J. Jorge, and D. Gonçalves, “Towards accessible touch interfaces,” presented at the the 12th international ACM SIGACCESS conference, New York, New York, USA, 2010, p. 19. [4] L. Anthony, Y. Kim, and L. Findlater, “Analyzing user-generated youtube videos to understand touchscreen use by people with motor impairments,” presented at the the SIGCHI Conference, New York, New York, USA, 2013, p. 1223. [5] J. Harrison, “Exploring the Naturalness of Speech Synthesis using a Real-Time Handheld Controller Device” May 2013. [6] D. M. Howard and D. T. Murphy, Voice Science, Acoustics and Recording. 2008. [7] G. Fant, “Voice source dynamics,” Speech Transmission Labs-Quarterly Progress Status …, 1980. [8] I. R. Titze, “On the mechanics of vocal-fold vibration,” The Journal of the Acoustical Society of America, 1976. [9] K. E. Cummings and M. A. Clements, “Glottal models for digital speech processing: A historical survey and new results,” Digital Signal Processing, 1995. [10] J. W. Van den Berg and J. T. Zantema, “On the air resistance and the Bernoulli effect of the human larynx,” The Journal of the Acoustical Society of America, vol. 29, no. 5, p. 626, 1957. [11] H. Babinsky, “How do wings work?,” Physics Education, 38(6) pp. 1–8, Oct. 2003. [12] G. Fant, J. Liljencrants, and Q. Lin, “A four-parameter model of glottal flow,” STL-QPSR, 1985. [13] G. Fant, “Preliminaries to analysis of the human voice source,” STL- QPSR, 1982. [14] D. G. Childers and C. K. Lee, “Vocal quality factors: Analysis, synthesis, and perception,” The Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2394–2410, Nov. 1991. [15] C. Gobl, “The role of voice quality in communicating emotion, mood and attitude,” Speech Communication, vol. 40, no. 1, pp. 189–212, Apr. 2003. [16] G. Chen, J. Kreiman, Y. L. Shue, A. Alwan, and D. Australia, “Acoustic Correlates of Glottal Gaps.,” INTERSPEECH, 2011.

170 References

[17] J. Sundberg, “Acoustic and psychoacoustic aspects of vocal vibrato,” STL-QPSR, 1995. [18] J. Sundberg, “Vibrato and vowel identification,” Arch Acoust, 1977. [19] S. E. Levinson and C. E. Schmidt, “Adaptive computation of articulatory parameters from the speech signal,” The Journal of the Acoustical Society of America, 1983. [20] L. R. Rabiner, “Digital-Formant Synthesiser for Speech-Synthesis Studies,” The Journal of the Acoustical Society of America, 2005. [21] H. Li, R. Scaife, and D. O'Brien, “LF model based glottal source parameter estimation by extended Kalman filtering,” presented at the Proceedings of the 22nd IET Irish Signals and Systems Conference, 2011. [22] G. Chen, M. Garellek, J. Kreiman, B. R. Gerratt, and A. Alwan, “A perceptually and physiologically motivated voice source model.,” presented at the INTERSPEECH, 2013, pp. 2001–2005. [23] B. Cranen and L. Boves, “Pressure measurements during speech production using semiconductor miniature pressure transducers: Impact on models for speech production,” The Journal of the Acoustical Society of America, 1985. [24] Q. Fu and P. Murphy, “Robust Glottal Source Estimation Based on Joint Source-Filter Model Optimization,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 2, pp. 492–501. [25] A. E. Rosenberg, “Effect of Glottal Pulse Shape on the Quality of Nattural Vowels,” The Journal of the Acoustical Society of America, 2005. [26] H. Fujisaki and M. Ljungqvist, “Proposal and evaluation of models for the glottal source waveform,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 11, pp. 1605–1608, Apr. 1986. [27] I. R. Titze, “A four-parameter model of the glottis and vocal fold contact area,” Speech Communication, vol. 8, no. 3, pp. 191–201, 1989. [28] I. R. Titze, “Parameterization of the glottal area, glottal flow, and vocal fold contact area,” The Journal of the Acoustical Society of America, 1984. [29] I. R. Titze and D. T. Talkin, “A theoretical study of the effects of various laryngeal configurations on the acoustics of phonation,” The Journal of the Acoustical Society of America, 1979. [30] J. Mullen, “Physical modelling of the vocal tract with the 2D digital waveguide mesh,” 2006. [31] D. G. Childers and K. Wu, “Quality of speech produced by analysis- synthesis,” Speech Communication, vol. 9, no. 2, pp. 97–117, 1990.

171 References

[32] D. H. Klatt, “Software for a cascade/parallel formant synthesiser,” The Journal of the Acoustical Society of America, vol. 67, no. 3, p. 971, 1980. [33] D. G. Childers and C. Ahn, “Modelling the glottal volume-velocity waveform for three voice types,” The Journal of the Acoustical Society of America, vol. 97, no. 1, pp. 505–519, Jan. 1995. [34] J. Pérez and A. Bonafonte, “Automatic voice-source parameterization of natural speech.,” INTERSPEECH, 2005. [35] H. Li, R. Scaife, D. O'Brien, “Automatic LF-Model Fitting to the Glottal Source Waveform by Extended Kalman Filtering,” Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European pp. 1– 5, Oct. 2012. [36] M. Airas and P. Alku, “Comparison of multiple voice source parameters in different phonation types.,” INTERSPEECH, 2007. [37] P. Alku, T. Bäckström, and E. Vilkman, “Normalized amplitude quotient for parametrization of the glottal flow,” the Journal of the Acoustical Society of America, vol. 112, no. 2, pp. 701-710, 2002. [38] G. Fant, “Some problems in voice source analysis,” Speech Communication, vol. 13, no. 1, pp. 7–22, 1993. [39] R. B. Monsen and A. M. Engebretson, “Study of variations in the male and female glottal wave,” The Journal of the Acoustical Society of America, 1977. [40] P. Palo, "A review of articulatory speech synthesis". 2006. [41] J. L. Kelly and C. C. Lochbaum, “Speech synthesis,” In Proc. Fourth Int. Congr. Acoustics (September 1962), pp. 1-4, pp. 1–4, 1962. [42] J. O. Smith, Efficient simulation of the reed-bore and bow-string mechanisms. 1986. [43] J. d'Alembert, Investigation of the curve formed by a vibrating string. Acoustics: Historical and Philosophical Development, 1973. [44] J. O. Smith, Physical audio signal processing: For virtual musical instruments and audio effects. 2010. [45] S. A. Van Duyne and J. O. Smith, “Physical modelling with the 2-D digital waveguide mesh,” presented at the Proceedings of the International …, 1993. [46] D. Murphy, A. Kelloniemi, J. Mullen, and S. Shelley, “Acoustic Modelling Using the Digital Waveguide Mesh,” Signal Processing Magazine, IEEE, vol. 24, no. 2, pp. 55–66, Mar. 2007. [47] D. T. Murphy and D. M. Howard, “Digital waveguide mesh topologies in room acoustics modelling,” 2000. [48] M. J. Beeson and D. T. Murphy, “RoomWeaver: A digital waveguide mesh based room acoustics research tool,” Int Conference on Digital

172 References

Audio Effects (DAFx'04), 2004. [49] J. Mullen, D. M. Howard, and D. T. Murphy, Digital waveguide mesh modelling of the vocal tract acoustics. IEEE, 2003, pp. 119–122. [50] “Acoustical simulations of the human vocal tract using the 1D and 2D digital waveguide software model,” 2004. [51] J. Mullen, D. M. Howard, and D. T. Murphy, “Real-Time Dynamic Articulations in the 2-D Waveguide Mesh Vocal Tract Model,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 2, pp. 577–585, Feb. 2007. [52] M. D. A. Speed, Voice Synthesis Using the Three-dimensional Digital Waveguide Mesh. 2012. [53] P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, and S. Ho, “User- guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability,” Neuroimage, 2006. [54] W. von Kempelen, Mechanismus der menschlichen Sprache. 1791. [55] H. Dudley and T. H. Tarnoczy, “The speaking machine of Wolfgang von Kempelen,” The Journal of the Acoustical Society of America, 1950. [56] J. Trouvain and F. Brackhane, “The relevance of today Wolfgang von Kempelen's speaking machine,” 2011. [57] “Talking Heads: Simulacra,” haskins.yale.edu. [Online]. Available: http://www.haskins.yale.edu/featured/heads/SIMULACRA/kempele n.html. [Accessed: 11-Nov-2014]. [58] “Apollo Ensemble,” apolloensemble.co.uk. [Online]. Available: http://www.apolloensemble.co.uk/. [Accessed: 10-Nov-2014]. [59] “QTCTM Material Technology,” peratech.com. [Online]. Available: http://www.peratech.com/qtc-technology.html. [Accessed: 10-Nov- 2014]. [60] C. H. Newell, “Place authenticity and time : A framework for liveness in synthetic speech,” 2009. [61] M. Mori, 1970.“The Uncanny Valley.” Trans. Karl F. McDorman and Norri Kageki. IEEE, 2012. [62] “The Uncanney Valley,” spectrum.ieee.org. [Online]. Available: http://spectrum.ieee.org/automaton/robotics/humanoids/the- uncanny-valley. [Accessed: 11-Nov-2014]. [63] R. K. Moore, “A Bayesian explanation of the ‘Uncanny Valley’ effect and related psychological phenomena,” Scientific Reports, vol. 2, Nov. 2012. [64] A. D. N. Edwards, Speech synthesis. Paul Chapman Educational Publishing, 1991. [65] J. Han, “Multi-touch Interaction Research,” cs.ny.edu. [Online]. Available: http://cs.nyu.edu/~jhan/ftirtouch/. [Accessed: 11-Nov- 2014].

173 References

[66] N. Tillmann, M. Moskal, J. de Halleux, M. Fahndrich, J. Bishop, A. Samuel, and T. Xie, “The future of teaching programming is on mobile devices,” presented at the the 17th ACM annual conference, New York, New York, USA, 2012, p. 156. [67] “Core Audio Overview,” developer.apple.com. [Online]. Available: https://developer.apple.com/library/mac/documentation/MusicAu dio/Conceptual/CoreAudioOverview/Introduction/Introduction.htm l. [Accessed: 11-Nov-2014]. [68] M. Brooks, “VOICEBOX: Speech Processing Toolbox for MATLAB,” ee.ic.ac.uk. [Online]. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html. [Accessed: 11-Nov-2014]. [69] R. Bristow-Johnson, “Wavetable Synthesis 101, A Fundamental Perspective,” Nov. 1996. [70] “Core Plot source code and example applications,” github.com. [Online]. Available: https://github.com/core-plot/core-plot. [Accessed: 11-Nov-2014]. [71] J. Mullen and D. T. Murphy, “Vocal Tract Modelling with the 2D Digital Waveguide Mesh.” [Online]. Available: http://www- users.york.ac.uk/~dtm3/vocaltract.html. [Accessed: 11-Nov-2014]. [72] H. Silén, E. Helander, J. Nurminen, and M. Gabbouj, “Parameterization of vocal fry in HMM-based speech synthesis.,” INTERSPEECH, 2009. [73] “Model-View-Controller,” developer.apple.com. [Online]. Available: https://developer.apple.com/library/mac/documentation/General/ Conceptual/DevPedia-CocoaCore/MVC.html. [Accessed: 11-Nov- 2014]. [74] P. Boersma and D. Weenink, “Praat: doing phonetics by computer,” fon.hum.uva.nl. [Online]. Available: http://www.fon.hum.uva.nl/praat/. [Accessed: 11-Nov-2014]. [75] “Nyquist Analyze Plug-ins,” wiki.audacityteam.org. [Online]. Available: http://wiki.audacityteam.org/wiki/Nyquist_Analyze_Plug- ins#Pitch_Detect. [Accessed: 11-Nov-2014]. [76] “Soundflower,” code.google.com. [Online]. Available: https://code.google.com/p/soundflower/. [Accessed: 17-Feb-2015]. [77] “Digital Waveguides,” 2010. [Online]. Available: https://ccrma.stanford.edu/~jos/pasp/Digital_Waveguides.html#11 226. [78] “Speech Assessment Methods Phonetic Alphabet,” wikipedia.org. [Online]. Available: http://en.wikipedia.org/wiki/Speech_Assessment_Methods_Phoneti c_Alphabet. [Accessed: 12-Nov-2014].

174 References

[79] T. P. Szynalski, “English vowel chart,” antimoon.com. [Online]. Available: http://www.antimoon.com/how/english-vowel- chart.htm. [Accessed: 12-Nov-2014]. [80] A. Hunt, D. M. Howard, G. Morrison, and J. Worsdall, “A real-time interface for a formant speech synthesiser,” Logopedics Phoniatrics Vocology, vol. 25, no. 4, pp. 169–175, 2000. [81] sensimetrics, “One-Hand-Operated Speech Synthesis,” sens.com. [Online]. Available: http://www.sens.com/handsynth/index.htm. [Accessed: 12-Nov-2014].

175 References

176