Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh
Total Page:16
File Type:pdf, Size:1020Kb
Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh Anocha Rugchatjaroen PhD Department of Electronics University of York June, 2014 0 1 Abstract In articulatory speech synthesis, the 3-D shape of a vocal tract for a particular speech sound has typically been established, for example, by magnetic resonance imaging (MRI), and this is used to model the acoustic output from the tract using numerical methods that operate in either one, two or three dimensions. The dimensionality strongly affects the overall computation complexity, which has a direct bearing on the quality of the synthesized speech output. The digital waveguide mesh (DWM) is a numerical method commonly used in room acoustic modelling. A smaller space such as a vocal tract, which is about 5 cm wide and 16.5-18 cm long in adults, can also be modelled using DWM in one, two and three dimensions. The latter requires a very dense mesh requiring massive computational resources; these requirements are lessened by using a lower dimensionality (two rather than three) and/or a less dense mesh. The computational cost of 2-D digital waveguide modelling makes it a practical technique for real-time synthesis in an average PC at full (20 kHz) audio bandwidth. This research makes use of a 2-D mesh with the advantage of the availability and flexibility of existing boundary modelling and the raised-cosine impedance control to study the possibilities of using it for English consonant synthesis. The research was organized under the phonetic ‘manner’ classification of English consonants as: semi-vowel, nasal, fricative, plosive and affricate. Their production has been studied in terms of acoustic pressure wave propagation. Meshing topology was fixed to being a 4-port scattering 2-D rectilinear waveguide mesh for ease of understanding and mapping to the tract shape. As the characteristic of consonant production requires vocal tract articulation variations that are quite unlike vowels, this research adopts the articulatory trajectories using electromagnetic (mid-sagittal) articulograph (EMA) data from mngu0 to guide the change of cross-sectional vocal tract area. Generally, articulatory trajectories have been used to improve the accuracy of speech recognition and synthesis in recent decades. This research adopts the 2 trajectories to control coarticulation in consonant synthesis to demonstrate that a 2-D digital waveguide mesh (DWM) is able to simulate the formant transition accurately. The formant transitions in the results are close acoustically to natural speech and are based on controlling articulation for four places of articulation. Positions of lip, tongue tip, tongue body and tongue dorsum are inversely mapped to their corresponding cross-sectional areas. Linear interpolation between them enabled all tract movements to be modelled. The results show that tract movements are best modelled as non-linear coarticulation. 3 Contents Acknowledgements ..................................................................................................................................... 10 Declaration ..................................................................................................................................................... 11 Chapter 1 Introduction ............................................................................................................................. 12 Chapter 2 Speech Synthesis ..................................................................................................................... 17 2.1 Sound Waveforms ....................................................................................................................... 18 2.2 Harmonics and Resonance .......................................................................................................... 23 2.3 Acoustic Representation in the Human Voice ............................................................................ 26 2.4 Introduction to Speech Synthesis ................................................................................................ 27 2.5 Introduction to Articulatory-based Speech Synthesis ................................................................. 30 2.6 Tube models and Wave Scattering.............................................................................................. 33 2.7 Vocal tract modelling with some examples ................................................................................ 36 Chapter 3 2-D Digital Waveguide Mesh ............................................................................................... 41 3.1 Digital Waveguide Mesh (DWM) ............................................................................................... 41 3.2 Multi-Dimensional Digital Waveguide Mesh ............................................................................. 45 3.3 2-D Digital Waveguide Mesh ..................................................................................................... 46 3.3.1 Meshing and Scattering ........................................................................................................ 46 3.3.2 Boundary Management in vocal tract modelling ................................................................. 49 Chapter 4 Acoustics of English Consonants ..................................................................................... 53 4.1 Acoustic Representation in the Human Voice ............................................................................ 54 4.2 Perturbation Theory .................................................................................................................... 56 4.3 Vocal Tract Apparatus and Articulation ..................................................................................... 57 4.4 Acoustic properties of English Consonants ................................................................................ 59 4.4.1 Nasal .................................................................................................................................... 60 4.4.2 Plosive .................................................................................................................................. 61 4.4.3 Fricatives .............................................................................................................................. 63 4.4.4 Affricates .............................................................................................................................. 65 4.4.5 Semi-vowels ......................................................................................................................... 65 Chapter 5 2-D Digital Waveguide Mesh for English Consonants ............................................... 69 5.1 Tools and data ............................................................................................................................. 69 5.1.1 Mullen's 2-D Digital Waveguide Mesh ............................................................................... 69 1 5.1.2 Voiced and unvoiced source ................................................................................................ 73 5.1.3 Vocal tract shape and articulatory trajectory corpus ............................................................ 74 5.2 The simulation ............................................................................................................................ 81 5.2.1 Semi-vowels ......................................................................................................................... 81 5.2.2 Nasals ................................................................................................................................... 84 5.2.3 Plosives ................................................................................................................................ 87 5.2.4 Fricatives .............................................................................................................................. 91 5.2.5 Affricates .............................................................................................................................. 93 Chapter 6 Evaluation and Results......................................................................................................... 95 6.1 Objective evaluations .................................................................................................................. 95 6.1.1 Nasals ................................................................................................................................... 95 6.1.2 Semi-vowels ....................................................................................................................... 102 6.1.3 Plosives .............................................................................................................................. 107 6.1.4 Fricatives ............................................................................................................................ 113 6.1.5 Affricates ............................................................................................................................ 117 6.2 Subjective test results ................................................................................................................ 118 Chapter 7 Conclusions and Future Work ........................................................................................ 131 7.1 Conclusion ...............................................................................................................................