University of Southampton
UNIVERSITY OF SOUTHAMPTON
Faculty of Physical Sciences and Engineering
School of Electronics and Computer Science

Modern Standard Arabic Phonetics for Speech Synthesis

Nawar Halabi

Supervisor: Prof Mike Wald
Internal Examiner: Dr Gary B Wills
External Examiner: Assoc Prof Nizar Habash

Thesis for the degree of Doctor of Philosophy
July 2016

ABSTRACT

Arabic phonetics and phonology have not been adequately studied for the purposes of speech synthesis and speech synthesis corpus design. The only sources of knowledge available are either archaic or targeted towards other disciplines such as education. This research conducted a three-stage study. First, Arabic phonology research was reviewed in general, and the results of this review were triangulated with expert opinions – gathered throughout the project – to create a novel formalisation of Arabic phonology for speech synthesis. Secondly, this formalisation was used to create a speech corpus in Modern Standard Arabic and this corpus was used to produce a speech synthesiser. This corpus was the first to be constructed and published for this dialect of Arabic using scientifically-supported phonological formalisms. The corpus was semi-automatically annotated with phoneme boundaries and stress marks; it is word-aligned with the orthographical transcript. The accuracy of these alignments was compared with previous published work, which showed that even slightly less accurate alignments are sufficient for producing high quality synthesis. Finally, objective and subjective evaluations were conducted to assess the quality of this corpus. The objective evaluation showed that the corpus based on the proposed phonological formalism had sufficient phonetic coverage compared with previous work. The subjective evaluation showed that this corpus can be used to produce high quality parametric and unit selection speech synthesisers.
In addition, it showed that the use of orthographically extracted stress marks can improve the quality of the generated speech for general purpose synthesis. These stress marks are the first to be tested for Modern Standard Arabic, which thus opens this subject for future research.

Table of Contents

Chapter 1 Introduction
  1.1 Text to Speech Synthesis
  1.2 Lack of Speech Corpora in Arabic
  1.3 Knowledge Gaps (Arabic Phonetics and Phonology)
  1.4 Creating a Speech Corpus
  1.5 Scope
    1.5.1 Target Synthesis Methods
    1.5.2 Project Description
    1.5.3 Research Questions
  1.6 Research Contributions
  1.7 Summary
Chapter 2 Methodology
  2.1 Literature Selection Process
    2.1.1 Keywords used in Search Engines
    2.1.2 Literature Selection Process
    2.1.3 Literature Sources
  2.2 Sequential Process Methodology
  2.3 Research Methods
    2.3.1 Methods for the Corpus Design Process
    2.3.2 Methods for Acquiring and Optimising Orthographic Transcript
    2.3.3 Methods for Segmenting and Aligning Recordings and their Evaluation
    2.3.4 Methods for Objective and Subjective Corpus Evaluations
  2.4 Criteria for Choosing Experts
  2.5 Ethics
  2.6 Summary
Chapter 3 MSA Phonetics and Phonology
  3.1 Stress
  3.2 Prosody
  3.3 Gemination
  3.4 Nasalisation
  3.5 Emphasis
  3.6 Diphthongs
  3.7 Summary
Chapter 4 Transcript Collection, Reduction and Recording
  4.1 Corpora and Transcript Size
    4.1.1 TIMIT
    4.1.2 Other Corpora
  4.2 Optimisation (Orthographic Transcript Reduction)
  4.3 Optimisation Vocabulary
    4.3.1 Short Syllable Diphones
    4.3.2 Half Syllable Diphones
    4.3.3 Consonant Clusters and Vowel Clusters
  4.4 Results of Reduction
  4.5 Recording Utterances
  4.6 Summary
Chapter 5 Corpus Segmentation and Alignment
  5.1 Generating the phonetic transcript
  5.2 Automatic Segmentation
  5.3 HTK Alignment
  5.4 Manual corrections
  5.5 Summary
Chapter 6 Evaluation of Segmentation and Alignment
  6.1 Evaluation metrics
  6.2 Boundary Types
  6.3 HTK Parameters
  6.4 Initial Evaluation (Flat Start)
    6.4.1 Alignment quality
    6.4.2 Expert Agreement