Pronunciation Support for Arabic Learners
Total Page:16
File Type:pdf, Size:1020Kb
PRONUNCIATION SUPPORT FOR ARABIC LEARNERS A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences 2015 By Majed Alsabaan School of Computer Science Contents Abstract 12 Declaration 14 Copyright 15 Acknowledgements 16 1 Introduction 18 1.1 Research Overview . 19 1.2 Organisation of the Thesis . 20 1.3 Research Questions and Research Tasks . 22 1.4 Research Methodology . 24 1.5 Research Contribution . 25 1.6 Summary . 25 2 Background and Literature Review 26 2.1 Automatic Speech Recognition (ASR) . 26 2.1.1 Introduction . 26 2.1.2 Hidden Markov Model (HMM) . 28 2.2 The Arabic Language . 31 2.2.1 Introduction . 31 2.2.2 Arabic ASR . 32 2.3 Pedagogy of Learning . 34 2.3.1 Aptitude and Motivation . 34 2.3.2 Thinking Tools . 35 2.3.3 Conditions for Learning Language . 35 2.3.4 Assessment in Learning . 36 2 2.4 CALL tools in Language Learning . 38 2.4.1 History of CALL . 38 2.4.2 Importance of CALL . 40 2.4.3 Detailed Example of a CALL tool - Baldi . 43 2.4.3.1 How does Baldi work? . 45 2.4.4 Graphical Simulation of the Vocal Tract . 46 2.5 Speech Synthesis . 48 2.6 Conclusion . 50 3 Architecture of the CALL tool 51 3.1 System Architecture . 51 3.2 System Integration . 51 3.2.1 Identifying Mispronunciation by the HTK . 53 3.2.2 Animating the Vocal Tract . 53 3.2.3 Synthesising Utterances . 53 3.2.4 Instruction Feature . 53 3.3 Language Lab Version . 56 3.4 Conclusion . 56 4 HTK Experiments 57 4.1 Using the HTK . 59 4.1.1 The HTK training steps . 59 4.1.1.1 Data Preparation . 60 4.1.1.2 Training the HTK . 65 4.1.1.3 Recogniser Evaluation . 67 4.2 Experiments . 69 4.2.1 Arabic Unique Consonants . 70 4.2.2 Training Data and Platform . 73 4.2.3 Designed System . 75 4.2.4 Experiment 1: Word Level vs. Phoneme Level . 76 4.2.4.1 Experimental Design . 76 4.2.4.2 Results . 78 4.2.4.3 Conclusion . 79 4.2.5 Experiment 2: The effect of Gender . 82 4.2.5.1 Experimental Design . 82 4.2.5.2 Results . 83 3 4.2.5.3 Conclusion . 83 4.2.6 Experiment 3: Better training data . 83 4.2.6.1 Experimental Design . 83 4.2.6.2 Results . 84 4.2.7 Summary . 86 4.3 Conclusion . 86 5 Animating the Head 88 5.1 Animation Version of our CALL Tool . 88 5.1.1 Morphing with Java3D . 90 5.1.2 The Integration . 94 5.2 Conclusion . 95 6 Speech Synthesis in CALL Tool 96 6.1 Generating Synthetic Speech . 96 6.2 Estimating Phone Length . 97 6.3 Improving Pitch Contour . 103 6.4 Conclusion . 105 7 Classroom Experiments and Evaluation 107 7.1 Experimental Setup . 107 7.2 Results . 108 7.3 User Behaviour . 111 7.3.1 Using the Language Lab Version . 111 7.3.2 Using the Animation Version . 113 7.3.3 Using the Synthesis Version . 113 7.3.4 Using the Instruction version . 115 7.3.5 Using the Full Version . 116 7.4 Conclusion . 117 8 Conclusion and Future Work 118 8.1 Research Questions and Research Tasks Revisited . 118 8.2 Future Work . 120 Bibliography 123 A Ethical Approval 137 4 B Python Script for the HTK training 139 C Dynamic Time Warping 144 D Morphing Code 147 E Top-Level System Integration 150 F Buckwalter Arabic transliteration 152 G Modified Buckwalter Arabic transliteration 154 H IPA for Arabic 156 I Modified IPA for Arabic 158 J Arabic Minimal Pairs 160 K Behavioural Traces 161 K.1 User behaviours of language lab version . 161 K.2 User behaviours of animation version . 165 K.3 User behaviours of synthesis version . 168 K.4 User behaviours of instruction version . 171 K.5 User behaviours of full version . 174 Word Count: 43,777 5 List of Tables 4.1 Emphatic and non-emphatic sounds . 71 4.2 Arabic minimal pairs . 73 4.3 Confusion matrix for emphatic sounds . 79 4.4 Experiments’ summary . 86 7.1 Classroom results . 109 7.2 Individual student improvement . 110 7.3 p-value for each version vs. language lab . 110 F.1 Buckwalter Arabic transliteration table . 153 G.1 Modified Buckwalter Arabic transliteration table . 155 H.1 IPA symbols for Arabic . 157 I.1 Modified IPA symbols for Arabic . 159 J.1 Words of Arabic Minimal Pairs . 160 6 List of Figures 2.1 Example of HMM . 30 2.2 Wave signal of the word ‘MIN’ . 32 2.3 Baldi’s wire frame model . 46 2.4 Inside view of Baldi . 47 2.5 Sagittal view . 47 2.6 The vocal tract . 48 3.1 System architecture . 52 3.2 Articulatory advice . 55 4.1 HTK experiments . 58 4.2 The HTK training steps . 59 4.3 Step 1: Data preparation . 60 4.4 Prompts file . 61 4.5 TestPrompts file . 62 4.6 SpeechRecord tool . 62 4.7 The contents of codetr.scp . 63 4.8 Grammar file . 63 4.9 Pronunciation dictionary file . 64 4.10 Using HDMan to generate the dictionary file . 66 4.11 Step 2: Running the HTK training script . 68 4.12 Tongue position during the articulation of the emphatic (CQ) and non-emphatic consonants (C) . 72 4.13 Spectrogram representation of the word /dQarb/ . 72 4.14 Spectrogram representation of the word /darb/ . 73 4.15 Grammar file - word level . 77 4.16 Pronunciation dictionary file - word level . 77 4.17 Grammar file - phoneme level . 78 7 4.18 Pronunciation dictionary file - phoneme level . 78 4.19 Measurement level of recognition accuracy . 80 4.20 Word level measurement of recognition accuracy . 81 4.21 Sentence level measurement of recognition accuracy . 81 4.22 Change from anyword grammar to constrained grammar . 85 5.1 Movable articulators . 89 5.2 The diagnostic tool . 90 5.3 Source images . 91 5.4 Traced versions . 91 5.5 Drawing of the phoneme ‘b’ . 92 5.6 Drawing of the phoneme ‘i’ . 93 5.7 Drawing of the phoneme ‘r’ . 93 5.8 Animated feedback on learner’s pronunciation . 95 5.9 Animated feedback on correct pronunciation . 95 6.1 MBROLA script . 97 6.2 Intensity curve for a real speech . ..