THE DEVELOPMENT of ACCENTED ENGLISH SYNTHETIC VOICES By
Total Page:16
File Type:pdf, Size:1020Kb
THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES by PROMISE TSHEPISO MALATJI DISSERTATION Submitted in fulfilment of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE in the FACULTY OF SCIENCE AND AGRICULTURE (School of Mathematical and Computer Sciences) at the UNIVERSITY OF LIMPOPO SUPERVISOR: Mr MJD Manamela CO-SUPERVISOR: Dr TI Modipa 2019 DEDICATION In memory of my grandparents, Cecilia Khumalo and Alfred Mashele, who always believed in me! ii DECLARATION I declare that THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES is my own work and that all the sources that I have used or quoted have been indicated and acknowledged by means of complete references and that this work has not been submitted before for any other degree at any other institution. ______________________ ___________ Signature Date iii ACKNOWLEDGEMENTS I want to recognise the following people for their individual contributions to this dissertation: • My brother, Mr B.I. Khumalo and the whole family for the unconditional love, support and understanding. • A distinct thank you to both my supervisors, Mr M.J.D. Manamela and Dr T.I. Modipa, for their guidance, motivation, and support. • The Telkom Centre of Excellence for Speech Technology for providing the resources and support to make this study a success. • My colleagues in Department of Computer Science, Messrs V.R. Baloyi and L.M. Kola, for always motivating me. • A special thank you to Mr T.J. Sefara for taking his time to participate in the study. • The six Computer Science undergraduate students who sacrificed their precious time to participate in data collection. • All the volunteers who evaluated the work done in the study. • Above all, I thank the Almighty God for gracing me with the ability to start and finish this study. And also for blessing me with the supportive people around me. iv ABSTRACT A Text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. A TTS synthesis system can be used for, amongst others, language learning, and reading out text for people living with different disabilities, i.e., physically challenged, visually impaired, etc., by native and non-native speakers of the target language. Most people relate easily to a second language spoken by a non-native speaker they share a native language with. Most online English TTS synthesis systems are usually developed using native speakers of English. This research study focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used in developing the synthetic voices. The Hidden Markov Model (HMM) method was used to train the synthetic voices. Secondary training text corpus is used to develop the training speech corpus by recording six speakers reading the text corpus. The quality of developed synthetic voices is measured in terms of their intelligibility, similarity and naturalness using a listening test. The results in the research study are classified based on evaluators’ occupation and gender and the overall results. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. A speech analysis software is used to compare the recorded synthesised speech and the human recordings. There is no significant difference in the voice pitch of the speakers and the synthetic voices except for one synthetic voice. v TABLE OF CONTENTS DEDICATION ..................................................................................................... II DECLARATION ................................................................................................ III ACKNOWLEDGEMENTS ................................................................................. IV ABSTRACT ........................................................................................................ V TABLE OF CONTENTS .................................................................................... VI LIST OF FIGURES ............................................................................................. X LIST OF TABLES ........................................................................................... XIII ABBREVIATIONS AND ACRONYMS ........................................................... XIV CHAPTER 1 ....................................................................................................... 1 1. INTRODUCTION ......................................................................................... 1 1.1. Background of the Problem ............................................................... 1 1.2. Problem Statement .............................................................................. 2 1.3. Aim and Objectives ............................................................................. 2 1.3.1 Data collection ................................................................................ 3 1.3.2 Voice development ......................................................................... 3 1.3.3 Voice testing ................................................................................... 3 1.3.4 Avail voices ..................................................................................... 3 1.4. Theoretical Framework ....................................................................... 3 1.4.1. Training Stage ................................................................................ 4 1.4.2. Synthesis Stage .............................................................................. 5 1.5. Significance of the Study ................................................................... 6 1.6. Structure of the Dissertation .............................................................. 7 CHAPTER 2 ....................................................................................................... 8 vi 2. LITERATURE REVIEW ............................................................................... 8 2.1. Introduction ......................................................................................... 8 2.2. Native Speakers versus Non-native Speakers .................................. 8 2.3. Text-to-speech (TTS) Synthesis System ......................................... 10 2.3.1. Modern TTS Synthesis Systems ................................................... 10 2.3.2. Development of TTS Synthesis System ....................................... 11 2.3.3. TTS Synthesis Platforms .............................................................. 15 2.3.4. Usage of TTS Synthesis System .................................................. 18 2.3.5. TTS Synthesis System Evaluation ................................................ 19 2.3.6. TTS Synthesis System for Non-native Speakers .......................... 19 2.4. Summary ............................................................................................ 20 CHAPTER 3 ..................................................................................................... 21 3. THE MARY TTS SYNTHESIS SYSTEM AND VOICE DEVELOPMENT EXPERIMENTS. ............................................................................................... 21 3.1. Introduction ....................................................................................... 21 3.2. MARY TTS Synthesis System and Software Packages ................. 21 3.3. Synthetic Voices Experiments ......................................................... 23 3.3.1. voice_slt ........................................................................................ 23 3.3.2. voice_lwazi ................................................................................... 24 3.3.3. voice_rms ..................................................................................... 26 3.3.4. voice_sltmodified .......................................................................... 26 3.4. South African English ....................................................................... 26 3.5. Data Collection .................................................................................. 30 3.5.1. Xitsonga Female ........................................................................... 33 3.5.2. Tshivenda Female ........................................................................ 34 3.5.3. Sepedi Female.............................................................................. 34 3.5.4. Xitsonga Male ............................................................................... 35 3.5.5. Tshivenda Male ............................................................................ 35 3.5.6. Sepedi Male .................................................................................. 35 vii 3.6. Development of Accented Voices .................................................... 38 3.6.1. Voice xitsonga_female .................................................................. 38 3.6.2. Voice tshivenda_female ................................................................ 41 3.6.3. Voice sepedi_female .................................................................... 42 3.6.4. Voice xitsonga_male ..................................................................... 42 3.6.5. Voice tshivenda_male ................................................................... 43 3.6.6. Voice sepedi_male ....................................................................... 44 3.7. MARY Graphic User