A Speech Synthesis-By-Rule System for Modern Standard Chinese

by Bo SHI

Department of Phonetics and Linguistics
University College London

A thesis submitted to the University of London for the degree of Doctor of Philosophy

1990

ABSTRACT

This thesis discusses the development of a Chinese speech synthesis-by-rule system and presents the structure and features of the system. The aim of the work is to produce highly intelligible standard Chinese speech with natural-sounding intonation from unrestricted Chinese text, using a parallel formant speech synthesizer. The synthesis system accepts standard Chinese Pinyin text as input, either from a conventional keyboard or from a computer-readable file. Conversion of the text into a detailed phonetic description is carried out in three steps: (1) application of a group of phonological and phonetic rules to convert Pinyin text into demisyllable strings; (2) conversion of the demisyllables into a succession of phonetic elements using a dictionary look-up strategy; (3) application of prosodic rules at different levels.
Two specific features of this text-to-phonetic conversion system are the use of a specially designed demisyllable dictionary, which permits an effective and reliable way of deriving phonetic elements from text, and the implementation of a new generative intonation framework, which enables a very wide variety of natural-sounding fundamental frequency contours to be generated automatically. The acoustic-phonetic rules used in the system are developed from the Holmes-Mattingly-Shearme (HMS) algorithm. A complete set of Chinese phonetic tables, each of which contains the acoustic properties and co-articulation information of an acoustic segment, has been developed on the basis of systematic acoustic-phonetic analysis of standard Chinese syllables.

The synthesis system has been implemented on a BBC microcomputer to drive an LSI parallel formant synthesizer. An IBM PC version of the system has also been developed. A segmental intelligibility test shows that the performance of the present system is quite comparable with that of the best English speech synthesis-by-rule systems now available.

ACKNOWLEDGEMENTS

This work is part of a Chinese speech Input/Output system under the Alvey Program, financially supported by the Science and Engineering Research Council and the Sindex Speech Technology Group.

I am indebted to many who have helped in this research. This work would not have been completed without their cooperation and assistance. In particular, I wish to acknowledge my indebtedness to the following persons for their special contributions:

- my supervisor, Professor Adrian Fourcin of University College London, for his expert guidance of my research and his careful criticism of my thesis;
- Dr. John Holmes for his invaluable and thoughtful help at various stages of the work;
- Professor Zhang Jialu of the Institute of Acoustics, Chinese Academy of Sciences, for his support and helpful advice;
- Professor Eva Gårding of Lund University, Sweden, for stimulating my interest in prosodic features and for her generous help in enhancing my knowledge of this field;
- the colleagues in the Chinese speech Input/Output project: Paul Thompson for many thoughtful discussions and helpful suggestions; Lillian Chia for making recordings; David Howell for helping with the processing of the Chinese syllable database using the JSRU formant tracker; Peter Davies for helping with the use of the MASSCOMP computer;
- all my colleagues in the Department of Phonetics and Linguistics of UCL for their help and for providing a supportive working environment and excellent research facilities, and especially Bridget Allen, Valerie Hazan, David Howard, Mike Johnson, Geoff Lindsey and Sarah Palmer for kindly making many helpful comments on the early versions of this thesis, Steve Nevard for technical assistance, and Andrew Faulkner and Stuart Rosen for providing software for plotting confusion matrices and line drawings;
- all the speakers for the recordings and the subjects in the perceptual test for their patience and cooperation;
- last but not least, Jun and Kaixi for their love, and especially Jun for his constant encouragement and untiring support over the years.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF PRINCIPAL SYMBOLS AND ABBREVIATIONS

CHAPTER 1 INTRODUCTION
  1.1 Organization of the thesis
  1.2 A brief review of speech synthesis technology
  1.3 Review of work on Chinese speech synthesis
  1.4 Aim of the present study

CHAPTER 2 BACKGROUND OF MODERN STANDARD CHINESE
  2.1 Introduction
  2.2 Phonetic and phonological descriptions
    2.2.1 Chinese phonetic alphabet: Pinyin
    2.2.2 Syllabic structure and phonological constraints
    2.2.3 Phonetic description
    2.2.4 Tonal system

CHAPTER 3 OUTLINE OF THE CHINESE SYNTHESIS-BY-RULE SYSTEM
  3.1 Overview of the system
  3.2 The LSI parallel formant speech synthesizer
  3.3 The SYNCON synthesis control software

CHAPTER 4 ACOUSTIC CHARACTERISTICS AND SYNTHETIC REALIZATION OF CHINESE SPEECH SOUNDS
  4.1 Introduction
    4.1.1 Classification and acoustic description of speech sounds
    4.1.2 Synthesis strategies of formant synthesis-by-rule
  4.2 Framework for the establishment of the Chinese phonetic tables
    4.2.1 The Chinese syllable database
    4.2.2 Methods of acoustic analysis
    4.2.3 Strategies of imitating Chinese speech sounds
  4.3 Vowel sounds
    4.3.1 Simple vowels
    4.3.2 Diphthongs
    4.3.3 Triphthongs
    4.3.4 Semivowels
  4.4 Nasal endings
  4.5 Initial consonants
    4.5.1 Plosives
    4.5.2 Fricatives
    4.5.3 Affricates
    4.5.4 Nasals and the lateral

CHAPTER 5 PROSODIC FEATURES - DESCRIPTION, ANALYSIS AND MODELLING
  5.1 Prosodic features
    5.1.1 Definition
    5.1.2 Terminology
    5.1.3 Functions
    5.1.4 Perception

