Development of Rules for Unlimited Text To Speech Synthesis of Hindi

S.S. Agrawal*, Rajesh Verma, Shailendra Nigam, Anuradha Sengar
Central Electronics Engineering Research Institute Centre, CSIR Complex, NPL Campus, Dr. K.S. Krishnan Road, New Delhi 110012
*E-Mail: [email protected]

Abstract: This paper describes the development of linguistic and acoustic rules for implementation in an unlimited Text To Speech Synthesis (TTS) system for Hindi. Klatt's cascade/parallel formant synthesizer has been simulated on a PC for synthesizing high quality Hindi and other Indian spoken languages. Parametric files of commonly spoken syllables have been created as the basic units of sound for concatenation purposes. The input module can take text from the keyboard in Indian Standard Code for Information Interchange (ISCII) format. Pre-processing is done for expanding abbreviations, numerals etc. into phonemic strings of words. The input word sequence is parsed to break the words into syllables, which are sent to the concatenator module. The parametric files of the corresponding syllables are picked up from the database and concatenation rules are applied to form words as smooth as possible. These parametric word files are then subjected to prosodic rules for introducing correct pitch and stress levels. Pitch variations are considered at three levels, i.e. word level, clause level and sentence level. The local peaks and valleys along with their amplitude levels are determined based on the phonological and syntactic information in a sentence. It is very important to correctly assign the pitch value and its rate of rise and fall at the appropriate syllables and at clausal breaks. Stressed syllables have 4 to 6 dB higher intensity compared to unstressed syllables. A Windows 98 based software model of the system is working satisfactorily and is continuously being improved to produce quality synthesis.

DESCRIPTION OF TTS FOR HINDI

The modules of the CEERI Text to Speech Synthesis system for Hindi are shown in Fig 1. Hindi text input in ISCII format is fed to the Text Preprocessor, which expands abbreviations, numerals, punctuation, special characters etc. At the next level the word parser breaks down the individual words into the basic units available in the database. The intonational and stress parser provides information regarding the position of pitch and stress variation at word and sentence level. In the present text to speech system, monosyllables have been chosen as the basic units of Hindi speech to generate an unrestricted vocabulary. The combination of the 29 most frequently occurring consonants and 10 vowels forms the core of our database. The concatenator merges the syllable data from the database to make raw word data. This data is then processed by the Voice Quality Manager (VQM), using the knowledge base consisting of durational, phonological and contextual rules, to improve the speech quality. The Synthesizer then uses this data to generate the speech.
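As an illustration of the word-parsing step described above, the sketch below greedily groups a phoneme string into CV and VC units. The phoneme inventory and the greedy split rule here are simplifying assumptions for illustration, not the actual CEERI parser tables.

```python
# Illustrative sketch of breaking a phonemic string into CV/VC syllable
# units for concatenative synthesis. The vowel set and the greedy split
# rule are assumptions, not the system's actual rules.

VOWELS = {"a", "i", "u", "e", "o"}          # assumed vowel set

def syllabify(phonemes):
    """Greedily group a phoneme list into CV and VC units."""
    units, i = [], 0
    while i < len(phonemes):
        two_left = i + 1 < len(phonemes)
        if two_left and phonemes[i] not in VOWELS and phonemes[i + 1] in VOWELS:
            units.append(phonemes[i] + phonemes[i + 1])   # CV unit
            i += 2
        elif two_left and phonemes[i] in VOWELS and phonemes[i + 1] not in VOWELS:
            units.append(phonemes[i] + phonemes[i + 1])   # VC unit
            i += 2
        else:
            units.append(phonemes[i])                     # lone segment
            i += 1
    return units

print(syllabify(list("kamal")))   # ['ka', 'ma', 'l']
```

A real parser would additionally tag clusters, geminates and nasalization, as the following section describes.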

DEVELOPMENT AND IMPLEMENTATION OF RULES

The word parser takes Hindi text (ISCII codes) as input and breaks it into monosyllables of CV and VC types, clusters and geminates. These syllables form the basic parametric database used in the Hindi TTS. Information regarding punctuation etc. is included so that appropriate silence can be introduced during synthesis. Similarly, information about the sentence being declarative or interrogative is included in the first word of the sentence for implementing the intonational contour. The syllable tags also carry information on whether the syllable is a simple /CV/ or /VC/, a cluster or a geminate, or nasalized, and on the junction type it forms with the succeeding syllable, so that the appropriate set of rules is applied while concatenating. This information helps to expedite the process of synthesis and to decide which category of rules has to be applied.

Fig 1. TTS block diagram

Pitch variations have been determined at three levels, i.e. word, clause and sentence. The F0 values for peaks and valleys, rises and falls at each of these levels are assigned and implemented. Content words in Hindi are usually stressed, and stress generally falls on the penultimate syllable.

Fig 2. Synthetic Speech from TTS

The concatenator generates raw speech data for words and sentences based on the input syllable string stream. The rules for speech quality improvement are implemented on this speech data. Smoothening of parameters at syllable junction boundaries is done to avoid jerks in the output speech.

In the present system, data for syllables containing short vowels such as /U/, /I/ etc. are generated by rules from the corresponding syllables with long vowels, by varying frequency and durational parameters using the relevant rule in the knowledge base.

A large number of general rules for improving the synthetic speech quality have been generated after detailed acoustic analysis in different contexts. The rules are based on the syllabic combinations that may occur in words. In the present system clusters and geminates are treated differently. The information regarding the presence of a cluster or a geminate in a word is incorporated at the parser level, and the rules pertaining to them are applied at the concatenation level. In the case of voiced geminates a predetermined duration of voice bar is inserted before the consonant, while for unvoiced geminates a predetermined duration of silence is inserted. For other consonants similar treatment is carried out. For clusters the parser not only sets a cluster flag but also converts every single consonant of the cluster into a /VC/ by prefixing a vowel to the consonant. This vowel is actually redundant and is used only to convert a single consonant into an available syllable unit. Later, at the concatenation level, the data corresponding to this vowel is stripped off and only the consonant data is retained. Depending upon the consonants forming the cluster, their durations are adjusted as per rule.

CONCLUSION

The present text to speech system for Hindi has been developed on a PC platform, using syllables as the basic units. These basic units are used to generate unrestricted Hindi text. In our system the quality of the basic units is of paramount importance, as it has a major impact on the quality of the speech output. The final speech output is quite intelligible. Since the development of a body of rules for bringing naturalness to the output synthetic speech is a continuing process and is largely language dependent, further studies are being carried out.

ACKNOWLEDGEMENT

The authors are grateful to Dr. Shamim Ahmed, Director CEERI, for his support and thankful to the MoIT for providing the financial assistance. They are also thankful to Prof. K. Stevens for his useful suggestions.

REFERENCES

[1] Agrawal S. S., and Stevens K., Proc. ICSLP-92, 177-180, 1992.
[2] Klatt D. H. and Klatt L., JASA, 87(2), 820-857, 1990.
[3] Verma R., Sarma A. S. S., Shrotriya N., Sharma A. K., Agrawal S. S., Proc. ICPhS-95, Stockholm, Vol. 2, 1995, p. 354-357.

The Investigation of the Topoiyo Language in Indonesia and its Speech Data-base

H. Nakashima (a) and M. Yamaguchi (b)

(a) Faculty of Information Science, Osaka Institute of Technology, 1-79-1 Kitayama, Hirakata, Osaka 573-0196, Japan
(b) Faculty of International Language and Culture, Osaka, Japan

This study deals with the investigation and recording of the Topoiyo language, one of the endangered languages of Sulawesi Island, Republic of Indonesia.

INTRODUCTION

The Topoiyo language is now used in the inland area along the Budong-budong River in Mamuju District, which is in the South Sulawesi Province on Sulawesi Island, Republic of Indonesia. The number of its speakers is said to be between 500 and 1,000, and it is considered to belong genetically to the Kaili-Pamona Family [1],[2],[3]. The languages of the South Sulawesi family are spoken mostly in the South Sulawesi Province, and the languages studied there have been ones with a large population. In the Central Sulawesi Province, the languages which belong to the Kaili-Pamona Family are used, and the main languages there have been studied since the country was colonized by Holland. But the Kaili-Pamona languages and other languages with a small population which are used in the north of the South Sulawesi Province have not been investigated and studied so much.

We have conducted an investigation and have collected linguistic and phonetic data of the Topoiyo language, and we have constructed its speech data-base from the recorded speech materials.

INVESTIGATION OF TOPOIYO LANGUAGE AND ITS SPEECH DATA-BASE

We conducted an investigation there in September 2000 and found the following facts about the Topoiyo language and other languages with a small population around it. Topoiyo is used in the village of Topoiyo in Budong-budong Country in Mamuju District. The population of the village is about 3,000, and there are 100 households which consist of 400-500 Topoiyo people. The speakers of the language are mainly ones older than 40 years. We collected from the people around Topoiyo village data on the number of speakers of the following languages: Bada, about 1,200; Benggaulu, about 6,000; and Panasuan, about 800. No data were available about the Bana language.

The following is the data we gathered in the investigation:
(1) Topoiyo: more than 2,000 words including nouns, verbs, adjectives and adverbs, morphological and syntactic material, and some recordings of vocabularies and simple sentences.
(2) Benggaulu: about 300 basic words and some recordings.
(3) Bada: about 300 basic words and some recordings.
(4) Panasuan: about 300 basic words and some recordings.

We have constructed the speech data-base of the Topoiyo language from the recorded speech materials, and have made a CD-ROM from it.

PHONETIC AND ACOUSTICAL ANALYSIS OF TOPOIYO LANGUAGE

The following is the phonological system of the Topoiyo language.

VOWELS: The vowel system of Topoiyo consists of /i, e, a, o, u/ (/u/ and /o/ are rounded vowels). These vowels are distributed in word-initial, medial and final position.

CONSONANTS: The consonants of Topoiyo are shown in Table 1. The phonological features of the language as a member of the Kaili-Pamona Family are as follows:
(1) There are the nasal+consonant clusters mp, mb, nt, nd, nyc, nyj, ngk, and ngg. The clusters mp, mb, nt, nd and ngg exist word-initially. These word-initial nasal+consonant clusters don't exist in the languages of the South Sulawesi Family.
(2) We confirmed the existence of the voiced labio-dental /v/ from the acoustical analysis. This phoneme /v/ doesn't exist in the South Sulawesi Family.
(3) The final syllables of words are basically open syllables.

CONCLUSION

We have conducted the investigation of Topoiyo, which is an endangered language, and have constructed its speech data-base from the recordings of Topoiyo. We carried out phonetic, acoustic and morphonological analysis on the data we obtained, and we found that Topoiyo belongs to the Kaili-Pamona Family.

ACKNOWLEDGMENTS

This study is partly supported by the Grant-in-Aid for Scientific Research, Ministry of Education, Science and Culture in Japan. We thank Dr. M.L. Manda of Hasanuddin University in Indonesia for his great cooperation in this project.

REFERENCES

1. T. Friberg, South Sulawesi Sociolinguistic Surveys 1983-1987. SIL in Cooperation with the Department of Education and Culture, Ujung Pandang (1987).
2. C.E. Grimes and B.D. Grimes, Languages of South Sulawesi. The Australian National University, Canberra (1987).
3. B.F. Grimes, Ethnologue: Languages of the World. SIL, Dallas (1992).

Table 1. The consonants of Topoiyo.

              bilabial   labio-dental   dental-alveolar   palatal   velar   glottal
  stop        p, b                      t, d              c, j      k, g    q
  fricative              v              s                                   h
  nasal       m                         n                 ny        ng
  lateral                               l
  trill                                 r
  semivowel   w                                           y
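The word-initial cluster inventory reported in the section above can be checked against a wordlist with a sketch like the following. The sample words are invented placeholders, not Topoiyo field data.

```python
# Sketch of scanning a wordlist for word-initial nasal+consonant clusters,
# as reported for Topoiyo above. The example words are invented, not field data.

CLUSTERS = ["mp", "mb", "nt", "nd", "nyc", "nyj", "ngk", "ngg"]

def initial_clusters(words):
    """Return the sorted set of listed clusters attested word-initially."""
    found = set()
    for w in words:
        for c in sorted(CLUSTERS, key=len, reverse=True):  # match longest first
            if w.startswith(c):
                found.add(c)
                break
    return sorted(found)

print(initial_clusters(["mbaso", "ntani", "kita", "nggala"]))  # ['mb', 'ngg', 'nt']
```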

A Study on Acoustical Correlates to Adjective Ratings of Speaker Characteristics on Dynamic Aspects

Yasuki Yamashita (a) and Hiroshi Matsumoto (b)

(a) Nagano Prefectural Institute of Technology, 813-8 Shimonogo, Ueda-shi, Nagano 386-1211, Japan
(b) Shinshu University, 4-17-1 Wakasato, Nagano-shi, Nagano 380-0921, Japan

Abstract. For synthesizing voice quality expressed by adjectives, acoustical correlates to adjective ratings of speaker characteristics are investigated. First, two sentences uttered by each of 20 male and 19 female speakers were rated by 22 listeners on eight semantic-differential scales used in everyday life. This paper focuses on two scales relating to the dynamic aspect of articulation: "resting - busy" and "articulate - inarticulate". For each gender, several acoustic parameters of the top and bottom three speakers on each scale are compared. The results show that the "busy" voices have a significant correlation to the normalized standard deviation of logarithmic F0 per mora duration. On the other hand, "articulate" voices have a wider dispersion on the F1-F2 plane than "inarticulate" voices. At the phoneme level, it is observed that the formant loci of "inarticulate" voices tend to have slower movement than those of "articulate" ones.

INTRODUCTION

In order to synthesize speech with desired speaker characteristics, or to convert the voice quality of a speech signal to any specific one expressed by adjectives, it is required to control the acoustical parameters correlated with the values on the adjective scales. There are several studies on acoustical correlates to the static aspect of speaker characteristics [1,2]. We have also investigated the intercorrelation between the ratings of voice characteristics on 8 Japanese semantic-differential (SD) scales and their acoustic parameters [3]. However, the acoustical correlates to the dynamic aspect of speaker characteristics have not been well investigated, due to their variety.

This paper examines the relationship between the SD scales of two adjectives ("resting - busy" and "articulate - inarticulate") and the dynamics of acoustical parameters: F0 dispersion for "resting" and formant loci for "inarticulate".

SUBJECTIVE EVALUATION

The voice samples used are two sentences (28 and 33 morae) uttered by each of 20 male and 19 female speakers with standard dialect, who were selected from 50 speakers for each gender. These were then evaluated on eight adjective ratings [4] ("clear - hazy", "resting - busy", "powerful - weak", "young - old", "deep - shallow", "sharp - dull", "articulate - inarticulate" and "nasal - less nasal") by 22 listeners for male voices and 8 for female voices. Each listener rates the voice attribute of each voice sample on each pair of adjectives. The six-point rating scale ranges from 3 to -3, in which the values 1 to 3 correspond to the degrees "a little" to "very".

The rates for each speaker were averaged over the two sentences, listeners, and three trials. Fig. 1 shows the speaker distribution of the average rates on the two SD scales of the dynamic aspect of articulation: "resting - busy" and "articulate - inarticulate". The ranges of the male and female speaker distributions on each scale are mostly the same. Furthermore, the correlation between the rates of "resting" and "articulate" is significantly high (0.67 for male and 0.58 for female).

ACOUSTICAL CORRELATES OF DYNAMIC SPEAKER CHARACTERISTICS

Analysis method

In this study, twelve acoustical parameters are extracted: the means and standard deviations of the logarithmic fundamental frequency (F0, sd_F0), the first to third logarithmic formant frequencies (F1, F2, F3, sd_F1, sd_F2, sd_F3) and their bandwidths, and the speech rate in morae per second computed excluding pauses. The speech signal was digitized at a sampling frequency of 16 kHz, and analysis was performed with a frame length of 20 ms and a frame shift of 10 ms. Formant frequencies for each frame were extracted by means of a 12th-order Mel-LPC analysis followed by root solving, and the pitch frequency was estimated by the cepstrum method.
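A minimal numpy sketch of some of the statistics named above: the mean and standard deviation of log F0, the speech rate excluding pauses, and the normalized F0 deviation mentioned in the abstract. The input values are invented examples, and the exact F0 tracking is assumed to have been done already.

```python
import numpy as np

# Sketch of frame-level statistics: mean and standard deviation of log F0,
# speech rate excluding pauses, and the logarithmic F0 dispersion ratio per
# mora duration. The example input values are invented.

def f0_statistics(f0_hz, n_morae, voiced_dur_s):
    """f0_hz: voiced-frame F0 track; returns (mean, sd, rate, normalized sd)."""
    ln_f0 = np.log(np.asarray(f0_hz, dtype=float))
    mean_ln, sd_ln = ln_f0.mean(), ln_f0.std()
    speech_rate = n_morae / voiced_dur_s          # morae per second, pauses excluded
    normalized = (sd_ln / mean_ln) / speech_rate  # dispersion ratio per mora duration
    return mean_ln, sd_ln, speech_rate, normalized

stats = f0_statistics([110, 120, 140, 130, 115], n_morae=28, voiced_dur_s=4.0)
print(round(stats[3], 4))   # 0.0026
```

Values on the order of 0.004 to 0.012 per speaker would match the normalized F0 deviation axis described for Fig. 2 below.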

Results of the "busy" scale

While the correlation between the SD values and each of the twelve parameters was not significant for either gender, the following parameter was found to be highly correlated with the "busy" ratings:

    (sd_ln(F0) / ln(F0)) · (1 / speech rate)

This parameter indicates the logarithmic F0 dispersion ratio per mora duration. The correlation coefficient is 0.83 for the 20 male speakers (1% significance level) and 0.47 for the 19 female speakers (5%), as shown in Fig. 2. On the other hand, this parameter does not show high correlation to the "articulate" ratings.

FIGURE 1. Speaker distributions on two semantic-differential scales associated with the dynamic aspect.

FIGURE 2. The scatter plots of speakers on the plane of the "resting - busy" rates and the normalized deviation of F0.

Results of the "inarticulate" scale

First, Fig. 3 compares the formant loci of /da/ (in "dai piramiddo") for the top and bottom three speakers on the "articulate - inarticulate" scale, at every 5 ms. From Fig. 3, the F1 loci for the "articulate" voices are steeper than those of the "inarticulate" ones.

Fig. 4 compares the mean formant frequencies of the five Japanese vowels for the "articulate" and "inarticulate" voices on the normalized log-F1 and log-F2 plane, whose origin is the mean log-formant frequency over the five vowels for each speaker. The formant frequencies are extracted from steady portions of the vowels. The dispersion for the "articulate" voices tends to be wider than that of the "inarticulate" ones.

FIGURE 3. Average formant loci of /da/ for the top and bottom 3 male speakers on the "articulate - inarticulate" rate.

FIGURE 4. Five Japanese vowels on the normalized log-F1-F2 plane for the top and bottom 3 speakers on the "articulate - inarticulate" scale.

CONCLUSION

This paper has presented the acoustical correlates to the SD values associated with the dynamic aspect of speaker characteristics. The "resting - busy" rating shows high correlation to the standard deviation of the log-F0 normalized by the mora duration. The "inarticulate" rating correlates to slower formant loci and a narrower dispersion on the log-F1-F2 plane for the five vowels.

REFERENCES

1. W.D. Voiers, "Perceptual bases of speaker identity," J. Acoust. Soc. Am. 36, 1065-1073 (1964).
2. G.L. Holmgren, "Physical and psychological correlates of speaker recognition," J. Speech Hear. Res. 10, 57-66 (1967).
3. Y. Yamashita, H. Matsumoto, "Study on Acoustical Correlates to Adjective Ratings of Speaker Characteristics," WESTPRAC VII, 177-180 (2000).
4. H. Kido and H. Kasuya, "Extraction of everyday expression associated with voice quality of normal utterance," J. Acoust. Soc. Japan 55, 405-411 (1999) (in Japanese).

Analysis and Synthesis of Hindi Retroflex Sounds Using PC Based Klatt Synthesizer

Rajesh Verma and Shyam S Agrawal

Speech Technology Group, Central Electronics Engineering Research Institute Centre, CSIR Complex, NPL Campus, Hill Side Road, New Delhi 110 012, India

Emails: [email protected], [email protected] Fax: 91-11-5788347

Abstract

This paper describes the analysis results obtained for achieving high quality synthesis of all the Hindi retroflex consonants /t./, /t.h/, /d./, /d.h/, /n~/, /r/, /r./ and /r.h/, using the PC based cascade/parallel formant synthesizer proposed by Klatt. These sounds were analyzed in five long vowel contexts /a/, /i/, /u/, /e/ and /o/ for a very accurate description of their acoustic characteristics/features. Various parameters like the duration of the closure/voice bar, duration of the burst, voice onset time, duration of aspiration, rate of second formant transition, and burst frequencies and amplitudes have been studied in detail. For the synthesis, the source and vocal tract parameters of the synthesizer configuration were selected very carefully. Special attention was paid to parameters like F2, F3, A2F, A3F and A4F, which play an important role in making the distinction between cognate sounds like /t/ and /t./. Parameters like FNP, FNZ, FTP and FTZ were successfully employed to obtain the nasal sound /n~/. The parametric doc files were modified iteratively until a satisfactory quality of synthetic sound was obtained. The quality of the synthetic speech was evaluated not only by subjective listening but also by matching the spectra of the synthetic speech with the original speech.

SPECIFIC FEATURES OF HINDI SOUNDS

The sounds of Hindi speech can be conveniently divided into two broad categories, vowels and consonants. Hindi speech contains a set of ten pure vowels and about 35 consonants, of which about 29 are of frequent usage. These consonants can be conveniently classified according to the manner and place of production [1].

The Hindi consonants possess certain special features which are not so common in European languages and American English. The most significant differences are in the stops and affricates, which use both voicing and aspiration to distinguish them from other languages. Retroflex stops and nasals do not occur in most forms of English but are very common in Hindi, Malayalam and other Indian languages. Retroflex sounds are made by curling the tip of the tongue up and back so that the underside touches or approaches the back part of the alveolar ridge. The retroflex /n~/, /r./ and /r.h/ generally appear in word-medial and word-final position only; they do not appear in word-initial position. All other retroflexes appear in any word position. Among these retroflex consonants, /r/ has very large allophonic variation in different contexts, i.e., it behaves like a semi-vowel, fricative or retroflex depending upon the phonetic context in which it appears.

ANALYSIS PROCEDURE

CV and VC syllables were analyzed from CVC syllables (where the final consonant was the same as the initial consonant), recorded by a male native Hindi speaker in all five vowel contexts /a/, /i/, /u/, /e/ and /o/, at a sampling rate of 12 kHz. These recorded syllables were analyzed using a PC AT equipped with Sensimetrics Speech Station software. The analysis tools used consisted of digital spectrograms and other techniques like short-time FFT and LPC spectra, pitch, formants, waveforms and envelope displayed together. Using the spectrogram, most of the important acoustical events were studied in the time domain. The duration and sequence of events in each sound/syllable was noted down using the time scale of the spectrogram. Unlike other sounds, which can be described largely in terms of steady-state spectra, stops are transient phonemes and thus are acoustically complex. Hindi CV syllables of the type stop plus vowel consist of at least four phonetic segments, viz. the closure or voicebar, burst, voice onset time (VOT), and a voiced interval.
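The LPC part of this kind of analysis can be sketched as follows. This is a generic autocorrelation-method LPC with candidate formants read from the pole angles, not the Speech Station implementation, and the test frame is synthetic rather than recorded speech.

```python
import numpy as np

# Sketch of short-time LPC analysis: fit a 12th-order all-pole model
# (autocorrelation method) to one windowed frame and read candidate formants
# from the angles of the roots of the prediction-error filter. The 12 kHz
# rate follows the paper; everything else here is an illustrative assumption.

def lpc_formants(frame, order=12, fs=12000):
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, "full")[len(w) - 1:len(w) + order]   # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])             # predictor coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))      # error filter A(z)
    roots = roots[np.imag(roots) > 0]                  # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[freqs > 90]                           # discard near-DC roots

# Synthetic 25 ms frame with spectral peaks near 700 Hz and 1800 Hz:
t = np.arange(300) / 12000.0
frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)
         + 0.01 * np.random.default_rng(0).standard_normal(300))
print(np.round(lpc_formants(frame)))
```

Two of the returned pole frequencies should lie close to the 700 Hz and 1800 Hz components of the test frame.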

Burst Frequencies and Amplitudes

Burst frequencies depend on the place of articulation of the stop as well as the vowel context. The spectral differences among the stop consonants across the places of articulation are primarily reflected in the spectra of the burst and the formant transitions to the target vowel. The burst frequencies do show slight variation in different vowel contexts. It has been observed that, in the case of retroflex sounds, the first two formants contain most of the energy as compared to the third formant. Retroflex sounds have a very heavy burst release as compared to the corresponding dental sounds. There is also a general lowering of the third and fourth formants.

SYNTHESIS PROCEDURE

A PC based cascade/parallel formant synthesizer based on the Klatt model [2] was used for the synthesis of the retroflex sounds. Based on the analyzed data, a preliminary parametric file was created and synthesis was done to obtain a starting synthetic file. Then the spectrograms of natural and synthetic syllables were compared. A number of source and tract parameters were adjusted iteratively in order to achieve a close imitation of the natural syllables. These include source parameters such as the amplitudes of voicing, frication and aspiration (AV, AF, and AH respectively), Open Quotient (OQ) and Spectral Tilt (TL), and vocal tract parameters like the six formant frequencies, amplitudes, and their bandwidths.

Burst Frequencies: The same formant frequencies of the burst (and transitions) can be used for consonants belonging to the same place of articulation. For example, the same values of the first four formants can be used for all four consonants /t., t.h, d., d.h/ in the retroflex group for a given vowel context (see Table 1).

TABLE 1. Parameters used for generating Burst of Retroflex in /a/ context

Voice Bar: In the case of voiced stops, it has been observed that the center frequency of the voicebar varies between 200-300 Hz, and the amplitude of the first resonance is high while all higher resonances are strongly damped. This type of spectral shape is obtained by using a spectral tilt factor (TL) of 15 and an OQ of 80. AV is set to 45 since the amplitude of the voice bar is low. Sometimes it is necessary to provide additional damping by increasing the bandwidths of the higher formants. In the case of the voiced aspirated stops /t.h/ and /d.h/, a break in the voicebar prior to the aspiration is observed. The duration of the voice bar is around 40 to 60 ms, depending on the vowel context.

VOT: For the unvoiced/voiced unaspirated stops /t., d./, there is practically no VOT; voicing starts immediately after the release of the burst. For the aspirated stops /t.h, d.h/, the VOT is about 50 ms.

Aspiration: In the case of aspirated sounds, it has been observed that the bandwidths of the formants are narrower (i.e. the formant peaks are better defined) for voiced aspirated sounds as compared to unvoiced aspirated sounds. The formant transitions from the burst frequency to the target vowel frequency are part of the aspiration segment. The aspiration duration is about 40 to 60 ms, while the value used for the parameter AH is 55.

The spectrograms of a sample original and synthetic syllable /n~i/ are shown in Figure 1.

FIGURE 1. Spectrograms of original and synthetic syllables /n~i/

CONCLUSIONS

The Hindi retroflex sounds have been successfully synthesized using the cascade/parallel formant synthesizer. FTP and FTZ, i.e., the frequencies of the tracheal pole and zero, play a very important role in improving the nasality in the vowel portion. These synthesis parameters are also used in the development of an unlimited vocabulary Text to Speech synthesis system for Hindi.
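As a rough illustration of the iterative parametric-file approach used throughout this paper, a frame track with the source values quoted above might be assembled as below. The dict-based format, 5 ms frame step and the bandwidth value B1 are illustrative assumptions, not the actual synthesizer file format.

```python
# Illustrative sketch of building a frame-by-frame parameter track for
# Klatt-style synthesis. Parameter names (AV, AH, OQ, TL, F1, ...) follow
# the paper; the data layout and frame step are assumptions.

def make_track(n_frames, **params):
    """One dict of synthesizer parameters per frame, values held constant."""
    return [dict(params) for _ in range(n_frames)]

# 50 ms voice bar for a voiced retroflex stop, using values quoted above
# (AV = 45, OQ = 80, TL = 15, F1 near 250 Hz), assuming 5 ms frames:
voicebar = make_track(10, AV=45, OQ=80, TL=15, F1=250, B1=60)
# About 50 ms of aspiration with AH = 55 for an aspirated stop:
aspiration = make_track(10, AV=0, AH=55, F2=1800, F3=2400)
track = voicebar + aspiration
print(len(track), track[0]["AV"], track[-1]["AH"])   # 20 45 55
```

In practice each such track would be re-synthesized, compared spectrographically against the natural syllable, and adjusted, as the synthesis procedure describes.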

ACKNOWLEDGEMENTS

The authors are grateful to Dr S Ahmad, Director, CEERI, Pilani for his encouragement and useful discussions. They are also thankful to MIT, Govt. of India, for financial support.

REFERENCES

[1] Agrawal S. S., and Stevens K., Proc. ICSLP-92, 177-180, (1992).
[2] Klatt D. H., J. Acoust. Soc. Am., 67(3), 971-995, (1980).

A Novel MFCCs Normalization Technique for Robust Hindi Speech Recognition

Amita Dev, S.S Agrawal

Central Electronics Engineering Research Institute, CSIR Complex, NPL Campus, Dr K S Krishnan Marg, New Delhi 110012, India
E-mail: [email protected]

ABSTRACT

The requirement for robustness in speech recognition systems is becoming increasingly important as speech technology is applied to real world applications. Dramatic degradation in system performance can occur as a result of differences between the training and testing conditions. As noise is introduced into the signal, the statistical parameters of the MFCCs vary considerably. For instance, as the level of the noise increases, the mean shifts, the variance reduces and the distribution tends to be non-Gaussian. As a consequence, the recognition score reduces considerably. In this paper we propose a normalization technique, Frame Cepstral Mean Normalization (FCMN), which normalizes the output of the front-end to have equal frame parameter statistics in all noisy conditions, thereby reducing the mismatch between training and testing conditions. The viability of the proposed normalization technique was verified in various experiments. Even under normal conditions the proposed technique clearly outperforms MFCCs and Delta-MFCCs in terms of recognition accuracy. In a multi-environment speaker independent word recognition task, the proposed normalization technique reduced the error rate by over 43% in the noisy condition with respect to the baseline tests (MFCCs), and for the microphone mismatch and noisy case, over 34% error rate reduction was achieved.

INTRODUCTION

Speech recognition systems work reasonably well in laboratory conditions, but their performance deteriorates drastically when they are deployed in practical situations where there is a microphone mismatch or the speech is corrupted by office noise [1]. For instance, in one of our experiments, an isolated word recognizer based on a classical VQ model using MFCC feature vectors could recognize 500 isolated words perfectly when they were spoken in a laboratory environment. The recognition score reduced to 68% when testing was carried out in a noisy environment at 14 dB SNR. We observed that the recognition score further decreased to 56.0% when the speech was recorded under noisy and microphone mismatch conditions at approx. 9 dB SNR.

The prime objective of this paper is to present the different front-end techniques as applied to word recognition using a VQ based speech recognizer, and to propose a novel MFCC normalization technique for robust Hindi speech recognition.

A need for the proposed normalization technique was strongly felt while working on a speaker independent Hindi word recognition system where the VQ model training was done in a noise free environment using the training utterances. Despite the clean training data, the recognizer yielded poor performance when we relied on an LPC based cepstral coefficient front-end. It was found that these linear prediction cepstral coefficients (LPCCs) were very sensitive to additive noise and channel mismatch distortion, which is very common in practice. In order to cope with the mismatch between training and testing conditions we tried state-of-the-art Mel Frequency Cepstral Coefficient (MFCC) and Delta-MFCC front-ends with certain speech enhancement techniques including preemphasis and cepstral lifters, but we could not significantly improve the performance of the recognizer. This may be attributed to the fact that as noise is introduced into the signal, the statistical parameters of the MFCCs vary considerably with noise [2].
We propose a normalization technique, FCMN, which makes the speech recognition system more robust to environmental changes by normalizing the output of the signal processing front end to have similar frame parameter statistics in different noise conditions.

ROBUST FEATURE SELECTION

The speech data signal was low pass filtered and digitized at a 16 kHz sampling rate with 16-bit quantization. A 16 ms Hamming window shifted in 8 ms steps and a preemphasis factor of 0.97 was used to calculate 24 filter bank energy values.
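The normalization idea stated above can be sketched in Python as follows, assuming the mean and standard deviation are taken over a short sliding window of frames; the window length and the synthetic input here are assumptions for illustration.

```python
import numpy as np

# Sketch of frame-wise cepstral normalization (FCMN idea): each cepstral
# vector is shifted and scaled using mean/variance statistics from a short
# sliding window, so frame statistics stay similar across noise conditions.
# The window length and the synthetic input are illustrative assumptions.

def fcmn(cepstra, win=30):
    """cepstra: (n_frames, n_coeffs). Normalize each frame with sliding stats."""
    out = np.empty_like(cepstra, dtype=float)
    for t in range(len(cepstra)):
        lo = max(0, t - win + 1)
        mu = cepstra[lo:t + 1].mean(axis=0)        # sliding mean (high-pass effect)
        sd = cepstra[lo:t + 1].std(axis=0) + 1e-8  # sliding sd (gain control)
        out[t] = (cepstra[t] - mu) / sd
    return out

c = np.random.default_rng(1).normal(5.0, 2.0, size=(200, 13))
n = fcmn(c)
print(abs(n[100:].mean()) < 0.2, 0.5 < n[100:].std() < 1.5)   # True True
```

As the next section notes, the mean removal acts like a linear high-pass filter and the division by the standard deviation like an automatic gain control.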

Frame Cepstral Mean Normalization

Hence in order to make the system more robust to The suitability of the proposed technique was tested above said distortions we implemented a normalization in various experiments with different microphones and technique [FCMN] by which cepstral coefficients were noise condition and the results for the same have been normalized to have zero mean and unit variance within summarized in Table 1. a given frame.[3] The normalization coefficients were calculated over a relatively short sliding window as Table 1. On the Comparison of Recognition Score using follows. different Front-ends ^ ct-T (j) = (ct-T (j) – µt(j)) / σ t(j) where - ct-T(j) is the jth component of the original feature vector at time t-T ^ - ct-T (j) is the normalized feature vector. - µt(j), σ t(j) is the mean and standard deviation for each feature vector component j. Here the mean removal can be regarded as the linear Firstly the recognizer was tested under normal High Pass Filter and division by standard deviation act conditions and the recognition score achieved was as an Automatic Gain Control. 93.6% with MFCCs, 97.0% with Delta-MFCCs and 100% with FCMN. Subsequently recognizers was DATABASE tested for Microphone Mismatch and it was seen that there was a degradation in the recognition rate as The database for most frequently occurring 500 compare to normal condition. But among all FCMN Hindi words spoken by 50 speakers was used for technique showed better results. As expected there was training purpose. The spoken samples were recorded in further degradation in the recognition rate when testing a studio environment condition using Sennheiser was done for Microphone mismatch and Noisy microphone model MD421 and tape recorder model condition with a low SNR value 9 dB. But proposed Philips sAF6121. The spoken words were repeated technique reduced the error rate by 34.5% with twice by each speaker. In order to study the effect of respect to baseline test . 
channel mismatch and noise, another set of recording was done by 10 speakers (7 males and 3 females) ACKNOWLEDGEMENTS using 2 different microphones i.e Lapple model D109 and Shure model SM 48. This data was used as testing Authors are grateful to Dr. S Ahmad, Director data to evaluate the recognition rates by applying CEERI, Pilani for encouragement and to AICTE for different parameterization techniques.Finally proposed financial support for carrying out this research. normalization technique was applied for extraction of robust feature vectors as shown in Figure 1. REFERENCES
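A minimal sketch of the sliding-window cepstral mean and variance normalization described above, in NumPy. The window length, function name and array layout are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def fcmn(cepstra, half_window=50):
    """Sliding-window cepstral mean/variance normalization (FCMN sketch).

    cepstra: (T, D) array, one D-dimensional cepstral vector per frame.
    half_window: frames on each side of the current frame used to
        estimate the local mean and standard deviation (illustrative value).
    Returns an array of the same shape, normalized toward zero mean and
    unit variance per coefficient within each local window.
    """
    T, D = cepstra.shape
    normalized = np.empty_like(cepstra, dtype=float)
    for t in range(T):
        lo = max(0, t - half_window)
        hi = min(T, t + half_window + 1)
        window = cepstra[lo:hi]
        mu = window.mean(axis=0)      # local mean mu_t(j), one value per coefficient j
        sigma = window.std(axis=0)    # local standard deviation sigma_t(j)
        sigma[sigma == 0.0] = 1.0     # guard against division by zero
        normalized[t] = (cepstra[t] - mu) / sigma
    return normalized
```

The mean subtraction implements the high-pass-filter behaviour noted above, and the division by the local standard deviation acts as the automatic gain control.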

REFERENCES

1. Acero, A., and Stern, R.M., "Environmental Robustness in Automatic Speech Recognition," Proc. IEEE ICASSP, 1990, pp. 849-852.

2. Gong, Y., "Speech Recognition in Noisy Environments: A Survey," Speech Communication, Vol. 16, No. 3, April 1995, pp. 261-291.

3. Dev, A., Sarma, A. S. S., and Agrawal, S. S., "Recognition of Hindi Phonemes using Time Delay Neural Networks and its comparison with other languages," Workshop on Multi-Lingual Speech Communication (MSC-2000), Kyoto, Japan, 2000, pp. 54-58.

The speech data signal was low-pass filtered and digitized at a 16 kHz sampling rate, 16 bit.

FIGURE 1. Normalized Mel Frequency Cepstral Coefficients

Silence as a Cue to Stop Manner in Meaningful /sVC/ Isolated (Hindi) Word Context

RK Upadhyaya a, SHS Rizvi b, SS Agarwal c and A Ahmad b

a Deptt. of Physics, Govt. P. G. College, Rishikesh, Distt. Dehradun, INDIA
b Deptt. of Physics, Aligarh Muslim University, Aligarh
c Central Electronics Engineering Research Institute, New Delhi

A perception study of silence as a cue in isolated meaningful /sVC/ sounds has been conducted through a manipulation experiment on eight sounds with five abutted vowels. For each sound the initial fricative noise /s/ was excerpted and separated from the remaining vocalic portion (RVP); 10 silence durations of 50, 80, 100, 120, 140, 160, 180, 200, 230 and 250 ms were then inserted between the two segments to conduct an extensive perception test on 50 bilingual subjects. It was found that for silence intervals of up to 80 ms listeners report the sounds as the original sounds, but from 100 ms to 160 ms they perceive the sound as /spVC/. On further increase of the silent interval, the sound /spVC/ separated into a hissing noise corresponding to /s/, followed by silence and /VC/ in distinct succession, thus establishing that silence is a sufficient condition for stop manner.

The importance of a silent interval between abutting sound segments as a cue in speech perception is well established [1,2,3,4]. To investigate similar effects in Hindi, 8 meaningful /sVC/ words, /sat/, /sat/, /sik/, /sid/, /sit/, /sIn/, /sen/ and /sun/, were recorded, and audio waveforms and digital spectrograms were obtained. The noise portion of /s/ and the RVP were separated, silent intervals of 10 ms, 20 ms, 30 ms, 40 ms, 50 ms, 60 ms, 70 ms, 80 ms, 90 ms and 100 ms were introduced between the separated /s/ noise and the RVP, and the 'manipulated' tokens were subjected to a perception study conducted by four speech researchers. They found that until a silent interval of about 80 ms was introduced the original sound was heard, and it was then and thereafter that the responses were /spVC/. The initial spectrum of the RVP was then gradually removed until /tat/ was heard as /at/, etc.; the duration removed varied between 20 ms and 25 ms. A new file of these 8 manipulated tokens was created, and silent intervals of 50, 80, 100, 120, 140, 160, 180, 200, 230 and 250 ms were introduced to obtain 80 tokens, which were randomized three times and subjected to perception by 38 male and 12 female young listeners.

Listeners' responses [Table 1] show that the stop /p/ almost invariably springs up as the silent interval is increased from 80 ms to 100 ms. The /p/ responses fall to the 50% level at about 180 ms, beyond which the 'separation effect' dominates. The maximum separation response is 129 [86%] at 250 ms, the average separation response at 250 ms being 74%. Further, there is little effect of the abutted vowel on the stop consonant manner.

We find that our results are consistent with the findings of Dorman et al. [2] and Port [5] in that, even without any strong stop manner cues in the surrounding signal portions, /p/ responses are obtained in a certain range of closure durations. However, our boundary of /p/ responses spans from 80 ms to 160 ms, after which the sounds begin to separate into the unnatural-sounding /s-noise + silence + VC/. Bastian et al. [1] found that the syllable /slIt/ is heard as /splIt/ when a short interval of silence (~40 ms) is introduced, but in our study such short silence closure durations were invariably perceived as the original sound. It is at 80 ms and beyond that a very sharp increase in the /p/ responses is obtained, and this boundary spans from 80 ms to 160 ms, after which the response is /s-noise + silence + VC/.

The investigation shows that neither a very brief nor a very long closure is appropriate for stop manner, although it establishes that silence is important for the perception of stops in prevocalic position.
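The token-construction step described above (excerpted /s/ noise, inserted silent closure, then the RVP) can be sketched as follows; the 16 kHz sampling rate, function name and array representation are assumptions for illustration, not taken from the paper:

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; assumed for illustration

def make_token(s_noise, rvp, silence_ms):
    """Build one manipulated token: /s/ noise + silent closure + RVP.

    s_noise: samples of the excerpted /s/ frication noise.
    rvp: samples of the remaining vocalic portion.
    silence_ms: closure duration to insert, in milliseconds.
    """
    gap = np.zeros(int(SAMPLE_RATE * silence_ms / 1000), dtype=s_noise.dtype)
    return np.concatenate([s_noise, gap, rvp])

# The ten closure durations used in the final perception test
DURATIONS_MS = [50, 80, 100, 120, 140, 160, 180, 200, 230, 250]
# tokens = [make_token(s_noise, rvp, d) for d in DURATIONS_MS]
```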

Table 1. Responses for /sVC/ stimuli as a function of varying interval of silence (Max. No. of Responses = 150)

                       Silence Duration (ms)
Stimulus  Response    50   80  100  120  140  160  180  200  230  250
/sat/     /sat/      149  146    0    0    0    0    0    0    0    0
          /spat/       1    4  150  150  150  100   76   42   22   21
          /s+at/       0    0    0    0    0   50   74  108  128  129
/sat/     /sat/      150  149    0    0    0    0    0    0    0    0
          /spat/       0    1  150  150  148   77   52   38   20   17
          /skat/       0    0    0    0    0    0    0    0    0    4
          /stat/       0    0    0    0    0    0    0    0    0    1
          /s+at/       0    0    0    0    2   73   98  112  130  128
/sIn/     /sIn/      150  147    0    0    0    0    0    0    0    0
          /spIn/       0    3  149  139  126   80   41   37   28   25
          /skIn/       0    0    0    5    3    2    3    2    2    0
          /stIn/       0    0    0    0    7    2    8    3    2    5
          /s+In/       0    0    1    6   14   66   98  108  118  120
/sik/     /sik/      150  142    0    0    0    0    0    0    0    0
          /spik/       0    8  150  150  150  100   86   64   54   46
          /s+ik/       0    0    0    0    0   50   65   86   96  104
/sid/     /sid/      150  145    0    0    0    0    0    0    0    0
          /spid/       0    5  150  150  144  115   87   64   60   52
          /s+id/       0    0    0    0    6   35   63   86   90   98
/sit∫/    /sit∫/     150  145    0    0    0    0    0    0    0    0
          /spit∫/      0    5  149  150  147   95   70   69   48   47
          /skit∫/      0    0    0    0    0    0    0    1    0    0
          /stit∫/      0    0    0    0    0    0    0    3    0    1
          /s+it∫/      0    0    1    0    3   55   80   77  102  102
/sen/     /sen/      146  144    0    0    0    0    0    0    0    0
          /spen/       4    6  149  147  133   82   65   40   37   35
          /sten/       0    0    0    0    5    5    5    0    6   16
          /sten/       0    0    0    0    0    0    0    9    0    0
          /sken/       0    0    0    0    0    0    0    0    1    0
          /s+en/       0    0    1    3   12   63   80  101  106   99
/sun/     /sun/      150  148    0    0    0    0    0    0    0    0
          /spun/       0    2  143  146  148  115  107   94   72   46
          /skun/       0    0    4    4    0    5    0    1    0    0
          /stun/       0    0    0    0    1    1    0    0    2    0
          /s+un/       0    0    3    0    1   29   43   55   76  104
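The 50% boundary quoted in the text can be read off Table 1 by converting response counts to percentages; a sketch using the /spat/ row of the first /sat/ stimulus (all counts from the table, out of a maximum of 150 responses):

```python
# /spat/ response counts for the first /sat/ stimulus in Table 1
durations_ms  = [50, 80, 100, 120, 140, 160, 180, 200, 230, 250]
spat_counts   = [1, 4, 150, 150, 150, 100, 76, 42, 22, 21]
MAX_RESPONSES = 150

percent_p = [100.0 * c / MAX_RESPONSES for c in spat_counts]

# First duration, after /p/ responses have emerged (>= 100 ms), at which
# they have dropped below the 50% level; for this stimulus the crossing
# lies between 180 ms (50.7%) and 200 ms (28.0%).
first_below_50 = next(
    d for d, p in zip(durations_ms, percent_p) if d >= 100 and p < 50.0
)
```

Averaged over all eight stimuli, this crossing falls near 180 ms, matching the text's statement that the /p/ responses fall to the 50% level at about 180 ms.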

REFERENCES

1. Bastian, J., Eimas, P., and Liberman, A., J. Acoust. Soc. Am. 33, 842a (1961).
2. Dorman, M.F., Raphael, L.J., and Liberman, A.M., J. Acoust. Soc. Am. 65, 1518-1532 (1979).
3. Raphael, L.J., and Dorman, M.F., J. of Phonetics 8, 269-275 (1980).
4. Repp, B.H., Haskins Laboratories, Status Report on Speech Research, SR-77/78, 137-145 (1984).
5. Port, R.F., J. of Phonetics 7, 45-56 (1979).

Towards a Robust Speech Intelligibility Test in Japanese

Kazuhiro Kondo, Ryo Izumi and Kiyoshi Nakagawa

Department of Electrical Engineering, Faculty of Engineering, Yamagata University
4-3-16 Jonan, Yonezawa, Yamagata 992-8510, JAPAN
{kkondo, nakagawa}@eie.yz.yamagata-u.ac.jp

We proposed and initially tested a Japanese version of the well-known Diagnostic Rhyme Test (DRT). We analyzed the Japanese phone inventory and classified it into the six-feature taxonomy used in the DRT. Accordingly, we proposed a Japanese word-pair list with the same rhyming structure, in which the first phones of a pair differ by a single feature. We then collected speech and tested intelligibility with this list with white noise, multi-speaker noise, and pseudo-speech noise added at various levels. The results showed basically the same trend as English: sustention scores were lower with white noise, while graveness showed lower scores for all types of noise. We also tested the effect of word familiarity on DRT-type selection-based intelligibility tests. Word-pair and 4-word lists with the same rhyming structure were constructed; half of the words were in the high-familiarity group, while the other half were in the low-familiarity group. We compared intelligibility scores by group; tests were given with 4-word selection, 2-word selection, and conventional free choice (write-in). The 2-word selection test scores showed significantly less effect of familiarity than the two remaining tests. We also confirmed that word-pair tests showed a much smaller effect of training than the conventional free-choice tests.

INTRODUCTION

Traditionally, Japanese intelligibility tests often used stimuli of randomly selected single-mora, two-morae or three-morae speech. The subjects were free to choose from any combination of valid Japanese syllables. This quickly becomes a strenuous task as the channel distortion increases. Thus, intelligibility tests of this kind are known to be unstable and often do not reflect the physically evident distortion, giving surprising results [1].

English intelligibility tests are also reported to show similar trends. Accordingly, the Diagnostic Rhyme Test (DRT) [2], a closed-set selection test which restricted the reply to two words, was proposed. This test is said to be effective in controlling various factors including the amount of training and phonetic context, and is known to give stable intelligibility scores.

In this paper, we attempt to propose a DRT-type closed-set selection test in Japanese. We initially try to categorize Japanese consonants into the same taxonomy used for the English tests, and accordingly propose a minimum-pair list whose members differ only in the initial consonant and by a single phonetic feature. Initial test results are also shown for various noise types under various SNRs.

It has been known that word familiarity affects word intelligibility tests, in that the subjects tend to be biased towards the more familiar words [3]. However, a closed selection test with only a few choices is expected to minimize this effect. We conducted experiments to quantify this observation as well.

A DIAGNOSTIC RHYME TEST FOR JAPANESE

We first proposed a consonant taxonomy for Japanese with the same feature classification used in English, which was drawn from the classification by Jakobson, Fant and Halle [4]. The consonant taxonomy was then used to compile a word-pair list to be used as stimuli for the DRT. As for English, 16 word pairs per each of the 6 features were proposed, for a total of 192 words. The word pairs are rhyme words, differing only in the initial phoneme. The following is specific to the Japanese list:

• Only two-morae words were initially considered. Longer words will be considered as needed.
• Only words with the same accent type were selected as a word pair.
• We tried to select mostly common nouns. Proper nouns, slang words and obscure words were avoided.

We collected speech from 4 speakers, two per gender. White noise, babble and pseudo-speech noise were mixed into these samples at SNRs of -15, -10, 0 and 10 dB respectively. Speech for the words in the word-pair list was played out in random order. The listeners were given the word-pair list to choose from.

Figure 1 shows the DRT scores per feature under various SNRs for white noise only. These results showed basically the same trend as the English results shown by Voiers [2]; nasality and voicing were fairly immune to noise, while graveness was affected significantly.

Figure 1. Japanese DRT Scores (White Noise)

THE EFFECT OF FAMILIARITY ON RHYME TESTS

One of the major factors known to affect word intelligibility is word familiarity. However, a closed-selection test may not be affected if the choice is limited. We conducted tests to verify this observation. Word familiarity is the average subjective measure of familiarity one feels towards a word. Amano and Kondo have compiled a large database with familiarity ratings for 80,000 Japanese words in the Shinmeikai Dictionary [5] on a 7-point scale. We based all familiarity ratings on their data.

We compiled a word-pair list and a 4-word group list according to familiarity. Two familiarity classes were defined: the low-familiarity class, with familiarity below 4.0, and the high-familiarity class, with familiarity above 6.0. Words with familiarity between 4.0 and 6.0 were left out intentionally, to clearly distinguish the high-familiarity words from the low-familiarity words. The two words in the same pair in the word-pair list, as well as all four words in the same group within the 4-word group list, were rhyme words. For the word-pair list, one word in a pair was in the low-familiarity class, while the other word was in the high class. Likewise, two words in a 4-word group were in the high class, while the remaining two were in the low class. The word-pair list contained 31 pairs of 2-morae words. The 4-word group list contained 7 groups of 2-morae words and 2 groups of 3-morae words.

We collected speech from 2 speakers, one male and one female. White noise was mixed into these samples at SNRs of -10, 0 and 10 dB respectively. Three testing sessions were conducted:

1. 2-word RT: Speech for the words in the word-pair list was played out in random order. The listeners were given the word-pair list to choose from.
2. 4-word RT: Speech for the words in the 4-word group list was played out in random order. The listeners were given the 4 words in the set to choose from.
3. Conventional Intelligibility: Speech for the words in the word-pair list was played out in random order. However, the listeners were asked to write freely what they heard.

The intelligibility scores vs. signal-to-noise ratio for samples with added noise are shown in Figure 2. The difference in the intelligibility scores between the low-familiarity class and the high-familiarity class in the 2-word RT was generally significantly smaller than in the 4-word RT or the conventional intelligibility test. This suggests that the effect of familiarity on intelligibility tests is minimized when the number of choices given is two, but increases with a larger number of choices.

Figure 2. Intelligibility by Test Modes

REFERENCES

1. Nishimura, R., Asano, F., Suzuki, Y., and Sone, T., IEICE Trans. Fundamentals 79-A, 1986-1993 (1996) (in Japanese).
2. Voiers, W., "Diagnostic Evaluation of Speech Intelligibility," in Speech Intelligibility and Speaker Recognition, edited by M. Hawley, Dowden, Hutchinson & Ross, Stroudsburg, PA, 1977, 374-387.
3. Sakamoto, S., Suzuki, Y., Amano, S., Ozawa, K., Kondo, T., and Sone, T., J. Acoust. Soc. Japan 54, 842-849 (1998) (in Japanese).
4. Jakobson, R., Fant, C., and Halle, M., Tech. Rep. 13, Acoustics Laboratory, MIT (1952).
5. Amano, S., and Kondo, T., Lexical Properties of Japanese, Sanseido, Tokyo, 1999.
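A note on scoring: two-alternative rhyme tests such as the DRT are conventionally scored with a correction for guessing, P = 100·(R − W)/T. The paper does not state its scoring formula, so the following is a sketch under that assumed convention:

```python
def drt_score(right, wrong, total):
    """Chance-corrected intelligibility score for a two-alternative
    rhyme test: 100 * (right - wrong) / total. A listener guessing at
    random scores near 0%; a perfect listener scores 100%."""
    return 100.0 * (right - wrong) / total

# Example: out of 160 presentations of one feature, 140 correct, 20 wrong
print(drt_score(140, 20, 160))  # 75.0
```

The correction matters precisely because the response set is closed: raw percent-correct in a two-choice test cannot fall below about 50% under random guessing, whereas the corrected score spans the full 0-100% range.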