
AN INTRODUCTION TO MONGOLIAN PHONETIC-LINGUISTIC MODEL:

ON THE ACOUSTIC PROPERTIES OF MONGOLIAN LANGUAGE

Idomuso Dawa, Shigeki Okawa and Katsuhiko Shirai

Dept. of Information and Computer Science, Waseda University, Japan
Dept. of Network Science, Chiba Institute of Technology, Japan

ABSTRACT

This paper proposes a multi-lingual phonetic-linguistic model for the Mongolian language, based on a study of dialect classification and of the postulates of such models. The model is applicable to a variety of fields, such as linguistic, phonetic, dialectal and language-education studies, as well as speech recognition. Applied to the major dialects of Mongolian, it has reached 91 % accuracy in word-level recognition. In this paper, we first introduce the structure of the language and the classification of Mongolian dialectal speech. Next, we analyze and classify its fundamental characteristics in order to define phoneme labels. Lastly, we apply the common models to a simple word recognition experiment.

1 INTRODUCTION

Mongolian, one of the most widely spoken languages of Asia, is used mainly by the Mongolian people and is spoken by about 7.5 million people. Because of its historical and geographical background, the Mongolian language has several classes of dialects. Although the language structure of each dialect is roughly equivalent, the spoken dialects are usually not mutually intelligible, and there is a large discrepancy between articulation and orthography. In today's networked computing environments, intelligent models must be established in order to develop digital facilities for the Mongolian languages.

1.1 Structure of Mongolian Languages

For the reasons above, the Mongolian language has several dialectal variations in its phonetic and graphic systems. Currently, there are three different graphic systems. (a) Mongolian script (also known as initial Mongolian or Hudom script), which originated from the Uygur script, was used to write Mongolian as early as the era of Genghis Khan in the thirteenth century. (b) Todo script was created from script (a) by the Lama Zaya Pandita in 1648 so as to correspond to West Mongolian pronunciation. (c) Cyrillic script, created from Russian orthographic characters in the 1940s, has been widely used in Mongolia and in the Kalmyk A. R.

In script (a), different sounds are sometimes written with the same character, and the same sound is sometimes written with different characters; in many cases, the written form of a word is quite different from its present-day pronunciation. By contrast, (b) Todo script was designed on the rule that each character corresponds to exactly one pronunciation, and its letters carry special marks for long vowels and the like. Todo script thus avoids the ambiguity between writing and speaking found in initial Mongolian; "Todo" means "clear" in Mongolian.

Figure 1: Word samples written in scripts (a) and (b), each shown in written and spoken form, with the writing direction and line feed indicated.

Figure 2: Word sample written in script (c), in written and spoken form (spoken: g a r ja).

Scripts (a) and (b) are written vertically from top to bottom, and lines advance from left to right. Each phonetic character changes its shape according to its position within a word (initial, medial and final forms), and there are

page 1657 ICPhS99 San Francisco


complicated writing conventions for grammatical endings and a large number of "historical" orthographic conventions, particularly in the traditional script (a). Figs. 1 and 2 show word samples in scripts (a), (b) and (c).

1.2 Dialectal Groups

The spoken dialects of Mongolian are highly diverse and are categorized into several groups. Their common characteristics were compiled and proposed on a linguistic basis by Sh. Lobsan Bandan, the first Mongolian linguist, in his paper [4] (see Table 1). In this study, we define two major dialectal groups, (A) and (B), according to usage and region [1]. Table 2 gives the population and the script for the newly defined groups.

Table 1: Dialectal groups of Mongolian.

Speech group   Dialect name   Region(s) of use
Western        Torugut        …, CN
               Hoshuto        –
               Oirodu         –
               Chahar         –
               Kalmyik        Kalmyik A. R.
Middle         Halkha         Mongolia
               Chahar         …, CN
               Ordos          –
Northern       Buirat         Mongolia, CN

Table 2: Dialectal groups in our system.

Speech group   Dialect name   Rate (%)
(A)            Halha          87
               Chahar         83.7
               Ordos          –
(B)            Kalmyk         86
               Oirat          87.3

2 QUALITATIVE ANALYSIS OF THE PHONEMIC PECULIARITIES OF MONGOLIAN

In this section, we present some results of a qualitative analysis of Mongolian speech, in order to provide guidelines for building phonemic labels for this system. Unlike major languages such as English or Japanese, Mongolian phonetic characteristics have not yet been fully systematized, which is why we consider this kind of analysis important. In digital speech processing such as automatic speech recognition, explicit and compact phonetic symbols are generally desired; the IPA (International Phonetic Alphabet) and Worldbet are good examples of such symbols. For Mongolian speech in particular, the characteristics of each dialect should be taken into account so that the symbols can be used in common. However, for some phonetic symbols the acoustic differences are difficult to determine from linguistic knowledge alone, so it is desirable to classify them by their spectral and temporal differences. In this way, we have identified several peculiarities of Mongolian speech that are common to each dialect. We itemize them below, together with the definitions of our phoneme label symbols.

2.1 Vowels

Although Mongolian is usually said to have seven vowels, spoken Mongolian in practice has nine fundamental vowels owing to dialectal differences: /a/, / /, /ɛ/, /i/, /¨/, /o/, /u/, /§/ and / /.

1. /e/: The vowel /e/ appears quite frequently in both groups (A) and (B). Fig. 3 shows its LPC spectra in the two groups over the same time period. Because the two groups show different acoustic features, we define two label symbols: /ɛ/ in group (A) and /e/ in group (B). For example, the word "eeji" is pronounced "ɛɛji" in group (A) and "eeji" in group (B).

Figure 3: LPC spectrum for vowel /e/ (left panel: group (A); right panel: group (B); frequency axis in kHz).

2. Phoneme /ai/: There is a specific vowel that is usually written "ai" in speech groups (A) and (B). Fig. 4 shows the spectrogram of the word "ainai", which includes "ai". A distinct pattern is visible in group (A), whereas group (B) shows stationary segments with the clear formant structure of a typical vowel. We use the symbol /I/ for "ai" in written scripts (a) and (b); its pronunciation is similar to [æ] in the English word "cat". Since the final shape of a vowel /i/ in Todo


scripts is usually written as "i" and pronounced as [æ], we make a distinction between /i/ and /I/.

3. Allophonic vowels: In group (A), some characters correspond to several different pronunciations. Following the original dialectal scripts, we use some new symbols. For example, the symbols /o'/ and /u,/ follow the distinction made in the written form of group (B), whereas group (A) does not separate these sounds (see Fig. 5).

The typical formant distributions of the vowels, presented in our former study [5], are shown in Figs. 6 and 7. As Fig. 6 shows, the vowels /u/, /§/ and /¨/ are difficult to classify by the formant frequencies F1 and F2.

Figure 4: Spectrogram of the word "ainai", which includes the phoneme /ai/, in groups (A) and (B) (both panels segmented /a i n a/).

Figure 5: Examples of allophones corresponding to the same characters in speech group (A) (/o/, /u/, /o'/ and /u,/ in initial, medial and final positions).

Figure 6: F1-F2 distribution of the seven vowels in group (A) (horizontal axis F1, vertical axis F2).

Figure 7: F1-F2 distribution of the seven vowels in group (B) (horizontal axis F1, vertical axis F2).
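Fig. 3 compares LPC spectra of /e/ in the two groups. The paper does not describe how its LPC spectra were computed, so the following is only a minimal Python sketch of the standard autocorrelation method with the Levinson-Durbin recursion, followed by evaluation of the all-pole envelope. The function names are hypothetical, and any model order or sampling rate used with them is an assumption rather than a value from the paper.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) analysis frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err): a[k - 1] is the predictor coefficient for lag k, so that
    x[n] is approximated by sum_k a[k - 1] * x[n - k]; err is the residual power.
    """
    a, err = [], r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err  # i-th reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc_spectrum_db(a, err, freqs_hz, fs_hz):
    """LPC envelope 10*log10(err / |A(e^jw)|^2) with A(z) = 1 - sum_k a_k z^-k."""
    out = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs_hz
        A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        out.append(10.0 * math.log10(err / abs(A) ** 2))
    return out
```

For example, `levinson_durbin(autocorrelation(frame, 12), 12)` on a Hamming-windowed frame gives a 12th-order model whose envelope can be plotted with `lpc_spectrum_db`; the order 12 is an illustrative choice only. The autocorrelation method is used here because it guarantees a stable all-pole filter.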

2.2 Consonants

There are about 27 basic consonants in each dialect of Mongolian, and in several cases one phonetic symbol corresponds to different characters. Based on the following analyses, we define 56 phonetic label symbols.

Figure 8: Definition of consonants /g/, /h/ and /k/ in speech groups (A) and (B).

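The label definitions in Section 2 are dialect dependent: the same written form can map to different labels in groups (A) and (B). As a minimal, hypothetical sketch of how such group-dependent rules could be applied, the Python below converts a romanized word to labels by greedy longest match. The rule table contains only the two examples given in Section 2.1 (written "e" labeled /ɛ/ in group (A) and /e/ in group (B), and written "ai" labeled /I/); applying the "ai" rule to both groups is an assumption, and this is not the paper's 56-symbol label set.

```python
# Hypothetical, group-dependent rewrite rules built only from the examples in
# Section 2.1; this is NOT the paper's 56-symbol label set.
RULES = {
    # /I/ for written "ai" (stated for scripts (a) and (b); use in both groups is assumed)
    "A": [("ai", "I"), ("e", "ɛ")],  # group (A): written "e" realized as /ɛ/
    "B": [("ai", "I")],               # group (B): /e/ kept as written
}


def to_labels(word, group):
    """Convert a romanized word to phoneme labels by greedy longest match."""
    rules = sorted(RULES[group], key=lambda rule: len(rule[0]), reverse=True)
    labels, i = [], 0
    while i < len(word):
        for src, dst in rules:
            if word.startswith(src, i):
                labels.append(dst)
                i += len(src)
                break
        else:
            labels.append(word[i])  # no rule: the character is its own label
            i += 1
    return labels


print(to_labels("eeji", "A"))  # ['ɛ', 'ɛ', 'j', 'i']
print(to_labels("eeji", "B"))  # ['e', 'e', 'j', 'i']
```

Longest match is used so that multi-character written units such as "ai" take precedence over single-character rules.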

1. Glottal phonemes: The consonants /g/, /h/ and /k/ in scripts (a), (b) and (c) are classified and defined in our system as shown in Fig. 8.

2. Position in a word: Some consonants differ in acoustic characteristics, as well as in written shape, depending on their position within a word. Fig. 9 shows a spectrogram of the word /s:is:/, which contains the phoneme /s:/ twice. Since the IPA has no corresponding phonetic symbols, we adopt special symbols, such as capital letters, and some new symbols, taking into account the original dialectal scripts and the acoustic differences. Other symbols defined on the same principle include /N/, /B/, /G/, /M/, /L/, /S/, /D/, /R/, /S:/, /:h/, /s:/, /x̃/ and /N'/.

Figure 9: Spectrogram of the word "s:is:", which includes two /s:/'s (segments labeled s:, i and S:).

3 SPEECH RECOGNITION EXPERIMENTS

In this section, we present the results of a speech recognition experiment as an application of the established speech models (see Table 3). The task is simple word recognition based on HMMs (hidden Markov models). We use the ML-B300 word database [6]. The front end is LPC mel-cepstral analysis, in which the digitized waveform is analyzed with a 21.3 ms Hamming window shifted at 5 ms intervals. The acoustic feature vector contains 15 mel-cepstral coefficients, their Δ coefficients and a logarithmic energy term (31 dimensions in total). We employ diagonal-covariance Gaussian HMMs with 4 states, 3 loops and 4 mixture densities. For training, we use ML-B300 data uttered by (1) 7 males and (2) 12 males and females. For testing, we use data from other speakers: (1) 3 males and (2) 3 males and females.

Table 3: Result of word recognition.

Dialect group   Test speaker   Rec. accuracy (%)
                               (1)       (2)
(A)             nam            81.91     85.32
                bolm           83.62     83.62
                nrm            77.82     77.82
                Average        81.11     82.25
(B)             pvm            91.08     86.01
                Olm            85.32     85.98
                bsm            79.52     81.87
                Average        85.31     84.62

4 CONCLUSION

In this paper, we have introduced a first attempt at classifying and defining common models for Mongolian dialects, for wide use in linguistic, phonetic, dialectal and educational research. We have so far collected 10,024 seconds of speech and built several types of databases. Although the accuracy still falls short of our ultimate goal to a greater or lesser degree, we believe these results indicate the usefulness of the common speech models. Since the Mongolian languages are quite complicated, many problems remain to be solved in converting between them and processing them; in particular, the quantity and variety of data are still insufficient. We will continue to improve the speech models and the recognition accuracy.

REFERENCES

[1] T. Kamei, R. Kono and E. Chino, The Sanseido Encyclopedia of Linguistics, Volume 4, Languages of the World, Part Three, Sanseido, Japan, 1992, p. 501.
[2] Burintokusu, A Dictionary of Writing and Uttering Mongolian, Inner Mongolian Education Publishing Company, 1977.
[3] Hasuerudoni and Naranbatu, The Basis of Mongolian, Jiling Publishing Company, China, 1977.
[4] "Problems for classifying languages and dialects of Mongolian," Journal of Bei-Jing Univ. (Social Sciences), 2, 1959.
[5] I. Dawa, S. Okawa and K. Shirai, "Acoustic features analyses of Mongolian dialects vowels by computer," J. ACTA ACUSTICA, Vol. 24, No. 1, Jan. 1999 (China).
[6] I. Dawa, S. Okawa and K. Shirai, "Mongolian speech database considering dialectal characteristics," ASJ Fall Meeting, pp. 173-174, September 1998.
[7] Chojinjafu and Badoma, Dictionary of Mongolian Matching Mongolian and Todo Language, Xinjiang People's Publishing Company, China, 1979.
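The front end described in Section 3 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: the 12 kHz sampling rate is an assumption (chosen because it makes 21.3 ms equal to 256 samples), the ±2-frame regression for Δ is a common choice rather than a stated one, and the 31 dimensions are assumed to split into 15 cepstra, 15 Δ cepstra and one log-energy value. The LPC mel-cepstral analysis itself is not implemented; the 15 cepstra per frame are taken as given.

```python
import math

FS_HZ = 12_000               # assumed sampling rate; the paper does not state it
WIN = round(0.0213 * FS_HZ)  # 21.3 ms Hamming window -> 256 samples at 12 kHz
HOP = round(0.005 * FS_HZ)   # 5 ms frame shift -> 60 samples at 12 kHz
N_CEP = 15                   # mel-cepstral coefficients per frame (from the paper)


def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal, win=WIN, hop=HOP):
    """Cut the signal into Hamming-windowed frames; an incomplete tail is dropped."""
    if len(signal) < win:
        return []
    w = hamming(win)
    count = 1 + (len(signal) - win) // hop
    return [[signal[t * hop + i] * w[i] for i in range(win)] for t in range(count)]


def log_energy(frame, floor=1e-10):
    """Natural-log frame energy, floored to avoid log(0) on silent frames."""
    return math.log(max(sum(x * x for x in frame), floor))


def deltas(seq, k=2):
    """Regression deltas over +/-k frames, replicating the first/last frame at edges."""
    n, dim = len(seq), len(seq[0])
    denom = 2 * sum(j * j for j in range(1, k + 1))
    return [
        [sum(j * (seq[min(t + j, n - 1)][d] - seq[max(t - j, 0)][d]) for j in range(1, k + 1)) / denom
         for d in range(dim)]
        for t in range(n)
    ]


def feature_vectors(cepstra, energies):
    """Stack 15 cepstra + 15 delta cepstra + 1 log energy = 31 values per frame."""
    return [c + dc + [e] for c, dc, e in zip(cepstra, deltas(cepstra), energies)]
```

With these assumptions, one second of audio (12,000 samples) yields 196 frames, each of which becomes a 31-dimensional vector once its 15 cepstra are supplied.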

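Section 3 specifies the recognizer only as diagonal Gaussian HMMs with 4 states, 3 loops and 4 mixture densities. A minimal Python sketch of how such models can score an isolated word is given below. It assumes a left-to-right topology in which three emitting states carry self-loops and the fourth state is a non-emitting exit; this reading of "4 states, 3 loops" is our assumption, not confirmed by the paper. Training (e.g. Baum-Welch) is omitted, and the names `viterbi_score` and `recognize` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def log_gmm(x, weights, means, variances):
    """Log-likelihood of a Gaussian mixture (4 components in the paper), via log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def viterbi_score(obs, states, log_self, log_next):
    """Best-path log-likelihood through a left-to-right HMM.

    states: one (weights, means, variances) mixture per emitting state.
    log_self[s] / log_next[s]: log probabilities of staying in / leaving state s.
    The path starts in state 0 and must leave the last emitting state at the end.
    """
    n = len(states)
    delta = [-math.inf] * n
    delta[0] = log_gmm(obs[0], *states[0])
    for x in obs[1:]:
        new = [-math.inf] * n
        for s in range(n):
            stay = delta[s] + log_self[s]
            move = delta[s - 1] + log_next[s - 1] if s > 0 else -math.inf
            best = max(stay, move)
            if best > -math.inf:
                new[s] = best + log_gmm(x, *states[s])
        delta = new
    return delta[-1] + log_next[-1]  # exit into the non-emitting final state


def recognize(obs, models):
    """Isolated-word recognition: the word whose HMM gives the best Viterbi score."""
    return max(models, key=lambda word: viterbi_score(obs, *models[word]))
```

Here `models` maps each word to a `(states, log_self, log_next)` triple. Viterbi scoring is used for simplicity; summing over all paths with the forward algorithm is an equally common choice.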