A First Experience on Multilingual Acoustic Modeling of the Languages Spoken in Morocco
INTERSPEECH 2004 -- ICSLP, 8th International Conference on Spoken Language Processing
ICC Jeju, Jeju Island, Korea, October 4-8, 2004
ISCA Archive: http://www.isca-speech.org/archive
DOI: 10.21437/Interspeech.2004-308

José B. Mariño, A. Moreno, A. Nogueiras
TALP Research Center on “Technologies and Applications of Language and Speech”
Technical University of Catalonia, Spain
{canton,asuncion,albino}@talp.upc.es

Abstract

The goal of this paper is to explore and describe the potential of multilingual acoustic models for automatic speech recognition of the languages spoken in Morocco. The basic experimental framework comes from the OrienTel project, mainly the sound inventory of the Arabic languages and the speech databases. Monolingual and multilingual automatic speech recognition systems for Modern Colloquial and Standard Arabic (MCA and MSA, respectively) and French are developed and evaluated, in order to assess the phonetic exchange and similarity among the three languages. As a main result, it can be stated that a combined modeling of MSA and MCA, or even a trilingual design, does not harm the performance of the recognition system.

1. Introduction

The aim of the IST project “Multilingual Access to Interactive Communication Services for the Mediterranean and the Middle East” (OrienTel) is to enable the project's participants to design and develop multilingual interactive communication services for the Mediterranean and the Middle East, ranging from Morocco in the West to the Gulf States in the East, including Turkey, Israel and Cyprus. To achieve this aim, the consortium has been compiling a set of 23 linguistic databases and conducting research into ASR-related problems of the OrienTel region.

This paper aims to explore and describe the potential of multilingual acoustic models and lexica of the languages spoken in Morocco. Morocco belongs to the Maghreb area. Three languages are spoken in Morocco: Modern Colloquial Arabic (MCA), Modern Standard Arabic (MSA) and French. Since MSA and MCA are both spoken across the country, Morocco is a fully bilingual country. French is mainly used for commercial transactions.

MSA and MCA have important similarities while maintaining specific phonetic traits and lexica. For instance, even though they share the same phonetic inventory, pronunciation differs slightly between the two languages.

On the other hand, French shows a completely different phonetic inventory and comes from a very different language root, that is, Latin. In this work we shall try to take advantage of the fact that, for Moroccan people, French is a very commonly used third language, and their pronunciation is strongly influenced by Arabic phonemes.

Thus, an alternative to using a specific phonetic description for each of MSA, MCA and French can be devised. In this paper, and following previous work (for instance, see [1]), a common inventory of allophones for both languages is designed and evaluated against the monolingual counterparts.

The paper is organized as follows. The next three sections describe the experimental framework, including the speech databases used to train and test the system, the inventory of sounds and the main features of the recognition system used for this experimental work. Section five describes the experimental work carried out to validate and evaluate the multilingual system. The paper ends with a discussion section.

2. Speech databases

In the OrienTel project three databases [2] have been produced in Morocco: for MCA, MSA and French. Calls were recorded from fixed and mobile phones. The utterances were recorded through an ISDN access to the fixed public telephone network, sampled at 8 kHz and quantized with A-law at 8 bits per sample. These databases have been used for training and testing the ASR system.

2.1. MCA database

The Modern Colloquial Arabic (MCA) database contains utterances collected from 772 speakers: 600 of them supply the training material and the remaining 172 speakers make up the testing set.

As training material, the total number of utterances is 44 x 600 = 26400 (spellings and yes/no questions are not used), including more than 605850 phones. As test material we chose three different tasks, extensively described in [2]:
• Digit strings: prompt sheet number, telephone number, spontaneous telephone number, credit card (14-16 digits), PIN.
• Application words.
• Dates: relative and general expressions.

2.2. MSA database

The Modern Standard Arabic (MSA) database contains utterances collected from 530 speakers: 400 of them supply the training material and the remaining 130 speakers make up the testing set.

As training material, the total number of utterances is 46 x 400 = 18400 (spellings and yes/no questions are not used), including more than 548395 phones. As test material we chose three different tasks:
• Digit strings: prompt sheet number, strings of 4 digits.
• Application words.
• Dates: relative and general expressions.

2.3. French database

The French database is formed by utterances collected from 530 speakers: 400 of them supply the training material and the remaining 130 speakers make up the testing set.

As training material, the total number of utterances is 43 x 400 = 17200 (spellings and yes/no questions are not used), including more than 344245 phones. As test material we chose three different tasks:
• Digit strings: prompt sheet number, telephone number, spontaneous telephone number, credit card (14-16 digits), PIN.
• Application words.
• Dates: prompted date phrases, relative and general expressions.

After discarding the utterances with mispronounced or incomplete words, the final number of utterances for every language is given in Table 1. Furthermore, to be more specific, Table 2 shows, for each test set, the size of the vocabulary, number of sentences and number of words for the final set.

Language   Training      Test
           Utterances    Digits   A. words   Dates
MCA        26328         356      816        156
MSA        18322         911      698        114
French     17039         267      727        202

Table 1: Training and test material. Number of utterances.

Language   Digits          A. words        Dates
           Size   Words    Size   Words    Size   Words
MCA        10     2195     23     848      38     351
MSA        10     3999     25     786      32     247
French     10     1728     33     727      151    894

Table 2: Vocabulary size and number of words in the test set.

3. Inventory of sounds

The standard SAMPA [3] phoneme sets for French and Arabic were used for French, and for MCA and MSA as spoken in Morocco. Table 3 summarizes the MSA and MCA inventories of allophones as they were defined to design the OrienTel databases. The same table includes for every allophone the attributes considered further in the clustering algorithms. It can be observed that MSA and MCA share the same inventory, where rare sounds coming from foreign languages are included.

French shares only a small part of the phonemes, which are marked in bold in Table 3. French shows greater variability in the vowel set, and MCA and MSA show higher variability in the fricative set. The specific French sounds used in our experiments can be found in Table 4, where “indeterminacy symbol” means that the symbol replaces the corresponding symbols in the list in case of indeterminacy between them.

SAMPA     Definition

Vowels
a         open front unrounded vowel
i         close front unrounded vowel
u         close back rounded vowel
a:        long open front unrounded vowel
i:        long close front unrounded vowel
u:        long close back rounded vowel

Semivowels
j         voiced palatal approximant
w         voiced labial-velar approximant

Fricatives
?`(?\)    voiced pharyngeal fricative
D         voiced dental fricative
D`        voiced dental emphatic fricative
f         voiceless labiodental fricative
G         voiced velar fricative
h         voiceless glottal fricative
s         voiceless alveolar fricative
S         voiceless postalveolar fricative
s`        voiceless alveolar emphatic fricative
T         voiceless dental fricative
v         voiced labiodental fricative (MCA, MSA rare)
x         voiceless velar fricative
X\        voiceless pharyngeal fricative
z         voiced alveolar fricative
Z         voiced postalveolar fricative

Lateral
l         voiced dental/alveolar lateral approximant
l`        voiced dental/alveolar lateral approximant emphatic (MCA, MSA rare)

Trill
r         voiced dental or alveolar trill

Nasals
m         voiced bilabial nasal
n         voiced dental or alveolar nasal

Plosives
?         stød glottal stop
b         voiced bilabial plosive
d         voiced dental/alveolar plosive
d`        voiced dento-alveolar emphatic plosive
g         voiced velar plosive
k         voiceless velar plosive
p         voiceless bilabial plosive (MCA, MSA rare)
q         voiceless uvular plosive
t         voiceless dental/alveolar plosive
t`        voiceless dento-alveolar emphatic plosive

Table 3: Inventory of sounds for MCA and MSA.

SAMPA     Definition

Vowels
2         close-mid front rounded vowel
9         open-mid front rounded vowel
@         mid central unrounded vowel
A         open back unrounded vowel
e         close-mid front unrounded vowel
E         open-mid front unrounded vowel
o         close-mid back rounded vowel
O         open-mid back rounded vowel
Y         close front rounded vowel
&/        2, 9 (indeterminacy symbol)
A/        a, A (indeterminacy symbol)
E/        e, E (indeterminacy symbol)
O/        o, O (indeterminacy symbol)
U~/       e~, 9~ (indeterminacy symbol)
9~        open-mid front rounded nasal
a~        open front unrounded nasal
e~        close-mid front unrounded nasal

Some standard pronunciation issues in Maghreb dialects have not been taken into account when generating phonetic transcriptions because of their dependence on the speaker and their non-systematic nature:
a) Substitution of /T/ by /t/; and of /q/ by /g/ or /?/.
b) Assimilation of voiced dental fricatives and plosives (/D/, /D`/, /d/, /d`/), which usually merge into just /d/ and /d`/.
c) Relaxation of shedda (gemination) and emphasis.
d) Deletion of hamza (/?/).
Furthermore, the distribution of these peculiarities is dialect dependent, being markedly more pronounced in MCA.

The recognition search is sped up by using beam-search and phonetic look-ahead.

5. Evaluation

The following recognition systems were trained and evaluated:
a) Three monolingual systems, one for each language, with 750 models each.
b) Two bilingual systems for modeling MCA and MSA. Both systems use 900 models. The difference between them is the presence or absence of language-dedicated models.