Study on Unique Pharyngeal and Uvular Consonants in Foreign Accented Arabic
Total Page:16
File Type:pdf, Size:1020Kb
Study on Unique Pharyngeal and Uvular Consonants in Foreign Accented Arabic Yousef A. Alotaibi, Khondaker Abdullah-Al-Mamun, and Ghulam Muhammad Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia {yalotaibi, mamun, ghulam}@ccis.ksu.edu.sa common to all Arabic-speaking countries. It is the language Abstract used in the media (television, radio, press, etc.), in official This paper investigates the unique pharyngeal and uvular speeches, in universities and schools, and, generally speaking, consonants of Arabic from the automatic speech recognition in any kind of formal communication situation [4]. (ASR) point of view. Comparisons of the recognition error Arabic texts are almost never fully diacritized and are thus rates for these phonemes are analyzed in five experiments that potentially unsuitable for automatic speech recognition and involve different combinations of native and non-native synthesis [3]. Table 1 shows all Arabic alphabet letters and Arabic speakers. The most three confusing consonants for their correspondences to consonant and semivowel phonemes. every investigated consonant are uncovered and discussed. This table also shows the phonetic description of each Results confirm that these Arabic distinct consonants are a phoneme including the place of articulation. In this paper we major source of difficulty for ASR. While the recognition rate use Language Data Consortium (LDC) WestPoint Modern for certain of these unique consonants such as /H/ can drop MSA database phoneme set rather than International Phonetic below 35% when uttered by non-native speakers, there are Alphabet (IPA). This was decided because we are using advantages to including non-native speakers in ASR. WestPoint corpus in our research and they are using their Regional differences in the pronunciation of Modern Standard custom Arabic phoneme list. Also IPA symbolization is hard Arabic by native Arabic speakers require attention of Arabic to use in computer programs and files because symbols ASR research. contain some special characters and special diacritizations. Index Terms: Arabic, pharyngeal, uvular, speech recognition, Arabic has phonemes consisting of three short vowels (/i/, foreign accents /a/, /u/), three long vowels (/i:/, /a:/, /u:/ which are the counterparts of the short vowels), and twenty-eight consonants [6]. Arabic has noticeably fewer vowels than English. In addition, vowel lengthening in Arabic is phonemic. Some Arabic dialects may have additional or fewer 1. Introduction consonant phonemes. Arabic phonemes contain two Arabic is a Semitic language which has many differences distinctive classes that are named pharyngeal and emphatic when compared with other languages that include unique phonemes. phonemes, phonetic features and complicated morphological There are two pharyngeal phonemes in Arabic that are word structure. Automatic Speech Recognition (ASR) system part of interested phoneme in this research: fricatives, Ain /C/ for Modern Standard Arabic (MSA) has major difficulties and H\aa /H/. In addition to this, there are three uvular phones with distinctive characteristics of the Arabic sound system, that are also part of this research. These phonemes are one stop Qaaf /q/, two fricatives Ghain /G/ and Khaa /x/. Table 2 10.21437/Interspeech.2008-233 namely, geminate, emphatic, uvular, and pharyngeal consonants, and vowel duration [1, 2]. Few researches have shows the two pharyngeal and three Arabic uvular sounds. been done on Arabic ASR compared to other languages. Most This table displays the place of articulation of these five efforts have concentrated on developing recognizers for MSA, Arabic phonemes. More description about pharyngeal and which is the formal linguistic standard used throughout uvular phonemes will be represented in the results section. Arabic-speaking countries in the media, lectures, courtrooms Newman [7] investigated the phonetic status of Arabic [3]. This paper concentrates on the analysis and investigation within the world’s languages, with the concentration on the of five Arabic unique pharyngeal and uvular sounds from an uniqueness of special Arabic phonemes. In this study he ASR perspective with effect of foreign-accented considered the framework of IPA as used with the UCLA pronunciation. Phonological Segment Inventory Database – commonly Arabic is one of the oldest languages in the world. The known as UPSID. In that database there are about of 317 estimated number of Arabic speakers is 250 million, of whom languages with total of 58 phonetic features. By concentrating roughly 195 million are first-language speakers and 55 on only pharyngeal and uvular Arabic phonemes that are of million are second-language speakers. Since it is also the interest in our paper, he concluded that the voiced pharyngeal language of religious instruction in Islam, many more fricative /C/ is limited only to eight languages (2.5%) – five of speakers have at least a passive knowledge of the language. them are Afro-Asiatic, where the lengthened version is unique Arabic is an official language in more than 22 countries [4]. to Arabic. The voiced uvular fricative /G/ is reported in only Compared to MSA, Classical Arabic is an older, literary fourteen languages (4.38%) while the long version of this form of language, exemplified by the type of Arabic used in phoneme is unique for Arabic. In addition to this, Newman the holy Quran. Spoken Arabic is a collection of regional and found that the pharyngeal unvoiced phoneme, /H/, is accruing national varieties that are derived from Classical Arabic. in only thirteen languages (4.1%), and its longer variant was Arabic dialects are primarily oral languages; written material found in Arabic and in another language. On the other hand, is almost invariably in MSA. As a result, there is a serious he attributed the uvular plosive /q/ as the least stable sound lack of Language Model (LM) training material for dialectal inasmuch as in many local varieties of Arabic. To make it speech. MSA is a version of Classical Arabic with a clear, /q/ phoneme is realized either as a voiced velar plosive modernized vocabulary [5], and it is a formal standard (e.g., in Egypt, Libya, and Tunisia), or as Copyright © 2008 ISCA Accepted after peer review of full paper 751 September 22-26, Brisbane Australia Table 1. Arabic consonants [8]. effect of native and non-native speakers in both training and testing data. Hidden Markov Models (HMMs) are a well-known and Velar Uvular Glottal Palatal Bilabial Alveolar Pharyngeal Inter-dental Labio-dental Alveo-dental widely-used statistical method for characterizing the spectral Emphatic features of speech frame. The assumption underlying HMMs Voiced /D/ is that the speech signal can be well characterized as a Non- Emphatic % /d/ /j/ parametric random access, and the parameters of the Stop /b/ Emphatic /T/ stochastic process can be predicted in a precise, well-defined Unvoiced manner. The Hidden Markov Model Toolkit (HTK) [9] is a Non- Emphatic /t/ U /k/ /q/ /Q/ portable toolkit for building and manipulating HMMs; it is Emphatic /Z/ widely used for designing, testing, and implementing ASR Voiced systems and related research tasks. HTK is used in all the Non- Emphatic /z/ Fricative /TH/ /G/ /C/ experiments reported here. Emphatic Unvoiced /S/ 4 2.1. Database Non- Emphatic /f/ /th/ /s/ /sh/ /x/ /H/ /h/ We use WestPoint Arabic Speech Corpus provided by LDC Nasal Voiced Non- Emphatic /n/ /m/ [10]. A descriptive summary of this database is given in Table Non- Emphatic 3. The amount of data provided by the native speakers is /l/r/ Liquid Voiced significantly greater than non-native speakers and all non- Emphatic /l/ native Arabic speakers are native English speakers. The Semivowels Voiced Non- Emphatic corpus includes both male and female speakers. /w/ /y/ 2.2. System description and parameters Table 2. Investigated Arabic consonants. The parameters of the system are 22 kHz sampling rate with Place of 16 bit sample resolution, 25 milliseconds Hamming window Arabic Alphabet LDC Articulation duration with a step size of 10 milliseconds, MFCC Carrier Symbol Considered coefficients with 22 as the length of cepstral leftering and 26 filter bank channels, 12 as the number of MFCC coefficients, Ain E C Pharynx 12 first derivatives and 0.95 as the pre-emphasis coefficients. Ghain I G Uvular Our baseline system is designed as a phoneme-level H\aa H Pharynx recognizer with 3-state, continuous, left-to-right, no skip HMM models. It considers all 37 MSA monophones as given Qaaf Q q Uvular in the LDC catalog [10]. We note that the WestPoint Corpus Khaa ! x Uvular contains more monophones than the number of MSA phonemes mentioned in the linguistic literature [11, 12, 13]. Table 3. LDC WestPoint corpus summary. The phoneme /g/ was used because some native and non- native speakers produced it in certain MSA words. Two extra Number of Speakers diphthongs were added because of variations in the male female total pronunciations of non-native speakers. We use the WestPoint native 41 34 75 Corpus phonemes, transcriptions and other settings without non-native 25 10 35 any modification. totals 66 44 110 Since most of the words consisted of more than two phonemes, context-dependent triphone models were created Hours of Data from the monophone models. In the training phase, the model male female total was aligned and tied by using the decision tree method. The native 6 4.4 10.4 last step in the training phase is to re-estimate the HMM non-native 0.74 0.28 1.02 parameters using the Baum-Welch algorithm three times. totals 6.74 4.68 11.42 Five types of experiments were carried out in the experiments. These experiments differ only in the type of the training and testing data sets. These experiments are labeled as N/N, N/NN, NN/N, NN/NN, and M/M, where N, NN and glottal stop as the case in the Syro-Lebanese and Cairene.