International Journal of Electronics Communication and Computer Engineering Volume 9, Issue 1, ISSN (Online): 2249–071X

Classification of the and Occlusive According to the Place and the Mode of Articulation

Soufyane Mounir*, Karim Tahiry and Abdelmajid farchi

Date of publication (dd/mm/yyyy): 03/03/2018

Abstract – In this article, we study the classification of Measurements on acoustic parameters that are reported for occlusive and consonants in standard modern the fricatives namely: spectral moments, F2onset Arabic for three different articulation sites: bilabial, alveolar frequency, locus equation, slope of the spectrum, location (dental) / interdental and velar. By calculating the four of spectral peaks, measurement of static and dynamic spectral moments after pretreatment of our speech signal, we amplitudes and the duration of the noise were based on can classify these consonants according to the place and the mode of articulation. discrete Fourier transforms [22 23 24]. They concluded that there is no invariance in the acoustic signal and, therefore, Keywords – A Fricative, Occlusive, , the categorization of speech by the listeners requires a Spectral Moments massive integration of signals as well as mechanisms of compensation able to manage the contextual influences. I. INTRODUCTION Spinu and Lilley were interested in the classification of fricatives. For this, they examined two methods. From a Several methods can be adopted to improve the speech corpus of Romanian fricatives and for the coding of speech, recognition rate. Among these methods are the extraction of the first method is based on the comparison of two acoustic characteristics that are characterized by observation vectors measurements: the spectral moments and the cepstral determined by time methods such as linear predictive coefficients. For the second method, they aimed at coding (LPC) or Mel Frequency Cepstral Coding (MFCC). extracting measurements in segment areas after comparing The feature extraction phase is a very important factor in two techniques of their determination [25]. For the first the development of a recognition system [1, 2, 3]. method, Spinu and Lilley divided the phonetic segments Nishinuma studied the French where he tried in into three zones of almost equal duration, while in the his research to define protocols for detecting second method, they used hidden Markov models (HMM) clusters based on temporal size. He used bisyllabic and to break each segment into three regions. For the 2nd trisyllabic words where he inserted target syllables CCV, method, they aimed at extracting measurements in VCC, CV and VC associated with the / i, a, ã / [4]. segments areas after comparing two techniques for their Nishinuma studied the French language where he tried in determination [25]. About the 1st method, Spinu and Lilley his research to define protocols for detecting consonant divided the phonetic segments into three zones of almost clusters based on temporal size. He used bisyllabic and equal duration, whereas in the second method, they used trisyllabic words where he inserted target syllables CCV, models of Markov hidden (HMM) to decompose every VCC, CV and VC associated with the vowels / i, a, ã / [4]. segment into three regions of such kind to minimize the Using statistical analysis, he came to retrieve five variances of the measures in every region. Having classified parameters which we quote relate voicing, manner of fricatives according to the place of articulation, the articulation of the first half of the group, ratio of the harmonization, the state of palatalization and the sex by duration of the and consonant duration of the using the logistic regression, they found relevant results at segment, duration of the consonant segment and the the level of the use of the cepstral coefficients which are position in the word. These rules allowed a correct more reliable than the spectral moments at the level of the classification of 90.13% of consonant groups. classification. On the other hand, they ended that the use of Other researchers have thought of exploiting spectral zones identified by HMM possesses a rate of classification moments to classify the consonants. It is a popular subject higher than the use of regions of equal duration. in phonetic literature over the last decades [5 6 7 9 10], in In our study we try to classify the occlusive and fricatives the processing and automatic recognition of speech [11 12 consonants in standard modern Arabic language for three 13] and in the literature on clinical [14 15]. Forest places of articulation: (bilabial, alveolar or dental, velar) sought to classify occlusive consonants using spectral and (bilabial, interdental, velar) respectively by means of moment analysis [16 17 18]; he found reliable results for spectral moments: spectral mean (m1), standard deviation certain categories such as the place of fricative articulation (2), skewness (3) and kurtosis (4). We also try to make a [19 20]. McMurray and Jongman used a broad combination comparison between these two modes of articulation. of measures to model fricative perception and representation [9]. II. CORPUS Other researchers have found it difficult to extract invariants that make it possible to distinguish fricatives Articulatory data were collected for fifteen Moroccan according to the place of articulation [18 20 9 21]. men, by pronouncing the CV syllable on four occasions, Copyright © 2018 IJECCE, All right reserved 49 International Journal of Electronics Communication and Computer Engineering Volume 9, Issue 1, ISSN (Online): 2249–071X

푛 4 where "C" " is the consonant and " V " the vowel is. The ⁄2 푓푖−푚1 푃(푓푖) m4= −3 + ∑ ( ) . Where: 푃(푓푖) is the 푖=0 푚2 ∑푛/2 ( ) alveolar or 푖=0 푃 푓푖 ,/ب/=/concerned consonants are (bilabial: /b 푖 푛 = for occlusives) power of the spectrum, 푓 = 2푓 . , 푖 = 0, 1, … , 푒푡 푛 / ك/= / velar (/k ,/ د/=/: /d velar: 푖 푛푦푞 푛 2 ,/ذ/=/interdental consonant: /ð ,/ف/=/and (bilabial: /f .[for fricatives). For the vowels, we used the short 256 and 푓푛푦푞 is the Nyquist frequency [26 /غ/=/ɣ/ vowels / a, i, u/. In an isolated room and using " Praat " software, we used for the recording a microphone (Labtec IV. RESULTS AND DISCUSSIONS AM-232, sensibility: 35 dB, Impedance: 2,2 kOhm, bandwidth: 20à 8500 Hz) at 20 cms on a PC. With a The value of "m1" of the consonant / k / is greater than / frequency of sampling of 22050 Hz, the sound is directly b / and / d /. The spectral mean of / f / has the largest value scanned on a PC. We used the same software to segment the followed by / ð / then by / ɣ / (Figure 1, 2). From these syllables CV. results, we find that the spectral mean allows to classify the velar occlusive consonants, on the other hand, it allows to III. SPEECH SIGNAL PROCESSING classify all the places of articulation of the fricative consonants studied. We also find that in the case of A. Pretreatment occlusive consonants, when the consonant is produced at The pretreatment of the signal for the automatic the level of the posterior cavity, the spectral mean is greater speech recognition of the word is a compression of the data than that of the consonant produced at the level of the to facilitate a real time estimation. The estimation itself can anterior oral cavity. So here we are talking about the size of be made in the temporal domain or on the result of an this organ. These latter results are in harmony with what is analysis court-term made by the pretreatment. That will be found by Nitrouer and Stevens who stated that the oral useless to deal all of the signal (word / not word), for it we cavity and the spectral mean are dependent [27 28]. For the need to isolate the vocal activity by using a combination of standard deviation, the values of the alveolar occlusive two techniques: energy level and the passage by zero. consonants are larger than those of the velar than those of B. Preemphasis the bilabial ones. As for the fricative consonants, also the We meet a problem of decrease of amplitude in the interdentals spectrums are the most dispersed followed by spectrogram, for it we have to accentuate the sppech x (n) the bilabial and the velar. by calculating the magnitude x′(n) = x(n) − αx(n − 1). It The third spectral moment provides information on the is the filter which serves to amplify high frequencies. More location of the spectrum compared to the normal distribution. All occlusive consonants have positive values, 훼 est grand, more the magnitude is raised in high frequency. which shows that their spectrum is offset to the left, that the In our experience, we chose 훼 = 0, 95 obtained from the ퟐ흅ퟏퟎퟎ totality of the acoustic energy is contained in the low (− ) following formula 휶 = 풆 푭풔 . frequencies. More precisely, the velar consonants are the C. Windowing and FFT closest to the axis of symmetry of the normal distribution, Before extracting the parameters of the speech signal, it unlike the bilabial consonants. On the other hand, at the is essential to break it down into segments because it is of a level of consonants fricatives, we find two opposite non-stationary nature. By multiplying each segment by a (opposed) signs, the spectrum of interdental consonants is Hamming window, we succeed in weakening the moved to the right of the axis of symmetry, but it remains dicontinuities at the ends. This window is given by the closer, on the other hand, two other places of joints following equation: (articulations) possess spectrum where the maximal energy 2휋푛 is contained in the low frequencies where the spectrum of 푥 (푛) = 푥 (푛). (0,54 + 0,46. 푐표푠 ( ) (1) 1 2 푁 − 1 the bilabial stays farthest of this axis. Where N is the number of samples. The results obtained from the last moment for the The FFT step transforms the speech signal into a occlusive consonants show that the spectral distribution of frequency domain [2]: the bilabial is the narrowest seen that it corresponds to the

푁−1 biggest value of the kurtosis, on the other hand that of the 2휋푗푘푛 − velar is the most flattened. For fricatives, the spectrum of 푋 = ∑ 푥 푒 푁 , 푛 = {0, 1 … . , 푁 − 1} (2) 푛 푘 the velar is the most flattened with compared with the other 푘=0 places of articulation. We can say that according to the D. Calculating Spectral Moments place of articulation. The moments "m1 ", "m2" and "m3" In our work, we are interested in the method of spectral allow to classify occlusive consonants and fricatives moments after transforming the signal into a frequency consonants. domain. The four spectral moments concerned are: spectral mean (m1), standard deviation (m2), skewness (m3) kurtosis (m4). The spectral mean, the standard deviation, the skewness and the kurtosis are respectively given by: 푚1 = 푛/2 푃(푓푖) 푛/2 2 푃(푓푖) ∑ √ 푛/2 푓푖, 푚2 = √∑ (푓푖 − 푚1) 푛/2 , 푖=0 ∑ 푃(푓 ) 푖=0 ∑ 푃(푓 ) 푖=0 푖 푖=0 푖 푓 −푚1 3 푃(푓 ) 푚3 = ∑푛/2 ( 푖 ) 푖 and 푖=0 푚2 ∑푛/2 ( ) 푖=0 푃 푓푖 Copyright © 2018 IJECCE, All right reserved 50 International Journal of Electronics Communication and Computer Engineering Volume 9, Issue 1, ISSN (Online): 2249–071X

Fig. 1. The four spectral moments of the occlusive consonants for three places in context CV

Fig. 2. The four spectral moments of the fricative consonants for three places in context CV

Copyright © 2018 IJECCE, All right reserved 51 International Journal of Electronics Communication and Computer Engineering Volume 9, Issue 1, ISSN (Online): 2249–071X

Fig. 3. Comparison between the values of the four spectral moments of the occlusive and fricative consonants for three different places of articulation.

We notice that at the level of the spectral mean, the of the two moments of the occlusive for all these places are bilabial and interdental fricatives present the biggest values more important than those of the fricatives. that those of the occlusive consonant bilabial and alveolar (dental consonants) respectively (Figure 3). On the other V. CONCLUSION hand, the velar fricatives present a spectral mean less important than that of the velar occlusive consonants. About This study, based on the standard modern Arabic the second moment, the spectral distributions of bilabial and language, where we treated two different modes of dental occlusive consonant are less dispersed than those of articulation (occlusive and fricative) for three different sites the bilabials and interdental consonants fricatives of articulation (bilabial, dental / interdental, velar), revealed respectively, but no difference is noticed about velar several interesting results. "m1", "m3" and "m4" can occlusive and velar fricative. classify consonants according to the place of articulation for Concerning the third moment, the spectral distributions a single mode of articulation. The moment "m4" makes it of bilabial and velar occulsives and fricatives carry the possible to distinguish between the occlusives and the totality of energy in the low frequencies, but the first ones fricatives irrespective of the place of articulation. are the most offset compared to the second ones. The big difference is noted about the dentals and interdentals where REFERENCES they have a right-hand asymmetry (negative value), while the total energy of the spectral distributions of the dentals is [1] Deroo, O., 1998 , « Modèles Dependant du context et Méthodes contained in the low frequencies and this can be due to the de Fusion de données Appliquées à la reconnaissance de la parole little difference in the place of articulation. par Modèles Hybrides HMM/MPL, ’’Faculté Polytechnique de Remarkable results at the fourth moment showed that Mons’’ [2] Bergounioux, M., 2010, « Mathématiques pour le traitement du bilabial, interdental, and velar spectrum are narrower than signal : cours et exrcices corrigés » Dunod, paris, pp. 270-279 those of bilabial, alveolar and velar occlusive respectively. [3] M. Bellanger « Traitement Numérique du signal : Théorie et We can say that "m1" makes it possible to differentiate pratique » edition Masson 1987, p : 363 between the occlusive consonants and the fricative [4] [Nishinuma, Y., Duez, D., & Paboudjian, C., 1991, « Automatic classification of consonant clusters in French », Speech consonants according to the place of articulation. The first communication, Vol. 10, pp. 395-403. moment is less important in bilabial and alveolar occlusive [5] Hoelterhoff, J., & Reetz, H., 2007, « Acoustic cues discriminating compared to bilabial and interdental fricative. On the other German in place and » Journal hand, it is more important in velar occlusive than in of the Acoustical Society of America, 121, 1142–1156. [6] Jesus, L. M. & Jackson, P. J, 2008, « Frication and voicing fricatives. The moment "m4" makes it possible to classify classification. In A. Teixeira, V. L. S. de Lima, L. C. de Oliveira, the occlusive consonants in front of the fricative consonants & P. Quaresma (Eds.) », Computational processing of the whatever the place of articulation studied since the values Portuguese language: The 8th International Conference on Copyright © 2018 IJECCE, All right reserved 52 International Journal of Electronics Communication and Computer Engineering Volume 9, Issue 1, ISSN (Online): 2249–071X

Computational Processing of Portuguese (PROPOR) (pp. 11–20). [28] [28] Stevens, K.N., 1998, “Acoustic Phonetics”, The MIT Press, Berlin, Heidelberg: Springer - Verlag. Cambridge, MA. [7] Jesus, L. M., & Shadle, C. H.., 2002, « A parametric study of the spectral characteristics of European Portuguese fricatives », Journal of Phonetics, 30(3), 437–464. AUTHORS’ PROFILES [8] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R.., 2013, « Toward improved spectral measures of /s/: Results Pr. Soufyane Mounir from adolescents », Journal of Speech, Language, and Hearing Member of research team "signals and systems" in laboratory of Research, 56, 1175–1189. mechanical engineering, industrial management and innovation; Professor [9] McMurray, B., & Jongman, A., 2011, « What information is in National School of Applied Sciences, University Hassan 1st, Khouribga, necessary for speech categorization? Harnessing variability in the Morocco. speech signal by integrating cues computed relative to expectations », Psychological Review, 118, 219–246. Phd. Karim Tahiry [10] [Shadle, C. H. & Mair, S. J., 1996, « Quantifying spectral Member of research team "signals and systems" in laboratory of characteristics of fricatives », In Proceedings of the international mechanical engineering, industrial management and innovation; Phd of conference on spoken language processing (ICSLP 96) (pp. 1517– Science and Technical Faculty, University Hassan 1st, Settat, Morocco. 1520). Philadelphia. [11] Childers, D. G., Wu, K., Bae, K. S., & Hicks, D. M., Pr. Abdelmajid Farchi 1998, « Automatic recognition of gender by voice », Paper Chief of research team "signals and systems" in laboratory of mechanical presented at the IEEE international conference on acoustics, engineering, industrial management and innovation; Professor in Science speech, and signal processing (ICASSP-88). and Technical Faculty, University Hassan 1st, Settat, Morocco. [12] Farooq, O., & Datta, S., 2001, « Mel filter-like admissible wavelet packet structure for speech recognition », IEEE Signal Processing Letters, 8(7), 196–198. [13] Halberstadt, A. K. & Glass, J. R., 1997, « Heterogeneous acoustic measurements for phonetic classification ». EUROSPEECH [14] Kong, Y.-Y., Mullangi, A., & Kokkinakis, K., 2014, « Classification of fricative consonants for speech enhancement in hearing devices. PLoS ONE », 9(4), e95001, http://dx.doi.org/10.1371/ journal.pone.0095001 [15] Todd, A., Edwards, J., & Litovsky, R., 2011, « Production of contrast between fricatives by children with cochlear implants », Journal of the Acoustical Society of America, 130, 3969–3979. [16] Forrest, K., Weismer, G., Elbert, M., & Dinnsen, D. A., 1994, « Spectral-analysis of target-appropriate T and K produced by phonologically disordered and normally articulating children », Clinical Linguistics and Phonetics, 8, 267–281. [17] Forrest, K., Weismer, G., Hodge, M., Dinnsen, D. A., & Elbert, M., 1990, « Statistical-analysis of word-initial K and T produced by normal and phonologically disordered children », Clinical Linguistics and Phonetics, 4, 327–340. [18] Forrest, K., Weismer, G., Milenkovic, P., & Dougall, R., 1988, « Statistical analysis of word-initial voiceless obstruents: Preliminary data », Journal of the Acoustical Society of America, 84, 115–124. [19] Flipsen, P., Shriberg, L., Weismer, G., Karlsson, H., & McSweeny, J., 1999, « Acoustic characteristics of /s/ in adolescents », Journal of Speech, Language, and Hearing Research, 42, 663–677. [20] Jongman, A., Wayland, R., & Wong, S., 2000, « Acoustic characteristics of English fricatives ». Journal of the Acoustical Society of America 10/2000, 108(3 Pt 1), 1252–1263. [21] Tomiak, G., 1990, « An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents (Doctoral dissertation) », State University of New York, Buffalo [22] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R., 2013, « Toward improved spectral measures of /s/: Results from adolescents », Journal of Speech, Language, and Hearing Research, 56, 1175–1189. [23] Lousada, M., Jesus, L., & Pape, D., 2012, « Estimation of stops' spectral place cues using multitaper techniques ». DELTA, 28(1), 1–26. [24] Zygis, M., Pape, D., & Jesus, L., 2012, « (Non) retroflex Slavic and their motivation: Evidence from Czech and Polish », Journal of the International Phonetic Association, 42(3), 281–329. [25] Spinu, L., & Lilley, J., 2016, « A comparison of cepstral coefficients and spectral moments in the classification of Romanian fricatives », Journal of Phonetic, Vol, 57, pp. 44-58 [26] Feng, Y., Hao, G, J., Xue, S, A., Max, L, 2011, “Detecting anticipatory effects in speech articulation by means of spectral coefficient analyses”, Speech Communication, Vol.53, pp. 842– 854. [27] Nittrouer, S., 1995, “Children learn separate aspects of speech production at different rates: evidence from spectral moments”, J. Acoust. Soc. Am. Vol. 97, pp. 520–530. Copyright © 2018 IJECCE, All right reserved 53