International Journal of Electronics Communication and Computer Engineering
Volume 9, Issue 1, ISSN (Online): 2249–071X

Classification of the Fricative and Occlusive Consonants According to the Place and the Mode of Articulation

Soufyane Mounir*, Karim Tahiry and Abdelmajid Farchi

Date of publication (dd/mm/yyyy): 03/03/2018
Copyright © 2018 IJECCE, All rights reserved

Abstract – In this article, we study the classification of occlusive and fricative consonants in Modern Standard Arabic for three different articulation sites: bilabial, alveolar (dental) / interdental and velar. By calculating the four spectral moments after preprocessing our speech signal, we can classify these consonants according to the place and the mode of articulation.

Keywords – Fricative, Occlusive, Place of Articulation, Spectral Moments

I. INTRODUCTION

Several methods can be adopted to improve the speech recognition rate. Among them is the extraction of acoustic features, represented as observation vectors determined by methods such as linear predictive coding (LPC) or Mel-frequency cepstral coefficients (MFCC). The feature extraction phase is a very important factor in the development of a recognition system [1, 2, 3].

Nishinuma studied French, trying to define protocols for detecting consonant clusters based on duration. He used bisyllabic and trisyllabic words into which he inserted the target syllables CCV, VCC, CV and VC associated with the vowels /i, a, ã/ [4]. Using statistical analysis, he retrieved five parameters: voicing, manner of articulation of the first half of the cluster, ratio of the vowel duration to the consonant-segment duration, duration of the consonant segment, and position in the word. These rules allowed a correct classification of 90.13% of consonant groups.

Other researchers have exploited spectral moments to classify consonants. This has been a popular subject in the phonetic literature over the last decades [5, 6, 7, 9, 10], in automatic speech processing and recognition [11, 12, 13] and in the clinical phonetics literature [14, 15]. Forest sought to classify occlusive consonants using spectral moment analysis [16, 17, 18]; he found reliable results for certain categories such as the place of fricative articulation [19, 20]. McMurray and Jongman used a broad combination of measures to model fricative perception and representation [9].

Other researchers have found it difficult to extract invariants that make it possible to distinguish fricatives according to the place of articulation [18, 20, 9, 21]. The acoustic measurements reported for fricatives, namely spectral moments, F2 onset frequency, locus equation, spectral slope, location of spectral peaks, static and dynamic amplitudes, and noise duration, were based on discrete Fourier transforms [22, 23, 24]. They concluded that there is no invariance in the acoustic signal and that, therefore, the categorization of speech by listeners requires a massive integration of signals as well as compensation mechanisms able to handle contextual influences.

Spinu and Lilley were interested in the classification of fricatives and, using a corpus of Romanian fricatives, examined two methods for encoding speech [25]. The first method compared two sets of acoustic measurements: spectral moments and cepstral coefficients. The second compared two techniques for determining the regions of a segment from which the measurements are extracted. In one technique, the phonetic segments were divided into three zones of almost equal duration; in the other, hidden Markov models (HMM) were used to decompose each segment into three regions so as to minimize the variance of the measurements within each region. Classifying the fricatives by place of articulation, harmonization, palatalization status and speaker sex using logistic regression, they found that cepstral coefficients are more reliable than spectral moments for classification. They also concluded that regions identified by HMM yield a higher classification rate than regions of equal duration.

In our study, we try to classify the occlusive and fricative consonants of Modern Standard Arabic for three places of articulation, (bilabial, alveolar or dental, velar) and (bilabial, interdental, velar) respectively, by means of the spectral moments: spectral mean (m1), standard deviation (m2), skewness (m3) and kurtosis (m4). We also compare these two modes of articulation.

II. CORPUS

Articulatory data were collected from fifteen Moroccan men, each pronouncing the CV syllable on four occasions, where "C" is the consonant and "V" is the vowel. The consonants concerned are, for the occlusives, bilabial /b/ = /ب/, alveolar or dental /d/ = /د/ and velar /k/ = /ك/; and, for the fricatives, bilabial /f/ = /ف/, interdental /ð/ = /ذ/ and velar /ɣ/ = /غ/. For the vowels, we used the short vowels /a, i, u/. The recordings were made in an isolated room with the "Praat" software, using a microphone (Labtec AM-232, sensitivity: 35 dB, impedance: 2.2 kOhm, bandwidth: 20 to 8500 Hz) placed at 20 cm and connected to a PC. The sound was digitized directly on the PC at a sampling frequency of 22050 Hz. We used the same software to segment the CV syllables.

III. SPEECH SIGNAL PROCESSING

A. Preprocessing
For automatic speech recognition, preprocessing of the voice signal is a compression of the data that facilitates real-time estimation. The estimation itself can be carried out in the time domain or on the result of a short-term analysis produced by the preprocessing. It would be useless to process the whole signal (speech / non-speech), so we isolate the vocal activity by combining two techniques: energy level and zero crossings.

B. Preemphasis
The spectrogram shows a decrease in amplitude, so we emphasize the speech signal $x(n)$ by computing $x'(n) = x(n) - \alpha x(n-1)$. This filter amplifies the high frequencies: the larger $\alpha$ is, the more the magnitude is raised at high frequencies. In our experiment, we chose $\alpha = 0.95$, obtained from the following formula:

$$\alpha = e^{-2\pi \cdot 100 / F_s}$$

C. Windowing and FFT
Before extracting the parameters of the speech signal, it is essential to break it down into segments because it is non-stationary. Multiplying each segment by a Hamming window attenuates the discontinuities at its ends. The windowed segment is given by the following equation:

$$x_1(n) = x_2(n)\left(0.54 + 0.46\cos\left(\frac{2\pi n}{N-1}\right)\right) \qquad (1)$$

With this sign convention, $n$ is measured from the centre of the segment ($-(N-1)/2 \le n \le (N-1)/2$); for $n = 0, \ldots, N-1$ the cosine term enters with a minus sign.

The kurtosis m4 is computed as:

$$m_4 = -3 + \sum_{i=0}^{n/2} \left(\frac{f_i - m_1}{m_2}\right)^4 \frac{P(f_i)}{\sum_{j=0}^{n/2} P(f_j)}$$

where $P(f_i)$ is the power of the spectrum, $f_i = 2 f_{nyq} \cdot i / n$, $i = 0, 1, \ldots, n/2$, $n = 256$, and $f_{nyq}$ is the Nyquist frequency [26].

IV. RESULTS AND DISCUSSIONS

The value of m1 for the consonant /k/ is greater than for /b/ and /d/. Among the fricatives, /f/ has the largest spectral mean, followed by /ð/ and then /ɣ/ (Figures 1 and 2). From these results, we find that the spectral mean allows the velar occlusive consonants to be distinguished, whereas for the fricative consonants studied it allows all the places of articulation to be classified. We also find that, for occlusive consonants, when the consonant is produced at the level of the posterior cavity, the spectral mean is greater than that of a consonant produced at the level of the anterior oral cavity; what matters here is the size of this cavity. These results agree with those of Nitrouer and Stevens, who stated that the oral cavity and the spectral mean are dependent [27, 28].

For the standard deviation, the values for the alveolar occlusive consonants are larger than those for the velars, which are in turn larger than those for the bilabials. As for the fricative consonants, the interdental spectra are the most dispersed, followed by the bilabial and the velar.

The third spectral moment provides information on the location of the spectrum compared to the normal distribution. All occlusive consonants have positive values, which shows that their spectrum is offset to the left, that is, the bulk of the acoustic energy is contained in the low frequencies. More precisely, the velar consonants are the closest to the axis of symmetry of the normal distribution, unlike the bilabial consonants. For the fricative consonants, on the other hand, we find two opposite signs: the spectrum of the interdental consonants is shifted to the right of the axis of symmetry, although it remains close to it, whereas the two other places of articulation have spectra whose maximal energy lies in the low frequencies, the bilabial spectrum being the farthest from this axis.
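The processing chain described in Section III (energy and zero-crossing measures, pre-emphasis, Hamming windowing, FFT) and the four spectral moments discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, and the frame length of 256 samples and the synthetic noise frame are assumptions. Only the kurtosis formula appears in the text, so the definitions used for m1, m2 and m3 (power-weighted mean, standard deviation and standardized third moment) are the usual ones and are also assumptions. `np.hamming` uses the minus-sign form of the window, with n running from 0 to N − 1. With Fs = 22050 Hz, the quoted formula for α evaluates to about 0.972, so the value 0.95 used in the text appears to be a separately chosen or rounded setting.

```python
import numpy as np

FS = 22050    # sampling frequency in Hz (from the corpus description)
N_FFT = 256   # number of FFT points n (from the kurtosis definition)


def energy_and_zero_crossings(frame):
    """Frame energy and number of sign changes (inputs to a speech/non-speech decision)."""
    frame = np.asarray(frame, dtype=float)
    energy = float(np.sum(frame ** 2))
    signs = np.signbit(frame)
    crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
    return energy, crossings


def preemphasis_coefficient(fs, cutoff_hz=100.0):
    """alpha = exp(-2*pi*cutoff/fs), the formula quoted for the pre-emphasis filter."""
    return float(np.exp(-2.0 * np.pi * cutoff_hz / fs))


def preemphasize(x, alpha=0.95):
    """x'(n) = x(n) - alpha * x(n-1); the first sample is left unchanged."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] = x[1:] - alpha * x[:-1]
    return y


def power_spectrum(frame, n_fft=N_FFT, fs=FS):
    """Hamming-window one frame and return (f_i, P(f_i)) for i = 0..n/2."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # f_i = 2 * f_nyq * i / n
    return freqs, np.abs(spectrum) ** 2


def spectral_moments(freqs, power):
    """Return (m1, m2, m3, m4); m4 follows the text, m1..m3 are assumed definitions."""
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(power, dtype=float)
    p = p / p.sum()                          # normalized power P(f_i) / sum P
    m1 = np.sum(freqs * p)                   # spectral mean
    m2 = np.sqrt(np.sum((freqs - m1) ** 2 * p))  # standard deviation
    z = (freqs - m1) / m2
    m3 = np.sum(z ** 3 * p)                  # skewness
    m4 = -3.0 + np.sum(z ** 4 * p)           # kurtosis, as in the text
    return float(m1), float(m2), float(m3), float(m4)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(N_FFT)       # synthetic noise, not speech data
    freqs, power = power_spectrum(preemphasize(frame))
    print(energy_and_zero_crossings(frame))
    print(spectral_moments(freqs, power))
```

A positive m3 from `spectral_moments` corresponds to the left-offset spectra reported for the occlusives, and a negative value to the right-shifted interdental fricatives.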