Misarticulated /R/ - Speech Corpus and Automatic Recognition Technique

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6, March 2019 Misarticulated /r/ - Speech Corpus and Automatic Recognition Technique Suresh Kumar Nagaram Suman Maloji KasiprasadMannepalli articulation disorder (Rhotacism) in particular, are reported Abstract: Technique to recognize the impaired pronunciation in the literature. Visual feedback of real time speech of sound /r/ from Telugu speech signals is presented in this paper spectrogram to the patient [4, 5], usage of removable besides the speech corpus. Rhotacism is called as an inability to R-appliances [6], Ultra sound [7, 8], hand gesture cues [9] pronounce the sound /r/ and is one of the Speech Sound Disorders and Electrography [10] modes of therapy for misarticulation (SSD) in children. Whose SSD not diagnosed at an early stage may result ina lack of social skills. This demands an efficient of /r/ is reported. Remediation of improper articulation of /r/ automatic speech impairment detection technique, which helps using traditional and spectral bio feedback is reckoned in the therapists to treat the patients with impairment specific [11]. With the help of speech therapy classes and continuous procedure. Databases for the impaired articulation of /r/ in practice, one can recover from SSD. The practice of speech various languages are explored in this article. The shape of the therapy mainly relies on the methodology of identification envelope, timbre, Walsh Hadamard Transform (WHT), Discreet followed by the correction of improper sounds, making it a Cosine Transform (DCT) features extracted, from the Mel-Frequency Cepstral Coefficients (MFCC), to discriminate the tedious process. During the diagnosis, the therapist has to correct and wrong articulation of /r/ are detailed. Usage of k- spend an ample amount of time to identify and categorize the Nearest Neighbor (kNN), Support Vector Machine (SVM) and misarticulated sound. An investigation of relation among the kohonen neural networks in various articles, for classification, acoustical, ultra sound and perceptual routines to categorize are briefed. MFCC features and k-NN algorithm is used to the good and bad articulation of /r/ is detailed in [12]. identify the misarticulation in the Telugu language. The 80.1% The detection and diagnosis of the SSD vary from classification accuracy shows that the proposed method performs good with respect to the methods detailed for other languages. language to language. There is very less, or no research is Availability of acoustic databases for the impaired articulation of happening on SSD in the dialect of different languages other /r/ and subjects with such impairment restricts the performance than English. The works on the impaired speech by the validation of the investigated methods. This further demand the various researchers, the impaired speech database they more contribution from scholars in the development of automatic developed and the techniques they recommended for techniques and databases for misarticulated /r/ in different automatic identification of impaired articulation of sound /r/ languages. Index Terms: Speech Sound Disorder, Rhotacism, Impaired is presented in section 2. Section 3 details the proposed Articulation, Impaired Speech, Dyslalia. method to detect the impaired articulation /r/ sound in Telugu language and section 4 concludes the paper. I. INTRODUCTION II. INVESTIGATION OF LITERATURE In the era of the fully grown and technology driven world, the automatic detection of impaired speech or SSD is not The automatic detection & databases for misarticulated /r/ fully addressed. SSD is the inability to pronounce a letter or sound developed by various researchers are detailed in this letters. The sources for this disorder are not fully known yet section. Ovidiu Grigore et al. [13] presented a technique, [1]. There are mainly two categories in SSD, those are which identifies the impaired pronunciation of /r/ in the Articulation disorder and Phonemic disorder [2, 3]. The Romanian language automatically, using variations in the former one is due to the difficulty in learning of, how these timbre during the evaluated duration, by MFC coefficients phonemes are physically produced. This disorder completely and there by performing k-NN classification algorithm on deals with the main articulators. Later one is due to the extracted feature i.e., Timbre. The database used in this difficulty in learning and understanding the language’s sound technique contains the voice recordings of 8 female and 3 system. Diagnosis of this speech impairment at the early male adults, among them 3 female and 1 male have the stages of childhood may help them to recover from this improper pronunciation of sound /r/. From a group of ten problem. Customary speech impairments are due to the words, each spokesman utters each word three times. All improper articulation of sounds, such as /r/ (Rhotacism), /s/ & voice recordings are preprocessed initially. During this stage, /z/ (Sigmatism) etc. Various remediations for SSD, the initial phoneme /r/ is segmented manually from the word. The words used to create the database contains initial consonant /r/ followed by a vowel. This particular choice in Revised Manuscript Received on March 20, 2019. Suresh Kumar Nagaram, Department of Electronics and word selection makes manual segmentation easy. The quality Communication Engineering, Koneru Lakshmaiah Education of manual segmentation is acceptable, because of the vowel Foundation,Vaddeswaram, Guntur, Andhra Pradesh 522502, India. Suman Maloji, Professor, Department of Electronics and Computers Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh 522502, India. KasiprasadMannepalli, Associate Professor, Department of Electronics and Communication Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh 522502, India. Published By: Blue Eyes Intelligence Engineering Retrieval Number: F2164037619/19©BEIESP 172 & Sciences Publication Misarticulated /r/ - Speech Corpus and Automatic Recognition Technique after the improper sound /r/. After the preprocessing, the for the vowel ‘o’ after the sound /r/ with ‘k’ is in-between 7 to segmented portion of interest is processed to extract the 11. In the work of Ovidiu Grigore [14], detection of the feature. The timbre feature of the speech over the duration of extremely effected pronunciation of sound /r/ in Romanian the pronounced phoneme /r/ is extracted from the MFC language using kohonen neural network was reckoned. coefficients. The envelope shape of the speech signal is a Phoneme /r/ is the most commonly mis-pronounced sound in feature, that can be used to differentiate the improper with the children. Enormous vowel detection algorithms are proper pronunciation if, the degree of improper developed, as they are easy to detect, because of their high pronunciation is high. The envelope shape of the proper energy concentration. A little work was done to detect the pronunciation (shown in figure 1(a)) resembles the linear consonants and its features. Whereas almost no work was modulation (i.e., Amplitude modulation), where as in later done to detect the improper pronounced consonants. Because case the envelope shape is more constant as shown in the it is more difficult than detecting properly pronounced below figure 1(b). This result is due to the replacement of consonants. A considerable effort was done by the author to sound /r/ by the other sounds /d/, /t/, /1/ etc. in improper detect the improper pronunciation of consonant /r/ using pronunciation and they will have amplitudes varying very neural networks. Words with initial /r/ phoneme followed by slowly. If the degree of improper pronunciation is very less, a vowel ‘a’ (rac, rana, rama etc.) were uttered by 15 children then the sound /r/ is guttural, making the shape of the and 5 adults for database creation. Among them, only twelve envelope similar to the proper pronunciation. This makes the children were suffering from rhoticism. The envelope shape envelope shape feature no longer a good choice. of the signal is considered as the feature and is calculated by taking the normalized amplitude of the signal over a small duration. This process is repeated until the duration of interest is covered. Later these normalized amplitudes are interpolated and then its mean is calculated to form the feature vector space with a reduced dimension. This feature vector is processed through the prominent data clustering algorithm called kohonen neural networks or self-organizing maps. With the help of this clustering algorithm, the close relation between the different test samples and the groups of the speech samples relative to their correctness is developed. The size of the input neurons is the length of feature vector space and the output neurons are3*3 array. From the results, it can be observed that more improper pronunciations are accumulated on a single output neuron and proper (a) pronunciations are accumulated on various output neurons. The accuracy of the developed kohonen neural network classifier is 82.5%. Valentin Velican, et al [15] developed an automatic system to detect improperly pronounced initial /r/ consonant in Romanian. The author collected the acoustic information from “Logopedic Interscolar Centre – Bucharestto, Romania” to develop the database. The database contains the voice recordings of the words, where /r/ is initial phoneme in the word such as “rac”, “rim” etc. All the words

Load more