Multi-Class Support Vector Machine Based Continuous Voiced Odia Numerals Recognition
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 10, OCTOBER 2019 ISSN 2277-8616

Prithviraj Mohanty, Ajit Kumar Nayak

Abstract: With the rapid advancement of automatic speech-recognition technologies, speech-based machine interaction has attracted the attention of many researchers seeking to move their approaches from the research laboratory to real-life applications. A continuous voiced numeral recognition system is useful for physically challenged persons (blind people) or elderly people who wish to hold a telephonic conversation, set the PIN for their debit and credit cards, or devise a security code for an application without physically touching the system. The work presented in this paper emphasizes the recognition of continuous Odia numerals using a multi-class Support Vector Machine (SVM). Three popular feature extraction techniques, PLP, LPC and MFCC, are used to extract the feature parameters from voiced numerals, which are fed as input to the recognition process. Different kernel mapping functions, namely polynomial, sigmoid, Radial Basis Function (RBF) and wavelet, are used to map the non-linear input feature space of framed signals to a linear high-dimensional feature space. To recognize the Odia numerals, multi-class SVM models are constructed using the One-Versus-All (OVA) and Half-Versus-Half (HVH) techniques. For the proposed system, diverse experiments have been performed, and the results are analyzed over the multi-class SVM models considering different feature extraction techniques with various kernel mapping functions. It has been observed that the OVA SVM model with MFCC for feature extraction and wavelet as the kernel mapping function provides better accuracy than the other variations tested.

Index Terms: SVM, PLP, LPC, MFCC, Sigmoid, RBF, Wavelet, OVA, HVH.

1. INTRODUCTION

AUTOMATIC speech recognition (ASR) by machine is regarded as one of the most active and exciting fields of research and has been studied for more than 50 years. The main objective of an ASR system is to transcribe input voiced utterances into the corresponding text. An ASR system can be used to authenticate users via their voice signals and to execute activities in the form of commands specified by the human [1]. ASR is also treated as one of the active applications of speech processing and is commonly used for human-machine interaction. ASR applications are crucial in voice-based activities, automatic debit and credit card activation, and safety and investigation services. At present, most smartphones, laptops and tablets incorporate ASR-based software such as OK Google, Apple Siri and Microsoft Cortana to make life simpler and more productive [2].

Automatic numeral recognition of spoken utterances has attracted considerable attention because numerical data such as account numbers, debit and credit card numbers, and telephone numbers can be entered into the machine conveniently by voice. Isolated word recognition and spoken digit recognition are mostly applicable to data entry automation, generation of PIN codes used in various services, and automation in banking and security systems. Similarly, continuous voice-based numeral recognition is generally helpful for automatically dialing telephone numbers [3].

In early research on speech recognition systems, classifiers based on the Hidden Markov Model (HMM), a statistical method, were employed to evaluate the acoustic probability. The parameters of the HMM are computed using the maximum likelihood estimation (MLE) algorithm [4]. Since the parameters are estimated using only the input class data and depend mainly on the prior probability distribution, the HMM exhibits poor classification performance. A technique that classifies better is therefore required. During the last few decades, researchers have proposed alternative approaches that perform better than HMM. Most of these approaches are based on Artificial Neural Networks [5] or the hybrid HMM-ANN approach [6]. A newer machine learning technique such as SVM, which has good generalization and convergence properties, can be used as a better classifier. SVM is generally a linear classifier, but the use of different kernel mapping functions permits it to function as a non-linear classifier in a high-dimensional feature space [7].

Much progress has already been made in ASR for popular spoken languages such as English, French, Chinese, Japanese and Mandarin. ASR systems exist for these languages and continue to evolve. Currently, many research works are under way on Indian languages such as Hindi, Bengali, Kannada, Telugu, Marathi, Punjabi and Gujarati. ASR for an Indian language like Odia is still less advanced owing to the absence of computational linguistic resources. ASR systems for the Odia language have been found inefficient even though it is spoken by approximately 33 million people in India, and only a few research works exist for Odia. The non-availability of advanced ASR software for the Odia language, together with regional interest, motivates further research effort.

In this paper, we propose a system for recognizing continuous voiced Odia numerals using support vector machines. The system is implemented by considering voiced mobile numbers spoken in the Odia language. First, the continuous voiced numerals are preprocessed and segmented into isolated numerals. Then different feature extraction methods, namely MFCC, LPC and PLP, are applied. These features are input to a multi-class SVM for recognition of the numerals.
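As a minimal sketch of the kernel mapping and one-versus-all decision mentioned above, the Python fragment below evaluates polynomial, sigmoid and RBF kernels on two feature vectors and picks the class whose binary SVM gives the largest decision value. The function names, the parameter defaults (`degree`, `gamma`, `coef0`) and any scores passed to `ova_predict` are illustrative assumptions, not values from this paper. The wavelet kernel is omitted because its form is not given in this part of the text.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree (defaults are assumptions)."""
    return (dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * (x . y) + coef0) (defaults are assumptions)."""
    return math.tanh(gamma * dot(x, y) + coef0)


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2) (default gamma is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def ova_predict(scores):
    """One-versus-all decision rule.

    `scores` maps each class label to the decision value of the binary SVM
    trained to separate that class from all others (hypothetical inputs here).
    The predicted label is the one with the largest decision value.
    """
    return max(scores, key=scores.get)
```

In a full recognizer, the decision values passed to `ova_predict` would come from one trained binary SVM per numeral class, each evaluated on the feature vector of a segmented numeral. The sketch only shows the kernel evaluations and the final selection step.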
The input voiced numeral signal contains many different features that are non-linear in nature and cannot be classified easily by SVM. So, various kernel mapping functions such as polynomial, sigmoid, RBF and wavelet are used to map from the non-linear input space to a linear feature space [8]. The efficiency of the system is computed using the above mapping functions along with different multi-class SVM classifiers.

The rest of the paper is organized as follows. Section 2 describes the related work on word and digit recognition for different languages. Section 3 outlines the proposed model along with the fundamentals of the SVM classifier. The experimental results with comparisons are presented in Section 4. Finally, Section 5 concludes the paper and suggests future directions.

Prithviraj Mohanty, Department of CS&IT, ITER, S’O’A (Deemed to be) University, Bhubaneswar, India. E-mail: [email protected]
Dr. Ajit Kumar Nayak, Department of CS&IT, ITER, S’O’A (Deemed to be) University, Bhubaneswar, India. E-mail: [email protected]

2. RELATED WORK

Continuous voiced numeral recognition is treated as a design technique for a voice dialer system. Such a system is particularly valuable for physically challenged (blind) or aged people, allowing them to hold a telephonic exchange without physically dialing the numbers. It is also helpful for illiterate people who can speak the numerals but cannot recognize them accurately. A number of research works have been proposed to recognize numerals in different languages. Our discussion focuses on work related to isolated word recognition and digit/numeral recognition for different spoken languages, with different parametric representations of speech and various classification methods. Isolated word and digit recognition for various languages with different techniques, along with their performance, has been presented in [9]. An Odia word recognition system based on the HMM, intended for visually impaired students in school and public education, was proposed in [11]. S. Mohanty et al. developed a model where speech recognition and speaker verification has been

classifications. A continuous speech recognition system with SVM that takes decisions at the frame level, together with a token passing method for finding the sequence of recognized words, has been proposed in [21]. Mittal et al. proposed a multi-class SVM for recognition of spoken Hindi digits. They used MFCC, LPC and a mix of both for feature extraction and different kernel mapping functions of SVM for classification. The system was tested with different approaches (one vs. all and ten one vs. all) for SVM classification, with variation in the frames for a signal. A performance comparison of various feature extraction techniques along with other recognition techniques has been reported in [22]. A new technique in which wavelet analysis and SVM are utilized for speaker verification has been proposed by Returi et al. [23]. Wavelet filter banks were used to extract the features, which then distinguish normal from abnormal input voices. Furthermore, the SVM approach was used to separate a particular speaker's signal from multiple dialogs. The reported accuracy was 95% with appropriate classification. Whispered speech recognition using SVM and HMM approaches has been proposed in [24]. The experimental outcomes obtained by the authors suggest that HMM provides better results for the speaker-independent (SI) case, while SVM was found to be superior for the speaker-dependent (SD) case. A hybrid technique which uses both HMM and SVM can be considered