JASC: Journal of Applied Science and Computations, ISSN NO: 1076-5131

Handwritten Gujarati Numeral Recognition using Wavelet Transform

Mikita Gandhi 1, Dr. V. K. Thakar 2, Dr. H. N. Patel 3
Department of Electronics & Communication, ADIT, New V. V. Nagar
Gujarat Technological University
[email protected] [email protected] [email protected]

Abstract: Handwritten character recognition has been a leading research area over the last few decades. In India, different languages are used in different regions, so a common algorithm cannot be used for all Indian languages. In this paper, feature extraction using the wavelet transform is used for the recognition of handwritten Gujarati numerals. The wavelet transform decomposes the image into four sub-images: one approximation and three detail sub-images. The approximation is used as the feature vector. A total of 5000 samples were used to evaluate the proposed algorithm, of which 80% were used for training and 20% for testing. The accuracy obtained using the KNN classifier was 95.86%, 98.30% and 94.16% for level 2, level 3 and level 4 wavelet decomposition, respectively.

Keywords: Handwritten Gujarati Numeral Recognition, Wavelet Transform, DWT, KNN

I. INTRODUCTION

The basic difficulty in any handwritten document processing system is recognizing the handwritten characters contained in the document. In the field of pattern recognition and image processing, handwritten character recognition remains one of the most challenging ongoing research areas. A lot of work has been carried out on recognizing handwritten characters in English, and several software packages are available for English handwriting recognition, but no such software was available for Indian languages, particularly for Gujarati. Across India, approximately 46 million people (around 4% of the total population of India) speak Gujarati, an Indo-Aryan language.
Unlike English numerals, Gujarati numerals are written with curved lines, so a large variety of writing styles is found in handwritten Gujarati numerals. The ten symbols ૦, ૧, ૨, ૩, ૪, ૫, ૬, ૭, ૮, ૯ represent the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 respectively in the Gujarati numeral system. The objective of the system is to help children learn and recognize numerals in a "learn with play" manner, and to support automatic recognition of amounts written on bank cheques for banking applications and of marks written on answer sheets for academic applications. In this paper, offline handwritten Gujarati numeral recognition is carried out using the wavelet transform as the feature extraction method. The accuracy was measured using a KNN (K-Nearest Neighbor) classifier.

The paper is organized as follows: Section 2 discusses related work on Gujarati numeral recognition using the wavelet transform, Section 3 describes the proposed algorithm, Section 4 illustrates the Discrete Wavelet Transform, KNN is discussed in Section 5, results and discussion are given in Section 6, and Section 7 presents the conclusion and remarks.

II. RELATED WORK

Most people in Gujarat state use Gujarati as their first language for communication. Because handwritten Gujarati numerals appear in many applications, an OCR system is required. A number of recognition systems have been proposed for handwritten Gujarati numerals. A survey of work relevant to handwritten Gujarati numeral recognition is given below.

Kamal Moro et al. [1] in 2013 proposed a pattern-based feature extraction method that sums the pixels along the horizontal, vertical, left-diagonal and right-diagonal directions. Using a neural network classifier, 80.5% accuracy was achieved on 600 testing samples. Baheti M. J. et al. [2] in 2012 used Affine Invariant Moments as the feature extraction method and compared different classifiers.
The algorithm was tested on 1600 samples, of which 50% were used for training and 50% for testing, and achieved recognition rates of 92.28% for SVM, 87.2% for the Gaussian distribution function, 90.04% for the K-NN classifier and 84.1% for the PCA classifier. Swital J. Macwan et al. [3] in 2015 worked on Gujarati character recognition, with testing done on a database of 7800 samples. The SVM classifier was applied to feature sets obtained with the Discrete Wavelet Transform, Discrete Cosine Transform and Discrete Fourier Transform. Preprocessing methods such as noise removal, binarization, segmentation, size normalization, thinning and spur removal were used to increase the recognition rate of the classifier. The accuracy achieved was 89.46% with DWT, 89.31% with DCT and 96.06% with DFT.

Volume VI, Issue IV, April/2019, Page No: 2699

Jignesh Dholakia et al. [4] in 2005 worked on zonal identification. Apurva Desai [5] in 2010 proposed an object-profile-based feature extraction method and achieved 82% success in recognizing Gujarati numerals using a multi-layer feed-forward neural network classifier. V. A. Naik et al. [6] in 2018 used a set of 22 structural and statistical features. These features were generated using chain codes and tested on 2000 numeral samples. Using a polynomial kernel in an SVM classifier, the algorithm achieved 95% accuracy. Dinesh Satange et al. [7] used statistical features to identify offline handwritten Gujarati numerals with a Multi Layer Perceptron (MLP) network and obtained 90% accuracy. Sekhar Mandal et al. [8] in 2011 proposed an algorithm for machine-printed character recognition in the Bangla language. The two-dimensional wavelet transform and gradient information were used for feature extraction. Features from 1475 sample images were applied to a KNN classifier, which achieved 88.95% accuracy. Saleem Pasha et al. [9] in 2015 addressed the problem of handwritten Kannada character recognition.
Statistical features such as Corner Detection, Quadrant Density, Aspect Ratio and Width Features, together with the discrete wavelet transform, were used to extract the features. The preprocessed image was passed through the DWT, and the resulting four sub-bands were then passed through the IDWT to produce a one-dimensional set of 128 features. The combination of these two feature sets with an ANN classifier reached 91% accuracy on 1000 data samples. S. Tharani et al. [10] presented the application of the DWT to Tamil and English handwritten characters. Using the Haar wavelet transform, the feature sets were computed at decomposition levels 1, 2 and 3. Abdurazzag Ali Aburas et al. [11] in 2007 worked on handwritten Arabic characters using the wavelet transform. The Euclidean distance method was used to classify the characters, giving approximately 80% accuracy on average and up to 97.9% for some characters.

III. DATABASE PREPARATION

The database was prepared by 500 writers of all age groups, using pens and pencils with a variety of nib thicknesses, colors and writing styles. Fig. 1 shows a scanned copy of the handwritten Gujarati numeral database.

Fig. 1 Sample copy of handwritten Gujarati numerals

The scanned copy was converted into binary form, and the handwritten numerals were segmented using morphological operations and resized to 64×64 pixels. A single sample of each digit was collected from each writer, so a database of 500 × 10 = 5000 samples was formed.

IV. PROPOSED METHOD

The sample image was first converted into a grayscale image. The Discrete Wavelet Transform (DWT) was then applied to the grayscale image, producing four sub-band images. The approximation component of the sub-band images was taken as the feature set. The wavelet transform is described in detail in the next section. KNN was used as the classification method with k = 1, 3 and 5.
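The feature extraction and classification steps described so far can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the Haar wavelet (the text up to this point does not name the wavelet used), it assumes Euclidean distance for KNN, and the function names `haar_approximation`, `approximation_features` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def haar_approximation(image):
    """One level of the 2-D Haar DWT, keeping only the approximation band.

    `image` is a list of equal-length rows with even height and width.
    Each output value is the sum of a 2x2 block divided by 2
    (orthonormal Haar scaling). Haar is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def approximation_features(image, level):
    """Apply `level` Haar decompositions and flatten the final approximation."""
    band = image
    for _ in range(level):
        band = haar_approximation(band)
    return [value for row in band for value in row]


def knn_predict(train_features, train_labels, query, k):
    """Majority vote among the k training vectors nearest to `query`.

    Euclidean distance is assumed. If several labels tie on votes,
    the label of the nearest tied neighbour is returned.
    """
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

For a 64×64 input, this Haar sketch yields 16×16 = 256, 8×8 = 64 and 4×4 = 16 features at levels 2, 3 and 4 respectively. Other wavelets or boundary-handling modes would give somewhat different sizes.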
Fivefold cross-validation was used to obtain the resulting accuracy. Figure 2 shows the block diagram of the proposed method for handwritten Gujarati numerals.

Read the input image
-> Convert it into a grayscale image
-> Apply the Discrete Wavelet Transform for levels 2, 3 and 4
-> Consider the approximation sub-image as the feature set
-> Classification using KNN
-> Classified numeral

Fig. 2 Block diagram of proposed method

V. FEATURE EXTRACTION METHOD

A. Concepts of Wavelets

A wavelet is a mathematical function used in digital signal processing to recover weak signals from noise, and in image compression for uploading images to the internet. Wavelet analysis can analyze rapidly changing transients, which makes it attractive for various applications. The wavelet transform is a better choice than the Fourier transform when more accurate temporal and frequency information is needed. The wavelet transform provides a time-frequency representation of the signal. To define a wavelet, consider a complex-valued function $\psi$ satisfying the following two conditions:

i. The function $\psi$ has finite energy, that is,

$$\int_{-\infty}^{\infty} |\psi(t)|^2 \, dt < \infty \qquad (1)$$

ii. It satisfies the admissibility condition

$$c_\psi = 2\pi \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|} \, d\omega < \infty \qquad (2)$$

where $\Psi$ is the Fourier transform of $\psi$. This condition implies that if $\Psi(\omega)$ is smooth, then $\Psi(0) = 0$. The function $\psi$ is called the mother wavelet.

The continuous wavelet transform (CWT) of a 1-D signal $f(x)$ is given by

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(x)\, \psi^{*}\!\left(\frac{x - b}{a}\right) dx \qquad (3)$$

where $\psi^{*}$ denotes the complex conjugate of $\psi$, $a$ is the time dilation parameter and $b$ is the translation parameter.

The discrete wavelet transform (DWT) transforms a discrete-time signal into a discrete wavelet representation.
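To make a single DWT level concrete, the following minimal sketch splits an even-length 1-D signal into a low-pass and a high-pass coefficient series, each half the input length. It assumes Haar filters purely for illustration (the text here does not fix a particular wavelet), the sign of the high-pass output is a convention that varies between implementations, and the names `haar_dwt_1d` and `haar_idwt_1d` are hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (low, high): pairwise sums and pairwise differences, each
    scaled by 1/sqrt(2), so both outputs are half as long as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * scale
           for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * scale
            for i in range(0, len(signal), 2)]
    return low, high


def haar_idwt_1d(low, high):
    """Invert haar_dwt_1d, recovering the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for approx, detail in zip(low, high):
        signal.extend([(approx + detail) * scale, (approx - detail) * scale])
    return signal
```

Applying `haar_dwt_1d` again to the low-pass output gives the next decomposition level, which is how multi-level decompositions are built up.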
The DWT converts an input series $X_0, X_1, \ldots, X_{n-1}$ into one high-pass wavelet coefficient series and one low-pass wavelet coefficient series (each of length $n/2$), given by

$$H_i = \sum_{n=0}^{k-1} X_{2i-n}\, s_n(Z) \qquad (4)$$

$$L_i = \sum_{n=0}^{k-1} X_{2i-n}\, t_n(Z) \qquad (5)$$

where $s_n(Z)$ and $t_n(Z)$ are called wavelet filters, $k$ is the length of the filter and $i = 0, \ldots, [n/2] - 1$. Inside the sums, $n$ is the summation index, whereas in $X_{n-1}$ and $[n/2]$ it denotes the length of the input series.

The most popular mother wavelets are shown in Figure 3.

Fig. 3 Mother wavelets

B.