Recognition of Printed and Handwritten Kannada Characters Using SVM Classifier

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by MAT Journals Journal of VLSI Design and Signal Processing Volume 4 Issue 2 Recognition of Printed and Handwritten Kannada Characters using SVM Classifier Urmila B. Basarkod 1, Shivanand Patil 2 1P.G. Student, 2Assistant Professor Department of Computer Science and Engineering, KLE Dr. M. S. Sheshgiri College of Engineering and Technology, Udyambag, Belagavi, Karnataka, India E-mail: [email protected] Abstract The optical character recognition is the process of converting textual scanned image into a computer editable format but one of the major challenges faced is the recognition of character from the image. The proposed system is application software for Recognition of Kannada Printed and Handwritten Characters from an image. The input image is subjected for pre-processing to make the image noise free by using median filter and then it is converted to binary image. Segmentation process is carried out to extract one character from the image by performing horizontal segmentation followed by vertical segmentation. Co- relation coefficient is used for extracting the features from the image then the character is classified using SVM classifier finally the classified character is post-processed using its Unicode values to display the recognized character. We have obtained perfectness of 100% and 99% in recognition of Kannada Printed and Handwritten characters respectively. Keywords: Kannada Script; Optical Character Recognition; Median Filter; Local Binary Pattern, Co-relation Coefficient, SVM Classifier; INTRODUCTION processing. Each stage result is depending In the present scenario large amount of on the result of previous stage. information is stored in images, thus character recognition has obtained Kannada is Dravidian language significant popularity and it has emerged predominantly spoken by kannada people. as an active area of research, Thus Image It is the official language of Karnataka. processing is mainly concerned about The Kannada script is a combination of 49 extraction of more valuable information letters and these letters are of 3 categories from the images. The recognition of they are: swara includes 13 letters, information begins by identifying the text vyanjanas include 34 letters and yogavaka from the image, this process of identifying includes 2 letters. and recognition text is called Optical Character Recognition (OCR). Optical Character Recognition is mechanical conversion or electronic conversion of printed or hand written text images into a form that the computer can manipulate. Optical Character Recognition includes five stage of processing those Fig. 1: Kannada letters include pre-processing, segmentation, feature extraction, classification, and post Some languages of Asia like Japanese and Chinese and many European languages 1 Page 1-9 © MAT Journals 2018. All Rights Reserved Journal of VLSI Design and Signal Processing Volume 4 Issue 2 have used an OCR system. However less recognising text in the field of image effort as put in the OCR system for Indian processing. In [6] author used OCR system languages, particularly for South Indian for handwritten Kannada and Telugu digits languages such as Kannada. Thus this recognition they have used zoning features work concentrates on Recognition of along with the K-nearest Neighbour Kannada Characters from an image. classifier and Support Vector Machine classifier independently to identify LITERATURE SURVEY handwritten Kannada and Telugu digits. In[1] author provides general method for They have considered an image consisting Recognition of Character from the image. a digit which is then divided it into 64 This process of recognising text from zones. Computed each zone pixel density image is step by step process which and adopted classifiers independently to includes pre-processing for basic classify the digits. Acquired exactness for processing input image like binarization handwritten Kannada and Telugu digits for converting gray scale image into recognition of 95.50%, 96.22% and Binary image and also for removing noise 99.83% and 99.80% respectively. In [7] from the image, segmentation is the author has considered an image containing processing step that divides image into line Kannada vowels or English characters, and then into character, feature extraction is to normalized it into 32*32, then divided into calculate the characteristics of an image, 64 zones and the pixel density is counted classification is to compare characters for each zone. These features are assigned from database, and at last it is post to KNN and SVM classifiers. In case of processing. This paper provides a brief handwritten Kannada character survey on the extraction of characters from recognition, gained an average perfectness given images. In [3] usually images of 92.71% and 96.0% with KNN and SVM include some noise, because of presence of respectively. For uppercase English these noise quality of an image degrades. alphabet gained an average perfectness of Therefore making an image noise free is 97.51% and 98.26% with KNN and SVM too important in order to maximise the respectively. In[8] Recognised handwritten quality of an image. The process of Kannada and English characters from an removal of noise or improving the quality image based on spatial features. Characters of image is called de-noising. In this an are assigned to KNN classifier. The author has discussed about some list of algorithm is dependent on the type and noise that can be present in the image and size of the structuring element used for the some de-noising techniques those can feature extraction. It is independent of the be used to minimise the noise in the image. image normalisation, thinning, noise and In [5] Recognition of text from an image slant of the characters. Gained a include 5 steps among these classification perfectness of an average percentage is one of the necessary step need to be 96.2% and 90.01% % in case of carried out for recognising the text from handwritten numerical and English the image. The efficiency and accuracy of uppercase alphabets respectively. recognising text highly depends on classification phase. In this paper an author PROPOSED SYSTEM provided briefly about classifiers, to the Aim of the proposed system is to recognise beginners of this field. In this they have the Kannada Printed and Handwritten explained about Support Vector Machine Characters from the considered image (SVM), Artificial Neural Network (ANN), using Optical Character Recognition. The Decision Tree (DT), and KNN classifier, process includes five steps which are highly popular classifiers for 2 Page 1-9 © MAT Journals 2018. All Rights Reserved Journal of VLSI Design and Signal Processing Volume 4 Issue 2 Fig. 2: System Architecture Step 1: Pre-processing of the image: of an image. Even the accuracy of the An image is considered and it is converted classification / recognition is depending on from RGB - color image into gray scale the result of feature extraction. For image. Then median filter-2 is used to extraction of potential features from filter the image then it is converted to considered image, the image is segmented binary image of 0's and 1's form using by LBP and normalized to size of 42*24 thresholding. Then BWareaopen is used to and after normalizing correlation remove noise from the binirized image and coefficient is used for extraction of text after completion, the image is passed for from image. The result of correlation segmenting the image. coefficient lies between 0 to 1. If the result is 0, there are no similarity exits. If the Step 2: Segmentation process result is 1 both images are exactly the The pre-processed image is considered and same it is segmented horizontally then followed by vertically to extract the single character Step 4: Classification from the image. It considers the image For classification of characters SVM performs horizontal segmentation and (Support Vector Machine) classifier is divides into lines, Then it considers the used. It provides separation between 2 first line, Performs vertical segmentation linearly separable classes which is on the line and divide it into single achieved by considering a hyper plane on characters which are independent of other all data points of considered classes. The characters in the line. Repeat this process hyper plane provides proper classification for all the characters in the line, and for all on all data points. Consider two classes A the lines in the image. and B. All the points belongs to class A are labeled +1 and the points belongs to Step 3: Feature Extraction class B are labeled as -1. As shown in Feature extraction is one of the major below figure. steps, which describes the characteristics 3 Page 1-9 © MAT Journals 2018. All Rights Reserved Journal of VLSI Design and Signal Processing Volume 4 Issue 2 Fig 3: SVM Classification Step 5:Post-Processing The experiment was carried out for Post - Processing is last step of this project Kannada Printed and Handwritten which considers the image belongs to Characters by using SVM as a classifier. class-A which is classified by SVM The result of the experiment is gained by classifier. Assign them unique ASCII using 118 Training examples, and tested value. In this the testing character is by 118 examples. Below pictures show the compared with the ASCII codes or result snapshots of the proposed system Unicode's of the Kannada characters and same procedure is followed for matched character is displayed onto the handwritten characters. Table 1 shows the screen. comparison of different methods those were implemented by different authors Experimental Result with the proposed system. Checking for printed Characters Fig 4: Selecting an image 4 Page 1-9 © MAT Journals 2018. All Rights Reserved Journal of VLSI Design and Signal Processing Volume 4 Issue 2 Fig 5: Performing gray scale Fig 6: performing Binarization Fig 7: Removing noise from the image 5 Page 1-9 © MAT Journals 2018.

Load more