View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by MAT Journals

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Recognition of Printed and Handwritten Characters using SVM Classifier

Urmila B. Basarkod 1, Shivanand Patil 2 1P.G. Student, 2Assistant Professor Department of Computer Science and Engineering, KLE Dr. M. S. Sheshgiri College of Engineering and Technology, Udyambag, Belagavi, Karnataka, India E-mail: [email protected]

Abstract The optical character recognition is the process of converting textual scanned image into a computer editable format but one of the major challenges faced is the recognition of character from the image. The proposed system is application software for Recognition of Kannada Printed and Handwritten Characters from an image. The input image is subjected for pre-processing to make the image noise free by using median filter and then it is converted to binary image. Segmentation process is carried out to extract one character from the image by performing horizontal segmentation followed by vertical segmentation. Co- relation coefficient is used for extracting the features from the image then the character is classified using SVM classifier finally the classified character is post-processed using its values to display the recognized character. We have obtained perfectness of 100% and 99% in recognition of Kannada Printed and Handwritten characters respectively.

Keywords: Kannada Script; Optical Character Recognition; Median Filter; Local Binary Pattern, Co-relation Coefficient, SVM Classifier;

INTRODUCTION processing. Each stage result is depending In the present scenario large amount of on the result of previous stage. information is stored in images, thus character recognition has obtained Kannada is Dravidian language significant popularity and it has emerged predominantly spoken by kannada people. as an active area of research, Thus Image It is the official language of Karnataka. processing is mainly concerned about The Kannada script is a combination of 49 extraction of more valuable information letters and these letters are of 3 categories from the images. The recognition of they are: swara includes 13 letters, information begins by identifying the text vyanjanas include 34 letters and yogavaka from the image, this process of identifying includes 2 letters. and recognition text is called Optical Character Recognition (OCR).

Optical Character Recognition is mechanical conversion or electronic conversion of printed or hand written text images into a form that the computer can manipulate. Optical Character Recognition includes five stage of processing those Fig. 1: Kannada letters include pre-processing, segmentation, feature extraction, classification, and post Some languages of Asia like Japanese and Chinese and many European languages

1 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

have used an OCR system. However less recognising text in the field of image effort as put in the OCR system for Indian processing. In [6] author used OCR system languages, particularly for South Indian for handwritten Kannada and digits languages such as Kannada. Thus this recognition they have used zoning features work concentrates on Recognition of along with the K-nearest Neighbour Kannada Characters from an image. classifier and Support Vector Machine classifier independently to identify LITERATURE SURVEY handwritten Kannada and Telugu digits. In[1] author provides general method for They have considered an image consisting Recognition of Character from the image. a digit which is then divided it into 64 This process of recognising text from zones. Computed each zone pixel density image is step by step process which and adopted classifiers independently to includes pre-processing for basic classify the digits. Acquired exactness for processing input image like binarization handwritten Kannada and Telugu digits for converting gray scale image into recognition of 95.50%, 96.22% and Binary image and also for removing noise 99.83% and 99.80% respectively. In [7] from the image, segmentation is the author has considered an image containing processing step that divides image into line Kannada vowels or English characters, and then into character, feature extraction is to normalized it into 32*32, then divided into calculate the characteristics of an image, 64 zones and the pixel density is counted classification is to compare characters for each zone. These features are assigned from database, and at last it is post to KNN and SVM classifiers. In case of processing. This paper provides a brief handwritten Kannada character survey on the extraction of characters from recognition, gained an average perfectness given images. In [3] usually images of 92.71% and 96.0% with KNN and SVM include some noise, because of presence of respectively. For uppercase English these noise quality of an image degrades. alphabet gained an average perfectness of Therefore making an image noise free is 97.51% and 98.26% with KNN and SVM too important in order to maximise the respectively. In[8] Recognised handwritten quality of an image. The process of Kannada and English characters from an removal of noise or improving the quality image based on spatial features. Characters of image is called de-noising. In this an are assigned to KNN classifier. The author has discussed about some list of algorithm is dependent on the type and noise that can be present in the image and size of the structuring element used for the some de-noising techniques those can feature extraction. It is independent of the be used to minimise the noise in the image. image normalisation, thinning, noise and In [5] Recognition of text from an image slant of the characters. Gained a include 5 steps among these classification perfectness of an average percentage is one of the necessary step need to be 96.2% and 90.01% % in case of carried out for recognising the text from handwritten numerical and English the image. The efficiency and accuracy of uppercase alphabets respectively. recognising text highly depends on classification phase. In this paper an author PROPOSED SYSTEM provided briefly about classifiers, to the Aim of the proposed system is to recognise beginners of this field. In this they have the Kannada Printed and Handwritten explained about Support Vector Machine Characters from the considered image (SVM), Artificial Neural Network (ANN), using Optical Character Recognition. The Decision Tree (DT), and KNN classifier, process includes five steps which are highly popular classifiers for

2 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Fig. 2: System Architecture

Step 1: Pre-processing of the image: of an image. Even the accuracy of the An image is considered and it is converted classification / recognition is depending on from RGB - color image into gray scale the result of feature extraction. For image. Then median filter-2 is used to extraction of potential features from filter the image then it is converted to considered image, the image is segmented binary image of 0's and 1's form using by LBP and normalized to size of 42*24 thresholding. Then BWareaopen is used to and after normalizing correlation remove noise from the binirized image and coefficient is used for extraction of text after completion, the image is passed for from image. The result of correlation segmenting the image. coefficient lies between 0 to 1. If the result is 0, there are no similarity exits. If the Step 2: Segmentation process result is 1 both images are exactly the The pre-processed image is considered and same it is segmented horizontally then followed by vertically to extract the single character Step 4: Classification from the image. It considers the image For classification of characters SVM performs horizontal segmentation and (Support Vector Machine) classifier is divides into lines, Then it considers the used. . It provides separation between 2 first line, Performs vertical segmentation linearly separable classes which is on the line and divide it into single achieved by considering a hyper on characters which are independent of other all data points of considered classes. The characters in the line. Repeat this process hyper plane provides proper classification for all the characters in the line, and for all on all data points. Consider two classes A the lines in the image. and B. All the points belongs to class A are labeled +1 and the points belongs to Step 3: Feature Extraction class B are labeled as -1. As shown in Feature extraction is one of the major below figure. steps, which describes the characteristics

3 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Fig 3: SVM Classification

Step 5:Post-Processing The experiment was carried out for Post - Processing is last step of this project Kannada Printed and Handwritten which considers the image belongs to Characters by using SVM as a classifier. class-A which is classified by SVM The result of the experiment is gained by classifier. Assign them unique ASCII using 118 Training examples, and tested value. In this the testing character is by 118 examples. Below pictures show the compared with the ASCII codes or result snapshots of the proposed system Unicode's of the Kannada characters and same procedure is followed for matched character is displayed onto the handwritten characters. Table 1 shows the screen. comparison of different methods those were implemented by different authors Experimental Result with the proposed system.

Checking for printed Characters

Fig 4: Selecting an image

4 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Fig 5: Performing gray scale

Fig 6: performing Binarization

Fig 7: Removing noise from the image

5 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Fig 8: Loading the training data set

Fig 9: Counting number of training data in the set

Fig 10: Identifying the Characters from the image

6 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Fig 11: Recognized Character

Fig 12: Comparing Recognized Character with its Unicode

Fig 13: Comparing Recognized Character with Character in trained set

7 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

Table 1: Comparison of Results Methods Features and Percentage Proposed by Classifier used of Accuracy Tridib Chakraborty Chowdhury Md Mizan, Not mentioned and Suparna Karmaka gururaj mukarambi, mallikarjun hangarge, Zoning features 95.50% B. V. Dhandra K-Nearest Neighbor and SVM Gururaj Mukarambi, density based KNN B.V.Dhandra , Mallikarjun Hangarge zone features wih KNN and 95.50% and 97.32% SVM SVM 96.2% and 98.30% B.V.Dhandra , Zone based fetures with KNN and SVM Mallikarjun Hangarge, KNN and SVM Handwritten Gururaj Mukarambi 95.50% , 96.22% Printed 100.0%, 100.0% Mixed(both) 97.32%, 98.30% Shashikala Parameshwarappa, B.V.Dhandra Spatial features considered 90.01% for KNN Netravati Belagali, Shanmukhappa A. Angadi Zoning based invariant 94.69% moment feature with PNN Srikanta Murthy K HOG with SVM 95.02% Karthik S Proposed method Co-relation coefficient with Printed 100% SVM classifier Handwritten 99%

CONCLUSION REFERENCE In current scenario most of the data stored 1. Chowdhury Md Mizan, Tridib in the form of an image, thus recognition Chakraborty and Suparna Karmakar, " of character plays a major role in Text Recognition using Image extracting the data from an image. There Processing", International Journal of exists a maximum work in regard of Advanced Research in Computer recognising Kannada Characters but there Science, Volume 8, No. 5,Volume 8, exists no standard solution to recognise No. 5, May – June 2017 Kannada characters from the image with 2. Sukhjinder Kaur "Noise Types and reasonable perfectness. In this research Various Removal Techniques". work, presented a application software for IJARECE, Volume 4, Issue Recognition of Kannada Printed and 3. Manshi Shukla and Barjinder Kaur," Handwritten Characters from an image. Image De-noising and its Methods: A Firstly, extracted the image with character, Survey" then image is pre-processed to remove 4. International Journal of Science and noise from the image and then Correlation Research (IJSR), Volume 6 Issue Coefficient is used for extraction of 3,Abdalla Mohamed Hambal, Dr. features from image for Recognition of Zhijun Pei, and Faustini Libent character. The proposed work has Ishabailu, "Image Noise Reduction and provided good perfectness on recognising Filtering Techniques ". characters from image by using SVM 5. Rama Gaur, Dr. V.S. Chouhan, classifier. Gained a perfectness of 100% "Classifiers in Image processing", on Kannada Printed Characters and in case IJFRCSCE Volume: 3 Issue: 6. of Kannada Handwritten Characters 6. Mallikarjun Hangarge, Gururaj gained perfectness of 99.0%. Mukarambi, and B. V. Dhandra 2011,

8 Page 1-9 © MAT Journals 2018. All Rights Reserved

Journal of VLSI Design and Signal Processing Volume 4 Issue 2

" A script independent approach for Mixed Kannada Digits Recognition" handwritten bilingual kannada And ICVCI 2011, Proceedings published by telugu digits recognition " International IJCA. Journal of Machine Intelligence, ISSN: 11. http://en.wikipedia.com 0975–2927 & E-ISSN: 0975–9166, 12. Shashikala Parameshwarappa, and Volume 3, Issue 3, 2011, pp-155-159. B.V.Dhandra, "Basic Kannada 7. Mallikarjun Hangarge, Gururaj Handwritten Character Recognition Mukarambi, and B. V. Dhandra ," System using Shape Based and Handwritten Kannada Vowels and Transform Domain Features", English Character Recognition System International Journal of Advanced ", IJIPVS, Volume-1 Issue-1 ,2012. Research in Computer and 8. Mallikarjun Hangarge, Gururaj Communication Engineering Vol. 4, Mukarambi, and B. V. Dhandra," Issue 7, July 2015. Spatial Features for Handwritten 13. Faouzi Bouchareb, Rachid Hamdi, and Kannada and English Character Mouldi Bedda, " Handwritten Arabic Recognition", RTIPPR, 2010. character recognition based on SVM 9. Vishweshwarayya C. Hallur, Avinash Classifier" A. Malawade, Seema G. Itagi, "Survey 14. Tamim Ahmed Khan, Syed Hassan on Kannada Digits Recognition Using Tanvir, Abu Bakar Yamin," Evaluation OCR Technique", IJARCET, Volume of Optical Character Recognition 1, Issue 10, December 2012. Algorithms and Feature Extraction 10. B.V.Dhandra, Gururaj Mukarambi, Techniques" 2016 IEEE. Mallikarjun Hangarge, “Zone Based 15. http://mathworks.com Features for Handwritten and Printed

BIOGRAPHIES Urmila B. Basarkod is a M.Tech student in the Department of Computer Science & Engg., KLE Dr. M.S. Sheshgiri College of Engg. & Tech., Belagavi. She received B.E. degree in Computer Science& Engineering from S. G. B. I. T College of Engg. & Tech., Belagavi in 2016. Her research interests include Image Processing and Big Data.

Prof. Shivanand M. Patil is a faculty in the Department of Computer Science & Engg., KLE Dr. M.S. Sheshgiri College of Engg. &Tech., Belagavi. He did his Bachelors in Computer Science & Engg. from MMEC, Belagavi and M.Tech in Computer Science& Engg. from Basaveshwar Engineering College, Bagalkot. His research interests include Image Processing and Pattern Recognition. He has number of publications in to his credit in peer reviewed international journals and conferences.

9 Page 1-9 © MAT Journals 2018. All Rights Reserved