Recognition of Online Handwritten Gurmukhi Strokes Using Convolutional Neural Networks

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks Rishabh Budhouliya1, Rajendra Kumar Sharma1 and Harjeet Singh2 1Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Punjab, India 2Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India frbudhouliya bemba15, [email protected], [email protected] Keywords: Convolutional Neural Networks, Data Augmentation, Stroke Warping, Gurmukhi Strokes, Online Handwritten Character Recognition. Abstract: In this paper, we attempt to explore and experiment multiple variations of Convolutional Neural Networks on the basis of their distributions of trainable parameters between convolution and fully connected layers, so as to achieve a state-of-the-art recognition accuracy on a primary dataset which contains isolated stroke samples of Gurmukhi script characters produced by 190 native writers. Furthermore, we investigate the benefit of data augmentation with synthetically generated samples using an approach called stroke warping on the aforemen- tioned dataset with three variants of a Convolutional Neural Network classifier. It has been found that this approach improves classification performance and reduces over-fitting. We extend this finding by suggesting that stroke warping helps in estimating the inherent variances induced in the original data distribution due to different writing styles and thus, increases the generalisation capacity of the classifier. 1 INTRODUCTION the strokes to achieve a state-of-the-art accuracy. • Explore and evaluate multiple CNN variants on Research in Online Handwritten Recognition Sys- the basis of their distribution of trainable param- tems seems to have peaked since the late 1990s, con- eters between convolution layer and fully con- 0 sidering the fact that Google s Multilingual Online nected layer on datasets with varying data samples Handwritten Recognition System is able to support per class. 22 scripts and 97 languages, which is being currently used by certain commercial products like Google • Study the effect of stroke warping, a technique to Translate successfully (Keysers et al., 2017). Despite produce random variances within the stroke sam- this success in Google0s system, we want to bring cer- ples. The warped augmented dataset is created by tain factors into light which are limited to the scope of applying a combination of affine transformation Indo-Aryan Languages - (rotation) and elastic distortions to images of the existing stroke samples. • Each written language has a strong variability as- sociated with the writing style depending upon • Experimentally evaluate the effect of data aug- certain demographic factors like region, age and mentation on classification accuracy of the clas- culture. sifier. • In the case of such languages, major issues tackled This paper is structured as follows. In section II, there by any researcher include the shape complexity of is a discussion about the development of character the characters, features for recognition of charac- recognition and the progression in Gurmukhi script ters and stroke level recognition. recognition. In section III, we introduce the Gur- mukhi script, covering its features and the different Motivated by these factors, our work aims to: styles of writing the script. The generation of syn- • Use a primary dataset called OHWR-Gurmukhi, thetic data is covered in section IV, where we use described in section IV, to sub sample a dataset the technique called stroke warping, to augment the which includes 79 stroke classes with 100 sam- dataset 100. After that, we delve into our main idea ples each, referred as dataset 100. A Convolu- in section V, an experiment which tests three classi- tional Neural Network has been used to classify fiers to validate the benefit of data augmentation and to achieve an optimal classification accuracy. In sec- nition was built to detect and aggregate strokes us- tion VI, the framework of the experiment is laid out ing set theory to recognize characters with an accu- and results of the performed experiments are shown. racy of 95.60% for single character stroke sequencing We discuss the result and their consequences on the (Kumar and Sharma, 2013). This work was done on objectives of the paper in section VII and we conclude a dataset of 27,231 samples categorized on the ba- the findings in section VIII. sis of the proficiency of the writers. A significant shift came after Hidden Markov Models(HMM) and Support Vector Machines(SVM) were used for clas- 2 RELATED WORK sification while employing the features extracted on the basis of region and cursiveness. This experiment resulted in a 96.70% recognition rate of Gurmukhi LeNet was the very first Convolutional Neural Net- characters (Verma and Sharma, 2016). Their experi- work used for visual detection tasks including char- ments consisted of the methods to extract features and acter recognition and document analysis (Jackel et al., then classify them using SVM or HMM for classifica- 1995). This neural network was used to extract local tion. Our aim in this work is to use the Deep Learn- geometric features from the input image in a way that ing concept of learning features and then performing preserved approximate relative locations of these fea- extensive experimentation using CNNs to obtain bet- tures. By convolving the input image with a trainable ter recognition accuracy. One of the main advantage kernel, the network was able to produce high level of using a CNN is that it is able to extract features feature maps which were then fed to a linear classifi- automatically and is invariant to shift and distortion cation layer. For the system they created, an overall (Wong et al., 2016). OCR accuracy exceeding 99.00% was achieved. Owing to the continuous academic research in the Online Handwritten Chinese Character Recogni- tion, it has been demonstrated (Xiao et al., 2017) that 3 GURMUKHI SCRIPT methods based on CNNs can learn more discrimi- native features from source data, which may lead to Punjabi language is spoken by about 130 million peo- a better end-to-end solution for Online Handwritten ple, mainly in West Punjab in Pakistan and in East Recognition problems. The authors of this paper con- Punjab in India. Indian Punjabi is written using the tinued to design a compact CNN classifier for On- Gurmukhi script, which has a fairly complex system line Handwritten Chinese Character Recognition us- of tonal variance. ing DropWeight for pruning redundant connections Some notable features of Gurmukhi script are : in a CNN architecture maintaining an accuracy of • Gurmukhi script is cursive and written in left to 96.88%. Handwritten Bangla Digit Recognition us- right direction with top down approach. ing a CNN with Gaussian and Gabor Filters, achieved • A horizontal line, called a “shirorekha” is found 98.78% recognition accuracy (Alom et al., 2017). on the upper part of almost all the characters. Along with working on improving the recognition accuracy of the classifiers, researchers also worked • Any Gurmukhi word can be divided into three upon writer adaptation for online handwritten recog- sections viz. Upper Zone, Middle Zone and the nition where they used lexemes to identify the styles Lower Zone. All the strokes are classified into present in a particular writer’s sample data which re- one of the three zone. The upper zone consists of sulted in the reduction of average error rate on hand- the region above the head line where some of the written words (Connell and Jain, 2002). vowels reside. The middle zone is the most popu- In the present paper, we demonstrate a method lated zone, consisting of consonants and some of to capture the variability induced by different writ- the vowels. The lower zone contains some vow- ing styles, thus enhancing the generalization accu- els and half characters that lie below the foot of racy of the classifier. It is pertinent here to discuss consonants (Verma and Sharma, 2017). the progression in Gurmukhi script recognition. A recognizer using pre-processing algorithms (Normal- 3.1 Different Styles of Writing ization, Interpolation and Slant Correction) has been Gurmukhi Script proposed to recognize loops, headline, straight line and dot features from online handwritten Gurmukhi The critical area of research in Character recognition strokes collected on a pen-tablet interface by 60 writ- is to capture and detect the complex nature of any ers (Sharma et al., 2007). After this, a post proces- script that results in the variation of writing styles. sor for improving the accuracy of character recog- The documented reasons for variation in handwriting Figure 1: Stroke classes for Gurmukhi character set. style can be attributed to a distinct way of writing for each person, and the ways can be: Figure 2: The distribution of dataset 100 into training and • Speed of writing test set. • Style of holding the pen validation set was created due to the small size of the • Formation of a character can be influenced by dataset 100. the amount of strokes used to create a character. An important question to be answered is why does Some users use a single stroke while others may an increase in data size by augmentation of sample use multiple strokes to write the same thing. images increases the generalization accuracy? Also, if there is an increase, could it be attributed to the fact 3.2 Data Collection that stroke warping is able to capture the variability in the empirical population due to the writer’s personal The source population of our dataset is created by 190 characteristics as discussed above. To answer these writers of different age group to bring maximum vari- questions, we have set-up an experiment where the ability in the population. A touch based, Tablet PC correlation between change in nature of the dataset has been used as input interface. The collected data due to addition of synthetic data samples and increase was annotated at stroke level with respective stroke in recognition accuracy can be seen and verified by classes characterized by three zones; Upper, Middle comparing performances of three different CNN clas- and the Lower zone as shown in Figure 1.

Load more