Online Bengali Handwritten Numerals Recognition Using Deep Autoencoders
Total Page:16
File Type:pdf, Size:1020Kb
Online Bengali Handwritten Numerals Recognition Using Deep Autoencoders Arghya Pal∗, B. K. Khonglahy, S. Mandaly, Himakshi Choudhuryy, S. R. M. Prasanna y, H. L. Rufinerz and Vineeth N Balasubramanian∗ Department of Computer Science and Engineering Indian Institute of Technology Hyderabad, Telangana 502285 ∗ Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati, Guwahati-781039 y Facultad de Ingeniera y Ciencias Hdricas - Universidad Nacional del Litoral - CONICET, Ciudad Universitaria, Paraje El Pozo, S3000 Santa Fe, Argentina z E-mail: fcs15resch11001, [email protected], fbanriskhem, [email protected], lrufi[email protected] Abstract—This work describes the development of online handwritten Bengali numeral recognition. This work finds its handwritten isolated Bengali numerals using Deep Autoencoder place in various domain like online form filling applications, (DA) based on Multilayer perceptron (MLP) [1]. Autoencoders census data collection, banking operations, number dialing capture the class specific information and the deep version uses many hidden layers and a final classification layer to accomplish systems and many more. this. DA based on MLP uses the MLP training approach for Some of the works that recognizes online handwritten its training. Different configurations of the DA are examined characters from different perspectives include HMM based to find the best DA classifier. Then an optimization technique handwriting recognition systems for Bangla [3], Tamil [4], have been adopted to reduce the overall weight space of the Telugu [5], Assamese [6], SVM based system for Tamil [7] DA based on MLP that in turn makes it suitable for a real time application. The performance of the DA based system is and Devanagari [8]. There are reported claims on handwritten compared with systems constructed using Hidden Markov Model Bengali characters using MLPs [9], [10], but those works (HMM) and Support Vector Machine (SVM). The confusion mainly focus on offline recognition, which is not compatible matrices of DA, HMM and SVM are analyzed in order to make a with the present study of online recognition. hybrid numeral recognizer system. It is found that hybrid system A central goal of this work is to classify handwritten online gives better performance than each of the individual systems, where the average recognition performances of DA, HMM and handwritten Bengali numerals by using DA based on MLP. SVM systems are 97:74%, 97:5% and 98:14%, respectively and By adding a final classification layer DA was converted into hybrid system gives a performance of 99:18%. a Deep Classifier (DC). The final classification layer worked mainly based on “winner takes all” philosophy. Best configura- I. INTRODUCTION tion of DC was chosen as per the validation set performance. Autoencoders are machine learning tools having same out- The present work is different from [1], in two ways. First, put vector dimension as input vector dimension aiming to we have used an approximation of standard backpropagation reconstruct input with minimum reconstruction error. If the known as “Marquard Bacpropagation Algoritm” (MBP) [11] number of hidden layers are more than one, the autoencoder to incorporate a higher guarantee to converge. For the rest of is considered to be Deep. Autoencoder network based on the paper, backpropagation will imply MBP and not the simple Restricted Boltzmann Machine (RBM) proposed in [2] is used backpropagation. Secondly, inspired by the work in [12], we to form a Deep Network. This work is motivated by [1], where further applied an optimization technique to the weight space Deep Autoencoder (DA) based on Multilayer perceptrons of the deep network to reduce its size and yields a network (MLP), trained through the back propagation algorithm has with good recognition time making it feasible for a real-time been explored for the task of speech emotion recognition. application. The average recognition rate of the DA was then Good initial weights of DA were obtained using a method compared with HMM and SVM and class-wise performances of pre-training and was shown to give promising results. It of the DA, HMM and SVM systems were analyzed, to build is here worthy to mention that though the functional form a hybrid system using the algorithm combination, so as to of RBM and autoencoder are quite similar, their training further improve the recognition rate. A similar kind of work procedure and interpretation are quit different. But both of can be found in [13], where the combination of HMM- them can be extended to form a deep network to perform SVM had been done using test set performance. But the ) Research Center for Signals, Systems and Computational Intelligence (fich.unl.edu.ar/sinc) i several classification and regression tasks. In this work we main difference is that our work has taken validation set have explored the use of DA (from now onwards we use DA to sinc( Arghya Pal, B. K. Khonglah, S. Mandal, Himakshi Choudhury, R. M. Prasanna, H. L. Rufiner & Vineeth N. Balasubramanian; "Online Bengali Handwritten Numerals Recognition Using Deep Autoencoders" Proc. of the 22nd National Conference on Communications (NCC 2016), mar, 2016. performance to combine those state-of-art systems. mean Deep Autoencoder based on MLP) for the task of online The paper is organized as follows: Section II describes data 978-1-5090-2361-5/16/$31.00 c 2016 IEEE collection and feature extraction method. Online Bengali hand- Fig. 1. Last row represents basic Bengali Numerals (one-to-one relation with Indo-Arabic numerals) Fig. 2. Autoencoder with input, first hidden layer and reconstruction of input; i. e. i + h1+ reconstruction of i written recognition system using DA classifier is described in Section III. Section IV explores a technique to reduce the DA network. Section V explores the use of other state- of-art systems such as HMM and SVMs for online Bengali handwritten recognition. Hybrid system is described in Section VI. The summary and conclusion is given in Section VII. Fig. 3. Autoencoder with first hidden layer, second hidden layer and reconstruction of first hidden layer; i. e.h1 + h2+reconstruction of h1 II. DATA COLLECTION AND FEATURE EXTRACTION Having a one-to-one relation with Indo-Arabic numeral set, III. ONLINE NUMERAL RECOGNITION USING DEEP Bengali numeral consists of ten characters (Fig. 1). In the AUTOENCODERS (DA) present study, we considered recognition of isolated Bengali numeral mentioned in Fig. 1. Raw data was collected from 200 Deep Autoencoder (DA) network is feed forward and fully native Bengali writers using an open source toolkit provided by connected network consisting of an input layer i, two hidden layers h1 and h2 and one output layer o. Six feature set xp, C-DAC Pune on the Lenovo ThinkPad PC in different sessions, 0 0 00 00 without putting any constraint to the writers. A mixture of yp, xp, yp, xp and yp , each having sixty equidistant points frequent writers, non-frequent writers, writers form different acquired from feature extraction set is unrolled to form a 360 age groups, professions was collected to ensure robustness of dimensional feature vector which is the input to DA. Making the system for a real world situation. At each session each the input and output class fixed, the number of units of each writer had given ten samples of each numeric character and hidden layer was modified in order to find the best classifier. that made a total count of 30000 raw samples per class. Different from the greedy layer wise training algorithm in From that collected raw data 20000 samples of each numeric [14], a strategy has been taken by [1] which finds good initial characters had been reserved for training and 5000 samples weights by learning one layer of feature at a time known as of each numeric characters had been reserved for both testing pre-training the DA. To cut a long story short, main steps and validation to make a clear segregation between train set, proposed in [1] are as follows: test set and validation dataset. Due to the large variations • Step 1: To initialize weights corresponding to links be- in shape, size, writing style and position in unconstrained tween the input i and first hidden layer h1(i.e. i+h1), an collected raw data (i.e. temporally ordered sequence of pen autoencoder with i + h1+ reconstruction of i had been coordinates (x; y)), some preprocessing techniques like nor- used. Different values of hidden layer was tested (See malization, smoothing, removal of duplicate points and re- Table II) in order to find best autoencoder which gave sampling [13] have been done to find fixed length feature minimum Sum of Squared Errors (SSE) for the validation vector. In general, preprocessing was employed to mitigate dataset. the intraclass variations and to provoke inter class variations. • Step 2: i + h1 (i.e. coder part) has been taken from step During the preprocessing (i.e. normalization, smoothing, re- 1 using those networks which gave a minimum SSE for moval of duplicate points and re-sampling), the first and last train dataset (see Fig. 2). Train and validation set was points of each raw data were unchanged as they contain vital passed to generate output. information [13]. So, preprocessing for each numeral gave 60 • Step 3: Output of the previous step was the input to the equidistant points xp; yp where p = 1 to 60, comprising of next autoencoder, i.e. h1 +h2+ reconstruction of h1. This pen-down, pen-up and other 58 equidistant meaningful points. h1 + h2+ reconstruction of h1, autoencoder initialized Preprocessed coordinates xp; yp along with first derivatives link weights between first hidden layer h and second 0 0 00 00 1 (i.e. xp and yp) and second derivatives (i.e. xp and yp ) of each hidden layer h2.