Handwritten Hangul Recognition Using Deep Convolutional Neural Networks
Total Page:16
File Type:pdf, Size:1020Kb
Handwritten Hangul recognition using deep convolutional neural networks In-Jung Kim1 and Xiaohui Xie2 1School of CSEE, Handong Global University 791-708, Heunghae-eup, Bukgu, Pohang, Gyeongbuk, Republic of Korea 2Department of Computer Science, School of Information and Computer Science University of California, 1 East Peltason Drive, Irvine, CA 92697, U.S.A. Email: [email protected], [email protected] Tel: +82-54-260-1385, FAX: +82-54-260-1976 Abstract In spite of the advances in recognition technology, handwritten Hangul recognition (HHR) remains largely unsolved due to the presence of many confusing characters and excessive cursiveness in Hangul handwritings. Even the best existing recognizers do not lead to satisfactory performance for practical applications, and have much lower performance than those developed for Chinese or alpha-numeric characters. To improve the performance of HHR, here we develop a new type of recognizers based on deep neural networks, which have recently shown excellent performance in many pattern recognition and machine learning problems, but have not been attempted for HHR. We build our Hangul recognizers based on deep convolutional neural networks, and propose several novel techniques to improve the performance and training speed of the networks. We systematically evaluated the performance of our recognizers on two public Hangul image databases, SERI95a and PE92. Using our framework, we were able to achieve a recognition rate of 95.96% on SERI95a, and 92.92% on PE92. Compared to the previous best records of 93.71% on SERI95a and 87.70% on PE92, our results provided improvements of 2.25% and 5.22%, respectively. These improvements lead to error reduction rates of 35.71% on SERI95a and 42.44% on PE92, relative to the previous lowest error rates. Such improvement fills a significant gap between practical requirement and the actual performance of Hangul recognizers. Keywords: handwritten Hangul recognition, character recognition, deep convolutional neural network, deep learning, gradient-based learning 1. Introduction Character recognition technology has been developed for decades. For alpha-numeric and Chinese characters, recognition technologies are mature enough to achieve high accuracy. However, in handwritten Hangul recognition (HHR), even the best recognizers cannot yield satisfactory performance for practical applications. The difficulty of handwritten Hangul recognition is mainly caused by a multitude of confusing characters and excessive cursiveness in Hangul handwritings. Figure 1 shows some examples of Hangul characters that are very similar and are often confused by recognizers. Hangul contains numerous such confusing characters. Moreover, shape variation in cursive handwritings makes it even harder to distinguish them. On the SERI95a and the PE92 databases, two most popular public Hangul image databases, the state- of-the-art performances in terms of recognition rate are merely 93.71% and 87.70%, respectively [1][2]. Such poor performance has discouraged the utilization of HHR in practical systems. Figure 1. Examples of Hangul characters On the other hand, in recent years, deep neural networks (DNN) have been highlighted in machine learning and pattern recognition fields. Composed of many layers, DNNs can model much more complicated functions than shallow neural networks [3]. The availability of large scale training data and advances in computing technologies have made the training of such deep networks possible, leading to a widespread adoption of DNNs in many problem domains. For example, deep convolutional neural networks (DCNNs) have shown outstanding performance in many image recognition fields, beating benchmark performances by large margins [4][5][6][7]. However, although DNN has been employed for many pattern recognition systems, it has not been attempted for HHR. We reasoned that DNN could be especially beneficial for HHR for a number of reasons. First, DNN performs features learning and classification within a unified framework. Since features are automatically learned directly from the data themselves, it might be possible to extract subtle features to separate confusing characters in Hangul handwritings. Second, DNN is very good at extracting high-level features. In particular, the convolution and max-pooling layers used by DCNN have been shown to be very effective in handling shape variations, which will likely be key in handling the excessive cursiveness in Hangul writings. Therefore, DCNN seem to be well-posed to overcome the two main difficulties in HHR. In this research, we build handwritten Hangul recognizers using DCNNs. Then, we improve the performance and the training speed of the recognizers by applying a few improvement techniques. Using the system we built, we are able to achieve a recognition rate of 95.26% on SERI95a and 92.92% on PE92. Compared to the previous best records of 93.71% on SERI95a and 87.70% on PE92, our results provided improvements of 2.25% and 5.22%, respectively. These improvements lead to error reduction rates of 35.71% on SERI95a and 42.44% on PE92, relative to the previous lowest error rates. Such improvement fills a significant portion of the large gap between practical requirement and the actual performance of Hangul recognizers. The rest of this paper is organized as follows: In Section 2, we briefly review previous works on HHR and deep learning. In Sections 3 and 4, we describe the DCNN-based Hangul recognizer and the training algorithm. In Section 5, we propose several techniques to further improve performance and training speed. In Section 6, we present experimental results on two popular HHR datasets. Conclusions are provided in Section 7. 2. Related Works 2.1. Handwritten Hangul recognition Character recognition methods can be generally grouped into two categories: structural and statistical. The structural method describes the input character as strokes or contour segments, and identifies the class by matching with the structural models of candidate classes. On the other hand, the statistical method represents the character image as a feature vector, and classifies the feature vector using statistical methodologies. Among them, statistical methods are more widely used in practical systems because they are easy to build and are effective in recognizing many character sets, including Chinese characters. However, unlike handwritten Chinese character recognition (HCCR), structural methods outperformed statistical methods in HHR for a long time. We believe the reason is that, in the early days, the statistical methods were insufficient to deal with the multitude of confusing characters and the excessive cursiveness in Hangul handwritings. Kim and Kim proposed a structural method based on hierarchical random graph representation [8]. Given a character image, they extracted strokes and represented them onto an attributed graph, which is matched with character models using a bottom-up matching algorithm. Kang and Kim improved [8] by modeling between-stroke relationships [2]. They extended the hierarchical random graph in [8] by adding another type of nodes to represent relationships between strokes. Then, they matched those nodes with relationships among input strokes. Jang proposed a post- processing method for the structural recognizers in [8] and [2] to improve discrimination ability [9]. The post-processor consists of a set of pair-wise discriminators, each of which is specialized for a pair of graphemes with similar shapes. To build each pair-wise discriminator, they chose parts that separate the character pair, and then, applied statistical methods focusing on those parts. These systems were evaluated on two public handwritten Hangul image databases: SERI95a1 and PE92 1 SERI95a is also known as KU-1. [10][11]. The best performances on SERI95a and PE92 achieved by the structural methods were 93.4% reported in [9] and 87.7% reported in [2], respectively. In the early days of HHR, researchers attempted to use statistical methods to recognize handwritten Hangul [12][13][14]. However, the performance of those methods was much poorer than that of the structural methods, or it is hard to compare their performance to that of other methods because it was measured on small-size private datasets. As a result, statistical methods were not frequently used in HHR. Meanwhile, statistical methods have become the mainstream approach in HCCR and have been improved significantly. Recently, Park et al. applied state-of-the-art statistical methods to HHR and evaluated their performance [1]. Combining non-linear shape normalization, the gradient feature extraction, and the modified quadratic discriminant function (MQDF) classifier, they achieved much better results than the early statistical recognizers. Their best performances were 93.71% on SERI95a and 85.99% on PE92, which are comparable to the performances of the structural recognizers. Specially, the recognition rate 93.71% on SERI95a is even higher than the best result of the structural method, 93.4%. Moreover, it is likely that the performance of statistical methods can be further improved when more training data are available [15]. However, at present neither the structural method nor the statistical method can provide a performance level high enough for practical applications. Consequently, handwritten Hangul recognition remains an unsolved problem. 2.2. Deep neural networks For the past few years, DNNs have produced outstanding results in machine learning and pattern recognition fields.