Hangul Tree Classifier for Type Clustering Using Horizontal and Vertical Strokes

Young-Bin Kwon
Computer Vision Lab., Dept. of Computer Eng., Chung-Ang University, Seoul, 156-756, Korea
1051-4651/02 $17.00 (c) 2002 IEEE

Abstract

Hangul characters are generally clustered into six types. Hangul type classification is most effective when performed at the syllable level, but it can only be applied as a pre-processing stage for the Jaso-partition process. In this research, we propose a method for classifying Hangul characters into six types that strengthens the Jaso-partition process through the effect of a high-level classification. This paper also proposes a method for extracting the main strokes of the horizontal and vertical vowels that appear in their regions.

1. Introduction

Hangul, the Korean script, has a much larger number of character combinations than English or Japanese, and these combinations are more complex, which makes Hangul even more difficult to recognize [1][7]. Research on Hangul recognition has produced promising results with two approaches: Jaso-unit (grapheme-unit) recognition and syllable-unit recognition. In Jaso-unit recognition [2,3], the number of patterns to be matched drops to 40, which reduces recognition time. Syllable-unit recognition does not require a Jaso-partition process, but it has a serious problem of its own: the number of characters to be recognized reaches 11,172. Most studies that use syllable-unit recognition therefore operate only on a restricted set of characters selected by frequency of occurrence.

A study by Do [4,5] that applies a structural method obtains all the runs in four directions (horizontal, vertical, diagonal and anti-diagonal) and then sets the left-most and top points of the image as the base locations for obtaining the horizontal and vertical strokes. Lee [6] suggested the MRLP method as a feature for type classification; MRLP creates a histogram by projecting the maximum-length run found in each row or column. The method implemented by Lee [3] extracts a mesh vector from sub-regions predefined according to the character style and determines the type with a type-classification neural network that takes the mesh vector as its input.

Type clustering acts as a coarse classification in syllable matching and serves as a pre-processing stage for segmentation in grapheme matching. In this paper, we define a set of grapheme regions for the initial consonant, the vowel and the final consonant that can absorb variations in character shape as well as noise.

2. Type-categorization based on the principles of Hunminjeongum

A Jaso (similar to a grapheme) of Hangul is formed by combining basic strokes, and an Umjeol (similar to a syllable; the phonetic unit written as a single Korean character) is formed by a two-dimensional combination of Jaso, which allows Hangul to express most natural sounds. Hangul recognition should therefore ultimately be based on Jaso units. Because Jaso segmentation and Jaso region establishment are essential to Jaso-unit recognition, a reliable Hangul type classification method with a high classification rate is required.

2.1. The essence of the foundation of Hunminjeongum

Hangul has 14 basic consonants: ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ, ㅅ, ㅇ, ㅈ, ㅊ, ㅋ, ㅌ, ㅍ and ㅎ, and 10 basic vowels: ㅏ, ㅑ, ㅓ, ㅕ, ㅗ, ㅛ, ㅜ, ㅠ, ㅡ and ㅣ. Three rules for extending the set of opening phoneme elements (initial consonants), middle phoneme elements (vowels) and closing phoneme elements (final consonants) are defined in the explanatory edition (Haerye) of Hunminjeongum, in its explanation of Jaso combination.

2.2. Hangul types and the Jaso region

Hangul characters can be classified into six types according to the shape of the vowel and the presence of a closing phoneme element. In our study, Hangul characters are first classified into three types by vowel shape, as shown in Table 1, and each of these is then split into two categories by the presence or absence of a closing phoneme element.

Table 1. Type classification of Hangul

Each region illustrated in Figure 1 is defined as an area in which only a portion of a Jaso appears. A portion of the opening phoneme element always appears in the opening-consonant region, and a portion of the vowel appears in the horizontal-vowel or vertical-vowel region whenever such a vowel exists. If a closing phoneme element exists, part of it appears in the closing-consonant region, with the exception of 'ㄱ' and 'ㅋ'. We can therefore extract the desired Jaso from the corresponding region effectively by examining the connected components in that region. The proposed Jaso regions are more robust against variations in character type and against noise than the regions used in related studies [4,5].

Figure 1. Essential Jaso Region for Hangul characters

2.3. Proposed Tree Classifier

A tree classifier obtains stable features by simplifying complex features step by step and gathering the characters into types; the types are then further classified within each group. In this study, we extended the tree classifier of [4], which is dedicated to the structural characteristics of Hangul, as shown in Figure 3. The proposed tree classifier performs horizontal vowel classification at the bottom level of the tree. The region in which a horizontal vowel can appear and the constraints on the number of horizontal branches are used to determine whether a horizontal vowel exists.

Figure 2. Stroke extraction using the essential region: (a) original character, (b) horizontal stroke, (c) vertical stroke

Extracting the vertical stroke and the horizontal stroke

The first extraction step obtains the horizontal and vertical runs that are longer than the maximum stroke thickness and then merges neighbouring runs. This method effectively absorbs variations in the vertical and horizontal strokes caused by slant or noise. Figure 2-(a) is the normalized input character image, and Figure 2-(b) and (c) show the extracted horizontal and vertical strokes. The regions marked 'O' are the portions of the strokes that appear in the horizontal vowel region and the vertical vowel region. However, because several strokes may be merged together by this method, a method for separating them is required. The separation relies on detecting the divergence at the peak and the change in stroke thickness.

Distinguishing between the vertical vowel and the long horizontal vowel

The vertical vowel of a Hangul character can be obtained by finding, among the vertical strokes extracted from the vertical vowel region, the stroke that fits the characteristics of a vertical vowel. Equivalently, the strokes that are not vertical vowels are discarded from the candidate set. Because the vertical vowel region lies in the upper-right corner, an extracted vertical stroke that is not a vertical vowel can be assumed to come from the opening phoneme element consonant. We have analyzed the types of vertical strokes that an opening consonant can produce. Because ㄱ, ㄲ, ㅁ, ㅂ, ㅃ and ㅋ have a vertical stroke on their right-hand side, their strokes must be discarded when they appear in the vertical vowel region. ㄱ, ㄲ, ㅁ and ㅋ can be discarded using the horizontal stroke that appears at the top, and ㅂ and ㅃ can be discarded using the vertical stroke on the left-hand side together with the horizontal stroke that appears at the bottom. If a long horizontal vowel is detected reliably, all vertical vowel candidates can be discarded, because a vertical stroke cannot co-exist with a long horizontal vowel.

the process of estimating the number of horizontal branches. In the PMS, if the number of horizontal branches is n, the count is taken to be n-1 when a horizontal branch is in contact with a vertical branch. This relaxes the association rule for horizontal strokes by reducing the number of horizontal branches, as shown in Table 2.

Table 2. Combination rule of horizontal and vertical vowels (○: allowed, ×: not allowed)

h-vowel \ v-vowel | ㅣ (h-branch: 2) | ㅏ, ㅐ (h-branch: 1, right) | ㅓ, ㅔ (h-branch: 1, left) | ㅑ, ㅒ, ㅕ, ㅖ (h-branch: 3)
ㅗ | ○ | ○ | × | ×
ㅜ | ○ | × | ○ | ×
ㅡ | ○ | × | × | ×
ㅛ, ㅠ | × | × | × | ×

Figure 3. Proposed Tree Classification

Distinguishing the closing phoneme element

Detecting the closing phoneme element (the final consonant, or batchim) falls into two cases according to the vowel type, as shown in the tree classifier of Figure 3. If a vertical vowel exists, the main cue is the bottom point of the corresponding vertical stroke. If the bottom point of the vertical stroke is smaller than the threshold, we conclude that a closing phoneme element exists; if it is larger than the threshold, the presence of a closing phoneme element is determined from the characteristics of the left-side branch generated from the vertical stroke.

In a closing vowel with a single vertical stroke, the length of the horizontal run that consists of a horizontal

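The first step under "Extracting the vertical stroke and the horizontal stroke" (keep runs longer than the maximum stroke thickness, then merge neighbouring runs) can be sketched on a binary image stored as a list of 0/1 rows. This is a minimal sketch under stated assumptions: the function names, the use of `max_thickness + 1` as the minimum run length, and the merge rule (a run joins a stroke when it lies on the adjacent row or column and overlaps that stroke's last run) are hypothetical choices. The paper's separation step (divergence at the peak, change in thickness) is not modelled.

```python
def runs_in_line(line, index, min_len):
    """Collect (index, start, end) runs of 1-pixels at least min_len long; end is exclusive."""
    runs, start = [], None
    for i, px in enumerate(list(line) + [0]):   # trailing 0 closes a run at the edge
        if px and start is None:
            start = i
        elif not px and start is not None:
            if i - start >= min_len:
                runs.append((index, start, i))
            start = None
    return runs

def extract_strokes(image, max_thickness, horizontal=True):
    """Group long runs on neighbouring lines into strokes (hypothetical merge rule).

    A horizontal run longer than max_thickness is taken to lie along a
    horizontal stroke rather than across a vertical one (and vice versa).
    """
    # For vertical strokes, scan columns by transposing the image.
    lines = image if horizontal else [list(col) for col in zip(*image)]
    runs = []
    for idx, line in enumerate(lines):
        # Assumption: "longer than the maximum thickness" means length >= max_thickness + 1.
        runs.extend(runs_in_line(line, idx, max_thickness + 1))

    strokes = []
    for run in runs:                  # runs are ordered by line index
        idx, s, e = run
        for stroke in strokes:
            last_idx, ls, le = stroke[-1]
            # Assumption: merge when on the adjacent line and the spans overlap.
            if last_idx == idx - 1 and s < le and ls < e:
                stroke.append(run)
                break
        else:
            strokes.append([run])     # no neighbouring stroke: start a new one
    return strokes
```

Passing `horizontal=False` transposes the image so the same code collects vertical strokes. For a bar drawn over a short stem, the horizontal pass returns only the bar and the vertical pass returns only the stem, because each stroke's crossing runs are no longer than the stroke thickness.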
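Table 2 can be read as a lookup from a detected horizontal vowel to the vertical-vowel groups it may combine with. The sketch below encodes that reading, treating each ○ cell as "combination allowed". Its allowed pairs coincide with the seven compound vowels of standard Hangul (ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ). The names `ALLOWED`, `combine` and `prune_candidates`, and the idea of using the table to prune vertical-vowel candidates, are illustrative assumptions rather than the paper's stated procedure.

```python
# Sketch: Table 2 as a lookup (assumption: each ○ cell means "combination allowed").
# Keys are horizontal vowels; values are the vertical vowels they may combine with.
ALLOWED = {
    "ㅗ": {"ㅣ", "ㅏ", "ㅐ"},   # ㅣ column and the right-branch column
    "ㅜ": {"ㅣ", "ㅓ", "ㅔ"},   # ㅣ column and the left-branch column
    "ㅡ": {"ㅣ"},
    "ㅛ": set(),
    "ㅠ": set(),
}

# Compound vowel produced by each allowed pair (standard Hangul orthography).
COMPOUND = {
    ("ㅗ", "ㅏ"): "ㅘ", ("ㅗ", "ㅐ"): "ㅙ", ("ㅗ", "ㅣ"): "ㅚ",
    ("ㅜ", "ㅓ"): "ㅝ", ("ㅜ", "ㅔ"): "ㅞ", ("ㅜ", "ㅣ"): "ㅟ",
    ("ㅡ", "ㅣ"): "ㅢ",
}

def combine(h_vowel, v_vowel):
    """Return the compound vowel for an allowed pair, or None if Table 2 forbids it."""
    if v_vowel in ALLOWED.get(h_vowel, set()):
        return COMPOUND[(h_vowel, v_vowel)]
    return None

def prune_candidates(h_vowel, v_candidates):
    """Hypothetical use: keep only vertical-vowel candidates compatible with h_vowel."""
    return [v for v in v_candidates if combine(h_vowel, v) is not None]
```

For example, `prune_candidates("ㅜ", ["ㅏ", "ㅓ", "ㅣ"])` keeps `["ㅓ", "ㅣ"]`, since ㅜ combines only with the ㅣ column and the left-branch column, while ㅛ and ㅠ reject every vertical-vowel candidate.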