Quick viewing(Text Mode)

Automatic Stroke Extraction and Stroke Ordering Based on Truetype Font

Automatic Stroke Extraction and Stroke Ordering Based on Truetype Font

EG UK Theory and Practice of Computer Graphics (2006) M. McDerby, L. Lever (Editors)

Automatic Stroke Extraction and Stroke Ordering Based on TrueType

Sang Ok Kooˋ , Hyun Gyu Jang, Kwang Hee Won and Soon Ki Jung

Virtual Reality Laboratory, Kyungpook National University, South Korea

Abstract In this paper, suggest method that extracts strokes of and orders them automatically based on data from TrueType Font. TrueType Font contains glyph of characters whose format is a series of bézier curves and they are arranged according a rule. Using clues, we extract the vectors which consist of each stoke and reconstruct each to the format similar to a TrueType Font. In addition, we label every stroke and define the orders of strokes using stroke labels and indicate the relations among them. Because all of the processes are performed automatically, we can minimize the time and the effort to make stroke database of Chinese characters. Moreover, because the final data has vector graphics format, it can be applied to the study of graphical contents using as well as to simple font generation for the Chinese education.

Categories and Subject Descriptors (according to ACM CCS): .3.3 [Computer Graphics]: Geometric algorithms, languages, and systems

1. Introduction among them. Since all parts of the process are performed automatically, we can simply build up the Chinese character There are many people who learn the Chinese language. stroke database. This database can be used within any When we are willing to learn Chinese, the first step is application. memorizing Chinese characters. However, Chinese characters are so complicate and have many different This paper is organized as follows. We describe the characters that we cannot learn them easily. Nowadays, we TrueType Font format which we use, and previous works in can contact a lot of education contents through web or multi Section 2. We describe the automatic algorithm of stroke media devices [1][2]. Despite an increasing amount of extraction and ordering in Section 3 and Section 4, content, vendors still generate them manually. In addition, Experimental results are shown in Section 5. Finally, the the contents are not compatible because of the limitation of conclusion and future work are discussed in Section 6. the data format. To change the attributes of contents such as size, color, and others, they have to do it all over again. So it 2. Background is very important that we have to define the data format of the contents that is device-independently. The method of stroke extraction we propose is based on the TrueType Font data, which is more effective at expressing In this paper, we extract stokes of Chinese characters glyphs than using bitmap data. In this section, we discuss the using TrueType Font data, which is able to be employed properties of the TrueType Font and previous methods to device-independently. This stores data in vector format, extract strokes and order them. which is more efficient and more compatible. Moreover, we describe how to order stokes that are extracted from the 2.1 TrueType Font and Bézier Curves previous step. We employ a labeling method. We name arrangements of strokes as labels, and define the orders

[email protected]

c The Eurographics Association 2006. 124 Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font TrueType Font is a popular font format used in operating systems such as Mac OS and Windows and was designed to be efficient in storage and processing, and extensible [3]. The information from a TrueType Font is the glyph of a character, which is composed of control points of bézier curves [4]. This kind of vector font needs a font rasterizer which renders the final shape of the character, but it is very efficient structure to change certain attributes including shape or size.

Figure 1 shows the glyph of ‘竖’. There are some points Figure 3: User’s input for manual stroke extraction shown, that are the control points of the TrueType Font. The algorithm of stroke extraction we propose keeps these points 3. Automatic Stroke Extraction when making up each stroke. Figure 2 shows the result of stroke extraction. The resulting points are all originally from In this section, we describe how to extract stokes from the TrueType Font data. Therefore, the result of our method has vectors in the TrueType Font data. As shown in Figure 4, a the characteristics of a TrueType Font. stroke consists of the set of vectors. There are two vector sets; one consists of vector a, b and c, the other consists of vector i, j, and k. We name one vector set “corresponding vector set” to each other. We have to find vector sets as shown in Figure 4. The process of stroke extraction consists of four steps (contour grouping, extracting vectors, composing strokes and control recovery) as shown in Figure 5.

Figure 1: The glyph of ‘竖’

Figure 4: A stroke is composed of vectors

Figure 2: The part of the result of stroke extraction

2.2 Acquisition of Stroke Animation Data i- shows GIF animation of Chinese characters [2]. To make the GIF animation data, they have to paint stroke by stroke manually. That is simple and easy, but very time consuming work.

Koo et al. shows more natural and smooth stroke animation [5]. Their method depends on user’s input as shown Figure 3. The number of stroke, the direction of each stroke and the order of strokes are determined by the user’s input. This is faster and easier than making image sequences directly, but it is still cumbersome to make a stroke for each character. We propose the method to extract and to order Figure 5: The process for extracting strokes strokes automatically without user interaction.

c The Eurographics Association 2006. Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font 125 3.1 Contour Grouping As shown in Figure 8, there are three vector relations. In the relation of (a) and (b), we can estimate the distance. But It is not efficient to search for corresponding vectors from all in case of (c), the intersection is not within the line segment. vectors. So, we make contour group, which does not affect Hence, we do not consider it as a possible candidate. others when finding a corresponding set. The left of Figure 6 shows the result of grouping contours of a character. Every group does not overlap any group. A group can have one contour or more. If a contour is surrounded by an outer one, they belong to same group. In this case, we have to pay attention to the fact that the direction of an inner contour is the reverse of the outer one.

Figure 8: Vector relations

To examine whether a vector corresponds to another, we have to check the angle between two vectors. Two main axes which make up a stroke are nearly parallel each other. Therefore we think of two vectors that make an angle GGG between them close to 180 degree as a possible candidate. Figure 6: Contour grouping and direction & & We define the angle (T ) between two vectors and v as 3.2 Extracting Vectors taking arc cosine of the dot product of the two vectors as Equation 1. The data from a TrueType Font is a series of bézier curves. It is hard to calculate with bézier curves. For this && 1§ ˜vu · reason, we replace them by simple vectors as shown in T cos  ¨ && ¸. (1) Figure 7. These simple vectors keep the feature that they ¨ vu ¸ were originally bézier curves taking the coordinates of © ¹ control points. We employ only operations between 2 dimensional vectors. This is very efficient and easy. If two vectors of a stroke move counter-clockwise or right-hand direction as Figure 9-2, we can make up a stroke. But if vectors move clockwise as Figure 9-3, we cannot compose a stroke.

Figure 7: Extracting vectors Figure 9: Contour of a stroke moves counter-clockwise 3.3 Composing Strokes In order to determine the direction of two vectors, we To compose stokes, we need two vector sets, which are estimate the cross product of two vectors. If there are two parallel to each other. To do this, we use the distance and the & & & vectors u and v , we can make a vector w , connecting the angle between two vectors. & & head of u and the tail of v as you can see in Figure 10. The thickness of is regular. Therefore, two vectors that are far from one another cannot be corresponding In the following Equation 2, we augment dimension, & & vectors. A vector placed within a specified distance to adding the value zero as a third entry, because u and v are another is considered as a candidate. We employ the distance vectors in the 2D coordinate system. If the third entry of between a point and a line on Euclidean 2-space. && & & u wu is greater than 0, this means u and v move counter-clockwise according to the right-hand rule.

c The Eurographics Association 2006. 126 Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font

G Figure 10: Arrangement of vectors

Figure 12: Stroke segmentation u uu ,,( 0 ) u yxyxwu ww ,,( 0 ) (2) It is easy to know which segments have to be considered. (0 , 0 ,  xyyx wuwu ) At the right part of A in Figure 13, the vectors move clockwise. On the other hand, at the left part of A, the vectors move counter-clockwise. The stroke that has the Figure 11 shows the algorithm for finding corresponding parts that vectors move counter-clockwise is a candidate. vectors for composing strokes. And if there is another candidate, they are merged. We can examine the fact that two vectors are arranged clockwise by using Equation 2.

Td : threshold distance

Ta : threshold angle

MakeStroke( vector1, vector2) {

if( Distance(vector1, vector2) < Td &&

Angle(vector1, vector2)

We have to substitute control points for vectors that have Figure 11: The algorithm for finding corresponding been extracted during the previous steps as shown in Figure vectors 14.

To complete making a stroke, finally it needs merging some segments. The glyph as shown in Figure 12, where there is a ‘+’ shape, has four segments, A, B, C and D that have been composed by previous stages. However, actually we want to get strokes, A+B and C+D.

Figure 14: Control point recovery

c The Eur ogr aphics Association 2006. Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font 127 In the case of Figure 15, two strokes intersect each other at ‘T’ or ‘+’ shape characters. There is no vector that connects Table 1: Basic rules for [6] all other vectors, so we simply add line there. It is easy to know when and where these cases exist. That is the point Rule Ex. Stroke order where two or more stroke share one control point. First horizontal, then vertical At the end of a stroke, we add all control points between First left-falling, the ends of vectors which make up a stroke as shown in then right-falling Figure 16. This ensures that there is no control point which From top to bottom shares two or more strokes at the end of stroke. From left to right

First outside, then inside Finish inside, then close Middle, then the two sides

Table 2: Labels of strokes Figure 15: Control point recovery at general part Label Stroke DOT G HORIZONTAL G VERTICAL G HOOK G LEFTFALLING G RIGHTFALLING G G Figure 16: Control point recovery at the end of a stroke TURNING G

4. Automatic Stroke Ordering Table 3 shows subdivided stokes, which are labeled by relation of their positions. They can resolve the problem of In this section, we describe how to estimate the order of exception. stokes. We label all strokes, and determine the order according to the relation of them. 4.2 The Algorithm for Stroke Ordering by Relaxation Labeling 4.1 Stroke Classification Relaxation labeling is a way to define a unique label of an Table 1 shows the basic rule for ordering Chinese stroke [6], entry after one or more labels of an entry are given according but these are exceptions. To resolve the exception, we deal to its attributes [7][8]. We define the set of labels (Table 3), with the problem as a special case. If there are stroke sets and the relation as the position of them (see table 3). For any which are against the basic rule, we define them as labels stroke, if we deal with the set of strokes sssQ },...,,{ , and order them correctly. 21 n si will be labeled by its relation with other strokes. The Before we set the order of strokes, we should determine algorithm is shown in Figure 18. We consider the clues, the stroke level. There are 7 stroke types as you see in table 2. which are intersection and adjacency of strokes and so forth. All strokes map into that pre-defined types. The length of strokes, the slope of strokes and the angle between vectors in strokes decide the label of strokes. The algorithm for labeling is shown in Figure 17.

c The Eurographics Association 2006. 128 Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font

Table 3: Stroke subdivision TL : threshold of stroke length (100 pixel)

Base Label Ex. Description Tvar : the maximum angle between two vectors in a stroke

DOT G Not divided Tmax : maximum slope of stroke

NONE G General Tmin : minimum slope of stroke for all strokes in the set of strokes { 21 sssQ n},...,,{ CROSS G Intersection if ( length of si < TL ) label of si = DOT

HORIZONTAL LEFT G Left of other strokes else if (vector angle of si > Tvar ) label of s = HOOK Right of other i RIGHT G strokes else if(slope of si < Tmin ) label of s = VERTICAL CLOSE G 粣 or 螔 i else if(slope of si > Tmax ) NONE General G label of si = HORIZONTAL else if( x-value of s > 0) T G Vertical of 蘌 i VERTICLAL label of si = RIGHTFALLING CROSS Intersection G else label of si = LEFTFALLING

for all strokes in the set of strokes LEFT G Vertical of 粣 21 sssQ n},...,,{

if ( label of si = VETICAL && NONE G General label of s j = HORIZONTAL && HOOK CROSS G Intersection Intersect( si , s j ) = TRUE ) CROSS G Hook of 螲 Label of si + s j = TURNING 2 } NONE G General Figure 17: Stroke classification algorithm SYM G Symmetry LEFTFALLING SYM2 G Above the 螐 21 sssQ n},...,,{ : the set of strokes

21 lllL n},...,,{ : the set of labels RIGHT G Intersection

NONE G General 1. For all strokes in 21 sssQ n},...,,{ , determine the label of s (by Figure 17). RIGHTFALLIN i SYM G Symmetry G 2. Subdivide the label of si (by table 3)

( si may have several labels). SYM2 G Above the 螐 3. Delete the labels of si which do not keep consistency. NONE G General TURNING 4. Repeat 3, until the label of si is unique. F Turning of G 胐 Figure 18: Relaxation labeling algorithm

c The Eurographics Association 2006. Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font 129 z T (minimum slope of stroke): 15 degree 4.3 Stroke Ordering min

We make up the order after all labels of strokes are The result of the experiment is shown in table 5. Stroke extraction is successful for as many as 97% of all the determined. We define the operator ‘ ’ between two d characters. The ordering experiment was taken for the strokes. We consider the ordering strokes as a problem of characters which are extracted successfully. The ordering is sorting strokes. Operator ‘ d ’ is reflexive, antisymmetric, successful for about 77%. Figure 19 and Figure 20 show the and transitive. Hence the set of strokes Q has totally result of stroke extraction and stroke ordering for two ordered relation for the operator ‘ d ’. Operator ‘ d ’ is characters (糑 and 誃). defined by the rule of table 4. 6. Conclusions Table 4: The order of priority between labels We proposed a method of automatic stroke extraction and stroke ordering for glyph data within a TrueType Font. We Ahead Later verified that about 97% of characters are well extracted and DOT HORIZONTAL_CLOSE 77% of the well extracted characters are ordered correctly. HORIZONTAL_CROSS VERTICAL_CLOSE We made the order of strokes using relaxation labeling. We HORIZONTAL_CROSS HOOK_CROSS defined the labels of strokes according to the relation of VERTICAL_NONE HORIZONTAL_LEFT strokes, and determined a priority amongst them. It is VERTICAL_NONE HORIZONTAL_RIGHT possible to generate stroke animation data for Chinese VERTICAL_T HORIZONTAL_CROSS characters automatically using our method. VERTICAL_LEFT TURNING_NONE VERTICAL_NONE LEFTFALLING_SYM Our method does not work well for any fonts whose VERTICAL_NONE RIGHTFALLING_SYM thickness or shapes are variable. We will study to apply our HOOK_NONE HORIZONTAL_LEFT method how to these kind of fonts as future work. HOOK_NONE HORIZONTAL_RIGHT HOOK_CROSS2 HORIZONTAL_CROSS References HOOK_NONE LEFEFALLING_SYM HOOK_NONE RIGHTFALLING_SYM2 [1] USC Chinese Department Homepage, LEFEFALLING_NONE RIGHTFALLING_NONE http://www.usc.edu/dept/ealc/chinese/newweb/home.htm LEFEFALLING_SYM HORIZONTAL_NONE LEFEFALLING_SYM2 HORIZONTAL_CROSS [2] i-hanja Homepage (Korean), http://www.ihanja.com/ LEFEFALLING_RIGHT RIGHTFALLING_NONE [3] Microsoft , RIGHTFALLING_SYM2 HORIZONTAL_NONE http://www.microsoft.com/typography RIGHTFALLING_SYM2 HORIZONTAL_CROSS TURNING_F VERTICAL_NONE [4] Farin, G.: Curves and Surfaces for Computer Aided TURNING_F HOOK_NONE Geometric Design, Academic Press, New York, 1988. [5] Sang Ok Koo, Hyun Gyu Jang and Soon Ki Jung: 5. Experimental Results Efficient Stroke Order Animation of the Chinese Character, In this section, we made an experiment to estimate the KCJC, 2005. accuracy of our method for 1000 Chinese characters. The [6] How To Write Chinese Characters, parameters used in this experiment are: http://www1.esc.edu/personalstu/jli/How to Write Chinese z Font name: gulim.tcc Characters.pdf z Font size: 500 ค 500 pixel [7] Dana Harry Ballard, Christopher M. Brown: Computer z Minimum of thickness of strokes: 35 pixel Vision, Prentice Hall Professional Technical Reference, 1982. z Maximum of thickness of strokes: 45 pixel z Tolerance of slope of corresponding vectors: 15 [8] Hummel R.A. and Zucker S.W.: On the foundations of degree relaxation labeling processes, IEEE Transaction on Pattern Analysis and Machine Intelligence., Vol.5, No. 3, 1983. z TL (threshold of stroke length): 100 pixel z Tvar (the maximum angle between two vectors in a stroke): 75 degree z Tmax (maximum slope of stroke): 75 degree

c The Eurographics Association 2006. 130 Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font

G Figure 19: The result of ‘糑’

Figure 20: The result of ‘誃’

c The Eurographics Association 2006. Sang Ok Koo et al. / Automatic Stroke Extraction and Stroke Ordering Based on TrueType Font 131

Table 5: The result of the experiment

The extracted well fail

972 (97.2%)

The ordered well fail 28 (2.8%)

751 (77.3%) 221 (22.7%)

竖竛竏竑竐竎竡竗竲竫竩笋笂竹笠笡第笹笶筄筆筛筐筊筝觬筤筦筧 竱笆笚笣笝笵筪筩筸筶 箑 糤紞紛絣 筥筵箁箇箄箒箔箟箘箚箤箦箥箯箾箿篕篍篏箸篔箲筟篗篨篯篜篲篣 箧箻篚篮簊籄籉籟粝 粺糂 缌緙膍臐 篠篳篬簙簍篴篶篸篿篺篼簝簟簢簮簴簻簰簲簱簳簹籂籇籝籒籚籦籩 粮粵粽糞糪糨紥紭絀絃緎 舿莤蕊薆 籱粋粊粊粂粌粣粢粠粤粱糑糒糛糜糗糙糡糮糩糿糾納紒紞紨索綎細 緒緜縡縤縲繓繘 繥纊繯繭 薅薮蝉蝀 紶組絥絽絜絘結絛給絟絎綀絓絷 絕綋綩綮網綸綼緉総緘緷縑縘縦縞 繲纈纏纷绋绊 绚绥缃缞缧 蠏諊读课 縟縧縝縣繁縶繆繊繐繗織繛繢繚繪繱繩纋繿纘纓纔纫纨纮纩约纲纵 缺罫羌群翝翓翸耮耨聝聪 貓账贫走 绀纾织绌绕统绤绷综缇缉缓编缒缜缩缫缶缽罀罘罈罚罜緢罳罵罸虙 肅肘肯胖 胭脊脓腯腲膂臅 緀腶藉觳 羯羦羫羰羲羸羹翈蝗翟翪翰耎耊耖耟耛耲耱耵聋聄聐聑聒聫聩職聺 臻舫舍 舮鑀艙艓艶艕芄芼 聹肄肏肚肞肱肫肪肵胂胃肶肼胐胞胚胛胮胷脎脙脘脗脪脧脬脾脷 脱 芩芭茉苸茎茓茲茿荅荙荹 腅腆腌腔腒腝腜腟腣腱腳腹膊膋膈膌膞膟膘膣膩膬臄臠臈膝臀臉臁 莇 莡莠莞莑菗萆萈萐萙萶 臛臦臧臮致膻舔舯舆舭舐舝舻艏艃艧艈艠艅艐艋艞艗良艔艹艭艪芊 萮萬葌葖葹蒃蒇蒨蒲蓄蓬 芈芅芐芓芟芴芣芳芪芻芠苅苇苣苗苎苩苳苫苺茗苻苽茣茢茡茞茶茼 蓷蔋蔒蔛蔰蕇蔸薛蘶 虴虝 茴荁茾茱茷茻荃荆荄草荇荐荑荓荔荢荥荳荱荶荲荺荼莃莈莝莢菊莐 蚌蚫蚞蚸蚻蛿蛽蜣 蜨蜴蜻 莗莮菑菍菲菨菵菸萀萕萚萟萣萤营萘萫萴萺萿葃葆葂葑葐葒葙葟葢 蝆蜷蝣螊螿螱蟖蠃蠀蟨蟪 葥蒈蒕蒙蒵蒽蓅蓌蓈蓜蓪蓤蓛蓳蓵蓰蔀蔆蔡蔩蔧蕈蕄蕍蕑蕚蕊蕣蕹 衍衚衲袈袉袢 袕袓裌裖詽 薙薎薂薴薺薾藕藘藛藝藶蘌蘈蘑蘧蘷蘾虅虑虈虠虸號虧虵蚎蚔蚙蚕 誺諁諨諬諵 諿謚謠譚譯讈 蚮蚖蚭蚗蚘蚠 蚤蚰蚴蚳蚾蛆蚷蚶蚹蛞蛯蛥蛌蛎蜉蜈蜧蜮蜯蜭 蜲蝂蝈 讅讔讍讠谉谗谚谞谷豇谼 蜹蝊蝋蝠蝝蝳蝸蝾蝻螁蝹螐螔螙螤蟀螲螳蟅蟋蟇蟟蟣蟶蟧蠄蟥蟸蠉 豫豸貕 貛貴賜賥賨贏贻赂 蠇蠒蠊蠅蠖蠰蠛蠟 蠿衈蠻衂衇衃血衴衒衔衝術衑衤衙衯衾衸衿衺袂 趾跊 跎赶趖籆縤纖组缜绫 袈袀袘袏袲袰袌裀裃裕裇裲裧裿裢裝複裣裨裹裬褊褈褍褌褎褚褒褤 缆美羼翦耂耜耩耴耹聆腎 褥褧褨褩褹褯褶襍襐襛襦襯覐襽覌覓覃要覀覅覜覟観覲覭见觐觛触 蛋 觰觿訅訊訖訕訮訸訵詐詍詏誃詿誂誎誣誏誔誰課誽諄諛諞諠諟諥諫 諮諹謇謈講謚謐謡謤謥謨謩謯謳謴謻謵譄譖識譞譟譭譹譺讀讁讇讒 讖讟讣讪讨讱讷评诈诐诓诜诮诟该误谈谅谎谓谒谣谨谫谮豈豀谻豅 豗豨豤豵豻貂貄貇貉貐貌貔貚貟販貲買費資賊賈贅賲贇賫贍贑贞贬 贪责骍财贮贳贽赇赣赖赬赥赿趀趒趋趌趟趫趱趺跖跑赧趢羻

c The Eurographics Association 2006.