Handwritten Chinese Font Generation with Collaborative Stroke Refinement
Chuan Wen^1  Jie Chang^1  Ya Zhang^1  Siheng Chen^2  Yanfeng Wang^1  Mei Han^3  Qi Tian^4
^1 Cooperative Madianet Innovation Center, Shanghai Jiao Tong University
^2 Mitsubishi Electric Research Laboratories   ^3 PingAn Technology US Research Lab   ^4 Huawei Technologies Noah's Ark Lab
{alvinwen, j chang, ya zhang, [email protected]  [email protected]  [email protected]  [email protected]

arXiv:1904.13268v3 [cs.CV] 6 May 2019

Abstract

Automatic character generation is an appealing solution for new typeface design, especially for Chinese typefaces, which include over 3700 commonly used characters. The task has two main pain points: (i) handwritten characters usually have thin strokes that carry little information and complex structures, which are error-prone during deformation; (ii) thousands of characters with various shapes must be synthesized from a few manually designed samples. To address these issues, we propose a novel convolutional-neural-network-based model with three main techniques: collaborative stroke refinement, which uses a collaborative training strategy to recover missing or broken strokes; online zoom-augmentation, which exploits the content-reuse phenomenon to reduce the size of the training set; and adaptive pre-deformation, which standardizes and aligns the characters. The proposed model needs only 750 paired training samples; no pre-trained network, extra dataset, or labels are needed. Experimental results show that the proposed method significantly outperforms state-of-the-art methods under the practical restrictions of handwritten font synthesis.

Figure 1. Collaborative stroke refinement handles the thin issue. Handwritten characters usually have various stroke weights. Thin strokes have lower fault tolerance in synthesis, leading to distortions. Collaborative stroke refinement adopts collaborative training: an auxiliary branch is introduced to capture various stroke weights, guiding the dominating branch to solve the thin issue.

1. Introduction

The Chinese language consists of more than 8000 characters, among which about 3700 are frequently used. Designing a new Chinese font involves considerable tedious manual effort, which is often prohibitive. A desirable solution is to automatically complete the rest of the vocabulary given an initial set of manually designed samples, i.e., Chinese font synthesis. Inspired by the recent progress of neural style transfer [6, 9, 10, 18, 21, 26], several attempts have recently been made to model font synthesis as an image-to-image translation problem [7, 19, 24]. Unlike neural style transfer, font transfer is a low-fault-tolerance task, because any misplacement of strokes may change the semantics of the characters [7]. So far, two main streams of approaches have been explored: one addresses the problem with a bottleneck CNN structure [1, 2, 7, 12, 15], and the other disentangles the representation of characters into content and style [19, 24]. While promising results have been shown in font synthesis, most existing methods require an impracticably large initial set to train a model. For example, [1, 2, 7, 15] require 3000 paired characters for training supervision, which demands huge labor resources. Several recent methods [12, 19, 24] explore learning with fewer paired characters; however, these methods depend heavily on extra labels or other pre-trained networks. Table 1 provides a comparative summary of the above methods.

Most of the above methods focus on printed font synthesis. Compared with printed typeface synthesis, handwritten font synthesis is a much more challenging task. First, its strokes are thinner, especially at the joints of strokes. Handwritten fonts are also associated with irregular structures that are hard to model. So far there is no satisfying solution for handwritten font synthesis.

This paper aims to synthesize more realistic handwritten fonts with fewer training samples. There are two main challenges: (i) handwritten characters usually have thin strokes that carry little information and complex structures, which are error-prone during deformation (see Figure 1); and (ii) thousands of characters with various shapes must be synthesized from a few manually designed samples (see Figure 2). To solve the first issue, we propose collaborative stroke refinement, which uses a collaborative training strategy to recover missing or broken strokes. Collaborative stroke refinement uses an auxiliary branch to generate syntheses with various stroke weights, guiding the dominating branch to capture characters with various stroke weights. To solve the second issue, we fully exploit the content-reuse phenomenon in Chinese characters; that is, the same radicals may appear in various characters at various locations. Based on this observation, we propose online zoom-augmentation, which synthesizes a complicated character from a series of elementary components. The proposed networks can then focus on learning the varieties of locations and ratios of those elementary components, which significantly reduces the required number of training samples. To make it easier for the networks to capture the deformation from source to target, we further propose adaptive pre-deformation, which learns the size and scale deformation to standardize characters. This allows the proposed networks to focus on learning high-level style deformation.

Figure 2. Online zoom-augmentation exploits the content-reuse phenomenon in Chinese characters: the same radical can be reused in various characters at various locations. Online zoom-augmentation implicitly models the variety of locations and sizes from elementary characters without decomposition, significantly reducing the number of training samples.

Combining the above techniques, we propose an end-to-end model (shown in Fig. 3) that generates high-quality characters with only 750 training samples. No pre-trained network or labels are needed. We verify the performance of the proposed model on several Chinese fonts, including both handwritten and printed fonts. The results demonstrate the significant superiority of our proposed method over state-of-the-art Chinese font generation models.

The main contributions are summarized as follows.

• We propose an end-to-end model to synthesize a handwritten Chinese font given only 750 training samples;

• We propose collaborative stroke refinement, which handles the thin issue; online zoom-augmentation, which enables learning with fewer training samples; and adaptive pre-deformation, which standardizes and aligns the characters;

• Experiments show that the proposed networks are far superior to baselines in terms of visual quality, the RMSE metric, and a user study.

2. Related Work

Most image-to-image translation tasks, such as art-style transfer [8, 13], coloring [5, 23], super-resolution [14], and dehazing [3], have seen progressive improvement since the rise of CNNs. Recently, several works model font glyph synthesis as a supervised image-to-image translation task, mapping a character from one typeface to another with a CNN framework analogous to an encoder/decoder backbone [1, 2, 4, 7, 12, 15]. In particular, generative adversarial networks (GANs) [11, 16, 17] are widely adopted by these methods to obtain promising results. In contrast, EMD [24] and SA-VAE [19] disentangle the style/content representation of Chinese fonts for more flexible transfer.

Preserving content consistency is important in the font synthesis task. "From A to Z" [20] assigned each Latin letter a one-hot label fed into the network to constrain the content. Similarly, SA-VAE [19] embedded a pre-trained Chinese character recognition network [22, 25] into the framework to provide a correct content label. SA-VAE also embedded a 133-bit code denoting the structure/content-reuse knowledge of the inputs, which relies on extra labeling of structure and content beforehand. To synthesize multiple fonts, zi2zi [2] utilized a one-hot category embedding. Similarly, DCFont [12] involved a pre-trained 100-class font classifier in the framework to provide a better style representation. More recently, MC-GAN [4] can synthesize ornamented glyphs from a few examples. However, the generation of Chinese characters is

Table 1. Comparison of ours with existing CNN-based Chinese font transfer methods.

  Method                           Data resources required                      Extra labels?                          Pre-trained network?
  Rewrite [1], AEGN [15], HAN [7]  3000 paired samples                          None                                   None
  zi2zi [2]                        Hundreds of fonts (each with 3000 samples)   Class label for each font              None
  DCFont [12]                      Hundreds of fonts (each with 755 samples)    Class label for each font              Font classifier based on VGG-16
  SA-VAE [19]                      Hundreds of fonts (each with 3000 samples)   133-bit structural code per character  Character recognition network
  EMD [24]                         Hundreds of fonts (each with 3000 samples)   None                                   None
  Ours                             Only 755 paired samples                      None                                   None

Figure 3. Overview of the proposed model. The source x passes through adaptive pre-deformation into the dominating branch (CNN encoder/decoder layers with concatenated skip features, judged by a discriminator; feature maps range from 64x64x1 down to 4x4x512). An auxiliary branch is trained on the bolded target b(y), obtained by a dilation operation on the target y; max-pooling g(·) produces g(b(y)) and g(g(b(y))), which the refine branch uses to guide the dominating branch toward the final synthesis, judged by a second discriminator.
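Adaptive pre-deformation, as described above, standardizes character size and alignment so the networks only need to learn high-level style deformation. A minimal NumPy sketch of this idea follows, assuming glyphs are binary bitmaps; the function name, canvas size, and fill ratio are illustrative choices, not parameters from the paper.

```python
import numpy as np

def adaptive_predeform(glyph: np.ndarray, canvas: int = 64, fill: float = 0.8) -> np.ndarray:
    """Crop the glyph to its ink bounding box, uniformly rescale it to occupy
    a fixed fraction of the canvas, and recenter it (nearest-neighbor resampling)."""
    ys, xs = np.nonzero(glyph)                       # coordinates of ink pixels
    box = glyph[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = box.shape
    scale = fill * canvas / max(h, w)                # uniform scale keeps aspect ratio
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    rows = (np.arange(nh) * h / nh).astype(int)      # nearest-neighbor index grids
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = box[rows][:, cols]
    out = np.zeros((canvas, canvas), dtype=glyph.dtype)
    top, left = (canvas - nh) // 2, (canvas - nw) // 2
    out[top:top + nh, left:left + nw] = resized      # paste at the canvas center
    return out
```

Applying this to every source/target pair puts all characters at a common scale and position, so the later branches see only style differences rather than layout differences.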
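Online zoom-augmentation lets one elementary character stand in for a radical appearing at many positions and sizes inside compound characters. The sketch below, a plausible reading of that idea rather than the paper's exact procedure, randomly shrinks a paired (source, target) glyph by a shared factor and pastes both at a shared random location on the fly; all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility of the sketch

def zoom_augment(src: np.ndarray, tgt: np.ndarray, min_scale: float = 0.5):
    """Apply an identical random zoom and placement to a paired sample, mimicking
    a radical reused at varying locations and sizes. Intended to run per batch."""
    assert src.shape == tgt.shape
    n = src.shape[0]
    s = rng.uniform(min_scale, 1.0)                 # shared zoom factor
    m = max(1, int(n * s))
    idx = (np.arange(m) * n / m).astype(int)        # nearest-neighbor downsample grid
    top = int(rng.integers(0, n - m + 1))           # shared random placement
    left = int(rng.integers(0, n - m + 1))
    out_s, out_t = np.zeros_like(src), np.zeros_like(tgt)
    out_s[top:top + m, left:left + m] = src[idx][:, idx]
    out_t[top:top + m, left:left + m] = tgt[idx][:, idx]
    return out_s, out_t
```

Because source and target receive the same transform, the pixel-wise pairing needed for supervised training is preserved while the network sees the component at many scales and positions.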
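The auxiliary branch is supervised with a bolded version b(y) of the target, produced by a dilation operation. A minimal sketch of such an operation follows, implemented as stride-1 max pooling over a k x k window; the function name and window size are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def bold(y: np.ndarray, k: int = 3) -> np.ndarray:
    """Thicken strokes by grayscale dilation with a k x k window, i.e. stride-1
    max pooling with 'same' zero padding. The result b(y) gives the auxiliary
    branch a thick-stroke target that is more tolerant to small misplacements."""
    p = k // 2
    padded = np.pad(y, p, mode="constant")
    n, m = y.shape
    # stack all k*k shifted views of the image and take the pixel-wise maximum
    views = [padded[i:i + n, j:j + m] for i in range(k) for j in range(k)]
    return np.maximum.reduce(views)
```

A one-pixel-wide stroke becomes k pixels wide under this operation, which is why the auxiliary branch trained on b(y) can recover strokes that the thin-stroke target would let break or vanish.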