Separating Chinese Character from Noisy Background Using GAN

Total Page:16

File Type:pdf, Size:1020Kb

Separating Chinese Character from Noisy Background Using GAN Hindawi Wireless Communications and Mobile Computing Volume 2021, Article ID 9922017, 13 pages https://doi.org/10.1155/2021/9922017 Research Article Separating Chinese Character from Noisy Background Using GAN Bin Huang ,1 Jiaqi Lin ,1 Jinming Liu ,1 Jie Chen ,1 Jiemin Zhang ,1 Yendo Hu,1 Erkang Chen ,1 and Jingwen Yan 2 1Computing Engineering College, Jimei University, Xiamen 361021, China 2College of Engineering, Shantou University, Shantou 515063, China Correspondence should be addressed to Erkang Chen; [email protected] Received 17 March 2021; Revised 7 April 2021; Accepted 20 April 2021; Published 3 May 2021 Academic Editor: Philippe Fournier-Viger Copyright © 2021 Bin Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Separating printed or handwritten characters from a noisy background is valuable for many applications including test paper autoscoring. The complex structure of Chinese characters makes it difficult to obtain the goal because of easy loss of fine details and overall structure in reconstructed characters. This paper proposes a method for separating Chinese characters based on generative adversarial network (GAN). We used ESRGAN as the basic network structure and applied dilated convolution and a novel loss function that improve the quality of reconstructed characters. Four popular Chinese fonts (Hei, Song, Kai, and Imitation Song) on real data collection were tested, and the proposed design was compared with other semantic segmentation approaches. The experimental results showed that the proposed method effectively separates Chinese characters from noisy background. In particular, our methods achieve better results in terms of Intersection over Union (IoU) and optical character recognition (OCR) accuracy. 1. Introduction In order to achieve examination paper autoscoring, one of the technical challenges to be solved is handling overlap- Converting paper documents into electronic documents and ping characters. This may happen when an elementary then recognizing them using optical character recognition school student did not master writing well or put annotation (OCR) technology have been widely used in daily life. In on the test paper. The current OCR technology cannot han- recent years, with the development of machine learning tech- dle the mixed situation of printed text and handwritten text nology, the recognition accuracy of OCR has been greatly in the same image. Generally, only a single type of text can improved [1–3]. We can now process a document with both be recognized by OCR technology [8]. Our early experiments machine-printed text and handwritten text and then recog- showed that when recognizing printed text, the OCR accu- nize them separately [4, 5]. Similar applications can be found racy was greatly reduced if there were handwritten strokes in the archiving and processing of historical documents [6, or handwritten characters around the printed text. Even 7]. In the field of education, related technologies for examina- worse was that the machine was not able to find the text area tion paper autoscoring have emerged, which greatly reduce needed to be recognized. Therefore, it is desirable to separate burden for teachers and students. Taking Figure 1 as an the handwritten characters from the printed characters on example, an examination paper with students’ answers can the examination paper and then process different text types first be processed by OCR, and then the recognized answers accordingly. Furthermore, for Chinese characters, the separa- can be evaluated and scored automatically by the machine. tion of handwriting and printing becomes more difficult Under certain circumstance, since the test paper template because the font structure is far more complicated than cannot be easily obtained, it is also necessary to directly Western fonts [9, 10]. A slight loss or increase of strokes identify the printed test paper template. may change the meaning of the characters completely, which 2 Wireless Communications and Mobile Computing in both IoU and OCR accuracy; and (c) our dataset (upon request) for further research is provided. 2. Related Work Many applications in document processing need to solve the problem of separation of handwriting and printed. The Maurdor project created a realistic corpus of annotated doc- uments in French, English, and Arabic to support the effi- cient development and evaluation of extraction method (a) [20]. DeepErase [21] uses neural networks to erase ink arti- facts on scanned documents and only extract text written by a user. The ink artifacts that DeepErase targets mainly include a tabular structure, fill-in-the-blank boxes, and underlines. Guo and Ma [22] used a machine-printed and handwritten annotation discrimination algorithm based on (b) the Hidden Markov Model. Solely focusing on English and other Latin languages, their algorithm can locate the position (c) of the handwritten part in the document in the form of a bounding box. Zagoris et al. [23] proposed a method of rec- ognizing and separating handwritten content from docu- (d) ment images mixed with handwritten and printed characters through the bag of visual word model. Their (e) Post-processing(OCR, NLP, etc.) method first computes a descriptor for each block of interest fi Exam Paper Auto-scoring and then classi es the descriptor into handwritten text, (f) machine printed text or noise. However, few research has Figure 1: Basic process of examination paper autoscoring. (a) A been focusing on highly overlapped texts, especially Chinese sample examination paper which consists of both handwriting and characters that are structurally more complex than English or printed text. (b) A subquestion with answers to be scored. (c) other Latin languages. Handwriting touches or even overlaps with printed text. The red Recent deep learning methods provide new ways for circle shows an example of overlapping characters. (d) The solving the separation of handwriting and printed. Li proposed method targets separation of overlapping Chinese et al. [5] handles printed/handwritten text separation characters into printed text (left rectangle) and handwriting (right within a single framework by using conditional random rectangle). (e, f) After successful separation is made, fields. Their algorithm only performs extraction at con- postprocessing and autoscoring become feasible. nected component (CC) level. Each CC is classified into printed and handwritten no matter it is overlapping or not. U-Net [11], which performs well in many segmenta- makes it difficult to separate effectively when handwriting tion tasks, builds upon only convolution layers and the idea fonts and printed fonts are highly overlapped. of propagating context information to higher resolution Separating Chinese characters from noisy background layers during upsampling. Pix2Pix [17] translates an input (particularly with overlappings) can be considered an image image into a corresponding output image. With a paired semantic segmentation problem. Previous deep learning training dataset, it can output sharp and realistic images. methods [11–13] have shown success in other applications. Such features make it attractive for solving our character However, these methods have poor performance due to the segregation problem. However, a paired training dataset complex structure of Chinese characters. To distinguish Chi- may not be easy to find in real-world applications. Cycle- nese characters from similar fonts, we adopted a GAN-based GAN [16] is an approach for learning to translate an image approach [14–19]. A network, called DESRGAN, was devel- from a source domain to a target domain without paired oped to denoise background and reconstruct both the stroke examples. CycleGAN’s coding scheme is to hide part of structure and fine details of targeted Chinese characters. Our the information about the input image in low-amplitude, method used ESRGAN [19] as the basic network structure high-frequency signal added to the output image [14]. and applied dilated convolution to residual-in-residual dense Another way to solve the separation of printed is to treat blocks. A new loss function that can measure the integrity of the image overlapped by handwriting and printed as a the font skeleton was imposed. Then, the generator of the low-resolution picture, and the neural network determines trained GAN model was used to separate targeted characters. which part needs to be enhanced in the process of single- Our main contributions include the following: (a) we image super-resolution. SRGAN [18] takes advantage of a proposed a new network structure and a loss function that perceptual loss function which consists of an adversarial achieves the goal of Chinese character separation from noisy loss and a content loss. Based on SRGAN, ESRGAN [19] background, especially when characters are highly over- improves the network structure by introducing the lapped; (b) the proposed method achieved the best results residual-in-residual dense block and computes perceptual Wireless Communications and Mobile Computing 3 Residual-in-residual dense block convolution 4 convolution 5 convolution 1 convolution 2 convolution 3 Leaky ReLU Leaky ReLU Leaky ReLU Leaky ReLU Dilated Dilated Dilated Dilated Dilated Dense Dense Dense block block block × × × � � � + + + ×� + Generator Synthesized Residual-in-residual Residual-in-residual Residual-in-residual
Recommended publications
  • Chinese Calligraphy: Standard Script for Beginners Free
    FREE CHINESE CALLIGRAPHY: STANDARD SCRIPT FOR BEGINNERS PDF Qu Lei Lei,Jane Portal | 32 pages | 21 Jun 2004 | BRITISH MUSEUM PRESS | 9780714124254 | English | London, United Kingdom Chinese script styles - Wikipedia In Chinese calligraphyChinese characters can be written according to five major styles. These styles are intrinsically linked to the history of Chinese script. When used in decorative ornamentation, such as book covers, movie posters, and wall hangings, characters are often written in ancient variations or simplifications that deviate from the modern standards used in Chinese, Japanese, Vietnamese or Korean. Modern variations or simplifications of characters, akin to Chinese Simplified characters or Japanese shinjitaiare occasionally used, especially since some simplified forms derive from cursive script shapes in the first place. The Japanese syllabaries of katakana and hiragana are used in calligraphy; the katakana were derived from the shapes of regular script characters and hiragana from those of cursive script. In Korea, the post- Korean War period saw the increased use of hangulthe Korean alphabet, in calligraphy. Today, this style of Chinese writing is used predominantly in sealshence the English name. Although seals name chopswhich make a signature-like impression, are carved in wood, jade and other materials, the script itself was originally written with brush and ink on bamboo books and other media, just like all other ancient scripts. However, because seals act like legal signatures in the cultures of ChinaJapanKoreaand Vietnamand because vermillion seal impressions are a fundamental part of the presentation of works of art such as calligraphy and painting, seals and therefore seal script remain ubiquitous. Clerical script characters are often "flat" in appearance, being wider than the preceding seal script and the modern standard script, both of which tend to be taller than they are wide; some versions of clerical are square, and others are wider.
    [Show full text]
  • Myocrtool: Visualization System for Generating Associative Images of Chinese Characters in Smart Devices
    Hindawi Complexity Volume 2021, Article ID 5583287, 14 pages https://doi.org/10.1155/2021/5583287 Research Article MyOcrTool: Visualization System for Generating Associative Images of Chinese Characters in Smart Devices Laxmisha Rai and Hong Li College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China Correspondence should be addressed to Laxmisha Rai; [email protected] Received 6 February 2021; Revised 12 March 2021; Accepted 16 April 2021; Published 11 May 2021 Academic Editor: Abd E.I.-Baset Hassanien Copyright © 2021 Laxmisha Rai and Hong Li. )is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Majority of Chinese characters are pictographic characters with strong associative ability and when a character appears for Chinese readers, they usually associate with the objects, or actions related to the character immediately. Having this background, we propose a system to visualize the simplified Chinese characters, so that developing any skills of either reading or writing Chinese characters is not necessary. Considering the extensive use and application of mobile devices, automatic identification of Chinese characters and display of associative images are made possible in smart devices to facilitate quick overview of a Chinese text. )is work is of practical significance considering the research and development of real-time Chinese text recognition, display of associative images and for such users who would like to visualize the text with only images. )e proposed Chinese character recognition system and visualization tool is named as MyOcrTool and developed for Android platform.
    [Show full text]
  • Research and Discussion on the Stereotyped Writing Criticism of Jiangxi School
    2017 3rd International Conference on Social Science, Management and Economics (SSME 2017) ISBN: 978-1-60595-462-2 Research and Discussion on the Stereotyped Writing Criticism of Jiangxi School Xiaolian Li1 Abstract The stereotyped writing criticism of Jiangxi School is an important school in Chinese literature history. It is not only an influential existence on literature, but also politically gives impact on historical development. Qing dynasty literature in almost all areas have made great progress that make the various forms of genre of Chinese classical literature have been into the comprehensive prosperity. The rhythm of the eight-part essay writing features can also be of composition structure shift from above, the inversion analysis of rhythm of the reasonable explanation. Eight-part essays writing have speaking format, the arrangement of each part of the article structure and should be treated according to certain programs. This paper integrates the related result to propose the new idea on the stereotyped writing criticism genre that propose the novel perspective for the related research areas. Keywords: Stereotyped Writing, Criticism, Jiangxi School Introduction In Ming dynasty, the imperial examinations in the four article in the eleventh year of Chenghua not to evolve into a new paradigm, namely in the main body of the article and eight couplet to complete the subject, this paper is unique to be the eight-part essays, and gradually replaced four article as system is the general term for righteousness. Initially this format is mostly appear in the sentence subject, then gradually covered other types of topic, became the most popular Chenghua and Hongzhi after made art paradigm.
    [Show full text]
  • Portland State University
    American Oriental Society Western Branch 2008 PROGRAM October 24-25, Portland, Oregon Co-sponsored by The Department of Foreign Languages & Literatures Portland State University Meeting Sites Friday Morning: Double sessions will occur at The Old Church: 1422 SW 11th Avenue (at Clay Street). Street-car stop: 11th and Clay. From Mark Spencer Hotel: Turn right as you exit the hotel, and walk ten blocks up 11th Avenue to Clay Street. The Old Church is on the left. Or, turn right as you exit the hotel, then board the South Waterfront street-car at 11th and Alder; ride three stops to Clay Street. Return to hotel by boarding the NW 23rd Avenue street-car on 10th. The Portland Streetcar is free within the downtown core; cars run every 10-15 minutes; the trip to The Old Church is about a 7-minute ride. Friday Afternoon and Saturday: Full sessions in The Pearl Room, basement of the Mark Spencer Hotel: 409 SW 11th Avenue (at Stark Street). From airport: Take the MAX Light Rail train for $2.30; trains run every 15 minutes, ride takes 42 minutes. Get off at The Galleria, on Morrison Street between 9th and 10th Avenues. Walk 1½ blocks forward (to 11th), and 2 ½ blocks right (past Washington Street). Hotel is on the corner of 11th and Stark. From AMTRAK: walk 12 blocks south and 3 blocks west; or take a cab; or walk 4 blocks to 11th Avenue and take the Streetcar to Stark or Alder. Friday Evening Reception: In the Browsing Lounge (Room 238), Smith Memorial Student Union, Portland State University.
    [Show full text]
  • Chinese Character Image Clustering and Classification Based on Object Embedding Model (Student Abstract)
    PRELIMINARY PREPRINT VERSION: DO NOT CITE The AAAI Digital Library will contain the published version some time after the conference. Chinese Character Image Clustering and Classification Based on Object Embedding Model (Student Abstract) Mengting Wang, Xun Liang, Yang Xue School of Information, Renmin University of China, Beijing 100872, China {mtw, xliang, xueyang}@ruc.edu.cn Abstract according to the image characteristics. Lastly, we use We proposed Image2Vec, an unsupervised algorithm that manifold learning and logical regression algorithm to test learns feature representations for variable-size pieces of the model. The classification results outperformed the Chinese character images. existing unsupervised learning writer identification methods. Introduction Model In recent ten years, the methods represented by CNN supervised learning have made great progress in writer In this section, we introduce our model Image2Vec. Given identification, and the precision has reached more than a series of images {Mi}(i = 1, 2, ..., N) , Image2Vec maps 99%. Nevertheless, the precision of the unsupervised the items into the real space . As showed in Figure 1, learning methods is still at a low level of about 60%. This Image2Vec has two main steps: the preparation step and paper studies the problem of font recognition based on object embedding step. unsupervised learning method. The font recognition on Chinese characters is a subset of the broader domain of writer identification. Whereas supervised learning methods for Chinese character writer identification have made great achievements, unsupervised learning for this task is still a very difficult problem because the ability of feature extraction in unsupervised learning is weaker. Unsupervised learning methods for the writer identification Figure 1: Image2Vec model.
    [Show full text]
  • Archaistic Perfection: the Production of the Woodblock-Printed Edition of the Communist Manifesto in 1970S China
    Archaistic Perfection: The Production of the Woodblock-Printed Edition of The Communist Manifesto in 1970s China At the beginning of his introduction to From Woodblocks to the Internet, Christopher Reed lists The Communist Manifesto as one of the most influential printed texts in Western history, sharing equal importance with the Bible, Jane Eyre and Shakespeare’s plays.1 In East Asia, this communist classic not only profoundly influenced the modern history of China, but was also, surprisingly, connected to the use of woodblock printing technology in the 20th-century. In 1973, a woodblock-printed and thread-bound edition of The Communist Manifesto (Gongchandang xuanyan, 共产党宣言, hereafter The Manifesto) was published by the Cloudy Studio (Duoyunxuan, 朵云轩), a publisher in Shanghai which specialised in collectible Chinese literati paintings, stationery, books and similar products.2 This was then, and still is, the first and only woodblock-printed edition of this classic of the communist canon in the history of the Chinese book.3 The choice of woodblock printing technology was remarkable during the 1970s. Under the direct instruction from central CCP leaders including Yao Wenyuan 姚文元, Shanghai Revolutionary Committee ordered the Cloudy Studio to woodblock publish The Manifesto.4 The publisher was only able to recruit a few aged woodblock cutters for the project since no new cutters had been trained during the Cultural Revolution.5 This was not only due to the political ruptures, for by the 1970s woodblock printing had been losing its popularity
    [Show full text]
  • Institutsbericht 11
    Institut für Ostasienwissenschaften der Universität Wien INSTITUTSBERICHT 11 Studienjahr 2010/2011 November 2011 Inhaltsverzeichnis Bericht des Vorstands .............................................................................................................. 4 Bericht der Studienprogrammleitung ........................................................................................ 6 Ostasienwissenschaften ........................................................................................................... 8 Personal ................................................................................................................................ 8 Ostasienforum ...................................................................................................................... 8 Ostasientag 2010 .................................................................................................................. 8 EastAsiaNet Workshop ......................................................................................................... 9 EDV .................................................................................................................................... 10 1. Japanologie ........................................................................................................................ 11 1.1. Personal ....................................................................................................................... 11 1.2. Hörer- und Hörerinnenstatistik ....................................................................................
    [Show full text]
  • The Integration Between Chinese Characters Design and National Culture
    International Conference on Education, Language, Art and Intercultural Communication (ICELAIC 2014) The Integration Between Chinese Characters Design and National Culture Mengmeng Zhou Jiao Meng Art College, Yantai Nanshan University Yang Zhai Middle School Yantai, China Zibo, China e-mail: [email protected] e-mail: [email protected] Abstract—Nowadays, there has been a wild gap between Chinese character’ design and the design of Europe, America, A. The seal character Japan and Korea. The reason is that the designers ignore the The seal character, including minor and seal script. Minor Chinese culture while they are designing. Chinese characters are refers to oracle, jinwen, the Zhou dynasty, the six characters. one part of great Chinese nation which has more than 5000- They preserve the ancient hieroglyphs of obvious year Chinese culture. How to combine the design of Chinese characteristics. Also known as' Qin Zhuan ', is the language of characters with Chinese culture becomes the important task of Qin. Simplified characters are minor. Features are well made the development of national culture. This paper mainly focuses neat form; the font is the Zhou dynasty easy to write. In the on the above problem. history of Chinese characters, it is a minor nationality, the transition between the official script and regular script. Keywords—Chinese characters; character script; national culture; intergration; national characterristic B. Official script I. INTRODUCTION Official script, which is also called han nationalities. One of the Chinese characters in common grave fonts. Writing With the development of Chinese politics and economic, effect is slightly wide and flat. Horizontal drawing is long and the demand of the continuous development of the cultural and straight short, a rectangular shape.
    [Show full text]
  • New Simsun Font for Mac
    1 / 5 New Simsun Font For Mac Fonts. Google Fonts is a library of 1,064 free licensed font families and APIs for conveniently using the fonts via CSS and Android. We also provide delightful, .... Sep 15, 2020 — Download SimSun font free for Windows and Mac. Please note: ... I'm new to the site and noticed that they have a "Pixel-Bitmap Font" catagory.. The best website for free high-quality Simsun Mac fonts, with 20 free Simsun Mac ... If you are just looking to download the latest version, head to the forum post .... Mar 7, 2018 — If you've found Google Docs collection of default fonts seriously lacking, here's ... It's not like you can just install new fonts for your operating system and have ... The best email platforms for iPhone, iPad and Mac users in 2021.. Sep 21, 2020 — We have thousands of free fonts available for you. Cloud fonts are fonts hosted in the cloud by Microsoft Office, and are available in the latest .... Mar 11, 2020 — SimSun & NSimSun is a Simplified Chinese font features mincho (serif) stroke style. Overview. File name, Simsun.ttc. Simsunb.ttf. Styles & .... A long time vendor of Chinese OEM fonts, in 2006 Monotype's new owners ... YaHei" fonts that come with Windows Vista and later, as well as Simsun (Founder .... List, description and examples of the Chinese language fonts available in Microsoft ... SimSun for Simplified Chinese (mainland/Singapore) keyboards. ... Two new Chinese fonts were introduced with Vista, "Microsoft JhengHei" and "Microsoft ... Windows XP Chinese · Ubuntu Linux Chinese · Other OS: Android, Mac, .
    [Show full text]
  • Defining the Fake in Chinese Porcelain
    24/07/2019 True or False? Defining the Fake in Chinese Porcelain Les Cahiers de Framespa Nouveaux champs de l’histoire sociale 31 | 2019 : provided by SOAS Research Online View metadata, citation and similar papers at core.ac.uk CORE Vrai ou fauxbrought to you by ? : qualifier les porcelaines de Chine (XVe-XXIe siècle) Dossier : Vrai ou faux ? : qualifier les porcelaines de Chine (XVe-XXIe siècle) True or False? Defining the Fake in Chinese Porcelain Vrai ou faux ? Définir le faux dans la porcelaine de Chine ¿Verdadero o falso? Definiendo lo falso en la porcelana china STACEY PIERSON Résumés English Français Español This article explores the category of ‘fake’ in Chinese porcelain. It begins by defining the fake in the context of Chinese art and considers how the fake might be a stylistic category as well as a concept with reference to Chinese ceramics. Fake porcelain by definition is made to deceive but what kind of deception is intended? Some porcelain is made to deceive the buyer who believes it is ‘authentic’. This is falsifying for profit. However, deception can also be a form of aesthetics or even a necessity. Some Chinese porcelains were intentionally made in imitation of past pieces, for example. These are usually defined as ‘archaistic’. Others were made as replacements, and still others as skeuomorphs in the trompe l’oeil tradition that developed in court arts of the Qing period. For the latter, this article queries and explores what in fact is being faked. In doing so, it demonstrates that the concept of ‘fake’ in Chinese porcelain is complex and its production is grounded in both commercial and connoisseurship practices.
    [Show full text]
  • The Chinese Seal in the Making, 1904-1937
    The Chinese Seal in the Making, 1904-1937 Elizabeth Lawrence Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2014 1 © 2014 Elizabeth Lawrence All rights reserved 2 ABSTRACT The Chinese Seal in the Making, 1904-1937 Elizabeth Lawrence Seals are hand-held printing blocks inscribed with some pattern, generally text. They were objects of immense power and prestige in imperial China. This dissertation examines the modern afterlife of inscribed seals against the backdrop of the decline and collapse of an imperial era order of knowledge and social status, the rise of modern consumer markets and mass culture, and the local accommodation of modern disciplines that promoted new ways of classifying and engaging the material world. In late imperial China (ca. 1600s-1800s), seals legitimized the rule of the emperor and his civil servants and marked the taste and erudition of the literati elite. As hand-held printing blocks that replicated in ink small textual signs, they produced authorizing marks of personhood and office and attracted elite collectors as calligraphic compositions of antiquarian interest. In modern China, seals proliferated within the cosmopolitan material culture of cities like Hangzhou and Shanghai. As the seal was transformed following the disintegration the imperial system, its multifaceted meanings and functions were increasingly subsumed under a monolithic category of “Chinese seal” as art object. The making of the “Chinese seal” as a representative fine art and marker of a distinctive Chinese culture evolved out of the diverse ways in which the carvers, consumers, scholars, and users of seals defined the object’s significance in a modern world.
    [Show full text]
  • Chinese Character Image Clustering and Classification Based on Object Embedding Model (Student Abstract)
    The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) Chinese Character Image Clustering and Classification Based on Object Embedding Model (Student Abstract) Mengting Wang, Xun Liang, Yang Xue School of Information, Renmin University of China, Beijing 100872, China {mtw, xliang, xueyang}@ruc.edu.cn Abstract according to the image characteristics. Lastly, we use We proposed Image2Vec, an unsupervised algorithm that manifold learning and logical regression algorithm to test learns feature representations for variable-size pieces of the model. The classification results outperformed the Chinese character images. existing unsupervised learning writer identification methods. Introduction Model In recent ten years, the methods represented by CNN supervised learning have made great progress in writer In this section, we introduce our model Image2Vec. Given identification, and the precision has reached more than a series of images {Mi}(i = 1, 2, ..., N) , Image2Vec maps 99%. Nevertheless, the precision of the unsupervised the items into the real space . As showed in Figure 1, learning methods is still at a low level of about 60%. This Image2Vec has two main steps: the preparation step and paper studies the problem of font recognition based on object embedding step. unsupervised learning method. The font recognition on Chinese characters is a subset of the broader domain of writer identification. Whereas supervised learning methods for Chinese character writer identification have made great achievements, unsupervised learning for this task is still a very difficult problem because the ability of feature extraction in unsupervised learning is weaker. Unsupervised learning methods for the writer identification Figure 1: Image2Vec model. include TF-IDF and subtractive clustering method (Singh Preparation Step.
    [Show full text]