Evaluation of the SVM Based Multi-Fonts Kanji Character Recognition Method for Early-Modern Japanese Printed Books
Evaluation of the SVM based Multi-Fonts Kanji Character Recognition Method for Early-Modern Japanese Printed Books Manami Fukuo1, Yurie Enomoto1†, Naoko Yoshii1, Masami Takata1, Tsukasa Kimesawa2, and Kazuki Joe 1, 1 Dept. of Advanced Information & Computer Sciences, Nara Women’s University, Nara, Japan 2 Digital Library Division, National Diet Library, Kyoto, Japan Abstract - The national diet library in Japan provides a web The information including titles and author names of the based digital archive for early-modern printed books by books in the digital library is given as text data while main image. To make better use of the digital archive, the book body is image data. There are no functions for generating text images should be converted to text data. In this paper, we data from image data. Thus full-text search is not supported evaluate the SVM based multi-fonts Kanji character yet. To make early-modern printed and valuable books data recognition method for early-modern Japanese printed books. more accessible, their main body should be given as text data, Using several sets of Kanji characters clipped from different too. As described above, the number of the target books is so publishers’ books, we obtain the recognition rate of more than large that auto conversion is required. 92% for 256 kinds of Kanji characters. It proves our recognition method, which uses the PDC (Peripheral If the conversion targets were general text images, they would Direction Contributivity) feature of given Kanji character have been converted into text data easily with some software images for learning and recognizing with an SVM, is effective of optical character recognition (OCR).
[Show full text]