Optical Character Recognition - a Combined ANN/HMM Approach

Optical Character Recognition - A Combined ANN/HMM Approach Dissertation submitted to the Department of Computer Science Technical University of Kaiserslautern for the fulfillment of the requirements for the doctoral degree Doctor of Engineering (Dr.-Ing.) by Sheikh Faisal Rashid Dean: Prof. Dr. Klaus Schneider Thesis supervisors: Prof. Dr. Thomas Breuel, TU Kaiserslautern Prof. Dr. Andreas Dengel, TU Kaiserslautern Chair of supervisory committee: Prof. Dr. Karsten Berns, TU Kaiserslautern Kaiserslautern, 11 July, 2014 D 386 Abstract Optical character recognition (OCR) of machine printed text is ubiquitously considered as a solved problem. However, error free OCR of degraded (broken and merged) and noisy text is still challenging for modern OCR systems. OCR of degraded text with high accuracy is very important due to many applications in business, industry and large scale document digitization projects. This thesis presents a new OCR method for degraded text recognition by introducing a combined ANN/HMM OCR approach. The approach provides significantly better performance in comparison with state-of-the-art HMM based OCR methods and existing open source OCR systems. In addition, the thesis introduces novel applications of ANNs and HMMs for document image preprocessing and recognition of low resolution text. Furthermore, the thesis provides psychophysical experiments to determine the effect of letter permutation in visual word recognition of Latin and Cursive script languages. HMMs and ANNs are widely employed pattern recognition paradigms and have been used in numerous pattern classification problems. This work presents a simple and novel method for combining the HMMs and ANNs in application to segmentation free OCR of degraded text. HMMs and ANNs are powerful pattern recognition strategies and their combination is interesting to improve current state-of-the-art research in OCR. Mostly, previous attempts in combining the HMMs and ANNs were focused on applying ANNs as approximation of the probability density function or as a neural vector quantizer for HMMs. These methods either require combined NN/HMM training criteria [ECBG- MZM11] or they use complex neural network architecture like time delay or space dis- placement neural networks [BLNB95]. However, in this work neural networks are used as discriminative feature extractor, in combination with novel text line scanning mech- anism, to extract discriminative features from unsegmented text lines. The features are processed by HMMs to provide segmentation free text line recognition. The ANN/HMM modules are trained separately on a common dataset by using standard machine learning 1 2 procedures. The proposed ANN/HMM OCR system also realizes to some extent several cognitive reading based strategies during the OCR. On a dataset of 1, 060 degraded text lines extracted from the widely used UNLV-ISRI benchmark database [TNBC99], the presented system achieves a 30% reduction in error rate as compared to Google’s Tesser- act OCR system [Smi13] and 43% reduction in error as compared to OCRopus OCR system [Bre08], which are the best open source OCR systems available today. In addition, this thesis introduces new applications of HMMs and ANNs in OCR and document images preprocessing. First, an HMMs-based segmentation free OCR approach is presented for recognition of low resolution text. OCR of low resolution text is quite important due to presence of low resolution text in screen-shots, web images and video captions. OCR of low resolution text is challenging because of anti-aliased rendering and use of very small font size. The characters in low resolution text are usually joined to each other and they may appear differently at different locations on computer screen. This work presents the use of HMMs in optical recognition of low resolution isolated characters and text lines. The evaluation of the proposed method shows that HMMs-based OCR techniques works quite well and reaches the performance of specialized approaches for OCR of low resolution text. Then, this thesis presents novel applications of ANNs for automatic script recognition and orientation detection. Script recognition determines the written script on the page for the application of an appropriate character recognition algorithm. Orientation detection detects and corrects the deviation of the document’s orientation angle from the horizontal direction. Both, script recognition and orientation detection, are important preprocessing steps in developing robust OCR systems. In this work, instead of extracting handcrafted features, convolutional neural networks are used to extract relevant discriminative features for each classification task. The proposed method resulted in more than 95% script recognition accuracy on various multi-script documents at connected component level and 100% page orientation detection accuracy for Urdu documents. Human reading is a nearly analogous cognitive process to OCR that involves decoding of printed symbols into meanings. Studying the cognitive reading behavior may help in building a robust machine reading strategy. This thesis presents a behavioral study that deals on how cognitive system works in visual recognition of words and permuted non-words. The objective of this study is to determine the impact of overall word shape in visual word recognition process. The permutation is considered as a source of shape degradation and visual appearance of actual words can be distorted by changing the con- 3 stituent letter positions inside the words. The study proposes a hypothesis that reading of words and permuted non-words are two distinct mental level processes, and people use different strategies in handling permuted non-words as compared to normal words. The hypothesis is tested by conducting psychophysical experiments in visual recognition of words from orthographically different languages i.e. Urdu, German and English. Exper- imental data is analyzed using analysis of variance (ANOVA) and distribution free rank tests to determine significance differences in response time latencies for two classes of data. The results support the presented hypothesis and the findings are consistent with the dual route theories of reading. 4 Acknowledgments Foremost, I would like to express my sincere gratitude to my advisor Prof. Dr. Thomas Breuel for his continuous support in my Ph.D. He is always a source of inspiration to me for his patience, motivation, enthusiasm, and immense knowledge in the field of computer science, pattern recognition, image processing and document analysis. I am also obliged to my second advisor, Prof. Dr. Andreas Dengel for his timely and highly up to the mark feedback and support. His grand knowledge and expertise in the field of artificial intelligence is an asset for me life long. Along with my advisors, I am grateful to the chair of my thesis committee, Prof. Dr. Karsten Berns, for his encouragement, insightful comments, and hard questions. My sincere gratitude to the most adorable Dr. Faisal Shafait for his kind and nourishing supervision of my PhD. Moreover, Dr. Tandra Ghose must be acknowledged for making my basis in cognitive psychology research. A special thanks to Dr. Marcus Liwicki for providing valuable comments in preparation of my thesis defense. I am blissful to Dr. Marc-Peter Schambach, Dr. Jörg Rottland, Mr. Stephan von der Nüll for offering me internship opportunity at Seimens Konstanz and leading me working on diverse exciting projects. It is my pleasure to be part of IUPR research group who nurtures my research in different phases. Special thanks to Joost van Beusekom for presenting my work in SPIE HVEI conference, Sheraz Ahmed and Muhammad Imran Malik, my best buddies, for listening and correcting my defense presentation, and Ingrid Romani for helping me in various administrative issues. Last but not the least, I would like to thank my family: my mother Zubaida Rashid, to whom I owe so much. Thank you mom teaching me the importance of hard work, for giving me the strength during time of adversity and for your constant love & support. There are not enough words to express how grateful I feel to have you as my mom. My 5 6 elder brother, Sheikh Muhammad Idrees, who raised me like a father and guided me in every aspect of my life. My wife & best friend, Dr. Iram Ashraf, for love, patience and ability to always keep me balance. You are my rock and constant source of inspiration, and this project not have been completed without your unwavering support. I love you more and more each day of my life. My daughter, Urooj Faisal –my lucky chum–, whose lovely talks are always a refreshing boost for me. And thanks to whole my family for their prayers and well wishes. Contents 1 Introduction 15 1.1 Contributions of this Thesis . 22 1.2 Thesis Overview . 23 2 Segmentation free OCR using HMMs 25 2.1 Introduction . 25 2.2 Related Work . 28 2.3 Hidden Markov Models . 32 2.3.1 Definition . 32 2.3.2 Three Basic Problems for HMMs . 33 2.3.3 Solutions to the HMMs Problems . 34 2.4 HMMs based OCR System . 38 2.4.1 Preprocessing . 38 2.4.2 Feature Extraction . 39 2.4.3 Recognition . 42 2.5 Experimental Results and Evaluations . 43 2.5.1 Low resolution characters OCR . 43 2.5.2 Low resolution screen rendered text lines OCR . 44 2.5.3 Machine printed text lines OCR . 45 2.6 Discussion . 46 3 Discriminative Learning for Document Image Analysis using ANNs 49 3.1 Introduction . 50 3.2 Script Recognition . 53 3.2.1 Review of related work . 55 3.2.2 Proposed method . 57 3.2.3 Experimental Results . 62 7 8 CONTENTS 3.2.4 Discussion . 65 3.3 Orientation Detection . 65 3.3.1 Related Work . 66 3.3.2 Datasets . 67 3.3.3 Method description . 68 3.3.4 Experimental Results and Evaluation . 72 3.3.5 Discussion . 73 3.4 General Discussion . 74 4 A Combined ANN/HMM Approach for OCR 75 4.1 Introduction . 76 4.2 Overview of Related Work .

Optical Character Recognition - a Combined ANN/HMM Approach

Download Fedra Sans Bold

Multimedia Foundations Glossary of Terms Chapter 8 – Text

Steganography in Arabic Text Using Full Diacritics Text

CLASS: Working with Students with Dysgraphia

Cap Height Body X-Height Crossbar Terminal Counter Bowl Stroke Loop

License Agreement

INTRODUCTORY CONCEPTS Desktop Publishing Terms Overview

Microsoft Word 2016 Step by Step

African Fonts and Open Source

A Uppercase Letter Meaning

Word 2010 Basics I

2.1 Typography