ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES SCHOOL OF INFORMATION SCIENCE

BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF MIXED AMHARIC AND ENGLISH PRINTED DOCUMENT

SERTSE ABEBE

JUNE, 2011

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES SCHOOL OF INFORMATION SCIENCE

BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF MIXED AMHARIC AND ENGLISH PRINTED DOCUMENT

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

By

SERTSE ABEBE

JUNE, 2011

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES SCHOOL OF INFORMATION SCIENCE

BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF MIXED AMHARIC AND ENGLISH PRINTED DOCUMENT

By

SERTSE ABEBE

JUNE, 2011

Name and signature of Members of the Examining Board

Name Title Signature Date

______ Chairperson ______ ______

______ Advisor(s) ______ ______

______ Examiner ______ ______

Declaration

I declare that the thesis is my original work and has not been presented for a degree in any other university.

______

Date

This thesis has been submitted for examination with my approval as university advisor.

______

Advisor

Acknowledgment

First of all, I would like to acknowledge my advisor, Dr. Dereje Teferi, for his constructive comments and advice.

My acknowledgment shall also pass to Bahir Dar University, which provided me the opportunity to enroll in this program.

I would also like to acknowledge all the staff of the Addis Ababa University School of Information Science for their cooperation.

Finally, I want to acknowledge my wife Simegnat Abuhay and my beloved son Natan Sertse for all their support and love during my stay in the program.

Table of Contents

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER ONE: INTRODUCTION
1.1. Background
1.2. Statement of the Problem
1.3. Objective of the Study
1.3.1. General objective
1.3.2. Specific objectives
1.4. Significance of the study
1.5. Scope and limitation of the study
1.6. Methodology
1.6.1. Data collection
1.6.2. Designing and implementation tools
1.6.3. Literature review
1.7. Organization of the Thesis
CHAPTER TWO: WRITING SYSTEMS
2.1. Overview of Writing System
2.1.1. Families of writing system
2.2. English Language Script
2.2.1. Origin and evolution of English language script
2.2.2. Characteristics of Roman scripts
2.3. Amharic language Scripts
2.3.1. Evolution of Ethiopic script
2.3.2. Features of Ethiopic script
CHAPTER THREE: OPTICAL CHARACTER RECOGNITION AND SCRIPT IDENTIFICATION
3.1. Optical Character Recognition (OCR)
3.1.1. Multilingual Script OCR
3.2. Script Identification
3.3. Tasks in Multilanguage Script Identification
3.3.1. Preprocessing
3.3.1.1. Digitization
3.3.1.2. Skew Detection and Correction
3.3.1.3. Noise Removal
3.3.1.4. Binarization and Threshold
3.3.1.5. Normalization
3.3.1.7. Thinning
3.3.4. Feature extraction
3.3.5. Classification
CHAPTER FOUR: DESIGN AND DEVELOPMENT
4.1. Review of Proposed Method
4.2.1. Digitization
4.2.2. Noise Removal
4.2.3. Threshold and Binarization
4.2.1. Segmentation
4.2.2. Size Normalization and thinning
4.2.3. Feature Extraction
4.2.4. Classification
4.2.4.1. Training
4.2.4.2. Testing
4.3. Data set preparation and experiment
4.3.1. Dataset preparation
4.3.2. Experiment
4.4. Results and Discussion
4.4.1. Results
4.4.2. Discussion
CHAPTER FIVE: CONCLUSION AND RECOMMENDATION
5.1. Conclusion
5.2. Recommendation
REFERENCE
APPENDIX 1: SOURCE CODE OF SCRIPT IDENTIFIER
APPENDIX 2: SOURCE CODE OF FEATURE EXTRACTION


LIST OF FIGURES

FIGURE 2.1 SAMPLE ALPHA-SYLLABIC WRITING SYSTEM ETHIOPIC SCRIPT
FIGURE 2.2 LATIN SCRIPT AT THE BEGINNING OF THE CHRISTIAN ERA
FIGURE 2.3 A GENEALOGY TREE OF ETHIOPIC SCRIPT FROM COULMAS
FIGURE 3.1 PROPOSED MULTILINGUAL OCR
FIGURE 3.2 FIGURAL REPRESENTATION OF K-NEAREST-NEIGHBOR CLASSIFIERS
FIGURE 3.3 SEPARATING HYPERPLANE IN SVM REPRESENTATION FOR LINEAR CLASSIFIER
FIGURE 4.1 PROPOSED MODEL OF SCRIPT IDENTIFICATION OF AMHARIC AND ENGLISH LANGUAGE FROM A DOCUMENT IMAGE
FIGURE 4.2 DOCUMENT IMAGE PREPARED USING MICROSOFT DOCUMENT IMAGE WRITER
FIGURE 4.3 DOCUMENT IMAGE PREPARED THROUGH SCANNING WITH HP 7100 SCANNER
FIGURE 4.4 NOISED IMAGE FROM MAGAZINE
FIGURE 4.5 IMAGE WITH ITS NOISE REMOVED THROUGH ADAPTIVE FILTERING
FIGURE 4.6 BINARIZED DOCUMENT IMAGE READY FOR SEGMENTATION
FIGURE 4.7 (A, B, C, D, E, F) SEGMENTED TEXT LINES FROM DOCUMENT IMAGE 4.8
FIGURE 4.8 DOCUMENT IMAGE CONTAINING AMHARIC AND ENGLISH SCRIPT
FIGURE 4.9 SEGMENTED LINE IMAGE TO ESTIMATE ITS HEIGHT
FIGURE 4.10 WORD GAP TO BE ESTIMATED IN THE LINE VECTOR
FIGURE 4.11 SEGMENTED WORD IMAGE FROM THE DOCUMENT IMAGE
FIGURE 4.12 RESIZED WORD IMAGE
FIGURE 4.13 SMOOTHED AND THINNED WORD IMAGE
FIGURE 4.14 REGIONS OF WORD IMAGE IN WHICH MAXIMUM HORIZONTAL PROJECTION FEATURES ARE EXTRACTED
FIGURE 4.15 GRAPHICAL USER INTERFACE OF SCRIPT IDENTIFIER
FIGURE 4.16 (A), (B), (C) PREDICTED WORD IMAGES USING SVM CLASSIFIER
FIGURE 4.17 FEATURE VALUES WHICH MISLEAD THE CLASSIFIER INTO CLASSIFYING THE WORD IMAGE AS ENGLISH


LIST OF TABLES

TABLE 2.1 ANCIENT LATIN SCRIPTS, SOURCE FROM OMNIGLOT
TABLE 2.2 MODERN LATIN SCRIPTS IN ENGLISH LANGUAGE DOCUMENTS
TABLE 2.3 SAMPLE FONT FACES OF ETHIOPIC SCRIPT IN PRINTED AMHARIC DOCUMENTS
TABLE 2.4 LIST OF ETHIOPIC SCRIPTS TAKEN FROM [36]
TABLE 2.5 CORE CHARACTERS OF ETHIOPIC SCRIPT USED IN AMHARIC LANGUAGE
TABLE 3.1 PICTURE FORMATS SUPPORTED BY MATLAB COMPILER
TABLE 4.1 TRAINING AND TESTING DATASET
TABLE 4.2 ACCURACY RESULTS OF THE CLASSIFIER AT DIFFERENT FONT SIZE POINTS
TABLE 4.3 ACCURACY RESULTS OF THE CLASSIFIER AT DIFFERENT FONT STYLES
TABLE 4.4 CONFUSION MATRIX OVER MIXED DOCUMENT
TABLE 4.5 ACCURACY OF THE CLASSIFIER OVER MIXED DOCUMENT
TABLE 4.6 PERFORMANCE OF CLASSIFIER OVER AMHARIC REAL DOCUMENT
TABLE 4.7 ACCURACY OF THE CLASSIFIER OVER MIXED AMHARIC AND ENGLISH REAL DOCUMENTS


ABSTRACT

OCR is a document image analysis technique that recognizes the informative content of text documents so that they can be archived in softcopy for different purposes. The technique converts a given image of text into its most probable matching character in a given domain language script.

A line of a multilingual document page may contain text words in different languages. To recognize such a document page, it is necessary to identify the different script forms before running an individual OCR system.

In this paper, a system that distinctly identifies Amharic and English scripts from a document image is presented. The system addresses the language identification problem at the word level. To extract the important feature values from the word images of the scripts, preprocessing activities such as noise removal, binarization, segmentation, and size and style normalization are performed. The maximum horizontal projection profiles of three selected regions, the extent of the word image, and the ratio of the number of connected components to the word-image width are the important feature values used to discriminate the two languages' scripts.

The Support Vector Machine algorithm is applied to classify new instances of word images. The proposed algorithm is tested with a significant number of words in various font styles and sizes. The results obtained are quite promising and encouraging.


CHAPTER ONE
INTRODUCTION

1.1. Background

With a continually changing world, the role of information in human activities is constantly growing. Throughout human development history, information has been embodied in different technologies. Stones, clay tablets, papyrus, skins, and paper were among the technologies which human beings used to store and disseminate information. These days, however, there is a completely different world of processing information. The emergence and widespread application of multimedia technologies and the Internet changed the way humans handle and process information. Computer chips and magnetic storage media store information in the form of digital signals. The trend is to store and disseminate more and more information digitally, accompanied by the introduction of a variety of new and efficient digital technologies.

At the heart of digitization, technology plays important roles. In the progress towards digital convergence, multimedia technologies are essential components which transform information from traditional storage into a new format of digitally encoded information.

With respect to the intended application and the format of the information, the digitization process is accomplished with diversified techniques and technologies. Optical Character Recognition (OCR) in particular is playing an important role in the transformation of the traditional paper-based environment into a truly paperless electronic environment [16], [56].

OCR comprises techniques and technologies that allow a digital machine to automatically recognize characters through an optical mechanism from traditional paper-based content [57]. OCR is a document image analysis technique that recognizes the informative content of text documents so that they can be archived in softcopy for different purposes. The technique converts a given image of text into its most probable matching character in a given domain language script [53].

OCR deals with the text components of a document image. It recognizes the text in documents and extracts the intended information as a human would. Some typical tasks in OCR are: determining the skew (any tilt at which the document may have been scanned into the computer); finding edges, paragraphs, text lines, and words; and finally recognizing the characters in a document image (and possibly their attributes such as size and font) [37], [49], [57].

OCR may deal with recognition of the characters of a variety of languages. Notably, popular languages of the world have OCR technology to recognize the characters of their scripts [18], [36], [57]. OCR may also deal with recognition of multilanguage characters from the same document. In recognizing multilingual characters, OCR may incorporate another technique called script identification [31], [49], [61].

Script identification is the task of identifying the scripts of different languages in a single document using different techniques. A line of a multilingual document page may contain text words in different languages. To recognize such a document, it is necessary to identify the different script forms before running an individual OCR system [16], [32].

Script identification facilitates many important applications, such as sorting images, selecting the appropriate script-specific text understanding system, and searching online archives of document images containing a particular script [2].

Identifying language scripts from a document image involves several processes. Preprocessing the document image, segmenting the document image to the appropriate component level, extracting appropriate features, and classifying the scripts into their appropriate classes are the major activities in multilingual script identification [10], [11].

Preprocessing a document image containing multilingual script comprises tasks such as digitization, skew correction, noise cleaning, binarization, edge detection, and segmentation. These tasks help to make the document image ready for further processing [11], [26], [38].
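
As an illustration, the MATLAB fragment below (MATLAB being the tool used later in this thesis) sketches the noise-cleaning and binarization steps. It is a minimal sketch only: the file name and the 5-by-5 filter window are placeholder assumptions, and wiener2, graythresh and im2bw are Image Processing Toolbox functions, not the thesis's exact procedure.

% A minimal preprocessing sketch, assuming a grayscale or RGB scan.
img = imread('sample.png');    % digitized document image ('sample.png' is illustrative)
if size(img, 3) == 3
    img = rgb2gray(img);       % reduce a color scan to grayscale
end
img = wiener2(img, [5 5]);     % adaptive (Wiener) noise filtering
level = graythresh(img);       % global threshold by Otsu's method
bw = ~im2bw(img, level);       % binarize and invert so text pixels equal 1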

Script recognition is a process that can follow different approaches. The identification may be performed at the document level, line level, word level or distinct character level [2], [31], [42]. The challenge for researchers is then to find the feature attributes that can clearly distinguish the scripts. No distinct mathematical approach or scientific technique has been devised for all script identification problems; research is mostly done through experimentation. In addition, different scripts have different attributes and styles. A document attribute that is successful in identifying one script may not be successful for another script [2], [31].

However, there are some common approaches that are frequently deployed. Text-line level script identification is proposed in different research reports [2], [26]. This technique needs the scripts to exist on different lines and requires minimal computation time; however, it cannot identify different scripts that appear on the same line [26]. Word level script identification has also been employed in different experiments. This method reduces the search space in the database in comparison with character level script identification, and it can handle identification of scripts from the same line; however, it involves the cost of the script recognition task [16], [18], [31]. The other strategy reported in the literature is character level script identification, which accomplishes script identification by preparing a template database for each character of a given language's script. This approach increases the search space, particularly when the number of language scripts to be identified is large.

The segmentation process of a given script identification task fundamentally depends on the level at which script identification is performed. Generally, segmentation tasks can be categorized into the bottom-up approach and the top-down approach [20].

In the bottom-up approach, black pixels are grouped into character components. These components are then combined to form words, text lines and paragraphs. First, connected component extraction is performed on the thresholded image after horizontal smearing. Connected components are then grouped together horizontally through a series of steps to form sub-words, words, lines and paragraphs [2], [20].

In the top-down approach, segmentation is performed starting with the scanned page. The page is segmented into large blocks, which are then classified into text blocks, figures and tables, recursively.


In many research reports, segmentation for script identification is accomplished in a top-down approach through analysis of horizontal and vertical projection profiles [31], [36], [37], [42]. The horizontal projection profile is studied to segment the document page into a series of text-line images. The horizontal projection is computed as a row-wise sum of black pixels [11]. The position between two consecutive text lines where the histogram height is least denotes a boundary line. Using these boundary lines, the document image is segmented into several text lines.

For segmenting the text-line image into word images and character images, the information from the vertical projection profile is examined. The vertical projection profile is the sum of black pixels along every column of the image. The boundary between a character and its word can be marked from the width of the gap between the zero-valued valleys [11], [37].
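
A hedged MATLAB sketch of these two projections is given below. It assumes bw is a binary image with text pixels equal to 1 (as in the earlier fragment); the word-gap threshold of a third of the line height is an illustrative heuristic, not a value taken from this thesis.

% Top-down segmentation by projection profiles (sketch).
hp = sum(bw, 2);                           % horizontal projection: row-wise sums
textRows = (hp' > 0);                      % rows containing at least one text pixel
lineStart = find(diff([0 textRows]) == 1);
lineEnd = find(diff([textRows 0]) == -1);
line1 = bw(lineStart(1):lineEnd(1), :);    % first segmented text line

vp = sum(line1, 1);                        % vertical projection: column-wise sums
gaps = (vp == 0);                          % zero-valued valleys between strokes
gapStart = find(diff([0 gaps]) == 1);
gapLen = find(diff([gaps 0]) == -1) - gapStart + 1;
minGap = round(size(line1, 1) / 3);        % assumed heuristic word-gap width
wordGapStart = gapStart(gapLen >= minGap); % gaps wide enough to be word boundaries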

The two other essential stages in the script identification phase are feature extraction and classification. The feature extraction stage analyzes a text segment and selects a set of important features that can uniquely identify it. The derived features are then used as input to the character classifier [26].

Feature extraction is an important task that identifies appropriate measures to characterize the component images distinctly [10]. Feature extraction ranges from considering the whole image as the feature at one end to focusing only on some selected moments or other shape measurements as the features at the other. Area, projection parameters, moments, fringe measures, number of zero crossings, etc. are mentioned in [10] as features for identification of Indian language scripts.

Identification of multilingual scripts is finally accomplished by classifying the scripts using a selected classifier algorithm [10], [11]. Algorithms such as K-Nearest Neighbor (KNN), Decision Tree, Neural Network, Naïve Bayes, Support Vector Machine and Hidden Markov Model classifiers are frequently described schemes in different OCR researches. The outputs of the feature extraction computation are the inputs to the classifier [2], [11].


In bilingual script identification, SVM and KNN classifiers have been used by the majority of research reports [9], [10], [17]. SVMs are pair-wise discriminating classifiers with the ability to identify the decision boundary with maximal margin, which results in better generalization. KNN works by determining the distances from the query instance to the training samples to find the k nearest neighbors. After the k nearest neighbors are determined, a simple majority vote of these neighbors becomes the prediction for the query instance. In the cited works, the majority vote is carried out by varying the number of neighbors (K = 3, 5, 7), and the performance of the algorithm is optimal when K = 3 [10], [20], [44].
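
For concreteness, a minimal MATLAB sketch of such a KNN vote is shown below. The function and variable names are illustrative only: trainX is assumed to be an n-by-d feature matrix, trainY a vector of numeric class labels, and Euclidean distance is assumed. A call such as knnClassify(trainX, trainY, newWordFeatures, 3) would reproduce the K = 3 setting reported above.

function label = knnClassify(trainX, trainY, query, k)
% Minimal k-NN sketch: Euclidean distance plus simple majority vote.
diffs = trainX - repmat(query, size(trainX, 1), 1);  % query against every sample
dists = sqrt(sum(diffs .^ 2, 2));                    % Euclidean distances
[sortedDists, idx] = sort(dists);                    % nearest samples first
label = mode(trainY(idx(1:k)));                      % majority vote of the k nearest
end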

Amharic is one of the languages of Africa which uses its own script to represent textual and numeric information. Amharic is the official language of the Ethiopian government. A great number of documents are produced daily in this language's script, and there are also great collections in libraries and museums which were already produced in the same script [37], [65], [66], [67].

As stated by [37], [65], [66], [67], converting Amharic documents to electronic format is essential with respect to preserving historical documents, saving storage space, and facilitating retrieval of relevant information via the Internet.

Just as a number of historical documents can be found written in Amharic fonts, there are also a number of official, historical, legal, and academic documents in many institutions, museums, and libraries which are written in bilingual fonts, particularly in the Amharic and English language scripts. For instance, all the bilateral treaties concluded between the Ethiopian government and other countries' governments over the past century, the legal documents proclaimed by the legislative body of the Ethiopian government (Negarit Gazeta), and Amharic-English or Amharic-other-Latin-language dictionaries were written in mixed language scripts.

Optical Character Recognition (OCR) for English characters has been extensively studied in the last half century and has progressed to a level sufficient to produce technology-driven applications [52]. Now, the rapidly growing computational power enables the implementation of the present OCR methodologies and also creates an increasing demand in many emerging application domains, which require more advanced methodologies [52].

Despite the fact that Amharic OCR has not reached the level of technology-driven application, a substantial number of research articles have been produced in the area of Amharic OCR recently [19]. Different algorithms have been introduced with promising levels of recognition [37], [65], [66], [67]. However, all the efforts are focused on recognizing printed and handwritten Amharic characters from documents with monolingual script.

1.2. Statement of the Problem

These days in Ethiopia, there are documents which are produced in bilingual scripts, particularly in the Amharic and English scripts. For instance, the civil code and the penal code of the country are written in the Amharic and English languages. Because of the global influence of the English language today, the trend will continue in the future as well. Ethiopia is also home to more than 80 nations and nationalities and follows a federal system of government [61]. This situation allows regional governments to use their own language and their choice of script for official work. Some regions have already adopted a Latin script and produced a number of official and academic documents; for instance, the Oromia region uses Latin script as its writing system. Since Amharic is the official language of the federal government, there is a high tendency towards the production of documents in bilingual scripts.

As discussed in the introduction, a substantial amount of research is being conducted in the area of OCR of Amharic scripts. There are also considerably well-developed English OCR systems and algorithms. However, neither system can recognize a document produced in mixed Amharic and English fonts simultaneously. Thus, a system that recognizes mixed Amharic and English fonts is an open research area in the field.


As proposed in different bilingual OCR research works, multilanguage script identification from a single document is the prerequisite task for OCR of bilingual scripts [2], [11], [16], [31], [44].

This investigation into identifying bilingual scripts from a document image containing both Amharic and English script is an important advancement towards the development of a bilingual OCR system. The purpose of this study is to develop a methodology and a prototype, with a proper level of accuracy, that classifies Amharic and English scripts from a single document image.

1.3. Objective of the Study

1.3.1. General objective

The general objective of this research is to study the patterns and features that distinguish Amharic scripts from English scripts and to develop a method that helps in identifying bilingual scripts.

1.3.2. Specific objectives

The specific objectives of the study are:

 To provide research output that helps OCR system developers in relation to the development of a bilingual Amharic/English OCR system;

 To experiment with the usability of different algorithms and techniques used so far in other research on mixed language script identification for the identification of the Amharic/English language scripts;

 To assess the different techniques of script identification used so far on other language scripts and to adopt the optimal methods which suit the identification of Amharic and English scripts from the same document page image;

 To design an Amharic-English script identifier and evaluate its performance using test datasets.


1.4. Significance of the study

• This study helps in identifying problems and challenges in the development of bilingual OCR for Amharic together with other languages that use a Latin script;

• It contributes a research output for the Amharic OCR community in the development of Amharic OCR;

• It opens a new OCR research direction towards the development of a bilingual OCR for the Amharic and English language scripts.

1.5. Scope and limitation of the study

This work attempts to investigate and identify patterns which help to identify the Amharic and English language scripts in a document image. In general, such work could extend from preprocessing of the document image all the way to proper classification of the scripts into their true classes.

Due to time constraints, the researcher limited the scope of the study to include only the noise removal and binarization tasks of the preprocessing activity. Removal of image units and lines and skew correction are done manually and are not included in the investigation of this research. Since the objective is to identify scripts at the word level, segmentation is done only at the word level.

Regarding the dataset, the present research focused only on the “Times New Roman” font type for English language scripts and on the “Power Geez” and “Visual Geez” font types for Amharic language scripts. The test considers 10, 12, 14 and 16 point font sizes of both the English and Amharic font types.

Finally, the focus of this research is not to investigate the development of a bilingual OCR for Amharic and English scripts, but rather to classify English and Amharic language scripts from a document image, which is considered by many multilingual OCR research reports as a primary task that should be accomplished before actual recognition of the scripts.


Since this investigation is a first trial towards script identification from mixed English and Amharic documents, it has its own limitations.

• The system is designed with a series of steps and processes in which each task takes its own time. One limitation of the research is that it did not measure the efficiency of the system in terms of time.
• Because of time and tool constraints, the experimentation is done only with the SVM classification algorithm. Alternative algorithms like decision tree, KNN and artificial neural networks are not tested on the problem under investigation.
• Evaluation of the classifier is performed with a relatively small testing dataset, which may limit how exactly the performance on the script identification problem is measured.
• The word segmentation technique used in this experiment cannot handle all the variations of word gap found in real documents and may create over-segmentation.

1.6. Methodology

1.6.1. Data collection

Two types of dataset are prepared: a training dataset and a test dataset. For the training dataset, a document which contains a set of representative words from both languages in a number of fonts is prepared. Since there is no ready-made corpus for this type of research, the training data are collected from the Internet and from real-life documents. For the Amharic font types, Visual Geez and Power Geez fonts are included in the sample, and for English, the Times New Roman font is used. The selected font types are popular fonts with which most documents have been produced in real-world usage.

For the test dataset, a collection of documents which contain mixed Amharic and English scripts from magazines, newspapers, and other documents have been scanned and prepared.

1.6.2. Designing and implementation tools

The design procedures of the research comprise a series of preprocessing, feature extraction and classification stages. In order to find fine objects for feature extraction, preprocessing of the document image, such as digitization, noise removal, binarization, size normalization, thinning and non-text region removal, is accomplished. All the features of the training dataset are extracted using a feature extraction technique comprised of the extent, the ratio of the number of connected components to the word width, and the maximum horizontal sums of black pixels in three different regions of the word image. SVM is used for training the script identifier system.
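
A hedged MATLAB sketch of these three feature families follows. It assumes word is a size-normalized binary word image with text pixels equal to 1; the equal-thirds region split and the definition of extent used here are the writer's assumptions for illustration, not the thesis's exact formulas.

% Sketch of the feature vector described above.
[h, w] = size(word);

extent = sum(word(:)) / (h * w);        % (1) fraction of text pixels in the word box

[labels, numCC] = bwlabel(word, 8);     % (2) count 8-connected components
ccRatio = numCC / w;                    %     ratio of components to word width

hp = sum(word, 2);                      % (3) row-wise sums of black pixels
third = floor(h / 3);
topMax = max(hp(1:third));              % maximum projection in the top region
middleMax = max(hp(third+1:2*third));   % maximum projection in the middle region
bottomMax = max(hp(2*third+1:end));     % maximum projection in the bottom region

features = [extent, ccRatio, topMax, middleMax, bottomMax];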

Testing is accomplished after the testing datasets are prepared through the same preprocessing and feature extraction procedures that were used for the training dataset. The SVM classifier is used to classify the scripts into their appropriate classes.

Testing in this experiment is done through two approaches. In the first, the testing datasets are prepared so that each testing instance contains only one of the scripts and is given a true label. This approach is a better way to test a huge dataset in one pass; however, it requires preparing each testing set to contain only one type of script. The second approach predicts the unlabeled test dataset and counts the coincidences of the word images with their labels. All the performance evaluation and accuracy calculations are done manually.
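
The accuracy figure itself reduces to a simple proportion. A one-line MATLAB sketch, assuming predicted and actual are equal-length label vectors for the test word images:

% Percentage of word images whose predicted label matches the true label.
accuracy = 100 * sum(predicted == actual) / numel(actual);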

With respect to tools, the experimentation process was implemented using MATLAB version 7.1. MATLAB is preferred because of its simplicity and suitability for technical computing. Moreover, MATLAB has several toolboxes and built-in functions associated with image processing, which helps the research process to be carried out easily. Better visualization is also a factor in selecting MATLAB: its ability to present datasets as figures with a variety of alternative presentations, and the possibility of linking datasets in common simple database formats such as .xls, are further reasons which motivated the researcher to implement the experimentation with this particular software.

The experimentation is conducted on a personal laptop computer with 1 GB of RAM and an Intel(R) Core(TM)2 Duo CPU rated at 2.00 GHz (running at 777 MHz).


1.6.3. Literature review

For a proper understanding of the problem under investigation, it is essential to review an adequate amount of research work. For this particular research, extensive reviews of previous studies, both local and from abroad, have been conducted. Literature on the Amharic writing system, the English writing system, OCR systems including Amharic character recognition, multilingual OCR, multilingual script identification, preprocessing, text segmentation, feature extraction, and machine learning is reviewed. This helped in developing the research problem clearly and in analyzing the advantages and drawbacks of different methods for different applications in the area of the research problem under investigation.

1.7. Organization of the Thesis

The thesis is organized in five chapters. The first chapter provides an overview of the general background, the statement of the problem, the objectives of the research, the scope of the study and a general statement about the methodology of the research. Chapter two presents the historical development of writing systems, the development of the Amharic and English writing systems, and the features and characteristics of the English and Amharic writing systems, which provide an insight into the general features of the two languages and scripts. Chapter three is intended to provide a detailed understanding of OCR, multilingual OCR and script identification, and reviews different works which widen our understanding of the state of the art of multilingual OCR and script identification techniques. Chapter four includes the design and development parts of this research; here, the proposed model, the experimentation and the dataset preparation are reported, and the experimental procedure, results and discussion are also included. The final chapter presents the conclusion and the recommendations forwarded by the researcher.

CHAPTER TWO
WRITING SYSTEMS

This research focuses on identifying scripts from a document containing mixed Amharic and English language scripts for OCR purposes. Understanding the evolution and the features of these two scripts is therefore an important input to the development of a script identifier which handles them.

This chapter discusses different concepts related to writing systems. The evolution of the English script, the features of the English language script, and the historical development and features of the Amharic language script are discussed here.

2.1. Overview of Writing System

A writing system, sometimes called a script, is a set of traceable symbols used to visually embody units of language in an orderly way. It is a means of communicating through symbols that represent whole words or sentences of a given spoken language [33].

A writing system has a set of rules, universally understood by a group of people, for systematically ordering and using symbols, each representing a certain term or word. The generic term for a symbol in a writing system is a character [9], [33], [60], [69].

Many scholars agree on the suggestion that the first writing system emerged among the Sumerians towards the end of the 4th millennium BC; alphabetic writing systems, however, started around 2000 BC in Egypt and the Indus valley. Since then, writing has appeared independently a number of times, associated with various civilizations [33], [36], [66].

2.1.1. Families of writing system

There are different families of writing systems which serve different language users. According to [8], the world's scripts can be grouped into one of the following writing system families: Alphabetic, Consonantal (Abjad), Syllabic (Abugida), Alpha-syllabic and Logographic.

Alphabetic: a basic writing system in which each character roughly represents a phoneme. Phonemes are represented by a combination of consonant and vowel characters. For instance, regular English spellings represent phonemes with consonant and vowel letters.


Consonantal (Abjad): a consonantal writing system is a variant of an alphabetic writing system with the added feature that only consonants, not vowels, are written. Users are left to guess the vowels from the context of the consonant combination. Hebrew and Arabic are examples of this family.

Syllabic (Abugida): a syllabic writing system is an idealized version of a writing system. In this family, a theoretically distinct symbol represents each syllable of the language in question. According to [8], the Japanese syllabary comes close to an ideal example of this family. Some research works group the Ethiopic and Devanagari writing systems into this family. However, these writing systems, instead of creating a distinct type of character for each syllable, add appendages to a base character to produce the vowel forms of that base character.

Alpha-syllabic: the alpha-syllabic writing system falls in between the alphabetic and syllabic types. In alpha-syllabic writing systems, the fundamental character indicates a consonant, while a diacritic is added, or sometimes some other transformation is applied, in order to indicate the combination of that consonant with a following vowel.

An example of an alpha-syllabic writing system is the Ethiopic script, illustrated in Figure 2.1.

ለ ሉ ሊ ላ ሌ ል ሎ
Lə Lu Lï La Le Li Lo

Figure 2.1 Sample alpha-syllabic writing system Ethiopic script

As stated in [8], the alpha-syllabic writing system is distributed across three parts of the world. The first is in Asia, particularly in India, Nepal, Tibet, Bhutan, Bangladesh, Sri Lanka, Myanmar, Thailand, Laos, and Cambodia. The second area is in Africa, including Ethiopia and Eritrea; in Ethiopia, the alpha-syllabic Ethiopic script is used for most of the indigenous languages, including the Federal Government working language, Amharic. The third area is northern Canada.

[8] further explained that the scripts of southern India and Ethiopia have numerous irregularities in the combination of consonant and vowel, thus moving them somewhat into the syllabic family. A significant number of other researchers have also categorized the Ethiopic script in the syllabic family [5], [36].

Logographic: in all the writing systems described so far, the linguistic unit that forms the basic representational unit of the writing system has been identifiable in phonetic terms. In a logographic writing system, the basic unit of representation is the morpheme. Chinese script is the best example of this writing system.

2.2. English Language Script

Today, English is a popular international language in the world. [13] stated that the English writing system influences our lives in many ways: it is not only supplementary to other aspects of language but also vitally important to almost everything we do. Therefore, this section discusses the evolution of the English writing system, in particular the English script. The English language uses the Latin script to present information in written format.

2.2.1. Origin and evolution of English language script

According to [33] and [42], the history of the Latin (also called Roman) script started in the 7th century BC. It is an adaptation of the Etruscan alphabet to the Latin language. The Etruscan script itself was adopted from the Greek colonists in Italy; the origin of the Latin script is traced through the Phoenician script to the North Semitic alphabet [29], [33], [42], [60], [69].

As stated by [29], [33], [42], [60], [69], ancient inscriptions in Latin characters are found in different places in Europe, particularly in Italy. The oldest inscription, dating from the 7th century BC and known as the ‘Praeneste Fibula’, is now preserved in a Rome museum. This written inscription reads from right to left. Another piece of evidence, dating from the end of the 7th or the beginning of the 6th century BC, was engraved on a small pillar found in the Roman Forum; this one is written vertically on the four faces of the pillar. An inscription, probably of the 6th century BC, was also discovered near the Quirinal Hill in Rome; it too is written from right to left. These inscriptions are generally considered to be the oldest extant examples of the Latin script.


In different periods, the Latin language used different sets of Latin characters. As described by [42], [69], the ancient Latin alphabet originally consisted of only 21 letters, as shown in Table 2.1.

Table 2.1 Ancient Latin scripts, source from Omniglot

As [42], [60], [69] stated, around 250 BC a slightly modified version of the original set of characters was used. The letter Z was dropped, and a new letter, G, made by adding a bar to the lower end of C, was placed in the position of Z.

After the 1st century BC, when a large part of the Greek-speaking world was incorporated into the Roman world, Greek words penetrated the Latin language. At that time, the symbols Y and Z were introduced from the contemporary Greek alphabet and were placed at the end of the alphabet. Thus, at the beginning of the Christian era the Latin script had the 23 letters shown in Figure 2.2 [33], [42], [60], [69].

A B C D E F G H I K L M N O P Q R S T V X Y Z

Figure 2.2 Latin script at the beginning of the Christian era

The modern capital letters got their basic morphology at the end of the 15th century with the addition of three new letters, J, U and W. The English alphabet was finally fixed as consisting of 26 letters [42], [60], [69].

After capital letters became standard, usable characters, developments and changes were introduced in other dimensions of the language script. According to [33], [42], [60], [69], the late Roman languages contributed many new sounds and styles compared with classical Latin and also had to make innovations in the writing system.


In antiquity, only two main types of Latin script, capital letters and cursive, along with various mixed types that combined capitals and cursive, were used. As [60] stated, small letters did not exist, but there were several varieties of the capital and cursive scripts. Three varieties of capitals are inspected in antiquities: the lapidary capitals, used mainly on stone monuments; the elegant book capitals, which were of a more rounded shape; and the rustic capitals, which were less carefully elaborated than the lapidary script and not as round as the book capitals, but more easily and quickly written.

As [60] further stated, the different shapes of the small letters developed gradually through transformations of the ancient letters by the elimination of a part of the letter; for instance, b from B, h from H, or r from R. Lengthening a part of an ancient letter was also used in forming small letters, such as d from D.

At the end of the 8th century the Carolingian (Caroline) hand was introduced and became the official script and literary hand of the Frankish Empire [60], [69]. It developed into the main book hand of Western Europe in the following two centuries. The combination of capital and small letters is attributed mainly to the Carolingian script [33], [60].

In the later centuries, various book hands, chart hands and other cursive scripts developed from the Carolingian style. From the 12th century the letters gradually became angular in shape; this resulted from the pen being held in a position that made a slanting stroke. The new hand, termed black letter or Gothic, was employed mainly in Northwestern Europe, including England, until the 16th century [33], [60].

During the 15th century the round, neat, Humanistic or Renaissance hand was introduced in Florence and was used in literary productions. A cursive script style for handwriting was also introduced. The two styles developed into two main varieties: the Venetian minuscule, known later as italic, and the Roman type. They were used in the printing presses at the end of the 15th and the beginning of the 16th centuries [33].

These Italy-based scripts later spread to the Netherlands, England (around 1518), Germany, France, and Spain [60]. In this writing system, the classical Latin character was adopted for the majuscules (capital letters). This majuscule writing, along with the Roman-type minuscule and the italic, spread all over the world from the 16th century onward [33], [42], [60], [69]. The English language now uses a script composed of the set of forms described in Table 2.2.

Roman-type minuscule (small letters): a b c d e f g h i j k l m n o p q r s t u v w x y z
Classical Roman characters, majuscule (capital letters): A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Venetian minuscule (italic): a b c d e f g h i j k l m n o p q r s t u v w x y z

Table 2.2 Modern Latin scripts in English language documents

2.2.2. Characteristics of Roman scripts

Not all Roman scripts are the same in terms of size or of spatial spread across the writing zones. Each and every character has its own features that make it distinct from the others. Generally, the features of the Roman script in English language documents can be described by the following properties:

Height: height-wise, almost all capital letters have equal height, with the exception of Q (the tallest of all capital letters). Small letters, however, have different heights. Three categories of small-letter characters can be inspected based on their height:

• Smaller: this category includes a, c, e, m, n, o, r, s, u, v, w, x, z.
• Medium: these are taller than the characters of the smaller category: b, d, f, g, h, i, k, l, p, q, t, y.
• Long: the longest character of all the small-letter Roman script is j.

The features of Roman scripts based on their spatial spread in the writing zones are discussed by [18]. Based on the description in that work, the following general characteristics are observed:


• All capital letters of the Roman script occupy the middle and upper zones, except Q, which covers the lower, middle and upper zones.
• Small letters such as a, c, e, m, n, o, r, s, u, v, w, x, z are limited to the middle writing zone.
• Small letters such as g, p, q and y spread spatially into the middle and lower zones.
• Other small letters such as b, d, f, h, i, k, l, t spread spatially into the middle and upper writing zones, whereas the small letter j spreads into all three zones, like the capital letter Q.

Different sizes are also observed among the letters of the Roman script in terms of width. For instance, there is a great width difference between the capital letters W and I, as well as between the small letters l and m.

The letters A, E, I, O, U are considered vowel letters, since (except when silent) they represent vowels; the remaining letters are considered consonant letters, since when not silent they generally represent consonants. However, Y commonly represents a vowel as well as a consonant [18], [20].

Other characteristics include the use of a capital letter to start a new sentence, the use of a full stop to end a sentence, the use of spaces to mark the gaps between words, and the use of all capital letters in word formation to represent abbreviations.

2.3. Amharic language Scripts

The Amharic language, also called አማርኛ, is the official language of the Ethiopian government and the second largest Semitic language next to Arabic [21]. It is spoken principally in the central highlands of Ethiopia. It is an Afro-Asiatic language of the Southwest Semitic group and is related to the liturgical language of the Ethiopian Orthodox Church; it also has affinities with Tigré, Tigrinya, and the South Arabic dialects [21].

Amharic is written in a slightly modified form of the alphabet used for writing the Geez language, called the Ethiopic script. The Ethiopic system is used on a large scale in the representation of three Semitic languages, all confined to Ethiopia and Eritrea: Ge’ez (the liturgical language of the Ethiopian Orthodox Church), Amharic and Tigrinya.

According to [65], the Ethiopic script has been used effectively over the past two millennia as a writing system for languages spoken in Ethiopia, a country currently with a population of over 80 million.

2.3.1. Evolution of Ethiopic script

The Ethiopic script has its origin in the same ancestral writing systems as those of European alphabets, namely the Semitic scripts that proliferated in the Middle East more than three thousand years ago [14].

Though little is known about the precise timing and location of the emergence of the earliest Semitic phonetic writing system, all that seems reasonably certain is that a consonantal script developed among Semitic people on the Eastern shore of the Mediterranean sometime between 1800 and 1300 BC [24].

Figure 2.3 shows the two main branches descended from the Proto-West Semitic writing system: North Semitic and South Semitic. The North Semitic branch consists of Hebrew, Arabic and Greek (and hence Roman and Cyrillic). The South Semitic branch includes the Thamudic and Sabean systems. The Sabean system is the ancient script which most researchers agree to be the ancestor of the Ethiopic script. According to [14], the Ethiopic script emerged, speculatively, in the 11th to 10th centuries BC. However, there are other hypotheses which date the origins of the Ethiopic script earlier, relating it to Thamudic [24], [59].


Figure 2.3 A genealogy tree of the Ethiopic script, from Coulmas.

As discussed in [59], the Ethiopic script was originally used for inscriptions in the Ge’ez language. The first Ge’ez inscriptions in the Ethiopic script can be traced back to the 4th century AD [1], [59]. Ge’ez was the language of the empire of Aksum (a flourishing Semitic civilization based in what is now Northern Ethiopia). Tablets from that period relating to the Emperor Ezana (the first known Ethiopian convert to Christianity) feature inscriptions in Ge’ez, Sabean and Greek.

Amharic has been the political language of Ethiopia for many hundreds of years. Though no clear evidence is available as to whether Amharic is descended from Ge’ez or not, it is fairly certain that the Ethiopic writing system was passed on from Ge’ez to Amharic [59].

The oldest Ethiopic script records in the Amharic language are songs and poems dating from the 14th century [1], [59]. However, significant literature in any quantity did not begin until the 19th century [1], [59], [65]. Since then, the Ethiopic script has been the normal medium for newspapers, magazines, novels, poetry, primary school texts, official and legal documents and other printed matter, as well as for the private correspondence of Amharic language users.

The Ethiopic script set was first introduced to computerization in 1991 by the Ethiopian Science and Technology Commission (ESTC) [41]. Since then, different software developers have independently developed different font faces for the Ethiopic script. Table 2.3 shows some of the font faces of the Ethiopic script and their names, taken from [62].


(The sample font faces in Table 2.3 are images in the original; only the font names are reproduced here.)

Abyssinica (Free Open Type Layout Tables), AmharicLSU, Geez Unicode, Code2000, GF Zemen Unicode, Ethiopia Jiret, GS Geez MahtemUnicode, Ethiopia Jiret Slant, Ethiopic TITUS Cyberbit, Ethiopic Hiwua, Visual Geez Unicode, Ethiopic Tint, Visual Geez Unicode title, Ethiopic WashRa SemiBold, EthiopicWashRa SemiBoldSlant, Ethiopic Yigezu Bisrat Goffer, EthiopicWashRaBold, EthiopicLSU, EthiopicWookianos, EthiopicYebse

Table 2.3 Sample Font faces of Ethiopic script in printed Amharic documents

2.3.2. Features of Ethiopic script

There are disputes among scholars whether to group the Ethiopic script in the syllabic family or not. In theory, an alphabet has individual symbols representing phonemes, with individual vowel and consonant symbols; a consonantal system represents only consonants, leaving the reader to guess the vowels; and a syllabary has individual signs for each syllable. [5], [6], [36] grouped the Ethiopic script as syllabic, since it uses one character per syllable. However, [54] explicitly rules it out of the syllabic category. His argument is that to be a true syllabary it should have unrelated symbols for phonologically similar syllables. The Ethiopic system, on the other hand, can be analyzed as thirty-three basic consonant forms with relatively systematic variations to indicate vowels and/or labialization. [8] also grouped the Ethiopic script in the alpha-syllabic family.

Recently, the Ethiopic script has been standardized to have 435 characters. However, only about half of them are used practically in daily communication by Amharic and the other major languages [65]. [36] has distinctly listed 349 Ethiopic characters which are used in writing Amharic documents. Table 2.4 shows the categories of Ethiopic characters that are used in the Amharic language.

   Type of Amharic Character     Number of Characters
1  Core Characters               231
2  Labialized Characters          89
3  Punctuation Marks               9
4  Numerals                       20
   Total                         349

Table 2.4 List of Ethiopic scripts taken from [36]

The Ethiopic script has its own features that are unique to the writing system. Ethiopic has appendages, conventionally added to a base character to give a derived vocal sound. The core characters are usually presented in a tabular format of seven orders, as shown in Table 2.5, where the first column represents the base character and the other columns represent the derived vocal sounds of the base character [37], [40].


   1st 2nd 3rd 4th 5th 6th 7th
1  ሀ ሁ ሂ ሃ ሄ ህ ሆ
2  ለ ሉ ሊ ላ ሌ ል ሎ
3  ሐ ሑ ሒ ሓ ሔ ሕ ሖ
4  መ ሙ ሚ ማ ሜ ም ሞ
5  ሠ ሡ ሢ ሣ ሤ ሥ ሦ
6  ረ ሩ ሪ ራ ሬ ር ሮ
7  ሰ ሱ ሲ ሳ ሴ ስ ሶ
8  ሸ ሹ ሺ ሻ ሼ ሽ ሾ
9  ቀ ቁ ቂ ቃ ቄ ቅ ቆ
10 በ ቡ ቢ ባ ቤ ብ ቦ
11 ተ ቱ ቲ ታ ቴ ት ቶ
12 ቸ ቹ ቺ ቻ ቼ ች ቾ
13 ኀ ኁ ኂ ኃ ኄ ኅ ኆ
14 ነ ኑ ኒ ና ኔ ን ኖ
15 ኘ ኙ ኚ ኛ ኜ ኝ ኞ
16 አ ኡ ኢ ኣ ኤ እ ኦ
17 ከ ኩ ኪ ካ ኬ ክ ኮ
18 ኸ ኹ ኺ ኻ ኼ ኽ ኾ
19 ወ ዉ ዊ ዋ ዌ ው ዎ
20 ዐ ዑ ዒ ዓ ዔ ዕ ዖ
21 ዘ ዙ ዚ ዛ ዜ ዝ ዞ
22 ዠ ዡ ዢ ዣ ዤ ዥ ዦ
23 የ ዩ ዪ ያ ዬ ይ ዮ
24 ደ ዱ ዲ ዳ ዴ ድ ዶ
25 ጀ ጁ ጂ ጃ ጄ ጅ ጆ
26 ገ ጉ ጊ ጋ ጌ ግ ጎ
27 ጠ ጡ ጢ ጣ ጤ ጥ ጦ
28 ጨ ጩ ጪ ጫ ጬ ጭ ጮ
29 ፀ ፁ ፂ ፃ ፄ ፅ ፆ
30 ጰ ጱ ጲ ጳ ጴ ጵ ጶ
31 ጸ ጹ ጺ ጻ ጼ ጽ ጾ
32 ፈ ፉ ፊ ፋ ፌ ፍ ፎ
33 ፐ ፑ ፒ ፓ ፔ ፕ ፖ

Table 2.5 Core character of Ethiopic script used in Amharic language

Based on the similarity of their shapes, [40] grouped the base Amharic characters into the following five categories:

1. Symbols that have one straight ‘leg’, e.g. ቀ, ተ, ቸ, ፐ, ገ
2. Symbols that have two straight ‘legs’, e.g. ሰ, ሸ, በ, ከ, ዘ, ዠ, ኸ
3. Symbols that have three ‘legs’, e.g. ሐ, ጠ, ጨ
4. Symbols that have a rounded bottom, e.g. ሀ, መ, ሠ
5. Symbols that have a horizontal bottom line, e.g. ረ, ፈ

According to [36], Ethiopic script vowels are formed from consonants in two ways. The fourth and seventh orders mostly take a modified shape of the base character, shortening or lengthening one of its main strokes, while the second, third and fifth orders of the core characters derive their vowels by adding small appendages, such as strokes or loops, to the right, left, top or bottom of the base character form. However, there are irregularities among appendages of the same order; for instance, the fourth, sixth and seventh orders are highly irregular. Generally, the following description elaborates the formation of each vowel from the base character in each order, as discussed by [40].

In second-order vowel formation, the appendage is a stroke at the middle of the character in the right direction, except for ሩ and ፉ, in which the stroke is downward from the middle of the base character.

In third-order vowel formation, four variations of formation are inspected. The majority of the characters are formed simply by adding a stroke to the bottom part of the base character in the right direction. In the vowels ሂ, ሚ, ሢ, ዒ, ፂ the modification is a little different: the character is first lengthened from the right position and another stroke is added at the bottom of the elongated character in the right direction. ፊ and ሪ follow their own trend, bending the stroke of their right bottom parts more inward. ዪ is the only third-order character whose stroke is at the right middle of the base character.

The fourth order has five basic types of formation. The majority of the vowels are formed by elongating the right vertical line of the base character; however, the characters ቃ, ታ, ቻ, ኃ, ያ, ጋ, ፓ are formed by bending the downward elongated vertical line in the left direction, ና and ኛ are formed by adding a stroke at the top of the base character in the right direction, whereas ፋ and ራ again follow their own trend, as shown in the sample.

The fifth order has a more regular type of formation: except for ጬ, in which the appendage is a loop at the right middle of the base character, the entire appendage is a loop at the right bottom of the character. However, similar to the third order, ሄ, ሜ, ሤ, ዔ, ፄ are formed by first lengthening the character from the right position and then adding a loop appendage at the bottom of the elongated form. The sixth-order vowels are formed in irregular ways: some are formed by breaking the left vertical line in the left direction, some by creating a right stroke at the top middle position of the base character, some by making an internal loop in the left vertical line, and so on with a variety of irregularities. The majority of the changes to the base character are in the left direction and at the top part of the base character.

The seventh-order vowel formation is also irregular; however, the majority of the characters are formed either by lengthening the left vertical line downward or by creating a loop at the top of the right part of the base character.

Ethiopic scripts also have size differences, both in length and width. [36] examined short characters like ፈ, ሠ and መ and long characters like ዥ and ኽ. Width-wise differences are also discussed among characters like ነ, ሚ and ጪ.

Ethiopic is written from left to right. The Ethiopic system makes no distinction between upper and lower case letters and has no conventional cursive form, though rapid handwriting can result in an ad hoc cursiveness and often a lack of clear distinctions [5], [6], [59].

Word boundaries were originally marked by two vertically placed dots “:”, though, with foreign influence, spaces between words in printed documents are now often used instead of the traditional symbol. A sentence boundary is indicated by four dots “።”, and lists of ideas can be delimited by “፣”, as the comma is for the Roman script. The question mark “?” is used as an integral part of Amharic documents. Quotes are usually in the French style “<<...>>”, and parentheses and exclamation marks are as in the Roman system [5], [36], [59].


CHAPTER THREE
OPTICAL CHARACTER RECOGNITION AND SCRIPT IDENTIFICATION

3.1. Optical Character Recognition (OCR)

OCR utilizes different document image analysis techniques to recognize the informative content of text documents. The technique involves conversion of a given text character image to its most probable matching character in a given domain language script [53].

It is one of the most fascinating and challenging areas of pattern recognition. It has various practical applications, such as converting traditional paper-based information systems to computerized systems, reading aids for the blind, automatic reading of mail for sorting in post offices, bank check reading, information retrieval, and digital libraries [38].

Nowadays, OCR has become a fully-fledged academic subject, and commercial OCR products have shown high performance in practical applications. The main Western commercial OCR products, like ScanSoft OmniPage OCR, ABBYY FineReader OCR, ExperVision OCR and IRIS OCR, provide good OCR solutions for Roman script based languages [61], [36].

The trend extends to other language scripts too. For instance, there is considerable achievement on the problem of OCR of Chinese document images; products include Tsinghua TH-OCR, HanWang OCR, and Beixin OCR, which show high performance in Chinese document image processing [61]. Considerable research works and application products have also been devised for the Arabic, Hebrew, and Indian languages [36].

Although no commercial OCR products are available for the Ethiopic script [19], there are great efforts among academic researchers to improve the recognition level of Ethiopic scripts. Several research results and different algorithms with a considerable level of recognition have been introduced and proposed for OCR of the Ethiopic script in Amharic documents.


A Hidden Markov Model used in the recognition of Amharic handwriting for a bank check reading application is reported in [63]. [37] introduced the conventional Principal Component Analysis (PCA) for extracting relevant information from high dimensional datasets, Linear Discriminant Analysis (LDA) for the transformation to a classification subspace in feature extraction, and the Support Vector Machine (SVM) for classification in OCR of printed Amharic documents.

Several OCR techniques for segmentation, feature extraction and classification of Amharic printed and handwritten documents are introduced in [65], [66], [67], [68]. In these publications, the Direction Field Tensor is introduced as a segmentation technique, and feature extraction is based on primitive structural features and their spatial relationships. Binary tree structure, neural network, Hidden Markov Model and nearest-neighbor classification techniques are also reported in these publications for OCR of printed and handwritten Amharic document images.

OCR for Amharic documents using a fast signature-based algorithm is presented by [29]. The work presents a fast recognition system based on creating image signatures, adopting normalization of characters for size and orientation. Two comparison techniques are introduced: one compares every pixel of the input character to a set of character templates, and the other uses a set of signatures.

Many thesis works have also been done on the OCR problem of the Amharic language. As a continuation of the early efforts of [63], [22] performed research on Amharic OCR from formatted Amharic documents. He added pre-processing techniques to the previously suggested recognition algorithm so as to enable it to recognize formatted Amharic texts with various font sizes, underline styles, and italics. He introduced pre-processing algorithms for thinning italicized style and for underline detection and removal in Amharic OCR. [15] also worked with the aim of improving Amharic OCR by enabling the system to recognize typewritten Amharic text. He recommended adopting recognition algorithms that are not very sensitive to the writing styles of characters.


3.1.1. Multilingual Script OCR

Although the problem of OCR may seem almost solved, multilanguage script recognition from a single document image is still under active research. According to [61], there is no OCR technique that can simultaneously fit both English and Chinese script in the same document. [31] emphasized that a multi-script computer interface for OCR is a present-day requirement in India [38]. He also stated that bilingual and multilingual OCR is a present-day emphasis of research and contains many complexities within itself.

Two types of approaches are described in the development of a multi-script OCR [31]:

A. Script identification approach: in this approach, script identification is done before the actual multilingual script OCR. The resulting information from script identification is used to invoke the corresponding OCR developed for that particular script. According to [31], [20], this method reduces the search space in the database, though it involves the cost of the script recognition task.

Several studies have been published under the script identification approach. Script identification is proposed as a prior task for OCR of document images containing multilingual scripts [20], [31], [38], [61]. Fig. 3.1 shows the model proposed by [38] for multilingual script OCR of Oriya (an Indian script) and English from the same document.

B. Combined database approach: In the combined database approach, characters from all the participating scripts are treated identically irrespective of their scripts. However, the search space in the database increases as it contains alphabets from all the participating scripts.


Figure 3.1 Proposed multilingual OCR

3.2. Script Identification

Script identification, in this particular work, refers to the process of identifying two or more language scripts existing in a single document image and putting them into their respective classes for some predefined task. Script identification is a key step in document image analysis, especially when the environment is multi-script and language identification is required to identify the different languages that exist in the same script [2].

Script identification has many practical applications. Primarily, script identification can serve as a first-stage recognizer in a multilingual OCR process [20], [31], [38], [61]. It also facilitates many important applications such as sorting images, selecting the appropriate script for specific text understanding, and searching online archives of document images containing a particular script.

Different approaches to script identification may be utilized. The level at which feature extraction of the document image is performed for script identification also varies: researchers report that feature extraction can be performed at the block-of-text level, line level, word level, character level, and primitive-structure level [2], [16], [20].

However, according to [2], [31], script identification approaches can generally be classified into two broad categories, namely:

Local approach: the local approach analyzes a list of connected components (line, word or character) in the document image. In this approach, the success of script identification mainly depends on character segmentation or connected component analysis (maximal regions of connected pixels). Since this approach involves different levels of segmentation and normalization of image parts, it is slower than the global approach [2], [31].

Global approach: In contrast to local approach, global approaches employ analysis of regions (block of text) comprising at least two lines of words, and hence do not require fine segmentation. Global approach is applicable to those documents where the whole document or paragraph or a set of text lines is in one script only. The script identification task is simplified and performed faster with the global rather than the local approach [2], [44], [45].

3.3. Tasks in Multilanguage Script Identification

Script identification processes may vary from approach to approach. Some may involve particular types of tasks more deeply than others, depending on the features of the scripts in a document. The number of language scripts to be identified from a document is also an important factor determining the type of process [2]. However, there are methods frequently reported as formal stages of the multilanguage script identification process: preprocessing, segmentation, feature extraction and classification of scripts into their appropriate classes [2], [4], [9], [10], [11], [12], [25], [26], [27], [28], [31].


3.3.1. Preprocessing

Preprocessing is a method of enhancing the image for better feature extraction. The choice of preprocessing method to adopt on a document image depends on the type of application for which the image is used. There are many techniques for preprocessing document images; however, several experiments on script identification suggest that preprocessing methods have to be customized to suit the requirements of script identification tasks [4], [9], [10], [11], [12], [25], [26], [27], [28], [31]. Any script identification method requires a conditioned image input of the document, which implies that the document should be noise free and skew free. Apart from these, some recognition techniques require that the document image be binarized and thresholded [2], [10], [11], [12], [25]. All these methods help in obtaining appropriate features for script identification processes.

Digitization, document image segmentation, size normalization, thinning, image restoration, slant correction and others belong to the preprocessing stage of the script identification.

3.3.1.1. Digitization

The script identification stage for multilingual OCR basically needs a digitized image. Digitization in this particular task refers to the process of converting paper documents into a machine-readable format, called a document image. Paper documents can be converted into digital form by a process of scanning and digitization [68].

Scanners and cameras are capable of producing image representations of paper documents in a variety of digital formats. Some of the common document image file formats supported by the MATLAB 7.0 compiler are described in Table 3.1 [58].


• 'bmp' (Windows Bitmap, BMP): 1-bit, 4-bit, 8-bit, 16-bit, 24-bit, and 32-bit uncompressed images; 4-bit and 8-bit run-length encoded (RLE) images
• 'gif' (Graphics Interchange Format, GIF): 1-bit to 8-bit images
• 'jpg' or 'jpeg' (Joint Photographic Experts Group, JPEG): any baseline JPEG image; 8- or 12-bit lossy grayscale; 8-, 12-, or 16-bit lossless RGB; 24- and 36-bit lossy or lossless images
• 'png' (Portable Network Graphics, PNG): 1-bit, 2-bit, 4-bit, 8-bit, and 16-bit grayscale images; 8-bit and 16-bit indexed images; 24-bit and 48-bit RGB images
• 'tif' or 'tiff' (Tagged Image File Format, TIFF): 1-bit, 8-bit, and 24-bit uncompressed images; 1-bit, 8-bit, and 24-bit images with packbits compression; 1-bit images with CCITT compression; 16-bit grayscale, 16-bit indexed, and 48-bit RGB images

Table 3.1 Picture formats supported by the MATLAB compiler

3.3.1.2. Skew Detection and Correction

When a hard copy document is fed into the scanner carelessly, the digitized image may be skewed, and skew correction may become necessary to make the text lines horizontal. The skew angle is the angle that the text lines of the document image make with the horizontal direction [50].

Skew correction can be achieved in two steps: first, estimate the skew angle θ, and second, rotate the image by θ in the opposite direction [35]. A popular method for skew detection employs the projection profile.


A projection profile is a histogram of the number of ON pixel values accumulated along parallel sample lines taken through the document. The profile may be at any angle, but often it is taken horizontally along rows or vertically along columns, and these are called the horizontal and vertical projection profiles respectively [35], [50].

For a document whose text lines span horizontally, the horizontal projection profile has peaks whose widths are equal to the character height and valleys whose widths are equal to the between-line spacing. The most straightforward use of the projection profile for skew detection is to compute it at a number of angles close to the expected orientation [50]. For each angle, a measure is made of the variation in the bin heights along the profile, and the angle with the maximum variation gives the skew angle. At the correct skew angle, since scan lines are aligned to text lines, the projection profile has maximum-height peaks for text and valleys for between-line spacing. After estimation of the skew angle, correction is performed by rotating the image in the opposite direction [50].
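To make the projection-profile method concrete, the following MATLAB sketch rotates a binary page over a range of candidate angles and keeps the angle whose horizontal profile has the maximum variance; the angle range, step size and variable names are assumptions for illustration, not part of the cited works.

angles = -5:0.5:5;                            % candidate skew angles (assumed)
bestVar = -Inf; bestAngle = 0;
for a = angles
    R = imrotate(BW, a, 'nearest', 'crop');   % BW: binary page, text pixels = 1
    p = sum(R, 2);                            % horizontal projection profile
    if var(p) > bestVar                       % sharpest peaks and valleys win
        bestVar = var(p); bestAngle = a;
    end
end
deskewed = imrotate(BW, bestAngle, 'nearest', 'crop');  % corrected image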

3.3.1.3. Noise Removal

Scanned documents often contain noise arising from the scanner, print quality, or the age of the document. Therefore, it is necessary to filter this noise before processing the image.

Several methods are proposed in [58] for different kinds of noise. The methods include:

• Linear Filtering: linear filtering is used to remove certain types of noise. Averaging or Gaussian filters are appropriate for this purpose.

For instance, an averaging filter is useful for removing grain noise from a photograph. Because each pixel gets set to the average of the pixels in its neighborhood, local variations caused by grain are reduced [58]. In Average filtering each output pixel is set to an average of the pixel values in the neighborhood of the corresponding input pixel.

• Median Filtering: in median filtering, the value of an output pixel is determined by the median of the neighborhood pixels, rather than the mean [58]. According to [58], the median is much less sensitive than the mean to extreme values (called outliers). Median filtering is therefore better able to remove these outliers without reducing the sharpness of the image.

• Adaptive Filtering: an adaptive filter tailors itself to the local image variance: where the variance is large, it performs little smoothing; where the variance is small, it performs more smoothing. This approach often produces better results than linear filtering. The adaptive filter is more selective than a comparable linear filter, preserving edges and other high-frequency parts of an image [58].
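To make the three techniques concrete, the following MATLAB lines apply each filter to a grayscale image I; the neighborhood sizes are illustrative assumptions.

h = fspecial('average', [3 3]);    % averaging (linear) filter kernel
J_linear = imfilter(I, h);         % linear filtering
J_median = medfilt2(I, [3 3]);     % median filtering
J_adaptive = wiener2(I, [5 5]);    % adaptive (Wiener) filtering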

3.3.1.4. Binarization and Threshold

Binarization is a process by which the gray scale or colored images are converted to binary images. When an image is captured, it is frequently stored in the form of pixel density value, which means each pixel has a value between 0 and 255 for a grayscale image. Many researchers choose to work with a binarized image where all the grey values are thresholded and converted to either '0' for white (background) or '1' for black (foreground). This process is known as binarization [4], [35].

The common method used to binarize is known as thresholding. In thresholding, gray-scale or color images are represented as binary images by picking a threshold value. There are two categories of thresholding [4]:

Global thresholding: a threshold value is selected for the entire document image, frequently based on an estimate of the background intensity level relative to that of the image, using an intensity histogram [53].

Local or adaptive thresholding: uses different values for each pixel depending on the information at different pixel points [53]. Local thresholding is commonly used in work involving images of varying intensity levels, such as pictures from satellite cameras or scanned medical images.
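A brief MATLAB sketch of both categories follows; the global variant uses Otsu's method, while the local variant, which compares each pixel with its local mean minus a small offset, is an illustrative assumption rather than a method from the cited works.

G = im2double(I);                             % I: grayscale input image
level = graythresh(G);                        % global Otsu threshold
BW_global = im2bw(G, level);
m = imfilter(G, fspecial('average', [15 15]), 'replicate');  % local mean image
BW_local = G > (m - 0.05);                    % offset 0.05 is assumed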


3.3.1.5. Normalization

Image normalization is an important preprocessing activity in multilingual script identification [20], [31]. In the script identification process, the document image might contain different scripts with different text heights, font sizes, script orientations, inter-line spacing and inter-word spacing. Normalization is the process of equalizing units of the document image to be uniform in size and orientation for line, word, or character segmentation, and for feature extraction. Depending on the nature of the script contained in the document and the type of formatting done on those units, different normalization activities may be performed. Some of the normalization tasks performed in multilingual script identification include the following [34] (a sketch of the first task follows the list):

• Constructing Uniform Blocks of Text: the scanned image may not necessarily have a uniform text-line height. The height of a text line is the difference between the topmost black pixel and the bottommost black pixel of the line, and it is obtained through the horizontal projection profile. To obtain text lines of uniform height, the necessary threshold values are computed by calculating the mean and the standard deviation of the text-line heights. All those text lines which fall outside the threshold values, fixed through experimentation, are excluded. This process is repeated by calculating the new mean and standard deviation of the remaining text block, until no remaining text lines are removed. This approach works for performing script identification at the block-of-text level.
• Inter-line Spacing Normalization: in a block of text, the white space between the text lines may vary, and hence it is necessary to normalize these white spaces. The width of the white space between text lines is obtained by computing the distance between the horizontal runs of black pixels of the two text lines. A white-space width greater than a certain threshold, fixed by experiment, is reduced to the threshold value. Thus, the spacing between the text lines is normalized to a uniform inter-line spacing.
• Inter-word Spacing Normalization: generally, some text lines might have varying inter-word spacing, so it is necessary to normalize the inter-word spacing to some fixed number of pixels. Normalization of the inter-word spacing is achieved by projecting the pixels of each text line vertically, counting the number of white pixels from left to right, and adjusting the number of white pixels to some fixed pixel size.
• Padding: a text line that is shorter than the average width of a text line generally appears in a block of text, due to headings or the last line of a paragraph. So, it is necessary to obtain a collection of text lines of equal length by padding the shorter text lines. Padding is done by copying and replicating the text content of the line so that its length is increased to the average length.
• Normalizing for Size: script units of different sizes are fixed to occupy some common representation for feature extraction purposes.
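As a minimal sketch of the first task, the following MATLAB lines exclude outlier text lines by height; the hypothetical height measurements and the 2-sigma cutoff stand in for the experimentally fixed thresholds and are assumptions, not values from the cited work.

heights = [18 19 17 35 18 20];   % hypothetical text-line heights (pixels)
keep = true(size(heights));
changed = true;
while changed
    m = mean(heights(keep)); s = std(heights(keep));
    newKeep = keep & (abs(heights - m) <= 2*s);   % drop lines outside threshold
    changed = any(newKeep ~= keep);
    keep = newKeep;                               % repeat until nothing is removed
end
uniformHeights = heights(keep);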

3.3.1.6. Segmentation

Segmentation is the process of separating a document image into homogeneous units of images that contain pixel groups [20]. Segmentation is the most important part of the script identification system and should be given due consideration [2], [20].

In script identification, segmentation is performed based on the approach and the level at which features are extracted from the document image for a given script identification purpose. Generally, a document image may be segmented at the block-of-text level (paragraph, column), containing more than one text line, at the text-line level, at the word level, or at the character level.

According to [20], the segmentation process can be accomplished in either a top-down or a bottom-up approach. The top-down approach starts with the scanned page; the page is segmented into large blocks, which are then recursively classified into text blocks, figures, and tables. Text blocks are then segmented into lines, words, and characters.

In this regard, the X-Y cut algorithm has been frequently used to segment the document image to the appropriate level for script identification [11], [20], [31], [38]. In this algorithm, segmenting the document image into different text lines is performed by first computing the valleys of the horizontal projection. The position between two successive horizontal projections is represented by the histogram height, and the least histogram height shows the boundary between lines. These boundary lines are used to segment the document image into text lines [11], [20], [31], [38].

On the other hand, to segment each text line into several text words, valleys of the vertical projection of the line are used. The position between two consecutive vertical projections where the histogram height is least denotes one boundary line. Using these boundary lines, every text line is segmented into several text words [11], [20], [31], [38].

Vertical projection is also used for character segmentation. The difference between a character boundary and a word boundary can be marked from the width of the gap between the zero-valued valleys.

In the bottom-up approach, black pixels are grouped into character components. These components are combined to form words and then text lines and paragraphs. First, connected component extraction is performed on the thresholded image after horizontal smearing. Connected components are then grouped together horizontally through a series of steps to form sub-words, words, lines and paragraphs.

As reported in [4], [15], [64], [68], segmentation may be performed stage by stage, starting from line segmentation, then word segmentation and character segmentation, separately from recognition, or in a recursive approach that merges segmentation and recognition together. The latter approach is proposed as convenient for characters of a connected nature [15].

3.3.1.7. Thinning

Thinning is another preprocessing activity reported in script identification. It refers to the process of representing a document image unit with a skeletal structural representation whose strokes are one pixel thick. Thinning is an important approach to representing the shape of a plane region: the objective is to reduce the representation of a region to a chain of single-pixel width while preserving all other relevant features [44]. It is one method of intensity normalization for handling bold and normal font styles.


According to [4], [39], there are a number of algorithms available for the thinning process. However, thinning algorithms have to satisfy the following constraints:

• Connectivity must be maintained at each iteration.
• The thinned line's width must be one pixel.
• The thinned line must lie along the center of the original line.
• At each step of the thinning process, the thinned line cannot be eroded away from its end points.

[39] discussed two types of thinning: sequential and parallel. Sequential algorithms are good at preserving continuity, while parallel algorithms are good at keeping the thinned line at the center of the original line.

3.3.4. Feature extraction

Feature extraction is the process of identifying a feature or sequence of features of image units which describe that unit more distinctively than other units of the image, for some classification purpose [2], [3], [34]. The feature extraction stage analyzes a text segment unit and selects a set of features that can be used to uniquely identify the unit. The derived features are then used as input to the script classifier. Prior to the script identification stage, a set of features is extracted from the image. These features are a reduced representation of the contents of the image, focusing on preserving the characteristics that are most relevant to the identification task. This may involve eliminating characteristics that are not relevant or that may confuse the discrimination power of the classification stage [2]. The aim of feature extraction is to identify patterns by means of a minimum number of features that are effective in discriminating pattern classes. In multilingual script identification, different features of the texts may be important depending on the tasks and the approach of the process. Generally, texture features and features from the analysis of connected components (shape) of image units are reported in the literature for multilingual script identification purposes [2]:


A. Texture: attempts toward texture classification for script identification have been made by measuring different texture features. Extraction of texture features is reported in block-of-text-level and word-level script identification. Multi-channel Gabor filtering for different texture features and wavelet features are among those reported for script identification based on texture [2], [64], [61], [34].

Several features, such as statistical features (energy profiles, which capture the global differences among the scripts), local features (which exhibit finer differences among similar scripts), and horizontal profiles (for identification of strokes in the upper part of the words), are extracted from the oriented energy distribution, and the gross information is used for categorizing the script by extracting the information at different angles θ, rotating to 22.5°, 45°, 90°, 112.5°, or 135° depending on the purpose.

Wavelet features: another feature extraction method reported in multilingual script identification research based on texture [44]. This approach uses the wavelet coefficients of an image as its signature. Wavelets capture shape, texture and image topology information in a single unified framework [2]. This improves efficiency and reduces the need for providing multiple search primitives. However, as stated in [30], since this method computes a single signature for an entire image, it may fail to identify images containing similar objects where the objects differ in location or size. Wavelet features, wavelet energy features, wavelet log mean deviation features, wavelet co-occurrence signatures and wavelet scale co-occurrence signatures were also included as textural features, in addition to the multichannel Gabor energy filters, to identify the language script among English, Chinese, Greek, Cyrillic, Hebrew, Hindi and Japanese over non-uniform blocks of text which had been normalized [2].

B. Shape (Connected Component Analysis): analysis of connected components to identify characters, words or lines is the other approach used for feature extraction in multilingual script identification. In a document image unit, the position of black pixel concentration in the unit, the bounding box and list of black pixels, aspect ratio, moments, eccentricity, area, concavity, and projection profile runs serve as features to identify the script [2]. This kind of analysis is insensitive to noise and low-quality document images.


Connected component analysis is reported at different levels of script identification. Many approaches to connected component analysis have been made by different researchers for script identification.

Upward concavities: according to [32], upward concavities are formed for a character "when two runs of black pixels appear on a single scan line of the raster image and there is a run on the line below which spans the distance between these two runs."

The position of each upward concavity is related by its vertical distance to the character cell baseline. This was first used by [32] to differentiate the Chinese and English scripts. It was observed that European and Asian scripts have very different distributions of upward concavities.
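A sketch of counting upward concavities under this definition is given below; the function name and the convention that text pixels equal 1 are assumptions for illustration (save as upward_concavities.m).

function c = upward_concavities(BW)
% Count upward concavities in a binary character image (text pixels = 1):
% two black runs on one scan line with a run on the line below spanning
% the gap between them, following the definition in [32].
c = 0;
[m, n] = size(BW);
for r = 1:m-1
    d = diff([0, BW(r, :), 0]);          % run-boundary transitions on line r
    starts = find(d == 1);               % where black runs begin
    ends = find(d == -1) - 1;            % where black runs end
    for k = 1:numel(starts) - 1
        gap = ends(k)+1 : starts(k+1)-1; % white gap between two runs
        if ~isempty(gap) && all(BW(r+1, gap))
            c = c + 1;                   % the run below spans the gap
        end
    end
end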

Centroid representation of symbols for a template matching approach: the technique involves clustering textual symbols (connected components) and creating a representative symbol, or template, for each cluster. To identify a new document's script, the system compares a subset of symbols from the new document to each script's templates and identifies the script. This was proposed by [27] to distinguish thirteen scripts, namely Arabic, Armenian, Burmese, Chinese, Cyrillic, Devanagari, Ethiopic, Greek, Hebrew, Japanese, Korean, Roman and Thai.

Morphological Reconstruction: features are obtained from the words by extracting the characters containing strokes in four directions (horizontal, vertical, right diagonal and left diagonal) by an erosion operation. The reconstructed images (in four directions) and the original image (five images in total) are used with holes filled.

After the morphological reconstruction process, features are computed for the script identification process. The features computed are:

On-pixel density (OPD): after reconstruction in the vertical, horizontal, right-diagonal and left-diagonal directions with holes filled, it is given by:

OPD(θ) = Σ onpixels(g) / size(g)


where θ varies from 0 to 135 with an incremental bandwidth of 45 degrees. The remaining four features considered in the feature extraction are aspect ratio, pixel ratio, eccentricity and extent.

Aspect Ratio: the ratio of the height to the width of a connected component of an image. The average aspect ratio (AAR) is defined as:

AAR(pattern) = (1/N) Σ_{i=1}^{N} height(component_i) / width(component_i)

The value of AAR is a real number. Note that the aspect ratio is a very important feature for word-wise script identification [21].

Pixel Ratio (PR): defined as the ratio between the on pixels of an input image after hole filling and its total number of pixels before hole filling. The value of the pixel ratio is given by:

PixelsRatio = Σ onpixels(b) / size(a)

Eccentricity: defined as the length of the minor axis divided by the length of the major axis of a connected component of an image. The average is given by:

Average_eccentricity = (1/N) Σ_{i=1}^{N} eccentricity(component_i)

Extent: a real-valued function, defined as the proportion of the pixels in the bounding box that are also in the region. It can be computed as the area divided by the area of the bounding box:

Average_extent = (1/N) Σ_{i=1}^{N} extent(component_i)

Profile-based Features: in profile-based feature analysis, the pixel concentrations of ascenders, descenders and the middle zone at different locations are computed to find the real features that discriminate one script from another for classification of scripts.

For script identification from trilingual documents of English, Kannada and Hindi using profile-based features, [44] tried to identify the script at the text-line level. They computed the top profile and the bottom profile of the line for the three scripts and derived four features from these profiles.


The top profile and bottom profile are computed by the following algorithms, respectively:

Algorithm 1: Top_profile()
Input: Preprocessed input text line - matrix a.
Output: Top_profile - matrix b.
1. Initialize matrix b = ones(size(a)) // the elements of matrix b are initialized to 1's
2. For j = 1 to n columns
       For i = 1 to m rows
           If a(i, j) == black
               b(i, j) = a(i, j)
               exit // proceed to the next column
           Else
               continue
3. Return matrix b.

Algorithm 2: Bottom_profile()
Input: Preprocessed input text line - matrix a.
Output: Bottom_profile - matrix c.
1. Initialize matrix c = ones(size(a)) // the elements of matrix c are initialized to 1's
2. For j = 1 to n columns
       For i = m down to 1 rows
           If a(i, j) == black
               c(i, j) = a(i, j)
               exit // proceed to the next column
           Else
               continue
3. Return matrix c.
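For concreteness, the following MATLAB sketch computes both profiles in one pass. It follows the pseudocode's convention that the background is 1 and black (text) pixels are 0; the compact loop and the variable names are otherwise illustrative assumptions.

[m, n] = size(a);                         % a: binary text-line image, black = 0
b = ones(m, n);                           % top-profile image
c = ones(m, n);                           % bottom-profile image
for j = 1:n
    i = find(a(:, j) == 0, 1, 'first');   % topmost black pixel in column j
    if ~isempty(i), b(i, j) = 0; end
    i = find(a(:, j) == 0, 1, 'last');    % bottommost black pixel in column j
    if ~isempty(i), c(i, j) = 0; end
end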

The features that are extracted from the top and bottom profiles of the three scripts are explained below:


Profile_value: the densities of the two attributes top_max_row and bottom_max_row are used to form one feature. The ratio of the density at top_max_row to the density at bottom_max_row is used as a feature named profile_value, and it is computed as:

Profile_value = density_top_max_row / density_bottom_max_row

where density_top_max_row represents the density at top_max_row and density_bottom_max_row represents the density at bottom_max_row.

Bottom_max_row_no: the value of the attribute bottom_max_row is used as the feature named 'bottom_max_row_no'.

Coeff_profile: a feature based on the spread of black pixels in the top and bottom profiles, using the coefficient of variation. The position values of the spatial occurrences of the black pixels of the top (bottom) profile are stored in a one-dimensional vector called top_vector (bottom_vector). The coefficient of variation of the top profile is named the attribute 'coeff_top', computed as:

coeff_top = (σ_top_vector / μ_top_vector) × 100

and that of the bottom profile is named the attribute 'coeff_bottom', computed as:

coeff_bottom = (σ_bottom_vector / μ_bottom_vector) × 100

where σ and μ represent the standard deviation and mean of the top_vector and bottom_vector respectively. Then the two attributes 'coeff_top' and 'coeff_bottom' are combined to generate another feature named 'coeff_profile', obtained by:

coeff_profile = coeff_top / coeff_bottom

Top_component_density: from the top profile, the top_max_row is selected. The top_max_row represents a number of connected components. The density, i.e., the number of pixels comprising the connected components, varies from language to language. So, the density of the connected components at the top_max_row is used as the feature 'top_component_density'.


Structural feature analysis: by observing the orientation of different script elements, such as top horizontal lines, vertical lines, top holes, top-down curves, bottom-up curves and bottom holes, features are extracted for identification of scripts from a document containing more than one script. In this regard, [44] extended their work on profile-based feature extraction to incorporate the structural features of different scripts in the identification of the multilingual scripts of English, Hindi and Kannada. The features used in their work include:

Top-horizontal-line: analyzing the horizontal line like the top-max-row. In their work, they observed that the presence of a top horizontal line is more common in Kannada and Hindi scripts and is almost absent in English script.

Analyzing vertical lines: some scripts can be identified better than others by their vertical lines. They noticed that the Hindi and English scripts have vertical line segments, whereas these are absent in Kannada script.

Top-holes: a hole is a connected component of white pixels enclosed by a set of black pixels. They used the presence of holes at the top-pipe as the feature top-holes. Kannada script has this feature, and it is not present in the other two anticipated languages.

Top-down curves and bottom-up curves: by observing the structural shapes of scripts, they found upward- and downward-shaped components as features in Kannada and English scripts, present at the region of top-max-row and bottom-max-row. They extracted the two attributes top-pipe and bottom-pipe as follows:

Top-pipe = g(x, y), where x = (top-max-row - t) to (top-max-row + t) and y = 1 to n
Bottom-pipe = g(x, y), where x = (bottom-max-row - t) to (bottom-max-row + t) and y = 1 to n

where g(x, y) and n represent the input image and the number of columns of the input image. The variable t is used as a threshold value and t = round(x-height/3), where the term 'x-height' represents the difference between top-max-row and bottom-max-row.

Left-curve and right-curve analysis: this is performed by observing the left curve and right curve at the middle zone of a partitioned text word. The middle zone can be traced by:

Middle-zone = g(x, y), where x = (top-max-row + t) to (bottom-max-row - t) and y = 1 to n

where g(x, y) represents the input image and n is the number of columns of the input image. The variable t is used as a threshold value determined through experimentation; for their experiment they used the value t = 2. In their work, the presence of a left curve is detected by verifying the distance between two pixels of a connected component that appear on the same vertical scan line of the component. Increasing variation of the two pixels over the entire vertical scan of the component results in a left curve, and decreasing variation results in a right curve.

3.3.5. Classification

Classification is the process of classifying a given image unit into one of the predefined categories based on different feature analyses of the image [34]. Once the image features are extracted, the next task in multilingual script identification is to group each image unit under its proper label based on its feature magnitudes. This is accomplished through classification.

In the first step, a classifier is built describing a predetermined set of data classes or concepts. This is the learning step (or training phase), where a classification algorithm builds the classifier by analyzing, or "learning from", a training set made up of database tuples and their associated class labels. A tuple, X, is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n measurements made on the tuple from n database attributes, A1, A2, …, An. Each tuple, X, is assumed to belong to a predefined class as determined by another database attribute called the class label attribute. As [36] stated, the two phases essential in the classification of any pattern are the training and testing phases. In the training phase, the classifier algorithm learns the association between training samples and their labels from the training dataset. The testing phase involves error analysis in the classification of unlabeled samples in order to evaluate the classifier's performance. He further suggested that in a classification problem, it is essential to have a classifier with minimal test error.

The crucial task in building a system that performs multilingual script identification is selecting and deploying the appropriate classifier algorithm, the one which can best predict and categorize the scripts in a document into their appropriate classes.

There are different classifier algorithms for different classification purposes. The classifier algorithms reported for multilanguage script identification include:

K-Nearest Neighbor (K-NN) classifier: as stated in [30], the k-nearest-neighbor method was first mentioned in the early 1950s. The method was labor intensive for large training sets and did not gain popularity until the 1960s. Since then, however, because of the increase in computing power, it has been widely used in the area of pattern recognition [30].

The k-nearest-neighbor classifier works following the principle of learning by analogy. It classifies by comparing a given test object's values with those of the k training objects most similar to it. The training objects are represented by n features, so each object represents a point in an n-dimensional space; in this way, all the training objects are stored in an n-dimensional pattern space. In the testing stage, a k-nearest-neighbor classifier searches the pattern space for the k training objects that are closest to the unknown feature representation.

Similarity or closeness is defined in terms of a distance metric, such as the Euclidean distance [30]. The Euclidean distance between two objects, say X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is computed by:

dist(X1, X2) = sqrt( Σ_{i=1}^{n} (x1i - x2i)^2 )

For k-nearest-neighbor classification, the unknown object is assigned the most common class label among its k nearest neighbors through a voting technique. When k = 1, the unknown object is assigned the class of the training object that is closest to it in pattern space. The size of k is determined depending on the research problem and the application domain.
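A minimal MATLAB sketch of this distance-and-vote scheme follows; the matrix `train` (N-by-n features), the vector `labels` (N-by-1 numeric class labels), the test vector `x` (1-by-n) and the choice k = 3 are all assumptions for illustration.

k = 3;                                        % number of neighbors (assumed)
diffs = train - repmat(x, size(train, 1), 1); % difference to every training object
d = sqrt(sum(diffs.^2, 2));                   % Euclidean distances to x
[sortedD, idx] = sort(d);                     % ascending by distance
predicted = mode(labels(idx(1:k)));           % majority vote among the k nearest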


Figure 3.2 Figural representation of voting in the k-nearest-neighbor classifier

In Figure 3.2, the KNN classifier starts at the test sample x and grows a spherical region until k training samples are enclosed. The test sample is labeled by a majority vote of these samples. In the case where k is set to 3, the test sample x would be labeled with the class of the '*' points.

A KNN classifier is used in word-level script identification of English, Tamil and Hindi language scripts using discriminating features by [16], [17]. [44] also used this approach in script identification from trilingual Kannada, Hindi and English script documents using profile-based features. [23] also worked on block-level trilingual handwritten script identification for English, Hindi and six other languages, including Kannada, Malayalam, Punjabi, Tamil, Gujarati and Telugu, independently, with the former using discrete cosine transformation and wavelet features.

Support Vector Machine (SVM): a relatively new and promising method for the classification of both linear and nonlinear data [30]. SVM is a pair-wise discriminating classifier with the ability to identify the decision boundary with maximal margin, which results in better generalization [10], [36]. Although SVM has recently been receiving increased attention, it was first introduced in the late seventies [30]. The application areas of SVMs include handwritten digit recognition, object recognition and speaker identification, as well as benchmark time-series prediction tests.

SVM works by forming a nonlinear mapping to transform the original training data into a new, higher dimension, and then searches for the linear optimal separating hyper-plane [30]. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyper-plane. SVM finds this hyper-plane using support vectors (training object features) and margins defined by the support vectors.

Figure 3.3 Separating hyper-plane in SVM representation for a linear classifier

Figure 3.3 shows possible separating hyper-planes to classify the given data. The assumption in SVM is that the hyper-plane with the larger margin is more accurate at classifying data features than the hyper-plane with the smaller margin. During the training phase, SVM searches for the hyper-plane with the largest margin, that is, the maximum marginal hyper-plane (MMH). This margin gives the largest separation between classes in the testing phase. This largest-margin concept works both for creating linear models (where the classification problem can be modeled with a linear function) and non-linear models (where it cannot be represented by a simple linear model). [10] used an SVM classifier in their work on bilingual OCR for Hindi-Telugu documents, and stated that SVM provides better results than KNN classifiers. [9] and [37] also used SVM classifiers in their two-stage classification for bilingual (English-Oriya) script identification and recognition in printed documents. The SVM classifier is also applied by [17], [25], [26] and [47] for multiple script identification.
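As a sketch of how such a classifier can be trained and applied in MATLAB, the following assumes the svmtrain and svmclassify functions of the Bioinformatics Toolbox and hypothetical data matrices `train` (features), `labels` (class labels) and `test` (unseen samples); the RBF kernel choice is likewise an assumption.

model = svmtrain(train, labels, 'Kernel_Function', 'rbf');  % learn the MMH
predicted = svmclassify(model, test);                       % classify new samples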


CHAPTER FOUR
DESIGN AND DEVELOPMENT

In this chapter, the methodologies, techniques, and algorithms used and the general approach of the research process are discussed. Results and discussion of the experiments are also included. For each step and process, the appropriate techniques are discussed with sufficient examples and justification for their use in the research.

4.1. Review of Proposed Method

The proposed method for bilingual script identification from Amharic and English document images consists of a series of tasks: capturing the document image, noise removal, line segmentation, word segmentation, size normalization, thinning, feature extraction and classification of the scripts under their appropriate class labels. A figural representation of the proposed model is shown in Figure 4.1. In the next sections, each task and subtask is elaborated and discussed in detail.

[Figure 4.1 shows the proposed pipeline as a flow diagram: Digitization → Noise Removal → Binarization → Line and Word Segmentation → Size Normalization and Thinning → Word-Level Feature Extraction → Training the selected model using the SVM classification tool and Testing the classifier with a new test dataset.]

Figure 4.1 Proposed model of script identification of the Amharic and English languages from a document image


4.2.1. Digitization

Digitization refers to a set of tasks for converting hard-copy text documents or soft texts into a document image format readable by the prototype system developed for this research. The prototype is designed to handle both bitmap and JPEG formats. The document images are digitized using two methods. Noise-free document images for training purposes are prepared first in Microsoft Word with a variety of font sizes and styles, and printed to the Microsoft Document Image Writer. This process creates a 300 DPI, good-quality document image in ".tiff" format, as shown in Figure 4.2. The ".tiff" document image is then converted to ".jpeg" format by saving it with a ".jpeg" or ".jpg" extension into a specific folder for further processing. Datasets from real documents such as books, magazines and newspapers were scanned using an HP 7100 C scanner at 300 and 600 DPI, which usually yields document images with low noise and good quality, as shown in Figure 4.3.

Figure 4.2 Document image prepared using the Microsoft Document Image Writer
Figure 4.3 Document image prepared through scanning with the HP 7100 scanner

4.2.2. Noise Removal

Noise removal in this research refers to a set of tasks for removing those black pixels which are not normally supposed to be part of the document, but were introduced into the document image under different conditions. These unwanted parts of the image are referred to as noise. Noise may lead the research process to false conclusions and should be removed from the document image.

The dataset may contain different types of noise, introduced into an image depending on how the image was created or captured (introduced by the scanner or camera itself) or on how the document was originally created and preserved.

Three main noise removal techniques have been discussed previously in the literature part. These techniques can handle different types of noise depending on the application. In this particular research, the researcher utilized the adaptive filtering technique for noise removal.

The adaptive filter is more selective than a comparable linear filter, preserving edges and other high-frequency parts of an image during noise removal.

In MATLAB 7.1, the wiener2 function applies a Wiener filter (a type of linear filter) to an image adaptively, tailoring itself to the local image variance. Where the variance is large, wiener2 performs little smoothing; where the variance is small, it performs more smoothing. This approach often produces better results than linear filtering [58]. According to [58], the wiener2 function handles all preliminary computations and implements the filter for an input image; however, wiener2 requires more computation time than linear filtering. The noised image and the image cleaned with the wiener2 function are shown in Figure 4.4 and Figure 4.5 respectively.
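A minimal usage sketch follows; the file name and the 5-by-5 neighborhood size are assumptions for illustration.

I = imread('scanned_page.jpg');            % hypothetical scanned page
if ndims(I) == 3, I = rgb2gray(I); end     % wiener2 expects a grayscale image
cleaned = wiener2(I, [5 5]);               % adaptive filtering, 5-by-5 neighborhoods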

Figure 4.4 Noised image from a magazine
Figure 4.5 Image with its noise removed through adaptive filtering


Another noise removal technique used involves removing those connected components which are made up of fewer pixels than a certain threshold. This is done by considering the minimum number of pixels needed to form the smallest word-images in either of the two scripts. In English, the full stop "." is the smallest character; in Amharic script, "።" ("አራት ነጥብ") contains the dots with the smallest number of pixels.

In this research, "10" is used as the threshold for removing connected components: objects formed by fewer pixels than the threshold are removed from the document image.

A MATLAB function was written that removes those parts of the image formed by fewer than ten black pixels. The function can be applied only to a binarized image; thus, it is applied after the document is binarized into a two-tone image.
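The custom function itself is listed in Appendix 1; as an illustrative alternative (not the thesis code), the same effect can be sketched with MATLAB's built-in bwareaopen:

cleaned = bwareaopen(binarized_image, 10);  % drop components smaller than 10 pixels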

4.2.3. Threshold and Binarization

Binarization is a process by which the gray scale or colored images are converted to binary images. When an image is captured, it is frequently stored in the form of pixel density value, which means each pixel has a value between 0 and 255 for a grayscale image.

For bilingual script identification from mixed Amharic and English document images, the researcher proposed to deal with binarized images. The primary task in this series of works is to find the threshold point at which each pixel intensity is converted into either a black or a white pixel. The technique adopted in this research is the Otsu threshold.

The Otsu threshold is a popular thresholding method for converting a gray-scale image into a binary image. The threshold is chosen to minimize the intra-class variance of the black and white pixels.

In MATLAB this threshold can be found using a function called graythresh, which computes a global image threshold using Otsu's method.


The syntax is given by: level = graythresh(I); where level is the global image threshold value and I is the image.

Once the global image threshold value is obtained, binarization is accomplished by converting pixels whose intensity value is greater than the global image threshold into "1" and pixels whose intensity value is lower into "0". The typical MATLAB code for binarization is given by:

Binarized_image = im2bw(I, level);

where "I" is the image and "level" is the global image threshold value for I. im2bw is a MATLAB function used to convert a gray-scale or colored image into a binary image. An example of a binarized image is shown in Figure 4.6.

Figure 4.6 Binarized document image ready for segmentation


4.2.1. Segmentation

Segmentation refers to a series of tasks or processes in which a document image is segmented into appropriate units of image representation for further processing. In our proposed research of bilingual script identification from a document image containing mixed English and Amharic scripts, the unit of image representation is the word level. The task of segmentation is then finding a word-image or a set of word-images from the document image.

In the proposed method, once the document image is converted into a binary document image, the next stage is segmenting the text area into a set of word-images. This is performed first by segmenting the document image into a series of line-vectors and then cutting each line-vector into a finite number of word-images.

For segmenting the document image into a series of line-vectors, we use the horizontal projection computed by a row-wise sum of black pixels. The position between two consecutive horizontal projection peaks where the sum of black pixels is the least of all denotes one boundary line. Using these boundary lines, the document image is then segmented into several text lines. Figure 4.9 shows a line segmented from the document image.

Once we obtain the vector for a segmented line, the next task is segmenting the line-vector into a series of word-images. For word segmentation, we use the vertical projection computed by a column-wise sum of black pixels. Here, each individual line-vector is considered as an individual image, and word segmentation is performed on it.

To find the segmenting points for words, first the height of every segmented line-vector is computed. The height of the line-vector helps in estimating the gap between two consecutive word-images. Once we get the height of the line image, the gap between two consecutive words is experimentally estimated as:

estimated_wordgap = (height of line-vector / 6) + 1

Figure 4.7 (a, b, c, d, e, f) shows the lines segmented from the document image.


Figure 4.7 Text lines segmented from the document image: (a) line 1, font size 8; (b) line 2, font size 30; (c) line 9, font size 14; (d) line 13, font size 26; (e) line 15, font size 18; (f) line 9, font size 10

The positions between vertical projections where the sum of black pixels is minimal over a run of consecutive columns at least as wide as estimated_wordgap denote the boundaries of word-images. Figure 4.8, Figure 4.9 and Figure 4.10 illustrate the word segmentation process.

Figure 4.8 Document image containing Amharic and English scripts


The height of the line image in MATLAB can be obtained by:

Height = size(line_image, 1);

Figure 4.9 Segmented line image to estimate its height

The word gap is estimated as:

estimated_wordgap <= (height/6) + 1

Figure 4.10 Word gap to be estimated in the line vector

Figure 4.11 Segmented word image from the document image

The full MATLAB code that performs line and word segmentation is attached in Appendix 1.
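Complementing the full listing in Appendix 1, the following condensed MATLAB sketch illustrates the projection logic described above; the variable names and the simplification of ignoring touching lines are assumptions for illustration (text pixels = 1).

hp = sum(BW, 2);                             % horizontal projection of page BW
inText = hp > 0;
lineStarts = find(diff([0; inText]) == 1);   % first row of each text line
lineEnds   = find(diff([inText; 0]) == -1);  % last row of each text line
for k = 1:numel(lineStarts)
    line_image = BW(lineStarts(k):lineEnds(k), :);
    height = size(line_image, 1);
    estimated_wordgap = floor(height/6) + 1;
    vp = sum(line_image, 1);                 % vertical projection of the line
    d = diff([0, vp == 0, 0]);               % runs of empty columns
    gapStart = find(d == 1);
    gapEnd = find(d == -1) - 1;
    isWordGap = (gapEnd - gapStart + 1) >= estimated_wordgap;
    % columns gapStart(isWordGap)..gapEnd(isWordGap) separate consecutive words
end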


4.2.2. Size Normalization and thinning

Scripts in a document image may be created in different font sizes and font styles. However, in this research, the researcher proposed a size- and style-free script identifier. Thus, the task of converting the word image into a normalized common space is a crucial stage in this bilingual script identification. Size normalization is important in treating scripts created in different font sizes.

Size normalization in this research is done systematically by resizing the word image to a common word size while keeping the ratio of width and height of the word image. In the process, the first task is finding the resizing-coefficient for each word. The resizing-coefficient refers to the magnitude obtained by dividing some constant number by the height of the word image. This magnitude is used in resizing the segmented word-image into a resized image. Here, the value 80 is selected as the constant number, and the resizing-coefficient is obtained as:

resizing_coefficient = 80 / height_of_the_word_image

Then, using the resizing-coefficient, the segmented word-image can be resized to the appropriate resized image. The coefficient is a positive real number: if the resizing-coefficient is between 0 and 1.0, the image is diminished, and if it is greater than 1.0, the image is enlarged in proportion to the magnitude of the resizing-coefficient.

In MATLAB, resizing is accomplished using a function called imresize. The following code is a typical representation of resizing an image in MATLAB:

resized_word_image = imresize(word_image, resizing_coefficient);

This returns the resized_word_image that is resizing-coefficient times the size of word_image, using nearest-neighbor interpolation. Since the resizing coefficient is solely dependent on the height, it does not provide accurate normalization of the word-image width; however, it accurately normalizes the height of the word image. In this research, we found this technique sufficient for extracting the important features needed to achieve our research objective. Figure 4.12 shows the resized word image.


Figure 4.12 Resized word images: original segmented word images with font sizes 10 and 20, and the corresponding word images resized with the resizing-coefficient

Thinning, on the other hand, is a normalization technique for treating word images created with regular and bold styles through morphological operations. Before the thinning operation, smoothing of the image with a Gaussian low-pass filter is performed. Smoothing helps to fill the spaces between unconnected components.

In MATLAB, h = fspecial('gaussian', hsize, sigma) returns a rotationally symmetric Gaussian low-pass filter of size hsize with standard deviation sigma (positive). hsize can be a vector specifying the number of rows and columns in h, or it can be a scalar, in which case h is a square matrix. The default value for hsize is [3 3] and the default value for sigma is 0.5. In this experiment, we used the default values of the fspecial function when performing the smoothing operation with the Gaussian low-pass filter. The code which generates the filter and performs the smoothing is given by:

h = fspecial('gaussian'); % generate the Gaussian filter h
smoothed_word_image = imfilter(resized_word_image, h); % smooth the word image

Then, the thinning operation is applied on the smoothed word image. Thinning in this research is done through morphological operations on the binary word image. MATLAB provides a function called bwmorph to perform morphological operations such as opening, closing, thinning, eroding, and dilating. The specific call that performs the thinning operation in this particular research is given by:

Thinned_image = bwmorph(smoothed_word_image, 'thin', Inf);


The code removes pixels so that an object without holes shrinks to a minimally connected stroke, and an object with holes shrinks to a connected ring halfway between each hole and the outer boundary. This operation is performed repeatedly until the word image is thinned. The smoothed and thinned image is shown in Figure 4.13.

Another important operation in normalizing the word image is removing the non-text region from the word-image. When the line image is segmented into word images, different non-text regions may be created: depending on which words a particular word was originally aligned with when the document image was created, the same word might have different non-text regions. Thus, removing this variation is an important part of the normalization. In this particular experiment, we observed that such variation may be created at the top few rows or the bottom few rows of a word.

The operation of removing the non-text region can easily be performed by detecting the row-wise sum of black pixels: only those rows whose horizontal sums of black pixels are not zero are included in the word-image vector.

Figure 4.13 Smoothed and thinned word Image
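The row-wise trimming of non-text regions described above can be sketched in MATLAB as follows; the variable names are assumptions, and text pixels are taken as 1.

nonEmpty = find(sum(word_image, 2) > 0);               % rows containing black pixels
trimmed = word_image(min(nonEmpty):max(nonEmpty), :);  % drop empty top/bottom rows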

After the preprocessing activities are performed, we extract selected important features from the segmented, resized and thinned word-images to achieve the script identification investigated in this research.

4.2.3. Feature Extraction

Feature extraction is the identification of appropriate measures to characterize the component images distinctly. It is an integral part of any identification system. The aim of feature extraction is to identify patterns by means of minimum number of features that are effective in discriminating pattern classes. The algorithms proposed in this research are inspired by a simple observation that every script/language defines a finite set of text patterns, a distinct visual appearance and component properties of their own.


In this research, different feature extraction techniques are proposed to distinctly characterize a word image formed by either the Amharic or the English language. The feature extraction techniques are:

1. Extent: the proportion of the pixels in the bounding box that are also in the region, computed as the area of the black pixels divided by the area of the bounding box. The extent of the word image can be obtained using the regionprops MATLAB function, which measures properties of image regions. STATS = regionprops(L, properties); measures a set of properties for each labeled region in the label matrix L. Positive integer elements of L correspond to different regions; for example, the set of elements of L equal to 1 corresponds to region 1, and the set of elements of L equal to 2 corresponds to region 2. The extent of the labeled regions can then be generated using the code: Extent = [STATS.Extent]; The observations generated from the Extent function show that the majority of extent values for Amharic word-images lie between 0.02 and 0.03, while the majority of English word-image extent values lie between 0.0 and 0.04.

2. Ratio of the number of connected components to the row length of the word image: from simple observation, in most circumstances Amharic words consume more space per character than English word-images. Using this simple information, the researcher used the ratio of the number of connected components to the row length as a feature for discriminating the Amharic and English scripts. The algorithm for the ratio of the number of connected components to the row length of the word image is described below.

Algorithm: ratio of connected components to row length()
Input: Preprocessed input word_image.
Output: Ratio of the number of characters to the word length.
1. Find the length of the word. In MATLAB it is obtained by: Length = size(word_image, 2);
2. Find the number of connected components in the word. In MATLAB this can be achieved through the bwlabel function: [L, num] = bwlabel(word_image); returns the number of connected components in the image in the variable num.
3. Find the ratio, computed as num/Length.


3. Maximum horizontal projection profile of three regions of the word image: the third technique used to extract features from word images keeps the maximum value of the horizontal sum of black pixels in selected regions of the word image. From an examination of English word images, the researcher observed that English word images record their maximum horizontal sums of black pixels in three different regions. Because of the resizing effect and the style of the word image, it is difficult to pick the exact row where the horizontal sum of black pixels is maximum. Instead, a range of rows is tracked, the horizontal sums within this range are compared, and the maximum is picked. The regions are estimated by practical observation of where the maximum horizontal sums occur.
• Region 1: estimated as the rows from the first row to the 2nd row.
• Region 2: estimated as the rows from the integer part of (((2.75*Z)/4)+0.5) to (((3.25*Z)/4)+0.5), where Z is the height of the word image.
• Region 3: estimated as the rows from the integer part of (((3.8*Z)/4)+0.5) to Z (matching the algorithm below), where Z is the height of the word image.
Figure 4.14 shows the regions in which the maximum horizontal sum features are extracted.

Figure 4.14 Regions of the word image from which the maximum horizontal projection features are extracted (Regions 1, 2, and 3)

The maximum horizontal sums from these three regions are then each divided by a constant, 1000, which helps normalize the results to lie between 0 and 1.


The algorithm for the maximum horizontal projection profile of the different regions is described as:

Algorithm: Maximum_Horizontal_profile_ratio()
Input: preprocessed word_image.
Output: ratios of the maximum horizontal profile of the three regions.
1. Initialize variables: Region1HP = [], Region2HP = [], Region3HP = [].
2. Compute the horizontal projection (the column vector of row-wise black-pixel sums).
3. Compute the height of the word image: Z = size(HorizontalProjection, 1).
4. Find the ratio of each region:
If Z equals 1
    Region1HP = HorizontalProjection(1); Max_Region1 = max(Region1HP); Ratio_Region1 = Max_Region1/1000;
    Region2HP = HorizontalProjection(1); Max_Region2 = max(Region2HP); Ratio_Region2 = Max_Region2/1000;
    Region3HP = HorizontalProjection(1); Max_Region3 = max(Region3HP); Ratio_Region3 = Max_Region3/1000;
Else
    Startof_region1 = 1; Endof_region1 = 2;
    x = (((2.75*Z)/4)+0.5); Startof_region2 = integer(x);
    y = (((3.25*Z)/4)+0.5); Endof_region2 = integer(y);
    k = (((3.8*Z)/4)+0.5);  Startof_region3 = integer(k);
    For i = Startof_region1 to Endof_region1
        Region1HP = [Region1HP; HorizontalProjection(i)];
    End
    Max_Region1 = max(Region1HP); Ratio_Region1 = Max_Region1/1000;
    For j = Startof_region2 to Endof_region2
        Region2HP = [Region2HP; HorizontalProjection(j)];
    End
    Max_Region2 = max(Region2HP); Ratio_Region2 = Max_Region2/1000;
    For l = Startof_region3 to Z
        Region3HP = [Region3HP; HorizontalProjection(l)];
    End
    Max_Region3 = max(Region3HP); Ratio_Region3 = Max_Region3/1000;
End

The sum of the second and third region feature values is also taken as a distinct feature, to give these features more emphasis than the rest. A combined sketch of the projection features follows.
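Putting the three region features and the emphasis feature together, the following is a minimal sketch, under the assumptions that HorizontalProjection is the column vector of row-wise black-pixel sums, that integer() in the pseudocode denotes truncation of the value already offset by 0.5 (i.e., rounding), and that Z is the normalized height of 80 pixels (the Z == 1 branch of the algorithm handles degenerate images):

% Region-wise maximum horizontal projection features (sketch, Z > 4 assumed).
Z  = size(HorizontalProjection, 1);             % height of the word image
r1 = 1:min(2, Z);                               % Region 1: first two rows
r2 = round(2.75*Z/4):round(3.25*Z/4);           % Region 2
r3 = round(3.8*Z/4):Z;                          % Region 3
Ratio_Region1 = max(HorizontalProjection(r1)) / 1000;  % normalize by 1000
Ratio_Region2 = max(HorizontalProjection(r2)) / 1000;
Ratio_Region3 = max(HorizontalProjection(r3)) / 1000;
Feature6 = Ratio_Region2 + Ratio_Region3;       % emphasized sum feature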

4.2.4. Classification

K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are the most frequently used classifiers in research on Multilanguage script identification. In this research on Amharic-English bilingual script identification, the SVM algorithm is used. As indicated in the work of [10], an SVM classifier achieves a better accuracy rate and consumes less computation time during the testing stage than KNN. Given a training data set, a good classifier must accurately and efficiently separate patterns from one another. By maximizing the minimal distance from the training data to the decision boundary, support vector machines (SVMs) can achieve an errorless separation of the training data. SVMs are formulated as quadratic programming problems, so they do not suffer from the complicated local-minima issues of multilayer neural network classifiers [58]. A classification task usually involves separating data into training and testing sets. Each instance in the training set contains one "target value" (the class label) and several "attributes" (the features or observed variables). In applying the SVM classifier to our research problem, we categorize the tasks into training and testing phases.


4.2.4.1. Training

This stage of the classification involves tasks such as transforming data to the format of an SVM package, scaling the data sets, selecting a model, and selecting the best parameters for use during training. Transforming data to the format of an SVM package: the features of word images extracted from a series of document images are originally stored in a vector variable, and the label of each document is collected and stored in another vector variable called Label. The extracted features and labels are then transformed into a training data set with the record names Features and Label. Amharic-English bilingual script identification is a two-class problem. The features extracted from each word image are represented by a 6-dimensional vector X that belongs to either class Amharic or class English, where the union of the Amharic and English sets is the whole training dataset.

The feature data consist of a set of vectors X1, X2, ..., Xk, ..., Xp; the corresponding label for each feature record is preserved in the Label data with value +1 or -1, associating it with class English or class Amharic respectively (illustrated in the sketch below).
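As an illustration of the expected layout (the values here are hypothetical; only the dimensionality of 6, the total of 20,000 training words, and the +1/-1 label coding come from this research), the transformed training data could be held as:

% Hypothetical layout of the transformed training data.
p        = 20000;        % total training words in this research
Features = zeros(6, p);  % one 6-D feature column vector per word image
Label    = zeros(1, p);  % +1 for English, -1 for Amharic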

A training set containing $p$ training points of vectors $X_k$ and labels $L_k$ is represented as $T = \{(X_1, L_1), (X_2, L_2), \ldots, (X_k, L_k), \ldots, (X_p, L_p)\}$, where $L_k = -1$ if $X_k \in$ class Amharic and $L_k = +1$ if $X_k \in$ class English.

Scaling refers to the task of reducing greater numeric ranges to smaller numeric ranges. The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges; another advantage is to avoid numerical difficulties during the calculation. Since most SVM kernel values depend on the inner products of feature vectors, scaling is also helpful in reducing computation time. In this research, scaling is done during feature extraction by dividing each feature value by an experimentally determined constant.

Selecting a model: the goal of the SVM classifier is to produce a model which can predict the target values of the test data given only the test data attributes. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in R^n$ and $y_i \in \{1, -1\}$, the support vector machine (SVM) requires the solution of the following optimization problem:


$$\min_{w,\,b,\,\varepsilon}\;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \varepsilon_i$$

subject to $y_i (w^T \phi(x_i) + b) \ge 1 - \varepsilon_i$ and $\varepsilon_i \ge 0$.

Here the training vectors $x_i$ are mapped into a higher-dimensional space by the function $\phi$. The SVM finds a linear separating hyperplane with the maximal margin in this higher-dimensional space. $C > 0$ is the penalty parameter of the error term. Furthermore, $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is called the kernel function [58].

Selecting a model refers to selecting the kernel function on the basis of which the training dataset is going to be categorized. A number of kernel functions have been developed for support vector machines. The common kernel functions discussed by [58] include:
• Linear (first-order) kernel, which maps the training dataset linearly into the higher-dimensional space and so can solve a linearly separable classification problem: $K(x, x') = x^T x'$.
• Polynomial kernel, used when the classification problem cannot be separated by a linear hyperplane; a polynomial kernel of degree $d$ is given by $K(x, x') = (x^T x' + 1)^d$.
• Radial basis function (RBF), or Gaussian, kernel, which solves a classification problem non-linearly: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, with $\gamma = 40$ in this research.

The researcher proposes the RBF kernel for its ability to handle non-linear classification problems and its fewer numerical difficulties compared with the polynomial kernel. A small numeric illustration of the chosen kernel follows.
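In the sketch below, the two feature vectors are hypothetical; only the kernel width $\gamma = 40$ comes from this research:

xi = [0.02; 0.1; 0.3; 0.25; 0.28; 0.53];  % illustrative 6-D feature vectors
xj = [0.03; 0.12; 0.31; 0.27; 0.30; 0.57];
gamma = 40;                               % kernel width used in this research
k = exp(-gamma * norm(xi - xj)^2);        % RBF (Gaussian) kernel value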


In training the SVM classifier, one of the OSU-SVM toolbox functions, RbfCV.m, is used as a plug-in in the script identifier prototype; it constructs a non-linear SVM classifier with a radial basis, or Gaussian, kernel. The researcher preferred this function because the toolbox is originally developed in MATLAB code and is compatible with the remaining code of the prototype. The other reason is that the toolbox's training functionality is fundamentally based on the LIBSVM training toolbox, calling the mexSVMTrain file to implement the SVM algorithm. LIBSVM is a popular training tool for handling classification problems based on support vector machine algorithms. The inputs to the plug-in are:
• Samples: all the training patterns (a row of column vectors).
• Labels: the corresponding class labels for the training patterns in Samples (a row vector).
• Gamma: the parameter of the radial basis kernel, which has the form exp(-Gamma*|X(:,i)-X(:,j)|^2) (default 1); in this research Gamma was set to 40, decided through experiment.
The outputs of the plug-in function are:
• nSV: the numbers of support vectors in each class.
• nLabel: the labels of each class.
• Bias: the bias of the two-class classifier(s).
• Parameters: the output parameters used in training.
• AlphaY (Alpha * Y), where Alpha is the non-zero Lagrange coefficients and Y is the corresponding labels.
• SVs: the support vectors (the samples corresponding to the non-zero Alpha), stored in the format [SVs from Class 1, SVs from Class 2].
To generate the described outputs, the RbfCV.m plug-in calls the SVMTrain function. In fact, SVMTrain is not the actual training function; rather, it performs input parameter checking and calls a mex file, mexSVMTrain, to implement the SVM algorithm. mexSVMTrain constructs the SVM classifier based on Dr. Chih-Jen Lin's LIBSVM algorithm (version 2.33).
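Condensed from Appendix 1, the training step reduces to the following calls (the toolbox entry point there is RbfSVC; train_dataset.mat supplies the Features and Lables records saved during processing):

load train_dataset                 % provides Features and Lables
Gamma = 40;                        % experimentally chosen kernel width
[AlphaY, SVs, Bias, Parameters, nSV, nLabel] = RbfSVC(Features, Lables, Gamma);
save SVMClassifier1 AlphaY SVs Bias Parameters nSV nLabel;  % persist the model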

4.2.4.2. Testing

Testing is a vital task in the bilingual Amharic-English script identification problem, in which the performance of the classifier is cross-checked for accuracy against a new testing


dataset. The primary task in testing is preparing a testing dataset that is new to the classifier. This dataset should follow the same preprocessing and transformation steps as the training dataset.

The testing dataset passes through steps similar to those of the training dataset, and the features extracted for the testing dataset are identical to those of the training dataset.

Similar to the training stage, this part of the classification activity also involves transforming the features into a format compatible with the SVM package. Scaling of the feature values is likewise performed during the extraction stage.

For testing purposes, another plug-in, SVMTest, from the OSU-SVM toolbox is used. This tool was selected for the same reasons of compatibility and reliability. The SVMTest function tests the performance of a trained SVM classifier on a group of input patterns whose true class labels are given.

In fact, this function performs input parameter checking and depends on a mex file, mexSVMClass, to implement the algorithm. mexSVMClass is also a function of the SVM classifier based on Dr. Chih-Jen Lin's LIBSVM algorithm (version 2.33).
Inputs:
• Samples: testing samples.
• Labels: labels of the testing samples.
• AlphaY (Alpha * Y), where Alpha is the non-zero Lagrange coefficients and Y is the corresponding labels.
• SVs: support vectors (the samples corresponding to the non-zero Alpha).
• Bias: bias of the two-class classifier(s).
• Parameters: the parameters required by the training.
• nSV: numbers of SVs in each class.
• nLabel: labels of each class.

Outputs:
• ClassRate: classification accuracy rate.
• DecisionValue: the output of the decision function.
• ConfMatrix: confusion matrix.
• PreLabels: predicted labels.

A condensed sketch of the testing call follows.
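As in Appendix 1, the testing step reduces to:

load SVMClassifier1                % trained model parameters
load test.mat                      % provides Features and Lables
[ClassRate, DecisionValue, Ns, ConfMatrix, PreLabels] = ...
    SVMTest(Features, Lables, AlphaY, SVs, Bias, Parameters, nSV, nLabel);
accuracy = ClassRate * 100;        % accuracy rate in percent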


4.3. Data set preparation and experiment

4.3.1. Dataset preparation

Script identification research needs two basic types of datasets: one used to train the system and the other used for testing. To this end, different Amharic and English documents were collected from the internet to prepare the training dataset.

A training dataset of 17 pages of English document images and 29 pages of Amharic document images, each language contributing around 8,000 words, was prepared. The dataset also includes images collected by processing real-life documents such as newspapers, printed documents, and magazines; 25% of the total dataset was taken from real-life documents, which include noisy and degraded images. The document images were then preprocessed to extract the important features at word level. Both the Amharic and English document images contain characters with font sizes ranging from 10 to 16 in regular and bold font styles. In addition, the Amharic document images contain scripts prepared using the Visual Geez Unicode and Power Geez fonts, since most real-world documents are prepared with these fonts.

In the case of English, the document images contain scripts prepared using the Times New Roman font in both uppercase and lowercase format.

The test dataset is prepared from two sources. The first group contains test data prepared from noise-free document images: these documents were first prepared in Microsoft Word specifically for the testing task, printed on a LaserJet printer, and the printouts scanned back to obtain the corresponding noise-free document images. The other test dataset was prepared by scanning real-world documents from newspapers, books, and magazines that contain both scripts. The whole training and testing dataset is described in Table 4.1.


Training Dataset
  Noise-free Amharic words:            8,000 words
  Noise-free English words:            8,000 words
  Amharic words from real documents:   2,000 words
  English words from real documents:   2,000 words
  Total:                              20,000 words

Test Dataset
Prepared testing dataset with different font sizes and formatting:

  Testing set type                           Font size       Formatting  Words per sample
  Amharic noise-free (Visual Geez Unicode)   10, 12, 14, 16  Regular     1,250 per size
  Amharic noise-free (Visual Geez Unicode)   12              Bold        1,250
  Amharic noise-free (Visual Geez Unicode)   12              Italic      1,250
  Amharic noise-free (Power Geez)            12              Italic      1,250
  English noise-free (Times New Roman)       10, 12, 14, 16  Regular     1,200 per size
  English noise-free (Times New Roman)       12              Bold        1,200
  English noise-free (Times New Roman)       12              Italic      1,200
  Mixed noise-free                           Mixed           Mixed       1,490 (1,323 Amharic & 172 English)

Real-life document testing dataset from different sources:

  Source                                                       Words per sample  Samples
  Amharic real-life documents (magazines, books, newspapers)   200 per page      5
  Mixed real-life documents (magazines, books, newspapers)     100 per page      5
  Grand total                                                                    24 samples

Table 4.1 Training and testing dataset


4.3.2. Experiment

A prototype was developed for Amharic and English script identification from a document image.

The script identifier prototype provides functionality that includes: opening the document image; performing preprocessing over the training and testing datasets; labeling the document image; extracting the appropriate features from the document image units; transforming the feature and label values into training and testing datasets compatible with the SVM package; training the classifier; testing the classifier; and predicting unlabeled datasets. The graphical user interface of the script identifier is shown in Figure 4.15.

Figure 4.15 Graphical user interface of the script identifier

The experiment follows the procedure below:


Opening the document image: using the 'Open' button in the 'Image' menu, the document image can be opened in the script identifier window.

Labeling: once the document image is opened and displayed, a label for the document is provided.

In labeling the objects from which the features are to be extracted, one can label them either after the words are segmented or initially, before preprocessing.

Since we are dealing with a huge number of words, it is tedious to label each word image after segmentation. Thus, we preferred to label the document image initially, before any preprocessing. However, this approach requires the document image to contain only one type of script. Each document image in the training dataset is therefore prepared to contain either English script or Amharic script, but not both simultaneously.

The label for the document image is provided through the text input box marked 'Label' in the script identifier. For an English document we input "E", and for an Amharic document image we enter "A" in the text box. During processing of the training dataset, each segmented word image then takes the value -1 if the label entered through the input text box is "A" and +1 if it is "E", as in the sketch below.
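The label handling in the prototype (Appendix 1) is a simple branch:

label = get(H.edit(1), 'String');  % read the Label text box
if label == 'A'
    label = -1;                    % Amharic
elseif label == 'E'
    label = 1;                     % English
else
    error('Please provide the correct label: A or E')
end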

Processing the training dataset: this stage of the experiment deals with preprocessing, feature extraction, and transformation of the feature values into a format compatible with the SVM package. The prototype is designed to handle appending new training data over the existing dataset, which makes it easy to add feature values from newly processed data on top of the existing ones; this is done simply by clicking "Processing_Training_dataset". All of the processing functionality is based on the proposed model discussed in the previous section. The transformed training dataset is saved under the name "train_dataset.mat" in the same folder as the script identifier. The train_dataset.mat file contains two records: an m-dimensional feature vector and the associated one-dimensional label vector. A sketch of the appending step follows.
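The appendage itself follows the (partly commented-out) logic in Appendix 1: the previously saved features and labels are loaded and the new columns concatenated before saving, as in this sketch:

Saved    = load('train_dataset', 'Features', 'Lables');  % existing dataset
Lables   = horzcat(Saved.Lables,   Lables);              % append new labels
Features = horzcat(Saved.Features, Features);            % append new feature columns
save('train_dataset', 'Lables', 'Features');             % overwrite with the union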


Training: at this stage the selected RBF SVM classifier model is trained with the training dataset. The 'train_dataset.mat' file saved during the processing stage is loaded into the workspace to train the classifier. After training, the new parameters and support vectors are saved in the 'classifier.mat' file; these parameters are important in determining the class of new testing data. The "Train" button runs the RbfCV.m plug-in adapted from the OSU-SVM toolbox.

Processing the testing dataset: this stage applies to the testing dataset the same preprocessing, feature extraction, and transformation to the SVM package format as were applied to the training dataset. Processing of the test dataset is performed through the "Process_Test_dataset" button in the script identifier prototype. The test dataset label can be entered through the 'Label' input text box. The testing dataset is then saved as "test_dataset.mat" in the same folder as the script identifier; it contains the m-dimensional feature vector and its associated one-dimensional label vector.

Testing: in this study, the experiment follows two approaches. In the first, the testing datasets are prepared so that each testing instance contains only one of the scripts, and a true label is provided. This approach is the better way to test a huge dataset in one instance; however, it requires preparing the testing dataset to contain only one type of script. This is done through the "Test" button in the script identifier prototype, which runs the SVMTest.m plug-in adapted from the OSU-SVM toolbox. This button generates the confusion matrix and the accuracy rate of the classifier for the given testing dataset.

The second approach predicts the unlabeled test dataset and counts the coincidences of the word images with their labels; all performance evaluation and accuracy calculation are done manually. This is important especially for testing datasets which contain mixed Amharic and English scripts in the same instance. In the script identifier prototype this is done through the 'Predict' button, which runs the SVMClass.m plug-in from the OSU-SVM toolbox and displays each segmented word image together with the label predicted by the classifier. Figure 4.16 (a), (b), and (c) show word images with their predicted script classes; a sketch of the underlying prediction call is given below.
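The sketch below assumes SVMClass follows the same calling convention as SVMTest but without true labels; this signature is an assumption, since only the plug-in name comes from the text.

% Hypothetical prediction call for unlabeled word images.
load SVMClassifier1  % trained model parameters
PreLabels = SVMClass(Samples, AlphaY, SVs, Bias, Parameters, nSV, nLabel);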


Figure 4.16 Tested word images and the labels predicted by the SVM classifier: (a) correctly predicted Amharic word image; (b) correctly predicted English word image; (c) incorrectly predicted Amharic word image labeled as English

4.4. Results and Discussion

4.4.1. Results

The prototype developed was extensively tested on the different datasets described in the dataset preparation section.

The primary testing was performed on the testing datasets prepared from noise-free document images, using the different approaches described above.

The first test approach performs testing on document images that contain only one script type, either English or Amharic, in a single instance, providing the feature values and the true labels to the classifier. The prototype system then generates the confusion matrix and the accuracy rate of the classifier automatically. Table 4.2 and Table 4.3 show the accuracy results of identifying Amharic and English scripts from document images with different font-size points and different font styles.


Font size   Amharic     English
10          95.7886%    97.5793%
12          96.71%      96.00%
14          97.2884%    97.5973%
16          97.7796%    97.0833%

Table 4.2 Accuracy results of the classifier at different font size points

            Amharic     English
Italic      86.6856%    84.33%
Bold        97.6879%    81.8636%

Table 4.3 Accuracy results of the classifier at different font styles

The tests in the above results used the Visual Geez fonts for the Amharic scripts. For variation, we also tested the Power Geez fonts and obtained a 92.7% accuracy rate for 12-point text.

The second test procedure was performed on document images containing both scripts in one instance. In this procedure we first provide the document image to the system and ask it to predict each word image without supplying the true label to the classifier. We then count the numbers of accurately and wrongly predicted word images to calculate the accuracy of the classifier. Table 4.4 and Table 4.5 display the confusion matrix and the accuracy of the classifier in predicting Amharic word images, English word images, and the overall accuracy.

            Predicted Amharic   Predicted English
Amharic     1296                27
English     3                   169

Table 4.4 Confusion matrix over mixed document

Amharic     English     Overall accuracy
97.9592%    98.2558%    97.99%

Table 4.5 Accuracy of the classifier over mixed document

Finally, the performance of the classifier was tested on real documents. These testing sets were collected from books, magazines, and newspapers. Using the first approach of automatically testing document images which contain one script in one instance, we tested


document images containing only Amharic scripts. The test, performed on five different samples each containing more than two hundred words, is shown in Table 4.6.

            Amharic
Test1       90.9524%
Test2       90.7143%
Test3       89.4737%
Test4       90.00%
Test5       87.6471%

Table 4.6 Performance of the classifier over Amharic real documents

The performance of the proposed model was also evaluated with real documents containing mixed Amharic and English scripts. Following the same style of testing mixed documents, we first provide the system a document image that contains both scripts, without true labels. The system performs the prediction task and displays the predicted label for each segmented word image. The confusion matrix and accuracy rate of the classifier are calculated from a tally obtained by counting. Table 4.7 shows the accuracy of the classifier over five samples containing mixed Amharic and English scripts.

            English     Amharic
Test1       90.625%     88.8%
Test2       92.3%       100%
Test3       91%         90%
Test4       89%         92%
Test5       90.43%      91%

Table 4.7 Accuracy of the classifier over mixed Amharic and English real documents

4.4.2. Discussion

In this section we discuss the results achieved and the main causes of misclassification of word images. For noise-free document images the classifier generally achieved very good classification accuracy rates: a minimum of 95.78% and a maximum of 97.77% for Amharic document images. The accuracy rates for classifying English document images are almost similar: a minimum of 96.0% and a maximum of 97.59%.


Though the accuracy rates were not as high as for the regular font style, the performance of the classifier over bold and italic font styles was not bad. The system performed worse on English document images than on Amharic for bold and italic scripts.

We also observed encouraging accuracy rates in identifying scripts from document images that contain both scripts simultaneously.

Moreover, the model achieves a higher recognition rate on Visual Geez Unicode fonts than on Power Geez fonts. The reason for this variation is that Power Geez fonts have a relatively higher horizontal pixel distribution in the second region used for the maximum horizontal sum features.

The achievement of the proposed script identification model over real documents is also promising, recording accuracies between 87% and 93%. The overall results of this research show the consistency of the proposed model in providing almost similar results across the different categories of testing datasets.

The reasons for misclassification can be grouped into two major categories.

Internal to the proposed system: this refers to reasons inherent in the proposed system, which is fundamentally based on the maximum horizontal projection features extracted from the different regions.

Figure 4.17 Feature values (the maximum horizontal projection in the region indicated by the line in the word image) which mislead the classifier into classifying the word image as English

Thus there are a few occurrences of word images which can be misclassified into the other class because they satisfy the conditions to belong to that class. For instance, in the word


image shown in Figure 4.17, the Amharic word image is classified as English because its maximum horizontal projection profile in the second region is similar to that of most English word images. Similarly, there are a few word formations in the English language that record feature values similar to most Amharic word images.

External: this includes all the reasons that derive from the quality of the image, segmentation problems, problems created by smoothing, and the italic font style.

The quality of the image chiefly affects the system by increasing the number of connected components in a fixed length of word image. Since one of the features is the ratio of the number of connected components to the length of the word image, any additional noise introduced into an Amharic image may increase this ratio and lead to misclassification of the Amharic word image into the English class.

Another problem observed is segmentation, which often arises in real-life documents. The spaces between words and characters are not uniform and may create over-segmentation problems that lead to misclassification.

The smoothing employed on small-font English word images merged some neighboring characters, which reduces the number of connected components per word length. As a result, a few small-font English words are classified as Amharic word images. The effect is mostly observed in the bold style of English scripts.

Italic fonts have a slightly different pixel distribution from the regular font style in the selected regions, particularly the second and third regions. This leads to misclassification of a few Amharic and English word images.


CHAPTER FIVE

CONCLUSION AND RECOMMENDATION

5.1. Conclusion

The purpose of this research is to identify English and Amharic scripts from a document image containing both scripts, a primary task that must be performed before the development of a bilingual OCR system. The research presented a script identifier that can identify English and Amharic language scripts from a document image by examining distinct features of the preprocessed word images of the two scripts.

In achieving the objective of the research, the researcher relied on apparent and easily observable features of those languages' word images. The observable feature values were taken after a series of preprocessing and segmentation activities: noise removal, binarization, line segmentation, word segmentation, size normalization, and thinning are the common tasks performed by the bilingual script identifier.

Connected component analysis of different regions of the preprocessed word images, such as the number of connected components per width of the word image, the evaluation of the horizontal pixel distribution in selected regions, and the ratio of the area of black pixels in the preprocessed word image to the area of its bounding box, provided important information for identifying the language scripts in a document image. The identification task is achieved by training an SVM classifier with the information obtained from the computation of the above-mentioned feature values of the word images.

The prototype developed for script identification of Amharic and English scripts from a document image helped to carry out the research process automatically and interactively and generated the results in an easily understandable format. Results from the experiment showed the effectiveness of the script identification model. Though script identification is an intermediary process in bilingual optical character recognition (OCR) and needs a high level of accuracy, the results generated by the


proposed system are promising and can be used for the intended task with little improvement. The approach can also be utilized for indexing in the development of a bilingual document-image retrieval system.

Since this research is an early attempt to identify Amharic and English language scripts in a document image, it has its own limitations that should be improved over time. The system is sensitive to noisy and degraded images that cannot be restored through the wiener2 filtering and low-pass Gaussian filtering toolboxes, respectively. Besides, since the smoothing merges neighboring characters in small-sized bold English text, the system has limitations in providing accurate feature values, which results in a lower accuracy rate for small-sized, bold-style English scripts. A further limitation is that the proposed system can only handle non-skewed, line-free, and image-free printed documents, and it is designed to handle only single-column document images. Segmentation of italic-style text-line images and of justified real documents are important areas that need further experimentation to improve the proposed script identification system.

5.2. Recommendation

This research is a first trial in the identification of Amharic and English language scripts from a document image. Although promising results were obtained, the researcher believes that the script identifier should be improved by further research to attain better performance and bring it to an operational level. Accordingly, the following recommendations are forwarded:

• The algorithms were tested on non-skewed, line-free, and image-free documents. Since real-life documents exhibit a variety of these features, future work on bilingual script identification should incorporate skew correction, image removal, and line removal techniques.
• The system was designed for single-column document images. Since real document images exist in different columnar formats, a system that can handle multi-column document image analysis is needed.
• The script identifier is based on a few selected feature values. Since script identification is an intermediate result for bilingual OCR, additional features


such as transformation coefficients (spatial and wavelet) and vertical black-pixel distribution feature extraction techniques should be considered in order to increase the accuracy rate of the script identifier.
• The segmentation algorithm used in the current research handles images prepared with regular text reasonably well, but it fails to segment italicized and justified real documents. Thus, improved segmentation techniques need to be developed for justified real documents and documents containing italicized text.
• The trade-off between restoring degraded documents through smoothing, which decreases the number of spurious connected components, and identifying small-sized English word images should be studied carefully to identify the threshold that achieves high performance.
• This study considered only the Times New Roman English font and the Visual Geez and Power Geez Amharic fonts. An effort should be made toward the development of a system that works for all other English and Amharic fonts.
• Bilingual OCR of the Amharic and English languages has not yet been realized. There should be an attempt toward the development of bilingual Amharic and English character recognition.
• The classification technique used in this research is only the SVM classifier. The system should be tested with other classification techniques such as KNN and artificial neural networks.


REFERENCE

[1]. አምሳሉ አክሊሉ [Amsalu Aklilu]. አጭር የኢትዮጵያ ሥነፅሁፍ ታሪክ [A Short History of Ethiopian Literature]. Addis Ababa: Addis Ababa University, 1984.

[2]. Abirami.S and Manjula.D. A Survey of Script Identification techniques for Multi- Script Document Images. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp 246-249, Academic Publishing 2009.

[3]. Athena learning. The evolution of English alphabet. Available at http://www.athenalearning.com/programs/the-adventure-of-english/the-evolution-of-the-english-alphabet. (Accessed December 2010).

[4]. Azizah.S. Chain coding and pre processing stages of handwritten character image file. The free library College of Information Technology, University Tenaga Nasional. Gale, Cengage Learning 2010.

[5]. Bender M.L. Language in Ethiopia. Oxford University Press, London 1976.

[6]. Bender M.L., Head S.W. and Cowley, R. The Ethiopian writing system. In Bender, Bowen, Cooper and Ferguson (eds.), 1976.

[7]. Berhanu Aderaw. Amharic Character Recognition Using Artificial Neural Networks; (Masters Thesis) Addis Ababa: Addis Ababa University 1999.

[8]. Bernard.C. The World Atlas of Language Structures Online: Writing Systems: chapter 141. available at http://www.wals.info (Accessed on December 2010)

[9]. Bindu.P and Sudhaker.S. A Novel Bilingual OCR System based on Column- Stochastic Features and SVM Classifier for the Specially Enabled. Second International Conference on Emerging Trends in Engineering and Technology, IEEE 2009.

[10]. C. V. Jawahar, Pavan.K, and Ravi. Kiran. A Bilingual OCR for Hindi-Telugu Documents and its Applications. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), IEEE 2003.


[11]. Chanda.S, Pal.S, F. Katrin and Pal.U. Two-stage Approach for Word-wise Script Identification. 10th International Conference on Document Analysis and Recognition. IEEE pp.925-930, 2009

[12]. Chen.M.Y., Kundu.A and Zhou.J., off-line Handwritten word recognition using HMM type Stochastic network, Transactions of Pattern Analysis and machine intelligence, IEEE 1994.

[13]. Cook, V. J. The English writing system. London: Arnold, 2004.

[14]. Coulmas, F. The Writing Systems of the World. Oxford: Basil Blackwell, 1989.

[15]. Dereje Teferi. Optical character Recognition of Typewritten Amharic Text; (Masters thesis) School of Information studies for Africa, Addis Ababa University, Addis Ababa, 1999.

[16]. Dhandra.B.V, Mallikarjun.H, Hegadil.R and Malemathl.V.S. Word Level Script Identification in Bilingual Documents through Discriminating Features. IEEE, pp.389-394, 2006.

[17]. Dhandra. B.V, and Mallikarjun.H. Morphological Reconstruction for Word Level Script Identification. International Journal of Computer Science and Security, 2010.

[18]. Dhanya.D, Ramakrishnan.A.G And Peeta.B.P. Script Identification in Printed Bilingual Documents, Sadhana, 2002.

[19]. EICTDA. Development of Ethiopic Keyboard Layout and Typeface Standards EICTDA. Ethiopia, 2008.

[20]. Elgammal.A.M. and Ismail.A.M., Techniques for Language Identification for Hybrid Arabic-English Document Images. IEEE, 2009.

[21]. Encyclopedia of Britannica. Amharic language: Encyclopedia of Britannica 2008.

[22]. Ermias Abebe .Recognition of Formatted Amharic text using optical character recognition; (MScThesis) School of Information studies for Africa, Addis Ababa University, Addis Ababa, 1998.


[23]. G. G. Rajput and Anita H. B. Handwritten Script Recognition using DCT and Wavelet Features at Block Level IJCA Special Issue on “Recent Trends in Image Processing and Pattern Recognition” RTIPPR, 158-162, 2010.

[24]. Gaur, A. A History of Writing. London: The British Library, 1987.

[25]. Hamideh.R, Masoud.G, and Farbod.R . Automatic Language Identification of Bilingual English and Farsi Scripts IEEE 2009.

[26]. Himadri.N and Das.B. A Novel Approach for Bilingual (English - Oriya) Script Identification and Recognition in a Printed Document. International Journal of Image Processing (IJIP), 2008.

[27]. Hochberg.J, Patrik..H, Automatic Script Identification from Document Images using Cluster Based Templates, Transactions on Pattern Analysis and Machine Intelligence, IEEE ,1997

[28]. Hussain, F., and Cowell, J., A fast signature based algorithm for recognition of isolated Arabic characters, IASTED conference on visualization, Imaging and Image Processing, VIIP , Malaga, 2002.

[29]. J. Cowell and F. Hussain. Amharic character recognition using a fast signature based algorithm. Proc. of the 7th Int. Conf. on Information Visualization, page 384, IEEE, 2003.

[30]. Jiawei.H and Micheline.K. Data Mining: Concepts and Techniques. Second Edition. Morgan Kaufmann Publishers. San Francisco, USA, 2006.

[31]. Kunte.R.S and Samuel.R.D.S. A Bilingual Machine Interface OCR for Printed Kannada and English Text Employing Wavelet Features, 10th International Conference on Information Technology, IEEE, 2007.

[32]. L. Spitz. Determination of the Script and Language Content of Document Images. IEEE Trans. on PAMI, 1997.

[33]. Lawrence Lo. Ancient scripts.com: a companion from prehistory to today of world writing system, 2010.


[34]. Ma. Huanfeng and Doermann .David. Word Level Script Identification for Scanned Document Images. Language and Media Processing Laboratory Institute for Advanced Computer Studies University of Maryland, College Park.

[35]. Md.Mahbub.A and Abul.K.M. A Complete Bangla OCR System for Printed Characters. JCIT, 2010.

[36]. Million Meshesha. Recognition and Retrieval from Document Image Collection, (PHD thesis) International Institute of Information Technology Hyderabad 500 032, India, 2008.

[37]. Million Meshesha and Jawahar.C.V., Recognition of printed Amharic documents. Proceedings of Eighth International Conference on Document Analysis and Recognition (ICDAR),2005.

[38]. Mohanty.S , Nandini.H Dasbebartta.T and Kumar.B. An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents. Seventh International Conference on Advances in Pattern Recognition. IEEE Pp 398- 401, 2009.

[39]. Mori,S., Nishida.H., and Yamada,H. Optical Character Recognition. New York: John Wiley & Sons, Inc, 1999.

[40]. Negussie Taddesse . Handwritten Amharic Text Recognition Applied to the Processing of Bank Cheques; (MSc Thesis) School of Information studies for Africa, Addis Ababa University, Addis Ababa, 2000.

[41]. BBC. News on Computerization of Amharic fonts. Available at http://www.news.bbc.co.uk/2/hi/africa/ 609217.stm]. Accessed date. January 2010.

[42]. Omniglot. Writing systems and languages of the world. Available at http://www.omniglot.com/writing/index.htm accessed date January 2010.

[43]. P.W.Palumbo and S.N.Srihari, Text Parsing using Spatial Information for Recognizing Addresses in Mail Pieces, Proceedings of the International Conference on Pattern Recognition, Pans, France,1986.


[44]. Padma.M.C and P.A.Vijaya. Script identification from trilingual documents using profile based features. International Journal of Computer Science and Applications (Technomathematics Research Foundation), 2010.

[45]. Padma.M.C and P.A.Vijaya,. Script Identification of Text Words from a Tri Lingual Document Using Voting Technique. International Journal of Image Processing,) 2010.

[46]. Pal.U, Sinha.S And Chaudhuri.B. Word-Wise Script Identification From A Document Containing English, Devnagari And Telgu Text,” Proc. Of NCDAR, 2003.

[47]. Pandya,A. and Macy,R. Pattern Recognition with Neural Networks in C++. Boca Raton, Florida: CRC Press LLC, 1996.

[48]. Peeta.B.P,Sabari.R, Nishikanta.P and.Ramakrishnan.A.G. Gabor filters for Document analysis in Indian Bilingual Documents. IClSlP; IEEE, 2004.

[49]. Philip.B and Samuel.S.R.D. A Novel Bilingual OCR System based on Column- Stochastic Features and SVM Classifier for the Specially Enabled. Second International Conference on Emerging Trends in Engineering and Technology, (ICETET-09). IEEE 2009.

[50]. Postl.W. Detection of linear oblique structures and skew scan in digitized documents. Proc. 8th Int. Conf. on Pattern Recognition (ICPR), Paris, France, pp 687–689, 1986.

[51]. Qiang.H, Zhi-Dan. F and Yong.G. A study on the use of Gabor feature for Chinese OCR. Proceedings of 2001 International Symposium on Intelligent Multimedia, video and Speech Processing; Hong Kong, 2001.

[52]. R.Jagadeesh.K and R. Prabhakar. A comparative study of optical character recognition for . European Journal of Scientific Research .Vol.35 No.4 pp.570-582, 2009.

[53]. Rangachar.K, Lawrence.O.G and Venu.G, Document image analysis: A primer, Sadhana Vol. 27, Part 1, 2002.

[54]. Sampson, G. Writing Systems. London: Hutchinson, 1995.


[55]. Sauvola.J. and Pietikainen.M. Adaptive Document Image Binarization, Pattern Recognition, vol. 33, 2000.

[56]. T.N.Tan. Rotation Invariant Texture Features and their use in Automatic Script Identification. IEEE Trans. On PAMI, 1998.

[57]. The Association for Automatic Capturing and Identification Technologies. Optical Character Recognition (OCR): AIM, Inc. Alpha Drive Pittsburgh, USA, 2000.

[58]. The MathWorks, Inc. Noise removal: Analyzing and enhancing images (Image Processing Toolbox User's Guide). MATLAB, The Language of Technical Computing, version 7.0.0.19920 (R14), 2004.

[59]. Thomas Bloor, The Ethiopic Writing System: a Profile: Journal of the Simplified Spelling Society, V19, p30-36, 1995.

[60]. TranslationDirectory.com, writing system 2003-2010. Available at http://www.TranslationDirectory.com. (Accessed on February 2009).

[61]. Wang.K, Jin.J and Wang.Q. High Performance Chinese/English Mixed OCR with Character Level Language Identification. 10th International Conference on Document Analysis and Recognition (IEEE) 2009.

[62]. Wazu Japan’s. Gallery of Unicode fonts. U+ fonts Ethiopic. Available at www.wazu.jp. (Accessed on January 2010).

[63]. Worku Alemu. The Application of OCR Techniques to the Amharic Script; (Masters thesis) School of Information studies for Africa, Addis Ababa University, Addis Ababa, 1997.

[64]. Xujun.P, Srirangaraj.S, Venu.G,, Ramachandrula.S and Kiran.B. Markov Random Field Based Text Identification from Annotated Machine Printed Documents. International conference on Document analysis and recognition. IEEE, 2009.

[65]. Y. Assabie and J. Bigun. HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation. 10th International Conference on Document Analysis and Recognition, IEEE 2009.


[66]. Y. Assabie and J. Bigun, Structural and Syntactic Techniques for Recognition of Ethiopic Characters, In Proc. Eleventh Int’l Workshop on Structural and Syntactic Pattern Recognition, Hong Kong, 2006.

[67]. Y. Assabie and J. Bigun, Writer-independent offline recognition of handwritten Ethiopic characters: Proc. 11th ICFHR, Montreal, 2008.

[68]. Y. Assabie. Optical character recognition of Amharic text: an integrated approach; (MSc Thesis) School of Information studies for Africa, Addis Ababa University, Addis Ababa, 2001.

[69]. Zdravko Batzarov. Orbis Latinus: Alphabet at http://www.orbilat.com/ Languages/ Latin/Grammar/Latin-Alphabet.html:/ accessed date January 2010.


APPENDIX 1: SOURCE CODE OF SCRIPT IDENTIFIER

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: identify.m % % Author: Sertse Abebe (01-01-2011) % % Email: [email protected] % % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function identify2(op) global fl re H I stats count% declaring global variables if nargin == 0 % if no input argument, draw the GUI op = 0; end width = 800;%the width for GUI height = 600;%the height for GUI switch op %------0.0 case 0 % Draw figure % count = 0; s = get(0,'ScreenSize'); % ------0.1 % ------FIGURE & MENUS ------H.fig = figure('Position',[(s(3)-width)/2 (s(4)-height)/2 width height],...%figure hight and property 'NumberTitle','off',... 'MenuBar','none',...wavemenu 'Color',[.0 .0 .0],... 'Name','Script Identifier');

H.menu(1) = uimenu(H.fig,'Label','&Image'); H.menu(2) = uimenu(H.menu(1),'Label','&Open',... 'Callback','identify2(1)');

% ------0.2 % ------IMAGE FRAME ------

H.ax(1) = axes('position',[350/width 25/height 0.5+25/width 1-75/height],... 'XTick',[],'YTick',[],'Box','on');

uicontrol('Style','text',... % 'BackgroundColor',[.08 .8 .8],... 'Units','normalized',... 'Position',[350/width (height-30)/height 0.5+25/width 20/height],... 'String','Mixed Amaharic and English Scanned image',... 'HorizontalAlignment','center',... 'FontSize',13);

%Toolbox Charactersization uicontrol('Style','frame',... 'Units','normalized',... 'Position',[20/width 25/height 300/width 571/height]); %

uicontrol('Style','text',... 'Units','normalized',...


'Position',[50/width (height-65)/height 50/width 30/height],... 'String','Label',... 'HorizontalAlignment','left',... 'FontSize',10); H.edit(1) = uicontrol('Style','edit',... 'Units','normalized',... 'Position',[100/width (height-65)/height 180/width 30/height],... 'HorizontalAlignment','left',... 'FontSize',10,... 'CallBack','identify2(5)'); uicontrol('Style','text',... 'Units','normalized',... 'Position',[22/width (height-105)/height 295/width 30/height],...

'String','======',... 'HorizontalAlignment','center',... 'FontSize',8); H.button(1) = uicontrol('Style','pushbutton',... % Segment button 'Units','normalized', ... 'Position',[85/width (height-130)/height 200/width 30/height],... 'FontSize',10,... 'String','Process Training_Dataset',... 'CallBack','identify2(2)'); H.button(4) = uicontrol('Style','pushbutton',... % Determine Character Features 'Units','normalized', ... 'Position',[140/width (height-180)/height 80/width 30/height],... 'FontSize',10,... 'String','Train',... 'CallBack','identify2(3)'); uicontrol('Style','text',... 'Units','normalized',... 'Position',[22/width (height-215)/height 295/width 30/height],...

'String','======',... 'HorizontalAlignment','center',... 'FontSize',8);

H.button(9) = uicontrol('Style','pushbutton',... % (Previous) Character navigation 'Units','normalized', ... 'Position',[85/width (height-230)/height 200/width 25/height],... 'FontSize',10,... 'String','Process_Test_Dataset',... 'CallBack','identify2(4)'); H.button(3) = uicontrol('Style','pushbutton',... % (Previous) Character navigation 'Units','normalized', ... 'Position',[140/width (height-280)/height 80/width 30/height],... 'FontSize',10,... 'String','Test',... 'CallBack','identify2(5)'); uicontrol('Style','text',... 'Units','normalized',... 'Position',[22/width (height-315)/height 295/width 30/height],...

'String','======',... 'HorizontalAlignment','center',...


'FontSize',8);

H.button(3) = uicontrol('Style','pushbutton',... % (Previous) Character navigation 'Units','normalized', ... 'Position',[140/width (height-430)/height 80/width 30/height],... 'FontSize',10,... 'String','Predict',... 'CallBack','identify2(6)'); %______case 1 %======Read and display an image======%______[filename,pathname] = uigetfile({'*.tif ;*.jpg','Image files'});

if filename ~= 0

% Clear the old data clear global I stats count global I stats count set(H.edit(1),'String','')

% Read the image and convert to intensity [I.Image,map] = imread([pathname filename]); cd(pathname)

if ~isempty(map) I.Image = ind2gray(I.Image,map); I.Image = gray2ind(I.Image,256) else if size(I.Image,3)==3 I.Image = rgb2gray(I.Image); end end

% Resize the first axes accordingly w_max = 380; h_max = 680; [h,w] = size(I.Image); [im_width,im_height] = fit_im(w_max,h_max,w,h); left = 370 + (w_max - im_width)/2; bott = 1+(h_max - im_height)/2;

% Display the image in the first axes colormap(gray(256)) axes(H.ax(1)) set(H.ax(1),'position',[left/width bott/height im_width/width im_height/height]) image(I.Image) set(H.ax(1),'XTick',[],'YTick',[])

end

%______case 2 % -----reate binary image; label and segment it; normalizing features extraction , transformation--- %______%LABEL CAPUTERING label = get(H.edit(1),'String')


if label=='A' label=-1; elseif label=='E'; label=1; else error('Please Provide the correct lebel A or E ') end %======%NOISE REMOVAL, BINARIZTION denoised = wiener2(I.Image); % figure, imshow(denoised); T = graythresh(I.Image) Binarized1 = im2bw(I.Image,T); Binarized2 =~Binarized1; Binarized3 = bwareaopen(Binarized2,10); % figure, imshow(Binarized3); Binarized=~Binarized3; %======%LINE SEGMENTATION rowwise_sum_Binarized=sum(~Binarized,2); [m,n]=size(Binarized); minrowwise_sum_Binarized=min(rowwise_sum_Binarized); nonZerorowwise_sum_Binarized=[];

% set(edit_box,'String',[]) for i= 1:m, if rowwise_sum_Binarized(i)~=minrowwise_sum_Binarized nonZerorowwise_sum_Binarized=[nonZerorowwise_sum_Binarized;i]; end end LineimgVector=[]; l=1; LineStructure=struct('lineno',{0},'LineimgVector',LineimgVector); LineStructure(1).lineno=1; [r,c]=size(nonZerorowwise_sum_Binarized); for i= 1:r, if i==r & nonZerorowwise_sum_Binarized(i)-nonZerorowwise_sum_Binarized(i-1)==1 LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; elseif nonZerorowwise_sum_Binarized(i+1)-nonZerorowwise_sum_Binarized(i)==1; LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; else LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; % figure, imshow(LineimgVector); % pause(0.5); l=l+1; LineStructure(l).lineno=l; LineimgVector=[]; end end n=1 ldl=LineStructure(1).lineno;


ld=double(ldl); % figure, imshow(LineimgVector)

%======%WORD SEGMENTATION, SIZE NORMALIZATION, THINNING, FEATURE EXTRACTION Feature_words=[]; label_value=[]; label_valuel=[]; feature_value=[]; feature_valuel=[]; CC=[]; classlabel=[]; for i=1:l Lineimg=~LineStructure(i).LineimgVector; xx=((size(Lineimg,1)/6)+2); Columnwise_Sum_Lineimg=sum(Lineimg,1); [m,n]=size(Lineimg); min_Columnwise_Sum_Lineimg=min(Columnwise_Sum_Lineimg); none_Zero_Columnwise_Sum_Lineimg=[]; for j= 1:n, if Columnwise_Sum_Lineimg(j)~=min_Columnwise_Sum_Lineimg none_Zero_Columnwise_Sum_Lineimg=[none_Zero_Columnwise_Sum_Lineimg;j]; end end none_Zero_Columnwise_Sum_ordered=none_Zero_Columnwise_Sum_Lineimg.';%to order the vector into nonzero line img WordimgVector=[]; w=1; WordStructure=struct('Wordno',{0},'WordimgVector',WordimgVector); WordStructure(1).Wordno=1; [r,c]=size(none_Zero_Columnwise_Sum_ordered); feature_value=[]; feature_valuel=[]; for k= 1:c if k==c & none_Zero_Columnwise_Sum_ordered(k)-none_Zero_Columnwise_Sum_ordered(k-1)<=xx

WordimgVector=[WordimgVector;Lineimg(:,(none_Zero_Columnwise_Sum_ordered(k):none_Zero_Colu mnwise_Sum_ordered(k-1))).']; WordStructure(w).WordimgVector=WordimgVector;

elseif none_Zero_Columnwise_Sum_ordered(k+1)-none_Zero_Columnwise_Sum_ordered(k)<=xx;

WordimgVector=[WordimgVector;Lineimg(:,(none_Zero_Columnwise_Sum_ordered(k):none_Zero_Colu mnwise_Sum_ordered(k+1))).']; WordStructure(w).WordimgVector=WordimgVector;

else WordimgVector=[WordimgVector;Lineimg(:,none_Zero_Columnwise_Sum_ordered(k)).']; WordStructure(w).WordimgVector=WordimgVector; word_height=size(WordimgVector.',1); resizing_coff=80/word_height; resized_word = imresize(WordimgVector.',resizing_coff); h= fspecial('gaussian'); smoothed_word=imfilter(resized_word.', h); thinned=bwmorph(smoothed_word, 'thin', 'inf'); nontext_removed_word=removenontext(thinned.');


figure, imshow(~nontext_removed_word); feature_value=[feature_value;FindFeatures(nontext_removed_word)]; label_value=[label_value;label];% w=w+1; WordStructure(w).Wordno=1; word1=[]; WordimgVector=[]; end end word_heightl=size(WordimgVector.',1); resizing_coffl=80/word_heightl; resized_wordl = imresize(WordimgVector.',resizing_coffl); h= fspecial('gaussian'); smoothed_wordl=imfilter(resized_wordl.', h); thinnedl=bwmorph(smoothed_wordl, 'thin', 'inf'); nontext_removed_wordl=removenontext(thinnedl.'); feature_valuel=[feature_valuel;FindFeatures(nontext_removed_wordl)]; label_valuel=[label_valuel;label];% wdl=WordStructure(1).Wordno; wd=double(ldl); Feature_words=[Feature_words;feature_value;feature_valuel];

end classlabel=[classlabel;label_value;label_valuel]; %======%TRANSFORMIN FEATURE VALUE TO TRAIN DATASET Lables= classlabel.' Features=Feature_words.' % Saveddataset=load('train_dataset','Features', 'Lables'); % Savedlabel=[Saveddataset.Lables]; % Lables=horzcat(Savedlabel,Lables); % SavedFeature=[Saveddataset.Features]; % Features=horzcat(SavedFeature,Features); % save('train_dataset','Lables', 'Features');

%______case 3%______TRAINING______%______Load train_dataset Samples=Features; Labels=Lables; size(Labels) size(Samples) Gamma = 40; [AlphaY, SVs, Bias, Parameters, nSV, nLabel]=RbfSVC(Samples, Labels, Gamma); % Save the constructed nonlinear SVM classifier save SVMClassifier1 AlphaY SVs Bias Parameters nSV nLabel; %______case 4 %______Create binary image; label and segment it; find features for test dataset______%______


%LABEL CAPUTERING label = get(H.edit(1),'String') if label=='A' label=-1; elseif label=='E'; label=1; else error('Please Provide the correct lebel A or E ') end %======%NOISE REMOVAL, BINARIZTION denoised = wiener2(I.Image); % figure, imshow(denoised); T = graythresh(I.Image) Binarized1 = im2bw(I.Image,T); Binarized2 =~Binarized1; Binarized3 = bwareaopen(Binarized2,10); % figure, imshow(Binarized3); Binarized=~Binarized3; %======%LINE SEGMENTATION rowwise_sum_Binarized=sum(~Binarized,2); [m,n]=size(Binarized); minrowwise_sum_Binarized=min(rowwise_sum_Binarized); nonZerorowwise_sum_Binarized=[];

% set(edit_box,'String',[]) for i= 1:m, if rowwise_sum_Binarized(i)~=minrowwise_sum_Binarized nonZerorowwise_sum_Binarized=[nonZerorowwise_sum_Binarized;i]; end end LineimgVector=[]; l=1; LineStructure=struct('lineno',{0},'LineimgVector',LineimgVector); LineStructure(1).lineno=1; [r,c]=size(nonZerorowwise_sum_Binarized); for i= 1:r, if i==r & nonZerorowwise_sum_Binarized(i)-nonZerorowwise_sum_Binarized(i-1)==1 LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; elseif nonZerorowwise_sum_Binarized(i+1)-nonZerorowwise_sum_Binarized(i)==1; LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; else LineimgVector=[LineimgVector;Binarized(nonZerorowwise_sum_Binarized(i),:)]; LineStructure(l).LineimgVector=LineimgVector; % figure, imshow(LineimgVector); % pause(0.5); l=l+1; LineStructure(l).lineno=l; LineimgVector=[]; end end


n=1 ldl=LineStructure(1).lineno; ld=double(ldl); %======%WORD SEGMENTATION, SIZE NORMALIZATION, THINNING, FEATURE EXTRACTION % figure, imshow(LineimgVector) % pause(0.5); Feature_words=[]; label_value=[]; label_valuel=[]; feature_value=[]; feature_valuel=[]; CC=[]; classlabel=[]; for i=1:l Lineimg=~LineStructure(i).LineimgVector; xx=((size(Lineimg,1)/6)+2); Columnwise_Sum_Lineimg=sum(Lineimg,1); [m,n]=size(Lineimg); min_Columnwise_Sum_Lineimg=min(Columnwise_Sum_Lineimg); none_Zero_Columnwise_Sum_Lineimg=[]; for j= 1:n, if Columnwise_Sum_Lineimg(j)~=min_Columnwise_Sum_Lineimg none_Zero_Columnwise_Sum_Lineimg=[none_Zero_Columnwise_Sum_Lineimg;j]; end end none_Zero_Columnwise_Sum_ordered=none_Zero_Columnwise_Sum_Lineimg.';%to order the vector into nonzero line img WordimgVector=[]; w=1; WordStructure=struct('Wordno',{0},'WordimgVector',WordimgVector); WordStructure(1).Wordno=1; [r,c]=size(none_Zero_Columnwise_Sum_ordered); feature_value=[]; feature_valuel=[]; for k= 1:c if k==c & none_Zero_Columnwise_Sum_ordered(k)-none_Zero_Columnwise_Sum_ordered(k-1)<=xx

            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k):none_Zero_Columnwise_Sum_ordered(k-1)).'];
            WordStructure(w).WordimgVector = WordimgVector;
        elseif none_Zero_Columnwise_Sum_ordered(k+1) - none_Zero_Columnwise_Sum_ordered(k) <= xx
            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k):none_Zero_Columnwise_Sum_ordered(k+1)).'];
            WordStructure(w).WordimgVector = WordimgVector;
        else   % gap wider than xx: the current word is complete
            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k)).'];
            WordStructure(w).WordimgVector = WordimgVector;
            % size normalization to a fixed height of 80 pixels, then smoothing
            word_height = size(WordimgVector.', 1);
            resizing_coff = 80/word_height;
            resized_word = imresize(WordimgVector.', resizing_coff);
            h = fspecial('gaussian');
            smoothed_word = imfilter(resized_word.', h);


            thinned = bwmorph(smoothed_word, 'thin', Inf);   % thin until convergence
            nontext_removed_word = removenontext(thinned.');
            figure, imshow(~nontext_removed_word);
            feature_value = [feature_value; FindFeatures(nontext_removed_word)];
            label_value = [label_value; label];
            w = w + 1;
            WordStructure(w).Wordno = 1;
            word1 = [];
            WordimgVector = [];
        end
    end
    % normalize, thin and extract features from the last word of the line
    word_heightl = size(WordimgVector.', 1);
    resizing_coffl = 80/word_heightl;
    resized_wordl = imresize(WordimgVector.', resizing_coffl);
    h = fspecial('gaussian');
    smoothed_wordl = imfilter(resized_wordl.', h);
    thinnedl = bwmorph(smoothed_wordl, 'thin', Inf);
    nontext_removed_wordl = removenontext(thinnedl.');
    feature_valuel = [feature_valuel; FindFeatures(nontext_removed_wordl)];
    label_valuel = [label_valuel; label];
    wdl = WordStructure(1).Wordno;
    wd = double(ldl);
    Feature_words = [Feature_words; feature_value; feature_valuel];

end
classlabel = [classlabel; label_value; label_valuel];
%==================================================================
% TRANSFORMING FEATURE VALUES INTO THE TEST DATASET
Lables = classlabel.'
Features = Feature_words.'
% Saveddataset = load('test.mat', 'Features', 'Lables');
% Savedlabel = [Saveddataset.Lables];
% Lables = horzcat(Savedlabel, Lables);
% SavedFeature = [Saveddataset.Features];
% Features = horzcat(SavedFeature, Features);
save('test.mat', 'Features', 'Lables');

%__________________________________________________________________
case 5 %______________________ TESTING ___________________________
%__________________________________________________________________
load SVMClassifier1
load test.mat
Samples = Features;
Labels = Lables;
% Test the constructed SVM classifier using the test data
% begin testing ...
[ClassRate, DecisionValue, Ns, ConfMatrix, PreLabels] = SVMTest(Samples, Labels, AlphaY, SVs, Bias, Parameters, nSV, nLabel);
% end of the testing
% The resultant confusion matrix of this two-class classification problem:
ConfMatrix
% the accuracy rate of the system
accuracy = ClassRate*100;
accuracy
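% Cross-check (illustrative, not part of the original listing): since SVMTest
% already returns the confusion matrix, the reported accuracy can be verified
% from ConfMatrix directly, assuming correct decisions lie on its diagonal:
%
%   accuracy_check = 100*sum(diag(ConfMatrix))/sum(ConfMatrix(:));
%   per_class = diag(ConfMatrix)./sum(ConfMatrix, 2);   % per-class rates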


%__________________________________________________________________
case 6 %_____________________ PREDICTING _________________________
%__________________________________________________________________
% NOISE REMOVAL, BINARIZATION
denoised = wiener2(I.Image);
% figure, imshow(denoised);
T = graythresh(denoised)
Binarized1 = im2bw(denoised, T);
Binarized2 = ~Binarized1;
Binarized3 = bwareaopen(Binarized2, 10);
% figure, imshow(Binarized3);
Binarized = ~Binarized3;
%==================================================================
% LINE SEGMENTATION
rowwise_sum_Binarized = sum(~Binarized, 2);
[m, n] = size(Binarized);
minrowwise_sum_Binarized = min(rowwise_sum_Binarized);
nonZerorowwise_sum_Binarized = [];

% set(edit_box,'String',[])
for i = 1:m
    if rowwise_sum_Binarized(i) ~= minrowwise_sum_Binarized
        nonZerorowwise_sum_Binarized = [nonZerorowwise_sum_Binarized; i];
    end
end
LineimgVector = [];
l = 1;
LineStructure = struct('lineno', {0}, 'LineimgVector', LineimgVector);
LineStructure(1).lineno = 1;
[r, c] = size(nonZerorowwise_sum_Binarized);
for i = 1:r
    if i == r & nonZerorowwise_sum_Binarized(i) - nonZerorowwise_sum_Binarized(i-1) == 1
        LineimgVector = [LineimgVector; Binarized(nonZerorowwise_sum_Binarized(i), :)];
        LineStructure(l).LineimgVector = LineimgVector;
    elseif nonZerorowwise_sum_Binarized(i+1) - nonZerorowwise_sum_Binarized(i) == 1
        LineimgVector = [LineimgVector; Binarized(nonZerorowwise_sum_Binarized(i), :)];
        LineStructure(l).LineimgVector = LineimgVector;
    else
        LineimgVector = [LineimgVector; Binarized(nonZerorowwise_sum_Binarized(i), :)];
        LineStructure(l).LineimgVector = LineimgVector;
        % figure, imshow(LineimgVector);
        % pause(0.5);
        l = l + 1;
        LineStructure(l).lineno = l;
        LineimgVector = [];
    end
end
n = 1
ldl = LineStructure(1).lineno;
ld = double(ldl);
%==================================================================
% WORD SEGMENTATION, NORMALIZATION, PREDICTION
for i = 1:l
    Lineimg = ~LineStructure(i).LineimgVector;
    xx = (size(Lineimg,1)/6) + 2;
    Columnwise_Sum_Lineimg = sum(Lineimg, 1);


    [m, n] = size(Lineimg);
    min_Columnwise_Sum_Lineimg = min(Columnwise_Sum_Lineimg);
    none_Zero_Columnwise_Sum_Lineimg = [];
    for j = 1:n
        if Columnwise_Sum_Lineimg(j) ~= min_Columnwise_Sum_Lineimg
            none_Zero_Columnwise_Sum_Lineimg = [none_Zero_Columnwise_Sum_Lineimg; j];
        end
    end
    none_Zero_Columnwise_Sum_ordered = none_Zero_Columnwise_Sum_Lineimg.'; % order the vector of non-blank column indices
    WordimgVector = [];
    w = 1;
    WordStructure = struct('Wordno', {0}, 'WordimgVector', WordimgVector);
    WordStructure(1).Wordno = 1;
    [r, c] = size(none_Zero_Columnwise_Sum_ordered);
    for k = 1:c
        if k == c & none_Zero_Columnwise_Sum_ordered(k) - none_Zero_Columnwise_Sum_ordered(k-1) <= xx

            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k):none_Zero_Columnwise_Sum_ordered(k-1)).'];
            WordStructure(w).WordimgVector = WordimgVector;
        elseif none_Zero_Columnwise_Sum_ordered(k+1) - none_Zero_Columnwise_Sum_ordered(k) <= xx
            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k):none_Zero_Columnwise_Sum_ordered(k+1)).'];
            WordStructure(w).WordimgVector = WordimgVector;
        else   % gap wider than xx: normalize and classify the completed word
            WordimgVector = [WordimgVector; Lineimg(:, none_Zero_Columnwise_Sum_ordered(k)).'];
            WordStructure(w).WordimgVector = WordimgVector;
            word_height = size(WordimgVector.', 1);
            resizing_coff = 80/word_height;
            resized_word = imresize(WordimgVector.', resizing_coff);
            h = fspecial('gaussian');
            smoothed_word = imfilter(resized_word.', h);
            thinned = bwmorph(smoothed_word, 'thin', Inf);
            nontext_removed_word = removenontext(thinned.');
            Features = FindFeatures(nontext_removed_word);
            Samples = Features.';
            load SVMClassifier1;
            [Labels, DecisionValue] = SVMClass(Samples, AlphaY, SVs, Bias, Parameters, nSV, nLabel);
            if Labels == -1
                figure, imshow(~WordimgVector.');
                title('AMHARIC word Images')
            elseif Labels == 1
                figure, imshow(~WordimgVector.');
                title('ENGLISH word Images')
            end

            w = w + 1;
            WordStructure(w).Wordno = 1;
            word1 = [];
            WordimgVector = [];
        end
    end


    % normalize and classify the last word of the line
    word_heightl = size(WordimgVector.', 1);
    resizing_coffl = 80/word_heightl;
    resized_wordl = imresize(WordimgVector.', resizing_coffl);
    h = fspecial('gaussian');
    smoothed_wordl = imfilter(resized_wordl.', h);
    thinnedl = bwmorph(smoothed_wordl, 'thin', Inf);
    nontext_removed_wordl = removenontext(thinnedl.');
    Features = FindFeatures(nontext_removed_wordl);
    Samples = Features.';
    load SVMClassifier1
    [Labels, DecisionValue] = SVMClass(Samples, AlphaY, SVs, Bias, Parameters, nSV, nLabel);
    if Labels == -1
        figure, imshow(~WordimgVector.');
        title('AMHARIC word Images')
    elseif Labels == 1
        figure, imshow(~WordimgVector.');
        title('ENGLISH word Images')
    end
    wdl = WordStructure(1).Wordno;
    wd = double(ldl);
end
end
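% Reference sketch (illustrative, not part of the original listing): the
% prediction path above can also be exercised on a single, already cropped
% word image. Here word.png is a hypothetical dark-text-on-white crop and
% SVMClassifier1.mat is the classifier saved in case 3; the steps mirror the
% pipeline above:
%
%   img = imread('word.png');
%   bw  = ~im2bw(img, graythresh(img));          % binarize, 1 = ink
%   bw  = imresize(bw, 80/size(bw,1));           % height-normalize to 80 px
%   bw  = imfilter(bw, fspecial('gaussian'));    % light smoothing, as above
%   bw  = bwmorph(bw, 'thin', Inf);              % thin strokes, as above
%   bw  = removenontext(bw);                     % trim blank rows
%   load SVMClassifier1
%   [lab, dv] = SVMClass(FindFeatures(bw).', AlphaY, SVs, Bias, Parameters, nSV, nLabel);
%   if lab == -1, disp('Amharic'), else disp('English'), end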

%__________________________________________________________________
%_______________ Resize the image accordingly _____________________
%__________________________________________________________________
function [im_width, im_height] = fit_im(w_max, h_max, w, h)
w_ratio = w/w_max;
h_ratio = h/h_max;
if (w_ratio > 1) | (h_ratio > 1)
    if w_ratio > h_ratio
        im_width = w_max;
        im_height = h/w_ratio;
    else
        im_height = h_max;
        im_width = w/h_ratio;
    end
else
    im_width = w;
    im_height = h;
end

%__________________________________________________________________
%___________ Remove non-text region from a word image _____________
%__________________________________________________________________
function cuttedword = removenontext(im)
Image = double(im);
imgSum = sum(Image, 2);   % row-wise ink sums
[m, n] = size(Image);
MaxSum = min(imgSum);     % the blank-row value
MaxSumVector = [];
for i = 1:m
    if imgSum(i) ~= MaxSum
        MaxSumVector = [MaxSumVector; i];


    end
end
cuttedword = [];
[r, c] = size(MaxSumVector);
for i = 1:r
    cuttedword = [cuttedword; Image(MaxSumVector(i), :)];
end
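% Illustrative check (not part of the original listing): removenontext keeps
% only the rows that contain ink, trimming blank margins from a word image.
%
%   toy = [0 0 0 0;
%          0 1 1 0;
%          0 1 0 0;
%          0 0 0 0];
%   trimmed = removenontext(toy);   % keeps rows 2 and 3 only
%   size(trimmed)                   % returns [2 4]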


APPENDIX 2: SOURCE CODE OF FEATURE EXTRACTION

% _________________________________________________________________
% ----------------------------- C.0 -------------------------------
function j = FindFeatures(wordimage)
% Extract features from the word sub-images
%====== feature 1: ratio of the number of connected components to word width ======
stat = [];
[L, num] = bwlabel(wordimage);   % label the connected components
[l, r] = size(wordimage);
ratio = num/l;
%====== feature 2: Extent ======
doublewordimage = double(wordimage);
wordimageregionprop = regionprops(doublewordimage, 'all');
Extent = [wordimageregionprop.Extent];
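% Note (illustrative, not part of the original listing): regionprops' Extent
% is the region's area divided by the area of its bounding box, i.e. how
% densely the word fills its bounding box. Equivalently:
%
%   p = regionprops(doublewordimage, 'Area', 'BoundingBox');
%   p(1).Area/(p(1).BoundingBox(3)*p(1).BoundingBox(4))   % equals Extent(1)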

%______ features 3, 4, 5 and 6: ratios of the maximum horizontal projection value in three different regions ______

HorizontalProjection = sum(doublewordimage, 2);
region1hp = [];
region2hp = [];
region3hp = [];
Z = size(HorizontalProjection, 1);
if Z == 1   % degenerate one-row image: all three regions coincide
    region1hp = [region1hp; HorizontalProjection(1)];
    max1 = max(region1hp);
    ratio1 = max1/1000;
    region2hp = [region2hp; HorizontalProjection(1)];
    max2 = max(region2hp);
    ratio2 = max2/1000;
    region3hp = [region3hp; HorizontalProjection(1)];
    max3 = max(region3hp);
    ratio3 = max3/1000;
else
    % region boundaries as fixed fractions of the projection length Z
    startr1 = 1;
    endr1 = 2;
    X = ((2.75*Z)/4) + 0.5;
    startr2 = int16(X);
    Y = ((3.25*Z)/4) + 0.5;
    endr2 = int16(Y);
    V = ((3.8*Z)/4) + 0.5;
    startr3 = int16(V);

    for i = startr1:endr1
        region1hp = [region1hp; HorizontalProjection(i)];
    end
    max1 = max(region1hp);
    ratio1 = max1/1000;
    for j = startr2:endr2
        region2hp = [region2hp; HorizontalProjection(j)];
    end


    max2 = max(region2hp);
    ratio2 = max2/1000;
    for e = startr3:Z
        region3hp = [region3hp; HorizontalProjection(e)];
    end
    max3 = max(region3hp);
    ratio3 = max3/1000;

end
ALLRATIO = ratio2 + ratio3;

stat = [stat; Extent; ratio; ratio1; ratio2; ratio3; ALLRATIO];
j = stat.';   % return the features as a row vector
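% Illustrative call (not part of the original listing): FindFeatures returns
% a six-element row vector per word image, in the order
% [Extent, ratio, ratio1, ratio2, ratio3, ALLRATIO]. For a toy thinned word:
%
%   toy = logical([0 1 1 0 1 0;
%                  0 1 0 0 1 0;
%                  0 1 1 0 1 0;
%                  0 0 0 0 0 0]);
%   f = FindFeatures(toy);
%   size(f)    % 1-by-6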
