Addis Ababa University School of Graduate Studies School of Information Science

ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES
SCHOOL OF INFORMATION SCIENCE

BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF AMHARIC AND ENGLISH PRINTED DOCUMENT

SERTSE ABEBE

JUNE, 2011

ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES
SCHOOL OF INFORMATION SCIENCE

BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF MIXED AMHARIC AND ENGLISH PRINTED DOCUMENT

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

By SERTSE ABEBE

JUNE, 2011

Name and signature of Members of the Examining Board

Name                         Title          Signature                  Date
___________________________  Chairperson    _________________________  ____________
__________________________   Advisor(s)     _________________________  ____________
__________________________   Examiner       _________________________  _____________

Declaration

I declare that the thesis is my original work and has not been presented for a degree in any other university.

_________________
Date

This thesis has been submitted for examination with my approval as university advisor.

_________________
Advisor

Acknowledgment

First of all, I would like to thank my advisor, Dr. Dereje Teferi, for his constructive comments and advice. My thanks also go to Bahir Dar University, which provided me the opportunity to enroll in this program. I also want to acknowledge all the staff of the Addis Ababa University School of Information Science for their cooperation. Finally, I want to acknowledge my wife, Simegnat Abuhay, and my beloved son, Natan Sertse, for all their support and love during my stay in the program.
Table of Contents

LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER ONE
INTRODUCTION
  1.1. Background
  1.2. Statement of the Problem
  1.3. Objective of the Study
    1.3.1. General objective
    1.3.2. Specific objectives
  1.4. Significance of the study
  1.5. Scope and limitation of the study
    1.5.1. Scope of the study
    1.5.2. Limitation of the study
  1.6. Methodology
    1.6.1. Data collection
    1.6.2. Designing and implementation tools
    1.6.3. Literature review
  1.7. Organization of the Thesis

CHAPTER TWO
WRITING SYSTEMS
  2.1. Overview of Writing System
    2.1.1. Families of writing system
  2.2. English Language Script
    2.2.1. Origin and evolution of English language script
    2.2.2. Characteristics of Roman scripts
  2.3. Amharic language Scripts
    2.3.1. Evolution of Ethiopic script
    2.3.2. Features of Ethiopic script

CHAPTER THREE
OPTICAL CHARACTER RECOGNITION AND SCRIPT IDENTIFICATION
  3.1. Optical Character Recognition (OCR)
    3.1.1. Multilingual Script OCR
  3.2. Script Identification
  3.3. Tasks in Multilanguage Script Identification
    3.3.1. Preprocessing
      3.3.1.1. Digitization
      3.3.1.2. Skew Detection and Correction
      3.3.1.3. Noise Removal
      3.3.1.4. Binarization and Threshold
      3.3.1.5. Normalization
      3.3.1.7. Thinning
    3.3.4. Feature extraction
    3.3.5. Classification

CHAPTER FOUR
DESIGN AND DEVELOPMENT
  4.1. Review of Proposed Method
    4.2.1. Digitization
    4.2.2. Noise Removal
    4.2.3. Threshold and Binarization
    4.2.1. Segmentation
    4.2.2. Size Normalization and thinning
    4.2.3. Feature Extraction
    4.2.4. Classification
      4.2.4.1. Training
      4.2.4.2. Testing
  4.3. Data set preparation and experiment
    4.3.1. Dataset preparation
    4.3.2. Experiment
  4.4. Results and Discussion
