REPUBLIC OF TURKEY
FIRAT UNIVERSITY
GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCE

AN ANDROID BASED RECEIPT TRACKER SYSTEM USING OPTICAL CHARACTER RECOGNITION

KAREZ ABDULWAHHAB HAMAD

Master Thesis
Department: Software Engineering
Supervisor: Asst. Prof. Dr. Mehmet KAYA

JULY – 2017

ACKNOWLEDGEMENTS

First, thanks to ALLAH, the Almighty, for granting me the will and strength with which this master thesis was accomplished; may it be the first step toward much greater scientific research. I would like to express my thankfulness and appreciation to my supervisor, Asst. Prof. Dr. Mehmet KAYA, for his guidance, assistance, encouragement, wise suggestions and valuable advice, which made the completion of this master thesis possible. Last but not least, I want to express my special thankfulness to my lovely parents, and special gratitude to all members of my family and friends. Special thanks to my lovely uncle, Assoc. Prof. Dr. Yadgar Rasool, who helped and encouraged me a great deal during my study.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABSTRACT
ÖZET
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1. Background
1.2. Problem Statement
1.3. General Aims and Objectives
1.4. Thesis Layout
2. THEORETICAL TECHNIQUES AND BACKGROUND OF OCR
2.1. OCR Challenges
2.1.1. Complexity of scene
2.1.2. Uneven lighting problem
2.1.3. Skewness problem
2.1.4. Un-focus and deterioration
2.1.5. Aspect ratios
2.1.6. Tilting problem
2.1.7. Fonts
2.1.8. Multilingual environments
2.1.9. Warping problem
2.2. OCR Applications
2.2.1. Hand-writing recognition applications
2.2.2. Healthcare applications
2.2.3. Financial tracking applications
2.2.4. Legal industry
2.2.5. Banking application
2.2.6. Captcha breaking application
2.2.7. Automatic number plate recognition application (ANPR)
2.3. OCR Phases
2.3.1. Image pre-processing phase
2.3.2. Segmentation phase
2.3.3. Normalization phase
2.3.4. Feature extraction phase
2.3.5. Classification phase
2.3.6. Post-processing phase
2.4. OCR Engines
2.4.1. GOCR engine
2.4.2. engine
2.4.3. OCRopus
2.4.4. Tesseract OCR engine
3. PROPOSED TECHNIQUES
3.1. System Overview
3.1.1. Receipt region detection
3.1.2. Receipt image pre-processing phase
3.1.3. Recognition phase
3.1.4. Regular expression (Regex) phase
3.1.5. Database phase
3.2. Implementation and Practical Work
3.3. System Screenshots
4. QUERIES AND EXPERIMENTAL RESULTS
4.1. User Queries
4.1.1. Spend analyzer
4.1.2. Receipt image discovering
4.1.3. Total money expended
4.1.4. Total money expended for a particular item
4.2. Experimental Outcomes
4.2.1. Capability metrics
4.2.2. Examination corpus
4.2.3. Fake receipt font experimental outcomes
4.2.4. Merchant copy font experimental outcomes
4.2.5. Evaluation of outcomes experienced
5. CONCLUSION AND FUTURE WORKS
6. REFERENCES
CURRICULUM VITA


ABSTRACT

AN ANDROID BASED RECEIPT TRACKER SYSTEM USING OPTICAL CHARACTER RECOGNITION

As demand for designing and implementing mobile apps has deepened, innovation in desktop OCR applications has shifted toward mobile OCR applications. Optical Character Recognition (OCR) is the technology that converts the text of handwritten images, printed-text images or scanned images into editable text for further analysis and processing. In this research, we suggest an Android OCR application for automatically extracting and recognizing text on receipt images. The research presents the main techniques proposed for better performing OCR on receipt images acquired through the cameras of hand-held devices, in order to reach a powerful and efficient system for easily tracking daily marketing receipts. Of course, receipt images have their own specifics, so OCR applications must be trained for this kind of image, or else OCR technology cannot recognize them well. Unusual text fonts, very small font sizes, and compressed characters, words and lines are the characteristics that most distinguish receipt images from other documents. The main aim of this research is to investigate whether OCR technology is feasible for an Android application that recognizes text on receipt images. In the recognition stage, we used the Tesseract OCR engine, an open-source OCR engine, to extract and recognize the text on receipt images. We showed that submitting receipt images directly to Tesseract, without applying the various techniques suggested in this research, produces poor outcomes: 58.06% word accuracy and 84.14% character accuracy. With all the suggested techniques applied, for two different fonts, the suggested Android application yielded 88.72% word accuracy and 96.61% character accuracy, with a time performance of 6.56 seconds.

Keywords: Receipt tracker system, OCR technology, Android Application, Tesseract OCR engine.


ÖZET

AN ANDROID BASED RECEIPT TRACKER SYSTEM USING OPTICAL CHARACTER RECOGNITION

As the need for innovative mobile applications has grown, Optical Character Recognition (OCR) systems have shifted from desktop environments to mobile platforms. OCR is a system that extracts the text of handwritten, printed or scanned documents and images into a digitally editable form, allowing further analysis and processing. In this study, an OCR Android application that automatically detects and extracts text from receipt images is proposed. The basic techniques required to effectively process daily receipt images acquired from the cameras of mobile devices with OCR techniques are investigated. Because receipt images have some characteristics of their own, OCR systems must be put through a training process specifically for these images in order to reach satisfactory results. Unusual font types and sizes and shortened words or sentences are some of the most important properties that distinguish receipt documents from other documents. The main aim of this study is to investigate whether OCR technology is suitable for use in an Android application for receipt tracking. In the recognition phase, the open-source Tesseract OCR library is used to detect and extract the text in receipt images. This study has shown that sending receipt images directly to the Tesseract library, without applying the techniques we propose, results in quite low accuracy values: 58.06% word accuracy and 84.14% character accuracy. When all the proposed techniques are applied, however, the Android application achieves 88.72% word accuracy and 96.61% character accuracy for the two example fonts. The processing time of the Android application for one image was computed as 6.56 seconds.

Keywords: Receipt tracking system, OCR technology, Android application, Tesseract OCR engine.


LIST OF FIGURES

Figure 1.1 A sample of a receipt image used in this research for testing the suggested Android application.
Figure 2.1 An image that has scene complexity and a complicated background.
Figure 2.2 Irregular illumination and shadow problem on an image.
Figure 2.3 An image that has a problem of skewing and its result after applying a de-skewing technique.
Figure 2.4 Two samples of images that have the problem of un-focus and deterioration.
Figure 2.5 Two samples of images that have various aspect ratios.
Figure 2.6 A sample of an image that has a problem of tilting.
Figure 2.7 Two samples of text fonts utilized for testing the suggested Android application.
Figure 2.8 Two samples of images that have the problem of bent text.
Figure 2.9 Result of performing the Canny edge detection algorithm on an image.
Figure 2.10 Results of performing a Gaussian filter on an image.
Figure 2.11 Intensity gradient calculation and edge direction finding process on an image.
Figure 2.12 The result of applying the non-maximum suppression technique on an image.
Figure 2.13 Final results produced by applying the Canny edge detection algorithm on an image.
Figure 2.14 Step-by-step processes adopted by the Tesseract OCR engine.
Figure 2.15 An image containing 7 spots and baselines identified by the baseline finding technique.
Figure 2.16 An example of a word that has different character pitches.
Figure 2.17 An example of a word with joined characters.
Figure 2.18 An example of broken characters in a word.
Figure 2.19 Broken and joined characters in a word can be recognized by the static classifier algorithm.
Figure 3.1 Step-by-step proposed techniques for the implemented receipt tracker application.
Figure 3.2 The result of performing the CED algorithm on a receipt image.
Figure 3.3 Receipt region shape detection.
Figure 3.4 The receipt region accurately identified and detected by using the Canny edge detection algorithm.
Figure 3.5 Image preprocessing algorithms adopted and applied in this research.
Figure 3.6 Result of performing the contrast method on a receipt image.
Figure 3.7 The result of performing the gray-scale function on a receipt image.
Figure 3.8 The result of performing the thresholding algorithm on a receipt image.
Figure 3.9 Results of performing a median filter on a receipt image.
Figure 3.10 Adding black pixels to the holes of characters by using the erosion algorithm.
Figure 3.11 Handling the skewing problem on a receipt image by using a de-skewing function.
Figure 3.12 An example of a training image used for generating a training file by Tesseract.
Figure 3.13 Editing a box file generated by Tesseract by using the JTessBoxEditor software.
Figure 3.14 Tesseract's list of ten page segmentation methods.
Figure 3.15 Proposed database ER-diagram in this research.
Figure 3.16 Structure of the proposed Android application in terms of client-server architecture.
Figure 3.17 Structure of the client side's step-by-step processes in the suggested Android application.
Figure 3.18 Structure of the server side's step-by-step processes in the suggested Android application.
Figure 3.19 Screenshots of the login and registration pages of the suggested Android application.
Figure 3.20 A screenshot of the home page of the application.
Figure 3.21 Screenshots of detecting the receipt region by the Canny edge detection algorithm.
Figure 3.22 Screenshots of submitting a receipt image to the server and showing the result.
Figure 3.23 A screenshot of showing a message to the users of the suggested Android application.
Figure 4.1 Screenshots of using the expend inspector query.
Figure 4.2 Screenshots of using the receipt image discovering query.
Figure 4.3 Screenshots of using the total money expended query.
Figure 4.4 Screenshots of using the total money expended for a particular item query.
Figure 4.5 Word and character rates histogram for fake receipt font in the first case.
Figure 4.6 Word and character rates histogram for fake receipt font in the second case.
Figure 4.7 Word and character rates histogram for fake receipt font in the third case.
Figure 4.8 Word and character rates histogram for fake receipt font in the fourth case.
Figure 4.9 Word and character rates histogram for merchant copy font in the first case.
Figure 4.10 Word and character rates histogram for merchant copy font in the second case.
Figure 4.11 Word and character rates histogram for merchant copy font in the third case.
Figure 4.12 Word and character rates histogram for merchant copy font in the fourth case.


LIST OF TABLES

Table 2.1 Main techniques or algorithms of image pre-processing with a brief discussion.

Table 2.2 Various segmentation techniques utilized and suggested by researchers, with their results.

Table 2.3 Various feature extraction techniques utilized and suggested by researchers, with their results.

Table 2.4 Neural network based OCR applications with the outcomes achieved.

Table 4.1 Outcomes obtained for the fake receipt text font in the first examination.

Table 4.2 Outcomes obtained for the fake receipt text font in the second examination.

Table 4.3 Outcomes obtained for the fake receipt text font in the third examination.

Table 4.4 Outcomes obtained for the fake receipt text font in the fourth examination.

Table 4.5 Outcomes obtained for the merchant copy text font in the first examination.

Table 4.6 Outcomes obtained for the merchant copy text font in the second examination.

Table 4.7 Outcomes obtained for the merchant copy text font in the third examination.

Table 4.8 Outcomes obtained for the merchant copy text font in the fourth examination.

Table 4.9 Average percentages of outcomes obtained for the two different fonts.


LIST OF ABBREVIATIONS

ANN : Artificial neural network
ANPR : Automatic number plate recognition
API : Application programming interface
CED : Canny edge detection
CPU : Central processing unit
ERD : Entity relationship diagram
GPL : General Public License
GUI : Graphical user interface
IDE : Integrated development environment
ISO : International Organization for Standardization
JRE : Java runtime environment
OCR : Optical character recognition
OpenCV : Open source computer vision
OS : Operating system
PSM : Page segmentation method
RAST : Recognition by Adaptive Subdivision of Transformation space
Regex : Regular expression
SDK : Software development kit
SQL : Structured Query Language
SVM : Support vector machine
TIFF : Tag Image File Format
UTF8 : Unicode Transformation Format-8

1. INTRODUCTION

The introduction chapter covers four subsections. The first section gives an overview of optical character recognition technology and explains why a study of an Android based receipt tracker system using optical character recognition is needed. The second section presents the problem statement of the study, the third section states the aims of the research, and the final section outlines the thesis layout.

1.1. Background

Nowadays there is huge demand for designing machines and tools that recognize and identify patterns, such as fingerprint recognition machines, speech recognition machines, optical character recognition machines and many other types. Effective and accurate implementation of such machines has only recently become a focus for researchers. Understanding how a particular problem is solved naturally is the starting point for designing and implementing effective and accurate pattern recognition machines [1].

In our day-to-day life, we often need to reprint texts or modify them in some way. However, in many cases an editable document containing the text is no longer available. For example, if a newspaper article was published 10 years ago, it is quite possible that the text does not exist in an editable document such as a Word or text file, so the only remaining choice is to retype the entire text, which is a very exhausting process if the text is large. The solution to this problem is optical character recognition [2]. Optical character recognition is a process that takes an image as input and extracts the text it contains. A user can take an image of the text they want to print, feed the image into optical character recognition (OCR) software, and the software will generate an editable text file, which can then be used to print or publish the required text.
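To make this image-in, text-out workflow concrete, the following minimal sketch uses the tess-two Android wrapper for the Tesseract engine (the engine adopted later in this thesis). The data path and image file are hypothetical placeholders, and the snippet assumes the tess-two dependency and an eng.traineddata file are already installed on the device.

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

import com.googlecode.tesseract.android.TessBaseAPI;

public class SimpleOcr {

    // Hypothetical path; a tessdata/eng.traineddata file must exist under it.
    private static final String DATA_PATH = "/sdcard/tesseract/";

    public static String recognize(String imagePath) {
        TessBaseAPI api = new TessBaseAPI();
        api.init(DATA_PATH, "eng");          // load the English training data
        Bitmap image = BitmapFactory.decodeFile(imagePath);
        api.setImage(image);                 // feed the image to the engine
        String text = api.getUTF8Text();     // run recognition
        api.end();                           // release native resources
        return text;                         // editable text, ready to save or print
    }
}
```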

Similarly, optical character recognition (OCR) is the technology that converts the text of handwritten images, printed-text images [3] or scanned images into editable text for further analysis and processing. The capability of machines to automatically recognize text in images can only be achieved with OCR technology. To propose an accurate OCR application, several challenges must be handled. In some cases only a small visible difference separates digits from letters: the letter "o" and the digit "0" look alike, a situation that is often difficult for OCR machines to resolve.

In the literature, researchers have applied OCR technology to different fields, such as recognizing text on license plates, on images taken of natural scenes [4], and on images obtained through scanners. The quality of the images and the text fonts of the documents are examples of current challenges in implementing powerful OCR applications. To apply OCR well to different kinds of images, several image processing algorithms and techniques must be investigated and studied during the design of applications based on optical character recognition technology.

Detecting and recognizing text on different types of documents with the human eye is a real-life analogue of optical character recognition technology [4], and it has similarities with the steps required for implementing a powerful OCR machine. The text of the document is seen and identified by the human eye, then the human mind processes the detected text and enables the human to understand it. Certainly, a human's ability is much greater than an OCR application's, but understanding how text recognition problems are solved in nature is useful. For example, if the text of a document is not clear to the human eye, the brain cannot understand it well; likewise, if the quality of an image is low, an OCR application cannot recognize it well.

In earlier decades, researchers focused on recognizing text in printed documents with a fixed size and a single font. More recently, researchers have focused on recognizing document text in several different fonts and sizes [5], and demand has shifted toward recognizing text in images obtained by mobile device cameras instead of images obtained by scanners. Several challenges related to images obtained through mobile device cameras must be handled to build an effective and powerful OCR application, so research in the field of optical character recognition continues.

OCR research studies have a great influence on pattern recognition applications such as face recognition, fingerprint recognition and iris recognition, which are used for security purposes such as criminal tracking. Recently, some systems have integrated OCR with new research topics such as automatic translation and voice commands, and these systems will play an important role in developing such new topics [6].

OCR technology has been deployed in several different ways: OCR on servers, OCR applications for mobile devices, OCR applications for desktops and so forth. Earlier, OS developers focused on implementing powerful operating systems for mainframes and desktops, but they are now focused on enhancing and maintaining operating systems for mobile devices. The operating systems currently available for mobile devices have encouraged mobile application developers to innovate and implement more kinds of useful applications. Optical character recognition technology is not far from these innovations: as demand for mobile apps has deepened, innovation in desktop OCR applications has shifted toward mobile OCR applications.

Currently, usage of mobile devices is at an all-time high. For exchanging data around the world, desktop applications and desktop internet access offered several approaches for connecting people, but neither can connect people anytime and anywhere. Those features can only be obtained through mobile device applications and the internet on mobile devices. The number of mobile device users reached 4.61 billion in 2016 [7], and internet usage on mobile devices now exceeds internet usage on desktops or laptops.

The Android operating system is an OS for hand-held devices currently used by many mobile devices, such as Huawei and Samsung devices. The Android platform is an efficient and powerful operating system platform designed and maintained by Google, which focuses continuously on improving and maintaining it.

Since innovation and demand for applications associated with optical character recognition technology have grown significantly, this research suggests the development of an Android application for automatically extracting and recognizing text on receipt images. This research presents some new techniques for better performing text recognition technology on receipt images captured through the cameras of hand-held devices, in order to reach a powerful and efficient system for tracking everyday receipts on hand-held devices.

Of course, receipt images have their own specifics, so OCR applications must be trained for this kind of image, or else OCR technology cannot recognize them well. Receipt images have some characteristics that differ from other documents and must be considered when implementing an OCR application for recognizing text on receipt images. Unusual text fonts, very small font sizes, and compressed characters, words and lines are the characteristics that most distinguish receipt images from other documents. Figure 1.1 is a sample of a receipt image used in this research for testing the suggested Android application.

Figure 1.1 A sample of a receipt image used in this research for testing the suggested Android application.


1.2. Problem Statement

Several issues consume time and make our lives more difficult if we want to save marketing receipts manually. Some of these problems are:

• Copying the text of a receipt into a file by hand takes a lot of time if the receipt contains a large amount of text, and some text may be missed during writing.

• Receipts are easily lost when many of them must be kept manually for a long period.

• Storing a large number of marketing receipts consumes physical space.

• Having to remember to gather and keep receipts is a source of worry.

• Storing a number of receipts manually might cause security risks.

• With a large number of receipts, manually calculating the total money spent over a period of time is a hard task.

• Manually finding information in a large number of receipts, such as the purchased items, their prices and the names of the markets, is not feasible.

• Receipts may be destroyed or become unreadable when stored for a long time.

• When people travel, they must make room in their bags for saving and holding marketing receipts.

1.3. General Aims and Objectives

Assume that you purchased a dozen eggs from a market to make a delicious meal for your brunch, and you plan to travel somewhere in the afternoon with your close friends. Suddenly you become sick after eating the eggs at brunch. You go to the doctor and tell him/her what happened. If the doctor finds that you are sick because you ate expired food, what would your next action be? Of course, you will write a petition to the market that sold you the expired eggs. However, to show that you bought the expired eggs at that market, you will need proof. The only way to provide proof is to show them the receipt, which contains all the information related to the market and the purchased eggs.

Certainly, some people hold on to their marketing receipts for a period of time. However, there are probably many people who will not keep a receipt for even a second.

On the other hand, there are many people who might want to return an item that has recently expired. Think about the time and worry involved in manually searching through a large pile of receipts for the information about the market where the unwanted or expired item was purchased. Then think about an application that does all of these tasks in seconds. In this research, we implement and suggest an Android application that can easily find any information related to the purchased items.

This research suggests an Android app that combines optical character recognition technology with mobile devices in one OCR system. Several techniques, image processing algorithms, OCR techniques and OCR engines are studied and investigated in order to offer a new OCR application for easily tracking daily marketing receipts. Using the suggested Android application, users can capture a receipt image, or browse to one in the mobile device's gallery, and submit it to the system for identification and recognition. The information that can be identified and recognized includes the name, branch number, phone number, website and address of the market; the purchase time and date; the receipt ID; each item's name, price and ID; and the total money spent, the tax paid, the cash paid and the change due. The suggested Android application can thus save the time required to search for this information manually in a big pile of receipts.

Using the suggested Android application, users can simply type "egg" into a search box and get all the receipt images that include the searched name, sorted by purchase date. Users can then select a receipt from the sorted receipt images and obtain the evidence needed to prove the situation to the market.


Likewise, if users want to return a purchased item to the market for any reason, they must provide the receipt as evidence of purchase. In such a situation, users can type information such as the name of the market or the date of purchase into the search box to find the receipt they require as evidence.

Two other useful queries are implemented in the suggested Android application. The first is the expend inspector: in a nicely formatted chart, users can see their spending history for each of the last twelve months separately. The second is the total money expended for a particular item, where the query finds and computes the total money spent on, say, "coffee" within the last two months.

The main aim of this research is to investigate whether OCR technology is feasible for an application like this. This research investigates and studies several techniques for building an efficient and accurate OCR application for recognizing text on receipt images. Some OCR applications similar to the one proposed in this study are available commercially; proposing a better OCR application, with a nicely formatted GUI, more queries and more features, is the main aim of this study. Another aim is to academically present the techniques used by commercial app developers to implement and design such applications.

1.4. Thesis Layout

Descriptions of the remaining chapters are listed below:

Chapter Two: [THEORETICAL TECHNIQUES AND BACKGROUND OF OCR]

This chapter covers four important subsections. First, the main issues associated with images acquired through mobile device cameras, and the OCR techniques needed to handle them for implementing accurate and efficient OCR applications, are described. Second, several different usages of OCR technology in different fields are discussed. Third, the main pipeline of steps and techniques required for designing OCR applications is categorized and discussed. Finally, recent and powerful OCR engines are listed and discussed.


Chapter Three: [PROPOSED TECHNIQUES]

This chapter covers the main techniques and algorithms used in this research to overcome the problems of performing optical character recognition technology on receipt images. The general system overview is discussed, and then the practical side of the system and screenshots of the suggested Android application are presented.

Chapter Four: [QUERIES AND EXPERIMENTAL RESULTS]

This chapter presents the outcomes of the created queries and the experimental results obtained in this research for the suggested Android application.

Chapter Five: [CONCLUSION AND FUTURE WORK]

This chapter gives a final discussion of the techniques and algorithms used in this research for handling the issue of applying OCR technology to receipt images. The final results obtained in this research are also summarized. Finally, possible future works related to this research are presented.

2. THEORETICAL TECHNIQUES AND BACKGROUND OF OCR

This chapter covers four important subsections. First, the main issues associated with images acquired through mobile device cameras, and the OCR techniques needed to handle them for implementing accurate and efficient OCR applications, are described. Second, several different usages of OCR technology in different fields are discussed. Third, the main pipeline of steps and techniques required for designing OCR applications is categorized and discussed. Finally, recent and powerful OCR engines are listed and discussed.

2.1. OCR Challenges

To implement powerful and accurate OCR applications, the input images containing text should be enhanced and cleaned of any noise that would cause the OCR to produce bad, unpromising outcomes. Preparing and enhancing images before submitting them to the next phase in the OCR pipeline is one way of improving applications and obtaining better results. Images acquired through mobile device cameras usually face many more challenges than images acquired through scanner devices, so optical character recognition machines perform better and provide better outcomes when the input to the OCR engine is a scanned image. In other words, images acquired through mobile device cameras encounter more challenges and require more smoothing and enhancement techniques than images acquired through scanners. The challenges facing images obtained from mobile devices that must be considered for improving OCR applications are listed and discussed in the following subsections.

2.1.1. Complexity of scene

In the real world, many objects such as symbols, buildings and paintings have similarities with text; sometimes these objects look like characters. If we capture an image that includes both text and symbols and submit it to the OCR engine without separating the symbols from the real text, the OCR cannot recognize the text well and produces bad outcomes [8]. Similarly, feeding an image with a complicated background to the OCR engine degrades the segmentation and feature extraction phases, and hence bad outcomes will be produced. Figure 2.1 [9] is an example of this kind of challenge: an image with a complicated background.

In our case, the text on receipt images is legible, and receipt images do not include symbols that might resemble characters in the text. The background of a receipt is usually clean and contains only the text, which is the information related to the market and the purchased items.

Figure 2.1 An image that has scene complexity and a complicated background.

2.1.2. Uneven lighting problem

Images captured through mobile device cameras usually suffer from shadows and irregular illumination. Such problems introduce a new challenge for OCR that must be handled in order to improve applications [8].

Researchers in the literature suggest various techniques for handling irregular illumination and shadows on images. The best-known and most used techniques binarize images by global or adaptive thresholding. Global thresholding uses one threshold value for binarizing all the pixels of an image, whereas adaptive thresholding computes an individual threshold value for each pixel of the image. For handling shadows and irregular illumination, adaptive thresholding is preferable to global thresholding [10, 11]; however, because it computes a unique threshold for every pixel, it requires more time and makes the application less efficient. This research therefore uses the global thresholding technique for binarizing receipt images and handling irregular illumination and shadows on them. Since the suggested Android application gives users the opportunity to re-take or re-browse an image if the OCR results are useless, problems of irregular illumination and shadows occur less often in this study. Figure 2.2 [12] is an example of images that face such problems, and a sketch of both thresholding operations follows the figure.

a) Irregular illumination and shadow on an image. b) A global thresholding (Otsu’s algorithm) result [13].

c) An adaptive thresholding (Sauvola algorithm) result [11].

Figure 2.2 Irregular illumination and shadow problem on an image.
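To make the difference between the two binarization strategies concrete, the sketch below applies both with OpenCV's Java bindings. The file name, neighborhood size and offset are illustrative assumptions, not values taken from this thesis.

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class ThresholdDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat gray = Imgcodecs.imread("receipt.jpg", Imgcodecs.IMREAD_GRAYSCALE);

        // Global thresholding: Otsu picks one threshold for the whole image.
        Mat global = new Mat();
        Imgproc.threshold(gray, global, 0, 255,
                Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

        // Adaptive thresholding: a separate threshold per pixel neighborhood,
        // more robust to shadows but slower.
        Mat adaptive = new Mat();
        Imgproc.adaptiveThreshold(gray, adaptive, 255,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY,
                31, 10);   // 31x31 neighborhood, offset 10

        Imgcodecs.imwrite("global.png", global);
        Imgcodecs.imwrite("adaptive.png", adaptive);
    }
}
```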

2.1.3. Skewness problem

Taking images with mobile device cameras is one of the causes of skew in images, whereas images acquired through scanner devices have less of a skewing problem. Skewing causes the text lines of an image to be tilted against the background. Submitting images with a skewing problem to the OCR engine degrades the segmentation phase, which is the main and most important phase of the OCR process for promoting the recognition rate. Researchers in the literature suggest various techniques for handling the skewing problem on images, such as the Fourier transform algorithm, the Hough transform technique, the RAST method, projection profile techniques and the ImageMagick deskew command.

In this research, two techniques are used to properly handle the skewing problem on receipt images. The first is an edge identification algorithm, the Canny edge detection algorithm: besides identifying and extracting the receipt region from the background, it is also used to handle skewing on receipt images. The second is the deskew function [14] of the well-known image processing library ImageMagick. A sample image with skewing, and its result after applying a deskewing technique, is shown in Figure 2.3 [15]; a comparable OpenCV-based deskew sketch follows the figure.

Figure 2.3 An image that has problem of skewing and its result after applying a de- skewing technique.
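For illustration only, the sketch below shows a common OpenCV-based deskew recipe: estimate the dominant text angle from the minimum-area rectangle around the text pixels, then rotate by the corrected angle. It is an assumed stand-in for, not a reproduction of, the ImageMagick deskew function used in this research; note that the angle convention of minAreaRect changed in OpenCV 4.5, so the sign handling should be verified on sample images.

```java
import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class DeskewDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat gray = Imgcodecs.imread("skewed.jpg", Imgcodecs.IMREAD_GRAYSCALE);

        // Invert-binarize so text pixels are white, then collect them.
        Mat bin = new Mat();
        Imgproc.threshold(gray, bin, 0, 255,
                Imgproc.THRESH_BINARY_INV | Imgproc.THRESH_OTSU);
        Mat nonZero = new Mat();
        Core.findNonZero(bin, nonZero);

        // Minimum-area rectangle around the text pixels gives the skew angle.
        MatOfPoint2f pts = new MatOfPoint2f();
        nonZero.convertTo(pts, CvType.CV_32F);
        RotatedRect box = Imgproc.minAreaRect(pts);
        double angle = box.angle;                 // minAreaRect convention (OpenCV < 4.5)
        if (angle < -45) angle = -(90 + angle);
        else angle = -angle;

        // Rotate the original image back by the corrected angle.
        Point center = new Point(gray.cols() / 2.0, gray.rows() / 2.0);
        Mat rot = Imgproc.getRotationMatrix2D(center, angle, 1.0);
        Mat deskewed = new Mat();
        Imgproc.warpAffine(gray, deskewed, rot, gray.size());
        Imgcodecs.imwrite("deskewed.png", deskewed);
    }
}
```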

12

2.1.4. Un-focus and deterioration

Again, images obtained through scanner devices suffer less from un-focus and deterioration, whereas images acquired through mobile device cameras experience more such problems, because they can be taken from a variety of distances. The problem occurs in two ways: when the mobile device camera is out of focus, and when the camera moves while capturing the image. New smart mobile devices have an auto-focus feature, which prevents the camera from taking blurred or degraded images. Figure 2.4 [16] shows a sample of images with out-of-focus and deterioration problems.

a) Out of focus business card image. b) Deterioration problem on a business card image.

Figure 2.4 Two samples of images that have the problem of un-focus and deterioration.

2.1.5. Aspect ratios

Of course, the documents whose text we need to recognize have different lengths and scales. During the implementation of any OCR application, the length of the text should be considered so that OCR-related techniques can be applied efficiently; the main objective is to limit the computational complexity of the application. Since the text on receipt images varies in length, sometimes long and sometimes short, this research addresses the issue by performing the heavy techniques and methods on the server side instead of the client side. Figure 2.5 shows two sample images: in the first the text is long, and in the second it is short.


a) A sample image with long text. b) A sample image with short text.

Figure 2.5 Two samples of images that have various aspect ratios.

2.1.6. Tilting problem

The tilting problem is another challenge that should be considered when implementing an OCR application for accurate recognition. When an image is tilted, text lines that are far from the mobile device camera appear smaller than text lines that are close to it. Tilting never occurs in images acquired by scanner devices, since the scanner sensor is properly parallel to the document during scanning; the problem only occurs in images acquired by mobile device cameras.

This research handles the tilting problem with an edge identification algorithm, the Canny edge detection algorithm: besides identifying and extracting the receipt region from the background, this method is also used to correct tilting on receipt images. Figure 2.6 [12] is a sample image with the tilting problem; a perspective-warp sketch follows the figure.


Figure 2.6 A sample of an image that has problem of tilting.
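Once the four corners of the receipt have been located, for example from the Canny edge map, a perspective warp is a standard way to undo tilt. In the sketch below the corner coordinates and the target size are made-up placeholders.

```java
import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class TiltCorrection {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat src = Imgcodecs.imread("tilted_receipt.jpg");

        // Four receipt corners as detected in the edge image (placeholder values),
        // ordered top-left, top-right, bottom-right, bottom-left.
        MatOfPoint2f corners = new MatOfPoint2f(
                new Point(42, 80), new Point(590, 55),
                new Point(610, 830), new Point(25, 860));
        // Target rectangle: a fronto-parallel 600x900 receipt.
        MatOfPoint2f target = new MatOfPoint2f(
                new Point(0, 0), new Point(600, 0),
                new Point(600, 900), new Point(0, 900));

        Mat h = Imgproc.getPerspectiveTransform(corners, target);
        Mat flattened = new Mat();
        Imgproc.warpPerspective(src, flattened, h, new Size(600, 900));
        Imgcodecs.imwrite("flattened.png", flattened);
    }
}
```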

2.1.7. Fonts

The most important issue to consider when implementing any application associated with OCR technology is the text font of the images. Many text fonts exist, differentiated from each other by shape and other characteristics. Directly feeding unusual text fonts into an OCR engine without training it degrades the main OCR stage, the segmentation phase [17], and produces unpromising outcomes. This research suggests an Android application that can recognize two unusual text fonts on receipt images: the fake receipt text font [18] and the merchant copy text font [19]. A sample of both text fonts used in this research is shown in Figure 2.7 [18, 19].

a) A sample of fake receipt text font.

b) A sample of merchant copy text font.

Figure 2.7 Two samples of text fonts utilized for testing suggested Android application.

2.1.8. Multilingual environments

Some languages, such as English, French and Spanish, have a specific small number of character classes. Other languages, such as Korean, Chinese and Japanese, have a huge number of character classes. Still others, such as Arabic, have the special characteristic that character shapes change according to the context. Such challenges in several different languages still remain and need further work. In this research, the suggested Android application can only recognize English text on receipt images, for two different and unusual text fonts. One future improvement would be to extend the suggested Android application to recognize more languages on receipt images, such as Arabic and Turkish.

2.1.9. Warping problem

Another challenge in the pipeline of problems associated with OCR technology is bent (warped) text on objects. This situation never occurs when input images are acquired through scanner sensors, but it is possible for images acquired by mobile device cameras. Such problems can be seen in Figure 2.8 [12], where one image shows bent text on a bottle of milk and the other a bent text code on a delivery holder. Since receipt images can be captured immediately after shopping with a mobile device camera, warped text rarely occurs on receipt images.

a) Bent text on a bottle of milk. b) A bent text code on a delivery holder.

Figure 2.8 Two samples of images that have the problem of bent text.


2.2. OCR Applications

Optical character recognition technology has been applied in various dissimilar fields. This section lists and discusses several fields where optical character recognition technology plays a major role.

2.2.1. Hand-writing recognition applications

One usage of optical character recognition technology is recognizing handwritten text on images [20]. Applying OCR to images with handwritten text introduces new challenges that must be handled to improve the recognition rate. One challenge is that the same character may be written in numerous different shapes, so OCR engines should be properly trained on the different shapes of each character. Handwriting recognition can be divided into two application areas: offline and online. Offline handwriting recognition applications recognize handwritten text on documents, whereas online applications recognize handwritten text as it is being written, for example with a pen or a finger on the screen of a device.

2.2.2. Healthcare applications

Optical character recognition is a useful technology because it can be applied in many different fields. Another important usage is recognizing text on medical forms and other printed papers [21]. Researchers have tried to design OCR applications that can easily identify and recognize useful information in documents related to medical patients. Such applications help doctors and medical experts extract patient information from medical papers and save it to a database for later use.

2.2.3. Financial tracking applications

Another innovation of optical character recognition technology is implementing OCR applications to observe and track financial transactions [21]. Many well-known organizations and companies use OCR applications, among several others, to simplify tasks and manage the organization's work efficiently. Barcode recognition is one example: it uses OCR technology to recognize barcodes on items belonging to the organization, simplifying and streamlining the organization's tasks.

2.2.4. Legal industry

Utilizing optical character recognition technology in the legal industry is another usage of OCR [21]. Legal industries use OCR to extract and recognize text on judicial documents for further processing and analysis. The extracted text can be saved to a database, and judicial experts can later benefit from this information simply by typing a word into a search box.

2.2.5. Banking application

Nowadays, innovations in OCR technology have been extended to recognizing and extracting text on printed documents from financial institutions and banks [21]. Bank-related OCR applications can extract and recognize useful information on checks for deeper analysis and processing: a check is inserted into an OCR machine, which extracts the bank customer's information from the check and compares it with the database, so that the institution can respond to the customer's demands. Such systems can accurately recognize information on both printed and handwritten checks.

2.2.6. Captcha breaking application

CAPTCHAs [22] are security tests most often found on websites that require registration or login. Usually, the test displays a sequence of characters or numbers in an image, which users must type manually into a text box in order to log in. These tests ensure that the website is used by a human rather than an automated machine, preventing fake logins by attackers. For breaking such security tests, several approaches have been proposed in the literature. The most common uses optical character recognition technology: an OCR application recognizes and extracts the text of the security-test image, and this text can then be entered into the text box of websites that use text-based CAPTCHAs.

2.2.7. Automatic number plate recognition application (ANPR)

Another important usage of optical character recognition technology is recognizing text on the registration plates of vehicles [23]. Such applications are useful for police forces: the application captures an image of a vehicle's registration plate, submits the captured image to the OCR engine for extraction and recognition, and finally saves both the captured image and the recognized information to a database for later use. Recognition of vehicle registration plates remains an active research issue, because registration plates vary from country to country.

2.3. OCR Phases

The main stages and methods of optical character recognition technology are listed and discussed in this section. These stages are the image pre-processing stage, the character and word segmentation stage, the normalization stage, the character feature extraction stage, the character classification stage and finally the post-processing stage. To promote the recognition rate of an OCR application, we should understand and follow the main guidelines for handling the challenges that might occur at each stage. In the literature, researchers have suggested several techniques and methods to improve OCR applications in different fields. Based on this study and these investigations, a series of techniques and algorithms have been found useful for our case of recognizing text on receipt images. The OCR stages are discussed in the following subsections, together with the techniques and methods used in each.

2.3.1. Image pre-processing phase

The main purpose of applying image processing algorithms before feeding images into the OCR engine is to eliminate noise and enhance the image to achieve better recognition rates. This applies equally to binarized, colored and gray-scaled images; image pre-processing is required and important for every type of image. Processing colored images in OCR applications is computationally expensive, so the most important pre-processing step is binarizing images before submitting them to the next OCR stage. The performance of the other OCR stages, especially the segmentation stage, depends heavily on the image pre-processing techniques.

For the different fields associated with OCR, the challenges related to images acquired through mobile device cameras must be handled so that the subsequent OCR stages can succeed. Image pre-processing techniques can be divided into two kinds. The first is applying an edge identification algorithm to the image to identify and extract the text region. The second is applying image processing algorithms to smooth and enhance the text region for the benefit of the later OCR stages. Both kinds are discussed in detail in this section. The main image processing algorithms and techniques that should be applied to images are listed and discussed in Table 2.1, and a sketch chaining several of them together follows the table.

Table 2.1 Main techniques or algorithms of image pre-processing with a brief discussion.

Skeletonization and thinning: Thinning the text, adjusting its shape until the stroke width of the text is one pixel.

Thresholding technique: Separating the text pixels from the background pixels.

Morphological operations: Adding black pixels to fill the holes of characters and adding white pixels over unwanted black pixels.

De-skewing technique: Correcting the skew that can occur on images acquired through the cameras of mobile devices.

Reduction of noise: Removing noise and small speckles from images, for example with a median filter.

Binarize process: Converting a gray or color image to a black-and-white image; the most important pre-processing technique.
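As a rough sketch of how several of the operations in Table 2.1 chain together (gray-scale conversion, noise reduction, binarization and a morphological step), using OpenCV's Java bindings; all parameter values are illustrative assumptions, not values from this thesis.

```java
import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class PreprocessPipeline {
    public static Mat preprocess(String path) {
        Mat img = Imgcodecs.imread(path);

        Mat gray = new Mat();
        Imgproc.cvtColor(img, gray, Imgproc.COLOR_BGR2GRAY);   // gray-scale

        Mat denoised = new Mat();
        Imgproc.medianBlur(gray, denoised, 3);                 // remove small speckles

        Mat binary = new Mat();
        Imgproc.threshold(denoised, binary, 0, 255,            // binarize (Otsu)
                Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

        // Morphological erosion: shrinks the white background, thickening
        // dark strokes and filling small holes in the characters.
        Mat kernel = Imgproc.getStructuringElement(
                Imgproc.MORPH_RECT, new Size(2, 2));
        Mat cleaned = new Mat();
        Imgproc.erode(binary, cleaned, kernel);
        return cleaned;
    }
}
```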


In the literature, researchers have applied various edge detection algorithms to different problems related to machine learning, such as Prewitt's operator [24], the Canny edge detection algorithm [25], the Laplacian of Gaussian [26], Robert's cross operator [27] and the Sobel operator [28]. In this research, we use the Canny edge detection algorithm to identify the receipt region against the background of the image.

The best-known and most used edge identification algorithm is the Canny edge detector, designed by John F. Canny and introduced in 1986. The algorithm uses several calculations and sub-algorithms to find and extract the edges of shapes in an image. A paper by its designer [25] describes everything related to the algorithm, including its capabilities and architecture. Among the strong edge identification methods, Canny's is the one most used by researchers for handling problems associated with machine learning; identifying edges accurately and efficiently are the main factors the algorithm addresses. The effect of the Canny edge detection algorithm on an image is shown in Figure 2.9 [29].

a) An image containing text. b) The effect of the Canny edge detection algorithm.

Figure 2.9 Result of performing canny edge detection algorithm on an image.


The structure of the Canny edge detection algorithm is discussed step by step in the following subsections:

A. Gaussian filter method

The major factor affecting every edge detection algorithm is noise on the images. For accurate edge identification, noise and small speckles should be eliminated. To achieve this, the Canny edge detection algorithm first applies a technique called the Gaussian filter. Figure 2.10 shows the effects of the Gaussian filter algorithm on an image.

a) Lena test image. b) Gaussian X-derivative result. c) Gaussian Y-derivative result.

Figure 2.10 Results of performing Gaussian filter on an image.

B. Computing intensity gradient and directions of edges

The Canny edge detection algorithm applies four filters to find the directions of the edges; the possible directions for an edge are vertical, horizontal and the two diagonals.

Finding the directions of the edges is divided into two phases. In the first phase, an edge identification operator such as Sobel, Prewitt or Roberts computes the first derivative in both the X direction and the Y direction. In the second phase, the gradient intensities and directions of the edges in the image are computed from these derivatives. The effect of this process is shown in Figure 2.11; a sketch of the computation follows the figure.


Figure 2.11 Intensity gradient calculation and edge direction finding process on an image.
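A minimal sketch of this two-phase computation, using Sobel first derivatives and then converting them to gradient magnitude and direction with OpenCV's Java bindings:

```java
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

public class GradientStep {
    /** Phase 1: first derivatives. Phase 2: gradient magnitude and direction. */
    public static Mat[] intensityGradient(Mat gray) {
        Mat gx = new Mat(), gy = new Mat();
        Imgproc.Sobel(gray, gx, CvType.CV_32F, 1, 0);   // d/dx
        Imgproc.Sobel(gray, gy, CvType.CV_32F, 0, 1);   // d/dy

        Mat magnitude = new Mat(), direction = new Mat();
        Core.cartToPolar(gx, gy, magnitude, direction, true); // direction in degrees
        return new Mat[] { magnitude, direction };
    }
}
```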

C. Non-maximum suppression process

After finding the intensity gradients and directions of the edges, the Canny edge detection algorithm applies a technique called non-maximum suppression to thin the edges. This technique finds and keeps the pixels that are strong candidates to be part of an edge; every pixel that is not a strong candidate is removed. The effect of this process is shown in Figure 2.12.

Figure 2.12 The result of applying the non-maximum suppression technique on an image.


D. Hysteresis thresholding

The final stage of the Canny edge detection algorithm is thresholding the image with two threshold values, one small and one large. The algorithm then decides, pixel by pixel, whether each pixel is part of an edge: it goes through every pixel of the image and compares the pixel's intensity gradient with the two selected threshold values. If a pixel's intensity gradient is greater than the large threshold, the pixel is accepted as an edge; if it is smaller than the small threshold, the pixel is rejected. If the intensity gradient lies between the two thresholds, the pixel is accepted only if it is connected to a pixel that was already accepted as an edge. The final result of the Canny edge detection algorithm is shown in Figure 2.13, and a sketch using these two thresholds follows the figure.

Figure 2.13 Final results produced by applying Canny edge detection algorithm on an image.
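OpenCV bundles all four steps above into a single call. The sketch below applies the Gaussian blur of step A and then Canny, whose last two arguments are the small and large hysteresis threshold values of step D; 50 and 150 are illustrative choices, not values from this thesis.

```java
import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class CannyDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat gray = Imgcodecs.imread("receipt.jpg", Imgcodecs.IMREAD_GRAYSCALE);

        Mat blurred = new Mat();
        Imgproc.GaussianBlur(gray, blurred, new Size(5, 5), 0); // step A: denoise

        Mat edges = new Mat();
        // Steps B to D (gradients, non-maximum suppression and hysteresis
        // thresholding) happen inside Canny; 50/150 are the two thresholds.
        Imgproc.Canny(blurred, edges, 50, 150);
        Imgcodecs.imwrite("edges.png", edges);
    }
}
```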

2.3.2. Segmentation phase

The most important and effective process in the stack of OCR processes is the segmentation of text lines, words and characters; accurate segmentation has a direct impact on promoting the recognition rate. An OCR application takes an image as input and produces a text file as output. The image is first pre-processed and enhanced with a series of image processing algorithms such as gray-scale conversion, binarization, noise reduction, morphological operations and de-skewing. After the image pre-processing stage, the image is submitted to the segmentation stage, which segments text lines, words and characters. The segmentation stage removes unwanted regions such as the image background, then finds the text lines and applies word and character segmentation. For segmenting text documents, researchers utilize one of three classes of document segmentation techniques [30]:

 Top-down algorithms,

 Bottom-up algorithms,

 Hybrid algorithms.

The first technique for segmenting text in images is the top-down approach. Its main procedure is to recursively segment large districts of text into smaller districts until all characters are properly segmented, at which point the procedure stops. The second technique is the bottom-up approach. These algorithms start by finding pixels that are strong candidates for belonging to a character and merge those pixels to form character images. After finding all the characters, the algorithm merges characters into words, builds text lines from the words, and finally produces a block of text from the text lines. When both techniques (top-down and bottom-up) are mixed during the segmentation process of an OCR application, the combination is called a hybrid technique. A sketch of a simple projection-profile line segmentation, a classic top-down technique, is given after Table 2.2. In the literature, researchers have suggested various techniques and methods for segmentation; some of them and their results are listed in Table 2.2.


Table 2.2 Various segmentation techniques utilized and suggested by researchers, with their results.

Researchers Techniques Result %

[31] Word border finding for the Urdu language. 96.10 %

[32] Projection profile algorithms (vertical and horizontal). 98 %

[33] Segmenting lines by interline distance and vertical projection; segmenting words by interword distance and horizontal projection. 87.1 %

[34] Neighborhood Connected Component Analysis technique. 93.35 %

[35] Hypothetical water flows technique. 91.44 % and 90.34 % for Bengali hand-written images and English document images respectively.

[36] A Hough transform based technique. 85.7 %, 88 % and 94.6 % for images of documents, images from surveillance cameras and images of business cards respectively.
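As a concrete illustration of the top-down family mentioned above, the following Java sketch segments text lines using a horizontal projection profile on a binarized image. The method name and the zero-row separation criterion are assumptions for the example:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of top-down line segmentation: rows containing no black pixels
    // separate consecutive text lines in a binarized image (true = black).
    static List<int[]> segmentLines(boolean[][] binary) {
        List<int[]> lines = new ArrayList<>();
        int start = -1;
        for (int y = 0; y < binary.length; y++) {
            int blackPixels = 0;
            for (boolean pixel : binary[y]) if (pixel) blackPixels++;
            if (blackPixels > 0 && start < 0) start = y;   // a text line begins here
            if (blackPixels == 0 && start >= 0) {          // the text line has ended
                lines.add(new int[] { start, y - 1 });
                start = -1;
            }
        }
        if (start >= 0) lines.add(new int[] { start, binary.length - 1 });
        return lines;
    }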

2.3.3. Normalization phase

When the character segmentation procedure is finished, the output is a set of character images. For better applying the feature extraction techniques, these character images should be normalized to a certain fixed size. This procedure is important because it removes unwanted information from the character images without influencing the significant information. In this way, normalization raises the accuracy of feature extraction from the character images and thus the performance of the classification algorithms [37].

2.3.4. Feature extraction phase

The feature extraction phase is another important phase in the stack of processes required for designing and implementing an efficient and accurate OCR application. Feature extraction is the procedure of obtaining features from each character in order to build a feature vector. The classification algorithm later utilizes these feature vectors for classifying and recognizing each character [38]; the feature vectors make it easy for the classification algorithm to distinguish dissimilar characters [39].

Structural features and statistical features are the two feature classes proposed by Suen [40]. The first class, structural features, uses the geometry of the characters; examples are the number of holes in a character and its concavity and convexity features. The second class, statistical features, uses the character matrix; examples are projection histograms, Fourier transforms, crossings, moments and zoning features [41]. In the literature, researchers have suggested various methods for feature extraction; some of them and their results are listed in Table 2.3 [41]. A sketch of the zoning feature is given after the table.

Table 2.3 Various feature extraction techniques utilized and suggested by researchers, with their results.

Researchers Techniques Result %

[42] Both structural and statistical features are used. 90.18 %

[43] Fused statistical features. 91.38 %

[44] Linear discriminant analysis classifier. 67.30 %

[45] Modified direction features. 89.01 %

[46] Directional features. Low: 70.22 %, high: 84.83 %

[47] Hybrid feature extraction method. 85.08 %
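As an illustration of the zoning feature mentioned above, the following Java sketch divides a normalized character image into a grid of zones and uses the black-pixel density of each zone as one entry of the feature vector. The method name and the grid parameter are assumptions for the example:

    // Sketch of zoning: the character image is split into grid x grid zones
    // and the black-pixel density of each zone becomes one feature value.
    static double[] zoningFeatures(boolean[][] character, int grid) {
        int h = character.length, w = character[0].length;
        double[] features = new double[grid * grid];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (character[y][x])
                    features[(y * grid / h) * grid + (x * grid / w)]++;
        double zoneArea = (double) (h * w) / (grid * grid);
        for (int i = 0; i < features.length; i++)
            features[i] /= zoneArea;   // normalize counts to densities
        return features;
    }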

2.3.5. Classification phase

After obtaining the feature vectors from the feature extraction phase, the classification phase classifies each character into a predefined class by utilizing its feature vector. Usually, the classification phase is the final stage of an OCR application, in which the classifier algorithm makes the decision about which character an image represents. For obtaining accurate and efficient classification of characters in any OCR application, training the classifier algorithm on different shapes of character images is the most important factor. In the literature, researchers have suggested several different classifiers for different fields associated with OCR. Selecting an appropriate classification algorithm for a given field depends on several factors that must be considered while implementing any OCR application, such as the available training dataset and the classifier's parameters.

There are two basic steps to using a classifier: training and classification. The step in the classification phase with the greatest impact on raising the recognition rate is training the classifier for the unknown classes. Training is the process of taking content that is known to belong to specified classes and creating a classifier on the basis of that known content. Classification is the process of taking a classifier built with such a training content set and running it on unknown content to determine class membership of the unknown content. Training is an iterative process whereby the best possible classifier is built, whereas classification is a one-time process designed to run on unknown content.

The most utilized classification algorithms in the literature are the support vector machine (SVM) method, template matching techniques, artificial neural network (ANN) algorithms, statistical methods and hybrid classification techniques [48]. The most well-known and most utilized classification algorithms in the literature for OCR applications are the neural network algorithms. Table 2.4 shows some research studies in which neural network algorithms were utilized for handling different problems associated with OCR, together with their outcomes.

Table 2.4 Neural network based OCR applications with the outcomes achieved.

Researchers OCR system Result %

[49] OCR application for broken character recognition. 68.3 %

[50] OCR application for recognizing text on Urdu documents. 98.30 %

[51] Automatic number plate recognition. 97.30 %

[52] Recognition of chassis numbers. 95.49 %


2.3.6. Post-processing phase

The post-processing phase is an optional phase in the stack of processes required for designing and implementing OCR applications; it is not strictly necessary. However, for designing and implementing an efficient and accurate OCR engine, it is important to consider some post-processing techniques, since they raise the accuracy rate of OCR applications. One such technique is using a dictionary: when the classifier has produced its output, the words in the recognized text are compared with an English dictionary in order to correct wrongly detected characters. A sketch of this idea is given below.
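The following minimal Java sketch illustrates such a dictionary check; the edit-distance threshold of 1 and the method names are assumptions for illustration, not the mechanism of any particular OCR engine:

    import java.util.List;

    // Sketch of dictionary-based post-processing: replace a recognized word
    // by the closest dictionary entry when it is within edit distance 1.
    static String correct(String word, List<String> dictionary) {
        String best = word;
        int bestDistance = 2;   // only corrections with distance <= 1 are accepted
        for (String entry : dictionary) {
            int d = editDistance(word.toLowerCase(), entry.toLowerCase());
            if (d < bestDistance) { bestDistance = d; best = entry; }
        }
        return best;
    }

    // Classic dynamic-programming Levenshtein distance between two strings.
    static int editDistance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                dp[i][j] = Math.min(
                        Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1),
                        dp[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return dp[a.length()][b.length()];
    }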

2.4. OCR Engines

The conventional pipeline for designing and implementing OCR applications was presented and discussed in the previous section. Another approach is to utilize an existing OCR engine. This research utilizes an open source OCR engine, the Tesseract OCR engine. Various OCR engines exist in the literature for performing OCR on images from different fields; some of them are discussed in the following subsections. Based on experiments and investigations across various studies suggested in the literature, this research concluded that the Tesseract OCR engine fulfills the requirements for implementing a powerful and efficient OCR application for handling the problem of applying OCR to receipt images.

2.4.1. GOCR engine

GOCR, also known as JOCR [53], is an optical character recognition engine initially designed and implemented by Joerg Schulenburg and now maintained by a team of developers who continuously enhance it. GOCR takes an image containing text and produces a text file containing the editable text. The engine extracts features from character images using a spatial feature extraction technique. GOCR can recognize text in several different image formats, and it can be installed on several operating systems because it is designed to be compatible with them; the supported operating systems are Mac OS, Linux and Windows. The GOCR engine is written in the powerful C programming language. The latest version of GOCR, version 0.50, was published in 2013. This research did not utilize GOCR for handling the problem of applying OCR to receipt images, mainly because GOCR is not a well-documented OCR engine and is more suitable for commercial OCR applications than for academic ones.

2.4.2. Ocrad engine

The Ocrad engine [54] is another OCR engine suggested and utilized in the literature. It is published under the GNU GPL license. The engine can efficiently recognize English text in images with a clear background. Ocrad can only be utilized on Linux. For classification, Ocrad utilizes feature extraction and template matching techniques. It also adopts an image preprocessing technique known as document layout analysis, which is useful for identifying and extracting text regions from images containing shapes and tables.

The Ocrad engine is an open source engine written in the efficient C++ programming language; it was designed and implemented in 2003. Ocrad is used through the command prompt, because the engine does not have a GUI. This study found the Ocrad engine unsuitable for recognizing text on receipt images for two main reasons: first, the weak documentation of the engine and, second, the engine has not proven itself across the various areas associated with OCR.

2.4.3. OCRopus

The OCRopus engine [55] is another OCR engine suggested and utilized in the literature. It is published under the Apache License 2.0. It was implemented by a German artificial intelligence research center in Kaiserslautern, and today the system is maintained by Google. OCRopus is written in two different programming languages, C++ and Python. For text recognition, OCRopus has used two different OCR engines: before the publication of version 0.4, it used the Tesseract OCR engine in the recognition phase; since then, and as of this writing, it uses a neural network. The developers of OCRopus continue to improve the engine for better text recognition. It has been acknowledged that the accuracy of OCRopus is lower than the accuracy of the Tesseract OCR engine [56].

To benefit from the OCRopus engine, the literature recommends certain machine specifications, the most important being installation on Ubuntu, more than 4 GB of memory and a fast CPU [57]. Accurate text recognition cannot be achieved with the OCRopus engine on images containing pictures and multi-column text lines. This study also rejected utilizing OCRopus for handling the problem of applying OCR to receipt images, mainly because OCRopus is not a well-documented OCR engine and has not been widely utilized by researchers for handling the different problems associated with OCR.

2.4.4. Tesseract OCR engine

The most well-known and most utilized OCR engine in the literature is the Tesseract OCR engine [58]. This engine has performed successfully and efficiently in several different areas associated with OCR. The Tesseract OCR engine was designed and implemented by R. Smith, starting in November 1987 while he was a Ph.D. researcher in Bristol. Investigations into improving the accuracy of Tesseract were carried out until 1995 and were then postponed until 2005, when HP published Tesseract as an open source engine under the Apache License 2.0. Today, Google supports and maintains the Tesseract OCR engine; in February 2016, Google published a new version named Tesseract 3.04.01. The Tesseract OCR engine can only be utilized through a command line interface because it does not have a GUI (graphical user interface); to utilize Tesseract in an application associated with OCR, researchers have to implement a GUI themselves. Tesseract is written in two powerful programming languages, C++ and C. The Tesseract OCR engine has APIs for both the Android and iOS platforms.

The Tesseract OCR engine is the most utilized among the various OCR engines available in the literature. Earlier versions of Tesseract, such as version 2.0 and older, supported only the tiff image format and could only recognize text in documents with one simple column of text. These versions also produced less accurate results on documents containing pictures and shapes, because they did not support document layout analysis; recent versions such as Tesseract 3.0 support document layout analysis and contain various page segmentation methods.

Originally, the Tesseract OCR engine was comprehensively trained for different shapes of English characters: 8 different text fonts in 4 different text styles each (normal, bold, italic and bold italic), for each of 94 characters, with 20 sample images used per character for training. In total, Tesseract was therefore trained on 60160 image samples. Tesseract has been adopted to handle problems in various areas associated with OCR; one such usage is recognizing symbols on documents [59], because Tesseract can be trained for new fonts, new languages and new shapes. The main and most important feature of Tesseract is that new languages or new fonts can be recognized after training Tesseract for them. Currently, Tesseract can recognize text in most world languages; it has been designed to recognize more than 100 different languages.

This research used the Tesseract OCR engine for recognizing text on receipt images. Among several different OCR engines, the research found that Tesseract fulfills the requirements for handling the problem of applying OCR to receipt images efficiently and accurately. There are several reasons and factors behind this decision, some of which are listed below:

 Comprehensive documentation of the Tesseract OCR engine is provided on the official Tesseract web page [60].

 New languages and new fonts can be introduced to Tesseract, meaning the Tesseract OCR engine is a trainable OCR engine [61].

 Various problems in the literature have been handled efficiently and successfully by applying the Tesseract OCR engine [4, 62, 63, 64].

 Tesseract can recognize text on different documents accurately; an average recognition rate of 99 % has recently been reported for Tesseract on several different images containing text [58].

 The Tesseract OCR engine has APIs for both the Android and iOS platforms.

 Currently, Tesseract can recognize text in most world languages; more than one hundred different languages are supported by the Tesseract OCR engine.

 The founder of Tesseract (R. Smith) published a comprehensive paper about Tesseract [65] in 2007, presenting all its details, such as its definition, architecture and techniques.

The Tesseract OCR engine is designed and implemented using several different techniques and algorithms organized in two phases: the connected component analysis phase and the recognition phase. In the first phase, Tesseract applies techniques related to connected component analysis: line finding, baseline fitting, word segmentation and fixing broken or joined characters. In the second phase, the recognition phase, two different classifiers, a static classifier and an adaptive classifier, are used to raise the recognition accuracy. In the first stage of recognition, the system uses the static classifier; in the second stage, it applies the adaptive classifier for better recognition of the characters and words that were not recognized by the static classifier. The step-by-step architecture of the Tesseract OCR engine is shown in Figure 2.14.


Figure 2.14 Step-by-step processes adopted by the Tesseract OCR engine.

The architecture and processes adopted by Tesseract for text recognition are presented by the designer of the Tesseract OCR engine in [65]. In the following subsections, the processes adopted by Tesseract are summarized:

A. Baseline finding

The most important acts in this process are spot filtering and line building. A spot can be any content in an image, such as a symbol, character or word; see Figure 2.15 for an example of baseline finding. This process calculates the average height of the spots and removes noise and small speckles from the image.

Figure 2.15 An image containing 7 spots whose baselines were identified by the baseline finding technique.

This process reduces noise and incorrect baseline assignment in situations where the text skewing problem exists. After finding all the spots in an image, Tesseract uses a technique called least median of squares fitting to find the baselines.

B. Fitting of baselines

After identifying all the spots in an image and finding the baselines, Tesseract applies another technique, a quadratic fit, for fitting the baselines more accurately.

C. Segmentation of words

This is the process of segmenting words into characters. Based on character pitch, Tesseract applies a technique for segmenting words into characters. Words whose characters have the same width (fixed pitch) are passed directly to the recognition phase without any problems. In most cases, however, characters in words have different widths, which poses a challenge to Tesseract for word segmentation. An example of a word with different pitches is presented in Figure 2.16.

Figure 2.16 An example of a word that has different character pitches.

For handling the problem of different character pitches, Tesseract uses the following technique. First, the process finds the unusual pitches in a word by measuring the gaps in a limited vertical range between the mean line and the baseline. Any unusual space found is marked as a fuzzy space. Characters with the fuzzy space problem are recognized in the second classification stage.

D. Chopping joined characters

Another important process that Tesseract applies to the spots in an image is chopping joined characters in words. In this step, Tesseract fixes problems encountered during the segmentation of characters. For that purpose, Tesseract applies a technique that first searches for candidate chop points; see Figure 2.17 [65] for an example of joined characters in a word.

Figure 2.17 An example of a word with joined characters.

If this process cannot improve the accuracy of the output, the Tesseract OCR engine does not yet discard the word. Instead, after this process, the system applies another process called merging broken characters.

E. Merging broken characters

In this step, if the candidate chops in a spot have been tested and the spot is still not recognized by the classifier, Tesseract feeds the spot to a process called the merging broken characters technique. This technique tries various chop candidates from the spots and merges broken characters as far as possible. An example of broken characters in a word can be seen in Figure 2.18 [65].

Figure 2.18 An example of broken characters in a word.

F. Character classification

After the characters are properly segmented, Tesseract feeds the character images into the classification phase, where characters are classified and recognized by two classifier algorithms applied in two stages. In the first stage, Tesseract applies a static classifier algorithm; in the second stage, it applies an adaptive classifier algorithm. In the first stage, fixed-length features of various sizes are identified and matched many-to-one against prototypes from the training information. Sometimes characters carry too little information to be extracted and recognized by the classifier; in these cases, Tesseract uses a polygonal approximation for recognizing them. See Figure 2.19 [60].

Figure 2.19 Broken and joined characters in a word can be recognized by the static classifier algorithm.

The adaptive classifier in the second stage is trained by the characters recognized in the first stage by the static classifier; the information gathered from the static classifier is essential for the adaptive classifier. The characters with the fuzzy space problem are also recognized by applying the adaptive classifier algorithm in the second stage.

3. PROPOSED TECHNIQUES

The previous chapter gave a general introduction and background to optical character recognition technology, covering the steps and techniques required for designing a successful OCR system. This chapter discusses the important techniques and algorithms used in this research to overcome the problem of performing optical character recognition on receipt images. First the general system overview is discussed, and then the practical side of the system and screenshots of the suggested Android application are presented.

3.1. System Overview

This section presents the important techniques and algorithms used in this research for handling the issue of applying OCR to receipt images. The main steps, receipt region detection, image enhancement and preprocessing, applying the Tesseract OCR engine, regular expressions and database storage, are reviewed. The activity diagram presented in Figure 3.1 illustrates the step-by-step process of applying the several different methods and techniques for performing optical character recognition on receipt images.

Figure 3.1 Step-by-step proposed techniques for the implemented receipt tracker application.

The first step in the proposed system is that users either capture a receipt image or browse to a receipt image in the mobile device's gallery. After the user has selected a receipt image, the system applies the canny edge detection (CED) algorithm for detecting and extracting the receipt region from the image background. Users can then accept or reject the detection produced by the canny edge detection algorithm.

After detecting the receipt region with the canny edge detection algorithm, the suggested system applies a series of techniques and methods for enhancing and smoothing the image before feeding it into the Tesseract OCR engine, in order to improve the recognition rate. The techniques applied in this thesis are the contrast process, the gray-mode operation, binarization, noise reduction, morphological operations and finally the de-skewing process.

Once the receipt image has been enhanced and smoothed, it is ready to go through the recognition stage. In the recognition stage, the suggested Android application applies the Tesseract OCR engine for extracting and recognizing text on the receipt images. The research found several different techniques for improving the recognition rate of the Tesseract OCR engine, such as training Tesseract for new fonts, improving results with regular expressions (regex), selecting a suitable page segmentation method (PSM) and introducing a new dictionary to the Tesseract OCR engine.

After the recognition process, the suggested Android application shows the recognized text to the user to either accept or reject. If the user accepts the recognized text, the system extracts all relevant information, such as the market name, address and goods purchased. After identifying the useful information, the suggested system finally saves that information to a database for future use.

The following subsections give a full description of the main techniques and algorithms adopted in this research, illustrating step by step how the algorithms and methods are performed on the receipt images and presenting the effects of the suggested techniques.


3.1.1. Receipt region detection

The first act in the suggested Android application is the identification and extraction of the receipt region from the image background, using the well-known and most utilized edge detection algorithm, the canny edge detection algorithm. In this way, the image preprocessing algorithms can enhance the image better than when they are applied to the whole receipt image together with its background.

In the literature, researchers have applied various edge detection algorithms to different problems related to machine learning, such as Prewitt's operator, the canny edge detection algorithm, the Laplacian of Gaussian, Robert's cross operator and the Sobel operator. In this research, the canny edge detection algorithm was used for extracting and identifying the receipt region from the image background, because after investigating several different research studies it was concluded that the canny edge detection algorithm performs better for this case. A full description of what the canny edge detection algorithm is and how it works was given in the previous chapter.

a) An image of receipt b) Canny edge detection algorithm effects

Figure 3.2 The result of performing CED algorithm on a receipt image.


In the proposed Android application, the cvCanny function from the OpenCV library was used for performing the canny edge detection algorithm on the receipt images. The effects of applying the canny edge detection algorithm to a receipt image are shown in Figure 3.2.

The canny edge detection algorithm applies a series of methods for identifying the edges of an image. As the first process, the algorithm performs a noise reduction step known as Gaussian smoothing. Following this, the algorithm calculates the gradient intensity of the image; it then performs the non-maximum suppression process, which removes unwanted pixels. Finally, it applies the thresholding technique to the image, utilizing two different threshold values.

After applying the cvCanny function, another function from the OpenCV library, cvFindContours, was used for detecting the different shapes (contours) in the image. After finding the different shapes, the technique chooses the biggest shape, which represents the receipt region, by using another important OpenCV function, cvBoundingRect, as presented in Figure 3.3. A sketch of this pipeline is given below.

Figure 3.3 Receipt region shape detection.
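The following Java sketch shows the same pipeline using the OpenCV Java API, which corresponds to the cvCanny, cvFindContours and cvBoundingRect functions named above; the Canny threshold values (50 and 150) and the method name are assumptions for illustration, not the exact values used in the application:

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Rect;
    import org.opencv.imgproc.Imgproc;

    // Sketch of receipt region detection: build a Canny edge map, find the
    // contours, then take the bounding rectangle of the largest contour.
    static Rect detectReceiptRegion(Mat gray) {
        Mat edges = new Mat();
        Imgproc.Canny(gray, edges, 50, 150);
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(edges, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        Rect biggest = null;
        for (MatOfPoint contour : contours) {
            Rect box = Imgproc.boundingRect(contour);
            if (biggest == null || box.area() > biggest.area())
                biggest = box;   // keep the largest shape found so far
        }
        return biggest;          // the detected receipt region (null if none)
    }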


Now all the processes related to receipt region detection are finished. At this point, the proposed Android application gives the users the authority to do one of the following:

 In some cases, when receipt images have a lot of noise, edge detection algorithms cannot perform well. If this happens, application users can simply reject the process and resubmit another receipt image.

 The proposed Android application gives users the authority to edit the detection or manually select the receipt region.

 If there is no problem with the detection of the receipt region and the canny edge detection performed well, users can click the crop button and submit the image.

The result of all the functions discussed is presented in Figure 3.4. The first image is the original receipt image captured by the user with the detected receipt region, and the second image is the receipt region image, ready to go through the further processing proposed in this research.

a) Original receipt image with the detection performed by CED b) The receipt region is ready.

Figure 3.4 The receipt region is accurately identified and detected by using canny edge detection algorithm.


The canny edge detection step has additional effects, such as helping to correct the skewing and tilting problems on receipt images.

3.1.2. Receipt image pre-processing phase

Image preprocessing and enhancement is the initial process before feeding images to the OCR engine. This process reduces noise on the receipt images and raises character recognition accuracy. After identifying the receipt region using the canny edge detection algorithm, the next step is to apply a series of image preprocessing algorithms.

In this research, several different image preprocessing algorithms were studied in order to raise the recognition rate of the Tesseract OCR engine. The image preprocessing algorithms applied to the receipt images are the contrast technique, gray-scaling (converting images from color mode to gray mode), binarization using a global thresholding algorithm, noise reduction, morphological operations and the de-skewing procedure for the text. The architecture of the image preprocessing algorithms applied to the receipt images in the suggested Android application is shown in Figure 3.5.

Figure 3.5 Image preprocessing algorithms adopted and applied in this research.


All of the mentioned image preprocessing algorithms were performed on the receipt images by using a freely available open source image processing library called ImageMagick [66]. The ImageMagick library supports over 100 image formats as well as video, and it supports most image processing algorithms, including all of the algorithms used in this research.

A. Contrasting

Contrasting receipt images before feeding them to the other preprocessing algorithms makes those algorithms apply better. Since images taken by mobile device cameras suffer from light variation more than images obtained by a scanner, applying the contrast algorithm as the initial process is preferable. For this purpose, the (-contrast) function from the ImageMagick library was used. The result of applying the contrast method to a receipt image is shown in Figure 3.6.

a) An image of receipt b) Effects of contrast function

Figure 3.6 Result of performing contrast method on an image of the receipt.


B. Gray-scaling

The gray-scaling process converts an image from color mode to gray mode. After performing the contrast function on the receipt images, the second process is applying the gray-scaling function. Since performing the binarization process on colored images takes more time than on gray images, the important point is to apply the gray-scale method to the receipt images before applying the binarization function.

For this purpose, another function from the ImageMagick library was used, the (-colorspace gray) function. The result of performing the gray-scaling function on a receipt image is shown in Figure 3.7. It clearly appears that the contrasted image has color variations; after performing the gray-scaling function on the contrasted image, the color variation is eliminated and the image is converted to gray mode. This process helps the proposed system perform the binarization process efficiently.

a) The contrasted receipt image b) The gray-scaled receipt image

Figure 3.7 The result of performing gray-scale function on a receipt image.


C. Binarization

The binarizing process thresholds images before feeding them into the OCR engine. Binarization is an important process that must be performed in any application related to optical character recognition technology: for the OCR engine to detect and process only the text in the images, the binarization process must be applied before feeding the images to the engine. Binarization simplifies the segmentation of characters in the images. The binarization process separates the pixels in an image, marking the image background with white pixels and the text with black pixels.

Researchers have generally used thresholding algorithms for binarizing images in several different problems related to optical character recognition technology before submitting the images to the OCR engines. There are two well-known and most utilized thresholding algorithms: the local thresholding algorithm and the global thresholding algorithm. Local thresholding computes an individual threshold value for each pixel in the image, whereas global thresholding utilizes one threshold value for all the pixels in the image.

Better applying the segmentation and recognition processes is the main reason for performing a thresholding algorithm in the suggested Android application; handling the problem of uneven lighting or light variation in images is another important usage of the thresholding technique. Since local thresholding computes an individual threshold value for every pixel, which takes considerable time, global thresholding was selected. Based on the results and findings, the study observed that the suggested Android application can process receipt images more efficiently when the global thresholding algorithm is used rather than the local thresholding algorithm.

For applying the global thresholding algorithm to the receipt images, another function from the ImageMagick library was used, the (-threshold value%) function. Several different threshold values were tested; based on these investigations, the research found that a threshold value of 40% performs better than the other values.

The effect of the binarization process using the global thresholding algorithm on a receipt image is shown in Figure 3.8. The contrast and gray-scaling algorithms are the two functions that were performed on the first image, and the binarization process produced the second image, in which the text clearly stands out.

a) The receipt image contrasted and gray-scaled b) The receipt image after binarization

Figure 3.8 The result of performing thresholding algorithm over an image of the receipt.

D. Noise reduction

Today's mobile device cameras are capable of capturing images in high resolution, but some noise and small speckles still have to be eliminated before feeding the images to the feature extraction process, because directly submitting images to the feature extraction phase without removing the noise makes the classification phase produce unpromising results.

Applying an algorithm for eliminating unwanted information or noise from the binarized image is the most effective way to prepare for the feature extraction and classification phases in systems related to optical character recognition technology. Median filters and Gaussian blur filters are the two well-known algorithms used by researchers to overcome the problem of noise in images. For removing noise and unwanted information from the receipt images, this research utilized the median filter. The median filter takes an area of an image (3x3, 5x5, etc.), sorts the pixel values from lowest to highest, selects the median value from that list and finally replaces the value of the center pixel of the area with that median value.

The ImageMagick library has a function named (-median radius); the study applied this function to the receipt images for eliminating noise and unwanted information from the binarized images. The radius value is 3, because based on the results and experiments obtained, it was found that the median filter applies best with a radius of 3. The result of applying the median filter algorithm to a receipt image is shown in Figure 3.9.

a) Noise and small speckles around the text. b) Most of the noise is eliminated.

Figure 3.9 Results of performing Median filter on an image of the receipt.


E. Morphological operations

Performing the various image preprocessing steps such as the contrast function, the gray-scale function and the binarization process might produce holes in some significant parts of the images. Such holes in the characters make segmenting characters or words hard in the segmentation phase, because a hole might split one character or word into two. For solving such problems on the binarized image, morphological operations are the best choice.

The two well-known and most utilized morphological operations are the dilation and erosion algorithms. Erosion attaches black pixels to the edges of characters in a binarized image, whereas dilation attaches white pixels to the edges of characters. Since applying the various image processing algorithms to the receipt images adds holes to the characters, the erosion algorithm was applied in the proposed Android application to add black pixels. For utilizing the erosion algorithm on the receipt images, a function from the ImageMagick library was used: (-morphology Erode Rectangle:1x1). The function attaches black pixels to the edges of characters in the binarized image by utilizing a rectangle of size 1x1. The result of performing the erosion algorithm on a binarized image is shown in Figure 3.10.

a) Holes are clear on some characters. b) Applying Erosion algorithm on a binarized image.

Figure 3.10 Adding black pixels to the holes of characters by using Erosion algorithm.


F. De-Skewing process

The de-skewing process is also one of the main processes for raising the recognition rate. Directly submitting skewed images to OCR engines gives less than satisfactory results, so the problem of skewed text in the images must be handled before feeding them to the OCR engine. Since today's innovations focus more on mobile applications, images captured with mobile device cameras suffer from text skewing more than images obtained with scanners.

a) A skewed receipt image. b) The de-skewed receipt image.

Figure 3.11 Handling skewing problem on an image of receipt by using a de-skewing function.


Two different methods for de-skewing text were applied to the receipt images in the Android application. The first method reduces the skewing problem by using the canny edge detection algorithm. Since the canny edge detection algorithm cannot correct the skewing problem with one hundred percent accuracy, a second option from the ImageMagick library was used: the (-deskew threshold{%}) function. Based on the recommendations of ImageMagick [67] and the results of the application, the study found that a threshold value of 40% for the deskew function applies very well. The result of performing the deskew function from the ImageMagick library on a skewed receipt image is shown in Figure 3.11.
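For reference, the individual ImageMagick functions described in this section can be chained into a single convert command. The following is a sketch of such a combined call, with assumed input and output file names:

    convert receipt.jpg -contrast -colorspace gray -threshold 40% -median 3 -morphology Erode Rectangle:1x1 -deskew 40% receipt_clean.tif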

3.1.3. Recognition phase

The suggested Android application extracts the receipt region from the background of the receipt image using the canny edge detection algorithm and then applies a series of image processing algorithms. In the recognition phase, the application produces a text file by using the Tesseract OCR engine.

In the previous chapter, a detailed description was given of the two different approaches used by researchers for implementing applications associated with optical character recognition technology: the conventional step-by-step recognition pipeline (segmentation phase, feature extraction phase and classification phase) and utilizing OCR engines. This research utilizes the Tesseract OCR engine [58]; an introduction to Tesseract, its architecture and its main advantages were discussed in the previous chapter.

The Tesseract OCR engine is an open source engine used by researchers to handle various issues associated with optical character recognition technology. It is programmed in two languages, C and C++, and it has wrappers for other programming languages such as PHP and Java. The Tesseract OCR engine can only be utilized through a command line interface because it does not have a GUI (graphical user interface); to utilize Tesseract in any application associated with OCR, researchers have to implement a GUI.

For researchers who want to implement applications associated with OCR, the Tesseract OCR engine offers various options and parameters to fulfill the requirements of their applications; an article listing these options and parameters is available from the official Tesseract website [68]. The various options and parameters offered by the Tesseract OCR engine were investigated, and the research found that adjusting some of these options and techniques is helpful for improving OCR accuracy on the receipt images. The techniques that improved the recognition rate on the receipt images in the suggested Android application are discussed in the following subsections.

In the Windows environment, the Tesseract OCR engine can only be utilized through the command line prompt. A simple invocation of the Tesseract OCR engine on a receipt image is presented below.
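A typical invocation of this form (the output base name, here output, is an assumed example; Tesseract writes the recognized text to output.txt):

    tesseract receiptImage.jpg output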

The above example performs the Tesseract OCR engine on a receipt image named receiptImage.jpg in the Windows environment through the command line interface. By default, Tesseract expects English text in the image. If, for example, we want to tell Tesseract that the image contains text written in the Turkish language, then we must provide this information to the Tesseract OCR engine as below.
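A typical invocation of this form, again with an assumed output base name and with tur as the code of the Turkish trained data:

    tesseract receiptImage.jpg output -l tur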

Now Tesseract knows that the receipt image named receiptImage.jpg contains text written in the Turkish language; as a result, it matches the text against the Turkish trained data.

The Tesseract OCR engine is trained to detect and recognize a number of different fonts, but there are some fonts that Tesseract is not able to identify. The official Tesseract web page published a sequence of techniques for raising the recognition rate of the Tesseract OCR engine, and the study found that some of these techniques are helpful for raising the recognition rate of Tesseract on the receipt images.

The essential techniques performed on the receipt images for raising the recognition rate of Tesseract are, first, applying the canny edge detection algorithm for identifying and extracting the receipt region from the image background and, second, filtering the receipt region by applying a series of image processing algorithms. Beyond these two techniques, there are some techniques related to Tesseract itself that can greatly raise the recognition rate, such as introducing new fonts to the Tesseract OCR engine, introducing a new dictionary, selecting a suitable page segmentation method and improving Tesseract accuracy with regex functions. Each of these techniques is discussed in detail in the following subsections.

A. Tesseract OCR engine training

The Tesseract OCR engine provides the opportunity to introduce to Tesseract new fonts, or fonts it is unfamiliar with. In this way, researchers who want to implement applications associated with OCR can easily apply Tesseract to their documents by training it for new languages or new fonts. Two unusual text fonts were tested for recognition by Tesseract: the fake receipt font [18] and the merchant copy font [19]. For training Tesseract for the new fonts, the step-by-step guideline provided by the official Tesseract webpage [69] was followed; its steps are discussed in the following subsections.

1. Preparing images for training

For creating training files, the Tesseract OCR engine accepts images in tiff format only. The images used for training the Tesseract OCR engine can be obtained in one of the following ways:

 The images can be obtained through a mobile device’s camera.

 The training images can be obtained by converting a text file to an image electronically.

 The training images can be obtained by a scanner.

Some receipt images were captured with a hand-held device and submitted to the Tesseract OCR engine as the training image files. After capturing the receipt images, the images were manually enhanced and preprocessed to be suitable for the training procedure. The images were then manually converted to tiff format, because Tesseract only accepts tiff images for generating training files. Six receipt images were utilized for generating the training files; the text on three of the images is written in the fake receipt font and on the others in the merchant copy font. The receipt images contain up to 1000 characters and are written in the English language. Figure 3.12 shows an example of a receipt image that was used as a training image for generating a training file with Tesseract.

Figure 3.12 An example of a training image used for generating a training file with Tesseract.

2. Box file generating and editing

During the process of training or introducing the Tesseract OCR engine to new fonts, Tesseract cannot work with the images directly; instead, it works with box files to train itself. A box file is a text file that contains the characters that exist in the training image, listing every character one per line together with its coordinates. Through the command prompt in the Windows environment, the following command line is utilized to generate box files from the receipt images:
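A typical command of this form, following the lang.fontname.exp0 file naming convention of Tesseract training (the exact file names here are assumptions based on the naming shown later in this section):

    tesseract fra.fake.exp0.tif fra.fake.exp0 batch.nochop makebox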

The above command line generates a file called a box file, with the .box extension. In most cases, however, Tesseract cannot recognize every character properly, so every wrong recognition of Tesseract has to be corrected. The JTessBoxEditor program [70] was utilized for editing the box files and correcting the wrong recognitions. JTessBoxEditor is a program for training Tesseract and editing the box files generated by Tesseract; it requires the Java runtime environment (JRE) because it is written in the Java programming language.

Using the JTessBoxEditor program, every incorrect recognition by Tesseract must be corrected. This part of the work is the most time-consuming, since for each image it is necessary to go through every character in the image and correct the wrong information. The JTessBoxEditor program provides different features such as box deleting, box adding, box merging and box splitting. A screenshot of the JTessBoxEditor program at work is shown in Figure 3.13.

Figure 3.13 Editing a box file generated by Tesseract by using JTessBoxEditor software.


3. Tesseract running for training

Now it is time to run Tesseract to generate the training file. Both the box file and the tiff image must be provided to Tesseract so that it can train itself for the new fonts. Tesseract then generates a training file (with the .tr extension) that contains the information about the training. For running Tesseract for this purpose, the following command line is utilized in the command prompt in the Windows environment:
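A typical command of this form, using the same assumed file names as above; it produces the training file fra.fake.exp0.tr:

    tesseract fra.fake.exp0.tif fra.fake.exp0 box.train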

4. Preparing a file called Unicharset

To tell Tesseract the set of possible characters it can produce as output, a file called unicharset must be generated. For generating the unicharset file, the following command line is utilized in the command prompt in the Windows environment:
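A typical command of this form, run on the corrected box file (file name assumed as above):

    unicharset_extractor fra.fake.exp0.box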

5. Font properties setting

When adding a new font to the Tesseract OCR engine, the font properties, such as the italic style, bold style and so forth, must be provided to the engine. For that purpose, a text file must be provided to Tesseract with the properties of each font on its own line. The line must be as follows:

Fra.fake.exp0.box 0 0 0 0 0

The five numeric fields of such a line describe, in order, whether the font is italic, bold, fixed-pitch, serif and fraktur. Since this study focused only on recognizing a standard font without any font styles, all the fields are filled with 0.


6. Clustering

To create prototypes of the characters, all the alphabetical features of the characters need to be clustered after the features have been extracted from the training images. For this purpose it is necessary to run two command lines in the command prompt. These commands generate four files: the pffmtable file, the inttemp file, the shapetable file and the normproto file.
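The two commands take the following typical form: mftraining, which generates the inttemp, pffmtable and shapetable files, and cntraining, which generates the normproto file. The file names are assumptions consistent with the earlier steps, and depending on the Tesseract version mftraining may also need the -O flag to write an output unicharset:

    mftraining -F font_properties -U unicharset fra.fake.exp0.tr
    cntraining fra.fake.exp0.tr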

7. Creating trained data

The final step of the Tesseract training procedure is to generate a file called traineddata. This file contains all the information Tesseract needs to recognize the new font when it is later tested on new images. This step merges all the files created in the previous steps (normproto, inttemp, pffmtable, unicharset and shapetable). Before merging, the files must be renamed with a language prefix (such as frb), for which the following command lines were used:
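On the Windows command prompt, the renaming can be done with commands of the following form (using the frb prefix mentioned above):

    ren inttemp frb.inttemp
    ren pffmtable frb.pffmtable
    ren shapetable frb.shapetable
    ren normproto frb.normproto
    ren unicharset frb.unicharset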

Following the renaming, a command line must be run to produce the traineddata file:
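The merging command takes the following form; combine_tessdata collects every file carrying the frb. prefix and produces frb.traineddata:

    combine_tessdata frb.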

After running the above command line, Tesseract produces the traineddata file. This file must be stored in the tessdata directory; after that, Tesseract can properly recognize the new fonts.

B. Page segmentation methods (psm)

The process with the greatest impact on improving the accuracy of OCR applications is the segmentation process, since other processes, such as the feature extraction and classification phases, depend highly on it. Segmentation is the procedure of breaking an image down into several images, i.e., segmenting the lines, words and characters of an image. Researchers have proposed several different segmentation techniques for various applications associated with OCR. One of the advantages Tesseract provides is a choice of several different page segmentation methods: when researchers apply Tesseract in their OCR applications, they have the opportunity to select the most helpful segmentation method for their case. By default, Tesseract expects a page full of text when it tries to recognize text in an image. Every page segmentation method offered by Tesseract was tested on the receipt images, and based on these tests it was observed that the sixth page segmentation method, which expects a single uniform block of text, is superior to the other page segmentation methods for receipt images. The list of page segmentation methods offered by Tesseract is presented in Figure 3.14, and an example invocation follows the figure.

Figure 3.14 The list of page segmentation methods offered by Tesseract.
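An example invocation selecting this method; in the 3.04 version used here the option is -psm, and the value 6 means a single uniform block of text (the file names are assumptions):

    tesseract receiptImage.jpg output -l eng -psm 6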


C. Introducing a new dictionary to Tesseract

Providing a dictionary as a post-processing mechanism is one of the procedures that improve OCR output. Applications associated with OCR check the text produced by the OCR engine against a dictionary to correct words that were recognized wrongly. The Tesseract OCR engine provides an English dictionary for detecting words. Usually, however, most of the words on receipt images are not English words. For this issue, Tesseract gives researchers the ability to disable the dictionary provided by Tesseract and introduce a new one.

The default dictionaries provided by the Tesseract OCR engine were disabled by setting two configuration variables, (load_freq_dawg) and (load_system_dawg), to false. A new dictionary containing the words that exist on the receipt images was then introduced to the Tesseract OCR engine. Based on the tests and findings, this procedure had a great impact on improving the recognition accuracy of the Tesseract OCR engine. A sketch of this configuration is given below.
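One typical way to set these variables is through a plain-text Tesseract configuration file placed in tessdata/configs; the file name receipts below is an assumption for illustration. The file contains the lines:

    load_system_dawg F
    load_freq_dawg F

and its name is then passed on the command line:

    tesseract receiptImage.jpg output -l eng receipts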

D. Regular expression’s improvements

Regular expression functions have many advantages in several different fields. This study used regular expression functions in two different ways: first, for extracting the relevant information from the text produced by the OCR engine, and second, for improving the OCR results. The utilization of this mechanism is discussed through the following example:
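A minimal Java sketch of this kind of correction (the pattern and method name are illustrative, not the exact expressions used in the application):

    // Between digits, replace the symbols _ , - that the OCR engine confuses
    // with a decimal point, so a misread "12,99" or "12_99" becomes "12.99".
    static String fixDecimalPoints(String ocrText) {
        return ocrText.replaceAll("(\\d)[_,-](\\d{2})", "$1.$2");
    }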

The Tesseract OCR engine, like most OCR systems, cannot differentiate some symbols from each other; for example, the dot (.) symbol is similar to (_|,|-). When such situations happen in our application, the application detects the wrong recognitions and amends them. In the above example, the application is taught to change (_|,|-) to the dot (.) between numbers, because on receipt images the numbers, such as the total money spent, always have a dot between the digits rather than one of these other symbols.

3.1.4. Regular expression (Regex) phase

When all the phases related to extracting and recognizing text on the receipt images are finished, Tesseract produces a text file containing the recognized text. From this text file, the relevant information should be extracted and stored in the database for future use. The relevant information is:

• Name, number, phone number, website, and address of the market.

• Purchasing time and date.

• ID number of the receipt.

• Name, price, and ID number of each purchased item.

• Information related to money spent, such as the subtotal, the total, the total tax paid, the cash tendered, and the change due.

A regular expression identifies and extracts parts of a text file by means of a sequence of characters in which each element describes a pattern of strings to be matched. In this research, regex functions were utilized to identify and extract the above-mentioned information (market name, address, and so on). Since regular expression functions provide the facility to parse and inspect text, regex can be regarded as a mini programming language. The most important expressions utilized for extracting the relevant information are presented in Table 3.1.


Table 3.1 The most important regex symbols utilized in this research and their meanings.

Symbol    Meaning

[A-Z]     Matches any uppercase letter from A to Z.

[0-9]     Matches any digit from 0 to 9.

[a-z]     Matches any lowercase letter from a to z.

m+        Matches one or more occurrences of m.

m?        Matches zero or one occurrence of m.

m*        Matches zero or more occurrences of m.

m{N}      Matches a series of exactly N occurrences of m.

[a-Z]     Matches any letter from a (lowercase) to Z (uppercase).

m$        Matches m at the end of a string.

m{2,}     Matches at least two occurrences of m.

^m        Matches m at the start of a string.

m{2,3}    Matches two or three occurrences of m.

OCR outputs vary from one another; therefore, searching for or identifying a string in the text is a challenging task, and every potential case must be covered in order to detect as much information as possible. One of the expressions used in the Android application for finding the total money paid for tax in the recognized text file defines all the potential cases under which the tax amount may appear.
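Since the original pattern survives only as a figure, the following is purely an illustrative sketch, under assumed keywords, of how such a tax line might be captured:

<?php
$recognized = "SUBTOTAL 12.50\nTAX 1.13\nTOTAL 13.63";  // assumed sample text
// Match a tax keyword followed by an amount with two decimal digits.
if (preg_match('/\b(TAX|VAT|SALES\s+TAX)\b[^0-9]*([0-9]+\.[0-9]{2})/i',
               $recognized, $m)) {
    $tax = (float) $m[2];   // 1.13
}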


3.1.5. Database phase

Once every process in the stack of suggested techniques is finished, the significant data obtained must be stored in a database so that users' demands can later be answered by executing queries. A database was designed to store all the information extracted from the recognized text file; it consists of three tables. The first table stores data about the users of the application, such as user ID, name, email, and password. The second table stores data about the market where the user purchased the goods, together with the total money spent and the time and date of purchase. The third table stores the purchased items. Primary and foreign keys handle the relationships between the three tables. The ER diagram (entity-relationship model) of the proposed database is shown in Figure 3.15.

Figure 3.15 ER diagram of the database proposed in this research.
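As an illustration only (table and column names are assumptions; the authoritative schema is the ER diagram in Figure 3.15), the three tables might be created as follows:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=receipts', 'user', 'password');
$pdo->exec('CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100), email VARCHAR(100), password VARCHAR(255))');
$pdo->exec('CREATE TABLE receipts (
    receipt_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT, market_name VARCHAR(100), total DECIMAL(10,2),
    purchase_date DATETIME,
    FOREIGN KEY (user_id) REFERENCES users(user_id))');
$pdo->exec('CREATE TABLE items (
    item_id INT AUTO_INCREMENT PRIMARY KEY,
    receipt_id INT, item_name VARCHAR(100), price DECIMAL(10,2),
    FOREIGN KEY (receipt_id) REFERENCES receipts(receipt_id))');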

3.2. Implementation and Practical Work

An Android application was implemented that extracts the receipt region from the background of the receipt image and then submits the image to the server side for text recognition. A client-server architecture was selected for the suggested Android application: the client side captures a receipt image and sends it to the server, rather than performing all the processing on the client. After the image is uploaded, the server applies all the procedures related to recognizing and extracting the text on the uploaded image. The client-server architecture was chosen for the following important reasons:

• Performing processes such as image processing algorithms and OCR on the client side would affect the performance of the application.

• Saving receipt images to the memory of the mobile device instead of to the server would consume a lot of the device's storage space.

Java, as the Android programming language, and PHP are the two programming languages utilized in this research to build the Android application for extracting and recognizing text on receipt images. Android Studio [71] was used as the official and most widely used IDE for Android. Features related to the client side were implemented in Java, and features related to the server side in PHP, a server-side scripting language. The client-server architecture of the system is shown in Figure 3.16.

Figure 3.16 Structure of the proposed Android application in terms of client-server architecture.


The suggested Android application performs several procedures on the client side: capturing a receipt image or browsing to one in the mobile device's gallery, running the Canny edge detection algorithm to identify and extract the receipt region from the background, and finally submitting the prepared image to the server side. On the server side, the application runs the most important procedures: the image processing algorithms, the Tesseract OCR engine, the identification and extraction of useful information from the recognized text file, and finally the storage of the relevant information in the database. The following subsections discuss the implementation details of both the client and server sides.

A. Client side

The client side of the suggested Android application was designed and implemented in Java in a Windows environment using Android Studio [71]. Android is a comprehensive and effective platform used by most mobile devices. To capture receipt images, the proposed application requires permission from the device to use its cameras; such permission is supplied through the Android SDK (software development kit).

On the client side, several important functions were implemented for extracting and recognizing text on the receipt images. The client side also plays the vital role of establishing the connection between client and server.

Every user of the proposed Android application must have a distinct account, so users must register or log in to the application as the first step. The login GUI is implemented on the client side: when a username and password are entered on the login page, the client asks the server to check them against the accounts registered in the system. If the server matches the information in the database, the user can log in and use the functionality of the application. If no such username is stored in the database, the server replies with a message saying so, and the user must register. The step-by-step processes implemented on the client side are shown in Figure 3.17.

Figure 3.17 Structure of the client side’s step-by-step processes in the suggested Android application.

The next step on the client side is browsing to a receipt image in the mobile device's gallery or capturing a new one with the device's camera. After a receipt image is acquired, the procedure for identifying the receipt region using the Canny edge detection algorithm starts. ScanLibrary [72], an Android library, is used for identifying and extracting the receipt region from the background of the receipt image; it is implemented with OpenCV functions and uses OpenCV's Canny function for this purpose.

The final step on the client side takes place when the receipt region is ready to be submitted to the server for further processing, by simply clicking the OCR button. Before any information is saved to the database, the server posts the OCR output back to the client. This procedure is another way of improving OCR accuracy, since users can examine and manually correct the text returned by the server before the information is stored. The GUI pages for the user queries are also designed on the client side.

B. Server side

When the tasks of the client side are finished, the client submits the receipt region image to the server, which then applies the remaining processes for identifying and extracting the text on the image. The two most important processes are the image processing algorithms and the application of Tesseract to the receipt images. PHP, as the server-side scripting language, is used to implement all the processes on the server side. The step-by-step processes implemented on the server side are shown in Figure 3.18.

Figure 3.18 Structure of the server side’s step-by-step processes in the suggested Android application.


When the receipt region image is submitted, the server starts applying its processes. Saving the receipt image to a folder is the first task performed. The server then applies a series of image processing algorithms to the image: contrast adjustment, gray-scale conversion, thresholding, noise reduction, morphological operations, and de-skewing. To apply these algorithms, a shell script is called through PHP; the open-source ImageMagick library [66] is used to perform all of the mentioned operations on the receipt images. The next step is applying the Tesseract OCR engine to the image to identify and extract its text. Several different techniques for recognizing text in images have been suggested by researchers; based on tests and findings, applying Tesseract to the receipt images fulfils the requirements of a strong Android application. The research found that several techniques can greatly raise Tesseract's recognition rate: introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving Tesseract's accuracy with regex functions. All of these techniques are performed on the server side. Composer [73], a common dependency manager for PHP, was utilized for installing and running Tesseract on the server side. Once all server-side processes are finished, a text file is produced and its contents are posted back to the client, where a page shows the recognized text to the user. The user can click the (ok) button to save the information to the database, or the (cancel) button if the text is not well recognized. If the text is accepted and submitted to the server, the server applies two further processes: first, regex functions are applied to the text through PHP shell scripts to identify and extract the useful data; second, as the final process in the stack of suggested techniques, the useful data is saved to the database for future use.
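The following hedged sketch outlines this server-side pipeline; the file paths, the upload field name, and the particular ImageMagick options are assumptions chosen to mirror a subset of the operations listed above, not the thesis's actual script.

<?php
// 1. Save the uploaded receipt-region image to a folder.
$path = '/var/www/uploads/' . basename($_FILES['receipt']['name']);
move_uploaded_file($_FILES['receipt']['tmp_name'], $path);

// 2. Apply an ImageMagick chain: gray-scale, contrast,
//    noise reduction, de-skewing, and thresholding.
$clean = '/var/www/uploads/clean.png';
exec('convert ' . escapeshellarg($path) .
     ' -colorspace Gray -contrast -despeckle -deskew 40% -threshold 50% ' .
     escapeshellarg($clean));

// 3. Run Tesseract on the cleaned image (output lands in out.txt).
exec('tesseract ' . escapeshellarg($clean) . ' out -psm 6');

// 4. Post the recognized text back to the client.
echo file_get_contents('out.txt');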


3.3. System Screenshots

This section presents screenshots of every page in the suggested Android application. A GUI was implemented to expose the techniques adopted and applied in this research. The application was run on a Samsung Galaxy J5 (2016) under Android 6.0 Marshmallow.

Figure 3.19 shows the login and registration pages of the suggested Android application. This page appears first when users start the application. If a user is already registered, he/she can log in with his/her username and password; otherwise, he/she must register in order to use the application.

a) Login page b) Registration page

Figure 3.19 Screenshots of the login and registration page of the suggested Android application.

When a user has registered or logged in, the home page appears (Figure 3.20). On the home page, the user can click either the camera button or the media button to obtain a receipt image.


Figure 3.20 A screenshot of the home page of the application.

a) Detecting receipt region and cropping b) Accepting receipt Image

Figure 3.21 Screenshots of detecting the receipt region with the Canny edge detection algorithm.


After a receipt image is obtained, the Canny edge detection algorithm runs to identify and separate the receipt region from the background; Figure 3.21 shows screenshots of this process. If the receipt region is cropped properly, the user can click the (done) button and continue with the following steps; otherwise, the user can retake the photo or browse to another image.

After clicking the (done) button, the receipt image appears on the home page, ready to be submitted to the server for text recognition. The user submits the image by clicking the (OCR) button (see Figure 3.22.a). The server then posts the recognized text back to the client, where the application displays it (see Figure 3.22.b).

a) Image is ready for OCR b) Recognized Text

Figure 3.22 Screenshots of submitting receipt image to the server and showing the result.

As shown in Figure 3.22.b, the user can click either the cancel button or the save button. Clicking cancel discards the recognized text, and the user can start the process again with another receipt image. Clicking save submits the text to the server, which extracts the relevant information and stores it in the database for answering the user's later demands.

After all the above processes have been performed, the suggested Android application confirms success by showing the message "your data successfully saved!" to the user; see Figure 3.23. Screenshots of the four user queries are included in Section 4.1 of Chapter 4.

Figure 3.23 A screenshot of the confirmation message shown to the users of the suggested Android application.

4. QUERIES AND EXPERIMENTAL RESULTS

This chapter presents the implemented queries and the experimental outcomes obtained. To demonstrate the aims of the suggested Android application, four queries were implemented to answer users' demands. The first section of this chapter discusses the implemented queries; the second presents the experimental outcomes obtained in this thesis. To show the significance of the suggested techniques, the experiments cover several different examinations.

4.1. User Queries

In the suggested Android application, four queries were designed to fulfil the main concept and aims of the research: the spend analyzer, receipt image discovery, the total money spent between two dates, and the total money spent on a particular item. These queries save considerable time compared with calculating the same information manually by searching across multiple receipts. The purpose of each query is discussed in the following subsections.

4.1.1. Spend analyzer

The spend analyzer is one of the queries implemented in the suggested Android application. In a well-formatted chart, the user can see the spending history of a given year, broken down by month. The user enters a year in a text box and submits it to the server side. The server then executes an SQL query to retrieve the relevant information: it finds all the user's receipts in the database and computes the total money spent in the entered year for each month individually. When finished, the server posts the information to the client, which renders it in a chart implemented with the MPAndroidChart library [74]. Figure 4.1 shows the screenshots for this query, and a sketch of the underlying query follows the figure.

a) Query’s button. b) Query’s home page.

c) (Spend analyzer) result.

Figure 4.1 Screenshots of using the spend analyzer query.
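A hedged sketch of such a query (the table and column names are assumptions, following the schema sketched in Section 3.1.5) groups the user's receipt totals by month for the requested year:

<?php
// $pdo is the PDO connection from the earlier sketch.
$stmt = $pdo->prepare(
    'SELECT MONTH(purchase_date) AS month, SUM(total) AS spent
       FROM receipts
      WHERE user_id = ? AND YEAR(purchase_date) = ?
      GROUP BY MONTH(purchase_date)');
$stmt->execute([$userId, $year]);          // e.g. $year = 2016
$perMonth = $stmt->fetchAll(PDO::FETCH_ASSOC);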


4.1.2. Receipt image discovering

Discovering a receipt image in the database of the suggested Android application is another query. The user simply enters one of the following pieces of information in the search box:

• Name of the market,

• Date of purchase,

• Or the name of an item bought by the user.

The user then submits the information to the server side by clicking the search button. The server executes an SQL query that finds the receipt images containing the submitted information and posts all the discovered images back to the client. Finally, the client sorts the images in a designed GUI and presents them to the user. Figure 4.2 shows the screenshots for this query.

a) Query’s button. b) Query’s home page.


c) Finding receipt images of Costco stores. d) Clicked receipt image.

Figure 4.2 Screenshots of using the (Discovering images of receipts) query.

4.1.3. Total money expended

Determining the total amount of money spent between two dates is another query suggested in this Android application. A GUI was implemented for this purpose: the user enters the start and end dates of the period and submits them to the server by clicking the (find) button. The server then executes an SQL query that finds all the user's receipts between the two dates, computes the total amount of money spent, and returns the result. A page on the client side then presents the computed total to the user. Figure 4.3 shows the screenshots for this query, and a sketch of the underlying query follows the figure.


a) Query’s button. b) Query’s home page.

c) (Total money spent) result.

Figure 4.3 Screenshots of using the total money expended query.
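Under the same assumed schema, a minimal sketch of the underlying query:

<?php
$stmt = $pdo->prepare(
    'SELECT SUM(total) AS spent
       FROM receipts
      WHERE user_id = ? AND purchase_date BETWEEN ? AND ?');
$stmt->execute([$userId, $from, $to]);     // e.g. '2016-01-01', '2016-03-31'
$totalSpent = $stmt->fetchColumn();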


4.1.4. Total money expended for a particular item

Determining the total money spent on a particular item between two dates is another implemented query. First, the name of the item of interest is written in the search box and submitted to the server. The server runs an SQL query that finds all items containing the provided name and posts the matching item names back to the client. The next step is selecting one of the item names and re-submitting it together with the date range of the calculation. The server then executes an SQL query that computes the total spent on that item between the two dates and posts the result to the client, where a page presents the computed total to the user. Figure 4.4 shows the screenshots for this query, and a sketch of the two steps follows the figure.

a) Query’s button. b) Query’s home page.


c) Finding goods containing the name "chocolate". d) Total money spent on chocolate.

Figure 4.4 Screenshots of using the (Total money expended for a particular item) query.
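Under the same assumed schema, the two steps might look as follows: first the name search, then the dated total for the chosen item.

<?php
// Step 1: find item names containing the search term.
$stmt = $pdo->prepare(
    'SELECT DISTINCT item_name FROM items WHERE item_name LIKE ?');
$stmt->execute(['%' . $term . '%']);       // e.g. $term = 'chocolate'
$candidates = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Step 2: total spent on the selected item between two dates.
$stmt = $pdo->prepare(
    'SELECT SUM(i.price) FROM items i
       JOIN receipts r ON r.receipt_id = i.receipt_id
      WHERE r.user_id = ? AND i.item_name = ?
        AND r.purchase_date BETWEEN ? AND ?');
$stmt->execute([$userId, $itemName, $from, $to]);
$itemTotal = $stmt->fetchColumn();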

4.2. Experimental Outcomes

Checking the character and word recognition accuracy of an application, together with its time performance, is the standard procedure for evaluating applications in which OCR plays a major role. This section presents the efficiency and accuracy of the suggested Android application in five subsections. The first discusses the metrics used to evaluate the application's capability. The second describes the examination corpus used for testing. The third and fourth present the experimental outcomes in tabular and histogram form for two unusual text fonts: two different datasets were used, the first consisting of receipt images in the Fake Receipt text font and the second of receipt images in the Merchant Copy text font. Outcomes for the first dataset appear in the third subsection and outcomes for the second dataset in the fourth. The fifth and final subsection compares and evaluates the experimental outcomes achieved.

4.2.1. Capability metrics

In the Fifth Annual Test of OCR Accuracy [110], an institute at the University of Nevada suggested two important metrics for quantifying the capability of applications in which optical character recognition plays a crucial role: character accuracy and word accuracy. Both metrics are used here to show the capability of the proposed Android application and are discussed in the following subsections.

A) Character accuracy

Character accuracy is one of the metrics suggested for evaluating the capability of OCR applications. It is the percentage of characters recognized successfully, computed by the equation shown below:

character accuracy = ((total number of characters − number of errors) / total number of characters) × 100 %   (4.1)

The equation subtracts the number of characters that were not recognized successfully from the total number of characters, divides the result by the total number of characters, and multiplies by 100 % to obtain a percentage.
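For instance, receipt image 1 in Table 4.1 (presented later) contains 974 characters; its reported character accuracy of 96.40 % corresponds to (974 − 35) / 974 × 100 % ≈ 96.4 %, i.e. roughly 35 misrecognized characters.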

B) Word accuracy

Word accuracy is another metric suggested for evaluating the capability of OCR applications. It is computed by the equation shown below:

word accuracy = (number of accurate words / total number of words) × 100 %   (4.2)


Word accuracy is very strict, since a single mismatched character causes the whole word to be counted as not recognized. It is, however, case-insensitive: if a character merely has the wrong case, the word is not counted as a wrong detection.
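For instance, receipt image 1 in Table 4.1 contains 188 words; its reported word accuracy of 86.70 % corresponds to 163 / 188 × 100 % ≈ 86.7 %, i.e. 163 correctly recognized words.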

The efficiency of the suggested Android application in terms of time was also measured, as another important metric of the capability of OCR applications. Client-server systems usually suffer from network latency; this latency was not considered while testing the suggested Android application.

4.2.2. Examination corpus

Following common recommendations, a number of receipts were created for measuring the capability of the suggested Android application. These receipts contain the usual information: market name, market number, phone number, website, address, purchase time and date, receipt ID, and the name and price of each purchased item. Information related to spending, such as the subtotal, the total, the total tax paid, the cash tendered, and the change due, is also shown. Two unusual text fonts were used on the receipts, Fake Receipt and Merchant Copy; the text is very small, and the text lines are close to each other. The receipts created fulfil the requirements of an appropriate evaluation.

The receipt images were captured with a mobile device camera under various conditions: a number of images suffer from tilting, skewing, or uneven lighting, while others were captured in good conditions. Two datasets of receipt images were created, each containing 10 images captured under the same conditions; the only difference between the datasets is the text font. Fake Receipt is the font of the first dataset and Merchant Copy the font of the second, which shows the capability of the suggested Android application for different text fonts. The receipts contain up to 190 words and up to 975 characters.


4.2.3. Fake receipt font experimental outcomes

This section shows the capability of the suggested Android application in recognizing the Fake Receipt text font on receipt images. The dataset for this font consists of 10 receipt images, and the font was tested under four different examinations. The first examination uses all the algorithms selected for raising Tesseract's recognition rate. The second uses all the algorithms except the image processing algorithms (retaining only the Canny edge detection and thresholding algorithms). The third uses all the algorithms except the techniques selected for raising Tesseract's recognition rate. The fourth submits the receipt images directly to Tesseract without any of the selected algorithms (neither image processing nor Tesseract improvements). Time performance, word accuracy, and character accuracy are computed for every examination. The following subsections cover the details and outcomes of all the examinations.

A. Examination 1

In this examination, the capability of the suggested Android application for detecting the Fake Receipt text font is tested using all the algorithms selected for raising Tesseract's recognition rate: the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions) and the image processing algorithms (contrast, gray-scale, thresholding, noise reduction, morphological operations, and de-skewing). Time performance, character accuracy, and word accuracy are computed for all the images in this examination.

The experimental outcomes are presented in Table 4.1. For each receipt image, the table gives the number of words and characters (without spaces), the word and character accuracy percentages, and the time performance of the application. The averages are computed for each column of the table.

Table 4.1 Outcomes obtained for fake receipt text font in the first examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 86.70 % 96.40 % 6.81s

2 141 725 90.78 % 98.56 % 5.91s

3 151 839 86.09 % 97.13 % 6.86s

4 163 715 88.34 % 95.80 % 5.73s

5 153 722 93.46 % 97.50 % 5.84s

6 173 930 84.39 % 96.23 % 6.77s

7 159 802 90.56 % 97.13 % 6.37s

8 157 929 84.71 % 96.66 % 7.24s

9 168 888 92.26 % 97.86 % 6.83s

10 158 878 89.87 % 97.60 % 6.76s

Avg. 161 840 88.71 % 97.08 % 6.51s

The results of this examination are 88.71 % word accuracy and 97.08 % character accuracy, with a time performance of 6.51 sec per image. This examination proves that the suggested techniques greatly raise the recognition rate. As mentioned, a single mismatched character causes a word to be counted as wrongly recognized; therefore, word and character accuracy percentages usually differ considerably in most research, producing a large gap between them. Histograms of the word and character accuracy percentages for this examination are presented in Figure 4.5.



a) Word accuracy histogram


b) Character accuracy histogram

Figure 4.5 Word and character rates histogram for fake receipt font in the first case.

B. Examination 2

The capability of the suggested Android application for detecting the Fake Receipt text font is examined here without some of the image processing algorithms (contrast, gray-scale, noise reduction, morphological operations, and de-skewing) but with all the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions). This examination demonstrates the impact of the suggested image processing algorithms. Time performance, word accuracy, and character accuracy are computed for all the images. The experimental outcomes are presented in Table 4.2.

Table 4.2 Outcomes obtained for fake receipt text font in the second examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 79.78 % 93.01 % 5.97s

2 141 725 83.68 % 94.34 % 4.96s

3 151 839 82.11 % 91.77 % 5.03s

4 163 715 81.59 % 90.76 % 5.10s

5 153 722 83.66 % 89.88 % 5.25s

6 173 930 76.30 % 91.39 % 6.80s

7 159 802 78.61 % 90.02 % 5.18s

8 157 929 79.61 % 93.75 % 5.91s

9 168 888 74.40 % 87.61 % 5.16s

10 158 878 77.21 % 92.93 % 5.76s

Avg. 161 840 79.69 % 91.54 % 5.51s

The results of this examination are 79.69 % word accuracy and 91.54 % character accuracy, with a time performance of 5.51 sec. This examination proves that the suggested image processing algorithms have a great impact on the recognition rate: compared with the first examination, the results decreased by 9.02 % and 5.54 % for word and character accuracy respectively. Histograms of the word and character accuracy percentages for this examination are presented in Figure 4.6.


a) Word accuracy histogram


b) Character accuracy histogram

Figure 4.6 Word and character rates histogram for fake receipt font in the second case.


C. Examination 3

The capability of the suggested Android application for detecting the Fake Receipt text font is examined here with all the image processing algorithms (contrast, gray-scale, thresholding, noise reduction, morphological operations, and de-skewing) but without the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions). This examination demonstrates the impact of the suggested Tesseract improvement techniques. Again, time performance, word accuracy, and character accuracy are computed for all the images. The experimental outcomes are presented in Table 4.3.

Table 4.3 Outcomes obtained for fake receipt text font in the third examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 69.14 % 90.65 % 6.54s

2 141 725 64.53 % 91.03 % 5.67s

3 151 839 74.17 % 93.20 % 6.57s

4 163 715 50.30 % 79.44 % 5.65s

5 153 722 59.47 % 87.11 % 6.14s

6 173 930 73.41 % 92.90 % 6.54s

7 159 802 60.37 % 89.52 % 5.96s

8 157 929 64.33 % 90.85 % 7.05s

9 168 888 63.69 % 89.97 % 6.27s

10 158 878 72.15 % 93.84 % 6.36s

Avg. 161 840 65.15 % 89.85 % 6.27s


The results of this examination are 65.15 % word accuracy and 89.85 % character accuracy, with a time performance of 6.27 sec. This examination proves that the suggested Tesseract improvement techniques raise the recognition rates, and that they matter more than some of the image processing algorithms: ignoring some of the image processing algorithms decreased the results by 9.02 % and 5.54 % for word and character accuracy respectively, whereas ignoring the Tesseract improvement techniques decreased them by 23.56 % and 7.23 %. Histograms of the word and character accuracy percentages are presented in Figure 4.7.


a) Word accuracy histogram



b) Character accuracy histogram.

Figure 4.7 Word and character rates histogram for fake receipt font in the third case.

D. Examination 4

The capability of the suggested Android application for detecting the Fake Receipt text font is examined here while ignoring all the techniques suggested for raising the recognition rate: the image processing algorithms (contrast, gray-scale, and so forth) and the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, and so on). Time performance, word accuracy, and character accuracy are computed for all the images. The experimental outcomes are presented in Table 4.4.


Table 4.4 Outcomes obtained for fake receipt text font in the fourth examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 55.31 % 86.34 % 5.84s

2 141 725 51.77 % 82.75 % 4.97s

3 151 839 47.68 % 80.45 % 5.40s

4 163 715 46.01 % 72.02 % 4.36s

5 153 722 48.36 % 78.53 % 5.33s

6 173 930 58.38 % 88.92 % 5.27s

7 159 802 53.45 % 83.91 % 5.08s

8 157 929 49.68 % 87.83 % 5.88s

9 168 888 60.71 % 87.38 % 5.70s

10 158 878 49.36 % 84.16 % 5.51s

Avg. 161 840 52.07 % 83.22 % 5.33s

The results of this examination are 52.07 % word accuracy and 83.22 % character accuracy, with a time performance of 5.33 sec. This examination shows that the suggested techniques have a great impact on the recognition rate: compared with the first examination, the results decreased by 36.64 % and 13.86 % for word and character accuracy respectively. Submitting receipt images directly to Tesseract without the techniques discussed therefore yields much worse outcomes. Histograms of the word and character accuracy percentages are presented in Figure 4.8.



a) Word accuracy histogram


b) Character accuracy histogram.

Figure 4.8 Word and character rates histogram for fake receipt font in the fourth case.

4.2.4. Merchant copy font experimental outcomes

This section shows the capability of the suggested Android application in recognizing the Merchant Copy text font on receipt images. The dataset for this font consists of 10 receipt images and is almost the same as the first dataset; the only difference between the two datasets is the text font. This font was tested under the same four examinations: the first uses all the algorithms selected for raising Tesseract's recognition rate; the second uses all the algorithms except the image processing algorithms (retaining only the Canny edge detection and thresholding algorithms); the third uses all the algorithms except the techniques selected for raising Tesseract's recognition rate; and the fourth submits the receipt images directly to Tesseract without any of the selected algorithms (neither image processing nor Tesseract improvements). Time performance, word accuracy, and character accuracy are computed for all the examinations. The following subsections cover the outcomes of all the examinations.

A. Examination 1

The capability of the application for detecting the Merchant Copy text font is examined using all the algorithms and techniques proposed in this research for improving OCR accuracy: the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions) and the image processing algorithms (contrast, gray-scale, thresholding, noise reduction, morphological operations, and de-skewing). The experimental outcomes are presented in Table 4.5.


Table 4.5 Outcomes obtained for merchant copy text font in the first examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 85.10 % 96.09 % 7.43s

2 141 725 94.32 % 98.75 % 6.15s

3 151 839 83.44 % 93.32 % 6.41s

4 163 715 87.73 % 95.38 % 6.07s

5 153 722 96.07 % 99.03 % 6.28s

6 173 930 93.06 % 98.49 % 7.44s

7 159 802 88.67 % 96.75 % 6.48s

8 157 929 82.80 % 94.40 % 7.90s

9 168 888 84.52 % 91.44 % 6.13s

10 158 878 91.77 % 97.83 % 5.97s

Avg. 161 840 88.74 % 96.14 % 6.62s

The results of this examination are 88.74 % word accuracy and 96.14 % character accuracy, with a time performance of 6.62 sec. This examination proves that the suggested techniques greatly improve the recognition rates. As mentioned, a single mismatched character causes a word to be counted as wrongly recognized, which is why the word and character accuracy percentages in this study differ considerably. Histograms of the word and character accuracy percentages are presented in Figure 4.9.



a) Word accuracy histogram.


b) Character accuracy histogram.

Figure 4.9 Word and character rates histogram for merchant copy font in the first case.

B. Examination 2

The capability of the application for detecting the Merchant Copy text font is examined here without some of the image processing algorithms (contrast, gray-scale, noise reduction, morphological operations, and de-skewing) but with all the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions). The benefits of the image processing algorithms are clearly evident from this examination. The experimental outcomes are presented in Table 4.6.

Table 4.6 Outcomes obtained for merchant copy text font in the second examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 76.59 % 92.19 % 6.83s

2 141 725 88.25 % 95.72 % 5.48s

3 151 839 75.49 % 90.22 % 4.61s

4 163 715 84.66 % 91.72 % 5.89s

5 153 722 85.62 % 95.29 % 5.96s

6 173 930 91.32 % 96.12 % 6.01s

7 159 802 81.13 % 91.77 % 6.15s

8 157 929 77.70 % 92.24 % 5.02s

9 168 888 79.76 % 87.95 % 5.32s

10 158 878 82.91 % 94.30 % 5.51s

Avg. 161 840 82.34 % 92.75 % 5.67s

The results of this examination are 82.34 % and 92.75 % for word and character accuracy respectively, with a time performance of 5.67 sec. This examination proves that the suggested image processing algorithms greatly improve the recognition rates: compared with the first examination of this experiment, the results decreased by 6.40 % and 3.39 % for word and character accuracy respectively. Histograms of the word and character accuracy percentages for this examination are presented in Figure 4.10.



a) Word accuracy histogram.


b) Character accuracy histogram.

Figure 4.10 Word and character rates histogram for merchant copy font in the second case.

C. Examination 3

The capability of the application for detecting the Merchant Copy text font is examined here with all the image processing algorithms (contrast, gray-scale, and so on) but without the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, and so on). This examination demonstrates the benefits of the Tesseract-related techniques for improving the recognition rates. The experimental outcomes are presented in Table 4.7.

Table 4.7 Outcomes obtained for merchant copy text font in the third examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 64.89 % 89.11 % 6.73s

2 141 725 76.59 % 92.82 % 6.18s

3 151 839 66.22 % 86.29 % 6.37s

4 163 715 66.87 % 83.63 % 5.67s

5 153 722 81.04 % 94.32 % 5.93s

6 173 930 76.30 % 94.40 % 7.36s

7 159 802 74.84 % 92.89 % 6.25s

8 157 929 73.88 % 91.06 % 6.87s

9 168 888 63.09 % 87.04 % 6.32s

10 158 878 79.15 % 93.96 % 5.76s

Avg. 161 840 72.28 % 90.55 % 6.34s

The results of this examination are 72.28 % word accuracy and 90.55 % character accuracy, with a time performance of 6.34 sec. This examination proves that the suggested Tesseract improvement techniques greatly improve the recognition rates and are more important than some of the image processing algorithms: ignoring some of the image processing algorithms decreased the results by 6.40 % and 3.39 % for word and character accuracy respectively, whereas ignoring the Tesseract improvement techniques decreased them by 16.46 % and 5.59 %. Histograms of the word and character accuracy percentages are presented in Figure 4.11.



a) Word accuracy histogram.


b) Character accuracy histogram.

Figure 4.11 Word and character rates histogram for merchant copy font in the third case.

D. Examination 4

The capability of the application for detecting the Merchant Copy text font is examined here while ignoring all the techniques suggested for improving the recognition rates: the image processing algorithms (contrast, gray-scale, and so forth) and the Tesseract improvement techniques (introducing new fonts, introducing a new dictionary, and so on). The experimental outcomes are presented in Table 4.8.

Table 4.8 Outcomes obtained for merchant copy text font in the fourth examination.

Image no.   Word no.   Character no.   Word accuracy   Character accuracy   Time (sec)

1 188 974 50.53 % 78.64 % 5.21s

2 141 725 64.53 % 86.06 % 4.95s

3 151 839 52.98 % 79.85 % 4.53s

4 163 715 67.48 % 80.00 % 5.73s

5 153 722 81.04 % 92.38 % 5.13s

6 173 930 67.05 % 88.60 % 5.07s

7 159 802 62.26 % 84.28 % 4.41s

8 157 929 63.69 % 85.79 % 5.84s

9 168 888 58.33 % 84.45 % 4.61s

10 158 878 72.78 % 90.66 % 4.21s

Avg. 161 840 64.06 % 85.07 % 4.96s

The results of this examination are 64.06 % word accuracy and 85.07 % character accuracy, with a time performance of 4.96 sec. This examination proves that the suggested techniques have a great impact on the recognition rates: compared with the first examination, the results decreased by 24.68 % and 11.07 % for word and character accuracy respectively. Submitting receipt images directly to Tesseract without the techniques discussed therefore gives less than satisfactory results. Histograms of the word and character accuracy percentages are presented in Figure 4.12.



a) Word accuracy histogram.


b) Character accuracy histogram.

Figure 4.12 Word and character rates histogram for merchant copy font in the fourth case.

4.2.5. Evaluation of outcomes experienced

This section compares the outcome percentages obtained for the two fonts. All the results are summarized in Table 4.9, where word accuracy is denoted W%, character accuracy C%, and time performance Ts; examinations 1 to 4 are denoted E1, E2, E3, and E4 respectively. The outcome percentages of all four examinations are shown for both fonts, and finally the averages over both fonts are computed.

Table 4.9 Average percentages of outcomes obtained for two different fonts.

Text font              Metric   E1        E2        E3        E4

Fake receipt           W%       88.71 %   79.69 %   65.15 %   52.07 %
                       C%       97.08 %   91.54 %   89.85 %   83.22 %
                       Ts       6.51      5.51      6.27      5.33

Merchant copy          W%       88.74 %   82.34 %   72.28 %   64.06 %
                       C%       96.14 %   92.75 %   90.55 %   85.07 %
                       Ts       6.62      5.67      6.34      4.96

Average (both fonts)   W%       88.72 %   81.01 %   68.71 %   58.06 %
                       C%       96.61 %   92.14 %   90.20 %   84.14 %
                       Ts       6.56      5.59      6.30      5.14

The suggested Android application was tested on two different text fonts on receipt images, and it was shown that it can recognize both fonts properly. In the first examination of both experiments (Fake Receipt and Merchant Copy), the application was tested with all the algorithms and techniques selected for improving Tesseract's recognition rates. With all the suggested techniques and averaged over both fonts, the application achieved 88.72 % word accuracy and 96.61 % character accuracy, with a time performance of 6.56 sec for processing a single receipt image.

Applying the Tesseract improvement techniques to the receipt images (introducing new fonts, introducing a new dictionary, selecting a suitable page segmentation method, and improving accuracy with regex functions) improved the recognition rates by 20.01 % and 6.41 % for word and character accuracy respectively. Applying some of the image processing algorithms (contrast, gray-scale, thresholding, noise reduction, morphological operations, and de-skewing) improved them by a further 7.71 % and 4.47 % respectively.

Submitting receipt images directly to the application while discarding all the techniques suggested in this research produced poor outcomes: 58.06 % word accuracy and 84.14 % character accuracy. Beyond improving Tesseract's outcomes on receipt images through the suggested techniques, this also demonstrated that submitting receipt images directly to Tesseract for text recognition gives poor results. Overall, the proposed techniques improved the recognition rates in this research by 30.66 % and 12.47 % for word and character accuracy respectively.

5. CONCLUSION AND FUTURE WORKS

In this research, OCR technology was used and studied in detail to handle the problem of applying OCR to receipt images. Optical character recognition is a process that takes an image as input and generates the text contained in it; it converts text from handwritten, printed, or scanned images into editable text for further analysis and processing. This research proposed an OCR system that integrates OCR technology with hand-held devices. Various image processing algorithms, OCR techniques, and OCR engines were studied in order to build a new OCR application for tracking daily shopping receipts easily. A comprehensive literature review of the OCR field was given: first, the main issues associated with images acquired through mobile device cameras and with OCR techniques were described; second, several usages of OCR technology in different fields were discussed; third, the main pipeline of steps and techniques required for designing OCR applications was categorized and discussed; finally, recent and powerful OCR engines were listed and discussed.

The suggested Android application consists of five main processes: receipt region detection, image enhancement and preprocessing, application of the Tesseract OCR engine, regular expressions, and database storage. First, the application applies the Canny edge detection (CED) algorithm to identify the receipt region against the image background. The system then applies a series of image processing techniques to smooth the image: contrast adjustment, gray-scale conversion, thresholding, noise reduction, morphological operations, and de-skewing. In the recognition stage, the application applies the Tesseract OCR engine to extract and recognize the text on the receipt image. The study found various techniques for raising Tesseract's recognition rates: training Tesseract for new fonts, improving results with regex (regular expressions), selecting a suitable page segmentation method, and introducing a new dictionary to Tesseract.

Twenty receipt images, containing up to 188 words and up to 974 characters, were used to test the capability of the proposed Android application. To demonstrate the benefits of the techniques and methods applied in this research, the application was evaluated under four different examinations. With all the proposed techniques applied to the two fonts, the suggested Android application achieved 88.72% word accuracy and 96.61% character accuracy, with a processing time of 6.56 seconds. The results show that the proposed techniques greatly improve Tesseract's recognition of receipt images: submitting receipt images to Tesseract without them produces unsatisfactory results of 58.06% word accuracy and 84.14% character accuracy. Finally, four queries were designed to fulfill the main concept and aims of the research: the spend analyzer, receipt image discovery, the total amount of money spent between two dates, and the total amount spent on a particular item between two dates; a sketch of the last two queries follows.
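As an illustration only, the two date-range queries could be expressed over a hypothetical SQLite schema such as receipts(id, merchant, purchase_date, total) and items(receipt_id, name, price). The table and column names below are assumptions for the example, not the schema used in the thesis.

```java
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

public class ExpenseQueries {

    // Dates are assumed to be stored as ISO-8601 strings ("YYYY-MM-DD"),
    // which makes BETWEEN comparisons on TEXT columns behave correctly.

    /** Total money spent between two dates (hypothetical receipts table). */
    public static double totalSpentBetween(SQLiteDatabase db, String from, String to) {
        Cursor c = db.rawQuery(
                "SELECT SUM(total) FROM receipts WHERE purchase_date BETWEEN ? AND ?",
                new String[]{from, to});
        try {
            return c.moveToFirst() ? c.getDouble(0) : 0.0;
        } finally {
            c.close();
        }
    }

    /** Total spent on a particular item between two dates (hypothetical items table). */
    public static double itemTotalBetween(SQLiteDatabase db, String item,
                                          String from, String to) {
        Cursor c = db.rawQuery(
                "SELECT SUM(i.price) FROM items i JOIN receipts r ON i.receipt_id = r.id "
              + "WHERE i.name LIKE ? AND r.purchase_date BETWEEN ? AND ?",
                new String[]{"%" + item + "%", from, to});
        try {
            return c.moveToFirst() ? c.getDouble(0) : 0.0;
        } finally {
            c.close();
        }
    }
}
```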

As future work, the proposed Android application can be improved in several ways: studying and investigating additional methods and techniques for raising recognition rates on receipt images, extending the application to recognize text on receipts of any shape or layout, and recognizing text on receipts in other languages, such as Arabic or Turkish.

6. REFERENCES

[1] Duda, R.O., Hart, P.E. and Stork, D.G., 2012. Pattern Classification. John Wiley & Sons.
[2] Omee, F.Y., Himel, S.S., Bikas, M. and Naser, A., 2012. A complete workflow for development of Bangla OCR. arXiv preprint arXiv:1204.1198.
[3] Shinde, A.A. and Chougule, D.G., 2012. Text pre-processing and text segmentation for OCR. International Journal of Computer Science Engineering and Technology, pp. 810-812.
[4] Patel, C., Patel, A. and Patel, D., 2012. Optical character recognition by open source OCR tool Tesseract: A case study. International Journal of Computer Applications, 55(10).
[5] Dervisevic, I., 2006. Machine Learning Methods for Optical Character Recognition.
[6] Chang, Y., Chen, D., Zhang, Y. and Yang, J., 2009. An image-based automatic Arabic translation system. Pattern Recognition, 42(9), pp. 2127-2134.
[7] Number of mobile phone users worldwide from 2013 to 2019 (in billions). [Online]. Available at: https://www.statista.com/statistics/274774/forecast-of-mobile-phone-users-worldwide/ (Accessed on 7 Jan. 2017).
[8] Ye, Q. and Doermann, D., 2015. Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7), pp. 1480-1500.
[9] Feild, J. and Learned-Miller, E.G., 2012. Scene text recognition with bilateral regression. UMass Amherst Technical Report.
[10] Trier, O.D. and Jain, A.K., 1995. Goal-directed evaluation of binarization methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(12), pp. 1191-1201.
[11] Sauvola, J., Seppanen, T., Haapakoski, S. and Pietikainen, M., 1997, August. Adaptive document binarization. In Document Analysis and Recognition, 1997, Proceedings of the Fourth International Conference on (Vol. 1, pp. 147-152). IEEE.
[12] Jain, A., Dubey, A., Gupta, R., Jain, N. and Tripathi, P., 2013. Fundamental challenges to mobile based OCR. Vol. 2, pp. 86-101.

[13] Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), pp. 62-66.
[14] Deskew command line from ImageMagick library. [Online]. Available at: https://www.imagemagick.org/script/command-line-options.php#deskew (Accessed on 19 Jan. 2017).
[15] Rotation/deskewing for improving quality of images for OCR. [Online]. Available at: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#rotation--deskewing (Accessed on 19 Jan. 2017).
[16] Qi, X.Y., Zhang, L. and Tan, C.L., 2005, August. Motion deblurring for optical character recognition. In Document Analysis and Recognition, 2005, Proceedings of the Eighth International Conference on (pp. 389-393). IEEE.
[17] Liu, J., Li, H., Zhang, S. and Liang, W., 2011, September. A novel italic detection and rectification method for Chinese advertising images. In Document Analysis and Recognition (ICDAR), 2011 International Conference on (pp. 698-702). IEEE.
[18] Fake Receipt font. [Online]. Available at: http://www.1001fonts.com/fake-receipt-font.html (Accessed on 20 Jan. 2017).
[19] Merchant Copy font family. [Online]. Available at: http://www.1001fonts.com/merchant-copy-font.html (Accessed on 20 Jan. 2017).
[20] Plamondon, R. and Srihari, S.N., 2000. Online and off-line handwriting recognition: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), pp. 63-84.
[21] Ganis, M.D., Wilson, C.L. and Blue, J.L., 1998. Neural network-based systems for handprint OCR applications. IEEE Transactions on Image Processing, 7(8), pp. 1097-1112.
[22] Gossweiler, R., Kamvar, M. and Baluja, S., 2009, April. What's up CAPTCHA?: A CAPTCHA based on image orientation. In Proceedings of the 18th International Conference on World Wide Web (pp. 841-850). ACM.
[23] Gao, J., Blasch, E., Pham, K., Chen, G., Shen, D. and Wang, Z., 2013, May. Automatic vehicle license plate recognition with color component texture detection and template matching. In SPIE Defense, Security, and Sensing (pp. 87390Z-87390Z). International Society for Optics and Photonics.


[24] Dong, W. and Shisheng, Z., 2008, December. Color image recognition method based on the Prewitt operator. In Computer Science and Software Engineering, 2008 International Conference on (Vol. 6, pp. 170-173). IEEE.
[25] Canny, J., 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6), pp. 679-698.
[26] Wang, X., 2007. Laplacian operator-based edge detectors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), pp. 886-890.
[27] Senthilkumaran, N. and Rajesh, R., 2009. Edge detection techniques for image segmentation: a survey of soft computing approaches. International Journal of Recent Trends in Engineering, 1(2), pp. 250-254.
[28] Doronicheva, A.V., Socolov, A.A. and Savin, S.Z., 2014. Using Sobel operator for automatic edge detection in medical images. Journal of Mathematics and System Science, 4(4).
[29] Finding blocks of text in an image using Python, OpenCV and numpy. [Online]. Available at: http://www.danvk.org/2015/01/07/finding-blocks-of-text-in-an-image-using-python-opencv-and-numpy.html (Accessed on 24 Jan. 2017).
[30] Kaur, S., Mann, P.S. and Khurana, S., 2013. Page segmentation in OCR system: a review. International Journal of Computer Science and Information Technologies, 4(3), pp. 420-422.
[31] Akram, M. and Hussain, S., 2010, August. Word segmentation for Urdu OCR system. In Proceedings of the 8th Workshop on Asian Language Resources, Beijing, China (pp. 88-94).
[32] Shinde, A.A. and Chougule, D.G., 2012. Text pre-processing and text segmentation for OCR. International Journal of Computer Science Engineering and Technology, pp. 810-812.
[33] Namboodiri, A.M. and Jain, A.K., 2004. Online handwritten script recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1), pp. 124-130.


[34] Khandelwal, A., Choudhury, P., Sarkar, R., Basu, S., Nasipuri, M. and Das, N., 2009, December. Text line segmentation for unconstrained handwritten document images using neighborhood connected component analysis. In International Conference on Pattern Recognition and Machine Intelligence (pp. 369-374). Springer Berlin Heidelberg.
[35] Basu, S., Chaudhuri, C., Kundu, M., Nasipuri, M. and Basu, D.K., 2007. Text line extraction from multi-skewed handwritten documents. Pattern Recognition, 40(6), pp. 1825-1839.
[36] Saha, S., Basu, S., Nasipuri, M. and Basu, D.K., 2010. A Hough transform based technique for text segmentation. arXiv preprint arXiv:1002.4048.
[37] Trier, Ø.D., Jain, A.K. and Taxt, T., 1996. Feature extraction methods for character recognition: a survey. Pattern Recognition, 29(4), pp. 641-662.
[38] Sharma, O.P., Ghose, M.K., Shah, K.B. and Thakur, B.K., 2013. Recent trends and tools for feature extraction in OCR technology. International Journal of Soft Computing and Engineering, 2(6), pp. 220-223.
[39] Pradeep, J., Srinivasan, E. and Himavathi, S., 2011, April. Diagonal based feature extraction for handwritten character recognition system using neural network. In Electronics Computer Technology (ICECT), 2011 3rd International Conference on (Vol. 4, pp. 364-368). IEEE.
[40] Suen, C.Y., 1986. Character recognition by computer and applications. Handbook of Pattern Recognition and Image Processing, pp. 569-586.
[41] Rehman, A. and Saba, T., 2014. Neural networks for document image preprocessing: state of the art. Artificial Intelligence Review, 42(2), pp. 253-273.
[42] Saba, T., 2012. Offline cursive touched script non-linear segmentation (Doctoral dissertation, Universiti Teknologi Malaysia).
[43] Rehman, A., 2010. Offline cursive character recognition based on heuristics techniques (PhD thesis, Universiti Teknologi Malaysia), pp. 80-85.
[44] Singh, S. and Hewitt, M., 2000. Cursive digit and character recognition in CEDAR database. In Pattern Recognition, 2000, Proceedings of the 15th International Conference on (Vol. 2, pp. 569-572). IEEE.


[45] Blumenstein, M., Liu, X.Y. and Verma, B., 2007. An investigation of the modified direction feature for cursive character recognition. Pattern Recognition, 40(2), pp. 376-388.
[46] Blumenstein, M., Liu, X.Y. and Verma, B., 2004, July. A modified direction feature for cursive character recognition. In Neural Networks, 2004, Proceedings of the 2004 IEEE International Joint Conference on (Vol. 4, pp. 2983-2987). IEEE.
[47] Vamvakas, G., Gatos, B., Pratikakis, I., Stamatopoulos, N., Roniotis, A. and Perantonis, S.J., 2007, February. Hybrid off-line OCR for isolated handwritten Greek characters. In The Fourth IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA 2007) (pp. 197-202).
[48] Dongre, V.J. and Mankar, V.H., 2011. A review of research on Devnagari character recognition. arXiv preprint arXiv:1101.2491.
[49] Yetirajam, M., Nayak, M.R. and Chattopadhyay, S., 2012. Recognition and classification of broken characters using feed forward neural network to enhance an OCR solution. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1.
[50] Shamsher, I., Ahmad, Z., Orakzai, J.K. and Adnan, A., 2007, August. OCR for printed Urdu script using feed forward neural network. In Proceedings of World Academy of Science, Engineering and Technology (Vol. 23, pp. 172-175).
[51] Zhai, X., Bensaali, F. and Sotudeh, R., 2012, July. OCR-based neural network for ANPR. In Imaging Systems and Techniques (IST), 2012 IEEE International Conference on (pp. 393-397). IEEE.
[52] Shah, P., Karamchandani, S., Nadkar, T., Gulechha, N., Koli, K. and Lad, K., 2009, November. OCR-based chassis-number recognition using artificial neural networks. In Vehicular Electronics and Safety (ICVES), 2009 IEEE International Conference on (pp. 31-34). IEEE.
[53] Official website of GOCR. [Online]. Available at: http://jocr.sourceforge.net/ (Accessed on 5 Feb. 2017).
[54] Official website of Ocrad. [Online]. Available at: https://www.gnu.org/software/ocrad/ (Accessed on 5 Feb. 2017).
[55] Official website of OCRopus. [Online]. Available at: https://github.com/tmbdev/ocropy (Accessed on 5 Feb. 2017).


[56] Breuel, T.M., 2008, January. The OCRopus open source OCR system. In Electronic Imaging 2008 (pp. 68150F-68150F). International Society for Optics and Photonics.
[57] Khan, M.N.H., Siddiqui, F. and Das, A., 2014. Pin number detection from mobile phone scratch card using OCR on Android platform and build an application for balance recharge (Doctoral dissertation, BRAC University).
[58] Official website of Tesseract OCR. [Online]. Available at: https://github.com/tesseract-ocr (Accessed on 5 Feb. 2017).
[59] Schreiber, M., Poggenhans, F. and Stiller, C., 2014, October. Detecting symbols on road surface for mapping and localization using OCR. In Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on (pp. 597-602). IEEE.
[60] Documentation of Tesseract OCR. [Online]. Available at: https://github.com/tesseract-ocr/tesseract/wiki/Documentation (Accessed on 6 Feb. 2017).
[61] Training Tesseract OCR. [Online]. Available at: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract (Accessed on 6 Feb. 2017).
[62] Mishra, N. and Patvardhan, C., 2012. ATMA: Android Travel Mate Application. International Journal of Computer Applications, 50(16).
[63] Bhaskar, S., Lavassar, N. and Green, S., 2007. Implementing Optical Character Recognition on the Android Operating System for Business Cards. Radical Eye Software, Palo Alto, CA.
[64] Kastelan, I., Kukolj, S., Pekovic, V., Marinkovic, V. and Marceta, Z., 2012, September. Extraction of text on TV screen using optical character recognition. In 2012 IEEE 10th Jubilee International Symposium on Intelligent Systems and Informatics (pp. 153-156). IEEE.
[65] Smith, R., 2007, September. An overview of the Tesseract OCR engine. In Document Analysis and Recognition, 2007, ICDAR 2007, Ninth International Conference on (Vol. 2, pp. 629-633). IEEE.
[66] Official website of ImageMagick library. [Online]. Available at: https://www.imagemagick.org/script/index.php (Accessed on 3 Mar. 2017).


[67] Official website of ImageMagick library, command line options. [Online]. Available at: https://www.imagemagick.org/script/command-line-options.php#deskew (Accessed on 5 Mar. 2017).
[68] Official webpage of Tesseract OCR engine, command line usage. [Online]. Available at: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage (Accessed on 6 Mar. 2017).
[69] Official webpage of Tesseract OCR engine, Training Tesseract. [Online]. Available at: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract (Accessed on 10 Mar. 2017).
[70] jTessBoxEditor software. [Online]. Available at: http://vietocr.sourceforge.net/training.html (Accessed on 15 Feb. 2017).
[71] Android Studio, the official IDE for Android. [Online]. Available at: https://developer.android.com/studio/index.html (Accessed on 20 Mar. 2017).
[72] ScanLibrary, an Android document scanning library. [Online]. Available at: https://github.com/jhansireddy/AndroidScannerDemo (Accessed on 21 Mar. 2017).
[73] Composer, dependency manager for PHP. [Online]. Available at: https://getcomposer.org/ (Accessed on 24 Mar. 2017).
[74] Official webpage of MPAndroidChart library. [Online]. Available at: https://github.com/PhilJay/MPAndroidChart (Accessed on 29 Mar. 2017).


CURRICULUM VITA

Karez Abdulwahhab Hamad
[email protected]

Nationality: Iraqi
Place of birth: Erbil, Iraq
Date of birth: 20/11/1991
Marital status: Single

EDUCATION

09/2015 – Present   MSc (Master of Software Engineering), Firat University, Elazig, Turkey.

09/2009 – 07/2013   Bachelor's degree, Software Engineering Department, College of Engineering, Salahaddin University, Erbil, Iraq.

07/2009   Graduated from Shahid Dr. Abdulrahman High School for Boys, Soran, Erbil, Iraq.

07/2006   Graduated from Balakyan Primary School for Boys and Girls, Soran, Erbil, Iraq.

WORK EXPERIENCES

1. Software Engineer, Soran University. 01/2014 – 03/2015

2. IT Support, Cihan Bank, Soran branch. 03/2015 – 09/2015

3. IT Support and Accountant, Spiba Company for Building Equipment. 08/2013 – 09/2015

4. Software Engineer and Technical Support, Iraqi National ID Card, Erbil. 01/2016 – present

PUBLICATIONS

[1] Hamad, K. and Kaya, M., 2016. A Detailed Analysis of Optical Character Recognition Technology. International Journal of Applied Mathematics, Electronics and Computers, 4(Special Issue-1), pp. 244-249. DOI: 10.18100/ijamec.270374.
