REPUBLIC OF TURKEY
FIRAT UNIVERSITY
GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCE

AN ANDROID BASED RECEIPT TRACKER SYSTEM USING OPTICAL CHARACTER RECOGNITION

KAREZ ABDULWAHHAB HAMAD

Master Thesis
Department: Software Engineering
Supervisor: Asst. Prof. Dr. Mehmet KAYA

JULY – 2017

ACKNOWLEDGEMENTS

First, thanks to ALLAH, the Almighty, for granting me the will and strength with which this master thesis was accomplished; it will be the first step toward further scientific research.

I would like to express my thankfulness and appreciation to my supervisor, Asst. Prof. Dr. Mehmet KAYA, for his guidance, assistance, encouragement, wise suggestions, and valuable advice, which made the completion of the present master thesis possible.

Last but not least, I want to express my special thankfulness to my lovely parents, and special gratitude to all members of my family and my friends. Special thanks to my lovely uncle, Assoc. Prof. Dr. Yadgar Rasool, who helped and encouraged me a lot during my study.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABSTRACT
ÖZET
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
  1.1. Background
  1.2. Problems Statement
  1.3. General Aims and Objectives
  1.4. Thesis Layout
2. THEORETICAL TECHNIQUES AND BACKGROUND OF OCR
  2.1. OCR Challenges
    2.1.1. Complexity of scene
    2.1.2. Uneven lighting problem
    2.1.3. Skewness problem
    2.1.4. Un-focus and deterioration
    2.1.5. Aspect ratios
    2.1.6. Tilting problem
    2.1.7. Fonts
    2.1.8. Multilingual environments
    2.1.9. Warping problem
  2.2. OCR Applications
    2.2.1. Hand-writing recognition applications
    2.2.2. Healthcare applications
    2.2.3. Financial tracking applications
    2.2.4. Legal industry
    2.2.5. Banking application
    2.2.6. Captcha breaking application
    2.2.7. Automatic number plate recognition application (ANPR)
  2.3. OCR Phases
    2.3.1. Image pre-processing phase
    2.3.2. Segmentation phase
    2.3.3. Normalization phase
    2.3.4. Feature extraction phase
    2.3.5. Classification phase
    2.3.6. Post-processing phase
  2.4. OCR Engines
    2.4.1. GOCR engine
    2.4.2. Ocrad engine
    2.4.3. OCRopus
    2.4.4. Tesseract OCR engine
3. PROPOSED TECHNIQUES
  3.1. System Overview
    3.1.1. Receipt region detection
    3.1.2. Receipt image pre-processing phase
    3.1.3. Recognition phase
    3.1.4. Regular expression (Regex) phase
    3.1.5. Database phase
  3.2. Implementation and Practical Work
  3.3. System Screenshots
4. QUERIES AND EXPERIMENTAL RESULTS
  4.1. User Queries
    4.1.1. Spend analyzer
    4.1.2. Receipt image discovering
    4.1.3. Total money expended
    4.1.4. Total money expended for a particular item
  4.2. Experimental Outcomes
    4.2.1. Capability metrics
    4.2.2. Examination corpus
    4.2.3. Fake receipt font experimental outcomes
    4.2.4. Merchant copy font experimental outcomes
    4.2.5. Evaluation of outcomes experienced
5. CONCLUSION AND FUTURE WORKS
6. REFERENCES
CURRICULUM VITA

ABSTRACT

AN ANDROID BASED RECEIPT TRACKER SYSTEM USING OPTICAL CHARACTER RECOGNITION

As demand for mobile apps grows, the design and development of OCR applications has shifted from the desktop to mobile platforms. Optical Character Recognition (OCR) is the technology that converts text in handwritten, printed, or scanned images into editable text for further analysis and processing. In this research, we propose an Android OCR application that automatically extracts and recognizes the text in receipt images. This research presents the main and powerful techniques proposed