The Phone Reader
THE PHONE READER

Submitted in partial fulfillment of the requirements of the degree of BACHELOR OF SCIENCE (HONOURS) of Rhodes University

Michèle Marilyn Bihina Bihina

Grahamstown, South Africa
November, 2012

Abstract

The Phone Reader is an Android application that reads aloud text extracted from a photo taken with an Android mobile phone. It uses Tesseract OCR to provide accurate character recognition in the image, and the Apertium translation engine to translate the extracted text. It aims to help people with reading disabilities, people who are illiterate, and non-native speakers to hear the content of text they have difficulty reading. The system provides a user-friendly client interface that communicates with a remote server; the server processes uploaded images to extract the text they contain.

ACM Computing Classification System

Thesis classification under the ACM Computing Classification System (1998 version, valid through 2012):

B.4.1 [Data Communications Devices]: Receivers (voice, data, image)
I.2.7 [Natural Language Processing]: Speech recognition and synthesis
I.4 [Image Processing And Computer Vision]: Image processing software
I.7.5 [Document Capture]: Graphics Recognition And Interpretation, Optical character recognition (OCR), Scanning

General Terms: Image processing, Android application, OCR (Optical Character Recognition), Text recognition, Text-To-Speech, Text translation

Acknowledgments

I would like to thank God for all the strength He has given me during this year. I would like to give my deep and sincere thanks to all the members of my family who have helped and supported me this year, especially my mother and my uncle J.V. Nkolo. I also want to thank the following people for their support:

* My supervisor Mr James Connan for all the help he has offered me during the development of this project.
* Rhodes University and the Department of Computer Science for the opportunity to pursue my Honours degree.
* All my friends who have assisted and advised me this year.

And finally, I would like to acknowledge the financial and technical support of Telkom, Tellabs, Stortech, Genband, Easttel, Bright Ideas 39 and THRIP through the Telkom Centre of Excellence in the Department of Computer Science at Rhodes University.

Contents

Abstract
ACM Computing Classification System
Acknowledgments
1 Introduction
  1.1 Problem statement
  1.2 Objectives
  1.3 Methodology
  1.4 Progression
  1.5 Structure of the thesis
2 Literature Review
  2.1 Introduction
  2.2 Image processing
    2.2.1 Definitions
    2.2.2 Image processing methods
  2.3 Text reading systems
    2.3.1 Mobile text reading applications requiring OCR
    2.3.2 Mobile text reading applications not requiring OCR
  2.4 Object recognition systems
    2.4.1 Systems using crowd-sourcing for object recognition
    2.4.2 Mobile applications using visual search on specific types of objects
  2.5 Text and object recognition systems offering extended functionalities
  2.6 Common tools used by object recognition and text reading mobile applications
    2.6.1 Mobile operating systems
    2.6.2 OCR
    2.6.3 Text-To-Speech Engine
  2.7 Plan of Action
  2.8 Conclusion
3 Design of the system
  3.1 Introduction
  3.2 Textual description
  3.3 System Design
    3.3.1 System architecture
    3.3.2 UML approach
  3.4 Conclusion
4 Implementation
  4.1 Introduction
  4.2 System requirements
  4.3 Description of the tools used for the system
  4.4 Code documentation
    4.4.1 Image processing techniques used with Imagemagick
    4.4.2 Java implementation of the classes of the system
  4.5 Conclusion
5 Tests and Results
  5.1 Different font sizes for the same text
  5.2 Lighting conditions
  5.3 Testing the translation accuracy
6 Conclusion
  6.1 Goals achieved by the system
  6.2 Limits of the system
  6.3 Future work
  6.4 Conclusion
A User's guide

List of Tables

1.1 Progression of the Phone Reader project
4.1 System specifications

List of Figures

3.1 Flowchart diagram of the system
3.2 Architecture of the system
3.3 Use case diagram
3.4 Class diagram on the client side
3.5 Class diagram on the server side
4.1 Function to apply Unsharp method to image
4.2 Initialization of the Camera Activity class
4.3 Function to open camera mode
4.4 Display bitmap image on phone screen
4.5 Snippet of code for the event ACTION-UP
4.6 Map a language to its code
4.7 PHP script to upload image on server
4.8 URL to download the text file
4.9 Calling the image processing methods from the main class
4.10 Calling the OCR function
4.11 Calling the translation function
4.12 Function to perform Text-To-Speech
5.1 OCR results of a text with font size 12
5.2 OCR results of a text with font size 14
5.3 OCR results of a text with font size 16
5.4 OCR results under low light
5.5 OCR results under low light using the camera flash
5.6 Result of the pre-processing of an image taken with flash activated
5.7 OCR result of a pre-processed image taken with flash activated
5.8 Representation of accurately translated words in a text
A.1 Screen 1
A.2 Screen 2
A.3 Screen 3
A.4 Screen 4
A.5 Screen 5
A.6 Screen 6
A.7 Screen 7

Chapter 1

Introduction

In today's society, mobile phones offer a wide variety of functionalities that are not always related to calling or sending messages. Those functionalities include web browsing, playing games or music, banking, taking photos and much more. The Phone Reader is an Android application that aims to allow the user to hear a text contained in a picture that has been taken with a mobile phone.
It is an application meant to help those who cannot read a text they encounter, such as non-native speakers and visually impaired or blind people, a group the World Health Organization estimated at 285 million people in 2010 [22]. This project is mainly concerned with image processing to recognize characters in an image.

1.1 Problem statement

Reading or understanding a text can at times be a challenge if it is written in a foreign language, if the reader is illiterate, or if the reader has a reading disability. Solving this problem is the goal of the Phone Reader project, which aims to develop a mobile application that can read a text aloud for the user through an Android mobile device. To use it, the user photographs the text with their phone, chooses a language for the translation if necessary, and sends the photo to the server, which extracts the text in the photo so that it can be read aloud.

1.2 Objectives

The Phone Reader is meant to help different types of people who are unable to read a text. The following list presents cases in which the Phone Reader can be used:

• Blind people can use it when they have a text to read.
• Non-native speakers (like tourists) can use it when they do not understand a text written in a foreign language, or when they are not sure of the right pronunciation of words.
• Illiterate or dyslexic people can use it when they have difficulty reading a text.

1.3 Methodology

The system was developed for Android [26], a Linux-based operating system for mobile devices developed by the Open Handset Alliance. The phone sends requests to the Apache server by uploading photos to it. The Apache server handles each request by pre-processing the uploaded image and then extracting its text with an OCR (Optical Character Recognition) program. After any required translation of the extracted text, a TTS (Text-To-Speech) engine produces the speech on the phone. The programming languages used are Java and PHP.
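The request flow described in this section can be sketched in Java as a chain of three server-side stages. This is only an illustrative sketch, not the code of the Phone Reader itself: the class name PhoneReaderPipeline, the three interfaces and the language codes are hypothetical, and the stages are stubbed with lambdas where the real system would call its image pre-processing, OCR and translation tools.

```java
// Illustrative sketch only: every name below is hypothetical, and the three
// stages are stand-ins for the external pre-processing, OCR and translation tools.
final class PhoneReaderPipeline {

    /** Cleans up the uploaded photo before recognition. */
    interface Preprocessor { byte[] clean(byte[] image); }

    /** Extracts the text contained in an image. */
    interface OcrEngine { String recognize(byte[] image); }

    /** Translates text from one language code to another. */
    interface Translator { String translate(String text, String from, String to); }

    private final Preprocessor preprocessor;
    private final OcrEngine ocr;
    private final Translator translator;

    PhoneReaderPipeline(Preprocessor preprocessor, OcrEngine ocr, Translator translator) {
        this.preprocessor = preprocessor;
        this.ocr = ocr;
        this.translator = translator;
    }

    /** Returns the text that the phone would hand to its TTS engine. */
    String process(byte[] uploadedImage, String sourceLang, String targetLang) {
        if (uploadedImage == null || uploadedImage.length == 0) {
            throw new IllegalArgumentException("empty image upload");
        }
        byte[] cleaned = preprocessor.clean(uploadedImage);
        String text = ocr.recognize(cleaned).trim();
        // Translation is optional: skip it when no different target language is requested.
        if (targetLang == null || targetLang.equals(sourceLang)) {
            return text;
        }
        return translator.translate(text, sourceLang, targetLang);
    }

    public static void main(String[] args) {
        // Stub stages so that the sketch runs without any external tool installed.
        PhoneReaderPipeline pipeline = new PhoneReaderPipeline(
                image -> image,
                image -> "  Hello world \n",
                (text, from, to) -> "[" + from + "->" + to + "] " + text);

        System.out.println(pipeline.process(new byte[] {1}, "en", "en"));
        System.out.println(pipeline.process(new byte[] {1}, "en", "es"));
    }
}
```

Keeping each stage behind a small interface is one possible design choice: it would let the pre-processing, OCR or translation tool be replaced, or stubbed for testing, without changing the request-handling logic.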
1.4 Progression

The following table presents the steps needed to develop the Phone Reader:

Table 1.1: Progression of the Phone Reader project

Steps     Tasks
Step 1    Review existing technologies
Step 2    Determine system requirements
Step 3    Configure web service
Step 4    Implement and evaluate preprocessing
Step 5    Implement OCR
Step 6    Implement translation
Step 7    Implement user interface/phone client
Step 8    Implement TTS functionality on the phone
Step 9    Testing the system
Step 10   Documentation

1.5 Structure of the thesis

This thesis has six chapters, the first of which is this introduction. Chapter 2, the literature review, surveys work related to this project and examines which tools have been used elsewhere and which we could use for our system. Chapter 3, the design of the system, describes how the system was designed and presents an overview of its structure. Chapter 4, the implementation, describes the technical aspects of the Phone Reader: the system requirements and the programming. Chapter 5 presents and discusses the results obtained from the different tests performed with the system. Chapter 6, the conclusion, presents the system's performance and how it can be improved. Appendix A is the user's guide.