International Journal of Research in Engineering, Science and Management 494 Volume-2, Issue-12, December-2019 www.ijresm.com | ISSN (Online): 2581-5792

Drishti: Voice Assisted Text Reading Smart Specs for Visually Impaired Persons

Chitranjali Tanwar1, S. Dakshayini2, R. Gagana3, Mythri M. Hegade4, Manjusha Mane5
1,2,3,4Student, Department of Electronics and Communication Engineering, Sambhram Institute of Technology, Bangalore, India
5Assistant Professor, Department of Electronics and Communication Engineering, Sambhram Institute of Technology, Bangalore, India

Abstract: According to the World Health Organization, around 285 million of the world's 7.4 billion people are estimated to be visually impaired. They still find it difficult to manage their day-to-day lives, and it is important to take advantage of emerging technologies to help them cope with the modern world irrespective of their impairment. With this motive, an innovative, efficient, real-time and cost-effective smart spec is proposed that reads any printed text in vocal form. Since accessing text documents is troublesome for visually impaired people in many situations, the system provides self-independence. A Pi camera captures an image of the printed text, and the captured image is analysed using Tesseract OCR. Images in formats such as JPEG, PNG and BMP are considered for analysis. The OCR algorithm involves scanning, processing, classification and recognition, and the recognized text is made readable with the help of the eSpeak software. Finally, eSpeak converts the text obtained from the OCR into a speech command, which is read aloud by a speaker connected to the Raspberry Pi. A tilt sensor is used for fall detection.

Keywords: Tesseract OCR, Optical Character Recognition, Text-to-Speech, eSpeak, Raspberry Pi

1. Introduction

Of the roughly 285 million visually impaired people worldwide, about 40 million are blind. Even in developed countries such as the United States, the 2008 National Health Interview Survey (NHIS) reported that an estimated 25.2 million adult Americans (over 8%) are blind or visually impaired. Recent developments in computer vision, digital cameras and portable computers make it feasible to assist these individuals by developing camera-based products that combine computer vision technology with existing commercial products such as OCR systems. Accessing text documents is troublesome for visually impaired people in many situations, such as reading text on the go or in less-than-ideal conditions.

The objective is to allow blind users to touch printed text and receive speech output in real time. The development of such systems requires two central technologies: Optical Character Recognition (OCR) for Text Information Extraction (TIE), and Text-To-Speech (TTS) to convert this text to speech.

A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud when it is introduced into the computer by an operator. It is more accurate to define text-to-speech synthesis as the automatic production of speech by grapheme-to-phoneme transcription. Text Information Extraction is the primary function of any assistive reading system and an integral part of OCR, because this process determines the intelligibility of the output speech. Ongoing research addresses the output quality of text-to-speech as well as extending its capabilities to generate expressive, emotional synthetic speech. Automatic video text detection and extraction has also been employed to partition video frames into text and non-text regions. These developments make it possible to build camera-based products that merge computer vision technology with other existing beneficial products such as optical character recognition systems. OCR is the electronic conversion of photographed images of typewritten or printed text into computer-readable text; it can recognize characters, words and sentences with a high recognition rate.

2. Literature review

• Ani R, Effy Maria, J Jameema Joyce and Sakkaravarthy V (2017) proposed a smart spec for blind persons that performs text detection and produces voice output, helping visually impaired persons read any printed text in vocal form. Raspberry Pi is the main target for the implementation, as it provides an interface between the camera, sensors and image-processing results, while also performing functions to manipulate peripheral units (keyboard, USB, etc.).
• Joao Guerreiro and Daniel Goncalves (2014) proposed that portable digital imaging devices have created enormous opportunities in many domains, describing typical imaging devices and the imaging process. In this paper the authors note that screen readers have been an essential

tool for assisting visually impaired users over many years in accessing digital information. Making this information accessible is crucial and has been the object of intensive research. They made use of concurrent speech, which enables visually impaired people to listen to several information items at once, get the gist of the information and identify the items that deserve further attention.
• J. Liang and D. Doermann (2005) proposed that advanced technology can play a tremendous role through mobile computer vision that facilitates environment exploration for visually impaired persons. The authors present a survey of application domains, technical challenges and solutions for the analysis of documents captured by digital devices and the imaging process, covering document analysis from a single camera-captured image as well as from multiple frames, and highlight some sample applications under development and feasible ideas for the future.
• Alexandre Trilla and Francesc Alías (2013) published on the improvement of TTS methods for reading input text and converting it to speech; the article studies the analysis of input features for speech synthesis.
• S. Mascaro and H. H. Asada (2001) introduced finger posture and shear force measurement with initial experimentation, presented at the IEEE conference on robotics and automation.
• Pitrelli J. and Bakis R. (2006) proposed the IBM expressive text-to-speech synthesis system for American English, published in IEEE Transactions on Audio, Speech, and Language Processing.
• Priyanka Bacche, Apurva Bakshi, Karishma Ghiya and Priyanka Gujar (2014) proposed Tech-NETRA (New Eyes to Read Artifact), published in the International Journal of Science, Engineering and Technology Research.
• Rissanen M. J., Vu S., Fernando O. N. N., Pang N. and Foo S. (2013) introduced "Subtle, Natural and Socially Acceptable Interaction Techniques for Ringterfaces: Finger-Ring Shaped User Interfaces", published in Distributed, Ambient, and Pervasive Interactions.
• Norman J. F. and Bartholomew A. N. (2011) showed that blindness enhances tactile acuity and haptic 3-D shape discrimination.
• Rohit Ranchal, Yiren Guo, Keith Bain and Paul Robison J. (2013) proposed using speech recognition for real-time captioning and lecture transcription in the classroom, published on learning technologies.

3. Description

The concept of the proposed system is a specs-based text reading system for visually impaired persons, supporting their self-independence. There are three modules in this system: the Camera module, the Optical Character Recognition module and the Text-To-Speech module.

Fig. 1. Block diagram

4. Working principle

The system is built on a Raspberry Pi board running Raspbian OS with the Python/OpenCV libraries. Raspbian is a Debian-based distribution, the de-facto standard operating system for the board, which comes with libraries for the peripheral units pre-installed. Texts in formats such as JPEG, PNG and BMP are considered for the analysis. The image captured by the Pi camera is split under the conditions described below for detecting the corresponding text. The captured image is then processed, and with the help of Tesseract OCR the text is converted into characters, which are made audible with the help of the eSpeak software. An ultrasonic sensor and a tilt sensor are also used to notify obstacles and detect falls; a fall-detection event is sent as a notification to a concerned person using IoT.

A. Camera Module

All capture methods available for the camera (capture, capture continuous, capture sequence) have to be considered according to their use and abilities. In this project the capture sequence method was chosen, as it is by far the fastest. Using the capture sequence method, our Raspberry Pi camera is able to capture images at a rate of 20 fps at 640×480 resolution.

Fig. 2. Camera Module
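The capture step above can be sketched with the Pi camera's capture_sequence call. This is a minimal illustration, not the authors' code: the filename pattern, frame count and the use of the picamera Python library are assumptions, and the commented-out lines require a Raspberry Pi with camera hardware.

```python
def sequence_filenames(count, pattern="frame%02d.jpg"):
    # Build the list of output filenames that capture_sequence() writes to.
    return [pattern % i for i in range(count)]

# On the Raspberry Pi itself (requires the picamera package and camera hardware):
# from picamera import PiCamera
# camera = PiCamera(resolution=(640, 480), framerate=20)
# camera.capture_sequence(sequence_filenames(20), use_video_port=True)

print(sequence_filenames(3))  # → ['frame00.jpg', 'frame01.jpg', 'frame02.jpg']
```

Passing use_video_port=True is what makes capture_sequence the fastest of the three methods, since frames are pulled from the video splitter rather than triggering a full still capture per image.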

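The OCR and text-to-speech steps of the working principle can be sketched as follows. This is a hedged sketch rather than the authors' implementation: it assumes the pytesseract wrapper for Tesseract, the Pillow imaging library and the espeak command-line binary are installed on the Pi, and the helper names are hypothetical.

```python
import subprocess

def clean_ocr_text(raw):
    # Collapse the stray newlines and spacing Tesseract leaves in its
    # output so eSpeak reads one continuous utterance.
    return " ".join(raw.split())

def espeak_command(text, voice="en", speed=140):
    # eSpeak invocation: -v selects the voice, -s the speed in words
    # per minute.
    return ["espeak", "-v", voice, "-s", str(speed), text]

def read_image_aloud(image_path):
    # Requires pytesseract, Pillow and the espeak binary (assumed stack).
    import pytesseract
    from PIL import Image
    text = clean_ocr_text(pytesseract.image_to_string(Image.open(image_path)))
    subprocess.run(espeak_command(text))
    return text

print(espeak_command("hello world"))
```

Keeping the eSpeak invocation as a plain argument list makes the speed and voice easy to adjust, which matters here because eSpeak stays intelligible at high word rates.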
B. Optical Character Recognition Module

• Tesseract OCR

In order to read the words in a paragraph accurately, Tesseract Optical Character Recognition (OCR) was chosen for the project. Tesseract OCR has been shown to outperform most commercial OCR engines. Processing within Tesseract follows a traditional step-by-step pipeline. In the first step, connected component analysis is applied and the outlines of the components are stored. This step is particularly computationally intensive, but it brings a number of advantages, such as the ability to read reversed text as easily as black text on a white background. After this stage the outlines and regions are analysed as blobs. The text lines are then broken into character cells by the spacing algorithm nested in the OCR engine. Finally, the recognition phase has two parts: an adaptive classifier and a template classifier.

C. Text to Speech using eSpeak

eSpeak is a compact open-source software speech synthesizer for English and other languages, for Linux and Windows, which uses a "formant synthesis" method. This allows many languages to be provided in a small size. The main advantage of using eSpeak is that the speech is clear and can be used at high speeds.

D. Hardware tools
• Raspberry Pi
• Pi camera
• ESP8266 Wi-Fi module
• Speakers
• Ultrasonic sensor
• ADXL tilt sensor

E. Software tools
• Raspbian OS
• Python
• OpenCV
• eSpeak

5. Conclusion

A text detection and recognition system with speech output was successfully demonstrated on the proposed platform. The system is very handy and helpful for visually impaired persons. Compared with a computer platform, the portable platform is more convenient to use. It helps visually impaired persons access information in written form in their surroundings: written messages, warnings and traffic directions can be understood in voice form by converting them from text to speech. The system was found to be capable of converting sign boards and other text into speech.

References
[1] Ani R, Effy Maria, J Jameema Joyce, Sakkaravarthy V (2017), "Smart Specs: Voice Assisted Text Reading System for Visually Impaired Persons Using TTS Method", IEEE International Conference on Innovations in Green Energy and Healthcare Technologies.
[2] Alexandre Trilla and Francesc Alías (2013), "Sentence-Based Sentiment Analysis for Expressive Text-to-Speech", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, Issue 2, pp. 223-233.
[3] S. Mascaro and H. H. Asada (2001), "Finger posture and shear force measurement: initial experimentation," in Proc. IEEE Int. Conf. Robot. Autom., Vol. 2, pp. 1857-1862.
[4] Pitrelli J. and Bakis R. (2006), "The IBM expressive text-to-speech synthesis system for American English", IEEE Trans. Audio, Speech, Lang. Process., Vol. 14, No. 4, pp. 1099-1108.
[5] Priyanka Bacche, Apurva Bakshi, Karishma Ghiya, Priyanka Gujar (2014), "Tech-NETRA (New Eyes to Read Artifact)," International Journal of Science, Engineering and Technology Research, Vol. 3, Issue 3, pp. 482-485.
[6] Rissanen M. J., Vu S., Fernando O. N. N., Pang N., Foo S. (2013), "Subtle, Natural and Socially Acceptable Interaction Techniques for Ringterfaces: Finger-Ring Shaped User Interfaces," in Streitz N., Stephanidis C. (eds), Distributed, Ambient, and Pervasive Interactions, DAPI 2013, Lecture Notes in Computer Science, Vol. 8028, Springer, Berlin, Heidelberg.
[7] Norman J. F. and Bartholomew A. N. (2011), "Blindness Enhances Tactile Acuity and Haptic 3-D Shape Discrimination," Vol. 73, No. 7, pp. 23-30.