Voice Assisted Text Reading Smart Specs for Visually Impaired Persons
International Journal of Research in Engineering, Science and Management, Volume-2, Issue-12, December-2019
www.ijresm.com | ISSN (Online): 2581-5792

Drishti: Voice Assisted Text Reading Smart Specs for Visually Impaired Persons

Chitranjali Tanwar1, S. Dakshayini2, R. Gagana3, Mythri M. Hegade4, Manjusha Mane5
1,2,3,4Student, Department of Electronics and Communication Engineering, Sambhram Institute of Technology, Bangalore, India
5Assistant Professor, Department of Electronics and Communication Engineering, Sambhram Institute of Technology, Bangalore, India

Abstract: According to the World Health Organization, around 285 million of the world's 7.4 billion people are estimated to be visually impaired. They still find it difficult to manage their day-to-day life, and it is important to use emerging technologies to help them cope with the modern world irrespective of their impairment. To support them, an innovative, efficient, real-time and cost-effective smart spec is proposed that reads any printed text aloud. Because accessing text documents is troublesome for visually impaired people in many situations, the device gives them greater independence. A Pi camera captures an image of the printed text, and the captured image is analyzed using Tesseract-OCR. Images in formats such as jpeg, png, jpg and bmp are considered for the analysis. The captured image is processed, Tesseract OCR converts the text in it into characters, and eSpeak software makes the result audible. The OCR algorithm involves scanning, processing, classification and recognition. Finally, the eSpeak command-line software converts the text obtained from the OCR into speech, which is read aloud by a speaker connected to the Raspberry Pi. A tilt sensor is also used for fall detection.

Keywords: Tesseract OCR, Optical Character Recognition, Text-to-Speech, eSpeak, Raspberry Pi

1. Introduction
Of the 285 million visually impaired people worldwide, about 40 million are blind. Even in developed countries such as the United States, the 2008 National Health Interview Survey (NHIS) reported that an estimated 25.2 million adult Americans (over 8%) are blind or visually impaired. Recent developments in computer vision, digital cameras and portable computers make it feasible to assist these individuals with camera-based products that combine computer vision technology with existing commercial products such as OCR systems. Accessing text documents is troublesome for visually impaired people in many situations, such as reading text on the go and accessing text in less than ideal conditions. The objective is to allow blind users to touch printed text and receive speech output in real time. The development of such systems relies on two central technologies: optical character recognition for Text Information Extraction (TIE), and Text-To-Speech (TTS) to convert the extracted text to speech.

A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud when it is directly introduced into the computer by an operator. It is more suitable to define Text-To-Speech synthesis as the automatic production of speech through 'grapheme-to-phoneme' transcription. Text Information Extraction is the primary function of any assistive reading system and is an integral part of OCR, because this process determines the intelligibility of the output speech. Ongoing work targets the output quality of text-to-speech as well as the ability to generate expressive, emotional synthetic speech. Automatic video text detection and extraction has been employed to partition video blocks into text and non-text regions. Camera-based products that merge computer vision technology with existing products such as optical character recognition systems make this possible. OCR is the electronic conversion of photographed images of typewritten or printed text into computer-readable text. It is used to recognize characters, words and sentences, and it achieves a high recognition rate.

2. Literature review
Ani R, Effy Maria, J Jameema Joyce and Sakkaravarthy V (2017) proposed a smart spec for blind persons that performs text detection and produces a voice output. This can help visually impaired persons to read any printed text in vocal form. Raspberry Pi is the main target for the implementation, as it provides an interface between the camera, sensors and image processing results, while also performing functions to manipulate peripheral units (keyboard, USB, etc.).

Joao Guerreiro and Daniel Goncalves (2014) noted that portable digital imaging devices have enabled new techniques in many domains, and described typical imaging devices and the imaging process. In this paper, the authors observed that screen readers have been an essential tool for assisting visually impaired users over many years in accessing digital information. Making this information accessible is crucial and has been the object of intensive research. They made use of concurrent speech, which enables visually impaired people to listen to several information items concurrently, including irrelevant ones, get the gist of the information and identify the items that deserve further attention.

J. Liang and D. Doermann (2005) proposed that advanced technology can play a major role through mobile computer vision that facilitates environment exploration for visually impaired persons. In this paper, the authors present a survey of application domains, technical challenges and solutions for the analysis of documents captured by digital devices, and of the imaging process. They cover document analysis from a single camera-captured image as well as from multiple frames, and highlight some sample applications under development and feasible ideas for the future.

Alexandre Trilla and Francesc Alias (2013) published work on improving the TTS method for reading input text and converting it to speech. The article analyzes input features for speech synthesis.

S. Mascaro and H. H. Asada (2001) introduced, at a robotics conference, "Finger posture and shear force measurement: initial experimentation".

Pitrelli J. and Bakis R. (2006) proposed the IBM expressive text-to-speech synthesis system for American English, in an article published on audio, speech and language processing.

Priyanka Bacche, Apurva Bakshi, Krishna Ghiya and Prianka Gujar (2014) proposed NETHRA (New Eyes To Read Artifact), which was published in the Journal of Science, Engineering and Technology Research.

Rissanen M. J., Fernando S. and Pang N. (2013) introduced "Natural and socially acceptable interaction techniques for ringterfaces: finger-ring shaped user interfaces", which was published in Distributed, Ambient and Pervasive Interactions.

Norman J. F. and Bartholomew A. N. (2011) reported that blindness enhances tactile acuity and haptic 3-D shape discrimination, in an article based on a human interaction conference.

Rohit Ranchal, Yiren Guo, Keith Bain and Paul Robinson J (2013) proposed using speech recognition for real-time captioning and lecture transcription in the classroom; the article appeared in a publication on learning technologies.

3. Description
The proposed system is a spectacle-mounted text reading system for visually impaired persons. It has three modules: the camera module, the Optical Character Recognition module and the Text-To-Speech module. This explains the text reading system for visually

Fig. 1. Block diagram

4. Working principle
The system is built on a Raspberry Pi board running Raspbian OS with the Python and OpenCV libraries. Raspbian is a Debian-based Linux distribution and the de facto standard operating system for the board; it comes with libraries for the peripheral units pre-installed. Images in formats such as jpeg, png, jpg and bmp are considered for the analysis. The image captured by the Pi camera is split under the conditions described below to detect the corresponding text. The captured image is then processed, Tesseract OCR converts the text in it into characters, and eSpeak software reads them aloud. An ultrasonic sensor and a tilt sensor are also used to notify the user of obstacles and to detect falls. A detected fall is sent as a notification to a concerned person using IoT.

A. Camera Module
All available capture methods (capture, capture continuous, capture sequence; these match the method names of the picamera Python library rather than OpenCV) have to be considered according to their use and abilities. In this project, the capture sequence method was chosen, as it is by far the fastest. Using the capture sequence method, the Raspberry Pi camera is able to capture images at a rate of 20 fps at a resolution of 640×480.

Fig. 2. Camera Module
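The camera, OCR and text-to-speech stages described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the picamera library is available on the Raspberry Pi and that the tesseract and espeak command-line tools are installed. All function names are hypothetical, and the default speaking rate of 150 words per minute is an arbitrary choice.

```python
import shutil
import subprocess


def capture_frames(paths, resolution=(640, 480), framerate=20):
    """Capture one JPEG per file name in `paths` (Raspberry Pi only)."""
    import picamera  # imported lazily: assumed to exist only on the Pi

    with picamera.PiCamera() as camera:
        camera.resolution = resolution
        camera.framerate = framerate
        # use_video_port=True trades some image quality for capture speed.
        camera.capture_sequence(paths, use_video_port=True)


def tesseract_command(image_path, lang="eng"):
    """Build the Tesseract CLI call; 'stdout' prints the recognized text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def espeak_command(text, words_per_minute=150):
    """Build the eSpeak CLI call; -s sets the speed in words per minute."""
    return ["espeak", "-s", str(words_per_minute), text]


def clean_ocr_text(raw):
    """Collapse line breaks, form feeds and repeated spaces in OCR output."""
    return " ".join(raw.split())


def read_image_aloud(image_path):
    """Run OCR on one image and speak the result; None if tools are missing."""
    if shutil.which("tesseract") is None or shutil.which("espeak") is None:
        return None
    ocr = subprocess.run(
        tesseract_command(image_path),
        capture_output=True, text=True, check=True,
    )
    text = clean_ocr_text(ocr.stdout)
    if text:  # stay silent when no text was recognized
        subprocess.run(espeak_command(text), check=True)
    return text
```

On a Raspberry Pi with these tools installed, calling `capture_frames(["frame0.jpg"])` followed by `read_image_aloud("frame0.jpg")` would run one capture, recognize and speak cycle. Keeping the command construction in separate functions makes the OCR language and speaking rate easy to change without touching the hardware code.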