Design and Implementation of Text to Speech Conversion Using Raspberry PI K

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by International Journal of Innovative Technology and Research (IJITR) K. Lakshmi* et al. (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.4, Issue No.6, October – November 2016, 4564-4567. Design And Implementation Of Text To Speech Conversion Using Raspberry PI K. LAKSHMI Mr. T. CHANDRA SEKHAR RAO M.Tech Scholar, Department of ECE Professor, Department of ECE Loyola Institute of Technology and Management Loyola Institute of Technology and Management Dhulipalla, Guntur Dhulipalla, Guntur Abstract- In this paper an innovative, efficient and real- time cost beneficial technique that enables user to hear the contents of text images instead of reading through them as been introduced. It combines the concept of Optical Character Recognition (OCR) and Text to Speech Synthesizer (TTS) in Raspberry pi. This kind of system helps visually impaired people to interact with computers effectively through vocal interface. Text Extraction from color images is a challenging task in computer vision. Text-to-Speech is a device that scans and reads English alphabets and numbers that are in the image using OCR technique and changing it to voices. This paper describes the design, implementation and experimental results of the device. This device consists of two modules, image processing module and voice processing module. The device was developed based on Raspberry Pi v2 with 900 MHz processor speed. Keywords- Raspberry Pi; OCR; Camera; Image Processing; Voice Processing I. INTRODUCTION conversion system with help of Raspberry Pi. The project has a small inbuilt camera that scans the The most basic and widely used method is Braille. text printed on a paper, converts it to audio format Apart from that the other technology used is using a synthesized voice for reading out the Talking Computer Terminal, Computer Driven scanned text quickly translating books, documents Braille Printer, Paperless Braille Machines, and other materials for daily living, especially away Optacon etc. These technologies use different from home or office. Not only does this save time techniques and methods allowing the person to read and energy. [2] or convert document to Braille. Nowadays Computers are designed to interact by reading the There is numerous language translators available books or documents. Synthesized voice is used to where we have to manually enter the content and read the content by the computers. We also have the application do whatever is left of the work of devices that scan the documents and use interfaced making a translation of the content into the desired screen to allow the blind to sense the scanned language. Some translation platforms even charges documents on the screen either in Braille or by for this process. For example the most popular using the shape of the letters itself with help of Google translate API charges calculated as per the vibrating pegs. Various phone applications are also usage. [3] developed to help in reading or helping the blind in There is another example of a translator which uses other ways. If you’ve ever tried to communicate java on Linux platform to translate from one with someone who speaks a different foreign language to another. Example based machine language, you know that it can be extremely translation, EBMT, is such an example which uses difficult— even with the assistance of cutting edge java on Linux platform for translating from one translation sites where we need to pay lot sums of language to another. The rule of deciphering in money to fulfill our task. EBMT is basic: it analysis the pre-translated data in II. EXISTING SYSTEMS the data base and further translates the input. Along these lines, the bigger the database of pre-translated In the existing system, user can see conversion of sentences, more prominent and efficient will be the text messages into speech. Cause for this project is output. that reading aids for the blind, talking aid for the vocally handicapped and training aids and other III. PROPOSED SYSTEM commercial applications. Vocally handicapped The below fig1.shows that block diagram of the people can type the text from keypad and it will be text to speech conversion processed in ARM microcontroller and voice board. In voice board, they feed the Input, so that ARM microcontroller will process and output is heard through speaker. [1] In this work it aims to study the image recognition technology with speech synthesis and to develop a cost effective, user friendly image to speech 2320 –5547 @ 2013-2016 http://www.ijitr.com All rights Reserved. Page | 4564 K. Lakshmi* et al. (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.4, Issue No.6, October – November 2016, 4564-4567. Books and papers have letters. Our aim is to extract these letters and convert them into digital form and then recite it accordingly. Image processing is used to obtain the letters. Image processing is basically a set of functions that is used upon an image format to deduce some information from it. The input is an image while the output can be an image or set of parameters obtained from the image. Once the image is being loaded, we can convert it into gray scale image. The image which we get is now in the form of pixels within a specific range. This range is used to determine the letters. In gray scale, the image has either white or black content; the white Fig1. Block diagram of text to speech conversion will mostly be the spacing between words or blank space. Raspberry Pi Feature Extraction The following are specifications for Model B In this stage we gather the essential features of the Broadcom BCM2835 SoC processor with 700 MHz image called feature maps. One such method is to ARM1176JZF-S cores , 512MB RAM ,Video core detect the edges in the image, as they will contain 4 GPU supports up to 1920x1200 Resolution ,SD the required text. For this we can use various axes card slot , 10/100Mbps Ethernet port , 2 x USB 2.0 detecting techniques like: Sobel, Kirsch, Canny, ports ,HDMI, audio/video jack ,GPIO header Prewitt etc. The most accurate in finding the four containing 40 pins ,Micro USB power port directional axes: horizontal, vertical, right diagonal providing 2A current supply ,DSI and CSI ports and left diagonal is the Kirsch detector. This ,Dimensions: 85.6x56mm. technique uses the eight point neighborhood of Camera Module each pixel. The Raspberry Pi camera module size is 25mm Optical Character Recognition square, 5MP sensor much smaller than the Optical character recognition, usually abbreviated Raspberry Pi computer, to which it connects by a to OCR, is the mechanical or electronic conversion flat flex cable (FFC, 1mm pitch, 15 conductor, and of scanned images of handwritten, typewritten or type B printed text into machine encoded text. It is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed on-line and used in machine processes such as machine translation, text-to- speech and text mining. OCR is Fig.2 Raspberry Pi Camera Module a field of research in pattern recognition, artificial The Raspberry Pi camera module offers a unique intelligence and computer vision. new capability for optical instrumentation with Tesseract critical capabilities as follows: Tesseract is a free software optical character 1080p video recording to SD flash memory cards. recognition engine for various operating systems. Simultaneous output of 1080p live video via Tesseract is considered as one of the most accurate HDMI, while recording. Sensor type: Omni Vision free software OCR engines currently available. It is OV5647 Colour CMOS QSXGA (5-megapixel) , available for Linux, Windows and Mac OS. Sensor size: 3.67 x 2.74 mm , Pixel Count: 2592 x 1944 ,Pixel Size: 1.4 x 1.4 um, Lens: f=3.6 mm, An image with the text is given as input to the f/2.9 ,Angle of View: 54 x 41 degrees, Field of Tesseract engine that is command based tool. Then View: 2.0 x 1.33 m at 2 m , Full-frame SLR lens, it is processed by Tesseract command. Tesseract equivalent: 35 mm ,Fixed Focus: 1 m to infinity, command takes two arguments: First argument is Removable lens. Adapters for M12, C-mount, image file name that contains text and second Canon EF, and Nikon F Mount lens interchange. argument is output text file in which, extracted text In-camera image mirroring. is stored. The output file extension is given as .txt by Tesseract, so no need to specify the file Image Processing extension while specifying the output file name as a second argument in Tesseract command. After 2320 –5547 @ 2013-2016 http://www.ijitr.com All rights Reserved. Page | 4565 K. Lakshmi* et al. (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.4, Issue No.6, October – November 2016, 4564-4567. processing is completed, the content of the output operating system based on Debian, optimized for is present in .txt file. In simple images with or the Raspberry Pi hardware. Raspbian comes with without color (gray scale), Tesseract provides over 35,000 packages: precompiled software results with 100% accuracy. But in the case of bundled in a nice format for easy installation on the some complex images Tesseract provides better Raspberry Pi. Raspbian is a community project accuracy results if the images are in the gray scale under active development, with an emphasis on mode as compared to color images.

Design and Implementation of Text to Speech Conversion Using Raspberry PI K

The Role of Higher-Level Linguistic Features in HMM-Based Speech Synthesis

The RACAI Text-To-Speech Synthesis System

Synthesis and Recognition of Speech Creating and Listening to Speech

The Role of Speech Processing in Human-Computer Intelligent Communication

Voice User Interface on the Web Human Computer Interaction Fulvio Corno, Luigi De Russis Academic Year 2019/2020 How to Create a VUI on the Web?

A Multimodal User Interface for an Assistive Robotic Shopping Cart

Voice Assistants and Smart Speakers in Everyday Life and in Education

Models of Speech Synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden

Attention, I'm Trying to Speak Cs224n Project: Speech Synthesis

Quantifying the Effects of Prosody Modulation on User Engagement

Lecture 5: Part-Of-Speech Tagging

Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-To-Speech Synthesis