Optical Character Recognition Based Speech Synthesis System Using LabVIEW

S. K. Singla*1 and R. K. Yadav2
1 Electrical and Instrumentation Engineering Department, Thapar University, Patiala, Punjab. *[email protected]
2 Electronics and Instrumentation Engineering Department, Galgotias College of Engineering and Technology, Greater Noida, U.P.

ABSTRACT
Knowledge extraction by simply listening to sounds is a distinctive human ability. A speech signal is a more effective means of communication than text, because blind and visually impaired persons can also respond to sounds. This paper aims to develop a cost-effective and user-friendly optical character recognition (OCR) based speech synthesis system. The system has been developed using Laboratory Virtual Instrument Engineering Workbench (LabVIEW) 7.1.

Keywords: Optical character recognition, Speech, Synthesis, Recognition, LabVIEW.

1. Introduction

Machine replication of human functions, such as reading, is an ancient dream. Over the last few decades, however, machine reading has grown from a dream into reality. Text is present everywhere in our day-to-day life, either in the form of documents (newspapers, books, mail, magazines, etc.) or in the form of natural scenes (signs, screens, schedules), all of which can be read by a sighted person. Unfortunately, blind and visually impaired persons are deprived of this information, because their vision problems do not allow them to access such text, which limits their mobility in unconstrained environments. An OCR based speech synthesis system can significantly improve the degree to which the visually impaired interact with their environment, bringing it closer to that of a sighted person [1]. This work is related to existing research on text detection from general backgrounds or video images [2-7] and on Bangla optical character recognition (OCR) systems [8-9]. Some researchers have also published work on texture-based text detection [10]. The OCR based speech synthesis system built in LabVIEW uses a scanner to capture images of printed or handwritten text, recognizes that text, and renders the recognized text as a voice output message [11] using the Microsoft Speech SDK (Text To Speech). This paper is organized as follows: Section 2 deals with the recognition of characters using LabVIEW. The speech synthesis technique is explained in Section 3. Section 4 discusses the experimental results, and the paper concludes with Section 5.

2. Optical Character Recognition

Optical character recognition (OCR) is the mechanical or electronic translation of images of handwritten or printed text into machine-editable text [12]. The OCR based system consists of the following processing steps:

a) Image acquisition
b) Image pre-processing (binarization)
c) Image segmentation
d) Matching and recognition

2.1 Image Acquisition

The image has been captured using a digital HP scanner. The flap of the scanner was kept open during the acquisition process in order to obtain a uniform black background. The image was acquired using the program developed in LabVIEW shown in Figure 1. The image is configured with the help of the IMAQ Create subVI of LabVIEW; configuring the image means selecting the image type and the border size of the image as required. In this work, an 8-bit image with a border size of 3 has been used.

Figure 1. Image acquisition.

2.2 Image Pre-processing (Binarization)

Binarization is the process of converting a grayscale image (pixel values 0 to 255) into a binary image (pixel values 0 or 1) by using a threshold value. Pixels lighter than the threshold are turned white and the remainder black. In this work, global thresholding with a threshold value of 175 has been used to binarize the image, i.e., pixels with values from 175 to 255 are converted to 1, while pixels with grayscale values below 175 are converted to 0. The LabVIEW program for binarization is shown in Figure 2.

Figure 2. Binarization process.
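The global-threshold rule described above translates directly into array operations. The following Python/NumPy fragment is an illustrative sketch only, not the LabVIEW code: the threshold of 175 and the 0/1 output convention come from the text, while the function and variable names are assumptions.

```python
import numpy as np

THRESHOLD = 175  # global threshold value used in the paper

def binarize(gray: np.ndarray, threshold: int = THRESHOLD) -> np.ndarray:
    """Convert a grayscale image (0-255) to a binary image (0/1).

    Pixels with values from `threshold` to 255 become 1; pixels below
    `threshold` become 0, mirroring the rule described in section 2.2.
    """
    return (gray >= threshold).astype(np.uint8)

# Example (hypothetical input): load a scanned page as a grayscale array,
# e.g. gray = np.array(Image.open("scan.png").convert("L")) with Pillow,
# then: binary = binarize(gray)
```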
2.3 Image Segmentation

The segmentation process consists of line segmentation, word segmentation and, finally, character segmentation.

2.3.1 Line segmentation

Line segmentation is the first step of the segmentation process. It takes the image array as input and scans the image horizontally to find the first ON pixel, remembering that coordinate as y1. The system continues to scan the image horizontally and encounters many ON pixels, since the characters have started. When the first OFF pixel is detected, the system remembers that coordinate as y2 and checks the surrounding pixels to find the required number of OFF pixels. If this condition is met, the system clips the first line (fl) from the input image between coordinates y1 and y2. In this way, all the lines are segmented and stored to be used for word and character segmentation. The LabVIEW program for line segmentation is shown in Figure 3.

Figure 3. Line segmentation.

2.3.2 Word Segmentation

In the word segmentation process, each line-segmented image is scanned vertically to find the first ON pixel. When this happens, the system remembers the coordinate of this point as x1; this is the starting coordinate of the word. The system continues the scanning process until fifteen successive OFF pixels (the assumed word distance) have been found, and records the first of these OFF pixels as x2. The region from x1 to x2 is the word. In this way, all the words are segmented, and the segmented words are used in the next step for character recognition. Figure 4 shows the LabVIEW program for word segmentation.

Figure 4. Word segmentation.

2.3.3 Character Segmentation

Character segmentation is performed by scanning the word-segmented image vertically. This process differs from word segmentation in the following two ways:

i) The number of horizontal OFF pixels between different characters is smaller than the number of OFF pixels between words.
ii) The total number of characters and their order in the word must be determined so that the word can be reproduced correctly during speech synthesis.

The LabVIEW program for character segmentation is shown in Figure 5.

Figure 5. Character segmentation.

2.4 Matching and Recognition

In this process, the correlation between the stored templates and the segmented character is computed using the Correlation VI. The Correlation VI determines the correlation between the segmented character and the stored template of each character, and the template with the highest correlation value identifies the character. In this way, every segmented character is compared with the predefined data stored in the system in order to recognize it. Since the same font and size have been used for recognition, a unique match is obtained for each character. Figure 6 shows the LabVIEW program for the correlation between two images. The flow chart of the OCR system is shown in Figure 7.

Figure 6. Correlation between two images.

Figure 7. Flow chart of the OCR system (Start → File/scan → Read the file → Determine threshold → Binarize the image → Line detection → Word segmentation → Character segmentation → Feature extraction → Recognize character → Stop).
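The row/column scanning logic of sections 2.3.1 and 2.3.2 can be expressed compactly outside LabVIEW. The sketch below is a NumPy reimplementation offered for illustration, not the original VIs: the fifteen-column word gap comes from the paper, while the one-row line gap and all names are assumptions; character segmentation (section 2.3.3) would reuse segment_words with a smaller gap.

```python
import numpy as np

def segment_lines(binary: np.ndarray, min_gap: int = 1):
    """Cut horizontal bands (text lines) wherever enough all-OFF rows follow ON rows."""
    row_has_ink = binary.any(axis=1)           # True for rows containing ON pixels
    lines, start, gap = [], None, 0
    for y, on in enumerate(row_has_ink):
        if on:
            if start is None:
                start = y                      # y1: first ON row of the line
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:                 # y2 reached: clip the line
                lines.append(binary[start:y - gap + 1, :])
                start, gap = None, 0
    if start is not None:
        lines.append(binary[start:, :])
    return lines

def segment_words(line: np.ndarray, word_gap: int = 15):
    """Split one line into words at runs of `word_gap` consecutive OFF columns."""
    col_has_ink = line.any(axis=0)
    words, start, gap = [], None, 0
    for x, on in enumerate(col_has_ink):
        if on:
            if start is None:
                start = x                      # x1: first ON column of the word
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= word_gap:                # x2 reached: clip the word
                words.append(line[:, start:x - gap + 1])
                start, gap = None, 0
    if start is not None:
        words.append(line[:, start:])
    return words
```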
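Similarly, the template-matching rule of section 2.4, which picks the stored template with the highest correlation, can be mimicked in a few lines. This is a stand-in for the Correlation VI under the assumption that every segmented character and every template have already been scaled to the same size; the normalized-correlation formula and the dictionary of templates are illustrative choices, not the paper's implementation.

```python
import numpy as np

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def recognize(char_img: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Return the label of the stored template most correlated with the character."""
    return max(templates, key=lambda label: correlation(char_img, templates[label]))
```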
3. Text to speech synthesis

In the text-to-speech module, the text recognized by the OCR system is the input to the speech synthesis stage, which converts it into speech in .wav file format and creates a wave file named output.wav that can be listened to using a wave file player. Two steps are involved in text-to-speech synthesis:

i) Text to speech conversion
ii) Playing the speech in .wav file format

3.1 Text to speech conversion

In text-to-speech conversion, the input text is converted to speech (in LabVIEW) by using Automation Open, Invoke Node and Property Node. The LabVIEW program for text-to-speech conversion is shown in Figure 8, and the flow chart of text-to-speech conversion is shown in Figure 9. Figure 10 shows the flow chart for playing the converted speech signal, while Figure 11 shows its LabVIEW program.

Figure 8. Text to speech conversion.

Figure 9. Flow chart of text-to-speech conversion (Start → Automation Open → Create/open file → Property node (audio output stream) → Speak → Close file → Stop).

Figure 10. Flow chart of wave file player.vi (Start → Open the specified wave file → Retrieve sound information → Store sound data and make it available for the next step → Play back the selected file → Display the sound data acquired from the file → Stop).

Figure 11. Wave file player.

4. Results and Discussion

Experiments have been performed to test the proposed system developed in LabVIEW 7.1. The developed OCR based speech synthesis system has two stages:

a. Optical character recognition
b. Text to speech synthesis

Step 3. In this step, line segmentation of the thresholded image has been performed. Figure 14 shows the result of the line segmentation process.

Figure 14. Segmented line.

Step 4. In this step, words have been segmented from the line. Figure 15 shows the result of the word segmentation process.

Figure 15. Segmented word.

Step 5. In this step, character segmentation has been performed.
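For readers without LabVIEW, the Automation Open → Property Node (audio output stream) → Speak sequence of section 3.1 corresponds to driving the Microsoft Speech API (SAPI 5) through COM. The following Python sketch assumes Windows with SAPI and the pywin32 package; the file name output.wav mirrors the paper, but everything else is an assumption rather than the paper's code.

```python
import win32com.client  # pywin32; assumes Windows with SAPI 5 available

def text_to_wav(text: str, wav_path: str = "output.wav") -> None:
    """Render `text` to a .wav file via the Microsoft Speech API."""
    stream = win32com.client.Dispatch("SAPI.SpFileStream")
    stream.Open(wav_path, 3)              # 3 = SSFMCreateForWrite
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.AudioOutputStream = stream      # analogue of the Property Node step
    voice.Speak(text)                     # analogue of the Invoke Node "Speak"
    stream.Close()

# text_to_wav("hello world")  # the resulting output.wav can then be played back
```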