International Journal of Pure and Applied Mathematics
Volume 117 No. 21 2017, 119-125
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue

Text Reader for Blind: Text-To-Speech

L. Mary Gladence1, Shubham Melvin Felix2, Aatisha Cyrill3
1Assistant Professor/IT, Sathyabama University, Chennai, India
2UG student, IT, Sathyabama University, Chennai, India

Abstract: As people move towards higher standards of living and a more digitalised and interconnected world, computers play an eminent role by providing efficient and optimal ways of achieving the required goals. Human resources and computer systems together form the perfect paradigm of a troubleshooter. Such systems need to be user friendly, accurate, and multitasking, as they are needed by every section of people. For visually impaired people, however, such software and systems pose a great deal of struggle and difficulty, and the full use of their facilities is hampered by the visual interface. This can be solved by using the hearing capability. Keeping this in mind, the software will be able to read the text present on the screen, in a webpage, in a document, or entered in a text box, using the FreeTTS text-to-speech synthesizer. The text is converted into speech by analyzing and processing it using Natural Language Processing (NLP) and then using Digital Signal Processing (DSP) technology to convert the processed text into a synthesized speech representation of the text. Through speech, visually impaired people can listen to large volumes of text more easily. Beyond the text-to-speech facility, the software will be able to export the text to an audio file such as *.mp3 or *.wav. It will be an efficient way for blind people to interact with the computer and use its facilities.

Keywords: FreeTTS, Text-to-speech synthesis, Natural Language Processing, Digital Signal Processing.

I. INTRODUCTION

Artificial speech has been a dream of humankind for centuries. The computer is a silent teacher for most. Computer instructions are often transmitted visually through textual presentation, which is analogous to conducting a lesson using the chalkboard without speaking. The majority of currently available educational software provides feedback through pictures, written words, or electronic beeps and tunes. Special-education teachers are well aware of the problems created when students with learning problems are forced to use only written material. Computer-based text has a similar potential for causing difficulty among poor readers and, most significantly, the visually impaired. Hence the idea of incorporating computer-generated voice in all types of software has revolutionised people's lives, and there is much further to go.

With the exception of IBM's Writing to Read, most talking software produced for the education market prior to 1986 had a special-education focus. In the last few years, however, most speech-based instructional programs have targeted general education.

The three most common ways of adding speech to educational software are compressed, digitized human speech; linear predictive coding (LPC); and text-to-speech. For most users, text-to-speech reduces the effort of everyday tasks, but for visually impaired people it can be a great tool for leading a more normal life.

II. RELATED WORK

Itunuoluwa Isewon, Jelili Oyelade and Olufunke Oladipupo from Covenant University, Nigeria, proposed the idea of text-to-speech synthesis in their paper “Design and Implementation of Text To Speech Conversion for Visually Impaired People” [1]. Their model is divided into the following modules:

- Natural Language Processing (NLP) module: It produces a phonetic transcription of the text read, together with prosody.
- Digital Signal Processing (DSP) module: It transforms the symbolic information it receives from NLP into audible and intelligible speech.

The major operations of the NLP module are as follows:

- Text Analysis: First the text is segmented into tokens. The token-to-word conversion creates the orthographic form of the token. For the token “Mr” the orthographic form “Mister” is formed by expansion, the token “12” gets the orthographic form “twelve”, and “1997” is transformed to “nineteen ninety seven”.
- Application of Pronunciation Rules: After the text analysis has been completed, pronunciation rules can be applied. Letters cannot be transformed 1:1 into phonemes because the correspondence is not always parallel. In certain environments, a single letter can correspond to either no phoneme (for example, “h” in “caught”) or several phonemes (“x” in “Maximum”). In addition, several letters can correspond to a single phoneme (“ch” in “rich”). There are two strategies to determine pronunciation:
  - In a dictionary-based solution with morphological components, as many morphemes (words) as possible are stored in a dictionary. Full forms are generated by means of inflection, derivation and composition rules. Alternatively, a full-form dictionary is used in which all possible word forms are stored. Pronunciation rules determine the pronunciation of words not found in the dictionary.
  - In a rule-based solution, pronunciation rules are generated from the phonological knowledge of dictionaries. Only words whose pronunciation is a complete exception are included in the dictionary.
  The two approaches differ significantly in the size of their dictionaries. The dictionary of the dictionary-based solution is many times larger than the rule-based solution's dictionary of exceptions. However, dictionary-based solutions can be more exact than rule-based solutions if they have a large enough phonetic dictionary available.
- Prosody Generation: After the pronunciation has been determined, the prosody is generated. The degree of naturalness of a TTS system depends on prosodic factors like intonation modelling (phrasing and accentuation), amplitude modelling and duration modelling (including the duration of sound and the duration of pauses, which determine the length of the syllable and the tempo of the speech). [2]

The output of the NLP module is passed to the DSP module. This is where the actual synthesis of the speech signal happens. In concatenative synthesis, the selection and linking of speech segments takes place. For individual sounds, the best option (where several appropriate options are available) is selected from a database and concatenated.

III. PROPOSED WORK

Speech synthesis can be described as the artificial production of human speech, and a text-to-speech (TTS) synthesizer is the technology that lets a computer speak to you.

The text-to-speech (TTS) synthesis procedure consists of two main phases, which are shown in “Fig. 1”. The first is text analysis, where the input text is transcribed into a phonetic or some other linguistic representation, and the second is the generation of speech waveforms.

Fig 1: Process Diagram

In the first phase, the raw input text is entered by the user, or a text/document file is imported into the software, and the text then undergoes text analysis. Text analysis is the process of converting raw text containing symbols such as numbers and abbreviations into the equivalent written-out words, using English dictionary words. This process is often called text normalization, pre-processing, or tokenization.

The second phase of the software can be sub-divided into two parts. The first part is Natural Language Processing (NLP), which produces a phonetic transcription of the text read, together with prosody; the speech database is consulted so that words are processed correctly. The other part is Digital Signal Processing (DSP), which transforms the symbolic information it receives from NLP into audible and intelligible speech. To produce the speech output, FreeTTS and MBROLA voices are used, which provide the API for producing the voice for text-to-speech.

These phases can also be divided into two parts, the front-end and the back-end. The front-end has two major tasks: first the text analysis and second the natural language processing. The back-end, often referred to as the synthesizer, converts the symbolic linguistic representation into sound. This part is also referred to as the digital signal processing.

There are different ways to perform speech synthesis. The choice depends on the task they are used for, but the most widely used method is Concatenative Synthesis, because it generally produces the most natural-sounding synthesized speech. Concatenative synthesis is based on the concatenation

is created by determining the best chain of candidate units from the database (unit selection). This process is typically achieved using a specially weighted decision tree.

Unit selection provides the greatest naturalness. DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform. The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned.

Diphone Synthesis: Diphone synthesis uses a minimal speech database containing all the