Design and Implementation of Text to Speech Conversion for Visually Impaired People
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Estudios De I+D+I
ESTUDIOS DE I+D+I Número 51 Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario Autor/es: Catalá Mallofré, Andreu Filiación: Universidad Politécnica de Cataluña Contacto: Fecha: 2006 Para citar este documento: CATALÁ MALLOFRÉ, Andreu (Convocatoria 2006). “Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario”. Madrid. Estudios de I+D+I, nº 51. [Fecha de publicación: 03/05/2010]. <http://www.imsersomayores.csic.es/documentos/documentos/imserso-estudiosidi-51.pdf> Una iniciativa del IMSERSO y del CSIC © 2003 Portal Mayores http://www.imsersomayores.csic.es Resumen Este proyecto se enmarca dentro de una de las líneas de investigación del Centro de Estudios Tecnológicos para Personas con Dependencia (CETDP – UPC) de la Universidad Politécnica de Cataluña que se dedica a desarrollar soluciones tecnológicas para mejorar la calidad de vida de las personas con discapacidad. Se pretende aprovechar el gran avance que representan las nuevas tecnologías de identificación con radiofrecuencia (RFID), para su aplicación como sistema de apoyo a personas con déficit de distinta índole. En principio estaba pensado para personas con discapacidad visual, pero su uso es fácilmente extensible a personas con problemas de comprensión y memoria, o cualquier tipo de déficit cognitivo. La idea consiste en ofrecer la posibilidad de reconocer electrónicamente los objetos de la vida diaria, y que un sistema pueda presentar la información asociada mediante un canal verbal. Consta de un terminal portátil equipado con un trasmisor de corto alcance. Cuando el usuario acerca este terminal a un objeto o viceversa, lo identifica y ofrece información complementaria mediante un mensaje oral. -
Expression Control in Singing Voice Synthesis
Expression Control in Singing Voice Synthesis Features, approaches, n the context of singing voice synthesis, expression control manipu- [ lates a set of voice features related to a particular emotion, style, or evaluation, and challenges singer. Also known as performance modeling, it has been ] approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The Iaim of this article is to provide an overview of approaches to expression control in singing voice synthesis. We introduce some musical applica- tions that use singing voice synthesis techniques to justify the need for Martí Umbert, Jordi Bonada, [ an accurate control of expression. Then, expression is defined and Masataka Goto, Tomoyasu Nakano, related to speech and instrument performance modeling. Next, we pres- ent the commonly studied set of voice parameters that can change and Johan Sundberg] Digital Object Identifier 10.1109/MSP.2015.2424572 Date of publication: 13 October 2015 IMAGE LICENSED BY INGRAM PUBLISHING 1053-5888/15©2015IEEE IEEE SIGNAL PROCESSING MAGAZINE [55] noVEMBER 2015 voices that are difficult to produce naturally (e.g., castrati). [TABLE 1] RESEARCH PROJECTS USING SINGING VOICE SYNTHESIS TECHNOLOGIES. More examples can be found with pedagogical purposes or as tools to identify perceptually relevant voice properties [3]. Project WEBSITE These applications of the so-called music information CANTOR HTTP://WWW.VIRSYN.DE research field may have a great impact on the way we inter- CANTOR DIGITALIS HTTPS://CANTORDIGITALIS.LIMSI.FR/ act with music [4]. Examples of research projects using sing- CHANTER HTTPS://CHANTER.LIMSI.FR ing voice synthesis technologies are listed in Table 1. -
UN SYNTHÉTISEUR DE LA VOIX CHANTÉE BASÉ SUR MBROLA POUR LE MANDARIN Liu Ning
UN SYNTHÉTISEUR DE LA VOIX CHANTÉE BASÉ SUR MBROLA POUR LE MANDARIN Liu Ning To cite this version: Liu Ning. UN SYNTHÉTISEUR DE LA VOIX CHANTÉE BASÉ SUR MBROLA POUR LE MAN- DARIN. Journées d’Informatique Musicale, 2012, Mons, France. hal-03041805 HAL Id: hal-03041805 https://hal.archives-ouvertes.fr/hal-03041805 Submitted on 5 Dec 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Actes des Journées d’Informatique Musicale (JIM 2012), Mons, Belgique, 9-11 mai 2012 UN SYNTHÉTISEUR DE LA VOIX CHANTÉE BASÉ SUR MBROLA POUR LE MANDARIN Liu Ning CICM (Centre de recherche en Informatique et Création Musicale) Université Paris VIII, MSH Paris Nord [email protected] RESUME fonction de la mélodie et des paroles à partir d’un fichier MIDI [7]. Dans cet article, nous présentons le projet de développement d’un synthétiseur de la voix chantée basé Néanmoins, les synthèses de la voix chantée pour la sur MBROLA en mandarin pour la langue chinoise. langue chinoise ont encore des limites techniques. Par Notre objectif vise à développer un synthétiseur qui exemple, le système à partir de la lecture de fichier MIDI puisse fonctionner en temps réel, qui soit capable de se est un système en temps différé. -
Espeak : Speech Synthesis
Software Requirements Specification for ESpeak : Speech Synthesis Version 1.48.15 Prepared by Dimitrios Koufounakis January 10, 2018 Copyright © 2002 by Karl E. Wiegers. Permission is granted to use, modify, and distribute this document. Software Requirements Specification for <Project> Page ii Table of Contents Table of Contents .......................................................................................................................... ii Revision History ............................................................................................................................ ii 1. Introduction ..............................................................................................................................1 1.1 Purpose ............................................................................................................................................. 1 1.2 Document Conventions .................................................................................................................... 1 1.3 Intended Audience and Reading Suggestions................................................................................... 1 1.4 Project Scope .................................................................................................................................... 1 1.5 References......................................................................................................................................... 1 2. Overall Description ..................................................................................................................2 -
Fully Generated Scripted Dialogue for Embodied Agents
Fully Generated Scripted Dialogue for Embodied Agents Kees van Deemter 1, Brigitte Krenn 2, Paul Piwek 3, Martin Klesen 4, Marc Schröder 5, and Stefan Baumann 6 Abstract : This paper presents the NECA approach to the generation of dialogues between Embodied Conversational Agents (ECAs). This approach consist of the automated constructtion of an abstract script for an entire dialogue (cast in terms of dialogue acts), which is incrementally enhanced by a series of modules and finally ”performed” by means of text, speech and body language, by a cast of ECAs. The approach makes it possible to automatically produce a large variety of highly expressive dialogues, some of whose essential properties are under the control of a user. The paper discusses the advantages and disadvantages of NECA’s approach to Fully Generated Scripted Dialogue (FGSD), and explains the main techniques used in the two demonstrators that were built. The paper can be read as a survey of issues and techniques in the construction of ECAs, focussing on the generation of behaviour (i.e., focussing on information presentation) rather than on interpretation. Keywords : Embodied Conversational Agents, Fully Generated Scripted Dialogue, Multimodal Interfaces, Emotion modelling, Affective Reasoning, Natural Language Generation, Speech Synthesis, Body Language. 1. Introduction A number of scientific disciplines have started, in the last decade or so, to join forces to build Embodied Conversational Agents (ECAs): software agents with a human-like synthetic voice and 1 University of Aberdeen, Scotland, UK (Corresponding author, email [email protected]) 2 Austrian Research Center for Artificial Intelligence (OEFAI), University of Vienna, Austria 3 Centre for Research in Computing, The Open University, UK 4 German Research Center for Artificial Intelligence (DFKI), Saarbruecken, Germany 5 German Research Center for Artificial Intelligence (DFKI), Saarbruecken, Germany 6 University of Koeln, Germany. -
Text to Speech Synthesis for Bangla Language
I.J. Information Engineering and Electronic Business, 2019, 2, 1-9 Published Online March 2019 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijieeb.2019.02.01 Text to Speech Synthesis for Bangla Language Khandaker Mamun Ahmed Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh Email: [email protected] Prianka Mandal Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh Email: [email protected] B M Mainul Hossain Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh Email: [email protected] Received: 13 September 2018; Accepted: 14 December 2018; Published: 08 March 2019 Abstract—Text-to-speech (TTS) synthesis is a rapidly A TTS system converts natural language text into growing field of research. Speech synthesis systems are speech and then, a computer system able to read text applicable to several areas such as robotics, education and aloud. A speech synthesizer converts written text to a embedded systems. The implementation of such TTS phonemic representation and then converts the phonemic system increases the correctness and efficiency of an representation to waveforms which can be output as application. Though Bangla is the seventh most spoken sound. language all over the world, uses of TTS system in There are several ways to create synthesized speech. applications are difficult to find for Bangla language Among them, concatenative synthesis and formant because of lacking simplicity and lightweightness in TTS synthesis are very popular. Concatenative synthesis is systems. Therefore, in this paper, we propose a simple based on concatenating pre-recorded speech of phonemes, and lightweight TTS system for Bangla language. -
Feasibility Study on a Text-To-Speech Synthesizer for Embedded Systems
2006:113 CIV MASTER'S THESIS Feasibility Study on a Text-To-Speech Synthesizer for Embedded Systems Linnea Hammarstedt Luleå University of Technology MSc Programmes in Engineering Electrical Engineering Department of Computer Science and Electrical Engineering Division of Signal Processing 2006:113 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--06/113--SE Preface This is a master degree project commissioned by and performed at Teleca Systems GmbH in N¨urnberg at the department of Speech Technology. Teleca is an IT services company focused on developing and integrating advanced software and information technology so- lutions. Today Teleca possesses a speak recognition system including a grapheme-to-phoneme module, i.e., an algorithm converting text into phonetic notation. Their future objective is to develop a Text-To-Speech system including this module. The purpose of this work from Teleca’s point of view is to investigate a possible solution of converting phonetic notation into speech suitable for an embedded implementation platform. I would like to thank Dr. Andreas Kiessling at Teleca for his support and patient discussions during this work, and Dr. Stefan Dobler, the head of department, for giving me the possibility to experience this interesting field of speech technology. Finally, I wish to thank all the other personnel of the department for their consistently practical support. i Abstract A system converting textual information into speech is usually denoted as a TTS (Text-To- Speech) system. The design of this system varies depending on its purpose and platform requirements. In this thesis a TTS synthesizer designed for an embedded system operat- ing on an arbitrary vocabulary has been evaluated and partially implemented in Matlab, constituting a base for further development. -
Attuning Speech-Enabled Interfaces to User and Context for Inclusive Design: Technology, Methodology and Practice
CORE Metadata, citation and similar papers at core.ac.uk Provided by Springer - Publisher Connector Univ Access Inf Soc (2009) 8:109–122 DOI 10.1007/s10209-008-0136-x LONG PAPER Attuning speech-enabled interfaces to user and context for inclusive design: technology, methodology and practice Mark A. Neerincx Æ Anita H. M. Cremers Æ Judith M. Kessens Æ David A. van Leeuwen Æ Khiet P. Truong Published online: 7 August 2008 Ó The Author(s) 2008 Abstract This paper presents a methodology to apply 1 Introduction speech technology for compensating sensory, motor, cog- nitive and affective usage difficulties. It distinguishes (1) Speech technology seems to provide new opportunities to an analysis of accessibility and technological issues for the improve the accessibility of electronic services and soft- identification of context-dependent user needs and corre- ware applications, by offering compensation for limitations sponding opportunities to include speech in multimodal of specific user groups. These limitations can be quite user interfaces, and (2) an iterative generate-and-test pro- diverse and originate from specific sensory, physical or cess to refine the interface prototype and its design cognitive disabilities—such as difficulties to see icons, to rationale. Best practices show that such inclusion of speech control a mouse or to read text. Such limitations have both technology, although still imperfect in itself, can enhance functional and emotional aspects that should be addressed both the functional and affective information and com- in the design of user interfaces (cf. [49]). Speech technol- munication technology-experiences of specific user groups, ogy can be an ‘enabler’ for understanding both the content such as persons with reading difficulties, hearing-impaired, and ‘tone’ in user expressions, and for producing the right intellectually disabled, children and older adults. -
Assisting the Speech Impaired People Using Text-To-Speech Synthesis 1Ledisi G
International Journal of Emerging Engineering Research and Technology Volume 3, Issue 8, August, 2015, PP 214-224 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Assisting the Speech Impaired People Using Text-to-Speech Synthesis 1Ledisi G. Kabari 2Ledum F. Atu 1,2Department of Computer Science, Rivers State Polytechnic, Bori, Nigeria ABSTRACT Many people have some sort of disability which impairs their ability to communicate, thus with all their intelligence they are not able to participate in conferences or meeting proceedings and consequently inhibit in a way their contributions to development of the nation. Work in alternative and augmentative communication (AAC) devices attempts to address this need. This paper tries to design a system that can be used to read out an input to its user. The system was developed using Visual Basic6.0 for the user interface, and was interfaced with the Lemout & Hauspie True Voice Text-to-Speech Engine (American English) and Microsoft agent. The system will allow the users to input the text to be read out, and also allows the user to open text document of the Rich Text Formant (.rtf) and Text File format (.txt) that have been saved on disk, for reading. Keywords: Text-To-Speech (TTS), Digital Signal Processing (DSP), Natural Language Processing (NLP), Alterative and Augmentative Communication (AAC), Rich Text file Format (RTF), Text File Format (TFF). INTRODUCTION The era has come and it is now here, where the mind and reasoning of all are needed for the development of a nation. Thus people who have some sort of disability which impairs their ability to communicate cannot be left behind. -
Guía De Accesibilidad De Oracle Solaris 11 Desktop • Octubre De 2012 Contenido
Guía de accesibilidad de Oracle® Solaris 11 Desktop Referencia: E36637–01 Octubre de 2012 Copyright © 2011, 2012, Oracle y/o sus filiales. Todos los derechos reservados. Este software y la documentación relacionada están sujetos a un contrato de licencia que incluye restricciones de uso y revelación, y se encuentran protegidos por la legislación sobre la propiedad intelectual. A menos que figure explícitamente en el contrato de licencia o esté permitido por la ley, no se podrá utilizar, copiar, reproducir, traducir, emitir, modificar, conceder licencias, transmitir, distribuir, exhibir, representar, publicar ni mostrar ninguna parte,de ninguna forma, por ningún medio. Queda prohibida la ingeniería inversa, desensamblaje o descompilación de este software, excepto en la medida en que sean necesarios para conseguir interoperabilidad según lo especificado por la legislación aplicable. La información contenida en este documento puede someterse a modificaciones sin previo aviso y no se garantiza que se encuentre exenta de errores. Sidetecta algún error, le agradeceremos que nos lo comunique por escrito. Si este software o la documentación relacionada se entrega al Gobierno de EE.UU. o a cualquier entidad que adquiera licencias en nombre del Gobierno de EE.UU. se aplicará la siguiente disposición: U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. -
Multi-Lingual Screen Reader and Processing of Font-Data in Indian Languages Submitted by A
MULTI-LINGUAL SCREEN READER AND PROCESSING OF FONT-DATA IN INDIAN LANGUAGES A THESIS submitted by A. ANAND AROKIA RAJ for the award of the degree of Master of Science (by Research) in Computer Science & Engineering LANGUAGE TECHNOLOGIES RESEARCH CENTER INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY HYDERABAD - 500 032, INDIA February 2008 Dedication I dedicate this work to my loving parents ANTONI MUTHU RAYAR and MUTHU MARY who are always behind my growth. i THESIS CERTIFICATE This is to certify that the thesis entitled Multi-Lingual Screen Reader and Processing of Font-Data in Indian Languages submitted by A. Anand Arokia Raj to the International Institute of Information Technology, Hyderabad for the award of the degree of Master of Science (by Research) in Computer Science & Engineering is a bonafide record of research work carried out by him under our supervision. The contents of this thesis, in full or in parts, have not been submitted to any other Institute or University for the award of any degree or diploma. Hyderabad - 500 032 (Mr. S. Kishore Prahallad) Date: ACKNOWLEDGMENTS I thank my adviser Mr.Kishore Prahallad for his guidance during my thesis work. Because of him now I am turned into not only a researcher but also a developer. I thank Prof.Rajeev Sangal and Dr.Vasudeva Varma for their valuable advices and inputs for improving the screen reading software. I am also thankful to my speech lab friends and other IIIT friends who helped me in many ways. They were with me when I needed a help and when I needed a relaxation. -
A Unit Selection Text-To-Speech-And-Singing Synthesis Framework from Neutral Speech: Proof of Concept 39 II.1 Introduction
Adding expressiveness to unit selection speech synthesis and to numerical voice production 90) - 02 - Marc Freixes Guerreiro http://hdl.handle.net/10803/672066 Generalitat 472 (28 de Catalunya núm. Rgtre. Fund. ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de ió la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual undac F (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs. Universitat Ramon Llull Universitat Ramon ADVERTENCIA. El acceso a los contenidos de esta tesis doctoral y su utilización debe respetar los derechos de la persona autora. Puede ser utilizada para consulta o estudio personal, así como en actividades o materiales de investigación y docencia en los términos establecidos en el art. 32 del Texto Refundido de la Ley de Propiedad Intelectual (RDL 1/1996).