Využití FreeTTS pro VoiceXML (Using FreeTTS for VoiceXML)


MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Využití FreeTTS pro VoiceXML (Using FreeTTS for VoiceXML)

BACHELOR'S THESIS

Vojtěch Zavřel

Brno, autumn 2006

Declaration

I hereby declare that this bachelor's thesis is my own original work, which I have written independently. All sources and literature that I used or drew upon while writing it are properly cited, with a complete reference to the respective source.

Advisor: Mgr. Pavel Cenek

Acknowledgements

Above all I would like to thank my advisor, Mgr. Pavel Cenek, for his helpful and constructive consultations; Hana Dvořáčková and Taťána Zavřelová for proofreading; and everyone else who supported me while writing this thesis.

Abstract

FreeTTS is a speech synthesizer written in Java. It is interesting above all for its easy portability across operating systems. One of the most common uses of speech synthesizers today is in VoiceXML platforms, where synthesizers generate the spoken output. Unfortunately, FreeTTS cannot be integrated with VoiceXML platforms in a straightforward way, because it does not support the necessary standards, notably SSML and MRCP. This thesis demonstrates that such an integration is nevertheless possible, using proprietary technologies and accepting certain functional compromises. The VoiceXML platform OptimTalk was chosen for the integration. The thesis presents the technical realization and discusses in detail the possibilities and limits of the proposed solution.

Keywords

text-to-speech, FreeTTS, MBROLA, Flite, OptimTalk, JNI, VoiceXML, SSML, speech synthesis

Contents

1 Introduction
2 Speech synthesis
  2.1 History
  2.2 Approaches to speech synthesis
    2.2.1 Speech synthesis in the time domain
      2.2.1.1 Unit selection
      2.2.1.2 Diphone/triphone speech synthesis
      2.2.1.3 Domain-specific speech synthesis
    2.2.2 Speech synthesis in the frequency domain
    2.2.3 Articulatory speech synthesis
  2.3 Controlling the output of speech synthesis
  2.4 Uses of speech synthesis
3 Speech synthesizers
  3.1 Standards for speech synthesizers
    3.1.1 SSML (Speech Synthesis Markup Language)
    3.1.2 Client/server communication protocols
      3.1.2.1 MRCP (Media Resource Control Protocol)
      3.1.2.2 TTSCP (Text-to-Speech Control Protocol)
  3.2 Open-source TTS systems
    3.2.1 MBROLA
    3.2.2 Festival TTS
    3.2.3 FreeTTS
    3.2.4 Other systems
      3.2.4.1 Epos
      3.2.4.2 GNUSpeech
  3.3 Commercial TTS systems
    3.3.1 Nuance TTS
    3.3.2 Acapela Group TTS
    3.3.3 Loquendo TTS
    3.3.4 SpeechTech TTS
  3.4 Properties of FreeTTS
    3.4.1 Voices supported by FreeTTS
    3.4.2 Standards supported by FreeTTS
      3.4.2.1 JSAPI (Java Speech Application Programming Interface)
      3.4.2.2 SSML and output-control languages
      3.4.2.3 Client/server communication protocol
    3.4.3 Performance: FreeTTS vs. Flite
4 W3C Speech Interface Framework and the OptimTalk platform
  4.1 VoiceXML 2.0 (Voice eXtensible Markup Language)
    4.1.1 Why VoiceXML
    4.1.2 Structure of a VoiceXML document
  4.2 SSML (Speech Synthesis Markup Language)
  4.3 Other languages from the W3C Speech Interface Framework family
    4.3.1 CCXML (Call Control eXtensible Markup Language)
    4.3.2 SRGS (Speech Recognition Grammar Specification)
    4.3.3 SISR (Semantic Interpretation for Speech Recognition)
  4.4 The OptimTalk platform
    4.4.1 Life cycle of the output component
5 Integrating FreeTTS with the OptimTalk platform
  5.1 JNI (Java Native Interface)
    5.1.1 Architecture of an application using JNI
  5.2 Possible integration variants
    5.2.1 One-way integration
    5.2.2 Two-way integration via JNI
      5.2.2.1 Solution using temporary files
      5.2.2.2 Solution using audio streams
    5.2.3 Two-way integration via sockets
    5.2.4 Comparison of the integration approaches
  5.3 Practical realization
    5.3.1 The free_tts_output output component
    5.3.2 The OptimTalkFreeTTSComponent library
    5.3.3 Configuring the free_tts_output component
6 Conclusion
Literature
A Performance: FreeTTS vs. Flite

Chapter 1: Introduction

At a time when information technology grows ever more advanced and personal computers ever more powerful, we use them to accomplish tasks that were previously unthinkable. A few years ago, a computer speaking with quality comparable to a human was hard to imagine. The situation has improved considerably of late, and machines can now render written text so well that we can barely tell an automaton from a live person. With the growing availability of information sources, the use of electronic books, and many other areas, the opportunities for high-quality speech synthesis keep expanding. The sector is driven by commercial applications (automated telephone services, statistical and advertising systems, navigation devices, etc.). The visually impaired and blind also benefit greatly from these steadily improving technologies; for them, speech synthesis is another means of taking part in everyday life. Nor are ordinary users left out: almost every operating system now ships with a speech synthesizer, and although such synthesizers are not yet in widespread use, that can be expected in the future.
The goal of this thesis is to survey the state of the art in speech synthesizers, above all the cross-platform synthesizer FreeTTS, and to explore how FreeTTS can be made to work with VoiceXML. FreeTTS is exceptional chiefly because it is written in Java, which makes it easily portable and broadly usable. Unfortunately, FreeTTS cannot be used directly for one of the most common applications of speech synthesizers, integration with VoiceXML platforms, because it does not support the relevant standards, notably SSML and MRCP. This thesis demonstrates that such an integration is nevertheless possible, using proprietary technologies and accepting certain functional compromises. The VoiceXML platform OptimTalk was chosen for the integration.

The rest of the thesis is organized as follows. Chapter 2 covers a little history, walks through the techniques used in speech synthesis, and discusses where speech synthesizers are used. Chapter 3 introduces some well-known text-to-speech systems and describes the properties of FreeTTS. Chapter 4 deals with the standards from the W3C "Voice Browser" Activity family and introduces and partly describes the OptimTalk platform, on which Chapter 5 then demonstrates how FreeTTS can be integrated with a VoiceXML-capable platform.

Chapter 2: Speech synthesis

Speech synthesis is the computer production of spoken language, and the machines that perform it are called speech synthesizers. The English term text-to-speech, the origin of the abbreviation TTS, is also common. Originally, text-to-speech denoted a more comprehensive system capable of converting ordinary written text into spoken form, a capability plain speech synthesizers lacked; in this text, however, we treat the two terms as synonyms.

Text-to-speech systems usually consist of two parts, a front-end and a back-end. The front-end converts all symbols, numbers, and the like into their written-out equivalents (e.g. 9 into "nine").
This stage is loosely described as pre-processing: normalization and splitting of the text into words (tokenization). It is followed by a phonetic transcription of each word, segmentation (phrases, sentences), identification of pauses, intonation, and so on. The back-end, by contrast, performs the actual audio synthesis: it converts the prepared text into sound from the symbolic representation produced by the front-end. [11, 3, 4]

2.1 History

In the past, people often tried to build talking machines, but with purely mechanical means the quality of the output and the range of utterances were severely limited. Researchers mostly tried to imitate the workings of the human vocal tract at the mechanical level.

In 1939, the VODER was presented at the World's Fair in New York; it intelligibly read out texts typed on a keyboard. The first computer speech synthesizer was born in the second half of the 1950s, and roughly ten years later the first complete text-to-speech system saw the light of day. Synthesis techniques have improved markedly since then, and today they rely almost exclusively on computer technology. [11]

2.2 Approaches to speech synthesis

Today we distinguish three basic approaches to speech synthesis.

Concatenative, or time-domain, synthesis is based on assembling an utterance from small pieces of words. These pieces (segments) are spoken by a human and recorded into a database; segment databases are also called voices. The database often consists of phonemes, diphones, and the like when it covers a so-called general domain, or of larger units (words, parts of sentences) when it covers a limited domain. Voices that support a general domain can be used to synthesize arbitrary text. If, however, they support
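The front-end stage described above, expanding non-word tokens such as digits into words and splitting the text into tokens, can be sketched in a few lines of Java, the language FreeTTS itself is written in. This is a toy illustration, not FreeTTS code: the class name, the digit table, and the whitespace tokenizer are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class FrontEndSketch {
    // Minimal digit-to-word table; a real front-end also handles
    // multi-digit numbers, abbreviations, dates, currency, and so on.
    private static final String[] DIGITS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Normalization: expand a standalone digit to its written form.
    static String normalizeToken(String token) {
        if (token.matches("[0-9]")) {
            return DIGITS[Integer.parseInt(token)];
        }
        return token;
    }

    // Tokenization followed by per-token normalization.
    static List<String> frontEnd(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            out.add(normalizeToken(tok));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(frontEnd("chapter 9 begins"));
        // prints: [chapter, nine, begins]
    }
}
```

The phonetic transcription, pause placement, and intonation steps would then operate on this token list before the back-end turns the symbolic representation into sound.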
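Concatenative synthesis, as described in this section, amounts to looking segments up in a voice database and splicing them into one signal. The following Java sketch shows only the splicing idea; the diphone keys and the tiny arrays standing in for recorded waveforms are invented for the example, and a real synthesizer stores actual audio and smooths the joins between segments.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConcatSketch {
    // A stand-in "voice": diphone name -> recorded samples.
    static final Map<String, short[]> VOICE = new HashMap<>();
    static {
        VOICE.put("h-e", new short[]{1, 2});
        VOICE.put("e-l", new short[]{3, 4});
        VOICE.put("l-o", new short[]{5, 6});
    }

    // Splice the segments for a diphone sequence into one signal.
    static short[] synthesize(List<String> diphones) {
        int total = 0;
        for (String d : diphones) {
            total += VOICE.get(d).length;
        }
        short[] signal = new short[total];
        int pos = 0;
        for (String d : diphones) {
            short[] segment = VOICE.get(d);
            System.arraycopy(segment, 0, signal, pos, segment.length);
            pos += segment.length;
        }
        return signal;
    }
}
```

A general-domain voice keeps such a table for every diphone of the language, which is what makes arbitrary text synthesizable; a limited-domain voice stores whole words or phrases instead.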