Improvements in Speech Synthesis. Edited by E. Keller et al.

Total Pages: 16

File Type: PDF, Size: 1020 KB

Improvements in Speech Synthesis
COST 258: The Naturalness of Synthetic Speech

Edited by
E. Keller, University of Lausanne, Switzerland
G. Bailly, INPG, France
A. Monaghan, Aculab plc, UK
J. Terken, Technische Universiteit Eindhoven, The Netherlands
M. Huckvale, University College London, UK

JOHN WILEY & SONS, LTD

Copyright © 2002 by John Wiley & Sons, Ltd, Baffins Lane, Chichester, West Sussex, PO19 1UD, England
National 01243 779777; International (44) 1243 779777
e-mail (for orders and customer service enquiries): [email protected]
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com
ISBNs: 0-471-49985-4 (Hardback); 0-470-84594-5 (Electronic)

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London, W1P 9HE, UK, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the publication.

Neither the author(s) nor John Wiley and Sons Ltd accept any responsibility or liability for loss or damage occasioned to any person or property through using the material, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. The author(s) and Publisher expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley and Sons is aware of a claim, the product names appear in initial capital or capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Other Wiley Editorial Offices:
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
WILEY-VCH Verlag GmbH, Pappelallee 3, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario, M9W 1L1, Canada
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library. ISBN 0471 49985 4.

Typeset in 10/12pt Times by Kolam Information Services Ltd, Pondicherry, India. Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn. This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.
Contents

List of Contributors ix
Preface xiii

Part I: Issues in Signal Generation, 1
1. Towards Greater Naturalness: Future Directions of Research in Speech Synthesis (Eric Keller), 3
2. Towards More Versatile Signal Generation Systems (Gérard Bailly), 18
3. A Parametric Harmonic Noise Model (Gérard Bailly), 22
4. The COST 258 Signal Generation Test Array (Gérard Bailly), 39
5. Concatenative Text-to-Speech Synthesis Based on Sinusoidal Modelling (Eduardo Rodríguez Banga, Carmen García Mateo and Xavier Fernández Salgado), 52
6. Shape Invariant Pitch and Time-Scale Modification of Speech Based on a Harmonic Model (Darragh O'Brien and Alex Monaghan), 64
7. Concatenative Speech Synthesis Using SRELP (Erhard Rank), 76

Part II: Issues in Prosody, 87
8. Prosody in Synthetic Speech: Problems, Solutions and Challenges (Alex Monaghan), 89
9. State-of-the-Art Summary of European Synthetic Prosody R&D (Alex Monaghan), 93
10. Modelling F0 in Various Romance Languages: Implementation in Some TTS Systems (Philippe Martin), 104
11. Acoustic Characterisation of the Tonic Syllable in Portuguese (João Paulo Ramos Teixeira and Diamantino R.S. Freitas), 120
12. Prosodic Parameters of Synthetic Czech: Developing Rules for Duration and Intensity (Marie Dohalská, Jana Mejvaldova and Tomas Duběda), 129
13. MFGI, a Linguistically Motivated Quantitative Model of German Prosody (Hansjörg Mixdorff), 134
14. Improvements in Modelling the F0 Contour for Different Types of Intonation Units in Slovene (Ales Dobnikar), 144
15. Representing Speech Rhythm (Brigitte Zellner Keller and Eric Keller), 154
16. Phonetic and Timing Considerations in a Swiss High German TTS System (Beat Siebenhaar, Brigitte Zellner Keller and Eric Keller), 165
17. Corpus-based Development of Prosodic Models Across Six Languages (Justin Fackrell, Halewijn Vereecken, Cynthia Grover, Jean-Pierre Martens and Bert Van Coile), 176
18. Vowel Reduction in German Read Speech (Christina Widera), 186

Part III: Issues in Styles of Speech, 197
19. Variability and Speaking Styles in Speech Synthesis (Jacques Terken), 199
20. An Auditory Analysis of the Prosody of Fast and Slow Speech Styles in English, Dutch and German (Alex Monaghan), 204
21. Automatic Prosody Modelling of Galician and its Application to Spanish (Eduardo López Gonzalo, Juan M. Villar Navarro and Luis A. Hernández Gómez), 218
22. Reduction and Assimilatory Processes in Conversational French Speech: Implications for Speech Synthesis (Danielle Duez), 228
23. Acoustic Patterns of Emotions (Branka Zei Pollermann and Marc Archinard), 237
24. The Role of Pitch and Tempo in Spanish Emotional Speech: Towards Concatenative Synthesis (Juan Manuel Montero Martinez, Juana M. Gutiérrez Arriola, Ricardo de Córdoba Herralde, Emilia Victoria Enríquez Carrasco and Jose Manuel Pardo Muñoz), 246
25. Voice Quality and the Synthesis of Affect (Ailbhe Ní Chasaide and Christer Gobl), 252
26. Prosodic Parameters of a 'Fun' Speaking Style (Kjell Gustafson and David House), 264
27. Dynamics of the Glottal Source Signal: Implications for Naturalness in Speech Synthesis (Christer Gobl and Ailbhe Ní Chasaide), 273
28. A Nonlinear Rhythmic Component in Various Styles of Speech (Brigitte Zellner Keller and Eric Keller), 284

Part IV: Issues in Segmentation and Mark-up, 293
29. Issues in Segmentation and Mark-up (Mark Huckvale), 295
30. The Use and Potential of Extensible Mark-up (XML) in Speech Generation (Mark Huckvale), 297
31. Mark-up for Speech Synthesis: A Review and Some Suggestions (Alex Monaghan), 307
32. Automatic Analysis of Prosody for Multi-lingual Speech Corpora (Daniel Hirst), 320
33. Automatic Speech Segmentation Based on Alignment with a Text-to-Speech System (Petr Horák), 328
34. Using the COST 249 Reference Speech Recogniser for Automatic Speech Segmentation (Narada D. Warakagoda and Jon E. Natvig), 339

Part V: Future Challenges, 349
35. Future Challenges (Eric Keller), 351
36. Towards Naturalness, or the Challenge of Subjectiveness (Geneviève Caelen-Haumont), 353
37. Synthesis Within Multi-Modal Systems (Andrew Breen), 363
38. A Multi-Modal Speech Synthesis Tool Applied to Audio-Visual Prosody (Jonas Beskow, Björn Granström and David House), 372
39. Interface Design for Speech Synthesis Systems (Gudrun Flach), 383

Index, 391

List of Contributors

Marc Archinard, Geneva University Hospitals, Liaison Psychiatry, Boulevard de la Cluse 51, 1205 Geneva, Switzerland
Gérard Bailly, Institut de la Communication Parlée, INPG, 46 av. Felix Vialet, 38031 Grenoble-cedex, France
Eduardo Rodríguez Banga, Signal Theory Group (GTS), Dpto. Tecnologías de las Comunicaciones, ETSI Telecomunicación, Universidad de Vigo, 36200 Vigo, Spain
Jonas Beskow, CTT/Dept. of Speech, Music and Hearing, KTH, 100 44 Stockholm, Sweden
Andrew Breen, Nuance Communications Inc., The School of Information Systems, University of East Anglia, Norwich NR4 7TJ, United Kingdom
Geneviève Caelen-Haumont, Laboratoire Parole et Langage, CNRS, Université de Provence, 29 Av. Robert Schuman, 13621 Aix en Provence, France
Ricardo de Córdoba Herralde, Universidad Politécnica de Madrid, ETSI Telecomunicación, Ciudad Universitaria s/n, 28040 Madrid, Spain
Ales Dobnikar, Institute J. Stefan, Jamova 39, 1000 Ljubljana, Slovenia
Marie Dohalská, Institute of Phonetics, Charles University, Prague, nam. Jana Palacha 2, 116 38 Prague 1, Czech Republic
Tomas Duběda, Institute of Phonetics, Charles University, Prague, nam. Jana Palacha 2, 116 38 Prague 1, Czech Republic
Danielle Duez, Laboratoire Parole et Langage, CNRS, Université de Provence, 29 Av. Robert Schuman, 13621 Aix en Provence, France
Emilia Victoria Enríquez Carrasco, Facultad de Filología, UNED, C/ Senda del Rey 7, 28040 Madrid, Spain
Justin Fackrell, Crichton's Close, Canongate, Edinburgh EH8 8DT, UK
Xavier Fernández Salgado, Signal Theory Group (GTS), Dpto. Tecnologías de las Comunicaciones, ETSI Telecomunicación, Universidad de Vigo
Kjell Gustafson, CTT/Dept. of Speech, Music and Hearing, KTH, 100 44 Stockholm, Sweden
Recommended publications
  • Commercial Tools in Speech Synthesis Technology
    International Journal of Research in Engineering, Science and Management, Volume 2, Issue 12, December 2019, www.ijresm.com, ISSN (Online): 2581-5792. Commercial Tools in Speech Synthesis Technology. D. Nagaraju (Associate Professor, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India), R. J. Ramasree (Professor, Dept. of Computer Science, Rastriya Sanskrit VidyaPeet, Tirupati, India), K. Kishore, K. Vamsi Krishna and R. Sujana (UG Students, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India).

    Abstract: This is a study paper planned towards a new emotional speech synthesis system for Telugu (ESST). The main objective of this paper is to map the situation of today's speech synthesis technology and to focus on potential methods for the future. Literature and articles in the area usually focus on a single method, a single synthesizer, or a very limited range of the technology. In this paper the whole speech synthesis area, with as many methods, techniques, applications, and products as possible, is under investigation. Unfortunately, this leads to a situation where in some cases very detailed information may not be given here, but may be found in the given references.

    [...] phonetic and prosodic information. These two phases are usually called high- and low-level synthesis. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is then preprocessed and analyzed into a phonetic representation, which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Speech sound is finally generated with the low-level synthesizer by the information...
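The two-phase architecture sketched in this excerpt (a high-level stage that turns text into phonemes plus prosodic targets, and a low-level stage that turns those into a waveform) can be illustrated with a toy pipeline. The lexicon, durations, pitch values and sine/noise "synthesis" below are invented for illustration and are not from the paper:

```python
# A toy sketch of the high-level / low-level split described in the excerpt.
# Everything here (lexicon, durations, sine-based "synthesis") is illustrative,
# not the method of any publication listed on this page.
import numpy as np

LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
VOICED = {"AH", "OW", "ER", "L", "W"}           # crude voicing flags
FS = 16000                                      # sample rate in Hz

def high_level(text):
    """Text analysis: words -> phonemes with duration (s) and pitch (Hz)."""
    phones = []
    for word in text.lower().split():
        for ph in LEXICON.get(word, []):
            phones.append({"phone": ph, "dur": 0.12, "f0": 120.0})
    return phones

def low_level(phones):
    """Signal generation: here just a sine for voiced, noise for unvoiced."""
    chunks = []
    for p in phones:
        n = int(p["dur"] * FS)
        t = np.arange(n) / FS
        if p["phone"] in VOICED:
            chunks.append(0.3 * np.sin(2 * np.pi * p["f0"] * t))
        else:
            chunks.append(0.05 * np.random.randn(n))
    return np.concatenate(chunks) if chunks else np.zeros(0)

waveform = low_level(high_level("hello world"))
print(f"{len(waveform)/FS:.2f} s of audio generated")
```

A real system would replace both stages with far richer models, but the interface between them (a phoneme string annotated with prosodic targets) is the point the excerpt makes.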
  • Models of Speech Synthesis (Rolf Carlson, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden)
    Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 9932-9937, October 1995. Colloquium Paper. This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9, 1993. Models of speech synthesis. Rolf Carlson, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden.

    ABSTRACT: The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.

    [...] need large amounts of speech data. Models working close to the waveform are now typically making use of increased unit sizes while still modeling prosody by rule. In the middle of the scale, "formant synthesis" is moving toward the articulatory models by looking for "higher-level parameters" or to larger prestored units. Articulatory synthesis, hampered by lack of data, still has some way to go but is yielding improved quality, due mostly to advanced analysis-synthesis techniques.

    Flexibility and Technical Dimensions
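The excerpt places "formant synthesis" at one point on the spectrum of synthesis models. A minimal sketch of the idea, assuming generic formant values and a plain impulse-train source (none of these numbers come from Carlson's systems):

```python
# Toy cascade formant synthesizer: an impulse train (glottal source stand-in)
# filtered through second-order resonators at fixed formant frequencies.
# The formant values below are generic /a/-like numbers chosen for illustration.
import numpy as np
from scipy.signal import lfilter

FS = 16000                       # sample rate (Hz)
F0 = 110                         # fundamental frequency (Hz)
FORMANTS = [(700, 130), (1220, 70), (2600, 160)]   # (centre freq, bandwidth) in Hz

def resonator(x, freq, bw, fs=FS):
    """Single second-order IIR resonator (Klatt-style difference equation)."""
    r = np.exp(-np.pi * bw / fs)
    c = -r * r
    b = 2 * r * np.cos(2 * np.pi * freq / fs)
    a = 1 - b - c                # gain term chosen for unity gain at DC
    return lfilter([a], [1, -b, -c], x)

# 0.5 s impulse train as a crude voice source
n = int(0.5 * FS)
source = np.zeros(n)
source[::FS // F0] = 1.0

speech = source
for freq, bw in FORMANTS:        # cascade the resonators
    speech = resonator(speech, freq, bw)
speech /= np.max(np.abs(speech))  # normalise

print(f"synthesised {len(speech)/FS:.2f} s of a static vowel-like sound")
```

Rule-based systems of this family drive the formant frequencies, bandwidths and source parameters over time; the static values here only show the signal path.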
  • Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique
    Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique. Submitted to Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India, for the award of Doctor of Philosophy, by Mr. Sangramsing Nathsing Kayte. Supervised by Dr. Bharti W. Gawali, Professor, Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad. November 2018.

    Abstract: A speech synthesis system is a computer-based system that should be able to read any text aloud in a particular language or in multiple languages. This is also called Text-to-Speech synthesis, or TTS for short. Communication plays an important role in everyone's life. Usually communication refers to speaking, writing, or sending a message to another person, and speech is one of the most important means of human communication. There have been a great number of efforts to incorporate speech into communication between humans and computers. Demand for speech processing technologies such as speech recognition, dialogue processing, natural language processing, and speech synthesis is increasing. These technologies are useful for human-to-human applications, like spoken language translation systems, and for human-to-machine communication, like control interfaces for handicapped persons; TTS is one of the key technologies in speech processing. A TTS system converts ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. For a natural human-machine interface, the TTS system can be used as a way for a computer to communicate back with a human-like voice. TTS can be a voice for people who cannot speak, and people wishing to learn a new language can use a TTS system to learn the pronunciation.
  • Wormed Voice Workshop Presentation
    Wormed Voice Workshop Presentation micro_research December 27, 2017 1 some worm poetry and songs: The WORM was for a long time desirous to speake, but the rule and order of the Court enjoyned him silence, but now strutting and swelling, and impatient, of further delay, he broke out thus... [Michael Maier] He worshipped the worm and prayed to the wormy grave. Serpent Lucifer, how do you do? Of your worms and your snakes I'd be one or two; For in this dear planet of wool and of leather `Tis pleasant to need neither shirt, sleeve, nor shoe, 2 And have arm, leg, and belly together. Then aches your head, or are you lazy? Sing, `Round your neck your belly wrap, Tail-a-top, and make your cap Any bee and daisy. Two pigs' feet, two mens' feet, and two of a hen; Devil-winged; dragon-bellied; grave- jawed, because grass Is a beard that's soon shaved, and grows seldom again worm writing the the the the,eeeronencoug,en sthistit, d.).dupi w m,tinsprsool itav f bometaisp- pav wheaigelic..)a?? orerdi mise we ich'roo bish ftroo htothuloul mespowouklain- duteavshi wn,jis, sownol hof." m,tisorora angsthyedust,es, fofald,junss ownoug brad,)fr m fr,aA?a????ck;A?stelav aly, al is.'rady'lfrdil owoncorara wns t.) sh'r, oof ofr,a? ar,a???????a? fu mo towess,eethen hrtolly-l,."tigolav ict,a???!ol, w..'m,elyelil,tstreamas..n gotaillas.tansstheatsea f mb ispot inici t.) owar.**1 wnshigigholoothtith orsir.tsotic.'m, sotamimoledug imootrdeavet..t,) sh s,tranciror."wn sieee h asinied.tiear wspilotor,) bla av.nicord,ier.dy'et.*tite m.)..*d, hrouceto hie, ig il m, bsomoug,.t.'l,t, olitel bs,.nt,.dotr tat,)aa? htotitedont,j alesil, starar,ja taie ass.nishiceroouldseal fotitoonckysil, m oitispl o anteeeaicowousomirot.
  • Voice Synthesizer Application Android
    Speech synthesis is the artificial production of human speech. The computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts ordinary language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. Synthesized speech can be created by concatenating fragments of recorded speech that are stored in a database. Systems differ in the size of the stored speech units: a system that stores phones or diphones provides the greatest range of outputs but may lack clarity, while for specific domains the storage of entire words or sentences allows high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other characteristics of the human voice to create a fully synthetic voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s. A typical application is automatic announcement, for example a synthetic voice announcing an arriving train in Sweden.
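The concatenation idea described above (joining stored fragments of recorded speech) can be sketched as follows. The 5 ms linear crossfade and the placeholder noise "units" are assumptions made for the illustration, standing in for diphones or words pulled from a real database:

```python
# Sketch of waveform concatenation: stored unit waveforms are joined with a
# short crossfade to soften the joins. The "units" here are synthetic
# placeholders; a real system would load recorded diphones or words.
import numpy as np

FS = 16000
XFADE = int(0.005 * FS)          # 5 ms crossfade at each join

def concatenate_units(units, xfade=XFADE):
    """Join unit waveforms, overlapping each boundary with a linear crossfade."""
    out = units[0].astype(float).copy()
    fade_in = np.linspace(0.0, 1.0, xfade)
    fade_out = 1.0 - fade_in
    for u in units[1:]:
        u = u.astype(float)
        out[-xfade:] = out[-xfade:] * fade_out + u[:xfade] * fade_in
        out = np.concatenate([out, u[xfade:]])
    return out

# stand-ins for prerecorded units (e.g. diphones retrieved from a database)
rng = np.random.default_rng(0)
units = [0.1 * rng.standard_normal(int(0.08 * FS)) for _ in range(4)]
speech = concatenate_units(units)
print(f"joined {len(units)} units into {len(speech)/FS:.3f} s of audio")
```

The trade-off the excerpt mentions follows directly: small units (phones, diphones) cover any text but expose many joins like these, while whole stored words or sentences need fewer joins and therefore sound cleaner within their domain.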
  • Towards Expressive Speech Synthesis in English on a Robotic Platform
    Towards Expressive Speech Synthesis in English on a Robotic Platform. Sigrid Roehling, Bruce MacDonald, Catherine Watson, Department of Electrical and Computer Engineering, University of Auckland, New Zealand. s.roehling, b.macdonald, [email protected]

    Abstract: Affect influences speech, not only in the words we choose, but in the way we say them. This paper reviews the research on vocal correlates in the expression of affect and examines the ability of currently available major text-to-speech (TTS) systems to synthesize expressive speech for an emotional robot guide. Speech features discussed include pitch, duration, loudness, spectral structure, and voice quality. TTS systems are examined as to their ability to control the features needed for synthesizing expressive speech: pitch, duration, loudness, and voice quality. The OpenMARY system is recommended since it provides the highest amount of control over speech production as well as the ability to work with a sophisticated intonation model. OpenMARY is being actively developed, is supported on our current Linux platform, and provides timing information for talking heads such as our current robot face.

    1. Introduction: Affect influences speech, not only in the words we choose, but in the way we say them. These vocal nonverbal cues are important in human speech as they communicate information about the speaker's state or attitude more efficiently than the verbal content (Eide, Aaron, Bakis, Hamza, Picheny, and Pitrelli 2004). [...] explicitly stated otherwise, the research is concerned with the English language.

    2.1. Pitch: Pitch contour seems to be one of the clearest indica-...
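As a rough illustration of measuring two of the vocal correlates the excerpt names (pitch and loudness), here is a small sketch using an autocorrelation-based F0 estimate and an RMS level. The pitch range, the synthetic test tone and the whole procedure are assumptions for the sketch, not the analysis used by the paper's authors:

```python
# Minimal sketch: estimate mean pitch and a loudness proxy from a mono signal.
import numpy as np

def mean_f0(x, fs, fmin=75.0, fmax=400.0):
    """Very rough F0 estimate: autocorrelation peak within a pitch range."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def rms_db(x):
    """Loudness proxy: RMS level in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

# demo on a synthetic 150 Hz tone instead of a recorded utterance
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
x = 0.2 * np.sin(2 * np.pi * 150 * t)
print(f"estimated F0 ~= {mean_f0(x, fs):.1f} Hz, level ~= {rms_db(x):.1f} dBFS")
```

Expressive synthesis works in the opposite direction: rather than measuring these features, a TTS system must expose controls over them, which is the comparison criterion the paper applies.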
  • The Algorithms of Tajik Speech Synthesis by Syllable
    ITM Web of Conferences 35, 07003 (2020), https://doi.org/10.1051/itmconf/20203507003, ITEE-2019. The Algorithms of Tajik Speech Synthesis by Syllable. Khurshed A. Khudoyberdiev (Khujand Polytechnic Institute of the Tajik Technical University named after academician M.S. Osimi, Khujand, 735700, Tajikistan).

    Abstract: This article is devoted to the development of a prototype text-to-speech synthesizer for Tajik. The need for such a synthesizer arises because its analogues for other languages not only help people with visual and speech impairments, but also find more and more application in communication technology and in information and reference systems. In the future, such programs will take their proper place in the broad acoustic dialogue between humans and automatic machines and robots in various fields of human activity. The article describes the prototype Tajik text-to-speech synthesizer developed by the author. It is built as a concatenative synthesizer in which the syllable is chosen as the speech unit, which in turn requires the most complete possible description of the variety of Tajik syllables. To study the patterns of the Tajik language associated with the syllable, the concept of the "syllabic structure of the word" was introduced, and the statistical distribution of structures was obtained, i.e. a correspondence was established between the syllabic structures of words and the frequencies of their occurrence in Tajik texts. An algorithm for breaking Tajik words into syllables is proposed and implemented as a computer program.
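The "syllabic structure of the word" idea in the excerpt (mapping a word to a consonant/vowel pattern and tabulating pattern frequencies) can be sketched in a few lines. The Latin-alphabet vowel inventory and the sample words below are placeholders, not the author's Tajik inventory or corpus:

```python
# Toy illustration of tabulating syllabic (CV) structure frequencies.
from collections import Counter

VOWELS = set("aeiou")                      # placeholder vowel inventory

def cv_pattern(word):
    """Return the word's structure as a string of C/V symbols, e.g. 'CVCV'."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())

def structure_frequencies(words):
    """Count how often each CV structure occurs in the word list."""
    counts = Counter(cv_pattern(w) for w in words)
    total = sum(counts.values())
    return {pat: n / total for pat, n in counts.most_common()}

sample = ["kitob", "dar", "nom", "bozor", "sher"]   # illustrative words only
for pattern, freq in structure_frequencies(sample).items():
    print(f"{pattern:8s} {freq:.2f}")
```

A syllable-based concatenative system would use a distribution like this, computed over a large corpus, to decide which syllable inventory must be recorded to cover the language.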
  • Understanding Disability and Assistive Technology
    Pro HTML5 Accessibility (O Connor, Apress). Build exciting, accessible, and usable web sites and apps with Pro HTML5 Accessibility. This book walks you through the process of designing user interfaces to be used by everyone, regardless of ability. It gives you the knowledge and skills you need to use HTML5 to serve the needs of the widest possible audience, including people with disabilities using assistive technology (AT) and older people. With Pro HTML5 Accessibility, you'll learn:
    • How accessibility makes for good web site design
    • The fundamentals of ATs and how they interact with web content
    • How to apply HTML5 to your web projects in order to design more accessible content
    • How JavaScript and WAI-ARIA can be used with HTML5 to support the development of accessible web content
    • Important usability and user-centered design techniques that can make your HTML5 projects reach a wider audience
    Filled with practical advice, this book helps you master HTML5 and good accessibility design. It explores the new semantics of HTML5 and shows you how to combine them with authoring practices you may know from using earlier versions of HTML. It also aims to demonstrate how HTML5 content is currently supported (or not) by ATs such as screen readers and what this practically means for you as you endeavor to make your HTML5 projects accessible.
  • Gnuspeech Tract Manual 0.9
    Gnuspeech TRAcT Manual 0.9. TRAcT: the Gnuspeech Tube Resonance Access Tool: a means of investigating and understanding the basic Gnuspeech vocal tract model. David R. Hill, University of Calgary. TRAcT and the "tube" model to which it allows access was originally known as "Synthesizer" and developed by Leonard Manzara on the NeXT computer. The name has been changed because "Synthesiser" might be taken to imply it synthesises speech, which it doesn't. It is a "low-level" sound synthesiser that models the human vocal tract, requiring the "high-level" synthesiser control inherent in Gnuspeech/Monet to speak. TRAcT provides direct interactive access to the human vocal tract model parameters and shapes as a means of exploring its sound generation capabilities, as needed for speech database creation. (Gnuspeech TRAcT Manual Version 0.9.) David R. Hill, PEng, FBCS ([email protected] or drh@firethorne.com). © 2004, 2015 David R. Hill. All rights reserved. This document is publicly available under the terms of a Free Software Foundation "Free Documentation Licence"; see http://www.gnu.org/copyleft/fdl.html for the licence terms. This page and the acknowledgements section are the invariant sections.

    SUMMARY: The "Tube Resonance Model" (TRM, or "tube", or "waveguide", or transmission-line) forms the acoustic basis of the Gnuspeech articulatory text-to-speech system and provides a linguistics research tool. It emulates the human vocal apparatus and allows "postures" to be imposed on the vocal tract model, and energy to be injected representing voicing, whispering, sibilant noises and "breathy" noise. The "Distinctive Region Model" (DRM) control system, which allows simple specification of the postures and is accessed by TRAcT, is based on research by CGM Fant and his colleagues at the Stockholm Royal Institute of Technology Speech Technology Laboratory (Fant & Pauli 1974), by René Carré and his colleagues at Télécom Paris (Carré, Chennoukh & Mrayati 1992), and further developed by our original research team at Trillium Sound Research and the U of Calgary.
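The tube/waveguide model named in the excerpt can be caricatured with a small Kelly-Lochbaum-style digital waveguide: a chain of cylindrical sections with scattering junctions derived from their cross-sectional areas. The areas, boundary reflections and impulse-train source below are invented for the sketch and are unrelated to Gnuspeech's actual DRM parameters:

```python
# Toy digital-waveguide ("tube") vocal tract: a few cylindrical sections with
# Kelly-Lochbaum scattering junctions, driven by an impulse train.
import numpy as np

FS = 16000
AREAS = np.array([2.6, 1.8, 1.0, 0.8, 1.6, 2.4, 3.2, 4.0])   # cm^2, toy shape
N = len(AREAS)
# reflection coefficients between adjacent sections (pressure-wave convention)
K = (AREAS[:-1] - AREAS[1:]) / (AREAS[:-1] + AREAS[1:])
R_GLOTTIS, R_LIPS = 0.97, -0.90        # lossy boundary reflections (made up)

def synthesize(n_samples, f0=110):
    fwd = np.zeros(N)                  # forward wave at the right end of each section
    bwd = np.zeros(N)                  # backward wave at the left end of each section
    out = np.zeros(n_samples)
    period = FS // f0
    for t in range(n_samples):
        src = 1.0 if t % period == 0 else 0.0
        new_fwd, new_bwd = np.zeros(N), np.zeros(N)
        new_fwd[0] = src + R_GLOTTIS * bwd[0]              # glottis end
        for i in range(N - 1):                             # interior junctions
            new_fwd[i + 1] = (1 + K[i]) * fwd[i] - K[i] * bwd[i + 1]
            new_bwd[i] = K[i] * fwd[i] + (1 - K[i]) * bwd[i + 1]
        new_bwd[N - 1] = R_LIPS * fwd[N - 1]               # lip end
        out[t] = (1 + R_LIPS) * fwd[N - 1]                 # radiated pressure
        fwd, bwd = new_fwd, new_bwd                        # one-sample delay per section
    return out / (np.max(np.abs(out)) + 1e-12)

audio = synthesize(int(0.5 * FS))
print(f"generated {len(audio)/FS:.2f} s from an {N}-section tube model")
```

Changing the area profile over time is what lets such a model articulate; the DRM layer described in the manual provides exactly that kind of compact control over the tube shape.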
  • Automatic Speech Recognition and Text-to-Speech (Chapter 26)
    Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021. All rights reserved. Draft of September 21, 2021. Chapter 26: Automatic Speech Recognition and Text-to-Speech. "I KNOW not whether I see your meaning: if I do, it lies / Upon the wordy wavelets of your voice, / Dim as an evening shadow in a brook" (Thomas Lovell Beddoes, 1851). Understanding spoken language, or at least transcribing the words into writing, is one of the earliest goals of computer language processing. In fact, speech processing predates the computer by many decades! The first machine that recognized speech was a toy from the 1920s. "Radio Rex" was a celluloid dog that moved (by means of a spring) when the spring was released by 500 Hz acoustic energy. Since 500 Hz is roughly the first formant of the vowel [eh] in "Rex", Rex seemed to come when he was called (David, Jr. and Selfridge, 1962). In modern times, we expect more of our automatic systems. The task of automatic speech recognition (ASR) is to map a waveform to the appropriate string of words, e.g. "It's time for lunch!". Automatic transcription of speech by any speaker in any environment is still far from solved, but ASR technology has matured to the point where it is now viable for many practical tasks. Speech is a natural interface for communicating with smart home appliances, personal assistants, or cellphones, where keyboards are less convenient, in telephony applications like call-routing ("Accounting, please") or in sophisticated dialogue applications ("I'd like to change the return date of my flight").
  • Voice Recognition
    Technical Glossary. Adaptive Technology Resource Centre, Faculty of Information, University of Toronto. Contents (excerpt):
    Accessible Online Learning Tools (6): Points to ponder - Questions to consider when shopping for Accessible Online Learning Tools / Online Education Sources (6); Solutions (6); Web Resources (8)
    Alternative Keyboards (9): Points to Ponder - Questions to consider when shopping for an alternative keyboard (9); Non-Keyboard Based Enhancements (9); Other Free Enhancements - Windows (10); Other Free Enhancements - Macintosh (10); Alternative Keyboards (10); Miscellaneous Keyboard Enhancers (11); Resources (12)
    Alternative Mouse Systems (13): Points to ponder - Questions to consider when shopping for an alternative mouse system ...
  • D10.1: State of the Art of Accessibility Tools, Revision 1.0 as of 28th February 2011
    DELIVERABLE. Project Acronym: EuDML. Grant Agreement number: 250503. Project Title: The European Digital Mathematics Library. D10.1: State of the Art of Accessibility Tools, Revision 1.0 as of 28th February 2011. Authors: Volker Sorge (University of Birmingham), Mark Lee (University of Birmingham), Petr Sojka (Masaryk University), Alan P. Sexton (University of Birmingham). Contributors: Martin Jarmar (Masaryk University). Project co-funded by the European Commission within the ICT Policy Support Programme. Dissemination Level: P (Public).

    Revision History: 0.1, 15th October 2010, Volker Sorge (UB), content outline only; 0.2, 28th January 2011, Volker Sorge (UB), Section 2 completed; 0.3, 4th February 2011, Petr Sojka (MU), MU experience part added; 0.4, 13th February 2011, Volker Sorge (UB), Sections 3, 4, 5 completed; 0.5, 14th February 2011, VS+PS (UB+MU), development version; 0.6, 14th February 2011, Alan P. Sexton (UB), typo corrections and polishing; 0.7, 19th February 2011, Mark Lee (UB), addition of conclusions on language translation; 1.0, 28th February 2011, Volker Sorge (UB), changes suggested by reviewer and final editing.

    Statement of originality: This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.

    Abstract: The purpose of this report is to present the state of the art in accessibility tools that can be used to provide access to mathematical literature for visually impaired users as well as print-impaired users (i.e., people with specific learning disabilities like dyslexia), together with an overview of current automated translation technology.