Multimodal Speech Synthesis


Multimodal speech synthesis
(N)GSLT - Speech Technology course, NGSLT 2006
Björn Granström
CTT, KTH - Kungliga Tekniska Högskolan
School for Computer Science and Communication, Department of Speech, Music and Hearing

CTT - Centre for Speech Technology

Research areas
• Speech production
• Speech perception
• Communication aids
• Multimodal speech synthesis
• Speech understanding
• Speaker characteristics
• Interactive dialog systems
• Affective computing
• Language learning

CTT - modes of operation
• Long-term research projects
• Participation by CTT personnel in international projects
• Research exchange with leading foreign groups
• Graduate-level research training
• Dissemination of competence through the exchange of researchers within Sweden and collaboration with other Swedish research groups
• Special information dissemination efforts

CTT partners, phase 4 (2004-2006)
Babel-Infovox, Dolphin Audio Publishing, EnglishTown, GN Resound, Hjälpmedelsinstitutet, HoneySoft, IcePeak, LingTek, Phoneticom, Polycom Technologies, SaabTech, SpeechCraft, STTS, Sveriges Radio, Sveriges Television, TeliaSonera, TPB - Talboks- och punktskriftsbiblioteket, Vattenfall, VoiceProvider

The work presented in this lecture is the result of many researchers' efforts at the Department of Speech, Music and Hearing. Reports and further information can be found on our home page, www.speech.kth.se.

Why Speech Tech?
"Interactive Human Communication", Chapanis, Scientific American, 1975!
"Computer", February 1998

Speech is fast and verbose - more efficient and convenient!
(Figure: average time for solving diverse problems using different combinations of means of communication.)

Speech technology is making money!
Classic example: the AT&T and Lucent Technologies VRCP ("Voice Recognition Call Processing") service
• Selection of payment method
• Vocabulary of only five words: collect, calling card, third number, person, operator
• More than 5,000,000 calls per day
• Earnings already (1999?) more than AT&T/Bell Labs' total investment in speech research

The first pen phone (1 April 1998)
Why no commercial success?
• Too small keys?
• Too small display!?

Speech synthesis developments at KTH
The KTH speech group, early days: Gunnar Fant and OVE I, 1953. Milestones: 1953, 1961, 1982, 1993, 1998, 2002.

OVE I in the WaveSurfer tool; OVE II, 1958
• The interface is based on WaveSurfer, a general-purpose tool for speech and audio viewing, editing and labelling
• TTS and Talking Head functionality is added as plug-ins
• WaveSurfer (presently without TTS & TH) works on all common platforms and is freely available as open source
• Modules from Waves available - formants and F0 in the present release - thanks to Microsoft and AT&T
http://www.speech.kth.se/wavesurfer
KTH/TTS history
• 1967: digitally controlled OVE III
• 1974: rule-based system RULSYS - transformation rules
• 1979: mobile text-to-speech system - one speaker - used by a non-vocal child
• 1982: portable TTS (ICASSP, Paris) - multilingual - MC 68000, NEC 7720
• 1983: founding of Infovox Inc.

What do we mean by speech synthesis?
• Recorded speech
 – Words or phrases (telephone banking)
 – Fixed vocabulary - maintenance problems…
• Concatenative speech synthesis
 – Diphones or larger units (unit selection)
 – LPC: source-filter model (too simple?)
 – PSOLA/MBROLA/HNM - and rules for prosody
• Parametric synthesis
 – Formant synthesis
 – Articulatory synthesis
 – Flexible, but lower quality - today
• Multimodal synthesis

The synthesis space
Trade-offs along many dimensions: knowledge, flexibility, intelligibility, naturalness, processing needs, bit rate, vocabulary, cost, units, complexity.

TEXT vs SPEECH
• Parallel vs. sequential
• Permanent vs. disappearing
• Text as transcription of speech?
• Example - WaveSurfer demo: palindrome, "Var sak har två sidor" ("Every thing has two sides")

Phonology, morphology, phonetics: hov#mäst-ar-n; "två" (spectrogram, kHz/ms).

Non-phonetic writing; signs vs. technology. How is a Chinese lexicon organized? Is the sign what you think?
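Of the methods listed above, the concatenative idea is the simplest to sketch: store recorded units (diphones or larger), then join them with a short crossfade to hide discontinuities. A toy Python illustration - the sine-wave "units" and the crossfade length are invented for the example, and real systems additionally apply prosody modification (e.g. PSOLA) at the joins:

```python
import numpy as np

def crossfade_concat(units, fade=64):
    """Concatenate waveform units, linearly crossfading `fade` samples
    at every join to reduce discontinuities."""
    ramp = np.linspace(0.0, 1.0, fade)
    out = units[0].astype(float)
    for u in units[1:]:
        u = u.astype(float)
        out[-fade:] = out[-fade:] * (1 - ramp) + u[:fade] * ramp
        out = np.concatenate([out, u[fade:]])
    return out

# Two stand-in "units": 50 ms tones at 16 kHz (a real system would
# instead select recorded diphones from a unit database).
sr = 16000
t = np.arange(int(0.05 * sr)) / sr
u1 = np.sin(2 * np.pi * 100 * t)
u2 = np.sin(2 * np.pi * 120 * t)
y = crossfade_concat([u1, u2])
```

Unit selection then becomes a search problem - choosing, among many candidate units, the sequence that minimizes join and target costs - while the join itself stays as simple as above.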
Text-to-speech (TTS) - synthesis methods
Pipeline: text ("abcd") → language identification → morphological analysis (lexicon and rules) → syntax analysis → linguistic analysis → prosodic analysis → phonetic description (rules and lexicon; rules and unit selection) → concatenation/rules → sound generation.
(Figure: articulatory synthesis - articulators → shape of the vocal tract → tubes, lips; formant synthesis - an electrical network, source → resonances.)

Source-filter theory
Source (glottal flow) → filter (shape of the vocal tract) → radiation (lips), shown in both the time and the frequency domain.

RULSYS rules - features
• Is an IPA synthesiser possible?
• Based on generative phonology
• Language-specific definitions
• Rules for contextual modifications
• Examples
• Interactive rule manipulations

Glove
(Figure: the Glove formant synthesizer - a voice source and a noise source feeding formant filters F1-F7 with bandwidths B1-B7; rule notation example: a → e / __ <cons> e #.)

"Pia odlar blå violer"

Evaluation of synthesis - VCV test
Consonant errors in percent (chart after Goldstein & Till, 1992):
• Synthesis 1983: 41.7
• Synthesis 1987: 26.1
• Synthesis 1989: 12.8
• Synthesis 1991: 8.7
• One human speaker: 5.6
Carlson, R., Granström, B., and Karlsson, I. (1990): "Experiments with voice modeling in speech synthesis" (the KTH text-to-speech system).

Speaker characteristics
• Speaker: dialect, sex, social background, education, age
• Situation: formality, style, interspeaker relation
• A complex description with many dependent variables

Speaker characteristics
• TIDE/VAESS project (Voices, Attitudes and Emotions in Synthetic Speech)
• "Voice fitting"
• Software: Glove synthesis
• 10 user-controlled parameters
• Rule-specified connection to the synthesis parameters

Emotional/ubiquitous computing - do we want it?
Early BBC vision - the conversational toaster. Thanks to Mark Huckvale, UCL, for the video clip.

Emotions
Natural and synthesized examples: Neutral, Happy, Sad, Angry.

Result from listening test
(Chart: percent response - neutral, angry, happy, sad - to synthetic stimuli intended as Happy, Sad and Angry.)
Emotion set: Happiness, Anger, Fear, Surprise, Disgust, Sadness.

Voice source simulation
• Sampling interval: 1 parameter
• Vocal fold anatomy: 18 parameters, e.g. length, weight, damping factor
• Vocal fold articulation: 7 parameters, e.g. rest gap, lung pressure
• High-pressure articulation: 4 parameters, e.g. pitch, loudness
• Vocal tract articulation: 8 parameters, e.g. area, damping factor
(Figure: vocal fold model - adduction force, rest position, translational and rotational movement, transverse + axial, pharynx and trachea ends.)
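Source-filter theory translates almost line for line into code: a glottal source filtered through a cascade of second-order formant resonators. The sketch below uses a Rosenberg-style pulse train as the source - a classic simple approximation, not the mechanical vocal fold simulation above, and not the Glove implementation. All constants (sample rate, F0, pulse timing, formant values for a rough /a/) are invented for illustration:

```python
import numpy as np

def rosenberg_pulse(n_rise, n_fall, n_total):
    """One glottal flow cycle: smooth opening, faster closing, closed phase."""
    rise = 0.5 * (1 - np.cos(np.pi * np.arange(n_rise) / n_rise))
    fall = np.cos(0.5 * np.pi * np.arange(n_fall) / n_fall)
    closed = np.zeros(n_total - n_rise - n_fall)
    return np.concatenate([rise, fall, closed])

def resonator(x, freq, bw, sr):
    """Second-order IIR resonator: one formant at `freq` with bandwidth `bw`."""
    r = np.exp(-np.pi * bw / sr)
    b = 2 * r * np.cos(2 * np.pi * freq / sr)
    c = -r * r
    a = 1 - b - c                     # unity gain at DC
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = a * x[n]
        if n >= 1:
            y[n] += b * y[n - 1]
        if n >= 2:
            y[n] += c * y[n - 2]
    return y

sr = 16000
period = sr // 110                    # ~110 Hz fundamental
source = np.tile(rosenberg_pulse(period // 2, period // 4, period), 40)
vowel = source
for freq, bw in [(730, 60), (1090, 90), (2440, 120)]:  # rough /a/ formants
    vowel = resonator(vowel, freq, bw, sr)             # filter: vocal tract
```

Radiation at the lips is often approximated by a first difference of the output. The appeal of this parametric view, as the slides note, is flexibility: F0, formant frequencies and source shape can be varied independently by rule.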
Voice source simulation - examples
(Figure: mechanical model; geometric model with two elastic rods; air flow through the glottis.)
• Original utterance
• Doubled vocal fold mass
• Lengthening of the vocal folds, 13 -> 17 mm
• Asymmetry change, 1.0 -> 1.8
• Asymmetry change, 1.0 -> 2.0
• Rest gap, 0.1 -> 1.0 mm

Predictable word accents
• Word accent and stress
 – 'anden (the bird)
 – 'ànden (the spirit)
 – 'ànk-pasta
 – 'ànk-pastej

Letter-to-sound vs. lexicon
• Size vs. precision
• Maintenance - "half of the words
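The size-versus-precision trade-off is usually resolved by combining both: look the word up in an exception lexicon first, and fall back to letter-to-sound rules for everything else. A toy sketch - the lexicon entry, the rule set and the transcription notation are all invented for illustration:

```python
# Exception lexicon: precise, but every entry must be maintained by hand.
LEXICON = {
    "anden": "'an-den",   # invented transcription of a lexical exception
}

# Letter-to-sound rules, longest grapheme first so greedy matching works.
LTS_RULES = [("sch", "S"), ("ng", "N"), ("a", "a"), ("d", "d"),
             ("e", "e"), ("n", "n")]

def to_phonemes(word):
    """Lexicon lookup with rule-based fallback (greedy longest match)."""
    if word in LEXICON:
        return LEXICON[word]
    out, i = [], 0
    while i < len(word):
        for grapheme, phoneme in LTS_RULES:
            if word.startswith(grapheme, i):
                out.append(phoneme)
                i += len(grapheme)
                break
        else:                  # no rule matched: pass the letter through
            out.append(word[i])
            i += 1
    return "".join(out)

print(to_phonemes("anden"))    # lexicon hit
print(to_phonemes("ande"))     # rule fallback
```

The maintenance point on the slide falls out directly: distinctions the rules cannot predict from spelling alone - such as the 'anden/'ànden word-accent pair - have to live in the lexicon, and that lexicon then has to be kept up to date.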