Improvements in Speech Synthesis. Edited by E. Keller et al.
Copyright © 2002 by John Wiley & Sons, Ltd
ISBNs: 0-471-49985-4 (Hardback); 0-470-84594-5 (Electronic)

Improvements in Speech Synthesis
COST 258: The Naturalness of Synthetic Speech

Edited by
E. Keller, University of Lausanne, Switzerland
G. Bailly, INPG, France
A. Monaghan, Aculab plc, UK
J. Terken, Technische Universiteit Eindhoven, The Netherlands
M. Huckvale, University College London, UK

JOHN WILEY & SONS, LTD

Copyright © 2002 by John Wiley & Sons, Ltd
Baffins Lane, Chichester, West Sussex, PO19 1UD, England
National 01243 779777
International (+44) 1243 779777
e-mail (for orders and customer service enquiries): [email protected]
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London, W1P 9HE, UK, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the publication.
Neither the author(s) nor John Wiley and Sons Ltd accept any responsibility or liability for loss or damage occasioned to any person or property through using the material, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. The author(s) and Publisher expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley and Sons is aware of a claim, the product names appear in initial capital or capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
WILEY-VCH Verlag GmbH, Pappelallee 3, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario, M9W 1L1, Canada
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0471 49985 4

Typeset in 10/12pt Times by Kolam Information Services Ltd, Pondicherry, India.
Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn.
This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.
Contents

List of contributors
Preface

Part I Issues in Signal Generation
1 Towards Greater Naturalness: Future Directions of Research in Speech Synthesis
   Eric Keller
2 Towards More Versatile Signal Generation Systems
   Gérard Bailly
3 A Parametric Harmonic Noise Model
   Gérard Bailly
4 The COST 258 Signal Generation Test Array
   Gérard Bailly
5 Concatenative Text-to-Speech Synthesis Based on Sinusoidal Modelling
   Eduardo Rodríguez Banga, Carmen García Mateo and Xavier Fernández Salgado
6 Shape Invariant Pitch and Time-Scale Modification of Speech Based on a Harmonic Model
   Darragh O'Brien and Alex Monaghan
7 Concatenative Speech Synthesis Using SRELP
   Erhard Rank

Part II Issues in Prosody
8 Prosody in Synthetic Speech: Problems, Solutions and Challenges
   Alex Monaghan
9 State-of-the-Art Summary of European Synthetic Prosody R&D
   Alex Monaghan
10 Modelling F0 in Various Romance Languages: Implementation in Some TTS Systems
   Philippe Martin
11 Acoustic Characterisation of the Tonic Syllable in Portuguese
   João Paulo Ramos Teixeira and Diamantino R.S. Freitas
12 Prosodic Parameters of Synthetic Czech: Developing Rules for Duration and Intensity
   Marie Dohalská, Jana Mejvaldova and Tomas Duběda
13 MFGI, a Linguistically Motivated Quantitative Model of German Prosody
   Hansjörg Mixdorff
14 Improvements in Modelling the F0 Contour for Different Types of Intonation Units in Slovene
   Ales Dobnikar
15 Representing Speech Rhythm
   Brigitte Zellner Keller and Eric Keller
16 Phonetic and Timing Considerations in a Swiss High German TTS System
   Beat Siebenhaar, Brigitte Zellner Keller and Eric Keller
17 Corpus-based Development of Prosodic Models Across Six Languages
   Justin Fackrell, Halewijn Vereecken, Cynthia Grover, Jean-Pierre Martens and Bert Van Coile
18 Vowel Reduction in German Read Speech
   Christina Widera

Part III Issues in Styles of Speech
19 Variability and Speaking Styles in Speech Synthesis
   Jacques Terken
20 An Auditory Analysis of the Prosody of Fast and Slow Speech Styles in English, Dutch and German
   Alex Monaghan
21 Automatic Prosody Modelling of Galician and its Application to Spanish
   Eduardo López Gonzalo, Juan M. Villar Navarro and Luis A. Hernández Gómez
22 Reduction and Assimilatory Processes in Conversational French Speech: Implications for Speech Synthesis
   Danielle Duez
23 Acoustic Patterns of Emotions
   Branka Zei Pollermann and Marc Archinard
24 The Role of Pitch and Tempo in Spanish Emotional Speech: Towards Concatenative Synthesis
   Juan Manuel Montero Martinez, Juana M. Gutiérrez Arriola, Ricardo de Córdoba Herralde, Emilia Victoria Enríquez Carrasco and Jose Manuel Pardo Muñoz
25 Voice Quality and the Synthesis of Affect
   Ailbhe Ní Chasaide and Christer Gobl
26 Prosodic Parameters of a 'Fun' Speaking Style
   Kjell Gustafson and David House
27 Dynamics of the Glottal Source Signal: Implications for Naturalness in Speech Synthesis
   Christer Gobl and Ailbhe Ní Chasaide
28 A Nonlinear Rhythmic Component in Various Styles of Speech
   Brigitte Zellner Keller and Eric Keller

Part IV Issues in Segmentation and Mark-up
29 Issues in Segmentation and Mark-up
   Mark Huckvale
30 The Use and Potential of Extensible Mark-up (XML) in Speech Generation
   Mark Huckvale
31 Mark-up for Speech Synthesis: A Review and Some Suggestions
   Alex Monaghan
32 Automatic Analysis of Prosody for Multi-lingual Speech Corpora
   Daniel Hirst
33 Automatic Speech Segmentation Based on Alignment with a Text-to-Speech System
   Petr Horák
34 Using the COST 249 Reference Speech Recogniser for Automatic Speech Segmentation
   Narada D. Warakagoda and Jon E. Natvig

Part V Future Challenges
35 Future Challenges
   Eric Keller
36 Towards Naturalness, or the Challenge of Subjectiveness
   Geneviève Caelen-Haumont
37 Synthesis Within Multi-Modal Systems
   Andrew Breen
38 A Multi-Modal Speech Synthesis Tool Applied to Audio-Visual Prosody
   Jonas Beskow, Björn Granström and David House
39 Interface Design for Speech Synthesis Systems
   Gudrun Flach

Index
List of contributors

Marc Archinard
Geneva University Hospitals
Liaison Psychiatry
Boulevard de la Cluse 51
1205 Geneva, Switzerland

Gérard Bailly
Institut de la Communication Parlée
INPG
46 av. Felix Vialet
38031 Grenoble-cedex, France

Eduardo Rodríguez Banga
Signal Theory Group (GTS)
Dpto. Tecnologías de las Comunicaciones
ETSI Telecomunicación
Universidad de Vigo
36200 Vigo, Spain

Jonas Beskow
CTT/Dept. of Speech, Music and Hearing
KTH
100 44 Stockholm, Sweden

Andrew Breen
Nuance Communications Inc.
The School of Information Systems
University of East Anglia
Norwich, NR4 7TJ, United Kingdom

Geneviève Caelen-Haumont
Laboratoire Parole et Langage
CNRS
Université de Provence
29 Av. Robert Schuman
13621 Aix en Provence, France

Ricardo de Córdoba Herralde
Universidad Politécnica de Madrid
ETSI Telecomunicación
Ciudad Universitaria s/n
28040 Madrid, Spain

Ales Dobnikar
Institute J. Stefan
Jamova 39
1000 Ljubljana, Slovenia

Marie Dohalská
Institute of Phonetics
Charles University, Prague
nam. Jana Palacha 2
116 38 Prague 1, Czech Republic

Tomas Duběda
Institute of Phonetics
Charles University, Prague
nam. Jana Palacha 2
116 38 Prague 1, Czech Republic

Danielle Duez
Laboratoire Parole et Langage
CNRS
Université de Provence
29 Av. Robert Schuman
13621 Aix en Provence, France

Emilia Victoria Enríquez Carrasco
Facultad de Filología. UNED
C/ Senda del Rey 7
28040 Madrid, Spain

Justin Fackrell
Crichton's Close
Canongate
Edinburgh EH8 8DT
UK

Kjell Gustafson
CTT/Dept. of Speech, Music and Hearing
KTH
100 44 Stockholm, Sweden

Xavier Fernández Salgado
Signal Theory Group (GTS)
Dpto. Tecnologías de las Comunicaciones
ETSI Telecomunicación
Universidad de Vigo