SPEECH ACOUSTICS and PHONETICS Text, Speech and Language Technology VOLUME 24

Series Editors
Nancy Ide, Vassar College, New York
Jean Véronis, Université de Provence and CNRS, France

Editorial Board
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autonoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

The titles published in this series are listed at the end of this volume.

Speech Acoustics and Phonetics
by GUNNAR FANT
Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden

KLUWER ACADEMIC PUBLISHERS
DORDRECHT / BOSTON / LONDON

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 1-4020-2789-3 (PB)
ISBN 1-4020-2373-1 (HB)
ISBN 1-4020-2790-7 (e-book)

Published by Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
Sold and distributed in North, Central and South America by Kluwer Academic Publishers, 101 Philip Drive, Norwell, MA 02061, U.S.A.
In all other countries, sold and distributed by Kluwer Academic Publishers, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved
© 2004 Kluwer Academic Publishers
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed in the Netherlands.

CONTENTS
Foreword
Preface
Introduction
List of selected articles
1. Speech research overview
2. Speech production and synthesis
3. The voice source
4. Speech analysis and features
5. Speech perception
6. Prosody
Publication list 1945–2004
Reference categories

FOREWORD
by Louis C.W. Pols, University of Amsterdam

For almost 60 years Gunnar Fant has actively contributed to the field of Speech Acoustics and Phonetics. Almost every speech scientist in the world knows him because he is still a frequent visitor to conferences and workshops all over the world. They may also know about some of his work, but only a few have a proper knowledge of the details and the breadth of his pioneering and still ongoing research. This is partly related to the fact that in the early days many of his contributions were only published in the Quarterly Progress and Status Reports (QPSR) of the Speech Transmission Laboratory (STL) of the Speech Communication and Music Acoustics department, later called the department of Speech, Music and Hearing (TMH) of the Stockholm Royal Institute of Technology (KTH). Several other important publications only appeared in various conference proceedings or in journals that are not easily accessible.

It is most fortunate that Gunnar Fant has taken up the challenge to produce this book of Selected Writings. It is his own unique selection and it only contains publications from his own hand, with or without colleague co-authors. Via the introductions to each section he guides us himself through the multitude of topics and explains the historical developments and the various connections. The book makes accessible especially those older publications that would otherwise have been very hard to find. I suggested that he extend as much as possible the introductions to each of the six main chapters (Speech research overview; Speech production and synthesis; The voice source; Speech analysis and features; Speech perception; and Prosody).
This, in my opinion, has substantially contributed to the readability of the 19 individual contributions. It was difficult for him to limit himself to these 19 papers only, as one can imagine from his full list of publications with over 260 old and new titles, which he presents both chronologically and by topic. Nevertheless you have this goldmine now in front of you and I hope and expect that this will be joyful and informative reading.

Gunnar Fant has received many awards, most recently in June 2004 the new IEEE James L. Flanagan Speech and Audio Processing Award, together with another speech giant, Ken Stevens, for their “fundamental contributions to the theory and practice of acoustic phonetics and speech perception”. Still I believe that the most valuable contribution of a scientist to the scientific and world community is his written work, especially when it is made accessible in such a splendid form as in this book. His pioneering book “Acoustic theory of speech production”, published in 1960 by Mouton, is of course a classic that is, for instance, still invaluable in articulatory synthesis. However, many other topics, like the formant banana, the Jakobson-Fant-Halle distinctive features, the LF source model, the OVE synthesiser, the invariance-variability dispute, syllable prominence and the speech code, get much attention and are presented in the proper perspective in the present book. The final section is about Prosody, the topic that keeps him most busy these days. He works on it together with Anita Kruckenberg and Johan Liljencrants, and it concerns not just the prosody of spoken Swedish but also that of poetry.

PREFACE

This is a collection of articles spanning half a century of speech research. The work started at the Ericsson Telephone Company in Stockholm, 1946–1949. The following two years were spent at MIT. In 1951 a small research group was established at the KTH in Stockholm.
This unit, the Speech Transmission Laboratory, became the foundation for our present department of Speech, Music and Hearing. An early expansion was promoted by US grants in the 1960s. Research in speech acoustics, phonetics, hearing and handicap aids dominated the activities up to 1990, after which more applied projects in speech technology gained dominance at the department.

Much of our work was published in our Speech Transmission Laboratory Quarterly Progress and Status Reports (STL-QPSR). This had the advantage of reaching a large international forum with a minimum of delay, but gave less time for publications in established journals. The purpose of the present book is to make available a collection of articles from various reports and publications, which contribute to the knowledge foundation. It is not a structured textbook, but it provides reference material for quite a wide range of topics in the field.

It is with great gratitude that I acknowledge the contributions from all those who have been involved in developing our department and its research and have served as co-workers. They are too many to be listed here, but there are two people from the early days whom I want to mention. Marianne Richter, my first employee, participated in language statistics and became our first finance officer. Si Felicetti started our STL-QPSR in 1960 and promoted our international contacts.

My present research is carried out together with Anita Kruckenberg and Johan Liljencrants. They and my many friends in the scientific community have contributed to the delight of cooperation and scientific discoveries. Closest in research profile is Ken Stevens. His book, Acoustic Phonetics, is of monumental value and provides deeper insights into matters of speech production. Valuable suggestions for the planning of the volume were given by Louis Pols.

INTRODUCTION

As time goes by, a look ahead in science and technology gains by a spotlight on earlier periods.
Speech technology has provided important tools for applications in man-machine communication systems and is growing rapidly. But there is a risk that expansion will be limited by insufficient attention to the potentialities of speech and language research. The symbiosis between technology and basic research that has made the advance possible now shows a tendency to turn into polarization. Speech technology is highly dependent on statistical tools and large databases, whilst phonetics tends to become fractionalised by narrowly defined problems or by abstract issues with little or no relevance to the overall code of spoken language.

There is a great need for integrated basic knowledge of speech production, acoustics, perception and cognitive processes and of the encoding of linguistically defined units in the speech wave and other parts of the overall speech chain. In several articles I have attempted to capture these relations in the concept of the speech code.

The relative success of speech synthesis has created an illusion that we have a profound insight into the speech code. This illusion becomes especially apparent when operating in the reverse direction, that is, when we are given a record of the speech wave and attempt to decipher what was said. An example is spectrogram reading, a difficult but rewarding exercise, which is mediated by knowledge about speech production.

In quest of the speech code, we are faced with issues concerning invariance and variability. However, the invariance issue ceases to present a problem if we systematically develop rules for structuring variability of all kinds, not only language, dialectal and contextual variations but also variations specific to speaker, speaking style and emotions. Much more effort is needed to develop the code and make it available for practical applications, as well as for the advance of general phonetics and linguistics.
It is only with a profound knowledge of the speech code and human behaviour that we can realize ultimate goals of advanced and reliable systems. The 19 selected articles span a period of almost 50 years. Some of the older ones still maintain a pedagogical value, or they document unique studies of some significance.
