The Development of a Parametric Real-Time Voice Source Model for Use with Vocal Tract Modelling Synthesis on Portable Devices


The Development of a Parametric Real-Time Voice Source Model for Use with Vocal Tract Modelling Synthesis on Portable Devices

Jacob Harrison
MSc by Research
University of York, Electronics
November 2014

Abstract

This research is concerned with the natural synthesis of the human voice, in particular the expansion of the LF-model voice source synthesis method. The LF-model is a mathematical representation of the acoustic waveform produced by the vocal folds in the human speech production system. While the model has been used in many voice synthesis applications since its inception in the 1970s, its parametric capabilities have remained mostly unexploited in terms of real-time manipulation. With recent advances in dynamic acoustic modelling of the human vocal tract using the two-dimensional digital waveguide mesh (2D DWM), a logical step is to include a real-time parametric voice source model rather than the static LF-waveform archetype. This thesis documents the development of a parameterised LF-model to be used in conjunction with an iOS-based 2D DWM vocal tract synthesiser, designed with the further study of voice synthesis naturalness, as well as improvements to assistive technology, in mind.

Table of Contents

Abstract
List of Figures
List of Tables
List of Accompanying Material
Acknowledgements
Author’s Declaration
1. Introduction
    1.1 Thesis Overview
    1.2 Thesis Structure
2. Literature Review
    2.1 The ‘Source + Modifier’ Principle
    2.2 Physiology of the Vocal Folds
    2.3 Voice Types
    2.4 Modelling the Voice Source
        2.4.1 Existing Methods for Voice Source Synthesis
        2.4.2 The Liljencrants-Fant Glottal Flow Model
    2.5 Vocal Tract Modelling
    2.6 ‘Naturalness’ in Speech Synthesis
3. A Parametric, Real-Time Voice Source Model
    3.1 Motivation for Design
    3.2 Specifications
    3.3 Design
        3.3.1 Choice of Parameters
        3.3.2 Wavetable Synthesis
        3.3.3 Voice Types
        3.3.4 iOS Interface
        3.3.5 Final Design
    3.4 Implementation
        3.4.1 Implementation in MATLAB
        3.4.2 Implementation in iOS
    3.5 System Testing
        3.5.1 Waveform Reproduction
        3.5.2 Fundamental Frequency
        3.5.3 ‘Vocal Tension’ Parameters
        3.5.4 Automatic Pitch-Dependent Voice Types
        3.5.5 Automatic f0 Trajectory
    3.6 Conclusions
4. Vocal Tract Modelling
    4.1 Vocal Tract Modelling with the 2D DWM
    4.2 Implementation of the 2D DWM in MATLAB
    4.3 Implementation in iOS
    4.4 System Testing
        4.4.1 Formant Analysis
        4.4.2 Multiple Vowels
        4.4.3 System Performance
    4.5 Conclusions
5. Summary and Analysis
    5.1 Summary
    5.2 Analysis
        5.2.1 Voice Source Synthesis using the LF-Model
        5.2.2 Extensions to the LF-Model
        5.2.3 Use within 2D DWM Vocal Tract Model
        5.2.4 Core Aims
    5.3 Future Research
        5.3.1 Issues within LF-Model Implementation
        5.3.2 Further Extensions to the Voice Source Model
        5.3.3 Multi-touch, Gestural User Interfaces
        5.3.4 Implementation of Dynamic Impedance Mapping within the 2D DWM
    5.4 Conclusion
Appendix A – ‘LFModelFull.m’ MATLAB Source Code
Appendix B – ‘ViewController.h’ LFGen App Header File
Appendix C – ‘ViewController.m’ LFGen App Main File
Appendix D –
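The LF-model at the heart of the thesis describes the glottal flow derivative in two segments: an exponentially growing sinusoid up to the instant of main excitation, followed by an exponential return phase. As an illustrative sketch only (not the thesis's MATLAB or iOS implementation; the R-parameter defaults and sample rate below are assumptions), one period can be generated like this, with the return-phase constant ε found by fixed-point iteration and the growth factor α found by bisection so that the flow returns to its baseline over the period:

```python
import math

def lf_waveform(fs=16000, f0=110.0, Ee=1.0, Rg=1.2, Rk=0.35, Ra=0.02):
    """One period of the LF glottal-flow-derivative waveform E(t).

    Open phase:   E(t) = E0 * exp(a*t) * sin(wg*t),            0 <= t <= te
    Return phase: E(t) = -Ee/(eps*ta) * (exp(-eps*(t - te))
                                          - exp(-eps*(tc - te))), te < t <= tc
    The R-parameter defaults (Rg, Rk, Ra) are illustrative, not the thesis's values.
    """
    T0 = 1.0 / f0
    tp = T0 / (2.0 * Rg)          # instant of peak glottal flow
    te = tp * (1.0 + Rk)          # instant of main excitation (largest negative E)
    ta = Ra * T0                  # effective return-phase time constant
    tc = T0                       # glottal closure at the period boundary
    wg = math.pi / tp

    # eps satisfies eps*ta = 1 - exp(-eps*(tc - te)); fixed-point iteration.
    eps = 1.0 / ta
    for _ in range(100):
        eps = (1.0 - math.exp(-eps * (tc - te))) / ta

    sin_te = math.sin(wg * te)

    def net_area(a):
        """Integral of E(t) over the period; must be zero (flow returns to baseline)."""
        E0 = -Ee / (math.exp(a * te) * sin_te)
        open_area = E0 * (math.exp(a * te) * (a * sin_te - wg * math.cos(wg * te))
                          + wg) / (a * a + wg * wg)
        ret_area = -Ee / (eps * ta) * ((1.0 - math.exp(-eps * (tc - te))) / eps
                                       - (tc - te) * math.exp(-eps * (tc - te)))
        return open_area + ret_area

    # Bracket the root of net_area by doubling, then bisect for alpha.
    lo, hi = 1.0, 2.0
    while net_area(lo) * net_area(hi) > 0.0 and hi < 1e6:
        lo, hi = hi, hi * 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if net_area(lo) * net_area(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    a = 0.5 * (lo + hi)

    E0 = -Ee / (math.exp(a * te) * sin_te)  # continuity: E(te) = -Ee
    out = []
    for n in range(int(fs * T0)):
        t = n / fs
        if t <= te:
            out.append(E0 * math.exp(a * t) * math.sin(wg * t))
        else:
            out.append(-Ee / (eps * ta) * (math.exp(-eps * (t - te))
                                           - math.exp(-eps * (tc - te))))
    return out
```

A real-time version of the kind the thesis targets would re-solve ε and α only when a parameter changes and then read the period from a wavetable, rather than recomputing per sample.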
Recommended publications
  • Integer-Based Wavetable Synthesis for Low-Computational Embedded Systems
    Integer-Based Wavetable Synthesis for Low-Computational Embedded Systems
    Ben Wright and Somsak Sukittanon
    University of Tennessee at Martin, Department of Engineering, Martin, TN USA
    [email protected], [email protected]
    Abstract— The evolution of digital music synthesis spans from tried-and-true frequency modulation to string modeling with neural networks. This project begins from a wavetable basis. The audio waveform was modeled as a superposition of formant sinusoids at various frequencies relative to the fundamental frequency. Piano sounds were reverse-engineered to derive a basis for the final formant structure, and used to create a wavetable. The quality of reproduction hangs in a trade-off between rich notes and sampling frequency. For low-computational systems, this calls for an approach that avoids burdensome floating-point calculations. To speed up calculations while preserving resolution, floating-point math was avoided entirely: all numbers involved are integral. The method was built into the Laser Piano. Along its 12-foot length are 24 “keys,” each consisting of a laser aligned with a photoresistive sensor connected to an 8-bit MCU.
    [...] discussed. Section B will discuss the mathematics explaining the components of the wavetables. Part II will examine the music theory used to derive the wavetables and the software implementation. Part III will cover the applied results.
    B. Mathematics and Music Theory
    Many periodic signals can be represented by a superposition of sinusoids, conventionally called Fourier series. Eigenfunction expansions, a superset of these, can represent solutions to partial differential equations (PDE) where Fourier series cannot. In these expansions, the frequency of each wave (here called a formant) in the superposition is a function of the properties of the string. This is said merely to emphasize that the response of a plucked
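The scheme the abstract describes (a single-cycle table built as a superposition of harmonics, then played back with integer-only arithmetic) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the harmonic amplitudes and table size are invented stand-ins for the paper's piano-derived formant data, and a 16.16 fixed-point phase accumulator stands in for whatever the MCU firmware actually uses:

```python
import math

TABLE_BITS = 8
TABLE_SIZE = 1 << TABLE_BITS   # 256-sample single-cycle wavetable
FRAC_BITS = 16                 # 16.16 fixed-point phase accumulator

def build_table(harmonic_amps):
    """Build one period as an integer superposition of harmonics.

    harmonic_amps: integer amplitude per harmonic (illustrative values,
    not the paper's piano-derived formant data).
    """
    table = []
    for n in range(TABLE_SIZE):
        s = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            s += amp * math.sin(2.0 * math.pi * k * n / TABLE_SIZE)
        table.append(int(round(s)))   # quantise once, at build time
    return table

def render(table, freq_hz, sample_rate, num_samples):
    """Integer-only playback loop, of the kind a small MCU can afford."""
    # Phase increment in 16.16 fixed point: table positions per output sample.
    inc = int(round(freq_hz * TABLE_SIZE * (1 << FRAC_BITS) / sample_rate))
    phase = 0
    out = []
    for _ in range(num_samples):
        out.append(table[phase >> FRAC_BITS])          # truncating lookup, no floats
        phase = (phase + inc) & ((TABLE_SIZE << FRAC_BITS) - 1)  # wrap the period
    return out
```

All floating-point work happens once, at table-build time; the per-sample loop uses only integer adds, shifts, and masks, which is the property that matters on an 8-bit processor.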
  • THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES by PROMISE TSHEPISO MALATJI
    THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES
    by PROMISE TSHEPISO MALATJI
    DISSERTATION submitted in fulfilment of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE in the FACULTY OF SCIENCE AND AGRICULTURE (School of Mathematical and Computer Sciences) at the UNIVERSITY OF LIMPOPO
    SUPERVISOR: Mr MJD Manamela
    CO-SUPERVISOR: Dr TI Modipa
    2019
    DEDICATION
    In memory of my grandparents, Cecilia Khumalo and Alfred Mashele, who always believed in me!
    DECLARATION
    I declare that THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES is my own work and that all the sources that I have used or quoted have been indicated and acknowledged by means of complete references, and that this work has not been submitted before for any other degree at any other institution.
    ACKNOWLEDGEMENTS
    I want to recognise the following people for their individual contributions to this dissertation:
    • My brother, Mr B.I. Khumalo, and the whole family for their unconditional love, support and understanding.
    • A distinct thank you to both my supervisors, Mr M.J.D. Manamela and Dr T.I. Modipa, for their guidance, motivation, and support.
    • The Telkom Centre of Excellence for Speech Technology for providing the resources and support to make this study a success.
    • My colleagues in the Department of Computer Science, Messrs V.R. Baloyi and L.M. Kola, for always motivating me.
    • A special thank you to Mr T.J. Sefara for taking his time to participate in the study.
    • The six Computer Science undergraduate students who sacrificed their precious time to participate in data collection.
  • Part 2: RHYTHM – DURATION AND TIMING Część 2: RYTM – ILOCZAS I WZORCE CZASOWE
    Part 2: RHYTHM – DURATION AND TIMING (Część 2: RYTM – ILOCZAS I WZORCE CZASOWE)
    From research to application: creating and applying models of British RP English rhythm and intonation
    David R. Hill, Department of Computer Science, The University of Calgary, Alberta, Canada
    [email protected]
    ABSTRACT
    Wiktor Jassem’s contributions and suggestions for further work have been an essential influence on my own work. In 1977, Wiktor agreed to come to my Human-Computer Interaction Laboratory at the University of Calgary to collaborate on problems associated with British English rhythm and intonation in computer speech synthesis from text. The cooperation resulted in innovative models which were used in implementing the world’s first completely functional and still operational real-time system for articulatory speech synthesis by rule from plain text. The package includes the software tools needed for developing the databases required for synthesising other languages, as well as providing stimuli for psychophysical experiments in phonetics and phonology.
  • SPEECH ACOUSTICS AND PHONETICS (Text, Speech and Language Technology, Volume 24)
    SPEECH ACOUSTICS AND PHONETICS
    Text, Speech and Language Technology, Volume 24
    Series Editors: Nancy Ide, Vassar College, New York; Jean Véronis, Université de Provence and CNRS, France
    Editorial Board: Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands; Kenneth W. Church, AT&T Bell Labs, New Jersey, USA; Judith Klavans, Columbia University, New York, USA; David T. Barnard, University of Regina, Canada; Dan Tufis, Romanian Academy of Sciences, Romania; Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain; Stig Johansson, University of Oslo, Norway; Joseph Mariani, LIMSI-CNRS, France
    The titles published in this series are listed at the end of this volume.
    Speech Acoustics and Phonetics, by GUNNAR FANT, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden
    KLUWER ACADEMIC PUBLISHERS, DORDRECHT / BOSTON / LONDON
    A C.I.P. Catalogue record for this book is available from the Library of Congress. ISBN 1-4020-2789-3 (PB), ISBN 1-4020-2373-1 (HB), ISBN 1-4020-2790-7 (e-book)
    Published by Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. Sold and distributed in North, Central and South America by Kluwer Academic Publishers, 101 Philip Drive, Norwell, MA 02061, U.S.A. In all other countries, sold and distributed by Kluwer Academic Publishers, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.
    Printed on acid-free paper. All Rights Reserved. © 2004 Kluwer Academic Publishers. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
  • Commercial Tools in Speech Synthesis Technology
    International Journal of Research in Engineering, Science and Management, Volume-2, Issue-12, December-2019. www.ijresm.com | ISSN (Online): 2581-5792
    Commercial Tools in Speech Synthesis Technology
    D. Nagaraju (Associate Professor, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India), R. J. Ramasree (Professor, Dept. of Computer Science, Rastriya Sanskrit VidyaPeet, Tirupati, India), K. Kishore, K. Vamsi Krishna, R. Sujana (UG Students, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India)
    Abstract: This is a study paper towards a new emotional speech synthesis system for Telugu (ESST). The main objective of this paper is to map the situation of today's speech synthesis technology and to focus on potential methods for the future. Literature and articles in the area usually focus on a single method, a single synthesizer, or a very limited range of the technology. In this paper the whole speech synthesis area, with as many methods, techniques, applications, and products as possible, is under investigation. Unfortunately, this means that in some cases very detailed information may not be given here, but may be found in the given references.
    [...] phonetic and prosodic information. These two phases are usually called high- and low-level synthesis. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is then preprocessed and analyzed into a phonetic representation, which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Speech sound is finally generated with the low-level synthesizer by the information [...]
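The two-phase split described above (a high-level stage that turns raw text into a phonetic and prosodic representation, and a low-level stage that turns that representation into sound) can be illustrated with a toy front end. The lexicon, phone symbols, and markers below are invented for illustration and are not taken from any real TTS system:

```python
# Toy high-level front end: text -> phoneme string with a minimal prosody mark.
# The lexicon and ARPAbet-style phone set here are illustrative stand-ins.
LEXICON = {
    "speech": ["S", "P", "IY1", "CH"],
    "synthesis": ["S", "IH1", "N", "TH", "AH0", "S", "IH0", "S"],
}

def high_level_synthesis(text):
    """Map text to a phone sequence; digits on vowels mark lexical stress."""
    phones = []
    for word in text.lower().replace(",", " ").split():
        phones.extend(LEXICON.get(word, ["<unk>"]))   # unknown words flagged
    phones.append("<pause>")   # utterance-final boundary, a crude intonation cue
    return phones
```

A real high-level stage adds text normalisation, letter-to-sound rules for out-of-lexicon words, and far richer duration and intonation marking than a single boundary symbol.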
  • Virtual Pop: Gender, Ethnicity, and Identity in Virtual Bands and Vocaloid
    Virtual Pop: Gender, Ethnicity, and Identity in Virtual Bands and Vocaloid
    Alicia Stark
    Cardiff University School of Music, 2018
    Presented in partial fulfilment of the requirements for the degree Doctor of Philosophy in Musicology
    TABLE OF CONTENTS
    ABSTRACT
    DEDICATION
    ACKNOWLEDGEMENTS
    INTRODUCTION
        EXISTING STUDIES OF VIRTUAL BANDS
        RESEARCH QUESTIONS
        METHODOLOGY
        THESIS STRUCTURE
    CHAPTER 1: ‘YOU’VE COME A LONG WAY, BABY:’ THE HISTORY AND TECHNOLOGIES OF VIRTUAL BANDS
        CATEGORIES OF VIRTUAL BANDS
        AN ANIMATED ANTHOLOGY – THE RISE IN POPULARITY OF ANIMATION
        ALVIN AND THE CHIPMUNKS…
        …AND THEIR SUCCESSORS
        VIRTUAL BANDS FOR ALL AGES, AVAILABLE ON YOUR TV
        VIRTUAL BANDS IN OTHER TYPES OF MEDIA
        CREATING THE VOICE
        REPRODUCING THE BODY
        CONCLUSION
    CHAPTER 2: ‘ALMOST UNREAL:’ TOWARDS A THEORETICAL FRAMEWORK FOR VIRTUAL BANDS
        DEFINING REALITY AND VIRTUAL REALITY
        APPLYING THEORIES OF ‘REALNESS’ TO VIRTUAL BANDS
        UNDERSTANDING MULTIMEDIA
        APPLYING THEORIES OF MULTIMEDIA TO VIRTUAL BANDS
        THE VOICE IN VIRTUAL BANDS
        AGENCY: TRANSFORMATION THROUGH TECHNOLOGY
        CONCLUSION
    CHAPTER 3: ‘INSIDE, OUTSIDE, UPSIDE DOWN:’ GENDER AND ETHNICITY IN VIRTUAL BANDS
        GENDER
        ETHNICITY
        CASE STUDIES: DETHKLOK, JOSIE AND THE PUSSYCATS, STUDIO KILLERS
        CONCLUSION
    CHAPTER 4: ‘SPITTING OUT THE DEMONS:’ GORILLAZ’ CREATION STORY AND THE CONSTRUCTION OF AUTHENTICITY
        ACADEMIC DISCOURSE ON GORILLAZ
        MASCULINITY IN GORILLAZ
        ETHNICITY IN GORILLAZ
        GORILLAZ FANDOM
        CONCLUSION
  • The Sonification Handbook Chapter 9 Sound Synthesis for Auditory Display
    The Sonification Handbook
    Edited by Thomas Hermann, Andy Hunt, John G. Neuhoff
    Logos Publishing House, Berlin, Germany. ISBN 978-3-8325-2819-5. 2011, 586 pages.
    Online: http://sonification.de/handbook Order: http://www.logos-verlag.com
    Reference: Hermann, T., Hunt, A., Neuhoff, J. G., editors (2011). The Sonification Handbook. Logos Publishing House, Berlin, Germany.
    Chapter 9: Sound Synthesis for Auditory Display, by Perry R. Cook. This chapter covers most means for synthesizing sounds, with an emphasis on describing the parameters available for each technique, especially as they might be useful for data sonification. The techniques are covered in progression, from the least parametric (the fewest means of modifying the resulting sound from data or controllers) to the most parametric (most flexible for manipulation). Some examples are provided of using various synthesis techniques to sonify body position, desktop GUI interactions, stock data, etc.
    Reference: Cook, P. R. (2011). Sound synthesis for auditory display. In Hermann, T., Hunt, A., Neuhoff, J. G., editors, The Sonification Handbook, chapter 9, pages 197–235. Logos Publishing House, Berlin, Germany. Media examples: http://sonification.de/handbook/chapters/chapter9
    9.1 Introduction and Chapter Overview
    Applications and research in auditory display require sound synthesis and manipulation algorithms that afford careful control over the sonic results. The long legacy of research in speech, computer music, acoustics, and human audio perception has yielded a wide variety of sound analysis/processing/synthesis algorithms that the auditory display designer may use. This chapter surveys algorithms and techniques for digital sound synthesis as related to auditory display.
  • Models of Speech Synthesis, by Rolf Carlson, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
    Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 9932–9937, October 1995. Colloquium Paper.
    This paper was presented at a colloquium entitled “Human-Machine Communication by Voice,” organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8–9, 1993.
    Models of speech synthesis
    ROLF CARLSON, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
    ABSTRACT The term “speech synthesis” has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.
    [...] need large amounts of speech data. Models working close to the waveform are now typically making use of increased unit sizes while still modeling prosody by rule. In the middle of the scale, “formant synthesis” is moving toward the articulatory models by looking for “higher-level parameters” or to larger prestored units. Articulatory synthesis, hampered by lack of data, still has some way to go but is yielding improved quality, due mostly to advanced analysis-synthesis techniques.
    Flexibility and Technical Dimensions
  • Estudios De I+D+I
    ESTUDIOS DE I+D+I, Número 51
    Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario (SIRAU Project: a user-adaptable remote information management service for activities of daily living)
    Author: Catalá Mallofré, Andreu. Affiliation: Universidad Politécnica de Cataluña. Date: 2006
    To cite this document: CATALÁ MALLOFRÉ, Andreu (2006 call). “Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario”. Madrid. Estudios de I+D+I, nº 51. [Published: 03/05/2010]. <http://www.imsersomayores.csic.es/documentos/documentos/imserso-estudiosidi-51.pdf>
    An initiative of IMSERSO and CSIC © 2003 Portal Mayores http://www.imsersomayores.csic.es
    Summary: This project falls within one of the research lines of the Centre for Technological Studies for Dependency Care (CETDP – UPC) of the Universidad Politécnica de Cataluña, which develops technological solutions to improve the quality of life of people with disabilities. The aim is to take advantage of the great advance represented by the new radio-frequency identification (RFID) technologies by applying them as a support system for people with deficits of various kinds. It was initially conceived for people with visual impairment, but its use is easily extensible to people with comprehension and memory problems, or any type of cognitive deficit. The idea is to offer the possibility of electronically recognising everyday objects, so that a system can present the associated information through a verbal channel. It consists of a portable terminal equipped with a short-range transmitter. When the user brings the terminal close to an object, or vice versa, the system identifies the object and offers complementary information by means of a spoken message.
  • Masterarbeit
    Master’s thesis: Creation of a speech database and of a program for its analysis, in the context of speech synthesis with spectral models
    Submitted to the Department of Mathematics, Natural Sciences and Computer Science of the Technische Hochschule Mittelhessen in fulfilment of the requirements for the academic degree Master of Science
    Tobias Platen, August 2014
    Examiner: Prof. Dr. Erdmuthe Meyer zu Bexten. Co-examiner: Prof. Dr. Keywan Sohrabi
    Declaration: I hereby affirm that I have produced this work independently and using only the literature and aids indicated. The work has not previously been submitted in the same or a similar form to any other examination authority, and has not been published.
    Contents
    1 Introduction
        1.1 Motivation
        1.2 Goals
        1.3 Historical speech synthesisers
            1.3.1 The speaking machine
            1.3.2 The vocoder and the Voder
            1.3.3 Linear Predictive Coding
        1.4 Modern speech synthesis algorithms
            1.4.1 Formant synthesis
            1.4.2 Concatenative synthesis
    2 Spectral models for speech synthesis
        2.1 Convolution, Fourier transform, and vocoders
        2.2 Phase vocoder
        2.3 Spectral Model Synthesis
            2.3.1 Harmonic Trajectories
            2.3.2 Shape Invariance
  • Wavetable Synthesis 101, a Fundamental Perspective
    Wavetable Synthesis 101, A Fundamental Perspective
    Robert Bristow-Johnson, Wave Mechanics, Inc., 45 Kilburn St., Burlington VT 05401 USA
    [email protected]
    ABSTRACT: Wavetable synthesis is both simple and straightforward in implementation and sophisticated and subtle in optimization. For the case of quasi-periodic musical tones, wavetable synthesis can be as compact in data storage requirements and as general as additive synthesis, but requires much less real-time computation. This paper shows this equivalence, explores some suboptimal methods of extracting wavetable data from a recorded tone, and proposes a perceptually relevant error metric and constraint when attempting to reduce the amount of stored wavetable data.
    0 INTRODUCTION
    Wavetable music synthesis (not to be confused with common PCM sample buffer playback) is similar to simple digital sine wave generation [1] [2] but extended in at least two ways. First, the waveform lookup table contains samples not just for a single period of a sine function but for a single period of a more general waveshape. Second, a mechanism exists for dynamically changing the waveshape as the musical note evolves, thus generating a quasi-periodic function in time. This mechanism can take on a few different forms, probably the simplest being linear crossfading from one wavetable to the next sequentially. More sophisticated methods are proposed by a few authors (recently Horner et al. [3] [4]), such as mixing a set of well-chosen basis wavetables, each with its corresponding envelope function, as in Fig. 1. The simple linear crossfading method can be thought of as a subclass of the more general basis mixing method where the envelopes are overlapping triangular pulse functions.
  • Half a Century in Phonetics and Speech Research
    Fonetik 2000, Swedish phonetics meeting in Skövde, May 24–26, 2000 (Expanded version, internal TMH report)
    Half a century in phonetics and speech research
    Gunnar Fant, Department of Speech, Music and Hearing, KTH, Stockholm, 10044
    Abstract
    This is a brief overview of experiences from more than 50 years in phonetics and speech research. I will have something to say about my own scientific career and the growth of our department at KTH, and I will end with an overview of research objectives in phonetics and a summary of my present activities.
    Introduction
    As you are all aware, phonetics and speech research are highly interrelated and integrated in many branches of the humanities and technology. In Sweden, by tradition, phonetics and linguistics have had a strong position, and speech technology is well developed and internationally respected. This is indeed an exciting field of growing importance, which still keeps me busy. What have we been up to during half a century? Where do we stand today, and how do we look ahead? I am not attempting a deep, thorough study; my presentation will in part be anecdotal, but I hope that it will add to the perspective, supplementing the brief account presented in Fant (1998).
    The early period 1945–1966
    KTH and Ericsson 1945–1949
    I graduated from the department of Telegraphy and Telephony of the KTH in May 1945. My supervisor, professor Torbern Laurent, a specialist in transmission line theory and electrical filters, had an open mind for interdisciplinary studies. My thesis was concerned with theoretical matters of relations between speech intelligibility and reduction of overall system bandwidth, incorporating the effects of different types of hearing loss.