Speech Synthesis Today – How Did It Get to Be This Advanced?


Speech Synthesis Today – How Did It Get to Be This Advanced? (9/26/14)
Heather Campbell
September 24, 2014

Why do I care?
• Speech synthesis is a bridge between speech science and assistive technology that helps those with motor speech challenges.
• To better understand the technology used today – some of the same technology in my lab is used in speech synthesis today.
• Yet the first speech-generating devices (SGDs) were not used as assistive technology – automata were oddities, "cool machines."

Earliest Speech Synthesis (18th–19th Centuries)
• Anecdotes from the 13th century describe attempts to build machines that imitate human speech; deemed heretical by the church, they were destroyed.
• 1779 – Christian Kratzenstein, Russian Academy of Sciences: a machine modeling the human vocal tract that could produce the five long vowel sounds.
• 1791 – Wolfgang von Kempelen: a bellows-operated "acoustic-mechanical speech machine" that added models of the tongue and lips, enabling it to produce consonants as well as vowels.
• 1837 – Charles Wheatstone: a "speaking machine" based on von Kempelen's design.
• 1857 – M. Faber's "Euphonia," exhibited in the circus – it amazed people and also scared them; its voice was described as ghostly and monotonous.
(as cited by Haskins, 2008)

Early Speech Synthesis (20th Century)
• 1922 – J. Q. Stewart's electrical analog of the vocal organs (Flanagan, 1972, as cited by Haskins, 2008).
• 1937 – R. Riesz modeled the human vocal tract with valves and chambers.
• 1930s – Bell Labs' Vocoder ("voice coder") automatically analyzed speech into its fundamental tone and resonances.
• 1939 – Homer Dudley's Voder (Voice Operating Demonstrator): a keyboard-operated voice synthesizer (Klatt, 1987, as cited by Haskins, 2008).
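The Vocoder's analysis step – reducing speech to slowly varying per-band energies – can be sketched in a few lines. This is only an illustrative toy (a short-window correlation against a sinusoid per band), not Dudley's actual filter-bank circuit; the band frequencies, sample rate, and window size are invented for the example.

```python
import math

def band_envelope(signal, freq, sample_rate, window=64):
    """Crude per-band envelope: correlate short windows with a sinusoid at freq."""
    env = []
    for start in range(0, len(signal) - window + 1, window):
        c = s = 0.0
        for i in range(window):
            phase = 2 * math.pi * freq * (start + i) / sample_rate
            c += signal[start + i] * math.cos(phase)
            s += signal[start + i] * math.sin(phase)
        # Magnitude of the correlation, normalized by window length.
        env.append(math.sqrt(c * c + s * s) / window)
    return env

sr = 8000
# Toy input dominated by a 300 Hz component, so the 300 Hz band should dominate.
speech = [math.sin(2 * math.pi * 300 * t / sr) for t in range(1024)]
envelopes = {f: band_envelope(speech, f, sr) for f in (300, 1200, 2400)}
```

Transmitting only these slowly varying envelopes, rather than the waveform itself, is what let the vocoder compress speech for telephone lines and still drive a resynthesis stage.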
Early Speech Synthesis (20th Century), continued
• 1940s–1950 – Franklin Cooper, Haskins Laboratories: Pattern Playback, which converts pictures of the acoustic patterns of speech, in the form of a spectrogram, back into sound (as cited by Haskins, 2008).
• 1960 – Reg Maling: POSSUM, a patient-operated selector mechanism that scanned through images on an illuminated display (Wikipedia, 2014).

Since the 1970s
• Rapid development associated with advances in computer technology.
• 1980s/1990s – the MITalk system, Dennis Klatt's "Klattalk" (MIT), and the Bell Labs system used a sophisticated voicing source and formed the basis for many systems used today: the first multilingual, language-independent systems, making extensive use of natural language processing methods and including 6,000 words that do not follow letter-sound conversions (Koul, 2011, p. 267; Lemmetty, 1999).
• Great audio examples of speech synthesis history, 1939–1985: http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html

With Computers Comes...

Digitized Speech
• Uses prerecorded speech; limited to only a few phrases; best in institutional settings to communicate basic needs; switch operated (Koul, 2011).
• Sentient Systems, 1983 – the "EyeTyper"; the company later became DynaVox – the first SGD with touch-screen technology.
• Later digital speech generators: SpringBoard Lite, ChatBox, ActionVoice, BigMack, Cheap Talk, Speak Easy, Digivox/Dynamo (by DynaVox).

Synthesized Speech
• Creates novel words/messages – messages built from letters/words, phrases, sentences, pictures, symbols.

Concatenative Synthesis
• Unit selection synthesis – combines small parts of prerecorded speech (sounds/syllables).
• Requires a large database.
• Relatively natural sounding, though not very smooth.

Statistical Parametric Synthesis
• Hidden Markov model (HMM) and deep neural network (DNN) models; formant synthesis.
• A machine learning technique – the speech waveform is constructed from generated parameters.
• Smooth at boundaries, though not very natural sounding (Lemmetty, 1999).

Input for SGDs
• Text-to-Speech (TTS): using a keyboard, switch, or touch screen, the user enters words and the device uses an algorithm to generate synthetic speech.
• Usually concatenative synthesis.
• Can choose your own voice.
• Examples: CereProc, Proloquo2Go, ModelTalker (Patel, 2014).

Powered by CereProc – the Bush-o-matic: http://www.idyacy.com/cgi-bin/bushomatic.cgi
CereProc live demo: https://www.cereproc.com/en/support/live_demo
Proloquo2Go: http://www.assistiveware.com/product/proloquo2go/voices and http://www.assistiveware.com/product/proloquo2go

Personalized Synthetic Voices – TEDWomen Clip – Rupal Patel
• Rupal Patel combines the voice (source) of a target talker with speech from a speech donor (filter) to produce a customized synthetic voice as part of the VocaliD project (Patel, 2013).
• Added to the TTS device ModelTalker version 1.
• Research shows high intelligibility and recognition of the voices (Mills, Bunnell, & Patel, 2014).
• https://www.youtube.com/watch?v=d38LKbYfWrs

SOURCES
Haskins Laboratories. (1996–2008). Talking Heads: Simulacra – The Early History of Talking Machines. http://www.haskins.yale.edu/featured/heads/simulacra.html
Koul, R. (2011). Synthetic speech. In O. Wendt (Ed.), Assistive Technology Principles and Applications for Communication Disorders and Special Education (p. 530). Bingley, UK: Emerald.
Lemmetty, S. (1999). Review of Speech Synthesis Technology (Master's thesis). Helsinki University of Technology, Helsinki, Finland.
Mills, T., Bunnell, H. T., & Patel, R. (2014). Towards personalized speech synthesis for augmentative and alternative communication. Augmentative and Alternative Communication, 30(3), 226–236.
Patel, R. (2013). Synthetic voices, as unique as fingerprints. TED talk. https://www.ted.com/talks/rupal_patel_synthetic_voices_as_unique_as_fingerprints
Wikipedia. (2014). Speech-generating device. http://en.wikipedia.org/wiki/Speech-generating_device
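The source-plus-filter combination behind such personalized voices can be sketched as follows. This is a hedged toy illustration, not the VocaliD implementation: an impulse train at an assumed target-talker pitch stands in for the voice source, and a single two-pole resonator stands in for the donor-derived vocal-tract filter; every parameter value is invented for the example.

```python
import math

def impulse_train(f0, sample_rate, n_samples):
    """Source: one pulse per pitch period, like highly simplified vocal-fold pulses."""
    period = int(sample_rate / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, freq, bandwidth, sample_rate):
    """Filter: a two-pole resonance, the classic building block of formant synthesis."""
    r = math.exp(-math.pi * bandwidth / sample_rate)       # pole radius from bandwidth
    theta = 2 * math.pi * freq / sample_rate               # pole angle from center freq
    b1, b2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + b1 * y1 + b2 * y2                          # y[n] = x[n] + b1*y[n-1] + b2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out

sr = 16000
source = impulse_train(f0=180, sample_rate=sr, n_samples=800)       # target talker's pitch
voice = resonator(source, freq=700, bandwidth=120, sample_rate=sr)  # donor-like formant
```

Swapping in a different `f0` changes whose pitch the output carries while the resonance, and hence the vowel-like color, stays with the donor side – the separation Patel's approach exploits.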