Building Synthetic Voices

by Alan W Black and Kevin A. Lenzo

For FestVox 2.0 Edition

Copyright © 1999-2003 by Alan W Black & Kevin A. Lenzo

Permission to use, copy, modify and distribute this document for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies.

Table of Contents

I. Speech Synthesis
    1. Overview of Speech Synthesis
        History
        Uses of Speech Synthesis
        General Anatomy of a Synthesizer
    2. Speech Science
    3. A Practical Speech Synthesis System
        Basic Use
        Utterance structure
        Modules
        Utterance access
        Utterance building
        Extracting features from utterances

II. Building Synthetic Voices
    4. Basic Requirements
        Hardware/software requirements
        Voice in a new language
        Voice in an existing language
        Selecting a speaker
        Who owns a voice
        Recording under Unix
        Extracting pitchmarks from waveforms
    5. Limited domain synthesis
        designing the prompts
        customizing the synthesizer front end
        autolabeling issues
        unit size and type
        using limited domain synthesizers
        Telling the time
        Making it better
    6. Text analysis
        Non-standard words analysis
        Token to word rules
        Number pronunciation
        Homograph disambiguation
        TTS modes
        Mark-up modes
    7. Lexicons
        Word pronunciations
        Lexicons and addenda
        Out of vocabulary words
        Building letter-to-sound rules by hand
        Building letter-to-sound rules automatically
        Post-lexical rules
        Building lexicons for new languages
    8. Building prosodic models
        Phrasing
        Accent/Boundary Assignment
        F0 Generation
        Duration
        Prosody Research
        Prosody Walkthrough
    9. Corpus development
    10. Waveform Synthesis
    11. Diphone databases
        Diphone introduction
        Defining a diphone list
        Recording the diphones
        Labeling the diphones
        Extracting the pitchmarks
        Building LPC parameters
        Defining a diphone voice
        Checking and correcting diphones
        Diphone check list
    12. Unit selection databases
        Cluster unit selection
        Building a Unit Selection Cluster Voice
        Diphones from general databases
    13. Labeling Speech
        Labeling with Dynamic Time Warping
        Labeling with Full Acoustic Models
        Prosodic Labeling
    14. Evaluation and Improvements
        Evaluation
        Does it work at all?
        Formal Evaluation Tests
        Debugging voices

III. Interfacing and Integration
    15. Markup
    16. Concept-to-speech
    17. Deployment

IV. Recipes
