
Building Synthetic Voices
by Alan W Black and Kevin A. Lenzo

For FestVox 2.0 Edition

Copyright © 1999-2003 by Alan W Black & Kevin A. Lenzo

Permission to use, copy, modify and distribute this document for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies.

Table of Contents

I. Speech Synthesis
    1. Overview of Speech Synthesis
        History
        Uses of Speech Synthesis
        General Anatomy of a Synthesizer
    2. Speech Science
    3. A Practical Speech Synthesis System
        Basic Use
        Utterance structure
        Modules
        Utterance access
        Utterance building
        Extracting features from utterances

II. Building Synthetic Voices
    4. Basic Requirements
        Hardware/software requirements
        Voice in a new language
        Voice in an existing language
        Selecting a speaker
        Who owns a voice
        Recording under Unix
        Extracting pitchmarks from waveforms
    5. Limited domain synthesis
        designing the prompts
        customizing the synthesizer front end
        autolabeling issues
        unit size and type
        using limited domain synthesizers
        Telling the time
        Making it better
    6. Text analysis
        Non-standard words analysis
        Token to word rules
        Number pronunciation
        Homograph disambiguation
        TTS modes
        Mark-up modes
    7. Lexicons
        Word pronunciations
        Lexicons and addenda
        Out of vocabulary words
        Building letter-to-sound rules by hand
        Building letter-to-sound rules automatically
        Post-lexical rules
        Building lexicons for new languages
    8. Building prosodic models
        Phrasing
        Accent/Boundary Assignment
        F0 Generation
        Duration
        Prosody Research
        Prosody Walkthrough
    9. Corpus development
    10. Waveform Synthesis
    11. Diphone databases
        Diphone introduction
        Defining a diphone list
        Recording the diphones
        Labeling the diphones
        Extracting the pitchmarks
        Building LPC parameters
        Defining a diphone voice
        Checking and correcting diphones
        Diphone check list
    12. Unit selection databases
        Cluster unit selection
        Building a Unit Selection Cluster Voice
        Diphones from general databases
    13. Labeling Speech
        Labeling with Dynamic Time Warping
        Labeling with Full Acoustic Models
        Prosodic Labeling
    14. Evaluation and Improvements
        Evaluation
        Does it work at all?
        Formal Evaluation Tests
        Debugging voices

III. Interfacing and Integration
    15. Markup
    16. Concept-to-speech
    17. Deployment

IV. Recipes