Building Synthetic Voices

Total Page:16

File Type:pdf, Size:1020Kb

Building Synthetic Voices Building Synthetic Voices Alan W Black Kevin A. Lenzo Building Synthetic Voices by Alan W Black and Kevin A. Lenzo For FestVox 2.0 Edition Copyright © 1999-2003 by Alan W Black & Kevin A. Lenzo Permission to use, copy, modify and distribute this document for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies. Table of Contents I. Speech Synthesis.............................................................................................................?? 1. Overview of Speech Synthesis .............................................................................?? History................................................................................................................?? Uses of Speech Synthesis.................................................................................?? General Anatomy of a Synthesizer ................................................................?? 2. Speech Science ........................................................................................................?? 3. A Practical Speech Synthesis System ..................................................................?? Basic Use ............................................................................................................?? Utterance structure...........................................................................................?? Modules..............................................................................................................?? Utterance access................................................................................................?? Utterance building............................................................................................?? Extracting features from utterances ...............................................................?? II. Building Synthetic Voices............................................................................................?? 4. Basic Requirements................................................................................................?? Hardware/software requirements.................................................................?? Voice in a new language ..................................................................................?? Voice in an existing language..........................................................................?? Selecting a speaker ...........................................................................................?? Who owns a voice.............................................................................................?? Recording under Unix......................................................................................?? Extracting pitchmarks from waveforms........................................................?? 5. Limited domain synthesis.....................................................................................?? designing the prompts .....................................................................................?? customizing the synthesizer front end ..........................................................?? autolabeling issues ...........................................................................................?? unit size and type .............................................................................................?? using limited domain synthesizers ................................................................?? Telling the time..................................................................................................?? Making it better.................................................................................................?? 6. Text analysis ............................................................................................................?? Non-standard words analysis.........................................................................?? Token to word rules..........................................................................................?? Number pronunciation ....................................................................................?? Homograph disambiguation...........................................................................?? TTS modes .........................................................................................................?? Mark-up modes.................................................................................................?? 7. Lexicons ...................................................................................................................?? Word pronunciations........................................................................................?? Lexicons and addenda .....................................................................................?? Out of vocabulary words.................................................................................?? Building letter-to-sound rules by hand .........................................................?? Building letter-to-sound rules automatically ...............................................?? Post-lexical rules ...............................................................................................?? Building lexicons for new languages.............................................................?? 8. Building prosodic models .....................................................................................?? Phrasing .............................................................................................................?? Accent/Boundary Assignment.......................................................................?? F0 Generation ....................................................................................................?? Duration .............................................................................................................?? Prosody Research..............................................................................................?? Prosody Walkthrough ......................................................................................?? 9. Corpus developement ...........................................................................................?? 10. Waveform Synthesis.............................................................................................?? 5 11. Diphone databases ...............................................................................................?? Diphone introduction.......................................................................................?? Defining a diphone list.....................................................................................?? Recording the diphones...................................................................................?? Labeling the diphones......................................................................................?? Extracting the pitchmarks ...............................................................................?? Building LPC parameters ................................................................................?? Defining a diphone voice.................................................................................?? Checking and correcting diphones ................................................................?? Diphone check list ............................................................................................?? 12. Unit selection databases ......................................................................................?? Cluster unit selection........................................................................................?? Building a Unit Selection Cluster Voice.........................................................?? Diphones from general databases ..................................................................?? 13. Labeling Speech....................................................................................................?? Labeling with Dynamic Time Warping .........................................................?? Labeling with Full Acoustic Models..............................................................?? Prosodic Labeling .............................................................................................?? 14. Evaluation and Improvements...........................................................................?? Evaluation..........................................................................................................?? Does it work at all?...........................................................................................?? Formal Evaluation Tests ..................................................................................?? Debugging voices .............................................................................................?? III. Interfacing and Integration ........................................................................................?? 15. Markup ..................................................................................................................?? 16. Concept-to-speech................................................................................................?? 17. Deployment...........................................................................................................?? IV. Recipes ...........................................................................................................................??
Recommended publications
  • Bringing GNU Emacs to Native Code
    Bringing GNU Emacs to Native Code Andrea Corallo Luca Nassi Nicola Manca [email protected] [email protected] [email protected] CNR-SPIN Genoa, Italy ABSTRACT such a long-standing project. Although this makes it didactic, some Emacs Lisp (Elisp) is the Lisp dialect used by the Emacs text editor limitations prevent the current implementation of Emacs Lisp to family. GNU Emacs can currently execute Elisp code either inter- be appealing for broader use. In this context, performance issues preted or byte-interpreted after it has been compiled to byte-code. represent the main bottleneck, which can be broken down in three In this work we discuss the implementation of an optimizing com- main sub-problems: piler approach for Elisp targeting native code. The native compiler • lack of true multi-threading support, employs the byte-compiler’s internal representation as input and • garbage collection speed, exploits libgccjit to achieve code generation using the GNU Com- • code execution speed. piler Collection (GCC) infrastructure. Generated executables are From now on we will focus on the last of these issues, which con- stored as binary files and can be loaded and unloaded dynamically. stitutes the topic of this work. Most of the functionality of the compiler is written in Elisp itself, The current implementation traditionally approaches the prob- including several optimization passes, paired with a C back-end lem of code execution speed in two ways: to interface with the GNU Emacs core and libgccjit. Though still a work in progress, our implementation is able to bootstrap a func- • Implementing a large number of performance-sensitive prim- tional Emacs and compile all lexically scoped Elisp files, including itive functions (also known as subr) in C.
    [Show full text]
  • An Evaluation of Go and Clojure
    An evaluation of Go and Clojure A thesis submitted in partial satisfaction of the requirements for the degree Bachelors of Science in Computer Science Fall 2010 Robert Stimpfling Department of Computer Science University of Colorado, Boulder Advisor: Kenneth M. Anderson, PhD Department of Computer Science University of Colorado, Boulder 1. Introduction Concurrent programming languages are not new, but they have been getting a lot of attention more recently due to their potential with multiple processors. Processors have gone from growing exponentially in terms of speed, to growing in terms of quantity. This means processes that are completely serial in execution will soon be seeing a plateau in performance gains since they can only rely on one processor. A popular approach to using these extra processors is to make programs multi-threaded. The threads can execute in parallel and use shared memory to speed up execution times. These multithreaded processes can significantly speed up performance, as long as the number of dependencies remains low. Amdahl‘s law states that these performance gains can only be relative to the amount of processing that can be parallelized [1]. However, the performance gains are significant enough to be looked into. These gains not only come from the processing being divvied up into sections that run in parallel, but from the inherent gains from sharing memory and data structures. Passing new threads a copy of a data structure can be demanding on the processor because it requires the processor to delve into memory and make an exact copy in a new location in memory. Indeed some studies have shown that the problem with optimizing concurrent threads is not in utilizing the processors optimally, but in the need for technical improvements in memory performance [2].
    [Show full text]
  • Programming with GNU Emacs Lisp
    Programming with GNU Emacs Lisp William H. Mitchell (whm) Mitchell Software Engineering (.com) GNU Emacs Lisp Programming Slide 1 Copyright © 2001-2008 by William H. Mitchell GNU Emacs Lisp Programming Slide 2 Copyright © 2001-2008 by William H. Mitchell Emacs Lisp Introduction A little history GNU Emacs Lisp Programming Slide 3 Copyright © 2001-2008 by William H. Mitchell Introduction GNU Emacs is a full-featured text editor that contains a complete Lisp system. Emacs Lisp is used for a variety of things: • Complete applications such as mail and news readers, IM clients, calendars, games, and browsers of various sorts. • Improved interfaces for applications such as make, diff, FTP, shells, and debuggers. • Language-specific editing support. • Management of interaction with version control systems such as CVS, Perforce, SourceSafe, and StarTeam. • Implementation of Emacs itself—a substantial amount of Emacs is written in Emacs Lisp. And more... GNU Emacs Lisp Programming Slide 4 Copyright © 2001-2008 by William H. Mitchell A little history1 Lisp: John McCarthy is the father of Lisp. The name Lisp comes from LISt Processing Language. Initial ideas for Lisp were formulated in 1956-1958; some were implemented in FLPL (FORTRAN-based List Processing Language). The first Lisp implementation, for application to AI problems, took place 1958-1962 at MIT. There are many dialects of Lisp. Perhaps the most commonly used dialect is Common Lisp, which includes CLOS, the Common Lisp Object System. See http://www-formal.stanford.edu/jmc/history/lisp/lisp.html for some interesting details on the early history of Lisp. 1 Don't quote me! GNU Emacs Lisp Programming Slide 5 Copyright © 2001-2008 by William H.
    [Show full text]
  • Allegro CL User Guide
    Allegro CL User Guide Volume 1 (of 2) version 4.3 March, 1996 Copyright and other notices: This is revision 6 of this manual. This manual has Franz Inc. document number D-U-00-000-01-60320-1-6. Copyright 1985-1996 by Franz Inc. All rights reserved. No part of this pub- lication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means electronic, mechanical, by photocopying or recording, or otherwise, without the prior and explicit written permission of Franz incorpo- rated. Restricted rights legend: Use, duplication, and disclosure by the United States Government are subject to Restricted Rights for Commercial Software devel- oped at private expense as specified in DOD FAR 52.227-7013 (c) (1) (ii). Allegro CL and Allegro Composer are registered trademarks of Franz Inc. Allegro Common Windows, Allegro Presto, Allegro Runtime, and Allegro Matrix are trademarks of Franz inc. Unix is a trademark of AT&T. The Allegro CL software as provided may contain material copyright Xerox Corp. and the Open Systems Foundation. All such material is used and distrib- uted with permission. Other, uncopyrighted material originally developed at MIT and at CMU is also included. Appendix B is a reproduction of chapters 5 and 6 of The Art of the Metaobject Protocol by G. Kiczales, J. des Rivieres, and D. Bobrow. All this material is used with permission and we thank the authors and their publishers for letting us reproduce their material. Contents Volume 1 Preface 1 Introduction 1.1 The language 1-1 1.2 History 1-1 1.3 Format
    [Show full text]
  • Commercial Tools in Speech Synthesis Technology
    International Journal of Research in Engineering, Science and Management 320 Volume-2, Issue-12, December-2019 www.ijresm.com | ISSN (Online): 2581-5792 Commercial Tools in Speech Synthesis Technology D. Nagaraju1, R. J. Ramasree2, K. Kishore3, K. Vamsi Krishna4, R. Sujana5 1Associate Professor, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India 2Professor, Dept. of Computer Science, Rastriya Sanskrit VidyaPeet, Tirupati, India 3,4,5UG Student, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India Abstract: This is a study paper planned to a new system phonetic and prosodic information. These two phases are emotional speech system for Telugu (ESST). The main objective of usually called as high- and low-level synthesis. The input text this paper is to map the situation of today's speech synthesis might be for example data from a word processor, standard technology and to focus on potential methods for the future. ASCII from e-mail, a mobile text-message, or scanned text Usually literature and articles in the area are focused on a single method or single synthesizer or the very limited range of the from a newspaper. The character string is then preprocessed and technology. In this paper the whole speech synthesis area with as analyzed into phonetic representation which is usually a string many methods, techniques, applications, and products as possible of phonemes with some additional information for correct is under investigation. Unfortunately, this leads to a situation intonation, duration, and stress. Speech sound is finally where in some cases very detailed information may not be given generated with the low-level synthesizer by the information here, but may be found in given references.
    [Show full text]
  • Models of Speech Synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
    Proc. Natl. Acad. Sci. USA Vol. 92, pp. 9932-9937, October 1995 Colloquium Paper This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9,1993. Models of speech synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden ABSTRACT The term "speech synthesis" has been used need large amounts of speech data. Models working close to for diverse technical approaches. In this paper, some of the the waveform are now typically making use of increased unit approaches used to generate synthetic speech in a text-to- sizes while still modeling prosody by rule. In the middle of the speech system are reviewed, and some of the basic motivations scale, "formant synthesis" is moving toward the articulatory for choosing one method over another are discussed. It is models by looking for "higher-level parameters" or to larger important to keep in mind, however, that speech synthesis prestored units. Articulatory synthesis, hampered by lack of models are needed not just for speech generation but to help data, still has some way to go but is yielding improved quality, us understand how speech is created, or even how articulation due mostly to advanced analysis-synthesis techniques. can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages Flexibility and Technical Dimensions are discussed as special challenges facing the speech synthesis community.
    [Show full text]
  • Published with the Approbation of the Board of Trustees
    JOHNS HOPKINS UNIVERSITY CIRCULARS Published with the approbation of the Board of Trustees VOL. V.—No. 46.] BALTIMORE, JANUARY, 1886. [PRICE, 10 CENTS. RECENT PUBLICATIONS. Photograph of the Normal Solar Spectrum. Made Reproduction in Phototype of a Syriac Manuscript by PROFESSOR H. A. ROWLAND. (Baltimore, Johns Hopkins University, 1886). (Williams MS.) with the Antilegomena Epistles. Edited by IsAAC H. HALL. (Baltimore, Johns Hopkins University, 1886). [Froma letter to Science, New York, December 18, 1885]. The photographic map of the spectrum, upon which Professor Rowland [Froman article by I. H. Hall in the Journal of the Society of Biblical Literature and has expended so much hard work during the past three years, is nearly Exegesis for 1885, with additions]. ready for publication. The map is issued in a series of seven plates, cover- In September last (1884) I announced in The Independent the discovery ing the region from wave-length 3100 to 5790. Each plate is three feet of a manuscript of the Acts and Epistles, among which occur also the long and one foot wide, and contains two strips of the spectrum, except Epistles that were antilegomena among the Syrians; namely the Second plate No. 2, which contains three. Most of the plates are on a scalethree Epistle of Peter, the Second and Third Epistles of John, and the Epistle of times that of Angstrdm’s map, and in definition are more than equal to any Jude, in the version usually printed with our Peshitto New Testaments. It map yet published, at least to wave-length 5325. The 1474 line is widely is well known that the printed copies of these Epistles in that version all double, as also are b rest upon one manuscriptonly, in the Bodleian Library at Oxford, England, 3 and b4, while E may be recognized as double by the from which they were first published by Edward Pococke (Leyden, Elzevirs) expert.
    [Show full text]
  • Lexical Scope and Function Closures +Closures.Rkt
    CS 251 SpringFall 2019 2020 Principles of of Programming Programming Languages Languages λ Ben Wood Lexical Scope and Function Closures +closures.rkt https://cs.wellesley.edu/~cs251/s20/ Lexical Scope and Function Closures 1 Topics • Lexical vs dynamic scope • Closures implement lexical scope. • Design considerations: why lexical scope? • Relevant design dimensions Lexical Scope and Function Closures 2 A question of scope (warmup) (define x 1) (define f (lambda (y) (+ x y))) (define z (let ([x 2] [y 3]) (f (+ x y)))) What is the argument value passed to this function application? Lexical Scope and Function Closures 3 A question of scope (define x 1) (define f (lambda (y) (+ x y))) (define z (let ([x 2] [y 3]) (f (+ x y)))) What is the value of x when this function body is evaluated for this function application? Lexical Scope and Function Closures 4 A question of free variables A variable, x, is free in an expression, e, if x is referenced in e outside the scope of any binding of x within e. x is a free variable (define x 1) of the lambda expression. (define f (lambda (y) (+ x y))) (define z (let ([x 2] [y 3]) (f (+ x y)))) To what bindings do free variables of a function refer when the function is applied? Lexical Scope and Function Closures 5 Answer 1: lexical (static) scope A variable, x, is free in an expression, e, if x is referenced in e outside the scope of any binding of x within e. x is a free variable (define x 1) of the lambda expression.
    [Show full text]
  • Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique
    Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique Submitted to Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, INDIA For the award of Doctor of Philosophy by Mr. Sangramsing Nathsing Kayte Supervised by: Dr. Bharti W. Gawali Professor Department of Computer Science and Information Technology DR. BABASAHEB AMBEDKAR MARATHWADA UNIVERSITY, AURANGABAD November 2018 Abstract A speech synthesis system is a computer-based system that should be able to read any text aloud with a particular language or multiple languages. This is also called as Text-to- Speech synthesis or in short TTS. Communication plays an important role in everyones life. Usually communication refers to speaking or writing or sending a message to another person. Speech is one of the most important ways for human communication. There have been a great number of efforts to incorporate speech for communication between humans and computers. Demand for technologies in speech processing, such as speech recogni- tion, dialogue processing, natural language processing, and speech synthesis are increas- ing. These technologies are useful for human-to-human like spoken language translation systems and human-to-machine communication like control for handicapped persons etc., TTS is one of the key technologies in speech processing. TTS system converts ordinary orthographic text into acoustic signal, which is indis- tinguishable from human speech. For developing a natural human machine interface, the TTS system can be used as a way to communicate back, like a human voice by a computer. The TTS can be a voice for those people who cannot speak. People wishing to learn a new language can use the TTS system to learn the pronunciation.
    [Show full text]
  • Wormed Voice Workshop Presentation
    Wormed Voice Workshop Presentation micro_research December 27, 2017 1 some worm poetry and songs: The WORM was for a long time desirous to speake, but the rule and order of the Court enjoyned him silence, but now strutting and swelling, and impatient, of further delay, he broke out thus... [Michael Maier] He worshipped the worm and prayed to the wormy grave. Serpent Lucifer, how do you do? Of your worms and your snakes I'd be one or two; For in this dear planet of wool and of leather `Tis pleasant to need neither shirt, sleeve, nor shoe, 2 And have arm, leg, and belly together. Then aches your head, or are you lazy? Sing, `Round your neck your belly wrap, Tail-a-top, and make your cap Any bee and daisy. Two pigs' feet, two mens' feet, and two of a hen; Devil-winged; dragon-bellied; grave- jawed, because grass Is a beard that's soon shaved, and grows seldom again worm writing the the the the,eeeronencoug,en sthistit, d.).dupi w m,tinsprsool itav f bometaisp- pav wheaigelic..)a?? orerdi mise we ich'roo bish ftroo htothuloul mespowouklain- duteavshi wn,jis, sownol hof." m,tisorora angsthyedust,es, fofald,junss ownoug brad,)fr m fr,aA?a????ck;A?stelav aly, al is.'rady'lfrdil owoncorara wns t.) sh'r, oof ofr,a? ar,a???????a? fu mo towess,eethen hrtolly-l,."tigolav ict,a???!ol, w..'m,elyelil,tstreamas..n gotaillas.tansstheatsea f mb ispot inici t.) owar.**1 wnshigigholoothtith orsir.tsotic.'m, sotamimoledug imootrdeavet..t,) sh s,tranciror."wn sieee h asinied.tiear wspilotor,) bla av.nicord,ier.dy'et.*tite m.)..*d, hrouceto hie, ig il m, bsomoug,.t.'l,t, olitel bs,.nt,.dotr tat,)aa? htotitedont,j alesil, starar,ja taie ass.nishiceroouldseal fotitoonckysil, m oitispl o anteeeaicowousomirot.
    [Show full text]
  • Central Islip Union Free School District and Central Islip Teachers Association (2005)
    NYS PERB Contract Collection – Metadata Header This contract is provided by the Martin P. Catherwood Library, ILR School, Cornell University. The information provided is for noncommercial educational use only. Some variations from the original paper document may have occurred during the digitization process, and some appendices or tables may be absent. Subsequent changes, revisions, and corrections may apply to this document. For more information about the PERB Contract Collection, see http://digitalcommons.ilr.cornell.edu/perbcontracts/ Or contact us: Catherwood Library, Ives Hall, Cornell University, Ithaca, NY 14853 607-254-5370 [email protected] Contract Database Metadata Elements Title: Central Islip Union Free School District and Central Islip Teachers Association (2005) Employer Name: Central Islip Union Free School District Union: Central Islip Teachers Association Effective Date: 07/01/05 Expiration Date: 06/30/15 PERB ID Number: 4733 Unit Size: Number of Pages: 142 For additional research information and assistance, please visit the Research page of the Catherwood website - http://www.ilr.cornell.edu/library/research/ For additional information on the ILR School - http://www.ilr.cornell.edu/ CENTRAL,ISLIP ···TEACHERS·Xs§'6e!ATION TABLE OF CONTENTS ARTICLE Preamble 1 I Recognition 2 II Negotiation Procedure '" . 4 III Teacher Association & Board Rights 6 IV Grievance Procedure '" '" 10 V Teacher-Administration Liaison 15 VI Salaries , 16 VII Teaching Hours and Teaching Load................... ...18 VIII Class Size , 22 IX Specialists
    [Show full text]
  • A Python Implementation for Racket
    PyonR: A Python Implementation for Racket Pedro Alexandre Henriques Palma Ramos Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering Supervisor: António Paulo Teles de Menezes Correia Leitão Examination Committee Chairperson: Prof. Dr. José Manuel da Costa Alves Marques Supervisor: Prof. Dr. António Paulo Teles de Menezes Correia Leitão Member of the Committee: Prof. Dr. João Coelho Garcia October 2014 ii Agradecimentos Agradec¸o... Em primeiro lugar ao Prof. Antonio´ Leitao,˜ por me ter dado a oportunidade de participar no projecto Rosetta com esta tese de mestrado, por todos os sabios´ conselhos e pelos momentos de discussao˜ e elucidac¸ao˜ que se proporcionaram ao longo deste trabalho. Aos meus pais excepcionais e a` minha mana preferida, por me terem aturado e suportado ao longo destes quase 23 anos e sobretudo pelo incondicional apoio durante estes 5 anos de formac¸ao˜ superior. Ao pessoal do Grupo de Arquitectura e Computac¸ao˜ (Hugo Correia, Sara Proenc¸a, Francisco Freire, Pedro Alfaiate, Bruno Ferreira, Guilherme Ferreira, Inesˆ Caetano e Carmo Cardoso), por todas as sug- estoes˜ e pelo inestimavel´ feedback em artigos e apresentac¸oes.˜ Aos amigos em Tomar (Rodrigo Carrao,˜ Hugo Matos, Andre´ Marques e Rui Santos) e em Lisboa (Diogo da Silva, Nuno Silva, Pedro Engana, Kaguedes, Clara Paiva e Odemira), por terem estado pre- sentes, duma forma ou doutra, nos essenciais momentos de lazer. A` Fundac¸ao˜ para a Cienciaˆ e Tecnologia (FCT) e ao INESC-ID pelo financiamento e acolhimento atraves´ da atribuic¸ao˜ de uma bolsa de investigac¸ao˜ no ambitoˆ dos contratos Pest-OE/EEI/LA0021/2013 e PTDC/ATP-AQI/5224/2012.
    [Show full text]