Speech Synthesis Today – How Did It Get to Be This Advanced?
Total Page:16
File Type:pdf, Size:1020Kb
9/26/14 # Why do I care?$ Speech Syntesis • # BRIDGE BETWEEN SPEECH SCIENCE AND $$ $$ $$ $$ $$ $$ $$ $$ $$ $Heater Campbel ASSITIVE TECHNOLOGY TO HELP THOSE WITH # # MOTOR SPEECH CHALLENGES $ • BETTER UNDERSTAND TECHNOLOGY USED TODAY – SOME OF THE SAME TECHNOLOGY IN MY LAB IS USED IN SPEECH SYNTHESIS TODAY – HOW DID IT GET TO BE THIS ADVANCED? $ • YET FIRST SPEECH GENERATING DEVICES (SGDs) Septmber 24, 2014$ NOT USED AS ASSISTIVE TECHNOLOGY – AUTOMATA AS ODDITY – COOL MACHINE Earliest Speech Syntesis )18-19t Centuries)$ Early Speech Syntesis )20t Century) Anecdotes from 13th Century of attempts to build machines that imitate human speech, deemed heretical by the church, destroyed. 1857, M. Faber - "Euphonia“ – invenon in circus 1779, Christian Kratzenstein, Russian Academy of Sciences, machine modeling the human vocal tract that could produce the five long vowel sounds – amazed people and also scared them – voice described as ghostly and monotonous 1791, Wolfgang von Kempelen, bellows-operated "acoustic-mechanical speech machine", machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. 1837, Charles Wheatstone - "speaking machine" based on von Kempelen's design, 1922 J. Q. Stewart’s electrical analog of the vocal organs (Flanagan, 1972, as cited by Haskins, 2008) (as cited by Haskins, 2008) Early Speech Syntesis )20t Century) Early Speech Syntesis )20t Century) 1937, R. Riesz – modeled human vocal tract with valves and chambers 1930s, Bell Labs, Vocoder, ”Voice Decoder” 1939- Homer Dudley - The Voder (Voice Operating Demonstrator) -automatically analyzed speech into its fundamental tone and resonances. - keyboard-operated voice synthesizer (as cited by Haskins, 2008) (Klatt, 1987, as cited by Haskins, 2008) 1 9/26/14 t Early Speech Syntesis )20 Century) Since te 1970s$ Rapid development associated with computer technological advancements 1960- Reg Maling- POSSUM- 1980s/1990s -MITalk system, Dennis Klatt ”Klattalk” MIT, Bell Labs system- used 1940s-1950 – Franklin Cooper - Haskins patient-operated selector sophisticated voicing source - formed basis for many systems used today - first Laboratories - Pattern playback -- mechanism- scanned through multilingual language-independent systems, making extensive use of natural converts pictures of the acoustic patterns images on illuminated display language processing methods –- included 6000 words that do not follow letter-sound of speech in the form of a spectrogram conversions (Koul, 2011, p. 267) back into sound. (as cited by Haskins, 2008) (Wikipedia, 2014) (Lemmetty, 1999) Great audio examples of speech synthesis history 1939-1985: http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html Wit Computrs comes # Syntesized Speech$ Digitzed Speech$ • create novel words/messages - messages created from Use prerecorded speech, limited to only a few phrases, best in institutional settings to communicate basic needs - switch operated (Koul, 2011) lebers/words, phrases, sentences, pictures, symbols $ Sentient Systems 1983 - “EyeTyper” Concatnatve Syntesis Company later became DynaVox – first SGD with touch-screen technology • Unit selecPon synthesis – combines small parts of prerecorded speech Later digital speech generators: (sounds/syllables) • requires large database • relavely natural sounding, though SpringBoard Lite not very smooth ChatBox ActionVoice Statstcal Parametic Syntesis BigMack • “hidden Markov models (HMMs) and “deep neural Cheap Talk network” (DNN) models, formant synthesis Speak Easy • Machine learning technique – speech waveform Digivox/ Dynamo (by DynaVox) constructed from generated parameters • smooth at boundaries, though not very natural sounding (Lemmetty, 1999) Concatnatve Syntesis$ Input for SGDs$ Text-to-Speech (TTS) uses keyboard, switch, or touch screen, enter words and device uses algorithm to generate synthec speech • Usually concatenave synthesis • Can choose your own vocie Examples • CereProc • Proloquo2Go • ModelTalker (Patel, 2014) 2 9/26/14 0owered by CereProc- te Bush-o-matc$ CereProc$ http://www.idyacy.com/cgi-bin/bushomatic.cgi# https://www.cereproc.com/en/support/live_demo 0roloquo2Go$ http://www.assistiveware.com/product/proloquo2go/voices http://www.assistiveware.com/product/proloquo2go 0ersonalized Syntetc Voices$ 1EDWomen Clip 2 Rupal Patl$ Rupal Patel – combines voice (source) from target talker and speech from speech donor (filter) – to produce customized synthePc voice as part of VocaliD project (Patel, 2013) -added to TTS ModelTalker version 1 device Research showing high intelligibility and recogniPon of voices (Mills, Bunnell, & Patel, 2014). (Patel, 2013) https://www.youtube.com/watch?v=d38LKbYfWrs 3 9/26/14 SOURCES$ Haskins Laboratories. (1996-2008). Talking Heads: Simulacra- The Early History of Talking Machines. From http://www.haskins.yale.edu/featured/heads/simulacra.html. Lemmetty, S. (1999). Review of Speech Synthesis Technology. (Masters), Helsinki University of Technology, Helsinki, Finland. Mills, T., Bunnell, H. T., & Patel, R. (2014). Towards Personalized Speech Synthesis for Augmentative and Alternative Communication. Augmentative and Alternative Communication, 30(3), 226-236. Patel, R. (2013). Synthetic voices, as unique as fingerprints. From https://www.ted.com/talks/rupal_patel_synthetic_voices_as_unique_as_fingerprints#. Koul, R. (2011). Synthetic Speech. In O. Wendt (Ed.), Assistive technology principles and applications for communication disorders and special education (pp. 530). Bingley, U.K.: Emerald. Wikipedia. (2014). Speech-generating device. From http://en.wikipedia.org/wiki/Speech-generating_device 4 .