Turn text into lifelike speech using deep learning
Severin Gassauer-Fleissner, Solutions Architect, Global Financial Services, Amazon Web Services
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What to expect from this session
• The AWS ML stack
• Amazon Polly basics
• Fun with SSML
• Customized Brand Voice
Amazon Polly

The AWS ML stack
AI SERVICES
• Vision: Amazon Rekognition
• Speech: Amazon Polly, Amazon Transcribe (+Medical)
• Text: Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract
• Search: Amazon Kendra
• Chatbots: Amazon Lex
• Personalization: Amazon Personalize
• Forecasting: Amazon Forecast
• Fraud: Amazon Fraud Detector
• Development: Amazon CodeGuru
• Contact centers: Contact Lens for Amazon Connect
ML SERVICES
• Amazon SageMaker: SageMaker Studio IDE, Ground Truth, Augmented AI, ML Marketplace, Neo, built-in algorithms, notebooks, Experiments, model training & tuning, Debugger, Autopilot, model hosting, Model Monitor
ML FRAMEWORKS & INFRASTRUCTURE
• Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia (Inf1 instance), FPGA

What is Amazon Polly?
• Multiple voices
• Adaptability
• API driven
• Re-usable
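Because Polly is API driven, a single request is enough to turn text into audio. The following is a minimal sketch rather than a reference implementation: it assumes the boto3 SDK is installed and that AWS credentials and a region are configured. The helper names `build_synthesis_request` and `synthesize_to_file` are hypothetical; `synthesize_speech` and its `Text`, `VoiceId`, and `OutputFormat` parameters are the SDK's own.

```python
from typing import Any, Dict


def build_synthesis_request(text: str, voice_id: str = "Joanna",
                            output_format: str = "mp3") -> Dict[str, Any]:
    """Build the keyword arguments for Polly's SynthesizeSpeech operation."""
    if not text:
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna") -> None:
    """Call Amazon Polly and write the returned audio stream to a file.

    Assumes boto3, AWS credentials, and a default region are available.
    """
    import boto3  # imported lazily so the request builder works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_synthesis_request(text, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


# Building the request needs no network access or credentials.
request = build_synthesis_request("Hello from Amazon Polly!")
print(request)
```

Calling `synthesize_to_file("Hello from Amazon Polly!", "hello.mp3")` would then produce an MP3 file, provided the account is allowed to call Polly.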
Amazon Polly – Language Portfolio
APAC: Arabic, Australian English, Indian English, Japanese, Hindi, Korean, Mandarin
EMEA: Danish, Dutch, British English, British English NTTS, French, German, Icelandic, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Welsh, Welsh English
Americas: Brazilian Portuguese, Brazilian Portuguese NTTS, Canadian French, English (US), English (US) NTTS with Newscaster & Conversational Styles, Spanish (Mexican), Spanish (US), Spanish (US) NTTS

Some voices in action
English (British) (Brian)
Chinese, Mandarin (Zhiyu)
German (Hans)
Indian English (Aditi)

Standard vs Neural TTS
Text: Sentence to synthesize.
Phonetic transcription: ˈsɛntəns tə ˈsɪnθəˌsaɪz.
Concatenative TTS: ˈsɛnt sɛntəns tə ˈsɪnθ əˌsaɪz.
Neural TTS
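In the API, the choice between the standard and the neural voice is made per request. The sketch below assumes boto3's `synthesize_speech` and its `Engine` parameter with the values `standard` and `neural`. The helper names are hypothetical, and the chosen voice must support the neural engine in the region being called.

```python
from typing import Any, Dict, List

# The two engines compared on this slide.
ENGINES = ("standard", "neural")


def build_engine_request(text: str, engine: str,
                         voice_id: str = "Joanna") -> Dict[str, Any]:
    """Build SynthesizeSpeech arguments for one engine."""
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {ENGINES}")
    return {"Engine": engine, "Text": text,
            "VoiceId": voice_id, "OutputFormat": "mp3"}


def comparison_requests(text: str, voice_id: str = "Joanna") -> List[Dict[str, Any]]:
    """Return one request per engine so the same sentence can be compared."""
    return [build_engine_request(text, engine, voice_id) for engine in ENGINES]


def synthesize(request: Dict[str, Any]) -> bytes:
    """Send one request to Polly (assumes boto3 and AWS credentials)."""
    import boto3  # lazy import: only needed when actually calling the service

    response = boto3.client("polly").synthesize_speech(**request)
    return response["AudioStream"].read()


requests_to_send = comparison_requests("Sentence to synthesize.")
for req in requests_to_send:
    print(req["Engine"], "->", req["VoiceId"])
```

Passing each request to `synthesize` and saving the bytes gives two audio files for a side-by-side listening test.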

How does Neural Text To Speech work?
Phonemes → CONTEXT GENERATION (Prosody models) → Mel Spectrograms & Features → SPEECH PRODUCTION (Neural Vocoder) → SPEECH
CONTEXT GENERATION (Prosody models)
• Focused on intonation
• Speaker style specific
SPEECH PRODUCTION (Neural Vocoder)
• Focused on Hi-Fi speech signal
• Independent of speaker or style

Neural and concatenative comparison
Kimberly (US) voice
“Duck is the common name for numerous species in the waterfowl family Anatidae which also includes swans and geese. Ducks are divided among several subfamilies in the family Anatidae.” …
https://en.wikipedia.org/wiki/Duck
Justin (US) child voice
“And now, sir,” continued the doctor, “since I now know there's such a fellow in my district, you may count I'll have an eye upon you day and night. I'm not a doctor only; I'm a magistrate” …
Treasure Island by Robert Louis Stevenson

Styles with NTTS - Newscaster
US English Joanna voice
“Breaking news! The AWS summit isn't cancelled after all, and it's taking place virtually instead”
US English Matthew voice
“After all, learning to ride a bike as an adult is no harder than learning as a kid, as long as you take the same step-by-step approach to the process, and push grown-up fear and nerves out of the way.”

Amazon Polly features: SSML
Speech Synthesis Markup Language (SSML) is a W3C recommendation: an XML-based markup language for speech synthesis applications.
• All SSML text must start with an opening <speak> tag and end with a closing </speak> tag
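As an illustration of SSML in practice, the sketch below builds a small document and the matching request. It assumes Polly's `<speak>` root element, the `<break>` tag, and the `<amazon:domain name="news">` tag that selects the Newscaster style on supported neural voices. The helper functions are hypothetical conveniences, not part of any SDK.

```python
from typing import Any, Dict
from xml.sax.saxutils import escape


def text(value: str) -> str:
    """Escape characters such as & and < so they are safe inside SSML."""
    return escape(value)


def pause(milliseconds: int) -> str:
    """Insert a silence of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def newscaster(inner: str) -> str:
    """Wrap SSML in the Newscaster speaking style (neural voices only)."""
    return f'<amazon:domain name="news">{inner}</amazon:domain>'


def speak(*parts: str) -> str:
    """Wrap the parts in the mandatory <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


def build_ssml_request(ssml: str, voice_id: str = "Joanna") -> Dict[str, Any]:
    """Build SynthesizeSpeech arguments; TextType tells Polly the input is SSML."""
    return {"Engine": "neural", "VoiceId": voice_id, "TextType": "ssml",
            "Text": ssml, "OutputFormat": "mp3"}


ssml = speak(newscaster(
    text("Breaking news! The AWS summit isn't cancelled after all,")
    + pause(300)
    + text(" and it's taking place virtually instead.")
))
request = build_ssml_request(ssml)
print(ssml)
```

The resulting dictionary can be passed to boto3 as `boto3.client("polly").synthesize_speech(**request)`, assuming credentials are configured.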
...
{"time":2614,"type":"viseme","value":"@"}
{"time":2658,"type":"word","start":66,"end":72,"value":"forest"}
{"time":2658,"type":"viseme","value":"f"}
{"time":2794,"type":"viseme","value":"O"}
{"time":2861,"type":"viseme","value":"r"}
{"time":2973,"type":"viseme","value":"i"}
{"time":3022,"type":"viseme","value":"s"}
{"time":3184,"type":"viseme","value":"t"}
{"time":3285,"type":"viseme","value":"sil"}
{"time":3672,"type":"sentence","start":74,"end":203,"value":"As she heard branches cracking, Mary started to whisper:
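The output above is a stream of speech marks, one JSON object per line. Polly returns this stream when `synthesize_speech` is called with `OutputFormat="json"` and a `SpeechMarkTypes` list such as `["sentence", "word", "viseme"]`. The sketch below parses such a stream offline, using the complete entries shown above as sample data; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

# Sample data: the complete speech mark lines shown above.
SAMPLE_MARKS = "\n".join([
    '{"time":2614,"type":"viseme","value":"@"}',
    '{"time":2658,"type":"word","start":66,"end":72,"value":"forest"}',
    '{"time":2658,"type":"viseme","value":"f"}',
    '{"time":2794,"type":"viseme","value":"O"}',
    '{"time":2861,"type":"viseme","value":"r"}',
    '{"time":2973,"type":"viseme","value":"i"}',
    '{"time":3022,"type":"viseme","value":"s"}',
    '{"time":3184,"type":"viseme","value":"t"}',
    '{"time":3285,"type":"viseme","value":"sil"}',
])


def parse_speech_marks(stream_text: str) -> List[Dict[str, Any]]:
    """Parse a line-delimited JSON speech mark stream into dictionaries."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


def viseme_durations(marks: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Pair each viseme with the milliseconds until the next viseme starts."""
    visemes = [mark for mark in marks if mark["type"] == "viseme"]
    return [(current["value"], following["time"] - current["time"])
            for current, following in zip(visemes, visemes[1:])]


marks = parse_speech_marks(SAMPLE_MARKS)
words = [mark["value"] for mark in marks if mark["type"] == "word"]
durations = viseme_durations(marks)
print(words)
print(durations)
```

Timings like these are what a lip-sync animation or a word-highlighting reader would consume: the `time` field is the offset in milliseconds from the start of the audio.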
All of the features that are available for Amazon Polly’s NTTS voices are available for Brand Voices as well, including lexicons, speech marks, and SSML tags.
• Exclusive use for the customer
• Work with the Amazon TTS organization that built the voices of Alexa and Samuel L. Jackson

Customer Engagement During Voice Development
1. Customer defines the persona requirements
2. Amazon Polly team sources voice actors that match the requirements
3. Customer establishes contract with their preferred voice actor
4. Amazon Polly team engages the voice actor in recording sessions, builds and gives the customer access to their brand voice
5. Customer tests, accepts, and launches the Brand Voice in their use case

Sample brand voice in action
Build a unique Brand Voice with Amazon Polly - AWS ML Blog

Some Amazon Polly Customers

Customer Use Cases
• Voiced news articles
• Home monitoring alerts
• Voiced training
• Podcasts
• Telephony / IVR
• Language learning
• Voiced reminders
• Standardized testing
• Radio announcer
• Navigation
• Video creation
• Translation

Getting started and further information
• Getting started with Amazon Polly
• Build your own real-time translator
• Build a unique Brand Voice with Amazon Polly
• Creating Next-gen Speech-Enabled Applications

Thank you!
Severin Gassauer-Fleissner [email protected]
AWS Training and Certification
• Explore tailored learning paths for customers and partners
• Build cloud skills with 550+ free digital training courses, or dive deep with classroom training
• Demonstrate expertise with an industry-recognized credential
• Find entry-level cloud talent with AWS Academy and AWS re/Start
aws.amazon.com/training

Thank you for attending AWS Summit Online | ASEAN