Turn text into lifelike speech using deep learning

Severin Gassauer-Fleissner
Solutions Architect, Global Financial Services
Amazon Web Services

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What to expect from this session

• The AWS ML stack

• Amazon Polly basics

• Fun with SSML

• Customized Brand Voice

The AWS ML stack

AI SERVICES

• Vision: Amazon Rekognition
• Speech: Amazon Polly, Amazon Transcribe (+Medical)
• Text: Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract
• Search: Amazon Kendra
• Chatbots: Amazon Lex
• Personalization: Amazon Personalize
• Forecasting: Amazon Forecast
• Fraud: Amazon Fraud Detector
• Development: Amazon CodeGuru
• Contact centers: Contact Lens for Amazon Connect

ML SERVICES

Amazon SageMaker

• SageMaker Studio IDE
• Ground Truth
• Augmented AI
• ML Marketplace
• Neo
• Built-in algorithms
• Notebooks
• Experiments
• Model training & tuning
• Debugger
• Autopilot
• Model hosting
• Model Monitor

ML FRAMEWORKS & INFRASTRUCTURE

• Deep Learning AMIs & Containers
• GPUs & CPUs
• Elastic Inference
• Inferentia (Inf1 instance)
• FPGA

What is Amazon Polly?

• Multiple voices
• Adaptability
• API driven
• Re-usable
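To make "API driven" concrete, here is a minimal Python sketch of a Polly request. It assumes the boto3 SDK and configured AWS credentials for the actual call. The helper names `build_polly_request` and `synthesize_to_file` are illustrative and not part of the talk. Only the request builder runs without an AWS account.

```python
from typing import Any, Dict


def build_polly_request(text: str, voice_id: str = "Joanna",
                        engine: str = "neural",
                        output_format: str = "mp3") -> Dict[str, Any]:
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    if engine not in ("standard", "neural"):
        raise ValueError(f"unsupported engine: {engine!r}")
    # Input already wrapped in <speak> is sent as SSML, anything else as plain text.
    text_type = "ssml" if text.lstrip().startswith("<speak") else "text"
    return {
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize_to_file(text: str, path: str, **options: Any) -> None:
    """Call Amazon Polly and write the returned audio stream to `path`.

    Needs boto3, AWS credentials and a region, so it is not exercised here.
    """
    import boto3  # imported lazily so the request builder works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    print(build_polly_request("Hello from Amazon Polly.",
                              voice_id="Brian", engine="standard"))
```

The same request works for any voice by changing `voice_id`. Not every voice is available with the neural engine, so the voice and engine combination is something to check against the Polly voice list.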

Amazon Polly – Language Portfolio

APAC

• Arabic
• Australian English
• Indian English
• Japanese
• Hindi
• Korean
• Mandarin

EMEA

• Danish
• Dutch
• British English
• British English NTTS
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English

Americas

• Brazilian Portuguese
• Brazilian Portuguese NTTS
• Canadian French
• English (US)
• English (US) NTTS, Newscaster & Conversational Styles
• Spanish (Mexican)
• Spanish (US)
• Spanish (US) NTTS

Some voices in action

English (British) (Brian)

Chinese, Mandarin (Zhiyu)

German (Hans)

Indian English (Aditi)

Standard vs Neural TTS

• Text: Sentence to synthesize.
• Phonetic transcription: ˈsɛntəns tə ˈsɪnθəˌsaɪz.
• Concatenative TTS: ˈsɛnt sɛntəns tə ˈsɪnθ əˌsaɪz.
• Neural TTS

How does Neural Text To Speech work?

Phonemes & Features → CONTEXT GENERATION (Prosody models) → Mel Spectrograms → SPEECH PRODUCTION (Neural Vocoder) → SPEECH

CONTEXT GENERATION (Prosody models)

• Focused on intonation
• Speaker style specific

SPEECH PRODUCTION (Neural Vocoder)

• Focused on Hi-Fi speech signal
• Independent of speaker or style

Neural and concatenative comparison

Kimberly (US) voice

“Duck is the common name for numerous species in the waterfowl family Anatidae which also includes swans and geese. Ducks are divided among several subfamilies in the family Anatidae.” …
https://en.wikipedia.org/wiki/Duck

Justin (US) child voice

“And now, sir,” continued the doctor, “since I now know there's such a fellow in my district, you may count I'll have an eye upon you day and night. I'm not a doctor only; I'm a magistrate” …
Treasure Island by Robert Louis Stevenson

Styles with NTTS - Newscaster

US English Joanna voice

“Breaking news! The AWS summit isn't cancelled after all, and it's taking place virtually instead”

US English Matthew voice

“After all, learning to ride a bike as an adult is no harder than learning as a kid, as long as you take the same step-by-step approach to the process, and push grown-up fear and nerves out of the way.”

Amazon Polly features: SSML

Speech Synthesis Markup Language is a W3C recommendation, an XML-based markup language for speech synthesis applications
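As a rough illustration of what such markup looks like, the Python sketch below builds a small SSML document as a string. The helper names are hypothetical. The `<speak>` root and `<break>` elements come from the SSML standard, while `<amazon:domain name="news">` is Polly's own extension for the Newscaster style and is assumed here to be used with a neural voice such as Joanna or Matthew.

```python
from xml.sax.saxutils import escape


def speak(body: str) -> str:
    """Wrap already-marked-up SSML content in the required <speak> root element."""
    return f"<speak>{body}</speak>"


def pause(milliseconds: int) -> str:
    """Insert a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def newscaster(text: str) -> str:
    """Mark plain text for Polly's Newscaster speaking style."""
    # escape() protects characters such as & and < that would break the XML.
    return f'<amazon:domain name="news">{escape(text)}</amazon:domain>'


if __name__ == "__main__":
    document = speak(
        newscaster("Breaking news!") + pause(300)
        + newscaster("The summit is taking place virtually.")
    )
    print(document)
```

The resulting string is what would be sent to Polly with the text type set to SSML instead of plain text.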

My name is Sev. It’s not really, but I say it is because Severin Gassauer-Fleissner is no fun to spell out when booking a table at a restaurant.

SSML Directives

• All SSML text must start with an opening <speak> tag and end with a closing </speak> tag. All other tags are inserted between them.

Polly-Supported SSML Tags

• Volume
• Speech rate
• Pitch
• Dynamic Range Compression
• Phonation
• Vocal Tract Length
• Max-Duration
• Whispered

Fast paced atmosphere
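A fast-paced delivery like the auctioneer passage on this slide is mostly a matter of raising the speech rate. The sketch below shows one plausible way to mark that up in Python; the exact markup used in the talk is not preserved. The helper names are illustrative. `<prosody>` with `rate`, `volume` and `pitch` is standard SSML, and `<amazon:effect name="whispered">` is a Polly extension. Which tags each engine supports should be checked in the Polly documentation.

```python
from typing import Iterable
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in <prosody>, e.g. prosody("hi", rate="fast", volume="loud")."""
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in sorted(attrs.items())
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


def whispered(text: str) -> str:
    """Wrap text in Polly's whispered effect."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def auction_ssml(lines: Iterable[str], rate: str = "x-fast") -> str:
    """Render each auctioneer line at a raised speech rate inside one document."""
    body = " ".join(prosody(line, rate=rate) for line in lines)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(auction_ssml(["I'm at 500 and I want 550", "Do we get 600?", "Sold"]))
    print(whispered("We need to walk silently to the lake."))
```

Keeping each spoken line in its own `<prosody>` element makes it easy to vary rate or pitch per line when tuning the effect by ear.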

I’m at 500 and I want 550
550 bid on 550
I’m at 500 would you go 550
550 for the gentleman in the corner
A paddling of 2000 ducks is on offer
Do we get 600?
A paddling of 2000 ducks
We got 600 for the whole lot
Sold

Alfred, did you know there is a city called Batman in Turkey?

Synchronize content with speech marks 1

The night was about to fall and we were still lost in the forest. As she heard branches cracking, Mary started to whisper: If you make any noise they will find us. We need to walk silently to the lake.

Synchronize content with speech marks 2

aws polly synthesize-speech --text-type ssml --output-format json --voice-id Joanna --text "$(< lake.input)" --speech-mark-types='["sentence", "word", "viseme"]' lake.json
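The speech-mark file written by this command contains one JSON object per line. The Python sketch below shows how such output could be parsed to drive word highlighting. The function names are illustrative, and the sample lines are copied from the excerpt in this deck rather than read from `lake.json`, so the block runs on its own.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def parse_speech_marks(lines: Iterable[str]) -> List[Dict]:
    """Parse newline-delimited JSON speech marks, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def word_timings(marks: List[Dict]) -> List[Tuple[int, str]]:
    """Return (time in ms, word) pairs, ignoring sentence and viseme marks."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


def word_at(marks: List[Dict], position_ms: int) -> Optional[str]:
    """Return the most recent word started at or before the playback position.

    Assumes the marks are ordered by time, as in the excerpt.
    """
    current = None
    for time_ms, word in word_timings(marks):
        if time_ms > position_ms:
            break
        current = word
    return current


SAMPLE = """\
{"time":2658,"type":"word","start":66,"end":72,"value":"forest"}
{"time":2658,"type":"viseme","value":"f"}
{"time":3678,"type":"word","start":74,"end":76,"value":"As"}
{"time":3775,"type":"word","start":77,"end":80,"value":"she"}
"""

if __name__ == "__main__":
    marks = parse_speech_marks(SAMPLE.splitlines())
    print(word_timings(marks))
    print(word_at(marks, 3700))
```

A player would call `word_at` with the current audio position to decide which word to highlight. The viseme marks could drive lip animation in the same way.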

...
{"time":2614,"type":"viseme","value":"@"}
{"time":2658,"type":"word","start":66,"end":72,"value":"forest"}
{"time":2658,"type":"viseme","value":"f"}
{"time":2794,"type":"viseme","value":"O"}
{"time":2861,"type":"viseme","value":"r"}
{"time":2973,"type":"viseme","value":"i"}
{"time":3022,"type":"viseme","value":"s"}
{"time":3184,"type":"viseme","value":"t"}
{"time":3285,"type":"viseme","value":"sil"}
{"time":3672,"type":"sentence","start":74,"end":203,"value":"As she heard branches cracking, Mary started to whisper: If you make any noise they will find us."}
{"time":3678,"type":"word","start":74,"end":76,"value":"As"}
{"time":3678,"type":"viseme","value":"a"}
{"time":3749,"type":"viseme","value":"s"}
{"time":3775,"type":"word","start":77,"end":80,"value":"she"}
{"time":3775,"type":"viseme","value":"S"}
...

aws polly synthesize-speech --text-type ssml --output-format mp3 --voice-id Joanna --text "$(< lake.input)" lake.mp3

Synchronize content with speech marks 3

A (Brand) new voice!

Voices match the speaking styles, including intonation patterns and tone, of the personas they reflect

All of the features that are available for Amazon Polly’s NTTS voices are available for Brand Voices as well, including lexicons, speech marks, and SSML tags

Exclusive use for customer

Work with the Amazon TTS organization that built the voices of Alexa and Samuel L. Jackson

Customer Engagement During Voice Development

1. Customer defines the persona requirements

2. Amazon Polly team sources voice actors that match the requirements

3. Customer establishes contract with their preferred voice actor

4. Amazon Polly team engages the voice actor in recording sessions, builds and gives the customer access to their brand voice

5. Customer tests, accepts, and launches the Brand Voice in their use case

Sample brand voice in action

Build a unique Brand Voice with Amazon Polly - AWS ML Blog

Some Amazon Polly Customers

Customer Use Cases

• Voiced news articles
• Voiced training
• Telephony / IVR
• Voiced reminders
• Radio announcer
• Video creation
• Home monitoring alerts
• Podcasts
• Language learning
• Standardized testing
• Navigation
• Translation

Getting started and further information

• Getting started with Amazon Polly
• Build your own real-time translator
• Build a unique Brand Voice with Amazon Polly
• Creating Next-gen Speech-Enabled Applications

Thank you!

Severin Gassauer-Fleissner


Thank you for attending AWS Summit Online ASEAN
