Alice Coucke Head of Machine Learning Research

ENS Corporate meeting — 11/19/2020

Machine learning applied to speech recognition

1 Outline:

1. Recent advances in applied ML

2. From physics to machine learning

3. Working at Sonos Recent advances in applied Machine Learning

3 Reinforcement learning Learning goal-oriented behavior within simulated environments

Go Starcraft II Dota 2 AlphaGo (Deepmind, 2016) AlphaStar (Deepmind) OpenAI Five (OpenAI)

Play-driven learning for robots Sim-to-real dexterity learning (Google Brain) Project BLUE (UC Berkeley) Machine Learning for Life Sciences Deep learning applied to biology and medicine

Protein folding & structure Eye disease diagnosis Drug discovery prediction (NHS, UCL, Deepmind) (Recursion, MIT, Harvard) AlphaFold (Deepmind)

Reconstruct speech from neural activity Limb control restoration (UCSF) (Batelle, Ohio State Univ) Computer vision High-level understanding of digital images or videos

GANs for artiﬁcial video dubbing (Synthesia)

Visual Question Answering on everyday images (Facebook Research, Georgia Tech)

Future state prediction in dynamic scenes (Wayve, Univ of Cambridge) From physics to machine learning and back A surge of interest from the physics community

NeurIPS: workshop on « machine learning and the physical sciences » Speech and language Understand and analyze human speech

{ intent: FindWeather, entities: { datetime: 11/19/2020, location: Paris } }

Speech transcription Spoken language Text & speech generation Human Parity (Microsoft) understanding GPT-2 (Open AI) (Super)GLUE benchmarks (Google, Bert (Google) Facebook, IBM, Stanford …) XLNet (CMU) …

� �

Neural machine translation Voice activity detection Speaker identiﬁcation Sentiment analysis Detect emotions in text and Unsupervised MT (Facebook) Detect speech from audio Recognize unique speakers speech Fairness, ethics, and explainability We, as scientists, have a say in the future of AI From physics to machine learning

10 My background From physics to machine learning

2012: M2 ICFP Theoretical physics

2013-2016: PhD in statistical physics @LPTENS

Feb 2017: senior ML scientist @ Snips

2019: head of ML research @ Snips

Today: head of ML research @ Sonos A few takeaways (please go ask other people too)

• PhD? Postdoc?

• Working at a startup company

• Physicists and machine learning

Physicists @ Snips/Sonos

Raﬀaele Tavarone Francesco Caltagirone Alaa Saade Stéphane d’Ascoli Me :) ML Director ML Director Sr ML Scientist ML research intern Acoustic modeling Spoken Language Now: DeepMind Now: PhD ENS & FAIR Understanding Sonos Voice Experience

13 Sonos Home sound system

Sonos Sonos Voice Experience ✦ Smart speakers in multi room ✦ Ex Snips, joined Sonos 2019 ✦ Hardware & software (~400 people) ✦ 50 employees: ML, Linguistics, SW engineering ✦ 1.5k employees in 6 countries (US, France, ✦ Audio signal proc, speech, NLP, embedded Netherlands, UK, Australia, China) systems

Brand new tech blog:

https://tech-blog.sonos.com/ Machine Learning at Sonos Voice Experience Spoken Language understanding for voice assistants

Automatic Speech Recognition Engine Language modeling

Natural Intent: t ɜ r n ɑ n ð ə Acoustic Language Turn on the Language SwitchLightOn l a ɪ t s ɪ n ð ə model model lights in the Understanding ˈl ɪ v ɪ ŋ r u m living room Engine Slots: room: living room

Wake Word ACR ICF Dialogue Custom Wake Automatic Command Intent Classiﬁcation Multiturn Dialogue Word Recognition and Filling and Response Audio Frontend Action Code Our approach Our own voice in a vast ecosystem

Privacy by design Resource constrained ML: small data & hardware

✓ A new popular trend in the ML community (low resource ML, transfer learning, miniaturization, etc) ✓ Numerous conferences and workshops on the topic ✓ Towards a safer, greener and more private conversational AI Research activity Publishing research in industry

Snips voice platform: Efficient keyword spotting using Federated learning for keyword Spoken language understanding on an embedded spoken language dilated convolutions and gating spotting the edge understanding system for private- by-design voice interfaces arXiv arXiv arXiv arXiv

ICML 2018 ICASSP 2019 ICASSP 2019 NeurIPS 2019 Workshop PiMLAI Main track Main track Workshop EMC2

Conditionned text generation with Predicting detection filters for small A study on more realistic room Small footprint Text-Independent transfer for closed-domain dialogue footprint open-vocabulary keyword simulation for far-field keyword Speaker Verification for Embedded systems spotting spotting Systems arXiv arXiv arXiv arXiv

SLSP 2020 Interspeech 2020 APSIPA 2020 ArXiv preprint Main track Main track Main track 2020 Thank you for your attention!

Questions?