Machine Learning for Recognition by Alice Coucke, Head of Research

@alicecoucke alice.coucke@.com

Outline:

1. Recent advances in machine learning

2. From physics to machine learning

3. Working at Snips (now Sonos)

Notes: AI in general; speech & NLP; startup; Snips.

Recent Advances in Applied Machine Learning

Notes: a lot of applications in the real world, quite a difference from theoretical work. One of the few scientific fields where theory and practice are closely intertwined.

Learning goal-oriented behavior within simulated environments

• Go: AlphaGo (DeepMind, 2016)
• StarCraft II: AlphaStar (DeepMind)
• Dota 2: OpenAI Five (OpenAI)

• Play-driven learning for robots
• Sim-to-real dexterity learning ( Brain)
• Project BLUE (UC Berkeley)

Machine Learning for Life Sciences
applied to biology and medicine

• Protein folding & structure prediction: AlphaFold (DeepMind)
• Cardiac arrhythmia prediction from ECGs (Stanford)
• Eye disease diagnosis (NHS, UCL, DeepMind)

• Reconstruct speech from neural activity (UCSF)
• Limb control restoration (Battelle, Ohio State Univ)

Computer vision
High-level understanding of digital images or videos

• GANs for image generation (Heriot Watt Univ, DeepMind)
• « Common sense » understanding of actions in videos (TwentyBN, DeepMind, MIT, IBM…)
• GANs for artificial video dubbing (Synthesia)
• GAN for full body synthesis (DataGrid)

From physics to machine learning and back
A surge of interest from the physics community

NeurIPS 2019: workshop on « machine learning and the physical sciences »

Speech and language
Understand and analyze human speech

{ intent: FindWeather, entities: { datetime: 11/28/2019, location: Paris } }

• Speech transcription: Human Parity (Microsoft)
• Spoken language understanding: (Super)GLUE benchmarks (Google, Facebook, IBM, Stanford …)
• Text & speech generation: GPT-2 (OpenAI), BERT (Google), XLNet (CMU) …

• Neural machine translation: Unsupervised MT (Facebook)
• Detect speech from audio
• Speaker identification: recognize unique speakers
• Detect emotions in text and speech

Fairness, ethics, and explainability
We, as scientists, have a say in the future of AI

From physics to machine learning

My background

Notes: describe my career path; talk about the thesis; talk about the postdoc.

Notes: why physics / research is good training:
- Grasping complex theories and applying them to varied domains
- Programming skills
- Training through research: communicating results, reading the state of the art, writing papers
- Experimental method

2012: M2 ICFP Theoretical physics

2013-2016: PhD in statistical physics @LPTENS

Feb 2017: senior ML scientist @ Snips

2019: director of ML research @ Snips

Today: head of ML research @ Sonos, Inc.

A few takeaways (please go ask other people too)

• PhD? Postdoc?

• Working at a startup company

• Physicists and machine learning

Physicists @ Snips

A. C.; Raffaele Tavarone; Francesco Caltagirone; Alaa Saade; Stéphane d’Ascoli
Sr ML Scientist; Sr ML Scientist; Sr ML Scientist; ML research intern
Acoustics team; Tech Lead; Now: DeepMind; Now: PhD ENS & FAIR; Language team

Machine Learning at Snips (now Sonos)

Notes: ML at Snips, as an example of how one can do ML and […] outside academia.

Notes: speech & language recognition
- Privacy: no cloud, data stays local, which adds technical constraints
- Computational resources
- Little data available for training
- Each block is optimized and adapted

Spoken Language Understanding
From speech to meaning for voice assistants

Automatic Speech Recognition Engine: speech -> Acoustic model -> phones (t ɜ r n ɑ n ð ə l a ɪ t s ɪ n ð ə ˈl ɪ v ɪ ŋ r u m) -> Language model (language modeling) -> text (Turn on the lights in the living room)

Natural Language Understanding Engine: text -> Intent: SwitchLightOn; Slots: room: living room

• Wake Word: Custom Wake Word
• ACR: Automatic Command Recognition
• ICF: Intent Classification and Filling
• Dialogue: Multiturn Dialogue and Response
• Audio Frontend
• Action Code

Acoustic model: a deep neural network outputs a probability distribution over phones (/a/, /b/, /c/, /d/, /e/) at each time step.
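As a rough illustration of the "probability over phones" output of the deep neural network, the sketch below applies a softmax to per-frame scores. The phone set is the placeholder one from the slide, and the scores (`frame_logits`) are invented for the example; this is not the Snips acoustic model.

```python
import math

# Placeholder phone set taken from the slide.
PHONES = ["/a/", "/b/", "/c/", "/d/", "/e/"]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phone_posteriors(frame_logits):
    """One probability distribution over phones per time frame."""
    return [softmax(frame) for frame in frame_logits]


def most_likely_phones(posteriors):
    """Pick the highest-probability phone in each frame."""
    return [PHONES[max(range(len(p)), key=p.__getitem__)] for p in posteriors]


# Hypothetical network outputs for three time frames (assumption).
frame_logits = [
    [2.0, 0.1, 0.0, -1.0, 0.3],
    [0.2, 0.1, 3.0, 0.0, 0.5],
    [0.0, 0.0, 0.1, 0.2, 2.5],
]

posteriors = phone_posteriors(frame_logits)
best = most_likely_phones(posteriors)
```

In a full recognizer these per-frame distributions are not decoded greedily like this; they are passed on to the language model stage.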

Language model: a decoding graph turns the probabilities over phones (/a/, /b/, /c/, /d/, /e/, over time) into scored hypotheses:
• Turn on the lights in the living room: 0.85
• Turn off the lights in the living room: 0.75
• Set the lights in the living room: 0.60
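The scored hypotheses coming out of the decoding graph can be handled as a simple n-best list. The sketch below only shows the final selection step, using the three hypotheses from the slide; the function name `best_hypothesis` and the rejection threshold `min_score` are assumptions made for the example.

```python
# Hypotheses and scores copied from the slide; in a real system they would
# be produced by the decoding graph.
hypotheses = [
    ("Turn on the lights in the living room", 0.85),
    ("Turn off the lights in the living room", 0.75),
    ("Set the lights in the living room", 0.60),
]


def best_hypothesis(nbest, min_score=0.5):
    """Return the top-scoring (text, score) pair, or None if the list is
    empty or the best score is below min_score (hypothetical threshold)."""
    if not nbest:
        return None
    text, score = max(nbest, key=lambda pair: pair[1])
    return (text, score) if score >= min_score else None


result = best_hypothesis(hypotheses)
```

Here `result` holds the "Turn on the lights in the living room" hypothesis, which is the text handed to the Natural Language Understanding Engine.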

Natural Language Understanding Engine, applied to "Turn on the lights in the living room":
• Logistic regression -> Intent: SwitchLightOn
• Conditional Random Field -> Slots: room: living room

Spoken Language Understanding
Offline & on device
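To make the logistic regression and Conditional Random Field step concrete, here is a minimal sketch of the inference side only. The weights in `WEIGHTS` and the BIO tags are written by hand for illustration (a trained model would learn the weights and a trained CRF would predict the tags); none of this is the Snips implementation.

```python
import math

# Hypothetical bag-of-words weights per intent (assumption, not learned).
WEIGHTS = {
    "SwitchLightOn": {"on": 2.0, "turn": 0.5, "lights": 0.5},
    "SwitchLightOff": {"off": 2.0, "turn": 0.5, "lights": 0.5},
}


def classify_intent(tokens):
    """Multinomial logistic regression: softmax over linear scores."""
    scores = {
        intent: sum(weights.get(tok, 0.0) for tok in tokens)
        for intent, weights in WEIGHTS.items()
    }
    m = max(scores.values())
    exps = {intent: math.exp(s - m) for intent, s in scores.items()}
    z = sum(exps.values())
    probs = {intent: e / z for intent, e in exps.items()}
    return max(probs, key=probs.get), probs


def slots_from_bio(tokens, tags):
    """Group B-/I- tagged tokens (as a CRF tagger would emit) into slots."""
    slots, name, words = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if name:
                slots[name] = " ".join(words)
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            words.append(token)
        else:
            if name:
                slots[name] = " ".join(words)
            name, words = None, []
    if name:
        slots[name] = " ".join(words)
    return slots


tokens = "turn on the lights in the living room".split()
# Tags written by hand here; a trained CRF would predict them.
tags = ["O", "O", "O", "O", "O", "O", "B-room", "I-room"]

intent, probs = classify_intent(tokens)
slots = slots_from_bio(tokens, tags)
result = {"intent": intent, "slots": slots}
```

With these hand-set values, `result` matches the slide: intent `SwitchLightOn` with slot `room: living room`.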

❌ ❌ ✅

Our approach
Our own voice in a vast ecosystem

Privacy by design

Resource constrained ML: small data & hardware

✓ A new popular trend in the ML community (low resource ML, transfer learning, miniaturization, etc.)
✓ Numerous conferences and workshops on the topic
✓ Towards a safer, greener and more private conversational AI

Research activity
Publishing research in industry

• Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces (arXiv). ICML 2018, workshop PiMLAI
• Efficient keyword spotting using dilated convolutions and gating (arXiv). ICASSP 2019, Main track
• Federated learning for keyword spotting (arXiv). ICASSP 2019, Main track
• Spoken language understanding on the edge (arXiv). NeurIPS 2019, Workshop EMC2

Notes: there is a lot of research to do in industry; with our joining Sonos, we expect to grow the team and do even more exploratory work.

Cited 35 times to date

✓ Publish open & reproducible benchmarks:
‣ ~200 access grants to researchers for our open speech datasets
‣ Snips dataset for NLU is the new academic standard

Thank you for your attention
Questions?

@alicecoucke