ENS Corporate meeting — 11/19/2020
Machine learning applied to speech recognition
Alice Coucke Head of Machine Learning Research
1 Outline:
1. Recent advances in applied ML
2. From physics to machine learning
3. Working at Sonos Recent advances in applied Machine Learning
3 Reinforcement learning Learning goal-oriented behavior within simulated environments
Go Starcraft II Dota 2 AlphaGo (Deepmind, 2016) AlphaStar (Deepmind) OpenAI Five (OpenAI)
Play-driven learning for robots Sim-to-real dexterity learning (Google Brain) Project BLUE (UC Berkeley) Machine Learning for Life Sciences Deep learning applied to biology and medicine
Protein folding & structure Eye disease diagnosis Drug discovery prediction (NHS, UCL, Deepmind) (Recursion, MIT, Harvard) AlphaFold (Deepmind)
Reconstruct speech from neural activity Limb control restoration (UCSF) (Batelle, Ohio State Univ) Computer vision High-level understanding of digital images or videos
GANs for artificial video dubbing (Synthesia)
Visual Question Answering on everyday images (Facebook Research, Georgia Tech)
Future state prediction in dynamic scenes (Wayve, Univ of Cambridge) From physics to machine learning and back A surge of interest from the physics community
NeurIPS: workshop on « machine learning and the physical sciences » Speech and language Understand and analyze human speech
{ intent: FindWeather, entities: { datetime: 11/19/2020, location: Paris } }
Speech transcription Spoken language Text & speech generation Human Parity (Microsoft) understanding GPT-2 (Open AI) (Super)GLUE benchmarks (Google, Bert (Google) Facebook, IBM, Stanford …) XLNet (CMU) …
� �
Neural machine translation Voice activity detection Speaker identification Sentiment analysis Detect emotions in text and Unsupervised MT (Facebook) Detect speech from audio Recognize unique speakers speech Fairness, ethics, and explainability We, as scientists, have a say in the future of AI From physics to machine learning
10 My background From physics to machine learning
2012: M2 ICFP Theoretical physics
2013-2016: PhD in statistical physics @LPTENS
Feb 2017: senior ML scientist @ Snips
2019: head of ML research @ Snips
Today: head of ML research @ Sonos A few takeaways (please go ask other people too)
• PhD? Postdoc?
• Working at a startup company
• Physicists and machine learning
Physicists @ Snips/Sonos
Raffaele Tavarone Francesco Caltagirone Alaa Saade Stéphane d’Ascoli Me :) ML Director ML Director Sr ML Scientist ML research intern Acoustic modeling Spoken Language Now: DeepMind Now: PhD ENS & FAIR Understanding Sonos Voice Experience
13 Sonos Home sound system
Sonos Sonos Voice Experience ✦ Smart speakers in multi room ✦ Ex Snips, joined Sonos 2019 ✦ Hardware & software (~400 people) ✦ 50 employees: ML, Linguistics, SW engineering ✦ 1.5k employees in 6 countries (US, France, ✦ Audio signal proc, speech, NLP, embedded Netherlands, UK, Australia, China) systems
Brand new tech blog:
https://tech-blog.sonos.com/ Machine Learning at Sonos Voice Experience Spoken Language understanding for voice assistants
Automatic Speech Recognition Engine Language modeling
Natural Intent: t ɜ r n ɑ n ð ə Acoustic Language Turn on the Language SwitchLightOn l a ɪ t s ɪ n ð ə model model lights in the Understanding ˈl ɪ v ɪ ŋ r u m living room Engine Slots: room: living room
Wake Word ACR ICF Dialogue Custom Wake Automatic Command Intent Classification Multiturn Dialogue Word Recognition and Filling and Response Audio Frontend Action Code Our approach Our own voice in a vast ecosystem
Privacy by design Resource constrained ML: small data & hardware
✓ A new popular trend in the ML community (low resource ML, transfer learning, miniaturization, etc) ✓ Numerous conferences and workshops on the topic ✓ Towards a safer, greener and more private conversational AI Research activity Publishing research in industry
Snips voice platform: Efficient keyword spotting using Federated learning for keyword Spoken language understanding on an embedded spoken language dilated convolutions and gating spotting the edge understanding system for private- by-design voice interfaces arXiv arXiv arXiv arXiv
ICML 2018 ICASSP 2019 ICASSP 2019 NeurIPS 2019 Workshop PiMLAI Main track Main track Workshop EMC2
Conditionned text generation with Predicting detection filters for small A study on more realistic room Small footprint Text-Independent transfer for closed-domain dialogue footprint open-vocabulary keyword simulation for far-field keyword Speaker Verification for Embedded systems spotting spotting Systems arXiv arXiv arXiv arXiv
SLSP 2020 Interspeech 2020 APSIPA 2020 ArXiv preprint Main track Main track Main track 2020 Thank you for your attention!
Questions?
18