DeepSpeech @ WebSummerCamp
DeepSpeech @ WebSummerCamp Workshop
Alexandre Lissy [email protected]
2019-08-28
#websc | DeepSpeech @ WebSummerCamp

• Welcome, and thanks for attending!
• I'm Alexandre, working on the DeepSpeech team in the Paris Mozilla office.
• The purpose of this workshop is to introduce how to leverage speech recognition for the Web.
• I want this to be interactive and as hands-on as possible.

Outline

1 Why is Mozilla working on speech?
    What is DeepSpeech?
    DeepSpeech status
2 Tooling and description
    Virtual machine
3 NodeJS DeepSpeech service
    Basic NodeJS CLI
    A DeepSpeech REST API
    Capturing audio from a Webpage
    Using WebSocket and Streaming API
4 Producing a custom language model
    DeepSpeech models
    Command-specific language model

• Our workshop will follow this outline.
Why is Mozilla working on speech?

• You might wonder why Mozilla is working on speech.
• Let's quickly have a look at the Common Voice talk.

What is DeepSpeech?

Mozilla DeepSpeech

Definition
    Mozilla implementation of the DeepSpeech v1 Baidu paper
    FLOSS end-to-end production grade speech recognition
    Originally based 100% on Baidu's paper, now with variation to allow streaming usage
    One-shot and streaming inference API in C, exposed to bindings (Python, NodeJS, Rust, Go, ...)
    Model training, dataset import, model export (Protocol buffer, TFLite)

• Mozilla DeepSpeech aims to provide an end-to-end, machine-learning-based speech recognition engine, available under MPL2.
• The first implementation followed Baidu's paper 100%, with some limitations. We removed the bidirectional recurrent component to allow more streaming-oriented usage.
• We want to make sure people can reproduce our models and build on top of them, so the pre-trained model and checkpoints are available under an appropriate license.
• We ship a ready-to-use English (for now) model, as well as an API exposed in C with bindings in many languages.
Workshop goals

Objectives for today
    Running an English model from a NodeJS-based server app
        First, very basic with HTTP
        Second, using WebSocket and the Streaming API
    Producing and integrating a new, small language model for voice-driven webapp
    W3C SpeechRecognition polyfill

• Goals for this workshop: the idea is to show you how one can rely on DeepSpeech to provide speech support.
• Being in a web context, I assumed JS and Node would be easier.
• The first two items should help you get familiar with the API.
• The third item shows how one can reuse the existing English model and produce a specialized language model for a subset of commands.
• The last item might be ambitious, but it's a way to play with the standard speech recognition API, which for now is not available in Firefox.
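As a rough idea of what the first objective ("very basic with HTTP") could look like, here is a minimal sketch of a Node HTTP endpoint that buffers an uploaded audio file and hands it to a transcription function. It is not the workshop's actual code: the `/stt` route, the function names `respond` and `createSttServer`, and the `transcribe` callback (a placeholder for the real DeepSpeech inference call) are all assumptions made for illustration.

```javascript
const http = require('http');

// Decide the HTTP response for one request. `transcribe` is a placeholder
// (Buffer -> string) standing in for the actual DeepSpeech inference call.
function respond(method, url, audio, transcribe) {
  if (method !== 'POST' || url !== '/stt') {
    return { status: 404, body: { error: 'not found' } };
  }
  if (audio.length === 0) {
    return { status: 400, body: { error: 'empty audio body' } };
  }
  try {
    return { status: 200, body: { text: transcribe(audio) } };
  } catch (err) {
    return { status: 500, body: { error: String(err.message || err) } };
  }
}

// Wrap `respond` in a plain Node HTTP server that buffers the uploaded audio
// before calling the transcription function once.
function createSttServer(transcribe) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const { status, body } = respond(
        req.method, req.url, Buffer.concat(chunks), transcribe);
      res.writeHead(status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(body));
    });
  });
}
```

Calling `createSttServer(fn).listen(8080)` (port chosen arbitrarily) would then accept a WAV file sent with, for example, `curl --data-binary @sample.wav http://localhost:8080/stt`. Buffering the whole body before inference is what makes this the "one-shot" variant; the WebSocket objective replaces it with incremental feeding.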
Tooling and description

Virtual machine

Credentials
    login: wsc
    pass: websummer

    TensorFlow in ~/DeepSpeech/tensorflow ; binaries under bazel-bin/native_client/
    TensorFlow virtualenv in ~/DeepSpeech/tf-venv/
    DeepSpeech in ~/DeepSpeech/DeepSpeech ; binaries under native_client/
    KenLM in ~/DeepSpeech/kenlm ; binaries under build/bin
    English model and checkpoints in ~/DeepSpeech/models and ~/DeepSpeech/checkpoints
    English audio samples in ~/DeepSpeech/audio
    libdeepspeech.so (LD_LIBRARY_PATH=$HOME/DeepSpeech/tensorflow/bazel-bin/native_client)
    Firefox Nightly and NodeJS v10.x

• Quick list of what is already set up in the VM.
• You should have everything needed, and in place, to rebuild the language model.
• It should even cover re-training or fine-tuning an existing model.
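The VM layout can be captured in a small Node helper, for example to launch a native binary with LD_LIBRARY_PATH pointing at libdeepspeech.so. This is only a sketch: the directory names are copied from the list of what is installed in the VM, while `vmLayout` and `withDeepSpeechLib` are hypothetical names introduced here.

```javascript
const path = require('path');

// Directory names are copied from the VM description; `home` is the value
// of $HOME on the VM. POSIX joins are used because the VM paths are Unix-style.
function vmLayout(home) {
  const root = path.posix.join(home, 'DeepSpeech');
  return {
    nativeClient: path.posix.join(root, 'tensorflow', 'bazel-bin', 'native_client'),
    kenlmBin: path.posix.join(root, 'kenlm', 'build', 'bin'),
    models: path.posix.join(root, 'models'),
    checkpoints: path.posix.join(root, 'checkpoints'),
    audio: path.posix.join(root, 'audio'),
  };
}

// Return a copy of `env` whose LD_LIBRARY_PATH starts with the directory
// holding libdeepspeech.so, keeping any existing entries after it.
function withDeepSpeechLib(env, home) {
  const lib = vmLayout(home).nativeClient;
  const existing = env.LD_LIBRARY_PATH;
  return Object.assign({}, env, {
    LD_LIBRARY_PATH: existing ? lib + ':' + existing : lib,
  });
}
```

The result can be passed as the `env` option of `child_process.spawn`, e.g. `spawn(cmd, args, { env: withDeepSpeechLib(process.env, process.env.HOME) })`.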
NodeJS DeepSpeech service

Basic NodeJS CLI

Getting familiar

Basics
    deepspeech --version
    deepspeech --help
    Run inference on one of the sample audio files
install
    DeepSpeech NodeJS bindings are already installed. If needed:
    npm install [email protected]

• Ensuring that everyone is able to run an inference from the NodeJS binary.
• Discovering the command-line arguments and versions.
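Before running inference on an audio sample, it can help to confirm the file is in the format the English model is assumed to expect here: 16 kHz, mono, 16-bit PCM (this is an assumption, not something stated on the slide). The sketch below parses a WAV header by hand; `readWavHeader` and `isModelCompatible` are hypothetical helpers, and the parser only handles the canonical layout where the `fmt ` chunk directly follows the `RIFF`/`WAVE` header.

```javascript
// Hypothetical helper: read the format fields of a canonical RIFF/WAVE header.
// Assumes the "fmt " chunk immediately follows the 12-byte RIFF header.
function readWavHeader(buf) {
  if (buf.length < 36 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('not a canonical WAV file');
  }
  return {
    audioFormat: buf.readUInt16LE(20),   // 1 means uncompressed PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}

// Assumed model input format: 16 kHz, mono, 16-bit PCM.
function isModelCompatible(header) {
  return header.audioFormat === 1 &&
    header.channels === 1 &&
    header.sampleRate === 16000 &&
    header.bitsPerSample === 16;
}
```

With a file on disk, `isModelCompatible(readWavHeader(fs.readFileSync(wavPath)))` tells whether the sample needs resampling or down-mixing first.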
DeepSpeech API and NodeJS

The C-level API
    The public-facing API is in native_client/deepspeech.h
    We try to keep that as stable as possible
NodeJS API
    Bindings generated using SWIG, with support for NodeJS v4.x to v12.x
    Defined in native_client/javascript/deepspeech.i
    Built with node-gyp and node-pre-gyp, bundling pre-built libdeepspeech.so
    Combination of binding.gyp, index.js and Makefile that copies shared object
    Also supports ElectronJS from 1.6 to 5.0

• Now that you have been able to perform some inference and have become a bit familiar, let's have a look at the C API.
• From this C API, you can see the NodeJS one we derive: it's the same, except we do some object-oriented