Access purpose-built ML hardware with Web Neural Network API
Ningxin Hu, Intel Corporation, July 2020

The JS ML frameworks and AI Web apps
• AI features of Web apps: object detection, semantic segmentation, speech recognition, noise suppression
• JS ML frameworks: ONNX.js, TensorFlow.js, Paddle.js, OpenCV.js
• Web browser: WebAssembly, WebGL/WebGPU
• Hardware: CPU, GPU
The purpose-built ML hardware

The same stack as above, but the hardware layer now includes purpose-built ML silicon:
• Hardware: CPU and GPU (each with ML extensions), NPU, VPU, DSP
The performance gap: Web and native

MobileNet* inference latency (smaller is better):

Laptop with VNNI**:
• Wasm/SIMD128/FP32: 33 ms
• WebGL/GPU/FP16: 26.8 ms
• OpenVINO/CPU/FP32: 3.4 ms
• OpenVINO/GPU/FP16: 3 ms
• OpenVINO/VNNI/INT8: 1.1 ms
Native OpenVINO is roughly 9.7X to 24X faster than the Wasm and WebGL backends.

Smartphone with DSP:
• Wasm/SIMD128/FP32: 85 ms
• WebGL/GPU/FP16: 64 ms
• NNAPI/CPU/FP32: 33 ms
• NNAPI/GPU/FP16: 12 ms
• NNAPI/DSP/INT8: 4 ms
Native NNAPI is roughly 2.6X to 16X faster than the Wasm and WebGL backends.

* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

The Web is disconnected from ML hardware
The same stack again, with a question mark where a Web API for the ML-extended hardware should be: WebAssembly and WebGL/WebGPU reach only the CPU and GPU, not the ML extensions, NPU, VPU, or DSP.
WebNN: the architecture view

• Web app: ONNX models, TensorFlow models, other models; JS ML frameworks (TensorFlow.js, ONNX.js, etc.)
• Web browser: WebAssembly, WebGL/WebGPU, WebNN
• Native ML APIs: BNNS/MPS (macOS/iOS), DirectML (Windows), NN API (Android), OpenVINO (Linux)
• Hardware: CPU and GPU (with ML extensions), ML accelerators
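With this layering, a JS ML framework would pick the best available backend at runtime: WebNN when the browser exposes it, otherwise GPU compute, otherwise WebAssembly. A minimal sketch of that feature detection; the selectBackend function and its env parameter are illustrative, not part of any spec:

```javascript
// Hypothetical backend probe for a JS ML framework. `env` stands in for the
// global object so the logic can run (and be tested) outside a browser.
function selectBackend(env) {
  const nav = env.navigator || {};
  if (nav.ml) return 'WebNN';          // native ML APIs and ML accelerators
  if (nav.gpu) return 'WebGPU';        // general-purpose GPU compute
  if (env.WebAssembly) return 'Wasm';  // CPU fallback
  return 'none';
}

// In a browser, a framework would call selectBackend(globalThis).
```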
WebNN: the programming model

• nn = navigator.ml.getNeuralNetworkContext() returns the NeuralNetworkContext.
• Graph building: nn.input / nn.constant / nn.conv2d / nn.add / nn.relu / … construct a computational graph; for example, input → conv2d (with a filter constant) → add (with a bias constant) → relu → output.
• nn.createModel() turns the graph into a Model; model.createCompilation() (with compilation options) produces a Compilation; compilation.createExecution() produces an Execution.
• The Execution binds buffers with setInput / setOutput and runs the graph with startCompute().
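Put together as code, the flow above looks like this. Since navigator.ml exists only in a WebNN-enabled browser, the sketch substitutes a tiny stand-in context that supports just add and relu (conv2d omitted for brevity); the method names follow the draft API on this slide, while the stand-in itself is purely illustrative:

```javascript
// Stand-in for navigator.ml.getNeuralNetworkContext(): records a graph of
// operands and evaluates it eagerly at startCompute(). Supports only
// add/relu on flat Float32Arrays -- illustrative, not a real implementation.
function getNeuralNetworkContext() {
  const makeOperand = (op, inputs, value) => ({ op, inputs, value });
  return {
    input(name, desc) { return makeOperand('input', [], { name, desc }); },
    constant(desc, buffer) { return makeOperand('constant', [], { desc, buffer }); },
    add(a, b) { return makeOperand('add', [a, b]); },
    relu(x) { return makeOperand('relu', [x]); },
    async createModel(outputs) {
      return {
        async createCompilation() {
          return {
            async createExecution() {
              const feeds = new Map();
              const sinks = [];
              const evaluate = (o) => {
                switch (o.op) {
                  case 'input': return feeds.get(o.value.name);
                  case 'constant': return o.value.buffer;
                  case 'add': {
                    const [a, b] = o.inputs.map(evaluate);
                    return a.map((v, i) => v + b[i]);
                  }
                  case 'relu':
                    return evaluate(o.inputs[0]).map((v) => Math.max(0, v));
                }
              };
              return {
                setInput(name, buffer) { feeds.set(name, buffer); },
                setOutput(index, buffer) { sinks[index] = buffer; },
                async startCompute() {
                  outputs.forEach((o, i) => sinks[i].set(evaluate(o.operand)));
                },
              };
            },
          };
        },
      };
    },
  };
}

// Build and run output = relu(input + bias), following the slide's flow.
async function run() {
  const nn = getNeuralNetworkContext(); // in a browser: navigator.ml.getNeuralNetworkContext()
  const desc = { type: 'tensor-float32', dimensions: [4] };
  const input = nn.input('input', desc);
  const bias = nn.constant(desc, Float32Array.from([1, -1, 1, -1]));
  const output = nn.relu(nn.add(input, bias));

  const model = await nn.createModel([{ name: 'output', operand: output }]);
  const compilation = await model.createCompilation();
  const execution = await compilation.createExecution();

  const outBuffer = new Float32Array(4);
  execution.setInput('input', Float32Array.from([-3, 2, 0.5, 0]));
  execution.setOutput(0, outBuffer);
  await execution.startCompute();
  return outBuffer; // relu([-2, 1, 1.5, -1]) -> [0, 1, 1.5, 0]
}
```

The separation between graph building, compilation, and execution is what lets a browser hand the whole graph to a native API (OpenVINO, NNAPI, DirectML, MPS/BNNS) instead of dispatching operation by operation.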
Legend: input, constant, and output are operands; conv2d, add, and relu are operations.
Spec: https://webmachinelearning.github.io/webnn/

WebNN: the proof-of-concept implementation
• Renderer process (Blink): NeuralNetworkContext, Model, Compilation, Execution
• Customized IPC connects Blink to a Chromium service running in the GPU process
• Per-OS implementations in the service map to the native APIs: macOS (MPS/BNNS), Android (NNAPI), Windows (DirectML), Linux (OpenVINO)
• Hardware: CPU and GPU (with ML extensions), ML accelerators
Source code: https://github.com/otcshare/chromium-src

WebNN: the demos
• WebNN image classification on a laptop with VNNI
• WebNN image classification on a smartphone with DSP
Demo: https://intel.github.io/webml-polyfill/examples/image_classification

WebNN: the PoC performance
MobileNet* inference latency on a laptop with VNNI** (smaller is better):
• Wasm/SIMD128/FP32: 33 ms
• WebGL/GPU/FP16: 26.8 ms
• WebNN/OpenVINO/CPU/FP32: 5.4 ms (native OpenVINO/CPU/FP32: 3.4 ms)
• WebNN/OpenVINO/GPU/FP16: 4.1 ms (native OpenVINO/GPU/FP16: 3 ms)
• WebNN/OpenVINO/VNNI/INT8: 1.6 ms (native OpenVINO/VNNI/INT8: 1.1 ms)
WebNN is up to 16X faster than the Wasm and WebGL backends, staying within roughly 1.6X of native OpenVINO.

* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

WebNN: the PoC performance – cont’d
MobileNet* inference latency on a smartphone with DSP (smaller is better):
• Wasm/SIMD128/FP32: 85 ms
• WebGL/GPU/FP16: 64 ms
• WebNN/NNAPI/CPU/FP32: 35 ms (native NNAPI/CPU/FP32: 33 ms; 2.4X faster than Wasm)
• WebNN/NNAPI/GPU/FP16: 14 ms (native NNAPI/GPU/FP16: 12 ms; 4.5X faster than WebGL)
• WebNN/NNAPI/DSP/INT8: 6 ms (native NNAPI/DSP/INT8: 4 ms; 10X faster than WebGL)
* Batch size: 1, input size: 224x224, width multiplier: 1.0

Call for Participation
• WebML Community Group: https://www.w3.org/community/webmachinelearning/
• WebNN spec: https://webmachinelearning.github.io/webnn/
Thanks
Appendix
• WebNN spec: https://webmachinelearning.github.io/webnn/
• WebML CG: https://www.w3.org/community/webmachinelearning/
• NNAPI: https://developer.android.com/ndk/guides/neuralnetworks
• DirectML: https://docs.microsoft.com/en-us/windows/win32/direct3d12/dml-intro
• MPS: https://developer.apple.com/documentation/metalperformanceshaders
• OpenVINO: https://docs.openvinotoolkit.org/
• TensorFlow.js: https://js.tensorflow.org/
• ONNX.js: https://github.com/microsoft/onnxjs
• Paddle.js: https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/web
• OpenCV.js: https://docs.opencv.org/3.4.10/d5/d10/tutorial_js_root.html
• AI-benchmark: http://ai-benchmark.com/
• TensorFlow.js benchmark: https://tensorflow.github.io/tfjs/e2e/benchmarks/
• Wasm SIMD128: https://github.com/WebAssembly/simd
• AVX512-VNNI: https://en.wikichip.org/wiki/x86/avx512_vnni