Access purpose-built ML hardware with Web Neural Network API

Ningxin Hu, Intel Corporation, July 2020

The JS ML frameworks and AI Web apps

AI features of Web apps: object detection, semantic segmentation, speech recognition, noise suppression

JS ML frameworks: ONNX.js, TensorFlow.js, Paddle.js, OpenCV.js

Web browser: WebAssembly, WebGL/WebGPU

Hardware: CPU, GPU

The purpose-built ML hardware

AI features of Web apps: object detection, semantic segmentation, speech recognition, noise suppression

JS ML frameworks: ONNX.js, TensorFlow.js, Paddle.js, OpenCV.js

Web browser: WebAssembly, WebGL/WebGPU

Hardware: CPU (ML Ext.), GPU (ML Ext.), NPU, VPU, DSP

The performance gap: Web and native

MobileNet* inference latency (smaller is better):

Laptop with VNNI**:
• Wasm/SIMD128/FP32: 33 ms
• WebGL/GPU/FP16: 26.8 ms
• OpenVINO/CPU/FP32: 3.4 ms (9.7X faster than Wasm)
• OpenVINO/GPU/FP16: 3 ms (8.9X faster than WebGL)
• OpenVINO/VNNI/INT8: 1.1 ms (24X faster than WebGL)

Smartphone with DSP:
• Wasm/SIMD128/FP32: 85 ms
• WebGL/GPU/FP16: 64 ms
• NNAPI/CPU/FP32: 33 ms (2.6X faster than Wasm)
• NNAPI/GPU/FP16: 12 ms (5.8X faster than WebGL)
• NNAPI/DSP/INT8: 4 ms (16X faster than WebGL)

* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

The Web is disconnected from ML hardware

AI features of Web apps: object detection, semantic segmentation, speech recognition, noise suppression

JS ML frameworks: ONNX.js, TensorFlow.js, Paddle.js, OpenCV.js

Web browser: WebAssembly, WebGL/WebGPU, ? (no Web API reaches the ML hardware)

Hardware: CPU (ML Ext.), GPU (ML Ext.), NPU, VPU, DSP

WebNN: the architecture view

Models: ONNX models, TensorFlow models, other models

Web app / JS ML frameworks: TensorFlow.js, ONNX.js, etc.

Web browser: WebAssembly, WebGL/WebGPU, WebNN

Native ML APIs: BNNS/MPS (macOS/iOS), DirectML (Windows), NN API (Android), OpenVINO (Linux)

Hardware: CPU (ML Ext.), GPU (ML Ext.), ML accelerators
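A JS ML framework sitting on this stack typically probes for the best available backend at startup, preferring WebNN when the browser exposes it. A minimal sketch of that probing order (the backend names and fallback order are illustrative, not from the deck; real frameworks such as TensorFlow.js use their own backend registry):

```javascript
// Pick the fastest available execution backend, in rough order of
// expected inference latency: WebNN > WebGPU > Wasm > plain JS.
// Names and ordering are illustrative assumptions.
function pickBackend() {
  const nav = globalThis.navigator;
  if (nav?.ml?.getNeuralNetworkContext) return "webnn"; // 2020 WebNN PoC entry point
  if (nav?.gpu) return "webgpu";                        // WebGPU available
  if (typeof WebAssembly !== "undefined") return "wasm"; // WebAssembly fallback
  return "js";                                           // pure-JS last resort
}
```

In a browser without the WebNN PoC this returns "webgpu" or "wasm", which is exactly the performance gap the following slides quantify.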

WebNN: the programming model

• NeuralNetworkContext: nn = navigator.ml.getNeuralNetworkContext()
• Build the computational graph: nn.input / nn.constant / nn.conv2d / nn.add / nn.relu / …
  Example graph: input and a constant filter feed conv2d; its result and a constant bias feed add; relu produces the output.
• nn.createModel → Model; Model.createCompilation (with options) → Compilation; Compilation.createExecution → Execution
• Execution: setInput (input buffer), setOutput (output buffer), startCompute
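Put together, the graph-building and execution flow sketched on this slide looks roughly like the following. The call names follow the 2020 PoC API shown here (getNeuralNetworkContext, createModel, createCompilation, createExecution); the operand descriptor fields and tensor dimensions are illustrative assumptions, and the current WebNN spec has since replaced this surface with MLContext/MLGraphBuilder:

```javascript
// Sketch of the 2020 WebNN PoC flow: build a conv2d -> add -> relu graph,
// compile it, then execute it with input/output buffers.
// Descriptor shapes and dimensions below are illustrative assumptions.
async function runGraph(inputBuffer, filterData, biasData, outputBuffer) {
  const nn = globalThis.navigator?.ml?.getNeuralNetworkContext?.();
  if (!nn) return null; // WebNN not available in this environment

  // Graph construction: input and constant operands, then operations.
  const input = nn.input("input", { type: "tensor-float32", dimensions: [1, 224, 224, 3] });
  const filter = nn.constant({ type: "tensor-float32", dimensions: [32, 3, 3, 3] }, filterData);
  const bias = nn.constant({ type: "tensor-float32", dimensions: [32] }, biasData);
  const output = nn.relu(nn.add(nn.conv2d(input, filter), bias));

  // Model -> Compilation -> Execution, as on the slide.
  const model = await nn.createModel([{ name: "output", operand: output }]);
  const compilation = await model.createCompilation();
  const execution = await compilation.createExecution();

  // Bind buffers and compute; results land in outputBuffer.
  execution.setInput("input", inputBuffer);
  execution.setOutput("output", outputBuffer);
  await execution.startCompute();
  return outputBuffer;
}
```

The graph is declared once up front, which is what lets the browser hand the whole computation to a native ML API instead of dispatching one op at a time.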

Legend: input, constant, and output operands; operations connect them.

Spec: https://webmachinelearning.github.io/webnn/

WebNN: the proof-of-concept implementation

Renderer process (blink): NeuralNetworkContext, Model, Compilation, Execution

Customized IPC service

GPU process: macOS impl → MPS/BNNS, Android impl → NN API, Windows impl → DirectML, Linux impl → OpenVINO

Hardware: CPU (ML Ext.), GPU (ML Ext.), ML accelerators

Code: https://github.com/otcshare/chromium-src

WebNN: the demos

WebNN image classification on a laptop with VNNI and on a smartphone with DSP: https://intel.github.io/webml-polyfill/examples/image_classification

WebNN: the PoC performance

MobileNet* inference latency on laptop with VNNI** (smaller is better):
• Wasm/SIMD128/FP32: 33 ms
• WebNN/OpenVINO/CPU/FP32: 5.4 ms
• OpenVINO/CPU/FP32: 3.4 ms
• WebGL/GPU/FP16: 26.8 ms
• WebNN/OpenVINO/GPU/FP16: 4.1 ms
• OpenVINO/GPU/FP16: 3 ms
• WebNN/OpenVINO/VNNI/INT8: 1.6 ms
• OpenVINO/VNNI/INT8: 1.1 ms

WebNN closes most of the gap to native OpenVINO; the chart annotates 8X, 4.9X, and 16X speedups over the existing Web backends.

* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

WebNN: the PoC performance – cont’d

MobileNet* inference latency on smartphone with DSP (smaller is better):
• Wasm/SIMD128/FP32: 85 ms
• WebNN/NNAPI/CPU/FP32: 35 ms (2.4X faster than Wasm)
• NNAPI/CPU/FP32: 33 ms
• WebGL/GPU/FP16: 64 ms
• WebNN/NNAPI/GPU/FP16: 14 ms (4.5X faster than WebGL)
• NNAPI/GPU/FP16: 12 ms
• WebNN/NNAPI/DSP/INT8: 6 ms (10X faster than WebGL)
• NNAPI/DSP/INT8: 4 ms

* Batch size: 1, input size: 224x224, width multiplier: 1.0

Call for Participation
• WebML CG: https://www.w3.org/community/webmachinelearning/
• WebNN spec: https://webmachinelearning.github.io/webnn/

Thanks

Appendix

• WebNN spec: https://webmachinelearning.github.io/webnn/
• WebML CG: https://www.w3.org/community/webmachinelearning/
• NNAPI: https://developer.android.com/ndk/guides/neuralnetworks
• DirectML: https://docs.microsoft.com/en-us/windows/win32/direct3d12/dml-intro
• MPS: https://developer.apple.com/documentation/metalperformanceshaders
• OpenVINO: https://docs.openvinotoolkit.org/
• TensorFlow.js: https://js.tensorflow.org/
• ONNX.js: https://github.com/microsoft/onnxjs
• Paddle.js: https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/web
• OpenCV.js: https://docs.opencv.org/3.4.10/d5/d10/tutorial_js_root.html
• AI-benchmark: http://ai-benchmark.com/
• TensorFlow.js benchmark: https://tensorflow.github.io/tfjs/e2e/benchmarks/
• Wasm SIMD128: https://github.com/WebAssembly/simd
• AVX512-VNNI: https://en.wikichip.org/wiki/x86/avx512_vnni
