DEPARTMENT OF INFORMATICS

TECHNISCHE UNIVERSITÄT MÜNCHEN

Bachelor’s Thesis in Informatics

Evaluation of Machine Learning Inference Workloads on Heterogeneous Edge Devices

Evaluierung von Inferenzaufgaben für maschinelles Lernen auf heterogenen Edge-Geräten

Author: Marco Rubin
Supervisor: Prof. Dr. rer. nat. Martin Schulz
Advisors: Dai Yang, Amir Raoofy
Submission Date: 16th of March 2020

I confirm that this bachelor’s thesis in informatics is my own work and I have documented all sources and material used.

Munich, 16th of March 2020
Marco Rubin

Abstract

Machine learning has traditionally been the domain of powerful workstations and even supercomputers. Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, especially inference, on the edge. Such edge devices include small computers like the Nvidia Jetson Nano and accelerators like the Neural Compute Sticks. In this thesis these devices and their respective frameworks are compared with each other and with an Apple MacBook Pro, which represents a more traditional machine learning device. The comparisons are made using an almost identical inference script across all platforms. The maximum possible throughput is also tested with tools specialized for the respective framework. Four models with different numbers of convolutional layers were used to put a special emphasis on this kind of layer. The precision of the models dropped after the conversion to OpenVINO for the Neural Compute Sticks, but remained equally high on the other frameworks. Regarding performance, the MacBook was still the fastest device overall, but lost proportionally more performance than the edge devices as the number of convolutional layers increased. Amongst the edge devices, the Jetson Nano was the fastest, followed by the 2nd gen Neural Compute Stick and the 1st gen Neural Compute Stick.

Contents

Abstract

1 Introduction
  1.1 Introduction
  1.2 A brief Overview of Neural Networks
    1.2.1 Structure
    1.2.2 Training
    1.2.3 Inferencing
    1.2.4 Layers
  1.3 Goal of the Thesis

2 Implementation
  2.1 Experiences
  2.2 Final Implementation
    2.2.1 Training
    2.2.2 Conversion to OpenVINO
    2.2.3 Inference
    2.2.4 Limitations of the Final Implementation

3 Experiments
  3.1 Setup
    3.1.1 Hardware
    3.1.2 Software
  3.2 Benchmarking Methods
  3.3 Limitations of the Benchmarking Methods

4 Results
  4.1 TensorFlow
    4.1.1 Results of the Evaluate Script
    4.1.2 Results of the Predict Script
  4.2 OpenVINO
    4.2.1 Results of the Benchmark App
    4.2.2 Results of the Predict Script
  4.3 ONNX Runtime
  4.4 Comparison of the Results between Frameworks
    4.4.1 Accuracy
    4.4.2 Runtimes

5 Conclusion and Outlook
  5.1 Conclusion
  5.2 Outlook

List of Figures

List of Tables

Bibliography

1 Introduction

1.1 Introduction

Since the last century, informatics has been a driving force for all kinds of related industries. However, in all that time, computers were not able to gain knowledge while processing data. Learning from experience is a typically human behaviour, enabling people to get better at their tasks. Machine learning tries to replicate this. It is believed to be able to solve problems in automation, classification, recognition and many more domains. Its applications include, amongst others, recognizing specific persons' faces in pictures and analysing speech samples. Furthermore, it can be used in robots to detect objects, or to warn self-driving cars when an obstacle is recognized. While the idea of machine learning is not new (the term was already used in 1959 by Arthur Samuel while working at IBM [1]), the ongoing development in processor technology has made it increasingly useful for the variety of applications it serves today. These developments enabled its use on increasingly smaller devices, in contrast to the supercomputers and workstations needed before. The latest target devices are the so-called edge devices, which include small computers like the Raspberry Pi and Nvidia Jetson and accelerator devices like the Intel Movidius NCS, short for Neural Compute Stick [2]. Their market is quite heterogeneous, as for many applications there is a specialised device. Edge devices do not possess the massive computing power of a supercomputer, but can provide real-time data processing close to the data source. One application field would be facial recognition performed decentrally, close to the security camera. Because of the small size of an edge device, it could even be integrated into the security camera.

1.2 A brief Overview of Neural Networks

The following introduction to neural networks covers the structure, the training process and inference of neural networks, as well as a short explanation of different layers. As this thesis only uses classification neural networks, all of the following explanations refer to this kind of network.

1.2.1 Structure

A neural network consists of neurons which loosely resemble the neurons of a brain [3]. Neurons use an input and an activation function to compute an output. They are aggregated in layers. Between layers, the neurons are connected by edges with different weights as parameters.

Figure 1.1: A simple neural network with three layers
source: https://commons.wikimedia.org/wiki/File:Neural_network_example.svg

These weights model the probability that an edge is taken when a signal runs through the network. A higher value indicates a higher probability of taking the edge [4]. Networks consist of an input layer, which handles the input (for example a picture), and an output layer, which gives the probabilities for the respective classes. In between these layers there may be one or more hidden layers. In figure 1.1, each layer is indicated by a different colour, for example green for the input layer. The weights are indicated by the thickness of the arrows. The term deep neural network is used when the network has more than one hidden layer [5].

1.2.2 Training

In the training process of a neural network, the network learns by predicting test data. For every sample the predicted result is compared to the provided label. If they match, the weights of the used edges are increased; if not, they are decreased. The input shape specifies the way data is presented to the network. For example, the input shape could specify the batch size, the height and width of the picture and the number of channels. The batch size specifies the number of samples which are processed simultaneously. After each batch, the weights of the network are updated [6]. Height and width of the picture specify the number of pixels in the respective direction. The channel count depends on the colours of the picture. For example, an RGB picture takes three channels, while one in greyscale only requires one [7]. An epoch represents a training run over the whole input dataset. Typically, multiple epochs are used for training. More epochs usually lead to better results, but too many make the network prone to overfitting, which leads to worse accuracy on unknown data as the network is too closely adapted to the training data. Accuracy is defined as the number of correct predictions divided by the total number of predictions [8]. A trained neural network can be saved as a model which includes the nodes and edges as well as the weights. The model can later be loaded for further training or inference runs.

1.2.3 Inferencing

Inferencing describes the process of using a previously trained network to make predictions on new data. Just like in the training process, the network is presented with an input which it processes. Unlike in the training process, the output cannot be compared to a label to verify it. Instead, the class with the highest probability is taken as the prediction of what the input is. More complex models often have higher accuracy but tend to be slower. Batching can also be used in inference to increase performance.

1.2.4 Layers

Figure 1.2: Visual representation of one of the used models in this thesis


Neural networks consist of multiple layers which serve different purposes. Some of the most important ones are described below. Knowing the different layers and their impact on accuracy and inference times is crucial to understand the benchmark results in the following chapters of the thesis. In figure 1.2 the layers of the network are shown top-down. Additional parameters are in the boxes underneath the name of the layer.

Convolutional layers

One of the latest revolutions in neural networks was the introduction of convolutional layers. Their purpose is the extraction of features, such as the detection of edges, by considering smaller squares of a picture. This is done by sliding a smaller matrix, called the filter, over the matrix representing the picture. The filter is filled with values which describe the shape of the feature to be extracted; in a trained network these values are learned. For each position of the filter, the dot product of the filter and the underlying part of the picture is computed. This is shown in figure 1.3, where the blue lines refer to the multiplication of the elements of the input matrix and the filter. The green lines represent the addition of the previously multiplied values. The stride determines how many pixels the filter slides each step.

Figure 1.3: Functionality of a convolutional layer
source: https://www.researchgate.net/figure/Outline-of-the-convolutional-layer_fig1_323792694

In order to enable sliding the filter closer to the border, Zero-padding is used. This process extends the original image and fills the newly created pixels with zeros [7].
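To make the sliding dot product described above concrete, the following minimal NumPy sketch applies a single 3x3 filter to a toy 5x5 picture; the filter values and the picture are illustrative assumptions and are not taken from the models used later in this thesis.

    import numpy as np

    def convolve2d(image, kernel, stride=1):
        # Slide the filter over the picture; at each position multiply
        # element-wise with the underlying patch and sum the products.
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for y in range(out_h):
            for x in range(out_w):
                patch = image[y * stride:y * stride + kh, x * stride:x * stride + kw]
                out[y, x] = np.sum(patch * kernel)
        return out

    # Toy 5x5 "picture" containing a vertical edge and a 3x3 vertical-edge filter
    image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
    vertical_edge = np.array([[-1, 0, 1],
                              [-1, 0, 1],
                              [-1, 0, 1]], dtype=float)
    print(convolve2d(image, vertical_edge))  # strongest response where the filter straddles the edge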

Pooling layers

Pooling layers are typically used after convolutional layers. They are used to prevent overfitting by reducing the parameter count, which also reduces the computational complexity. They operate by taking the maximum, the average, or the sum of the values in parts of the input matrix. A possible size of one part would be 2x2. Probably the most well-known variant is MaxPooling, which selects the maximum of all values in the part [7].


Flatten layers

Flatten layers are used to transform an input matrix to a vector [9].

Dense layers

The simplest layers are Dense layers, also called Fully Connected layers. They connect every neuron of one layer with every neuron of another layer [9].

Dropout layers

Dropout layers are used to prevent overfitting by randomly dropping out neurons and their edges during the training of a neural network. Thereby otherwise less-used paths are taken and their weights refined [10].
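As a minimal sketch of how these layer types fit together in Keras (the interface used later in this thesis), the following toy classifier combines one of each; the layer sizes are illustrative and do not exactly match the four models used for the benchmarks.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Toy MNIST classifier combining the layer types described above
    model = keras.Sequential([
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu",
                      input_shape=(28, 28, 1)),   # convolutional layer: feature extraction
        layers.MaxPooling2D(pool_size=(2, 2)),    # pooling layer: fewer parameters
        layers.Flatten(),                         # flatten layer: matrix to vector
        layers.Dense(128, activation="relu"),     # dense (fully connected) layer
        layers.Dropout(0.2),                      # dropout layer: counter overfitting
        layers.Dense(10, activation="softmax"),   # output layer: class probabilities
    ])
    model.summary()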

1.3 Goal of the Thesis

The goal of this Bachelor thesis is to further investigate the current state of edge accelerators, including small computers and accelerator devices alike, by using a simple neural network. The network is trained on one device and then ported to the other devices to benchmark the inference process on the respective device. Using multiple models trained with the same parameters on different devices is not an option because the training process is not deterministic and therefore does not lead to equal models. Both the benchmarking and the process of porting the model are subjects of investigation. For a meaningful comparison, the most important frameworks and edge accelerators from Google, Nvidia and Intel are included.

2 Implementation

2.1 Experiences

Early stages

The first task was to define the hardware to be tested. The initial selection included the Nvidia Jetson and the Intel Neural Compute Sticks (NCS) of the first and second generation. For comparison, an Apple MacBook Pro 2015 13-inch was used. Raspberry Pis of the 3rd and 4th generation hosted the NCSs. Therefore, the target frameworks were TensorRT, OpenVINO and TensorFlow. It had to be decided whether to get a pretrained model or to train one myself in TensorFlow. The model should be simple and widely known to provide good comparability. After some research I decided to focus on MNIST, a database of handwritten digits [11]. Due to the non-availability of a fitting model across all platforms, I decided to train the model myself. The software of the Nvidia Jetson was updated to JetPack version 4.2.2 for better compatibility with TensorFlow. The first training script [12] which I tried proved incompatible with the Jetson, as it only supported TensorFlow 2, whereas the edge device was running TensorFlow 1.15. For that reason, the choice fell on an older script [13]. This script uses the native TensorFlow API. After successful training, the model was saved in the pb data format. The pb format is created using protocol buffers, a method to serialize structured data developed by Google [14]. As a next step, this model was to be converted to OpenVINO's model format.

Conversion of the model to OpenVINO

Converting the exported pb file using the official OpenVINO model_optimizer failed, because it could not read the encoding, which was neither UTF-8 nor binary. I worked around this issue by saving the model as a pbtxt, a UTF-8 encoded text file (by setting the flag "as_text = true" in the save method of the training script). After setting the corresponding flags in the model optimizer, the conversion was successful. The next task was to create an inference script for OpenVINO, which I based on a sample script [15]. As input data I used MNIST saved as PNG pictures [16]. These were imported in grayscale using OpenCV. In a first experiment, single pictures were predicted. This was successful, but I discovered that both generations of the NCS only support FP16 (16-bit floating point numbers), so I had to convert the model to FP16 precision. When using FP16 across all devices, the NCSs were less precise than the MacBook. While the difference from the NCS1 to the MacBook was slight, the NCS2 was far less precise than the first generation for an unknown reason. Figures 2.1 and 2.2 display this difference in precision.

Figure 2.1: Precision of the NCS2
Figure 2.2: Precision of the MacBook

These screenshots show the probabilities of the classes, in this case the numbers from zero to nine, when I ran an inference with the same sample image on both devices. The probabilities were not normalized, so there are positive and negative values. Starting with those tests, the MacBook was used as host for the sticks instead of the Raspberry Pis. To perform inference on all of the 10,000 test pictures, a few more changes were made to the inference script: the input was now a list of n-dimensional arrays which contained one picture each, plus a second list for the labels. A loop compared the label with the prediction of highest probability and incremented the number of right or wrong predictions respectively. However, because of the input shape of the converted model, the batch size was fixed to 1.

Focusing on TensorFlow inference

After getting a working inference script in OpenVINO, I focused on doing the same in TensorFlow. First, I wanted to determine which input format TensorFlow expects, in order to adjust the import of the pictures accordingly. It uses inputs in the (?,28,28) format and stores the pixel values as floats in greyscale, whereby 1.0 represents black. Inferencing in TensorFlow posed major problems, as the original pb model file could not be imported and further inference attempts using saved_model.loader and Predictor (from the tensorflow.contrib package) failed.

General observations

While experimenting with the TensorFlow inference, I made multiple observations. Models trained on the Jetson using the TensorFlow MNIST training script did not work on the MacBook and ran with poor accuracy on the Jetson. When converting these models to OpenVINO, runtime was poor on the MacBook. In contrast, models trained on the MacBook ran much faster in OpenVINO.

Getting a pretrained ONNX model

Because training in TensorFlow did not work out, it was decided to get a pretrained ONNX model and convert it to TensorFlow and OpenVINO models. However, this did not work using TensorFlow's original API, because the model missed the "serve" tag. Inference in OpenVINO was working fine and the precision issue of the NCS2 did not appear anymore. Due to inconsistent results of the inference runtime in OpenVINO, I decided to include the built-in benchmark app (the C++ implementation) in the benchmarks, since the Python version did not run.

Using Keras for the training script

After having the previously mentioned issues with TensorFlow's native API, I decided to give Keras a try. For a first implementation of a training script I used a tutorial [17] where I only exchanged the Fashion-MNIST dataset with the regular MNIST dataset. Training worked well, and the prediction also worked fine for the first time in TensorFlow, so I decided to use Keras for the whole TensorFlow part from then on. The model was saved in h5 and, using keras2onnx [18], in ONNX for the conversion to OpenVINO, which was also successful. For the first time I was able to run a consistent model across all devices.

Solving the TensorFlow accuracy problem (mentioned in general observations)

While adapting the TensorFlow inference script for Keras, I fixed the issue which led to the poor accuracy. It directly relates to a bitwise not operation in the script, which was used to mimic the way the test pictures of the original implementation were saved (inverted grayscale [19]); a small sketch of the operation is shown below.
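The following lines are a rough sketch of the preprocessing involved (the path is a placeholder and the exact scaling follows the respective script); applying the inversion when the model was trained on non-inverted data silently ruins accuracy without producing any error.

    import cv2
    import numpy as np

    # placeholder path into the MNIST png test set
    img = cv2.imread("mnist_png/testing/8/61.png", cv2.IMREAD_GRAYSCALE)  # uint8, 28x28

    inverted = cv2.bitwise_not(img)   # flips every pixel value: 0 <-> 255 (inverted grayscale)

    # shape one sample for a (N, 28, 28, 1) model input
    sample = img.astype(np.float32).reshape(1, 28, 28, 1)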

Experimenting with Async inference in OpenVINO

After discovering a sample script for asynchronous inference in OpenVINO [20], I started experimenting with it. However, there were no performance gains, probably due to the batch size being fixed to 1.

Adding multiple models to the comparison

As the previous experiments have shown, depending on the model, different platforms have their benefits regarding throughput and accuracy. Therefore, multiple models should be included in the final benchmarks. Four models were chosen, three of which are named after their number of convolutional layers, 0_conv [17], 1_conv [7] and 2_conv [21], as well as a slightly changed version of the popular LeNet-5 network [22]. For compatibility reasons, all of them use (?,28,28,1) inputs. The number of epochs for training was optimised, ending up at 40 epochs for 0_conv and 1_conv and 35 epochs for 2_conv and LeNet-5.

Experimenting with batch size in TF using built-in evaluate

After the decision to additionally use the built-in evaluate function of the Keras API in TensorFlow, I had the chance to experiment with the batch size, which was fixed to 1 in the other inference scripts. The evaluate function uses 32 as the default batch size. However, runtimes can be optimised using larger batches; for example, 512 proved to be best on the MacBook for LeNet-5.


Since the other scripts only support batch size 1, I decided not to investigate this topic further.

Focusing on TensorRT

With TensorFlow and OpenVINO ready for the final benchmarks, it was time to focus on the last part, TensorRT. It needs a CUDA-enabled GPU to run, so running it on the MacBook was not an option. Using Windows with the popular data science platform Anaconda and experimenting with Docker containers also did not result in working solutions. So, the last option was to build it on the Jetson. Due to version incompatibilities with TensorRT 5 on the Jetson Nano, its firmware was upgraded to JetPack 4.3, which includes TensorRT 6 and is compatible with the onnx_tensorrt [23] version we could get. This extension was used to successfully convert the model to the TRT format. However, given the complexity of TensorRT's API, I also wanted to use ONNX as frontend in conjunction with TensorRT's backend. This was not successful, as importing the model failed.

Experimenting with onnx-tf

During this timeframe I also experimented with onnx-tf [24], which uses ONNX as frontend for TensorFlow. A problem occurred here, as onnx-tf did not support convolutional layers. The only model without convolutional layers, 0_conv, was executed, but took around 45 minutes on the MacBook and far over one hour on the Jetson. Accuracy was fine. The problem seemed to be that it reloads the session for every inference request.

Final benchmarking

With all planned experiments done, I started the final benchmarking session with five runs for every measurement. During the benchmark runs, the NCSs of both generations disconnected after a couple of runs. The cause of this is still not known to me.

Adding ONNX runtime to the benchmarks

As the last framework, the ONNX runtime was added. It worked out of the box on the MacBook and produced faster inference times there than both TensorFlow and OpenVINO, with comparable accuracy. Installation on the Jetson using TensorRT as accelerator is also officially supported. After fixing some issues in the building process, it ran on the Jetson with competitive execution times as well. Given these results, I decided to include benchmarks with the ONNX runtime in this thesis.


2.2 Final Implementation

All of the scripts mentioned below are written in Python unless stated otherwise. Figure 2.3 gives an overview of the scripts used (grey) and models (white).

Figure 2.3: Overview of the training and inference scripts

2.2.1 Training

For the final implementation, four models were trained on the MacBook. All of them were created and trained using the Keras API in TensorFlow 1.15.0. After training, the models were saved in Keras' h5 format as well as in the more universal ONNX data format for inference with the ONNX runtime and for conversion to OpenVINO. Saving in ONNX is done by first converting the Keras model to ONNX using the keras2onnx extension and then saving it with ONNX. As the training dataset, MNIST was imported from the predefined datasets of Keras. The models were trained using FP32 precision. The four models mainly differ in the number of convolutional layers. Therefore, the names 0_conv, 1_conv, 2_conv and LeNet-5 are used. The parameters of the layers are kept similar across the models to strengthen the effect of the layers themselves. In figure 2.4 all of the models are shown side by side to give an impression of their different complexity. As mentioned before in the introduction, the layers of each model are arranged top-down in the figure. The attributes of the layers, shown in the boxes underneath the names of the layers, are also of great importance. While the LeNet-5 model has the most layers of these four models, 1_conv and 2_conv work on more complex data during a run through the model, as can be seen when comparing the kernel sizes of the Dense layers following the Flatten layer in the respective networks. Therefore the computational effort is 0_conv < LeNet-5 < 1_conv < 2_conv.


(a) 0_conv

(b) 1_conv (c) 2_conv

(d) LeNet-5

Figure 2.4: The four models side by side to show the added complexity


Input format

All of the models expect inputs in the (?,28,28,1) format. The LeNet-5 model was slightly modified to accept inputs in this format instead of the normally specified (?,32,32,1) format. The "?" denotes the batch size, which is left undeclared so that it can be set at runtime. The "28,28" (respectively "32,32") gives the pixels of the picture, in this case 28 in height and 28 in width. The "1" is the number of channels used and stands for grayscale. This input format is generally known as (N,H,W,C).
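A small sketch of this layout, assuming NumPy arrays as they are used by the inference scripts later on:

    import numpy as np

    # A batch of eight 28x28 grayscale pictures in (N, H, W, C) layout
    batch = np.zeros((8, 28, 28, 1), dtype=np.float32)
    print(batch.shape)   # (8, 28, 28, 1)

    # The inference scripts in this thesis feed one picture per request, i.e. N = 1
    single = batch[0:1]  # shape (1, 28, 28, 1)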

Compilation and fitting

The models were compiled using the "Adam" optimizer, "sparse_categorical_crossentropy" as the loss function and accuracy as the metric. Fitting the models was done with the batch size set to 128. For the 0_conv and 1_conv models I used 40 epochs; 2_conv and LeNet-5 were trained with 35 epochs for optimal accuracy.
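A condensed sketch of the training and export steps described above; the model definition is only a stand-in roughly corresponding to 0_conv (see figure 2.4 for the real layer configurations), and the file names and pixel scaling are assumptions.

    import onnx
    import keras2onnx
    from tensorflow import keras

    # MNIST from the predefined Keras datasets, reshaped to (N, 28, 28, 1)
    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

    # Stand-in model roughly corresponding to 0_conv
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28, 1)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=128, epochs=40)  # 35 epochs for 2_conv and LeNet-5

    # Save in h5 and, via keras2onnx, in ONNX for the ONNX runtime and OpenVINO
    model.save("0_conv.h5")
    onnx.save_model(keras2onnx.convert_keras(model, model.name), "0_conv.onnx")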

2.2.2 Conversion to OpenVINO

To convert the models trained in TensorFlow to OpenVINO, I used the official "model_optimizer" provided by Intel. As input, the model in the ONNX format was used. The "input_shape" parameter was set to (1,28,28,1). I saved a version in FP16 as well as one in FP32 to examine whether there are any differences in accuracy.
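The conversion calls could look roughly like this (file and directory names are placeholders; mo.py ships with the OpenVINO toolkit in the model optimizer directory). Each call produces an .xml/.bin pair, the intermediate representation that the inference engine loads.

    python3 mo.py --input_model 0_conv.onnx --input_shape [1,28,28,1] --data_type FP32 --output_dir ir/fp32
    python3 mo.py --input_model 0_conv.onnx --input_shape [1,28,28,1] --data_type FP16 --output_dir ir/fp16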

2.2.3 Inference

MNIST dataset

The test dataset of MNIST contains 10,000 pictures of handwritten digits from 0 to 9. In this thesis I used a dataset consisting of pictures in the png data format which I got from a repository on GitHub [16]. The colour values of the pixels are saved as integer values between 0 and 255. The pictures are saved in corresponding folders named from 0 to 9. All of the pictures are 28x28 pixels in grayscale.

Figure 2.5: Picture of a handwritten 8 from the MNIST dataset

source: https://github.com/myleott/MNIST_png


TensorFlow

For testing the performance in TensorFlow I used two scripts. Both use the Keras API. The first one imports the model in the h5 format and gets the MNIST dataset from the predefined datasets of Keras. It then uses the built-in evaluate function of the Keras interface, which takes a set of pictures and labels as well as a batch size as input parameters. For my tests, the batch size was left at the default value of 32. The second script also imports the model as a Keras model in the h5 format, but loads the pictures from the MNIST png dataset. They are imported in grayscale using OpenCV. The colour values of the pixels are converted to float values and each resulting image is saved in a list of numpy arrays in the format of (1,28,28,1). The script then loops through the list using the built-in predict function of Keras. From the output, the highest probability value is picked, and its class is compared to the one in a label list created beforehand. This script serves as the basis for the implementations in OpenVINO and ONNX; a condensed sketch of both scripts is shown below.
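The following sketch condenses the two TensorFlow scripts described above; paths and the exact pixel scaling are assumptions, and the real scripts additionally time their runs.

    import glob
    import cv2
    import numpy as np
    from tensorflow import keras

    model = keras.models.load_model("0_conv.h5")

    # Script 1: built-in evaluate on the predefined Keras MNIST test set, batch size 32
    (_, _), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    loss, accuracy = model.evaluate(x_test, y_test, batch_size=32)

    # Script 2: predict loop over the MNIST png pictures, batch size fixed to 1
    samples, labels = [], []
    for digit in range(10):
        for path in glob.glob("mnist_png/testing/%d/*.png" % digit):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
            samples.append(img.reshape(1, 28, 28, 1))
            labels.append(digit)

    correct = 0
    for sample, label in zip(samples, labels):
        probs = model.predict(sample)              # one picture per call
        correct += int(np.argmax(probs) == label)
    print("accuracy:", correct / len(labels))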

OpenVINO

For benchmarking in OpenVINO, two scripts are used as well. The first one is the official benchmark app of OpenVINO, which is written in C++. It imports the model and predicts inputs filled with random values for one minute. I used the benchmark app to test the theoretical throughput of the network. It uses the batch size of the model, in this case 1. For direct comparison, the second OpenVINO script is based on the second TensorFlow script and only has some minor differences: the colour values of the pixels are left unchanged as integers, and the model import and the inference calls are changed to the ones used by OpenVINO.
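A sketch of the OpenVINO variant of the predict loop, based on the classification sample it was derived from; the exact class and method names vary slightly between OpenVINO releases, the file names are placeholders, and device_name would be "CPU" on the MacBook or "MYRIAD" for the NCSs.

    import cv2
    import numpy as np
    from openvino.inference_engine import IENetwork, IECore

    ie = IECore()
    net = IENetwork(model="0_conv.xml", weights="0_conv.bin")  # IR from the model optimizer
    input_blob = next(iter(net.inputs))
    output_blob = next(iter(net.outputs))
    exec_net = ie.load_network(network=net, device_name="MYRIAD")

    img = cv2.imread("mnist_png/testing/8/61.png", cv2.IMREAD_GRAYSCALE)
    sample = img.reshape(1, 28, 28, 1)   # integer pixel values, layout as set during conversion

    result = exec_net.infer(inputs={input_blob: sample})
    prediction = int(np.argmax(result[output_blob]))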

ONNX

The ONNX script is also based on the second TensorFlow script. The only changed parts are the import of the model and the inference call; a sketch is shown below. On the Jetson, it uses TensorRT as accelerator.
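The ONNX runtime variant differs only in the session setup and the inference call; a sketch under the same assumptions as before (the file name is a placeholder):

    import numpy as np
    import onnxruntime as ort

    # On the Jetson, onnxruntime is built with the TensorRT execution provider
    session = ort.InferenceSession("0_conv.onnx")
    input_name = session.get_inputs()[0].name

    sample = np.zeros((1, 28, 28, 1), dtype=np.float32)  # one preprocessed MNIST picture
    probs = session.run(None, {input_name: sample})[0]
    prediction = int(np.argmax(probs))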

2.2.4 Limitations of the Final Implementation

To correctly interpret the results of the benchmarks, an understanding of the limitations of the implementation is crucial. In my self-written inference script, the batch size is fixed to 1. This is due to the implementation with its direct one-by-one comparison of prediction and label. With a larger batch size, higher performance is almost certain, as the experiments mentioned in the "Experiences" section showed; however, the whole implementation would have to be changed for that. The same problem exists for the asynchronous inference mode in OpenVINO, which, as also mentioned in the "Experiences" section, does not have any benefit when using batch size 1.

The second limitation is the use of TensorRT only as an accelerator for the ONNX runtime.


Thereby the performance could be reduced compared to a native TensorRT implementation. In contrast, using the ONNX runtime frontend provides the benefit of compatibility as the inference script is usable across multiple platforms.

3 Experiments

3.1 Setup

3.1.1 Hardware

For the benchmarks, the Nvidia Jetson Nano, the first and second generation of the Intel Movidius Neural Compute Stick and an Apple MacBook Pro 13-inch 2015 are used. In the following, those four devices are described in more detail.

Nvidia Jetson Nano

Figure 3.1: Nvidia Jetson Nano
source: https://upload.wikimedia.org/wikipedia/commons/c/c6/NVIDIA_Jetson_Nano_Developer_Kit_%2847616885631%29.jpg

The Jetson Nano was announced in March 2019 and is targeted at the hobbyist market as a cheap entry into machine learning, with a retail price of $99 [25]. Its specifications include a 64-bit quad-core ARM Cortex-A57 CPU and an Nvidia Maxwell GPU with 128 CUDA cores. The GPU delivers 472 GFLOPS on float operations. The 4 GB of LPDDR4 RAM are shared between CPU and GPU. Its energy consumption is specified as 10 watts. The board runs a derivative of Ubuntu. The version of Nvidia's JetPack installed in this thesis is 4.3, which includes Ubuntu 18.04 LTS aarch64 as well as multiple preinstalled tools, like OpenCV, compiled specifically for the Jetson Nano [26].


The Jetson was operated from the command line using an SSH connection during benchmarking to minimize load on the GPU.

Intel Movidius Neural Compute Stick (NCS1)

Figure 3.2: Intel Movidius Neural Compute Stick

The first generation of the Movidius NCS was released in July 2017. The NCS is a USB device which can be plugged into a USB port of the host device. Its price is about the same as that of the Nvidia Jetson Nano. The energy consumption is about one watt, excluding the host machine. The NCS is based on a Myriad 2 VPU which contains 12 SHAVE VLIW 128-bit vector processors running at 600 MHz at 0.9 V. It supports FP16 and FP32 operations as well as integer operations with 8, 16 and 32 bit precision [27]. The NCS was hosted by the MacBook for the benchmarking.

Intel Movidius Neural Compute Stick 2 (NCS2)

Figure 3.3: Intel Movidius Neural Compute Stick 2

The successor to the Movidius NCS was announced in November 2018. Like the original NCS, it can be plugged into a USB port of a host computer and supports hosts running Ubuntu 16.04.3 LTS (64 bit), Windows 10 (64 bit) and CentOS 7.4 (64 bit). Its suggested price is $69 USD as of July 14, 2019. It uses the Intel Movidius Myriad X VPU, which contains 16 SHAVE vector processors and a dedicated neural compute engine. It natively supports FP16 and 8-bit integer operations. Intel claims the processing power to be eight times that of the first-generation NCS [28]. The NCS2 was hosted by the MacBook for the benchmarking.


Apple MacBook Pro 13-inch 2015

The MacBook Pro 13-inch 2015 used was announced in March 2015 and runs macOS 10.15 (Catalina). The CPU is an Intel Core i5-5257U with two cores and a clock frequency of 2.70 GHz. 8 GB of LPDDR3-1866 RAM are installed. Throughout the benchmarks it was useful for the comparisons because it runs every used framework apart from TensorRT [29]. For all benchmarks on the MacBook, the device was plugged in to prevent throttling due to power constraints. The same programs were always running: Excel, iTerm2, PCalc, Boostnote, Dropbox, Google Drive, Magnet, DeepL and Macs Fan Control.

3.1.2 Software

The frameworks to use were determined by the hardware described in the previous subsection. This subsection gives an overview of them.

TensorFlow

TensorFlow is a machine learning framework developed by Google's Google Brain research team and was released as a free and open source software library in November 2015. It supports 64-bit Linux, macOS and Windows as well as Android and iOS. There are special variants of TensorFlow for GPUs, and TensorFlow Lite for edge devices. Since 2017 the open-source neural network library Keras, developed by François Chollet, has been part of TensorFlow, where it acts as an easy, high-level interface. Keras was released in March 2015. TensorFlow provides stable APIs for Python and C. The current version is 2.0.0, but in this thesis mostly 1.14 and 1.15 were used [30, 31].

OpenVINO

OpenVINO, short for Open Visual Inference & Neural Network Optimization, is a machine learning framework developed by Intel which was released in 2018. It was formerly known as the Intel Computer Vision SDK and primarily optimises the performance of neural networks on Intel hardware including CPUs, GPUs, VPUs (like the Movidius NCS) and FPGAs. Supported operating systems are Windows 10 (64 bit), Ubuntu 18.04.3 LTS (64 bit), CentOS 7.4 (64 bit), Yocto Project version Poky Jethro 2.0.3 (64 bit) and macOS (64 bit), but depending on the acceleration hardware used, not all operating systems can be used. It provides APIs for C and Python. The OpenVINO toolkit contains the model optimizer, which is used to import models from other frameworks like TensorFlow and to optimize them specifically for OpenVINO, and the inference engine, which runs the actual inference of the model. The inference engine is also able to run different layers on different hardware, for example the GPU and the CPU. The current OpenVINO version is 2019 R3.1; in this thesis 2019 R3 was used [32].


ONNX

ONNX, short for Open Neural Network Exchange, is an open source ecosystem for machine learning. It was announced in 2017 by Facebook and Microsoft. It is also supported by Intel, AMD, ARM and other companies. It provides its own format for the computation graph with built-in operators and data types. Supported frameworks include, among others, Caffe2, OpenVINO and OpenCV; TensorFlow and TensorRT are supported via an extension. ONNX models can also be directly executed using the ONNX runtime. Apart from the default CPU acceleration, MLAS (Microsoft Linear Algebra Subprograms) plus Eigen, it supports Nvidia CUDA, TensorRT, OpenVINO and more [33, 34].

TensorRT

TensorRT is a machine learning framework developed by Nvidia which mainly optimizes inference for their graphics processors. It is built on Nvidia's CUDA and provides support for FP32, FP16 and INT8 optimizations. TensorRT selects specific kernels depending on the target platform, for example Tesla GPUs or the Jetson Nano. It can import models from most machine learning frameworks including TensorFlow, Matlab and ONNX. ONNX can also be used as a frontend for TensorRT. The current version is TensorRT 7; however, in this thesis TensorRT 6 was used for compatibility reasons [35, 23]. In this thesis, TensorRT is only used as an accelerator for the ONNX runtime.

3.2 Benchmarking Methods

I ran the benchmarks using the frameworks and hardware mentioned in the Setup section. The scripts and models were described in the Final Implementation section. The last variable is the precision. For the benchmarks, FP32 and FP16 were used. FP32, also referred to as single-precision floating point, is a floating point data format which uses 32 bits, of which one is used for the sign, eight for the exponent and 23 for the fraction [36]. FP16, also called half-precision floating point, uses 16 bits. One of those is used for the sign, five for the exponent and 10 for the fraction [36]. Comparing those two formats, FP32 is more accurate, but calculations can be slower depending on the chip architecture used. Some chip architectures do not even support FP32 calculations.
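The effect of the smaller fraction can be seen directly in NumPy; a tiny illustration (not taken from the thesis scripts):

    import numpy as np

    third_fp32 = np.float32(1.0) / np.float32(3.0)  # 8-bit exponent, 23-bit fraction
    third_fp16 = np.float16(1.0) / np.float16(3.0)  # 5-bit exponent, 10-bit fraction

    print("%.10f" % third_fp32)  # close to 1/3
    print("%.10f" % third_fp16)  # noticeably larger rounding error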

Here is a list of all the benchmarks that were run:

• TensorFlow using the evaluate script with FP32 precision on the MacBook and the Jetson

• TensorFlow using the predict script with FP32 precision on the MacBook and the Jetson

• OpenVINO using the predict script with FP32 precision on the MacBook


• OpenVINO using the predict script with FP16 precision on the MacBook and the NCSs of both generations

• OpenVINO using the Benchmark App with FP32 precision on the MacBook

• OpenVINO using the Benchmark App with FP16 precision on the MacBook and the NCSs of both generations

• ONNX runtime using the predict script with FP32 precision on the MacBook using the CPU and, on the Jetson, using TensorRT as accelerator

All these benchmarks were run on the four models mentioned before, 0_conv, 1_conv, 2_conv and LeNet-5. For each model five runs were executed, and an average was calculated.

3.3 Limitations of the Benchmarking Methods

While most tests were run with FP32 precision, the NCSs could only be tested with FP16 because of hardware limitations. The frameworks were also limited to certain devices; running TensorFlow on one of the NCSs, for example, was not possible, as Intel only provides the NCS interface and drivers for its own software OpenVINO. The same applies to TensorRT, which specifically targets the CUDA cores of Nvidia's own GPUs. The MacBook shows a bit more runtime variation because of its more complex system and the greater number of background programs running, which require more scheduling. This should be averaged out by using multiple runs.

4 Results

This chapter evaluates the results of the three frameworks used. The first subchapter focuses on TensorFlow as this was the framework used for training the models in the first place.

4.1 TensorFlow

4.1.1 Results of the Evaluate Script

The evaluate script measures the runtime per sample in microseconds. From this, I calculated the runtime for the whole test set of 10,000 pictures in seconds myself (for example, 35.4 µs per sample × 10,000 samples = 0.354 s for 0_conv on the MacBook).

                             0_conv    1_conv     2_conv     LeNet-5
Average runtime per sample   35.4µs    127.8µs    340.2µs    91.6µs
Average runtime              0.354s    1.278s     3.402s     0.916s
Accuracy                     98.23%    98.99%     99.19%     98.88%

Table 4.1: Accuracy and average runtime on the MacBook in TensorFlow with FP32 precision in the evaluate script

                             0_conv    1_conv        2_conv        LeNet-5
Average runtime per sample   271.8µs   1ms           1ms           1.4ms
Average runtime              2.718s    approx. 10s   approx. 10s   approx. 14s
Accuracy                     98.23%    98.99%        99.19%        98.88%

Table 4.2: Accuracy and average runtime on the Jetson Nano in TensorFlow with FP32 precision in the evaluate script

The times for the 1_conv, 2_conv and LeNet-5 models in table 4.2 are given in milliseconds, as opposed to microseconds for the other results. These results show the MacBook as the faster device. Based on the results of the 0_conv model, the MacBook was around 7.7 times faster than the Jetson Nano.


4.1.2 Results of the Predict Script

The predict script measures the runtime for the whole test set of 10,000 pictures in seconds.

                  0_conv   1_conv   2_conv   LeNet-5
Average runtime   20.67s   27.85s   35.25s   23.68s
Accuracy          98.23%   98.99%   99.19%   98.88%

Table 4.3: Accuracy and average runtime on the MacBook in TensorFlow with FP32 precision in the predict script

                  0_conv    1_conv    2_conv    LeNet-5
Average runtime   110.97s   144.49s   166.95s   138.33s
Accuracy          98.23%    98.99%    99.19%    98.88%

Table 4.4: Accuracy and average runtime on the Jetson Nano in TensorFlow with FP32 precision in the predict script

Compared to the evaluate script, the runtimes of the predict script are much higher. Comparing 0_conv on the MacBook, the evaluate script was 58 times faster. Using the predict script, the MacBook is also faster than the Jetson; comparing the results for 0_conv again, it is more than 5 times faster. Comparing the results among the models themselves on the Jetson and the MacBook respectively, the results are much closer in percentage terms than with the evaluate script. One of the reasons for this difference is probably batching, as the evaluate script uses batch size 32 compared to 1 in the predict script. Generally, the predict script is far less optimised, which should explain the big difference.


4.2 OpenVINO

4.2.1 Results of the Benchmark App

The benchmark app measures the processed frames per second (FPS) using random data inputs.

              0_conv     1_conv    2_conv    LeNet-5
Average FPS   12642.21   2822.01   1340.37   8543.43

Table 4.5: Average FPS on the MacBook using FP32 precision in the OpenVINO Benchmark App

              0_conv     1_conv    2_conv    LeNet-5
Average FPS   12651.72   2824.04   1348.12   8559.28

Table 4.6: Average FPS on the MacBook using FP16 precision in the OpenVINO Benchmark App

As the two tables above show, there is no difference in runtime between FP32 and FP16 precision on the MacBook apart from measurement inaccuracies. This is no surprise when considering the CPU architecture of the MacBook: due to the floating point units of the machine, every 16-bit floating point number is treated like a 32-bit floating point number, and therefore no performance gains can be achieved. More interesting are the results compared to the NCSs, with both generations running FP16 precision.

              0_conv   1_conv   2_conv   LeNet-5
Average FPS   517.66   359.74   260.54   390.07

Table 4.7: Average FPS on the NCS1 using FP16 precision in the OpenVINO Benchmark App

Between the two NCSs, the NCS2 is faster with every model. However, the NCS2 has an especially big advantage when using 2_conv, the model with two convolutional layers. Compared to the 11% advantage in 0_conv, 2_conv runs nearly 53% faster on the newer stick. The marketing of optimised performance for convolutional layers seems to hold true in this case.

When comparing the MacBook to the NCSs, the MacBook is clearly faster. For 0_conv, the MacBook is 24 times faster than the 1st gen stick and around 22 times faster than the 2nd generation. 2_conv again shows a different picture.

              0_conv   1_conv   2_conv   LeNet-5
Average FPS   575.91   451.42   398.07   491.10

Table 4.8: Average FPS on the NCS2 using FP16 precision in the OpenVINO Benchmark App

For 2_conv, the MacBook is only 5 times faster than the NCS1 and 3.3 times faster than the NCS2. Looking at these results, the NCSs seem to process convolutional layers much more efficiently.

4.2.2 Results of the Predict Script

The predict script measures the runtime for the whole test set in seconds.

                  0_conv   1_conv   2_conv   LeNet-5
Average runtime   2.00s    5.92s    9.97s    2.76s
Accuracy          98.09%   92.36%   99.05%   98.65%

Table 4.9: Accuracy and average runtime on the MacBook in OpenVINO with FP32 precision in the predict script

                  0_conv   1_conv   2_conv   LeNet-5
Average runtime   2.02s    5.90s    9.99s    2.60s
Accuracy          98.09%   92.34%   99.05%   98.65%

Table 4.10: Accuracy and average runtime on the MacBook in OpenVINO with FP16 precision in the predict script

As seen with the Benchmark App before, there are no real differences regarding runtime on the MacBook between FP16 and FP32 precision. An interesting detail to note is the inconsistent accuracy of 1_conv using the FP16 model.


                  0_conv   1_conv   2_conv   LeNet-5
Average runtime   20.71s   29.10s   39.85s   27.07s
Accuracy          98.09%   92.35%   99.05%   98.65%

Table 4.11: Accuracy and average runtime on the NCS1 in OpenVINO with FP16 precision in the predict script

                  0_conv   1_conv   2_conv   LeNet-5
Average runtime   18.77s   23.45s   26.45s   21.64s
Accuracy          98.08%   92.2%    99.05%   95.0%

Table 4.12: Accuracy and average runtime on the NCS2 in OpenVINO with FP16 precision in the predict script

Like in the Benchmark App, the NCS2 is faster across the board compared to the NCS1. For 0_conv there is a 10% performance increase, for 2_conv it increases by 50%. The accuracy varies as well. While 0_conv is slightly less accurate on the NCS2, the LeNet-5 accuracy drops significantly. Also, for 1_conv it drops by a small amount (less than 1%). Comparing runtimes to the MacBook, the MacBook is still the faster device with every model. For 0_conv it is around 10 times faster than the NCS1 and 9.4 times faster than the NCS2. The advantage decreases with more convolutional layers being added. For 2_conv it is around 4 times faster than the NCS1 and 2.6 times faster than the NCS2.

4.3 ONNX Runtime

The predict script measures the runtime for the whole test set in seconds.

                  0_conv   1_conv   2_conv    LeNet-5
Average runtime   0.61s    5.11s    10.18s    1.59s
Accuracy          98.23%   98.99%   99.19%    98.88%

Table 4.13: Accuracy and average runtime on the MacBook in the ONNX runtime with FP32 precision in the predict script

While accuracy is consistent across both devices, the MacBook again proves to be faster. For 0_conv it is 10 times faster than the Jetson, and for 2_conv it is 1.8 times faster. An unexpected behaviour is shown by the Jetson, which is reproducibly quite slow with 1_conv.


                  0_conv   1_conv    2_conv    LeNet-5
Average runtime   6.15s    24.44s    18.25s    8.83s
Accuracy          98.23%   98.99%    99.19%    98.88%

Table 4.14: Accuracy and average runtime on the Jetson Nano in the ONNX runtime with FP32 precision in the predict script

As the ONNX runtime uses TensorRT as an accelerator and optimizer, it could be that TensorRT was not able to apply as much optimization to 1_conv as to the other models.

4.4 Comparison of the Results between Frameworks

4.4.1 Accuracy For accuracy, Tensorflow is the benchmark as it was the framework in which all the models were trained and saved. ONNX and OpenVINO use converted versions of the models.

Figure 4.1: Accuracies

As chart 4.1 shows, TensorFlow and ONNX have the same accuracies across all the models. In OpenVINO, however, accuracy is lost. This holds true for FP32 as well as FP16. The 1_conv model especially has a significant drop in accuracy. On the NCS2, the LeNet-5 model additionally drops by a significant margin.


4.4.2 Runtimes

Given the complexity and size of the networks, the runtimes usually line up in the following order: 0_conv < LeNet-5 < 1_conv < 2_conv. The following diagrams compare the results of the predict script model by model. All runtimes are shown in seconds on the y-axis. The x-axis lists the different devices.

Figure 4.2: Runtimes of 0_conv Figure 4.3: Runtimes of 1_conv (lower is better) (lower is better)

Figure 4.4: Runtimes of 2_conv Figure 4.5: Runtimes of LeNet-5 (lower is better) (lower is better)


Starting with the 0_conv runtimes in figure 4.2, the comparison shows big differences between the devices and frameworks. While the MacBook using TensorFlow and both NCSs are almost equally fast, the Jetson Nano is more than 5 times slower. The MacBook using ONNX is the fastest, followed by the MacBook with OpenVINO and the Jetson with ONNX. The runtimes of 1_conv in figure 4.3 show a similar picture. The NCS2 gains a bit more advantage over the NCS1. The Jetson Nano is slower than expected, while the fastest device is still the MacBook using the ONNX runtime. In figure 4.4, the 2_conv runtimes show a somewhat clearer picture. While the NCS2 further gains against the NCS1, the Jetson performs as expected in ONNX and beats both NCSs. The fastest device this time is the MacBook using OpenVINO, which seems to process the additional convolutional layer faster than ONNX. LeNet-5, shown in figure 4.5, performs almost like 1_conv but is a bit faster overall, the only bigger difference being the Jetson with ONNX performing as expected and being the third-fastest device in the benchmark.

Apart from the predict script, which does not use the full performance potential as discussed previously, I have done some more tests to measure maximum performance. The following diagram shows the highest frames per second achieved in the respective framework on the MacBook and the edge devices. I used the results of the evaluate script for TensorFlow and the results of the Benchmark App for OpenVINO. Important to note is that there was no specific tool for ONNX runtime, so I used the results of the predict script.

Figure 4.6: FPS on the MacBook in optimized scripts (higher is better)

As can be seen in figure 4.6, TensorFlow has the highest throughput on all models and holds a big lead on all models apart from LeNet-5.


ONNX runtime seems to be worse than the other frameworks at processing convolutional layers: it holds the second spot for 0_conv but loses it to OpenVINO on the other models. OpenVINO is restricted by batch size 1 and synchronous inference. Even with its convolutional layer, LeNet-5 has a considerable advantage over the other two models with convolutional layers.

Figure 4.7: FPS on the Edge Devices in optimized scripts (higher is better)

In chart 4.7, the FPS of the edge devices are compared. There are only results for 0_conv with TensorFlow, as the other results were too inaccurate to include. TensorFlow on the Jetson is the fastest, followed by the Jetson with ONNX and TensorRT acceleration. OpenVINO has the same restrictions as mentioned above. Comparing the devices, the Jetson is faster than both NCSs. Its worse performance in 1_conv is probably due to the lacking optimization mentioned before. However, with a growing number of convolutional layers the gap closes due to the more efficient processing in OpenVINO, as seen in the diagram for the MacBook before. Compared to the MacBook, all the edge devices perform considerably worse across the board, but they close the gap when more convolutional layers are used.

5 Conclusion and Outlook

The following sections conclude the thesis and give an outlook on future work and interesting additions to the experiments made. The conclusion is done separately for the software and the hardware.

5.1 Conclusion

There are multiple takeaways from my experiences in implementing and benchmarking in the used frameworks. Beginning with TensorFlow, the native API is lower level and therefore less intuitive and, without well-founded knowledge, prone to bugs. It is easier to use a higher-level API like Keras or PyTorch. When running unoptimised code with the Keras API, the inference times are not as good as on the other two tested frameworks. However, after optimising inference and batch size, it proved to be the best performing framework in the comparison. OpenVINO is the second fastest framework, both with optimised code and with my own script. Compared to ONNX, it has better performance when used with convolutional layers, but falls short of TensorFlow. Without the restrictions of batch size 1 and synchronous inference, the results of the optimised script would probably be higher. ONNX, more specifically the ONNX runtime, has the best performance when using unoptimised code. Another advantage is the high portability of the code, which works with CPU, TensorRT and OpenVINO as accelerator without further optimisations. With optimisations, the performance can probably be improved further. When focusing on the accuracy of the models, TensorFlow is the benchmark as it was used for training. ONNX has the same accuracy after conversion across the board. Converting to OpenVINO proved lossy on all models I have tested. The drop in accuracy depends on the model and ranges from barely measurable (less than 1%) to a rather significant few percentage points (between 5% and 6%).

When comparing devices, the MacBook proved to be the fastest across all the benchmarks. However, with a higher number of convolutional layers, the edge devices closed the gap. This is the case because their architecture is better optimised for neural networks and convolutional layers in particular [37]. Comparing the edge devices with each other, the most obvious observation was the higher throughput of the second generation of the Neural Compute Stick compared to the first generation. Still, the NCS2 showed problems with accuracy when used with certain models. The Jetson Nano was far slower than the competition when used with unoptimised code in TensorFlow. However, when using TensorRT as accelerator for the ONNX runtime, the Jetson was faster than the NCSs with every model but one.

5.2 Outlook

While this thesis provides a look at the current state of machine learning on the edge, there are additional topics in this subject area that deserve further investigation. This chapter names a few of them and should motivate future work based on this thesis. A topic which came up multiple times was the influence of the batch size. The restriction to a batch size of one leaves potential unused, as some experiments in the thesis showed. The batch size should be adjusted for every device and framework to ensure the best performance. This would, for example, make asynchronous inference in OpenVINO worthwhile. Possibly the edge devices would perform better with more parallelism, as their architecture is specifically designed to deal with it [37]. Another topic is the influence of different precisions like FP32 and FP16 on the accuracy of a neural network. The few experiments in the thesis hint at a minor influence, but depending on the specific model and data there might be a bigger difference. Runtime was similar as well, which can be explained by the CPU architecture of the MacBook. More interesting would be a comparison on a device like the Jetson Nano. The precision could even be dropped further to INT8, for example in TensorRT [35]. This promises even better performance in exchange for a lower precision. Accuracy issues on the NCS2 occurred multiple times during this thesis. As they are specific to certain models, it was not possible to identify the cause. More research would be needed to fully understand the underlying issue. As the ONNX runtime is a very portable interface, a more in-depth comparison between the different accelerators would show throughput differences between the underlying frameworks. Specifically, comparisons using OpenVINO would give further insight. Also, a proper implementation of a TensorRT script could add to a more complete overview, as the use as an accelerator only might degrade performance. Through parallelisation of inference across multiple devices, throughput could be improved considerably. For example, OpenVINO supports inference using multiple devices simultaneously [38]. Multi-node setups are also used to improve performance in the training process [39]. Lastly, Google Coral should be mentioned. Two versions of this device exist, a developer board and a USB accelerator. While the board is a fully functional computer like the Jetson Nano, the USB accelerator competes with the Intel NCSs. Both support TensorFlow Lite, a customised TensorFlow version for mobile devices [40]. Their performance seems to stack up well against the Jetson Nano and the NCSs according to a publication by Mattia Antonini et al. [2]. Overall, edge devices can be a big improvement when used with the right models or in environments where a full-size desktop computer or a notebook is not suitable. As the market is quite heterogeneous and compatibility between frameworks is not always guaranteed, the best device and framework first and foremost depend on the specific use case.

List of Figures

1.1 A simple neural network with three layers
1.2 Visual representation of one of the used models in this thesis
1.3 Functionality of a convolutional layer
2.1 Precision of the NCS2
2.2 Precision of the MacBook
2.3 Overview of the training and inference scripts
2.4 The four models side by side to show the added complexity
2.5 Picture of a handwritten 8 from the MNIST dataset
3.1 Nvidia Jetson Nano
3.2 Intel Movidius Neural Compute Stick
3.3 Intel Movidius Neural Compute Stick 2
4.1 Accuracies
4.2 Runtimes of 0_conv (lower is better)
4.3 Runtimes of 1_conv (lower is better)
4.4 Runtimes of 2_conv (lower is better)
4.5 Runtimes of LeNet-5 (lower is better)
4.6 FPS on the MacBook in optimized scripts (higher is better)
4.7 FPS on the Edge Devices in optimized scripts (higher is better)

List of Tables

4.1 Accuracy and average runtime on the MacBook in TensorFlow with FP32 precision in the evaluate script
4.2 Accuracy and average runtime on the Jetson Nano in TensorFlow with FP32 precision in the evaluate script
4.3 Accuracy and average runtime on the MacBook in TensorFlow with FP32 precision in the predict script
4.4 Accuracy and average runtime on the Jetson Nano in TensorFlow with FP32 precision in the predict script
4.5 Average FPS on the MacBook using FP32 precision in the OpenVINO Benchmark App
4.6 Average FPS on the MacBook using FP16 precision in the OpenVINO Benchmark App
4.7 Average FPS on the NCS1 using FP16 precision in the OpenVINO Benchmark App
4.8 Average FPS on the NCS2 using FP16 precision in the OpenVINO Benchmark App
4.9 Accuracy and average runtime on the MacBook in OpenVINO with FP32 precision in the predict script
4.10 Accuracy and average runtime on the MacBook in OpenVINO with FP16 precision in the predict script
4.11 Accuracy and average runtime on the NCS1 in OpenVINO with FP16 precision in the predict script
4.12 Accuracy and average runtime on the NCS2 in OpenVINO with FP16 precision in the predict script
4.13 Accuracy and average runtime on the MacBook in the ONNX runtime with FP32 precision in the predict script
4.14 Accuracy and average runtime on the Jetson Nano in the ONNX runtime with FP32 precision in the predict script

Bibliography

[1] N. N. Learning Machines. McGraw Hill, 1965.

[2] M. Antonini et al. "Resource Characterisation of Personal-Scale Sensing Models on Edge Accelerators". In: AIChallengeIoT. 2019.

[3] Y.-Y. C. Y.-H. L. C.-C. K. M.-H. C. I.-H. Yen. "Design and Implementation of Cloud Analytics-Assisted Smart Power Meters Considering Advanced Artificial Intelligence as Edge Analytics in Demand-Side Management for Smart Homes". In: Sensors (2019).

[4] The Machine Learning Dictionary. url: http://www.cse.unsw.edu.au/~billw/mldict.html.

[5] Deep Learning Vs Neural Networks - What's The Difference? url: https://bernardmarr.com/default.asp?contentID=1789.

[6] What is batch size in neural network? url: https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network.

[7] S. Kothawade. Build your first image classification model with MNIST dataset. 2019. url: https://medium.com/analytics-vidhya/build-your-1st-deep-learning-classification-model-with-mnist-dataset-1eb27227746b.

[8] Classification: Accuracy. url: https://developers.google.com/machine-learning/crash-course/classification/accuracy.

[9] Neuronale Netze. url: https://user.phil.hhu.de/~petersen/SoSe17_Teamprojekt/AR/neuronalenetze.html.

[10] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". In: Journal of Machine Learning Research (2014).

[11] THE MNIST DATABASE of handwritten digits. url: http://yann.lecun.com/exdb/mnist/.

[12] Image Classification. url: https://github.com/tensorflow/models/tree/master/official/vision/image_classification.

[13] MNIST in TensorFlow. url: https://github.com/tensorflow/models/tree/master/official/r1/MNIST.

[14] Protocol Buffers. url: https://developers.google.com/protocol-buffers.

[15] Image Classification Python Sample. url: https://github.com/opencv/dldt/tree/2019/inference-engine/ie_bridges/python/sample/classification_sample.

[16] MNIST_png. url: https://github.com/myleott/MNIST_png.

[17] Basic classification: Classify images of clothing. url: https://www.tensorflow.org/tutorials/keras/classification.

[18] keras2onnx. url: https://github.com/onnx/keras-onnx.

[19] Bitwise operation: NOT. url: https://docs.opencv.org/master/d0/d86/tutorial_py_image_arithmetics.html.

[20] Image Classification Python Sample Async. url: https://github.com/opencv/dldt/tree/2019/inference-engine/ie_bridges/python/sample/classification_sample_async.

[21] Keras examples directory: MNIST_cnn.py. url: https://github.com/keras-team/keras/blob/master/examples/MNIST_cnn.py.

[22] M. Gazar. LeNet-5 in 9 lines of code using Keras. 2018. url: https://bit.ly/33eoY4O.

[23] TensorRT backend for ONNX. url: https://github.com/onnx/onnx-tensorrt.

[24] Tensorflow Backend for ONNX. url: https://github.com/onnx/onnx-tensorflow.

[25] NVIDIA Announces Jetson Nano: $99 Tiny, Yet Mighty NVIDIA CUDA-X AI Computer That Runs All AI Models. 2019. url: https://nvidianews.nvidia.com/news/nvidia-announces-jetson-nano-99-tiny-yet-mighty-nvidia-cuda-x-ai-computer-that-runs-all-ai-models.

[26] Nvidia Jetson. url: https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-nano/.

[27] M. Fischer. 1-Watt-Rechenstick: Movidius Neural Compute Stick für maschinelles Sehen. 2017. url: https://www.heise.de/newsticker/meldung/1-Watt-Rechenstick-Movidius-Neural-Compute-Stick-fuer-maschinelles-Sehen-3780324.html.

[28] Intel Neural Compute Stick 2: High Performance, Low Power for AI Inference. url: https://software.intel.com/sites/default/files/managed/80/10/ncs2-data-sheet.pdf.

[29] T. Schönborn. Apple MacBook Pro Retina 13 (Early 2015) Notebook Review. 2015. url: https://www.notebookcheck.net/Apple-MacBook-Pro-Retina-13-Early-2015-Notebook-Review.139621.0.html.

[30] Keras: The Python Deep Learning library. url: https://keras.io/.

[31] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Y. Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu and Xiaoqiang Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. url: https://www.tensorflow.org/.

[32] OpenVINO: Deploy high-performance, deep learning inference. url: https://software.intel.com/en-us/openvino-toolkit.

[33] ONNX Runtime. url: https://github.com/microsoft/onnxruntime.

[34] S. Shah. Microsoft and Facebook's open AI ecosystem gains more support. 2017. url: https://www.engadget.com/2017/10/11/microsoft-facebooks-ai-onxx-partners/.

[35] NVIDIA TensorRT: Programmable Inference Accelerator. url: https://developer.nvidia.com/tensorrt.

[36] Survey of Floating-Point Formats. url: http://www.mrob.com/pub/math/floatformats.html.

[37] S. Shah. Do we really need GPU for Deep Learning? - CPU vs GPU. 2018. url: https://medium.com/@shachishah.ce/do-we-really-need-gpu-for-deep-learning-47042c02efe2.

[38] OpenVINO: Multi-Device Plugin. url: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_MULTI.html.

[39] Guide to multi-node training with Intel Distribution of Caffe. url: https://github.com/intel/caffe/wiki/Multinode-guide.

[40] Coral: Products. url: https://coral.ai/products/.