3/10/2021

Agenda

TODAY
Frameworks for Deep Learning.
Implementing simple networks with Keras.
Implementing LeNet-5 with Keras and Caffe.

TOMORROW
Implementing Deep Neural Networks with TensorFlow 1.x.
TensorBoard.
The TensorFlow Eager API.

02/03/2021
Overview of the Apollo Framework for Autonomous Driving.

03/03/2021
The Perception and Prediction modules of Apollo.

Frameworks for DNNs

DNNs are typically developed, trained, and deployed for inference by means of specific frameworks (i.e., toolkits, libraries), because they:

• provide a simple high-level interface (mostly in Python);
• simplify the creation of new DNNs using a large set of pre-implemented layers (e.g., convolutions);
• allow training DNNs and running inference on different devices (e.g., CPUs, GPUs, mobile) by changing a few lines of code;
• benefit from frequent library updates with new features.

In addition, many examples and pre-trained networks are available online.

Frameworks for DNNs

The deep learning landscape is constantly changing, hence many frameworks have been developed in recent years, most of them open source. Different frameworks have different features, hence none of them is the best for all problems. Let's consider some of them.

Theano
• The first widely adopted framework.
• Created in 2007 at the Montreal Institute for Learning Algorithms (MILA), headed by Yoshua Bengio, one of the pioneers of deep learning.
• Developed in Python.
• Can run DNNs on either CPU or GPU architectures.
• Since 2017, it is no longer maintained, due to the release of many other frameworks developed by large companies.

Frameworks for DNNs

• Caffe (2013) by UC Berkeley.
• TensorFlow (2015) by Google.
• MXNet (2015) by Amazon.
• CNTK (2016) (Computational Network Toolkit) by Microsoft. Now called Microsoft Cognitive Toolkit.
• Caffe2 (2017) by Facebook as an evolution of Caffe.
• PyTorch (2017) by Facebook. Caffe2 and PyTorch are going to be integrated into a single platform.
• DL4J (2017) (Deep Learning for Java) by Skymind as a deep learning library for Java and Scala.

Different design choices

[Figure: frameworks classified by model specification (configuration file vs. programmatic generation) and by typical usage language (C++, Python, R, Java).]

High-level APIs

Besides these frameworks, there are also high-level interfaces that are wrapped around one or multiple frameworks.

Keras
• High-level Python deep learning interface.
• Released in 2015 by François Chollet (Google).
• Python-based library for fast experimentation with DNNs.
• It runs on top of TensorFlow, CNTK, Theano, or PlaidML.
• User-friendly, modular, and extensible.
• Quote: "API designed for human beings, not machines".
• It allows creating a DNN by stacking layers, specifying only the layer types rather than the math operations.

Gluon
• Released in 2017 by Amazon and supported by Microsoft.
• It wraps MXNet and soon it will also include CNTK.
• Gluon is a direct competitor of Keras.

Sonnet
• Released in 2017 by Google's DeepMind.
• It is built on top of TensorFlow.

Open Neural Network Exchange (ONNX)

• In 2017, Microsoft, Amazon, Facebook, and others launched ONNX, an Open Neural Network Exchange format to represent deep learning models and port them between different frameworks.
• ONNX enables models to be trained in one framework and transferred to another for inference.
• ONNX supports Caffe2, CNTK, MXNet, PyTorch, and TensorFlow.

Frameworks at a glance

[Figure: open source deep learning frameworks (2017) and the companies backing them (Google, Microsoft, Amazon, Facebook).]

The complete stack

[Figure: layered view of the deep learning stack]
• Application: DNN
• High-level APIs: Keras, Gluon, ONNX
• Mid-level APIs: Theano, Caffe, TensorFlow, PyTorch, CNTK, MXNet
• Language: Python, C++, Java
• OS: Windows, Android, iOS
• HW: CPU, DSP, GPU, TPU

Popularity

GitHub users can "star" a repository to indicate that they like it, so the number of GitHub stars can be used to measure how popular a project is.

[Chart: number of GitHub stars for each framework.]

Combined metrics

A power score can be computed by combining several criteria, such as articles, books, citations, GitHub activity, Google search volume, etc.

[Chart: Power score 2018.]

TensorFlow

TensorFlow is among the most used frameworks, together with Caffe and PyTorch.

Released in 2015 by Google.

Main features:
• Large community and support;
• TensorBoard visualization tool;
• Scalability to many platforms;
• Good library management and updates;
• Not so intuitive interface;
• Slower than other frameworks.

The TensorFlow Stack

[Figure: layered view of TensorFlow]
• High-level TF APIs: Estimators, Keras, TF Learn, TF-Slim
• Mid-level TF APIs: Datasets, Layers, Metrics, Losses
• Language frontend: Python, C++
• TF runtime: TensorFlow Distributed Execution Engine
• Operating systems: Windows, Linux, Android, iOS
• Hardware: CPU, DSP, GPU, TPU

Caffe (Convolutional Architecture for Fast Feature Embedding)

Released in 2013 by UC Berkeley.

Main features:
• Excellent for CNNs for image processing.
• While in TF the network is created by programming, in Caffe layers are created by specifying a set of parameters.
• Quite fast compared to other frameworks.
• Command line, Python, C++, and MATLAB interface.
• Not so easy to learn.
• Not so good with recurrent neural networks and sequence models.

Comparing frameworks

Framework | Languages | Tutorial material | CNN modeling | RNN modeling | Easy-to-use API | Speed | Multiple GPU support | Keras compatible
Theano | Python, C++ | ++ | ++ | ++ | + | ++ | NO | YES
TensorFlow | Python, C++ | +++ | +++ | ++ | +++ | ++ | YES | YES
PyTorch | Python, C++ | + | +++ | ++ | ++ | +++ | YES | NO
Caffe | Python, C++ | + | +++ | NO | + | + | YES | NO
MXNet | Python, R, Julia, Scala | ++ | ++ | + | ++ | ++ | YES | YES
CNTK | Python, C++ | + | + | +++ | + | ++ | YES | YES
DL4J | Java, Scala | +++ | +++ | +++ | ++ | ++ | YES | YES

PyTorch

Released in 2017 by Facebook.

Main features:
• Support for dynamic graphs. This is efficient when the input varies, as in text processing. In 2017, TensorFlow introduced Eager Execution, to evaluate operations immediately, without building graphs.
• Interactive debugging.
• Easier to get started.
• Blend of high-level and low-level APIs.
• Limited documentation.
• No graphic visualization tools.

NVIDIA framework

NVIDIA also provides support for DNN development, but only on top of its GPU platforms:
• Application: DNN
• Mid-level APIs: Caffe, PyTorch, Theano
• Optimization libraries: cuDNN, cuBLAS
• Language: CUDA
• HW: GPU

Data Set Finders

There are a lot of data sets on the internet for training DNNs. The following are general data set repositories that allow searching for the one you need:
• Kaggle: www.kaggle.com
• UCI Machine Learning Repository: mlr.cs.umass.edu/ml
• VisualData: www.visualdata.io
• CMU Library: guides.library.cmu.edu/machine-learning/datasets

Data sets are usually divided into categories.

Data sets

Natural images
• MNIST: handwritten digits (yann.lecun.com/exdb/mnist/).
• CIFAR10 / CIFAR100: 32×32 image dataset with 10 / 100 categories (www.cs.utoronto.ca/~kriz/cifar.html).
• COCO: large dataset for object detection, segmentation, and captioning (cocodataset.org).
• ImageNet: large image database organized according to the WordNet hierarchy (www.image-net.org).
• Pascal VOC: dataset for image classification, object detection, and segmentation (https://pjreddie.com/projects/pascal-voc-dataset-mirror/).
• COIL 20: 128×128 images of 20 objects taken at different rotation angles (www.cs.columbia.edu/CAVE/software/softlib/coil-20.php).
• COIL100: 128×128 images of 100 objects taken at different rotation angles (www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php).

Data sets

Faces
• Labelled Faces in the Wild: 13,000 images of faces collected from the web, labelled with the person's name (vis-www.cs.umass.edu/lfw).
• Olivetti: images of the faces of several people at different angles (www.cs.nyu.edu/~roweis/data.html).
• Sheffield: 564 images of 20 individuals, each shown in a range of poses (https://www.sheffield.ac.uk/eee/research/iel/research/face).

Text
• 20 newsgroups: classification task to map word occurrences into 20 newsgroup IDs (qwone.com/~jason/20Newsgroups).
• Penn Treebank: used for next word prediction or next character prediction (corochann.com/penn-tree-bank-ptb-dataset-introduction-1456.html).
• Broadcast News: large text dataset used for next word prediction (https://github.com/cyrta/broadcast-news-videos-dataset).

Data sets

Speech
• TIMIT Speech Corpus: DARPA Acoustic-Phonetic Continuous Speech Corpus for phoneme classification (github.com/philipperemy/timit).
• Aurora: TIMIT with noise and additional information (aurora.hsnr.de).

Music
• Piano-midi.de: classical piano pieces (www.piano-midi.de).
• Nottingham: over 1000 folk tunes (abc.sourceforge.net/NMD).
• MuseData: a collection of classical music scores (musedata.stanford.edu).
• JSB Chorales: a dataset of four-part harmonized chorales (https://github.com/czhuang/JSB-Chorales-dataset).
• FMA: a dataset For Music Analysis (github.com/mdeff/fma).

MNIST

MNIST is the most popular dataset of handwritten digits. It contains a training set of 60,000 examples and a test set of 10,000 examples.
• Size: 28 x 28
• Grey levels: 256 (pixel values from 0 to 255)
• Label: 0, 1, …, 9
• train[i][0] or test[i][0]: i-th example
• train[i][1] or test[i][1]: i-th label

CIFAR-10

CIFAR-10 consists of 60,000 32x32 color images organized in 10 classes (6,000 images per class). There are 50,000 training images and 10,000 test images.

Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

COCO

COCO is a large-scale and rich dataset for object detection, segmentation, and captioning.
• 330,000 images
• 1.5 million object instances
• 80 object categories
• 91 stuff categories
• 5 captions per image
• 250,000 people with keypoints

Olivetti Faces

Olivetti Faces is a dataset containing 400 images of the faces of several people at different angles.
• Number of images: 400
• Image size: 64x64
• Color depth: 8 bit, grayscale [0-255]

TensorFlow and Keras

First of all, it must be clarified that:
• Keras is an open-source neural network library that runs on top of TensorFlow and other frameworks (its backends);
• Since 2017, the Keras API has been integrated in TensorFlow and can be used as tf.keras (in a Python environment);
• So neural networks can be developed either with standalone Keras or with TensorFlow, using the high-level API provided by tf.keras.

Installing TF using Pip

TensorFlow and Keras may be installed by using Pip, the package manager of Python. Pip is automatically installed when installing Python.

Installation procedure
1. Download and install Python for Windows / Linux / Mac OS from https://www.python.org/downloads/
2. Open a terminal and issue the commands:

    C:\User> pip install
    C:\User> python3 -m pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl
    C:\User> pip install keras

https://www.tensorflow.org/install/pip?lang=python3

Installing Anaconda

Another way to install TensorFlow and Keras is by leveraging Anaconda, an open-source platform developed to perform machine learning in Python on Windows, Linux, and Mac OS.

Installation procedure
1. Download Anaconda for Windows / Linux / Mac OS from https://www.anaconda.com/
2. Install Anaconda.
3. Open Anaconda and create a new environment (e.g., neural).
4. From the environment, open a terminal. You should get a prompt like:

    (neural) C:\User>

To close the terminal type exit.

Installing packages

To use TensorFlow, Keras, and display some plots, we first have to install the following packages:

    (neural) C:\User> conda install tensorflow -y
    (neural) C:\User> conda install keras -y
    (neural) C:\User> conda install matplotlib -y

Notes
• This operation has to be done only once! If you close the terminal and reopen it later, the packages do not have to be reinstalled.
• Each package may install a number of other packages. For example, TensorFlow also installs numpy, tensorboard, protobuf, etc.

The list of the packages currently installed in the environment can be seen by the command:

    (neural) C:\User> conda list

Practice with Python

If you are not familiar with Python, practice with it first. Here are some useful links:

Official materials
• Tutorial: http://docs.python.org/tutorial/
• Docs: http://docs.python.org/index.html

Books
• Fundamentals of Python Programming (Richard L. Halterman): https://python.cs.southern.edu/pythonbook/pythonbook.pdf
• A Practical Introduction to Python Programming (Brian Heinold): https://www.brianheinold.net/python/python_book.html

Slides
• http://tdc-www.harvard.edu/Python.pdf
• https://www.seas.upenn.edu/~cis391/Lectures/python-tutorial.pdf

A look at the MNIST dataset

Now, open Python and execute the following commands:

    (neural) C:\User> python
    >>> from keras.datasets import mnist
    >>> (x_train, y_train), (x_test, y_test) = mnist.load_data()
    >>> print(x_train.ndim)    # number of dimensions (axes) of the tensor
    3
    >>> print(x_train.shape)   # number of elements along each axis
    (60000, 28, 28)
    >>> print(x_train.dtype)   # 8-bit unsigned integer
    uint8

So x_train is a 3D tensor holding 60000 images of 28x28 pixels.

Displaying the 2nd digit

    >>> import matplotlib.pyplot as plt
    >>> digit = x_train[1]
    >>> plt.imshow(digit, cmap=plt.cm.binary)
    >>> plt.show()

A 3-layer perceptron

Suppose we want to create a 3-layer network for handwritten digits recognition with the following features:
• Input layer: 28 x 28 images (i.e., 784 input values);
• Hidden layer: 128 neurons with ReLU activation function;
• Output layer: 10 neurons with softmax activation function.

[Figure: 28x28 input -> 128 ReLU neurons -> 10 softmax neurons.]
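To make the three layers concrete, here is a minimal NumPy sketch of the forward pass of this 784-128-10 perceptron, assuming NumPy is available. The random initialization and the helper names (`init_params`, `forward`, `count_params`) are illustrative assumptions, not what Keras does internally. The sketch also counts the trainable parameters: 784·128 + 128 in the hidden layer plus 128·10 + 10 in the output layer.

```python
import numpy as np

def relu(z):
    # element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # subtract the row-wise max for numerical stability; each row sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_params(rng, n_in=28 * 28, n_hidden=128, n_out=10):
    # hypothetical initialization: small random weights, zero biases
    return {
        "W1": rng.normal(0.0, 0.01, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    # x: (batch, 784) flattened images scaled to [0, 1]
    h = relu(x @ params["W1"] + params["b1"])        # (batch, 128)
    return softmax(h @ params["W2"] + params["b2"])  # (batch, 10)

def count_params(params):
    return sum(p.size for p in params.values())

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.random((4, 28 * 28))   # 4 fake "images"
probs = forward(params, x)
print(probs.shape)             # (4, 10)
print(count_params(params))    # 101770
```

Keras performs the same computation (plus training), so `model.summary()` should report the same 101,770 parameters for this network.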

Workflow

Once all the packages have been installed, a neural network can be executed through the following steps:
1. Import libraries and modules;
2. Import the dataset;
3. Preprocess the input data and class labels;
4. Define the network;
5. Compile the network;
6. Train the network with the training set;
7. Test the network with the test set.

Import libraries & data

    # import libraries
    import numpy as np  # array library
    # import the MNIST dataset
    from keras.datasets import mnist
    # load the dataset
    (train_images, train_labels), (test_images, test_labels) = \
        mnist.load_data()

Note: the backslash is used to break a line when the text is not between parentheses.

Preprocess data

    # reshape data into a 2D array
    train_images = train_images.reshape((60000, 28 * 28))
    test_images = test_images.reshape((10000, 28 * 28))
    # scale data to 32-bit floats in [0,1]
    train_images = train_images.astype('float32') / 255
    test_images = test_images.astype('float32') / 255
    # convert labels to one-hot encoding
    from keras.utils import to_categorical
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)

One-hot encoding

A one-hot encoder represents each category as a binary vector with a single 1 in the position of that category; this vector is then used as a feature (or target) to train the model.

Sample | Category
1 | Car
2 | Traffic Light
3 | Person
4 | Car

Sample | Car | Traffic Light | Person
1 | 1 | 0 | 0
2 | 0 | 1 | 0
3 | 0 | 0 | 1
4 | 1 | 0 | 0
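The `to_categorical` call can be mimicked with plain NumPy by indexing rows of an identity matrix. This sketch (the name `one_hot` is hypothetical, not a Keras function) reproduces the Car / Traffic Light / Person table, assuming the category order Car = 0, Traffic Light = 1, Person = 2.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    # labels: integer class indices in [0, num_classes)
    labels = np.asarray(labels, dtype=np.int64)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    # row i of the identity matrix is the one-hot vector of class i
    return np.eye(num_classes, dtype=np.float32)[labels]

categories = ["Car", "Traffic Light", "Person"]
samples = ["Car", "Traffic Light", "Person", "Car"]
indices = [categories.index(c) for c in samples]   # [0, 1, 2, 0]
encoded = one_hot(indices)
print(encoded)
```

For MNIST, `one_hot(train_labels, 10)` would yield a (60000, 10) array, the same shape produced by `to_categorical`.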

Define the network

To define the network, we first import the Sequential model and the Dense layer, and then create the network:

    # import the needed model and layer types
    from keras.models import Sequential
    from keras.layers import Dense
    # define the network as a sequence of layers
    model = Sequential([
        Dense(128, activation='relu', input_dim=28 * 28),
        Dense(10, activation='softmax'),
    ])

Note that the layers can also be added with the .add() method:

    model = Sequential()
    model.add(Dense(128, activation='relu', input_dim=28 * 28))
    model.add(Dense(10, activation='softmax'))

As an alternative, the activation function can also be specified as a separate layer:

    from keras.layers import Activation
    model = Sequential()
    model.add(Dense(128, input_dim=28 * 28))
    model.add(Activation('relu'))
    model.add(Dense(10))
    model.add(Activation('softmax'))

Compile the network

To compile the network, we need to specify three more things:
• A loss function to evaluate the error on the training data (e.g., the mean squared error);
• An optimizer, i.e., the mechanism through which the network updates its weights (e.g., stochastic gradient descent);
• A metric to evaluate the network performance during training and testing (e.g., the accuracy).

    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

Available loss functions

There are many loss functions in Keras. Here are a few examples:
• 'mean_squared_error'
• 'mean_absolute_error'
• 'categorical_crossentropy'
• 'binary_crossentropy'
• 'cosine_proximity'
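To see what some of these strings compute, here is a NumPy sketch of a few losses and of the accuracy metric, written per sample (Keras then averages over the batch). It is an illustration under these assumptions, not the Keras source; the small epsilon clipping is a common way to avoid log(0).

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred), axis=-1)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, y_pred holds softmax outputs; clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred), axis=-1)

def categorical_accuracy(y_true, y_pred):
    # fraction of samples whose most probable class matches the label
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # -log(0.8) and -log(0.3)
print(categorical_accuracy(y_true, y_pred))      # 0.5
```

The second sample is misclassified (0.6 on the wrong class), so it both lowers the accuracy and gets a much larger cross-entropy than the first one.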

Available optimizers

A number of optimizers can be specified in Keras:
• SGD (Stochastic Gradient Descent): includes support for learning rate decay, momentum, and Nesterov momentum.
• Adagrad (Adaptive Gradient): a modified version of SGD with per-parameter learning rates, which are adapted during training.
• RMSProp (Root Mean Square Propagation): it is good for RNNs.
• Adadelta: a more robust extension of Adagrad.
• Adam (Adaptive Moment Estimation): it improves RMSProp by using running averages of both the gradients and the second moments of the gradients.
• Adamax: a variant of Adam based on the infinity norm.
• Nadam (Nesterov-accelerated Adam): it combines Adam and Nesterov-accelerated SGD.

Compile the network

Other optimizer-specific parameters can be set as follows:

    from keras import optimizers
    sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])
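The effect of `lr`, `decay`, `momentum`, and `nesterov` can be illustrated with a small NumPy sketch of the update rule, assumed to follow the formulation of the Keras 2.x SGD optimizer: a time-based decay lr / (1 + decay · t), a velocity term, and the Nesterov look-ahead. The `SGD` class below and the toy quadratic objective are hypothetical examples, not the Keras class.

```python
import numpy as np

class SGD:
    """Sketch of SGD with time-based decay, momentum, and Nesterov momentum."""

    def __init__(self, lr=0.1, decay=1e-6, momentum=0.9, nesterov=True):
        self.lr, self.decay = lr, decay
        self.momentum, self.nesterov = momentum, nesterov
        self.iterations = 0
        self.velocity = None

    def step(self, params, grads):
        # time-based decay: lr_t = lr / (1 + decay * t)
        lr_t = self.lr / (1.0 + self.decay * self.iterations)
        if self.velocity is None:
            self.velocity = np.zeros_like(params)
        # the velocity accumulates an exponentially weighted sum of past steps
        self.velocity = self.momentum * self.velocity - lr_t * grads
        if self.nesterov:
            # Nesterov: apply the momentum step "ahead" plus the current gradient
            params = params + self.momentum * self.velocity - lr_t * grads
        else:
            params = params + self.velocity
        self.iterations += 1
        return params

# minimize f(w) = (w - 3)^2, whose gradient is 2 (w - 3)
opt = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
w = np.array([0.0])
for _ in range(200):
    w = opt.step(w, 2.0 * (w - 3.0))
print(w)  # close to 3
```

With momentum=0.0 and nesterov=False the same class reduces to plain gradient descent, w ← w - lr · gradient.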

Train the network

The network can be trained by calling the .fit() method:

    model.fit(train_images, train_labels, epochs=5, batch_size=128)

• epochs = # of iterations over the entire data set
• batch_size = # of samples per gradient update

Two quantities are displayed during training:
• the loss of the network over the training data;
• the accuracy of the network over the training data.

    Epoch 1/5
    60000/60000 [======] - 9s - loss: 0.2524 - acc: 0.9273
    Epoch 2/5
    51328/60000 [======>.....] - ETA: 1s - loss: 0.1035 - acc: 0.9692

ETA (estimated time of arrival) is the estimated time remaining to complete the current epoch.

Test the network

Once training is over, we can check how the network performs on the test set by the .evaluate() method:

    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('Test accuracy:', test_acc)

For more information on Keras functions and parameters visit: https://keras.io/
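The relation between `epochs`, `batch_size`, and the number of weight updates is simple arithmetic: with 60,000 training images and batch_size=128, one epoch takes ceil(60000 / 128) = 469 gradient updates, the last one on a partial batch. The pure-Python sketch below (the generator `iterate_minibatches` is hypothetical) shuffles the sample indices and slices them into mini-batches, which is roughly what `.fit()` does in each epoch when shuffling is enabled.

```python
import math
import random

def iterate_minibatches(num_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per gradient update."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

num_samples, batch_size, epochs = 60000, 128, 5
batches = list(iterate_minibatches(num_samples, batch_size))
updates_per_epoch = math.ceil(num_samples / batch_size)
print(len(batches), updates_per_epoch)  # 469 469
print(len(batches[-1]))                 # 96 samples in the last, partial batch
print(updates_per_epoch * epochs)       # 2345 weight updates in total
```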

Display graphs

A graph of loss and accuracy can be obtained as follows:

    import matplotlib.pyplot as plt
    history = model.fit(train_images, train_labels,
                        epochs=5, batch_size=128, verbose=0,
                        validation_data=(test_images, test_labels))
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    # Get training and validation loss histories
    train_loss = history.history['loss']
    valid_loss = history.history['val_loss']
    # Get training and validation accuracy histories
    # (newer Keras versions use the keys 'accuracy' and 'val_accuracy')
    train_acc = history.history['acc']
    valid_acc = history.history['val_acc']

Display the loss

A graphical display of the loss can be obtained as follows:

    # Create count of the number of epochs
    epoch_count = range(1, len(train_loss) + 1)
    # plot the training and validation loss
    plt.plot(epoch_count, train_loss, 'b-')
    plt.plot(epoch_count, valid_loss, 'g-')
    plt.title('model loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

Loss graph

The displayed graph looks like the following:

[Figure: training and test loss per epoch.]

Display the accuracy

A graphical display of the accuracy can be obtained as follows:

    # Create count of the number of epochs
    epoch_count = range(1, len(train_acc) + 1)
    # plot the training and validation accuracy
    plt.plot(epoch_count, train_acc, 'b-')
    plt.plot(epoch_count, valid_acc, 'g-')
    plt.title('model accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

Accuracy graph

The displayed graph looks like the following:

[Figure: training and test accuracy per epoch.]

LeNet-5

[Figure: LeNet-5 architecture: input image 28x28x1 -> L1 28x28x6 -> L2 14x14x6 -> L3 10x10x16 -> L4 5x5x16 -> L5 1x1x120 -> flatten (120) -> L6 FC 84 -> L7 FC 10.]

Layer | Input | Operation | Filters | Kernel size (r) | Stride (s) | Padding (P) | Output
L1 | 28x28x1 | conv | 6 | 5 | 1 | 2 | 28x28x6
L2 | 28x28x6 | avg-pool | 6 | 2 | 2 | 0 | 14x14x6
L3 | 14x14x6 | conv | 16 | 5 | 1 | 0 | 10x10x16
L4 | 10x10x16 | avg-pool | 16 | 2 | 2 | 0 | 5x5x16
L5 | 5x5x16 | conv | 120 | 5 | 1 | 0 | 1x1x120
L6 | 120 | FC+ReLU | - | - | - | - | 84
L7 | 84 | FC+Softmax | - | - | - | - | 10

Keras preliminary code

    # Load MNIST dataset as train and test sets
    from keras.datasets import mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # Convert type from uint8 in [0,255] to float32 in [0,1]
    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255
    # Reshape the dataset into a 4D array (samples, height, width, channels)
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    # Transform labels to one-hot encoding
    from keras.utils import to_categorical
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
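The output sizes in the LeNet-5 table follow from the formula floor((n + 2P - r) / s) + 1, and each convolution has filters · (r · r · input channels + 1) trainable parameters. The pure-Python sketch below (helper names are hypothetical) recomputes both, assuming that the average-pooling layers have no trainable parameters, as with Keras' AveragePooling2D.

```python
def conv_output_size(n, kernel, stride=1, pad=0):
    # floor((n + 2P - r) / s) + 1
    return (n + 2 * pad - kernel) // stride + 1

def conv_params(in_channels, filters, kernel):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(n_in, n_out):
    return n_out * (n_in + 1)

# (name, operation, filters, kernel r, stride s, padding P) from the table
layers = [
    ("L1", "conv", 6, 5, 1, 2),
    ("L2", "pool", 6, 2, 2, 0),
    ("L3", "conv", 16, 5, 1, 0),
    ("L4", "pool", 16, 2, 2, 0),
    ("L5", "conv", 120, 5, 1, 0),
]

size, channels, total = 28, 1, 0
shapes = {}
for name, op, filters, r, s, p in layers:
    size = conv_output_size(size, r, s, p)
    if op == "conv":
        total += conv_params(channels, filters, r)
    channels = filters  # pooling keeps the number of feature maps
    shapes[name] = (size, size, channels)

total += dense_params(120, 84) + dense_params(84, 10)  # L6 and L7
for name, shape in shapes.items():
    print(name, shape)
print("trainable parameters:", total)  # 61706
```

In the Keras model of this section, L5 is a Dense layer on the flattened 5x5x16 = 400 tensor, which has 400 · 120 + 120 = 48120 parameters, the same as the 5x5 convolution with 120 filters, so the totals should agree.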

Keras LeNet-5: 1st layer

    model.add(Conv2D(filters, kernel_size, strides=..., padding=...,
                     activation=..., input_shape=...))

• filters: number of feature maps in the output layer
• kernel_size: height and width of the 2D convolution window
• strides: number of units by which the kernel is shifted between two consecutive convolution windows (height and width)

Layer parameters:
• # feature maps = 6
• input size = 28x28
• kernel size = 5
• strides = 1

    model.add(Conv2D(filters=6, kernel_size=(5, 5), activation='tanh',
                     padding='same', input_shape=(28, 28, 1)))

[Figure: example of a convolution with stride = (1,1).]

Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/

Keras LeNet-5: 1st layer (continued)

• activation: activation function (tanh, linear, etc.)
• input_shape: input size
• padding: padding option used to apply the convolution kernel:
  • 'valid' means no padding (some input data may be dropped);
  • 'same' means that zero padding is applied, so that (with stride 1) the output has the same size as the input.

[Figure: 1D example with stride = 1 on the inputs 1 to 8: with 'valid' some data is dropped; with 'same' a zero is padded after 8.]

Keras LeNet-5: 2nd layer

• # feature maps = 6
• output size = 14x14
• pool_size = 2x2
• strides = 2

    model.add(AveragePooling2D(pool_size=(2, 2)))  # strides default to pool_size

Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/
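The 'valid' and 'same' options of the 1st layer can be made precise with the output-length rules assumed here to be those of TensorFlow/Keras: 'valid' keeps ceil((n - k + 1) / s) windows, while 'same' produces ceil(n / s) outputs and pads the missing zeros, with any odd extra zero going at the end. The sketch below (helper names are hypothetical) shows that padding='same' with a 5x5 kernel on a 28x28 input corresponds to the padding of 2 listed for L1 in the LeNet-5 table, and reproduces the 1D example with eight inputs, assuming a kernel of size 2.

```python
import math

def output_length(n, kernel, stride, padding):
    """Output length along one axis for Keras/TensorFlow-style padding."""
    if padding == "valid":
        # only windows that fit entirely inside the input are kept
        return math.ceil((n - kernel + 1) / stride)
    if padding == "same":
        # zeros are added so that the output has ceil(n / stride) positions
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

def same_padding(n, kernel, stride):
    """Total zero padding used by 'same', split as (before, after)."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel - n, 0)
    return total // 2, total - total // 2

print(output_length(28, 5, 1, "same"), same_padding(28, 5, 1))  # 28 (2, 2)
print(output_length(28, 5, 1, "valid"))                          # 24
print(output_length(8, 2, 1, "valid"))                           # 7
print(output_length(8, 2, 1, "same"), same_padding(8, 2, 1))     # 8 (0, 1)
```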

Keras LeNet-5: 3rd layer

• # feature maps = 16
• output size = 10x10
• kernel_size = 5
• strides = 1

    model.add(Conv2D(filters=16, kernel_size=(5, 5), activation='tanh'))

Keras LeNet-5: 4th layer

• # feature maps = 16
• output size = 5x5
• pool_size = 2x2
• strides = 2

    model.add(AveragePooling2D(pool_size=(2, 2)))
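The 2nd and 4th layers perform 2x2 average pooling with stride 2. A NumPy sketch of that operation, assuming a (height, width, channels) layout with even height and width (the function name is hypothetical), splits each spatial axis into blocks of two and averages inside every block:

```python
import numpy as np

def average_pool_2x2(x):
    """2x2 average pooling with stride 2 on an (H, W, C) array (H, W even)."""
    h, w, c = x.shape
    # reshape to (H/2, 2, W/2, 2, C) and average over the two block axes
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

fmap = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
print(fmap[:, :, 0])
print(average_pool_2x2(fmap)[:, :, 0])  # [[2.5, 4.5], [10.5, 12.5]]

# shapes in LeNet-5: 28x28x6 -> 14x14x6 and 10x10x16 -> 5x5x16
print(average_pool_2x2(np.zeros((28, 28, 6))).shape)
print(average_pool_2x2(np.zeros((10, 10, 16))).shape)
```

Each feature map is pooled independently, so the number of channels never changes; only the spatial size is halved.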

Keras LeNet-5: flattening

Tensor flattening: from n-dimensional to 1-dimensional tensors.

[Figure: a 3D layer is flattened into a 1D layer.]

    model.add(Flatten())

Keras LeNet-5: 5th layer

• # feature maps = 120

    model.add(Dense(units=120, activation='tanh'))

Keras LeNet-5: 6th layer

• # feature maps = 84

    model.add(Dense(units=84, activation='tanh'))

Keras LeNet-5: 7th layer

• # feature maps = 10

    model.add(Dense(units=10, activation='softmax'))

LeNet-5 structure

    from keras.models import Sequential
    from keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(filters=6, kernel_size=(5, 5), activation='tanh',
                     padding='same', input_shape=(28, 28, 1)))
    model.add(AveragePooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=16, kernel_size=(5, 5), activation='tanh'))
    model.add(AveragePooling2D())
    model.add(Flatten())
    model.add(Dense(units=120, activation='tanh'))
    model.add(Dense(units=84, activation='tanh'))
    model.add(Dense(units=10, activation='softmax'))

LeNet-5 execution

    # Compile the model
    model.compile(loss='categorical_crossentropy', optimizer='SGD',
                  metrics=['accuracy'])
    # Train the model
    hist = model.fit(x_train, y_train, epochs=10, batch_size=128,
                     validation_data=(x_test, y_test), verbose=1)
    # Evaluate the model
    test_score = model.evaluate(x_test, y_test)
    print('Test loss {:.4f}, accuracy {:.2f}%'.format(test_score[0],
                                                     test_score[1] * 100))

Installing Caffe

• Caffe is supported by:
  • Ubuntu/Debian/Fedora
  • OS X
  • Windows
• It has several dependencies:
  • BLAS (via ATLAS, MKL, or OpenBLAS)
  • OpenCV
  • Boost
• As for TensorFlow, GPU packages require a CUDA-enabled GPU card.

http://caffe.berkeleyvision.org/installation.html

Installing Caffe in Ubuntu >= 17.04

• To install the pre-compiled version of Caffe, just write in your terminal:

    sudo apt install caffe-cpu

  for the CPU-only version, or

    sudo apt install caffe-

  for the GPU version.

Installing Caffe from Source (hints)

• It is necessary to clone the git repository:

    git clone https://github.com/BVLC/caffe.git

• and build Caffe:

    cd caffe
    mkdir build
    cd build
    cmake ..
    make all
    make install
    make runtest

• A detailed tutorial about how to install Caffe is available at: http://caffe.berkeleyvision.org/installation.html

Caffe Protocol Buffers

• Protocol buffers are a way of serializing structured data to be used in communication protocols and data storage.
• They are simpler and smaller than XML, and easier to manage programmatically.
• The prototxt file defines the structure of a neural network by means of a Google Protocol Buffer file.

https://developers.google.com/protocol-buffers/docs/overview

Caffe Protocol Buffers

• Caffe networks are defined and stored using two files:
  • A .prototxt file storing the network structure;
  • A .caffemodel file storing the weights.

[Figure: example.]

Caffe Protocol Buffers

• The file caffe.proto specifies the format adopted by Caffe.
• It contains definitions for networks, parameters for training algorithms, and much more.
• Additional details can be found at: https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto

Caffe Protocol Buffers

• To define a neural network, it is necessary to write a caffe::NetParameter protobuf.
• It is composed of a name

    name: "LeNet"

• and a definition for each layer

    layer {
      name: "ExampleLayer"
      type: "Convolution"
      …
    }

Caffe LeNet-5: Input layer

[Figure: a Caffe layer reads its input blobs from the bottom and writes its output blobs to the top; the Data layer reads from the dataset.]

Caffe LeNet-5: Data layer

    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625   # normalization of pixels in the range [0,1): 1/256
      }
      data_param {
        source: "mnist_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"    # output of the layer: data (images)
      top: "label"   # and labels
    }
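The scale factor 0.00390625 is exactly 1/256, so Caffe maps pixel 255 to 0.99609375 rather than to 1.0, whereas the Keras code divides by 255. This small arithmetic check in Python (the function names are illustrative) makes the difference explicit:

```python
from fractions import Fraction

CAFFE_SCALE = 0.00390625  # transform_param scale in the Data layer

# 0.00390625 = 2**-8, so it is represented exactly in floating point
print(Fraction(CAFFE_SCALE))  # 1/256

def caffe_scale(pixel):
    return pixel * CAFFE_SCALE

def keras_scale(pixel):
    return pixel / 255

for pixel in (0, 128, 255):
    print(pixel, caffe_scale(pixel), round(keras_scale(pixel), 6))
# 255 maps to 0.99609375 in Caffe and to 1.0 with the /255 used in Keras
```

The difference is tiny and normally irrelevant for training, but it matters if a model trained with one normalization is deployed with the other.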

Caffe LeNet-5: Input layer

• Multiple input layers can be specified (controllable with 'phase').

Input layer for training:

    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }
      data_param {
        source: "examples/mnist/mnist_train"
        batch_size: 64
        backend: LMDB
      }
    }

Input layer for test:

    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TEST
      }
      data_param {
        source: "examples/mnist/mnist_test"
        batch_size: 100
        backend: LMDB
      }
    }

Caffe LeNet-5: 1st layer

• # feature maps = 6
• input size = 28x28
• kernel size = 5
• strides = 1

Caffe LeNet-5: 1st layer

    layer {
      name: "conv1"
      type: "Convolution"
      convolution_param {
        num_output: 6        # number of filters
        kernel_size: 5
        pad: 0
        stride: 1
        weight_filler {
          type: "xavier"     # weights initialization according to the Xavier filler [1]
        }
        bias_filler {
          type: "constant"   # initialize the bias to zero
        }
      }
      bottom: "data"         # input data
      top: "conv1"           # output data
    }

Xavier filler is a popular method to provide initial values for weights. In Keras:

    model.add(Dense(64, kernel_initializer='glorot_normal',
                    bias_initializer='constant'))
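The 'xavier' filler and Keras' Glorot initializers share the idea of scaling the initial weights by the fan-in/fan-out of the layer, but they are not identical. The NumPy sketch below assumes Caffe's default behavior (uniform in [-sqrt(3/fan_in), +sqrt(3/fan_in)], with fan_in = input channels · kernel height · kernel width) and shows Keras' glorot_uniform (uniform with variance 2 / (fan_in + fan_out)); glorot_normal, used in the example above, instead draws from a truncated normal distribution with a standard deviation of about sqrt(2 / (fan_in + fan_out)). The function names are hypothetical.

```python
import numpy as np

def caffe_xavier(rng, num_output, in_channels, kh, kw):
    """Sketch of Caffe's 'xavier' filler with its default fan-in normalization."""
    fan_in = in_channels * kh * kw
    # Uniform(-a, a) has variance a^2 / 3, so a = sqrt(3 / fan_in) gives 1 / fan_in
    scale = np.sqrt(3.0 / fan_in)
    return rng.uniform(-scale, scale, size=(num_output, in_channels, kh, kw))

def glorot_uniform(rng, fan_in, fan_out, shape):
    """Glorot/Xavier uniform: variance 2 / (fan_in + fan_out)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
w_conv1 = caffe_xavier(rng, num_output=6, in_channels=1, kh=5, kw=5)
print(w_conv1.shape, np.abs(w_conv1).max() <= np.sqrt(3.0 / 25))

w_dense = glorot_uniform(rng, fan_in=784, fan_out=64, shape=(784, 64))
# the empirical variance should be close to 2 / (784 + 64)
print(w_dense.var(), 2.0 / (784 + 64))
```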

Caffe: Convolutional layers

• By default, Caffe does not apply an activation function in convolution layers.
• If needed, activation functions have to be explicitly defined as a separate layer:

    layer {
      name: "conv1af"
      bottom: "conv1"   # output of the convolutional layer taken as input
      top: "conv1af"
      type: "TanH"      # type of activation function
    }

Caffe: Convolutional layers

• Caffe allows for a very flexible configuration of convolutional layers.
• The sizes of the convolutional kernel, the stride, and the padding can be specified for each dimension.
• Example:

    layer {
      <…>
      kernel_h: 5
      kernel_w: 6
      pad_h: 1
      pad_w: 2
      stride_h: 1
      stride_w: 2
      <…>
    }

[Figure: a 5x6 kernel with pad = 1 along the height and pad = 2 along the width.]
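With per-dimension parameters, the output size is computed separately for height and width. The sketch below assumes Caffe's conventions as found in its layer implementations: convolution rounds the division down, while pooling (without padding) rounds it up, so an odd-sized input gives one more pooled row/column than Keras 'valid' pooling. The 28x28 input is hypothetical, chosen only to evaluate the example kernel_h/kernel_w/pad/stride values; the LeNet-5 chain assumes pad: 0 in conv1 and 2x2 pooling with stride 2.

```python
import math

def caffe_conv_output(h, w, kernel_h, kernel_w, pad_h=0, pad_w=0,
                      stride_h=1, stride_w=1):
    """Convolution output size per dimension; the division rounds down."""
    out_h = (h + 2 * pad_h - kernel_h) // stride_h + 1
    out_w = (w + 2 * pad_w - kernel_w) // stride_w + 1
    return out_h, out_w

def caffe_pool_output(h, w, kernel, stride):
    """Pooling output size (no padding); here the division rounds up."""
    out_h = math.ceil((h - kernel) / stride) + 1
    out_w = math.ceil((w - kernel) / stride) + 1
    return out_h, out_w

# Example layer above, evaluated on a hypothetical 28x28 input
print(caffe_conv_output(28, 28, kernel_h=5, kernel_w=6, pad_h=1, pad_w=2,
                        stride_h=1, stride_w=2))   # (26, 14)

# LeNet-5 chain with pad: 0 in conv1 and 2x2 pooling with stride 2
shape = caffe_conv_output(28, 28, 5, 5)   # conv1 -> (24, 24)
shape = caffe_pool_output(*shape, 2, 2)   # pool1 -> (12, 12)
shape = caffe_conv_output(*shape, 5, 5)   # conv2 -> (8, 8)
shape = caffe_pool_output(*shape, 2, 2)   # pool2 -> (4, 4)
print(shape)

# Odd-sized input: Caffe pooling keeps the partial window at the border
print(caffe_pool_output(5, 5, 2, 2))      # (3, 3)
```

With pad: 0, conv1 produces 24x24 maps, which is consistent with the "Top shape: ... 24 24" line of the Caffe console output.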

Caffe LeNet-5: 2nd layer

• # feature maps = 6
• output size = 14x14
• pool_size = 2x2
• strides = 2

    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: AVE          # pooling method (MAX, AVE, or STOCHASTIC)
      }
      bottom: "conv1af"    # input data
      top: "pool1"         # output data
    }

Caffe LeNet-5: 3rd layer

• # feature maps = 16
• output size = 10x10
• kernel_size = 5
• strides = 1

Caffe LeNet-5: 3rd layer

    layer {
      name: "conv2"
      type: "Convolution"
      bottom: "pool1"
      top: "conv2"
      convolution_param {
        num_output: 16
        kernel_size: 5
        stride: 1
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" }
      }
    }

Caffe LeNet-5: 4th layer

• # feature maps = 16
• output size = 5x5
• pool_size = 2x2
• strides = 2

Caffe LeNet-5: 4th layer

    layer {
      name: "pool2"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: AVE          # pooling method (MAX, AVE, or STOCHASTIC)
      }
      bottom: "conv2af"    # input data
      top: "pool2"         # output data
    }

Caffe LeNet-5: 5th layer

• # feature maps = 120

Caffe LeNet-5: 5th layer

• # feature maps = 120

    layer {
      name: "ip1"
      type: "InnerProduct"
      inner_product_param {
        num_output: 500
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" }
      }
      bottom: "pool2"
      top: "ip1"
    }

    layer {
      name: "ip1af"
      bottom: "ip1"
      top: "ip1af"
      type: "TanH"
    }

There is no need to explicitly specify the flattening: the InnerProduct layer flattens its input automatically.

Caffe LeNet-5: 6th layer

• # feature maps = 84

    layer {
      name: "ip2"
      type: "InnerProduct"
      bottom: "ip1af"
      top: "ip2"
      inner_product_param {
        num_output: 10
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" }
      }
    }

Caffe LeNet-5: 7th layer

• # feature maps = 10

    layer {
      name: "prob"
      type: "Softmax"
      bottom: "ip2"
      top: "prob"
    }

Caffe: Additional layers

• As shown for the input layer, some layers can be defined to be used only in specific phases.
• The Accuracy layer scores the output as the accuracy of the DNN:

    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip2"
      bottom: "label"
      top: "accuracy"
      include {
        phase: TEST        # enabled only during testing
      }
    }

Caffe: Defining the solver

    # protobuf file with the DNN definition
    net: "lenet_train_test.prototxt"
    # how many forward passes the test should carry out
    test_iter: 100
    # testing every 500 training iterations
    test_interval: 500
    # solver settings
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # display every 100 iterations
    display: 100
    # max number of iterations
    max_iter: 10000
    # snapshot of intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # CPU or GPU
    solver_mode: GPU

Caffe: Console output

Initialization: details about each layer, connections, and output shape:

    I1203 net.cpp:66] Creating Layer conv1
    I1203 net.cpp:76] conv1 <- data
    I1203 net.cpp:101] conv1 -> conv1
    I1203 net.cpp:116] Top shape: 20 24 24
    I1203 net.cpp:127] conv1 needs backward computation.
    I1203 net.cpp:142] Network initialization done.
    I1203 solver.cpp:36] Solver scaffolding done.

Training:

    I1203 solver.cpp:44] Solving LeNet
    I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
    I1203 solver.cpp:66] Iteration 100, loss = 0.26044
    ...
    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9785
    I1203 solver.cpp:111] Test score #1: 0.0606671
    I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
    I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
    I1203 solver.cpp:78] Optimization Done.

• lr is the learning rate and loss is the value of the training loss function at each iteration.
• Score 0 is the accuracy, and score 1 is the testing loss function.
• Training phase completed: lenet_iter_10000 is the name of the output (binary) protobuf file, which can be used for deployment.
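The "inv" learning-rate policy in the solver decreases the rate as base_lr · (1 + gamma · iter)^(-power). A short Python check (the function name is hypothetical) reproduces the value printed at iteration 100 in the console output:

```python
def caffe_inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate of the "inv" policy: base_lr * (1 + gamma * iter)^(-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

for it in (0, 100, 5000, 10000):
    print(it, round(caffe_inv_lr(it), 8))
# iteration 100 gives about 0.00992565, as in "Iteration 100, lr = 0.00992565"
```

With these settings the learning rate decays slowly: at max_iter = 10000 it is 0.01 · 2^(-0.75), about 0.0059.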

References

• https://www.floydhub.com/
• https://engmrk.com/lenet-5-a-classic-cnn-architecture/
• https://engmrk.com/module-22-implementation-of-cnn-using-keras/
• http://caffe.berkeleyvision.org/gathered/examples/mnist.html
• http://tutorial.caffe.berkeleyvision.org/tutorial/layers.html
• https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
• https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t
• https://developers.google.com/protocol-buffers/docs/overview
• https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data

Thank you!

Daniel Casini