Scalable Deep Learning with TensorFlow and Apache Spark™ Schedule

▪ Keras and Neural Network Fundamentals ▪ MLflow and Spark UDFs ▪ Hyperparameter Tuning with Hyperopt ▪ Horovod: Distributed Model Training ▪ LIME, SHAP & Model Interpretability ▪ CNNs and ImageNet ▪ Transfer Learning ▪ Object Detection ▪ Generative Adversarial Networks (GANs) Survey

▪ Pandas/Spark? ▪ Deep Learning? ▪ Expectations? Deep Learning Overview Why Deep Learning?

▪ Performs well on complex datasets like images, sequences, and natural language ▪ Scales better as data size increases ▪ Theoretically can approximate any function (universal approximation theorem) Open Source Landscape Where Does DL Fit In?

Image source What Is Deep Learning?

▪ “Composing representations of data in a hierarchical manner”

Image source Image source Keras

▪ High-level Python API to build neural networks ▪ Official high-level API of TensorFlow ▪ Has over 250,000 users ▪ Released by François Chollet in 2015 Why Keras? Hardware Considerations

▪ GPUs are preferred for training due to speed of computation, but data transfer to and from the GPU can be a bottleneck ▪ CPUs are generally acceptable for inference Why DL On Databricks? Neural Network Fundamentals Layers

▪ Input layer ▪ Zero or more hidden layers ▪ Output layer
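A minimal sketch of such a network, assuming tf.keras (the layer sizes here are arbitrary):

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(20, activation="relu", input_shape=(10,)),  # 10 input features, first hidden layer
    Dense(20, activation="relu"),                     # second hidden layer
    Dense(1),                                         # output layer (single regression value)
])
model.summary()
```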

Image source Regression Evaluation

▪ Measure "closeness" between label and prediction ▪ When predicting someone's weight, better to be off by 2 lbs instead of 20 lbs ▪ Evaluation metrics: ▪ Loss: (y−ŷ) ▪ Absolute loss: |y−ŷ| ▪ Squared loss: (y−ŷ)2 Evaluation Metric: MSE Evaluation Metric: MSE Backpropagation

Backpropagation

▪ Calculate gradients to update weights With Keras Activation Functions

▪ Provide non-linearity in our neural networks to learn more complex relationships

▪ Sigmoid ▪ Leaky ReLU ▪ Tanh ▪ PReLU ▪ ReLU ▪ ELU Sigmoid

▪ Saturates and kills gradients ▪ Not zero-centered

Image source Hyperbolic Tangent (Tanh)

▪ Zero centered! ▪ BUT, like the sigmoid, its activations saturate

Image source ReLU

▪ Does not saturate for positive inputs and is cheap to compute ▪ BUT, gradients can still go to zero (the "dying ReLU" problem for negative inputs)

Image source Leaky ReLU ▪ For x < 0: f(x) = α·x ▪ For x ≥ 0: f(x) = x

These functions are not differentiable at 0, so in practice the derivative there is set to 0 (or to the average of the left and right derivatives). Image source Comparison
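A small sketch (assuming TensorFlow is available) comparing the activations above on a few sample inputs:

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

print(tf.sigmoid(x).numpy())                   # saturates toward 0/1, not zero-centered
print(tf.tanh(x).numpy())                      # zero-centered, but still saturates
print(tf.nn.relu(x).numpy())                   # zero for all negative inputs
print(tf.nn.leaky_relu(x, alpha=0.1).numpy())  # small slope α for negative inputs
print(tf.nn.elu(x).numpy())                    # smooth exponential curve for negative inputs
```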

Image source Optimizers Stochastic Gradient Descent (SGD)

▪ Choosing a proper learning rate can be difficult.

Image source Stochastic Gradient Descent

▪ Easy to get stuck in local minima

Image source Momentum

▪ Accelerates SGD: like pushing a ball down a hill ▪ Takes an average of the directions we’ve been heading (current velocity and acceleration) ▪ Limits oscillating back and forth, helps get out of local minima
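In tf.keras this is just an optimizer argument; a minimal sketch (the learning rate and momentum values are illustrative):

```python
from tensorflow.keras.optimizers import SGD

# Momentum keeps a running velocity of past gradients, smoothing the updates.
optimizer = SGD(learning_rate=0.01, momentum=0.9)
```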

Image source ADAM

▪ Adaptive Moment Estimation (Adam): combines momentum with per-parameter adaptive learning rates
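A minimal sketch of compiling a model with Adam (assuming tf.keras; the tiny model is only a placeholder):

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Adam adapts the step size per parameter using running estimates of the
# first and second moments of the gradients.
model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
```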

Image source Keras Keras Lab Hyperparameter Selection Hyperparameter Selection

Which dataset should we use to select hyperparameters? Train? Test? Validation Dataset

▪ Split the dataset into three! ▪ Train on the training set ▪ Select hyperparameters based on performance on the validation set ▪ Test on the test set
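One possible sketch of the three-way split, assuming scikit-learn is available (the data here is random filler):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 10), np.random.rand(1000)

# 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Fit on train, pick hyperparameters on validation, report final metrics on test.
```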

Advanced Keras & Lab MLflow & Lab Hyperopt & Lab Horovod

Image source Image source Horovod

▪ Created by Alexander Sergeev of Uber, open-sourced in 2017 ▪ Simplifies distributed neural network training ▪ Supports TensorFlow, Keras, PyTorch, and Apache MXNet Classical Parameter Server

Image source All-Reduce
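Horovod averages gradients across workers with ring all-reduce. A minimal sketch of the usual setup, assuming the horovod.tensorflow.keras package (scaling the learning rate by the number of workers is a common convention, not a requirement):

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # initialize Horovod on every worker

# Scale the learning rate by the number of workers, then wrap the optimizer
# so gradients are averaged with all-reduce before each update.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)

# Broadcast rank 0's initial weights so all workers start from the same state.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
```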

# Only one line of code change! optimizer = hvd.DistributedOptimizer(optimizer) Image source Horovod Demo Model Interpretability LIME

Image source SHAP

▪ Shapley Values
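A minimal sketch of estimating Shapley values with the shap library (the model and data below are hypothetical placeholders):

```python
import numpy as np
import shap
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

X = np.random.rand(100, 5).astype("float32")
model = Sequential([Dense(8, activation="relu", input_shape=(5,)), Dense(1)])

# Model-agnostic Shapley value estimates: each prediction is explained as
# per-feature contributions relative to a small background sample.
explainer = shap.KernelExplainer(model.predict, X[:20])
shap_values = explainer.shap_values(X[:5])
```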

Image source SHAP

Image source Convolutional Neural Networks Convolutions

▪ Focus on Local Connectivity (fewer parameters to learn) ▪ Filter/kernel slides across input image (often 3x3)
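A minimal sketch of a small CNN in tf.keras (shapes and filter counts are illustrative):

```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(28, 28, 1)),  # 3x3 kernel slides over the image
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.summary()  # far fewer parameters than a fully connected layer over raw pixels
```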

Image Kernels Visualization

CS 231 Convolutional Networks

ImageNet Challenge

▪ Classify images in one of 1000 categories ▪ 2012 Deep Learning breakthrough with AlexNet: 16% top-5 test error rate (next closest was 25%) VGG16 (2014)

▪ One of the most widely used architectures for its simplicity.
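VGG16 ships with tf.keras; a minimal sketch of loading it (downloads the ImageNet weights on first use):

```python
from tensorflow.keras.applications import VGG16

vgg = VGG16(weights="imagenet", include_top=True)
vgg.summary()  # 16 weight layers: repeated stacks of 3x3 convolutions plus dense layers
```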

Image source Max Vs Avg. Pooling
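A small sketch contrasting the two pooling operations on a toy 4x4 feature map (assuming tf.keras):

```python
import numpy as np
import tensorflow as tf

x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)  # batch of one 4x4 feature map

max_pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(x)

print(max_pool.numpy().squeeze())  # keeps the largest value in each 2x2 window
print(avg_pool.numpy().squeeze())  # averages each 2x2 window
```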

Image source Inception

Image source Residual Connection

Image source What Do CNNs Learn?

▪ Breaking Convnets CNN Demo Transfer Learning Transfer Learning

▪ IDEA: Intermediate representations learned for one task may be useful for other related tasks
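A minimal transfer-learning sketch (assuming tf.keras and a hypothetical 5-class target task): reuse VGG16's convolutional features, freeze them, and train only a new head:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained representations

model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(5, activation="softmax"),  # new head for the (hypothetical) 5-class task
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```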

Image source When To Use Transfer Learning?

Reference: Andrej Karpathy’s Transfer Learning Transfer Learning Ok, so how do I find the optimal neural network architecture? Neural Architecture Search with Reinforcement Learning

Generative Adversarial Networks Generative Adversarial Networks (GANs)

▪ Estimates generative models ▪ Simultaneously trains two models ▪ G: a generative model that captures the data distribution ▪ D: a discriminative model that predicts the probability that a sample came from G (i.e., is a counterfeit) ▪ Used in generating art, deep fakes, up-scaling graphics, and astronomy research ▪ Paper GANs Architecture: 2 Models

Image source The Algorithm

▪ G takes noise as input, outputs a counterfeit ▪ D takes counterfeits and real samples as input, outputs P(counterfeit) ▪ To prevent overfitting… ▪ Alternate k steps of optimizing D and one step of optimizing G ▪ Start with k of at least 5 ▪ Rather than minimizing log(1 − D(G(z))), train G to maximize log D(G(z)), which provides stronger, non-saturating gradients early in training
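A compact sketch of this alternating update in tf.keras, simplified to a toy 1-D distribution and k = 1 (note the sketch uses the common convention where D outputs P(real), i.e. 1 − P(counterfeit)):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

noise_dim = 8
G = models.Sequential([layers.Dense(16, activation="relu", input_shape=(noise_dim,)),
                       layers.Dense(1)])                        # generator: noise -> counterfeit sample
D = models.Sequential([layers.Dense(16, activation="relu", input_shape=(1,)),
                       layers.Dense(1, activation="sigmoid")])  # discriminator: sample -> P(real)

bce = tf.keras.losses.BinaryCrossentropy()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-3), tf.keras.optimizers.Adam(1e-3)

def train_step(real_batch):
    z = tf.random.normal([tf.shape(real_batch)[0], noise_dim])
    # D step: real samples labeled 1, counterfeits labeled 0.
    with tf.GradientTape() as tape:
        real_pred, fake_pred = D(real_batch), D(G(z))
        d_loss = bce(tf.ones_like(real_pred), real_pred) + bce(tf.zeros_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables), D.trainable_variables))
    # G step: maximize log D(G(z)) (the non-saturating loss) by labeling counterfeits as real.
    with tf.GradientTape() as tape:
        fake_pred = D(G(z))
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables), G.trainable_variables))
    return d_loss, g_loss

d_loss, g_loss = train_step(tf.random.normal([32, 1]) + 3.0)  # toy "real" data centered at 3
```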

Resources

▪ Horovod Meetup Talk ▪ MLflow ▪ Deep Learning with Python ▪ Stanford's CS 231 ▪ fast.ai ▪ Blog posts & webinars ▪ Databricks Runtime for ML Pyception Thank you!