Scalable Deep Learning with TensorFlow and Apache Spark™

Schedule
▪ Keras and Neural Network Fundamentals
▪ MLflow and Spark UDFs
▪ Hyperparameter Tuning with Hyperopt
▪ Horovod: Distributed Model Training
▪ LIME, SHAP & Model Interpretability
▪ CNNs and ImageNet
▪ Transfer Learning
▪ Object Detection
▪ Generative Adversarial Networks (GANs)

Survey
▪ Pandas/Spark?
▪ Machine Learning? Deep Learning?
▪ Expectations?

Deep Learning Overview

Why Deep Learning?
▪ Performs well on complex datasets like images, sequences, and natural language
▪ Scales better as data size increases
▪ Can theoretically approximate any continuous function (universal approximation theorem)

Open Source Landscape

Where Does DL Fit In?
What Is Deep Learning?
▪ “Composing representations of data in a hierarchical manner”
Keras
▪ High-level Python API to build neural networks
▪ Official high-level API of TensorFlow
▪ Has over 250,000 users
▪ Released by François Chollet in 2015

Why Keras?

Hardware Considerations
▪ GPUs are preferred for training because of their computation speed, but data transfer to and from them is costly
▪ CPUs are generally acceptable for inference

Why DL On Databricks?

Neural Network Fundamentals

Layers
▪ Input layer
▪ Zero or more hidden layers
▪ Output layer
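A minimal sketch of these layers in Keras (the layer sizes and input dimension here are illustrative, not from the slides):

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Input layer (10 features) -> one hidden layer -> output layer
model = Sequential([
    Dense(20, activation="relu", input_shape=(10,)),  # hidden layer
    Dense(1),                                         # output layer
])
model.summary()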
Regression Evaluation
▪ Measure "closeness" between label and prediction ▪ When predicting someone's weight, better to be off by 2 lbs instead of 20 lbs ▪ Evaluation metrics: ▪ Loss: (y−ŷ) ▪ Absolute loss: |y−ŷ| ▪ Squared loss: (y−ŷ)2 Evaluation Metric: MSE Evaluation Metric: MSE Backpropagation
Backpropagation
▪ Calculate gradients to update weights

Linear Regression With Keras
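A minimal sketch of linear regression in Keras on synthetic data (the data and hyperparameters are illustrative); fit() runs backpropagation to compute the gradient for each weight update:

import numpy as np
import tensorflow as tf

# Synthetic data: y = 3x + 2 plus noise
X = np.random.rand(1000, 1).astype("float32")
y = 3 * X + 2 + 0.1 * np.random.randn(1000, 1).astype("float32")

# A single Dense unit with no activation is linear regression
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="sgd", loss="mse")  # the MSE metric from above

model.fit(X, y, epochs=20, batch_size=32, verbose=0)
print(model.get_weights())  # weights should approach [[3.]] and [2.]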
Activation Functions
▪ Provide non-linearity in our neural networks to learn more complex relationships
▪ Sigmoid
▪ Hyperbolic Tangent (Tanh)
▪ ReLU
▪ Leaky ReLU
▪ PReLU
▪ ELU

Sigmoid
▪ Saturates and kills gradients
▪ Not zero-centered
Hyperbolic Tangent (Tanh)
▪ Zero-centered!
▪ BUT, like the sigmoid, its activations saturate
ReLU
▪ Does not saturate for positive inputs, BUT gradients can still go to zero: a unit stuck in the x < 0 region outputs 0 and stops learning (the "dying ReLU" problem)
Leaky ReLU
▪ For x < 0: f(x) = α·x
▪ For x ≥ 0: f(x) = x
These functions are not differentiable at 0, so in practice we set the derivative there to 0 or to the average of the left and right derivatives.

Comparison
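For comparison, a minimal numpy sketch of the activations above (the leaky slope α = 0.01 is a common default, not a requirement):

import numpy as np

def sigmoid(x):                 # saturates at 0 and 1; not zero-centered
    return 1 / (1 + np.exp(-x))

def tanh(x):                    # zero-centered, but still saturates
    return np.tanh(x)

def relu(x):                    # no saturation for x > 0; zero gradient for x < 0
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):  # small slope for x < 0 keeps gradients alive
    return np.where(x >= 0, x, alpha * x)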
Optimizers

Stochastic Gradient Descent (SGD)
▪ Choosing a proper learning rate can be difficult.
Stochastic Gradient Descent
▪ Easy to get stuck in local minima
Momentum
▪ Accelerates SGD: like pushing a ball down a hill
▪ Takes the average of the directions we have been heading (current velocity and acceleration)
▪ Dampens oscillation back and forth and helps escape local minima
Adam
▪ Adaptive Moment Estimation (Adam): combines momentum with a per-parameter adaptive learning rate
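A minimal sketch of configuring these optimizers in Keras (the learning rates and momentum value are illustrative):

import tensorflow as tf
from tensorflow.keras.optimizers import SGD, Adam

# Plain SGD: choosing the learning rate is the hard part
sgd = SGD(learning_rate=0.01)
# Momentum averages recent update directions to damp oscillation
sgd_momentum = SGD(learning_rate=0.01, momentum=0.9)
# Adam: momentum plus per-parameter adaptive learning rates
adam = Adam(learning_rate=0.001)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer=adam, loss="mse")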
Keras Lab

Hyperparameter Selection
Which dataset should we use to select hyperparameters? Train? Test?

Validation Dataset
▪ Split the dataset into three!
▪ Train on the training set
▪ Select hyperparameters based on performance on the validation set
▪ Test on the test set, as in the sketch below
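A minimal sketch of the three-way split, assuming scikit-learn and a 60/20/20 ratio (the ratio and placeholder data are illustrative):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 5), np.random.rand(100)  # placeholder data

# Hold out 20% as the test set...
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ...then 25% of the remainder (20% overall) as the validation set
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)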
Advanced Keras & Lab

MLflow & Lab

Hyperopt & Lab

Horovod
▪ Created by Alexander Sergeev of Uber, open-sourced in 2017
▪ Simplifies distributed neural network training
▪ Supports TensorFlow, Keras, PyTorch, and Apache MXNet

Classical Parameter Server
All-Reduce
# Only one line of code change!
optimizer = hvd.DistributedOptimizer(optimizer)
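For context, a minimal sketch of the setup around that one line, following the pattern of Horovod's Keras examples (the model, loss, and learning rate are illustrative; details such as GPU pinning and data sharding vary by cluster):

import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one process per GPU/worker

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])

# Scale the learning rate by the number of workers, then wrap the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)  # all-reduce gradient averaging

model.compile(optimizer=optimizer, loss="mse")

# Broadcast initial weights from rank 0 so every worker starts in sync
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]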
Horovod Demo

Model Interpretability

LIME

SHAP
▪ Shapley values: each feature's average marginal contribution to a prediction, a concept from cooperative game theory
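A minimal sketch of estimating Shapley values with the shap library on a small Keras model (the model, data, and background sample size are placeholders; KernelExplainer is the model-agnostic estimator, and faster model-specific explainers exist):

import numpy as np
import shap
import tensorflow as tf

X = np.random.rand(100, 4).astype("float32")  # placeholder data
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# KernelExplainer estimates Shapley values by perturbing features
# against a small background sample
explainer = shap.KernelExplainer(model.predict, X[:20])
shap_values = explainer.shap_values(X[:5])  # per-feature contributions per row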
Convolutional Neural Networks

Convolutions
▪ Focus on local connectivity (fewer parameters to learn)
▪ A filter/kernel slides across the input image (often 3x3)
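A minimal sketch of a convolutional layer in Keras (the filter count and image size are illustrative):

import tensorflow as tf

# 32 filters, each a 3x3 kernel that slides across the image; the same
# weights are shared at every position (local connectivity, few parameters)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu",
                           input_shape=(224, 224, 3)),
])
model.summary()  # far fewer parameters than a Dense layer on the same input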
Image Kernels Visualization
CS 231 Convolutional Networks
ImageNet Challenge
▪ Classify images into one of 1,000 categories
▪ 2012 Deep Learning breakthrough with AlexNet: 16% top-5 test error rate (the next closest was 26%)

VGG16 (2014)
▪ One of the most widely used architectures because of its simplicity.
Max vs. Average Pooling
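A minimal sketch contrasting the two pooling operations on a toy 4x4 input:

import numpy as np
import tensorflow as tf

x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)  # one 4x4 single-channel image

max_pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)      # keeps the strongest activation
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(x)  # averages over each window
print(max_pool.numpy().squeeze())  # [[ 5.  7.] [13. 15.]]
print(avg_pool.numpy().squeeze())  # [[ 2.5  4.5] [10.5 12.5]]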
Inception
Residual Connection
What Do CNNs Learn?
▪ Breaking Convnets

CNN Demo

Transfer Learning
▪ IDEA: Intermediate representations learned for one task may be useful for other related tasks
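A minimal sketch of this idea in Keras, assuming an ImageNet-pretrained VGG16 base and a hypothetical 10-class target task:

import tensorflow as tf

# Load the VGG16 convolutional base pretrained on ImageNet, without its classifier
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the learned intermediate representations

# Attach a new head for the target task (10 classes is illustrative)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")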
When To Use Transfer Learning?
Reference: Andrej Karpathy's Transfer Learning

OK, so how do I find the optimal neural network architecture?

Neural Architecture Search with Reinforcement Learning
Generative Adversarial Networks

Generative Adversarial Networks (GANs)
▪ Estimates generative models
▪ Simultaneously trains two models:
  ▪ G: a generative model that captures the data distribution
  ▪ D: a discriminative model that predicts the probability of a sample coming from G
▪ Used in generating art, deep fakes, up-scaling graphics, and astronomy research
▪ Paper

GANs Architecture: 2 Models
The Algorithm
▪ G takes noise as input and outputs a counterfeit
▪ D takes counterfeits and real examples as input and outputs P(counterfeit)
▪ To prevent overfitting:
  ▪ Alternate k steps of optimizing D with one step of optimizing G
  ▪ Start with k of at least 5
▪ Train G to maximize log D(G(z)) rather than to minimize log(1 − D(G(z))), which provides stronger, non-saturated gradients early in training
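A minimal sketch of the alternating loop on a toy 1-D data distribution (architectures, learning rates, and step counts are illustrative; here D outputs P(real), i.e. 1 − P(counterfeit), so real samples are labeled 1 and counterfeits 0):

import numpy as np
import tensorflow as tf

latent_dim = 16

# G: noise in, counterfeit sample out (1-D samples keep the sketch small)
G = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(1),
])
# D: sample in, probability of being real out
D = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-3), tf.keras.optimizers.Adam(1e-3)
real_data = np.random.normal(3.0, 0.5, size=(1024, 1)).astype("float32")
k = 5  # D steps per G step, as above

for step in range(1000):
    for _ in range(k):  # k steps of optimizing D
        noise = tf.random.normal((64, latent_dim))
        fake = G(noise)
        real = real_data[np.random.randint(0, len(real_data), 64)]
        with tf.GradientTape() as tape:
            d_loss = bce(tf.ones((64, 1)), D(real)) + bce(tf.zeros((64, 1)), D(fake))
        d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                                  D.trainable_variables))
    # One step of optimizing G: maximize log D(G(z)) (the non-saturating loss)
    noise = tf.random.normal((64, latent_dim))
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((64, 1)), D(G(noise)))
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))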
Resources
▪ Horovod Meetup Talk
▪ MLflow
▪ Deep Learning with Python
▪ Stanford's CS 231
▪ fast.ai
▪ Blog posts & webinars
▪ Databricks Runtime for ML

Pyception

Thank you!