
Scalable Deep Learning with TensorFlow and Apache Spark™

Schedule
▪ Keras and Neural Network Fundamentals
▪ MLflow and Spark UDFs
▪ Hyperparameter Tuning with Hyperopt
▪ Horovod: Distributed Model Training
▪ LIME, SHAP & Model Interpretability
▪ CNNs and ImageNet
▪ Transfer Learning
▪ Object Detection
▪ Generative Adversarial Networks (GANs)

Survey
▪ Pandas/Spark?
▪ Machine Learning? Deep Learning?
▪ Expectations?

Deep Learning Overview

Why Deep Learning?
▪ Performs well on complex data such as images, sequences, and natural language
▪ Scales better as data size increases
▪ Can, in theory, approximate any continuous function (universal approximation theorem)

Open Source Landscape

Where Does DL Fit In?

What Is Deep Learning?
▪ "Composing representations of data in a hierarchical manner"

Keras
▪ High-level Python API for building neural networks
▪ Official high-level API of TensorFlow
▪ Has over 250,000 users
▪ Released by François Chollet in 2015

Why Keras?

Hardware Considerations
▪ GPUs are preferred for training because of their computational speed, but transferring data to and from them is costly
▪ CPUs are generally acceptable for inference

Why DL On Databricks?

Neural Network Fundamentals

Layers
▪ Input layer
▪ Zero or more hidden layers
▪ Output layer

Regression Evaluation
▪ Measure the "closeness" between the label and the prediction
▪ When predicting someone's weight, it is better to be off by 2 lbs than by 20 lbs
▪ Evaluation metrics:
  ▪ Loss: (y − ŷ)
  ▪ Absolute loss: |y − ŷ|
  ▪ Squared loss: (y − ŷ)²

Evaluation Metric: MSE
▪ MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Backpropagation
▪ Calculate gradients to update the weights

Linear Regression With Keras
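The course's "Linear Regression With Keras" notebook is not reproduced here; as a rough stand-in, a single Dense unit with no activation, trained with MSE and SGD, is exactly linear regression. The synthetic data (y = 3x + 2) and the hyperparameters below are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic data for y = 3x + 2 plus noise (assumed; the course labs use their own dataset)
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(1000, 1)).astype("float32")
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=(1000, 1)).astype("float32")

# A single Dense unit with no activation is linear regression: y_hat = w*x + b
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="mse")  # squared loss averaged over the batch = MSE

model.fit(X, y, epochs=20, batch_size=32, verbose=0)
w, b = model.layers[0].get_weights()
print(f"learned weight ≈ {w[0, 0]:.2f}, bias ≈ {b[0]:.2f}")  # should approach 3 and 2
```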
Activation Functions
▪ Provide non-linearity in our neural networks so they can learn more complex relationships
▪ Sigmoid
▪ Tanh
▪ ReLU
▪ Leaky ReLU
▪ PReLU
▪ ELU

Sigmoid
▪ Saturates and kills gradients
▪ Not zero-centered

Hyperbolic Tangent (Tanh)
▪ Zero-centered!
▪ BUT, like the sigmoid, its activations saturate

ReLU
▪ BUT, gradients can still go to zero

Leaky ReLU
▪ For x < 0: f(x) = α·x
▪ For x ≥ 0: f(x) = x
▪ These functions are not differentiable at 0, so we set the derivative there to 0 or to the average of the left and right derivatives

Comparison
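For concreteness, a minimal NumPy sketch of the activation functions compared above; the Leaky ReLU slope α = 0.01 is an assumed default, not a value prescribed by the course.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # output in (0, 1), saturates for large |x|

def tanh(x):
    return np.tanh(x)                       # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)               # gradient is 0 for x < 0 ("dead" ReLUs)

def leaky_relu(x, alpha=0.01):
    return np.where(x < 0, alpha * x, x)    # small slope keeps gradients alive for x < 0

x = np.linspace(-5, 5, 11)
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, np.round(fn(x), 3))
```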
Optimizers

Stochastic Gradient Descent (SGD)
▪ Choosing a proper learning rate can be difficult
▪ Easy to get stuck in local minima

Momentum
▪ Accelerates SGD: like pushing a ball down a hill
▪ Takes a running average of the direction we have been heading (current velocity and acceleration)
▪ Limits oscillating back and forth and helps escape local minima

Adam
▪ Adaptive Moment Estimation (Adam)
▪ Combines momentum with per-parameter adaptive learning rates
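A brief sketch of how these optimizers are selected when compiling a Keras model; the toy architecture and the specific learning-rate and momentum values are assumptions for illustration.

```python
import tensorflow as tf

def build_model():
    # Tiny regression network; the architecture is only for illustration
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Plain SGD: sensitive to the learning-rate choice
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# SGD with momentum: averages recent gradient directions to damp oscillation
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Adam: adapts a per-parameter step size from first and second moment estimates
adam = tf.keras.optimizers.Adam(learning_rate=0.001)

for name, opt in [("sgd", sgd), ("momentum", sgd_momentum), ("adam", adam)]:
    model = build_model()
    model.compile(optimizer=opt, loss="mse")
    print(name, "compiled with", model.optimizer.__class__.__name__)
```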
Keras

Keras Lab

Hyperparameter Selection
▪ Which dataset should we use to select hyperparameters? Train? Test?

Validation Dataset
▪ Split the dataset into three!
▪ Train on the training set
▪ Select hyperparameters based on performance on the validation set
▪ Report final performance on the test set
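One common way to produce the three-way split described above; the 60/20/20 proportions and the hand-rolled NumPy implementation are assumptions (scikit-learn's train_test_split is an equally common choice).

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the data and split it into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.arange(100).reshape(-1, 1)
y = 2 * X.ravel()
train, val, test = train_val_test_split(X, y)
print([len(split[0]) for split in (train, val, test)])  # [60, 20, 20]
```

In Keras, the validation set can also be supplied directly to model.fit via validation_data, or carved out automatically with validation_split.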
Advanced Keras & Lab

MLflow & Lab
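A minimal sketch of MLflow experiment tracking wrapped around a Keras fit; the toy data, parameter names, and metric names are assumptions, not the course lab.

```python
import mlflow
import numpy as np
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")
y = X.sum(axis=1, keepdims=True)

with mlflow.start_run(run_name="keras-regression"):
    units, lr, epochs = 16, 0.01, 10
    mlflow.log_params({"units": units, "learning_rate": lr, "epochs": epochs})

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    history = model.fit(X, y, epochs=epochs, validation_split=0.2, verbose=0)

    # Log the final training and validation loss so runs can be compared in the MLflow UI
    mlflow.log_metric("train_mse", history.history["loss"][-1])
    mlflow.log_metric("val_mse", history.history["val_loss"][-1])
```

MLflow also offers autologging for TensorFlow/Keras, and a logged model can be applied to a DataFrame at scale with mlflow.pyfunc.spark_udf, which is the "Spark UDFs" portion of the schedule.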
Hyperopt & Lab
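A sketch of Hyperopt's fmin/TPE API; the quadratic objective stands in for a real train-and-validate loop, and the search-space bounds are assumptions.

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(params):
    # In the course labs this would train a model and return its validation loss;
    # here a simple quadratic stands in for that loss surface.
    lr = params["learning_rate"]
    loss = (lr - 0.01) ** 2
    return {"loss": loss, "status": STATUS_OK}

search_space = {"learning_rate": hp.loguniform("learning_rate", -7, 0)}  # ~0.001 to 1

trials = Trials()
best = fmin(fn=objective,
            space=search_space,
            algo=tpe.suggest,   # Tree-structured Parzen Estimator
            max_evals=20,
            trials=trials)
print("best hyperparameters:", best)
```

On Databricks, SparkTrials can be used in place of Trials to distribute the evaluations across the cluster.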
Horovod
▪ Created by Alexander Sergeev of Uber, open-sourced in 2017
▪ Simplifies distributed neural network training
▪ Supports TensorFlow, Keras, PyTorch, and Apache MXNet

Classical Parameter Server

All-Reduce
▪ Only one line of code change:
  optimizer = hvd.DistributedOptimizer(optimizer)

Horovod Demo
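A sketch of the boilerplate that usually surrounds that one line when training a Keras model with Horovod (worker pinning, learning-rate scaling, weight broadcast); the toy data, model, and hyperparameters are assumptions.

```python
import numpy as np
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one process per GPU/worker; launched via horovodrun or Databricks HorovodRunner

# Pin each process to a single local GPU (no-op on CPU-only machines)
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

X = np.random.rand(1024, 10).astype("float32")
y = X.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Scale the learning rate by the number of workers, then wrap the optimizer:
# hvd.DistributedOptimizer averages gradients across workers with ring all-reduce.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)
model.compile(optimizer=optimizer, loss="mse")

callbacks = [
    # Broadcast initial weights from rank 0 so every worker starts from the same state
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Only rank 0 prints progress, to keep the logs readable
model.fit(X, y, epochs=5, batch_size=64,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```

On Databricks, HorovodRunner (from the sparkdl package in the ML runtime) is the usual way to launch a training function like this across the workers of a cluster.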
Model Interpretability

LIME
▪ Local Interpretable Model-agnostic Explanations

SHAP
▪ Shapley values
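A minimal sketch of estimating Shapley values for a Keras model with SHAP's model-agnostic KernelExplainer; the toy model, data, and background-sample size are assumptions (SHAP also ships deep-network-specific explainers).

```python
import numpy as np
import shap
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")
y = (2 * X[:, 0] - X[:, 1]).reshape(-1, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

# KernelExplainer is model-agnostic: it estimates Shapley values by perturbing
# inputs relative to a small background sample.
background = X[:50]
explainer = shap.KernelExplainer(lambda data: model.predict(data, verbose=0), background)
shap_values = explainer.shap_values(X[:5])
print(np.array(shap_values).shape)  # per-feature attributions for 5 rows
```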
Convolutional Neural Networks

Convolutions
▪ Focus on local connectivity (fewer parameters to learn)
▪ A filter/kernel slides across the input image (often 3x3)

Image Kernels Visualization
CS 231 Convolutional Networks

ImageNet Challenge
▪ Classify images into one of 1,000 categories
▪ 2012 deep learning breakthrough with AlexNet: 16% top-5 test error rate (the next closest entry was 25%)

VGG16 (2014)
▪ One of the most widely used architectures because of its simplicity

Max vs. Avg. Pooling

Inception

Residual Connection

What Do CNNs Learn?
▪ Breaking Convnets

CNN Demo
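To make the pieces concrete, a small VGG-style stack of 3x3 convolutions with max pooling in Keras; the layer sizes and the 10-class output are assumptions rather than any specific course model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                           # small RGB images
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),                        # downsample, keep strongest activations
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),              # 10 example classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```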
Transfer Learning
▪ IDEA: intermediate representations learned for one task may be useful for other related tasks

When To Use Transfer Learning?
▪ Reference: Andrej Karpathy's transfer learning notes

Transfer Learning
▪ OK, so how do I find the optimal neural network architecture?
▪ Neural Architecture Search with Reinforcement Learning
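A common Keras transfer-learning pattern: freeze a VGG16 base pretrained on ImageNet and train only a new classification head. The head size and the 5-class output are assumptions for illustration.

```python
import tensorflow as tf

# Load VGG16 pretrained on ImageNet, without its 1000-way classification head
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False   # freeze the convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),   # new head for a 5-class task
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

With enough target data, the top convolutional blocks can later be unfrozen and fine-tuned at a lower learning rate.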
Generative Adversarial Networks

Generative Adversarial Networks (GANs)
▪ A framework for estimating generative models
▪ Simultaneously trains two models:
  ▪ G: a generative model that captures the data distribution
  ▪ D: a discriminative model that predicts the probability of data coming from G
▪ Used in generating art, deep fakes, up-scaling graphics, and astronomy research
▪ Paper (Goodfellow et al., 2014)

GANs Architecture: 2 Models

The Algorithm
▪ G takes noise as input and outputs a counterfeit
▪ D takes counterfeits and real examples as input and outputs P(counterfeit)
▪ To prevent overfitting, alternate k steps of optimizing D with one step of optimizing G
▪ Start with k of at least 5
▪ Train G to maximize log D(G(z)) rather than minimize log(1 − D(G(z))), which provides stronger, non-saturated gradients early in training
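A compact sketch of the alternating training loop on a toy 1-D Gaussian; the network sizes and hyperparameters are assumptions, and for brevity it uses k = 1 discriminator step per generator step rather than the larger k suggested above.

```python
import numpy as np
import tensorflow as tf

latent_dim = 8

# G maps noise to a fake sample; D outputs the probability that a sample is a counterfeit's opposite (i.e., real)
G = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                         tf.keras.layers.Dense(1)])
D = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                         tf.keras.layers.Dense(1, activation="sigmoid")])
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy()

def real_batch(n=64):
    # "Real" data: samples from N(3, 0.5); stands in for the true data distribution
    return np.random.normal(3.0, 0.5, size=(n, 1)).astype("float32")

for step in range(2000):
    # --- Discriminator step: push real samples toward 1 and fakes toward 0 ---
    z = tf.random.normal((64, latent_dim))
    with tf.GradientTape() as tape:
        fake = G(z)
        d_loss = bce(tf.ones((64, 1)), D(real_batch())) + bce(tf.zeros((64, 1)), D(fake))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))

    # --- Generator step: non-saturating loss, i.e. maximize log D(G(z)) ---
    z = tf.random.normal((64, latent_dim))
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((64, 1)), D(G(z)))
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))

print("fake sample mean ≈", float(tf.reduce_mean(G(tf.random.normal((1000, latent_dim))))))
```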
Resources
▪ Horovod Meetup Talk
▪ MLflow
▪ Deep Learning with Python
▪ Stanford's CS 231
▪ fast.ai
▪ Blog posts & webinars
▪ Databricks Runtime for ML

Pyception

Thank you!