
Scalable Deep Learning with TensorFlow and Apache Spark™

Schedule
▪ Keras and Neural Network Fundamentals
▪ MLflow and Spark UDFs
▪ Hyperparameter Tuning with Hyperopt
▪ Horovod: Distributed Model Training
▪ LIME, SHAP & Model Interpretability
▪ CNNs and ImageNet
▪ Transfer Learning
▪ Object Detection
▪ Generative Adversarial Networks (GANs)

Survey
▪ Pandas/Spark?
▪ Machine Learning? Deep Learning?
▪ Expectations?

Deep Learning Overview

Why Deep Learning?
▪ Performs well on complex data such as images, sequences, and natural language
▪ Scales better as data size increases
▪ Can, in theory, approximate any continuous function (universal approximation theorem)

Open Source Landscape

Where Does DL Fit In?

What Is Deep Learning?
▪ "Composing representations of data in a hierarchical manner"

Keras
▪ High-level Python API for building neural networks
▪ Official high-level API of TensorFlow
▪ Has over 250,000 users
▪ Released by François Chollet in 2015

Why Keras?

Hardware Considerations
▪ GPUs are preferred for training because of their computational speed, but transferring data to and from them is costly
▪ CPUs are generally acceptable for inference

Why DL On Databricks?

Neural Network Fundamentals

Layers
▪ Input layer
▪ Zero or more hidden layers
▪ Output layer

Regression Evaluation
▪ Measure the "closeness" between the label and the prediction
▪ When predicting someone's weight, it is better to be off by 2 lbs than by 20 lbs
▪ Evaluation metrics:
  ▪ Loss: (y − ŷ)
  ▪ Absolute loss: |y − ŷ|
  ▪ Squared loss: (y − ŷ)²

Evaluation Metric: MSE
▪ MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Backpropagation
▪ Calculate gradients to update the weights

Linear Regression With Keras
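The course's "Linear Regression With Keras" notebook is not reproduced here; as a rough stand-in, a single Dense unit with no activation, trained with MSE and SGD, is exactly linear regression. The synthetic data (y = 3x + 2) and the hyperparameters below are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic data for y = 3x + 2 plus noise (assumed; the course labs use their own dataset)
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(1000, 1)).astype("float32")
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=(1000, 1)).astype("float32")

# A single Dense unit with no activation is linear regression: y_hat = w*x + b
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="mse")  # squared loss averaged over the batch = MSE

model.fit(X, y, epochs=20, batch_size=32, verbose=0)
w, b = model.layers[0].get_weights()
print(f"learned weight ≈ {w[0, 0]:.2f}, bias ≈ {b[0]:.2f}")  # should approach 3 and 2
```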
Activation Functions
▪ Provide non-linearity in our neural networks so they can learn more complex relationships
▪ Sigmoid
▪ Tanh
▪ ReLU
▪ Leaky ReLU
▪ PReLU
▪ ELU

Sigmoid
▪ Saturates and kills gradients
▪ Not zero-centered

Hyperbolic Tangent (Tanh)
▪ Zero-centered!
▪ BUT, like the sigmoid, its activations saturate

ReLU
▪ BUT, gradients can still go to zero

Leaky ReLU
▪ For x < 0: f(x) = α·x
▪ For x ≥ 0: f(x) = x
▪ These functions are not differentiable at 0, so we set the derivative there to 0 or to the average of the left and right derivatives

Comparison
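For concreteness, a minimal NumPy sketch of the activation functions compared above; the Leaky ReLU slope α = 0.01 is an assumed default, not a value prescribed by the course.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # output in (0, 1), saturates for large |x|

def tanh(x):
    return np.tanh(x)                       # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)               # gradient is 0 for x < 0 ("dead" ReLUs)

def leaky_relu(x, alpha=0.01):
    return np.where(x < 0, alpha * x, x)    # small slope keeps gradients alive for x < 0

x = np.linspace(-5, 5, 11)
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, np.round(fn(x), 3))
```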
Optimizers

Stochastic Gradient Descent (SGD)
▪ Choosing a proper learning rate can be difficult
▪ Easy to get stuck in local minima

Momentum
▪ Accelerates SGD: like pushing a ball down a hill
▪ Takes a running average of the direction we have been heading (current velocity and acceleration)
▪ Limits oscillating back and forth and helps escape local minima

Adam
▪ Adaptive Moment Estimation (Adam)
▪ Combines momentum with per-parameter adaptive learning rates
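A brief sketch of how these optimizers are selected when compiling a Keras model; the toy architecture and the specific learning-rate and momentum values are assumptions for illustration.

```python
import tensorflow as tf

def build_model():
    # Tiny regression network; the architecture is only for illustration
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Plain SGD: sensitive to the learning-rate choice
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# SGD with momentum: averages recent gradient directions to damp oscillation
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Adam: adapts a per-parameter step size from first and second moment estimates
adam = tf.keras.optimizers.Adam(learning_rate=0.001)

for name, opt in [("sgd", sgd), ("momentum", sgd_momentum), ("adam", adam)]:
    model = build_model()
    model.compile(optimizer=opt, loss="mse")
    print(name, "compiled with", model.optimizer.__class__.__name__)
```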
Keras

Keras Lab

Hyperparameter Selection
▪ Which dataset should we use to select hyperparameters? Train? Test?

Validation Dataset
▪ Split the dataset into three!
▪ Train on the training set
▪ Select hyperparameters based on performance on the validation set
▪ Report final performance on the test set
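One common way to produce the three-way split described above; the 60/20/20 proportions and the hand-rolled NumPy implementation are assumptions (scikit-learn's train_test_split is an equally common choice).

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the data and split it into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.arange(100).reshape(-1, 1)
y = 2 * X.ravel()
train, val, test = train_val_test_split(X, y)
print([len(split[0]) for split in (train, val, test)])  # [60, 20, 20]
```

In Keras, the validation set can also be supplied directly to model.fit via validation_data, or carved out automatically with validation_split.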
Advanced Keras & Lab

MLflow & Lab
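A minimal sketch of MLflow experiment tracking wrapped around a Keras fit; the toy data, parameter names, and metric names are assumptions, not the course lab.

```python
import mlflow
import numpy as np
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")
y = X.sum(axis=1, keepdims=True)

with mlflow.start_run(run_name="keras-regression"):
    units, lr, epochs = 16, 0.01, 10
    mlflow.log_params({"units": units, "learning_rate": lr, "epochs": epochs})

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    history = model.fit(X, y, epochs=epochs, validation_split=0.2, verbose=0)

    # Log the final training and validation loss so runs can be compared in the MLflow UI
    mlflow.log_metric("train_mse", history.history["loss"][-1])
    mlflow.log_metric("val_mse", history.history["val_loss"][-1])
```

MLflow also offers autologging for TensorFlow/Keras, and a logged model can be applied to a DataFrame at scale with mlflow.pyfunc.spark_udf, which is the "Spark UDFs" portion of the schedule.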
Hyperopt & Lab
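A sketch of Hyperopt's fmin/TPE API; the quadratic objective stands in for a real train-and-validate loop, and the search-space bounds are assumptions.

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(params):
    # In the course labs this would train a model and return its validation loss;
    # here a simple quadratic stands in for that loss surface.
    lr = params["learning_rate"]
    loss = (lr - 0.01) ** 2
    return {"loss": loss, "status": STATUS_OK}

search_space = {"learning_rate": hp.loguniform("learning_rate", -7, 0)}  # ~0.001 to 1

trials = Trials()
best = fmin(fn=objective,
            space=search_space,
            algo=tpe.suggest,   # Tree-structured Parzen Estimator
            max_evals=20,
            trials=trials)
print("best hyperparameters:", best)
```

On Databricks, SparkTrials can be used in place of Trials to distribute the evaluations across the cluster.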
Horovod
▪ Created by Alexander Sergeev of Uber, open-sourced in 2017
▪ Simplifies distributed neural network training
▪ Supports TensorFlow, Keras, PyTorch, and Apache MXNet

Classical Parameter Server

All-Reduce
▪ Only one line of code change:
  optimizer = hvd.DistributedOptimizer(optimizer)

Horovod Demo
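A sketch of the boilerplate that usually surrounds that one line when training a Keras model with Horovod (worker pinning, learning-rate scaling, weight broadcast); the toy data, model, and hyperparameters are assumptions.

```python
import numpy as np
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one process per GPU/worker; launched via horovodrun or Databricks HorovodRunner

# Pin each process to a single local GPU (no-op on CPU-only machines)
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

X = np.random.rand(1024, 10).astype("float32")
y = X.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Scale the learning rate by the number of workers, then wrap the optimizer:
# hvd.DistributedOptimizer averages gradients across workers with ring all-reduce.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)
model.compile(optimizer=optimizer, loss="mse")

callbacks = [
    # Broadcast initial weights from rank 0 so every worker starts from the same state
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Only rank 0 prints progress, to keep the logs readable
model.fit(X, y, epochs=5, batch_size=64,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```

On Databricks, HorovodRunner (from the sparkdl package in the ML runtime) is the usual way to launch a training function like this across the workers of a cluster.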
Model Interpretability

LIME
▪ Local Interpretable Model-agnostic Explanations

SHAP
▪ Shapley values
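A minimal sketch of estimating Shapley values for a Keras model with SHAP's model-agnostic KernelExplainer; the toy model, data, and background-sample size are assumptions (SHAP also ships deep-network-specific explainers).

```python
import numpy as np
import shap
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")
y = (2 * X[:, 0] - X[:, 1]).reshape(-1, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

# KernelExplainer is model-agnostic: it estimates Shapley values by perturbing
# inputs relative to a small background sample.
background = X[:50]
explainer = shap.KernelExplainer(lambda data: model.predict(data, verbose=0), background)
shap_values = explainer.shap_values(X[:5])
print(np.array(shap_values).shape)  # per-feature attributions for 5 rows
```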
Convolutional Neural Networks

Convolutions
▪ Focus on local connectivity (fewer parameters to learn)
▪ A filter/kernel slides across the input image (often 3x3)

Image Kernels Visualization
CS 231 Convolutional Networks

ImageNet Challenge
▪ Classify images into one of 1,000 categories
▪ 2012 deep learning breakthrough with AlexNet: 16% top-5 test error rate (the next closest entry was 25%)

VGG16 (2014)
▪ One of the most widely used architectures because of its simplicity

Max vs. Avg. Pooling

Inception

Residual Connection

What Do CNNs Learn?
▪ Breaking Convnets

CNN Demo
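To make the pieces concrete, a small VGG-style stack of 3x3 convolutions with max pooling in Keras; the layer sizes and the 10-class output are assumptions rather than any specific course model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                           # small RGB images
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),                        # downsample, keep strongest activations
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),              # 10 example classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```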
Transfer Learning
▪ IDEA: intermediate representations learned for one task may be useful for other related tasks

When To Use Transfer Learning?
▪ Reference: Andrej Karpathy's transfer learning notes

Transfer Learning
▪ OK, so how do I find the optimal neural network architecture?
▪ Neural Architecture Search with Reinforcement Learning
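A common Keras transfer-learning pattern: freeze a VGG16 base pretrained on ImageNet and train only a new classification head. The head size and the 5-class output are assumptions for illustration.

```python
import tensorflow as tf

# Load VGG16 pretrained on ImageNet, without its 1000-way classification head
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False   # freeze the convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),   # new head for a 5-class task
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

With enough target data, the top convolutional blocks can later be unfrozen and fine-tuned at a lower learning rate.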
Generative Adversarial Networks

Generative Adversarial Networks (GANs)
▪ A framework for estimating generative models
▪ Simultaneously trains two models:
  ▪ G: a generative model that captures the data distribution
  ▪ D: a discriminative model that predicts the probability of data coming from G
▪ Used in generating art, deep fakes, up-scaling graphics, and astronomy research
▪ Paper (Goodfellow et al., 2014)

GANs Architecture: 2 Models

The Algorithm
▪ G takes noise as input and outputs a counterfeit
▪ D takes counterfeits and real examples as input and outputs P(counterfeit)
▪ To prevent overfitting, alternate k steps of optimizing D with one step of optimizing G
▪ Start with k of at least 5
▪ Train G to maximize log D(G(z)) rather than minimize log(1 − D(G(z))), which provides stronger, non-saturated gradients early in training
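A compact sketch of the alternating training loop on a toy 1-D Gaussian; the network sizes and hyperparameters are assumptions, and for brevity it uses k = 1 discriminator step per generator step rather than the larger k suggested above.

```python
import numpy as np
import tensorflow as tf

latent_dim = 8

# G maps noise to a fake sample; D outputs the probability that a sample is a counterfeit's opposite (i.e., real)
G = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                         tf.keras.layers.Dense(1)])
D = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                         tf.keras.layers.Dense(1, activation="sigmoid")])
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy()

def real_batch(n=64):
    # "Real" data: samples from N(3, 0.5); stands in for the true data distribution
    return np.random.normal(3.0, 0.5, size=(n, 1)).astype("float32")

for step in range(2000):
    # --- Discriminator step: push real samples toward 1 and fakes toward 0 ---
    z = tf.random.normal((64, latent_dim))
    with tf.GradientTape() as tape:
        fake = G(z)
        d_loss = bce(tf.ones((64, 1)), D(real_batch())) + bce(tf.zeros((64, 1)), D(fake))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))

    # --- Generator step: non-saturating loss, i.e. maximize log D(G(z)) ---
    z = tf.random.normal((64, latent_dim))
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((64, 1)), D(G(z)))
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))

print("fake sample mean ≈", float(tf.reduce_mean(G(tf.random.normal((1000, latent_dim))))))
```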
Resources
▪ Horovod Meetup Talk
▪ MLflow
▪ Deep Learning with Python
▪ Stanford's CS 231
▪ fast.ai
▪ Blog posts & webinars
▪ Databricks Runtime for ML

Pyception

Thank you!