CS231n Lecture 8: Deep Learning Software

Total Pages: 16

File Type: PDF, Size: 1020 KB

Lecture 8: Deep Learning Software
Fei-Fei Li & Justin Johnson & Serena Yeung, April 27, 2017

Administrative
- Project proposals were due Tuesday
- We are assigning TAs to projects, stay tuned
- We are grading A1
- A2 is due Thursday 5/4
- Remember to stop your instances when not in use
- Only use GPU instances for the last notebook

Last time
- Optimization: SGD+Momentum, Nesterov, RMSProp, Adam
- Regularization: dropout; add noise during training, then marginalize it out at test time
- Transfer learning: freeze the convolutional layers of a pretrained network, reinitialize the final fully-connected layers, and train only those. [Figure: VGG-style network with the Conv-64 through Conv-512 / MaxPool blocks frozen and the FC-4096 / FC-C layers reinitialized for training.]

Today
- CPU vs GPU
- Deep learning frameworks: Caffe / Caffe2, Theano / TensorFlow, Torch / PyTorch

CPU vs GPU

[Slides: photos of a desktop machine, "Spot the CPU!" (central processing unit; image licensed under CC-BY 2.0), "Spot the GPUs!" (graphics processing unit; image in the public domain), and NVIDIA vs AMD.]

A CPU has only a few cores, but each core is much faster and much more capable; CPUs are great at sequential tasks. A GPU has far more cores, but each core is much slower and "dumber"; GPUs are great for parallel tasks. Representative 2017 hardware:

- CPU, Intel Core i7-7700K: 4 cores (8 threads with hyperthreading), 4.4 GHz, memory shared with system, $339
- CPU, Intel Core i7-6950X: 10 cores (20 threads with hyperthreading), 3.5 GHz, memory shared with system, $1723
- GPU, NVIDIA Titan Xp: 3840 cores, 1.6 GHz, 12 GB GDDR5X, $1200
- GPU, NVIDIA GTX 1070: 1920 cores, 1.68 GHz, 8 GB GDDR5, $399

Example: matrix multiplication. Multiplying an A x B matrix by a B x C matrix yields an A x C matrix, and every element of the output is an independent dot product; this kind of massively parallel computation is exactly what GPUs are good at.

Programming GPUs
- CUDA (NVIDIA only): write C-like code that runs directly on the GPU; higher-level APIs include cuBLAS, cuFFT, cuDNN, etc.
- OpenCL: similar to CUDA, but runs on anything; usually slower :(
- Udacity's Intro to Parallel Programming (https://www.udacity.com/course/cs344) covers the details, but for deep learning you should just use existing libraries.

CPU vs GPU in practice: on CNN benchmarks a GPU is roughly 64x to 76x faster than a CPU (the CPU baselines are not well-optimized, so the comparison is a little unfair), and cuDNN is roughly 2.8x to 3.4x faster than "unoptimized" CUDA. Data from https://github.com/jcjohnson/cnn-benchmarks.

CPU / GPU communication: the model is on the GPU, but the data is on disk. If you aren't careful, training can bottleneck on reading data and transferring it to the GPU! Solutions:
- Read all data into RAM
- Use an SSD instead of an HDD
- Use multiple CPU threads to prefetch data, as in the sketch below
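The prefetching idea can be made concrete in a few lines of Python. This is a minimal sketch, not code from the slides; load_batch and train_step are hypothetical placeholders for your data loader and training step.

```python
# Minimal prefetching sketch: a background thread fills a bounded queue
# with batches while the main thread trains, so disk reads overlap with
# GPU compute. load_batch and train_step are hypothetical placeholders.
import threading
import queue

def prefetch(load_batch, num_batches, buffer_size=8):
    """Yield batches that a background thread loads ahead of time."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the buffer is full
        q.put(None)               # sentinel: no more data

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is None:
            return
        yield batch

# Usage (hypothetical):
# for x, y in prefetch(load_batch, num_batches=1000):
#     train_step(x, y)  # GPU work overlaps the next disk read
```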
Deep Learning Frameworks

Last year the main frameworks were Caffe (UC Berkeley), Torch (NYU / Facebook), Theano (U Montreal), and TensorFlow (Google). This year the center of gravity has moved to industry: Caffe2 (Facebook) succeeds Caffe, PyTorch (Facebook) succeeds Torch, TensorFlow succeeds Theano, and there are also Paddle (Baidu), CNTK (Microsoft), MXNet (Amazon; developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS), and others. Today we will say a bit about Caffe / Caffe2 but mostly cover TensorFlow and PyTorch.

Recall: computational graphs. A linear classifier is a small graph: x and W feed a matrix multiply (*) that produces scores s, a hinge loss is computed from the scores, a regularization term R is added (+), and the result is the loss L. The same idea scales up to AlexNet (figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012; reproduced with permission) and to far more complex graphs (figure reproduced with permission from a Twitter post by Andrej Karpathy).

The point of deep learning frameworks:
(1) Easily build big computational graphs
(2) Easily compute gradients in computational graphs
(3) Run it all efficiently on GPU (wrapping cuDNN, cuBLAS, etc.)

Computational Graphs: Numpy. As a running example, take the small graph a = x * y, b = a + z, c = sum(b). Numpy expresses the forward pass directly, but there are two problems: it can't run on GPU, and we have to compute our own gradients (a minimal reconstruction follows below).
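The numpy snippet from the slides did not survive extraction, so here is a minimal reconstruction, assuming the graph a = x * y, b = a + z, c = sum(b) shown in the figures, with the backward pass derived by hand:

```python
# Forward and backward pass for the graph a = x*y, b = a+z, c = sum(b),
# with gradients written out manually (the "compute our own gradients"
# problem the slides point at).
import numpy as np

np.random.seed(0)
N, D = 3, 4
x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

# Forward pass
a = x * y
b = a + z
c = np.sum(b)

# Backward pass, derived by hand via the chain rule
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))  # dc/db = 1 everywhere
grad_a = grad_b.copy()             # b = a + z
grad_z = grad_b.copy()
grad_x = grad_a * y                # a = x * y
grad_y = grad_a * x
```

Every new operation in the graph means more hand-written gradient code, and none of it can run on the GPU; those are exactly the two problems frameworks solve.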
Computational Graphs: TensorFlow. The same graph in TensorFlow: first create the forward computational graph, then ask TensorFlow to compute the gradients, then tell TensorFlow whether to run on CPU or GPU with a device directive.

Computational Graphs: PyTorch. Define Variables to start building a computational graph; the forward pass then looks just like numpy. Calling c.backward() computes all gradients, and you run on GPU by casting the tensors to .cuda(). Side by side, the numpy, TensorFlow, and PyTorch versions of this small graph look nearly identical.

TensorFlow (more detail)

TensorFlow: Neural Net. Running example: train a two-layer ReLU network on random data with L2 loss. (Assume the necessary imports at the top of each snippet.) The pattern is: first define the computational graph, then run the graph many times.

- Create placeholders for the input x, the weights w1 and w2, and the targets y.
- Forward pass: compute the prediction for y and the loss (L2 distance between y and y_pred). No computation happens here; we are just building the graph!
- Tell TensorFlow to compute the gradient of the loss with respect to w1 and w2. Again, no computation here; we are still just building the graph.
- Now that the graph is built, enter a session so we can actually run the graph.
- Create numpy arrays that will fill in the placeholders above.
- Run the graph: feed in the numpy arrays for x, y, w1, and w2; get back numpy arrays for the loss, grad_w1, and grad_w2.
- Train the network: run the graph over and over, using the gradients to update the weights.

Problem: this copies the weights between CPU and GPU on every training step. The reconstructed loop below makes the pattern, and the problem, concrete.
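The slide code itself is not recoverable from this extraction; the following is a sketch of what the snippets build up to, assuming the TensorFlow 1.x API that was current in April 2017 (tf.placeholder, tf.gradients, tf.Session):

```python
# Two-layer ReLU network on random data with L2 loss, TensorFlow 1.x style.
import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100

# Stage 1: define the computational graph (no computation happens here).
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

h = tf.maximum(tf.matmul(x, w1), 0)                      # ReLU hidden layer
y_pred = tf.matmul(h, w2)
diff = y_pred - y
loss = tf.reduce_mean(tf.reduce_sum(diff ** 2, axis=1))  # L2 loss

grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])          # still just graph-building

# Stage 2: run the graph many times inside a session.
with tf.Session() as sess:
    values = {x: np.random.randn(N, D),
              w1: np.random.randn(D, H),
              w2: np.random.randn(H, D),
              y: np.random.randn(N, D)}
    learning_rate = 1e-5
    for t in range(50):
        loss_val, grad_w1_val, grad_w2_val = sess.run(
            [loss, grad_w1, grad_w2], feed_dict=values)
        # The update happens in numpy, so the weights travel between
        # CPU and GPU on every step: the problem the slides point out.
        values[w1] -= learning_rate * grad_w1_val
        values[w2] -= learning_rate * grad_w2_val
```

For comparison, a sketch of the small x, y, z graph in PyTorch, assuming the 2017-era torch.autograd.Variable API the slides describe (in current PyTorch, tensors created with requires_grad=True play the same role):

```python
# The x, y, z graph in PyTorch: the forward pass looks just like numpy,
# and c.backward() computes all gradients.
import torch
from torch.autograd import Variable

N, D = 3, 4
x = Variable(torch.randn(N, D), requires_grad=True)
y = Variable(torch.randn(N, D), requires_grad=True)
z = Variable(torch.randn(N, D), requires_grad=True)

a = x * y          # forward pass
b = a + z
c = torch.sum(b)

c.backward()       # backprop through the whole graph
print(x.grad)      # dc/dx; cast tensors with .cuda() to run on GPU
```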