Software Frameworks for Deep Learning at Scale

Total Page:16

File Type:pdf, Size:1020Kb

Software Frameworks for Deep Learning at Scale Software Frameworks for Deep Learning at Scale James Fox Yiming Zou Judy Qiu Georgia Institute of Indiana University Indiana University Technology Bloomington Bloomington [email protected] [email protected] [email protected] ABSTRACT nodes. Two forms of parallelism also exist on the application level: model and data parallelism. Other aspects of frame- The study and adoption of deep learning methods has led works include release date, core language, user-facing API, to significant progress in different application domains. As computation model, communication model, deep learning deep learning continues to show promise and its utilization types, programming paradigm, fault tolerance, and visual- matures, so does the infrastructure and software needed to ization. This choice of criteria is explained in detail in Sec- support it. Various frameworks have been developed in re- tion 2. Tensorflow, CNTK, Deeplearning4j, MXNet, H2O, cent years to facilitate both implementation and training of Caffe, Theano, and Torch do not necessarily encompass the deep learning networks. As deep learning has also evolved entire space of frameworks for deep learning, but were se- to scale across multiple machines, there's a growing need for lected by a combination of factors: their being open-source, frameworks that can provide full parallel support. While level of documentation, maturity as a complete product, and deep learning frameworks restricted to running on a sin- level of adoption by the community. These frameworks, as gle machine have been studied and compared, frameworks well as some others not included in the Section 2 chart, are which support parallel deep learning at scale are relatively examined in detail in Section 3. Section 4 discusses finer less known and well-studied. This paper seeks to bridge that points of parallelism and scalability in deep learning which gap by surveying, summarizing, and comparing frameworks may escape but are pertinent to the framework discussion. which currently support distributed execution, including but Section 5 offers concluding remarks. not limited to Tensorflow, CNTK, Deeplearning4j, MXNet, H2O, Caffe, Theano, and Torch. 2. FRAMEWORK COMPARISON 1. INTRODUCTION The relevance of release date, core language, user-facing Deep learning has been quite successful in improving predic- APIs are self-explanatory. Synchronization model speci- tive power in domains such as computer vision and natural fies the nature of data consistency through execution, i.e. language processing. State-of-the-art performance in com- whether updates are synchronous or asynchronous. In con- puter vision is driven by the convolutional neural network text of optimization kernels like stochastic gradient descent model, a special kind of feed-forward deep learning model. (SGD), synchronous execution has better convergence guar- The high-level idea is to learn images filters for extracting antees by maintaining consistency or near-consistency with meaningful features and predictions. On the other hand, sequential execution. Asynchronous SGD can exploit more natural language processing has had a lot of success apply- parallelism and train faster, but with less guarantees of con- ing recurrent neural networks, a type of feedback model well- vergence speed. Frameworks like Tensorflow and MXNet suited for learning order and context-sensitive sequences (as leave this tradeoff as a choice to the user. in natural languages). The communication model tries to categorize the nature of As accuracies continue to increase in both domains, so do across-machine execution according to well-known paradigms. the complexity of network architectures and the size of the There are three possible levels of parallelism at the hardware parameter space. Google's network for unsupervised learn- level: cores within a CPU/GPU device, across multiple de- ing of image features reached a billion parameters [22], and vices (usually GPUs for deep learning), or across machines. was increased to 11 billion parmeters in a separate experi- Most lower-level library kernels (e.g. for linear algebra) are ment at Stanford [27]. In the NLP space, Digital Reasoning designed to use multiple cores of a device by default, so this Systems trained a 160 billion parameter network [29] fairly is not a major point of comparison. At this point, all the recently. Handling problems of this size involves looking be- frameworks also support parallelism across multiple GPUs. yond the single machine, which Google first demonstrated Theano and Torch do not yet support multi-machine paral- through its distributed DistBelief framework [21]. lelism. The goal of this paper is to survey the landscape of deep Data and model parallelism are the two prevalent oppor- learning frameworks with full support for parallelization. tunities for parallelism in training deep learning networks Three levels of parallelization exist on the hardware level: at the distributed level. In data parallelism, copies of the within a GPU, between GPUs on a single node, and between model, or parameters, are each trained on its own subset of the training data, while updating the same global model. In model parallelism, the model itself is partitioned and trained in parallel. Table 1: Open-source Frameworks Platform Tensorflow CNTK Deeplearning4j MXNet H2O Caffe Theano Torch 2011 (deep Release Date 2016 2016 2015 2015 2014 2014 2010 learning) Core Language C++ C++ C++ C++ Java C++ C++ C C++, Java, R, Python, R, Python, C++, Scala, Python, API NDL Java, Scala Scala, Python Lua Python Matlab, Matlab Javascript, Javascript, web-UI Go, Julia Synchronization Sync or Sync or Sync Sync Async Sync Async Sync Model async async Communication Parameter Iterative Parameter Distributed MPI N/A N/A N/A Model server MapReduce server fork-join Multi-GPU 3 3 3 3 3 3 3 3 Multi-node 3 3 3 3 3 7 7 7 Data 3 3 3 3 3 3 3 3 Parallelism Model 3 N/A 7 3 7 7 3 3 Parallelism DBN, DBN, DBN, DBN, DBN, Deep Learning DBN, CNN, DBN, CNN, CNN, CNN, CNN, DBN CNN, CNN, Models RNN RNN RNN RNN RNN RNN RNN Programming Imperative Imperative Declarative Both Declarative Declarative Imperative Imperative Paradigm Checkpoint- Fault Checkpoint- Checkpoint- Checkpoint- Checkpoint- Checkpoint- and- N/A N/A Tolerance and-resume and-resume and-resume and-resume and-resume recovery Graph (in- teractive), Graph Training Summary Graph Visualization None None Plots training (static) monitoring Statistics (static) monitoring Deep learning models can be categorized into three major 3.1 Tensorflow types: deep-belief networks (DBNs), convolutional neural Tensorflow was released by Google Research as open source networks (CNNs), and recurrent neural networks (RNNs). in November 2015, and included distributed support in 2016. CNNs and RNNs were briefly described in the introduc- The user-facing APIs are C++ and Python. Programming tion. DBNs are less domain-specific compared to CNNs and with Tensorflow leans more imperative. While plenty of ab- RNNs, and could be considered a precursor to CNNs, but straction power is expressed in its library, the user will prob- are fundamental nonetheless. ably also be working with computational primitive wrap- Programming paradigm falls into the categories of imper- pers such as matrix operations, element-wise math opera- ative, declarative, or a mix of both. Conventionally, im- tors, and looping control. In other words, the user is ex- perative programming specifies how a computation is done, posed to some of the internal workings of deep learning net- where as declarative programming specifies what needs to works. Tensorflow treats networks as a directed graph of be done. There is plenty of gray area, but the distinction is nodes encapsulating dataflow computation and required de- made in this paper based on whether the API exposes the pendencies [15]. Each node, or computation, gets mapped user to computation details that require some understand- to devices (CPUs or GPUs) according to some cost func- ing of the inner math of neural networks (imperative), or tion. This partitions the overall graph into subgraphs, one whether the abstraction is yet higher (declarative). per device. Cross-device edges are replaced to encode nec- Fault tolerance is included for two reasons. Distributed ex- essary synchronization between device pairs. Distributed ecution tends to be more failure prone, especially at scale. execution appears to be a natural extension of this arrange- Furthermore, any failures (not necessarily limited to dis- ment, except that TCP or Remote Direct Memory Access tributed execution) that interrupt training part-way can be (RDMA) is used for inter-device communication on sepa- very costly, if all the progress made on the model is simply rate machines. This approach of mapping subgraphs onto lost. devices also offers potential scalability, because each worker Finally, UI/Visualization is a feature supported to very dif- can schedule its own subgraph at runtime instead of relying ferent degrees across the frameworks studied. The ability on a centralized master [15]. Parallelism in Tensorflow can to monitor the progress of training and the internal state be expressed at several levels, notably both data parallelism of networks over time could be useful for debugging or hy- and model parallelism. Data parallelism can happen both perparameter tuning, and could be an interesting direction. across and within workers, by training separate batches of Tensorflow and Deeplearning4j both support this kind of data on model replications. Model parallelism is expressed visualization.
Recommended publications
  • Deep Learning Frameworks | NVIDIA Developer
    4/10/2017 Deep Learning Frameworks | NVIDIA Developer Deep Learning Frameworks The NVIDIA Deep Learning SDK accelerates widely­used deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano and Torch as well as many other deep learning applications. Choose a deep learning framework from the list below, download the supported version of cuDNN and follow the instructions on the framework page to get started. Caffe is a deep learning framework made with expression, speed, and modularity in mind. Caffe is developed by the Berkeley Vision and Learning Center (BVLC), as well as community contributors and is popular for computer vision. Caffe supports cuDNN v5 for GPU acceleration. Supported interfaces: C, C++, Python, MATLAB, Command line interface Learning Resources Deep learning course: Getting Started with the Caffe Framework Blog: Deep Learning for Computer Vision with Caffe and cuDNN Download Caffe Download cuDNN The Microsoft Cognitive Toolkit —previously known as CNTK— is a unified deep­learning toolkit from Microsoft Research that makes it easy to train and combine popular model types across multiple GPUs and servers. Microsoft Cognitive Toolkit implements highly efficient CNN and RNN training for speech, image and text data. Microsoft Cognitive Toolkit supports cuDNN v5.1 for GPU acceleration. Supported interfaces: Python, C++, C# and Command line interface Download CNTK Download cuDNN TensorFlow is a software library for numerical computation using data flow graphs, developed by Google’s Machine Intelligence research organization. TensorFlow supports cuDNN v5.1 for GPU acceleration. Supported interfaces: C++, Python Download TensorFlow Download cuDNN https://developer.nvidia.com/deep­learning­frameworks 1/3 4/10/2017 Deep Learning Frameworks | NVIDIA Developer Theano is a math expression compiler that efficiently defines, optimizes, and evaluates mathematical expressions involving multi­dimensional arrays.
    [Show full text]
  • Lecture 6 Learned Feedforward Visual Processing Neural Networks, Deep Learning, Convnets
    William T. Freeman, Antonio Torralba, 2017 Lecture 6 Learned feedforward visual processing Neural Networks, Deep learning, ConvNets Some slides modified from R. Fergus We need translation invariance Lots of useful linear filters… Laplacian Gaussian derivative Gaussian Gabor And many more… High order Gaussian derivatives We need translation and scale invariance Lots of image pyramids… Gaussian Pyr Laplacian Pyr And many more: QMF, steerable, … We need … What is the best representation? • All the previous representation are manually constructed. • Could they be learnt from data? A brief history of Neural Networks enthusiasm time Perceptrons, 1958 Rosenblatt http://www.ecse.rpi.edu/homepages/nagy/PDF_chrono/2011_Na gy_Pace_FR.pdf. Photo by George Nagy 9 http://www.manhattanrarebooks-science.com/rosenblatt.htm Perceptrons, 1958 10 Perceptrons, 1958 enthusiasm time Minsky and Papert, Perceptrons, 1972 12 Perceptrons, 1958 enthusiasm Minsky and Papert, 1972 time Parallel Distributed Processing (PDP), 1986 14 XOR problem Inputs Output 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 PDP authors pointed to the backpropagation algorithm as a breakthrough, allowing multi-layer neural networks to be trained. Among the functions that a multi-layer network can represent but a single-layer network cannot: the XOR function. 15 Perceptrons, PDP book, 1958 1986 enthusiasm Minsky and Papert, 1972 time LeCun conv nets, 1998 Demos: http://yann.lecun.com/exdb/lenet/index.html 17 18 Neural networks to recognize handwritten digits? yes Neural networks for tougher problems? not really http://pub.clement.farabet.net/ecvw09.pdf 19 NIPS 2000 • NIPS, Neural Information Processing Systems, is the premier conference on machine learning.
    [Show full text]
  • Comparative Study of Deep Learning Software Frameworks
    Comparative Study of Deep Learning Software Frameworks Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah Research and Technology Center, Robert Bosch LLC {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com ABSTRACT such as dropout and weight decay [2]. As the popular- Deep learning methods have resulted in significant perfor- ity of the deep learning methods have increased over the mance improvements in several application domains and as last few years, several deep learning software frameworks such several software frameworks have been developed to have appeared to enable efficient development and imple- facilitate their implementation. This paper presents a com- mentation of these methods. The list of available frame- parative study of five deep learning frameworks, namely works includes, but is not limited to, Caffe, DeepLearning4J, Caffe, Neon, TensorFlow, Theano, and Torch, on three as- deepmat, Eblearn, Neon, PyLearn, TensorFlow, Theano, pects: extensibility, hardware utilization, and speed. The Torch, etc. Different frameworks try to optimize different as- study is performed on several types of deep learning ar- pects of training or deployment of a deep learning algorithm. chitectures and we evaluate the performance of the above For instance, Caffe emphasises ease of use where standard frameworks when employed on a single machine for both layers can be easily configured without hard-coding while (multi-threaded) CPU and GPU (Nvidia Titan X) settings. Theano provides automatic differentiation capabilities which The speed performance metrics used here include the gradi- facilitates flexibility to modify architecture for research and ent computation time, which is important during the train- development. Several of these frameworks have received ing phase of deep networks, and the forward time, which wide attention from the research community and are well- is important from the deployment perspective of trained developed allowing efficient training of deep networks with networks.
    [Show full text]
  • Reinforcement Learning in Videogames
    Reinforcement Learning in Videogames Alex` Os´esLaza Final Project Director: Javier B´ejarAlonso FIB • UPC 20 de junio de 2017 2 Acknowledgements First I would like and appreciate the work and effort that my director, Javier B´ejar, has put into me. Either solving a ton of questions that I had or guiding me throughout the whole project, he has helped me a lot to make a good final project. I would also love to thank my family and friends, that supported me throughout the entirety of the career and specially during this project. 3 4 Abstract (English) While there are still a lot of projects and papers focused on: given a game, discover and measure which is the best algorithm for it, I decided to twist things around and decided to focus on two algorithms and its parameters be able to tell which games will be best approachable with it. To do this, I will be implementing both algorithms Q-Learning and SARSA, helping myself with Neural Networks to be able to represent the vast state space that the games have. The idea is to implement the algorithms as general as possible.This way in case someone wanted to use my algorithms for their game, it would take the less amount of time possible to adapt the game for the algorithm. I will be using some games that are used to make Artificial Intelligence competitions so I have a base to work with, having more time to focus on the actual algorithm implementation and results comparison. 5 6 Abstract (Catal`a) Mentre ja existeixen molts projectes i estudis centrats en: donat un joc, descobrir i mesurar quin es el millor algoritme per aquell joc, he decidit donar-li la volta i centrar-me en donat dos algorismes i els seus par`ametres,ser capa¸cde trobar quin tipus de jocs es beneficien m´esde la configuraci´odonada.
    [Show full text]
  • Distributed Negative Sampling for Word Embeddings Stergios Stergiou Zygimantas Straznickas Yahoo Research MIT [email protected] [email protected]
    Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Distributed Negative Sampling for Word Embeddings Stergios Stergiou Zygimantas Straznickas Yahoo Research MIT [email protected] [email protected] Rolina Wu Kostas Tsioutsiouliklis University of Waterloo Yahoo Research [email protected] [email protected] Abstract Training for such large vocabularies presents several chal- lenges. Word2Vec needs to maintain two d-dimensional vec- Word2Vec recently popularized dense vector word represen- tors of single-precision floating point numbers for each vo- tations as fixed-length features for machine learning algo- cabulary word. All vectors need to be kept in main memory rithms and is in widespread use today. In this paper we in- to achieve acceptable training latency, which is impractical vestigate one of its core components, Negative Sampling, and propose efficient distributed algorithms that allow us to scale for contemporary commodity servers. As a concrete exam- to vocabulary sizes of more than 1 billion unique words and ple, Ordentlich et al. (Ordentlich et al. 2016) describe an corpus sizes of more than 1 trillion words. application in which they use word embeddings to match search queries to online ads. Their dictionary comprises ≈ 200 million composite words which implies a main mem- Introduction ory requirement of ≈ 800 GB for d = 500. A complemen- tary challenge is that corpora sizes themselves increase. The Recently, Mikolov et al (Mikolov et al. 2013; Mikolov and largest reported dataset that has been used was 100 billion Dean 2013) introduced Word2Vec, a suite of algorithms words (Mikolov and Dean 2013) which required training for unsupervised training of dense vector representations of time in the order of days.
    [Show full text]
  • Comparative Study of Caffe, Neon, Theano, and Torch
    Workshop track - ICLR 2016 COMPARATIVE STUDY OF CAFFE,NEON,THEANO, AND TORCH FOR DEEP LEARNING Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah Bosch Research and Technology Center fSoheil.Bahrampour,Naveen.Ramakrishnan, fixed-term.Lukas.Schott,[email protected] ABSTRACT Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of four deep learning frameworks, namely Caffe, Neon, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is per- formed on several types of deep learning architectures and we evaluate the per- formance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important dur- ing the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks. For convolutional networks, we also report how each of these frameworks support various convolutional algo- rithms and their corresponding performance. From our experiments, we observe that Theano and Torch are the most easily extensible frameworks. We observe that Torch is best suited for any deep architecture on CPU, followed by Theano. It also achieves the best performance on the GPU for large convolutional and fully connected networks, followed closely by Neon. Theano achieves the best perfor- mance on GPU for training and deployment of LSTM networks. Finally Caffe is the easiest for evaluating the performance of standard deep architectures.
    [Show full text]
  • Torch7: a Matlab-Like Environment for Machine Learning
    Torch7: A Matlab-like Environment for Machine Learning Ronan Collobert1 Koray Kavukcuoglu2 Clement´ Farabet3;4 1 Idiap Research Institute 2 NEC Laboratories America Martigny, Switzerland Princeton, NJ, USA 3 Courant Institute of Mathematical Sciences 4 Universite´ Paris-Est New York University, New York, NY, USA Equipe´ A3SI - ESIEE Paris, France Abstract Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be in- terfaced to third-party software thanks to Lua’s light interface. 1 Torch7 Overview With Torch7, we aim at providing a framework with three main advantages: (1) it should ease the development of numerical algorithms, (2) it should be easily extended (including the use of other libraries), and (3) it should be fast. We found that a scripting (interpreted) language with a good C API appears as a convenient solu- tion to “satisfy” the constraint (2). First, a high-level language makes the process of developing a program simpler and more understandable than a low-level language. Second, if the programming language is interpreted, it becomes also easier to quickly try various ideas in an interactive manner. And finally, assuming a good C API, the scripting language becomes the “glue” between hetero- geneous libraries: different structures of the same concept (coming from different libraries) can be hidden behind a unique structure in the scripting language, while keeping all the functionalities coming from all the different libraries.
    [Show full text]
  • BUSEM at Semeval-2017 Task 4A Sentiment Analysis with Word
    BUSEM at SemEval-2017 Task 4 Sentiment Analysis with Word Embedding and Long Short Term Memory RNN Approaches Deger Ayata1, Murat Saraclar 1, Arzucan Ozgur2 1Electrical & Electronical Engineering Department, Bogaziçi University 2Computer Engineering Department, Bogaziçi University Istanbul , Turkey {deger.ayata, murat.saraclar, arzucan.ozgur}@boun.edu.tr Abstract uses word embeddings for feature representation and Support Vector Machine This paper describes our approach for (SVM), Random Forest (RF) and Naive Bayes SemEval-2017 Task 4: Sentiment Analysis in (NB) algorithms for classification Twitter Twitter. We have participated in Subtask A: messages into negative, neutral and positive Message Polarity Classification subtask and polarity. The second system is based on Long developed two systems. The first system Short Term Memory Recurrent Neural uses word embeddings for feature representation and Support Vector Machine, Networks (LSTM) and uses word indexes as Random Forest and Naive Bayes algorithms sequence of inputs for feature representation. for the classification of Twitter messages into The remainder of this article is structured as negative, neutral and positive polarity. The follows: Section 2 contains information about second system is based on Long Short Term the system description and Section 3 explains Memory Recurrent Neural Networks and methods, models, tools and software packages uses word indexes as sequence of inputs for used in this work. Test cases and datasets are feature representation. explained in Section 4. Results are given in Section 5 with discussions. Finally, section 6 summarizes the conclusions and future work. 1 Introduction 2 System Description Sentiment analysis is extracting subjective information from source materials, via natural We have developed two independent language processing, computational linguistics, systems.
    [Show full text]
  • Tensorflow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction
    TensorFlow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction TensorFlow Google Brain, 2015 (rewritten DistBelief) Theano University of Montréal, 2009 Keras François Chollet, 2015 (now at Google) Torch Facebook AI Research, Twitter, Google DeepMind Caffe Berkeley Vision and Learning Center (BVLC), 2013 Outline 1. Introduction of each framework a. TensorFlow b. Theano c. Keras d. Torch e. Caffe 2. Further comparison a. Code + models b. Community and documentation c. Performance d. Model deployment e. Extra features 3. Which framework to choose when ..? Introduction of each framework TensorFlow architecture 1) Low-level core (C++/CUDA) 2) Simple Python API to define the computational graph 3) High-level API (TF-Learn, TF-Slim, soon Keras…) TensorFlow computational graph - auto-differentiation! - easy multi-GPU/multi-node - native C++ multithreading - device-efficient implementation for most ops - whole pipeline in the graph: data loading, preprocessing, prefetching... TensorBoard TensorFlow development + bleeding edge (GitHub yay!) + division in core and contrib => very quick merging of new hotness + a lot of new related API: CRF, BayesFlow, SparseTensor, audio IO, CTC, seq2seq + so it can easily handle images, videos, audio, text... + if you really need a new native op, you can load a dynamic lib - sometimes contrib stuff disappears or moves - recently introduced bells and whistles are barely documented Presentation of Theano: - Maintained by Montréal University group. - Pioneered the use of a computational graph. - General machine learning tool -> Use of Lasagne and Keras. - Very popular in the research community, but not elsewhere. Falling behind. What is it like to start using Theano? - Read tutorials until you no longer can, then keep going.
    [Show full text]
  • Introduction to Deep Learning Framework 1. Introduction 1.1
    Introduction to Deep Learning Framework 1. Introduction 1.1. Commonly used frameworks The most commonly used frameworks for deep learning include Pytorch, Tensorflow, Keras, caffe, Apache MXnet, etc. PyTorch: open source machine learning library; developed by Facebook AI Rsearch Lab; based on the Torch library; supports Python and C++ interfaces. Tensorflow: open source software library dataflow and differentiable programming; developed by Google brain team; provides stable Python & C APIs. Keras: an open-source neural-network library written in Python; conceived to be an interface; capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Caffe: open source under BSD licence; developed at University of California, Berkeley; written in C++ with a Python interface. Apache MXnet: an open-source deep learning software framework; supports a flexible programming model and multiple programming languages (including C++, Python, Java, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram Language.) 1.2. Pytorch 1.2.1 Data Tensor: the major computation unit in PyTorch. Tensor could be viewed as the extension of vector (one-dimensional) and matrix (two-dimensional), which could be defined with any dimension. Variable: a wrapper of tensor, which includes creator, value of variable (tensor), and gradient. This is the core of the automatic derivation in Pytorch, as it has the information of both the value and the creator, which is very important for current backward process. Parameter: a subset of variable 1.2.2. Functions: NNModules: NNModules (torch.nn) is a combination of parameters and functions, and could be interpreted as layers. There some common modules such as convolution layers, linear layers, pooling layers, dropout layers, etc.
    [Show full text]
  • DIY Deep Learning for Vision: the Caffe Framework
    DIY Deep Learning for Vision: the Caffe framework caffe.berkeleyvision.org github.com/BVLC/caffe Evan Shelhamer adapted from the Caffe tutorial with Jeff Donahue, Yangqing Jia, and Ross Girshick. Why Deep Learning? The Unreasonable Effectiveness of Deep Features Classes separate in the deep representations and transfer to many tasks. [DeCAF] [Zeiler-Fergus] Why Deep Learning? The Unreasonable Effectiveness of Deep Features Maximal activations of pool5 units [R-CNN] conv5 DeConv visualization Rich visual structure of features deep in hierarchy. [Zeiler-Fergus] Why Deep Learning? The Unreasonable Effectiveness of Deep Features 1st layer filters image patches that strongly activate 1st layer filters [Zeiler-Fergus] What is Deep Learning? Compositional Models Learned End-to-End What is Deep Learning? Compositional Models Learned End-to-End Hierarchy of Representations - vision: pixel, motif, part, object - text: character, word, clause, sentence - speech: audio, band, phone, word concrete abstract learning What is Deep Learning? Compositional Models Learned End-to-End figure credit Yann LeCun, ICML ‘13 tutorial What is Deep Learning? Compositional Models Learned End-to-End Back-propagation: take the gradient of the model layer-by-layer by the chain rule to yield the gradient of all the parameters. figure credit Yann LeCun, ICML ‘13 tutorial What is Deep Learning? Vast space of models! Caffe models are loss-driven: - supervised - unsupervised slide credit Marc’aurelio Ranzato, CVPR ‘14 tutorial. Convolutional Neural Nets (CNNs): 1989 LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ] Convolutional Nets: 2012 AlexNet: a layered model composed of convolution, + data subsampling, and further operations followed by a holistic + gpu representation and all-in-all a landmark classifier on + non-saturating nonlinearity ILSVRC12.
    [Show full text]
  • Enhancing OMB with Python Benchmarks Presentation at MUG ‘21
    MVAPICH2 Meets Python: Enhancing OMB with Python Benchmarks Presentation at MUG ‘2 Nawras Alnaasan Network-Based Computing Laboratory Dept. of Computer Science and Engineering !he #hio State $ni%ersity alnaasan.&'osu.edu O$tline • #S$ (icro Benc"marks )#(B* • +"y ,yt"on- • (,. for ,yt"on • .ntroducing #(B-,y • .nitial /esults • Conclusion !etwork Base" Com#$ting %a&oratory MUG ‘2 2 O)U Micro Benchmarks (OMB+ • #(B is a widely used package to measure performance of (,. operations. • .t helps in optimi0ing (,I applications on different HPC systems. • #1ers a series of benchmarks including but not limited to point-to-point, blocking and non-blocking collecti%es, and one-sided tests. • A large set of options and flags for users to create customi0able tests. • Supports different platforms like /#Cm/CUDA to run on AR(4N5IDIA GPUs. • +ritten in C !etwork Base" Com#$ting %a&oratory MUG ‘2 ( -hy Python? • Second most popular programming language according to the !.#BE inde7 as of August 8021. Adopted from= "ttps=44www.tiobe.com4tiobe-inde74 • 6reat community and support around (achine Learning Big Data and Cloud Computing – (any deep learning frameworks like ,yTorch, !ensor3ow :eras etc. – #ne of the most important tools for data science and analytics. • 6reat ,otential to use (,. and distributed computing tec"niques. !etwork• (any Base" other Com#$ting ,yt"on libraries and frameworks like Sci,y Numpy D<ango etc. %a&oratory MUG ‘2 , -hy Python? 5ery flexible and simplified synta7. Adopted from= !etwork Base" Com#$ting "ttps:44www."ardikp.com489&>4&84?94pyt"on-cpp4 %a&oratory MUG ‘2 / MPI for Python • (,.
    [Show full text]