A Multi-Stage, Python-Embedded DSL for Machine Learning Function
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Intro to Tensorflow 2.0 MBL, August 2019
Intro to TensorFlow 2.0 MBL, August 2019 Josh Gordon (@random_forests) 1 Agenda 1 of 2 Exercises ● Fashion MNIST with dense layers ● CIFAR-10 with convolutional layers Concepts (as many as we can intro in this short time) ● Gradient descent, dense layers, loss, softmax, convolution Games ● QuickDraw Agenda 2 of 2 Walkthroughs and new tutorials ● Deep Dream and Style Transfer ● Time series forecasting Games ● Sketch RNN Learning more ● Book recommendations Deep Learning is representation learning Image link Image link Latest tutorials and guides tensorflow.org/beta News and updates medium.com/tensorflow twitter.com/tensorflow Demo PoseNet and BodyPix bit.ly/pose-net bit.ly/body-pix TensorFlow for JavaScript, Swift, Android, and iOS tensorflow.org/js tensorflow.org/swift tensorflow.org/lite Minimal MNIST in TF 2.0 A linear model, neural network, and deep neural network - then a short exercise. bit.ly/mnist-seq ... ... ... Softmax model = Sequential() model.add(Dense(256, activation='relu',input_shape=(784,))) model.add(Dense(128, activation='relu')) model.add(Dense(10, activation='softmax')) Linear model Neural network Deep neural network ... ... Softmax activation After training, select all the weights connected to this output. model.layers[0].get_weights() # Your code here # Select the weights for a single output # ... img = weights.reshape(28,28) plt.imshow(img, cmap = plt.get_cmap('seismic')) ... ... Softmax activation After training, select all the weights connected to this output. Exercise 1 (option #1) Exercise: bit.ly/mnist-seq Reference: tensorflow.org/beta/tutorials/keras/basic_classification TODO: Add a validation set. Add code to plot loss vs epochs (next slide). Exercise 1 (option #2) bit.ly/ijcav_adv Answers: next slide. -
SOL: Effortless Device Support for AI Frameworks Without Source Code Changes
SOL: Effortless Device Support for AI Frameworks without Source Code Changes Nicolas Weber and Felipe Huici NEC Laboratories Europe Abstract—Modern high performance computing clusters heav- State of the Art Proposed with SOL ily rely on accelerators to overcome the limited compute power API (Python, C/C++, …) API (Python, C/C++, …) of CPUs. These supercomputers run various applications from different domains such as simulations, numerical applications or Framework Core Framework Core artificial intelligence (AI). As a result, vendors need to be able to Device Backends SOL efficiently run a wide variety of workloads on their hardware. In the AI domain this is in particular exacerbated by the Fig. 1: Abstraction layers within AI frameworks. existance of a number of popular frameworks (e.g, PyTorch, TensorFlow, etc.) that have no common code base, and can vary lines of code to their scripts in order to enable SOL and its in functionality. The code of these frameworks evolves quickly, hardware support. making it expensive to keep up with all changes and potentially We explore two strategies to integrate new devices into AI forcing developers to go through constant rounds of upstreaming. frameworks using SOL as a middleware, to keep the original In this paper we explore how to provide hardware support in AI frameworks without changing the framework’s source code in AI framework unchanged and still add support to new device order to minimize maintenance overhead. We introduce SOL, an types. The first strategy hides the entire offloading procedure AI acceleration middleware that provides a hardware abstraction from the framework, and the second only injects the necessary layer that allows us to transparently support heterogenous hard- functionality into the framework to enable the execution, but ware. -
Deep Learning Frameworks | NVIDIA Developer
4/10/2017 Deep Learning Frameworks | NVIDIA Developer Deep Learning Frameworks The NVIDIA Deep Learning SDK accelerates widelyused deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano and Torch as well as many other deep learning applications. Choose a deep learning framework from the list below, download the supported version of cuDNN and follow the instructions on the framework page to get started. Caffe is a deep learning framework made with expression, speed, and modularity in mind. Caffe is developed by the Berkeley Vision and Learning Center (BVLC), as well as community contributors and is popular for computer vision. Caffe supports cuDNN v5 for GPU acceleration. Supported interfaces: C, C++, Python, MATLAB, Command line interface Learning Resources Deep learning course: Getting Started with the Caffe Framework Blog: Deep Learning for Computer Vision with Caffe and cuDNN Download Caffe Download cuDNN The Microsoft Cognitive Toolkit —previously known as CNTK— is a unified deeplearning toolkit from Microsoft Research that makes it easy to train and combine popular model types across multiple GPUs and servers. Microsoft Cognitive Toolkit implements highly efficient CNN and RNN training for speech, image and text data. Microsoft Cognitive Toolkit supports cuDNN v5.1 for GPU acceleration. Supported interfaces: Python, C++, C# and Command line interface Download CNTK Download cuDNN TensorFlow is a software library for numerical computation using data flow graphs, developed by Google’s Machine Intelligence research organization. TensorFlow supports cuDNN v5.1 for GPU acceleration. Supported interfaces: C++, Python Download TensorFlow Download cuDNN https://developer.nvidia.com/deeplearningframeworks 1/3 4/10/2017 Deep Learning Frameworks | NVIDIA Developer Theano is a math expression compiler that efficiently defines, optimizes, and evaluates mathematical expressions involving multidimensional arrays. -
1 Amazon Sagemaker
1 Amazon SageMaker 13:45~14:30 Amazon SageMaker 14:30~15:15 Amazon SageMaker re:Invent 15:15~15:45 Q&A | 15:45~17:00 Amazon SageMaker 20 SmartNews Data Scientist, Meng Lee Sagemaker SageMaker - FiNC FiNC Technologies SIGNATE Amazon SageMaker SIGNATE CTO 17:00~17:15© 2018, Amazon Web Services, Inc. or itsQ&A Affiliates. All rights reserved. Amazon Confidential and Trademark Amazon SageMaker Makoto Shimura, Solutions Architect 2019/01/15 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark • • • • ⎼ Amazon Athena ⎼ AWS Glue ⎼ Amazon SageMaker © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark • • Amazon SageMaker • Amazon SageMasker • SageMaker SDK • [ | | ] • Amazon SageMaker • © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark 開発 学習 推論推論 学習に使うコードを記述 大量の GPU 大量のCPU や GPU 小規模データで動作確認 大規模データの処理 継続的なデプロイ 試行錯誤の繰り返し 様々なデバイスで動作 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark 開発 学習 推論推論 エンジニアがプロダク データサイエンティストが開発環境で作業 ション環境に構築 開発と学習を同じ 1 台のインスタンスで実施 API サーバにデプロイ Deep Learning であれば GPU インスタンスを使用 エッジデバイスで動作 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark & • 開発 学習 推論推論 • エンジニアがプロダク データサイエンティストが開発環境で作業 • ション環境に構築 開発と学習を同じ 1 台のインスタンスで実施 API サーバにデプロイ • Deep Learning であれば GPU インスタンスを使用 エッジデバイスで動作 • API • • © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Amazon SageMaker © 2018, Amazon Web Services, Inc. -
Intel® Optimized AI Frameworks
Intel® optimized AI frameworks Dr. Fabio Baruffa & Shailen Sobhee Technical Consulting Engineers, Intel IAGS Visit: www.intel.ai/technology Speed up development using open AI software Machine learning Deep learning TOOLKITS App Open source platform for building E2E Analytics & Deep learning inference deployment Open source, scalable, and developers AI applications on Apache Spark* with distributed on CPU/GPU/FPGA/VPU for Caffe*, extensible distributed deep learning TensorFlow*, Keras*, BigDL TensorFlow*, MXNet*, ONNX*, Kaldi* platform built on Kubernetes (BETA) Python R Distributed Intel-optimized Frameworks libraries * • Scikit- • Cart • MlLib (on Spark) * * And more framework Data learn • Random • Mahout optimizations underway • Pandas Forest including PaddlePaddle*, scientists * * • NumPy • e1071 Chainer*, CNTK* & others Intel® Intel® Data Analytics Intel® Math Kernel Library Kernels Distribution Acceleration Library Library for Deep Neural Networks for Python* (Intel® DAAL) (Intel® MKL-DNN) developers Intel distribution High performance machine Open source compiler for deep learning optimized for learning & data analytics Open source DNN functions for model computations optimized for multiple machine learning library CPU / integrated graphics devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX) 2 Visit: www.intel.ai/technology Speed up development using open AI software Machine learning Deep learning TOOLKITS App Open source platform for building E2E Analytics & Deep learning inference deployment Open source, scalable, -
Python on Gpus (Work in Progress!)
Python on GPUs (work in progress!) Laurie Stephey GPUs for Science Day, July 3, 2019 Rollin Thomas, NERSC Lawrence Berkeley National Laboratory Python is friendly and popular Screenshots from: https://www.tiobe.com/tiobe-index/ So you want to run Python on a GPU? You have some Python code you like. Can you just run it on a GPU? import numpy as np from scipy import special import gpu ? Unfortunately no. What are your options? Right now, there is no “right” answer ● CuPy ● Numba ● pyCUDA (https://mathema.tician.de/software/pycuda/) ● pyOpenCL (https://mathema.tician.de/software/pyopencl/) ● Rewrite kernels in C, Fortran, CUDA... DESI: Our case study Now Perlmutter 2020 Goal: High quality output spectra Spectral Extraction CuPy (https://cupy.chainer.org/) ● Developed by Chainer, supported in RAPIDS ● Meant to be a drop-in replacement for NumPy ● Some, but not all, NumPy coverage import numpy as np import cupy as cp cpu_ans = np.abs(data) #same thing on gpu gpu_data = cp.asarray(data) gpu_temp = cp.abs(gpu_data) gpu_ans = cp.asnumpy(gpu_temp) Screenshot from: https://docs-cupy.chainer.org/en/stable/reference/comparison.html eigh in CuPy ● Important function for DESI ● Compared CuPy eigh on Cori Volta GPU to Cori Haswell and Cori KNL ● Tried “divide-and-conquer” approach on both CPU and GPU (1, 2, 5, 10 divisions) ● Volta wins only at very large matrix sizes ● Major pro: eigh really easy to use in CuPy! legval in CuPy ● Easy to convert from NumPy arrays to CuPy arrays ● This function is ~150x slower than the cpu version! ● This implies there -
Theano: a Python Framework for Fast Computation of Mathematical Expressions (The Theano Development Team)∗
Theano: A Python framework for fast computation of mathematical expressions (The Theano Development Team)∗ Rami Al-Rfou,6 Guillaume Alain,1 Amjad Almahairi,1 Christof Angermueller,7, 8 Dzmitry Bahdanau,1 Nicolas Ballas,1 Fred´ eric´ Bastien,1 Justin Bayer, Anatoly Belikov,9 Alexander Belopolsky,10 Yoshua Bengio,1, 3 Arnaud Bergeron,1 James Bergstra,1 Valentin Bisson,1 Josh Bleecher Snyder, Nicolas Bouchard,1 Nicolas Boulanger-Lewandowski,1 Xavier Bouthillier,1 Alexandre de Brebisson,´ 1 Olivier Breuleux,1 Pierre-Luc Carrier,1 Kyunghyun Cho,1, 11 Jan Chorowski,1, 12 Paul Christiano,13 Tim Cooijmans,1, 14 Marc-Alexandre Cotˆ e,´ 15 Myriam Cotˆ e,´ 1 Aaron Courville,1, 4 Yann N. Dauphin,1, 16 Olivier Delalleau,1 Julien Demouth,17 Guillaume Desjardins,1, 18 Sander Dieleman,19 Laurent Dinh,1 Melanie´ Ducoffe,1, 20 Vincent Dumoulin,1 Samira Ebrahimi Kahou,1, 2 Dumitru Erhan,1, 21 Ziye Fan,22 Orhan Firat,1, 23 Mathieu Germain,1 Xavier Glorot,1, 18 Ian Goodfellow,1, 24 Matt Graham,25 Caglar Gulcehre,1 Philippe Hamel,1 Iban Harlouchet,1 Jean-Philippe Heng,1, 26 Balazs´ Hidasi,27 Sina Honari,1 Arjun Jain,28 Sebastien´ Jean,1, 11 Kai Jia,29 Mikhail Korobov,30 Vivek Kulkarni,6 Alex Lamb,1 Pascal Lamblin,1 Eric Larsen,1, 31 Cesar´ Laurent,1 Sean Lee,17 Simon Lefrancois,1 Simon Lemieux,1 Nicholas Leonard,´ 1 Zhouhan Lin,1 Jesse A. Livezey,32 Cory Lorenz,33 Jeremiah Lowin, Qianli Ma,34 Pierre-Antoine Manzagol,1 Olivier Mastropietro,1 Robert T. McGibbon,35 Roland Memisevic,1, 4 Bart van Merrienboer,¨ 1 Vincent Michalski,1 Mehdi Mirza,1 Alberto Orlandi, Christopher Pal,1, 2 Razvan Pascanu,1, 18 Mohammad Pezeshki,1 Colin Raffel,36 Daniel Renshaw,25 Matthew Rocklin, Adriana Romero,1 Markus Roth, Peter Sadowski,37 John Salvatier,38 Franc¸ois Savard,1 Jan Schluter,¨ 39 John Schulman,24 Gabriel Schwartz,40 Iulian Vlad Serban,1 Dmitriy Serdyuk,1 Samira Shabanian,1 Etienne´ Simon,1, 41 Sigurd Spieckermann, S. -
Introduction Shrinkage Factor Reference
Comparison study for implementation efficiency of CUDA GPU parallel computation with the fast iterative shrinkage-thresholding algorithm Younsang Cho, Donghyeon Yu Department of Statistics, Inha university 4. TensorFlow functions in Python (TF-F) Introduction There are some functions executed on GPU in TensorFlow. So, we implemented our algorithm • Parallel computation using graphics processing units (GPUs) gets much attention and is just using that functions. efficient for single-instruction multiple-data (SIMD) processing. 5. Neural network with TensorFlow in Python (TF-NN) • Theoretical computation capacity of the GPU device has been growing fast and is much higher Neural network model is flexible, and the LASSO problem can be represented as a simple than that of the CPU nowadays (Figure 1). neural network with an ℓ1-regularized loss function • There are several platforms for conducting parallel computation on GPUs using compute 6. Using dynamic link library in Python (P-DLL) unified device architecture (CUDA) developed by NVIDIA. (Python, PyCUDA, Tensorflow, etc. ) As mentioned before, we can load DLL files, which are written in CUDA C, using "ctypes.CDLL" • However, it is unclear what platform is the most efficient for CUDA. that is a built-in function in Python. 7. Using dynamic link library in R (R-DLL) We can also load DLL files, which are written in CUDA C, using "dyn.load" in R. FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) We consider FISTA (Beck and Teboulle, 2009) with backtracking as the following: " Step 0. Take �! > 0, some � > 1, and �! ∈ ℝ . Set �# = �!, �# = 1. %! Step k. � ≥ 1 Find the smallest nonnegative integers �$ such that with �g = � �$&# � �(' �$ ≤ �(' �(' �$ , �$ . -
Keras2c: a Library for Converting Keras Neural Networks to Real-Time Compatible C
Keras2c: A library for converting Keras neural networks to real-time compatible C Rory Conlina,∗, Keith Ericksonb, Joeseph Abbatec, Egemen Kolemena,b,∗ aDepartment of Mechanical and Aerospace Engineering, Princeton University, Princeton NJ 08544, USA bPrinceton Plasma Physics Laboratory, Princeton NJ 08544, USA cDepartment of Astrophysical Sciences at Princeton University, Princeton NJ 08544, USA Abstract With the growth of machine learning models and neural networks in mea- surement and control systems comes the need to deploy these models in a way that is compatible with existing systems. Existing options for deploying neural networks either introduce very high latency, require expensive and time con- suming work to integrate into existing code bases, or only support a very lim- ited subset of model types. We have therefore developed a new method called Keras2c, which is a simple library for converting Keras/TensorFlow neural net- work models into real-time compatible C code. It supports a wide range of Keras layers and model types including multidimensional convolutions, recurrent lay- ers, multi-input/output models, and shared layers. Keras2c re-implements the core components of Keras/TensorFlow required for predictive forward passes through neural networks in pure C, relying only on standard library functions considered safe for real-time use. The core functionality consists of ∼ 1500 lines of code, making it lightweight and easy to integrate into existing codebases. Keras2c has been successfully tested in experiments and is currently in use on the plasma control system at the DIII-D National Fusion Facility at General Atomics in San Diego. 1. Motivation TensorFlow[1] is one of the most popular libraries for developing and training neural networks. -
Prototyping and Developing GPU-Accelerated Solutions with Python and CUDA Luciano Martins and Robert Sohigian, 2018-11-22 Introduction to Python
Prototyping and Developing GPU-Accelerated Solutions with Python and CUDA Luciano Martins and Robert Sohigian, 2018-11-22 Introduction to Python GPU-Accelerated Computing NVIDIA® CUDA® technology Why Use Python with GPUs? Agenda Methods: PyCUDA, Numba, CuPy, and scikit-cuda Summary Q&A 2 Introduction to Python Released by Guido van Rossum in 1991 The Zen of Python: Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Interpreted language (CPython, Jython, ...) Dynamically typed; based on objects 3 Introduction to Python Small core structure: ~30 keywords ~ 80 built-in functions Indentation is a pretty serious thing Dynamically typed; based on objects Binds to many different languages Supports GPU acceleration via modules 4 Introduction to Python 5 Introduction to Python 6 Introduction to Python 7 GPU-Accelerated Computing “[T]the use of a graphics processing unit (GPU) together with a CPU to accelerate deep learning, analytics, and engineering applications” (NVIDIA) Most common GPU-accelerated operations: Large vector/matrix operations (Basic Linear Algebra Subprograms - BLAS) Speech recognition Computer vision 8 GPU-Accelerated Computing Important concepts for GPU-accelerated computing: Host ― the machine running the workload (CPU) Device ― the GPUs inside of a host Kernel ― the code part that runs on the GPU SIMT ― Single Instruction Multiple Threads 9 GPU-Accelerated Computing 10 GPU-Accelerated Computing 11 CUDA Parallel computing -
Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming
Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming Bart van Merriënboer Dan Moldovan Alexander B Wiltschko MILA, Google Brain Google Brain Google Brain [email protected] [email protected] [email protected] Abstract The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, and higher-order derivatives. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT. 1 Introduction Many applications in machine learning rely on gradient-based optimization, or at least the efficient calculation of derivatives of models expressed as computer programs. Researchers have a wide variety of tools from which they can choose, particularly if they are using the Python language [21, 16, 24, 2, 1]. These tools can generally be characterized as trading off research or production use cases, and can be divided along these lines by whether they implement automatic differentiation using operator overloading (OO) or SCT. SCT affords more opportunities for whole-program optimization, while OO makes it easier to support convenient syntax in Python, like data-dependent control flow, or advanced features such as custom partial derivatives. We show here that it is possible to offer the programming flexibility usually thought to be exclusive to OO-based tools in an SCT framework. -
Fault Tolerance and Re-Training Analysis on Neural Networks
FAULT TOLERANCE AND RE-TRAINING ANALYSIS ON NEURAL NETWORKS by ABHINAV KURIAN GEORGE B.Tech Electronics and Communication Engineering Amrita Vishwa Vidhyapeetham, Kerala, 2012 A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science, Computer Engineering, College of Engineering and Applied Science, University of Cincinnati, Ohio 2019 Thesis Committee: Chair: Wen-Ben Jone, Ph.D. Member: Carla Purdy, Ph.D. Member: Ranganadha Vemuri, Ph.D. ABSTRACT In the current age of big data, artificial intelligence and machine learning technologies have gained much popularity. Due to the increasing demand for such applications, neural networks are being targeted toward hardware solutions. Owing to the shrinking feature size, number of physical defects are on the rise. These growing number of defects are preventing designers from realizing the full potential of the on-chip design. The challenge now is not only to find solutions that balance high-performance and energy-efficiency but also, to achieve fault-tolerance of a computational model. Neural computing, due to its inherent fault tolerant capabilities, can provide promising solutions to this issue. The primary focus of this thesis is to gain deeper understanding of fault tolerance in neural network hardware. As a part of this work, we present a comprehensive analysis of fault tolerance by exploring effects of faults on popular neural models: multi-layer perceptron model and convolution neural network. We built the models based on conventional 64-bit floating point representation. In addition to this, we also explore the recent 8-bit integer quantized representation. A fault injector model is designed to inject stuck-at faults at random locations in the network.