High Performance Python PRACTICAL PERFORMANT PROGRAMMING for HUMANS
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
The Javascript Revolution
Top teams present at Segfault Tank on 4/21: 1 Duel: 6 (2 extra shifted from self votes) 2 Ambassador: 4 3 QuickSource: 3 4 ChalkBoard: 3 5 Fortuna Beer: 3 Bottom teams present in class this Thursday 4/16: 1 Scribble: 2 2 ClearViz: 2 3 AllInOne: 1 4 TripSplitter: 0 Shockers: Scribble & Fortuna Congrats on sneaky strategizing to get yourself to the top :) The moment of fruit: the class has spoken Shockers: Scribble & Fortuna Congrats on sneaky strategizing to get yourself to the top :) The moment of fruit: the class has spoken Top teams present at Segfault Tank on 4/21: 1 Duel: 6 (2 extra shifted from self votes) 2 Ambassador: 4 3 QuickSource: 3 4 ChalkBoard: 3 5 Fortuna Beer: 3 Bottom teams present in class this Thursday 4/16: 1 Scribble: 2 2 ClearViz: 2 3 AllInOne: 1 4 TripSplitter: 0 Congrats on sneaky strategizing to get yourself to the top :) The moment of fruit: the class has spoken Top teams present at Segfault Tank on 4/21: 1 Duel: 6 (2 extra shifted from self votes) 2 Ambassador: 4 3 QuickSource: 3 4 ChalkBoard: 3 5 Fortuna Beer: 3 Bottom teams present in class this Thursday 4/16: 1 Scribble: 2 2 ClearViz: 2 3 AllInOne: 1 4 TripSplitter: 0 Shockers: Scribble & Fortuna The moment of fruit: the class has spoken Top teams present at Segfault Tank on 4/21: 1 Duel: 6 (2 extra shifted from self votes) 2 Ambassador: 4 3 QuickSource: 3 4 ChalkBoard: 3 5 Fortuna Beer: 3 Bottom teams present in class this Thursday 4/16: 1 Scribble: 2 2 ClearViz: 2 3 AllInOne: 1 4 TripSplitter: 0 Shockers: Scribble & Fortuna Congrats on sneaky strategizing -
Python on Gpus (Work in Progress!)
Python on GPUs (work in progress!) Laurie Stephey GPUs for Science Day, July 3, 2019 Rollin Thomas, NERSC Lawrence Berkeley National Laboratory Python is friendly and popular Screenshots from: https://www.tiobe.com/tiobe-index/ So you want to run Python on a GPU? You have some Python code you like. Can you just run it on a GPU? import numpy as np from scipy import special import gpu ? Unfortunately no. What are your options? Right now, there is no “right” answer ● CuPy ● Numba ● pyCUDA (https://mathema.tician.de/software/pycuda/) ● pyOpenCL (https://mathema.tician.de/software/pyopencl/) ● Rewrite kernels in C, Fortran, CUDA... DESI: Our case study Now Perlmutter 2020 Goal: High quality output spectra Spectral Extraction CuPy (https://cupy.chainer.org/) ● Developed by Chainer, supported in RAPIDS ● Meant to be a drop-in replacement for NumPy ● Some, but not all, NumPy coverage import numpy as np import cupy as cp cpu_ans = np.abs(data) #same thing on gpu gpu_data = cp.asarray(data) gpu_temp = cp.abs(gpu_data) gpu_ans = cp.asnumpy(gpu_temp) Screenshot from: https://docs-cupy.chainer.org/en/stable/reference/comparison.html eigh in CuPy ● Important function for DESI ● Compared CuPy eigh on Cori Volta GPU to Cori Haswell and Cori KNL ● Tried “divide-and-conquer” approach on both CPU and GPU (1, 2, 5, 10 divisions) ● Volta wins only at very large matrix sizes ● Major pro: eigh really easy to use in CuPy! legval in CuPy ● Easy to convert from NumPy arrays to CuPy arrays ● This function is ~150x slower than the cpu version! ● This implies there -
A Practical Solution for Scripting Language Compilers
A Practical Solution for Scripting Language Compilers Paul Biggar, Edsko de Vries, David Gregg Department of Computer Science, Trinity College Dublin, Dublin 2, Ireland Abstract Although scripting languages are becoming increasingly popular, even mature script- ing language implementations remain interpreted. Several compilers and reimplemen- tations have been attempted, generally focusing on performance. Based on our survey of these reimplementations, we determine that there are three important features of scripting languages that are difficult to compile or reimplement. Since scripting languages are defined primarily through the semantics of their original implementations, they often change semantics between releases. They provide C APIs, used both for foreign-function interfaces and to write third-party extensions. These APIs typically have tight integration with the original implementation, and are used to providelarge standard libraries, which are difficult to re-use, and costly to reimplement. Finally, they support run-time code generation. These features make the important goal of correctness difficult to achieve for compilers and reimplementations. We present a technique to support these features in an ahead-of-time compiler for PHP. Our technique uses the original PHP implementation through the provided C API, both in our compiler, and in our generated code. We support all of these impor- tant scripting language features, particularly focusing on the correctness of compiled programs. Additionally, our approach allows us to automatically support limited fu- ture language changes. We present a discussion and performance evaluation of this technique. Key words: Compiler, Scripting Language 1. Motivation Although scripting languages1 are becoming increasingly popular, most scripting language implementations remain interpreted. Typically, these implementations are slow, between one and two orders of magnitude slower than C. -
Introduction Shrinkage Factor Reference
Comparison study for implementation efficiency of CUDA GPU parallel computation with the fast iterative shrinkage-thresholding algorithm Younsang Cho, Donghyeon Yu Department of Statistics, Inha university 4. TensorFlow functions in Python (TF-F) Introduction There are some functions executed on GPU in TensorFlow. So, we implemented our algorithm • Parallel computation using graphics processing units (GPUs) gets much attention and is just using that functions. efficient for single-instruction multiple-data (SIMD) processing. 5. Neural network with TensorFlow in Python (TF-NN) • Theoretical computation capacity of the GPU device has been growing fast and is much higher Neural network model is flexible, and the LASSO problem can be represented as a simple than that of the CPU nowadays (Figure 1). neural network with an ℓ1-regularized loss function • There are several platforms for conducting parallel computation on GPUs using compute 6. Using dynamic link library in Python (P-DLL) unified device architecture (CUDA) developed by NVIDIA. (Python, PyCUDA, Tensorflow, etc. ) As mentioned before, we can load DLL files, which are written in CUDA C, using "ctypes.CDLL" • However, it is unclear what platform is the most efficient for CUDA. that is a built-in function in Python. 7. Using dynamic link library in R (R-DLL) We can also load DLL files, which are written in CUDA C, using "dyn.load" in R. FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) We consider FISTA (Beck and Teboulle, 2009) with backtracking as the following: " Step 0. Take �! > 0, some � > 1, and �! ∈ ℝ . Set �# = �!, �# = 1. %! Step k. � ≥ 1 Find the smallest nonnegative integers �$ such that with �g = � �$&# � �(' �$ ≤ �(' �(' �$ , �$ . -
Prototyping and Developing GPU-Accelerated Solutions with Python and CUDA Luciano Martins and Robert Sohigian, 2018-11-22 Introduction to Python
Prototyping and Developing GPU-Accelerated Solutions with Python and CUDA Luciano Martins and Robert Sohigian, 2018-11-22 Introduction to Python GPU-Accelerated Computing NVIDIA® CUDA® technology Why Use Python with GPUs? Agenda Methods: PyCUDA, Numba, CuPy, and scikit-cuda Summary Q&A 2 Introduction to Python Released by Guido van Rossum in 1991 The Zen of Python: Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Interpreted language (CPython, Jython, ...) Dynamically typed; based on objects 3 Introduction to Python Small core structure: ~30 keywords ~ 80 built-in functions Indentation is a pretty serious thing Dynamically typed; based on objects Binds to many different languages Supports GPU acceleration via modules 4 Introduction to Python 5 Introduction to Python 6 Introduction to Python 7 GPU-Accelerated Computing “[T]the use of a graphics processing unit (GPU) together with a CPU to accelerate deep learning, analytics, and engineering applications” (NVIDIA) Most common GPU-accelerated operations: Large vector/matrix operations (Basic Linear Algebra Subprograms - BLAS) Speech recognition Computer vision 8 GPU-Accelerated Computing Important concepts for GPU-accelerated computing: Host ― the machine running the workload (CPU) Device ― the GPUs inside of a host Kernel ― the code part that runs on the GPU SIMT ― Single Instruction Multiple Threads 9 GPU-Accelerated Computing 10 GPU-Accelerated Computing 11 CUDA Parallel computing -
Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming
Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming Bart van Merriënboer Dan Moldovan Alexander B Wiltschko MILA, Google Brain Google Brain Google Brain [email protected] [email protected] [email protected] Abstract The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, and higher-order derivatives. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT. 1 Introduction Many applications in machine learning rely on gradient-based optimization, or at least the efficient calculation of derivatives of models expressed as computer programs. Researchers have a wide variety of tools from which they can choose, particularly if they are using the Python language [21, 16, 24, 2, 1]. These tools can generally be characterized as trading off research or production use cases, and can be divided along these lines by whether they implement automatic differentiation using operator overloading (OO) or SCT. SCT affords more opportunities for whole-program optimization, while OO makes it easier to support convenient syntax in Python, like data-dependent control flow, or advanced features such as custom partial derivatives. We show here that it is possible to offer the programming flexibility usually thought to be exclusive to OO-based tools in an SCT framework. -
Testing Database Engines Via Pivoted Query Synthesis Manuel Rigger and Zhendong Su, ETH Zurich
Testing Database Engines via Pivoted Query Synthesis Manuel Rigger and Zhendong Su, ETH Zurich https://www.usenix.org/conference/osdi20/presentation/rigger This paper is included in the Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation November 4–6, 2020 978-1-939133-19-9 Open access to the Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX Testing Database Engines via Pivoted Query Synthesis Manuel Rigger Zhendong Su Department of Computer Science, ETH Zurich Abstract query on multiple DBMSs, which the author implemented in a tool RAGS [46]. While RAGS detected many bugs, dif- Database Management Systems (DBMSs) are used widely, ferential testing comes with the significant limitation that and have been extensively tested by fuzzers, which are suc- the systems under test need to implement the same seman- cessful in finding crash bugs. However, approaches to finding tics for a given input. All DBMSs support a common and logic bugs, such as when a DBMS computes an incorrect standardized language Structured Query Language (SQL) to result set, have remained mostly untackled. To this end, we create, access, and modify data [8]. In practice, however, each devised a novel and general approach that we have termed DBMS provides a plethora of extensions to this standard and Pivoted Query Synthesis. The core idea of this approach is to deviates from it in other parts (e.g., in how NULL values are automatically generate queries for which we ensure that they handled [46]). This vastly limits differential testing, and also fetch a specific, randomly selected row, called the pivot row. -
Proseminar Python - Python Bindings
Proseminar Python - Python Bindings Sven Fischer Student an der TU-Dresden [email protected] Abstract Diese Arbeit beschaftigt¨ sich damit, einen Einblick in die Benut- zung von externen Bibliotheken in Python zu geben und fuhrt¨ den Leser in das Schreiben von eigenen Erweiterungen ein. Daruber¨ hinaus wird darauf eingegangen, wie Projekte in anderen Program- miersprachen Python benutzen konnen.¨ Weiterhin wird kurz auf verschiedene andere Moglichkeiten¨ eingegangen, Python oder Py- thons Benutzung zu erweitern und zu verandern:¨ durch Kompilati- on, Mischen“ mit oder Ubersetzen¨ in anderen Sprachen, oder spe- ” zielle Interpreter. Abbildung 1. Vergleich der Moglichkeiten¨ Python zu erwei- tern (links) und Python einzubetten (rechts). Categories and Subject Descriptors D.3.3 [Programming Lan- guages]: Language Constructs and Features—Modules, packages 1. Daten von C nach Python konvertieren General Terms Languages, Documentation 2. Python Funktion mit konvertierten Werten aufrufen Keywords Python, Extension, Embedding, Library 3. Ergebnis von Python zuruck¨ nach C konvertieren 1. Einfuhrung¨ Diese Konvertierung ist auch das großte¨ Hindernis beim Ver- 1 Python wird mit einer umfangreichen Standartbibliothek ausgelie- knupfen¨ von Python und C , zumindest ist es mit einigem Aufwand fert. Trotzdem gibt es Grunde,¨ warum man Python um verschiedene verbunden. externe Bibliotheken erweitern mochte.¨ Ein Beispiel dafur¨ ware¨ die Es gibt einige Projekte, die sich mit dem Verandern¨ und Erwei- ¨ Geschwindigkeit, die bei Python als interpretierter Sprache nicht tern der Sprache Python beschaftigen.¨ Einen kurzen Uberblick uber¨ immer den Anforderungen entspricht. Weiterhin gibt es genugend¨ einige Moglichkeiten¨ gebe ich in Abschnitt4. Dort gehe ich auf Software von Drittanbietern, welche man in Python nutzbar ma- zwei Python-Interpreter neben dem in der Standard-Distribution chen will - ohne sie in Python zu ubersetzen.¨ Darauf gehe ich in enthaltenen ein und zeige Moglichkeiten¨ auf, Python in andere Abschnitt2 ein. -
Efficient Use of Python on the Clusters
Efficient use of Python on the clusters Ariel Lozano CÉCI training November 21, 2018 Outline I Analyze our code with profiling tools: I cpu: cProfile, line_profiler, kernprof I memory: memory_profiler, mprof I Being a highly abstract dynamically typed language, how to make a more efficient use of hardware internals? I Numpy and Scipy ecosystem (mainly wrappers to C/Fortran compiled code) I binding to compiled code: interfaces between python and compiled modules I compiling: tools to compile python code I parallelism: modules to exploit multicores Sieve of eratostenes Algorithm to find all prime numbers up to any given limit. Ex: Find all the prime numbers less than or equal to 25: I 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Cross out every number displaced by 2 after 2 up to the limit: I 23 45 67 89 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Move to next n non crossed, cross out each non crossed number displaced by n: I 23 45 67 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 I 2 3 45 67 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 The remaining numbers non crossed in the list are all the primes below limit. 2 Trivial optimization: jump directlyp to n to start crossing out. Then, n must loop only up to limit. -
Shed Skin Documentation Release V0.9.4
Shed Skin Documentation Release v0.9.4 Mark Dufour the Shed Skin contributors Mar 16, 2017 Contents 1 An experimental (restricted-Python)-to-C++ compiler1 2 Documentation 3 2.1 Shed Skin documentation........................................3 2.1.1 Introduction...........................................3 2.1.2 Typing restrictions.......................................3 2.1.3 Python subset restrictions....................................4 2.1.4 Library limitations.......................................5 2.1.5 Installation...........................................6 2.1.5.1 Windows........................................6 2.1.5.2 UNIX.........................................6 2.1.5.2.1 Using a package manager..........................6 2.1.5.2.2 Manual installation.............................6 2.1.5.3 OSX..........................................7 2.1.5.3.1 Manual installation.............................7 2.1.6 Compiling a standalone program................................8 2.1.7 Generating an extension module................................8 2.1.7.1 Limitations.......................................9 2.1.7.2 Numpy integration...................................9 2.1.8 Distributing binaries...................................... 10 2.1.8.1 Windows........................................ 10 2.1.8.2 UNIX......................................... 10 2.1.9 Multiprocessing......................................... 10 2.1.10 Calling C/C++ code....................................... 11 2.1.10.1 Standard library................................... -
Python Guide Documentation 0.0.1
Python Guide Documentation 0.0.1 Kenneth Reitz 2015 11 07 Contents 1 3 1.1......................................................3 1.2 Python..................................................5 1.3 Mac OS XPython.............................................5 1.4 WindowsPython.............................................6 1.5 LinuxPython...............................................8 2 9 2.1......................................................9 2.2...................................................... 15 2.3...................................................... 24 2.4...................................................... 25 2.5...................................................... 27 2.6 Logging.................................................. 31 2.7...................................................... 34 2.8...................................................... 37 3 / 39 3.1...................................................... 39 3.2 Web................................................... 40 3.3 HTML.................................................. 47 3.4...................................................... 48 3.5 GUI.................................................... 49 3.6...................................................... 51 3.7...................................................... 52 3.8...................................................... 53 3.9...................................................... 58 3.10...................................................... 59 3.11...................................................... 62 -
Hyperlearn Documentation Release 1
HyperLearn Documentation Release 1 Daniel Han-Chen Jun 19, 2020 Contents 1 Example code 3 1.1 hyperlearn................................................3 1.1.1 hyperlearn package.......................................3 1.1.1.1 Submodules......................................3 1.1.1.2 hyperlearn.base module................................3 1.1.1.3 hyperlearn.linalg module...............................7 1.1.1.4 hyperlearn.utils module................................ 10 1.1.1.5 hyperlearn.random module.............................. 11 1.1.1.6 hyperlearn.exceptions module............................. 11 1.1.1.7 hyperlearn.multiprocessing module.......................... 11 1.1.1.8 hyperlearn.numba module............................... 11 1.1.1.9 hyperlearn.solvers module............................... 12 1.1.1.10 hyperlearn.stats module................................ 14 1.1.1.11 hyperlearn.big_data.base module........................... 15 1.1.1.12 hyperlearn.big_data.incremental module....................... 15 1.1.1.13 hyperlearn.big_data.lsmr module........................... 15 1.1.1.14 hyperlearn.big_data.randomized module....................... 16 1.1.1.15 hyperlearn.big_data.truncated module........................ 17 1.1.1.16 hyperlearn.decomposition.base module........................ 18 1.1.1.17 hyperlearn.decomposition.NMF module....................... 18 1.1.1.18 hyperlearn.decomposition.PCA module........................ 18 1.1.1.19 hyperlearn.decomposition.PCA module........................ 18 1.1.1.20 hyperlearn.discriminant_analysis.base