Julia: A Fresh Approach to GPU Computing

What is Julia?

  • Technical computing language
  • High-level like Python
  • Performance of C

    function mandel(z)
        c = z
        maxiter = 80
        for n = 1:maxiter
            if abs(z) > 2
                return n-1
            end
            z = z^2 + c
        end
        return maxiter
    end
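The mandel escape-time function above carries no type annotations, so the one definition serves any numeric type that supports abs, ^, and +. The calls below are my own illustration, not part of the slides:

    mandel(0.5)           # a real input that escapes after a few iterations
    mandel(0.0 + 0.0im)   # the origin never escapes, so this returns maxiter (80)
    mandel(1.0 + 1.0im)   # a complex input that escapes almost immediately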

Julia for GPU programming

What is GPU programming? The established ecosystem spans several abstraction levels, and Julia offers a package at each of them:

    High-level   TensorFlow, Keras   ↔  Flux.jl, Knet.jl
                 ArrayFire, Thrust   ↔  GPUArrays.jl
                 cuBLAS, cuDNN       ↔  CuArrays.jl
                 CUB, MGPU           ↔  CUDAnative.jl
    Low-level    CUDA C

Programming with libraries

CuArrays.jl and its wrapper packages (cuBLAS.jl, cuFFT.jl, cuRAND.jl, cuSPARSE.jl, cuDNN.jl, cuSOLVER.jl) expose the vendor libraries through ordinary array operations:

    a = CuArray(Float32, 2)
    b = curand(Float32, 2)   # cuRAND

    a * a         # cuBLAS
    fft(a)        # cuFFT
    qrfact(a)     # cuSOLVER
    softmax(a)    # cuDNN

Programming with kernels

Writing your own kernels is much harder than calling into libraries; this is the level CUDAnative.jl targets.

Designed for performance: multiple dispatch

Instead of branching on types at run time, Julia code defines one method per type and lets the compiler pick the right one:

    # run-time type checks...
    function foo(x)
        if isa(x, Int64)
            …
        elseif isa(x, Float64)
            …
        end
    end

    # ...become compile-time dispatch
    foo(x::Int64)   = …
    foo(x::Float64) = …

Designed for performance: type inference

The compiler propagates concrete types through untyped code. Given

    function sigmoid(x)
        temp = exp(-x)
        return (1 / (1+temp))
    end

a call with an Int argument is inferred to compute in Float64 throughout:

    function sigmoid(x::Int)
        temp = exp(-x)::Float64
        return (1 / (1+temp))::Float64
    end

while a call with a Float32 argument stays entirely in Float32:

    function sigmoid(x::Float32)
        temp = exp(-x)::Float32
        return (1 / (1+temp))::Float32
    end

(A reflection sketch showing this inference in action appears near the end of this text.)

Designed for performance

Multiple dispatch, type inference, and machine-native types together feed a specializing JIT compiler that emits high-quality, stand-alone machine code.

Extensible language

Every stage of the compilation pipeline can be inspected and injected into, and the compiler itself can be configured:

    Source → AST → Julia IR → LLVM IR → Machine code

How does it look?

No DSL, no subset: just Julia, at the CUDA abstraction level, with performance parity.

    using CUDAnative, CuArrays

    function vadd(a, b, c)
        i = threadIdx().x
        c[i] = a[i] + b[i]
        return
    end

    a = CuArray(randn(2,2))
    b = CuArray(randn(2,2))
    c = similar(a)

    @cuda threads=4 vadd(a,b,c)

How does it work?

The @cuda launch hands vadd and its argument types to a run-time JIT compiler, which specializes the kernel for CuArray{Float64,2}, lowers it to LLVM IR, and compiles that to PTX. Launch the same kernel on a CuArray{Int32,3} and a second specialization is built the same way:

    vadd  +  CuArray{Float64,2}  →  LLVM IR  →  PTX
    vadd  +  CuArray{Int32,3}    →  LLVM IR  →  PTX

The process is fully transparent, with no overhead. (A device-side reflection sketch appears further down.)

High-level GPU programming

Great performance with clean, concise code: dotted array expressions such as

    ((a .+ b) ./ d) .- e

fuse into a single kernel (a GPU sketch appears further down), and kernels like vadd are generic: the same source serves every element type it is launched with.

From GPU kernels, to differentiable algorithms, to high-level layer stacking: all on one platform.

    Pkg.add("Flux")
    using Flux

    W = randn(2, 10)
    b = randn(2)
    f(x) = softmax(W * x .+ b)

    model = Chain(
        Dense(10, 5, σ),
        Dense(5, 2),
        softmax)

The Julia Magic

Everything just works with everything else: differential equations, machine learning, CUDA, automatic differentiation, deep learning, operations research, all the HPC tooling (JuliaDB.jl and StructOfArrays.jl, DistributedArrays.jl, DataFrames.jl), and everything you build yourself. (A small AD sketch appears further down.)

Generic programming is extremely powerful:

    function model(tree)
        if isleaf(tree)
            tree.value
        else
            model(tree.left) + model(tree.right)
        end
    end
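The slide leaves the tree type and isleaf undefined. A minimal sketch that makes it runnable, using hypothetical Leaf and Node types of my own; the slides do not specify them:

    # Hypothetical tree representation (not defined on the slide).
    struct Leaf
        value
    end
    struct Node
        left
        right
    end

    isleaf(t) = t isa Leaf

    function model(tree)
        if isleaf(tree)
            tree.value
        else
            model(tree.left) + model(tree.right)
        end
    end

    model(Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0))))   # 6.0

Nothing here constrains value to be a scalar: any type with + works, which is exactly the genericity the deck is highlighting.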
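To watch the type inference from the "Designed for performance" slides happen, the standard reflection macro @code_warntype prints the inferred code for a concrete call signature. A minimal sketch, meant to be run at the REPL:

    function sigmoid(x)
        temp = exp(-x)
        return 1 / (1 + temp)
    end

    # With an Int argument, every intermediate is inferred as Float64:
    @code_warntype sigmoid(1)

    # With a Float32 argument, the whole body stays in Float32:
    @code_warntype sigmoid(1.0f0)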
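The pipeline on the "Extensible language" slide (Source → AST → Julia IR → LLVM IR → machine code) can be walked stage by stage with the other standard reflection macros, continuing with sigmoid from the sketch above:

    @code_lowered sigmoid(1.0f0)   # Julia IR after lowering
    @code_typed   sigmoid(1.0f0)   # Julia IR after type inference
    @code_llvm    sigmoid(1.0f0)   # LLVM IR
    @code_native  sigmoid(1.0f0)   # machine code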
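CUDAnative.jl mirrors this reflection for the device side, which makes the specialization described under "How does it work?" visible. A sketch assuming a working CUDA setup; the @device_code_* macro names are CUDAnative's reflection family, shown as indicative usage rather than a verbatim transcript from the talk:

    using CUDAnative, CuArrays

    function vadd(a, b, c)
        i = threadIdx().x
        c[i] = a[i] + b[i]
        return
    end

    a = CuArray(randn(2,2))
    b = CuArray(randn(2,2))
    c = similar(a)

    # Print the device-side code generated for this specialization:
    @device_code_llvm @cuda threads=4 vadd(a, b, c)   # LLVM IR
    @device_code_ptx  @cuda threads=4 vadd(a, b, c)   # PTX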
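The dotted expression from the high-level slide runs unchanged on the GPU: Julia guarantees that nested dotted calls fuse into a single broadcast, so the line below launches one kernel rather than three. A sketch with array sizes of my own choosing:

    using CuArrays

    a = CuArray(randn(100))
    b = CuArray(randn(100))
    d = CuArray(randn(100))
    e = CuArray(randn(100))

    # One fused broadcast kernel evaluates all three operations.
    result = ((a .+ b) ./ d) .- e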
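As one concrete instance of "everything just works with everything else": an operator-overloading AD package can differentiate the unmodified sigmoid, because generic code happily accepts the dual-number type it pushes through. ForwardDiff.jl is my choice of example package here, not one named on these slides:

    using ForwardDiff

    sigmoid(x) = 1 / (1 + exp(-x))

    # ForwardDiff feeds a dual number through the generic definition.
    ForwardDiff.derivative(sigmoid, 0.0)   # 0.25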

Case Studies

JuliaCon: 300 attendees, 150 talks.

Resources:

  • https://github.com/JuliaGPU/
  • https://github.com/FluxML/
  • NVIDIA Parallel Forall blog
