Julia: a Modern Language for Modern ML
Dr. Viral Shah and Dr. Simon Byrne
www.juliacomputing.com

What we do: Modernize Technical Computing
Today’s technical computing landscape:
• Develop new learning algorithms
• Run them in parallel on large datasets
• Leverage accelerators like GPUs, Xeon Phis
• Embed into intelligent products
“Business as usual” will simply not do!

General Micro-benchmarks: Julia performs almost as fast as C
• 10X faster than Python
• 100X faster than R & MATLAB
[Figure: performance benchmark relative to C. A value of 1 means as fast as C. Lower values are better.]

A real application: Gillespie simulations in systems biology
745x faster than R
• Gillespie simulations are used in the field of drug discovery.
• Also used for simulations of epidemiological models to study disease propagation
• Julia package (Gillespie.jl) is the state of the art in Gillespie simulations
• https://github.com/openjournals/joss-papers/blob/master/joss.00042/10.21105.joss.00042.pdf

Implementation                          Time per simulation (ms)
R (GillespieSSA)                        894.25
R (handcoded)                           1087.94
Rcpp (handcoded)                        1.31
Julia (Gillespie.jl)                    3.99
Julia (Gillespie.jl, passing object)    1.78
Julia (handcoded)                       1.2

Those who convert ideas to products fastest will win
• The last 25 years: Quants develop algorithms (Python, R, SAS, Matlab); Computer Scientists prepare for production (C++, C#, Java); DEPLOY
• Compress the innovation cycle with Julia: Quants and Computer Scientists collaborate on one platform - JULIA; DEPLOY

Julia offers competitive advantages to its users
“Julia is poised to become one of the leading tools deployed by developers and programmers at banks, hedge funds, regulators and vendors”
Anthony Malakian, Waters Technology Magazine
“Thank you for Julia. You've kindled serious excitement. I am now working toward replacing some of our computationally intensive Matlab tools with Julia.”
Patrick Majors, Engineering Manager, Cooper Tires

The Julia community: 225,000 users
• Research anchored at MIT
• Expecting to reach 1 million users and 10,000 enterprises by 2019
• JuliaCon 2016: 50 talks and 250 attendees

Traction across Industries
• FINANCE: Economic Models at the NY Fed
• ENGINEERING: Air Collision Avoidance for FAA
• IOT: Self-driving Cars at UC Berkeley
• 3D PRINTING: 3D Printing Quadcopters at Voxel8

Machine Learning

Machine Learning: Write once, Run everywhere
• Many machine learning frameworks: Mocha.jl, Merlin.jl, Knet.jl
• Run on hardware of your choice

Machine Learning to build a sky atlas on 8000 cores at NERSC

Netflix recommendation challenge: Faster than Spark
• RecSys.jl - Large movie data set (500 million parameters)
• Distributed Alternating Least Squares SVD-based model executed in Julia and in Spark
• Faster:
  • Original code in Scala
  • Distributed Julia nearly 2x faster than Spark
• Better:
  • Julia code is significantly more readable
  • Easy to maintain and update
http://juliacomputing.com/blog/2016/04/22/a-parallel-recommendation-engine-in-julia.html

High performance

Microrheology at Path Bio Analytics
Analytics for Personalized Medicine
• Improving the Quantity and Quality of Information via Microrheology-Based Analytics
• Camera-based real-time particle tracking at KHz rates and Angstrom accuracy
• Real-time organoid analysis leading to precision medicine.
• Julia was the only system that allowed for real-time analysis of instrumentation data

Deep learning for diabetic retinopathy detection
http://juliacomputing.com/blog/2016/11/16/deep-eyes.html
[Figure: normal eye fundus; eye fundus infected with diabetic retinopathy]

Neural style transfer
• Deep learning model with MXNet
• Performance AND expressivity
• Easy to experiment
• Training on the CPU and GPU
• Explore pre-trained models

Finance

Solvency II Actuarial Capital Modeling
• Purpose of their Calculation Kernel
  • Calculation of a Solvency II Balance Sheet
  • Particularly focuses on the Solvency Capital Requirement
  • Use of Monte Carlo Simulation, currently up to 500,000 scenarios
  • Involves aggregation (summing up legal entities to a Group), ranking and smoothing
  • Generates various outputs for downstream reporting
“Solvency II compliant models in Julia are 1000x faster than IBM Algorithmics, 10x lesser code and took 1/10 the time to implement” – Tim Thornham, Director of Financial Solutions Modeling

Economic Scenario Generator
• High-dimensional data set on which data extraction, data reordering, and various statistical kernel computations are performed
• Faster:
  – Original code was in K
  – Julia code is 4x-10x faster
• Better:
  – Julia code is significantly more readable
  – Easy to maintain and update
  – Cost-effective

Mathematical Optimization
• Solving a large complex mathematical optimization problem for mortgages
• Full optimization: (Faster Speed + Better Quality)
  – MATLAB 2014a 558.094600 seconds, 3110 iterations
  – Julia v0.4 1.833 seconds, 50 iterations (300x faster)
• Performance: Objective function only (100 iterations)
  – MATLAB 2014a 2.69 seconds
  – Julia v0.4 0.78 seconds (3.5x faster)
• Quality: Optimization value (11-parameter)
  – MATLAB 2014a 4.277644613116166e+14 (3110 iterations)
  – Julia v0.4 4.270887086707642e+14 (50 iterations)

Risk Analytics and Asset Management
• BlackRock is using Julia in its flagship Aladdin product:
  – Next generation analytics
  – Risk management
  – Asset management
  – Time series analytics
• Significant gain in productivity and scalability

Asset and Liabilities Modeling at Brazilian Development Bank
• Manage >$1 Trillion in assets
• Multistage stochastic optimization solution to the bank’s returns
  – Choosing the best allocation, funding and hedge decisions
  – Subject to a wide range of business, political and market restrictions
“Selected Julia for its speed, elegance, and JuMP – the Julia Mathematical Optimization Package” - Felipe Tavares

Mathematical Optimization

Solver capabilities accessible through JuMP
Software stack shown on the slide: JuMP, MathProgBase.jl, and the solver packages Cbc.jl, Clp.jl, CPLEX.jl, ECOS.jl, GLPK.jl, Gurobi.jl, Ipopt.jl, KNITRO.jl, Mosek.jl, NLopt.jl, SCS.jl

Table columns on the slide: Solver, LP, MILP, SOCP, MISOCP, SDP, NLP, MINLP, Other

Solver                             Problem classes marked    Other
Bonmin (via AmplNLWriter.jl)       ✔ ✔ ✔ ✔
Cbc (.jl)                          ✔ ✔
Clp (.jl)                          ✔
Couenne (via AmplNLWriter.jl)      ✔ ✔ ✔ ✔
CPLEX (.jl)                        ✔ ✔ ✔ ✔                   IP callbacks
ECOS (.jl)                         ✔ ✔
GLPK (.jl)                         ✔ ✔                       IP callbacks
Gurobi (.jl)                       ✔ ✔ ✔ ✔                   IP callbacks
Ipopt (.jl)                        ✔ ✔
Artelys Knitro (.jl)               ✔ ✔ ✔ ✔
Mosek (.jl)                        ✔ ✔ ✔ ✔ ✔ ✔ (note 1)
NLopt (.jl)                        ✔
SCS (.jl)                          ✔ ✔ ✔

Key:
LP = Linear Programming
MILP = Mixed Integer Linear Programming
SOCP = Second-order cone programming (includes convex QP and QCQP)
MISOCP = Mixed Integer SOCP
SDP = Semidefinite Programming
NLP = (constrained) Nonlinear Programming (includes general QP and QCQP)
MINLP = Mixed Integer NLP

Notes:
1. Problem must be convex.

Some JuMP Applications
• Train scheduling
• Self-driving cars
• Electric vehicle charging
• Power grid control
• Plasma physics
• Fantasy sports

“If you have a choice of several languages, it is, all other things being equal, a mistake to program in anything but the most powerful one.”
Paul Graham in Beating the Averages
Co-Founder, Y-Combinator

www.juliacomputing.com
Simplicity meets Speed
Products that make Julia easy to use, easy to deploy and easy to scale

Simon Byrne - Julia Computing

What is Julia?
Julia is a modern, high-performance, dynamic programming language for technical computing.
• modern: based on the lessons of the past 60 years
• high-performance: as fast as traditional "fast" languages (Fortran/C/C++)
• dynamic: "simple to use" (R/Matlab/Python)
• technical computing: anything involving numbers

Why Julia?
To write fast, efficient code in an easy, elegant dynamic language
• Avoids the two language problem: "My R/Python/Matlab code is too slow; I need to rewrite low-level routines in C/C++/Fortran"
• It is easy to "peek under the hood"
  • Most of Julia is written in Julia
  • Can inspect various stages of the compilation process
• It's free (download at www.julialang.org)
• It's fun.
• Plays nicely with existing tools

In [1]: # accurately compute log(sum(exp(X)))
        function logsumexp(X)
            u = maximum(X)
            t = 0.0
            for i = 1:length(X)
                t += exp(X[i]-u)
            end
            u + log(t)
        end
Out[1]: logsumexp (generic function with 1 method)

Syntax heavily influenced by Python and Matlab
Basic differences from Python:
• explicit end vs. significant whitespace
• 1-based vs. 0-based arrays
Basic differences from Matlab:
• Functions can be defined anywhere
• Scalars are not matrices in disguise
• randn(10) gives you the thing you actually want.

Types
Every object has one:

In [2]: typeof(1.0)
Out[2]: Float64

In [3]: typeof(logsumexp)
Out[3]: #logsumexp

In [4]: typeof(Float64)
Out[4]: DataType

New types are declared with the type keyword (in Julia 0.6 and later the keyword is mutable struct):

In [5]: type Baz
            a::Float64
            b::Float64
        end

In [6]: b = Baz(1.0,2.0)
Out[6]: Baz(1.0,2.0)

Unlike classes in Python/Matlab, user-defined types are just as efficient as the builtin types (indeed, most "builtin" types are actually written in Julia).

Generic functions and multiple dispatch
Julia functions are generic in that different code paths can be called depending on the types of the arguments.

In [7]: f(x::Float64) = "$x is a float" # "$" does string substitution
        f(x::Int) = "$x is an integer"
Out[7]: f (generic function with 2 methods)

• f(...) = ... is the same as function f(...) ... end
• :: is an optional type specification.
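The two points above (short-form versus long-form definitions, and optional type annotations) can be sketched as follows. This is a minimal illustration, not code from the talk: the function name g is hypothetical, chosen so that it does not interfere with the f defined in the cells.

```julia
# Hypothetical function `g` (not from the talk), used only for illustration.

# Short (assignment) form with a type annotation.
g(x::Float64) = "$x is a float"

# Long form: adds a second method to the same generic function.
function g(x::Int)
    return "$x is an integer"
end

# No annotation at all: this method accepts an argument of any type.
g(x) = "$x is something else"

println(g(1.0))              # uses the Float64 method
println(g(1))                # uses the Int method
println(g("hi"))             # falls back to the unannotated method
println(length(methods(g)))  # number of methods attached to g
```

Both definition styles add methods to one generic function, and the most specific matching method is chosen at call time; the unannotated method acts as a catch-all.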
In [8]: f(1.0)
Out[8]: "1.0 is a float"

In [9]: f(1)
Out[9]: "1 is an integer"

Unlike traditional object oriented languages (C++, Python, Matlab), functions don't "belong" to a type. This allows for multiple dispatch on any combination of arguments.

In [10]: f(x::Float64,y::Int) = "$x is a float, but $y is an integer"
         f(x::Real,y::Real) = "$x and $y are both some sort of real" # Real is an abstract "super" type
         f(x,y)