What knows about ML languages that you may not

(Spoiler: only Swift and Julia make the cut)

Alan Edelman (MIT & JC) Viral Shah (Julia Computing) Juan Pablo Vielma (MIT) Chris Rackauckas (UC Irvine) Software Toolchain

4

2

1 2 3 4 5 6 7

- 2

- 4

Multiphysics (PDEs) Optimization Adjoint Methods Machine Surrogate Models (Backprop/Autodiff) Learning Dim Reduction Composability Sensitivity Analysis Performance Uncertainty Scalable Confidence Intervals Nimble/Agile Quantification ? Modern Software Development Languages Studied by Google as Powerful for ML https://github.com/tensorflow/swift/blob/master/docs/WhySwiftForTensorFlow.md Languages Studied by Google as Powerful for ML

1. Filter on Technical Merits Languages Studied by Google as Powerful for ML

2. Filter on Usability 1. Filter on Technical Merits Languages Studied by Google as Powerful for ML

Julia: Julia is … currently investing in techniques, and even have good interoperability with Python . The Julia community shares many common values…

2. Filter on Usability 1. Filter on Technical Merits KNet Psychology of Programming Languages

All languages are equally good.

… but I tune out any mention of languages I don’t use Psychology of Programming Languages

All languages are equally good.

… but I tune out any mention of languages I don’t use …at least until Google tells me to pay attention Software Toolchain

4

2

1 2 3 4 5 6 7

- 2

- 4

Multiphysics (PDEs) Optimization Adjoint Methods Machine Surrogate Models (Backprop/Autodiff) Learning Dim Reduction Composability Sensitivity Analysis Performance Uncertainty Scalable Confidence Intervals Nimble/Agile Quantification ? Topology Optimization ---- 20 years ago what we did

Write Force Balance Law  Finite Elements  Linear System  Solve  Graph

Dense Matrices

Sparse Matrices Now!

Fancy Differential Equations  Dimensionality Reduction  Write Force Balance Law  Finite Elements  Linear System  Solve  Graph

Topology Optimizaton Dense Matrices

Compose many physical systems Sparse Matrices Black Boxes vs White Boxes

Floating Point Generic Types Numbers

visible code Floating Point Numbers

Generic Types Black Boxes vs White Boxes

Floating Point Generic Types Numbers

Opaque visible code Floating Point Numbers

 Legacy Code Generic Types  Can’t hit all the criteria Code must rewrite other code An Idealized Modern Toolchain for Energy (we can have this!) Today (Fragmented) What could be with high level tools & generic types PDEs, PDMPs, … floats in trigger on demand,  True Physical Equations sensitivity analysis floats out adaptive data generation Neural Network, … floats in for more accurate surrogates  needs programmable form Surrogate Model specialized machine learning floats out floats in models for efficient optimization optimal solutions with Optimization uncertainty estimates floats out

Retrofitting your software for machine learning, sensitivity analysis, scalability, optimization

We would love to work with each and every one of you ML models are really programs

• Support hardware accelerators (GPUs, TPUs, Nervana, New silicon)

• Parallelization (Multi-threading, Multi-GPU, Distributed)

• Optimization (Placement, Memory Use, Low overhead)

• Automatic Differentiation

• Ease of programming (Math notation, Debuggers, Libraries)

• Ease of deployment (Cloud, Phones, Embedded)

ML problems are really language problems New models have new demands

Models commonly need:

• Conditional branching

• Loops for recurrence

• Recursion over trees

# Stanford TreeBank def model(tree): if isleaf(tree): tree.value else: model(tree.left) + model(tree.right)

In areas such as probabilistic programming

• Models need to reason about other programs (e.g. program generators and interpreters)

• Include non-differentiable components like Monte Carlo Tree Search. TensorFlow is more like a language and less like a ● We build a “computational graph” (essentially an AST)

Lazy (Eval) programming in JS

function add(a,b) { return `${a}+${b}`; }

● Which may contain control flow (tf.if, tf.while), variable scoping x = 1; y = 2 z = add(‘x’, ‘y’) // ‘x+y’ eval(z) // 3 x = 4 eval(z) // 6

● Cannot reuse existing libraries. Need new libraries for I/O and data processing.

Abadi, M., Isard, M., Murray, D., A Computational Model for TensorFlow: An Introduction, 2018. Academic Page of Juan Pablo Vielma 3/9/16, 12:08 PM

Jennifer Challis E62-571, 100 Main Street, Cambridge, MA 02142 (617) 324-4378 jchallis at mit dot edu Collaborators

Shabbir Ahmed, Daniel Bienstock, Daniel Dadush, Sanjeeb Dash, Santanu S. Dey, Iain Dunning, Rodolfo Carvajal, Luis A. Cisternas, Miguel Constantino, Daniel Espinoza, Alexandre S. Freire, Marcos Goycoolea, Oktay Günlük, Joey Huchette, Nathalie E. Jamett, Ahmet B. Keha, Mustafa . Kılınç, Guido Lagos, Miles Lubin, Sajad Modaresi, Sina Modaresi, Eduardo Moreno, Diego Morán, Alan T. Murray, George L. Nemhauser, Luis Rademacher, David M. Ryan, Denis Saure, Alejandro Toriello, Andres Weintraub, Sercan Yıldız, Tauhid Zaman Academic Page of Juan Pablo Vielma 3/9/16, 12:08 PM

Jennifer Challis Affiliations E62-571, 100 Main Street, Cambridge, MA 02142 (617) 324-4378 jchallis at mit dot edu Collaborators

Shabbir Ahmed, Daniel Bienstock, Daniel Dadush, Sanjeeb Dash, Santanu S. Dey, Iain Dunning, Rodolfo Carvajal, Luis A. Cisternas, Miguel Constantino, Daniel Espinoza, Alexandre S. Freire, Marcos Goycoolea, Oktay Günlük, Joey Huchette, Nathalie E. Jamett, Ahmet B. Keha, Mustafa R. Kılınç, Guido Lagos, Miles Lubin, Sajad Modaresi, Sina Modaresi, Eduardo Moreno, Diego Morán, Alan T. Murray, George L. Nemhauser, Luis Rademacher, David M. Ryan, Denis Saure, Alejandro Toriello, Andres Weintraub, Sercan Yıldız, Tauhid Zaman Mixed Integer OptimizationA andffiliation sJulia Links

• Mixed Integer Optimization • Discrete + nonlinear Links • TheoreticallyBack to top hard © 2013 Juan Pablo Vielma | Last updated: 02/29/2016 00:47:25 | Based on a template design by Andreas • RoutinelyViklund solved in practice • Back to top © 2013 Juan Pablo Vielma | Last updated: 02/29/2016 00:47:25 | Based on a template design by Andreas Vi•klundOptimization http://www.mit.edu/~jvielma/ modelling language Page 3 of 3

http://wwwand.mit.edu/~jvi elminterphasea/ Page 3 of 3 • Easy to use and http://www.gurobi.com/company/example-customers advanced • Integrated into Julia GPU computing in Julia

Native Array Libraries – CuArrays.jl, GPUArrays.jl, CLArrays.jl

Performance difference between CUDA ++ and CUDAnative.jl CUDAnative.jl: 1,300 LOC implementations of several benchmarks from the Rodinia benchmark suite.

Besard. T., Foket, C., De Sutter, B., Effective Extensible Programming: Unleashing Julia on GPUs, 2017. Julia ML at PetaScale to catalog the visible universe

650,000 cores. 1.3M threads. 60 TB of data. Cori Phase II

Cori Phase II – 1.5 PetaFlop/s Google TPU 3.0: 100 PetaFLOPS per Pod

Google TPU 3.0: 100 PetaFlop/s per Pod It just works (Part I)

“We can teach our autodiff system to differentiate the svd” vs “It just works because of built in abstractions in language design” It just works (Part II)

Machine learning with operators (not dense Build Operators matrices, not sparse solve with matrices) “backslash”

Not Blackboard  formula implementation debugging Software Toolchain

4

2

1 2 3 4 5 6 7

- 2

- 4

Multiphysics (PDEs) Optimization Adjoint Methods Machine Surrogate Models (Backprop/Autodiff) Learning Dim Reduction Composability Sensitivity Analysis Performance Uncertainty Scalable Confidence Intervals Nimble/Agile Quantification ?