Bringing Probabilistic Programming to Scientific Simulators at Scale

Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale Atılım Güneş Baydin Lei Shao Wahid Bhimji University of Oxford Intel Corporation Lawrence Berkeley National Laboratory Lukas Heinrich Lawrence Meadows Jialin Liu CERN Intel Corporation Lawrence Berkeley National Laboratory Andreas Munk Saeid Naderiparizi Bradley Gram-Hansen University of British Columbia University of British Columbia University of Oxford Gilles Louppe Mingfei Ma Xiaohui Zhao University of Liège Intel Corporation Intel Corporation Philip Torr Victor Lee Kyle Cranmer University of Oxford Intel Corporation New York University Prabhat Frank Wood Lawrence Berkeley National University of British Columbia Laboratory ABSTRACT ACM Reference Format: Probabilistic programming languages (PPLs) are receiving wide- Atılım Güneş Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence spread attention for performing Bayesian inference in complex Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram- Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip Torr, Victor Lee, generative models. However, applications to science remain limited Kyle Cranmer, Prabhat, and Frank Wood. 2019. Etalumis: Bringing Proba- because of the impracticability of rewriting complex scientifc simu- bilistic Programming to Scientifc Simulators at Scale. In The International lators in a PPL, the computational cost of inference, and the lack of Conference for High Performance Computing, Networking, Storage, and Anal- scalable implementations. To address these, we present a novel PPL ysis (SC ’19), November 17–22, 2019, Denver, CO, USA. ACM, New York, NY, framework that couples directly to existing scientifc simulators USA, 14 pages. https://doi.org/10.1145/3295500.3356180 through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based 1 INTRODUCTION inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN– Probabilistic programming [71] is an emerging paradigm within ma- LSTM architecture with a PyTorch-MPI-based framework on 1,024 chine learning that uses general-purpose programming languages 32-core CPU nodes of the Cori supercomputer with a global mini- to express probabilistic models. This is achieved by introducing batch size of 128k: achieving a performance of 450 Tfop/s through statistical conditioning as a language construct so that inverse prob- enhancements to PyTorch. We demonstrate a Large Hadron Col- lems can be expressed. Probabilistic programming languages (PPLs) lider (LHC) use-case with the C++ Sherpa simulator and achieve have semantics [67] that can be understood as Bayesian inference the largest-scale posterior inference in a Turing-complete PPL. [13, 24, 26]. The major challenge in designing useful PPL systems is that language evaluators must solve arbitrary, user-provided in- KEYWORDS verse problems, which usually requires general-purpose inference algorithms that are computationally expensive. probabilistic programming, simulation, inference, deep learning In this paper we report our work that enables, for the frst time, the use of existing stochastic simulator code as a probabilistic pro- gram in which one can do fast, repeated (amortized) Bayesian infer- Permission to make digital or hard copies of part or all of this work for personal or ence; this enables one to predict the distribution of input parameters classroom use is granted without fee provided that copies are not made or distributed and all random choices in the simulator from an observation of its for proft or commercial advantage and that copies bear this notice and the full citation on the frst page. Copyrights for third-party components of this work must be honored. output. In other words, given a simulator of a generative process in For all other uses, contact the owner/author(s). the forward direction (inputs!outputs), our technique can provide SC ’19, November 17–22, 2019, Denver, CO, USA the reverse (outputs!inputs) by predicting the whole latent state © 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6229-0/19/11. of the simulator that could have given rise to an observed instance https://doi.org/10.1145/3295500.3356180 of its output. For example, using a particle physics simulation we SC ’19, November 17–22, 2019, Denver, CO, USA Baydin et al. can get distributions over the particle properties and decays within variables2 expressed by the existing Sherpa simulator code base the simulator that can give rise to a collision event observed in a of nearly one million lines of code in C++ [29]. detector, or, using a spectroscopy simulator we can determine the • Synchronous data parallel training of a dynamic 3DCNN–LSTM elemental matter composition and dispersions within the simulator neural network (NN) architecture using the PyTorch [61] MPI explaining an observed spectrum. In felds where accurate simu- framework at the scale of 1,024 nodes (32,768 CPU cores) with lators of real-world phenomena exist, our technique enables the a global minibatch size of 128k. To our knowledge this is the interpretable explanation of real observations under the structured largest scale use of PyTorch’s builtin MPI functionality,3 and the model defned by the simulator code base. largest minibatch size used for this form of NN model. We achieve this by defning a probabilistic programming execution protocol that interfaces with existing simulators at the sites of 2 PROBABILISTIC PROGRAMMING FOR random number draws, without altering the simulator’s structure PARTICLE PHYSICS and execution in the host system. The random number draws are Particle physics seeks to understand particles produced in collisions routed through the protocol to a PPL system which treats these at accelerators such at the LHC at CERN. Collisions happen millions as samples from corresponding prior distributions in a Bayesian of times per second, creating cascading particle decays, observed in setting, giving one the capability to record or guide the execution complex instruments such as the ATLAS detector [2], comprising of the simulator to perform inference. Thus we generalize existing millions of electronics channels. These experiments analyze the vast simulators as probabilistic programs and make them subject to volume of resulting data and seek to reconstruct the initial particles inference under general-purpose inference engines. produced in order to make discoveries including physics beyond Inference in the probabilistic programming setting is performed the current Standard Model of particle physics [28][73][63][72]. by sampling in the space of execution traces, where a single sample The Standard Model has a number of parameters (e.g., particle (an execution trace) represents a full run of the simulator. Each masses), which we can denote θ, describing the way particles and execution trace itself is composed of a potentially unbounded se- fundamental forces act in the universe. In a given collision at the quence of addresses, prior distributions, and sampled values, where LHC, with initial conditions denoted E, we observe a cascade of an address is a unique label identifying each random number draw. particles interact with particle detectors. If we denote all of the In other words, we work with empirical distributions over simu- random “choices” made by nature as x, the Standard Model de- lator executions, which entails unique requirements on memory, scribes, generatively, the conditional probability p¹xjE;θº, that is, storage, and computation that we address in our implementation. the distribution of all choices x as a function of initial conditions E The addresses comprising each trace give our technique the unique and model parameters θ. Note that, while the Standard Model can ability to provide direct connections to the simulator code base be expressed symbolically in mathematical notation [32, 62], it can for any predictions at test time, where the simulator is no longer also be expressed computationally as a stochastic simulator [29], used as a black box but as a highly structured and interpretable which, given access to a random number generator, can draw sam- probabilistic generative model that it implicitly represents. ples from p¹xº.4 Similarly, a particle detector can be modeled as a Our PPL provides inference engines from the Markov chain stochastic simulator, generating samples from p¹yjxº, the likelihood Monte Carlo (MCMC) and importance sampling (IS) families. MCMC of observation y as a function of x. inference guarantees closely approximating the true posterior of In this paper we focus on a real use-case in particle physics, the simulator, albeit with signifcant computational cost due to its performing experiments on the decay of the τ (tau) lepton. This is sequential nature and the large number of iterations one needs to under active investigation by LHC physicists [4] and important to accumulate statistically independent samples. Inference compila- uncovering properties of the Higgs boson. We use the state-of-the- tion (IC) [47] addresses this by training a dynamic neural network art Sherpa simulator [29] for modeling τ particle creation in LHC to provide proposals for IS, leading to fast amortized inference. collisions and their subsequent decay into further particles (the We name this project “Etalumis”, the word “simulate” spelled stochastic events x above), coupled to a fast 3D detector simulator backwards, as a reference to the

Bringing Probabilistic Programming to Scientific Simulators at Scale

Effective Virtual CPU Configuration with QEMU and Libvirt

Amd Epyc 7351

CS 110 Discussion 15 Programming with SIMD Intrinsics

AMD's Bulldozer Architecture

AMD Ryzen 5 1600 Specifications

C++ Code M128 Add (Const M128 &X, Const __M128 &Y){ X X3 X2 X1 X0 Return Mm Add Ps(X, Y); } + + + + +

Energy Efficient Spin-Locking in Multi-Core Machines

On the Performance of MPI-Openmp on a 12 Nodes Multi-Core Cluster

Benchmarking-HOWTO.Pdf

Intel® Architecture Instruction Set Extensions and Future Features

Exploring Coremark™ – a Benchmark Maximizing Simplicity and Efficacy by Shay Gal-On and Markus Levy

Intel(R) Advanced Vector Extensions Programming Reference