Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale

Atılım Güneş Baydin (University of Oxford), Lei Shao (Intel Corporation), Wahid Bhimji (Lawrence Berkeley National Laboratory), Lukas Heinrich (CERN), Lawrence Meadows (Intel Corporation), Jialin Liu (Lawrence Berkeley National Laboratory), Andreas Munk (University of British Columbia), Saeid Naderiparizi (University of British Columbia), Bradley Gram-Hansen (University of Oxford), Gilles Louppe (University of Liège), Mingfei Ma (Intel Corporation), Xiaohui Zhao (Intel Corporation), Philip Torr (University of Oxford), Victor Lee (Intel Corporation), Kyle Cranmer (New York University), Prabhat (Lawrence Berkeley National Laboratory), Frank Wood (University of British Columbia)

arXiv:1907.03382v1 [cs.LG] 8 Jul 2019

ABSTRACT

Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN–LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k, achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.

1 INTRODUCTION

Probabilistic programming [71] is an emerging paradigm within machine learning that uses general-purpose programming languages to express probabilistic models. This is achieved by introducing statistical conditioning as a language construct so that inverse problems can be expressed. Probabilistic programming languages (PPLs) have semantics [67] that can be understood as Bayesian inference [13, 24, 26]. The major challenge in designing useful PPL systems is that language evaluators must solve arbitrary, user-provided inverse problems, which usually requires general-purpose inference algorithms that are computationally expensive.

In this paper we report our work that enables, for the first time, the use of existing stochastic simulator code as a probabilistic program in which one can do fast, repeated (amortized) Bayesian inference; this enables one to predict the distribution of input parameters and all random choices in the simulator from an observation of its output. In other words, given a simulator of a generative process in the forward direction (inputs → outputs), our technique can provide the reverse (outputs → inputs) by predicting the whole latent state of the simulator that could have given rise to an observed instance of its output. For example, using a particle physics simulation we can get distributions over the particle properties and decays within the simulator that can give rise to a collision event observed in a detector, or, using a spectroscopy simulator, we can determine the elemental matter composition and dispersions within the simulator explaining an observed spectrum. In fields where accurate simulators of real-world phenomena exist, our technique enables the interpretable explanation of real observations under the structured model defined by the simulator code base.

We achieve this by defining a probabilistic programming execution protocol that interfaces with existing simulators at the sites of random number draws, without altering the simulator's structure and execution in the host system. The random number draws are routed through the protocol to a PPL system that treats these as samples from corresponding prior distributions in a Bayesian setting, giving one the capability to record or guide the execution of the simulator to perform inference. Thus we generalize existing simulators as probabilistic programs and make them subject to inference under general-purpose inference engines.

Inference in the probabilistic programming setting is performed by sampling in the space of execution traces, where a single sample (an execution trace) represents a full run of the simulator. Each execution trace itself is composed of a potentially unbounded sequence of addresses, prior distributions, and sampled values, where an address is a unique label identifying each random number draw. In other words, we work with empirical distributions over simulator executions, which entails unique requirements on memory, storage, and computation that we address in our implementation. The addresses comprising each trace give our technique the unique ability to provide direct connections to the simulator code base for any predictions at test time, where the simulator is no longer used as a black box but as the highly structured and interpretable probabilistic generative model that it implicitly represents.

Our PPL provides inference engines from the Markov chain Monte Carlo (MCMC) and importance sampling (IS) families. MCMC inference guarantees closely approximating the true posterior of the simulator, albeit with significant computational cost due to its sequential nature and the large number of iterations one needs to accumulate statistically independent samples. Inference compilation (IC) [47] addresses this by training a dynamic neural network to provide proposals for IS, leading to fast amortized inference.

We name this project "Etalumis", the word "simulate" spelled backwards, as a reference to the fact that our technique essentially inverts a simulator by probabilistically inferring all choices in the simulator given an observation of its output. We demonstrate this by inferring properties of particles produced at the Large Hadron Collider (LHC) using the Sherpa [29] simulator.

1.1 Contributions

Our main contributions are:

• A novel PPL framework that enables execution of existing stochastic simulators under the control of general-purpose inference engines, with HPC features including handling multi-TB data and distributed training and inference.
• The largest-scale posterior inference in a Turing-complete PPL, where our experiments encountered approximately 25,000 latent variables expressed by the existing Sherpa simulator code base of nearly one million lines of code in C++ [29].
• Synchronous data-parallel training of a dynamic 3DCNN–LSTM neural network (NN) architecture using the PyTorch [61] MPI framework at the scale of 1,024 nodes (32,768 CPU cores) with a global minibatch size of 128k. To our knowledge this is the largest-scale use of PyTorch's built-in MPI functionality, and the […]

[…] produced in order to make discoveries including physics beyond the current Standard Model of particle physics [28, 63, 72, 73]. The Standard Model has a number of parameters (e.g., particle masses), which we can denote θ, describing the way particles and fundamental forces act in the universe. In a given collision at the LHC, with initial conditions denoted E, we observe a cascade of particles interacting with particle detectors. If we denote all of the random "choices" made by nature as x, the Standard Model describes, generatively, the conditional probability p(x|E, θ), that is, the distribution of all choices x as a function of initial conditions E and model parameters θ. Note that, while the Standard Model can be expressed symbolically in mathematical notation [32, 62], it can also be expressed computationally as a stochastic simulator [29], which, given access to a random number generator, can draw samples from p(x). Similarly, a particle detector can be modeled as a stochastic simulator, generating samples from p(y|x), the likelihood of observation y as a function of x.

In this paper we focus on a real use-case in particle physics, performing experiments on the decay of the τ (tau) lepton. This is under active investigation by LHC physicists [4] and important to uncovering properties of the Higgs boson. We use the state-of-the-art Sherpa simulator [29] for modeling τ particle creation in LHC collisions and their subsequent decay into further particles (the stochastic events x above), coupled to a fast 3D detector simulator for the detector observation y.

Current methods in the field include performing classification and regression using machine learning approaches on low-dimensional distributions of derived variables [4]; these provide point estimates without the posterior of the full latent state or the deep interpretability of our approach. Inference of the latent structure has only previously been used in the field with drastically simplified models of the process and detector [3, 43].

PPLs allow us to express inference problems such as: given an actual particle detector observation y, what sequence of choices x are likely to have led to this observation? In other words, we would like to find p(x|y), the distribution of x as a function of y. Solving this inverse problem via conditioning requires invoking Bayes rule

    p(x|y) = p(y, x) / p(y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx

where the posterior distribution of interest, p(x|y), […]
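The execution protocol described in the introduction, in which a simulator's random number draws are intercepted at uniquely addressed sample sites and recorded as an execution trace, can be illustrated with a minimal Python sketch. The names here (Controller, Trace, sample) and the toy simulator are illustrative assumptions, not the actual Etalumis API, which couples to C++ simulators such as Sherpa over a cross-platform protocol.

```python
import random

class Trace:
    """An execution trace: the sequence of addressed random choices
    made during one full run of the simulator."""
    def __init__(self):
        self.entries = []  # (address, distribution_name, sampled_value)

    def record(self, address, dist_name, value):
        self.entries.append((address, dist_name, value))

class Controller:
    """Stands in for the PPL side of the protocol: it performs every
    prior draw on behalf of the simulator and records it, so a run can
    later be replayed or guided by an inference engine."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.trace = Trace()

    def sample(self, address, dist_name, draw):
        value = draw(self.rng)   # draw from the prior on the PPL side
        self.trace.record(address, dist_name, value)
        return value             # handed back to the simulator unchanged

def simulator(ctl):
    # A stand-in for an existing stochastic simulator, unchanged except
    # that its RNG calls are routed through the controller at uniquely
    # addressed sample sites.
    n = ctl.sample("num_particles", "UniformInt(1, 4)",
                   lambda rng: rng.randint(1, 4))
    energies = [ctl.sample(f"energy_{i}", "Exponential(1)",
                           lambda rng: rng.expovariate(1.0))
                for i in range(n)]
    return sum(energies)  # the simulator's observable output

ctl = Controller(seed=1)
output = simulator(ctl)
# ctl.trace now holds every addressed random choice of this run.
```

Recording is what gives the MCMC and IS engines a common substrate: because every draw passes through the controller, an inference engine can substitute proposed values at the same addresses to guide a re-execution instead of merely recording one.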
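The Bayes-rule inversion above is, in the importance sampling family, approximated by weighting samples drawn from a proposal distribution q(x); in inference compilation the proposal is produced by the trained neural network, while here, as a self-contained numeric sketch, q is a fixed Gaussian and the prior and likelihood are toy Gaussian densities chosen (as an assumption, unrelated to the paper's physics model) so the exact posterior is known in closed form.

```python
import math
import random

def log_prior(x):
    # p(x) = Normal(0, 1)
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def log_likelihood(y, x):
    # p(y|x) = Normal(x, 0.5)
    return (-0.5 * ((y - x) / 0.5) ** 2
            - math.log(0.5) - 0.5 * math.log(2 * math.pi))

def posterior_mean(y, num_samples=100_000, seed=0):
    """Self-normalized importance sampling estimate of E[x | y],
    i.e. the mean of p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx."""
    rng = random.Random(seed)
    total_w, total_wx = 0.0, 0.0
    for _ in range(num_samples):
        x = rng.gauss(1.0, 2.0)  # proposal q(x) = Normal(1, 2)
        log_q = (-0.5 * ((x - 1.0) / 2.0) ** 2
                 - math.log(2.0) - 0.5 * math.log(2 * math.pi))
        # importance weight w = p(y|x) p(x) / q(x)
        w = math.exp(log_prior(x) + log_likelihood(y, x) - log_q)
        total_w += w
        total_wx += w * x
    # the normalizer total_w plays the role of the evidence integral p(y)
    return total_wx / total_w

est = posterior_mean(1.0)  # conjugate Gaussian exact answer is 0.8 * y
```

With these densities the exact posterior mean is 0.8y (precision-weighted combination of prior and likelihood), so for y = 1.0 the estimate converges to 0.8 as the number of samples grows; a better-matched proposal, which is what IC learns, reduces the weight variance and hence the number of samples needed.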