Improving Numerical Integration and Event Generation with Normalizing Flows — HET Brown Bag Seminar, University of Michigan —
Claudius Krause, Fermi National Accelerator Laboratory
September 25, 2019
In collaboration with: Christina Gao, Stefan Höche, Joshua Isaacson
arXiv: 191x.abcde

Monte Carlo Simulations are increasingly important.
[Figure: projected ATLAS computing needs, from https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ComputingandSoftwarePublicResults]
MC event generation is needed for signal and background predictions.
⇒ The required CPU time will increase in the next years.

Monte Carlo Simulations are increasingly important.
[Figures: CPU hours per million events vs. N_jet (parton level, particle level, particle level WTA (> 6j); Sherpa MC @ NERSC, Sherpa / Pythia + DIY @ NERSC) and the frequency of N_trials, for W^+ + 0-9 jets at LHC@14TeV with p_{T,j} > 20 GeV and |η_j| < 6. Stefan Höche, Stefan Prestel, Holger Schulz [1905.05120; PRD]]
The bottlenecks for evaluating large final-state multiplicities are
- a slow evaluation of the matrix element
- a low unweighting efficiency

Improving Numerical Integration and Event Generation with Normalizing Flows
Part I: The "traditional" approach
Part II: The Machine Learning approach

I: There are two problems to be solved.
Integration: f(\vec{x}) ⇒ F = \int f(\vec{x}) \, \mathrm{d}^D x
For a cross section: \mathrm{d}\sigma(p_i, \vartheta_i, \varphi_i) ⇒ \sigma = \int \mathrm{d}\sigma(p_i, \vartheta_i, \varphi_i), with D = 3 n_\text{final} - 4
Sampling: Given a distribution f(\vec{x}), how can we sample according to it?

I: ... but they are closely related.
1. Starting from a pdf, ...
2. ... we can integrate it and find its cdf, ...
3. ... to finally use its inverse to transform a uniform distribution.
⇒ We need a fast and effective numerical integration!
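As a concrete illustration of the pdf → cdf → inverse chain, here is a minimal numerical sketch (not from the talk; the peaked toy density f(x) and all numerical choices are assumptions for illustration only): tabulate the cdf on a grid, then invert it by interpolation to map uniform random numbers into samples of f.

import numpy as np

# Hypothetical 1-d target density on [0, 1] (illustration only, not from the talk).
def f(x):
    return np.exp(-50.0 * (x - 0.3) ** 2) + 0.2

# 1) pdf on a grid, normalized
x = np.linspace(0.0, 1.0, 2001)
pdf = f(x)
pdf /= np.trapz(pdf, x)

# 2) cdf by cumulative (trapezoidal) integration
cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))])
cdf /= cdf[-1]

# 3) inverse cdf applied to uniform random numbers -> samples distributed as f(x)
u = np.random.default_rng(0).uniform(size=100_000)
samples = np.interp(u, cdf, x)   # inverse transform by interpolation

A histogram of samples should reproduce the shape of f(x); the cost of building the cdf is exactly the numerical integration the last step above calls for.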
I: Importance Sampling is very efficient for high-dimensional integration.
\int_0^1 f(x) \, \mathrm{d}x \xrightarrow{\text{MC}} \frac{1}{N} \sum_i f(x_i), \quad x_i \sim \text{uniform}
= \int_0^1 \frac{f(x)}{q(x)} \, q(x) \, \mathrm{d}x \xrightarrow{\text{importance sampling}} \frac{1}{N} \sum_i \frac{f(x_i)}{q(x_i)}, \quad x_i \sim q(x)
We therefore have to find a q(x) that
- approximates the shape of f(x),
- is "easy" enough such that we can sample from its inverse cdf.

I: The unweighting efficiency measures the quality of the approximation q(x).
If q(x) were constant, each event x_i would require a weight of f(x_i) to reproduce the distribution of f(x). ⇒ "Weighted Events"
To unweight, we need to accept/reject each event with probability f(x_i) / max f(x). The resulting set of kept events is unweighted and reproduces the shape of f(x).
If q(x) ∝ f(x), all events would have the same weight, as the distribution reproduces f(x) directly.
We define the Unweighting Efficiency = (# accepted events) / (# all events) = (mean w) / (max w), with w_i = p(x_i)/q(x_i) = f(x_i) / (F q(x_i)).
(A numerical sketch of importance sampling and unweighting follows below.)

I: The VEGAS algorithm is very efficient.
The VEGAS algorithm (Peter Lepage, 1980)
- assumes the integrand factorizes and bins the 1-dim projections,
- then adapts the bin edges such that the area of each bin is the same.
It does have problems if the features are not aligned with the coordinate axes.
The current python implementation also uses stratified sampling.
[Figure: VEGAS grid adapted on the unit square]
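To make the importance-sampling estimate and the unweighting efficiency above concrete, here is a minimal sketch; the toy integrand f(x) is the same as before and the Breit-Wigner-shaped proposal q(x) is an arbitrary assumption, not the mapping used in the talk.

import numpy as np

rng = np.random.default_rng(1)

def f(x):                                   # same toy integrand as before (not from the talk)
    return np.exp(-50.0 * (x - 0.3) ** 2) + 0.2

# Hypothetical Breit-Wigner-shaped proposal q(x) on [0, 1], peaked near the peak of f,
# with an analytic inverse cdf so that it is "easy" to sample from.
x0, gamma = 0.3, 0.1
a, b = np.arctan((0.0 - x0) / gamma), np.arctan((1.0 - x0) / gamma)

def q(x):                                   # normalized proposal density
    return gamma / ((x - x0) ** 2 + gamma ** 2) / (b - a)

def sample_q(n):                            # sampling via the analytic inverse cdf
    u = rng.uniform(size=n)
    return x0 + gamma * np.tan(a + u * (b - a))

N = 100_000
xs = sample_q(N)
w = f(xs) / q(xs)                           # importance-sampling weights f(x_i)/q(x_i)

F = w.mean()                                # MC estimate of F = int_0^1 f(x) dx
eff = w.mean() / w.max()                    # unweighting efficiency = mean(w)/max(w)
                                            # (the normalization F cancels in this ratio)

keep = rng.uniform(size=N) < w / w.max()    # accept/reject with probability w_i / max(w)
unweighted = xs[keep]                       # unweighted events, distributed as f(x)
print(F, eff, keep.mean())                  # keep.mean() should be close to eff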
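The "current python implementation" mentioned on the VEGAS slide presumably refers to Lepage's vegas package; the following is a minimal usage sketch under that assumption (pip install vegas), with a toy 2-d integrand chosen for illustration, not taken from the talk.

import numpy as np
import vegas

# Toy 2-d integrand whose peak is aligned with the coordinate axes, which the
# factorized VEGAS grid handles well; a ridge along a diagonal would be harder.
def f(x):
    return np.exp(-100.0 * (x[0] - 0.5) ** 2 - 100.0 * (x[1] - 0.5) ** 2)

integ = vegas.Integrator([[0.0, 1.0], [0.0, 1.0]])

integ(f, nitn=5, neval=10_000)              # adaptation iterations (grid warm-up, discarded)
result = integ(f, nitn=10, neval=10_000)    # final estimate with the adapted grid
print(result.summary())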
Part II: The Machine Learning approach
Part II.1: Neural Network Basics
Part II.2: Numerical Integration with Neural Networks
Part II.3: Examples

II.1: Neural Networks are nonlinear functions, inspired by the human brain.
Each neuron transforms the inputs x_0, ..., x_n with a weight W and a bias \vec{b}, computing \sigma(W\vec{x} + \vec{b}).
The activation function \sigma makes it nonlinear; common choices are the rectified linear unit (relu), the leaky relu, and the sigmoid.

II.1: The Loss function quantifies our goal. We have two choices:
Kullback-Leibler (KL) divergence:
D_{KL} = \int p(x) \log\frac{p(x)}{q(x)} \, \mathrm{d}x \approx \frac{1}{N} \sum_i \frac{p(x_i)}{q(x_i)} \log\frac{p(x_i)}{q(x_i)}, \quad x_i \sim q(x)
Pearson \chi^2 divergence:
D_{\chi^2} = \int \frac{(p(x) - q(x))^2}{q(x)} \, \mathrm{d}x \approx \frac{1}{N} \sum_i \frac{p(x_i)^2}{q(x_i)^2} - 1, \quad x_i \sim q(x)
They give the gradient that is needed for the optimization:
\nabla_\theta D_{(KL\ \text{or}\ \chi^2)} \approx -\frac{1}{N} \sum_i \left(\frac{p(x_i)}{q(x_i)}\right)^{(1\ \text{or}\ 2)} \nabla_\theta \log q(x_i), \quad x_i \sim q(x)
We use the ADAM optimizer for stochastic gradient descent: the learning rate for each parameter is adapted separately, but based on previous iterations.
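A minimal NumPy sketch of the building block described above; the layer sizes and random inputs are hypothetical, and this stands in for whatever framework the actual implementation uses.

import numpy as np

# The three activation functions named on the slide.
def relu(z):       return np.maximum(0.0, z)
def leaky_relu(z): return np.where(z > 0.0, z, 0.01 * z)
def sigmoid(z):    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)   # 4 inputs -> 16 neurons (hypothetical sizes)
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)    # 16 neurons -> 1 output

x = rng.uniform(size=4)                 # input vector (x_0, ..., x_n)
hidden = relu(W1 @ x + b1)              # each neuron computes sigma(W x + b)
out = sigmoid(W2 @ hidden + b2)         # stacking layers gives the full nonlinear function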
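A minimal sketch of the gradient estimator and the ADAM update, written in PyTorch; here the trainable density q_theta is just a 1-d Gaussian standing in for the flow-based q(x), and the target p(x) is a fixed Gaussian. All of this is an illustrative assumption, not the talk's implementation.

import torch

# Target p(x) and trainable q_theta(x): toy 1-d Gaussians (illustration only).
p = torch.distributions.Normal(0.3, 0.1)
mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

# ADAM adapts the learning rate of each parameter separately, using running
# averages built from previous iterations.
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    q = torch.distributions.Normal(mu, log_sigma.exp())
    x = q.sample((1024,))                                     # x_i ~ q(x), no gradient flow
    ratio = (p.log_prob(x) - q.log_prob(x)).exp().detach()    # p(x_i)/q(x_i), held fixed
    # Surrogate whose gradient is -(1/N) sum (p/q) grad log q(x_i), i.e. the KL
    # estimator quoted above; use ratio**2 instead for the Pearson chi^2 divergence.
    loss = -(ratio * q.log_prob(x)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())   # should move toward 0.3 and 0.1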