Systematic Uncertainties and Domain Adaptation

Michael Kagan, SLAC, Hammers and Nails, July 2017

The Goal [ATLAS-CONF-2017-043]

The Higgs boson!
• We want to reject the "no Higgs" (background-only) hypothesis
• And hopefully measure the size of the signal
• This requires accurate predictions for the signal and the background
• And we need to know how wrong these predictions could be...

Sources of Systematic Uncertainty [ATLAS-CONF-2017-041]
• Experimental: how well we understand the detector, and our ability to identify and measure the properties of particles given that they are present
• Theoretical: how well we understand the underlying theoretical model of the physical processes
• Modeling: how well we can constrain backgrounds with auxiliary measurements in control regions

Error Propagation
• Random variables x = {x_1, x_2, ..., x_N} distributed according to p(x)
  – Assume we only know E[x_i] = µ_i and Cov[x_i, x_j] = V_ij
• For some function f(x), to first order:

    f(x) \approx f(\mu) + \sum_{i=1}^{N} \left.\frac{\partial f}{\partial x_i}\right|_{x=\mu} (x_i - \mu_i)

• Then E[f(x)] ≈ f(µ), and

    \sigma_f^2 = E[(f(x) - f(\mu))^2] \approx \sum_{i,j=1}^{N} \left.\frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\right|_{x=\mu} V_{ij}

• Often we only have samples of x, but we know how x varies under some uncertainty: x → x ± Δx, where Δx is a one-standard-deviation change in x
• Then the error on f(x) is often approximated as ±Δf(x) = f(x ± Δx) − f(x)
• So we vary x and see how f(x) changes; the distributions are often histogrammed, with ±Δf(x) shown as a band on the histogram bin counts

Hypothesis Testing

    \lambda(\mathcal{D}; \theta_0, \theta_1) = \prod_{x \in \mathcal{D}} \frac{p(x|\theta_0)}{p(x|\theta_1)}

• The likelihood p(x|θ) is parameterized by θ
• The likelihood ratio is used as a test statistic for hypothesis testing between the null hypothesis (θ_0) and the alternative hypothesis (θ_1)
• The parameterization θ = (µ, ν) includes parameters of interest (µ) and nuisance parameters (ν)

[For more details, see Glen Cowan's talk]
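The two error-propagation recipes above, the covariance formula and the ±Δx variation, can be compared numerically. This is a minimal sketch: the function f, the central values, and the covariance matrix are all invented for illustration.

```python
import numpy as np

# Minimal sketch of first-order error propagation.  The function f, the
# central values mu, and the covariance matrix V are invented for illustration.

def f(x):
    # Some smooth function of two inputs.
    return x[0] * x[1] + x[0] ** 2

mu = np.array([1.0, 2.0])          # E[x]
V = np.array([[0.04, 0.01],        # Cov[x_i, x_j]
              [0.01, 0.09]])

# Numerical gradient of f at x = mu (central differences).
eps = 1e-6
grad = np.array([
    (f(mu + eps * np.eye(2)[i]) - f(mu - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])

# sigma_f^2 ~= sum_ij (df/dx_i)(df/dx_j) V_ij
sigma_f = np.sqrt(grad @ V @ grad)

# The "vary and look" alternative: shift each input by one standard
# deviation and record the change in f (the band drawn on histograms).
dx = np.sqrt(np.diag(V))
shifts = [f(mu + dx * np.eye(2)[i]) - f(mu) for i in range(2)]

print(sigma_f, shifts)
```

For a linear f the two recipes agree; the one-sided shifts additionally pick up second-order terms, and on their own they ignore the correlations encoded in the off-diagonal V_ij.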
Nuisance Parameters and the Likelihood

    L(\mu) = \prod_{i=1}^{N} \frac{(\mu s_i + b_i)^{n_i}}{n_i!} \, e^{-(\mu s_i + b_i)}

• Comparing binned counts between observed data and prediction:
  – n_i is the observed count in bin i
  – s_i is the predicted signal count in bin i
  – b_i is the predicted background count in bin i
  – µ is the parameter of interest, the "signal strength"
• With a nuisance parameter ν, whose uncertainty follows some prior (here a normal distribution centered at 1):

    L(\mu, \nu) = \prod_{i=1}^{N} \frac{(\mu s_i + \nu b_i)^{n_i}}{n_i!} \, e^{-(\mu s_i + \nu b_i)} \, N(\nu; 1, \sigma)

• Frequently the profile likelihood is used, maximizing over the nuisance parameters for each value of µ:

    L_{\text{profile}}(\mu) = \max_{\nu} L(\mu, \nu)

[For more details, see Glen Cowan's talk]

Effect of Systematics [arXiv:1408.6865]
• Systematics can have a big effect; we want to mitigate them as much as possible

Simulation and Analysis at the LHC

How do we get these predictions?
• The Standard Model tells us how to compute interaction rates and their distributions, including higher-order terms, but we can only compute an approximation
• Several different "generators" simulate the primary interactions and evolve them into multi-body final states
• The dimensionality of the problem becomes enormous; we cannot compute the full pdf
• Simulating these processes is therefore done through Monte Carlo sampling

Generator Uncertainties [arXiv:1609.00607]
• Different generators can sometimes give rather different predictions

Uncertainty in the Theory Model
• Our model of the true distributions has approximations in it:
  – x_t = g(z_t; y_t = c, ν_t)
  – g(...) is the generator of events, which can be conditioned on the class label
  – z_t are latent random numbers (random seeds)
  – ν_t are parameters
• We don't know the exact model, but we can quantify how wrong the approximation could be
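One way to picture this is to vary a theory nuisance parameter ν_t in a toy generator and compare the resulting "true" distributions. This is a minimal sketch: the generator, its Gaussian "mass peak", and all parameter values are invented for illustration.

```python
import numpy as np

# Toy illustration of alternative "true" distributions obtained by varying
# a theory nuisance parameter nu_t.  The generator and its numbers are
# invented for illustration.
rng = np.random.default_rng(0)

def generate(n_events, signal, nu_t, rng):
    """g(z; y, nu_t): sample a toy 'mass' observable, conditioned on the class.

    Signal is a Gaussian peak whose width is scaled by nu_t;
    background is a falling exponential.
    """
    if signal:
        return rng.normal(90.0, 3.0 * nu_t, n_events)
    return 40.0 + rng.exponential(30.0, n_events)

bins = np.linspace(40, 140, 51)
nominal, _ = np.histogram(generate(100_000, True, 1.0, rng), bins=bins)
varied_up, _ = np.histogram(generate(100_000, True, 1.2, rng), bins=bins)

# The variation changes the shape: a wider peak has a lower maximum bin.
print(nominal.max(), varied_up.max())
```

Each choice of ν_t gives a different member of the family p(x | ν_t); an analysis would propagate such variations as a systematic uncertainty.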
• The parameters ν_t have typically been constrained by other experiments, or theoretical physicists provide bounds on them
• Variations of the parameters ν_t can then be performed to get alternative true distributions:

    p(x) \rightarrow p(x, \nu_t) = p(x \mid \nu_t)\, p(\nu_t)

[Figure: simulated true distribution p_t(x) versus mass, with signal and background components]

Very Big Detectors: The ATLAS Experiment
• ~10^8 detector channels
• Size: 46 m long, 25 m high, 25 m wide; weight: 7000 tons
• Data: ~300 MB/sec, ~3000 TB/year

From Theory to Experiment [from K. Cranmer]
• Particles (e.g. bottom quarks, muons, electrons) interact with custom detectors
  – At the LHC, detectors can have 10^8 sensors
• We develop algorithms to identify particles from the detector signals
• We develop a small set (~20-40) of engineered features that describe the interesting properties of the event
• We train a classifier on these features to separate signal and background

Getting it right, we need:
• An accurate model of the underlying physics, including the properties of the proton, the primary interaction, and the "showering" process
• An accurate and detailed model of the detector
• Accurate models of particle interactions with materials
• Tuned parameters in both the underlying theory model and the particle-matter interactions
• And we need to correct any residual mismodeling as best we can → calibrations / residual corrections

Simulating the Experiment [ATL-PHYS-PUB-2015-050]
• Experiment simulation includes geometry, (mis)alignment, material properties, etc.
• Detector and particle-material interactions are simulated with GEANT
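Full GEANT-level simulation is far beyond a snippet, but the idea of pushing generated particles through a parameterized detector response can be caricatured. This is a minimal sketch: the truth spectrum, the resolution, and the efficiency numbers are invented for illustration.

```python
import numpy as np

# Caricature of detector simulation: generated ("truth") particles are
# pushed through a parameterized detector response.  The spectrum,
# resolution, and efficiency are invented; real experiments use GEANT.
rng = np.random.default_rng(4)

truth_energy = rng.exponential(50.0, 100_000) + 10.0   # toy truth spectrum (GeV)

def detector_response(energy, rng, resolution=0.10, efficiency=0.95):
    """Apply a Gaussian energy smearing and a flat detection inefficiency."""
    detected = rng.random(len(energy)) < efficiency
    smeared = energy * (1.0 + rng.normal(0.0, resolution, len(energy)))
    return smeared[detected]

reco_energy = detector_response(truth_energy, rng)
print(len(reco_energy) / len(truth_energy), reco_energy.mean(), truth_energy.mean())
```

The mismodeling discussed next arises when the assumed response (here the resolution and efficiency) differs from the real detector's.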
• Example [ATL-PHYS-PUB-2016-015]: the uncertainty in our simulated electron energy measurements comes in part from our uncertainty in the amount of material in the detector

Calibration / Modeling Corrections
• If we want to calibrate the modeling of a particle, we find a pure sample of that particle and derive corrections to the simulation to match data
• Example:
  – Z bosons decay to two electrons with a combined mass of ≈ 90 GeV
  – p(particle = electron | m_ee ≈ 90 GeV) ≈ 99%
  – So we can select a (nearly) pure electron sample by looking for electron pairs consistent with a Z boson
  – We then use the known properties of the Z boson to calibrate
• For samples from the simulation, the momenta are adjusted a posteriori:
  – p_i → s · p_i · (1 + z)
  – s is a constant scale parameter, matching the mean of simulation and data
  – z ~ N(z; 0, σ_z) is a resolution smearing, matching the variance of simulation and data
• We measure s ± Δs and σ_z ± Δσ_z with finite precision, giving a family of distributions p(p_i; s, σ_z):
  – s ~ N(s; s_0, Δs),  z ~ N(z; 0, σ_z),  σ_z ~ N(σ_z; σ_{z,0}, Δσ_z)
• The corrected momenta enter the dielectron mass:

    m_{ee}^2 = (E_1 + E_2)^2 - (\vec{p}_1 + \vec{p}_2)^2

[Figure: m_ee distributions comparing the true distribution with no detector, the simulated distribution with detector, the corrected simulation with its s and σ_z variations, and the observed data counts; ATL-PHYS-PUB-2016-015]

Moving Forward
• Our known unknowns, our systematic uncertainties, will diminish our sensitivity to new physics
• We want to reduce them where we can, improving our models with inputs from new measurements
• And we want to diminish their impact as much as possible; can we do this when training a classifier to separate signal from background?
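The scale-and-smear correction and its uncertainty band can be sketched as follows. This is a minimal sketch: the simulated sample, the measured values s_0 ± Δs and σ_{z,0} ± Δσ_z, and the binning are all invented for illustration.

```python
import numpy as np

# Toy sketch of the correction p_i -> s * p_i * (1 + z), z ~ N(0, sigma_z),
# and of propagating the measurement uncertainties as a band.
# All numbers are invented for illustration.
rng = np.random.default_rng(1)

sim = rng.normal(88.0, 2.0, 50_000)   # simulated "mass" values, mismodeled
s0, d_s = 90.0 / 88.0, 0.002          # measured scale factor and its uncertainty
sz0, d_sz = 0.02, 0.005               # measured extra smearing and its uncertainty

def correct(values, s, sigma_z, rng):
    """Apply the scale factor and the Gaussian resolution smearing."""
    z = rng.normal(0.0, sigma_z, len(values))
    return s * values * (1.0 + z)

nominal = correct(sim, s0, sz0, rng)
up = correct(sim, s0 + d_s, sz0 + d_sz, rng)                # +1 sigma variation
down = correct(sim, s0 - d_s, max(sz0 - d_sz, 0.0), rng)    # -1 sigma variation

bins = np.linspace(80, 100, 41)
h_nom, _ = np.histogram(nominal, bins=bins)
h_up, _ = np.histogram(up, bins=bins)
h_down, _ = np.histogram(down, bins=bins)
band = np.abs(h_up - h_down) / 2.0    # symmetrized per-bin uncertainty band

print(nominal.mean())
```

The band array is what would be drawn around the corrected-simulation histogram when comparing to the observed data counts.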
Systematics in a Classifier: The Standard Approach [S. Hagebock]
• Train the classifier and evaluate it on the nominal sample
• Run the systematically varied samples through the classifier and see how the score distribution changes
  – Systematics are only evaluated after training is complete
• There is no notion of systematics in the training itself

Domain Adaptation

Typical Binary Classification Problem
• Training data: {x_i, y_i}_{i=1}^{N}, with x_i in R^d, y_i in {0, 1}, and x_i, y_i ~ p(x, y)
• Testing / evaluation data: {x_i}, with x_i ~ p(x)
• Learn a classifier h(x): model p(y|x) and label the points {x_i}

Domain Adaptation [N. Courty, https://mathematical-coffees.github.io/mc01-ot/]
• The source domain has labels; the target domain has no labels!
• Training on the source data only won't generalize well!
• Training data: a labeled source sample S = {x_i^s, y_i^s}_{i=1}^{N}, with x_i^s, y_i^s ~ p_s(x, y), and an unlabeled target sample U_t = {x_i^t}, with x_i^t ~ p_t(x)
• Testing / evaluation data: {x_i^t}, with x_i^t ~ p_t(x)
• The distributions of the source and target domains are not the same: p_s(x, y) ≠ p_t(x, y)
• Setting:
  – Perform the same classification task in both domains
  – The distributions of target and source are closely related
• How can we learn a classifier from source data that works well on target data?
• It depends on what we know about the problem!
  – We assume the tasks are related!
  – Do we know anything about what has changed between the domains?
  – We can use the (unlabeled) target data to help in the training process!
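The danger of training on the source domain alone can be demonstrated with a toy example: a cut optimized on labeled source data is applied to a target domain whose feature distribution has shifted. The distributions and the size of the shift are invented for illustration.

```python
import numpy as np

# Toy demonstration that a classifier fit on the source domain degrades
# on a shifted target domain.  All distributions are invented.
rng = np.random.default_rng(2)
n = 20_000

def sample(shift, rng):
    """Two 1-D Gaussian classes; `shift` moves both (a covariate shift)."""
    x0 = rng.normal(-1.0 + shift, 1.0, n)   # class 0
    x1 = rng.normal(+1.0 + shift, 1.0, n)   # class 1
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

# "Train" on source: for equal-width Gaussians the optimal cut is the
# midpoint of the two class means, estimated from the labeled source data.
xs, ys = sample(0.0, rng)
cut = (xs[ys == 0].mean() + xs[ys == 1].mean()) / 2.0

def accuracy(x, y, cut):
    return np.mean((x > cut) == (y == 1))

xt, yt = sample(1.5, rng)   # target domain: both classes shifted by +1.5
acc_source = accuracy(xs, ys, cut)
acc_target = accuracy(xt, yt, cut)
print(acc_source, acc_target)
```

The cut is unchanged between domains, yet its accuracy drops sharply on the target sample; this is the gap the theoretical bound below controls.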
Theoretical Bounds [Ben-David et al., 2010]
• The error of a classifier in the target domain is upper bounded by the sum of three terms:
  – the error of the classifier in the source domain
  – the divergence (or "distance") between the feature distributions p_s(x) and p_t(x)
  – how closely related the classification tasks are

Approaches to Domain Adaptation
• Several methods address domain adaptation problems, differing in the details of the setting:
• Instance weighting
  – Often used in the covariate shift setting
• Self-labeling approaches
  – Iteratively predict target labels from the source model and re-learn
  – Not discussed today
• Feature representation approaches
  – Learn a feature representation f(x) that has the same distribution for target and source data
  – Adversarial domain adaptation takes this approach!

Covariate Shift
• General situation: p_s(x, y) ≠ p_t(x, y)
• In this setting we assume:
  – The class probabilities conditional on the features are the same: p_t(y|x) = p_s(y|x) = p(y|x)
  – The marginal distributions differ: p_s(x) ≠ p_t(x)

[M.
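Instance weighting can be made concrete in the idealized case where both marginal densities are known: reweighting source samples by w(x) = p_t(x)/p_s(x) turns source-based averages into estimates of target expectations. This is a minimal sketch with invented Gaussian densities and an invented statistic f; in practice the density ratio must itself be estimated.

```python
import numpy as np

# Toy sketch of instance (importance) weighting under covariate shift:
# reweight source samples by w(x) = p_t(x) / p_s(x).  Both densities are
# known Gaussians here, which is an idealization for illustration.
rng = np.random.default_rng(3)

mu_s, mu_t, sigma = 0.0, 1.0, 1.0

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs = rng.normal(mu_s, sigma, 200_000)                 # source samples
w = gauss(xs, mu_t, sigma) / gauss(xs, mu_s, sigma)   # importance weights

f = lambda x: x ** 2                                  # any statistic of interest

naive = f(xs).mean()                                  # estimates E_s[f], not E_t[f]
weighted = np.sum(w * f(xs)) / np.sum(w)              # estimates E_t[f]
exact = sigma ** 2 + mu_t ** 2                        # E_t[x^2] = 2 here

print(naive, weighted, exact)
```

The same weights can be applied per event in a classifier's training loss, which is the instance-weighting approach listed above.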
