
Practical Statistics for Particle Physicists

Luca Lista, INFN

Lecture 1

The goals of a physicist

• Measurements (e.g.: mt = 173.49 ± 1.07 GeV)
• Discoveries (e.g.: the Higgs boson!)

Starting ingredients

• Particle collisions are recorded in the form of signals delivered by detectors
– Measurements of: particle position in the detector, energy, time, …
• Usually a large number of collision events is collected by an experiment, each event usually containing large amounts of data
• Collision event data are all different from each other
– Intrinsic randomness of the physics process (Quantum Mechanics: P ∝ |A|²)
– Detector response is somewhat random (fluctuations, resolution, …)

Theory inputs

• Distributions of measured quantities in data:
– are predicted by a theory model,
– depend on some theory parameters,
– e.g.: particle mass, cross section, etc.
• Given our data sample, we want to:
– measure theory parameters
• e.g.: mt = 173.49 ± 1.07 GeV
– answer questions about the nature of the data
• Is there a Higgs boson? → Yes! (How strong is the evidence? Quantify!)
• Is there Dark Matter? → No evidence, so far…
• If not, what is the range of theory parameters compatible with the observed data? What range can we exclude?

How to proceed

• First of all, we need to have probability theory under control, because our data have intrinsic randomness
• Then, we should use statistics on our data and our theory model in order to extract information that will address our questions
• I.e.: we use statistics for data analysis


Introduction to probability

• Probability can be defined in different ways
• The applicability of each definition depends on the kind of claim to which we are applying the concept of probability
• One subjective approach expresses the degree of belief/credibility of the claim, which may vary from subject to subject
• For repeatable experiments, probability may be a measure of how frequently the claim is true

Repeatable experiments

• What’s the probability to extract an ace from a deck of cards?
• What is the probability to win a lottery (or bingo, …)?
• What is the probability that a pion is incorrectly identified as a muon in CMS?
• What is the probability that a fluctuation in the background can produce a peak in the γγ spectrum with a magnitude at least equal to what has been observed by ATLAS?
• Note: this is a different question w.r.t.: what is the probability that the peak is due to a background fluctuation? (non-repeatable!)

Non-repeatable claims

• Could be about future events:
– what is the probability that tomorrow it will rain in Geneva?
– what’s the probability that your favorite team will win the next championship?
• But also past events:
– what’s the probability that dinosaurs went extinct because of an asteroid?
• More generally, it’s about unknown events:
– what is the probability that dark matter is made of particles heavier than 1 TeV?
– what is the probability that climate changes are mainly due to human intervention?

Classical probability

• Probability determined by symmetry properties of a random device
• “Equally undecided” about the event outcome, according to Laplace’s definition

Probability: P = (number of favorable cases) / (number of total cases)

Pierre Simon Laplace (1749-1827)

[Examples from the slide’s pictures: P = 1/2, P = 1/6 (each die), P = 1/4, P = 1/10]

Composite cases

• Composite cases are managed via combinatorial analysis
• Reduce the (composite) event of interest into elementary equiprobable events

E.g.: sum of two dice:

sum = 2: {(1,1)}
sum = 3: {(1,2), (2,1)}
sum = 4: {(1,3), (2,2), (3,1)}
sum = 5: {(1,4), (2,3), (3,2), (4,1)}
etc. …

• Statements about an event can be defined via set algebra
– and/or/not ⇒ intersection/union/complement
– E.g.: “the sum of two dice is even and greater than four”:

{(d1, d2): mod(d1 + d2, 2) = 0} ∩ {(d1, d2): d1 + d2 > 4}
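As a cross-check, a minimal enumeration sketch in Python (an illustration, not part of the original slides) that builds the equiprobable elementary cases and counts the favorable ones for this composite event:

# Classical probability: count favorable cases among equiprobable ones.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # all 36 (d1, d2) pairs
favorable = [(d1, d2) for d1, d2 in outcomes
             if (d1 + d2) % 2 == 0 and d1 + d2 > 4]  # even and > 4
print(len(favorable), "/", len(outcomes), "=", len(favorable) / len(outcomes))
# prints: 14 / 36 = 0.388...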

“Event” word overloading

• Note that in physics and statistics the word “event” usually has different meanings
• Statistics: a subset of the sample space
– E.g.: “the sum of two dice is ≥ 5”
• Physics: the result of a collision, as recorded by our experiment
– E.g.: a Higgs to two-photon candidate event
• In several concrete cases, an event in statistics may correspond to many possible collision events
– E.g.: “pT(γ) > 40 GeV”, “the measured mH is > 125 GeV”

Frequentist probability

• Probability P = frequency of occurrence of an event in the limit of a very large number (N → ∞) of repeated trials

Probability: P = lim (N → ∞) (number of favorable cases) / N, where N = number of trials

• Exactly realizable only with an infinite number of trials
– Conceptually may be unpleasant
– Pragmatically acceptable to physicists
• Only applicable to repeatable experiments
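A minimal simulation sketch (an illustration, not from the slides) showing the frequency converging toward the classical value 14/36 for the two-dice event above:

import random

random.seed(1)
for n_trials in (100, 10_000, 1_000_000):
    count = 0
    for _ in range(n_trials):
        s = random.randint(1, 6) + random.randint(1, 6)   # sum of two dice
        if s % 2 == 0 and s > 4:
            count += 1
    print(n_trials, count / n_trials)   # approaches 14/36 = 0.3889 as N grows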

Subjective (Bayesian) probability

• Expresses one’s degree of belief that a claim is true
– How strong? Would you bet?
– Applicable to all unknown events/claims, not only repeatable experiments
– Each individual may have a different opinion/prejudice
• Quantitative rules exist about how subjective probability should be modified after learning about some observation/evidence
– Consistent with the Bayes theorem (→ introduced in the next slides)
– Prior probability → posterior probability (following the observation)
– The more information we receive, the less sensitive the posterior is to the prior subjective prejudice (except in pathological cases…)

Kolmogorov approach

• Axiomatic probability definition
– Terminology: Ω = sample space, F = event space, P = probability measure
– Let (Ω, F ⊆ 2^Ω, P) be a measure space that satisfies:

1. P(E) ≥ 0 for every E ∈ F
2. P(Ω) = 1 (normalization)
3. P(∪i Ei) = Σi P(Ei) for any countable collection of pairwise disjoint events Ei

• The same formalism applies to both frequentist and Bayesian probability

[Figure: disjoint events E0, E1, E2, E3, …, En inside the sample space Ω]

Probability distributions

• Given a discrete random variable n, we can assign a probability P(n) to each individual value
• In case of a continuous variable, the probability assigned to an individual value may be zero
• A probability density function (PDF) f(x) better quantifies the probability content (unlike P({x}) = 0!):

P(x1 ≤ x ≤ x2) = ∫ f(x) dx over [x1, x2]

• Discrete and continuous distributions can be combined using Dirac’s delta functions
• E.g.: f(x) = 0.5 δ(x) + 0.5 g(x): 50% probability to have zero (P({0}) = 0.5), 50% distributed according to the continuous PDF g(x)

[Figures: a discrete distribution P vs. n; a continuous density f(x) vs. x]

Gaussian distribution

• Many random variables in real experiments follow a Gaussian distribution
• Central limit theorem: the sum of multiple random contributions is approximately Gaussian, regardless of the individual distributions
• Frequently used to model detector resolution

g(x; µ, σ) = 1/√(2πσ²) · exp(−(x − µ)²/2σ²)

Carl Friedrich Gauss (1777-1855)

[Figure: Gaussian curve; p = 68.3% within ±1σ, p = 15.8% in each tail]

nσ    Prob.
1σ    0.683
2σ    0.954
3σ    0.997
4σ    1 − 6.3×10⁻⁵
5σ    1 − 5.7×10⁻⁷
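The table can be reproduced with the Gaussian cumulative integral; a minimal sketch using only the standard library:

from math import erf, sqrt

# P(|x - mu| < n*sigma) for a Gaussian equals erf(n / sqrt(2))
for n in range(1, 6):
    print(f"{n} sigma: p = {erf(n / sqrt(2)):.10f}")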

Poisson distribution

• Distribution of the number of occurrences of a random event uniformly distributed in a measurement range, when the rate is known
– E.g.: number of rain drops in a given area and in a given time interval; number of cosmic rays crossing a detector in a given time interval

P(n; ν) = νⁿ e^(−ν) / n!

Siméon-Denis Poisson (1781-1840)

• Can be approximated with a Gaussian distribution for large values of ν

[Figure: Poisson distributions vs. n for ν = 2, 5, 10, 20, 30]
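A minimal sketch (illustrative numbers) comparing the Poisson pmf with its Gaussian approximation of mean ν and variance ν; the agreement improves as ν grows:

from math import exp, factorial, pi, sqrt

def poisson_pmf(n, nu):
    return nu**n * exp(-nu) / factorial(n)

def gauss_approx(n, nu):
    # Gaussian with mean nu and variance nu, evaluated at integer n
    return exp(-(n - nu)**2 / (2 * nu)) / sqrt(2 * pi * nu)

for nu in (2, 20):
    n = int(nu)   # compare near the peak
    print(nu, poisson_pmf(n, nu), gauss_approx(n, nu))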

Binomial distribution

• Probability to extract n red balls over N trials, given the fraction p of red balls in the basket
– Red: p = 3/10
– White: 1 − p = 7/10

P(n; N, p) = N! / (n! (N − n)!) · pⁿ (1 − p)^(N−n)

• Typical application in physics: detector efficiency (ε = p)
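A minimal sketch of the binomial pmf with the slide’s fraction p = 3/10 (the choice N = 10 trials is an assumption for illustration):

from math import comb

def binomial_pmf(n, N, p):
    return comb(N, n) * p**n * (1 - p)**(N - n)

# Probability to extract n red balls in N = 10 trials with p = 3/10
for n in range(11):
    print(n, round(binomial_pmf(n, 10, 0.3), 4))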

PDFs in more dimensions

• In more dimensions (n random variables), the PDF can be defined as:

f(x1, …, xn), such that dP = f(x1, …, xn) dx1 ⋯ dxn

• The probability associated to an event E is obtained by integrating the PDF over the corresponding set in the sample space Ω:

P(E) = ∫E f(x1, …, xn) dx1 ⋯ dxn

[Figure: a set E inside the sample space Ω]
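A minimal Monte Carlo sketch of integrating a PDF over a set E (assumed example: two independent standard Gaussians, E = {x > 0 and y > 0}):

import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)
p_E = np.mean((x > 0) & (y > 0))   # fraction of sampled points inside E
print(p_E)                         # close to the exact value 1/4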

Mean and variance of random vars.

• Given a random variable x with distribution f(x), we can define:
• Mean or expected value: ⟨x⟩ = ∫ x f(x) dx
• Variance: Var[x] = ∫ (x − ⟨x⟩)² f(x) dx = ⟨x²⟩ − ⟨x⟩²
• Standard deviation: σx = √Var[x] (r.m.s.: root mean square)
• Covariance and correlation coefficient of two variables x and y:

cov(x, y) = ⟨xy⟩ − ⟨x⟩⟨y⟩, ρxy = cov(x, y) / (σx σy)

Note: integration extends over x and y in two dimensions when computing an average value!
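These definitions have direct sample counterparts; a minimal numpy sketch with assumed toy numbers:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
y = 0.5 * x + rng.normal(size=100_000)        # y is correlated with x
print(x.mean(), x.var())                      # approx. 1.0 and 4.0
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])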

Conditional probability

• Probability of A, given B: P(A | B), i.e.: the probability that an event known to belong to set B also belongs to set A:
– P(A | B) = P(A ∩ B) / P(B)
– Notice that: P(A | Ω) = P(A ∩ Ω) / P(Ω) = P(A)
• Event A is said to be independent of B if the probability of A given B is equal to the probability of A:
– P(A | B) = P(A)
• If A is independent of B, then P(A ∩ B) = P(A) P(B)
• → If A is independent of B, then B is independent of A

[Figure: overlapping sets A and B inside Ω]

Independent variables

• 1D projections (marginal distributions):

fx(x) = ∫ f(x, y) dy, fy(y) = ∫ f(x, y) dx

• x and y are independent if:

f(x, y) = fx(x) fy(y)

• We saw that A and B are independent events if: P(A ∩ B) = P(A) P(B)
• Where A = {xʹ: x < xʹ < x + δx}, B = {yʹ: y < yʹ < y + δy}

[Figure: bands A and B in the (x, y) plane]

The Bayes theorem

P(A | B) = P(B | A) P(A) / P(B)

Thomas Bayes (1702-1761)

• P(A) = prior probability
• P(A | B) = posterior probability

Here it is!

[Image: The Big Bang Theory © CBS]

Bayesian posterior probability

• The Bayes theorem allows determining the probability of hypotheses or claims H that are not related to random variables, given an observation or evidence E:

P(H | E) = P(E | H) P(H) / P(E)

• P(H) = prior probability
• P(H | E) = posterior probability, given E
• The Bayes rule defines a rational way to modify one’s prior belief once some observation is known

Example (frequentist): muon fake rate

• A detector identifies muons with high efficiency, ε = 95%
• A small fraction δ = 5% of pions is incorrectly identified as muons (“fakes”)
• If a particle is identified as a muon, what is the probability that it is really a muon?
– The answer also depends on the composition of the sample!
– i.e.: the fraction of muons and pions in the overall sample

This example is usually presented as a classic false-positive case.

Naïve answers about false-positive probability are often wrong!

Fakes and Bayes theorem

• Using the Bayes theorem:
– P(µ|+) = P(+|µ) P(µ) / P(+)
• Where our inputs are:
– P(+|µ) = ε = 0.95, P(+|π) = δ = 0.05
• We can decompose P(+) using the law of total probability:
– P(+) = P(+|µ) P(µ) + P(+|π) P(π)
• Putting it all together:
– P(µ|+) = ε P(µ) / (ε P(µ) + δ P(π))
• Assume we have a sample made of P(µ) = 4% muons and P(π) = 96% pions; we have:
– P(µ|+) = 0.95 × 0.04 / (0.95 × 0.04 + 0.05 × 0.96) ≅ 0.44
• Even if the selection efficiency is very high, the low sample purity makes P(µ|+) lower than 50%

[Figure: law of total probability, with Ω partitioned into sets A1, …, An and an event E; here E = ‘+’ and the Ai = µ, π]
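The arithmetic in one minimal sketch (numbers taken from the slide):

eps, delta = 0.95, 0.05          # P(+|mu), P(+|pi)
p_mu, p_pi = 0.04, 0.96          # sample composition
p_plus = eps * p_mu + delta * p_pi       # law of total probability: P(+)
print(p_plus)                            # 0.086
print(eps * p_mu / p_plus)               # P(mu|+) ~ 0.44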

Before any muon id. information

[Figure: all particles, P(Ω) = 100%; muons, P(µ) = 4%; pions, P(π) = 96%]

After the muon id. measurement

[Figure: P(+) = 8.6%, P(−) = 91.4%; muons, P(µ) = 4%: P(+|µ) = ε = 95%, P(−|µ) = 1 − ε = 5%; pions, P(π) = 96%: P(+|π) = δ = 5%, P(−|π) = 1 − δ = 95%]

The likelihood function

• In many cases, the outcome of our experiment can be modeled as a set of random variables x1, …, xn, whose distribution takes into account:
– the intrinsic randomness of the sample (quantum physics is intrinsically random),
– detector effects (resolution, efficiency, …).
• Theory and detector effects can be described according to some parameters θ1, …, θm, whose values are, in most cases, unknown
• The overall PDF, evaluated at our observation x1, …, xn, is called the likelihood function:

L(x1, …, xn; θ1, …, θm) = f(x1, …, xn; θ1, …, θm)

• In case our sample consists of N independent measurements (collision events), the likelihood function can be written as:

L = ∏i f(xi; θ1, …, θm), i = 1, …, N
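As a concrete illustration (an assumed exponential model, not from the slides), the log-likelihood of N independent measurements is the sum of the per-event log PDFs:

import numpy as np

def log_likelihood(tau, data):
    # log L for f(x; tau) = exp(-x/tau) / tau, independent measurements
    return np.sum(-data / tau - np.log(tau))

data = np.array([1.2, 0.4, 2.1, 0.7, 1.5])    # toy measurements
for tau in (0.5, 1.0, 1.18, 2.0):
    print(tau, log_likelihood(tau, data))     # largest near tau = mean = 1.18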

Bayes rule and likelihood function

• Given a set of measurements x1, …, xn, the Bayesian posterior PDF of the unknown parameters θ1, …, θm can be determined as:

P(θ1, …, θm | x1, …, xn) = L(x1, …, xn; θ1, …, θm) π(θ1, …, θm) / ∫ L(x1, …, xn; θ) π(θ) dmθ

• Where π(θ1, …, θm) is the subjective prior probability
• The denominator ∫ L(x; θ) π(θ) dmθ is a normalization factor
• The observation of x1, …, xn modifies the prior knowledge of the unknown parameters θ1, …, θm
• If π(θ1, …, θm) is sufficiently smooth and L is sharply peaked around the true values of θ1, …, θm, the resulting posterior will not depend strongly on the choice of the prior
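A minimal numerical sketch of this formula, reusing the toy exponential sample above with a uniform prior (an assumption for illustration):

import numpy as np

data = np.array([1.2, 0.4, 2.1, 0.7, 1.5])
tau = np.linspace(0.2, 5.0, 1000)
log_L = np.array([np.sum(-data / t - np.log(t)) for t in tau])
posterior = np.exp(log_L - log_L.max())   # uniform prior: posterior ∝ L
posterior /= np.trapz(posterior, tau)     # denominator = normalization factor
print(tau[np.argmax(posterior)])          # mode near the sample mean, 1.18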

Repeated use of Bayes theorem

• The Bayes theorem can be applied sequentially for repeated independent observations (posterior PDF = learning from experiments):

P0 = prior
P1 ∝ P0 ⨉ L1 (after observation 1) → conditioned posterior 1
P2 ∝ P1 ⨉ L2 ∝ P0 ⨉ L1 ⨉ L2 (after observation 2) → conditioned posterior 2
P3 ∝ P0 ⨉ L1 ⨉ L2 ⨉ L3 (after observation 3) → conditioned posterior 3

• Composite likelihood = product of individual likelihoods (for independent observations)
• Note that applying the Bayes theorem directly from the prior to (obs1 + obs2) leads to the same result: P1+2 ∝ P0 ⨉ L1+2 = P0 ⨉ L1 ⨉ L2 ∝ P2
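A minimal sketch verifying that sequential updating reproduces the single batch update (assumed toy model: a binomial parameter q with a uniform prior):

import numpy as np

q = np.linspace(0, 1, 1001)

def likelihood(q, heads, tails):
    return q**heads * (1 - q)**tails

posterior = np.ones_like(q)                      # uniform prior P0
for heads, tails in [(3, 1), (5, 5), (2, 4)]:    # observations 1, 2, 3
    posterior = posterior * likelihood(q, heads, tails)
    posterior /= np.trapz(posterior, q)          # conditioned posteriors P1..P3

batch = likelihood(q, 10, 10)                    # all observations at once
batch /= np.trapz(batch, q)
print(np.allclose(posterior, batch))             # True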

Inference

• Determining information about unknown parameters using probability theory

Probability: theory model → data (the data fluctuate according to the randomness of the process)
Inference: data → theory model (the model parameters are uncertain due to the fluctuations of the data sample)

Bayesian inference

• The posterior PDF provides all the information about the unknown parameters (let’s assume here it’s just a single parameter θ for simplicity)
• Given P(θ | x), we can determine:
– the most probable value (best estimate)
– intervals corresponding to a specified probability (e.g. p = 68.3%, as 1σ for a Gaussian)
• Notice that if π(θ) is a constant, the most probable value of θ corresponds to the maximum of the likelihood function

[Figure: posterior P(θ | x) with a ±δ interval around the mode]

Frequentist inference

• Assigning a probability level to an unknown parameter makes no sense in the frequentist approach
– Parameters are not random variables!
• A frequentist inference procedure determines a central value and an uncertainty interval that depend on the observed measurements
• The central value and the interval extremes are random variables
• No subjective element is introduced in the determination
• The function that returns the central value given an observed measurement is called estimator
• Different estimator choices are possible; the most frequently adopted is the maximum-likelihood estimator, because of its statistical properties discussed in the following

Frequentist coverage

• Repeating the experiment will result each time in a different data sample
• For each data sample, the estimator returns a different central value θ̂
• An uncertainty interval [θ̂ − δ, θ̂ + δ] can be associated with the estimator’s value θ̂
• Some of the confidence intervals contain the fixed and unknown true value of θ, in a fraction equal to 68% of the times, in the limit of a very large number of experiments (coverage)

[Figure: repeated experiments; the interval around each θ̂ compared with the true value of θ]
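A minimal coverage sketch (assumed toy setup: a Gaussian measurement with known σ, interval θ̂ ± σ):

import numpy as np

rng = np.random.default_rng(7)
theta_true, sigma, n_exp = 10.0, 1.0, 100_000
x = rng.normal(theta_true, sigma, n_exp)     # one measurement per experiment
covered = np.abs(x - theta_true) < sigma     # does [x - sigma, x + sigma] cover?
print(covered.mean())                        # ~0.683 in the large-N limit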

Choice of 68% prob. intervals

• Different interval choices are possible, corresponding to the same probability level (usually 68%, as 1σ for a Gaussian):
– equal areas in the right and left tails
– symmetric interval
– shortest interval
– …
• All equivalent for a symmetric distribution (e.g. Gaussian)
• Reported as θ = θ̂ ± δ (sym.) or θ = θ̂ +δup −δlo (asym.)

[Figure: equal-tails interval and symmetric interval, each with p = 68.3% and p = 15.8% tails]

Upper and lower limits

• A fully asymmetric interval choice is obtained by setting one extreme of the interval to the lowest or highest allowed value of the range
• The other extreme indicates an upper or lower limit of the “allowed” range
• For upper or lower limits, usually a probability of 90% or 95% is preferred to the 68% adopted for central intervals
• Reported as: θ < θup (90% CL) or θ > θlo (90% CL)

[Figure: distributions P(θ) with p = 90% below an upper limit, or above a lower limit]

Bayesian inference of a Poissonian

• Posterior PDF, assuming the prior to be π(s):

P(s | n) = P(n | s) π(s) / ∫ P(n | sʹ) π(sʹ) dsʹ

• If π(s) is uniform:

f(s | n) = sⁿ e^(−s) / n!

• Note: f(s | n) is maximum at s = n
• For n = 0 (zero observed events), f(s | 0) = e^(−s), and one may quote an upper limit at 90% or 95% CL:
– s < 2.303 (90% CL)
– s < 2.996 (95% CL)

[Figure: posterior P(s | n) for n = 5 with p = 15.8% tails; posterior f(s | 0) = e^(−s) for n = 0, with p = 10% above the 90% CL limit]
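The quoted limits follow from requiring that the posterior integral up to s_up equals the CL; a minimal sketch:

from math import log

# integral of exp(-s) from 0 to s_up equals CL  =>  s_up = -ln(1 - CL)
for cl in (0.90, 0.95):
    print(f"{cl:.0%} CL: s < {-log(1 - cl):.3f}")   # 2.303 and 2.996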

Error propagation: Bayesian inference

• Applying a parameter transformation, say η = H(θ), results in a transformed central value and a transformed uncertainty interval
• The error propagation can be done by transforming the posterior PDF, then computing the interval on the transformed PDF:

P(η) = ∫ δ(η − H(θ)) P(θ) dθ

• Transformations of cases with more than one variable proceed in a similar way:

η = H(θ1, θ2): P(η) = ∫ δ(η − H(θ1, θ2)) P(θ1, θ2) dθ1 dθ2

η1 = H1(θ1, θ2), η2 = H2(θ1, θ2): P(η1, η2) = ∫ δ(η1 − H1) δ(η2 − H2) P(θ1, θ2) dθ1 dθ2
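In practice the transformation is often done by sampling the posterior; a minimal sketch (assumed Gaussian posterior and H(θ) = θ²):

import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(2.0, 0.1, 1_000_000)   # samples from the posterior of theta
eta = theta**2                            # transformed parameter eta = H(theta)
lo, med, hi = np.percentile(eta, [15.8, 50.0, 84.2])
print(f"eta = {med:.3f} +{hi - med:.3f} -{med - lo:.3f}")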

Choosing the prior PDF

• If the prior PDF is uniform in a given variable, it won’t be uniform after a coordinate transformation
• Given a prior PDF in a random variable, there is always a transformation that makes the PDF uniform
• The problem is: choose one metric where the PDF is uniform
• Harold Jeffreys’ prior: choose the prior form that is invariant under parameter transformation:

π(θ) ∝ √det I(θ), where I(θ) is the Fisher information

• Some commonly used cases:
– Poissonian mean: π(µ) ∝ 1/√µ
– Poissonian mean with background b: π(µ) ∝ 1/√(µ + b)
– Gaussian mean: π(µ) ∝ 1 (constant)
– Gaussian standard deviation: π(σ) ∝ 1/σ
– Binomial parameter: π(p) ∝ 1/√(p(1 − p))
• Note: the previous simple Poissonian example was obtained with π(µ) = const.!
• Problematic with PDFs in more than one dimension!

Frequentist inference: estimators

• An estimator is a function of a given set of measurements that provides an approximate value of a parameter of interest appearing in our PDF model (“best fit”)
• Simplest example:
– Assume a Gaussian PDF with a known σ and an unknown µ
– A single experiment provides a measurement x
– We estimate µ as µ̂ = x
– The distribution of µ̂ (repeating the experiment many times) is the original Gaussian
– 68.3% of the experiments (in the limit of a large number of repetitions) will provide an estimate within: µ − σ < µ̂ < µ + σ
• We can quote: µ = x ± σ

The maximum-likelihood method

• The maximum-likelihood estimator is the most widely adopted parameter estimator
• The “best fit” parameters correspond to the set of values that maximizes the likelihood function
– Good statistical properties (→ next slides)
• The maximization can be performed analytically only in the simplest cases, and numerically in most realistic cases
• Minuit is historically the most widely used minimization engine in High Energy Physics
– F. James, 1970s; rewritten in C++ and released within CERN’s ROOT framework
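A minimal numerical maximum-likelihood sketch with scipy (Minuit would play the same role in a real HEP analysis; the exponential model and toy data are assumptions for illustration):

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
data = rng.exponential(scale=1.5, size=1000)   # toy sample, true tau = 1.5

def nll(tau):
    # -log L for f(x; tau) = exp(-x/tau) / tau
    return np.sum(data / tau + np.log(tau))

result = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
print(result.x, data.mean())   # for this model the MLE is the sample mean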
