
Practical statistics for particle physics

R. J. Barlow The University of Huddersfield, Huddersfield, United Kingdom

Abstract This is the write-up of a set of lectures given at the Asia Europe Pacific School of High Energy Physics in Quy Nhon, Vietnam in September 2018, to an audience of PhD students in all branches of particle physics. They cover the different meanings of 'probability', particularly frequentist and Bayesian, the binomial, the Poisson and the Gaussian distributions, hypothesis testing, estimation, errors (including asymmetric and systematic errors) and goodness of fit. Several different methods used in setting upper limits are explained, followed by a discussion of why 5 sigma are conventionally required for a 'discovery'.

Keywords Lectures; statistics; particle physics; probability; estimation; confidence limits.

1 Introduction To interpret the results of your particle physics experiment and see what it implies for the relevant theoretical model and parameters, you need to use statistical techniques. These are a part of your experimental toolkit, and to extract the maximum information from your data you need to use the correct and most powerful statistical tools. Particle physics (like, probably, any field of science) has its own special set of statistical processes and language. Our use is in some ways more complicated (we often fit multi-parameter functions, not just straight lines) and in some ways more simple (we do not have to worry about ethics, or law suits). So the generic textbooks and courses you will meet on 'Statistics' are not really appropriate. That's why HEP schools like this one include lectures on statistics as well as the fundamental real physics, like field theory and physics beyond the Standard Model (BSM). There are several textbooks [1–6] available which are designed for an audience of particle physicists. You will find these helpful—more helpful than general statistical textbooks. You should find one whose language suits you and keep a copy on your bookshelf—preferably purchased—but at least on long term library loan. You will also find useful conference proceedings [7–9], journal papers (particularly in Nuclear Instruments and Methods) and web material: often your own experiment will have a set of pages devoted to the topic.

2 Probability We begin by looking at the concept of probability. Although this is familiar (we use it all the time, both inside and outside the laboratory), its use is not as obvious as you would think.

2.1 What is probability? A typical exam for Statistics 101 (or equivalent) might well contain the question:

Q1 Explain what is meant by the probability $P_A$ of an event A [1]

The '1' in square brackets signifies that the answer carries one mark. That's an indication that just a sentence or two are required, not a long essay. Asking a group of physicists this question produces answers falling into four different categories:

1. $P_A$ is a number obeying certain mathematical rules,
2. $P_A$ is a property of A that determines how often A happens,
3. For N trials in which A occurs $N_A$ times, $P_A$ is the limit of $N_A/N$ for large N, and
4. $P_A$ is my belief that A will happen, measurable by seeing what odds I will accept in a bet.

Although all these are generally present, number 3 is the most common, perhaps because it is often explicitly taught as the definition. All are, in some way, correct! We consider each in turn.

2.2 Mathematical probability The Kolmogorov axioms are: For all A ⊂ S

$$P_A \ge 0 , \qquad P_S = 1 , \qquad P_{A \cup B} = P_A + P_B \ \text{if}\ A \cap B = \emptyset \ \text{and}\ A, B \subset S . \qquad (1)$$

From these simple axioms a complete and complicated structure of theorems can be erected. This is what pure mathematicians do. For example, the 2nd and 3rd axioms show that the probability of not-A, $P_{\bar A}$, is $1 - P_A$, and then the 1st axiom shows that $P_A \le 1$: any probability must be less than 1. But the axioms and the ensuing theorems say nothing about what $P_A$ actually is. Kolmogorov had frequentist probability in mind, but these axioms apply to any definition: he explicitly avoids tying $P_A$ down in this way. So although this apparatus enables us to compute numbers, it does not tell us what we can use them for.

2.3 Real probability Also known as classical probability, this was developed during the 18th–19th centuries by Pascal, Laplace and others to serve the gambling industry. If there are several possible outcomes and there is a symmetry between them so they are all, in a sense, identical, then their individual probabilities must be equal. For example, there are two sides to a coin, so if you toss it there must be a probability $\frac{1}{2}$ for each face to land uppermost. Likewise there are 52 cards in a pack, so the probability of a particular card being chosen is $\frac{1}{52}$. In the same way there are 6 sides to a dice, and 33 slots in a roulette wheel. This enables you to answer questions like 'What is the probability of rolling more than 10 with 2 dice?'. There are 3 such combinations (5-6, 6-5 and 6-6) out of the 6 × 6 = 36 total possibilities, so the probability is $\frac{1}{12}$. Compound instances of A are broken down into smaller instances to which the symmetry argument can be applied. This is satisfactory and clearly applicable—you know that if someone offers you a 10 to 1 bet on this dice throw, you should refuse; in the long run knowledge of the correct probabilities will pay off. The problem arises that this approach cannot be applied to continuous variables. This is brought out in Bertrand's paradoxes, one of which runs: In a circle of radius R an equilateral triangle is drawn. A chord is drawn at random. What is the probability that the length of the chord is greater than the side of the triangle? Considering Fig. 1 one can give three answers:

Fig. 1: Bertrand's paradox

1. If the chord, without loss of generality, starts at A, then it will be longer than the side if the end point is anywhere between B and C. So the answer is obviously $\frac{1}{3}$.
2. If the centre of the chord, without loss of generality, is chosen at random along the line OD, then it will be longer than the side of the triangle if it is in OE rather than ED. E is the midpoint of OD, so the answer is obviously $\frac{1}{2}$.
3. If the centre of the chord, without loss of generality, is chosen at random within the circle, then it will be longer than the side of the triangle if it lies within the circle of radius $\frac{R}{2}$. So the answer is obviously $\frac{1}{4}$.

So we have three obvious but contradictory answers. The whole question is built on a false premise: drawing a chord 'at random' is, unlike tossing a coin or throwing a dice, not defined. Another way of seeing this is that a distribution which is uniform in one variable, say θ, is not uniform in any non-trivial transformation of that variable, say cos θ or tan θ. Classical probability has therefore to be discarded.

2.4 Frequentist probability Because of such difficulties, Real Probability was replaced by Frequentist Probability in the early 20th century. This is the usual definition taught in schools and undergraduate classes. A very readable account is given by von Mises [10]:

$$P_A = \lim_{N \to \infty} \frac{N_A}{N} .$$
N is the total number of events in the ensemble (or collective). It can be visualised as a Venn diagram, as in Fig. 2. The probability of a coin landing heads up is $\frac{1}{2}$ because if you toss a coin 1000 times, one side will come down ∼500 times. That is an empirical definition (frequentist probability has roots in the Vienna school and logical positivism). Similarly, the lifetime of a muon is 2.2 µs because if you take 1000 muons and wait 2.2 µs, then ∼368 (that's a fraction $e^{-1}$) will remain.

With this definition $P_A$ is not just a property of A but a joint property of A and the ensemble. The same coin will have a different probability for showing heads depending on whether it is in a purse or

Fig. 2: Frequentist probability

in a numismatic collection. This leads to two distinctive properties (or, some would say, problems) for frequentist probability.

Firstly, there may be more than one ensemble. To take an everyday example from von Mises, German life insurance companies pay out on 0.4% of 40 year old male clients. Your friend Hans is 40 today. What is the probability that he will survive to see his 41st birthday? 99.6% is an answer (if he's insured). But he is also a non-smoker and non-drinker—so perhaps the figure is higher (maybe 99.8%)? But if he drives a Harley-Davidson it should be lower (maybe 99.0%)? All these numbers are acceptable. The individual Hans belongs to several different ensembles, and the probability will be different for each of them. To take an example from physics, suppose your experiment has a Particle Identification (PID) system using Cherenkov, time-of-flight and/or dE/dx measurements. You want to talk about the probability that a K⁺ will be correctly recognised by your PID. You determine this by considering many K⁺ mesons and counting the number accepted to get $P = N_{acc}/N_{tot}$. But these will depend on the kaon sample you work with. It could be all kaons, or kaons above a certain energy threshold, or kaons that actually enter the detector. The ensemble can be defined in various ways, each giving a valid but different value for the probability.

On the other hand, there may be no ensemble. To take an everyday example, we might want to calculate the probability that it will rain tomorrow. This is impossible. There is only one tomorrow. It will either rain or not rain. $P_{rain}$ is either 0 or 1, and we won't know which until tomorrow gets here. Von Mises insists that statements like 'It will probably rain tomorrow' are loose and unscientific. To take an example from physics, consider the probability that there is a supersymmetric particle with mass below 2 TeV. Again, either there is or there isn't. But, despite von Mises' objections, it does seem sensible, as the pressure falls and the gathering clouds turn grey, to say 'It will probably rain'. So this is a drawback to the frequentist definition. We will return to this and show how frequentists can talk meaningfully and quantitatively about unique events in the discussion of confidence intervals in Section 8.1.

2.5 Bayes’ theorem Before presenting we need to discuss Bayes’ theorem, though we point out that Bayes’ theorem applies (and is useful) in any probability model: it goes right back to the Kolmogorov axioms. First we need to define the conditional probability: P (A|B): this is the probability for A, given 1 that B is true. For example: if a playing card is drawn at random from a pack of 52, then P (♠A) = 52 , 1 but if you are told that the card is black, then P (♠A|Black) = 26 (and obviously P (♠A|Red) = 0).

152 Bayes’ theorem is just P (B|A) P (A|B) = × P (A) . (2) P (B) The proof is gratifyingly simple: the probability that A and B are both true can be written in two ways P (A|B) × P (B) = P (A&B) = P (B|A) × P (A) . Throw away middle term and divide by P (B) to get the result. As a first example, we go back to the ace of spades above. A card is drawn at random, and you are told that it is black. Bayes’ theorem says P (Black|♠A) 1 1 1 P (♠A|Black) = P (Black) P (♠A) = 1 × 52 = 26 ; 2 1 i.e. the original probability of drawing ♠A, 52 , is multiplied by the probability that the ace of spades 1 is black (just 1) and divided by the overall probability of drawing a black card ( 2 ) to give the obvious result. For a less trivial example, suppose you have a momentum-selected beam which is 90% π and 10% K. This goes through a Cherenkov counter for which pions exceed the threshold velocity but kaons do not. In principle pions will give a signal, but suppose there is a 5% chance, due to inefficiencies, that they will not. Again in principle kaons always give no Cherenkov signal, but suppose that probability is only 95% due to background noise. What is the probability that a particle identified as a kaon, as it gave no signal, is truly one? Bayes’ theorem runs P (no signal|K) 0.95 P (K|no signal) = P (no signal) × P (K) = 0.95×0.1+0.05×0.9 × 0.1 = 0.68 , 2 showing that the probability is only 3 . The positive identification is not enough to overwhelm the 9:1 π : K ratio. Incidentally this uses the (often handy) expression for the denominator: P (B) = P (B|A)× P (A) + P (B|A) × P (A).

2.6 Bayesian probability

The Bayesian definition of probability is that $P_A$ represents your belief in A: 1 represents certainty, 0 represents total disbelief. Intermediate values can be calibrated by asking whether you would prefer to bet on A, or on a white ball being drawn from an urn containing a mix of white and black balls. This avoids the limitations of frequentist probability—coins, dice, kaons, rain tomorrow, existence of supersymmetry (SUSY) can all have probabilities assigned to them.

The drawback is that your value for $P_A$ may be different from mine, or anyone else's. It is also called subjective probability. Bayesian probability makes great use of Bayes' theorem, in the form

$$P(Theory|Data) = \frac{P(Data|Theory)}{P(Data)} \times P(Theory) . \qquad (3)$$

P(Theory) is called the prior: your initial belief in Theory. P(Data|Theory) is the Likelihood: the probability of getting Data if Theory is true. P(Theory|Data) is the Posterior: your belief in Theory in the light of a particular Data being observed. So this all works very sensibly. If the data observed is predicted by the theory, your belief in that theory is boosted, though this is moderated by the probability that the data could have arisen anyway. Conversely, if data is observed which is disfavoured by the theory, your belief in that theory is weakened. The process can be chained. The posterior from a first experiment can be taken as the prior for a second experiment, and so on. When you write out the factors you find that the order doesn't matter.

2.6.1 Prior distributions Often, though, the theory being considered is not totally defined: it may contain a parameter (or several parameters) such as a mass, coupling constant, or decay rate. Generically we will call this a, with the proviso that it may be multidimensional.

The prior is now not a single number P(Theory) but a distribution $P_0(a)$. $\int_{a_1}^{a_2} P_0(a)\,da$ is your prior belief that a lies between $a_1$ and $a_2$. $\int_{-\infty}^{\infty} P_0(a)\,da$ is your original P(Theory). This is generally taken as 1, which is valid provided the possibility that the theory is false is matched by some value of a—for example, if the coupling constant for a hypothetical particle is zero, that accommodates any belief that it might not exist. Bayes' theorem then runs:

$$P_1(a; x) \propto L(a; x)\,P_0(a) . \qquad (4)$$

If the range of a is infinite, $P_0(a)$ may be vanishingly small (this is called an 'improper prior'). However this is not a problem. Suppose, for example, that all we know about a is that it is non-negative, and we are genuinely equally open to its having any value. We write $P_0(a)$ as C, so $\int_{a_1}^{a_2} P_0(a)\,da = C(a_2 - a_1)$. This probability is vanishingly small: if you were offered the choice of a bet on a lying within the range $[a_1, a_2]$ or of drawing a white ball from an urn containing 1 white ball and N black balls, you would choose the latter, however large N was. However it is not zero: if the urn contained N black balls, but no white ball, your betting choice would change. After a measurement you have $P_1(a; x) = \frac{L(a;x)\,C}{\int L(a';x)\,C\,da'}$, and the factors of C can be cancelled (which, and this is the point, you could not do if C were exactly zero), giving $P_1(a; x) = \frac{L(a;x)}{\int L(a';x)\,da'}$, or $P_1(a; x) \propto L(a; x)$, and you can then just normalize $P_1(a)$ to 1.

Fig. 3: Bayes at work

Figure 3 shows Eq. 4 at work. Suppose a is known to lie between 0 and 6, and the prior distribution is taken as flat, as shown in the left hand plot. A measurement of a gives a result 4.4 ± 1.0, as shown in the central plot. The product of the two gives (after normalization) the posterior, as shown in the right hand plot.
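The same figure can be reproduced numerically: evaluate the prior and the likelihood on a grid, multiply point by point, and renormalize. A sketch, assuming numpy and the values quoted above:

```python
import numpy as np

a = np.linspace(0.0, 6.0, 601)     # grid over the allowed range of a
da = a[1] - a[0]

prior = np.ones_like(a)            # flat prior on [0, 6]
prior /= prior.sum() * da          # normalized to 1

x, sigma = 4.4, 1.0                # the measurement 4.4 +/- 1.0
likelihood = np.exp(-0.5 * ((x - a) / sigma) ** 2)

posterior = likelihood * prior     # Eq. 4, up to a constant
posterior /= posterior.sum() * da  # normalize the posterior

print("posterior mode:", a[np.argmax(posterior)])     # ~4.4
print("posterior mean:", (a * posterior).sum() * da)  # pulled slightly below 4.4 by the a < 6 cut-off
```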

2.6.2 Likelihood The likelihood—the number P(Data|Theory)—is now generalised to the function L(a, x), where x is the observed value of the data. Again, x may be multidimensional, but in what follows it is not misleading to ignore that. This can be confusing. For example, anticipating Section 3.2.2, the probability of getting x counts from a Poisson process with mean a is

$$P(x, a) = e^{-a}\frac{a^x}{x!} . \qquad (5)$$
We also write
$$L(a, x) = e^{-a}\frac{a^x}{x!} . \qquad (6)$$
What's the difference? Technically there is none. These are identical joint functions of two variables (x and a) to which we have just happened to have given different names. Pragmatically we regard Eq. 5 as describing the probability of getting various different x from some fixed a, whereas Eq. 6 describes the likelihood for various different a from some given x. But be careful with the term 'likelihood'. If $P(x_1, a) > P(x_2, a)$ then $x_1$ is more probable (whatever you mean by that) than $x_2$. If $L(a_1, x) > L(a_2, x)$ it does not mean that $a_1$ is more likely (however you define that) than $a_2$.

2.6.3 Shortcomings of Bayesian probability

The big problem with Bayesian probability is that it is subjective. Your $P_0(a)$ and my $P_0(a)$ may be different—so how can we compare results? Science does, after all, take pride in being objective: it handles real facts, not opinions. If you present a Bayesian result from your search for the X particle this embodies the actual experiment and your irrational prior prejudices. I am interested in your experiment but not in your irrational prior prejudices—I have my own—and it is unhelpful if you combine the two. Bayesians sometimes ask about the right prior they should use. This is the wrong question. The prior is what you believe, and only you know that. There is an argument made for taking the prior as uniform. This is sometimes called the 'Principle of ignorance' and justified as being impartial. But this is misleading, even dishonest. If $P_0(a)$ is taken as constant, favouring no particular value, then it is not constant for $a^2$ or $\sqrt{a}$ or $\ln a$, which are equally valid parameters.

It is true that with lots of data, $P_1(a)$ decouples from $P_0(a)$. The final result depends only on the measurements. But this is not the case with little data—and that's the situation we're usually in—when doing statistics properly matters. As an example, suppose you make a Gaussian measurement (anticipating slightly Section 3.2.3). You consider a prior flat in a and a prior flat in ln a. This latter is quite sensible—it says you expect a result between 0.1 and 0.2 to be as likely as a result between 1 and 2, or 10 and 20. The posteriors are shown in Fig. 4. For an 'accurate' result of 3.0 ± 0.5 the posteriors are very close. For an 'intermediate' result of 4.0 ± 1.0 there is an appreciable difference in the peak value and the shape. For a 'poor' measurement of 5.0 ± 2.0 the posteriors are very different. So you should never just quote results from a single prior. Try several forms of prior and examine the spread of results. If they are pretty much the same you are vindicated. This is called 'robustness under choice of prior' and it is standard practice for statisticians. If they are different then the data are telling you about the limitations of your results.

2.6.4 Jeffreys' prior Jeffreys [11] suggested a technique now known as the Jeffreys' or objective prior: that you should choose a prior flat in a transformed variable a′ for which the Fisher information, $I = -\left\langle\frac{\partial^2 \ln L(x;a)}{\partial a^2}\right\rangle$, is constant. The Fisher information (which is important in maximum likelihood estimation, as described in Section 5.2) is a measure of how much a measurement tells you about the parameter: a large I corresponds to a likelihood function with a sharp peak and will tell you (by some measure) a lot about a; a small I has a featureless likelihood function which will not be useful. Jeffreys' principle is that the prior should not favour or disfavour particular values of the parameter. It is equivalently—and more conveniently—used as taking a prior in the original a which is proportional to $\sqrt{I}$.

Fig. 4: Posteriors for two different priors for the results 3.0 ± 0.5, 4.0 ± 1.0 and 5.0 ± 2.0

It has not been universally adopted, for various reasons. Some practitioners like to be able to include their own prior belief in the analysis. It also makes the prior dependent on the experiment (in the form of the likelihood function). Thus if ATLAS and CMS searched for the same new X particle they would use different priors for $P_0(M_X)$, which is (to some people) absurd. So it is not universal—but when you are selecting a bunch of priors to test robustness, the Jeffreys' prior is a strong contender for inclusion.

2.7 Summary So mathematical probability has no meaning, and real probability is discredited. That leaves the frequentist and Bayesian definitions. Both are very much in use. They are sometimes presented as rivals, with adherents on either side ('frequentists versus Bayesians'). This is needless drama. They are both tools that help us understand our results. Both have drawbacks. Sometimes it is clear which is the best tool for a particular job, sometimes it is not and one is free to choose either. It is said—probably accurately—that particle physicists feel happier with frequentist probability as they are used to large ensembles of similar but different events, whereas astrophysicists and cosmologists are more at home with Bayesian probability as they only have one universe to consider. What is important is not which version you prefer—these are not football teams—but that you know the limitations of each, that you use the best definition when there is a reason to do so, and, above all, that you are aware of which form you are using.

As a possibly heretical afterthought, perhaps classical probability still has a place? Quantum mechanics, after all, gives probabilities. If $P_A$ is not 'real'—either because it depends on an arbitrary ensemble, or because it is a subjective belief—then it looks like there is nothing 'real' in the universe. The state of a coin—or an electron spin—having probability $\frac{1}{2}$ makes sense. There is a symmetry that dictates it. The lifetime of a muon—i.e. the probability per unit time that it will decay—seems to be a well-defined quantity, a property of the muon and independent of any ensemble, or any Bayesian belief. The probability a muon will produce a signal in your muon detector seems like a 'real well-defined quantity', if you specify the 4-momentum and the state of the detector.

Of course a question like 'What is the probability that a muon signal in my detector comes from a real muon, not background?' is not intrinsically defined. So perhaps classical probability has a place in physics—but not in interpreting results. However you should not mention this to a statistician or they will think you're crazy.

3 Probability distributions and their properties We have to make a simple distinction between two sorts of data: integer data and real-number data¹. The first covers results which are of their nature whole numbers: the numbers of kaons produced in a collision, or the number of entries falling into some bin of a histogram. Generically let's call such numbers r. They have probabilities P(r) which are dimensionless. The second covers results whose values are real (or floating-point) numbers. There are lots of these: energies, angles, invariant masses ... Generically let's call such numbers x, and they have probability density functions P(x) which have dimensions of $[x]^{-1}$, so $\int_{x_1}^{x_2} P(x)\,dx$ or $P(x)\,dx$ are probabilities. You will also sometimes meet the cumulative distribution $C(x) = \int_{-\infty}^{x} P(x')\,dx'$.

3.1 Expectation values From P (r) or P (x) one can form the expectation value

$$\langle f\rangle = \sum_r f(r)P(r) \quad\text{or}\quad \langle f\rangle = \int f(x)P(x)\,dx , \qquad (7)$$
where the sum or integral is taken as appropriate. Some authors write this as E(f), but I personally prefer the angle-bracket notation. You may think it looks too much like quantum mechanics, but in fact it's quantum mechanics which looks like statistics: an expression like $\langle\psi|\hat Q|\psi\rangle$ is the average value of an operator $\hat Q$ in some state ψ, where 'average value' has exactly the same meaning and significance.

3.1.1 Mean and variance In particular the mean, often written µ, is given by $\langle r\rangle = \sum_r rP(r)$ or $\langle x\rangle = \int xP(x)\,dx$. Similarly one can write higher moments $\mu_k = \langle r^k\rangle = \sum_r r^k P(r)$ or $\langle x^k\rangle = \int x^k P(x)\,dx$, and central moments $\mu'_k = \langle(r-\mu)^k\rangle = \sum_r (r-\mu)^k P(r)$ or $\langle(x-\mu)^k\rangle = \int (x-\mu)^k P(x)\,dx$. The second central moment is known as the variance, $\mu'_2 = V = \sum_r (r-\mu)^2 P(r) = \langle r^2\rangle - \langle r\rangle^2$ or $\int (x-\mu)^2 P(x)\,dx = \langle x^2\rangle - \langle x\rangle^2$. It is easy to show that $\langle(x-\mu)^2\rangle = \langle x^2\rangle - \mu^2$. The standard deviation is just the square root of the variance, $\sigma = \sqrt{V}$. Statisticians usually use variance, perhaps because formulae come out simpler. Physicists usually use standard deviation, perhaps because it has the same dimensions as the variable being studied, and can be drawn as an error bar on a plot. You may also meet skew, which is $\gamma = \langle(x-\mu)^3\rangle/\sigma^3$, and kurtosis, $h = \langle(x-\mu)^4\rangle/\sigma^4 - 3$. Definitions vary, so be careful. Skew is a dimensionless measure of the asymmetry of a distribution. Kurtosis is (thanks to that rather arbitrary looking 3 in the definition) zero for a Gaussian distribution (see Section 3.2.3): positive kurtosis indicates a narrow core with a wide tail, negative kurtosis indicates the tails are reduced.

¹Other branches of science have to include a third, categorical data, but we will ignore that.
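These definitions translate directly into sample estimates; a short sketch (numpy; the exponential sample is an arbitrary illustration, not from the text) computing the mean, the variance via $\langle x^2\rangle - \langle x\rangle^2$, the skew, and the kurtosis with the −3 convention used here:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100_000)      # illustrative sample

mean = x.mean()
var = (x**2).mean() - mean**2                     # V = <x^2> - <x>^2
sigma = np.sqrt(var)
skew = ((x - mean)**3).mean() / sigma**3          # gamma
kurt = ((x - mean)**4).mean() / sigma**4 - 3.0    # zero for a Gaussian

print(f"mean={mean:.3f} sigma={sigma:.3f} skew={skew:.3f} kurtosis={kurt:.3f}")
# an exponential has skew -> 2 and kurtosis -> 6 in the large-sample limit
```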

Fig. 5: Examples of two dimensional distributions. The top right has positive covariance (and correlation), the bottom left negative. In the top left the covariance is zero and x and y are independent; in the bottom right the covariance is also zero, but they are not independent.

3.1.2 Covariance and correlation

If your data are 2-dimensional pairs (x, y), then besides forming $\langle x\rangle$, $\langle y\rangle$, $\sigma_x$ etc., you can also form the covariance

$$\text{Cov}(x, y) = \langle(x - \mu_x)(y - \mu_y)\rangle = \langle xy\rangle - \langle x\rangle\langle y\rangle .$$

Examples are shown in Fig. 5. If there is a tendency for positive fluctuations in x to be associated with positive fluctuations in y (and therefore negative with negative) then the product $(x_i - \bar x)(y_i - \bar y)$ tends to be positive and the covariance is greater than 0. A negative covariance, as in the 3rd plot, happens if a positive fluctuation in one variable is associated with a negative fluctuation in the other. If the variables are independent then a positive variation in x is equally likely to be associated with a positive or a negative variation in y, and the covariance is zero, as in the first plot. However the converse is not always the case: there can be two-dimensional distributions where the covariance is zero, but the two variables are not independent, as is shown in the fourth plot. Covariance is useful, but it has dimensions. Often one uses the correlation, which is just
$$\rho = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y} . \qquad (8)$$
It is easy to show that ρ lies between 1 (complete correlation) and −1 (complete anticorrelation). ρ = 0 if x and y are independent.

If there are more than two variables—the alphabet runs out so let's call them $(x_1, x_2, x_3 \ldots x_n)$—then these generalise to the covariance matrix

$$V_{ij} = \langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle$$

and the correlation matrix

$$\rho_{ij} = \frac{V_{ij}}{\sigma_i \sigma_j} .$$

The diagonal of V is $\sigma_i^2$. The diagonal of ρ is 1.
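Numerically these are one call each in numpy; a sketch with an artificially correlated pair (the construction of y is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)   # y shares a fluctuation with x

V = np.cov(x, y)         # covariance matrix; diagonal = the variances sigma_i^2
rho = np.corrcoef(x, y)  # correlation matrix; diagonal = 1
print(V)
print(rho)               # off-diagonal elements ~ 0.8 for this construction
```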

3.2 Binomial, Poisson and Gaussian We now move from considering the general properties of distributions to considering three specific ones. These are the ones you will most commonly meet for the distribution of the original data (as opposed to

quantities constructed from it). Actually the first, the binomial, is not nearly as common as the second, the Poisson; and the third, the Gaussian, is overwhelmingly more common. However it is useful to consider all three as concepts are built up from the simplest to the more sophisticated.

3.2.1 The binomial distribution The binomial distribution is easy to understand as it basically describes the familiar² tossing of coins. It describes the number r of successes in N trials, each with probability p of success. r is discrete so the process is described by a probability distribution

$$P(r; p, N) = \frac{N!}{r!(N - r)!}\, p^r q^{N-r} , \qquad (9)$$
where q ≡ 1 − p. Some examples are shown in Fig. 6.

Fig. 6: Some examples of the binomial distribution, for (1) N = 10, p = 0.6, (2) N = 10, p = 0.9, (3) N = 15, p = 0.1, and (4) N = 25, p = 0.6.

The distribution has mean $\mu = Np$, variance $V = Npq$, and standard deviation $\sigma = \sqrt{Npq}$.

3.2.2 The Poisson distribution The Poisson distribution also describes the probability of some discrete number r, but rather than a fixed number of 'trials' it considers a random rate λ:
$$P(r; \lambda) = e^{-\lambda}\frac{\lambda^r}{r!} . \qquad (10)$$
It is linked to the binomial—the Poisson is the limit of the binomial—as N → ∞, p → 0 with Np = λ = constant. Figure 7 shows various examples. It has mean µ = λ, variance V = λ, and standard deviation $\sigma = \sqrt{\lambda} = \sqrt{\mu}$. The clicks of a Geiger counter are the standard illustration of a Poisson process. You will meet it a lot as it applies to event counts—on their own or in histogram bins. To help you think about the Poisson, here is a simple question (which describes a situation I have seen in practice, more than once, from people who ought to know better).

²Except, as it happens, in Vietnam, where coins have been completely replaced by banknotes.

Fig. 7: Poisson distributions for (1) λ = 5, (2) λ = 1.5, (3) λ = 12 and (4) λ = 50

You need to know the efficiency of your PID system for positrons. You find 1000 data events where 2 tracks have a combined mass of 3.1 GeV (J/ψ) and the negative track is identified as an e⁻ ('tag-and-probe' technique). In 900 events the e⁺ is also identified. In 100 events it is not. The efficiency is 90%. What about the error? Colleague A says $\sqrt{900} = 30$, so the efficiency is 90.0 ± 3.0%; colleague B says $\sqrt{100} = 10$, so the efficiency is 90.0 ± 1.0%. Which is right?

Please think about this before turning the page...

Neither—both are wrong. This is binomial, not Poisson: p = 0.9, N = 1000. The error on the number of identified events is $\sqrt{Npq} = \sqrt{1000 \times 0.9 \times 0.1}$ (or $\sqrt{1000 \times 0.1 \times 0.9}$) $= \sqrt{90} = 9.49$, so the efficiency is 90.0 ± 0.9%.
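The distinction is easy to check by simulation; a short sketch (numpy, repeating the tag-and-probe measurement many times to see the true spread):

```python
import numpy as np

N, p = 1000, 0.9
rng = np.random.default_rng(2)

# simulate many repeats of the tag-and-probe measurement
n_pass = rng.binomial(N, p, size=200_000)
print("observed spread of n_pass:", n_pass.std())        # ~9.49, not 30 or 10
print("predicted sqrt(Npq):      ", np.sqrt(N * p * (1 - p)))
print("efficiency error:         ", np.sqrt(N * p * (1 - p)) / N)  # 0.0095 -> 90.0 +/- 0.9 %
```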

3.2.3 The Gaussian distribution This is by far the most important statistical distribution. The probability density function (PDF) for a variable x is given by the formula

$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} . \qquad (11)$$

Pictorially this is shown in Fig. 8.

Fig. 8: The Gaussian distribution

This is sometimes called the 'bell curve', though in fact a real bell does not have flared edges like that. There is (in contrast to the Poisson and binomial) only one Gaussian curve, as µ and σ are just location and scale parameters. The mean is µ and the standard deviation is σ. The skew is zero, as it is symmetric, and the kurtosis is zero by construction. In statistics, and most disciplines, this is known as the normal distribution. Only in physics is it known as 'the Gaussian'—perhaps because the word 'normal' already has so many meanings. The reason for the importance of the Gaussian is the central limit theorem (CLT), which states: if the variable X is the sum of N variables $x_1, x_2 \ldots x_N$ then:

1. Means add: $\langle X\rangle = \langle x_1\rangle + \langle x_2\rangle + \ldots \langle x_N\rangle$,
2. Variances add: $V_X = V_1 + V_2 + \ldots V_N$,
3. If the variables $x_i$ are independent and identically distributed (i.i.d.) then P(X) tends to a Gaussian for large N.

(1) is obvious; (2) is pretty obvious, and means that standard deviations add in quadrature, and that the standard deviation of an average falls like $1/\sqrt{N}$; (3) applies whatever the form of the original P(x).

Before proving this, it is helpful to see a demonstration to convince yourself that the implausible assertion in (3) actually does happen. Take a uniform distribution from 0 to 1, as shown in the top left subplot of Fig. 9. It is flat. Add two such numbers and the distribution is triangular, between 0 and 2, as shown in the top right.

Fig. 9: Demonstration of the central limit theorem

With 3 numbers, at the bottom left, it gets curved. With 10 numbers, at the bottom right, it looks pretty Gaussian. The proof follows.

Proof. First, introduce the characteristic function $\langle e^{ikx}\rangle = \int e^{ikx}P(x)\,dx = \tilde P(k)$. This can usefully be thought of both as an expectation value and as a Fourier transform (FT). Expand the exponential as a series:
$$\langle e^{ikx}\rangle = \left\langle 1 + ikx + \frac{(ikx)^2}{2!} + \frac{(ikx)^3}{3!} \cdots\right\rangle = 1 + ik\langle x\rangle + (ik)^2\frac{\langle x^2\rangle}{2!} + (ik)^3\frac{\langle x^3\rangle}{3!} \cdots .$$
Take the logarithm and use the expansion $\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} \cdots$. This gives a power series in (ik), where the coefficient $\frac{\kappa_r}{r!}$ of $(ik)^r$ is made up of expectation values of x of total power r:
$$\kappa_1 = \langle x\rangle, \quad \kappa_2 = \langle x^2\rangle - \langle x\rangle^2, \quad \kappa_3 = \langle x^3\rangle - 3\langle x^2\rangle\langle x\rangle + 2\langle x\rangle^3 \ldots$$
These are called the semi-invariant cumulants of Thiele. Under a change of scale α, $\kappa_r \to \alpha^r \kappa_r$. Under a change in location only $\kappa_1$ changes.

If X is the sum of i.i.d. random variables, $x_1 + x_2 + x_3 \ldots$, then P(X) is the convolution of P(x) with itself N times. The FT of a convolution is the product of the individual FTs, and the logarithm of a product is the sum of the logarithms, so P(X) has cumulants $K_r = N\kappa_r$.

To make graphs commensurate, you need to scale the X axis by the standard deviation, which grows like $\sqrt{N}$. The cumulants of the scaled graph are $K'_r = N^{1-r/2}\kappa_r$. As N → ∞, these vanish for r > 2, leaving a quadratic. If the log is a quadratic, the exponential is a Gaussian. So $\tilde P(X)$ is Gaussian. And finally, the inverse FT of a Gaussian is also a Gaussian.
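This behaviour is easy to verify numerically; a sketch (numpy) summing N uniform variables, as in Fig. 9. The excess kurtosis, which is −1.2 for a single uniform and falls like 1/N, tracks the vanishing of the higher cumulants:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (1, 2, 3, 10):
    X = rng.uniform(size=(200_000, n)).sum(axis=1)   # sum of n U(0,1) variables
    z = (X - X.mean()) / X.std()
    kurt = (z**4).mean() - 3.0                       # 0 for a Gaussian
    print(f"n={n:2d}: mean={X.mean():.3f} (expect {n/2:.1f}), "
          f"var={X.var():.4f} (expect {n/12:.4f}), kurtosis={kurt:+.3f}")
```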

162 Even if the distributions are not identical, the CLT tends to apply, unless one (or two) dominates. Most ‘errors’ fit this, being compounded of many different sources.

4 Hypothesis testing ‘Hypothesis testing’ is another piece of statistical technical jargon. It just means ‘making choices’—in a logical way—on the basis of statistical information.

– Is some track a pion or a kaon? – Is this event signal or background? – Is the detector performance degrading with time? – Do the data agree with the Standard Model prediction or not?

To establish some terms: you have a hypothesis (the track is a pion, the event is signal, the detector is stable, the Standard Model is fine ...) and an alternative hypothesis (kaon, background, changing, new physics needed ...). Your hypothesis is usually simple, i.e. completely specified, but the alternative is often composite, containing a parameter (for example, the detector decay rate) which may have any non-zero value.

4.1 Type I and type II errors As an example, let's use the signal/background decision. Do you accept or reject the event (perhaps in the trigger, perhaps in your offline analysis)? To make things easy we consider the case where both hypotheses are simple, i.e. completely defined. Suppose you measure some parameter x which is related to what you are trying to measure. It may well be the output from a neural network or other machine learning (ML) system. The expected distributions for x under the hypothesis and the alternative, S and B respectively, are shown in Fig. 10.

Fig. 10: Hypothesis testing example

You impose a cut as shown—you have to put one somewhere—accepting events above $x = x_{cut}$ and rejecting those below. This means losing a fraction α of signal. This is called a type I error, and α is known as the significance. You admit a fraction β of background. This is called a type II error, and 1 − β is the power. You would like to know the best place to put the cut. This graph cannot tell you! The strategy for the cut depends on three things—hypothesis testing only covers one of them.

163 The second is the prior signal to noise ratio. These plots are normalized to 1. The red curve is (probably) MUCH bigger. A value of β of, say, 0.01 looks nice and small—only one in a hundred background events get through. But if your background is 10,000 times bigger than your signal (and it often is) you are still swamped. The third is the cost of making mistakes, which will be different for the two types of error. You have a trade-off between efficiency and purity: what are they worth? In a typical analysis, a type II error is more serious than a type I: losing a signal event is regrettable, but it happens. Including background events in your selected pure sample can give a very misleading result. By contrast, in medical decisions, type I errors are much worse than type II. Telling healthy patients they are sick leads to worry and perhaps further tests, but telling sick patients they are healthy means they don’t get the treatment they need.

4.2 The Neyman-Pearson lemma

In Fig. 10 the strategy is plain—you choose $x_{cut}$ and evaluate α and β. But suppose the S and B curves are more complicated, as in Fig. 11? Or that x is multidimensional?

Fig. 11: A more complicated case for hypothesis testing

Neyman and Pearson say: your acceptance region should just include the regions of greatest $S(x)/B(x)$ (the ratio of likelihoods). For a given α, this gives the smallest β ('most powerful at a given significance'). The proof is simple: having done this, if you then move a small region from 'accept' to 'reject', it has to be replaced by an equivalent region, to balance α, which (by construction) brings more background, increasing β. However complicated the problem, it reduces to a single monotonic variable $S/B$, and you cut on that.

4.3 Efficiency, purity, and ROC plots ROC plots are often used to show the efficacy of different selection variables. You scan over the cut value (in x for Fig. 10, or in S/B for a case like Fig. 11) and plot the fraction of background accepted (β) against the fraction of signal retained (1 − α), as shown in Fig. 12. For a very loose cut all data is accepted, corresponding to a point at the top right. As the cut is tightened both signal and background fractions fall, so the point moves to the left and down, though hopefully the background loss is greater than the signal loss, so it moves more to the left than it does downwards. As the cut is increased the line moves towards the bottom left, the limit of a very tight cut where all data is rejected.

Fig. 12: ROC curves

A diagonal line corresponds to no discrimination—the S and B curves are identical. The further the actual line bulges away from that diagonal, the better. Where you should put your cut depends, as pointed out earlier, also on the prior signal/background ratio and the relative costs of errors. The ROC plots do not tell you that, but they can be useful in comparing the performance of different discriminators. The name 'ROC' stands for 'receiver operating characteristic', for reasons that are lost in history. Actually it is good to use this meaningless acronym, as otherwise they get called 'efficiency-purity plots' even though they definitely do not show the purity (they cannot, as that depends on the overall signal/background ratio). Be careful, as the phrases 'background efficiency', 'contamination', and 'purity' are used ambiguously in the literature.
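A sketch of how such a ROC curve is traced out, with two overlapping Gaussians standing in for the S and B distributions (the shapes and the cut values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
s = rng.normal(+1.0, 1.0, 100_000)   # signal values of the test statistic
b = rng.normal(-1.0, 1.0, 100_000)   # background values

# scan the cut: each cut gives one point (beta, 1 - alpha) on the ROC curve
for cut in (-2.0, -1.0, 0.0, 1.0, 2.0):
    sig_kept = (s > cut).mean()      # 1 - alpha
    bkg_kept = (b > cut).mean()      # beta
    print(f"cut {cut:+.1f}: signal retained {sig_kept:.3f}, "
          f"background accepted {bkg_kept:.3f}")
```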

4.4 The null hypothesis An analysis is often (but not always) investigating whether an effect is present, motivated by the hope that the results will show that it is:

– Eating broccoli makes you smart. – Facebook advertising increases sales. – A new drug increases patient survival rates. – The data show Beyond-the-Standard-Model physics.

To reach such a conclusion you have to use your best efforts to try, and to fail, to prove the opposite: the Null Hypothesis H0.

– Broccoli lovers have the same or smaller IQ than broccoli loathers.
– Sales are independent of the Facebook advertising budget.
– The survival rate for the new treatment is the same.
– The Standard Model (functions or Monte Carlo) describes the data.

If the null hypothesis is not tenable, you've proved—or at least, supported—your point. The reason for calling α the 'significance' is now clear. It is the probability that the null hypothesis will be wrongly rejected, and you'll claim an effect where there isn't any. There is a minefield of difficulties. Correlation is not causation. If broccoli eaters are more intelligent, perhaps that's because it's intelligent to eat green vegetables, not that vegetables make you intelligent. One also has to consider that if many similar experiments are done, self-censorship will influence which results get published. This is further discussed in Section 9.

This account is perhaps unconventional in introducing the null hypothesis at such a late stage. Most treatments bring it in right at the start of the description of hypothesis testing, because they assume that all decisions are of this type.

5 Estimation What statisticians call ‘estimation’, physicists would generally call ‘measurement’.

Suppose you know the probability (density) function P (x; a) and you take a set of data {xi}. What is the best value for a? (Sometimes one wants to estimate a property (e.g. the mean) rather than a parameter, but this is relatively uncommon, and the methodology is the same.)

xi may be single values, or pairs, or higher-dimensional. The unknown a may be a single parame- ter or several. If it has more than one component, these are sometimes split into ‘parameters of interest’ and ‘nuisance parameters’.

The estimator is defined very broadly: an estimator aˆ(x1 . . . xN ) is a function of the data that gives a value for the parameter a. There is no ‘correct’ estimator, but some are better than others. A perfect estimator would be:

– Consistent: $\hat a(x_1 \ldots x_N) \to a$ as $N \to \infty$,
– Unbiased: $\langle\hat a\rangle = a$,
– Efficient: $\langle(\hat a - a)^2\rangle$ is as small as possible,
– Invariant: $\widehat{f(a)} = f(\hat a)$.

No estimator is perfect—these 4 goals are incompatible. In particular the second and the fourth: if an estimator $\hat a$ is unbiased for a, then $\sqrt{\hat a}$ is not an unbiased estimator of $\sqrt{a}$.

5.1 Bias Suppose we estimate the mean by taking the obvious³ $\hat\mu = \bar x$. Then
$$\langle\hat\mu\rangle = \frac{1}{N}\left\langle\sum x_i\right\rangle = \frac{1}{N} N\mu = \mu .$$
So there is no bias: the expectation value of this estimator of µ is just µ itself. By contrast, suppose we estimate the variance by the apparently obvious $\hat V = \overline{x^2} - \bar x^2$. Then
$$\langle\hat V\rangle = \left\langle\overline{x^2}\right\rangle - \left\langle\bar x^2\right\rangle .$$
The first term is just $\langle x^2\rangle$. To make sense of the second term, note that $\langle\bar x\rangle = \langle x\rangle$ and add and subtract $\langle x\rangle^2$ to get
$$\langle\hat V\rangle = \langle x^2\rangle - \langle x\rangle^2 - \left(\left\langle\bar x^2\right\rangle - \langle\bar x\rangle^2\right) = V(x) - V(\bar x) = V - \frac{V}{N} = \frac{N-1}{N} V .$$
So the estimator is biased! $\hat V$ will, on average, give too small a value. This bias, like any known bias, can be corrected for. Using $\hat V = \frac{N}{N-1}\left(\overline{x^2} - \bar x^2\right)$ corrects the bias. The familiar estimator for the standard deviation follows: $\hat\sigma = \sqrt{\frac{\sum_i (x_i - \bar x)^2}{N-1}}$. (Of course this gives a biased estimate of σ. But V is generally more important in this context.)

³Note the difference between $\langle x\rangle$, which is an average over a PDF, and $\bar x$, which denotes the average over a particular sample: both are called 'the mean of x'.
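The bias, and its N/(N−1) correction, can be seen directly by simulation; a sketch (numpy, with many samples of size N = 5 drawn from a unit-variance Gaussian; the sample size is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
samples = rng.normal(0.0, 1.0, size=(200_000, N))    # true variance V = 1

# naive estimator V-hat = mean(x^2) - mean(x)^2, evaluated per sample
V_naive = (samples**2).mean(axis=1) - samples.mean(axis=1)**2
print("average of naive V-hat:", V_naive.mean())                  # ~ (N-1)/N = 0.8
print("after the N/(N-1) fix: ", (N / (N - 1) * V_naive).mean())  # ~ 1.0
```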

5.2 Efficiency Somewhat surprisingly, there is a limit to the efficiency of an estimator: the minimum variance bound (MVB), also known as the Cramér-Rao bound. For any unbiased estimator $\hat a(x)$, the variance is bounded

$$V(\hat a) \ge \frac{-1}{\left\langle\frac{d^2\ln L}{da^2}\right\rangle} = \frac{1}{\left\langle\left(\frac{d\ln L}{da}\right)^2\right\rangle} . \qquad (12)$$
L is the likelihood (as introduced in Section 2.6.2) of a sample of independent measurements, i.e. the probability for the whole data sample for a particular value of a. It is just the product of the individual probabilities:

L(a; x1, x2, ...xN ) = P (x1; a)P (x2; a)...P (xN ; a).

We will write L(a; x1, x2, ...xN ) as L(a; x) for simplicity.

Proof of the MVB. Unitarity requires $\int P(x; a)\,dx = \int L(a; x)\,dx = 1$. Differentiate wrt a:
$$0 = \int \frac{dL}{da}\,dx = \int \frac{d\ln L}{da}\, L\,dx = \left\langle\frac{d\ln L}{da}\right\rangle . \qquad (13)$$

If $\hat a$ is unbiased, $\langle\hat a\rangle = \int \hat a(x)P(x; a)\,dx = \int \hat a(x)L(a; x)\,dx = a$. Differentiate wrt a:
$$1 = \int \hat a(x)\frac{dL}{da}\,dx = \int \hat a\,\frac{d\ln L}{da}\,L\,dx .$$
Subtract Eq. 13 multiplied by a, and get
$$\int (\hat a - a)\,\frac{d\ln L}{da}\,L\,dx = 1 .$$
Invoke the Schwarz inequality $\int u^2\,dx \int v^2\,dx \ge \left(\int uv\,dx\right)^2$ with $u \equiv (\hat a - a)\sqrt{L}$, $v \equiv \frac{d\ln L}{da}\sqrt{L}$. Hence $\int (\hat a - a)^2 L\,dx \int \left(\frac{d\ln L}{da}\right)^2 L\,dx \ge 1$, i.e.
$$\left\langle(\hat a - a)^2\right\rangle \ge 1\bigg/\left\langle\left(\frac{d\ln L}{da}\right)^2\right\rangle . \qquad (14)$$

Differentiating Eq. 13 again gives
$$\frac{d}{da}\int L\,\frac{d\ln L}{da}\,dx = \int \frac{dL}{da}\frac{d\ln L}{da}\,dx + \int L\,\frac{d^2\ln L}{da^2}\,dx = \left\langle\left(\frac{d\ln L}{da}\right)^2\right\rangle + \left\langle\frac{d^2\ln L}{da^2}\right\rangle = 0 ,$$
hence $\left\langle\left(\frac{d\ln L}{da}\right)^2\right\rangle = -\left\langle\frac{d^2\ln L}{da^2}\right\rangle$. This is the Fisher information referred to in Section 2.6.4. Note how it is intrinsically positive.

5.3 Maximum likelihood estimation The maximum likelihood (ML) estimator just does what it says: a is adjusted to maximise the likelihood of the sample (for practical reasons one actually maximises the log likelihood, which is a sum rather than a product).

$$\text{Maximise } \ln L = \sum_i \ln P(x_i; a) , \qquad (15)$$

$$\left.\frac{d\ln L}{da}\right|_{\hat a} = 0 . \qquad (16)$$

The ML estimator is very commonly used. It is not only simple and intuitive, it has lots of nice properties.

– It is consistent.
– It is biased, but the bias falls like 1/N.
– It is efficient for large N.
– It is invariant—it doesn't matter if you reparametrize a.

A particular maximisation problem may be solved in 3 ways, depending on the complexity (a numerical illustration follows the list):

1. Solve Eq. 16 algebraically,
2. Solve Eq. 16 numerically, and
3. Solve Eq. 15 numerically.
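As an illustration of routes 1 and 3 (my example, not from the text): estimating a lifetime τ from decay times distributed as $P(t; \tau) = \frac{1}{\tau}e^{-t/\tau}$. Route 1 gives $\hat\tau = \bar t$ algebraically; route 3 finds the same maximum numerically (a sketch assuming numpy and scipy):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
t = rng.exponential(scale=2.2, size=1000)   # decay times with true tau = 2.2

def neg_log_L(tau):
    # P(t; tau) = (1/tau) exp(-t/tau)  =>  ln L = sum(-ln tau - t/tau)
    return -np.sum(-np.log(tau) - t / tau)

fit = minimize_scalar(neg_log_L, bounds=(0.1, 10.0), method="bounded")
print("route 3, numerical maximum:", fit.x)
print("route 1, algebraic answer: ", t.mean())   # d lnL/d tau = 0 gives tau-hat = t-bar
```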

5.4 Least squares Least squares estimation follows from maximum likelihood estimation. If you have Gaussian measurements of y taken at various x values, with measurement error σ, and a prediction y = f(x; a), then the Gaussian probability
$$P(y; x, a) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y - f(x,a))^2/2\sigma^2}$$
gives the log likelihood
$$\ln L = -\sum \frac{(y_i - f(x_i; a))^2}{2\sigma_i^2} + \text{constants} .$$
To maximise ln L, you minimise $\chi^2 = \sum \frac{(y_i - f(x_i; a))^2}{\sigma_i^2}$, hence the name 'least squares'. Differentiating gives the normal equations: $\sum \frac{(y_i - f(x_i; a))}{\sigma_i^2}\, f'(x_i; a) = 0$. If f(x; a) is linear in a then these can be solved exactly. Otherwise an iterative method has to be used.

5.5 Straight line fits As a particular instance of least squares estimation, suppose the function is y = mx + c, and assume all $\sigma_i$ are the same (the extension to the general case is straightforward). The normal equations are then
$$\sum (y_i - mx_i - c)x_i = 0 \quad\text{and}\quad \sum (y_i - mx_i - c) = 0 ,$$
for which the solution, shown in Fig. 13, is
$$m = \frac{\overline{xy} - \bar x\,\bar y}{\overline{x^2} - \bar x^2} , \qquad c = \bar y - m\bar x .$$
Statisticians call this regression. Actually there is a subtle difference, as shown in Fig. 14. The straight line fit considers well-defined x values and y values with measurement errors—if it were not for those errors then presumably the values would line up perfectly, with no scatter. The scatter in regression is not caused by measurement errors, but by the fact that the variables are linked only loosely. The history of regression started with Galton, who measured the heights of fathers and their (adult) sons. Tall parents tend to have tall children so there is a correlation. Because the height of a son depends not just on his paternal genes but on many factors (maternal genes, diet, childhood illnesses ...), the points do not line up exactly—and using a high accuracy laser interferometer to do the measurements, rather than a simple ruler, would not change anything. Galton, incidentally, used this to show that although tall fathers tend to have tall sons, they are not that tall. An outstandingly tall father will have (on average) quite tall children, and only tallish grandchildren. He called this 'regression towards mediocrity', hence the name.
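The closed-form solution for m and c above is a few lines of numpy; a sketch with fabricated data (true m = 2, c = 1 are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)  # errors on y only

# least squares solution from the normal equations
m = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
c = y.mean() - m * x.mean()
print(f"m = {m:.3f}, c = {c:.3f}")   # should recover ~2 and ~1
```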

Fig. 13: A straight line fit

Fig. 14: A straight line fit (left) and regression (right)

It is also true that tall sons tend to have tall fathers—but not that tall—and only tallish grandfathers. Regression works in both directions! Thus for regression there is always an ambiguity as to whether to plot x against y or y against x. For a straight line fit as we usually meet them this does not arise: one variable is precisely specified and we call that one x, and the one with measurement errors is y.

5.6 Fitting histograms When fitting a histogram the error is given by Poisson statistics for the number of events in each bin. There are 4 methods of approaching this problem—in order of increasing accuracy and decreasing speed. It is assumed that the bin width W is narrow, so that $f_i(x_i, a) = \int_{x_i}^{x_i+W} P(x, a)\,dx$ can be approximated by $f_i(x_i; a) = P(x_i; a) \times W$. W is almost always the same for all bins, but the rare cases of variable bin width can easily be included.

1. Minimise $\chi^2 = \sum_i \frac{(n_i - f_i)^2}{n_i}$. This is the simplest but clearly breaks if $n_i = 0$.
2. Minimise $\chi^2 = \sum_i \frac{(n_i - f_i)^2}{f_i}$. Minimising this Pearson $\chi^2$ (which is valid here) avoids the division-by-zero problem. It assumes that the Poisson distribution can be approximated by a Gaussian.
3. Maximise $\ln L = \sum \ln\left(e^{-f_i} f_i^{n_i}/n_i!\right) \sim \sum n_i \ln f_i - f_i$. This, known as binned maximum likelihood, remedies that assumption.

4. Ignore bins and maximise the total likelihood. Sums run over $N_{events}$, not $N_{bins}$, so if you have large data samples this is much slower. You have to use it for sparse data, but of course in such cases the sample is small and the time penalty is irrelevant. (The first three methods are compared numerically in the sketch below.)
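The comparison promised above, in a minimal sketch: the model is deliberately trivial, a flat histogram whose single parameter is the common expected bin content, so each method reduces to a one-dimensional minimisation (numpy/scipy assumed; the max(n, 1) guard in method 1 is just an illustrative patch for empty bins):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(11)
n = rng.poisson(5.0, size=50)            # 50 bins, true mean content 5

# model: the same expected content f in every bin
neyman  = lambda f: np.sum((n - f)**2 / np.maximum(n, 1))  # method 1, empty bins patched
pearson = lambda f: np.sum((n - f)**2 / f)                 # method 2
binned  = lambda f: -np.sum(n * np.log(f) - f)             # method 3 (negative ln L)

for name, obj in (("Neyman", neyman), ("Pearson", pearson), ("binned ML", binned)):
    fit = minimize_scalar(obj, bounds=(0.1, 20.0), method="bounded")
    print(f"{name:9s} fit: {fit.x:.3f}")
print("sample mean:", n.mean())          # binned ML reproduces this exactly
```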

Fig. 15: Fitting a histogram

Which method to use is something you have to decide on a case by case basis. If you have bins with zero entries then the first method is ruled out (and removing such bins from the fit introduces bias so this should not be done). Otherwise, in my experience, the improvement in adopting a more complicated method tends to be small.

6 Errors Estimation gives you a value for the parameter(s) that we have called a. But you also—presumably—want to know something about the uncertainty on that estimate. The maximum likelihood method provides this.

6.1 Errors from likelihood For large N, the ln L(a, x) curve is a parabola, as shown in Fig. 16.

Fig. 16: Reading off the error from a Maximum Likelihood fit

At the maximum, a Taylor expansion gives
$$\ln L(a) = \ln L(\hat a) + \frac{1}{2}(a - \hat a)^2 \frac{d^2\ln L}{da^2} + \cdots$$

The maximum likelihood estimator saturates the MVB, so
$$V_{\hat a} = -1\bigg/\left\langle\frac{d^2\ln L}{da^2}\right\rangle \qquad \sigma_{\hat a} = \sqrt{\frac{-1}{\frac{d^2\ln L}{da^2}}} . \qquad (17)$$

We approximate the expectation value $\left\langle\frac{d^2\ln L}{da^2}\right\rangle$ by the actual value in this case, $\left.\frac{d^2\ln L}{da^2}\right|_{a=\hat a}$ (for a discussion of the introduced inaccuracy, see Ref. [12]). This can be read off the curve, as also shown in Fig. 16. The maximum gives the estimate. You then draw a line $\frac{1}{2}$ below that (of course nowadays this is done within the code, not with pencil and ruler, but the visual image is still valid). This line $\ln L(a) = \ln L(\hat a) - \frac{1}{2}$ intersects the likelihood curve at the two points $a = \hat a \pm \sigma_{\hat a}$. If you are working with $\chi^2$, $L \propto e^{-\frac{1}{2}\chi^2}$ so the line is $\Delta\chi^2 = 1$. This gives σ, or 68%, errors. You can also take $\Delta\ln L = -2$ to get 2 sigma or 95% errors, or −4.5 for 3 sigma errors as desired. For large N these will all be consistent.
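As a worked illustration of the $\Delta\ln L = -\frac{1}{2}$ rule (my example): a Poisson count of n = 25 events, where the crossings can be found with a root finder:

```python
import numpy as np
from scipy.optimize import brentq

n = 25                                        # observed number of events
lnL = lambda lam: n * np.log(lam) - lam       # Poisson ln L, constants dropped
lam_hat = float(n)                            # the maximum is at lambda = n

# find where ln L drops by 1/2 on either side of the maximum
target = lnL(lam_hat) - 0.5
lo = brentq(lambda l: lnL(l) - target, 1e-3, lam_hat)
hi = brentq(lambda l: lnL(l) - target, lam_hat, 10.0 * lam_hat)
print(f"lambda = {lam_hat:.1f} +{hi - lam_hat:.2f} -{lam_hat - lo:.2f}")
# close to +/- sqrt(25) = 5, but slightly asymmetric, as Section 6.3 discusses
```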

6.2 Combining errors

Having obtained—by whatever means—errors σx, σy... how does one combine them to get errors on derived quantities f(x, y...), g(x, y, ...)? Suppose f = Ax + By + C, with A, B and C constant. Then it is easy to show that

$$V_f = \left\langle(f - \langle f\rangle)^2\right\rangle = \left\langle(Ax + By + C - \langle Ax + By + C\rangle)^2\right\rangle$$
$$= A^2(\langle x^2\rangle - \langle x\rangle^2) + B^2(\langle y^2\rangle - \langle y\rangle^2) + 2AB(\langle xy\rangle - \langle x\rangle\langle y\rangle) = A^2 V_x + B^2 V_y + 2AB\,\text{Cov}_{xy} . \qquad (18)$$

If f is not a simple linear function of x and y then one can use a first order Taylor expansion to approximate it about a central value f0(x0, y0)

$$f(x, y) \approx f_0 + \left(\frac{\partial f}{\partial x}\right)(x - x_0) + \left(\frac{\partial f}{\partial y}\right)(y - y_0) \qquad (19)$$
and application of Eq. 18 gives

$$V_f = \left(\frac{\partial f}{\partial x}\right)^2 V_x + \left(\frac{\partial f}{\partial y}\right)^2 V_y + 2\left(\frac{\partial f}{\partial x}\right)\left(\frac{\partial f}{\partial y}\right) \text{Cov}_{xy} . \qquad (20)$$
Writing the more familiar σ² instead of V, this is equivalent to

$$\sigma_f^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2 + 2\rho\left(\frac{\partial f}{\partial x}\right)\left(\frac{\partial f}{\partial y}\right) \sigma_x\sigma_y . \qquad (21)$$
If x and y are independent, which is often but not always the case, this reduces to what is often known as the 'combination of errors' formula

$$\sigma_f^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2 . \qquad (22)$$

Extension to more than two variables is trivial: an extra squared term is added for each, and an extra covariance term for each of the variables (if any) with which it is correlated. This can be expressed in everyday language as: errors add in quadrature. This is a friendly fact, as the result is smaller than you would get from arithmetic addition. If this puzzles you, it may be helpful to think of this as allowing for the possibility that a positive fluctuation in one variable may be cancelled by a negative fluctuation in the other.

There are a couple of special cases we need to consider. If f is a simple product, f = Axy, then Eq. 22 gives $\sigma_f^2 = (Ay)^2\sigma_x^2 + (Ax)^2\sigma_y^2$, which, dividing by $f^2$, can be written as
$$\left(\frac{\sigma_f}{f}\right)^2 = \left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2 . \qquad (23)$$
Furthermore this also applies if f is a simple quotient, f = Ax/y or Ay/x or even A/(xy). This is very elegant, but it should not be overemphasised. Equation 23 is not fundamental: it only applies in certain cases (products or quotients). Equation 22 is the fundamental one, and Eq. 23 is just a special case of it. For example: if you measure the radius of a cylinder as r = 123 ± 2 mm and the height as h = 456 ± 3 mm, then the volume $\pi r^2 h$ is $\pi \times 123^2 \times 456 = 21673295\ \text{mm}^3$ with error $\sqrt{(2\pi rh)^2\sigma_r^2 + (\pi r^2)^2\sigma_h^2} = 719101\ \text{mm}^3$, so one could write it as $v = (216.73 \pm 0.72) \times 10^5\ \text{mm}^3$. The surface area $2\pi r^2 + 2\pi rh$ is $2\pi \times 123^2 + 2\pi \times 123 \times 456 = 447470\ \text{mm}^2$ with error $\sqrt{(4\pi r + 2\pi h)^2\sigma_r^2 + (2\pi r)^2\sigma_h^2} = 9121\ \text{mm}^2$—so one could write the result as $a = (447.5 \pm 9.1) \times 10^3\ \text{mm}^2$.

A full error analysis has to include the treatment of the covariance terms—if only to show that they can be ignored. Why should the x and y in Eq. 20 be correlated? For direct measurements very often (but not always) they will not be. However the interpretation of results is generally a multistage process. From raw numbers of events one computes branching ratios (or cross sections...), from which one computes matrix elements (or particle masses...). Many quantities of interest to theorists are expressed as ratios of experimental numbers. And in this interpretation there is plenty of scope for correlations to creep into the analysis. For example, an experiment might measure a cross section σ(pp → X) from a number of observed events N in the decay channel X → µ⁺µ⁻. One would use a formula
$$\sigma = \frac{N}{B\eta L} ,$$
where η is the efficiency for detecting and reconstructing an event, B is the branching ratio for X → µ⁺µ⁻, and L is the integrated luminosity. These will all have errors, and the above prescription can be applied. However it might also use the X → e⁺e⁻ channel and then use

$$\sigma' = \frac{N'}{B'\eta' L} .$$
Now σ and σ′ are clearly correlated; even though N and N′ are independent, the same L appears in both. If the estimate of L is on the high side, that will push both σ and σ′ downwards, and vice versa. On the other hand, if a second experiment did the same measurement it would have its own N, η and L, but would be correlated with the first through using the same branching ratio (taken, presumably, from the Particle Data Group).

To calculate correlations between results we need the equivalent of Eq. 18:

$$\text{Cov}_{fg} = \langle(f - \langle f\rangle)(g - \langle g\rangle)\rangle = \left(\frac{\partial f}{\partial x}\right)\left(\frac{\partial g}{\partial x}\right)\sigma_x^2 . \qquad (24)$$

This can all be combined in the general formula which encapsulates all of the ones above

$$V_f = G\, V_x\, \widetilde{G} , \qquad (25)$$
where $V_x$ is the covariance matrix of the primary quantities (often, as pointed out earlier, this is diagonal), $V_f$ is the covariance matrix of secondary quantities, and

$$G_{ij} = \frac{\partial f_i}{\partial x_j} . \qquad (26)$$
The G matrix is rectangular but need not be square. There may be more—or fewer—derived quantities than primary quantities. The matrix algebra of G and its transpose $\widetilde{G}$ ensures that the numbers of rows and columns match for Eq. 25. To show how this works, we go back to our earlier example of a cylinder. v and a are correlated: if r or h fluctuates upwards (or downwards), that makes both volume and area larger (or smaller). The matrix G is

$$G = \begin{pmatrix} 2\pi rh & \pi r^2 \\ 2\pi(2r + h) & 2\pi r \end{pmatrix} = \begin{pmatrix} 352411 & 47529 \\ 4411 & 773 \end{pmatrix} , \qquad (27)$$

the variance matrix $V_x$ is
$$V_x = \begin{pmatrix} 4 & 0 \\ 0 & 9 \end{pmatrix}$$
and Eq. 25 gives

$$V_f = \begin{pmatrix} 517.1 \times 10^9 & 6.548 \times 10^9 \\ 6.548 \times 10^9 & 83.20 \times 10^6 \end{pmatrix} ,$$
from which one obtains, as before, $\sigma_v = 719101$ and $\sigma_a = 9121$, but also ρ = 0.9983. This can be used to provide a useful example of why correlation matters. Suppose you want to know the volume to surface ratio, z = v/a, of this cylinder. Division gives z = 21673295/447470 = 48.4352 mm.

If we just use Eq. 22 for the error, this gives $\sigma_z = 1.89$ mm. Including the correlation term, as in Eq. 21, reduces this to 0.62 mm—three times smaller. It makes a big difference. We can also check that this is correct, because the ratio v/a can be written as $\frac{\pi r^2 h}{2\pi r^2 + 2\pi rh}$, and applying the uncorrelated errors of the original r and h to this also gives an error of 0.62 mm. As a second, hopefully helpful, example we consider a simple straight line fit, y = mx + c. Assuming that all the N y values are measured with the same error σ, least squares estimation gives the well known results
$$m = \frac{\overline{xy} - \bar x\,\bar y}{\overline{x^2} - \bar x^2} \qquad c = \frac{\bar y\,\overline{x^2} - \overline{xy}\,\bar x}{\overline{x^2} - \bar x^2} . \qquad (28)$$

For simplicity we write $D = 1/(\overline{x^2} - \bar x^2)$. The differentials are

$$\frac{\partial m}{\partial y_i} = \frac{D}{N}(x_i - \bar x) \qquad \frac{\partial c}{\partial y_i} = \frac{D}{N}(\overline{x^2} - x_i\bar x) ,$$
from which, remembering that the y values are uncorrelated,

$$V_m = \sigma^2\left(\frac{D}{N}\right)^2 \sum_i (x_i - \bar x)^2 = \sigma^2\frac{D}{N}$$
$$V_c = \sigma^2\left(\frac{D}{N}\right)^2 \sum_i (\overline{x^2} - x_i\bar x)^2 = \sigma^2\frac{D}{N}\overline{x^2}$$
$$\text{Cov}_{mc} = \sigma^2\left(\frac{D}{N}\right)^2 \sum_i (x_i - \bar x)(\overline{x^2} - x_i\bar x) = -\sigma^2\frac{D}{N}\bar x$$

from which the correlation between m and c is just $\rho = -\bar x/\sqrt{\overline{x^2}}$. This makes sense. Imagine you're fitting a straight line through a set of points with a range of positive x values (so $\bar x$ is positive). If the rightmost point happened to be a bit higher, that would push the slope m up and the intercept c down. Likewise if the leftmost point happened to be too high, that would push the slope down and the intercept up. There is a negative correlation between the two fitted quantities. Does it matter? Sometimes. Not if you're just interested in the slope—or the constant. But suppose you intend to use them to find the expected value of y at some extrapolated x. Equation 21 gives
$$y = mx + c \pm \sqrt{x^2\sigma_m^2 + \sigma_c^2 + 2x\rho\sigma_m\sigma_c}$$
and if, for a typical case where $\bar x$ is positive so ρ is negative, you leave out the correlation term you will overestimate your error. This is an educational example because this correlation can be avoided. Shifting to a co-ordinate system in which $\bar x$ is zero ensures that the quantities are uncorrelated. This is equivalent to rewriting the well-known y = mx + c formula as $y = m(x - \bar x) + c'$, where m is the same as before and $c' = c + m\bar x$. m and c′ are now uncorrelated, and error calculations involving them become a lot simpler.
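The cylinder numbers above (Eqs. 25–27, and the correlated error on z = v/a) can be checked with a few lines of matrix algebra; a sketch assuming numpy:

```python
import numpy as np

r, h = 123.0, 456.0
sig_r, sig_h = 2.0, 3.0

# G_ij = df_i/dx_j for f = (v, a) and x = (r, h), as in Eq. 27
G = np.array([[2 * np.pi * r * h,       np.pi * r**2],
              [2 * np.pi * (2 * r + h), 2 * np.pi * r]])
Vx = np.diag([sig_r**2, sig_h**2])

Vf = G @ Vx @ G.T                        # Eq. 25
sig_v, sig_a = np.sqrt(np.diag(Vf))
rho = Vf[0, 1] / (sig_v * sig_a)
print(f"sigma_v = {sig_v:.0f}, sigma_a = {sig_a:.0f}, rho = {rho:.4f}")

# error on the ratio z = v/a, keeping the covariance term (Eq. 21)
v = np.pi * r**2 * h
a = 2 * np.pi * r**2 + 2 * np.pi * r * h
dz = np.array([1.0 / a, -v / a**2])      # derivatives of z wrt (v, a)
sig_z = np.sqrt(dz @ Vf @ dz)
print(f"z = {v / a:.2f} +/- {sig_z:.2f} mm")   # ~48.44 +/- 0.62 mm
```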

6.3 Asymmetric errors
So what happens if you plot the likelihood function and it is not symmetric like Fig. 16 but looks more like Fig. 17? This arises in many cases when numbers are small. For instance, in a simple Poisson count suppose you observe one event. $P(1; \lambda) = \lambda e^{-\lambda}$ is not symmetric: λ = 1.5 is more likely to fluctuate down to 1 than λ = 0.5 is to fluctuate up to 1.

You can read off $\sigma^+$ and $\sigma^-$ from the two $\Delta \ln L = -\frac{1}{2}$ crossings, but they are different. The result can then be given as $\hat{a}^{+\sigma^+}_{-\sigma^-}$. What happens after that?

The first advice is to avoid this if possible. If you get $\hat{a} = 4.56$ with $\sigma^+ = 1.61$ and $\sigma^- = 1.59$, then quote this as 4.6 ± 1.6 rather than $4.56^{+1.61}_{-1.59}$. Those extra significant digits have no real meaning. If you can convince yourself that the difference between $\sigma^+$ and $\sigma^-$ is small enough to be ignored, then you should do so, as the alternative brings in a whole lot of trouble and it is not worth it. But there will be some cases where the difference is too great to be swept away, so let us consider that case. There are two problems that arise: combination of measurements and combination of errors.

Fig. 17: An asymmetric likelihood curve

6.3.1 Combination of measurements with asymmetric errors
Suppose you have two measurements of the same parameter a, $\hat{a}_1{}^{+\sigma_1^+}_{-\sigma_1^-}$ and $\hat{a}_2{}^{+\sigma_2^+}_{-\sigma_2^-}$, and you want to combine them to give the best estimate and, of course, its error. For symmetric errors the answer is well established to be $\hat{a} = \frac{\hat{a}_1/\sigma_1^2 + \hat{a}_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}$.

If you know the likelihood functions, you can do it: the joint log likelihood is just the sum of the individual ones. This is shown in Fig. 18, where the red and green curves are measurements of a. The log likelihood functions just add (blue), from which the peak is found and the $\Delta \ln L = -\frac{1}{2}$ errors read off.

Fig. 18: Combination of two likelihood functions (red and green) to give the total (blue)

But you don’t know the full likelihood function: just 3 points (and that it had a maximum at the second). There are, of course, an infinite number of curves that could be drawn, and several models have been tried (cubics, constrained quartic...) on likely instances—see Ref. [13] for details. Some do better than others. The two most plausible are

\[
\ln L = -\frac{1}{2}\left(\frac{a - \hat{a}}{\sigma + \sigma'(a - \hat{a})}\right)^2 \qquad\text{and} \qquad (29)
\]

\[
\ln L = -\frac{1}{2}\,\frac{(a - \hat{a})^2}{V + V'(a - \hat{a})} . \qquad (30)
\]

These are similar to the Gaussian parabola, but the denominator is not constant: it varies with the value of a, being linear either in the standard deviation or in the variance. Both are pretty good. The first does better with errors on log a (which are asymmetric if the errors on a are symmetric: such asymmetric error bars are often seen on plots where the y axis is logarithmic); the second does better with Poisson measurements. From the 3 numbers given one readily obtains

\[
\sigma = \frac{2\sigma^+\sigma^-}{\sigma^+ + \sigma^-} \qquad \sigma' = \frac{\sigma^+ - \sigma^-}{\sigma^+ + \sigma^-} \qquad (31)
\]

or, if preferred,

\[
V = \sigma^+\sigma^- \qquad V' = \sigma^+ - \sigma^- . \qquad (32)
\]

From the total log likelihood you then find the maximum of the sum, numerically, and the $\Delta \ln L = -\frac{1}{2}$ points. Code for doing this is available on GitHub⁴ in both R and Root.

Fig. 19: Combining three asymmetric measurements

An example is shown in Fig. 19. Combining $1.9^{+0.7}_{-0.5}$, $2.4^{+0.6}_{-0.8}$ and $3.1^{+0.5}_{-0.4}$ gives $2.76^{+0.29}_{-0.27}$.
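A minimal R sketch of this combination, parametrizing each measurement with the linear-σ form of Eqs. 29 and 31 and reading the $\Delta \ln L = -\frac{1}{2}$ crossings off numerically (the GitHub code does this more carefully):

    # Combine asymmetric measurements using Eq. 29 with sigma, sigma' from Eq. 31
    lnL_one <- function(a, ahat, sp, sm) {
      s      <- 2 * sp * sm / (sp + sm)       # sigma,  Eq. 31
      sprime <- (sp - sm) / (sp + sm)         # sigma', Eq. 31
      -0.5 * ((a - ahat) / (s + sprime * (a - ahat)))^2
    }
    meas <- list(c(1.9, 0.7, 0.5), c(2.4, 0.6, 0.8), c(3.1, 0.5, 0.4))
    lnL  <- function(a) sum(sapply(meas, function(m) lnL_one(a, m[1], m[2], m[3])))
    ahat <- optimize(lnL, c(1, 4), maximum = TRUE)$maximum         # about 2.76
    lo <- uniroot(function(a) lnL(a) - lnL(ahat) + 0.5, c(1, ahat))$root
    hi <- uniroot(function(a) lnL(a) - lnL(ahat) + 0.5, c(ahat, 4))$root
    c(lo - ahat, hi - ahat)                   # about -0.27 and +0.29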

6.3.2 Combination of errors for asymmetric errors

For symmetric errors, given x ± $\sigma_x$ and y ± $\sigma_y$ (and $\rho_{xy} = 0$), the error on f(x, y) is the sum in quadrature: $\sigma_f^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2$. What is the equivalent for the error on f(x, y) when the errors are asymmetric, $x^{+\sigma_x^+}_{-\sigma_x^-}$ and $y^{+\sigma_y^+}_{-\sigma_y^-}$? Such a problem arises frequently at the end of an analysis, when the systematic errors from various sources are all combined.

The standard procedure, which you will see done, though it has not, to my knowledge, been written down anywhere, is to add the positive and negative errors in quadrature separately: $\sigma_f^{+2} = \sigma_x^{+2} + \sigma_y^{+2}$, $\sigma_f^{-2} = \sigma_x^{-2} + \sigma_y^{-2}$. This looks plausible, but it is manifestly wrong, as it breaks the central limit theorem. To see this, suppose you have to average N i.i.d. variables, each with the same asymmetric errors, $\sigma^+ = 2\sigma^-$. The standard procedure reduces both $\sigma^+$ and $\sigma^-$ by a factor $1/\sqrt{N}$, but the asymmetry remains: the positive error is twice the negative error. This is therefore not Gaussian, and never will be, even as N → ∞.

4https://github.com/RogerJBarlow/Asymmetric-Errors

You can see what is happening by considering the combination of two of these measurements. They may both fluctuate upwards, or both downwards, and yes, the upward fluctuation will be, on average, twice as big. But there is a 50% chance of one upward and one downward fluctuation, which is not considered in the standard procedure.

For simplicity we write $z_i = \frac{\partial f}{\partial x_i}(x_i - x_i^0)$, the deviation of each parameter from its nominal value, scaled by the differential. The individual likelihoods are again parametrized as Gaussian with a linear dependence of the standard deviation or of the variance, giving

\[
\ln L(\vec{z}) = -\frac{1}{2}\sum_i \left(\frac{z_i}{\sigma_i + \sigma_i' z_i}\right)^2
\quad\text{or}\quad
-\frac{1}{2}\sum_i \frac{z_i^2}{V_i + V_i' z_i} , \qquad (33)
\]

where $\sigma, \sigma', V, V'$ are obtained from Eqs. 31 or 32.

The $z_i$ are nuisance parameters (as described later) and can be removed by profiling. Let $u = \sum_i z_i$ be the total deviation in the quoted f arising from the individual deviations. We form $\hat{L}(u)$ as the maximum of $L(\vec{z})$ subject to the constraint $\sum_i z_i = u$. The method of undetermined multipliers readily gives the solution

\[
z_i = u\,\frac{w_i}{\sum_j w_j} , \qquad (34)
\]

where

\[
w_i = \frac{(\sigma_i + \sigma_i' z_i)^3}{2\sigma_i}
\quad\text{or}\quad
\frac{(V_i + V_i' z_i)^2}{2V_i + V_i' z_i} . \qquad (35)
\]

The equations are nonlinear, but can be solved iteratively. At u = 0 all the $z_i$ are zero. Increasing (or decreasing) u in small steps, Eqs. 34 and 35 are applied successively to give the $z_i$ and the $w_i$: convergence is rapid. The value of u which maximises the likelihood should in principle be applied as a correction to the quoted result. Programs to do this are also available on the GitHub site.

As an example, consider a counting experiment with a number of backgrounds, each determined by an ancillary Poisson experiment, and suppose for simplicity that each background was determined by running the apparatus for the same time as the actual experiment. (In practice this is unlikely, but scale factors can easily be added.)

Suppose two backgrounds are measured, one giving four events and the other five. These would be reported, using $\Delta \ln L = -\frac{1}{2}$ errors, as $4^{+2.346}_{-1.682}$ and $5^{+2.581}_{-1.916}$. The method, using linear V, gives the combined error on the background count as $^{+3.333}_{-2.668}$. In this simple case we can check the result against the total background count of nine events, which has errors $^{+3.342}_{-2.676}$. The agreement is impressive. Further examples of the same total, partitioned differently, are shown in Table 1.

    Inputs                 Linear σ            Linear V
                           σ−       σ+         σ−       σ+
    4+5                    2.653    3.310      2.668    3.333
    3+6                    2.653    3.310      2.668    3.333
    2+7                    2.653    3.310      2.668    3.333
    3+3+3                  2.630    3.278      2.659    3.323
    1+1+1+1+1+1+1+1+1      2.500    3.098      2.610    3.270

Table 1: Various combinations of Poisson errors. The target value is σ− = 2.676, σ+ = 3.342.
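The Poisson inputs quoted above can be checked with a few lines of R, solving $\ln L(\lambda) - \ln L(n) = -\frac{1}{2}$ for a count n (a sketch; constants in the likelihood are dropped):

    # Delta lnL = -1/2 errors for a Poisson count n: lnL(lambda) = n log(lambda) - lambda
    poisson_errors <- function(n) {
      f  <- function(lam) (n * log(lam) - lam) - (n * log(n) - n) + 0.5
      lo <- uniroot(f, c(1e-9, n))$root
      hi <- uniroot(f, c(n, n + 10 * sqrt(n) + 10))$root
      c(minus = n - lo, plus = hi - n)
    }
    poisson_errors(4)   # about -1.682, +2.346
    poisson_errors(5)   # about -1.916, +2.581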

6.4 Errors in 2 or more dimensions
For 2 (or more) dimensions, one plots the log likelihood and defines regions using contours in $\Delta \ln L$ (or $\Delta\chi^2 \equiv -2\Delta \ln L$). An example is given in Fig. 20.

Fig. 20: CMS results on CV and CF, taken from Ref. [14]

The link between $\Delta \ln L$ values and significance changes with the dimension. In 1D, there is a 68% probability of a measurement falling within 1σ. In 2D, a 1σ square would give a probability of only $0.68^2 = 47\%$. If one rounds off the corners and draws a 1σ contour at $\Delta \ln L = -\frac{1}{2}$, this falls to 39%. To retrieve the full 68% one has to draw the contour at $\Delta \ln L = -1.14$, or equivalently $\Delta\chi^2 = 2.27$. For 95% use $\Delta\chi^2 = 5.99$ or $\Delta \ln L = -3.00$. The necessary value is obtained from the $\chi^2$ distribution, described later. It can be found with the R function qchisq(p,n) or the Root function TMath::ChiSquareQuantile(p,n), where the arguments are the desired probability p and the number of degrees of freedom n.
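For instance, in R (a quick check of the numbers just quoted):

    qchisq(0.68, 2) / 2   # 1.14: the 68% contour level in -Delta lnL, 2D
    qchisq(0.95, 2)       # 5.99: the 95% contour level in Delta chi2
    qchisq(0.95, 2) / 2   # 3.00: the same in -Delta lnL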

6.4.1 Nuisance parameters

In the example of Fig. 20, both CV and CF are interesting. But in many cases one is interested only in one (or some) of the quantities and the others are ‘nuisance parameters’ that one would like to remove, reducing the dimensionality of the quoted result. There are two methods of doing this, one (basically) frequentist and one Bayesian.

The frequentist uses the profile likelihood technique. Suppose that there are two parameters, $a_1$ and $a_2$, where $a_2$ is a nuisance parameter, and one wants to reduce the joint likelihood function $L(x; a_1, a_2)$ to some function $\hat{L}(a_1)$. To do this one scans across the values of $a_1$ and inserts $\hat{\hat{a}}_2(a_1)$, the value of $a_2$ which maximises the likelihood for that particular $a_1$:

\[
\hat{L}(x, a_1) = L(a_1, \hat{\hat{a}}_2(a_1)) \qquad (36)
\]

and the location of the maximum and the $\Delta \ln L = -\frac{1}{2}$ errors are read off as usual.

To see why this works (though this is not a very rigorous motivation), suppose one had a likelihood function as shown in Fig. 21.

Fig. 21: Justification of the likelihood profile method

The horizontal axis is for the parameter of interest, a1, and the vertical for the nuisance parameter a2.

Different values of $a_2$ give different results (central value and errors) for $a_1$. If it is possible to transform to $a_2'(a_1, a_2)$ so that L factorises, then we can write $L(a_1, a_2) = L_1(a_1)\,L_2(a_2')$: this is shown in the plot on the right. We suppose that this is indeed possible. In the case here, and in other not-too-complicated cases, it clearly is, although it will not be so in more complicated topologies with multiple peaks.

Then, using the transformed graph, whatever the value of $a_2'$ one would get the same result for $a_1$, and one can present this result for $a_1$ independent of anything about $a_2'$. There is no need to factorise explicitly: the path of the central $a_2'$ value as a function of $a_1$ (the central of the three lines on the right-hand plot) is the path of the peak, and that path can be located in the first plot (the transformation only stretches the $a_2$ axis, it does not change the heights).

The Bayesian method uses the technique called marginalisation, which just integrates over $a_2$. Frequentists cannot do this, as they are not allowed to integrate likelihoods over the parameter: $\int P(x; a)\,dx$ is fine, but $\int P(x; a)\,da$ is off limits. Nevertheless this can be a very helpful alternative to profiling, especially for many nuisance parameters. But if you use it you must be aware that it is strictly Bayesian: reparametrizing $a_2$ (or choosing a different prior) will give different results for $a_1$. In many cases, where the effect of the nuisance parameter is small, this does not have a big effect on the result.
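A toy sketch in R (the correlated Gaussian here is invented for illustration): for a Gaussian likelihood, profiling and marginalising over the nuisance parameter give the same interval for the parameter of interest.

    # Profile and marginalise a correlated 2D Gaussian log likelihood in (a1, a2)
    rho <- 0.8
    lnL <- function(a1, a2) -0.5 * (a1^2 - 2*rho*a1*a2 + a2^2) / (1 - rho^2)
    a1  <- seq(-3, 3, 0.01)
    prof <- sapply(a1, function(u)
      optimize(function(v) lnL(u, v), c(-10, 10), maximum = TRUE)$objective)
    marg <- sapply(a1, function(u)
      log(integrate(function(v) exp(lnL(u, v)), -10, 10)$value))
    # Both give Delta lnL = -1/2 at a1 of about +-1:
    range(a1[prof - max(prof) > -0.5])
    range(a1[marg - max(marg) > -0.5])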

6.5 Systematic errors
This can be a touchy subject. There is a lot of bad practice out there: muddled thinking, and following traditional procedures without understanding them. When statistical errors dominated, this did not matter much. In the days of particle factories and big data samples, it does.

6.5.1 What is a systematic error?
Consider these two quotations, from eminent and widely-read authorities. P. R. Bevington defines

‘Systematic error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique.’ [15], whereas J. Orear writes

‘Systematic effects is a general category which includes effects such as background, scanning efficiency, energy resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error.’ [16].

Read these carefully and you will see that they are contradictory. They are not talking about the same thing. Furthermore, Orear is RIGHT and Bevington is WRONG—as are a lot of other books and websites. We teach undergraduates the difference between measurement errors, which are part of doing science, and mistakes. They are not the same. If you measure a potential of 12.3 V as 12.4 V, with a voltmeter accurate to 0.1V, that is fine. Even if you measure 12.5 V. If you measure it as 124 V, that is a mistake. In the quotes above, Bevington is describing systematic mistakes (the word ‘faulty’ is the key) whereas Orear is describing systematic uncertainties—which are ‘errors’ in the way we use the term. There is a case for saying one should avoid the term ‘systematic error’ and always use ‘uncertainty’ or ’mistake’. This is probably impossible. But you should always know which you mean. Restricting ourselves to uncertainties (we will come back to mistakes later) here are some typical examples:

– Track momenta from $p_i = 0.3 B \rho_i$ have statistical errors from the curvature radius $\rho_i$ and systematic errors from the magnetic field B,
– Calorimeter energies from $E_i = \alpha D_i + \beta$ have statistical errors from the digitised light signal $D_i$ and systematic errors from the calibration constants α, β, and
– Branching ratios from $Br = \frac{N_D - B}{\eta N_T}$ have statistical errors from $N_D$ and systematic errors from the efficiency η, the background B and the total $N_T$.

Systematic uncertainties can be either Bayesian or frequentist. There are clearly frequentist cases where errors have been determined by an ancillary experiment (real or simulated), such as magnetic field measurements, calorimeter calibration in a testbeam, and efficiencies from Monte Carlo simulations. (Sometimes the ancillary experiment is also the main experiment, e.g. in estimating background from sidebands.) There are also uncertainties that can only be Bayesian, e.g. when a theorist tells you that their calculation is good to 5% (or whatever), or an experimentalist affirms that the calibration will not have shifted during the run by more than 2% (or whatever).

6.5.2 How to handle them: correlations
Working with systematic errors is actually quite straightforward: they obey the same rules as statistical uncertainties. We write x = 12.2 ± 0.3 ± 0.4, ‘where the first error is statistical and the second is systematic’, but it would be equally valid to write x = 12.2 ± 0.5. For a single measurement the extra information given by the two separate numbers is small. (In this case it just tells you that there is little to be gained by increasing the size of the data sample.) For multiple measurements, e.g. $x_a$ = 12.2 ± 0.3 and $x_b$ = 17.1 ± 0.4, both with a common systematic error of ±0.5, the extra information is important, as the results are correlated. Such cases arise, for example, in cross section measurements with a common luminosity error, or in branching ratios with a common efficiency. Such a correlation means that taking more measurements and averaging does not reduce the systematic part of the error. Also there is then no way to estimate $\sigma_{sys}$ from the data, and hence no check on the goodness of fit from a $\chi^2$ test.

6.5.3 Handling systematic errors in your analysis
It is useful to consider systematic errors as being of three types:

1. Uncertainty in an explicit continuous parameter. For example, an uncertainty in the efficiency, background or luminosity in determining a branching ratio or cross section. For these the standard combination-of-errors formula and algebra are usable, just as in undergraduate labs.
2. Uncertainty in an implicit continuous parameter. For example, MC tuning parameters ($\sigma_{p_T}$, polarisation, ...). These are not amenable to algebra. Instead one calculates the result for different parameter values, typically at ±σ, and observes the variation in the result, as illustrated in Fig. 22.

Fig. 22: Evaluating the effect of an implicit systematic uncertainty

Hopefully the effect is equal but opposite; if not, then one can reluctantly quote an asymmetric error. Also your analysis results will have errors due to finite MC statistics. Some people add these in quadrature. This is wrong. The technically correct thing to do is to subtract them in quadrature, but this is not advised.
3. Discrete uncertainties. These typically occur in model choices. Using a different Monte Carlo for background, or signal, gives you a (slightly) different result. How do you include this uncertainty? The situation depends on the status of the models. Sometimes one is preferred; sometimes they are all equal (more or less).
   – With one preferred model and one other, quote $R_1 \pm |R_1 - R_2|$.
   – With two models of equal status, quote $\frac{R_1 + R_2}{2} \pm \frac{|R_1 - R_2|}{\sqrt{2}}$.
   – With N models, take $\bar{R} \pm \sqrt{\frac{N}{N-1}\left(\overline{R^2} - \bar{R}^2\right)}$ or a similar mean value.
   – With two extreme models, take $\frac{R_1 + R_2}{2} \pm \frac{|R_1 - R_2|}{\sqrt{12}}$.
These are just ballpark estimates. Do not push them too hard. If the difference is not small, you have a problem, which can be an opportunity to study the model differences.

6.5.4 Checking the analysis
"As we know, there are known knowns. There are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know." Donald H. Rumsfeld

Errors are not mistakes, but mistakes still happen. Statistical tools can help find them. Check your result by repeating the analysis with changes which should make no difference:

– Data subsets, – Magnet up/down, – Different selection cuts, – Changing histogram bin size and fit ranges, – Changing parametrization (including order of polynomial), – Changing fit technique, – Looking for impossibilities, – ...

The more tests the better. You cannot prove the analysis is correct. But the more tests it survives the more likely your colleagues5 will be to believe the result. For example: in the paper reporting the first measurement of CP violation in B mesons the BaBar Collaboration [17] reported

‘... consistency checks, including separation of the data by decay mode, tagging category and $B_{tag}$ flavour ... We also fit the samples of non-CP decay modes for sin 2β with no statistically significant difference found.’

If your analysis passes a test then tick the box and move on. Do not add the discrepancy to the systematic error. Many people do—and your supervisor and your review committee may want you to do so. Do not give in.

– It’s illogical, – It penalises diligence, and – Errors get inflated.

If your analysis fails a test then worry!

– Check the test. Very often this turns out to be faulty.
– Check the analysis. Find the mistake, enjoy the improvement.
– Worry. Consider whether the effect might be real. (E.g. June's results are different from July's. A temperature effect? If so one can (i) compensate and (ii) introduce an implicit systematic uncertainty.)
– Worry harder. Ask colleagues, look at other experiments.

Only as a last resort, add the term to the systematic error. Remember that this could be a hint of something much bigger and nastier.

5and eventually even you

6.5.5 Clearing up a possible confusion
What is the difference between the following two activities?
Evaluating implicit systematic errors: vary lots of parameters, see what happens to the result, and include the variation in the systematic error.
Checks: vary lots of parameters, see what happens to the result, and do not include the variation in the systematic error.
If you find yourself in such a situation, there are actually two ways to tell the difference. (1) Are you expecting to see an effect? If so, it is an evaluation; if not, it is a check. (2) Do you clearly know how much to vary the parameters by? If so, it is an evaluation; if not, it is a check. These cover even complicated cases, such as a trigger energy cut where the energy calibration is uncertain, where it may be simpler to simulate the effect by varying the cut rather than the calibration.

6.5.6 So finally:
1. Thou shalt never say ‘systematic error’ when thou meanest ‘systematic effect’ or ‘systematic mistake’.
2. Thou shalt know at all times whether what thou performest is a check for a mistake or an evaluation of an uncertainty.
3. Thou shalt not incorporate successful check results into thy total systematic error and make thereby a shield to hide thy dodgy result.
4. Thou shalt not incorporate failed check results unless thou art truly at thy wits’ end.
5. Thou shalt not add uncertainties on uncertainties in quadrature. If they are larger than chickenfeed thou shalt generate more Monte Carlo until they shrink.
6. Thou shalt say what thou doest, and thou shalt be able to justify it out of thine own mouth; not the mouth of thy supervisor, nor thy colleague who did the analysis last time, nor thy local statistics guru, nor thy mate down the pub.

Do these, and thou shalt flourish, and thine analysis likewise.

7 Goodness of fit
You have the best fit model to your data, but is it good enough? The upper plot in Fig. 23 shows the best straight line through a set of points which are clearly not well described by a straight line. How can one quantify this?

You construct some measure of agreement, call it t, between the model and the data. Convention: t ≥ 0, t = 0 is perfect agreement, and worse agreement implies larger t. The null hypothesis $H_0$ is that the model did indeed produce this data. You then calculate the p-value: the probability, under $H_0$, of getting a t this bad, or worse. This is shown schematically in the lower plot. Usually this can be done using known algebra; if not, one can use simulation (a so-called ‘Toy Monte Carlo’).

7.1 The χ2 distribution
The overwhelmingly most used such measure of agreement is the quantity χ2:

\[
\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - f(x_i)}{\sigma_i}\right)^2 . \qquad (37)
\]

In words: the total of the squared differences between prediction and data, scaled by the expected error. Obviously each term will be about 1, so $\langle \chi^2 \rangle \approx N$, and this turns out to be exact.

Fig. 23: The best fit to the data may not be good enough

The distribution for χ2 is given by

\[
P(\chi^2; N) = \frac{1}{2^{N/2}\,\Gamma(N/2)}\,\chi^{N-2}\,e^{-\chi^2/2} , \qquad (38)
\]

shown in Fig. 24, though this is in fact not much used: one is usually interested in the p-value, the probability (under the null hypothesis) of getting a value of $\chi^2$ as large as, or larger than, the one observed. This can be found in Root with TMath::Prob(chisquared,ndf), and in R from 1-pchisq(chisquared,ndf). Thus, for example, with N = 10 and $\chi^2 = 15$ one gets p = 0.13. This is probably OK. But for N = 10 and $\chi^2 = 20$ one gets p = 0.03, which is probably not OK.

If the model has parameters which have been adjusted to fit the data, this clearly reduces $\chi^2$. It is a very useful fact that the result still follows a $\chi^2$ distribution, with $N_{DF} = N_{data} - N_{parameters}$, where $N_{DF}$ is called the ‘number of degrees of freedom’.
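The two p-values just quoted, in R:

    1 - pchisq(15, 10)   # 0.132: probably OK
    1 - pchisq(20, 10)   # 0.029: probably not OK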

Fig. 24: The χ2 distribution for various N

If your χ2 is suspiciously big, there are 4 possible reasons:

1. Your model is wrong, 2. Your data are wrong, 3. Your errors are too small, or 4. You are unlucky.

If your χ2 is suspiciously small there are 2 possible reasons:

1. Your errors are too big, or 2. You are lucky.

7.2 Wilks’ theorem
The likelihood on its own tells you nothing, even if you include all the constant factors normally omitted in maximisation. This may seem counter-intuitive, but it is inescapably true. There is a theorem due to Wilks which is frequently invoked and appears to link likelihood and $\chi^2$, but it does so only in very specific circumstances: given two nested models, for large N the improvement in ln L is such that $-2\Delta \ln L$ is distributed like a $\chi^2$, with $N_{DF}$ the number of extra parameters.

So suppose you have some data with many (x, y) values and two models, Model 1 being linear and Model 2 quadratic. You maximise the likelihood using Model 1 and then using Model 2: the likelihood increases, as more parameters are available ($N_{DF} = 1$). If this increase is significantly more than the $\chi^2$ distribution would predict, that justifies using Model 2 rather than Model 1. So it may tell you whether or not the extra term in the quadratic gives a meaningful improvement, but not whether the final quadratic (or linear) model is a good one.

Even this has an important exception. It does not apply if Model 2 contains a parameter which is meaningless under Model 1. This is a surprisingly common occurrence. Model 1 may be background, Model 2 background plus a Breit-Wigner with adjustable mass, width and normalization ($N_{DF} = 3$). The mass and the width are meaningless under Model 1, so Wilks’ theorem does not apply and the improvement in likelihood cannot be translated into a $\chi^2$ for testing.
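A toy illustration in R (the data are invented, and σ is taken as known, so that $-2\Delta \ln L$ is just the drop in $\chi^2$ between the nested fits):

    # Wilks' theorem for nested models: linear truth, is a quadratic term justified?
    set.seed(2)
    x <- seq(0, 10, 0.5); sigma <- 1
    y <- 3 + 2*x + rnorm(length(x), sd = sigma)              # truth is linear
    chi2_lin  <- sum(residuals(lm(y ~ x))^2)          / sigma^2
    chi2_quad <- sum(residuals(lm(y ~ x + I(x^2)))^2) / sigma^2
    dchi2 <- chi2_lin - chi2_quad                            # = -2 Delta lnL
    1 - pchisq(dchi2, 1)   # large p-value: no evidence for the quadratic term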

7.3 Toy Monte Carlos and likelihood for goodness of fit
Although the likelihood contains no information about the goodness of fit of the model, an obvious way to try to get such information is to run many simulations of the model, plot the spread of fitted likelihoods, and use it to get the p-value. This may be obvious, but it is wrong [18]. Consider a test case observing decay times, where the model is a simple exponential $P(t) = \frac{1}{\tau} e^{-t/\tau}$, with τ an adjustable parameter. The log likelihood is $\ln L = \sum_i (-t_i/\tau - \ln \tau) = -N(\bar{t}/\tau + \ln \tau)$, and maximum likelihood gives $\hat{\tau} = \bar{t} = \frac{1}{N}\sum_i t_i$, so $\ln L(\hat{\tau}) = -N(1 + \ln \bar{t})$. This holds whatever the original sample $\{t_i\}$ looks like: any distribution with the same $\bar{t}$ has the same likelihood, after fitting.
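This is easy to demonstrate (a sketch; the two samples are invented so as to share the same mean):

    # The fitted exponential likelihood depends only on the sample mean
    lnL_fitted <- function(t) { tbar <- mean(t); sum(-t/tbar - log(tbar)) }
    t1 <- c(0.5, 1.0, 1.5, 2.0, 5.0)    # plausibly exponential
    t2 <- c(1.9, 2.0, 2.0, 2.0, 2.1)    # nothing like exponential
    c(lnL_fitted(t1), lnL_fitted(t2))   # identical: -N(1 + log(tbar)) for both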

8 Upper limits
Many analyses are ‘searches for ...’, and most of these are unsuccessful. But you have to say something! Not just ‘We looked, but we didn’t see anything’. This is done using the construction of frequentist confidence intervals and/or Bayesian credible intervals.

8.1 Frequentist confidence
Going back to the discussion of the basics, for frequentists the probability that it will rain tomorrow is meaningless: there is only one tomorrow, and it will either rain or it will not; there is no ensemble. The probability $N_{rain}/N_{tomorrows}$ is either 0 or 1. To talk about $P_{rain}$ is "unscientific" [10].

This is unhelpful. But there is a workaround. Suppose some forecast says it will rain, and studies show this forecast is correct 90% of the time. We now have an ensemble of statements, and can say: ‘The statement "It will rain tomorrow" has a 90% probability of being true’. We shorten this to ‘It will rain tomorrow, with 90% confidence’. We state X with confidence P if X is a member of an ensemble of statements of which at least P are true. Note the ‘at least’ which has crept into the definition. There are two reasons for it:

1. Higher confidences embrace lower ones. If X at 95% then X at 90%, and 2. We can cater for composite hypotheses which are not completely defined.

The familiar quoted error is in fact a confidence statement. Consider as an illustration the Higgs mass measurement (current at the time of writing) MH = 125.09 ± 0.24 GeV. This does not mean that the probability of the Higgs mass being in the range 124.85 < MH < 125.33 GeV is 68%: the Higgs mass is a single, unique, number which either lies in this interval or it does not. What we are saying is that MH has been measured to be 125.09 GeV with a technique that will give a value within 0.24 GeV of the true value 68% of the time. We say: 124.85 < MH < 125.33 GeV with 68% confidence. The statement is either true or false (time will tell), but it belongs to a collection of statements of which (at least) 68% are true.

So we construct confidence regions, also known as confidence intervals, $[x_-, x_+]$ such that $\int_{x_-}^{x_+} P(x)\,dx = CL$. We have a choice not only of the probability content (68%, 90%, 95%, 99%, ...) but also of strategy. Common options are:

1. Symmetric: $\hat{x} - x_- = x_+ - \hat{x}$,
2. Shortest: the interval that minimises $x_+ - x_-$,
3. Central: $\int_{-\infty}^{x_-} P(x)\,dx = \int_{x_+}^{\infty} P(x)\,dx = \frac{1}{2}(1 - CL)$,
4. Upper limit: $x_- = -\infty$, $\int_{x_+}^{\infty} P(x)\,dx = 1 - CL$, and
5. Lower limit: $x_+ = +\infty$, $\int_{-\infty}^{x_-} P(x)\,dx = 1 - CL$.

For the Gaussian (or any symmetric PDF) options 1 to 3 are the same. We are particularly concerned with the upper limit: we observe some small value x, and we find a value $x_+$ such that for values of $x_+$ or more, the probability of getting a result as small as x, or even smaller, is 1 − CL, or even less.

8.2 Confidence belts
We have shown that a simple Gaussian measurement is basically a statement about confidence regions: x = 100 ± 10 implies that [90, 110] is the 68% central confidence region. We want to extend this to less simple scenarios.

As a first step, we consider a proportional Gaussian. Suppose we measure x = 100 from a Gaussian measurement with σ = 0.1x (a 10% measurement, which is realistic). If the true value is 90 the error is σ = 9, so x = 100 is more than one standard deviation away, whereas if the true value is 110 then σ = 11 and it is less than one standard deviation away: 90 and 110 are not equidistant from 100 in this sense. This is handled with a technique called a confidence belt. The key point is that belts are constructed horizontally and read vertically, using the following procedure (as shown in Fig. 25). Suppose that a is the parameter of interest and x is the measurement.

Fig. 25: A confidence belt for a proportional Gaussian

1. For each a, construct desired confidence interval (here 68% central). 2. The result (x, a) lies inside the belt (the red lines), with 68% confidence. 3. Measure x. 4. The result (x, a) lies inside the belt, with 68% confidence. And now we know x.

5. Read off the belt limits a+ and a− at that x: in this case they are 111.1, 90.9. So we can report that a lies in [90.9,111.1] with 68% confidence. 6. Other choices for the confidence level value and for the strategy are available.

This can be extended to the case of a Poisson distribution, Fig. 26. The only difference is that the horizontal axis is discrete, as the number observed, x, is an integer. In constructing the belt (horizontally) there will not in general be x values available to give $\sum_{x=x_-}^{x_+} P(x; a) = CL$ exactly,

Fig. 26: A confidence belt for a Poisson

and we call, again, on the ‘at least’ in the definition and allow $\sum_{x=x_-}^{x_+} P(x; a) \ge CL$. Thus for a central 90% confidence we require, for each a, the largest integer $x_{lo}$ and the smallest $x_{hi}$ for which

\[
\sum_{x=0}^{x_{lo}-1} e^{-a}\,\frac{a^x}{x!} \le 0.05
\qquad\text{and}\qquad
\sum_{x=x_{hi}+1}^{\infty} e^{-a}\,\frac{a^x}{x!} \le 0.05 .
\]

For the second sum it is easier to calculate $\sum_{x=0}^{x_{hi}} e^{-a}\,\frac{a^x}{x!} \ge 0.95$. Whatever the value of a, the probability of the result falling in the belt is 90% or more. We proceed as for the Gaussian.
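A sketch of this horizontal construction in R, using the cumulative Poisson ppois:

    # 90% central Poisson belt: for mean a, the x values kept by the two 5% tails
    belt_90 <- function(a) {
      xlo <- 0
      while (ppois(xlo, a) <= 0.05) xlo <- xlo + 1  # largest xlo with P(x < xlo) <= 5%
      xhi <- xlo
      while (ppois(xhi, a) < 0.95) xhi <- xhi + 1   # smallest xhi with P(x <= xhi) >= 95%
      c(xlo = xlo, xhi = xhi)
    }
    belt_90(3.5)   # keeps x = 1..7 for a = 3.5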

8.3 Coverage

This is an appropriate point to introduce coverage: the probability, given a, that the statement ‘$a_{lo} \le a \le a_{hi}$’ will be true. Ideally this would be the same as the confidence level. However it may (because of the ‘at least’ clauses) exceed it (‘overcoverage’); this is allowed, though in principle inefficient. It should never be less (‘undercoverage’).

For example, suppose we have a Poisson process with a = 3.5 and we want a 90% central limit. There is a probability $e^{-3.5} = 3\%$ of getting zero events, leading to $a_+ = 3.0$, which would be wrong, as 3.0 < 3.5. Continuing in sequence, there is a probability $3.5\,e^{-3.5} = 11\%$ of getting one event, leading to $a_+ = 4.7$, which would be right. Right answers continue up to seven events (with probability $3.5^7 e^{-3.5}/7! = 4\%$): this gives a safely large value for $a_+$, and $a_- = 3.3$, which is right as 3.3 < 3.5, though only just. The next outcome, eight events (probability 2%), gives $a_- = 4.0$, which is wrong, as are all subsequent results. Adding up the probabilities for the outcomes 1 through 7 that give a true answer totals 94%, so there is 4% overcoverage.

Note that coverage is a function of the true value of the parameter on which limits are being placed. Values of a other than 3.5 will give different coverage numbers, though all are over 90%.
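The 94% figure can be checked in one line of R:

    sum(dpois(1:7, 3.5))   # 0.943: the outcomes 1..7 cover the true a = 3.5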

8.4 Upper limits
The one-sided upper limit, option 4 in the list above, gives us a way of quantifying the outcome of a null experiment: ‘We saw nothing (or nothing that might not have been background), so we say $a \le a_+$ at some confidence level’.

One simple and enlightening example occurs if you see no events and there is no expected background. Now $P(0; \lambda) = e^{-\lambda}$, and $P(0; 2.996) = 0.05$, with 2.996 ≈ 3. So if you see zero events, you can say with 95% confidence that the true value is less than 3.0. You can then directly use this to calculate a limit on the branching fraction, cross section, or whatever you are measuring.
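In R, the number comes straight from inverting $e^{-\lambda} = 0.05$:

    -log(0.05)        # 2.996: the famous "zero events means less than 3"
    ppois(0, 2.996)   # 0.05, as required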

8.5 Bayesian ‘credible intervals’
A Bayesian has no problem saying ‘It will probably rain tomorrow’ or ‘The probability that 124.85 < $M_H$ < 125.33 GeV is 68%’. The downside, of course, is that another Bayesian can say ‘It will probably not rain tomorrow’ and ‘The probability that 124.85 < $M_H$ < 125.33 GeV is 86%’ with equal validity, and the two cannot resolve their subjective difference in any objective way.

A Bayesian has a prior belief PDF P(a) and defines a region R such that $\int_R P(a)\,da = CL$. There is the same ambiguity regarding the choice of content (68%, 90%, 95%, ...) and strategy (central, symmetric, upper limit, ...). So Bayesian credible intervals look a lot like frequentist confidence intervals, even if their meaning is different.

There are two happy coincidences.

The first is that Bayesian credible intervals on Gaussians, with a flat prior, are the same as frequentist confidence intervals. If F quotes 68% or 95% or ... confidence intervals and B quotes 68% or 95% or ... credible intervals, their results will agree.

The second is that although the frequentist Poisson upper limit is given by $\sum_{r=0}^{r_{data}} e^{-a_{hi}}\,a_{hi}^r/r!$, whereas the Bayesian flat-prior Poisson upper limit is given by $\int_0^{a_{hi}} e^{-a}\,a^{r_{data}}/r_{data}!\;da$, integration by parts of the Bayesian formula gives a series which is the same as the frequentist limit. A Bayesian will also say: ‘I see zero events; the probability is 95% that the true value is 3.0 or less.’ This is (I think) a coincidence, as it does not apply for lower limits. But it does avoid heated discussions as to which value to publish.

8.6 Limits in the presence of background

This is where it gets tricky. Typically an experiment may observe $N_D$ events, with an expected background $N_B$ and efficiency η, and wants to present results for $N_S = (N_D - N_B)/\eta$. Uncertainties in η and $N_B$ are handled by profiling or marginalising. The problem is that the actual number of background events is not $N_B$ but Poisson-distributed with mean $N_B$.

In a straightforward case, if you observe twelve events with expected background 3.4 and η = 1, it is obviously sensible to say $N_S = 8.6$ (though the error is $\sqrt{12}$, not $\sqrt{8.6}$). But suppose, with the same background, you see four events, three events or zero events. Can you say $N_S = 0.6$? Or −0.4? Or −3.4???

We will look at four methods of handling this, considering as an example the observation of three events with expected background 3.40, and wanting to present the 95% CL upper limit on $N_S$.

8.6.1 Method 1: Pure frequentist

$N_D - N_B$ is an unbiased estimator of $N_S$ and its properties are known. Quote the result, even if it is non-physical.

The argument for doing so is that this is needed for balance: if there is really no signal, approximately half of the experiments will give positive values and half negative. If the negative results are not published, but the positive ones are, the world average will be spuriously high. For a 95% confidence limit one accepts that 5% of the results can be wrong. This (unlikely) case is clearly one of them. So what?

A counter-argument is that if $N_D < N_B$, we know that the background has fluctuated downwards. But this cannot be incorporated into the formalism.

Anyway, the upper limit from observing 3 events is 7.75, as $\sum_{r=0}^{3} e^{-7.75}\,7.75^r/r! = 0.05$, and the 95% upper limit on $N_S$ is 7.75 − 3.40 = 4.35.

8.6.2 Method 2: Go Bayesian

Assign a uniform prior to $N_S$: flat for $N_S > 0$ and zero for $N_S < 0$. The posterior is then just the likelihood,

\[
P(N_S | N_D, N_B) = e^{-(N_S + N_B)}\,\frac{(N_S + N_B)^{N_D}}{N_D!} .
\]

The required limit is obtained from integrating $\int_0^{N_{hi}} P(N_S)\,dN_S = 0.95$, where $P(N_S) \propto e^{-(N_S + 3.4)}\,(N_S + 3.4)^3/3!$; this is illustrated in Fig. 27, and the value of the limit is 5.21.

Fig. 27: The Bayesian limit construction
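The integral is easy to do numerically; a minimal sketch in R:

    # Bayesian 95% upper limit for N_D = 3 observed, N_B = 3.4, flat prior on Ns
    post <- function(ns) dpois(3, ns + 3.4)             # posterior, up to a constant
    norm <- integrate(post, 0, Inf)$value
    cdf  <- function(nhi) integrate(post, 0, nhi)$value / norm
    uniroot(function(nhi) cdf(nhi) - 0.95, c(0, 30))$root   # about 5.21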

8.6.3 Method 3: Feldman-Cousins This—called ‘the unified approach’ by Feldman and Cousins [19]—takes a step backwards and considers the ambiguity in the use of confidence belts. In principle, if you decide to work at, say, 90% confidence you may choose to use a 90% central or a 90% upper limit, and in either case the probability of the result lying in the band is at least 90%. This is shown in Fig. 28. In practice, if you happen to get a low result you would quote an upper limit, but if you get a high result you would quote a central limit. This, which they call ‘flip-flopping’, is illustrated in the plot by a break shown here for r = 10. Now the confidence belt is the green one for r < 10 and the red one for r ≥ 10. The probability of lying in the band is no longer 90%! Flip-flopping invalidates the Frequentist construction, leading to undercoverage.

Fig. 28: The flip-flopping problem

They show how to avoid this. You draw the plot slightly differently: $r \equiv N_D$ is still the horizontal variable, but as the vertical variable you use $N_S$. (This means a different plot for each $N_B$, whereas the previous Poisson plot is universal, but this is not a problem.) This is to be filled using

\[
P(r; N_S) = e^{-(N_S + N_B)}\,\frac{(N_S + N_B)^r}{r!} .
\]

For each $N_S$ you define a region R such that $\sum_{r \in R} P(r; N_S) \ge 90\%$. You have a choice of strategy that goes beyond ‘central’ or ‘upper limit’: one plausible suggestion would be to rank r by probability and take values in order until the desired total probability content is achieved (which would, incidentally, give the shortest interval). However this has the drawback that outcomes with $r < N_B$ have small probabilities and would be excluded for all $N_S$, so that, if such a result does occur, one cannot say anything constructive, just ‘This was unlikely’.

An improved form of this suggestion is that for each $N_S$, considering each r, you compare $P(r; N_S)$ with the largest possible value obtained by varying $N_S$. This is easier than it sounds, because this highest value is either at $N_S = r - N_B$ (if $r \ge N_B$) or at $N_S = 0$ (if $r \le N_B$). Rank on the ratio $P(r; N_S)/P(r; N_S^{best})$ and again take the r values in order until their sum gives the desired probability.

This gives a band as shown in Fig. 29, which has NB = 3.4. You can see that ‘flip-flopping’ occurs naturally: for small values of r one just has an upper limit, whereas for larger values, above r = 7, one obtains a lower limit as well. Yet there is a single band, and the coverage is correct (i.e. it does not undercover). In the case we are considering, r = 3, just an upper limit is given, at 4.86. Like other good ideas, this has not found universal favour. Two arguments are raised against the method. First, that it deprives the physicist of the choice of whether to publish an upper limit or a range. It could be embarrassing if you look for something weird and are ‘forced’ to publish a non-zero result. But this is actually the point, and in such cases one can always explain that the limits should not be taken as implying that the quantity actually is nonzero.

Secondly, if two experiments with different NB get the same small ND, the one with the higher NB will quote a smaller limit on NS. The worse experiment gets the better result, which is clearly unfair! But this is not comparing like with like: for a ‘bad’ experiment with large background to get a small number of events is much less likely than it is for a ‘good’ low background experiment.
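A minimal sketch of the ordering step in R (just the acceptance region in r for one value of $N_S$, without the clipping and smoothing of the full construction):

    # Feldman-Cousins 95% acceptance region in r for a given Ns, with Nb = 3.4
    fc_keep <- function(ns, nb = 3.4, cl = 0.95, rmax = 50) {
      r      <- 0:rmax
      p      <- dpois(r, ns + nb)                 # P(r; Ns)
      p_best <- dpois(r, pmax(r - nb, 0) + nb)    # P(r; Ns_best), Ns_best = max(r - Nb, 0)
      ord <- order(p / p_best, decreasing = TRUE) # rank on the likelihood ratio
      n   <- sum(cumsum(p[ord]) < cl) + 1         # take r values in order until >= 95%
      sort(r[ord[1:n]])
    }
    fc_keep(4.86)   # includes r = 3, just: 4.86 is the quoted 95% upper limit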

Fig. 29: A Feldman-Cousins confidence band

8.6.4 Method 4: CLs
This is a modification of the standard frequentist approach to include the fact, as mentioned above, that a small observed signal implies a downward fluctuation in the background [20]. Although presented here using just numbers of events, the method is usually extended to use the full likelihood of the result, as will be discussed in Section 8.6.6.

Fig. 30: The CLs construction

Denote the (strict frequentist) probability of getting a result this small (or less) from s + b events as CLs+b, and the equivalent probability from pure background as CLb (so CLb = CLs+b for s = 0). Then introduce

\[
CL_s = \frac{CL_{s+b}}{CL_b} . \qquad (39)
\]

Looking at Fig. 30, the CLs+b curve shows that if s + b is small then the probability of getting three events or less is high, near 100%. As s + b increases this probability falls, and at s + b = 7.75 the probability of only getting three events or less is only 5%. This, after subtraction of b = 3.4, gives the strict frequentist value.

The point s + b = 3.4 corresponds to s = 0, at which the probability $CL_b$ is 56%. As s must be non-negative, one can argue that everything to the left of that point is not meaningful. One attempts to incorporate this by renormalizing the (blue) $CL_{s+b}$ curve to have a maximum of 100% in the physically sensible region, dividing it by 0.56 to get the (green) $CL_s$ curve. This is treated in the same way as the $CL_{s+b}$ curve, reading off the point, at s + b = 8.61, where it falls to 5%. This is a limit on s + b, so we subtract 3.4 to get the limit on s as 5.21. This is larger than the strict frequentist limit: the method over-covers (which, as we have seen, is allowed if not encouraged) and is, in this respect, ‘conservative’⁶. It is the same value as the Bayesian Method 2 gives, as it makes the same assumptions.

CLs is not frequentist, just ‘frequentist inspired’. In terms of statistics there is perhaps little in its favour. But it has an intuitive appeal, and is widely used.
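For this simple counting example the whole calculation is a few lines of R (a sketch):

    # CLs 95% upper limit for n = 3 observed with b = 3.4 expected background
    n <- 3; b <- 3.4
    CLs <- function(s) ppois(n, s + b) / ppois(n, b)            # CLsb / CLb
    uniroot(function(s) CLs(s) - 0.05, c(0, 30))$root           # 5.21
    # For comparison, the strict frequentist limit from CLsb alone:
    uniroot(function(s) ppois(n, s + b) - 0.05, c(0, 30))$root  # 4.35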

8.6.5 Summary so far
Given three observed events and an expected background of 3.4 events, what is the 95% upper limit on the ‘true’ number of events? The possible answers are shown in Table 2.

    Strict frequentist          4.35
    Bayesian (uniform prior)    5.21
    Feldman-Cousins             4.86
    CLs                         5.21

Table 2: Upper limits from different methods

Which is ‘right’? Take your pick! All are correct (well, not wrong). The golden rule is to say what you are doing, and if possible give the raw numbers.

8.6.6 Extension: not just counting numbers
These examples have used simple counting experiments. But a simple number does not (usually) exploit the full information. Consider the illustration in Fig. 31. One is searching for (or putting an upper limit on) some broad resonance around 7 GeV. One could count the number of events inside some window (perhaps 6 to 8 GeV?) and subtract the estimated background. This might work with high statistics, as on the left, but would be pretty useless with small numbers, as on the right. It is clearly not optimal to count an event as ‘in’ whether it is at 7.0 or 7.9, and to treat an event as ‘out’ if it is at 8.1 or 10.1.

It is better to calculate the likelihoods $\ln L_{s+b} = \sum_i \ln\left[N_s S(x_i) + N_b B(x_i)\right]$ and $\ln L_b = \sum_i \ln\left[N_b B(x_i)\right]$. Then, for example using $CL_s$, you can work with $L_{s+b}/L_b$, or $-2\ln(L_{s+b}/L_b)$. The confidence/probability quantities can be found from simulations, or sometimes from data.

6‘Conservative’ is a misleading word. It is used by people describing their analyses to imply safety and caution, whereas it usually entails cowardice and sloppy thinking.

Fig. 31: Just counting numbers may not give the full information

Fig. 32: Significance plot for the Higgs search

8.6.7 Extension: from numbers to masses
Limits on numbers of events can readily be translated into limits on branching ratios, $BR = N_s/N_{total}$, or into limits on cross sections, $\sigma = N_s/\int \mathcal{L}\,dt$. These may translate into limits on other, theory, parameters.

In the Higgs search (to take an example) the cross section depends on the mass, MH —and so does the detection efficiency—which may require changing strategy (hence different backgrounds). This leads to the need to basically repeat the analysis for all (of many) MH values. This can be presented in two ways.

The first is shown in Fig. 32, taken from Ref. [21]. For each MH (or whatever is being studied) you search for a signal and plot the CLs (or whatever limit method you prefer) significance in a Significance Plot. Small values indicate that it is unlikely to get a signal this large just from background. One often also plots the expected (from MC) significance, assuming the signal hypothesis is true. This is a measure of a ‘good experiment’. In this case there is a discovery level drop at MH ≈ 125 GeV, which exceeds the expected significance, though not by much: ATLAS were lucky but not incredibly

lucky. The second method is, for some reason, known as the green-and-yellow plot. This is basically the same data, but fixing the CL at a chosen value: in Fig. 33 it is 95%. You find the limit on the signal strength, at this confidence level, and interpret it as a limit on the cross-section ratio $\sigma/\sigma_{SM}$. Again, as well as plotting the actual data one also plots the expected (from MC) limit, with variations. If there is no signal, 68% of experiments should give results in the green band and 95% in the yellow band.

Fig. 33: Green and yellow plot showing the Higgs discovery

So this figure shows the experimental result as a black line. Around 125 GeV the observed 95% upper limit is larger than the Standard Model prediction (and well above the expected, no-signal band), indicating a discovery. There are peaks between 200 and 300 GeV, but they do not approach the SM value, indicating that they are just fluctuations. The value rises at 600 GeV, but the green (and yellow) bands rise also, showing that the experiment is not sensitive at such high masses: basically it sees nothing, but would expect to see nothing.

9 Making a discovery
We now turn from setting limits, to say what you did not see, to the more exciting prospect of making a discovery. Remembering hypothesis testing: in claiming a discovery you have to show that your data cannot be explained without it. This is quantified by the p-value: the probability of getting a result this extreme (or worse) under the null hypothesis/Standard Model. (This is not ‘the probability that the Standard Model is correct’, but it seems impossible for journalists to understand the difference.)

Some journals (particularly in psychology) refuse to publish papers giving p-values. If you do lots of studies, some will have low p-values (5% of them will be below 0.05, and so on). The danger is that these get published, while the unsuccessful ones are binned.

Is p like the significance α? Yes and no. The formula is the same, but α is a property of the test, computed before you see the data; p is a property of the data.

9.1 Sigma language
The probability (p-value) is often translated into Gaussian-like language: the probability of a result more than 3σ from the mean is 0.27%, so a p-value of 0.0027 is a ‘3σ effect’ (or 0.0013, depending on whether one takes the one-tailed or two-tailed option; both are used). In reporting a result with a significance of ‘so many σ’ there is no actual σ involved: it is just a translation to give a better feel for the size of the probability.

By convention, 3 sigma (p = 0.0013) is reported as ‘evidence for’, whereas a full 5 sigma (p = 0.0000003) is required for ‘discovery of’.
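In R, the translation is immediate:

    pnorm(-3)       # 0.00135: one-tailed "3 sigma", evidence
    pnorm(-5)       # 2.9e-7:  one-tailed "5 sigma", discovery
    2 * pnorm(-3)   # 0.0027:  the two-tailed version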

9.2 The look-elsewhere effect
You may think that the requirement of 5σ is excessively cautious. Its justification comes from history: too many 3σ and 4σ ‘signals’ have gone away when more data were taken. This is partly explained by the ‘look-elsewhere effect’. How many peaks can you see in the data in Fig. 34?

Fig. 34: How many peaks are in this data?

The answer is that there are none: the data are in fact purely random and flat. But the human eye is very good at seeing features. With 100 bins, at least one bin fluctuating to a p-value below 1% is pretty likely. This can be factored in, to some extent, using pseudo-experiments, but that does not allow for the sheer number of plots being produced by hard-working physicists looking for something. Hence the need for caution.

This is not just ancient history. ATLAS and CMS recently observed a signal in the γγ mass spectrum around 750 GeV, with significances of 3.9σ (ATLAS) and 3.4σ (CMS), which went away when more data were taken.

9.3 Blind analysis
It is said⁷ that when Michelangelo was asked how he created his masterpiece sculpture ‘David’, he replied: ‘It was easy. All I did was get a block of marble and chip away everything that didn’t look like David.’ Such creativity may be good for sculpture, but it is bad for physics. If you take your data and devise cuts to remove all the events that don’t look like the signal you want to see, then whatever is left at the end will look like that signal.

Many (perhaps most) analyses are now done ‘blind’: cuts are devised using Monte Carlo and/or non-signal data, and you only ‘open the box’ once the cuts are fixed. Most collaborations have a formal procedure for doing this. It may seem a tedious imposition, but we have learnt the hard way that it avoids embarrassing mistakes.

⁷This story is certainly not historically accurate, but it is still a good story (https://quoteinvestigator.com/2014/06/22/chip-away/).

10 Conclusions
Statistics is a tool for doing physics. Good physicists understand their tools. Do not just follow recipes without understanding them: read books and conference proceedings, go to seminars, talk to people, experiment with the data, and understand what you are doing. Then you will succeed. And you will have a great time!

References
[1] R. J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (Wiley, Chichester, 1989).
[2] G. Cowan, Statistical Data Analysis (Oxford Univ. Press, Oxford, 1998).
[3] I. Narsky and F. C. Porter, Statistical Analysis Techniques in Particle Physics (Wiley, Weinheim, 2014), doi:10.1002/9783527677320.
[4] O. Behnke et al. (Eds.), Data Analysis in High Energy Physics (Wiley, Weinheim, 2013), doi:10.1002/9783527653416.
[5] L. Lyons, Statistics for Nuclear and Particle Physicists (Cambridge Univ. Press, Cambridge, 1986).
[6] G. Bohm and G. Zech, Introduction to Statistics and Data Analysis for Physicists, 3rd revised ed. (Verlag Deutsches Elektronen-Synchrotron, Hamburg, 2017), doi:10.3204/PUBDB-2017-08987.
[7] M. R. Whalley and L. Lyons (Eds.), Advanced Statistical Techniques in Particle Physics, Durham report IPPP/02/39, 2002, https://inspirehep.net/literature/601052.
[8] L. Lyons, R. Mount and R. Reitmayer (Eds.), Proceedings of PHYSTAT03, SLAC-R-703, 2003, https://www.slac.stanford.edu/econf/C030908/.
[9] L. Lyons and M. K. Unel (Eds.), Proceedings of PHYSTAT05 (Imperial College Press, London, 2006), doi:10.1142/p446.
[10] R. von Mises, Probability, Statistics and Truth, reprint of the second revised 1957 English edition (Dover, Mineola, NY, 1981).
[11] H. Jeffreys, Theory of Probability, 3rd ed. (Oxford Univ. Press, Oxford, 1961).
[12] R. J. Barlow, A Note on $\Delta \ln L = -\frac{1}{2}$ Errors, arXiv:physics/0403046, 2004.
[13] R. J. Barlow, Asymmetric Statistical Errors, Proceedings of PHYSTAT05, Eds. L. Lyons and M. K. Unel (Imperial College Press, London, 2006), p. 56, doi:10.1142/9781860948985_0013, arXiv:physics/0406120, 2004.
[14] CMS Collaboration, CMS 2D Cf-Cv Likelihood Profile, https://root.cern.ch/cms-2d-cf-cv-likelihood-profile, accessed 26 May 2019.
[15] P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 3rd ed. (McGraw Hill, New York, NY, 2003).
[16] J. Orear, Notes on Statistics for Physicists, UCRL-8417, 1958, http://nedwww.ipac.caltech.edu/level5/Sept01/Orear/frames.html.
[17] B. Aubert et al. [BaBar Collaboration], Phys. Rev. Lett. 86 (2001) 2515, doi:10.1103/PhysRevLett.86.2515, arXiv:hep-ex/0102030.
[18] J. G. Heinrich, CDF internal note CDF/MEMO/BOTTOM.CDFR/5639 (many thanks to Jonas Rademacker for pointing this out); L. Lyons, R. Mount and R. Reitmayer (Eds.), Proceedings of PHYSTAT03, SLAC-R-703, 2003, p. 52, https://www.slac.stanford.edu/econf/C030908/.
[19] G. J. Feldman and R. D. Cousins, Phys. Rev. D57 (1998) 3873, doi:10.1103/PhysRevD.57.3873, arXiv:physics/9711021.
[20] A. L. Read, J. Phys. G28 (2002) 2693, doi:10.1088/0954-3899/28/10/313.
[21] ATLAS Collaboration, ATLAS Higgs Search Update, https://atlas.cern/updates/atlas-news/atlas-higgs-search-update, accessed 26 May 2019.