Searching for New Physics with Profile Likelihoods: Wilks and Beyond


Sara Algeri¹, Jelle Aalbers², Knut Dundas Morå²,³, and Jan Conrad²,*

¹School of Statistics, University of Minnesota, Minneapolis (MN), 55455, USA
²Physics Department and Oskar Klein Centre, Stockholm University, Stockholm, Sweden
³Physics Department, Columbia University, New York, NY 10027, USA
*e-mail: [email protected]

arXiv:1911.10237v1 [physics.data-an] 22 Nov 2019

ABSTRACT

Particle physics experiments use likelihood ratio tests extensively to compare hypotheses and to construct confidence intervals. Often, the null distribution of the likelihood ratio test statistic is approximated by a χ² distribution, following a theorem due to Wilks. However, many circumstances relevant to modern experiments can cause this theorem to fail. In this paper, we review how to identify these situations and construct valid inference.

Key points

Necessary conditions for the likelihood ratio test statistic to be approximately χ² distributed include the following.

• The sample size must be sufficiently large. If not, higher-order asymptotic results exist, or one can rely on Monte Carlo simulations.
• The true values of the parameters must lie in the interior of the parameter space. If not, both asymptotic results and simulations must be adjusted accordingly.
• The parameters must be identifiable. If not, look-elsewhere-effect correction methods can help to address this problem.
• The models under comparison must be nested. If not, one can construct a comprehensive model which includes the models considered as special cases.
• The models under comparison must be specified correctly. If not, adding nuisance parameters can be of help; otherwise one must rely on nonparametric methods.

Website summary: The goal of this manuscript is to identify situations in particle physics where the likelihood ratio test statistic cannot be approximated by a χ² distribution, and to propose adequate solutions.
1 Introduction

Modern particle physics employs elaborate statistical methods to enhance the physics reach of expensive experiments and large collaborations. The field's standard for reporting results is frequentist hypothesis testing. At its heart, this is a two-step recipe for distinguishing a null hypothesis H0 (e.g. the standard model) from a more general case H1 (e.g. the standard model with a new particle):

Frequentist hypothesis testing

1. Summarize the data collected by the experiment with a test statistic T whose absolute value is smaller under H0 than it is under H1.
2. Compute a p-value, i.e., the probability that T is larger than its observed value when H0 is true. H0 is rejected if the p-value is below some threshold α (e.g. 2.9 × 10⁻⁷, corresponding to a 5σ significance level).

Commonly, H0 differs from H1 for some fixed values of the parameter(s) of interest, which we denote by μ, i.e.,

    H_0 : \mu = \mu_0 \quad \text{versus} \quad H_1 : \mu > \mu_0 \ (\text{or } \mu \neq \mu_0). \tag{1}

The model under study may also be characterized by a set of nuisance parameter(s), namely θ, whose values are not of direct interest for the test being conducted but still need to be estimated under both hypotheses. For example, in a search for new physics, μ may correspond to the rate or cross-section of signal events, and θ could include detector efficiencies or uncertain background rates. To assess the presence of the signal, we test the hypotheses in (1).

The first step of hypothesis testing requires the specification of a test statistic T. Among the wide variety of testing procedures available in the statistical literature, a popular choice is the profile (or generalized) Likelihood Ratio Test (LRT), due to its statistical power [1, 2] and simple implementation. The LRT has been used in several seminal studies in particle physics, from the discovery of the Higgs boson [3, 4] and measurements of neutrino properties [5] to direct [6–8] and indirect [9, 10] searches for dark matter.

Given a likelihood function L that measures the probability of the observed data for a given value of the parameters μ and θ, the LRT is defined as

    T = -2 \log \frac{L(\mu_0, \hat{\theta}_0)}{L(\hat{\mu}, \hat{\theta})}. \tag{2}

Here μ0 is the value of μ specified under the null hypothesis in (1), whereas θ̂0 is the best fit of the nuisance parameters under H0, i.e., the value of θ that maximizes L given that μ = μ0. Similarly, μ̂ and θ̂ are the best-fit parameters under H1. In the new-physics signal search example, μ0 = 0 (i.e., the only value of μ allowed under the null hypothesis is zero); it follows that T is zero if μ̂ = 0, and it is positive otherwise.

The second step of hypothesis testing is to quantify the evidence in favor of H1 by means of a p-value, i.e., the probability that, when the null hypothesis is true, T is larger than its value observed on the data. Finally, H0 is excluded if the p-value is smaller than the predetermined probability of false discovery, α.

An equivalent statement on the validity of H0 can be made by constructing a confidence interval for μ. The latter corresponds to the set of possible values of μ for which the probability that T is smaller than or equal to its observed value does not exceed 1 − α. An experiment can exclude a certain value μ0 at a confidence level (or coverage) 1 − α if that value is not contained in the confidence interval.

Figure 1. An illustration of the example model used in this paper, showing the distribution of energies of particles measured by an experiment (expected events per dE versus E in arbitrary units). In the analysis, the aim is to constrain the mean number of signal events μ from the Gaussian signal (red) centered at E = γ, on top of the polynomial background (blue) with distribution B(E) = φ0 + φ1 E + … and expected number of events b ≥ 0. The model shown here is μ = 4, b = 10, γ = 0, φ0 = 1, φ1 = −0.1, and φj = 0 for j ≥ 2. The black dots show one dataset of events {E(i)}, i = 1, …, N, that the experiment might observe under this model.

Both tests of hypotheses and confidence intervals rely on the distribution of T under H0. A famous result from Wilks [11] states that the null distribution of the LRT in (2) is approximately χ² distributed if sufficient data is acquired, provided some regularity conditions are met. The same result underpins the common χ² fitting and goodness-of-fit computations. These computations are therefore subject to the same regularity conditions, which, if incorrectly assumed, can yield invalid claims of exclusions or discoveries.
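To make Eq. (2) concrete, here is a minimal Python sketch (not the authors' code) of the profile LRT for the example model of Figure 1, assuming NumPy and SciPy. It fixes the background shape to φ0 = 1, φ1 = −0.1 on E ∈ [−5, 5] and profiles only the background yield b as the nuisance parameter; the helper names f_sig, f_bkg, profile_lrt, and sample_bkg are ours, introduced for illustration.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(seed=1)

GAMMA = 0.0              # signal peak position (gamma = 0 in the Figure 1 model)
PHI0, PHI1 = 1.0, -0.1   # background shape B(E) ~ phi0 + phi1*E on [-5, 5]

def f_sig(E):
    # Unit-width Gaussian signal density centred at E = GAMMA; leakage
    # outside [-5, 5] is negligible, so no truncation correction is applied.
    return stats.norm.pdf(E, loc=GAMMA, scale=1.0)

def f_bkg(E):
    # Linear background density; the integral of (1 - 0.1*E) over [-5, 5]
    # is 10, which fixes the normalization constant.
    return (PHI0 + PHI1 * E) / 10.0

def log_lik(mu, b, E):
    # Extended unbinned log-likelihood: a Poisson term for the total yield
    # plus a per-event mixture of the signal and background densities.
    dens = mu * f_sig(E) + b * f_bkg(E)
    return -(mu + b) + np.sum(np.log(dens))

def profile_lrt(E):
    # T of Eq. (2): fit (mu, b) freely under H1, profile the nuisance b
    # under H0 (mu = 0), and take twice the difference in -log L.
    fit1 = optimize.minimize(lambda p: -log_lik(p[0], p[1], E),
                             x0=[1.0, max(len(E), 1)],
                             bounds=[(0.0, None), (1e-9, None)])
    fit0 = optimize.minimize_scalar(lambda b: -log_lik(0.0, b, E),
                                    bounds=(1e-9, 10.0 * len(E) + 10.0),
                                    method="bounded")
    # Clip tiny negative values coming from the numerical optimisation.
    return max(2.0 * (fit0.fun - fit1.fun), 0.0)

def sample_bkg(n):
    # Rejection sampling of the linear background shape on [-5, 5];
    # the density attains its maximum at E = -5.
    out = np.empty(0)
    while out.size < n:
        E = rng.uniform(-5.0, 5.0, size=max(2 * n, 10))
        keep = rng.uniform(0.0, 1.0, size=E.size) < (PHI0 + PHI1 * E) / 1.5
        out = np.concatenate([out, E[keep]])
    return out[:n]

# One simulated dataset from the Figure 1 benchmark (mu = 4, b = 10).
E = np.concatenate([rng.normal(GAMMA, 1.0, size=rng.poisson(4.0)),
                    sample_bkg(rng.poisson(10.0))])

T = profile_lrt(E)
p = stats.chi2.sf(T, df=1)   # Wilks p-value: one parameter of interest
print(f"T = {T:.2f}, Wilks p = {p:.3g}")
print(f"5-sigma threshold alpha = {stats.norm.sf(5):.2g}")   # ~2.9e-7
```

The last line illustrates the threshold quoted in the introduction: α ≈ 2.9 × 10⁻⁷ is simply the one-sided Gaussian tail probability beyond 5σ, and stats.norm.isf(p) performs the inverse conversion from a p-value to a significance.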
In this paper, we present a set of necessary conditions for Wilks' theorem to hold, using language and examples familiar to particle physicists. Furthermore, when they fail, we give recommendations on when extensions of Wilks' result apply.

Generally speaking, a viable alternative when Wilks' or similar results fail to hold is to estimate the null distribution of T by Monte Carlo or "toy" simulations, i.e. simulating many datasets under H0 and computing T for each (a minimal sketch of this procedure is given at the end of this section). However, as discussed in Sections 3.2 and 4, besides the computational burden, additional complications may arise, and consistency of the numerical solution is not guaranteed.

2 Wilks' theorem and conditions

Wilks' theorem [11] can be stated as follows:

Theorem (Wilks, 1938). Under suitable regularity conditions, when H0 in (1) is true, the distribution of T converges to χ²ₘ.

Here χ²ₘ is a chi-squared random variable with m degrees of freedom, where m equals the number of parameters in μ. Instructions on constructing confidence regions and tests based on Wilks' theorem can be found elsewhere [12]; here, we focus on the conditions needed for this theorem to apply. A formal statement of the regularity conditions required by Wilks can be found in [13] and [14], but five necessary conditions cover most practical cases:

Necessary conditions for Wilks' theorem

• ASYMPTOTIC: Sufficient data is observed.
• INTERIOR: Only values of μ and θ which are far from the boundaries of their parameter space are admitted.
• IDENTIFIABLE: Different values of the parameters specify distinct models.
• NESTED: H0 is a limiting case of H1, e.g. with some parameter fixed to a sub-range of the entire parameter space.
• CORRECT: The true model is specified either under H0 or under H1.

To illustrate these conditions, we consider an experiment that measured N particles with energies E(i), with i ∈ {1, 2, …, N}, where the interest is in the mean number μ of signal events on top of an expected background of b events. The signal's energies are assumed to follow a Gaussian distribution with mean γ and standard deviation 1, while the background distribution B is polynomial, i.e. B(E) = k(φ0 + φ1 E + φ2 E² + …), with k a normalization constant.

Figure 2. [Figure residue; caption not recovered. The panels compare the true and Wilks' p-values as a function of the number of signal events μ and background events β, with regions labeled "Invalid" and "Conservative", and show P(T > χ²_{d=1}(90%)) versus the Wilks significance in σ.]
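As an illustration of the toy-simulation alternative mentioned above, the sketch below (our code, reusing profile_lrt, sample_bkg, and rng from the previous sketch, under the same simplifying assumptions) estimates the null distribution of T from background-only pseudo-experiments and compares the resulting p-value with the Wilks χ²₁ approximation. It also exhibits one way this example model strains the conditions above: μ0 = 0 lies on the boundary of μ ≥ 0, so INTERIOR fails.

```python
import numpy as np
from scipy import stats

# Background-only ("toy") pseudo-experiments: simulate under H0 with the
# benchmark background b = 10, recompute T for each dataset, and estimate
# the null distribution of T empirically.
T_toys = np.array([profile_lrt(sample_bkg(rng.poisson(10.0)))
                   for _ in range(2000)])

T_obs = 4.0                            # an illustrative observed value of T
p_mc = np.mean(T_toys >= T_obs)        # Monte Carlo estimate of the p-value
p_mc_err = np.sqrt(p_mc * (1.0 - p_mc) / T_toys.size)
p_wilks = stats.chi2.sf(T_obs, df=1)   # Wilks chi2(1) approximation

print(f"MC p = {p_mc:.3f} +/- {p_mc_err:.3f}, Wilks p = {p_wilks:.3f}")

# Because mu >= 0 places mu0 = 0 on the boundary of the parameter space
# (the INTERIOR condition fails), roughly half of the toys give T = 0; the
# null distribution is then closer to the mixture 0.5*chi2_0 + 0.5*chi2_1,
# and the plain chi2_1 p-value is conservative for this example.
print(f"Fraction of toys with T = 0: {np.mean(T_toys < 1e-6):.2f}")
```

The binomial uncertainty on the Monte Carlo p-value limits how small a p-value a given number of toys can establish, which is part of the computational burden referred to above.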