
Total pages: 16

File type: PDF, size: 1020 KB

Index

L. Lista, Statistical Methods for Data Analysis in Particle Physics, Lecture Notes in Physics 941, DOI 10.1007/978-3-319-62840-0, © Springer International Publishing AG 2017

α, see significance level
β₂, see kurtosis
γ, see unnormalized skewness
γ₁, see skewness
γ₂, see excess
ε, see efficiency
μ, see Gaussian average value or signal strength
ρ, see correlation coefficient
σ, see standard deviation or Gaussian standard deviation
τ, see lifetime
Φ, see Gaussian cumulative distribution
χ²
  distribution, 32
  method, 114
    binned case, 119
    in multiple dimensions, 132
  random variable, 32, 114, 120
  Baker–Cousins, 120
  Neyman's, 119
  Pearson's, 119
Ω, see sample space
3σ evidence, 207
5σ observation, 207
activation function, 191
adaptive boosting, 198
AI, artificial intelligence, 195
alternative hypothesis, 175
Anderson–Darling test, 184
Argus function, 43
artificial intelligence, 195
artificial neural network, 181, 190
  deep learning, 192
Asimov dataset, 231
asymmetric errors, 99, 110
  combination of, 123
asymptotic formulae for test statistics, 231
average value
  continuous case, 27
  discrete case, 12
  in Bayesian inference, 69
back propagation, neural network, 191
background
  dependence of Feldman–Cousins upper limits, 218, 220
  determination from control regions, 129
  fluctuation for significance level, 205
  in convolution and unfolding, 160
  modeling in extended likelihood, 107
  modeling with Argus function, 43
  rejection in hypothesis test, 176
  treatment in iterative unfolding, 171
  uncertainty in significance evaluation, 209
  uncertainty in test statistic, 227, 236
Baker–Cousins χ², 120
Bayes factor, 73
Bayes' theorem, 59
  learning process, 67
Bayesian
  inference, 68
  probability, 59, 64
    visual derivation, 60
  unfolding, 166
BDT, see boosted decision trees
Bernoulli
  probability distribution, 17
  random process, 16
  random variable, 17
Bertrand's paradox, 7
best linear unbiased estimator, 133
  conservative correlation assumption, 137
  intrinsic information weight, 136
  iterative application, 139
  marginal information weight, 136
  negative weights, 135, 137
  relative importance, 136
beta distribution, 83
bias, 102
  in maximum likelihood estimators, 113
bifurcated Gaussian, 124
bimodal distribution, 14
bin migration, 158
binned Poissonian fit, 120
binning, 118
  in convolution, 158
binomial
  coefficient, 18
  interval, 147
  probability distribution, 18, 147
    Poissonian limit, 40
  random process, 17
  random variable, 18
BLUE, see best linear unbiased estimator
boosted decision trees, 181
  adaptive boosting, 198
  boosting, 198
  cross entropy, 196
  decision forest, 197
  Gini index, 196
  leaf, 196
  node, 196
boosting, boosted decision trees, 198
Box–Muller transformation, 89
Brazil plot, 225
breakdown point, robust estimator, 103
Breit–Wigner
  non-relativistic distribution, 41
  relativistic distribution, 42
Cauchy distribution, 41
central
  interval, 70
  limit theorem, 46
  value, 99
chaotic regime, 82
classical probability, 4
Clopper–Pearson binomial interval, 147
CLs method, 221
CNN, see convolutional neural network
coefficient of determination R², 117
combination
  of measurements, 129
  principle, 136, 140
conditional
  distribution, 53
  probability, 9
confidence
  interval, 100, 109, 143
  level, 100
conservative
  CLs method, 221, 223
  correlation assumption, BLUE method, 137
  interval, 147
  limit, 217
consistency of an estimator, 102
control
  region, 129
  sample, 130
convergence in probability, 22
ConvNet, see convolutional neural network
convolution, 155
  Fourier transform, 156
convolutional neural network, 193, 194
  feature map, 194
  local receptive fields, 194
correlation coefficient, 14
counting experiment, 208, 212, 216, 227
Cousins–Highlands method, 227
covariance, 14
  matrix, 14
coverage, 100
Cramér–Rao bound, 102
Cramér–von Mises test, 184
credible interval, 70
cross entropy, decision tree, 196
Crystal Ball function, 44
cumulative distribution, 28
cut, 176
data sample, 99
decision
  forest, 197
  tree, 196
deep learning, artificial neural network, 192
degree of belief, 65
dices, 4–6, 16, 21
differential probability, 26
discovery, 205, 207, 208
distribution, see probability distribution
dogma, extreme Bayesian prior, 66
drand48 function from C standard library, 84
efficiency
  hit-or-miss Monte Carlo, 90
  of a detector, 10, 158
    estimate, 104
  of an estimator, 102
elementary event, 4, 6, 9
equiprobability, 4, 6, 25
ergodicity, 94
error
  of a measurement
    Bayesian approach, 70
    frequentist approach, 99
  of the first kind, 177
  of the second kind, 177
  propagation
    Bayesian case, 79
    frequentist case, 121
    simple cases, 121
estimate, 68, 97, 99, 100
estimator, 100
  efficiency, 102
  maximum likelihood, 105
  properties, 101
  robust, 103
Euler characteristic, 246
event, 2
  counting experiment, 187, 206
  elementary, 4, 6, 9
  in physics, 105
  in statistics, 2
  independent, 10
evidence
  3σ significance level, 207
  Bayes factor, 73
excess, 15
exclusion, 211
expected value, see average value
exponential distribution, 34
  random number generator, 87
extended likelihood function, 106, 186
fast Fourier transform, 156
feature map, convolutional neural network, 194
feedforward multilayer perceptron, 190
Feldman–Cousins unified intervals, 152
FFT, see fast Fourier transform
Fisher information, 75, 102, 136
Fisher's linear discriminant, 178
flat (uniform) distribution, 6, 30
flip-flopping, 150
forest, boosted decision trees, 197
Fourier transform of PDF convolution, 156
frequentist
  inference, 100
  probability, 3, 22
full width at half maximum, 31, 41
fully asymmetric interval, 70
FWHM, full width at half maximum, 31
gamma function, 33, 71
Gaussian
  average value, μ, 31
  bifurcated, 124
  contours in two dimensions, 55
  cumulative distribution, 31
  distribution, 31
    in more dimensions, 54
  intervals, 32, 58
  likelihood function, 108
  random number generator, 89
    central limit theorem, 88
  standard deviation, σ, 31
generator, see pseudorandom number generator
Gini index, decision tree, 196
global significance level, 242
goodness of fit, 33, 118, 120
gsl_rng_rand function from GSL library, 84
Hastings ratio, 93
histogram, 119
  convolution, 158
  in Asimov dataset, 231
  PDF approximation, 182
hit-or-miss Monte Carlo, 90
homogeneous Markov chain, 93
Hui's triangle, 18, 23
hybrid frequentist approach, 227
hypothesis test, 175
IID, independent identically distributed random variables, 82
IIW, intrinsic information weight, BLUE method, 136
importance sampling, 91
improper prior distribution, 76
incomplete Gamma function, 41
independent
  and identically distributed random variables, 82, 106
  events, 10
  random variables, 50
inference, 97
  Bayesian, 68
intersubjective probability, 75
intrinsic information weight, BLUE method, 136
invariant prior, see Jeffreys' prior
iterative unfolding, 166
Jeffreys' prior, 75
joint probability distribution, 49
kernel function, see response function
Kolmogorov distribution, 183
Kolmogorov–Smirnov test, 182
kurtosis, 15
  coefficient, 15
L'Ecuyer pseudorandom number generator, 84
L-curve, 165
Lüscher pseudorandom number generator, 84
Landau distribution, 46
large numbers, law of, 21
law
  of large numbers, 21
  of total probability, 11
leaf, decision tree, 196
learning
  process in Bayesian probability, 67
  rate parameter, artificial neural network, 191
least squares method, 114
lifetime, 35, 39
  Bayesian inference, 76
  Jeffreys prior, 77
  maximum likelihood estimate, 112
  measurement combination, 140
likelihood
  function, 67, 105
    extended, 106, 186
    Gaussian, 108
  in Bayesian probability, 67
  ratio
    discriminant, 181
    in search for new signals, 185, 209
    projective discriminant, 182
    test statistic in Neyman–Pearson lemma, 181
linear regression, 115
local
  receptive fields, 194
  significance level, 210, 242
log normal distribution, 33
logistic map, 82
look elsewhere effect, 210, 242
  in more dimensions, 246
Lorentz distribution, 41
loss function, 191
lower limit, 70
lrand48 function from C standard library, 84
machine learning, 188
  observation, 189
  supervised, 188
  unsupervised, 188
marginal
  distribution, 49
  information weight, BLUE method, 136
Markov chain, 93
  homogeneous, 93
  Monte Carlo, 69, 93
maximum likelihood
  estimator, 105
    bias, 113
    properties, 112
  method, 69, 105
  uncertainty, 109
MC, see Monte Carlo
MCMC, Markov chain Monte Carlo, 93
median, 14, 28, 103
Mersenne-Twistor pseudorandom number generator, 84
Metropolis–Hastings
  algorithm, 93
  proposal distribution, 93
  ratio, 95
minimum
  χ² method, see χ² method
  variance bound, 102
MINUIT, 106, 110
misidentification probability, 176
MIW, marginal information weight, BLUE method, 136
mode, 14, 28
modified frequentist approach, 221
Monte Carlo method, 6, 46, 69, 81
  hit-or-miss, 90
  numerical integration, 92
  sampling, 89
multilayer perceptron, 190
multimodal distribution, 14, 28
multinomial distribution, 20
multivariate analysis, 178
MVA, multivariate analysis, 178
negative weights, 135, 137
nested hypotheses, see Wilks' theorem
neural network, see artificial neural network
Neyman
  confidence belt
    binomial case, 147
    construction, 144
    Feldman–Cousins, 152, 218
    Gaussian case, 146
    inversion, 146
  confidence intervals, 215
Neyman's χ², 119
Neyman–Pearson lemma, 181
node, decision tree, 196
normal
  distribution, see Gaussian distribution
  random variable, 31
normalization
  condition, 9, 26
nuisance parameter,
pooling, convolutional neural network, 194
posterior
  odds, 64, 73
  probability, 60, 65, 67
prior
  odds, 73
  probability, 60, 65
    distribution, 67
    distribution, improper, 76
    distribution, uniform, 71, 74
    subjective choice, 74
    uninformative, 69, 75
probability, 2
  axiomatic definition, 8
  Bayesian, 3, 4
  classical, 4
  density, 25
  dice rolls, 5
  distribution, 9, 25
    χ², 32
    Bernoulli, 17
    beta, 83
    bimodal, 14
Recommended publications
  • Practical Statistics for Particle Physics Lecture 1 AEPS2018, Quy Nhon, Vietnam
    Practical Statistics for Particle Physics, Lecture 1. AEPS2018, Quy Nhon, Vietnam. Roger Barlow, The University of Huddersfield, August 2018.

    Lecture 1: The Basics. 1. Probability: What is it? Frequentist Probability; Conditional Probability and Bayes' Theorem; Bayesian Probability. 2. Probability distributions and their properties: Expectation Values; Binomial, Poisson and Gaussian. 3. Hypothesis testing.

    Question: What is Probability? Typical exam question: Q1. Explain what is meant by the probability P_A of an event A. [1]

    Four possible answers: P_A is a number obeying certain mathematical rules. P_A is a property of A that determines how often A happens. For N trials in which A occurs N_A times, P_A is the limit of N_A/N for large N. P_A is my belief that A will happen, measurable by seeing what odds I will accept in a bet.

    Mathematical: Kolmogorov axioms: for all A ⊂ S, P_A ≥ 0; P_S = 1; P(A ∪ B) = P_A + P_B if A ∩ B = ∅ and A, B ⊂ S. From these simple axioms a complete and complicated structure can be erected, e.g. show that P_Ā = 1 − P_A, and show that P_A ≤ 1. But this says nothing about what P_A actually means. Kolmogorov had frequentist probability in mind, but these axioms apply to any definition.

    Classical or Real probability: evolved during the 18th–19th century; developed (Pascal, Laplace and others) to serve the gambling industry.
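    The frequentist limit N_A/N and the Kolmogorov axioms above can be checked numerically. A minimal sketch, assuming a fair six-sided die (the fairness and the event "even face" are illustrative choices, not part of the lecture):

    ```python
    import random

    random.seed(1)

    # Frequentist answer: P_A is the limit of N_A / N for large N.
    # Event A = "a fair six-sided die shows an even face" (fairness assumed).
    N = 100_000
    N_A = sum(1 for _ in range(N) if random.randint(1, 6) % 2 == 0)
    P_A_freq = N_A / N

    # Kolmogorov axioms checked on the finite sample space S = {1, ..., 6}.
    P = {face: 1 / 6 for face in range(1, 7)}
    P_S = sum(P.values())                 # axiom: P(S) = 1
    P_even = P[2] + P[4] + P[6]
    P_odd = P[1] + P[3] + P[5]            # "even" and "odd" are disjoint
    P_union = P_even + P_odd              # additivity: P(A ∪ B) = P_A + P_B
    ```

    With 100,000 trials the relative frequency sits within a fraction of a percent of 1/2, which is the frequentist reading of P_A.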
  • 3.3 Bayes' Formula
    Ismor Fischer, 5/29/2012, 3.3-1. 3.3 Bayes' Formula. Suppose that, for a certain population of individuals, we are interested in comparing sleep disorders – in particular, the occurrence of event A = "Apnea" – between M = Males and F = Females. S = Adults under 50. Also assume that we know the following information (prior probabilities): P(M) = 0.4, with P(A | M) = 0.8 (80% of males have apnea); P(F) = 0.6, with P(A | F) = 0.3 (30% of females have apnea). Given here are the conditional probabilities of having apnea within each respective gender, but these are not necessarily the probabilities of interest. We actually wish to calculate the probability of each gender, given A; that is, the posterior probabilities P(M | A) and P(F | A). To do this, we first need to reconstruct P(A) itself from the given information, using P(A ∩ M) = P(A | M) P(M), P(A ∩ F) = P(A | F) P(F) (and likewise P(A^c ∩ M) = P(A^c | M) P(M), P(A^c ∩ F) = P(A^c | F) P(F)):

    P(A) = P(A | M) P(M) + P(A | F) P(F)

    So, given A, the posterior probabilities are:

    P(M | A) = P(M ∩ A)/P(A) = P(A | M) P(M) / [P(A | M) P(M) + P(A | F) P(F)] = (0.8)(0.4) / [(0.8)(0.4) + (0.3)(0.6)] = 0.32/0.50 = 0.64

    P(F | A) = P(F ∩ A)/P(A) = P(A | F) P(F) / [P(A | M) P(M) + P(A | F) P(F)] = (0.3)(0.6) / [(0.8)(0.4) + (0.3)(0.6)] = 0.18/0.50 = 0.36

    Thus, the additional information that a randomly selected individual has apnea (an event with probability 50% – why?) increases the likelihood of being male from a prior probability of 40% to a posterior probability of 64%, and likewise decreases the likelihood of being female from a prior probability of 60% to a posterior probability of 36%.
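    The posterior calculation above can be verified in a few lines; the numbers are Fischer's, the variable names are ours:

    ```python
    # Fischer's numbers: priors for gender, conditionals for apnea per gender.
    P_M, P_F = 0.4, 0.6
    P_A_given_M, P_A_given_F = 0.8, 0.3

    # Total probability reconstructs P(A); Bayes' formula gives the posteriors.
    P_A = P_A_given_M * P_M + P_A_given_F * P_F
    P_M_given_A = P_A_given_M * P_M / P_A
    P_F_given_A = P_A_given_F * P_F / P_A
    ```

    Running this reproduces P(A) = 0.50, P(M | A) = 0.64 and P(F | A) = 0.36, matching the worked example.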
  • Numerical Physics with Probabilities: the Monte Carlo Method and Bayesian Statistics Part I for Assignment 2
    Numerical Physics with Probabilities: The Monte Carlo Method and Bayesian Statistics. Part I for Assignment 2. Department of Physics, University of Surrey. Module: Energy, Entropy and Numerical Physics (PHY2063). 1. Numerical Physics part of Energy, Entropy and Numerical Physics. This numerical physics course is part of the second-year Energy, Entropy and Numerical Physics module. It is online at the EENP module on SurreyLearn. See there for assignments, deadlines etc. The course is about numerically solving ODEs (ordinary differential equations) and PDEs (partial differential equations), and introducing the (large) part of numerical physics where probabilities are used. This assignment is on the numerical physics of probabilities, and looks at the Monte Carlo (MC) method, and at the Bayesian statistics approach to data analysis. It covers MC and Bayesian statistics, in that order. MC is a widely used numerical technique; it is used, amongst other things, for modelling many random processes. MC is used in fields from statistical physics, to nuclear and particle physics. Bayesian statistics is a powerful data analysis method, and is used everywhere from particle physics to spam-email filters. Data analysis is fundamental to science. For example, analysis of the data from the Large Hadron Collider was required to extract a most probable value for the mass of the Higgs boson, together with an estimate of the region of masses where the scientists think the mass is. This region is typically expressed as a range of mass values where they think the true mass lies with high (e.g., 95%) probability. Many of you will be analysing data (physics data, commercial data, etc.) for your PTY or RY, or future careers.
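    A minimal sketch of the idea described above: draw pseudo-measurements and report a value together with a central 95% region read off from the empirical quantiles. The Gaussian with mean 125.0 and width 0.4 is an invented placeholder, not LHC data:

    ```python
    import random

    random.seed(2)

    # Draw pseudo-measurements from an assumed Gaussian; report the sample
    # mean plus a central 95% region from the 2.5% and 97.5% quantiles.
    samples = sorted(random.gauss(125.0, 0.4) for _ in range(20_000))
    mean = sum(samples) / len(samples)
    lo = samples[int(0.025 * len(samples))]   # 2.5% empirical quantile
    hi = samples[int(0.975 * len(samples))]   # 97.5% empirical quantile
    ```

    For a Gaussian the central 95% region spans about ±1.96 standard deviations around the mean, which the empirical quantiles reproduce to good accuracy at this sample size.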
  • The Bayesian Approach to Statistics
    THE BAYESIAN APPROACH TO STATISTICS. ANTHONY O'HAGAN.

    INTRODUCTION. By far the most widely taught and used statistical methods in practice are those of the frequentist school. The ideas of frequentist inference, as set out in Chapter 5 of this book, rest on the frequency definition of probability (Chapter 2), and were developed in the first half of the 20th century. This chapter concerns a radically different approach to statistics, the Bayesian approach, which depends instead on the subjective definition of probability (Chapter 3). In some respects, Bayesian methods are older than frequentist ones, having been the basis of very early statistical reasoning as far back as the 18th century. Bayesian statistics as it is now understood, however, dates back to the 1950s, with subsequent development in the second half of the 20th century. Over that time, the Bayesian approach has steadily gained ground, and is now recognized as a legitimate alternative to the frequentist approach. This chapter is organized into three sections. … the true nature of scientific reasoning. The final section addresses various features of modern Bayesian methods that provide some explanation for the rapid increase in their adoption since the 1980s.

    BAYESIAN INFERENCE. We first present the basic procedures of Bayesian inference.

    Bayes's Theorem and the Nature of Learning. Bayesian inference is a process of learning from data. To give substance to this statement, we need to identify who is doing the learning and what they are learning about.

    Terms and Notation. The person doing the learning is an individual
  • Paradoxes and Priors in Bayesian Regression
    Paradoxes and Priors in Bayesian Regression. Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University. By Agniva Som, B. Stat., M. Stat. Graduate Program in Statistics, The Ohio State University, 2014. Dissertation Committee: Dr. Christopher M. Hans, Advisor; Dr. Steven N. MacEachern, Co-advisor; Dr. Mario Peruggia. © Copyright by Agniva Som 2014. Abstract: The linear model has been by far the most popular and most attractive choice of a statistical model over the past century, ubiquitous in both frequentist and Bayesian literature. The basic model has been gradually improved over the years to deal with stronger features in the data like multicollinearity, non-linear or functional data patterns, violation of underlying model assumptions etc. One valuable direction pursued in the enrichment of the linear model is the use of Bayesian methods, which blend information from the data likelihood and suitable prior distributions placed on the unknown model parameters to carry out inference. This dissertation studies the modeling implications of many common prior distributions in linear regression, including the popular g prior and its recent ameliorations. Formalization of desirable characteristics for model comparison and parameter estimation has led to the growth of appropriate mixtures of g priors that conform to the seven standard model selection criteria laid out by Bayarri et al. (2012). The existence of some of these properties (or lack thereof) is demonstrated by examining the behavior of the prior under suitable limits on the likelihood or on the prior itself.
  • A Widely Applicable Bayesian Information Criterion
    Journal of Machine Learning Research 14 (2013) 867-897. Submitted 8/12; Revised 2/13; Published 3/13. A Widely Applicable Bayesian Information Criterion. Sumio Watanabe, [email protected], Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Mailbox G5-19, 4259 Nagatsuta, Midori-ku, Yokohama, Japan 226-8502. Editor: Manfred Opper. Abstract: A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. If otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined by the minus logarithm of Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for or unrealizable by a true distribution.
  • Part IV: Monte Carlo and Nonparametric Bayes Outline
    Part IV: Monte Carlo and nonparametric Bayes. Outline: Monte Carlo methods; Nonparametric Bayesian models.

    The Monte Carlo principle: the expectation of f with respect to P can be approximated by E_P(x)[f(x)] ≈ (1/n) Σ_{i=1}^{n} f(x_i), where the x_i are sampled from P(x). Example: the average number of spots on a die roll. (Slide: the Monte Carlo principle and the law of large numbers, plotting the average number of spots against the number of rolls.)

    Two uses of Monte Carlo methods: 1. For solving problems of probabilistic inference involved in developing computational models. 2. As a source of hypotheses about how the mind might solve problems of probabilistic inference.

    Making Bayesian inference easier: P(h | d) = P(d | h) P(h) / Σ_{h' ∈ H} P(d | h') P(h'). Evaluating the posterior probability of a hypothesis requires considering all hypotheses; modern Monte Carlo methods let us avoid this.

    Modern Monte Carlo methods: sampling schemes for distributions with large state spaces known up to a multiplicative constant. Two approaches: importance sampling (and particle filters); Markov chain Monte Carlo.

    Importance sampling. Basic idea: generate from the wrong distribution, assign weights to samples to correct for this: E_p(x)[f(x)] = ∫ f(x) p(x) dx = ∫ f(x) (p(x)/q(x)) q(x) dx ≈ (1/n) Σ_{i=1}^{n} f(x_i) p(x_i)/q(x_i) for x_i ~ q(x). Importance sampling works when sampling from the proposal is easy and the target is hard. An alternative scheme: E_p(x)[f(x)] ≈ (1/n) Σ_{i=1}^{n} f(x_i) p(x_i)/q(x_i) for x_i ~ q(x) …
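    The importance-sampling identity above can be sketched directly. The standard-normal target, the σ = 2 Gaussian proposal, and f(x) = x² are our illustrative choices, not the slides':

    ```python
    import math
    import random

    random.seed(3)

    def p(x):
        # Target density: standard normal (illustrative choice).
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

    def q(x):
        # Proposal density: normal with sigma = 2, easy to sample from.
        return math.exp(-x * x / 8) / (2 * math.sqrt(2 * math.pi))

    # E_p[f(x)] ≈ (1/n) Σ f(x_i) p(x_i)/q(x_i), x_i ~ q(x), with f(x) = x².
    n = 200_000
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 2.0)
        total += x * x * p(x) / q(x)
    estimate = total / n   # exact answer: E_p[x²] = 1 for a standard normal
    ```

    Each sample from the "wrong" distribution q is reweighted by p(x)/q(x), exactly as in the identity; the weighted average converges to the expectation under p.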
  • Marginal Likelihood
    STA 4273H: Statistical Machine Learning. Russ Salakhutdinov, Department of Statistics, [email protected], http://www.utstat.utoronto.ca/~rsalakhu/, Sidney Smith Hall, Room 6002. Lecture 2.

    Last Class: In our last class, we looked at: Statistical Decision Theory; Linear Regression Models; Linear Basis Function Models; Regularized Linear Regression Models; Bias-Variance Decomposition. We will now look at the Bayesian framework and Bayesian Linear Regression Models.

    Bayesian Approach: We formulate our knowledge about the world probabilistically: We define the model that expresses our knowledge qualitatively (e.g. independence assumptions, forms of distributions). Our model will have some unknown parameters. We capture our assumptions, or prior beliefs, about unknown parameters (e.g. range of plausible values) by specifying the prior distribution over those parameters before seeing the data. We observe the data. We compute the posterior probability distribution for the parameters, given observed data. We use this posterior distribution to: make predictions by averaging over the posterior distribution; examine/account for uncertainty in the parameter values; make decisions by minimizing expected posterior loss. (See Radford Neal's NIPS tutorial on "Bayesian Methods for Machine Learning".)

    Posterior Distribution: The posterior distribution for the model parameters can be found by combining the prior with the likelihood for the parameters given the data. This is accomplished using Bayes'
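    A minimal sketch of the prior-to-posterior recipe described above, using a conjugate Beta–Bernoulli model rather than the lecture's linear-regression setting (the prior parameters and the data below are invented for illustration):

    ```python
    # Beta(a, b) prior on an unknown coin bias θ, updated on Bernoulli data.
    # Conjugacy makes the posterior Beta(a + heads, b + tails), so the
    # prior + likelihood -> posterior recipe reduces to two additions.
    a, b = 2.0, 2.0
    flips = [1, 1, 0, 1, 1, 0, 1, 1]        # observed data: 6 heads, 2 tails
    heads = sum(flips)
    tails = len(flips) - heads

    a_post, b_post = a + heads, b + tails
    post_mean = a_post / (a_post + b_post)  # predict by averaging over posterior
    post_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    ```

    The posterior mean (here 2/3) is the prediction obtained by averaging over the posterior, and the posterior variance quantifies the remaining uncertainty in the parameter value, as in the slides' summary.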
  • Luca Lista Statistical Methods for Data Analysis in Particle Physics Lecture Notes in Physics
    Lecture Notes in Physics 909. Luca Lista, Statistical Methods for Data Analysis in Particle Physics. Lecture Notes in Physics, Volume 909. Founding Editors: W. Beiglböck, J. Ehlers, K. Hepp, H. Weidenmüller. Editorial Board: M. Bartelmann, Heidelberg, Germany; B.-G. Englert, Singapore, Singapore; P. Hänggi, Augsburg, Germany; M. Hjorth-Jensen, Oslo, Norway; R.A.L. Jones, Sheffield, UK; M. Lewenstein, Barcelona, Spain; H. von Löhneysen, Karlsruhe, Germany; J.-M. Raimond, Paris, France; A. Rubio, Donostia, San Sebastian, Spain; S. Theisen, Potsdam, Germany; D. Vollhardt, Augsburg, Germany; J.D. Wells, Ann Arbor, USA; G.P. Zank, Huntsville, USA. The Lecture Notes in Physics: The series Lecture Notes in Physics (LNP), founded in 1969, reports new developments in physics research and teaching, quickly and informally, but with a high quality and the explicit aim to summarize and communicate current knowledge in an accessible way. Books published in this series are conceived as bridging material between advanced graduate textbooks and the forefront of research and to serve three purposes: to be a compact and modern up-to-date source of reference on a well-defined topic; to serve as an accessible introduction to the field to postgraduate students and nonspecialist researchers from related areas; to be a source of advanced teaching material for specialized seminars, courses and schools. Both monographs and multi-author volumes will be considered for publication. Edited volumes should, however, consist of a very limited number of contributions only. Proceedings will not be considered for LNP. Volumes published in LNP are disseminated both in print and in electronic formats, the electronic archive being available at springerlink.com.
  • Arxiv:1312.5000V2 [Hep-Ex] 30 Dec 2013
    Mass distributions marginalized over per-event errors. D. Martínez Santos (NIKHEF and VU University Amsterdam, Amsterdam, The Netherlands) and F. Dupertuis (Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland). arXiv:1312.5000v2 [hep-ex] 30 Dec 2013.

    Abstract: We present generalizations of the Crystal Ball function to describe mass peaks in which the per-event mass resolution is unknown and marginalized over. The presented probability density functions are tested using a series of toy MC samples generated with Pythia and smeared with different amounts of multiple scattering and for different detector resolutions. Keywords: statistics, invariant mass peaks.

    1. Introduction. A very common probability density function (p.d.f.) used to fit the mass peak of a resonance in experimental particle physics is the so-called Crystal Ball (CB) function [1-3]:

        p(m) ∝ exp(-(1/2)((m - µ)/σ)²)    if (m - µ)/σ > -a,
        p(m) ∝ A (B - (m - µ)/σ)^(-n)     otherwise,         (1)

    where m is the free variable (the measured mass), µ is the most probable value (the resonance mass), σ the resolution, a is called the transition point and n the power-law exponent. A and B are calculated by imposing the continuity of the function and its derivative at the transition point a. This function consists of a Gaussian core, that models the detector resolution, with a tail on the left-hand side that parametrizes the effect of photon radiation by the final state particles in the decay.
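    Eq. (1) can be transcribed directly, with A and B fixed by the continuity conditions the text describes; the closed forms for A and B below are the standard Crystal Ball coefficients, and any parameter values used to exercise the function are arbitrary:

    ```python
    import math

    def crystal_ball(m, mu, sigma, a, n):
        """Unnormalized Crystal Ball shape of Eq. (1): a Gaussian core with
        a power-law left tail. A and B enforce continuity of the function
        and of its first derivative at the transition point t = -a."""
        t = (m - mu) / sigma
        if t > -a:
            return math.exp(-0.5 * t * t)
        A = (n / abs(a)) ** n * math.exp(-0.5 * a * a)
        B = n / abs(a) - abs(a)
        return A * (B - t) ** (-n)
    ```

    At the transition point both branches evaluate to exp(-a²/2), so the shape is continuous there by construction, and the tail stays strictly positive arbitrarily far below the peak.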
  • Naïve Bayes Classifier
    NAÏVE BAYES CLASSIFIER. Professor Tom Fomby, Department of Economics, Southern Methodist University, Dallas, Texas 75275. April 2008.

    The Naïve Bayes classifier is a classification method based on Bayes Theorem. Let C_j denote that an output belongs to the j-th class, j = 1, 2, …, J, out of J possible classes. Let P(C_j | X_1, X_2, …, X_p) denote the (posterior) probability of belonging in the j-th class given the individual characteristics X_1, X_2, …, X_p. Furthermore, let P(X_1, X_2, …, X_p | C_j) denote the probability of a case with individual characteristics (X_1, X_2, …, X_p) belonging to the j-th class, and let P(C_j) denote the unconditional (i.e. without regard to individual characteristics) prior probability of belonging to the j-th class. For a total of J classes, Bayes theorem gives us the following probability rule for calculating the case-specific probability of falling into the j-th class:

    P(C_j | X_1, X_2, …, X_p) = P(X_1, X_2, …, X_p | C_j) P(C_j) / Denom    (1)

    where Denom = P(X_1, X_2, …, X_p | C_1) P(C_1) + … + P(X_1, X_2, …, X_p | C_J) P(C_J). Of course the conditional class probabilities of (1) are exhaustive in that a case (X_1, X_2, …, X_p) has to fall in one of the J classes. That is, Σ_{j=1}^{J} P(C_j | X_1, X_2, …, X_p) = 1. The difficulty with using (1) is that in situations where the number of cases (X_1, X_2, …, X_p) is few and distinct and the number of classes J is large, there may be many instances where the probabilities of cases falling in specific classes, P(X_1, X_2, …, X_p | C_j), are frequently equal to zero for the majority of classes.
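    Rule (1) can be sketched for two classes and two binary features, adding the usual naïve conditional-independence assumption so that the class-conditional probability factorizes over features. All numbers and class names below are invented for illustration:

    ```python
    # Two classes, two binary features; P(X_k = 1 | C_j) per class.
    priors = {"class1": 0.5, "class2": 0.5}
    p_feature = {"class1": [0.8, 0.6], "class2": [0.1, 0.4]}

    def posterior(x):
        # Numerator of rule (1) per class, with the naïve factorization
        # P(X_1, ..., X_p | C_j) = Π_k P(X_k | C_j); then divide by Denom.
        scores = {}
        for c, prior in priors.items():
            like = prior
            for pk, xk in zip(p_feature[c], x):
                like *= pk if xk == 1 else 1.0 - pk
            scores[c] = like
        denom = sum(scores.values())
        return {c: s / denom for c, s in scores.items()}

    post = posterior([1, 1])
    ```

    Because the denominator is the sum over all J classes, the resulting conditional class probabilities are exhaustive: they sum to one, exactly as the text notes.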
  • Constraints Versus Priors † Philip B
    SIAM/ASA J. UNCERTAINTY QUANTIFICATION, Vol. 3, pp. 586-598. © 2015 Society for Industrial and Applied Mathematics and American Statistical Association. Constraints versus Priors. Philip B. Stark. Abstract: There are deep and important philosophical differences between Bayesian and frequentist approaches to quantifying uncertainty. However, some practitioners choose between these approaches primarily on the basis of convenience. For instance, the ability to incorporate parameter constraints is sometimes cited as a reason to use Bayesian methods. This reflects two misunderstandings: First, frequentist methods can indeed incorporate constraints on parameter values. Second, it ignores the crucial question of what the result of the analysis will mean. Bayesian and frequentist measures of uncertainty have similar sounding names but quite different meanings. For instance, Bayesian uncertainties typically involve expectations with respect to the posterior distribution of the parameter, holding the data fixed; frequentist uncertainties typically involve expectations with respect to the distribution of the data, holding the parameter fixed. Bayesian methods, including methods incorporating parameter constraints, require supplementing the constraints with a prior probability distribution for parameter values. This can cause frequentist and Bayesian estimates and their nominal uncertainties to differ substantially, even when the prior is "uninformative." This paper gives simple examples where "uninformative" priors are, in fact, extremely informative, and sketches how to measure how much information the prior adds to the constraint. Bayesian methods can have good frequentist behavior, and a frequentist can use Bayesian methods and quantify the uncertainty by frequentist means – but absent a meaningful prior, Bayesian uncertainty measures lack meaning. The paper ends with brief reflections on practice.