Duality for Real and Multivariate Exponential Families



Gérard Letac∗
Institut de Mathématiques de Toulouse, 118 route de Narbonne, 31062 Toulouse, France.
arXiv:2104.05510v2 [math.PR] 8 Aug 2021

Abstract

Consider a measure $\mu$ on $\mathbb{R}^n$ generating a natural exponential family $F(\mu)$ with variance function $V_{F(\mu)}(m)$ and Laplace transform $\exp(\ell_\mu(s)) = \int_{\mathbb{R}^n} \exp(-\langle s, x\rangle)\,\mu(dx)$. A dual measure $\mu^*$ satisfies $-\ell'_{\mu^*}(-\ell'_\mu(s)) = s$. Such a dual measure does not always exist. One important property is $\ell''_{\mu^*}(m) = (V_{F(\mu)}(m))^{-1}$, leading to the notion of duality among exponential families (or rather among the extended notion of T exponential families $TF$ obtained by considering all translations of a given exponential family $F$).

Keywords: Dilogarithm distribution, Landau distribution, large deviations, quadratic and cubic real exponential families, Tweedie scale, Wishart distributions.

2020 MSC: Primary 62H05, Secondary 60E10

1. Introduction

One can be surprised by the explicit formulas that one gets from large deviations in one dimension. Consider iid random variables $X_1, \ldots, X_n, \ldots$ and the empirical mean $\overline{X}_n = (X_1 + \cdots + X_n)/n$. Then the Cramér theorem [6] says that the limit
$$\lim_{n\to\infty} \Pr(\overline{X}_n > m)^{1/n} = \alpha(m)$$
does exist for $m > E(X_1)$. For instance, the symmetric Bernoulli case $\Pr(X_n = \pm 1) = \frac{1}{2}$ leads, for $0 < m < 1$, to
$$\alpha(m) = (1+m)^{-\frac{1+m}{2}}\,(1-m)^{-\frac{1-m}{2}}. \qquad (1)$$
For the bilateral exponential distribution $X_n \sim \frac{1}{2}e^{-|x|}\,dx$ we get, for $m > 0$,
$$\alpha(m) = \frac{1}{2}\,e^{1-\sqrt{1+m^2}}\left(1 + \sqrt{1+m^2}\right).$$
What strange formulas! Other ones can be found in [17], Problem 410. The present paper interprets, in certain cases, the function $m \mapsto 1/\alpha(m)$ as the Laplace transform of a certain dual measure $\mu^*$ which can be deduced explicitly from the distribution $\mu$ of $X_n$.
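The two closed forms above can be checked numerically against Cramér's theorem, which for these laws gives $\alpha(m) = \exp(-\sup_\theta[\theta m - \log E e^{\theta X_1}])$. The Python sketch below is only an illustration, not the paper's method: it locates the supremum by ternary search, which assumes the objective is concave and peaks strictly inside the chosen bracket, and all function names are hypothetical.

```python
import math


def cramer_alpha(log_mgf, m, lo, hi, iters=200):
    """Return alpha(m) = exp(-sup_theta [theta * m - log_mgf(theta)]).

    The supremum is located by ternary search on [lo, hi]; this assumes the
    objective is concave on the bracket and peaks strictly inside it.
    """
    def objective(theta):
        return theta * m - log_mgf(theta)

    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if objective(a) < objective(b):
            lo = a  # for a concave objective the peak lies in [a, hi]
        else:
            hi = b  # otherwise it lies in [lo, b]
    return math.exp(-objective((lo + hi) / 2))


def alpha_bernoulli(m):
    """Formula (1): Pr(X = 1) = Pr(X = -1) = 1/2, for 0 < m < 1."""
    return (1 + m) ** (-(1 + m) / 2) * (1 - m) ** (-(1 - m) / 2)


def alpha_laplace(m):
    """Bilateral exponential density exp(-|x|)/2, for m > 0."""
    r = math.sqrt(1 + m * m)
    return 0.5 * math.exp(1 - r) * (1 + r)


def bernoulli_log_mgf(theta):
    return math.log(math.cosh(theta))  # log E[exp(theta X)] for X = +/-1


def laplace_log_mgf(theta):
    return -math.log(1 - theta * theta)  # finite only for |theta| < 1


for m in (0.2, 0.5, 0.9):
    print(m, cramer_alpha(bernoulli_log_mgf, m, 0.0, 20.0), alpha_bernoulli(m))
for m in (0.5, 1.0, 3.0):
    print(m, cramer_alpha(laplace_log_mgf, m, 0.0, 1.0 - 1e-12), alpha_laplace(m))
```

Since the maximizer $\theta^*(m)$ solves $m = (\log E e^{\theta X_1})'(\theta^*)$, the envelope theorem gives $\frac{d}{dm}\log(1/\alpha(m)) = \theta^*(m)$, which is $\operatorname{artanh} m$ in the symmetric Bernoulli case. With $\theta = -s$ in the bilateral convention, this matches the relation $-\ell'_{\mu^*}(m) = s$ of the abstract, so $\log(1/\alpha)$ and $\ell_{\mu^*}$ share the same derivative whenever the dual exists.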
∗Corresponding author. Preprint submitted to Journal of Multivariate Analysis, August 10, 2021.

Since the Cramér theorem can be seen as a result about one-dimensional exponential families, we develop the idea in this framework, using the large box of examples obtained from the theory of variance functions of exponential families initiated by the article of Carl Morris [21] in 1982. An even simpler example of duality is provided by the Tweedie families (Bar-Lev and Enis [1], Jørgensen [9] and Tweedie [22]): the variance function $Am^p$ with $p > 1$ has dual $Bm^q$ with $\frac{1}{p} + \frac{1}{q} = 1$ for suitable pairs $(A, B)$ (see (8)). For instance, the inverse Gaussian family $Am^3$ has dual $Bm^{3/2}$. The Poisson distribution, with one of the simplest variance functions, $V_F(m) = m$, leads to the study of the exponential family with variance function $e^m$, generated up to a translation by the unsymmetric stable law with Laplace transform $s^s e^{-s}$, also called the Landau distribution. This gives tools for describing duals of other familiar exponential families. The cases of the normal and gamma families are very simple, since they are self dual, but other familiar cases like the negative binomial, the Bernoulli and the cubic families are tougher. Finally, we consider another family, the dilogarithm family with variance function $e^m - 1$, as well as the one with variance function $\sinh m$. Like the normal and gamma families, they have the remarkable property of being self dual.

The definition of dual measures also makes sense in $\mathbb{R}^n$, although the probabilistic interpretation in terms of large deviations is lost. We nevertheless consider several cases in $\mathbb{R}^n$: the multinomial distribution, the Wishart distributions and other quadratic families as classified by Casalis ([4]).

We proceed as follows. The notion of duality unfortunately forces us to depart slightly from the tradition about exponential families: we use $e^{-sx}$ instead of $e^{\theta x}$ in order to obtain more readable formulas later. This is explained in Section 2, together with the description of the classical objects attached to an exponential family. In the preceding lines we have been vague about duality. Section 3 gives proper definitions, explaining what we call a dual measure $\mu^*$ of $\mu$ and showing that some measures have no dual. We also explain what a T exponential family $TF$ is: nothing but an exponential family $F$ together with all its translations. Indeed, talking about the dual $F^*$ of an exponential family $F$ does not exactly make sense, while the dual $TF^*$ of $TF$ does. Section 3 also gives the link with large deviations. Section 4 concentrates on the $TF$ whose variance function $V_F(m)$ is $e^m$, and on some parent distributions. It also gives details on what we call Lévy measures of types 0, 1 and 2. Of course, large parts of this material are well known to probabilists and statisticians: exponential families, variance functions, Lévy measures, the Landau distribution. It was necessary to present them again for the convenience of the reader. This section contains, in Proposition 7, calculations that are crucial for the sequel. Section 5 applies the results of Section 4 to the description of the duals of the Morris and the cubic families, with the surprising fact that they all exist, the only exception being the hyperbolic family with variance function $m^2 + 1$. Section 6 describes the self dual dilogarithm distribution $\mu$ on the set $\mathbb{N} = \{0, 1, \ldots\}$ of integers defined by
$$\sum_{n=0}^{\infty} \mu(n) z^n = \exp \sum_{k=1}^{\infty} \frac{1}{k^2}(z^k - 1),$$
which, for $m > 0$, generates the exponential family with variance function $e^m - 1$. Since this exponential family and a set of parent distributions have not been studied in the literature, we develop some of their properties, somewhat deviating from the study of duality. For instance, if $N$ is the standard Gaussian distribution, then the variance function of the exponential family generated by the convolution $N * \mu$ is $e^m + 1$. Section 7 considers the $\mathbb{R}^n$ case: the multinomial distribution has a very explicit dual expressed in terms of the Landau distribution. The Wishart distribution is self dual, like the one-dimensional gamma distribution. We prove some negative results, like the fact that the multivariate negative binomial law has no dual. Section 8 discusses open problems.

2. Laplace and bilateral Laplace transforms

At first the exponential families and Laplace transforms are considered. A certain tradition among statisticians (see Morris [21]), as opposed to physicists and maybe to probabilists, defines the Laplace transform of a positive measure $\mu$ on $\mathbb{R}$ and $\mathbb{R}^n$ as follows:
$$L_\mu(\theta) = \int_{\mathbb{R}^n} e^{\langle\theta,x\rangle}\mu(dx). \qquad (2)$$
From the Hölder inequality the set $D(\mu) = \{\theta \,;\, L_\mu(\theta) < \infty\}$ is convex, and the function $k_\mu = \log L_\mu$ is convex on $D(\mu)$. Actually $k_\mu$ is strictly convex except in the particular case where $\mu$ is concentrated on one point (in $\mathbb{R}$) or on an affine hyperplane (in $\mathbb{R}^n$). To avoid trivialities one introduces the interior $\Theta(\mu)$ of $D(\mu)$. One denotes by $\mathcal{M}(\mathbb{R}^n)$ the set of $\mu$ which are not concentrated on an affine hyperplane and such that $\Theta(\mu)$ is not empty. Such a $\mu$ generates a set of probabilities
$$P(\theta,\mu)(dx) = e^{\langle\theta,x\rangle - k_\mu(\theta)}\mu(dx),$$
and $F = F(\mu) = \{P(\theta,\mu) \,;\, \theta \in \Theta(\mu)\}$ is called the natural exponential family generated by $\mu$. Note that $\mu$ is not necessarily bounded: simple examples on $\mathbb{R}$ like $\mu(dx) = \mathbf{1}_{(0,\infty)}(x)\,dx$ or $\sum_{n=0}^{\infty}\delta_n$ generate the important families of exponential distributions and geometric discrete laws. Omitting 'natural', we will always say exponential family for short.

Objects linked to $F$ are the mean $m$ of $P(\theta,\mu)$, the domain of the means $M_F$ and the inverse function $\psi_\mu$. They are defined by
$$m = k_\mu'(\theta) = \int_{\mathbb{R}^n} x\,P(\theta,\mu)(dx), \qquad M_F = k_\mu'(\Theta(\mu)), \qquad \theta = \psi_\mu(m).$$
Note that since $k_\mu$ is strictly convex, $k_\mu'$ is injective on $\Theta(\mu)$, so the map $\psi_\mu$ from $M_F$ onto $\Theta(\mu)$ is well defined and $M_F$ is a connected open set.
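As a small illustration of the objects just defined (not taken from the paper), the sketch below works in convention (2) with the dilogarithm distribution of Section 6. It builds the weights $\mu(n)$ from the generating function through the recursion $n\mu(n) = \sum_{k=1}^{n}\mu(n-k)/k$, which follows from differentiating $\exp\sum_k (z^k-1)/k^2$, then evaluates truncated sums for $L_\mu(\theta)$, $m = k_\mu'(\theta)$ and $k_\mu''(\theta)$ at a few $\theta < 0$. It checks numerically the variance function $e^m - 1$ stated in the introduction, and $e^m + 1$ for the convolution with the standard Gaussian. The function names and the truncation level are hypothetical choices.

```python
import math

ZETA2 = math.pi ** 2 / 6  # sum_{k >= 1} 1 / k^2


def dilog_weights(n_max):
    """Weights mu(0..n_max) with sum_n mu(n) z^n = exp(sum_k (z^k - 1) / k^2).

    Writing G = exp(H) with H(z) = sum_k z^k / k^2 - ZETA2, the identity
    G' = H' G gives n * mu(n) = sum_{k=1}^{n} mu(n - k) / k.
    """
    mu = [math.exp(-ZETA2)]  # mu(0) = G(0)
    for n in range(1, n_max + 1):
        mu.append(sum(mu[n - k] / k for k in range(1, n + 1)) / n)
    return mu


def tilted_moments(weights, theta):
    """Truncated L_mu(theta), and mean k'(theta), variance k''(theta) of P(theta, mu)."""
    w = [p * math.exp(theta * n) for n, p in enumerate(weights)]
    total = sum(w)
    mean = sum(n * x for n, x in enumerate(w)) / total
    var = sum((n - mean) ** 2 * x for n, x in enumerate(w)) / total
    return total, mean, var


def dilog_laplace(theta, terms=2000):
    """Closed form L_mu(theta) = exp(sum_k e^{k theta} / k^2 - ZETA2), for theta < 0."""
    return math.exp(sum(math.exp(k * theta) / k ** 2 for k in range(1, terms + 1)) - ZETA2)


mu = dilog_weights(600)
for theta in (-2.0, -0.5, -0.1):
    L, m, v = tilted_moments(mu, theta)
    # Claimed variance functions: e^m - 1 for mu itself, and e^m + 1 for
    # N * mu, whose mean is theta + m and whose variance is 1 + v.
    print(theta, L, dilog_laplace(theta), v, math.expm1(m), 1 + v, math.exp(theta + m) + 1)
```

The weights decay fast enough under $e^{\theta n}$ with $\theta \le -0.1$ that truncating at $n = 600$ is harmless here; values of $\theta$ closer to $0$ would need a larger cutoff, since $\mu$ itself has an infinite mean.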
If $C_F$ is the closed convex set generated by the support of $\mu$, clearly $M_F$ is contained in $C_F$. We say that $F$ or $\mu$ is steep if $M_F$ is equal to the interior of $C_F$. Most of the classical exponential families are steep, but not all of them (see for instance the Tweedie scale below for $p < 0$). In $\mathbb{R}$ the set $M_F$ is an interval, but in $\mathbb{R}^n$ there are non-steep examples such that $M_F$ is non-convex ([16], p. 35). Finally, the last important object about the exponential family $F$ is its variance function $V_F$, defined on $M_F$ by
$$V_F(m) = (\psi_\mu'(m))^{-1} = k_\mu''(\psi_\mu(m)) = \int_{\mathbb{R}^n} (x-m)\otimes(x-m)\,P(\psi_\mu(m),\mu)(dx),$$
which characterizes $F$.

Now bilateral Laplace transforms are considered. In the particular case of dimension one, an older tradition associates the name of Laplace to integrals $\int_0^\infty e^{-sx}\mu(dx)$, which are conveniently defined for $s > 0$ in many circumstances. In the present paper we need to consider what physicists call the bilateral Laplace transform
$$B_\mu(s) = \int_{\mathbb{R}^n} e^{-\langle s,x\rangle}\mu(dx). \qquad (3)$$
This slight change of notation, $B_\mu(s) = L_\mu(-s)$, will greatly simplify the description of duality between two natural exponential families. In the sequel, we say 'Laplace transform' for short, by abuse of language, instead of the longer term 'bilateral Laplace transform'. Because we will deal with these bilateral Laplace transforms, we have to modify the description of the classical objects associated to $F = F(\mu)$ as follows:
$$S(\mu) = -\Theta(\mu),\qquad \ell_\mu(s) = k_\mu(-s),\qquad m = -\ell_\mu'(s),$$
$$s = \varphi_\mu(m) = -\psi_\mu(m),\qquad \ell_\mu''(s) = k_\mu''(-s),\qquad V_F(m) = -(\varphi_\mu'(m))^{-1}. \qquad (4)$$
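Combining (4) with the two relations quoted in the abstract gives a practical recipe for the dual variance function. At the parameter $m$, the family $F(\mu^*)$ has mean $-\ell'_{\mu^*}(m) = s = \varphi_\mu(m)$ and variance $\ell''_{\mu^*}(m) = 1/V_F(m)$, hence $V_{F(\mu^*)}(\varphi_\mu(m)) = 1/V_F(m)$. Since $\varphi_\mu' = -1/V_F$, we have $\varphi_\mu(m) = c - \int_{m_0}^{m} dt/V_F(t)$, and the constant $c$ is not determined by $V_F$, which is one way to see why duality is stated for T families. The Python sketch below is an illustration rather than the paper's method: it evaluates the integral with Simpson's rule and checks the dual pairs quoted in the introduction (Tweedie $Am^p \leftrightarrow Bm^q$, Poisson $m \leftrightarrow e^m$, gamma, $e^m - 1$ and $\sinh m$). The values of $c$ come from closed-form antiderivatives we chose, and the constant $B = A^{q-1}(p-1)^q$ is produced by that normalization, not quoted from (8).

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def dual_point(V, m, m0, c):
    """Point (m_star, V_star) on the graph of the dual variance function.

    m_star = phi(m) = c - int_{m0}^{m} dt / V(t), since phi' = -1/V, and
    V_star = 1 / V(m). The constant c = phi(m0) is a free choice, mirroring
    the fact that the dual is only defined up to translation.
    """
    m_star = c - simpson(lambda t: 1.0 / V(t), m0, m)
    return m_star, 1.0 / V(m)


A, p = 2.0, 3.0                  # Tweedie family with variance function A m^p
q = p / (p - 1)                  # conjugate exponent: 1/p + 1/q = 1
B = A ** (q - 1) * (p - 1) ** q  # constant produced by our choice of c

# name: (V, c = phi(1) from a closed-form antiderivative, claimed dual V*)
cases = {
    "Tweedie A m^p": (lambda t: A * t ** p, 1 / (A * (p - 1)), lambda u: B * u ** q),
    "Poisson m": (lambda t: t, 0.0, math.exp),
    "gamma m^2/3": (lambda t: t * t / 3, 3.0, lambda u: u * u / 3),
    "e^m - 1": (math.expm1, -math.log(-math.expm1(-1.0)), math.expm1),
    "sinh m": (math.sinh, -math.log(math.tanh(0.5)), math.sinh),
}

for name, (V, c, V_dual) in cases.items():
    for m in (0.5, 2.0, 3.0):
        m_star, v_star = dual_point(V, m, 1.0, c)
        print(f"{name:14s} m={m}: V*(m*)={V_dual(m_star):.10g}  1/V(m)={v_star:.10g}")
```

Changing `c` shifts every `m_star` by the same amount, so each check would then hold only for the correspondingly translated dual; this is the translation freedom that the T families absorb.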