Appendix A Cookbook of Distributions

We've a first-class assortment of magic;
And for raising a posthumous shade
With effects that are comic or tragic,
There's no cheaper house in the trade.
—from the opera The Sorcerer by W. S. Gilbert and Arthur Sullivan

This appendix gives the definitions and properties of a variety of typical distributions for random variables. Most of these distributions are discussed elsewhere in the text, but having the definitions in a central location can be useful for reference. The notation used here is the same as can be found in other standard references, except where indicated otherwise.

A.1 Bernoulli Distribution

This is a discrete distribution where the random variable takes the value 1 with probability p and the value 0 with probability 1 − p. If we consider the toss of a fair coin, and we assign the outcome of heads the value x = 1 and tails x = 0, then x is Bernoulli distributed with p = 0.5. For simplicity we also define q = 1 − p.

A.1.1 Probability Mass Function (PMF)

$$f(x \mid p) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \end{cases}$$


A.1.2 Cumulative Distribution Function (CDF)

$$F(x \mid p) = \begin{cases} 0 & x < 0 \\ 1 - p & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$

A.1.3 Properties

• Mean: E[x] = p
• Median:

$$\mathrm{Median} = \begin{cases} 0 & q > p \\ 0.5 & q = p \\ 1 & q < p \end{cases}$$

• Mode:

$$\mathrm{Mode} = \begin{cases} 0 & q > p \\ \{0, 1\} & q = p \\ 1 & q < p \end{cases}$$

• Variance: pq = p(1 − p)
• Skewness:

$$\gamma_1 = \frac{1 - 2p}{\sqrt{pq}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \frac{1 - 6pq}{pq}$$
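These properties are straightforward to verify numerically. Below is a minimal sketch using Python's scipy.stats; the value p = 0.3 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import stats

p = 0.3          # arbitrary illustrative value
q = 1.0 - p
dist = stats.bernoulli(p)

# mean, variance, skewness, and excess kurtosis
mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, p)
assert np.isclose(var, p * q)
assert np.isclose(skew, (1 - 2 * p) / np.sqrt(p * q))
assert np.isclose(kurt, (1 - 6 * p * q) / (p * q))

print(dist.pmf([0, 1]))  # [1 - p, p]
```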

A.2 Binomial Distribution

The binomial distribution is a discrete distribution that gives the number of binary events that are successes (i.e., the outcome is 1) out of n ∈ N trials when each trial has probability p of success. As an example, if I flip a fair coin (p = 0.5) ten times (n = 10), then the number of heads, x, in those ten tosses will be binomially distributed. The Bernoulli distribution is a special case of the binomial distribution with n = 1.

A.2.1 PMF

$$f(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x},$$

where the binomial coefficient is given by

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$$

A.2.2 CDF

$$F(x \mid n, p) = I_{1-p}(n - x, 1 + x) = (n - x)\binom{n}{x} \int_0^{1-p} t^{\,n-x-1} (1 - t)^x \, dt,$$

where $I_{1-p}$ is the regularized incomplete beta function.

A.2.3 Properties

• Mean: E[x] = np
• The median for a binomial distribution does not have a simple formula, but it lies between the integer part of np and the value of np rounded up to the nearest integer, i.e., the median is between ⌊np⌋ and ⌈np⌉.
• Mode:

$$\mathrm{Mode} = \begin{cases} \lfloor (n+1)p \rfloor & (n+1)p \text{ is } 0 \text{ or a noninteger} \\ (n+1)p \text{ and } (n+1)p - 1 & (n+1)p \in \{1, \ldots, n\} \\ n & (n+1)p = n + 1 \end{cases}$$

• Variance: np(1 − p)
• Skewness:

$$\gamma_1 = \frac{1 - 2p}{\sqrt{np(1 - p)}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \frac{1 - 6p(1 - p)}{np(1 - p)}$$
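The coin-flip example can be sketched with scipy.stats; the parameters n = 10 and p = 0.5 follow the example above, and the median check uses the floor/ceiling bound from the properties list.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.5                # ten flips of a fair coin
dist = stats.binom(n, p)

print(dist.pmf(5))            # probability of exactly five heads
print(dist.mean())            # mean is n*p = 5

# the median lies between floor(np) and ceil(np)
med = dist.median()
assert np.floor(n * p) <= med <= np.ceil(n * p)
```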

A.3 Poisson Distribution

The Poisson distribution is a discrete distribution on the non-negative integers with a single parameter λ > 0. It gives the probability of an event occurring x times when the events occur independently and at a known average rate λ.

A.3.1 PMF

$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

A.3.2 CDF

$$F(x \mid \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^i}{i!}$$

A.3.3 Properties

• Mean: E[x] = λ
• The median is greater than or equal to λ − log 2 and less than λ + 1/3.
• There are two modes, ⌈λ⌉ − 1 and ⌊λ⌋; these coincide unless λ is an integer.
• Variance: λ
• Skewness:

$$\gamma_1 = \frac{1}{\sqrt{\lambda}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \lambda^{-1}$$
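A minimal numerical check of these properties with scipy.stats; the rate λ = 4.2 is an arbitrary illustrative value.

```python
import numpy as np
from scipy import stats

lam = 4.2
dist = stats.poisson(lam)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, lam) and np.isclose(var, lam)
assert np.isclose(skew, 1 / np.sqrt(lam))
assert np.isclose(kurt, 1 / lam)

# median bound: lambda - log 2 <= median < lambda + 1/3
med = dist.median()
assert lam - np.log(2) <= med < lam + 1 / 3
```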

A.4 Normal Distribution, Gaussian Distribution

The normal, or Gaussian, distribution is the most well-known continuous distribution. It has two parameters, μ ∈ R and σ² > 0, that correspond to the mean and variance of the distribution. We write a random variable x that is normally distributed with parameters μ and σ² as x ∼ N(μ, σ²).

A.4.1 Probability Density Function (PDF)

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

A.4.2 CDF

$$F(x \mid \mu, \sigma^2) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],$$

where the error function erf(x) is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt.$$

A.4.3 Properties

• The mean, median, and mode are all μ.
• The variance is σ².
• The skewness and excess kurtosis are 0.

The standard normal distribution has μ = 0 and σ = 1. Any normal distribution can be transformed into a standard normal by centering and scaling. If x ∼ N(μ, σ²), then z ∼ N(0, 1) with

$$z = \frac{x - \mu}{\sigma}.$$
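The centering-and-scaling transformation can be demonstrated with a short scipy.stats sketch; the parameter values and sample size are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 1.5
rng = np.random.default_rng(0)

x = stats.norm(loc=mu, scale=sigma).rvs(size=100_000, random_state=rng)
z = (x - mu) / sigma          # center and scale

# the transformed samples behave like draws from N(0, 1)
print(z.mean(), z.std())      # approximately 0 and 1
```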

A.5 Multivariate Normal Distribution

The multivariate normal distribution is a multidimensional generalization of the normal distribution. Here x is a k-dimensional vector, x = (x₁, x₂, ..., x_k)ᵀ, and µ is the vector of expected values, i.e., the mean of each of the random variables xᵢ:

$$\boldsymbol{\mu} = (E[x_1], E[x_2], \ldots, E[x_k])^{\mathsf{T}} = (\mu_1, \mu_2, \ldots, \mu_k)^{\mathsf{T}},$$

and the covariance matrix Σ is a symmetric positive definite matrix with determinant written as |Σ|. A vector that is distributed as a multivariate normal with mean vector µ and covariance matrix Σ is written as x ∼ N(µ, Σ).

A.5.1 Probability Density Function (PDF)

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

A.5.2 CDF

There is no closed form expression for the CDF.

A.5.3 Properties

• The mean and mode are µ.
• The variances of the individual components are the diagonal entries of Σ.
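A minimal sampling sketch with scipy.stats; the mean vector and covariance matrix below are arbitrary illustrative choices for k = 2.

```python
import numpy as np
from scipy import stats

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # symmetric positive definite

dist = stats.multivariate_normal(mean=mu, cov=Sigma)
x = dist.rvs(size=200_000, random_state=np.random.default_rng(1))

print(x.mean(axis=0))            # approximately mu
print(np.cov(x, rowvar=False))   # approximately Sigma
```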

A.6 Student’s t-Distribution, t-Distribution

The t-distribution (also known as Student's t-distribution¹) resembles a standard normal distribution, but it has an additional positive, real parameter ν > 0. In the limit ν → ∞ the distribution tends to a standard normal. The parameter ν is often called the number of degrees of freedom. With ν = 1, the distribution is equivalent to the Cauchy distribution (see below). The smaller the value of ν, the thicker the tails of the distribution. Beyond modeling thick-tailed data, the distribution is used to describe the error from estimating the mean of a normal distribution with a small number of samples: given n samples from a normal distribution, the t-statistic formed from the difference between the sample mean and the true mean follows a t-distribution with ν = n − 1.

¹This name arises because the distribution was popularized by William Sealy Gosset under the pseudonym "Student" (Student 1908) to hide, for competitive reasons, the fact that it was used on samples from the beer making process at the Guinness brewery in Dublin, Ireland. Brilliant!

A.6.1 Probability Density Function (PDF)

$$f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$

where Γ(x) is the gamma function.

A.6.2 CDF     F 1 , ν+1 ; 3 ;−x2 1 ν + 1 2 1 2 2 2 ν F(x|ν) = + xΓ × √   πνΓ ν 2 2 2 where 2F1(x) is the hypergeometric function.

A.6.3 Properties

• The median and mode are 0. The mean is also 0 for ν > 1 and is undefined for ν ≤ 1.
• The variance has three different cases; it can be undefined, infinite, or finite depending on ν:

$$\mathrm{Var} = \begin{cases} \text{undefined} & \nu \le 1 \\ \infty & 1 < \nu \le 2 \\ \frac{\nu}{\nu - 2} & \nu > 2 \end{cases}$$

• The skewness is 0 for ν > 3 and undefined otherwise.
• The excess kurtosis is 6/(ν − 4) for ν > 4 and undefined otherwise.

The t-distribution can be shifted and rescaled so that as ν → ∞ it goes to a normal distribution with mean μ and variance σ². If z is t-distributed with parameter ν, then x = μ + zσ is a shifted and rescaled random variable that becomes normal with mean μ and variance σ² as ν → ∞.

A multivariate t-distribution also exists. In analogous fashion to the multivariate normal, there is a mean vector µ, a positive-definite matrix Σ, and a parameter ν > 0. This distribution has PDF

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma, \nu) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,|\Sigma|^{1/2}} \left[1 + \frac{1}{\nu}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu+p)/2}.$$

As ν →∞, this distribution goes to a multivariate normal with mean vector µ and covariance matrix Σ.
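The shift-and-rescale construction, and the moment formulas above, can be checked with scipy.stats; the values of ν, μ, and σ are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

nu, mu, sigma = 5.0, 1.0, 2.0

z = stats.t(df=nu)                       # standardized t
x = stats.t(df=nu, loc=mu, scale=sigma)  # x = mu + z*sigma

# variance of x is sigma^2 * nu/(nu - 2) for nu > 2,
# which approaches sigma^2 as nu -> infinity
assert np.isclose(x.var(), sigma**2 * nu / (nu - 2))

# excess kurtosis is 6/(nu - 4) for nu > 4
assert np.isclose(z.stats(moments='k'), 6 / (nu - 4))
```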

A.7 Logistic Distribution

The logistic distribution resembles a normal distribution but it has thicker tails (i.e., the excess kurtosis is not zero). The distribution gets its name from the fact that its CDF is the logistic function. The logistic distribution has two parameters, the real-valued μ and the positive, real s.

A.7.1 Probability Density Function (PDF)

$$f(x \mid \mu, s) = \frac{e^{-\frac{x-\mu}{s}}}{s\left(1 + e^{-\frac{x-\mu}{s}}\right)^2} = \frac{1}{4s}\,\mathrm{sech}^2\left(\frac{x-\mu}{2s}\right).$$

A.7.2 CDF

$$F(x \mid \mu, s) = \frac{1}{1 + e^{-\frac{x-\mu}{s}}} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x-\mu}{2s}\right).$$

A.7.3 Properties

• The mean, median, and mode are μ.
• The variance is proportional to s²:

$$\mathrm{Var} = \frac{\pi^2}{3} s^2$$

• The skewness is 0.
• The excess kurtosis is 1.2.
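A quick scipy.stats check of the logistic CDF and moments; μ and s are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats

mu, s = 0.5, 1.3
dist = stats.logistic(loc=mu, scale=s)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(var, np.pi**2 * s**2 / 3)
assert np.isclose(kurt, 1.2)      # excess kurtosis

# the CDF is the logistic function
x = 2.0
assert np.isclose(dist.cdf(x), 1 / (1 + np.exp(-(x - mu) / s)))
```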

A.8 Cauchy Distribution, Lorentz Distribution, or Breit-Wigner Distribution

The Cauchy distribution is a special case of the t-distribution with ν = 1. It has a PDF that is finite everywhere, but has undefined mean, variance, skewness, and excess kurtosis. The distribution has two parameters, x0 ∈ R and γ>0. The median and mode of the distribution are at x0.

A.8.1 Probability Density Function (PDF)

$$f(x \mid x_0, \gamma) = \frac{1}{\pi\gamma}\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]^{-1}.$$

A.8.2 CDF

$$F(x \mid x_0, \gamma) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x - x_0}{\gamma}\right).$$
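The equivalence with the t-distribution at ν = 1 is easy to confirm numerically; this sketch uses scipy.stats with the standard Cauchy (x₀ = 0, γ = 1).

```python
import numpy as np
from scipy import stats

x0, gam = 0.0, 1.0
x = np.linspace(-5, 5, 11)

cauchy = stats.cauchy(loc=x0, scale=gam)
t1 = stats.t(df=1, loc=x0, scale=gam)    # t with nu = 1
assert np.allclose(cauchy.pdf(x), t1.pdf(x))

print(cauchy.mean())                      # nan: the mean is undefined
```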

A.9 Gumbel Distribution

The Gumbel distribution is often used to model the maximum of a set of random variables. It has two parameters, m ∈ R and β > 0, and it has positive skewness and excess kurtosis. Its CDF is one of the few common occurrences of the exponential of an exponential.

A.9.1 Probability Density Function (PDF)

$$f(x \mid m, \beta) = \frac{1}{\beta}\, e^{-(z + e^{-z})}, \quad \text{where } z = \frac{x - m}{\beta}.$$

A.9.2 CDF

$$F(x \mid m, \beta) = e^{-e^{-z}}, \qquad z = \frac{x - m}{\beta}.$$

A.9.3 Properties

• The mean of the Gumbel distribution is μ = m + βγ, where γ ≈ 0.5772 is the Euler-Mascheroni constant.
• The median is m − β log(log 2).
• The mode is m.
• The variance is proportional to β²:

$$\mathrm{Var} = \frac{\pi^2}{6} \beta^2$$

• The skewness is positive:

$$\gamma_1 = \frac{12\sqrt{6}\,\zeta(3)}{\pi^3} \approx 1.14,$$

where ζ(3) ≈ 1.20205 is Apéry's constant.
• The excess kurtosis is 12/5.
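These properties can be verified with scipy.stats, whose gumbel_r is the maximum (right-skewed) Gumbel used here; m and β are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats
from scipy.special import zeta

m, beta = 1.0, 2.0
dist = stats.gumbel_r(loc=m, scale=beta)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, m + beta * np.euler_gamma)
assert np.isclose(var, np.pi**2 * beta**2 / 6)
assert np.isclose(skew, 12 * np.sqrt(6) * zeta(3) / np.pi**3)
assert np.isclose(kurt, 12 / 5)
assert np.isclose(dist.median(), m - beta * np.log(np.log(2)))
```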

A.10 Laplace Distribution, Double Exponential Distribution

The Laplace distribution resembles the normal distribution except that it has an absolute value in the exponential, rather than the quadratic exponent. It takes two parameters, m ∈ R and b>0. It is a symmetric distribution about m and has nonzero excess kurtosis.

A.10.1 Probability Density Function (PDF)

$$f(x \mid m, b) = \frac{1}{2b}\, e^{-\frac{|x - m|}{b}}$$

A.10.2 CDF

$$F(x \mid m, b) = \begin{cases} \frac{1}{2} e^{\frac{x - m}{b}} & x < m \\ 1 - \frac{1}{2} e^{-\frac{x - m}{b}} & x \ge m \end{cases}$$

A.10.3 Properties

• The mean, median, and mode are all m.
• The variance is proportional to b²:

$$\mathrm{Var} = 2b^2$$

• The skewness is 0.
• The excess kurtosis is 3.
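A short scipy.stats check; m and b are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats

m, b = 0.0, 1.5
dist = stats.laplace(loc=m, scale=b)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, m)
assert np.isclose(var, 2 * b**2)
assert np.isclose(skew, 0.0)
assert np.isclose(kurt, 3.0)   # thicker tails than the normal (which has 0)
```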

A.11 Uniform Distribution

A uniform random variable is equally likely to take on any value in the interval [a, b], with b > a. If x is a uniformly distributed random variable, we write x ∼ U(a, b).

A.11.1 Probability Density Function (PDF)

$$f(x \mid a, b) = \begin{cases} \frac{1}{b - a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

A.11.2 CDF

$$F(x \mid a, b) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

A.11.3 Properties

• The mean and median are (a + b)/2.
• The mode is any value in [a, b].
• The variance is (b − a)²/12.
• The skewness is 0.
• The excess kurtosis is −6/5.

We can define a standardized uniform random variable that has support on [−1, 1], which we call z ∼ U(−1, 1), by taking x ∼ U(a, b) and defining

$$z = \frac{a + b - 2x}{a - b}.$$
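The standardization map can be sketched with scipy.stats, which parameterizes the uniform distribution by loc = a and scale = b − a; the interval below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy import stats

a, b = 2.0, 5.0
rng = np.random.default_rng(2)

x = stats.uniform(loc=a, scale=b - a).rvs(size=100_000, random_state=rng)
z = (a + b - 2 * x) / (a - b)     # standardize to [-1, 1]

assert z.min() >= -1 and z.max() <= 1
print(z.mean())                   # approximately 0
```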

A.12 Beta Distribution

The beta distribution describes random variables that take on values in the interval [−1, 1] and can be described by two parameters α > −1 and β > −1. If x is a beta-distributed random variable, we write x ∼ B(α, β).

A.12.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{2^{-(\alpha+\beta+1)}\,\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 1)\,\Gamma(\beta + 1)}\,(1 + x)^{\beta} (1 - x)^{\alpha}, \quad x \in [-1, 1].$$

The PDF can also be expressed using the beta function,

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)},$$

as

$$f(x \mid \alpha, \beta) = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha + 1, \beta + 1)}\,(1 + x)^{\beta} (1 - x)^{\alpha}, \quad x \in [-1, 1].$$

A.12.2 CDF

$$F(x \mid \alpha, \beta) = I_{(1+x)/2}(\beta + 1, \alpha + 1),$$

where $I_x(a, b)$ is the regularized incomplete beta function:

$$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)},$$

and

$$B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1}\, dt.$$

A.12.3 Properties

• The mean is

$$E[x] = \frac{-(\alpha + 1) + (\beta + 1)}{\alpha + \beta + 2}.$$

• The variance is

$$\mathrm{Var}(x) = \frac{4(\alpha + 1)(\beta + 1)}{(\alpha + \beta + 2)^2 (\alpha + \beta + 3)}$$

We can scale a beta random variable, z ∼ B(α, β), to have support on the interval [a, b] by writing

$$x = \frac{b - a}{2}\, z + \frac{a + b}{2}.$$

Note: the more common definition for beta random variables uses α′ = α + 1 and β′ = β + 1 and has the distribution supported on [0, 1]. In this work we choose our definition so that the PDF for the standardized beta random variable is the weighting function in the orthogonality relation for Jacobi polynomials, which have a domain of [−1, 1].
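Because this convention differs from the usual one, mapping it onto a standard library takes some care. The sketch below uses scipy.stats, whose beta distribution lives on [0, 1] with density proportional to u^(a−1)(1−u)^(b−1); under the substitution u = (1 + x)/2, the density above corresponds to a = β + 1 and b = α + 1, shifted to [−1, 1]. The parameter values are arbitrary, and the mapping is my reading of the convention above, so treat it as an assumption to verify.

```python
import numpy as np
from scipy import stats

alpha, beta_p = 0.5, 1.5   # book convention: alpha, beta > -1

# scipy shape parameters under u = (1 + x)/2, shifted/scaled to [-1, 1]
dist = stats.beta(a=beta_p + 1, b=alpha + 1, loc=-1, scale=2)

assert np.isclose(dist.mean(),
                  (-(alpha + 1) + (beta_p + 1)) / (alpha + beta_p + 2))
assert np.isclose(dist.var(),
                  4 * (alpha + 1) * (beta_p + 1)
                  / ((alpha + beta_p + 2)**2 * (alpha + beta_p + 3)))
```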

A.13 Gamma Distribution

The gamma distribution describes random variables that take on values on the positive real line and can be described by two parameters α>−1 and β>0. If x is a gamma-distributed random variable, we write x ∼ G (α, β).

A.13.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha+1}\, x^{\alpha} e^{-\beta x}}{\Gamma(\alpha + 1)}, \quad x \in (0, \infty),\ \alpha > -1,\ \beta > 0.$$

A.13.2 CDF

$$F(x \mid \alpha, \beta) = \frac{\gamma(\alpha + 1, \beta x)}{\Gamma(\alpha + 1)},$$

where γ(a, b) is the lower incomplete gamma function.

A.13.3 Properties

• The mean is (α + 1)β⁻¹.
• There is no simple formula for the median.
• The mode is αβ⁻¹ for α > 0.
• The variance is (α + 1)β⁻².
• The skewness is

$$\gamma_1 = \frac{2}{\sqrt{\alpha + 1}}.$$

• The excess kurtosis is

$$\mathrm{Kurt} = \frac{6}{\alpha + 1}.$$

A standardized gamma random variable can be defined as z = βx, where x ∼ G(α, β) and z ∼ G(α, 1). Now the PDF for z is

$$f(z \mid \alpha) = \frac{z^{\alpha} e^{-z}}{\Gamma(\alpha + 1)}, \quad z \in (0, \infty),\ \alpha > -1.$$

Note: the more common definition for gamma random variables uses α′ = α + 1 but the same parameter β. In this work we choose our definition so that the PDF for the standardized gamma random variable is the weighting function in the orthogonality relation for generalized Laguerre polynomials.
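Translating this convention to scipy.stats, whose gamma distribution uses a shape parameter a and scale = 1/β, amounts to setting a = α + 1; the parameter values below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

alpha, beta = 1.5, 2.0     # book convention: alpha > -1, rate beta > 0
dist = stats.gamma(a=alpha + 1, scale=1 / beta)

assert np.isclose(dist.mean(), (alpha + 1) / beta)
assert np.isclose(dist.var(), (alpha + 1) / beta**2)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 2 / np.sqrt(alpha + 1))
assert np.isclose(kurt, 6 / (alpha + 1))
```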

A.14 Inverse Gamma Distribution

The inverse gamma distribution describes random variables whose reciprocal is a gamma random variable. Inverse gamma random variables take on values on the positive real line and can be described by two parameters α > 0 and β > 0. If x is an inverse gamma-distributed random variable, we write x ∼ IG(α, β). In this case we also have x⁻¹ ∼ G(α − 1, β) in the convention of the previous section.

A.14.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}\, x^{-\alpha - 1} e^{-\beta/x}}{\Gamma(\alpha)}, \quad x \in (0, \infty),\ \alpha > 0,\ \beta > 0.$$

A.14.2 CDF

$$F(x \mid \alpha, \beta) = \frac{\Gamma\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)},$$

where Γ(a, b) is the upper incomplete gamma function.

A.14.3 Properties

• The mean is (α − 1)⁻¹β for α > 1.
• There is no simple formula for the median.
• The mode is (α + 1)⁻¹β.
• The variance is (α − 1)⁻²(α − 2)⁻¹β² for α > 2.
• The skewness is

$$\gamma_1 = \frac{4\sqrt{\alpha - 2}}{\alpha - 3}, \quad \alpha > 3.$$

• The excess kurtosis is

$$\mathrm{Kurt} = \frac{30\alpha - 66}{(\alpha - 3)(\alpha - 4)}, \quad \alpha > 4.$$
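These formulas can be checked against scipy.stats.invgamma, which uses the standard (shape α, scale β) parameterization matching this section; α > 4 below so that all four moments exist.

```python
import numpy as np
from scipy import stats

alpha, beta = 5.0, 2.0
dist = stats.invgamma(a=alpha, scale=beta)

assert np.isclose(dist.mean(), beta / (alpha - 1))
assert np.isclose(dist.var(), beta**2 / ((alpha - 1)**2 * (alpha - 2)))

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 4 * np.sqrt(alpha - 2) / (alpha - 3))
assert np.isclose(kurt, (30 * alpha - 66) / ((alpha - 3) * (alpha - 4)))
```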

A.15 Exponential Distribution

The exponential distribution is used for nonnegative random variables and has a single, positive parameter λ. It is a special case of the gamma distribution with α = 0 and β = λ. The exponential distribution is used, for example, to describe the distance between collisions for subatomic particles (e.g., photons, neutrons, electrons) traveling in a given medium, with λ⁻¹ being the average distance traveled between collisions.

A.15.1 Probability Density Function (PDF)

$$f(x \mid \lambda) = \lambda e^{-\lambda x}$$

A.15.2 CDF

$$F(x \mid \lambda) = 1 - e^{-\lambda x}$$

A.15.3 Properties

• The mean is λ⁻¹.
• The median is λ⁻¹ log 2.
• The mode is 0.
• The variance is λ⁻².
• The skewness is 2.
• The excess kurtosis is 6.
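A final scipy.stats sketch; note that scipy parameterizes the exponential by scale = 1/λ, and the rate below is an arbitrary illustrative value.

```python
import numpy as np
from scipy import stats

lam = 0.7
dist = stats.expon(scale=1 / lam)

assert np.isclose(dist.mean(), 1 / lam)
assert np.isclose(dist.median(), np.log(2) / lam)
assert np.isclose(dist.var(), 1 / lam**2)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 2.0) and np.isclose(kurt, 6.0)
```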

References

Agnesi M (1748) Instituzioni analitiche ad uso della gioventú italiana. Nella Regia-Ducal Corte
Barth A, Schwab C, Zollinger N (2011) Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer Math 119(1):123–161
Bastidas-Arteaga E, Soubra AH (2006) Reliability analysis methods. In: Stochastic analysis and inverse modelling, ALERT Doctoral School 2014, pp 53–77
Bayarri MJ, Berger JO, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy RJ, Paulo R, Sacks J, Walsh D (2007) Computer model validation with functional output. Ann Stat 35(5):1874–1906
Bernoulli J (1713) Ars conjectandi, opus posthumum. Accedit Tractatus de seriebus infinitis, et epistola gallic scripta de ludo pilae reticularis. Thurneysen Brothers, Basel
Boyd JP (2001) Chebyshev and Fourier spectral methods. Dover Publications, Mineola
Bratley P, Fox BL, Niederreiter H (1992) Implementation and tests of low-discrepancy sequences. ACM Trans Model Comput Simul 2(3):195–213
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cacuci DG (2015) Second-order adjoint sensitivity analysis methodology (2nd-ASAM) for computing exactly and efficiently first- and second-order sensitivities in large-scale linear systems: I. Computational methodology. J Comput Phys 284:687–699
Carlin BP, Louis TA (2008) Bayesian methods for data analysis. Chapman & Hall/CRC texts in statistical science, 3rd edn. CRC Press, Boca Raton
Carpentier A, Munos R (2012) Adaptive stratified sampling for Monte-Carlo integration of differentiable functions. In: Advances in neural information processing systems, vol 25, pp 251–259
Chowdhary K, Dupuis P (2013) Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification. ESAIM Math Model Numer Anal 47(3):635–662
Cliffe KA, Giles MB, Scheichl R, Teckentrup AL (2011) Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput Vis Sci 14(1):3–15
Collaboration OS et al (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716
Collier N, Haji-Ali AL, Nobile F, Schwerin E, Tempone R (2015) A continuation multilevel Monte Carlo algorithm. BIT Numer Math 55(2):1–34
Collins GP (2009) Within any possible universe, no intellect can ever know it all. Scientific American
Constantine PG (2015) Active subspaces: emerging ideas for dimension reduction in parameter studies. SIAM spotlights, vol 2. SIAM, Philadelphia. ISBN 1611973864, 9781611973860


Cook AH (1965) The absolute determination of the acceleration due to gravity. Metrologia 1(3):84–114
Denison DG, Mallick BK, Smith AF (1998) Bayesian MARS. Stat Comput 8(4):337–346
Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear classification and regression. Wiley, Chichester
Der Kiureghian A, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf 31(2):105–112
Farrell PE, Ham DA, Funke SW, Rognes ME (2013) Automated derivation of the adjoint of high-level transient finite element programs. SIAM J Sci Comput 35(4):C369–C393
Faure H (1982) Discrépance de suites associées à un système de numération (en dimension s). Acta Arith 41(4):337–351
Ferson S, Kreinovich V, Ginzburg L, Myers D, Sentz K (2003) Constructing probability boxes and Dempster-Shafer structures. Tech. Rep. SAND2002-4015, Sandia National Laboratories
Ferson S, Kreinovich V, Hajagos J, Oberkampf W, Ginzburg L (2007) Experimental uncertainty estimation and statistics for data having interval uncertainty. Tech. Rep. SAND2007-0939, Sandia National Laboratories
Fox BL (1986) Algorithm 647: implementation and relative efficiency of quasirandom sequence generators. ACM Trans Math Softw 12(4):362–376
Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning, pp 1050–1059
Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, Berlin
Giles MB (2013) Multilevel Monte Carlo methods. In: Monte Carlo and Quasi-Monte Carlo methods 2012. Springer, Berlin, pp 83–103
Gilks W, Spiegelhalter D (1996) Markov chain Monte Carlo in practice. Chapman & Hall, London
Goh J (2014) Prediction and calibration using outputs from multiple computer simulators. PhD thesis, Simon Fraser University
Goh J, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E (2013) Prediction and computer model calibration using outputs from multifidelity simulators. Technometrics 55(4):501–512
Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer experiments. J Comput Graph Stat 24(2):561–578
Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to computer modeling. J Am Stat Assoc 103(483):1119–1130
Gramacy RB et al (2007) TGP: an R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models. J Stat Softw 19(9):6
Gramacy RB, Niemi J, Weiss RM (2014) Massively parallel approximate Gaussian process regression. SIAM/ASA J Uncertain Quantif 2(1):564–584
Gramacy RB, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E, Trantham M, Drake RP et al (2015) Calibrating a large computer experiment simulating radiative shock hydrodynamics. Ann Appl Stat 9(3):1141–1168
Griewank A, Walther A (2008) Evaluating derivatives: principles and techniques of algorithmic differentiation, vol 105. SIAM, Philadelphia
Gunzburger MD, Webster CG, Zhang G (2014) Stochastic finite element methods for partial differential equations with random input data. Acta Numer 23:521–650
Haldar A, Mahadevan S (2000) Probability, reliability, and statistical methods in engineering design. Wiley, New York
Halpern JY (2017) Reasoning about uncertainty. MIT Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer Science & Business Media, New York
Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD (2004) Combining field data and computer simulations for calibration and prediction. SIAM J Sci Comput 26(2):448
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

Holloway JP, Bingham D, Chou CC, Doss F, Drake RP, Fryxell B, Grosskopf M, van der Holst B, Mallick BK, McClarren R, Mukherjee A, Nair V, Powell KG, Ryu D, Sokolov I, Toth G, Zhang Z (2011) Predictive modeling of a radiative shock system. Reliab Eng Syst Saf 96(9):1184–1193
Holtz M (2011) Sparse grid quadrature in high dimensions with applications in finance and insurance. Lecture notes in computational science and engineering, vol 77. Springer, Berlin
Humbird KD, McClarren RG (2017) Adjoint-based sensitivity analysis for high-energy density radiative transfer using flux-limited diffusion. High Energy Density Phys 22:12–16
Humbird K, Peterson J, McClarren R (2017) Deep jointly-informed neural networks. arXiv:170700784
John LK, Loewenstein G, Prelec D (2012) Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci 23(5):524–532. https://doi.org/10.1177/0956797611430953
Jolliffe I (2002) Principal component analysis. Springer series in statistics. Springer, Berlin
Jones S (2009) The formula that felled Wall St. The Financial Times
Kalos M, Whitlock P (2008) Monte Carlo methods. Wiley-Blackwell, Hoboken
Karagiannis G, Lin G (2017) On the Bayesian calibration of computer model mixtures through experimental data, and the design of predictive models. J Comput Phys 342:139–160
Kennedy MC, O'Hagan A (2000) Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1):1–13
Knupp P, Salari K (2002) Verification of computer codes in computational science and engineering. Discrete mathematics and its applications. CRC Press, Boca Raton
Kreinovich V, Ferson SA (2004) A new Cauchy-based black-box technique for uncertainty in risk analysis. Reliab Eng Syst Saf 85(1–3):267–279
Kreinovich V, Nguyen HT (2009) Towards intuitive understanding of the Cauchy deviate method for processing interval and fuzzy uncertainty. In: Proceedings of the 2015 conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology, pp 1264–1269
Kreinovich V, Beck J, Ferregut C, Sanchez A, Keller G, Averill M, Starks S (2004) Monte-Carlo-type techniques for processing interval uncertainty, and their engineering applications. In: Proceedings of the workshop on reliable engineering computing, pp 15–17
Kurowicka D, Cooke RM (2006) Uncertainty analysis with high dimensional dependence modelling. Wiley, Chichester
Lahman S (2017) Baseball database. http://www.seanlahman.com/baseball-archive/statistics
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Ling J (2015) Using machine learning to understand and mitigate model form uncertainty in turbulence models. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, Piscataway, pp 813–818
Lyness JN, Moler CB (1967) Numerical differentiation of analytic functions. SIAM J Numer Anal 4(2):202–210
Marsaglia G, Tsang WW, Wang J (2003) Evaluating Kolmogorov's distribution. J Stat Softw 8(18):1–4. https://doi.org/10.18637/jss.v008.i18
McClarren RG, Ryu D, Drake RP, Grosskopf M, Bingham D, Chou CC, Fryxell B, van der Holst B, Holloway JP, Kuranz CC, Mallick B, Rutter E, Torralva BR (2011) A physics informed emulator for laser-driven radiating shock simulations. Reliab Eng Syst Saf 96(9):1194–1207
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
National Academy of Science (2012) Building confidence in computational models: the science of verification, validation, and uncertainty quantification. National Academies Press, Washington
Oberkampf WL, Roy CJ (2010) Verification and validation in scientific computing, 1st edn. Cambridge University Press, New York
Owhadi H, Scovel C, Sullivan TJ, McKerns M, Ortiz M (2013) Optimal uncertainty quantification. SIAM Rev 55(2):271–345

Owhadi H, Scovel C, Sullivan T (2015) Brittleness of Bayesian inference under finite information in a continuous world. Electron J Stat 9(1):1–79
Peterson J, Humbird K, Field J, Brandon S, Langer S, Nora R, Spears B, Springer P (2017) Zonal flow generation in inertial confinement fusion implosions. Phys Plasmas 24(3):032702
Rackwitz R, Flessler B (1978) Structural reliability under combined random load sequences. Comput Struct 9(5):489–494
Raissi M, Karniadakis GE (2018) Hidden physics models: machine learning of nonlinear partial differential equations. J Comput Phys 357:125–141
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Roache PJ (1998) Verification and validation in computational science and engineering. Hermosa Publishers, Albuquerque
Robert C, Casella G (2013) Monte Carlo statistical methods. Springer Science & Business Media, New York
Roberts GO, Gelman A, Gilks WR (1997) Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab 7(1):110–120
Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S (2008) Global sensitivity analysis: the primer. Wiley, Chichester
Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S (2010) Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun 181(2):259–270
Santner TJ, Williams BJ, Notz WI (2013) The design and analysis of computer experiments. Springer Science & Business Media, New York
Schilders WH, Van der Vorst HA, Rommes J (2008) Model order reduction: theory, research aspects and applications, vol 13. Springer, Berlin
Sobol IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput Math Math Phys 7(4):86–112
Spears BK (2017) Contemporary machine learning: a guide for practitioners in the physical sciences. arXiv:171208523
Stein M (1987) Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2):143
Stripling HF, McClarren RG, Kuranz CC, Grosskopf MJ, Rutter E, Torralva BR (2013) A calibration and data assimilation method using the Bayesian MARS emulator. Ann Nucl Energy 52:103–112
Student (1908) The probable error of a mean. Biometrika 6:1–25
Tate DR (1968) Acceleration due to gravity at the National Bureau of Standards. J Res Natl Bur Stand Sect C Eng Instrum 72C(1):1
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288
Townsend A (2015) The race for high order Gauss–Legendre quadrature. SIAM News, pp 1–3
Trefethen LN (2013) Approximation theory and approximation practice. Other titles in applied mathematics. SIAM, Philadelphia
Wagner JC, Haghighat A (1998) Automated variance reduction of Monte Carlo shielding calculations using the discrete ordinates adjoint function. Nucl Sci Eng 128(2):186–208
Wang Z, Navon IM, Le Dimet FX, Zou X (1992) The second order adjoint analysis: theory and applications. Meteorol Atmos Phys 50(1–3):3–20
Wilcox LC, Stadler G, Bui-Thanh T, Ghattas O (2015) Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method. J Sci Comput 63(1):138–162
Wolpert DH (2008) Physical limits of inference. Phys D Nonlinear Phenom 237(9):1257–1281
Zheng W, McClarren RG (2016) Emulation-based calibration for parameters in parameterized phonon spectrum of ZrHx in TRIGA reactor simulations. Nucl Sci Eng 183(1):78–95
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320
