Probability Distributions CEE 201L
CEE 201L. Uncertainty, Design, and Optimization
Department of Civil and Environmental Engineering
Duke University
Philip Scott Harvey, Henri P. Gavin and Jeffrey T. Scruggs
Spring 2022

1 Probability Distributions

Consider a continuous random variable (rv) $X$ with support over the domain $\mathcal{X}$. The probability density function (PDF) of $X$ is the function $f_X(x)$ such that for any two numbers $a$ and $b$ in the domain $\mathcal{X}$, with $a < b$,

$$ P[a < X \le b] = \int_a^b f_X(x) \, dx $$

For $f_X(x)$ to be a proper distribution, it must satisfy the following two conditions:

1. The PDF $f_X(x)$ is non-negative; $f_X(x) \ge 0$ for all values of $x \in \mathcal{X}$.
2. The rule of total probability holds; the total area under $f_X(x)$ is 1; $\int_{\mathcal{X}} f_X(x) \, dx = 1$.

Alternatively, $X$ may be described by its cumulative distribution function (CDF). The CDF of $X$ is the function $F_X(x)$ that gives, for any specified number $x \in \mathcal{X}$, the probability that the random variable $X$ is less than or equal to $x$, written $P[X \le x]$. For real values of $b$, the CDF is defined by

$$ F_X(b) = P[X \le b] = \int_{-\infty}^{b} f_X(x) \, dx , $$

so,

$$ P[a < X \le b] = F_X(b) - F_X(a) $$

By the first fundamental theorem of calculus, the functions $f_X(x)$ and $F_X(x)$ are related as

$$ f_X(x) = \frac{d}{dx} F_X(x) $$

A few important characteristics of CDFs of $X$ are:

1. CDFs, $F_X(x)$, are monotonic non-decreasing functions of $x$.
2. For any number $a$, $P[X > a] = 1 - P[X \le a] = 1 - F_X(a)$
3. For any two numbers $a$ and $b$ with $a < b$, $P[a < X \le b] = F_X(b) - F_X(a) = \int_a^b f_X(x) \, dx$

2 Descriptors of random variables

The expected or mean value of a continuous random variable $X$ with PDF $f_X(x)$ is the centroid of the probability density.
$$ \mu_X = E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx $$

The expected value of an arbitrary function of $X$, $g(X)$, with respect to the PDF $f_X(x)$ is

$$ \mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f_X(x) \, dx $$

The variance of a continuous rv $X$ with PDF $f_X(x)$ and mean $\mu_X$ gives a quantitative measure of how much spread or dispersion there is in the distribution of $x$ values. The variance is calculated as

$$ \sigma_X^2 = V[X] = \int_{-\infty}^{\infty} (x - \mu_X)^2 \, f_X(x) \, dx $$
$$ = \int_{-\infty}^{\infty} (x^2 - 2 \mu_X x + \mu_X^2) \, f_X(x) \, dx $$
$$ = \int_{-\infty}^{\infty} x^2 f_X(x) \, dx - 2 \mu_X \int_{-\infty}^{\infty} x \, f_X(x) \, dx + \mu_X^2 \int_{-\infty}^{\infty} f_X(x) \, dx $$
$$ = E[X^2] - 2 \mu_X^2 + \mu_X^2 $$
$$ = E[X^2] - \mu_X^2 $$

The standard deviation (s.d.) of $X$ is $\sigma_X = \sqrt{V[X]}$.

The coefficient of variation (c.o.v.) of $X$ is defined as the ratio of the standard deviation $\sigma_X$ to the mean $\mu_X$:

$$ c_X = \frac{\sigma_X}{\mu_X} $$

for non-zero mean. The c.o.v. is a normalized (dimensionless) measure of dispersion.

A mode of a probability density function, $f_X(x)$, is a value of $x$ such that the PDF is maximized;

$$ \left. \frac{d}{dx} f_X(x) \right|_{x = x_{mode}} = 0 . $$

The median value, $x_m$, is the value of $x$ such that

$$ P[X \le x_m] = P[X > x_m] = F_X(x_m) = 1 - F_X(x_m) = 0.5 . $$

CC BY-NC-ND March 25, 2021 PSH, HPG, JTS

3 Some common distributions

The National Institute of Standards and Technology (NIST) lists properties of nineteen commonly used probability distributions in their Engineering Statistics Handbook. This section describes the properties of seven distributions. For each of these distributions, this document provides figures and equations for the PDF and CDF, equations for the mean and variance, the names of Matlab functions to generate samples, and empirical distributions of such samples.

3.1 The Normal distribution

The Normal (or Gaussian) distribution is perhaps the most commonly used distribution function. The notation $X \sim N(\mu_X, \sigma_X^2)$ denotes that $X$ is a normal random variable with mean $\mu_X$ and variance $\sigma_X^2$. The standard normal random variable, $Z$, or "z-statistic", is distributed as $N(0, 1)$. The probability density function of a standard normal random variable is so widely used it has its own special symbol, $\phi(z)$,
$$ \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) $$

Any normally distributed random variable can be defined in terms of the standard normal random variable, through the change of variables $X = \mu_X + \sigma_X Z$. If $X$ is normally distributed, it has the PDF

$$ f_X(x) = \frac{1}{\sigma_X} \, \phi\left( \frac{x - \mu_X}{\sigma_X} \right) = \frac{1}{\sqrt{2 \pi \sigma_X^2}} \exp\left( -\frac{(x - \mu_X)^2}{2 \sigma_X^2} \right) $$

There is no closed-form equation for the CDF of a normal random variable. Solving the integral

$$ \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2} \, du $$

would make you famous. Try it. The CDF of a normal random variable is expressed in terms of the error function, erf(z), as $\Phi(z) = \frac{1}{2} \left[ 1 + \mathrm{erf}(z/\sqrt{2}) \right]$. If $X$ is normally distributed, $P[X \le x]$ can be found from the standard normal CDF

$$ P[X \le x] = F_X(x) = \Phi\left( \frac{x - \mu_X}{\sigma_X} \right) . $$

Values for $\Phi(z)$ are tabulated and can be computed, e.g., with the Matlab command

    Prob_X_le_x = normcdf(x,muX,sigX)

The standard normal PDF is symmetric about $z = 0$, so $\phi(-z) = \phi(z)$, $\Phi(-z) = 1 - \Phi(z)$, and

$$ P[X > x] = 1 - F_X(x) = 1 - \Phi\left( (x - \mu_X)/\sigma_X \right) = \Phi\left( (\mu_X - x)/\sigma_X \right) . $$

The linear combination of two independent normal rv's $X_1$ and $X_2$ (with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$) is also normally distributed,

$$ a X_1 + b X_2 \sim N\left( a \mu_1 + b \mu_2 , \; a^2 \sigma_1^2 + b^2 \sigma_2^2 \right) , $$

and more specifically, $a X - b \sim N\left( a \mu_X - b , \; a^2 \sigma_X^2 \right)$.

Given the probability of a normal rv, i.e., given $P[X \le x]$, the associated value of $x$ can be found from the inverse standard normal CDF,

$$ \frac{x - \mu_X}{\sigma_X} = z = \Phi^{-1}( P[X \le x] ) . $$

Values of the inverse standard normal CDF are tabulated, and can be computed, e.g., with the Matlab command

    x = norminv(Prob_X_le_x,muX,sigX)

3.2 The Log-Normal distribution

The Normal distribution is symmetric and can be used to describe random variables that can take positive as well as negative values, regardless of the value of the mean and standard deviation. For many random quantities a negative value makes no sense (e.g., modulus of elasticity, air pressure, and distance).
Using a distribution which admits only positive values for such quantities eliminates any possibility of nonsensical negative values. The log-normal distribution is such a distribution.

If $\ln X$ is normally distributed (i.e., $\ln X \sim N(\mu_{\ln X}, \sigma_{\ln X}^2)$) then $X$ is called a log-normal random variable. In other words, if $Y (= \ln X)$ is normally distributed, $e^Y (= X)$ is log-normally distributed. With $y = \ln x$, $\mu_Y = \mu_{\ln X}$, and $\sigma_Y^2 = \sigma_{\ln X}^2$, the following three probabilities are equal:

$$ P[Y \le y] = F_Y(y) = \Phi\left( \frac{y - \mu_Y}{\sigma_Y} \right) $$
$$ P[\ln X \le \ln x] = F_{\ln X}(\ln x) = \Phi\left( \frac{\ln x - \mu_{\ln X}}{\sigma_{\ln X}} \right) $$
$$ P[X \le x] = F_X(x) $$

The mean and standard deviation of a log-normal variable $X$ are related to the mean and standard deviation of $\ln X$.

$$ \mu_{\ln X} = \ln \mu_X - \frac{1}{2} \sigma_{\ln X}^2 \qquad \sigma_{\ln X}^2 = \ln\left( 1 + (\sigma_X/\mu_X)^2 \right) $$

If $(\sigma_X/\mu_X) < 0.30$, $\sigma_{\ln X} \approx (\sigma_X/\mu_X) = c_X$.

The median, $x_m$, is a useful parameter of log-normal rv's. By definition of the median value, half of the population lies above the median, and half lies below, so

$$ \Phi\left( \frac{\ln x_m - \mu_{\ln X}}{\sigma_{\ln X}} \right) = 0.5 $$
$$ \frac{\ln x_m - \mu_{\ln X}}{\sigma_{\ln X}} = \Phi^{-1}(0.5) = 0 $$

and,

$$ \ln x_m = \mu_{\ln X} \;\leftrightarrow\; x_m = \exp(\mu_{\ln X}) \;\leftrightarrow\; \mu_X = x_m \sqrt{1 + c_X^2} $$

For the log-normal distribution $x_{mode} < x_{median} < x_{mean}$. If $c_X < 0.15$, $x_{median} \approx x_{mean}$.
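The relations in this section between $(\mu_X, c_X)$ and $(\mu_{\ln X}, \sigma_{\ln X})$ can be sketched in a few lines of Matlab. This is only an illustration: the values of `muX`, `cX`, and `x` are made up, and the anonymous function `Phi` is a stand-in for `normcdf` built from the base function `erfc`, so that the sketch should not depend on any toolbox.

```matlab
% Log-normal parameters from the mean and c.o.v. of X (illustrative values)
muX = 100;                           % assumed mean of X
cX  = 0.25;                          % assumed coefficient of variation of X
x   = 120;                           % assumed value at which to evaluate the CDF

sig_lnX = sqrt(log(1 + cX^2));       % standard deviation of ln X
mu_lnX  = log(muX) - 0.5*sig_lnX^2;  % mean of ln X
x_m     = exp(mu_lnX);               % median of X

Phi = @(z) 0.5*erfc(-z/sqrt(2));     % standard normal CDF (stand-in for normcdf)

Prob_X_le_x = Phi((log(x) - mu_lnX)/sig_lnX);   % P[X <= x] for log-normal X

fprintf('sig_lnX = %.4f  (compare with cX = %.4f)\n', sig_lnX, cX);
fprintf('x_m = %.3f,  muX/sqrt(1+cX^2) = %.3f\n', x_m, muX/sqrt(1+cX^2));
fprintf('P[X <= %g] = %.4f\n', x, Prob_X_le_x);
```

The two printed values of the median should agree, and `sig_lnX` should be close to `cX` because the assumed c.o.v. is below 0.30.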
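If $\ln X$ is normally distributed ($X$ is log-normal) then (for $c_X < 0.3$)

$$ P[X \le x] \approx \Phi\left( \frac{\ln x - \ln x_m}{c_X} \right) $$

If $\ln X \sim N(\mu_{\ln X}, \sigma_{\ln X}^2)$, and $\ln Y \sim N(\mu_{\ln Y}, \sigma_{\ln Y}^2)$, with $X$ and $Y$ independent, and $Z = a X^n / Y^m$ then

$$ \ln Z = \ln a + n \ln X - m \ln Y \sim N(\mu_{\ln Z}, \sigma_{\ln Z}^2) $$

where

$$ \mu_{\ln Z} = \ln a + n \mu_{\ln X} - m \mu_{\ln Y} = \ln a + n \ln x_m - m \ln y_m $$

and

$$ \sigma_{\ln Z}^2 = (n \sigma_{\ln X})^2 + (m \sigma_{\ln Y})^2 = n^2 \ln(1 + c_X^2) + m^2 \ln(1 + c_Y^2) = \ln(1 + c_Z^2) $$

Uniform: $X \sim U[a, b]$, $a \le X \le b$

[Figure: p.d.f. $f(x)$ (peak value $1/(b-a)$) and c.d.f. $F(x)$ of the uniform distribution versus $x$, with $\mu - \sigma$, $\mu$, and $\mu + \sigma$ marked between $a$ and $b$]

$$ f(x) = \begin{cases} \dfrac{1}{b-a} , & x \in [a, b] \\ 0 , & \text{otherwise} \end{cases} $$

$$ F(x) = \begin{cases} 0 , & x \le a \\ \dfrac{x-a}{b-a} , & x \in [a, b] \\ 1 , & x \ge b \end{cases} $$

$$ \mu_X = \frac{1}{2} (a + b) \qquad \sigma_X^2 = \frac{1}{12} (b - a)^2 $$

    x = a + (b-a)*rand(1,N);

Triangular: $X \sim T(a, b, c)$, $a \le X \le b$, $a \le c \le b$

[Figure: p.d.f. $f(x)$ (peak value $2/(b-a)$ at $x = c$) and c.d.f. $F(x)$ of the triangular distribution versus $x$, with $\mu - \sigma$, $\mu$, $\mu + \sigma$, and $\mu + 2\sigma$ marked between $a$ and $b$]

$$ f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} , & x \in [a, c] \\ \dfrac{2(b-x)}{(b-a)(b-c)} , & x \in [c, b] \\ 0 , & \text{otherwise} \end{cases} $$

$$ F(x) = \begin{cases} 0 , & x \le a \\ \dfrac{(x-a)^2}{(b-a)(c-a)} , & x \in [a, c] \\ 1 - \dfrac{(b-x)^2}{(b-a)(b-c)} , & x \in [c, b] \\ 1 , & x \ge b \end{cases} $$

$$ \mu_X = \frac{1}{3} (a + b + c) \qquad \sigma_X^2 = \frac{1}{18} (a^2 + b^2 + c^2 - ab - ac - bc) $$

    x = triangular_rnd(a,b,c,1,N);

[Figure: empirical p.d.f. of uniform and triangular samples]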
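One way to draw uniform and triangular samples is to pass standard uniform samples through the inverse of the CDF. The sketch below inverts the two branches of the triangular CDF by hand; it is an assumption about how such a sampler could work and is not necessarily how the course's triangular sampling function is implemented. The values of `a`, `b`, `c`, and `N` are made up, and the sample mean and variance are compared with the formulas for $\mu_X$ and $\sigma_X^2$.

```matlab
% Inverse-CDF sampling of uniform and triangular rv's (illustrative values)
a = 1;  b = 5;  c = 2;  N = 1e5;     % assumed limits, mode, and sample size

u  = rand(1,N);                      % standard uniform samples on (0,1)
xU = a + (b-a)*u;                    % uniform samples on [a,b]

Fc = (c-a)/(b-a);                    % triangular CDF evaluated at the mode c
xT = zeros(1,N);
lo = (u <= Fc);                      % samples on the rising branch, x in [a,c]
xT(lo)  = a + sqrt( u(lo)*(b-a)*(c-a) );        % invert F(x) on [a,c]
xT(~lo) = b - sqrt( (1-u(~lo))*(b-a)*(b-c) );   % invert F(x) on [c,b]

% theoretical moments of the two distributions
muU = (a+b)/2;     varU = (b-a)^2/12;
muT = (a+b+c)/3;   varT = (a^2+b^2+c^2-a*b-a*c-b*c)/18;

fprintf('uniform:    mean %.3f (theory %.3f), var %.3f (theory %.3f)\n', ...
        mean(xU), muU, var(xU), varU);
fprintf('triangular: mean %.3f (theory %.3f), var %.3f (theory %.3f)\n', ...
        mean(xT), muT, var(xT), varT);
```

The same `u` is reused for both distributions only to keep the sketch short; independent samples would need separate calls to `rand`.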