Appendix A Cookbook of Distributions

We've a first-class assortment of magic;
And for raising a posthumous shade
With effects that are comic or tragic,
There's no cheaper house in the trade.
—from the opera The Sorcerer by W. S. Gilbert and Arthur Sullivan

This appendix gives the definitions and properties of a variety of typical distributions for random variables. Most of these distributions are discussed elsewhere in the text, but having the definitions in a central location can be useful for reference. The notation used here is the same as can be found in other standard references, except where indicated otherwise.

A.1 Bernoulli Distribution

This is a discrete distribution where the random variable takes the value 1 with probability p and the value 0 with probability 1 − p. If we consider the toss of a fair coin, and we assign the outcome of heads the value x = 1 and tails x = 0, then x is Bernoulli distributed with p = 0.5. For simplicity we also define q = 1 − p.

A.1.1 Probability Mass Function (PMF)

$$f(x \mid p) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \end{cases}$$


A.1.2 Cumulative Distribution Function (CDF)

$$F(x \mid p) = \begin{cases} 0 & x < 0 \\ 1 - p & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$

A.1.3 Properties

• Mean: E[x] = p
• Median:

$$\mathrm{Median} = \begin{cases} 0 & q > p \\ 0.5 & q = p \\ 1 & q < p \end{cases}$$

• Mode:

$$\mathrm{Mode} = \begin{cases} 0 & q > p \\ \{0, 1\} & q = p \\ 1 & q < p \end{cases}$$

• Variance: pq = p(1 − p)
• Skewness:

$$\gamma_1 = \frac{1 - 2p}{\sqrt{pq}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \frac{1 - 6pq}{pq}$$
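These properties are straightforward to verify numerically. Below is a minimal sketch using Python's scipy.stats; the value p = 0.3 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import stats

p = 0.3          # arbitrary illustrative value
q = 1.0 - p
dist = stats.bernoulli(p)

# mean, variance, skewness, and excess kurtosis
mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, p)
assert np.isclose(var, p * q)
assert np.isclose(skew, (1 - 2 * p) / np.sqrt(p * q))
assert np.isclose(kurt, (1 - 6 * p * q) / (p * q))

print(dist.pmf([0, 1]))  # [1 - p, p]
```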

A.2 Binomial Distribution

The binomial distribution is a discrete distribution that gives the number of binary events that are successes (i.e., the outcome is 1) out of n ∈ N trials when each trial has probability p of success. As an example, if I flip a fair coin (p = 0.5) ten times (n = 10), then the number of heads, x, in those ten tosses will be binomially distributed. The Bernoulli distribution is a special case of the binomial distribution with n = 1.

A.2.1 PMF

$$f(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x},$$

where the binomial coefficient is given by

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$$

A.2.2 CDF

$$F(x \mid n, p) = I_{1-p}(n - x, 1 + x) = (n - x)\binom{n}{x} \int_0^{1-p} t^{\,n-x-1} (1 - t)^x \, dt,$$

where $I_{1-p}$ is the regularized incomplete beta function.

A.2.3 Properties

• Mean: E[x] = np
• The median for a binomial distribution does not have a simple formula, but it lies between the integer part of np and the value of np rounded up to the nearest integer, i.e., the median is between ⌊np⌋ and ⌈np⌉.
• Mode:

$$\mathrm{Mode} = \begin{cases} \lfloor (n+1)p \rfloor & (n+1)p \text{ is } 0 \text{ or a noninteger} \\ (n+1)p \text{ and } (n+1)p - 1 & (n+1)p \in \{1, \ldots, n\} \\ n & (n+1)p = n + 1 \end{cases}$$

• Variance: np(1 − p)
• Skewness:

$$\gamma_1 = \frac{1 - 2p}{\sqrt{np(1 - p)}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \frac{1 - 6p(1 - p)}{np(1 - p)}$$
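The coin-flip example can be sketched with scipy.stats; the parameters n = 10 and p = 0.5 follow the example above, and the median check uses the floor/ceiling bound from the properties list.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.5                # ten flips of a fair coin
dist = stats.binom(n, p)

print(dist.pmf(5))            # probability of exactly five heads
print(dist.mean())            # mean is n*p = 5

# the median lies between floor(np) and ceil(np)
med = dist.median()
assert np.floor(n * p) <= med <= np.ceil(n * p)
```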

A.3 Poisson Distribution

The Poisson distribution is a discrete distribution on the non-negative integers with a single parameter λ > 0. It gives the probability of an event occurring x times when the events occur independently and at a known average rate λ.

A.3.1 PMF

$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

A.3.2 CDF

$$F(x \mid \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^i}{i!}$$

A.3.3 Properties

• Mean: E[x] = λ
• The median is greater than or equal to λ − log 2 and less than λ + 1/3.
• There are two modes, ⌈λ⌉ − 1 and ⌊λ⌋; these coincide unless λ is an integer.
• Variance: λ
• Skewness:

$$\gamma_1 = \frac{1}{\sqrt{\lambda}}$$

• Excess kurtosis:

$$\mathrm{Kurt} = \lambda^{-1}$$
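A minimal numerical check of these properties with scipy.stats; the rate λ = 4.2 is an arbitrary illustrative value.

```python
import numpy as np
from scipy import stats

lam = 4.2
dist = stats.poisson(lam)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, lam) and np.isclose(var, lam)
assert np.isclose(skew, 1 / np.sqrt(lam))
assert np.isclose(kurt, 1 / lam)

# median bound: lambda - log 2 <= median < lambda + 1/3
med = dist.median()
assert lam - np.log(2) <= med < lam + 1 / 3
```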

A.4 Normal Distribution, Gaussian Distribution

The normal, or Gaussian, distribution is the most well-known continuous distribution. It has two parameters, μ ∈ R and σ² > 0, that correspond to the mean and variance of the distribution. We write a random variable x that is normally distributed with parameters μ and σ² as x ∼ N(μ, σ²).

A.4.1 Probability Density Function (PDF)

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

A.4.2 CDF

$$F(x \mid \mu, \sigma^2) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],$$

where the error function erf(x) is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt.$$

A.4.3 Properties

• The mean, median, and mode are all μ.
• The variance is σ².
• The skewness and excess kurtosis are 0.

The standard normal distribution has μ = 0 and σ = 1. Any normal distribution can be transformed into a standard normal by centering and scaling. If x ∼ N(μ, σ²), then z ∼ N(0, 1) with

$$z = \frac{x - \mu}{\sigma}.$$
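The centering-and-scaling transformation can be demonstrated with a short scipy.stats sketch; the parameter values and sample size are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 1.5
rng = np.random.default_rng(0)

x = stats.norm(loc=mu, scale=sigma).rvs(size=100_000, random_state=rng)
z = (x - mu) / sigma          # center and scale

# the transformed samples behave like draws from N(0, 1)
print(z.mean(), z.std())      # approximately 0 and 1
```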

A.5 Multivariate Normal Distribution

The multivariate normal distribution is a multidimensional generalization of the normal distribution. Here x is a k-dimensional vector, x = (x₁, x₂, ..., x_k)ᵀ, and µ is the vector of expected values, i.e., the mean of each of the random variables xᵢ:

$$\boldsymbol{\mu} = (E[x_1], E[x_2], \ldots, E[x_k])^{\mathsf{T}} = (\mu_1, \mu_2, \ldots, \mu_k)^{\mathsf{T}},$$

and the covariance matrix Σ is a symmetric positive definite matrix with determinant written as |Σ|. A vector that is distributed as a multivariate normal with mean vector µ and covariance matrix Σ is written as x ∼ N(µ, Σ).

A.5.1 Probability Density Function (PDF)

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

A.5.2 CDF

There is no closed form expression for the CDF.

A.5.3 Properties

• The mean and mode are µ.
• The variances of the individual components are the diagonal entries of Σ.
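A minimal sampling sketch with scipy.stats; the mean vector and covariance matrix below are arbitrary illustrative choices for k = 2.

```python
import numpy as np
from scipy import stats

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # symmetric positive definite

dist = stats.multivariate_normal(mean=mu, cov=Sigma)
x = dist.rvs(size=200_000, random_state=np.random.default_rng(1))

print(x.mean(axis=0))            # approximately mu
print(np.cov(x, rowvar=False))   # approximately Sigma
```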

A.6 Student’s t-Distribution, t-Distribution

The t-distribution (also known as Student's t-distribution¹) resembles a standard normal distribution, but it has an additional positive, real parameter ν > 0. In the limit ν → ∞ the distribution tends to a standard normal. The parameter ν is often called the number of degrees of freedom. With ν = 1, the distribution is equivalent to the Cauchy distribution (see below). The smaller the value of ν, the thicker the tails of the distribution. Beyond modeling thick-tailed data, the distribution is used to describe the error from estimating the mean of a normal distribution with a small number of samples: given n samples from a normal distribution, the t-statistic formed from the difference between the sample mean and the true mean follows a t-distribution with ν = n − 1.

¹This name arises because the distribution was popularized by William Sealy Gosset under the pseudonym "Student" (Student 1908) to hide, for competitive reasons, the fact that it was used on samples from the beer making process at the Guinness brewery in Dublin, Ireland. Brilliant!

A.6.1 Probability Density Function (PDF)

$$f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$

where Γ(x) is the gamma function.

A.6.2 CDF     F 1 , ν+1 ; 3 ;−x2 1 ν + 1 2 1 2 2 2 ν F(x|ν) = + xΓ × √   πνΓ ν 2 2 2 where 2F1(x) is the hypergeometric function.

A.6.3 Properties

• The median and mode are 0. The mean is also 0 for ν > 1 and is undefined for ν ≤ 1.
• The variance has three different cases; it can be undefined, infinite, or finite depending on ν:

$$\mathrm{Var} = \begin{cases} \text{undefined} & \nu \le 1 \\ \infty & 1 < \nu \le 2 \\ \frac{\nu}{\nu - 2} & \nu > 2 \end{cases}$$

• The skewness is 0 for ν > 3 and undefined otherwise.
• The excess kurtosis is 6/(ν − 4) for ν > 4 and undefined otherwise.

The t-distribution can be shifted and rescaled so that as ν → ∞ it goes to a normal distribution with mean μ and variance σ². If z is t-distributed with parameter ν, then x = μ + zσ is a shifted and rescaled random variable that becomes normal with mean μ and variance σ² as ν → ∞.

A multivariate t-distribution also exists. In analogous fashion to the multivariate normal, there is a mean vector µ, a positive-definite matrix Σ, and a parameter ν > 0. This distribution has PDF

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma, \nu) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,|\Sigma|^{1/2}} \left[1 + \frac{1}{\nu}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu+p)/2}.$$

As ν →∞, this distribution goes to a multivariate normal with mean vector µ and covariance matrix Σ.
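The shift-and-rescale construction, and the moment formulas above, can be checked with scipy.stats; the values of ν, μ, and σ are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

nu, mu, sigma = 5.0, 1.0, 2.0

z = stats.t(df=nu)                       # standardized t
x = stats.t(df=nu, loc=mu, scale=sigma)  # x = mu + z*sigma

# variance of x is sigma^2 * nu/(nu - 2) for nu > 2,
# which approaches sigma^2 as nu -> infinity
assert np.isclose(x.var(), sigma**2 * nu / (nu - 2))

# excess kurtosis is 6/(nu - 4) for nu > 4
assert np.isclose(z.stats(moments='k'), 6 / (nu - 4))
```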

A.7 Logistic Distribution

The logistic distribution resembles a normal distribution but it has thicker tails (i.e., the excess kurtosis is not zero). The distribution gets its name from the fact that its CDF is the logistic function. The logistic distribution has two parameters, the real-valued μ and the positive, real s.

A.7.1 Probability Density Function (PDF)

$$f(x \mid \mu, s) = \frac{e^{-\frac{x-\mu}{s}}}{s\left(1 + e^{-\frac{x-\mu}{s}}\right)^2} = \frac{1}{4s}\,\mathrm{sech}^2\left(\frac{x-\mu}{2s}\right).$$

A.7.2 CDF

$$F(x \mid \mu, s) = \frac{1}{1 + e^{-\frac{x-\mu}{s}}} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x-\mu}{2s}\right).$$

A.7.3 Properties

• The mean, median, and mode are μ.
• The variance is proportional to s²:

$$\mathrm{Var} = \frac{\pi^2}{3} s^2$$

• The skewness is 0.
• The excess kurtosis is 1.2.
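A quick scipy.stats check of the logistic CDF and moments; μ and s are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats

mu, s = 0.5, 1.3
dist = stats.logistic(loc=mu, scale=s)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(var, np.pi**2 * s**2 / 3)
assert np.isclose(kurt, 1.2)      # excess kurtosis

# the CDF is the logistic function
x = 2.0
assert np.isclose(dist.cdf(x), 1 / (1 + np.exp(-(x - mu) / s)))
```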

A.8 Cauchy Distribution, Lorentz Distribution, or Breit-Wigner Distribution

The Cauchy distribution is a special case of the t-distribution with ν = 1. It has a PDF that is finite everywhere, but has undefined mean, variance, skewness, and excess kurtosis. The distribution has two parameters, x0 ∈ R and γ>0. The median and mode of the distribution are at x0.

A.8.1 Probability Density Function (PDF)

$$f(x \mid x_0, \gamma) = \frac{1}{\pi\gamma}\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]^{-1}.$$

A.8.2 CDF

$$F(x \mid x_0, \gamma) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x - x_0}{\gamma}\right).$$
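The equivalence with the t-distribution at ν = 1 is easy to confirm numerically; this sketch uses scipy.stats with the standard Cauchy (x₀ = 0, γ = 1).

```python
import numpy as np
from scipy import stats

x0, gam = 0.0, 1.0
x = np.linspace(-5, 5, 11)

cauchy = stats.cauchy(loc=x0, scale=gam)
t1 = stats.t(df=1, loc=x0, scale=gam)    # t with nu = 1
assert np.allclose(cauchy.pdf(x), t1.pdf(x))

print(cauchy.mean())                      # nan: the mean is undefined
```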

A.9 Gumbel Distribution

The Gumbel distribution is often used to model the maximum of a set of random variables. It has two parameters, m ∈ R and β > 0, and it has positive skewness and excess kurtosis. Its CDF is one of the few common occurrences of the exponential of an exponential.

A.9.1 Probability Density Function (PDF)

$$f(x \mid m, \beta) = \frac{1}{\beta}\, e^{-(z + e^{-z})}, \quad \text{where } z = \frac{x - m}{\beta}.$$

A.9.2 CDF

$$F(x \mid m, \beta) = e^{-e^{-z}}, \qquad z = \frac{x - m}{\beta}.$$

A.9.3 Properties

• The mean of the Gumbel distribution is μ = m + βγ, where γ ≈ 0.5772 is the Euler-Mascheroni constant.
• The median is m − β log(log 2).
• The mode is m.
• The variance is proportional to β²:

$$\mathrm{Var} = \frac{\pi^2}{6} \beta^2$$

• The skewness is positive:

$$\gamma_1 = \frac{12\sqrt{6}\,\zeta(3)}{\pi^3} \approx 1.14,$$

where ζ(3) ≈ 1.20205 is Apéry's constant.
• The excess kurtosis is 12/5.
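These properties can be verified with scipy.stats, whose gumbel_r is the maximum (right-skewed) Gumbel used here; m and β are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats
from scipy.special import zeta

m, beta = 1.0, 2.0
dist = stats.gumbel_r(loc=m, scale=beta)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, m + beta * np.euler_gamma)
assert np.isclose(var, np.pi**2 * beta**2 / 6)
assert np.isclose(skew, 12 * np.sqrt(6) * zeta(3) / np.pi**3)
assert np.isclose(kurt, 12 / 5)
assert np.isclose(dist.median(), m - beta * np.log(np.log(2)))
```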

A.10 Laplace Distribution, Double Exponential Distribution

The Laplace distribution resembles the normal distribution except that it has an absolute value in the exponential, rather than the quadratic exponent. It takes two parameters, m ∈ R and b>0. It is a symmetric distribution about m and has nonzero excess kurtosis.

A.10.1 Probability Density Function (PDF)

$$f(x \mid m, b) = \frac{1}{2b}\, e^{-\frac{|x - m|}{b}}$$

A.10.2 CDF

$$F(x \mid m, b) = \begin{cases} \frac{1}{2} e^{\frac{x - m}{b}} & x < m \\ 1 - \frac{1}{2} e^{-\frac{x - m}{b}} & x \ge m \end{cases}$$

A.10.3 Properties

• The mean, median, and mode are all m.
• The variance is proportional to b²:

$$\mathrm{Var} = 2b^2$$

• The skewness is 0.
• The excess kurtosis is 3.
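A short scipy.stats check; m and b are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats

m, b = 0.0, 1.5
dist = stats.laplace(loc=m, scale=b)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(mean, m)
assert np.isclose(var, 2 * b**2)
assert np.isclose(skew, 0.0)
assert np.isclose(kurt, 3.0)   # thicker tails than the normal (which has 0)
```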

A.11 Uniform Distribution

A uniform random variable is equally likely to take on any value in the interval [a, b], with b > a. If x is a uniformly distributed random variable, we write x ∼ U(a, b).

A.11.1 Probability Density Function (PDF)

$$f(x \mid a, b) = \begin{cases} \frac{1}{b - a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

A.11.2 CDF

$$F(x \mid a, b) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

A.11.3 Properties

• The mean and median are (a + b)/2.
• The mode is any value in [a, b].
• The variance is (b − a)²/12.
• The skewness is 0.
• The excess kurtosis is −6/5.

We can define a standardized uniform random variable that has support on [−1, 1], which we call z ∼ U(−1, 1), by taking x ∼ U(a, b) and defining

$$z = \frac{a + b - 2x}{a - b}.$$
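The standardization map can be sketched with scipy.stats, which parameterizes the uniform distribution by loc = a and scale = b − a; the interval below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy import stats

a, b = 2.0, 5.0
rng = np.random.default_rng(2)

x = stats.uniform(loc=a, scale=b - a).rvs(size=100_000, random_state=rng)
z = (a + b - 2 * x) / (a - b)     # standardize to [-1, 1]

assert z.min() >= -1 and z.max() <= 1
print(z.mean())                   # approximately 0
```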

A.12 Beta Distribution

The beta distribution describes random variables that take on values in the interval [−1, 1] and can be described by two parameters α > −1 and β > −1. If x is a beta-distributed random variable, we write x ∼ B(α, β).

A.12.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{2^{-(\alpha+\beta+1)}\,\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 1)\,\Gamma(\beta + 1)}\,(1 + x)^{\beta} (1 - x)^{\alpha}, \quad x \in [-1, 1].$$

The PDF can also be expressed using the beta function,

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)},$$

as

$$f(x \mid \alpha, \beta) = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha + 1, \beta + 1)}\,(1 + x)^{\beta} (1 - x)^{\alpha}, \quad x \in [-1, 1].$$

A.12.2 CDF

$$F(x \mid \alpha, \beta) = I_{(1+x)/2}(\beta + 1, \alpha + 1),$$

where $I_x(a, b)$ is the regularized incomplete beta function:

$$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)},$$

and

$$B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1}\, dt.$$

A.12.3 Properties

• The mean is

$$E[x] = \frac{-(\alpha + 1) + (\beta + 1)}{\alpha + \beta + 2}.$$

• The variance is

$$\mathrm{Var}(x) = \frac{4(\alpha + 1)(\beta + 1)}{(\alpha + \beta + 2)^2 (\alpha + \beta + 3)}$$

We can scale a beta random variable, z ∼ B(α, β), to have support on the interval [a, b] by writing

$$x = \frac{b - a}{2}\, z + \frac{a + b}{2}.$$

Note: the more common definition for beta random variables uses α′ = α + 1 and β′ = β + 1 and has the distribution supported on [0, 1]. In this work we choose our definition so that the PDF for the standardized beta random variable is the weighting function in the orthogonality relation for Jacobi polynomials, which have a domain of [−1, 1].
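Because this convention differs from the usual one, mapping it onto a standard library takes some care. The sketch below uses scipy.stats, whose beta distribution lives on [0, 1] with density proportional to u^(a−1)(1−u)^(b−1); under the substitution u = (1 + x)/2, the density above corresponds to a = β + 1 and b = α + 1, shifted to [−1, 1]. The parameter values are arbitrary, and the mapping is my reading of the convention above, so treat it as an assumption to verify.

```python
import numpy as np
from scipy import stats

alpha, beta_p = 0.5, 1.5   # book convention: alpha, beta > -1

# scipy shape parameters under u = (1 + x)/2, shifted/scaled to [-1, 1]
dist = stats.beta(a=beta_p + 1, b=alpha + 1, loc=-1, scale=2)

assert np.isclose(dist.mean(),
                  (-(alpha + 1) + (beta_p + 1)) / (alpha + beta_p + 2))
assert np.isclose(dist.var(),
                  4 * (alpha + 1) * (beta_p + 1)
                  / ((alpha + beta_p + 2)**2 * (alpha + beta_p + 3)))
```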

A.13 Gamma Distribution

The gamma distribution describes random variables that take on values on the positive real line and can be described by two parameters α>−1 and β>0. If x is a gamma-distributed random variable, we write x ∼ G (α, β).

A.13.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha+1}\, x^{\alpha} e^{-\beta x}}{\Gamma(\alpha + 1)}, \quad x \in (0, \infty),\ \alpha > -1,\ \beta > 0.$$

A.13.2 CDF

$$F(x \mid \alpha, \beta) = \frac{\gamma(\alpha + 1, \beta x)}{\Gamma(\alpha + 1)},$$

where γ(a, b) is the lower incomplete gamma function.

A.13.3 Properties

• The mean is (α + 1)β⁻¹.
• There is no simple formula for the median.
• The mode is αβ⁻¹ for α > 0.
• The variance is (α + 1)β⁻².
• The skewness is

$$\gamma_1 = \frac{2}{\sqrt{\alpha + 1}}.$$

• The excess kurtosis is

$$\mathrm{Kurt} = \frac{6}{\alpha + 1}.$$

A standardized gamma random variable can be defined as z = βx, where x ∼ G(α, β) and z ∼ G(α, 1). Now the PDF for z is

$$f(z \mid \alpha) = \frac{z^{\alpha} e^{-z}}{\Gamma(\alpha + 1)}, \quad z \in (0, \infty),\ \alpha > -1.$$

Note: the more common definition for gamma random variables uses α′ = α + 1 but the same parameter β. In this work we choose our definition so that the PDF for the standardized gamma random variable is the weighting function in the orthogonality relation for generalized Laguerre polynomials.
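Translating this convention to scipy.stats, whose gamma distribution uses a shape parameter a and scale = 1/β, amounts to setting a = α + 1; the parameter values below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

alpha, beta = 1.5, 2.0     # book convention: alpha > -1, rate beta > 0
dist = stats.gamma(a=alpha + 1, scale=1 / beta)

assert np.isclose(dist.mean(), (alpha + 1) / beta)
assert np.isclose(dist.var(), (alpha + 1) / beta**2)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 2 / np.sqrt(alpha + 1))
assert np.isclose(kurt, 6 / (alpha + 1))
```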

A.14 Inverse Gamma Distribution

The inverse gamma distribution describes random variables whose reciprocal is a gamma random variable. Inverse gamma random variables take on values on the positive real line and can be described by two parameters α > 0 and β > 0. If x is an inverse gamma-distributed random variable, we write x ∼ IG(α, β). In this case we also have x⁻¹ ∼ G(α − 1, β) in the convention of the previous section.

A.14.1 Probability Density Function (PDF)

$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}\, x^{-\alpha - 1} e^{-\beta/x}}{\Gamma(\alpha)}, \quad x \in (0, \infty),\ \alpha > 0,\ \beta > 0.$$

A.14.2 CDF

$$F(x \mid \alpha, \beta) = \frac{\Gamma\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)},$$

where Γ(a, b) is the upper incomplete gamma function.

A.14.3 Properties

• The mean is (α − 1)⁻¹β for α > 1.
• There is no simple formula for the median.
• The mode is (α + 1)⁻¹β.
• The variance is (α − 1)⁻²(α − 2)⁻¹β² for α > 2.
• The skewness is

$$\gamma_1 = \frac{4\sqrt{\alpha - 2}}{\alpha - 3}, \quad \alpha > 3.$$

• The excess kurtosis is

$$\mathrm{Kurt} = \frac{30\alpha - 66}{(\alpha - 3)(\alpha - 4)}, \quad \alpha > 4.$$
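These formulas can be checked against scipy.stats.invgamma, which uses the standard (shape α, scale β) parameterization matching this section; α > 4 below so that all four moments exist.

```python
import numpy as np
from scipy import stats

alpha, beta = 5.0, 2.0
dist = stats.invgamma(a=alpha, scale=beta)

assert np.isclose(dist.mean(), beta / (alpha - 1))
assert np.isclose(dist.var(), beta**2 / ((alpha - 1)**2 * (alpha - 2)))

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 4 * np.sqrt(alpha - 2) / (alpha - 3))
assert np.isclose(kurt, (30 * alpha - 66) / ((alpha - 3) * (alpha - 4)))
```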

A.15 Exponential Distribution

The exponential distribution is used for nonnegative random variables and has a single, positive parameter λ. It is a special case of the gamma distribution with α = 0 and β = λ. The exponential distribution is used, for example, to describe the distance between collisions for subatomic particles (e.g., photons, neutrons, electrons) traveling in a given medium, with λ⁻¹ being the average distance traveled between collisions.

A.15.1 Probability Density Function (PDF)

$$f(x \mid \lambda) = \lambda e^{-\lambda x}$$

A.15.2 CDF

$$F(x \mid \lambda) = 1 - e^{-\lambda x}$$

A.15.3 Properties

• The mean is λ⁻¹.
• The median is λ⁻¹ log 2.
• The mode is 0.
• The variance is λ⁻².
• The skewness is 2.
• The excess kurtosis is 6.
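A final scipy.stats sketch; note that scipy parameterizes the exponential by scale = 1/λ, and the rate below is an arbitrary illustrative value.

```python
import numpy as np
from scipy import stats

lam = 0.7
dist = stats.expon(scale=1 / lam)

assert np.isclose(dist.mean(), 1 / lam)
assert np.isclose(dist.median(), np.log(2) / lam)
assert np.isclose(dist.var(), 1 / lam**2)

mean, var, skew, kurt = dist.stats(moments='mvsk')
assert np.isclose(skew, 2.0) and np.isclose(kurt, 6.0)
```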

References

Agnesi M (1748) Instituzioni analitiche ad uso della gioventú italiana. Nella Regia-Ducal Corte
Barth A, Schwab C, Zollinger N (2011) Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer Math 119(1):123–161
Bastidas-Arteaga E, Soubra AH (2006) Reliability analysis methods. In: Stochastic analysis and inverse modelling, ALERT Doctoral School 2014, pp 53–77
Bayarri MJ, Berger JO, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy RJ, Paulo R, Sacks J, Walsh D (2007) Computer model validation with functional output. Ann Stat 35(5):1874–1906
Bernoulli J (1713) Ars conjectandi, opus posthumum. Accedit Tractatus de seriebus infinitis, et epistola gallic scripta de ludo pilae reticularis. Thurneysen Brothers, Basel
Boyd JP (2001) Chebyshev and Fourier spectral methods. Dover Publications, Mineola
Bratley P, Fox BL, Niederreiter H (1992) Implementation and tests of low-discrepancy sequences. ACM Trans Model Comput Simul 2(3):195–213
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cacuci DG (2015) Second-order adjoint sensitivity analysis methodology (2nd-ASAM) for computing exactly and efficiently first- and second-order sensitivities in large-scale linear systems: I. Computational methodology. J Comput Phys 284:687–699
Carlin BP, Louis TA (2008) Bayesian methods for data analysis. Chapman & Hall/CRC texts in statistical science, 3rd edn. CRC Press, Boca Raton
Carpentier A, Munos R (2012) Adaptive stratified sampling for Monte-Carlo integration of differentiable functions. In: Advances in neural information processing systems, vol 25, pp 251–259
Chowdhary K, Dupuis P (2013) Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification. ESAIM Math Model Numer Anal 47(3):635–662
Cliffe KA, Giles MB, Scheichl R, Teckentrup AL (2011) Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput Vis Sci 14(1):3–15
Collaboration OS et al (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716
Collier N, Haji-Ali AL, Nobile F, Schwerin E, Tempone R (2015) A continuation multilevel Monte Carlo algorithm. BIT Numer Math 55(2):1–34
Collins GP (2009) Within any possible universe, no intellect can ever know it all. Scientific American
Constantine PG (2015) Active subspaces: emerging ideas for dimension reduction in parameter studies. SIAM spotlights, vol 2. SIAM, Philadelphia. ISBN 1611973864, 9781611973860


Cook AH (1965) The absolute determination of the acceleration due to gravity. Metrologia 1(3):84–114
Denison DG, Mallick BK, Smith AF (1998) Bayesian MARS. Stat Comput 8(4):337–346
Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear classification and regression. Wiley, Chichester
Der Kiureghian A, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf 31(2):105–112
Farrell PE, Ham DA, Funke SW, Rognes ME (2013) Automated derivation of the adjoint of high-level transient finite element programs. SIAM J Sci Comput 35(4):C369–C393
Faure H (1982) Discrépance de suites associées à un système de numération (en dimension s). Acta Arith 41(4):337–351
Ferson S, Kreinovich V, Ginzburg L, Myers D, Sentz K (2003) Constructing probability boxes and Dempster-Shafer structures. Tech. Rep. SAND2002-4015, Sandia National Laboratories
Ferson S, Kreinovich V, Hajagos J, Oberkampf W, Ginzburg L (2007) Experimental uncertainty estimation and statistics for data having interval uncertainty. Tech. Rep. SAND2007-0939, Sandia National Laboratories
Fox BL (1986) Algorithm 647: implementation and relative efficiency of quasirandom sequence generators. ACM Trans Math Softw 12(4):362–376
Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning, pp 1050–1059
Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, Berlin
Giles MB (2013) Multilevel Monte Carlo methods. In: Monte Carlo and Quasi-Monte Carlo methods 2012. Springer, Berlin, pp 83–103
Gilks W, Spiegelhalter D (1996) Markov chain Monte Carlo in practice. Chapman & Hall, London
Goh J (2014) Prediction and calibration using outputs from multiple computer simulators. PhD thesis, Simon Fraser University
Goh J, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E (2013) Prediction and computer model calibration using outputs from multifidelity simulators. Technometrics 55(4):501–512
Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer experiments. J Comput Graph Stat 24(2):561–578
Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to computer modeling. J Am Stat Assoc 103(483):1119–1130
Gramacy RB et al (2007) TGP: an R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models. J Stat Softw 19(9):6
Gramacy RB, Niemi J, Weiss RM (2014) Massively parallel approximate Gaussian process regression. SIAM/ASA J Uncertain Quantif 2(1):564–584
Gramacy RB, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E, Trantham M, Drake RP et al (2015) Calibrating a large computer experiment simulating radiative shock hydrodynamics. Ann Appl Stat 9(3):1141–1168
Griewank A, Walther A (2008) Evaluating derivatives: principles and techniques of algorithmic differentiation, vol 105. SIAM, Philadelphia
Gunzburger MD, Webster CG, Zhang G (2014) Stochastic finite element methods for partial differential equations with random input data. Acta Numer 23:521–650
Haldar A, Mahadevan S (2000) Probability, reliability, and statistical methods in engineering design. Wiley, New York
Halpern JY (2017) Reasoning about uncertainty. MIT Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer Science & Business Media, New York
Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD (2004) Combining field data and computer simulations for calibration and prediction. SIAM J Sci Comput 26(2):448
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

Holloway JP, Bingham D, Chou CC, Doss F, Drake RP, Fryxell B, Grosskopf M, van der Holst B, Mallick BK, McClarren R, Mukherjee A, Nair V, Powell KG, Ryu D, Sokolov I, Toth G, Zhang Z (2011) Predictive modeling of a radiative shock system. Reliab Eng Syst Saf 96(9):1184–1193
Holtz M (2011) Sparse grid quadrature in high dimensions with applications in finance and insurance. Lecture notes in computational science and engineering, vol 77. Springer, Berlin
Humbird KD, McClarren RG (2017) Adjoint-based sensitivity analysis for high-energy density radiative transfer using flux-limited diffusion. High Energy Density Phys 22:12–16
Humbird K, Peterson J, McClarren R (2017) Deep jointly-informed neural networks. arXiv:170700784
John LK, Loewenstein G, Prelec D (2012) Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci 23(5):524–532. https://doi.org/10.1177/0956797611430953
Jolliffe I (2002) Principal component analysis. Springer series in statistics. Springer, Berlin
Jones S (2009) The formula that felled Wall St. The Financial Times
Kalos M, Whitlock P (2008) Monte Carlo methods. Wiley-Blackwell, Hoboken
Karagiannis G, Lin G (2017) On the Bayesian calibration of computer model mixtures through experimental data, and the design of predictive models. J Comput Phys 342:139–160
Kennedy MC, O'Hagan A (2000) Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1):1–13
Knupp P, Salari K (2002) Verification of computer codes in computational science and engineering. Discrete mathematics and its applications. CRC Press, Boca Raton
Kreinovich V, Ferson SA (2004) A new Cauchy-based black-box technique for uncertainty in risk analysis. Reliab Eng Syst Saf 85(1–3):267–279
Kreinovich V, Nguyen HT (2009) Towards intuitive understanding of the Cauchy deviate method for processing interval and fuzzy uncertainty. In: Proceedings of the 2015 conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology, pp 1264–1269
Kreinovich V, Beck J, Ferregut C, Sanchez A, Keller G, Averill M, Starks S (2004) Monte-Carlo-type techniques for processing interval uncertainty, and their engineering applications. In: Proceedings of the workshop on reliable engineering computing, pp 15–17
Kurowicka D, Cooke RM (2006) Uncertainty analysis with high dimensional dependence modelling. Wiley, Chichester
Lahman S (2017) Baseball database. http://www.seanlahman.com/baseball-archive/statistics
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Ling J (2015) Using machine learning to understand and mitigate model form uncertainty in turbulence models. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, Piscataway, pp 813–818
Lyness JN, Moler CB (1967) Numerical differentiation of analytic functions. SIAM J Numer Anal 4(2):202–210
Marsaglia G, Tsang WW, Wang J (2003) Evaluating Kolmogorov's distribution. J Stat Softw 8(18):1–4. https://doi.org/10.18637/jss.v008.i18
McClarren RG, Ryu D, Drake RP, Grosskopf M, Bingham D, Chou CC, Fryxell B, van der Holst B, Holloway JP, Kuranz CC, Mallick B, Rutter E, Torralva BR (2011) A physics informed emulator for laser-driven radiating shock simulations. Reliab Eng Syst Saf 96(9):1194–1207
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
National Academy of Science (2012) Building confidence in computational models: the science of verification, validation, and uncertainty quantification. National Academies Press, Washington
Oberkampf WL, Roy CJ (2010) Verification and validation in scientific computing, 1st edn. Cambridge University Press, New York
Owhadi H, Scovel C, Sullivan TJ, McKerns M, Ortiz M (2013) Optimal uncertainty quantification. SIAM Rev 55(2):271–345

Owhadi H, Scovel C, Sullivan T (2015) Brittleness of Bayesian inference under finite information in a continuous world. Electron J Stat 9(1):1–79
Peterson J, Humbird K, Field J, Brandon S, Langer S, Nora R, Spears B, Springer P (2017) Zonal flow generation in inertial confinement fusion implosions. Phys Plasmas 24(3):032702
Rackwitz R, Flessler B (1978) Structural reliability under combined random load sequences. Comput Struct 9(5):489–494
Raissi M, Karniadakis GE (2018) Hidden physics models: machine learning of nonlinear partial differential equations. J Comput Phys 357:125–141
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Roache PJ (1998) Verification and validation in computational science and engineering. Hermosa Publishers, Albuquerque
Robert C, Casella G (2013) Monte Carlo statistical methods. Springer Science & Business Media, New York
Roberts GO, Gelman A, Gilks WR (1997) Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab 7(1):110–120
Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S (2008) Global sensitivity analysis: the primer. Wiley, Chichester
Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S (2010) Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun 181(2):259–270
Santner TJ, Williams BJ, Notz WI (2013) The design and analysis of computer experiments. Springer Science & Business Media, New York
Schilders WH, Van der Vorst HA, Rommes J (2008) Model order reduction: theory, research aspects and applications, vol 13. Springer, Berlin
Sobol IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput Math Math Phys 7(4):86–112
Spears BK (2017) Contemporary machine learning: a guide for practitioners in the physical sciences. arXiv:171208523
Stein M (1987) Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2):143
Stripling HF, McClarren RG, Kuranz CC, Grosskopf MJ, Rutter E, Torralva BR (2013) A calibration and data assimilation method using the Bayesian MARS emulator. Ann Nucl Energy 52:103–112
Student (1908) The probable error of a mean. Biometrika 6:1–25
Tate DR (1968) Acceleration due to gravity at the National Bureau of Standards. J Res Natl Bur Stand Sect C Eng Instrum 72C(1):1
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288
Townsend A (2015) The race for high order Gauss–Legendre quadrature. SIAM News, pp 1–3
Trefethen LN (2013) Approximation theory and approximation practice. Other titles in applied mathematics. SIAM, Philadelphia
Wagner JC, Haghighat A (1998) Automated variance reduction of Monte Carlo shielding calculations using the discrete ordinates adjoint function. Nucl Sci Eng 128(2):186–208
Wang Z, Navon IM, Le Dimet FX, Zou X (1992) The second order adjoint analysis: theory and applications. Meteorol Atmos Phys 50(1–3):3–20
Wilcox LC, Stadler G, Bui-Thanh T, Ghattas O (2015) Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method. J Sci Comput 63(1):138–162
Wolpert DH (2008) Physical limits of inference. Phys D Nonlinear Phenom 237(9):1257–1281
Zheng W, McClarren RG (2016) Emulation-based calibration for parameters in parameterized phonon spectrum of ZrHx in TRIGA reactor simulations. Nucl Sci Eng 183(1):78–95
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320
