Common Probability Distributions
Nathaniel E. Helwig
Associate Professor of Psychology and Statistics, University of Minnesota
August 28, 2020
Copyright © 2020 by Nathaniel E. Helwig
Nathaniel E. Helwig (Minnesota), Common Probability Distributions, August 28, 2020

Table of Contents
1. Probability Distribution Background
2. Discrete Distributions
3. Continuous Distributions
4. Central Limit Theorem
5. Affine Transformations of Normal Variables
Probability Distribution Background: Distribution, Mass, and Density Functions
A random variable X has a cumulative distribution function (CDF) F(·), which is a function from the sample space S to the interval [0, 1].
• F(x) = P(X ≤ x) for any given x ∈ S
• 0 ≤ F(x) ≤ 1 for any x ∈ S, and F(a) ≤ F(b) for all a ≤ b
F(·) has an associated function f(·) that is referred to as a probability mass function (PMF) or probability density function (PDF).
• PMF (discrete): f(x) = P(X = x) for all x ∈ S
• PDF (continuous): ∫_a^b f(x) dx = F(b) − F(a) = P(a < X < b)
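These PMF/PDF relationships can be checked numerically. A minimal sketch using scipy.stats (the slides contain no code, so this is illustrative and assumes SciPy is available):

```python
from scipy import stats
from scipy.integrate import quad

# Discrete case: summing PMF values recovers the CDF (binomial example)
X = stats.binom(10, 0.5)
assert abs(X.cdf(3) - sum(X.pmf(k) for k in range(4))) < 1e-12

# Continuous case: integrating the PDF over [a, b] gives F(b) - F(a)
Z = stats.norm(0, 1)
area, _ = quad(Z.pdf, -1, 1)
assert abs(area - (Z.cdf(1) - Z.cdf(-1))) < 1e-7
```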
Probability Distribution Background: Parameters and Statistics
A parameter θ = t(F) refers to some function of a probability distribution that is used to characterize the distribution.
• µ = E(X) and σ² = E[(X − µ)²] are parameters
Given a sample of data x = (x1, . . . , xn)^⊤, a statistic T = s(x) is some function of the sample of data.
• x̄ = (1/n) Σ_{i=1}^n x_i and s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)² are statistics
Parametric distributions have a finite number of parameters, which characterize the form of the CDF and PMF (or PDF). • The parameters define a family of distributions
Discrete Distributions: Bernoulli Distribution
A Bernoulli trial refers to a simple experiment that has two possible outcomes. The two outcomes are x = 0 (failure) and x = 1 (success), and the probability of success is denoted by p = P(X = 1).
• One parameter: p ∈ [0, 1]
• Notation: X ∼ Bern(p) or X ∼ B(1, p)
The Bernoulli distribution has the properties:
• PMF: f(x) = 1 − p if x = 0, and f(x) = p if x = 1
• CDF: F(x) = 0 if x < 0, F(x) = 1 − p if 0 ≤ x < 1, and F(x) = 1 if x ≥ 1
• Mean: E(X) = p
• Variance: Var(X) = p(1 − p)
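As a quick sanity check, these properties can be compared against scipy.stats.bernoulli (a sketch, assuming SciPy is available):

```python
from scipy import stats

p = 0.75
X = stats.bernoulli(p)

assert abs(X.pmf(1) - p) < 1e-12           # P(X = 1) = p
assert abs(X.pmf(0) - (1 - p)) < 1e-12     # P(X = 0) = 1 - p
assert abs(X.cdf(0.5) - (1 - p)) < 1e-12   # F(x) = 1 - p for 0 <= x < 1
assert abs(X.mean() - p) < 1e-12           # E(X) = p
assert abs(X.var() - p * (1 - p)) < 1e-12  # Var(X) = p(1 - p)
```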
Discrete Distributions: Bernoulli Distribution Example
Suppose we flip a coin and record the outcome as 0 (tails) or 1 (heads).
This experiment is an example of a Bernoulli trial, and the random variable X (the outcome of a coin flip) follows a Bernoulli distribution.
Figure 1: Bernoulli PMF with p = 1/2 (left) and p = 3/4 (right).
Discrete Distributions: Binomial Distribution
If X = Σ_{i=1}^n Z_i where the Z_i are independent and identically distributed Bernoulli trials with probability of success p, then the random variable X follows a binomial distribution.
• Two parameters: n ∈ {1, 2, 3, . . .} and p ∈ [0, 1]
• Notation: X ∼ B(n, p)
The binomial distribution has the properties:
• PMF: f(x) = (n choose x) p^x (1 − p)^(n−x), where (n choose x) = n!/(x!(n − x)!)
• CDF: F(x) = Σ_{i=0}^{⌊x⌋} (n choose i) p^i (1 − p)^(n−i)
• Mean: E(X) = np
• Variance: Var(X) = np(1 − p)
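A minimal numerical check of the PMF, CDF, mean, and variance formulas, sketched with scipy.stats (illustrative, not part of the slides):

```python
from math import comb
from scipy import stats

n, p, x = 10, 0.5, 4
X = stats.binom(n, p)

# PMF matches (n choose x) p^x (1 - p)^(n - x)
assert abs(X.pmf(x) - comb(n, x) * p**x * (1 - p)**(n - x)) < 1e-12
# CDF is the sum of PMF values up to floor(x)
assert abs(X.cdf(x) - sum(X.pmf(i) for i in range(x + 1))) < 1e-12
assert abs(X.mean() - n * p) < 1e-12          # E(X) = np
assert abs(X.var() - n * p * (1 - p)) < 1e-12  # Var(X) = np(1 - p)
```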
Discrete Distributions: Binomial Distribution Example
Suppose that we flip a coin n ≥ 1 independent times, and assume that each flip Zi has probability of success (i.e., heads) p ∈ [0, 1].
If we define X to be the total number of observed heads, then X = Σ_{i=1}^n Z_i follows a binomial distribution with parameters n and p.
Figure 2: Binomial PMF with n ∈ {5, 10} and p = 1/2.
Discrete Distributions: Limiting Distribution of Binomial
As the number of independent trials n → ∞, we have that

(X − np) / √(np(1 − p)) → Z ∼ N(0, 1)
where N(0, 1) denotes a standard normal distribution (later defined).
In other words, the normal distribution is the limiting distribution of the binomial for large n.
Note that the binomial distribution (depicted on the previous slide) looks reasonably bell-shaped for n = 10.
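The quality of this approximation can be probed directly by comparing the exact binomial CDF with its normal approximation for a large n. A sketch using scipy.stats (the choice of n = 1000 and the evaluation point 520 are illustrative):

```python
from math import sqrt
from scipy import stats

n, p = 1000, 0.5
X = stats.binom(n, p)
Z = stats.norm(0, 1)

# P(X <= 520): exact binomial CDF vs. the normal approximation
exact = X.cdf(520)
approx = Z.cdf((520 - n * p) / sqrt(n * p * (1 - p)))
assert abs(exact - approx) < 0.02  # close for large n
```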
Discrete Distributions: Discrete Uniform Distribution
Suppose that a simple random experiment has possible outcomes x ∈ {a, a + 1, . . . , b − 1, b} where a ≤ b and m = 1 + b − a, and all m possible outcomes are equally likely, i.e., P(X = x) = 1/m.
• Two parameters: the two endpoints a and b (with a < b)
• Notation: X ∼ U{a, b}
The discrete uniform distribution has the properties:
• PMF: f(x) = 1/m
• CDF: F(x) = (1 + ⌊x⌋ − a)/m
• Mean: E(X) = (a + b)/2
• Variance: Var(X) = [(b − a + 1)² − 1]/12
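These formulas can be checked against scipy.stats.randint, which implements the discrete uniform distribution (a sketch; note that scipy's upper endpoint is exclusive):

```python
from scipy import stats

a, b = 1, 6                  # a fair die
m = 1 + b - a
X = stats.randint(a, b + 1)  # scipy's upper endpoint is exclusive

assert abs(X.pmf(3) - 1 / m) < 1e-12                     # f(x) = 1/m
assert abs(X.cdf(4) - (1 + 4 - a) / m) < 1e-12           # F(x) = (1 + floor(x) - a)/m
assert abs(X.mean() - (a + b) / 2) < 1e-12               # E(X) = (a + b)/2
assert abs(X.var() - ((b - a + 1)**2 - 1) / 12) < 1e-12  # Var(X)
```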
Discrete Distributions: Discrete Uniform Distribution Example
Suppose we roll a fair die and let x ∈ {1, . . . , 6} denote the number of dots that are observed. The random variable X follows a discrete uniform distribution with a = 1 and b = 6.
Figure 3: Discrete uniform distribution PMF and CDF with a = 1 and b = 6.
Continuous Distributions: Normal Distribution
The normal (or Gaussian) distribution is the most well-known and commonly used probability distribution. The normal distribution is quite important because of the central limit theorem (later defined).
• Two parameters: the mean µ and the variance σ²
• Notation: X ∼ N(µ, σ²)
The standard normal distribution refers to a normal distribution where µ = 0 and σ2 = 1, which is typically denoted by Z ∼ N(0, 1).
The normal distribution has the properties:
• PDF: f(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²), where exp(x) = e^x
• CDF: F(x) = Φ((x − µ)/σ), where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−z²/2} dz
• Mean: E(X) = µ
• Variance: Var(X) = σ²
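The PDF formula and the standardization identity F(x) = Φ((x − µ)/σ) can be verified numerically with scipy.stats (a sketch; the values µ = 2 and σ = 1.5 are arbitrary):

```python
from math import exp, pi, sqrt
from scipy import stats

mu, sigma = 2.0, 1.5
X = stats.norm(mu, sigma)
x = 1.0

# PDF evaluated by hand from the formula above
byhand = exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))
assert abs(X.pdf(x) - byhand) < 1e-12

# CDF via the standard normal: F(x) = Phi((x - mu) / sigma)
Z = stats.norm(0, 1)
assert abs(X.cdf(x) - Z.cdf((x - mu) / sigma)) < 1e-12
```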
Continuous Distributions: Normal Distribution Visualizations
The CDF has an elongated “S” shape, which is called an ogive.
Figure 4: Normal distribution PDFs and CDFs with different µ and σ² (µ = 0 with σ² ∈ {0.2, 1, 5}, and µ = −2 with σ² = 0.5).
Continuous Distributions: Chi-Square Distribution
If X = Σ_{i=1}^k Z_i² where the Z_i are independent standard normal variables, then the random variable X follows a chi-square distribution with degrees of freedom k.
• One parameter: the degrees of freedom k
• Notation: X ∼ χ²_k or X ∼ χ²(k)
The chi-square distribution has the properties:
• PDF: f(x) = (1/(2^(k/2) Γ(k/2))) x^(k/2 − 1) e^(−x/2), where Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt is the gamma function
• CDF: F(x) = γ(k/2, x/2)/Γ(k/2), where γ(u, v) = ∫_0^v t^(u−1) e^(−t) dt is the lower incomplete gamma function
• Mean: E(X) = k
• Variance: Var(X) = 2k
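Both the moment formulas and the sum-of-squared-normals construction can be checked by simulation (a sketch using scipy and numpy; the seed and sample size are arbitrary):

```python
import numpy as np
from scipy import stats

k = 4
X = stats.chi2(k)
assert abs(X.mean() - k) < 1e-12     # E(X) = k
assert abs(X.var() - 2 * k) < 1e-12  # Var(X) = 2k

# A sum of k squared standard normals behaves like chi-square(k)
rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, k))
samples = (z ** 2).sum(axis=1)
assert abs(samples.mean() - k) < 0.05  # sample mean close to k
```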
Continuous Distributions: Chi-Square Distribution Visualization
The χ²(k) distribution approaches a normal distribution as k → ∞.
Figure 5: Chi-square distribution PDFs and CDFs with different DoF (k = 1, . . . , 6).
Continuous Distributions: F Distribution
A random variable X has an F distribution if the variable has the form

X = (U/m)/(V/n)

where U ∼ χ²(m) and V ∼ χ²(n) are independent chi-square variables.
• Two parameters: the two degrees of freedom parameters m and n
• Notation: X ∼ F_{m,n} or X ∼ F(m, n)
The F distribution has the properties:
• PDF: f(x) = √((mx)^m n^n / (mx + n)^(m+n)) / (x B(m/2, n/2)), where B(u, v) = ∫_0^1 t^(u−1) (1 − t)^(v−1) dt is the beta function
• CDF: F(x) = I_{mx/(mx+n)}(m/2, n/2), where I_x(u, v) = B(x; u, v)/B(u, v) and B(x; u, v) = ∫_0^x t^(u−1) (1 − t)^(v−1) dt
• Mean: E(X) = n/(n − 2), assuming that n > 2
• Variance: Var(X) = 2n²(m + n − 2)/[m(n − 2)²(n − 4)], assuming that n > 4

Continuous Distributions: F Distribution Visualization
Note that if X ∼ F(m, n), then 1/X ∼ F(n, m).
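The moment formulas and the reciprocal property can be checked with scipy.stats (a sketch; the degrees of freedom m = 5, n = 10 are arbitrary and satisfy n > 4):

```python
from scipy import stats

m, n = 5, 10
X = stats.f(m, n)

assert abs(X.mean() - n / (n - 2)) < 1e-9  # E(X) = n/(n - 2)
assert abs(X.var() - 2 * n**2 * (m + n - 2) / (m * (n - 2)**2 * (n - 4))) < 1e-9

# Reciprocal property: P(X <= x) = 1 - P(1/X <= 1/x) with 1/X ~ F(n, m)
x = 2.0
Y = stats.f(n, m)
assert abs(X.cdf(x) - (1 - Y.cdf(1 / x))) < 1e-9
```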
Figure 6: F distribution PDFs and CDFs with different DoF (m, n) ∈ {(1, 1), (2, 1), (2, 5), (5, 5), (5, 10), (50, 50)}.
Continuous Distributions: Continuous Uniform Distribution
Suppose that a simple random experiment has an infinite number of possible outcomes x ∈ [a, b] where −∞ < a < b < ∞, and all possible outcomes are equally likely.
• Two parameters: the two endpoints a and b
• Notation: X ∼ U[a, b] or X ∼ U(a, b) (closed or open interval)
The continuous uniform distribution has the properties:
• PDF: f(x) = 1/(b − a) if x ∈ [a, b] (note that f(x) = 0 otherwise)
• CDF: F(x) = (x − a)/(b − a) if x ∈ [a, b] (note that F(x) = 0 if x < a and F(x) = 1 if x > b)
• Mean: E(X) = (a + b)/2
• Variance: Var(X) = (b − a)²/12
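These formulas map onto scipy.stats.uniform, which uses a loc/scale parameterization rather than endpoints (a sketch; a = 0, b = 12 match the figure on the next slide):

```python
from scipy import stats

a, b = 0.0, 12.0
X = stats.uniform(loc=a, scale=b - a)  # scipy parameterizes by loc and scale
x = 3.0

assert abs(X.pdf(x) - 1 / (b - a)) < 1e-12        # f(x) = 1/(b - a)
assert abs(X.cdf(x) - (x - a) / (b - a)) < 1e-12  # F(x) = (x - a)/(b - a)
assert abs(X.mean() - (a + b) / 2) < 1e-12        # E(X) = (a + b)/2
assert abs(X.var() - (b - a)**2 / 12) < 1e-12     # Var(X) = (b - a)^2/12
```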
Continuous Distributions: Continuous Uniform Distribution Visualization
Figure 7: Continuous uniform PDF and CDF with a = 0 and b = 12.
Central Limit Theorem
Let x1, . . . , xn denote an independent and identically distributed (iid) sample of data from some probability distribution F with mean µ and variance σ2 < ∞.
The central limit theorem reveals that as the sample size n → ∞

√n (x̄_n − µ) →_d N(0, σ²)

where x̄_n = (1/n) Σ_{i=1}^n x_i is the sample mean.
For a large enough n, the sample mean x̄_n is approximately normally distributed with mean µ and variance σ²/n, i.e., x̄_n is approximately distributed as N(µ, σ²/n).
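The approximation can be demonstrated by simulating many sample means and comparing their empirical mean and variance with µ and σ²/n (a sketch using numpy; the seed, n, and the number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000

# Sample means of n Bernoulli(1/2) draws: mu = 1/2, sigma^2 = 1/4
xbar = rng.binomial(1, 0.5, size=(reps, n)).mean(axis=1)

# The means cluster around mu with variance close to sigma^2 / n
assert abs(xbar.mean() - 0.5) < 0.005
assert abs(xbar.var() - 0.25 / n) < 1e-4
```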
Central Limit Theorem: Visualization
Figure 8: Top: X ∼ B(1, 1/2). Bottom: X ∼ U[(2 − √12)/4, (2 + √12)/4]. Note that µ = 1/2 and σ² = 1/4 for both distributions. The histograms depict the approximate sampling distribution of x̄_n (for n = 100 and n = 500) and the lines denote the N(µ, σ²/n) density.
Affine Transformations of Normal Variables: Definition of Affine Transformation
Another reason that the normal distribution is so popular for applied research is the fact that normally distributed variables are convenient to work with. This is because affine transformations of normal variables are also normal variables.
Given a collection of variables X1, . . . , Xp, an affine transformation has the form Y = a + b1X1 + · · · + bpXp, where a is an offset (intercept) term and bj is the weight/coefficient that is applied to the j-th variable.
• Linear transformations are a special case with a = 0
Affine Transformations of Normal Variables: Affine Transformations of Normals
If the random variable X is normally distributed, i.e., if X ∼ N(µ, σ²), then the random variable Y = a + bX is also normally distributed, i.e., Y ∼ N(µ_Y, σ_Y²), where µ_Y = E(Y) = a + bµ and σ_Y² = Var(Y) = b²σ².
If we define Y = a + Σ_{j=1}^p b_j X_j, then the mean and variance of Y are

µ_Y = E(Y) = a + Σ_{j=1}^p b_j µ_j

σ_Y² = Var(Y) = Σ_{j=1}^p b_j² σ_j² + 2 Σ_{j=2}^p Σ_{k=1}^{j−1} b_j b_k σ_{jk}

where σ_{jk} = E[(X_j − µ_j)(X_k − µ_k)]. If the collection of variables (X1, . . . , Xp) are multivariate normal, then Y ∼ N(µ_Y, σ_Y²).
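The single-variable case Y = a + bX can be demonstrated by simulation (a sketch using numpy; the parameter values µ = 1, σ = 2, a = 3, b = −0.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 2.0
a, b = 3.0, -0.5

x = rng.normal(mu, sigma, size=200_000)
y = a + b * x  # affine transformation of a normal variable

# Y should be (approximately) N(a + b*mu, b^2 * sigma^2)
assert abs(y.mean() - (a + b * mu)) < 0.02
assert abs(y.var() - b**2 * sigma**2) < 0.05
```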