
Common Probability Distributions

Nathaniel E. Helwig

Associate Professor of Psychology and Statistics, University of Minnesota

August 28, 2020

Copyright © 2020 by Nathaniel E. Helwig

Table of Contents

1. Probability Distribution Background

2. Discrete Distributions

3. Continuous Distributions

4. Central Limit Theorem

5. Affine Transformations of Normal Variables

Probability Distribution Background

Distribution, Mass, and Density Functions

A random variable X has a cumulative distribution function (CDF) F(·), which is a function from the sample space S to the interval [0, 1].
• F(x) = P(X ≤ x) for any given x ∈ S
• 0 ≤ F(x) ≤ 1 for any x ∈ S, and F(a) ≤ F(b) for all a ≤ b

F(·) has an associated function f(·) that is referred to as a probability mass function (PMF) or probability density function (PDF).
• PMF (discrete): f(x) = P(X = x) for all x ∈ S
• PDF (continuous): ∫_a^b f(x) dx = F(b) − F(a) = P(a < X < b)
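Both relationships can be checked numerically; the following sketch is illustrative (the fair-die PMF and the midpoint integration rule are our own choices, not from the slides):

```python
import math

# Discrete case: the CDF is the running sum of the PMF.
# Illustrative PMF: a fair six-sided die, f(x) = 1/6 for x in {1, ..., 6}.
pmf = {x: 1 / 6 for x in range(1, 7)}

def cdf_discrete(x):
    return sum(p for t, p in pmf.items() if t <= x)

# Continuous case: integrating the PDF over (a, b) recovers F(b) - F(a).
# Illustrative PDF: the standard normal density (defined later in the deck).
def npdf(x):
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=10_000):
    # simple midpoint rule
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

prob = integrate(npdf, -1.96, 1.96)  # P(-1.96 < X < 1.96), roughly 0.95
```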

Parameters and Statistics

A parameter θ = t(F) refers to some function of a probability distribution that is used to characterize the distribution.
• µ = E(X) and σ² = E[(X − µ)²] are parameters

Given a sample of data x = (x_1, ..., x_n)ᵀ, a statistic T = s(x) is some function of the sample of data.
• x̄ = (1/n) Σ_{i=1}^n x_i and s² = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)² are statistics
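A minimal sketch of the two statistics above, using made-up data:

```python
# Statistics are functions of the sample: the sample mean and the
# (unbiased) sample variance, translated directly from the formulas above.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(x)

xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
```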

Parametric distributions have a finite number of parameters, which characterize the form of the CDF and PMF (or PDF). • The parameters define a family of distributions

Discrete Distributions

Bernoulli Distribution

A Bernoulli trial refers to a simple random experiment that has two possible outcomes. The two outcomes are x = 0 (failure) and x = 1 (success), and the probability of success is denoted by p = P(X = 1).
• One parameter: p ∈ [0, 1]
• Notation: X ∼ Bern(p) or X ∼ B(1, p)

The Bernoulli distribution has the properties:
• PMF: f(x) = 1 − p if x = 0, and f(x) = p if x = 1
• CDF: F(x) = 0 if x < 0, F(x) = 1 − p if 0 ≤ x < 1, and F(x) = 1 if x ≥ 1
• Mean: E(X) = p
• Variance: Var(X) = p(1 − p)
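These properties translate directly into code; a small sketch (function names are our own):

```python
# Direct translation of the Bernoulli PMF, CDF, mean, and variance above.
def bern_pmf(x, p):
    return {0: 1 - p, 1: p}.get(x, 0.0)

def bern_cdf(x, p):
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def bern_mean(p):
    return p

def bern_var(p):
    return p * (1 - p)
```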

Bernoulli Distribution Example

Suppose we flip a coin and record the outcome as 0 (tails) or 1 (heads).

This experiment is an example of a Bernoulli trial, and the random variable X (the outcome of a coin flip) follows a Bernoulli distribution.

Figure 1: Bernoulli PMF with p = 1/2 (left) and p = 3/4 (right).

Binomial Distribution

If X = Σ_{i=1}^n Z_i, where the Z_i are independent and identically distributed Bernoulli trials with probability of success p, then the random variable X follows a binomial distribution.
• Two parameters: n ∈ {1, 2, 3, ...} and p ∈ [0, 1]
• Notation: X ∼ B(n, p)

The binomial distribution has the properties:
• PMF: f(x) = (n choose x) p^x (1 − p)^(n−x), where (n choose x) = n!/(x!(n − x)!)
• CDF: F(x) = Σ_{i=0}^{⌊x⌋} (n choose i) p^i (1 − p)^(n−i)
• Mean: E(X) = np
• Variance: Var(X) = np(1 − p)
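A sketch of the PMF and CDF using Python's built-in binomial coefficient; the mean is recovered by summing x · f(x) over the support as a consistency check (helper names are our own):

```python
from math import comb

# Binomial PMF and CDF from the formulas above.
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    # sum the PMF up to floor(x)
    return sum(binom_pmf(i, n, p) for i in range(int(x) + 1))

def binom_mean(n, p):
    # E(X) = sum of x * f(x); should equal n * p
    return sum(x * binom_pmf(x, n, p) for x in range(n + 1))
```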

Binomial Distribution Example

Suppose that we flip a coin n ≥ 1 independent times, and assume that each flip Zi has probability of success (i.e., heads) p ∈ [0, 1].

If we define X to be the total number of observed heads, then X = Σ_{i=1}^n Z_i follows a binomial distribution with parameters n and p.

Figure 2: Binomial PMF with n ∈ {5, 10} and p = 1/2.

Limiting Distribution of Binomial

As the number of independent trials n → ∞, we have that

(X − np)/√(np(1 − p)) → Z ∼ N(0, 1)

where N(0, 1) denotes a standard normal distribution (later defined).

In other words, the normal distribution is the limiting distribution of the binomial for large n.

Note that the binomial distribution (depicted on the previous slide) looks reasonably bell-shaped for n = 10.
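The quality of this approximation can be checked directly; in the sketch below the continuity correction of 0.5 is an extra refinement beyond what the slide states, and the choice n = 100 is ours:

```python
from math import comb, erf, sqrt

def binom_cdf(x, n, p):
    # exact binomial CDF by summing the PMF
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
exact = binom_cdf(55, n, p)
# Normal approximation of P(X <= 55), with a continuity correction of 0.5
approx = norm_cdf((55.5 - n * p) / sqrt(n * p * (1 - p)))
```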

Discrete Uniform Distribution

Suppose that a simple random experiment has possible outcomes x ∈ {a, a + 1, ..., b − 1, b}, where a ≤ b and m = 1 + b − a, and all m possible outcomes are equally likely, i.e., P(X = x) = 1/m.
• Two parameters: the two endpoints a and b (with a < b)
• Notation: X ∼ U{a, b}

The discrete uniform distribution has the properties: • PMF: f(x) = 1/m • CDF: F (x) = (1 + bxc − a)/m • Mean: E(X) = (a + b)/2 • Variance: Var(X) = [(b − a + 1)2 − 1]/12
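The mean and variance formulas can be verified by brute force over the support; a sketch using a = 1, b = 6 (the fair-die case from the next slide):

```python
# Check the discrete uniform mean and variance formulas by direct summation.
a, b = 1, 6
m = 1 + b - a
support = range(a, b + 1)

mean = sum(x / m for x in support)             # sum of x * f(x)
var = sum((x - mean) ** 2 / m for x in support)

formula_mean = (a + b) / 2                     # 3.5
formula_var = ((b - a + 1) ** 2 - 1) / 12      # 35/12
```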

Discrete Uniform Distribution Example

Suppose we roll a fair die and let x ∈ {1, ..., 6} denote the number of dots that are observed. The random variable X follows a discrete uniform distribution with a = 1 and b = 6.


Figure 3: Discrete uniform distribution PMF and CDF with a = 1 and b = 6.

Continuous Distributions

Normal Distribution

The normal (or Gaussian) distribution is the most well-known and commonly used probability distribution. The normal distribution is quite important because of the central limit theorem (later defined). • Two parameters: the mean µ and the variance σ2 • Notation: X ∼ N(µ, σ2)

The standard normal distribution refers to a normal distribution where µ = 0 and σ2 = 1, which is typically denoted by Z ∼ N(0, 1).

The normal distribution has the properties:
• PDF: f(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²), where exp(x) = e^x
• CDF: F(x) = Φ((x − µ)/σ), where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^(−z²/2) dz
• Mean: E(X) = µ
• Variance: Var(X) = σ²
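The PDF is a direct translation; for the CDF, a common trick (an implementation choice, not stated on the slide) is the identity Φ(z) = (1 + erf(z/√2))/2:

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    # normal density from the formula above
    z = (x - mu) / sigma
    return math.exp(-z ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = Phi((x - mu) / sigma), with Phi via the error function
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```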

Normal Distribution Visualizations

The CDF has an elongated “S” shape, which is called an ogive.


Figure 4: Normal distribution PDFs and CDFs with different µ and σ2.

Chi-Square Distribution

If X = Σ_{i=1}^k Z_i², where the Z_i are independent standard normal variables, then the random variable X follows a chi-square distribution with k degrees of freedom.
• One parameter: the degrees of freedom k
• Notation: X ∼ χ²_k or X ∼ χ²(k)

The chi-square distribution has the properties:
• PDF: f(x) = (1/(2^(k/2) Γ(k/2))) x^(k/2 − 1) e^(−x/2), where Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt is the gamma function
• CDF: F(x) = (1/Γ(k/2)) γ(k/2, x/2), where γ(u, v) = ∫_0^v t^(u−1) e^(−t) dt is the lower incomplete gamma function
• Mean: E(X) = k
• Variance: Var(X) = 2k
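A sketch of these formulas: the gamma function comes from the standard library, and the lower incomplete gamma integral is approximated with a simple midpoint rule (an illustrative choice; accuracy for very small k is limited):

```python
import math

def chi2_pdf(x, k):
    # PDF from the formula above, with math.gamma for the gamma function
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def lower_inc_gamma(u, v, n=20_000):
    # midpoint-rule approximation of gamma(u, v) = integral of t^(u-1) e^(-t)
    # over (0, v)
    h = v / n
    return sum(((i + 0.5) * h) ** (u - 1) * math.exp(-(i + 0.5) * h)
               for i in range(n)) * h

def chi2_cdf(x, k):
    return lower_inc_gamma(k / 2, x / 2) / math.gamma(k / 2)
```

For k = 2 the chi-square CDF has the closed form 1 − e^(−x/2), which makes a convenient check.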

Chi-Square Distribution Visualization

The χ²(k) distribution approaches a normal distribution as k → ∞.


Figure 5: Chi-square distribution PDFs and CDFs with different DoF.

F Distribution

A random variable X has an F distribution if the variable has the form

X = (U/m)/(V/n)

where U ∼ χ²(m) and V ∼ χ²(n) are independent chi-square variables.
• Two parameters: the two degrees of freedom parameters m and n
• Notation: X ∼ F_{m,n} or X ∼ F(m, n)

The F distribution has the properties:
• PDF: f(x) = √((mx)^m n^n / (mx + n)^(m+n)) / (x B(m/2, n/2)), where B(u, v) = ∫_0^1 t^(u−1) (1 − t)^(v−1) dt is the beta function
• CDF: F(x) = I_(mx/(mx+n))(m/2, n/2), where I_x(u, v) = B(x; u, v)/B(u, v) and B(x; u, v) = ∫_0^x t^(u−1) (1 − t)^(v−1) dt
• Mean: E(X) = n/(n − 2), assuming that n > 2
• Variance: Var(X) = 2n²(m + n − 2)/(m(n − 2)²(n − 4)), assuming that n > 4

F Distribution Visualization

Note that if X ∼ F(m, n), then X⁻¹ ∼ F(n, m).
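Both the mean formula and the reciprocal property can be checked numerically; this sketch computes the beta function via the identity B(u, v) = Γ(u)Γ(v)/Γ(u + v) and approximates E(X) with a midpoint rule (the truncation point and step count are our own choices):

```python
import math

def f_pdf(x, m, n):
    # F(m, n) density from the formula on the previous slide
    B = math.gamma(m / 2) * math.gamma(n / 2) / math.gamma((m + n) / 2)
    num = math.sqrt((m * x) ** m * n ** n / (m * x + n) ** (m + n))
    return num / (x * B)

def f_mean_numeric(m, n, upper=200.0, steps=200_000):
    # midpoint-rule check that E(X) is close to n / (n - 2) for n > 2
    h = upper / steps
    return sum((i + 0.5) * h * f_pdf((i + 0.5) * h, m, n)
               for i in range(steps)) * h
```

The reciprocal property implies the density identity f_{F(n,m)}(1/x) · (1/x²)⁻¹... more simply: f_{F(n,m)}(y) = f_{F(m,n)}(1/y)/y², which the test below checks at one point.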


Figure 6: F distribution PDFs and CDFs with different DoF.

Continuous Uniform Distribution

Suppose that a simple random experiment has an infinite number of possible outcomes x ∈ [a, b] where −∞ < a < b < ∞, and all possible outcomes are equally likely. • Two parameters: the two endpoints a and b • Notation: X ∼ U[a, b] or X ∼ U(a, b) (closed or open interval)

The continuous uniform distribution has the properties:
• PDF: f(x) = 1/(b − a) if x ∈ [a, b] (note that f(x) = 0 otherwise)
• CDF: F(x) = (x − a)/(b − a) if x ∈ [a, b] (note that F(x) = 0 if x < a and F(x) = 1 if x > b)
• Mean: E(X) = (a + b)/2
• Variance: Var(X) = (b − a)²/12
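A direct translation of the piecewise PDF and CDF (function names are our own):

```python
# Continuous uniform PDF and CDF on [a, b], following the properties above.
def unif_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def unif_cdf(x, a, b):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)
```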

Continuous Uniform Distribution Visualization


Figure 7: Continuous uniform PDF and CDF with a = 0 and b = 12.

Central Limit Theorem


Let x_1, ..., x_n denote an independent and identically distributed (iid) sample of data from some probability distribution F with mean µ and variance σ² < ∞.

The central limit theorem reveals that, as the sample size n → ∞,

√n (x̄_n − µ) → N(0, σ²) in distribution

where x̄_n = (1/n) Σ_{i=1}^n x_i is the sample mean.

For a large enough n, the sample mean x̄_n is approximately normally distributed with mean µ and variance σ²/n, i.e., x̄_n ≈ N(µ, σ²/n).
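A quick simulation sketch of this behavior (the U(0, 1) population and the sample sizes are our own choices; the seed just makes the run reproducible):

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Sampling distribution of the mean of n U(0, 1) draws, which has
# mu = 1/2 and sigma^2 = 1/12.
n, reps = 100, 2000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

mu, sigma2 = 0.5, 1 / 12
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)  # should be close to sigma2 / n
```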

Central Limit Theorem Visualization


Figure 8: Top: X ∼ B(1, 1/2). Bottom: X ∼ U[(2 − √12)/4, (2 + √12)/4]. Note that µ = 1/2 and σ² = 1/4 for both distributions. The histograms depict the approximate distribution of x̄_n and the lines denote the N(µ, σ²/n) density.

Affine Transformations of Normal Variables

Definition of Affine Transformation

Another reason that the normal distribution is so popular for applied research is the fact that normally distributed variables are convenient to work with. This is because affine transformations of normal variables are also normal variables.

Given a collection of variables X_1, ..., X_p, an affine transformation has the form Y = a + b_1 X_1 + ··· + b_p X_p, where a is an offset (intercept) term and b_j is the weight/coefficient that is applied to the j-th variable.
• Linear transformations are a special case with a = 0

Affine Transformations of Normals

If the random variable X is normally distributed, i.e., if X ∼ N(µ, σ²), then the random variable Y = a + bX is also normally distributed, i.e., Y ∼ N(µ_Y, σ²_Y), where µ_Y = E(Y) = a + bµ and σ²_Y = Var(Y) = b²σ².

If we define Y = a + Σ_{j=1}^p b_j X_j, then the mean and variance of Y are

µ_Y = E(Y) = a + Σ_{j=1}^p b_j µ_j
σ²_Y = Var(Y) = Σ_{j=1}^p b_j² σ_j² + 2 Σ_{j=2}^p Σ_{k=1}^{j−1} b_j b_k σ_jk

where σ_jk = E[(X_j − µ_j)(X_k − µ_k)]. If the collection of variables (X_1, ..., X_p) is multivariate normal, then Y ∼ N(µ_Y, σ²_Y).
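A sketch of these formulas with p = 2 and made-up means, variances, and covariance; it also checks that the double-sum form agrees with the equivalent quadratic form bᵀΣb:

```python
# Mean and variance of Y = a + b1*X1 + b2*X2 from the formulas above,
# using made-up values for the means, variances, and covariance.
a = 1.0
b = [2.0, -1.0]
mu = [3.0, 5.0]
cov = [[4.0, 1.5],
       [1.5, 9.0]]  # cov[j][k] = sigma_jk; the diagonal holds the variances

p = len(b)
mu_Y = a + sum(b[j] * mu[j] for j in range(p))
var_Y = (sum(b[j] ** 2 * cov[j][j] for j in range(p))
         + 2 * sum(b[j] * b[k] * cov[j][k]
                   for j in range(1, p) for k in range(j)))

# Equivalent quadratic form b' * Sigma * b gives the same variance.
var_Y_qf = sum(b[j] * cov[j][k] * b[k] for j in range(p) for k in range(p))
```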
