Introduction to Normal Distribution
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)
Updated 17-Jan-2017
Nathaniel E. Helwig (U of Minnesota), Introduction to Normal Distribution, Updated 17-Jan-2017

Copyright
Copyright © 2017 by Nathaniel E. Helwig
Outline of Notes
1) Univariate Normal:
   - Distribution form
   - Standard normal
   - Probability calculations
   - Affine transformations
   - Parameter estimation

2) Bivariate Normal:
   - Distribution form
   - Probability calculations
   - Affine transformations
   - Conditional distributions

3) Multivariate Normal:
   - Distribution form
   - Probability calculations
   - Affine transformations
   - Conditional distributions
   - Parameter estimation

4) Sampling Distributions:
   - Univariate case
   - Multivariate case
Univariate Normal
Normal Density Function (Univariate)
Given a variable x ∈ R, the normal probability density function (pdf) is
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}    (1)

where
- µ ∈ R is the mean
- σ > 0 is the standard deviation (σ² is the variance)
- e ≈ 2.71828 is the base of the natural logarithm
Write X ∼ N(µ, σ2) to denote that X follows a normal distribution.
Standard Normal Distribution
If X ∼ N(0, 1), then X follows a standard normal distribution:
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}    (2)

[Figure: standard normal pdf f(x), plotted for x from −4 to 4]
Probabilities and Distribution Functions
Probabilities relate to the area under the pdf:
P(a ≤ X ≤ b) = \int_a^b f(x)\,dx = F(b) − F(a)    (3)
where

F(x) = \int_{-\infty}^{x} f(u)\,du    (4)

is the cumulative distribution function (cdf).
Note: F(x) = P(X ≤ x) =⇒ 0 ≤ F(x) ≤ 1
Normal Probabilities
Helpful figure of normal probabilities:
[Figure: standard normal pdf with shaded areas: 34.1% in each of (µ−σ, µ) and (µ, µ+σ); 13.6% in each of (µ−2σ, µ−σ) and (µ+σ, µ+2σ); 2.1% in each of (µ−3σ, µ−2σ) and (µ+2σ, µ+3σ); 0.1% in each tail beyond ±3σ]
From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
Normal Distribution Functions (Univariate)
Helpful figures of normal pdfs and cdfs:
[Figure: normal pdfs φ_{µ,σ²}(x) (left) and cdfs Φ_{µ,σ²}(x) (right) for (µ, σ²) = (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5), plotted for x from −5 to 5]
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg
Note that the cdf has an elongated “S” shape, referred to as an ogive.
Affine Transformations of Normal (Univariate)
Suppose that X ∼ N(µ, σ²) and a, b ∈ R with a ≠ 0.
If we define Y = aX + b, then Y ∼ N(aµ + b, a2σ2).
Suppose that X ∼ N(1, 2). Determine the distributions of...
- Y = X + 3
- Y = 2X + 3
- Y = 3X + 2
Suppose that X ∼ N(1, 2). Determine the distributions of...
- Y = X + 3 ⟹ Y ∼ N(1(1) + 3, 1²(2)) ≡ N(4, 2)
- Y = 2X + 3 ⟹ Y ∼ N(2(1) + 3, 2²(2)) ≡ N(5, 8)
- Y = 3X + 2 ⟹ Y ∼ N(3(1) + 2, 3²(2)) ≡ N(5, 18)
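The affine-transformation rule is easy to check by simulation; this sketch (not part of the original slides) verifies the Y = 2X + 3 case:

```r
# Simulation check: if X ~ N(1, 2), then Y = 2X + 3 should be N(5, 8).
set.seed(1)
x <- rnorm(1e6, mean = 1, sd = sqrt(2))  # rnorm takes the sd, not the variance
y <- 2 * x + 3
c(mean(y), var(y))   # close to 5 and 8
```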
Likelihood Function
Suppose that x = (x₁, ..., xₙ) is an iid sample of data from a normal distribution with mean µ and variance σ², i.e., xᵢ ∼ N(µ, σ²) iid.
The likelihood function for the parameters (given the data) has the form
L(\mu, \sigma^2 | x) = \prod_{i=1}^n f(x_i) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\}

and the log-likelihood function is given by
LL(\mu, \sigma^2 | x) = \sum_{i=1}^n \log(f(x_i)) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2
Maximum Likelihood Estimate of the Mean
The MLE of the mean is the value of µ that minimizes
\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n x_i^2 - 2n\bar{x}\mu + n\mu^2

where \bar{x} = (1/n)\sum_{i=1}^n x_i is the sample mean.
Taking the derivative with respect to µ we find that
\frac{\partial \sum_{i=1}^n (x_i - \mu)^2}{\partial \mu} = -2n\bar{x} + 2n\mu \quad \longleftrightarrow \quad \bar{x} = \hat{\mu}
i.e., the sample mean x¯ is the MLE of the population mean µ.
Maximum Likelihood Estimate of the Variance
The MLE of the variance is the value of σ2 that minimizes
\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \hat{\mu})^2 = \frac{n}{2}\log(\sigma^2) + \frac{\sum_{i=1}^n x_i^2}{2\sigma^2} - \frac{n\bar{x}^2}{2\sigma^2}

where \bar{x} = (1/n)\sum_{i=1}^n x_i is the sample mean.
Taking the derivative with respect to σ2 we find that
\frac{\partial}{\partial \sigma^2}\left[\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \hat{\mu})^2\right] = \frac{n}{2\sigma^2} - \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \hat{\mu})^2
which implies that the sample variance \hat{\sigma}^2 = (1/n)\sum_{i=1}^n (x_i - \bar{x})^2 is the MLE of the population variance σ².
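Both MLEs can be verified numerically by maximizing the log-likelihood directly (a sketch; the sample below is arbitrary, and the variance is parameterized on the log scale to keep it positive):

```r
# Numerical check of both MLEs against the closed-form answers.
set.seed(2)
x <- rnorm(500, mean = 3, sd = 2)
# negative log-likelihood; sigma^2 = exp(par[2]) keeps the variance positive
negLL <- function(par) {
  mu <- par[1]; sig2 <- exp(par[2])
  0.5 * length(x) * log(2 * pi * sig2) + sum((x - mu)^2) / (2 * sig2)
}
fit <- optim(c(0, 0), negLL)
c(fit$par[1], exp(fit$par[2]))      # numerical MLEs
c(mean(x), mean((x - mean(x))^2))   # closed form: xbar and (1/n)*sum((x - xbar)^2)
```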
Bivariate Normal
Normal Density Function (Bivariate)
Given two variables x, y ∈ R, the bivariate normal pdf is
f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}    (5)
where
- µx ∈ R and µy ∈ R are the marginal means
- σx ∈ R⁺ and σy ∈ R⁺ are the marginal standard deviations
- 0 ≤ |ρ| < 1 is the correlation coefficient
X and Y are marginally normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²)
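Equation (5) can be transcribed directly into R (a sketch; the parameter defaults below mirror the example on the next slide). As a sanity check, with ρ = 0 the joint pdf should factor into the product of the two marginal densities:

```r
# Bivariate normal pdf from Equation (5); defaults are mux = muy = 0,
# sigx^2 = 1, sigy^2 = 2, rho = 0.6/sqrt(2).
dbvnorm <- function(x, y, mux = 0, muy = 0,
                    sigx = 1, sigy = sqrt(2), rho = 0.6 / sqrt(2)) {
  zx <- (x - mux) / sigx
  zy <- (y - muy) / sigy
  q  <- (zx^2 + zy^2 - 2 * rho * zx * zy) / (2 * (1 - rho^2))
  exp(-q) / (2 * pi * sigx * sigy * sqrt(1 - rho^2))
}
# Sanity check: with rho = 0 the joint pdf factors into the marginals.
dbvnorm(1, -0.5, rho = 0)
dnorm(1) * dnorm(-0.5, sd = sqrt(2))   # same value
```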
Example: µx = µy = 0, σx² = 1, σy² = 2, ρ = 0.6/√2

[Figure: surface and contour plots of the bivariate normal pdf f(x, y) for x, y from −4 to 4]
http://en.wikipedia.org/wiki/File:MultivariateNormal.png

Example: Different Means
[Figure: contour plots of f(x, y) for (µx, µy) = (0, 0), (1, 2), and (−1, −1)]

Note: for all three plots σx² = 1, σy² = 2, and ρ = 0.6/√2.
Example: Different Correlations
[Figure: contour plots of f(x, y) for ρ = −0.6/√2, ρ = 0, and ρ = 1.2/√2]

Note: for all three plots µx = µy = 0, σx² = 1, and σy² = 2.
Example: Different Variances
[Figure: contour plots of f(x, y) for σy = 1, σy = √2, and σy = 2]

Note: for all three plots µx = µy = 0, σx² = 1, and ρ = 0.6/(σx σy).
Probabilities and Multiple Integration
Probabilities still relate to the area under the pdf:
P([a_x ≤ X ≤ b_x] and [a_y ≤ Y ≤ b_y]) = \int_{a_x}^{b_x} \int_{a_y}^{b_y} f(x, y)\,dy\,dx    (6)

where \int\int f(x, y)\,dy\,dx denotes the multiple integral of the pdf f(x, y).
Defining z = (x, y), we can still define the cdf:
F(z) = P(X ≤ x and Y ≤ y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du    (7)
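Equation (6) can be checked by nested numerical integration. For simplicity this sketch takes ρ = 0 with standard normal marginals, where the double integral factors into a product of univariate probabilities:

```r
# rho = 0 case: the joint pdf is the product of N(0,1) marginals, so the
# double integral should equal a product of univariate probabilities.
f <- function(x, y) dnorm(x) * dnorm(y)
inner <- function(x) sapply(x, function(xi)
  integrate(function(y) f(xi, y), -1, 1)$value)
p2d <- integrate(inner, -1, 1)$value    # P(-1 <= X <= 1 and -1 <= Y <= 1)
p1d <- (pnorm(1) - pnorm(-1))^2         # product of marginal probabilities
c(p2d, p1d)
```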
Normal Distribution Functions (Bivariate)
Helpful figures of bivariate normal pdf and cdf:

[Figure: contour plots of the bivariate normal pdf f(x, y) and cdf F(x, y) for x, y from −4 to 4]

Note: µx = µy = 0, σx² = 1, σy² = 2, and ρ = 0.6/√2
Note that the cdf still has an ogive shape (now in two dimensions).
Affine Transformations of Normal (Bivariate)
Given z = (x, y)′, suppose that z ∼ N(µ, Σ) where
- µ = (µx, µy)′ is the 2 × 1 mean vector
- Σ = [σx² ρσxσy; ρσxσy σy²] is the 2 × 2 covariance matrix
Let A = [a₁₁ a₁₂; a₂₁ a₂₂] and b = (b₁, b₂)′ with A ≠ 0₂ₓ₂ (the 2 × 2 matrix of zeros).
If we define w = Az + b, then w ∼ N(Aµ + b, AΣA0).
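The result can be evaluated with plain matrix arithmetic; the numbers below are illustrative (σx² = 1, σy² = 2, ρσxσy = 0.6), with A chosen so that w = (X + Y, X − Y)′:

```r
# Affine transformation of a bivariate normal via matrix arithmetic.
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.6, 0.6, 2), 2, 2)
A <- matrix(c(1,  1,
              1, -1), 2, 2, byrow = TRUE)   # w = (X + Y, X - Y)'
b <- c(0, 0)
A %*% mu + b           # mean vector of w
A %*% Sigma %*% t(A)   # covariance of w: Var(X+Y) = 4.2, Var(X-Y) = 1.8
```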
Conditional Normal (Bivariate)
The conditional distribution of a variable Y given X = x is
f_{Y|X}(y|X = x) = f_{XY}(x, y)/f_X(x)    (8)

where
fXY (x, y) is the joint pdf of X and Y
fX (x) is the marginal pdf of X
In the bivariate normal case, we have that
Y|X ∼ N(µ∗, σ∗²)    (9)
where µ∗ = µy + ρ(σy/σx)(x − µx) and σ∗² = σy²(1 − ρ²)
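Equation (9) can be checked numerically at a single point: the ratio definition in Equation (8) and the conditional normal density should agree (a sketch; the parameter values below are arbitrary):

```r
# One-point check that f(y | x) from Equation (8) matches Equation (9).
mux <- 0; muy <- 0; sigx <- 1; sigy <- sqrt(2); rho <- 0.5
x <- 0.7; y <- -0.3
zx <- (x - mux)/sigx; zy <- (y - muy)/sigy
fxy <- exp(-(zx^2 + zy^2 - 2*rho*zx*zy) / (2*(1 - rho^2))) /
       (2*pi*sigx*sigy*sqrt(1 - rho^2))       # joint pdf, Equation (5)
mustar  <- muy + rho*(sigy/sigx)*(x - mux)    # conditional mean, Equation (9)
sigstar <- sigy*sqrt(1 - rho^2)               # conditional sd, Equation (9)
fxy / dnorm(x, mux, sigx)    # conditional pdf via the ratio definition
dnorm(y, mustar, sigstar)    # conditional pdf via Equation (9): same value
```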
Derivation of Conditional Normal
To prove Equation (9), simply write out the definition and simplify:
f_{Y|X}(y|X = x) = \frac{f_{XY}(x, y)}{f_X(x)}

= \frac{\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}}{\frac{1}{\sigma_x\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right\}}

= \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right] + \frac{(x-\mu_x)^2}{2\sigma_x^2}\right\}

= \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[\rho^2\frac{\sigma_y^2}{\sigma_x^2}(x-\mu_x)^2 + (y-\mu_y)^2 - 2\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)(y-\mu_y)\right]\right\}

= \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2\right\}
which completes the proof.
Statistical Independence for Bivariate Normal
Two variables X and Y are statistically independent if
fXY (x, y) = fX (x)fY (y) (10)
where fXY(x, y) is the joint pdf, and fX(x) and fY(y) are the marginal pdfs.
Note that if X and Y are independent, then
f_{Y|X}(y|X = x) = f_{XY}(x, y)/f_X(x) = f_X(x)f_Y(y)/f_X(x) = f_Y(y)    (11)

so conditioning on X = x does not change the distribution of Y.
If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9)
Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2) where the scores follow a bivariate normal distribution with parameters:
- µx = 70 and µy = 60 are the marginal means
- σx = 10 and σy = 15 are the marginal standard deviations
- ρ = 0.6 is the correlation coefficient
Suppose we select a student at random. What is the probability that...
(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X − 4Y > 150)?
Example #1: Part (a)
Answer for 1(a): Note that Y ∼ N(60, 15²), so the probability that the student scores over 75 on Exam 2 is

P(Y > 75) = P(Z > (75 − 60)/15)
          = P(Z > 1)
          = 1 − P(Z < 1)
          = 1 − Φ(1)
          = 1 − 0.8413447
          = 0.1586553
where Φ(x) = \int_{-\infty}^{x} f(z)\,dz with f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} denoting the standard normal pdf (see R code for use of pnorm to calculate this quantity).
Example #1: Part (b)
Answer for 1(b): Note that (Y|X = 80) ∼ N(µ∗, σ∗²) where

µ∗ = µY + ρ(σY/σX)(x − µX) = 60 + (0.6)(15/10)(80 − 70) = 69
σ∗² = σY²(1 − ρ²) = 15²(1 − 0.6²) = 144
If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

P(Y > 75|X = 80) = P(Z > (75 − 69)/12)
                 = P(Z > 0.5)
                 = 1 − Φ(0.5)
                 = 1 − 0.6914625
                 = 0.3085375
Example #1: Part (c)
Answer for 1(c): Note that (X + Y) ∼ N(µ∗, σ∗²) where

µ∗ = µX + µY = 70 + 60 = 130
σ∗² = σX² + σY² + 2ρσXσY = 10² + 15² + 2(0.6)(10)(15) = 505
The probability that the sum of Exam 1 and Exam 2 is above 150 is
P(X + Y > 150) = P(Z > (150 − 130)/√505)
               = P(Z > 0.8899883)
               = 1 − Φ(0.8899883)
               = 1 − 0.8132639
               = 0.1867361
Example #1: Part (d)
Answer for 1(d): Note that (X − Y) ∼ N(µ∗, σ∗²) where

µ∗ = µX − µY = 70 − 60 = 10
σ∗² = σX² + σY² − 2ρσXσY = 10² + 15² − 2(0.6)(10)(15) = 145
The probability that the student did better on Exam 1 than Exam 2 is

P(X > Y) = P(X − Y > 0)
         = P(Z > (0 − 10)/√145)
         = P(Z > −0.8304548)
         = 1 − Φ(−0.8304548)
         = 1 − 0.2031408
         = 0.7968592
Example #1: Part (e)
Answer for 1(e): Note that (5X − 4Y) ∼ N(µ∗, σ∗²) where

µ∗ = 5µX − 4µY = 5(70) − 4(60) = 110
σ∗² = 5²σX² + (−4)²σY² + 2(5)(−4)ρσXσY = 25(10²) + 16(15²) − 2(20)(0.6)(10)(15) = 2500
Thus, the needed probability can be obtained using

P(5X − 4Y > 150) = P(Z > (150 − 110)/√2500)
                 = P(Z > 0.8)
                 = 1 − Φ(0.8)
                 = 1 − 0.7881446
                 = 0.2118554
Example #1: R Code
# Example 1a
> pnorm(1, lower=F)
[1] 0.1586553
> pnorm(75, mean=60, sd=15, lower=F)
[1] 0.1586553

# Example 1b
> pnorm(0.5, lower=F)
[1] 0.3085375
> pnorm(75, mean=69, sd=12, lower=F)
[1] 0.3085375

# Example 1c
> pnorm(20/sqrt(505), lower=F)
[1] 0.1867361
> pnorm(150, mean=130, sd=sqrt(505), lower=F)
[1] 0.1867361

# Example 1d
> pnorm(-10/sqrt(145), lower=F)
[1] 0.7968592
> pnorm(0, mean=10, sd=sqrt(145), lower=F)
[1] 0.7968592

# Example 1e
> pnorm(0.8, lower=F)
[1] 0.2118554
> pnorm(150, mean=110, sd=50, lower=F)
[1] 0.2118554
Multivariate Normal
Normal Density Function (Multivariate)
Given x = (x₁, ..., x_p)′ with x_j ∈ R ∀j, the multivariate normal pdf is

f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right\}    (12)
where
- µ = (µ₁, ..., µ_p)′ is the p × 1 mean vector
- Σ is the p × p covariance matrix:

\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}
Write x ∼ N(µ, Σ) or x ∼ Np(µ, Σ) to denote x is multivariate normal.
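Equation (12) translates directly into R (a sketch). As a check, with a diagonal Σ the joint density factors into univariate normal densities:

```r
# Multivariate normal density at a point x, following Equation (12).
dmvn <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  q <- drop(t(d) %*% solve(Sigma) %*% d)   # (x - mu)' Sigma^{-1} (x - mu)
  exp(-q / 2) / ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}
# With a diagonal Sigma the joint density factors into univariate normals:
dmvn(c(1, 2), mu = c(0, 0), Sigma = diag(c(1, 4)))
dnorm(1, 0, 1) * dnorm(2, 0, 2)   # same value
```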
Some Multivariate Normal Properties
The mean and covariance parameters have the following restrictions:
- µ_j ∈ R for all j
- σ_{jj} > 0 for all j
- σ_{ij} = ρ_{ij}√(σ_{ii}σ_{jj}) where ρ_{ij} is the correlation between X_i and X_j
- σ_{ij}² ≤ σ_{ii}σ_{jj} for any i, j ∈ {1, ..., p} (Cauchy-Schwarz)
Σ is assumed to be positive definite so that Σ−1 exists.
Marginals are normal: Xj ∼ N(µj , σjj ) for all j ∈ {1,..., p}.
Multivariate Normal Probabilities
Probabilities still relate to the area under the pdf:
P(a_j ≤ X_j ≤ b_j ∀j) = \int_{a_1}^{b_1} \cdots \int_{a_p}^{b_p} f(x)\,dx_p \cdots dx_1    (13)

where \int \cdots \int f(x)\,dx_p \cdots dx_1 denotes the multiple integral of f(x).
We can still define the cdf of x = (x₁, ..., x_p)′:
F(x) = P(X_j ≤ x_j ∀j) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(u)\,du_p \cdots du_1    (14)
Affine Transformations of Normal (Multivariate)
Suppose that x = (x₁, ..., x_p)′ and that x ∼ N(µ, Σ) where
µ = {µj }p×1 is the mean vector
Σ = {σij }p×p is the covariance matrix
Let A = {a_{ij}}_{n×p} and b = {b_i}_{n×1} with A ≠ 0_{n×p}.
If we define w = Ax + b, then w ∼ N(Aµ + b, AΣA0).
Note: linear combinations of normal variables are normally distributed.
Multivariate Conditional Distributions
Given variables x = (x₁, ..., x_p)′ and y = (y₁, ..., y_q)′, we have
f_{Y|X}(y|X = x) = f_{XY}(x, y)/f_X(x)    (15)

where
fY |X (y|X = x) is the conditional distribution of y given x
fXY (x, y) is the joint pdf of x and y
fX (x) is the marginal pdf of x
Conditional Normal (Multivariate)
Suppose that z ∼ N(µ, Σ) where
- z = (x′, y′)′ = (x₁, ..., x_p, y₁, ..., y_q)′
- µ = (µx′, µy′)′ = (µ₁ₓ, ..., µ_{px}, µ₁ᵧ, ..., µ_{qy})′. Note: µx is the mean vector of x, and µy is the mean vector of y
- Σ = [Σxx Σxy; Σxy′ Σyy] where Σxx is p × p, Σyy is q × q, and Σxy is p × q. Note: Σxx is the covariance matrix of x, Σyy is the covariance matrix of y, and Σxy is the covariance matrix of x and y
In the multivariate normal case, we have that
y|x ∼ N(µ∗, Σ∗) (16)
where µ∗ = µy + Σxy′ Σxx⁻¹ (x − µx) and Σ∗ = Σyy − Σxy′ Σxx⁻¹ Σxy
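The conditional mean and covariance can be wrapped in a small function (a sketch; it assumes the conditioning variables x come first in z). The numbers in the usage lines reproduce Example #2 later in the notes, with the variables reordered as (X2, X3, X1):

```r
# Conditional mean and covariance from Equation (16).
# Sigma is partitioned with the p conditioning variables first.
cond_mvn <- function(x, mu, Sigma, p) {
  ix <- 1:p; iy <- -(1:p)
  Sxx <- Sigma[ix, ix, drop = FALSE]
  Sxy <- Sigma[ix, iy, drop = FALSE]
  Syy <- Sigma[iy, iy, drop = FALSE]
  list(mean = drop(mu[iy] + t(Sxy) %*% solve(Sxx) %*% (x - mu[ix])),
       cov  = Syy - t(Sxy) %*% solve(Sxx) %*% Sxy)
}
# Condition X1 on (X2, X3) = (1, 10), with variables ordered (X2, X3, X1):
out <- cond_mvn(x = c(1, 10), mu = c(3, 7, 5),
                Sigma = matrix(c(4, 2, -1,  2, 9, 0,  -1, 0, 4), 3, 3),
                p = 2)
out$mean   # 5.75
out$cov    # 3.71875 (= 119/32)
```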
Statistical Independence for Multivariate Normal
Using Equation (16), we have that
y|x ∼ N(µ∗, Σ∗) ≡ N(µy , Σyy ) (17)
if and only if Σxy = 0p×q (a matrix of zeros).
Note that Σxy = 0_{p×q} implies that the p elements of x are uncorrelated with the q elements of y.
- For multivariate normal variables: uncorrelated ⟹ independent
- For non-normal variables: uncorrelated does not imply independent
Example #2
Each Delicious Candy Company store makes candy bars in 3 sizes: regular (X1), fun size (X2), and big size (X3).
Assume the weights (in ounces) of the candy bars (X1, X2, X3) follow a multivariate normal distribution with parameters:

µ = (5, 3, 7)′ and Σ = [4 −1 0; −1 4 2; 0 2 9]
Suppose we select a store at random. What is the probability that...
(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X1 − 3X2 + 5X3 < 63)?
Example #2: Part (a)
Answer for 2(a): Note that X1 ∼ N(5, 4).
So, the probability that the regular bar is more than 8 oz is
P(X1 > 8) = P(Z > (8 − 5)/2)
          = P(Z > 1.5)
          = 1 − Φ(1.5)
          = 1 − 0.9331928
          = 0.0668072
Example #2: Part (b)
Answer for 2(b): (X1|X2 = 1, X3 = 10) is normally distributed; see Equation (16).
The conditional mean of (X1|X2 = 1, X3 = 10) is given by
µ∗ = µ_{X1} + Σ₁₂′ Σ₂₂⁻¹ (x̃ − µ̃)
   = 5 + (−1 0) [4 2; 2 9]⁻¹ (1 − 3, 10 − 7)′
   = 5 + (1/32) (−1 0) [9 −2; −2 4] (−2, 3)′
   = 5 + 24/32
   = 5.75
Example #2: Part (b) continued
Answer for 2(b) continued: The conditional variance of (X1|X2 = 1, X3 = 10) is given by
σ∗² = σ_{X1}² − Σ₁₂′ Σ₂₂⁻¹ Σ₁₂
    = 4 − (−1 0) [4 2; 2 9]⁻¹ (−1, 0)′
    = 4 − (1/32) (−1 0) [9 −2; −2 4] (−1, 0)′
    = 4 − 9/32
    = 3.71875
Example #2: Part (b) continued
Answer for 2(b) continued: So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar is more than 8 oz is

P(X1 > 8|X2 = 1, X3 = 10) = P(Z > (8 − 5.75)/√3.71875)
                          = P(Z > 1.166767)
                          = 1 − Φ(1.166767)
                          = 1 − 0.8783477
                          = 0.1216523
Example #2: Part (c)
Answer for 2(c): (4X1 − 3X2 + 5X3) is normally distributed.
The expectation of (4X1 − 3X2 + 5X3) is given by
µ∗ = 4µ_{X1} − 3µ_{X2} + 5µ_{X3} = 4(5) − 3(3) + 5(7) = 46
Example #2: Part (c) continued
Answer for 2(c) continued: The variance of (4X1 − 3X2 + 5X3) is given by
σ∗² = (4 −3 5) Σ (4, −3, 5)′
    = (4 −3 5) [4 −1 0; −1 4 2; 0 2 9] (4, −3, 5)′
    = (4 −3 5) (19, −6, 39)′
    = 289
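The quadratic form a′Σa can be checked directly:

```r
# Variance of 4*X1 - 3*X2 + 5*X3 as the quadratic form a' Sigma a.
a <- c(4, -3, 5)
Sigma <- matrix(c( 4, -1, 0,
                  -1,  4, 2,
                   0,  2, 9), 3, 3, byrow = TRUE)
drop(t(a) %*% Sigma %*% a)   # 289, so the standard deviation is 17
```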
Example #2: Part (c) continued
Answer for 2(c) continued: So, the needed probability can be obtained as
P(4X1 − 3X2 + 5X3 < 63) = P(Z < (63 − 46)/√289)
                        = P(Z < 1)
                        = Φ(1)
                        = 0.8413447
Example #2: R Code
# Example 2a > pnorm(1.5,lower=F) [1] 0.0668072 > pnorm(8,mean=5,sd=2,lower=F) [1] 0.0668072
# Example 2b > pnorm(2.25/sqrt(119/32),lower=F) [1] 0.1216523 > pnorm(8,mean=5.75,sd=sqrt(119/32),lower=F) [1] 0.1216523
# Example 2c > pnorm(1) [1] 0.8413447 > pnorm(63,mean=46,sd=17) [1] 0.8413447
Likelihood Function
Suppose that xᵢ = (x_{i1}, ..., x_{ip})′ is an iid sample from a multivariate normal distribution with mean vector µ and covariance matrix Σ, i.e., xᵢ ∼ N(µ, Σ) iid.
The likelihood function for the parameters (given the data) has the form
L(\mu, \Sigma | X) = \prod_{i=1}^n f(x_i) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}

and the log-likelihood function is given by
LL(\mu, \Sigma | X) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log(|\Sigma|) - \frac{1}{2}\sum_{i=1}^n (x_i - \mu)'\Sigma^{-1}(x_i - \mu)
Maximum Likelihood Estimate of Mean Vector
The MLE of the mean vector is the value of µ that minimizes
\sum_{i=1}^n (x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \sum_{i=1}^n x_i'\Sigma^{-1}x_i - 2n\bar{x}'\Sigma^{-1}\mu + n\mu'\Sigma^{-1}\mu

where \bar{x} = (1/n)\sum_{i=1}^n x_i is the sample mean vector.
Taking the derivative with respect to µ we find that
\frac{\partial}{\partial \mu}\sum_{i=1}^n (x_i - \mu)'\Sigma^{-1}(x_i - \mu) = -2n\Sigma^{-1}\bar{x} + 2n\Sigma^{-1}\mu \quad \longleftrightarrow \quad \bar{x} = \hat{\mu}
The sample mean vector x̄ is the MLE of the population mean vector µ.
Maximum Likelihood Estimate of Covariance Matrix
The MLE of the covariance matrix is the value of Σ that minimizes
-n\log(|\Sigma^{-1}|) + \sum_{i=1}^n \mathrm{tr}\{\Sigma^{-1}(x_i - \hat{\mu})(x_i - \hat{\mu})'\}

where \hat{\mu} = \bar{x} = (1/n)\sum_{i=1}^n x_i is the sample mean.
Taking the derivative with respect to Σ−1 we find that
\frac{\partial}{\partial \Sigma^{-1}}\left[-n\log(|\Sigma^{-1}|) + \sum_{i=1}^n \mathrm{tr}\{\Sigma^{-1}(x_i - \hat{\mu})(x_i - \hat{\mu})'\}\right] = -n\Sigma + \sum_{i=1}^n (x_i - \hat{\mu})(x_i - \hat{\mu})'
i.e., the sample covariance matrix \hat{\Sigma} = (1/n)\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})' is the MLE of the population covariance matrix Σ.
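A quick check (a sketch with an arbitrary sample): the MLE divides by n, while R's cov() divides by n − 1, so the two agree after rescaling:

```r
# MLE covariance (divide by n) vs. R's cov() (divide by n - 1).
set.seed(3)
X <- matrix(rnorm(200 * 3), 200, 3)    # arbitrary sample: n = 200, p = 3
n <- nrow(X)
Xc <- sweep(X, 2, colMeans(X))         # center each column at its mean
Sigma_mle <- crossprod(Xc) / n         # (1/n) sum_i (x_i - xbar)(x_i - xbar)'
max(abs(Sigma_mle - cov(X) * (n - 1) / n))   # zero up to rounding
```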
Sampling Distributions
Univariate Sampling Distributions: x̄ and s²
In the univariate normal case, we have that
- \bar{x} = (1/n)\sum_{i=1}^n x_i ∼ N(µ, σ²/n)
- (n − 1)s² = \sum_{i=1}^n (x_i − \bar{x})² ∼ σ²χ²_{n−1}
χ²_k denotes a chi-square variable with k degrees of freedom.

σ²χ²_k = \sum_{i=1}^k z_i^2 where the z_i are iid N(0, σ²)
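A simulation sketch of the s² result: for samples of size n, (n − 1)s²/σ² should behave like a chi-square variable with n − 1 degrees of freedom, which has mean n − 1 and variance 2(n − 1):

```r
# Simulation check: (n - 1)s^2 / sigma^2 should be chi-square with n - 1 df.
set.seed(4)
n <- 10; sigma <- 2
stat <- replicate(1e5, (n - 1) * var(rnorm(n, 0, sigma)) / sigma^2)
c(mean(stat), var(stat))   # chi-square with 9 df: mean 9, variance 18
```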
Multivariate Sampling Distributions: x̄ and S
In the multivariate normal case, we have that
- \bar{x} = (1/n)\sum_{i=1}^n x_i ∼ N(µ, Σ/n)
- (n − 1)S = \sum_{i=1}^n (x_i − \bar{x})(x_i − \bar{x})′ ∼ W_{n−1}(Σ)
W_k(Σ) denotes a Wishart variable with k degrees of freedom.

W_k(Σ) = \sum_{i=1}^k z_i z_i′ where the z_i are iid N(0_p, Σ)
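A simulation sketch of the Wishart definition: since E[zᵢzᵢ′] = Σ, a W_k(Σ) draw has expectation kΣ (the Σ and k below are arbitrary):

```r
# Simulation check: a Wishart_k(Sigma) draw sum_i z_i z_i' has expectation k*Sigma.
set.seed(5)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
L <- chol(Sigma)          # upper triangular, with Sigma = t(L) %*% L
k <- 5
rwish1 <- function() {
  Z <- matrix(rnorm(k * 2), k, 2) %*% L   # rows of Z are iid N(0, Sigma)
  crossprod(Z)                            # sum_i z_i z_i'
}
W <- Reduce(`+`, replicate(2e4, rwish1(), simplify = FALSE)) / 2e4
W            # approximately 5 * Sigma
```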