
Multivariate

Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Lecture #4 - 7/21/2011 Slide 1 of 41

Last Time

■ Matrices and vectors
◆ Eigenvalues
◆ Eigenvectors
◆ Determinants

■ Basic descriptive statistics using matrices:
◆ Mean vectors
◆ Covariance matrices
◆ Correlation matrices

Today’s Lecture

■ Putting our new knowledge to use with a useful statistical distribution: the Multivariate Normal Distribution

■ This roughly maps onto Chapter 4 of Johnson and Wichern

MVN

MVN Properties

MVN Parameters

MVN Likelihood Functions

MVN in Common Methods

Assessing Normality

Wrapping Up

Multivariate Normal Distribution

■ The generalization of the univariate normal distribution to multiple variables is called the multivariate normal distribution (MVN)

■ Many multivariate techniques rely on this distribution in some manner

■ Although real data may never come from a true MVN, the MVN provides a robust approximation, and has many nice mathematical properties

■ Furthermore, because of the central limit theorem, many multivariate statistics converge to the MVN distribution as the sample size increases

Univariate Normal Distribution

■ The univariate normal distribution function is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-[(x-\mu)/\sigma]^2/2}$$

■ The mean is µ

■ The variance is σ²

■ The standard deviation is σ

■ Standard notation for normal distributions is N(µ, σ²), which will be extended for the MVN distribution
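As a rough illustration of the density formula, the sketch below evaluates it with the Python standard library only. The function name `normal_pdf` and its argument names are assumptions made for this example; the second parameter is the variance, matching the N(µ, σ²) notation.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) at x; sigma2 is the variance, not the SD."""
    z2 = (x - mu) ** 2 / sigma2  # squared distance from mu in SD units
    return math.exp(-z2 / 2.0) / math.sqrt(2.0 * math.pi * sigma2)

# Peak heights (density evaluated at the mean) for three normal distributions
peak_standard = normal_pdf(0.0, 0.0, 1.0)    # N(0, 1)
peak_wide = normal_pdf(0.0, 0.0, 2.0)        # N(0, 2): larger variance, lower peak
peak_shifted = normal_pdf(1.75, 1.75, 1.0)   # N(1.75, 1): same peak, shifted

print(round(peak_standard, 4), round(peak_wide, 4), round(peak_shifted, 4))
```

Changing the mean only shifts the curve, while increasing the variance flattens it, which is why the first and third peaks agree.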

Univariate Normal Distribution

[Figure: density f(x) of N(0, 1), plotted for x from −6 to 6]

Univariate Normal Distribution

[Figure: density f(x) of N(0, 2), plotted for x from −6 to 6]

Univariate Normal Distribution

[Figure: density f(x) of N(1.75, 1), plotted for x from −6 to 6]

UVN - Notes

■ The area under the curve for the univariate normal distribution is a function of the variance/standard deviation

■ In particular:

$$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$$

$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$$

■ Also note the term in the exponent:

$$\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu)$$

■ This is the square of the distance from x to µ in standard deviation units, and will be generalized for the MVN
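The two coverage probabilities can be checked numerically. The sketch below uses the error function from the standard library; the helper names `normal_cdf` and `prob_within` are assumptions for this example.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2), via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

p1 = prob_within(1)                           # about 0.683
p2 = prob_within(2)                           # about 0.954
p2_other = prob_within(2, mu=1.75, sigma=3.0) # same value for any mu and sigma

print(round(p1, 3), round(p2, 3))
```

The probability depends only on the distance from the mean measured in standard deviation units, not on the particular values of µ and σ.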

MVN

■ The multivariate normal distribution function is:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \, e^{-(\mathbf{x}-\boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})/2}$$

■ The mean vector is µ

■ The covariance matrix is Σ

■ Standard notation for multivariate normal distributions is $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$

■ Visualizing the MVN is difficult for more than two dimensions, so I will demonstrate some plots with two variables - the bivariate normal distribution
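For the bivariate case (p = 2) the density can be evaluated by hand, since a 2×2 matrix has a simple closed-form determinant and inverse. The following is a minimal sketch under that assumption; the function name `bvn_pdf` and the nested-list representation of Σ are choices made for this example.

```python
import math

def bvn_pdf(x, mu, sigma):
    """Bivariate normal density; sigma is a 2x2 covariance matrix (nested lists)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Squared statistical distance (x - mu)' Sigma^{-1} (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    p = 2
    return math.exp(-q / 2.0) / ((2.0 * math.pi) ** (p / 2) * math.sqrt(det))

# Height of the density at the mean for two covariance matrices
height_uncorrelated = bvn_pdf([0, 0], [0, 0], [[1, 0], [0, 1]])
height_correlated = bvn_pdf([0, 0], [0, 0], [[1, 0.5], [0.5, 1]])

print(round(height_uncorrelated, 4), round(height_correlated, 4))
```

With unit variances, a covariance of 0.5 shrinks the determinant of Σ, so the same probability mass is concentrated in a smaller area and the peak is higher.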

Bivariate Normal Plot #1

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density, both variables from −4 to 4, density axis from 0 to 0.16]

Bivariate Normal Plot #1a

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: contour plot of the same density, both axes from −4 to 4]

Bivariate Normal Plot #2

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density, both variables from −4 to 4, density axis from 0 to 0.2]

Bivariate Normal Plot #2

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: contour plot of the same density, both axes from −4 to 4]

MVN Contours

■ The lines of the contour plots denote places of equal probability density for the MVN distribution
◆ The lines represent points of both variables that lead to the same height on the z-axis (the height of the surface)

■ These contours can be constructed from the eigenvalues and eigenvectors of the covariance matrix
◆ The ellipse axes point in the direction of the eigenvectors
◆ The half-length of each ellipse axis is the constant c times the square root of the corresponding eigenvalue

■ Specifically:

$$(\mathbf{x}-\boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) = c^2$$

has ellipsoids centered at µ, and has axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$

MVN Contours, Continued

■ Contours are useful because they provide confidence regions for data points from the MVN distribution

■ The multivariate analog of a confidence interval is given by an ellipsoid, where c² is the upper (100α)th percentile of the Chi-Squared distribution with p degrees of freedom

■ Specifically:

$$(\mathbf{x}-\boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \le \chi^2_p(\alpha)$$

provides the confidence region containing 1 − α of the probability mass of the MVN distribution

MVN Contour Example

■ Imagine we had a bivariate normal distribution with:

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

■ The covariance matrix has eigenvalues and eigenvectors (eigenvectors in the columns of E):

$$\boldsymbol{\lambda} = \begin{bmatrix} 1.5 \\ 0.5 \end{bmatrix}, \quad \mathbf{E} = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}$$

■ We want to find a contour where 95% of the probability will fall, corresponding to $\chi^2_2(0.05) = 5.99$

MVN Contour Example

■ This contour will be centered at µ

■ Axis 1:

$$\boldsymbol{\mu} \pm \sqrt{5.99 \times 1.5} \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} 2.12 \\ 2.12 \end{bmatrix}, \begin{bmatrix} -2.12 \\ -2.12 \end{bmatrix}$$

■ Axis 2:

$$\boldsymbol{\mu} \pm \sqrt{5.99 \times 0.5} \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} -1.22 \\ 1.22 \end{bmatrix}, \begin{bmatrix} 1.22 \\ -1.22 \end{bmatrix}$$
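The axis endpoints in this example can be reproduced with a short script. This is only a sketch for the 2×2 case: it relies on the closed-form eigenvalues of a symmetric 2×2 matrix with a nonzero off-diagonal element, and on the fact that for p = 2 the upper chi-square percentile is −2 ln(α). All variable and function names are assumptions for this example.

```python
import math

mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
alpha = 0.05

# For p = 2 only: P(chi-square_2 > x) = exp(-x / 2), so the upper quantile is -2 ln(alpha)
c2 = -2.0 * math.log(alpha)

# Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]] (assumes b != 0)
a, b, d = sigma[0][0], sigma[0][1], sigma[1][1]
half_trace = (a + d) / 2.0
root = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
eigenvalues = [half_trace + root, half_trace - root]

def unit_eigenvector(lam):
    """Unit-length solution of (Sigma - lam * I) v = 0 when b != 0."""
    v = [b, lam - a]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]

# Endpoints of each ellipse axis: mu +/- sqrt(c2 * lambda_i) * e_i
axes = []
for lam in eigenvalues:
    e = unit_eigenvector(lam)
    half_length = math.sqrt(c2 * lam)
    axes.append([(mu[0] + s * half_length * e[0], mu[1] + s * half_length * e[1])
                 for s in (1, -1)])

for endpoints in axes:
    print([tuple(round(v, 2) for v in pt) for pt in endpoints])
```

An eigenvector is only defined up to sign, so the script may list the two endpoints of an axis in the opposite order from the slide; the set of endpoints is the same.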

MVN Properties

■ The MVN distribution has some convenient properties

■ If X has a multivariate normal distribution, then:

1. Linear combinations of X are normally distributed

2. All subsets of the components of X have a MVN distribution

3. Zero covariance implies that the corresponding components are independently distributed

4. The conditional distributions of the components are MVN

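Linear Combinations

■ If $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then any set of q linear combinations of the variables, $\mathbf{A}_{(q \times p)}$, is also normally distributed as $\mathbf{AX} \sim N_q(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$

■ For example, let p = 3 and Y be the difference between $X_1$ and $X_2$. The combination matrix would be

$$\mathbf{A} = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix}$$

For X

$$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix}$$

For Y = AX

$$\boldsymbol{\mu}_Y = \begin{bmatrix} \mu_1 - \mu_2 \end{bmatrix}, \quad \boldsymbol{\Sigma}_Y = \begin{bmatrix} \sigma_{11} + \sigma_{22} - 2\sigma_{12} \end{bmatrix}$$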
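To make the difference-score example concrete, the sketch below computes Aµ and AΣA′ with plain nested lists. The numeric values of µ and Σ are hypothetical, chosen only for illustration, and the helpers `matmul` and `transpose` are assumptions for this example.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

A = [[1.0, -1.0, 0.0]]               # Y = X1 - X2

# Hypothetical parameter values, for illustration only
mu = [[2.0], [1.0], [0.0]]
Sigma = [[4.0, 1.0, 0.5],
         [1.0, 3.0, 0.2],
         [0.5, 0.2, 2.0]]

mu_Y = matmul(A, mu)                                # mu_1 - mu_2
Sigma_Y = matmul(matmul(A, Sigma), transpose(A))    # sigma_11 + sigma_22 - 2 sigma_12

print(mu_Y, Sigma_Y)
```

With these values the variance of the difference is 4 + 3 − 2(1) = 5, which agrees with the closed-form expression for $\boldsymbol{\Sigma}_Y$.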

MVN Properties

■ The MVN distribution is characterized by two parameters:
◆ The mean vector µ
◆ The covariance matrix Σ

■ The maximum likelihood estimates for these parameters are given by:

◆ The mean vector:

$$\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i = \frac{1}{n} \mathbf{X}'\mathbf{1}$$

◆ The covariance matrix:

$$\mathbf{S} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = \frac{1}{n} (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')$$
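A minimal sketch of these two estimates follows, using a small hypothetical data matrix with n = 3 observations on p = 2 variables. Note the divisor n (the maximum likelihood estimate) rather than n − 1.

```python
# Hypothetical data: rows are observations, columns are variables
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 9.0]]

n = len(X)
p = len(X[0])

# Mean vector: average of each column
xbar = [sum(row[j] for row in X) / n for j in range(p)]

# ML covariance matrix: averaged outer products of deviations (divisor n)
S = [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / n
      for k in range(p)] for j in range(p)]

print(xbar)
print(S)
```

Multiplying S by n / (n − 1) would give the usual unbiased sample covariance matrix instead.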

Distribution of $\bar{\mathbf{x}}$ and $\mathbf{S}$

Recall that back in univariate statistics you discussed the Central Limit Theorem (CLT)

It stated that, whether the set of n observations $x_1, x_2, \ldots, x_n$ were normal or not...

■ The distribution of $\bar{x}$ would be approximately normal (for large n) with mean equal to µ and variance σ²/n

■ We were also told that, for normal data, $(n-1)s^2/\sigma^2$ had a Chi-Square distribution with n − 1 degrees of freedom

■ Note: We ended up using these pieces of information for hypothesis testing such as t-test and ANOVA.

Distribution of $\bar{\mathbf{x}}$ and $\mathbf{S}$

We also have a Multivariate Central Limit Theorem (CLT)

It states that, whether the set of n observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are multivariate normal or not...

■ The distribution of $\bar{\mathbf{x}}$ would be approximately normal (for large n) with mean equal to µ and variance/covariance matrix Σ/n

■ We are also told that, for MVN data, $(n-1)\mathbf{S}$ will have a Wishart distribution, $W_p(n-1, \boldsymbol{\Sigma})$, with n − 1 degrees of freedom
◆ This is the multivariate analogue to a Chi-Square distribution

■ Note: We will end up using some of this information for multivariate hypothesis testing

Distribution of $\bar{\mathbf{x}}$ and $\mathbf{S}$

■ Therefore, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be independent observations from a population with mean µ and covariance Σ

■ The following are true:

◆ $\sqrt{n}\,(\bar{\mathbf{X}} - \boldsymbol{\mu})$ is approximately $N_p(\mathbf{0}, \boldsymbol{\Sigma})$

◆ $n\,(\bar{\mathbf{X}} - \boldsymbol{\mu})' \mathbf{S}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu})$ is approximately $\chi^2_p$

Sufficient Statistics

■ The sample estimates $\bar{\mathbf{X}}$ and $\mathbf{S}$ are sufficient statistics

■ This means that all of the information contained in the data can be summarized by these two statistics alone

■ This is only true if the data follow a multivariate normal distribution - if they do not, other terms are needed (e.g., a skewness array, a kurtosis array, etc.)

■ Some statistical methods only use one or both of these matrices in their analysis procedures and not the actual data

Density and Likelihood Functions

■ The MVN distribution is often the core statistical distribution for a uni- or multivariate statistical technique

■ Maximum likelihood estimates are preferable in statistics due to a set of desirable asymptotic properties, including:
◆ Consistency: the estimator converges in probability to the value being estimated
◆ Asymptotic Normality: in large samples the estimator has a normal distribution with a functionally known variance
◆ Efficiency: no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE

■ The form of the MVN ML function frequently appears in statistics, so we will briefly discuss MLE using normal distributions

An Introduction to Maximum Likelihood

■ Maximum likelihood estimation seeks to find parameters of a statistical model (mapping onto the mean vector and/or covariance matrix) such that the statistical likelihood function is maximized

■ The method assumes data follow a statistical distribution, in our case the MVN

■ More frequently, the log-likelihood function is used instead of the likelihood function
◆ The “logged” and “un-logged” versions of the function have a maximum at the same point
◆ The “logged” version is easier mathematically

Maximum Likelihood for Univariate Normal

■ We will start with the univariate normal case and then generalize

■ Imagine we have a sample of data, X, which we will assume is normally distributed with an unknown mean but a known variance (say the variance is 1)

■ We will build the maximum likelihood function for the mean

■ Our function rests on two assumptions:
1. All data follow a normal distribution
2. All observations are independent

■ Put into statistical terms: X is independent and identically distributed (iid) as $N_1(\mu, 1)$

Building the Likelihood Function

■ Each observation, then, follows a normal distribution with the same mean (unknown) and variance (1)

■ The likelihood function begins with the density, the function that provides the normal curve (with (1) in place of σ²):

$$f(X_i \mid \mu) = \frac{1}{\sqrt{2\pi(1)}} \exp\left(-\frac{(X_i - \mu)^2}{2(1)}\right)$$

■ The density provides the “likelihood” of observing an observation $X_i$ for a given value of µ (and a known value of σ² = 1)

■ The “likelihood” is the height of the normal curve

The One-Observation Likelihood Function

The graph shows $f(X_i \mid \mu = 1)$ for a range of X

[Figure: the normal curve f(X | mu) with µ = 1, plotted for X from −4 to 4, with vertical lines at X = 0 and X = 1]

The vertical lines indicate:
■ $f(X_i = 0 \mid \mu = 1) = .242$
■ $f(X_i = 1 \mid \mu = 1) = .399$
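The two heights can be worked out directly from the density with σ² = 1, as a quick check of the arithmetic:

```latex
\begin{align*}
f(X_i = 0 \mid \mu = 1) &= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(0 - 1)^2}{2}\right)
  = 0.3989 \times 0.6065 \approx 0.242 \\
f(X_i = 1 \mid \mu = 1) &= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(1 - 1)^2}{2}\right)
  = 0.3989 \times 1 \approx 0.399
\end{align*}
```

An observation equal to the mean sits at the peak of the curve, so it has the largest possible likelihood for that value of µ.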

The Overall Likelihood Function

■ Because we have a sample of N observations, our likelihood function is taken across all observations, not just one

■ The “joint” likelihood function uses the assumption that observations are independent to be expressed as a product of likelihood functions across all observations:

$$L(\mathbf{x} \mid \mu) = f(X_1 \mid \mu) \times f(X_2 \mid \mu) \times \cdots \times f(X_N \mid \mu)$$

$$L(\mathbf{x} \mid \mu) = \prod_{i=1}^{N} f(X_i \mid \mu) = \left(\frac{1}{2\pi(1)}\right)^{N/2} \exp\left(-\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{2(1)}\right)$$

■ The value of µ that maximizes $L(\mathbf{x} \mid \mu)$ is the MLE (in this case, it’s the sample mean)

■ In more complicated models, the MLE does not have a closed form and therefore must be found using numeric methods
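As a small illustration of finding an MLE numerically, the sketch below evaluates the log of the joint likelihood (variance fixed at 1) over a grid of candidate means and picks the best one. The data values are hypothetical, and a grid search stands in for the numeric optimizers that real software would use.

```python
import math

data = [0.5, 1.2, 2.3, 0.9, 1.6]   # hypothetical sample

def log_likelihood(mu, xs):
    """Log of the joint N(mu, 1) likelihood for independent observations."""
    n = len(xs)
    return -0.5 * n * math.log(2.0 * math.pi) - sum((x - mu) ** 2 for x in xs) / 2.0

# Crude numeric search: candidate means from -3 to 3 in steps of 0.001
grid = [i / 1000.0 for i in range(-3000, 3001)]
mu_hat = max(grid, key=lambda m: log_likelihood(m, data))

sample_mean = sum(data) / len(data)
print(mu_hat, sample_mean)
```

Here the grid search lands on the sample mean, as the closed-form result says it should; in models without a closed form, the same idea is carried out with iterative optimization rather than a grid.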

The Overall Log-Likelihood Function

■ For an unknown mean µ and variance σ², the likelihood function is:

$$L(\mathbf{x} \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{2\sigma^2}\right)$$

■ More commonly, the log-likelihood function is used:

$$\log L(\mathbf{x} \mid \mu, \sigma^2) = -\frac{N}{2} \log\left(2\pi\sigma^2\right) - \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{2\sigma^2}$$
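In this simple case the maximum can be found in closed form. A brief sketch of the derivation, writing $\ell$ for the log-likelihood, sets both partial derivatives to zero:

```latex
\begin{align*}
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2} \sum_{i=1}^{N} (X_i - \mu) = 0
  &&\Rightarrow\quad \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} X_i = \bar{X} \\
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{N}{2\sigma^2} + \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{2\sigma^4} = 0
  &&\Rightarrow\quad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2
\end{align*}
```

The variance estimate uses the divisor N, not N − 1, which is the univariate counterpart of the 1/n divisor in the maximum likelihood estimate of the covariance matrix.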

The Multivariate Normal Likelihood Function

■ For a set of N independent observations on p variables, $\mathbf{X}_{(N \times p)}$, the multivariate normal likelihood function is formed by using a similar approach

■ For an unknown mean vector µ and covariance Σ, the joint likelihood is:

$$L(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{i=1}^{N} \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-(\mathbf{x}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})/2\right)$$

$$= \frac{1}{(2\pi)^{Np/2} |\boldsymbol{\Sigma}|^{N/2}} \exp\left(-\sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})/2\right)$$

The Multivariate Normal Likelihood Function

■ Occasionally, a more intricate form of the MVN likelihood function shows up

■ Although mathematically identical to the function on the last page, this version typically appears without explanation:

$$L(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-Np/2} |\boldsymbol{\Sigma}|^{-N/2} \exp\left(-\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\left(\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' + N(\bar{\mathbf{x}} - \boldsymbol{\mu})(\bar{\mathbf{x}} - \boldsymbol{\mu})'\right)\right]/2\right)$$
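A short sketch of why the trace form equals the sum of quadratic forms may help. It uses two facts: a scalar equals its own trace, and $\mathrm{tr}(\mathbf{AB}) = \mathrm{tr}(\mathbf{BA})$.

```latex
\begin{align*}
\sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
  &= \sum_{i=1}^{N} \mathrm{tr}\left[\boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})'\right]
   = \mathrm{tr}\left[\boldsymbol{\Sigma}^{-1} \sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})'\right] \\
\sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})'
  &= \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'
   + N (\bar{\mathbf{x}} - \boldsymbol{\mu})(\bar{\mathbf{x}} - \boldsymbol{\mu})'
\end{align*}
```

The second line follows from writing $\mathbf{x}_i - \boldsymbol{\mu} = (\mathbf{x}_i - \bar{\mathbf{x}}) + (\bar{\mathbf{x}} - \boldsymbol{\mu})$; the cross terms vanish because $\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}}) = \mathbf{0}$. The data therefore enter the likelihood only through the sample mean vector and the matrix of summed cross-products.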

The MVN Log-Likelihood Function

■ As with the univariate case, the MVN likelihood function is typically converted into a log-likelihood function for simplicity

■ The MVN log-likelihood function is given by:

$$l(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{Np}{2} \log(2\pi) - \frac{N}{2} \log\left(|\boldsymbol{\Sigma}|\right) - \frac{1}{2} \sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})$$
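The sketch below evaluates the MVN log-likelihood for a bivariate example in two ways: summing the quadratic form over observations, and using only the sample mean vector and the matrix of summed cross-products (the trace form). The data and parameter values are hypothetical, and the 2×2 inverse is written out by hand to keep the example free of external libraries.

```python
import math

# Hypothetical data (N = 4 observations, p = 2 variables) and parameter values
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0], [2.0, 1.0]]
mu = [2.0, 3.0]
Sigma = [[4.0, 2.0], [2.0, 9.0]]

N, p = len(X), 2
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
       [-Sigma[1][0] / det, Sigma[0][0] / det]]

# Terms that do not involve the data
const = -0.5 * N * p * math.log(2.0 * math.pi) - 0.5 * N * math.log(det)

def quad(d):
    """Quadratic form d' Sigma^{-1} d."""
    return sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p))

# Version 1: sum the quadratic form over the raw observations
loglik_direct = const - 0.5 * sum(quad([x[0] - mu[0], x[1] - mu[1]]) for x in X)

# Version 2: use only the sample mean and the summed cross-products matrix W
xbar = [sum(x[j] for x in X) / N for j in range(p)]
W = [[sum((x[j] - xbar[j]) * (x[k] - xbar[k]) for x in X) for k in range(p)]
     for j in range(p)]
d = [xbar[0] - mu[0], xbar[1] - mu[1]]
M = [[W[j][k] + N * d[j] * d[k] for k in range(p)] for j in range(p)]
trace_term = sum(inv[j][k] * M[k][j] for j in range(p) for k in range(p))
loglik_suff = const - 0.5 * trace_term

print(loglik_direct, loglik_suff)
```

The two values agree up to floating-point error, which is what it means for the mean vector and covariance matrix to be sufficient: once they are computed, the raw data are no longer needed to evaluate the likelihood.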

But...Why?

■ The MVN distribution, likelihood, and log-likelihood functions show up frequently in statistical methods

■ Commonly used methods rely on versions of the distribution, such as:
◆ Linear models (ANOVA, Regression)
◆ Mixed models (i.e., hierarchical linear models, random effects models, multilevel models)
◆ Path models/simultaneous equation models
◆ Structural equation models (and confirmatory factor models)
◆ Many versions of finite mixture models

■ Understanding the form of the MVN distribution will help to understand the commonalities between each of these models

MVN in Mixed Models

■ From SAS’ manual for proc mixed:

[Figure: excerpt from the SAS manual, not recoverable from the source]

MVN in Structural Equation Models

■ From SAS’ manual for proc calis:

[Figure: excerpt from the SAS manual, not recoverable from the source]

■ This package uses only the covariance matrix, so the form of the likelihood function is phrased using only the Wishart Distribution:

$$w(\mathbf{S} \mid \boldsymbol{\Sigma}) = \frac{|\mathbf{S}|^{(n-p-2)/2} \exp\left(-\mathrm{tr}\left[\mathbf{S}\boldsymbol{\Sigma}^{-1}\right]/2\right)}{2^{p(n-1)/2} \, \pi^{p(p-1)/4} \, |\boldsymbol{\Sigma}|^{(n-1)/2} \prod_{i=1}^{p} \Gamma\left(\frac{1}{2}(n-i)\right)}$$

Assessing Normality

■ Recall from earlier that IF the data have a Multivariate normal distribution then all of the previously discussed properties will hold

■ There are a host of methods that have been developed to assess multivariate normality - just look in Johnson & Wichern

■ Given the relative robustness of the MVN distribution, I will skip this topic, acknowledging that extreme deviations from normality will result in poorly performing statistics

■ More often than not, assessing MV normality is fraught with difficulty due to sample-estimated parameters of the distribution

Transformations to Near Normality

■ Historically, people have gone on an expedition to find a transformation to near-normality when learning their data may not be MVN

■ Modern statistical methods, however, make that a very bad idea

■ More often than not, transformations end up changing the nature of the statistics you are interested in forming

■ Furthermore, not all data need to be MVN (think conditional distributions)

Final Thoughts

■ The multivariate normal distribution is an analog to the univariate normal distribution

■ The MVN distribution will play a large role in the upcoming weeks

■ We can finally put the background material to rest, and begin learning some statistical methods

■ Tomorrow: lab with SAS - the “fun” of proc iml

■ Up next week: Inferences about Mean Vectors and Multivariate ANOVA
