Estimating the Mean and Covariance
▶ Suppose that X ∼ N_p(µ, Σ) and that x_1, x_2, …, x_N are the observed values in a random sample of size N from this distribution.
▶ The likelihood function is
\[
L = \prod_{\alpha=1}^{N} n_p(x_\alpha \mid \mu, \Sigma)
  = \frac{1}{\left(\det 2\pi\Sigma\right)^{N/2}}
    \exp\left[-\frac{1}{2}\sum_{\alpha=1}^{N} (x_\alpha - \mu)'\,\Sigma^{-1}(x_\alpha - \mu)\right].
\]
Statistics 784 (Multivariate Analysis), NC State University
▶ The data values x_1, x_2, …, x_N are fixed; we view L as a function of µ and Σ, and to distinguish between the population parameters and variables, we denote the latter µ* and Σ*.
▶ We often work with
\[
-2\log L = N\log(\det 2\pi\Sigma^*)
  + \sum_{\alpha=1}^{N} (x_\alpha - \mu^*)'\,\Sigma^{*-1}(x_\alpha - \mu^*).
\]
▶ Because log(·) is monotone increasing, maximizing L is equivalent to minimizing −2 log L.
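As a quick numerical illustration (not part of the slides), −2 log L can be evaluated directly from the formula above; the data, dimensions, and the function name `neg2_log_lik` below are illustrative assumptions.

```python
import numpy as np

# Sketch: evaluate -2 log L = N log(det 2*pi*Sigma*) + sum of quadratic forms
# for simulated data; all values here are made up for the illustration.
rng = np.random.default_rng(0)
p, N = 3, 200
mu = np.zeros(p)
Sigma = np.eye(p)
x = rng.multivariate_normal(mu, Sigma, size=N)      # rows are x_1, ..., x_N

def neg2_log_lik(x, mu_star, Sigma_star):
    """-2 log L evaluated at (mu*, Sigma*)."""
    N, _ = x.shape
    dev = x - mu_star                                # x_alpha - mu*
    Sinv = np.linalg.inv(Sigma_star)
    quad = np.einsum('ai,ij,aj->', dev, Sinv, dev)   # sum of quadratic forms
    sign, logdet = np.linalg.slogdet(2 * np.pi * Sigma_star)
    return N * logdet + quad

val = neg2_log_lik(x, mu, Sigma)
```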
▶ The sample mean vector is
\[
\bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha
 = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}.
\]
▶ The matrix of sums of squares and cross products of deviations is
\[
A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})',
\]
with (i, j) element
\[
\sum_{\alpha=1}^{N} (x_{i,\alpha} - \bar{x}_i)(x_{j,\alpha} - \bar{x}_j).
\]
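Both quantities take one array operation each; a minimal sketch, assuming the observations are stored as the rows of a NumPy array:

```python
import numpy as np

# Minimal sketch (illustrative data): x has one observation x_alpha per row.
rng = np.random.default_rng(1)
N, p = 50, 4
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)        # sample mean vector, length p
dev = x - xbar               # deviations x_alpha - xbar, row by row
A = dev.T @ dev              # sum of (x_alpha - xbar)(x_alpha - xbar)'
```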
▶ Note that for any ξ_1, ξ_2, …, ξ_N and any positive definite Σ*,
\[
\sum_{\alpha=1}^{N} \xi_\alpha'\,\Sigma^{*-1}\xi_\alpha
 = \sum_{\alpha=1}^{N} \operatorname{trace}\left(\Sigma^{*-1}\xi_\alpha\xi_\alpha'\right)
 = \operatorname{trace}\left(\Sigma^{*-1}\sum_{\alpha=1}^{N}\xi_\alpha\xi_\alpha'\right).
\]
▶ Note also the matrix generalization of the scalar result:
\[
\sum_{\alpha=1}^{N} (x_\alpha - b)(x_\alpha - b)'
 = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})'
 + N(\bar{x} - b)(\bar{x} - b)'.
\]
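The identity is easy to verify numerically; the data and the fixed vector b below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check of: sum (x_a - b)(x_a - b)' = A + N (xbar - b)(xbar - b)'
rng = np.random.default_rng(2)
N, p = 30, 3
x = rng.standard_normal((N, p))
b = rng.standard_normal(p)            # any fixed vector b

xbar = x.mean(axis=0)
A = (x - xbar).T @ (x - xbar)
lhs = (x - b).T @ (x - b)
rhs = A + N * np.outer(xbar - b, xbar - b)
```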
▶ So −2 log L, as a function of µ* and Σ*, can be written
\[
-2\log L = N\log(\det 2\pi\Sigma^*)
 + \operatorname{trace}\left(\Sigma^{*-1}A\right)
 + N(\bar{x} - \mu^*)'\,\Sigma^{*-1}(\bar{x} - \mu^*).
\]
▶ Since µ* appears only in the last term, −2 log L is clearly minimized at µ* = x̄, regardless of the value of Σ*.
▶ The maximum likelihood estimate of µ is therefore
\[
\hat{\mu} = \bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha,
\]
whether Σ is known or unknown.
▶ To find the Σ* that minimizes −2 log L, use the result that for any positive definite B,
\[
\operatorname{trace}(CB) - N\log\det C
\]
is minimized at C = NB^{−1}.
▶ So for any µ*, −2 log L is minimized at
\[
\Sigma^* = \frac{1}{N}\left[A + N(\bar{x} - \mu^*)(\bar{x} - \mu^*)'\right].
\]
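The minimizer C = NB^{−1} can be checked by comparing against random positive definite alternatives; everything below is an illustrative sketch:

```python
import numpy as np

# Sketch: f(C) = trace(CB) - N log det C should be smallest at C = N * inv(B).
rng = np.random.default_rng(3)
N, p = 10, 3
R = rng.standard_normal((p, p))
B = R @ R.T + p * np.eye(p)              # an arbitrary positive definite B

def f(C):
    return np.trace(C @ B) - N * np.log(np.linalg.det(C))

C_star = N * np.linalg.inv(B)
best = f(C_star)
```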
▶ In particular, when µ is unknown and is estimated by µ̂ = x̄, the maximum likelihood estimate of Σ is
\[
\hat{\Sigma} = \frac{1}{N}A
 = \frac{1}{N}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'.
\]
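The MLE Σ̂ = A/N is exactly NumPy's covariance with the biased (divide-by-N) normalization; a small sketch on simulated data:

```python
import numpy as np

# Sketch: the MLE Sigma_hat = A / N equals np.cov with bias=True.
rng = np.random.default_rng(4)
N, p = 40, 3
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)
A = (x - xbar).T @ (x - xbar)
Sigma_hat = A / N
```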
Distribution of the Sample Mean
▶ If we write Z as the (Np)-dimensional random vector
\[
Z = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix},
\]
its density is
\[
\frac{1}{\sqrt{\det 2\pi\Sigma_Z}}
 \exp\left[-\frac{1}{2}(z - \mu_Z)'\,\Sigma_Z^{-1}(z - \mu_Z)\right],
\]
▶ where
\[
\mu_Z = \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}
\quad\text{and}\quad
\Sigma_Z = \begin{pmatrix}
\Sigma & 0 & \cdots & 0 \\
0 & \Sigma & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma
\end{pmatrix}.
\]
▶ That is, Z ∼ N_{Np}(µ_Z, Σ_Z).
▶ Also X̄ = CZ, where
\[
\underset{p \times Np}{C} = \frac{1}{N}\,(I \;\; I \;\; \cdots \;\; I).
\]
▶ So X̄ ∼ N_p(µ_{X̄}, Σ_{X̄}), where
\[
\mu_{\bar{X}} = C\mu_Z = \mu,
\qquad
\Sigma_{\bar{X}} = C\Sigma_Z C' = \frac{1}{N}\Sigma.
\]
▶ That is, X̄ ∼ N_p(µ, (1/N)Σ).
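A simulation makes the result concrete: across many replications, the sample means should have covariance close to Σ/N. The dimensions and Σ below are illustrative choices:

```python
import numpy as np

# Simulation sketch: cov(Xbar) across replications should be about Sigma / N.
rng = np.random.default_rng(5)
p, N, reps = 2, 20, 50000
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=(reps, N))  # (reps, N, p)
xbars = samples.mean(axis=1)          # one sample mean vector per replication
emp_mean = xbars.mean(axis=0)
emp_cov = np.cov(xbars, rowvar=False)
```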
Joint Distribution of the Sample Mean and Covariance Matrix
▶ To find the joint distribution of µ̂ and Σ̂, we use the following:
▶ Suppose that X_1, X_2, …, X_N are independent, and X_α ∼ N_p(µ_α, Σ).
▶ If C is an N × N orthogonal matrix, and we transform from X_1, X_2, …, X_N to Y_1, Y_2, …, Y_N using
\[
Y_\alpha = \sum_{\beta=1}^{N} c_{\alpha,\beta} X_\beta,
\qquad \alpha = 1, 2, \ldots, N,
\]
then Y_1, Y_2, …, Y_N are independent, and Y_α ∼ N_p(ν_α, Σ), where
\[
\nu_\alpha = \sum_{\beta=1}^{N} c_{\alpha,\beta}\mu_\beta.
\]
▶ The proof is similar to the previous argument.
▶ In the case of a random sample from N_p(µ, Σ), the means are equal: µ_1 = µ_2 = ⋯ = µ_N.
▶ Also, we can construct C so that its last row is
\[
\left(\frac{1}{\sqrt{N}}, \frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right).
\]
▶ Then Y_N = √N X̄.
▶ Also, because the other rows of C are orthogonal to the last,
\[
\sum_{\beta=1}^{N} c_{\alpha,\beta} = 0, \qquad \alpha < N,
\]
so Y_α ∼ N_p(0, Σ) for α = 1, 2, …, N − 1.
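Such a C can be built explicitly, e.g. by completing the constant vector to an orthonormal basis with a QR factorization; the construction below is one illustrative choice (any orthogonal C with this last row works):

```python
import numpy as np

# Sketch: build an orthogonal C whose last row is (1/sqrt(N), ..., 1/sqrt(N)).
rng = np.random.default_rng(6)
N, p = 6, 3

first = np.ones(N) / np.sqrt(N)
M = np.column_stack([first, rng.standard_normal((N, N - 1))])
Q, _ = np.linalg.qr(M)            # orthonormal columns; first spans 'first'
C = Q.T[::-1].copy()              # use the columns as rows, constant row last
if C[-1, 0] < 0:
    C[-1] *= -1                   # fix the sign so the last row is +1/sqrt(N)

X = rng.standard_normal((N, p))   # rows are X_1', ..., X_N' (illustrative data)
Y = C @ X                         # Y_alpha = sum_beta c_{alpha,beta} X_beta
```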
▶ Again because of the orthogonality of C,
\[
\sum_{\alpha=1}^{N} X_\alpha X_\alpha' = \sum_{\alpha=1}^{N} Y_\alpha Y_\alpha'.
\]
▶ But Y_N Y_N' = N X̄ X̄', so
\[
A = \sum_{\alpha=1}^{N} X_\alpha X_\alpha' - N\bar{X}\bar{X}'
 = \sum_{\alpha=1}^{N-1} Y_\alpha Y_\alpha'.
\]
▶ Since Y_N is independent of Y_1, Y_2, …, Y_{N−1}, it follows that X̄ is independent of A.
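The identity A = Σ_{α&lt;N} Y_α Y_α' can also be checked numerically; the orthogonal C below is built the same illustrative way (QR completion of the constant vector):

```python
import numpy as np

# Sketch: verify  sum X X' = sum Y Y'  and  A = sum_{alpha<N} Y_alpha Y_alpha'
# for an orthogonal C whose last row is constant.
rng = np.random.default_rng(7)
N, p = 8, 2

M = np.column_stack([np.ones(N) / np.sqrt(N), rng.standard_normal((N, N - 1))])
Q, _ = np.linalg.qr(M)
C = Q.T[::-1].copy()              # constant row moved to the last position
if C[-1, 0] < 0:
    C[-1] *= -1

X = rng.standard_normal((N, p))   # illustrative data, rows X_1', ..., X_N'
Y = C @ X
xbar = X.mean(axis=0)
A = (X - xbar).T @ (X - xbar)
```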
Summary
▶ The maximum likelihood estimators µ̂ and Σ̂ are independent.
▶ µ̂ = X̄ ∼ N_p(µ, (1/N)Σ).
▶ Σ̂ = (1/N)A is distributed as
\[
\frac{1}{N}\sum_{\alpha=1}^{N-1} Y_\alpha Y_\alpha',
\]
where Y_1, Y_2, …, Y_{N−1} are independent N_p(0, Σ).
▶ The distribution of A is the Wishart distribution W_p(Σ, N − 1).
▶ Since E(Y_α Y_α') = Σ for each α, E(A) = (N − 1)Σ, so Σ̂ is a biased estimator of Σ, and we often use the sample covariance matrix
\[
S = \frac{N}{N-1}\hat{\Sigma}
 = \frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})',
\]
which is the sample value of an unbiased estimator.
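S matches NumPy's default (unbiased, divide by N − 1) covariance normalization; a small check on simulated data:

```python
import numpy as np

# Sketch: S = N/(N-1) * Sigma_hat equals np.cov's default (ddof=1).
rng = np.random.default_rng(8)
N, p = 25, 3
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)
Sigma_hat = (x - xbar).T @ (x - xbar) / N    # biased MLE
S = N / (N - 1) * Sigma_hat                  # unbiased sample covariance
```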