Estimating the Mean and Covariance
▶ Suppose that X ∼ N_p(µ, Σ) and that x_1, x_2, …, x_N are the observed values in a random sample of size N from this distribution.
▶ The likelihood function is
\[
L = \prod_{\alpha=1}^{N} n_p(x_\alpha \mid \mu, \Sigma)
  = \frac{1}{\left(\det 2\pi\Sigma\right)^{N/2}}
    \exp\left[-\frac{1}{2}\sum_{\alpha=1}^{N} (x_\alpha - \mu)'\,\Sigma^{-1}(x_\alpha - \mu)\right].
\]
Statistics 784 (Multivariate Analysis), NC State University
▶ The data values x_1, x_2, …, x_N are fixed; we view L as a function of µ and Σ, and to distinguish between the population parameters and variables, we denote the latter µ* and Σ*.
▶ We often work with
\[
-2\log L = N\log(\det 2\pi\Sigma^*)
  + \sum_{\alpha=1}^{N} (x_\alpha - \mu^*)'\,\Sigma^{*-1}(x_\alpha - \mu^*).
\]
▶ Because log(·) is monotone increasing, maximizing L is equivalent to minimizing −2 log L.
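As a quick numerical illustration (not part of the slides), −2 log L can be evaluated directly from the formula above; the data, dimensions, and the function name `neg2_log_lik` below are illustrative assumptions.

```python
import numpy as np

# Sketch: evaluate -2 log L = N log(det 2*pi*Sigma*) + sum of quadratic forms
# for simulated data; all values here are made up for the illustration.
rng = np.random.default_rng(0)
p, N = 3, 200
mu = np.zeros(p)
Sigma = np.eye(p)
x = rng.multivariate_normal(mu, Sigma, size=N)      # rows are x_1, ..., x_N

def neg2_log_lik(x, mu_star, Sigma_star):
    """-2 log L evaluated at (mu*, Sigma*)."""
    N, _ = x.shape
    dev = x - mu_star                                # x_alpha - mu*
    Sinv = np.linalg.inv(Sigma_star)
    quad = np.einsum('ai,ij,aj->', dev, Sinv, dev)   # sum of quadratic forms
    sign, logdet = np.linalg.slogdet(2 * np.pi * Sigma_star)
    return N * logdet + quad

val = neg2_log_lik(x, mu, Sigma)
```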
▶ The sample mean vector is
\[
\bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha
 = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}.
\]
▶ The matrix of sums of squares and cross products of deviations is
\[
A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})',
\]
with (i, j) element
\[
\sum_{\alpha=1}^{N} (x_{i,\alpha} - \bar{x}_i)(x_{j,\alpha} - \bar{x}_j).
\]
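Both quantities take one array operation each; a minimal sketch, assuming the observations are stored as the rows of a NumPy array:

```python
import numpy as np

# Minimal sketch (illustrative data): x has one observation x_alpha per row.
rng = np.random.default_rng(1)
N, p = 50, 4
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)        # sample mean vector, length p
dev = x - xbar               # deviations x_alpha - xbar, row by row
A = dev.T @ dev              # sum of (x_alpha - xbar)(x_alpha - xbar)'
```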
▶ Note that for any ξ_1, ξ_2, …, ξ_N and any positive definite Σ*,
\[
\sum_{\alpha=1}^{N} \xi_\alpha'\,\Sigma^{*-1}\xi_\alpha
 = \sum_{\alpha=1}^{N} \operatorname{trace}\left(\Sigma^{*-1}\xi_\alpha\xi_\alpha'\right)
 = \operatorname{trace}\left(\Sigma^{*-1}\sum_{\alpha=1}^{N}\xi_\alpha\xi_\alpha'\right).
\]
▶ Note also the matrix generalization of the scalar result:
\[
\sum_{\alpha=1}^{N} (x_\alpha - b)(x_\alpha - b)'
 = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})'
 + N(\bar{x} - b)(\bar{x} - b)'.
\]
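The identity is easy to verify numerically; the data and the fixed vector b below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check of: sum (x_a - b)(x_a - b)' = A + N (xbar - b)(xbar - b)'
rng = np.random.default_rng(2)
N, p = 30, 3
x = rng.standard_normal((N, p))
b = rng.standard_normal(p)            # any fixed vector b

xbar = x.mean(axis=0)
A = (x - xbar).T @ (x - xbar)
lhs = (x - b).T @ (x - b)
rhs = A + N * np.outer(xbar - b, xbar - b)
```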
▶ So −2 log L, as a function of µ* and Σ*, can be written
\[
-2\log L = N\log(\det 2\pi\Sigma^*)
 + \operatorname{trace}\left(\Sigma^{*-1}A\right)
 + N(\bar{x} - \mu^*)'\,\Sigma^{*-1}(\bar{x} - \mu^*).
\]
▶ Since µ* appears only in the last term, −2 log L is clearly minimized at µ* = x̄, regardless of the value of Σ*.
▶ The maximum likelihood estimate of µ is therefore
\[
\hat{\mu} = \bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha,
\]
whether Σ is known or unknown.
▶ To find the Σ* that minimizes −2 log L, use the result that for any positive definite B,
\[
\operatorname{trace}(CB) - N\log\det C
\]
is minimized at C = NB^{−1}.
▶ So for any µ*, −2 log L is minimized at
\[
\Sigma^* = \frac{1}{N}\left[A + N(\bar{x} - \mu^*)(\bar{x} - \mu^*)'\right].
\]
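The minimizer C = NB^{−1} can be checked by comparing against random positive definite alternatives; everything below is an illustrative sketch:

```python
import numpy as np

# Sketch: f(C) = trace(CB) - N log det C should be smallest at C = N * inv(B).
rng = np.random.default_rng(3)
N, p = 10, 3
R = rng.standard_normal((p, p))
B = R @ R.T + p * np.eye(p)              # an arbitrary positive definite B

def f(C):
    return np.trace(C @ B) - N * np.log(np.linalg.det(C))

C_star = N * np.linalg.inv(B)
best = f(C_star)
```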
▶ In particular, when µ is unknown and is estimated by µ̂ = x̄, the maximum likelihood estimate of Σ is
\[
\hat{\Sigma} = \frac{1}{N}A
 = \frac{1}{N}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'.
\]
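The MLE Σ̂ = A/N is exactly NumPy's covariance with the biased (divide-by-N) normalization; a small sketch on simulated data:

```python
import numpy as np

# Sketch: the MLE Sigma_hat = A / N equals np.cov with bias=True.
rng = np.random.default_rng(4)
N, p = 40, 3
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)
A = (x - xbar).T @ (x - xbar)
Sigma_hat = A / N
```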
Distribution of the Sample Mean
▶ If we write Z as the (Np)-dimensional random vector
\[
Z = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix},
\]
its density is
\[
\frac{1}{\sqrt{\det 2\pi\Sigma_Z}}
 \exp\left[-\frac{1}{2}(z - \mu_Z)'\,\Sigma_Z^{-1}(z - \mu_Z)\right],
\]
▶ where
\[
\mu_Z = \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}
\quad\text{and}\quad
\Sigma_Z = \begin{pmatrix}
\Sigma & 0 & \cdots & 0 \\
0 & \Sigma & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma
\end{pmatrix}.
\]
▶ That is, Z ∼ N_{Np}(µ_Z, Σ_Z).
▶ Also X̄ = CZ, where
\[
\underset{p \times Np}{C} = \frac{1}{N}\,(I \;\; I \;\; \cdots \;\; I).
\]
▶ So X̄ ∼ N_p(µ_{X̄}, Σ_{X̄}), where
\[
\mu_{\bar{X}} = C\mu_Z = \mu,
\qquad
\Sigma_{\bar{X}} = C\Sigma_Z C' = \frac{1}{N}\Sigma.
\]
▶ That is, X̄ ∼ N_p(µ, (1/N)Σ).
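A simulation makes the result concrete: across many replications, the sample means should have covariance close to Σ/N. The dimensions and Σ below are illustrative choices:

```python
import numpy as np

# Simulation sketch: cov(Xbar) across replications should be about Sigma / N.
rng = np.random.default_rng(5)
p, N, reps = 2, 20, 50000
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=(reps, N))  # (reps, N, p)
xbars = samples.mean(axis=1)          # one sample mean vector per replication
emp_mean = xbars.mean(axis=0)
emp_cov = np.cov(xbars, rowvar=False)
```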
Joint Distribution of the Sample Mean and Covariance Matrix
▶ To find the joint distribution of µ̂ and Σ̂, we use the following:
▶ Suppose that X_1, X_2, …, X_N are independent, and X_α ∼ N_p(µ_α, Σ).
▶ If C is an N × N orthogonal matrix, and we transform from X_1, X_2, …, X_N to Y_1, Y_2, …, Y_N using
\[
Y_\alpha = \sum_{\beta=1}^{N} c_{\alpha,\beta} X_\beta,
\qquad \alpha = 1, 2, \ldots, N,
\]
then Y_1, Y_2, …, Y_N are independent, and Y_α ∼ N_p(ν_α, Σ), where
\[
\nu_\alpha = \sum_{\beta=1}^{N} c_{\alpha,\beta}\mu_\beta.
\]
▶ The proof is similar to the previous argument.
▶ In the case of a random sample from N_p(µ, Σ), the means are equal: µ_1 = µ_2 = ⋯ = µ_N.
▶ Also, we can construct C so that its last row is
\[
\left(\frac{1}{\sqrt{N}}, \frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right).
\]
▶ Then Y_N = √N X̄.
▶ Also, because the other rows of C are orthogonal to the last,
\[
\sum_{\beta=1}^{N} c_{\alpha,\beta} = 0, \qquad \alpha < N,
\]
so Y_α ∼ N_p(0, Σ) for α = 1, 2, …, N − 1.
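Such a C can be built explicitly, e.g. by completing the constant vector to an orthonormal basis with a QR factorization; the construction below is one illustrative choice (any orthogonal C with this last row works):

```python
import numpy as np

# Sketch: build an orthogonal C whose last row is (1/sqrt(N), ..., 1/sqrt(N)).
rng = np.random.default_rng(6)
N, p = 6, 3

first = np.ones(N) / np.sqrt(N)
M = np.column_stack([first, rng.standard_normal((N, N - 1))])
Q, _ = np.linalg.qr(M)            # orthonormal columns; first spans 'first'
C = Q.T[::-1].copy()              # use the columns as rows, constant row last
if C[-1, 0] < 0:
    C[-1] *= -1                   # fix the sign so the last row is +1/sqrt(N)

X = rng.standard_normal((N, p))   # rows are X_1', ..., X_N' (illustrative data)
Y = C @ X                         # Y_alpha = sum_beta c_{alpha,beta} X_beta
```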
▶ Again because of the orthogonality of C,
\[
\sum_{\alpha=1}^{N} X_\alpha X_\alpha' = \sum_{\alpha=1}^{N} Y_\alpha Y_\alpha'.
\]
▶ But Y_N Y_N' = N X̄ X̄', so
\[
A = \sum_{\alpha=1}^{N} X_\alpha X_\alpha' - N\bar{X}\bar{X}'
 = \sum_{\alpha=1}^{N-1} Y_\alpha Y_\alpha'.
\]
▶ Since Y_N is independent of Y_1, Y_2, …, Y_{N−1}, it follows that X̄ is independent of A.
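The identity A = Σ_{α&lt;N} Y_α Y_α' can also be checked numerically; the orthogonal C below is built the same illustrative way (QR completion of the constant vector):

```python
import numpy as np

# Sketch: verify  sum X X' = sum Y Y'  and  A = sum_{alpha<N} Y_alpha Y_alpha'
# for an orthogonal C whose last row is constant.
rng = np.random.default_rng(7)
N, p = 8, 2

M = np.column_stack([np.ones(N) / np.sqrt(N), rng.standard_normal((N, N - 1))])
Q, _ = np.linalg.qr(M)
C = Q.T[::-1].copy()              # constant row moved to the last position
if C[-1, 0] < 0:
    C[-1] *= -1

X = rng.standard_normal((N, p))   # illustrative data, rows X_1', ..., X_N'
Y = C @ X
xbar = X.mean(axis=0)
A = (X - xbar).T @ (X - xbar)
```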
Summary
▶ The maximum likelihood estimators µ̂ and Σ̂ are independent.
▶ µ̂ = X̄ ∼ N_p(µ, (1/N)Σ).
▶ Σ̂ = (1/N)A is distributed as
\[
\frac{1}{N}\sum_{\alpha=1}^{N-1} Y_\alpha Y_\alpha',
\]
where Y_1, Y_2, …, Y_{N−1} are independent N_p(0, Σ).
▶ The distribution of A is the Wishart distribution W_p(Σ, N − 1).
▶ Since E(Y_α Y_α') = Σ for each α, E(A) = (N − 1)Σ, so Σ̂ is a biased estimator of Σ, and we often use the sample covariance matrix
\[
S = \frac{N}{N-1}\hat{\Sigma}
 = \frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})',
\]
which is the sample value of an unbiased estimator.
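S matches NumPy's default (unbiased, divide by N − 1) covariance normalization; a small check on simulated data:

```python
import numpy as np

# Sketch: S = N/(N-1) * Sigma_hat equals np.cov's default (ddof=1).
rng = np.random.default_rng(8)
N, p = 25, 3
x = rng.standard_normal((N, p))

xbar = x.mean(axis=0)
Sigma_hat = (x - xbar).T @ (x - xbar) / N    # biased MLE
S = N / (N - 1) * Sigma_hat                  # unbiased sample covariance
```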