
Chapter 2: Multivariate distributions and inference

Pedro Galeano, Departamento de Estadística, Universidad Carlos III de Madrid, [email protected]

Course 2017/2018

Master in Mathematical Engineering

1 Introduction

2 Basic concepts

3 Multivariate distributions

4 Statistical inference

5 Hypothesis testing

Introduction

Multivariate statistics is concerned with analysing and understanding data in more than one (and possibly high) dimension.

Therefore, as in Chapter 1, we assume that we are given a set of $n$ observations of $p$ random variables $x_1,\ldots,x_p$. The $p$ univariate random variables can be summarized in a multivariate random variable $x=(x_1,\ldots,x_p)'$ defined in $\mathbb{R}^p$. In this chapter we give an introduction to the basic probability tools associated with the multivariate random variable $x$ that are useful in multivariate statistical analysis.


In particular, we present:

- the basic probability tools used to describe a multivariate random variable, including the concepts of marginal and conditional distributions and the concept of independence;

- the mean vector, the covariance matrix and the correlation matrix of a multivariate random variable, and their counterparts for marginal and conditional distributions;

- the basic techniques needed to derive the distribution of transformations, with special emphasis on linear transformations;

- several multivariate distributions, including the multivariate Gaussian distribution, along with most of its companion distributions and other interesting alternatives; and

- statistical inference for multivariate samples, including parameter estimation and hypothesis testing.

Basic concepts

We can say that we have the joint distribution of a multivariate random variable when the following are specified:

1 The sample space of possible values, which, in general, is a subset of $\mathbb{R}^p$.

2 The probabilities of each possible result of the sample space.

We say that a $p$-dimensional random variable is discrete when each of the $p$ scalar variables that comprise it is discrete.

Analogously, we say that the variable is continuous if its components are continuous.

Otherwise, the variable is mixed.


Let $x=(x_1,\ldots,x_p)'$ be a multivariate random variable.

The cumulative distribution function (cdf) of $x$ at a point $x^0=\left(x_1^0,\ldots,x_p^0\right)'$ is denoted by $F_x\left(x^0\right)$ and is given by:

$$F_x\left(x^0\right)=\Pr\left(x\le x^0\right)=\Pr\left(x_1\le x_1^0,\ldots,x_p\le x_p^0\right)$$


For continuous multivariate random variables, a nonnegative probability density function (pdf) fx exists, such that:

$$F_x\left(x^0\right)=\int_{-\infty}^{x_1^0}\cdots\int_{-\infty}^{x_p^0}f_x(x_1,\ldots,x_p)\,dx_1\cdots dx_p$$

Note that:

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_x(x_1,\ldots,x_p)\,dx_1\cdots dx_p=1$$

Note also that the cdf Fx is differentiable with:

$$f_x(x)=\frac{\partial^p F_x(x)}{\partial x_1\cdots\partial x_p}$$


For discrete multivariate random variables, the values of the random variable are concentrated on a countable or finite set of points $\{c_j\}_{j\in J}$. The probability of events of the form $x\in D$, for a certain set $D\subset\mathbb{R}^p$, can be computed as:

$$\Pr(x\in D)=\sum_{j:\,c_j\in D}\Pr(x=c_j)$$

For simplicity, we will mainly focus on continuous multivariate random variables.


The marginal density function of a subset $\left(x_{i_1},\ldots,x_{i_j}\right)'$ of the elements of $x$ is given by:

$$f_{x_{i_1},\ldots,x_{i_j}}\left(x_{i_1},\ldots,x_{i_j}\right)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_x(x_1,\ldots,x_p)\prod_{k\ne i_1,\ldots,i_j}dx_k$$

In particular, the marginal density function of each $x_j$, for $j=1,\ldots,p$, is given by:

$$f_{x_j}(x_j)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_x(x_1,\ldots,x_p)\prod_{k\ne j}dx_k$$


Let $x=(x_1,\ldots,x_p)'$ and $y=(y_1,\ldots,y_q)'$ be two multivariate random variables with density functions $f_x$ and $f_y$, respectively, and joint density function $f_{x,y}$. Then, the conditional density function of $y$ given $x$ is the density function of $y$ given that $x$ is known to take a certain value.

The conditional density function of y given x is given by:

$$f_{y|x}(y|x)=\frac{f_{x,y}(x,y)}{f_x(x)}$$


From the previous definition, we can deduce that the pdf of (x, y) is given by:

$$f_{x,y}(x,y)=f_{y|x}(y|x)\,f_x(x)=f_{x|y}(x|y)\,f_y(y)$$

As a consequence:

$$f_{y|x}(y|x)=\frac{f_{x|y}(x|y)\,f_y(y)}{f_x(x)}=\frac{f_{x|y}(x|y)\,f_y(y)}{\int f_{x,y}(x,y)\,dy}=\frac{f_{x|y}(x|y)\,f_y(y)}{\int f_{x|y}(x|y)\,f_y(y)\,dy}$$

This is Bayes' Theorem, one of the most important results in Statistics, as it is the basis of Bayesian inference.


The multivariate random variables x and y are independent if, and only if:

$$f_{x,y}(x,y)=f_x(x)\,f_y(y)$$

Therefore, if x and y are independent, then:

$$f_{y|x}(y|x)=f_y(y)$$

and $f_{x|y}(x|y)=f_x(x)$.

Independence can be interpreted as follows: knowing $y$ does not change the probability assessments on $x$, and conversely.

In general, the p univariate random variables x1,..., xp are independent if, and only if:

$$f_{x_1,\ldots,x_p}(x_1,\ldots,x_p)=f_{x_1}(x_1)\cdots f_{x_p}(x_p)$$


It is important to note that different multivariate pdfs may have the same marginal pdfs.

For instance, it is easy to see that the bivariate pdf’s given by:

$$f_{x_1,x_2}(x_1,x_2)=1,\quad 0<x_1,x_2<1$$

and,

$$f_{x_1,x_2}(x_1,x_2)=1+0.5\,(2x_1-1)(2x_2-1),\quad 0<x_1,x_2<1$$

have the same marginal pdfs, given by:

$$f_{x_1}(x_1)=1,\quad 0<x_1<1$$

and,

$$f_{x_2}(x_2)=1,\quad 0<x_2<1$$

respectively.


Consequently, in general, the marginal pdfs do not determine the joint distribution of the multivariate variable.

There is one exception: when the univariate variables x1,..., xp are independent, then the joint pdf of x is the product of the marginal pdfs.


An elegant way of connecting marginals with joint cdfs is given by copulae.

For simplicity of presentation we concentrate on the p = 2 dimensional case.

A 2-dimensional copula is a function $C:[0,1]^2\to[0,1]$ with the following properties:

1 For every $u\in[0,1]$: $C(0,u)=C(u,0)=0$.

2 For every $u\in[0,1]$: $C(1,u)=C(u,1)=u$.

3 For every $(u_1,u_2),(v_1,v_2)\in[0,1]\times[0,1]$ with $u_1\le v_1$ and $u_2\le v_2$:

$$C(v_1,v_2)-C(v_1,u_2)-C(u_1,v_2)+C(u_1,u_2)\ge 0$$


The usefulness of a copula function $C$ is explained by Sklar's Theorem.

Sklar's Theorem: Let $F_{x_1,x_2}$ be a bivariate cdf with marginal cdfs $F_{x_1}$ and $F_{x_2}$. Then, a copula $C_{x_1,x_2}$ exists with:

$$F_{x_1,x_2}(x_1,x_2)=C_{x_1,x_2}\left(F_{x_1}(x_1),F_{x_2}(x_2)\right)$$

for every $(x_1,x_2)\in\mathbb{R}^2$. If $F_{x_1}$ and $F_{x_2}$ are continuous, then $C_{x_1,x_2}$ is unique. On the other hand, if $C_{x_1,x_2}$ is a copula and $F_{x_1}$ and $F_{x_2}$ are cdfs, then the function $F_{x_1,x_2}$ defined above is a bivariate cdf with marginals $F_{x_1}$ and $F_{x_2}$.

Therefore, a copula function links a multivariate distribution to its one-dimensional marginals.


Theorem: Let $x_1$ and $x_2$ be random variables with cdfs $F_{x_1}$ and $F_{x_2}$, and bivariate cdf $F_{x_1,x_2}$. Then, $x_1$ and $x_2$ are independent if and only if:

$$C_{x_1,x_2}\left(F_{x_1}(x_1),F_{x_2}(x_2)\right)=F_{x_1}(x_1)\,F_{x_2}(x_2)$$

The previous copula function is called the independence copula.

Other copula functions will be given in this chapter.


Let $x=(x_1,\ldots,x_p)'$ be a multivariate random variable. The expectation or mean vector of $x$ is the vector whose components are the expectations or means of the components of the random variable, i.e.:

$$\mu_x=E[x]=\begin{pmatrix}E[x_1]\\ \vdots\\ E[x_p]\end{pmatrix}$$

where:

$$E[x_j]=\int_{-\infty}^{\infty}x_j\,f_{x_j}(x_j)\,dx_j$$

and $f_{x_j}(x_j)$ is the marginal density function of $x_j$.


The covariance matrix of the multivariate random variable $x$ with mean vector $\mu_x$ is a $p\times p$ symmetric and positive semidefinite matrix given by:

$$\Sigma_x=E\left[(x-\mu_x)(x-\mu_x)'\right]$$

such that:

- The diagonal elements of $\Sigma_x$ are the variances of the components, given by:

$$\sigma_{x,j}^2=\int_{-\infty}^{\infty}(x_j-\mu_{x,j})^2\,f_{x_j}(x_j)\,dx_j,$$

for $j=1,\ldots,p$.

- The elements outside the main diagonal are the covariances between pairs of variables:

$$\sigma_{x,jk}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_j-\mu_{x,j})(x_k-\mu_{x,k})\,f_{x_j,x_k}(x_j,x_k)\,dx_j\,dx_k,$$

for $j,k=1,\ldots,p$, $j\ne k$.


The correlation matrix of the multivariate random variable $x$ with covariance matrix $\Sigma_x$ is given by:

$$\varrho_x=\Delta_x^{-1/2}\,\Sigma_x\,\Delta_x^{-1/2}$$

where $\Delta_x$ is a diagonal matrix with the variances of the components of $x$ on its diagonal. The elements outside the main diagonal are the correlations between pairs of variables, given by:

$$\rho_{x,jk}=\frac{\sigma_{x,jk}}{\sigma_{x,j}\,\sigma_{x,k}}$$


Let $x=(x_1,\ldots,x_p)'$ be a multivariate random variable and let $\left(x_{i_1},\ldots,x_{i_j}\right)'$ be a subset of the elements of $x$. Then, the mean vector and the covariance and correlation matrices of $\left(x_{i_1},\ldots,x_{i_j}\right)'$ are obtained by extracting the corresponding elements of the mean vector and the covariance and correlation matrices of $x$.


Let $x=(x_1,\ldots,x_p)'$ and $y=(y_1,\ldots,y_q)'$ be two random variables with density functions $f_x$ and $f_y$, respectively, and let $f_{y|x}$ be the conditional density function of $y$ given $x$.

The conditional expectation of $y$ given $x$ is given by:

$$E_{y|x}[y|x]=\int y\,f_{y|x}(y|x)\,dy$$

which depends on x.

An important property of $E_{y|x}[y|x]$ is the law of total expectation:

$$E_y[y]=E_x\left[E_{y|x}[y|x]\right]$$

Then, to compute Ey [y], we can first compute Ey|x [y|x] and then, take the expectation with respect to the distribution of x.


Similarly, the conditional covariance and correlation matrices are the covariance and correlation matrices of the multivariate random variable y|x.

In particular, the conditional covariance matrix contains the conditional variances, $\mathrm{Var}_{y_j|x}[y_j|x]$, and the conditional covariances, $\mathrm{Cov}_{y_j,y_k|x}[y_j,y_k|x]$.

An important property of $\mathrm{Var}_{y_j|x}[y_j|x]$ is the law of total variance:

$$\mathrm{Var}_{y_j}[y_j]=E_x\left[\mathrm{Var}_{y_j|x}[y_j|x]\right]+\mathrm{Var}_x\left[E_{y_j|x}[y_j|x]\right]$$


For example, let $x\sim\mathrm{Geometric}(p)$ and let $y|x\sim\mathrm{Poisson}(x)$.

The goal is to compute the expectation and variance of y.

On the one hand, we have:

$$E_x[x]=\frac{1-p}{p},\qquad \mathrm{Var}_x[x]=\frac{1-p}{p^2}$$

On the other hand, we have:

$$E_{y|x}[y|x]=\mathrm{Var}_{y|x}[y|x]=x$$


Then, the expectation of $y$ is given by:

$$E_y[y]=E_x\left[E_{y|x}[y|x]\right]=E_x[x]=\frac{1-p}{p}$$

Also, the variance of $y$ is given by:

$$\mathrm{Var}_y[y]=E_x\left[\mathrm{Var}_{y|x}[y|x]\right]+\mathrm{Var}_x\left[E_{y|x}[y|x]\right]=E_x[x]+\mathrm{Var}_x[x]=\frac{1-p}{p}+\frac{1-p}{p^2}=\frac{1-p^2}{p^2}$$
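As a quick numerical illustration, a minimal Monte Carlo sketch in Python (assuming the parametrization of Geometric(p) as the number of failures before the first success, so that $E[x]=(1-p)/p$):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.4, 1_000_000

# x ~ Geometric(p): numpy counts trials until the first success,
# so subtract 1 to count failures, giving E[x] = (1-p)/p.
x = rng.geometric(p, size=n) - 1
y = rng.poisson(x)  # y | x ~ Poisson(x)

print(y.mean(), (1 - p) / p)        # both close to 1.5
print(y.var(), (1 - p**2) / p**2)   # both close to 5.25
```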


Let $x=(x_1,\ldots,x_p)'$ and $y=(y_1,\ldots,y_q)'$ be two multivariate random variables with mean vectors $\mu_x$ and $\mu_y$ and covariance matrices $\Sigma_x$ and $\Sigma_y$, respectively.

The covariance matrix between $x$ and $y$ is a $p\times q$ matrix given by:

$$\mathrm{Cov}[x,y]=E\left[(x-\mu_x)(y-\mu_y)'\right]$$

Similarly, the correlation matrix between $x$ and $y$ is a $p\times q$ matrix given by:

$$\mathrm{Cor}[x,y]=\Delta_x^{-1/2}\,\mathrm{Cov}[x,y]\,\Delta_y^{-1/2}$$

where $\Delta_x$ and $\Delta_y$ are diagonal matrices whose diagonal elements are those of $\Sigma_x$ and $\Sigma_y$, respectively.


Let $x=(x_1,\ldots,x_p)'$ be a multivariate variable with pdf $f_x$ and let $y=(y_1,\ldots,y_p)'$ be a new variable given by:

$$y=g(x)$$

where $g$ is a function with differentiable inverse given by:

$$x=g^{-1}(y)=h(y)$$

Therefore, the pdf of $y$ is given by:

$$f_y(y)=f_x(x)\left|\det\left(\frac{\partial x}{\partial y}\right)\right|=f_x(h(y))\left|\det\left(\frac{\partial h(y)}{\partial y}\right)\right|$$

where $\partial x/\partial y$ is the Jacobian matrix of the transformation, $\det(\cdot)$ stands for the determinant and $|\cdot|$ denotes the absolute value function.


Consider the particular case of a linear transformation, y = Ax + b, where A is a non-singular p × p matrix and b is a p × 1 vector.

Then, we have that $x=A^{-1}(y-b)$, while $\partial x/\partial y=A^{-1}$. Therefore:

$$f_y(y)=f_x\left(A^{-1}(y-b)\right)\left|\det\left(A^{-1}\right)\right|$$


The previous case only considers transformations from a $p$-dimensional random variable to another $p$-dimensional random variable.

The case of transformations from a $p$-dimensional random variable to a $q$-dimensional random variable, with $p\ne q$, is more difficult to handle.

Therefore, we focus on the mean vector and the covariance matrix of the transformed random variable.

Let $x=(x_1,\ldots,x_p)'$ be a multivariate random variable and let $y=(y_1,\ldots,y_q)'$ be such that:

$$y=Ax+b$$

where $A$ is a $q\times p$ matrix and $b$ is a $q\times 1$ column vector.

Then, letting $\mu_x$ and $\mu_y$ be the mean vectors and $\Sigma_x$ and $\Sigma_y$ be the covariance matrices of $x$ and $y$, respectively, we have:

$$\mu_y=A\mu_x+b,\qquad \Sigma_y=A\,\Sigma_x\,A'$$

Multivariate distributions

The multivariate Gaussian distribution is a generalization to two or more dimensions of the univariate Gaussian (or Normal) distribution.

The univariate Gaussian is often characterized by its resemblance to the shape of a bell, which is why it is popularly referred to as the "bell curve".

The Gaussian distribution is used extensively in both theoretical and applied statistics research.

Although it is well known that real data rarely obey the dictates of the Gaussian distribution, it does provide us with a useful approximation to reality.


The pdf of a univariate Gaussian random variable with mean $\mu_x=E(x)$ and variance $\sigma_x^2=\mathrm{Var}(x)$ is:

$$f_x(x)=\left(2\pi\sigma_x^2\right)^{-1/2}\exp\left(-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right),\quad -\infty<x<\infty$$

and we denote it as $x\sim N\left(\mu_x,\sigma_x^2\right)$.


[Figure: pdfs of N(0,1) (blue), N(1,1) (green) and N(0,2) (orange)]


Generalizing the univariate Gaussian distribution, the pdf of a multivariate Gaussian random variable $x=(x_1,\ldots,x_p)'$ with mean vector $\mu_x=E(x)$ and covariance matrix $\Sigma_x=\mathrm{Cov}(x)$ is given by:

$$f_x(x)=(2\pi)^{-p/2}\,|\Sigma_x|^{-1/2}\exp\left(-\frac{(x-\mu_x)'\,\Sigma_x^{-1}\,(x-\mu_x)}{2}\right)$$

where $-\infty<x_j<\infty$, for $j=1,\ldots,p$.

We denote it as $x\sim N_p(\mu_x,\Sigma_x)$. The next slides show some examples of pdfs of bivariate Gaussian distributions.


[Figure: pdf of the bivariate standard Gaussian distribution]


[Figure: pdf of a bivariate Gaussian with correlation 0.9]


[Figure: pdf of a bivariate Gaussian with correlation −0.9]


It is of interest to know the distribution of a Gaussian variable after it has been linearly transformed.

Let $x\sim N_p(\mu_x,\Sigma_x)$, $A$ a $q\times p$ matrix and $b$ a $q\times 1$ column vector.

Then, $y=Ax+b$ has a $N_q\left(A\mu_x+b,\ A\Sigma_x A'\right)$ distribution. In other words, $y$ also has a Gaussian distribution.


How is the $N_p(\mu_x,\Sigma_x)$ distribution related to the $N_p(0_p,I_p)$ distribution (the standard multivariate Gaussian distribution)?

- If $x\sim N_p(\mu_x,\Sigma_x)$ and $y=\Sigma_x^{-1/2}(x-\mu_x)$, then $y\sim N_p(0_p,I_p)$.

How can we create $N_p(\mu_x,\Sigma_x)$ variables on the basis of $N_p(0_p,I_p)$ variables?

- If $y\sim N_p(0_p,I_p)$, then $x=\mu_x+\Sigma_x^{1/2}\,y\sim N_p(\mu_x,\Sigma_x)$.
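A minimal sampling sketch of the second fact, using a Cholesky factor of $\Sigma_x$ in place of $\Sigma_x^{1/2}$ (any square root of $\Sigma_x$ works; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # illustrative values
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)               # L L' = Sigma
y = rng.standard_normal((10_000, 2))        # rows ~ N_2(0, I)
x = mu + y @ L.T                            # rows ~ N_2(mu, Sigma)

print(x.mean(axis=0))                       # close to mu
print(np.cov(x, rowvar=False))              # close to Sigma
```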


The level curves or contours are the curves obtained by cutting the probability density function by parallel hyperplanes.

In other words, a level curve is a set of points with the same density value.

In the multivariate Gaussian case, their equation is given by:

$$(x-\mu_x)'\,\Sigma_x^{-1}\,(x-\mu_x)=c$$

where c is a constant.

Therefore, the level curves of multivariate Gaussian distributions are ellipsoids.

The next two slides show the level curves for the Gaussian distributions considered in the previous plots, without and with a sample of 100 points generated from each of these distributions.


[Figure: level curves for bivariate Gaussians with correlations 0, 0.9 and −0.9]


[Figure: the same level curves, with a sample of 100 points from each distribution]


The level curves of the multivariate Gaussian distribution give us a notion of distance between points.

Note that all the points in the level curve have the same density and form an ellipsoid.

Therefore, it is reasonable to assume that all the points in a level curve are at the same distance from the center of the distribution, i.e., µx .

The implied distance is the Mahalanobis distance between $x$ and $\mu_x$, given by:

$$D_M(x,\mu_x)=\sqrt{(x-\mu_x)'\,\Sigma_x^{-1}\,(x-\mu_x)}$$

If $x\sim N_p(\mu_x,\Sigma_x)$, the squared Mahalanobis distance has a $\chi_p^2$ distribution, i.e., $D_M^2(x,\mu_x)\sim\chi_p^2$. The Mahalanobis distance plays an important role in many problems such as outlier detection, classification, clustering and so on.
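A short sketch (with illustrative parameters) that computes squared Mahalanobis distances for a Gaussian sample and flags points beyond a $\chi_p^2$ quantile, as is done in outlier detection:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
x = rng.multivariate_normal(mu, Sigma, size=100)

d = x - mu
d2 = np.einsum('ij,ij->i', d @ np.linalg.inv(Sigma), d)  # squared distances

cutoff = chi2.ppf(0.99, df=2)        # D_M^2 ~ chi^2_2 under the model
print((d2 > cutoff).sum(), "points beyond the 99% contour")
```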


[Figure: a random sample of 100 bivariate points (left) and their Mahalanobis distances from the centre (right)]


It is useful to know more about the multivariate Gaussian distribution, since it is often a good approximation in many situations.

For instance, if we partition $x$, its mean vector $\mu_x$ and its covariance matrix $\Sigma_x$ as:

$$x=\begin{pmatrix}x_{(1)}\\ x_{(2)}\end{pmatrix},\qquad \mu_x=\begin{pmatrix}\mu_{x(1)}\\ \mu_{x(2)}\end{pmatrix},\qquad \Sigma_x=\begin{pmatrix}\Sigma_{x(11)} & \Sigma_{x(12)}\\ \Sigma_{x(21)} & \Sigma_{x(22)}\end{pmatrix}$$

where $x_{(1)}$ and $x_{(2)}$ have dimensions $q$ and $p-q$, respectively, then $x_{(1)}\sim N_q\left(\mu_{x(1)},\Sigma_{x(11)}\right)$, $x_{(2)}\sim N_{p-q}\left(\mu_{x(2)},\Sigma_{x(22)}\right)$ and $\mathrm{Cov}\left[x_{(1)},x_{(2)}\right]=\Sigma_{x(12)}$.

Moreover, $x_{(1)}$ and $x_{(2)}$ are independent if and only if $\Sigma_{x(12)}=0_{(q,p-q)}$, where $0_{(q,p-q)}$ is a $q\times(p-q)$ matrix of zeros.


If $\Sigma_{x(22)}>0$, then the conditional distribution of $x_{(1)}$ given $x_{(2)}$ is Gaussian with mean:

$$\mu_{x(1)}+\Sigma_{x(12)}\,\Sigma_{x(22)}^{-1}\left(x_{(2)}-\mu_{x(2)}\right)$$

and covariance matrix:

$$\Sigma_{x(11)}-\Sigma_{x(12)}\,\Sigma_{x(22)}^{-1}\,\Sigma_{x(21)}$$

If $x_{(1)}$ and $x_{(2)}$ are independent and distributed as $N_q\left(\mu_{x(1)},\Sigma_{x(11)}\right)$ and $N_{p-q}\left(\mu_{x(2)},\Sigma_{x(22)}\right)$, respectively, then $x=\left(x_{(1)}',x_{(2)}'\right)'$ has the multivariate Gaussian distribution:

$$N_p\left(\begin{pmatrix}\mu_{x(1)}\\ \mu_{x(2)}\end{pmatrix},\ \begin{pmatrix}\Sigma_{x(11)} & 0_{(q,p-q)}\\ 0_{(p-q,q)} & \Sigma_{x(22)}\end{pmatrix}\right)$$
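A minimal numeric sketch of these conditional formulas (a 3-dimensional example with $q=1$; all values are illustrative):

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
q = 1
x2 = np.array([1.5, -0.5])                 # observed value of x_(2)

S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

cond_mean = mu[:q] + S12 @ np.linalg.solve(S22, x2 - mu[q:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```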


The multivariate Gaussian distribution belongs to the large family of elliptical distributions, which has recently gained a lot of attention in financial mathematics.

We say that a vector variable $x=(x_1,\ldots,x_p)'$ follows an elliptical distribution if its density function depends on $x$ only through $(x-m)'\,V^{-1}\,(x-m)$, where $m$ is a $p\times 1$ column vector and $V$ is a $p\times p$ matrix (not necessarily the mean vector and the covariance matrix of $x$).

Therefore, the level curves of the distribution are ellipsoids centered at $m$.

The multivariate Gaussian distribution is the best known elliptical distribution.

Indeed, elliptical distributions can be seen as an extension of the Np (µx , Σx ).


Let $y\sim N_p(0_p,\Sigma)$ and $u\sim\chi_\nu^2$ be independent. The multivariate random variable:

$$x=\mu+\sqrt{\frac{\nu}{u}}\;y$$

has a multivariate Student's t distribution with parameters $\mu$, $\Sigma$ and $\nu$.

For $\nu>2$, the mean of the distribution is $\mu$ and the covariance matrix is $\frac{\nu}{\nu-2}\,\Sigma$.

The parameter ν is called the degrees of freedom parameter.
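A sampling sketch based directly on this construction (illustrative $\mu$, $\Sigma$ and $\nu$):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
nu, n = 5, 100_000

y = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
u = rng.chisquare(nu, size=n)
x = mu + np.sqrt(nu / u)[:, None] * y      # multivariate t with nu df

print(np.cov(x, rowvar=False))             # close to nu/(nu-2) * Sigma
```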


The density function of a multivariate Student's t distribution is given by:

$$f_x(x)=\frac{\Gamma\left(\frac{\nu+p}{2}\right)}{(\pi\nu)^{p/2}\,\Gamma\left(\frac{\nu}{2}\right)}\,|V_x|^{-1/2}\left(1+\frac{(x-m_x)'\,V_x^{-1}\,(x-m_x)}{\nu}\right)^{-\frac{\nu+p}{2}}$$

The multivariate Student's t distribution belongs to the class of elliptical distributions.

In particular, if $\Sigma=I_p$, this distribution belongs to the class of spherical distributions.


[Figure: pdf of a bivariate Student's t distribution with 5 degrees of freedom]


Elliptical distributions share many properties with Gaussian distributions: marginal and conditional distributions are also elliptical, and the conditional means are a linear function of the determining variables.

Nevertheless, the Gaussian distribution is the only one in the family to have the property whereby if the covariance matrix is diagonal, all the component variables are independent.


A distribution is called heavy-tailed if it has higher probability density in its tail area compared with a Gaussian distribution with the same mean vector and covariance matrix.

The multivariate Student's t distribution is an example of a heavy-tailed distribution.

Other examples of heavy-tailed distributions include the multivariate generalized hyperbolic distribution and the multivariate Laplace distribution.


Mixture modelling concerns modelling a statistical distribution by a mixture (or weighted sum) of different distributions.

For certain component density functions, mixture models can approximate any continuous density to arbitrary accuracy, provided that the number of component density functions is sufficiently large and the parameters of the model are chosen correctly.

The density function of a multivariate random variable $x=(x_1,\ldots,x_p)'$ that follows a mixture distribution is given by:

$$f_x(x)=\sum_{g=1}^{G}\pi_g\,f_{x,g}(x)$$

where:

- $\pi_1,\ldots,\pi_G$ are weights such that $\sum_{g=1}^{G}\pi_g=1$;
- $f_{x,1}(x),\ldots,f_{x,G}(x)$ are multivariate pdfs.
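A short sketch of a two-component Gaussian mixture (the weights and component parameters are illustrative), evaluating $f_x$ and sampling by first drawing a component label:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]

def mixture_pdf(x):
    # f(x) = sum_g pi_g f_{x,g}(x)
    return sum(w * multivariate_normal(m, C).pdf(x)
               for w, m, C in zip(weights, means, covs))

# Sampling: pick a component with probability pi_g, then draw from it.
g = rng.choice(len(weights), size=1000, p=weights)
sample = np.array([rng.multivariate_normal(means[k], covs[k]) for k in g])

print(mixture_pdf([1.0, 1.0]), sample.mean(axis=0))
```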


Note that mixture distributions have an interesting interpretation in terms of heterogeneous populations.

Assume a population on which we have defined the multivariate random variable $x$ and that can be subdivided into $G$ more homogeneous groups.

Then, π1, . . . , πG can be seen as the proportion of elements in the groups 1,..., G, while fx,1(x),..., fx,G (x) are multivariate pdf’s associated with each population.


[Figure: pdf of a mixture of bivariate Gaussian distributions]


[Figure: level curves for the mixture of Gaussian distributions]


[Figure: pdf of another mixture of bivariate Gaussian distributions]


[Figure: level curves for the second mixture of Gaussian distributions]


One main problem in multivariate analysis is how to model dependence of the components of a multivariate random variable.

We have seen several multivariate distributions that model this dependence.

However, these models, except perhaps mixtures, are not flexible enough to model multivariate dependence.

As seen before, copulae represent an elegant way of connecting marginals with joint cumulative distribution functions.

Copulas are functions that join or “couple” multivariate distribution functions to their 1-dimensional marginal distribution functions.


Let $x=(x_1,\ldots,x_p)'$ be a multivariate random variable and let $F_{x_j}$, for $j=1,\ldots,p$, be the marginal distribution functions of the components of $x$.

Using copulae, the marginal distribution functions can be modelled separately from their dependence structure and then coupled together to form the multivariate distribution $F_x$. The formal definition of a copula function in $p$ dimensions is more complex than in the 2-dimensional case.

However, the intuition is the same as for the 2-dimensional case, so we do not provide here its formal definition.


Sklar's Theorem in $p$ dimensions: Let $F_x$ be a $p$-dimensional distribution function with marginal distribution functions $F_{x_1},\ldots,F_{x_p}$. Then, a $p$-dimensional copula $C_x$ exists such that, for all $(x_1,\ldots,x_p)\in\mathbb{R}^p$:

$$F_x(x_1,\ldots,x_p)=C_x\left(F_{x_1}(x_1),\ldots,F_{x_p}(x_p)\right)$$

Moreover, if $F_{x_1},\ldots,F_{x_p}$ are continuous, then $C_x$ is unique. Conversely, if $C_x$ is a copula and $F_{x_1},\ldots,F_{x_p}$ are distribution functions, then $F_x$ defined above is a $p$-dimensional distribution function with marginals $F_{x_1},\ldots,F_{x_p}$.


Let $F_z$ denote the univariate standard Gaussian distribution function and $F_x$ the $p$-dimensional Gaussian distribution function with mean vector $0_p$ and covariance matrix $\Sigma_x$, which is also its correlation matrix. Then, the function:

$$C_{x,\Sigma_x}^{\mathrm{Gauss}}(u)=F_x\left(F_z^{-1}(u_1),\ldots,F_z^{-1}(u_p)\right)$$

is the $p$-dimensional Gaussian copula with correlation matrix $\Sigma_x$, where $u=(u_1,\ldots,u_p)'\in[0,1]^p$.

If $\Sigma_x\ne I_p$, then the corresponding Gaussian copula allows one to generate joint symmetric dependence.

However, it is not possible to model tail dependence; e.g., joint extreme events have zero probability.
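A sampling sketch of this construction (the correlation matrix and the exponential marginals are illustrative): draw a correlated Gaussian vector, push each coordinate through $F_z$ to obtain uniform marginals, then through any inverse marginal cdfs:

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.8], [0.8, 1.0]])     # copula correlation matrix

z = rng.multivariate_normal(np.zeros(2), R, size=10_000)
u = norm.cdf(z)                            # uniform marginals, Gaussian dependence

# Couple arbitrary marginals, e.g. two exponentials, through the copula.
x = expon.ppf(u)
print(np.corrcoef(x, rowvar=False))        # dependent, non-Gaussian margins
```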


The function:

$$C_{x,\theta}^{\mathrm{GH}}(u)=\exp\left(-\left(\sum_{j=1}^{p}(-\log u_j)^{\theta}\right)^{1/\theta}\right)$$

is the $p$-dimensional Gumbel-Hougaard copula function, where $\theta\in[1,\infty)$.

Unlike the Gaussian copula, $C_{x,\theta}^{\mathrm{GH}}$ can generate an upper tail dependence.
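A direct evaluation sketch of this formula; at $\theta=1$ it reduces to the independence copula:

```python
import numpy as np

def gumbel_hougaard(u, theta):
    # C(u) = exp(-( sum_j (-log u_j)^theta )^(1/theta)), theta >= 1
    u = np.asarray(u, dtype=float)
    return np.exp(-np.sum((-np.log(u)) ** theta) ** (1.0 / theta))

print(gumbel_hougaard([0.9, 0.95], theta=1.0))  # 0.9 * 0.95 = 0.855
print(gumbel_hougaard([0.9, 0.95], theta=3.0))  # larger: joint dependence
```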


[Figure: pdf of a bivariate distribution built from a copula]

Statistical inference

In multivariate statistics, we observe the values of a multivariate random variable $x=(x_1,\ldots,x_p)'$ and obtain a sample $x_{i\cdot}=(x_{i1},\ldots,x_{ip})'$, for $i=1,\ldots,n$, summarised in a data matrix $X$.

For a given random sample, x1·,..., xn·, the idea of statistical inference is to analyse the properties of the population random variable x.

If we do not know (or assume) the (parametric) distribution of the random variable $x$, then one can try to estimate its pdf as well as its main characteristics, such as the mean vector and the covariance matrix.


To estimate multivariate pdfs, we can use a multivariate kernel density estimator.

The general form of a multivariate kernel estimator of the density of $x=(x_1,\ldots,x_p)'$, denoted by $f_x(\cdot)$, based on the sample $x_{1\cdot},\ldots,x_{n\cdot}$, is given by:

$$\widehat{f}_{x,H}(x)=\frac{1}{n}\,|H|^{-1/2}\sum_{i=1}^{n}K\left(H^{-1/2}(x-x_{i\cdot})\right)$$

where $K(\cdot)$ is a multivariate kernel function and $H$ is a $p\times p$ positive definite matrix, called the bandwidth matrix.

The most usual multivariate kernel function is the multivariate Gaussian kernel:

$$K(u)=(2\pi)^{-p/2}\exp\left(-\frac{1}{2}u'u\right)$$
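A minimal sketch of this estimator with the Gaussian kernel and a diagonal bandwidth matrix (the data and $H$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=200)
H = np.diag([0.15, 0.15])                  # illustrative bandwidth matrix

def kde(x, X, H):
    # Gaussian-kernel estimator with a diagonal bandwidth matrix H
    p = X.shape[1]
    Hinv_sqrt = np.diag(1.0 / np.sqrt(np.diag(H)))
    u = (x - X) @ Hinv_sqrt                # H^{-1/2}(x - x_i), row-wise
    K = (2*np.pi)**(-p/2) * np.exp(-0.5 * np.sum(u**2, axis=1))
    return K.sum() / (len(X) * np.sqrt(np.linalg.det(H)))

print(kde(np.array([0.0, 0.0]), X, H))
```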


As in the univariate case, the bandwidth is selected by minimizing the mean integrated squared error (MISE):

$$\mathrm{MISE}(H)=E\left[\int\left(\widehat{f}_{x,H}(x)-f_x(x)\right)^2 dx\right]$$

Asymptotic rules have been provided to obtain the optimal bandwidth (we do not enter into details here).

A usual approach is to consider a diagonal $H$, where the diagonal elements are selected using univariate rules.

One problem of kernel density estimation is that, if the dimension $p$ of the random variable is large, many observations are needed to obtain reliable density estimates.


For example, the next slide shows the multivariate kernel density estimate for a sample of 200 observations generated from a N (µx , Σx ) where:

- $\mu_x=(0,0)'$
- $\Sigma_x=\begin{pmatrix}1 & 0.7\\ 0.7 & 1\end{pmatrix}$

Additionally, contour plots of the true and estimated densities are also shown.


[Figure: multivariate kernel density estimate for the simulated sample]


[Figure: true and estimated level curves]


Estimation of characteristics of the random variable $x=(x_1,\ldots,x_p)'$ can often be performed using observable functions of the sample $x_{1\cdot},\ldots,x_{n\cdot}$.

For instance, the mean vector and the covariance matrix of $x$, $\mu_x$ and $\Sigma_x$, can be estimated with the sample mean vector and the sample covariance matrix, $\overline{x}$ and $S_x$, respectively.

The sample mean vector $\overline{x}$ and the sample covariance matrix $S_x$ verify the following properties:

1 $E[\overline{x}]=\mu_x$.

2 $\mathrm{Cov}[\overline{x}]=\frac{1}{n}\Sigma_x$.

3 $E[S_x]=\Sigma_x$.


It would be of interest to obtain the distribution of statistics, such as the sample mean vector.

This is useful to derive confidence intervals or to define rejection regions in hypothesis testing for a given significance level.

For instance, in the Gaussian case, we have the following result.

Theorem: Let $x_{1\cdot},\ldots,x_{n\cdot}$ be i.i.d. with $x_{i\cdot}\sim N(\mu_x,\Sigma_x)$. Then, $\overline{x}\sim N\left(\mu_x,\frac{1}{n}\Sigma_x\right)$.

The Central Limit Theorem shows that, even if the parent distribution is not Gaussian, when the sample size $n$ is large the sample mean vector $\overline{x}$ has an approximate Gaussian distribution.

Central Limit Theorem (CLT): Let $x_{1\cdot},\ldots,x_{n\cdot}$ be i.i.d. with $x_{i\cdot}\sim(\mu_x,\Sigma_x)$. Then, the distribution of $\sqrt{n}\,(\overline{x}-\mu_x)$ is asymptotically $N(0_p,\Sigma_x)$, i.e.:

$$\sqrt{n}\,(\overline{x}-\mu_x)\ \xrightarrow{\ d\ }\ N(0_p,\Sigma_x)\quad\text{as}\ n\to\infty$$


The next two slides show multivariate kernel density estimates of 2000 sample mean vectors, corresponding to 2000 samples of a certain bivariate random variable.

The first slide corresponds to the case of n = 5.

The second slide corresponds to the case of n = 100.

It is easy to see that the second estimate appears to be closer to the bivariate Gaussian distribution than the first one.


[Figure: kernel density estimate of the sample mean vectors, n = 5]


[Figure: kernel density estimate of the sample mean vectors, n = 100]


If we know (or assume) the (parametric) distribution of the random variable $x=(x_1,\ldots,x_p)'$, then one can try to estimate the parameters of the distribution. For instance, in a mixture of Gaussian distributions, we are interested in estimating the weights as well as the parameters of the mixture densities.

Then, let $\theta=(\theta_1,\ldots,\theta_r)'$ be the vector of parameters of the density function $f_x(\cdot|\theta)$ corresponding to the distribution of the multivariate random variable $x=(x_1,\ldots,x_p)'$. The aim is to estimate the vector of parameters $\theta$ from the i.i.d. sample $x_{1\cdot},\ldots,x_{n\cdot}$ from $x$. The most important method to carry out this task is the maximum likelihood estimation (MLE) method, presented next.


The joint pdf of x1·,..., xn· is given by:

$$f(x_{1\cdot},\ldots,x_{n\cdot}|\theta)=\prod_{i=1}^{n}f(x_{i\cdot}|\theta)$$

Then, note that the sample is known (X , the data matrix) but θ is unknown.

Therefore, the joint pdf of $x_{1\cdot},\ldots,x_{n\cdot}$, seen as a function of $\theta$ given the sample, is called the likelihood function:

$$L(\theta|X)=\prod_{i=1}^{n}f(x_{i\cdot}|\theta)$$

where $x_{i\cdot}=(x_{i1},\ldots,x_{ip})'$. Note that the likelihood function can be seen as a kind of pdf of $\theta|X$.


Consequently, the maximum likelihood estimate (MLE) of $\theta$, denoted by $\widehat{\theta}$, is the value of $\theta$ that maximizes $L(\theta|X)$, i.e.:

$$\widehat{\theta}=\arg\max_{\theta}L(\theta|X),$$

i.e., the value of $\theta$ that maximizes the probability of obtaining the observed sample.

Often it is easier to maximize the logarithm of the likelihood function, named the log-likelihood function or support function:

$$\ell(\theta|X)=\log L(\theta|X)$$

which is equivalent since the logarithm is a monotone one-to-one function.

Hence:

$$\widehat{\theta}=\arg\max_{\theta}L(\theta|X)=\arg\max_{\theta}\ell(\theta|X)$$


Usually, the maximisation process cannot be performed analytically.

In these cases, nonlinear optimization techniques will be used to determine the value of $\theta$ maximising $L(\theta|X)$ or $\ell(\theta|X)$.

These numerical methods are typically based on Newton-Raphson techniques.

Nevertheless, in certain cases, it is possible to obtain the MLE analytically.

The multivariate Gaussian distribution is an example of this, as presented next.


Let $x_{1\cdot},\ldots,x_{n\cdot}$ be a random sample from $x\sim N(\mu_x,\Sigma_x)$. Then, the joint density function is:

$$f(x_{1\cdot},\ldots,x_{n\cdot}|\mu_x,\Sigma_x)=\prod_{i=1}^{n}(2\pi)^{-p/2}\,|\Sigma_x|^{-1/2}\exp\left(-\frac{(x_{i\cdot}-\mu_x)'\,\Sigma_x^{-1}\,(x_{i\cdot}-\mu_x)}{2}\right)$$

Consequently, the support function is given by:

$$\ell(\mu_x,\Sigma_x|X)=-\frac{np}{2}\log 2\pi-\frac{n}{2}\log|\Sigma_x|-\frac{1}{2}\sum_{i=1}^{n}(x_{i\cdot}-\mu_x)'\,\Sigma_x^{-1}\,(x_{i\cdot}-\mu_x)$$

The MLEs of $\mu_x$ and $\Sigma_x$ are the values of these parameters that maximize $\ell(\mu_x,\Sigma_x|X)$.


To do this, note that we can write:

$$\sum_{i=1}^{n}(x_{i\cdot}-\mu_x)'\,\Sigma_x^{-1}\,(x_{i\cdot}-\mu_x)=\mathrm{Tr}\left[\Sigma_x^{-1}\left(\sum_{i=1}^{n}(x_{i\cdot}-\mu_x)(x_{i\cdot}-\mu_x)'\right)\right]$$

Then, adding and subtracting the sample mean vector $\overline{x}$ in $(x_{i\cdot}-\mu_x)$ leads to:

$$\sum_{i=1}^{n}(x_{i\cdot}-\mu_x)(x_{i\cdot}-\mu_x)'=\sum_{i=1}^{n}(x_{i\cdot}-\overline{x}+\overline{x}-\mu_x)(x_{i\cdot}-\overline{x}+\overline{x}-\mu_x)'=\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'+n\,(\overline{x}-\mu_x)(\overline{x}-\mu_x)'$$

because the cross terms $\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(\overline{x}-\mu_x)'$ and $\sum_{i=1}^{n}(\overline{x}-\mu_x)(x_{i\cdot}-\overline{x})'$ are both matrices of zeros.


Consequently:

$$\sum_{i=1}^{n}(x_{i\cdot}-\mu_x)'\,\Sigma_x^{-1}\,(x_{i\cdot}-\mu_x)=\mathrm{Tr}\left[\Sigma_x^{-1}\left(\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'\right)\right]+n\,(\overline{x}-\mu_x)'\,\Sigma_x^{-1}\,(\overline{x}-\mu_x)$$


Therefore, the support function can be written as:

$$\ell(\mu_x,\Sigma_x|X)=-\frac{np}{2}\log 2\pi-\frac{n}{2}\log|\Sigma_x|-\frac{1}{2}\mathrm{Tr}\left[\Sigma_x^{-1}\left(\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'\right)\right]-\frac{n}{2}\,(\overline{x}-\mu_x)'\,\Sigma_x^{-1}\,(\overline{x}-\mu_x)$$

Now, $\ell(\mu_x,\Sigma_x|X)$ depends on $\mu_x$ only through the last term, and this term is maximized when $(\overline{x}-\mu_x)'\,\Sigma_x^{-1}\,(\overline{x}-\mu_x)=0$.

Therefore, the MLE of $\mu_x$ is $\widehat{\mu}_x=\overline{x}$.


It remains to maximize:

$$\ell(\Sigma_x|X,\widehat{\mu}_x=\overline{x})=-\frac{np}{2}\log 2\pi-\frac{n}{2}\log|\Sigma_x|-\frac{1}{2}\mathrm{Tr}\left[\Sigma_x^{-1}\left(\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'\right)\right]$$

For that, we need a result from matrix algebra: given a $p\times p$ symmetric positive definite matrix $B$ and a scalar $b>0$, it follows that:

$$-b\log|\Sigma_x|-\frac{1}{2}\mathrm{Tr}\left(\Sigma_x^{-1}B\right)\le -b\log|B|+pb\log(2b)-pb$$

with equality if and only if $\Sigma_x=\frac{1}{2b}B$. Then, taking $b=n/2$ and $B=\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'$ shows that the MLE of $\Sigma_x$ is:

$$\widehat{\Sigma}_x=\frac{1}{n}\sum_{i=1}^{n}(x_{i\cdot}-\overline{x})(x_{i\cdot}-\overline{x})'$$

Note that the MLE of $\Sigma_x$ is not the sample covariance matrix but a re-scaled version of it, because $\widehat{\Sigma}_x=\frac{n-1}{n}S_x$.
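A numeric sketch of these estimators on simulated data (illustrative parameters), checking the relation $\widehat{\Sigma}_x=\frac{n-1}{n}S_x$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([1, 2], [[1, 0.5], [0.5, 2]], size=500)
n = len(X)

mu_hat = X.mean(axis=0)                    # MLE of mu
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n      # MLE of Sigma
S = np.cov(X, rowvar=False)                # unbiased sample covariance

print(np.allclose(Sigma_hat, (n - 1) / n * S))   # True
```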


The next theorem gives the asymptotic distribution of the MLE, which turns out to be Gaussian.

Theorem: Suppose that the sample $x_{1\cdot},\ldots,x_{n\cdot}$ is i.i.d. If $\widehat{\theta}$ is the MLE for $\theta\in\mathbb{R}^r$, then, under some regularity conditions, as $n\to\infty$:

$$\sqrt{n}\left(\widehat{\theta}-\theta\right)\ \xrightarrow{\ d\ }\ N\left(0_r,\mathcal{F}^{-1}\right)$$

where $\mathcal{F}$ denotes the Fisher information matrix, given by:

$$\mathcal{F}=-\frac{1}{n}\,E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\,\ell(\theta|X)\right]$$

As a consequence of this Theorem, we see that under regularity conditions the MLE is asymptotically unbiased, efficient (minimum variance) and Gaussian distributed.

Also it is a consistent estimator of θ.

Hypothesis testing

We now turn our attention to hypothesis testing.

In particular, we will go over a general methodology to construct tests, called the likelihood ratio method, and we will apply it to the case of Gaussian populations.

Then, we assume an $r$-dimensional vector parameter $\theta$ that takes values in $\Omega\subset\mathbb{R}^r$.

We want to test the hypothesis $H_0$ that the unknown parameter $\theta$ belongs to some subspace of $\mathbb{R}^r$.

This subspace is called the null set and will be denoted by $\Omega_0\subset\mathbb{R}^r$.


Consequently, we want to test the hypothesis:

$$H_0:\theta\in\Omega_0$$

versus the alternative hypothesis:

$$H_1:\theta\in\Omega$$

which assumes that $\theta$ is not restricted to $\Omega_0$.


For example, consider a multivariate Gaussian N (µx , Σx ).

To test whether $\mu_x$ equals a certain fixed value $\mu_0$, we construct the test problem:

$$H_0:\mu_x=\mu_0$$

$$H_1:\text{no constraints on }\mu_x$$

Then, in this example, we have $\Omega_0=\{\mu_0\}$ and $\Omega=\mathbb{R}^p$.


Define $L_0^*=\max_{\theta\in\Omega_0}L(\theta|X)$ and $L^*=\max_{\theta\in\Omega}L(\theta|X)$, the values of the maximized likelihood under $H_0$ and $H_1$, respectively. Consider the likelihood ratio (LR) given by:

$$LR=\frac{L_0^*}{L^*}$$

By construction, $0\le LR\le 1$, and one tends to favour $H_0$ if the LR is high ("close" to 1) and $H_1$ if the LR is low ("not close" to 1).


The likelihood ratio test (LRT) tells us exactly when to favour $H_0$ over $H_1$. The test statistic is given by:

$$\lambda=-2\log LR=-2\left(\log L_0^*-\log L^*\right)=-2\left(\ell_0^*-\ell^*\right)$$

The LRT statistic $\lambda$ is asymptotically distributed as a $\chi^2$ distribution with degrees of freedom equal to the difference between the dimensions of the spaces $\Omega$ and $\Omega_0$.


Given a sample from a population $N(\mu_x,\Sigma_x)$, we want to test the hypothesis:

$$H_0:\mu_x=\mu_0$$

against the alternative:

$$H_1:\mu_x\ne\mu_0$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda=n\log\frac{|\widehat{\Sigma}_0|}{|\widehat{\Sigma}_x|}$$

where:

$$\widehat{\Sigma}_0=\frac{1}{n}\sum_{i=1}^{n}(x_{i\cdot}-\mu_0)(x_{i\cdot}-\mu_0)'$$

This statistic has an asymptotic $\chi^2$ distribution with $p$ degrees of freedom.

Illustrative example (I)

Consider the daily log-returns (in percentages) of four major European stock indices: Germany (DAX), Switzerland (SMI), France (CAC) and UK (FTSE), from 1991 to 1998.

We want to test the null hypothesis that the mean vector of returns is zero (assuming Gaussianity).

The estimated mean vector is given by:

$$\overline{x}=(0.065,\ 0.081,\ 0.043,\ 0.043)'$$

The estimated covariance matrix under $H_0$ is given by:

$$\widehat{\Sigma}_0=\begin{pmatrix}1.064 & 0.674 & 0.836 & 0.526\\ 0.674 & 0.861 & 0.631 & 0.433\\ 0.836 & 0.631 & 1.217 & 0.570\\ 0.526 & 0.433 & 0.570 & 0.634\end{pmatrix}$$


The estimated covariance matrix under $H_1$ is given by:

$$\widehat{\Sigma}_x=\begin{pmatrix}1.060 & 0.669 & 0.834 & 0.523\\ 0.669 & 0.855 & 0.628 & 0.430\\ 0.834 & 0.628 & 1.216 & 0.569\\ 0.523 & 0.430 & 0.569 & 0.632\end{pmatrix}$$

The value of the statistic is λ = 11.70 with associated p-value 0.0196.

Thus, we reject $H_0$ at the 5% significance level, but we cannot reject $H_0$ at the 1% significance level.

Hypothesis testing

Given a sample from a population $N(\mu_x,\Sigma_x)$, we want to test the hypothesis:

$$H_0:\Sigma_x=\Sigma_0$$

against the alternative:

$$H_1:\Sigma_x\ne\Sigma_0$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda=n\log\frac{|\Sigma_0|}{|\widehat{\Sigma}_x|}+n\,\mathrm{Tr}\left(\Sigma_0^{-1}\,\widehat{\Sigma}_x\right)-np$$

which has an asymptotic $\chi^2$ distribution with $p(p+1)/2$ degrees of freedom.


It is also of interest to know whether Σx is diagonal, in which case the univariate variables are independent.

In this case, we gain nothing from analyzing them jointly since they have no information in common.

Then, we test:

$$H_0:\Sigma_x\ \text{diagonal}$$

against the alternative:

$$H_1:\Sigma_x\ \text{unrestricted}$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda=-n\log|R_x|$$

where $R_x$ is the sample correlation matrix. This statistic has an asymptotic $\chi^2$ distribution with $p(p-1)/2$ degrees of freedom.
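A sketch of this independence test (illustrative correlated data):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=400)
n, p = X.shape

R = np.corrcoef(X, rowvar=False)           # sample correlation matrix
lam = -n * np.log(np.linalg.det(R))
print(lam, chi2.sf(lam, df=p * (p - 1) / 2))
```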

Illustrative example (I)

Consider again the daily log-returns (in percentages) of four major European stock indices.

We test the null hypothesis of independence (assuming Gaussianity).

The estimated correlation matrix is given by:

$$R=\begin{pmatrix}1 & 0.703 & 0.734 & 0.639\\ 0.703 & 1 & 0.616 & 0.584\\ 0.734 & 0.616 & 1 & 0.648\\ 0.639 & 0.584 & 0.648 & 1\end{pmatrix}$$

The value of the statistic is λ = 4071.87 with associated p-value 0.

Thus, we reject H0 at the usual significance levels.

Hypothesis testing

Assume that we have observed a sample of size $n$ of a $p$-dimensional variable $x=(x_1,\ldots,x_p)'$ that can be split into $G$ groups, so that there are $n_1$ observations of group 1, and so on.

Our goal here is to check whether the means of the G groups are equal or not assuming Gaussianity and that the covariance matrix Σx is the same for all the groups.

Then, the hypothesis to be tested is:

$$H_0:\mu_1=\cdots=\mu_G=\mu_x$$

and the alternative hypothesis is:

$$H_1:\text{not all the }\mu_g\text{ are equal}$$

This problem is known as the multivariate analysis of variance (MANOVA).


The likelihood ratio test method leads to the statistic:

$$\lambda=n\log\frac{|\widehat{\Sigma}_x|}{|S_W|}$$

where $\widehat{\Sigma}_x$ is the MLE of $\Sigma_x$ under Gaussianity, and $S_W=W/n$, where:

$$W=\sum_{g=1}^{G}\sum_{i=1}^{n_g}(x_{ig}-\overline{x}_g)(x_{ig}-\overline{x}_g)'$$

Here $x_{ig}$ is the $i$-th observation in group $g$ and $\overline{x}_g$ is the sample mean vector of the observations in group $g$.

$W$ is usually called the within-groups variability matrix, or the matrix of deviations with respect to the means of each group.


The statistic $\lambda$ has an asymptotic $\chi^2_{p(G-1)}$ distribution. However, this approximation can be improved for small sample sizes.

For instance, the statistic:

$$\lambda'=m\log\frac{|\widehat{\Sigma}_x|}{|S_W|}$$

where $m=(n-1)-(p+G)/2$, also asymptotically follows a $\chi^2_{p(G-1)}$ distribution.
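A sketch of $\lambda$ and the corrected statistic $\lambda'$ on simulated groups (illustrative data with slightly different group means):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
groups = [rng.multivariate_normal(m, np.eye(2), size=50)
          for m in ([0, 0], [0.5, 0], [0, 0.5])]
X = np.vstack(groups)
n, p, G = len(X), X.shape[1], len(groups)

T = (X - X.mean(0)).T @ (X - X.mean(0))    # total variability, T = n * Sigma-hat
W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)

lam = n * np.log(np.linalg.det(T / n) / np.linalg.det(W / n))
m = (n - 1) - (p + G) / 2
lam_prime = m * np.log(np.linalg.det(T / n) / np.linalg.det(W / n))
print(lam, lam_prime, chi2.sf(lam_prime, df=p * (G - 1)))
```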


The test statistic λ can be derived in an alternative way.

Let:

$$T=n\widehat{\Sigma}_x=\sum_{g=1}^{G}\sum_{i=1}^{n_g}(x_{ig}-\overline{x})(x_{ig}-\overline{x})'$$

be the total variability of the data, which measures the deviations with respect to a common mean.

The matrix T can be decomposed as the sum of two matrices.


The first one is the matrix W which has been defined previously.

The second one measures the between-groups variability, explained by the differences between means, and we will denote it by $B$:

$$B=\sum_{g=1}^{G}n_g\,(\overline{x}_g-\overline{x})(\overline{x}_g-\overline{x})'$$

Therefore, we can write:

$$T\ \text{(Total variability)}=B\ \text{(Explained variability)}+W\ \text{(Residual variability)}$$


In order to test whether the means are equal, we can compare the sizes of the matrices $T$ and $B$.

One idea is to measure the size of a matrix by its determinant.

Then, we can propose a test based on the ratio $|T|/|W|$.

In particular, we can use the likelihood ratio test statistic $\lambda$, as well as the statistic $\lambda'$, described before.

Illustrative example (II)

We consider the Iris dataset, consisting of five variables measured on 150 flowers.

There are 50 flowers of each species:

- $x_1$: length of the sepal (in cm).
- $x_2$: width of the sepal (in cm).
- $x_3$: length of the petal (in cm).
- $x_4$: width of the petal (in cm).
- $x_5$: species (setosa, versicolor and virginica).

The next slide shows the scatterplot matrix of the dataset.


[Figure: scatterplot matrix of the Iris dataset (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]


We test the equality of means for the 3 groups of the Iris dataset.

The means of the 3 groups are given by:

$$\overline{x}_1=\begin{pmatrix}5.006\\ 3.428\\ 1.462\\ 0.246\end{pmatrix},\qquad \overline{x}_2=\begin{pmatrix}5.936\\ 2.770\\ 4.260\\ 1.326\end{pmatrix},\qquad \overline{x}_3=\begin{pmatrix}6.588\\ 2.974\\ 5.552\\ 2.026\end{pmatrix}$$

The value of the statistic λ is 563.00 with associated p-value 0.

Thus, we reject $H_0$ at any reasonable significance level.

On the other hand, the value of the statistic $\lambda'$ is 544.23, with associated p-value 0.

Thus, we also reject H0 with this statistic.

Chapter outline

We are ready now for:

Chapter 3: Principal Component Analysis
