
Appendix A: Some probability and statistics

A.1 Probability, random variables and their distribution

We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X, Y, Z, etc., and their probability distributions, defined by the cumulative distribution function (CDF) $F_X(x) = P(X \le x)$, etc.

To each random experiment we associate a sample space $\Omega$ that contains all the outcomes that can occur in the experiment. A random variable, X, is a function defined on the sample space $\Omega$. To each outcome $\omega \in \Omega$ it assigns a real number $X(\omega)$ that represents the value of a numeric quantity that can be measured in the experiment. For X to be called a random variable, the probability $P(X \le x)$ has to be defined for all real x.

Distributions and moments

A random variable with cumulative distribution function $F_X(x)$ can be discrete or continuous, with probability function $p_X(k)$ and probability density function $f_X(x)$, respectively, such that

$$F_X(x) = P(X \le x) = \begin{cases} \sum_{k \le x} p_X(k), & \text{if } X \text{ takes only integer values,} \\[4pt] \int_{-\infty}^{x} f_X(y)\,dy. \end{cases}$$

A distribution can also be of mixed type. The distribution function is then an integral plus discrete jumps; see Appendix B. The expectation of a random variable X is defined as the center of gravity in the distribution,

$$E[X] = m_X = \begin{cases} \sum_k k\, p_X(k), \\[4pt] \int_{-\infty}^{\infty} x\, f_X(x)\,dx. \end{cases}$$

The variance is a simple measure of the spreading of the distribution and is

defined as

$$V[X] = E[(X - m_X)^2] = E[X^2] - m_X^2 = \begin{cases} \sum_k (k - m_X)^2\, p_X(k), \\[4pt] \int_{-\infty}^{\infty} (x - m_X)^2 f_X(x)\,dx. \end{cases}$$

Chebyshev's inequality states that, for all $\varepsilon > 0$,

$$P(|X - m_X| > \varepsilon) \le \frac{E[(X - m_X)^2]}{\varepsilon^2}.$$

In order to describe the statistical properties of a random function one needs the notion of multivariate distributions. If the result of a random experiment is described by two different quantities, denoted by X and Y, e.g. the length and height of a randomly chosen individual in a population, or the values of a random function at two different time points, one has to deal with a two-dimensional random variable. This is described by its two-dimensional distribution function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$, or the corresponding two-dimensional probability or density function,

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.$$

Two random variables X, Y are independent if, for all x, y,

$$F_{X,Y}(x,y) = F_X(x)\, F_Y(y).$$

An important concept is the covariance between two random variables X and Y, defined as

$$C[X,Y] = E[(X - m_X)(Y - m_Y)] = E[XY] - m_X m_Y.$$

The correlation coefficient is the dimensionless, normalized covariance,¹

$$\rho[X,Y] = \frac{C[X,Y]}{\sqrt{V[X]\,V[Y]}}.$$

Two random variables with zero correlation, $\rho[X,Y] = 0$, are called uncorrelated. Note that if two random variables X and Y are independent, then they are also uncorrelated, but the reverse does not hold: two uncorrelated variables can very well be dependent.
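A concrete illustration of the last point (a simulation sketch of ours, not part of the text; it assumes NumPy): take $X \sim N(0,1)$ and $Y = X^2$. Then $C[X,Y] = E[X^3] = 0$, so the pair is uncorrelated, although Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # X ~ N(0,1)
y = x**2                           # Y = X^2 is a function of X

# Sample correlation is (numerically) zero, although X and Y are dependent:
print(np.corrcoef(x, y)[0, 1])            # ~ 0.00
# The dependence is visible in conditional behavior, e.g. E[Y | |X| > 2]:
print(y.mean(), y[np.abs(x) > 2].mean())  # ~ 1.0 versus well above 1
```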

¹ Speaking about the "correlation between two random quantities", one often means the degree of covariation between the two. However, one has to remember that the correlation only measures the degree of linear covariation. Another meaning of the term correlation is used in connection with two series, $(x_1,\ldots,x_n)$ and $(y_1,\ldots,y_n)$. Then sometimes the sum of products, $\sum_{1}^{n} x_k y_k$, can be called the "correlation", and a device that produces this sum is called a "correlator".

Conditional distributions

If X and Y are two random variables with bivariate density function $f_{X,Y}(x,y)$, we can define the conditional distribution for X given Y = y by the conditional density,

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)},$$

for every y where the marginal density $f_Y(y)$ is non-zero. The expectation in this distribution, the conditional expectation, is a function of y, and is denoted and defined as

$$E[X \mid Y = y] = \int_x x\, f_{X|Y=y}(x)\,dx = m(y).$$

The conditional variance is defined as

$$V[X \mid Y = y] = \int_x (x - m(y))^2 f_{X|Y=y}(x)\,dx = \sigma^2_{X|Y}(y).$$

The unconditional expectation of X can be obtained from the conditional expectations, and computed as

$$E[X] = \int_y \left( \int_x x\, f_{X|Y=y}(x)\,dx \right) f_Y(y)\,dy = E[\,E[X \mid Y]\,].$$

The unconditional variance of X is given by

$$V[X] = E[\,V[X \mid Y]\,] + V[\,E[X \mid Y]\,].$$
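These two identities are easy to check by simulation. The following sketch (our own construction, assuming NumPy) uses the hierarchy $Y \sim N(0,1)$ and $X \mid Y = y \sim N(2y, 1)$, for which $E[X] = E[E[X \mid Y]] = 0$ and $V[X] = E[V[X \mid Y]] + V[E[X \mid Y]] = 1 + 4 = 5$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
y = rng.standard_normal(n)           # Y ~ N(0,1)
x = 2 * y + rng.standard_normal(n)   # X | Y=y ~ N(2y, 1)

print(x.mean())   # ~ 0 = E[E[X|Y]]
print(x.var())    # ~ 5 = E[V[X|Y]] + V[E[X|Y]] = 1 + 4
```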

A.2 The multidimensional normal distribution

A one-dimensional normal random variable X with expectation m and variance $\sigma^2$ has probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2}\, \frac{(x - m)^2}{\sigma^2} \right\},$$

and we write $X \sim N(m, \sigma^2)$. If m = 0 and $\sigma = 1$, the normal distribution is standardized. If $X \sim N(0,1)$, then $\sigma X + m \sim N(m, \sigma^2)$, and if $X \sim N(m, \sigma^2)$ then $(X - m)/\sigma \sim N(0,1)$. We accept a constant random variable, $X \equiv m$, as a normal variable, $X \sim N(m, 0)$.
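As a quick numerical sanity check (a sketch of ours, assuming NumPy and SciPy), the formula can be evaluated directly and compared with the library density; it should also integrate to 1.

```python
import numpy as np
from scipy.stats import norm

m, sigma = 1.5, 2.0
x = np.linspace(m - 8 * sigma, m + 8 * sigma, 10_001)
f = np.exp(-0.5 * ((x - m) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma**2)

print(np.max(np.abs(f - norm.pdf(x, loc=m, scale=sigma))))  # ~ 0
print(f.sum() * (x[1] - x[0]))                              # ~ 1.0 (Riemann sum)
```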

Now, let $X_1,\ldots,X_n$ have expectations $m_k = E[X_k]$ and covariances $\sigma_{jk} = C[X_j,X_k]$, and define (with $'$ for transpose),

$$\boldsymbol\mu = (m_1,\ldots,m_n)',$$

$$\boldsymbol\Sigma = (\sigma_{jk}) = \text{the covariance matrix for } X_1,\ldots,X_n.$$

It is a characteristic property of the normal distribution that every linear combination of the components of a multivariate normal variable also has a normal distribution. To formulate the definition, write $\mathbf{a} = (a_1,\ldots,a_n)'$ and $\mathbf{X} = (X_1,\ldots,X_n)'$, with $\mathbf{a}'\mathbf{X} = a_1 X_1 + \cdots + a_n X_n$. Then,

$$E[\mathbf{a}'\mathbf{X}] = \mathbf{a}'\boldsymbol\mu = \sum_{j=1}^{n} a_j m_j, \qquad V[\mathbf{a}'\mathbf{X}] = \mathbf{a}'\boldsymbol\Sigma\,\mathbf{a} = \sum_{j,k} a_j a_k \sigma_{jk}. \tag{A.1}$$

Definition A.1. The random variables $X_1,\ldots,X_n$ are said to have an n-dimensional normal distribution if every linear combination $a_1 X_1 + \cdots + a_n X_n$ has a normal distribution. From (A.1) we have that $\mathbf{X} = (X_1,\ldots,X_n)'$ is n-dimensional normal if and only if $\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol\mu,\, \mathbf{a}'\boldsymbol\Sigma\,\mathbf{a})$ for all $\mathbf{a} = (a_1,\ldots,a_n)'$.

Obviously, $X_k = 0 \cdot X_1 + \cdots + 1 \cdot X_k + \cdots + 0 \cdot X_n$ is normal, i.e., all marginal distributions in an n-dimensional normal distribution are one-dimensional normal. However, the reverse is not necessarily true; there are variables $X_1,\ldots,X_n$, each of which is one-dimensional normal, while the vector $(X_1,\ldots,X_n)'$ is not n-dimensional normal. It is an important consequence of the definition that sums and differences of n-dimensional normal variables have a normal distribution.
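A standard counterexample (our sketch, assuming NumPy): let $X \sim N(0,1)$ and $Y = SX$, where S is an independent random sign. Both marginals are N(0,1), but X + Y equals 0 with probability 1/2, which no normal distribution allows, so $(X,Y)'$ is not two-dimensional normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of x
y = s * x                             # Y is also N(0,1), by symmetry

# The linear combination X + Y is 0 exactly when s = -1:
print(np.mean(x + y == 0.0))          # ~ 0.5, so (X, Y) is not bivariate normal
```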

If the covariance matrix $\boldsymbol\Sigma$ is non-singular, the n-dimensional normal distribution has a probability density function (with $\mathbf{x} = (x_1,\ldots,x_n)'$)

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \boldsymbol\Sigma}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) \right\}. \tag{A.2}$$

The distribution is then said to be non-singular. The density (A.2) is constant on every ellipsoid $(\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = C$ in $\mathbb{R}^n$.
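To make (A.2) concrete, the following sketch of ours evaluates the formula directly and compares it with scipy.stats.multivariate_normal (the test point and parameters are arbitrary choices).

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

def mvn_pdf(x, mu, Sigma):
    """The density (A.2), evaluated directly."""
    n = len(mu)
    d = x - mu
    q = d @ np.linalg.inv(Sigma) @ d   # (x - mu)' Sigma^{-1} (x - mu)
    return np.exp(-0.5 * q) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

x = np.array([0.3, -1.2])
print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```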

Note: the density function of an n-dimensional normal distribution is uniquely determined by the expectations and covariances.

Example A.1. Suppose $X_1, X_2$ have a two-dimensional normal distribution. If

$$\det \boldsymbol\Sigma = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0,$$

then $\boldsymbol\Sigma$ is non-singular, and

$$\boldsymbol\Sigma^{-1} = \frac{1}{\det \boldsymbol\Sigma} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}.$$

With $Q(x_1,x_2) = (\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)$,

$$\begin{aligned} Q(x_1,x_2) &= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left( (x_1 - m_1)^2 \sigma_{22} - 2 (x_1 - m_1)(x_2 - m_2) \sigma_{12} + (x_2 - m_2)^2 \sigma_{11} \right) \\[4pt] &= \frac{1}{1 - \rho^2} \left( \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right)^2 - 2\rho\, \frac{x_1 - m_1}{\sqrt{\sigma_{11}}}\, \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} + \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right)^2 \right), \end{aligned}$$

where we also used the correlation coefficient $\rho = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}$, and

$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}} \exp\left\{ -\frac{1}{2} Q(x_1,x_2) \right\}. \tag{A.3}$$

For variables with $m_1 = m_2 = 0$ and $\sigma_{11} = \sigma_{22} = \sigma^2$, the bivariate density is

$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma^2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2\sigma^2(1 - \rho^2)} \left( x_1^2 - 2\rho\, x_1 x_2 + x_2^2 \right) \right\}.$$

We see that, if $\rho = 0$, this is the density of two independent normal variables.

Figure A.1 shows the density function $f_{X_1,X_2}(x_1,x_2)$ and level curves for a bivariate normal distribution with expectation $\boldsymbol\mu = (0,0)'$ and covariance matrix

$$\boldsymbol\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$

The correlation coefficient is $\rho = 0.5$.

Remark A.1. If the covariance matrix $\boldsymbol\Sigma$ is singular, and hence non-invertible, then there exists at least one choice of constants $a_1,\ldots,a_n$, not all equal to 0, such that $\mathbf{a}'\boldsymbol\Sigma\,\mathbf{a} = 0$. From (A.1) it follows that $V[\mathbf{a}'\mathbf{X}] = 0$, which means that $\mathbf{a}'\mathbf{X}$ is constant, equal to $\mathbf{a}'\boldsymbol\mu$. The distribution of $\mathbf{X}$ is then concentrated on a hyperplane $\mathbf{a}'\mathbf{x} = \text{constant}$ in $\mathbb{R}^n$. The distribution is said to be singular and it has no density function in $\mathbb{R}^n$.
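A tiny numerical illustration of the singular case (our sketch, assuming NumPy): with $X_2 = -X_1$, the covariance matrix has determinant 0, and $\mathbf{a} = (1,1)'$ gives $V[\mathbf{a}'\mathbf{X}] = 0$; all mass lies on the line $x_1 + x_2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(10_000)
x2 = -x1                              # X2 = -X1: perfectly anti-correlated

Sigma = np.cov(np.vstack([x1, x2]))
print(np.linalg.det(Sigma))           # ~ 0: singular covariance matrix
print(np.var(x1 + x2))                # exactly 0: a'X is constant for a = (1,1)'
```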


Figure A.1 Two-dimensional normal density. Left: density function; Right: elliptic level curves at levels 0.01, 0.02, 0.05, 0.1, 0.15.

Remark A.2. Formula (A.1) implies that every covariance matrix $\boldsymbol\Sigma$ is positive definite or positive semi-definite, i.e., $\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$ for all $a_1,\ldots,a_n$. Conversely, if $\boldsymbol\Sigma$ is a symmetric, positive definite matrix of size $n \times n$, i.e., if $\sum_{j,k} a_j a_k \sigma_{jk} > 0$ for all $(a_1,\ldots,a_n) \ne (0,\ldots,0)$, then (A.2) defines the density function for an n-dimensional normal distribution with expectations $m_k$ and covariances $\sigma_{jk}$. Every symmetric, positive definite matrix is thus a covariance matrix for a non-singular normal distribution. Furthermore, for every symmetric, positive semi-definite matrix $\boldsymbol\Sigma$, i.e., such that

$$\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$$

for all $a_1,\ldots,a_n$, with equality holding for some choice of $(a_1,\ldots,a_n) \ne (0,\ldots,0)$, there exists an n-dimensional normal distribution that has $\boldsymbol\Sigma$ as its covariance matrix. For n-dimensional normal variables, "uncorrelated" and "independent" are equivalent.

Theorem A.1. If the random variables $X_1,\ldots,X_n$ are n-dimensional normal and uncorrelated, then they are independent.


Proof. We show the theorem only for non-singular variables with density function; it is true also for singular normal variables.

If $X_1,\ldots,X_n$ are uncorrelated, $\sigma_{jk} = 0$ for $j \ne k$, then $\boldsymbol\Sigma$, and also $\boldsymbol\Sigma^{-1}$, are diagonal matrices, i.e. (note that $\sigma_{jj} = V[X_j]$),

$$\det \boldsymbol\Sigma = \prod_j \sigma_{jj}, \qquad \boldsymbol\Sigma^{-1} = \begin{pmatrix} \sigma_{11}^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{nn}^{-1} \end{pmatrix}.$$

This means that $(\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = \sum_j (x_j - m_j)^2 / \sigma_{jj}$, and the density (A.2) is

$$\prod_j \frac{1}{\sqrt{2\pi\sigma_{jj}}} \exp\left\{ -\frac{(x_j - m_j)^2}{2\sigma_{jj}} \right\}.$$

Hence, the joint density function for $X_1,\ldots,X_n$ is a product of the marginal densities, which says that the variables are independent. ∎

A.2.1 Conditional normal distribution

This section deals with partial observations in a multivariate normal distribution. It is a special property of this distribution that conditioning on observed values of a subset of the variables leads to a conditional distribution for the unobserved variables that is also normal. Furthermore, the expectation in the conditional distribution is linear in the observations, and the covariance matrix does not depend on the observed values. This property is particularly useful in prediction of Gaussian processes, as formulated by the Kalman filter.

Conditioning in the bivariate normal distribution

Let X and Y have a bivariate normal distribution with expectations $m_X$ and $m_Y$, variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, and with correlation coefficient $\rho = C[X,Y]/(\sigma_X \sigma_Y)$. The simultaneous density function is given by (A.3). The conditional density function for X given that Y = y is

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X \sqrt{1 - \rho^2}\, \sqrt{2\pi}} \exp\left\{ -\frac{\bigl( x - (m_X + \sigma_X \rho (y - m_Y)/\sigma_Y) \bigr)^2}{2 \sigma_X^2 (1 - \rho^2)} \right\}.$$

Hence, the conditional distribution of X given Y = y is normal with expectation and variance

$$m_{X|Y=y} = m_X + \sigma_X \rho\, (y - m_Y)/\sigma_Y, \qquad \sigma^2_{X|Y=y} = \sigma_X^2 (1 - \rho^2).$$

Note: the conditional expectation depends linearly on the observed y-value, and the conditional variance is constant, independent of Y = y.
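A simulation sketch of ours (assuming NumPy; the parameter values are arbitrary) checks the two formulas by keeping the pairs whose Y-value falls close to a fixed y and comparing empirical conditional moments with the expressions above.

```python
import numpy as np

rng = np.random.default_rng(0)
mX, mY, sX, sY, rho = 1.0, -2.0, 2.0, 0.5, 0.8
cov = [[sX**2, rho * sX * sY],
       [rho * sX * sY, sY**2]]
x, y = rng.multivariate_normal([mX, mY], cov, size=2_000_000).T

y0 = -1.5
sel = np.abs(y - y0) < 0.01                             # condition on Y ~ y0
print(x[sel].mean(), mX + sX * rho * (y0 - mY) / sY)    # empirical vs formula
print(x[sel].var(), sX**2 * (1 - rho**2))               # variance free of y0
```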

Conditioning in the multivariate normal distribution

Let $\mathbf{X} = (X_1,\ldots,X_n)'$ and $\mathbf{Y} = (Y_1,\ldots,Y_m)'$ be two multivariate normal variables, of dimension n and m, respectively, such that $\mathbf{Z} = (X_1,\ldots,X_n,Y_1,\ldots,Y_m)'$ is (n+m)-dimensional normal. Denote the expectations

$$E[\mathbf{X}] = \mathbf{m}_X, \qquad E[\mathbf{Y}] = \mathbf{m}_Y,$$

and partition the covariance matrix for $\mathbf{Z}$ (with $\boldsymbol\Sigma_{XY} = \boldsymbol\Sigma_{YX}'$),

$$\boldsymbol\Sigma = \mathrm{Cov}\!\begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} = \begin{pmatrix} \boldsymbol\Sigma_{XX} & \boldsymbol\Sigma_{XY} \\ \boldsymbol\Sigma_{YX} & \boldsymbol\Sigma_{YY} \end{pmatrix}. \tag{A.4}$$

If the covariance matrix $\boldsymbol\Sigma$ is positive definite, the distribution of $(\mathbf{X}, \mathbf{Y})$ has the density function

$$f_{\mathbf{X}\mathbf{Y}}(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{(m+n)/2} \sqrt{\det \boldsymbol\Sigma}}\, \exp\left\{ -\frac{1}{2} (\mathbf{x}' - \mathbf{m}_X',\, \mathbf{y}' - \mathbf{m}_Y')\, \boldsymbol\Sigma^{-1} \begin{pmatrix} \mathbf{x} - \mathbf{m}_X \\ \mathbf{y} - \mathbf{m}_Y \end{pmatrix} \right\},$$

while the m-dimensional density of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{m/2} \sqrt{\det \boldsymbol\Sigma_{YY}}}\, \exp\left\{ -\frac{1}{2} (\mathbf{y} - \mathbf{m}_Y)' \boldsymbol\Sigma_{YY}^{-1} (\mathbf{y} - \mathbf{m}_Y) \right\}.$$

To find the conditional density of $\mathbf{X}$ given that $\mathbf{Y} = \mathbf{y}$,

$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x} \mid \mathbf{y}) = \frac{f_{\mathbf{Y}\mathbf{X}}(\mathbf{y},\mathbf{x})}{f_{\mathbf{Y}}(\mathbf{y})}, \tag{A.5}$$

we need the following matrix property.

Theorem A.2 ("Matrix inversion lemma"). Let $\mathbf{B}$ be a $p \times p$ matrix ($p = n + m$):

$$\mathbf{B} = \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{pmatrix},$$

where the sub-matrices have dimension $n \times n$, $n \times m$, etc. Suppose $\mathbf{B}, \mathbf{B}_{11}, \mathbf{B}_{22}$ are non-singular, and partition the inverse in the same way as $\mathbf{B}$,

$$\mathbf{A} = \mathbf{B}^{-1} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix}.$$

Then

$$\mathbf{A} = \begin{pmatrix} (\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21})^{-1} & -(\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21})^{-1}\mathbf{B}_{12}\mathbf{B}_{22}^{-1} \\[4pt] -(\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12})^{-1}\mathbf{B}_{21}\mathbf{B}_{11}^{-1} & (\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12})^{-1} \end{pmatrix}.$$

Proof. For the proof, see a matrix theory textbook, for example, [22]. ∎
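The lemma is easy to verify numerically. A sketch of ours (assuming NumPy; block sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3
M = rng.standard_normal((n + m, n + m))
B = M @ M.T + (n + m) * np.eye(n + m)   # symmetric positive definite, so invertible
B11, B12 = B[:n, :n], B[:n, n:]
B21, B22 = B[n:, :n], B[n:, n:]

inv = np.linalg.inv
A11 = inv(B11 - B12 @ inv(B22) @ B21)
A12 = -A11 @ B12 @ inv(B22)
A21 = -inv(B22 - B21 @ inv(B11) @ B12) @ B21 @ inv(B11)
A22 = inv(B22 - B21 @ inv(B11) @ B12)
A = np.block([[A11, A12], [A21, A22]])

print(np.max(np.abs(A - inv(B))))       # ~ 1e-15: the partitioned inverse agrees
```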

Theorem A.3 ("Conditional normal distribution"). The conditional distribution for $\mathbf{X}$, given that $\mathbf{Y} = \mathbf{y}$, is n-dimensional normal with expectation and covariance matrix

$$E[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}] = \mathbf{m}_{X|Y=y} = \mathbf{m}_X + \boldsymbol\Sigma_{XY} \boldsymbol\Sigma_{YY}^{-1} (\mathbf{y} - \mathbf{m}_Y), \tag{A.6}$$

$$C[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}] = \boldsymbol\Sigma_{XX|Y} = \boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY} \boldsymbol\Sigma_{YY}^{-1} \boldsymbol\Sigma_{YX}. \tag{A.7}$$

These formulas are easy to remember: the dimensions of the sub-matrices, for example in the covariance matrix $\boldsymbol\Sigma_{XX|Y}$, are the only ones possible for the matrix multiplications in the right hand side to be meaningful.

Proof. To simplify calculations, we start with $\mathbf{m}_X = \mathbf{m}_Y = \mathbf{0}$, and add the expectations afterwards. The conditional distribution of $\mathbf{X}$ given that $\mathbf{Y} = \mathbf{y}$ is, according to (A.5), the ratio between two multivariate normal densities, and hence it is of the form

$$c\, \exp\left\{ -\frac{1}{2} (\mathbf{x}', \mathbf{y}')\, \boldsymbol\Sigma^{-1} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} + \frac{1}{2}\, \mathbf{y}' \boldsymbol\Sigma_{YY}^{-1} \mathbf{y} \right\} = c\, \exp\left\{ -\frac{1}{2} Q(\mathbf{x},\mathbf{y}) \right\},$$

where c is a normalization constant, independent of $\mathbf{x}$ and $\mathbf{y}$. The matrix $\boldsymbol\Sigma$ can be partitioned as in (A.4), and if we use the matrix inversion lemma, we find that

$$\boldsymbol\Sigma^{-1} = \mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \tag{A.8}$$

$$= \begin{pmatrix} (\boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1}\boldsymbol\Sigma_{YX})^{-1} & -(\boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1}\boldsymbol\Sigma_{YX})^{-1}\boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1} \\[4pt] -(\boldsymbol\Sigma_{YY} - \boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1}\boldsymbol\Sigma_{XY})^{-1}\boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1} & (\boldsymbol\Sigma_{YY} - \boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1}\boldsymbol\Sigma_{XY})^{-1} \end{pmatrix}.$$

We also see that

$$\begin{aligned} Q(\mathbf{x},\mathbf{y}) &= \mathbf{x}'\mathbf{A}_{11}\mathbf{x} + \mathbf{x}'\mathbf{A}_{12}\mathbf{y} + \mathbf{y}'\mathbf{A}_{21}\mathbf{x} + \mathbf{y}'\mathbf{A}_{22}\mathbf{y} - \mathbf{y}'\boldsymbol\Sigma_{YY}^{-1}\mathbf{y} \\ &= (\mathbf{x}' - \mathbf{y}'\mathbf{C}')\,\mathbf{A}_{11}\,(\mathbf{x} - \mathbf{C}\mathbf{y}) + \widetilde{Q}(\mathbf{y}) \\ &= \mathbf{x}'\mathbf{A}_{11}\mathbf{x} - \mathbf{x}'\mathbf{A}_{11}\mathbf{C}\mathbf{y} - \mathbf{y}'\mathbf{C}'\mathbf{A}_{11}\mathbf{x} + \mathbf{y}'\mathbf{C}'\mathbf{A}_{11}\mathbf{C}\mathbf{y} + \widetilde{Q}(\mathbf{y}), \end{aligned}$$

for some matrix $\mathbf{C}$ and some quadratic form $\widetilde{Q}(\mathbf{y})$ in $\mathbf{y}$.

Here $\mathbf{A}_{11} = \boldsymbol\Sigma_{XX|Y}^{-1}$, according to (A.7) and (A.8), while we can find $\mathbf{C}$ by solving

$$\mathbf{A}_{11}\mathbf{C} = -\mathbf{A}_{12}, \quad \text{i.e.} \quad \mathbf{C} = -\mathbf{A}_{11}^{-1}\mathbf{A}_{12} = \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1},$$

according to (A.8). This is precisely the matrix in (A.6).

If we reinstate the subtracted $\mathbf{m}_X$ and $\mathbf{m}_Y$, we get the conditional density for $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ in the form

$$c\, \exp\left\{ -\frac{1}{2} Q(\mathbf{x},\mathbf{y}) \right\} = c\, \exp\left\{ -\frac{1}{2} (\mathbf{x}' - \mathbf{m}_{X|Y=y}')\, \boldsymbol\Sigma_{XX|Y}^{-1}\, (\mathbf{x} - \mathbf{m}_{X|Y=y}) \right\},$$

which is the normal density we were looking for. ∎
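To tie the pieces together, a short sketch of ours (assuming NumPy; the covariance matrix and observed y are arbitrary) computes (A.6) and (A.7) for a partitioned covariance matrix and confirms that $\boldsymbol\Sigma_{XX|Y}^{-1}$ coincides with the block $\mathbf{A}_{11}$ of $\boldsymbol\Sigma^{-1}$, as used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 2
L = rng.standard_normal((n + m, n + m))
Sigma = L @ L.T + np.eye(n + m)        # positive definite covariance matrix
Sxx, Sxy = Sigma[:n, :n], Sigma[:n, n:]
Syx, Syy = Sigma[n:, :n], Sigma[n:, n:]
mx, my = np.zeros(n), np.zeros(m)

y = np.array([0.7, -0.3])              # observed value of Y
m_cond = mx + Sxy @ np.linalg.inv(Syy) @ (y - my)   # (A.6)
S_cond = Sxx - Sxy @ np.linalg.inv(Syy) @ Syx       # (A.7)

A11 = np.linalg.inv(Sigma)[:n, :n]
print(np.max(np.abs(np.linalg.inv(S_cond) - A11)))  # ~ 0: A11 = (Sigma_XX|Y)^{-1}
print(m_cond)                                       # conditional expectation
```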

A.2.2 Complex normal variables

In most of the book, we have assumed all random variables to be real valued. In many applications, and also in the mathematical background, it is advantageous to consider complex random variables, simply defined as $Z = X + iY$, where X and Y have a bivariate distribution. The expectation of a complex variable is simply

$$E[Z] = E[\Re Z] + i\, E[\Im Z],$$

while the variance and covariances are defined with a complex conjugate on the second variable,

$$C[Z_1, Z_2] = E[Z_1 \overline{Z_2}] - m_{Z_1} \overline{m_{Z_2}},$$

$$V[Z] = C[Z,Z] = E[|Z|^2] - |m_Z|^2.$$

Note that, for a complex $Z = X + iY$ with $V[X] = V[Y] = \sigma^2$,

$$C[Z,Z] = V[X] + V[Y] = 2\sigma^2, \qquad C[Z, \overline{Z}] = V[X] - V[Y] + 2i\, C[X,Y] = 2i\, C[X,Y].$$

Hence, if the real and imaginary parts are uncorrelated with the same variance, then the complex variable Z is uncorrelated with its own complex conjugate, $\overline{Z}$. Often, one uses the term orthogonal, instead of uncorrelated, for complex variables.
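A final sketch of ours (assuming NumPy): with independent standard normal real and imaginary parts, the sample versions of the two covariances above behave as stated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # X, Y iid N(0,1)

# With zero means, C[Z, Z] = E[Z conj(Z)] and C[Z, conj(Z)] = E[Z Z]:
print(np.mean(z * np.conj(z)))   # ~ 2 = V[X] + V[Y]
print(np.mean(z * z))            # ~ 0: Z is orthogonal to its own conjugate
```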