
Appendix A: Some probability and statistics

A.1 Probability, random variables and their distribution

We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X, Y, Z, etc., and their probability distributions, defined by the cumulative distribution function (CDF) $F_X(x) = P(X \le x)$, etc.

To each random experiment we associate a sample space $\Omega$ that contains all the outcomes that can occur in the experiment. A random variable, X, is a function defined on the sample space $\Omega$. To each outcome $\omega \in \Omega$ it assigns a real number $X(\omega)$ that represents the value of a numeric quantity that can be measured in the experiment. For X to be called a random variable, the probability $P(X \le x)$ has to be defined for all real x.

Distributions and moments

A random variable with cumulative distribution function $F_X(x)$ can be discrete or continuous, with probability function $p_X(k)$ and probability density function $f_X(x)$, respectively, such that

$$F_X(x) = P(X \le x) = \begin{cases} \sum_{k \le x} p_X(k), & \text{if } X \text{ takes only integer values,} \\[4pt] \int_{-\infty}^{x} f_X(y)\,dy. \end{cases}$$

A distribution can also be of mixed type. The distribution function is then an integral plus discrete jumps; see Appendix B. The expectation of a random variable X is defined as the center of gravity in the distribution,

$$E[X] = m_X = \begin{cases} \sum_k k\, p_X(k), \\[4pt] \int_{-\infty}^{\infty} x\, f_X(x)\,dx. \end{cases}$$

The variance is a simple measure of the spreading of the distribution and is

defined as

$$V[X] = E[(X - m_X)^2] = E[X^2] - m_X^2 = \begin{cases} \sum_k (k - m_X)^2\, p_X(k), \\[4pt] \int_{-\infty}^{\infty} (x - m_X)^2 f_X(x)\,dx. \end{cases}$$

Chebyshev's inequality states that, for all $\varepsilon > 0$,

$$P(|X - m_X| > \varepsilon) \le \frac{E[(X - m_X)^2]}{\varepsilon^2}.$$

In order to describe the statistical properties of a random function one needs the notion of multivariate distributions. If the result of a random experiment is described by two different quantities, denoted by X and Y, e.g. the length and height of a randomly chosen individual in a population, or the values of a random function at two different time points, one has to deal with a two-dimensional random variable. This is described by its two-dimensional distribution function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$, or the corresponding two-dimensional probability or density function,

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.$$

Two random variables X, Y are independent if, for all x, y,

$$F_{X,Y}(x,y) = F_X(x)\, F_Y(y).$$

An important concept is the covariance between two random variables X and Y, defined as

$$C[X,Y] = E[(X - m_X)(Y - m_Y)] = E[XY] - m_X m_Y.$$

The correlation coefficient is the dimensionless, normalized covariance,¹

$$\rho[X,Y] = \frac{C[X,Y]}{\sqrt{V[X]\,V[Y]}}.$$

Two random variables with zero correlation, $\rho[X,Y] = 0$, are called uncorrelated. Note that if two random variables X and Y are independent, then they are also uncorrelated, but the reverse does not hold: two uncorrelated variables can very well be dependent.
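A concrete illustration of the last point (a simulation sketch of ours, not part of the text; it assumes NumPy): take $X \sim N(0,1)$ and $Y = X^2$. Then $C[X,Y] = E[X^3] = 0$, so the pair is uncorrelated, although Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # X ~ N(0,1)
y = x**2                           # Y = X^2 is a function of X

# Sample correlation is (numerically) zero, although X and Y are dependent:
print(np.corrcoef(x, y)[0, 1])            # ~ 0.00
# The dependence is visible in conditional behavior, e.g. E[Y | |X| > 2]:
print(y.mean(), y[np.abs(x) > 2].mean())  # ~ 1.0 versus well above 1
```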

¹ Speaking about the "correlation between two random quantities", one often means the degree of covariation between the two. However, one has to remember that the correlation only measures the degree of linear covariation. Another meaning of the term correlation is used in connection with two series, $(x_1,\ldots,x_n)$ and $(y_1,\ldots,y_n)$. Then sometimes the sum of products, $\sum_{1}^{n} x_k y_k$, can be called the "correlation", and a device that produces this sum is called a "correlator".

Conditional distributions

If X and Y are two random variables with bivariate density function $f_{X,Y}(x,y)$, we can define the conditional distribution for X given Y = y by the conditional density,

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)},$$

for every y where the marginal density $f_Y(y)$ is non-zero. The expectation in this distribution, the conditional expectation, is a function of y, and is denoted and defined as

$$E[X \mid Y = y] = \int_x x\, f_{X|Y=y}(x)\,dx = m(y).$$

The conditional variance is defined as

$$V[X \mid Y = y] = \int_x (x - m(y))^2 f_{X|Y=y}(x)\,dx = \sigma^2_{X|Y}(y).$$

The unconditional expectation of X can be obtained from the conditional expectations, and computed as

$$E[X] = \int_y \left( \int_x x\, f_{X|Y=y}(x)\,dx \right) f_Y(y)\,dy = E[\,E[X \mid Y]\,].$$

The unconditional variance of X is given by

$$V[X] = E[\,V[X \mid Y]\,] + V[\,E[X \mid Y]\,].$$
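These two identities are easy to check by simulation. The following sketch (our own construction, assuming NumPy) uses the hierarchy $Y \sim N(0,1)$ and $X \mid Y = y \sim N(2y, 1)$, for which $E[X] = E[E[X \mid Y]] = 0$ and $V[X] = E[V[X \mid Y]] + V[E[X \mid Y]] = 1 + 4 = 5$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
y = rng.standard_normal(n)           # Y ~ N(0,1)
x = 2 * y + rng.standard_normal(n)   # X | Y=y ~ N(2y, 1)

print(x.mean())   # ~ 0 = E[E[X|Y]]
print(x.var())    # ~ 5 = E[V[X|Y]] + V[E[X|Y]] = 1 + 4
```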

A.2 The multidimensional normal distribution

A one-dimensional normal random variable X with expectation m and variance $\sigma^2$ has probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2}\, \frac{(x - m)^2}{\sigma^2} \right\},$$

and we write $X \sim N(m, \sigma^2)$. If m = 0 and $\sigma = 1$, the normal distribution is standardized. If $X \sim N(0,1)$, then $\sigma X + m \sim N(m, \sigma^2)$, and if $X \sim N(m, \sigma^2)$ then $(X - m)/\sigma \sim N(0,1)$. We accept a constant random variable, $X \equiv m$, as a normal variable, $X \sim N(m, 0)$.
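As a quick numerical sanity check (a sketch of ours, assuming NumPy and SciPy), the formula can be evaluated directly and compared with the library density; it should also integrate to 1.

```python
import numpy as np
from scipy.stats import norm

m, sigma = 1.5, 2.0
x = np.linspace(m - 8 * sigma, m + 8 * sigma, 10_001)
f = np.exp(-0.5 * ((x - m) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma**2)

print(np.max(np.abs(f - norm.pdf(x, loc=m, scale=sigma))))  # ~ 0
print(f.sum() * (x[1] - x[0]))                              # ~ 1.0 (Riemann sum)
```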

Now, let $X_1,\ldots,X_n$ have expectations $m_k = E[X_k]$ and covariances $\sigma_{jk} = C[X_j,X_k]$, and define (with $'$ for transpose),

$$\boldsymbol\mu = (m_1,\ldots,m_n)',$$

$$\boldsymbol\Sigma = (\sigma_{jk}) = \text{the covariance matrix for } X_1,\ldots,X_n.$$

It is a characteristic property of the normal distribution that every linear combination of the components of a multivariate normal variable also has a normal distribution. To formulate the definition, write $\mathbf{a} = (a_1,\ldots,a_n)'$ and $\mathbf{X} = (X_1,\ldots,X_n)'$, with $\mathbf{a}'\mathbf{X} = a_1 X_1 + \cdots + a_n X_n$. Then,

$$E[\mathbf{a}'\mathbf{X}] = \mathbf{a}'\boldsymbol\mu = \sum_{j=1}^{n} a_j m_j, \qquad V[\mathbf{a}'\mathbf{X}] = \mathbf{a}'\boldsymbol\Sigma\,\mathbf{a} = \sum_{j,k} a_j a_k \sigma_{jk}. \tag{A.1}$$

Definition A.1. The random variables $X_1,\ldots,X_n$ are said to have an n-dimensional normal distribution if every linear combination $a_1 X_1 + \cdots + a_n X_n$ has a normal distribution. From (A.1) we have that $\mathbf{X} = (X_1,\ldots,X_n)'$ is n-dimensional normal if and only if $\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol\mu,\, \mathbf{a}'\boldsymbol\Sigma\,\mathbf{a})$ for all $\mathbf{a} = (a_1,\ldots,a_n)'$.

Obviously, $X_k = 0 \cdot X_1 + \cdots + 1 \cdot X_k + \cdots + 0 \cdot X_n$ is normal, i.e., all marginal distributions in an n-dimensional normal distribution are one-dimensional normal. However, the reverse is not necessarily true; there are variables $X_1,\ldots,X_n$, each of which is one-dimensional normal, while the vector $(X_1,\ldots,X_n)'$ is not n-dimensional normal. It is an important consequence of the definition that sums and differences of n-dimensional normal variables have a normal distribution.
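A standard counterexample (our sketch, assuming NumPy): let $X \sim N(0,1)$ and $Y = SX$, where S is an independent random sign. Both marginals are N(0,1), but X + Y equals 0 with probability 1/2, which no normal distribution allows, so $(X,Y)'$ is not two-dimensional normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of x
y = s * x                             # Y is also N(0,1), by symmetry

# The linear combination X + Y is 0 exactly when s = -1:
print(np.mean(x + y == 0.0))          # ~ 0.5, so (X, Y) is not bivariate normal
```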

If the covariance matrix $\boldsymbol\Sigma$ is non-singular, the n-dimensional normal distribution has a probability density function (with $\mathbf{x} = (x_1,\ldots,x_n)'$)

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \boldsymbol\Sigma}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) \right\}. \tag{A.2}$$

The distribution is then said to be non-singular. The density (A.2) is constant on every ellipsoid $(\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = C$ in $\mathbb{R}^n$.
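To make (A.2) concrete, the following sketch of ours evaluates the formula directly and compares it with scipy.stats.multivariate_normal (the test point and parameters are arbitrary choices).

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

def mvn_pdf(x, mu, Sigma):
    """The density (A.2), evaluated directly."""
    n = len(mu)
    d = x - mu
    q = d @ np.linalg.inv(Sigma) @ d   # (x - mu)' Sigma^{-1} (x - mu)
    return np.exp(-0.5 * q) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

x = np.array([0.3, -1.2])
print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```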

Note: the density function of an n-dimensional normal distribution is uniquely determined by the expectations and covariances.

Example A.1. Suppose $X_1, X_2$ have a two-dimensional normal distribution. If

$$\det \boldsymbol\Sigma = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0,$$

then $\boldsymbol\Sigma$ is non-singular, and

$$\boldsymbol\Sigma^{-1} = \frac{1}{\det \boldsymbol\Sigma} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}.$$

With $Q(x_1,x_2) = (\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)$,

$$\begin{aligned} Q(x_1,x_2) &= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left( (x_1 - m_1)^2 \sigma_{22} - 2 (x_1 - m_1)(x_2 - m_2) \sigma_{12} + (x_2 - m_2)^2 \sigma_{11} \right) \\[4pt] &= \frac{1}{1 - \rho^2} \left( \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right)^2 - 2\rho\, \frac{x_1 - m_1}{\sqrt{\sigma_{11}}}\, \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} + \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right)^2 \right), \end{aligned}$$

where we also used the correlation coefficient $\rho = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}$, and

$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}} \exp\left\{ -\frac{1}{2} Q(x_1,x_2) \right\}. \tag{A.3}$$

For variables with $m_1 = m_2 = 0$ and $\sigma_{11} = \sigma_{22} = \sigma^2$, the bivariate density is

$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma^2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2\sigma^2(1 - \rho^2)} \left( x_1^2 - 2\rho\, x_1 x_2 + x_2^2 \right) \right\}.$$

We see that, if $\rho = 0$, this is the density of two independent normal variables.

Figure A.1 shows the density function $f_{X_1,X_2}(x_1,x_2)$ and level curves for a bivariate normal distribution with expectation $\boldsymbol\mu = (0,0)'$ and covariance matrix

$$\boldsymbol\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$

The correlation coefficient is $\rho = 0.5$.

Remark A.1. If the covariance matrix $\boldsymbol\Sigma$ is singular, and hence non-invertible, then there exists at least one choice of constants $a_1,\ldots,a_n$, not all equal to 0, such that $\mathbf{a}'\boldsymbol\Sigma\,\mathbf{a} = 0$. From (A.1) it follows that $V[\mathbf{a}'\mathbf{X}] = 0$, which means that $\mathbf{a}'\mathbf{X}$ is constant, equal to $\mathbf{a}'\boldsymbol\mu$. The distribution of $\mathbf{X}$ is then concentrated on a hyperplane $\mathbf{a}'\mathbf{x} = \text{constant}$ in $\mathbb{R}^n$. The distribution is said to be singular and it has no density function in $\mathbb{R}^n$.
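A tiny numerical illustration of the singular case (our sketch, assuming NumPy): with $X_2 = -X_1$, the covariance matrix has determinant 0, and $\mathbf{a} = (1,1)'$ gives $V[\mathbf{a}'\mathbf{X}] = 0$; all mass lies on the line $x_1 + x_2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(10_000)
x2 = -x1                              # X2 = -X1: perfectly anti-correlated

Sigma = np.cov(np.vstack([x1, x2]))
print(np.linalg.det(Sigma))           # ~ 0: singular covariance matrix
print(np.var(x1 + x2))                # exactly 0: a'X is constant for a = (1,1)'
```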


Figure A.1 Two-dimensional normal density. Left: density function; Right: elliptic level curves at levels 0.01, 0.02, 0.05, 0.1, 0.15.

Remark A.2. Formula (A.1) implies that every covariance matrix $\boldsymbol\Sigma$ is positive definite or positive semi-definite, i.e., $\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$ for all $a_1,\ldots,a_n$. Conversely, if $\boldsymbol\Sigma$ is a symmetric, positive definite matrix of size $n \times n$, i.e., if $\sum_{j,k} a_j a_k \sigma_{jk} > 0$ for all $(a_1,\ldots,a_n) \ne (0,\ldots,0)$, then (A.2) defines the density function for an n-dimensional normal distribution with expectations $m_k$ and covariances $\sigma_{jk}$. Every symmetric, positive definite matrix is thus a covariance matrix for a non-singular normal distribution. Furthermore, for every symmetric, positive semi-definite matrix $\boldsymbol\Sigma$, i.e., such that

$$\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$$

for all $a_1,\ldots,a_n$, with equality holding for some choice of $(a_1,\ldots,a_n) \ne (0,\ldots,0)$, there exists an n-dimensional normal distribution that has $\boldsymbol\Sigma$ as its covariance matrix. For n-dimensional normal variables, "uncorrelated" and "independent" are equivalent.

Theorem A.1. If the random variables $X_1,\ldots,X_n$ are n-dimensional normal and uncorrelated, then they are independent.


Proof. We show the theorem only for non-singular variables with density function; it is true also for singular normal variables.

If $X_1,\ldots,X_n$ are uncorrelated, $\sigma_{jk} = 0$ for $j \ne k$, then $\boldsymbol\Sigma$, and also $\boldsymbol\Sigma^{-1}$, are diagonal matrices, i.e. (note that $\sigma_{jj} = V[X_j]$),

$$\det \boldsymbol\Sigma = \prod_j \sigma_{jj}, \qquad \boldsymbol\Sigma^{-1} = \begin{pmatrix} \sigma_{11}^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{nn}^{-1} \end{pmatrix}.$$

This means that $(\mathbf{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = \sum_j (x_j - m_j)^2 / \sigma_{jj}$, and the density (A.2) is

$$\prod_j \frac{1}{\sqrt{2\pi\sigma_{jj}}} \exp\left\{ -\frac{(x_j - m_j)^2}{2\sigma_{jj}} \right\}.$$

Hence, the joint density function for $X_1,\ldots,X_n$ is a product of the marginal densities, which says that the variables are independent. ∎

A.2.1 Conditional normal distribution

This section deals with partial observations in a multivariate normal distribution. It is a special property of this distribution that conditioning on observed values of a subset of the variables leads to a conditional distribution for the unobserved variables that is also normal. Furthermore, the expectation in the conditional distribution is linear in the observations, and the covariance matrix does not depend on the observed values. This property is particularly useful in prediction of Gaussian processes, as formulated by the Kalman filter.

Conditioning in the bivariate normal distribution

Let X and Y have a bivariate normal distribution with expectations $m_X$ and $m_Y$, variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, and with correlation coefficient $\rho = C[X,Y]/(\sigma_X \sigma_Y)$. The simultaneous density function is given by (A.3). The conditional density function for X given that Y = y is

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X \sqrt{1 - \rho^2}\, \sqrt{2\pi}} \exp\left\{ -\frac{\bigl( x - (m_X + \sigma_X \rho (y - m_Y)/\sigma_Y) \bigr)^2}{2 \sigma_X^2 (1 - \rho^2)} \right\}.$$

Hence, the conditional distribution of X given Y = y is normal with expectation and variance

$$m_{X|Y=y} = m_X + \sigma_X \rho\, (y - m_Y)/\sigma_Y, \qquad \sigma^2_{X|Y=y} = \sigma_X^2 (1 - \rho^2).$$

Note: the conditional expectation depends linearly on the observed y-value, and the conditional variance is constant, independent of Y = y.
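A simulation sketch of ours (assuming NumPy; the parameter values are arbitrary) checks the two formulas by keeping the pairs whose Y-value falls close to a fixed y and comparing empirical conditional moments with the expressions above.

```python
import numpy as np

rng = np.random.default_rng(0)
mX, mY, sX, sY, rho = 1.0, -2.0, 2.0, 0.5, 0.8
cov = [[sX**2, rho * sX * sY],
       [rho * sX * sY, sY**2]]
x, y = rng.multivariate_normal([mX, mY], cov, size=2_000_000).T

y0 = -1.5
sel = np.abs(y - y0) < 0.01                             # condition on Y ~ y0
print(x[sel].mean(), mX + sX * rho * (y0 - mY) / sY)    # empirical vs formula
print(x[sel].var(), sX**2 * (1 - rho**2))               # variance free of y0
```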

Conditioning in the multivariate normal distribution

Let $\mathbf{X} = (X_1,\ldots,X_n)'$ and $\mathbf{Y} = (Y_1,\ldots,Y_m)'$ be two multivariate normal variables, of dimension n and m, respectively, such that $\mathbf{Z} = (X_1,\ldots,X_n,Y_1,\ldots,Y_m)'$ is (n+m)-dimensional normal. Denote the expectations

$$E[\mathbf{X}] = \mathbf{m}_X, \qquad E[\mathbf{Y}] = \mathbf{m}_Y,$$

and partition the covariance matrix for $\mathbf{Z}$ (with $\boldsymbol\Sigma_{XY} = \boldsymbol\Sigma_{YX}'$),

$$\boldsymbol\Sigma = \mathrm{Cov}\!\begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} = \begin{pmatrix} \boldsymbol\Sigma_{XX} & \boldsymbol\Sigma_{XY} \\ \boldsymbol\Sigma_{YX} & \boldsymbol\Sigma_{YY} \end{pmatrix}. \tag{A.4}$$

If the covariance matrix $\boldsymbol\Sigma$ is positive definite, the distribution of $(\mathbf{X}, \mathbf{Y})$ has the density function

$$f_{\mathbf{X}\mathbf{Y}}(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{(m+n)/2} \sqrt{\det \boldsymbol\Sigma}}\, \exp\left\{ -\frac{1}{2} (\mathbf{x}' - \mathbf{m}_X',\, \mathbf{y}' - \mathbf{m}_Y')\, \boldsymbol\Sigma^{-1} \begin{pmatrix} \mathbf{x} - \mathbf{m}_X \\ \mathbf{y} - \mathbf{m}_Y \end{pmatrix} \right\},$$

while the m-dimensional density of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{m/2} \sqrt{\det \boldsymbol\Sigma_{YY}}}\, \exp\left\{ -\frac{1}{2} (\mathbf{y} - \mathbf{m}_Y)' \boldsymbol\Sigma_{YY}^{-1} (\mathbf{y} - \mathbf{m}_Y) \right\}.$$

To find the conditional density of $\mathbf{X}$ given that $\mathbf{Y} = \mathbf{y}$,

$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x} \mid \mathbf{y}) = \frac{f_{\mathbf{Y}\mathbf{X}}(\mathbf{y},\mathbf{x})}{f_{\mathbf{Y}}(\mathbf{y})}, \tag{A.5}$$

we need the following matrix property.

Theorem A.2 ("Matrix inversion lemma"). Let $\mathbf{B}$ be a $p \times p$ matrix ($p = n + m$):

$$\mathbf{B} = \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{pmatrix},$$

where the sub-matrices have dimension $n \times n$, $n \times m$, etc. Suppose $\mathbf{B}, \mathbf{B}_{11}, \mathbf{B}_{22}$ are non-singular, and partition the inverse in the same way as $\mathbf{B}$,

$$\mathbf{A} = \mathbf{B}^{-1} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix}.$$

Then

$$\mathbf{A} = \begin{pmatrix} (\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21})^{-1} & -(\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21})^{-1}\mathbf{B}_{12}\mathbf{B}_{22}^{-1} \\[4pt] -(\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12})^{-1}\mathbf{B}_{21}\mathbf{B}_{11}^{-1} & (\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12})^{-1} \end{pmatrix}.$$

Proof. For the proof, see a matrix theory textbook, for example, [22]. ∎
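The lemma is easy to verify numerically. A sketch of ours (assuming NumPy; block sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3
M = rng.standard_normal((n + m, n + m))
B = M @ M.T + (n + m) * np.eye(n + m)   # symmetric positive definite, so invertible
B11, B12 = B[:n, :n], B[:n, n:]
B21, B22 = B[n:, :n], B[n:, n:]

inv = np.linalg.inv
A11 = inv(B11 - B12 @ inv(B22) @ B21)
A12 = -A11 @ B12 @ inv(B22)
A21 = -inv(B22 - B21 @ inv(B11) @ B12) @ B21 @ inv(B11)
A22 = inv(B22 - B21 @ inv(B11) @ B12)
A = np.block([[A11, A12], [A21, A22]])

print(np.max(np.abs(A - inv(B))))       # ~ 1e-15: the partitioned inverse agrees
```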

Theorem A.3 ("Conditional normal distribution"). The conditional distribution for $\mathbf{X}$, given that $\mathbf{Y} = \mathbf{y}$, is n-dimensional normal with expectation and covariance matrix

$$E[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}] = \mathbf{m}_{X|Y=y} = \mathbf{m}_X + \boldsymbol\Sigma_{XY} \boldsymbol\Sigma_{YY}^{-1} (\mathbf{y} - \mathbf{m}_Y), \tag{A.6}$$

$$C[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}] = \boldsymbol\Sigma_{XX|Y} = \boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY} \boldsymbol\Sigma_{YY}^{-1} \boldsymbol\Sigma_{YX}. \tag{A.7}$$

These formulas are easy to remember: the dimensions of the sub-matrices, for example in the covariance matrix $\boldsymbol\Sigma_{XX|Y}$, are the only ones possible for the matrix multiplications in the right hand side to be meaningful.

Proof. To simplify calculations, we start with $\mathbf{m}_X = \mathbf{m}_Y = \mathbf{0}$, and add the expectations afterwards. The conditional distribution of $\mathbf{X}$ given that $\mathbf{Y} = \mathbf{y}$ is, according to (A.5), the ratio between two multivariate normal densities, and hence it is of the form

$$c\, \exp\left\{ -\frac{1}{2} (\mathbf{x}', \mathbf{y}')\, \boldsymbol\Sigma^{-1} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} + \frac{1}{2}\, \mathbf{y}' \boldsymbol\Sigma_{YY}^{-1} \mathbf{y} \right\} = c\, \exp\left\{ -\frac{1}{2} Q(\mathbf{x},\mathbf{y}) \right\},$$

where c is a normalization constant, independent of $\mathbf{x}$ and $\mathbf{y}$. The matrix $\boldsymbol\Sigma$ can be partitioned as in (A.4), and if we use the matrix inversion lemma, we find that

$$\boldsymbol\Sigma^{-1} = \mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \tag{A.8}$$

$$= \begin{pmatrix} (\boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1}\boldsymbol\Sigma_{YX})^{-1} & -(\boldsymbol\Sigma_{XX} - \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1}\boldsymbol\Sigma_{YX})^{-1}\boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1} \\[4pt] -(\boldsymbol\Sigma_{YY} - \boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1}\boldsymbol\Sigma_{XY})^{-1}\boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1} & (\boldsymbol\Sigma_{YY} - \boldsymbol\Sigma_{YX}\boldsymbol\Sigma_{XX}^{-1}\boldsymbol\Sigma_{XY})^{-1} \end{pmatrix}.$$

We also see that

$$\begin{aligned} Q(\mathbf{x},\mathbf{y}) &= \mathbf{x}'\mathbf{A}_{11}\mathbf{x} + \mathbf{x}'\mathbf{A}_{12}\mathbf{y} + \mathbf{y}'\mathbf{A}_{21}\mathbf{x} + \mathbf{y}'\mathbf{A}_{22}\mathbf{y} - \mathbf{y}'\boldsymbol\Sigma_{YY}^{-1}\mathbf{y} \\ &= (\mathbf{x}' - \mathbf{y}'\mathbf{C}')\,\mathbf{A}_{11}\,(\mathbf{x} - \mathbf{C}\mathbf{y}) + \widetilde{Q}(\mathbf{y}) \\ &= \mathbf{x}'\mathbf{A}_{11}\mathbf{x} - \mathbf{x}'\mathbf{A}_{11}\mathbf{C}\mathbf{y} - \mathbf{y}'\mathbf{C}'\mathbf{A}_{11}\mathbf{x} + \mathbf{y}'\mathbf{C}'\mathbf{A}_{11}\mathbf{C}\mathbf{y} + \widetilde{Q}(\mathbf{y}), \end{aligned}$$

for some matrix $\mathbf{C}$ and some quadratic form $\widetilde{Q}(\mathbf{y})$ in $\mathbf{y}$.

Here $\mathbf{A}_{11} = \boldsymbol\Sigma_{XX|Y}^{-1}$, according to (A.7) and (A.8), while we can find $\mathbf{C}$ by solving

$$\mathbf{A}_{11}\mathbf{C} = -\mathbf{A}_{12}, \quad \text{i.e.} \quad \mathbf{C} = -\mathbf{A}_{11}^{-1}\mathbf{A}_{12} = \boldsymbol\Sigma_{XY}\boldsymbol\Sigma_{YY}^{-1},$$

according to (A.8). This is precisely the matrix in (A.6).

If we reinstate the subtracted $\mathbf{m}_X$ and $\mathbf{m}_Y$, we get the conditional density for $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ in the form

$$c\, \exp\left\{ -\frac{1}{2} Q(\mathbf{x},\mathbf{y}) \right\} = c\, \exp\left\{ -\frac{1}{2} (\mathbf{x}' - \mathbf{m}_{X|Y=y}')\, \boldsymbol\Sigma_{XX|Y}^{-1}\, (\mathbf{x} - \mathbf{m}_{X|Y=y}) \right\},$$

which is the normal density we were looking for. ∎
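To tie the pieces together, a short sketch of ours (assuming NumPy; the covariance matrix and observed y are arbitrary) computes (A.6) and (A.7) for a partitioned covariance matrix and confirms that $\boldsymbol\Sigma_{XX|Y}^{-1}$ coincides with the block $\mathbf{A}_{11}$ of $\boldsymbol\Sigma^{-1}$, as used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 2
L = rng.standard_normal((n + m, n + m))
Sigma = L @ L.T + np.eye(n + m)        # positive definite covariance matrix
Sxx, Sxy = Sigma[:n, :n], Sigma[:n, n:]
Syx, Syy = Sigma[n:, :n], Sigma[n:, n:]
mx, my = np.zeros(n), np.zeros(m)

y = np.array([0.7, -0.3])              # observed value of Y
m_cond = mx + Sxy @ np.linalg.inv(Syy) @ (y - my)   # (A.6)
S_cond = Sxx - Sxy @ np.linalg.inv(Syy) @ Syx       # (A.7)

A11 = np.linalg.inv(Sigma)[:n, :n]
print(np.max(np.abs(np.linalg.inv(S_cond) - A11)))  # ~ 0: A11 = (Sigma_XX|Y)^{-1}
print(m_cond)                                       # conditional expectation
```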

A.2.2 Complex normal variables

In most of the book, we have assumed all random variables to be real valued. In many applications, and also in the mathematical background, it is advantageous to consider complex random variables, simply defined as $Z = X + iY$, where X and Y have a bivariate distribution. The expectation of a complex variable is simply

$$E[Z] = E[\Re Z] + i\, E[\Im Z],$$

while the variance and covariances are defined with a complex conjugate on the second variable,

$$C[Z_1, Z_2] = E[Z_1 \overline{Z_2}] - m_{Z_1} \overline{m_{Z_2}},$$

$$V[Z] = C[Z,Z] = E[|Z|^2] - |m_Z|^2.$$

Note that, for a complex $Z = X + iY$ with $V[X] = V[Y] = \sigma^2$,

$$C[Z,Z] = V[X] + V[Y] = 2\sigma^2, \qquad C[Z, \overline{Z}] = V[X] - V[Y] + 2i\, C[X,Y] = 2i\, C[X,Y].$$

Hence, if the real and imaginary parts are uncorrelated with the same variance, then the complex variable Z is uncorrelated with its own complex conjugate, $\overline{Z}$. Often, one uses the term orthogonal, instead of uncorrelated, for complex variables.
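A final sketch of ours (assuming NumPy): with independent standard normal real and imaginary parts, the sample versions of the two covariances above behave as stated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # X, Y iid N(0,1)

# With zero means, C[Z, Z] = E[Z conj(Z)] and C[Z, conj(Z)] = E[Z Z]:
print(np.mean(z * np.conj(z)))   # ~ 2 = V[X] + V[Y]
print(np.mean(z * z))            # ~ 0: Z is orthogonal to its own conjugate
```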