Lecture 3 - Expectation, Inequalities and Laws of Large Numbers

Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 1 / 67 Part I Functions of a random variable Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 2 / 67 Functions of a random variable Given a random variable X and a function Φ : R ! R we define the transformed random variable Y = Φ(X ) as Random variables X and Y are defined on the same sample space, moreover, Dom(X ) = Dom(Y ). Im(Y ) = fΦ(x)jx 2 Im(X )g. The probability distribution of Y is given by X pY (y) = pX (x): x2Im(X );Φ(x)=y In fact, we may define it by Φ(X ) = Φ ◦ X , where ◦ is the usual function composition. Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 3 / 67 Part II Expectation Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 4 / 67 Expectation The probability distribution or probability distribution function completely characterize properties of a random variable. Often we need description that is less accurate, but much shorter - single number, or a few numbers. First such characteristic describing a random variable is the expectation, also known as the mean value. Definition Expectation of a random variable X is defined as X E(X ) = xi p(xi ) i provided the sum is absolutely (!) convergent. In case the sum is convergent, but not absolutely convergent, we say that no finite expectation exists. In case the sum is not convergent the expectation has no meaning. Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 5 / 67 Median; Mode The median of a random variable X is any number x such that P(X < x) ≤ 1=2 and P(X > x) ≥ 1=2. The mode of a random variable X is the number x such that p(x) = max p(x0): x02Im(X ) Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 6 / 67 Moments Let us suppose we have a random variable X and a random variable Y = Φ(X ) for some function Φ. The expected value of Y is X E(Y ) = Φ(xi )pX (xi ): i Especially interesting is the power function Φ(X ) = X k . E(X k ) is known as the kth moment of X . For k = 1 we get the expectation of X . If X and Y are random variables with matching corresponding moments of all orders, i.e. 8k E(X k ) = E(Y k ), then X and Y have the same distributions. Usually we center the expected value to 0 { we use moments of Φ(X ) = X − E(X ). We define the kth central moment of X as k µk = E [X − E(X )] : Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 7 / 67 Variance Definition The second central moment is known as the variance of X and defined as 2 µ2 = E [X − E(X )] : Explicitly written, X 2 µ2 = [xi − E(X )] p(xi ): i 2 The variance is usually denoted as σX or Var(X ). Definition 2 q 2 The square root of σX is known as the standard deviation σX = σX . If variance is small, then X takes values close to E(X ) with high probability. If the variance is large, then the distribution is more ’diffused’. Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 8 / 67 Expectation revisited Theorem Let X1; X2;::: Xn be random variables defined on the same probability space and let Y = Φ(X1; X2;::: Xn). Then X X X E(Y ) = ··· Φ(x1; x2;::: xn)p(x1; x2;::: xn): x1 x2 xn Theorem (Linearity of expectation) Let X and Y be random variables. Then E(X + Y ) = E(X ) + E(Y ): Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 9 / 67 Linearity of expectation (proof) Linearity of expectation. X X E(X + Y ) = (xi + yj )p(xi ; yj ) = i j X X X X = xi p(xi ; yj ) + yj p(xi ; yj ) = i j j i X X = xi pX (xi ) + yj pY (yj ) = i j =E(X ) + E(Y ): Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 10 / 67 Linearity of expectation The linearity of expectation can be easily generalized for any linear combination of n random variables, i.e. Theorem (Linearity of expectation) Let X1; X2;::: Xn be random variables and a1; a2;::: an 2 R constants. Then n ! n X X E ai Xi = ai E(Xi ): i=1 i=1 Proof is left as a home exercise :-). Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 11 / 67 Expectation of independent random variables Theorem If X and Y are independent random variables, then E(XY ) = E(X )E(Y ): Proof. X X E(XY ) = xi yj p(xi ; yj ) = i j X X = xi yj pX (xi )pY (yj ) = i j X X = xi pX (xi ) yj pY (yj ) = i j =E(X )E(Y ): Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 12 / 67 Expectation of independent random variables The expectation of independent random variables can be easily generalized for any n{tuple X1; X2;::: Xn of mutually independent random variables: n ! n Y Y E Xi = E(Xi ): i=1 i=1 If Φ1; Φ2;::: Φn are functions, then " n # n Y Y E Φi (Xi ) = E[Φi (Xi )]: i=1 i=1 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 13 / 67 Variance revisited Theorem 2 Let σX be the variance of the random variable X . Then 2 2 2 σX = E(X ) − [E(X )] : Proof. 2 2 2 2 σX =E [X − E(X )] = E X − 2XE(X ) + [E(X )] = =E(X 2) − E[2XE(X )] + [E(X )]2 = =E(X 2) − 2E(X )E(X ) + [E(X )]2: Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 14 / 67 Covariance Definition The quantity X E [X − E(X )][Y − E(Y )] = pxi ;yj [xi − E(X )] [yj − E(Y )] i;j is called the covariance of X and Y and denoted Cov(X ; Y ). Theorem Let X and Y be independent random variables. Then the covariance of X and Y Cov(X ; Y ) = 0. Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 15 / 67 Covariance Proof. Cov(X ; Y ) =E[X − E(X )][Y − E(Y )] = =E[XY − YE(X ) − XE(Y ) + E(X )E(Y )] = =E(XY ) − E(Y )E(X ) − E(X )E(Y ) + E(X )E(Y ) = =E(X )E(Y ) − E(Y )E(X ) − E(X )E(Y ) + E(X )E(Y ) = 0 Covariance measures linear (!) dependence between two random variables. E.g. when X = aY , a 6= 0, using E(X ) = aE(Y ) we have 1 Cov(X ; Y ) = aVar(Y ) = Var(X ): a Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 16 / 67 Covariance In general it holds that 0 ≤ Cov 2(X ; Y ) ≤ Var(X )Var(Y ): Definition We define the correlation coefficient ρ(X ; Y ) as the normalized covariance, i.e. Cov(X ; Y ) ρ(X ; Y ) = : pVar(X )Var(Y ) It holds that −1 ≤ ρ(X ; Y ) ≤ 1. Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 17 / 67 Covariance It may happen that X is completely dependent on Y and yet the covariance is 0, e.g. for X = Y 2 and a suitably chosen Y . Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 18 / 67 Variance Theorem If X and Y are independent random variables, then Var(X + Y ) = Var(X ) + Var(Y ): Proof. Var(X + Y ) = E[(X + Y ) − E(X + Y )]2 = =E[(X + Y ) − E(X ) − E(Y )]2 = E[(X − E(X )) + (Y − E(Y ))]2 = =E[X − E(X )]2 + [Y − E(Y )]2 + 2[X − E(X )][Y − E(Y )] = =E[X − E(X )]2 + E[Y − E(Y )]2 + 2E[X − E(X )][Y − E(Y )] = =Var(X ) + Var(Y ) + 2E[X − E(X )][Y − E(Y )] = =Var(X ) + Var(Y ) + 2Cov(X ; Y ) = Var(X ) + Var(Y ): Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 19 / 67 Variance If X and Y are not independent, we obtain (see proof on the previous transparency) Var(X + Y ) = Var(X ) + Var(Y ) + 2Cov(X ; Y ): The additivity of variance can be generalized to a set X1; X2;::: Xn of mutually independent variables and constants a1; a2;::: an 2 R as n ! n X X 2 Var ai Xi = ai Var(Xi ): i=1 i=1 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 20 / 67 Part III Conditional Distribution and Expectation Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersApril 19, 2009 21 / 67 Conditional probability Using the derivation of conditional probability of two events we can derive conditional probability of (a pair of) random variables. Definition The conditional probability distribution of random variable Y given random variable X (their joint distribution is pX ;Y (x; y)) is P(Y = y; X = x) p (yjx) =P(Y = yjX = x) = = Y jX P(X = x) (1) p (x; y) = X ;Y pX (x) provided pX (x) 6= 0.

Load more