Lecture 4: Expectations

The expected value, also called the expectation or mean, of a random variable is its average value weighted by its probability distribution.

Definition 2.2.1.
The expected value or mean of a random variable g(X) is
$$ E[g(X)] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx & \text{if } X \text{ has pdf } f_X \\[4pt] \displaystyle\sum_x g(x)\, f_X(x) & \text{if } X \text{ has pmf } f_X \end{cases} $$
provided that the integral or the sum exists (is finite); otherwise we say that the expected value of g(X) does not exist.

The expected value is a number that summarizes a typical, middle, or expected value of an observation of the random variable.
If g(X) ≥ 0, then E[g(X)] is always defined, although it may be ∞.
For any g(X), its expected value exists iff E|g(X)| < ∞.
The expectation is associated with the distribution of X, not with X itself.

Example
X has distribution

x        −2    −1     0     1     2     3
f_X(x)   0.1   0.2   0.1   0.2   0.3   0.1

E(X) = ∑_x x f_X(x) = −2(0.1) − 1(0.2) + 0(0.1) + 1(0.2) + 2(0.3) + 3(0.1) = 0.7
E(X^2) = ∑_x x^2 f_X(x) = 4(0.1) + 1(0.2) + 0(0.1) + 1(0.2) + 4(0.3) + 9(0.1) = 2.9

Y = g(X) = X^2 has distribution

y        0     1     4     9
f_Y(y)   0.1   0.4   0.4   0.1

E(Y) = E(X^2) = ∑_y y f_Y(y) = 0(0.1) + 1(0.4) + 4(0.4) + 9(0.1) = 2.9

Example 2.2.2.
Suppose X ≥ 0 has pdf f_X(x) = λ^{−1} e^{−x/λ}, where λ > 0 is a constant. Then
$$ E(X) = \lambda^{-1}\int_0^{\infty} x e^{-x/\lambda}\,dx = \left.-x e^{-x/\lambda}\right|_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx = \lambda $$

In some cases, the calculation of the expectation requires some derivation.

Example 2.2.3.
Let X be a discrete random variable with pmf
$$ f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0,1,\dots,n, $$
where n is a fixed integer and 0 < p < 1 is a fixed constant. For any n and p,
$$ \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = 1. $$
By definition and the fact that $x\binom{n}{x} = n\binom{n-1}{x-1}$, we obtain
$$ E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x\binom{n}{x} p^x (1-p)^{n-x} $$
$$ = n\sum_{x=1}^{n} \binom{n-1}{x-1} p^x (1-p)^{n-x} = np\sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-1-y} = np $$
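As a numerical cross-check of these calculations (a sketch in Python; the binomial parameters n and p below are arbitrary choices of mine), the code recomputes E(X) and E(X^2) from the tabulated pmf, recomputes E(Y) from the induced pmf of Y = X^2, and verifies E(X) = np for a binomial pmf by direct summation.

```python
from math import comb

# pmf of X from the table in the example
pmf_X = {-2: 0.1, -1: 0.2, 0: 0.1, 1: 0.2, 2: 0.3, 3: 0.1}

E_X = sum(x * p for x, p in pmf_X.items())        # expected value of X
E_X2 = sum(x**2 * p for x, p in pmf_X.items())    # expected value of X^2

# induced pmf of Y = X^2, obtained by collecting probabilities
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[x**2] = pmf_Y.get(x**2, 0.0) + p
E_Y = sum(y * p for y, p in pmf_Y.items())

print(E_X, E_X2, E_Y)   # 0.7, 2.9, 2.9 (up to floating-point rounding)

# Example 2.2.3: E(X) = np for the binomial pmf, checked by direct summation
n, p = 10, 0.3          # arbitrary illustrative values
E_binom = sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))
print(E_binom, n * p)   # both 3.0 (up to floating-point rounding)
```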

Example 2.2.4 (Nonexistence of expectation)
Suppose that X has the following pdf:
$$ f_X(x) = \frac{1}{\pi(1+x^2)}, \qquad x \in \mathbb{R} $$
Note that
$$ E|X| = \int_{-\infty}^{\infty} \frac{|x|}{\pi(1+x^2)}\,dx = \frac{2}{\pi}\int_0^{\infty} \frac{x}{1+x^2}\,dx = \frac{2}{\pi}\lim_{M\to\infty}\int_0^{M} \frac{x}{1+x^2}\,dx = \frac{2}{\pi}\lim_{M\to\infty}\int_0^{M} \frac{1}{1+x^2}\,d(x^2/2) $$
$$ = \frac{2}{\pi}\lim_{M\to\infty}\left.\frac{\log(1+x^2)}{2}\right|_0^{M} = \frac{2}{\pi}\lim_{M\to\infty}\frac{\log(1+M^2)}{2} = \infty $$
Thus, neither X nor |X| has an expectation. We should not wrongly think
$$ E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\,dx = \lim_{M\to\infty}\int_{-M}^{M} \frac{x}{\pi(1+x^2)}\,dx = \lim_{M\to\infty} 0 = 0 $$
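A small numerical illustration of this point (a sketch; the truncation levels M and the number of grid points are arbitrary choices of mine): the symmetric truncated integral of x f_X(x) is 0 for every M, while the truncated integral of |x| f_X(x), which equals log(1 + M^2)/π in closed form, keeps growing as M increases.

```python
from math import log, pi

def truncated_integrals(M, steps=200_000):
    """Midpoint Riemann sums of x*f(x) and |x|*f(x) over [-M, M] for the Cauchy pdf."""
    h = 2 * M / steps
    signed = 0.0
    absolute = 0.0
    for i in range(steps):
        x = -M + (i + 0.5) * h
        fx = 1.0 / (pi * (1.0 + x * x))
        signed += x * fx * h
        absolute += abs(x) * fx * h
    return signed, absolute

for M in (10, 100, 1000, 10_000):
    signed, absolute = truncated_integrals(M)
    # signed stays ~0 by symmetry; absolute tracks log(1 + M^2)/pi and keeps growing
    print(M, round(signed, 6), round(absolute, 4), round(log(1 + M * M) / pi, 4))
```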

Linear operation
Taking expectation is a linear operation in the sense that
E(c) = c for any constant c;
E(aX) = aE(X) for any constant a and random variable X;
E(X + Y) = E(X) + E(Y) for any random variables X and Y.
These are special cases of the following theorem.

Theorem 2.2.5
Let X and Y be random variables whose expectations exist, and let a, b, and c be constants.
a. E(aX + bY + c) = aE(X) + bE(Y) + c.
b. If X ≥ Y, then E(X) ≥ E(Y).

Proof.

The proof of Theorem 2.2.5 is easy if X = g_1(Z) and Y = g_2(Z) for a random variable Z and some functions g_1 and g_2, and Z has a pdf or pmf. In general, the proof of property a is not simple (a topic in Stat 709).
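A Monte Carlo sanity check of both parts of Theorem 2.2.5 (a sketch only; the distributions, coefficients, seed, and sample size below are my own choices): sample averages approximate the expectations, so E(aX + bY + c) ≈ aE(X) + bE(Y) + c up to simulation error, and since max(X, W) ≥ W pointwise, its sample mean is at least that of W.

```python
import random

random.seed(0)
N = 200_000

# X ~ Exponential with mean 2, W ~ Uniform(0, 1); these choices are arbitrary
X = [random.expovariate(0.5) for _ in range(N)]   # expovariate takes the rate, so mean = 2
W = [random.uniform(0, 1) for _ in range(N)]      # mean = 0.5

a, b, c = 3.0, -2.0, 5.0
mean = lambda v: sum(v) / len(v)

# Part a: linearity
lhs = mean([a * x + b * w + c for x, w in zip(X, W)])
rhs = a * mean(X) + b * mean(W) + c
print(lhs, rhs)            # close to each other and to 3*2 - 2*0.5 + 5 = 10

# Part b: monotonicity, using V = max(X, W) >= W pointwise
V = [max(x, w) for x, w in zip(X, W)]
print(mean(V) >= mean(W))  # True
```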

Example 2.2.6 (Minimizing distance)
The mean of a random variable X is a good guess (prediction) of X.
Suppose that we measure the distance between X and a constant b by (X − b)^2. The closer b is to X, the smaller this quantity is. We want to find a value of b that minimizes the average E(X − b)^2.
If µ = E(X), then
$$ E(X-\mu)^2 = \min_{b\in\mathbb{R}} E(X-b)^2 $$
To show this, note that

$$ E(X-b)^2 = E[(X-\mu)+(\mu-b)]^2 = E[(X-\mu)^2] + E[(\mu-b)^2] + E[2(X-\mu)(\mu-b)] $$
$$ = E(X-\mu)^2 + (\mu-b)^2 + 2(\mu-b)E(X-\mu) = E(X-\mu)^2 + (\mu-b)^2 \ \ge\ E(X-\mu)^2, $$
where the cross term vanishes because E(X − µ) = E(X) − µ = 0.

There is an alternative proof using E(X − b)^2 = E(X^2) − 2bE(X) + b^2.
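To see the minimization concretely, the sketch below (using the discrete distribution from the first example, with µ = 0.7; the grid of candidate b values is my own choice) evaluates E(X − b)^2 over a grid. The minimum is attained at b = µ, and the curve equals Var(X) + (b − µ)^2.

```python
pmf_X = {-2: 0.1, -1: 0.2, 0: 0.1, 1: 0.2, 2: 0.3, 3: 0.1}
mu = sum(x * p for x, p in pmf_X.items())                      # 0.7

def mse(b):
    """E(X - b)^2 under the tabulated pmf."""
    return sum((x - b) ** 2 * p for x, p in pmf_X.items())

# scan a grid of candidate b values and pick the minimizer
grid = [i / 100 for i in range(-200, 301)]
best_b = min(grid, key=mse)
print(best_b, mu)                 # both approximately 0.7
print(mse(mu), mse(0.0), mse(2))  # mse(mu) = Var(X) = 2.41 is the smallest of the three
```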

Nonlinear functions
When calculating expectations of nonlinear functions of X, we can proceed in one of two ways.
Using the distribution of X, we can calculate
$$ E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx \quad\text{or}\quad \sum_x g(x)\, f_X(x) $$

We can find the pdf or pmf of Y = g(X), f_Y, and then use
$$ E[g(X)] = E(Y) = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy \quad\text{or}\quad \sum_y y\, f_Y(y) $$

Example 2.2.7.
Suppose that X has a uniform pdf on (0,1),
$$ f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} $$
Consider g(X) = −log X.
$$ E[g(X)] = \int_0^1 -\log(x)\,dx = \left.(x - x\log x)\right|_0^1 = 1 $$
On the other hand, we can use Theorem 2.1.5 to obtain the pdf of

Y = −log X:
$$ f_Y(y) = f_X(e^{-y})\left|\tfrac{d}{dy}e^{-y}\right| = e^{-y}, \qquad y > 0 $$
Then
$$ E[g(X)] = E(Y) = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = \int_0^{\infty} y e^{-y}\,dy = 1 $$

Example
Suppose that X has the standard normal pdf

$$ f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x \in \mathbb{R} $$
Consider g(X) = X^2. By Example 2.1.9, Y = X^2 has the chi-square pdf

$$ f_Y(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}, \qquad y > 0 $$
Then
$$ E(X^2) = E(Y) = \int_0^{\infty} y\, f_Y(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} \sqrt{y}\, e^{-y/2}\,dy = 1 $$
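Both routes can be checked numerically. The sketch below (using crude midpoint Riemann sums with truncation bounds chosen by me) computes E[−log X] for X uniform on (0,1) and E(X^2) for X standard normal, once through f_X and once through f_Y; all four numbers should be close to 1.

```python
from math import log, exp, sqrt, pi

def midpoint(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# g(X) = -log X with X ~ Uniform(0, 1)
via_fX = midpoint(lambda x: -log(x) * 1.0, 0.0, 1.0)      # integrate g(x) f_X(x)
via_fY = midpoint(lambda y: y * exp(-y), 0.0, 50.0)       # integrate y f_Y(y)

# g(X) = X^2 with X standard normal; Y = X^2 has pdf e^{-y/2} / sqrt(2*pi*y)
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)
via_fX2 = midpoint(lambda x: x * x * phi(x), -10.0, 10.0)
via_fY2 = midpoint(lambda y: y * exp(-y / 2) / sqrt(2 * pi * y), 0.0, 50.0)

print(via_fX, via_fY, via_fX2, via_fY2)   # all approximately 1
```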

Theorem
Let F be the cdf of a random variable X. Then
$$ E|X| = \int_0^{\infty} [1-F(x)]\,dx + \int_{-\infty}^{0} F(x)\,dx $$
and E|X| < ∞ iff both integrals are finite. In the case where E|X| < ∞,
$$ E(X) = \int_0^{\infty} [1-F(x)]\,dx - \int_{-\infty}^{0} F(x)\,dx $$

Proof.
We only consider the case where X has a pdf or pmf, and give a proof when X has a pdf (the proof for the case where X has a pmf is similar). When X has a pdf f,
$$ \int_0^{\infty} [1-F(x)]\,dx = \int_0^{\infty}\!\!\int_x^{\infty} f(t)\,dt\,dx = \int_0^{\infty}\!\!\int_0^{t} f(t)\,dx\,dt = \int_0^{\infty} t f(t)\,dt $$
$$ \int_{-\infty}^{0} F(x)\,dx = \int_{-\infty}^{0}\!\!\int_{-\infty}^{x} f(t)\,dt\,dx = \int_{-\infty}^{0}\!\!\int_t^{0} f(t)\,dx\,dt = -\int_{-\infty}^{0} t f(t)\,dt $$

These calculations are valid regardless of whether the integrals are finite or not.

By definition,
$$ E|X| = \int_{-\infty}^{\infty} |t| f(t)\,dt = \int_0^{\infty} t f(t)\,dt + \int_{-\infty}^{0} (-t) f(t)\,dt = \int_0^{\infty} [1-F(x)]\,dx + \int_{-\infty}^{0} F(x)\,dx $$
Since both integrals are ≥ 0, E|X| < ∞ iff both integrals are finite.
If E|X| < ∞, then both integrals are finite and
$$ E(X) = \int_{-\infty}^{\infty} t f(t)\,dt = \int_0^{\infty} t f(t)\,dt + \int_{-\infty}^{0} t f(t)\,dt = \int_0^{\infty} [1-F(x)]\,dx - \int_{-\infty}^{0} F(x)\,dx $$

Property
For any random variable X,
$$ \sum_{n=1}^{\infty} P(|X| \ge n) \le E|X| \le 1 + \sum_{n=1}^{\infty} P(|X| \ge n). $$
Hence E|X| < ∞ iff the sum is finite.
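Before turning to the proof of this Property, here is a quick numerical check of both the theorem and the bounds (a sketch; I use the exponential distribution with mean λ = 2, for which F(x) = 1 − e^{−x/λ} on x ≥ 0 and E|X| = E(X) = λ, and a crude Riemann sum with an arbitrary truncation point).

```python
from math import exp

lam = 2.0
F = lambda x: 1.0 - exp(-x / lam) if x >= 0 else 0.0   # cdf of Exponential(mean lam)

# integral of 1 - F over [0, T] by a midpoint Riemann sum; since F(x) = 0 for x < 0,
# the second integral in the theorem vanishes here
T, n = 100.0, 200_000
h = T / n
tail_integral = sum(1.0 - F((i + 0.5) * h) for i in range(n)) * h
print(tail_integral, lam)                               # both approximately 2

# bounds: sum_{n>=1} P(|X| >= n) <= E|X| <= 1 + sum_{n>=1} P(|X| >= n)
s = sum(1.0 - F(float(k)) for k in range(1, 1000))      # P(X >= k) = e^{-k/lam}
print(s, "<=", lam, "<=", 1 + s)                        # roughly 1.54 <= 2 <= 2.54
```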

Using the previous theorem, we can prove this result regardless of whether X has a pdf (pmf) or not:
$$ E|X| = \int_0^{\infty} [1-F(x)]\,dx + \int_{-\infty}^{0} F(x)\,dx = \int_0^{\infty} [1-F(x)]\,dx + \int_0^{\infty} F(-x)\,dx $$
$$ = \int_0^{\infty} [1-F(x)+F(-x)]\,dx = \int_0^{\infty} [P(X > x) + P(X \le -x)]\,dx $$
$$ \le \int_0^{\infty} [P(X \ge x) + P(X \le -x)]\,dx = \int_0^{\infty} P(|X| \ge x)\,dx $$
$$ = \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| \ge x)\,dx \le \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| \ge n)\,dx $$
$$ = \sum_{n=0}^{\infty} P(|X| \ge n) = 1 + \sum_{n=1}^{\infty} P(|X| \ge n) $$
Similarly,

$$ E|X| \ge \int_0^{\infty} [P(X > x) + P(X < -x)]\,dx = \int_0^{\infty} P(|X| > x)\,dx $$
$$ = \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| > x)\,dx \ge \sum_{n=0}^{\infty} P(|X| \ge n+1) = \sum_{n=1}^{\infty} P(|X| \ge n) $$

Moments
The various moments of a random variable are an important class of expectations.

Definition 2.3.1.
For each positive integer n, the nth moment of a random variable X (or of F_X) is E(X^n).
For each positive integer n, the nth central moment of X is E(X − µ)^n, where µ = E(X).
For each constant p > 0, the pth absolute moment of X is E(|X|^p).

Property.
For p > 0 and any random variable X,
$$ E|X|^p < \infty \quad\text{iff}\quad \sum_{n=1}^{\infty} n^{p-1} P(|X| \ge n) < \infty $$
A consequence is that if the pth absolute moment (moment or central moment) exists and q < p, then the qth absolute moment (moment or central moment) also exists.
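Before its proof, a numerical illustration of this Property (a sketch; the use of the standard Cauchy distribution and the truncation points are my own choices): for the Cauchy pdf of Example 2.2.4, P(|X| ≥ n) = 1 − (2/π)arctan(n) ≈ 2/(πn), so the series converges for p = 1/2 (E|X|^{1/2} < ∞) but diverges for p = 1 (consistent with E|X| = ∞).

```python
from math import atan, pi

def tail(n):
    """P(|X| >= n) for the standard Cauchy distribution."""
    return 1.0 - (2.0 / pi) * atan(n)

for p in (0.5, 1.0):
    print("p =", p)
    for N in (100, 10_000, 1_000_000):
        partial = sum(n ** (p - 1) * tail(n) for n in range(1, N + 1))
        # for p = 0.5 the partial sums settle down; for p = 1 they grow like (2/pi) log N
        print("  N =", N, "partial sum =", round(partial, 3))
```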

Proof.
From the discussion in the last lecture,
$$ E|X|^p \le \int_0^{\infty} P(|X|^p \ge x)\,dx = p\int_0^{\infty} P(|X|^p \ge y^p)\, y^{p-1}\,dy $$
$$ = p\sum_{n=0}^{\infty} \int_n^{n+1} P(|X|^p \ge y^p)\, y^{p-1}\,dy \le p\sum_{n=0}^{\infty} (n+1)^{p-1} P(|X| \ge n) $$
$$ \le p + 2^{p-1} p \sum_{n=1}^{\infty} n^{p-1} P(|X| \ge n) $$
where the first equality is from changing the variable x = y^p. Similarly,
$$ E|X|^p \ge \int_0^{\infty} P(|X|^p > x)\,dx = p\sum_{n=0}^{\infty} \int_n^{n+1} P(|X|^p > y^p)\, y^{p-1}\,dy $$
$$ \ge p\sum_{n=0}^{\infty} n^{p-1} P(|X| \ge n+1) = p\sum_{n=2}^{\infty} (n-1)^{p-1} P(|X| \ge n) \ge p\,2^{-p} \sum_{n=2}^{\infty} n^{p-1} P(|X| \ge n) $$

Definition 2.3.2.
The second central moment of a random variable X is called its variance and denoted by Var(X) = E(X − µ)^2. The standard deviation of X is defined as √Var(X).

Sometimes the following result is useful:
$$ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 $$

Measures of spread
The variance gives a measure of the degree of spread of a distribution around its mean (center). A large variance means that X is more variable.
The unit of the variance is the square of the original unit. The standard deviation has the same qualitative interpretation as the variance, but its unit is the same as the original unit.
If Var(X) = 0, then P(X = E(X)) = 1. This is actually a special case of the following result:
If g(X) ≥ 0 and E[g(X)] = 0, then P(g(X) = 0) = 1.
The result can be proved in general, but we only consider the case where X has a pdf or pmf.
Since g(X) ≥ 0 and E[g(X)] = 0, X cannot have a pdf unless g(x) = 0 for (almost) all x, in which case P(g(X) = 0) = 1.

If X is discrete and takes values x_j with positive probabilities p_j, then
$$ 0 = E[g(X)] = \sum_k g(x_k)\, p_k \ge g(x_j)\, p_j $$
implies that g(x_j) = 0 for all j, i.e., P(g(X) = 0) = 1.

Theorem 2.3.4.
If X has a finite variance, then for any constants a and b, Var(aX + b) = a^2 Var(X).

Proof.
By definition,
$$ \mathrm{Var}(aX+b) = E[(aX+b) - E(aX+b)]^2 = E[aX - aE(X)]^2 = a^2 E[X - E(X)]^2 = a^2\,\mathrm{Var}(X) $$

Example 2.3.3 (the exponential distribution)
Suppose that X has the following pdf:

$$ f_X(x) = \begin{cases} \frac{1}{\lambda} e^{-x/\lambda} & x \ge 0 \\ 0 & x < 0 \end{cases} $$
where λ > 0 is a constant.

$$ \mu = E(X) = \int_0^{\infty} \frac{1}{\lambda}\, x e^{-x/\lambda}\,dx = \left.-x e^{-x/\lambda}\right|_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx = \lambda $$

$$ \mathrm{Var}(X) = E(X-\lambda)^2 = E(X^2) - \lambda^2 = \int_0^{\infty} \frac{x^2}{\lambda}\, e^{-x/\lambda}\,dx - \lambda^2 $$
$$ = \left.-x^2 e^{-x/\lambda}\right|_0^{\infty} + \int_0^{\infty} 2x e^{-x/\lambda}\,dx - \lambda^2 = 2\lambda\int_0^{\infty} \frac{1}{\lambda}\, x e^{-x/\lambda}\,dx - \lambda^2 = 2\lambda^2 - \lambda^2 = \lambda^2 $$
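A quick Monte Carlo check of these two moments (a sketch; the value of λ, the seed, and the sample size are arbitrary choices of mine): the sample mean and sample variance of simulated exponential draws should land near λ and λ^2.

```python
import random

random.seed(1)
lam = 3.0
N = 500_000

# random.expovariate takes the rate parameter, so rate = 1/lam gives mean lam
sample = [random.expovariate(1.0 / lam) for _ in range(N)]

mean = sum(sample) / N
var = sum((x - mean) ** 2 for x in sample) / N
print(mean, lam)        # both approximately 3
print(var, lam ** 2)    # both approximately 9
```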

Example 2.3.5 (the binomial distribution)
Suppose that X is discrete with pmf
$$ f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0,1,\dots,n, $$
where n is a fixed positive integer and 0 < p < 1 is a fixed constant. We have obtained that E(X) = np. Using the identity
$$ x^2\binom{n}{x} = x\,\frac{n!}{(x-1)!\,(n-x)!} = x\,n\binom{n-1}{x-1} $$
we obtain
$$ E(X^2) = \sum_{x=0}^{n} x^2\binom{n}{x} p^x (1-p)^{n-x} = n\sum_{x=1}^{n} x\binom{n-1}{x-1} p^x (1-p)^{n-x} $$
$$ = np\sum_{y=0}^{n-1} (y+1)\binom{n-1}{y} p^y (1-p)^{n-1-y} = np[(n-1)p + 1] = n(n-1)p^2 + np $$
$$ \mathrm{Var}(X) = E(X^2) - (EX)^2 = n(n-1)p^2 + np - (np)^2 = np(1-p). $$
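These formulas can be confirmed by exact summation over the pmf (a sketch; the particular n and p are arbitrary choices of mine):

```python
from math import comb

n, p = 12, 0.4
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

EX = sum(x * q for x, q in pmf.items())
EX2 = sum(x * x * q for x, q in pmf.items())
var = EX2 - EX**2

print(EX, n * p)                          # both 4.8 (up to floating-point rounding)
print(EX2, n * (n - 1) * p**2 + n * p)    # both 25.92
print(var, n * p * (1 - p))               # both 2.88
```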