
Lecture 4: Expectations

The expected value, also called the expectation or mean, of a random variable is its average value weighted by its probability distribution.

Definition 2.2.1.
The expected value or mean of a random variable $g(X)$ is
\[
E[g(X)] =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ has pdf } f_X,\\[1ex]
\displaystyle\sum_{x} g(x) f_X(x) & \text{if } X \text{ has pmf } f_X,
\end{cases}
\]
provided that the integral or the sum exists (is finite); otherwise we say that the expected value of $g(X)$ does not exist.

The expected value is a number that summarizes a typical, middle, or expected value of an observation of the random variable.
If $g(X) \ge 0$, then $E[g(X)]$ is always defined, although it may equal $\infty$.
For any $g(X)$, its expected value exists iff $E|g(X)| < \infty$.
The expectation is determined by the distribution of $X$, not by the particular random variable $X$ itself.

Example
$X$ has distribution

x        -2    -1    0     1     2     3
f_X(x)   0.1   0.2   0.1   0.2   0.3   0.1

\[
E(X) = \sum_x x f_X(x) = -2\times 0.1 - 0.2 + 0.2 + 2\times 0.3 + 3\times 0.1 = 0.7
\]
\[
E(X^2) = \sum_x x^2 f_X(x) = 4\times 0.1 + 0.2 + 0.2 + 4\times 0.3 + 9\times 0.1 = 2.9
\]
$Y = g(X) = X^2$ has distribution

y        0     1     4     9
f_Y(y)   0.1   0.4   0.4   0.1

\[
E(Y) = E(X^2) = \sum_y y f_Y(y) = 0.4 + 4\times 0.4 + 9\times 0.1 = 2.9
\]

Example 2.2.2.
Suppose $X \ge 0$ has pdf $f_X(x) = \lambda^{-1} e^{-x/\lambda}$, where $\lambda > 0$ is a constant. Then
\[
E(X) = \lambda^{-1}\int_0^{\infty} x e^{-x/\lambda}\,dx
= \left. -x e^{-x/\lambda}\right|_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx = \lambda.
\]

In some cases, the calculation of the expectation requires some derivation.

Example 2.2.3.
Let $X$ be a discrete random variable with pmf
\[
f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \dots, n,
\]
where $n$ is a fixed integer and $0 < p < 1$ is a fixed constant. For any $n$ and $p$,
\[
\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = 1.
\]
By definition and the fact that $x\binom{n}{x} = n\binom{n-1}{x-1}$, we obtain
\[
E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^x (1-p)^{n-x}
= \sum_{x=1}^{n} x\binom{n}{x} p^x (1-p)^{n-x}
= n\sum_{x=1}^{n} \binom{n-1}{x-1} p^x (1-p)^{n-x}
= np\sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-1-y}
= np.
\]

Example 2.2.4 (Nonexistence of expectation)
Suppose that $X$ has the following pdf:
\[
f_X(x) = \frac{1}{\pi(1+x^2)}, \qquad x \in \mathbb{R}.
\]
Note that
\[
E|X| = \int_{-\infty}^{\infty} \frac{|x|}{\pi(1+x^2)}\,dx
= \frac{2}{\pi}\int_0^{\infty} \frac{x}{1+x^2}\,dx
= \frac{2}{\pi}\lim_{M\to\infty}\int_0^{M} \frac{x}{1+x^2}\,dx
= \frac{2}{\pi}\lim_{M\to\infty}\int_0^{M} \frac{1}{1+x^2}\,d(x^2/2)
\]
\[
= \frac{2}{\pi}\lim_{M\to\infty}\left.\frac{\log(1+x^2)}{2}\right|_0^{M}
= \frac{2}{\pi}\lim_{M\to\infty}\frac{\log(1+M^2)}{2} = \infty.
\]
Thus, neither $X$ nor $|X|$ has an expectation. We should not wrongly think that
\[
E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\,dx
= \lim_{M\to\infty}\int_{-M}^{M} \frac{x}{\pi(1+x^2)}\,dx
= \lim_{M\to\infty} 0 = 0.
\]

Linear operation
Taking expectation is a linear operation in the sense that
$E(c) = c$ for any constant $c$;
$E(aX) = aE(X)$ for any constant $a$ and random variable $X$;
$E(X + Y) = E(X) + E(Y)$ for any random variables $X$ and $Y$.
These are special cases of the following theorem.

Theorem 2.2.5
Let $X$ and $Y$ be random variables whose expectations exist, and let $a$, $b$, and $c$ be constants.
a. $E(aX + bY + c) = aE(X) + bE(Y) + c$.
b. If $X \ge Y$, then $E(X) \ge E(Y)$.

Proof.
The proof of Theorem 2.2.5 is easy if $X = g_1(Z)$ and $Y = g_2(Z)$ for a random variable $Z$ and some functions $g_1$ and $g_2$, and $Z$ has a pdf or pmf. In general, the proof of property a is not simple (a topic in Stat 709).
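The finite-support examples above are easy to check numerically. The following is a minimal Python sketch, not part of the lecture, assuming NumPy is available; the particular values $n = 10$, $p = 0.3$, $a = 2$, $c = 5$ are arbitrary choices for illustration. It evaluates $E(X)$ and $E(X^2)$ for the tabulated pmf, confirms the binomial mean $np$ of Example 2.2.3, and checks Theorem 2.2.5a in the special case $E(aX + c) = aE(X) + c$.

import numpy as np
from math import comb

# Discrete expectation: E[g(X)] = sum over x of g(x) * f_X(x).
x = np.array([-2, -1, 0, 1, 2, 3], dtype=float)
fx = np.array([0.1, 0.2, 0.1, 0.2, 0.3, 0.1])

EX = np.sum(x * fx)        # 0.7
EX2 = np.sum(x**2 * fx)    # 2.9

# Example 2.2.3: the binomial(n, p) mean equals n*p (n and p chosen arbitrarily here).
n, p = 10, 0.3
binom_mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Theorem 2.2.5a with b = 0: E(aX + c) = a*E(X) + c for the tabulated X.
a, c = 2.0, 5.0
lhs = np.sum((a * x + c) * fx)
rhs = a * EX + c

print(EX, EX2)             # 0.7 2.9
print(binom_mean, n * p)   # both 3.0, up to floating-point error
print(lhs, rhs)            # equal

The same recipe works for any pmf with finite support, since the defining sum is then a finite sum.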
Example 2.2.6 (Minimizing distance)
The mean of a random variable $X$ is a good predictor of $X$. Suppose that we measure the distance between $X$ and a constant $b$ by $(X - b)^2$; the closer $b$ is to $X$, the smaller this quantity is. We want to find the value of $b$ that minimizes the average $E(X - b)^2$.

If $\mu = E(X)$, then
\[
E(X - \mu)^2 = \min_{b \in \mathbb{R}} E(X - b)^2.
\]
To show this, note that
\[
E(X - b)^2 = E[(X - \mu) + (\mu - b)]^2
= E[(X - \mu)^2] + E[(\mu - b)^2] + E[2(X - \mu)(\mu - b)]
\]
\[
= E(X - \mu)^2 + (\mu - b)^2 + 2(\mu - b)E(X - \mu)
= E(X - \mu)^2 + (\mu - b)^2 \ge E(X - \mu)^2.
\]
There is an alternative proof using $E(X - b)^2 = E(X^2) - 2bE(X) + b^2$.

Nonlinear functions
When calculating the expectation of a nonlinear function of $X$, we can proceed in one of two ways.
Using the distribution of $X$, we can calculate
\[
E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \quad\text{or}\quad \sum_x g(x) f_X(x).
\]
Alternatively, we can find the pdf or pmf $f_Y$ of $Y = g(X)$ and then use
\[
E[g(X)] = E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy \quad\text{or}\quad \sum_y y f_Y(y).
\]

Example 2.2.7.
Suppose that $X$ has a uniform pdf on $(0, 1)$,
\[
f_X(x) = \begin{cases} 1 & 0 < x < 1,\\ 0 & \text{otherwise.} \end{cases}
\]
Consider $g(X) = -\log X$. Then
\[
E[g(X)] = \int_0^1 (-\log x)\,dx = \left. (x - x\log x)\right|_0^1 = 1.
\]
On the other hand, we can use Theorem 2.1.5 to obtain the pdf of $Y = -\log X$:
\[
f_Y(y) = f_X(e^{-y})\,\bigl| -e^{-y} \bigr| = e^{-y}, \qquad y > 0.
\]
Then
\[
E[g(X)] = E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_0^{\infty} y e^{-y}\,dy = 1.
\]

Example
Suppose that $X$ has the standard normal pdf
\[
f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad x \in \mathbb{R}.
\]
Consider $g(X) = X^2$. By Example 2.1.9, $Y = X^2$ has the chi-square pdf
\[
f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \qquad y > 0.
\]
Then
\[
E(X^2) = E(Y) = \int_0^{\infty} y f_Y(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} \sqrt{y}\, e^{-y/2}\,dy = 1.
\]

Theorem
Let $F$ be the cdf of a random variable $X$. Then
\[
E|X| = \int_0^{\infty} [1 - F(x)]\,dx + \int_{-\infty}^0 F(x)\,dx,
\]
and $E|X| < \infty$ iff both integrals are finite. In the case where $E|X| < \infty$,
\[
E(X) = \int_0^{\infty} [1 - F(x)]\,dx - \int_{-\infty}^0 F(x)\,dx.
\]

Proof.
We only consider the case where $X$ has a pdf or pmf, and give a proof when $X$ has a pdf (the proof for the case where $X$ has a pmf is similar). When $X$ has a pdf $f$, exchanging the order of integration gives
\[
\int_0^{\infty} [1 - F(x)]\,dx = \int_0^{\infty}\!\!\int_x^{\infty} f(t)\,dt\,dx
= \int_0^{\infty}\!\!\int_0^{t} f(t)\,dx\,dt = \int_0^{\infty} t f(t)\,dt,
\]
\[
\int_{-\infty}^{0} F(x)\,dx = \int_{-\infty}^{0}\!\!\int_{-\infty}^{x} f(t)\,dt\,dx
= \int_{-\infty}^{0}\!\!\int_t^{0} f(t)\,dx\,dt = \int_{-\infty}^{0} (-t) f(t)\,dt.
\]
These calculations are valid regardless of whether the integrals are finite or not.

By definition,
\[
E|X| = \int_{-\infty}^{\infty} |t| f(t)\,dt = \int_0^{\infty} t f(t)\,dt + \int_{-\infty}^{0} (-t) f(t)\,dt
= \int_0^{\infty} [1 - F(x)]\,dx + \int_{-\infty}^{0} F(x)\,dx.
\]
Since both integrals are $\ge 0$, $E|X| < \infty$ iff both integrals are finite. If $E|X| < \infty$, then both integrals are finite and
\[
E(X) = \int_{-\infty}^{\infty} t f(t)\,dt = \int_0^{\infty} t f(t)\,dt + \int_{-\infty}^{0} t f(t)\,dt
= \int_0^{\infty} [1 - F(x)]\,dx - \int_{-\infty}^{0} F(x)\,dx.
\]

Property
For any random variable $X$,
\[
\sum_{n=1}^{\infty} P(|X| \ge n) \le E|X| \le 1 + \sum_{n=1}^{\infty} P(|X| \ge n).
\]
Hence $E|X| < \infty$ iff the sum is finite.

Using the previous theorem, we can prove this result regardless of whether $X$ has a pdf (pmf) or not:
\[
E|X| = \int_0^{\infty} [1 - F(x)]\,dx + \int_{-\infty}^{0} F(x)\,dx
= \int_0^{\infty} [1 - F(x)]\,dx + \int_0^{\infty} F(-x)\,dx
\]
\[
= \int_0^{\infty} [1 - F(x) + F(-x)]\,dx = \int_0^{\infty} [P(X > x) + P(X \le -x)]\,dx
\]
\[
\le \int_0^{\infty} [P(X \ge x) + P(X \le -x)]\,dx = \int_0^{\infty} P(|X| \ge x)\,dx
\]
\[
= \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| \ge x)\,dx \le \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| \ge n)\,dx
= \sum_{n=0}^{\infty} P(|X| \ge n) = 1 + \sum_{n=1}^{\infty} P(|X| \ge n).
\]
Similarly,
\[
E|X| \ge \int_0^{\infty} [P(X > x) + P(X < -x)]\,dx = \int_0^{\infty} P(|X| > x)\,dx
= \sum_{n=0}^{\infty} \int_n^{n+1} P(|X| > x)\,dx \ge \sum_{n=0}^{\infty} P(|X| \ge n+1) = \sum_{n=1}^{\infty} P(|X| \ge n).
\]

Moments
The various moments of a random variable are an important class of expectations.
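The $k$-th moment $E(X^k)$ is simply the expectation of the particular function $g(X) = X^k$, so it can be computed by either of the two routes described above. The following is a minimal Python sketch, not from the lecture; it assumes SciPy's numerical integrator scipy.integrate.quad is available, and the variable names are illustrative. It checks Example 2.2.7 and the standard-normal second moment both ways.

import numpy as np
from scipy.integrate import quad

# Route 1: integrate g(x) * f_X(x) over the support of X.
# Route 2: integrate y * f_Y(y), where f_Y is the pdf of Y = g(X).

# Example 2.2.7: X ~ uniform(0, 1), g(X) = -log X; both routes give 1.
route1, _ = quad(lambda x: -np.log(x), 0.0, 1.0)
route2, _ = quad(lambda y: y * np.exp(-y), 0.0, np.inf)

# Second moment of the standard normal: g(X) = X^2, and Y = X^2 is chi-square(1).
norm_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
chisq1_pdf = lambda y: np.exp(-y / 2) / np.sqrt(2 * np.pi * y)
route1_sq, _ = quad(lambda x: x**2 * norm_pdf(x), -np.inf, np.inf)
route2_sq, _ = quad(lambda y: y * chisq1_pdf(y), 0.0, np.inf)

print(route1, route2)        # both approximately 1
print(route1_sq, route2_sq)  # both approximately 1

Agreement between the two routes is exactly the content of the two displayed formulas for $E[g(X)]$ in the section on nonlinear functions.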