Topic 8
The Expected Value
Among the simplest summaries of quantitative data is the sample mean. Given a random variable, the corresponding concept is given a variety of names: the distributional mean, the expectation, or the expected value. We begin with the case of discrete random variables, where this analogy is more apparent. The formula for continuous random variables is obtained by approximating with a discrete random variable and noticing that the formula for the expected value is a Riemann sum. Thus, expected values for continuous random variables are determined by computing an integral.
8.1 Definition and Properties
Recall that for a data set taking numerical values $x_1, x_2, \ldots, x_n$, the sample mean of a real-valued function $h$ of the data can be computed by evaluating the sum
$$\overline{h(x)} = \sum_{x} h(x)\, p(x),$$
where $p(x)$ is the proportion of observations taking the value $x$. For a finite sample space $\Omega = \{\omega_1, \omega_2, \ldots, \omega_N\}$ and a probability $P$ on $\Omega$, we can define the expectation or the expected value of a random variable $X$ by an analogous average,
$$EX = \sum_{j=1}^{N} X(\omega_j)\, P\{\omega_j\}. \tag{8.1}$$
More generally, for a real-valued function $g$ of the random vector $X = (X_1, X_2, \ldots, X_n)$, we have the formula
$$Eg(X) = \sum_{j=1}^{N} g(X(\omega_j))\, P\{\omega_j\}. \tag{8.2}$$

Notice that even though we have this analogy, the two formulas come from very different starting points. The value of $\overline{h(x)}$ is derived from data, whereas no data are involved in computing $Eg(X)$. The starting point for the expected value is a probability model.

Example 8.1. Roll one die. Then $\Omega = \{1, 2, 3, 4, 5, 6\}$. Let $X$ be the value on the die, so $X(\omega) = \omega$. If the die is fair, then the probability model has $P\{\omega\} = 1/6$ for each outcome $\omega$. Using the formula (8.1), the expected value
$$EX = 1 \cdot P\{1\} + 2 \cdot P\{2\} + 3 \cdot P\{3\} + 4 \cdot P\{4\} + 5 \cdot P\{5\} + 6 \cdot P\{6\}$$
$$= 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6} = \frac{7}{2}.$$
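Formula (8.1) is simple to evaluate numerically. As a minimal sketch in R (the language used for the exercises below), for the fair die:

```r
x <- 1:6            # the six faces of the die
p <- rep(1/6, 6)    # fair die: each outcome has probability 1/6
sum(x * p)          # expected value EX = 3.5, that is, 7/2
```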
An example of an unfair die would be the probability with $P\{1\} = P\{2\} = P\{3\} = 1/4$ and $P\{4\} = P\{5\} = P\{6\} = 1/12$. In this case, the expected value
$$EX = 1 \cdot \frac{1}{4} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{4} + 4 \cdot \frac{1}{12} + 5 \cdot \frac{1}{12} + 6 \cdot \frac{1}{12} = \frac{11}{4}.$$

Exercise 8.2. Use the formula (8.2) with $g(x) = x^2$ to find $EX^2$ for these two examples.

Two properties of expectation are immediate from the formula for $EX$ in (8.1):

1. If $X(\omega) \ge 0$ for every outcome $\omega \in \Omega$, then every term in the sum in (8.1) is nonnegative and consequently their sum $EX \ge 0$.

2. Let $X_1$ and $X_2$ be two random variables and $c_1, c_2$ be two real numbers. Then, by applying $g(x_1, x_2) = c_1 x_1 + c_2 x_2$ and the distributive property to the sum in (8.2), we find that
$$E[c_1 X_1 + c_2 X_2] = c_1 EX_1 + c_2 EX_2.$$
The first of these properties states that nonnegative random variables have nonnegative expected value. The second states that expectation is a linear operation. Taking these two properties together, we say that the operation of taking an expectation
$$X \mapsto EX$$
is a positive linear functional. We have studied extensively another example of a positive linear functional, namely, the definite integral
$$g \mapsto \int_a^b g(x)\, dx$$
that takes a continuous positive function and gives the area between the graph of $g$ and the $x$-axis between the vertical lines $x = a$ and $x = b$. For this example, these two properties become:
1. If $g(x) \ge 0$ for every $x \in [a, b]$, then $\int_a^b g(x)\, dx \ge 0$.

2. Let $g_1$ and $g_2$ be two continuous functions and $c_1, c_2$ be two real numbers. Then
$$\int_a^b (c_1 g_1(x) + c_2 g_2(x))\, dx = c_1 \int_a^b g_1(x)\, dx + c_2 \int_a^b g_2(x)\, dx.$$

This analogy will be useful to keep in mind when considering the properties of expectation.
Example 8.3. If $X_1$ and $X_2$ are the values on two rolls of a fair die, then the expected value of the sum
$$E[X_1 + X_2] = EX_1 + EX_2 = \frac{7}{2} + \frac{7}{2} = 7.$$
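Linearity can also be checked by simulation. A short R sketch, averaging many simulated pairs of rolls:

```r
# simulate two rolls of a fair die many times and average the sums
x1 <- sample(1:6, 10000, replace = TRUE)
x2 <- sample(1:6, 10000, replace = TRUE)
mean(x1 + x2)   # close to the exact expected value 7
```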
8.2 Discrete Random Variables
Because sample spaces can be extraordinarily large even in routine situations, we rarely use the probability space $\Omega$ as the basis to compute the expected value. We illustrate this with the example of tossing a coin three times. Let $X$ denote the number of heads. To compute the expected value $EX$, we can proceed as described in (8.1). In the table below, we have grouped the outcomes $\omega$ that have a common value $x = 3, 2, 1$, or $0$ for $X(\omega)$. From the definition of expectation in (8.1), $EX$, the expected value of $X$, is the sum of the values in column F. We now want to show that $EX$ is also the sum of the values in column G. Note, for example, that the three outcomes $HHT$, $HTH$, and $THH$ each give a value of $2$ for $X$. Because these outcomes are disjoint, we can add probabilities:
$$P\{HHT\} + P\{HTH\} + P\{THH\} = P\{HHT, HTH, THH\}.$$
| A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|
| $\omega$ | $X(\omega)$ | $x$ | $P\{\omega\}$ | $P\{X = x\}$ | $X(\omega)P\{\omega\}$ | $xP\{X = x\}$ |
| HHH | 3 | 3 | $P\{HHH\}$ | $P\{X = 3\}$ | $X(HHH)P\{HHH\}$ | $3P\{X = 3\}$ |
| HHT | 2 |  | $P\{HHT\}$ |  | $X(HHT)P\{HHT\}$ |  |
| HTH | 2 | 2 | $P\{HTH\}$ | $P\{X = 2\}$ | $X(HTH)P\{HTH\}$ | $2P\{X = 2\}$ |
| THH | 2 |  | $P\{THH\}$ |  | $X(THH)P\{THH\}$ |  |
| HTT | 1 |  | $P\{HTT\}$ |  | $X(HTT)P\{HTT\}$ |  |
| THT | 1 | 1 | $P\{THT\}$ | $P\{X = 1\}$ | $X(THT)P\{THT\}$ | $1P\{X = 1\}$ |
| TTH | 1 |  | $P\{TTH\}$ |  | $X(TTH)P\{TTH\}$ |  |
| TTT | 0 | 0 | $P\{TTT\}$ | $P\{X = 0\}$ | $X(TTT)P\{TTT\}$ | $0P\{X = 0\}$ |

Table I: Developing the formula for $EX$ for the case of the coin tosses.
But the event $\{HHT, HTH, THH\}$ can also be written as the event $\{X = 2\}$. This is shown for each value of $x$ in column C. The probabilities $P\{X = x\}$ in column E are obtained as a sum of probabilities in column D. Thus, by combining outcomes that result in the same value for the random variable, the sums in the boxes in column F are equal to the values in the corresponding boxes in column G, and thus their total sums are the same. In other words,
$$EX = 0 \cdot P\{X = 0\} + 1 \cdot P\{X = 1\} + 2 \cdot P\{X = 2\} + 3 \cdot P\{X = 3\}.$$

As in the discussion above, we can, in general, find $Eg(X)$. First, to build a table, denote the outcomes in the probability space $\Omega$ as $\omega_1, \ldots, \omega_k, \omega_{k+1}, \ldots, \omega_N$ and the state space for the random variable $X$ as $x_1, \ldots, x_i, \ldots, x_n$. Note that we have partitioned the sample space $\Omega$ into the outcomes $\omega$ that result in the same value $x$ for the random variable $X(\omega)$. This is shown by the horizontal lines in the table above, showing that $X(\omega_k) = X(\omega_{k+1}) = \cdots = x_i$. The equality of the sum of the probabilities in a box in column D and the probability in column E can be written
$$\sum_{\omega;\, X(\omega) = x_i} P\{\omega\} = P\{X = x_i\}.$$
| A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|
| $\omega$ | $X(\omega)$ | $x$ | $P\{\omega\}$ | $P\{X = x\}$ | $g(X(\omega))P\{\omega\}$ | $g(x)P\{X = x\}$ |
| $\vdots$ | $\vdots$ |  | $\vdots$ |  | $\vdots$ |  |
| $\omega_k$ | $X(\omega_k)$ |  | $P\{\omega_k\}$ |  | $g(X(\omega_k))P\{\omega_k\}$ |  |
| $\omega_{k+1}$ | $X(\omega_{k+1})$ | $x_i$ | $P\{\omega_{k+1}\}$ | $P\{X = x_i\}$ | $g(X(\omega_{k+1}))P\{\omega_{k+1}\}$ | $g(x_i)P\{X = x_i\}$ |
| $\vdots$ | $\vdots$ |  | $\vdots$ |  | $\vdots$ |  |
Table II: Establishing the identity in (8.3) from (8.2). Arrange the rows of the table so that common values of $X(\omega_k), X(\omega_{k+1}), \ldots$ in the box in column B have the value $x_i$ in column C. Thus, the probabilities in a box in column D sum to give the probability in the corresponding box in column E. Because the values $g(X(\omega_k)), g(X(\omega_{k+1})), \ldots$ equal $g(x_i)$, the terms in a box in column F sum to the value in the corresponding box in column G. Thus, the sums in columns F and G are equal. The sum in column F is the definition in (8.2). The sum in column G is the identity (8.3).
For these particular outcomes, $g(X(\omega)) = g(x_i)$ and the sum of the values in a box in column F,
$$\sum_{\omega;\, X(\omega) = x_i} g(X(\omega))\, P\{\omega\} = \sum_{\omega;\, X(\omega) = x_i} g(x_i)\, P\{\omega\} = g(x_i)\, P\{X = x_i\},$$
the value in the corresponding box in column G. Now, sum over all possible values of $X$ on each side of this equation:
$$Eg(X) = \sum_{\omega} g(X(\omega))\, P\{\omega\} = \sum_{i=1}^{n} g(x_i)\, P\{X = x_i\} = \sum_{i=1}^{n} g(x_i) f_X(x_i),$$
where $f_X(x_i) = P\{X = x_i\}$ is the probability mass function for $X$.

The identity
$$Eg(X) = \sum_{i=1}^{n} g(x_i) f_X(x_i) = \sum_{x} g(x) f_X(x) \tag{8.3}$$
is the most frequently used method for computing the expectation of discrete random variables. We will soon see how this identity can be used to find the expectation in the case of continuous random variables.
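The identity (8.3) translates directly into code. Below is a minimal R sketch; the helper name `expect` is our own choice for illustration:

```r
# expected value of g(X) for a discrete random variable:
# x holds the states, fx their probabilities, g a vectorized function
expect <- function(g, x, fx) sum(g(x) * fx)

# the fair die from Example 8.1
expect(identity, 1:6, rep(1/6, 6))         # EX   = 3.5
expect(function(x) x^2, 1:6, rep(1/6, 6))  # EX^2 = 91/6, a check on Exercise 8.2
```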
Example 8.4. Flip a biased coin twice and let $X$ be the number of heads. Then, to compute the expected values of $X$ and $X^2$, we construct a table to prepare to use (8.3).
| $x$ | $f_X(x)$ | $x f_X(x)$ | $x^2 f_X(x)$ |
|---|---|---|---|
| 0 | $(1-p)^2$ | 0 | 0 |
| 1 | $2p(1-p)$ | $2p(1-p)$ | $2p(1-p)$ |
| 2 | $p^2$ | $2p^2$ | $4p^2$ |
| sum | 1 | $2p$ | $2p + 2p^2$ |
Thus, $EX = 2p$ and $EX^2 = 2p + 2p^2$.
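To check these formulas numerically, pick an illustrative value of $p$ (here $p = 0.3$, our choice) and apply (8.3):

```r
p  <- 0.3                         # arbitrary illustrative value for the bias
x  <- 0:2                         # possible numbers of heads in two flips
fx <- c((1-p)^2, 2*p*(1-p), p^2)  # the mass function from the table
sum(x * fx)     # EX   = 2p        = 0.6
sum(x^2 * fx)   # EX^2 = 2p + 2p^2 = 0.78
```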
Exercise 8.5. Draw 5 cards from a standard deck. Let $X$ be the number of hearts. Use R to find $EX$ and $EX^2$.
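One possible approach: the number of hearts in 5 cards drawn without replacement follows a hypergeometric distribution, so R's `dhyper` supplies the mass function and (8.3) gives the moments. A sketch:

```r
x  <- 0:5                                # possible numbers of hearts
fx <- dhyper(x, m = 13, n = 39, k = 5)   # 13 hearts, 39 non-hearts, 5 cards drawn
sum(x * fx)     # EX = 5/4 = 1.25
sum(x^2 * fx)   # EX^2
```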
A formula similar to (8.3) holds if we have a vector of random variables $X = (X_1, X_2, \ldots, X_n)$ with joint probability mass function $f_X$ and $g$ a real-valued function of $x = (x_1, x_2, \ldots, x_n)$. In the two-dimensional case, this takes the form
$$Eg(X_1, X_2) = \sum_{x_1} \sum_{x_2} g(x_1, x_2)\, f_{X_1, X_2}(x_1, x_2). \tag{8.4}$$

We will return to (8.4) in computing the covariance of two random variables.
8.3 Bernoulli Trials
Bernoulli trials are the simplest and among the most common models for an experimental procedure. Each trial has two possible outcomes, variously called,
heads-tails, yes-no, up-down, left-right, win-lose, female-male, green-blue, dominant-recessive, or success-failure, depending on the circumstances. We will use the principles of counting and the properties of expectation to analyze Bernoulli trials. From the point of view of statistics, the data have an unknown success parameter $p$. Thus, the goal of statistical inference is to make as precise a statement as possible about the value of $p$ behind the production of the data. Consequently, any experimenter who uses Bernoulli trials as a model ought to mirror their properties closely.
Example 8.6 (Bernoulli trials). Random variables $X_1, X_2, \ldots, X_n$ are called a sequence of Bernoulli trials provided that:

1. Each $X_i$ takes on two values, namely, 0 and 1. We call the value 1 a success and the value 0 a failure.
2. Each trial has the same probability for success, i.e., $P\{X_i = 1\} = p$ for each $i$.

3. The outcomes on each of the trials are independent.

For each trial $i$, the expected value
$$EX_i = 0 \cdot P\{X_i = 0\} + 1 \cdot P\{X_i = 1\} = 0 \cdot (1-p) + 1 \cdot p = p$$
is the same as the success probability. Let $S_n = X_1 + X_2 + \cdots + X_n$ be the total number of successes in $n$ Bernoulli trials. Using the linearity of expectation, we see that
$$ES_n = E[X_1 + X_2 + \cdots + X_n] = p + p + \cdots + p = np,$$
the expected number of successes in $n$ Bernoulli trials.
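This can be checked by simulation. A sketch in R with illustrative values $n = 10$ and $p = 0.3$:

```r
n <- 10; p <- 0.3                             # illustrative choices
sn <- rbinom(10000, size = n, prob = p)       # 10000 realizations of S_n
mean(sn)                                      # close to np = 3
```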
In addition, we can use our ability to count to determine the probability mass function for $S_n$. Beginning with a concrete example, let $n = 8$, and consider the outcome
success, fail, fail, success, fail, fail, success, fail.
Using the independence of the trials, we can compute the probability of this outcome:
$$p \times (1-p) \times (1-p) \times p \times (1-p) \times (1-p) \times p \times (1-p) = p^3 (1-p)^5.$$
Moreover, any of the possible $\binom{8}{3}$ particular sequences of 8 Bernoulli trials having 3 successes also has probability $p^3(1-p)^5$. These outcomes are mutually exclusive, and, taken together, their union is the event $\{S_8 = 3\}$. Consequently, by the axioms of probability, we find that
$$P\{S_8 = 3\} = \binom{8}{3} p^3 (1-p)^5.$$
Returning to the general case, we replace 8 by $n$ and 3 by $x$ to see that any particular sequence of $n$ Bernoulli trials having $x$ successes has probability
$$p^x (1-p)^{n-x}.$$
In addition, we know that we have
$$\binom{n}{x}$$
mutually exclusive sequences of $n$ Bernoulli trials that have $x$ successes. Thus, we have the mass function
$$f_{S_n}(x) = P\{S_n = x\} = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n.$$
The fact that the sum
$$\sum_{x=0}^{n} f_{S_n}(x) = \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (p + (1-p))^n = 1^n = 1$$
follows from the binomial theorem. Consequently, $S_n$ is called a binomial random variable.
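In R, this mass function is available as `dbinom`. As a check on the concrete case above, with an illustrative value $p = 0.4$:

```r
p <- 0.4                              # illustrative success probability
choose(8, 3) * p^3 * (1 - p)^5        # direct evaluation of P{S_8 = 3}
dbinom(3, size = 8, prob = p)         # the same value via dbinom
sum(dbinom(0:8, size = 8, prob = p))  # the mass function sums to 1
```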
In the exercise above where $X$ is the number of hearts in 5 cards, let $X_i = 1$ if the $i$-th card is a heart and 0 if it is not a heart. Then, the $X_i$ are not Bernoulli trials because the chance of obtaining a heart on one card depends on whether or not a heart was obtained on other cards. Still,
$$X = X_1 + X_2 + X_3 + X_4 + X_5$$
is the number of hearts, and
$$EX = EX_1 + EX_2 + EX_3 + EX_4 + EX_5 = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{5}{4}.$$
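A simulation sketch confirming this value, coding hearts as 1 and the other cards as 0:

```r
# draw 5 cards from a deck with 13 hearts (coded 1) and 39 others (coded 0)
deck <- c(rep(1, 13), rep(0, 39))
hearts <- replicate(10000, sum(sample(deck, 5)))  # sampling without replacement
mean(hearts)    # close to 5/4 = 1.25
```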
8.4 Continuous Random Variables

[Figure: the density $f_X$ of a continuous random variable.]

For $X$ a continuous random variable with density $f_X$, consider the discrete random variable $\tilde{X}$ obtained from $X$ by rounding down. Say, for example, we give lengths by rounding down to the nearest millimeter. Thus, $\tilde{X} = 2.134$ meters for any length $X$ satisfying $2.134$ meters $\le X < 2.135$ meters.
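Anticipating the Riemann-sum computation this rounding suggests, here is a numerical sketch. We assume, purely for illustration, the exponential density $f_X(x) = e^{-x}$, $x \ge 0$ (our choice, not the text's); rounding down to a grid of width $\Delta x$ gives a discrete approximation to $EX = \int x f_X(x)\, dx$:

```r
# Riemann-sum approximation to EX for an illustrative density f(x) = exp(-x)
f  <- function(x) exp(-x)
dx <- 0.001                 # rounding grid, like rounding down to the nearest mm
x  <- seq(0, 20, by = dx)   # truncate the tail, which is negligible here
sum(x * f(x) * dx)          # approximately 1, the exact value of EX
integrate(function(x) x * f(x), 0, Inf)$value  # exact integration also gives 1
```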