Chapter 5: Multivariate Distributions

Professor Ron Fricker
Naval Postgraduate School
Monterey, California

Reading Assignment: Sections 5.1 – 5.12

Goals for this Chapter

• Bivariate and multivariate distributions
  – Bivariate normal distribution
  – Multinomial distribution
• Marginal and conditional distributions
• Independence, covariance, and correlation
• Expected value & variance of a function of r.v.s
  – Linear functions of r.v.s in particular
• Conditional expectation and variance

Section 5.1: Bivariate and Multivariate Distributions

• A bivariate distribution is a probability distribution on two random variables
  – I.e., it gives the probability of the simultaneous outcome of the two random variables
• For example, when playing craps a gambler might want to know the probability that in a simultaneous roll of two dice each comes up 1
• Another example: In an air attack on a bunker, the probability the bunker is not hardened and does not have SAM protection is of interest
• A multivariate distribution is a probability distribution for more than two r.v.s

Joint Probabilities

• Bivariate and multivariate distributions are joint probabilities – the probability that two or more events occur
  – It's the probability of the intersection of $n \ge 2$ events: $\{Y_1 = y_1\}, \{Y_2 = y_2\}, \ldots, \{Y_n = y_n\}$
  – We'll denote the joint (discrete) probabilities as $P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$
• We'll sometimes use the shorthand notation $p(y_1, y_2, \ldots, y_n)$
  – This is the probability that the event $\{Y_1 = y_1\}$ and the event $\{Y_2 = y_2\}$ and … and the event $\{Y_n = y_n\}$ all occurred simultaneously

Section 5.2: Bivariate and Multivariate Distributions

• A simple example of a bivariate distribution arises when throwing two dice

– Let Y1 be the number of spots on die 1

– Let Y2 be the number of spots on die 2
• Then there are 36 possible joint outcomes (remember the mn rule: 6 × 6 = 36)

– The outcomes (y1, y2) are (1,1), (1,2), (1,3), …, (6,6)
• Assuming the dice are independent, all the outcomes are equally likely, so
$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = 1/36, \quad y_1 = 1, 2, \ldots, 6,\; y_2 = 1, 2, \ldots, 6$
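• A quick numerical illustration in R (not part of the original slides; the object name p is arbitrary):

    # Joint pmf of two fair dice: a 6 x 6 table with every entry 1/36
    p <- matrix(1/36, nrow = 6, ncol = 6, dimnames = list(1:6, 1:6))
    sum(p)        # all 36 joint probabilities sum to 1
    p["1", "1"]   # P(Y1 = 1, Y2 = 1) = 1/36, e.g., both dice showing 1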

Bivariate PMF for the Example


Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.

Defining Joint Probability Functions (for Discrete Random Variables)

• Definition 5.1: Let Y1 and Y2 be discrete random variables. The joint (or bivariate)

probability (mass) function for Y1 and Y2 is
$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2), \quad -\infty < y_1 < \infty,\; -\infty < y_2 < \infty$

• Theorem 5.1: If Y1 and Y2 are discrete r.v.s with joint probability function $p(y_1, y_2)$, then
  1. $p(y_1, y_2) \ge 0$ for all $y_1, y_2$

  2. $\sum_{y_1, y_2} p(y_1, y_2) = 1$, where the sum is over all $(y_1, y_2)$ with non-zero $p(y_1, y_2)$

Back to the Dice Example

• Find $P(2 \le Y_1 \le 3,\ 1 \le Y_2 \le 2)$.
• Solution:

Textbook Example 5.1

• A NEX has 3 checkout counters.
  – Two customers arrive independently at the counters at different times when no other customers are present.

– Let Y1 denote the number (of the two) who choose counter 1 and let Y2 be the number who choose counter 2.

– Find the joint distribution of Y1 and Y2. • Solution:

Textbook Example 5.1 Solution Continued

Joint Distribution Functions

• Definition 5.2: For any random variables

(discrete or continuous) Y1 and Y2, the joint (bivariate) (cumulative) distribution function is
$F(y_1, y_2) = P(Y_1 \le y_1, Y_2 \le y_2), \quad -\infty < y_1 < \infty,\; -\infty < y_2 < \infty$

• For two discrete r.v.s Y1 and Y2,

$F(y_1, y_2) = \sum_{t_1 \le y_1} \sum_{t_2 \le y_2} p(t_1, t_2)$

Back to the Dice Example

• Find $F(2, 3) = P(Y_1 \le 2, Y_2 \le 3)$.
• Solution:
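• A numerical check (a sketch in R, not the in-class worked solution), using the dice pmf from Section 5.2:

    # Joint pmf of two fair dice
    p <- matrix(1/36, nrow = 6, ncol = 6, dimnames = list(1:6, 1:6))
    # F(2, 3) = P(Y1 <= 2, Y2 <= 3): sum over y1 in {1, 2} and y2 in {1, 2, 3}
    sum(p[1:2, 1:3])   # 6/36 = 1/6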

Textbook Example 5.2

• Back to Example 5.1. Find $F(-1, 2)$, $F(1.5, 2)$, and $F(5, 7)$.
• Solution:

Properties of Joint CDFs

• Theorem 5.2: If Y1 and Y2 are random variables with joint cdf F(y1, y2), then
  1. $F(-\infty, -\infty) = F(y_1, -\infty) = F(-\infty, y_2) = 0$
  2. $F(\infty, \infty) = 1$
  3. If $y_1^* \ge y_1$ and $y_2^* \ge y_2$, then
     $F(y_1^*, y_2^*) - F(y_1^*, y_2) - F(y_1, y_2^*) + F(y_1, y_2) \ge 0$

Property 3 follows because

$F(y_1^*, y_2^*) - F(y_1^*, y_2) - F(y_1, y_2^*) + F(y_1, y_2) = P(y_1 < Y_1 \le y_1^*,\; y_2 < Y_2 \le y_2^*) \ge 0$

Joint Probability Density Functions

• Definition 5.3: Let Y1 and Y2 be continuous random variables with joint distribution

function F(y1, y2). If there exists a non-negative function f(y1, y2) such that
$F(y_1, y_2) = \int_{t_1=-\infty}^{y_1} \int_{t_2=-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1$
for all $-\infty < y_1 < \infty$, $-\infty < y_2 < \infty$, then Y1 and Y2 are said to be jointly continuous random variables and f(y1, y2) is called their joint probability density function.

Properties of Joint PDFs

• Theorem 5.3: If Y1 and Y2 are random variables with joint pdf f(y1, y2), then
  1. $f(y_1, y_2) \ge 0$ for all $y_1, y_2$
  2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1\, dy_2 = 1$
• An illustrative joint pdf, which

is a surface in three dimensions:

Volumes Correspond to Probabilities

• For jointly continuous random variables, volumes under the pdf surface correspond to probabilities
• E.g., for the pdf in Figure 5.2, the probability $P(a_1 \le Y_1 \le a_2,\; b_1 \le Y_2 \le b_2)$ corresponds to the volume shown

• It’s equal to

$\int_{b_1}^{b_2} \int_{a_1}^{a_2} f(y_1, y_2)\, dy_1\, dy_2$


Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.

Textbook Example 5.3

• Suppose a radioactive particle is located in a

square with sides of unit length. Let Y1 and Y2 denote the particle's location and assume it is uniformly distributed in the square:
$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
a. Sketch the pdf
b. Find F(0.2, 0.4)
c. Find $P(0.1 \le Y_1 \le 0.3,\ 0.0 \le Y_2 \le 0.5)$
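• One way to check parts (b) and (c) numerically (a sketch in R assuming the uniform pdf above; the helper names f and pbox are arbitrary):

    # f(y1, y2) = 1 on the unit square, 0 elsewhere
    f <- function(y1, y2) ifelse(y1 >= 0 & y1 <= 1 & y2 >= 0 & y2 <= 1, 1, 0)

    # P(a1 <= Y1 <= a2, b1 <= Y2 <= b2) by iterated numerical integration
    pbox <- function(a1, a2, b1, b2) {
      inner <- function(y1) sapply(y1, function(u)
        integrate(function(y2) f(u, y2), b1, b2)$value)
      integrate(inner, a1, a2)$value
    }

    pbox(0, 0.2, 0, 0.4)      # part b: F(0.2, 0.4) = 0.08
    pbox(0.1, 0.3, 0.0, 0.5)  # part c: 0.2 * 0.5 = 0.10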

Textbook Example 5.3 Solution

Textbook Example 5.3 Solution (cont'd)


Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.

Textbook Example 5.4

• Gasoline is stored on a FOB in a bulk tank.

– Let Y1 denote the proportion of the tank available at the beginning of the week after restocking.

– Let Y2 denote the proportion of the tank that is dispensed over the week.

– Note that Y1 and Y2 must be between 0 and 1 and y2 must be less than or equal to y1.
– Let the joint pdf be
$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
a. Sketch the pdf
b. Find $P(0 \le Y_1 \le 0.5,\ 0.25 \le Y_2)$

Textbook Example 5.4 Solution

• First, sketching the pdf:


Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.

Textbook Example 5.4 Solution

• Now, to solve for the probability:
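• A numerical check of part (b) (a sketch in R; the in-class solution worked the analytic double integral):

    # f(y1, y2) = 3*y1 on the triangle 0 <= y2 <= y1 <= 1, 0 elsewhere
    f <- function(y1, y2) ifelse(y2 >= 0 & y2 <= y1 & y1 <= 1, 3 * y1, 0)

    # P(0 <= Y1 <= 0.5, Y2 >= 0.25): integrate y2 over [0.25, 1], then y1 over [0, 0.5]
    inner <- function(y1) sapply(y1, function(u)
      integrate(function(y2) f(u, y2), 0.25, 1)$value)
    integrate(inner, 0, 0.5)$value   # about 0.039 (= 5/128)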

Calculating Probabilities from Multivariate Distributions

• Finding probabilities from a bivariate pdf requires double integration over the proper region of distributional support
• For multivariate distributions, the idea is the same – you just have to integrate over n dimensions:
$P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n) = F(y_1, y_2, \ldots, y_n) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f(t_1, t_2, \ldots, t_n)\, dt_n \ldots dt_1$

Section 5.2 Homework

• Do problems 5.1, 5.4, 5.5, 5.9

Section 5.3: Marginal and Conditional Distributions

• Marginal distributions connect the concept of (bivariate) joint distributions to univariate distributions (such as those in Chapters 3 & 4)
  – As we'll see, in the discrete case, the name "marginal" follows from summing across rows or down columns of a table
• Conditional distributions are what arise when, in a joint distribution, we fix the value of one of the random variables
  – The name follows from traditional lexicon like "The

distribution of Y1, conditional on Y2 = y2, is…”

An Illustrative Example of Marginal Distributions

• Remember the dice tossing of the previous section, where we defined

– Y1 to be the number of spots on die 1

– Y2 to be the number of spots on die 2

• There are 36 possible joint outcomes (y1, y2), which are (1,1), (1,2), (1,3), …, (6,6)
• Assuming the dice are independent, the joint distribution is
$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = 1/36, \quad y_1 = 1, 2, \ldots, 6,\; y_2 = 1, 2, \ldots, 6$

An Illustrative Example (cont'd)

• Now, we might want to know the probability of the univariate event $P(Y_1 = y_1)$
• Given that all of the joint events are mutually exclusive, we have that
$P(Y_1 = y_1) = \sum_{y_2=1}^{6} p(y_1, y_2)$
  – For example,
$P(Y_1 = 1) = p_1(1) = \sum_{y_2=1}^{6} p(1, y_2) = p(1,1) + p(1,2) + p(1,3) + p(1,4) + p(1,5) + p(1,6) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6$

An Illustrative Example (cont'd)

• In tabular form:

                    y1
            1     2     3     4     5     6   Total
  y2   1  1/36  1/36  1/36  1/36  1/36  1/36   1/6
       2  1/36  1/36  1/36  1/36  1/36  1/36   1/6
       3  1/36  1/36  1/36  1/36  1/36  1/36   1/6
       4  1/36  1/36  1/36  1/36  1/36  1/36   1/6
       5  1/36  1/36  1/36  1/36  1/36  1/36   1/6
       6  1/36  1/36  1/36  1/36  1/36  1/36   1/6
   Total   1/6   1/6   1/6   1/6   1/6   1/6

The body of the table is the joint distribution of Y1 and Y2, p(y1, y2); the column of row totals is the marginal distribution of Y2, p2(y2); and the row of column totals is the marginal distribution of Y1, p1(y1).
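• The same marginals can be obtained in R by summing the rows and columns of the joint pmf table (a sketch, not part of the original slides):

    # Joint pmf of the two dice; rows indexed by y2, columns by y1
    p <- matrix(1/36, nrow = 6, ncol = 6, dimnames = list(y2 = 1:6, y1 = 1:6))
    rowSums(p)   # marginal of Y2: p2(y2) = 1/6 for every value
    colSums(p)   # marginal of Y1: p1(y1) = 1/6 for every value
    # For these (independent) dice the joint pmf is the product of the marginals
    all.equal(p, outer(rowSums(p), colSums(p)), check.attributes = FALSE)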

Defining Marginal Probability Functions

• Definition 5.4a: Let Y1 and Y2 be discrete random variables with pmf p(y1, y2). The marginal probability functions of Y1 and Y2, respectively, are

$p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2)$
and

$p_2(y_2) = \sum_{\text{all } y_1} p(y_1, y_2)$

Defining Marginal Density Functions

• Definition 5.4b: Let Y1 and Y2 be jointly continuous random variables with pdf f (y1, y2). The marginal density functions of Y1 and Y2, respectively, are

$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2$
and

$f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1$

Textbook Example 5.5

• From a group of three Navy officers, two Marine Corps officers, and one Army officer, an amphibious assault planning team of two officers is to be randomly selected.

– Let Y1 denote the number of Navy officers and Y2 denote the number of Marine Corps officers on the planning team.

• Find the joint probability function of Y1 and Y2 and then find the marginal pmf of Y1. • Solution:

Textbook Example 5.5 Solution (cont'd)

Textbook Example 5.6

• Let Y1 and Y2 have joint density
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Sketch f(y1, y2) and find the marginal densities for Y1 and Y2.
• Solution:

Textbook Example 5.6 Solution (cont'd)

Conditional Distributions

• The idea of conditional distributions is that we know some information about one (or more) of the jointly distributed r.v.s
  – This knowledge changes our probability
• Example:
  – An unconditional distribution: the distribution of height of a randomly chosen NPS student
  – A conditional distribution: given the randomly chosen student is female, the distribution of height
• The information that the randomly chosen student is female changes the distribution of heights

Connecting Back to Event-based Probabilities

• Remember from Chapter 2, for events A and B, the probability of the intersection is $P(A \cap B) = P(A)P(B \mid A)$
• Now, consider two numerical events, $(Y_1 = y_1)$ and $(Y_2 = y_2)$, for which we can then write

$p(y_1, y_2) = p_1(y_1)\, p(y_2 \mid y_1) = p_2(y_2)\, p(y_1 \mid y_2)$
where the interpretation of $p(y_1 \mid y_2)$ is the probability that r.v. Y1 equals y1 given that Y2 takes on value y2

Defining Conditional Probability Functions

• Definition 5.5: If Y1 and Y2 are jointly discrete r.v.s with pmf p(y1, y2) and marginal pmfs p1(y1) and p2(y2), respectively, then the conditional probability function of Y1 given Y2 is
$P(Y_1 = y_1 \mid Y_2 = y_2) = \dfrac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_2 = y_2)}$

$= \dfrac{p(y_1, y_2)}{p_2(y_2)} = p(y_1 \mid y_2)$
  – Note that $p(y_1 \mid y_2)$ is not defined if $p_2(y_2) = 0$
  – The conditional probability function of Y2 given Y1 is similarly defined

Textbook Example 5.7

• Back to Example 5.5: Find the conditional

distribution of Y1 given Y2 =1. That is, find the conditional distribution of the number of Navy officers on the planning team given that we’re told one Marine Corps officer is on the team. • Solution:

Textbook Example 5.7 Solution (cont'd)

Generalizing to Continuous R.V.s

• Defining a conditional probability function for continuous r.v.s is problematic because $P(Y_1 = y_1 \mid Y_2 = y_2)$ is undefined, since both $(Y_1 = y_1)$ and $(Y_2 = y_2)$ are events with zero probability
  – The problem, of course, is that for continuous random variables, the probability the r.v. takes on any particular value is zero
  – So, generalizing Definition 5.5 doesn't work because the denominator is always zero
  – But we can make headway by using a different approach…

Defining Conditional Distribution Functions

• Definition 5.6: Let Y1 and Y2 be jointly continuous random variables with pdf f(y1, y2). The conditional distribution function of Y1 given Y2 = y2 is
$F(y_1 \mid y_2) = P(Y_1 \le y_1 \mid Y_2 = y_2)$
• And, with a bit of derivation (see the text), we have that
$F(y_1 \mid y_2) = \int_{-\infty}^{y_1} \frac{f(t_1, y_2)}{f_2(y_2)}\, dt_1$
• From this, we can define conditional pdfs…

Defining Conditional Density Functions

• Definition 5.7: Let Y1 and Y2 be jointly continuous r.v.s with pdf f (y1, y2) and marginal densities f 1(y1) and f2(y2), respectively. For any y2 such that f2(y2) > 0, the conditional density function of Y1 given Y2 = y2 is

$f(y_1 \mid y_2) = \dfrac{f(y_1, y_2)}{f_2(y_2)}$

• The conditional pdf of Y2 given Y1 = y1 is similarly defined as $f(y_2 \mid y_1) = f(y_1, y_2)/f_1(y_1)$

Textbook Example 5.8

• A soft drink machine has a random amount Y2 (in gallons) in supply at the beginning of the

day and dispenses a random amount Y1 during the day. It is not resupplied during the day, so $Y_1 \le Y_2$, and the joint pdf is
$f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2 \\ 0, & \text{elsewhere} \end{cases}$
What is the probability that less than ½ gallon will be sold given that ("conditional on") the machine contains 1.5 gallons at the start of the day?
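• A numerical check of the answer (a sketch in R, not the in-class solution): the conditional density is the joint density along y2 = 1.5 rescaled by the marginal f2(1.5):

    # f(y1, y2) = 1/2 on 0 <= y1 <= y2 <= 2, 0 elsewhere
    f  <- function(y1, y2) ifelse(y1 >= 0 & y1 <= y2 & y2 <= 2, 1/2, 0)
    f2 <- function(y2) integrate(function(y1) f(y1, y2), 0, 2)$value  # marginal of Y2

    num <- integrate(function(y1) f(y1, 1.5), 0, 0.5)$value  # joint density slice over y1 <= 0.5
    num / f2(1.5)                                            # P(Y1 <= 0.5 | Y2 = 1.5) = 1/3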

Textbook Example 5.8 Solution

• Solution:

Textbook Example 5.8 Solution (cont'd)

Section 5.3 Homework

• Do problems 5.19, 5.22, 5.27

Section 5.4: Independent Random Variables

• In Chapter 2 we defined the independence of events – here we now extend that idea to the independence of random variables

• Two jointly distributed random variables Y1 and Y2 are independent if the probabilities associated with Y1 are the same regardless of the observed

value of Y2

– The idea is that learning something about Y2 does not tell you anything, probabilistically speaking,

about the distribution of Y1

The Idea of Independence

• Remember that two events A and B are independent if $P(A \cap B) = P(A) \times P(B)$
• Now, when talking about r.v.s, we're often interested in events like $P(a < Y_1 \le b,\ c < Y_2 \le d)$

– That is, if Y1 and Y2 are independent, then the joint probabilities are the product of the marginal probs

Defining Independence for R.V.s

• Definition 5.8: Let Y1 have cdf F1 (y1), Y2 have cdf F2 (y2), and Y1 and Y2 have joint cdf F(y1,y2). Then Y1 and Y2 are independent iff F (y1,y2)=F1(y1)F2(y2)

for every pair of real numbers $(y_1, y_2)$.

• If Y1 and Y2 are not independent, then they are said to be dependent
• It is hard to demonstrate independence according to this definition
  – The next theorem will make it easier to determine if two r.v.s are independent

Determining Independence

• Theorem 5.4: If Y1 and Y2 are discrete r.v.s with joint pmf p(y1, y2) and marginal pmfs p1(y1) and p2(y2), then Y1 and Y2 are independent iff $p(y_1, y_2) = p_1(y_1)\, p_2(y_2)$ for all pairs of real numbers $(y_1, y_2)$.

• If Y1 and Y2 are continuous r.v.s with joint pdf f(y1, y2) and marginal pdfs f1(y1) and f2(y2), then Y1 and Y2 are independent iff $f(y_1, y_2) = f_1(y_1)\, f_2(y_2)$ for all pairs of real numbers $(y_1, y_2)$.

Textbook Example 5.9

• For the die-tossing problem of Section 5.2,

show that Y1 and Y2 are independent. • Solution:

Textbook Example 5.10

• In Example 5.5, is the number of Navy officers independent of the number of Marine

Corps officers? (I.e., is Y1 independent of Y2?) • Solution:

Textbook Example 5.11

• Let
$f(y_1, y_2) = \begin{cases} 6 y_1 y_2^2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Show that Y1 and Y2 are independent.
• Solution:

Textbook Example 5.11 Solution (cont'd)

Textbook Example 5.12

• Let
$f(y_1, y_2) = \begin{cases} 2, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Show that Y1 and Y2 are dependent.
• Solution:

Textbook Example 5.12 Solution (cont'd)

A Useful Theorem for Determining Independence of Two R.V.s

• Theorem 5.5: Let Y1 and Y2 have joint density f(y1, y2) that is positive iff $a \le y_1 \le b$ and $c \le y_2 \le d$, for constants a, b, c, and d, and f(y1, y2) = 0 otherwise. Then Y1 and Y2 are independent r.v.s iff
$f(y_1, y_2) = g(y_1)\, h(y_2)$

where g(y1) is a nonnegative function of y1 alone and h(y2) is a nonnegative function of y2 alone

– Note that g(y1) and h(y2) do not have to be densities!

Textbook Example 5.13

• Let Y1 and Y2 have joint density
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Are Y1 and Y2 independent?
• Solution:

Textbook Example 5.13 Solution (cont'd)

Textbook Example 5.14

• Referring back to Example 5.4, is Y1, the proportion of the tank available at the

beginning of the week, independent of Y2, the proportion of the tank that is dispensed over the week? Remember, the joint pdf is
$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
• Solution:

Textbook Example 5.14 Solution (cont'd)

Section 5.4 Homework

• Do problems 5.45, 5.49, 5.53

Section 5.5: Expected Value of a Function of R.V.s

• Definition 5.9: Let g(Y1, Y2,…, Yk) be a function of the discrete random variables Y1, Y2,…, Yk, which have pmf p(y1, y2,…, yk). Then the expected value of g(Y1, Y2,…, Yk) is E [g(Y1,Y2,...,Yk)] =

$\sum_{\text{all } y_1} \sum_{\text{all } y_2} \cdots \sum_{\text{all } y_k} g(y_1, y_2, \ldots, y_k)\, p(y_1, y_2, \ldots, y_k)$

If Y1,Y2,…,Yk are continuous with pdf f (y1, y2,…, yk), then E [g(Y1,Y2,...,Yk)] =

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_k)\, f(y_1, y_2, \ldots, y_k)\, dy_1\, dy_2 \ldots dy_k$

Textbook Example 5.15

• Let Y1 and Y2 have joint density
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find E(Y1Y2).
• Solution:
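• A numerical check (a sketch in R, not the worked solution):

    # f(y1, y2) = 2*y1 on the unit square, 0 elsewhere
    f <- function(y1, y2) ifelse(y1 >= 0 & y1 <= 1 & y2 >= 0 & y2 <= 1, 2 * y1, 0)

    # E(Y1*Y2) by iterated numerical integration
    inner <- function(y1) sapply(y1, function(u)
      integrate(function(y2) u * y2 * f(u, y2), 0, 1)$value)
    integrate(inner, 0, 1)$value   # 1/3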

Note the Consistency in Univariate and Bivariate Expectation Definitions

• Let g(Y1, Y2) = Y1. Then we have

$\begin{aligned} E[g(Y_1, Y_2)] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1, y_2)\, f(y_1, y_2)\, dy_2\, dy_1 \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1\, f(y_1, y_2)\, dy_2\, dy_1 \\ &= \int_{-\infty}^{\infty} y_1 \left[ \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \right] dy_1 \\ &= \int_{-\infty}^{\infty} y_1\, f_1(y_1)\, dy_1 = E(Y_1) \end{aligned}$

Textbook Example 5.16

• Let Y1 and Y2 have joint density
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find E(Y1).
• Solution:

Textbook Example 5.17

• For Y1 and Y2 with joint density as described in Example 5.15, in Figure 5.6 it appears that

the mean of Y2 is 0.5. Find E(Y2) and evaluate whether the visual estimate is correct or not. • Solution:

Textbook Example 5.18

• Let Y1 and Y2 have joint density
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find V(Y1).
• Solution:

Textbook Example 5.19

• An industrial chemical process yields a product with two types of impurities.

– In a given sample, let Y1 denote the proportion of impurities in the sample and let Y2 denote the proportion of “type I” impurities (out of all the impurities).

– The joint density of Y1 and Y2 is
$f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find the expected value of the proportion of type I impurities in the sample.

Textbook Example 5.19 Solution

Section 5.6: Special Theorems

• The univariate expectation results from Sections 3.3 and 4.3 generalize directly to joint distributions of random variables

• Theorem 5.6: Let c be a constant. Then E(c)=c

• Theorem 5.7: Let g(Y1,Y2) be a function of the random variables Y1 and Y2 and let c be a constant. Then E [cg(Y1,Y2)] = cE [g(Y1,Y2)]

• Theorem 5.8: Let Y1 and Y2 be random variables and g1(Y1, Y2), g2(Y1, Y2), …, gk(Y1, Y2) be functions of Y1 and Y2. Then
$E[g_1(Y_1, Y_2) + g_2(Y_1, Y_2) + \cdots + g_k(Y_1, Y_2)] = E[g_1(Y_1, Y_2)] + E[g_2(Y_1, Y_2)] + \cdots + E[g_k(Y_1, Y_2)]$

• We won’t prove these…proofs follow directly from the way we proved the results of Sections 3.3 and 4.3

Textbook Example 5.20

• Referring back to Example 5.4, find the expected

value of Y1 − Y2, where
$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
• Solution:

Textbook Example 5.20 Solution (cont'd)

A Useful Theorem for Calculating Expectations

• Theorem 5.9: Let Y1 and Y2 be independent r.v.s and g(Y1) and h(Y2) be functions of only Y1 and Y2 , respectively. Then E [g(Y1)h(Y2)] = E [g(Y1)] E [h(Y2)] provided the expectations exist. • Proof:

Proof of Theorem 5.9 Continued

Textbook Example 5.21

• Referring back to Example 5.19, with the pdf

below, Y1 and Y2 are independent.
$f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find E(Y1Y2).
• Solution:

Textbook Example 5.21 Solution (cont'd)

Section 5.5 and 5.6 Homework

• Do problems 5.72, 5.74, 5.77

Section 5.7: The Covariance of Two Random Variables

• When looking at two random variables, a common question is whether they are associated with one another
  – That is, for example, as one tends to increase, does the other do so as well?
  – If so, the colloquial terminology is to say that the two variables are correlated
• Correlation is also a technical term used to describe the association
  – But it has a precise definition (that is more restrictive than the colloquial use of the term)
  – And there is a second measure from which correlation is derived: covariance

Covariance and Correlation

• Covariance and correlation are measures of dependence between two random variables • The figure below shows two extremes: perfect dependence and independence

Defining Covariance

• Definition 5.10: If Y1 and Y2 are r.v.s with means $\mu_1$ and $\mu_2$, respectively, then the

covariance of Y1 and Y2 is defined as
$\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)]$
• Idea:

[Scatterplot annotation: for the observations in the upper-right and lower-left regions, $(Y_1 - \mu_1)(Y_2 - \mu_2) > 0$, so the expected value is positive. If the observations had instead fallen around a line with a negative slope, the expected value would be negative.]

An Example of Low Covariance

[Scatterplot annotation: in two of the four quadrants (upper right and lower left), $(Y_1 - \mu_1)(Y_2 - \mu_2) > 0$; in the other two (upper left and lower right), $(Y_1 - \mu_1)(Y_2 - \mu_2) < 0$.]

So the positives and negatives balance out and the expected value is very small, near zero.

Understanding Covariance

• Covariance gives the strength and direction of

linear relationship between Y1 and Y2: – Strong-positive: large positive covariance – Strong-negative: large negative covariance – Weak positive: small positive covariance – Weak negative: small negative covariance

• But what is “large” and “small”?

– It depends on the measurement units of Y1 and Y2

Big Covariance?

[Two scatterplots of height versus parents' height, one with height in inches (Cov = 1.46) and one with height in feet (Cov = 0.0101): same picture, different scale, different covariance.]

Correlation of Random Variables

• The correlation (coefficient) $\rho$ of two random variables is defined as
$\rho = \dfrac{\mathrm{Cov}(Y_1, Y_2)}{\sigma_1 \sigma_2}$
• As with covariance, correlation is a measure

of the dependence of two r.v.s Y1 and Y2 – But note that it’s re-scaled to be unit-free and measurement invariant

– In particular, for any random variables Y1 and Y2, $-1 \le \rho \le +1$

Correlation is Unitless

[The same two scatterplots: whether height is measured in inches or in feet, $\rho = 0.339$.]

Interpreting the Correlation Coefficient

• A positive correlation coefficient ($\rho > 0$) means there is a positive association

between Y1 and Y2

– As Y1 increases, Y2 also tends to increase
• A negative correlation coefficient ($\rho < 0$) means there is a negative association

between Y1 and Y2

– As Y1 increases, Y2 tends to decrease
• A correlation coefficient equal to 0 ($\rho = 0$) means there is no linear association between

Y1 and Y2

Interactive Correlation Demo

From R (with the shiny package installed), run: shiny::runGitHub('CorrelationDemo','rdfricker')

An Alternative Covariance Formula

• Theorem 5.10: If Y1 and Y2 are r.v.s with means $\mu_1$ and $\mu_2$, respectively, then
$\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - E(Y_1)E(Y_2)$
• Proof:

Textbook Example 5.22

• Referring back to Example 5.4, find the

covariance between Y1 and Y2, where
$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
• Solution:
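• A numerical check of the covariance using Theorem 5.10 (a sketch in R, not the worked solution; the helper name Eg is arbitrary):

    # f(y1, y2) = 3*y1 on 0 <= y2 <= y1 <= 1, 0 elsewhere
    f <- function(y1, y2) ifelse(y2 >= 0 & y2 <= y1 & y1 <= 1, 3 * y1, 0)

    # E[g(Y1, Y2)] by iterated numerical integration over the unit square
    Eg <- function(g) {
      inner <- function(y1) sapply(y1, function(u)
        integrate(function(y2) g(u, y2) * f(u, y2), 0, 1)$value)
      integrate(inner, 0, 1)$value
    }

    Eg(function(y1, y2) y1 * y2) - Eg(function(y1, y2) y1) * Eg(function(y1, y2) y2)
    # Cov(Y1, Y2) is about 0.019 (= 3/160)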

Textbook Example 5.22 Solution (cont'd)

Textbook Example 5.23

• Let Y1 and Y2 have the following joint pdf
$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find the covariance of Y1 and Y2.
• Solution:

Correlation and Independence

• Theorem 5.11: If Y1 and Y2 are independent r.v.s, then $\mathrm{Cov}(Y_1, Y_2) = 0$. That is, independent r.v.s are uncorrelated.
• Proof:

Textbook Example 5.24

• Let Y1 and Y2 be discrete r.v.s with joint pmf as shown in Table 5.3. Show that Y1 and Y2 are dependent but have zero covariance.

• Solution:

Textbook Example 5.24 Solution (cont'd)

Important Take-Aways

• High correlation merely means that there is a strong linear relationship

– Not that Y1 causes Y2 or Y2 causes Y1 • E.g., there could be a third factor that causes

both Y1 and Y2 to move in the same direction

– No causal connection between Y1 and Y2
  – But high correlation
• Remember:
  ✓ High correlation does not imply causation
  ✓ Zero correlation does not mean there is no

relationship between Y1 and Y2

Section 5.7 Homework

• Do problems 5.89, 5.91, 5.92, 5.95

The Expected Value and Variance of Linear Functions of R.V.s (Section 5.8)

• In OA3102 you will frequently encounter "estimators" that are linear combinations of random variables

• For example,
$U_1 = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n = \sum_{i=1}^{n} a_i Y_i$
where the $Y_i$s are r.v.s and the $a_i$s are constants
• Thus, for example, we might want to know the

expected value of U1, or the variance of U1, or perhaps the covariance between U1 and U2

Expected Value, Variance, and Covariance of Linear Combinations of R.V.s

• Theorem 5.12: Let Y1, Y2, …, Yn and X1, X2, …, Xm be r.v.s with $E(Y_i) = \mu_i$ and $E(X_j) = \xi_j$. Define
$U_1 = \sum_{i=1}^{n} a_i Y_i \quad \text{and} \quad U_2 = \sum_{j=1}^{m} b_j X_j$
for constants a1, …, an and b1, …, bm. Then:

a. $E(U_1) = \sum_{i=1}^{n} a_i \mu_i$ and similarly $E(U_2) = \sum_{j=1}^{m} b_j \xi_j$
b. $V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \mathop{\sum\sum}_{1 \le i < j \le n} a_i a_j \mathrm{Cov}(Y_i, Y_j)$

Expected Value, Variance, and Covariance of Linear Combinations of R.V.s (cont'd)

c. And, $\mathrm{Cov}(U_1, U_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \mathrm{Cov}(Y_i, X_j)$
• How will this be relevant?
  – In OA3102 you will need to estimate the population mean $\mu$ using data
  – You will use the sample mean as an "estimator," which is defined as $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
  – This is a linear combination of r.v.s with constants $a_i = 1/n$ for all i

Textbook Example 5.25

• Let Y1, Y2, and Y3 be r.v.s, where:
  § $E(Y_1) = 1$, $E(Y_2) = 2$, $E(Y_3) = -1$
  § $V(Y_1) = 1$, $V(Y_2) = 3$, $V(Y_3) = 5$
  § $\mathrm{Cov}(Y_1, Y_2) = -0.4$, $\mathrm{Cov}(Y_1, Y_3) = 0.5$, $\mathrm{Cov}(Y_2, Y_3) = 2$
Find the expected value and variance of

U = Y1 - 2Y2 +Y3. If W = 3Y1 + Y2, find Cov(U,W). • Solution:
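• Theorem 5.12 can be evaluated compactly in matrix form (a sketch in R using the values as given above: E(U) = a'mu, V(U) = a' Sigma a, Cov(U, W) = a' Sigma b):

    mu    <- c(1, 2, -1)                  # E(Y1), E(Y2), E(Y3)
    Sigma <- matrix(c(1.0, -0.4,  0.5,    # variances on the diagonal,
                     -0.4,  3.0,  2.0,    # covariances off the diagonal
                      0.5,  2.0,  5.0), nrow = 3, byrow = TRUE)
    a <- c(1, -2, 1)    # U = Y1 - 2*Y2 + Y3
    b <- c(3,  1, 0)    # W = 3*Y1 + Y2

    drop(t(a) %*% mu)            # E(U)  = -4
    drop(t(a) %*% Sigma %*% a)   # V(U)  = 12.6
    drop(t(a) %*% Sigma %*% b)   # Cov(U, W) = 2.5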

Textbook Example 5.25 Solution (cont'd)

Textbook Example 5.25 Solution (cont'd)

Proof of Theorem 5.12

• Now, let’s prove Theorem 5.12, including parts a and b…

Proof of Theorem 5.12 Continued

Proof of Theorem 5.12 Continued

Textbook Example 5.26

• Referring back to Examples 5.4 and 5.20, for the joint distribution

$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1 \\ 0, & \text{elsewhere} \end{cases}$
find the variance of Y1 − Y2.
• Solution:

Textbook Example 5.26 Solution (cont'd)

Textbook Example 5.26 Solution (cont'd)

Textbook Example 5.27

• Let Y1, Y2, …, Yn be independent random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show that $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$ for
$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
• Solution:

Textbook Example 5.27 Solution (cont'd)

Textbook Example 5.28

• The number of defectives Y in a sample of n = 10 items sampled from a manufacturing process follows a binomial distribution. An estimator for the fraction defective in the lot is the random variable $\hat{p} = Y/n$. Find the expected value and variance of $\hat{p}$.
• Solution:

Textbook Example 5.28 Solution (cont'd)

Textbook Example 5.29

• Suppose that an urn contains r red balls and (N-r) black balls. – A random sample of n balls is drawn without replacement and Y, the number of red balls in the sample, is observed. – From Chapter 3 we know that Y has a hypergeometric distribution. – Find E ( Y ) and V ( Y ) . • Solution:

Textbook Example 5.29 Solution (cont'd)

Textbook Example 5.29 Solution (cont'd)

Section 5.8 Homework

• Do problems 5.102, 5.103, 5.106, 5.113

Section 5.9: The Multinomial Distribution

• In a binomial experiment there are n identical trials, each with a binary outcome and the same success probability from trial to trial
• But there are many situations in which the number of possible outcomes per trial is greater than two
  – When testing a missile, the outcomes may be "miss," "partial kill," and "complete kill"
  – When randomly sampling students at NPS, their affiliation may be Army, Navy, Marine Corps, US civilian, non-US military, non-US civilian, other

Defining a Multinomial Experiment

• Definition 5.11: A multinomial experiment has the following properties:
  1. The experiment consists of n identical trials.
  2. The outcome of each trial falls into one of k classes or cells.
  3. The probability the outcome of a single trial falls

into cell i, $p_i$, i = 1, …, k, remains the same from trial to trial, and $p_1 + p_2 + \cdots + p_k = 1$.
  4. The trials are independent.

  5. The random variables of interest are Y1, Y2, …, Yk, the number of outcomes that fall in each of the cells, where $Y_1 + Y_2 + \cdots + Y_k = n$.

Defining the Multinomial Distribution

• Definition 5.12: Assume that $p_i > 0$, i = 1, …, k, are such that $p_1 + p_2 + \cdots + p_k = 1$. The random variables Y1, Y2, …, Yk have a multinomial distribution with

parameters n and p1, p2, …, pk if their joint distribution is

$p(y_1, y_2, \ldots, y_k) = \dfrac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}$

where, for each i, $y_i = 0, 1, 2, \ldots, n$ and $\sum_{i=1}^{k} y_i = n$
• Note that the binomial is just a special case of the multinomial with k = 2

Textbook Example 5.30

• According to recent figures, the table below gives the proportion of US adults that fall into each age category:

Age (years)   Proportion
18–24         0.18
25–34         0.23
35–44         0.16
45–64         0.27
65+           0.16

• If 5 adults are randomly sampled out of the population, what’s the probability the sample contains one person between 18 and 24 years, two between 25 and 34, and two between 45 and 64 years?
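• A numerical evaluation of this probability (a sketch in R, not the slide's worked solution):

    probs  <- c(0.18, 0.23, 0.16, 0.27, 0.16)   # cell probabilities from the table
    counts <- c(1, 2, 0, 2, 0)                  # desired counts for n = 5 adults

    # Direct use of the multinomial pmf: n!/(y1!...yk!) * p1^y1 * ... * pk^yk
    factorial(5) / prod(factorial(counts)) * prod(probs^counts)

    # Same calculation with the built-in multinomial pmf
    dmultinom(counts, size = 5, prob = probs)   # about 0.021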

Textbook Example 5.30 Solution

• Solution:

Expected Value, Variance, and Covariance of the Multinomial Distribution

• Theorem 5.13: If Y1, Y2, …, Yk have a multinomial distribution with parameters n and

p1 , p2 ,…, pk , then

1. E(Yi)=npi

2. $V(Y_i) = n p_i q_i$ where, as always, $q_i = 1 - p_i$
3. $\mathrm{Cov}(Y_s, Y_t) = -n p_s p_t$, if $s \ne t$
• Proof:

Proof of Theorem 5.13 Continued

Application Example

• Under certain conditions against a particular type of target, the Tomahawk Land Attack Missile – Conventional (TLAM-C) has a 50% chance of a complete target kill, a 30% chance of a partial kill, and a 20% chance of no kill/miss. In 10 independent shots, each to a different target, what’s the probability: – There are 8 complete kills, 2 partial kills, and 0 no kills/misses? – There are at least 9 complete or partial kills?

Application Example Solution

Application Example Solution in R

• The first question via brute force:

• Now, using the dmultinom() function

• For the second question, reduce it to a binomial where the probability of a complete or partial kill is 0.8
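• The original slide showed screenshots of the R session; a sketch of equivalent commands (an assumption about what was run, not a copy of the original code):

    # First question, brute force: 10!/(8! 2! 0!) * 0.5^8 * 0.3^2 * 0.2^0
    factorial(10) / (factorial(8) * factorial(2) * factorial(0)) * 0.5^8 * 0.3^2 * 0.2^0

    # Same thing with dmultinom()
    dmultinom(c(8, 2, 0), size = 10, prob = c(0.5, 0.3, 0.2))

    # Second question: treat "complete or partial kill" as one outcome with
    # probability 0.5 + 0.3 = 0.8 and use the binomial distribution
    sum(dbinom(9:10, size = 10, prob = 0.8))    # or: 1 - pbinom(8, 10, 0.8)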

Section 5.9 Homework

• Do problems 5.122, 5.125

Section 5.10: The Bivariate Normal Distribution

• The bivariate normal distribution is an important one, both in terms of statistics (you’ll need it in OA3103), as well as probability modeling • The distribution has 5 parameters:

– $\mu_1$, the mean of variable Y1
– $\mu_2$, the mean of variable Y2
– $\sigma_1^2$, the variance of variable Y1
– $\sigma_2^2$, the variance of variable Y2
– $\rho$, the correlation between Y1 and Y2

The Bivariate Normal PDF

• The bivariate normal pdf is a bit complicated

• It's
$f(y_1, y_2) = \dfrac{e^{-Q/2}}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}$
where
$Q = \dfrac{1}{1 - \rho^2}\left[ \dfrac{(y_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\dfrac{(y_1 - \mu_1)(y_2 - \mu_2)}{\sigma_1 \sigma_2} + \dfrac{(y_2 - \mu_2)^2}{\sigma_2^2} \right]$

for $-\infty < y_1 < \infty$ and $-\infty < y_2 < \infty$
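• A direct transcription of the pdf in R (a sketch, not part of the original slides; argument names are arbitrary):

    dbvnorm <- function(y1, y2, mu1 = 0, mu2 = 0, sd1 = 1, sd2 = 1, rho = 0) {
      z1 <- (y1 - mu1) / sd1
      z2 <- (y2 - mu2) / sd2
      Q  <- (z1^2 - 2 * rho * z1 * z2 + z2^2) / (1 - rho^2)
      exp(-Q / 2) / (2 * pi * sd1 * sd2 * sqrt(1 - rho^2))
    }

    # Sanity check: with rho = 0 the joint pdf factors into the two normal marginals
    dbvnorm(0.5, -1, rho = 0)
    dnorm(0.5) * dnorm(-1)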

Interactive Bivariate Normal Distribution Demo

From R (with the shiny package installed), run: shiny::runGitHub('BivariateNormDemo','rdfricker')

Some Facts About the Bivariate Normal

• If Y1 and Y2 are jointly distributed according to a bivariate normal distribution, then

– Y1 has a (univariate) normal distribution with mean $\mu_1$ and variance $\sigma_1^2$

– Y2 has a (univariate) normal distribution with mean $\mu_2$ and variance $\sigma_2^2$
– $\mathrm{Cov}(Y_1, Y_2) = \rho \sigma_1 \sigma_2$, and if $\mathrm{Cov}(Y_1, Y_2) = 0$ (equivalently $\rho = 0$) then Y1 and Y2 are independent
• Note that this result is not true in general: zero correlation between any two random variables does not imply independence
• But it does for two jointly (bivariate) normally distributed random variables!

Illustrative Bivariate Normal Application

From R (with the shiny package installed), run: shiny::runGitHub('BivariateNormExample','rdfricker')

The Multivariate Normal Distribution

• The multivariate normal distribution is a joint distribution for k > 1 random variables
• The pdf for a multivariate normal is

$f(\mathbf{y}) = \dfrac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (\mathbf{y} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right\}$
where
  – $\mathbf{y}$ and $\boldsymbol{\mu}$ are k-dim. vectors for the location at which to evaluate the pdf and the means, respectively
  – $\Sigma$ is the variance-covariance matrix

– For the bivariate normal: $\mathbf{y} = \{y_1, y_2\}$, $\boldsymbol{\mu} = \{\mu_1, \mu_2\}$, and
$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$

Section 5.11: Conditional Expectations

• In Section 5.3 we learned about conditional pmfs and pdfs – The idea: A conditional distribution is the distribution of one random variable given information about another jointly distributed random variable • Conditional expectations are similar – It’s the expectation of one random variable given information about another jointly distributed random variable – They’re defined just like univariate expectations except that the conditional pmfs and pdfs are used

Defining the Conditional Expectation

• Definition 5.13: If Y1 and Y2 are any random variables, the conditional expectation of g(Y1) given that Y2 = y2 is defined as
$E(g(Y_1) \mid Y_2 = y_2) = \int_{-\infty}^{\infty} g(y_1)\, f(y_1 \mid y_2)\, dy_1$

if Y1 and Y2 are jointly continuous, and

$E(g(Y_1) \mid Y_2 = y_2) = \sum_{\text{all } y_1} g(y_1)\, p(y_1 \mid y_2)$

if Y1 and Y2 are jointly discrete.

Textbook Example 5.31

• For r.v.s Y1 and Y2 from Example 5.8, with joint pdf
$f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2 \\ 0, & \text{elsewhere} \end{cases}$
Find the conditional expectation of Y1 given Y2 = 1.5.
• Solution:

Textbook Example 5.31 Solution (cont'd)

A Bit About Conditional Expectations

• Note that the conditional expectation of Y1 given Y2 = y2 is a function of y2
  – For example, in Example 5.31 we obtained $E(Y_1 \mid Y_2 = y_2) = y_2/2$
• It then follows that $E(Y_1 \mid Y_2) = Y_2/2$
  – That is, the conditional expectation is a function of a random variable and so it's a random variable itself
  – Thus, it has a mean and a variance as well

The Expected Value of a Conditional Expectation

• Theorem 5.14: Let Y1 and Y2 denote random variables. Then
$E[E(Y_1 \mid Y_2)] = E(Y_1)$
where the inside expectation is with respect to the conditional distribution and the outside

expectation is with respect to the pdf of Y2. • Proof:

Proof of Theorem 5.14 Continued

• Note: This theorem is often referred to as the Law of Iterated Expectations
  – Then it's also frequently written in the other direction: $E(Y_1) = E[E(Y_1 \mid Y_2)]$

Textbook Example 5.32

• A QC plan requires sampling n = 10 items and counting the number of defectives, Y.
  – Assume Y has a binomial distribution with p, the probability of observing a defective.
  – But p varies from day to day according to a U(0, 0.25) distribution.
  – Find the expected value of Y.
• Solution:
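• A Monte Carlo check (a simulation sketch, not the analytic solution worked in class):

    # p ~ Uniform(0, 0.25) and, given p, Y ~ Binomial(10, p)
    set.seed(1)
    p <- runif(1e6, 0, 0.25)
    y <- rbinom(1e6, size = 10, prob = p)

    mean(y)   # close to E(Y) = E[E(Y | p)] = 10 * E(p) = 10 * 0.125 = 1.25
    var(y)    # close to V(Y) found in Example 5.33 via Theorem 5.15 (1.5625)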

Textbook Example 5.32 Solution (cont'd)

The Conditional Variance

• The conditional variance of Y1 given Y2 = y2 is defined by analogy with the ordinary variance:
$V(Y_1 \mid Y_2 = y_2) = E(Y_1^2 \mid Y_2 = y_2) - [E(Y_1 \mid Y_2 = y_2)]^2$
• Note that the calculations are basically the same, except the expectations on the right hand side of the equality are taken with respect to the conditional distribution
• Just like with the conditional expectation, the

conditional variance is a function of y2

Another Way to Calculate the Variance

• Theorem 5.15: Let Y1 and Y2 denote random variables. Then
$V(Y_1) = E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)]$
• Proof:

Proof of Theorem 5.15 Continued

Textbook Example 5.33

• From Example 5.32, find the variance of Y. • Solution:

Textbook Example 5.33 Solution (cont'd)

Note How Conditioning Can Help

• Examples 5.32 & 5.33 could have been solved by first finding the unconditional distribution of Y and then calculating E(Y) and V(Y)
• But sometimes that can get pretty complicated, requiring:
  – First finding the joint distribution of Y and the random parameter (p in the previous examples)
  – Then it's necessary to solve for the marginal distribution of Y
  – And then finally, calculate E(Y) and V(Y) in the usual way

Note How Conditioning Can Help

• However, sometimes, like here, it can be easier to work with conditional distributions and the results of Theorems 5.14 and 5.15 • In the examples: – The distribution conditioned on a fixed value of the parameter was clear – Then putting a distribution on the parameter itself accounted for the day-to-day differences • So, the E(Y) and V(Y) can be found without having to first solve for the marginal distribution of Y

Section 5.11 Homework

• Do problems 5.133, 5.139, 5.140

Section 5.12: Summary

• Whew – this was an intense chapter! – But the material is important and will come up in future classes • For example: – Linear functions of random variables: OA3102, OA3103 – Covariance, correlation, and the bivariate normal distribution: OA3103, OA3602 – Conditional expectation and variance, Law of Iterated Expectations, etc: OA3301, OA4301

What We Have Just Learned

• Bivariate and multivariate probability distributions
  – Bivariate normal distribution
  – Multinomial distribution
• Marginal and conditional distributions
• Independence, covariance and correlation
• Expected value & variance of a function of r.v.s
  – Linear functions of r.v.s in particular
• Conditional expectation and variance
