3 Multiple Discrete Random Variables
3.1 Joint densities

Suppose we have a probability space (Ω, F, P) and two discrete random variables X and Y on it. They have probability mass functions fX(x) and fY(y). However, knowing these two functions is not enough. We illustrate this with an example.
Example: We flip a fair coin twice and define several RV's. Let X1 be the number of heads on the first flip (so X1 = 0, 1), let X2 be the number of heads on the second flip, and let Y = 1 − X1. All three RV's have the same p.m.f.: f(0) = 1/2, f(1) = 1/2. So if we only look at the p.m.f.'s of the individual RV's, the pair X1, X2 looks just like the pair X1, Y. But these pairs are very different. For the pair X1, Y, knowing the value of X1 determines the value of Y. But for the pair X1, X2, knowing the value of X1 tells us nothing about X2. Later we will define "independence" for RV's and see that X1, X2 are an example of a pair of independent RV's.

We need to encode information from P about what our random variables do together, so we introduce the following object.

Definition 1. If X and Y are discrete RV's, their joint probability mass function is
f(x, y)= P(X = x, Y = y)
As always, the comma in the event X = x, Y = y means “and”. We will also write this as fX,Y (x, y) when we need to indicate the RV’s we are talking about. This generalizes to more than two RV’s:
Definition 2. If X1,X2, ··· ,Xn are discrete RV’s, then their joint probability mass function is
fX1,X2,···,Xn (x1, x2, ··· , xn)= P(X1 = x1,X2 = x2, ··· ,Xn = xn)
The joint density for n RV's is a function on R^n. Obviously, it is a non-negative function. It is non-zero only on a finite or countable set of points in R^n. If we sum it over these points we get 1:
∑_{x1,...,xn} fX1,X2,...,Xn(x1, x2, ..., xn) = 1

We return to the example we started with.

Example - cont. The joint pmf of X1, X2 is
fX1,X2(0, 0) = fX1,X2(0, 1) = fX1,X2(1, 0) = fX1,X2(1, 1) = 1/4

and that of X1, Y is

fX1,Y(0, 1) = fX1,Y(1, 0) = 1/2,   fX1,Y(0, 0) = fX1,Y(1, 1) = 0
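To make the contrast concrete, here is a small Python sketch (the variable names are ours, not from the text) that enumerates the four equally likely outcomes of the two flips and tabulates both joint pmf's, confirming that the marginals agree while the joints differ:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Sample space: two fair coin flips; each of the 4 outcomes has probability 1/4.
joint_X1X2 = Counter()
joint_X1Y = Counter()
for x1, x2 in product([0, 1], repeat=2):  # 1 = heads on that flip
    y = 1 - x1
    joint_X1X2[(x1, x2)] += Fraction(1, 4)
    joint_X1Y[(x1, y)] += Fraction(1, 4)

def marginal(joint, idx):
    """Sum the joint pmf over the other coordinate."""
    m = Counter()
    for point, p in joint.items():
        m[point[idx]] += p
    return m

# Both pairs have identical individual pmf's...
assert marginal(joint_X1X2, 1) == marginal(joint_X1Y, 1)
# ...but the joint pmf's differ: (0, 0) has probability 1/4 in one and 0 in the other.
assert joint_X1X2[(0, 0)] == Fraction(1, 4) and joint_X1Y[(0, 0)] == 0
```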
As with one RV, the goal of introducing the joint pmf is to extract all the information in the probability measure P that is relevant to the RV’s we are considering. So we should be able to compute the probability of any event defined just in terms of the RV’s using only their joint pmf. Consider two RV’s X,Y . Here are some events defined in terms of them:
{0 ≤ X ≤ 10,Y ≥ 5}, {0 ≤ X + Y ≤ 10}, or {XY ≥ 3}
We can write any such event as {(X,Y ) ∈ A} where A ⊂ R2. For example,
{0 ≤ X ≤ 10, Y ≥ 5} = {(X, Y) ∈ A}, where A is the rectangle A = {(x, y) : 0 ≤ x ≤ 10, y ≥ 5}. And
{0 ≤ X + Y ≤ 10} = {(X,Y ) ∈ A}, A = {(x, y):0 ≤ x + y ≤ 10}
The following proposition tells us how to compute probabilities of events like these.
Proposition 1. If X1, X2, ..., Xn are discrete RV's and A ⊂ R^n, then
P((X1, X2, ..., Xn) ∈ A) = ∑_{(x1,x2,...,xn) ∈ A} fX1,X2,...,Xn(x1, x2, ..., xn)

There are some important special cases of the proposition which we state as corollaries.
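The proposition can be applied mechanically: represent the joint pmf as a dictionary and sum over the points that fall in A. A minimal Python sketch, using the joint pmf of X1, X2 from the coin example and an illustrative event:

```python
from fractions import Fraction

# Joint pmf of (X1, X2) from the two-flip example: uniform on {0,1}^2.
joint = {(x1, x2): Fraction(1, 4) for x1 in (0, 1) for x2 in (0, 1)}

def prob_event(joint, in_A):
    """P((X1, X2) in A): sum the joint pmf over the points where in_A is true."""
    return sum(p for point, p in joint.items() if in_A(*point))

# The event {X1 + X2 <= 1} contains (0,0), (0,1), (1,0), each of probability 1/4.
print(prob_event(joint, lambda x, y: x + y <= 1))  # 3/4
```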
Corollary 1. Let g : R^2 → R. Let X, Y be discrete RV's. Define a new discrete RV by Z = g(X, Y). Then its pmf is
fZ(z) = ∑_{(x,y) : g(x,y) = z} fX,Y(x, y)
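Corollary 1 translates directly into code: group the values of the joint pmf by the value of g. A sketch, again using the two-flip joint pmf, with Z = X + Y:

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of two independent fair coin flips, as in the running example.
joint = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

def pmf_of(joint, g):
    """f_Z(z) = sum of f_{X,Y}(x, y) over all (x, y) with g(x, y) = z."""
    f_z = defaultdict(Fraction)
    for (x, y), p in joint.items():
        f_z[g(x, y)] += p
    return dict(f_z)

# Z = X + Y is the number of heads in two flips, i.e. Binomial(2, 1/2):
# f_Z(0) = 1/4, f_Z(1) = 1/2, f_Z(2) = 1/4.
print(pmf_of(joint, lambda x, y: x + y))
```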
When we have the joint pmf of X, Y, we can use it to find the pmf's of X and Y by themselves, i.e., fX(x) and fY(y). These are called "marginal pmf's." The formula for computing them is:

Corollary 2. Let X, Y be two discrete RV's. Then
fX(x) = ∑_y fX,Y(x, y)
fY(y) = ∑_x fX,Y(x, y)

Example: Flip a fair coin. Let X = 0, 1 be the number of heads. If the coin is heads, roll a four-sided die; if tails, a six-sided die. Let Y be the number on the die. Find the joint pmf of X, Y, and the individual pmf's of X and Y.
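A numerical sketch of this example in Python: build the joint pmf from P(X = x, Y = y) = P(X = x) P(Y = y | X = x) and then marginalize with Corollary 2 (the dictionaries and names here are our own illustration):

```python
from fractions import Fraction

# Heads (X = 1): roll a four-sided die; tails (X = 0): roll a six-sided die.
joint = {}
for y in range(1, 5):
    joint[(1, y)] = Fraction(1, 2) * Fraction(1, 4)
for y in range(1, 7):
    joint[(0, y)] = Fraction(1, 2) * Fraction(1, 6)

# Marginals: f_X(x) = sum over y, f_Y(y) = sum over x.
f_X, f_Y = {}, {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, Fraction(0)) + p
    f_Y[y] = f_Y.get(y, Fraction(0)) + p

# f_X(0) = f_X(1) = 1/2; f_Y(y) = 5/24 for y <= 4 and 1/12 for y = 5, 6.
assert sum(joint.values()) == 1
```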
Example: Roll a die until we get a 6. Let Y be the number of rolls (including the 6). Let X be the number of 1's we got before we got the 6. Find fX,Y, fX, fY.

It is hard to figure out P(X = x, Y = y) directly. But if we are given the value of Y, say Y = y, then X is just a binomial RV with y − 1 trials and p = 1/5, since each of the first y − 1 rolls is not a 6 and so is uniform over {1, 2, 3, 4, 5}. So we have

P(X = x | Y = y) = C(y−1, x) (1/5)^x (4/5)^{y−1−x}

We then use P(X = x, Y = y) = P(X = x | Y = y) P(Y = y). The RV Y has a geometric distribution with parameter 1/6, so P(Y = y) = (5/6)^{y−1} (1/6). Note that P(X = x | Y = y) = 0 if x ≥ y. So we get
fX,Y(x, y) = C(y−1, x) (1/5)^x (4/5)^{y−1−x} (5/6)^{y−1} (1/6),  if x < y

and fX,Y(x, y) = 0 otherwise.
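As a sanity check on this formula, one can verify numerically that summing fX,Y(x, y) over x recovers the geometric pmf of Y: the binomial factor sums to 1 by the binomial theorem. A Python sketch:

```python
from fractions import Fraction
from math import comb

def f_xy(x, y):
    """The joint pmf derived above: binomial conditional times geometric marginal."""
    if not 0 <= x < y:
        return Fraction(0)
    return (comb(y - 1, x) * Fraction(1, 5)**x * Fraction(4, 5)**(y - 1 - x)
            * Fraction(5, 6)**(y - 1) * Fraction(1, 6))

# Summing over x collapses the binomial factor (binomial theorem), leaving
# f_Y(y) = (5/6)^(y-1) * (1/6), the Geometric(1/6) pmf.
for y in range(1, 10):
    assert sum(f_xy(x, y) for x in range(y)) == Fraction(5, 6)**(y - 1) * Fraction(1, 6)
```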