Joint distributions
Math 217 Probability and Statistics
Prof. D. Joyce, Fall 2014

Today we'll look at joint random variables and joint distributions in detail.

Product distributions. If $\Omega_1$ and $\Omega_2$ are sample spaces, then their distributions $P:\Omega_1\to\mathbf{R}$ and $P:\Omega_2\to\mathbf{R}$ determine a product distribution $P:\Omega_1\times\Omega_2\to\mathbf{R}$ as follows. First, if $E_1$ and $E_2$ are events in $\Omega_1$ and $\Omega_2$, then define $P(E_1\times E_2)$ to be $P(E_1)P(E_2)$. That defines $P$ on "rectangles" in $\Omega_1\times\Omega_2$. Next, extend that to countable disjoint unions of rectangles. There's a bit of theory to show that what you get is well-defined. Extend to complements of those unions. Again, more theory. Continue by closing under countable unions and complements. A lot more theory. But it works. That theory forms part of the topic called measure theory, which is included in courses on real analysis.

When you have the product distribution on $\Omega_1\times\Omega_2$, the random variables $X$ and $Y$ are independent, but there are applications where we have a distribution on $\Omega_1\times\Omega_2$ that differs from the product distribution, and for those, $X$ and $Y$ won't be independent. That's the situation we'll look at now.

Definition. A joint random variable $(X,Y)$ is a random variable on any sample space $\Omega$ which is the product of two sets $\Omega_1\times\Omega_2$.

Joint random variables do induce probability distributions on $\Omega_1$ and on $\Omega_2$. If $E\subseteq\Omega_1$, define $P(E)$ to be the probability in $\Omega$ of the set $E\times\Omega_2$. That defines $P:\Omega_1\to\mathbf{R}$, which satisfies the axioms for a probability distribution. Similarly, you can define $P:\Omega_2\to\mathbf{R}$ by declaring for $F\subseteq\Omega_2$ that $P(F)=P(\Omega_1\times F)$. If it happens that $P(E\times F)=P(E)P(F)$ for all $E\subseteq\Omega_1$ and $F\subseteq\Omega_2$, then the distribution on $\Omega=\Omega_1\times\Omega_2$ is the product of the distributions on $\Omega_1$ and $\Omega_2$. But that doesn't always happen, and that's what we're interested in.

A discrete example. Deal a standard deck of 52 cards to four players so that each player gets 13 cards at random. Let $X$ be the number of spades that the first player gets and $Y$ be the number of spades that the second player gets. Let's compute the probability mass function $f(x,y)=P(X=x\text{ and }Y=y)$, the probability that the first player gets $x$ spades and the second player gets $y$ spades.

There are $\binom{52}{13}\binom{39}{13}$ hands that can be dealt to these two players. There are $\binom{13}{x}\binom{39}{13-x}$ hands for the first player with exactly $x$ spades, and with the remaining deck there are $\binom{13-x}{y}\binom{26+x}{13-y}$ hands for the second player with exactly $y$ spades. Thus, there are

$$\binom{13}{x}\binom{39}{13-x}\binom{13-x}{y}\binom{26+x}{13-y}$$

ways of dealing hands to those two players with $x$ and $y$ spades. Thus,

$$f(x,y)=\frac{\dbinom{13}{x}\dbinom{39}{13-x}\dbinom{13-x}{y}\dbinom{26+x}{13-y}}{\dbinom{52}{13}\dbinom{39}{13}}.$$

You can write that expression in terms of trinomial coefficients, or in terms of factorials, but we'll leave it as it is. If you were a professional Bridge player, you might want to see a table of the values of $f(x,y)$.
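As a quick numerical sanity check, here is a minimal Python sketch (an addition to these notes, not part of the original) that evaluates this formula with the standard library's `math.comb` and confirms that the probabilities total 1:

```python
from math import comb

def f(x, y):
    """P(first player gets x spades and second player gets y spades)."""
    # comb(n, k) returns 0 when k > n, so impossible cases come out 0 automatically.
    ways = comb(13, x) * comb(39, 13 - x) * comb(13 - x, y) * comb(26 + x, 13 - y)
    total = comb(52, 13) * comb(39, 13)
    return ways / total

# The probabilities over all possible (x, y) should sum to 1.
print(sum(f(x, y) for x in range(14) for y in range(14)))  # 1.0 (up to rounding)
print(f(7, 7))  # 0.0 -- both players can't each have 7 of the 13 spades
```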

It's intuitive that $X$ and $Y$ are not independent joint random variables. The more spades the first player has, the fewer the second will probably have. In order to show they're not independent, it's enough to find particular values of $x$ and $y$ so that $f(x,y)=P(X=x\text{ and }Y=y)$ does not equal the product of $f_X(x)=P(X=x)$ and $f_Y(y)=P(Y=y)$. For example, we know they can't both get 7 spades since there are only 13 spades in all, so $f(7,7)=0$. But since the first player can get 7 spades, $f_X(7)>0$; also the second player can, so $f_Y(7)>0$. Therefore $f(7,7)\ne f_X(7)f_Y(7)$. We've proven that $X$ and $Y$ are not independent.

A continuous example. Let's choose a point from inside a triangle $\Delta$ uniformly at random. Let's take $\Delta$ to be half the unit square, namely the half with vertices at $(0,0)$, $(1,0)$, and $(1,1)$. Then

$$\Delta=\{(x,y)\mid 0\le y\le x\le 1\}.$$

The area of this triangle is $\frac12$, so we can find the probability of any event $E$, a subset of $\Delta$, simply by doubling its area. The joint probability density function is constantly 2 inside $\Delta$ and 0 outside:

$$f(x,y)=\begin{cases}2&\text{if }0\le y\le x\le 1\\0&\text{otherwise}\end{cases}$$

A continuous joint random variable $(X,Y)$ is determined by its cumulative distribution function

$$F(x,y)=P(X\le x\text{ and }Y\le y).$$

We'll figure out its value for this example.

First, if either $x$ or $y$ is negative, then the event $(X\le x\text{ and }Y\le y)$ completely misses the triangle, so it's actually empty, so its probability is 0.

Second, if $x$ is between 0 and 1 and $x\le y$, then the event $(X\le x\text{ and }Y\le y)$ is a triangle of width and height $x$, so its area is $\frac12x^2$. Doubling the area gives a probability of $x^2$.

Third, if $x$ is between 0 and 1 and $x>y$, then the point $(x,y)$ lies inside the triangle, and the event $(X\le x\text{ and }Y\le y)$ is a trapezoid. Its bottom length is $x$, top length is $x-y$, and height is $y$, so its area is $xy-y^2/2$. Therefore, the probability of this event is $2xy-y^2$.

There remain two more cases, $x>1$ with $0\le y\le 1$ and $x>1$ with $y>1$, but their analyses are similar and are omitted; they give the last two lines of the summary below. We can summarize the cumulative distribution function as

$$F(x,y)=\begin{cases}0&\text{if }x<0\text{ or }y<0\\x^2&\text{if }0\le x\le 1\text{ and }x\le y\\2xy-y^2&\text{if }0\le x\le 1\text{ and }x>y\\2y-y^2&\text{if }x>1\text{ and }0\le y\le 1\\1&\text{if }x>1\text{ and }y>1\end{cases}$$

Generally speaking, joint cumulative distribution functions aren't used as much as joint density functions. Typically, joint c.d.f.'s are much more complicated to describe, just as in this example.
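As a check on this piecewise formula, here's a small Monte Carlo sketch (again an addition, not part of the original notes) that estimates $F(x,y)$ by sampling the triangle uniformly:

```python
import random

def F_exact(x, y):
    """The piecewise joint c.d.f. derived above for the uniform triangle."""
    if x < 0 or y < 0:
        return 0.0
    x, y = min(x, 1.0), min(y, 1.0)  # F is constant past the unit square
    return x * x if x <= y else 2 * x * y - y * y

def F_estimate(x, y, n=200_000):
    """Estimate P(X <= x and Y <= y) by uniform sampling of the triangle."""
    hits = 0
    for _ in range(n):
        u, v = random.random(), random.random()
        if v > u:
            u, v = v, u  # reflect into the triangle 0 <= v <= u <= 1
        if u <= x and v <= y:
            hits += 1
    return hits / n

print(F_exact(0.8, 0.5))     # 0.55
print(F_estimate(0.8, 0.5))  # approximately 0.55
```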

Joint distributions and density functions. Density functions are the usual way to describe joint continuous real-valued random variables. Let $X$ and $Y$ be two continuous real-valued random variables. Individually, they have their own cumulative distribution functions

$$F_X(x)=P(X\le x)\qquad F_Y(y)=P(Y\le y),$$

whose derivatives, as we know, are the probability density functions

$$f_X(x)=\frac{d}{dx}F_X(x)\qquad f_Y(y)=\frac{d}{dy}F_Y(y).$$

Furthermore, the cumulative distribution functions can be found by integrating the density functions

$$F_X(x)=\int_{-\infty}^x f_X(t)\,dt\qquad F_Y(y)=\int_{-\infty}^y f_Y(t)\,dt.$$

There is also a joint cumulative distribution function for $(X,Y)$ defined by

$$F(x,y)=P(X\le x\text{ and }Y\le y).$$

The joint probability density function $f(x,y)$ is found by taking the derivative of $F$ twice, once with respect to each variable, so that

$$f(x,y)=\frac{\partial}{\partial x}\frac{\partial}{\partial y}F(x,y).$$

(The notation $\partial$ is substituted for $d$ to indicate that there are other variables in the expression that are held constant while the derivative is taken with respect to the given variable.) The joint cumulative distribution function can be recovered from the joint density function by integrating twice

$$F(x,y)=\int_{-\infty}^x\int_{-\infty}^y f(s,t)\,dt\,ds.$$

(When more than one integration is specified, the inner integral, namely $\int_{-\infty}^y f(s,t)\,dt$, is written without parentheses around it, even though the parentheses would help clarify the expression.)
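To connect this with the triangle example, here is a brief SymPy sketch (an addition to the notes) that integrates the density $f=2$ over the appropriate region to recover the interior piece $2xy-y^2$ of the c.d.f.:

```python
from sympy import symbols, integrate, expand

x, y, s, t = symbols('x y s t', positive=True)

# The uniform density on the triangle 0 <= t <= s <= 1 is 2. For a point
# (x, y) with 0 <= y < x <= 1, the part of the triangle with s <= x and
# t <= y is described by 0 <= t <= y and t <= s <= x.
F = integrate(integrate(2, (s, t, x)), (t, 0, y))
print(expand(F))  # 2*x*y - y**2
```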

Furthermore, the individual cumulative distribution functions are determined by the joint distribution function:

$$F_X(x)=P(X\le x\text{ and }Y\le\infty)=\lim_{y\to\infty}F(x,y)=F(x,\infty)$$

$$F_Y(y)=P(X\le\infty\text{ and }Y\le y)=\lim_{x\to\infty}F(x,y)=F(\infty,y)$$

Likewise, the individual density functions can be found by integrating the joint density function:

$$f_X(x)=\int_{-\infty}^\infty f(x,y)\,dy\qquad f_Y(y)=\int_{-\infty}^\infty f(x,y)\,dx$$
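Applying these formulas to the triangle example gives its marginal densities directly; the following SymPy computation is a supplementary sketch, not part of the original notes:

```python
from sympy import symbols, integrate

x, y = symbols('x y', positive=True)

# Triangle density f = 2 on 0 <= y <= x <= 1: for fixed x, the variable y
# runs from 0 to x; for fixed y, the variable x runs from y to 1.
f_X = integrate(2, (y, 0, x))  # marginal density of X: 2*x on [0, 1]
f_Y = integrate(2, (x, y, 1))  # marginal density of Y: 2 - 2*y on [0, 1]
print(f_X, f_Y)
```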

These individual density functions $f_X$ and $f_Y$ are often called marginal density functions to distinguish them from the joint density function $f_{(X,Y)}$. Likewise the corresponding individual cumulative distribution functions $F_X$ and $F_Y$ are called marginal cumulative distribution functions to distinguish them from the joint c.d.f. $F_{(X,Y)}$.

Another continuous example. The last example was a uniform distribution on a triangle. For this example, we'll go back to the unit square, but make the distribution nonuniform. We'll describe the distribution via a joint density function

$$f(x,y)=6x^2y$$

if $(x,y)$ is in the unit square, that is, $x$ and $y$ are both between 0 and 1. Outside that square $f(x,y)$ is 0. The graph $z=f(x,y)$ of this function is a surface sitting above the unit square. The volume under that surface is 1, as it has to be for $f$ to be a probability density function. To find the probability that $(X,Y)$ lies in an event $E$, a subset of the unit square, just find the volume of the solid above $E$ and below the surface $z=f(x,y)$. That's a double integral

$$P((X,Y)\in E)=\iint_E f(x,y)\,dx\,dy.$$

For example, suppose we want to find the probability $P(0\le X\le\frac34\text{ and }\frac13\le Y\le 1)$. The event $E$ is $(0\le X\le\frac34\text{ and }\frac13\le Y\le 1)$, and that describes a rectangle. The double integral giving the probability is

$$P(E)=\int_0^{3/4}\!\!\int_{1/3}^1 6x^2y\,dy\,dx.$$

Evaluate that integral from the inside out. The inner integral is

$$\int_{1/3}^1 6x^2y\,dy=6x^2\int_{1/3}^1 y\,dy=6x^2\,\frac{y^2}{2}\bigg|_{y=1/3}^1=6x^2\bigl(1-\tfrac19\bigr)/2=\tfrac83x^2$$

Next, evaluate the outer integral.

$$P(E)=\int_0^{3/4}\tfrac83x^2\,dx=\tfrac89x^3\bigg|_0^{3/4}=\tfrac38$$

Let's do more with this example. We're given the joint density function $f(x,y)$. Let's find the marginal density functions for $X$ and $Y$.

$$f_X(x)=\int_{-\infty}^\infty f(x,y)\,dy=\int_0^1 6x^2y\,dy=6x^2\,\frac{y^2}{2}\bigg|_{y=0}^1=3x^2$$

$$f_Y(y)=\int_{-\infty}^\infty f(x,y)\,dx=\int_0^1 6x^2y\,dx=2x^3y\Big|_{x=0}^1=2y$$

In summary, the joint density is $f(x,y)=6x^2y$ over the unit square. The marginal density functions are $f_X(x)=3x^2$ and $f_Y(y)=2y$.

Note: it so happens in this example that the joint density function $f(x,y)$ is the product of the two marginal density functions $f_X(x)f_Y(y)$. That doesn't always happen. But it does happen when the random variables $X$ and $Y$ are independent, which is discussed next.

Independent continuous random variables. In the case of continuous real random variables, we can characterize independence in terms of density functions. Random variables $X$ and $Y$ will be independent when the events $X\le x$ and $Y\le y$ are independent for all values of $x$ and $y$. That means

$$P(X\le x\text{ and }Y\le y)=P(X\le x)\,P(Y\le y),$$

from which it follows that the joint cumulative distribution function is the product of the marginal cumulative distribution functions

$$F(x,y)=F_X(x)\,F_Y(y).$$

Take the partial derivatives $\frac{\partial}{\partial x}\frac{\partial}{\partial y}$ of both sides of that equation to conclude

$$f(x,y)=f_X(x)\,f_Y(y),$$

that is, the joint probability density function is the product of the marginal density functions.
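The previous example ($f=6x^2y$) can be checked symbolically; this SymPy sketch (an addition to the notes) re-computes $P(E)$ and the marginals, and confirms that the density factors, as it must for independent variables:

```python
from sympy import symbols, integrate, simplify, Rational

x, y = symbols('x y')

f = 6 * x**2 * y  # joint density on the unit square

# P(0 <= X <= 3/4 and 1/3 <= Y <= 1), inner integral evaluated first
p = integrate(integrate(f, (y, Rational(1, 3), 1)), (x, 0, Rational(3, 4)))
print(p)  # 3/8

f_X = integrate(f, (y, 0, 1))  # 3*x**2
f_Y = integrate(f, (x, 0, 1))  # 2*y
print(f_X, f_Y)
print(simplify(f - f_X * f_Y))  # 0: the density factors, so X and Y are independent
```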

Another example. We've seen two examples so far. One was when the probability was uniform over a triangle, and the other had $X$ and $Y$ independent. This example is of a nonuniform probability where the variables aren't independent. Let

$$f(x,y)=\begin{cases}x+y&\text{if }0\le x\le 1\text{ and }0\le y\le 1\\0&\text{otherwise}\end{cases}$$

You can show that the double integral

$$\int_0^1\!\!\int_0^1(x+y)\,dy\,dx$$

equals 1 as follows. First, evaluate the inner integral.

$$\int_0^1(x+y)\,dy=xy+\tfrac12y^2\Big|_{y=0}^1=x+\tfrac12$$

Then substitute that in the outer integral and evaluate it.

$$\int_0^1\bigl(x+\tfrac12\bigr)\,dx=\tfrac12x^2+\tfrac12x\Big|_{x=0}^1=1$$

Therefore $f$ is a probability density function.

Next, let's compute the two marginal density functions $f_X$ and $f_Y$ for $X$ and $Y$. In fact, the first computation we performed, evaluating the inner integral, gave us the marginal density function for $X$. In general, the marginal density function for $X$ can be found as the integral

$$f_X(x)=\int_{-\infty}^\infty f(x,y)\,dy$$

which in this case was, for $x\in[0,1]$,

$$f_X(x)=\int_0^1(x+y)\,dy=x+\tfrac12.$$

Also for this example, since $f(x,y)$ is symmetric in $x$ and $y$, the marginal density function for $Y$ is $f_Y(y)=y+\tfrac12$ for $y\in[0,1]$.

For $X$ and $Y$ to be independent, the joint density function $f(x,y)$ would have to equal the product $f_X(x)f_Y(y)$ of the two marginal density functions, but

$$x+y\ne\bigl(x+\tfrac12\bigr)\bigl(y+\tfrac12\bigr).$$

Therefore, $X$ and $Y$ are not independent.

Later, we'll develop the concept of correlation, which will quantify how related $X$ and $Y$ are. Independent variables will have 0 correlation, but if a larger value of $X$ indicates a larger value of $Y$, then they will have a positive correlation.
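The non-factorization is easy to confirm symbolically as well; the following SymPy sketch is an addition to the notes:

```python
from sympy import symbols, integrate, simplify

x, y = symbols('x y')

f = x + y                      # joint density on the unit square
f_X = integrate(f, (y, 0, 1))  # x + 1/2
f_Y = integrate(f, (x, 0, 1))  # y + 1/2

# If X and Y were independent, this difference would be identically 0.
print(simplify(f - f_X * f_Y))  # -x*y + x/2 + y/2 - 1/4, which is not 0
```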

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
