
Probability STAT 416 Spring 2007

4 Jointly distributed random variables

1. Introduction  2. Independent Random Variables  3. Transformations  4. Covariance, Correlation  5. Conditional Distributions  6. Bivariate Normal Distribution

1 4.1 Introduction

Computation of probabilities with more than one random variable:
two random variables . . . bivariate
two or more random variables . . . multivariate

Concepts:

• Joint distribution function

• purely discrete: Joint probability function

• purely continuous: Joint density

2 Joint distribution function

Bivariate case: random variables X and Y. Define the joint distribution function as

F (x, y) := P (X ≤ x, Y ≤ y), −∞ < x, y < ∞

The bivariate distribution is thus fully specified, e.g.

P(x1 < X ≤ x2, y1 < Y ≤ y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1)

Marginal distribution: F_X(x) := P(X ≤ x) = F(x, ∞)
Idea: P(X ≤ x) = P(X ≤ x, Y < ∞) = lim_{y→∞} F(x, y)

Analogously F_Y(y) := P(Y ≤ y) = F(∞, y)

3 Bivariate continuous random variable

X and Y are jointly continuous if there exists a joint density function:

f(x, y) = ∂²F(x, y) / (∂x ∂y)

The joint distribution function is then obtained by integration:

F(a, b) = ∫_{y=−∞}^{b} ∫_{x=−∞}^{a} f(x, y) dx dy

Density of the marginal distribution of X obtained by integration over Ω_Y:

f_X(x) = ∫_{y=−∞}^{∞} f(x, y) dy

4 Example: Bivariate uniform distribution

X and Y uniformly distributed on [0, 1] × [0, 1] ⇒ density

f(x, y) = 1, 0 ≤ x, y ≤ 1.

Joint distribution function:

F(a, b) = ∫_{y=0}^{b} ∫_{x=0}^{a} f(x, y) dx dy = a · b, 0 ≤ a, b ≤ 1.

Density of the marginal distribution:

f_X(x) = ∫_{y=−∞}^{∞} f(x, y) dy = 1, 0 ≤ x ≤ 1

i.e. the density of the univariate uniform distribution
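A quick numerical sanity check of these two formulas, as an illustrative sketch (the evaluation points a = 0.3, b = 0.7 and x = 0.5 are arbitrary):

```python
# Numerical check of F(a, b) = a*b and f_X(x) = 1 for the bivariate uniform
# distribution (illustrative sketch; evaluation points chosen arbitrarily).
from scipy import integrate

def f(x, y):
    # joint density of the uniform distribution on [0, 1] x [0, 1]
    return 1.0 if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

a, b = 0.3, 0.7
# dblquad integrates func(inner, outer); here outer = x in [0, a], inner = y in [0, b]
F_ab, _ = integrate.dblquad(lambda y, x: f(x, y), 0, a, 0, b)
print(F_ab, a * b)                               # both ~ 0.21

# marginal density f_X at x = 0.5 by integrating out y
fX, _ = integrate.quad(lambda y: f(0.5, y), 0, 1)
print(fX)                                        # ~ 1.0
```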

5 Exercise: Bivariate continuous distribution

Let the joint density of X and Y be given as

f(x, y) = a x²y² for 0 < x < y < 1, and f(x, y) = 0 else

1. Determine the constant a

2. Determine the CDF F (x, y)

3. Compute the probability P (X < 1/2, Y > 2/3)

4. Compute the probability P (X < 2, Y > 2/3)

5. Compute the probability P (X < 2, Y > 3)

Hint: Be careful with the domain of integration

6 S.R.: Example 1c

Let the joint density of X and Y be given as

f(x, y) = e^{−(x+y)} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 else

Find the density function of the random variable X/Y

Again it is most important to consider the correct domain of integration

P(X/Y ≤ a) = ∫_{y=0}^{∞} ∫_{x=0}^{ay} e^{−(x+y)} dx dy = 1 − 1/(a + 1)

(More details are given in the book)
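A small Monte Carlo sketch of this result: sample independent Exp(1) pairs and compare the empirical frequency of {X/Y ≤ a} with 1 − 1/(a + 1) (sample size and test values of a are arbitrary):

```python
# Monte Carlo check of P(X/Y <= a) = 1 - 1/(a + 1) for X, Y independent Exp(1).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=1.0, size=n)

for a in (0.5, 1.0, 2.0):
    print(a, np.mean(x / y <= a), 1 - 1 / (a + 1))   # estimate vs. closed form
```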

7 Bivariate discrete random variable

X and Y both discrete. Define the joint probability function

p(x, y) = P (X = x, Y = y)

Naturally we have p(x, y) = F (x, y) − F (x−, y) − F (x, y−) + F (x−, y−)

Obtain the probability function of X by summation over Ω_Y:

p_X(x) = P(X = x) = Σ_{y ∈ Ω_Y} p(x, y)

8 Example

Bowl with 3 red, 4 white and 5 blue balls; draw 3 balls randomly without replacement
X . . . number of red balls drawn
Y . . . number of white balls drawn

e.g.: p(0, 1) = P(0R, 1W, 2B) = C(3,0) C(4,1) C(5,2) / C(12,3) = 40/220

i \ j     0        1        2        3       p_X
0       10/220   40/220   30/220    4/220   84/220
1       30/220   60/220   18/220     0     108/220
2       15/220   12/220     0        0      27/220
3        1/220     0        0        0       1/220
p_Y     56/220  112/220   48/220    4/220      1
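The table can be rebuilt mechanically; the following sketch (my own illustration, using scipy's binomial coefficients) recomputes p(i, j) and the marginals:

```python
# Rebuild the joint probability table p(i, j) = C(3,i) C(4,j) C(5,3-i-j) / C(12,3)
# and its marginals (illustrative sketch).
import numpy as np
from scipy.special import comb

p = np.zeros((4, 4))
for i in range(4):              # i ... number of red balls drawn
    for j in range(4):          # j ... number of white balls drawn
        k = 3 - i - j           # remaining draws must be blue
        if k >= 0:
            p[i, j] = comb(3, i) * comb(4, j) * comb(5, k) / comb(12, 3)

print(p * 220)                  # entries in units of 1/220, as in the table
print(p.sum(axis=1) * 220)      # marginal p_X: 84, 108, 27, 1
print(p.sum(axis=0) * 220)      # marginal p_Y: 56, 112, 48, 4
```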

9 Multivariate random variable

More than two random variables: joint distribution function for n random variables

F (x1, . . . , xn) = P (X1 ≤ x1,...,Xn ≤ xn)

Discrete: Joint probability function:

p(x1, . . . , xn) = P (X1 = x1,...,Xn = xn)

Marginals again by summation over all components that are not of interest, e.g.

p_X1(x1) = Σ_{x2 ∈ Ω2} · · · Σ_{xn ∈ Ωn} p(x1, . . . , xn)

10 Multinomial distribution

One of the most important multivariate discrete distributions
n independent experiments with r possible results, with probabilities p1, . . . , pr

Xi . . . number of experiments with ith result, then

P(X1 = n1, . . . , Xr = nr) = n!/(n1! · · · nr!) · p1^{n1} · · · pr^{nr}

whenever Σ_{i=1}^{r} ni = n

Generalization of the binomial distribution (r = 2)

Exercise: Poker Dice (throw 5 dice)
Compute the probability of a (high) straight, four of a kind, and a full house
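Hand computations for this exercise can be verified by brute force; the sketch below enumerates all 6^5 outcomes and counts the three patterns (here a "high straight" is interpreted as the faces {2, 3, 4, 5, 6}, which is an assumption on the intended definition):

```python
# Brute-force enumeration of all 6^5 poker-dice outcomes (illustrative sketch).
from itertools import product
from collections import Counter

total = 6 ** 5
hits = Counter()
for roll in product(range(1, 7), repeat=5):
    counts = sorted(Counter(roll).values(), reverse=True)
    if set(roll) == {2, 3, 4, 5, 6}:    # assumed definition of a high straight
        hits["high straight"] += 1
    if counts[0] == 4:
        hits["four of a kind"] += 1
    if counts == [3, 2]:
        hits["full house"] += 1

for name, k in hits.items():
    print(name, k, "/", total, "=", k / total)
```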

11 4.2 Independent random variables

Two random variables X and Y are called independent if for all events A and B

P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B)

Information on X does not give any information on Y

X and Y independent if and only if

P(X ≤ a, Y ≤ b) = P(X ≤ a) P(Y ≤ b), i.e. F(a, b) = F_X(a) F_Y(b) for all a, b.

Also equivalent to f(x, y) = f_X(x) f_Y(y) in the continuous case and to p(x, y) = p_X(x) p_Y(y) in the discrete case, for all x, y

12 Example: continuous

Looking back at the example

f(x, y) = a x²y² for 0 < x < y < 1, and f(x, y) = 0 else

Are the random variables X and Y with joint density as specified above independent?

What changes if we look at

f(x, y) = a x²y² for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 else

13 Example

Z ∼ P(λ) . . . number of persons who enter a bar
p . . . percentage of male visitors
X, Y . . . number of male and female visitors respectively
Claim: X, Y are independently Poisson distributed with parameters pλ and qλ respectively (q = 1 − p)
Solution: Law of total probability:

P(X = i, Y = j) = P(X = i, Y = j | X + Y = i + j) P(X + Y = i + j)
By definition:
P(X = i, Y = j | X + Y = i + j) = C(i+j, i) p^i q^j
P(X + Y = i + j) = e^{−λ} λ^{i+j} / (i + j)!
Together:
P(X = i, Y = j) = e^{−λ} (λp)^i (λq)^j / (i! j!) = e^{−λp} (λp)^i / i! · e^{−λq} (λq)^j / j!
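A simulation sketch of this example: draw the total Z, split it binomially into male/female counts, and compare the empirical moments with pλ and qλ (λ = 5 and p = 0.4 are arbitrary choices):

```python
# Simulation of the bar example: Z ~ P(lambda) visitors, each male with prob. p.
import numpy as np

rng = np.random.default_rng(1)
lam, p, n = 5.0, 0.4, 200_000
z = rng.poisson(lam, size=n)            # total number of visitors
x = rng.binomial(z, p)                  # male visitors, given the total
y = z - x                               # female visitors

print(x.mean(), x.var(), p * lam)       # mean and variance both ~ p*lambda
print(y.mean(), y.var(), (1 - p) * lam) # mean and variance both ~ q*lambda
print(np.corrcoef(x, y)[0, 1])          # ~ 0, consistent with independence
```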

14 Example: Two dice

X,Y . . . uniformly distributed on {1,..., 6}

Due to independence we have p(x, y) = p_X(x) p_Y(y) = 1/36

Distribution function:

F_X(x) = F_Y(x) = i/6, if i ≤ x < i + 1 and 0 < x < 7
F(x, y) = F_X(x) F_Y(y) = ij/36, i ≤ x < i + 1, j ≤ y < j + 1

What is the distribution of X + Y?
P(X + Y = 2) = p(1, 1) = 1/36
P(X + Y = 3) = p(1, 2) + p(2, 1) = 2/36
P(X + Y = 4) = p(1, 3) + p(2, 2) + p(3, 1) = 3/36
P(X + Y = k) = p(1, k − 1) + p(2, k − 2) + · · · + p(k − 1, 1)

15 Sum of two independent distributions

The sum of random variables is itself a random variable
Computation of its distribution via convolution
Discrete random variables:

P(X + Y = k) = Σ_{x+y=k} p_X(x) p_Y(y) = Σ_{y ∈ Ω_Y} p_X(k − y) p_Y(y)

Continuous random variables:

f_{X+Y}(x) = ∫_{y=−∞}^{∞} f_X(x − y) f_Y(y) dy

Exercise: X ∼ P(λ1) and Y ∼ P(λ2) independent

⇒ X + Y ∼ P(λ1 + λ2)
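The exercise can be checked numerically: the sketch below convolves the two Poisson pmfs on a truncated support and compares with the P(λ1 + λ2) pmf (λ1, λ2 and the cutoff are arbitrary):

```python
# Numerical check: convolution of P(l1) and P(l2) pmfs equals the P(l1 + l2) pmf.
import numpy as np
from scipy.stats import poisson

l1, l2 = 2.0, 3.5
k = np.arange(60)                                        # support truncated at 59
conv = np.convolve(poisson.pmf(k, l1), poisson.pmf(k, l2))[:len(k)]
print(np.max(np.abs(conv - poisson.pmf(k, l1 + l2))))    # ~ 0 up to truncation error
```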

16 Example of convolution: Continuous case

X, Y independent, uniformly distributed on [0, 1]
i.e. f(x, y) = 1, (x, y) ∈ [0, 1] × [0, 1]
f_X(x) = 1, 0 ≤ x ≤ 1,  f_Y(y) = 1, 0 ≤ y ≤ 1
Computation of the density of Z := X + Y:

f_Z(x) = ∫_{y=−∞}^{∞} f_X(x − y) f_Y(y) dy
       = ∫_{y=0}^{x} dy = x,          0 < x ≤ 1
       = ∫_{y=x−1}^{1} dy = 2 − x,    1 < x ≤ 2

Reason: f_Y(y) = 1 if 0 ≤ y ≤ 1
f_X(x − y) = 1 if 0 ≤ x − y ≤ 1 ⇔ y ≤ x ≤ y + 1

17 Another example of convolution

X, Y indep., Γ-distributed with parameters t1, t2 and the same λ

f_X(x) = λ e^{−λx} (λx)^{t1−1} / Γ(t1),   f_Y(y) = λ e^{−λy} (λy)^{t2−1} / Γ(t2),   x, y ≥ 0

f_Z(x) = ∫_{y=−∞}^{∞} f_X(x − y) f_Y(y) dy
       = ∫_{y=0}^{x} λ e^{−λ(x−y)} (λ(x − y))^{t1−1} / Γ(t1) · λ e^{−λy} (λy)^{t2−1} / Γ(t2) dy
       = λ^{t1+t2} e^{−λx} / (Γ(t1) Γ(t2)) · ∫_{y=0}^{x} (x − y)^{t1−1} y^{t2−1} dy
       = | substitute y = xz, dy = x dz |
       = λ e^{−λx} (λx)^{t1+t2−1} / Γ(t1 + t2)

i.e. X + Y is again Γ-distributed, with parameters t1 + t2 and λ

18 4.3 Transformations

As in the univariate case, one also considers transformations of random variables in the multivariate case
We restrict ourselves to the bivariate case

X1 and X2 jointly distributed with density fX1,X2

Y1 = g1(X1,X2) and Y2 = g2(X1,X2)

Major question: What is the joint density fY1,Y2 ?

19 Density of Transformation

Assumptions:

• y1 = g1(x1, x2) and y2 = g2(x1, x2) can be uniquely solved for

x1 and x2, say x1 = h1(y1, y2) and x2 = h2(y1, y2)

• g1 and g2 are C¹ such that the Jacobian determinant

J(x1, x2) = det ( ∂g1/∂x1  ∂g1/∂x2 ; ∂g2/∂x1  ∂g2/∂x2 ) ≠ 0

Under these conditions we have

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(x1, x2) |J(x1, x2)|^{−1}, with xi = hi(y1, y2)

Proof: calculus. Note the similarity to the one-dimensional case

20 Examples

• Sheldon Ross: Example 7a

Y1 = X1 + X2

Y2 = X1 − X2

• Sheldon Ross: Example 7c
X and Y independent gamma random variables
U = X + Y
V = X/(X + Y)
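A sketch checking the transformation formula on Example 7a above, under the additional assumption (mine, for concreteness) that X1, X2 are i.i.d. N(0, 1): then |J| = 2 and Y1, Y2 should come out as independent N(0, 2), which the formula reproduces pointwise:

```python
# Pointwise check of the transformation density formula for Y1 = X1 + X2,
# Y2 = X1 - X2 with assumed standard normal inputs (illustrative sketch).
import numpy as np
from scipy.stats import norm

def f_X(x1, x2):
    return norm.pdf(x1) * norm.pdf(x2)      # joint density of (X1, X2)

def f_Y(y1, y2):
    x1, x2 = (y1 + y2) / 2, (y1 - y2) / 2   # inverse transformation h1, h2
    return f_X(x1, x2) / 2                  # divide by |J(x1, x2)| = 2

for y1, y2 in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.5)]:
    direct = norm.pdf(y1, scale=np.sqrt(2)) * norm.pdf(y2, scale=np.sqrt(2))
    print(f_Y(y1, y2), direct)              # the two values agree
```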

21 Expectation of bivariate RV

X and Y discrete with joint probability function p(x, y). As in the one-dimensional case:

E(g(X, Y)) = Σ_{x ∈ Ω_X} Σ_{y ∈ Ω_Y} g(x, y) p(x, y)

Exercise: Let X and Y be the outcomes of two fair dice, thrown independently
Compute the expectation of the difference |X − Y|

22 Expectation of bivariate RV

X and Y continuous with joint density f(x, y). As in the one-dimensional case:

E(g(X, Y)) = ∫_{x ∈ R} ∫_{y ∈ R} g(x, y) f(x, y) dy dx

Exercise: An accident occurs at point X on a road of length L. At that time the ambulance is at point Y. Assuming X and Y are independent and uniformly distributed on [0, L], calculate the average distance between them. This is again E(|X − Y|)

23 Expectation of the sum of two RV

X and Y discrete with joint probability function p(x, y)
For g(x, y) = x + y we obtain

E(X + Y) = Σ_{x ∈ Ω_X} Σ_{y ∈ Ω_Y} (x + y) p(x, y) = E(X) + E(Y)

Analogous in the continuous case:

E(X + Y) = ∫_{y=−∞}^{∞} ∫_{x=−∞}^{∞} (x + y) f(x, y) dx dy = E(X) + E(Y)

Be aware: additivity for the variance is in general not the case!

24 Expectation of functions under independence

If X and Y are independent random variables, then for any functions h and g we obtain

E(g(X)h(Y)) = E(g(X)) E(h(Y))

Proof:

E(g(X)h(Y)) = ∫∫ g(x)h(y) f(x, y) dx dy
            = ∫∫ g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫ g(x) f_X(x) dx · ∫ h(y) f_Y(y) dy = E(g(X)) E(h(Y))

Second equality uses independence

25 Expectation of random samples

X1, . . . , Xn i.i.d. like X (independent, identically distributed)
Definition: X̄ := (1/n) Σ_{i=1}^{n} Xi
For E(X) = µ and Var(X) = σ² we obtain:

E(X̄) = µ,   Var(X̄) = σ²/n

Proof:
E(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} E(Xi)
E(((1/n) Σ_{i=1}^{n} Xi − µ)²) = (1/n²) E((Σ_{i=1}^{n} (Xi − µ))²) = (1/n²) Σ_{i=1}^{n} E((Xi − µ)²)

Last equality due to independence of Xi
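A simulation sketch of E(X̄) = µ and Var(X̄) = σ²/n; the choice of Exp(1) for the Xi (so µ = σ² = 1) and of n = 25 is arbitrary:

```python
# Simulation of the mean and variance of the sample mean (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 25, 100_000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)   # sample means

print(xbar.mean())        # ~ mu = 1
print(xbar.var(), 1 / n)  # ~ sigma^2 / n = 0.04
```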

26 4.4 Covariance and Correlation

Describes the relation between two random variables
Definition of the covariance:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]

Usual notation: σXY := Cov (X,Y )

Just like for variances we have

σXY = E(XY ) − E(X)E(Y )

Definition of correlation:

ρ(X, Y) := σ_XY / (σ_X σ_Y)

27 Example Correlation

[Figure: four scatter plots of bivariate samples illustrating correlations ρ = 0.9, ρ = −0.6, ρ = 0.3 and ρ = 0.0]

28 Example Covariance

Discrete bivariate distribution (Ω_X = Ω_Y = {0, 1, 2, 3}) given by

i \ j    0      1      2      3     p_X
0       1/20   4/20   3/20   2/20  10/20
1       3/20   2/20   2/20    0     7/20
2       1/20   1/20    0      0     2/20
3       1/20    0      0      0     1/20
p_Y     6/20   7/20   5/20   2/20  20/20

Compute Cov(X, Y)

Solution:
Cov(X, Y) = E(XY) − E(X)E(Y) = 8/20 − (14/20) · (23/20) = −162/400
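The same computation done mechanically (a sketch using numpy; it just re-derives the numbers above from the table):

```python
# Recompute Cov(X, Y) from the joint probability table.
import numpy as np

p = np.array([[1, 4, 3, 2],
              [3, 2, 2, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]]) / 20           # p[i, j] = P(X = i, Y = j)
i = np.arange(4)

EX = (p.sum(axis=1) * i).sum()              # 14/20
EY = (p.sum(axis=0) * i).sum()              # 23/20
EXY = (p * np.outer(i, i)).sum()            # 8/20
print(EXY - EX * EY)                        # -162/400 = -0.405
```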

29 Example 2: Covariance

Compute the covariance between X and Y for

f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1, and f(x, y) = 0 else

Compute first the cdf of X:

P(X ≤ a) = ∫_{y=0}^{1−a} ∫_{x=0}^{a} 24xy dx dy + ∫_{y=1−a}^{1} ∫_{x=0}^{1−y} 24xy dx dy
         = 6(1 − a)²a² + 1 − 6(1 − a)² + 8(1 − a)³ − 3(1 − a)⁴

and by differentiation

f_X(x) = 12(1 − x)²x − 12(1 − x)x² + 12(1 − x) − 24(1 − x)² + 12(1 − x)³ = 12(1 − x)²x

30 Example 2: Covariance continued

Because of symmetry we have fY (x) = fX (x) and we get

E(X) = E(Y) = ∫_{x=0}^{1} 12(1 − x)²x² dx = 2/5

Furthermore

E(XY) = ∫_{y=0}^{1} ∫_{x=0}^{1−y} 24x²y² dx dy = 2/15

and finally

Cov(X, Y) = 2/15 − (2/5) · (2/5) = (10 − 12)/75 = −2/75
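A numerical cross-check of these integrals (a sketch using scipy's dblquad, which integrates the inner variable first, here x in [0, 1 − y]):

```python
# Numerical check of E(X), E(XY) and Cov(X, Y) for f(x, y) = 24xy on the triangle.
from scipy import integrate

f = lambda x, y: 24 * x * y

EX, _ = integrate.dblquad(lambda x, y: x * f(x, y), 0, 1, 0, lambda y: 1 - y)
EXY, _ = integrate.dblquad(lambda x, y: x * y * f(x, y), 0, 1, 0, lambda y: 1 - y)
print(EX, 2 / 5)
print(EXY, 2 / 15)
print(EXY - EX * EX, -2 / 75)               # E(Y) = E(X) by symmetry
```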

31 Covariance for independent RV

X and Y independent ⇒ σ_XY = 0
Follows immediately from σ_XY = E(XY) − E(X)E(Y) and

E(XY) = Σ_x Σ_y x y p(x, y) = Σ_x x p_X(x) · Σ_y y p_Y(y)

Converse not true:
X uniformly distributed on {−1, 0, 1} and Y = 0 if X ≠ 0, Y = 1 if X = 0

E(X) = 0, XY = 0 ⇒ E(XY) = 0, thus Cov(X, Y) = 0, although X and Y are not independent:
e.g. P(X = 1, Y = 0) = P(X = 1) = 1/3, P(Y = 0) = 2/3

32 Properties of Covariance

Obviously

Cov (X,Y ) = Cov (Y,X), and Cov (X,X) = Var (X)

Covariance is a bilinear form:

Cov(aX, Y) = a Cov(X, Y), a ∈ R, and

Cov(Σ_{i=1}^{n} Xi, Σ_{j=1}^{m} Yj) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(Xi, Yj)

Proof by simple computation . . .

33 Variance of a sum of RV

Due to the properties shown above

Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(Xi, Xj)
                   = Σ_{i=1}^{n} Var(Xi) + Σ_{i=1}^{n} Σ_{j≠i} Cov(Xi, Xj)

Extreme cases:

• independent RV: Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi)

• X1 = X2 = · · · = Xn: Var(Σ_{i=1}^{n} Xi) = n² Var(X1)

34 Correlation

Definition: ρ(X, Y) := σ_XY / (σ_X σ_Y)
We have:

−1 ≤ ρ(X,Y ) ≤ 1

Proof:

0 ≤ Var(X/σ_X + Y/σ_Y) = Var(X)/σ_X² + Var(Y)/σ_Y² + 2 Cov(X, Y)/(σ_X σ_Y) = 2[1 + ρ(X, Y)]

0 ≤ Var(X/σ_X − Y/σ_Y) = Var(X)/σ_X² + Var(Y)/σ_Y² − 2 Cov(X, Y)/(σ_X σ_Y) = 2[1 − ρ(X, Y)]

35 Exercise Correlation

Let X and Y be independently uniform on [0, 1] Compute correlation between X and Z for

1. Z = X + Y

2. Z = X² + Y²

3. Z = (X + Y)²

36 4.5 Conditional Distribution

Conditional probability for two events E and F:

P(E|F) = P(EF) / P(F)

Corresponding definition for random variables X and Y

Discrete: p_{X|Y}(x|y) := P(X = x|Y = y) = p(x, y) / p_Y(y)

Exercise: Let p(x, y) given by

p(0, 0) = 0.4, p(0, 1) = 0.2, p(1, 0) = 0.1, p(1, 1) = 0.3

Compute conditional probability function of X for Y = 1

37 Discrete conditional distribution

Conditional CDF:

F_{X|Y}(x|y) := P(X ≤ x|Y = y) = Σ_{k ≤ x} p_{X|Y}(k|y)

If X and Y are independent, then p_{X|Y}(x|y) = p_X(x). Proof: easy computation

Exercise: Let X ∼ P(λ1) and Y ∼ P(λ2) be independent. Compute conditional distribution of X, for X + Y = n

P(X = k|X + Y = n) = P(X = k) P(Y = n − k) / P(X + Y = n)
X + Y ∼ P(λ1 + λ2) ⇒ X|(X + Y = n) ∼ B(n, λ1/(λ1 + λ2))
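A simulation sketch of this exercise: condition the simulated pairs on X + Y = n and compare the empirical conditional pmf with B(n, λ1/(λ1 + λ2)); the parameter values are arbitrary:

```python
# Conditional distribution of X given X + Y = n for independent Poisson X, Y.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
l1, l2, n = 2.0, 3.0, 5
x = rng.poisson(l1, size=2_000_000)
y = rng.poisson(l2, size=2_000_000)

sel = x[x + y == n]                                   # keep pairs with X + Y = n
print(np.bincount(sel, minlength=n + 1) / len(sel))   # empirical conditional pmf
print(binom.pmf(np.arange(n + 1), n, l1 / (l1 + l2)))
```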

38 Continuous conditional distribution

Continuous: f_{X|Y}(x|y) := f(x, y) / f_Y(y), for f_Y(y) > 0
The definition in the continuous case can be motivated by the discrete case (probabilities for small neighborhoods of x and y)

Computation of conditional probabilities:

P(X ∈ A|Y = y) = ∫_A f_{X|Y}(x|y) dx

Conditional CDF:

F_{X|Y}(a|y) := P(X ∈ (−∞, a]|Y = y) = ∫_{x=−∞}^{a} f_{X|Y}(x|y) dx

39 Example

Joint density of X and Y given by

f(x, y) = c · x(2 − x − y) for x ∈ [0, 1], y ∈ [0, 1], and f(x, y) = 0 otherwise.

Compute c, f_{X|Y}(x|y) and P(X < 1/2|Y = 1/3)

Solution:
f_Y(y) = c ∫_{x=0}^{1} x(2 − x − y) dx = c (2/3 − y/2)
1 = ∫_{y=0}^{1} f_Y(y) dy = c (2/3 − 1/4) ⇒ c = 12/5
f_{X|Y}(x|y) = f(x, y)/f_Y(y) = x(2 − x − y)/(2/3 − y/2) = 6x(2 − x − y)/(4 − 3y)
P(X < 1/2|Y = 1/3) = ∫_{x=0}^{1/2} 6x(2 − x − 1/3)/(4 − 3/3) dx = · · · = 1/3
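A numerical check of this example (an illustrative sketch with scipy; it recovers c = 12/5 and the conditional probability 1/3):

```python
# Numerical check of the normalizing constant c and of P(X < 1/2 | Y = 1/3).
from scipy import integrate

g = lambda x, y: x * (2 - x - y)                      # density up to the constant c

mass, _ = integrate.dblquad(lambda x, y: g(x, y), 0, 1, 0, 1)
c = 1 / mass
print(c, 12 / 5)

y0 = 1 / 3
fY, _ = integrate.quad(lambda x: c * g(x, y0), 0, 1)        # f_Y(1/3)
num, _ = integrate.quad(lambda x: c * g(x, y0), 0, 0.5)     # mass of {X < 1/2} at y0
print(num / fY, 1 / 3)
```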

40 Conditional expectation and variance

Computation in the continuous case with the conditional density:

E(X|Y = y) = ∫_{x=−∞}^{∞} x f_{X|Y}(x|y) dx
Var(X|Y = y) = ∫_{x=−∞}^{∞} (x − E(X|Y = y))² f_{X|Y}(x|y) dx

Example continued:

E(X|Y = y) = ∫_{x=0}^{1} 6x²(2 − x − y)/(4 − 3y) dx = (5/2 − 2y)/(4 − 3y)

Specifically: E(X|Y = 1/3) = (5/2 − 2/3)/3 = 11/18

Var (X|Y = y) = ...

41 Conditional expectation and variance

Computation in the discrete case with the conditional probability function:

E(X|Y = y) = Σ_{x ∈ Ω_X} x p_{X|Y}(x|y)
Var(X|Y = y) = Σ_{x ∈ Ω_X} (x − E(X|Y = y))² p_{X|Y}(x|y)

Exercise: Compute expectation and variance for X given Y = j

i \ j    0      1      2      3     p_X
0       1/20   4/20   3/20   2/20  10/20
1       3/20   2/20   2/20    0     7/20
2       1/20   1/20    0      0     2/20
3       1/20    0      0      0     1/20
p_Y     6/20   7/20   5/20   2/20  20/20

42 Computation of expectation by conditioning

E(X|Y = y) can be seen as a function of y, and therefore the expectation of this function can be computed
It holds: E(X) = E(E(X|Y))
Proof (for the continuous case):

E(E(X|Y)) = ∫_{y=−∞}^{∞} E(X|Y = y) f_Y(y) dy
          = ∫_{y=−∞}^{∞} ∫_{x=−∞}^{∞} x f_{X|Y}(x|y) f_Y(y) dx dy
          = ∫_{y=−∞}^{∞} ∫_{x=−∞}^{∞} x (f(x, y)/f_Y(y)) f_Y(y) dx dy = E(X)

Verify the formula for the previous examples

43 The conditional variance formula

Var (X) = E(Var (X|Y )) + Var (E(X|Y ))

Proof: From Var(X|Y) = E(X²|Y) − (E(X|Y))² we get

E(Var(X|Y)) = E(E(X²|Y)) − E((E(X|Y))²) = E(X²) − E((E(X|Y))²)

On the other hand

Var(E(X|Y)) = E((E(X|Y))²) − (E(E(X|Y)))² = E((E(X|Y))²) − E(X)²

The sum of both expressions gives the result.

Formula important in the theory of linear regression!
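Both identities can be checked on the discrete table from the exercise above; the following sketch does this with numpy:

```python
# Check E(X) = E(E(X|Y)) and Var(X) = E(Var(X|Y)) + Var(E(X|Y)) on the table.
import numpy as np

p = np.array([[1, 4, 3, 2],
              [3, 2, 2, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]]) / 20                       # p[i, j] = P(X = i, Y = j)
i = np.arange(4)
pY = p.sum(axis=0)

E_X_Y = (p * i[:, None]).sum(axis=0) / pY               # E(X | Y = j) for each j
E_X2_Y = (p * (i ** 2)[:, None]).sum(axis=0) / pY
Var_X_Y = E_X2_Y - E_X_Y ** 2                           # Var(X | Y = j)

EX = (p.sum(axis=1) * i).sum()
VarX = (p.sum(axis=1) * i ** 2).sum() - EX ** 2
print(EX, (pY * E_X_Y).sum())                           # E(X) = E(E(X|Y))
print(VarX, (pY * Var_X_Y).sum() + (pY * (E_X_Y - EX) ** 2).sum())
```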

44 4.6 Bivariate normal distribution

Univariate normal distribution: f(x) = 1/(√(2π) σ) · e^{−(x−µ)²/(2σ²)}

Standard normal distribution: φ(x) = 1/√(2π) · e^{−x²/2}

Let X1 and X2 be independent, both normally distributed N(µi, σi²), i = 1, 2

⇒ f(x1, x2) = 1/(2π σ1 σ2) · e^{−(x1−µ1)²/(2σ1²) − (x2−µ2)²/(2σ2²)}
            = 1/(2π |Σ|^{1/2}) · e^{−(x−µ)^T Σ^{−1} (x−µ)/2}

where x = (x1, x2)^T, µ = (µ1, µ2)^T, Σ = ( σ1²  0 ; 0  σ2² )

45 Density function in general

X = (X1, X2) is bivariate normally distributed if the joint density has the form

f(x) = 1/(2π |Σ|^{1/2}) · e^{−(x−µ)^T Σ^{−1} (x−µ)/2}

Covariance matrix: Σ = ( σ1²  σ12 ; σ12  σ2² )

Notation: ρ := σ12/(σ1 σ2)

• |Σ| = σ1²σ2² − σ12² = σ1²σ2²(1 − ρ²)

• Σ^{−1} = 1/(σ1²σ2²(1 − ρ²)) · ( σ2²  −ρσ1σ2 ; −ρσ1σ2  σ1² )

46 Density in general

Finally we obtain

f(x1, x2) = 1/(2π σ1 σ2 √(1 − ρ²)) · exp{ −(z1² − 2ρ z1 z2 + z2²) / (2(1 − ρ²)) }

where z1 = (x1 − µ1)/σ1 and z2 = (x2 − µ2)/σ2 (compare standardization)

The notation suggests that µi and σi² are the expectation and variance of the marginals Xi, and that ρ is the correlation between X1 and X2

Important formula:

f(x1, x2) = 1/(√(2π) σ1) · e^{−z1²/2} · 1/(√(2π(1 − ρ²)) σ2) · e^{−(ρz1 − z2)²/(2(1 − ρ²))}

Completion of square in the exponent

47 Bivariate normal distribution plot

Marginals standard normal distributed N(0, 1), ρ = 0:
[Figure: plot of the joint density]

48 Example density

Let (X, Y) be bivariate normally distributed with µ1 = µ2 = 0,

σ1 = σ2 = 1 and ρ = 1/2, compute the joint density

Solution: µ = (0, 0)^T,  Σ = ( 1  1/2 ; 1/2  1 )
|Σ| = 1 − 1/4 = 3/4,  Σ^{−1} = (4/3) ( 1  −1/2 ; −1/2  1 )
(x, y) Σ^{−1} (x, y)^T = (2/3) (x, y) · (2x − y, −x + 2y)^T = (4/3)(x² − xy + y²)

f(x, y) = 1/(√3 π) · e^{−(2/3)(x² − xy + y²)}

Density after completion of square in exponent:

f(x, y) = 1/√(2π) · e^{−x²/2} · 1/√(2π · 3/4) · e^{−(y − x/2)²/(2 · 3/4)}

49 Example continued

f(x, y) = 1/√(2π) · e^{−x²/2} · 1/√(2π · 3/4) · e^{−(y − x/2)²/(2 · 3/4)}
The joint density is the product of a standard normal density (in x) and a normal density (in y) with mean x/2 and variance 3/4.

Compute the density of X:

f_X(x) = ∫_{y=−∞}^{∞} 1/√(2π) · e^{−x²/2} · 1/√(2π · 3/4) · e^{−(y − x/2)²/(2 · 3/4)} dy = 1/√(2π) · e^{−x²/2}

f_X(x) is the density of a standard normal distribution
The inner integral equals 1, because we integrate a density function!

50 Interpretation of µi, σi² and ρ

In general we have for a bivariate normal distribution

1. X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²)

2. Correlation coefficient ρ(X1, X2) = σ12/(σ1 σ2)

Proof:
1. Use the formula with the completed square in the exponent and integrate:

f_X1(x1) = ∫_{x2=−∞}^{∞} 1/(√(2π) σ1) e^{−z1²/2} · 1/(√(2π(1 − ρ²)) σ2) e^{−(ρz1 − z2)²/(2(1 − ρ²))} dx2
         = 1/(√(2π) σ1) e^{−z1²/2} · ∫_{s=−∞}^{∞} 1/√(2π) e^{−(s − ρz1/√(1 − ρ²))²/2} ds = 1/(√(2π) σ1) e^{−z1²/2}

with the substitution s ← z2/√(1 − ρ²) = (x2 − µ2)/(√(1 − ρ²) σ2)

51 Proof continued

2. Again the formula with the completed square in the exponent and the substitutions z1 ← (x1 − µ1)/σ1, z2 ← (x2 − µ2)/σ2:

Cov(X1, X2) = ∫_{x1=−∞}^{∞} ∫_{x2=−∞}^{∞} (x1 − µ1)(x2 − µ2) f(x1, x2) dx2 dx1
            = ∫_{x1=−∞}^{∞} ∫_{x2=−∞}^{∞} (x1 − µ1)(x2 − µ2) · 1/(√(2π) σ1) e^{−z1²/2} · 1/(√(2π(1 − ρ²)) σ2) e^{−(ρz1 − z2)²/(2(1 − ρ²))} dx2 dx1
            = ∫_{z1} ∫_{z2} z1 φ(z1) · z2/√(1 − ρ²) · φ((ρz1 − z2)/√(1 − ρ²)) σ2 dz2 σ1 dz1
            = σ1 σ2 ∫_{z1} z1 φ(z1) ρz1 dz1 = σ1 σ2 ρ = σ12

52 Conditional distribution

Interpretation of the formula

f(x1, x2) = 1/(√(2π) σ1) · e^{−z1²/2} · 1/(√(2π(1 − ρ²)) σ2) · e^{−(ρz1 − z2)²/(2(1 − ρ²))}

f(x1, x2) = f1(x1)f2|1(x2|x1)

From (ρz1 − z2)²/(1 − ρ²) = (µ2 + σ2ρz1 − x2)²/(σ2²(1 − ρ²)) we conclude:

Conditional distribution is again normal distribution with

µ_{2|1} = µ2 + ρ (σ2/σ1)(x1 − µ1),   σ²_{2|1} = σ2²(1 − ρ²)

For bivariate normal distribution: ρ = 0 ⇒ Independence

For general (non-normal) distributions this implication is not correct!
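A simulation sketch of the conditional distribution formula: draw from a bivariate normal (parameters and the conditioning window are arbitrary choices), keep the pairs with X1 close to a fixed x1, and compare the conditional sample mean and variance of X2 with µ_{2|1} and σ²_{2|1}:

```python
# Empirical check of the conditional mean and variance of X2 given X1 ~ x1_0.
import numpy as np

rng = np.random.default_rng(4)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
x1, x2 = rng.multivariate_normal([mu1, mu2], cov, size=2_000_000).T

x1_0 = 2.0
sel = x2[np.abs(x1 - x1_0) < 0.02]                      # X2 values with X1 near x1_0
print(sel.mean(), mu2 + rho * s2 / s1 * (x1_0 - mu1))   # ~ mu_{2|1}
print(sel.var(), s2 ** 2 * (1 - rho ** 2))              # ~ sigma^2_{2|1}
```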

53 Sum of bivariate normal distributions

Let X1, X2 be bivariate normal with parameters µ1, µ2, σ1², σ2², σ12

Then the random variable Z = X1 + X2 is again normal, with

X1 + X2 ∼ N(µ1 + µ2, σ1² + σ2² + 2σ12)

Proof: For the density of the sum we have

f_Z(z) = ∫_{x2=−∞}^{∞} f(z − x2, x2) dx2

Get the result again by completion of the square (lengthy calculation)
Intuition: mean and variance of Z are obtained from the formulas for general random variables
