Joint Probability Distributions, Correlations

What we learned so far…
• Events:
– Working with events as sets: union, intersection, etc.
• Some events are simple: Heads vs. Tails, Cancer vs. Healthy
• Some are more complex: 10–20,000 random variables Xi, one for each of a genome's genes
• A joint probability distribution describes the behavior of several random variables
• We will start with just two random variables X and Y and generalize when necessary
Chapter 5 Introduction

Joint Probability Mass Function Defined
The joint probability mass function of the discrete random variables X and Y, denoted f_XY(x, y), satisfies:

(1) f_XY(x, y) ≥ 0 (all probabilities are non-negative)
(2) Σ_x Σ_y f_XY(x, y) = 1 (the sum of all probabilities is 1)
(3) f_XY(x, y) = P(X = x, Y = y)   (5-1)
Sec 5‐1.1 Joint Probability Distributions

Example 5‐1: # Repeats vs. Signal Bars
You use your cell phone to check your airline reservation. It asks you to speak the name of your departure city to the voice recognition system.
• Let Y denote the number of times you have to state your departure city.
• Let X denote the number of bars of signal strength on your cell phone.
y = number of times the city name is stated; x = number of bars of signal strength.

y \ x    1      2      3
1        0.01   0.02   0.25
2        0.02   0.03   0.20
3        0.02   0.10   0.05
4        0.15   0.10   0.05

Figure 5‐1 Joint probability distribution of X and Y, shown as a bar chart ("Number of Repeats vs. Cell Phone Bars"). The table cells are the probabilities. Observe that more bars relate to less repeating.
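As a sanity check, the table above can be encoded and verified against properties (1)–(3) of a joint PMF. A minimal Python sketch (the dictionary layout is just one convenient encoding, not part of the example):

```python
# Joint PMF of Example 5-1, keyed by (x, y):
# x = number of bars of signal strength, y = number of times the city name is stated.
f = {
    (1, 1): 0.01, (2, 1): 0.02, (3, 1): 0.25,
    (1, 2): 0.02, (2, 2): 0.03, (3, 2): 0.20,
    (1, 3): 0.02, (2, 3): 0.10, (3, 3): 0.05,
    (1, 4): 0.15, (2, 4): 0.10, (3, 4): 0.05,
}

# Property (1): all probabilities are non-negative.
assert all(p >= 0 for p in f.values())

# Property (2): the probabilities sum to 1 (up to floating-point rounding).
assert abs(sum(f.values()) - 1.0) < 1e-9

# Property (3): f(x, y) = P(X = x, Y = y), e.g. P(X = 3, Y = 1) = 0.25.
assert f[(3, 1)] == 0.25
```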
Marginal Probability Distributions (discrete)

For a discrete joint PMF, there are marginal distributions for each random variable, formed by summing the joint PMF over the other variable.
f_X(x) = Σ_y f_XY(x, y)
f_Y(y) = Σ_x f_XY(x, y)

y = number of times the city name is stated; x = number of bars of signal strength.

y \ x      1      2      3      f_Y(y)
1          0.01   0.02   0.25   0.28
2          0.02   0.03   0.20   0.25
3          0.02   0.10   0.05   0.17
4          0.15   0.10   0.05   0.30
f_X(x)     0.20   0.25   0.55   1.00

Figure 5‐6 From the prior example, the joint PMF is shown in green while the two marginal PMFs are shown in purple. They are called marginal because they are written in the margins.
Sec 5‐1.2 Marginal Probability Distributions

Mean & Variance of X and Y are calculated using marginal distributions
y = number of times the city name is stated; x = number of bars of signal strength.

y \ x        1      2      3      f_Y(y)   y·f_Y(y)   y²·f_Y(y)
1            0.01   0.02   0.25   0.28     0.28       0.28
2            0.02   0.03   0.20   0.25     0.50       1.00
3            0.02   0.10   0.05   0.17     0.51       1.53
4            0.15   0.10   0.05   0.30     1.20       4.80
f_X(x)       0.20   0.25   0.55   1.00     2.49       7.61
x·f_X(x)     0.20   0.50   1.65   2.35
x²·f_X(x)    0.20   1.00   4.95   6.15
μ_X = E(X) = 2.35;  σ_X² = V(X) = 6.15 − 2.35² = 6.15 − 5.5225 = 0.6275
μ_Y = E(Y) = 2.49;  σ_Y² = V(Y) = 7.61 − 2.49² = 7.61 − 6.2001 = 1.4099
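The tabulated moments can be reproduced from the marginals. A Python sketch (the joint PMF dictionary simply restates the Example 5‐1 table):

```python
# Joint PMF of Example 5-1, keyed by (x, y).
f = {
    (1, 1): 0.01, (2, 1): 0.02, (3, 1): 0.25,
    (1, 2): 0.02, (2, 2): 0.03, (3, 2): 0.20,
    (1, 3): 0.02, (2, 3): 0.10, (3, 3): 0.05,
    (1, 4): 0.15, (2, 4): 0.10, (3, 4): 0.05,
}

# Marginals: sum the joint PMF over the other variable.
fX = {x: sum(p for (xi, _), p in f.items() if xi == x) for x in (1, 2, 3)}
fY = {y: sum(p for (_, yi), p in f.items() if yi == y) for y in (1, 2, 3, 4)}

# Mean and variance computed from the marginal distributions.
muX = sum(x * p for x, p in fX.items())                  # E(X) = 2.35
varX = sum(x * x * p for x, p in fX.items()) - muX ** 2  # V(X) = 0.6275
muY = sum(y * p for y, p in fY.items())                  # E(Y) = 2.49
varY = sum(y * y * p for y, p in fY.items()) - muY ** 2  # V(Y) = 1.4099
```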
Conditional Probability Distributions

Recall that P(B|A) = P(A ∩ B)/P(A). Likewise,
P(Y=y|X=x) = P(X=x, Y=y)/P(X=x) = f_XY(x, y)/f_X(x)
From Example 5‐1 (y = number of times the city name is stated; x = number of bars of signal strength):

y \ x      1      2      3      f_Y(y)
1          0.01   0.02   0.25   0.28
2          0.02   0.03   0.20   0.25
3          0.02   0.10   0.05   0.17
4          0.15   0.10   0.05   0.30
f_X(x)     0.20   0.25   0.55   1.00

P(Y=1|X=3) = 0.25/0.55 = 0.455
P(Y=2|X=3) = 0.20/0.55 = 0.364
P(Y=3|X=3) = 0.05/0.55 = 0.091
P(Y=4|X=3) = 0.05/0.55 = 0.091
Sum = 1.00
Note that there are 12 probabilities conditional on X, and 12 more probabilities conditional upon Y.
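The conditional distribution of Y given X = 3 can be checked the same way. A Python sketch (column values taken from the Example 5‐1 table):

```python
# Column X = 3 of the Example 5-1 joint PMF: f_XY(3, y) for y = 1..4.
col = {1: 0.25, 2: 0.20, 3: 0.05, 4: 0.05}
fX3 = sum(col.values())  # marginal f_X(3) = 0.55

# Conditional PMF of Y given X = 3: divide the column by the marginal.
cond = {y: p / fX3 for y, p in col.items()}

# A conditional PMF is itself a PMF: it sums to 1.
assert abs(sum(cond.values()) - 1.0) < 1e-9

# It matches the slide's rounded values, e.g. P(Y=1|X=3) = 0.455.
assert round(cond[1], 3) == 0.455
assert round(cond[2], 3) == 0.364
```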
Sec 5‐1.3 Conditional Probability Distributions

Joint Random Variable Independence
• Random variable independence means that knowledge of the values of X does not change any of the probabilities associated with the values of Y.
• The opposite, dependence, implies that knowledge of the values of Y changes the probabilities associated with the values of X
Sec 5‐1.4 Independence
Independence for Discrete Random Variables • Remember independence of events (slide 21 lecture 3) : P(A|B)=P(A ∩ B)/P(B)=P(A) or P(B|A)= P(A ∩ B)/P(A)=P(B) or P(A ∩ B)=P(A)∙ P(B)
• Random variables X and Y are independent if, for every x and y, the events A = {Y = y} and B = {X = x} are independent:
P(Y=y|X=x) = P(Y=y) for any x, or
P(X=x|Y=y) = P(X=x) for any y, or
P(X=x, Y=y) = P(X=x)·P(Y=y) for any x and y

X and Y are Bernoulli variables
        Y=0   Y=1
X=0     2/6   1/6
X=1     2/6   1/6
What is the marginal PY(Y=0)? A. 1/6 B. 2/6 C. 3/6 D. 4/6 E. I don’t know
Get your i‐clickers
X and Y are Bernoulli variables
        Y=0   Y=1
X=0     2/6   1/6
X=1     2/6   1/6

What is the conditional P(X=0|Y=0)? A. 2/6 B. 1/2 C. 1/6 D. 4/6 E. I don’t know
X and Y are Bernoulli variables
        Y=0   Y=1
X=0     2/6   1/6
X=1     2/6   1/6

Are they independent? A. yes B. no C. I don’t know
X and Y are Bernoulli variables
        Y=0   Y=1
X=0     1/2   0
X=1     0     1/2

Are they independent? A. yes B. no C. I don’t know
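The two clicker tables can be tested mechanically: X and Y are independent exactly when every joint probability equals the product of its marginals. A Python sketch (`is_independent` is a helper name invented here):

```python
def is_independent(joint):
    """joint[(x, y)] -> P(X=x, Y=y).
    True iff P(X=x, Y=y) = P(X=x) * P(Y=y) for every cell."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}  # marginal of X
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal of Y
    return all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-9
               for x in xs for y in ys)

# First clicker table: every cell factorizes, so X and Y ARE independent.
table1 = {(0, 0): 2/6, (0, 1): 1/6, (1, 0): 2/6, (1, 1): 1/6}
# Second table: all mass on the diagonal, so X and Y are NOT independent.
table2 = {(0, 0): 1/2, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1/2}

assert is_independent(table1)
assert not is_independent(table2)
```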
Joint Probability Density Function Defined

The joint probability density function for the continuous random variables X and Y, denoted as f_XY(x, y), satisfies the following properties:
(1) f_XY(x, y) ≥ 0 for all x, y
(2) ∫∫ f_XY(x, y) dx dy = 1
(3) P((X, Y) ∈ R) = ∫∫_R f_XY(x, y) dx dy   (5-2)
Figure 5‐2 Joint probability density function for the random variables X and Y. Probability that (X, Y) is in the region R is determined by the volume of
fXY(x,y) over the region R.
Joint Probability Density Function
Figure 5‐3 Joint probability density function for the continuous random variables X and Y of expression levels of two different genes. Note the asymmetric, narrow ridge shape of the PDF – indicating that small values in the X dimension are more likely to occur when small values in the Y dimension occur.
Marginal Probability Distributions (continuous)
• Rather than summing a discrete joint PMF, we integrate a continuous joint PDF.
• The marginal PDFs are used to make probability statements about one variable.
• If the joint probability density function of random variables X and Y is f_XY(x, y), the marginal probability density functions of X and Y are:
f_X(x) = ∫_y f_XY(x, y) dy
f_Y(y) = ∫_x f_XY(x, y) dx   (5-3)
Conditional Probability Density Function Defined

Given continuous random variables X and Y with joint probability density function f_XY(x, y), the conditional probability density function of Y given X = x is

f_{Y|x}(y) = f_XY(x, y)/f_X(x) = f_XY(x, y) / ∫_y f_XY(x, y) dy,  if f_X(x) > 0   (5-4)
Compare to the discrete case: P(Y=y|X=x) = f_XY(x, y)/f_X(x). The conditional PDF satisfies the following properties:
(1) f_{Y|x}(y) ≥ 0
(2) ∫ f_{Y|x}(y) dy = 1
(3) P(Y ∈ B | X = x) = ∫_B f_{Y|x}(y) dy for any set B in the range of Y

Conditional Probability Distributions
• Conditional probability distributions can be developed for multiple random variables by extension of the ideas used for two random variables. • Suppose p = 5 and we wish to find the distribution of
X1, X2 and X3 conditional on X4 = x4 and X5 = x5:

f_{X1 X2 X3 | x4 x5}(x1, x2, x3) = f_{X1 X2 X3 X4 X5}(x1, x2, x3, x4, x5) / f_{X4 X5}(x4, x5)

for f_{X4 X5}(x4, x5) > 0.
Sec 5‐1.5 More Than Two Random Variables

Independence for Continuous Random Variables
For random variables X and Y, if any one of the following properties is true, the others are also true, and X and Y are independent. (Recall the discrete case: P(Y=y|X=x)=P(Y=y) for any x, or P(X=x|Y=y)=P(X=x) for any y, or P(X=x, Y=y)=P(X=x)·P(Y=y) for any x and y.)
(1) f_XY(x, y) = f_X(x) f_Y(y) for all x and y
(2) f_{Y|x}(y) = f_Y(y) for all x and y with f_X(x) > 0
(3) f_{X|y}(x) = f_X(x) for all x and y with f_Y(y) > 0
(4) P(X ∈ A, Y ∈ B) = P(X ∈ A)·P(Y ∈ B) for any sets A and B in the range of X and Y, respectively.   (5-7)
Covariation, Correlations
Covariance Defined
Covariance is a number quantifying the average dependence between two random variables.
The covariance between the random variables X and Y, denoted cov(X, Y) or σ_XY, is

σ_XY = E[(X − μ_X)(Y − μ_Y)] = E(XY) − μ_X μ_Y   (5-14)

The units of σ_XY are units of X times units of Y.
Unlike variance, covariance can be negative: −∞ < σ_XY < ∞.
Sec 5‐2 Covariance & Correlation

Covariance and PMF tables
y = number of times the city name is stated; x = number of bars of signal strength.

y \ x    1      2      3
1        0.01   0.02   0.25
2        0.02   0.03   0.20
3        0.02   0.10   0.05
4        0.15   0.10   0.05

The probability distribution of Example 5‐1 is shown.
By inspection, note that the larger probabilities occur as X and Y move in opposite directions. This indicates a negative covariance.
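That negative covariance can be computed directly from the table via σ_XY = E(XY) − μ_X μ_Y. A Python sketch (the numeric value is implied by the table, not stated on the slide):

```python
# Joint PMF of Example 5-1, keyed by (x, y).
f = {
    (1, 1): 0.01, (2, 1): 0.02, (3, 1): 0.25,
    (1, 2): 0.02, (2, 2): 0.03, (3, 2): 0.20,
    (1, 3): 0.02, (2, 3): 0.10, (3, 3): 0.05,
    (1, 4): 0.15, (2, 4): 0.10, (3, 4): 0.05,
}

muX = sum(x * p for (x, _), p in f.items())      # E(X) = 2.35
muY = sum(y * p for (_, y), p in f.items())      # E(Y) = 2.49
exy = sum(x * y * p for (x, y), p in f.items())  # E(XY)
cov = exy - muX * muY                            # sigma_XY = E(XY) - muX*muY

# Larger probabilities occur as X and Y move in opposite directions,
# so the covariance comes out negative.
assert cov < 0
```

Evaluating by hand: E(XY) = 5.27 and μ_X μ_Y = 5.8515, so σ_XY = −0.5815.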
Covariance and Scatter Patterns
Figure 5‐13 Joint probability distributions and the sign of cov(X, Y). Note that covariance is a measure of linear relationship. Variables with non‐zero covariance are correlated.
Independence Implies σ = ρ = 0, but not vice versa
• If X and Y are independent random variables,
σXY = ρXY = 0 (5‐17)
• ρXY = 0 is a necessary, but not a sufficient, condition for independence.

[Figure: scatter examples. Independent variables have covariance = 0, but variables that are NOT independent can also have covariance = 0.]

Correlation is “normalized covariance”
• Also called: Pearson correlation coefficient
ρXY = σXY/(σX σY) is the covariance normalized so that −1 ≤ ρXY ≤ 1.

Karl Pearson (1852–1936), English mathematician and biostatistician
Spearman rank correlation
• Pearson correlation tests for a linear relationship between X and Y
• It is unreliable for variables with broad distributions, where non‐linear effects dominate
• Spearman correlation tests for any monotonic relationship between X and Y
• Calculate the ranks (1 to n), rX(i) and rY(i), of the variables in both samples, then calculate the Pearson correlation between the ranks: Spearman(X,Y) = Pearson(rX, rY)
• Ties: convert to fractional ranks, e.g. a tie for 6th and 7th place: both get rank 6.5. This can lead to artefacts.
• If there are many ties: use the Kendall rank correlation (Kendall tau)

Matlab exercise: Correlation/Covariation
• Generate a sample with Stats=100,000 of two Gaussian random variables r1 and r2 which have mean 0 and standard deviation 2 and are:
– Uncorrelated
– Correlated with correlation coefficient 0.9
– Correlated with correlation coefficient -0.5
– Trick: first make uncorrelated r1 and r2. Then change r1 to: r1m=mix.*r2+(1-mix.^2)^0.5.*r1; where mix = corr. coeff.
• In each case calculate the covariance and the correlation coefficient
• In each case make a scatter plot: plot(r1,r2,'k.');

Matlab exercise: Correlation/Covariation
1. Stats=100000;
2. r1=2.*randn(Stats,1);
3. r2=2.*randn(Stats,1);
4. disp('Covariance matrix='); disp(cov(r1,r2));
5. disp('Correlation='); disp(corr(r1,r2));
6. figure; plot(r1,r2,'ko');
7. mix=0.9; %Mixes r2 into r1 but keeps the same variance
8. r1m=mix.*r2+sqrt(1-mix.^2).*r1;
9. disp('Covariance matrix='); disp(cov(r1m,r2));
10. disp('Correlation='); disp(corr(r1m,r2));
11. figure; plot(r1m,r2,'ko');
12. mix=-0.5; %REDO LINES 8-11
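For readers without Matlab, the same mixing exercise can be sketched in plain Python. Here `pearson` and `mix_to` are helper names invented for this sketch, and the scatter plots are omitted:

```python
import math
import random

random.seed(0)
n = 100_000  # "Stats" in the Matlab exercise

# Two uncorrelated Gaussian samples with mean 0 and standard deviation 2.
r1 = [random.gauss(0, 2) for _ in range(n)]
r2 = [random.gauss(0, 2) for _ in range(n)]

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def mix_to(r1, r2, mix):
    """The slide's trick: mix*r2 + sqrt(1 - mix^2)*r1 keeps the variance of
    r1 while making the correlation with r2 approximately equal to mix."""
    s = math.sqrt(1 - mix ** 2)
    return [mix * y + s * x for x, y in zip(r1, r2)]

print(pearson(r1, r2))                    # close to 0
print(pearson(mix_to(r1, r2, 0.9), r2))   # close to 0.9
print(pearson(mix_to(r1, r2, -0.5), r2))  # close to -0.5
```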
Linear Functions of Random Variables
• A function of random variables is itself a random variable. • A function of random variables can be formed by either linear or nonlinear relationships. We start with linear functions.
• Given random variables X1, X2, …, Xp and constants c1, c2, …, cp,

Y = c1X1 + c2X2 + … + cpXp   (5‐24)

is a linear combination of X1, X2, …, Xp.
Sec 5‐4 Linear Functions of Random Variables
Mean & Variance of a Linear Function
Y= c1X1 + c2X2 + … + cpXp
E(Y) = c1E(X1) + c2E(X2) + … + cpE(Xp)   (5-25)

V(Y) = c1²V(X1) + c2²V(X2) + … + cp²V(Xp) + 2 Σ_{i<j} ci cj cov(Xi, Xj)   (5-26)

If X1, X2, …, Xp are independent, then cov(Xi, Xj) = 0 and

V(Y) = c1²V(X1) + c2²V(X2) + … + cp²V(Xp)   (5-27)
Example 5‐31: Error Propagation
A semiconductor product consists of three layers. The variances of the thicknesses of the layers are 25, 40 and 30 nm². What is the variance of the thickness of the finished product? Answer:
X = X1 + X2 + X3

V(X) = Σ_{i=1}^{3} V(Xi) = 25 + 40 + 30 = 95 nm²

SD(X) = √95 = 9.7 nm
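Because the layers are independent, their variances add, and a quick Monte Carlo run confirms the arithmetic. A Python sketch (modeling the layer thickness deviations as Gaussians is an illustrative assumption, not part of the example):

```python
import math
import random

random.seed(1)
layer_vars = [25, 40, 30]                 # layer thickness variances, nm^2
sds = [math.sqrt(v) for v in layer_vars]  # layer standard deviations, nm

# Analytic answer: variances of independent layers add.
v_total = sum(layer_vars)        # 95 nm^2
sd_total = math.sqrt(v_total)    # ~9.7 nm

# Monte Carlo check: simulate the total deviation from nominal thickness.
n = 200_000
totals = [sum(random.gauss(0, s) for s in sds) for _ in range(n)]
mean = sum(totals) / n
v_mc = sum((t - mean) ** 2 for t in totals) / (n - 1)  # close to 95
```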
Mean & Variance of an Average
If X̄ = (X1 + X2 + … + Xp)/p and E(Xi) = μ for each i, then

E(X̄) = μ   (5-28a)

If the Xi are independent with V(Xi) = σ², then

V(X̄) = pσ²/p² = σ²/p   (5-28b)
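Equation (5‐28b) can be verified by simulation. A Python sketch (the choices of p, σ, and the replicate count are arbitrary for illustration):

```python
import random

random.seed(2)
p = 25          # sample size being averaged
sigma = 3.0     # common standard deviation, so sigma^2 = 9
reps = 100_000  # Monte Carlo replicates

# Empirical distribution of the sample mean of p independent draws.
means = [sum(random.gauss(0, sigma) for _ in range(p)) / p
         for _ in range(reps)]

m = sum(means) / reps
v = sum((x - m) ** 2 for x in means) / (reps - 1)
# Expect E(Xbar) = 0 and V(Xbar) = sigma^2 / p = 9/25 = 0.36
```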
Credit: XKCD comics