
Chapter 5 Joint Distributions and Random Samples

Outline

5.1 Jointly Distributed Random Variables

5.2 Expected values, Covariance, and Correlation

5.3 Statistics and Their Distributions

5.4 The Distribution of the Sample Mean

5.5 The Distribution of a Linear Combination

Joint Distribution of Two Discrete Variables

Example A large insurance agency serves a large number of customers who have purchased both a homeowner’s policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are $500 and $1000, whereas for a homeowner’s policy, the choices are $0, $500, and $1000. For a randomly selected customer with both types of policy, let X be the deductible amount on the auto policy and Y be the deductible amount on the homeowner’s policy.

1 What is the sample space?

2 If the total of 20,000 customers is classified by their purchased policies as follows, what are the probabilities of the simple events?

 X \ Y      0       500     1000    Total
  500     4,000    2,000    4,000   10,000
  1000    1,000    3,000    6,000   10,000
  Total   5,000    5,000   10,000   20,000

Joint Distribution of Two Discrete Variables (cont’d)

Example The joint distribution of (X, Y) is defined for each pair of numbers (x, y). It is usually expressed by an r × c table as follows.

 X \ Y      0      500    1000   Total
  500     0.20    0.10    0.20    0.5
  1000    0.05    0.15    0.30    0.5
  Total   0.25    0.25    0.5     1.0

What are the distributions of X and Y , respectively? The individual probability distributions are called marginal distributions.

Definition 1 Let X and Y be two discrete random variables defined on the sample space S of an experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (simple events) (x, y) by

p(x, y)= P[X = x and Y = y]

which satisfies p(x, y) ≥ 0 and ∑x ∑y p(x, y) = 1.
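These conditions, the marginal sums, and the conditional pmf can all be checked numerically on the deductible table above; a minimal Python sketch (the dictionary layout and variable names are illustrative, not from the slides):

```python
# Joint pmf of (X, Y) for the deductible example, values from the table above;
# keys are (x, y) pairs.
p = {
    (500, 0): 0.20, (500, 500): 0.10, (500, 1000): 0.20,
    (1000, 0): 0.05, (1000, 500): 0.15, (1000, 1000): 0.30,
}

# Marginal pmfs: sum the joint pmf over the other variable.
p_X = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (500, 1000)}
p_Y = {y: sum(v for (_, yy), v in p.items() if yy == y) for y in (0, 500, 1000)}

# Conditional pmf of Y given X = 500: p(x, y) / p_X(x).
p_Y_given_500 = {y: p[(500, y)] / p_X[500] for y in (0, 500, 1000)}
```

Both marginals sum the table the same way the Total row and column do.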

Joint Distribution of Two Discrete Variables (cont’d)

Definition

2 For any compound event A, P[(X, Y) ∈ A] = ∑_{(x,y)∈A} p(x, y).

3 The marginal probability mass functions of X and of Y, denoted by pX(x) and pY(y), respectively, are defined as

pX(x) = ∑y p(x, y),  pY(y) = ∑x p(x, y).

4 The conditional probability distribution of X given Y = y is defined as

pX|Y(x | Y = y) = p(x, y) / pY(y)

for all x provided pY(y) ≠ 0. The conditional probability distribution of Y given X = x is defined as

pY|X(y | X = x) = p(x, y) / pX(x)

for all y provided pX(x) ≠ 0.

Joint Distribution of Two Discrete Variables (cont’d)

Definition

5 Two random variables X and Y are independent if and only if p(x, y) = pX (x)pY (y) for all x and y.

If pY(y) ≠ 0, X and Y are independent if and only if pX|Y(x|y) = pX(x) for all x, i.e., the conditional probability distribution is free of y. Similarly, if pX(x) ≠ 0, X and Y are independent if and only if pY|X(y|x) = pY(y).

6 Let X1, X2, ..., Xk be discrete random variables. Let x1 be a possible value for X1, x2 be a possible value for X2, and so on with xk a possible value for Xk. The joint probability distribution of X1, X2, ..., Xk assigns probabilities to all k-tuples of possible values (x1, x2, ..., xk), satisfying p(x1, x2, ..., xk) ≥ 0 and ∑x1 ∑x2 ··· ∑xk p(x1, x2, ..., xk) = 1. The marginal probability distribution of Xi is

pXi(xi) = ∑x1 ··· ∑x(i−1) ∑x(i+1) ··· ∑xk p(x1, x2, ..., xk).

Random variables X1, X2, ..., Xk are independent if

p(x1, x2, ..., xk) = pX1(x1) pX2(x2) ··· pXk(xk).

Joint Distribution of Two Discrete Variables (cont’d)

Example A service station has both self-service and full-service islands. On each island, there is a single regular unleaded pump with two hoses. Let X denote the number of hoses being used on the self-service island at a particular time and let Y denote the number of hoses on the full-service island in use at that time. The joint pmf of X and Y appears in the accompanying tabulation.

                       y
  p(x, y)       0      1      2    X marginal
       0      0.10   0.04   0.02      0.16
  x    1      0.08   0.20   0.06      0.34
       2      0.06   0.14   0.30      0.50
  Y marginal  0.24   0.38   0.38      1.00

1 What are P[X = 1, Y = 1] and P[X ≤ 1, Y ≤ 1]?

2 Give a word description of the event {X ≠ 0 and Y ≠ 0}, and compute the probability of this event.

3 Compute the marginal probability mass functions of X and of Y .

4 Are X and Y independent random variables? Explain.
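Questions 1, 3, and 4 lend themselves to a direct computation from the table; a Python sketch (layout and names are illustrative), which checks the independence condition p(x, y) = pX(x)pY(y) cell by cell:

```python
# Joint pmf for the hose example: rows[x][y], with x = hoses in use on the
# self-service island and y = hoses in use on the full-service island.
rows = {0: [0.10, 0.04, 0.02], 1: [0.08, 0.20, 0.06], 2: [0.06, 0.14, 0.30]}
p = {(x, y): rows[x][y] for x in rows for y in range(3)}

p_X = {x: sum(rows[x]) for x in rows}                       # marginal of X
p_Y = {y: sum(rows[x][y] for x in rows) for y in range(3)}  # marginal of Y

p11 = p[(1, 1)]                                             # P[X = 1, Y = 1]
p_le1 = sum(p[(x, y)] for x in (0, 1) for y in (0, 1))      # P[X <= 1, Y <= 1]

# Independent iff p(x, y) = pX(x) pY(y) holds in every cell.
independent = all(abs(p[(x, y)] - p_X[x] * p_Y[y]) < 1e-12
                  for x in rows for y in range(3))
```

A single failing cell (e.g., p(0, 0) = 0.10 versus 0.16 × 0.24 = 0.0384) already rules out independence.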

Joint Distribution of Two Discrete Variables (cont’d)

Example The joint distribution of the number X of cars and the number Y of buses per signal cycle at a proposed left-turn lane is displayed in the accompanying joint probability table.

 X \ Y        0      1      2    X marginal
   0        0.025  0.015  0.010    0.050
   1        0.050  0.030  0.020    0.100
   2        0.125  0.075  0.050    0.250
   3        0.150  0.090  0.060    0.300
   4        0.100  0.060  0.040    0.200
   5        0.050  0.030  0.020    0.100
 Y marginal 0.500  0.300  0.200    1.000

1 What is the probability that there is exactly one car and exactly one bus during a cycle?

2 What is the probability that there is at most one car and at most one bus during a cycle?

3 What is the probability that there is exactly one car during a cycle? Exactly one bus?

4 Suppose the left-turn line is to have a capacity of five cars and one bus is equivalent to three cars. What is the probability of an overflow during a cycle?

5 Are X and Y independent random variables? Explain.

Multinomial Distribution

A multinomial experiment is a random experiment that has the following properties:

1 The experiment consists of n repeated trials.

2 Each trial has a discrete number of possible outcomes, say r.

3 The probability that a particular outcome i will occur is constant, say pi, on any given trial.

4 The trials are independent.

Definition

Let Xi be the number of trials resulting in outcome i (i = 1, 2, ..., r) and pi be the probability of outcome i on any particular trial in a multinomial experiment. X1, ..., Xr jointly have a multinomial distribution if P[X1 = x1, ..., Xr = xr] = p(x1, ..., xr)

p(x1, ..., xr) =
  n!/(x1! x2! ··· xr!) p1^x1 ··· pr^xr,  if xi = 0, 1, 2, ... and x1 + x2 + ··· + xr = n;
  0, otherwise.

Multinomial Distribution (cont’d)

Example Two valves are used to control the flow of liquid out of a storage tank into another storage tank. Each valve has two states: open and closed. A desired system is expected to have the following multinomial distribution.

 Configuration  Valve 1  Valve 2  Probability of best flow
      1          open     open            0.476
      2          open     closed          0.305
      3          closed   open            0.218
      4          closed   closed          0.001

Two different apparatuses are randomly selected from the desired population, and the number of times out of 100 measurements that each valve configuration gave the desired flow from each apparatus is recorded.

                                  apparatus 1  apparatus 2
 Configuration  Valve 1  Valve 2     Times        Times
      1          open     open         37           22
      2          open     closed       42           20
      3          closed   open         21           56
      4          closed   closed        0            2

Which apparatus is more likely to be the desired system? (apparatus 1, 3.125 × 10⁻⁴; apparatus 2, 8.455 × 10⁻¹⁸)
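The two multinomial likelihoods quoted above can be reproduced directly from the pmf; a Python sketch (the function name is illustrative):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """p(x1,...,xr) = n!/(x1! ... xr!) * p1^x1 * ... * pr^xr."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)   # exact: each partial quotient is an integer
    prob = 1.0
    for x, pk in zip(counts, probs):
        prob *= pk ** x
    return coef * prob

# Desired-system configuration probabilities from the first table above.
probs = (0.476, 0.305, 0.218, 0.001)
L1 = multinomial_pmf((37, 42, 21, 0), probs)   # apparatus 1
L2 = multinomial_pmf((22, 20, 56, 2), probs)   # apparatus 2
```

Apparatus 1 has the far larger likelihood, so it is the better candidate for the desired system.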

Joint Distribution of Two Continuous Variables

Definition

1 Let X and Y be two continuous random variables defined on the two-dimensional sample space S of an experiment. The joint probability density function f(x, y) is a non-negative function of (x, y) satisfying f(x, y) ≥ 0 and ∫∫ f(x, y) dx dy = 1, the integral being over all x, y.

2 For any two-dimensional domain A, P[(X, Y) ∈ A] = ∫∫A f(x, y) dx dy. In particular, if A is a two-dimensional rectangle A = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, then P[(X, Y) ∈ A] = P[a ≤ X ≤ b, c ≤ Y ≤ d] = ∫a^b ∫c^d f(x, y) dy dx.

3 The marginal probability density functions of X and of Y, denoted by fX(x) and fY(y), respectively, are defined as fX(x) = ∫y f(x, y) dy and fY(y) = ∫x f(x, y) dx.

4 The conditional probability distribution of X given Y = y is defined as fX|Y(x|y) = f(x, y)/fY(y) for all x provided fY(y) ≠ 0. The conditional probability distribution of Y given X = x is defined as fY|X(y|x) = f(x, y)/fX(x) for all y provided fX(x) ≠ 0.

5 If fX|Y(x|y) = fX(x), or equivalently f(x, y) = fX(x)fY(y) for all x and y, the conditional probability distribution is free of y, and the two random variables X and Y are independent.

6 Let X1, X2, ..., Xk be continuous random variables. Let x1 be a possible value for X1, x2 be a possible value for X2, and so on with xk a possible value for Xk. The joint probability distribution of X1, X2, ..., Xk is given by a density on all k-tuples of possible values (x1, x2, ..., xk) satisfying f(x1, x2, ..., xk) ≥ 0 and ∫x1 ∫x2 ··· ∫xk f(x1, x2, ..., xk) dx1 ··· dxk = 1. The marginal probability distribution of Xi is fXi(xi) = ∫x1 ··· ∫x(i−1) ∫x(i+1) ··· ∫xk f(x1, x2, ..., xk) dx1 ··· dx(i−1) dx(i+1) ··· dxk. Random variables X1, X2, ..., Xk are independent if f(x1, x2, ..., xk) = fX1(x1) fX2(x2) ··· fXk(xk).

Service Windows in Bank

Example A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let X be the proportion of time that the drive-up facility is in use (at least one customer is being served or waiting to be served) and Y be the proportion of time that the walk-up window is in use.

1 What is the sample space?

2 Suppose the joint probability density function of (X, Y) is

f(x, y) = (6/5)(x + y²) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.

Verify that f(x, y) is a legitimate probability density function.

3 What is the probability that neither facility is busy more than one-quarter of the time? (0.0109)

4 What are the marginal distributions of X and Y, respectively?

5 What is the probability that the walk-up window is busy between 25% and 75% of the time? (0.4625)

6 Are the two facilities independent in terms of busy time?
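Parts 2 and 3 can be verified numerically with a midpoint-rule double integral; a Python sketch (step size and helper name are choices made here, not from the slides):

```python
# Midpoint-rule check of the joint pdf f(x, y) = (6/5)(x + y^2) on [0, 1]^2.
def f(x, y):
    return 1.2 * (x + y * y)

def integrate2d(g, ax, bx, ay, by, m=400):
    # Midpoint rule on an m-by-m grid; for this smooth integrand the
    # error is far below the tolerances used here.
    hx, hy = (bx - ax) / m, (by - ay) / m
    total = 0.0
    for i in range(m):
        x = ax + (i + 0.5) * hx
        for j in range(m):
            y = ay + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

total_mass = integrate2d(f, 0, 1, 0, 1)       # should be 1 (legitimate pdf)
p_quiet = integrate2d(f, 0, 0.25, 0, 0.25)    # P[X <= 1/4, Y <= 1/4]
```

The second integral reproduces the 0.0109 quoted in part 3.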

Weight Contribution of Nuts

Example A nut company markets cans of deluxe nuts containing almonds, cashews, and peanuts. Suppose the net weight of each can is exactly one pound, but the weight contribution of each type of nut is random. Because the three weights sum to one, a joint probability model for any two gives all necessary information about the weight of the third type. Let X be the weight of almonds in a selected can and Y be the weight of cashews.

1 What is the sample space?

2 Suppose the joint pdf of (X, Y) is

f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ x + y ≤ 1, and 0 otherwise.

Verify that this is a legitimate pdf.

3 What is the probability that the two types of nuts together make up at most 50% of the weight? (0.0625)

4 What are the marginal distributions of X and Y, respectively?

5 Are the two types of nuts independent in terms of weight?
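Because the support here is a triangle rather than a rectangle, a grid sum that simply skips points outside the support is a convenient numerical check for parts 2 and 3; a Python sketch (grid size is a choice made here):

```python
# Check that f(x, y) = 24xy on the triangle {x >= 0, y >= 0, x + y <= 1}
# integrates to 1, and that P[X + Y <= 0.5] = 0.0625, by a midpoint-rule
# sum over a fine grid; cells outside the support contribute 0.
m = 800
h = 1.0 / m
total = 0.0
p_half = 0.0
for i in range(m):
    x = (i + 0.5) * h
    for j in range(m):
        y = (j + 0.5) * h
        if x + y <= 1.0:
            cell = 24.0 * x * y * h * h
            total += cell
            if x + y <= 0.5:
                p_half += cell
```

The residual error comes only from cells straddling the two diagonal boundaries, so it shrinks as the grid is refined.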

Bivariate Normal Distribution

Two continuous random variables X and Y have a joint bivariate normal distribution if the density function is

f(x, y) = 1 / (2π σx σy √(1 − ρ²)) × exp{ −1/(2(1 − ρ²)) [ ((x − µx)/σx)² − 2ρ ((x − µx)/σx)((y − µy)/σy) + ((y − µy)/σy)² ] }.

[Surface and contour plots of the bivariate normal density for several values of ρ, including ρ = 0, 0.25, 0.75, 0.95, and −0.75.]

Properties of Bivariate Normal Distribution

If X ~ N(µx, σx²) and Y ~ N(µy, σy²) and X and Y are independent, then (X, Y) follows a bivariate normal distribution with ρ = 0.

Two random variables X and Y have a joint bivariate normal distribution if and only if they can be expressed in the form X = aU + bV, Y = cU + dV, where U and V are independent normal random variables.

Two random variables X and Y have a joint bivariate normal distribution if and only if any linear combination aX + bY has a normal distribution.

If two random variables X and Y have a joint bivariate normal distribution, then X and Y are independent if and only if they are uncorrelated.

If two random variables X and Y have a joint bivariate normal distribution, then the conditional distribution of X given Y = y is normal with mean E[X | Y = y] = µx + ρ (σx/σy)(y − µy) and variance V[X | Y = y] = (1 − ρ²) σx².

Multivariate Continuous Distributions

Example 1. When a certain method is used to collect a fixed volume of rock samples in a region, there are four resulting rock types. Let X1, X2, and X3 denote the proportion by volume of rock types 1, 2, and 3 in a randomly selected sample. Here is one legitimate joint pdf of X1, X2, and X3.

f(x1, x2, x3) = 144 x1 x2 (1 − x3) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x3 ≤ 1, 0 ≤ x1 + x2 + x3 ≤ 1, and 0 otherwise.

2. Continuous random variables X1, X2, ..., Xn jointly have a multivariate normal distribution if their joint probability density function is

f(x1, x2, ..., xn) = 1 / ((2π)^(n/2) √det(Σ)) exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) }, where x = (x1, ..., xn)ᵀ and µ = (µ1, ..., µn)ᵀ.

Expected Values, Covariance, and Correlation

Example A small ferry can accommodate at most five cars and at most two buses in a single trip. The toll for cars is $3 each and the toll for buses is $10 each. Let X and Y be the number of cars and buses, respectively, carried on a single trip. Based on the analysis of data over the years, the joint distribution of X and Y is

                       y
 p(x, y)      0      1      2    X marginal
   0        0.025  0.015  0.010    0.050
   1        0.050  0.030  0.020    0.100
   2        0.125  0.075  0.050    0.250
 x 3        0.150  0.090  0.060    0.300
   4        0.100  0.060  0.040    0.200
   5        0.050  0.030  0.020    0.100
 Y marginal 0.500  0.300  0.200    1.000

What is the expected revenue from a single trip?
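The revenue is h(X, Y) = 3X + 10Y, so its expectation is the probability-weighted sum of 3x + 10y over every cell of the table; a Python sketch (the dictionary layout is illustrative):

```python
# Expected revenue 3X + 10Y for the ferry example, computed cell by cell
# from the joint pmf table: E[h(X, Y)] = sum over (x, y) of h(x, y) p(x, y).
joint = {  # joint[(x, y)] = p(x, y); x = cars (0..5), y = buses (0..2)
    (0, 0): 0.025, (0, 1): 0.015, (0, 2): 0.010,
    (1, 0): 0.050, (1, 1): 0.030, (1, 2): 0.020,
    (2, 0): 0.125, (2, 1): 0.075, (2, 2): 0.050,
    (3, 0): 0.150, (3, 1): 0.090, (3, 2): 0.060,
    (4, 0): 0.100, (4, 1): 0.060, (4, 2): 0.040,
    (5, 0): 0.050, (5, 1): 0.030, (5, 2): 0.020,
}
expected_revenue = sum((3 * x + 10 * y) * p for (x, y), p in joint.items())
```

Equivalently, since E is linear, E[3X + 10Y] = 3E(X) + 10E(Y) = 3(2.8) + 10(0.7) = 15.4 dollars.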

Expected Values, Covariance, and Correlation (cont’d)

Definition

1 Let X and Y be jointly distributed random variables with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then
E[h(X, Y)] = ∑x ∑y h(x, y) p(x, y) if X and Y are discrete, and
E[h(X, Y)] = ∫−∞^∞ ∫−∞^∞ h(x, y) f(x, y) dx dy if X and Y are continuous.

2 V[g(X, Y)] = E[g(X, Y) − µg(X,Y)]², where µg(X,Y) = E[g(X, Y)].

3 The covariance between two random variables X and Y is Cov(X, Y) = E[(X − µX)(Y − µY)], i.e.,
Cov(X, Y) = ∑x ∑y (x − µX)(y − µY) p(x, y) if X and Y are discrete, and
Cov(X, Y) = ∫−∞^∞ ∫−∞^∞ (x − µX)(y − µY) f(x, y) dx dy if X and Y are continuous.

4 The correlation coefficient of X and Y, denoted by Corr(X, Y), ρX,Y, or ρ, is ρX,Y = Cov(X, Y) / √(V(X)V(Y)) = Cov(X, Y) / (σX σY).

Properties of Covariance and Correlation

1 Cov(X, Y) = E(XY) − E(X)E(Y).

2 Let Xi be a random variable with a discrete or continuous distribution for i = 1, 2, ..., k. The linear combination Y = a1X1 + a2X2 + ··· + akXk has mean
E(Y) = a1E(X1) + a2E(X2) + ··· + akE(Xk) and variance
V(Y) = a1²V(X1) + a2²V(X2) + ··· + ak²V(Xk) + 2 ∑_{i<j} ai aj Cov(Xi, Xj).

6 Let Corr(X, Y) = ρ. ρ is a measure of the degree of linear association between X and Y. For descriptive purposes, the association is strong if |ρ| ≥ 0.8, moderate if 0.5 < |ρ| < 0.8, and weak if |ρ| ≤ 0.5. A |ρ| < 1 indicates only that the association is not completely linear; there may still be a strong nonlinear relation. See http://rpsychologist.com/d3/correlation/ for an interactive visualization.

[Schematic scatter plots, with reference lines at µX and µY, illustrating positive correlation, negative correlation, and little or no correlation.]

7 When ρ = 0, X and Y are said to be uncorrelated. Two variables could be uncorrelated yet highly dependent because of a strong nonlinear association, so be careful not to conclude too much from knowing ρ = 0.
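A standard counterexample makes point 7 concrete: take X uniform on {−1, 0, 1} and Y = X². A Python sketch of the computation:

```python
# Uncorrelated does not imply independent: X uniform on {-1, 0, 1}, Y = X^2.
support = [-1, 0, 1]
pX = {x: 1 / 3 for x in support}

EX = sum(x * p for x, p in pX.items())             # E(X)   = 0
EY = sum(x * x * p for x, p in pX.items())         # E(Y)   = E(X^2) = 2/3
EXY = sum(x * x * x * p for x, p in pX.items())    # E(XY)  = E(X^3) = 0
cov = EXY - EX * EY                                # Cov(X, Y) = 0: uncorrelated

# Yet Y is a function of X: P(X=1, Y=1) = pX(1) = 1/3, while
# pX(1) * pY(1) = (1/3)(2/3) = 2/9, so the product rule fails.
p_11 = pX[1]                # joint P(X = 1, Y = 1), since Y = X^2
pY1 = pX[-1] + pX[1]        # pY(1) = P(Y = 1) = 2/3
```

Here the association is perfectly deterministic but purely nonlinear, so ρ = 0 carries no hint of it.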

Checkout Lines in Market

Example A certain market has both express and regular checkout lines. Let X and Y be the numbers of customers in the two checkout lines at a particular time of day, respectively. The joint pmf of X and Y is as follows:

                        y
              0     1     2     3    X marginal
     0      0.08  0.07  0.04  0.00      0.19
     1      0.06  0.15  0.05  0.04      0.30
 x   2      0.05  0.04  0.10  0.06      0.25
     3      0.00  0.03  0.04  0.07      0.14
     4      0.00  0.01  0.05  0.06      0.12
 Y marginal 0.19  0.30  0.28  0.23      1.00

1 Interpret P[X = 1, Y = 1] and P[X = Y].

2 What is the probability that the total number of customers in the two kinds of lines is exactly four? At least four?

3 Are X and Y independent random variables?

4 Compute Cov(X, Y).

5 Compute Corr(X, Y).

Checkout Lines in Market (cont’d)

cov.discrete <- function(x.value, y.value, prob.matrix) {
  # Computes the covariance and correlation of a discrete joint distribution;
  # x values index the rows of prob.matrix and y values index the columns.
  x.margin <- apply(prob.matrix, 1, sum)
  y.margin <- apply(prob.matrix, 2, sum)
  exp.x <- sum(x.value * x.margin)
  exp.y <- sum(y.value * y.margin)
  # kronecker() lists all products x*y in row-major order, matching t(prob.matrix)
  xy.value <- kronecker(x.value, y.value)
  exp.xy <- sum(xy.value * as.vector(t(prob.matrix)))
  cov.xy <- exp.xy - exp.x * exp.y
  var.x <- sum((x.value^2) * x.margin) - exp.x^2
  var.y <- sum((y.value^2) * y.margin) - exp.y^2
  cor.xy <- cov.xy / sqrt(var.x * var.y)
  out <- c(exp.x, exp.y, var.x, var.y, exp.xy, cov.xy, cor.xy)
  names(out) <- c("E(X)", "E(Y)", "Var(X)", "Var(Y)", "E(XY)", "Cov(X,Y)", "Corr(X,Y)")
  out
}
x.value <- c(0, 1, 2, 3, 4)
y.value <- c(0, 1, 2, 3)
prob.matrix <- matrix(c(0.08, 0.07, 0.04, 0.00,
                        0.06, 0.15, 0.05, 0.04,
                        0.05, 0.04, 0.10, 0.06,
                        0.00, 0.03, 0.04, 0.07,
                        0.00, 0.01, 0.05, 0.06), nrow = 5, byrow = TRUE)
cov.discrete(x.value, y.value, prob.matrix)

> cov.discrete(x.value,y.value,prob.matrix) E(X) E(Y) Var(X) Var(Y) E(XY) Cov(X,Y) Corr(X,Y) 1.7000000 1.5500000 1.5900000 1.0875000 3.3300000 0.6950000 0.5285324

Air Pressures in Tires

Example Each front tire on a particular type of vehicle is supposed to be filled to a pressure of 26 psi. The actual air pressure in each tire is a random variable. Let X and Y be the pressures of the left and right tires, respectively. The joint probability density function is

f(x, y) = K(x² + y²) for 20 ≤ x ≤ 30, 20 ≤ y ≤ 30, and 0 otherwise.

1 What is the value of K? (3/380000)

2 What is the probability that both tires are underfilled? (0.3024)

3 What is the probability that the difference in air pressure between the two tires is at most two psi? (0.3593)

4 Determine the marginal distributions of air pressure in the left and right tires. (fX(x) = 3x²/38000 + 0.05 for 20 ≤ x ≤ 30 and fY(y) = 3y²/38000 + 0.05 for 20 ≤ y ≤ 30)

5 Are X and Y independent random variables? (No)

6 Compute Cov(X, Y). (E(X) = E(Y) = 25.3290, V(X) = V(Y) = 8.2690, E(XY) = 641.4474, Cov(X, Y) = −0.1083)

7 Compute Corr(X, Y). (Corr(X, Y) = −0.0131)
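Parts 1 and 2 can be confirmed numerically: with K = 3/380000 the density should integrate to 1 over the square, and integrating over [20, 26] × [20, 26] should reproduce the 0.3024. A Python sketch (grid size and helper name are choices made here):

```python
# Numerical check of K and P[both tires underfilled] for
# f(x, y) = K (x^2 + y^2) on [20, 30] x [20, 30].
K = 3 / 380000

def g(x, y):
    return K * (x * x + y * y)

def dbl_midpoint(g, ax, bx, ay, by, m=200):
    # Midpoint rule on an m-by-m grid; error is negligible for this integrand.
    hx, hy = (bx - ax) / m, (by - ay) / m
    s = 0.0
    for i in range(m):
        x = ax + (i + 0.5) * hx
        for j in range(m):
            y = ay + (j + 0.5) * hy
            s += g(x, y)
    return s * hx * hy

total_mass = dbl_midpoint(g, 20, 30, 20, 30)    # should be 1
p_under = dbl_midpoint(g, 20, 26, 20, 26)       # both pressures below 26 psi
```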

Populations, Samples, Parameters, and Statistics

At the heart of statistics lie the ideas of inference that enable the investigator to argue from the particular observations in a sample to the general case in a population.

Example A statistics professor was interested in how fast people drive on I-75. He was told by an authorized agency that the driving speed is normally distributed with mean 70 m/h and standard deviation 10 m/h, and he was given the full data. Here are the full data.

 [1] 67 60 70 76 62 77 73 53 76 80
[11] 66 67 65 80 67 80 60 71 65 59
[21] 73 65 71 55 75 75 87 55 69 72
[31] 87 86 91 77 72 68 56 58 71 75
[41] 66 69 62 70 84 74 71 82 73 83
[51] 80 65 55 69 64 70 68 71 61 77
[61] 78 77 68 68 51 57 58 77 75 67
[71] 74 53 73 59 82 79 51 79 61 77
[81] 63 74 64 65 54 85 71 80 45 73
[91] 61 66 79 89 70 80 72 61 64 71
......

[Histogram of the driving speeds with the N(70, 100) density curve overlaid.]

Populations, Samples, Parameters, and Statistics (cont’d)

Example In his large statistics class, he assigned each student to observe 10 cars randomly on I-75. Here are the data from his students.

 Student  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  mean  sd
    1     66  53  70  58  68  69  56  63  80  73    66    8
    2     71  83  49  59  79  92  72  78  81  63    73   13
    3     77  54  67  61  69  56  83  68  86  82    70   11
    4     72  63  77  88  78  64  70  66  73  70    72    8
    5     77  74  72  68  49  66  76  78  78  78    72    9
    6     71  69  63  66  64  55  71  83  82  63    69    9
    7     73  81  76  66  74  60  81  73  73  75    73    6
    8     69  68  48  64  87  61  84  71  66  65    68   11
    9     76  75  80  62  67  72  82  79  68  86    75    7
   10     77  81  65  80  76  71  64  82  72  83    75    7
   11     72  80  72  64  67  73  66  72  80  54    70    8
   12     81  72  59  76  70  67  57  50  67  59    66   10
  ......

1 What have you observed? Why?

2 The 10 observations from one student are not exactly the same as those from other students.

3 The mean observations from different students are mostly different from one another and are different from the population mean of 70 m/h.

4 The standard deviations from different students are mostly different from one another and are different from the population standard deviation of 10 m/h.

5 How about the quartiles and other sample statistics?

Populations, Samples, Parameters, and Statistics (cont’d)

Definition

1 A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter such as X, Y, Z, etc. A lowercase letter is used to represent the calculated or observed value of the statistic.

2 A numerical feature of a population is called a parameter.

3 A set of random variables X1, X2, ..., Xn constitutes a random sample of size n from an infinite population if each Xi is a random variable whose distribution is determined by f(x) and the n random variables are independent; that is, the Xi's are independent and identically distributed (i.i.d.).

4 A set of random variables X1, X2, ..., Xn constitutes a random sample of size n from a finite population of size N if its values are chosen so that each subset of n of the N elements of the population has the same probability of being selected. In practice, if n/N ≤ 0.05 (at most 5% of the population is sampled), the sample can be treated approximately as random.

Sampling Distributions of Statistics

Since a statistic is a random variable, its probability distribution is called the sampling distribution. There are two general methods to obtain information about a statistic’s sampling distribution. One method involves calculations based on probability rules, and the other involves carrying out a simulation experiment.

Example A large consulting firm manages three kinds of projects with profits of 0, 3, and 12 thousand dollars. The population distribution of the profit is as follows.

   x     0    3    12
 p(x)   0.2  0.3  0.5

Let X1, X2, and X3 be the profits from three randomly selected projects, respectively. Then X1, X2, and X3 is a random sample of size three from the population distribution, where E(X) = 6.9 and V(X) = 27.09. (a) List all the possible sample values and determine their probabilities. (b) Determine the sampling distribution of the sample mean. (c) Determine the sampling distribution of the sample median.
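The probability-rules method here is brute-force enumeration of all 3³ = 27 outcomes; a Python sketch that builds both sampling distributions at once:

```python
# Enumerate all 27 samples (X1, X2, X3) from the project-profit distribution
# and tabulate the sampling distributions of the sample mean and median.
from itertools import product

pmf = {0: 0.2, 3: 0.3, 12: 0.5}
mean_dist, median_dist = {}, {}
for sample in product(pmf, repeat=3):
    prob = pmf[sample[0]] * pmf[sample[1]] * pmf[sample[2]]  # independence
    xbar = sum(sample) / 3
    med = sorted(sample)[1]          # middle value of the three
    mean_dist[xbar] = mean_dist.get(xbar, 0.0) + prob
    median_dist[med] = median_dist.get(med, 0.0) + prob

E_mean = sum(x * p for x, p in mean_dist.items())
V_mean = sum(x * x * p for x, p in mean_dist.items()) - E_mean ** 2
E_med = sum(x * p for x, p in median_dist.items())
```

The enumeration reproduces E(X̄) = 6.9 = µ, V(X̄) = 9.03 = 27.09/3, and E of the median = 7.188 ≠ µ.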

Sampling Distributions of Statistics (cont’d)

     Profits      Sample  Sample
 X1   X2   X3      Mean   Median   Probability
  0    0    0        0       0        0.008
  0    0    3        1       0        0.012
  0    0   12        4       0        0.020
  0    3    0        1       0        0.012
  0    3    3        2       3        0.018
  0    3   12        5       3        0.030
  0   12    0        4       0        0.020
  0   12    3        5       3        0.030
  0   12   12        8      12        0.050
  3    0    0        1       0        0.012
  3    0    3        2       3        0.018
  3    0   12        5       3        0.030
  3    3    0        2       3        0.018
  3    3    3        3       3        0.027
  3    3   12        6       3        0.045
  3   12    0        5       3        0.030
  3   12    3        6       3        0.045
  3   12   12        9      12        0.075
 12    0    0        4       0        0.020
 12    0    3        5       3        0.030
 12    0   12        8      12        0.050
 12    3    0        5       3        0.030
 12    3    3        6       3        0.045
 12    3   12        9      12        0.075
 12   12    0        8      12        0.050
 12   12    3        9      12        0.075
 12   12   12       12      12        0.125

The sampling distribution of X̄ is

   x̄      f(x̄)     x̄ f(x̄)    x̄² f(x̄)
   0      0.008    0.000      0.000
   1      0.036    0.036      0.036
   2      0.054    0.108      0.216
   3      0.027    0.081      0.243
   4      0.060    0.240      0.960
   5      0.180    0.900      4.500
   6      0.135    0.810      4.860
   8      0.150    1.200      9.600
   9      0.225    2.025     18.225
  12      0.125    1.500     18.000
 Total    1.000    6.9       56.64

Therefore, E(X̄) = 6.9 and V(X̄) = 56.64 − 6.9² = 9.03.

Similarly, the sampling distribution of the median X̃ is

   x̃      f(x̃)     x̃ f(x̃)    x̃² f(x̃)
   0      0.104    0.000      0.000
   3      0.396    1.188      3.564
  12      0.500    6.000     72.000
 Total    1.000    7.188     75.564

Therefore, E(X̃) = 7.188 ≠ 6.9 and V(X̃) = 75.564 − 7.188² = 23.897.

Sampling Distributions of Statistics (cont’d)

Example Service time for a certain type of bank transaction is a random variable having an exponential distribution with parameter λ, i.e., the density function is

f(x) = λe^(−λx) for x ≥ 0 (λ > 0), and 0 elsewhere.

Suppose X1 and X2 are service times for two different customers, assumed independent of each other. The total service time T = X1 + X2 for the two customers is a statistic. The cumulative distribution function of T is, for t ≥ 0,

FT(t) = P(X1 + X2 ≤ t) = ∫∫_{(x1,x2): x1+x2 ≤ t} f(x1, x2) dx1 dx2
      = ∫0^t ∫0^(t−x1) λe^(−λx1) λe^(−λx2) dx2 dx1 = 1 − e^(−λt) − λt e^(−λt).

The probability density function of T is obtained by differentiating FT(t):

fT(t) = λ² t e^(−λt) for t ≥ 0 (λ > 0), and 0 for t < 0.

This is a gamma distribution with α = 2 and β = 1/λ. The sample mean X̄ = T/2 has the density

fX̄(x̄) = 4λ² x̄ e^(−2λx̄) for x̄ ≥ 0 (λ > 0), and 0 for x̄ < 0.

It can be verified that E(X̄) = µ = 1/λ and V(X̄) = 1/(2λ²) = σ²/2.

Sampling Distributions of Statistics (cont’d)

The second method of obtaining information about a statistic’s sampling distribution is to perform a simulation experiment. This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. The following characteristics of the simulation experiment must be specified:

1 the statistic of interest (X̄, S², etc.);

2 the population distribution;

3 the sample size n;

4 the number of samples k.

The general procedure is to obtain k different random samples, each of size n, from the designated population distribution. For each such sample, calculate the value of the statistic and construct a histogram of the k calculated values. This histogram gives the approximate sampling distribution of the statistic. The larger the value of k, the better the approximation will tend to be. In practice, k = 500 or 1000 is usually enough if the statistic is "fairly simple"; for a complicated statistic, k = 10,000 is commonly used.

Sampling Distribution of Sample Mean

Example Consider a population having a uniform distribution that places a probability of 0.1 on each of the integers 0 to 9. It can be shown that the mean is 4.5 and the standard deviation is 2.872. Take 100 random samples of size 5 from the population and calculate the sample mean X̄ of each. The relative frequency table and histogram are as follows.

 Mean values  Frequency  Relative frequency
 [0.5, 1.5)       1            0.01
 [1.5, 2.5)       4            0.04
 [2.5, 3.5)      15            0.15
 [3.5, 4.5)      33            0.33
 [4.5, 5.5)      26            0.26
 [5.5, 6.5)      14            0.14
 [6.5, 7.5)       7            0.07

[Histogram of the 100 sample means, with the N(4.5, 2.872²/5) density curve and the population pmf (dashed) overlaid.]

set.seed(12345)
x<-rep(0,100)
for (i in 1:100) x[i]<-mean(sample(seq(0,9,1),5,replace=TRUE))
hist(x,breaks=seq(-0.5,9.5,1),freq=FALSE, main="Histogram of the Mean",
     xlim=c(0,10),xlab=" ",ylab="Probability",ylim=c(0,0.4))
lines(seq(0,10,0.01),dnorm(seq(0,10,0.01),4.5,1.2845))
segments(0,0.1,9,0.1,lty=2)
segments(0,0,0,0.1,lty=2)
segments(9,0,9,0.1,lty=2)

Sampling Distribution of Sample Mean (cont’d)

Example Platelets are solid components of the blood which are essential in the coagulation and clotting processes. Mean platelet volume (MPV) is an indicator of platelet function. Cameron, H.A., Phillips, R., Ibbotson, R.M., and Carson, P.H. (1983). Platelet size in myocardial infarction. British Medical Journal, 287(6390): 449-451, suggested that platelet volumes in individuals with no history of serious heart problems have a normal distribution with mean 8.25 fL (femtoliters) and standard deviation 0.75 fL. An elevated MPV also appears to be associated with increased platelet coagulability and an increased risk of heart attack and stroke.

[Histograms of sample means from four simulation experiments, each with 500 replications, for sample sizes 5, 10, 20, and 30.]

1 The sample mean distributions come from four simulation experiments with 500 replications each, for sample sizes 5, 10, 20, and 30.

2 The shape of each histogram looks like a normal curve.

3 Even though the spread of the histogram changes with the sample size, its center is approximately at 8.25.

4 The spread of the histogram is inversely related to the sample size.

Sampling Distribution of Sample Mean (cont’d)

Example Consider a population having a log-normal distribution with µ = 3 and σ² = 0.16.

set.seed(12345)
par(mfrow=c(2,2))
n<-30
mu<-exp(3+0.16/2)
sigma<-sqrt((exp(2*3+0.16))*(exp(0.16)-1))
x<-rep(0,500)
for (i in 1:500) x[i]<-mean(rlnorm(n,3,0.4))
hist(x,freq=FALSE,main=paste("Sample size =","",n),xlim=c(0,50),xlab=" ",ylim=c(0,0.25))
lines(seq(0,50,0.1),dnorm(seq(0,50,0.1),mu,sigma/sqrt(n)),col="red")
lines(seq(0,50,0.1),dlnorm(seq(0,50,0.1),3,0.4),col="blue")

[Histograms of 500 sample means from the LN(3, 0.4) population for sample sizes 5, 10, 20, and 30, with the normal approximation (red) and the population density (blue) overlaid.]

The Law of Large Numbers

Let X1, X2, ..., Xn be a simple random sample from a population with finite mean µ and finite variance σ². As n → ∞, X̄ converges to µ in certain ways.

[Plots of the sample mean against the sample size for samples from LN(3, 0.4) and Bin(10, 0.7); in each case the sample mean settles down around the population mean (red line).]

#sample from lognormal
mu<-3;sigma<-0.4
n<-seq(5,5000,5);x.bar<-rep(0,length(n))
for(i in 1:length(n)){
  x.bar[i]<-mean(rlnorm(n[i],mu,sigma))
}
plot(n,x.bar,xlab="Sample size",ylab="Sample mean")
abline(h=exp(3+0.16/2),col="red");title("LN(3,0.4)")

#sample from binomial
n<-seq(5,5000,5);x.bar<-rep(0,length(n))
for(i in 1:length(n)){
  x.bar[i]<-mean(rbinom(n[i],size=10,prob=0.7))
}
plot(n,x.bar,xlab="Sample size",ylab="Sample mean")
abline(h=7,col="red");title("Bin(10,0.7)")

The Sampling Distribution of a Sample Mean

Example A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected, finished tune-up is given by

   x     f(x)    x f(x)     x² f(x)
  40     0.2       8.0       320.0
  45     0.3      13.5       607.5
  50     0.5      25.0      1250.0
 Total   1.0   µ = 46.5     2177.5

Therefore the variance is σ² = 2177.5 − 46.5² = 15.25.

On a particular day only two servicing jobs involve tune-ups. Let X1 be the revenue from the first tune-up and X2 be the revenue from the second. Suppose that X1 and X2 are independent, each with the probability distribution given above (so that X1 and X2 constitute a random sample from the distribution). Find the probability distribution of the sample mean X̄.

The Sampling Distribution of a Sample Mean (cont’d)

Example

The joint distribution of X1 and X2 is

 x1   x2   P[X1 = x1, X2 = x2]    x̄
 40   40    0.2 × 0.2 = 0.04    40.0
 40   45    0.2 × 0.3 = 0.06    42.5
 40   50    0.2 × 0.5 = 0.10    45.0
 45   40    0.3 × 0.2 = 0.06    42.5
 45   45    0.3 × 0.3 = 0.09    45.0
 45   50    0.3 × 0.5 = 0.15    47.5
 50   40    0.5 × 0.2 = 0.10    45.0
 50   45    0.5 × 0.3 = 0.15    47.5
 50   50    0.5 × 0.5 = 0.25    50.0

The distribution of X̄ is

   x̄      f(x̄)     x̄ f(x̄)     x̄² f(x̄)
  40.0    0.04      1.60       64.000
  42.5    0.12      5.10      216.750
  45.0    0.29     13.05      587.250
  47.5    0.30     14.25      676.875
  50.0    0.25     12.50      625.000
 Total    1.00     46.5      2169.875

Therefore, E(X̄) = 46.5 = µ, V(X̄) = 7.625 = 15.25/2, and E(S²) = 15.25 = σ². It can be verified that if four tune-ups had been done on the day of interest, the sample average revenue X̄ would have the following distribution:

   x̄     40      41.25   42.5    43.75   45      46.25   47.5    48.75   50
 f(x̄)  0.0016  0.0096  0.0376  0.0936  0.1761  0.2340  0.2350  0.1500  0.0625

E(X̄) = 46.5 = µ, V(X̄) = 3.8125 = 15.25/4, and E(S²) = 15.25 = σ².
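The n = 4 distribution is easy to verify by enumerating all 3⁴ = 81 outcomes; a Python sketch:

```python
# Verify the n = 4 sampling distribution of the average tune-up revenue by
# enumerating every 4-tuple of revenues and accumulating its probability.
from itertools import product

pmf = {40: 0.2, 45: 0.3, 50: 0.5}
dist = {}
for sample in product(pmf, repeat=4):
    prob = 1.0
    for v in sample:
        prob *= pmf[v]
    xbar = sum(sample) / 4
    dist[xbar] = dist.get(xbar, 0.0) + prob

E_xbar = sum(x * p for x, p in dist.items())
V_xbar = sum(x * x * p for x, p in dist.items()) - E_xbar ** 2
```

For instance, f(45) gathers the outcomes {40,40,50,50}, {40,45,45,50}, and {45,45,45,45}: 6(0.04)(0.25) + 12(0.2)(0.09)(0.5) + 0.3⁴ = 0.06 + 0.108 + 0.0081 = 0.1761, matching the table.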

The Sampling Distribution of a Sample Mean (cont’d)

Let X1, X2, ..., Xn be a random sample of size n from a population with mean µ and standard deviation σ.

1 E[X̄] = µ = population mean,

2 V[X̄] = σ²/n = population variance / sample size,

3 SD[X̄] = σ/√n = population standard deviation / √(sample size).

4 The sample total T = X1 + X2 + ··· + Xn has mean nµ and standard deviation √n σ.

5 The sample mean of a random sample from a normal distribution N(µ, σ²) has a normal distribution with mean µ and standard deviation σ/√n, i.e., X̄ ~ N(µ, σ²/n).

6 If X1, X2, ..., Xn are independent, normally distributed random variables with E(Xi) = µi and V(Xi) = σi², then any linear combination a1X1 + a2X2 + ... + anXn has a normal distribution with mean a1µ1 + a2µ2 + ... + anµn and variance a1²σ1² + a2²σ2² + ... + an²σn². In particular, X1 − X2 ~ N(µ1 − µ2, σ1² + σ2²).

The Sampling Distribution of a Sample Mean (cont’d)

Example The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed random variable with µ = 1.5 minutes and σ = 0.35 minute. Suppose five rats are selected. Let X1, X2, X3, X4, and X5 denote their times in the maze. Then the Xi's are a random sample from the normal distribution N(1.5, 0.35²).

1 What is the probability that the total time T = X1 + X2 + X3 + X4 + X5 for the five rats to find their ways through a maze is between 6 and 8 minutes?

2 What is the probability that the average time X¯ for the five rats is at most 2 minutes?

38/42 The Sampling Distribution of a Sample Mean (cont'd)

1 T ∼ N(5µ, 5σ²) = N(7.5, 0.6125), so P[6 ≤ T ≤ 8] = 0.7115.

2 X̄ ∼ N(µ, σ²/5) = N(1.5, 0.35²/5), so P[X̄ ≤ 2] = 0.9993.

> pnorm(8,7.5,sqrt(0.6125))-pnorm(6,7.5,sqrt(0.6125))
[1] 0.7109059
> pnorm(2,1.5,0.35/sqrt(5))
[1] 0.9992993
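As a cross-check of the R output, the same two probabilities can be computed in Python, writing the normal CDF in terms of the error function (this plays the role of R's pnorm):

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function (same role as R's pnorm)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# T = X1 + ... + X5 ~ N(5*1.5, 5*0.35^2) = N(7.5, 0.6125)
p_total = norm_cdf(8, 7.5, sqrt(0.6125)) - norm_cdf(6, 7.5, sqrt(0.6125))

# Sample mean: X-bar ~ N(1.5, 0.35^2/5)
p_mean = norm_cdf(2, 1.5, 0.35 / sqrt(5))

print(round(p_total, 4), round(p_mean, 4))   # 0.7109 0.9993
```

The small gap between 0.7109 here and 0.7115 on the slide comes from rounding z to two decimals when using a standard normal table.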

38/42 The Central Limit Theorem (CLT)

When sampling from a nonnormal population, the distribution of the sample mean depends on the particular form of the population distribution.

[Figure: distributions of the sample mean X̄. Left panels: sample means from a χ² population with mean 3 and variance 6, compared with the normal curves N(3, 6), N(3, 6/5), N(3, 6/15), and N(3, 6/35) as the sample size grows. Right panel: densities of X̄ for samples of size n = 1, 4, 16 from N(5, 400). In every case the sampling distribution concentrates around the population mean and looks increasingly normal as n increases.]

39/42 The Central Limit Theorem (cont'd)

Let X1, X2, ..., Xn be a random sample from an arbitrary population with mean µ and standard deviation σ. If n is sufficiently large, the distribution of X̄ is approximately normal with mean µ and standard deviation σ/√n, i.e., (X̄ − µ)/(σ/√n) ∼ N(0, 1) approximately.

The sample total T = X1 + X2 + ... + Xn has approximately a normal distribution with mean nµ and variance nσ², i.e., (T − nµ)/(σ√n) ∼ N(0, 1) approximately.

When the population distribution is a Bernoulli distribution with success probability p, T ∼ Bin(n, p) and (T − np)/√(np(1 − p)) ∼ N(0, 1) approximately; X̄ = p̂ is the sample proportion and (p̂ − p)/√(p(1 − p)/n) ∼ N(0, 1) approximately. This is called the de Moivre–Laplace (Abraham de Moivre and Pierre-Simon de Laplace) central limit theorem (the name "central limit theorem" was coined by George Pólya in 1920).
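The de Moivre–Laplace approximation can be checked numerically. In the Python sketch below, n = 100 and p = 0.3 are illustrative choices (not from the slides); it compares the exact binomial probability of an interval with its continuity-corrected normal approximation:

```python
from math import comb, erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 100, 0.3                      # illustrative values
mu, sd = n * p, sqrt(n * p * (1 - p))

# Exact P[25 <= T <= 35] from the binomial pmf
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(25, 36))

# de Moivre-Laplace normal approximation (with continuity correction)
approx = norm_cdf((35.5 - mu) / sd) - norm_cdf((24.5 - mu) / sd)

print(round(exact, 4), round(approx, 4))   # the two agree to about 2 decimals
```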

Let X1, X2, ..., Xn be a random sample from a distribution for which only positive values are possible, that is, P[Xi > 0] = 1. Then if n is sufficiently large, the product Y = X1 X2 ··· Xn has approximately a lognormal distribution.

The larger the value of n, the better the normal approximation. A practical difficulty in applying the CLT is knowing when n is sufficiently large. Usually n > 30 is used as a rule of thumb; however, there are population distributions for which even an n of 40 or 50 is not enough, and there are some for which an n much less than 30 will suffice.

40/42 The Central Limit Theorem (cont'd)

Example The population distribution of the gripping strength of industrial workers is known to have mean 110 and standard deviation 10. For a random sample of 75 workers, what is the probability that the sample mean gripping strength will be (1) between 109 and 112? (2) greater than 111?

> pnorm(112,110,10/sqrt(75))-pnorm(109,110,10/sqrt(75))
[1] 0.7651296
> 1-pnorm(111,110,10/sqrt(75))
[1] 0.1932381
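The same CLT probabilities can be reproduced without R; a Python sketch using the error-function form of the normal CDF:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function (same role as R's pnorm)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

sd = 10 / sqrt(75)                       # standard deviation of X-bar, n = 75
p_between = norm_cdf(112, 110, sd) - norm_cdf(109, 110, sd)
p_greater = 1 - norm_cdf(111, 110, sd)
print(round(p_between, 4), round(p_greater, 4))   # 0.7651 0.1932
```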

Example When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the probability that the average amount of impurity is between 3.5 g and 3.8 g?

> pnorm(3.8,4,1.5/sqrt(50))-pnorm(3.5,4,1.5/sqrt(50))
[1] 0.1636782
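Note that the CLT is doing the work here: the population shape is unspecified, and only µ = 4.0 and σ = 1.5 are known. As an illustration, the Monte Carlo sketch below *assumes* a gamma-shaped population with that mean and standard deviation and confirms that the simulated probability is close to the normal-approximation value 0.1637:

```python
import random

random.seed(1)

# ASSUMPTION for illustration only: a gamma population with mean 4.0
# and sd 1.5 (the slides specify only the mean and sd, not the shape).
shape = (4.0 / 1.5) ** 2        # k = (mu/sigma)^2 ~ 7.11
scale = 1.5 ** 2 / 4.0          # theta = sigma^2/mu = 0.5625

n, reps = 50, 40_000
hits = 0
for _ in range(reps):
    xbar = sum(random.gammavariate(shape, scale) for _ in range(n)) / n
    hits += 3.5 <= xbar <= 3.8

print(round(hits / reps, 3))    # close to the CLT value 0.1637
```

Any other population with the same mean and standard deviation would give a simulated probability close to the same normal approximation, which is exactly the point of the theorem.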

41/42 Other Sampling Distributions

1 Let X1, X2, ..., Xn be a random sample of size n (where n is small) from a normal population N(µ, σ²), and let X̄ = (1/n) Σ Xi and S² = (1/(n−1)) Σ (Xi − X̄)². Then T = (X̄ − µ)/(S/√n) has a Student's t distribution with n − 1 degrees of freedom. Student, 1908. The probable error of a mean. Biometrika 6(1), 1–25. See Stigler, S.M., 1984, The American Statistician, 38(2), 134–135 for a proof.

2 Let X1, X2, ..., Xn be a random sample from a normal population N(µ, σ²), with X̄ and S² as above. Then (n−1)S²/σ² = Σ (Xi − X̄)²/σ² has a χ² distribution with n − 1 degrees of freedom. Pearson, K., 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling. Philosophical Magazine Series 5, 50, 157–175.

3 Let X1, X2, ..., Xn1 be a random sample from a normal population N(µ1, σ1²) with X̄ = (1/n1) Σ Xi and S1² = (1/(n1−1)) Σ (Xi − X̄)², and let Y1, Y2, ..., Yn2 be a random sample from a normal population N(µ2, σ2²) with Ȳ = (1/n2) Σ Yj and S2² = (1/(n2−1)) Σ (Yj − Ȳ)². Then F = (σ2² S1²)/(σ1² S2²) has an F distribution with parameters ν1 = n1 − 1 and ν2 = n2 − 1, denoted Fn1−1,n2−1 or F(n1 − 1, n2 − 1). Snedecor, G.W., 1934. Calculation and Interpretation of Analysis of Variance and Covariance. Ames, IA: Collegiate Press.
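The χ² result in item 2 can be illustrated by simulation. This Python sketch (the parameter values are arbitrary choices for illustration) checks that (n−1)S²/σ² has the mean n − 1 and variance 2(n − 1) of a χ² distribution with n − 1 degrees of freedom:

```python
import random
from statistics import mean, variance

random.seed(0)

# Illustrative parameters: samples of size n = 5 from N(10, 2^2).
mu, sigma, n, reps = 10.0, 2.0, 5, 50_000

stats = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = variance(sample)                  # sample variance (n-1 divisor)
    stats.append((n - 1) * s2 / sigma**2)  # should behave like chi-square(4)

# chi-square with 4 df has mean 4 and variance 8
print(round(mean(stats), 2), round(variance(stats), 2))   # ~4 and ~8
```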

42/42