<<

CHAPTER 8

FUNDAMENTAL DISTRIBUTIONS AND DESCRIPTIONS

8.1 Random Sampling pling procedure, it is desirable to choose a random in the sense that the observations are made The basic idea of the is that we independently and at random. are allowed to draw inferences or conclusions about a Random Sample population based on the statistics computed from the sample data so that we could infer something about Let X1,X2,...,Xn be n independent random variables, the parameters and obtain more information about the each having the same f (x). population. Thus we must make sure that the samples Define X1,X2,...,Xn to be a random sample of size must be good representatives of the population and n from the population f (x) and write its joint proba- pay attention on the sampling bias and variability to bility distribution as ensure the validity of statistical inference. f (x1,x2,...,xn) = f (x1) f (x2) f (xn). ···

8.2 Some Important Statistics

It is important to measure the center and the variabil- ity of the population. For the purpose of the inference, we study the following measures regarding to the cen- ter and the variability.

8.2.1 Location Measures of a Sample

The most commonly used statistics for measuring the center of a set of data, arranged in order of mag- nitude, are the sample , sample , and sample . Let X1,X2,...,Xn represent n random variables.

Sample Mean To calculate the , or mean, add all values, then Bias divide by the of individuals. n Any sampling procedure that produces inferences that X1 + X2 + + Xn 1 consistently overestimate or consistently underestimate X = ··· = ∑ Xi n n i=1 some characteristic of the population is said to be bi- ased. where X is the special symbol of the sample mean and 1 n x = x denotes its value, or the realization of X. To eliminate any possibility of bias in the sam- ∑ i n i=1 38 Chapter 8. Fundamental Sampling Distributions and Data Descriptions

NOTE. The mean is the balance point. It is the “center The sample “S2” is used to describe the vari- of mass”. ation around the mean. We use

EXAMPLE 8.1. The weights of a group of students (in 2 1 2 s = (xi x) lbs) are given below: n 1 ∑ − − 1 (∑x)2 135 105 118 163 172 183 122 150 121 162 = ∑x2 n 1 " − n # Find the mean. If another student joins in the group and − n∑(x2) (∑x)2 his weight is 250 lbs, what would be the new mean? = − n(n 1) − Sample Median to denote the realization or the computed values of S2. The number such that half of the observations are smaller and half are larger, i.e., the midpoint of a dis- Sample tribution. The sample standard deviation is the squared root of

x(n+1)/2 if n is odd the sample variance. x˜ = 1 ( xn/2 + xn/2+1 if n is even 2 S = √S2  EXAMPLE 8.2. The weights of a group of students (in and lbs) are given below: 1 2 s = (xi x) n 1 ∑ − 135 105 118 163 172 183 122 150 121 162 r −

Find the median. If another student joins in the group NOTE. Properties of Standard Deviation and his weight is 250 lbs, what would be the new me- dian? s measures spread about the mean and should be • used only when the mean is the measure of center.

Sample Mode s = 0 only when all observations have the same • The mode of a data set is the value that occurs most value and there is no spread. Otherwise, s > 0. frequently. s gets larger, as the observations become more • spread out about their mean. The cases are unimodal, bimodal, multimodal and no mode. The mode is/are the value(s) whose s has the same units of measurement as the origi- frequencies are the largest (the peaks). • nal observations. EXAMPLE 8.3. The weights of three group of students NOTE. The standard deviation of a population is de- (in lbs) are given below: ∑(x µ)2 fined by σ = − , where N is the population N (a) 135,105,118,163,172,183,122,150 size and µ is populationr mean. Be careful with the de- nominator inside square-root is N, instead of N 1. (b) 135,105,118,163,172,183,122,135 − (c) 135,135,118,118,122,118,122,135 EXAMPLE 8.4. Calculate the sample variance and the sample standard deviation of the following set of data: Find the mode for each group. 0 1 2 3 9 − − 8.2.2 Variability Measures of a Sample Sample The most commonly used statistics for measuring the The sample range R of a data set is defined as center of a set of data, arranged in order of magnitude,

are the sample variance, sample standard deviation, R = Xmax Xmin − and sample range. Let X1,X2,...,Xn represent n ran- dom variables. EXAMPLE 8.5. Refer to Example 8.4. Find the sample Sample Variance range.

STAT-3611 Lecture Notes 2015 Fall X. Li Section 8.4. of and the 39

8.3 Sampling Distributions The variation of X is much smaller than that of the • population. The standard deviation of X decreases as the sample size n increases. Sampling Distribution The above results do NOT require any assump- In general, the sampling distribution of a given • tions on the shape of the population. However, a is the distribution of the values taken by the statistic random sample is a must. in all possible samples of the same size form the same population. EXAMPLE 8.7. The mean and standard deviation of the strength of a packaging material are 55 kg and 6 kg, re- In other words, if we repeatedly collect samples spectively. A quality manager takes a random sample of of the same sample size from the population, compute specimens of this material and tests their strength. If the the statistics (mean, standard deviation, proportion), manager wants to reduce the standard deviation of X to and then draw a of those statistics, the dis- 1.5 kg, how many specimens should be tested? tribution of that histogram tends to have is called the sample distribution of that statistics (mean, standard EXAMPLE 8.8. A soft-drink machine is regulated so deviation, proportion). that the amount of drink dispensed 240 milliliters with a standard deviation of 15 milliliters. Periodically, NOTE. The statistical applets are good tools to study the machine is checked by taking a sample of 40 drinks the sampling distribution. Check out the Rice Univer- and computing the average content. If the mean of the sity Applets at http://onlinestatbook.com/stat_ 40 drinks is a value within the 2 , the sim/sampling_dist/index.html. µX σX machine is thought to be operating satisfactorily;± other- wise, adjustments are made. The company official found the mean of 40 drinks to be x = 236 milliliters and con- 8.4 Sampling Distribution of Means cluded that the machine needed no adjustment. Was this and the Central Limit Theorem a reasonable decision?

8.4.1 Sampling Distribution of Sample Means Sampling Distribution of Sample Means from a from a Normal Population Normal Population 1 n Theorem. Let X = ∑ Xi be the sample mean of a Mean and Standard Deviation of a Sample Mean n i=1 random sample of size n drawn from a normal popu- Theorem. Let X be the sample mean of a random sam- lation having mean µ and standard deviation σ, then X ple of size n drawn from a population having mean µ and follows an exact with mean µ and standard deviation σ, then the mean of X is standard deviation σ/√n. That is,

µX = µ Xi N(µ, σ) = X N µ, σ/√n . and the standard deviation of X is ∼ ⇒ ∼ σ  σ = X √n EXAMPLE 8.9. Prove the above theorem. NOTE. One of the essential assumptions is a ran- EXAMPLE 8.6. Prove the above theorem. dom sample• .

The distribution of X has the EXACTLY normal • distribution if the random sample is from a nor- mal population.

EXAMPLE 8.10. The contents of bottles of beer vary according to a normal distribution with mean µ = 341 ml and standard deviation σ = 3 ml.

(a) What is the probability that the content of a ran- domly selected bottle is less than 339 ml?

NOTE. The sample mean X is an unbiased esti- (b) What is the probability that the average content of mator• of the population mean µ and is less vari- the bottles in a 12-pack of beer is less than 339 able than a single observation. ml?

X. Li 2015 Fall STAT-3611 Lecture Notes 40 Chapter 8. Fundamental Sampling Distributions and Data Descriptions

EXAMPLE 8.11. A patient is classified as having ges- approximate probability statement concerning the sam- tational diabetes if the glucose level is above 140 mil- ple mean, without knowledge of the shape of the popu- ligrams per deciliter (mg/dl) one hour after a sugary drink lation distribution. is ingested. Sheila’s measured glucose level one hour after ingesting the sugary drink varies according to the Again, one of the essential assumptions is a ran- normal distribution with µ = 125 mg/dl and σ = 10 • dom sample. mg/dl. The distribution of X has the approximately nor- • mal (a) If a single glucose measurement is made, what is distribution if the random sample is from a other than the probability that Sheila is diagnosed as having population normal. gestational diabetes? How large a sample size? Usually, it would safe • to apply the CLT if n 30. It also depends on the (b) If measurements are made on three separate days population distribution,≥ however. More observa- and the mean result is compared with the criterion tions are required if the population distribution is 140 mg/dl, what is the probability that Sheila is far from normal. diagnosed as having gestational diabetes? EXAMPLE 8.12. The time a family physician spends (c) What is the level L such that there is probability seeing a patient follows some right-skewed distribution only 5% that the mean glucose level of three test with a mean of 15 minutes and a standard deviation of results fall above L for Sheila’s glucose level dis- 11.6 minutes. tribution. (a) Can you calculate the probability that the doctor 8.4.2 The Central Limit Theorem (CLT) spends less than 12 minutes with the next patient she sees? If so, do it. If not, explain why. (b) What is the probability that the doctor spends an average time between 13 and 18 minutes with her 30 patients of the day? (c) One day, 35 patients have an appointment to see the doctor. What is the probability that she will have to work overtime, beyond her 8-hour shift?

8.4.3 Sampling Distribution of the Differ- ence between Two Means

Suppose that we have two populations, the first with mean µ1 and standard deviation σ1, and the second with mean µ2 and standard deviation σ2. We take a random sample of size n from the first population and Theorem (Central Limit Theorem). If X is the mean 1 measure some variable X , and take an independent of a random sample of size n taken from a population 1 random sample of size n from the second population with mean µ and finite variance σ 2, then 2 and measure the value of the some variable X2. X µ Z = − n(z;0,1) By the Central Limit Theorem, we know that, if σ/√n → n1 and n2 are sufficiently large, as n ∞. X1 · N(µ1, σ1/√n1), → ∼ In other words, if a random sample of size n is selected and from any population with mean µ and standard devi- X2 · N(µ2, σ2/√n2). ation σ, then ∼ It can also be shown that X is approximately N µ, σ/√n , 2 2  σ1 σ2 when n is sufficiently large. X1 X2 · N µ1 µ2, + . − ∼  − s n1 n2  NOTE. The Central Limit Theorem is important because, for reasonably large sample size, it allows us to make an  

STAT-3611 Lecture Notes 2015 Fall X. Li Section 8.6. t-Distribution 41

Theorem. If independent samples of size n1 and n2 are NOTE (Degrees of Freedom). There are n degrees of drawn at random from two populations, discrete or con- freedom, or independent pieces of information, in the 2 2 tinuous, with means µ1 and µ2 and σ1 and σ2 , random sample from the normal distribution. When the respectively, then the sampling distribution of the dif- data (the values in the sample) are used to compute the ferences of means, X1 X2, is approximately normally mean (i.e., when µ is replaced by x), a degree of free- distributed with mean and− variance given by dom is lost in the estimation of µ. Hence, there are the remaining (n 1) degrees of freedom in the information σ 2 σ 2 − 2 µ = µ µ and σ 2 = 1 + 2 . used to estimate σ . X1 X2 1 2 X X − − 1− 2 n1 n2 2 2 Let χα (ν) be the χ value above which we find So, an area of α under the curve of the chi-squared distri- bution with ν degrees of freedom. That is, X1 X2 (µ1 µ2) Z = − − − · N(0,1) P χ2(ν) > χ2 (ν) = α. 2 2 ∼ α σ1 /n1 + σ2 /n2  q We use table A.5. to find these critical values of the NOTE. If both samples are from the normal popula- chi-squared distribution with ν degrees of freedom. tions, the sampling distribution of X1 X2 will be ex- EXAMPLE 8.15. Find the critical values actly normal, instead of approximate normal.− 2 EXAMPLE 8.13. We take a random sample of five 10- (a) χ0.95(4) year-old boys and four 10-year-old girls and measure 2 (b) χ0.75(22) their heights. Suppose that we know that heights X1 of 10-year old boys follow a normal distribution with mean EXAMPLE 8.16. Find k such that P χ2(12) < k = 0.80. 55.7 inches and standard deviation 2.9 inches, and that EXAMPLE 8.17. Use Table A.5. to give the best esti- heights X2 of 10-year old girls follow a normal distribu- tion with mean 54.1 inches and standard deviation 2.6 mate to each of the following probabilities. inches. What is the probability that the mean height of (a)P χ2(5) 3 the girls in the sample is smaller than the mean height ≥ for the boys in the sample? (b)P χ2(8) > 3.33 EXAMPLE 8.14. A research on bulimia among college (c)P χ2(10) 6.66 women studies the connection between childhood sexual ≤ abuse and a measure of family cohesion (the higher the (d)P χ2(25) > 99.9 score, the greater the cohesion). Assume that sexually abused students have an average family cohesion scale  of 2.8 and a standard deviation of 2.1, while non-abused 8.6 t-Distribution students have the average scale of 4.8 and a standard de- viation of 3.2. What is the probability that a random We have learned that Z X µ (exactly or approxi- sample of 49 non-abused students will have an average = σ/−√n family cohesion scale that is at least 0.5 scores higher mately) follows the standard normal distribution, where than the average scale of a randoms sample of 36 sexu- the data are from a random sample of size n from the ally abused students? What can you conclude? population with mean µ and standard deviation σ. And, it is very likely that both µ and σ are unknown parameters. In practice, it suffices that the distribu- 8.5 Sampling Distribution of S2 tion is symmetric and single-peaked unless the sample is very small. Since most of the simple work in statistical in- Distribution of (n 1)S2/σ 2 − ference focus on the unknown population mean µ, we If S2 is the variance of a random sample of size n taken will need deal with the unknown σ especially when n is from a normal population having the variance σ 2, then not large. It is quite intuitive and natural to estimate the statistic the unknown population standard deviation σ using the sample standard deviation S. 2 n 2 (n 1)S Xi X χ2 = − = − X µ σ 2 ∑ σ 2 We have another statistic T = − as an ana- i=1  S/√n X µ has a chi-squared distribution with ν = n 1 degrees log sample version of Z = − . of freedom. − σ/√n

X. Li 2015 Fall STAT-3611 Lecture Notes 42 Chapter 8. Fundamental Sampling Distributions and Data Descriptions

Student t distribution Because the symmetrically property, t1 α = tα . − − Let X1,X2,...,Xn be independent random variables that We use table A.4. to find these critical values of are all normal with mean µ and standard deviation σ. the t distribution with ν degrees of freedom. 1 n 2 1 n 2 Let X = ∑ Xi and S = ∑ Xi X . Then NOTE. The t table, as well as the χ2 table, gives us n i=1 n 1 i=1 − the − the UPPER tail probabilities, while the z table gives the  X µ lower tail probabilities. T = − S n /√ EXAMPLE 8.18. Find the critical values. has a t-distribution with ν = n 1 degrees of freedom. − NOTE. When n is very large, s is a very good estimate (a) t0.005(5), t0.05(5), t0.5(5), t0.85(5), t0.975(5) of σ, and the corresponding t distributions are very close (b) t0 10(10), t0 20(20), t0 30(30), t0 40(40), t0 60(60) to the normal distribution. The t distributions become . . . . . wider for smaller sample sizes, reflecting the lack of pre- (c) t0.90(10), t0.95(15), t0.99(19) cision in estimating σ from s. EXAMPLE 8.19. Let T(ν) denote the Student t-distribution with ν degrees of freedom. Find k such that

(a)P (T(8) > k) = 0.02. (b)P (T(18) < k) = 0.80. (c)P (T(28) k) = 0.99. ≥ EXAMPLE 8.20. Use Table A.4. to give the best esti- mate to each of the following probabilities.

(a)P (T(5) 1.11) ≥ 248 Chapter 8 Fundamental Sampling Distributions and Data Descriptions (b)P (T(8) < 2.22) Important Properties of the Student t distribution (c)P (T(10) 3.33) Corollary 8.1: Let X1,X2,...,Xn be independentThe t distribution random variables is differentthat are all normalfor different with sample ≥ mean µ and standard• deviationsizes,σ. or Let different degrees of freedom. (d)P (T(15) > 4.44) 1 n 1 n X¯ = X and S2 = (X X¯)2. n i n 1 i − i=1 The t distributioni=1 has the same general symmet- NOTE. Clearly, !• − ! ric bellX¯ µ shape as the standard normal distribu- Then the random variable T = − has a t-distribution with v = n 1 degrees S/√n − X µ of freedom. tion, but it reflects the greater variability (with σ/−√n The probability distributionwider of T distributions)was first published inthat 1908 isin a expected paper written with small T = by W. S. Gosset. At the time, Gosset was employed by an Irish brewery that [(n 1)S2/σ 2] samples. − prohibited publication of research by members of its staff. To circumvent this re- n 1 striction, he published his work secretly under the name “Student.” Consequently, r − the distribution of T is usuallyThe calledt distribution the Student t-distribution has a mean or simply of t the=t-0. Z distribution. In deriving• the equation of this distribution, Gosset assumed that the = samples were selected fromThe a normal standard population. deviation Although this of would the seemt distribution to be a varies χ2(n 1) very restrictive assumption,• it can be shown that nonnormal populations possess- n −1 ing nearly bell-shaped distributionswith the will sample still provide size, values but of T that it is approximate greater than 1. − the t-distribution very closely. q As the sample size n gets larger, the t distribu- More generally, What Does the t-Distribution Look• Like? tion gets closer to the standard normal distribu- Theorem. Let Z be a standard normal random variable The distribution of T is similar to the distribution of Z in that they both are symmetric about a meantion. of zero. Both distributions are bell shaped, but the t- and V a chi-squared random variable with ν degrees of distribution is more variable, owing to the fact that the T -values depend on the fluctuations of two quantities, X¯ and S2, whereas the Z-values depend only on the freedom. If Z and V are independent, then the distribu- ¯ changes in X from sampleLet to sample.tα (ν) Thebe distribution the t value of T di aboveffers from which that of Z we find an tion of the random variable T, where in that the variance of T depends on the sample size n and is always greater than 1. Only when thearea sample of sizeαnunderwill the the two curve distributions of the becomet distribution the same. with ν →∞ Z In Figure 8.8, wedegrees show the relationshipof freedom. between That a standard is, normal distribution T (v = ) and t-distributions with 2 and 5 degrees of freedom. The percentage = points∞ of the t-distribution are given in Table A.4. V/ν P(T(ν) > tα (ν)) = α. v ϭ ϱ is given by the density functionp v ϭ 5 v ϭ 2 (ν+1)/2 Γ[(ν + 1)/2] t2 − h(t) = 1 + , ∞ < t < ∞. Γ(ν/2)√πν ν −   This is known as the t-distribution with ν degrees of t Ϫ2 Ϫ1012 t1Ϫ α ϭ Ϫtα 0 tα freedom.

Figure 8.8: The t-distribution curves for v =2, 5, Figure 8.9: Symmetry property (about 0) of the and . t-distribution. ∞ STAT-3611 Lecture Notes 2015 Fall X. Li