Chapter 8 Fundamental Sampling Distributions And

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Random Sampling pling procedure, it is desirable to choose a random sample in the sense that the observations are made The basic idea of the statistical inference is that we independently and at random. are allowed to draw inferences or conclusions about a Random Sample population based on the statistics computed from the sample data so that we could infer something about Let X1;X2;:::;Xn be n independent random variables, the parameters and obtain more information about the each having the same probability distribution f (x). population. Thus we must make sure that the samples Define X1;X2;:::;Xn to be a random sample of size must be good representatives of the population and n from the population f (x) and write its joint proba- pay attention on the sampling bias and variability to bility distribution as ensure the validity of statistical inference. f (x1;x2;:::;xn) = f (x1) f (x2) f (xn): ··· 8.2 Some Important Statistics It is important to measure the center and the variability of the population. For the purpose of the inference, we study the following measures regarding to the center and the variability. 8.2.1 Location Measures of a Sample The most commonly used statistics for measuring the center of a set of data, arranged in order of magnitude, are the sample mean, sample median, and sample mode. Let X1;X2;:::;Xn represent n random variables. Sample Mean To calculate the average, or mean, add all values, then Bias divide by the number of individuals. n Any sampling procedure that produces inferences that X1 + X2 + + Xn 1 consistently overestimate or consistently underestimate X = ··· = ∑ Xi n n i=1 some characteristic of the population is said to be bi- ased. where X is the special symbol of the sample mean and 1 n x = x denotes its value, or the realization of X. To eliminate any possibility of bias in the sam- ∑ i n i=1 38 Chapter 8. Fundamental Sampling Distributions and Data Descriptions NOTE. The mean is the balance point. It is the “center The sample variance \S2" is used to describe the vari- of mass”. ation around the mean. We use EXAMPLE 8.1. The weights of a group of students (in 2 1 2 s = (xi x) lbs) are given below: n 1 ∑ − − 1 (∑x)2 135 105 118 163 172 183 122 150 121 162 = ∑x2 n 1 " − n # Find the mean. If another student joins in the group and − n∑(x2) (∑x)2 his weight is 250 lbs, what would be the new mean? = − n(n 1) − Sample Median to denote the realization or the computed values of S2. The number such that half of the observations are smaller and half are larger, i.e., the midpoint of a dis- Sample Standard Deviation tribution. The sample standard deviation is the squared root of x(n+1)=2 if n is odd the sample variance. x˜ = 1 ( xn=2 + xn=2+1 if n is even 2 S = pS2 EXAMPLE 8.2. The weights of a group of students (in and lbs) are given below: 1 2 s = (xi x) n 1 ∑ − 135 105 118 163 172 183 122 150 121 162 r − Find the median. If another student joins in the group NOTE. Properties of Standard Deviation and his weight is 250 lbs, what would be the new median? s measures spread about the mean and should be • used only when the mean is the measure of center. Sample Mode s = 0 only when all observations have the same • The mode of a data set is the value that occurs most value and there is no spread. Otherwise, s > 0. frequently. s gets larger, as the observations become more • spread out about their mean. The cases are unimodal, bimodal, multimodal and no mode. The mode is/are the value(s) whose s has the same units of measurement as the origi- frequencies are the largest (the peaks). • nal observations. EXAMPLE 8.3. The weights of three group of students NOTE. The standard deviation of a population is de- (in lbs) are given below: ∑(x m)2 fined by s = − , where N is the population N (a) 135;105;118;163;172;183;122;150 size and m is populationr mean. Be careful with the de- nominator inside square-root is N, instead of N 1. (b) 135;105;118;163;172;183;122;135 − (c) 135;135;118;118;122;118;122;135 EXAMPLE 8.4. Calculate the sample variance and the sample standard deviation of the following set of data: Find the mode for each group. 0 1 2 3 9 − − 8.2.2 Variability Measures of a Sample Sample Range The most commonly used statistics for measuring the The sample range R of a data set is defined as center of a set of data, arranged in order of magnitude, are the sample variance, sample standard deviation, R = Xmax Xmin − and sample range. Let X1;X2;:::;Xn represent n random variables. EXAMPLE 8.5. Refer to Example 8.4. Find the sample Sample Variance range. STAT-3611 Lecture Notes 2015 Fall X. Li Section 8.4. Sampling Distribution of Means and the Central Limit Theorem 39 8.3 Sampling Distributions The variation of X is much smaller than that of the • population. The standard deviation of X decreases as the sample size n increases. Sampling Distribution The above results do NOT require any assump- In general, the sampling distribution of a given statistic • tions on the shape of the population. However, a is the distribution of the values taken by the statistic random sample is a must. in all possible samples of the same size form the same population. EXAMPLE 8.7. The mean and standard deviation of the strength of a packaging material are 55 kg and 6 kg, re- In other words, if we repeatedly collect samples spectively. A quality manager takes a random sample of of the same sample size from the population, compute specimens of this material and tests their strength. If the the statistics (mean, standard deviation, proportion), manager wants to reduce the standard deviation of X to and then draw a histogram of those statistics, the dis- 1:5 kg, how many specimens should be tested? tribution of that histogram tends to have is called the sample distribution of that statistics (mean, standard EXAMPLE 8.8. A soft-drink machine is regulated so deviation, proportion). that the amount of drink dispensed averages 240 milliliters with a standard deviation of 15 milliliters. Periodically, NOTE. The statistical applets are good tools to study the machine is checked by taking a sample of 40 drinks the sampling distribution. Check out the Rice Univer- and computing the average content. If the mean of the sity Applets at http://onlinestatbook.com/stat_ 40 drinks is a value within the interval 2 , the sim/sampling_dist/index.html. mX sX machine is thought to be operating satisfactorily;± otherwise, adjustments are made. The company official found the mean of 40 drinks to be x = 236 milliliters and con- 8.4 Sampling Distribution of Means cluded that the machine needed no adjustment. Was this and the Central Limit Theorem a reasonable decision? 8.4.1 Sampling Distribution of Sample Means Sampling Distribution of Sample Means from a from a Normal Population Normal Population 1 n Theorem. Let X = ∑ Xi be the sample mean of a Mean and Standard Deviation of a Sample Mean n i=1 random sample of size n drawn from a normal popu- Theorem. Let X be the sample mean of a random sam- lation having mean m and standard deviation s, then X ple of size n drawn from a population having mean m and follows an exact normal distribution with mean m and standard deviation s, then the mean of X is standard deviation s=pn. That is, mX = m Xi N(m; s) = X N m; s=pn : and the standard deviation of X is ∼ ) ∼ s s = X pn EXAMPLE 8.9. Prove the above theorem. NOTE. One of the essential assumptions is a ran- EXAMPLE 8.6. Prove the above theorem. dom sample• . The distribution of X has the EXACTLY normal • distribution if the random sample is from a normal population. EXAMPLE 8.10. The contents of bottles of beer vary according to a normal distribution with mean m = 341 ml and standard deviation s = 3 ml. (a) What is the probability that the content of a ran- domly selected bottle is less than 339 ml? NOTE. The sample mean X is an unbiased esti- (b) What is the probability that the average content of mator• of the population mean m and is less vari- the bottles in a 12-pack of beer is less than 339 able than a single observation. ml? X. Li 2015 Fall STAT-3611 Lecture Notes 40 Chapter 8. Fundamental Sampling Distributions and Data Descriptions EXAMPLE 8.11. A patient is classified as having ges- approximate probability statement concerning the sam- tational diabetes if the glucose level is above 140 mil- ple mean, without knowledge of the shape of the popu- ligrams per deciliter (mg/dl) one hour after a sugary drink lation distribution. is ingested. Sheila’s measured glucose level one hour after ingesting the sugary drink varies according to the Again, one of the essential assumptions is a ran- normal distribution with m = 125 mg/dl and s = 10 • dom sample. mg/dl. The distribution of X has the approximately nor- • mal (a) If a single glucose measurement is made, what is distribution if the random sample is from a other than the probability that Sheila is diagnosed as having population normal.

Chapter 8 Fundamental Sampling Distributions And

5. the Student T Distribution

Chapter 4: Central Tendency and Dispersion

Notes Mean, Median, Mode & Range

Permutation Tests

Sampling Distribution of the Variance

Arxiv:1804.01620V1 [Stat.ML]

Examination of Residuals

Lecture 14 Testing for Kurtosis

Volatility Modeling Using the Student's T Distribution

The Value at the Mode in Multivariate T Distributions: a Curiosity Or Not?

Statistics, Measures of Central Tendency I

Statistics Sampling Distribution Note