Chapter 7 Sampling Distributions and Point Estimation of Parameters

Chapter 7 Sampling Distributions and Point Estimation of Parameters Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators Sections 7-1 to 7-2 1 / 26 Statistical Inferences A random sample is collected on a population to draw conclusions, or make statistical inferences, about the population. Definition (Random Sample) The random variables X1;X2;:::;Xn are a random sample of size n if... 1) the Xi's are independent 2) every Xi has the same probability distribution Types of statistical inference: 1 Parameter estimation (e.g. estimating µ) with a confidence interval For estimating µ, we collect data and we use the observed sample mean x¯ as a point estimate for µ and create a confidence interval to report a likely range in which µ lies. 2 Hypothesis testing about a population parameter (e.g. H0 : µ = 50) We wish to compare the mean time that women and men spend at the CRWC. H0 : µM = µW ? Or perhaps there is evidence against this hypothesis. 2 / 26 Sample Mean X¯, a Point Estimate for the population mean µ The sample mean X¯ is a point estimate for the population mean µ. NOTATION: µ^ = X¯ (a `hat' over a parameter represents an estimator, X¯ is the estimator here) Prior to data collection, X¯ is a random variable and it is the statistic of interest calculated from the data when estimating µ. The value we get for X¯ (the sample mean) depends on the specific sample chosen! If X¯ is a random variable, then it has a certain expected value, variance, and distribution. The distribution of the random variable X¯ is called the sampling distribution of X¯. 3 / 26 Sample-to-Sample Variability As stated earlier, there is randomness in the X¯ value we get from a random sample. Suppose I want to estimate a population mean height µ using a sample mean X¯. Suppose I randomly select 50 individuals from a population, measure their heights, and find the sample mean x¯ = 5 foot 6 inches Suppose I repeat the process1, I again randomly select 50 individuals from a population, measure their heights, and find the sample mean x¯ = 5 foot 8 inches Suppose I repeat the process, I again randomly select 50 individuals from a population, measure their heights, and find the sample mean x¯ = 5 foot 5 inches I didn't do anything wrong in my data collection, this is just SAMPLING VARIABILITY! 1 In reality, we only take one sample. The above is meant to emphasize the existence of sample-to-sample variability. 4 / 26 The Sampling Distribution of X¯ Definition (Sampling Distribution) The probability distribution of a statistic is called a sampling distribution. X¯ is a statistic calculated from a random sample X1;X2;:::;Xn. X¯ is a linear combination of random variables. Pn ¯ i=1 Xi 1 1 1 X = n = n X1 + n X2 + ··· + n Xn For a random sample X1;X2;:::;Xn drawn from any distribution 2 with mean E(Xi) = µ and variance V (Xi) = σ , we have ¯ ¯ σ2 E(X) = µ and V (X) = n But a mean and variance does not fully specify a distribution. Do we know the probability distribution of X¯? ... 5 / 26 The Sampling Distribution of X¯ It turns out that X¯ has some predictable behavior... If the X1;X2;:::;Xn are drawn from a normal distribution, or by 2 notation Xi ∼ N(µ, σ ) for all i, then ¯ σ2 X ∼ N(µ, n ) for any sample size n. Example Suppose IQ scores are normally distributed with mean µ = 100 and variance σ2 = 256. If n = 9 IQ scores are drawn at random from this population, what is the probability that the sample mean is less than 93? ANSWER: Find P (X¯ < 93) (next page). 6 / 26 The Sampling Distribution of X¯ Example Suppose IQ scores are normally distributed with mean µ = 100 and variance σ2 = 256. If n = 9 IQ scores are drawn at random from this population, what is the probability that the sample mean is less than 98? ANSWER: Find P (X¯ < 93). We first need a distribution for X¯ (it follows a normal distribution!), and then we'll use it to create a Z random variable and use the Z-table. 7 / 26 The Sampling Distribution of X¯ The graphic below shows how the variability in X¯ decreases as the ¯ σ2 sample size n increases. Recall X ∼ N(µ, n ). 8 / 26 The Sampling Distribution of X¯ Notation: ¯ E(X) = µX¯ = E(X) = µ ¯ 2 V (X) σ2 V (X) = σX¯ = n = n Terminology: The term standard deviation refers to the population standard deviation, or pV (X) = σ, and... X−µ Z = σ The term standard error is a value related to X¯ and is also more fully stated as the standard error of the sample mean and it is the square root of the variance of X¯. q 2 ¯ p ¯ σ pσ Std. Error of X is V (X) = n = n And then... ¯ ¯ ¯ X − µX¯ X − µ X − µ Z = = q = p σX¯ σ2 σ= n n 9 / 26 The Sampling Distribution of X¯ Even when Xi are NOT drawn from a normal distribution, it turns out that X¯ has some predictable behavior... If the X1;X2;:::;Xn were NOT drawn from a normal distribution, 2 or by notation Xi ∼?(µ, σ ) for all i, then X¯ is approximately normally distributed as long as n is large enough or ¯ · σ2 X ∼ N(µ, n ) for n > 25 or 30. Thus, X¯ follows a normal distribution!!! (for a sufficiently large n) This is an incredibly useful result for calculating probabilities for X¯!! 10 / 26 The Sampling Distribution of X¯ Example (Probability for X¯, Flaws in a copper wire) Let X denote the number of flaws in a 1 inch length of copper wire. The probability mass function of X is presented in the following table: x P (X = x) 0 0.48 1 0.39 2 0.12 3 0.01 Suppose n = 100 wires are randomly sampled from this population. What is the probability that the average number of flaws per wire in the sample is less than 0.5? (i.e. find P (X¯ < 0:5)... next page) 11 / 26 The Sampling Distribution of X¯ Example (Probability for X¯, Flaws in a copper wire) ANSWER: P (X¯ < 0:5))= 12 / 26 Central Limit Theorem (CLT) Definition (Central Limit Theorem) Let X1;X2;:::;Xn be a random sample drawn from any population (or distribution) with mean µ and variance σ2. If the sample size is *sufficiently large*, then X¯ follows an approximate normal distribution. ¯ d σ2 We write: X ! N(µ, n ) as n ! 1 ¯ X−pµ d Or: Z = σ= n ! N(0; 1) as n ! 1 If the random sample is drawn from a non-normal population, then X¯ is approximately normal for sufficient large n (at least 25 or 30) and the approximation gets better and better as n increases. NOTE: If the original `parent population' from which the sample was drawn is normal, then X¯ follows a normal distribution for any n (a linear combination of normals is normal), and the CLT is not needed to achieve normality. 13 / 26 The Sampling Distribution of X¯ (simulation) Let's simulate this situation... f(x) Case 1: Original population is normally distributed x 1 Choose a sample of size n from a normal distribution 2 Compute x¯ 3 Plot the x¯ on our frequency histogram 4 Do steps 1-3 many time, such as 1000 times 5 Draw a histogram of the 1000 x¯ values (to see the sampling distribution of X¯) See applet at: http://onlinestatbook.com/stat sim/sampling dist/index.html 14 / 26 The Sampling Distribution of X¯ (simulation) Case 1: Original population is normally distributed (with n=2) The empirical distribution for X¯n=2 is in the lower plot (in blue). Its mean is very close to the parent population mean µ = 16, and its s:d: of 3.59 is σ 5 very close to the theoretical σx¯ = p = p = 3:54. n 2 15 / 26 The Sampling Distribution of X¯ (simulation) Case 1: Original population is normally distributed (with n=25) The empirical distribution for X¯n=25 is in the lower plot (in blue). Its mean is very close to the parent population mean µ = 16, and its s:d: of of 1.0 is the same as the theoretical σ = pσ = p5 = 1. x¯ n 25 16 / 26 The Sampling Distribution of X¯ (simulation) f(x) x RESULT - If the parent population (the one you are drawing from) is normal , then X¯ will follow a normal distribution for any sample size n with known mean and variance as show below. ¯ σ2 X ∼ N(µ, n ) 17 / 26 The Sampling Distribution of X¯ (simulation) Let's simulate this situation... Case 2: Original population is NOT normally distributed... f(x) f(x) f(x) x x x 1 Choose a sample of size n from a NON-normal distribution 2 Compute x¯ 3 Plot the x¯ on our frequency histogram 4 Do steps 1-3 many time, such as 1000 times 5 Draw a histogram of the 1000 x¯ values (to see the sampling distribution of X¯) See applet at: http://onlinestatbook.com/stat sim/sampling dist/index.html 18 / 26 The Sampling Distribution of X¯ (simulation) Case 2: Original population is NOT normally distributed (with right-skewed parent population and n=10) The empirical distribution for X¯n=10 is in the lower plot (in blue). Its bell-shaped with a mean equal to the parent population mean µ = 8:08.

Chapter 7 Sampling Distributions and Point Estimation of Parameters

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support