Lecture 14 – Statistical Inference, Point Estimation, Hypothesis Testing
Statistical inference is the process of using the information contained in random samples from a population to make inferences about that population. It can be divided into parameter estimation and hypothesis testing.

Parameter Estimation

The population parameters one might be interested in are the population mean, population variance, population proportion, difference between two population means, and difference between two population proportions; the last two are useful for comparing two populations. The corresponding sample statistics are the sample mean, sample variance, sample proportion, difference between two sample means, and difference between two sample proportions, respectively. Estimates (values) of each of these statistics can be obtained from the random sample collected from the population.

Inferences on the mean of a population, variance known

Suppose we are interested in questions about the population mean. Recall that for large samples, by the central limit theorem,

\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)

Let us first find a \mu_l such that

P(\bar{X} < \mu_l) = \frac{\alpha}{2}

This is the same as

P\!\left(Z < \frac{\mu_l - \mu}{\sigma/\sqrt{n}}\right) = \frac{\alpha}{2}

so that

\frac{\mu_l - \mu}{\sigma/\sqrt{n}} = Z_{\alpha/2}, \qquad \mu_l = \mu + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}

Similarly, we can find a \mu_u such that

P(\bar{X} < \mu_u) = 1 - \frac{\alpha}{2}, \qquad \mu_u = \mu + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}

The probability that \bar{X} lies between \mu_l and \mu_u is then 1 - \alpha. Recognizing that Z_{1-\alpha/2} = -Z_{\alpha/2}, and ignoring the sign of the Z values, we have

P\!\left(\mu - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha

Rearranging this to obtain an interval for \mu,

P\!\left(\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha

Since \mu is actually a number and not a random variable, the correct way to express the above is

\mu \in \left(\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \text{ with } (1-\alpha) \text{ confidence}

This is a two-sided confidence interval.
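The two-sided interval above can be sketched in code. This is an illustrative helper (the function name `two_sided_ci` is not from the lecture), using Python's standard-library normal distribution for the critical value:

```python
from math import sqrt
from statistics import NormalDist

def two_sided_ci(xbar, sigma, n, alpha):
    """Two-sided (1 - alpha) confidence interval for the population mean,
    with the population standard deviation sigma assumed known."""
    # |Z_{alpha/2}|: the (1 - alpha/2) quantile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return (xbar - half_width, xbar + half_width)
```

For example, `two_sided_ci(0.0328, 0.015, 28, 0.05)` reproduces the 95% interval worked out below, approximately (0.0272, 0.0384).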
Similarly, the one-sided confidence intervals can be shown to be

\mu \in \left(\bar{X} - Z_{\alpha}\,\frac{\sigma}{\sqrt{n}},\ \infty\right) \quad \text{lower bound with } (1-\alpha) \text{ confidence}

\mu \in \left(-\infty,\ \bar{X} + Z_{\alpha}\,\frac{\sigma}{\sqrt{n}}\right) \quad \text{upper bound with } (1-\alpha) \text{ confidence}

Example
28 observations have a sample mean of 0.0328. If the experimenter knows the standard deviation to be 0.015, find the 95% two-sided confidence interval for the population mean.

Solution

P\!\left(0.0328 - Z_{0.025}\,\frac{0.015}{\sqrt{28}} < \mu < 0.0328 + Z_{0.025}\,\frac{0.015}{\sqrt{28}}\right) = 95\%

\mu \in \left(0.0328 - Z_{0.025}\,\frac{0.015}{\sqrt{28}},\ 0.0328 + Z_{0.025}\,\frac{0.015}{\sqrt{28}}\right)

\mu \in (0.0272,\ 0.0384)

Hypothesis Testing

Many problems require deciding whether to accept or reject a statement about some population parameter. Example: is a column strong enough to withstand the load on it? The population parameter is the mean carrying capacity of the column; the sample statistic is the average of the carrying capacities of n randomly selected columns. The statement is called a hypothesis, and the decision-making procedure is called hypothesis testing. A statistical hypothesis, therefore, is a statement about the parameters of one or more populations.

There are two basic kinds of hypothesis tests. You might want to compare the population parameter (say the population mean) to a single value, and determine whether the parameter is equal to that value. In this case, the hypotheses are stated as

H0: \mu = \mu_0
H1: \mu \neq \mu_0

H0 is called the null hypothesis and H1 is called the alternative hypothesis. The above is a two-sided test because the alternative hypothesis allows the population mean to be either less than or greater than the value used for comparison (\mu_0). However, if you want to check whether your parameter is less than or greater than a particular value, then you can use either of the following sets of hypotheses:

H0: \mu \leq \mu_0
H1: \mu > \mu_0

or

H0: \mu \geq \mu_0
H1: \mu < \mu_0

These are one-sided tests.
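The one-sided bounds above use Z_alpha rather than Z_{alpha/2}. A minimal sketch (the function name `one_sided_bounds` is illustrative, not from the lecture):

```python
from math import sqrt
from statistics import NormalDist

def one_sided_bounds(xbar, sigma, n, alpha):
    """One-sided (1 - alpha) confidence bounds for the mean, sigma known.
    Returns (lower, upper); each is a separate one-sided bound, not a pair."""
    # |Z_alpha|: the (1 - alpha) quantile, since only one tail carries alpha
    z = NormalDist().inv_cdf(1 - alpha)
    margin = z * sigma / sqrt(n)
    lower = xbar - margin  # mu > lower with (1 - alpha) confidence
    upper = xbar + margin  # mu < upper with (1 - alpha) confidence
    return lower, upper
```

With the example data (n = 28, sample mean 0.0328, sigma = 0.015, alpha = 0.05), each one-sided bound sits closer to the sample mean than the corresponding two-sided interval endpoint, because the whole alpha is spent in a single tail.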
Testing Hypotheses – Truth Table

                        Fact:
Conclusion              H0 is true        H0 is false
H0 is true              Correct action    Type 2 error
H0 is false             Type 1 error      Correct action

So you make a Type 1 error if you reject H0 when in fact it is true, and a Type 2 error if you fail to reject it when in fact it is false. The probability of a Type 1 error is denoted by α, which is also called the significance level; the probability of a Type 2 error is denoted by β. Because we work with random variables which have probability distributions, we can associate a probability with each type of error. Our focus for decision making will mostly be on α, i.e., we will want to minimize the possibility of erroneously rejecting H0. In other words, when we favor H1, there is very little chance that we made a wrong decision. So our possible conclusions are:
1) We favor H1 against H0.
2) We cannot favor H1 against H0, i.e., either of them could be true.
The standard values of α are 0.01 and 0.05.
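Putting the pieces together, a two-sided test of H0: mu = mu0 at significance level alpha (with sigma known) rejects H0 when the standardized statistic falls beyond the critical value. This sketch (the function name `z_test_two_sided` is illustrative, not from the lecture) assumes the large-sample normal setting used throughout:

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 vs H1: mu != mu0, sigma known.
    Returns True when we favor H1 (reject H0) at significance level alpha."""
    z = (xbar - mu0) / (sigma / sqrt(n))      # standardized test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # |Z_{alpha/2}|
    return abs(z) > z_crit
```

Reusing the example data (n = 28, sample mean 0.0328, sigma = 0.015): testing against mu0 = 0.040 rejects H0 at alpha = 0.05, while testing against mu0 = 0.035 does not, which matches the fact that 0.035 lies inside the 95% confidence interval (0.0272, 0.0384) and 0.040 does not.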