Lecture 14 – Point Estimation, Hypothesis Testing

Statistical inference is the process of using the information contained in random samples from a population to make inferences about the population. It can be further divided into parameter estimation and hypothesis testing.

Parameter Estimation

The population parameters one might be interested in include the population mean, the population variance, the population proportion, the difference between two population means, and the difference between two population proportions. The last two are useful for comparing two populations. The corresponding sample statistics are the sample mean, sample variance, sample proportion, difference between two sample means, and difference between two sample proportions, respectively. Each of these statistics has estimates (values) that can be obtained from a random sample collected from the population.
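As a quick illustration, the sample statistics listed above can be computed directly with Python's standard library. The data values here are hypothetical, chosen only to show the computations:

```python
import statistics

# Hypothetical sample of 8 measurements (illustrative values only)
sample = [2.1, 2.4, 1.9, 2.2, 2.5, 2.0, 2.3, 2.2]

mean = statistics.mean(sample)       # estimate of the population mean
var = statistics.variance(sample)    # sample variance (n-1 in the denominator)

# Sample proportion: fraction of observations exceeding a threshold of 2.15
prop = sum(x > 2.15 for x in sample) / len(sample)

print(mean, var, prop)
```

Each printed value is an estimate of the corresponding population parameter; a different random sample would produce different estimates.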

Inferences on the Mean of a Population, Variance Known

Suppose we are interested in questions about the population mean. Recall that for large samples, by the Central Limit Theorem,

\[ \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \]

Let us first find a $\mu_l$ such that

\[ P(\bar{X} < \mu_l) = \frac{\alpha}{2} \]

This is the same as,

\[ P\!\left(Z < \frac{\mu_l - \mu}{\sigma/\sqrt{n}}\right) = \frac{\alpha}{2} \]

And,

\[ \frac{\mu_l - \mu}{\sigma/\sqrt{n}} = Z_{\alpha/2} \quad\Rightarrow\quad \mu_l = \mu + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]

Similarly, we can find a $\mu_u$ such that

\[ P(\bar{X} < \mu_u) = 1 - \frac{\alpha}{2} \]

\[ \mu_u = \mu + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]

Now the probability that $\bar{X}$ lies between $\mu_l$ and $\mu_u$ is $1-\alpha$. Recognizing that $Z_{1-\alpha/2} = -Z_{\alpha/2}$, and ignoring the sign of the $Z$ values, we have

\[ P\!\left(\mu - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

Rearranging to obtain an interval for $\mu$, we have

\[ P\!\left(\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

Recognizing that $\mu$ is a fixed number and not a random variable, the correct way to express the above is

\[ \mu \in \left(\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \text{ with } (1-\alpha) \text{ confidence} \]

This is a two-sided confidence interval.
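The quantile symmetry used above, $Z_{1-\alpha/2} = -Z_{\alpha/2}$, can be checked numerically with Python's standard library `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

z_upper = z.inv_cdf(1 - alpha / 2)   # Z_{1-alpha/2}, approximately +1.96
z_lower = z.inv_cdf(alpha / 2)       # Z_{alpha/2}, approximately -1.96

# The two quantiles differ only in sign, by the symmetry of the normal curve
print(z_upper, z_lower)
```

This symmetry is why the two-sided interval can be written compactly with a single $\pm Z_{\alpha/2}$ term.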

Similarly, the one-sided confidence intervals can be shown to be

\[ \mu \in \left(\bar{X} - Z_{\alpha}\,\frac{\sigma}{\sqrt{n}},\ \infty\right) \quad \text{(lower bound, with } (1-\alpha) \text{ confidence)} \]

\[ \mu \in \left(-\infty,\ \bar{X} + Z_{\alpha}\,\frac{\sigma}{\sqrt{n}}\right) \quad \text{(upper bound, with } (1-\alpha) \text{ confidence)} \]
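A minimal sketch of computing these one-sided bounds in Python, assuming hypothetical values for the sample mean, the known standard deviation, and the sample size:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs: sample mean, known sigma, sample size (illustrative only)
xbar, sigma, n = 10.2, 1.5, 36
alpha = 0.05

# Note the one-sided bound uses Z_alpha, not Z_{alpha/2}
z_alpha = NormalDist().inv_cdf(1 - alpha)   # approximately 1.645

lower = xbar - z_alpha * sigma / sqrt(n)    # mu in (lower, infinity)
upper = xbar + z_alpha * sigma / sqrt(n)    # mu in (-infinity, upper)

print(lower, upper)
```

Note that each one-sided bound sits closer to $\bar{X}$ than the corresponding end of the two-sided interval, because all of $\alpha$ is spent on one tail.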

Example

A set of 28 observations has a sample mean of 0.0328. If the experimenter knows the population standard deviation to be 0.015, find the 95% two-sided confidence interval for the population mean.

Solution

\[ P\!\left(0.0328 - Z_{0.05/2}\,\frac{0.015}{\sqrt{28}} < \mu < 0.0328 + Z_{0.05/2}\,\frac{0.015}{\sqrt{28}}\right) = 0.95 \]

\[ \mu \in \left(0.0328 - Z_{0.05/2}\,\frac{0.015}{\sqrt{28}},\ 0.0328 + Z_{0.05/2}\,\frac{0.015}{\sqrt{28}}\right) \]
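The arithmetic can be verified with Python's standard library, using `NormalDist().inv_cdf` for the Z quantile:

```python
from math import sqrt
from statistics import NormalDist

# Inputs from the example: sample mean, known sigma, sample size
xbar, sigma, n = 0.0328, 0.015, 28
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{alpha/2}, approximately 1.96
half_width = z * sigma / sqrt(n)            # half-width of the interval

lo, hi = xbar - half_width, xbar + half_width
print(round(lo, 4), round(hi, 4))           # → 0.0272 0.0384
```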

\[ \mu \in (0.0272,\ 0.0384) \]

Hypothesis Testing

Many problems require deciding whether to accept or reject a statement about some population parameter. For example: is the column strong enough to withstand the load on it? Here the population parameter is the mean load-carrying capacity of the column, and the sample statistic is the average of the carrying capacities of n randomly selected columns.

The statement is called a hypothesis, and the decision making procedure is called hypothesis testing. A statistical hypothesis, therefore, is a statement about the parameters of one or more populations.

There are two basic kinds of hypothesis tests. You might want to compare the population parameter (say the population mean) to a single value and determine whether the parameter is equal to that value. In this case, the hypotheses are stated as

H0: μ = μ0

H1: μ ≠ μ0

H0 is called the null hypothesis and H1 the alternative hypothesis. The above is a two-sided test because you are allowing the population mean to be either less than or greater than the comparison value (μ0).

However, if you want to check whether your parameter is less than or greater than a particular value, then you can have either of the following sets of hypotheses,

H0: μ ≤ μ0

H1: μ > μ0

Or

H0: μ ≥ μ0

H1: μ < μ0

These are one-sided tests.

Testing Hypotheses – Truth Table

                          Fact: H0 is true    Fact: H0 is false
Conclusion: H0 is true    Correct action      Type 2 error
Conclusion: H0 is false   Type 1 error        Correct action

So you make a type 1 error if you reject H0 when in fact it is true and a type 2 error if you don’t reject it when in fact it is false.

The probability of a type 1 error is denoted by α, which is also called the significance level. The probability of a type 2 error is denoted by β. Because we work with random variables, which have probability distributions, we can associate a probability with each type of error.

Our focus will mostly be on α for decision making, i.e., we will want to minimize the possibility of erroneously rejecting H0. In other words, when we favor H1, there is very little chance that we have made a wrong decision.

So our possible conclusions are,

1) We favor H1 against H0

2) We cannot favor H1 over H0, i.e., either of them could still be true.

Commonly used values of α are 0.01 and 0.05.
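As a sketch of what the significance level means in practice, the following simulation repeatedly draws samples with H0 true and applies a two-sided z-test; the fraction of (erroneous) rejections should come out close to α. The values of μ0, σ, and n are hypothetical:

```python
import random
from math import sqrt
from statistics import NormalDist

# Monte Carlo sketch: when H0 is true, a level-alpha z-test should
# wrongly reject H0 in roughly an alpha fraction of repeated samples.
random.seed(1)
mu0, sigma, n, alpha = 5.0, 2.0, 30, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value

trials, rejections = 5000, 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]  # H0 is true here
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / sqrt(n))       # standardized test statistic
    if abs(z) > z_crit:
        rejections += 1                        # a type 1 error

print(rejections / trials)                     # close to alpha = 0.05
```

The simulation makes the connection between the two topics of this lecture explicit: rejecting H0 whenever |z| exceeds the critical value is equivalent to rejecting whenever μ0 falls outside the (1 − α) confidence interval derived earlier.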