Introductory Statistics Refresher
Dr. Julia L. Sharp
Short Course on Introductory Statistics Part III
Sharp (Clemson University) ASA 1 / 26
Hypothesis Testing
As an example, suppose that I claim that I am an excellent free throw shooter, making 80% or more of my free throw shots.
Given a claim. Gathered evidence. Assessed the evidence using the claim.
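To make "assessing the evidence using the claim" concrete, here is a minimal sketch using an exact binomial calculation. The slides give no shooting data, so the outcome below (12 makes in 20 attempts) is purely hypothetical, and the function name `binom_lower_tail` is introduced only for illustration. The code computes the probability of making 12 or fewer shots if the claimed 80% rate were true.

```python
from math import comb

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical evidence (not from the slides): 12 makes in 20 attempts.
attempts, makes = 20, 12
claimed_rate = 0.80  # the claim: at least 80% of free throws are made

# Chance of a result this poor or worse if the claim were true
p_value = binom_lower_tail(makes, attempts, claimed_rate)
print(f"P(X <= {makes} | p = {claimed_rate}) = {p_value:.4f}")
```

A small probability here would mean the observed shooting is hard to reconcile with the claim, which is the reasoning the rest of the hypothesis testing steps formalize.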
Hypothesis Testing
1. State the null and alternative hypotheses.
2. State the Type I and Type II Errors for the hypotheses.
3. State the level of significance (maximum acceptable α).
4. Check assumptions.
5. Compute the test statistic.
6. Calculate the p-value.
7. Compare the p-value with the level of significance.
8. Make a decision regarding the null hypothesis.
9. Draw a conclusion in terms of the problem.
Hypothesis Testing Definitions
Null Hypothesis: (Ho) a statement of no effect or no change. This statement is assumed to be true unless sufficient evidence is gathered to reject this hypothesis.
Alternative Hypothesis: (Ha) the research hypothesis. This is the statement that one wishes to support as being true. This is done by gathering evidence against the null hypothesis.
Type I Error: an error that occurs if the null hypothesis is rejected when it is true.
The probability of a Type I error is denoted as α
Type II Error: an error that occurs if the null hypothesis is not rejected when it is false.
The probability of a Type II error is denoted as β
                     State of Nature
Decision             Ho is True                      Ho is False
Reject Ho            Type I error (probability α)    Correct decision
Fail to Reject Ho    Correct decision                Type II error (probability β)
More Hypothesis Testing Definitions
Test statistic: a quantity computed from sample data that depends on the value of the parameter being tested.
Level of significance: the maximum allowable chance of making a Type I error that the researcher is willing to accept
P-value: the probability, computed assuming the null hypothesis is true, that a test statistic will be as or more extreme than the test statistic that was actually observed.
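Small Sample P-value Method
Ho: µ = µ0

Test statistic:
$$t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$

Alternative hypothesis    P-value
Ha: µ < µ0                P(T < tobs)
Ha: µ > µ0                P(T > tobs)
Ha: µ ≠ µ0                2P(T > |tobs|)

where T has a t distribution with n − 1 degrees of freedom.

Decision Rule: Reject Ho if the p-value is less than or equal to α; otherwise, fail to reject Ho.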
P-value Method Example
Suppose that we would like to conduct a test to determine if the average Phosphorus leaching is less than 50 mm. Recall that the sample mean from 32 lysimeter samples is 44.7166 and the sample standard deviation is 7.8069. Use a significance level of 0.05.
State the hypotheses.
Compute the test statistic.
Determine the p-value.
Make a decision regarding Ho.
State the conclusion in terms of the problem.
Example
Riddle and Bergström (2013) describe several experiments to examine Phosphorus leaching from two soils. A table of results from one of the experiments is reproduced below. There were four different rain simulations used and two soil types (clay and sand). The amount of drainage water collected from lysimeters was recorded.
Riddle, M. U. and Bergström, L. (2013). “Phosphorus leaching from two soils with catch crops exposed to freeze-thaw cycles,” Agronomy Journal, 105(3): 803-811.
Hypothesis Test: Phosphorus Leaching
Conduct a test to determine if the average Phosphorus leaching is less than 50mm.
    One Sample t-test

    data:  drain$drainage
    t = -3.8283, df = 31, p-value = 0.0002936
    alternative hypothesis: true mean is less than 50
    95 percent confidence interval:
         -Inf 47.05657
    sample estimates:
    mean of x
     44.71664
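The test statistic and p-value reported in this output can be checked from the summary statistics alone (n = 32, sample mean 44.7166, sample standard deviation 7.8069). The sketch below uses only the Python standard library; the helpers `t_pdf` and `t_cdf` are illustrative names, and the t distribution function is approximated by Simpson's rule rather than taken from a statistics package, so small differences in the last digits are to be expected.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule integral of the density from 0 to x."""
    h = x / steps  # negative when x < 0, so the integral is subtracted
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

# Summary statistics quoted in the example
n, ybar, s = 32, 44.7166, 7.8069
mu0, alpha = 50, 0.05

se = s / sqrt(n)                # estimated standard error of the sample mean
t_obs = (ybar - mu0) / se       # observed test statistic
p_value = t_cdf(t_obs, n - 1)   # Ha: mu < mu0, so P(T < t_obs) with n - 1 df

print(f"t = {t_obs:.4f}, df = {n - 1}, p-value = {p_value:.7f}")
print("Reject Ho" if p_value <= alpha else "Fail to reject Ho")
```

Because the p-value is far below 0.05, the decision is to reject Ho, which supports the claim that the mean is less than 50.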
Inferences Comparing Two Population Central Values
Compare the average responses in two groups.
Assumptions:
1. Independent random samples of n1 observations from one population and n2 observations from a second population are selected.
2. Samples are selected from normal distributions or large sample sizes are used.
GOAL: Make inference about the difference between the population means.

              Population                     Sample
Group         Mean    Standard Deviation     Size    Mean    Standard Deviation
1             µ1      σ1                     n1      ȳ1      s1
2             µ2      σ2                     n2      ȳ2      s2
Inference for Two Population Means: Example
Recall the Riddle and Bergström (2013) Phosphorus leaching experiment: four different rain simulations were used on two soil types (clay and sand), and the amount of drainage water collected from lysimeters was recorded.
Suppose that we would like to compare the average amount of drainage water collected from clay soil to the average amount of drainage water collected from sandy soil.
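Sampling Distribution of $\bar{Y}_1 - \bar{Y}_2$
Suppose two independent random variables Y1 and Y2 are normally distributed with appropriate means and variances:
$$Y_1 \sim N(\mu_1, \sigma_1^2), \qquad Y_2 \sim N(\mu_2, \sigma_2^2)$$
The sampling distributions of $\bar{Y}_1$ and $\bar{Y}_2$, based on samples of sizes n1 and n2, are:
$$\bar{Y}_1 \sim N\!\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \qquad \bar{Y}_2 \sim N\!\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$$
The sampling distribution of $\bar{Y}_1 - \bar{Y}_2$ is:
$$\bar{Y}_1 - \bar{Y}_2 \sim N\!\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
The mean of the sampling distribution is:
$$\mu_1 - \mu_2$$
The standard error of the sampling distribution is:
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$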
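For independent normal samples, the difference in sample means has mean µ1 − µ2 and standard error sqrt(σ1²/n1 + σ2²/n2). A small simulation can illustrate this. The population values below are hypothetical and chosen only for the demonstration; nothing here comes from the drainage data.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population settings, for illustration only
mu1, sigma1, n1 = 10.0, 2.0, 16
mu2, sigma2, n2 = 8.0, 3.0, 16

def sample_mean(mu, sigma, n):
    """Mean of n independent draws from N(mu, sigma^2)."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Repeatedly draw both samples and record the difference in sample means
diffs = [sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
         for _ in range(5000)]

theory_mean = mu1 - mu2
theory_se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
sim_mean = sum(diffs) / len(diffs)
sim_se = stdev(diffs)

print(f"mean of differences: simulated {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"standard error:      simulated {sim_se:.3f}, theory {theory_se:.3f}")
```

The simulated values should land close to the theoretical ones, with the gap shrinking as the number of replications grows.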
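Inference for Comparing Two Population Means: Independent Samples
Are the population variances $\sigma_1^2$ and $\sigma_2^2$ equal or unequal?

                                        Equal ($\sigma_1^2 = \sigma_2^2$)                          Unequal ($\sigma_1^2 \neq \sigma_2^2$)
Variance of $\bar{Y}_1 - \bar{Y}_2$     $\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$       $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Variance estimate                       $s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$          $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$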
Independent Samples, Equal Variances: Hypothesis Tests for Comparing Two Population Means
Ho: µ1 − µ2 = D0
Ha: µ1 − µ2 < D0     Ha: µ1 − µ2 > D0     Ha: µ1 − µ2 ≠ D0
Test statistic:
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
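The pooled-variance test statistic can be computed directly from summary statistics. The sketch below is one way to do it; the function name `pooled_t` and the input values are hypothetical, since the slides do not report the per-group standard deviations for the drainage data.

```python
from math import sqrt

def pooled_t(ybar1, s1, n1, ybar2, s2, n2, d0=0.0):
    """Return (t_obs, df, sp) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = sqrt(sp2)
    t_obs = ((ybar1 - ybar2) - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t_obs, df, sp

# Hypothetical summary statistics, for illustration only
t_obs, df, sp = pooled_t(ybar1=10.0, s1=2.0, n1=5, ybar2=8.0, s2=3.0, n2=7)
print(f"t = {t_obs:.4f}, df = {df}, sp = {sp:.4f}")
```

The pooled standard deviation always falls between the two sample standard deviations, weighted toward the group with more observations.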
Sharp (Clemson University) ASA 16 / 26 Independent Samples, Equal Variances: Hypothesis Test P-values Ho : µ1 − µ2 = D0
Ha : µ1 − µ2 < D0 Ha : µ1 − µ2 > D0 Ha : µ1 − µ2 6= D0
P-value: P-value: P-value: P(T < tobs) P(T > tobs) 2P(T > |tobs|)
Decision Rule:
Sharp (Clemson University) ASA 17 / 26 Random Assignment of Treatment to Experimental Units Is the average amount of drainage water collected from clay soil different from the average amount of drainage water collected from sandy soil?
Sharp (Clemson University) ASA 18 / 26 Inference for Two Means Is the average amount of drainage water collected from clay soil different from the average amount of drainage water collected from sandy soil? Use a significance level of 0.05.
Inference for Two Means, Independent Samples, Equal Variances: Confidence Interval
A 100(1 − α)% confidence interval for the difference in population means is
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,(n_1+n_2-2)}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

    Two Sample t-test

    data:  drain$drainage by drain$soil
    t = 1.5148, df = 30, p-value = 0.1403
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -1.42650  9.61915
    sample estimates:
    mean in group clay mean in group sand
              46.76480           42.66848
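The two-sided p-value and the 95% confidence interval in this output can be approximately reproduced from the printed numbers. The output does not show the standard error, so the sketch below backs it out as (difference in means) / t, which works only because D0 = 0 here. The helpers `t_pdf`, `t_cdf`, and `t_quantile` are illustrative standard-library approximations (Simpson's rule and bisection), not a statistics package, so agreement is expected only to about three decimal places.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule integral of the density from 0 to x."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Values printed in the output
mean_clay, mean_sand = 46.76480, 42.66848
t_obs, df, alpha = 1.5148, 30, 0.05

diff = mean_clay - mean_sand
se = diff / t_obs                             # assumption: valid because D0 = 0
p_value = 2 * (1 - t_cdf(abs(t_obs), df))     # Ha: mu1 - mu2 != 0
t_crit = t_quantile(1 - alpha / 2, df)        # t_{alpha/2, df}
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Since the p-value exceeds 0.05 and the interval contains 0, the data do not provide sufficient evidence that the mean drainage differs between clay and sandy soil.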
Independent Samples, Unequal Variances: Hypothesis Tests for Comparing Two Population Means
Ho: µ1 − µ2 = D0
Ha: µ1 − µ2 < D0     Ha: µ1 − µ2 > D0     Ha: µ1 − µ2 ≠ D0
Test statistic:
$$t'_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Distribution of the Test Statistic
The test statistic approximately follows a t distribution:
$$t' \;\dot{\sim}\; t(df)$$
where
$$df = \frac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}$$
and
$$c = \frac{s_1^2/n_1}{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
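The unequal-variance statistic and its approximate degrees of freedom follow directly from these formulas. The sketch below is a minimal illustration; the function name `welch_t` and the summary statistics are hypothetical, not taken from the drainage data.

```python
from math import sqrt

def welch_t(ybar1, s1, n1, ybar2, s2, n2, d0=0.0):
    """Return (t_obs, df, c) for the unequal-variance two-sample t test."""
    v1, v2 = s1**2 / n1, s2**2 / n2      # estimated variances of each sample mean
    t_obs = ((ybar1 - ybar2) - d0) / sqrt(v1 + v2)
    c = v1 / (v1 + v2)                   # share of the variance due to sample 1
    df = (n1 - 1) * (n2 - 1) / ((1 - c)**2 * (n1 - 1) + c**2 * (n2 - 1))
    return t_obs, df, c

# Hypothetical summary statistics, for illustration only
t_obs, df, c = welch_t(ybar1=10.0, s1=2.0, n1=5, ybar2=8.0, s2=4.0, n2=10)
print(f"t' = {t_obs:.4f}, df = {df:.2f}, c = {c:.4f}")
```

The resulting df is generally not an integer and never exceeds n1 + n2 − 2; with equal sample sizes and equal sample variances it reduces to n1 + n2 − 2.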
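Independent Samples, Unequal Variances: Hypothesis Test P-values
Ho: µ1 − µ2 = D0

Alternative hypothesis      P-value
Ha: µ1 − µ2 < D0            P(T < t'obs)
Ha: µ1 − µ2 > D0            P(T > t'obs)
Ha: µ1 − µ2 ≠ D0            2P(T > |t'obs|)

where T has a t distribution with the approximate degrees of freedom df of the unequal-variance test statistic.

Decision Rule: Reject Ho if the p-value is less than or equal to α; otherwise, fail to reject Ho.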
Inference for Two Means (Unequal Variances): Example Using PROC TTEST
Is the average amount of drainage water collected from clay soil different from the average amount of drainage water collected from sandy soil? Use a significance level of 0.05.
Inference for Two Means, Independent Samples, Unequal Variances: Confidence Interval
A 100(1 − α)% confidence interval for the difference in population means when the population variances are not equal is
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where
$$df = \frac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}$$
and
$$c = \frac{s_1^2/n_1}{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$