
t Test For Independent Samples

Lesson 10

Content of this Section

• Understand the logic of a two-sample t-test
• Factors to consider when using this type of t-test
• Assumptions that we make in order to accurately perform hypothesis testing

Independent-Sample Designs

• The population mean (µ) and standard deviation (σ) are both unknown
• Because the population values are not known, they must be estimated from the sample data
  – M (sample mean)
  – s² (sample variance)

Independent-Measures Designs

• Evaluate the difference between two populations using data from two separate samples
• Comparing two distinct populations
  – Men and women
  – Drug treatment versus no-drug treatment
  – Method A versus Method B

Learning Example

• Do the achievement scores for children taught by Method A differ from the scores for children taught by Method B?
• Are the two population means the same (µ1 = µ2) or different (µ1 ≠ µ2)?
  – Because the population means are unknown, two samples are necessary, one from each population
• Sample 1 will provide information about the mean for the first population
• Sample 2 will provide information about the mean for the second population

Independent-Measures

• We have two samples, so we need to identify which data belong to which sample.

Hypotheses        Means   Sample Size
• H0: µ1 = µ2     M1      n1
• H1: µ1 ≠ µ2     M2      n2

Formulas

Numerator

$$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$$

• µ1 – µ2 comes from the null hypothesis and sets the population mean difference to 0.
• M1 – M2 comes from the data and represents the difference between the two sample means.

Denominator

• s(M1 – M2) is a measure of standard error and reflects how much difference is reasonable to expect between a sample statistic and the population parameter.

Content of this Section

• Review logic of hypothesis testing
• Denominator formulas
  – Pooled variance
  – Estimated standard error

Formula Review

$$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$$

Denominator Formula

• Each sample mean represents its own population mean

• 2 sources of error:

– M1 approximates µ1 with some error

– M2 approximates µ2 with some error

Denominator Formula

• 2 sources of variance, from Sample 1 and Sample 2, so there are also 2 sources of error
• Pooled variance:

$$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$$

Learning Check

• Two samples that are the same size, n = 6; one sample has SS = 50 and the second sample has SS = 30

What is the pooled variance for this example?

$$s_p^2 = \frac{50 + 30}{5 + 5} = \frac{80}{10} = 8.00$$

We Aren't Done Yet!

• Pooled variance found, but now we need to find the estimated standard error in order to complete the t-test

$$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Learning Check

• Two samples that are the same size, n = 6; one sample has SS = 50 and the second sample has SS = 30
  – Pooled variance = 8

What is the estimated standard error for this example?

$$s_{(M_1 - M_2)} = \sqrt{\frac{8}{6} + \frac{8}{6}} = \sqrt{1.33 + 1.33} = \sqrt{2.67} \approx 1.63$$

Content of this Section

• Degrees of freedom (df)
• Application to t-table and distribution

df

• Degrees of freedom for the independent-samples t statistic are determined by the df values for the two separate samples:
  – df for sample 1 + df for sample 2
  – df1 + df2
  – (n1 – 1) + (n2 – 1) = df
• We need df for the critical boundaries

df Learning Check

• A researcher was interested in the impact of a treatment on memory, and an independent-samples t-test was used to analyze the results.
• Sample 1 had n = 8 participants
• Sample 2 had n = 8 participants

What are the degrees of freedom (df) for the study?

• df = (8 – 1) + (8 – 1) = 14

df Learning Check

• If an independent-samples study had the df shown, how many participants in total were recruited? (Total N = df + 2)
  – df = 14: N = 16
  – df = 18: N = 20
  – df = 28: N = 30
  – df = 98: N = 100

df Learning Check

• Two separate samples, each with n = 10 participants, receive two different treatments.
  – What are the df for this study? df = 18
  – What is the critical boundary for this study using a two-tailed test with a .05 level of significance? t = ±2.101

df Learning Check

• Two separate samples, each with n = 21 participants, receive two different treatments.
  – What are the df for this study? df = 40
  – What is the critical boundary for this study using a one-tailed test with a .05 level of significance? t = 1.684

Content of this Section

• Assumptions of independent t-tests
• Homogeneity of variance

Assumptions

1. Observations within each sample must be independent.
2. Populations from which the samples are selected must be normal.
3. Populations from which the samples are selected must have equal variances.

• Most hypothesis tests are built on underlying assumptions, but the tests usually still work even if the assumptions are violated
• The one notable exception is the assumption of homogeneity of variance for the independent-measures t test

The Homogeneity of Variance Assumption

• Requires that the two populations from which the samples are obtained have equal variances
• Necessary in order to justify pooling the two sample variances and using the pooled variance in the calculation of the t statistic.
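Pooling treats the two sample variances as two estimates of a single shared population variance. The following is a minimal Python sketch, not part of the lesson, that uses the numbers from the earlier Learning Check (n = 6 per sample, SS = 50 and SS = 30). It shows that the pooled variance is the df-weighted average of the two sample variances. The function names `sample_variance` and `pooled_variance` are illustrative assumptions.

```python
def sample_variance(ss, n):
    """Sample variance s^2 = SS / df, with df = n - 1."""
    return ss / (n - 1)


def pooled_variance(ss1, n1, ss2, n2):
    """Pooled variance: (SS1 + SS2) / (df1 + df2)."""
    return (ss1 + ss2) / ((n1 - 1) + (n2 - 1))


# Values from the earlier Learning Check: n = 6 per sample, SS = 50 and SS = 30.
n1, ss1 = 6, 50
n2, ss2 = 6, 30

s1_sq = sample_variance(ss1, n1)            # 50 / 5
s2_sq = sample_variance(ss2, n2)            # 30 / 5
sp_sq = pooled_variance(ss1, n1, ss2, n2)   # 80 / 10

# The pooled variance equals the df-weighted average of the two sample variances.
df1, df2 = n1 - 1, n2 - 1
weighted_avg = (df1 * s1_sq + df2 * s2_sq) / (df1 + df2)

print(s1_sq, s2_sq, sp_sq, weighted_avg)
```

Because the pooled value sits between the two sample variances, it is only a sensible single estimate when both samples are estimating the same population variance.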

• If the assumption is violated, then the t statistic contains two questionable values:
  1. The value for the population mean difference, which comes from the null hypothesis
  2. The value for the pooled variance

Why Does this Matter?

• You cannot determine which of these two values is responsible for a t statistic that falls in the critical region
• You cannot be certain that rejecting the null hypothesis is correct when you obtain an extreme value for t.

Content of this Section

• Calculate t-statistic using two samples
• Calculate effect size
  – Cohen's d
  – r²
• Interpret effect size meaning

Review Formulas

$$df = df_1 + df_2 = (n_1 - 1) + (n_2 - 1)$$

$$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$$

$$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$$

$$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Learning Check

Suppose there is a relationship between the TV viewing habits of 5-year-old children and their engagement in prosocial behaviors in high school. Use the accompanying data to conduct a hypothesis test to determine whether there is a significant difference in the number of prosocial behaviors between high school students who watched a PBS children's program and high school students who did not.

Average High School Grade

Watched        Did not watch
86   99        90   79
87   97        89   83
91   94        82   86
97   89        83   81
98   92        85   92
n = 10         n = 10
M = 93         M = 85
SS = 200       SS = 160

• Steps
  – Hypothesis
  – Critical boundary
    • α = .05 (two-tailed)
    • Need df
    • Look up t-distribution
  – Calculate
    • Numerator
    • Pooled variance
    • Standard error
  – Make a decision

Learning Check

• Hypothesis
  – H0: µ1 – µ2 = 0
  – H1: µ1 – µ2 ≠ 0
• df
  – df1 = 10 – 1 = 9
  – df2 = 10 – 1 = 9
  – df = 18
• Critical region
  – ±2.101
• Calculate
  – Numerator = 8
  – Pooled variance = 20
  – Estimated standard error = 2
  – t statistic = +4.00

Learning Check

• Make a decision
  – t(18) = +4.00; p < .05; reject the null hypothesis
  – The obtained sample mean difference is 4 times greater than would be expected if there were no difference between the two populations.
  – There was a significant difference in the number of prosocial behaviors between the two groups of high school students.

Effect Size - d

$$d = \frac{\text{estimated mean difference}}{\sqrt{\text{pooled variance}}} = \frac{M_1 - M_2}{\sqrt{s_p^2}}$$

• d = (93 – 85) / √20 = 8 / 4.47 = 1.79
• A very large effect!

Effect Size – r²

• r² is used to determine how much of the variability in scores is explained by the treatment effect.

$$r^2 = \frac{\text{variability accounted for}}{\text{total variability}} = \frac{t^2}{t^2 + df}$$

$$r^2 = \frac{4^2}{4^2 + 18} = \frac{16}{34} = .47$$

• A very large effect.

Content of this Section

• Calculate t-statistic using two samples
• Calculate effect size
  – Cohen's d
  – r²
• Interpret effect size meaning

Learning Check

An educational psychologist would like to determine whether access to computers has an effect on grades for high school students. One group of n = 16 students has homeroom each day in a computer classroom in which each student has a computer. A comparison group of n = 16 students has homeroom in a traditional classroom. At the end of the school year, the average grade is recorded for each student.

        Computer     Traditional
M       86           82.5
SS      1005         1155
n       16           16

Is there a significant difference between the two groups? Use a two-tailed test with α = .05.

Learning Check

• Hypothesis
• Critical region
  – df
  – α = .05 (two-tailed)
• Calculate
  – Numerator
  – Denominator
    • Pooled variance
    • s(M1 – M2)
• Make a decision

Learning Check

• Hypothesis
  – H0: µ1 – µ2 = 0
  – H1: µ1 – µ2 ≠ 0
• df
  – df = 30
• Critical region
  – ±2.042
• Calculate
  – Numerator = 3.5
  – Pooled variance = 72
  – Estimated standard error = 3
  – t statistic = +1.17
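The calculation steps above can be checked with a short Python sketch that works directly from the summary values (M, SS, n) for the computer and traditional classrooms. This is an illustration under stated assumptions, not the lesson's own procedure: the function name `independent_t` is hypothetical, the null hypothesis is taken as µ1 – µ2 = 0, and the critical value 2.042 is copied from the t table rather than computed.

```python
import math


def independent_t(m1, ss1, n1, m2, ss2, n2):
    """Independent-samples t from summary values, assuming H0: mu1 - mu2 = 0."""
    df = (n1 - 1) + (n2 - 1)                  # df1 + df2
    pooled_var = (ss1 + ss2) / df             # (SS1 + SS2) / (df1 + df2)
    std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = ((m1 - m2) - 0) / std_error           # hypothesized difference is 0
    return {"df": df, "pooled_var": pooled_var, "std_error": std_error, "t": t}


# Computer classroom vs. traditional classroom (summary values from the table).
result = independent_t(m1=86, ss1=1005, n1=16, m2=82.5, ss2=1155, n2=16)

# Critical boundary looked up in a t table: df = 30, two-tailed, alpha = .05.
t_critical = 2.042
reject_null = abs(result["t"]) > t_critical

print(result)
print("Reject H0:", reject_null)
```

Calling the same function with the PBS example values (`independent_t(93, 200, 10, 85, 160, 10)`) should reproduce the earlier pooled variance of 20, standard error of 2, and t of 4.00.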

Learning Check

• Make a decision
  – t(30) = +1.17; p > .05; fail to reject the null hypothesis
  – The data do not show that computers in homeroom affect grades for high school students
  – There was no significant difference between the two groups of students

Learning Check - Effect Size

$$\text{Cohen's } d = \frac{M_1 - M_2}{\sqrt{s_p^2}}$$

$$r^2 = \frac{t^2}{t^2 + df}$$
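As a rough illustration, the two effect-size formulas can be written as small Python functions. The sketch below applies them to the values from the earlier PBS example (M1 = 93, M2 = 85, pooled variance = 20, t = 4.00, df = 18), where the result was significant. The function names `cohens_d` and `r_squared` are illustrative assumptions.

```python
import math


def cohens_d(m1, m2, pooled_var):
    """Cohen's d: mean difference divided by the square root of the pooled variance."""
    return (m1 - m2) / math.sqrt(pooled_var)


def r_squared(t, df):
    """Proportion of variability accounted for: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


# Values from the PBS example: M1 = 93, M2 = 85, pooled variance = 20, t = 4.00, df = 18.
d = cohens_d(93, 85, 20)
r2 = r_squared(4.00, 18)

print(round(d, 2), round(r2, 2))
```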

Effect size would not be calculated because we did not have a statistically significant result.

Photo Attribution

• International classroom: https://www.flickr.com/photos/dfid/15278030225 (CC BY)
• Sara E. Wood, Children Prime Time: https://www.flickr.com/photos/saraewood/103466903 (CC BY)
• Coffee, Children Winning: https://pixabay.com/en/children-win-success-video-game-593313/