Chapter 2 Statistical Significance Test and Analysis of Variance

ENSC450/650 Environmental and Geophysical data analysis --- Chapter 2

2.1 Statistical test for means

In statistical analysis, we use samples to infer the features of a population. A population is defined as “every possible object (or entity) from which the sample is selected”. Another way of looking at it is as “the complete set of all possible measurements that might hypothetically be recorded” for the study.

An inference drawn from a sample may misrepresent the features of the population. A statistical test is therefore required to examine whether the inference correctly captures those features. For example, suppose we wish to examine whether there is a significant difference between male and female students in learning math. To answer this question, we collect 2000 math test scores, 1000 from male students and 1000 from female students. A straightforward first step is to compare the mean test scores of the two groups. Suppose the mean score is 85% for male students and 82% for female students. What can be inferred from the difference between 85% and 82%? We need to focus on the populations, i.e., all female and all male students, rather than on the samples. Thus, there are two issues: 1) is the average computed from the sample equal to the mean of the population? 2) does a small difference of 3% between the samples reflect a real difference between the populations?

Denote the sample mean by $\bar{x}$ and the population mean by $\mu$; we need to test whether $\bar{x} = \mu$ or not.

The inference depends on the sample size and the sample variability. The larger the sample size, the more reliable the inference; the larger the variation within the sample, the less reliable the inference. Thus, we can express the unreliability in terms of the sample size $n$ and the sample variation, quantified by the sample standard deviation $s$, namely the standard error of the mean,

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}.$$
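The dependence of the unreliability on sample size and sample variation can be illustrated with a minimal Python sketch. The function name `standard_error` and the score values are hypothetical, chosen only for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical test scores, for illustration only
scores = [82.0, 85.0, 79.0, 90.0, 84.0]
se_small = standard_error(scores)

# Repeating the same values four times keeps a similar spread but quadruples n,
# so the standard error shrinks
se_large = standard_error(scores * 4)

print(f"n = {len(scores)}: SE = {se_small:.3f}")
print(f"n = {len(scores) * 4}: SE = {se_large:.3f}")
```

The larger sample gives the smaller standard error, which is the sense in which a larger sample makes the inference more reliable.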

Typically, a statistical test, such as the above test of whether the sample mean equals the population mean, has six steps. Such a test is also called a hypothesis test: a formal process of asking whether a logical statement, called the null hypothesis, should be rejected in favor of an opposite statement, the alternative hypothesis.

1)  Define the null hypothesis, H0;

2)  Define the alternative hypothesis, H1 (the logical opposite of H0);

3)  Specify an alpha value, $\alpha$, which is the maximum probability we are willing to accept of committing a Type I error;

4)  Calculate the test statistic;

5)  Compare the test statistic with a critical value (from a statistical table);

6)  Reject the null hypothesis if the test statistic is of greater magnitude than the critical value.

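In step 4), the statistical distribution that the chosen test statistic follows must be specified. For example, for the above test of the mean, the one-sample t-test is used, i.e.,

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t(n-1)$$ (1)

where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $s$ is the sample standard deviation, and $n$ is the sample size.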
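The six steps can be sketched in Python for a one-sample t-test. This is a minimal illustration under stated assumptions: the scores and the hypothesized mean `mu0` are hypothetical, and the critical value is read from a standard t-table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Steps 1-2: H0: population mean = mu0; H1: population mean != mu0
scores = [84, 88, 79, 91, 86, 82, 87, 83]  # hypothetical scores
mu0 = 82                                   # hypothetical population mean

# Step 3: choose alpha
alpha = 0.05

# Step 4: calculate the test statistic
t_stat, df = one_sample_t(scores, mu0)

# Step 5: two-tailed critical value from a t-table for alpha = 0.05, df = 7
t_crit = 2.365

# Step 6: reject H0 only if |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these illustrative numbers the statistic stays below the critical value, so H0 is not rejected.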

Several commonly used test statistics and their distributions are listed below.

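(1)  H0: the mean values of group A and group B are equal, $\mu_A = \mu_B$ (if the variances are not equal)

test statistic: $$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$$ (2)

The statistic follows a t-distribution with degrees of freedom

$$df = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A - 1} + \frac{(s_B^2/n_B)^2}{n_B - 1}}$$

where $n_A$ and $n_B$ are the sample sizes of group A and group B, respectively, $\bar{x}_A$ and $\bar{x}_B$ are the sample means, and $s_A$ and $s_B$ are the sample standard deviations.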

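(2)  H0: the mean values of group A and group B are equal, $\mu_A = \mu_B$ (if the variances are equal)

test statistic: $$t = \frac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}, \quad s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}$$ (3)

The statistic follows a t-distribution with $n_A + n_B - 2$ degrees of freedom, where $n_A$ and $n_B$ are the sample sizes of group A and group B, respectively, and $s_A$ and $s_B$ are the sample standard deviations of group A and group B, respectively.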

(3)  H0: the variances of group A and group B are equal, $\sigma_A^2 = \sigma_B^2$

test statistic: $$F = \frac{s_A^2}{s_B^2}$$ (4)

The statistic follows an F-distribution with $(n_A - 1, n_B - 1)$ degrees of freedom; the notation is the same as above.

Example 1: Mean value Test

Let $A_1$ denote a set obtained by drawing a random sample of six measurements:

$$A_1 = \{30.02,\ 29.99,\ 30.11,\ 29.97,\ 30.01,\ 29.99\}$$

and let $A_2$ denote a second set obtained similarly:

$$A_2 = \{29.89,\ 29.93,\ 29.72,\ 29.98,\ 30.02,\ 29.98\}$$

We will carry out tests of the null hypothesis that the means of the populations from which the two samples were taken are equal.

The difference between the two sample means, each denoted by $\overline{X}_i$, which appears in the numerator of all the two-sample tests discussed above, is

$$\overline{X}_1 - \overline{X}_2 = 0.095.$$

The sample standard deviations for the two samples are approximately 0.05 and 0.11, respectively. For such small samples, a test of equality between the two population variances would not be very powerful. Since the sample sizes are equal, the two forms of the two-sample t-test will perform similarly in this example.

Unequal variances

If the approach for unequal variances (discussed above) is followed, the results are

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \approx 0.0485$$

$$\text{df} \approx 7.03.$$

The test statistic is approximately 1.959. Consulting the Student t-table, the two-tailed critical value at $\alpha$ = 0.05 is 2.365 (df = 7, probability = 0.025 in each tail).

The test statistic is of smaller magnitude than the critical value, so we fail to reject H0; i.e., the data give no significant evidence that the mean of group A differs from the mean of group B.
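The numbers in Example 1 can be checked with a short Python sketch using only the standard library. The values of `a1` are an assumption: they are a set consistent with the summary values reported in the example (mean difference 0.095, standard deviations of about 0.05 and 0.11). The function names are hypothetical.

```python
import math
import statistics

# a1 is an assumed sample consistent with the reported summary values
a1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]
a2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98]

def welch_t(x, y):
    """Two-sample t-test for unequal variances: returns (t, df, standard error)."""
    n1, n2 = len(x), len(y)
    q1 = statistics.variance(x) / n1  # s1^2 / n1
    q2 = statistics.variance(y) / n2  # s2^2 / n2
    se = math.sqrt(q1 + q2)
    t = (statistics.mean(x) - statistics.mean(y)) / se
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df, se

def pooled_t(x, y):
    """Two-sample t-test for equal variances: returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2

t_w, df_w, se_w = welch_t(a1, a2)
t_p, df_p = pooled_t(a1, a2)

t_crit = 2.365  # two-tailed t-table value for alpha = 0.05, df = 7
print(f"Welch:  t = {t_w:.3f}, df = {df_w:.2f}, SE = {se_w:.4f}")
print(f"Pooled: t = {t_p:.3f}, df = {df_p}")
print("reject H0 (Welch):", abs(t_w) > t_crit)
```

Because the two samples have equal sizes, both forms give the same t statistic here; only the degrees of freedom differ.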

Example 2: Statistical test for Variance

In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the following table.

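$$F = \frac{s_A^2}{s_B^2} \sim F(n_A - 1, n_B - 1) = F(9, 9)$$

The test statistic is F = 0.5623/0.4617 = 1.22.

The critical value is F = 3.18 ($\alpha$ = 0.05). Since 1.22 is smaller than 3.18, we cannot reject the null hypothesis that the packing times of the two machines have equal variances.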
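The arithmetic of the variance test can be sketched in Python from the two sample variances quoted above. The function name `variance_ratio` is hypothetical, and the critical value is the table value quoted in the text, not computed.

```python
def variance_ratio(var_a, var_b):
    """F statistic for comparing two variances, larger variance in the numerator."""
    return max(var_a, var_b) / min(var_a, var_b)

# Sample variances quoted in Example 2
f_stat = variance_ratio(0.5623, 0.4617)

# Table value for alpha = 0.05 with (9, 9) degrees of freedom (ten cartons per machine)
f_crit = 3.18

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.2f}, critical value = {f_crit}, reject H0: {reject_h0}")
```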

2.2 One-way ANOVA (Analysis of Variance)

ANOVA is a flexible data analytic technique that allows us to test hypotheses about means when we have two or more independent variables in the design. In its one-way form, ANOVA tests the hypothesis below.

Case 1: Completely randomized experiment

H0: the mean values of the k groups are equal, i.e.,

$\mu_1 = \mu_2 = \cdots = \mu_k$ (if the variances are equal)

The first situation is a completely randomized design. For example, suppose we sample five female and five male high school seniors and obtain mean SAT scores of 550 and 590, respectively. Are the sample means of 550 (female) and 590 (male) really different? Can we conclude that males score 40 points higher, on average, than females? To answer the question, we must consider the amount of sampling variability among the students. If the scores are distributed as in the upper row of the dot plot, the difference between the means is small relative to the sampling variability of the scores within each group. We would be inclined not to reject the null hypothesis of equal population means in this case.

In contrast, if the data are as depicted in the bottom row of the dot plot, then the sampling variability is small relative to the difference between the two means. In this case, we would be inclined to reject the null hypothesis.

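So, the key point is to compare the difference between the means against the variability, as we discussed for the t-test. The same comparison can be expressed through another test statistic, which follows an F-distribution:

$$F = \frac{MST}{MSE} \sim F(k-1, n-k)$$ (5)

where MST is the mean square for treatments, MSE is the mean square for error, $k$ is the number of groups, and $n$ is the total number of observations.

$$MST = \frac{SST}{k-1}, \quad SST = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$$ (6)

$$MSE = \frac{SSE}{n-k}, \quad SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$ (7)

Equivalently, $SSE = \sum_{i=1}^{k} (n_i - 1) s_i^2$, where $s_i$ is the standard deviation of group i.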

Returning to the above case, if the variances (squared standard deviations) of the male scores ($s_1^2$) and the female scores ($s_2^2$) are both 2250, we can calculate MST and MSE:

SST = 5*(550-570)^2 + 5*(590-570)^2 = 4,000;

SSE = 4*2250 + 4*2250 = 18,000;

MST = SST/(2-1) = 4,000; MSE = SSE/(10-2) = 18,000/8 = 2,250.

Thus, F = MST/MSE = 4000/2250 = 1.78.

From the F-table, given alpha = 0.05, numerator degrees of freedom = (2-1) = 1 and denominator degrees of freedom = (10-2) = 8, the critical value is $F_{0.05}(1, 8) = 5.32$. The F value is smaller than the critical value, so we fail to reject H0; i.e., there is no significant difference between the male mean score of 590 and the female mean score of 550.
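The one-way calculation above can be reproduced from the group summaries alone. The following Python sketch assumes the group means, variances, and sizes of the SAT example; the function name is hypothetical, and the critical value 5.32 is the F-table value for alpha = 0.05 with (1, 8) degrees of freedom.

```python
def one_way_anova_from_summary(means, variances, sizes):
    """One-way ANOVA from group means, variances, and sizes.

    Returns (SST, SSE, F), with F = MST / MSE.
    """
    k = len(means)   # number of groups
    n = sum(sizes)   # total number of observations
    grand_mean = sum(m * s for m, s in zip(means, sizes)) / n
    sst = sum(s * (m - grand_mean) ** 2 for m, s in zip(means, sizes))
    sse = sum((s - 1) * v for v, s in zip(variances, sizes))
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return sst, sse, mst / mse

# SAT example: female mean 550, male mean 590, both variances 2250, five students each
sst, sse, f_stat = one_way_anova_from_summary(
    means=[550, 590], variances=[2250, 2250], sizes=[5, 5]
)

f_crit = 5.32  # F-table value, alpha = 0.05, df = (1, 8)
print(f"SST = {sst:.0f}, SSE = {sse:.0f}, F = {f_stat:.2f}")
print("reject H0:", f_stat > f_crit)
```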

Case 2: Randomized block experiment

To reduce the impact of variability among the experimental units, we design the experiment in blocks, within each of which the experimental units are as similar as possible. For example, if we wish to compare the SAT scores of female and male high school seniors, we could select independent random samples of five females and five males and analyze the results of the completely randomized design as discussed above. Or, we could select matched pairs of females and males according to their scholastic records, for example, GPA, as shown below. Five such pairs (blocks) are depicted here. For such a randomized block experiment, the test statistic has the same form as above, i.e.,

$$F = \frac{MST}{MSE} \sim F(k-1, n-b-k+1)$$

where $b$ is the number of blocks.

SST is calculated as before, i.e.,

SST = 5*(606-600)^2 + 5*(594-600)^2 = 360;

SSE cannot be calculated directly in this case. Instead, we first calculate the sum of squares for blocks (SSB):

SSB = $\sum_{j=1}^{b} k (\bar{y}_{B_j} - \bar{y})^2$ = 2(535-600)^2 + 2(560-600)^2 + 2(585-600)^2 + 2(630-600)^2 + 2(690-600)^2 = 30,100

where $k$ is the sample size in each block (one observation per treatment, here 2) and $\bar{y}_{B_j}$ is the mean of block j.

The total variation is

SS(total) = $\sum (y_{ij} - \bar{y})^2$ = (540-600)^2 + (570-600)^2 + (590-600)^2 + … + (530-600)^2 + (550-600)^2 + … + (690-600)^2 = 30,600

Thus, SSE = SS(total) - SST - SSB = 30,600 - 360 - 30,100 = 140

F = MST/MSE = [360/(2-1)] / [140/(10-5-2+1)] = 360/35 = 10.29

From the F-table, given alpha = 0.05, numerator degrees of freedom = (2-1) = 1 and denominator degrees of freedom = (n-b-k+1) = (10-5-2+1) = 4, the critical value is $F_{0.05}(1, 4) = 7.71$. The F value is greater than the critical value, so we reject H0.
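The randomized block calculation can be sketched in Python from the treatment means, block means, and total sum of squares quoted above. The function name is hypothetical, and the sketch assumes a balanced design with one observation per treatment in each block; the critical value 7.71 is the F-table value for alpha = 0.05 with (1, 4) degrees of freedom.

```python
def randomized_block_f(treatment_means, block_means, ss_total):
    """Randomized block ANOVA from summary values.

    Assumes one observation per treatment in each block.
    Returns (SST, SSB, SSE, F).
    """
    k = len(treatment_means)  # number of treatments
    b = len(block_means)      # number of blocks
    n = k * b                 # total number of observations
    grand_mean = sum(treatment_means) / k
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = ss_total - sst - ssb  # what remains after treatments and blocks
    mst = sst / (k - 1)
    mse = sse / (n - b - k + 1)
    return sst, ssb, sse, mst / mse

# Values from the SAT matched-pairs example
sst, ssb, sse, f_stat = randomized_block_f(
    treatment_means=[606, 594],
    block_means=[535, 560, 585, 630, 690],
    ss_total=30600,
)

f_crit = 7.71  # F-table value, alpha = 0.05, df = (1, 4)
print(f"SST = {sst:.0f}, SSB = {ssb:.0f}, SSE = {sse:.0f}, F = {f_stat:.2f}")
print("reject H0:", f_stat > f_crit)
```

Removing the block-to-block variation from the error term is what turns a small treatment difference into a significant one in this example.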

The difference between the completely randomized experiment and the randomized block experiment is summarized in the graph below.

2.3 Two-way ANOVA (Analysis of Variance)

In Section 2.2, we investigated the differences among the k levels (or treatments) of a single factor. Here we study the response to changes in two factors: factor A, observed at factor levels 1, 2, …, a; and factor B, observed at factor levels 1, 2, …, b. For example, an engineer in a textile mill may be interested in the effect of temperature and cycle time on the brightness of a fabric in a process involving dye.

Suppose that we have conducted exactly n experiments at every possible factor-level combination. Suppose also that (i,j) represents the combination of the ith level of factor A with the jth level of factor B, where i=1,2,…,a and j=1,2,…,b. Moreover, let $y_{ijk}$ denote the kth observation in the (i,j) factor-level combination, or cell, k=1,2,…,n. Thus, we have a total of a*b*n observations that can be arranged in the table below.

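From the cell means $\bar{y}_{ij} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk}$, we can also form the table below.

Define the overall mean of the ab cell means in the above table as

$$\bar{y} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \bar{y}_{ij}$$

The row (factor A) means as

$$\bar{y}_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b} \bar{y}_{ij}, \quad i=1,2,…,a$$

The column (factor B) means as

$$\bar{y}_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a} \bar{y}_{ij}, \quad j=1,2,…,b$$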

To calculate the test statistic, we need to calculate these metrics

(8)
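For a balanced design with n observations per cell, the quantities usually computed at this step are the standard sums of squares of the two-way decomposition. The following LaTeX sketch is an assumption based on the standard textbook formulas, not a reproduction of the original equation; the symbols $y_{ijk}$ (kth observation in cell (i,j)), $\bar{y}_{ij}$ (cell mean), $\bar{y}_{i\cdot}$ (factor A mean), $\bar{y}_{\cdot j}$ (factor B mean), and $\bar{y}$ (overall mean) are likewise assumed notation.

```latex
\begin{aligned}
SS(A)     &= bn \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y})^2 \\
SS(B)     &= an \sum_{j=1}^{b} (\bar{y}_{\cdot j} - \bar{y})^2 \\
SS(AB)    &= n \sum_{i=1}^{a} \sum_{j=1}^{b}
             (\bar{y}_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2 \\
SSE       &= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij})^2 \\
SST       &= SS(A) + SS(B) + SS(AB) \\
SS(Total) &= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y})^2 = SST + SSE
\end{aligned}
```

Each mean square is the corresponding sum of squares divided by its degrees of freedom: a-1 for A, b-1 for B, (a-1)(b-1) for the interaction, ab-1 for treatments, and ab(n-1) for error.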

Test treatment Means

H0: No difference among the ab treatment means

H1: At least two treatment means differ

Test statistic: $F = \frac{MST}{MSE} \sim F(ab-1, N-ab)$, where $N = abn$ is the total number of observations

Test for factor interaction

H0: Factor A and B do not interact to affect the response mean

H1: Factor A and B do interact to affect the response mean

The test statistic is $F = \frac{MS(AB)}{MSE} \sim F((a-1)(b-1), ab(n-1))$ (see the table below)

Test for main effect of factor A

H0: No difference among the a mean levels of factor A

H1: At least two factor A mean levels differ

The test statistic is $F = \frac{MS(A)}{MSE} \sim F(a-1, ab(n-1))$ (see the table below)

Test for main effect of factor B

H0: No difference among the b mean levels of factor B

H1: At least two factor B mean levels differ

The test statistic is $F = \frac{MS(B)}{MSE} \sim F(b-1, ab(n-1))$ (see the table below)
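The four tests above can be sketched in Python for a balanced design. This is a minimal illustration under stated assumptions: the function name and dictionary keys are hypothetical, the formulas are the standard balanced two-way decomposition, and the data are made up (two levels of factor A, three levels of factor B, three observations per cell).

```python
def two_way_anova(data):
    """Balanced two-way ANOVA; data[i][j] is the list of n observations in cell (i, j).

    Returns the sums of squares and the F statistics for treatments,
    the A x B interaction, and the main effects of A and B.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(obs) / n for obs in row] for row in data]             # cell means
    row_mean = [sum(cell[i]) / b for i in range(a)]                    # factor A means
    col_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # factor B means
    grand = sum(row_mean) / a                                          # overall mean

    ssa = b * n * sum((m - grand) ** 2 for m in row_mean)
    ssb = a * n * sum((m - grand) ** 2 for m in col_mean)
    ssab = n * sum(
        (cell[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum(
        (y - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for y in data[i][j]
    )
    mse = sse / (a * b * (n - 1))  # error mean square
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "F_treatments": ((ssa + ssb + ssab) / (a * b - 1)) / mse,
        "F_AB": (ssab / ((a - 1) * (b - 1))) / mse,
        "F_A": (ssa / (a - 1)) / mse,
        "F_B": (ssb / (b - 1)) / mse,
    }

# Hypothetical responses: 2 levels of factor A x 3 levels of factor B, n = 3 per cell
data = [
    [[20, 22, 21], [25, 27, 26], [30, 29, 31]],
    [[24, 23, 25], [28, 30, 29], [27, 26, 28]],
]
result = two_way_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Each F value would then be compared with the table value for its own degrees of freedom, for example (a-1)(b-1) and ab(n-1) for the interaction test.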

Example

With F(0.01; 2, 12) = 6.93, there is very strong evidence that these differences cannot be explained as chance results.