
Analysis of Variance

OPRE 6301

The purpose of analysis of variance is to compare two or more populations of interval data. Specifically, we are interested in determining whether differences exist between the population means.

Although we are interested in a comparison of means, the method works by analyzing the sample variance. The intuitive reason for this is that by studying the “sources” of variation in the sample data, we can decide whether any observed differences among multiple sample means can be attributed to chance, or whether they are indicative of actual differences among the means of the corresponding populations.

Analysis of variance is one of the most widely used methods in statistics.

1 One-Way Analysis of Variance . . .

Consider the following scenarios:

— We wish to decide on the basis of sample data whether there is really a difference in the effectiveness of three methods of teaching computer programming.
— We wish to compare the average yield per acre of wheat when different types of fertilizers are used.
— We wish to find out whether differences exist between the travel time from home to work/school along three different routes.
— We wish to find out whether differences exist between average starting salary for MBA graduates from different schools.

In all of these examples, we are trying to find out the effect of a single “factor” on a population; this explains the description “one-way.”

2 Example: Advertising Strategy . . .

We will illustrate the method using the following concrete example.

An apple juice manufacturer is planning to develop a new product — a liquid concentrate. The marketing manager has to decide how to market the new product. Three strategies are considered:

• Emphasize convenience of using the product.
• Emphasize the quality of the product.
• Emphasize the product’s low price.

An experiment was conducted as follows:

• An advertising campaign was launched in three cities.
• In each city, only one of the three characteristics of the new product (convenience, quality, and price) was emphasized.

3 Weekly sales were recorded for twenty weeks following the beginning of the campaigns (see Xm15-01.xls).

Convenience   Quality   Price
       529       804     672
       658       630     531
       793       774     443
       514       717     596
       663       679     602
       719       604     502
       711       620     659
       606       697     689
       461       706     675
       529       615     512
       498       492     691
       663       719     733
       604       787     698
       495       699     776
       485       572     561
       557       523     572
       353       584     469
       557       634     581
       542       580     679
       614       624     532
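For readers working outside Excel, the data can be entered and the three sample means checked in a few lines of Python. This is a sketch, not part of the original notes; the values are transcribed from the table above.

```python
# Weekly sales for the three advertising strategies (20 weeks each),
# transcribed from the Xm15-01 data table above.
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality     = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
               492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price       = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
               691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

# Sample means: 577.55, 653.00, 608.65
for name, sample in [("Convenience", convenience),
                     ("Quality", quality),
                     ("Price", price)]:
    print(name, sum(sample) / len(sample))
```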

4 Research Hypothesis

We wish to answer the following question: Do differences in sales exist between the test markets (i.e., advertising strategies)? In other words, our research hypothesis is that at least two of the three mean sales differ.

Therefore,

H0 : µ1 = µ2 = µ3
H1 : At least two of the µjs differ

We now need to develop an appropriate test statistic to test the above pair of hypotheses.

5 Notation

Let

    k  = total number of populations (k = 3 in this example)
    nj = sample size for the jth population (nj = 20 for all 1 ≤ j ≤ k in this example)
    n  = Σ_{j=1}^k nj, the total number of observations
    Xij = the ith observation from the jth population, where 1 ≤ j ≤ k and, for given j, 1 ≤ i ≤ nj
    X̄j = the sample mean of all observations from the jth population, for 1 ≤ j ≤ k; formally,

              X̄j = (1/nj) Σ_{i=1}^{nj} Xij

    sj = the sample standard deviation of all observations from the jth population
    X̄  = the grand mean of all observations; formally,

              X̄ = (1/n) Σ_{j=1}^k Σ_{i=1}^{nj} Xij = (1/n) Σ_{j=1}^k nj X̄j

6 Thus,

[Table: the data arranged in k columns; column j lists the observations X1j, ..., Xnj,j of the jth sample (e.g., X11 is the first observation of the first sample), with the sample size nj and the sample mean x̄j recorded at the bottom of each column.]

Note that in general, the njs do not have to be the same.

7 Generic Terminology

In the context of this example,

Response Variable: weekly sales, denoted by X above
Responses: actual sales values, which are the observed values of the Xijs
Experimental Unit: weeks in the three cities when we recorded sales figures
Factor: the criterion by which we classify the populations — advertising strategy; another possible factor is advertising medium (TV, newspaper, or internet)
Factor Levels: possible treatments for a given factor — convenience, quality, or price

8 Test Statistic

We will consider two measures of variability. The idea is

depicted below ...

[Figure: two side-by-side dot plots of Treatments 1, 2, and 3 with identical sample means (x̄1, x̄2, x̄3 the same in both panels). Left panel: a small variability within the samples makes it easier to draw a conclusion about the population means. Right panel: the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.]

9 Formally, we assume that the Xijs are independent and normally distributed with means µj and a common variance σ². That is,

    Xij = µj + εij ,    (1)

where the εijs, the errors, are normally distributed with mean 0 and variance σ².

Now, if the null hypothesis is true, say with all µjs equal to µ, then the key idea is that we can estimate σ2 in two ways . . .

10 Method 1: For each j, the nj observations X1j, X2j, ..., Xnj,j can be viewed as a sample of size nj from a normal population with mean µ and variance σ². It follows that

    (1/σ²) Σ_{i=1}^{nj} (Xij − X̄j)²

has a χ² distribution with nj − 1 degrees of freedom (see equation (3) in notes for Chapter 12). Furthermore, summing the above over j shows that

    (1/σ²) Σ_{j=1}^k Σ_{i=1}^{nj} (Xij − X̄j)²

has a χ² distribution with n − k degrees of freedom (recall that Σ_{j=1}^k nj = n). Since E(χ²) = ν, we see that σ² can be estimated by

    MSE ≡ (1/(n − k)) Σ_{j=1}^k Σ_{i=1}^{nj} (Xij − X̄j)² ,    (2)

where MSE stands for mean square for errors (cf. the white groupings in the previous figure).

11 Method 2: Recall that for each j, the sample mean X̄j also is a normally distributed variable with mean µ and variance σ²/nj. Since we have one mean for each j, i.e., for each treatment, it is intuitive that one should be able to estimate σ² by looking at the variability of the treatment means around the grand mean. Indeed, it can be shown that

    (1/σ²) Σ_{j=1}^k nj (X̄j − X̄)²

(think of this as weighting the squared deviation (X̄j − X̄)² by nj, the number of observations for treatment j) has a χ² distribution with k − 1 degrees of freedom. Again, since E(χ²) = ν, we see that σ² can also be estimated by

    MST ≡ (1/(k − 1)) Σ_{j=1}^k nj (X̄j − X̄)² ,    (3)

where MST stands for mean square for treatments (cf. the horizontal pink lines in the previous figure).

12 Since both (2) and (3) are unbiased point estimates for σ2, we expect the two estimates to be close. This suggests that we consider the statistic

    F ≡ MST / MSE .    (4)

If the null hypothesis is true, we expect F to be close to 1. If, on the other hand, the null hypothesis is not true, we expect F to be pulled up by a greater MST (again, refer to the previous figure). Thus, the F statistic is a measure of relative variability.

How big an F value is considered sufficient for us to reject H0? The answer depends on the distribution of the F statistic, which turns out to be the so-called (naturally) F distribution. (A discussion of the F distribution can be found on pp. 270–274 of the text.)

13 The F distribution is characterized by two parameters ν1 and ν2. In our setting, these parameters correspond to the degrees of freedom for MST and MSE, respectively, and hence ν1 = k − 1 and ν2 = n − k.

The F critical value for a given right-tail probability α can be found by the Excel function FINV(α, ν1, ν2).

In our example, we have ν1 = 3 − 1 = 2 and ν2 = 60 − 3 = 57. For α = 0.05, this yields the critical value

    FINV(0.05, 2, 57) = 3.16 .
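Outside Excel, the same critical value is the (1 − α)-quantile of the F distribution. A sketch using SciPy (assumed available):

```python
from scipy.stats import f

# FINV(alpha, nu1, nu2) in Excel returns the right-tail critical value,
# i.e., the (1 - alpha)-quantile of the F distribution.
crit = f.ppf(1 - 0.05, dfn=2, dfd=57)
print(round(crit, 2))  # approximately 3.16
```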

We are now ready to test H0 ...

14 The F Test

The MST and the MSE are usually computed via

    MST = SST / (k − 1)

and

    MSE = SSE / (n − k) ,

respectively, where

    SST ≡ Σ_{j=1}^k nj (X̄j − X̄)² ,    (5)

which is called the sum of squares for treatments, and

    SSE ≡ Σ_{j=1}^k Σ_{i=1}^{nj} (Xij − X̄j)² ,    (6)

which is called the sum of squares for errors.

15 For our example, ...

SST: The grand mean is calculated by

    X̄ = (1/n) Σ_{j=1}^k nj X̄j = (20(577.55) + 20(653.00) + 20(608.65)) / 60 = 613.07 .

Hence,

    SST = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)²
        = 57,512.23 .

SSE: Equivalently to (6), SSE can be computed from the sample variances as SSE = (n1 − 1)s²1 + (n2 − 1)s²2 + (n3 − 1)s²3:

    SSE = 19(10,775.00) + 19(7,238.11) + 19(8,670.24)
        = 506,983.50 .

16 It follows that

    F = (SST/(k − 1)) / (SSE/(n − k)) = (57,512.23/2) / (506,983.50/57) = 3.23 .

Since 3.23 exceeds 3.16, the F critical value at α = 0.05, there is sufficient evidence to reject H0 in favor of H1, and we conclude that the mean sales under at least two of these three strategies differ.
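The whole computation (grand mean, SST, SSE, MST, MSE, and F) can be reproduced in a few lines of Python. This is a sketch, not part of the original notes, using the data transcribed from the table earlier:

```python
# One-way ANOVA by hand, following equations (2)-(6) in the notes.
samples = {
    "convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "quality":     [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                    492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "price":       [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                    691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

k = len(samples)                           # number of treatments
n = sum(len(s) for s in samples.values())  # total number of observations
means = {j: sum(s) / len(s) for j, s in samples.items()}
grand_mean = sum(sum(s) for s in samples.values()) / n

# Sum of squares for treatments (5) and for errors (6).
sst = sum(len(s) * (means[j] - grand_mean) ** 2 for j, s in samples.items())
sse = sum((x - means[j]) ** 2 for j, s in samples.items() for x in s)

mst = sst / (k - 1)  # mean square for treatments (3)
mse = sse / (n - k)  # mean square for errors (2)
F = mst / mse        # the F statistic (4)

# SST ≈ 57,512.23, SSE ≈ 506,983.50, F ≈ 3.23
print(round(sst, 2), round(sse, 2), round(F, 2))
```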

As usual, we can also determine the right-side p-value associated with F = 3.23 via

    FDIST(3.23, 2, 57) = 0.0469 ,

which is seen to be lower than 0.05.
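The Excel call FDIST(x, ν1, ν2) corresponds to the survival function of SciPy's F distribution; a sketch:

```python
from scipy.stats import f

# FDIST(x, nu1, nu2) is the right-tail probability P(F > x),
# i.e., the survival function of the F distribution.
p_value = f.sf(3.23, dfn=2, dfd=57)
print(round(p_value, 4))  # approximately 0.0469
```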

[Figure: density of the F distribution with (2, 57) degrees of freedom; the shaded right tail beyond 3.23 has area p-value = P(F > 3.23) = 0.0469.]

17 The Data Analysis tool “Anova: Single Factor” can be used to carry out F tests. This greatly simplifies our task.

The output for our example is shown below.

[Excel output: a summary table with columns Groups, Count, Sum, Average, and Variance for the three treatments, followed by an ANOVA table with columns Source of Variation, SS, df, MS, F, P-value, and F crit.]

Note that the last row of the output gives the total of SST and SSE, defined as:

    SS ≡ Σ_{j=1}^k Σ_{i=1}^{nj} (Xij − X̄)² .    (7)
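Outside Excel, the single-factor ANOVA that the tool performs is one call to `scipy.stats.f_oneway`; a sketch using the transcribed data:

```python
from scipy.stats import f_oneway

convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality     = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
               492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price       = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
               691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

# One-way ANOVA in a single call: returns the F statistic and p-value.
result = f_oneway(convenience, quality, price)
print(result.statistic, result.pvalue)  # F ≈ 3.23, p ≈ 0.047
```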

18 It can be shown that

    SS = SST + SSE    (8)

and that SS/σ² follows the χ² distribution with n − 1 degrees of freedom. The decomposition of SS into the sum of SST and SSE in (8) is helpful in visualizing how the F statistic is related to the two “sources” of variation, between groups and within groups (groups = treatments). We will encounter a similar decomposition in regression analysis.
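The decomposition (8) is easy to verify numerically; a sketch, with the data as before:

```python
# Verify SS = SST + SSE for the apple-juice data.
samples = [
    [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
     498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
     492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
     691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
]
n = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n
means = [sum(s) / len(s) for s in samples]

ss  = sum((x - grand_mean) ** 2 for s in samples for x in s)               # (7)
sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))  # (5)
sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)         # (6)
print(abs(ss - (sst + sse)))  # essentially zero (floating-point noise)
```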

19 ANOVA: Two Factors . . .

As noted earlier, advertising medium (TV, newspaper, or internet) could be considered as another factor that has an effect on sales. Visually, this means ...

[Figure: one-way ANOVA has a single factor, with one set of responses for each of treatments 1–3 (levels 1–3); two-way ANOVA has two factors, with a set of responses for each combination of Factor A (levels 1–3) and Factor B (levels 1–2).]

The Data Analysis tool “Anova: Two-Factor ...” can be used for this. The idea is the same, and we leave out the details.

20 Testing for Differences . . .

After having rejected H0 in our example, a natural question arises: Which mean(s) is (are) different and, furthermore, what is the rank order? A casual approach of course is to directly look at the X̄js, but we should rely on a more rigorous statistical test.

Recall that for two populations having a common unknown variance, the confidence interval for the difference between the population means µ1 − µ2 is given by

    (X̄1 − X̄2) ± t_{α/2} √( s²p (1/n1 + 1/n2) ) ,    (9)

where t_{α/2} has n1 + n2 − 2 degrees of freedom and

    s²p = ((n1 − 1)s²1 + (n2 − 1)s²2) / (n1 + n2 − 2) .    (10)

In fact, the F test with k = 2 is equivalent to the t test; that is, F = t2.
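The claimed equivalence F = t² for k = 2 is easy to check numerically; a sketch comparing SciPy's one-way ANOVA with the pooled-variance t test on two of the samples:

```python
from scipy.stats import f_oneway, ttest_ind

a = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
     498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
b = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
     492, 719, 787, 699, 572, 523, 584, 634, 580, 624]

F = f_oneway(a, b).statistic
t = ttest_ind(a, b).statistic  # pooled-variance (equal-variance) t test
print(F, t ** 2)               # the two values agree
```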

21 The idea then is to apply (9) and (10) to all pairs of sample means for treatments. However, we will make two revisions to (9) and (10) ...

Revision 1 (Fisher):

In this revision, we replace s²p in (10) by MSE. The reasoning is that MSE is more accurate, as it is based on data from all (not just two) treatment groups.

Formally, this results in the following decision rule: Reject µi = µj (for any pair of i and j) if |X̄i − X̄j| > LSDij, where

    LSDij = t_{α/2} √( MSE (1/ni + 1/nj) ) ,    (11)

with t_{α/2} now having n − k degrees of freedom. The letters LSD stand for “Least Significant Difference.”

22 Revision 2 (Bonferroni):

Let αE (“E” for experiment-wide) be the probability of committing at least one Type I error in all c = k(k − 1)/2 pairwise tests. Then, under the assumption that the tests are only “mildly dependent,” we have the approximation:

    αE ≈ 1 − (1 − α)^c .    (12)
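Approximation (12) can be tabulated to see how fast αE grows with k; a sketch:

```python
# Experiment-wide Type I error: alpha_E ≈ 1 - (1 - alpha)^c,
# where c = k(k - 1)/2 is the number of pairwise tests.
alpha = 0.05
for k in (3, 5, 10):
    c = k * (k - 1) // 2
    alpha_e = 1 - (1 - alpha) ** c
    print(k, c, round(alpha_e, 3))  # already about 0.14 for k = 3
```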

Observe that as k grows, the above probability grows rapidly. It follows that Fisher’s procedure could easily result in an unacceptably high αE.

To remedy this problem, we will rely on the following (Bonferroni) bound: αE ≤ cα (the probability of a union is bounded above by the sum of the probabilities for all individual events). This bound implies that if we wish to ensure that the probability of committing a Type I error is no greater than αE, then we should set α for the pairwise tests to αE/c.

Conclusion: t_{α/2} in (11) should be replaced by t_{αE/(2c)}.

23 Summary

After having rejected H0, the following procedure can be used to conduct pairwise tests of differences in treatment means:

— Compute c, which is given by k(k − 1)/2.

— Set α = αE/c, where αE is the targeted probability for making at least one Type I error in all pairwise tests.

— For a pair of i and j, conclude that µi and µj differ if

    |X̄i − X̄j| > t_{α/2} √( MSE (1/ni + 1/nj) ) ,    (13)

where the degrees of freedom of tα/2 is n − k.

24 Example: Advertising Strategy — Continued

We have rejected H0. Which pair(s) of i and j has (have) different means?

Analysis: For k = 3, we have c = 3 · 2/2 = 3. To achieve (say) αE = 0.05, we let α = 0.05/3 = 0.0167. From Excel, we obtain

    TINV(0.0167, 60 − 3) = 2.467 .

Hence,

    |x̄1 − x̄2| = |577.55 − 653.00| = 75.45 > LSD12 ≈ 73.6
    |x̄1 − x̄3| = |577.55 − 608.65| = 31.10 < LSD13 ≈ 73.6
    |x̄2 − x̄3| = |653.00 − 608.65| = 44.35 < LSD23 ≈ 73.6

where LSDij = 2.467 √( MSE (1/20 + 1/20) ) for every pair.

We conclude that there is a significant difference between µ1 and µ2.
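The pairwise procedure of the Summary can be sketched in Python (not part of the original notes). SciPy provides the t quantile; note that Excel's TINV(α, ν) returns the two-tailed value, i.e., t_{α/2}:

```python
from itertools import combinations
from scipy.stats import t

samples = {
    "convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "quality":     [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                    492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "price":       [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                    691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}
k = len(samples)
n = sum(len(s) for s in samples.values())
means = {j: sum(s) / len(s) for j, s in samples.items()}
sse = sum((x - means[j]) ** 2 for j, s in samples.items() for x in s)
mse = sse / (n - k)                   # MSE from the F test

c = k * (k - 1) // 2                  # number of pairwise tests (3 here)
alpha = 0.05 / c                      # Bonferroni-adjusted per-test level
t_crit = t.ppf(1 - alpha / 2, n - k)  # two-tailed quantile; TINV(0.0167, 57) ≈ 2.467

# Compare each |difference of sample means| against its LSD (11).
for a, b in combinations(samples, 2):
    lsd = t_crit * (mse * (1 / len(samples[a]) + 1 / len(samples[b]))) ** 0.5
    diff = abs(means[a] - means[b])
    print(a, "vs", b, round(diff, 2),
          "differ" if diff > lsd else "no significant difference")
```

Only the convenience/quality pair exceeds its LSD, matching the conclusion above.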
