
Lecture 29 RCBD & Unequal Cell Sizes

STAT 512 Spring 2011

Background Reading KNNL: 21.1-21.6, Chapter 23

29-1 Topic Overview

• Randomized Complete Block Designs (RCBD)

• ANOVA with unequal sample sizes

29-2 RCBD

• Randomized complete block designs are useful whenever the experimental units are non-homogeneous.

• Grouping EU’s into “blocks” of homogeneous units helps reduce the SSE and increases the likelihood that we will be able to see differences among treatments.

• A “block” consists of a complete replicate of the set of treatments. Blocks and treatments are assumed not to interact.

29-3 RCBD Model

• Assuming no replication, same as two-way ANOVA with one observation per cell. No interaction between block and treatment.

$$ Y_{ijk} = \mu + \rho_i + \tau_j + \varepsilon_{ijk} $$

where $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ and $\sum_i \rho_i = \sum_j \tau_j = 0$.

• We refer to ρi as the block effects and τj as the treatment effects.

• We are really only interested in further analysis on the treatment effects.
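The additive model can be illustrated with a minimal sketch. The data below are hypothetical (4 blocks by 3 treatments, one observation per cell) and are not from the lecture; the variable names are likewise illustrative. The sketch estimates the block and treatment effects under the sum-to-zero constraints and checks that the total SS splits into block, treatment, and error pieces.

```python
# Hypothetical RCBD data: rows are blocks, columns are treatments,
# one observation per block/treatment cell (no replication).
y = [
    [12.0, 15.0, 14.0],
    [10.0, 13.0, 13.0],
    [16.0, 18.0, 20.0],
    [11.0, 15.0, 13.0],
]

n_blocks = len(y)
n_trts = len(y[0])
grand_mean = sum(sum(row) for row in y) / (n_blocks * n_trts)

# Least squares estimates of the effects under the sum-to-zero constraints.
block_means = [sum(row) / n_trts for row in y]
trt_means = [sum(y[i][j] for i in range(n_blocks)) / n_blocks for j in range(n_trts)]
rho_hat = [m - grand_mean for m in block_means]   # block effects
tau_hat = [m - grand_mean for m in trt_means]     # treatment effects

# Sums of squares for the additive (no interaction) model.
ss_block = n_trts * sum(r ** 2 for r in rho_hat)
ss_trt = n_blocks * sum(t ** 2 for t in tau_hat)
ss_total = sum(
    (y[i][j] - grand_mean) ** 2 for i in range(n_blocks) for j in range(n_trts)
)
ss_error = sum(
    (y[i][j] - grand_mean - rho_hat[i] - tau_hat[j]) ** 2
    for i in range(n_blocks)
    for j in range(n_trts)
)

df_block = n_blocks - 1
df_trt = n_trts - 1
df_error = df_block * df_trt

# F statistic for treatments, the effect of primary interest.
f_trt = (ss_trt / df_trt) / (ss_error / df_error)
print(f"SSBlock={ss_block:.3f} SSTrt={ss_trt:.3f} SSE={ss_error:.3f} F(trt)={f_trt:.2f}")
```

With one observation per cell, the error term is what remains after removing the additive block and treatment effects, which is why the no-interaction assumption matters.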

29-4 RCBD Example

• Want to study the effects of three different sealers on protecting concrete patios from the weather.

• Ten unsealed patios are available spread across Indianapolis.

• Separate each patio into three portions, and apply the treatments (randomly) in such a way that each patio receives each treatment for 1/3 of the surface.

29-5 RCBD Example (2)

• Patio (location) is a factor. The weather will probably differ from location to location; some patios may be better sheltered (trees, etc.).

• If patio location is important, then failing to block on patio location would probably mean that the MSE will be overestimated.

• Blocking costs error DF (9 in this case), but if the blocking variable turns out to be unimportant, the MSE with and without blocking will usually be about the same.

29-6 RCBD Example (3)

Source   DF     SS    MS   F Value
patio     9    900   100    10.0
sealer    2    100    50     5.0
Error    18    180    10
Total    29   1180

• If the ANOVA results are as above, then blocking is clearly important. If we do not block here, the patio SS and DF are pooled into error:

Source   DF     SS    MS   F Value
sealer    2    100    50    1.25
Error    27   1080    40

29-7 RCBD Example (4)

Source   DF    SS     MS   F Value
patio     9   108     12     1.2
sealer    2   100     50     5.0
Error    18   180     10
Total    29   388

• If the ANOVA results are as above, then blocking doesn’t appear to have been as important. In this case, if we fail to block, the patio SS and DF are again pooled into error:

Source   DF    SS     MS   F Value
sealer    2   100     50    4.69
Error    27   288   10.7
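The arithmetic behind the two sealer examples can be checked with a short sketch. It assumes that failing to block simply pools the patio SS and DF into the error term; the function names are illustrative, not from the lecture.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic: mean square of the effect over mean square error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def sealer_f(ss_patio, df_patio, ss_sealer, df_sealer, ss_error, df_error):
    """Return the sealer F statistic with blocking and without blocking."""
    blocked = f_ratio(ss_sealer, df_sealer, ss_error, df_error)
    # Without blocking, the block SS and DF are absorbed into the error term.
    unblocked = f_ratio(
        ss_sealer, df_sealer, ss_error + ss_patio, df_error + df_patio
    )
    return blocked, unblocked


# Example where blocking matters (patio SS = 900).
important = sealer_f(900, 9, 100, 2, 180, 18)
# Example where blocking matters little (patio SS = 108).
unimportant = sealer_f(108, 9, 100, 2, 180, 18)

print("blocking important:   F blocked = %.2f, F unblocked = %.2f" % important)
print("blocking unimportant: F blocked = %.2f, F unblocked = %.2f" % unimportant)
```

In the first case the sealer F drops from 5.0 to 1.25 without blocking; in the second it only drops from 5.0 to about 4.69.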

29-8 Big Picture

• Failing to block when you should block can cost you the ability to see treatment effects.

• Blocking when there is no need usually doesn’t cost much (though it can if SSBlock is small relative to the DF that blocking uses).

• Blocking effectively requires foresight. An experimenter must guess what sources of variation will exist in order to block on them.

29-9 Other Advantages of RCBD

• Reasonably simple analysis to perform.

• Effective grouping makes results much more precise.

• Can drop an entire block or treatment if necessary, without complicating the analysis.

• Can deliberately introduce extra variability into the EU’s to widen the range of validity of the results without sacrificing precision.

29-10 Some Disadvantages of RCBD

• Missing observations are a complex problem (since generally each treatment is represented exactly once in each block).

• Loss of error degrees of freedom.

• Additional assumptions are required for the model (additivity, constant variance across blocks).

29-11 Multiple Blocking Variables

• It is often the case that the EU’s have multiple characteristics on which you could block.

• Example: Consider the effect of three treatments for asthma. Might block on both AGE and GENDER.

• Each treatment would be represented once at each AGE*GENDER combination.

29-12 More on Blocking...

• Quite a bit more information in Chapter 21:

o More than one replicate per block

o Factorial treatments

• These and related topics are discussed in STAT 514.

29-13 Unequal Sample Sizes

• Encountered for a variety of reasons including:

o Convenience: usually if we have an observational study, we have very little control over the cell sizes.

o Cost effectiveness: sometimes the cost of samples is different, and we may use larger sample sizes when the cost is less.

o In experimental studies, you may start with a balanced design, but lose that balance if problems occur.

29-14 Unequal Sample Sizes (2)

• What changes?

o Loss of balance brings “intercorrelation” among the predictors (i.e., variables are no longer orthogonal).

o Type I and III SS will be different; typically Type III SS should be used for testing.

o LSMeans should be used for testing.

o Standard errors for cell means and for multiple comparisons will be different.

o Confidence intervals will have different widths.

29-15 Example

• Examine the effects of gender (A) and bone development (B) on the rate of growth induced by a synthetic growth hormone.

• Three categories of Bone Development Depression (Severe, Moderate, and Mild).

• We categorize people on this basis after they are in the study (it is an observational factor); we wouldn’t want to throw away data just to keep a balanced design.

• Page 954, growth.sas

29-16 Data / Sample Sizes

          Severe   Moderate   Mild
Male       1.4       2.1       0.7
           2.4       1.7       1.1
           2.2
Female     2.4       2.5       0.5
                     1.8       0.9
                     2.0       1.3
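As a quick check on the layout, the data can be entered by cell and summarized. This is a minimal sketch; the dictionary structure and key names are choices made here for illustration.

```python
# Growth data keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}

# Cell sample sizes and cell means.
cell_n = {cell: len(ys) for cell, ys in growth.items()}
cell_mean = {cell: sum(ys) / len(ys) for cell, ys in growth.items()}

for cell in growth:
    print(f"{cell[0]:>6} {cell[1]:<8} n={cell_n[cell]}  mean={cell_mean[cell]:.2f}")

total_n = sum(cell_n.values())
print("total observations:", total_n)
```

The cell sizes range from 1 to 3, so the design is unbalanced.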

29-17 [Figure: interaction plot for the growth data]

29-18 Interpretation

• Same as any interaction plot.

• Effect seems to be greater if disease is more severe.

• Effect seems greater for women than men.

• Possibly an interaction. The effect of bone development is enhanced (greater) for women as compared to men.

• We aren’t saying anything about significance here; we’ll do that when we look at the ANOVA.

29-19 ANOVA Output

Source   DF      SS      MS   F Value   Pr > F
Model     5   4.474   0.895     5.51    0.0172
Error     8   1.300   0.163
Total    13   5.774

R-Square   Root MSE   growth Mean
0.774864   0.403113      1.642857
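The overall ANOVA quantities can be reproduced directly from the data. The sketch below treats the six gender-by-bone cells as the model (a cell means model), so SSE is the within-cell variation. It computes the F statistic but not its p-value, which would require an F distribution function.

```python
import math

# Growth data keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}

all_obs = [y for ys in growth.values() for y in ys]
n_obs = len(all_obs)
grand_mean = sum(all_obs) / n_obs

# Total variation about the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
# Error variation: deviations of each observation from its own cell mean.
ss_error = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in growth.values() for y in ys
)
ss_model = ss_total - ss_error

df_model = len(growth) - 1        # 6 cells - 1
df_error = n_obs - len(growth)    # 14 observations - 6 cells

mse = ss_error / df_error
f_value = (ss_model / df_model) / mse
r_square = ss_model / ss_total
root_mse = math.sqrt(mse)

print(f"SSModel={ss_model:.3f} SSE={ss_error:.3f} SSTotal={ss_total:.3f}")
print(f"F={f_value:.2f} R-Square={r_square:.6f} RootMSE={root_mse:.6f} mean={grand_mean:.6f}")
```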

29-20 Type I / III SS

Source     DF   Type I SS        MS   F Value   Pr > F
gender      1     0.00286   0.00286     0.02    0.8978
bone        2     4.39600   2.19800    13.53    0.0027
gen*bone    2     0.07543   0.03771     0.23    0.7980

Source     DF   Type III SS       MS   F Value   Pr > F
gender      1      0.1200     0.1200     0.74    0.4152
bone        2      4.1897     2.0949    12.89    0.0031
gen*bone    2      0.0754     0.0377     0.23    0.7980

29-21 Type X SS

• There are actually four relevant types of sums of squares.

o I: Sequential

o II: Added Last (Observation)

o III: Added Last (Cell)

o IV: Added Last (Empty Cells)

29-22 Type I SS

• Sequential Sums of Squares, appropriate for equal cell sizes.

• SS(A), SS(B|A), SS(A*B|A,B)

• Each observation is weighted equally, with the result that treatments are weighted in proportion to their cell sizes (if cell sizes are unequal, not all treatments get the same weight in the analysis).

29-23 Type II SS

• Variable Added Last SS, appropriate for equal cell sizes.

• SS(A|B), SS(B|A), SS(A*B|A,B)

• Each observation is weighted equally

29-24 Type III SS

• Variable Added Last SS, appropriate for unequal cell sizes.

• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)

• Each cell/treatment is weighted equally, but observations are weighted differently. Type III SS adjusts for unequal cell sizes by weighting the observations unequally.

29-25 Type IV SS

• Variable Added Last SS, necessary if there are empty cells

• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)

• Like Type III SS but additionally takes into account the possibility of empty cells.

29-26 Data: Design Chart

          Severe   Moderate   Mild
Male       xxx       xx        xx
Female     x         xxx       xxx

29-27 Example: Type I Hypotheses

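Main Effect Gender

$$ H_0: \tfrac{3}{7}\mu_{11} + \tfrac{2}{7}\mu_{12} + \tfrac{2}{7}\mu_{13} = \tfrac{1}{7}\mu_{21} + \tfrac{3}{7}\mu_{22} + \tfrac{3}{7}\mu_{23} $$

Main Effect Bone

$$ H_0: \tfrac{3}{4}\mu_{11} + \tfrac{1}{4}\mu_{21} = \tfrac{2}{5}\mu_{12} + \tfrac{3}{5}\mu_{22} = \tfrac{2}{5}\mu_{13} + \tfrac{3}{5}\mu_{23} $$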

Observations weighted equally, treatment weighted by sample size.

29-28 Example: Type III Hypotheses

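Main Effect Gender

$$ H_0: \tfrac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13}) = \tfrac{1}{3}(\mu_{21} + \mu_{22} + \mu_{23}) $$

Main Effect Bone

$$ H_0: \tfrac{1}{2}(\mu_{11} + \mu_{21}) = \tfrac{1}{2}(\mu_{12} + \mu_{22}) = \tfrac{1}{2}(\mu_{13} + \mu_{23}) $$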

Treatments are weighted equally, observations not weighted equally.
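For the one-degree-of-freedom gender effect, both hypotheses are single contrasts in the cell means, so their sums of squares can be computed as (estimate squared) divided by (sum of squared weights over cell sizes). The sketch below is an illustration under that assumption; it takes the first index of each cell mean to be gender (male, female) and the second to be bone level (severe, moderate, mild), matching the design chart. The helper name `contrast_ss` is made up here.

```python
# Growth data keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}
cell_n = {cell: len(ys) for cell, ys in growth.items()}
cell_mean = {cell: sum(ys) / len(ys) for cell, ys in growth.items()}


def contrast_ss(weights):
    """SS for a single contrast in the cell means: L^2 / sum(c^2 / n)."""
    estimate = sum(c * cell_mean[cell] for cell, c in weights.items())
    scale = sum(c ** 2 / cell_n[cell] for cell, c in weights.items())
    return estimate ** 2 / scale


sign = {"male": 1.0, "female": -1.0}
gender_n = {
    g: sum(n for (gender, _), n in cell_n.items() if gender == g) for g in sign
}

# Type I: cells weighted by sample size within gender (3/7, 2/7, 2/7 vs 1/7, 3/7, 3/7).
type1_weights = {
    (g, b): sign[g] * cell_n[(g, b)] / gender_n[g] for (g, b) in growth
}
# Type III: every cell within a gender gets the same weight (1/3).
type3_weights = {(g, b): sign[g] / 3.0 for (g, b) in growth}

ss_gender_type1 = contrast_ss(type1_weights)
ss_gender_type3 = contrast_ss(type3_weights)

print(f"Type I   SS(gender) = {ss_gender_type1:.5f}")
print(f"Type III SS(gender) = {ss_gender_type3:.4f}")
```

The two values agree with the gender rows of the Type I and Type III tables (0.00286 and 0.1200), which shows that the two tests differ only in how the cells are weighted. The two-degree-of-freedom bone effect needs a matrix version of the same idea and is not shown.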

29-29 General Strategy

• Remember that Type I SS and Type III SS examine different null hypotheses.

• Type III SS are preferred when sample sizes are not equal, but can be somewhat misleading if sample sizes differ greatly.

• Type IV SS are appropriate if there are empty cells.

• Can obtain Type II/IV SS if necessary by using /ss1 ss2 ss3 ss4 in the MODEL statement.

29-30 Example: Type III SS

Source     DF   Type III SS       MS   F Value   Pr > F
gender      1      0.1200     0.1200     0.74    0.4152
bone        2      4.1897     2.0949    12.89    0.0031
gen*bone    2      0.0754     0.0377     0.23    0.7980

• The interaction and gender effects are not significant.

• Now look at comparing different levels of bone; should not ‘change’ models at this point, so need to average over gender.

29-31 Multiple Comparisons

• Suppose we keep model as is, and examine effect of bone.

• Output from MEANS statement (WRONG):

Level of         ------growth------
bone        N    Mean     Std Dev
mild        5    0.900    0.31622777
moderate    5    2.020    0.31144823
severe      4    2.100    0.47609523

• These numbers are not adjusted for gender.

29-32 Multiple Comparisons (2)

• Output from LSMeans (means are correctly adjusted for the level of gender; can think of these as the means for the “average” gender).

            growth        LSMEAN
bone        LSMEAN        Number
mild        0.90000000      1
moderate    2.00000000      2
severe      2.20000000      3

29-33 Multiple Comparisons

• Example – Severe Case

MEANS:

$$ \frac{1.4 + 2.4 + 2.2 + 2.4}{4} = 2.1 $$

(Sum up all severe cases and divide by the number of severe cases, regardless of gender)

LSMEANS:

$$ \frac{1}{2}\left(\frac{1.4 + 2.4 + 2.2}{3} + 2.4\right) = 2.2 $$

(For severe cases, get the averages for men and women separately and then average the two; this accounts for gender)
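The same two calculations can be sketched for all three bone levels. Here the LSMEAN is taken to be the unweighted average of the male and female cell means, as in the severe example above; variable names are illustrative.

```python
# Growth data keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}
genders = ("male", "female")
bones = ("mild", "moderate", "severe")

raw_mean = {}
ls_mean = {}
for bone in bones:
    # MEANS: pool every observation at this bone level, ignoring gender.
    pooled = [y for g in genders for y in growth[(g, bone)]]
    raw_mean[bone] = sum(pooled) / len(pooled)
    # LSMEANS: average the per-gender cell means with equal weight.
    per_gender = [sum(growth[(g, bone)]) / len(growth[(g, bone)]) for g in genders]
    ls_mean[bone] = sum(per_gender) / len(per_gender)

for bone in bones:
    print(f"{bone:<8} MEANS={raw_mean[bone]:.3f}  LSMEANS={ls_mean[bone]:.3f}")
```

The two versions agree for the mild group (0.9) because the male and female cell means happen to be equal there; they differ for moderate (2.02 vs 2.0) and severe (2.1 vs 2.2).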

29-34 Examining Differences (LSMEANS)

Least Squares Means for effect bone
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growth

i/j         1         2         3
1                  0.0072    0.0059
2        0.0072              0.7845
3        0.0059    0.7845

• Mild group is significantly different from the moderate and severe groups (those groups are aided more by the hormone).
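Because the cell sizes differ, each bone LSMEAN has its own standard error. A minimal sketch of that calculation follows, assuming the LSMEAN is the equal-weight average of the two gender cell means, so its variance is MSE times (1/4) times the sum of 1/n over the two cells. The critical value 2.306 is hard-coded as an assumption (the 0.975 quantile of a t distribution with 8 DF, taken from a standard table) because the Python standard library has no t quantile function.

```python
import math

# Growth data keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}
genders = ("male", "female")
bones = ("mild", "moderate", "severe")

# MSE from the full (cell means) model: within-cell SS over error DF.
n_obs = sum(len(ys) for ys in growth.values())
df_error = n_obs - len(growth)
mse = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in growth.values() for y in ys
) / df_error

# Assumption: t(0.975; 8 DF) from a standard t table.
T_CRIT = 2.306

std_err = {}
limits = {}
for bone in bones:
    cell_means = [sum(growth[(g, bone)]) / len(growth[(g, bone)]) for g in genders]
    lsmean = sum(cell_means) / len(genders)
    # Variance factor of an equal-weight average of independent cell means.
    var_factor = sum(1.0 / len(growth[(g, bone)]) for g in genders) / len(genders) ** 2
    std_err[bone] = math.sqrt(mse * var_factor)
    half_width = T_CRIT * std_err[bone]
    limits[bone] = (lsmean - half_width, lsmean + half_width)
    print(f"{bone:<8} LSMEAN={lsmean:.3f} SE={std_err[bone]:.4f} "
          f"95% CI=({limits[bone][0]:.4f}, {limits[bone][1]:.4f})")
```

The severe group has the largest standard error because one of its two cells contains a single observation.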

29-35 Examining Differences (LSMEANS)

bone        LSMEAN    95% Confidence Limits
mild        0.900     0.475707    1.324293
moderate    2.000     1.575707    2.424293
severe      2.200     1.663307    2.736693

• Growth rate increased for each group, but increased by about 1 cm/month more in the moderate/severe groups than in the mild group.

• Note that the widths of these CI’s differ because the sample sizes differ (severe is wider, since it has fewer observations).

29-36 Upcoming in Lecture 30

• A few more examples of unequal sample sizes.

• Analysis of
