Lecture 29 RCBD & Unequal Cell Sizes
STAT 512 Spring 2011

Background Reading
• KNNL: 21.1-21.6, Chapter 23

Topic Overview
• Randomized Complete Block Designs (RCBD)
• ANOVA with unequal sample sizes

RCBD
• Randomized complete block designs are useful whenever the experimental units (EUs) are non-homogeneous.
• Grouping EUs into "blocks" of homogeneous units helps reduce the SSE and increases the likelihood that we will be able to detect differences among treatments.
• A "block" consists of a complete replication of the set of treatments. Blocks and treatments are assumed not to interact.

RCBD Model
• Assuming no replication, the model is the same as a two-way ANOVA with one observation per cell and no interaction between block and treatment:
  $Y_{ij} = \mu + \rho_i + \tau_j + \varepsilon_{ij}$, where $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$ and $\sum_i \rho_i = \sum_j \tau_j = 0$
• We refer to $\rho_i$ as the block effects and $\tau_j$ as the treatment effects.
• We are really only interested in further analysis of the treatment effects.

RCBD Example
• We want to study the effects of three different sealers on protecting concrete patios from the weather.
• Ten unsealed patios are available, spread across Indianapolis.
• Separate each patio into three portions and apply the treatments (randomly) so that each patio receives each treatment on 1/3 of its surface.

RCBD Example (2)
• Patio (location) is a blocking factor. The weather will probably differ at each location, and some patios may be better sheltered (trees, etc.).
• If patio location is important, then failing to block on it would probably leave the patio-to-patio variation in the error term, so the MSE would be overestimated.
• Blocking uses up DF (9 in this case), but if the blocking variable is unimportant, the MSE with and without blocking will usually be about the same.

RCBD Example (3)
Source    DF     SS    MS   F Value
patio      9    900   100     10.0
sealer     2    100    50      5.0
Error     18    180    10
Total     29   1180
• If the ANOVA results are as above, then blocking is clearly important. If we do not block here...
Source    DF     SS    MS   F Value
sealer     2    100    50     1.25
Error     27   1080    40

RCBD Example (4)
Source    DF     SS    MS   F Value
patio      9    108    12      1.2
sealer     2    100    50      5.0
Error     18    180    10
Total     29    388
• If the ANOVA results are as above, then blocking doesn't appear to have been as important. In this case, if we fail to block...
Source    DF     SS    MS   F Value
sealer     2    100    50     4.69
Error     27    288  10.7

Big Picture
• Failing to block when you should can cost you the ability to see treatment effects.
• Blocking when there is no need often doesn't cost much at all (though it can if SSBlock is small relative to its DF: the error DF shrink without a matching drop in SSE).
• Blocking effectively requires foresight: an experimenter must anticipate the sources of variation in order to block on them.

Other Advantages of RCBD
• The analysis is reasonably simple to perform.
• Effective grouping makes results much more precise.
• An entire block or treatment can be dropped if necessary without complicating the analysis.
• Extra variability can be deliberately introduced into the EUs to widen the range of validity of the results without sacrificing precision.

Some Disadvantages of RCBD
• Missing observations are a complex problem (since generally each treatment is represented exactly once in each block).
• Loss of error degrees of freedom.
• Additional assumptions are required for the model (additivity, constant variance across blocks).

Multiple Blocking Variables
• Often the EUs have multiple characteristics on which you could block.
• Example: to study the effect of three treatments for asthma, we might block on both AGE and GENDER.
• Each treatment would be represented once in each AGE*GENDER combination.

More on Blocking...
• Quite a bit more information is in Chapter 21:
  o More than one replicate per block
  o Factorial treatments
• These and related topics would be discussed in STAT 514.
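The F ratios in the four patio tables can be reproduced from their SS and DF alone. The sketch below is a minimal illustration that assumes, as holds for a balanced RCBD where blocks and treatments are orthogonal, that dropping the block term leaves SS(sealer) unchanged and simply pools the patio SS and DF into error. The helper names `anova_f` and `pool_block_into_error` are hypothetical.

```python
def anova_f(ss_effect, df_effect, ss_error, df_error):
    """Return (MS_effect, MSE, F) for an effect tested against the error term."""
    ms_effect = ss_effect / df_effect
    mse = ss_error / df_error
    return ms_effect, mse, ms_effect / mse


def pool_block_into_error(ss_block, df_block, ss_error, df_error):
    """Ignoring the blocking factor moves its SS and DF into the error term.

    Valid here only because the RCBD is balanced, so SS(sealer) does not change.
    """
    return ss_error + ss_block, df_error + df_block


# Example (3): SS(patio) = 900 on 9 DF, SS(sealer) = 100 on 2 DF, SSE = 180 on 18 DF
patio_f = anova_f(900, 9, 180, 18)[2]
sealer_blocked = anova_f(100, 2, 180, 18)
sealer_unblocked = anova_f(100, 2, *pool_block_into_error(900, 9, 180, 18))

# Example (4): same sealer and error lines, but SS(patio) = 108
sealer_unblocked_4 = anova_f(100, 2, *pool_block_into_error(108, 9, 180, 18))

for label, (ms, mse, f) in [
    ("Ex. 3, blocked", sealer_blocked),
    ("Ex. 3, unblocked", sealer_unblocked),
    ("Ex. 4, unblocked", sealer_unblocked_4),
]:
    print(f"{label:<17} MS(sealer) = {ms:.1f}  MSE = {mse:.2f}  F = {f:.2f}")
print(f"Ex. 3 patio F = {patio_f:.1f}")
```

With blocking, F(sealer) is 5.0 in both examples. Without it, F falls to 1.25 when the patio variation is large (Example 3) but only to about 4.69 when it is small (Example 4), because the pooled MSE grows from 10 to 40 in the first case and only to about 10.7 in the second.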
Unequal Sample Sizes
• Encountered for a variety of reasons, including:
  o Convenience: in an observational study we usually have very little control over the cell sizes.
  o Cost effectiveness: sometimes the cost of samples differs, and we may use larger sample sizes when the cost is lower.
  o In experimental studies, you may start with a balanced design but lose that balance if problems occur.

Unequal Sample Sizes (2)
• What changes?
  o Loss of balance brings "intercorrelation" among the predictors (i.e., the variables are no longer orthogonal).
  o Type I and Type III SS will differ; typically Type III SS should be used for testing.
  o LSMeans (rather than raw marginal means) should be used for testing.
  o Standard errors for cell means and for multiple comparisons will differ.
  o Confidence intervals will have different widths.

Example
• Examine the effects of gender (A) and bone development (B) on the rate of growth induced by a synthetic growth hormone.
• Three categories of bone development depression (Severe, Moderate, and Mild).
• People are categorized on this basis after they are in the study (it is an observational factor); we wouldn't want to throw away data just to keep a balanced design.
• Page 954, growth.sas

Data / Sample Sizes
          Severe          Moderate        Mild
Male      1.4 2.4 2.2     2.1 1.7         0.7 1.1
Female    2.4             2.5 1.8 2.0     0.5 0.9 1.3

[Figure: interaction plot of the cell means]

Interpretation
• Read it the same way as any interaction plot.
• The effect seems to be greater when the disease is more severe.
• The effect seems greater for women than for men.
• Possibly an interaction: the effect of bone development is enhanced (greater) for women compared with men.
• We aren't saying anything about significance here; we'll do that when we look at the ANOVA.
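The interaction plot is built from the six cell means. A short sketch that computes them, along with the unequal cell sizes, from the data table above; the dictionary layout `growth[(gender, bone)]` is only an illustrative choice.

```python
from statistics import mean

# Growth data from the table: (gender, bone development) -> observations
growth = {
    ("Male", "Severe"): [1.4, 2.4, 2.2],
    ("Male", "Moderate"): [2.1, 1.7],
    ("Male", "Mild"): [0.7, 1.1],
    ("Female", "Severe"): [2.4],
    ("Female", "Moderate"): [2.5, 1.8, 2.0],
    ("Female", "Mild"): [0.5, 0.9, 1.3],
}

# Cell means are the points drawn on the interaction plot
cell_means = {cell: mean(obs) for cell, obs in growth.items()}
cell_sizes = {cell: len(obs) for cell, obs in growth.items()}

# One row per gender (one line on the plot), one column per bone level
for gender in ("Male", "Female"):
    row = [
        f"{cell_means[(gender, b)]:.2f} (n={cell_sizes[(gender, b)]})"
        for b in ("Severe", "Moderate", "Mild")
    ]
    print(f"{gender:<7}", "  ".join(row))
```

The cell means are 2.0, 1.9, 0.9 for males and 2.4, 2.1, 0.9 for females: the female line lies above the male line at Severe and Moderate and the two meet at Mild, which is the pattern described in the interpretation.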
ANOVA Output
Source   DF     SS      MS    F Value   Pr > F
Model     5   4.474   0.895     5.51    0.0172
Error     8   1.300   0.163
Total    13   5.774

R-Square   Root MSE   growth Mean
0.774864   0.403113   1.642857

Type I / III SS
Source     DF   Type I SS      MS     F Value   Pr > F
gender      1    0.00286    0.00286     0.02    0.8978
bone        2    4.39600    2.19800    13.53    0.0027
gen*bone    2    0.07543    0.03771     0.23    0.7980

Source     DF   Type III SS    MS     F Value   Pr > F
gender      1    0.1200     0.1200      0.74    0.4152
bone        2    4.1897     2.0949     12.89    0.0031
gen*bone    2    0.0754     0.0377      0.23    0.7980

Type X SS
• There are actually four relevant types of sums of squares:
  o I: Sequential
  o II: Added Last (Observation)
  o III: Added Last (Cell)
  o IV: Added Last (Empty Cells)

Type I SS
• Sequential sums of squares, appropriate for equal cell sizes.
• SS(A), SS(B|A), SS(A*B|A,B)
• Each observation is weighted equally, with the result that treatments are weighted in proportion to their cell sizes (if the sizes are unequal, not all treatments get the same weight in the analysis).

Type II SS
• Variable added last SS (each effect adjusted for the other effects that do not contain it), appropriate for equal cell sizes.
• SS(A|B), SS(B|A), SS(A*B|A,B)
• Each observation is weighted equally.

Type III SS
• Variable added last SS, appropriate for unequal cell sizes.
• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)
• Each cell/treatment is weighted equally, but observations are not: Type III SS adjust for the unequal cell sizes by weighting observations unequally.

Type IV SS
• Variable added last SS, necessary if there are empty cells.
• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)
• Like Type III SS, but additionally takes into account the possibility of empty cells.

Data: Design Chart
          Severe   Moderate   Mild
Male      xxx      xx         xx
Female    x        xxx        xxx

Example: Type I Hypotheses
(Here $\mu_{ij}$ is the mean for gender i, with 1 = Male and 2 = Female, and bone level j, with 1 = Severe, 2 = Moderate, 3 = Mild.)
• Main effect gender:
  $H_0: \tfrac{3}{7}\mu_{11} + \tfrac{2}{7}\mu_{12} + \tfrac{2}{7}\mu_{13} = \tfrac{1}{7}\mu_{21} + \tfrac{3}{7}\mu_{22} + \tfrac{3}{7}\mu_{23}$
• Main effect bone:
  $H_0: \tfrac{3}{4}\mu_{11} + \tfrac{1}{4}\mu_{21} = \tfrac{2}{5}\mu_{12} + \tfrac{3}{5}\mu_{22} = \tfrac{2}{5}\mu_{13} + \tfrac{3}{5}\mu_{23}$
• Observations are weighted equally; treatments are weighted by sample size.
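To see why the Type I and Type III rows for gender disagree, the sketch below computes both directly from the growth data. It is a minimal pure-Python illustration, not the PROC GLM algorithm. Type I SS(gender) is the sequential SS(A) built from observation-weighted gender means. Type III SS(gender) uses the single-degree-of-freedom contrast formula $SS = \hat{L}^2 / \sum_{ij} c_{ij}^2 / n_{ij}$ on the unweighted cell means, with $c_{ij} = \pm 1/3$ as in the Type III hypothesis.

```python
from statistics import mean

growth = {
    ("Male", "Severe"): [1.4, 2.4, 2.2],
    ("Male", "Moderate"): [2.1, 1.7],
    ("Male", "Mild"): [0.7, 1.1],
    ("Female", "Severe"): [2.4],
    ("Female", "Moderate"): [2.5, 1.8, 2.0],
    ("Female", "Mild"): [0.5, 0.9, 1.3],
}
genders = ("Male", "Female")
bones = ("Severe", "Moderate", "Mild")

all_obs = [y for obs in growth.values() for y in obs]
grand_mean = mean(all_obs)

# Type I SS(gender) = SS(A): gender is entered first, every observation weighted equally
ss1_gender = 0.0
for g in genders:
    obs = [y for b in bones for y in growth[(g, b)]]
    ss1_gender += len(obs) * (mean(obs) - grand_mean) ** 2

# Type III SS(gender): contrast of unweighted cell means,
# coefficient +1/3 for each male cell and -1/3 for each female cell
coef = {(g, b): (1 / 3 if g == "Male" else -1 / 3) for g in genders for b in bones}
estimate = sum(c * mean(growth[cell]) for cell, c in coef.items())
ss3_gender = estimate ** 2 / sum(c ** 2 / len(growth[cell]) for cell, c in coef.items())

# MSE from the full (cell means) model: pooled within-cell variation
sse = sum((y - mean(obs)) ** 2 for obs in growth.values() for y in obs)
df_error = len(all_obs) - len(growth)
mse = sse / df_error

print(f"Type I   SS(gender) = {ss1_gender:.5f}, F = {ss1_gender / mse:.2f}")
print(f"Type III SS(gender) = {ss3_gender:.4f}, F = {ss3_gender / mse:.2f}")
```

The observation-weighted gender means are nearly equal (11.6/7 vs 11.4/7), so Type I SS(gender) is tiny. The unweighted averages of the cell means (1.6 vs 1.8) differ more, so the Type III SS is larger, although still far from significant against an MSE of 1.30/8.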
Example: Type III Hypotheses
• Main effect gender:
  $H_0: \tfrac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13}) = \tfrac{1}{3}(\mu_{21} + \mu_{22} + \mu_{23})$
• Main effect bone:
  $H_0: \tfrac{1}{2}(\mu_{11} + \mu_{21}) = \tfrac{1}{2}(\mu_{12} + \mu_{22}) = \tfrac{1}{2}(\mu_{13} + \mu_{23})$
• Treatments are weighted equally; observations are not weighted equally.

General Strategy
• Remember that Type I SS and Type III SS test different null hypotheses.
• Type III SS are preferred when sample sizes are unequal, but can be somewhat misleading if the sample sizes differ greatly.
• Type IV SS are appropriate if there are empty cells.
• Type II/IV SS can be obtained if necessary by adding the options / ss1 ss2 ss3 ss4 to the MODEL statement.

Example: Type III SS
Source     DF   Type III SS    MS     F Value   Pr > F
gender      1    0.1200     0.1200      0.74    0.4152
bone        2    4.1897     2.0949     12.89    0.0031
gen*bone    2    0.0754     0.0377      0.23    0.7980
• The interaction and gender effects are not significant.
• Next, compare the different levels of bone. We should not "change" models at this point, so we need to average over gender.

Multiple Comparisons
• Suppose we keep the model as is and examine the effect of bone.
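Averaging over gender means the LSMean for each bone level is the unweighted average of its two cell means, the same weighting as the Type III bone hypothesis. The sketch below computes the LSMeans and their standard errors from the cell means and MSE, and sets them beside the raw (observation-weighted) marginal means. It is an illustration under the model's constant-variance assumption and stops short of any multiple-comparison adjustment such as Tukey, which would need a studentized range quantile.

```python
from math import sqrt
from statistics import mean

growth = {
    ("Male", "Severe"): [1.4, 2.4, 2.2],
    ("Male", "Moderate"): [2.1, 1.7],
    ("Male", "Mild"): [0.7, 1.1],
    ("Female", "Severe"): [2.4],
    ("Female", "Moderate"): [2.5, 1.8, 2.0],
    ("Female", "Mild"): [0.5, 0.9, 1.3],
}
genders = ("Male", "Female")
bones = ("Severe", "Moderate", "Mild")

# MSE from the full model: pooled within-cell SS over (n - number of cells) DF
sse = sum((y - mean(obs)) ** 2 for obs in growth.values() for y in obs)
mse = sse / (sum(len(obs) for obs in growth.values()) - len(growth))

lsmeans, std_errs, raw_means = {}, {}, {}
for b in bones:
    cells = [growth[(g, b)] for g in genders]
    k = len(cells)
    # LSMean: unweighted average of the cell means (each gender counts equally)
    lsmeans[b] = mean(mean(obs) for obs in cells)
    # Var of an average of k independent cell means = MSE / k^2 * sum(1 / n_ij)
    std_errs[b] = sqrt(mse / k ** 2 * sum(1 / len(obs) for obs in cells))
    # Raw marginal mean: every observation counts equally, so cells weigh by n_ij
    raw_means[b] = mean(y for obs in cells for y in obs)

for b in bones:
    print(f"{b:<9} LSMean = {lsmeans[b]:.3f} (SE {std_errs[b]:.4f}), "
          f"raw mean = {raw_means[b]:.3f}")
```

For Severe, the LSMean (2.2) differs from the raw mean (2.1) because the single female observation would otherwise carry only 1/4 of the weight, and its standard error is the largest of the three because that female cell has n = 1. This is the "standard errors will be different" point from the unequal sample size discussion in concrete form.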