ST 370 - Factorial Experiments and ANOVA

Chapter 4

Readings: Chapter 13.1-13.2, Chapter 14.1-14.4

Recap: So far we've learned:
• Why we want a `random' sample and how to achieve it (Sampling Scheme)
• How to use randomization, replication, and control/blocking to create a valid experiment.
• Now we'll look at a specific type of experiment and how to investigate which factors are important.

Motivating Example: Mentos and Coke

Consider an experiment where we want to determine the effect of initial volume (591 ml, 1000 ml, or 2000 ml) on the % of Coke expelled when Mentos (the freshmaker) are dropped in. A CRD (completely randomized design) was used.

What is the response? Factor(s)? Level(s)? Treatments?

What parameters might answer our question?

Suppose we collect data. Consider the following two hypothetical sets of boxplots for the data:

Which set of boxplots gives more evidence that the true means differ?

Although we'll never know the true values of the parameters, we can use our sample data to estimate them.

How can we use these estimates to make a claim?

One-Way Analysis of Variance (ANOVA) Model (Used to analyze a CRD):

Consider the data below.

We `fit' the following model to this completely randomized design:

Is the factor important in our One-Way ANOVA model?

How can we estimate each treatment mean?

If these sample means differ by enough, what would this imply?

Main effect - the difference(s) in the mean response when the factor goes from one level to another.

Here, we would have two elements of our main effect. Write down their true values and their estimates.

If the differences are not

Ok, so now we can estimate the treatment means for the Mentos and Coke example. Does there appear to be evidence the factor is important? What information would help?

One-Way ANOVA model in StatCrunch

Remember: Statistics is all about Variation!
Total amount of variation in the data:

The ANOVA (Analysis of Variance) table splits this total variation into different sources to help determine which sources are statistically significant.

The ANOVA table generally has 6 columns:
• Source:
• SS:
• DF: Degrees of Freedom
• MS:
• F-stat:
• P-value:

In One-Way ANOVA we only have 2 sources we care about (recall sources of variation from the design of experiments section!):
• Treatment Effect
• Error

Source: Treatment Effect
• SS(Trt) =
• DF: For t treatments, we have
• Often called
• MS(Trt) =

Source: Error
• SS(E) =
• DF: For N total observations and t treatments, we have
• Often called
• MS(E) =

Table for balanced one-way ANOVA (n observations per treatment, so N = nt):

Source       DF         SS        MS                        F
Treatments   t - 1      SS(T)     MS(T) = SS(T)/(t - 1)     F = MS(T)/MS(E)
Error        t(n - 1)   SS(E)     MS(E) = SS(E)/(N - t)
Total        nt - 1     SS(Tot)

where

$$SS(T) = \sum_{i=1}^{t} \sum_{j=1}^{n} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 = n \sum_{i=1}^{t} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$$

$$SS(E) = \sum_{i=1}^{t} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\bullet})^2$$

$$SS(Tot) = \sum_{i=1}^{t} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\bullet\bullet})^2$$

Notes:
SS(T), written SS(Trt) above, is also called SS(Between) and SS(E) is also called SS(Within).
Treatment DF + Error DF = Total DF
SS(Trt) + SS(E) = SS(Tot)

More on the F-ratio and P-value

Recall Boxplot idea: Consider the following two hypothetical sets of boxplots for the data:

We use the p-value to determine if the F-ratio is `large' enough. If the p-value is less than a pre-specified value (usually 0.05), we say we have evidence that the main effect(s) are not all 0. That is

Idea of a P-value
• P-values are
• Here, p-value represents
• P-value for Initial Volume = 0.0148. Small!

Goals of One-Way ANOVA
• Determine if the factor is related to the response.
• If so, estimate the main effects (factor level differences)

One-Way ANOVA Example: (some description taken from Goosen, 2014)

Consider having 24 pieces of cheese. Color of the cheese is important in terms of consumer satisfaction. We are interested in how the color differs across 4 different types of corn syrup (26, 42, 55, and 62), giving 4 treatments.
A CRD is chosen and we randomly assign each corn syrup type to 6 pieces of cheese (6 replicates for each treatment). As a response, we measure the color using the 3-part CIE L*a*b* Color System.
• `L' reflects the lightness of a sample, from black (L = 0) to white (L = 100) and runs from top to bottom.
• `a' defines the shades from red (positive values) to green (negative values).
• `b' defines the shades from yellow (positive values) to blue (negative values).

All three of these could be treated as responses (and analyzed together), but for our purposes we will only look at the `L' response variable.

Again, we will focus on the means of the population. How might we make inference here?

Define
• µ1 = mean `L' score for all pieces of cheese with corn syrup 26.
• µ2 = mean `L' score for all pieces of cheese with corn syrup 42.
• µ3 = mean `L' score for all pieces of cheese with corn syrup 55.
• µ4 = mean `L' score for all pieces of cheese with corn syrup 62.

1. What is our factor and what are the levels of that factor?

2. What hypothesis do we want to test?

3. The data are given below. Fill in the label column (labeling the observations in terms of y's).

Data and labeling:

Corn Syrup   Replicate #   `L' measurement   Response Label
26           1             51.89
26           2             51.52
26           3             52.69
26           4             52.06
26           5             51.63
26           6             52.73
42           1             47.21
42           2             48.57
42           3             47.57
42           4             46.85
42           5             48.64
42           6             47.49
55           1             41.43
55           2             42.31
55           3             42.31
55           4             41.49
55           5             42.12
55           6             42.65
62           1             45.99
62           2             46.66
62           3             47.35
62           4             45.83
62           5             46.77
62           6             47.88

4. Summary statistics (from SAS) and a boxplot of the data are given below. Based on these summary measures, does there appear to be a relationship between the `L' score and corn syrup amount? Justify your answer.

Analysis of this data can be done in SAS using the following code (assume the data is read in as cheese):

    proc anova data=cheese;
      class syrup;
      model L = syrup;
      means syrup / tukey;
    run;

5. Why do we have 3 degrees of freedom for the model (aka treatment)?

6. What is the number 313.39 and what does that value represent for this data?

7. If the value for SS(Trt) were missing but you had SS(Tot) and SS(E), how could you find SS(Trt)?

8. What is the relationship between the Mean Square values and the F-ratio?

9. What does the MSE value of 0.41378 represent?

10. Based on the p-value, what is your conclusion about the `hypothesis' we are testing?

11. What does the p-value mean in words for this experiment?

12. Since we have a significant factor, we would want to estimate the main effects. Give an estimate of the main effect of syrup 42 vs syrup 26 and also for syrup 55 vs syrup 26.

Factorial designs
• This idea of modeling can be extended to when we look at more than one factor at a time.
• Factorial experiment:
• Full factorial design (henceforth factorial design):

Example: A gardener wants to look at the effect of water and fertilizer on crop yield.
• factor A: Water (levels = Low, High)
• factor B: Fertilizer (levels = Nitrogen, Phosphate)

Treatments?

Note: In general, there is no need to look at all treatment combinations. However, we will only consider full factorial experiments.

Why do we want to look at more than one variable at a time?

Notation for factorial design
• 2x2 factorial design =
• 3x4 factorial design =
• 2x3x4 factorial design =

The total # of treatments is found by multiplying the #s of levels. How many treatments for each design above?

Recall: Goals of One-Way ANOVA
• Determine if the factor is related to the response.
• If so, estimate the main effects (factor level differences)

Goals of factorial data analysis?
•
•
•
•

Basically, which factors are important and then estimation of the appropriate effects.

Two-way ANOVA example: Back to Mentos and Coke Example

A CRD is run, the response is still percent of Coke expelled, but consider a second factor in this experiment.
• Factor A: initial volume (591 ml, 1000 ml, or 2000 ml)
• Factor B: # of Mentos (4 or 8)

What type of factorial design is this? How many total treatments?

What are the parameters of interest now?

We might assume the factors act independently of one another, called an additive (or main effects only) model.

Two-Way (additive) ANOVA Model

First, consider a different parametrization of the One-Way ANOVA Model.

Back to Two-Way (additive) ANOVA model

Assuming the factors act independently, the two-way additive ANOVA model is

Now we can get the `treatment means' with these parameters:
• µ11 = Mean for 591 ml/4 Mentos group =
• µ21 = Mean for 1000 ml/4 Mentos group =
• µ31 = Mean for 2000 ml/4 Mentos group =
• µ12 = Mean for 591 ml/8 Mentos group =
• µ22 = Mean for 1000 ml/8 Mentos group =
• µ32 = Mean for 2000 ml/8 Mentos group =

How can we estimate each treatment mean?
• µ̂11 =
• µ̂21 =
• µ̂31 =
• µ̂12 =
• µ̂22 =
• µ̂32 =

How to determine if the Factors are Important?

If the sample means involving the same level of factor B differ by enough, then we would say factor A is important!

If the sample means involving the same level of factor A differ by enough, then we would say factor B is important!

These are investigations of the main effects!

Let's estimate the treatment means and main effects for this example.
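
The treatment labels used above (µ11 through µ32) come from crossing every level of Factor A with every level of Factor B. The following is a minimal Python sketch of that bookkeeping, not part of the course software (the notes use StatCrunch and SAS). The names `volumes_ml`, `mentos_counts`, `count_treatments`, and `labels` are made up for illustration; only the factor levels come from the notes.

```python
from itertools import product
from math import prod

# Factor levels taken from the Mentos and Coke example in the notes.
volumes_ml = [591, 1000, 2000]   # Factor A: initial volume
mentos_counts = [4, 8]           # Factor B: number of Mentos


def count_treatments(*levels_per_factor):
    """Treatments in a full factorial design: multiply the numbers of levels."""
    return prod(levels_per_factor)


# Each treatment is one (level of A, level of B) combination.
treatments = list(product(volumes_ml, mentos_counts))

# Label the treatment means mu_ij: i indexes volume, j indexes Mentos count
# (this matches the mu11 ... mu32 labeling used in the notes).
labels = {
    f"mu{i}{j}": (vol, cnt)
    for i, vol in enumerate(volumes_ml, start=1)
    for j, cnt in enumerate(mentos_counts, start=1)
}

if __name__ == "__main__":
    print(len(treatments), "treatments in the Mentos design")
    for name, (vol, cnt) in labels.items():
        print(f"{name}: {vol} ml, {cnt} Mentos")
    # The three designs listed under "Notation for factorial design".
    for design in [(2, 2), (3, 4), (2, 3, 4)]:
        print("x".join(map(str, design)), "->", count_treatments(*design))
```

The same `count_treatments` helper applies to any full factorial design, since the count depends only on how many levels each factor has.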

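
Returning to the one-way cheese example: the sums of squares in the balanced one-way ANOVA table can be computed directly from the data. The sketch below is plain Python written for illustration, not the SAS procedure the notes use. The function name `one_way_anova` and the dictionary keys are hypothetical, and the p-value is deliberately left out because it needs an F distribution routine. The data values are copied from the table in the notes, and the resulting MS(E) and SS(Tot) should agree with the 0.41378 and 313.39 quoted in the questions.

```python
# `L' measurements copied from the data table in the notes,
# keyed by corn syrup type (6 replicates per treatment).
cheese = {
    26: [51.89, 51.52, 52.69, 52.06, 51.63, 52.73],
    42: [47.21, 48.57, 47.57, 46.85, 48.64, 47.49],
    55: [41.43, 42.31, 42.31, 41.49, 42.12, 42.65],
    62: [45.99, 46.66, 47.35, 45.83, 46.77, 47.88],
}


def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a dict {treatment: observations}."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

    # SS(Trt): variation of the treatment means around the grand mean.
    ss_trt = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    # SS(E): variation of observations around their own treatment mean.
    ss_err = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    # SS(Tot): variation of observations around the grand mean.
    ss_tot = sum((y - grand_mean) ** 2 for y in all_obs)

    df_trt, df_err = t - 1, n_total - t
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    return {
        "means": means,
        "df_trt": df_trt, "df_err": df_err,
        "ss_trt": ss_trt, "ss_err": ss_err, "ss_tot": ss_tot,
        "ms_trt": ms_trt, "ms_err": ms_err,
        "f_stat": ms_trt / ms_err,
    }


result = one_way_anova(cheese)

# Estimated main effects relative to syrup 26: differences of sample means.
effect_42_vs_26 = result["means"][42] - result["means"][26]
effect_55_vs_26 = result["means"][55] - result["means"][26]

if __name__ == "__main__":
    for key in ("df_trt", "df_err", "ss_trt", "ss_err", "ss_tot", "ms_err", "f_stat"):
        print(key, round(result[key], 5))
    print("42 vs 26:", round(effect_42_vs_26, 3))
    print("55 vs 26:", round(effect_55_vs_26, 3))
```

Because the function weights each treatment mean by its own sample size, it does not rely on the design being balanced, although the table in the notes is stated for the balanced case.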