Topic 2: Completely Randomized Design
Total Page:16
File Type:pdf, Size:1020Kb
Lecture & Examples
Topic 2: Completely Randomized Design
The completely randomized design is the simplest form of experimental designs. In a completely randomized design, each treatment is applied to each experimental unit completely by chance. Although the completely randomized design is very simple, it has many advantages:
(1) It is very flexible. Any number of treatments and any number of experimental units can be used. (2) The statistical analysis is easy even if the number of experimental units in different treatments is different. (3) The statistical analysis remains simple in the presence of missing values. We can actually prove that the relative loss of information due to missing data is smaller than any other experimental designs. Completely randomized design is extremely attractive in the case when experimental units are homogeneous. For example, it is the method of choice for many laboratory experiments, e.g., in physics, chemistry, or cookery, where a quantity of material, after through mixing, is divided into small samples or batches to which the treatments are applied. Steps for Conducting an Analysis of Variance (ANOVA) for a Completely Randomized Design:
Step 1: To make sure the data come from a completely randomized design. This means that each treatment is assigned to an experimental unit completely by chance.
Step 2: Use a graphical procedure such as box-plots or dot-plots to visualize the equal variance assumption. The normality assumption is guaranteed if the data truly comes from a completely randomized design. Suppose that the equal variance assumption is not satisfied, you can either use a nonparametric method, discussed in Chapter 15, or find a statistical consultant to get help. The method discussed in Chapter 15 is just one alternative when the equal variance assumption is not satisfied. It is not the "best" alternative.
Step 3: Create an ANOVA table that is similar to Table 10.1 below with any statistical package such as SAS (used in this lecture) or SPSS. In Table 10.1, we assume that there are p treatments and n experimental units. Source df SS MS F Treatment (p 1) SST MST(a) MST/MSE Error (n p) SSE MSE(b) Total (n 1) SS(Total)
(a) MST = SST/(p 1) (b) MSE = SSE/(n p)
Note: (1) The degrees of freedom for treatment is (p 1) in a completely randomized design with p treatments. (2) The degrees of freedom for error is (n p) in a completely randomized design with p treatments and n experimental units. (3) SST + SSE = SS(Total) (4) We can complete the ANOVA table if we know any two of these three quantities: SST, SSE, or SS(Total). (5) We can complete the ANOVA table if we know both MSE and MST. (6) Partially complete ANOVA table will always be available in exams or practice problems.
Step 4: Use the F-test provided in the ANOVA table to perform the following test:
H0: 1 = 2 = . . . = p Ha: At least two treatment means are different. The output from SAS or SPSS always includes the p- value of the above F-test. We can reject the null hypothesis at significance level when "p-value ."
Step 5: Suppose the F-test suggests that one can reject the null hypothesis, one can then use the multiple comparison procedures discussed in Section 10.3 to find the mean differences. However, all of those multiple comparison procedures discussed are very conservative and will not be discussed this semester.
Example 10.4:
A manufacturer of television sets is interested in the effect on tube conductivity of four different types of coating for color picture tubes. The following conductivity data are obtained. Coating Type Conduc 1 143 141 150 146 2 152 149 137 143 3 134 136 132 127 4 129 127 132 129
(a) Complete the following ANOVA table. General Linear Models Procedure Dependent Variable: RESP Sum of Mean Source DF Squares Square F Value Pr > F Model (1) 844.68750 (5) (7) .00029 Error (2) (4) (6) Corrected Total(3) 1080.93750
Solution: n = 16, p = 4, df(Treatment) = (p 1), df(Error) = (n p), df(Total) = (n 1), SSE = SS(Total) SST, MST = SST/(p 1), MSE = SSE/(n p), F = MST/MSE
Note: In the above SAS printout, Model = Treatment and Pr > F = p-value.
General Linear Models Procedure Dependent Variable: RESP Sum of Mean Source DF Squares Square F Value Pr > F Model 3 844.68750 281.56250 14.30 .00029 Error 12 236.25000 19.68750 Corrected Total 15 1080.93750
(b) Test the null hypothesis that 1 = 2 = 3 = 4, against the alternative that at least two of the means differ. Use = 0.05.
Solution:
H0: 1 = 2 = 3 = 4 Ha: At least two means differ Test Statistic: Fc = 14.30 p-value: p-value = 0.00029 Since the p-value = 0.00029 < = 0.05, one can reject the null hypothesis.
Example 10.5:
A manufacturer suspects that the batches of raw material furnished by her supplier differ significantly in calcium content. There is a large number of batches currently in the warehouse. Five of these are randomly selected for study. A chemist makes five determinations on each batch and obtains the following data.
Batch 1 Batch 2 Batch 3 Batch 4 Batch 5 23.46 23.59 23.51 23.28 23.29 23.48 23.46 23.64 23.40 23.46 23.56 23.42 23.46 23.37 23.37 23.39 23.49 23.52 23.46 23.32 23.40 23.50 23.49 23.39 23.38
(a) Complete the following ANOVA table.
General Linear Models Procedure Dependent Variable: RESP Sum of Mean Source DF Squares Square F Value Pr > F Model (1) (4) 0.0242440 (7) .00363 Error (2) (5) 0.0043800 Corrected Total(3) (6)
Solution: n = 25, p = 5, SST = (MST)(p 1), SSE = (MSE)(n p), SS(Total) = SST + SSE
General Linear Models Procedure Dependent Variable: RESP Sum of Mean Source DF Squares Square F Value Pr > F Model 4 0.0969760 0.0242440 5.54 .00363 Error 20 0.0876000 0.0043800 Corrected Total 24 0.1845760
(b) Is there a significant variation in calcium content from batch to batch? Use = 0.05.
Solution:
H0: 1 = 2 = 3 = 4 = 5 Ha: At least two means differ Test Statistic: Fc = 5.54 p-value: p-value = 0.00363 Since the p-value = 0.00363 < = 0.05, one can reject the null hypothesis that all means are equal. Thus, there is a significant variation in calcium content from batch to batch, for = 0.05.