A Simple Sample Size Formula for Analysis of Covariance in Randomized Clinical Trials George F
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Clinical Epidemiology 60 (2007) 1234e1238 ORIGINAL ARTICLE A simple sample size formula for analysis of covariance in randomized clinical trials George F. Borma,*, Jaap Fransenb, Wim A.J.G. Lemmensa aDepartment of Epidemiology and Biostatistics, Radboud University Nijmegen Medical Centre, Geert Grooteplein 21, PO Box 9101, NL-6500 HB Nijmegen, The Netherlands bDepartment of Rheumatology, Radboud University Nijmegen Medical Centre, Geert Grooteplein 21, PO Box 9101, NL-6500 HB Nijmegen, The Netherlands Accepted 15 February 2007 Abstract Objective: Randomized clinical trials that compare two treatments on a continuous outcome can be analyzed using analysis of covari- ance (ANCOVA) or a t-test approach. We present a method for the sample size calculation when ANCOVA is used. Study Design and Setting: We derived an approximate sample size formula. Simulations were used to verify the accuracy of the for- mula and to improve the approximation for small trials. The sample size calculations are illustrated in a clinical trial in rheumatoid arthritis. Results: If the correlation between the outcome measured at baseline and at follow-up is r, ANCOVA comparing groups of (1 À r2)n subjects has the same power as t-test comparing groups of n subjects. When on the same data, ANCOVA is usedpffiffiffiffiffiffiffiffiffiffiffiffiffi instead of t-test, the pre- cision of the treatment estimate is increased, and the length of the confidence interval is reduced by a factor 1 À r2. Conclusion: ANCOVA may considerably reduce the number of patients required for a trial. Ó 2007 Elsevier Inc. All rights reserved. Keywords: Power; Sample size; Precision; Analysis of covariance; Clinical trial; Statistical test 1. Introduction available so far. Consequently, when ANCOVA is planned for a trial, this is usually not taken into account in the de- Randomized clinical trials (RCTs) that compare treat- termination of the sample size, leading to unnecessarily ment A with treatment B on a continuous outcome measure large trials. can be analyzed in several ways. A straightforward option We propose a two-step method for the sample size cal- is to compare the follow-up scores at the end of the treat- culation. First, the sample size is calculated as if a t-test ment period (Y ) using a t-test or analysis of variance (AN- 1 on the follow-up scores were carried out, then the number OVA). When the outcome is also measured at baseline (Y ), 0 of subjects is multiplied by a ‘‘design factor’’ to produce Y Y the change scores ( 1 À 0) between the treatment groups the number of subjects required for the ANCOVA. As the can be compared, again using a t-test. Another approach power of an ANCOVA with dependent variable Y À Y is is to use analysis of covariance (ANCOVA) and to analyze 1 0 the same as the power of an ANCOVA with variable Y , Y or Y Y in a linear regression model that includes 1 1 1 À 0 we only discuss the latter method. treatment group and Y0 as independent covariates (Y1jY0 or Y1 À Y0jY0). An advantage of the use of ANCOVA is that it adjusts 2. Methods for baseline differences between the treatment groups. AN- COVA also has more statistical power than the t-test, so We assumed that Y0 and Y1 were the baseline and out- sample size requirements are lower [1e3]. Although this come variables, respectively, of a clinical trial with two is commonly known, to our knowledge, simple methods treatment groups. The standard deviation (SD) and the cor- for the sample size calculation for ANCOVA have not been relation between Y0 and Y1 were known. We then calculated the conditional variance of Y1jY0. Based on this result we determined the design factor, that is, the ratio between * Corresponding author. Tel.: þ31-24-361-7667; fax: þ31-24-361- the number of subjects required for an ANCOVA and the 3505. number of subjects required for a t-test. In practice, the E-mail address: [email protected] (G.F. Borm). SD and correlation are (implicitly) estimated as a part of 0895-4356/07/$ e see front matter Ó 2007 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2007.02.006 G.F. Borm et al. / Journal of Clinical Epidemiology 60 (2007) 1234e1238 1235 pffiffiffiffiffiffiffiffiffiffiffiffiffi the ANCOVA procedure. This means that the sample size of t-test, this increases the precision by 1 À r2, that is, the based on known SDs and known correlation coefficient is standard error and the length of the confidence interval (CI) only approximate. We therefore conducted a simulation of the differencepffiffiffiffiffiffiffiffiffiffiffiffiffi between the treatment groups are reduced r2 study to verify the accuracy of our sample size calculation by a factor 1 À . When changepffiffiffiffiffiffiffiffiffiffiffiffiffiffi from baseline is used, method. As the method consists of two steps, we evaluated the precision changes by a factor 2 À 2r.Forr O 0.5 this the accuracy of both steps. is a decrease, but for r ! 0.5 it is an increase. We only show results for trials with treatment groups of equal size, but the results for unequal group sizes are 3.1. Accuracy of sample size calculation methods similar. for the t-test 2.1. Accuracy of sample size calculation methods As expected, the sample sizes calculated by Nquery AdvisorÒ were quite accurate. However, the formula for the t-test 2 2 2 n 5 2(Z1Àa/2 þ Z1Àb) s /(mB À mA) leads to sample sizes To evaluate the first step, that is, the determination of the with too little power. The thin and bold broken lines in sample size required for the t-test, we assumed that the Fig. 1 show the ‘‘true’’ power, when according to the for- treatment differences between the groups ranged from 0.5 mula it should have been 80% and 90%, respectively. The SDs up to 1.5 SDs. For each difference, we calculated the unbroken lines show the power when the formula 2 2 2 sample size that is required for 80% or 90% power, using n 5 2(Z1Àa/2 þ Z1Àb) s /(mB À mA) þ 1 was used. An in- the sample size calculation package Nquery AdvisorÒ or crease of one subject per treatment arm, two in total, leads 2 2 2 the formula n 5 2(Z1Àa/2 þ Z1Àb) s /(mB À mA) [4].For exactly to the required power. each combination of treatment difference and group size, we ran 6,400 trials and estimated the ‘‘true power’’ by cal- 3.2. Accuracy of the sample size calculation method culating the percentage of t-tests that were statistically for ANCOVA significant. The design factor 1 À r2 may not be accurate for small 2.2. Accuracy of the sample size calculation method trials. In Fig. 2a, the true power (vertical axis) is plotted for ANCOVA against the correlation between Y0 and Y1 (horizontal axis). The second step of our sample size method is to multiply 100 the sample size for the t-test by the design factor. We ver- ified the accuracy of the design factor by simulating clinical trials with baseline and outcome variables Y0 and Y1, re- spectively. The differences between the means of Y1 in the groups were chosen in such a way that the required 90 sample sizes as calculated using our method were 10, 25, 50, and 100 per treatment arm. The means of Y0 in the two groups were equal, and the correlation between Y0 and Y1 varied between 0.1 and 0.9. For each combination of group size and correlation, we generated 6,400 trials er and analyzed them using ANCOVA. To estimate the true 80 power, we calculated the percentage of trials with a statisti- Pow cally significant treatment effect. 3. Results 70 In the appendix it is shown that for large trials the design factor (variance deflation factor) for ANCOVA is 1 À r2, where r is the correlation between Y0 and Y1. As a conse- 2 quence, ANCOVA with (1 À r )n subjects has the same 60 power as t-test with n subjects. It is straightforward to cal- 10 20 30 40 50 culate that for a t-test on the change from baseline n (Y À Y ), the design factor is 2 À 2r:ifat-test on Y re- 1 0 1 2 2 2 quires n subjects, then a t-test on the change from baseline Fig. 1. Accuracy of sample size formula n 5 2(Za/2 þ Zb) s /(mB À mA) . The thin and the bold broken lines indicate the true power when the in- requires (2 À 2r)n subjects. tended (nominal) power was 80% and 90%, respectively. The unbroken The design factor can also be used to compare the pre- lines indicate the true power when one extra subject was added per treat- cision of the analysis methods. If ANCOVA is used instead ment group. 1236 G.F. Borm et al. / Journal of Clinical Epidemiology 60 (2007) 1234e1238 100 90 80 Power 70 Power =80%, N= 10 Power =90%, N= 10 Power =80%, N= 25 Power =90%, N= 25 Power =80%, N= 50 Power =90%, N= 50 Power=80%, N=100 Power=90%, N=100 60 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0 Correlation Fig. 2. Accuracy of the sample size calculation method for ANCOVA. (Left) True power when the intended (nominal) power was 80% (thin lines) and 90% (bold lines). (Right) True power after addition of one extra subject per treatment group.