Confidence Intervals in Analysis and Reporting of Clinical Trials
Guangbin Peng, Eli Lilly and Company, Indianapolis, IN

ABSTRACT
Regulatory agencies around the world have recommended reporting confidence intervals for treatment differences along with the results of significance tests. SAS provides easy and convenient ways to produce confidence intervals using procedures such as PROC GLM and PROC UNIVARIATE in conjunction with ODS (Output Delivery System). In this paper, I will discuss the relationship between significance tests and confidence intervals, summarize the types of confidence intervals used in clinical study reports, and provide examples from clinical trials to illustrate the computation of distribution-dependent confidence intervals for the mean treatment difference and distribution-free confidence intervals for the median response within each treatment group using SAS.

INTRODUCTION
Confidence interval estimation and significance testing (hypothesis testing) are the two most commonly used methods for clinical trials (Walker, 1997). Because p-values have been widely presented and, in the past, more easily obtained from standard statistical software than have confidence intervals, significance tests have also been more widely accepted by the medical community than have confidence intervals. The extensive use of significance tests with clinical trial data has further increased their popularity and made confidence intervals less popular. However, the overuse and misinterpretation of significance tests have led to calls for more frequent use of confidence intervals (Simon, 1993).

There is a close relationship between confidence intervals and significance tests (Hahn and Meeker, 1991); in fact, a confidence interval can often be used to test a hypothesis. If the 100(1-α)% confidence interval for the mean treatment difference in a clinical trial does not contain zero, there is evidence to indicate a treatment difference at the 100α% significance level. This strategy is equivalent to the hypothesis test that rejects the null hypothesis of no mean treatment difference at the level of α.

Compared to p-values, confidence intervals are generally more informative. They provide quantitative bounds that express the uncertainty inherent in estimation, instead of merely an accept or reject statement. The length of a confidence interval depends on the sample size; this influence of sample size is evident from observing the length of the interval, while this is not the case for a significance test. So confidence intervals are usually more meaningful than statistical hypothesis tests alone. Moreover, they are easier to explain to those with no formal training in statistics (Hahn and Meeker, 1991).

Regulatory agencies around the world have emphasized in recent years the importance of reporting confidence intervals in clinical study reports. The ICH Harmonised Tripartite Guideline on statistical principles for clinical trials states, “Estimates of treatment effects should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified…. it is important to bear in mind the need to provide statistical estimates of the size of treatment effects together with confidence intervals (in addition to significance tests).”

In this paper, I will discuss the types of confidence intervals and the different methods for constructing them. I will also illustrate the calculation of confidence intervals for clinical trial data by providing sample SAS code and output. Finally, I will discuss the advantages of presenting confidence intervals along with p-values in clinical study reports.

CHARACTERISTICS OF CONFIDENCE INTERVALS
Because there exist a variety of confidence intervals, the analyst must determine which type of interval to use depending on the application (Hahn and Meeker, 1991). Two commonly used types are confidence intervals for population parameters and confidence intervals for distribution percentiles. The most frequently used type of confidence interval attempts to capture the population mean. Sometimes, however, the analyst may construct confidence intervals for other parameters of a distribution, such as shape parameters, to meet the needs of the application. When the assumed distribution parameters are not suitable to describe the sampled population, the analyst may focus on one or more percentiles of the sampled distribution and construct confidence intervals for them (for example, for the median or the quartiles). Sometimes one-sided confidence bounds are desired in situations where the major interest is restricted to the lower limit or the upper limit alone.

All statistical intervals have an associated confidence level. The analyst must determine the confidence level based on what seems to be an acceptable degree of assurance for the specific application (Hahn and Meeker, 1991). For instance, the analyst may construct a 95% confidence interval for the mean. This indicates that the method of construction guarantees that 95% of all such intervals will contain the (true) population mean. (Of course, this also implies that 5% of them will not.) One can request a higher level of confidence, which will reduce the chances of obtaining an interval that does not contain the population mean. However, increasing the confidence level results in a wider (that is, less precise) interval for a fixed sample size. On the other hand, when there is a fixed confidence level, the length of intervals becomes shorter as the sample size increases. So, the analyst may choose higher confidence levels with large samples and lower confidence levels with small samples. In some cases, obtaining meaningful confidence intervals becomes impractical because of the small sample size or complexity of analysis.

CONFIDENCE INTERVALS FOR CLINICAL TRIALS
The most frequently used confidence interval for clinical trial data is the 95% confidence interval for the mean treatment difference. The selection of 95% for the confidence level is common across disciplines. The reason for this selection seems quite obvious for clinical trials, especially for confirmatory trials as a result of the study design, and the 95% confidence level provides reasonable assurance along with adequate precision for most trials. The methods for calculating these confidence intervals can be generally put into two categories: distribution-dependent and distribution-free methods.
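
The trade-off among confidence level, sample size, and interval length described under CHARACTERISTICS OF CONFIDENCE INTERVALS can be illustrated with a short DATA step. This is only a sketch under stated assumptions: it uses the t-based interval for a single mean, and the standard deviation (sd), the sample sizes, and the confidence levels are hypothetical values chosen for illustration, not data from any trial in this paper.

```sas
/* Sketch: half-width of a t-based confidence interval for one mean.    */
/* sd, n and level are hypothetical inputs chosen for illustration.     */
data ci_width;
   sd = 10;                          /* assumed sample standard deviation */
   do n = 10, 40, 160;               /* assumed sample sizes              */
      do level = 0.90, 0.95, 0.99;   /* confidence levels to compare      */
         alpha = 1 - level;
         /* half-width = t(1-alpha/2, n-1) * sd / sqrt(n) */
         halfwidth = tinv(1 - alpha/2, n - 1) * sd / sqrt(n);
         output;
      end;
   end;
run;

proc print data=ci_width noobs;
   var n level halfwidth;
   format halfwidth 6.2;
run;
```

Within each sample size the half-width grows with the confidence level, and within each confidence level it shrinks as n increases, which is the behavior the text describes.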

Construction of distribution-dependent confidence intervals requires one to assume a particular distribution, such as the normal distribution. From experience, the normal assumption appears to be valid for many clinical trial data analyses. However, it may be inappropriate to calculate distribution-dependent confidence intervals when the assumed distribution does not fit the data well. In such cases, distribution-free confidence intervals should be constructed. A distribution-free interval sometimes may not exist, and its length is generally longer than the corresponding distribution-dependent interval for a particular distribution. This is the price that one pays for not making the distribution assumption (Hahn and Meeker, 1991). So, a distribution-dependent confidence interval should be chosen whenever there is solid evidence that the data follow a tractable distribution.

DISTRIBUTION-DEPENDENT CONFIDENCE INTERVAL
If the assumption that the data are normally distributed is valid, one can construct confidence intervals for the mean treatment difference. The general form of a confidence interval for the mean difference between two treatment groups (Group A and Group B) is

$$\bar{Y}_a - \bar{Y}_b \pm t_{1-\alpha/2,\,df} \cdot S_{(\bar{Y}_a - \bar{Y}_b)} \qquad (1)$$

where $\bar{Y}$ is the mean or least squares mean, $S_{(\bar{Y}_a - \bar{Y}_b)}$ is the standard error of the estimate of $(\bar{Y}_a - \bar{Y}_b)$, and $t_{1-\alpha/2,\,df}$ is the 100(1-α/2)th percentile of Student's t distribution with df degrees of freedom.

In practice, one can calculate a 95% confidence interval for the mean difference through analysis of variance, although it is critical to obtain the degrees of freedom and estimate the variance (or mean square error) correctly. Before the release of SAS version 8, the analyst had to extract these values from the SAS procedures and then calculate the confidence intervals using the formula above (or one of its variations) with custom-written SAS code. The following code using SAS version 6.09 represents one way to obtain the 95% confidence interval for the mean treatment difference from an ANOVA model.

    *------*;
    * The following statements get output *;
    * datasets containing statistics *;
    * needed for calculation of confidence *;
    * interval *;
    *------*;
    proc glm data=final outstat=glmdt noprint;
      class &trt &str &invcd;
      model &dep=&indp;
      lsmeans &trt/pdiff stderr tdiff out=lsmeandt;
    run;

    *------*;
    * The following statements get the *;
    * degree of freedom from the model *;
    *------*;
    data _null_; set glmdt;
      if _type_='ERROR';
      call symput('df', df);
    run;

    *------*;
    * The following statements get the *;
    * LSMEANS from the model *;
    *------*;
    data _null_; set lsmeandt;
      if &trt="&control" then call symput('pLSM', lsmean);
      if &trt="&trt1" then call symput('tLSM', lsmean);
    run;

    *------*;
    * The following statements get the 95% *;
    * confidence interval *;
    * Note that f is the F statistic for the *;
    * treatment effect. It is not created by *;
    * the steps shown and is assumed to be *;
    * available on this data set *;
    *------*;
    data mgmean; set lsmeandt;
      lb=(&tLSM -&pLSM)-tinv(.975, &df)*abs(&tLSM -&pLSM)/sqrt(f);
      ub=(&tLSM -&pLSM)+tinv(.975, &df)*abs(&tLSM -&pLSM)/sqrt(f);
    run;

In this example, the estimated standard error was obtained using the relationship between the t-distribution and F-distribution. Since we have the following two equations:

$$t = (\bar{Y}_a - \bar{Y}_b) / S_{(\bar{Y}_a - \bar{Y}_b)} \qquad (2)$$

$$F = t^2 \qquad (3)$$

$S_{(\bar{Y}_a - \bar{Y}_b)}$ can be estimated using the F-value from the model; that is, $S_{(\bar{Y}_a - \bar{Y}_b)} = |\bar{Y}_a - \bar{Y}_b| / \sqrt{F}$, which is the expression used in the code above.

With the release of SAS version 8 and ODS, obtaining confidence intervals has become an easier task from the programming point of view. The following code illustrates the way to get the same confidence interval using ODS in SAS version 8.

    *------*;
    * The following statements use ods to *;
    * obtain the 95% confidence interval *;
    * for the mean treatment difference *;
    *------*;
    ods listing close;
    ods output LSMeanDiffCL=lmci;
    proc glm data=final;
      class &trt &invcd;
      model &dep=&indp;
      lsmeans &trt/pdiff cl;
    run;
    ods output close;
    ods listing;

    *------*;
    * The following statements put the 95% *;
    * confidence interval for treatment *;
    * difference into macro variables *;
    *------*;
    data _null_; set lmci;
      call symput('LCL',put(LowerCL, 5.2)); **** Lower CI ;
    run;
    data _null_; set lmci;
      call symput('UCL',put(UpperCL, 5.2)); **** Upper CI ;
    run;

In some cases, the analyst is asked to construct confidence intervals for proportions or percentages. For instance, researchers may want to estimate the difference between treatments in the percentage of patients who respond to the therapy. Rather than providing only the chi-square p-value, presenting the confidence interval for the treatment difference in the percentage of responders will be more informative. Using the normal approximation to the binomial distribution, the asymptotic confidence interval is as follows:

$$(P_a - P_b) \pm z_{1-\alpha/2} \cdot S_{(P_a - P_b)} \qquad (4)$$

where $P_a$ and $P_b$ are the percentages of responders in each treatment group and $S_{(P_a - P_b)}$ is the standard error of the difference in percentages. The task will be very simple if one can estimate $S_{(P_a - P_b)}$. In fact, it can be estimated by the quantity

$$\sqrt{P_a(1 - P_a)/n_a + P_b(1 - P_b)/n_b}$$

One can extract the necessary components from PROC FREQ to calculate the confidence interval in this case or use the new option 'RISKDIFF' in SAS version 8 to complete the job.

    *------*;
    * The following statements put the 95% *;
    * confidence interval for treatment *;
    * difference into a data set *;
    *------*;
    proc freq data=&ind noprint;
      table &trt*&var /chisq riskdiff;
      output out=pval PCHI rdif2;
    run;
    data pval; set pval;
      length categ $55.;
      categ="&cat";
      pv=P_PCHI;
      ll=100*L_RDIF2;
      ul=100*U_RDIF2;
      keep pv ul ll categ;
    run;

DISTRIBUTION-FREE CONFIDENCE INTERVAL
Without requiring that the data follow a particular distribution, one can construct distribution-free confidence intervals using order statistics. However, it is generally impossible to obtain an interval with precisely the desired confidence level, which has severely limited their use. In order to obtain an interval with the desired confidence level or with an acceptable length, relatively large samples are often required (Hahn and Meeker, 1991). After specifying the confidence level, one has to determine the order statistics from the sample and select those order statistics as the endpoints of the interval with at least the specified confidence level.

Generally, the two-sided distribution-free 100(1-α)% confidence interval for Yp, the 100pth percentile, based on a sample of size n is [x(l), x(u)], where x(j) is the jth order statistic. The lower rank l and upper rank u are integers that are symmetric or nearly symmetric around i = [np] + 1, where [np] is the integer part of np (the non-parametric estimate of the 100pth percentile lies between x([np]) and x([np]+1)). Determination of l and u requires a stepwise process:

1. Let l and u be the two integers closest to p(n+1). (If p(n+1) is itself an integer, then l and u will be the same.)
2. Decrement l and increment u by one until the following constraint is met:

$$Q_b(u-1;\, n, p) - Q_b(l-1;\, n, p) \ge 1 - \alpha \qquad (5)$$

where $Q_b$ is the cumulative binomial probability.

    *------*;
    * The opening ODS statements are missing *;
    * from the source. An ODS OUTPUT statement *;
    * that creates the data set pctldf is *;
    * assumed to precede this step *;
    *------*;
    proc univariate data=final loccount modes cibasic(alpha=.05)
                    cipctldf(TYPE=ASYMMETRIC alpha=.05);
      var P&var;
      by &trt;
    run;

    data pct; set pctldf;
      if quantile='50% Median';
    run;
    ods output close;
    ods listing;

    *------*;
    * The following statements put the 95% *;
    * confidence interval into macro *;
    * variables *;
    *------*;
    data _null_; set pct;
      if &trt="&trt1";
      call symput('lclddf', put(LCLDistFree, 7.2));
    run;
    data _null_; set pct;
      if &trt="&trt1";
      call symput('uclddf', put(UCLDistFree, 7.2));
    run;

In clinical trials, the primary objective usually is to demonstrate a difference between treatments. It may be hard to draw meaningful inference from the significance test alone when using non-parametric or even more complicated approaches. Providing a confidence interval can help to interpret the significance of the finding and quantify the treatment effect to a certain degree.
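
The stepwise search for the ranks l and u under constraint (5) can be sketched in a DATA step. This is a hedged illustration rather than the algorithm PROC UNIVARIATE actually uses: the sample size n, the percentile p, and alpha are hypothetical inputs, the ranks are clipped to the range 1 to n as an added safeguard, and the SAS function CDF('BINOMIAL', m, p, n) is assumed to play the role of the cumulative binomial probability Qb(m; n, p).

```sas
/* Sketch: ranks (l, u) of the order statistics that bound a             */
/* distribution-free confidence interval for the 100p-th percentile.     */
/* n, p and alpha are hypothetical inputs chosen for illustration.       */
data df_ranks;
   n = 100;  p = 0.5;  alpha = 0.05;

   /* Step 1: the two integers closest to p(n+1), kept within 1..n */
   l = max(floor(p*(n + 1)), 1);
   u = min(ceil(p*(n + 1)), n);

   /* Left side of constraint (5): Qb(u-1; n, p) - Qb(l-1; n, p) */
   coverage = cdf('BINOMIAL', u - 1, p, n) - cdf('BINOMIAL', l - 1, p, n);

   /* Step 2: move both ranks outward by one until (5) holds or the */
   /* sample is exhausted                                           */
   do while (coverage < 1 - alpha and (l > 1 or u < n));
      if l > 1 then l = l - 1;
      if u < n then u = u + 1;
      coverage = cdf('BINOMIAL', u - 1, p, n) - cdf('BINOMIAL', l - 1, p, n);
   end;

   /* achieved = 0 flags a sample too small for the requested level */
   achieved = (coverage >= 1 - alpha);
   put l= u= coverage= achieved=;
run;
```

The interval is then [x(l), x(u)] taken from the sorted sample. The achieved flag reflects the point made above that an interval with the desired confidence level may not exist for small samples.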

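Equation (4) for the difference in responder percentages can also be evaluated directly in a DATA step, which may be useful as a check on the limits returned by the RISKDIFF option. The responder counts (xa, xb) and group sizes (na, nb) below are hypothetical, and the variable names are illustrative only.

```sas
/* Sketch: asymptotic (normal approximation) confidence interval for the */
/* difference between two response proportions, following equation (4).  */
/* The counts below are hypothetical, not taken from any trial.          */
data prop_ci;
   na = 100;  xa = 60;       /* group A: subjects and responders */
   nb = 100;  xb = 45;       /* group B: subjects and responders */
   alpha = 0.05;

   pa = xa/na;
   pb = xb/nb;

   /* standard error of (pa - pb) */
   se = sqrt(pa*(1 - pa)/na + pb*(1 - pb)/nb);

   /* z(1-alpha/2): standard normal quantile */
   z = probit(1 - alpha/2);

   lower = (pa - pb) - z*se;
   upper = (pa - pb) + z*se;

   /* express the limits as percentages, as in the PROC FREQ step */
   ll = 100*lower;
   ul = 100*upper;
   put pa= pb= se= ll= ul=;
run;
```

If the resulting interval excludes zero, the inference agrees with rejecting the hypothesis of no difference at the corresponding α level.
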
Usually, one designs a trial to detect a treatment difference using a pre-specified statistical test at an α level of 5% (two-sided test), so that a 95% confidence interval is the equivalent choice and will provide inference consistent with the significance test. This close relationship holds because the analysis and the confidence interval rest on a common basis. If the significance test is based on least squares means under the generalized linear model, the inference from the confidence interval for the difference of LSMEANs will agree with the conclusion drawn from the significance test. However, when the primary analysis takes a non-parametric approach, the analyst has to be cautious in constructing the distribution-free confidence interval. When the treatment effect is strong, there will likely be agreement in inference no matter which method is used. But ambiguity may appear between several approaches when the treatment effect is not clearly identifiable. In Table 2, the p-value from van Elteren's test is near the nominal significance level. The overlap of the confidence intervals for median percent change provides additional information about this uncertainty that the significance test alone cannot.

CONCLUSION
The reporting of p-values alone is insufficient and can sometimes be misleading. Confidence intervals are generally more informative than p-values. They provide quantitative bounds that express the uncertainty inherent in estimation, instead of merely an accept or reject statement. On the other hand, inference from confidence intervals is equivalent to that obtained from the associated (same distributional assumption) hypothesis test. Therefore, reporting confidence intervals along with significance tests will improve the understanding of the findings from clinical trials. This paper has presented the types of confidence intervals and their properties and demonstrated how to construct both distribution-dependent and distribution-free confidence intervals using SAS. Although these processes were challenging prior to the release of SAS version 8, they are now very straightforward and can be implemented easily with the code presented here.

Table 1
LOCF Analysis Summary
All Randomized Subjects

Treatment                ------------------ Results ------------------
Group      Timepoint         n    Mean  Median     SD      Min     Max  95% CI {a}        p-Value {b}
------------------------------------------------------------------------------------------------------
Group 1    Baseline        322   18.97   15.00  14.62     0.00  103.00
(N=339)    Endpoint              14.75   10.31  14.46     0.00   89.00
           Change                -4.22   -3.00  13.03   -87.00   36.00
           PercentChange         -9.17  -27.48  93.34  -100.00  700.00  (-34.69,-18.18)

Group 2    Baseline        286   18.17   14.00  14.26     0.00   87.00
(N=344)    Endpoint               9.77    7.00   9.92     0.00   53.67
           Change                -8.40   -6.94  11.39   -62.00   35.25
           PercentChange        -34.31  -50.00  78.28  -100.00  550.00  (-56.04,-43.75)   <.001

{a} 95% confidence interval for the median percent change was obtained using non-parametric methods for order statistics.
{b} P-value is obtained from van Elteren's test for the analysis of the percent changes.

Table 2
LOCF Analysis Summary
All Randomized Subjects

Treatment                ------------------ Results ------------------
Group      Timepoint         n    Mean  Median     SD      Min     Max  95% CI {a}        p-Value {b}
------------------------------------------------------------------------------------------------------
Group 1    Baseline        229   18.33   14.70  15.50     1.00   82.00
(N=231)    Endpoint              11.55    7.00  12.43     0.00   66.00
           Change                -6.79   -5.50  10.71   -72.00   31.70
           PercentChange        -31.94  -40.00  59.53  -100.00  316.67  (-50.00,-33.33)

Group 2    Baseline        200   18.49   14.00  15.24     1.00   92.00
(N=227)    Endpoint              10.00    6.00  11.32     0.00   57.56
           Change                -8.49   -7.00  12.45   -88.50   26.25
           PercentChange        -41.27  -53.55  53.11  -100.00  240.00  (-60.00,-42.42)   .050

{a} 95% confidence interval for the median percent change was obtained using non-parametric methods for order statistics.
{b} P-value is obtained from van Elteren's test for the analysis of the percent changes.

REFERENCES
Hahn, G. J. and Meeker, W. Q. (1991). Statistical Intervals: A Guide for Practitioners, New York: John Wiley & Sons, Inc.

ICH Harmonised Tripartite Guideline: Statistical Principles for Clinical Trials.

Simon, R. (1993). Why Confidence Intervals are Useful Tools in Clinical Therapeutics. Journal of Biopharmaceutical Statistics, 3(2), 243-248.

Walker, G. A. (1997). Common Statistical Methods for Clinical Research with SAS Examples, SAS Institute Inc.

ACKNOWLEDGEMENT
I owe special thanks to Scott Beattie, who reviewed this paper in depth and provided valuable comments. I would also like to thank Yan Zhao for his valuable input.

CONTACT INFORMATION
Guangbin Peng
Statistical Analyst
Primary Care Product Team
Lilly Corporate Center
Indianapolis, Indiana 46285 U.S.A.
Phone: 317 433 8445
Fax: 317 276 4789
Email: [email protected]