Online Resource 1

Adjusting Overall Survival Estimates After Treatment Switching: A Case Study in Metastatic Castration-Resistant Prostate Cancer

Konstantina Skaltsa1, Cristina Ivanescu2, Shevani Naidoo3, De Phung4, Stefan Holmstrom5, Nicholas R. Latimer6

1 Quintiles Advisory Services, C/ Sardenya, 537 – 539 08024, Barcelona, Spain
2 Quintiles, Siriusdreef 10 Beukenhorst Zuid 2132 WT, Hoofddorp, Netherlands
3 Astellas Medical Affairs, Global Health Economic Outcomes Research (HEOR), 2000 Hillswood Dr, Chertsey, Surrey KT16 0PS, UK
4 Astellas Pharma Global Development, Sylviusweg 62, 2333 BE, Leiden, Netherlands
5 Astellas Medical Affairs, Global HEOR, Sylviusweg 62, 2333 BE, Leiden, Netherlands
6 School of Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, UK

Correspondence: Shevani Naidoo, Astellas Medical Affairs, Global HEOR, 2000 Hillswood Dr, Chertsey, Surrey KT16 0PS, UK; email: [email protected]

1 Methods

Adjustment methodologies for treatment switching, including the two-stage accelerated failure time model (two-stage method) [1] and the inverse probability of censoring weights (IPCW) method [2], have been considered previously [3]. The details below, which describe how these adjustment methods were applied to data from the PREVAIL study [4], are based on the previously published descriptions, to which the reader is referred for more information.

1.1 Application of the Two-Stage Method to Data from the PREVAIL Study

1.1.1 Creation of Observational Datasets and Calculation of Acceleration Factors

For both switchers and nonswitchers, the secondary baseline for the two-stage method was defined as the treatment discontinuation date, and overall survival (OS) was considered from this point onward. Data from patients who did not discontinue treatment were not included in this analysis. As patients originally randomized to both treatment groups (enzalutamide and placebo) switched to nonstandard antineoplastic therapy, these two arms comprised separate observational datasets. This allowed for potential differences in the effects of the post-study therapy depending on the originally randomized treatment.

As covariates were measured at specific clinic visits and not at treatment discontinuation, the last assessment of each covariate was taken as its value at the secondary baseline. If a covariate was not measured at the last visit before treatment discontinuation, a last observation carried forward (LOCF) approach was used (i.e. the value from the previous visit was carried forward). For patients in whom specific covariates had never been measured, sample mean imputation was used to provide baseline values, which were then carried forward.

Having created the observational datasets, two accelerated failure time models, a Weibull and a generalized gamma, were fitted separately.
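The covariate handling at the secondary baseline described above (last assessment, LOCF, then sample mean imputation) can be sketched as follows. This is a minimal illustration and not the study code: the column names (`patient_id`, `visit_day`, `psa`), the function name, and the choice to compute the sample mean over the carried-forward baseline values are all assumptions made for the example.

```python
import pandas as pd


def covariates_at_secondary_baseline(visits, discontinuation_day, covariates):
    """Return one row of covariate values per patient at treatment discontinuation.

    visits: long-format DataFrame with columns patient_id, visit_day and one
        column per covariate (NaN where a covariate was not assessed).
    discontinuation_day: Series indexed by patient_id; patients who never
        discontinued are absent from it and are therefore excluded.
    """
    # Keep only assessments made on or before each patient's discontinuation day.
    cutoff = visits["patient_id"].map(discontinuation_day)
    pre = visits[visits["visit_day"] <= cutoff].copy()
    pre = pre.sort_values(["patient_id", "visit_day"])

    # LOCF: carry the previous visit's value forward within each patient.
    pre[covariates] = pre.groupby("patient_id")[covariates].ffill()

    # The last row per patient is the value at the secondary baseline.
    baseline = (
        pre.drop_duplicates("patient_id", keep="last")
        .set_index("patient_id")[covariates]
        .reindex(discontinuation_day.index)
    )

    # Covariates never measured for a patient: sample mean imputation (assumed
    # here to be the mean of the other patients' secondary-baseline values).
    return baseline.fillna(baseline.mean())


# Hypothetical toy data: patient 1 misses the last assessment, patient 2 is
# never assessed, patient 3 has a single assessment.
visits = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 3],
        "visit_day": [0, 28, 56, 0, 28, 0],
        "psa": [10.0, 12.0, float("nan"), float("nan"), float("nan"), 30.0],
    }
)
discontinuation_day = pd.Series({1: 60, 2: 30, 3: 10})
baseline = covariates_at_secondary_baseline(visits, discontinuation_day, ["psa"])
```

In this toy example patient 1 receives the carried-forward value from day 28, and patient 2 receives the mean of the other two patients' values.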
1.1.2 Covariates

Covariates included in the two-stage and IPCW methods are described in Table 1 of the main article. Inclusion of these covariates was intended to permit adjustment for any differences between switchers and nonswitchers.

Different post-study therapies may vary in their effectiveness. However, abiraterone was the most commonly administered nonstandard drug (151/309 patients), and the remaining patients were spread in smaller numbers across a range of other nonstandard therapies (Table 2 of the main article). Therefore, for each treatment group to which patients were originally randomized (enzalutamide or placebo), the data from switchers were pooled to estimate the effect of all nonstandard treatment received in that arm. For each treatment group, the exponential of the coefficient associated with the switching indicator was used to determine an acceleration factor (AF). Counterfactual survival times, shrunk for switchers, could then be calculated.

Counterfactual survival time = Time to treatment discontinuation + (Time from treatment discontinuation to death or censoring / AF)

The resulting survival dataset thus comprised counterfactual survival times for switchers and observed survival times for nonswitchers. Model fits were evaluated using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), with lower values indicating a better statistical fit.

1.1.3 Recensoring

Because the method estimates counterfactual survival times, and counterfactual censoring times may be related to prognosis and therefore informative, recensoring was undertaken, as is generally recommended [5, 6]. The potential censoring time, C, was defined as the time from randomization until censoring for censored patients or, for patients who died, the time from randomization until the data cutoff date (i.e. September 16, 2013). For all patients, the potential recensoring time (C*) was then calculated as C* = C / AF (AF > 1). Survival times (X) were adjusted according to X = min (survival time, C*), i.e. the observed or counterfactual survival time, or, if it occurred earlier, the potential recensoring time. Consequently, patients who did not die during the study remained censored, with the censoring date being changed to the potential recensoring time if this was earlier.

For patients who died:
If X = survival time (observed for nonswitchers or counterfactual for switchers), the death was counted as an event.
If X = C*, whereby death occurred after the potential recensoring date, the patient was censored at C* and the event was no longer observed.

As recensoring can lead to bias if the treatment effect varies over time [3], analyses were rerun without incorporating recensoring.
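The counterfactual survival time and recensoring rules above can be illustrated for a single patient with the short sketch below. The function and argument names are hypothetical, and the numbers are made up. Treating a tie between the survival time and C* as an event is an assumption of the sketch.

```python
def adjust_survival(time_to_disc, time_after_disc, died, switched, af,
                    potential_censor_time, recensor=True):
    """Return (adjusted time X, event indicator) for one patient.

    time_to_disc: time from randomization to treatment discontinuation.
    time_after_disc: time from discontinuation to death or censoring.
    af: acceleration factor for the patient's randomized arm.
    potential_censor_time: C, i.e. the censoring time for censored patients or
        the time from randomization to data cutoff for patients who died.
    """
    # Switchers: shrink the post-discontinuation time by the AF.
    # Nonswitchers: keep the observed survival time.
    if switched:
        survival_time = time_to_disc + time_after_disc / af
    else:
        survival_time = time_to_disc + time_after_disc

    if not recensor:
        return survival_time, died

    # Recensoring applies to all patients: C* = C / AF.
    c_star = potential_censor_time / af
    x = min(survival_time, c_star)
    # A death is kept as an event only if it occurs on or before C*.
    event = died and survival_time <= c_star
    return x, event


# Switcher who died: 200 days on treatment, 300 days afterwards, AF = 2.
switcher = adjust_survival(200, 300, True, True, 2.0, 900)
# Nonswitcher who died at day 700; the death falls after C* = 450.
nonswitcher = adjust_survival(300, 400, True, False, 2.0, 900)
# Same nonswitcher with recensoring turned off.
nonswitcher_raw = adjust_survival(300, 400, True, False, 2.0, 900, recensor=False)
```

The second case shows how recensoring can discard observed events, which is the information loss discussed in the Results.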
1.1.4 Cox Regression

Implementation of the two-stage method was completed by using a Cox regression model, with treatment as the only covariate, to obtain adjusted hazard ratios (HRs) for OS. A 95% confidence interval (CI) for each HR was estimated from the 2.5 and 97.5 percentiles of 1000 bootstrap replicates, in which the entire adjustment process was bootstrapped.

1.2 Application of the IPCW Method to Data from the PREVAIL Study

In the context of PREVAIL, the IPCW analysis consisted of three steps.

1.2.1 Creation of Panel Data

For each patient, the following baseline covariates (categorical where shown in parentheses; otherwise continuous) were determined: age; time since diagnosis (<5 years vs. ≥5 years); number of bone metastases at screening (≤5 vs. >5); presence of visceral disease at study baseline (yes vs. no); type of disease progression at study entry (prostate specific antigen [PSA] progression only vs. radiographic progression with or without PSA vs. no disease progression at study entry); baseline EQ-5D utility index; baseline Functional Assessment of Cancer Therapy–Prostate (FACT-P) total score; disease progression (yes vs. no); and time to study treatment discontinuation.

A panel dataset of time-dependent covariates was also created for each patient. The time from randomization to treatment switch or to censoring (death, withdrawal of consent, or study end, whichever occurred first) was partitioned into intervals based on the timing of the clinic visits (scheduled every 4 weeks up to week 49, and every 12 weeks thereafter). The following time-dependent covariates (categorical where shown in parentheses; otherwise continuous) were calculated and updated at the beginning of each interval: Eastern Cooperative Oncology Group Performance Status (0 vs. other); PSA level; history of grade 3, 4, or 5 adverse events (AEs) since randomization; occurrence of grade 3, 4, or 5 AEs since last visit (yes vs. no); corticosteroid use (yes vs. no); lactate dehydrogenase level (≤240 IU/mL vs. >240 IU/mL); EQ-5D utility index; FACT-P total score; and time since study treatment discontinuation.
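As a rough sketch of the partitioning step, the code below splits one patient's follow-up into panel intervals on a visit grid. It assumes that week 1 corresponds to day 0 (so that week 49 is day 336) and that interval boundaries fall exactly on scheduled visit days; neither detail is stated in the text, and the function names are hypothetical.

```python
def visit_days(max_day, early_step=28, switch_day=336, late_step=84):
    """Scheduled visit days from randomization (day 0) strictly before max_day.

    Visits are every 4 weeks (28 days) until day 336 (week 49 under the
    assumption above) and every 12 weeks (84 days) after that.
    """
    days = []
    day = 0
    while day < max_day:
        days.append(day)
        day += early_step if day < switch_day else late_step
    return days


def panel_intervals(follow_up_days):
    """Split [0, follow_up_days] into (start, stop) intervals on the visit grid.

    follow_up_days is the time to treatment switch or censoring; the final
    interval is truncated at that time.
    """
    starts = visit_days(follow_up_days)
    stops = starts[1:] + [follow_up_days]
    return list(zip(starts, stops))


short_follow_up = panel_intervals(70)
long_follow_up = panel_intervals(500)
```

Each resulting interval would then carry the time-dependent covariate values updated at its start.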

For patients in whom baseline assessment of specific covariates had not been performed, sample mean imputation was used. An LOCF approach was subsequently adopted; time-dependent variables were only measured up to the point of study treatment discontinuation. Upon study treatment discontinuation, all patients were assumed to be at risk of switching, and intervals were added to the panel data during the follow-up period to assess OS.

1.2.2 Calculation of Stabilized Weights

Stabilized weights (SWs) were calculated by fitting logistic models predicting the probability of remaining uncensored as described below [2, 7]. For each patient (i) and interval (j):
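In the notation of the definitions that follow, stabilized weights of this kind are conventionally written as a cumulative product of conditional probability ratios over the intervals up to j [2, 7]. The expression below follows that convention; the limits of the product (k = 0 to j) are an assumption and are not taken from the text.

```latex
SW(j)_i \;=\; \prod_{k=0}^{j}
  \frac{P\left[\,C(k)_i \mid C(k-1)_i,\; X(0)_i\,\right]}
       {P\left[\,C(k)_i \mid C(k-1)_i,\; X(0)_i,\; Y(k)_i\,\right]}
```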

Where:

C(k)i = an indicator function representing censoring/switching status at the end of interval k (binary indicator: 1 = censored/switched; 0 = uncensored)

X(0)i = an array of patient characteristics measured at baseline

Y(k)i = an array of time-dependent patient characteristics measured at or before the beginning of interval k

P[C(k)i|C(k–1)i, X(0)i] = the probability of remaining uncensored at the end of interval k if uncensored at the end of interval k–1 and conditioned on baseline characteristics X(0)i

P[C(k)i|C(k–1)i, X(0)i, Y(k)i] = the probability of remaining uncensored at the end of interval k if uncensored at the end of interval k–1 and conditioned on baseline characteristics X(0)i and time-dependent patient characteristics Y(k)i

The SW numerator was estimated with a logistic regression model (model 1), which modeled the probability of remaining uncensored at time j conditional on the baseline factors of patient i and a time-dependent intercept (estimated using the number of days since randomization). The following baseline factors were used: age, years since initial diagnosis, number of bone metastases at screening, presence of visceral disease, PSA progression only at study entry, and radiographic progression with or without PSA at study entry. The model's dependent variable was a binary variable indicating whether or not switching had occurred during the interval (1 or 0). The model was fitted to all intervals from randomization to treatment switch or censoring (death, withdrawal of consent, or study end, whichever occurred first).

The SW denominator was estimated with a logistic regression model (model 2), which modeled the probability of remaining uncensored conditional on the same baseline factors used in model 1 and the time-dependent covariates given above for patient i at time j. Because time-dependent variables were only measured up to the point of study treatment discontinuation (as mentioned above), the only predictors of treatment switching in model 2 were the data observed at the time of study treatment discontinuation and the time since study treatment discontinuation. This model was fitted to all intervals from study treatment discontinuation to treatment switch or censoring (death, withdrawal of consent, or study end, whichever occurred first). As patients did not switch before study treatment discontinuation, the probability of switching before that point was 0 and the probability of remaining uncensored was 1. Accordingly, the probability of being uncensored at intervals prior to study treatment discontinuation was set to 1, and the corresponding observations were not used in this model.
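Given fitted probabilities from the two models, the weights for one patient can be accumulated as in the sketch below. It assumes the usual cumulative-product form of stabilized weights and takes the per-interval probabilities as inputs rather than fitting the logistic models; the function name and the example values are hypothetical.

```python
import numpy as np


def stabilized_weights(p_num, p_den, after_discontinuation):
    """Cumulative stabilized weights for one patient's ordered intervals.

    p_num: per-interval probability of remaining uncensored from model 1
        (baseline factors only).
    p_den: per-interval probability of remaining uncensored from model 2
        (baseline and time-dependent factors).
    after_discontinuation: per-interval flag; False for intervals before
        study treatment discontinuation, where switching cannot occur.
    """
    p_num = np.asarray(p_num, dtype=float)
    p_den = np.asarray(p_den, dtype=float)
    at_risk = np.asarray(after_discontinuation, dtype=bool)

    # Before discontinuation the denominator probability is fixed at 1.
    denominator = np.where(at_risk, p_den, 1.0)

    # Weight at interval j is the product of the ratios up to and including j.
    return np.cumprod(p_num / denominator)


# One interval on treatment, then two intervals at risk of switching.
weights = stabilized_weights(
    p_num=[0.9, 0.9, 0.8],
    p_den=[0.5, 0.9, 0.5],  # first entry is ignored (not yet at risk)
    after_discontinuation=[False, True, True],
)
```

Here the first weight is below 1 because only the numerator contributes, while the last weight exceeds 1 because the time-dependent covariates make remaining unswitched less likely than the baseline factors alone suggest.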
To account for potential differences that may have affected switching in the enzalutamide and placebo arms, models 1 and 2 were fitted separately for each arm. For all intervals before the date at which patients were assumed to be at risk of informative censoring (i.e. before study treatment discontinuation), the SW(j)i numerator was calculated using model 1 as described above, and the SW(j)i denominator was set to 1 (i.e. zero time-dependent probability of switching), producing SWs <1. For subsequent intervals, the numerators and denominators were calculated using models 1 and 2, respectively, and thus the SWs may have been >1.

1.2.3 Cox Regression

Implementation of the IPCW method was completed by using a weighted Cox regression model to estimate an adjusted HR for OS, comparing patients initially randomized to enzalutamide with those randomized to placebo, with switchers being censored at the switching date. A 95% CI for the HR was obtained from 1000 bootstrap replicates.
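One common way to turn bootstrap replicates into a 95% CI is the percentile method, which matches the 2.5 and 97.5 percentiles mentioned for the two-stage method; whether exactly the same rule was used for the IPCW analysis is an assumption here. In the sketch below, `estimate` stands for a hypothetical function that reruns the whole adjustment (weights and Cox model) on a resampled set of patients and returns the HR. The example substitutes a simple mean so that the code runs on its own.

```python
import numpy as np


def bootstrap_percentile_ci(patients, estimate, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for a statistic computed on patients.

    patients: sequence of per-patient records (resampled with replacement).
    estimate: callable that takes a list of records and returns a number;
        in practice it would repeat the entire adjustment process.
    """
    rng = np.random.default_rng(seed)
    n = len(patients)
    replicates = []
    for _ in range(n_boot):
        # Resample whole patients so each patient's intervals stay together.
        idx = rng.integers(0, n, size=n)
        replicates.append(estimate([patients[i] for i in idx]))
    lower, upper = np.percentile(replicates, [2.5, 97.5])
    return lower, upper


# Stand-in example: the "statistic" is just the mean of made-up values.
toy_patients = list(range(100))
lower, upper = bootstrap_percentile_ci(toy_patients, lambda rows: float(np.mean(rows)))
```

Resampling at the patient level, rather than at the interval level, keeps the panel structure intact within each replicate.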

2 Results

The results presented below describe the analysis of data obtained at the cutoff (September 16, 2013) at which the planned interim survival analysis, performed at 540 reported deaths, revealed a benefit in favor of enzalutamide.

2.1 Two-Stage Method

AFs calculated by fitting Weibull and generalized gamma models to the observational datasets (OS from the secondary baseline onwards) for each treatment arm are shown in Online Resource 8. In addition to fitting a full model that involved all the covariates described above, a sensitivity analysis was implemented using a restricted model that included only the covariates found to be statistically significant. Results from the restricted model are also shown. However, as the two-stage methodology relies on the critical assumption of there being no unmeasured confounders, the base case approach was to employ a full model in which attempts had been made to take into account the range of covariates that may predict survival. In line with the AIC and BIC statistics, which suggested that the generalized gamma fitted the data better than the Weibull, the preferred model was the full model with a generalized gamma distribution fitted to both treatment arms.

OS results from Cox regression analyses undertaken on the counterfactual datasets obtained using the acceleration factors from the full generalized gamma and Weibull models are shown in Online Resource 8, together with the unadjusted intention-to-treat (ITT) results. Kaplan-Meier curves (Fig. 2 of the main article) showed that a substantial amount of information was lost with recensoring of the data, creating the potential for considerable bias if the treatment effect changed over time. The preferred approach for the two-stage method therefore involved use of the generalized gamma model without recensoring.

Using this preferred approach, the adjusted HR for OS was 0.66 (95% CI 0.57–0.81) for enzalutamide versus placebo, while the corresponding values for the unadjusted ITT analysis were 0.71 (0.60–0.84). The number of patients with events was 241 with enzalutamide and 299 with placebo for both the generalized gamma and Weibull models without recensoring, compared with 198 and 216, respectively, for the generalized gamma model with recensoring and 106 and 222, respectively, for the Weibull model with recensoring.

2.2 IPCW Method

SW summary statistics for the IPCW analysis are shown in Online Resource 7. Mean SW values, for both treatment arms and all follow-up intervals, were very close to 1.0. Poorly performing covariates were liable to produce extreme weights that could introduce error into the analysis, but visual inspection of the individual weights confirmed that this had not occurred.

A sensitivity analysis was performed, retaining only the covariates that were statistically significant in the two logistic regression models. Summary statistics for the SWs from the restricted models are also shown in Supplemental Table 2. Again, for both treatment arms and all follow-up intervals, the mean values were very close to 1.0.

Using the full models, Cox regression analysis resulted in an HR for OS of 0.63 (95% CI 0.52–0.75) for enzalutamide versus placebo (Online Resource 8). For the restricted models, the same HR was obtained (0.63; 95% CI 0.52–0.75). Weighted Kaplan-Meier curves, estimated without including covariates, are shown in Fig. 3 of the main article.

References

1. Latimer NR, Abrams K, Lambert P, et al. Adjusting for treatment switching in randomised controlled trials – A simulation study and a simplified two-stage method. Stat Methods Med Res. 2014. Nov 21. pii: 0962280214557578 [Epub ahead of print].
2. Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS Clinical Trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779–88.
3. Latimer NR, Abrams KR. NICE DSU technical support document 16: Adjusting survival time estimates in the presence of treatment switching. Report by the Decision Support Unit. July 2014. http://www.nicedsu.org.uk/TSD16_Treatment_Switching.pdf. Accessed Jun 3, 2016.
4. Beer TM, Armstrong AJ, Rathkopf DE, et al. PREVAIL Investigators. Enzalutamide in metastatic prostate cancer before chemotherapy. N Engl J Med. 2014;371(5):424–33.
5. Latimer NR, Abrams KR, Lambert PC, et al. Adjusting survival time estimates to account for treatment switching in randomized controlled trials--an economic evaluation context: methods, limitations, and recommendations. Med Decis Making. 2014;34(3):387–402.
6. White IR, Babiker AG, Walker S, et al. Randomization-based methods for correcting for treatment changes: examples from the Concorde trial. Stat Med. 1999;18(19):2617–34.
7. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. J Am Stat Assoc. 2001;96(454):440–8.
