Program Documentation for a SAS Macro to Estimate the Effect of Time to Treatment Initiation

Program documentation for a SAS macro to estimate the effect of time to treatment initiation with application to initiating HAART in HIV-positive patients

Authors: Judith Lok, Assistant Professor of Biostatistics, and Ray Griner, Programmer, Harvard School of Public Health, Department of Biostatistics.
Program names: nonopt_macro_public.sas; nonopt_macro_newpsi4.sas
Date: September 22, 2011
Program version: 1.0

LICENSE
This software is provided under the standard MIT License (below).

Copyright (c) 2011, The President and Fellows of Harvard College

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Citation: Lok JJ, Griner R, DeGruttola V. Program documentation for a SAS macro to estimate the effect of time to treatment initiation with application to initiating HAART in HIV-positive patients. Details of the analysis are described in [1].

This work was funded by the Milton Fund and NIH grants AI051164 and AI032475.

Table of Contents
1 Overview
2 Input Parameters
3 Input Dataset
4 Output Datasets
5 Internal Structure
6 System Requirements
7 Implementation Notes
8 References

1 Overview
The program NONOPT_MACRO is a SAS macro designed to fit models in a class of structural nested mean models. Two implementations of the macro are provided, in nonopt_macro_public.sas and nonopt_macro_newpsi4.sas; they differ in the models they fit.

The macro in nonopt_macro_public.sas fits 2-, 3-, and 4-parameter models in which the treatment effect depends on the duration of treatment, with a coefficient that depends on the time the outcome was measured (month k): linearly for the 2-parameter model and quadratically for the 3- and 4-parameter models. The 4-parameter model always includes this quadratic dependence as its first three parameters, while the fourth parameter is set by the PARAM4 input parameter as one of: (1) (length of treatment)^2; (2) (log(viral load) at treatment initiation) * (length of treatment); or (3) log(viral load).

The macro in nonopt_macro_newpsi4.sas also fits 2-, 3-, and 4-parameter models in which the treatment effect depends on the duration of treatment, but with a coefficient that depends quadratically on the time of treatment initiation (month m) rather than on the time the outcome was measured.

Further details on the model building and estimation process are provided in the paper by Lok and DeGruttola [1] and in the EE_Public.pdf document distributed with this macro.
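To make the model structure concrete, the following Python sketch (not part of the SAS macro) evaluates the treatment effect for the three model sizes. It assumes, based on the Overview and Table 1, that the effect is the sum of each psi parameter times its term: treatment duration, duration*k, duration*k^2, and, for the 4-parameter model, a fourth covariate chosen by PARAM4. The function name treatment_effect and the argument x4 are hypothetical; for nonopt_macro_newpsi4.sas, k would be the month of treatment initiation m instead of the outcome month.

```python
from typing import Optional, Sequence


def treatment_effect(psi: Sequence[float], trt: float, k: float,
                     x4: Optional[float] = None) -> float:
    """Evaluate the (assumed) linear-in-psi effect of the 2-, 3-, or 4-parameter model.

    psi: parameter estimates, e.g. (psihat21, psihat22) for the 2-parameter model
    trt: treatment duration in years
    k:   month on which the coefficient depends
    x4:  fourth covariate selected by PARAM4 (4-parameter model only)
    """
    n = len(psi)
    if n not in (2, 3, 4):
        raise ValueError("psi must have 2, 3, or 4 elements")
    # Terms shared by all models: duration, duration*k, duration*k^2 (first min(n, 3) used).
    terms = [trt, trt * k, trt * k * k][:min(n, 3)]
    if n == 4:
        if x4 is None:
            raise ValueError("the 4-parameter model needs x4")
        terms.append(x4)
    return sum(p * t for p, t in zip(psi, terms))
```

For example, with PARAM4=TRT2 the fourth covariate would be the squared duration, so a caller would pass x4=trt ** 2.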

2 Input Parameters

Table 1: Input Parameters
OUTLIB: Output libref for the SAS datasets that will be created by the program.
DATA: Input dataset.
NUMREPS: Number of bootstrap replicates.
MAXWGT: Maximum weight for the inverse probability of censoring weighting. Weights above this value are truncated to this value, and a warning message is printed to the log. Default is 15.
ECHOREP: Y or N (default). If Y, a message is printed to the UNIX terminal when processing starts on each bootstrap replicate, so that the progress of the program can be monitored. Do not specify Y if running SAS on Windows.
REGTYPE: NONDR (non-doubly robust) or DR (doubly robust).
PARAM4: TRT2 (duration^2), LVLTR (log(VL) * treatment duration), or LVLAK11 (log(VL)). This defines the fourth parameter of the 4-parameter model fit by the program. Note that the first three model parameters are always treatment duration, treatment duration*month, and treatment duration*month^2 (see the Overview above).
OUTCOME: Variable name or expression defining the model outcome.
TRT_MODEL_VARS: List of variables used as predictors of treatment and of the quantities needed for the doubly robust estimators, i.e., treatment and whatever quantity is included as the fourth model parameter via PARAM4. The macro implements the simplest case, in which the same variables are used as predictors in several different models; if different predictors are desired for different models, the macro can be modified accordingly.
LASTMONTH_MODEL_VARS: List of variables used to predict whether a given month is the last one observed for the patient. Used in calculating the inverse probability of censoring weights.
FIRSTTRT_MODEL_VARS: List of variables used to predict the probability of treatment initiation.
OUTCOME_MODEL_VARS: List of variables used to predict the outcome (needed only when REGTYPE=DR).
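The MAXWGT rule amounts to capping each weight and reporting how many were capped. A minimal Python sketch, assuming the weights have already been computed; the function truncate_weights is hypothetical and only mirrors the behavior described for MAXWGT, with a Python warning standing in for the SAS log message.

```python
import warnings
from typing import List


def truncate_weights(weights: List[float], maxwgt: float = 15.0) -> List[float]:
    """Cap inverse probability of censoring weights at maxwgt (default 15, as for MAXWGT)."""
    n_capped = sum(1 for w in weights if w > maxwgt)
    if n_capped:
        # The macro writes its warning to the SAS log; a Python warning stands in here.
        warnings.warn(f"{n_capped} weight(s) exceeded {maxwgt} and were set to {maxwgt}")
    return [min(w, maxwgt) for w in weights]
```

Weights at or below the cap pass through unchanged, so the rule only affects the extreme tail of the weight distribution.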

3 Input Dataset
The macro takes as input a dataset with one record per patient per month. The dataset must be sorted by PATID and MONTH and must contain the following variables:

Table 2: Input Dataset Variables
patid: Patient identifier.
month: Month.
monthsqrd: Month*Month.
treated: 0/1 indicator. This is 1 when the patient started treatment in this month or a previous month, and 0 if the patient has not yet initiated treatment.
firsttreated: 0/1 indicator of whether this month is the first month the patient was treated.
trt: Duration of treatment between month and month+12, in years.
lvl: Log(HIV viral load). Used if input parameter PARAM4=LVLTR or.
month_firsttrt: Month of first treatment. Used if PARAM4=LVLAK11.
lvl_firsttrt: Log(viral load) at month of first treatment. Used if PARAM4=LVLAK11.
[Others]: Any variables that will be used in the model building, i.e., those variables specified in the OUTCOME, TRT_MODEL_VARS, LASTMONTH_MODEL_VARS, FIRSTTRT_MODEL_VARS, and OUTCOME_MODEL_VARS input parameters.

4 Output Datasets
The primary output datasets are JRE_ALL and BOOTPLOTDATA. JRE_ALL is a dataset with one record per month that stores the parameter estimates for all the requested models, as well as various statistics of the treatment effect calculated on the bootstrap replicates. JRE_ALL is derived from the dataset BOOTPLOTDATA, which has one record per month per bootstrap replicate.
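The collapse from BOOTPLOTDATA to JRE_ALL can be sketched in Python for a single month and one effect variable. This is an illustration under assumptions, not the macro's code: it assumes the summary statistics use only replicates with sampnum > 0, with sampnum = 0 supplying the original-data estimate, and it uses a simple empirical-CDF percentile rule that averages neighbouring order statistics when n*p is an integer. SAS procedures offer several percentile definitions, so results may differ slightly. The variable suffixes follow Table 4; the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Empirical-CDF percentile that averages two order statistics when n*p is an integer."""
    xs = sorted(values)
    n = len(xs)
    np_ = n * p
    if abs(np_ - round(np_)) < 1e-9:      # n*p is (numerically) an integer j
        j = int(round(np_))
        if j <= 0:
            return xs[0]
        if j >= n:
            return xs[-1]
        return (xs[j - 1] + xs[j]) / 2     # average of the j-th and (j+1)-th order statistics
    return xs[math.floor(np_)]            # the (floor(n*p)+1)-th order statistic


def summarize_effect(rows: List[Dict[str, float]], month: int, var: str) -> Dict[str, float]:
    """Collapse BOOTPLOTDATA-style rows for one month into JRE_ALL-style summaries."""
    month_rows = [r for r in rows if r["month"] == month]
    original = [r[var] for r in month_rows if r["sampnum"] == 0]
    boot = [r[var] for r in month_rows if r["sampnum"] > 0]
    return {
        var: original[0],                            # estimate on the original data
        f"{var}_Mean": statistics.mean(boot),
        f"{var}_Median": statistics.median(boot),
        f"{var}_P5": percentile(boot, 0.05),
        f"{var}_P95": percentile(boot, 0.95),
        f"{var}_StdDev": statistics.stdev(boot),     # sample standard deviation (n - 1)
        f"{var}_lb": percentile(boot, 0.025),
        f"{var}_ub": percentile(boot, 0.975),
    }
```

Running summarize_effect once per month and per effect variable (effectJRE2_12, effectJRE3_12, effectJRE4_12) would give one JRE_ALL-style record per month.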

Table 3: Variables in BOOTPLOTDATA
month: Month.
sampnum: Bootstrap replicate (0 = original data).
effectJRE2_12: Estimated treatment effect, 2-parameter model.
effectJRE3_12: Estimated treatment effect, 3-parameter model.
effectJRE4_12: Estimated treatment effect, 4-parameter model.
psihat21, psihat22: Parameter estimates for the 2-parameter model (same values for all months for a given replicate).
psihat31, psihat32, psihat33: Parameter estimates for the 3-parameter model (same values for all months for a given replicate).
psihat41, psihat42, psihat43, psihat44: Parameter estimates for the 4-parameter model (same values for all months for a given replicate).

Most of the variables in BOOTPLOTDATA are also in JRE_ALL (except for sampnum). The additional variables in JRE_ALL that aren’t in BOOTPLOTDATA are shown in Table 4.

Table 4: Variables in JRE_ALL that are not in BOOTPLOTDATA
effectJRE2_12: Estimated treatment effect calculated on the original (non-bootstrapped) data for the 2-parameter model.
effectJRE2_12_Mean, effectJRE2_12_Median, effectJRE2_12_P5, effectJRE2_12_P95, effectJRE2_12_StdDev, effectJRE2_12_lb, effectJRE2_12_ub: Mean, median, 5th percentile, 95th percentile, standard deviation, 2.5th percentile, and 97.5th percentile of the treatment effects, calculated across the bootstrap replicates for the 2-parameter model.
effectJRE3_12, effectJRE3_12_Mean, ..., effectJRE4_12, effectJRE4_12_Mean, ...: Same variables as above for the 3- and 4-parameter models.
Nvisits: Number of patient visits in the month.
meancd4: Mean CD4 measurement.

5 Internal Structure
A brief outline of the program structure is below:
1. Validate and save the macro input parameters.
2. Create the BOOTSAMP dataset, which records which of the original patients will be in each bootstrap sample.
3. Call the macro %dobootstrap, a large %do loop that performs the analysis on the original data and on all the replicates. This creates the PSIHAT dataset, which contains the psi-hats for every model (one record per replicate). Note that the loop runs from 0 through NUMREPS, where replicate 0 is the original data.
4. Create the BOOTPLOTDATA dataset (one record per replicate per month), which contains the psi-hats as well as the effect estimates for the given month.
5. Create the JRE_ALL dataset (one record per month), which contains the mean effect and the 95% upper and lower confidence bounds for each model.

The %dobootstrap macro performs the bulk of the analysis. For each bootstrap replicate, it does the following:
1. Create the bootstrap dataset BOOTANALDATA by resampling the input dataset (DATA).
2. Calculate the inverse probability of censoring weights.
3. Calculate the expected treatment duration using covariates through time k.
   These models are fit using the outcomes of patient-month records for patients untreated through time k, but the predicted probabilities are computed for all patient-month records. The estimated treatment duration is used in the first three elements of the q vector (as described in [2]) for all models.
4. Calculate the estimated value used in the fourth element of the q vector. This value depends on the input value of PARAM4 (see above).

5. Calculate the expected value of each model effect for patients untreated through A_{k-1}. These are used in the doubly robust estimating equations, which contain a term equal to the difference between the actual and expected value of each model effect.
6. Create the matrix and vector used in the estimating equations.
7. Call the %solveeqs macro to solve the estimating equations (calculate psi-hat) for the 2-, 3-, and 4-parameter models, and add a record containing these estimates to the PSIHAT dataset.
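Step 2 of the outline and step 1 of %dobootstrap resample patients rather than individual patient-months, so each sampled patient brings along all of his or her monthly records. The Python sketch below illustrates that idea under stated assumptions: make_bootsamp and build_replicate are hypothetical names, the seed of 12 mirrors the macro's fixed seed but Python's generator will not reproduce the samples SAS draws, and the new identifier bootid (also hypothetical) assumes that a patient drawn twice should be treated as two distinct patients.

```python
import random
from typing import Dict, List


def make_bootsamp(patids: List[int], numreps: int, seed: int = 12) -> Dict[int, List[int]]:
    """Map each replicate number to the patient IDs it contains (replicate 0 = original data)."""
    rng = random.Random(seed)  # fixed seed, echoing the macro's seed of 12
    samples = {0: list(patids)}
    for rep in range(1, numreps + 1):
        # Draw as many patients as in the original data, with replacement.
        samples[rep] = [rng.choice(patids) for _ in patids]
    return samples


def build_replicate(records: List[dict], sampled_ids: List[int]) -> List[dict]:
    """Expand sampled patient IDs into patient-month records (input sorted by patid, month)."""
    by_patient: Dict[int, List[dict]] = {}
    for r in records:
        by_patient.setdefault(r["patid"], []).append(r)
    out = []
    # Hypothetical bootid: each drawn copy of a patient becomes a distinct patient.
    for bootid, pid in enumerate(sampled_ids, start=1):
        for r in by_patient[pid]:
            out.append({**r, "bootid": bootid})
    return out
```

Because replicate 0 is simply the original patient list, the same analysis loop can run from 0 through NUMREPS without a special case for the original data.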

6 System Requirements
The macro was tested on SAS version 9.2 on Linux. It uses Base SAS, SAS/STAT, and SAS/IML. We know of nothing in the code that relies on recently added SAS features, so the macro probably works on other recent SAS versions as well.

One program feature depends on the operating system: when ECHOREP=Y is specified, the macro calls the operating system command echo to print the replicate number to the terminal when analysis of each bootstrap replicate begins. This assumes a UNIX terminal, so users running the macro on a Windows installation should call it with ECHOREP=N.

SAS/IML is used only to solve the linear estimating equations that yield the parameter estimates. Users without SAS/IML may therefore consider modifying the macro to save the necessary matrix and vector datasets for each replicate, and then use other software to solve the linear equations and perform the remaining program tasks.
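For users without SAS/IML, the final solve is an ordinary linear system. Assuming the exported matrix A and vector b define the equations A psi = b (the names are hypothetical; see [1] for how the equations are formed), the Python sketch below solves them by Gaussian elimination with partial pivoting, using only the standard library. The name solve_linear is also hypothetical.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [A | b] so the caller's data are untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
```

The systems here are small (at most 4 x 4, one per model and replicate), so a dense direct solve is more than adequate; a numerical library would serve equally well where one is available.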

7 Implementation Notes
1. The random number seed is set to 12 within the macro. No input parameter exists to set the seed, so a user who requires a different seed must change the macro itself.
2. To reduce the size of the output listing and log, the options MPRINT and NOTES are turned on when processing the original dataset but turned off (using NOMPRINT and NONOTES) when processing the bootstrap replicates. Likewise, observations from intermediate datasets are printed only when processing the original data.
3. A WEIGHT statement with a 0/1 variable is used in PROC REG or PROC LOGISTIC to calculate the parameter estimates (using only the records where the WEIGHT variable equals one) and to use these estimates to calculate predicted values for all records (even those where the WEIGHT variable equals zero). This provides greater backward compatibility than using the INMODEL, OUTMODEL, and SCORE options of PROC LOGISTIC.
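Note 3 can be illustrated with a straight-line least-squares fit in Python: records with weight 0 contribute nothing to the estimates, yet still receive fitted values. This is a simplified stand-in for PROC REG (the same idea carries over to the logistic models), and the function names fit_wls_line and predict_all are hypothetical.

```python
from typing import List, Tuple


def fit_wls_line(x: List[float], y: List[float], w: List[float]) -> Tuple[float, float]:
    """Weighted least squares for y = b0 + b1*x; records with w == 0 do not affect the fit."""
    sw = sum(w)
    if sw == 0:
        raise ValueError("at least one record needs a nonzero weight")
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def predict_all(x: List[float], b0: float, b1: float) -> List[float]:
    """Fitted values for every record, including those excluded from the fit."""
    return [b0 + b1 * xi for xi in x]
```

For instance, fitting on the first three of four records (weights 1, 1, 1, 0) and then calling predict_all on all four yields a prediction for the fourth record from a model it did not influence, which is exactly the pattern used in step 3 of %dobootstrap.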

8 References
[1] Lok JJ and DeGruttola V (2011). Impact of Time to Start Treatment Following Infection with Application to Initiating HAART in HIV-Positive Patients. Re-submitted by invitation to Biometrics. Available upon request.