Optimal Rerandomization Designs Via a Criterion That Provides Insurance Against Failed Experiments
Adam Kapelner∗
Department of Mathematics, Kiely Hall Room 604, 65-30 Kissena Boulevard, Queens, NY, 11367, USA

Abba M. Krieger
Department of Statistics, Huntsman Hall Room 442, 3730 Walnut Street, Philadelphia, PA, 19104, USA

Michael Sklar
Department of Statistics, Sequoia Hall Room 105, 390 Serra Mall, Stanford, CA, 94305, USA

David Azriel
Faculty of Industrial Engineering and Management, Bloomfield Building Room 301, Technion City, Haifa 32000, Israel

∗Corresponding author

Abstract

We present an optimized rerandomization design procedure for a non-sequential treatment-control experiment. Randomized experiments are the gold standard for finding causal effects in nature. But sometimes random assignment results in unequal partitions of the treatment and control groups, visible as imbalance in the observed covariates. There can additionally be imbalance on unobserved covariates. Imbalance in either observed or unobserved covariates increases treatment effect estimator error, inflating the width of confidence regions and reducing experimental power. "Rerandomization" is a strategy that omits poorly balanced assignments by limiting imbalance in the observed covariates to a prespecified threshold. However, tightening this threshold too much can increase the risk of contracting error from unobserved covariates. We introduce a criterion that accounts for observed imbalance while factoring in the risk of inadvertently imbalancing unobserved covariates. We then use this criterion to locate the optimal rerandomization threshold based on the practitioner's desired level of insurance against high estimator error. We demonstrate the gains of our designs in simulation and in a dataset from a large randomized experiment in education. We provide an open source R package available on CRAN named OptimalRerandExpDesigns which generates designs according to our algorithm.

Keywords: Randomized experiments, Rerandomization, Experimental design, Optimization

1. Background

We consider a classic problem: a treatment-control experiment with $n$ subjects (individuals, participants or units) and one clearly defined outcome of interest (also called the response or endpoint) for the $n$ subjects, denoted $y = [y_1, \ldots, y_n]^\top$. We scope our discussion to the response being continuous and uncensored (with incidence and survival responses left to further research). Each subject is assigned to either the treatment group or the control group, denoted T and C and referred to as the two arms. Here we consider the setting where all subjects, along with their observed subject-specific covariates (measurements or characteristics) denoted $x$, are known beforehand and considered fixed. This non-sequential setting was studied by Fisher (1925) when assigning treatments to agricultural plots, and it is still of great importance today. In fact, it occurs in clinical trials as "many phase I studies use 'banks' of healthy volunteers ... [and] ... in most cluster randomised trials, the clusters are identified before treatment is started" (Senn, 2013, page 1440).

Synonymously referred to as a randomization, an allocation or an assignment is a vector $w = [w_1, \ldots, w_n]^\top$ whose entries indicate whether each subject received T (coded as +1) or C (coded as -1); thus this vector must be an element of $\{-1, +1\}^n$.
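To make this notation concrete, the following minimal Python sketch (the sample size, the single covariate and the random seed are made up for illustration; this is not part of the paper's procedure) encodes an allocation as a ±1 vector and splits the fixed covariates into the two arms.

```python
import numpy as np

# Illustrative only: n, the covariate x and the seed are hypothetical.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)                # observed covariates, known and fixed beforehand

w = rng.choice([-1, 1], size=n)       # one allocation: +1 = treatment (T), -1 = control (C)
assert set(np.unique(w)) <= {-1, 1}   # w is an element of {-1, +1}^n

# Split the fixed covariates by arm; under independent coin flips the
# arm sizes need not be equal.
x_T, x_C = x[w == +1], x[w == -1]
print(len(x_T), len(x_C))
```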
The goals of such experiments are to estimate and test a population average treatment effect (PATE), denoted $2\beta_T$ (where the multiplicative factor of 2 would not be present if $w$ were coded with 0 and 1). After the sample is provided, the only degree of control the experimenter has is to choose the entries of $w$. The process that results in the choice of such allocations of the $n$ subjects to the two arms is synonymously termed an experimental design (or just design), a strategy, an algorithm, a method or a procedure.

The design is essentially a generalized multivariate Bernoulli random variable denoted $W$ with mass function $P(w)$, variance $\Sigma_W$ and support $\mathbb{W}$, synonymously termed the allocation space. If all realizations satisfy $w^\top 1_n = 0$, i.e. all of the assignments feature an equal number of subjects, $n/2$, in the treatment group and the control group, we term $W$ a forced balance procedure (Rosenberger and Lachin, 2016, Chapter 3.3).

The question then becomes: what is the optimal design strategy? Regardless of how optimality is defined, on whichever criterion, this is an impossible question to answer. The multivariate Bernoulli random variable has $2^n - 1$ parameters (Teugels, 1990, Section 2.3). Finding the optimal design is tantamount to solving for exponentially many parameters in $n$ using only the $n$ observations, a hopelessly unidentifiable task. Thus all "optimal" designs will fundamentally be heuristics.

The question then becomes: which heuristic design should we employ? This has been hotly debated for over 100 years. There are loosely two main camps: those that optimize assignments and those that randomize assignments. The latter, first advocated by Fisher (1925), has prevailed, certainly in the context of causal inference and clinical trials. However, which design is "best" depends on many choices:

(C1) the choice of estimator for $\beta_T$,

(C2) the source of the randomness in the response-generating process $Y$,

(C3) the form of the response model as a function of the observed and unobserved covariates (and, embedded within, the importance of the observed covariates relative to the unobserved) and

(C4) a definition of optimality and a loss function that gauges this optimality.

Further, drawing valid inference will differ depending on these choices, a point that we address in Section 2.3, but here we continue with the focus on estimation. We now briefly describe our choices for (C1)-(C3); (C4) will be developed iteratively in the next section of the paper.

For (C1), we examine two choices of estimator for $\beta_T$. First, the classic difference-in-means estimator $\hat{\beta}_T^{DM} := (\bar{Y}_T - \bar{Y}_C)/2$, where $\bar{Y}_T$ is the random variable of the average response in the treatment group and $\bar{Y}_C$ is the random variable of the average response in the control group. If the model were unknown, $\hat{\beta}_T^{DM}$ would be the uniformly minimum variance unbiased estimator for the PATE (Lehmann and Casella, 1998, Example 4.7). Second, the covariate-adjusted regression estimator or OLS estimator, $\hat{\beta}_T^{LR}$, defined as the slope coefficient on $w$ in an ordinary least squares fit of $y$ using $x$ and $w$. If $y$ is known to be linear in $x$, then $\hat{\beta}_T^{LR}$ is known to be the best linear unbiased estimator for the PATE.¹

¹The employment of $\hat{\beta}_T^{LR}$ in the realistic setting where the linear model assumption is untestable (and most likely incorrect) results in bias. This bias is the subject of a large debate (Freedman, 2008; Berk et al., 2010; Lin, 2013) but is known to disappear rapidly in sample size.
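As a concrete illustration of these two estimators, the sketch below simulates a hypothetical dataset (the sample size, coefficient values, single covariate and seed are all made up, and the linear data-generating process anticipates the response model (1) introduced just below) and computes $\hat{\beta}_T^{DM}$ and $\hat{\beta}_T^{LR}$ for one allocation.

```python
import numpy as np

# Hypothetical values chosen purely for illustration.
rng = np.random.default_rng(0)
n, beta_T, beta_x = 100, 0.5, 2.0
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()                        # centered and scaled covariate
w = rng.permutation(np.repeat([-1, 1], n // 2))     # a forced-balance allocation
y = beta_T * w + beta_x * x + rng.normal(size=n)    # simulated responses

# Difference-in-means estimator: half the gap between the arm averages
# (the factor of 1/2 reflects the +1/-1 coding of w).
beta_hat_DM = (y[w == 1].mean() - y[w == -1].mean()) / 2

# Covariate-adjusted (OLS) estimator: the coefficient on w in a
# least-squares fit of y on an intercept, x and w.
X = np.column_stack([np.ones(n), x, w])
beta_hat_LR = np.linalg.lstsq(X, y, rcond=None)[0][2]

print(beta_hat_DM, beta_hat_LR)
```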
In (C2), the source of randomness in $Y$ is typically assumed to be either the population model (Rosenberger and Lachin, 2016, Chapter 6.2) or the randomization model (Rosenberger and Lachin, 2016, Chapter 6.3). To explain the difference between these two, assume for (C3) the following linear response model, which we generalize later:

$y = \beta_T w + \beta_x x + z$    (1)

To make the expressions even simpler, we assume $x$ is centered and scaled so its entries have average zero and standard deviation one. This normalization allows us to omit an intercept term in this simple response model without consequence to this discussion.

In the population model, subjects are sampled at random from an infinitely-sized superpopulation and thus $z$ is a realization from some data generating process $Z$ while $w$ is fixed. In the randomization model, it is the opposite: the source of the randomness is in the treatment assignment vector $w$, which is drawn from $W$, while $z$ is fixed. These two choices for (C2) represent very different worldviews with a rich history of debate (see for instance Reichardt and Gollob, 1999, pages 125-127). We touch on some of these issues by elaborating on the vastly different properties of our two estimators for $\beta_T$ in Appendix A.1 and Appendix A.2.

1.1. Assumption of the Randomization Model

In this work, we employ the randomization model perspective following Rosenberger and Lachin (2016); Freedman (2008); Lin (2013) and others, mostly because the population model requires the subjects to truly be sampled at random from a superpopulation. However, in most experiments, subjects are recruited from nonrandom sources in nonrandom locations. Experimental settings are frequently selected because of expertise, an ability to recruit subjects, and their budgetary requirements (Rosenberger and Lachin, 2016, page 99). In the context of clinical trials, Lachin (1988, page 296) states rather harshly that "the invocation of a population model for the analysis of a clinical trial becomes a matter of faith that is based upon assumptions that are inherently untestable".²

1.2. Restricted Designs

We now return to our initial question of "which heuristic design should we employ?" and develop more terminology. If we were to assign each individual to treatment by an independent fair coin flip, we would term this the Bernoulli Trial (Imbens and Rubin, 2015, Chapter 4.2). In the Bernoulli Trial, all assignments are possible, i.e. $\mathbb{W} = \{-1, +1\}^n$, and all individual assignments (the $n$ entries of $W$) are independent. Thus the variance of the design $W$ is simply $\Sigma_W = I_n$, where the latter notation indicates the $n \times n$ identity matrix. Any other design besides this Bernoulli Trial is conventionally termed a restricted randomization because the permissible allocations are restricted to a subset of all possible allocations. A very weak restriction is that of forced balance.
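The following sketch makes the randomization-model worldview concrete under assumed, hypothetical values (it is an illustration, not the paper's design algorithm): $x$ and $z$ are held fixed, only $w$ varies across repeated draws from the design, and the spread of the difference-in-means estimator under the Bernoulli Trial is compared with that under a forced-balance restriction.

```python
import numpy as np

# Hypothetical values chosen purely for illustration.
rng = np.random.default_rng(0)
n, beta_T, beta_x = 100, 0.5, 2.0
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()   # observed covariate, fixed
z = rng.normal(size=n)         # unobserved component, also fixed under the randomization model

def beta_hat_DM(w):
    y = beta_T * w + beta_x * x + z   # model (1): across draws, only w changes
    return (y[w == 1].mean() - y[w == -1].mean()) / 2

# Bernoulli Trial: each subject gets an independent fair coin flip, so the
# allocation space is all of {-1, +1}^n and Sigma_W = I_n.
bernoulli_draws = [beta_hat_DM(rng.choice([-1, 1], size=n)) for _ in range(10_000)]

# Forced balance: a weak restriction keeping only allocations with exactly
# n/2 subjects per arm.
balanced = np.repeat([-1, 1], n // 2)
forced_draws = [beta_hat_DM(rng.permutation(balanced)) for _ in range(10_000)]

# Estimator spread induced purely by the design W.
print(np.std(bernoulli_draws), np.std(forced_draws))
```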