Optimal Design for A/B Testing in the Presence of Covariates And

Optimal Design for A/B Testing in the Presence of Covariates and Network Connection 1 2 Qiong Zhang∗ and Lulu Kang† 1School of Mathematical and Statistical Sciences, Clemson University 2Department of Applied Mathematics, Illinois Institute of Technology Abstract A/B testing, also known as controlled experiments, refers to the statistical procedure of conducting an experiment to compare two treatments applied to different testing subjects. For example, many companies offering online services frequently to conduct A/B testing on their users who are connected in social networks. Since two connected users usually share some similar traits, we assume that their measurements are related to their network adjacency. In this paper, we assume that the users, or the test subjects of the experiments, are connected on an undirected network. The subjects’ responses are affected by the treatment assignment, the observed covariate features, as well as the network connection. We include the variation from these three sources in a conditional autoregressive model. Based on this model, we propose a design criterion on treatment allocation that minimizes the variance of the estimated treatment effect. Since the design criterion depends on an unknown network correla- arXiv:2008.06476v1 [stat.ME] 14 Aug 2020 tion parameter, we propose a Bayesian optimal design method and a hybrid solution approach to obtain the optimal design. Examples via synthetic and real social networks are shown to demonstrate the performances of the proposed approach. Keywords: A/B testing; Conditional autoregressive model; Controlled experiments; Bayesian optimal design; D-optimal design; Network correlation. ∗[email protected] †[email protected] 1 1 Introduction A/B testing, also known as controlled experiments approach, has been widely used in agri- cultural, clinical trials, engineering and science studies, marketing research, etc. Through experiments, the experimenter compares the outcomes of two or more treatment settings from a finite number of test subjects. Due to the advent of internet technologies, large-scale A/B testing has been commonly used by technology companies such as Facebook, LinkedIn, and Netflix, to compare different versions of algorithms, web-designs, and other online products and services. In its simplest form, the experimenter wants to compare two different treatments, A and B. Completely randomized design is commonly used, in which the treatment setting is randomly assigned to different test subjects, such as users of online products and services. The randomization leads unbiased estimates of certain estimands, typically, the average treatment (or causal) effect (Rubin, 2005), under minimum assumptions. How- ever, there is still room for improvement on the efficiency of the A/B testing procedure in the presence of some practical challenges. The users participating A/B testing are usually associated with some covariates information, such as demographic information or social behaviors, as well as their connections based on personal or professional relationships. Covariates can be influential to the outcome measurements of test subjects. Meanwhile, if two subjects are connected in the social network, their responses are more likely to be correlated as well. This assumption is referred to network-correlated outcomes in Basse and Airoldi (2018b). Therefore, when the experiment is carried on a network of test subjects whose covariates are also available, intuitively, the distribution of covariates and network structure of each treatment group should be similar, or in other words, balanced, so that the estimated treatment effect is not confounded with other covariates or network effects. Depending on the sample size, distribution of covariates, and the network structure, randomization may not always lead to a balanced design. By design, we mean the assignment of treatment settings to each of the test subject in the context of this paper. We focus on the simplest case where the experiment only involves two treatments. But the later proposed modeling and design method can be extended to the case of multiple treatment settings, and the design of the treatment settings is not the 2 subject of study here. There have been many works that advocate the necessity of covariates balancing between the treatment groups. Readers can turn to Rubin (2005) for a review on the early literature. We highlight some recent developments here. Morgan and Rubin (2012) and Morgan and Rubin (2015) proposed to re-randomize the test subjects into different treatment groups to achieve smaller balance criterion, which is the Mahalanobis distance of the covariates of different groups. In Bertsimas et al. (2015), the imbalance is measured as the sum of discrepancies in group means and variances, and the optimal design minimizes the imbalance globally. Kallus (2018) proposed a new kernel allocation to divide the test subjects into balanced groups as a priori, before treatment and randomization. To design controlled experiments on networks, both theoretical and methodological works have been developed. See Gui et al. (2015); Phan and Airoldi (2015); Eckles et al. (2016); Basse and Airoldi (2018a), etc. Among them, Gui et al. (2015) proposed an estimator of average treatment effect con- sidering the inference between users on network and a randomized balance graph partition to assign treatments to each of the subnetworks. Eckles et al. (2016) used a graph cluster randomization to reduce the bias of the average treatment effect estimate. In causal inference literature, potential outcome framework is the classic setup and average treatment effect is the target parameter for estimation and inference (Imbens and Rubin, 2015). Under this classic setup, these aforementioned works do not require any probabilis- tic model assumption on the response variable. But often in parts of their papers, some linear model assumptions are used in both theoretical and numerical proofs to show the advantages and properties of the balancing criteria and the design approaches. For example, Morgan and Rubin (2012) assumed an additive linear model to show how much variance reduction can be obtained by rerandomization using Mahalanobis distance. Gui et al. (2015) used a linear additive model in terms of treatment effect, neighboring covariates and neighboring responses as the rational to create the sample estimator of average treatment effect, as well as to simulate data in numerical experiments. Instead of the classic non- parametric setup, some recent works on design for A/B testing experiments have operated under specific parametric model assumptions of the response variable, and optimal design idea is used to propose new design methods. Bhat et al. (2020) developed an off-line and 3 online mathematical programming approaches to solve this optimization problem, the ob- jective function of which is exactly the Ds-optimal design criterion in the classic optimal design literature (Kiefer, 1961; Atkinson and Donev, 1992). In this case, the Ds-optimal criterion minimizes the variance of the treatment effect of a parametric linear model, which is elaborated in Section 2. Optimal design strategies have also been proposed to achieve “network balancing” under the assumption of network-correlated outcomes. For example, Basse and Airoldi (2018b) propose optimal restricted randomization strategies for treatment allocation. Pokhilko et al. (2019) used the conditional autoregressive model (CAR) to in- corporate network structure, and minimize the variance of the estimated treatment effect in CAR to achieve network balancing. In this paper, we develop an optimal design approach for A/B testing experiments in the presence of both covariates and network connection. With a parametric CAR model that assumes the outcome is the sum of treatment effect, covariate effects, and correlated residuals for capturing network effects, we focus on the estimation of treatment effect parameter. The optimal design is to assign treatment settings to different subjects such that variance of the estimated treatment effect is minimized. The resulting optimal design criterion involves an unknown network correlation parameter. We propose a Bayesian optimal design method to obtain an upper bound of the original criterion given a prior distribution of the network correlation parameter. A hybrid approach is proposed to solve the optimization problem to obtain the optimal design. Through numerical experiments, we demonstrate the benefit of the proposed approach compared to the optimal design without network information. 2 Review: Optimal Design without Network Connec- tion We first review the optimal design for A/B testing without network connection, under the additive linear model assumption. It is the basis of the later proposed CAR model to include the network connection between users. Consider n test subjects participating the experiments. For the i-th subject, let x 1, 1 represent the experimental allocation of i ∈ {− } 4 ⊤ A or B treatment, zi =(zi1,...,zip) be the covariates, and yi be the experimental outcome. Assume that the outcome yi is a continuous random variable. Bhat et al. (2020) models the relationship between the covariates and the treatment effect as ⊤ yi = xiθ + fi β + δi for i =1, . , n, (1) Here β Rp+1 is the vector of the linear coefficients for f = (1, z⊤)⊤. We name θ as the ∈ i i treatment effect. Note that it is different from the classic notion of average treatment effect, the usual estimand in potential outcome framework. The model assumption

Optimal Design for A/B Testing in the Presence of Covariates And

Computing the Oja Median in R: the Package Ojanp

Annual Report of the Center for Statistical Research and Methodology Research and Methodology Directorate Fiscal Year 2017

The Theory of the Design of Experiments

Statistical Theory

Chapter 5 Statistical Inference

Identification of Empirical Setting

Statistical Inference: Paradigms and Controversies in Historic Perspective

Topics of Statistical Theory for Register-Based Statistics Zhang, Li-Chun Statistics Norway Kongensgt

Lecture Notes on Statistical Theory1

Statistical Inference a Work in Progress

Principles of Statistical Inference

STATISTICAL TESTING of RANDOMNESS: NEW and OLD PROCEDURES Appeared As Chapter 3 in Randomness Through Computation, H