Estimating Individual Treatment Effects with Time-Varying Confounders

Estimating Individual Treatment Effects with Time-Varying Confounders Ruoqi Liu∗, Changchang Yin∗, Ping Zhang∗ ∗The Ohio State University, Columbus, Ohio, USA 43210 Email: fliu.7324, yin.731, [email protected] Abstract—Estimating the individual treatment effect (ITE) from observational data is meaningful and practical in health- Zt-1 Zt ... ZT-1 care. Existing work mainly relies on the strong ignorability assumption that no hidden confounders exist, which may lead At At+1 AT YT+t to bias in estimating causal effects. Some studies considering the ... hidden confounders are designed for static environment and not easily adaptable to a dynamic setting. In fact, most observational Xt Xt+1 ... XT data (e.g., electronic medical records) is naturally dynamic and consists of sequential information. In this paper, we propose Deep Sequential Weighting (DSW) for estimating ITE with time-varying t t+1 T Time confounders. Specifically, DSW infers the hidden confounders by incorporating the current treatment assignments and historical Fig. 1. The illustration of causal graphs for causal estimation from dy- information using a deep recurrent weighting neural network. namic observational data. During observed window T , we have a trajectory T The learned representations of hidden confounders combined fXt;At;Zt−1gt=1 [ fYT +τ g, where Xt denotes the current observed with current observed data are leveraged for potential outcome covariates, At denotes the treatment assignments, Zt denotes the hidden and treatment predictions. We compute the time-varying in- confounders and YT +τ denotes the potential outcomes. At any time stamp t, the treatment assignments At are affected by both observed covariates verse probabilities of treatment for re-weighting the population. X and hidden confounders Z (the causal relationship are annotated We conduct comprehensive comparison experiments on fully- t t−1 by blue arrows). The hidden confounders Zt−1 are inferred from previous synthetic, semi-synthetic and real-world datasets to evaluate the hidden confounders, treatments and covariates (annotated by black arrows). performance of our model and baselines. Results demonstrate The potential outcomes YT +τ are affected by last time stamp covariates XT , that our model can generate unbiased and accurate treatment treatment assignments AT and hidden confounders ZT −1. effect by conditioning both time-varying observed and hidden confounders, paving the way for personalized medicine. Index Terms—deep learning, electronic medical record, ITE, There have been lots of existing methods that estimate time-varying confounders ITE by leveraging observational data, including propensity score matching method [6], forest based methods [7], [8], I. INTRODUCTION and representation learning-based methods [9]–[11]. However, Estimating the individual treatment effect (ITE) is a task of many methods are mainly based on the strong ignorability evaluating the causal effect of treatment strategies on some assumption that there are no unobserved confounders and important outcomes over individual-level, which is a signifi- only few studies have considered the influence of hidden cant problem in many areas [1]–[5]. For example, in healthcare confounders [12]. Here, the hidden confounders refer to factors domain, it is critical to prescribe personalized medicines (treat- that affect both treatment assignment and outcome, but are ments) for different patients based on their health conditions. not directly measured in the observational data. For example, To approach this, randomized controlled trial (RCT) is usually physicians may prescribe treatments to the patient based on conducted, which is accomplished by randomly allocating indicators not in the medical records. Ignoring these hidden patients to two groups, treating them differently (i.e., one confounders can lead to bias in estimating causal effects arXiv:2008.13620v2 [stat.ME] 15 Dec 2020 group has intervention and the other has a placebo or no [13]. Besides, these methods are primarily designed for static intervention) and comparing them in terms of a measured settings and are hardly extended to longitudinal observation response. However, conducting RCTs in the healthcare domain data. However, most real-world observational data is naturally is extremely expensive and time-consuming, if not impossible, dynamic and consists of sequential information. For example, due to the requirement of tremendous expert effort and the in EMR, the patient’s conditions (e.g., prescribed medicines, consideration of ethical issues. lab results and vital signs) and treatment assignments are Observational data contain patient records, including their recorded frequently during their stay in the hospital. Therefore, demographic information, vital signs, lab tests and outcomes estimating ITE becomes more challenging when the treatments but without having complete knowledge of why a specific and covariates change over time, and the potential outcomes treatment is applied to a patient. The accumulation of ob- are influenced by historical treatments and covariates. servational data in electronic medical records (EMRs) offers In this paper, we study the problem of Estimating indi- a promising opportunity for ITE estimation when RCTs are vidual treatment effects with time-varying confounders (as expensive or impossible to conduct. illustrated by a causal graph in Fig 1). To alleviate the aforementioned challenges, we propose a novel causal effects TABLE I estimation framework, Deep Sequential Weighting (DSW), to Notations adjust the time-varying confounders in the longitudinal data. Notation Definition The proposed framework DSW consists of three main com- X The space of time-varying covariates ponents: representation learning module, balancing module C The space of static covariates and prediction module. To adjust the time-varying hidden A The set of treatment options of interest confounders, DSW first learns the representations of hidden Y The space of potential outcomes Xt The time-varying covariates of all patients at time t confounders by leveraging the current observed covariates Z The space of hidden confounders and all historical information (i.e., previous covariates and (i) xt The time-varying covariates of i-th patient at time t treatment assignments) through Gated Recurrent Units (GRU) C The static covariates of all patients with an attention mechanism. With the help of the attention c(i) The static covariates of i-th patient mechanism, the model can automatically focus on important At The treatment assigned at time t (i) historical information. Then, we compute the time-varying at The treatment assigned for i-th patient at time t YT +τ The factual (observed) outcomes at time T + τ inverse probability of treatment for each individual to balance The observed/predicted outcome at time T + τ when Y =Y^ the confounding. The learned representations of hidden con- 1;T +τ 1;T +τ receive treatment founders and observed covariates are then combined together (i) (i) The observed/predicted outcomes of i-th patient at y =y^ for both treatment prediction and potential outcome prediction. 1;T +τ 1;T +τ time T + τ when a(i) = 1 The observed/predicted outcome at time T + τ when To demonstrate the effectiveness of our framework, we con- Y =Y^ 0;T +τ 0;T +τ not receive the treatment duct comprehensive experiments on synthetic, semi-synthetic The observed/predicted outcomes of i-th patient at y(i) =y^(i) and real-world EMR datasets (MIMIC-III [14]). DSW out- 0;T +τ 0;T +τ time T + τ when a(i) = 0 performs state-of-the-art baselines in terms of PEHE and e(i)=e^(i) The true/predicted ITE of i-th patient at time t ATE. To further illustrate how our method can be used in (·)t The historical covariates collected before time t personalized medicine, we analyze the treatment effects on im- Zt The learned hidden confounders at time t (i) portant outcomes for ICU septic patients. Results demonstrate zt The learned hidden confounders for patient i at time t (i) (i) (i) fx ; a g [ fc(i)g that our model can generate unbiased and accurate treatment H~ The historical data consists of t t t for i-th patient effect by conditioning both time-varying observed confounders T The number of time stamps (observation window) and hidden confounders. Our model has the potential to be τ The length of prediction window leveraged as part of clinical decision support systems that n The number of patients in the dataset assist physicians to determine whether to introduce treatment to a patient or a specific population. i-th patient, n denotes the number of patients, and Xt de- The contributions of this paper are as follows: notes the time-dependent feature space. The static features • We study the task of estimating ITE with time-varying (e.g., demographic information), do not change overtime are confounders, on which few attention has been before. also considered as observed covariates. We use C 2 C • We propose a novel causal inference framework DSW to represent the static features for all the patients. At each solve the task. DSW fully utilize the historical information time stamp t, the treatment assignments are denoted as (1) (2) (n) (i) and current covariates for learning the representations of At = fat ; at ; :::; at g, where At 2 A, at denotes the hidden confounders. A balancing operation is adopted to treatments assigned to i-th patient. In the case of the binary i generate unbiased and accurate ITE estimation. treatment setting, i.e., at = f0; 1g, where 1 is considered as • We conduct

Estimating Individual Treatment Effects with Time-Varying Confounders

Efficiency of Average Treatment Effect Estimation When the True

Week 10: Causality with Measured Confounding

STATS 361: Causal Inference

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score

Nber Working Paper Series Regression Discontinuity

Targeted Estimation and Inference for the Sample Average Treatment Effect

Average Treatment Effects in the Presence of Unknown

How to Examine External Validity Within an Experiment∗

Inference on the Average Treatment Effect on the Treated from a Bayesian Perspective

Propensity Scores for the Estimation of Average Treatment Effects In

ESTIMATING AVERAGE TREATMENT EFFECTS: INTRODUCTION Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

Treatment Effects