Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

Katherine A. Keith, David Jensen, and Brendan O’Connor
College of Information and Computer Sciences
University of Massachusetts Amherst
fkkeith,jensen,[email protected]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5332–5344, July 5 - 10, 2020. © 2020 Association for Computational Linguistics

Abstract

Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual’s entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.

Figure 1: Left: A causal diagram for text that encodes causal confounders, the setting that is the focus of this review paper. The major assumption is that latent confounders can be measured from text and those confounder measurements can be used in causal adjustments. Right: An example application in which a practitioner does not have access to the confounding variable, occupation, in structured form but can measure confounders from unstructured text (e.g. an individual’s social media posts).

1 Introduction

In contrast to descriptive or predictive tasks, causal inference aims to understand how intervening on one variable affects another variable (Holland, 1986; Pearl, 2000; Morgan and Winship, 2015; Imbens and Rubin, 2015; Hernán and Robins, 2020). Specifically, many applied researchers aim to estimate the size of a specific causal effect, the effect of a single treatment variable on an outcome variable. However, a major challenge in causal inference is addressing confounders, variables that influence both treatment and outcome. For example, consider estimating the size of the causal effect of smoking (treatment) on life expectancy (outcome). Occupation is a potential confounder that may influence both the propensity to smoke and life expectancy. Estimating the effect of treatment on outcome without accounting for this confounding could result in strongly biased estimates and thus invalid causal conclusions.

To eliminate confounding bias, one approach is to perform randomized controlled trials (RCTs) in which researchers randomly assign treatment. Yet, in many research areas such as healthcare, education, or economics, randomly assigning treatment is either infeasible or unethical. For instance, in our running example, one cannot ethically randomly assign participants to smoke since this could expose them to major health risks. In such cases, researchers instead use observational data and adjust for the confounding bias statistically with methods such as matching, propensity score weighting, or regression adjustment (§5).

In causal research about human behavior and society, there are potentially many latent confounding variables that can be measured from unstructured text data.
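To make the confounding problem in the smoking example concrete, the following is a minimal simulation sketch. It assumes a single binary confounder that is fully observed, and every number in it (the smoking probabilities, the 8-year occupational penalty, the true effect of -5 years) is invented for illustration and is not taken from the paper. The function names `simulate`, `naive_estimate`, and `stratified_estimate` are hypothetical. The sketch compares a naive difference in means with an estimate that conditions on the confounder by stratification.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=-5.0, seed=0):
    """Generate (hazardous_job, smokes, life_expectancy) rows from made-up parameters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hazardous_job = rng.random() < 0.5            # confounder (occupation)
        p_smoke = 0.7 if hazardous_job else 0.2       # confounder influences treatment
        smokes = rng.random() < p_smoke
        life = (80.0
                - 8.0 * hazardous_job                 # confounder influences outcome
                + true_effect * smokes                # causal effect of treatment
                + rng.gauss(0.0, 3.0))
        rows.append((hazardous_job, smokes, life))
    return rows


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated, ignoring the confounder."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [(t, y) for z, t, y in rows if z == level]
        diff = (mean(y for t, y in stratum if t)
                - mean(y for t, y in stratum if not t))
        total += diff * len(stratum) / len(rows)
    return total


if __name__ == "__main__":
    data = simulate()
    print("naive estimate:     ", round(naive_estimate(data), 2))
    print("stratified estimate:", round(stratified_estimate(data), 2))
```

In this toy setup the naive estimate overstates the harm of smoking because smokers are disproportionately in the hazardous occupation, while the stratified estimate should land close to the simulated effect. Stratifying on one binary variable stands in here for the adjustment methods named above; it is not a substitute for them when the confounder is high-dimensional.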
Text data could either (a) serve as a surrogate for potential confounders; or (b) the language of text itself could be a confounder. Our running example is an instance of text as a surrogate: a researcher may not have a record of an individual’s occupation but could attempt to measure this variable from the individual’s entire history of social media posts (see Fig. 1). An example of text as a direct confounder: the linguistic content of social media posts could influence censorship (treatment) and future posting rates (outcome) (Roberts et al., 2020).

A challenging aspect of this research design is the high-dimensional nature of text. Other work has explored general methods for adjusting for high-dimensional confounders (D’Amour et al., 2017; Rassen et al., 2011; Louizos et al., 2017; Li et al., 2016; Athey et al., 2017). However, text data differ from other high-dimensional data-types because intermediate confounding adjustments can be read and evaluated by humans (§6) and designing meaningful representations of text is still an open research question.¹ Even when applying simple adjustment methods, a practitioner must first transform text into a lower-dimensional representation via, for example, filtered word counts, lexicon indicators, topic models, or embeddings (§4). An additional challenge is that empirical evaluation in causal inference is still an open research area (Dorie et al., 2019; Gentzel et al., 2019) and text adds to the difficulty of this evaluation (§7).

We narrow the scope of this paper to review methods and applications with text data as a causal confounder. In the broader area of text and causal inference, work has examined text as a mediator (Veitch et al., 2019), text as treatment (Fong and Grimmer, 2016; Egami et al.; Wood-Doughty et al., 2018; Tan et al., 2014), text as outcome (Egami et al.), causal discovery from text (Mani and Cooper, 2000), and predictive (Granger) causality with text (Balashankar et al., 2019; del Prado Martin and Brendel, 2016; Tabari et al., 2018).

Outside of this prior work, there has been relatively little interaction between natural language processing (NLP) research and causal inference. NLP has a rich history of applied modeling and diagnostic pipelines that causal inference could draw upon. Because applications and methods for text as a confounder have been scattered across many different communities, this review paper aims to gather and unify existing approaches and to concurrently serve three different types of researchers and their respective goals:

• For applied practitioners, we collect and categorize applications with text as a causal confounder (Table 1 and §2), and we provide a flowchart of analysts’ decisions for this problem setting (Fig. 2).
• For causal inference researchers working with text data, we highlight recent work in representation learning in NLP (§4) and caution that this is still an open research area with questions of the sensitivity of effects to choices in representation. We also outline existing interpretable evaluation methods for adjustments of text as a causal confounder (§6).
• For NLP researchers working with causal inference, we summarize some of the most-used causal estimators that condition on confounders: matching, propensity score weighting, regression adjustment, doubly-robust methods, and causally-driven representation learning (§5). We also discuss evaluation of methods with constructed observational studies and semi-synthetic data (§7).

2 Applications

In Table 1, we gather and summarize applications that use text to adjust for potential confounding. This encompasses both (a) text as a surrogate for confounders, or (b) the language itself as confounders.²

As an example, consider Kiciman et al. (2018) where the goal is to estimate the size of the causal effect of alcohol use (treatment) on academic success (outcome) for college students. Since randomly assigning college students to binge drink is not feasible or ethical, the study instead uses observational data from Twitter, which also has the advantage of a large sample size of over sixty-three thousand students. They use heuristics to identify

¹ For instance, there have been four workshops on representation learning at major NLP conferences in the last four years (Blunsom et al., 2016, 2017; Augenstein et al., 2018, 2019).

² We acknowledge that Table 1 is by no means exhaustive. To construct Table 1, we started with three seed papers: Roberts et al. (2020), Veitch et al. (2019), and Wood-Doughty et al. (2018). We then examined papers cited by these papers, papers that cited these papers, and papers published by the papers’ authors. We repeated this approach with the additional papers we found that adjusted for confounding with text. We also examined papers matching the query “causal” or “causality” in the ACL Anthology.

Paper | Treatment | Outcome(s) | Confounder | Text data | Text rep. | Adjustment method
Johansson et al. (2016) | Viewing device (mobile or desktop) | Reader’s experience | News content | News | Word counts | Causal-driven rep. learning
De Choudhury et al. (2016) | Word use in mental health community | User transitions to post in suicide community | Previous text written in a forum | Social media (Reddit) | Word counts | Stratified propensity score matching
De Choudhury and Kiciman (2017) | Language of comments | User transitions to post in suicide community | User’s previous posts and comments received | Social media (Reddit) | Unigrams and bigrams | Stratified propensity score matching
Falavarjani et al. (2017) | Exercise (Foursquare checkins) | Shift in topical interest on Twitter | Pre-treatment topical interest shift | Social media (Twitter, Foursquare) | Topic models | Matching
Olteanu et al. (2017) | Current word use | Future word use | Past word use | Social media (Twitter) | Top unigrams and bigrams | Stratified propensity score matching
Pham and Shen | Group vs. individual | Time until borrowers | Loan description
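Several rows of the table pair a word-count representation with a propensity-score-based adjustment. The sketch below walks through that kind of pipeline on synthetic data, under loud assumptions: the posts, the probabilities, and the simulated effect of -5 are all invented, and every function name is hypothetical. It uses propensity score stratification, a simpler relative of the stratified propensity score matching listed in the table, and a small hand-written logistic regression rather than any particular library. It is not the procedure of any cited paper.

```python
import math
import random
from collections import Counter

MINER_POSTS = ["long shift at the mine", "mine shift ran late"]
OFFICE_POSTS = ["long meeting at the office", "office meeting ran late"]


def toy_users(n=2000, true_effect=-5.0, seed=0):
    """Synthetic users whose occupation is never recorded, only reflected in text."""
    rng = random.Random(seed)
    posts, treated, outcomes = [], [], []
    for i in range(n):
        miner = i % 2 == 0                               # hidden confounder
        text = rng.choice(MINER_POSTS if miner else OFFICE_POSTS)
        posts.append(f"{text} #user{i}")                 # unique tag, filtered out below
        smokes = rng.random() < (0.7 if miner else 0.2)
        treated.append(int(smokes))
        outcomes.append(80.0 - 8.0 * miner + true_effect * smokes + rng.gauss(0.0, 3.0))
    return posts, treated, outcomes


def filtered_word_counts(docs, min_docs=2):
    """Unigram counts, keeping only words that occur in at least min_docs documents."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    vocab = sorted(w for w, df in doc_freq.items() if df >= min_docs)
    index = {w: j for j, w in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for w in tokens:
            if w in index:
                row[index[w]] += 1
        vectors.append(row)
    return vocab, vectors


def _prob(w, b, x):
    """Logistic function applied to the linear score b + w.x."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))


def fit_propensity(vectors, treated, epochs=100, lr=0.5):
    """Estimate P(treated | word counts) with logistic regression (batch gradient descent)."""
    n, d = len(vectors), len(vectors[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(vectors, treated):
            err = _prob(w, b, x) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return [_prob(w, b, x) for x in vectors]


def difference_in_means(treated, outcomes):
    """Unadjusted contrast between treated and untreated units."""
    y1 = [y for t, y in zip(treated, outcomes) if t]
    y0 = [y for t, y in zip(treated, outcomes) if not t]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def stratified_effect(scores, treated, outcomes, n_strata=2):
    """Sort units by propensity score, cut into equal-size strata, average the contrasts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = math.ceil(len(order) / n_strata)
    total, used = 0.0, 0
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        t = [treated[i] for i in idx]
        y = [outcomes[i] for i in idx]
        if 0 < sum(t) < len(t):                          # need both groups in the stratum
            total += difference_in_means(t, y) * len(idx)
            used += len(idx)
    return total / used if used else float("nan")


if __name__ == "__main__":
    posts, treated, outcomes = toy_users()
    vocab, vectors = filtered_word_counts(posts)
    scores = fit_propensity(vectors, treated)
    print("vocabulary:", vocab)
    print("naive:     ", round(difference_in_means(treated, outcomes), 2))
    print("stratified:", round(stratified_effect(scores, treated, outcomes), 2))
```

The toy text separates the two occupations almost perfectly, which real social media text will not do. The sketch only shows how the pieces connect: text becomes filtered counts, counts become a propensity score, and the score defines strata within which treated and untreated users are compared.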
