Distinguishing Cause from Effect Using Quantiles: Bivariate Quantile Causal Discovery

Distinguishing Cause from Effect Using Quantiles: Bivariate Quantile Causal Discovery

Distinguishing Cause from Effect Using Quantiles: Bivariate Quantile Causal Discovery Natasa Tagasovska 1 2 Valerie´ Chavez-Demoulin 1 Thibault Vatter 3 (a) Additive (b) Multiplicative (c) Location−Scale ●● ● ●● ● 5.0 ●● Abstract 2 ● ● 3 ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●●●● ● ● ●● ● ●●●●● ●●● ●●● ● ● ● ● ●● ● ●●●●●● ●● ● ● ● ● ●●● ●●● ● ● ● ●● ● ●●●●● ● ●●● ●● ● ● ● ● ●● ●●●●● ● ●● ● ●● ● ●●●●●●●●●●●●●●●●● ● ● ● ● ●●●●●●●● ●●●●● ●●●●●●● ● ● ● ● ● ● ● ● ●●● ●●●●●●●●●●● ●●●●● ● ● ● ● ●●●● ●●●●●●● ● ● ● ●●●●● ●●●● ● ● ● ● ● ● ●●● ●●● ●● ●●●●●●●●● ●● ●● 2 ● ● ●● ● ● ●●●●●●●●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●●●●●● 2.5 ● ●●● ● 1 ● ● ●●●●●●● ● ● ● Causal inference using observational data is chal- ● ●●●●●●● ●● ● ● ●●●●●●● ●● ●● ● ● ● ● ●● ●●● ● ● ● ●● ●● ●●●●● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ●●● ● ●●● ● ●●● ● ●●●●●●●● ● ● ● ● ● ● ●● ● ● ● ● ●●●●●●●● ●● ●● ●●● ● ●● ● ●●● ● ●● ●●●●●●●●● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ●● ●●●● ● ● ● ● ● ● ● ●●● ● ● ●● ●●●●●●●●● ●● ● ● ● ● ●● ● ● ●● ●● ● ●●●●● ● ● ● ● ●● ●●● ●● ●● ● ● ● ●● ●● ● ●●●● ● ●● ●●●● ● ●● ●●●● ● ●● ● ● ● ●●●● ●●●●● ●●●●●●●●●●● ●● ●● ● ● ●● ● ● ●● ●●●●●●●● ● ●●●●●●●●● ●● ● ● ●● ●●●● ●●●●●●●● ● ● ● ●● ●●●●●● ● ●●●●● ●●●●● ● ●● ●● ● ●● ●● ●●●●●●●●● ●●●●● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ●●●●●● ● ●●●● ●● ●● ●●● ●●●●●●●● ●●●●● ● ●● ●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●● ● ● ●●●●● ● ● ●● ●●● ●● ●●●●●●●● ●●●●●●●●●●●●●●● ● 1 ● ● ●●●● ● ● ●●● ● ● ● ●● ● ● ● ●●●● ●●●● ●●● ●●●●●●●● ● ● ● ● ●●●●●●●● ●●●● ● ●● ●●●●●● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●● ●●●●●●●●●●●●●●●● ●●●●● ● ● ●● ● ●● ●● ●●●●●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●●● ● ● ●● ●●●● ● ●● ●● ● y ● y ●● y ●●● ● ●● ●●●● ●●●●●●●●●●● ● ●●●● ●●● ● ● ● ●● ● ● ●● ● ●● ●●●●● ● ●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●●●●● ● ● ● ●● ●●●● ●●●●●●● ●●●●● ● ● ● ●●●●● ●●● ●●●●●●●●●● ●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●● ● ● ● ●● ●● ●●● lenging, especially in the bivariate case. Through ●●●●●● ● ● ● ● ●●● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●● ●● ●●● ●●●●●●● 0 ● ● ● ● ● ●● ●●●● ●● ●● ● ● ●● ● ●●●●●●●● 0.0 ●● ● ● ●●● ● ●●● ●●●●●●●●● ●●●●●●●●●●●●● ● ● ● ● ● ●● ●●●●● ●●●●●● ● ●● ● ●●●● ● ●●●● ●● ●●●●●●● ●●●● ● ● ●● ● ●● ●●●● ●●●● ●●●●●●● ● ●● ● ● ● ● ●●● ●●● ●●● ● ●●●●●●●●● ●●●●● ● ● ●●●● ●●●● ●●●● ● ● ● ● ●●●●● ●● ●●●●●● ● ● ●● ● ●● ●● ●●●●●●● ●● ● ●● ●●●●●●●●● ●●●●●●● ● ● ●●● ● ● ●●● ● ● ● ● ●●●● ● ●●●●●● ●● ●●● ● ●●● ●●●●● ●● ●● ● ● ● ●●●● ● ●●●●●●● ●● ● ●● ● ● ●●● ●● ● ● ●●●●● ●● ●●●●● ● ●● ●●● ● ● ● ● ● ● ●●● ●●●●●● ● ●●●● ●●●● ● ● ●● ●● ● ●● ●● ● ● ●●●●● ●●● ● ●●●●●●●●●●●● ● ● ●●● ● ● ● ●●●● ●●● ●●●●●● ● ●●●●● ●● ● ●● ● ● ● ● ● ●● ●●●●● ●●●●●● ●● ● ●● ● ● ●●●● ● ● 0 ●● ●●●●●●●● ●●●●●●●● ● ●● ●● ● ● ● ●●●●● ● ●●● ●●●●● ● ● ● ● ● ● ● ●● ● ●● ●●●●● ●●● ●● ● ● ● ● ●●● ● ●●●●●●●●●● ●●●●●●●●●● ●●● ● ● ● ●●●●●●●●●●● ●●●●●●●●●●● ●●●●●● ●● ● ● ● ●●●● ● ●●●●●●●●●● ● ● ● ● ● ●● ●● ●●●●●●● ● ●●●●●●● ●●●●●●●● ● ● ●● ●● ●● ●●●●●● ●● ●●●●●●●● ●●● ● ● ● ● ●●●●●●●●●●●●● the minimum description length principle, we link ●●●●● ●●●●●●●●● ●● ●●●●●● ● ● ● ● ● −1 ● ●●●● ●●●●●●● ●●●●●● ● ● ● ● ● ●●●●●●●●●● ● ●●●●●● ● ● ● ●●● ●●●● ●●●●●● ● ● ● ● ● ● ● ● ●●● ●●●●●●●● ●●●●● ●●● ●●● ●● ● −2.5 ● ●●●●●●●● ● ●●●●●● ●●●● ●● ●●●●●●●●●●●●●● ●● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ●●●●●●●●●●● ●●●●● ●● ● ● ● ● ● ● ●● ●●●●●●● ● ● ● ●●●● ●● ● ● ● ●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●● ● ● ●● ●●●●●●●●●●●●●●● ●● ●●●●●●● ●● ● ● ●●● ●●●●●●●●●●●●●●●●●● ●●●● ● ● −1 ● ●●●●●●●●●●●●●● ● ●●●●●●● ●●●● ● ● ● ●● ●●● ● ● ●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●● ●●● ●● ● ● ●●●● ● ● ●●●●●●●●●●●●●●● ● ● ● ● ●●●●●●●●●●●● ● ● ● ● ● ●●●●●●●●●● ● ● ● ● ● the postulate of independence between the gener- −2 −2 0 2 −2 0 2 −2 0 2 ating mechanisms of the cause and of the effect x x x given the cause to quantile regression. Based on Figure 1. Diverse cause-effect (X ! Y ) setups. Green - median, th th this theory, we develop Bivariate Quantile Causal blue - 0:9 quantile, red - 0:1 quantile. Focusing on the mean Discovery (bQCD), a new method to distinguish only is not enough in the location-scale and multiplicative cases. cause from effect assuming no confounding, se- (Spirtes et al., 2000; Maathuis & Nandy, 2016). This chal- lection bias or feedback. Because it uses multiple lenging task has been tackled by many, often relying on quantile levels instead of the conditional mean testing conditional independence and backed up by heuris- only, bQCD is adaptive not only to additive, but tics (Maathuis & Nandy, 2016; Spirtes & Zhang, 2016; also to multiplicative or even location-scale gen- Peters et al., 2017). erating mechanisms. To illustrate the effective- Borrowing from structural equations and graphical models, ness of our approach, we perform an extensive structural causal models (SCMs, Pearl et al., 2016; Pe- empirical comparison on both synthetic and real ters et al., 2017) represent the causal structure of variables datasets. This study shows that bQCD is robust X ; ··· ;X using equations such as across different implementations of the method 1 d (i.e., the quantile regression), computationally ef- Xc = fc(XPA(c);G;Nc); c 2 f1; : : : ; dg ; ficient, and compares favorably to state-of-the-art methods. where fc is a causal mechanism linking the child/effect Xc to its parents/direct causes XPA(c);G, Nc is another vari- 1. Introduction able independent of XPA(c);G, and G is the directed graph obtained from drawing arrows from parents to their children. Driven by the usefulness of causal inference in most scien- Further complications arise when observing only two vari- tific fields, an increasing body of research has contributed ables: one cannot distinguish between latent confounding towards understanding the generative processes behind data. (X Z ! Y ) and direct causation (X ! Y or X Y ) The aim is elevation of learning models towards more pow- without additional assumptions (Shimizu et al., 2006; Janz- erful interpretations: from correlations and dependencies ing et al., 2012; Peters et al., 2014; Lopez-Paz et al., 2015). towards causation (Pearl, 2009; Spirtes et al., 2000; Dawid In this paper, we focus on distinguishing cause fom effect et al., 2007; Pearl et al., 2016; Scholkopf¨ , 2019). in a bivariate setting, assuming the presence of a causal link, While the golden standard for causal discovery is random- but no confounding, selection bias or feedback. ized control trials (Fisher, 1936), experiments or interven- An alternative is to impose specific structural restrictions. tions in a system are often prohibitively expensive, unethical, For example, (non-)linear additive noise models with Y = or, in many cases, impossible. In this context, an alterna- f(X)+N (Shimizu et al., 2006; Hoyer et al., 2009; Peters tive is to use observational data to infer causal relationships Y et al., 2011) or post nonlinear models Y = g(f(X) + NY ) 1HEC, University of Lausanne 2Swiss Data Science Center (Zhang & Hyvarinen¨ , 2009) allow to establish causal identi- EPFL/ETHZ 3Statistics Department, Columbia University. Corre- fiability without some of those assumptions. spondence to: Natasa Tagasovska <natasa.tagasovska@epfl.ch>. In Figure 1, we illustrate a limitation of causal inference Proceedings of the 37 th International Conference on Machine methods based on the conditional mean and refer to Sec- Learning, Online, PMLR 119, 2020. Copyright 2020 by the au- tion 2.2 for more details on the underlying generative mech- thor(s). anisms. From Figure 1(a), it is clear that the variance of the Bivariate Quantile Causal Discovery ● Postulate 1 (Sgouritsa et al. 2015). The marginal distri- 2000 bution of the cause and the conditional distribution of the effect given the cause, corresponding to independent mech- ● anisms of nature, are algorithmically independent (i.e., they 1500 ● ● ● contain no information about each other). ● ● ● ● ● Information Geometric Causal Inference (IGCI) (Janzing ● ● ● ● ● ● ● et al., 2012) uses the postulate directly for causal discov- ●● 1000 ● ● ● ● ● ●● ● ● ● ● ●● ● Food expenditure Food ● ery. Similarly, Regression Error based Causal Inference ● ●● ● ● ● ●● ●●●● ● ● ●●● ● ● ● ●● ● ●● ● ● ● ●●● ● ● ● ●● ●● ● ● (RECI) (Blobaum¨ et al., 2018) develops structural assump- ● ● ●● ●●●● ● ● ●● ●●●● ● ● ● ● ● ●●●●●● ●● ● ●● ● ● ● ● ●●● ● ● ●●● ●●●● tions based on the postulate to imply that, if X causes Y ●●●●● ● ● 500 ● ● ● ● ● ●● ●● ● ● ●●●●●●● ● ●●●● ●● ● ●●●●●●● ● ●● ●● ●●●●● ●● implies an asymmetry in mean-regression errors, that is ●●●●●● ● ●● ●● ● ●● ●● ● ●●●● ●●● ●●● E[(Y − E[Y jX])2] ≤ E[(YX − E[XjY ])2]: 0 1000 2000 3000 (1) Income Alternatively, (Mooij et al., 2009; Janzing & Scholkopf¨ , ● th ● ● th ● 0.1 quantile median 0.9 quantile mean 2010) reformulate the postulate through asymmetries in Figure 2. Heteroskedasticity in food expenditure as a function of Kolmogorov complexities (Kolmogorov, 1963) between income for Belgian working class households. Our method uses marginal and conditionals distributions. However, the halt- multiple quantiles to find the correct causal direction. ing problem (Turing, 1938) implies that the Kolmogorov complexity is not computable, and approximations or prox- effect variable is independent of the cause. As a result, the ies have to be derived to make the concept practical. independence between the cause and the effect’s noise can In this context, (Mooij et al., 2010) proposes a method based be measured in various ways by substracting an estimate of on the minimum message length principle using Bayesian the conditional mean. In Figure 1(b) and (c) however, the priors, while others are based on reproducing kernel Hilbert mechanisms

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    13 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us