Causal Aggregation: Estimation and Inference of Causal Effects by Constraint-Based Data Fusion

Causal aggregation: estimation and inference of causal effects by constraint-based data fusion Jaime Roquero Gimenez and Dominik Rothenhäusler Department of Statistics Stanford University Stanford, CA 94305, USA Abstract Randomized experiments are the gold standard for causal inference. In experiments, usually one variable is manipulated and its effect on an outcome is measured. However, practitioners may also be interested in the effect on a fixed target variable of simultaneous interventions on multiple covariates. We propose a novel method that allows to estimate the effect of joint interventions using data from different experiments in which only very few variables are manipulated. If the joint causal effect is linear, the proposed method can be used for estimation and inference of joint causal effects, and we characterize conditions for identifiability. The proposed method allows to combine data sets arising from randomized experiments as well as observational data sets for which IV assumptions or unconfoundedness hold: we indicate how to leverage all the available causal information to efficiently estimate the causal effects in the overidentified setting. If the dimension of the covariate vector is large, we may have data from experiments on every covariate, but only a few samples per randomized covariate. Under a sparsity assumption, we derive an estimator of the causal effects in this high-dimensional scenario. In addition, we show how to deal with the case where a lack of experimental constraints prevents direct estimation of the causal effects. When the joint causal effects are non-linear, we characterize conditions under which identifiability holds, and propose a non-linear causal aggregation methodology for experimental data sets similar to the gradient boosting algorithm where in each iteration we combine weak learners trained on different datasets using only unconfounded samples. We demonstrate the effectiveness of the proposed method on simulated and semi-synthetic data. 1 Introduction Causal inference is a centerpiece of scientific research, with applications ranging from the social arXiv:2106.03024v1 [stat.ME] 6 Jun 2021 sciences to biology. Often, the goal in causal inference is to estimate the effect of one single variable on one outcome: randomizing that variable and evaluating its effect on the outcome is one way to do so. Randomization provides a gold standard procedure for identifying causal effects related to that variable, as the intervention removes any spurious association with the response due to unmeasured factors. Sometimes it is also of interest to estimate the effect of joint interventions on multiple variables on an outcome. Ideally, if all the variables are jointly randomized, one can precisely reconstruct the global causal mechanism, including potential interactions between covariates. In practice, however, we only have access to multiple individual experiments –also called environments– where only a few variables are simultaneously manipulated, and provide partial information about such mechanism. We formulate a procedure for aggregating the knowledge obtained from several experiments that reconstructs a complex global causal model, capturing the causal effect of multiple covariates on the response as if they were all simultaneously manipulated. Our framework for 1 aggregating causal information also allows the use of Instrumental Variables (IV) and covariate adjustment as building blocks. A practical application could be the following: during the last decade, internet companies have massively adopted a new experimental framework to improve their web-based products: the WebLab. Any website has a myriad of design choices built in it that affect the customer behavior, such as the ranking of articles in a newsfeed, the location of an ad within a webpage, etc. These companies have the possibility of running randomized A/B experiments where customers are redirected to slightly different versions of the website. Actionable insights may be obtained by evaluating downstream metrics that reflect the effect of a particular change in the website. Estimating the combined effect of several changes would require simultaneous randomization of many parameters, which may not be feasible in practice as the user experience would vary too much. Therefore it may be of interest to understand how insights from different individual experiments, each manipulating a small number of parameters, may be aggregated into a single causal model. The following examples illustrate the purpose of our method. Example 1 Assume that variables are related via the structural causal model (Wright, 1921; Bollen, 1989) presented in Figure 1. We observe samples from the covariates X1;X2 and the response Y . <latexit sha1_base64="SKzCVfI8Kc37oRuJrpeYPo/j73k=">AAAB6HicbVBNS8NAEJ34WetX1aOXxSJ4Kokoeix66bEF+wFtKJvtpF272YTdjVBCf4EXD4p49Sd589+4bXPQ1gcDj/dmmJkXJIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ThVDJssFrHqBFSj4BKbhhuBnUQhjQKB7WB8P/PbT6g0j+WDmSToR3QoecgZNVZq1Pqlsltx5yCrxMtJGXLU+6Wv3iBmaYTSMEG17npuYvyMKsOZwGmxl2pMKBvTIXYtlTRC7WfzQ6fk3CoDEsbKljRkrv6eyGik9SQKbGdEzUgvezPxP6+bmvDWz7hMUoOSLRaFqSAmJrOvyYArZEZMLKFMcXsrYSOqKDM2m6INwVt+eZW0LivedcVtXJWrd3kcBTiFM7gAD26gCjWoQxMYIDzDK7w5j86L8+58LFrXnHzmBP7A+fwBnsuM0A==</latexit> <latexit sha1_base64="ILOb/iizaFRHqC0R90VdJHZ5vNE=">AAACYXicbVHNS8MwHE3r15xO6zx6CQ5lg1Ga4edBEL14EFR0rrBuJc0yDaYfJKk4yv5Jb168+I+Y1oI6fZDweO/9SPISJJxJ5Thvhjk3v7C4VFmurqzW1tatjfq9jFNBaJfEPBZugCXlLKJdxRSnbiIoDgNOe8HTee73nqmQLI7u1CShgxA/RGzMCFZa8q2XXegFVOGhA09gE7VRq+p5sBT9rOn6qO36ndZ0mF1d3k6LkI0O2jDff2d19DvVsY9mzM63iezjqm81HNspAP8SVJIGKHHtW6/eKCZpSCNFOJayj5xEDTIsFCOcTqteKmmCyRN+oH1NIxxSOciKhqZwRysjOI6FXpGChfpzIsOhlJMw0MkQq0c56+Xif14/VeOjQcaiJFU0Il8HjVMOVQzzuuGICUoUn2iCiWD6rpA8YoGJ0p+Sl4Bmn/yX3HdstG87N3uN07OyjgrYAtugCRA4BKfgAlyDLiDg3Zg3asaa8WEum5ZZ/4qaRjmzCX7B3PoERB2ruA==</latexit> H <latexit sha1_base64="7c/Aj+Jb7aayezZEDibBQvxqRYg=">AAACsXicdZFRT9swEMedsA1W2CjscS8WFQgJLUoKCB4Re+kjSBRaJaFz3AuYOnFkX4Aq6vfb8972beaUStDSnWTpr/vf72zfJYUUBn3/r+OufPj4aXXtc2N948vXzebW9rVRpebQ5Uoq3UuYASly6KJACb1CA8sSCTfJ6Gft3zyCNkLlVzguIM7YXS5SwRna1KD5u0P3aPQsIUWmtXoK42pCIyiMkNbu0EYUhT987zh6UBjT3iBYUt7u0INXJlhE2kuQutEBnePa81z/v1TdcYHt08ag2fI9fxr0vQhmokVmcTFo/omGipcZ5MglMyYM/ALjimkUXMKkEZUGCsZH7A5CK3OWgYmr6cQndNdmhjRV2p4c6TT7lqhYZsw4S2xlxvDeLHp1cpkXlpiexpXIixIh5y8XpaWkqGi9PjoUGjjKsRWMa2HfSvk904yjXXI9hGDxy+/FddsLjj3/8qh1dj4bxxr5TnbIPgnICTkjHXJBuoQ7nnPlxM6te+j23V9u8lLqOjPmG5kLd/QPDAvL6g==</latexit> H ✏ β0 =(1, 1) H <latexit sha1_base64="h9U9GbYOM1m/jysYDrO+Cd6991k=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lE0WPRi8cW7Ae0oWy2k3btZhN2N0IJ/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHM0nQj+hQ8pAzaqzU8Prlilt15yCrxMtJBXLU++Wv3iBmaYTSMEG17npuYvyMKsOZwGmpl2pMKBvTIXYtlTRC7WfzQ6fkzCoDEsbKljRkrv6eyGik9SQKbGdEzUgvezPxP6+bmvDGz7hMUoOSLRaFqSAmJrOvyYArZEZMLKFMcXsrYSOqKDM2m5INwVt+eZW0LqreVdVtXFZqt3kcRTiBUzgHD66hBvdQhyYwQHiGV3hzHp0X5935WLQWnHzmGP7A+fwBe++MuQ==</latexit> <latexit sha1_base64="h9U9GbYOM1m/jysYDrO+Cd6991k=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lE0WPRi8cW7Ae0oWy2k3btZhN2N0IJ/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHM0nQj+hQ8pAzaqzU8Prlilt15yCrxMtJBXLU++Wv3iBmaYTSMEG17npuYvyMKsOZwGmpl2pMKBvTIXYtlTRC7WfzQ6fkzCoDEsbKljRkrv6eyGik9SQKbGdEzUgvezPxP6+bmvDGz7hMUoOSLRaFqSAmJrOvyYArZEZMLKFMcXsrYSOqKDM2m5INwVt+eZW0LqreVdVtXFZqt3kcRTiBUzgHD66hBvdQhyYwQHiGV3hzHp0X5935WLQWnHzmGP7A+fwBe++MuQ==</latexit> <latexit sha1_base64="yMsISBAlJXnCY/UmYzj6RS7qozc=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoseiF48t2FpoQ9lsJ+3azSbsboQS+gu8eFDEqz/Jm//GbZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgTj25n/8IRK81jem0mCfkSHkoecUWOlZq1frrhVdw6ySrycVCBHo1/+6g1ilkYoDRNU667nJsbPqDKcCZyWeqnGhLIxHWLXUkkj1H42P3RKzqwyIGGsbElD5urviYxGWk+iwHZG1Iz0sjcT//O6qQmv/YzLJDUo2WJRmApiYjL7mgy4QmbExBLKFLe3EjaiijJjsynZELzll1dJu1b1Lqtu86JSv8njKMIJnMI5eHAFdbiDBrSAAcIzvMKb8+i8OO/Ox6K14OQzx/AHzucPfXOMug==</latexit> OLS 2 1 1 X1 2H + ✏1 β =(1.16, 1.16) (X1,X2) X2 X1 + H + ✏2 βOLS =2.8 <latexit sha1_base64="B/SLs8KA4hfqP4PTLT34tEumplA=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0nEoseiF48V7Qe0oWy2k3bpZhN2N0IJ/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4dua3n1BpHstHM0nQj+hQ8pAzaqz00Ol7/XLFrbpzkFXi5aQCORr98ldvELM0QmmYoFp3PTcxfkaV4UzgtNRLNSaUjekQu5ZKGqH2s/mpU3JmlQEJY2VLGjJXf09kNNJ6EgW2M6JmpJe9mfif101NeO1nXCapQckWi8JUEBOT2d9kwBUyIyaWUKa4vZWwEVWUGZtOyYbgLb+8SloXVa9Wde8vK/WbPI4inMApnIMHV1CHO2hAExgM4Rle4c0Rzovz7nwsWgtOPnMMf+B8/gDcM42E</latexit> <latexit sha1_base64="6+F5Dg+sLj4A88bmSQ1A5xxVgww=">AAAB6HicbVBNS8NAEJ34WetX1aOXxSJ4Kokoeix68diC/ZA2lM120q7dbMLuRiihv8CLB0W8+pO8+W/ctjlo64OBx3szzMwLEsG1cd1vZ2V1bX1js7BV3N7Z3dsvHRw2dZwqhg0Wi1i1A6pRcIkNw43AdqKQRoHAVjC6nfqtJ1Sax/LejBP0IzqQPOSMGivVH3qlsltxZyDLxMtJGXLUeqWvbj9maYTSMEG17nhuYvyMKsOZwEmxm2pMKBvRAXYslTRC7WezQyfk1Cp9EsbKljRkpv6eyGik9TgKbGdEzVAvelPxP6+TmvDaz7hMUoOSzReFqSAmJtOvSZ8rZEaMLaFMcXsrYUOqKDM2m6INwVt8eZk0zyveZcWtX5SrN3kcBTiGEzgDD66gCndQgwYwQHiGV3hzHp0X5935mLeuOPnMEfyB8/kDuI+M4Q==</latexit> <latexit sha1_base64="bNKxsSjrCU8l3hcc9TKu7vELPHo=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoseiF48V7Qe0oWy2k3bpZhN2N0IJ/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4dua3n1BpHstHM0nQj+hQ8pAzaqz00OnX+uWKW3XnIKvEy0kFcjT65a/eIGZphNIwQbXuem5i/Iwqw5nAaamXakwoG9Mhdi2VNELtZ/NTp+TMKgMSxsqWNGSu/p7IaKT1JApsZ0TNSC97M/E/r5ua8NrPuExSg5ItFoWpICYms7/JgCtkRkwsoUxxeythI6ooMzadkg3BW355lbRqVe+y6t5fVOo3eRxFOIFTOAcPrqAOd9CAJjAYwjO8wpsjnBfn3flYtBacfOYY/sD5/AHdt42F</latexit> X1 <latexit sha1_base64="h9U9GbYOM1m/jysYDrO+Cd6991k=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lE0WPRi8cW7Ae0oWy2k3btZhN2N0IJ/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHM0nQj+hQ8pAzaqzU8Prlilt15yCrxMtJBXLU++Wv3iBmaYTSMEG17npuYvyMKsOZwGmpl2pMKBvTIXYtlTRC7WfzQ6fkzCoDEsbKljRkrv6eyGik9SQKbGdEzUgvezPxP6+bmvDGz7hMUoOSLRaFqSAmJrOvyYArZEZMLKFMcXsrYSOqKDM2m5INwVt+eZW0LqreVdVtXFZqt3kcRTiBUzgHD66hBvdQhyYwQHiGV3hzHp0X5935WLQWnHzmGP7A+fwBe++MuQ==</latexit>

Causal Aggregation: Estimation and Inference of Causal Effects by Constraint-Based Data Fusion

TWAS Fellowships Worldwide

Millennium Prize for the Poincaré

4. a Close Call: How a Near Failure Propelled Me to Succeed By

Program of the Sessions San Diego, California, January 9–12, 2013

Tao Awarded Nemmers Prize

The Work of Terence Tao

Ten Mathematical Landmarks, 1967–2017

Terence Tao Becomes Patron of International Mathematical Olympiad Foundation

Mathematics People

Dear Friends of Mathematics, and Translations to Complement the Original Source Material

Dr. Terence Tao, Inaugural Recipient of the Joseph I. Lieberman Award He Joseph I

January 2002 Prizes and Awards