
Granger Causal Network Learning and the Depth Wise Grouped LASSO

by

Ryan Kinnear

A thesis presented to the University Of Waterloo in fulfilment of the thesis requirement for the degree of Master of Applied Science in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2017

© Ryan Kinnear 2017

Author's Declaration

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

In this thesis we study the notion of Granger-causality, a statistical concept originally developed to estimate causal effects in econometrics. First, we suggest a more general notion of Granger-causality in which to frame the practical developments which follow. Second, we derive a proximal optimization algorithm to fit large and sparse vector autoregressive models, a task closely connected to the estimation of Granger-causality amongst jointly wide sense stationary processes. Experimental results from our so-called "Depth Wise Grouped LASSO" convex program are obtained for both simulated data, as well as Canadian meteorology data. We conclude by discussing some applications and by suggesting future research questions.

Acknowledgements

I must extend the utmost gratitude to Professor Mazumdar for his unwavering support and guidance, without which this work would never have come to be. And moreover, to my friends, peers, and professors for their inspiration and mentorship. Finally, to my parents for their support and encouragement.

To my friends and family

Table of Contents

List of Figures

List of Abbreviations

List of Symbols

1 Introduction
1.1 The Philosophy of Causality
1.2 Causality in Science
1.2.1 The Causal Calculus of Judea Pearl
1.3 Granger Causality
1.3.1 Granger's Axioms
1.3.2 Defining Causality
1.3.3 Prima Facie Causation
1.3.4 Granger Causality
1.4 Plausible Causal Discovery and Thesis Outline

2 Granger Causality - Theory
2.1 Preliminaries
2.1.1 Hilbert Spaces
2.1.2 Convexity
2.1.3 Probability
2.2 Basic Definitions of Granger Causality
2.2.1 Modeling Space
2.2.2 Defining Granger Causality
2.2.3 Basic Properties
2.3 Granger Causality Graphs
2.3.1 Pairwise Granger-Causality
2.4 Time Series Models
2.4.1 Inverting 2.19
2.4.2 Granger Causality in Autoregressive Models
2.5 Finite Autoregressive Models
2.5.1 Stability

3 Granger Causality - Methods
3.1 Classical Methods
3.2 The Linear Model
3.3 Classical Approaches to the Linear Model
3.3.1 Ordinary Least Squares


3.3.2 The LASSO
3.4 Depth Wise Grouped LASSO (DWGLASSO)
3.4.1 Properties of ΓDW
3.4.2 Existence, Uniqueness, and Consistency
3.5 Algorithms for DWGLASSO
3.5.1 Subgradient Descent
3.5.2 Alternating Direction Method of Multipliers (ADMM)
3.5.3 Elastic-Net DWGLASSO
3.6 Simulation Results
3.6.1 ADMM Convergence
3.6.2 Model Consistency in Squared Error
3.6.3 Model Support Recovery

4 Applications
4.1 Applications from the Literature
4.1.1 Finance and Economics
4.1.2 Neuroscience
4.1.3 Biology
4.2 DWGLASSO Applied to CWEEDS Temperature Data

5 Conclusion
5.1 Further Research
5.1.1 Theoretical Results for Support Recovery
5.1.2 Choice of Hyper-parameters
5.1.3 Model Perturbation
5.2 Summary

Bibliography

List of Figures

2.1 Illustrations of Convexity
2.2 A Convex Function
2.3 Pairwise Granger-Causality is not Sufficient
2.4 Pairwise Granger-Causality is not Necessary

3.1 A convex function with non-differentiable "kinks". Examples of subgradients at x₀ are shown in red.
3.2 ADMM Convergence
3.3 L2 Convergence
3.4 Random Guess, Base Probability q̂ ∈ [0, 1]
3.5 Support Recovery, λ ∈ [10⁻⁵, 10⁻¹·⁵]
3.6 ROC Curves for Equation 3.42
3.7 Matthew's Correlation Coefficient for Equation 3.42

4.1 Qualitative Measures of Financial Sector Connectedness [15]
4.2 Gene Regulatory Network Inferred by [16] through Granger-causality and the LASSO
4.3 Inferred causality graph. Direction of each edge from west (left) to east (right) or from east to west is indicated by color and line style. The transparency of each edge is weighted by the edge intensity.

List of Abbreviations

P-a.s. Almost surely with respect to the probability measure P.

ADMM Alternating Direction Method of Multipliers.
AR Auto Regressive.

DW Depth-wise.
DWGLASSO Depth-wise Grouped LASSO.

GLASSO Grouped LASSO.

LASSO Least Absolute Shrinkage and Selection Operator.
LMMSE Linear Minimum Mean Squared Error.

MCC Matthew’s Correlation Coefficient.

OLS Ordinary Least Squares.
OLST Ordinary Least Squares with Tikhonov Regularization.

p.d. Purely Deterministic.
p.n.d. Purely Non-deterministic.

SCM Structural Causal Model.

WSS Wide Sense Stationary.

List of Symbols

B(c, r) Ball of radius r centered at the point c. The underlying normed space is to be understood from context, but is usually standard Euclidean space.

0_n Column vector of size n containing only 0.
1 Logical indicator function: 1(P) = 1 if proposition P is True, and 1(P) = 0 otherwise.
1_n Column vector of size n containing only 1.

B* The statistically optimal parameters for a VAR(p) model.
B Arrangement of coefficients of a VAR(p) model. See definition 3.1.

Beij Vector of filter coefficients from process xj(t) to xi(t) in a VAR(p) model. See definition.

R(τ) Autocovariance matrix of a WSS process: R(τ) = E[X(t)X(t − τ)ᵀ].
conv Closed convex hull of a set.

dom f The domain of the function f.
δ(t) The Kronecker delta (discrete-time unit impulse): δ(0) = 1, δ(t) = 0 for all t ≠ 0.

E Mathematical Expectation.

G* The adjacency matrix of the true underlying causality graph. The modeling space with respect to which it is defined must be inferred from context.

H_{t,p} Hilbert space linearly generated by the past p samples of an n dimensional process x(t): H_{t,p} = cl{ Σ_{τ=1}^{p} A(τ)x(t − τ) | A(τ) ∈ ℝ^{n×n} }.

In Identity matrix of size n × n.


⊗ Kronecker product of matrices. A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q} =⇒ (A ⊗ B) ∈ ℝ^{mp×nq}, with the p × q block of (A ⊗ B) in the i, j position given by a_{i,j}B.

L₂ⁿ n-fold Cartesian product of L₂.
L₂(Ω, F, P) Hilbert space of square integrable random variables over (Ω, F, P). Usually abbreviated to L₂.

|| · || An abstract norm.
n The dimension of a process x(t), or the number of nodes in a causality graph.

(Ω, F, P) Our underlying probability space with sample space Ω, σ-algebra F and probability measure P.

P A probability measure.
prox_f Proximal operator of a function f. See definition 3.7.
p The lag length used in a VAR(p).
sgn(·) sgn(a) = 1 if a > 0, sgn(a) = −1 if a < 0, sgn(a) = 0 otherwise.

ᵀ The transpose of a matrix A is written Aᵀ.
τ A time lag or time difference.

Var The variance: Var X = E[X − EX]².
VAR(p) Abbreviation for "Vector Auto Regression of order p". A vector random process is modeled as x(t) = Σ_{τ=1}^{p} B(τ)x(t − τ) + e(t).

X_t A space generated by the past of an n-dimensional process x(t), usually X_t = H_t ≜ cl{ Σ_{τ=−∞}^{t−1} A(τ)x(τ) | A(τ) ∈ ℝ^{n×n} }.

Z A more convenient arrangement of data for subgradient descent and a distributed formu- lation of DWGLASSO. See definition 3.2. Z The natural arrangement of data for estimat- ing VAR(p) models via least squares. See def- inition 3.1.

Geologic history shows us that life is only a short episode between two eternities of death, and that, even in this episode, conscious thought has lasted and will last only a moment. Thought is only a gleam in the midst of a long night. But it is this gleam which is everything.
- Henri Poincaré, The Value of Science, 1905

Chapter 1

Introduction

1.1 The Philosophy of Causality

The philosophical study of causality dates back at least 2400 years to the time of Plato who stated “everything that becomes or changes must do so owing to some cause; for nothing can come to be without a cause” [1]. Humanity’s understanding of causality has changed dramatically since the time of the ancient Greeks, and while this thesis is in no way a philosophical work, we take some time in this introductory chapter to cover the basic conceptions of causation in order to make more clear the nature of our work. Our main source is the short text by Mumford [2] (and some references therein) which lays out a brief and accessible introduction to the philosophy of causality.

Early Beginnings According to Mumford, it was shortly after Plato's statement that Aristotle developed his well known theory of "The Four Causes" [3]. Aristotle's notion of "causality" was a metaphysical theory focused on the determination of what it is that brings anything into "being", and as such touches on much more than what we today understand as causality. In particular, The Four Causes dealt first with the "material cause", which is that which brings something into being, second with the nature of the object's creator, called the "efficient cause", thirdly with the purpose of an object, or its "final cause", and finally with the "formal cause", which is simply the object's distinguishing feature. Aristotle's efficient cause is the only one having some resemblance to our modern notion. Following the early beginnings of the ancient Greeks, the modern philosophy of causality began its development around the time of the age of enlightenment and the scientific revolution with Hume, Spinoza, Locke, and others. And, while a great number of philosophical schools have dealt with causality, particularly the stoics, these thinkers have had a profound influence on the modern (western) understanding of causation. The principal aims of philosophical theories of causality are to answer questions about what causation is and whether or not such a thing even exists. Although it is naturally clear in the minds of most men what they mean by "cause", whether or not one has a consistent theory of causality which can be formulated in one of the world's natural languages is another question entirely. Indeed, Bertrand Russell even claimed that there was no such thing as cause, stating: "The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm" [4].

Consistent Regularity Revolutionary at the time, David Hume's theory of causation asserted simply that causation was consistent regularity [2]. For example, if it is always the case that a certain sound follows the collision of two billiard balls, then it must be that this collision is what causes the sound. This at first seems to be perfectly acceptable reasoning. But, if this really is what defines causality, then our modern conception is in trouble. For example, there is scientific consensus that smoking causes lung cancer, but there are many people who have been regular smokers that never developed cancer. Since there is no consistent regularity here, does it mean that science misuses the notion of cause? Quite the contrary, the evidence showing that smoking causes lung cancer seems to be well in tune with our intuition, so from this perspective, it seems clear that causality should be something more than merely consistent regularity.

Physicalist Causation Another view of causation, which seems particularly appealing to engineers and others with a technical background, is one inspired by physics. It has been suggested that what we observe as causation is fundamentally a result of the laws of mechanics acting at the atomic level. This theory is reasonable to follow if we consider again the example of billiard balls, and furthermore the same idea can be used via a reductionist approach to speak of much more complex situations, our previous example of smoking being a cause of lung cancer for instance. Mumford points out however that the physicalist view of causation starts to run into some difficulties if we were to talk about the cause of the Russian revolution, or the beginning of World War I. Indeed, it is argued that the assassination of Franz Ferdinand caused the outbreak of World War I, and while this may be disputed, no one could seriously argue that it was really caused by the complex mechanistic evolution of particles. This simply does not jibe with our common notion of causality. Not only that, it would require one to seriously call into question our own free will. Finally, the modern understanding of quantum mechanics seems to put a nail in the coffin of this physicalist theory of causality.

Counterfactual Dependence Consider the question of necessity. For an event to occur, is its cause sufficient? Is the cause necessary? Or can there be uncaused events? Can a cause sometimes fail to produce its effect? These questions may lead into the idea that causation is defined by counterfactual dependence: if B cannot occur unless A has first occurred, then A must cause B. This notion is close to how causality is determined in the sciences (section 1.2), but may still not hold water in philosophy. Indeed, everything you have ever done is counterfactually dependent upon your birth, but to seriously contest that it is your birth which caused all of these things seems to be rather far fetched. Furthermore, similarly to the physicalist theory, some answers to the questions above may again bring our free will into question. Even if the apparent randomness of quantum mechanics provides the world with uncaused events, does that make us a slave to chance? These are important philosophical questions connected with causality, but certainly will not be of great importance throughout the following chapters of this thesis.


Pluralism To round out our brief philosophical discussion we take note of the view of pluralism. As we have seen, there are a variety of sensible theories which attempt to define causality, but none of them seem to fit our intuition in every case. Perhaps our intuitive notion of cause is not actually a single thing and our language has only misled us to believe that it is. Perhaps the physical theory of causality is well applicable to situations involving mechanics, as in the sun causing the orbit of the planets, and another theory of causality is applicable when we consider what caused the Russian revolution, and yet another when we say that smoking causes lung cancer. Unfortunately, pluralism seems a somewhat intellectually lazy way of escaping the problem of determining what causality really is, and hence the debate continues.

1.2 Causality in Science

Sufficient Reason If we return to our opening quote from Plato in section 1.1, is it indeed the case that nothing can come about without a cause? Closely connected to this idea is the principle of sufficient reason, originating with Leibniz or perhaps Spinoza [5]. This is vaguely a restatement of Plato's belief that everything must have a cause. More precisely, if something is, occurs, or is true then the principle asserts that there is a sufficient explanation as to why. A belief in this principle has been critical for the sciences, and its importance is often cited for example in the early 20th century writings of the great French mathematician Henri Poincaré [6]. If there were no causal connections between events or processes, science could make no progress. We would not even be able to form collections of facts, as in "mercury is liquid above 234K (at 1 atm)", let alone more general laws of nature regarding for example why, how, or under what circumstances matter transforms from solid to liquid. Fortunately, science has enjoyed a great deal of success dispensing with much of the philosophical issues of causality, taking as its starting point the supposition that there are indeed laws of nature, and furthermore that these laws are unchanging through time. While it seems a daunting task to attempt to prove that this is the case, the myriad of results suggests we can safely proceed. Similarly to why one should not refuse to breathe before they prove its importance for a continued existence, we should not shun practical notions of causality before coming to agreement on the concept's deep philosophical meaning.

Causation in Physics One of Bertrand Russell's reasons for denying the existence of causation is that the mathematical laws of physics are entirely symmetric. Taking the most common example (e.g. see [7]) of Newton's law F = ma, it is common to say that a force causes an acceleration. But, we can just as easily write m = F/a, yet no one would suggest that an acceleration causes the mass. Science follows to some extent the intuitions of the scientist, and its success is judged by its capacity for making predictions; "a force causes an acceleration" produces intuitively pleasing and accurate models of the world, so there is little reason to dispense with the notion of cause. However, it is indeed true that physical laws exhibit symmetry, and physicists do take great care to avoid making causal claims in writing. Yet at the same time, almost every physicist surely makes use of the language of causality in their ordinary discourse.


Proving Causation In contrast to physics, there are many scientists in fields such as biology and medicine who do attempt to make serious causal claims in their professional writing. We have seen one already: "smoking causes lung cancer", and at this point, the causal link between smoking and cancer is universally accepted¹. How do these scientists come to agreement about causation? That is, how do they prove that something causes something else? It is well understood that correlation does not suffice to prove causation; for example, the level of mercury in a tube clearly does not cause a particular ambient pressure level. In order to make claims of causation acceptable to scientists we perform controlled trials. For example, in order to determine the efficacy of a drug in treating a disease, we provide a drug to one group of people having the disease, and a placebo to another, medically similar group of individuals who also have the disease. We compare the outcomes of these two treatments and the second group provides us a means of answering the counter-factual question: "had the first group not been given the drug, what would have been the outcome?". It is in this way that we can conclude, if the group receiving the drug had a higher recovery rate, that the drug must have been the cause of their recovery. This brings us back to the philosophical discussion of causality. Scientists use counter-factual dependence to test for causal links, but we ask again, is counter-factual dependence the essence of cause? Philosophically, the answer seems to be in the negative. Everything you have done since birth is counter-factually dependent on your birth, on your parents having met, on your parents' births, etc. But it makes no sense to suggest that your birth was the cause of your actions. Science sidesteps the philosophy and relies to some extent on human intuition. We deem it acceptable in many cases to use counter-factual dependence as a proof of causation because it fits our intuition, and because it has produced reliable and testable results.

1.2.1 The Causal Calculus of Judea Pearl

The work of Judea Pearl [8] has attempted to give the science of causality a concrete calculus. Pearl has argued that the correct way for the sciences to reason rigorously about causation is through a formalism of "Structural Causal Models" (SCMs). An SCM is a triple (X, V, F) where X is a set of "exogenous" variables, determined by nature or factors otherwise outside the model, V is a set of endogenous variables determined by variables in X or V, and F is a set of functions in which f_i assigns a deterministic value to V_i given the values of a subset of X ∪ V called the "parents" of V_i. Probabilities are naturally incorporated into this model by assigning a probability measure over the exogenous variables X. Through the formalism of SCMs, scientists can reason about interdependence, intervention, counterfactuals, etc.

One of the main distinctions between this type of causal modeling and the modern and widely applied methods of statistical learning is that the former seeks not only to make predictions about new data points given a training set (the primary domain of statistical learning), but also to predict what would happen given some intervention (e.g. when we control the value of a particular variable in an experiment) and to answer counter-factual questions (e.g. had we not given treatment, would the patient have died?). It is argued by Pearl that traditional probabilistic models are sufficient for prediction, but not for reasoning about intervention or counter-factuals. The key point about Pearl's work is that it seeks to provide a rigorous calculus for causal reasoning on a model of the world. The construction of the model is left to the domain expert.

¹ An interesting historical note is that even R. A. Fisher denied that smoking has a causal link to cancer.

1.3 Granger Causality

Pearl's conception of causality, based on functional models, competes with the school of thought that argues causation belongs in a probabilistic framework. One particularly well known contributor to this school of thought is Clive Granger [9]. The main idea of Granger's concept of causality is that if from one event, A, we gain knowledge of another, B, where this information is not available from anywhere else, then the former event A must have some causal impact on the latter B. While we will see that Granger's and Pearl's concepts of causality bear little in common, the two have been in agreement that the statistical literature lacks a concrete calculus for causation. Granger in 1980 states "[The statistical] textbooks, having given a cautionary warning about causality, virtually never go on with a positive statement of the form 'the procedure to test for causality is...'" [9]. One of the advantages of Granger's work on causality is that it directly incorporates time, a feature of causation critical to much of our intuition about cause and effect, while Pearl's causality does not. Furthermore, it isn't immediately clear how to bring time directly into the picture of Pearl's causal calculus.

1.3.1 Granger's Axioms

Granger presents three axioms of causality, which we will present with mathematical notation, but in an informal manner. A more careful treatment of the definitions is given in chapter 2. Consider two processes X(t) and Y(t) (these may simply be i.i.d. samples from an experiment), and then, with a slight abuse of notation on the time variable t, let F_t^X and F_t^Y denote all of the information provided to us by X and respectively Y up to time t. Then denote F_t as all of the information available anywhere (including from X(t) and Y(t)) up to time t. We state Granger's three axioms:

1. The past and the present may cause the future, but the future may not cause the past.

2. F_t contains no redundant information. That is, if X(t) is related in a deterministic and invertible way to Y(t), then only the information produced by one of these processes need be present in F_t.

3. All causal relations remain constant in direction throughout time.

The first of these axioms is widely accepted, although the reliance on it is criticized by Pearl, who suggests that it "excludes a priori the analysis of cases in which the temporal order is not well-defined" ([8] pg. 250). To this we would simply respond the opposite: Pearl's models of causality do not naturally take into account temporal order, a critical part of much of our causal intuition. The second axiom is essentially technical in nature, and ensures that removing F_t^Y from F_t removes all of the information provided by Y. The third axiom is certainly necessary if we are to draw any meaningful conclusions. If we were to allow the possibility that causal laws changed according to the caprice of God, there would again be no science. Without it, all we could say is that "at the particular time of our experiment, A had a causal impact on B".

1.3.2 Defining Causality

Granger's ideas about causality can be written a little bit more mathematically, although still informally; our goal is to convey intuition. We will write F_t \ F_t^X (analogously for Y) to be the information available from anywhere, except what is available from the process X (recall the second axiom). We note further that it is possible to characterize everything about the processes X(t) and Y(t) with statements of the form P{X(t) ∈ A}, for sets A. Granger's definition of Causality is that Y(t) causes X(t + 1) if

∃E s.t. P{X(t + 1) ∈ E | F_t} ≠ P{X(t + 1) ∈ E | F_t \ F_t^Y}.     (1.1)

This says that knowledge of Y(t), Y(t − 1), . . . provides us with knowledge about X(t + 1) that is not available anywhere else. We modify Granger's definition slightly in order to talk about Y causing X, rather than Y(t) causing X(t + 1). The modification is of little consequence in the end since we will later generally suppose that processes are stationary (see remark 5, and the following). We will say that Y Granger-Causes X if

∃t, ∃E s.t. P{X(t + 1) ∈ E | F_t} ≠ P{X(t + 1) ∈ E | F_t \ F_t^Y}.     (1.2)

The notion of 1.2 differs only slightly from that of 1.1; the intuition is the same, but we are enabled to speak of the process Y causing the process X, rather than the process Y causing particular samples of X.

Probability and Causality It is widely (not universally, although most exceptions belong to the realm of philosophy and not to that of science) accepted that notions of causality must be connected with probability. The fact is that smoking increases the probability of developing lung cancer, and this is enough for us to say that there is a causal link. Again, perhaps this does not reach the essence of causation, but the idea of using probability to describe causation is well established amongst scientists. There are other probabilistic definitions of causation aside from Granger's, but discussing them here begins to take us too far afield. Pearl asserts that definitions of causality must rely on an extension of classical probability theory via his structural causal models in order to incorporate intervention and counterfactuals. We are in agreement that Granger's definition is not enough to get at the true idea of causation, but this is not our goal; Granger's notion still enjoys significant applicability.


1.3.3 Prima Facie Causation

The easiest way to criticize a definition of causality based on 1.2 is that it provides no room for scientific application, as we do not have access to F. In order to move closer to an applicable definition, Granger proceeds by attempting to operationalize 1.2. Suppose we observe the process x(t) = [x₁(t) x₂(t) . . . xₙ(t)]ᵀ. We denote all the information we obtain from observing x up to time t as F_t^x; then Granger says x_j is a prima facie cause of x_i with respect to x if

∃t, ∃E s.t. P{x_i(t + 1) ∈ E | F_t^x} ≠ P{x_i(t + 1) ∈ E | F_t^x \ F_t^{x_j}}.     (1.3)

That is, xj provides information about xi which is not available elsewhere in our data set. This definition is similar to 1.2, but it is weaker. Consider a situation in which X fully determines Y which in turn has a causal effect on Z. Then, Y causes Z with respect to (Y,Z), but not with respect to (X,Y,Z), since knowing X tells us everything we need. Pearl has pointed out that as a definition of causality, 1.3 is circular. How do we decide which processes to observe? We would do so based on which processes we believe will be causally connected. But how do we decide which processes are causally connected? By application of 1.3. By using the phrase prima facie, it is admitted that we have strayed from attempting to formulate a true definition of causality and towards something that is operationally useful. Clearly, 1.3 cannot discern true causality, but it is enough for plausible causal discovery. That is, it provides us with a useful tool for investigating causal questions. Processes which exhibit prima facie causation according to equation (1.3) are reasonable candidates for closer consideration.

1.3.4 Granger Causality

The notion most generally referred to as "Granger Causality", and originally alluded to by Wiener, follows. We let ξ²[x_i(t + 1) | F_t^x] denote the variance of the linear minimum mean square error estimator of x_i(t + 1) given the history of x, and ξ²[x_i(t + 1) | F_t^x \ F_t^{x_j}] the same quantity without the inclusion of x_j in the information set. The last assumptions necessary to provide a truly operational notion are wide sense stationarity and ergodicity (see section 2.1.3), because in order to estimate the aforementioned variances in practice, we need a sufficient number of samples from a statistically consistent process. These assumptions consequently set aside the "∃t" portion of the definition, as the variance is unchanged across t. Finally, we say x_j Granger-Causes x_i if

ξ²[x_i(t + 1) | F_t^x] < ξ²[x_i(t + 1) | F_t^x \ F_t^{x_j}].     (1.4)

This follows the same intuition as the previous notions, except now the definition is operational: the tools for estimating these quantities in practice are well understood. The classical formulation of Granger Causality was cemented by Geweke [10] [11] and is elaborated in chapter 3.
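To make the operational comparison in equation (1.4) concrete, here is a minimal sketch (not from the thesis; NumPy, simulated data, an arbitrary lag length of 2, and held-out prediction error in place of a formal statistical test) contrasting the error variance of a linear predictor of x(t + 1) built from the past of both series with one built from the past of x alone. A smaller held-out error when y is included is the finite sample analogue of (1.4).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two series where y drives x with a one step delay.
T = 4000
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.standard_normal()

def holdout_residual_variance(target, predictors, p=2):
    """Fit an OLS predictor of target(t) from p lags of each column of
    `predictors` on the first half of the sample, and return the variance
    of its prediction errors on the second half."""
    n = len(target)
    Z = np.column_stack([np.ones(n - p)] +
                        [predictors[p - tau:n - tau, j]
                         for tau in range(1, p + 1)
                         for j in range(predictors.shape[1])])
    tgt = target[p:]
    half = len(tgt) // 2
    b, *_ = np.linalg.lstsq(Z[:half], tgt[:half], rcond=None)
    return np.var(tgt[half:] - Z[half:] @ b)

both = np.column_stack([x, y])   # past of x and y
alone = x[:, None]               # past of x only

print("xi^2 using the past of (x, y):", holdout_residual_variance(x, both))
print("xi^2 using the past of x only:", holdout_residual_variance(x, alone))
```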


1.4 Plausible Causal Discovery and Thesis Outline

In stating equation (1.4) it is clear that we have strayed rather far from what could be considered the essence of causality, or even a method for testing true causation. Indeed, Granger Causality is a merely statistical notion. However, the fact that it is computable given only observational data and requires no experimental design, intervention, or control confers some utility, particularly in the age of “big data”.

Deficiencies We must take care to note some of the immediately obvious deficiencies. The first issue is that we cannot reasonably apply Granger-Causality to datasets that cannot be modeled with traditional econometric means, that is, with linear models. Extensions to Granger-Causality have been explored in the literature, and we touch on this issue in chapter 2. Secondly, the set of measured processes is extremely important. It is possible for a Granger-Causal link to disappear amongst processes once we condition on a third. Furthermore, given a high dimensional data set (when the number of processes is comparable to or exceeds the number of samples) care must be taken to minimize the number of spurious links.

Benefits We can also note some of the immediately obvious benefits. First and foremost, as has already been mentioned, Granger-Causality can be estimated from merely observational data and is hence applicable to situations in which it is impossible or infeasible to interact with the system under study. Modern science is inundated with copious amounts of data, and Granger-Causality provides a tool to help tease out important relationships. The naive approach to this so-called "plausible causal discovery" is to look merely at the correlations between data. After all, both correlation and Granger-Causality are linear measures of the relationships between time series. But, it is easy to see that Granger-Causality is much more powerful than mere correlation. Granger-Causality may be detected between time series that are not at all correlated, and additionally Granger-Causality can deal jointly with a large number of processes, whereas correlation is a merely pairwise (and undirected) concept.

Applications Applications of Granger Causality have been explored in diverse areas including medical imaging [12], neuroscience [13] [14], finance [15], genetics [16], power systems [17], process control [18], and others. These works understand that Granger Causality cannot be used to prove causal relations; it is used as a measure of system connectivity, energy or information transfer, or as a means of narrowing down the search for causation. Applications from the literature, as well as our own application results are given in chapter 4.

Contributions Our primary contribution is to provide a sensible method of regularizing time series models in the context of Granger Causality, and algorithms for carrying out this modeling in practice. Our main results are presented in chapter 3.

Chapter 2

Granger Causality - Theory

2.1 Preliminaries

Before we continue, we must recall some basic results in order to set the stage for what follows.

2.1.1 Hilbert Spaces

Hilbert spaces are a central concept in much of our theoretical development. Everything we review here is standard material.

Definition 2.1 (Hilbert Space). A Hilbert space is a complete inner product space. That is, a vector space H equipped with inner product ⟨·, ·⟩ in which Cauchy sequences in H converge in H. The notion of convergence is furnished by taking the metric d(x, y) = ||x − y||, where the norm || · || is induced by the inner product via ||x|| = √⟨x, x⟩.

Example 1 (Square Summable Sequences). We will have occasion to refer to spaces of real, square summable sequences, which we write ℓ₂(ℤ) and usually abbreviate to simply ℓ₂. That is, x = (. . . , x₋₁, x₀, x₁, . . .) ∈ ℓ₂ whenever Σ_{t=−∞}^{∞} x_t² < ∞. It is well known that equipping ℓ₂ with the standard inner product ⟨a, b⟩ = Σ_{t=−∞}^{∞} a_t b_t forms a Hilbert space.

We recall the projection theorem ([19], p. 131), which is of fundamental importance in this thesis:

Theorem 2.1 (Projection Theorem). Let X be a closed subspace of a Hilbert space H. For any y ∈ H, there exists a unique vector x ∈ X, denoted by Ê[y|X], such that

Ê[y|X] = argmin_{x∈X} ||x − y||.

Furthermore,

Ê[y|X] = x ⟺ ⟨y − x, z⟩ = 0 ∀z ∈ X.

Remark 1. The vector Ê[y|X] ∈ X is referred to as the projection of y ∈ H onto X, and we stress the fact that this vector is unique. The second condition in the above theorem is referred to as the orthogonality condition and is often useful in computing the projection.
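As a finite dimensional illustration (a small sketch, not part of the thesis), projecting a vector onto the column span of a matrix via least squares produces the minimizer of Theorem 2.1, and the orthogonality condition can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

# X is the subspace spanned by the columns of A; y is an arbitrary vector.
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

# Projection of y onto span(A): solve min_b ||A b - y||, so E^[y|X] = A b.
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

# Orthogonality condition: the residual is orthogonal to every column of A,
# hence to every element of span(A).
print(A.T @ (y - y_hat))   # numerically ~ 0
```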


2.1.2 Convexity

Convex geometry, the theory of convex functions, and the application of this theory to optimization, provides some of the most commonly used mathematical tools in modern applications. It has been said by R. Rockafellar that "the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity." We review in this section the key ideas that we apply throughout. Throughout this section, the definitions rely on some underlying vector space in order for convexity to be defined. Further, theorem 2.4 relies on an ambient normed space in order to define the norm ball B(x, r) centered on x and of radius r. Generally, the ambient space will be a Hilbert space, but this much structure is not strictly necessary.

Definition 2.2 (Convex sets). A subset C of a vector space is convex if for every x, y ∈ C, and every λ ∈ (0, 1) we have λx + (1 − λ)y ∈ C. Illustrations for subsets of R2 are given in figure 2.1.

Figure 2.1: Illustrations of Convexity. (a) A convex set; (b) a non-convex set.

An important theorem, following immediately from the definition, is that intersections of convex sets remain convex.

Theorem 2.2 (Intersections of Convex Sets are Convex). Let C_α be a collection of convex sets indexed by α ∈ A. The intersection ∩_{α∈A} C_α is convex.

The idea of convexity can similarly be extended to functions.

Definition 2.3 (Convex Functions). A function f : C → ℝ is convex if C is convex and for every x, y ∈ C and every λ ∈ (0, 1) we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).     (a)

f is called strictly convex if the inequality (a) is strict. An illustration is given in figure 2.2.

The connection between convex functions and convex sets is through the epigraph of f.


Figure 2.2: A Convex Function

Theorem 2.3 (Epigraphs and Convexity [20]). A function f : C → ℝ is convex if and only if its epigraph epi f = {(x, t) ∈ C × ℝ | f(x) ≤ t} is convex.

The reason that convex functions are so important is exemplified in the following theorem, which I refer to as the fundamental theorem for convex functions:

Theorem 2.4 (Fundamental Theorem for Convex Functions [20]). Let f : C → ℝ be a convex function. Define the set of local minimizers

X* = {x* ∈ C | ∃ε > 0 s.t. f(x*) ≤ f(x), ∀x ∈ B(x*, ε)}.

Then for each x* ∈ X* we have f(x*) ≤ f(x), ∀x ∈ C. Moreover, if f is strictly convex, X* is a singleton.

Remark 2. Theorem 2.4 tells us that the local minima of convex functions are in fact global minima. We can hence prove global statements about convex functions using only local information. The case for strictly convex functions is important for asserting uniqueness, and in the context of minimization algorithms. It may be the case that for some iterative minimization procedure generating the sequence x₁, x₂, . . . it is known that f(xₙ) → f*, where f* is the minimum value of f(x). If f is strictly convex, this implies that the sequence xₙ is approaching a unique minimizer.

We can generalize the projection theorem 2.1 to any closed convex subset of a Hilbert space.

Theorem 2.5 (Convex Projections [21], [22]). Let X be a non-empty closed convex subset of a Hilbert space H. For any y ∈ H, there exists a unique vector x ∈ X, denoted by Ê[y|X], such that

Ê[y|X] = argmin_{x∈X} ||x − y||.


A sufficient condition for uniqueness in a Banach space is for the norm to be strictly convex. However, uniqueness fails in the simple case of the ℓ₁ norm in ℝ².
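For intuition (again a sketch, not from the thesis), projections onto simple closed convex subsets of ℝⁿ have closed forms; clipping coordinates projects onto a box and rescaling projects onto a Euclidean norm ball:

```python
import numpy as np

def project_box(y, lo, hi):
    """Euclidean projection of y onto the box {x : lo <= x <= hi}."""
    return np.clip(y, lo, hi)

def project_l2_ball(y, r=1.0):
    """Euclidean projection of y onto the closed ball {x : ||x||_2 <= r}."""
    nrm = np.linalg.norm(y)
    return y if nrm <= r else (r / nrm) * y

y = np.array([2.0, -0.5, 3.0])
print(project_box(y, -1.0, 1.0))      # [ 1.  -0.5  1. ]
print(project_l2_ball(y, r=1.0))      # y rescaled to unit Euclidean norm
```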

2.1.3 Probability

Throughout, we will take (Ω, F, P) as our underlying probability space and work primarily with the standard space of real valued and square integrable random variables, denoted L₂(Ω, F, P), and usually abbreviated simply to L₂.

Definition 2.4 (L₂). Given a probability space (Ω, F, P), the F-measurable function (random variable) x : Ω → ℝ, with expected value Ex ≜ ∫_Ω x(ω) dP(ω), is an element of L₂(Ω, F, P) whenever Ex² = ∫_Ω x² dP < ∞. Convergence in this space is referred to as "mean square convergence", or "convergence in mean square".

Theorem 2.6. Equipped with the inner product ⟨x, y⟩ ≜ E[xy], the space L₂ of square integrable random variables is a Hilbert space, as long as we identify random variables which are equal P-a.s.

This theorem is central and we provide a proof; however, we lack space to provide all of the requisite background. Additional details can be found in [23].

Proof. It is clear that L₂ is a vector space, and that ⟨·, ·⟩ is a bona fide inner product. We need only to verify that Cauchy sequences converge. To this end, let xₙ be a Cauchy sequence in L₂, that is

E(x_n − x_m)² → 0 as n, m → ∞.

From Markov's inequality we obtain

P(|x_n − x_m| ≥ ε) ≤ (1/ε²) E(x_n − x_m)² → 0,

hence x_n is Cauchy in probability and thus there is some x such that x_n → x in probability. Moreover, this implies that there is a subsequence x_{n′} such that x_{n′} → x almost surely. From this subsequence, we obtain

E(x_n − x)² = E liminf_{n′→∞} (x_n − x_{n′})² ≤(a) liminf_{n′→∞} E(x_n − x_{n′})² →(b) 0 as n → ∞,

where (a) follows by Fatou's lemma, and (b) is by the supposed Cauchy property; hence x_n is mean square convergent. Finally, ∞ > E(x_n − x)² ≥ (√(E x_n²) − √(E x²))² by the reverse triangle inequality, which gives us E x² < ∞. Thus, x ∈ L₂ and L₂ is complete.

Remark 3 (Vector Valued Processes). In order to deal with vector valued random variables, we may take an indexed collection of n L₂ variables, and form the vector x = [x₁, x₂, . . . , xₙ]ᵀ. Equipping the space of all such vectors with the inner product ⟨x, y⟩ ≜ tr E[xyᵀ] = E[xᵀy] again yields a Hilbert space, which we will refer to as L₂ⁿ.


Definition 2.5 (Stochastic Process). A stochastic process x(ω, t) on (Ω, F, P) is a time indexed series of random variables, with t in one of R, Z+, Z, or {1, 2,...,T } being typical. We usually suppress ω in the notation and write x(t).

In this thesis, we usually work in the discrete time case (t ∈ ℤ), and this is to be understood unless otherwise stated. Furthermore, we will usually deal with sequences of random variables in L₂ⁿ, and will write x(t) ∈ L₂ⁿ to mean that x(t) = [x₁(t) . . . xₙ(t)] ∈ L₂ⁿ for every t. We have drawn frequently upon [24] and [25] as references for this material. Examples of stochastic processes (not necessarily in L₂) frequently encountered include sequences of i.i.d. random variables, Markov chains, Poisson processes, and Gaussian processes. There are a great variety of references discussing the associated theory and applications of stochastic processes in general, e.g. [26], [23].

Remark 4 (Time Series Data). In the context of application, "time series data" refers to finite length sequences of observed data, in which each observation is also associated with a time stamp indicating when it was made. We model such data as a finite number of observations of a stochastic process. That is, for some process x(t), the collection of T samples (x(t₀), x(t₀ + 1), . . . , x(t₀ + T − 1)) constitutes a set of time series data. Stochastic processes are used as abstract models for time series data. Time series analysis, or the closely related area of signal processing, is frequently used in economic and financial modeling, as well as classical areas of electrical engineering. Any phenomenon involving time indexed observations of data may be amenable to effective modeling via stochastic processes. Much of the theory is given in e.g. [27], [28].

Remark 5 (Stationarity). Consider a process x(t). Each x(ti) is a random quantity, but the statistics, (the expectation of some measurable functions of the process) e.g. Ex(t), P(x(t) ≤ r), Ex(t)x(s) etc... are deterministic functions of the time parameters t, s. If moreover, these functions are constant, e.g. Ex(t) = µ, at least over an appreciable period of time, then the process is said to exhibit some amount of stationarity. Stationarity assumptions are of paramount importance in practice, as we are able only to observe finite quantities of data. The strongest form of stationarity is as follows: Definition 2.6 (Strict Sense Stationary). A process x(t) is strict sense stationary (SSS) if for every t1, . . . , tn we have

∀τ F (t1 + τ, . . . , tn + τ) = F (t1, . . . , tn), where F gives the joint cumulative distribution function for the process.

The natural stationarity condition for L₂ⁿ processes is "wide sense stationarity".

Definition 2.7 (Wide Sense Stationary). The process x(t) ∈ L₂ⁿ is called wide sense stationary (WSS) if its autocovariance R(s, t) = E[x(s)x(t)ᵀ] is such that

R(s, t) = R(|s − t|, 0) ≜ R(τ), τ = |s − t|.

An n-dimensional process w(t) with R(τ) = δ(τ)In is called a normalized white noise process, and is an important example of a WSS process.

13 2.2. BASIC DEFINITIONS OF GRANGER CAUSALITY

Of course, strict sense stationarity implies wide sense stationarity, but there are a great many WSS processes that are not SSS.

Remark 6 (Ergodicity). Ergodicity is another important assumption in practice. Essentially, a process is ergodic if its sample paths are consistent with its statistics. A WSS process x(t) having mean µ and covariance R(τ) is ergodic in mean and covariance if

lim_{T→∞} (1/(T + 1)) Σ_{t=−T/2}^{T/2} x(t) = µ   a.s.,     (2.1)

and

lim_{T→∞} (1/(T + 1)) Σ_{t=−T/2}^{T/2} x(t)x(t − τ)ᵀ = R(τ)   a.s.,     (2.2)

that is, sample statistics converge to ensemble statistics. More generally, a process is (completely) ergodic if P(A) ∈ {0, 1} for every set A such that ∀τ x(t) ∈ A =⇒ x(t + τ) ∈ A.¹ This condition ensures that x(t) cannot get "stuck" in some region of space, unless that region does not affect the ensemble statistics of the process.

Ergodicity is a very difficult concept to work with, both in theory and in practice; it will always be assumed in this thesis that the processes we work with are "ergodic enough", which usually comes down to equations (2.1) and (2.2), or the analogous formulae for t ≥ 0.
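In practice the limits in equations (2.1) and (2.2) are replaced by finite sample averages over the available window. A minimal sketch (not from the thesis; NumPy, a one-sided time index, and the normalization 1/(T − τ) chosen purely for illustration):

```python
import numpy as np

def sample_mean_and_autocov(X, max_lag):
    """Sample analogues of (2.1) and (2.2) for a realization X of shape (T, n):
    returns mu_hat (n,) and R_hat of shape (max_lag + 1, n, n), where
    R_hat[tau] estimates E[x(t) x(t - tau)^T] of the centered process."""
    T, n = X.shape
    mu_hat = X.mean(axis=0)
    Xc = X - mu_hat
    R_hat = np.empty((max_lag + 1, n, n))
    for tau in range(max_lag + 1):
        R_hat[tau] = Xc[tau:].T @ Xc[: T - tau] / (T - tau)
    return mu_hat, R_hat

# White noise should give R_hat[0] ~ I and R_hat[tau] ~ 0 for tau > 0.
rng = np.random.default_rng(2)
W = rng.standard_normal((5000, 2))
mu_hat, R_hat = sample_mean_and_autocov(W, max_lag=2)
print(mu_hat)
print(R_hat[0])
print(R_hat[1])
```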

2.2 Basic Definitions of Granger Causality

While this thesis is primarily concerned with the framework of linear models and mean squared error, we will make a more general definition of Granger Causality, suggestive of possible extensions. Prior to doing so, we need to define a bit of notation.

2.2.1 Modeling Space

We will define a generic "modeling space" X_{t,p}, as well as the canonical modeling space H_{t,p} that will be used primarily throughout. Intuitively, we will model a process x(t) ∈ L₂ⁿ as being generated by some sort of parametric system, driven by some input process. The "size" of X_{t,p} quantifies the expressiveness of our model. If X_{t,p} is too small, then it may not be capable of capturing the variation of x(t).

n Definition 2.8 (Modeling Space). Let x(t) ∈ L2 be a process, V ⊆ {1, 2, . . . , n} a T subset of the indices of x(t) so that xV (t) = [xi1 (t), . . . , xi|V | ] , ik ∈ V restricts x(t) n to the V sub-indices. Next, let Xt,p|V = (σ{xi(t − τ) | i ∈ V, 0 < τ ≤ p}) be the n-times Cartesian product of the filtration generated by the V subset of x(t), over the past p time steps; note that the current time is not included. There is no issue  S with taking the infinite past, e.g., Xt,∞|V = σ p>0 Xt,p|V .

1These are called invariant sets

14 2.2. BASIC DEFINITIONS OF GRANGER CAUSALITY

A generic modeling space X_{t,p|V} ⊆ L₂ⁿ is a closed convex space of random variables with components measurable with respect to X_{t,p|V}. We make the abbreviations:

X_{t,p} ≜ X_{t,p|{1,...,n}},

X_t ≜ X_{t,∞},   X_t^{−i} ≜ X_{t|{1,...,n}\{i}}.

Remark 7 (Modeling). There is some freedom to choose the particulars of this set based on the known or assumed properties of x(t). Intuitively, X_{t,p} will give the expressiveness of our model, and expanding this space, while potentially making the necessary computations much more arduous, would enable us to capture a wider variety of interactions in a system. However, if we are not interested in modeling the precise nature of x(t), the simple linear space H_{t,p} (defined below) may still be sufficiently expressive for the detection of causal interactions.

Definition 2.9 (Predicting x(t)). Fix a process x(t) ∈ L₂ⁿ and an associated modeling space X_t. The best prediction in mean squared error, x̂(t), of x(t) in X_t is given by

x̂(t) = Ê[x(t)|X_t] = argmin_{z∈X_t} E||x(t) − z||²,     (2.3)

which is the convex projection of x(t) onto X_t in the metric defined earlier for L₂ⁿ.

Example 2 (Lagged Linear Combinations). The linear span of the past of x(t), which we will denote by H_t, will be our canonical modeling space. Coordinate wise we have

H_{t,p|V} = cl{ Σ_{τ=1}^{p} Σ_{i∈V} b_i^{(τ)} x_i(t − τ) | b_i^{(τ)} ∈ ℝ }^{|V|},     (2.4)

where we have indicated the |V|-fold Cartesian product of sets. H_{t,p|V} is a Hilbert subspace of L₂ⁿ. In vector notation we can write

H_{t,p|V} = cl{ Σ_{τ=1}^{p} B(τ) x_V(t − τ) | B(τ) ∈ ℝ^{|V|×|V|} },     (2.5)

where x_V(t) ∈ ℝ^{|V|} is the subvector of x(t) having indices in V.
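A finite sample analogue of the projection onto H_{t,p} is an ordinary least squares fit over lagged copies of the observed process. The sketch below (not from the thesis; the lag length and the simulated system are arbitrary choices) builds the lagged design matrix and recovers the coefficient matrices B(τ):

```python
import numpy as np

def fit_var_predictor(X, p):
    """Least squares estimate of B(1), ..., B(p) in the model
    x(t) ~ sum_{tau=1}^{p} B(tau) x(t - tau); X has shape (T, n).
    Returns B of shape (p, n, n) and the in-sample one-step predictions."""
    T, n = X.shape
    # Row t of Z stacks x(t-1), ..., x(t-p); the target is x(t).
    Z = np.hstack([X[p - tau:T - tau] for tau in range(1, p + 1)])  # (T-p, n*p)
    Y = X[p:]                                                       # (T-p, n)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)                    # (n*p, n)
    B = coef.T.reshape(n, p, n).transpose(1, 0, 2)                  # (p, n, n)
    return B, Z @ coef

# Example with a known VAR(1) system.
rng = np.random.default_rng(3)
A = np.array([[0.5, 0.3], [0.0, 0.4]])
X = np.zeros((3000, 2))
for t in range(1, 3000):
    X[t] = A @ X[t - 1] + 0.1 * rng.standard_normal(2)
B, X_hat = fit_var_predictor(X, p=2)
print(B[0])   # close to A
print(B[1])   # close to 0
```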

p n X |V |×|V |o Xt,p|V = cl Alog(1 + |xV (t − 1)|) + B(τ)xV (t − τ) | A, B(τ) ∈ R , (2.6) τ=1 where the log function is to be interpreted as applying element-wise. Using the 2 2 inequality log(1 + |x|) ≤ |x| we see that Elog(1 + |x(t)|)i ≤ E|x(t)| < ∞, and n hence Xt,p ⊆ L2 . The closed space Xt,p is thus Hilbert. Furthermore, it is convex with Ht,p ⊂ Xt,p. Hence, Xt,p is a more expressive modeling space than Ht,p.


Remark 8. The additional modeling power afforded to us by considering convex spaces also allows us to model constraints. We may know, for instance, that x(t) is bounded, or we may know a priori about some of the interactions between different components of x(t). If we have some prior information about x(t), we are free to use the linear spaces given above, or generalizations thereof, and restrict them to some convex subset that encodes some of our prior beliefs, without impacting the definitions below. The critical feature of projecting onto a convex set is the uniqueness of the projection, cf. theorem 2.5.

Example 4 (Convex Restriction). Consider a convex set C ⊆ ℝⁿ. The set 𝒞 = {z ∈ L₂ⁿ | P(z ∈ C) = 1} inherits convexity from C, and so we are able to define the convex restricted space X_{t,p}^C = X_{t,p} ∩ 𝒞. The restricted space maintains its convexity by virtue of the intersection property 2.2.

Example 5 (Convex Restriction of H_{t,p}). Suppose that e(t) ∈ L₂ⁿ is a sequence of i.i.d. random vectors. For some stable² matrix A ∈ ℝ^{n×n}, define the process x(t) = Ax(t − 1) + e(t). If e(t) is, for example, a sub-Gaussian random variable, then we will have that for some r ≥ 0, x(t) is confined to the set

S_t = {Ax(t − 1) + u | ρ(A) < 1, ||u||₂ ≤ r}

with high probability, where ρ(A) = |λ_max(A)| denotes the spectral radius of A. We would like to project onto the restricted set H_{t,p} ∩ S_t in order to produce estimates consistent with our prior knowledge. However, S_t is not in general a convex set³. In order to obtain well defined projections, we propose two convex approximations of S_t, the first corresponding roughly to the proposal given in the related work of [29]:

C_t = {Ax(t − 1) + u | ||A|| ≤ γ, ||u||₂ ≤ r},     (2.7)

C̄_t = {Ax(t − 1) + u | tr|A| ≤ nγ, ||u||₂ ≤ r},     (2.8)

where the convexity of these sets follows directly from the definition. Furthermore, both C_t and C̄_t are closed. By virtue of their convexity, and of the inequalities ρ(A) ≤ ||A|| and tr|A| ≤ ρ(A) (where |A| denotes element-wise absolute value), we have the sequence of inclusions

C_t ⊆ conv S_t ⊆ C̄_t.

The projections

Ê[x(t)|H_{t,p} ∩ C_t],   Ê[x(t)|H_{t,p} ∩ C̄_t]     (2.9)

are thus well defined, and serve as an approximation to the ill-defined quantity "Ê[x(t)|H_{t,p} ∩ S_t]".

² A stable matrix is one in which all of its eigenvalues are contained strictly within the unit circle. A recursive process x(t) = Ax(t − 1) + e(t) remains bounded for bounded e(t) if and only if A is stable.
³ ρ(A) is convex when A is symmetric.
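As one ingredient of computing such restricted projections, the coefficient matrix itself can be projected onto the constraint set appearing in (2.7). The sketch below (not from the thesis) assumes that ||·|| denotes the spectral norm, in which case the Frobenius-norm projection onto {||A|| ≤ γ} clips the singular values of A at γ; since ρ(A) ≤ ||A||₂, the result is stable whenever γ < 1.

```python
import numpy as np

def project_spectral_norm_ball(A, gamma):
    """Frobenius-norm projection of A onto {B : ||B||_2 <= gamma},
    obtained by clipping the singular values of A at gamma."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.minimum(s, gamma)) @ Vt

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
A_proj = project_spectral_norm_ball(A, gamma=0.9)
print(np.linalg.norm(A_proj, 2))                # <= 0.9
print(max(abs(np.linalg.eigvals(A_proj))))      # spectral radius <= ||A||_2 <= 0.9
```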


2.2.2 Defining Granger Causality

Suppose we have an underlying model space X_{t,p}. With this space understood, we will write x̂(t) ≜ Ê[x(t)|X], and x̂ᵢ(t) to be the iᵗʰ component thereof. Let

ξ[xᵢ(t)|X] ≜ Var{xᵢ(t) − x̂ᵢ(t)},     (2.10)

which is the error variance between xᵢ(t) and the iᵗʰ component x̂ᵢ(t) of the projection x̂(t) onto X. If the estimate x̂(t) is unbiased, that is, E x̂(t) = E x(t), then this reduces to the mean squared error. We now define four quantities, inspired by [10], but giving the notion a greater level of generality:

Definition 2.10 (Measures of Dependence). Given the modeling space X_{t,p}, and the related spaces as described above in section 2.2.1, we define the four fundamental quantities

F_{x_j→x_i}(t) = ξ[x_i(t + 1) | X_{t,p}^{−j}] − ξ[x_i(t + 1) | X_{t,p}],

F_{x_i→x_j}(t) = ξ[x_j(t + 1) | X_{t,p}^{−i}] − ξ[x_j(t + 1) | X_{t,p}],

F_{x_i↔x_j}(t) = ξ[x_i(t + 1) | X_{t,p}] − ξ[x_i(t + 1) | X_{t,p}, X_{t+1,p+1|j}]
             + ξ[x_j(t + 1) | X_{t,p}] − ξ[x_j(t + 1) | X_{t,p}, X_{t+1,p+1|i}],     (2.11)

F_{x_i,x_j}(t) = ξ[x_i(t + 1) | X_{t,p}^{−j}] − ξ[x_i(t + 1) | X_{t,p}, X_{t+1,p+1|j}]
             + ξ[x_j(t + 1) | X_{t,p}^{−i}] − ξ[x_j(t + 1) | X_{t,p}, X_{t+1,p+1|i}],

where Ê[x(t)|X₁, X₂] ≜ Ê[x(t)|conv(X₁ ∪ X₂)], the projection onto the closure of the convex hull⁴ of X₁ ∪ X₂. Given sufficient stationarity assumptions, these quantities will not vary with t.

The quantity F_{x_j→x_i} is directly analogous to the original idea of Granger-causality, and measures the "energy" transfer from x_j to x_i strictly forward in time. The quantity F_{x_i↔x_j} gives a measure of instantaneous feedback between the two processes by taking into account the current sample of x_j. Finally F_{x_i,x_j} gives a measure of the total interaction between x_i and x_j. We have the fundamental decomposition, analogous again to that given by Geweke [10]:

Proposition 2.1. The quantities given in equation (2.11) satisfy the decomposition

F_{x_i,x_j|x}(t) = F_{x_i↔x_j|x}(t) + F_{x_i→x_j|x}(t) + F_{x_j→x_i|x}(t).     (2.12)

Proof. Immediate from equation (2.11).

Remark 9. Proposition 2.1 verifies that the notions of Granger-causality can be successfully generalized to a more sophisticated modeling space while still maintaining its fundamental character. We are now in a position to state our main definition:

4The convex hull of a set is the smallest convex set which contains it.


Definition 2.11 (Granger-causality). If ∃t ∈ T such that

F_{x_j→x_i}(t) > 0     (2.13)

we say that x_j Granger-causes x_i with respect to X, and write x_j −→_X x_i.

The intuition behind this definition is detailed in section 1.3, but essentially it says that x_j provides information about x_i which is not available from any other component of x. The preceding discussion suggests the possibility to discuss the idea of Granger-causality for a fairly large class of models. However, the definition of 2.11 is obviously not easy to work with. For most of our work we will restrict ourselves to the special case in which x(t) is wide sense stationary with X_t = H_t (see example 2), which yields a more standard definition of Granger-causality.

Definition 2.12 (Granger-causality). Fix some t ∈ T. If x(t) ∈ L₂ⁿ is wide sense stationary, then x_j Granger-causes x_i with respect to H if

ξ[x_i(t)|H_t] < ξ[x_i(t)|H_t^{−j}].     (2.14)

Unless otherwise specified, definition 2.12 (equation 2.14) is what we mean by "Granger-causality" in the remainder of this thesis. This notion is in fact what was first proposed by Granger in [30]. In the sequel we will have occasion to deal with a small set of named scalar L₂ processes, say x(t), y(t), z(t). In this case we will write x −→_{(w,x,y)} y to indicate that x Granger-causes y with respect to the Hilbert space linearly generated by the past of the combined process (w, x, y).
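A naive finite sample sketch of definition 2.12 follows (not from the thesis, and not the regularized DWGLASSO estimator developed in chapter 3): for each ordered pair (i, j), compare the held-out prediction error of x_i using p lags of the full process against the same with x_j removed. The helper names and the relative-improvement summary are illustrative choices only; in practice one would apply a proper statistical test or the methods of chapter 3.

```python
import numpy as np

def lagged_design(X, p):
    """Targets x(t) and a design matrix stacking p lags of each column of X."""
    T = X.shape[0]
    Z = np.hstack([X[p - tau:T - tau] for tau in range(1, p + 1)])
    return X[p:], Z

def holdout_error_vars(X, p):
    """Held-out residual variance of each component of x(t) predicted from
    p lags of all columns of X (fit on first half, test on second half)."""
    Y, Z = lagged_design(X, p)
    half = len(Y) // 2
    B, *_ = np.linalg.lstsq(Z[:half], Y[:half], rcond=None)
    return np.var(Y[half:] - Z[half:] @ B, axis=0)

def granger_matrix(X, p=2):
    """G[i, j] = relative increase in prediction error of x_i when the past
    of x_j is excluded from the modeling space (diagonal left at zero)."""
    n = X.shape[1]
    full = holdout_error_vars(X, p)
    G = np.zeros((n, n))
    for j in range(n):
        keep = [k for k in range(n) if k != j]
        restricted = holdout_error_vars(X[:, keep], p)
        G[keep, j] = (restricted - full[keep]) / full[keep]
    return G

# Example: x2 drives x1, nothing drives x2.
rng = np.random.default_rng(5)
X = np.zeros((4000, 2))
for t in range(1, 4000):
    X[t, 0] = 0.4 * X[t - 1, 0] + 0.7 * X[t - 1, 1] + 0.2 * rng.standard_normal()
    X[t, 1] = 0.5 * X[t - 1, 1] + 0.2 * rng.standard_normal()
print(granger_matrix(X))   # large G[0, 1] entry, small elsewhere
```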

2.2.3 Basic Properties

A number of important properties of Granger-causality follow directly from the Hilbert space structure. We first define some notation. Denote by g_i(u) ∈ ℓ₂(ℕ) the sequences of coefficients of i = 1, 2, . . . , n linear, causal, and invertible filters. Then define a sequence of diagonal matrices H(u) ∈ ℝ^{n×n} such that H_{ii}(u) = g_i(u). H acts on processes x(t) ∈ L₂ⁿ via convolution: (H ∗ x)(t) ≜ Σ_{u=0}^{∞} H(u)x(t − u). And, convolution with the diagonal matrix of inverse filters H^{−1}(u) yields by definition: (H^{−1} ∗ H ∗ x)(t) = x(t). Define by H ∗ H_t the Hilbert space generated by the past of the filtered processes (H ∗ x)(t)_i.

Proposition 2.2 (Invariance to Causal, Invertible, LSI Filtering). Let x(t) ∈ L₂ⁿ and H(u) ∈ ℝ^{n×n}, u ∈ ℕ denote the diagonal matrix of coefficients of n causal and invertible LSI filters. Then H ∗ H_t = H_t, and if x_j −→_H x_i, then g_j ∗ x_j −→_{H∗H} g_i ∗ x_i.

Essentially, since the Hilbert space H_t contains all of the linear combinations of the past of x(t), it also contains the inverse filters.

Proof. H ∗ H_t ⊆ H_t is evident. To show the converse, let z ∈ H_t. Then z = Σ_{i=1}^{n} (A ∗ x)(t)_i for some sequence of (dense) matrices A(τ) ∈ ℓ₂^{n×n}(ℕ); hence Σ_{i=1}^{n} (H^{−1} ∗ A ∗ x)(t)_i ∈ H_t, and further Σ_{i=1}^{n} (H ∗ H^{−1} ∗ A ∗ x)(t)_i = Σ_{i=1}^{n} (A ∗ x)(t)_i = z ∈ H ∗ H_t.

Remark 10. The utility of this result is in justifying Granger-causal analysis for systems which are only observable after filtering, and to justify any filtering operations as preprocessing steps.


2.3 Granger Causality Graphs

The graphical structure of Granger Causality is obvious from the definition. We will define a "Granger-causality Graph" G = (E, V) as follows: take each component process x_i(t) of x(t) to be a vertex in V. The edge (j, i) is present in E if x_j −→_X x_i. It is generally the case that we have measurements of x(t), and hence V is known. Determining the edge set E, however, requires us to detect Granger-causal relations between the component processes of x(t). Since V is fixed, it is natural to use an adjacency matrix representation of the graph G, which will be denoted G. Estimating this adjacency matrix is our primary interest.

Remark 11. The inferred causality graph is of course dependent upon the particular modeling space that is chosen. That being said, "Granger-causality Graph" almost always refers to the graph induced by definition 2.12 using the canonical modeling space H. The relationships between graphs induced by different modeling spaces are a question for future research. A particularly interesting avenue is to investigate when causality with respect to the simplest space H is sufficient to recover the graph induced by a more sophisticated modeling space.

2.3.1 Pairwise Granger-Causality

Given $x \in L_2^n$ we will say that $x_j$ pairwise Granger-causes $x_i$ if $x_j \xrightarrow{(x_j, x_i)} x_i$. An interesting question to ask is about the relationship between pairwise Granger-causality and Granger-causality with respect to all the observed information. Unfortunately, pairwise Granger-causality is not sufficient to conclude Granger-causality, and it is not necessary either. The importance of the caveat "with respect to $X$" in definition 2.12 was discussed by Granger in [9]. Suppose there is an underlying Granger-causality graph $G = (E, V)$. If we construct an edge set $E_p$ via pairwise tests, then neither $E_p \subseteq E$ (sufficiency) nor $E \subseteq E_p$ (necessity) need hold. The insufficiency of pairwise testing can be seen from figure 2.3.

Figure 2.3: Pairwise Granger-Causality is not Sufficient

The system described by this graph involves 5 wide sense stationary $L_2$ processes $w, x, y, \zeta, \eta$, in which we suppose $\zeta, \eta$ are white, uncorrelated, and unobserved driving processes, and $w, x, y$ are observed. We use $z^{-1}$ to indicate a one step lag. The

equations from which this graph is derived are

$$\begin{aligned} x(t) &= b_{xy}\, y(t-1) + \eta(t), \\ y(t) &= b_{yw}\, w(t-1) + \zeta(t). \end{aligned} \tag{2.15}$$
The red dashed edge from $w$ to $x$ in figure 2.3 would be incorrectly inferred via pairwise testing. That is

$$w \xrightarrow{(w,x)} x, \qquad \text{but} \qquad w \not\xrightarrow{(w,x,y)} x.$$
The issue with pairwise testing in this case is that $w$ has an indirect effect on $x$ which is detected by pairwise testing, but is irrelevant when $y$ is taken into account. Similarly, the non-necessity of pairwise testing can be illustrated as in 2.4.

Figure 2.4: Pairwise Granger-Causality is not Necessary

In this case we have:

$$\begin{aligned} x(t) &= \eta(t), \\ w(t) &= \zeta(t-1), \\ y(t) &= \zeta(t) + \eta(t-1). \end{aligned} \tag{2.16}$$
The red dashed line in 2.4 would not be present in a pairwise test with respect to $(x, w)$, but it is present with respect to $(x, y, w)$, since $w$ can be described as $w(t) = y(t-1) - x(t-2)$ indirectly through the noise processes $\eta$ and $\zeta$.
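As a concrete sanity check of this example, the following sketch (not from the thesis; the simulation setup and variable names are illustrative assumptions) simulates equation 2.16 with Gaussian white noise and verifies the identity $w(t) = y(t-1) - x(t-2)$ numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
eta = rng.standard_normal(T)    # white noise eta(t)
zeta = rng.standard_normal(T)   # white noise zeta(t)

# Equation (2.16): x(t) = eta(t), w(t) = zeta(t-1), y(t) = zeta(t) + eta(t-1)
x = eta.copy()
w = np.zeros(T); w[1:] = zeta[:-1]
y = np.zeros(T); y[0] = zeta[0]; y[1:] = zeta[1:] + eta[:-1]

# Check the identity w(t) = y(t-1) - x(t-2) for t >= 2
t = np.arange(2, T)
print(np.allclose(w[t], y[t - 1] - x[t - 2]))  # True
```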

Remark 12. There are surely additional assumptions we may be able to impose on our system to rule out the example of figure 2.4, and these pairwise graphs may still be of use. We have not pursued this avenue in great detail, and here focus on joint inference on the entire graph.

2.4 Time Series Models

In this section we review the Wold decomposition, a critically important theorem in the analysis of $L_2$ stochastic processes, which serves as the motivation and justification for the use of autoregressive modeling in Granger-causal analysis. We recall the key aspects of the theory [24], [31]. Intuitively, the Wold decomposition theorem tells us that "almost" every stationary process in $L_2$ admits a representation as causally filtered white noise, and that "almost" each of these processes in turn admits an autoregressive representation. To elucidate exactly which processes admit this representation, we require the following notion of a purely nondeterministic (p.n.d.) process:

Definition 2.13 (Purely Nondeterministic Process). Let $x(t)$ be an $n$-dimensional w.s.s. $L_2$ process, and let $H_t$ be the Hilbert space generated by the past of $x$ as in equation (2.5). If there exists a $k$-dimensional ($k \le n$) $L_2$ normalized white noise process$^5$ which is jointly stationary with $x$, generating the Hilbert space $W_t$, such that

$$H_t = W_t \quad \forall t \in \mathbb{Z}, \tag{2.17}$$
then $x(t)$ is a purely nondeterministic process.

Further, we can also define purely deterministic (p.d.) processes

Definition 2.14 (Purely Deterministic Process). Suppose x(t) is as in definition 2.13. Then x(t) is purely deterministic if

$$H_t = H_\infty \quad \forall t \in \mathbb{Z}. \tag{2.18}$$
The idea of a deterministic process is that it is perfectly predictable given only a single observation. For example, a pure sinusoid whose phase is determined by a single random variable $\Theta$ is a classic example of a purely deterministic process. We can now state the Wold decomposition theorem:

Theorem 2.7 (Wold Decomposition). Let y(t) be an n-dimensional w.s.s. L2 pro- cess. Then there exists a p.n.d. process x(t) and a p.d. process z(t) such that

$$y(t) = x(t) + z(t).$$
Further, $x(t)$ admits the representation:

$$x(t) = \sum_{\tau=0}^{\infty} M(\tau)\, w(t-\tau), \tag{2.19}$$
where $w(t)$ is a $k$-dimensional$^6$ ($k \le n$) normalized white noise process and $M(\tau) \in \mathbb{R}^{n \times k}$ for all $\tau$, with $M(\tau)_{ij} \in \ell_2(\mathbb{Z})$. It is possible to specify $x(t)$ and $z(t)$ in terms of Hilbert space projections, but this is not necessary for our purposes. The point of this theorem is that a very wide variety of stochastic processes can be written as a sum of some "trend" $z(t)$ (which we would subtract) and a linearly filtered white noise $w(t)$ called the "innovations" of the process. Moreover, we will usually assume that $k = n$, as this simply eliminates the uninteresting case where $x(t)$ is a.s. confined to a strict subspace of $\mathbb{R}^n$.

$^5$ $x$ and $w$ are jointly stationary if $(x(t)\ w(t))$ is stationary.
$^6$ $k$ is the dimension of the Hilbert space spanned by $y$.


2.4.1 Inverting 2.19

The Wold decomposition is an extremely powerful theorem, but equation 2.19 in its current state is not very useful for our applications, since we cannot directly observe the noise $w(t)$ (although it can be calculated via linear projections). In this section, we take note of the conditions for inverting 2.19 into an autoregressive form. Consider the spectrum $S_x(\lambda)$ of the WSS process $x(t)$, where $\lambda \in [-\pi, \pi)$. If there is some constant $c > 0$ such that, for almost every $\lambda$,

$$c^{-1} I \preceq S_x(\lambda) \preceq cI, \tag{2.20}$$
then equation (2.19) is invertible and hence we get the autoregressive representation:

$$x(t) = \sum_{\tau=1}^{\infty} B(\tau)\, x(t-\tau) + e(t), \tag{2.21}$$
where $e(t)$ is a temporally uncorrelated, though not necessarily white, sequence which is also uncorrelated with $x(t-\tau)$ for any $\tau \ge 1$. See [31] p. 78.

2.4.2 Granger Causality in Autoregressive Models

Suppose we have a wide sense stationary process $x \in L_2^n$. Suppose further that $x(t)$ is generated by the autoregressive model 2.21 in which $e(t)$ is a zero mean serially uncorrelated sequence, which is also uncorrelated with $\{x(t-\tau) \mid \tau \ge 1\}$, having autocovariance sequence

$$E[e(t)e(t-\tau)^T] \triangleq R_e(\tau) = \delta(\tau)\, R_e. \tag{2.22}$$
The Hilbert space projections in this model are trivial:

Proposition 2.3 (Hilbert Space Projections for AR Models). Let $x \in L_2^n$ be a wide sense stationary stochastic process generated by the autoregressive model 2.21. Let $H_t$ be the Hilbert space generated by the past of $x(t)$ as in 2.5. Then $\hat{x}_i(t) = \sum_{\tau=1}^{\infty}\sum_{k=1}^{n} B_{ik}^{(\tau)} x_k(t-\tau)$.

Proof. Fix some $t$ and suppose $\hat{x}_i(t) = \sum_{\tau=1}^{\infty}\sum_{k=1}^{n} \tilde{B}_{ik}^{(\tau)} x_k(t-\tau)$ for some square summable sequence $\tilde{B}_{ik}$. Then

$$E\Big[\Big(x_i(t) - \sum_{\tau=1}^{\infty}\sum_{k=1}^{n} \tilde{B}_{ik}^{(\tau)} x_k(t-\tau)\Big)^2\Big] = E[e_i(t)^2] + E\Big[\Big(\sum_{\tau=1}^{\infty}\sum_{k=1}^{n} \big(B_{ik}^{(\tau)} - \tilde{B}_{ik}^{(\tau)}\big) x_k(t-\tau)\Big)^2\Big],$$

since $e_i(t)$ is uncorrelated with $x(t-\tau)$. This error is minimized by the choice $\tilde{B}_{ik}^{(\tau)} = B_{ik}^{(\tau)}$, and the proposition follows by the uniqueness of Hilbert space projections.

Granger causality admits a simple and intuitive characterization in autoregressive models. The following proposition simply tells us that $x_j(t)$ Granger-causes $x_i(t)$ if the LSI filter (see 2.24) $\tilde{B}_{ij}(z)$ is non-zero. We prove the proposition for autoregressive models of infinite order; the finite order case is obviously a specialization thereof.


Proposition 2.4 (Granger-causality for AR models). Suppose $x(t) \in L_2^n$ is a 0-mean stochastic process generated by the infinite autoregressive model 2.21, where the autocovariance function of $e(t)$ is given by 2.22. Let $H_t$ be the Hilbert space generated by the past of $x(t)$ as in 2.5. Then $x_j(t)$ Granger-causes $x_i(t)$ with respect to $H$ if and only if $\exists \tau_0 \ge 1$ such that $B_{ij}^{(\tau_0)} \neq 0$.

Proof. The condition for Granger-causality for 0-mean processes $x_j \xrightarrow{H} x_i$ is given as

$$E|x_i(t) - \hat{x}_i(t)|^2 < E\big|x_i(t) - \hat{E}[x(t) \mid H_t^{-j}]_i\big|^2.$$
Using proposition 2.3 this is equivalent to

$$E\Big|x_i(t) - \sum_{\tau=1}^{\infty}\sum_{k=1}^{n} B_{ik}^{(\tau)} x_k(t-\tau)\Big|^2 < E\Big|x_i(t) - \sum_{\tau=1}^{\infty}\sum_{k \neq j} B_{ik}^{(\tau)} x_k(t-\tau)\Big|^2.$$

If there were no $\tau_0$ such that $B_{ij}^{(\tau_0)} \neq 0$ then the above strict inequality would in fact be an equality, a contradiction. Conversely, since $B_{ik}^{(\tau)}$ provides the best linear estimate of $x_i(t)$ from $x(t)$, if there is some $\tau_0$ such that $B_{ij}^{(\tau_0)} \neq 0$ then the above strict inequality must hold, otherwise $B_{ij}^{(\tau_0)} = 0$ would provide an equivalent or superior prediction, contradicting uniqueness or optimality.

Finally, we see that we can obtain the adjacency matrix $G$ of the Granger-causality graph directly from the coefficient matrices as $G_{ij} = \mathbb{1}(\exists \tau \text{ s.t. } B_{ji}^{(\tau)} \neq 0)$. Alternatively, $G^T_{ij} = \mathbb{1}(\exists \tau \text{ s.t. } B_{ij}^{(\tau)} \neq 0)$, the transpose of $G$, indicates "Granger-caused by" relations.
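As an illustration of this recipe, the following sketch (a hypothetical helper, not part of the thesis; the `(p, n, n)` array layout is an assumption) extracts the adjacency matrix $G$ from the stack of coefficient matrices, following proposition 2.4.

```python
import numpy as np

def granger_adjacency(B, tol=0.0):
    """B has shape (p, n, n) with B[tau-1, i, j] = B_ij^(tau).

    Returns G with G[i, j] = 1 iff B_ji^(tau) != 0 for some tau,
    i.e. the edge (i, j): x_i Granger-causes x_j (proposition 2.4)."""
    caused_by = (np.abs(B) > tol).any(axis=0)  # caused_by[i, j] <=> x_j Granger-causes x_i
    return caused_by.T.astype(int)             # transpose so that rows index the cause
```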

2.5 Finite Autoregressive Models

In this section we develop a few cursory properties of finite AR models which proved important in experimental simulations. Note that the dimensions and the layout of the following matrices are important to keep in mind. The Wold decomposition theorem 2.7 and its inverted autoregressive formulation 2.21 suggest that we model $x(t)$ as an autoregressive system. In order to derive models that are more amenable to fitting with real data, we will restrict the autoregressive order to $p$. Since we know that for "most" $L_2$ processes the series in 2.21 is $\ell_2$ convergent, the $B(\tau)$ matrices have a small effect when $\tau$ is large. So it is reasonable to suppose that many processes in practice can be modeled by finite autoregressions. This is common practice in the literature, and the simple autoregressive model admits a wide array of generalizations. Our model will take the following recursive form, with boundary conditions $x_i(t) = 0\ \forall t \le 0$:

$$x_i(t) = \sum_{j=1}^{n}\sum_{\tau=1}^{p} B_{ij}^{(\tau)} x_j(t-\tau) + e_i(t), \tag{2.23}$$

where $e_i(t)$ is called the $i$-th input, and may be random. And $B_{ij}^{(\tau)} \in \mathbb{R}$ is the $\tau$-lag coefficient from process $j$ to process $i$.


It is natural to place all $n^2$ coefficients corresponding to a particular lag $\tau$ into a matrix:

 (τ) (τ) (τ) B11 B12 ...B1n B(τ) B(τ) ...B(τ) B(τ) =  21 22 2n   . . .. .   . . . .  (τ) (τ) (τ) Bn1 Bn2 ...Bnn If one imagines stacking into the page “depth-wise” the matrices B(τ) and then looking through this stack in the ij position, they will see the column vector denoted by Beij and we may interpret these coefficients as an LSI filter from process j to i with z-transform Beij(z), and place these coefficients into a vector denoted by:

$$\tilde{B}_{ij}(z) = \sum_{\tau=1}^{p} B_{ij}^{(\tau)} z^{-\tau}; \qquad \tilde{B}_{ij} = \begin{bmatrix} B_{ij}^{(1)} \\ \vdots \\ B_{ij}^{(p)} \end{bmatrix} \tag{2.24}$$

The $\tilde{B}_{ij}$ notation is used to remind us that $\tilde{B}_{ij}$ is a column vector rather than a single element, and to later distinguish between the two matrices $B$ and $\tilde{B}$. Define the $n$-dimensional vector processes $x(t)$ and $e(t)$ at time $t$ as
$$x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \qquad e(t) = \begin{bmatrix} e_1(t) \\ \vdots \\ e_n(t) \end{bmatrix}.$$
We can now write out the AR equation 2.23 in a more compact form

$$\begin{aligned} x(t) &= \sum_{\tau=1}^{p} B^{(\tau)} x(t-\tau) + e(t) \\ x(t) &= 0 \quad \forall t \le 0, \end{aligned} \tag{2.25}$$

which we will work with extensively.
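The following sketch (illustrative only; `simulate_var` and its array layout are assumptions, not code from the thesis) simulates the recursion 2.25 with the zero boundary condition, e.g. for generating synthetic test data.

```python
import numpy as np

def simulate_var(B, T, noise_std=1.0, rng=None):
    """Simulate x(t) = sum_tau B[tau-1] @ x(t - tau) + e(t) (eq. 2.25),
    with the boundary condition x(t) = 0 for t <= 0.

    B : array of shape (p, n, n). Returns an array of shape (T, n)."""
    rng = np.random.default_rng(rng)
    p, n, _ = B.shape
    x = np.zeros((T + p, n))                     # first p rows are the zero boundary
    e = noise_std * rng.standard_normal((T, n))
    for t in range(T):
        for tau in range(1, p + 1):
            x[p + t] += B[tau - 1] @ x[p + t - tau]
        x[p + t] += e[t]
    return x[p:]
```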

2.5.1 Stability Consider a first order discrete time AR model

$$x(t) = Ax(t-1) + e(t); \qquad A \in \mathbb{R}^{n \times n}.$$
It is well known that this system is stable if and only if the spectral radius of $A$ is strictly less than unity: $\rho(A) \triangleq |\lambda_{\max}(A)| < 1$. Due to its importance in this thesis, we will derive a generalization to models of order $p$. To this end, define the block companion matrix $C_B \in \mathbb{R}^{np \times np}$
$$C_B = \begin{bmatrix} B^{(1)} & \cdots & B^{(p-1)} & B^{(p)} \\ & I_{n(p-1)} & & 0 \end{bmatrix} \tag{2.26}$$


Then we can write the order-$p$ AR system in the form of a larger first order system:

$$\begin{bmatrix} x(t) \\ x(t-1) \\ \vdots \\ x(t-p+1) \end{bmatrix} = C_B \begin{bmatrix} x(t-1) \\ x(t-2) \\ \vdots \\ x(t-p) \end{bmatrix} + \begin{bmatrix} e(t) \\ 0_n \\ \vdots \\ 0_n \end{bmatrix}$$

Hence we have stability if and only if every eigenvalue of $C_B$ satisfies $|\lambda(C_B)| < 1$. We establish the criterion in terms of $B(z) = \sum_{\tau=1}^{p} B^{(\tau)} z^{-\tau}$ with the following proposition.

Proposition 2.5. λ is an eigenvalue of CB if and only if det(In − B(λ)) = 0

Proof. (necessity) Suppose $\det(I_n - B(\lambda)) = 0$; then $\exists v \in \mathbb{C}^{n} \setminus \{0_n\}$ such that $B(\lambda)v = v$. Hence

$$C_B \begin{bmatrix} \lambda^{-1} v \\ \lambda^{-2} v \\ \vdots \\ \lambda^{-p} v \end{bmatrix} = \begin{bmatrix} B(\lambda) v \\ \lambda^{-1} v \\ \vdots \\ \lambda^{-(p-1)} v \end{bmatrix} = \lambda \begin{bmatrix} \lambda^{-1} v \\ \lambda^{-2} v \\ \vdots \\ \lambda^{-p} v \end{bmatrix},$$

and $\lambda$ is an eigenvalue of $C_B$.
(sufficiency) Suppose $\lambda \in \mathbb{C}$, $v \in \mathbb{C}^{np}$ forms an eigen-pair of $C_B$. Then $C_B v = \lambda v$, and

$$\begin{bmatrix} B^{(1)} v_1 + \cdots + B^{(p)} v_p \\ v_1 \\ \vdots \\ v_{p-1} \end{bmatrix} = \lambda \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix},$$

where each $v_i \in \mathbb{C}^n$. From the lower $n(p-1)$ rows we see that $v_{k+1} = \lambda^{-k} v_1$ (note that $v_1 \neq 0$, since otherwise $v = 0$), and combining this with the equation in the top $n$ rows we obtain $B(\lambda)v_1 = v_1$ and hence $\det(I_n - B(\lambda)) = 0$.

The stability criterion is an immediate corollary.

Corollary 2.1. The autoregressive model of order $p$ defined in equation 2.25 is stable if and only if $\det(I_n - B(z)) \neq 0$ for all $z \in \mathbb{C}$ such that $|z| \ge 1$.
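Corollary 2.1 is equivalent to checking the spectral radius of the block companion matrix 2.26; a minimal sketch (hypothetical helper, assuming the `(p, n, n)` coefficient layout) is:

```python
import numpy as np

def is_stable(B):
    """Check stability of the VAR(p) model via the block companion matrix (2.26).

    B : array of shape (p, n, n). Stable iff the spectral radius of C_B is < 1,
    equivalently det(I - B(z)) != 0 for all |z| >= 1 (corollary 2.1)."""
    p, n, _ = B.shape
    C = np.zeros((n * p, n * p))
    C[:n, :] = np.concatenate(list(B), axis=1)   # top block row [B(1) ... B(p)]
    C[n:, :-n] = np.eye(n * (p - 1))             # shifted identity block
    return np.max(np.abs(np.linalg.eigvals(C))) < 1.0
```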

Chapter 3

Granger Causality - Methods

We will proceed by briefly describing some of the classical methods of statistically testing for Granger-causality in section 3.1. These methods, however, are not well suited when the number of processes under consideration is large, that is, when the condition $T \gg np$ does not hold. We will develop methods based on sparsity inducing regularization and convex optimization, which we will see lead to much more workable techniques for large $n$.

3.1 Classical Methods

The classical methods of inference for Granger-causality were cemented by Geweke in [10] and [11]. In [10], a method for testing Granger-causality between groups of processes is developed. This approach, based on a likelihood ratio test and asymptotic theory, is adequate for pairwise testing between two processes, or groups thereof. This case is not of great importance in this work, but was an important stepping stone in the development of the classical theory. The pairwise methods of [10] are extended in [11] to the case of conditional Granger-causality, that is, Granger-causality between two processes with respect to a group of processes, in the parlance of this thesis. It is suggested to use the periodogram estimate of $E[x(t)x(t-\tau)^T]$ and then to solve the Yule-Walker equations, providing estimates of the coefficients in 2.25, as well as the covariance matrix of the innovations. Lacking a tractable asymptotic approach, Geweke goes on to form approximate confidence intervals for test statistics used to infer Granger-causality.

3.2 The Linear Model

The Ensemble Model   Suppose we have a zero mean, wide-sense stationary stochastic process $x(t) \in L_2^n$ generated by the VAR($p$) model

$$x(t) = \sum_{\tau=1}^{p} B^\star(\tau)\, x(t-\tau) + e(t).$$
The notation $B^\star$ refers to the true parameters of the model. Consider Granger-causality with respect to the modeling space $H_{t,p}$ defined in section 2.2.1.


We will seek to fit this model by calculating $\hat{E}[x(t) \mid H_{t,p} \cap C_t]$, which can be done via solving

$$\underset{B(\tau)\in\mathbb{R}^{n\times n},\ 1\le\tau\le p}{\text{minimize}} \quad E\, J(B(1),\ldots,B(p)), \tag{3.1}$$
where

$$J(B(1),\ldots,B(p)) = \Big\|x(t) - \sum_{\tau=1}^{p} B(\tau)x(t-\tau)\Big\|_2^2 + \lambda\,\Gamma(B(1),\ldots,B(p)) \tag{3.2}$$

is the Lagrangian loss function in which Γ codifies our restriction to the set Ct.

Remark 13. In our formulation of Granger-causality (see chapter 2), this corresponds to restricting $H_{t,p}$ to the subset $C_t = \{\sum_{\tau=1}^{p} B(\tau)x(t-\tau) \mid \Gamma(B) \le \tilde{\lambda}\} \subseteq L_2^n$, where $\tilde{\lambda}$ depends on $\lambda$. Essentially, 3.1 is a Lagrangian reformulation of the restricted problem. The subset $C_t$ is convex whenever $\Gamma$ is convex, and indeed, it is generally extremely difficult to solve 3.1 unless this is the case. The analogous application of proposition 2.4 to this model will give us Granger-causality with respect to $H_{t,p} \cap C_t$, that is, e.g. $x_i(t) \xrightarrow{H_{t,p}\cap C_t} x_j(t)$. If $\Gamma(B^\star(1),\ldots,B^\star(p)) \le r$, then $x_i(t) \xrightarrow{H_{t,p}\cap C_t} x_j(t) \iff x_i(t) \xrightarrow{H_{t,p}} x_j(t)$.

Remark 14. The question should be raised whether or not strong duality holds for the Lagrangian relaxation applied to the calculation of $\hat{E}[x(t)\mid H_{t,p}\cap C_t]$. That is, for any $\tilde{\lambda}$ defining the restriction $C_t$, does there exist a $\lambda$ such that $\hat{x}(t)$, obtained from solving 3.1, is equal to the projection $\hat{E}[x(t)\mid H_{t,p}\cap C_t]$? In all of our cases, $C_t$ is formed by restricting some norm of the coefficient matrices $B$, in which case $B = 0$ is an easy Slater (strictly feasible) point. The question of strong duality deserves a more careful consideration in general, but we do not pursue it further.

Remark 15. The autoregressive representation (Eq. 2.21), combined with the connection to Granger-causality in proposition 2.4, tells us that we can determine every Granger-causal relation amongst $x(t)$ by solving 3.1, and that this procedure is sufficient for any Markov WSS $L_2^n$ process. This is a fairly wide class of processes, although there are certainly processes in practice which do not satisfy these conditions, even after significant preprocessing. Expanding the applicability of these techniques is a topic of ongoing research, and our attempt to generalize the notion of Granger-causality to general convex projections is a small step in this direction.

The best autoregressive model in our setting is given by

$$x^\star(t) = \sum_{\tau=1}^{p} B^\star(\tau)\, x(t-\tau), \tag{3.3}$$
where $B^\star(\tau)$ denote the minimizers of 3.1 with $\lambda = 0$, that is, the optimal parameters for the loss function over the entire ensemble. The Granger-causality graph $G^\star$ is easily inferred from $B^\star(\tau)$ via proposition 2.4.


Compacting the notation, we can rewrite equation 3.3 as

$$x^\star(t) = (B^\star)^T z(t),$$
where $B^\star = [B^\star(1)\ B^\star(2)\ \ldots\ B^\star(p)]^T \in \mathbb{R}^{np\times n}$ vertically$^1$ stacks all of the matrix coefficients and $z(t) = \mathrm{vec}([x(t)\ x(t-1)\ \ldots\ x(t-p+1)]) \in \mathbb{R}^{np}$ vertically lays out the vectors of samples according to their lag. The problem 3.1 becomes

$$\underset{B\in\mathbb{R}^{np\times n}}{\text{minimize}} \quad E\|x(t) - B^T z(t)\|_2^2 + \Gamma(B). \tag{3.4}$$

The Population Model   In practice, we will draw $T + p > p$ samples$^2$ $x(-p+1), x(-p+2), \ldots, x(T)$ from $x(t)$. The $t$-th sample of process $i$ will be written as $x_i(t)$; these are grouped into a column vector $x(t) = [x_1(t), \ldots, x_n(t)]^T$ for $t = -p+1, \ldots, T$. In this setting, it is necessary to replace ensemble averages with time averages and finite approximations. That is, equation 3.4 becomes

$$\underset{B\in\mathbb{R}^{np\times n}}{\text{minimize}} \quad \frac{1}{2T}\sum_{t=1}^{T}\|x(t) - B^T z(t)\|_2^2 + \Gamma(B). \tag{3.5}$$

The parameters $\hat{B}$ which solve$^3$ the minimization problem 3.5 are themselves random; they "inherit" randomness from $x(t)$. If we are using all of $H_{t,p}$ as our modeling space (in which case $\Gamma = 0$), then unless $T \gg np$ the variance of $\hat{B}$ may be extremely large, that is, small changes in the samples of $x(t)$ can lead to wildly different sets of parameters. In the context of traditional regression problems, a regularization function $\Gamma(B)$ is added to the objective, which prevents $B$ from being "too large". Our restriction of the modeling space with $C_t$, which is equivalent to adding a regularization function, accomplishes the same goal, but we have made the connection to Granger-causality more explicit.

Remark 16. The functional form of $\Gamma$ can be specified depending on what type of structure we want the resulting minimizer to take. For our purposes, $\Gamma$ will always be a norm, but this need not be the case in general; however, if $\Gamma$ is not at least convex, 3.5 is likely to be intractable. Furthermore, we will generally write $\Gamma$ as an abstract regularizer, the particular form of which depends on the context. We will be more specific about $\Gamma$ in section 3.4.

Remark 17. A significant amount of research has gone into the study of these types of problems, see e.g. [32]. Indeed, least squares problems have been studied since the time of Gauss. More generally, the formulation 3.5 is a type of "M-estimator"; see e.g. [33].

$^1$ We follow this layout convention so as to later be more consistent with the literature on linear regression, which usually writes $X\beta$ for a data matrix $X$ and coefficients $\beta$.
$^2$ To avoid issues with having fewer samples than the model order $p$, we are here effectively dictating that we draw at least $T \ge 1$ additional samples.
$^3$ It is a question whether or not there is some $\hat{B}$ which actually attains the minimum of 3.5. We address this issue in 3.4.2.


We will continue to abbreviate our notation. Define the T ×n matrix of “future” data Y , and the T × np matrix of “past” data Z:

$$Y = \begin{bmatrix} x(T)^T \\ x(T-1)^T \\ \vdots \\ x(1)^T \end{bmatrix}, \qquad Z = \begin{bmatrix} x(T-1)^T & x(T-2)^T & \cdots & x(T-p)^T \\ x(T-2)^T & x(T-3)^T & \cdots & x(T-p-1)^T \\ \vdots & & & \vdots \\ x(0)^T & x(-1)^T & \cdots & x(-p+1)^T \end{bmatrix}, \tag{3.6}$$

These matrices finally allow us to write 3.3 and 3.5 in a standard form:

Definition 3.1 (Standard Form). With the arrangement of data given in equation 3.6, we can formulate the problem 3.1 in what we will refer to as standard form, as it is the most natural arrangement of the data:

$$\underset{B\in\mathbb{R}^{np\times n}}{\text{minimize}} \quad \frac{1}{2T}\|Y - ZB\|_F^2 + \lambda\,\Gamma(B). \tag{3.7}$$
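For concreteness, a small sketch of assembling $Y$ and $Z$ as in equation 3.6 from a single array of samples (the function name and time-ordering convention are illustrative assumptions, not the thesis code):

```python
import numpy as np

def standard_form_data(x, p):
    """Arrange samples into the 'future' and 'past' matrices of eq. (3.6).

    x : array of shape (T + p, n) holding x(-p+1), ..., x(T) in time order.
    Returns Y (T x n) and Z (T x n*p)."""
    n = x.shape[1]
    T = x.shape[0] - p
    Y = x[p:][::-1]                                   # rows x(T)^T, ..., x(1)^T
    Z = np.zeros((T, n * p))
    for tau in range(1, p + 1):                       # lag-tau block of columns
        Z[:, (tau - 1) * n : tau * n] = x[p - tau : p - tau + T][::-1]
    return Y, Z
```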

We can also write the model 3.3 in a slightly different manner, which emphasizes the role of each edge-wise filter $\tilde{B}_{ij}$, as follows:

$$\begin{aligned} \hat{x}_i(t) &= \sum_{\tau=1}^{p}\sum_{j=1}^{n} B_{ij}^{(\tau)} x_j(t-\tau) = \sum_{j=1}^{n} \tilde{B}_{ij}^T \zeta_j(t-1) = \mathcal{Z}(t-1)\tilde{B}_i \\ \Longrightarrow\ \hat{x}(t)^T &= \mathcal{Z}(t-1)\tilde{B}, \end{aligned}$$

where $\zeta_j(t) = [x_j(t)\ x_j(t-1)\ \ldots\ x_j(t-p+1)]^T$ ($p \times 1$) groups lags of process $j$, $\mathcal{Z}(t) = [\zeta_1(t)^T\ \ldots\ \zeta_n(t)^T]$ ($1 \times np$) horizontally stacks each of these groups, and $\tilde{B}_i$ ($np \times 1$) and $\tilde{B}$ ($np \times n$) are given by (recall also 2.24):

  Bei1 Bei2 h i B =   , B = (3.8) ei  .  e Be1 Be2 ... Ben,  .  Bein which organizes all of the filter coefficients. The tilde on top of the B matrix indi- cates this rearrangement. Using these arrangements of data, we have the “alternate form” version of 3.1.


Definition 3.2 (Alternate Form). Given the matrix of data $Y$ from 3.6, and defining the $T \times np$ matrix $\mathcal{Z}$

Z(T − 1) Z(T − 2)   Z =  .  (3.9)  .  Z(0) we have the optimization problem

$$\underset{\tilde{B}\in\mathbb{R}^{np\times n}}{\text{minimize}} \quad \frac{1}{2T}\|Y - \mathcal{Z}\tilde{B}\|_F^2 + \lambda\,\Gamma(\tilde{B}). \tag{3.10}$$
Remark 18. The differences between definition 3.1 and 3.2 are simply trivial rearrangements, but these rearrangements have a significant impact on the interpretations of the matrices $B$ and $\tilde{B}$, as well as on the functional form of gradients etc. in the following.
Remark 19. The $\Gamma$ function appearing in equation 3.7 as well as in 3.10 need not be the same function. At this point, it stands in only as a generic "placeholder" for a regularization function, but we will use the same symbol to refer to a particular function after definition 3.3.

3.3 Classical Approaches to the Linear Model

3.3.1 Ordinary Least Squares

The ordinary least squares (OLS) solution, corresponding to $\Gamma(B) = 0$ in equation (3.7), dates back to the beginning of the 19th century. The method is commonly attributed to Gauss, although Legendre was the first to publish it [34]. In any case, the solution is well known, assuming that $Z$ is full rank:

$$B_{OLS} = \Big(\frac{1}{T} Z^T Z\Big)^{-1} \frac{1}{T} Z^T Y \tag{3.11}$$
$$= \Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \frac{1}{T}\sum_{t=1}^{T} z(t)x(t)^T \tag{3.12}$$
Remark 20. It is important to point out that when writing down the ordinary least squares solution we are really viewing $x(t)$ as a deterministic sequence. In some sense this is perfectly correct, since we have observed the sequence and seek to find a model that explains it. On the other hand, the advantage of viewing $x(t)$ as a stochastic process is that we may attempt to design a model that makes sense for the entire ensemble of possible realizations of $x(t)$. If we suppose that the noise sequence $e(t)$ in 2.21 is not merely white, but is in fact Gaussian, then the maximum likelihood estimate of $B$ corresponds exactly to the OLS solution. Without the Gaussian supposition, the stochastic models rapidly become intractable. Alternatively, if we restrict ourselves to the use of second order statistics, it is possible to specify conditions under which equation 3.11 will converge to the Linear Minimum Mean Square Error estimator $B_{LMMSE} = \Sigma_z^{-1}\Sigma_{zx}$ as $T \to \infty$. Hence, working in the context of OLS is a reasonable approach.
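A minimal sketch of computing $B_{OLS}$ (assuming the $Y$, $Z$ layout of equation 3.6; using a least-squares solver rather than the explicit inverse is a numerical choice, not part of the text):

```python
import numpy as np

def ols_estimate(Y, Z):
    """Ordinary least squares fit B_OLS = (Z^T Z)^{-1} Z^T Y  (eq. 3.11).

    Solved with lstsq rather than an explicit inverse for numerical stability."""
    B_ols, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return B_ols                                   # shape (n*p, n)
```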


Remark 21. It is easy to check that these matrices are all correctly specified by noting the orthogonality condition $(Y - ZB) \perp Z \implies B = \Sigma_z^{-1}\Sigma_{zx}$, and then verifying that the dimensions line up.

The following results are vector generalizations of standard results for OLS in the scalar case. Asymptotic hypothesis testing corresponds essentially to the “classical” approach to Granger-causality. In order to obtain an asymptotic distribution for least squares, we need a substantial number of nontrivial conditions to be met.

OLS Asymptotics Conditions

1. Each process is WSS, 0 mean, and in $L_2(\mathbb{Z})$. (L2 Processes)

2. $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T = \Sigma_z$, in probability or a.s. (LLN 1)

3. $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x(t)z(t)^T = \Sigma_{xz}$, in probability or a.s. (LLN 2)

4. $\exists B, e(t)$ s.t. $x(t) = Bz(t) + e(t)$ and $E[e(t)z(t)^T] = 0$. (Correct Model)

5. $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} e(t)z(t)^T = 0$. (LLN 3)

6. $\exists R$ s.t. $R^{-1/2}\lim_{T\to\infty}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} \overrightarrow{e(t)z(t)^T} \sim \mathcal{N}(0, I_{n^2p})$. (Residual CLT)

7. $R(\tau) = E\big[\overrightarrow{e(t)z(t)^T}\,\overrightarrow{e(t-\tau)z(t-\tau)^T}^{\,T}\big] = \begin{cases} 0, & \tau > 0 \\ R, & \tau = 0 \end{cases}$ (Consistent $R_T$)

8. $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} \overrightarrow{e(t)z(t)^T}\,\overrightarrow{e(t)z(t)^T}^{\,T} = R$. (LLN 4)

9. $E\big[\overrightarrow{z(t)z(t)^T}\,\overrightarrow{z(t)z(t)^T}^{\,T}\big] < \infty$. (Bounded 4th Moments)

We use the arrow notation as a shorthand for vec, i.e. $\vec{x} \triangleq \mathrm{vec}(x)$. The conditions (LLN 1), (LLN 2), (LLN 3), and (LLN 4) are standard laws of large numbers for $\Sigma_z \triangleq E[z(t)z(t)^T]$, $\Sigma_{xz} \triangleq E[x(t)z(t)^T]$, $E[e(t)z(t)^T]$, and $R$. The condition (Correct Model) is the most difficult to verify, or the most dishonest to simply assume. The intuition is that our data must truly come from an autoregressive model. This condition is violated, for example, if the data generating process is nonlinear, time varying, has a moving average component, or has a model order other than $p$. Tests exist for attempting to check these conditions in practice [27], [35]. The condition (Residual CLT) is a central limit theorem for the process $e(t)z(t)^T$, where $R \triangleq E\big[\overrightarrow{e(t)z(t)^T}\,\overrightarrow{e(t)z(t)^T}^{\,T}\big]$. Finally, the conditions (Consistent $R_T$) and (Bounded 4th Moments) are necessary to obtain consistent estimates of the $R$ matrix in (Residual CLT). The supposition that $e(t)$ is a martingale difference sequence with respect to $z(t)$ and $x(t-1)$ is sufficient for (Consistent $R_T$).


OLS Asymptotics

Suppose throughout that (L2 Processes) holds, and that anything that needs to be inverted can be. Now,

$$\begin{aligned} B_{OLS} &= \Big(\frac{1}{T}\sum_{t=1}^{T} x(t)z(t)^T\Big)\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \overset{(a)}{\longrightarrow} B_{LMMSE} \\ &\overset{(b)}{=} \Big(\frac{1}{T}\sum_{t=1}^{T} \big(Bz(t)+\epsilon(t)\big)z(t)^T\Big)\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \\ &\overset{(c)}{=} B + \Big(\frac{1}{T}\sum_{t=1}^{T} e(t)z(t)^T\Big)\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \\ &\overset{(d)}{\Longrightarrow}\ E[B_{OLS}] = E[B]. \end{aligned}$$

Here (a) follows given (LLN 1), (LLN 2), and an application of Slutsky's theorem. In equality (b), $\epsilon(t)$ is the error between $x(t)$ and $Bz(t)$, and given (Correct Model) we have $\epsilon(t) = e(t)$, which is assumed for (c) and allows us to conclude (d) since $E[e(t)z(t)^T] = 0$. The consistency of $B_{OLS}$ to some "true" underlying $B$ is critical for hypothesis testing. Continuing the calculations,

$$\begin{aligned} B_{OLS} - B &= \Big(\frac{1}{T}\sum_{t=1}^{T} e(t)z(t)^T\Big)\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \\ \overset{(a)}{\Longrightarrow}\ \sqrt{T}\,(\vec{B}_{OLS} - \vec{B}) &= \Big[\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big)^{-1} \otimes I_n\Big] R^{1/2}\, \frac{1}{\sqrt{T}}\, R^{-1/2} \sum_{t=1}^{T} \overrightarrow{e(t)z(t)^T} \\ \overset{(b)}{\Longrightarrow}\ \sqrt{T}\,\big[\Sigma_z \otimes I_n\big] R^{-1/2}\, (\vec{B}_{OLS} - \vec{B}) &\to \mathcal{N}(0, I_{n^2p}). \end{aligned} \tag{3.13}$$

Here (b) is conditional upon (Residual CLT) and we have applied the identity $\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$ in (a). Since we do not have direct access to $\Sigma_z$, it must be estimated using (LLN 1):

$$\hat{\Sigma}_z = \frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T.$$

The covariance matrix $R$ is also unavailable to us, and it is rather more difficult to estimate. We will employ estimates of $e(t)$, denoted $\hat{e}(t)$, to do so. Let
$$\hat{e}(t) = x(t) - \hat{x}(t) = e(t) - (B_{OLS} - B)z(t).$$

Then,


$$\begin{aligned} \hat{R}_T &= \frac{1}{T}\sum_{t=1}^{T} \overrightarrow{\hat{e}(t)z(t)^T}\,\overrightarrow{\hat{e}(t)z(t)^T}^{\,T} \\ &= \frac{1}{T}\sum_{t=1}^{T} \Big[\overrightarrow{e(t)z(t)^T}\,\overrightarrow{e(t)z(t)^T}^{\,T} - \overrightarrow{e(t)z(t)^T}\,\overrightarrow{(B_{OLS}-B)z(t)z(t)^T}^{\,T} \\ &\qquad\qquad - \overrightarrow{(B_{OLS}-B)z(t)z(t)^T}\,\overrightarrow{e(t)z(t)^T}^{\,T} + \overrightarrow{(B_{OLS}-B)z(t)z(t)^T}\,\overrightarrow{(B_{OLS}-B)z(t)z(t)^T}^{\,T}\Big] \\ &\overset{(a)}{\longrightarrow} R. \end{aligned}$$
We obtain the limit (a) by applying (LLN 4) to the first term (which converges to $R$), (LLN 3), (LLN 4), and Slutsky's theorem to the middle two terms (which converge to 0), and finally (Bounded 4th Moments) to the last term, which hence also goes to 0. Finally, substituting this estimate in for $R$ in equation 3.13 we obtain:

$$\sqrt{T}\,\Big[\Big(\frac{1}{T}\sum_{t=1}^{T} z(t)z(t)^T\Big) \otimes I_n\Big]\Big[\frac{1}{T}\sum_{t=1}^{T} \overrightarrow{\hat{e}(t)z(t)^T}\,\overrightarrow{\hat{e}(t)z(t)^T}^{\,T}\Big]^{-1/2}(\vec{B}_{OLS} - \vec{B}) \to \mathcal{N}(0, I_{n^2p}) \quad \text{as } T \to \infty. \tag{3.14}$$
From this asymptotic distribution, it is possible to form hypothesis tests for the Granger-causality adjacency matrix $G$. However, two huge issues with this approach are immediately apparent:

1. The estimate $B_{OLS}$ has very high variance unless $T \gg np$ (it is not stable).

2. The necessary conditions are extensive.

This straightforward approach is not at all appropriate for estimating causality graphs with a large number of nodes.

Tikhonov Regularization

The standard approach to deal with the issue of stability for $B_{OLS}$ is referred to as either Tikhonov regularization or ridge regression (we abbreviate as "LST" for "least squares Tikhonov"). In this approach we use the squared 2-norm as a regularizer in 3.5, that is, use $\Gamma(B) = \|B\|_F^2$. This discourages the $B$ matrix from being "too large", and again the solution is well known:

$$B_{LST}^{\lambda} = (Z^T Z + \lambda I)^{-1} Z^T Y.$$
While this modification enables us to make stable estimates of model parameters from observed data, the estimate is biased: $E[B_{LST}^{\lambda}] \neq B$ (unless $\lambda = 0$), and hence a lot of standard hypothesis testing machinery is invalid. Tikhonov regularization is a highly applicable technique for regression problems, in particular when the end goal is prediction. In the context of Granger-causality, however, since $B_{LST}^{\lambda}$ is entirely dense, an additional step is required to choose how to infer the presence or absence of edges in the Granger-causality graph. We next explore some alternatives, in which the inferred model is naturally sparse.
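A one-line sketch of the ridge estimate (hypothetical helper; the explicit solve mirrors the closed form above):

```python
import numpy as np

def ridge_estimate(Y, Z, lam):
    """Tikhonov / ridge estimate B_LST = (Z^T Z + lam*I)^{-1} Z^T Y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ Y)
```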


3.3.2 The LASSO

Take an example of economic forecasting. We may be interested in, for example, making predictions about the price of corn at some time in the future given data about interest rates, employment, the price of oil, sunspot activity, and any number of other features. If we are so industrious as to collect a great number of features, say, $n$ of them, then it is almost certain that some of these features will have no value whatsoever for the task of predicting corn prices. In this case, it would be naturally desirable to narrow down the number of features we use to the ones that enable us to make the most accurate predictions, or at least to provide a parsimonious model. This is the problem of "best subset selection". A rather straightforward approach to the best subset selection problem is to fit all of the models that are possible given the features we observe, and then use some criterion to choose which one is best. If we apply some basic combinatorial reasoning, it becomes clear that there are in fact $2^n$ such models, and even for a modestly sized $n$, to attempt to examine all of them would rapidly exhaust the patience of even the most diligent of economists.

Example 6 (Best Subset Selection). In our context, we could attempt to formulate the best subset selection problem by restricting our modeling space to $H_{t,p} \cap C_{t,r}$, where
$$C_{t,r} = \Big\{\sum_{\tau=1}^{p} B(\tau)x(t-\tau) \ \Big|\ \sum_{\tau=1}^{p} \gamma_0(B(\tau)) \le r\Big\},$$
and $\gamma_0(B)$ is the number of nonzero coefficients of the matrix $B$. The corresponding (Lagrangian) formulation uses $\Gamma(B) = \gamma_0(B)$, which effectively codifies our desire for more or less sparse coefficient matrices. Unfortunately, $C_{t,r}$ is not convex, and best subset selection is almost entirely intractable for even a modest number of parameters.

Sparsity inducing regularization is an approximation to the best subset selection problem for fitting regression models in which we make the a priori supposition that many of the entries in the coefficient matrices are 0. It is known that $\Gamma_{LASSO}(B) \triangleq \|B\|_1 = \sum_{i,j,\tau} |B_{ij}^{(\tau)}|$ results in the minima of the program having many entries exactly equal to 0, as seen in the following example. This technique is referred to as LASSO (Least Absolute Shrinkage and Selection Operator) regression, and has been used at least since [36]. Related theoretical properties of sparsity were studied in the landmark paper of Candès [37]. See also the recent book [38].

Example 7 (LASSO). Consider an ordinary regression problem where we are given a set of input/output pairs (not necessarily time series) $\{(y(t), x(t)) \in \mathbb{R}\times\mathbb{R}^n,\ t = 1, 2, \ldots, T\}$, and we want to fit the model $\hat{y} = x^T\beta$. Let $y$ and $X$ be the natural arrangements of the data so that the $i$-th row of $X$ corresponds to the sample $x(i)$. If $X$ is full rank, and $T \ge n$, the OLS approach is to use $\beta_{OLS} = (X^TX)^{-1}X^Ty$. The LASSO problem is to instead solve the convex program
$$\underset{\beta\in\mathbb{R}^n}{\text{minimize}} \quad \frac{1}{2T}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1, \tag{3.15}$$
using the 1-norm regularizer $\|\beta\|_1 = \sum_{i=1}^{n}|\beta_i|$. If we consider the function $\gamma_q(\beta) = \sum_{i=1}^{n}|\beta_i|^q$, the smallest value of $q$ such that $\gamma_q$ is convex is $q = 1$, in which


case $\gamma_1(\beta) = \|\beta\|_1$. From this perspective, the LASSO problem 3.15 is a convex relaxation of the best subset selection problem. Like best subset selection, it can be shown that we obtain a sparse solution from solving 3.15; many of the entries of $\hat{\beta}$ are 0. Furthermore, this convex program still admits a unique solution (under mild conditions) even when $n \gg T$, a case in which the OLS solution does not even exist. A particular and illustrative special case is easy to analyze, as seen in the next example.

Example 8 (Orthogonal Design LASSO ([38])). Consider the setting of the previous example. If the design matrix $X$ is orthogonal and normalized such that $\frac{1}{T}X^TX = I$, then we can obtain an illustrative closed form solution. We wish to minimize the objective function $J(\beta) = \frac{1}{2T}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. To this end, consider the subdifferential (see definition 3.4) of $J$:
$$\partial J(\beta) = -\frac{1}{T}X^T(y - X\beta) + \lambda\,\partial\|\beta\|_1. \tag{3.16}$$
The vector $\hat{\beta}$ is a minimizer of $J$ if and only if $0 \in \partial J(\hat{\beta}) \iff \frac{1}{T}X^Ty - \hat{\beta} \in \lambda\,\partial\|\hat{\beta}\|_1$. If we then apply the normalization $\frac{1}{T}X^TX = I$ and consider the $j$-th component individually, this is equivalent to:

$$\frac{1}{T}x_j^Ty - \hat{\beta}_j \in \lambda\,\partial|\hat{\beta}_j|, \quad \forall\, 1 \le j \le n,$$
where $\partial|\hat{\beta}_j|$ is given in example 9. Hence, we have

$$\hat{\beta}_j = 0 \iff \frac{1}{T}x_j^Ty \in [-\lambda, \lambda] \iff \Big|\frac{1}{T}x_j^Ty\Big| \le \lambda.$$
And, in the case where $\hat{\beta}_j \neq 0$, $\hat{\beta}_j = \frac{1}{T}x_j^Ty - \lambda\,\mathrm{sgn}(\hat{\beta}_j)$. Reasoning by cases for $\frac{1}{T}x_j^Ty > \lambda$ and $\frac{1}{T}x_j^Ty < -\lambda$ we obtain:

$$\hat{\beta}_j = \begin{cases} \frac{1}{T}x_j^Ty - \lambda, & \frac{1}{T}x_j^Ty > \lambda \\ 0, & \frac{1}{T}x_j^Ty \in [-\lambda, \lambda] \\ \frac{1}{T}x_j^Ty + \lambda, & \frac{1}{T}x_j^Ty < -\lambda, \end{cases}$$
or $\hat{\beta}_j = \mathrm{sgn}(x_j^Ty)\big(|\frac{1}{T}x_j^Ty| - \lambda\big)_+$, which we finally write in vector form as

$$\hat{\beta} = S_\lambda\Big(\frac{1}{T}X^Ty\Big), \tag{3.17}$$
an operation called "soft thresholding".
Remark 22. The above example is illustrative, as in our case the data (design) matrix need not be orthogonal, and we are working with a matrix of variables rather than a vector. This latter difference is essentially trivial: since $\mathrm{vec}(XB) = (I \otimes X)\,\mathrm{vec}(B)$, the LASSO formulation applies equally well to a matrix variable as to a vector. The issue of a non-orthogonal data matrix is more significant, as there is no longer a closed form solution, but fast iterative algorithms can be derived to solve 3.15, and it can be shown that the resulting minimizer is sparse.
Remark 23. The sparsity inducing LASSO is attractive for detecting potential Granger-causal relations amongst a large number of processes when we do not have enough data to reasonably apply asymptotic estimation methods, since the LASSO automatically selects a subset of variables.
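A sketch of the soft thresholding operator of equation 3.17 (illustrative; the function name is an assumption):

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft thresholding S_lam(v) = sign(v) * max(|v| - lam, 0),
    the closed form LASSO solution under the orthogonal design of example 8."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Under the normalization X^T X / T = I, the LASSO solution (3.17) would be
# beta_hat = soft_threshold(X.T @ y / T, lam).
```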


3.4 Depth Wise Grouped LASSO (DWGLASSO)

The group LASSO (GLASSO) is a method for encouraging simultaneous sparsity of groups of variables. This is in contrast to the overall "random" sparsity pattern of the classical LASSO. We use a norm
$$\Gamma_{GLASSO}(B) = \sum_{g \in \mathcal{G}} \|B_g\|_2,$$

where $\mathcal{G}$ specifies a set of groups, and $B_g \in \mathbb{R}^{|g|}$ is a vector constructed from the $B_{ij}$ according to $(i, j) \in g$. This approach was first proposed in [39], and the un-squared 2-norm on the groups induces group-wide sparsity where every entry $B_{ij}$ having $(i, j) \in g$ is simultaneously set to 0. Depending on the grouping structure, this can be simple (if we group only on the rows or columns of $B$) or rather complicated if we want to pick the groups from arbitrary (or at least less-structured) locations in $B$. For application to causality graphs, it makes sense to form the groups from edges in the graph. When we interpret an edge from process $j$ to process $i$ as a filter

$$B_{ij}(z) = \sum_{\tau=1}^{p} B_{ij}^{(\tau)} z^{-\tau},$$
we want to encourage either $B_{ij}(z) = 0$ or $B_{ij}^{(\tau)} \neq 0$, $\tau = 1, 2, \ldots, p$, so that in the former case we conclude that there is no causal relation. Recall the vector $\tilde{B}_{ij} = [B_{ij}^{(1)}, \ldots, B_{ij}^{(p)}]^T$ specifying the coefficients of the edge-wise filters (2.24, 3.8). With this notation we define a particular regularization function. The same regularization function was also employed by [40].

Definition 3.3 (Depth Wise Grouped LASSO Regularizer). We define the DWGLASSO regularizer as follows:

$$\Gamma_{DW}(B) = \sum_{i=1}^{n}\sum_{j=1}^{n} \|\tilde{B}_{ij}\|_2, \tag{3.18}$$

which is a group LASSO penalty function, that is, an $L_1$ norm of the $L_2$ norms. This is sometimes written $\|\tilde{B}\|_{1,2}$.
Remark 24. Since we employ different rearrangements of the autoregressive coefficients, e.g. as $B$ in the standard form 3.7 and as $\tilde{B}$ in the alternate form 3.10, we remark that the definition of $\Gamma_{DW}$ is to always sum the coefficients of the edge-wise filters as in equation (3.18), regardless of whether we write $\Gamma_{DW}(B)$ or $\Gamma_{DW}(\tilde{B})$. The subdifferential (see section 3.4.1) hence needs to be interpreted correctly in context; the matrix $\Phi \in \partial\Gamma_{DW}(B)$ differs from $\Phi \in \partial\Gamma_{DW}(\tilde{B})$ in its layout. Finally, we will tend to simply write $\Gamma$, rather than $\Gamma_{DW}$, hereafter.
Remark 25. It is natural to think of the matrix $B$ as being 3-dimensional with the $B^{(\tau)}$ matrices being stacked depth-wise. The vector $\tilde{B}_{ij}$ can then be viewed as extending into the page at location $(i, j)$ of $B$. For this reason, we refer to this particular grouping structure as "depth-wise" and hence refer to 3.5 with $\Gamma \triangleq \Gamma_{DW}$ as the Depth-Wise Group LASSO (DWGLASSO) problem.
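A sketch of evaluating $\Gamma_{DW}$ on the depth-wise stack described in remark 25 (the `(p, n, n)` array layout and function name are assumptions):

```python
import numpy as np

def gamma_dw(B):
    """Depth-wise group LASSO penalty (3.18): sum over (i, j) of ||B~_ij||_2.

    B : array of shape (p, n, n) stacking the lag matrices depth-wise,
    so B[:, i, j] is the filter B~_ij from process j to process i."""
    return np.sqrt((B ** 2).sum(axis=0)).sum()
```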


3.4.1 Properties of ΓDW

In this section we derive some fundamental properties of ΓDW including the dual norm, the subdifferential, and induced matrix norms. We will work with Γ as a function Γ : Rnp → R where we have n groups of size p, as this elucidates the fundamental properties. Modification to any particular arrangement of matrices in the input space is straightforward, and again, must be viewed in context. Define

$$\Gamma(x) = \sum_{j=1}^{n} \|\bar{x}_j\|_2, \tag{3.19}$$
where $\bar{x}_j = [x_{(j-1)p+1}\ x_{(j-1)p+2}\ \ldots\ x_{jp}]^T \in \mathbb{R}^p$ denotes, for $1 \le j \le n$, the $n$ groups of size $p$ from the vector $x$. It is immediate to verify that $\Gamma$ is a norm on $\mathbb{R}^{np}$. We have $\Gamma \ge 0$, the triangle inequality $\Gamma(x+y) \le \Gamma(x)+\Gamma(y)$, and $\Gamma(\alpha x) = |\alpha|\Gamma(x)$ from the analogous properties of $\|\cdot\|_2$. The point separation property $\Gamma(x) = 0 \iff x = 0$ follows as well, but we stress that this is the case only if all of the $n$ groups are included in $\Gamma$. From here it also follows that $\Gamma$ is convex, as this is a property true of any norm.

Subdifferential In this section, the subdifferential of Γ, denoted by ∂Γ, will be viewed as a set valued mapping from Rnp into the power set thereof. This is a generic arrangement of the matrix entries which serves to elucidate the structure of the subdifferential, recalling the discussion in definition 3.3, the arrangement must be interpreted in context. The subgradient is a generalization of the gradient to non-differentiable functions and has important properties and applications in convex analysis [41] [20].

Definition 3.4 (Subdifferential). For a function f : Rn → R (which need not be differentiable), the subdifferential ∂f : Rn → P(Rn) is defined as

$$\partial f(x) = \{\phi \in \mathbb{R}^n \mid f(y) \ge f(x) + \phi^T(y - x),\ \forall y \in \operatorname{dom} f\}, \tag{3.20}$$
which gives the normal vectors for hyperplanes supporting the epigraph of $f$. Each element of this set is called a subgradient. Essentially, the subdifferential of a function at a point $x$ is defined through the set of all affine minorants at that point. See figure 3.1.

Theorem 3.1 (Subdifferentials of Convex Functions [20]). For any convex function $f : \mathbb{R}^n \to \mathbb{R}$, the subdifferential $\partial f(x)$ is a non-empty, compact, and convex subset of $\mathbb{R}^n$, for every $x \in \operatorname{dom} f$. Furthermore, at points where $f$ is differentiable we have $\partial f(x) = \{\nabla f(x)\}$, that is, the only subgradient is the gradient.

Example 9 (Absolute value). The most natural example of a subgradient is furnished by the absolute value function $f(x) = |x|$, whose subdifferential is easily seen to be
$$\partial|x| = \begin{cases} \{-1\}, & x < 0 \\ [-1, 1], & x = 0 \\ \{1\}, & x > 0 \end{cases}$$


Figure 3.1: A convex function with non-differentiable “kinks”. Examples of subgra- dients at x0 are shown in red.

In general, characterizing the subdifferential of a convex function is very difficult and one must often be satisfied with obtaining just a single subgradient. In our case, however, the non-overlapping group structure makes it relatively easy to obtain a complete characterization of $\partial\Gamma$. To this end, define the finite family of $n$ functions $\eta_j : \mathbb{R}^{np} \to \mathbb{R}$ by $\eta_j(x) = \|\bar{x}_j\|_2$, for $1 \le j \le n$. Note that each $\eta_j$ is a semi-norm, as the point separation property $\eta_j(x) = 0 \iff x = 0$ is not satisfied. Using the linearity property for subdifferentials, $\partial\Gamma(x) = \sum_{j=1}^{n} \partial\eta_j(x)$, we can obtain the subdifferential of $\Gamma$.

Proposition 3.1 (Subdifferential $\partial\Gamma$). We have $\phi \in \partial\Gamma(x)$ if and only if
$$\bar{\phi}_j \in \begin{cases} B_p(0; 1), & \bar{x}_j = 0 \\ \{\bar{x}_j/\|\bar{x}_j\|_2\}, & \text{otherwise} \end{cases}$$
$\forall\, 1 \le j \le n$, where $\phi = [\bar{\phi}_1 \ldots \bar{\phi}_n]^T$. The notation $B_p(c; r)$ denotes the closed Euclidean ball in $\mathbb{R}^p$ centered at $c$ and with radius $r$.

Proof. From the definition, we can see that the subdifferential of ηj can be obtained group-component-wise as

$$\partial\eta_j(x) = [(\partial\eta_j(x))_1 \ \ldots\ (\partial\eta_j(x))_n]^T,$$
where $(\partial\eta_j(x))_i = 0$ whenever $i \neq j$, since any nonzero $y$ that is zero inside group $j$ will have $\eta_j(y) = 0$, and the freedom to choose $y$ in the definition can easily lead to contradicting the inequality in the definition unless $\phi$ is zero outside of group $j$. We then have

$$\begin{aligned} (\partial\eta_j(x))_j &= \{\phi \in \mathbb{R}^p \mid \eta_j(y) \ge \eta_j(x) + \phi^T(y - \bar{x}_j),\ \forall y \in \mathbb{R}^p\} \\ &= \{\phi \in \mathbb{R}^p \mid \|y\|_2 \ge \|\bar{x}_j\|_2 + \phi^T(y - \bar{x}_j),\ \forall y \in \mathbb{R}^p\} \\ &\overset{(a)}{=} \begin{cases} B_p(0; 1), & \bar{x}_j = 0 \\ \left\{\dfrac{\bar{x}_j}{\|\bar{x}_j\|_2}\right\}, & \text{otherwise}, \end{cases} \end{aligned}$$


where the case $\bar{x}_j = 0$ in (a) follows because for any $\phi$ with $\|\phi\|_2 > 1$ we could choose $y = \phi$ and obtain the contradiction $\|\phi\|_2 \ge \|\phi\|_2^2$, while the case for $\bar{x}_j \neq 0$ is obtained by differentiation, since $\|\cdot\|_2$ is differentiable as long as the argument is not zero. Taking the Minkowski sum of the $\partial\eta_j(x)$ gives the result.

A second useful property, which in fact holds for norms in general by Fenchel's inequality, is as follows.

Proposition 3.2. For any $\phi \in \partial\Gamma(x)$ we have $\phi^Tx = \Gamma(x)$.

Proof. We have $\phi^Tx = \sum_{j=1}^{n} \bar{\phi}_j^T\bar{x}_j$. Now, if $\bar{x}_j = 0$ then clearly $\bar{\phi}_j^T\bar{x}_j = 0$. On the other hand, if $\bar{x}_j \neq 0$ then $\bar{\phi}_j = \bar{x}_j/\|\bar{x}_j\|_2$ and hence $\bar{\phi}_j^T\bar{x}_j = \|\bar{x}_j\|_2$. Taking the sum over $j$ completes the proof.

Dual norm $\Gamma_\star$ of $\Gamma$   The dual norm is an important concept in analysis which we will draw on later.

Definition 3.5 (Dual norm). The dual norm $\|\cdot\|_\star$ of a norm $\|\cdot\|$ is given by

$$\|y\|_\star = \sup_{\|z\| \le 1} z^Ty. \tag{3.21}$$
Again, it may in general be rather difficult to evaluate this quantity for an arbitrary norm $\|\cdot\|$, but the case of $\Gamma$ is tractable.

Proposition 3.3 (Dual norm $\Gamma_\star$). The dual norm of $\Gamma$ is given by

$$\Gamma_\star(y) = \max_{1 \le j \le n} \|\bar{y}_j\|_2. \tag{3.22}$$
Proof. First, since we are working in a finite dimensional space, the set of all $z$ such that $\Gamma(z) \le 1$ is compact, and secondly, the function $\Gamma$ is continuous. We can hence apply the extreme value theorem and write $\Gamma_\star(y) = \max_{\Gamma(z)\le1} z^Ty$. Now consider the following inequality:
$$\begin{aligned} \Gamma_\star(y) &= \max_{\Gamma(z)\le1} z^Ty = \max_{\Gamma(z)\le1} \sum_{i=1}^{n} \bar{z}_i^T\bar{y}_i \\ &\overset{(a)}{\le} \max_{\Gamma(z)\le1} \sum_{i=1}^{n} \|\bar{z}_i\|_2\|\bar{y}_i\|_2 \le \max_{\Gamma(z)\le1} \Big(\max_{1\le j\le n}\|\bar{y}_j\|_2\Big) \sum_{i=1}^{n} \|\bar{z}_i\|_2 \\ &= \max_{\Gamma(z)\le1} \Big(\max_{1\le j\le n}\|\bar{y}_j\|_2\Big)\Gamma(z) \overset{(b)}{\le} \max_{1\le j\le n}\|\bar{y}_j\|_2, \end{aligned}$$
where (a) follows by the Cauchy-Schwarz inequality and (b) by $\Gamma(z) \le 1$. This bound is achieved by setting $\bar{z}_{j^*} = \bar{y}_{j^*}/\|\bar{y}_{j^*}\|_2$, where $j^*$ is the index achieving $\max_{1\le j\le n}\|\bar{y}_j\|_2$, and $\bar{z}_i = 0$ for every other index.


Proposition 3.4. For any $\phi \in \partial\Gamma(x)$ we have $\Gamma_\star(\phi) \le 1$.

Proof. This is immediate from equation 3.22 and proposition 3.1.

Induced Matrix Norms

The concept of a matrix (or more generally operator) norm induced by another norm is important in order to bound the possible value of expressions taking the form ||Ax||.

Definition 3.6 (Induced Norm). For a matrix $A \in \mathbb{R}^{n\times m}$, the norm $\|A\|_{\alpha,\beta}$ induced by the norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ is defined as
$$\|A\|_{\alpha,\beta} = \sup_{\|x\|_\alpha \le 1} \|Ax\|_\beta. \tag{3.23}$$

We abbreviate ||A||α,α by ||A||α.

For our purposes, we are interested in $\|\cdot\|_\Gamma$ and $\|\cdot\|_{\Gamma_\star}$. In particular, since (from proposition 3.4) we have $\Gamma_\star(\phi) \le 1$ for any $\phi \in \partial\Gamma(x)$, the term $\|A\|_{\Gamma_\star}$ can serve as a uniform bound on $\Gamma_\star(A\phi)$. It is necessary first to establish some additional notation. For a matrix $A \in \mathbb{R}^{np\times mp}$ we will denote by $\bar{A}_i$ ($1 \le i \le n$) the $p \times mp$ submatrix of $A$ whose upper-left-most element is $A_{(i-1)p+1,\,1}$, by $\bar{A}^j$ ($1 \le j \le m$) the $np \times p$ submatrix whose upper-left-most element is $A_{1,\,(j-1)p+1}$, and by $\bar{A}_i^j$ the obvious $p \times p$ submatrix. In this way, the whole matrix $A$ may be formed as a block matrix containing each of the $mn$ square submatrices $\bar{A}_i^j$, or of the $n$ "fat" rectangular $\bar{A}_i$ matrices, or of the $m$ "skinny" rectangular matrices $\bar{A}^j$. Essentially, these submatrices make it easy to consider grouped structure.

Proposition 3.5 (Matrix norm $\|\cdot\|_{\Gamma,\Gamma_\star}$). For a matrix $A \in \mathbb{R}^{np\times mp}$ we have
$$\|A\|_{\Gamma,\Gamma_\star} = \max_{\substack{1\le i\le n \\ 1\le j\le m}} \sigma_{\max}(\bar{A}_i^j). \tag{3.24}$$


Proof.

$$\begin{aligned} \|A\|_{\Gamma,\Gamma_\star} &= \sup_{\Gamma(x)\le1} \Gamma_\star(Ax) = \max_{1\le i\le n}\max_{\Gamma(x)\le1} \|\bar{A}_ix\|_2 = \max_{1\le i\le n}\max_{\Gamma(x)\le1} \Big\|\sum_{j=1}^{m}\bar{A}_i^j\bar{x}_j\Big\|_2 \\ &\overset{(a)}{=} \max_{1\le i\le n}\ \max_{\substack{\|\rho\|_1=1 \\ \rho\succeq0}}\ \max_{\substack{\|u_j\|_2=1 \\ 1\le j\le m}} \Big\|\sum_{j=1}^{m}\rho_j\bar{A}_i^ju_j\Big\|_2 \le \max_{1\le i\le n}\ \max_{\substack{\|\rho\|_1=1 \\ \rho\succeq0}} \sum_{j=1}^{m}\rho_j\max_{\|u_j\|_2=1}\|\bar{A}_i^ju_j\|_2 \\ &= \max_{1\le i\le n}\ \max_{\substack{\|\rho\|_1=1 \\ \rho\succeq0}} \sum_{j=1}^{m}\rho_j\sigma_{\max}(\bar{A}_i^j) \overset{(b)}{=} \max_{\substack{1\le i\le n \\ 1\le j\le m}}\sigma_{\max}(\bar{A}_i^j), \end{aligned}$$

where in (a) we are replacing the maximization over $\Gamma(x) \le 1$ by maximization first over the direction of each of the $m$ groups of $x$, followed by maximization over the allocation to each of these directions; the vector $\rho \in \mathbb{R}^m$ provides the convex combination. The equality in (b) is obtained by allocating the entire weight to the largest singular value in the preceding sum.

This final quantity can be attained with $\Gamma(x) \le 1$ by choosing $x$ so that $\bar{x}_j$ is the singular vector corresponding to the maximizing submatrix, and $0$ elsewhere.

Proposition 3.6 (Matrix norm induced by $\Gamma_\star$). For a matrix $A \in \mathbb{R}^{np\times mp}$
$$\|A\|_{\Gamma_\star} = \max_{1\le i\le n} \sum_{j=1}^{m} \sigma_{\max}(\bar{A}_i^j), \tag{3.25}$$

where σmax gives the largest singular value of a matrix.


Proof.

$$\begin{aligned} \|A\|_{\Gamma_\star} &= \sup_{\Gamma_\star(x)\le1} \Gamma_\star(Ax) \overset{(a)}{=} \sup_{\Gamma_\star(x)\le1}\sup_{\Gamma(z)\le1} z^TAx = \max_{\Gamma_\star(x)\le1}\max_{\Gamma(z)\le1} \sum_{i=1}^{n}\sum_{j=1}^{m} \bar{z}_i^T\bar{A}_i^j\bar{x}_j \\ &\overset{(b)}{\le} \max_{\Gamma_\star(x)\le1}\max_{\Gamma(z)\le1} \sum_{i=1}^{n}\sum_{j=1}^{m} \sigma_{\max}(\bar{A}_i^j)\|\bar{z}_i\|_2\|\bar{x}_j\|_2 \\ &\overset{(c)}{=} \max_{1\le i\le n}\max_{\Gamma(z)\le1} \sum_{j=1}^{m}\sigma_{\max}(\bar{A}_i^j)\|\bar{z}_i\|_2 = \max_{1\le i\le n}\sum_{j=1}^{m}\sigma_{\max}(\bar{A}_i^j), \end{aligned}$$

where in (a) we have applied the definition of the dual norm, (b) follows by the application of Cauchy-Schwarz followed by the operator norm bound, (c) follows by taking each ||x¯j||2 = 1 (recall equation 3.22), and the final equality by assigning all of the available weight to the maximizing group of z. We can achieve this bound by choosing the groups of x and z to correspond to the maximizing left and right singular vectors of the p × p sub matrices.

Proposition 3.7 (Matrix norm induced by Γ). For a matrix A ∈ Rnp×mp

$$\|A\|_{\Gamma} = \max_{1\le j\le m} \sum_{i=1}^{n} \sigma_{\max}(\bar{A}_i^j), \tag{3.26}$$

where $\sigma_{\max}$ gives the largest singular value of a matrix.

Proof. First, $\Gamma(Ax) = \sup_{\Gamma_\star(z)\le1} z^TAx$, that is, the dual of the dual is the original. After this, the proof is entirely analogous to the proof of proposition 3.6.

Proposition 3.8 (Matrix norm induced $\|\cdot\|_{\Gamma_\star,\Gamma}$). For a matrix $A \in \mathbb{R}^{np\times mp}$
$$\|A\|_{\Gamma_\star,\Gamma} = \sum_{i=1}^{n}\sum_{j=1}^{m} \sigma_{\max}(\bar{A}_i^j). \tag{3.27}$$

Proof. Again, apply the fact that $(\Gamma_\star)_\star = \Gamma$. The rest is similar to the proof of proposition 3.6.

The preceding propositions display pleasing symmetry amongst the grouped norm $\Gamma$, its dual, and the four naturally induced matrix norms. We will finish this section concerning $\Gamma_{DW}$ with the observation that, for any matrix $A \in \mathbb{R}^{np\times mp}$:
$$\|A\|_{\Gamma,\Gamma_\star} \le \begin{Bmatrix} \|A\|_\Gamma \\ \|A\|_{\Gamma_\star} \end{Bmatrix} \le \|A\|_{\Gamma_\star,\Gamma}, \tag{3.28}$$
where the stacked notation indicates that $\|A\|_\Gamma$ and $\|A\|_{\Gamma_\star}$ need not be comparable to one another.


3.4.2 Existence, Uniqueness, and Consistency The question of whether or not the DWGLASSO problem has a unique minimizer is important for us, since the minimizer is our primary interest, and not simply the minimum value, or a sufficiently useful set of parameters. We want to be able to speak of the solution to the DWGLASSO problem. We will consider the objective function

$$J(B) = \frac{1}{2T}\|Y - ZB\|_F^2 + \Gamma_{DW}(B), \tag{3.29}$$
consisting of the loss function $L(B) = \frac{1}{2T}\|Y - ZB\|_F^2$ and the regularizer $\Gamma_{DW}(B)$. This is the standard form formulation, but since $\mathcal{Z}$ is simply a permutation of the columns of $Z$, the results of this section hold also for the alternate formulation.

Proposition 3.9 (Existence). The objective function J(B) of equation 3.29 has a minimum value Jb = infB J(B), which is attained for some Bb.

Proof. Since $J \ge 0$, we must have $\hat{J} = \inf J \ge 0$. Secondly, $J$ is the sum of a nonnegative loss and the norm $\Gamma_{DW}$ and is hence a coercive (and continuous) function, that is, $J(B) \to \infty$ as $\|B\|_F \to \infty$. Since $0 \in \operatorname{dom} J$ we have $\hat{J} \le J(0) \le \|Y\|_F^2$, and by coercivity and continuity the sublevel set $\{B \mid J(B) \le \|Y\|_F^2\}$ is compact. Application of the extreme value theorem over this set implies the existence of a $\hat{B}$ such that $J(\hat{B}) = \hat{J}$.

The case of uniqueness is more nuanced, particularly if we consider the “high dimensional” regime where np > T .

Proposition 3.10 (Uniqueness). If rkZ = np, then J has a unique minimizer Bb.

Proof. Consider only the loss function $L$. This function is differentiable, and its Hessian is given by $\frac{1}{T}Z^TZ$. Recall that $Z$ is a real valued ($T \times np$) matrix. If $\operatorname{rk} Z = np$ (that is, $Z$ is full rank), then $\frac{1}{T}Z^TZ \succ 0$. The positive definiteness of a function's Hessian is a standard condition to verify strict convexity, in which case $J$ has a unique minimizer (see 2.4), since the sum of a convex and a strictly convex function is strictly convex.

Remark 26. When we have more samples than parameters ($T \ge np$), then barring trivial pathologies causing a rank deficiency, proposition 3.10 will always apply. The issues occur when $T < np$, in which case $\operatorname{rk} Z \le T < np$.

The Case T < np

Uniqueness   When $T < np$, proposition 3.10 will never apply. However, it is still possible that there exists a unique minimizer. In the case of the LASSO in classical regression, these issues are well understood (see [38], [33] and [42]). The crux of the matter is that the design matrix must satisfy a "mutual incoherence" condition. Recalling the LASSO example 7, mutual incoherence posits the existence of some $c > 0$ such that

$$\max_{j \in S^c} \|(X_S^TX_S)^{-1}X_S^Tx_j\|_1 \le 1 - c, \tag{3.30}$$


where S denotes the support set of the optimal (for the ensemble) parameter vector β?. The intuition is that there is minimal “mixing”, or “approximate orthog- onality” between the data from indices in S and those in Sc. The ideas surrounding mutual incoherence are pervasive in the literature on sparsity.

Consistency   In regression it is typical to show that mutual incoherence holds with high probability for the model under consideration, and then to apply it to establish consistency in the sense that, with high probability, $\operatorname{supp}(\hat{B}) = \operatorname{supp}(B^\star)$, where $\operatorname{supp}(\cdot)$ refers to the support of the parameter vector. In the case of our DWGLASSO program, it is reasonable to believe that similar results should apply. Indeed, the analogous proof methods work when applied to deterministic $Z$ matrices, and an analogous incoherence condition can be stated. The question is whether or not a high probability result holds in our data generating model, given that $e(t)$ is Gaussian, or even sub-Gaussian or sub-exponential. Unfortunately, we are not in a position to include such a result in this thesis, but this is the topic of our current research.

3.5 Algorithms for DWGLASSO

Since the DWGLASSO problem does not have a closed form solution in general, it is necessary to apply an iterative numerical algorithm. There is a wide array of algorithm design methods available for solving convex optimization problems, even when the functions involved are not differentiable. We discuss two of these algorithms here, beginning with a straightforward subgradient descent algorithm in 3.5.1, and then developing a more sophisticated proximal method in 3.5.2.

3.5.1 Subgradient Descent

Subgradient descent is a method for non-differentiable optimization analogous to regular gradient descent. We have drawn primarily upon [43] as a reference. Suppose we have a convex, though not necessarily differentiable, objective function $J : \mathbb{R}^n \to \mathbb{R}$. To minimize an objective $J(x)$, subgradient descent dictates that we perform the following iteration, after arbitrary initialization of $x^{(0)} \leftarrow x_0$:

$$x^{(k+1)} \leftarrow x^{(k)} - \alpha_k\,\phi_J(x^{(k)}), \tag{3.31}$$

where αk is a predetermined sequence of step size parameters, and φJ (x) ∈ ∂J(x) is an arbitrary subgradient. This method is exactly analogous to gradient descent.

Remark 27. Interestingly, the proof of convergence (see [43]) for subgradient de- scent shows only that the distance between the current iterate xk and the optimal x∗ decreases monotonically ||x(k+1) − x∗|| ≤ ||x(k) − x∗||, rather than a monotonic decrease in |J(x(k+1)) − J ∗| as with standard gradient descent. This is because subgradients are not necessarily descent directions for J.

Consider the alternate form optimization problem of 3.2. We have the objective function
$$J(\tilde{B}) = \frac{1}{2T}\|Y - \mathcal{Z}\tilde{B}\|_F^2 + \lambda\,\Gamma_{DW}(\tilde{B}),$$

which we wish to minimize over $\tilde{B} \in \mathbb{R}^{np\times n}$. There is no closed form solution to this problem, and the non-differentiability of $\Gamma_{DW}$ at $\tilde{B} = 0$ adds an additional annoyance. This task is, however, amenable to subgradient descent. The subdifferential of $J(\tilde{B})$ is given by:

$$\partial J(\tilde{B}) = -\frac{1}{T}\mathcal{Z}^T(Y - \mathcal{Z}\tilde{B}) + \lambda\,\partial\Gamma_{DW}(\tilde{B}) \tag{3.32}$$
$$= \frac{1}{T}\big(\mathcal{Z}^T\mathcal{Z}\tilde{B} - \mathcal{Z}^TY\big) + \lambda\,\partial\Gamma_{DW}(\tilde{B}). \tag{3.33}$$
Hence, we have the proposition:

Proposition 3.11 (Subgradient Descent for DWGLASSO). After arbitrary initialization of $\tilde{B}^{(0)} \in \mathbb{R}^{np\times n}$, define the iterations, jointly and simultaneously carried out for each $i \in \{1, 2, \ldots, n\}$:

$$\tilde{B}_i^{(k+1)} \leftarrow \tilde{B}_i^{(k)} - \frac{1}{k}\Big(\frac{1}{T}\big(\mathcal{Z}^T\mathcal{Z}\tilde{B}_i^{(k)} - \mathcal{Z}^Ty_i\big) + \lambda\,\phi(\tilde{B}_i^{(k)})\Big), \tag{3.34}$$
where $\phi$ specifies a subgradient as given in proposition 3.1, and $y_i$ is the $i$-th column of $Y$. If $\hat{J}$ is the minimum value of 3.10 with regularizer $\Gamma_{DW}$, then $J(\tilde{B}^{(k)}) \to \hat{J}$ as $k \to \infty$.

Furthermore, if there is a unique minimizer Bbe then

$$\|\tilde{B}^{(k)} - \hat{\tilde{B}}\|_F^2 \to 0 \quad \text{as } k \to \infty.$$
Before we prove this proposition, we start with a lemma:

Lemma 3.1. The function $J(\beta) = \|y - \mathcal{Z}\beta\|_2^2 + \Gamma_{DW}(\beta)$ for $\beta \in \mathbb{R}^{np}$ is Lipschitz continuous over the region $\|\beta\|_2 < C$ with parameter $\max\big(C\sigma_1(\mathcal{Z})^2 + \|\mathcal{Z}^Ty\|_2,\ 1\big)$.

Proof. Firstly, we can see that $\Gamma_{DW}$ is Lipschitz with parameter 1 by the reverse triangle inequality $\|x\|_2 - \|x'\|_2 \le \|x - x'\|_2$. Second, we will use the differentiability of $L(\beta) = \|y - \mathcal{Z}\beta\|_2^2$, combined with the fact that (see [19], Lemma 2.18) $|L(\beta) - L(\beta')| \le M\|\beta - \beta'\|_2$, where
$$M = \sup_{\|\beta\|_2 < C} \Big\|\frac{\partial L}{\partial\beta}(\beta)\Big\|_2.$$
We have $\frac{\partial L}{\partial\beta}(\beta) = -\mathcal{Z}^T(y - \mathcal{Z}\beta)$ and hence
$$\sup_{\|\beta\|_2 < C} \Big\|\frac{\partial L}{\partial\beta}(\beta)\Big\|_2 = \sup_{\|\beta\|_2 < C} \|\mathcal{Z}^T\mathcal{Z}\beta - \mathcal{Z}^Ty\|_2 \le C\,\sigma_1(\mathcal{Z})^2 + \|\mathcal{Z}^Ty\|_2,$$
which gives the stated Lipschitz parameter.


We now proceed with the proof of proposition 3.11.

Proof. The objective function given in the alternate formulation 3.10 is separable over the columns of $Y$ and $\tilde{B}$ (indeed, this is the primary motivation for this formulation), so the iterations of 3.34 work to solve the $n$ separated problems via subgradient descent. Since the objective is coercive, there must be some $C_i \ge 0$ such that the solution $\hat{\tilde{B}}_i$ to the $i$-th separated problem has $\|\hat{\tilde{B}}_i\|_2 \le C_i$. If we apply lemma 3.1 over this set, we see that the relevant objective function is Lipschitz continuous. The convergence of the objective value and the parameters now follows from the convergence theorem for subgradient descent [43].
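A minimal sketch of the iterations 3.34 (not the thesis implementation; the subgradient rule and $1/k$ step size come from proposition 3.1 and equation 3.34, while the data layout and function names are illustrative assumptions):

```python
import numpy as np

def gamma_subgradient(b, p):
    """A subgradient of Gamma_DW at b (one column of B~, length n*p),
    following proposition 3.1: b_j / ||b_j||_2 on nonzero groups, 0 otherwise."""
    phi = np.zeros_like(b)
    for j in range(len(b) // p):
        grp = slice(j * p, (j + 1) * p)
        nrm = np.linalg.norm(b[grp])
        if nrm > 0:
            phi[grp] = b[grp] / nrm
    return phi

def dwglasso_subgradient_descent(Y, Z, lam, p, n_iter=500):
    """Subgradient descent iterations (3.34) in the alternate form (3.10).

    Y : (T, n), Z : (T, n*p) with columns grouped process-by-process.
    Returns B~ of shape (n*p, n); a sketch using the 1/k step size of (3.34)."""
    T, n = Y.shape
    B = np.zeros((Z.shape[1], n))
    ZtZ, ZtY = Z.T @ Z / T, Z.T @ Y / T
    for k in range(1, n_iter + 1):
        for i in range(n):
            grad = ZtZ @ B[:, i] - ZtY[:, i] + lam * gamma_subgradient(B[:, i], p)
            B[:, i] -= grad / k
    return B
```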

3.5.2 Alternating Direction Method of Multipliers (ADMM) Alternating Direction Method of Multipliers (ADMM) is a method for non-differentiable convex optimization. We have drawn upon [44] and [45] as references for these tech- niques. Suppose our goal is to solve the following optimization problem:

$$\begin{aligned} &\underset{x,z}{\text{minimize}} && f(x) + g(z) \\ &\text{subject to} && A_xx + A_zz = c, \end{aligned} \tag{3.35}$$
where $f$ and $g$ must be convex, but need not be differentiable. To solve the problem 3.35, ADMM stipulates that we perform the following iterations:

$$\begin{aligned} x^{k+1} &\leftarrow \underset{x}{\operatorname{argmin}}\ \Big[f(x) + \frac{1}{2\mu}\|A_xx + A_zz^k - c + u^k\|_2^2\Big] \\ z^{k+1} &\leftarrow \underset{z}{\operatorname{argmin}}\ \Big[g(z) + \frac{1}{2\mu}\|A_xx^{k+1} + A_zz - c + u^k\|_2^2\Big] \\ u^{k+1} &\leftarrow u^k + A_xx^{k+1} + A_zz^{k+1} - c. \end{aligned} \tag{3.36}$$
We will see in the sequel that introducing the proximity operator of a convex function is very useful for working with the so-called "consensus form" of 3.35. In the following, we use $\psi$ to denote a function whose range is $\mathbb{R}\cup\{\infty\}$ and which is convex, closed (meaning that $\operatorname{epi}\psi$ is a closed set), and proper (its domain is non-empty). We will abbreviate these three conditions and simply refer to the functions as convex.

Definition 3.7 (Proximity Operator). The proximity operator of a convex function $\psi : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ at a point $v$ (if $\psi$ has matrix domain then $\|\cdot\|_2$ becomes $\|\cdot\|_F$) is defined as the function $\operatorname{prox}_{\mu\psi} : \mathbb{R}^n \to \mathbb{R}^n$,

\[
\operatorname{prox}_{\mu\psi}(v) = \underset{x \in \mathbb{R}^n}{\operatorname{argmin}} \left[ \psi(x) + \frac{1}{2\mu}\|x - v\|_2^2 \right],
\]
where $\mu > 0$ is a parameter.

The most straightforward interpretation of the proximity operator is that we optimize $\psi$ in a trust region around $v$. A very important property of the proximity operator is the Moreau decomposition:


Theorem 3.2 (Moreau Decomposition).

\[
v = \operatorname{prox}_{\psi}(v) + \operatorname{prox}_{\psi^*}(v), \qquad (3.37)
\]

where

\[
\psi^*(x) = \sup_{z \in \mathbb{R}^n}\left(x^T z - \psi(z)\right) \qquad (3.38)
\]
is the Fenchel conjugate of $\psi$.

Remark 28. The Moreau decomposition tells us that if we can calculate the proximity operator of $\psi$, then we can also calculate that of $\psi^*$, and vice versa. The usefulness of this property cannot be overstated.
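As a quick numerical illustration (a sketch, not part of the thesis code), the decomposition can be checked for $\psi = \|\cdot\|_2$, whose proximity operator is the soft thresholding used later and whose conjugate is the indicator of the unit ball, with proximity operator given by projection:

import numpy as np

def prox_l2(v, mu):
    # prox of mu * ||.||_2: (block) soft thresholding
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= mu else (1.0 - mu / nrm) * v

def prox_l2_conjugate(v, mu):
    # prox of (mu * ||.||_2)^*, the indicator of the ball of radius mu: projection
    nrm = np.linalg.norm(v)
    return v if nrm <= mu else mu * v / nrm

v = np.random.randn(5)
mu = 0.7
# Moreau decomposition with parameter: v = prox_{mu psi}(v) + prox_{(mu psi)*}(v)
assert np.allclose(v, prox_l2(v, mu) + prox_l2_conjugate(v, mu))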

Remark 29. There is a subtlety in the Moreau decomposition in which we actually need the conjugate of $\mu\psi$ in order to work with proximity operators when the parameter $\mu \ne 1$. Fortunately, $(\mu\psi)^*$ is closely related to $\psi^*$:

\[
\begin{aligned}
(\mu\psi)^*(x) &= \sup_{z}\left(x^T z - \mu\psi(z)\right) \\
&= \mu \sup_{z}\left(\tfrac{1}{\mu} x^T z - \psi(z)\right) \\
&= \mu\, \psi^*(x/\mu). \qquad (3.39)
\end{aligned}
\]

If we now specialize Ax = I, Az = −I and c = 0 in 3.35, we transform the problem into the consensus form, in which ADMM now provides an algorithm for the unconstrained problem of minimizing h(x) = f(x) + g(x). One of the primary advantages of ADMM is that if h(x) is difficult to minimize when viewed as a whole, but we can easily evaluate the proximity operators of f and g separately, then h can be minimized by application of the consensus form ADMM, which comes down to the evaluation of proximity operators:

\[
\begin{aligned}
x^{k+1} &\leftarrow \operatorname{prox}_{\mu f}(z^{k} - u^{k}) \\
z^{k+1} &\leftarrow \operatorname{prox}_{\mu g}(x^{k+1} + u^{k}) \\
u^{k+1} &\leftarrow u^{k} + x^{k+1} - z^{k+1}. \qquad (3.40)
\end{aligned}
\]
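The consensus iterations 3.40 can be written generically in terms of the two proximity operators; the sketch below (illustrative, with prox_f and prox_g passed in as one-argument callables that have $\mu$ baked in) is the skeleton that the DWGLASSO algorithm given later (algorithm 1) instantiates:

import numpy as np

def consensus_admm(prox_f, prox_g, shape, tol=1e-6, max_iter=1000):
    # Minimize f(x) + g(x) given only the proximity operators of f and g (equation 3.40).
    x = np.zeros(shape)
    z = np.zeros(shape)
    u = np.zeros(shape)
    for _ in range(max_iter):
        x = prox_f(z - u)
        z = prox_g(x + u)
        u = u + x - z
        if np.sum((x - z) ** 2) / x.size <= tol:   # per-parameter stopping rule
            break
    return z                                       # z carries the structure enforced by g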

Turning to DWGLASSO (in standard form), we have $J(B) = \frac{1}{2T}\|Y - ZB\|_F^2 + \lambda\Gamma_{DW}(B) \triangleq f(B) + g(B)$. In order to apply ADMM to this cost function we need simply to obtain the proximity operators of $f$ and $g$ (recall that prox is easily generalized to matrix arguments).

Proposition 3.12 (Proximity Operator of $f(B) = \frac{1}{2T}\|Y - ZB\|_F^2$).
\[
\operatorname{prox}_{\mu f}(V) = \left(\frac{1}{T} Z^T Z + \frac{1}{\mu} I\right)^{-1}\left(\frac{1}{T} Z^T Y + \frac{1}{\mu} V\right).
\]

Note that the matrix inverse should not actually be calculated in practice; an LU factorization of the positive definite matrix $\left(\frac{1}{T}Z^T Z + \frac{1}{\mu} I\right)$ should be cached and the proximity operator evaluated via back substitution.
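A sketch of proposition 3.12 with the cached factorization, using SciPy's lu_factor and lu_solve (a Cholesky factorization would serve equally well since the matrix is positive definite); the factory-style wrapper is an illustrative choice so that the result can be passed directly to the consensus ADMM skeleton sketched above.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def make_prox_f(Z, Y, mu):
    # Returns V -> prox_{mu f}(V) for f(B) = (1/2T)||Y - ZB||_F^2 (proposition 3.12).
    T = Z.shape[0]
    Sigma_Z = Z.T @ Z / T
    Sigma_ZY = Z.T @ Y / T
    factor = lu_factor(Sigma_Z + np.eye(Z.shape[1]) / mu)   # computed and cached once
    return lambda V: lu_solve(factor, Sigma_ZY + V / mu)    # back substitution per call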


Proof.

\[
\operatorname{prox}_{\mu f}(V) = \underset{B}{\operatorname{argmin}}\; P_f(B),
\]
where

\[
P_f(B) = \frac{1}{2T}\|Y - ZB\|_F^2 + \frac{1}{2\mu}\|B - V\|_F^2.
\]
Since this objective is differentiable and unconstrained, it is easy to solve. We apply the method of [46] in which we first calculate a differential and then infer from it the Jacobian matrix function $\frac{\partial P_f}{\partial B} : \mathbb{R}^{np \times n} \to \mathbb{R}^{np \times n}$.

\[
\begin{aligned}
dP_f(B) &= -\frac{1}{T}\operatorname{tr}\left((Y - ZB)^T Z\, dB\right) + \frac{1}{\mu}\operatorname{tr}\left((B - V)^T dB\right) \\
&= \operatorname{tr}\left[\left(\frac{1}{\mu} B^T - \frac{1}{\mu} V^T - \frac{1}{T} Y^T Z + \frac{1}{T} B^T Z^T Z\right) dB\right] \\
\implies \frac{\partial P_f}{\partial B}(B) &= \frac{1}{\mu}(B - V) + \frac{1}{T}\left(Z^T Z B - Z^T Y\right).
\end{aligned}
\]

Applying the first order optimality condition

\[
\begin{aligned}
&\frac{\partial P_f}{\partial B}(B^\star) = 0 \\
\implies\; &\left(\frac{1}{T} Z^T Z + \frac{1}{\mu} I\right) B^\star = \frac{1}{T} Z^T Y + \frac{1}{\mu} V \\
\implies\; &B^\star = \left(\frac{1}{T} Z^T Z + \frac{1}{\mu} I\right)^{-1}\left(\frac{1}{T} Z^T Y + \frac{1}{\mu} V\right),
\end{aligned}
\]

and since the function $P_f$ is strongly convex, we have obtained the unique global minimizer.

Proposition 3.13 (Proximity Operator of $g(B) = \lambda \sum_{i,j} \|\tilde{B}_{ij}\|_2$ ([44])).

\[
\operatorname{prox}_{\mu g}(V) = \begin{bmatrix} P(1) & P(2) & \cdots & P(p) \end{bmatrix}^T \in \mathbb{R}^{np \times n},
\]

where
\[
P_{ij}(\tau) = \left(1 - \frac{\lambda\mu}{\|\tilde{V}_{ij}\|_2}\right)_+ \tilde{V}_{ij}(\tau).
\]
The notation $\tilde{V}_{ij}$ is defined in the same way as $\tilde{B}_{ij}$ in 2.24, and $(a)_+ = \max(0, a)$. This is referred to as a block wise soft thresholding operation.

Proof. Let $\phi(\tilde{B}_{ij}) = \|\tilde{B}_{ij}\|_2$, then $g(B) = \lambda \sum_{ij} \phi(\tilde{B}_{ij})$. Since $\|\cdot\|_F^2$ and $g$ both separate along the columns of $\tilde{B}$, so too does the optimization problem which defines the proximity operator 3.14, and hence we need only show that $P_{ij} = \left(1 - \frac{\lambda\mu}{\|\tilde{V}_{ij}\|_2}\right)_+ \tilde{V}_{ij}$. From 3.38 we have


\[
\begin{aligned}
\phi^*(x) &= \sup_{z}\left(x^T z - \|z\|_2\right) \\
&= \sup_{t \ge 0}\, t\left(\|x\|_2^2 - \|x\|_2\right), \qquad \{z = tx\}
\end{aligned}
\]

$= I_{B_2}(x)$, the indicator function for the Euclidean unit ball$^4$,
\[
I_{B_2}(x) = \begin{cases} 0, & \|x\|_2 \le 1 \\ \infty, & \|x\|_2 > 1. \end{cases}
\]

The proximity operator of $I_{B_2}$ at $v$ is the projection of $v$ onto $B_2$. And, projection of $v$ onto the Euclidean unit ball merely involves a rescaling of $v$, hence

\[
\operatorname{prox}_{(\mu\phi)^*}(\tilde{V}_{ij}) = \begin{cases} \tilde{V}_{ij}, & \|\tilde{V}_{ij}\|_2 \le \mu \\ \dfrac{\mu \tilde{V}_{ij}}{\|\tilde{V}_{ij}\|_2}, & \text{otherwise,} \end{cases}
\]
where the scaling by $\mu$ comes from 3.39. We then use the Moreau decomposition 3.37 to obtain

\[
\begin{aligned}
\operatorname{prox}_{\mu\phi}(\tilde{V}_{ij}) &= \tilde{V}_{ij} - \operatorname{prox}_{(\mu\phi)^*}(\tilde{V}_{ij}) \\
&= \left(1 - \frac{\mu}{\|\tilde{V}_{ij}\|_2}\right)_+ \tilde{V}_{ij}.
\end{aligned}
\]

The additional scaling by the regularization parameter $\lambda$ follows easily by rescaling $\mu$.
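A sketch of the block-wise soft thresholding of proposition 3.13, under the same illustrative row ordering assumed in the earlier sketches (B[(τ − 1)n + j, i] holding $\tilde{B}_{ij}(\tau)$):

import numpy as np

def make_prox_g(lam, mu, n, p):
    # Returns V -> prox_{mu g}(V) for g(B) = lam * sum_ij ||B_ij||_2 (proposition 3.13).
    def prox_g(V):
        P = V.copy()
        for i in range(n):
            for j in range(n):
                group = V[j::n, i]                        # the p coefficients B_ij(1..p)
                nrm = np.linalg.norm(group)
                scale = max(0.0, 1.0 - lam * mu / nrm) if nrm > 0 else 0.0
                P[j::n, i] = scale * group                # block-wise soft threshold
        return P
    return prox_g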

Hence, we have algorithm 1 for fitting the model 3.5 with the DWGLASSO group structured penalty.

Computational Considerations

The computational complexity of the algorithm is actually more modest than it may initially appear. In the common case that $T$ is much larger than, or at least comparable to, $np$, the complexity is dominated by the calculation of $\Sigma_Z$, which takes $O(n^2 p^2 T)$ time. In a high dimensional case in which $np > T$, the calculation of the LU factorization dominates with an $O(n^3 p^3)$ time requirement. Each iteration of the repeat loop has an $O(n^2 p)$ complexity. The time complexity is linear in the number of data samples, although cubic in the time lag $p$ and the number $n$ of processes. The choice of the parameter $\lambda$ will be discussed in the sequel. The choice of $\mu$ is less important; we have found via ad-hoc tuning that $\mu = 0.1$ has worked well. It is important to note however that changing $\mu$ has an effect on the convergence criterion, and hence $\mu$ should be kept fixed if results are to be comparable.

4In general, the Fenchel conjugate of a norm is the indicator of the dual norm unit ball


Data: $\mu > 0$, $\lambda > 0$, $\epsilon > 0$, $Z \in \mathbb{R}^{T \times np}$, $Y \in \mathbb{R}^{T \times n}$
Result: $\hat{B} \in \mathbb{R}^{np \times n}$ such that $\hat{x}(t) = z(t)\hat{B}$
Initialization: $k = 0$, $B_c^k = 0$, $B_r^k = 0$, $B_u^k = 0$
$\Sigma_Z = \frac{1}{T} Z^T Z$  # variance estimate
$\Sigma_{ZY} = \frac{1}{T} Z^T Y$  # covariance estimate
$L, U \leftarrow$ lu_factor$\left(\Sigma_Z + \frac{1}{\mu} I_{np}\right)$
repeat
    $B_c^{k+1} \leftarrow$ lu_solve$\left(LUB = \Sigma_{ZY} + \frac{1}{\mu}(B_r^k - B_u^k)\right)$  # $\operatorname{prox}_{\mu f}$
    $V \leftarrow B_c^{k+1} + B_u^k$
    for $i, j, \tau \in \{1, \ldots, n\}^2 \times \{1, \ldots, p\}$ do
        $\tilde{V}_{ij}(\tau) \leftarrow \left(1 - \frac{\lambda\mu}{\|\tilde{V}_{ij}\|_2}\right)_+ \tilde{V}_{ij}(\tau)$  # $\operatorname{prox}_{\mu g}$
    end
    $B_r^{k+1} \leftarrow V$
    $B_u^{k+1} \leftarrow B_u^k + B_c^{k+1} - B_r^{k+1}$
    $k \leftarrow k + 1$
until $\frac{1}{n^2 p}\|B_c^k - B_r^k\|_F^2 \le \epsilon$  # convergence
return $B_r^k$  # sparse solution
Algorithm 1: DWGLASSO
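Assembling the sketches given above (consensus_admm, make_prox_f and make_prox_g; all illustrative, not the thesis implementation), a hypothetical end-to-end call mirroring algorithm 1 would look as follows. The synthetic data and the parameter values are placeholders.

import numpy as np

T, n, p = 2000, 10, 3
lam, mu = 0.05, 0.1
X = np.random.randn(T + p, n)                  # stand-in data; replace with a real series
# Lagged design matrix: columns ordered as [x(t-1), x(t-2), ..., x(t-p)]
Z = np.hstack([X[p - tau:T + p - tau] for tau in range(1, p + 1)])   # shape (T, n*p)
Y = X[p:]                                      # targets x(t), shape (T, n)

prox_f = make_prox_f(Z, Y, mu)                 # from the earlier sketch
prox_g = make_prox_g(lam, mu, n, p)            # from the earlier sketch
B_hat = consensus_admm(prox_f, prox_g, shape=(n * p, n))
# An edge j -> i is declared present when the depth-wise group B_hat[j::n, i] is nonzero.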

3.5.3 Elastic-Net DWGLASSO An issue with LASSO type methods is that while the objective functions are convex, they need not be strictly convex when T < np. This means that there may be a (convex) set containing many different global minimizers. This implies that when fitting the AR model 2.25, if x(t) has a high degree of co-linearity then the estimates of B(τ) will not be stable, in the sense that one process from a group of correlated processes can be chosen, the rest being ignored. The particular minimizer chosen by the LASSO is arbitrary, and can easily vary across realizations of the data, as well as when λ varies. One solution to this problem is given by [47] and is referred to as the elastic-net. The idea here is to blend together the Tikhonov and LASSO penalties with another parameter α ∈ [0, 1]. The addition of the Tikhonov regularization has the effect of blending together correlated variables. Many desirable properties of the elastic net come from the fact that the objective function is strongly5 convex. It is quite easy to extend the DWGLASSO to an elastic-net variation by using the blended regularizer, with the parameter α ∈ [0, 1]6:

\[
\Gamma_{DWe}(B) = \alpha\|B\|_F^2 + (1 - \alpha)\Gamma_{DW}(B).
\]
This gives us the cost function

\[
J(B) = \frac{1}{2T}\|Y - ZB\|_F^2 + \lambda\left(\alpha\|B\|_F^2 + (1 - \alpha)\sum_{i=1}^{n}\sum_{j=1}^{n}\|\tilde{B}_{ij}\|_2\right).
\]
And, the modified proximity operators for ADMM follow easily.

$^5$Strong convexity implies strict convexity.
$^6$One can alternately use $\frac{1}{2}\alpha\|B\|_F^2 + \Gamma_{DW}(B)$ to avoid a later factor of 2, but the formulations are equivalent.


Proposition 3.14 (Proximity Operator of $f(B) = \frac{1}{2T}\|Y - ZB\|_F^2 + \lambda\alpha\|B\|_F^2$).
\[
\operatorname{prox}_{\mu f}(V) = \left(\frac{1}{T} Z^T Z + \frac{1 + 2\mu\lambda\alpha}{\mu} I\right)^{-1}\left(\frac{1}{T} Z^T Y + \frac{1}{\mu} V\right).
\]

Proposition 3.15 (Proximity Operator of $g(B) = \lambda(1 - \alpha)\sum_{i,j}\|\tilde{B}_{ij}\|_2$).

\[
\operatorname{prox}_{\mu g}(V) = \begin{bmatrix} P(1) & P(2) & \cdots & P(p) \end{bmatrix}^T \in \mathbb{R}^{np \times n},
\]

where
\[
P_{ij}(\tau) = \left(1 - \frac{\mu\lambda(1 - \alpha)}{\|\tilde{V}_{ij}\|_2}\right)_+ \tilde{V}_{ij}(\tau).
\]
The above proximity operators can be substituted into algorithm 1 to obtain an elastic net analog of our proposed method.

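The only change relative to the earlier quadratic proximity operator sketch is the diagonal shift of proposition 3.14; the group threshold of proposition 3.15 simply uses λ(1 − α) in place of λ. A sketch under the same assumptions as before:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def make_prox_f_enet(Z, Y, mu, lam, alpha):
    # V -> prox_{mu f}(V) for f(B) = (1/2T)||Y - ZB||_F^2 + lam*alpha*||B||_F^2 (prop. 3.14).
    T = Z.shape[0]
    Sigma_Z = Z.T @ Z / T
    Sigma_ZY = Z.T @ Y / T
    shift = (1.0 + 2.0 * mu * lam * alpha) / mu          # replaces 1/mu from proposition 3.12
    factor = lu_factor(Sigma_Z + shift * np.eye(Z.shape[1]))
    return lambda V: lu_solve(factor, Sigma_ZY + V / mu)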
3.6 Simulation Results

In this section we present some simulation studies of the DWGLASSO with data simulated from the model 2.23. Simulating from the true model, when application data is naturally more complex, is sometimes a pointless exercise; in our case, however, it is an important first step because the theoretical properties of the method are not yet fully understood. Further, it is necessary to know the ground truth causal structure in order to draw conclusions. For application to non-synthetic data, chapter 4 briefly reviews some applications of Granger-causality from the literature, and we also present our own study.

In our simulations, the adjacency matrix $G$ of the true underlying causal graph $\mathcal{G}$ is drawn from a directed random graph having $n$ nodes and edge probability $q$. That is, the presence or absence of each of the $n^2$ possible edges (we include self loops) is given by i.i.d. $\operatorname{Ber}(q)$ random variables. The weights of each edge filter are sampled as i.i.d. Gaussian random variables, where the variance of each of these is tuned so that “most” of the resulting autoregressive systems are stable (see section 2.5.1). We reject and resample any unstable models.
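A sketch of this sampling procedure (illustrative; the stability check uses the spectral radius of the companion matrix, and the weight standard deviation is a placeholder to be tuned as described above):

import numpy as np

def sample_stable_var(n, p, q, weight_std=0.1, rng=None):
    # Random sparse VAR(p): Ber(q) edge support (self loops included), Gaussian filter weights,
    # rejecting and resampling until the companion matrix has spectral radius < 1.
    rng = np.random.default_rng(rng)
    while True:
        support = rng.random((n, n)) < q
        B = [rng.normal(0.0, weight_std, (n, n)) * support for _ in range(p)]
        top = np.hstack(B)                                    # [B(1) B(2) ... B(p)]
        bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
        companion = np.vstack([top, bottom])
        if np.max(np.abs(np.linalg.eigvals(companion))) < 1.0:
            return B, support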

Remark 30. We have applied some simple methods based on the Gershgorin circle theorem in order to better understand the relationships between the underlying graph, the distribution of the filter weights, and the resulting stability of the AR system. However, the bounds are not sharp enough to be of any real use in sampling stable systems. As far as we are aware, there are many open problems standing in the way of a complete understanding of random distributions over the space of stable autoregressive systems.

3.6.1 ADMM Convergence

Recall the ADMM algorithm given in equation 3.36. Convergence can be measured by the Euclidean distance between $x^{(k)}$ and $z^{(k)}$. It is typical for the iterates to rapidly converge to a modest accuracy, without making very much progress with


Figure 3.2: ADMM Convergence

further iterations. In some applications this may be a serious drawback, but this is not the case for statistical applications, since the data is already noisy and models are approximate. In figure 3.2 we have run 50 simulations with parameters $n = 50$, $T = 10000$, $p = 3$, $q = 0.2$, $\Sigma_e = 0.1I$, which corresponds to the regime of abundant data. We have plotted the average over these 50 trials as well as bounds over the trials. As expected, convergence to modest accuracy is rapid, with little progress thereafter. We normalize the error by the number $n^2 p$ of parameters so that we are measuring the error per parameter; in this way it is possible to specify a stopping criterion which is consistent over different model sizes, e.g. we specify some $\epsilon > 0$ such that the algorithm halts when $\frac{1}{n^2 p}\|x^{(k)} - z^{(k)}\|_F^2 \le \epsilon$. We typically employ $\epsilon = 10^{-6}$ or $\epsilon = 10^{-9}$.

3.6.2 Model Consistency in Squared Error

Consider the difference between the set of parameters $B^\star$ optimal for the error function over the entire ensemble and the minimizer $\hat{B}$ of the finite sample DWGLASSO loss. That is, $B^\star$ minimizes $\mathbb{E}\|x(t) - Bz(t)\|_F^2$ and $\hat{B}$ minimizes $\frac{1}{2T}\sum_{t=1}^{T}\|x(t) - Bz(t)\|_F^2 + \lambda\Gamma_{DW}(B)$. We consider here whether or not $\|\hat{B} - B^\star\|_F^2 \to 0$ as $T \to \infty$. We draw from our random model some very large number of samples, and then minimize an empirical objective using a progressively larger number of samples. We compare ordinary least squares (OLS), Tikhonov regularized least squares (OLST), and the DWGLASSO (DW).

Remark 31. Model consistency as measured by the Euclidean norm is quite well understood, even for general M-estimators (see [33]). Furthermore, convergence in norm tells us little about the recovery of the actual underlying causality graph, which is the object of our interest. We hence keep this section short.

In the case of ordinary least squares, the error between $\hat{B}_{OLS}$ and $B^\star$ goes to zero as $T \to \infty$ under mild assumptions. However, as previously discussed, the estimate


Figure 3.3: L2 Convergence

is highly unstable when $T$ is small. In the case of the regularized solutions (OLST and DWGLASSO), the error will not converge to 0 when $\lambda$ is fixed; it is necessary to also have $\lambda \to 0$ as $T \to \infty$. This behaviour is observed in figure 3.3, where we have used $n = 50$, $p = 3$, $q = 0.1$ and held $\lambda$ fixed. We further note that the regularized solutions are much more consistent, the regions being hardly visible.

3.6.3 Model Support Recovery

As with section 3.6.2 we consider here the distance between the finite sample minimizer $\hat{B}$ and the ensemble minimizer $B^\star$. In contrast to the previous section, we here consider the supports (the indices of the non-zero entries) of the underlying causality graph. We will refer to the causality graphs inferred from the sets of parameters as $\hat{G}_\lambda$ and $G^\star$ in the obvious way. For the simulations in this section, we have used $n = 75$, $p = 3$, $q = 0.3$, as well as the elastic net version of DWGLASSO with $\alpha = 0.1$ fixed.

Note further that when $\lambda = 0$, the DWGLASSO solution is the same as the OLS solution, and $\hat{G}_0$ will be completely dense. Hence, there is some $\lambda_0$ (possibly $\lambda_0 = 0$) such that $\hat{G}_\lambda$ is dense for every $0 \le \lambda \le \lambda_0$. On the other hand, as $\lambda \to \infty$ there is some value $\lambda_\infty$ such that $\hat{G}_\lambda$ is entirely 0 for any $\lambda \ge \lambda_\infty$. In the figures of this section, we generally vary $\lambda$ on a logarithmic scale between, roughly, these two quantities, $\lambda \in [\lambda_0, \lambda_\infty]$.

Our goal is to determine whether or not $\hat{G}_\lambda$ and $G^\star$ share the same support, and when there are differences, what type of error occurs (e.g. false positives or negatives). It is common to measure the quality of the estimates via the precision, recall, and $F1$ score. These are given by $\frac{TP}{TP + FP}$, $\frac{TP}{TP + FN}$, and $\frac{2TP}{2TP + FP + FN}$ respectively, where $TP, TN, FP, FN$ are the counts of true positives, true negatives, false positives, and false negatives. The $F1$ score is the harmonic mean of the precision and recall. Intuitively, the precision measures how often our inferred edges are truly present in $G^\star$, and the recall measures how many of the true edges we discover. The $F1$ score is a reasonable combination of these two quantities into a single performance metric.


Figure 3.4: Random Guess, Base Probability $\hat{q} \in [0, 1]$

The above mentioned metrics are sensitive to the base probability $q$ of an edge, which we have fixed at 0.3. We hence employ a final measurement of quality, the Matthews correlation coefficient (MCC), which is applicable regardless of the base probability. For the MCC measurement, a value of 0 indicates the quality of a random guess (using any probability) and a value of 1 indicates that the graph has been perfectly recovered. The definition of the MCC is given by
\[
\frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
\]
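For completeness, a small sketch computing these support recovery metrics from the true and estimated adjacency matrices (boolean arrays; the function name is illustrative):

import numpy as np

def support_metrics(G_true, G_hat):
    # Precision, recall, F1 and MCC for a recovered edge set.
    G_true = np.asarray(G_true, dtype=bool)
    G_hat = np.asarray(G_hat, dtype=bool)
    tp = int(np.sum(G_hat & G_true))
    tn = int(np.sum(~G_hat & ~G_true))
    fp = int(np.sum(G_hat & ~G_true))
    fn = int(np.sum(~G_hat & G_true))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn > 0 else 0.0
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return precision, recall, f1, mcc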

This measure is easiest to understand at a glance, as we can immediately note that “bigger is better”, but the $F1$ score is also a very common metric for binary classification. We first establish a baseline by sweeping a probability $\hat{q}$ from 0 to 1, and forming $\hat{G}$ by picking edges uniformly at random (each with probability $\hat{q}$). The result of this scheme is given in figure 3.4 with $\hat{q}$ on the bottom axis, and with each of the above error metrics given on the vertical axis. The MCC stays close to 0, which is to be expected. On the other hand, our other error metrics can take on fairly large values simply by guessing the dense graph. This random baseline should be kept in mind while interpreting the next set of figures.

We form figure 3.5 by sweeping $\lambda$ through $[\lambda_0, \lambda_\infty]$ on the horizontal axis. It is clear that there is some value of $\lambda$ which is optimal$^7$ for a particular instance of the problem. From the figures we can also see the tendency for the best possible performance to improve as we obtain more samples, with $T = 20000$ being enough to almost perfectly reconstruct the underlying graph.

Remark 32. For the simulations of figure 3.5 there are $n = 75$ nodes, which corresponds to as many as 5625 possible directed edges (including self edges), and with $p = 3$ there are 16875 parameters in the model. With only $T = 20$ samples, the DWGLASSO performs at best marginally better than randomly guessing, and given such a limited quantity of data, there isn't much that can be done. Given only $T = 200$ samples, a regime in which (since $\operatorname{rk} Z \le T < np = 225$) the OLS or Tikhonov

7The notion of “optimal” here depends on how much one cares about the difference between false negatives and false positives. And, it may also be the case that there is no one single optimal λ, but some Pareto optimal interval.


Figure 3.5: Support Recovery, $\lambda \in [10^{-5}, 10^{-1.5}]$

(a) T = 20 (b) T = 200

(c) T = 2000 (d) T = 20000

solutions still fail to even exist, we see a discernible improvement in the $F1$ score and the MCC over randomness. Observing $T = 2000$ samples (still fewer than the number of parameters) is more than enough to obtain a reasonable level of accuracy, and $T = 20000$ samples is enough for almost perfect recovery.

Edge Intensity

One of the most significant difficulties for support recovery is the choice of the parameter $\lambda$ which leads to the best recovery. Through our experiments, it is absolutely clear that the standard method of choosing $\lambda$ by cross validation on the 1-step ahead prediction task leads to a choice of $\lambda$ which is much too small for the task of support recovery. And, since we don't have access to the true underlying graph a priori, it isn't clear if we can do any sort of cross validation to choose $\lambda$. As an alternative to picking a fixed $\lambda$, we propose to make a final estimate based on an “edge intensity” matrix, which we define as

\[
\hat{G}_\Lambda = \frac{1}{|\Lambda|}\sum_{\lambda \in \Lambda} \hat{G}_\lambda, \qquad (3.41)
\]

where Λ ⊆ [λ0, λ∞] discretizes the interval on a logarithmic scale. The final estimate is then given by the choice of a threshold T ∈ [0, 1] as

\[
\hat{G} \leftarrow (\hat{G}_\Lambda \ge T), \qquad (3.42)
\]
where we interpret the inequality 3.42 as an element-wise logical operation. In figure 3.6 we plot ROC curves for this particular scheme. Finally, since an ROC curve is not a perfect measure of performance when the base rate is skewed away from 50%, we provide a similar plot of the Matthews correlation coefficient in figure 3.7.
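A sketch of the edge intensity estimate 3.41 and the thresholding rule 3.42, assuming a routine fit_dwglasso(Z, Y, lam) that returns the estimated coefficient matrix (a stand-in for algorithm 1) and the row ordering used in the earlier sketches:

import numpy as np

def edge_intensity(fit_dwglasso, Z, Y, n, p, lambdas):
    # Average the estimated supports over a grid of lambda values (equation 3.41).
    G_sum = np.zeros((n, n))
    for lam in lambdas:
        B = fit_dwglasso(Z, Y, lam)                       # (n*p, n) coefficient estimate
        for i in range(n):
            for j in range(n):
                if np.linalg.norm(B[j::n, i]) > 0:        # edge j -> i present at this lambda
                    G_sum[i, j] += 1.0
    return G_sum / len(lambdas)

# Final estimate by thresholding the intensity matrix (equation 3.42), for example:
# lambdas = np.logspace(-5, -1.5, 25); G_hat = edge_intensity(...) >= 0.5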


Figure 3.6: ROC Curves for Equation 3.42

Figure 3.7: Matthews Correlation Coefficient for Equation 3.42

Chapter 4

Applications

4.1 Applications from the Literature

In this section we touch on some of the ways in which Granger-causality is applied in practice.

4.1.1 Finance and Economics

The ideas of Granger-causality began from motivations in economics, and the potential applications are boundless; see for example the discussion at the beginning of section 3.3.2. A particular application we would like to point out here is given in [15], which seeks to measure the connectedness of the financial sector and to link this to systemic risk. The authors gathered data relevant to a wide variety of financial firms (e.g. banks, insurance companies, hedge funds) and applied measurements of Granger-causality to estimate the interactions between firms. Figure 4.1 provides the results of the Granger-causal analysis in [15]. The upshot here is that an affirmative answer to the qualitative question “are financial systems more interdependent than they were in the past?” is given using Granger-causality analysis. Furthermore, the authors make useful insights using linear measures, even though the underlying systems and interactions are by no means linear in reality.

Figure 4.1: Qualitative Measures of Financial Sector Connectedness [15]

(a) January 1994 - December 1996 (b) January 2006 - December 2008


Figure 4.2: Gene Regulatory Network Inferred by [16] through Granger-causality and the LASSO

4.1.2 Neuroscience

In neuroscience, researchers are ultimately interested in determining how and why the brain works. A common experimental approach to this problem is to use various methods of imaging the brain, examining blood flow for example, and to compare the observed patterns with the observable behaviour of the test subject. In this way, researchers test and hypothesize about the function of particular regions (e.g. visual cortex or auditory cortex) of the brain. The review article [48] states the goal that “a key challenge in neuroscience and, in particular, neuroimaging, is to move beyond identification of regional activations toward the characterization of functional circuits underpinning perception, cognition, behavior, and consciousness.” Granger-causality is being applied to reach these ends by identifying, via Granger-causal modeling, “functional” connections in the brain.

4.1.3 Biology

We consider the results of the paper by Fujita et al. [16], who explain that “In order to understand cell functioning as a whole, it is necessary to describe, at the molecular level, how gene products interact with each other”. We first must point out that the common notion that one particular gene does one particular thing is extremely inaccurate. In reality, the vast number of genes that make up our DNA participate in highly complex interactions and feedback loops, not only producing various proteins, but also suppressing and activating other genes. In examining gene regulatory networks, the number of genes ($n$) one may wish to examine may be very large, and since carrying out experiments is a rather arduous process, the number of samples ($T$) is likely to be very small. The authors of [16] applied sparsity inducing regularization (the LASSO) to VAR(1) models of gene regulatory networks, along with some additional statistical methods to try to control the false positive rate. The results are reproduced here in figure 4.2.


Figure 4.3: Inferred causality graph. Direction of each edge from west (left) to east (right) or from east to west is indicated by color and line style. The transparency of each edge is weighted by the edge intensity.

Certainly the results here must be taken with a grain of salt. As we noted, the underlying interactions are fantastically complicated, and are certainly not going to be completely captured by a VAR model. However, the results of Granger-causal analyses, like those of [16], may serve as reasonable guides for further investigation. There are a truly enormous number of possible experiments, and any technique which helps discern which ones may bear fruit deserves some consideration.

4.2 DWGLASSO Applied to CWEEDS Temperature Data

The Canadian weather, energy, and engineering data set (CWEEDS) provides (among other things) hourly temperature data for a large number of Canadian locations. We have selected $n = 165$ locations, chosen based on availability of data over a consistent span of time starting on January 1st, 1980, and proceeding for $T = 1600$ hours until March 6th. We apply the elastic net DWGLASSO to this dataset with fixed $\alpha = 0.1$, $p = 2$. Note that there are 27225 possible edges in the causality graph and 54450 model parameters, although with $np = 330$, the OLS solution still exists. We have chosen to use temperature data because geographic considerations can give some intuition about what the “true” underlying Granger-causality graph should look like, but the data is still more realistic than the synthetic examples of section 3.6. The spirit of this application example is similar to that of the simulations in section 3.6. That is, the point is to test the potential usefulness of DWGLASSO in practice, not to discover any new or interesting patterns in weather data. The tools we have used for this application are part of Python's scientific computing stack [49].

We have applied some limited preprocessing steps to our dataset, first interpolating a number of missing points$^1$, properly aligning the time stamps of each series (CWEEDS provides local times), and filtering out perfectly predictable yearly and daily temperature trends (and harmonics thereof). The last preprocessing step is important for the application of autoregressive models, since we assume there are no purely deterministic, or “trend”, components in the data (see theorem 2.7 and the brief discussion that follows).

The inferred graph is given in figure 4.3. We have weighted the transparency of the edges by the edge intensity $G_\Lambda$ where $\Lambda \subseteq [0.01, 10]$. There are clearly a great number of spurious edges having a low intensity (edges inferred with a small $\lambda$), but the edges with a high intensity correspond fairly consistently to weather stations in close proximity.

1It is important to ensure any preprocessing steps are themselves causal
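A sketch of the last preprocessing step, removing the yearly and daily cycles (and a few harmonics) by least squares regression on sinusoids. The number of harmonics is an illustrative choice, and note that this simple version fits the trend over the whole sample, so it is not strictly causal in the sense cautioned in the footnote; it is included only to make the step concrete.

import numpy as np

def remove_seasonal_trends(x, periods_hours=(24.0, 8766.0), n_harmonics=3):
    # Project out deterministic daily and yearly cycles (and harmonics) from an hourly series x.
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    cols = [np.ones_like(t)]
    for period in periods_hours:
        for h in range(1, n_harmonics + 1):
            w = 2.0 * np.pi * h / period
            cols.append(np.cos(w * t))
            cols.append(np.sin(w * t))
    F = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(F, x, rcond=None)
    return x - F @ coef                          # residual with the trend components removed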

Chapter 5

Conclusion

5.1 Further Research

5.1.1 Theoretical Results for Support Recovery

Consider again the true underlying causality graph $G^\star$ inferred from the set of parameters $B^\star$, optimal for the loss function over the ensemble, and the estimated graph $\hat{G}$ obtained from the DWGLASSO method. It is an important question whether or not, and under what conditions, it can be shown that $\mathbb{P}\{\hat{G}_{\lambda_T} = G^\star\} \to 1$ as $T \to \infty$, for some sequence $\lambda_T$. A result of this nature is given by Nardi [50] for VAR(1) models, which requires application only of the LASSO, rather than the grouped variation.

More generally, we seek to estimate the probability of errors $\mathbb{P}\{\gamma_0(\hat{G}_\lambda - G^\star) \le k\}$, and determine how this relates to the structure of the underlying graph $G^\star$, the coefficient matrices, and the statistics of the error terms $e(t)$. The highly influential paper [42] established a technique for approaching these problems referred to as the “primal dual witness” (PDW) construction, which has been applied in a wide variety of papers (including [50]). Application of this proof method to the DWGLASSO problem leads to a condition analogous to the mutual incoherence condition (equation 3.30), and is also tightly connected with the dual norm and the induced operator norm of the depth wise regularizer $\Gamma_{DW}$ which we have worked out in section 3.4.1. It is, however, a difficult task to tie together the conditions arrived at naturally through application of the PDW with the structure of $G^\star$ and $B^\star$.

5.1.2 Choice of Hyper-parameters

The choice of the parameters $\lambda$, $\alpha$, and $\mu$ in the DWGLASSO has significant effects on the inferred causality graph. It is generally reasonable to hold $\alpha$ and $\mu$ fixed, varying only $\lambda$, so we focus the discussion on this parameter. As we noted previously, it is known that applying cross validation on the 1-step ahead prediction task tends to yield a $\lambda_{CV}$ which is much smaller than what is optimal for support recovery. A consistent and provably statistically efficient scheme for choosing a fixed $\lambda$ (or an edge intensity threshold $T$) is highly desirable. In the case of the LASSO, the error bounds can be useful for guiding the choice of $\lambda$. However, to our knowledge, the theoretical properties of cross validation schemes are not extremely well understood, especially for the support recovery task. The choice of $\lambda$ is an important question to answer.

5.1.3 Model Perturbation

One of the most commonly acknowledged limitations of standard Granger-causality is that realistic systems are neither linear nor stationary. The most immediate way to tackle this difficulty is to formulate Granger-causality for non-linear and/or non-stationary processes. Alternatively, since we are interested only in the support of the underlying causality graph, we may define $G^\star$ with respect to a much more expressive model, but continue to estimate $\hat{G}$ as if the process were indeed generated by a linear model. We can then still study conditions for which $\hat{G} \approx G^\star$. For instance, consider a non-stationary $L_2^n$ process $x(t)$, with

\[
\mathbb{E}\, x(t)x(t)^T = R(t), \qquad \mathbb{E}\, x(t)x(t-1)^T = r(t).
\]
The statistically optimal linear estimate of $x(t)$ given $x(t-1)$ is $\hat{x}(t) = B^\star(t)x(t-1)$ with $B^\star(t) = r(t)R(t)^{-1}$. The underlying graph $G^\star(t)$ is in general time varying, but if $\operatorname{supp}(B^\star(t))$ is constant, then it is clear how to define a consistent causality graph $G^\star$. Now, if we had assumed that $x(t)$ were actually stationary then (proceeding informally), we may have estimated by OLS

\[
\hat{B} = \left(\frac{1}{T}\sum_{t=1}^{T} x(t)x(t-1)^T\right)\left(\frac{1}{T}\sum_{t=1}^{T} x(t)x(t)^T\right)^{-1} \overset{(a)}{\approx} \left(\sum_{t=1}^{T} r(t)\right)\left(\sum_{t=1}^{T} R(t)\right)^{-1},
\]
where in (a) we are referring to single sample estimates of the statistics. Consider then the error sequence $\Delta(t)$ such that

\[
B^\star(t) = \left(\sum_{s=1}^{T} r(s)\right)\left(\sum_{s=1}^{T} R(s)\right)^{-1} + \Delta(t).
\]
If $\Delta(t)$ is small for every $t$, then we would suspect that the correct graph structure should still be recovered, even if we applied a model which assumed $\Delta = 0$.

5.2 Summary

In this thesis we have discussed two main topics of interest to the study of causality networks. Firstly, we have described in section 2.2.1 a framework in which a generalized Granger-causality maintains the spirit of much of the early work on the topic, but in which practitioners may attempt to apply more expressive models. This framework furthermore makes more explicit the connections between Granger-causality and regularized estimation of VAR models. Secondly, we have derived in chapter 3 algorithms to fit vector auto-regressive models with the depth-wise grouped sparsity pattern (particularly well suited to working with causality graphs), and provided some experimental validation via simulation (section 3.6) and an application to weather data (section 4.2).

Bibliography

[1] Thomas Kjeller Johansen, Desmond Lee, et al. Timaeus and Critias. Penguin UK, 2008. [2] Stephen Mumford and Rani Lill Anjum. Causation: a very short introduction. Oxford University Press, 2013. [3] Andrea Falcon. Aristotle on Causality. 2006. url: https://plato.stanford.edu/entries/aristotle-causality/. [4] Bertrand Russell. “On the Notion of Cause”. In: Proceedings of the Aristotelian Society 13 (1912), pp. 1–26. issn: 00667374, 14679264. url: http://www.jstor.org/stable/4543833. [5] Yitzhak Y. Melamed and Martin Lin. “Principle of Sufficient Reason”. In: The Stanford Encyclopedia of Philosophy. Ed. by Edward N. Zalta. Spring 2017. Metaphysics Research Lab, Stanford University, 2017. [6] Henri Poincaré. Science and hypothesis. Science Press, 1905. [7] Judea Pearl. “The art and science of cause and effect”. In: Causality: models, reasoning and inference (2000), pp. 331–358. [8] Judea Pearl. Causality. Cambridge University Press, 2009. [9] C.W.J. Granger. “Testing for causality: A personal viewpoint”. In: Journal of Economic Dynamics and Control 2 (1980), pp. 329–352. issn: 0165-1889. doi: 10.1016/0165-1889(80)90069-X. url: http://www.sciencedirect.com/science/article/pii/016518898090069X. [10] John Geweke. “Measurement of linear dependence and feedback between multiple time series”. In: Journal of the American Statistical Association 77.378 (1982), pp. 304–313. [11] John F. Geweke. “Measures of Conditional Linear Dependence and Feedback between Time Series”. In: Journal of the American Statistical Association 79.388 (1984), pp. 907–915. doi: 10.1080/01621459.1984.10477110. url: http://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10477110. [12] Olivier David et al. “Identifying neural drivers with functional MRI: an electrophysiological validation”. In: PLoS Biol 6.12 (2008), e315. url: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0060315.


[13] Lionel Barnett and Anil K. Seth. “Detectability of Granger causality for subsampled continuous-time neurophysiological processes”. In: Journal of Neuroscience Methods 275 (2017), pp. 93–121. issn: 0165-0270. doi: 10.1016/j.jneumeth.2016.10.016. url: http://www.sciencedirect.com/science/article/pii/S0165027016302564. [14] Alberto Porta and Luca Faes. “Wiener–Granger causality in network physiology with applications to cardiovascular control and neuroscience”. In: Proceedings of the IEEE 104.2 (2016), pp. 282–309. [15] Monica Billio et al. Econometric Measures of Systemic Risk in the Finance and Insurance Sectors. Working Paper 16223. National Bureau of Economic Research, 2010. doi: 10.3386/w16223. url: http://www.nber.org/papers/w16223. [16] André Fujita et al. “Modeling gene expression regulatory networks with the sparse vector autoregressive model”. In: BMC Systems Biology 1.1 (2007), p. 39. issn: 1752-0509. doi: 10.1186/1752-0509-1-39. url: http://dx.doi.org/10.1186/1752-0509-1-39. [17] Michail Misyrlis et al. “Sparse Causal Temporal Modeling to Inform Power System Defense”. In: Procedia Computer Science 95 (2016). Complex Adaptive Systems, Los Angeles, CA, November 2-4, 2016, pp. 450–456. issn: 1877-0509. doi: 10.1016/j.procs.2016.09.316. url: http://www.sciencedirect.com/science/article/pii/S1877050916324899. [18] Tao Yuan and S Joe Qin. “Root cause diagnosis of plant-wide oscillations using Granger causality”. In: Journal of Process Control 24.2 (2014), pp. 450–459. [19] John K Hunter and Bruno Nachtergaele. Applied analysis. World Scientific Publishing Co Inc, 2001. [20] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of convex analysis. Springer Science & Business Media, 2012. [21] Ivar Ekeland and Roger Temam. Convex analysis and variational problems. SIAM, 1999. [22] Jean-Pierre Aubin. Optima and equilibria: an introduction to nonlinear analysis. Vol. 140. Springer Science & Business Media, 2013. [23] Geoffrey Grimmett and David Stirzaker. Probability and random processes. Oxford University Press, 2001. [24] Anders Lindquist and Giorgio Picci. Linear Stochastic Systems. Springer. [25] T. Kailath, A.H. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall Information and System Sciences Series. Prentice Hall, 2000. isbn: 9780130224644. url: https://books.google.ca/books?id=zNJFAQAAIAAJ. [26] Sheldon M Ross. Introduction to probability models. Academic Press, 2014. [27] Peter J Brockwell and Richard A Davis. Time series: theory and methods. Springer Science & Business Media, 2013. [28] Martin Vetterli, Jelena Kovačević, and Vivek K Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.


[29] Yuejia He, Yiyuan She, and Dapeng Wu. “Stationary-sparse causality network learning”. In: Journal of Machine Learning Research 14.1 (2013), pp. 3073–3104. [30] C. W. J. Granger. “Investigating Causal Relations by Econometric Models and Cross-spectral Methods”. In: Econometrica 37.3 (1969), pp. 424–438. issn: 00129682, 14680262. url: http://www.jstor.org/stable/1912791. [31] Y. Rozanov. Stationary Random Processes. Holden-Day, 1967. [32] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. Springer Series in Statistics. Springer, Berlin, 2001. [33] Sahand Negahban et al. “A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers”. In: Advances in Neural Information Processing Systems. 2009, pp. 1348–1356. [34] Stephen M. Stigler. “Gauss and the Invention of Least Squares”. In: Ann. Statist. 9.3 (May 1981), pp. 465–474. doi: 10.1214/aos/1176345451. url: http://dx.doi.org/10.1214/aos/1176345451. [35] William H Greene. “Econometric analysis, 4th edition”. In: International edition, New Jersey: Prentice Hall (2000). [36] Robert Tibshirani. “Regression shrinkage and selection via the lasso”. In: Journal of the Royal Statistical Society. Series B (Methodological) (1996), pp. 267–288. [37] Emmanuel J Candès, Justin Romberg, and Terence Tao. “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”. In: IEEE Transactions on Information Theory 52.2 (2006), pp. 489–509. [38] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, 2015. isbn: 9781498712163. url: https://books.google.ca/books?id=LnUIrgEACAAJ. [39] Ming Yuan and Yi Lin. “Model selection and estimation in regression with grouped variables”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.1 (2006), pp. 49–67. [40] Stefan Haufe et al. “Sparse causal discovery in multivariate time series”. In: Proceedings of the 2008 International Conference on Causality: Objectives and Assessment - Volume 6. JMLR.org, 2008, pp. 97–106. [41] R Tyrrell Rockafellar and RJB Wets. Variational Analysis. Springer: Grundlehren der Math. Wissenschaften, 1998. [42] Martin J Wainwright. “Sharp thresholds for High-Dimensional and noisy sparsity recovery using ℓ1-Constrained Quadratic Programming (Lasso)”. In: IEEE Transactions on Information Theory 55.5 (2009), pp. 2183–2202. [43] Stephen Boyd and Almir Mutapcic. “Subgradient methods”. In: Lecture notes of EE364b, Stanford University, Winter Quarter 2007 (2006). [44] Neal Parikh, Stephen Boyd, et al. “Proximal algorithms”. In: Foundations and Trends in Optimization 1.3 (2014), pp. 127–239.


[45] Stephen Boyd et al. “Distributed optimization and statistical learning via the alternating direction method of multipliers”. In: Foundations and Trends in Machine Learning 3.1 (2011), pp. 1–122. [46] Jan R Magnus, Heinz Neudecker, et al. “Matrix differential calculus with applications in statistics and econometrics”. In: (1995). [47] Hui Zou and Trevor Hastie. “Regularization and variable selection via the elastic net”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005), pp. 301–320. [48] Anil K Seth, Adam B Barrett, and Lionel Barnett. “Granger causality analysis in neuroscience and neuroimaging”. In: Journal of Neuroscience 35.8 (2015), pp. 3293–3297. [49] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. 2001–. url: http://www.scipy.org/. [50] Y. Nardi and A. Rinaldo. “Autoregressive process modeling via the Lasso procedure”. In: Journal of Multivariate Analysis 102.3 (2011), pp. 528–549. issn: 0047-259X. doi: 10.1016/j.jmva.2010.10.012. url: http://www.sciencedirect.com/science/article/pii/S0047259X10002186.
