
Precision estimation in Gaussian graphical models

Michał Makowski

Wydział Matematyki i Informatyki, Uniwersytet Wrocławski
[email protected]

June 4, 2020

Presentation plan

1. Overview
   - Intuition and examples
   - Background
   - Implementation
   - Results

2. Theory
   - Factorization
   - Multiple testing
   - Alternating direction method of multipliers

3. Even more theory

Graphical models

Graphical model theory generalizes and describes a broad range of statistical models:

- Markov models and hidden Markov models
- Bayesian networks
- Kalman filters
- neural networks

Intuitions

Graphical models bind together probability and graph theory:

- probability: uncertainty/randomness
- graph theory: dependence/correlation

Directed graph

[Figure: two toy directed graphical models. (a) The 'Garden' model: Rain and Sprinkler point to Wet grass, with e.g. P(sprinkler = off | rain) = 0.8. (b) The 'Car' model: Lack of fuel and Discharged battery point to Stalled engine.]

Undirected graph [1/3]

Undirected graph [2/3]

Undirected graph [3/3]

Gaussian distribution factorization

Any multivariate normal distribution N(µ, Σ) can be parameterized by the canonical parameters γ = Σ^{-1}µ and Θ = Σ^{-1}.

If X ∼ N(µ, Σ) factorizes according to some graph G, then θ_st = 0 for any pair (s, t) ∉ E.

This sets up a correspondence between the zero pattern of the matrix Θ and the edge pattern of the underlying graph. In particular, if θ_st = 0, then the variables s and t are conditionally independent given the other variables.
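A minimal numeric check of this correspondence in R (my own illustration, not from the slides): with θ_13 = 0, the conditional covariance of (X_1, X_3) given X_2, obtained via the Schur complement, has a zero off-diagonal entry.

```r
# Zero in Theta <=> conditional independence of the corresponding pair.
Theta <- matrix(c(2.0, 0.5, 0.0,
                  0.5, 2.0, 0.5,
                  0.0, 0.5, 2.0), nrow = 3, byrow = TRUE)  # theta_13 = 0
Sigma <- solve(Theta)
idx <- c(1, 3)
# Conditional covariance of (X1, X3) given X2 via the Schur complement:
cond_cov <- Sigma[idx, idx] -
  Sigma[idx, 2, drop = FALSE] %*% solve(Sigma[2, 2]) %*% Sigma[2, idx, drop = FALSE]
cond_cov[1, 2]  # ~0, so X1 and X3 are conditionally independent given X2
```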

Graph and matrix correspondence

[Figure: (c) an undirected graph G on six vertices; (d) the associated sparsity pattern of the precision matrix Θ, where white squares correspond to zero entries.]

MLE...

Let S be the sample covariance matrix.

MLE problem

\[
\widehat{\Theta}_{ML} \in \arg\max_{\Theta \in \mathcal{S}^p_+}\left\{\log\det\Theta - \mathrm{tr}(S\Theta)\right\}
\]

The MLE exists iff the matrix S is nonsingular. Then the solution to the problem above is simply given by Θ̂_ML = S^{-1}.

...and its problems

If the number of variables p is comparable to or greater than the number of observations N, then the sample covariance matrix S is singular, so the MLE does not exist.

Moreover, it is obvious that P(∃_{i,j} σ̂_ij = 0) = 0, but in many applications a sparse solution is demanded.

Regularization

The number of edges can be controlled by the ℓ0-based quantity
\[
\rho_0(\Theta) = \sum_{s \neq t} \mathbb{I}[\theta_{st} \neq 0].
\]
Note that ρ_0(Θ) = 2|E(G)| for a given graph G.

ℓ0-based problem
\[
\widehat{\Theta} \in \arg\max_{\Theta \in \mathcal{S}^p_+,\; \rho_0(\Theta) \le k}\left\{\log\det\Theta - \mathrm{tr}(S\Theta)\right\}
\]

Unfortunately, the ℓ0-based constraint defines a highly nonconvex constraint set, which makes the problem hard to solve.

Convex regularization

A convex relaxation of the ℓ0-based constraint leads to
\[
\mathcal{L}_\lambda(\Theta, X) = \log\det\Theta - \mathrm{tr}(S\Theta) - F(\Theta),
\]
where F(·) denotes any convex penalty function suitable for the considered problem.

Main motivation - FDR control

The performance of every binary classifier can be summarized in a confusion matrix:

                      Real value (+)    Real value (−)
Test outcome (+)      True positive     False positive
Test outcome (−)      False negative    True negative

(Local) False discovery rate (FDR)
\[
\mathrm{FDR} = \mathbb{E}\left[\frac{\#[\text{False positive}]}{\#[\text{False positive}] + \#[\text{True positive}]}\right],
\qquad
\mathrm{localFDR} = \mathbb{E}\left[\frac{\#[\text{False positive outside the component}]}{\#[\text{False positive}] + \#[\text{True positive}]}\right].
\]
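As a concrete illustration, the empirical FDR of an estimated sparsity pattern against the true one can be computed as follows (a hypothetical helper of my own, not from the slides):

```r
# Hypothetical helper: empirical FDR of an estimated sparsity pattern.
empirical_fdr <- function(theta_hat, theta_true) {
  est  <- (theta_hat  != 0) & upper.tri(theta_hat)   # discovered edges
  real <- (theta_true != 0) & upper.tri(theta_true)  # true edges
  fp <- sum(est & !real)
  tp <- sum(est & real)
  if (fp + tp == 0) 0 else fp / (fp + tp)
}
```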

Penalty

Lambda based on multiple testing theory:

- gLasso: Bonferroni correction (Banerjee et al.)
- gSLOPE: Holm method
- gSLOPE: Benjamini-Hochberg procedure

Lambda comparison

[Figure: comparison of the penalty (lambda) sequences for the three procedures: gLasso (Banerjee), gSLOPE (BH), and gSLOPE (Holm).]

Algorithms

To solve the Graphical SLOPE problem we used the alternating direction method of multipliers (ADMM), which can solve convex problems of the form

\[
\text{minimize } f(x) + g(y) \quad \text{subject to } Ax + By = c.
\]

To solve the Graphical Lasso problem we used the algorithm proposed by Friedman et al. in their original paper on the method. Although we also derived an ADMM-based algorithm, it was orders of magnitude slower than the original one.

Implementation overview

- Implementation in R, with the huge package for simulations.
- Various graph structures: cluster, hub, and scale-free.
- Data: p = 100, n ∈ {50, 100, 200, 400}; different magnitude ratios; different sparsity and component sizes.
- Two levels of desired FDR control: 0.05 and 0.2.
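A minimal sketch of this simulation setup using the huge package; the exact generator settings used in the study may have differed (g = 10 components is an assumption matching the cluster setup):

```r
library(huge)

set.seed(1)
sim <- huge.generator(n = 200, d = 100, graph = "cluster", g = 10)
X <- sim$data            # n x p Gaussian sample
Theta_true <- sim$omega  # true precision matrix
A_true <- sim$theta      # true adjacency (sparsity) pattern
S <- cov(X)              # sample covariance fed to the estimators
```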

Cluster results

Setup: α = 0.2, 100 variables, 10 components, cluster graph, P(x_ij ≠ 0) = 0.5.

[Figure: FDR, localFDR, and power vs. sample size n ∈ {50, 100, 200, 400}, one panel per magnitude ratio MR ∈ {−1, 0, 1.25, 1.43, 1.67, 2.5, 3.33, 5}, for gLasso (Banerjee), gSLOPE (BH), and gSLOPE (Holm).]

Cluster ROC

Setup: scaled α = 0.05, 100 variables, 10 components, cluster graph, MR = 3.33, P(x_ij ≠ 0) = 0.5.

[Figure: ROC curves (TPR vs. FPR) for gLasso (Banerjee), gSLOPE (BH), and gSLOPE (Holm).]

Hub results

Setup: α = 0.2, 100 variables, hub graph.

[Figure: FDR, localFDR, and power vs. sample size n ∈ {50, 100, 200, 400}, in panels over #components ∈ {5, 10, 20} and MR ∈ {0, 1.43, 3.33}, for gLasso (Banerjee), gSLOPE (BH), and gSLOPE (Holm).]

Non-cluster ROC

Setup (left): scaled α = 0.05, 100 variables, 10 components, hub graph, MR = 3.33.
Setup (right): scaled α = 0.05, 100 variables, scale-free graph, MR = 3.33.

[Figure: two panels of ROC curves (TPR vs. FPR), one per setup, for gLasso (Banerjee), gSLOPE (BH), and gSLOPE (Holm).]

Let's go deeper into theory...

Conditional independence

Two random variables X, Y are conditionally independent with respect to a random vector Z = (Z_1, ..., Z_n)^T iff their conditional distributions given Z are independent. This relationship is denoted by the symbol ⊥⊥.

Formally, for all triples (x, y, z),
\[
(X \perp\!\!\!\perp Y) \mid Z \iff F_{X,Y\mid Z=z}(x, y) = F_{X\mid Z=z}(x)\cdot F_{Y\mid Z=z}(y),
\]
where F_{X,Y|Z=z}(x, y) is the conditional CDF of X and Y given Z = z.

Conditional independence and graph structure

For a family of multivariate probability distributions one can construct a graph that connects the PDF factorization with the conditional independence property:

\[
\text{node } A \text{ is not connected with node } B \iff X_A \perp\!\!\!\perp X_B \mid X_{-AB},
\]
where X_{-AB} denotes all random variables except X_A and X_B.

Conditional independence and precision matrix

The canonical parameterization of the multivariate Gaussian distribution connects the precision matrix with the conditional independence property.

Let (X_1, ..., X_n) ∼ N(γ, Θ); then
\[
X_s \perp\!\!\!\perp X_t \mid X_{-st} \iff \theta_{st} = 0,
\]
where X_{-st} denotes all random variables except X_s and X_t.

Graph structure and precision matrix

There is a connection between the underlying graph and the precision matrix structure:

\[
\text{node } s \text{ is not connected with node } t \iff \theta_{st} = 0,
\]
where s and t are variable indices of the multivariate Gaussian distribution.

Multiple testing

Bonferroni correction
Reject the null hypothesis for each p_i ≤ α/m; this controls the FWER at level α, that is, FWER ≤ α. No other assumptions are required.

Holm method
Sort the p-values in ascending order and find the first k for which p_(k) > α/(m+1−k); reject every hypothesis before it. The Holm method controls the FWER at level α.

Benjamini-Hochberg method
Sort the p-values in ascending order and find the largest k for which p_(k) ≤ αk/m; reject hypotheses 1, ..., k. The BH method controls the FDR at level α.
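All three procedures are available in base R via p.adjust; a toy sketch on made-up p-values:

```r
p <- c(0.001, 0.008, 0.020, 0.041, 0.300)  # toy p-values
alpha <- 0.05
p.adjust(p, method = "bonferroni") <= alpha  # Bonferroni: FWER <= alpha
p.adjust(p, method = "holm") <= alpha        # Holm: FWER <= alpha
p.adjust(p, method = "BH") <= alpha          # Benjamini-Hochberg: FDR <= alpha
```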

Graphical Lasso

The convex relaxation of the ℓ0-based constraint leads to

\[
\mathcal{L}_\lambda(\Theta, X) = \log\det\Theta - \mathrm{tr}(S\Theta) - \lambda\|\Theta\|_1,
\]
where ‖·‖_1 denotes the entrywise off-diagonal ℓ1-norm, ‖A‖_1 = Σ_{i≠j} |a_ij|.

Graphical Lasso problem

\[
\widehat{\Theta} \in \arg\max_{\Theta \in \mathcal{S}^p_+}\left\{\log\det\Theta - \mathrm{tr}(S\Theta) - \lambda\|\Theta\|_1\right\}.
\]
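A minimal sketch of solving this problem in R with the glasso package (placeholder data; the argument rho plays the role of λ):

```r
library(glasso)

set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)  # placeholder data, n = 200, p = 50
S <- cov(X)
fit <- glasso(S, rho = 0.1)  # rho is the lambda penalty level
Theta_hat <- fit$wi          # estimated (sparse) precision matrix
```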

Graphical Lasso parameter choice

Banerjee lambda for Graphical Lasso
\[
\lambda^{\text{Banerjee}}(\alpha) = \max_{i<j}\sqrt{s_{ii}\,s_{jj}}\;\frac{qt_{n-2}\!\left(1-\frac{\alpha}{2p^2}\right)}{\sqrt{n-2+qt_{n-2}\!\left(1-\frac{\alpha}{2p^2}\right)^{2}}} \tag{1}
\]

The following theorem was formulated by Banerjee et al.

Theorem
Using (1) as the penalty parameter in the Graphical Lasso problem, for any fixed level α we obtain
\[
\mathbb{P}(\text{False Discovery}) \le \alpha.
\]
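A sketch of computing (1) in R; the max over sqrt(s_ii s_jj) follows my reading of the (garbled) slide formula and of Banerjee et al., so treat it as an assumption:

```r
lambda_banerjee <- function(S, n, alpha) {
  p <- ncol(S)
  q <- qt(1 - alpha / (2 * p^2), df = n - 2)
  ss <- sqrt(outer(diag(S), diag(S)))  # sqrt(s_ii * s_jj) for every pair
  max(ss[upper.tri(ss)]) * q / sqrt(n - 2 + q^2)
}
```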

SLOPE

In [Bog+15], Bogdan et al. proposed a novel approach to regularization in regression analysis: SLOPE uses the OL1 norm instead of the L1 norm for coefficient selection.

OL1 norm
The penalty based on the sorted ℓ1 norm (known as OL1, OWL, or OSCAR), for β ∈ R^p and λ ∈ R^p with λ_1 ≥ ... ≥ λ_p, is defined as
\[
J_\lambda(\beta) = \sum_{i=1}^{p} \lambda_i\,|\beta|_{(i)}.
\]
It was shown that, under some assumptions and with the λ sequence constructed from the BH procedure, the proposed method controls the FDR in multivariate regression settings.
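Evaluating J_λ is a one-liner, since it only pairs the sorted |β| with the (nonincreasing) λ sequence:

```r
J_lambda <- function(beta, lambda) {
  # lambda must be nonincreasing: lambda_1 >= ... >= lambda_p
  sum(lambda * sort(abs(beta), decreasing = TRUE))
}
J_lambda(c(0.3, -1.2, 0.5), lambda = c(1.0, 0.7, 0.4))  # 1.2*1.0 + 0.5*0.7 + 0.3*0.4
```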

Graphical SLOPE

In graphical SLOPE, the L1 norm from the graphical Lasso is replaced by the off-diagonal OL1 norm:

\[
\mathcal{L}_\lambda(\Theta, X) = \log\det\Theta - \mathrm{tr}(S\Theta) - J_\lambda(\Theta).
\]

Graphical SLOPE problem

\[
\widehat{\Theta} \in \arg\max_{\Theta \in \mathcal{S}^p_+}\left\{\log\det\Theta - \mathrm{tr}(S\Theta) - J_\lambda(\Theta)\right\},
\]

In [Sob19], P. Sobczyk showed that OL1 brings promising results in terms of FDR control.

Graphical SLOPE parameter choice (1/2)

Holm lambda for Graphical SLOPE
\[
m = \frac{p(p-1)}{2}, \qquad
\lambda_k^{\text{Holm}} = \frac{qt_{n-2}\!\left(1 - \frac{\alpha k}{m}\right)}{\sqrt{n - 2 + qt_{n-2}\!\left(1 - \frac{\alpha k}{m}\right)^{2}}}, \qquad
\lambda^{\text{Holm}} = \{\lambda_1^{\text{Holm}}, \lambda_2^{\text{Holm}}, \ldots, \lambda_m^{\text{Holm}}\}.
\]

It is based on the Holm method for multiple testing.

Graphical SLOPE parameter choice (2/2)

BH lambda for Graphical SLOPE
\[
m = \frac{p(p-1)}{2}, \qquad
\lambda_k^{\text{BH}} = \frac{qt_{n-2}\!\left(1 - \frac{\alpha}{m+1-k}\right)}{\sqrt{n - 2 + qt_{n-2}\!\left(1 - \frac{\alpha}{m+1-k}\right)^{2}}}, \qquad
\lambda^{\text{BH}} = \{\lambda_1^{\text{BH}}, \lambda_2^{\text{BH}}, \ldots, \lambda_m^{\text{BH}}\}.
\]

It is based on the Benjamini-Hochberg procedure for multiple testing.
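Both λ sequences, exactly as reconstructed from the two slides above (the labels follow the slides verbatim), in one sketch:

```r
gslope_lambda <- function(p, n, alpha, method = c("holm", "bh")) {
  method <- match.arg(method)
  m <- p * (p - 1) / 2
  k <- seq_len(m)
  q <- switch(method,
              holm = qt(1 - alpha * k / m,       df = n - 2),
              bh   = qt(1 - alpha / (m + 1 - k), df = n - 2))
  q / sqrt(n - 2 + q^2)  # nonincreasing in k, as SLOPE requires
}
```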

Precursors

The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle.

The precursors of ADMM:

- dual ascent and dual decomposition algorithms (decomposability properties)
- the method of multipliers (convergence properties)

ADMM

The ADMM algorithm can solve problems of the form
\[
\text{minimize } f(x) + g(y) \quad \text{subject to } Ax + By = c,
\]
where the functions f and g are assumed convex. The augmented Lagrangian with parameter ρ > 0 is defined as
\[
L_\rho(x, y, \nu) = f(x) + g(y) + \nu^T(Ax + By - c) + \frac{\rho}{2}\|Ax + By - c\|^2.
\]
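For completeness (this is the standard scheme, per Boyd et al.), the ADMM iterations alternate block minimizations of the augmented Lagrangian with a dual ascent step:
\[
x^{k+1} = \arg\min_x L_\rho(x, y^k, \nu^k), \qquad
y^{k+1} = \arg\min_y L_\rho(x^{k+1}, y, \nu^k), \qquad
\nu^{k+1} = \nu^k + \rho\,(Ax^{k+1} + By^{k+1} - c).
\]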

ADMM for gSLOPE

The optimization problem for graphical SLOPE is of the form

\[
\text{minimize } -\log\det\Theta + \mathrm{tr}(S\Theta) + \mathbb{I}[\Theta \succ 0] + J_\lambda(Y) \quad \text{subject to } Y = \Theta.
\]

The augmented Lagrangian L_ρ : R^{p×p} × R^{p×p} × R^{p×p} → R with parameter ρ > 0 is given by
\[
L_\rho(\Theta, Y, N) = -\log\det\Theta + \mathrm{tr}(S\Theta) + \mathbb{I}[\Theta \succ 0] + J_\lambda(Y) + \rho\langle N, \Theta - Y\rangle + \frac{\rho}{2}\|\Theta - Y\|_F^2.
\]

Bibliography

Onureena Banerjee, Laurent El Ghaoui, and Alexandre d'Aspremont. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. 2008.

[Bog+15] Małgorzata Bogdan et al. SLOPE - Adaptive variable selection via convex optimization. 2015.

Stephen Boyd et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. 2010.

Emmanuel Candès. Advanced Topics in Convex Optimization.

Trevor Hastie et al. Statistical Learning with Sparsity: The Lasso and Generalizations. 2015.

[Sob19] Piotr Sobczyk. Identifying low-dimensional structures through model selection in high-dimensional data. 2019.

Let's dive even deeper...

Factorization theorem

Compatibility function

Let G = (V, E) be a graph with vertex set V = {1, 2, ..., p} and let 𝒞 be its clique set. Let X = (X_1, ..., X_p) be a random vector defined on a probability space (Ω, F, P), indexed by the graph nodes.

Definition (Compatibility function)

Let C ∈ 𝒞 be a clique of the graph G and let X_C be the subvector of the vector X indexed by the elements of the clique C, that is, X_C = (X_s, s ∈ C). A real-valued function ψ_C of the vector X_C taking positive real values is called a compatibility function.

Factorization property

Definition (Factorization)


Given a collection of compatibility functions, we say that P factorizes over G if it has the decomposition
\[
\mathbb{P}(x_1, \ldots, x_n) = \frac{1}{Z}\prod_{C \in \mathcal{C}} \psi_C(x_C), \tag{2}
\]
where Z is the normalizing constant, known as the partition function. It is given by

\[
Z = \sum_{x}\prod_{C \in \mathcal{C}} \psi_C(x_C), \tag{3}
\]
where the sum goes over all possible realizations of X.

Markov property

Consider a cut set S of the given graph and recall that ⊥⊥ denotes the relation "is conditionally independent of". With this notation, we say that the random vector X is Markov with respect to G if
\[
X_A \perp\!\!\!\perp X_B \mid X_S \quad \text{for all cut sets } S \subset V, \tag{4}
\]

where X_A denotes the subvector indexed by the subgraph A, and A, B are the components separated by the cut set S.

Canonical formulation


Any nondegenerate multivariate normal distribution N(µ, Σ) can be reparameterized into canonical parameters of the form

\[
\gamma = \Sigma^{-1}\mu \quad \text{and} \quad \Theta = \Sigma^{-1}.
\]

Then the density function is given by
\[
\mathbb{P}_{\gamma,\Theta}(x) = \exp\left\{\sum_{s=1}^{p}\gamma_s x_s - \frac{1}{2}\sum_{s,t=1}^{p}\theta_{st}\,x_s x_t - A(\gamma, \Theta)\right\},
\]
where $A(\gamma, \Theta) = -\frac{1}{2}\left(\log\det[(2\pi)^{-1}\Theta] - \gamma^T\Theta^{-1}\gamma\right)$.

Canonical formula derivation

\begin{align*}
\mathbb{P}_{\mu,\Sigma}(x) &= \sqrt{\det[2\pi\Sigma]}^{\,-1} \exp\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) \\
&= \sqrt{\det[(2\pi\Sigma)^{-1}]}\, \exp\left(-\tfrac{1}{2}x^T \Sigma^{-1} x + x^T \Sigma^{-1}\mu - \tfrac{1}{2}\mu^T \Sigma^{-1} \mu\right) \\
&= \sqrt{\det[(2\pi)^{-1}\Theta]}\, \exp\left(-\tfrac{1}{2}x^T \Theta\, x + x^T \gamma - \tfrac{1}{2}\gamma^T \Theta^{-1} \gamma\right) \\
&= \exp\left(-\tfrac{1}{2}x^T \Theta\, x + x^T \gamma - \left(-\tfrac{1}{2}\log\det[(2\pi)^{-1}\Theta] + \tfrac{1}{2}\gamma^T \Theta^{-1} \gamma\right)\right) \\
&= \exp\left(-\tfrac{1}{2}x^T \Theta\, x + x^T \gamma - A(\gamma, \Theta)\right) \\
&= \mathbb{P}_{\gamma,\Theta}(x)
\end{align*}

Log-likelihood derivation

Log-likelihood derivation (1/2)

Assuming centered data (so γ = 0 and A(Θ) = −½ log det[(2π)^{-1}Θ]),
\begin{align*}
\mathcal{L}(\Theta, X) &= \frac{1}{N}\sum_{i=1}^{N}\log \mathbb{P}_{\Theta}(x_i) \\
&= \frac{1}{N}\sum_{i=1}^{N}\left(-\frac{1}{2}x_i^T \Theta\, x_i - A(\Theta)\right) \\
&= \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\log\det[(2\pi)^{-1}\Theta] - \frac{1}{2}x_i^T \Theta\, x_i\right) \\
&= \frac{1}{2}\log\det[(2\pi)^{-1}\Theta] - \frac{1}{2N}\sum_{i=1}^{N} x_i^T \Theta\, x_i \\
&= \frac{1}{2}\log\det\Theta - \frac{p}{2}\log 2\pi - \frac{1}{2N}\sum_{i=1}^{N} x_i^T \Theta\, x_i = \ldots
\end{align*}

Log-likelihood derivation (2/2)

\begin{align*}
\ldots &= \frac{1}{2}\log\det\Theta - \frac{p}{2}\log 2\pi - \frac{1}{2N}\sum_{i=1}^{N} x_i^T \Theta\, x_i \\
&= \frac{1}{2}\log\det\Theta - \frac{p}{2}\log 2\pi - \frac{1}{2N}\sum_{i=1}^{N}\mathrm{tr}\left(x_i^T \Theta\, x_i\right) \\
&= \frac{1}{2}\log\det\Theta - \frac{p}{2}\log 2\pi - \frac{1}{2N}\,\mathrm{tr}\left(\sum_{i=1}^{N} x_i x_i^T\, \Theta\right) \\
&= \frac{1}{2}\log\det\Theta - \frac{p}{2}\log 2\pi - \frac{1}{2}\,\mathrm{tr}(S\Theta),
\end{align*}

where S is the empirical covariance matrix given by $S = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$.

ADMM for Graphical SLOPE


Graphical SLOPE problem - ADMM formulation

\[
\text{minimize } -\log\det X + \mathrm{tr}(XS) + \mathbb{I}[X \succ 0] + J_\lambda(Y) \quad \text{subject to } X = Y.
\]

Graphical SLOPE problem - Augmented Lagrangian

\[
L_\rho(X, Y, N) = -\log\det X + \mathrm{tr}(XS) + \mathbb{I}[X \succ 0] + J_\lambda(Y) + \rho\langle N, X - Y\rangle + \frac{\rho}{2}\|X - Y\|_F^2.
\]

X-update (1/3)

We have
\[
X_k = \arg\min_{X} L_\rho(X, Y_{k-1}, N_{k-1}) = \arg\min_{X \succ 0}\left\{-\log\det X + \frac{\rho}{2}\left\|X - \tilde{S}_{k-1}\right\|_F^2\right\},
\]
where
\[
\tilde{S}_{k-1} = -N_{k-1} + Y_{k-1} - \frac{1}{\rho}S.
\]
The X-gradient of the augmented Lagrangian is given by
\[
\nabla_X L_\rho(X, Y_{k-1}, N_{k-1}) = -X^{-1} + \rho X - \rho\tilde{S}_{k-1}.
\]
As the augmented Lagrangian is convex, it is clear that for some $X^* \succ 0$
\[
\nabla_X L_\rho(X^*, Y_{k-1}, N_{k-1}) = -(X^*)^{-1} + \rho X^* - \rho\tilde{S}_{k-1} = 0.
\]

X-update (2/3)

Rewriting the equation as
\[
-(X^*)^{-1} + \rho X^* = \rho\tilde{S}_{k-1},
\]
we can find a matrix that meets this condition.

First, take the eigenvalue decomposition of the right-hand side,
\[
\rho\tilde{S}_{k-1} = \rho Q \Lambda Q^T.
\]
Then, multiplying by $Q^T$ on the left and by $Q$ on the right, we obtain
\[
-(\tilde{X}^*)^{-1} + \rho\tilde{X}^* = \rho\Lambda,
\]
where $\tilde{X}^* = Q^T X^* Q$.

X-update (3/3)

We have to find positive numbers $\tilde{x}^*_{ii}$ that satisfy
\[
(\tilde{x}^*_{ii})^2 - l_i\,\tilde{x}^*_{ii} - \frac{1}{\rho} = 0.
\]
Taking the positive root of this quadratic,
\[
\tilde{x}^*_{ii} = \frac{l_i + \sqrt{l_i^2 + 4/\rho}}{2}.
\]
Thus $X^*$ is given by $X^* = Q\tilde{X}^*Q^T$. All diagonal entries are positive since ρ > 0. Define $F_\rho(\Lambda)$ as
\[
F_\rho(\Lambda) = \mathrm{diag}\left(\frac{l_i + \sqrt{l_i^2 + 4/\rho}}{2}\right).
\]

Since
\[
X^* = Q\tilde{X}^*Q^T = Q\,F_\rho(\Lambda)\,Q^T = F_\rho\left(\tilde{S}_{k-1}\right) = F_\rho\left(-N_{k-1} + Y_{k-1} - \frac{1}{\rho}S\right),
\]

we obtain a formula for updating $X_k$ in each step.
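A sketch of this X-update in R, transcribing the derivation above directly (it assumes the scaled-dual form used on these slides):

```r
x_update <- function(S, Y, N, rho) {
  S_tilde <- -N + Y - S / rho
  e <- eigen(S_tilde, symmetric = TRUE)             # S_tilde = Q Lambda Q^T
  x <- (e$values + sqrt(e$values^2 + 4 / rho)) / 2  # positive roots
  e$vectors %*% diag(x) %*% t(e$vectors)            # X = Q F_rho(Lambda) Q^T
}
```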

Y-update

A formula for $Y_k$ is different. We have

\begin{align*}
Y_k &= \arg\min_{Y} L_\rho(X_k, Y, N_{k-1}) \\
&= \arg\min_{Y}\left\{J_\lambda(Y) + \frac{\rho}{2}\,\|Y - (X_k + N_{k-1})\|_F^2\right\}.
\end{align*}

The last line of the Y-update is a proximity operator, which has a closed-form solution for SLOPE:

\[
\arg\min_{Y}\left\{J_\lambda(Y) + \frac{\rho}{2}\,\|Y - (X_k + N_{k-1})\|_F^2\right\} = \mathrm{prox}_{J_\lambda,\rho}\left(X_k + N_{k-1}\right). \tag{5}
\]
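A sketch of the SLOPE prox in R: Bogdan et al. show it reduces to isotonic regression, which base R provides via isoreg. Note that $\mathrm{prox}_{J_\lambda,\rho}(v)$ equals the prox of $J_{\lambda/\rho}$ at $v$, so the scaling is handled by the caller:

```r
prox_sorted_l1 <- function(v, lambda) {
  # lambda must be nonincreasing; v is any numeric vector
  s <- sign(v)
  a <- abs(v)
  ord <- order(a, decreasing = TRUE)
  z <- a[ord] - lambda
  z <- rev(isoreg(rev(z))$yf)  # project onto the nonincreasing cone
  out <- numeric(length(v))
  out[ord] <- pmax(z, 0)       # clip at zero, restore order and signs
  s * out
}
```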

ADMM for Graphical SLOPE
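The algorithm listing on this slide did not survive extraction; below is a sketch of the full loop assembled from the X- and Y-updates above. It reuses x_update() and prox_sorted_l1() from the earlier sketches and takes a λ sequence such as the one from gslope_lambda(); the initialization and stopping rule are my assumptions.

```r
admm_gslope <- function(S, lambda, rho = 1, iters = 500, tol = 1e-6) {
  p <- ncol(S)
  X <- diag(p); Y <- diag(p); N <- matrix(0, p, p)
  for (k in seq_len(iters)) {
    X <- x_update(S, Y, N, rho)  # closed-form eigenvalue step
    V <- X + N
    Y <- V                       # diagonal entries are not penalized
    Y[upper.tri(Y)] <- prox_sorted_l1(V[upper.tri(V)], lambda / rho)
    Y[lower.tri(Y)] <- t(Y)[lower.tri(Y)]  # restore symmetry
    N <- N + X - Y               # scaled dual update
    if (norm(X - Y, "F") < tol) break
  }
  Y
}
```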

Thank you for your attention
