Gaussian Graphical Models Arguable Points
Gaussian Graphical Models
Jiang Guo
January 9th, 2013

Contents
1 Gaussian Graphical Models
  Undirected Graphical Model
  Gaussian Graphical Model
  Precision matrix estimation
    Main approaches
      Sparsity
      Graphical Lasso
      CLIME
      TIGER
      Greedy methods
    Measure methods
  Non-Gaussian scenario
  Applications
  Project
2 Arguable points

Undirected Graphical Model
Markov properties:
  pairwise: X_u ⊥ X_v | X_{V∖{u,v}} if {u,v} ∉ E
  local
  global
Conditional independence
Partial correlation

Gaussian Graphical Model
Multivariate Gaussian: X ~ N_d(ξ, Σ).
If Σ is positive definite, the distribution has a density on R^d:
  f(x | ξ, Σ) = (2π)^{−d/2} (det Ω)^{1/2} exp(−(x − ξ)^T Ω (x − ξ)/2)
where Ω = Σ^{−1} is the precision matrix of the distribution.
Marginal distribution: X_s ~ N_{d'}(ξ_s, Σ_{s,s})
Conditional distribution: X_1 | X_2 ~ N(ξ_{1|2}, Σ_{1|2}), where
  ξ_{1|2} = ξ_1 + Σ_{12} Σ_{22}^{−1} (x_2 − ξ_2)
  Σ_{1|2} = Σ_{11} − Σ_{12} Σ_{22}^{−1} Σ_{21}

Gaussian Graphical Model
Sample covariance matrix:
  Σ~ = 1/(n−1) Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄)^T
Precision matrix: Ω = Σ^{−1}
In high-dimensional settings where p ≫ n, Σ~ is not invertible (it is only positive semidefinite).
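The invertibility problem above can be seen concretely: with n observations, the sample covariance has rank at most n − 1, so for p ≥ n it is singular. A minimal sketch in pure Python with hypothetical toy data (p = 3 variables, n = 2 samples):

```python
# Why the sample covariance cannot be inverted when p >= n: its rank is
# at most n - 1, so the determinant is 0 and Omega = Sigma^-1 does not exist.
# Toy data below is hypothetical, chosen only to illustrate the rank deficiency.

def sample_covariance(samples):
    """Sigma~ = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)^T."""
    n, p = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for x in samples:
        d = [x[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov

def det3(m):
    """Determinant of a 3x3 matrix via cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# p = 3 variables but only n = 2 observations: Sigma~ has rank <= 1.
S = sample_covariance([[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]])
print(det3(S))  # 0.0 -- singular, so the precision matrix is undefined
```

This is exactly why the estimation methods listed in the outline (graphical lasso, CLIME, TIGER, greedy methods) impose sparsity rather than inverting Σ~ directly.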
Gaussian Graphical Model
Every multivariate Gaussian distribution can be represented by a pairwise Gaussian Markov Random Field (GMRF).
GMRF: an undirected graph G = (V, E) with
  vertex set V = {1, ..., p} corresponding to the random variables
  edge set E = {(i, j) ∈ V × V | i ≠ j, Ω_ij ≠ 0}
Goal: estimate the sparse graph structure given n ≪ p iid observations.

Precision matrix estimation
Graph recovery, also known as "graph structure learning/estimation":
  For each pair of nodes (variables), decide whether there should be an edge.
  Edge (α, β) does not exist ⟺ α ⊥ β | V∖{α, β} ⟺ Ω_{α,β} = 0
  The precision matrix is sparse.
Hence, it turns out to be a non-zero-pattern detection problem.

    x y z w
  x 0 1 0 1
  y 1 0 0 1
  z 0 0 0 0
  w 1 1 0 0

Precision matrix estimation
Parameter estimation:

    x y z w
  x 0 ? 0 ?
  y ? 0 0 ?
  z 0 0 0 0
  w ? ? 0 0

Sparsity
The word "sparsity" has (at least) four related meanings in NLP! (Noah Smith et al.)
1 Data sparsity: N is too small to obtain a good estimate for w.
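Reading the graph off the precision matrix is just non-zero-pattern detection. A minimal sketch (hypothetical helper and numbers, matching the 4-variable x/y/z/w pattern above; in practice a tolerance is needed because estimates are rarely exactly zero):

```python
# Edge (i, j) exists iff Omega_ij != 0; diagonal entries are ignored
# since a valid precision matrix always has nonzero diagonal.

def edges_from_precision(omega, tol=1e-8):
    """Return the edge set {(i, j) : i < j, |Omega_ij| > tol}."""
    p = len(omega)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(omega[i][j]) > tol}

# Hypothetical Omega whose off-diagonal pattern matches the slide:
# indices 0..3 correspond to x, y, z, w; z (index 2) is isolated.
omega = [[1.0, 0.4, 0.0, 0.3],
         [0.4, 1.0, 0.0, 0.2],
         [0.0, 0.0, 1.0, 0.0],
         [0.3, 0.2, 0.0, 1.0]]
edges = edges_from_precision(omega)
# edges is {(0, 1), (0, 3), (1, 3)}: x-y, x-w, y-w
```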
(usually bad)
2 "Probability" sparsity: most events receive zero probability.
3 Sparsity in the dual: associated with SVMs and other kernel-based methods.
4 Model sparsity: most dimensions of f are not needed for a good h_w; those dimensions of w can be zero, leading to a sparse w (model).
We focus on sense #4.

Sparsity
Linear regression:
  f(x) = w_0 + Σ_{i=1}^{d} w_i x_i
Sparsity means some of the w_i are zero.
1 Problem 1: why do we need a sparse solution?
  feature/variable selection
  better interpretation of the data
  shrinks the size of the model
  computational savings
  discourages overfitting
2 Problem 2: how to achieve a sparse solution?
  solutions to come...

Sparsity
Ordinary least squares
Objective function to minimize:
  L = Σ_{i=1}^{N} |y_i − f(x_i)|^2

Sparsity
Ridge: L2-norm regularization
  min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_2^2
Equivalent form (constrained optimization):
  min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2  subject to ‖w‖_2^2 ≤ C
Corresponds to a zero-mean Gaussian prior w ~ N(0, σ^2 I), i.e.
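Ridge shrinks coefficients toward zero but does not zero them out, which is why it does not select variables. A minimal sketch for the single-predictor, no-intercept case with hypothetical data (minimizing Σ_i (y_i − w x_i)^2 + λ w^2 gives the closed form w = Σ x_i y_i / (Σ x_i^2 + λ)):

```python
# Single-predictor closed forms: OLS vs. ridge.
# Ridge divides by (sum(x^2) + lambda), shrinking w but never making it 0.

def ols_1d(x, y):
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ridge_1d(x, y, lam):
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]          # hypothetical data with exact fit y = 2x
print(ols_1d(x, y))          # 2.0
print(ridge_1d(x, y, 14.0))  # 28 / (14 + 14) = 1.0: shrunk, but not zero
```

Contrast this with the lasso below, whose soft-thresholding sets small coefficients exactly to zero.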
  p(w_i) ∝ exp(−λ w_i^2)

Sparsity
Lasso: L1-norm regularization
  min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_1
Equivalent form (constrained optimization):
  min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2  subject to ‖w‖_1 ≤ C
Corresponds to a zero-mean Laplace prior w ~ Laplace(0, b), i.e.
  p(w_i) ∝ exp(−λ |w_i|)
Yields a sparse solution.

Why is the lasso sparse?
An intuitive interpretation: [Figure: the L1 (lasso) vs. L2 (ridge) constraint regions]

Algorithms for the Lasso
  Subgradient methods
  Interior-point methods (Boyd et al., 2007)
  Least Angle Regression (LARS) (Efron et al., 2004): computes the entire path of solutions; state of the art until 2008.
  Pathwise coordinate descent (Friedman, Hastie et al., 2007)
  Proximal gradient (projected gradient)

Pathwise Coordinate Descent for the Lasso
Coordinate descent: optimize one parameter (coordinate) at a time.
How? Suppose we had only one predictor; the problem is to minimize
  Σ_i (y_i − x_i β)^2 + λ|β|
The solution is the soft-thresholded estimate
  sign(β^)(|β^| − λ)_+
where β^ is the least-squares solution,
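The soft-thresholding operator is small enough to write out directly. A minimal sketch (pure Python, example values hypothetical):

```python
# S(z, lam) = sign(z) * max(|z| - lam, 0): the exact lasso solution in the
# single-predictor case. Coefficients with |z| <= lam are set exactly to 0,
# which is where the lasso's sparsity comes from.

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-3.0, 1.0))  # -2.0
print(soft_threshold(0.5, 1.0))   # 0.0 -- small coefficients are zeroed out
```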
and
  sign(z)(|z| − λ)_+ =  z − λ  if z > 0 and λ < |z|
                        z + λ  if z < 0 and λ < |z|
                        0      if λ ≥ |z|

Pathwise Coordinate Descent for the Lasso
With multiple predictors, cycle through each predictor in turn: compute the partial residuals r_i = y_i − Σ_{j≠k} x_ij β^_j and apply univariate soft-thresholding, pretending our data is (x_ik, r_i).
Start with a large value of λ (high sparsity) and slowly decrease it. This exploits the current estimate as a warm start, leading to a more stable solution.

Variable selection
Lasso: L1 regularization
  L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_1
Subset selection: L0 regularization
  L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_0
  Greedy forward
  Greedy backward
  Greedy forward-backward (Tong Zhang, 2009 and 2011)

Greedy forward-backward algorithm
[Algorithm figure]
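The cyclic update can be sketched in a few lines of pure Python. This is a minimal illustration, not a production solver: it assumes the objective (1/2) Σ_i (y_i − x_i·β)^2 + λ‖β‖_1, uses a fixed iteration count instead of a convergence check, and the toy data is hypothetical:

```python
# Cyclic coordinate descent for the lasso: for each coordinate k,
# soft-threshold the univariate fit to the partial residuals
# r_i = y_i - sum_{j != k} x_ij * beta_j.

def soft(z, lam):
    return (z - lam) if z > lam else (z + lam) if z < -lam else 0.0

def lasso_cd(X, y, lam, n_iter=200):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            # partial residuals, leaving predictor k out
            r = [y[i] - sum(X[i][j] * beta[j] for j in range(p) if j != k)
                 for i in range(n)]
            zk = sum(X[i][k] * r[i] for i in range(n))
            beta[k] = soft(zk, lam) / sum(X[i][k] ** 2 for i in range(n))
    return beta

# Orthogonal toy design: the solution is the soft-thresholded univariate fit.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(lasso_cd(X, [3.0, 0.5, 0.0], lam=1.0))  # [2.0, 0.0]
```

A pathwise solver would wrap `lasso_cd` in a loop over a decreasing λ sequence, passing each solution in as the warm start for the next.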
Graphical Lasso
Recall the Gaussian graphical model (multivariate Gaussian):
  f(x | ξ, Σ) = (2π)^{−d/2} (det Ω)^{1/2} exp(−(x − ξ)^T Ω (x − ξ)/2)
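The graphical lasso estimates a sparse Ω by maximizing the L1-penalized Gaussian log-likelihood, log det Ω − tr(SΩ) − λ‖Ω‖_1, where S is the sample covariance. A minimal sketch (2×2 case, pure Python, hypothetical numbers) of evaluating that objective; real solvers optimize it over positive-definite Ω, e.g. by block coordinate descent:

```python
import math

def glasso_objective_2x2(S, omega, lam):
    """log det(Omega) - tr(S @ Omega) - lam * ||Omega||_1 for 2x2 matrices."""
    det = omega[0][0] * omega[1][1] - omega[0][1] * omega[1][0]
    trace_SO = sum(S[i][j] * omega[j][i] for i in range(2) for j in range(2))
    l1 = sum(abs(omega[i][j]) for i in range(2) for j in range(2))
    return math.log(det) - trace_SO - lam * l1

S = [[1.0, 0.5], [0.5, 1.0]]                # hypothetical sample covariance
omega_dense = [[4/3, -2/3], [-2/3, 4/3]]    # exact inverse of S (an edge present)
omega_sparse = [[1.0, 0.0], [0.0, 1.0]]     # diagonal candidate (no edge)
# With lam large enough, the penalty makes the sparse candidate score higher:
print(glasso_objective_2x2(S, omega_dense, 1.0) <
      glasso_objective_2x2(S, omega_sparse, 1.0))  # True
```

The penalty term is what drives off-diagonal entries of Ω, and hence edges of the graph, exactly to zero.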