

Gaussian Graphical Models

Jiang Guo

January 9th, 2013

Contents

1 Gaussian Graphical Models
  Undirected Graphical Model
  Gaussian Graphical Model
  Precision matrix estimation
  Main approaches
    Sparsity
    Graphical Lasso
    CLIME
    TIGER
    Greedy methods
  Measure methods
  Non-Gaussian scenario
  Applications
  Project

2 Arguable points

Undirected Graphical Model

Markov properties:
- pairwise: $X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u,v\}}$ if $\{u, v\} \notin E$
- local
- global

Gaussian Graphical Model

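Multivariate Gaussian

$X \sim N_d(\xi, \Sigma)$. If $\Sigma$ is positive definite, the distribution has a density on $\mathbb{R}^d$:

$$f(x \mid \xi, \Sigma) = (2\pi)^{-d/2} (\det\Omega)^{1/2} \exp\left(-\tfrac{1}{2}(x-\xi)^T \Omega (x-\xi)\right)$$

where $\Omega = \Sigma^{-1}$ is the precision matrix of the distribution.

Marginal distribution: $X_s \sim N_{d'}(\xi_s, \Sigma_{s,s})$

Conditional distribution:

$$X_1 \mid X_2 \sim N(\xi_{1|2}, \Sigma_{1|2})$$

where

$$\xi_{1|2} = \xi_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \xi_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$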
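As a small numerical check of the conditional formulas, the sketch below works through the bivariate case, where every block is a scalar. The parameter values and the function name `conditional_gaussian` are hypothetical, chosen only for illustration; exact fractions are used so the identity between the conditional variance and the inverse of the precision entry $\Omega_{11}$ can be compared without rounding.

```python
from fractions import Fraction

def conditional_gaussian(xi1, xi2, s11, s12, s22, x2):
    """Mean and variance of X1 | X2 = x2 for a bivariate Gaussian."""
    cond_mean = xi1 + s12 / s22 * (x2 - xi2)
    cond_var = s11 - s12 / s22 * s12
    return cond_mean, cond_var

# Hypothetical parameters: xi = (1, 2), Sigma = [[4, 2], [2, 2]]
xi1, xi2 = Fraction(1), Fraction(2)
s11, s12, s22 = Fraction(4), Fraction(2), Fraction(2)
mean, var = conditional_gaussian(xi1, xi2, s11, s12, s22, x2=Fraction(3))

# Precision entry Omega_11 from the closed-form inverse of a 2x2 matrix
det = s11 * s22 - s12 * s12
omega11 = s22 / det

print(mean, var, 1 / omega11)
```

In this example the conditional variance equals $1/\Omega_{11}$, which is the scalar case of the link between conditioning and the precision matrix.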

Gaussian Graphical Model

Multivariate Gaussian: sample covariance matrix

$$\tilde\Sigma = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \xi)(x_i - \xi)^T$$

Precision matrix: $\Omega = \Sigma^{-1}$

In high-dimensional settings where $p \gg n$, $\tilde\Sigma$ is not invertible (it is only positive semidefinite).
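The singularity of the sample covariance when there are more variables than observations can be seen on a tiny example. The sketch below is illustrative only: the data are made up, the helper names are hypothetical, and the sample mean is used for centering (an assumption; the slide writes $\xi$). With n = 2 observations of p = 3 variables, the covariance has rank 1, so its determinant is exactly zero.

```python
from fractions import Fraction

def sample_covariance(rows):
    """Sample covariance with the 1 / (n - 1) normalization."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - mean[j] for j in range(p)] for r in rows]
    return [[sum(d[a] * d[b] for d in dev) / (n - 1) for b in range(p)]
            for a in range(p)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical data: n = 2 observations, p = 3 variables
data = [[Fraction(1), Fraction(2), Fraction(3)],
        [Fraction(3), Fraction(1), Fraction(0)]]
cov = sample_covariance(data)
print(det3(cov))
```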

Gaussian Graphical Model

Every multivariate Gaussian distribution can be represented by a pairwise Gaussian Markov random field (GMRF).

GMRF: an undirected graph $G = (V, E)$ with
- vertex set $V = \{1, \dots, p\}$ corresponding to the random variables
- edge set $E = \{(i, j) \in V \times V \mid i \neq j,\ \Omega_{ij} \neq 0\}$

Goal: estimate the sparse graph structure given $n \ll p$ i.i.d. observations.
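A short derivation may help explain why the zero pattern of $\Omega$ defines the edge set. It uses the standard Schur-complement identity for a partitioned inverse; the index sets $a$ and $b$ are notation introduced here, not taken from the slides.

```latex
% Partition the variables into a = \{i, j\} and b = V \setminus \{i, j\}.
\begin{align*}
\operatorname{Cov}(X_a \mid X_b)
  &= \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
   = (\Omega_{aa})^{-1} \\
  &= \frac{1}{\Omega_{ii}\Omega_{jj} - \Omega_{ij}^{2}}
     \begin{pmatrix} \Omega_{jj} & -\Omega_{ij} \\ -\Omega_{ij} & \Omega_{ii} \end{pmatrix}
\end{align*}
% The off-diagonal entry vanishes iff \Omega_{ij} = 0. For jointly Gaussian
% variables, zero conditional covariance is conditional independence, so
% X_i \perp X_j \mid X_b  \iff  \Omega_{ij} = 0.
```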

Precision matrix estimation

Graph recovery, also known as "graph structure learning/estimation"

For each pair of nodes (variables), decide whether there should be an edge:

$$\text{edge } (\alpha, \beta) \text{ does not exist} \iff \alpha \perp\!\!\!\perp \beta \mid V \setminus \{\alpha, \beta\} \iff \Omega_{\alpha,\beta} = 0$$

The precision matrix is sparse, so graph recovery becomes a problem of detecting the non-zero pattern.

      x  y  z  w
  x   0  1  0  1
  y   1  0  0  1
  z   0  0  0  0
  w   1  1  0  0
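Reading the graph off a precision matrix amounts to listing its non-zero off-diagonal entries. The sketch below assumes a hypothetical precision matrix whose support matches the 0/1 pattern on this slide; the numerical values, the tolerance `tol`, and the function name are illustrative assumptions.

```python
def edges_from_precision(omega, names, tol=1e-8):
    """Return the undirected edges (i < j) where |omega_ij| exceeds tol."""
    p = len(names)
    return [(names[i], names[j])
            for i in range(p) for j in range(i + 1, p)
            if abs(omega[i][j]) > tol]

# Hypothetical precision matrix over the variables x, y, z, w
omega = [[2.0, 0.6, 0.0, 0.4],
         [0.6, 2.0, 0.0, 0.5],
         [0.0, 0.0, 1.0, 0.0],
         [0.4, 0.5, 0.0, 2.0]]
edges = edges_from_precision(omega, ["x", "y", "z", "w"])
print(edges)
```

Node z has no non-zero off-diagonal entry, so it is isolated in the recovered graph.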

Precision matrix estimation

Parameter estimation

      x  y  z  w
  x   0  ?  0  ?
  y   ?  0  0  ?
  z   0  0  0  0
  w   ?  ?  0  0

Sparsity

The word "sparsity" has (at least) four related meanings in NLP! (Noah Smith et al.)

1. Data sparsity: N is too small to obtain a good estimate for w. (usually bad)
2. "Probability" sparsity: most events receive zero probability.
3. Sparsity in the dual: associated with SVMs and other kernel-based methods.
4. Model sparsity: most dimensions of f are not needed for a good $h_w$; those dimensions of w can be zero, leading to a sparse w (model).

We focus on sense #4.

Sparsity

Linear regression

$$f(\vec{x}) = w_0 + \sum_{i=1}^{d} w_i x_i$$

Sparsity: some of the $w_i$ are zero.

1. Problem 1: why do we need a sparse solution?
   - feature/variable selection
   - better interpretation of the data
   - shrinks the size of the model
   - computational savings
   - discourages overfitting
2. Problem 2: how do we achieve a sparse solution?
   - solutions to come...

Sparsity

Ordinary Least Squares

Objective function to minimize:

$$L = \sum_{i=1}^{N} |y_i - f(x_i)|^2$$

Sparsity

Ridge: L2 norm regularization

$$\min L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 + \frac{\lambda}{2}\|\vec{w}\|_2^2$$

Equivalent form (constrained optimization):

$$\min L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 \quad \text{subject to} \quad \|\vec{w}\|_2^2 \le C$$

Corresponds to a zero-mean Gaussian prior $\vec{w} \sim N(0, \sigma^2)$, i.e. $p(w_i) \propto \exp(-\lambda w_i^2)$

Sparsity

Lasso: L1 norm regularization

$$\min L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 + \frac{\lambda}{2}\|\vec{w}\|_1$$

Equivalent form (constrained optimization):

$$\min L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 \quad \text{subject to} \quad \|\vec{w}\|_1 \le C$$

Corresponds to a zero-mean Laplace prior $\vec{w} \sim \text{Laplace}(0, b)$, i.e. $p(w_i) \propto \exp(-\lambda|w_i|)$

This yields a sparse solution.

Why is the lasso sparse? An intuitive interpretation

[Figure: two panels, Lasso and Ridge]
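The contrast can also be seen numerically in the simplest possible setting: one coefficient and one observation, so that the objective is $(y - b)^2$ plus a penalty with the same $\lambda/2$ weighting as the ridge and lasso objectives. This is a minimal sketch under that assumption; the function names are hypothetical. Ridge rescales the least-squares value and never reaches exactly zero, while the lasso subtracts a fixed amount and clips at zero.

```python
def ridge_1d(y, lam):
    """Minimizer of (y - b)^2 + (lam / 2) * b^2."""
    return y / (1.0 + lam / 2.0)

def lasso_1d(y, lam):
    """Minimizer of (y - b)^2 + (lam / 2) * |b|: soft-thresholding at lam / 4."""
    shrunk = max(abs(y) - lam / 4.0, 0.0)
    return shrunk if y >= 0 else -shrunk

# Hypothetical least-squares values, all penalized with lam = 2
for y in (0.3, 1.0, -2.0):
    print(y, ridge_1d(y, 2.0), lasso_1d(y, 2.0))
```

For the small value y = 0.3 the ridge estimate is shrunk but non-zero, whereas the lasso estimate is exactly zero.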

Algorithms for the Lasso

- Subgradient methods
- Interior-point methods (Boyd et al 2007)
- Least Angle Regression (LARS) (Efron et al 2004): computes the entire path of solutions. State of the art until 2008.
- Pathwise Coordinate Descent (Friedman, Hastie et al 2007)
- Proximal Gradient (projected gradient)

Pathwise Coordinate Descent for the Lasso

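Coordinate descent: optimize one parameter (coordinate) at a time.

How? Suppose we had only one predictor, standardized so that $\sum_i x_i^2 = 1$. The problem is to minimize

$$\frac{1}{2}\sum_i (\gamma_i - x_i\beta)^2 + \lambda|\beta|$$

The solution is the soft-thresholded estimate

$$\operatorname{sign}(\hat\beta)(|\hat\beta| - \lambda)_+$$

where $\hat\beta$ is the least-squares solution and

$$\operatorname{sign}(z)(|z| - \lambda)_+ = \begin{cases} z - \lambda & \text{if } z > 0 \text{ and } \lambda < |z| \\ z + \lambda & \text{if } z < 0 \text{ and } \lambda < |z| \\ 0 & \text{if } \lambda \ge |z| \end{cases}$$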
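The soft-thresholding rule is short enough to write out directly. The sketch below assumes the objective is scaled as $\frac{1}{2}\sum_i(\gamma_i - x_i\beta)^2 + \lambda|\beta|$; under that scaling the threshold is exactly $\lambda$ when $\sum_i x_i^2 = 1$, and the division by $\sum_i x_i^2$ covers a predictor that is not standardized. The data and function names are hypothetical.

```python
def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_one_predictor(x, r, lam):
    """Minimize 0.5 * sum((r_i - x_i * b)^2) + lam * |b| over the scalar b."""
    xx = sum(xi * xi for xi in x)
    xr = sum(xi * ri for xi, ri in zip(x, r))
    return soft_threshold(xr, lam) / xx

x = [0.6, 0.8]   # hypothetical predictor with sum of squares 1
r = [1.0, 2.0]   # hypothetical response
b_hat = lasso_one_predictor(x, r, lam=0.5)
print(b_hat)
```

Here the least-squares coefficient is 2.2, and the lasso estimate is that value pulled towards zero by 0.5.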

Pathwise Coordinate Descent for the Lasso

With multiple predictors, cycle through each predictor in turn. We compute the residuals $\gamma_i = y_i - \sum_{k \neq j} x_{ik}\hat\beta_k$ and apply univariate soft-thresholding, pretending our data are $(x_{ij}, \gamma_i)$.

Start with a large value of $\lambda$ (high sparsity) and slowly decrease it. Each solution is used as a warm start for the next value of $\lambda$, leading to a more stable solution.
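The cycling and warm-start steps can be sketched in a few lines. This is an illustrative implementation, not the authors' code: it assumes the objective $\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, a fixed number of sweeps instead of a convergence test, and no all-zero columns in X. The names `lasso_cd` and `lasso_path` and the data are hypothetical.

```python
def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, beta=None, sweeps=100):
    """Coordinate descent for 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual that leaves predictor j out
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            xr = sum(X[i][j] * r[i] for i in range(n))
            xx = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(xr, lam) / xx
    return beta

def lasso_path(X, y, lams):
    """Solve for decreasing lam values, warm-starting from the previous solution."""
    beta, path = None, []
    for lam in sorted(lams, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path

X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # hypothetical orthonormal design
y = [3.0, 0.5, 1.0]
path = lasso_path(X, y, [0.25, 1.0])
for lam, beta in path:
    print(lam, beta)
```

With this orthonormal design the second coefficient is exactly zero at the larger $\lambda$ and enters the model once $\lambda$ drops below its least-squares value of 0.5.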


Variable selection

Lasso: L1 regularization

$$L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 + \frac{\lambda}{2}\|\vec{w}\|_1$$

Subset selection: L0 regularization

$$L = \sum_{i=1}^{N} |y_i - f(x_i)|^2 + \frac{\lambda}{2}\|\vec{w}\|_0$$

- Greedy forward
- Greedy backward
- Greedy forward-backward (Tong Zhang, 2009 and 2011)

Greedy forward-backward algorithm

Graphical Lasso

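Recall the Gaussian graphical model (multivariate Gaussian):

$$f(x \mid \xi, \Sigma) = (2\pi)^{-d/2} (\det\Omega)^{1/2} \exp\left(-\tfrac{1}{2}(x-\xi)^T \Omega (x-\xi)\right)$$

Log-likelihood (up to constants and scaling):

$$\log\det\Omega - \operatorname{trace}(\hat\Sigma\Omega)$$

Graphical lasso (Friedman et al 2007)

Maximize the L1-penalized log-likelihood:

$$\log\det\Omega - \operatorname{trace}(\hat\Sigma\Omega) - \lambda\|\Omega\|_1$$

Algorithm: coordinate descent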
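To make the penalized objective concrete, the sketch below evaluates it for 2x2 matrices. It is not a graphical lasso solver. It assumes $\|\Omega\|_1$ is the sum of absolute values of all entries, diagonal included (some formulations penalize only the off-diagonal entries), and the covariance values and function name are hypothetical.

```python
import math

def glasso_objective(omega, sigma_hat, lam):
    """log det(Omega) - trace(Sigma_hat Omega) - lam * ||Omega||_1 for 2x2 inputs."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("Omega must be positive definite")
    trace = sum(sigma_hat[i][k] * omega[k][i] for i in range(2) for k in range(2))
    l1 = sum(abs(v) for row in omega for v in row)
    return math.log(det) - trace - lam * l1

sigma_hat = [[1.0, 0.5], [0.5, 1.0]]           # hypothetical sample covariance
inverse = [[4 / 3, -2 / 3], [-2 / 3, 4 / 3]]   # exact inverse of sigma_hat
perturbed = [[4 / 3, -0.5], [-0.5, 4 / 3]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(glasso_objective(inverse, sigma_hat, 0.0),
      glasso_objective(perturbed, sigma_hat, 0.0))
print(glasso_objective(inverse, sigma_hat, 1.0),
      glasso_objective(identity, sigma_hat, 1.0))
```

Without a penalty the inverse of the sample covariance scores higher than the perturbed matrix; with $\lambda = 1$ the sparse identity matrix scores higher than the dense inverse, which is the trade-off the penalty introduces.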

CLIME

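Constrained L1 Minimization approach to sparse precision matrix Estimation (Cai et al 2011)

CLIME estimator:

$$\min \|\Omega\|_1 \quad \text{subject to} \quad \|\hat\Sigma\Omega - I\|_\infty \le \lambda_n, \quad \Omega \in \mathbb{R}^{p \times p}$$

The solution $\hat\Omega$ has to be symmetrized.

Equivalent to solving the p optimization problems (one per column, $i = 1, \dots, p$):

$$\min \|\beta\|_1 \quad \text{subject to} \quad \|\hat\Sigma\beta - e_i\|_\infty \le \lambda_n$$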
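Two pieces of the CLIME recipe are easy to sketch without a linear-programming solver: checking the elementwise constraint, and symmetrizing a column-wise solution. The symmetrization shown keeps, for each pair, the entry with the smaller magnitude; this is the rule usually associated with the CLIME paper, but treat it here as an assumption. The matrices and function names are hypothetical.

```python
def clime_feasible(sigma_hat, omega, lam):
    """Check the constraint max_ij |(Sigma_hat Omega - I)_ij| <= lam."""
    p = len(omega)
    worst = 0.0
    for i in range(p):
        for j in range(p):
            prod = sum(sigma_hat[i][k] * omega[k][j] for k in range(p))
            worst = max(worst, abs(prod - (1.0 if i == j else 0.0)))
    return worst <= lam

def symmetrize(omega):
    """For each pair (i, j), keep whichever of omega_ij, omega_ji is smaller in magnitude."""
    p = len(omega)
    return [[omega[i][j] if abs(omega[i][j]) <= abs(omega[j][i]) else omega[j][i]
             for j in range(p)] for i in range(p)]

raw = [[2.0, 0.5], [-0.1, 3.0]]   # hypothetical non-symmetric column-wise solution
sym = symmetrize(raw)
print(sym)

sigma_hat = [[1.0, 0.5], [0.5, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(clime_feasible(sigma_hat, identity, 0.4), clime_feasible(sigma_hat, identity, 0.5))
```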


Gaussian Graphical Model and Column-by-Column Regression

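Consider the conditional distribution

$$X_1 \mid X_2 \sim N(\xi_{1|2}, \Sigma_{1|2})$$

where

$$\xi_{1|2} = \xi_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \xi_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

By standardizing the data matrix, i.e. $\xi_1 = \xi_2 = 0$:

$$X_1 \mid X_2 \sim N(\Sigma_{12}\Sigma_{22}^{-1}X_2,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$$

Column-by-column regression:

$$X_1 = \alpha_1^T X_2 + \epsilon_1, \quad \text{where } \epsilon_1 \sim N(0, \sigma_1)$$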
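The link between the regression coefficients and the precision matrix follows from the block-inverse formula. The derivation below is a sketch that assumes $X_1$ is a single variable, $X_2$ collects the remaining ones, and $\sigma_1$ denotes the conditional variance.

```latex
% Matching the regression to the conditional distribution:
\begin{align*}
\alpha_1 &= \Sigma_{22}^{-1}\Sigma_{21}, &
\sigma_1 &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \\
% Block inverse of \Sigma:
\Omega_{11} &= \sigma_1^{-1}, &
\Omega_{21} &= -\Sigma_{22}^{-1}\Sigma_{21}\,\sigma_1^{-1} = -\Omega_{11}\,\alpha_1
\end{align*}
% Hence \alpha_1 = -\Omega_{21} / \Omega_{11}: a zero coefficient in the regression
% of X_1 on X_2 corresponds to a zero entry in the first column of \Omega.
```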

Column-by-Column Regression

- Node-by-node neighbor detection
- Easy to implement
- Easy to parallelize; scalable to large-scale data

TIGER

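A generalized framework:

$$\hat\beta = \arg\min_{\beta}\ L(\beta; X, Y) + \Omega(\beta)$$

(here $\Omega(\beta)$ denotes a penalty on $\beta$, not the precision matrix)

TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models (Han et al 2012)

SQRT-Lasso:

$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \left\{ \frac{1}{\sqrt{n}}\|y - X\beta\|_2 + \lambda\|\beta\|_1 \right\}$$

- Tuning-insensitive (good point)
- State of the art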
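One intuition for tuning insensitivity is that the SQRT-Lasso objective is homogeneous of degree one: rescaling the response and the coefficients by the same constant rescales the whole objective, so the minimizer rescales with the data for a fixed $\lambda$. The sketch below checks this numerically. It is an illustration of that scaling property only, not of the TIGER procedure; the data and function name are hypothetical.

```python
import math

def sqrt_lasso_objective(beta, X, y, lam):
    """(1 / sqrt(n)) * ||y - X beta||_2 + lam * ||beta||_1."""
    n = len(y)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))
             for i in range(n)]
    return math.sqrt(sum(r * r for r in resid) / n) + lam * sum(abs(b) for b in beta)

X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]   # hypothetical design
y = [1.0, -2.0, 0.5]
beta = [0.5, -1.0]

base = sqrt_lasso_objective(beta, X, y, lam=0.3)
# Rescale response and coefficients by 4 (for example, a change of units)
scaled = sqrt_lasso_objective([4 * b for b in beta], X, [4 * v for v in y], lam=0.3)
print(base, scaled)
```

With a squared-error loss the fit term would scale by 16 and the penalty by 4, so $\lambda$ would have to change with the scale of the noise.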

Greedy methods

High-dimensional (Gaussian) Graphical Model Estimation Using Greedy Methods (Pradeep et al 2012)

Rather than using lasso-like methods to achieve a sparse solution, we apply greedy methods to do variable selection:
- Global greedy: treat each element of the precision matrix as a variable
- Local greedy: column-by-column fashion

(Potentially) state of the art

Global Greedy

Estimate the graph structure through a series of forward and backward stagewise steps.
- Forward step: choose the "best" new edge and add it to the current estimate, as long as the decrease in loss δ exceeds the stopping criterion.
- Backward step: choose the "weakest" current edge and remove it if the increase in loss is less than the product of the backward step factor and the decrease in loss due to the previous forward step (νδ).

"Best" and "weakest" are determined by the sample-based Gaussian MLE:

$$L(\Omega) = \log\det\Omega - \operatorname{trace}(\hat\Sigma\Omega)$$

Local Greedy

Estimate each node's neighborhood in parallel using a series of forward and backward steps.
- Forward step: choose the "best" new edge and add it to the current estimate, as long as the decrease in loss δ exceeds the stopping criterion.
- Backward step: choose the "weakest" current edge and remove it if the increase in loss is less than the product of the backward step factor and the decrease in loss due to the previous forward step (νδ).

"Best" and "weakest" are determined by the least-squares loss.
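The forward and backward steps for a single node can be sketched as a greedy regression of that node on the others. This is a minimal illustration under stated assumptions, not the algorithm from the cited paper: the loss is the residual sum of squares, `eps` plays the role of the stopping criterion, `nu` is the backward step factor (taken below 1), candidate columns are assumed linearly independent, and all names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y, S):
    """Least-squares loss of regressing y on the columns of X listed in S."""
    if not S:
        return sum(v * v for v in y)
    n = len(y)
    G = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in S] for a in S]
    g = [sum(X[i][a] * y[i] for i in range(n)) for a in S]
    beta = solve(G, g)
    return sum((y[i] - sum(bj * X[i][j] for bj, j in zip(beta, S))) ** 2
               for i in range(n))

def forward_backward(X, y, eps=1e-6, nu=0.5):
    """Greedy neighborhood selection with forward and backward steps."""
    S, p = [], len(X[0])
    while len(S) < p:
        # Forward step: add the variable that decreases the loss the most
        cur = rss(X, y, S)
        best = min((j for j in range(p) if j not in S),
                   key=lambda j: rss(X, y, S + [j]))
        delta = cur - rss(X, y, S + [best])
        if delta <= eps:
            break
        S.append(best)
        # Backward steps: drop the weakest variable while it costs less than nu * delta
        while len(S) > 1:
            cur = rss(X, y, S)
            weakest = min(S, key=lambda k: rss(X, y, [j for j in S if j != k]))
            increase = rss(X, y, [j for j in S if j != weakest]) - cur
            if increase >= nu * delta:
                break
            S.remove(weakest)
    return sorted(S)

# Hypothetical data: y = 2 * column 0 - column 2, column 1 is irrelevant
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
y = [1.0, -1.0, 2.0, 2.0]
neighbors = forward_backward(X, y)
print(neighbors)
```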

Measure methods

Graph recovery: ROC curves

- TP: true positive rate
- FP: false positive rate
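Each point on the ROC curve comes from comparing one estimated graph with the true graph. The sketch below computes the two rates over unordered node pairs; it assumes the true graph has at least one edge and one non-edge, and the adjacency matrices and function name are hypothetical.

```python
def edge_rates(true_adj, est_adj):
    """True positive rate and false positive rate over unordered pairs i < j."""
    p = len(true_adj)
    tp = fp = pos = neg = 0
    for i in range(p):
        for j in range(i + 1, p):
            if true_adj[i][j]:
                pos += 1
                tp += 1 if est_adj[i][j] else 0
            else:
                neg += 1
                fp += 1 if est_adj[i][j] else 0
    return tp / pos, fp / neg

# Hypothetical true graph on (x, y, z, w) with edges x-y, x-w, y-w
true_adj = [[0, 1, 0, 1],
            [1, 0, 0, 1],
            [0, 0, 0, 0],
            [1, 1, 0, 0]]
# Hypothetical estimate with edges x-y, x-z, x-w
est_adj = [[0, 1, 1, 1],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0]]
tpr, fpr = edge_rates(true_adj, est_adj)
print(tpr, fpr)
```

Repeating this for each value of the regularization parameter along the solution path gives one (FP, TP) point per value.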

Graph recovery: ROC Curves

Decrease the regularization parameter gradually to obtain the solution path.

An illustration:

Measure methods

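Parameter estimation: norm error

Frobenius norm error: $\|\hat\Omega - \Omega\|_F$, where

$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2}$$

Spectral norm error: $\|\hat\Omega - \Omega\|_2$, where

$$\|A\|_2 = \sqrt{\lambda_{\max}(A^* A)} = \sigma_{\max}(A)$$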
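Both error measures can be computed directly from their definitions. The sketch below handles real matrices only and approximates the spectral norm by power iteration on $A^T A$ with a fixed iteration count, starting from the all-ones vector; this is an assumption that fails if that vector happens to be orthogonal to the leading singular direction. The matrices and function names are hypothetical.

```python
import math

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))

def spectral_norm(A, iters=500):
    """sqrt(lambda_max(A^T A)), approximated by power iteration."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return math.sqrt(lam)

omega = [[2.0, 0.0], [0.0, 2.0]]        # hypothetical true precision matrix
omega_hat = [[5.0, 0.0], [0.0, -2.0]]   # hypothetical estimate
error = [[omega_hat[i][j] - omega[i][j] for j in range(2)] for i in range(2)]
print(frobenius_norm(error), spectral_norm(error))
```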

Performance: Greedy, TIGER, Clime, Glasso



Non-Gaussian scenario

Question: in a general graph, does a relationship exist between conditional independence and the structure of the precision matrix?

This remains unsolved, but there has been some recent progress:

- High dimensional semiparametric Gaussian copula graphical models (Han et al. 2012)
- The nonparanormal: Semiparametric estimation of high dimensional undirected graphs (Han et al. 2009)
- Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses (NIPS 2012 Outstanding Student Paper Award)


Applications

- Bioinformatics
- Social media
- NLP?


Project: a graphical modeling platform

[Diagram: Graphical Modeling Platform]
- User: dataset, model
- Hosting: Amazon EC2 (cloud) and Princeton server
- Output: numerical precision matrix
- Graph visualization: Infovis (client, JavaScript), Circos (server, Perl)
- Regression analysis (future)

Project: a graphical modeling platform

[Figures: my visualization (1); R visualization]

Arguable points

- It is the student who should push the supervisor, rather than the supervisor pushing the student.
- Work extremely hard; blindly trust your supervisor.
- Keep quality first; quantity will follow.
- Science is about problems, not equations.
- Be focused.
- Think less, do more.
- Keep an open mind.
