
Mini-course - Probabilistic Graphical Models: A Geometric, Algebraic and Combinatorial Perspective

Caroline Uhler

Lecture 1: Graphical Models and Markov Properties

CIMI Workshop on Computational Aspects of Geometry, Toulouse

November 6, 2019

Applications of graphical models

Probabilistic models that capture the statistical dependencies between variables of interest in the form of a network

Used throughout the natural sciences, social sciences, and economics for modeling interactions

Undirected graphical models encode partial correlations, while directed graphical models can be used to represent causality

Figure: (a) Weather forecasting; (b) Gene regulation


Graphical models

Motivation: Provide an economical representation of a joint distribution using local relationships between variables

Origins of graphical models can be traced back to three communities:

Statistical physics: undirected graphs represent the distribution over a large system of interacting particles [Gibbs, 1902]

Genetics: directed graphs model inheritance in natural species [Wright, 1921]

Statistics: graphs represent interactions in multi-dimensional contingency tables [Bartlett, 1935]

Graphical models combine graph theory with probability theory into a powerful framework for multivariate statistical modeling [Lauritzen, 1996]

Algebraic, geometric and combinatorial questions arise naturally when studying graphical models

Overview of mini-course

(1) Introduction to graphical models - Markov properties

(2) Gaussian graphical models - Maximum likelihood estimation

(3) Covariance models with linear structure - Parameter estimation and structure learning

(4) Causal inference - Structure discovery

References: Graphical models

Bartlett, M. S. (1935). Contingency table interactions. J. Royal Stat. Soc. Suppl., 2:248-252.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Royal Stat. Soc., 36:192-236.
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. Yale University Press.
Lauritzen, S. L. (1996). Graphical Models. Clarendon Press.
Moussouris, J. (1974). Gibbs and Markov random systems with constraints. J. Stat. Phys., 10:11-33.
Pearl, J. (1988). Fusion, propagation and structuring in belief networks. Artificial Intelligence, 29:357-370.
Shachter, R. D. (1998). The rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams). UAI.
Verma, T. & Pearl, J. (1990). Equivalence and synthesis of causal models. UAI.
Verma, T. & Pearl, J. (1992). An algorithm for deciding if a set of observed independencies has a causal explanation. UAI.
Wright, S. (1921). Correlation and causation. J. Agricult. Res., 20:557-585.

Mini-course - Probabilistic Graphical Models: A Geometric, Algebraic and Combinatorial Perspective

Caroline Uhler

Lecture 2: Gaussian Graphical Models

CIMI Workshop on Computational Aspects of Geometry, Toulouse

November 6, 2019

Overview

This lecture is based on a book chapter I wrote for the Handbook of Graphical Models, edited by M. Drton, S. Lauritzen, M. Maathuis and M. Wainwright:

C. Uhler, “Gaussian graphical models: An algebraic and geometric perspective”, available at arXiv:1707.04345

The goal of this lecture is to give an introduction to Gaussian graphical models and to show that algebraic, geometric and combinatorial questions arise naturally when studying them

Gaussian graphical models

Goal: Characterize the relationships among a large number of variables

Visualize interactions by a graph

Gaussian graphical models: used throughout the natural sciences, social sciences, and economics for modeling interactions among nodes, for continuous multivariate data

Figure: (a) Gene regulation network (Novarino et al., Science 343, 2014); (b) Athens stock exchange (Garos & Argyrakis, Physica A, 2007); (c) Wind speed forecasting



Gaussian Distribution

A random vector X ∈ R^p follows a multivariate Gaussian distribution with mean µ ∈ R^p and covariance matrix Σ ∈ S^p_{≻0} if it has density

f_{µ,Σ}(x) = (2π)^{−p/2} det(Σ)^{−1/2} exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) )

What can you say about the space of covariance matrices? What does the space of 3 × 3 correlation matrices look like?
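A quick NumPy sketch of this density together with a Monte Carlo sanity check (the particular µ, Σ, and sample size are arbitrary choices, not from the slides):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Density f_{mu,Sigma}(x) of the multivariate Gaussian, as on the slide."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                   # symmetric positive definite

# The empirical covariance of many samples should recover Sigma.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
S_hat = np.cov(X, rowvar=False)
print(np.max(np.abs(S_hat - Sigma)))             # small
```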



Gaussian Graphical Model

G = (V, E) undirected graph with vertices V = {1, ..., p} and edges E

K_G = { K ∈ S^p_{≻0} | K_ij = 0 for all i ≠ j with (i, j) ∉ E }

A Gaussian random vector X ∈ R^p is a Gaussian graphical model on G if

X ∼ N(µ, Σ) with Σ^{−1} ∈ K_G.

Question: Interpretation of missing edges in G?



Marginals and Conditionals of a Gaussian

Theorem. Let X ∼ N_p(µ, Σ) and partition X into two components X_A ∈ R^a and X_B ∈ R^b such that a + b = p. Let µ and Σ be partitioned accordingly, i.e.,

µ = (µ_A, µ_B)  and  Σ = [ Σ_{A,A}  Σ_{A,B} ; Σ_{B,A}  Σ_{B,B} ].

Then:

(a) the marginal distribution of X_A is N(µ_A, Σ_{A,A});

(b) the conditional distribution of X_A | X_B = x_B is N(µ_{A|B}, Σ_{A|B}), where

µ_{A|B} = µ_A + Σ_{A,B} Σ_{B,B}^{−1} (x_B − µ_B)  and  Σ_{A|B} = Σ_{A,A} − Σ_{A,B} Σ_{B,B}^{−1} Σ_{B,A}.

Note: Let K = Σ^{−1}. Then by the Schur complement, Σ_{A|A^c} = (K_{A,A})^{−1}. Hence a missing edge in G means K_ij = 0, or equivalently, X_i ⊥⊥ X_j | X_{V∖{i,j}}.
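The Schur-complement identity in the note can be verified numerically (a small NumPy sketch with an arbitrary positive definite Σ; the sizes are my choice):

```python
import numpy as np

rng = np.random.default_rng(1)
p, a = 5, 2
M = rng.standard_normal((p, p))
Sigma = M @ M.T + p * np.eye(p)         # random symmetric positive definite Sigma
K = np.linalg.inv(Sigma)

A = slice(0, a)                          # indices of X_A
B = slice(a, p)                          # indices of X_B = X_{A^c}

# Conditional covariance via the theorem ...
Sigma_cond = Sigma[A, A] - Sigma[A, B] @ np.linalg.solve(Sigma[B, B], Sigma[B, A])
# ... and via the Schur-complement identity Sigma_{A|A^c} = (K_{A,A})^{-1}
Sigma_cond_schur = np.linalg.inv(K[A, A])

print(np.allclose(Sigma_cond, Sigma_cond_schur))  # True
```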



Two Main Problems

Given i.i.d. samples X^{(1)}, ..., X^{(n)} ∈ R^p from a Gaussian graphical model:

Learn the graph G → see tomorrow's lectures (e.g. graphical lasso)

Estimate the edge weights, i.e. the non-zero entries of Σ^{−1} → maximum likelihood estimation

These problems don't depend on the mean µ; w.l.o.g. assume µ = 0

The sample covariance matrix is given by

S = (1/n) ∑_{i=1}^{n} X^{(i)} (X^{(i)})^T ∈ S^p_{⪰0},  and rk(S) = n ≤ p with probability 1

The log-likelihood is given by

ℓ(Σ; S) ∝ −log det(Σ) − tr(S Σ^{−1})

What can be said about the log-likelihood function?
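A minimal illustration of these objects in NumPy (sizes and seed are arbitrary; note the rank deficiency of S when n < p):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 6, 4                                  # fewer samples than variables
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

# Sample covariance (the mean is assumed to be 0, as on the slide)
S = X.T @ X / n
print(np.linalg.matrix_rank(S))              # n = 4 < p, so S is singular

def log_likelihood(Sigma, S):
    """ell(Sigma; S), up to an additive constant."""
    sign, logdet = np.linalg.slogdet(Sigma)
    return -logdet - np.trace(S @ np.linalg.inv(Sigma))
```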

Parameter estimation for Gaussian graphical models

Given a graph G, the maximum likelihood estimator (MLE) K̂ := Σ̂^{−1} solves the following convex optimization problem:

maximize log det K − tr(SK)
subject to K ∈ K_G

Question: What is the MLE when G is the complete graph?
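For the complete graph the constraint set is all of S^p_{≻0}, and the first-order condition K^{−1} − S = 0 suggests K̂ = S^{−1} whenever S is invertible. A small numerical check that no symmetric perturbation of S^{−1} improves the (strictly concave) objective:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 50
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
S = X.T @ X / n                         # invertible with probability 1 since n > p

def obj(K):
    """log det K - tr(SK), with -inf outside the pd cone."""
    sign, logdet = np.linalg.slogdet(K)
    return logdet - np.trace(S @ K) if sign > 0 else -np.inf

K_hat = np.linalg.inv(S)                # candidate maximizer

# Random symmetric perturbations never improve the objective
for _ in range(100):
    D = rng.standard_normal((p, p))
    D = (D + D.T) / 2
    assert obj(K_hat + 1e-3 * D) <= obj(K_hat)
print("K = S^{-1} is a local maximum")
```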



Parameter estimation for Gaussian graphical models

By strong duality, given a graph G, the MLE K̂ := Σ̂^{−1} solves the following equivalent convex optimization problems:

maximize log det K − tr(KS)  subject to  K_ij = 0 for all (i, j) ∉ E
minimize −log det Σ − p  subject to  Σ_ij = S_ij for (i, j) ∈ E or i = j

Theorem (Dempster, 1972). In a Gaussian graphical model on G, the MLE Σ̂ exists if and only if the partial sample covariance matrix S_G = (S_ij | (i, j) ∈ E or i = j) (the sufficient statistics) can be extended to a positive definite matrix. The MLE Σ̂ is then the unique such completion whose inverse satisfies (Σ̂^{−1})_ij = 0 for all i ≠ j with (i, j) ∉ E.

Existence of the MLE is equivalent to a positive definite matrix completion problem!
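Dempster's characterization can be illustrated with iterative proportional scaling (IPS), a classical algorithm for this problem; a minimal sketch on an assumed path graph with arbitrary simulated data (graph, sweep count, and seed are my choices):

```python
import numpy as np

def ips(S, cliques, sweeps=200):
    """Iterative proportional scaling for the Gaussian MLE on the graph
    given by its cliques. Returns the estimated K = Sigma^{-1}."""
    p = S.shape[0]
    K = np.eye(p)
    for _ in range(sweeps):
        for C in cliques:
            C = np.asarray(C)
            Sigma = np.linalg.inv(K)
            # Update K on the C x C block so that (K^{-1})_{C,C} = S_{C,C}
            K[np.ix_(C, C)] += (np.linalg.inv(S[np.ix_(C, C)])
                                - np.linalg.inv(Sigma[np.ix_(C, C)]))
    return K

rng = np.random.default_rng(4)
p, n = 4, 100
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
S = X.T @ X / n

cliques = [[0, 1], [1, 2], [2, 3]]       # path graph 0 - 1 - 2 - 3 (chordal)
K_hat = ips(S, cliques)
Sigma_hat = np.linalg.inv(K_hat)

print(np.round(Sigma_hat - S, 6))  # zero on edges and the diagonal
print(np.round(K_hat, 6))          # zero in entries (0,2), (0,3), (1,3)
```

Note that the off-graph entries of K are never touched by the clique updates, so the zero pattern of Dempster's theorem holds exactly by construction, while the clique marginals of Σ̂ match S after convergence.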

Geometry

Geometric picture: concentration matrices vs. covariance matrices

S_G := π_G(S),  𝒮_G := π_G(S^p_{⪰0});  note that 𝒮_G = K_G^∨

fiber_G(S) := { Σ ∈ S^p_{≻0} | Σ_G = S_G }


Example: K_{2,3}



det(K) = λ1 · (λ1² − λ2² + λ2λ3 − λ3² + λ2λ4 + λ3λ4 − λ4²) · (λ1² − λ2² − λ2λ3 − λ3² − λ2λ4 − λ3λ4 − λ4²)

Example: K_{2,3} (figure)

Positive Definite (pd) Matrix Completion Problem

Necessary condition for the existence of a pd completion: all fully specified minors are pd. However, this is in general not sufficient:

S_G = (  1    0.9    ?   −0.9
        0.9    1    0.9    ?
         ?    0.9    1    0.9
       −0.9    ?    0.9    1  )

does not have a pd completion.

Theorem (Grone, Johnson, Sá & Wolkowicz, 1984). For a graph G the following statements are equivalent:

(a) A G-partial matrix M_G ∈ R^{|E*|} has a pd completion if and only if all completely specified submatrices in M_G are positive definite.

(b) G is chordal (also known as triangulated), i.e. every cycle of length 4 or larger has a chord.
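A quick numerical check of the 4-cycle counterexample above, by brute-force scan over the two free entries (purely illustrative; grid resolution is arbitrary):

```python
import numpy as np

def completion(x, y):
    """Fill the two unspecified entries of the partial matrix S_G above."""
    return np.array([[ 1.0, 0.9,  x,  -0.9],
                     [ 0.9, 1.0, 0.9,  y  ],
                     [  x,  0.9, 1.0, 0.9 ],
                     [-0.9,  y,  0.9, 1.0 ]])

# Scan candidate values for the free entries; any pd completion would need
# |x| < 1 and |y| < 1, since all 2x2 principal minors must be positive.
grid = np.linspace(-0.999, 0.999, 200)
best = max(np.linalg.eigvalsh(completion(x, y))[0] for x in grid for y in grid)
print(best)  # negative: no completion is positive definite
```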

Statistical Problem

Current statistical applications: number of variables ≫ number of observations

Example: genetic networks, where gene expression data from a few individuals is used to model interactions among a large number of genes

→ Gaussian graphical models widely used in this context

Problem: What is the minimum number of observations for existence of the MLE in a given Gaussian graphical model?

Example: K_{2,3}

What is the minimal rank n* such that

S_G = ( s11   ?   s13  s14  s15
         ?   s22  s23  s24  s25
        s13  s23  s33   ?    ?
        s14  s24   ?   s44   ?
        s15  s25   ?    ?   s55 )

can be completed to a positive definite matrix for any S ∈ S^p_{⪰0} of rank n ≥ n*?



Bounds

Let n*_G denote the minimal rank such that every S ∈ S^p_{⪰0} has a positive definite completion on G. Then:

n*_G ≥ maximal clique size of G
n*_G ≤ p

Theorem (Grone, Johnson, Sá & Wolkowicz, 1984). For chordal (i.e. triangulated) graphs, n*_G = maximal clique size of G.

Let G be non-chordal. Then:

n*_G ≥ maximal clique size of G
n*_G ≤ maximal clique size in a minimal chordal cover of G
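Chordality, which these bounds hinge on, can be tested efficiently; a small self-contained sketch using maximum cardinality search (pure Python; function and variable names are mine):

```python
def is_chordal(p, edges):
    """Chordality test via maximum cardinality search (MCS): a graph is
    chordal iff MCS produces a perfect elimination ordering."""
    adj = {v: set() for v in range(p)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    weight = {v: 0 for v in range(p)}
    order, numbered = [], set()
    for _ in range(p):
        # Pick an unnumbered vertex with the most numbered neighbors
        v = max((u for u in range(p) if u not in numbered), key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for u in adj[v] - numbered:
            weight[u] += 1
    # Tarjan-Yannakakis check that the MCS order yields a perfect
    # elimination ordering: earlier neighbors of v (minus the latest one w)
    # must all be adjacent to w.
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = {u for u in adj[v] if pos[u] < pos[v]}
        if earlier:
            w = max(earlier, key=lambda u: pos[u])
            if not (earlier - {w}) <= adj[w]:
                return False
    return True

print(is_chordal(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))          # 4-cycle: False
print(is_chordal(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # with chord: True
```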


Geometric View (figure)

Elimination Criterion

Theorem (Uhler, 2012). Let I_n be the ideal of (n + 1) × (n + 1) minors of a symmetric p × p matrix of unknowns S. Let I_{G,n} be the elimination ideal obtained from I_n by eliminating all unknowns corresponding to non-edges in the graph. If

I_{G,n} = 0,

then n*_G ≤ n.

I_n corresponds to all symmetric matrices of rank ≤ n

Elimination corresponds to projection onto S_G

I_{G,n} = 0 means that the projection is full-dimensional

3 × 3 grid

Theorem (Uhler, 2012). When G is the 3 × 3 grid, then n*_G = 3.

First example of a graph for which n*_G < maximal clique size in a minimal chordal cover
Solves an open problem posed by Steffen Lauritzen

Theorem (Gross and Sullivant, 2018). For any grid, n*_G = 3. Furthermore, for any planar graph, n* ≤ 4.

Computing the MLE

Convex optimization problem; can be solved e.g. using interior point methods or coordinate descent algorithms (often faster)

There is a closed-form formula for the MLE ⇐⇒ G is chordal (Lauritzen, 1996)

ML-degree: maximal number of solutions to the likelihood equations

There is a rational formula for the MLE (in the entries of S) ⇐⇒ ML-degree is 1 ⇐⇒ G is chordal (Sturmfels & Uhler, 2010)

Conjecture. The p-cycle maximizes the ML-degree over all graphs on p nodes and has ML-degree (p − 3)·2^(p−2) + 1.
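The closed-form formula for chordal G can be illustrated with the clique-separator decomposition of decomposable models (a sketch of the formula from Lauritzen, 1996; the path graph and simulated data are arbitrary choices):

```python
import numpy as np

def chordal_mle(S, cliques, separators):
    """Closed-form MLE K = Sigma^{-1} for a chordal graph:
    K = sum over cliques C of [(S_CC)^{-1}]^0 - sum over separators T of [(S_TT)^{-1}]^0,
    where [.]^0 embeds a block into a p x p matrix padded with zeros."""
    p = S.shape[0]
    K = np.zeros((p, p))
    for C in cliques:
        K[np.ix_(C, C)] += np.linalg.inv(S[np.ix_(C, C)])
    for T in separators:
        K[np.ix_(T, T)] -= np.linalg.inv(S[np.ix_(T, T)])
    return K

rng = np.random.default_rng(5)
p, n = 4, 100
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
S = X.T @ X / n

# Path 0 - 1 - 2 - 3: cliques {0,1},{1,2},{2,3}; separators {1},{2}
K_hat = chordal_mle(S, [[0, 1], [1, 2], [2, 3]], [[1], [2]])
Sigma_hat = np.linalg.inv(K_hat)

# The MLE matches S on all edges and the diagonal, and K_hat has the
# prescribed zeros on non-edges.
print(np.allclose(Sigma_hat[0, 1], S[0, 1]), np.allclose(K_hat[0, 3], 0))
```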



Alternative Approach: Sparsity Order of a Graph

S_G is pd completable if and only if ⟨S_G, X⟩ > 0 for all extremal X ∈ K_G

Knowledge of the extremal rays of K_G is useful for deciding pd completability

The sparsity order of a graph G is defined as

ord(G) = max{ rk(X) | X ∈ K_G extremal }

There should be strong connections between existence of the MLE, ML-degree and sparsity order of a graph, but these are still quite unclear (Solus, Uhler & Yoshida, 2016)

Sparsity Order of a Graph

ord(G) = 1 if and only if G is chordal (Agler et al., 1988)

If H is an induced subgraph of G, then ord(H) ≤ ord(G) (Agler et al., 1988)

If G is the clique sum of two graphs G1 and G2, then ord(G) = max{ord(G1), ord(G2)} (Helton et al., 1989)

ord(G) ≤ p − 2, with equality if and only if G is a p-cycle; the extremal ranks are 1 and p − 2 (Helton et al., 1989)

ord(K_{m,n}) = m² − m + 1 if n ≥ m² − m + 1, and ord(K_{m,n}) = n otherwise (Grone & Pierce, 1990); all ranks 1, ..., ord(K_{m,n}) are extremal

All graphs of order 2 have been characterized (Laurent, 2001)

Many many open problems...

References

Agler, Helton, McCullough & Rodman: Positive semidefinite matrices with a given sparsity pattern (Linear Algebra and its Applications 107, 1988)
Boyd & Vandenberghe: Convex Optimization (Cambridge University Press, 2004)
Dempster: Covariance selection (Biometrics 28, 1972)
Grone, Johnson, Sá & Wolkowicz: Positive definite completions of partial Hermitian matrices (Linear Algebra and its Applications 58, 1984)
Grone & Pierce: Extremal bipartite matrices (Linear Algebra and its Applications 131, 1990)
Gross & Sullivant: The maximum likelihood threshold of a graph (Bernoulli 24, 2018)
Helton, Pierce & Rodman: The ranks of extremal positive semidefinite matrices with given sparsity pattern (SIAM Journal on Matrix Analysis and Applications 10, 1989)
Laurent: On the sparsity order of a graph and its deficiency in chordality (Combinatorica 21, 2001)
Lauritzen: Graphical Models (Oxford University Press, 1996)
Solus, Uhler & Yoshida: Extremal positive semidefinite matrices whose sparsity pattern is given by graphs without K5 minors (Linear Algebra and its Applications 509, 2016)
Sturmfels & Uhler: Multivariate Gaussians, semidefinite matrix completion, and convex algebraic geometry (Annals of the Institute of Statistical Mathematics 62, 2010)
Uhler: Geometry of maximum likelihood estimation in Gaussian graphical models (Annals of Statistics 40, 2012)

Mini-course - Probabilistic Graphical Models: A Geometric, Algebraic and Combinatorial Perspective

Caroline Uhler

Lecture 3: Structure Learning in Undirected Graphical Models

CIMI Workshop on Computational Aspects of Geometry, Toulouse

November 7, 2019



(Undirected) Gaussian graphical models

X ∼ N(0, Σ), K := Σ^{−1}, p = nr. of variables, n = nr. of samples

Gaussian graphical model: (i, j) ∉ E if and only if K_ij = 0, if and only if X_i ⊥⊥ X_j | X_{V∖{i,j}}

Sample covariance matrix S is of rank min(n, p)

MLE:

K̂ = argmax { log det(K) − trace(SK) | K ≻ 0, K_ij = 0 ∀(i, j) ∉ E }

In general unbounded if n < p. Given a graph G, what is the minimal n such that this problem is bounded (i.e., the MLE exists)? → Geometric problem

Geometry

Geometric picture: concentration matrices vs. covariance matrices

π_G : projection onto the edge set;  S_G := π_G(S),  𝒮_G := π_G(S^p_{⪰0})

Note that 𝒮_G = K_G^∨

The MLE for S exists if and only if S_G ∈ int(𝒮_G)

Geometric Picture

The MLE exists for n samples if the projection of the manifold of rank-n psd matrices lies in the interior of the cone 𝒮_G [Uhler, arXiv:1707.04345]


Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 5 / 15 Structure learning in (undirected) graphical models

MLE: Kˆ = argmax log det(K) trace(SK) { − } Kˆ is dense even if n >> p

Graphical lasso: Kˆ = argmax log det(K) trace(SK) λ K λ { − − | |1} sparsistent for particular choice of λ (under certain assumptions) [Ravikumar, Wainwright, Raskutti & Yu, 2011]

Kˆλ is not monotone in λ: edges can disappear/appear for increasing λ [Fattahi & Sojoudi, 2019] Kˆλ is not invariant to rescaling

Additional approaches include: node-wise regression with the lasso (Meinshausen & B¨uhlmann,2006)

CLIME: constrained `1-based optimization (Cai, Liu & Luo, 2011) Algorithm with false discovery rate control (Liu, 2013) ROCKET: for heavy-tailed distributions (Foygel-Barber & Kolar, 2018) Conditional independence testing
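The first bullet can be checked numerically. Below is a minimal NumPy sketch, not from the slides: the chain graph and all parameter values are illustrative. It uses the stationarity condition of log det(K) − trace(SK), which gives the unpenalized MLE K̂ = S⁻¹, and shows that K̂ has no exact zeros even when the true precision matrix is sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 1000

# True precision matrix of a chain graph 1-2-3-4-5 (sparse, tridiagonal).
K_true = 2.0 * np.eye(p)
for i in range(p - 1):
    K_true[i, i + 1] = K_true[i + 1, i] = -0.5

# Draw n samples from N(0, K_true^{-1}) and form the sample covariance S.
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=n)
S = X.T @ X / n

# Stationarity of log det(K) - trace(SK) gives K^{-1} = S, so the MLE is:
K_hat = np.linalg.inv(S)

# Every entry of K_hat is nonzero: the MLE is dense, even though the
# true graph has only 4 edges.
print(np.count_nonzero(np.abs(K_hat) < 1e-10))  # 0
```

This is exactly why the sparsity-inducing penalty of the graphical lasso (or one of the other approaches listed above) is needed for structure learning.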

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 5 / 15 Motivation: Graphical models under positive dependence

How to model strong forms of positive dependence in data?

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    6 / 15

Positive dependence and MTP2 distributions

A distribution (i.e. density function) p on X = ∏_{v∈V} X_v, with X_v ⊆ R
discrete or open, is multivariate totally positive of order 2 (MTP2) if

    p(x) p(y) ≤ p(x ∧ y) p(x ∨ y)   for all x, y ∈ X,

where ∧ and ∨ are applied coordinate-wise.

Theorem (Fortuin-Kasteleyn-Ginibre inequality, 1971; Karlin & Rinott, 1980)
MTP2 implies positive association, i.e. cov{φ(X), ψ(X)} ≥ 0 for any
non-decreasing functions φ, ψ : R^m → R.

Theorem (FLSUWZ, 2017)
If p(x) > 0 and MTP2, then p(x) is faithful to an undirected graph.
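On a finite state space the defining inequality can be checked exhaustively. The sketch below is not from the slides; the two example probability tables are hypothetical. It tests MTP2 for a distribution on {0,1}^m stored as an m-dimensional array:

```python
from itertools import product
import numpy as np

def is_mtp2(p):
    """Check p(x)p(y) <= p(x ∧ y) p(x ∨ y) for all x, y on the binary cube,
    where p is an m-dimensional array indexed by tuples in {0,1}^m."""
    states = list(product([0, 1], repeat=p.ndim))
    for x in states:
        for y in states:
            lo = tuple(min(a, b) for a, b in zip(x, y))  # x ∧ y (coordinate-wise min)
            hi = tuple(max(a, b) for a, b in zip(x, y))  # x ∨ y (coordinate-wise max)
            if p[x] * p[y] > p[lo] * p[hi] + 1e-12:
                return False
    return True

# For two binary variables, MTP2 reduces to p00 * p11 >= p01 * p10.
p_pos = np.array([[0.4, 0.1], [0.1, 0.4]])   # positively dependent
p_neg = np.array([[0.1, 0.4], [0.4, 0.1]])   # negatively dependent
print(is_mtp2(p_pos), is_mtp2(p_neg))  # True False
```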

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 7 / 15 Sample distribution is MTP2! If you sample a correlation matrix uniformly 6 at random the probability of it being MTP2 is < 10− !

Gaussian MTP2 distributions

Theorem (Bølviken 1982, Karlin & Rinott, 1983)

A multivariate Gaussian distribution p(x; K) is MTP2 if and only if the inverse covariance matrix K is an M-matrix, that is K 0 for all u = v. uv ≤ 6

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 8 / 15 Sample distribution is MTP2! If you sample a correlation matrix uniformly 6 at random the probability of it being MTP2 is < 10− !

Gaussian MTP2 distributions

Theorem (Bølviken 1982, Karlin & Rinott, 1983)

A multivariate Gaussian distribution p(x; K) is MTP2 if and only if the inverse covariance matrix K is an M-matrix, that is K 0 for all u = v. uv ≤ 6

Ex: 2016 Monthly correlations of global stock markets (InvestmentFrontier.com) Nasdaq Canada Europe UK Australia  1.000 0.606 0.731 0.618 0.613  Nasdaq 0.606 1.000 0.550 0.661 0.598 Canada   S = 0.731 0.550 1.000 0.644 0.569  Europe  0.618 0.661 0.644 1.000 0.615  UK 0.613 0.598 0.569 0.615 1.000 Australia

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 8 / 15 Gaussian MTP2 distributions

Theorem (Bølviken 1982, Karlin & Rinott, 1983)

A multivariate Gaussian distribution p(x; K) is MTP2 if and only if the inverse covariance matrix K is an M-matrix, that is K 0 for all u = v. uv ≤ 6

Ex: 2016 monthly correlations of global stock markets (InvestmentFrontier.com) Nasdaq Canada Europe UK Australia  2.629 0.480 1.249 0.202 0.490 Nasdaq 0.480− 2.109 −0.039 −0.790 −0.459 Canada  − − − −  S−1 = 1.249 0.039 2.491 0.675 0.213 Europe  −0.202 −0.790 0.675− 2.378 −0.482 UK −0.490 −0.459 −0.213 0.482− 1.992 Australia − − − −

Sample distribution is MTP2! If you sample a correlation matrix uniformly 6 at random the probability of it being MTP2 is < 10− !
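The M-matrix condition can be verified directly from the correlation table above; here is a short NumPy check (the matrix is transcribed from the slide):

```python
import numpy as np

# 2016 monthly correlations: Nasdaq, Canada, Europe, UK, Australia (from the slide).
S = np.array([
    [1.000, 0.606, 0.731, 0.618, 0.613],
    [0.606, 1.000, 0.550, 0.661, 0.598],
    [0.731, 0.550, 1.000, 0.644, 0.569],
    [0.618, 0.661, 0.644, 1.000, 0.615],
    [0.613, 0.598, 0.569, 0.615, 1.000],
])

K = np.linalg.inv(S)  # inverse covariance (precision) matrix

# By Bolviken / Karlin-Rinott, the fitted Gaussian is MTP2 iff K is an
# M-matrix: positive diagonal and nonpositive off-diagonal entries.
off_diag = K[~np.eye(5, dtype=bool)]
print(bool(np.all(np.diag(K) > 0) and np.all(off_diag < 0)))  # True
```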

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    8 / 15

MTP2 constraints are often implicit

X is MTP2 in:
      ferromagnetic Ising models
      Markov chains with MTP2 transitions
      order statistics of i.i.d. variables
      Brownian motion tree models

|X| is MTP2 in:
      Gaussian / binary tree models
      Gaussian / binary latent tree models
      binary latent class models
      single factor analysis models

(Figure: gene-regulation schematic - lamin, transcription factors, RNA
polymerase, RNA transcript)

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    9 / 15

Negative dependence: What this talk is not about...

Analog of the FKG inequality does not hold: negative association, i.e.
cov{φ(X), ψ(X)} ≤ 0 for any non-decreasing functions φ, ψ, is not implied by
p(x)p(y) ≥ p(x ∧ y)p(x ∨ y) for all x, y.

See Pemantle (1999): Towards a Theory of Negative Association

Strongly Rayleigh measures: sufficient for conditionally negative association
[Borcea, Brändén & Liggett, 2009]

Recently used in various machine learning applications to enforce diversity,
e.g. recommender systems, neural network sparsification, matrix sketching,
diversity priors; see the NeurIPS 2018 tutorial by Jegelka & Sra

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    10 / 15

ML Estimation for Gaussian MTP2 distributions

Let S be the sample covariance matrix. Then maximum likelihood estimation is
a convex optimization problem:

Primal (max-likelihood):
    maximize    log det(K) − trace(KS)
    subject to  K ⪰ 0,  K_uv ≤ 0 for all u ≠ v.

Dual (entropy):
    minimize    −log det(Σ) − p
    subject to  Σ ⪰ 0,  Σ_vv = S_vv,  Σ_uv ≥ S_uv for all u ≠ v.

Theorem (Slawski & Hein, 2015)
The MLE in a Gaussian MTP2 model exists with probability 1 when n ≥ 2.
New proof: 3 lines using ultrametrics [Lauritzen, U. & Zwiernik, 2019]

Theorem (Wang, Roy & U., 2019)
Graphical model inference by testing the signs of the empirical partial
correlation coefficients is consistent in the high-dimensional setting without
the need of any tuning parameter. With ℓ1-penalty, the resulting estimator is
monotone.

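The empirical partial correlations on which the sign test is based are obtained by rescaling the precision matrix: ρ_uv = −K_uv / √(K_uu K_vv). Below is a small sketch with hypothetical chain-graph data; note the estimator here is the plain unconstrained MLE, not the MTP2-constrained estimator of the theorem:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 4, 50_000

# Hypothetical truth: chain 1-2-3-4, so (1,3), (1,4), (2,4) are non-edges.
K = np.array([[ 2.0, -0.6,  0.0,  0.0],
              [-0.6,  2.0, -0.6,  0.0],
              [ 0.0, -0.6,  2.0, -0.6],
              [ 0.0,  0.0, -0.6,  2.0]])
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K), size=n)
K_hat = np.linalg.inv(X.T @ X / n)

# Empirical partial correlations: rho_uv = -K_uv / sqrt(K_uu * K_vv).
d = np.sqrt(np.diag(K_hat))
rho = -K_hat / np.outer(d, d)

# Chain edges have clearly positive partial correlation (about 0.3 here),
# while non-edges have partial correlation fluctuating near 0.
print(round(rho[0, 1], 1), round(abs(rho[0, 2]), 1))
```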
Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 11 / 15 Application: Portfolio selection [Agrawal, Roy & U., 2019]

Daily stock return data from the Center for Research in Security Prices (CRSP) between 1975-2015 (NYSE, AMEX & NASDAQ stock exchanges).

M              T                  Linear     Approximate    EW-TQ   MTP2
(nr. assets)   (lookback period)  Shrinkage  Factor Model
100      25      0.694   0.710   0.730   0.803
100      50      0.694   0.625   0.637   0.849
100     100      0.694   0.600   0.617   0.896
100     200      0.694   0.670   0.688   0.899
100     400      0.694   0.736   0.782   0.892
100    1260      0.694   0.831   0.834   0.890
200      50      0.757   0.742   0.726   0.853
200     100      0.757   0.719   0.716   0.829
200     200      0.757   0.812   0.800   0.885
200     400      0.757   0.864   0.870   0.886
200     800      0.757   0.967   0.961   0.970
200    1260      0.757   0.906   0.916   0.955
500     125      0.764   0.876   0.872   1.019
500     250      0.764   0.985   0.977   1.112
500     500      0.764   0.940   0.980   1.045
500    1000      0.764   0.918   0.978   1.061

Information ratio (ratio of average return to standard deviation of returns)
when weights are estimated based on the "full" Markowitz portfolio problem.

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    12 / 15

Conclusions

Graphical models combine graph theory with probability theory into a powerful framework for multivariate statistical modeling

Total positivity constraints are often implicit and reflect real processes:
ferromagnetism, latent tree models

MTP2 implies faithfulness

MTP2 is well-suited for high-dimensional applications (also in non-parametric setting, see our recent work)

Explicit MTP2 constraints enhance interpretability of graphical models (induce sparsity without the need of a tuning parameter)

MTP2 distributions not only have broad applications (finance, psychology, genomics), but also lead to beautiful theory (exponential families, convexity, combinatorics, semialgebraic geometry)

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 13 / 15 References: Graphical models

Bartlett (1935). Contingency table interactions.

Cai, Liu & Luo (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation.
Fattahi & Sojoudi (2019). Graphical lasso and thresholding: Equivalence and closed-form solutions.
Foygel-Barber & Kolar (2018). ROCKET: Robust confidence intervals via Kendall's tau for transelliptical graphical models.
Friedman, Hastie & Tibshirani (2007). Sparse inverse covariance estimation with the graphical lasso.
Gibbs (1902). Elementary Principles in Statistical Mechanics, Yale Univ. Press.
Lauritzen, S. L. (1996). Graphical Models, Clarendon Press.
Liu (2013). Gaussian graphical model estimation with false discovery rate control.
Meinshausen & Bühlmann (2006). High-dimensional graphs and variable selection with the lasso.
Ravikumar, Wainwright, Raskutti & Yu (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence.
Uhler (2017). Gaussian graphical models: An algebraic and geometric perspective.
Wright, S. (1921). Correlation and causation. J. Agricult. Res., 20:557-585.

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    14 / 15

References: Total positivity

Agrawal, Roy & Uhler (2019). Covariance matrix estimation under total positivity for portfolio selection.
Bølviken (1982). Probability inequalities for the multivariate normal with non-negative partial correlations.
Fallat, Lauritzen, Sadeghi, Uhler, Wermuth & Zwiernik (2017). Total positivity in Markov structures.
Fortuin, Kasteleyn & Ginibre (1971). Correlation inequalities on some partially ordered sets.
Karlin & Rinott (1983). M-matrices as covariance matrices of multinormal distributions.
Lauritzen, Uhler & Zwiernik (2019). Maximum likelihood estimation in Gaussian models under total positivity.
Lauritzen, Uhler & Zwiernik (2019). Total positivity in structured binary distributions (arXiv:1905.00516).
Lebowitz (1972). Bounds on the correlations and analyticity properties of ferromagnetic Ising spin systems.
Slawski & Hein (2014). Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields.
Wang, Roy & Uhler (2019). Learning high-dimensional Gaussian graphical models under total positivity without adjustment of tuning parameters (arXiv:1906.05159).

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    15 / 15

Mini-course - Probabilistic Graphical Models: A Geometric, Algebraic and Combinatorial Perspective

Caroline Uhler

Lecture 4: Causal Structure Discovery

CIMI Workshop on Computational Aspects of Geometry Toulouse

November 7, 2019

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    1 / 20

Causal inference

Framework for causal inference from observational data (structural equation
models) developed in the 1920s by J. Neyman and S. Wright
Skepticism amongst statisticians halted the developments for 50 years
Reemergence in the 1970s after major contributions by J. Pearl (CS),
J. Robins (epidemiology), D. Rubin (stats) & P. Spirtes (philosophy)

Interaction between genetics and causal inference could be particularly
beneficial:
      Geneticists can perform interventional experiments relatively easily
      Drop-seq and Perturb-seq: high-throughput (100,000-1 mio single-cell
      measurements on all 20,000 genes per experiment) observational and
      interventional single-cell RNA-seq data is now available

Unique data and challenges!

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 2 / 20 Gene expression data - single-cell RNA-seq

Causal inference using both observational and interventional data

(Figure: a causal network, with observational and interventional expression
heatmaps of ~20,000 genes across ~100,000 single cells)

Perturb-seq: large-scale gene expression data after interventions is
available; high-throughput observational and interventional single-cell
RNA-seq data [Dixit et al., 2016]

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 3 / 20

Structural equation models

Introduced by Sewall Wright in the 1920s

Represent causal relationships by a directed acyclic graph (DAG);
DAGs encode conditional independence relations

Each node is associated with a random variable; stochasticity is introduced
by independent noise variables ε_i:

    X1 = f1(X3, ε1)
    X2 = f2(X1, ε2)
    X3 = f3(ε3)
    X4 = f4(X2, X3, ε4)

A structural equation model also defines interventional distributions:

    Perfect (hard) intervention on X2:  X2 = c
    General intervention on X2:  X2 = f̃2(X1, ε̃2)

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    4 / 20
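The structural equations above can be simulated directly. The sketch below fixes hypothetical linear choices for f1, ..., f4 (the slides leave them unspecified) and contrasts the observational mechanism for X2 with a perfect intervention do(X2 = c):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def sample(do_x2=None):
    """Simulate the DAG X3 -> X1 -> X2 -> X4, X3 -> X4 with independent
    Gaussian noise; the linear structural functions are hypothetical."""
    eps = rng.normal(size=(4, n))
    x3 = eps[2]                              # X3 = f3(eps3)
    x1 = 0.8 * x3 + eps[0]                   # X1 = f1(X3, eps1)
    if do_x2 is None:
        x2 = 0.5 * x1 + eps[1]               # X2 = f2(X1, eps2)
    else:
        x2 = np.full(n, do_x2)               # perfect intervention: X2 = c
    x4 = 0.7 * x2 - 0.3 * x3 + eps[3]        # X4 = f4(X2, X3, eps4)
    return x1, x2, x3, x4

x1_obs, _, _, x4_obs = sample()
x1_do, _, _, x4_do = sample(do_x2=2.0)

# do(X2 = 2) shifts the downstream variable X4 (E[X4] moves by 0.7 * 2 = 1.4)
# but leaves the non-descendant X1 unchanged in distribution.
print(round(x4_do.mean() - x4_obs.mean(), 1))       # ≈ 1.4
print(round(abs(x1_do.mean() - x1_obs.mean()), 1))  # ≈ 0.0
```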

Markov equivalence classes on 3 nodes & talk overview

Markov equivalence: different DAGs can encode same conditional independence relations (through factorization of the joint distribution)

Interventional Markov equivalence classes?
      How do they depend on the type of intervention? Do perfect
      interventions provide smaller equivalence classes than imperfect
      interventions?
      Algorithms for learning the interventional Markov equivalence class?

Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    5 / 20

Hauser and Bühlmann (2012): characterized ℐ-Markov equivalence classes
under perfect interventions: an edge is orientable if it is
      orientable from observational data
      adjacent to an intervened node

Theorem (Yang, Katcoff & Uhler, ICML 2018)
The ℐ-Markov equivalence classes under perfect and imperfect interventions
are the same.

Proof: By introducing & providing a graphical criterion for the ℐ-Markov
property for ℐ-DAGs.

Definition 5 (Intervention graph). Let D = ([p], E) be a DAG with vertex set
[p] and edge set E, and I ⊂ [p] an intervention target. The intervention
graph of D is the DAG D^(I) = ([p], E^(I)), where
E^(I) := {(a, b) | (a, b) ∈ E, b ∉ I}.
For a causal model (D, f), an interventional density f(· | do(X_I = U_I))
obeys the Markov property of D^(I): the Markov property of the observational
density is inherited.

We are interested in causal inference from data sets originating from
multiple interventions, i.e. from an intervention setting
S = {(I_j, f̃_j)}_{j=1}^J, where I_j ⊂ [p] is an intervention target and f̃_j
a level density on X_{I_j}; the corresponding (multi)set of intervention
targets ℐ = {I_j}_{j=1}^J is called a family of targets. Interventional data
of sample size n consists of independent, but not identically distributed,
samples X^(1), ..., X^(n), where X^(i) is drawn from
f(· | do(X_{T^(i)} = U^(i))) with U^(i) ~ f̃^(i) and T^(i) the intervention
target under which X^(i) was produced; each target I ∈ ℐ appears at least
once. The data set can also contain observational data, namely if ∅ ∈ ℐ.

To be able to estimate at least the complete skeleton of a causal structure
(as in the observational case), the intervention experiment has to be based
on a conservative family of targets:

Definition 6 (Conservative family of targets). A family of targets ℐ is
called conservative if for all a ∈ [p], there is some I ∈ ℐ such that a ∉ I.

Interventional Markov equivalence class: Let ℐ be a set of intervention
targets.  Ex: Perfect interventions, ℐ = {∅, {4}, {3, 5}}

(Figure 1: A DAG D on vertices 1-7 and the corresponding intervention graphs
D^({4}) and D^({3,5}).)
Caroline Uhler    Mini-course: Graphical Models    Toulouse, Nov 2019    6 / 20

XX(1)X(1),X(1),X(2),X(2),...,X(2),...,X,...,X(n()nindependent,)(nindependent,) independent, (i)(i)(i) (i)(i)(i) ˜ XXX f f f dodoDdo(X(X(X ==U=U(iU)()i)(),Ui) ),U,U(i)(i)(i)f f˜(if)˜(,ii)(,ii) ,i=1=1=1,...,n,...,n,...,n , , , (3)(3)(3) ∼∼∼·|·|·| D DT (Ti)(Ti)(i) T T T T T T∼∼∼T T T andandand we we assumewe assume assume that that that each each each' target' target' targetI I I appearsappearsappears at at( least at( least( least once once once in in the in the the sequence sequence sequence. . . ∈∈I∈I I T T T 2.22.22.2 Interventional Interventional Interventional Markov Markov Markov Equivalence: Equivalence: Equivalence: New New New Concepts Concepts Concepts and and and Re Results Resultssults AnAnAn intervention intervention intervention at at some at some some target target targeta a a[p[]destroystheoriginalcausalinfluenceofothervariablesp[]destroystheoriginalcausalinfluenceofothervariablesp]destroystheoriginalcausalinfluenceofothervariables ∈∈∈ ofof theof the the system system system on onX onXa.InterventionaldatathereofcanhencenotbeusedtodetermXa.Interventionaldatathereofcanhencenotbeusedtodeterma.Interventionaldatathereofcanhencenotbeusedtodetermineineine the the the causal causal causal parentsparentsparents of ofX ofXa Xina ina thein the the (undisturbed) (undisturbed) (undisturbed) system. system. system. To To be To be able be able able to to estimate to estimate estimate at at least at least least the the t completehe complete complete skeleton skeleton skeleton of of of acausalstructure(asintheobservationalcase),anintervacausalstructure(asintheobservationalcase),anintervacausalstructure(asintheobservationalcase),aninterventionentionention experiment experiment experiment has has has to to be to be performed be performed performed basedbasedbased on on a on aconservative aconservativeconservativefamilyfamilyfamily of of targets: of targets: targets:

DefinitionDefinitionDefinition 6 6 (Conservative 6 (Conservative (Conservative family family family of of targets) of targets) targets)AfamilyoftargetsAfamilyoftargetsAfamilyoftargetsisis calledis called calledconservativeconservativeconservative I I I ifif forif for forall alla alla a[p[],p[], therep], there there is is some is some someI I I suchsuchsuch that that thata/a/a/I.I.I. Interventional∈∈∈ Markov∈∈I∈I I equivalence∈∈∈ class InIn thisIn this this paper, paper, paper, we we restrictwe restrict restrict our our our considerations considerations considerations to toconservative toconservativeconservativefamiliesfamiliesfamilies of of targets; of targets; targets; see see see Section Section Section 2.3 2.3 2.3 for for for Let be a set of intervention targets amoredetaileddiscussion.Notethateveryexperimentinwhamoredetaileddiscussion.Notethateveryexperimentinwhamoredetaileddiscussion.NotethateveryexperimentinwhI ichichich we we wealso also also measure measure measure observational observational observational datadatadata corresponds corresponds corresponds to to a to aconservative aconservative conservative family family family of of targets. of targets. targets. Ex: Perfect interventions = , 4 , 3, 5 I {∅ { } { }} 123412341234123412341234123412341234

567567567 567567567 567567567 (a) D ( 4( )4( )4 ) ( 3(,53(,)53,)5 ) (a)(a)DD (b)(b)D(b)D{ D(}{ 4}{ )} (c)(c)D(c)(D{3D,{5}{)} } (a) G ∅ (b) G { } (c) G { }

( 4( 4)( )4 ) ( 3(,53(,5)3,)5 ) FigureFigureFigure 1: 1: A 1: A DAG A DAG DAGDDandDandand the the the corresponding corresponding corresponding intervention intervention intervention graphs graphs graphsDD{D} andandandDD{D } . . . Hauser and B¨uhlmann (2012): characterized -Markov{ }{ equivalence} { { } } classes under perfect interventions: an edge isI orientable if it is 4 4 4 orientable from observational data adjacent to an intervened node

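Definition 5 is mechanical enough to state in code. A minimal sketch; the edge list below is illustrative only (the exact DAG of Figure 1 is not reproduced here):

```python
def intervention_graph(edges, target):
    """Definition 5: the intervention graph keeps exactly the edges (a, b)
    whose head b is not intervened on (b not in I)."""
    return [(a, b) for (a, b) in edges if b not in target]

# Hypothetical DAG on [5], standing in for the DAG of Figure 1.
edges = [(1, 2), (2, 3), (3, 4), (2, 5)]
print(intervention_graph(edges, {4}))      # [(1, 2), (2, 3), (2, 5)]
print(intervention_graph(edges, {3, 5}))   # [(1, 2), (3, 4)]
```

All edges pointing into an intervened vertex are cut; everything else is untouched.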
Theorem (Yang, Katcoff & Uhler, ICML 2018): The I-Markov equivalence classes under perfect and imperfect interventions are the same.

Proof: By introducing and providing a graphical criterion for the I-Markov property for I-DAGs.

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 6 / 20

Algorithms for learning causal graphs

There are two main types of algorithms for learning causal graphs from observational data:

Constraint-based: treat causal search as a constraint satisfaction problem; constraints given by conditional independence relations; main example: PC algorithm [Spirtes, Glymour & Scheines, 2001]

Properties: very fast, with consistency guarantees (with prob. 1 as n → ∞), require large sample size, tend to miss edges
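The conditional independence constraints that such methods test can, for Gaussian data, be checked through partial correlations. A small sketch on a hypothetical 3-variable chain (coefficients are illustrative; this is the test primitive, not the PC algorithm itself):

```python
import numpy as np

# Structural equations of a chain 1 -> 2 -> 3 with unit-variance noise:
# X1 = e1, X2 = 2 X1 + e2, X3 = 1.5 X2 + e3 (illustrative coefficients).
B = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.5, 0.0]])
M = np.linalg.inv(np.eye(3) - B)
Sigma = M @ M.T                     # population covariance of X

def partial_corr(Sigma, i, j, S):
    """Partial correlation of X_i and X_j given X_S from the covariance matrix;
    it vanishes exactly when X_i _||_ X_j | X_S in the Gaussian case."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(Sigma[np.ix_(idx, idx)])  # precision of the submatrix
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

print(abs(partial_corr(Sigma, 0, 2, [1])) < 1e-9)   # True: X1 _||_ X3 | X2
print(round(partial_corr(Sigma, 0, 2, []), 3))      # 0.857: marginally dependent
```

With finite samples one would replace Sigma by the sample covariance and apply Fisher's z-transform to calibrate the test.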

Score-based: maximize a score (e.g. BIC) of a Markov equivalence class with respect to a data set by greedy search; main example: Greedy Equivalence Search (GES) [Chickering, 2002]

Properties: higher accuracy for same sample size, huge search space, theoretical consistency guarantees
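For intuition, a sketch of the Gaussian BIC score that such a greedy search might maximize, computed through node-wise regressions. This is an illustrative scoring function under assumed linear-Gaussian mechanisms, not the GES implementation:

```python
import numpy as np

def gaussian_bic(X, parents):
    """BIC of a Gaussian DAG model: node-wise linear regressions on the parents,
    log-likelihood minus 0.5 * (#parameters) * log(n). parents[j] lists pa(j)."""
    n, p = X.shape
    score = 0.0
    for j in range(p):
        pa = parents[j]
        if pa:
            A = np.column_stack([X[:, pa], np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ coef
        else:
            resid = X[:, j] - X[:, j].mean()
        s2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * s2) + 1)
        score += loglik - 0.5 * (len(pa) + 2) * np.log(n)  # coefs + intercept + variance
    return score

rng = np.random.default_rng(1)
x1 = rng.normal(size=2000)
x2 = 2.0 * x1 + rng.normal(size=2000)   # true structure: 1 -> 2
X = np.column_stack([x1, x2])
print(gaussian_bic(X, {0: [], 1: [0]}) > gaussian_bic(X, {0: [], 1: []}))  # True
```

The true structure wins because the likelihood gain from the parent far exceeds the log(n) penalty for one extra coefficient.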

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 7 / 20

Limitation of score-based approaches

(Table from Gillispie & Perlman (2001): counts of DAGs versus counts of Markov equivalence classes for small numbers of nodes.)

Problem of enumerating Markov equivalence classes and their sizes leads to hard and beautiful combinatorics problems, e.g.: formula for the number of equivalence classes on p nodes? Average size of equivalence classes?

[Radhakrishnan, Solus & Uhler, UAI 2017]
[Katz-Rogozhnikov, Shanmugam, Squires & Uhler, AISTATS 2019]

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 8 / 20
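The small cases can be enumerated directly. A sketch for p = 3, using the characterization of a Markov equivalence class by its skeleton and v-structures (Verma & Pearl, 1990):

```python
from collections import deque
from itertools import combinations, product

def is_acyclic(edges, p=3):
    """Kahn's algorithm: the digraph is acyclic iff every vertex gets processed."""
    indeg = {v: 0 for v in range(p)}
    for _, b in edges:
        indeg[b] += 1
    queue = deque(v for v in range(p) if indeg[v] == 0)
    seen = 0
    while queue:
        v = queue.popleft()
        seen += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == p

def mec_label(edges):
    """A Markov equivalence class is determined by skeleton + v-structures."""
    skeleton = frozenset(frozenset(e) for e in edges)
    v_structs = frozenset((a, b, c) for (a, c) in edges for (b, d) in edges
                          if c == d and a < b and frozenset((a, b)) not in skeleton)
    return (skeleton, v_structs)

pairs = list(combinations(range(3), 2))
dags = []
for choice in product(range(3), repeat=len(pairs)):  # 0 = no edge, 1 = a->b, 2 = b->a
    edges = tuple((a, b) if c == 1 else (b, a)
                  for (a, b), c in zip(pairs, choice) if c)
    if is_acyclic(edges):
        dags.append(edges)

print(len(dags))                               # 25 DAGs on 3 labeled nodes
print(len({mec_label(e) for e in dags}))       # 11 Markov equivalence classes
```

Already at p = 3 the average class size is 25/11; the papers cited above study how these quantities behave as p grows.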

Limitation of constraint-based approaches

Constraint-based methods require the faithfulness assumption:

(i, j) ∈ E ⇐⇒ Xi ⊥̸⊥ Xj | XS  ∀ S ⊂ V \ {i, j}

[Zhang & Spirtes, 2003]

Ex (figure): Medicine → Lung directly and Medicine → Immune System → Lung, with effects of opposite sign along the two paths. Faithfulness means that causal effects cannot cancel out!

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 9 / 20

Unfaithful distributions: 3-node example

X1 = ε1
X2 = a12 X1 + ε2            ⇒  X ∼ N(0, Σ) with Σ^{-1} = (I − A)(I − A)^T, ε ∼ N(0, I)
X3 = a13 X1 + a23 X2 + ε3

Faithfulness is NOT satisfied if any of the following relations hold:

X1 ⊥⊥ X2       ⇐⇒  det((Σ^{-1})_{13,23}) = a12 = 0
X1 ⊥⊥ X3       ⇐⇒  det((Σ^{-1})_{12,23}) = a13 + a12 a23 = 0
X2 ⊥⊥ X3       ⇐⇒  det((Σ^{-1})_{12,13}) = a12^2 a23 + a12 a13 + a23 = 0
X1 ⊥⊥ X2 | X3  ⇐⇒  det((Σ^{-1})_{1,2}) = a13 a23 − a12 = 0
X1 ⊥⊥ X3 | X2  ⇐⇒  det((Σ^{-1})_{1,3}) = −a13 = 0
X2 ⊥⊥ X3 | X1  ⇐⇒  det((Σ^{-1})_{2,3}) = −a23 = 0

⇒ Faithfulness not satisfied on a collection of hypersurfaces in R^{|E|}
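The cancellation can be checked numerically. A sketch that places the parameters on the hypersurface a13 + a12 a23 = 0 (the coefficient values are illustrative):

```python
import numpy as np

# Choose a13 on the hypersurface a13 + a12*a23 = 0: then X1 _||_ X3
# even though the edge 1 -> 3 is present (a13 != 0).
a12, a23 = 2.0, 3.0
a13 = -a12 * a23

B = np.zeros((3, 3))
B[1, 0], B[2, 0], B[2, 1] = a12, a13, a23   # X2 = a12 X1 + e2; X3 = a13 X1 + a23 X2 + e3
M = np.linalg.inv(np.eye(3) - B)
Sigma = M @ M.T                              # covariance for unit-variance noise

print(abs(Sigma[0, 2]) < 1e-9)   # True: Cov(X1, X3) = a13 + a12*a23 = 0
print(Sigma[0, 1])               # = a12: X1 and X2 remain dependent
```

A constraint-based algorithm fed this distribution would wrongly delete the edge 1 → 3, since the marginal independence X1 ⊥⊥ X3 holds exactly.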

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 10 / 20
3-node example continued [Uhler, Raskutti, Bühlmann & Yu, Ann. Stat. 2013]

For consistency of constraint-based algorithms the data has to be bounded away from these hypersurfaces by √(log(p)/n)

For high-dimensional consistency: pn = o(log(pn))

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 11 / 20
Alternative approach: Permutation-based searches

Idea: DAG defined by ordering of vertices (permutation) and skeleton

For p = 10 the search space is of size 10! = 3,628,800 permutations versus ~10^18 DAGs

For each permutation π construct a DAG Gπ = (V , Eπ) by

(π(i), π(j)) ∈ Eπ  ⇐⇒  Xπ(i) ⊥̸⊥ Xπ(j) | X{π(1),...,π(i−1),π(i+1),...,π(j−1)}

Greedy search for sparsest permutation Gπ∗ (GSP) is consistent under strictly weaker conditions than faithfulness [Mohammadi, Uhler, Wang & Yu, SIAM J. Discr. Math., 2018] [Solus, Wang, Matejovicova & Uhler, arXiv:1702.03530]

Edges in the polytope of permutations (i.e., the permutohedron) connect permutations that differ by a neighboring transposition, e.g. (3, 1, 4, 2) − (3, 4, 1, 2)
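The construction of Gπ can be sketched directly against a CI oracle. Here the oracle is the hypothetical set of CI relations from the 4-node Greedy SP example on the following slide:

```python
from itertools import combinations

# Hypothetical CI oracle: 1 _||_ 2, 1 _||_ 4 | 3, 1 _||_ 4 | {2,3},
# 2 _||_ 4 | 3, 2 _||_ 4 | {1,3}.
CI = {(1, 2, frozenset()), (1, 4, frozenset({3})), (1, 4, frozenset({2, 3})),
      (2, 4, frozenset({3})), (2, 4, frozenset({1, 3}))}

def ci(a, b, S):
    """Symmetric lookup in the CI oracle."""
    return (a, b, frozenset(S)) in CI or (b, a, frozenset(S)) in CI

def dag_from_permutation(pi):
    """Edge pi(i) -> pi(j) for i < j unless the oracle separates the pair
    given all vertices preceding pi(j) except pi(i)."""
    edges = []
    for i, j in combinations(range(len(pi)), 2):
        S = set(pi[:j]) - {pi[i]}
        if not ci(pi[i], pi[j], S):
            edges.append((pi[i], pi[j]))
    return edges

print(dag_from_permutation((1, 2, 3, 4)))   # [(1, 3), (2, 3), (3, 4)]
```

Every permutation yields some DAG consistent with the ordering; the sparsest-permutation principle scores each permutation by the number of edges of Gπ.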

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 12 / 20

Greedy SP algorithm [Mohammadi, Uhler, Wang & Yu, 2018]

CI relations: 1 ⊥⊥ 2, 1 ⊥⊥ 4 | 3, 1 ⊥⊥ 4 | {2, 3}, 2 ⊥⊥ 4 | 3, 2 ⊥⊥ 4 | {1, 3}

(Figure: greedy walk on the permutohedron over 4-node DAGs Gπ; the permutations (4,2,1,3), (2,4,3,1), (2,1,4,3), (1,2,4,3) and (2,4,1,3) are connected by the transpositions (2,4), (4,1) and (1,3).)
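A brute-force sparsest-permutation (SP) search over all 4! orderings makes the target of the greedy walk concrete; the same hypothetical CI oracle is assumed (GSP instead moves along permutohedron edges, only re-scoring neighboring permutations):

```python
from itertools import combinations, permutations

# Hypothetical CI oracle from the example above.
CI = {(1, 2, frozenset()), (1, 4, frozenset({3})), (1, 4, frozenset({2, 3})),
      (2, 4, frozenset({3})), (2, 4, frozenset({1, 3}))}

def ci(a, b, S):
    return (a, b, frozenset(S)) in CI or (b, a, frozenset(S)) in CI

def dag_from_permutation(pi):
    """DAG Gpi: edge pi(i) -> pi(j), i < j, unless separated by the oracle."""
    return [(pi[i], pi[j]) for i, j in combinations(range(len(pi)), 2)
            if not ci(pi[i], pi[j], set(pi[:j]) - {pi[i]})]

# Score every permutation by the number of edges of G_pi.
scores = {pi: len(dag_from_permutation(pi)) for pi in permutations((1, 2, 3, 4))}
best = min(scores.values())
best_perms = sorted(pi for pi, s in scores.items() if s == best)
print(best)         # 3
print(best_perms)   # [(1, 2, 3, 4), (2, 1, 3, 4)]
```

The two sparsest permutations are exactly the topological orderings of the DAG 1 → 3 ← 2, 3 → 4 that generated the CI relations.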

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 13 / 20

Learning the interventional Markov equivalence class

GIES: perfect intervention adaptation of GES [Hauser & Bühlmann, 2012]

In general not consistent [Wang, Solus, Yang & Uhler, NIPS 2017]

IGSP: interventional adaptation of GSP: provably consistent algorithm that can deal with interventional data

for perfect interventions [Wang, Solus, Yang & Uhler, NIPS 2017]

for general interventions [Yang, Katcoff & Uhler, ICML 2018]

Note: While for perfect interventions it is sufficient to perform conditional independence tests, for general interventions we need to test whether a conditional distribution is invariant to the interventions

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 14 / 20

Protein signaling network [Yang, Katcoff & Uhler, 2018]

Protein signaling network described by Sachs et al. (2005); 7466 measurements of the abundance of phosphoproteins and phospholipids recorded under different interventional experiments.

(Figure: signaling network over Raf, Mek, Erk, Akt, PKA, PKC, Jnk, P38, Plcγ, PIP2, PIP3.)

(a) Directed edge recovery (b) Skeleton recovery

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 15 / 20

Perturb-seq data [Yang, Katcoff & Uhler, 2018]

After preprocessing: 992 observational samples and 13,435 interventional samples from 8 gene deletions; analyzed 24 genes of interest

Predicted effect of each intervention when leaving out that data

Much work remains to be done to deal with zero-inflated data, off-target intervention effects, and latent variables; see our recent work [arXiv:1906.00928, 1910.09014, 1910.09007]

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 16 / 20

Tractable strategy to select interventions in batches under budget constraints for causal inference, with provable guarantees on both approximation and optimization quality based on submodularity [Agrawal, Squires, Yang, Shanmugam & Uhler, AISTATS 2019]

Causal inference and genomics

• Often interested in the difference of regulatory networks, e.g. between different cell types, normal and diseased states, etc.; learn the difference directly without estimating each network separately! [Wang, Squires, Belyaeva & Uhler, NeurIPS 2018]

(Figures) Difference network of naïve versus activated T-cells (estimated from single-cell RNA-seq); difference network of ovarian cancer cells from 2 patient cohorts with different survival rates

• If interested in learning a collection of regulatory networks, e.g. for different cell types, learn them jointly, since these networks are related! [Wang, Segarra & Uhler, arXiv:1804.00778]

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 17 / 20

Statistical-computational trade-off

Open problem: Characterize the statistical-computational trade-off that is inherent to causal inference

(Figure: PC, GES, GSP and SP placed according to the strength of their statistical assumptions versus their computation time.)

What is the optimal algorithm for unlimited computation time? (Conjecture: SP algorithm)
How much weaker than faithfulness are SMR (necessary and sufficient assumption for SP) or the triangle-faithfulness assumption (only violations that are undetectable)?
What is the optimal trade-off curve?

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 18 / 20

References

Peters: Causality script (http://web.math.ku.dk/~peters/jonas_files/scriptChapter1-4.pdf)
Pearl: The Book of Why (Basic Books, 2018)

Wright: Correlation and causation (J. Agricult. Res. 10, 1921)
Neyman: Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes (Roczniki Nauk Rolniczych 10, 1923)
Spirtes, Glymour & Scheines: Causation, Prediction and Search (MIT Press, 2001)
Pearl: Causality: Models, Reasoning and Inference (Cambridge University Press, 2000)
Rubin: Causal inference using potential outcomes (J. Am. Stat. Ass. 100, 2005)
Robins: Association, causation, and marginal structural models (Synthese 121, 1999)

Verma & Pearl: Equivalence and synthesis of causal models (UAI, 1990)
Gillispie & Perlman: Enumerating Markov equivalence classes of acyclic digraph models (UAI, 2001)
Radhakrishnan, Solus & Uhler: A combinatorial perspective of Markov equivalence classes for DAG models (UAI, 2017)
Katz, Shanmugam, Squires & Uhler: Size of interventional Markov equivalence classes in random DAG models (AISTATS, 2019)

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 19 / 20

References

Zhang & Spirtes: The three faces of faithfulness (Synthese 193, 2016)
Uhler, Raskutti, Bühlmann & Yu: Geometry of faithfulness assumption in causal inference (Ann. Statist. 41, 2013)

Chickering: Optimal structure identification with greedy search (J. Mach. Learn. Res. 3, 2002)
Friedman & Koller: Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks (Mach. Learn. 50, 2003)
Mohammadi, Uhler, Wang & Yu: Generalized permutohedra from probabilistic graphical models (SIAM J. Discr. Math. 32, 2018)
Agrawal, Broderick & Uhler: Minimal I-MAP MCMC for scalable structure discovery in causal DAG models (ICML, 2018)

Hauser & Bühlmann: Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs (J. Royal Stat. Soc. 77, 2015)
Wang, Solus, Yang & Uhler: Permutation-based causal inference algorithms with interventions (NIPS, 2017)
Yang, Katcoff & Uhler: Characterizing and learning equivalence classes of causal DAGs under interventions (ICML, 2018)
Eberhardt, Glymour & Scheines: On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables (UAI, 2005)
Agrawal, Squires, Yang, Shanmugam & Uhler: ABCD-Strategy: Budgeted experimental design for targeted causal structure discovery (AISTATS, 2019)

Caroline Uhler Mini-course: Graphical Models Toulouse, Nov 2019 20 / 20