EXACT FORMULAS FOR THE NORMALIZING CONSTANTS OF WISHART DISTRIBUTIONS FOR GRAPHICAL MODELS

By Caroline Uhler∗, Alex Lenkoski†, and Donald Richards‡ Massachusetts Institute of Technology∗, Norwegian Computing Center†, and Penn State University‡

Gaussian graphical models have received considerable attention during the past four decades from the statistical and machine learning communities. In Bayesian treatments of this model, the G-Wishart distribution serves as the conjugate prior for inverse covariance matrices satisfying graphical constraints. While it is straightforward to posit the unnormalized densities, the normalizing constants of these distributions have been known only for graphs that are chordal, or decomposable. Up until now, it was unknown whether the normalizing constant for a general graph could be represented explicitly, and a considerable body of computational literature emerged that attempted to avoid this apparent intractability. We close this question by providing an explicit representation of the G-Wishart normalizing constant for general graphs.

1. Introduction. Let $G = (V,E)$ be an undirected graph with vertex set $V = \{1, \ldots, p\}$ and edge set $E$. Let $S^p$ be the set of symmetric $p \times p$ matrices and $S^p_{\succ 0}$ the cone of positive definite matrices in $S^p$. Let

$$(1.1) \qquad S^p_{\succ 0}(G) = \{M = (M_{ij}) \in S^p_{\succ 0} \mid M_{ij} = 0 \text{ for all } (i,j) \notin E\}$$

denote the cone in $S^p$ of positive definite matrices with zeros in all entries not corresponding to edges in the graph. Note that the positivity of all diagonal entries $M_{ii}$ follows from the positive definiteness of the matrices $M$. A random vector $X \in \mathbb{R}^p$ is said to satisfy the Gaussian graphical model (GGM) with graph $G$ if $X$ has a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, denoted $X \sim N_p(\mu, \Sigma)$, where $\Sigma^{-1} \in S^p_{\succ 0}(G)$. The inverse covariance matrix $\Sigma^{-1}$ is called the concentration matrix and, throughout this paper, we denote $\Sigma^{-1}$ by $K$.

arXiv:1406.4901v2 [math.ST] 22 Jun 2016

MSC 2010 subject classifications: Primary 62H05, 60E05; secondary 62E15.
Keywords and phrases: Bartlett decomposition; Bipartite graph; Cholesky decomposition; Chordal graph; Directed acyclic graph; G-Wishart distribution; Gaussian graphical model; Generalized hypergeometric function of matrix argument; Moral graph; Normalizing constant; Wishart distribution.


Statistical inference for the concentration matrix $K$ constrained to $S^p_{\succ 0}(G)$ goes back to Dempster [6], who proposed an algorithm for determining the maximum likelihood estimator [cf. 31]. A Bayesian framework for this problem was introduced by Dawid and Lauritzen [5], who proposed the Hyper-Inverse Wishart (HIW) prior distribution for chordal (also known as decomposable or triangulated) graphs $G$. Chordal graphs enjoy a rich set of properties that make the HIW distribution particularly amenable to Bayesian analysis. Indeed, for nearly a decade after the introduction of GGMs, focus on the Bayesian use of GGMs was placed primarily on chordal graphs [see, e.g., 11]. This tractability stems from two causes: the ability to sample directly from HIWs [28], and the ability to calculate their normalizing constants, which are critical quantities when comparing graphs or nesting GGMs in hierarchical structures.

Roverato [29] extended the HIW to general $G$. Focusing on $K$, Atay-Kayis and Massam [2] further studied this prior distribution. Following Letac and Massam [22], Lenkoski and Dobra [21] termed this distribution the G-Wishart. For $D \in S^p_{\succ 0}(G)$ and $\delta \in \mathbb{R}$, the G-Wishart density has the form

$$f_G(K \mid \delta, D) \propto |K|^{(\delta-2)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}(KD)\right) \mathbf{1}_{K \in S^p_{\succ 0}(G)}.$$

This distribution is conjugate [29] and proper for $\delta > 1$ [24]. Early work on the G-Wishart distribution was largely computational in nature [4, 7, 8, 17, 21, 32, 33] and was predicated on two assumptions: first, that a direct sampler was unavailable for this class of models and, second, that the normalizing constant could not be explicitly calculated. Lenkoski [20] developed a direct sampler for G-Wishart variates, mimicking the algorithm of Dempster [6], thereby resolving the first open question. In this paper, we close the second question by deriving for general graphs $G$ an explicit formula for the G-Wishart normalizing constant,

$$C_G(\delta, D) = \int_{S^p_{\succ 0}(G)} |K|^{(\delta-2)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}(KD)\right) dK,$$

where $dK = \prod_{i=1}^p dk_{ii} \cdot \prod_{(i,j) \in E,\, i<j} dk_{ij}$.
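Before turning to explicit formulas, the unnormalized density above is easy to evaluate numerically. The following sketch is our own illustration, not code from the paper (the function name and the 0-indexed edge encoding are ours): it returns the G-Wishart log-density up to the unknown constant $C_G(\delta, D)$, enforcing the cone constraint as an indicator.

```python
import numpy as np

def gwishart_logdensity_unnorm(K, delta, D, edges):
    """Unnormalized log-density of the G-Wishart distribution:
    ((delta - 2)/2) log|K| - tr(K D)/2, times the indicator that K is
    symmetric positive definite with K[i, j] = 0 whenever (i, j) is not
    an edge (vertices 0-indexed, edges as pairs (i, j) with i < j)."""
    K = np.asarray(K, dtype=float)
    p = K.shape[0]
    if not np.allclose(K, K.T):
        return -np.inf
    if np.linalg.eigvalsh(K).min() <= 0:          # K must lie in the cone
        return -np.inf
    for i in range(p):
        for j in range(i + 1, p):
            if (i, j) not in edges and not np.isclose(K[i, j], 0.0):
                return -np.inf                     # graphical zero violated
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (delta - 2.0) * logdet - 0.5 * np.trace(K @ D)
```

For $p = 1$ this reduces to the familiar gamma-type kernel, and any off-pattern entry sends the log-density to $-\infty$.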

$$C_G(\delta, D) = 2^{p\delta/2 + |E|}\, I_G\!\left(\tfrac{1}{2}(\delta - 2), D\right).$$

The normalizing constant IG(δ, D) is well-known for complete graphs, in which every pair of vertices is connected by an edge. In such cases,

$$(1.2) \qquad I_{\text{complete}}(\delta, D) = |D|^{-(\delta + \frac{1}{2}(p+1))}\, \Gamma_p\!\left(\delta + \tfrac{1}{2}(p+1)\right), \quad \text{where}$$

$$(1.3) \qquad \Gamma_p(\alpha) = \pi^{p(p-1)/4} \prod_{i=1}^p \Gamma\!\left(\alpha - \tfrac{1}{2}(i-1)\right),$$

$\operatorname{Re}(\alpha) > \tfrac{1}{2}(p-1)$, is the multivariate gamma function. The formula (1.2) has a long history, dating back to Wishart [34], Wishart and Bartlett [35], Ingham [15], Siegel [30, Hilfssatz 37], and Maass [23], and it has many derivations of a statistical nature; see Olkin [27] and Giri [10, p. 224].

As noted above, $I_G(\delta, D)$ is also known for chordal graphs. Let $G$ be chordal, and let $(T_1, \ldots, T_d)$ denote a perfect sequence of cliques (i.e., complete subgraphs) of $G$. Further, let $S_i = (T_1 \cup \cdots \cup T_i) \cap T_{i+1}$, $i = 1, \ldots, d-1$; then $S_1, \ldots, S_{d-1}$ are called the separators of $G$. Note that the separators $S_i$ are cliques as well. We denote the cardinalities by $t_i = |T_i|$ and $s_i = |S_i|$. For $S \subseteq \{1, \ldots, p\}$, let $D_{SS}$ denote the submatrix of $D$ corresponding to the rows and columns in $S$. Then,

$$I_G(\delta, D) = \frac{\prod_{i=1}^d I_{T_i}(\delta, D_{T_iT_i})}{\prod_{j=1}^{d-1} I_{S_j}(\delta, D_{S_jS_j})}$$

$$(1.4) \qquad = \frac{\prod_{i=1}^d |D_{T_iT_i}|^{-(\delta + \frac{1}{2}(t_i+1))}\, \Gamma_{t_i}\!\left(\delta + \tfrac{1}{2}(t_i+1)\right)}{\prod_{j=1}^{d-1} |D_{S_jS_j}|^{-(\delta + \frac{1}{2}(s_j+1))}\, \Gamma_{s_j}\!\left(\delta + \tfrac{1}{2}(s_j+1)\right)}.$$

This result follows because, for a chordal graph $G$, the G-Wishart density function can be factored into a product of density functions [5]. For non-chordal graphs the problem of calculating $I_G(\delta, D)$ has been open for over 20 years, and much of the computational methodology mentioned above was developed with the objective of either approximating $I_G(\delta, D)$ or avoiding its calculation. Our result shows that an explicit representation of this quantity is indeed possible.

In deriving the explicit formula for the normalizing constant $I_G(\delta, D)$, we utilize methods that are familiar to researchers in this area. These methods include the Cholesky decomposition or the Bartlett decomposition of a positive definite matrix, Schur complements for factorizing determinants, and the chordal cover of a graph. Furthermore, we make crucial use of certain formulas from the theory of generalized hypergeometric functions of matrix argument [13, 16], and analytic continuation of differential operators on the cone of positive definite matrices [9].

The article proceeds as follows. In Section 2 we treat the case in which $D = I_p$, the $p \times p$ identity matrix, deriving a closed-form product formula for the normalizing constant $I_G(\delta, I_p)$ for various classes of non-chordal graphs. In Section 3 we consider the case of general matrices $D$; in our main result in Theorem 3.3 we derive an explicit representation of $I_G(\delta, D)$ for general graphs as a closed-form product formula involving differentials of principal minors of $D$. We end with a brief discussion in Section 4.
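For chordal graphs, formula (1.4) with $D = I_p$ is straightforward to evaluate in log-space. The sketch below is our own code (hypothetical function names): it implements $\log \Gamma_p$ from (1.3) and the clique/separator product, and checks it on the path graph 1–2–3 (cliques $\{1,2\}, \{2,3\}$, separator $\{2\}$), for which (1.4) reduces to $\pi\, \Gamma(\delta+1)\, \Gamma(\delta + \tfrac{3}{2})^2$.

```python
from math import lgamma, log, pi

def log_multigamma(p, a):
    # log Gamma_p(a), the multivariate gamma function of eq. (1.3)
    return p * (p - 1) / 4.0 * log(pi) + sum(
        lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))

def log_I_chordal_identity(delta, cliques, separators):
    # log I_G(delta, I_p) via (1.4) with D = I_p: all determinant factors
    # are 1, so a clique/separator T contributes log Gamma_{|T|}(delta + (|T|+1)/2)
    val = sum(log_multigamma(len(T), delta + (len(T) + 1) / 2.0) for T in cliques)
    val -= sum(log_multigamma(len(S), delta + (len(S) + 1) / 2.0) for S in separators)
    return val
```

The same function applies to any perfect clique sequence; only the clique and separator sizes enter when $D = I_p$.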

2. Computing the normalizing constant $I_G(\delta, I_p)$. In this section, we compute $I_G(\delta, I_p)$ for two classes of non-chordal graphs. We begin in Section 2.1 with the class of complete bipartite graphs and use an approach based on Schur complements to obtain a closed-form formula. In Section 2.2 we introduce directed Gaussian graphical models and show how these models relate to a Cholesky factor approach to computing $I_G(\delta, I_p)$. This leads to a formula for computing normalizing constants of graphs with minimum fill-in equal to 1, namely graphs that become chordal after the addition of one edge. However, these approaches do not lead to a general formula for the normalizing constant in the case $D = I_p$. To obtain a formula for any graph $G$, we found it necessary to calculate the more general case $I_G(\delta, D)$ and then specialize to $D = I_p$, as is done for moment generating functions or Laplace transforms. This is explained in Section 3.

2.1. Bipartite graphs. A complete bipartite graph on $m + n$ vertices, denoted by $H_{m,n}$, is an undirected graph whose vertices can be divided into disjoint sets $A = \{1, \ldots, m\}$ and $B = \{m+1, \ldots, m+n\}$, such that each vertex in $A$ is connected to every vertex in $B$, but there are no edges within $A$ or within $B$. For the graph $H_{m,n}$, the corresponding matrix $K$ is a block matrix,

$$K = \begin{pmatrix} K_{AA} & K_{AB} \\ K_{AB}^T & K_{BB} \end{pmatrix},$$

where $K_{AA}, K_{BB}$ are diagonal matrices of sizes $m \times m$ and $n \times n$, respectively, and $K_{AB}$ is unconstrained, i.e., no entry of $K_{AB}$ is constrained to be zero.

Proposition 2.1. The integral IHm,n (δ, Im+n) converges absolutely for all δ > −1, and

$$(2.1) \qquad I_{H_{m,n}}(\delta, I_{m+n}) = \Gamma\!\left(\delta + \tfrac{1}{2}n + 1\right)^m \Gamma\!\left(\delta + \tfrac{1}{2}m + 1\right)^n \cdot \frac{\Gamma_{m+n}\!\left(\delta + \tfrac{1}{2}(m+n+1)\right)}{\Gamma_m\!\left(\delta + \tfrac{1}{2}(m+n+1)\right)\, \Gamma_n\!\left(\delta + \tfrac{1}{2}(m+n+1)\right)}.$$

Proof. Applying the Schur complement formula for block matrices, we obtain

$$I_{H_{m,n}}(\delta, I_{m+n}) = \int_{S^{m+n}_{\succ 0}(G)} |K|^{\delta} \exp(-\operatorname{tr}(K))\, dK$$
$$= \int_{S^{m+n}_{\succ 0}(G)} |K_{AA}|^{\delta}\, |K_{BB} - K_{AB}^T K_{AA}^{-1} K_{AB}|^{\delta} \exp(-\operatorname{tr}(K_{AA}) - \operatorname{tr}(K_{BB}))\, dK_{AA}\, dK_{AB}\, dK_{BB}.$$

Since $K_{AB}$ is unconstrained, we can change variables by replacing $K_{AB}$ by $K_{AA}^{1/2} K_{AB} K_{BB}^{1/2}$; the corresponding Jacobian is $|K_{AA}|^{n/2}\, |K_{BB}|^{m/2}$. Since

$$|K_{BB} - K_{BB}^{1/2} K_{AB}^T K_{AB} K_{BB}^{1/2}| = |K_{BB}| \cdot |I_n - K_{AB}^T K_{AB}|,$$

we obtain

$$I_{H_{m,n}}(\delta, I_{m+n}) = \int |K_{AA}|^{\delta + \frac{1}{2}n}\, |K_{BB}|^{\delta + \frac{1}{2}m}\, |I_n - K_{AB}^T K_{AB}|^{\delta} \exp(-\operatorname{tr}(K_{AA}) - \operatorname{tr}(K_{BB}))\, dK_{AA}\, dK_{AB}\, dK_{BB},$$

where the range of integration is such that each diagonal entry of $K_{AA}$ and $K_{BB}$ is positive, $K_{AB}$ is unconstrained, and $I_n - K_{AB}^T K_{AB}$ is positive definite. Integrating over each diagonal entry of $K_{AA}$ and $K_{BB}$, we obtain

$$I_{H_{m,n}}(\delta, I_{m+n}) = \Gamma\!\left(\delta + \tfrac{1}{2}n + 1\right)^m \Gamma\!\left(\delta + \tfrac{1}{2}m + 1\right)^n \int_{K_{AB}} |I_n - K_{AB}^T K_{AB}|^{\delta}\, dK_{AB}.$$

Finally, since KAB is unconstrained, we deduce from (3.4) the value of the latter integral.
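Proposition 2.1 can be spot-checked numerically. For $m = 1$ the graph $H_{1,n}$ is a star, hence chordal, and the clique/separator factorization (1.4) with $D = I_{1+n}$ gives $\pi^{n/2}\, \Gamma(\delta+1)\, \Gamma(\delta + \tfrac{3}{2})^n$. The sketch below (our own code, hypothetical function names) confirms that (2.1) agrees with this value.

```python
from math import lgamma, log, pi

def log_multigamma(p, a):
    # log Gamma_p(a), eq. (1.3)
    return p * (p - 1) / 4.0 * log(pi) + sum(
        lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))

def log_I_bipartite_identity(delta, m, n):
    # log I_{H_{m,n}}(delta, I_{m+n}) from eq. (2.1), computed in log-space
    a = delta + (m + n + 1) / 2.0
    return (m * lgamma(delta + n / 2.0 + 1.0)
            + n * lgamma(delta + m / 2.0 + 1.0)
            + log_multigamma(m + n, a)
            - log_multigamma(m, a) - log_multigamma(n, a))
```

Log-space evaluation avoids overflow of the gamma factors even for moderately large $m$ and $n$.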

In this computation, we used the special structure of the graph to decompose the inverse covariance matrix $K$ into a special block matrix. In Section 3 we use a similar approach to show how the normalizing constant changes when removing a clique (i.e., a completely connected subgraph) from a graph. This leads to an algorithm for computing the normalizing constant $I_G(\delta, D)$ for any graph $G$. In the remainder of this section, we show how an approach based on the Cholesky factorization of $K$ can be used to easily compute the normalizing constant for graphs that have minimum fill-in equal to 1. This requires introducing directed Gaussian graphical models.

2.2. Directed Gaussian graphical models. Let $\mathcal{G} = (V, \mathcal{E})$ be a directed acyclic graph (DAG) consisting of vertices $V = \{1, \ldots, p\}$ and directed edges $\mathcal{E}$. We assume, without loss of generality, that the vertices in $\mathcal{G}$ are topologically ordered, meaning that $i < j$ for all $(i, j) \in \mathcal{E}$. We associate to $\mathcal{G}$ a strictly upper-triangular matrix $B$ of edge weights, so $B = (b_{ij})$ with $b_{ij} \neq 0$ if and only if $(i, j) \in \mathcal{E}$. Then a directed Gaussian graphical model on $\mathcal{G}$ for a random variable $X \in \mathbb{R}^p$ is defined by $X \sim N_p(0, \Sigma)$ with $\Sigma = [(I - B) D (I - B)^T]^{-1}$, where $D$ is a diagonal matrix. To simplify notation, let $a_{ii} = d_{ii}$ and $a_{ij} = b_{ij}\sqrt{d_{jj}}$, and let $A = (A_{ij})$ with $A_{ii} = \sqrt{a_{ii}}$ and $A_{ij} = -a_{ij}$ for all $i \neq j$. Then $\Sigma^{-1} = AA^T$, and $a_{ij} \neq 0$ for $i \neq j$ if and only if $(i, j) \in \mathcal{E}$. Note that $AA^T$ is the upper Cholesky decomposition of $\Sigma^{-1}$; such a decomposition exists for any positive definite matrix and is unique.

We will associate to a DAG $\mathcal{G} = (V, \mathcal{E})$ and its corresponding directed Gaussian graphical model two undirected graphs. We denote by $G^s = (V, E^s)$ the skeleton of $\mathcal{G}$, obtained by replacing all directed edges in $\mathcal{G}$ by undirected edges. We denote by $G^m = (V, E^m)$ the moral graph of $\mathcal{G}$, which reflects the conditional independencies in $N_p(0, \Sigma)$, i.e.,

$$(i, j) \notin E^m \quad \text{if and only if} \quad X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}.$$

Since $\Sigma^{-1}$ also encodes the conditional independence relations of the form $X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}$, this is equivalent to the criterion

$$(i, j) \notin E^m \quad \text{if and only if} \quad (\Sigma^{-1})_{ij} = 0.$$

So, the moral graph $G^m$ reflects the zero pattern of $\Sigma^{-1}$. The moral graph of $\mathcal{G}$ can also be defined graph-theoretically: it is formed by connecting all nodes $i, j \in V$ that have a common child in $\mathcal{G}$, i.e., for which there exists a node $k \in V \setminus \{i,j\}$ such that $(i,k), (j,k) \in \mathcal{E}$, and then making all edges in the graph undirected. The name stems from the fact that the moral graph is obtained by 'marrying' the parents. For a review of basic graph-theoretic concepts see, e.g., [19, Chapter 2].

The moral graph is an important concept for our application. Let $G = (V, E)$ be an undirected graph, with $V = \{1, \ldots, p\}$, for which we want to compute $I_G(\delta, I_p)$. Let $G_0 = (V, E_0)$ with $G_0 = G$. Given a labeling of the vertices $V$, we associate a DAG $\mathcal{G}_0 = (V, \mathcal{E}_0)$ to $G_0$ by orienting the edges in $E_0$ according to the topological ordering, i.e., for all $(i,j) \in E_0$ let $(i,j) \in \mathcal{E}_0$ if $i < j$. Note that the skeleton of $\mathcal{G}_0$ is the original undirected graph $G_0$. Let $G_1 = (V, E_1)$ be the moral graph of $\mathcal{G}_0$, i.e., $G_1 = \mathcal{G}_0^m$, and let $\mathcal{G}_1 = (V, \mathcal{E}_1)$ be the corresponding DAG obtained by orienting the edges in $E_1$ according to the ordering of the vertices $V$. So $\mathcal{G}_0$ is a subgraph of $\mathcal{G}_1$. We repeat this procedure until $\mathcal{G}_{q+1} = \mathcal{G}_q$. This results in a sequence of DAGs,

$$\mathcal{G}_0 \subsetneq \mathcal{G}_1 \subsetneq \cdots \subsetneq \mathcal{G}_q.$$

In the following, we denote by $\mathcal{G} = (V, \mathcal{E})$ the DAG associated to $G = (V, E)$ obtained by orienting the edges in $E$ according to the ordering of the vertices $V$. We denote by $\bar{\mathcal{G}} = (V, \bar{\mathcal{E}})$ the DAG obtained by repeatedly marrying parents in $\mathcal{G}$, i.e., $\bar{\mathcal{G}} = \mathcal{G}_q$. We call $\bar{\mathcal{G}}$ the moral DAG of $G$. Note that $\bar{\mathcal{G}}^s$, the skeleton of $\bar{\mathcal{G}}$, is a chordal graph with $G \subset \bar{\mathcal{G}}^s$ (Lauritzen [19, Chapter 2]), so $\bar{\mathcal{G}}^s$ is a chordal cover of $G$. A chordal cover in general is not unique; however, $\bar{\mathcal{G}}^s$ is the unique chordal cover obtained by repeatedly marrying parents according to the vertex labeling $V$. We call this chordal cover the moral chordal graph of $G$ and denote it by $\bar{G} = (V, \bar{E})$.

We now show how to deduce from the undirected graph $G = (V, E)$ the normalizing constant $I_G(\delta, I_p)$ as an integral in terms of the Cholesky factor $A$. Since the proof is the same for general correlation matrices $D \in S^p_{\succ 0}$, we give the result directly for $I_G(\delta, D)$. In the following, we use the standard graph-theoretic notation $\operatorname{indeg}(i)$ for the indegree of node $i$, representing the number of edges 'arriving at' (or 'pointing to') node $i$ in a DAG $\mathcal{G}$.
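The iterative 'marrying parents' construction can be made concrete in a few lines. The sketch below is our own code (vertex labels $1, \ldots, p$, undirected edges as sorted pairs): it orients each edge by the vertex ordering, repeatedly connects non-adjacent parents of a common child, and returns the edge set of the moral chordal graph $\bar{G}$.

```python
def moral_chordal_cover(p, edges):
    """Edge set of the moral chordal graph G-bar: orient each undirected
    edge (i, j), i < j, as i -> j; repeatedly 'marry' any two non-adjacent
    parents of a common child; iterate until no new edge is created."""
    E = {tuple(sorted(e)) for e in edges}
    while True:
        added = set()
        for child in range(1, p + 1):
            parents = [i for i in range(1, child) if (i, child) in E]
            for a in range(len(parents)):
                for b in range(a + 1, len(parents)):
                    e = (parents[a], parents[b])
                    if e not in E:
                        added.add(e)
        if not added:
            return E
        E |= added
```

On the graph $G_5$ of Example 2.4 (the 5-cycle with chords $(1,4)$ and $(3,5)$) this returns the original edges plus the single fill-in edge $(1,3)$, matching the moral DAG shown in Figure 1.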

Theorem 2.2. Let $G = (V, E)$ be an undirected graph with vertices $V = \{1, \ldots, p\}$. Let $\mathcal{G} = (V, \mathcal{E})$ be the DAG associated to $G = (V, E)$ obtained by orienting the edges in $E$ according to the ordering of the vertices in $V$. Let $\bar{\mathcal{G}} = (V, \bar{\mathcal{E}})$ denote the moral DAG of $G$ and $\bar{G} = (V, \bar{E})$ its skeleton, the moral chordal graph of $G$. Let $A$ be an upper-triangular $p \times p$ matrix with diagonal entries $A_{ii} = \sqrt{a_{ii}}$ and off-diagonal entries $A_{ij} = -a_{ij}$ for all $i < j$. Then

$$I_G(\delta, D) = \int_{A_*} \prod_{i=1}^p a_{ii}^{\delta + \frac{1}{2}\operatorname{indeg}(i)} \exp\Big(-\sum_{i=1}^p \Big(a_{ii} + \sum_{j : (i,j) \in \bar{E}} a_{ij}^2\Big)\Big) \cdot \exp\Big(-\sum_{(i,j) \in E} 2 d_{ij} \Big(-a_{ij}\sqrt{a_{jj}} + \sum_{l : (i,l),(j,l) \in \bar{E}} a_{il} a_{jl}\Big)\Big)\, dA_*,$$

where $D \in S^p_{\succ 0}$ is a correlation matrix, $A_* = \{a_{ij} : i = j \text{ or } (i,j) \in E\}$, the range of $a_{ii}$ is $(0, \infty)$, the range of $a_{ij}$ for $(i,j) \in E$ is $(-\infty, \infty)$, $\operatorname{indeg}(i)$ denotes the indegree of node $i$ in $\mathcal{G}$, and for $a_{ij} \notin A_*$

$$a_{ij} = \begin{cases} 0, & \text{if } (i,j) \notin \bar{E}, \\[4pt] \dfrac{1}{\sqrt{a_{jj}}} \displaystyle\sum_{\substack{l \in V \\ (i,l),(j,l) \in \bar{E}}} a_{il} a_{jl}, & \text{if } (i,j) \in \bar{E} \setminus E. \end{cases}$$

Proof. Let $K \in S^p_{\succ 0}(G)$. Since $G \subset \bar{G}$, we have $K \in S^p_{\succ 0}(\bar{G})$, and we can view $K$ as an inverse covariance matrix of a directed Gaussian graphical model on $\bar{\mathcal{G}}$. Because the Cholesky decomposition is unique, $A$ is a weighted adjacency matrix of $\bar{\mathcal{G}}$, and hence $a_{ij} = 0$ for all $(i,j) \notin \bar{E}$. Let $(i,j)$ be an edge that is present in the moral chordal graph $\bar{G}$ but not in $G$. We can assume that $i < j$. Hence $(i,j) \in \bar{E} \setminus E$ and therefore

$$0 = K_{ij} = (AA^T)_{ij} = -a_{ij}\sqrt{a_{jj}} + \sum_{l > \max(i,j)} a_{il} a_{jl}.$$

Thus, for each edge $(i,j) \in \bar{E} \setminus E$, we obtain an equation,

$$a_{ij} = \frac{1}{\sqrt{a_{jj}}} \sum_{\substack{l \in V \\ (i,l),(j,l) \in \bar{E}}} a_{il} a_{jl}.$$

To complete the proof, we need to compute the Jacobian $J$ of the change of variables from $K$ to $A$. We list the $a_{ij}$'s column-wise, meaning that $a_{ij}$ precedes $a_{lm}$ if $j < m$, or if $j = m$ and $i < l$, omitting $a_{ij}$ for $(i,j) \notin E$, corresponding to the zeros in $K$. We list the $k_{ij}$'s in the same ordering. Let the $a_{ij}$'s correspond to the columns of the Jacobian, while the $k_{ij}$'s correspond to the rows. In order to form $J$, we calculate the partial derivative of each $k_{ij}$ with respect to each $a_{lm}$. Since $K = AA^T$ and $A$ is upper-triangular, $J$ is also upper-triangular; therefore, $|J| = |\operatorname{diag}(J)|$. Since

$$k_{ii} = a_{ii} + \sum_{(i,j) \in \bar{E}} a_{ij}^2 \qquad \text{and} \qquad k_{ij} = -a_{ij}\sqrt{a_{jj}} + \sum_{\substack{l \in V \\ (i,l),(j,l) \in \bar{E}}} a_{il} a_{jl},$$

for all $(i,j) \in E$, we obtain

$$|J| = \prod_{i=1}^p a_{ii}^{\operatorname{indeg}(i)/2}.$$

Collecting together these formulas completes the proof.

The number of edges in $\bar{E} \setminus E$ depends on the ordering of the vertices. It is well known (see, e.g., Lauritzen [19, Chapter 2]) that one can find an ordering of the vertices such that $\bar{G} = G$ if and only if $G$ is chordal. Hence, when $G$ is chordal, we can derive the normalizing constant $I_G(\delta, I_p)$ directly from Theorem 2.2 by evaluating Gaussian and Gamma integrals. One could also prove the following corollary using Equation (1.4).


Fig 1. Undirected graph $G_5$ (left) discussed in Example 2.4 and its moral DAG $\bar{\mathcal{G}}_5$ (right).

Corollary 2.3. Let G = (V,E) be a chordal graph, where the vertices V = {1, . . . , p} are labelled according to a perfect ordering. Then

$$I_G(\delta, I_p) = \pi^{|E|/2} \prod_{i=1}^p \Gamma\!\left(\delta + \tfrac{1}{2}\operatorname{indeg}(i) + 1\right),$$

where $\operatorname{indeg}(i)$ denotes the indegree of node $i$ in the corresponding DAG $\mathcal{G}$.
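As a sanity check (our own code, not from the paper), the product formula of Corollary 2.3 applied to the complete graph on $p$ vertices must reproduce $\Gamma_p(\delta + \tfrac{1}{2}(p+1))$, i.e., formula (1.2) with $D = I_p$: the indegrees under the natural ordering are $0, 1, \ldots, p-1$ and $|E| = p(p-1)/2$.

```python
from math import lgamma, log, pi
from itertools import combinations

def log_I_corollary(delta, p, edges):
    # Corollary 2.3: orient each edge (i, j), i < j, as i -> j and use
    # indegrees; assumes labels 1..p form a perfect ordering of chordal G
    indeg = [0] * (p + 1)
    for (i, j) in edges:
        indeg[max(i, j)] += 1
    return (len(edges) / 2.0) * log(pi) + sum(
        lgamma(delta + indeg[i] / 2.0 + 1.0) for i in range(1, p + 1))

def log_multigamma(p, a):
    # log Gamma_p(a), eq. (1.3)
    return p * (p - 1) / 4.0 * log(pi) + sum(
        lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))
```

Any perfect ordering of a chordal graph can be fed to `log_I_corollary`; the complete graph is simply the easiest case to verify against a known closed form.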

Example 2.4. We illustrate Theorem 2.2 by studying the non-chordal graph $G_5$ shown in Figure 1 (left). We wish to calculate

$$(2.2) \qquad I_{G_5}(\delta, I_5) = \int_{K \in S^5_{\succ 0}(G_5)} |K|^{\delta} \exp(-\operatorname{tr}(K))\, dK$$

through the change of variables $K = AA^T$. The moral DAG of $G_5$ is denoted by $\bar{\mathcal{G}}_5$ and depicted in Figure 1 (right). Since the edges $(2,4)$ and $(2,5)$ are missing in $\bar{\mathcal{G}}_5$, we immediately deduce that $a_{24} = a_{25} = 0$. In this example, we chose an ordering where only one edge needed to be added in the process of marrying parents, namely the edge $(1,3)$. This results in one equation for $a_{13}$, which can be deduced from the colliders over the additional edge, i.e., nodes $l \in V$ with $(1,l), (3,l) \in \bar{\mathcal{E}}$, and which reads

$$a_{13} = \frac{1}{\sqrt{a_{33}}} (a_{14} a_{34} + a_{15} a_{35}).$$

Finally, the Jacobian can be deduced from the indegrees of the nodes in $\mathcal{G}_5$, which corresponds to the moral DAG $\bar{\mathcal{G}}_5$ after omitting the added (red) edge $(1,3)$. Therefore, the determinant of the Jacobian is

$$a_{11}^{0/2}\, a_{22}^{1/2}\, a_{33}^{1/2}\, a_{44}^{2/2}\, a_{55}^{3/2},$$

and we find that the integral (2.2) equals

$$\int_{A} a_{11}^{\delta}\, a_{22}^{\delta + 1/2}\, a_{33}^{\delta + 1/2}\, a_{44}^{\delta + 1}\, a_{55}^{\delta + 3/2} \exp\Big(-\Big[a_{11} + a_{12}^2 + \Big(\frac{a_{14}a_{34} + a_{15}a_{35}}{\sqrt{a_{33}}}\Big)^2 + a_{14}^2 + a_{15}^2 + a_{22} + a_{23}^2 + a_{33} + a_{34}^2 + a_{35}^2 + a_{44} + a_{45}^2 + a_{55}\Big]\Big)\, dA,$$

where $a_{ii} > 0$; $a_{ij} \in \mathbb{R}$, $i < j$; and $dA$ denotes the product of all differentials.

As seen in Example 2.4, the equations corresponding to the additional edges (i, j) ∈ E\E¯ complicate the integral significantly. Therefore, given a non-chordal graph G, it is desirable to find an ordering such that |E¯ \ E| is minimized. This ordering is given by a perfect ordering of a minimal chordal cover of G, where minimality is with respect to the number of edges that need to be added in order to make G chordal. Using Corollary 2.3, we can compute the normalizing constant corresponding to a minimal chordal cover of G. The question arises: How can one compute the normalizing constant of G from the normalizing constant of a minimal chordal cover of G? In the following theorem, we show how one can compute the normalizing constant of a graph G that results from removing one edge from a chordal graph. Such graphs are said to have minimum fill-in equal to 1.

Theorem 2.5. Let G = (V,E) be an undirected graph with minimum fill-in 1 and with vertices V = {1, . . . , p}. Let Ge = (V,Ee) denote the graph G with one additional edge e, i.e., Ee = E ∪ {e}, such that Ge is chordal. Let d denote the number of triangles formed by the edge e and two other edges in Ge. Then

$$I_G(\delta, I_p) = \pi^{-1/2}\, \frac{\Gamma\!\left(\delta + \tfrac{1}{2}(d+2)\right)}{\Gamma\!\left(\delta + \tfrac{1}{2}(d+3)\right)}\, I_{G^e}(\delta, I_p).$$

Proof. We begin by defining an ordering of the vertices in such a way that one can directly integrate out the variables corresponding to the end points of $e$ and the variable corresponding to $e$ itself. Let one of the end points of $e$ be labelled '1', the other end point 'd + 2', and label the $d$ vertices involved in triangles over the edge $e$ by $2, \ldots, d+1$. Label all remaining vertices by $d+3, \ldots, p$. Let $\bar{\mathcal{G}}^e$ denote the moral DAG of $G^e$, with edge set $\bar{E}^e$. Then the chosen ordering of the vertices guarantees that $\bar{E}^e = \bar{E} \cup \{e\}$, and $e \notin \bar{E}$.

Also, since all vertices $2, \ldots, d+1$ are connected to vertex $d+2$, no added edge in $\bar{E} \setminus E$ points to vertex $d+2$, and hence $a_{d+2,d+2}$ does not appear in any equation for the edges in $\bar{E} \setminus E$. Similar arguments hold for vertex 1 since, due to the ordering, there can be no edge pointing to node 1. Let $A$ and $A^e$ denote the Cholesky factors corresponding to $G$ and $G^e$, respectively. Then

$$A_{ij} = \begin{cases} A^e_{ij} & \text{for all } (i,j) \neq (1, d+2), \\ 0 & \text{if } (i,j) = (1, d+2). \end{cases}$$

Let $\operatorname{indeg}$ denote the indegree with respect to the DAG $\mathcal{G}$ and $\operatorname{indeg}^e$ the indegree with respect to the DAG $\mathcal{G}^e$. Let $A_* = ((a_{ii})_{i \notin \{1, d+2\}}, (a_{ij})_{(i,j) \in E})$. Note that

$$(2.3) \qquad \operatorname{indeg}^e(1) = 0 = \operatorname{indeg}(1), \qquad \operatorname{indeg}^e(d+2) = d + 1 = \operatorname{indeg}(d+2) + 1.$$

Then by Theorem 2.2,

$$I_{G^e}(\delta, I_p) = \int \prod_{i=1}^p a_{ii}^{\delta + \frac{1}{2}\operatorname{indeg}^e(i)} \exp(-a_{ii}) \exp\Big(-\sum_{(i,j) \in \bar{E}^e} a_{ij}^2\Big)\, da_{11}\, da_{d+2,d+2}\, da_{1,d+2}\, dA_*$$
$$= \int_{-\infty}^{\infty} \exp(-a_{1,d+2}^2)\, da_{1,d+2} \cdot \int_0^{\infty} a_{11}^{\delta + \frac{1}{2}\operatorname{indeg}^e(1)} \exp(-a_{11})\, da_{11} \cdot \int_0^{\infty} a_{d+2,d+2}^{\delta + \frac{1}{2}\operatorname{indeg}^e(d+2)} \exp(-a_{d+2,d+2})\, da_{d+2,d+2}$$
$$\cdot \int_{A_*} \prod_{i \notin \{1, d+2\}} a_{ii}^{\delta + \frac{1}{2}\operatorname{indeg}^e(i)} \exp(-a_{ii}) \exp\Big(-\sum_{(i,j) \in \bar{E}} a_{ij}^2\Big)\, dA_*.$$

The integral with respect to $a_{1,d+2}$ is a Gaussian integral, with value $\sqrt{\pi}$. Also, by (2.3),

$$\int_0^{\infty} a_{11}^{\delta + \frac{1}{2}\operatorname{indeg}^e(1)} \exp(-a_{11})\, da_{11} = \int_0^{\infty} a_{11}^{\delta + \frac{1}{2}\operatorname{indeg}(1)} \exp(-a_{11})\, da_{11}.$$

Again by (2.3), we have

$$\int_0^{\infty} a_{d+2,d+2}^{\delta + \frac{1}{2}\operatorname{indeg}^e(d+2)} \exp(-a_{d+2,d+2})\, da_{d+2,d+2} = \frac{\Gamma\!\left(\delta + \tfrac{1}{2}(d+1) + 1\right)}{\Gamma\!\left(\delta + \tfrac{1}{2}d + 1\right)} \int_0^{\infty} a_{d+2,d+2}^{\delta + \frac{1}{2}\operatorname{indeg}(d+2)} \exp(-a_{d+2,d+2})\, da_{d+2,d+2}.$$

Finally, since indege(i) = indeg(i) for all i∈ / {1, d + 2}, we obtain

$$I_{G^e}(\delta, I_p) = \sqrt{\pi}\, \frac{\Gamma\!\left(\delta + \tfrac{1}{2}(d+1) + 1\right)}{\Gamma\!\left(\delta + \tfrac{1}{2}d + 1\right)} \int_0^{\infty} a_{11}^{\delta + \frac{1}{2}\operatorname{indeg}(1)} \exp(-a_{11})\, da_{11} \cdot \int_0^{\infty} a_{d+2,d+2}^{\delta + \frac{1}{2}\operatorname{indeg}(d+2)} \exp(-a_{d+2,d+2})\, da_{d+2,d+2}$$
$$\cdot \int_{A_*} \prod_{i \notin \{1, d+2\}} a_{ii}^{\delta + \frac{1}{2}\operatorname{indeg}(i)} \exp(-a_{ii}) \exp\Big(-\sum_{(i,j) \in \bar{E}} a_{ij}^2\Big)\, dA_* = \sqrt{\pi}\, \frac{\Gamma\!\left(\delta + \tfrac{1}{2}(d+3)\right)}{\Gamma\!\left(\delta + \tfrac{1}{2}(d+2)\right)}\, I_G(\delta, I_p).$$

The proof now is complete.

Example 2.6. Since the graph $G_5$ discussed in Example 2.4 has minimum fill-in equal to 1, we can apply Theorem 2.5 to compute its normalizing constant. The skeleton of the graph shown in Figure 1 (right) is a chordal cover of $G_5$, and the given vertex labeling is a perfect labeling. By applying Corollary 2.3, we deduce the normalizing constant for the graph $G_5$ with the additional edge $e = (1,3)$:

$$I_{G_5^e}(\delta, I_p) = \pi^4\, \Gamma(\delta + 1)\, \Gamma\!\left(\delta + \tfrac{3}{2}\right)\, \Gamma(\delta + 2)^2\, \Gamma\!\left(\delta + \tfrac{5}{2}\right).$$

Since the number of triangles over the red edge $(1,3)$ is $d = 3$, we find by Theorem 2.5 that

$$I_{G_5}(\delta, I_p) = \pi^{-1/2}\, \frac{\Gamma\!\left(\delta + \tfrac{5}{2}\right)}{\Gamma(\delta + 3)}\, I_{G_5^e}(\delta, I_p)$$
$$(2.4) \qquad = \pi^{7/2}\, \frac{\Gamma\!\left(\delta + \tfrac{5}{2}\right)}{\Gamma(\delta + 3)}\, \Gamma(\delta + 1)\, \Gamma\!\left(\delta + \tfrac{3}{2}\right)\, \Gamma(\delta + 2)^2\, \Gamma\!\left(\delta + \tfrac{5}{2}\right).$$
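The two routes to $I_{G_5}(\delta, I_5)$ can be checked against each other numerically. The sketch below is our own code: it applies Corollary 2.3 to the chordal cover $G_5^e$, multiplies by the ratio from Theorem 2.5 with $d = 3$, and compares with a direct transcription of the closed form (2.4).

```python
from math import lgamma, log, pi

def log_I_corollary(delta, p, edges):
    # Corollary 2.3, edges oriented from lower to higher label
    indeg = [0] * (p + 1)
    for (i, j) in edges:
        indeg[max(i, j)] += 1
    return (len(edges) / 2.0) * log(pi) + sum(
        lgamma(delta + indeg[i] / 2.0 + 1.0) for i in range(1, p + 1))

def log_I_G5(delta):
    # direct transcription of the closed form (2.4), in log-space
    return (3.5 * log(pi) + lgamma(delta + 2.5) - lgamma(delta + 3.0)
            + lgamma(delta + 1.0) + lgamma(delta + 1.5)
            + 2.0 * lgamma(delta + 2.0) + lgamma(delta + 2.5))

# chordal cover of G5: the 5-cycle, chords (1,4), (3,5), and fill-in (1,3)
E5e = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5), (1, 4), (3, 5), (1, 3)]
```

Agreement of the two routes confirms the indegree bookkeeping ($0, 1, 2, 2, 3$ for the cover) and the constant in (2.4).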

3. Computing $I_G(\delta, D)$ for general non-chordal graphs. In this section, we study $I_G(\delta, D)$ for general $D$. In Theorem 3.3 we show how the normalizing constant changes when removing not only an edge, but an entire clique (i.e., a completely connected subgraph) from a graph. This leads to an algorithm for computing the normalizing constant $I_G(\delta, D)$ for any graph $G$, which can then be specialized to the case in which $D = I_p$. For general graphs, we found it necessary to calculate first the general case $I_G(\delta, D)$ and then to specialize to $D = I_p$, as is done for moment-generating functions or Laplace transforms.

3.1. Some results on a generalized hypergeometric function of matrix argument. We list in this subsection some results, involving a generalized hypergeometric function of matrix argument, that we will apply repeatedly in this section. For $a \in \mathbb{C}$ and $k \in \{0, 1, 2, \ldots\}$, we denote the rising factorial by

$$(a)_k = \frac{\Gamma(a + k)}{\Gamma(a)} = a(a+1)(a+2) \cdots (a+k-1).$$

For $t \in \mathbb{C}$ and $\rho \notin \{0, -1, -2, \ldots\}$, the classical generalized hypergeometric function $_0F_1(\rho; t)$ may be defined by the series expansion,

$$(3.1) \qquad {}_0F_1(\rho; t) = \sum_{l=0}^{\infty} \frac{t^l}{l!\, (\rho)_l}.$$

We refer to Andrews, et al. [1] for many other properties of this function. The generalized hypergeometric function of matrix argument, $_0F_1(\rho; Y)$, $Y \in S^p_{\succ 0}$, is defined by the Laplace transform,

$$\frac{1}{\Gamma_p(\rho)} \int_{S^p_{\succ 0}} |Y|^{\rho - \frac{1}{2}(p+1)} \exp(-\operatorname{tr}(YD))\, {}_0F_1(\rho; Y)\, dY = |D|^{-\rho} \exp(\operatorname{tr}(D^{-1})),$$

valid for $\operatorname{Re}(\rho) > \tfrac{1}{2}(p-1)$ and $D \in S^p_{\succ 0}$. Herz [13] provided an extensive theory of the analytic properties of the function $_0F_1$. In particular, $_0F_1(\rho; Y)$ is simultaneously analytic in $\rho$ for $\operatorname{Re}(\rho) > \tfrac{1}{2}(p-1)$ and entire in $Y$; so, as a function of $Y$, its domain of definition extends to the set $S^p$ and to the set of complex symmetric matrices. Other properties of the function $_0F_1$, such as zonal polynomial expansions which generalize (3.1), are given by James [16], Muirhead [26], and Gross and Richards [12]. Herz [13, p. 497] proved that the function $_0F_1(\rho; Y)$ depends only on the eigenvalues of $Y$, and moreover that if $\operatorname{Re}(\rho) > \tfrac{1}{2}(p-1)$, $D \in S^p_{\succ 0}$, and $C \in S^p$, then there holds the Laplace transform formula,

$$(3.2) \qquad \int_{S^p_{\succ 0}} |Y|^{\rho - \frac{1}{2}(p+1)} \exp(-\operatorname{tr}(YD))\, {}_0F_1(\rho; YC)\, dY = \Gamma_p(\rho)\, |D|^{-\rho} \exp(\operatorname{tr}(D^{-1}C)),$$

where, by convention, $_0F_1(\rho; YC)$ is an abbreviation for $_0F_1(\rho; Y^{1/2} C Y^{1/2})$ and $Y^{1/2} \in S^p_{\succ 0}$ is the unique square root of $Y$. Setting $C = 0$ (the zero matrix) in (3.2), we deduce from the uniqueness of the Laplace transform and (1.2) that $_0F_1(\rho; 0) = 1$.

We will apply repeatedly a generalization of the Poisson integral to matrix spaces (see [13, pp. 495–496] and [16, Equation (151)]): If $A$ is a $k \times p$ matrix such that $k \leq p$, and $\operatorname{Re}(\rho) > \tfrac{1}{2}(k + p - 1)$, then

$$(3.3) \qquad \int_{XX^T \prec I_k} |I_k - XX^T|^{\rho - \frac{1}{2}(k+p+1)} \exp(\operatorname{tr}(AX^T))\, dX = \frac{\pi^{kp/2}\, \Gamma_k\!\left(\rho - \tfrac{1}{2}p\right)}{\Gamma_k(\rho)}\, {}_0F_1\!\left(\rho; \tfrac{1}{4} AA^T\right),$$

where the integral is over $k \times p$ matrices $X$ such that $I_k - XX^T$ is positive definite. In particular, setting $A = 0$,

$$(3.4) \qquad \int_{XX^T \prec I_k} |I_k - XX^T|^{\rho - \frac{1}{2}(k+p+1)}\, dX = \frac{\pi^{kp/2}\, \Gamma_k\!\left(\rho - \tfrac{1}{2}p\right)}{\Gamma_k(\rho)}.$$

We will also use the following expansion, which we shall apply to $2 \times 2$ matrices $Y$:

$$(3.5) \qquad {}_0F_1(\rho; Y) = \sum_{q=0}^{\infty} \frac{|Y|^q}{q!\, (\rho)_{2q} \left(\rho - \tfrac{1}{2}\right)_q}\, {}_0F_1(\rho + 2q; \operatorname{tr}(Y)),$$

where the $_0F_1$ functions on the right-hand side are the classical generalized hypergeometric functions given in (3.1). In the special case in which $Y$ is of rank 1, it follows from Herz [13, p. 497], or directly from (3.5), that

$$(3.6) \qquad {}_0F_1(\rho; Y) = {}_0F_1(\rho; \operatorname{tr}(Y)).$$
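The classical series (3.1) is straightforward to evaluate term by term, and two standard closed forms make good checks: $_0F_1(\tfrac{1}{2}; t) = \cosh(2\sqrt{t})$ and $_0F_1(\tfrac{3}{2}; t) = \sinh(2\sqrt{t})/(2\sqrt{t})$ for $t > 0$. A minimal sketch (our own code):

```python
def hyp0f1(rho, t, terms=80):
    """Classical 0F1(rho; t) via the series (3.1): sum_l t^l / (l! (rho)_l).
    Consecutive terms are related by a factor t / ((l + 1)(rho + l))."""
    total, term = 0.0, 1.0
    for l in range(terms):
        total += term
        term *= t / ((l + 1.0) * (rho + l))
    return total
```

The series is entire in $t$, so a fixed truncation converges rapidly for moderate arguments; adaptive truncation would be needed for large $|t|$.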

3.2. The normalizing constant for non-chordal graphs. We want to calculate

$$I_G(\delta, D) = \int_{S^p_{\succ 0}(G)} |K|^{\delta} \exp(-\operatorname{tr}(KD))\, dK,$$

the normalizing constant for $G$, a general non-chordal graph. By making the change of variables $K \to \operatorname{diag}(D)^{-1/2} K \operatorname{diag}(D)^{-1/2}$, we can assume, without loss of generality, that $D$ has ones on the diagonal and therefore is a correlation matrix; this assumption will be maintained for the remainder of the paper.

In the sequel, we will encounter a $2 \times m$ matrix $C = (C_{ij})$, and we use the notation $|C_{\{1,2\},\{i,j\}}|$ for the minor corresponding to rows 1 and 2 and to columns $i$ and $j$, where $i, j \in \{1, \ldots, m\}$. We will need $L = (L_{ij})$, a $2 \times m$ matrix of non-negative integers such that $\sum_{i=1}^2 \sum_{j=1}^m L_{ij} = l$, and we adopt the notation

$$\binom{l}{L} = \frac{l!}{\prod_{i=1}^2 \prod_{j=1}^m L_{ij}!}, \qquad L_{i+} = \sum_{j=1}^m L_{ij}, \qquad L_{+j} = \sum_{i=1}^2 L_{ij}.$$

We will also need $Q = (Q_{ij})_{1 \leq i < j \leq m}$, a strictly upper-triangular array of non-negative integers such that $\sum_{1 \leq i < j \leq m} Q_{ij} = q$, with the notation

$$\binom{q}{Q} = \frac{q!}{\prod_{1 \leq i < j \leq m} Q_{ij}!}, \qquad Q_{i+} = \sum_{j=i+1}^m Q_{ij}, \qquad Q_{+j} = \sum_{i=1}^{j-1} Q_{ij}.$$

In the following result, we obtain the normalizing constant for H2,m, a complete bipartite graph on 2 + m vertices.

Proposition 3.1. The integral $I_{H_{2,m}}(\delta, D)$ converges absolutely for all $\delta > -1$ and $D \in S^{2+m}_{\succ 0}$. Let $C = (C_{ij})$ denote the $2 \times m$ submatrix of $D$ corresponding to the edges in $G$; then $I_{H_{2,m}}(\delta, D)$ equals

$$I_{H_{2,m}}(\delta, I_{m+2}) \sum_{q=0}^{\infty} \frac{\left(\delta + \tfrac{1}{2}(m+2)\right)_q}{q!\, \left(\delta + \tfrac{1}{2}(m+3)\right)_{2q}} \sum_{l=0}^{\infty} \frac{1}{l!\, \left(\delta + 2q + \tfrac{1}{2}(m+3)\right)_l}$$
$$\cdot \sum_{L} \binom{l}{L} \prod_{i=1}^2 \prod_{j=1}^m C_{ij}^{2L_{ij}} \prod_{i=1}^2 \left(\delta + q + \tfrac{1}{2}(m+2)\right)_{L_{i+}} \prod_{j=1}^m (\delta + 2)_{L_{+j}}$$
$$\cdot \sum_{Q} \binom{q}{Q} \prod_{1 \leq i < j \leq m} |C_{\{1,2\},\{i,j\}}|^{2Q_{ij}} \prod_{j=1}^m (\delta + L_{+j} + 2)_{Q_{j+} + Q_{+j}}.$$

Proof. Applying (3.3) to integrate over $K_{AB}$, we obtain

$$I_{H_{2,m}}(\delta, D) = \frac{\pi^m\, \Gamma_2\!\left(\delta + \tfrac{3}{2}\right)}{\Gamma_2\!\left(\delta + \tfrac{1}{2}(m+3)\right)} \int |K_{AA}|^{\delta + \frac{1}{2}m}\, |K_{BB}|^{\delta + 1} \exp(-\operatorname{tr}(K_{AA}) - \operatorname{tr}(K_{BB}))\, {}_0F_1\!\left(\delta + \tfrac{1}{2}(m+3); K_{AA} C K_{BB} C^T\right)\, dK_{AA}\, dK_{BB}.$$

Applying (3.5) to expand this 0F1 function of matrix argument in terms of T a classical 0F1 function of tr(KAACKBBC ), and applying (3.1), we get

$${}_0F_1\!\left(\delta + \tfrac{1}{2}(m+3); K_{AA} C K_{BB} C^T\right) = \sum_{q=0}^{\infty} \frac{|K_{AA} C K_{BB} C^T|^q}{q!\, \left(\delta + \tfrac{1}{2}(m+3)\right)_{2q} \left(\delta + \tfrac{1}{2}(m+2)\right)_q}\, {}_0F_1\!\left(\delta + 2q + \tfrac{1}{2}(m+3); \operatorname{tr}(K_{AA} C K_{BB} C^T)\right)$$
$$= \sum_{q=0}^{\infty} \frac{|K_{AA} C K_{BB} C^T|^q}{q!\, \left(\delta + \tfrac{1}{2}(m+3)\right)_{2q} \left(\delta + \tfrac{1}{2}(m+2)\right)_q} \sum_{l=0}^{\infty} \frac{\operatorname{tr}(K_{AA} C K_{BB} C^T)^l}{l!\, \left(\delta + 2q + \tfrac{1}{2}(m+3)\right)_l}.$$

By the Binet–Cauchy formula (see Karlin [18, p. 1]),

$$|K_{AA} C K_{BB} C^T| = |K_{AA}| \cdot |C K_{BB} C^T| = |K_{AA}| \sum_{1 \leq i < j \leq m} k_i k_j\, |C_{\{1,2\},\{i,j\}}|^2,$$

where $k_1, \ldots, k_m$ denote the diagonal entries of $K_{BB}$.

Hence, by the Multinomial Theorem,

$$|K_{AA} C K_{BB} C^T|^q = |K_{AA}|^q \sum_{Q} \binom{q}{Q} \prod_{1 \leq i < j \leq m} (k_i k_j)^{Q_{ij}}\, |C_{\{1,2\},\{i,j\}}|^{2Q_{ij}},$$

and, with $a_1, a_2$ denoting the diagonal entries of $K_{AA}$,

$$\operatorname{tr}(K_{AA} C K_{BB} C^T)^l = \Big(\sum_{i=1}^2 \sum_{j=1}^m a_i k_j C_{ij}^2\Big)^l = \sum_{L} \binom{l}{L} \prod_{i=1}^2 \prod_{j=1}^m a_i^{L_{ij}} k_j^{L_{ij}} C_{ij}^{2L_{ij}}.$$

Evaluating each gamma integral and simplifying the outcomes, we obtain

$$I_{H_{2,m}}(\delta, D) = \frac{\pi^m\, \Gamma_2\!\left(\delta + \tfrac{3}{2}\right)}{\Gamma_2\!\left(\delta + \tfrac{1}{2}(m+3)\right)}\, \Gamma(\delta + 2)^m\, \Gamma\!\left(\delta + \tfrac{1}{2}(m+2)\right)^2 \sum_{q=0}^{\infty} \frac{\left(\delta + \tfrac{1}{2}(m+2)\right)_q}{q!\, \left(\delta + \tfrac{1}{2}(m+3)\right)_{2q}} \sum_{l=0}^{\infty} \frac{1}{l!\, \left(\delta + 2q + \tfrac{1}{2}(m+3)\right)_l}$$
$$\cdot \sum_{L} \binom{l}{L} \prod_{i=1}^2 \prod_{j=1}^m C_{ij}^{2L_{ij}} \prod_{i=1}^2 \left(\delta + q + \tfrac{1}{2}(m+2)\right)_{L_{i+}} \prod_{j=1}^m (\delta + 2)_{L_{+j}}$$
$$\cdot \sum_{Q} \binom{q}{Q} \prod_{1 \leq i < j \leq m} |C_{\{1,2\},\{i,j\}}|^{2Q_{ij}} \prod_{j=1}^m (\delta + L_{+j} + 2)_{Q_{j+} + Q_{+j}}.$$

Finally, the value of $I_{H_{2,m}}(\delta, I_{m+2})$ is obtained by applying Proposition 2.1 or Theorem 2.5, so the proof now is complete.

Note that if we set $D = I_{m+2}$ in the proof of Proposition 3.1, then $|C_{\{1,2\},\{i,j\}}| = C_{ij} = 0$. Hence, in the infinite series, the only non-zero terms are those for which $l = q = 0$, so the series reduces identically to 1.

The special structure of $K$ was crucial for the proof of Proposition 3.1. We now combine Proposition 3.1 with the approach developed in Theorem 2.2, of representing $K$ by its upper Cholesky decomposition, to describe how the normalizing constant changes when removing an edge from a chordal graph with maximal clique size at most 3. As in the proof of Theorem 2.5, the main difficulty lies in defining a good ordering of the nodes. To simplify notation, we denote the quotient of the normalizing constants for general $D$ and the identity matrix by $\bar{I}_G(\delta, D)$, i.e.,

$$\bar{I}_G(\delta, D) = \frac{I_G(\delta, D)}{I_G(\delta, I_p)}.$$

As an example, note that $\bar{I}_{H_{2,m}}(\delta, D)$ is given in Proposition 3.1.

Corollary 3.2. Let G = (V,E) be an undirected graph of minimum fill-in 1 with vertices V = {1, . . . , p} and maximal clique size at most 3. Let Ge = (V,Ee) denote the graph G with one additional edge e, i.e., Ee = E ∪ {e}, such that Ge is chordal and its maximal clique size is also at most 3. Let d denote the number of triangles formed by the edge e and two other edges in Ge. Then

$$I_G(\delta, D) = \pi^{-1/2}\, \frac{\Gamma\!\left(\delta + \tfrac{1}{2}(d+2)\right)}{\Gamma\!\left(\delta + \tfrac{1}{2}(d+3)\right)}\, \frac{|D_{\{1,d+2\}}|^{-(\delta + \frac{3}{2})(d-1)}}{\prod_{j=2}^{d+1} |D_{\{1,j,d+2\}}|^{-(\delta + 2)}}\, \bar{I}_{H_{2,d}}(\delta, D)\, I_{G^e}(\delta, D),$$

where D{i1,...,ik} denotes the principal submatrix of D corresponding to the rows and columns i1, . . . , ik.

Proof. We define an ordering of the vertices in such a way that the integral for the normalizing constant $I_G(\delta, D)$ decomposes into an integral over a bipartite graph and an integral over the remaining variables. As in the proof of Theorem 2.5, label one of the end points of $e$ as '1', label the other end point as 'd + 2', and label the $d$ vertices involved in triangles over the edge $e$ by $2, \ldots, d+1$. Label all remaining vertices by $d+3, \ldots, p$. Let $\bar{\mathcal{G}}$ denote the moral DAG of $G$, with edge set $\bar{E}$, and similarly for $G^e$.

By Theorem 2.2, the normalizing constant for $G$ decomposes into an integral over the variables $\mathcal{A} = \{a_{ij} \mid (i,j) \in \bar{E},\ i, j \leq d+2\}$ and an integral over the variables $\mathcal{B} = \{a_{ij} \mid (i,j) \in \bar{E},\ a_{ij} \notin \mathcal{A}\}$. The equivalent statement holds for the graph $G^e$, with $\mathcal{A}^e = \mathcal{A} \cup \{e\}$ and $\mathcal{B}^e = \mathcal{B}$. Note that the integral over $\mathcal{B}$ is the same for $G$ as for $G^e$. The integral over $\mathcal{A}$ is the normalizing constant for the complete bipartite graph $H_{2,d}$ with parts $U = \{1, d+2\}$ and $W = \{2, \ldots, d+1\}$, where every vertex in $U$ is connected to all vertices in $W$, but there are no edges within $U$ nor within $W$. The integral over $\mathcal{A}^e = \mathcal{A} \cup \{e\}$ is the normalizing constant for the complete bipartite graph $H_{2,d}$ with one additional edge connecting the two nodes in $U$. We denote this graph by $H_{2,d}^e$. So

$$I_G(\delta,D) = I_{G^e}(\delta,D)\,\frac{I_{H_{2,d}}(\delta,D)}{I_{H_{2,d}^e}(\delta,D)} = I_{G^e}(\delta,D)\,\frac{I_{H_{2,d}}(\delta,I_{d+2})\,\bar I_{H_{2,d}}(\delta,D)}{I_{H_{2,d}^e}(\delta,D)},$$
where $\bar I_{H_{2,d}}(\delta,D)$ is given by Proposition 3.1. The additional edge $e$ makes the graph $H_{2,d}^e$ chordal, and hence its normalizing constant is computed using (1.4):

$$I_{H_{2,d}^e}(\delta,D) = I_{H_{2,d}^e}(\delta,I_{d+2})\,\frac{\prod_{j=2}^{d+1}|D_{\{1,j,d+2\}}|}{|D_{\{1,d+2\}}|^{d-1}}.$$

By Theorem 2.5,

$$\frac{I_{H_{2,d}}(\delta,I_{d+2})}{I_{H_{2,d}^e}(\delta,I_{d+2})} = \pi^{-1/2}\,\frac{\Gamma\big(\delta+\tfrac12(d+2)\big)}{\Gamma\big(\delta+\tfrac12(d+3)\big)}.$$

By collecting all terms, we find
$$\begin{aligned}
I_G(\delta,D) &= I_{G^e}(\delta,D)\,\frac{I_{H_{2,d}}(\delta,I_{d+2})\,\bar I_{H_{2,d}}(\delta,D)}{I_{H_{2,d}^e}(\delta,I_{d+2})}\cdot\frac{|D_{\{1,d+2\}}|^{d-1}}{\prod_{j=2}^{d+1}|D_{\{1,j,d+2\}}|}\\
&= \pi^{-1/2}\,\frac{\Gamma\big(\delta+\tfrac12(d+2)\big)}{\Gamma\big(\delta+\tfrac12(d+3)\big)}\cdot\frac{|D_{\{1,d+2\}}|^{d-1}}{\prod_{j=2}^{d+1}|D_{\{1,j,d+2\}}|}\,\bar I_{H_{2,d}}(\delta,D)\,I_{G^e}(\delta,D).
\end{aligned}$$
The proof is now complete.
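As a numerical aside, the scalar prefactor in Corollary 3.2 (the gamma ratio together with the products of principal minors) is easy to evaluate in log space. The sketch below is our own illustration, not code from the paper; the function and helper names are ours, and for $D=I$ the minor products reduce to 1, leaving just $\pi^{-1/2}\Gamma(\delta+\frac12(d+2))/\Gamma(\delta+\frac12(d+3))$.

```python
import math
import numpy as np

def minor(D, idx):
    """Determinant of the principal submatrix of D on rows/columns idx (0-based)."""
    return np.linalg.det(D[np.ix_(idx, idx)])

def corollary_3_2_prefactor(D, delta, d):
    """pi^{-1/2} Gamma(delta+(d+2)/2)/Gamma(delta+(d+3)/2)
    * |D_{1,d+2}|^{d-1} / prod_{j=2}^{d+1} |D_{1,j,d+2}|,
    with the corollary's vertex labels 1..d+2 mapped to 0-based indices."""
    log_val = -0.5 * math.log(math.pi)
    log_val += math.lgamma(delta + 0.5*(d + 2)) - math.lgamma(delta + 0.5*(d + 3))
    log_val += (d - 1) * math.log(minor(D, [0, d + 1]))
    log_val -= sum(math.log(minor(D, [0, j, d + 1])) for j in range(1, d + 1))
    return math.exp(log_val)
```

For $D=I_6$, $\delta=1$, $d=3$ this evaluates to $\pi^{-1/2}\Gamma(3.5)/\Gamma(4) = 15/48$.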

Corollary 3.2 can be generalized to graphs of minimum fill-in 1 and arbitrary treewidth to obtain an extension of Theorem 2.5 to general $D$. This involves decomposing the normalizing constant for $G$ into a normalizing constant for the chordal graph $G^e$ and a quotient of normalizing constants for the subgraph induced by the triangles over the edge $e$. This technical result is given in Theorem (S.3) in the Supplementary Material. We now prove our main result, which can be applied to compute the normalizing constant for any graph. It involves showing how the normalizing constant changes when a whole clique is removed from a graph. However, for graphs of minimum fill-in 1 it is advisable for computational reasons to use the specialized result given in Theorem 4 in the Appendix. In the following, we denote by $G_A$ the subgraph of $G$ induced by the vertices $A\subset V$. In the following theorem, we will encounter a symmetric matrix $T_{AA}=(T_{ij})_{i,j\in A}$. Denoting Kronecker's delta by $\delta_{ij}$, we define the matrix of differential operators,

$$\frac{\partial}{\partial T_{AA}} = \left(\tfrac12(1+\delta_{ij})\,\frac{\partial}{\partial T_{ij}}\right)_{i,j\in A},$$
as in [9, 23]. The corresponding determinant, $\det(\partial/\partial T_{AA})$, and the $(r,s)$th cofactor, $\mathrm{Cof}_{rs}(\partial/\partial T_{AA})$, are defined in the usual way.
We will also make use of fractional powers of differential operators, a concept which is widely used in some areas of probability theory and mathematical analysis [3, 14] but which is new to the study of Wishart distributions for graphical models. In its simplest formulation, suppose a function $f:\mathbb{R}\to\mathbb{R}$ is such that its $n$th derivative, $(d/dx)^n f(x)$, can be analytically continued as a function of $n$ to a domain in $\mathbb{C}$; this allows us to define the $\alpha$th derivative, $(d/dx)^\alpha f(x)$, where $\alpha$ belongs to the domain of analyticity. Gårding [9] defined fractional powers, $\det(\partial/\partial T_{AA})^\alpha$, of the determinant $\det(\partial/\partial T_{AA})$ by means of analytic continuation in $\alpha$. We will apply Gårding's fractional powers of operators to calculate the normalizing constant $I_G(\delta,D)$, and we provide in Example 3.5 an explicit calculation for a case in which the fractional power of the determinant $\det(\partial/\partial T_{AA})$ is $-1/2$.

The following theorem is the main result of the paper. In this result, we express $I_G(\delta,D)$ in terms of a series in which derivatives with respect to the $U_{AA}$ are calculated, then the outcome is evaluated at $U_{AA}=T_{AA}$, then derivatives with respect to the $T_{AA}$ are calculated, and then the resulting expression is evaluated at $T_{AA}=D_{AA}$.

Theorem 3.3. Let $G=(V,E)$ be an undirected graph and partition $V=A\cup B$ such that the induced subgraph $G_B$ is a clique. Let $I=\{(i,j)\in A\times B \mid (i,j)\in E\}$ denote the edges connecting $A$ and $B$, and let $I_1$ denote the end points in $A$ and $I_2$ the end points in $B$ (i.e., the projections of $I$ onto the first and second coordinates). Define

$$\partial_{I_1,I_2}(D,T_{AA}) = \left(-D_{I_2(r),I_2(s)}\,\mathrm{Cof}_{I_1(r),I_1(s)}\Big(\frac{\partial}{\partial T_{AA}}\Big)\right)_{r,s=1}^{|I|},\tag{3.8}$$
a $|I|\times|I|$ matrix of differential operators. Then

$$\begin{aligned}
I_G(\delta,D) &= \pi^{|I|/2}\,\Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,|D_{BB}|^{-(\delta+\frac12(|B|+1))}\,\det\big(\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\\
&\quad\cdot\sum_{\substack{0\le j_{rs}<\infty\\ 1\le r\le s\le|I|}}\det\big(\partial_{I_1,I_2}(D,U_{AA})\big)^{-j_{..}}\\
&\quad\cdot\prod_{1\le r\le s\le|I|}\frac{(1+\delta_{rs})^{j_{rs}}}{j_{rs}!}\,D_{I_1(r),I_2(r)}^{j_{rs}}\,D_{I_1(s),I_2(s)}^{j_{rs}}\,\mathrm{Cof}_{rs}\big(\partial_{I_1,I_2}(D,U_{AA})\big)^{j_{rs}}\\
&\quad\cdot I_{G_A}\big(\delta+\tfrac12|I|+j_{..},\,U_{AA}\big)\,\bigg|_{U_{AA}=T_{AA}}\bigg|_{T_{AA}=D_{AA}},
\end{aligned}$$
where $j_{..}=\sum_{1\le r\le s\le|I|}j_{rs}$.

As a corollary of this theorem, we obtain an analogous formula for the case in which $D=I_p$.

Corollary 3.4. Let $G=(V,E)$ be an undirected graph with vertices $V=\{1,\dots,p\}$. Let $V$ be partitioned such that $V=A\cup B$ and the induced subgraph $G_B$ is a clique. Let $I=\{(i,j)\in A\times B\mid(i,j)\in E\}$ denote the edges connecting $A$ and $B$, and let $I_1$ denote the end points in $A$ and $I_2$ the end points in $B$. Then

$$I_G(\delta,I_p) = \pi^{|I|/2}\,\Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,\det\big(\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\,I_{G_A}\big(\delta+|I|/2,\,T_{AA}\big)\Big|_{T_{AA}=I_{|A|}}.$$

Theorem 3.3 and Corollary 3.4 enable calculation of the normalizing constant of the G-Wishart distribution for any graph by removing cliques sequentially until the resulting graph is chordal, in which case the normalizing constant is known. In the following example we show how to apply Theorem 3.3 in order to compute the normalizing constant for general $D$ for the graph $G_5$ given in Figure 1.
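The reduction just described requires a chordality test at each step. The routine below is a from-scratch sketch of the standard maximum-cardinality-search test (a classical algorithm, not one from this paper); the dictionary `G5` encodes the five-vertex graph as reconstructed from the zero pattern in Example 3.5, and `G5e` adds the fill-in edge $(1,3)$ that produces its chordal cover.

```python
def is_chordal(adj):
    """Chordality test via maximum cardinality search (MCS).
    adj maps each vertex to the set of its neighbours."""
    weight = {v: 0 for v in adj}
    mcs = []
    while weight:
        v = max(weight, key=weight.get)      # vertex with most already-numbered neighbours
        mcs.append(v)
        del weight[v]
        for u in adj[v]:
            if u in weight:
                weight[u] += 1
    elim = list(reversed(mcs))               # candidate perfect elimination ordering
    pos = {v: i for i, v in enumerate(elim)}
    for v in elim:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        if later:
            u = min(later, key=pos.get)      # earliest later neighbour
            if any(w != u and w not in adj[u] for w in later):
                return False                 # elimination ordering is not perfect
    return True

# G5 (Example 3.5) and its chordal cover G5e obtained by adding the edge (1,3):
G5  = {1: {2, 4, 5}, 2: {1, 3}, 3: {2, 4, 5}, 4: {1, 3, 5}, 5: {1, 3, 4}}
G5e = {1: {2, 3, 4, 5}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {1, 3, 5}, 5: {1, 3, 4}}
```

Here `is_chordal(G5)` is False (the 4-cycle 1-2-3-4 is chordless), while `is_chordal(G5e)` is True.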

Example 3.5. We wish to calculate
$$I_{G_5}(\delta,D) = \int_{K\in\mathbb{S}^5_{\succ0}(G_5)}|K|^{\delta}\exp(-\mathrm{tr}(KD))\,dK.$$
We partition the matrix $K$ into blocks,
$$K = \begin{pmatrix}K_{AA} & K_{AB}\\ K_{AB}^T & K_{BB}\end{pmatrix},$$
where
$$K_{AA} = \begin{pmatrix}k_{11} & k_{12}\\ k_{12} & k_{22}\end{pmatrix},\qquad K_{AB} = \begin{pmatrix}0 & k_{14} & k_{15}\\ k_{23} & 0 & 0\end{pmatrix},\qquad K_{BB} = \begin{pmatrix}k_{33} & k_{34} & k_{35}\\ k_{34} & k_{44} & k_{45}\\ k_{35} & k_{45} & k_{55}\end{pmatrix}.$$

Noting that $K_{BB}$ is unconstrained, we now apply Theorem 3.3. In the following, we provide all the ingredients of the calculation, viz.,
$$I_1=(2,1,1),\qquad I_2=(3,4,5),\qquad \mathrm{vec}(D_{AB})^I=\begin{pmatrix}d_{23}\\ d_{14}\\ d_{15}\end{pmatrix},$$
$$\Lambda^{-1} = |K_{AA}|^{-1}\begin{pmatrix}
d_{33}k_{11} & -d_{34}k_{12} & -d_{35}k_{12}\\
-d_{34}k_{12} & d_{44}k_{22} & d_{45}k_{22}\\
-d_{35}k_{12} & d_{45}k_{22} & d_{55}k_{22}
\end{pmatrix}.$$

Further, the matrix of differential operators is

$$\partial_{I_1,I_2}(D,T_{AA}) = \left(-D_{I_2(r),I_2(s)}\,\mathrm{Cof}_{I_1(r),I_1(s)}\Big(\frac{\partial}{\partial T_{AA}}\Big)\right)_{r,s=1}^{3} = \begin{pmatrix}
-d_{33}\dfrac{\partial}{\partial T_{11}} & \tfrac12 d_{34}\dfrac{\partial}{\partial T_{12}} & \tfrac12 d_{35}\dfrac{\partial}{\partial T_{12}}\\[2mm]
\tfrac12 d_{34}\dfrac{\partial}{\partial T_{12}} & -d_{44}\dfrac{\partial}{\partial T_{22}} & -\tfrac12 d_{45}\dfrac{\partial}{\partial T_{22}}\\[2mm]
\tfrac12 d_{35}\dfrac{\partial}{\partial T_{12}} & -\tfrac12 d_{45}\dfrac{\partial}{\partial T_{22}} & -d_{55}\dfrac{\partial}{\partial T_{22}}
\end{pmatrix},$$
and similarly for $\partial_{I_1,I_2}(D,U_{AA})$.

Since $K_{AA}$ is unconstrained, the integral $I_{G_A}(\delta,U_{AA})$ is a standard Wishart normalizing constant, so we have
$$I_{G_A}(\delta,U_{AA}) = \Gamma_2\big(\delta+\tfrac32\big)\,|U_{AA}|^{-(\delta+\frac32)}.$$
Then from Theorem 3.3 we obtain

$$\begin{aligned}
I_{G_5}(\delta,D) &= \pi^{3/2}\,\Gamma_3(\delta+2)\,|D_{BB}|^{-(\delta+2)}\,\big(\det\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\\
&\quad\cdot\sum_{\substack{0\le j_{rs}<\infty\\1\le r\le s\le3}}\Gamma_2(\delta+3+j_{..})\,\big(\det\partial_{I_1,I_2}(D,U_{AA})\big)^{-j_{..}}\\
&\quad\cdot\prod_{1\le r\le s\le3}\frac{(1+\delta_{rs})^{j_{rs}}}{j_{rs}!}\,D_{I_1(r),I_2(r)}^{j_{rs}}\,D_{I_1(s),I_2(s)}^{j_{rs}}\,\big(\mathrm{Cof}_{rs}\,\partial_{I_1,I_2}(D,U_{AA})\big)^{j_{rs}}\\
&\quad\cdot|U_{AA}|^{-(\delta+3+j_{..})}\,\bigg|_{U_{AA}=T_{AA}}\bigg|_{T_{AA}=D_{AA}}.
\end{aligned}\tag{3.9}$$

For the case in which $D=I_5$, we have $D_{I_1(r),I_2(r)}=0$ for all $r=1,2,3$, and hence we deduce the result given in Corollary 3.4, viz.,

$$I_{G_5}(\delta,I_5) = \pi^{3/2}\,\Gamma_3(\delta+2)\,\Gamma_2(\delta+3)\,\big(\det\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\,|T_{AA}|^{-(\delta+3)}\Big|_{T_{AA}=I_{|A|}}.$$

By (3.8),

$$\begin{aligned}
\big(\det\partial_{I_1,I_2}(D,T_{AA})\big)^{n}\,|T_{AA}|^{-(\delta+3)}\Big|_{T_{AA}=I_{|A|}} &= (-1)^{n}\Big(\frac{\partial}{\partial T_{11}}\Big)^{n}\Big(\frac{\partial}{\partial T_{22}}\Big)^{2n}(T_{11}T_{22})^{-(\delta+3)}\Big|_{T_{11}=T_{22}=1}\\
&= (\delta+3)(\delta+4)\cdots(\delta+2+n)\cdot(\delta+3)(\delta+4)\cdots(\delta+2+2n)\\
&= \frac{\Gamma(\delta+3+n)}{\Gamma(\delta+3)}\cdot\frac{\Gamma(\delta+3+2n)}{\Gamma(\delta+3)}.
\end{aligned}$$

The latter expression, considered as a function of a complex variable $n$, is analytic in the complex plane on a region containing the point $n=-\tfrac12$. Therefore, in accordance with Gårding's fractional calculus,

$$\big(\det\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\,|T_{AA}|^{-(\delta+3)}\Big|_{T_{AA}=I_{|A|}} = \frac{\Gamma(\delta+3+n)}{\Gamma(\delta+3)}\cdot\frac{\Gamma(\delta+3+2n)}{\Gamma(\delta+3)}\bigg|_{n=-\frac12} = \frac{\Gamma(\delta+\frac52)}{\Gamma(\delta+3)}\cdot\frac{\Gamma(\delta+2)}{\Gamma(\delta+3)},$$
so we obtain the same result for $I_{G_5}(\delta,I_5)$ as in (2.4).
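The analytic continuation above can be spot-checked numerically: at integer $n$ the gamma-ratio formula must agree with genuine repeated differentiation, and at $n=-\frac12$ it stays finite. A small sketch of our own (with $\delta=1$ fixed for concreteness, and assuming sympy is available for the symbolic derivatives):

```python
import math
import sympy as sp

t1, t2 = sp.symbols('t1 t2', positive=True)
a = 4  # delta + 3 with delta = 1

def lhs(n):
    """(-1)^n (d/dt1)^n (d/dt2)^{2n} (t1 t2)^{-(delta+3)} at t1 = t2 = 1 (integer n)."""
    expr = sp.diff((t1*t2)**(-sp.Integer(a)), t1, n, t2, 2*n)
    return float((-1)**n * expr.subs({t1: 1, t2: 1}))

def rhs(n):
    """Gamma(delta+3+n) Gamma(delta+3+2n) / Gamma(delta+3)^2; n may be fractional."""
    return math.gamma(a + n) * math.gamma(a + 2*n) / math.gamma(a)**2

# integer orders agree with actual differentiation, and n = -1/2 is finite:
value_at_half = rhs(-0.5)   # Gamma(delta+5/2) Gamma(delta+2) / Gamma(delta+3)^2
```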

To complete this section, we now provide the proofs of Theorem 3.3 and Corollary 3.4.
Proof of Theorem 3.3. The matrix $K$ is of the form
$$K = \begin{pmatrix} K_{AA} & K_{AB}\\ K_{AB}^T & K_{BB}\end{pmatrix}\in\mathbb{S}^p_{\succ0},$$
where $K_{BB}$ has no zero constraints. By applying the determinant formula for block matrices,

$$|K| = |K_{AA}|\cdot\big|K_{BB}-K_{AB}^TK_{AA}^{-1}K_{AB}\big|,$$

and changing variables, $K_{BB}\to K_{BB}+K_{AB}^TK_{AA}^{-1}K_{AB}$, we obtain
$$\begin{aligned}
I_G(\delta,D) &= \int|K_{BB}|^{\delta}\exp(-\mathrm{tr}(K_{BB}D_{BB}))\,dK_{BB}\\
&\quad\cdot\int|K_{AA}|^{\delta}\exp(-\mathrm{tr}(K_{AA}D_{AA}))\int\exp(-2\,\mathrm{tr}(K_{AB}D_{AB}))\,\exp\big(-\mathrm{tr}(D_{BB}K_{AB}^TK_{AA}^{-1}K_{AB})\big)\,dK_{AB}\,dK_{AA},
\end{aligned}$$
and hence

$$\begin{aligned}
I_G(\delta,D) &= \Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,|D_{BB}|^{-(\delta+\frac12(|B|+1))}\int|K_{AA}|^{\delta}\exp(-\mathrm{tr}(K_{AA}D_{AA}))\\
&\qquad\cdot\int\exp(-2\,\mathrm{tr}(K_{AB}D_{AB}))\,\exp\big(-\mathrm{tr}(D_{BB}K_{AB}^TK_{AA}^{-1}K_{AB})\big)\,dK_{AB}\,dK_{AA},
\end{aligned}$$
where we applied (1.2) to compute the integral over $K_{BB}$. Denote by $\mathrm{vec}(K_{AB})$ the vectorized matrix $K_{AB}$, written column by column. We apply a formula for the Kronecker product of matrices (see Muirhead [26, p. 76]) to obtain

$$\mathrm{tr}\big(D_{BB}K_{AB}^TK_{AA}^{-1}K_{AB}\big) = \mathrm{vec}(K_{AB})^T\big(D_{BB}\otimes K_{AA}^{-1}\big)\mathrm{vec}(K_{AB}).$$
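This Kronecker-product identity is easy to verify numerically. The following throwaway sketch (variable names are our own) checks it on a random instance, with vec taken column by column as in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
nA, nB = 2, 3
M = rng.standard_normal((nA, nA)); K_AA = M @ M.T + nA*np.eye(nA)   # positive definite
M = rng.standard_normal((nB, nB)); D_BB = M @ M.T + nB*np.eye(nB)   # positive definite
K_AB = rng.standard_normal((nA, nB))

trace_form = np.trace(D_BB @ K_AB.T @ np.linalg.inv(K_AA) @ K_AB)
v = K_AB.flatten(order='F')                       # vec(K_AB), column by column
quad_form = v @ np.kron(D_BB, np.linalg.inv(K_AA)) @ v
```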

Let $I=\{(i,j)\in A\times B \mid (K_{AB})_{ij}\neq0\}$ and let $I_1$ denote the projection of $I$ onto the first index and $I_2$ the projection of $I$ onto the second index. Let $\mathrm{vec}(K_{AB})^I$ denote the column vector containing the non-zero entries of $\mathrm{vec}(K_{AB})$, and let $\Lambda^{-1}$ be the matrix containing the entries of $D_{BB}\otimes K_{AA}^{-1}$ corresponding to the components of $\mathrm{vec}(K_{AB})^I$, i.e.,

$$(\Lambda^{-1})_{rs} = D_{I_2(r),I_2(s)}\,(K_{AA}^{-1})_{I_1(r),I_1(s)} = \frac{1}{|K_{AA}|}\,D_{I_2(r),I_2(s)}\,\mathrm{Cof}_{I_1(r),I_1(s)}(K_{AA}),\tag{3.10}$$
where $\mathrm{Cof}_{ij}(K_{AA})$ denotes the $(i,j)$th entry of the cofactor matrix of $K_{AA}$. Then

$$\mathrm{tr}(K_{AB}D_{AB}) = \big(\mathrm{vec}(K_{AB})^I\big)^T\mathrm{vec}(D_{AB})^I,\qquad \mathrm{tr}\big(D_{BB}K_{AB}^TK_{AA}^{-1}K_{AB}\big) = \big(\mathrm{vec}(K_{AB})^I\big)^T\Lambda^{-1}\,\mathrm{vec}(K_{AB})^I,$$
and hence we obtain the integral over $K_{AB}$ in the form of a Gaussian integral:
$$\begin{aligned}
&\int\exp(-2\,\mathrm{tr}(K_{AB}D_{AB}))\,\exp\big(-\mathrm{tr}(D_{BB}K_{AB}^TK_{AA}^{-1}K_{AB})\big)\,dK_{AB}\\
&\qquad=\int\exp\big(-2\,(\mathrm{vec}(K_{AB})^I)^T\mathrm{vec}(D_{AB})^I\big)\,\exp\big(-(\mathrm{vec}(K_{AB})^I)^T\Lambda^{-1}\mathrm{vec}(K_{AB})^I\big)\,d\,\mathrm{vec}(K_{AB})^I\\
&\qquad=\pi^{|I|/2}\,|\Lambda|^{1/2}\exp\big((\mathrm{vec}(D_{AB})^I)^T\Lambda\,\mathrm{vec}(D_{AB})^I\big).
\end{aligned}$$

Therefore,

$$\begin{aligned}
I_G(\delta,D) &= \pi^{|I|/2}\,\Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,|D_{BB}|^{-(\delta+\frac12(|B|+1))}\\
&\quad\cdot\int|K_{AA}|^{\delta+|I|/2}\exp(-\mathrm{tr}(K_{AA}D_{AA}))\,\det\Big(\big[D_{I_2(r),I_2(s)}\mathrm{Cof}_{I_1(r),I_1(s)}(K_{AA})\big]_{r,s=1}^{|I|}\Big)^{-1/2}\\
&\quad\cdot\exp\big((\mathrm{vec}(D_{AB})^I)^T\Lambda\,\mathrm{vec}(D_{AB})^I\big)\,dK_{AA}.
\end{aligned}$$

Now note that
$$\det\Big(\big[D_{I_2(r),I_2(s)}\mathrm{Cof}_{I_1(r),I_1(s)}(K_{AA})\big]_{r,s=1}^{|I|}\Big)\exp(-\mathrm{tr}(K_{AA}D_{AA})) = \det\big(\partial_{I_1,I_2}(D,T_{AA})\big)\exp(-\mathrm{tr}(K_{AA}T_{AA}))\Big|_{T_{AA}=D_{AA}}.\tag{3.11}$$
By analytic continuation [9], we obtain

$$\begin{aligned}
I_G(\delta,D) &= \pi^{|I|/2}\,\Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,|D_{BB}|^{-(\delta+\frac12(|B|+1))}\,\det\big(\partial_{I_1,I_2}(D,T_{AA})\big)^{-1/2}\\
&\quad\cdot\int|K_{AA}|^{\delta+|I|/2}\exp(-\mathrm{tr}(K_{AA}T_{AA}))\,\exp\big((\mathrm{vec}(D_{AB})^I)^T\Lambda\,\mathrm{vec}(D_{AB})^I\big)\,dK_{AA}\,\bigg|_{T_{AA}=D_{AA}}.
\end{aligned}\tag{3.12}$$
Now we write the exponential function as an infinite series and apply the cofactor formula to express $\Lambda$ in terms of the entries of $\Lambda^{-1}$:
$$\exp\big((\mathrm{vec}(D_{AB})^I)^T\Lambda\,\mathrm{vec}(D_{AB})^I\big) = \sum_{\substack{0\le j_{rs}<\infty\\1\le r\le s\le|I|}}\prod_{1\le r\le s\le|I|}\frac{(1+\delta_{rs})^{j_{rs}}}{j_{rs}!}\,D_{I_1(r),I_2(r)}^{j_{rs}}\,D_{I_1(s),I_2(s)}^{j_{rs}}\,\Lambda_{rs}^{j_{rs}}$$

$$= \sum_{\substack{0\le j_{rs}<\infty\\1\le r\le s\le|I|}}\prod_{1\le r\le s\le|I|}\frac{(1+\delta_{rs})^{j_{rs}}}{j_{rs}!}\,D_{I_1(r),I_2(r)}^{j_{rs}}\,D_{I_1(s),I_2(s)}^{j_{rs}}\,|K_{AA}|^{j_{rs}}\,\mathrm{Cof}_{rs}\Big(\big[D_{I_2(a),I_2(b)}\mathrm{Cof}_{I_1(a),I_1(b)}(K_{AA})\big]_{a,b=1}^{|I|}\Big)^{j_{rs}}\det\Big(\big[D_{I_2(a),I_2(b)}\mathrm{Cof}_{I_1(a),I_1(b)}(K_{AA})\big]_{a,b=1}^{|I|}\Big)^{-j_{rs}}.$$
Denoting by $j_{..}$ the sum $\sum_{1\le r\le s\le|I|}j_{rs}$, and defining the matrix of differential operators
$$\frac{\partial}{\partial U_{AA}} = \left(\tfrac12(1+\delta_{ij})\,\frac{\partial}{\partial U_{ij}}\right)_{i,j\in A}$$
similarly to (3.11), we obtain

$$\begin{aligned}
I_G(\delta,D) &= \pi^{|I|/2}\,\Gamma_{|B|}\big(\delta+\tfrac12(|B|+1)\big)\,|D_{BB}|^{-(\delta+\frac12(|B|+1))}\\
&\quad\cdot\det\!\left(\Big(-D_{I_2(r),I_2(s)}\,\mathrm{Cof}_{I_1(r),I_1(s)}\Big(\frac{\partial}{\partial T_{AA}}\Big)\Big)_{r,s=1}^{|I|}\right)^{-1/2}\\
&\quad\cdot\sum_{\substack{0\le j_{rs}<\infty\\1\le r\le s\le|I|}}\det\!\left(\Big(-D_{I_2(a),I_2(b)}\,\mathrm{Cof}_{I_1(a),I_1(b)}\Big(\frac{\partial}{\partial U_{AA}}\Big)\Big)_{a,b=1}^{|I|}\right)^{-j_{..}}\\
&\quad\cdot\prod_{1\le r\le s\le|I|}\frac{(1+\delta_{rs})^{j_{rs}}}{j_{rs}!}\,D_{I_1(r),I_2(r)}^{j_{rs}}\,D_{I_1(s),I_2(s)}^{j_{rs}}\,\mathrm{Cof}_{rs}\!\left(\Big(-D_{I_2(a),I_2(b)}\,\mathrm{Cof}_{I_1(a),I_1(b)}\Big(\frac{\partial}{\partial U_{AA}}\Big)\Big)_{a,b=1}^{|I|}\right)^{j_{rs}}\\
&\quad\cdot I_{G_A}\big(\delta+|I|/2+j_{..},\,U_{AA}\big)\,\bigg|_{U_{AA}=T_{AA}}\bigg|_{T_{AA}=D_{AA}},
\end{aligned}$$
where in the last line we used the fact that
$$I_{G_A}(\delta,U_{AA}) = \int |K_{AA}|^{\delta}\exp(-\mathrm{tr}(K_{AA}U_{AA}))\,dK_{AA}.$$

This completes the proof.
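The block-determinant factorization used at the start of the proof, $|K| = |K_{AA}|\,|K_{BB}-K_{AB}^TK_{AA}^{-1}K_{AB}|$, can likewise be spot-checked numerically. A throwaway sketch with our own names:

```python
import numpy as np

rng = np.random.default_rng(1)
p, nA = 5, 2
M = rng.standard_normal((p, p))
K = M @ M.T + p*np.eye(p)                        # positive definite test matrix
K_AA, K_AB, K_BB = K[:nA, :nA], K[:nA, nA:], K[nA:, nA:]

det_full = np.linalg.det(K)
det_schur = np.linalg.det(K_AA) * np.linalg.det(
    K_BB - K_AB.T @ np.linalg.inv(K_AA) @ K_AB)  # Schur complement of K_AA
```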

Proof of Corollary 3.4. This follows from Theorem 3.3 by setting $D=I_p$ in (3.12).

4. Discussion. In this paper we provided an explicit representation of the G-Wishart normalizing constant for general graphs. Theorem 3.3 is our main result, and it can be applied to compute the normalizing constant of any graph. However, for particular classes of graphs one may be able to obtain simpler formulas using a more specialized approach, as can be seen by comparing the two formulas (3.9) and (4.1) for $G_5$. In Proposition 3.1 we provided a simpler formula for bipartite graphs $H_{2,m}$, and in Corollary 3.2 and in Theorem 4 for graphs with minimum fill-in 1. Note that Corollary 3.2 and Theorem 4 can be applied to graphs of minimum fill-in 1 and also to graphs which are clique sums of graphs of minimum fill-in 1.
Even in modest dimensions the size of the graph space necessitates iterative methods to address model uncertainty, as exhaustive enumeration is infeasible. Since the graphical model may be just one part of a larger hierarchy, Markov chain Monte Carlo methods are naturally used to perform posterior inference. In such scenarios the chain moves between graphs in each scan of the parameter set, and the transition probability reduces to the evaluation of ratios of G-Wishart normalizing constants. Since direct evaluation of these constants has appeared infeasible, previous work used computationally intensive sampling-based methods to approximate this ratio.
Our paper shows that computing the exact normalizing constant of the G-Wishart distribution is possible in principle. The various examples in this paper also make it clear that one can hope to find more computationally efficient procedures than Theorem 3.3 for computing the normalizing constant of the G-Wishart distribution for particular classes of graphs. Important future work is the development of specialized methods for computing the normalizing constants of different classes of graphs that are important for applications, one example being grids, which are widely used in spatial applications.

Acknowledgments. C.U.'s research was supported by the Austrian Science Fund (FWF) Y 903-N35. A.L.'s research was supported by Statistics for Innovation sfi2 in Oslo. D.R.'s research was partially supported by the U.S. National Science Foundation grant DMS-1309808, and by a Romberg Guest Professorship at the Heidelberg University Graduate School for Mathematical and Computational Methods in the Sciences, funded by German Universities Excellence Initiative grant GSC 220/2.

References.
[1] Andrews, G. E., Askey, R. and Roy, R. (2000). Special Functions. Cambridge University Press, New York. MR1688958
[2] Atay-Kayis, A. and Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika 92 317–335. MR2201362
[3] Bojdecki, T. and Gorostiza, L. G. (1999). Fractional Brownian motion via fractional Laplacian. Statist. Probab. Lett. 44 107–108.
[4] Cheng, Y. and Lenkoski, A. (2012). Hierarchical Gaussian graphical models: Beyond reversible jump. Electron. J. Statist. 6 2309–2331.
[5] Dawid, A. P. and Lauritzen, S. L. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist. 21 1272–1317.
[6] Dempster, A. P. (1972). Covariance selection. Biometrics 28 157–175.
[7] Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Statist. 5 969–993.
[8] Dobra, A., Lenkoski, A. and Rodriguez, A. (2011). Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. J. Amer. Statist. Assoc. 106 1418–1433.
[9] Gårding, L. (1947). The solution of Cauchy's problem for two totally hyperbolic linear differential equations by means of Riesz integrals. Ann. Math. 48 785–826.
[10] Giri, N. C. (2004). Multivariate Statistical Analysis. Marcel Dekker, New York. MR0468025
[11] Giudici, P. and Green, P. J. (1999). Decomposable graphical Gaussian model determination. Biometrika 86 785–801.
[12] Gross, K. I. and Richards, D. St. P. (1987). Special functions of matrix argument. I. Algebraic induction, zonal polynomials, and hypergeometric functions. Trans. Amer. Math. Soc. 301 781–811. MR0882715
[13] Herz, C. S. (1955). Bessel functions of matrix argument. Ann. Math. 61 474–523. MR0069960
[14] Hille, E. and Phillips, R. S. (1957). Functional Analysis and Semigroups. Amer. Math. Soc. Colloq. Publ. 31, Providence, RI.
[15] Ingham, A. E. (1933). An integral which occurs in statistics. Math. Proc. Camb. Phil. Soc. 29 271–276.
[16] James, A. T. (1964). Distributions of matrix variates and latent roots derived from normal samples. Ann. Math. Statist. 35 475–501. MR0181057
[17] Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C. and West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statist. Sci. 20 388–400.
[18] Karlin, S. (1968). Total Positivity, Vol. 1. Stanford University Press, Stanford, CA. MR0230102
[19] Lauritzen, S. L. (1996). Graphical Models. Oxford University Press, New York. MR1419991
[20] Lenkoski, A. (2013). A direct sampler for G-Wishart variates. Stat 2 119–128.
[21] Lenkoski, A. and Dobra, A. (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J. Comput. Graph. Statist. 20 140–157. MR2816542
[22] Letac, G. and Massam, H. (2007). Wishart distributions for decomposable graphs. Ann. Statist. 35 1278–1323.
[23] Maass, H. (1971). Siegel's Modular Forms and Dirichlet Series. Lecture Notes in Mathematics 216. Springer, Heidelberg.
[24] Mitsakakis, N., Massam, H. and Escobar, M. D. (2011). A Metropolis–Hastings based method for sampling from the G-Wishart distribution in Gaussian graphical models. Electron. J. Statist. 5 18–30.
[25] Muirhead, R. J. (1975). Expressions for some hypergeometric functions of matrix argument with applications. J. Multivariate Anal. 5 283–293. MR0381137
[26] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, Hoboken, NJ. MR0652932
[27] Olkin, I. (2002). The 70th anniversary of the distribution of random matrices: A survey. Linear Algebra Appl. 354 231–243. MR1927659
[28] Piccioni, M. (2000). Independence structure of natural conjugate densities to exponential families and the Gibbs sampler. Scand. J. Statist. 27 111–127.
[29] Roverato, A. (2002). Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Statist. 29 391–411. MR1925566
[30] Siegel, C. L. (1935). Über die analytische Theorie der quadratischen Formen. Ann. Math. 36 527–606. MR1503238
[31] Speed, T. P. and Kiiveri, H. (1986). Gaussian Markov distributions over finite graphs. Ann. Statist. 14 138–150.
[32] Wang, H. and Carvalho, C. M. (2010). Simulation of hyper-inverse Wishart distributions for non-decomposable graphs. Electron. J. Statist. 4 1470–1475.
[33] Wang, H. and Li, S. Z. (2012). Efficient Gaussian graphical model determination under G-Wishart prior distributions. Electron. J. Statist. 6 168–198.
[34] Wishart, J. (1928). The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20A 32–52.
[35] Wishart, J. and Bartlett, M. S. (1933). The generalised product moment distribution in a normal system. Math. Proc. Camb. Phil. Soc. 29 260–270.

Supplementary Material. Exact Formulas for the Normalizing Constants of Wishart Distributions for Graphical Models with Minimum Fill-In 1. In the following, we prove an extension of Corollary 3.2 to obtain a generalization of Theorem 2.5 for arbitrary $D$. This requires generalizing Proposition 3.1 to block matrices of the form
$$K = \begin{pmatrix} K_{AA} & K_{AB}\\ K_{AB}^T & K_{BB}\end{pmatrix}\in\mathbb{S}^p_{\succ0},$$
where $K_{AA}$ is arbitrary of size $2\times2$, $K_{AB}$ is complete of size $2\times m$, and $K_{BB}$ is arbitrary of size $m\times m$. In Lemma (S.1) we analyze the case in which $K_{AA}$ is complete, and in Lemma (S.2) the case in which $K_{AA}$ is diagonal.

Lemma (S.1). Let $G$ be a graph on $2+m$ vertices with two nodes that are connected to each other and to all other nodes, i.e., $K$ is of the form
$$K = \begin{pmatrix} K_{AA} & K_{AB}\\ K_{AB}^T & K_{BB}\end{pmatrix}\in\mathbb{S}^{2+m}_{\succ0},$$
where $K_{AA}$ is a complete $2\times2$ matrix, $K_{AB}$ is a complete $2\times m$ matrix, and $K_{BB}$ is an arbitrary $m\times m$ matrix. Then the integral $I_G(\delta,D)$ converges absolutely for all $\delta>-1$ and $D\in\mathbb{S}^{2+m}_{\succ0}$. Further,
$$I_G(\delta,D) = \pi^{m}\,\Gamma_2\big(\delta+\tfrac32\big)\,|D_{AA}|^{-(\delta+\frac12(m+3))}\,I_{G_B}\big(\delta+1,\,D_{BB}-D_{AB}^TD_{AA}^{-1}D_{AB}\big).$$

Proof. By applying the determinant formula for block matrices, making a change of variables to replace $K_{AB}$ by $K_{AA}^{1/2}K_{AB}K_{BB}^{1/2}$, and applying (3.3) as in the proof of Proposition 3.1, we find that
$$\begin{aligned}
I_G(\delta,D) &= \frac{\pi^m\,\Gamma_2(\delta+\frac32)}{\Gamma_2\big(\delta+\frac12(m+3)\big)}\int |K_{AA}|^{\delta+\frac12 m}\,|K_{BB}|^{\delta+1}\exp\big(-\mathrm{tr}(K_{AA}D_{AA})-\mathrm{tr}(K_{BB}D_{BB})\big)\\
&\qquad\qquad\cdot\,{}_0F_1\big(\delta+\tfrac12(m+3);\,K_{AA}D_{AB}K_{BB}D_{AB}^T\big)\,dK_{AA}\,dK_{BB}.
\end{aligned}$$

Since $K_{AA}$ is complete, we can apply (3.2):
$$\begin{aligned}
I_G(\delta,D) &= \pi^m\,\Gamma_2\big(\delta+\tfrac32\big)\,|D_{AA}|^{-(\delta+\frac12(m+3))}\int|K_{BB}|^{\delta+1}\exp\Big(-\mathrm{tr}\big(K_{BB}(D_{BB}-D_{AB}^TD_{AA}^{-1}D_{AB})\big)\Big)\,dK_{BB}\\
&= \pi^m\,\Gamma_2\big(\delta+\tfrac32\big)\,|D_{AA}|^{-(\delta+\frac12(m+3))}\,I_{G_B}\big(\delta+1,\,D_{BB}-D_{AB}^TD_{AA}^{-1}D_{AB}\big).
\end{aligned}$$
This completes the proof.

In the following lemma, we will encounter a symmetric matrix $E_{BB}=(E_{ij})_{i,j\in B}$. Denoting Kronecker's delta by $\delta_{ij}$, we define the matrix of differential operators,
$$\frac{\partial}{\partial E_{BB}} = \left(\tfrac12(1+\delta_{ij})\,\frac{\partial}{\partial E_{ij}}\right)_{i,j\in B},$$
and denote its minor corresponding to the rows $\{\alpha_1,\alpha_2\}$ and the columns $\{\beta_1,\beta_2\}$ by
$$\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}.$$

Lemma (S.2). Let $G$ be a graph on $2+m$ vertices with two nodes that are connected to all other nodes but not to each other, i.e., $K$ is of the form
$$K = \begin{pmatrix} K_{AA} & K_{AB}\\ K_{AB}^T & K_{BB}\end{pmatrix}\in\mathbb{S}^{2+m}_{\succ0},$$
where $K_{AA}=\mathrm{diag}(\kappa_1,\kappa_2)$, $K_{AB}$ is a complete $2\times m$ matrix, and $K_{BB}$ is an arbitrary $m\times m$ matrix. Then the integral $I_G(\delta,D)$ converges absolutely for all $\delta>-1$ and $D\in\mathbb{S}^{2+m}_{\succ0}$, and $I_G(\delta,D)$ is given by
$$\begin{aligned}
&\pi^{m-\frac12}\,\Gamma_2\big(\delta+\tfrac32\big)\,\frac{\Gamma\big(\delta+\tfrac12(m+2)\big)}{\Gamma\big(\delta+\tfrac12(m+3)\big)}\,
\sum_{q=0}^{\infty}\frac{\big(\delta+\tfrac12(m+2)\big)_q}{q!\,\big(\delta+\tfrac12(m+3)\big)_{2q}}\\
&\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,\big(\delta+2q+\tfrac12(m+3)\big)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\big(\delta+q+\tfrac12(m+2)\big)_{l_i}\\
&\cdot\sum_{Q}\binom{q}{Q}\Bigg(\prod_{3\le\alpha_1<\alpha_2\le m+2}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2++}}\Bigg)\Bigg(\prod_{3\le\beta_1<\beta_2\le m+2}|D_{A,\{\beta_1,\beta_2\}}|^{Q_{++\beta_1\beta_2}}\Bigg)\\
&\cdot\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\prod_{\substack{3\le\alpha_1<\alpha_2\le m+2\\3\le\beta_1<\beta_2\le m+2}}\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\\
&\cdot I_{G_B}(\delta+1,E_{BB})\,\bigg|_{E_{BB}=D_{BB}-\sum_{j=1}^{2}t_j D_{\{j\},B}^T D_{\{j\},B}}\bigg|_{t_1=t_2=0},
\end{aligned}$$
where $Q=(Q_{\alpha_1\alpha_2\beta_1\beta_2}: 3\le\alpha_1<\alpha_2\le m+2,\;3\le\beta_1<\beta_2\le m+2)$ is a vector of non-negative integers such that $Q_{++++}=q$.

Proof. By applying the determinant formula for block matrices, making a change of variables to replace $K_{AB}$ by $K_{AA}^{1/2}K_{AB}K_{BB}^{1/2}$, and applying (3.3) as in the proof of Proposition 3.1, we find that
$$\begin{aligned}
I_G(\delta,D) &= \frac{\pi^m\,\Gamma_2(\delta+\frac32)}{\Gamma_2\big(\delta+\frac12(m+3)\big)}\int |K_{AA}|^{\delta+\frac12 m}\,|K_{BB}|^{\delta+1}\exp\big(-\mathrm{tr}(K_{AA}D_{AA})-\mathrm{tr}(K_{BB}D_{BB})\big)\\
&\qquad\qquad\cdot\,{}_0F_1\big(\delta+\tfrac12(m+3);\,K_{AA}D_{AB}K_{BB}D_{AB}^T\big)\,dK_{AA}\,dK_{BB}.
\end{aligned}$$
Applying (3.5) and (3.1) as in the proof of Proposition 3.1, we obtain
$$\begin{aligned}
{}_0F_1\big(\delta+\tfrac12(m+3);\,K_{AA}D_{AB}K_{BB}D_{AB}^T\big) &= \sum_{q=0}^{\infty}\frac{|K_{AA}D_{AB}K_{BB}D_{AB}^T|^q}{q!\,\big(\delta+\tfrac12(m+3)\big)_{2q}\,\big(\delta+\tfrac12(m+2)\big)_q}\\
&\quad\cdot\sum_{l=0}^{\infty}\frac{\mathrm{tr}\big(K_{AA}D_{AB}K_{BB}D_{AB}^T\big)^l}{l!\,\big(\delta+2q+\tfrac12(m+3)\big)_l}.
\end{aligned}$$

Since $K_{AA}=\mathrm{diag}(\kappa_1,\kappa_2)$,
$$\mathrm{tr}\big(K_{AA}D_{AB}K_{BB}D_{AB}^T\big) = \kappa_1\,\mathrm{tr}\big(K_{BB}D_{\{1\},B}^TD_{\{1\},B}\big) + \kappa_2\,\mathrm{tr}\big(K_{BB}D_{\{2\},B}^TD_{\{2\},B}\big),$$
and hence by the Multinomial Theorem
$$\mathrm{tr}\big(K_{AA}D_{AB}K_{BB}D_{AB}^T\big)^l = \sum_{l_1+l_2=l}\binom{l}{l_1}\kappa_1^{l_1}\kappa_2^{l_2}\,\mathrm{tr}\big(K_{BB}D_{\{1\},B}^TD_{\{1\},B}\big)^{l_1}\,\mathrm{tr}\big(K_{BB}D_{\{2\},B}^TD_{\{2\},B}\big)^{l_2}.$$

By the Binet–Cauchy formula ([18, p. 1]),
$$|K_{AA}D_{AB}K_{BB}D_{AB}^T| = |K_{AA}|\cdot|D_{AB}K_{BB}D_{AB}^T| = \kappa_1\kappa_2\sum_{\substack{3\le\alpha_1<\alpha_2\le m+2\\3\le\beta_1<\beta_2\le m+2}}|D_{A,\{\alpha_1,\alpha_2\}}|\,|K_{\{\alpha_1,\alpha_2\},\{\beta_1,\beta_2\}}|\,|D_{A,\{\beta_1,\beta_2\}}|.$$
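The Binet–Cauchy expansion is a generic determinant identity and can be checked directly. The sketch below (our own, for a $2\times m$ factor) sums over all pairs of column subsets:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
m = 4
D_AB = rng.standard_normal((2, m))
M = rng.standard_normal((m, m)); K_BB = M @ M.T + m*np.eye(m)   # positive definite

det_product = np.linalg.det(D_AB @ K_BB @ D_AB.T)
det_expansion = sum(
    np.linalg.det(D_AB[:, list(S)])
    * np.linalg.det(K_BB[np.ix_(S, T)])
    * np.linalg.det(D_AB[:, list(T)])
    for S in combinations(range(m), 2)
    for T in combinations(range(m), 2)
)
```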

Hence, by the Multinomial Theorem,
$$\begin{aligned}
|K_{AA}D_{AB}K_{BB}D_{AB}^T|^q &= \kappa_1^q\kappa_2^q\sum_{Q}\binom{q}{Q}\prod_{\substack{3\le\alpha_1<\alpha_2\le m+2\\3\le\beta_1<\beta_2\le m+2}}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\,|K_{\{\alpha_1,\alpha_2\},\{\beta_1,\beta_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\,|D_{A,\{\beta_1,\beta_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\\
&= \kappa_1^q\kappa_2^q\sum_{Q}\binom{q}{Q}\Bigg(\prod_{3\le\alpha_1<\alpha_2\le m+2}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2++}}\Bigg)\Bigg(\prod_{3\le\beta_1<\beta_2\le m+2}|D_{A,\{\beta_1,\beta_2\}}|^{Q_{++\beta_1\beta_2}}\Bigg)\\
&\qquad\cdot\Bigg(\prod_{\substack{3\le\alpha_1<\alpha_2\le m+2\\3\le\beta_1<\beta_2\le m+2}}|K_{\{\alpha_1,\alpha_2\},\{\beta_1,\beta_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\Bigg).
\end{aligned}$$

Collecting all terms, we find that
$$\begin{aligned}
I_G(\delta,D) &= \frac{\pi^m\,\Gamma_2(\delta+\frac32)}{\Gamma_2\big(\delta+\frac12(m+3)\big)}\sum_{q=0}^{\infty}\frac{1}{q!\,\big(\delta+\frac12(m+3)\big)_{2q}\,\big(\delta+\frac12(m+2)\big)_q}\\
&\quad\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,\big(\delta+2q+\frac12(m+3)\big)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\int_0^\infty \kappa_i^{\delta+q+l_i+\frac12 m}\,e^{-\kappa_i}\,d\kappa_i\\
&\quad\cdot\sum_{Q}\binom{q}{Q}\Bigg(\prod_{3\le\alpha_1<\alpha_2\le m+2}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2++}}\Bigg)\Bigg(\prod_{3\le\beta_1<\beta_2\le m+2}|D_{A,\{\beta_1,\beta_2\}}|^{Q_{++\beta_1\beta_2}}\Bigg)\\
&\quad\cdot\int|K_{BB}|^{\delta+1}\exp(-\mathrm{tr}(K_{BB}D_{BB}))\Bigg(\prod_{j=1}^{2}\mathrm{tr}\big(K_{BB}D_{\{j\},B}^TD_{\{j\},B}\big)^{l_j}\Bigg)\Bigg(\prod_{\substack{3\le\alpha_1<\alpha_2\le m+2\\3\le\beta_1<\beta_2\le m+2}}|K_{\{\alpha_1,\alpha_2\},\{\beta_1,\beta_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\Bigg)\,dK_{BB}.
\end{aligned}$$

The gamma integrals over $\kappa_1$ and $\kappa_2$ are computed readily, so only the integral over the variables $K_{BB}$ remains to be evaluated, and we shall evaluate that integral in terms of a normalizing constant for the graph $G_B$. First, note that
$$\exp(-\mathrm{tr}(K_{BB}D_{BB}))\prod_{j=1}^{2}\mathrm{tr}\big(K_{BB}D_{\{j\},B}^TD_{\{j\},B}\big)^{l_j} = \Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\exp(-\mathrm{tr}(K_{BB}E_{BB}))\bigg|_{t_1=t_2=0},$$
where
$$E_{BB} = D_{BB}-\sum_{j=1}^{2}t_j\,D_{\{j\},B}^TD_{\{j\},B}.$$

Let $E_{ij}$ denote the entry of $E_{BB}$ corresponding to nodes $i$ and $j$ in $B$. Then
$$\int|K_{BB}|^{\delta+1}\exp(-\mathrm{tr}(K_{BB}E_{BB}))\,|K_{\{\alpha_1,\alpha_2\},\{\beta_1,\beta_2\}}|^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\,dK_{BB} = \left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\int|K_{BB}|^{\delta+1}\exp(-\mathrm{tr}(K_{BB}E_{BB}))\,dK_{BB}.$$
By collecting all terms, we obtain the desired result.

With these two lemmas, we now have the tools to generalize Corollary 3.2 to graphs of treewidth larger than 2. In the following theorem, we show how the normalizing constant changes when removing one edge from an arbitrary chordal graph G.

Theorem (S.3). Let $G=(V,E)$ be an undirected graph with minimum fill-in 1 on $p$ vertices. Let $G^e=(V,E^e)$ denote the graph $G$ with one additional edge $e$, i.e., $E^e=E\cup\{e\}$, such that $G^e$ is chordal. Let $d$ denote the number of triangles formed by the edge $e$ and two other edges in $G^e$. Let $V$ be partitioned such that $V=A\cup B\cup C$ with $|A|=2$, $|B|=d$, $|C|=p-d-2$, and where $A$ contains the two vertices adjacent to the edge $e$ in $G^e$, $B$ contains all vertices in $G^e$ that span a triangle with the edge $e$, and $C$ contains all remaining nodes. Then $I_G(\delta,D)$ is given by
$$\begin{aligned}
&\pi^{-1/2}\,\frac{\Gamma\big(\delta+\frac12(d+2)\big)}{\Gamma\big(\delta+\frac12(d+3)\big)}\,I_{G^e}(\delta,D)\,|D_{AA}|^{\delta+\frac12(d+3)}\sum_{q=0}^{\infty}\frac{\big(\delta+\frac12(d+2)\big)_q}{q!\,\big(\delta+\frac12(d+3)\big)_{2q}}\\
&\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,\big(\delta+2q+\frac12(d+3)\big)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\big(\delta+q+\tfrac12(d+2)\big)_{l_i}\\
&\cdot\sum_{Q}\binom{q}{Q}\Bigg(\prod_{3\le\alpha_1<\alpha_2\le d+2}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2++}}\Bigg)\Bigg(\prod_{3\le\beta_1<\beta_2\le d+2}|D_{A,\{\beta_1,\beta_2\}}|^{Q_{++\beta_1\beta_2}}\Bigg)\\
&\cdot\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\prod_{\substack{3\le\alpha_1<\alpha_2\le d+2\\3\le\beta_1<\beta_2\le d+2}}\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\\
&\cdot\frac{I_{G_B}(\delta+1,E_{BB})}{I_{G_B}\big(\delta+1,\,D_{BB}-D_{AB}^TD_{AA}^{-1}D_{AB}\big)}\,\bigg|_{E_{BB}=D_{BB}-\sum_{j=1}^{2}t_jD_{\{j\},B}^TD_{\{j\},B}}\bigg|_{t_1=t_2=0}.
\end{aligned}$$

Proof. Let $G_{AB}$ be the graph induced by the vertices $A\cup B$. By Theorem 2.2, and as in the proof of Corollary 3.2, the normalizing constants for $G$ and $G^e$ decompose into the normalizing constants for $G_{AB}$ and $G^e_{AB}$, respectively, and an integral over the variables involving $C$. Moreover, the integral over the variables involving $C$ is the same for $G$ and for $G^e$. Hence,
$$\frac{I_G(\delta,D)}{I_{G^e}(\delta,D)} = \frac{I_{G_{AB}}(\delta,D_{\{A,B\},\{A,B\}})}{I_{G^e_{AB}}(\delta,D_{\{A,B\},\{A,B\}})},$$
where $D_{\{A,B\},\{A,B\}}$ denotes the principal submatrix of $D$ corresponding to the rows and columns in $A\cup B$. Since $G_{AB}$ is of the form needed for Lemma (S.2) and $G^e_{AB}$ is of the form needed for Lemma (S.1), the claim follows by applying Lemma (S.1) and Lemma (S.2).

Note that since $G^e$ is chordal, the induced graph $G_B$ is also chordal. Hence its normalizing constant is given by (1.4). For the case in which $D$ is the identity matrix, $E_{BB}\equiv D_{BB}$, and hence for $l_1>0$ or $l_2>0$ we get
$$\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\prod_{\substack{3\le\alpha_1<\alpha_2\le d+2\\3\le\beta_1<\beta_2\le d+2}}\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\cdot\frac{I_{G_B}(\delta+1,E_{BB})}{I_{G_B}\big(\delta+1,\,D_{BB}-D_{AB}^TD_{AA}^{-1}D_{AB}\big)} = 0.$$
In addition, $|D_{A,\{\alpha_1,\alpha_2\}}| = |D_{A,\{\beta_1,\beta_2\}}| = 0$. Hence, in the infinite sums only the terms for $q=0$ and $l=0$ are non-zero, and the infinite series reduce to 1. Since $|D_{AA}|=1$, we see that if $D=I_p$ then Theorem (S.3) reduces to Theorem 2.5.
We revisit the graph $G_5$ discussed in Example 2.6 and show how to apply Theorem (S.3) to obtain the normalizing constant $I_{G_5}(\delta,D)$ explicitly.

Example (S.4). A minimal chordal cover of $G_5$ is given in Figure 1 (right). Only one edge is in the chordal cover of $G_5$ but not in $G_5$ itself, namely the edge $e=(1,3)$. We denote the chordal cover of $G_5$ by $G_5^e$. There are $d=3$ triangles formed by the edge $e$ in $G_5^e$. The vertices adjacent to the edge $e$ are $A=\{1,3\}$, and all remaining vertices span a triangle with the edge $e$, i.e., $B=\{2,4,5\}$. The induced graph $G_B$ consists of one edge only, namely $(4,5)$, so its normalizing constant is
$$I_{G_B}(\delta,D_B) = \Gamma_2\big(\delta+\tfrac32\big)\,\Gamma(\delta+1)\,|D_{\{4,5\}}|^{-(\delta+\frac32)},$$
where, in order to abbreviate notation, we denoted by $D_B$ and $D_{\{4,5\}}$, respectively, the principal submatrix of $D$ corresponding to the rows and columns in $B$ and in $\{4,5\}$, respectively. Hence, by applying Theorem (S.3), we obtain

the following formula for $I_{G_5}(\delta,D)$:
$$\begin{aligned}
&\pi^{-1/2}\,\frac{\Gamma(\delta+\frac52)}{\Gamma(\delta+3)}\,I_{G_5^e}(\delta,D)\,|D_{AA}|^{\delta+3}\sum_{q=0}^{\infty}\frac{(\delta+\frac52)_q}{q!\,(\delta+3)_{2q}}\\
&\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,(\delta+2q+3)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\big(\delta+q+\tfrac52\big)_{l_i}\\
&\cdot\sum_{Q}\binom{q}{Q}\Bigg(\prod_{\{\alpha_1,\alpha_2\}\subset\{2,4,5\}}|D_{A,\{\alpha_1,\alpha_2\}}|^{Q_{\alpha_1\alpha_2++}}\Bigg)\Bigg(\prod_{\{\beta_1,\beta_2\}\subset\{2,4,5\}}|D_{A,\{\beta_1,\beta_2\}}|^{Q_{++\beta_1\beta_2}}\Bigg)\\
&\cdot\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\prod_{\substack{\{\alpha_1,\alpha_2\}\subset\{2,4,5\}\\\{\beta_1,\beta_2\}\subset\{2,4,5\}}}\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\\
&\cdot\frac{\big|D_{\{4,5\}}-D_{A,\{4,5\}}^TD_{AA}^{-1}D_{A,\{4,5\}}\big|^{\delta+\frac32}}{|E_{\{4,5\}}|^{\delta+\frac32}}\,\bigg|_{E_{BB}=D_{BB}-\sum_{j=1}^{2}t_jD_{\{j\},B}^TD_{\{j\},B}}\bigg|_{t_1=t_2=0}.
\end{aligned}$$

Note that when $2\in\{\alpha_1,\alpha_2\}$ or $2\in\{\beta_1,\beta_2\}$ and $Q_{\alpha_1\alpha_2\beta_1\beta_2}>0$, then
$$\left|\frac{\partial}{\partial E_{BB}}\right|_{\{\alpha_1\alpha_2\},\{\beta_1\beta_2\}}^{Q_{\alpha_1\alpha_2\beta_1\beta_2}}\,|E_{\{4,5\}}|^{-(\delta+\frac32)} = 0.$$

As a consequence, the normalizing constant $I_{G_5}(\delta,D)$ is given by
$$\begin{aligned}
&\pi^{-1/2}\,\frac{\Gamma(\delta+\frac52)}{\Gamma(\delta+3)}\,I_{G_5^e}(\delta,D)\,|D_{AA}|^{\delta+3}\sum_{q=0}^{\infty}\frac{(\delta+\frac52)_q}{q!\,(\delta+3)_{2q}}\,|D_{A,\{4,5\}}|^{2q}\\
&\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,(\delta+2q+3)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\big(\delta+q+\tfrac52\big)_{l_i}\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\left|\frac{\partial}{\partial E_{\{4,5\}}}\right|^{q}\\
&\cdot\frac{\big|D_{\{4,5\}}-D_{A,\{4,5\}}^TD_{AA}^{-1}D_{A,\{4,5\}}\big|^{\delta+\frac32}}{|E_{\{4,5\}}|^{\delta+\frac32}}\,\bigg|_{E_{\{4,5\}}=D_{\{4,5\}}-\sum_{j=1}^{2}t_jD_{\{j\},\{4,5\}}^TD_{\{j\},\{4,5\}}}\bigg|_{t_1=t_2=0},
\end{aligned}$$

the evaluation being done first at $E_{\{4,5\}}=D_{\{4,5\}}-\sum_{j=1}^{2}t_jD_{\{j\},\{4,5\}}^TD_{\{j\},\{4,5\}}$ and last at $t_1=t_2=0$. Since $G_5^e$ is chordal, the corresponding normalizing constant is obtained from (1.4):
$$I_{G_5^e}(\delta,D) = \frac{\Gamma_3(\delta+2)\,\Gamma_4\big(\delta+\tfrac52\big)}{\Gamma_2\big(\delta+\tfrac32\big)}\cdot\frac{|D_{\{1,2,3\}}|^{-(\delta+2)}\,|D_{\{1,3,4,5\}}|^{-(\delta+\frac52)}}{|D_{\{1,3\}}|^{-(\delta+\frac32)}}.$$
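For chordal graphs such as $G_5^e$, formulas like the one above are products over cliques and separators, which are convenient to evaluate in log space. A minimal sketch (the helper is our own; it assumes the clique/separator product form of (1.4) and uses scipy's log multivariate gamma function):

```python
import numpy as np
from scipy.special import multigammaln

def log_I_chordal(delta, D, cliques, separators):
    """log of prod over cliques of Gamma_|S|(a_S) |D_S|^{-a_S}, divided by the
    same product over separators, where a_S = delta + (|S|+1)/2 and D_S is the
    principal submatrix of D on S."""
    def term(S):
        c = len(S)
        a = delta + 0.5*(c + 1)
        sub = D[np.ix_(list(S), list(S))]
        return multigammaln(a, c) - a*np.log(np.linalg.det(sub))
    return sum(term(C) for C in cliques) - sum(term(S) for S in separators)

# G5^e: cliques {1,2,3} and {1,3,4,5}, separator {1,3} (0-based indices below)
cliques, separators = [(0, 1, 2), (0, 2, 3, 4)], [(0, 2)]
log_I_identity = log_I_chordal(1.0, np.eye(5), cliques, separators)
```

A quick consistency check: replacing $D$ by $2D$ rescales each determinant term by a known power of 2, so $\log I$ shifts by a predictable amount.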

Note also that
$$\left|\frac{\partial}{\partial E_{\{4,5\}}}\right|^{q}\,|E_{\{4,5\}}|^{-(\delta+\frac32)} = (-1)^{2q}\,\frac{\Gamma_2\big(\delta+\tfrac32+q\big)}{\Gamma_2\big(\delta+\tfrac32\big)}\,|E_{\{4,5\}}|^{-(\delta+\frac32+q)};$$
this can be obtained by writing $I_{\mathrm{complete}}(\delta,D)$ as an integral in Equation (1.2) and applying the differential operator $\partial/\partial E_{\{4,5\}}$ to both sides of the equation (Maass [23, p. 81]). Hence, by collecting all terms, noting that $A=\{1,3\}$, and simplifying the formula for $I_{G_5}(\delta,D)$ above, we obtain the normalizing constant for $G_5$ for general $D$:
$$\begin{aligned}
I_{G_5}(\delta,D) &= I_{G_5}(\delta,I_5)\,\frac{|D_{\{1,3\}}|^{2\delta+\frac92}}{|D_{\{1,2,3\}}|^{\delta+2}\,|D_{\{1,3,4,5\}}|^{\delta+\frac52}}\sum_{q=0}^{\infty}\frac{(\delta+\frac52)_q\,(\delta+\frac32)_q}{q!\,(\delta+3)_{2q}}\,|D_{\{1,3\},\{4,5\}}|^{2q}\\
&\quad\cdot\sum_{l=0}^{\infty}\frac{1}{l!\,(\delta+2q+3)_l}\sum_{l_1+l_2=l}\binom{l}{l_1}\prod_{i=1}^{2}\big(\delta+q+\tfrac52\big)_{l_i}\\
&\quad\cdot\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\frac{\big|D_{\{4,5\}}-D_{\{1,3\},\{4,5\}}^TD_{\{1,3\}}^{-1}D_{\{1,3\},\{4,5\}}\big|^{\delta+\frac32}}{\big|D_{\{4,5\}}-\sum_{j=1}^{2}t_jD_{\{j\},\{4,5\}}^TD_{\{j\},\{4,5\}}\big|^{\delta+q+\frac32}}\,\Bigg|_{t_1=t_2=0}.
\end{aligned}\tag{4.1}$$
If $D=I_5$ then $D_{\{j\},\{4,5\}}^TD_{\{j\},\{4,5\}}=0$, and hence for $l_1>0$ or $l_2>0$ we get

$$\Big(\frac{\partial}{\partial t_1}\Big)^{l_1}\Big(\frac{\partial}{\partial t_2}\Big)^{l_2}\frac{\big|D_{\{4,5\}}-D_{\{1,3\},\{4,5\}}^TD_{\{1,3\}}^{-1}D_{\{1,3\},\{4,5\}}\big|^{\delta+\frac32}}{\big|D_{\{4,5\}}-\sum_{j=1}^{2}t_jD_{\{j\},\{4,5\}}^TD_{\{j\},\{4,5\}}\big|^{\delta+q+\frac32}} = 0.$$

In addition, $|D_{\{1,3\},\{4,5\}}|=0$. Hence, in the infinite sums only the terms for $q=0$ and $l=0$ are non-zero, and the infinite sums reduce to 1. Since $|D_{\{1,3\}}| = |D_{\{1,2,3\}}| = |D_{\{1,3,4,5\}}| = 1$, we see that the formula for $I_{G_5}(\delta,D)$ indeed reduces to $I_{G_5}(\delta,I_5)$.
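Finally, the $\Gamma_2$ ratio appearing in the derivative identity above satisfies $\Gamma_2(a+q)/\Gamma_2(a) = (a)_q\,(a-\tfrac12)_q$, which is convenient when summing such series numerically. A quick check of this gamma identity (our own sketch, using scipy):

```python
import math
from scipy.special import multigammaln, poch

a, q = 2.5, 3   # e.g. a = delta + 3/2 with delta = 1
ratio = math.exp(multigammaln(a + q, 2) - multigammaln(a, 2))
pochhammer_form = poch(a, q) * poch(a - 0.5, q)   # (a)_q (a - 1/2)_q
```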

C. Uhler
Laboratory for Information and Decision Systems
and Institute for Data, Systems and Society
Massachusetts Institute of Technology
Cambridge, MA 02139, U.S.A.
E-mail: [email protected]

A. Lenkoski
Norwegian Computing Center
Oslo, Norway
E-mail: [email protected]

D. Richards
Department of Statistics
Penn State University
University Park, PA 16802, U.S.A.
E-mail: [email protected]