
Model-based clustering for random hypergraphs

Tin-Lok James Ng1, Thomas Brendan Murphy1

School of Mathematics and Statistics, University College Dublin

Abstract

A probabilistic model for random hypergraphs is introduced to represent unary, binary and higher-order interactions among objects in real-world problems. This model is an extension of the Latent Class Analysis model, which captures clustering structures among objects. An EM (expectation maximization) algorithm with MM (minorization maximization) steps is developed to perform parameter estimation, while a cross validated likelihood approach is employed to perform model selection. The developed model is applied to three real-world data sets where interesting results are obtained.

Keywords: Hypergraph, Latent Class Analysis, Minorization Maximization

1. Introduction

A large number of random graph models have been proposed [36, 20, 19, 27] to describe complex interactions among objects of interest. Pairwise relationships among objects can be naturally represented as a graph, in which the objects are represented by the vertices, and two vertices are joined by an edge if a certain relationship exists between them. While graphs are capable of representing pairwise interactions between objects, they are inadequate for representing the higher-order and unary interactions that are typically observed in many real-world problems. Examples of higher-order and unary relationships include co-authorship on academic papers, co-appearance in movie scenes, and songs performed in a concert.

For example, the study of coauthorship networks of scientists has attracted significant research interest in both the natural and social sciences [34, 35, 33, 32, 1]. Such networks are typically constructed by connecting two scientists if they have coauthored one or more papers together. However, as we will illustrate below, such a representation inevitably results in a loss of information, while a hypergraph representation naturally preserves all information. A hypergraph is a generalization of a graph in which hyperedges are arbitrary sets of vertices and can contain any number of vertices. As a result, hypergraphs are capable of representing relationships of arbitrary order.

1This work was supported by the Science Foundation Ireland funded Insight Research Centre (SFI/12/RC/2289).

arXiv:1808.05185v1 [stat.ME] 15 Aug 2018. Preprint submitted to Journal of LaTeX Templates, August 16, 2018.

We consider a simple example of a coauthorship network with 7 authors and 4 papers in order to illustrate the benefits of hypergraph modelling. A hypergraph representation of the network is given in Figure 1 where the vertices v1, v2, . . . , v7 represent the authors while the hyperedges e1, . . . , e4 represent the papers. For example, the paper e1 is written by four authors v1, v2, v3 and v4, and the paper e2 is written by two authors v2 and v3, while the paper e4 has a single author v4.

On the other hand, a graph representation of this coauthorship network, with an edge between any two authors who have coauthored at least one paper, results in the edge set {(v1, v2), (v1, v3), (v1, v4), (v2, v3), (v2, v4), (v3, v4), (v3, v5), (v3, v6), (v5, v6)}. It is evident that much information is lost with this representation. In particular, it removes information about the number of authors of each paper. For example, one can deduce from this edge set only that v3 has co-authored with both v1 and v2, not that the three appeared together on the same paper. Furthermore, the hyperedge e4, which contains the single author v4, is left out of the graph representation entirely.
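The loss of information can be checked directly with a short sketch. It projects the example hypergraph onto a graph and shows that a different hypergraph yields the same edge set. The memberships of e1, e2 and e4 are taken from the text; e3 = {v3, v5, v6} is an assumption inferred from the listed edges, and the function name `clique_projection` is illustrative.

```python
from itertools import combinations

# Hyperedges of the running example. e1, e2 and e4 are stated in the text;
# e3 = {v3, v5, v6} is an ASSUMPTION inferred from the listed edge set.
hyperedges = {
    "e1": {"v1", "v2", "v3", "v4"},
    "e2": {"v2", "v3"},
    "e3": {"v3", "v5", "v6"},
    "e4": {"v4"},
}


def clique_projection(hyperedges):
    """Join every pair of vertices that share at least one hyperedge."""
    edges = set()
    for members in hyperedges.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges


graph_edges = clique_projection(hyperedges)
print(sorted(graph_edges))

# Dropping e2 (a subset of e1) and the singleton e4 leaves the projected
# graph unchanged, so the hypergraph cannot be recovered from the graph.
reduced = {name: hyperedges[name] for name in ("e1", "e3")}
print(clique_projection(reduced) == graph_edges)
```

Two different sets of papers give the same pairwise graph, which is the sense in which the graph representation is lossy.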

A number of random hypergraph models have been studied in the probability and combinatorics literature, where theoretical properties such as phase transitions and chromatic numbers were investigated [23, 15, 5, 9, 38]. A novel parametrization of distributions on hypergraphs based on the geometry of points is proposed in [30] and used to infer Markov structure for multivariate distributions.

On the other hand, statistical modeling with random hypergraphs is less explored. [44] introduced the hypergraph beta model with three variants, a natural extension of the beta model for random graphs [21]. In their model, the probability that a hyperedge e appears in the hypergraph is parameterized by a vector $\beta \in \mathbb{R}^N$, which represents the “attractiveness” of each vertex. However, their model does not capture clustering among objects, which is a typical real-world phenomenon. In addition, the assumed upper bound on the size of hyperedges is violated by many real-world data sets.

One may equivalently represent a hypergraph using a bipartite network (also called a two-mode network or an affiliation network). Two-mode networks consist of two different kinds of vertices, and edges can only be observed between vertices of different types, not between vertices of the same type. A hypergraph can be represented as a two-mode network by treating the hyperedges as a second type of vertex. For example, an equivalent bipartite representation of the hypergraph shown in Figure 1 is provided in Figure 2, where the hyperedges {e1, . . . , e4} are replaced by the four green vertices.

Two-mode networks have been studied in various disciplines including computer science [37], social sciences [10, 40, 24, 12] and physics [29]. A number of approaches have been proposed to analyze and model two-mode network data [2, 40, 8, 26, 46, 43]. In particular, models originally developed for binary networks were extended for two-mode networks.

[Figure 1 image omitted: vertices v1, ..., v7 and hyperedges e1, ..., e4.]

Figure 1: A hypergraph representation of a coauthorship network.

[8] developed a blockmodeling approach for two-mode network data which aims to simultaneously partition the two types of vertices into blocks. [41] proposed exponential random graph models (ERGMs) for two-mode networks, which model the logit of the probability of an actor belonging to an event as a function of actor-specific and event-specific effects and other graph statistics. A clustering algorithm for two-mode networks is developed in [11] based on the modelling framework in [41]. Several extensions to the ERGMs for bipartite networks have been proposed in recent years [46, 45]. [43] proposed a methodology for studying the co-evolution of two-mode and one-mode networks. A network autocorrelation model for two-mode networks is introduced in [13].

Representing network observations using two-mode networks has the benefit of modelling vertices of both types jointly. However, in analyzing a two-mode network, one type of vertex may attract most of the interest. For example, in co-authorship networks, the main interest may lie in the collaborations rather than in the co-authored papers. In such scenarios, a hypergraph representation, which converts one type of vertex into hyperedges with no loss of information, is the most natural.

In this paper, we propose the Extended Latent Class Analysis (ELCA) model for random hypergraphs, which is a natural extension of the Latent Class Analysis (LCA) model [28, 16, 3] and includes the LCA model as a special case. The model is applied to two data sets: co-appearance of characters in Star Wars movie scenes and songs performed in Lady Gaga concerts in 2014.

2. Model and Motivation

2.1. Hypergraph

A hypergraph is represented by a pair $H = (V, E)$, where $V = \{V_1, V_2, \dots, V_N\}$ is the set of $N$ vertices and $E = \{e_1, e_2, \dots, e_M\}$ is the set of $M$ hyperedges. A hyperedge $e$ is a subset of $V$, and we allow repetitions in the hyperedge set $E$. Thus, the hypergraph $H$ can alternatively be represented with an $N \times M$ matrix $x = (x_{ij})$, where $x_{ij} = 1$ if vertex $V_i$ appears in hyperedge $e_j$ and $x_{ij} = 0$ otherwise.

[Figure 2 image omitted: vertex nodes V labelled 1 to 7 and hyperedge nodes E labelled 1 to 4.]

Figure 2: Bipartite graph representation of the hypergraph in Figure 1.
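As a minimal sketch of this matrix representation, the snippet below builds the $N \times M$ incidence matrix for the coauthorship example of Section 1. Vertices are 0-indexed, the helper name `incidence_matrix` is illustrative, and the third hyperedge {v3, v5, v6} is an assumption inferred from the edge set listed in the introduction.

```python
import numpy as np


def incidence_matrix(hyperedges, n_vertices):
    """Return the N x M binary matrix x with x[i, j] = 1 if vertex i is in hyperedge j."""
    x = np.zeros((n_vertices, len(hyperedges)), dtype=int)
    for j, members in enumerate(hyperedges):
        for i in members:
            x[i, j] = 1
    return x


# Vertices are 0-indexed here, so v1..v7 become 0..6.
# The third hyperedge {v3, v5, v6} is an ASSUMPTION (inferred, not stated).
example = [[0, 1, 2, 3], [1, 2], [2, 4, 5], [3]]
x = incidence_matrix(example, n_vertices=7)

sizes = x.sum(axis=0)    # hyperedge sizes (column sums)
degrees = x.sum(axis=1)  # number of hyperedges containing each vertex (row sums)
print(x)
print(sizes, degrees)
```

Repeated hyperedges simply appear as identical columns, and a vertex that belongs to no hyperedge (v7 here) gives an all-zero row.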

2.2. Latent Class Analysis Model for Random Hypergraphs

The binary latent class analysis (LCA) model [28, 16] is a commonly used mixture model for high dimensional binary data. It assumes that each observation is a member of one and only one of the G latent classes and that, conditional on the latent class membership, the manifest variables are mutually independent. The LCA model appears to be a natural candidate for modelling random hypergraphs, where hyperedges are partitioned into G latent classes and the probability that a hyperedge e ∈ E contains a vertex v ∈ V depends only on its latent class assignment.

Let $\pi = (\pi_1, \dots, \pi_G)$ be the a priori latent class assignment probabilities, where $G$ is the number of latent classes, and define the $N \times G$ matrix $p = (p_{ig})$, where $p_{ig}$ is the probability that vertex $V_i$ is contained in a hyperedge $e$ with latent class label $g$. The likelihood function can be written as

$$L(x; p, \pi) = \prod_{j=1}^{M} \left[ \sum_{g=1}^{G} \pi_g \prod_{i=1}^{N} p_{ig}^{x_{ij}} (1 - p_{ig})^{1 - x_{ij}} \right].$$

By introducing the $M \times G$ latent class membership matrix $z^{(1)} = (z^{(1)}_{jg})$, where $z^{(1)}_{jg} = 1$ if hyperedge $e_j$ has latent class label $g$ and $z^{(1)}_{jg} = 0$ otherwise, the complete data likelihood of $x$ and $z^{(1)}$ can be expressed as (1).

$$L(x, z^{(1)}; p, \pi) = \prod_{j=1}^{M} \prod_{g=1}^{G} \left[ \pi_g \prod_{i=1}^{N} p_{ig}^{x_{ij}} (1 - p_{ig})^{1 - x_{ij}} \right]^{z^{(1)}_{jg}}. \tag{1}$$
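The observed-data likelihood above is usually evaluated on the log scale to avoid underflow when N is large. The following is a minimal sketch under that assumption; the function name `lca_loglik`, the clipping constant `eps` and the toy parameter values are illustrative and not part of the paper.

```python
import numpy as np


def lca_loglik(x, p, pi, eps=1e-12):
    """Observed-data log-likelihood of the LCA model.

    x  : (N, M) binary incidence matrix (hyperedges are columns)
    p  : (N, G) inclusion probabilities p_ig
    pi : (G,)   latent class probabilities
    """
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    # log_cond[j, g] = sum_i x_ij log p_ig + (1 - x_ij) log(1 - p_ig)
    log_cond = x.T @ np.log(p) + (1 - x).T @ np.log1p(-p)
    log_joint = np.log(pi) + log_cond  # (M, G)
    # log-sum-exp over classes, then sum over hyperedges
    return float(np.logaddexp.reduce(log_joint, axis=1).sum())


# Toy example with N = 2 vertices, M = 2 hyperedges, G = 2 classes.
x = np.array([[1, 0],
              [0, 0]])
p = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
print(lca_loglik(x, p, pi))
```

For this toy input the two hyperedges have likelihoods 0.41 and 0.09, so the value returned is log(0.41 × 0.09).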

In comparison to the hypergraph beta models introduced in [44], the LCA model is capable of capturing the clustering and heterogeneity of hyperedges. For example, academic papers can be naturally labelled according to subject area and, conditional on a paper being labelled mathematics, one would expect the probability that a mathematician co-authored the paper to be higher than the probability that a biologist did. The LCA model does not assume an upper bound on the size of hyperedges and can model hyperedges of any size. Furthermore, an efficient expectation maximization algorithm [6] can be easily derived to perform parameter estimation.

2.3. Extended Latent Class Analysis for Random Hypergraphs

While the LCA model captures the clustering and heterogeneity of hyperedges in real world data sets, it is quite restrictive in modeling the size of a hyperedge. The size of a hyperedge $e$ with latent class label $g$ follows the Poisson Binomial distribution [47] with parameters $(p_{1g}, \dots, p_{Ng})$, with expected value $\sum_{i=1}^{N} p_{ig}$ and variance $\sum_{i=1}^{N} p_{ig}(1 - p_{ig})$. As we will illustrate in a few real world data sets, the LCA model underestimates the variation in sizes of hyperedges. Thus, we extend the LCA model by including an additional clustering structure to address this shortcoming.

We develop the Extended Latent Class Analysis (ELCA) model by introducing an additional clustering of the hyperedges. We assume that the two clusterings are independent. We let $\tau = (\tau_1, \dots, \tau_K)$ be the a priori additional clustering assignment probabilities, where $K$ is the number of additional clusters. Thus, the probability that a hyperedge has cluster label $g$ and additional cluster label $k$ is given by $\pi_g \tau_k$. We define the $N \times G$ matrix $\phi = (\phi_{ig})$ and the $K$-dimensional vector $a = (a_1, \dots, a_K)$ so that the probability that vertex $V_i$ is contained in a hyperedge with cluster label $g$ and additional cluster label $k$ is given by $a_k \phi_{ig}$.

Let $\theta = (\pi, \tau, \phi, a)$ denote the model parameters. The likelihood function can be written as

$$L(x; \theta) = \prod_{j=1}^{M} \left[ \sum_{g=1}^{G} \sum_{k=1}^{K} \pi_g \tau_k \prod_{i=1}^{N} (a_k \phi_{ig})^{x_{ij}} (1 - a_k \phi_{ig})^{1 - x_{ij}} \right].$$

We define the $M \times K$ additional cluster membership matrix $z^{(2)} = (z^{(2)}_{jk})$, where $z^{(2)}_{jk} = 1$ if hyperedge $e_j$ has additional cluster label $k$ and $z^{(2)}_{jk} = 0$ otherwise. The complete data likelihood function of $x$, $z^{(1)}$ and $z^{(2)}$ is given as:

$$L(x, z^{(1)}, z^{(2)}; \theta) = \prod_{j=1}^{M} \prod_{g=1}^{G} \prod_{k=1}^{K} \left[ \pi_g \tau_k \prod_{i=1}^{N} (a_k \phi_{ig})^{x_{ij}} (1 - a_k \phi_{ig})^{1 - x_{ij}} \right]^{z^{(1)}_{jg} z^{(2)}_{jk}}. \tag{2}$$

We further impose the constraint $a_K = 1$ to ensure that the model is identifiable. It is easy to see that the LCA model is a special case of the ELCA model, obtained by setting the number of additional clusters to $K = 1$.
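The generative reading of the ELCA model can be made concrete with a short sketch: draw the two labels independently, then include each vertex with probability $a_k \phi_{ig}$. The function names `simulate_elca` and `elca_loglik`, the clipping constant and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_elca(M, pi, tau, phi, a, rng):
    """Draw M hyperedges; returns x of shape (N, M) and the labels g, k."""
    g = rng.choice(len(pi), size=M, p=pi)    # cluster labels
    k = rng.choice(len(tau), size=M, p=tau)  # additional cluster labels
    prob = phi[:, g] * a[k]                  # (N, M): a_k * phi_ig per hyperedge
    x = (rng.random(prob.shape) < prob).astype(int)
    return x, g, k


def elca_loglik(x, pi, tau, phi, a, eps=1e-12):
    """Observed-data log-likelihood, marginalising over the pairs (g, k)."""
    prob = np.clip(phi[:, :, None] * a[None, None, :], eps, 1 - eps)  # (N, G, K)
    # log_cond[j, g, k] = sum_i x_ij log(a_k phi_ig) + (1 - x_ij) log(1 - a_k phi_ig)
    log_cond = (np.einsum("ij,igk->jgk", x, np.log(prob))
                + np.einsum("ij,igk->jgk", 1 - x, np.log1p(-prob)))
    log_joint = np.log(pi)[None, :, None] + np.log(tau)[None, None, :] + log_cond
    flat = log_joint.reshape(x.shape[1], -1)  # (M, G*K)
    return float(np.logaddexp.reduce(flat, axis=1).sum())


rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])
tau = np.array([0.8, 0.2])
a = np.array([0.3, 1.0])  # a_K = 1 for identifiability
phi = rng.uniform(0.1, 0.9, size=(10, 2))
x, g, k = simulate_elca(200, pi, tau, phi, a, rng)
print(x.shape, elca_loglik(x, pi, tau, phi, a))
```

Setting `tau = [1.0]` and `a = [1.0]` recovers the LCA likelihood, which mirrors the special case noted above.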

2.4. Theoretical Properties

We compare the theoretical properties of the LCA and ELCA models developed above. Proposition 2.1 below shows that the size of a hyperedge simulated from the ELCA model has a variance at least as large as that of a hyperedge simulated from the LCA model with matching inclusion probabilities.

Proposition 2.1. Suppose we are given the LCA model with parameters $\{\pi, p\}$ and the ELCA model with parameters $\{\pi, \tau, a, \phi\}$ and $N$ vertices. Suppose the condition $p_{ig} = \phi_{ig} \sum_{k=1}^{K} a_k \tau_k$ holds for $i = 1, \dots, N$ and $g = 1, \dots, G$.

Let $A$ denote the size of a random hyperedge $X_A$ generated under the LCA model. Similarly, let $B$ denote the size of a random hyperedge $X_B$ generated under the ELCA model. We have the following results.

$$E(A) = E(B), \qquad \mathrm{Var}(A) \le \mathrm{Var}(B).$$

Proof. The proof is straightforward and is given in the Appendix.
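Proposition 2.1 can be checked numerically using the Poisson Binomial mean and variance from Section 2.3 together with the law of total variance. The sketch below is only an illustration under arbitrary, randomly drawn parameters; the function name `size_moments` is hypothetical.

```python
import numpy as np


def size_moments(pi, tau, phi, a):
    """Exact mean and variance of the hyperedge size under the ELCA model.

    The LCA model is the special case tau = [1], a = [1] with phi = p.
    """
    prob = phi[:, :, None] * a[None, None, :]   # (N, G, K): a_k * phi_ig
    w = pi[:, None] * tau[None, :]              # (G, K) mixture weights
    cond_mean = prob.sum(axis=0)                # Poisson Binomial mean per (g, k)
    cond_var = (prob * (1 - prob)).sum(axis=0)  # Poisson Binomial variance per (g, k)
    mean = (w * cond_mean).sum()
    # law of total variance: E[Var(size | g, k)] + Var(E[size | g, k])
    var = (w * cond_var).sum() + (w * (cond_mean - mean) ** 2).sum()
    return float(mean), float(var)


rng = np.random.default_rng(0)
N, G = 30, 3
pi = np.array([0.5, 0.3, 0.2])
tau = np.array([0.7, 0.3])
a = np.array([0.4, 1.0])
phi = rng.uniform(0, 1, size=(N, G))

p = phi * (a * tau).sum()  # matching condition p_ig = phi_ig * sum_k a_k tau_k
mean_A, var_A = size_moments(pi, np.array([1.0]), p, np.array([1.0]))
mean_B, var_B = size_moments(pi, tau, phi, a)
print(mean_A, mean_B)
print(var_A, var_B)
```

The two means agree and the ELCA variance is at least the LCA variance for these values, as the proposition states.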

We now let $f_N(y)$ be the probability mass function of the size of a random hyperedge simulated from a $G$-cluster LCA model. Similarly, we let $h_N(y)$ be the probability mass function of the size of a random hyperedge simulated from the ELCA model with $G$ clusters and $K$ additional clusters. The following result can be derived.

Proposition 2.2.

1. Under the specification of an LCA model with parameters $\pi = (\pi_1, \dots, \pi_G)$ and $\{p_{ig}\}_{i=1,\dots,N,\; g=1,\dots,G}$, suppose the following conditions hold for $g = 1, \dots, G$:

$$\lambda_N^{(g)} = \sum_{i=1}^{N} p_{ig} \to \lambda^{(g)} > 0, \qquad \sum_{i=1}^{N} p_{ig}^2 \to 0$$

as $N \to \infty$. We have

$$f_N(y) \to \sum_{g=1}^{G} \pi_g \frac{e^{-\lambda^{(g)}} (\lambda^{(g)})^{y}}{y!}.$$

That is, the distribution of the size of a random hyperedge converges to a mixture of Poisson distributions with $G$ components.

2. Under the specification of an ELCA model with parameters $\pi = (\pi_1, \dots, \pi_G)$, $\tau = (\tau_1, \dots, \tau_K)$, $a = (a_1, \dots, a_K)$ and $\{\phi_{ig}\}_{i=1,\dots,N,\; g=1,\dots,G}$, suppose further that the following conditions hold for $g = 1, \dots, G$ and $k = 1, \dots, K$:

$$\lambda_N^{(g,k)} = \sum_{i=1}^{N} \phi_{ig} a_k \to \lambda^{(g,k)} > 0, \qquad \sum_{i=1}^{N} \phi_{ig}^2 a_k^2 \to 0$$

as $N \to \infty$. We have

$$h_N(y) \to \sum_{g=1}^{G} \sum_{k=1}^{K} \pi_g \tau_k \frac{e^{-\lambda^{(g,k)}} (\lambda^{(g,k)})^{y}}{y!}.$$

That is, the distribution of the size of a random hyperedge converges to a mixture of Poisson distributions with $G \times K$ components.

Proof. Conditional on the event that a random hyperedge is generated from cluster $g$, [47, Theorem 3] implies that

$$f_N(y) \to \frac{e^{-\lambda^{(g)}} (\lambda^{(g)})^{y}}{y!}.$$

Part 1 follows by marginalizing over the $G$ clusters. The second part of the proposition can be proved similarly.

Proposition 2.2 implies that the size distribution of a random hyperedge generated under the ELCA model is far more flexible than for the LCA model.
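The Poisson limit within a single cluster can be illustrated numerically. The sketch below computes the exact Poisson Binomial distribution by repeated convolution and measures its total variation distance to a Poisson distribution with the same mean. The equal-probability setting $p_i = \lambda / N$ and the function names are assumptions chosen only for illustration.

```python
import math

import numpy as np


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf


def tv_to_poisson(probs):
    """Total variation distance between Poisson Binomial(probs) and Poisson(sum(probs))."""
    pmf = poisson_binomial_pmf(probs)
    lam = float(np.sum(probs))
    pois = np.array([math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
                     for y in range(len(pmf))])
    # Poisson mass above N has no counterpart in the Poisson Binomial pmf.
    tail = max(0.0, 1.0 - float(pois.sum()))
    return 0.5 * (float(np.abs(pmf - pois).sum()) + tail)


lam = 3.0
for n in (10, 100, 1000):
    # sum_i p_i = lam for every n, while sum_i p_i^2 = lam^2 / n -> 0
    print(n, tv_to_poisson(np.full(n, lam / n)))
```

The distance shrinks as N grows, which matches the two conditions of the proposition: the sum of the probabilities is held fixed while the sum of their squares vanishes.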

2.5. Co-Clustering

The concept of having two clustering structures is related to co-clustering or block clustering. In co-clustering, the objective is to simultaneously cluster the rows and columns of a data matrix. In particular, mixture models have been proposed with EM algorithms developed in the context of co-clustering [17, 18]. Co-clustering has also received significant attention in various applications such as text mining, bioinformatics and recommender systems [7, 4, 14]. In comparison, we aim to obtain two types of clustering structure for the rows of a data matrix.

In the work of [39], a Poisson mixture model was proposed for clustering digital gene expression data to discover groups of co-expressed genes, where observations of biological entities under different conditions are collected. In order to model the variations in overall expression level among biological entities, a scaling parameter is introduced for each entity. In comparison, we explicitly model the size of a random hyperedge using clustering, which results in a more parsimonious model structure.

3. EM Algorithm

We estimate the parameters θ = (π, τ, φ, a) of the ELCA model using an EM algorithm [6] which is a popular method in fitting mixture models. The E-step of the EM algorithm involves computing the expected value of the logarithm of the complete data likelihood (2) with respect to the distribution of the unobserved z(1) and z(2) given the current estimates. The M-step involves maximizing the expected complete data log-likelihood.

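Taking the logarithm of the complete data likelihood in (2), we obtain the complete data log-likelihood function below.

$$\log L(x, z; \theta) = \sum_{j=1}^{M} \sum_{g=1}^{G} \sum_{k=1}^{K} Z^{(1)}_{jg} Z^{(2)}_{jk} \left[ \log \pi_g + \log \tau_k + \sum_{i=1}^{N} \Big\{ x_{ij} \big( \log(a_k) + \log(\phi_{ig}) \big) + (1 - x_{ij}) \log(1 - a_k \phi_{ig}) \Big\} \right]. \tag{3}$$

For the E-step, we need to evaluate the expectation of (3) conditional on the data $x$ and the current parameter estimates $\theta^{(t)}$:

$$Q(\theta \mid \theta^{(t)}) := E\big(\log L(x, z; \theta) \mid x, \theta^{(t)}\big).$$

That is, we need to evaluate the expectation $\widehat{Z^{(1)}_{jg} Z^{(2)}_{jk}} := E(Z^{(1)}_{jg} Z^{(2)}_{jk} \mid x, \theta^{(t)})$. We have that

$$E(Z^{(1)}_{jg} Z^{(2)}_{jk} \mid x, \theta^{(t)}) = \Pr(Z^{(1)}_{jg} = Z^{(2)}_{jk} = 1 \mid x, \theta^{(t)}) = \frac{\pi_g^{(t)} \tau_k^{(t)} \left[ \prod_{i=1}^{N} (a_k \phi_{ig})^{x_{ij}} (1 - a_k \phi_{ig})^{1 - x_{ij}} \right]}{\sum_{g'=1}^{G} \sum_{k'=1}^{K} \pi_{g'}^{(t)} \tau_{k'}^{(t)} \left[ \prod_{i=1}^{N} (a_{k'} \phi_{ig'})^{x_{ij}} (1 - a_{k'} \phi_{ig'})^{1 - x_{ij}} \right]}, \tag{4}$$

where $a$ and $\phi$ are evaluated at their current estimates and the primed indices in the denominator run over all pairs of cluster labels.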
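The E-step in (4) amounts to normalising the joint weights over the $G \times K$ label pairs for each hyperedge. A minimal sketch follows, computed on the log scale for numerical stability; the function name `e_step`, the array layout and the clipping constant `eps` are assumptions made for illustration.

```python
import numpy as np


def e_step(x, pi, tau, phi, a, eps=1e-12):
    """Return r with r[j, g, k] = Pr(Z_jg^(1) = Z_jk^(2) = 1 | x, theta).

    x : (N, M) incidence matrix; pi : (G,); tau : (K,); phi : (N, G); a : (K,)
    """
    prob = np.clip(phi[:, :, None] * a[None, None, :], eps, 1 - eps)  # (N, G, K)
    # log of the numerator of (4) for every hyperedge j and pair (g, k)
    log_num = (np.log(pi)[None, :, None] + np.log(tau)[None, None, :]
               + np.einsum("ij,igk->jgk", x, np.log(prob))
               + np.einsum("ij,igk->jgk", 1 - x, np.log1p(-prob)))
    # subtract the per-hyperedge maximum before exponentiating (stability only)
    log_num = log_num - log_num.max(axis=(1, 2), keepdims=True)
    r = np.exp(log_num)
    return r / r.sum(axis=(1, 2), keepdims=True)


# Tiny check: one vertex, two clusters, one additional cluster.
x = np.array([[1]])
r = e_step(x, pi=np.array([0.5, 0.5]), tau=np.array([1.0]),
           phi=np.array([[0.9, 0.1]]), a=np.array([1.0]))
print(r[0, :, 0])
```

Summing `r` over its last axis gives the posterior cluster probabilities of each hyperedge, and summing over the middle axis gives those of the additional clustering.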

In particular, the E-step has a computational complexity of O(N) for each pair (g, k). While the E-step of the EM algorithm is straightforward, the M-step involves a complicated maximization. Thus, we use the ECM algorithm [31], which replaces the complex M-step by a series of simpler conditional maximizations. The conditional maximizations with respect to the parameters φ and a do not have closed form solutions, so we resort to the MM algorithm [25, 22], which works by lower bounding the objective function with a minorizing function and then maximizing the minorizing function. Details of the M-step are given in the appendix and the EM algorithm is summarized in Algorithm 1. In particular, we note that the computational complexities of maximizing over $\phi_{ig}$ and $a_k$ are $O(N_{iter} M K)$ and $O(N_{iter} M G N)$, respectively, where $N_{iter}$ is the number of iterations required for the MM algorithm.

Algorithm 1 EM Algorithm

Input: $x$, $G$, $K$, tol
Output: $\hat\phi$, $\hat a$, $\hat\pi$, $\hat\tau$, $\hat z^{(1)}$, $\hat z^{(2)}$

conv = False
Random initialization of $\phi$, $a$, $\pi$, $\tau$
while conv = False do
    Do the E-step according to (4)
    for $i = 1, \dots, N$ do
        for $g = 1, \dots, G$ do
            Update $\phi_{ig}$ according to (B.3)
        end for
    end for
    for $k = 1, \dots, K - 1$ do
        Update $a_k$ according to (B.6)
    end for
    for $g = 1, \dots, G$ do
        Update $\pi_g$ according to (B.7)
    end for
    for $k = 1, \dots, K$ do
        Update $\tau_k$ according to (B.8)
    end for
    Evaluate the change in log-likelihood $\Delta_{loglik}$ resulting from the parameter updates
    if $\Delta_{loglik}$ < tol then
        conv = True
    end if
end while
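The overall loop of Algorithm 1 can be sketched for the special case K = 1, where the ELCA model reduces to LCA and the M-step has the standard closed-form LCA updates. This sketch therefore does not implement the MM updates (B.3) to (B.8), which are given in the appendix; the function name `lca_em`, its arguments and the simulated data are illustrative assumptions.

```python
import numpy as np


def lca_em(x, G, n_iter=200, tol=1e-8, seed=0, eps=1e-10):
    """EM for the LCA special case (K = 1) with closed-form M-step updates.

    x : (N, M) binary incidence matrix. Returns pi, p, responsibilities r and
    the log-likelihood evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    N, M = x.shape
    pi = np.full(G, 1.0 / G)
    p = rng.uniform(0.25, 0.75, size=(N, G))  # random initialisation
    history = []
    for _ in range(n_iter):
        # E-step: r[j, g] proportional to pi_g prod_i p_ig^x_ij (1 - p_ig)^(1 - x_ij)
        log_joint = np.log(pi) + x.T @ np.log(p) + (1 - x).T @ np.log1p(-p)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        history.append(float(log_norm.sum()))
        r = np.exp(log_joint - log_norm[:, None])
        # M-step: weighted proportions (clipped away from 0 and 1)
        pi = np.clip(r.mean(axis=0), eps, None)
        pi = pi / pi.sum()
        p = np.clip((x @ r) / (r.sum(axis=0) + eps), eps, 1 - eps)
        # stop when the log-likelihood change falls below tol
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return pi, p, r, history


# Simulated data with two well separated classes (illustrative only).
rng = np.random.default_rng(1)
true_p = np.array([[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5)
labels = rng.integers(0, 2, size=200)
x = (rng.random((10, 200)) < true_p[:, labels]).astype(int)
pi_hat, p_hat, r_hat, history = lca_em(x, G=2)
print(pi_hat, history[-1])
```

As with any EM fit, the result depends on the random initialisation, so several restarts are advisable in practice.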

4. Model Selection

4.1. Cross Validated Likelihood

Given a fixed model, the cross validated likelihood method [42] works by repetitively partitioning the observations into two disjoint sets, one of which is used to fit the model and obtain estimates of model parameters by maximizing the log-likelihood, and the other is for evaluating the model by computing its log-likelihood.

For each $G$ and $K$, we define $\mathcal{M}_{G,K}$ to be the ELCA model with $G$ clusters and $K$ additional clusters. To apply the cross validated likelihood method, we randomly partition the hyperedges $x$ into two sets $x^{(train)}$ and $x^{(test)}$, where each hyperedge in $x$ is included in $x^{(train)}$ with probability $q$. In our applications we set $q = 0.7$. The EM algorithm developed in Section 3 is then used to fit $x^{(train)}$ and obtain the parameter estimates $\hat\theta = (\hat\pi, \hat\tau, \hat\phi, \hat a)$. We then compute the log-likelihood of $x^{(test)}$ under the estimated parameters $\hat\theta$ and obtain the test log-likelihood $L^{(test)}$. The above procedure is then repeated $N_{cv}$ times and the estimated cross validated log-likelihood is obtained by averaging over $L^{(test)}$. The procedure above is summarized in Algorithm 2.
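The random splitting and averaging can be sketched independently of the model being fitted. In the snippet below, `fit` and `loglik` are hypothetical callables standing in for the EM algorithm and the ELCA log-likelihood; to keep the example self-contained, a one-class Bernoulli model (the case G = K = 1) is used as a stand-in, and the data are simulated.

```python
import numpy as np


def cv_loglik(x, fit, loglik, n_cv=10, q=0.7, seed=0):
    """Average held-out log-likelihood over n_cv random splits of the hyperedges.

    x is the (N, M) incidence matrix; hyperedges are its columns.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_cv):
        in_train = rng.random(x.shape[1]) < q  # each hyperedge joins train w.p. q
        if in_train.all() or not in_train.any():
            continue                           # degenerate split: skip it
        theta = fit(x[:, in_train])
        scores.append(loglik(x[:, ~in_train], theta))
    return float(np.mean(scores))


# Stand-in model (G = K = 1): independent Bernoulli inclusion probabilities.
def fit_bernoulli(x_train, eps=1e-3):
    return np.clip(x_train.mean(axis=1), eps, 1 - eps)


def loglik_bernoulli(x_test, p):
    return float((x_test.T @ np.log(p) + (1 - x_test).T @ np.log1p(-p)).sum())


rng = np.random.default_rng(1)
x_demo = (rng.random((8, 60)) < 0.3).astype(int)
score = cv_loglik(x_demo, fit_bernoulli, loglik_bernoulli, n_cv=5)
print(score)
```

Because the test set size varies between splits, the held-out log-likelihoods are not on exactly the same scale across repetitions; the sketch follows the averaging described above without correcting for this.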

We perform a greedy search for the combination of $G$ and $K$ which produces the largest estimated cross validated log-likelihood $\hat L_{cv}^{G,K}$. Starting with one cluster and one additional cluster, an additional cluster is successively added to the model until the estimated cross validated log-likelihood does not increase. At this stage, we increment the number of clusters $G$ by 1 and the above procedure is repeated provided that $\hat L_{cv}^{G,1} > \hat L_{cv}^{G-1,1}$. The greedy search algorithm is summarized in Algorithm 3. The greedy search can be computationally intensive when the search space for $(G, K)$ and the number of cross validation repetitions $N_{cv}$ are large.
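One possible reading of the greedy search is sketched below. The callable `cv_score(G, K)` is a hypothetical stand-in for the estimated cross validated log-likelihood, and the sketch makes two assumptions that the description does not settle: the search over K restarts at 1 for each new G, and the best pair seen so far is returned. Scores are cached because each evaluation requires repeated model fits.

```python
def greedy_search(cv_score, max_g=10, max_k=10):
    """Greedy search over (G, K); cv_score(G, K) returns a score to maximise."""
    cache = {}

    def score(g, k):
        # each (G, K) is scored once; repeated look-ups reuse the cached value
        if (g, k) not in cache:
            cache[(g, k)] = cv_score(g, k)
        return cache[(g, k)]

    g = 1
    best = (1, 1)
    while True:
        # grow K while the score keeps increasing (ASSUMPTION: K restarts at 1)
        k = 1
        while k < max_k and score(g, k + 1) > score(g, k):
            k += 1
        if score(g, k) > score(*best):
            best = (g, k)
        # move to G + 1 only if the (G + 1, 1) model beats the (G, 1) model
        if g >= max_g or score(g + 1, 1) <= score(g, 1):
            return best
        g += 1


# Toy score surface with a single peak at (3, 2), for illustration only.
def toy_score(g, k):
    return -((g - 3) ** 2) - (k - 2) ** 2


print(greedy_search(toy_score))
```

The caps `max_g` and `max_k` are safeguards added for the sketch; they bound the search when the score keeps increasing.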

5. Applications

5.1. Star Wars Movie Scenes

Our first application is modeling co-appearance of the main characters in the scenes of the movie “Star Wars: A New Hope”. We collected the scripts of the movie from The Internet Movie Script Database 2 and constructed a hypergraph for the eight main characters. We define each scene in the movie as a hyperedge with a total of 178 hyperedges, and a character is contained in the scene if he/she speaks in the scene.

2Movie script data freely available at https://www.imsdb.com/

Algorithm 2 Estimated Cross Validated Log-likelihood

Input: $x$, $G$, $K$, $N_{cv}$, $q$
Output: $\hat L_{cv}$

for $n = 1, \dots, N_{cv}$ do
    $x^{(train)} = \emptyset$, $x^{(test)} = \emptyset$
    for $j = 1, \dots, M$ do
        $u \sim \mathrm{Unif}(0, 1)$
        if $u < q$ then
            $x^{(train)} = x^{(train)} \cup x_j$
        else
            $x^{(test)} = x^{(test)} \cup x_j$
        end if
    end for
    $\hat\theta = \arg\max_{\theta} \{ L(x^{(train)}; \theta) \}$
    $\hat L_n = L(x^{(test)}; \hat\theta)$
end for
$\hat L_{cv} = \frac{1}{N_{cv}} \sum_{n=1}^{N_{cv}} \hat L_n$

We first performed model selection using the greedy search algorithm and the cross validated likelihood method presented in Section 4.1 to select the optimal number of clusters and additional clusters for the ELCA model. The results of the greedy search are provided in Table A1 and the model with 3 clusters and 2 additional clusters is selected.

The results from fitting the ELCA model with G = 3 and K = 2 are provided in Table 1 and Table 2. We can see the variation in the size of hyperedges from the parameter estimates $\hat a$ and $\hat\tau$, with the majority (81%) of hyperedges having size much smaller than the rest of the hyperedges. Thus, one can deduce that a small proportion of the movie scenes have far more characters.

Table 1: Estimates of π, τ and a from fitting the ELCA model with 3 clusters and 2 additional clusters for the Star Wars data set

Parameter   Estimate
π̂           (0.40, 0.40, 0.20)
τ̂           (0.81, 0.19)
â           (0.41, 1)

The estimates $\hat\phi$ in Table 2 reveal interesting clustering structure for the 8 main characters in the movie. For example, the lead character “Luke” has a strong tendency to appear in the two largest clusters. On the other hand, it is extremely unlikely for “Obi-Wan” and “Han” to appear in the same scene.

Algorithm 3 Greedy Search For Model Selection

Input: $x$
Output: $G_{opt}$, $K_{opt}$

$G_{opt} = 1$
$K_{opt} = 1$
stopG = False
Obtain $\hat L_{cv}^{G_{opt}, K_{opt}}$ using Algorithm 2
while stopG = False do
    stopK = False
    while stopK = False do
        $K_{test} = K_{opt} + 1$
        Obtain $\hat L_{cv}^{G_{opt}, K_{test}}$ using Algorithm 2
        if $\hat L_{cv}^{G_{opt}, K_{test}} > \hat L_{cv}^{G_{opt}, K_{opt}}$ then
            $K_{opt} = K_{test}$
        else
            stopK = True
        end if
    end while
    $G_{test} = G_{opt} + 1$
    Obtain $\hat L_{cv}^{G_{test}, 1}$ using Algorithm 2
    if $\hat L_{cv}^{G_{test}, 1} > \hat L_{cv}^{G_{opt}, 1}$ then
        $G_{opt} = G_{test}$
    else
        stopG = True
    end if
end while

Table 2: Estimates of {φig} from fitting the ELCA model with 3 clusters and 2 additional clusters for the Star Wars data set

Character      Cluster 1  Cluster 2  Cluster 3
Wedge          0.18       0.00       0.36
Han            0.00       1.00       0.00
Luke           1.00       1.00       0.00
C-3PO          0.75       0.30       0.00
Obi-Wan        0.00       0.00       1.00
Leia           0.12       0.48       0.07
Biggs          0.31       0.00       0.28
Darth Vader    0.19       0.35       0.06
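The estimates in Table 1 and Table 2 can be combined into expected scene sizes, using the fact that under the ELCA model a hyperedge with labels (g, k) has expected size $a_k \sum_i \phi_{ig}$. The short calculation below uses only the rounded values printed in the two tables, so the results are approximate; the variable names are illustrative.

```python
# Rounded estimates copied from Table 1 and Table 2.
pi_hat = [0.40, 0.40, 0.20]
tau_hat = [0.81, 0.19]
a_hat = [0.41, 1.0]
phi_hat = {
    "Wedge":       [0.18, 0.00, 0.36],
    "Han":         [0.00, 1.00, 0.00],
    "Luke":        [1.00, 1.00, 0.00],
    "C-3PO":       [0.75, 0.30, 0.00],
    "Obi-Wan":     [0.00, 0.00, 1.00],
    "Leia":        [0.12, 0.48, 0.07],
    "Biggs":       [0.31, 0.00, 0.28],
    "Darth Vader": [0.19, 0.35, 0.06],
}

# Column sums of phi: expected scene size for each cluster when a_k = 1.
col_sums = [sum(row[g] for row in phi_hat.values()) for g in range(3)]

# expected_size[g][k] = a_k * sum_i phi_ig
expected_size = [[a_k * s for a_k in a_hat] for s in col_sums]

# Overall expected size, averaging over both clusterings.
overall = sum(pi_hat[g] * tau_hat[k] * expected_size[g][k]
              for g in range(3) for k in range(2))

for g in range(3):
    print(f"cluster {g + 1}:", [round(v, 2) for v in expected_size[g]])
print("overall expected size:", round(overall, 2))
```

Scenes in the first additional cluster are expected to contain 0.41 times as many speaking characters as scenes in the second, whatever their cluster label.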

The estimated cluster assignment probabilities from the EM algorithm for each movie scene in the Star Wars movie are shown in chronological order in Figure 3. We can see from the plot that scenes in the early part of the movie are mainly associated with cluster 1, while cluster 2 contains most of the scenes from roughly scene 40 to scene 100. We can deduce from this, for example, that the character “Han” is very active in the middle part of the movie. On the other hand, there does not appear to be any obvious pattern for the third cluster. The clustering for many early and late movie scenes is relatively uncertain, as shown in the plot.

The uncertainties in clustering are also illustrated in a ternary plot in Figure 4. Each dot in the plot represents a movie scene, and the three corners of the plot represent the three clusters. The closer a dot is to a corner, the higher the probability that the corresponding movie scene belongs to the corresponding cluster. The ternary plot in Figure 4 shows significant uncertainty in assigning a number of movie scenes to one of the first two clusters. This is reasonable since, for a number of characters including the lead character “Luke”, the probabilities of scene appearance are similar for the first two clusters.

5.2. Lady Gaga Concerts 2014

As a second application of the ELCA model, we collected the list of songs that Lady Gaga performed in all concerts in 2014 3. The data set contains 96 concerts with a total of 51 distinct songs performed. The hypergraph is constructed by defining each concert as a hyperedge and each song as a vertex. A vertex is contained in a hyperedge if the corresponding song is performed in the corresponding concert. The results of performing model selection using the approach of Section 4.1 are presented in Table A2. We can see from Table A2 that ELCA models with more than one additional cluster significantly out-perform standard latent class analysis models.

3 The Lady Gaga setlist data are available at: http://www.setlist.fm/

[Figure 3 image omitted: three panels, one per cluster; x-axis: Movie Scene (0 to 150); y-axis: Topic Cluster Probability (0.0 to 1.0).]

Figure 3: Probability of clusters for movie scenes in Star Wars data set

[Figure 4 image omitted: ternary plot with corners Cluster 1, Cluster 2 and Cluster 3.]

Figure 4: Ternary plot of the a posteriori group membership probabilities for the scenes in the Star Wars data set

The model with 5 clusters and 2 additional clusters was chosen and fitted to the data set. The parameter estimates $\hat\pi$, $\hat\tau$ and $\hat a$ are given in Table 3. We can deduce from $\hat a$ and $\hat\tau$ that a small proportion of the concerts (about 11%) are very short, with an expected length of approximately 14% of that of the remaining “full” concerts.

Table 3: Estimates of π, τ and a from fitting the ELCA model with 5 clusters and 2 additional clusters for Lady Gaga concerts 2014 data set

Parameter   Estimate
π̂           (0.23, 0.31, 0.23, 0.12, 0.05)
τ̂           (0.11, 0.89)
â           (0.14, 1)

Table 4 shows the parameter estimates $\hat\phi$, which give the popularity of the 51 songs across the 5 clusters. One can see a small number of extremely popular songs which tend to be performed in most concerts, such as “Paparazzi”, “”, “Born This Way”, “G.U.Y.” and “Just Dance”. Among the least performed songs, “Fashion!” and “Cake Like Lady Gaga” tend to be performed in the same concert, while “Lush Life”, “It Don’t Mean a Thing (If It Ain’t Got That Swing)” and “But Beautiful” are more likely to be performed in the same concert.

The estimated cluster assignment probabilities for each concert performed by Lady Gaga in 2014 are shown in chronological order in Figure 5. There is a strong association between clusters and time of the year. For example, the first 30 concerts performed in 2014 are mainly associated with cluster 1 where songs such as “Bad Romance”, “Judas” and “Aura” are among the most popular ones. On the other hand, the next 30 concerts are strongly associated with cluster 2 where songs such as “”, “Venus” and “Ratchet” are popular. The last 10 concerts of 2014 are mostly clustered into cluster 4 where songs such as “Yo and I”, “Monster” and “Black Jesus Amen Fashion” are frequently performed.

Figure 6 shows the distribution of hyperedge sizes (or the number of songs performed in concerts) along with the estimated hyperedge sizes by the ELCA model with G = 5 and K = 2, and the LCA model with 5 clusters. Adding an additional cluster to the model significantly improves the fit, especially on the tails of the distribution.
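The model-implied size distribution compared with the histogram in Figure 6 can be computed exactly: conditional on the labels (g, k) the size is Poisson Binomial, and the overall distribution is the mixture over all label pairs. The sketch below shows one way to do this; the function names and the small parameter values are hypothetical and are not the fitted estimates from Tables 3 and 4.

```python
import numpy as np


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf


def elca_size_pmf(pi, tau, phi, a):
    """pmf of the hyperedge size under ELCA; LCA is the case tau = [1], a = [1]."""
    n_vertices = phi.shape[0]
    pmf = np.zeros(n_vertices + 1)
    for g, pi_g in enumerate(pi):
        for k, tau_k in enumerate(tau):
            pmf += pi_g * tau_k * poisson_binomial_pmf(a[k] * phi[:, g])
    return pmf


# HYPOTHETICAL parameters: 20 vertices, 2 clusters, 2 additional clusters.
rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])
tau = np.array([0.2, 0.8])
a = np.array([0.2, 1.0])
phi = rng.uniform(0.3, 1.0, size=(20, 2))

elca_pmf = elca_size_pmf(pi, tau, phi, a)
# LCA with the same marginal inclusion probabilities (condition of Proposition 2.1)
lca_pmf = elca_size_pmf(pi, np.array([1.0]), phi * (a * tau).sum(), np.array([1.0]))

sizes = np.arange(21)
for name, pmf in (("ELCA", elca_pmf), ("LCA", lca_pmf)):
    mean = (sizes * pmf).sum()
    var = ((sizes - mean) ** 2 * pmf).sum()
    print(name, round(mean, 3), round(var, 3))
```

An empirical counterpart for an incidence matrix `x` is `np.bincount(x.sum(axis=0), minlength=N + 1) / M`, which can be plotted against either pmf.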

Table 4: Estimates of {φig} from fitting the ELCA model with 5 clusters and 2 additional clusters to the Lady Gaga concerts 2014 data set

Songs                                               Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
Monster for Life                                    0.04       0.00       0.00       0.00       0.00
Fashion!                                            0.00       0.00       0.68       0.00       0.00
Paparazzi                                           1.00       1.00       1.00       1.00       1.00
Bad Romance                                         1.00       1.00       1.00       1.00       1.00
What’s Up                                           0.12       0.00       0.00       0.00       0.44
Sophisticated Lady                                  0.00       0.00       0.00       0.00       0.31
[title missing]                                     0.00       0.00       0.00       0.00       0.44
Born This Way                                       1.00       1.00       1.00       1.00       1.00
Judas                                               1.00       0.87       0.00       0.00       0.00
Partynauseous                                       1.00       1.00       1.00       0.00       1.00
Yo and I                                            1.00       0.13       0.00       1.00       1.00
I Will Always Love You                              0.00       0.03       0.00       0.00       0.00
Monster                                             0.00       0.00       0.00       1.00       0.00
Bang Bang (My Baby Shot Me Down)                    0.72       0.00       0.00       0.00       1.00
The Queen                                           0.00       0.00       0.09       0.00       0.00
Dope                                                1.00       0.60       0.00       1.00       1.00
Jewels N’ Drugs                                     1.00       1.00       1.00       0.00       1.00
Hair                                                0.00       0.00       0.05       0.00       0.00
Mary Jane Holland                                   1.00       0.00       0.95       0.00       1.00
G.U.Y.                                              1.00       1.00       1.00       1.00       1.00
MANiCURE                                            1.00       1.00       1.00       0.00       1.00
Lush Life                                           0.00       0.00       0.00       0.10       0.51
It Don’t Mean a Thing (If It Ain’t Got That Swing)  0.00       0.00       0.00       0.00       0.49
You’ve Got a Friend                                 0.00       0.00       0.00       0.12       0.00
The Edge of Glory                                   1.00       1.00       0.04       0.00       1.00
Donatella                                           1.00       1.00       1.00       0.00       1.00
But Beautiful                                       0.00       0.00       0.00       0.00       0.31
[title missing]                                     1.00       1.00       1.00       0.00       1.00
Gypsy                                               1.00       1.00       1.00       0.00       1.00
Applause                                            1.00       1.00       1.00       1.00       1.00
[title missing]                                     0.04       0.00       0.05       0.00       0.44
Sexxx Dreams                                        1.00       0.97       1.00       1.00       1.00
Another One Bites the Dust                          0.00       0.00       0.00       0.12       0.00
Just Dance                                          1.00       1.00       1.00       1.00       1.00
Cake Like Lady Gaga                                 0.00       0.00       0.68       0.00       0.00
I Can’t Give You Anything but Love, Baby            0.00       0.03       0.00       0.00       0.49
Black Jesus Amen Fashion                            0.00       0.00       0.00       1.00       0.00
Ratchet                                             1.00       1.00       1.00       0.00       1.00
Aura                                                1.00       0.87       0.91       0.00       0.00
Poker Face                                          1.00       1.00       1.00       1.00       1.00
Venus                                               1.00       1.00       1.00       0.00       1.00
If I Ever Lose My Faith in You                      0.00       0.00       0.00       0.12       0.00
Bell Bottom Blues                                   0.04       0.00       0.00       0.00       0.00
Whole Lotta Love                                    0.04       0.00       0.00       0.00       0.00
Telephone                                           1.00       1.00       1.00       0.00       1.00
Willkommen                                          0.16       0.00       0.00       0.00       0.00
Brooklyn Nights                                     0.00       0.03       0.00       0.00       0.00
Alejandro                                           1.00       1.00       1.00       0.00       1.00
ARTPOP                                              1.00       1.00       1.00       1.00       1.00
I’ve Got a Crush on You                             0.00       0.00       0.05       0.00       0.00
Swine                                               1.00       0.97       1.00       0.00       1.00

[Figure 5 image omitted: five panels, one per cluster; x-axis: Concert (0 to 80); y-axis: Cluster Probability (0.0 to 1.0).]

Figure 5: Probability of clusters for concerts in Lady Gaga Concerts 2014 data set

[Figure 6 image omitted: “Histogram of sizes of hyper-edges”; legend: Blue: GLCA G=5 K=2, Red: LCA G=5; x-axis: Hyper-edge Size (0 to 30); y-axis: Probability (0.00 to 0.20).]

Figure 6: Size distribution of hyperedges of the Lady Gaga Concerts 2014 data set

6. Conclusion

In this paper, we have proposed the Extended Latent Class Analysis model as a generative model for random hypergraphs. The model introduces two clustering structures for hyperedges, which together capture variation in the sizes of hyperedges.

An EM algorithm has been developed for model fitting, where the M-step is implemented using an MM algorithm. Model selection is performed using the cross validated likelihood method to account for the small sample sizes relative to the number of vertices.

The model has been shown to give an improved fit relative to the Latent Class Analysis model for three illustrative examples. Furthermore, the fitted model reveals interesting and interpretable structure within the vertices and hyperedges.

References

[1] Azondekon, R., Harper, Z. J., Agossa, F. R., Welzig, C. M., and McRoy, S. (2018), “Scientific authorship and collaboration network analysis on malaria research in Benin: Papers indexed in the Web of Science (1996–2016),” Global Health Research and Policy, 3, 11.

[2] Borgatti, S. P. and Everett, M. G. (1997), “Network analysis of 2-mode data,” Soc. Networks, 19, 243–269.

[3] Celeux, G. and Govaert, G. (1991), “Clustering criteria for discrete data and latent class models,” Journal of Classification, 8, 157–176.

[4] Cheng, Y. and Church, G. M. (2000), “Biclustering of expression data,” in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, pp. 93–103.

[5] de Panafieu, É. (2015), “Phase transition of random non-uniform hypergraphs,” J. Discrete Algorithms, 31, 26–39.

[6] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Statist. Soc. Ser. B, 39, 1–38, with discussion.

[7] Dhillon, I. S., Mallela, S., and Modha, D. S. (2003), “Information-theoretic co-clustering,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98.

[8] Doreian, P. and Batagelj, V. (2004), “Generalized blockmodeling of two-mode network data,” Soc. Networks, 29–53.

[9] Dyer, M., Frieze, A., and Greenhill, C. (2015), “On the chromatic number of a random hypergraph,” J. Combin. Theory Ser. B, 113, 68–122.

[10] Faust, K., Willert, K., Rowlee, D., and Skvoretz, J. (2002), “Scaling and statistical models for affiliation networks: patterns of participation among Soviet politicians during the Brezhnev era,” Soc. Networks, 24, 231–259.

[11] Field, S., Frank, K. A., Schiller, K., Riegle-Crumb, C., and Muller, C. (2006), “Identifying positions from affiliation networks: Preserving the duality of people and events,” Soc. Networks, 28, 97–123.

[12] Friel, N., Rastelli, R., Wyse, J., and Raftery, A. E. (2016), “Interlocking directorates in Irish companies using a latent space model for bipartite networks,” Proc. Natl. Acad. Sci. U.S.A., 113, 6629–6634.

[13] Fujimoto, K., Chou, C.-P., and Valente, T. W. (2011), “The network autocorrelation model using two-mode data: Affiliation exposure and potential bias in the autocorrelation parameter,” Soc. Networks, 33, 231–243.

[14] George, T. and Merugu, S. (2005), “A scalable collaborative filtering framework based on co-clustering,” in Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 625–628.

[15] Goldschmidt, C. (2005), “Critical random hypergraphs: the emergence of a giant set of identifiable vertices,” Ann. Probab., 33, 1573–1600.

[16] Goodman, L. A. (1974), “Exploratory latent structure analysis using both identifiable and unidentifiable models,” Biometrika, 61, 215–231.

[17] Govaert, G. and Nadif, M. (2003), “Clustering with block mixture models,” Pattern Recognition, 36, 463–473.

[18] — (2008), “Block clustering with Bernoulli mixture models: comparison of different approaches,” Comput. Statist. Data Anal., 52, 3233–3245.

[19] Handcock, M. S., Raftery, A. E., and Tantrum, J. M. (2007), “Model-based clustering for social networks,” J. Roy. Statist. Soc. Ser. A, 170, 301–354.

[20] Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002), “Latent space approaches to social network analysis,” J. Amer. Statist. Assoc., 97, 1090–1098.

[21] Holland, P. W. and Leinhardt, S. (1981), “An exponential family of probability distributions for directed graphs,” J. Amer. Statist. Assoc., 76, 33–65.

[22] Hunter, D. R. and Lange, K. (2004), “A tutorial on MM algorithms,” Amer. Statist., 58, 30–37.

[23] Karoński, M. and Łuczak, T. (2002), “The phase transition in a random hypergraph,” J. Comput. Appl. Math., 142, 125–135.

[24] Koskinen, J. and Edling, C. (2012), “Modelling the evolution of a bipartite network - Peer referral in interlocking directorates,” Soc. Networks, 34, 309–322, dynamics of Social Networks (2).

[25] Lange, K., Hunter, D. R., and Yang, I. (2000), “Optimization transfer using surrogate objective functions,” J. Comput. Graph. Statist., 9, 1–59.

[26] Latapy, M., Magnien, C., and Vecchio, N. D. (2008), “Basic notions for the analysis of large two-mode networks,” Soc. Networks, 30, 31–48.

[27] Latouche, P., Birmelé, E., and Ambroise, C. (2011), “Overlapping stochastic block models with application to the French political blogosphere,” Ann. Appl. Stat., 5, 309–336.

[28] Lazarsfeld, P. F. and Henry, N. W. (1968), Latent structure analysis, Houghton Mifflin, Boston, MA 02110, USA.

[29] Lind, P. G., González, M. C., and Herrmann, H. J. (2005), “Cycles and clustering in bipartite networks,” Phys. Rev. E, 72, 056127.

[30] Lunagómez, S., Mukherjee, S., Wolpert, R. L., and Airoldi, E. M. (2017), “Geometric representations of random hypergraphs,” J. Amer. Statist. Assoc., 112, 363–383.

[31] Meng, X.-L. and Rubin, D. B. (1993), “Maximum likelihood estimation via the ECM algorithm: a general framework,” Biometrika, 80, 267–278.

[32] Moody, J. (2004), “The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999,” Am. Sociol. Rev, 69, 213–238.

[33] Newman, M. E. (2004), Who Is the Best Connected Scientist? A Study of Scientific Coauthorship Networks, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 337–370.

[34] Newman, M. E. J. (2001), “Scientific collaboration networks. I. Network construction and fundamental results,” Phys. Rev. E, 64, 016131.

[35] — (2001), “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Phys. Rev. E, 64, 016132.

[36] Nowicki, K. and Snijders, T. A. B. (2001), “Estimation and prediction for stochastic blockstructures,” J. Amer. Statist. Assoc., 96, 1077–1087.

[37] Perugini, S., Gonçalves, M. A., and Fox, E. A. (2004), “Recommender systems research: a connection-centric survey,” Journal of Intelligent Information Systems, 23, 107–143.

[38] Poole, D. (2015), “On the strength of connectedness of a random hypergraph,” Electron. J. Combin., 22, Paper 1.69, 16.

[39] Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., and Celeux, G. (2015), “Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models,” Bioinformatics, 31, 1420.

[40] Robins, G. and Alexander, M. (2004), “Small worlds among interlocking directors: network structure and distance in bipartite graphs,” Comput. Math. Organ. Theory, 10, 69–94.

[41] Skvoretz, J. and Faust, K. (1999), “Logit models for affiliation networks,” Sociol. Methodol, 29, 253–280.

[42] Smyth, P. (2000), “Model selection for probabilistic clustering using cross-validated likelihood,” Statistics and Computing, 10, 63–72.

[43] Snijders, T. A., Lomi, A., and Torló, V. J. (2013), “A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice,” Soc. Networks, 35, 265–276.

[44] Stasi, D., Sadeghi, K., Rinaldo, A., Petrovic, S., and Fienberg, S. (2014), “β models for random hypergraphs with a given degree sequence,” in Proceedings of COMPSTAT 2014—21st International Conference on Computational Statistics, pp. 593–600.

[45] Wang, P., Pattison, P., and Robins, G. (2013), “Exponential random graph model specifications for bipartite networks - A dependence hierarchy,” Soc. Networks, 35, 211–222.

[46] Wang, P., Sharpe, K., Robins, G., and Pattison, P. (2009), “Exponential random graph (p*) models for affiliation networks,” Soc. Networks, 31, 12–25.

[47] Wang, Y. H. (1993), “On the number of successes in independent trials,” Statist. Sinica, 3, 295–312.

Appendix A. Proof of Proposition 2.1

Proof. We can write $A = \sum_{i=1}^{N} A_i$, where $A_i = 1$ if node $i$ appears in the hyperedge and $A_i = 0$ otherwise. Similarly, we write $B = \sum_{i=1}^{N} B_i$. Let $Z_A$ be the latent cluster assignment of $X_A$, where $Z_A = g$ if $X_A$ is generated from cluster $g$. Let $Z_B^{(1)}$ and $Z_B^{(2)}$ be the latent cluster and additional cluster assignments of $X_B$, where $Z_B^{(1)} = g$ and $Z_B^{(2)} = k$ if $X_B$ is generated from cluster $g$ and additional cluster $k$. We have

$$
\begin{aligned}
E(A) &= \sum_{g=1}^{G} E(A \mid Z_A = g) \Pr(Z_A = g) \\
     &= \sum_{g=1}^{G} \sum_{i=1}^{N} E(A_i \mid Z_A = g) \Pr(Z_A = g) \\
     &= \sum_{g=1}^{G} \sum_{i=1}^{N} p_{ig} \pi_g
\end{aligned}
$$

$$
\begin{aligned}
E(B) &= \sum_{g=1}^{G} \sum_{k=1}^{K} E(B \mid Z_B^{(1)} = g, Z_B^{(2)} = k) \Pr(Z_B^{(1)} = g, Z_B^{(2)} = k) \\
     &= \sum_{g=1}^{G} \sum_{k=1}^{K} \sum_{i=1}^{N} E(B_i \mid Z_B^{(1)} = g, Z_B^{(2)} = k) \Pr(Z_B^{(1)} = g, Z_B^{(2)} = k) \\
     &= \sum_{g=1}^{G} \sum_{k=1}^{K} \sum_{i=1}^{N} \phi_{ig} a_k \tau_k \pi_g \\
     &= \sum_{g=1}^{G} \sum_{i=1}^{N} p_{ig} \pi_g \\
     &= E(A)
\end{aligned}
$$

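For the variance of the LCA model, we have that

$$
Var(A) = \sum_{i=1}^{N} Var(A_i) + 2 \sum_{i<j} Cov(A_i, A_j)
$$

$$
\begin{aligned}
Var(A_i) &= E(A_i^2) - E(A_i)^2 \\
         &= \Pr(A_i = 1) - \Pr(A_i = 1)^2 \\
         &= \sum_{g=1}^{G} p_{ig} \pi_g - \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big)^2
\end{aligned}
$$

$$
\begin{aligned}
Cov(A_i, A_j) &= E(A_i A_j) - E(A_i) E(A_j) \\
              &= \Pr(A_i = A_j = 1) - \Pr(A_i = 1) \Pr(A_j = 1) \\
              &= \sum_{g=1}^{G} p_{ig} p_{jg} \pi_g - \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big) \Big( \sum_{g=1}^{G} p_{jg} \pi_g \Big)
\end{aligned}
$$

Hence, we have that

$$
\begin{aligned}
Var(A) ={}& \sum_{i=1}^{N} \sum_{g=1}^{G} p_{ig} \pi_g - \sum_{i=1}^{N} \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big)^2 \\
          & + 2 \sum_{i<j} \sum_{g=1}^{G} p_{ig} p_{jg} \pi_g - 2 \sum_{i<j} \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big) \Big( \sum_{g=1}^{G} p_{jg} \pi_g \Big)
\end{aligned}
$$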

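Now,

$$
Var(B) = \sum_{i=1}^{N} Var(B_i) + 2 \sum_{i<j} Cov(B_i, B_j)
$$

$$
\begin{aligned}
Var(B_i) &= \Pr(B_i = 1) - \Pr(B_i = 1)^2 \\
         &= \sum_{g=1}^{G} \sum_{k=1}^{K} \phi_{ig} a_k \tau_k \pi_g - \Big( \sum_{g=1}^{G} \sum_{k=1}^{K} \phi_{ig} a_k \tau_k \pi_g \Big)^2 \\
         &= \sum_{g=1}^{G} p_{ig} \pi_g - \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big)^2
\end{aligned}
$$

$$
\begin{aligned}
Cov(B_i, B_j) &= \Pr(B_i = B_j = 1) - \Pr(B_i = 1) \Pr(B_j = 1) \\
              &= \sum_{g=1}^{G} \sum_{k=1}^{K} \phi_{ig} \phi_{jg} a_k^2 \pi_g \tau_k - \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big) \Big( \sum_{g=1}^{G} p_{jg} \pi_g \Big)
\end{aligned}
$$

We have

$$
\begin{aligned}
Var(B) ={}& \sum_{i=1}^{N} \sum_{g=1}^{G} p_{ig} \pi_g - \sum_{i=1}^{N} \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big)^2 \\
          & + 2 \sum_{i<j} \sum_{g=1}^{G} \sum_{k=1}^{K} \phi_{ig} \phi_{jg} a_k^2 \pi_g \tau_k - 2 \sum_{i<j} \Big( \sum_{g=1}^{G} p_{ig} \pi_g \Big) \Big( \sum_{g=1}^{G} p_{jg} \pi_g \Big)
\end{aligned}
$$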

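Now,

$$
\begin{aligned}
Var(B) - Var(A) &= 2 \sum_{i<j} \sum_{g=1}^{G} \sum_{k=1}^{K} \phi_{ig} \phi_{jg} a_k^2 \pi_g \tau_k - 2 \sum_{i<j} \sum_{g=1}^{G} p_{ig} p_{jg} \pi_g \\
                &= 2 \sum_{i<j} \sum_{g=1}^{G} \phi_{ig} \phi_{jg} \Big( \sum_{k=1}^{K} a_k^2 \tau_k - \Big( \sum_{k=1}^{K} a_k \tau_k \Big)^2 \Big) \pi_g
\end{aligned}
$$

The second equality substitutes $p_{ig} = \phi_{ig} \sum_{k=1}^{K} a_k \tau_k$, the same parameter matching that gives $E(B) = E(A)$. To show the quantity above is non-negative, we have to show that

$$
\sum_{k=1}^{K} a_k^2 \tau_k - \Big( \sum_{k=1}^{K} a_k \tau_k \Big)^2 \ge 0,
$$

which follows from Jensen’s inequality.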
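The two conclusions of the proof can be checked numerically. The sketch below is illustrative only: the parameter values are made up, the helper name `size_mean_var` is hypothetical, and the LCA parameters are matched to the ELCA ones through p_ig = φ_ig Σ_k a_k τ_k, as the derivation requires. It computes the exact mean and variance of the hyperedge size under both models using the law of total variance, then compares the variance gap with the closed form obtained above.

```python
import numpy as np

def size_mean_var(probs, weights):
    """Mean and variance of the number of successes in a mixture of
    independent-Bernoulli vectors.

    probs   : array (C, N); row c holds the N inclusion probabilities of component c
    weights : array (C,);   mixing proportions summing to one
    """
    cond_mean = probs.sum(axis=1)                    # E(size | component)
    cond_var = (probs * (1.0 - probs)).sum(axis=1)   # Var(size | component)
    mean = weights @ cond_mean
    # Law of total variance: E[Var(size | Z)] + Var(E[size | Z])
    var = weights @ cond_var + weights @ (cond_mean - mean) ** 2
    return mean, var

# Made-up parameter values, for illustration only.
rng = np.random.default_rng(0)
N, G, K = 6, 3, 2
phi = rng.uniform(0.1, 0.9, size=(N, G))   # phi_ig
a = np.array([0.4, 1.0])                   # a_k
tau = np.array([0.3, 0.7])                 # tau_k
pi = np.array([0.5, 0.3, 0.2])             # pi_g

# LCA parameters matched to the ELCA ones: p_ig = phi_ig * sum_k a_k tau_k
p = phi * (a @ tau)

# LCA: the mixture components are the G clusters.
mean_A, var_A = size_mean_var(p.T, pi)

# ELCA: the components are the G*K (cluster, additional cluster) pairs.
elca_probs = np.array([phi[:, g] * a[k] for g in range(G) for k in range(K)])
elca_weights = np.array([pi[g] * tau[k] for g in range(G) for k in range(K)])
mean_B, var_B = size_mean_var(elca_probs, elca_weights)

# Closed form of Var(B) - Var(A): 2 * sum_{i<j} sum_g phi_ig phi_jg Var_tau(a) pi_g,
# using 2 * sum_{i<j} phi_ig phi_jg = (sum_i phi_ig)^2 - sum_i phi_ig^2.
var_a = tau @ a**2 - (tau @ a) ** 2
gap = var_a * (pi @ (phi.sum(axis=0) ** 2 - (phi**2).sum(axis=0)))

print(mean_A, mean_B)
print(var_B - var_A, gap)
```

The two means should agree and the two printed variance gaps should coincide and be non-negative, which is what the Jensen step asserts.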

Appendix B. M-step of EM Algorithm

For the M-step, we need to maximize $Q(\theta \mid \theta^{(t)})$ with respect to the model parameters {φig}, {ak}, {πg} and {τk}.

Appendix B.1. Maximize w.r.t. φig

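For fixed $i$ and $g$, the objective function retaining terms involving $\phi_{ig}$ can be written as

$$
Q = \sum_{j=1}^{M} \sum_{k=1}^{K} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \Big( x_{ij} \log(\phi_{ig}) + (1 - x_{ij}) \log(1 - a_k \phi_{ig}) \Big) \tag{B.1}
$$

Since an analytic expression for $\arg\max_{\phi_{ig}} \{Q\}$ does not exist due to the term $\log(1 - a_k \phi_{ig})$, we apply the MM (Minorization Maximization) algorithm [22]. We first apply a quadratic lower bound on the concave function $\log(1 - a_k \phi_{ig})$ for $k < K$. We let

$$
f(\phi_{ig}) = \log(1 - a_k \phi_{ig}).
$$

We then have

$$
\frac{\partial f}{\partial \phi_{ig}} = \frac{-a_k}{1 - a_k \phi_{ig}}
$$

$$
\frac{\partial^2 f}{\partial \phi_{ig}^2} = \frac{-a_k^2}{(1 - a_k \phi_{ig})^2} \ge \frac{-a_k^2}{(1 - a_k)^2}
$$

Hence, we have

$$
\log(1 - a_k \phi_{ig}) \ge \log(1 - a_k \phi_{ig}^{(t)}) + \Big( \frac{-a_k}{1 - a_k \phi_{ig}^{(t)}} \Big) (\phi_{ig} - \phi_{ig}^{(t)}) + \frac{1}{2} \Big( \frac{-a_k^2}{(1 - a_k)^2} \Big) (\phi_{ig} - \phi_{ig}^{(t)})^2
$$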
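As a quick sanity check on this bound, the following sketch evaluates both sides on a grid of φ values in [0, 1]. The values of a_k and of the current iterate are arbitrary illustrative choices, and the function names are hypothetical. The quadratic should lie below the log term everywhere on the grid and touch it at the current iterate, which are the two properties a minorizer needs.

```python
import numpy as np

def log_term(phi, a):
    """The concave term log(1 - a * phi)."""
    return np.log1p(-a * phi)

def quadratic_minorizer(phi, phi_t, a):
    """Quadratic lower bound on log(1 - a * phi), expanded around phi_t.

    Uses the exact slope at phi_t and the uniform curvature bound
    -a^2 / (1 - a)^2, valid for phi in [0, 1] and 0 < a < 1.
    """
    slope = -a / (1.0 - a * phi_t)
    curvature = -a**2 / (1.0 - a) ** 2
    return (log_term(phi_t, a)
            + slope * (phi - phi_t)
            + 0.5 * curvature * (phi - phi_t) ** 2)

# Arbitrary illustrative values (assumptions, not estimates from the paper).
a, phi_t = 0.6, 0.35
grid = np.linspace(0.0, 1.0, 201)
gap = log_term(grid, a) - quadratic_minorizer(grid, phi_t, a)

print(gap.min())  # should be non-negative up to rounding
```

The curvature bound grows quickly as a_k approaches 1, so the surrogate becomes very conservative in that regime; this is a property of the bound itself rather than of this sketch.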

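Hence, the objective function in (B.1) up to an additive constant can be minorized by the function below.

$$
\begin{aligned}
Q_{lower} ={}& \sum_{j=1}^{M} \sum_{k=1}^{K} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \, x_{ij} \log(\phi_{ig}) \\
& + \sum_{j=1}^{M} \sum_{k=1}^{K-1} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} (1 - x_{ij}) \Big( \Big( \frac{-a_k}{1 - a_k \phi_{ig}^{(t)}} \Big) \phi_{ig} + \frac{1}{2} \Big( \frac{-a_k^2}{(1 - a_k)^2} \Big) (\phi_{ig} - \phi_{ig}^{(t)})^2 \Big) \\
& + \sum_{j=1}^{M} \widehat{Z_{jg}^{(1)} Z_{jK}^{(2)}} (1 - x_{ij}) \log(1 - \phi_{ig})
\end{aligned}
\tag{B.2}
$$

To simplify the expression above, we define the quantities below.

$$
A_1 = \sum_{j=1}^{M} \sum_{k=1}^{K} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \, x_{ij}
$$

$$
A_2 = \sum_{j=1}^{M} \widehat{Z_{jg}^{(1)} Z_{jK}^{(2)}} (1 - x_{ij})
$$

$$
B_1 = \sum_{j=1}^{M} \sum_{k=1}^{K-1} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} (1 - x_{ij}) \frac{-a_k}{1 - a_k \phi_{ig}^{(t)}}
$$

$$
B_2 = \sum_{j=1}^{M} \sum_{k=1}^{K-1} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} (1 - x_{ij}) \frac{1}{2} \frac{-a_k^2}{(1 - a_k)^2}
$$

Now, the lower bound in (B.2) can be written as below.

$$
Q_{lower} = A_1 \log(\phi_{ig}) + A_2 \log(1 - \phi_{ig}) + B_1 \phi_{ig} + B_2 (\phi_{ig} - \phi_{ig}^{(t)})^2
$$

Taking the derivative with respect to $\phi_{ig}$ and setting it to zero, we have

$$
\frac{A_1}{\phi_{ig}} - \frac{A_2}{1 - \phi_{ig}} + B_1 + 2 B_2 \phi_{ig} - 2 B_2 \phi_{ig}^{(t)} = 0
$$

Letting $C = B_1 - 2 B_2 \phi_{ig}^{(t)}$, we have

$$
\phi_{ig}^3 - \frac{2 B_2 - C}{2 B_2} \phi_{ig}^2 - \frac{C - A_1 - A_2}{2 B_2} \phi_{ig} - \frac{A_1}{2 B_2} = 0 \tag{B.3}
$$

Solving the cubic equation above results in the update for $\phi_{ig}$.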
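One possible way to carry out this step is sketched below, using `numpy.roots` to solve the cubic (B.3) and keeping the root inside (0, 1). The function names and the numerical values of A1, A2, B1, B2 and the current iterate are illustrative assumptions. The sketch also assumes A1 > 0, A2 > 0 and B2 < 0, in which case the surrogate is strictly concave on (0, 1) and has a single stationary point there; boundary cases are not handled.

```python
import numpy as np

def q_lower(phi, A1, A2, B1, B2, phi_t):
    """Surrogate A1 log(phi) + A2 log(1 - phi) + B1 phi + B2 (phi - phi_t)^2."""
    return (A1 * np.log(phi) + A2 * np.log1p(-phi)
            + B1 * phi + B2 * (phi - phi_t) ** 2)

def update_phi(A1, A2, B1, B2, phi_t):
    """Return the root of the cubic (B.3) that lies in (0, 1).

    Assumes A1 > 0, A2 > 0 and B2 < 0, so the surrogate is strictly concave
    on (0, 1) and tends to minus infinity at both end points.
    """
    C = B1 - 2.0 * B2 * phi_t
    coeffs = [1.0,
              -(2.0 * B2 - C) / (2.0 * B2),
              -(C - A1 - A2) / (2.0 * B2),
              -A1 / (2.0 * B2)]
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-9].real
    candidates = real_roots[(real_roots > 0.0) & (real_roots < 1.0)]
    if candidates.size == 0:
        raise ValueError("no root of the cubic in (0, 1)")
    # Under the stated assumptions there is one candidate; choosing the best
    # surrogate value guards against numerically duplicated roots.
    return max(candidates, key=lambda phi: q_lower(phi, A1, A2, B1, B2, phi_t))

# Illustrative values only (not taken from any data set).
A1, A2, B1, B2, phi_t = 3.0, 1.5, -2.0, -1.2, 0.5
phi_new = update_phi(A1, A2, B1, B2, phi_t)

# Derivative of the surrogate at the update; should vanish up to rounding.
score = A1 / phi_new - A2 / (1.0 - phi_new) + B1 + 2.0 * B2 * (phi_new - phi_t)
print(phi_new, score)
```

A closed-form cubic solver would also work; a general-purpose root finder is used here only to keep the sketch short.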

Appendix B.2. Maximize w.r.t. ak

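For a fixed $k$, the objective function (3) retaining terms involving $a_k$ can be expressed as

$$
Q = \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \sum_{i=1}^{N} \Big( x_{ij} \log(a_k) + (1 - x_{ij}) \log(1 - a_k \phi_{ig}) \Big). \tag{B.4}
$$

Since an analytic expression for $\arg\max_{a_k} \{Q\}$ does not exist due to the term $\log(1 - a_k \phi_{ig})$, we apply the MM (Minorization Maximization) algorithm. We first apply a quadratic lower bound on the concave function

$$
\log(1 - a_k \phi_{ig}) \ge \log(1 - a_k^{(t)} \phi_{ig}) + \Big( \frac{-\phi_{ig}}{1 - a_k^{(t)} \phi_{ig}} \Big) (a_k - a_k^{(t)}) + \frac{1}{2} \Big( \frac{-\phi_{ig}^2}{(1 - \phi_{ig})^2} \Big) (a_k - a_k^{(t)})^2
$$

Hence, (B.4) up to an additive constant can be minorized by the function below.

$$
\begin{aligned}
Q_{lower} ={}& \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \Big( \sum_{i=1}^{N} x_{ij} \log(a_k) \Big) \\
& + \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \sum_{i=1}^{N} (1 - x_{ij}) \Big( \frac{-\phi_{ig}}{1 - a_k^{(t)} \phi_{ig}} a_k + \frac{1}{2} \Big( \frac{-\phi_{ig}^2}{(1 - \phi_{ig})^2} \Big) (a_k - a_k^{(t)})^2 \Big)
\end{aligned}
\tag{B.5}
$$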

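To simplify the expression above, we define the following quantities.

$$
A = \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \sum_{i=1}^{N} x_{ij}
$$

$$
B = \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \sum_{i=1}^{N} (1 - x_{ij}) \Big( \frac{-\phi_{ig}}{1 - a_k^{(t)} \phi_{ig}} \Big)
$$

$$
C = \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \sum_{i=1}^{N} (1 - x_{ij}) \frac{1}{2} \Big( \frac{-\phi_{ig}^2}{(1 - \phi_{ig})^2} \Big)
$$

Taking the derivative of (B.5) with respect to $a_k$ and setting it to zero, we have

$$
\frac{\partial Q_{lower}}{\partial a_k} = \frac{A}{a_k} + B + 2 C (a_k - a_k^{(t)}) = 0
$$

Multiplying through by $a_k / (2C)$ gives the quadratic $a_k^2 + D a_k - E = 0$ with $D = \frac{B}{2C} - a_k^{(t)}$ and $E = -\frac{A}{2C}$, whose positive root is

$$
\hat{a}_k = \Big( E + \frac{D^2}{4} \Big)^{1/2} - \frac{D}{2} \tag{B.6}
$$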
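The closed-form update (B.6) is short enough to write down directly. The sketch below is a minimal illustration with made-up values of A, B, C and the current iterate; the function name `update_a` is hypothetical. It assumes A > 0 and C < 0, so that E > 0 and the square root is real, and it does not address whether or how the result should be kept inside (0, 1].

```python
import math

def update_a(A, B, C, a_t):
    """Positive root of a^2 + D a - E = 0, i.e. the update (B.6).

    Assumes A > 0 and C < 0, which makes E = -A / (2C) positive.
    """
    D = B / (2.0 * C) - a_t
    E = -A / (2.0 * C)
    return math.sqrt(E + D * D / 4.0) - D / 2.0

# Illustrative values only (not taken from any data set).
A, B, C, a_t = 4.0, -3.0, -2.5, 0.5
a_new = update_a(A, B, C, a_t)

# Derivative of the surrogate at the update; should vanish up to rounding.
score = A / a_new + B + 2.0 * C * (a_new - a_t)
print(a_new, score)
```

With these inputs D = 0.1 and E = 0.8, so the update is sqrt(0.8025) - 0.05, roughly 0.846.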

Appendix B.3. Maximize w.r.t. πg and τk

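The updates for $\pi_g$ and $\tau_k$ are straightforward and are given below.

$$
\hat{\pi}_g \propto \sum_{j=1}^{M} \sum_{k=1}^{K} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \tag{B.7}
$$

$$
\hat{\tau}_k \propto \sum_{j=1}^{M} \sum_{g=1}^{G} \widehat{Z_{jg}^{(1)} Z_{jk}^{(2)}} \tag{B.8}
$$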
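In array form these two updates are marginal sums followed by normalisation. The sketch below assumes the E-step expectations are stored in an array `z_hat` of shape (M, G, K), which is a storage choice made here for illustration; the random values stand in for real E-step output.

```python
import numpy as np

def update_mixing(z_hat):
    """Updates (B.7) and (B.8) from the E-step expectations.

    z_hat : array (M, G, K); entry [j, g, k] is the expected value of
            Z1_jg * Z2_jk for hyperedge j, cluster g, additional cluster k.
    """
    pi_hat = z_hat.sum(axis=(0, 2))    # sum over hyperedges and additional clusters
    tau_hat = z_hat.sum(axis=(0, 1))   # sum over hyperedges and clusters
    return pi_hat / pi_hat.sum(), tau_hat / tau_hat.sum()

# Placeholder expectations: each hyperedge's (G, K) block sums to one.
rng = np.random.default_rng(1)
M, G, K = 8, 3, 2
z_hat = rng.random((M, G, K))
z_hat /= z_hat.sum(axis=(1, 2), keepdims=True)

pi_hat, tau_hat = update_mixing(z_hat)
print(pi_hat, tau_hat)
```

Because each hyperedge's block sums to one, both normalising constants equal M, so dividing by the total is the same as dividing by the number of hyperedges.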

Appendix C. Model Selection & Data Description

Table A1: Model Selection for the Star Wars Data Set

| No. of Clusters | No. of Additional Clusters | Cross Validated Loglikelihood |
|---|---|---|
| 1 | 1 | -194.77 |
| 1 | 2 | -206.18 |
| 2 | 1 | -194.39 |
| 2 | 2 | -193.92 |
| 2 | 3 | -194.63 |
| 3 | 1 | -194.12 |
| 3 | 2 | -190.78 |
| 3 | 3 | -194.96 |
| 4 | 1 | -194.99 |

Table A2: Model Selection for Lady Gaga Concerts 2014 Data Set

| No. of Clusters | No. of Additional Clusters | Cross Validated Loglikelihood |
|---|---|---|
| 1 | 1 | -521.82 |
| 1 | 2 | -822.68 |
| 2 | 1 | -389.93 |
| 2 | 2 | -306.86 |
| 2 | 3 | -270.79 |
| 2 | 4 | -274.56 |
| 3 | 1 | -340.95 |
| 3 | 2 | -274.87 |
| 3 | 3 | -261.05 |
| 3 | 4 | -246.11 |
| 3 | 5 | -250.71 |
| 4 | 1 | -338.64 |
| 4 | 2 | -237.21 |
| 4 | 3 | -248.05 |
| 5 | 1 | -322.92 |
| 5 | 2 | -215.39 |
| 5 | 3 | -236.71 |
| 6 | 1 | -328.36 |
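Reading the selected model off these tables amounts to taking the (clusters, additional clusters) pair with the largest cross-validated log-likelihood. The short sketch below does this for the values reported in Tables A1 and A2; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Cross-validated log-likelihoods keyed by (no. of clusters, no. of additional clusters).
star_wars = {
    (1, 1): -194.77, (1, 2): -206.18,
    (2, 1): -194.39, (2, 2): -193.92, (2, 3): -194.63,
    (3, 1): -194.12, (3, 2): -190.78, (3, 3): -194.96,
    (4, 1): -194.99,
}
lady_gaga = {
    (1, 1): -521.82, (1, 2): -822.68,
    (2, 1): -389.93, (2, 2): -306.86, (2, 3): -270.79, (2, 4): -274.56,
    (3, 1): -340.95, (3, 2): -274.87, (3, 3): -261.05, (3, 4): -246.11,
    (3, 5): -250.71,
    (4, 1): -338.64, (4, 2): -237.21, (4, 3): -248.05,
    (5, 1): -322.92, (5, 2): -215.39, (5, 3): -236.71,
    (6, 1): -328.36,
}

def select_model(cv_loglik):
    """Return the (G, K) pair with the largest cross-validated log-likelihood."""
    return max(cv_loglik, key=cv_loglik.get)

best_star_wars = select_model(star_wars)
best_lady_gaga = select_model(lady_gaga)
print(best_star_wars)  # (3, 2)
print(best_lady_gaga)  # (5, 2)
```

The Lady Gaga result, 5 clusters and 2 additional clusters, matches the configuration reported in the caption of Table 4.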
