
Hypergraph Dissimilarity Measures

Amit Surana, Can Chen and Indika Rajapakse

Abstract—In this paper, we propose two novel approaches for hypergraph comparison. The first approach transforms the hypergraph into a graph representation for use of standard graph dissimilarity measures. The second approach exploits the mathematics of tensors to intrinsically capture multi-way relations. For each approach, we present measures that assess hypergraph dissimilarity at a specific scale or provide a more holistic multi-scale comparison. We test these measures on synthetic hypergraphs and apply them to biological datasets.

Index Terms—Hypergraphs, dissimilarity measures, tensors, biological systems.

A. Surana is with Raytheon Technologies Research Center, East Hartford, CT 06108 USA (e-mail: [email protected]).
C. Chen is with the Department of Mathematics, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).
I. Rajapakse is with the Department of Computational Medicine & Bioinformatics, Medical School and the Department of Mathematics, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).

I. INTRODUCTION

COMPLEX systems in sociology, biology, cyber-security, telecommunications, and physical infrastructure are often represented as a set of entities, i.e. "vertices" with binary relationships or "edges," and hence are analyzed via graph theoretic methods. Graph models, while simple and to some extent universal, are limited to representing pairwise relationships between entities. However, real-world phenomena can be rich in multi-way relationships, dependencies between more than two variables, or properties of collections of more than two objects. Examples include computer networks where the dynamic relations are defined by packets exchanged over time between computers, co-authorship networks where relations are articles written by two or more authors, historical documents where multiple persons can be mentioned together, brain activity where multiple regions can be highly active at the same time, film actor networks, and protein-protein interaction networks [1]–[4].

A hypergraph is a generalization of a graph in which its hyperedges can join any number of vertices [5]. Thus, hypergraphs can capture multi-way relationships unambiguously [6], and are the natural representation of a broad range of systems mentioned above. Although an expanding body of research attests to the increased utility of hypergraph-based analyses, many methods have historically been developed explicitly (and often, exclusively) for graph-based analyses and do not directly translate to hypergraphs. Consequently, new frameworks are being developed for representation, learning and analysis of hypergraphs, see [2], [3] for a recent survey. These include techniques for converting hypergraphs into graphs and defining hypergraph Laplacians [7], higher-order random walk-based hypergraph analysis [8], and defining dynamics on hypergraphs [9].

As tensors [10] provide a natural framework to represent multi-dimensional patterns and capture higher-order interactions, they are finding an increasing role in the context of hypergraphs. For example, the spectral theory of graphs has been extended to hypergraphs using tensor eigenvalues [11], and the authors in [12] define a notion of tensor entropy for uniform hypergraphs, generalizing the von Neumann entropy of a graph to hypergraphs. The problem of controllability of dynamics on hypergraphs is studied via tensor-based representation and nonlinear control theory in [13]. Similar to the above mentioned work, the goal of this paper is to extend the graph comparison framework to hypergraphs.

Comparison of structures such as modular communities, hubs, and trees yields insight into the generative mechanisms and functional properties of a graph. Graph comparison can be used for comparing brain or metabolic networks for different subjects, or the same subject before and after a treatment, and for characterizing the temporal network evolution during treatment [14]. Classification of graphs, for example in the context of protein-protein interaction networks and online social networks, can be facilitated via the use of graph comparison measures [15]. Combined with a clustering algorithm, a graph comparison measure can be used to aggregate networks in a meaningful way and reveal redundancy in the data/networks [16]. Graph comparison can be used for evaluating the accuracy of statistical or generative network models [17], and can further be utilized as an objective function to drive an optimization procedure that fits graph models to data.

In order to compare graphs, a variety of dissimilarity measures (DMs) or distances have been proposed in the literature which either assess similarity at a specific scale, e.g. local or global, or provide a more holistic multi-scale comparison. See references [14], [18], [19] for a comprehensive review. While there is a rich body of literature on graph dissimilarity measures (GDMs), analogous notions for hypergraphs are lacking in the literature. To address this gap, we propose two new approaches for hypergraph comparison. Just like for GDMs, within each of these HDM approaches we present a collection of DMs which either assess hypergraph similarity at a specific scale or provide a multi-scale comparison. Specifically, the key contributions of this paper are as follows:

• We develop an indirect approach for comparing hypergraphs by first transforming the hypergraph into a graph and then invoking standard GDMs. In particular we explore clique and star expansion for this transformation. While information about hypergraph structure may be lost during such transformations/projections, the assumption is that relevant salient features may still be preserved which are sufficient to capture key differences between underlying hypergraphs. We refer to these DMs as indirect HDMs.

• We introduce another direct approach which relies on a tensor based representation of hypergraphs which intrinsically captures the multi-way relations encoded by hyperedges. In particular we use the adjacency tensor and Laplacian tensor associated with hypergraphs, and the tensor algebraic notions of tensor eigenvalues/eigenvectors and higher order singular values to develop new notions of DMs for hypergraphs. We refer to these DMs as direct HDMs.

• We test the proposed HDMs on synthetic hypergraphs to assess their usability in discerning between common hypergraph topologies. We also apply the methods to real-world hypergraphs arising in biological datasets.

The paper is organized into seven sections. We introduce basic notation and mathematical preliminaries related to hypergraphs, and discuss some desirable characteristics of HDMs, in Section II. In Section III, we provide a short survey of different GDMs. We then use these GDMs in Section IV to define indirect HDMs based on conversion of hypergraphs into graphs. In Section V, we develop notions of direct HDMs using the tensor based representation of hypergraphs. Applications to synthetic and real-world hypergraph datasets are presented in Section VI. We discuss the pros/cons of indirect and direct HDMs and directions for future research in Section VI-D, and conclude in Section VII.

II. PRELIMINARIES

Hypergraph: Let V be a finite set. A hypergraph G is a pair (V, E) where E ⊆ P(V) \ {∅}, the power set of V. The elements of V are called the vertices, and the elements of E are called the hyperedges. We note that in this definition of a hypergraph we do not allow for repeated vertices within a hyperedge (often called hyperloops). For a weighted hypergraph, there is a positive weight function w : E → (0, ∞) which defines a weight w(e) > 0 associated with each hyperedge e ∈ E. The degree d(v) of a vertex v ∈ V is d(v) = Σ_{e∈E | v∈e} w(e). The degree of a hyperedge e is d(e) = |e|, where |·| denotes set cardinality. For k-uniform hypergraphs, the degree of each hyperedge is the same, i.e. d(e) = k. The vertex-hyperedge incidence matrix H is a |V| × |E| matrix where the entry h(v, e) is 1 if v ∈ e and 0 otherwise. By these definitions, we have

d(v) = Σ_{e∈E} w(e) h(v, e),   d(e) = Σ_{v∈V} h(v, e).

Let D_e and D_v be the diagonal matrices consisting of hyperedge and vertex degrees as diagonal entries, respectively. Similarly, we will denote by W the diagonal matrix formed by the hyperedge weights w(·) as its diagonal entries.

Note that a standard graph is a 2-uniform hypergraph. We will denote a standard graph by G, and by A its adjacency matrix, which is the |V| × |V| matrix with entry (u, v) equal to the edge weight w(e) (where e is such that (u, v) ∈ e) if u and v are connected, and 0 otherwise. The incidence matrix H and the diagonal matrices D_v and W are defined analogously to the above.

Hypergraph Dissimilarity Measure (HDM): Let G be the space of hypergraphs with a finite number of vertices. A hypergraph dissimilarity measure (HDM) D is a symmetric non-negative function D : G × G → [0, ∞), i.e. D(G, G̃) = D(G̃, G) for any G, G̃ ∈ G. D quantifies dissimilarity between two hypergraphs, with larger values indicating a higher degree of dissimilarity.

Note that in general D is not required to satisfy D(G, G̃) = 0 even when G and G̃ are isomorphic, or the triangle inequality, and thus may not be a valid metric. But depending on the application, such requirements may be further imposed on D. Approaches to graph comparison can be roughly divided into two groups: those that consider or require two graphs to be defined on the same set of vertices, and those that do not. To distinguish DMs which have been defined specifically for graphs in the literature, we will refer to them as graph dissimilarity measures (GDMs), and denote them by D.

A. Characteristics of Dissimilarity Measures

In this section we summarize some desirable properties of graph dissimilarity measures which have been noted in the literature, and which can also be applied in the context of hypergraphs. These properties can serve as guidelines for selecting an appropriate DM, and can further be modified and enriched by the data analyst depending on the application at hand. Examples of some desirable properties for DMs include [20]:

• Edge-importance: modifications of the graph structure yielding disconnected components should be penalized more.
• Edge-submodularity: a specific change is more important in a graph with few edges than in a denser graph on the same vertices.
• Weight awareness: the impact on the similarity measure increases with the weight of the modified edge.
• Focus awareness: random changes in graphs are less important than targeted changes of the same extent.

Depending on the application, additional invariance properties may be imposed on the DMs, such as [21]:

• Permutation-invariance: implies that if two graphs' structure is the same (i.e., if the two graphs are isomorphic) the DM between them is zero.
• Scale-adaptivity: implies that the DM accounts for differences in both local (edge and node) and global (community) features. Using local features only, a DM would deem two graphs sharing local patterns to have near-zero distance although their global properties (such as page-rank features) may differ, and, in reverse, relying on global features only would miss differences in local structure (such as edge distributions).
• Size-invariance: is the capacity of a DM to discern that two graphs represent the same phenomenon at different magnitudes (e.g., two criminal circles of similar structures but different sizes should have near-zero DM). Size-invariance postulates that if two graphs originate from the sampling of the same domain, they should be deemed similar.
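To make the preliminaries concrete, the following minimal NumPy sketch builds the incidence matrix H and the matrices D_v, D_e and W for a small weighted hypergraph; the hyperedge list, weights and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical weighted hypergraph on 5 vertices with 3 hyperedges.
edges = [{0, 1, 2}, {1, 3}, {2, 3, 4}]
w = np.array([1.0, 2.0, 0.5])          # one weight w(e) > 0 per hyperedge
n, m = 5, len(edges)

# Vertex-hyperedge incidence matrix: h(v, e) = 1 if v is in e, else 0.
H = np.zeros((n, m))
for j, e in enumerate(edges):
    H[list(e), j] = 1.0

W = np.diag(w)                          # diagonal hyperedge weight matrix
d_v = H @ w                             # d(v) = sum_e w(e) h(v, e)
d_e = H.sum(axis=0)                     # d(e) = |e|
D_v, D_e = np.diag(d_v), np.diag(d_e)
```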

III. REVIEW OF GRAPH DISSIMILARITY MEASURES

We review some key GDMs; the material is taken from the survey articles [14], [18], [19]. Approaches to graph comparison can be categorized from different perspectives.

One categorization is based on whether the graph comparison method requires the two graphs to be defined on the same set of vertices or not. The former eliminates the need to discover a mapping between node sets, making comparison relatively easier. A common approach for comparison without assuming node correspondence is to build the DM using graph invariants. Graph invariants are properties of a graph that hold for all isomorphs of the graph. Using an invariant mitigates any concerns with the encoding of the graphs, and the DM is instead focused completely on the graph topology.

Another categorization of graph comparison methods is based on the scale at which they compare structures [14]. Local DMs are only sensitive to differences in the direct neighbourhood of each node, while global DMs may ignore node identities and perceive differences only in global structures in the graph, such as hubs, communities, number of spanning trees, etc. On the other hand, mesoscopic DMs work at an intermediate scale such that they not only preserve vertex identities but also incorporate information characterizing vertices by their relationship to the whole graph, rather than uniquely with respect to their neighbours. Finally, multi-scale DMs attempt to capture aspects from multiple scales, i.e. local, global and/or mesoscopic, in quantifying differences between graphs.

Let G and G̃ be two graphs under comparison with adjacency matrices A and Ã, respectively, and let their graph Laplacians be L and L̃, respectively. The graph Laplacian L (and similarly L̃) could be the standard combinatorial Laplacian

L^{un} = D_v − A,   (1)

or its normalized version,

L = I − D_v^{−1/2} A D_v^{−1/2}.   (2)

We shall denote the Laplacian eigenvalues by 0 = λ_1 ≤ λ_2 ≤ · · · ≤ λ_n. The literature remains divided on which version of the Laplacian to pick for defining the DM. Since the eigenvalues of the normalized Laplacian are bounded between 0 and 2, it is a more stable and preferable representation. Therefore, unless otherwise stated, we will always use the normalized Laplacian L in the definition of the GDMs.

• Structural DMs: The simplest GDMs are obtained by directly computing the difference of the adjacency matrices of the two graphs and then using a suitable norm, e.g., Euclidean, Manhattan, Canberra, or Jaccard. Examples of such GDMs include the Hamming distance, defined as

D_H(G, G̃) = ||A − Ã||_1 / (n(n − 1)),

and the Jaccard distance,

D_J(G, G̃) = 1 − Σ_{ij} min(A_{ij}, Ã_{ij}) / Σ_{ij} max(A_{ij}, Ã_{ij}).

Structural DMs focus on differences in the direct local neighborhood of each node, and are agnostic to other more global structures in the graph. (These distances are sketched in code at the end of this section.)

• Feature-based DMs: Another possible method for comparing graphs is to look at specific "features" of the graph, such as the degree distribution, betweenness centrality distribution, diameter, number of triangles, number of k-cliques, etc. For graph features that are vector-valued (such as the degree distribution) one might also consider the vector as an empirical distribution and take as graph features the sample moments (or quantiles, or other statistical properties). A feature-based distance is a distance that uses comparison of such features to compare graphs. If we are using node dependent features, the method aggregates a feature-vertex matrix of size k × n, where k is the number of features selected. This feature-vertex matrix for the two graphs can then be directly compared, or can be further reduced to a "signature vector" that consists of the mean, median, standard deviation, skewness, and kurtosis of each feature across vertices. These signature vectors are then compared in order to obtain a DM between graphs. NETSIMILE [22] is an example of a feature based distance which uses local and egonet-based features (e.g., degree, volume of egonet as fraction of maximum possible volume, etc.). In the neuroscience literature, in particular, feature-based methods are fairly popular.

In this paper, we will utilize the node centrality vector c = (c_1, · · · , c_n)^T as the feature for graph comparison. Let c_i, i = 1, · · · , n and c̃_i, i = 1, · · · , n be normalized (i.e. ||c||_1 = ||c̃||_1 = 1) node centralities for the graphs G and G̃, respectively; then the centrality based DM is given by:

D_C(G, G̃) = (1/n) Σ_{i=1}^{n} |c_i − c̃_i|.   (3)

Note that one could use any notion of centrality, e.g. betweenness centrality, closeness centrality, eigenvector centrality, etc., as relevant for the application [23]. Since centrality measures typically characterize vertices as either belonging to the core or to the periphery of the graph, and thus encode global topological information on the status of vertices within the graph, DMs based on centrality capture mesoscopic differences between graphs [14].

• Spectral DMs: Spectral DMs, on the other hand, are more suitable for analyses where the critical information in the graph structure is contained at a global scale, rather than locally. Spectral DMs are global measures using the eigenvalues of either the adjacency matrix A or of some version of the Laplacian L [14]. Both the eigenvalues of the Laplacian and those of the adjacency matrix can be related to physical properties of a graph, and can thus be considered as characteristics of its states. The adjacency matrix does not downweight any changes and treats all vertices equivalently. On the other hand, the eigenspectrum of the Laplacian accounts for the degree of the vertices and is known to be robust to most perturbations. Specific examples of spectral DMs include the l^p distance on the space of Laplacian eigenvalues,

D_λ(G, G̃) = (1/n) Σ_{i=1}^{n} |λ_i − λ̃_i|^p,   (4)

and the spanning tree DM,

D_ST(G, G̃) = |log(T_G) − log(T_G̃)|,   (5)

where T_G is the number of spanning trees in the graph, given by

T_G = (1/n) Π_{i=1}^{n−1} λ_i.

Other DMs include distances based on the eigenspectrum distributions,

D_ρ(G, G̃) = ∫ |ρ_G(x) − ρ_G̃(x)| dx,   (6)

where

ρ_G(x) = (1/n) Σ_{i=0}^{n−1} (1/√(2πσ²)) exp(−(x − λ_i)²/(2σ²)).

Another related DM is the Ipsen–Mikhailov distance, which characterizes the difference between two graphs by comparing their spectral densities, rather than the raw eigenvalues themselves.

• DELTACON: This DM is based on the fast belief propagation method of measuring node affinities [20]. It uses the fast belief propagation matrix

S = [I + ε²D − εA]^{−1},

and compares the two representations S and S̃ via the Matusita difference, leading to

D_Δ(G, G̃) = [ Σ_{ij} ( √S_{ij} − √S̃_{ij} )² ]^{1/2}.   (7)

Fast belief propagation is designed to model the diffusion of information throughout a graph, and so in theory should be able to perceive differences in both global and local structures in the graph.

• Heat Spectral Wavelets: An alternative is to derive characterizations of each node's topological properties through a signal processing approach. A specific example of this type of DM is the heat spectral wavelets, in which the eigenvalues are modulated and combined with their respective eigenvectors to yield a "filtered" representation of the graph's signal. For a given scale factor τ > 0, the structural signature ξ_u^τ for each node u is defined to be a vector of coefficients,

ξ_u^τ = (Ψ_{1,u}^τ, Ψ_{2,u}^τ, · · · , Ψ_{n,u}^τ)^T,

where

Ψ_{v,u}^τ = Σ_{i=0}^{n−1} e^{−τλ_i} V_{ui} V_{vi},

with L = VΛV^T being the Laplacian's eigenvector decomposition. Let ξ_u = ((ξ_u^{τ_1})^T, · · · , (ξ_u^{τ_m})^T)^T be the combined vector for a set of selected scales τ_1, τ_2, · · · , τ_m. By choosing these scales appropriately, one can capture information on the connectedness and centrality of each node within the network, thereby providing a way to encompass in a single Euclidean vector all the necessary information to characterize a vertex's topological status within the graph. Then the heat kernel DM between the graphs amounts to the average l² distance between corresponding nodes' structural embeddings ξ_i, i.e.,

D_HK(G, G̃) = (1/n) Σ_{i=1}^{n} ||ξ_i − ξ̃_i||_2.   (8)

• NetLSD: Similar to heat spectral wavelets, in the network Laplacian spectral descriptor (NetLSD) [21] a heat kernel

H_τ = e^{−τL} = V e^{−Λτ} V^T,

is defined, along with its heat trace,

h_τ = Tr(H_τ) = Σ_{j=1}^{n} e^{−λ_j τ}.

NetLSD then condenses the graph representation in the form of a heat trace signature h(G) = {h_τ}_{τ>0}, which comprises a collection of heat traces at different time scales. The continuous-time function h_τ is finally transformed into a finite-dimensional vector by sampling over a suitable time interval. The DM between G and G̃ is then taken to be the l^∞ norm of the vector difference between h(G) and h(G̃). (The heat trace is included in the code sketch at the end of this section.)

Note that the heat kernel can be seen as continuous-time random walk propagation (where (H_τ)_{ij} is the heat transferred from node i to node j at time τ), and its diagonal (sometimes referred to as the autodiffusivity function or the heat kernel signature) can be seen as a continuous-time PageRank. As τ approaches zero, the Taylor expansion yields H_τ ≈ I − Lτ, meaning the heat kernel depicts local connectivity. On the other hand, for large τ, H_τ ≈ I − e^{−τλ_2} v_2 v_2^T, where v_2 is the Fiedler vector used in spectral graph clustering; it encodes global connectivity. Thus, the heat kernel localizes around its diagonal, and the degree of localization (as captured by the heat trace) depends on the scale τ; it can thereby be tuned to capture both local and global graph structures.

• Graph Embedding based DMs: Given the diversity of structural features in graphs, and the difficulty of designing by hand the set of features that optimizes the graph embedding, several researchers have recently proposed to learn the embedding from massive datasets of existing networks. Such algorithms learn an embedding from a set of graphs into Euclidean space, and then compute a notion of similarity between the embedded graphs [24]. All these approaches rely on the extension of convolutional neural networks to non-Euclidean structures, such as manifolds and graphs.

• Graph Kernels based DMs: A popular approach to learning with graph-structured data is to make use of graph kernels, functions which measure the similarity between graphs [25]. These kernels can be used for comparing graphs. Many different graph kernels have been defined, which focus on different types of substructures in graphs, such as random walks, shortest paths, subtrees, and cycles.

One particular approach is based on graphlets, which are small, connected, non-isomorphic, induced subgraphs of a larger graph. There are 30 graphlets with 2 to 5 vertices. Each graphlet contains "symmetrical vertices" which are said to belong to the same automorphism orbit. The automorphism orbits represent topologically different ways in which a graphlet can touch a node. The Graphlet Degree Vector (GDV) of a node generalises the notion of a node's degree into a 73-dimensional vector where each component of that vector captures the number of times node n is touched by a graphlet at orbit i. Using GDVs one can then define several different GDMs including: relative graphlet frequency distance, graphlet degree distribution agreement, and graphlet correlation matrix/distance.
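To close this review, the sketch below gives minimal NumPy implementations of four of the GDMs above: the Hamming and Jaccard distances, the spectral l^p distance of Eqn. (4), and the NetLSD heat trace. This is an illustrative sketch, not the authors' code, and it assumes weighted adjacency matrices with known node correspondence and no isolated vertices.

```python
import numpy as np

def normalized_laplacian(A):
    # L = I - Dv^{-1/2} A Dv^{-1/2}, Eqn. (2); assumes every vertex has positive degree.
    dinv = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return np.eye(len(A)) - dinv @ A @ dinv

def hamming_dm(A1, A2):
    # D_H = ||A - A~||_1 / (n(n-1)), entrywise 1-norm.
    n = len(A1)
    return np.abs(A1 - A2).sum() / (n * (n - 1))

def jaccard_dm(A1, A2):
    # D_J = 1 - sum_ij min(A_ij, A~_ij) / sum_ij max(A_ij, A~_ij).
    return 1.0 - np.minimum(A1, A2).sum() / np.maximum(A1, A2).sum()

def spectral_dm(A1, A2, p=2):
    # l^p distance between sorted normalized-Laplacian spectra, Eqn. (4).
    l1 = np.sort(np.linalg.eigvalsh(normalized_laplacian(A1)))
    l2 = np.sort(np.linalg.eigvalsh(normalized_laplacian(A2)))
    return (np.abs(l1 - l2) ** p).mean()

def heat_trace(A, taus=(0.1, 1.0, 10.0)):
    # NetLSD-style signature: h_tau = sum_j exp(-lambda_j * tau) at several scales.
    lam = np.linalg.eigvalsh(normalized_laplacian(A))
    return np.array([np.exp(-t * lam).sum() for t in taus])
```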
IV. APPROACH I: INDIRECT HDMS BASED ON GRAPH REPRESENTATION

The first approach we propose for defining HDMs is based on transforming the hypergraph into a graph representation and then invoking the standard GDMs.

A. Graph-based Hypergraph Representation

There are two main ways to transform a hypergraph into the form of a standard graph: clique expansion and star expansion [7], see Figure 1. Once a hypergraph is represented in the form of a standard graph, one can define an appropriate adjacency matrix and graph Laplacian. Rather than first transforming the hypergraph into a graph, some authors define a hypergraph Laplacian directly using analogies from the graph Laplacian. However, it was shown in [7] that several of these direct definitions are special cases of clique or star expansion, which follow from different ways of deriving edge weights for the transformed graph from the hyperedge weights of the hypergraph. Thus, we focus on clique expansion and star expansion.

a) Clique expansion: The clique expansion algorithm constructs a graph G^c = (V, E^c ⊂ V²) from the original hypergraph G = (V, E) by replacing each hyperedge e = (u_1, · · · , u_{d(e)}) ∈ E with an edge for each pair of vertices in the hyperedge: E^c = {(u, v) : u, v ∈ e, e ∈ E}. Note that the vertices in hyperedge e form a clique in the graph G^c. The edge weight w^c(u, v) can be defined in different ways, leading to different clique expansions. The normalized Laplacian of the constructed graph G^c is

L̃^c = I − (D_v^c)^{−1/2} A^c (D_v^c)^{−1/2},

where A^c is the adjacency matrix,

[A^c]_{uv} = w^c(u, v),

and D_v^c is the vertex degree matrix with diagonal entries d^c(u), u ∈ V.

The standard clique expansion minimizes the difference between the edge weight of G^c and the weight of each hyperedge e that contains both u and v, leading to

w^{cs}(u, v) = Σ_e h(u, e) h(v, e) w(e),
d^{cs}(u) = Σ_{e∈E} h(u, e)(d(e) − 1) w(e),

and thus,

A^{cs} = H W H^T,   (9)

L^{cs} = I − (D_v^{cs})^{−1/2} A^{cs} (D_v^{cs})^{−1/2}.   (10)

Choosing the weight matrix [26],

W^{co} = H D_e^{−1} H^T,

and using the combinatorial Laplacian for G^c, leads to Bolla's Laplacian,

L^{co} = D_v − H D_e^{−1} H^T.   (11)

For an unweighted hypergraph G, i.e. w(e) = 1, ∀e ∈ E, other choices of weight matrix have been considered; see [7] for details.

b) Star expansion: The star expansion algorithm constructs a graph G* = (V*, E*) from the hypergraph G = (V, E) by introducing a new vertex for every hyperedge e ∈ E, thus V* = V ∪ E. It connects the new graph vertex e to each vertex in the hyperedge, i.e. E* = {(u, e) : u ∈ e, e ∈ E}. Note that each hyperedge in E corresponds to a star in the graph G*, and G* is a bipartite graph. As in clique expansion, different choices can be made for the edge weights w*(u, e) of G*. In general, the adjacency matrix A* of G* can be expressed as

A* = [ 0_{|V|}  W* ; (W*)^T  0_{|E|} ],

and the normalized Laplacian can be shown to be

L* = [ I  −B* ; −(B*)^T  I ],

where B* is the |V| × |E| matrix

B* = (D_v*)^{−1/2} W* (D_e*)^{−1/2},

and D_v* and D_e* are degree matrices with diagonal entries d*(u) and d*(e), respectively, where

d*(u) = Σ_{e∈E} w*(u, e), u ∈ V,
d*(e) = Σ_{u∈V} w*(u, e), e ∈ E.

Note that since the number of hyperedges, i.e. |E|, can be large, the star expansion can result in a graph G* with a very large number of vertices, making the application of GDMs challenging. Furthermore, even if two hypergraphs G and G̃ are defined on the same node set V to begin with, the star expansions G* and G̃*, respectively, will in general have different sets of vertices.

To alleviate these issues, we propose to use the notion of projected Laplacian as defined in [7]. Note that for any |V| + |E| eigenvector v = [v_v^T, v_e^T]^T of L* that satisfies L*v = λv,

B*(B*)^T v_v = (λ − 1)² v_v.

Thus, the |V| elements of the eigenvectors of L* corresponding to vertices V ⊂ V* are eigenvectors of the |V| × |V| matrix B*(B*)^T,

B*(B*)^T = (D_v*)^{−1/2} W* (D_e*)^{−1} (W*)^T (D_v*)^{−1/2}.


Figure 1. Illustration of hypergraph transformation into a graph by clique and star expansion.
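To illustrate Figure 1 in code, the sketch below assembles the standard clique-expansion matrices of Eqns. (9)-(10), the bipartite star-expansion adjacency A*, and the projected adjacency W*(D_e*)^{−1}(W*)^T from the incidence data. This is a minimal NumPy illustration, with the weight choice w*(u, e) = w(e) made for concreteness; it assumes every vertex lies in at least one hyperedge.

```python
import numpy as np

def clique_expansion(H, w):
    # Standard clique expansion: A^cs = H W H^T (Eqn. (9)) and its
    # normalized Laplacian (Eqn. (10)), with degrees
    # d^cs(u) = sum_e h(u,e)(d(e)-1)w(e).
    A = H @ np.diag(w) @ H.T
    d_e = H.sum(axis=0)
    d = H @ ((d_e - 1.0) * w)
    dinv = np.diag(1.0 / np.sqrt(d))
    return A, np.eye(len(A)) - dinv @ A @ dinv

def star_expansion(H, w):
    # Bipartite star graph on V union E with the choice w*(u,e) = w(e),
    # i.e. W* = H W, together with the |V| x |V| projected adjacency
    # A_p* = W* (D_e*)^{-1} (W*)^T discussed next.
    n, m = H.shape
    Wstar = H * w                        # broadcasts w(e) across column e
    A_star = np.block([[np.zeros((n, n)), Wstar],
                       [Wstar.T, np.zeros((m, m))]])
    d_e_star = Wstar.sum(axis=0)         # d*(e) = sum_u w*(u,e)
    A_proj = Wstar @ np.diag(1.0 / d_e_star) @ Wstar.T
    return A_star, A_proj
```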

Given this relationship between B*(B*)^T and L*,

L_p* = I − B*(B*)^T = I − (D_v*)^{−1/2} A_p* (D_v*)^{−1/2},

can be considered a projected normalized Laplacian on the node set of the original graph G, with

A_p* = W* (D_e*)^{−1} (W*)^T

being the projected adjacency matrix. Note that the eigenvalues of L_p* lie in [0, 1], i.e. 0 = λ_{0p} ≤ · · · ≤ λ_{np} ≤ 1.

We next discuss different choices for the weights w*(u, e). The standard star expansion approach assigns the scaled hyperedge weight w*^s(u, e) = w(e)/d(e) to each corresponding graph edge, so that the weight matrix becomes

W*^s = H W D_e^{−1},

leading to

d*(u) = Σ_{e∈E} h(u, e) w(e)/d(e), u ∈ V,
d*(e) = w(e), e ∈ E,

expressed in terms of the original hypergraph's properties. Another choice is w*(u, e) = w(e), i.e.,

W*^z = H W,

which leads to D_v* = D_v and D_e* = W D_e, and thus results in

A_p*^z = H W D_e^{−1} H^T,   (12)

and

L_p*^z = I − D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2},   (13)

which is the same hypergraph Laplacian as the one proposed by Zhou et al. [27]. This definition of the hypergraph Laplacian originates from a relaxation of the normalized hypergraph cut problem, analogous to the standard normalized graph cut problem. In fact, the eigenvector of L*^z corresponding to its second smallest eigenvalue encodes information about subsets of vertices in the hypergraph which are weakly connected to each other.

In what follows, we will use the standard clique expansion (Eqns. (9) and (10)) and the projected star expansion based on the Zhou et al. construction (Eqns. (12) and (13)) for transforming a hypergraph into a graph for dissimilarity comparison using GDMs.

B. Indirect HDMs

Let G be a hypergraph, and let G_H be the graph obtained by one of the approaches discussed in the previous section. We will denote this transformation as G_H = T(G). Given a transformation T and a GDM D, the indirect HDM D induced by the pair (T, D) is given by

D_{T,D}(G, G̃) ≡ D(T(G), T(G̃)) = D(G_H, G̃_H).   (14)

Note that depending on whether G and G̃ have known or unknown node correspondence, an appropriate D can be chosen from the GDMs discussed in Section III or any other available in the literature. Furthermore, depending on the application one can pick D to capture local, global, mesoscopic or multi-scale differences.

V. APPROACH II: DIRECT HDMS BASED ON TENSOR BASED HYPERGRAPH REPRESENTATION

In this section we propose a second approach for defining HDMs which is based on hypergraph representations that intrinsically capture multi-way relations using tensors.

A. Tensor Preliminaries

A tensor is a multidimensional array [10], [28]–[30]. The order of a tensor is the number of its dimensions, and each dimension is called a mode. An m-th order real valued tensor will be denoted by X ∈ R^{J_1×J_2×···×J_m}, where J_k is the size of its k-th mode.

The inner product of two tensors X, Y ∈ R^{J_1×J_2×···×J_m} is defined as

⟨X, Y⟩ = Σ_{j_1=1}^{J_1} · · · Σ_{j_m=1}^{J_m} X_{j_1 j_2 ... j_m} Y_{j_1 j_2 ... j_m},

leading to the tensor Frobenius norm ||X||² = ⟨X, X⟩. We say two tensors X and Y are orthogonal if the inner product ⟨X, Y⟩ = 0. The matrix tensor multiplication X ×_k A along mode k for a matrix A ∈ R^{I×J_k} is defined by

(X ×_k A)_{j_1 j_2 ... j_{k−1} i j_{k+1} ... j_m} = Σ_{j_k=1}^{J_k} X_{j_1 j_2 ... j_k ... j_m} A_{i j_k}.

This product can be generalized to what is known as the Tucker product,

X ×_1 A_1 ×_2 A_2 ×_3 · · · ×_m A_m ∈ R^{I_1×I_2×···×I_m}.   (15)
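The mode-k product and the Tucker product (15) reduce to ordinary tensor contractions, realizable for example with numpy.tensordot; the following minimal sketch uses 0-based mode indices (the text counts modes from 1).

```python
import numpy as np

def mode_k_product(X, A, k):
    # (X x_k A): contract mode k of X with the second index of A,
    # then move the resulting new index back into position k.
    return np.moveaxis(np.tensordot(A, X, axes=(1, k)), 0, k)

# Tucker product of a core tensor with three factor matrices, Eqn. (15).
S = np.random.rand(2, 3, 4)
U1, U2, U3 = np.random.rand(5, 2), np.random.rand(6, 3), np.random.rand(7, 4)
X = mode_k_product(mode_k_product(mode_k_product(S, U1, 0), U2, 1), U3, 2)
assert X.shape == (5, 6, 7)
```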

Higher-Order Singular Value Decomposition (HOSVD) is a multilinear generalization of the matrix SVD to tensors [31]. The HOSVD of a tensor X ∈ R^{J_1×J_2×···×J_m} is given by:

X = S ×_1 U_1 ×_2 · · · ×_m U_m,   (16)

where U_k ∈ R^{J_k×R_k} are orthogonal matrices, and S ∈ R^{R_1×R_2×···×R_m} is called the core tensor. The quantity R_k ≤ J_k is referred to as the k-mode multilinear rank of X, and equals the rank of the k-mode matrix unfolding of X. The subtensors S_{j_k=α} of S, obtained by fixing the k-th mode to α, have the following properties:

1) all-orthogonality: two subtensors S_{j_k=α} and S_{j_k=β} are orthogonal for all possible values of k, α and β subject to α ≠ β;
2) ordering: ||S_{j_k=1}|| ≥ · · · ≥ ||S_{j_k=J_k}|| ≥ 0 for all possible values of k.

The Frobenius norms ||S_{j_k=j}||, denoted by γ_j^{(k)}, are known as the k-mode singular values of X. De Lathauwer et al. [31] showed that the number of nonvanishing k-mode singular values of a tensor is equal to its k-mode multilinear rank, i.e. R_k. The HOSVD can be computed using a sequence of matrix SVDs, and by introducing SVD truncations yields a quasi-optimal solution to the low multilinear rank approximation problem.

Consider an m-th order n-dimensional cubical (i.e. with equal sizes J_i = n, i = 1, · · · , m in all modes) tensor A ∈ R^{n×n×···×n}. A is called supersymmetric if A_{i_1,··· ,i_m} = A_{σ(i_1,··· ,i_m)} for all σ ∈ Σ_m, the symmetric group of m indices. For an n-vector x = (x_1, · · · , x_n)^T, real or complex, define an n-vector via the Tucker product:

A x^{m−1} = A ×_2 x ×_3 · · · ×_m x.

There are many different notions of tensor eigenvalues/eigenvectors [32], [33]. A pair (λ, x) ∈ R × (R^n \ {0}) is called

• an H-eigenvalue/eigenvector (or H-eigenpair) of A if they satisfy

A x^{m−1} = λ x^{[m−1]},   (17)

where (x^{[m−1]})_i = x_i^{m−1};

• a Z-eigenvalue/eigenvector (or Z-eigenpair) of A if they satisfy

A x^{m−1} = λ x,
x_1² + · · · + x_n² = 1;   (18)

• an l^p-eigenvalue/eigenvector (or l^p-eigenpair) of A for any p > 0 if they satisfy

A x^{m−1} = λ x^{[p−1]},
x_1^p + · · · + x_n^p = 1.   (19)

Note that l^p-eigenvalues/eigenvectors reduce to H-eigenvalues/eigenvectors and Z-eigenvalues/eigenvectors for p = m and p = 2, respectively, where we note that the constraint in (19) is superfluous for p = m. It was proved in [32] that H-eigenvalues and Z-eigenvalues exist for an even order real supersymmetric tensor. A numerical procedure for computing H-eigenvalues is provided in [34], with an associated MATLAB toolbox [35]. The procedure involves a homotopy continuation type method which can be computationally intensive, thus making it challenging to scale to large order/size tensors.

B. Tensor Based Hypergraph Representation

We follow the tensor based formulation proposed in [11] to define the hypergraph adjacency tensor and Laplacian tensor. Let G = (V, E, w(·)) be a weighted hypergraph with n vertices, and let k be the maximum cardinality of the hyperedges, i.e. k = max{|e| : e ∈ E}.

The adjacency tensor A ∈ R^{n×n×···×n} of G, which is a k-th order n-dimensional supersymmetric tensor, is defined as

A_{j_1 j_2 ... j_k} = s w(e)/α if e = (i_1, i_2, . . . , i_s) ∈ E, and 0 otherwise,   (20)

where j_1 j_2 . . . j_k are chosen in all possible ways from {i_1, i_2, . . . , i_s} with at least one appearance of each element of the set, and

α = Σ_{k_1,··· ,k_s ≥ 1, Σ_{i=1}^{s} k_i = k} k! / (Π_{l=1}^{s} k_l!).

Using the adjacency tensor, the degree d(v_i) of a vertex v_i ∈ V can be expressed as

d(v_i) = Σ_{j_1 j_2 ... j_{k−1} = 1}^{n} A_{i j_1 j_2 ... j_{k−1}}.   (21)

The choice of the nonzero coefficients s w(e)/α preserves the degree of each node, i.e., the degree of node j computed using (21) with the weights as defined above equals the number of hyperedges containing the node in the original non-uniform hypergraph. Note that for a k-uniform hypergraph, the above definition simplifies to

A_{j_1 j_2 ... j_k} = w(e)/(k−1)! if e = (i_1, i_2, . . . , i_k) ∈ E, and 0 otherwise.   (22)

Let D be the k-th order n-dimensional super-diagonal tensor with nonzero elements d_{ii···i} = d(v_i). The hypergraph Laplacian tensor is defined as

L = D − A,   (23)

which is also a k-th order n-dimensional supersymmetric tensor. Similarly, a normalized hypergraph Laplacian tensor can also be defined. We recall a result from [11], which establishes the following properties of L (analogous to the case of the graph Laplacian):

• L has an H-eigenvalue 0 with eigenvector v = (1, 1, · · · , 1)^T ∈ R^n. Moreover, 0 is the unique H^{++}-eigenvalue of L;
• Δ is the largest H^+-eigenvalue of L, where Δ is the maximum node degree of G;
• (d(v_i), e_j) is an H-eigenpair, where e_j ∈ R^n are the standard basis vectors.
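For a k-uniform hypergraph, Eqns. (22)-(23) can be materialized directly, though the dense n^k storage limits this to small examples; a minimal sketch (function names are illustrative):

```python
import numpy as np
from itertools import permutations
from math import factorial

def adjacency_tensor(n, k, edges, w=None):
    # Eqn. (22): A[j1..jk] = w(e)/(k-1)! for every ordering of a hyperedge.
    w = [1.0] * len(edges) if w is None else w
    A = np.zeros((n,) * k)
    for e, we in zip(edges, w):
        for idx in permutations(e):
            A[idx] = we / factorial(k - 1)
    return A

def laplacian_tensor(A):
    # Eqn. (23): L = D - A with super-diagonal degrees d(v_i) from Eqn. (21).
    k, n = A.ndim, A.shape[0]
    d = A.reshape(n, -1).sum(axis=1)    # sum over the remaining k-1 modes
    L = -A.copy()
    for i in range(n):
        L[(i,) * k] += d[i]
    return L

# Example: 3-uniform hypergraph on 4 vertices with hyperedges {0,1,2}, {1,2,3}.
L = laplacian_tensor(adjacency_tensor(4, 3, [(0, 1, 2), (1, 2, 3)]))
```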

The H-eigenvalues of L thus encode global structural properties of a hypergraph, and we propose to use them in generalizing spectral GDMs to hypergraphs.

We next discuss the HOSVD of L and associated properties. Since L is supersymmetric, any mode unfolding of L yields the same unfolding matrix with the same singular values, which we denote by γ_j, j = 1, · · · , n (note that we have removed the dependence of γ_j^{(k)} on the mode k). It was shown in [12] that the singular values of L encode structural properties of the hypergraph, such as vertex degrees, path lengths, clustering coefficients and nontrivial symmetricity for uniform hypergraphs, and thus can be used to quantify differences in hypergraph structure. Moreover, a fast and memory efficient tensor train decomposition (TTD)-based computational framework was developed in [12] to compute the singular values for uniform hypergraphs. Given these two desirable features, we also propose to use the singular values of L as an alternative to H-eigenvalues in defining spectral HDMs.

The notion of centrality has been generalized to hypergraphs. H/Z-eigenvectors of the adjacency tensor A are used to define hypergraph eigenvector-centrality [36]. In particular, the H/Z-eigenvector centrality c ∈ R^n is defined to be an H/Z-eigenvector of the adjacency tensor A such that it is positive, i.e. c > 0, with a positive H/Z-eigenvalue, i.e. λ > 0. By the Perron-Frobenius theorem for non-negative tensors [37], such positive H/Z-eigenvectors exist under certain irreducibility conditions on A. While such a positive Z-eigenpair may not be unique, the H-eigenpair is always unique up to scaling. Along similar lines, the authors in [38] define node and hyperedge centralities as vectors c ∈ R^n and e ∈ R^m, respectively, that satisfy

c λ = g(H W f(e)),   (24)
e μ = ψ(H^T φ(c)),   (25)
s.t. c, e > 0, λ, μ > 0.   (26)

In the above system of equations, H is the incidence matrix and W is the hyperedge weight matrix for G (and we have assumed that all vertices have unit weight, consistent with the setting in this paper), and g, f, φ, ψ : R_+ → R_+ are appropriately chosen non-negative functions on the non-negative real domain. Furthermore, note that these scalar functions are extended to vectors by defining them as mappings that act in a componentwise fashion. By invoking the Perron-Frobenius theorem for multi-homogeneous mappings [39], it was proved that under certain conditions on the scalar functions the solution of the above system exists and is unique. A nonlinear power method with convergence guarantees is proposed to solve the above system. Furthermore, it is shown that with the choices f(x) = x, g(x) = x^{1/(p+1)}, φ(x) = e^x and ψ(x) = ln(x), the node centrality vector c is also an l^p-tensor eigenvector, thus further generalizing the notion of H/Z-eigenvector centrality.

C. Direct HDMs

For the tensor based representation we define a set of DMs along similar lines as discussed in Section III. Let G and G̃ be two hypergraphs with the same node set and the same maximum hyperedge cardinality, and let (A, L) and (Ã, L̃) be the corresponding adjacency and Laplacian tensors, respectively.

• Structural HDMs: It is straightforward to generalize the Hamming and Jaccard distances for graphs to the tensor based representation as follows:

D_H(G, G̃) = ||A − Ã||_1 / N,

where

||A||_1 = Σ_{i_1=1}^{n} Σ_{i_2=1}^{n} · · · Σ_{i_k=1}^{n} |A_{i_1,··· ,i_k}|

is the tensor 1-norm and N = n^k − n is a normalization constant, and

D_J(G, G̃) = 1 − Σ_{j_1 j_2 ... j_k} min(A_{j_1 j_2 ... j_k}, Ã_{j_1 j_2 ... j_k}) / Σ_{j_1 j_2 ... j_k} max(A_{j_1 j_2 ... j_k}, Ã_{j_1 j_2 ... j_k}),

respectively.

• Feature Based HDMs: As in the feature based GDMs, we can use specific "features" of the hypergraph, such as the node degree distribution, different notions of centrality, diameter, etc. for use in comparing hypergraphs. If we are using node dependent features, the method aggregates a feature-vertex matrix of size k × n, where k is the number of features selected. This feature-vertex matrix for the two hypergraphs can then be directly compared, or can be further reduced to a "signature vector" as in the graph case, and used to obtain a DM between hypergraphs. As in the graph case (see Section III) we propose to use tensor based hypergraph centrality as the feature for comparison. Let c_i, i = 1, · · · , n and c̃_i, i = 1, · · · , n be normalized (i.e. ||c||_1 = ||c̃||_1 = 1) node centralities for G and G̃, respectively; then the centrality based HDM is given by

D_C(G, G̃) = (1/n) Σ_{i=1}^{n} |c_i − c̃_i|.   (27)

While in the above definition one could use any notion of hypergraph centrality, we propose to use the node centrality defined by (24)-(26) in our application.

• Spectral HDMs: Let the ordered set of H-eigenvalues of L be λ_1, · · · , λ_q, i.e. λ_1 ≤ λ_2 ≤ · · · ≤ λ_q, and similarly let λ̃_1, · · · , λ̃_q̃ be the ordered set for L̃. Note that in general q ≠ q̃. Without loss of generality, assume q̃ > q and define an extended set of H-eigenvalues λ̄_1, · · · , λ̄_q̃ for G, where λ̄_i = 0 for i ≤ q̃ − q, and λ̄_i = λ_{i−(q̃−q)} for i = q̃ − q + 1, · · · , q̃. The l^p distance on the space of H-eigenvalues can then be defined as

D_λ(G, G̃) = (1/q̃) Σ_{i=1}^{q̃−1} |λ̄_i − λ̃_i|^p.   (28)

As discussed above, we similarly propose to use the higher order singular values, leading to

D_γ(G, G̃) = (1/n) Σ_{i=1}^{n−1} |γ_i − γ̃_i|^p,   (29)

where γ_i, i = 1, · · · , n and γ̃_i, i = 1, · · · , n are the higher order singular values of L and L̃, respectively. We will refer to D_λ and D_γ as the spectral-H and spectral-S HDMs, respectively. (A code sketch of the structural and spectral-S HDMs is given at the end of this section.)

• Hypergraph Embedding based HDMs: Recently, graph convolutional neural networks have been extended to hypergraphs [40], [41]. Thus, as for graphs, one can learn an embedding from a set of hypergraphs into Euclidean space, and then compute a distance between the embedded hypergraphs.

• Hypergraph Kernel based HDMs: The notion of graph kernels has been generalized to hypergraphs, see for example [42], [43]. These kernels can be used for comparing hypergraphs as in the graph case.
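As a minimal illustration of the structural and spectral-S HDMs above (assuming both hypergraphs share n and k), the sketch below computes the tensor Hamming distance and compares mode-unfolding singular values; for simplicity the spectral average runs over all n singular values, a slight departure from the convention printed in Eqn. (29).

```python
import numpy as np

def tensor_hamming(A1, A2):
    # ||A - A~||_1 / (n^k - n), with the tensor entrywise 1-norm.
    n, k = A1.shape[0], A1.ndim
    return np.abs(A1 - A2).sum() / (n ** k - n)

def spectral_s(L1, L2, p=2):
    # Spectral-S HDM in the spirit of Eqn. (29): l^p distance between the
    # singular values of a mode unfolding (any mode works for a
    # supersymmetric tensor); here averaged over all n singular values.
    n = L1.shape[0]
    g1 = np.linalg.svd(L1.reshape(n, -1), compute_uv=False)
    g2 = np.linalg.svd(L2.reshape(n, -1), compute_uv=False)
    return (np.abs(g1 - g2) ** p).mean()
```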

VI. NUMERICAL STUDIES

In this section we assess the performance of the indirect and direct HDMs on synthetic hypergraphs and real world biological datasets. For these studies we have chosen one representative example each of a local, global, mesoscopic and multi-scale HDM, namely, the Hamming HDM (local), spectral HDM (global), centrality based HDM (mesoscopic) and deltaCon HDM (multi-scale).

A. Synthetic Hypergraphs

To generate synthetic hypergraphs, we consider three families of generative models: Erdős–Rényi (ER), Barabási–Albert (BA) and Watts–Strogatz (WS). These three models are widely used as test-beds in a variety of network science problems and have varying structural complexity. The ER model [44] leads to a "structureless" graph in the sense that the statistical properties of each edge and vertex in the graph are exactly the same. In the BA model [45], on the other hand, the node degree distribution behaves as a power-law due to preferential attachment, and that impacts both its global and local structure. On the local scale, vertices in the graph tend to connect exclusively to the highest-degree vertices in the graph, rather than to one another, generating a tree-like topology. The high-degree vertices act like hubs, which by definition are global structures as they touch a significant portion of the rest of the graph, thereby increasing the connectivity throughout the graph. The WS model [46] on a global scale looks like an uncorrelated random graph, in that it exhibits no communities or high-degree vertices but has a small average shortest path length between vertices, while at the local scale it shows high clustering compared to the BA model.

The three models, originally developed for graphs, have been generalized to the hypergraph case. We will restrict to procedures for the construction of k-uniform hypergraphs in each family. The user specifies the desired number of vertices n, the desired number of hyperedges m, and some additional parameters depending on the model, as discussed below.

a) Erdős–Rényi Hypergraph (ERH): There are (n choose k) possible hyperedges in a k-uniform hypergraph. To construct a random k-uniform hypergraph, we uniformly sample m hyperedges from this set without repetition.

b) Scale Free Hypergraph (SFH(µ)): To construct a k-uniform SFH, we follow the generative model from [47]:

i. Assign each node a probability p_i as:

p_i = i^{−µ} / Σ_{j=1}^{n} j^{−µ}, i = 1, · · · , n,

where 0 < µ < 1 is a user chosen parameter.
ii. Select k distinct vertices with probabilities p_{i_1}, · · · , p_{i_k}. If the hypergraph does not already contain a hyperedge on those chosen k vertices, then add the hyperedge to the hypergraph.
iii. Repeat step ii) m times.

Similar to the BA graph, this procedure produces a hypergraph whose average node degrees ⟨d⟩ follow a power-law distribution, P_k(⟨d⟩) ∼ ⟨d⟩^{−λ} with λ = 1 + 1/µ.

c) Watts–Strogatz Hypergraph (WSH(p)): Using a procedure similar to that for WS graphs, a WSH is constructed as follows [12]:

i. Construct a d-regular k-uniform hypergraph with n vertices, and add extra hyperedges on every k + 1 vertices. We refer to the hyperedges in this hypergraph as the initial hyperedges.
ii. Select an initial hyperedge and generate a new hyperedge with k vertices chosen uniformly at random. If the new hyperedge does not exist, with probability p replace the selected hyperedge with the new hyperedge. Here 0 < p < 1 is a rewiring probability specified by the user.
iii. Repeat step ii) until all initial hyperedges have been iterated on.

A code sketch of the ERH and SFH samplers follows at the end of this subsection. Figure 2 shows a realization of each of these three hypergraph models with k = 4, n = 100 and m = 125. To assess whether these hypergraph models possess structural properties similar to their graph counterparts, we compare their node degree distribution, average path length, and clustering coefficient. For computing the average path length, we use:

L_a = (1/(n(n − 1))) Σ_{j≠i} d(v_j, v_i),   (30)

where d(v_j, v_i) denotes the shortest distance between vertices v_j and v_i. Note that in computing the shortest distance, two vertices are considered adjacent if they share a common hyperedge. For the clustering coefficient we use the definition from [12],

C_j = |{e_{i_1 i_2 ··· i_k} : v_{i_1}, v_{i_2}, · · · , v_{i_k} ∈ V_j, e_{i_1 i_2 ··· i_k} ∈ E}| / (|V_j| choose k),
⇒ C_a = (1/n) Σ_{j=1}^{n} C_j,   (31)

where V_j is the set of vertices that are immediately connected to v_j, i.e. share a hyperedge with node v_j, and (|V_j| choose k) = |V_j|!/((|V_j| − k)! k!) is the binomial coefficient. If |V_j| < k, we set C_j = 0.

As can be seen from Fig. 2, the ERH and WSH constructions result in hypergraphs with almost homogeneous degree distributions. The SFH, on the other hand, shows a power-law distribution for node degrees, as discussed above. The WSH model shows a high clustering coefficient compared to ERH and SFH, as expected.
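A minimal sketch of the ERH sampler, with the SFH variant obtained by swapping in the power-law probabilities of step i; hyperedges are stored as sorted tuples so that duplicates are rejected as in step ii. Function names are illustrative.

```python
import numpy as np

def sample_uniform_hypergraph(n, k, m, p=None, seed=0):
    # p = None gives the ERH (uniform) model; passing the SFH probabilities
    # p_i proportional to i^(-mu) gives the scale-free variant.
    rng = np.random.default_rng(seed)
    edges = set()
    while len(edges) < m:
        e = tuple(sorted(rng.choice(n, size=k, replace=False, p=p)))
        edges.add(e)                    # set membership rejects repeats
    return list(edges)

mu = 0.5
p = np.arange(1, 41) ** -mu
erh = sample_uniform_hypergraph(n=40, k=4, m=50)
sfh = sample_uniform_hypergraph(n=40, k=4, m=50, p=p / p.sum())
```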

Figure 2. Examples of 4-uniform ERH, WSH and SFH with n = 80 and m = 100. Also shown are node degree distribution, average path length and clustering coefficients for each case. The red polygons represent a selected hyperedge.

B. Performance on Synthetic Hypergraphs

We next assess and compare the effectiveness of the different HDMs in differentiating hypergraphs with distinct structural features, i.e., originating from the different models. In other words, to yield a good performance, an HDM should assign a small distance to hypergraph pairs coming from the same model but large HDM values to pairs coming from different models.

A systematic quantification of the performance can be accomplished via the receiver operating characteristic (ROC) curve [48]. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. For a given HDM, one defines a threshold ε > 0 and classifies two hypergraphs as belonging to the same model class if their HDM value is less than ε. Given that the correct classes are known, TPR and FPR values can be computed to quantify the accuracy of classifying all the hypergraphs. The procedure is then repeated by varying ε, obtaining the ROC curve which, ideally, should have TPR equal to 1 for any FPR value. Furthermore, one can compute the area under the curve (AUC), which equals the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. Thus, for a perfect classifier, AUC = 1, while for a classifier that randomly assigns observations to classes, AUC = 0.5. (A code sketch of this construction follows below.)

For testing purposes, we consider hypergraph size n = 40, m = 50 and generate 25 networks for each of the three types, resulting in a total population of N_g = 75 hypergraphs. We then compute all the pairwise HDMs using each method, ending up with an N_g × N_g distance matrix for each case. We then generate ROC curves for each HDM considered using the procedure discussed above. In addition, to create a visual aid, we also generate a 2d embedding of each hypergraph in the population by applying t-distributed stochastic neighbor embedding (tSNE) [49] to the distance matrix corresponding to each HDM. Figure 3 shows the ROC curves, while the 2d embeddings are shown in Figure 4, where we have labeled each point corresponding to a different instance of a hypergraph using a distinct color based on its known model condition. Note that for an unweighted uniform hypergraph, the adjacency matrices for the clique (Eqn. (9)) and star expansion (Eqn. (12)) become the same up to a scale factor. Hence, we find that the ROC curves for the Hamming and centrality based indirect HDMs are similar.

Figure 3. ROC curves and AUC for different HDMs: Hamming (blue), Spectral (red) and Centrality (orange).

Figure 4. Embedding of the different hypergraphs in 2d using tSNE applied to the HDM distance matrices: ERH (red), WSH (green) and SFH (blue).
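A sketch of the ROC construction described above: given the N_g × N_g distance matrix and the known model labels, sweep the threshold ε over all observed pair distances; variable names are hypothetical.

```python
import numpy as np

def roc_auc(D, labels):
    # Classify a pair as "same model" when its HDM value is at most eps,
    # sweeping eps over all observed pairwise distances.
    iu = np.triu_indices_from(D, k=1)          # each unordered pair once
    d = D[iu]
    same = labels[iu[0]] == labels[iu[1]]
    eps = np.sort(np.unique(d))
    tpr = np.array([(d[same] <= t).mean() for t in eps])
    fpr = np.array([(d[~same] <= t).mean() for t in eps])
    return fpr, tpr, np.trapz(tpr, fpr)        # area under the ROC curve
```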

C. Test on Real Datasets

In this section we assess the performance of HDMs in grouping sets of real-world networks. In order to assess statistical significance while comparing two hypergraphs, we propose to use the permutation test.

1) Permutation Test for HDM: Consider the hypothesis testing problem:

• Null (H_0): G_1 and G_2 are similar,
• Alternative (H_1): G_1 and G_2 are dissimilar.

Since the null distribution for H_0 is unknown, we use a permutation test [50] to empirically estimate it. Let D be any of the HDMs, and let p_s be the desired significance level of the test. The steps in the permutation test involve:

• Step 1: Randomly generate a family of hypergraphs {G_i^r}_{i=1}^{N} which are similar to G_1, and compute D_i = D(G_1, G_i^r), i = 1, · · · , N.
• Step 2: Compute D_{12} = D(G_1, G_2).
• Step 3: Compute the p-value as p = (1/N) Σ_{i=1}^{N} I_{D_{12}}(D_i), where I_z is the indicator function, i.e. I_z(x) = 1 if x > z and I_z(x) = 0 otherwise.
• Step 4: Reject H_0 if p ≤ p_s.

In Step 1 one could use the Erdős–Rényi (ER) or Chung–Lu (CL) procedure [8] to randomly generate hypergraphs with similar characteristics as G_1. Let d_v^1 = (d(v_1), · · · , d(v_n)) and d_e^1 = (d(e_1), · · · , d(e_m)) be the vertex degree and hyperedge size distribution vectors of G_1. Let c = Σ_{i=1}^{n} d(v_i) = Σ_{i=1}^{m} d(e_i), and let the vertex-hyperedge membership probability in G_1 be p, where

p = c/(mn).

The ER procedure selects vertices uniformly at random for each hyperedge with probability p. Thus, for each of the nm vertex-hyperedge pairs, the probability of membership is the same, i.e.

P(u ∈ e) = p.

On the other hand, the CL procedure generates G^r with similar vertex degree and hyperedge size distributions as those of G_1. The probability that a vertex belongs to a hyperedge in G^r is proportional to the product of the desired vertex degree and hyperedge size, i.e.

P(u ∈ e) = d(u) d(e)/c.

To ensure this probability is always less than 1, one may further require that the input sequences satisfy max_{i,j} d(u_i) d(e_j) ≤ c. Note that this procedure will in general produce a non-uniform hypergraph depending on the distribution of d_e^1. For a k-uniform hypergraph, d(e_i) = k, i = 1, · · · , m. To sample a k-uniform hypergraph with a given node degree distribution d_v^1, we modify the CL process as follows (a code sketch follows below):

i. Assign each node a probability p_i as: p_i = d(v_i)/c, i = 1, · · · , n.
ii. Select k distinct vertices with probabilities p_{i_1}, · · · , p_{i_k}. If the hypergraph does not already contain a hyperedge on the chosen k vertices, then add the hyperedge to the hypergraph.
iii. Repeat step ii) m times.
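The permutation test and the modified CL sampler are straightforward to combine; below is a minimal sketch in which dm is any HDM callable and sampler() draws a surrogate similar to G_1. The interface is hypothetical, not the paper's code.

```python
import numpy as np

def sample_cl_uniform(d_v, k, m, seed=0):
    # Steps i-iii: draw k distinct vertices with p_i = d(v_i)/c,
    # reject duplicate hyperedges, repeat m times.
    rng = np.random.default_rng(seed)
    p = np.asarray(d_v, float)
    p /= p.sum()                                # c = sum_i d(v_i)
    edges = set()
    while len(edges) < m:
        edges.add(tuple(sorted(rng.choice(len(p), size=k, replace=False, p=p))))
    return list(edges)

def permutation_test(G1, G2, dm, sampler, N=100):
    null = np.array([dm(G1, sampler()) for _ in range(N)])   # Step 1
    d12 = dm(G1, G2)                                         # Step 2
    return (null > d12).mean()                               # Step 3: p-value

# Step 4: reject H0 at level p_s if permutation_test(...) <= p_s.
```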

2) Mouse Neuron Endomicroscopy: The mouse endomicroscopy dataset is an imaging video created under 10-minute periods of feeding, fasting and re-feeding using fluorescence across space and time in a mouse hypothalamus [4], [12], [13]. Twenty neurons are recorded with individual levels of "firing". Similar to [12], we want to quantitatively differentiate the three phases. First, we compute the multi-correlation among every three neurons, which is defined by

ρ = (1 − det(R))^{1/2},   (32)

where R ∈ R^{3×3} is the correlation matrix of the three neuron activity levels [51]. When the multi-correlation is greater than a prescribed threshold, we build a hyperedge among the three neurons and assign it a hyperedge weight equal to ρ. We use a threshold of 0.93, as used in [12], for this purpose.

Figure 5. Mouse neuron endomicroscopy features. (A), (B) and (C) Neuronal activity networks of the three phases - fed, fast and re-fed - which depict the spatial location and size of individual neurons. Each 2-simplex (i.e., a triangle) represents a hyperedge. The cutoff threshold is 0.93 for the hypergraph model.

Figure 6 shows the comparison of the hypergraphs corresponding to the different phases using various HDMs. We find similar trends using both indirect and direct HDMs. The spectral and centrality HDMs between the fed and refed phases have smaller values, revealing more similarity at global and mesoscopic scales compared to the corresponding HDM values between the fed and fast, and refed and fast phases. On the other hand, the Hamming HDM reveals that the fed and fast phases are more similar at local scale compared to the fed and refed phases. The * on the bars implies that there is a statistically significant difference between the hypergraphs in the corresponding two phases based on p-values from the permutation test.

3) Genomic Dataset: We next apply the HDM framework to compare the genomic structure of two cell types. Genomic DNA must be folded to fit inside a nucleus, but must remain accessible for gene transcription, replication and repair [52], [53]. Consequently, higher-order chromatin structure arises from such combinatorial physical interactions of many genomic loci. Recently, the authors in [54] proposed to represent such higher-order chromatin structure by a hypergraph, where the different loci in the genome are the vertices, and each multi-way contact between a set of loci represents a hyperedge. Furthermore, they used Pore-C [55], a recent method developed by Oxford Nanopore Technologies, to measure these multi-way contacts directly and construct the hypergraph experimentally.

Clique Star Clique Star Tensor Hamming 2.3 × 10−2(∗) 2 × 10−3(∗) 0.04 0.2 0.1 Spectral 6.2 × 10−4(∗) 1.6 × 10−3(∗) deltaCon 4.7 × 10−7(∗) 1.6 × 10−7(∗) 0.1 0.05 0.02 Centrality 2.5 × 10−6(∗) 2.9 × 10−6(∗)

Hamming Table I 0 0 0 INDIRECT HDMSVALUESBETWEENFULLGENOMEOFTHE FB AND GM. THE * IN BRACKET IMPLIES THAT DIFFERENCE BETWEEN FB AND GM IS -3 10 STATISTICALLY SIGNIFICANT BASED ON p− VALUES FROM THE 0.05 5 PERMUTATION TEST.

finest level of base-pair position in the genome, it is often convenient to aggregate this information at a coarser resolution by aggregating linear contiguous segments of the genome; see [54] for details.

Figure 6. Comparison of different phases of mouse feeding activity using different HDMs: blue (fed-fast), red (fed-refed), and orange (refed-fast). A * on a bar indicates a statistically significant difference between the hypergraphs in the corresponding two phases, based on p-values from the permutation test.

Figure 7 shows the visualization of the incidence matrices of the hypergraphs derived for the human fibroblast (FB) and B lymphocyte (GM) cell lines at 25 Mb resolution after noise reduction. The entire genome at this resolution consists of n = 3,102 vertices for both cell types, with m = 836,571 hyperedges for FB and m = 1,028,694 for GM. The maximum hyperedge cardinality is 40 for FB and 90 for GM. To compare chromosomes individually, we also construct a separate hypergraph for each chromosome, comprising intra-chromosomal contacts only. At the chosen 25 Mb scale, chromosome 1 has the largest number of vertices (n = 249) and chromosome 22 the smallest (n = 51). The number of hyperedges differs by chromosome, taking values in the range [1,000, 35,000] for FB and [6,000, 75,000] for GM. Moreover, the maximum hyperedge cardinality also differs between corresponding chromosomes of GM and FB. As a result we cannot use the tensor-based direct HDMs, and we restrict to indirect HDMs for this comparison.

In Figure 8 we show the chromosome-level comparison using indirect HDMs based on the clique and star expansions. We find that the trends for the two expansions are similar for the Hamming, DeltaCon, and centrality HDMs, while they differ for the spectral HDM. Furthermore, at the local scale chromosome 16 differs most between FB and GM, while chromosomes 19 and 21 differ the most at the mesoscale. The clique-spectral HDM reveals that chromosome 23 differs most at the global scale, while the star-spectral HDM indicates that chromosomes 19, 20, 21, and 22 are the most different.

Table I shows the values of the different indirect HDMs between FB and GM for the entire genome. The p-values suggest that FB and GM are dissimilar at all scales based on the clique expansion. On the other hand, using the star expansion, FB and GM differ at the local and global scales, and look similar at the mesoscopic scale.

D. Discussion

We first discuss the pros and cons of indirect and direct HDMs. While indirect HDMs allow one to leverage a large variety of GDMs for hypergraph comparison, the conversion of a hypergraph into its clique/star representation is lossy and thus may be unable to discern certain structural differences or similarities between two hypergraphs. Direct HDMs, being based on a tensor representation, do not suffer from this limitation. However, tensor computations (e.g., of tensor eigenvalues/singular values) can be challenging for hypergraphs with a large number of vertices and/or a high maximum hyperedge cardinality. Moreover, direct HDMs can only be applied when the underlying hypergraphs have the same number of vertices and the same maximum hyperedge cardinality. Indirect HDMs, by contrast, are computationally less demanding and can be employed even if the hypergraphs have different maximum hyperedge cardinalities. Moreover, by restricting to GDMs that can compare graphs with different numbers of vertices or unknown node correspondence, indirect HDMs can also be applied to hypergraphs with different numbers of vertices and/or unknown node correspondence. In terms of their ability to assess structural differences or similarities between two hypergraphs, our numerical studies show that either approach can be effective depending on the application.

We are currently exploring the application of line expansion [56], which has recently been proposed as an alternative approach for transforming a hypergraph into a graph. Compared to the clique or star expansion, line expansion incurs no information loss during the transformation, thus potentially providing a more effective basis for developing indirect HDMs. Addressing the computational challenges associated with tensor-based HDMs will also be important for scaling the approach to larger problems. In addition, alternatives to the tensor-based representation, such as higher-order random walk based hypergraph analysis [8], provide another potential avenue for developing new HDMs. While notions of graph kernels and graph embeddings have been extended to hypergraphs [40]–[43], further investigation is warranted for their application in hypergraph comparison. Application of the proposed HDMs to other domains, e.g., cyber security and social networks, is another direction for future research.
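To make the indirect pipeline concrete, the following is a minimal sketch (our illustration, not the implementation used in this paper) of a clique-expansion-based spectral HDM, assuming each hypergraph is supplied as a binary vertex-by-hyperedge incidence matrix; the function names are hypothetical.

```python
import numpy as np

def clique_expansion(H):
    """Weighted clique expansion of a hypergraph given by a binary
    incidence matrix H (vertices x hyperedges): entry (i, j) counts
    the hyperedges containing both vertices i and j."""
    A = (H @ H.T).astype(float)
    np.fill_diagonal(A, 0.0)  # remove self-loops
    return A

def top_eigenvalues(A, k):
    """k largest eigenvalues of a symmetric matrix, in descending order."""
    return np.sort(np.linalg.eigvalsh(A))[::-1][:k]

def clique_spectral_hdm(H1, H2, k=20):
    """Indirect spectral HDM: distance between the leading spectra of the
    two clique expansions (naive version; assumes a common vertex set)."""
    return np.linalg.norm(top_eigenvalues(clique_expansion(H1), k)
                          - top_eigenvalues(clique_expansion(H2), k))
```

A star-expansion variant would instead compare the bipartite adjacency matrices [[0, H], [Hᵀ, 0]] of the two star graphs; truncating both spectra to k values is what lets the two hypergraphs have different numbers of hyperedges, as in the FB/GM comparison above.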

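The p-values behind Table I and Figure 6 come from a permutation test [50]. Below is a generic sketch of one such test, under the assumption that the null ensemble is generated by pooling the hyperedges of the two hypergraphs (on a common vertex set) and reassigning them at random; the resampling scheme actually used in the paper may differ.

```python
import numpy as np

def permutation_pvalue(H1, H2, hdm, n_perm=1000, seed=0):
    """One-sided permutation p-value for an observed dissimilarity hdm(H1, H2).
    H1, H2 are incidence matrices whose columns (hyperedges) are pooled and
    randomly re-split into groups of the original sizes under the null."""
    rng = np.random.default_rng(seed)
    observed = hdm(H1, H2)
    pooled = np.hstack([H1, H2])        # columns are hyperedges
    m1 = H1.shape[1]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[1])
        exceed += hdm(pooled[:, idx[:m1]], pooled[:, idx[m1:]]) >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```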

Figure 7. (A) Incidence matrix visualization of the top 10 most common multi-way contacts per chromosome, for fibroblasts. Matrices are constructed at 25 Mb resolution. Highlighted boxes indicate example intra-chromosomal contacts (red), inter-chromosomal contacts (magenta), and combinations of intra- and inter-chromosomal contacts (blue). Examples for each type of contact are shown in the top right corner. Genomic loci that do not participate in the top 10 most common multi-way contacts for fibroblasts or B lymphocytes were removed from the incidence plots. (B) Similar plot for B lymphocytes.
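For context on how such incidence matrices arise, here is a sketch of aggregating multi-way contacts to a fixed genomic resolution; the input format (each contact as a list of base-pair positions) and the helper name are assumptions for illustration, not the pipeline of [54].

```python
import numpy as np

def contacts_to_incidence(contacts, chrom_length, resolution=25_000_000):
    """Bin multi-way contacts into a binary incidence matrix.
    Each contact is a list of base-pair positions on one chromosome;
    positions falling in the same 25 Mb bin collapse to a single vertex,
    so hyperedge cardinality can shrink at coarser resolutions."""
    n_bins = int(np.ceil(chrom_length / resolution))
    H = np.zeros((n_bins, len(contacts)), dtype=np.int8)
    for j, contact in enumerate(contacts):
        for pos in contact:
            H[int(pos // resolution), j] = 1
    return H

# Three hypothetical multi-way contacts on a 249 Mb chromosome:
H = contacts_to_incidence(
    [[5e6, 80e6, 200e6], [5e6, 30e6], [80e6, 81e6, 200e6]],
    chrom_length=249_000_000,
)
```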

Figure 8. Comparison of different chromosomes in FB and GM using indirect HDMs.

VII. CONCLUSION

In this paper we presented two approaches for hypergraph comparison. The first approach transforms the hypergraph into a graph representation and then uses standard graph dissimilarity measures. The second approach uses tensors to represent hypergraphs and then invokes various tensor-algebraic notions to develop hypergraph dissimilarity measures. Within each approach we presented a collection of measures that assess hypergraph dissimilarity at different scales. We evaluated these measures on synthetic hypergraphs and on real-world biological datasets, with promising results. Finally, we discussed the pros and cons of the two approaches and outlined some avenues of future research.

ACKNOWLEDGMENTS

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0028. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force.

REFERENCES

[1] M. Newman, Networks. Oxford University Press, 2018.
[2] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri, "Networks beyond pairwise interactions: structure and dynamics," Physics Reports, 2020.
[3] A. R. Benson, D. F. Gleich, and D. J. Higham, "Higher-order network analysis takes off, fueled by classical ideas and new data," arXiv preprint arXiv:2103.05031, 2021.
[4] P. Sweeney, C. Chen, I. Rajapakse, and R. D. Cone, "Network dynamics of hypothalamic feeding neurons," Proceedings of the National Academy of Sciences, vol. 118, no. 14, 2021.
[5] C. Berge, Hypergraphs: Combinatorics of Finite Sets. Elsevier, 1984, vol. 45.
[6] M. M. Wolf, A. M. Klinvex, and D. M. Dunlavy, "Advantages to modeling relational data using hypergraphs versus graphs," in 2016 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2016, pp. 1–7.
[7] S. Agarwal, K. Branson, and S. Belongie, "Higher order learning with graphs," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 17–24.
[8] S. G. Aksoy, C. Joslyn, C. O. Marrero, B. Praggastis, and E. Purvine, "Hypernetwork science via high-order hypergraph walks," EPJ Data Science, vol. 9, no. 1, p. 16, 2020.
[9] T. Carletti, D. Fanelli, and S. Nicoletti, "Dynamical systems on hypergraphs," Journal of Physics: Complexity, vol. 1, no. 3, p. 035006, 2020.
[10] T. G. Kolda, "Multilinear operators for higher-order decompositions," 2006.
[11] A. Banerjee, A. Char, and B. Mondal, "Spectra of general hypergraphs," Linear Algebra and its Applications, vol. 518, pp. 14–30, 2017.
[12] C. Chen and I. Rajapakse, "Tensor entropy for uniform hypergraphs," IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 2889–2900, 2020. [Online]. Available: https://ieeexplore.ieee.org/document/9119161
[13] C. Chen, A. Surana, A. Bloch, and I. Rajapakse, "Controllability of hypergraphs," IEEE Transactions on Network Science and Engineering, 2021.
[14] C. Donnat and S. Holmes, "Tracking network dynamics: A survey of distances and similarity metrics," arXiv preprint arXiv:1801.07351, 2018.
[15] P. Yanardag and S. Vishwanathan, "Deep graph kernels," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1365–1374.
[16] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, "Structural reducibility of multilayer networks," Nature Communications, vol. 6, no. 1, pp. 1–9, 2015.
[17] D. R. Hunter, S. M. Goodreau, and M. S. Handcock, "Goodness of fit of social network models," Journal of the American Statistical Association, vol. 103, no. 481, pp. 248–258, 2008.

[18] K. Faust and J. Skvoretz, "Comparing networks across space and time, size and species," Sociological Methodology, vol. 32, no. 1, pp. 267–299, 2002.
[19] P. Wills and F. G. Meyer, "Metrics for graph comparison: a practitioner's guide," PLoS ONE, vol. 15, no. 2, p. e0228728, 2020.
[20] D. Koutra, J. T. Vogelstein, and C. Faloutsos, "Deltacon: A principled massive-graph similarity function," in Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM, 2013, pp. 162–170.
[21] A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein, and E. Müller, "Netlsd: hearing the shape of a graph," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2347–2356.
[22] M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, "Netsimile: A scalable approach to size-independent network similarity," arXiv preprint arXiv:1209.2684, 2012.
[23] K. Das, S. Samanta, and M. Pal, "Study on centrality measures in social networks: a survey," Social Network Analysis and Mining, vol. 8, no. 1, pp. 1–11, 2018.
[24] P. Goyal and E. Ferrara, "Graph embedding techniques, applications, and performance: A survey," Knowledge-Based Systems, vol. 151, pp. 78–94, 2018.
[25] N. M. Kriege, F. D. Johansson, and C. Morris, "A survey on graph kernels," Applied Network Science, vol. 5, no. 1, pp. 1–42, 2020.
[26] M. Bolla, "Spectra, euclidean representations and clusterings of hypergraphs," Discrete Mathematics, vol. 117, no. 1-3, pp. 19–39, 1993.
[27] D. Zhou, J. Huang, and B. Schölkopf, "Learning with hypergraphs: Clustering, classification, and embedding," in Advances in Neural Information Processing Systems, 2007, pp. 1601–1608.
[28] T. Kolda and B. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[29] C. Chen, A. Surana, A. Bloch, and I. Rajapakse, "Multilinear time invariant system theory," in 2019 Proceedings of the Conference on Control and its Applications. SIAM, 2019, pp. 118–125.
[30] C. Chen, A. Surana, A. M. Bloch, and I. Rajapakse, "Multilinear control systems theory," SIAM Journal on Control and Optimization, vol. 59, no. 1, pp. 749–776, 2021.
[31] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
[32] L. Qi, "Eigenvalues of a real supersymmetric tensor," Journal of Symbolic Computation, vol. 40, no. 6, pp. 1302–1324, 2005.
[33] L.-H. Lim, "Singular values and eigenvalues of tensors: a variational approach," in 1st IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 2005. IEEE, 2005, pp. 129–132.
[34] L. Chen, L. Han, and L. Zhou, "Computing tensor eigenvalues via homotopy methods," SIAM Journal on Matrix Analysis and Applications, vol. 37, no. 1, pp. 290–319, 2016.
[35] L. Chen, L. Han, and L. Zhou, "TenEig," https://users.math.msu.edu/users/chenlipi/TenEig.html, 2015.
[36] A. R. Benson, "Three hypergraph eigenvector centralities," SIAM Journal on Mathematics of Data Science, vol. 1, no. 2, pp. 293–312, 2019.
[37] K.-C. Chang, K. Pearson, and T. Zhang, "Perron-frobenius theorem for nonnegative tensors," Communications in Mathematical Sciences, vol. 6, no. 2, pp. 507–520, 2008.
[38] F. Tudisco and D. J. Higham, "Node and edge eigenvector centrality for hypergraphs," arXiv preprint arXiv:2101.06215, 2021.
[39] A. Gautier, F. Tudisco, and M. Hein, "A unifying perron–frobenius theorem for nonnegative tensors via multihomogeneous maps," SIAM Journal on Matrix Analysis and Applications, vol. 40, no. 3, pp. 1206–1231, 2019.
[40] J. Payne, "Deep hyperedges: a framework for transductive and inductive learning on hypergraphs," arXiv preprint arXiv:1910.02633, 2019.
[41] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, "Hypergraph neural networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3558–3565.
[42] J. Lugo-Martinez and P. Radivojac, "Classification in biological networks with hypergraphlet kernels," arXiv preprint arXiv:1703.04823, 2017.
[43] S. Bai, F. Zhang, and P. H. Torr, "Hypergraph convolution and hypergraph attention," Pattern Recognition, vol. 110, p. 107637, 2021.
[44] B. Bollobás, Random Graphs. Cambridge University Press, 2001, no. 73.
[45] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.
[46] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[47] B. Jhun, M. Jo, and B. Kahng, "Simplicial SIS model in scale-free uniform hypergraph," Journal of Statistical Mechanics: Theory and Experiment, vol. 2019, no. 12, p. 123207, 2019.
[48] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
[49] G. Hinton and S. T. Roweis, "Stochastic neighbor embedding," in NIPS, vol. 15. Citeseer, 2002, pp. 833–840.
[50] P. I. Good, Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer Science & Business Media, 2006.
[51] J. Wang and N. Zheng, "Measures of correlation for multiple variables," arXiv preprint, 2014. [Online]. Available: https://arxiv.org/abs/1401.4827
[52] I. Rajapakse and M. Groudine, "On emerging nuclear order," Journal of Cell Biology, vol. 192, no. 5, pp. 711–721, 2011.
[53] H. Chen, J. Chen, L. A. Muir, S. Ronquist, W. Meixner, M. Ljungman, T. Ried, S. Smale, and I. Rajapakse, "Functional organization of the human 4d nucleome," Proceedings of the National Academy of Sciences, vol. 112, no. 26, pp. 8002–8007, 2015.
[54] S. Lindsly, C. Chen, S. Dilworth, S. Jeyarajan, W. Meixner, A. Cicalo, C. W. Ryan, A. Surana, L. A. Muir, and I. Rajapakse, "Deciphering genome structure and function," to be submitted, 2021.
[55] S. Howorka and Z. S. Siwy, "Reading amino acids in a nanopore," Nature Biotechnology, vol. 38, no. 2, pp. 159–160, 2020.
[56] C. Yang, R. Wang, S. Yao, and T. Abdelzaher, "Hypergraph learning with line expansion," arXiv preprint arXiv:2005.04843, 2020.