Hypernetwork Science: from Multidimensional Networks to Computational Topology∗
Total Page:16
File Type:pdf, Size:1020Kb
Hypernetwork Science: From Multidimensional Networks to Computational Topology∗ Cliff A. Joslyn,y Sinan Aksoy,z Tiffany J. Callahan,x Lawrence E. Hunter,x Brett Jefferson,z Brenda Praggastis,y Emilie A.H. Purvine,y Ignacio J. Tripodix March, 2020 Abstract cations, and physical infrastructure often afford a rep- resentation as such a set of entities with binary rela- As data structures and mathematical objects used tionships, and hence may be analyzed utilizing graph for complex systems modeling, hypergraphs sit nicely theoretical methods. poised between on the one hand the world of net- Graph models benefit from simplicity and a degree of work models, and on the other that of higher-order universality. But as abstract mathematical objects, mathematical abstractions from algebra, lattice the- graphs are limited to representing pairwise relation- ory, and topology. They are able to represent com- ships between entities, while real-world phenomena plex systems interactions more faithfully than graphs in these systems can be rich in multi-way relation- and networks, while also being some of the simplest ships involving interactions among more than two classes of systems representing topological structures entities, dependencies between more than two vari- as collections of multidimensional objects connected ables, or properties of collections of more than two in a particular pattern. In this paper we discuss objects. Representing group interactions is not possi- the role of (undirected) hypergraphs in the science ble in graphs natively, but rather requires either more of complex networks, and provide a mathematical complex mathematical objects, or coding schemes like overview of the core concepts needed for hypernet- “reification” or semantic labeling in bipartite graphs. work modeling, including duality and the relationship Lacking multi-dimensional relations, it is hard to ad- to bicolored graphs, quantitative adjacency and inci- dress questions of \community interaction" in graphs: dence, the nature of walks in hypergraphs, and avail- e.g., how is a collection of entities A connected to an- able topological relationships and properties. We other collection B through chains of other commu- close with a brief discussion of two example applica- nities?; where does a particular community stand in tions: biomedical databases for disease analysis, and relation to other communities in its neighborhood? domain-name system (DNS) analysis of cyber data. The mathematical object which natively represents multi-way interactions in networks is called a \hy- 1 Hypergraphs for Complex 1 arXiv:2003.11782v1 [cs.DM] 26 Mar 2020 pergraph" [6]. In contrast to a graph, in a hyper- Systems Modeling graph H = hV; Ei those same vertices are now con- nected generally in a family E of hyperedges, where now a hyperedge e 2 E is an arbitrary subset e ⊆ V In the study of complex systems, graph theory has of k vertices, thereby representing a k-way relation- been the mathematical scaffold underlying network ship for any integer k > 0. Hypergraphs are thus science [4]. A graph G = hV; Ei comprises a set the natural representation of a broad range of sys- V of vertices connected in a set E ⊆ V of edges 2 tems, including those with the kinds of multi-way (where V here means all unordered pairs of v 2 V ), 2 relationships mentioned above. Indeed, hypergraph- where each edge e 2 E is a pair of distinct vertices. structured data (i.e. hypernetworks) are ubiquitous, Systems studied in biology, sociology, telecommuni- occurring whenever information presents naturally as ∗PNNL PNNL-SA-152208 yPacific Northwest National Laboratory, Seattle, WA 1Throughout this paper we will deal only with \basic" hy- zPacific Northwest National Laboratory, Richland, WA pergraphs in the sense of being undirected, unordered, and un- xU. of Colorado Anschutz Medical Campus, Denver, CO labeled. All of these forms are available and important [3, 16]. 1 2 set-valued, tabular, or bipartite data. ery graph is a 2-uniform hyergraph, so that jej = Hypergraph models are definitely more complicated k = 2 for all hyperedges e 2 E. Thus they sup- than graphs, but the price paid allows for higher port a wider range of mathematical methods, such as fidelity representation of data which may contain those from computational topology, to identify fea- tures specific to the high-dimensional complexity in multi-way relationships. An example from bibliomet- rics is shown in Figure 1.2 On the upper left is a table hypernetworks, but not available using graphs. showing a selection of five papers co-authored by dif- Hypergraph methods are well known in discrete ferent collections of four authors. Its hypergraph is mathematics and computer science, where, for ex- shown in the lower left in the form of an \Euler di- ample, hypergraph partitioning methods help enable agram", with hyperedges as colored bounds around parallel matrix computations [9], and have applica- groups of vertices. A typical approach to these same tions in VLSI [22]. In the network science literature, data would be to reduce the collaborative structure researchers have devised several path and motif-based to its so-called \2-section": a graph of all and only hypergraph data analytics (albeit fewer than their the two-way interactions present, whether explicitly graph counterparts), such as in clustering coefficients listed (like paper 1) or implied in virtue of larger [31] and centrality metrics [14]. Although an expand- collaborations (as for paper 2). That co-authorship ing body of research attests to the increased utility graph is shown in the lower right, represented by its of hypergraph-based analyses [17, 20, 24, 30], and are adjacency matrix in the upper right. It can be seen seeing increasingly wide adoption [19, 28, 29], many that the reduced graph form necessarily loses a great network science methods have been historically de- deal of information, ignoring information about the veloped explicitly (and often, exclusively) for graph- single-authored paper (3) and the three-authored pa- based analyses. Moreover, it is common for analysts per (2). to reduce data arising from hypernetworks to graphs, thereby losing critical information. As explicit generalizations of graphs, we must take care with axiomatization, as there are many, some- times mutually inconsistent, sets of possible defini- tions of hypergraph concepts which can yield the same results (all consistent with graph theory) when instantiated to the 2-uniform case. And some graph concepts have difficulty extending naturally at all. For example, extending the spectral theory of graph adjacency matrices to hypergraphs is unclear: hy- peredges may contain more than two vertices, so the usual (two-dimensional) adjacency matrix cannot code adjacency relations. In other cases, hypergraph extensions of graph theoretical concepts may be nat- Figure 1: Bibliometrics example comparing graphs ural, but trivial, and risk ignoring structural prop- and hypergraphs. (Upper left) Collaborative authorship erties only in hypergraphs. For example, while edge structure of a set of papers. (Lower left) Euler diagram incidence and vertex adjacency can occur in at most of its hypergraph. (Upper right) Adjacency matrix of the one vertex or edge for graphs, these notions are set- 2-section. (Lower right) Co-authorship graph resulting valued and hence quantitative for hypergraphs. So from the 2-section. while subsequent graph walk concepts like connectiv- ity are applicable to hypergraphs, if applied simply, Hypergraphs are closely related to important objects they ignore high-order structure in not accounting for in discrete mathematics used in data science such the \widths" of hypergraph walks. as bipartite graphs, set systems, partial orders, fi- Researchers have handled the complexity and ambi- nite topologies, simplicial complexes, and especially guity of hypergraphs in different ways. A very com- graphs proper, which they explicitly generalize: ev- mon approach is to limit attention to k-uniform hy- 2Hypergraph calculations shown in this pa- pergraphs, where all edges have precisely k vertices per were produced using PNNL's open source hy- (indeed, one can consider graph theory itself as ac- pergraph analytical capabilities HyperNetX (HNX, tually the theory of 2-uniform hypergraphs). This is https://github.com/pnnl/HyperNetX) and the Chapel Hy- pergraph Library (CHGL, https://github.com/pnnl/chgl); the case with much of the hypergraph mathematics and additionally diagrams were produced in HNX. literature, including hypergraph coloring [11, 25], hy- 3 V pergraph spectral theory [7, 8], hypergraph transver- hypergraphs, so that now E ⊆ 2 and all e 2 E are sals [2], and extremal problems [32]. k-uniformity is a unordered pairs with jej = 2. It follows that concepts very strong assumption, which supports the identifi- and methods in hypergraph theory should explicitly cation of mathematical results. But real-world hyper- specialize to those in graph theory for the 2-uniform graph data are effectively never uniform; or rather, if case. But conversely, starting from graph theory con- presented with uniform hypergraph data, a wise data cepts, there can be many ways of consistently extend- scientists would be led to consider other mathemati- ing them to hypergraphs. The reality of this will be cal representations for parsimony. seen in a number of instances below. Another prominent approach to handling real-world, Hypergraphs can be represented in many forms. In and thus non-uniform, hypergraph data is to study our example in Fig. 1 above, letting V = fa; b; c; dg simpler graph structures implied by a particular hy- for the authors and E = f1; 2; 3; 4; 5g for the papers, pergraph. Known by many names, including line we first represent it as a set system graph, 2-section, clique expansion, and one-mode projection, such reductions allow application of stan- H = ffa; dg; fa; c; dg; fdg; fa; bg; fb; cgg: dard graph-theoretic tools. Yet, unsurprisingly, such hypergraph-to-graph reductions are inevitably and We commonly compress set notation for conve- strikingly lossy [10, 23].