Latent Poisson Models for Networks with Heterogeneous Density

Latent Poisson models for networks with heterogeneous density Tiago P. Peixoto∗ Department of Network and Data Science, Central European University, H-1051 Budapest, Hungary ISI Foundation, Via Chisola 5, 10126 Torino, Italy and Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom Empirical networks are often globally sparse, with a small average number of connections per node, when compared to the total size of the network. However, this sparsity tends not to be homogeneous, and networks can also be locally dense, for example with a few nodes connecting to a large fraction of the rest of the network, or with small groups of nodes with a large probability of connections between them. Here we show how latent Poisson models that generate hidden multigraphs can be effective at capturing this density heterogeneity, while being more tractable mathematically than some of the alternatives that model simple graphs directly. We show how these latent multigraphs can be reconstructed from data on simple graphs, and how this allows us to disentangle disassortative degree-degree correlations from the constraints of imposed degree sequences, and to improve the identification of community structure in empirically relevant scenarios. I. INTRODUCTION in isolation. For example, although networks with degree heterogeneity can exhibit in principle any kind of mixing One of the most important properties of empirical net- pattern, there is a stronger tendency of very heteroge- works — representing the pairwise interactions of social, neous networks to exhibit degree disassortativity [5–7]. biological, informational and technological systems — is This is because once the degrees of a fraction of the nodes that they exhibit a strong structural heterogeneity, while become comparable to the total number of nodes in the being globally sparse [1]. The latter property means network, there is no other option than to connect them that most possible connections between nodes are not with nodes of lower degree. Since it is not possible to observed, which as a consequence means that, on aver- fully decouple degree heterogeneity from mixing, it can age, the probability of observing a connection between become difficult to determine whether the latter is sim- two nodes is very small, and hence the typical number of ply a byproduct of the former, or if it can be related to connections each node receives is much smaller than the other properties of network formation. total number of nodes in the network. For example, even The degree disassortativity induced from broad degree though the global human population is in the order of bil- distributions can also occur in networks exhibiting com- lions, most people interact only with a far smaller number munity structure, and in a similar way: if a node has a of other people. Nevertheless, such network systems are degree comparable to the number of nodes in the com- rarely homogeneously sparse: instead, local portions of munity to which it belongs, it will tend to be connected the network can vary greatly in their number of interac- to nodes of the same community with a smaller degree. tions. As has been widely observed [2], the number of The resulting mixing pattern may confuse community de- neighbors of each node is very often broadly distributed, tection methods that do not account for this possibility, typically spanning several orders of magnitude. In addi- which will mistake the pattern that arises from a purely tion, networks exhibit diverse kinds of mixing patterns intrinsic constraint, with one that needs an extrinsic ex- in relation to the degrees [3], e.g. nodes may connect to planation in the form of a different division of the network other nodes with similar degree (assortativity), or nodes into groups. with high degree may connect preferentially with nodes In this work we address the problem of describing de- of low degree and vice-versa (disassortativity). It is pos- gree and density heterogeneity by considering models of sible also for networks to possess communities of tightly random multigraphs, i.e. where more than one link be- connected nodes [4], such that the probability of a link tween nodes is allowed, as well as self-loops, following a existing between members of these subgroups far exceeds Poisson distribution, where a full decoupling of the de- the global average. The existence of such heterogeneous gree distribution and degree mixing patterns is in fact mixing patterns serves as a signature of the process re- possible. These can be transformed into models of simple sponsible for the network formation and may give insight graphs by erasing self-loops and collapsing any existing into its functional aspects. multiedges into a single edges. Conversely, we can re- arXiv:2002.07803v4 [physics.soc-ph] 17 Jul 2020 A central complicating factor in the characterization cover the decoupling of degree variability and mixing by and understanding of the different kinds of mixing pat- reconstructing an underlying multigraph from a given ob- terns in networks is that they cannot be fully understood servation of a simple graph. Then, by inspecting the in- ferred multigraph, we can finally determine what is cause and byproduct. We also show how latent Poisson models can be em- ∗ [email protected] ployed in the task of community detection in the presence 2 of density and degree heterogeneity. When dealing with entropy [10, 11] simple graphs, degree correction [8], as we show, is in gen- X eral is not sufficient to disentangle community structure = P (A) ln P (A): (2) S − from induced degree disassortativity. When performing A latent multigraph reconstruction, we demonstrate that this becomes finally possible. Furthermore, we show how Employing the method of Lagrange multipliers to per- the latent multigraph approach is more effective at de- form the constrained maximization yields a product of scribing the heterogeneous density of many networks, independent distributions for each entry in the adjacency when compared to just using a multigraph model directly matrix, to represent a simple graph, as is often done [9]. In par- Aij Y (θiθj) ticular, we show how this increases our ability to uncover P (A) = ; (3) Z smaller groups in large networks. i<j ij This paper is divided as follows. We begin in Sec. II by formalizing the intrinsic effect of degree constraints by where the θ are “fugacities” (exponentials of Lagrange employing the maximum-entropy principle, and we show multipliers) that keep the constraints in place [5, 11–13], how the Poisson model appears naturally when multiedge and Z = P (θ θ )Aij is a normalization constant, ij Aij i j distinguishability is taken into consideration. In Sec. III comprised of a sum over all possible values of Aij. Thus, we describe the erased Poisson model for simple graphs, the value of Zij will be different depending on whether we compare its induced degree correlations with alter- we are dealing with simple graphs or multigraphs. For native maximum-entropy models, and present methods the case of simple graphs with A 0; 1 , we have ij 2 f g of Bayesian inference capable of reconstructing it from Zij = 1 + θiθj, which results in independent Bernoulli simple graph data. We then show how it can shed light distributions for every node pair, into the origins of degree disassortativity in empirical net- Aij works. In Sec. IV we show how the erased Poisson model Y (θiθj) P (A) = ; (4) can improve the task of community detection, by allow- 1 + θiθj ing the induced degree mixing to be decoupled from the i<j modular network structure, and also describe arbitrary with mean values density heterogeneity, and thereby enhance the resolu- tion of small dense communities in large globally sparse θiθj Aij = : (5) networks. We finalize in Sec. V with a conclusion. h i 1 + θiθj In order for the constraints of Eq. 1 to be fulfilled, the II. MAXIMUM-ENTROPY ENSEMBLES FOR fugacities need to be chosen by solving the system of SIMPLE AND MULTIGRAPHS nonlinear equations X θiθj ^ One of our primary objectives is to model the effects of = ki; (6) 1 + θiθj degree heterogeneity in network structure. To this end, j=i 6 we will concern ourselves with network ensembles that satisfy the constraint that the expected degrees of the which in general does not admit closed analytical solu- nodes are given as parameters. Specifically, if P (A) is tions, and needs to be solved numerically. the probability that network A occurs in the ensemble, For multigraphs with Aij N0, we have instead P Aij 2 1 Zij = 1 (θiθj) = (1 θiθj)− , assuming θiθj < 1, we have that the following condition needs to hold Aij =0 − which results in a product of geometric distributions, X X ^ P (A) Aji = ki; (1) Y P (A) = (θ θ )Aij (1 θ θ ); (7) A j i j − i j i<j for a given expected degree sequence k^ = k^ ;:::; k^ , f 1 N g with mean values where Aij determines the number of edges between nodes P i and j, ki = j Aji is the degree of node i, and and θiθj Aij = ; (8) N is the number of nodes in the network. Given the h i 1 θiθj constraints of Eq. 1, there are many choices of ensem- − ble P (A) that satisfy it. Since we want to understand and the fugacities are obtained by solving an analogous the intrinsic effect of the imposed degrees on other as- but different system of equations pects of the network structure, we are interested in the choice of A that is maximally uniform, or agnostic, X θiθj P ( ) = kî; (9) with respect to the possible networks, conditioned only 1 θiθj j=i − that the above constraint is satisfied.

Latent Poisson Models for Networks with Heterogeneous Density

Beta Current Flow Centrality for Weighted Networks Konstantin Avrachenkov, Vladimir Mazalov, Bulat Tsynguev

General Approach to Line Graphs of Graphs 1

P2k-Factorization of Complete Bipartite Multigraphs

Trees and Graphs

Sphere-Cut Decompositions and Dominating Sets in Planar Graphs

Modeling Multiple Relationships in Social Networks

The Chromatic Index of Nearly Bipartite Multigraphs

Analyzing Local and Global Properties of Multigraphs

A Multigraph Approach to Social Network Analysis

Classifying Multigraph Models of Secondary RNA Structure Using Graph-Theoretic Descriptors

Complex Networks As Hypergraphs

Laplacian Centrality