Latent Poisson models for networks with heterogeneous density

Tiago P. Peixoto∗ Department of Network and Data Science, Central European University, H-1051 Budapest, Hungary ISI Foundation, Via Chisola 5, 10126 Torino, Italy and Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom

Empirical networks are often globally sparse, with a small average number of connections per node, when compared to the total size of the network. However, this sparsity tends not to be homogeneous, and networks can also be locally dense, for example with a few nodes connecting to a large fraction of the rest of the network, or with small groups of nodes with a large probability of connections between them. Here we show how latent Poisson models that generate hidden multigraphs can be effective at capturing this density heterogeneity, while being more tractable mathematically than some of the alternatives that model simple graphs directly. We show how these latent multigraphs can be reconstructed from data on simple graphs, and how this allows us to disentangle disassortative -degree correlations from the constraints of imposed degree sequences, and to improve the identification of in empirically relevant scenarios.

I. INTRODUCTION in isolation. For example, although networks with degree heterogeneity can exhibit in principle any kind of mixing One of the most important properties of empirical net- pattern, there is a stronger tendency of very heteroge- works — representing the pairwise interactions of social, neous networks to exhibit degree disassortativity [5–7]. biological, informational and technological systems — is This is because once the degrees of a fraction of the nodes that they exhibit a strong structural heterogeneity, while become comparable to the total number of nodes in the being globally sparse [1]. The latter property means network, there is no other option than to connect them that most possible connections between nodes are not with nodes of lower degree. Since it is not possible to observed, which as a consequence means that, on aver- fully decouple degree heterogeneity from mixing, it can age, the probability of observing a connection between become difficult to determine whether the latter is sim- two nodes is very small, and hence the typical number of ply a byproduct of the former, or if it can be related to connections each node receives is much smaller than the other properties of network formation. total number of nodes in the network. For example, even The degree disassortativity induced from broad degree though the global human population is in the order of bil- distributions can also occur in networks exhibiting com- lions, most people interact only with a far smaller number munity structure, and in a similar way: if a node has a of other people. Nevertheless, such network systems are degree comparable to the number of nodes in the com- rarely homogeneously sparse: instead, local portions of munity to which it belongs, it will tend to be connected the network can vary greatly in their number of interac- to nodes of the same community with a smaller degree. tions. As has been widely observed [2], the number of The resulting mixing pattern may confuse community de- neighbors of each node is very often broadly distributed, tection methods that do not account for this possibility, typically spanning several orders of magnitude. In addi- which will mistake the pattern that arises from a purely tion, networks exhibit diverse kinds of mixing patterns intrinsic constraint, with one that needs an extrinsic ex- in relation to the degrees [3], e.g. nodes may connect to planation in the form of a different division of the network other nodes with similar degree (), or nodes into groups. with high degree may connect preferentially with nodes In this work we address the problem of describing de- of low degree and vice-versa (disassortativity). It is pos- gree and density heterogeneity by considering models of sible also for networks to possess communities of tightly random multigraphs, i.e. where more than one link be- connected nodes [4], such that the probability of a link tween nodes is allowed, as well as self-loops, following a existing between members of these subgroups far exceeds Poisson distribution, where a full decoupling of the de- the global average. The existence of such heterogeneous gree distribution and degree mixing patterns is in fact mixing patterns serves as a signature of the process re- possible. These can be transformed into models of simple sponsible for the network formation and may give insight graphs by erasing self-loops and collapsing any existing into its functional aspects. multiedges into a single edges. Conversely, we can re-

arXiv:2002.07803v4 [physics.soc-ph] 17 Jul 2020 A central complicating factor in the characterization cover the decoupling of degree variability and mixing by and understanding of the different kinds of mixing pat- reconstructing an underlying multigraph from a given ob- terns in networks is that they cannot be fully understood servation of a simple graph. Then, by inspecting the in- ferred multigraph, we can finally determine what is cause and byproduct. We also show how latent Poisson models can be em- ∗ [email protected] ployed in the task of community detection in the presence 2 of density and degree heterogeneity. When dealing with entropy [10, 11] simple graphs, degree correction [8], as we show, is in gen- X eral is not sufficient to disentangle community structure = P (A) ln P (A). (2) S − from induced degree disassortativity. When performing A latent multigraph reconstruction, we demonstrate that this becomes finally possible. Furthermore, we show how Employing the method of Lagrange multipliers to per- the latent multigraph approach is more effective at de- form the constrained maximization yields a product of scribing the heterogeneous density of many networks, independent distributions for each entry in the adjacency when compared to just using a multigraph model directly matrix, to represent a simple graph, as is often done [9]. In par- Aij Y (θiθj) ticular, we show how this increases our ability to uncover P (A) = , (3) Z smaller groups in large networks. i

X θiθj ˆ One of our primary objectives is to model the effects of = ki, (6) 1 + θiθj degree heterogeneity in network structure. To this end, j=i 6 we will concern ourselves with network ensembles that satisfy the constraint that the expected degrees of the which in general does not admit closed analytical solu- nodes are given as parameters. Specifically, if P (A) is tions, and needs to be solved numerically. the probability that network A occurs in the ensemble, For multigraphs with Aij N0, we have instead P Aij ∈ 1 Zij = ∞ (θiθj) = (1 θiθj)− , assuming θiθj < 1, we have that the following condition needs to hold Aij =0 − which results in a product of geometric distributions, X X ˆ P (A) Aji = ki, (1) Y P (A) = (θ θ )Aij (1 θ θ ), (7) A j i j − i j i

In both of the above cases, if all imposed degrees kˆi are If we now consider the associated multigraph ensemble, √ P ˆ by ignoring the edge types, we obtain a product of bino- sufficiently smaller than 2E, with 2E = i ki being twice the number of expected edges, then in the limit mial distributions N 1 the fugacities can be obtained approximately as X Y P (A) = P (X) δAij,P Xm (14)  m ij X i

2.00 Poisson multigraph θiθj/(1 + θiθj) 105 Max-Ent simple graph 1.75 θiθj/(1 θiθj) Erased Poisson simple graph − Max-Ent multigraph 1.50 θiθj 4 1 exp( θiθj) 10

i − − 1.25 ij G h nn i or 1.00 k

h 3 i 10 ij

A 0.75 h

0.50 102

0.25

0.00 0 1 2 3 4 5 100 101 102 103 104 105 θiθj k

Figure 1. Average number of edges between nodes as a func- Figure 3. Mean degree of a neighbor of a node of degree k, as a tion of the product of their fugacities, for the different ensem- function of k, k nn(k), for the different ensembles considered, bles, as shown in the legend. with N = 106handi the same set of imposed degrees sampled from Eq. 27 with α = 2.2. (The error bars on the k values h inn 2.0 2.0 are smaller than the symbols used.) θiθj /(1 + θiθj ) θiθj /(1 + θiθj ) θ θ /(1 θ θ ) θ θ /(1 θ θ ) i j − i j i j − i j

i 1.5 θiθj i 1.5 θiθj ij ij 1 exp( θiθj ) 1 exp( θiθj ) G − − G − − h h

or 1.0 or 1.0 i i 0.6 ij ij r A A h 0.5 h 0.5 0.4 Poisson multigraph Max-Ent simple graph 0.0 0.0 0 1 2 3 4 5 0 1 2 3 4 5 Erased Poisson simple graph θ θ θ θ 0.2 (a) i j (b) i j Max-Ent multigraph Assortativity 0.0 Figure 2. The same as Fig. 1, but showing the values for 6 0.2 the edges of samples of each model with N = 10 nodes, and − with the same set of imposed degrees sampled from the Zipf 2.0 2.2 2.4 2.6 2.8 3.0 Zipf exponent α distribution of Eq. 27, using (a) α = 2.2 and (b) α = 2.4.

Figure 4. Degree assortativity r as a function of the Zipf probability, which also results in a disassortative degree- exponent α, for networks with N = 106 nodes sampled from degree correlation among neighbors. the models indicated in the legend. The difference between the ensembles can be further illustrated by choosing integer-valued imposed degrees that are independently sampled from a Zipf distribution fugacities and corresponding edge probabilities become more similar across ensembles. k α The induced degree correlations among neighbors in P (k α) = − , (27) | ζ(α) each ensemble are shown in Fig. 3, in each case for the same set of imposed degrees sampled from Eq. 27. Some with ζ(α) being the Riemann zeta function. For values aspects of the degree correlations of the erased Poisson of α [2, 3] the variance of this distribution diverges, model were considered rigorously in Refs. [21, 22], and ∈ while the mean remains finite, therefore serving as a sim- in the case of the maximum-entropy simple graph model ple model of globally sparse but locally dense networks. in Ref. [5]. One important point to notice is that, al- In Fig. 2 we show how the edges present in one sample of though the erased Poisson and maximum-entropy mod- each model are distributed along the curves of Fig. 1: els for simple graphs are not identical, they generate a A smaller value of α creates broader and denser net- very similar disassortative trend, indicating a comparable works, for which the discrepancy between all models is explanatory power for this kind of effect. very large. Although the mean degree of the generated A further comparison is seen in Fig. 4 where the de- networks in the case α = 2.2 is only around 3.75, even on gree assortativity coefficient [3] r [ 1, 1] is shown as a a network of N = 106 nodes the probability of observing function of the Zipf exponent α, defined∈ − as an edge between two nodes approaches one for a signifi- P cant number of pairs. As the exponent α increases, the kk0(mkk qkqk ) r = kk0 0 − 0 , (28) network becomes more homogeneously sparse, and the 2 σk 6 with P being the fraction of data (we will revisit this assumption in Sec. IV). Finally, mkk0 = ij Aijδki,kδkj ,k0 /2E P edges between nodes of degrees k and k0, q = m we have the so-called evidence k k0 kk0 is distribution of degrees at the endpoints of edges, and X Z 2 is its variance. We see that the same pattern persists P (G) = P (G A)P (A θ)P (θ) dθ, (31) σk | | for the entire range of α [2, 3], with the assortativity A ∈ values being very similar between the erased Poisson and which is an unimportant constant for our present pur- maximum-entropy simple graph models. pose. With the posterior distribution P (A, θ G) in place, The similarity between the erased Poisson and the we can proceed in a variety of ways, for example| by sam- maximum-entropy simple graph model makes the former pling from it using MCMC. But instead, we will proceed attractive as an alternative, considering its flexibility, as in a more efficient manner, by considering first the most we are about to explore in the next section. likely fugacities, when averaged over all possible multi- graphs A, i.e. X B. Reconstructing the erased Poisson model θˆ = argmax P (A, θ G) = argmax P (G θ), (32) | | θ A θ The ensembles considered previously translate differ- where P (G θ) is the erased Poisson likelihood of Eq. 25. ent assumptions about the data into probability distribu- Noting that| taking the logarithm of the likelihood does tions, conditioned on desired constraints, and mediated not alter the position of its maximum, and substituting by the maximum-entropy ansatz. In principle we should leads to choose the set of assumptions that most closely matches ˆ X θiθj  the data in question. θ = argmax Gij ln 1 e− (1 Gij)θiθj. (33) θ − − − Although we have argued that the Poisson model is i

With this at hand, we then return to the maximization to obtain 1.00 X P (G A)P (A θ) θˆ = argmax q(A) ln | | (39) q(A) 0.95 θ A X X∞ = argmax q(Aij) ln P (Aij θ) (40) 0.90 θ | i j A =0 ≤ ij 1 X (41) = argmax wij ln θiθj θiθj. α = 2.1 θ 2 − 0.85 ij α = 2.2 Similarity to original network The last equation can be solved easily, which yields α = 2.5 0.80 α = 2.8 d ˆ i 2 3 4 5 6 θi = q , (42) 10 10 10 10 10 P N j dj where Figure 5. Poisson multigraph reconstruction accuracy as mea- X sured via the similarity of Eq. 46 for simple graphs sampled di = wji (43) from the erased Poisson model with imposed degrees sampled j from a Zipf distribution with exponent α, as a function of the number of nodes N. Each point was averaged over 100 is the expected degree of node i in the multigraph A, realizations. averaged over P (A G, θˆ). Since we are interested in the | self-consistent values of w conditioned on θˆ, this leads us to the following expectation-maximization (EM) algo- rithm, which starts with some arbitrary choice of θ, and alternates between the following steps 104 1. In the “expectation” step we obtain the marginal mean multiedge multiplicities via:

3  θiθj 10 if nn θ θ Gij = 1, i  1 e− i j  k 2− (44) h wij = θi if i = j, 0 otherwise. 102

2. In the “maximization” step we use the current val- Max-Ent simple graph ues of w to update the values of θ: Poisson reconstruction

di X 0 1 2 3 4 5 θ = , with d = w . (45) 10 10 10 10 10 10 i qP i ji k j dj j

Upon convergence, the above EM algorithm is guaran- Figure 6. Mean degree of a neighbor of a node of degree k, as a function of k, for a network with N = 106 nodes sampled teed to find only a local optimum of the maximization from the maximum-entropy ensemble with imposed degrees problem, therefore we may need to run it multiple times sampled from Eq. 27 with α = 2.2, and its inferred Poisson with different initial choices of θ. However, in all our ex- multigraph, using the algorithm described in the text. periments we found that the algorithm tends to find the same solution from any initial starting point, even when this happens to be the correct solution (in artificially generated examples where this is known), giving strong data and initial conditions, but we have successfully run evidence that the global optimum is usually found. This it on networks with up to 108 edges on a regular lap- algorithm is efficient, since we need only to keep track of top computer. Our C++ implementation of the above the values for w for the observed edges in G, in addition algorithm is available as part of the graph-tool Python to each self-. Therefore the E-step can be done in library [23]. O(E + N) time, and the M-step in O(N) time, resulting in an overall O(E + N) computational complexity. The In Fig. 5 we show how the algorithm behaves in re- algorithm can also be run in parallel easily. The number covering the underlying multigraph of artificial networks of EM iterations required for convergence depends on the sampled from the erased Poisson model, as measured via 8

0.00 r

0.05 1.00 −

0.10 0.75 − Max-Ent simple graph 0.15 0.50 Degree assortativity − Poisson reconstruction 0 r 2.0 2.2 2.4 2.6 2.8 3.0 0.25 Power-law exponent α 0.00 Figure 7. Degree assortativity as function of the Zipf exponent 6 0.25 α, for networks with N = 10 nodes sampled from the max- − imum entropy ensemble with Zipf-distributed imposed de- Latent assortativity 0.50 grees, and their corresponding inferred Poisson multigraphs. −

0.75 − the Jaccard similarity 1.00 P − 1.0 0.5 0.0 0.5 1.0 ij wij Aij − − s(w, A) = 1 P | − | (46) Observed assortativity r − ij wij + Aij between the true and inferred multigraphs, with imposed Figure 8. Degree assortativity for original simple graph degrees sampled from a Zipf distribution. For smaller r, and for the reconstructed multigraph r0 for 816 empiri- cal networks, obtained from the CommunityFitNet [24] and values of the exponent α, which causes the edge multi- Konect [25] databases. plicities to become larger, the recovery becomes less ac- curate, but in all cases it approaches s(w, A) 1 as the → number of nodes increases, indicating that full recovery is 104 possible asymptotically as the amount of available data increases. Since this algorithm gives us a distribution over multi- graphs, we can use it to investigate whether the de- gree correlations of an observed simple graphs exist as 103 a necessary outcome of the existing degrees, or if they nn i k should be attributed to something else. In the first sce- h nario, the degree-degree correlations would disappear in the inferred multigraph, whereas they would persist in the second one. Interestingly, this works reasonably well 2 Original, r = 0.19 10 − even when the observed graph was not sampled from the Reconstructed, r0 = 0.0017 erased Poisson model. We illustrate this with an exam- − ple in Fig. 6, where a simple graph was generated from a 0 1 2 3 4 maximum-entropy model with Zipf-distributed degrees, 10 10 10 10 10 k and we inferred from it a corresponding Poisson multi- graph. Even though the inferred multigraph still shows a weak degree correlation between neighboring nodes, since Figure 9. Mean degree of a neighbor of a node of degree k the erased Poisson model cannot fully account for the as a function of k, for the internet at the autonomous sys- structure of the maximum-entropy model, the overall dis- tems level, both for the original network and reconstructed assortative trend is completely absent. Fig. 7 shows the multigraph, as shown in the legend, which includes the de- degree assortativity values for both original and recon- gree assortativity coefficient of each case. structed networks over a range of α [2, 3]. Although the reconstructed networks still show values∈ r < 0, their devi- ation from zero is barely noticeable in the figure. Like we C. Empirical networks had seen previously in Fig. 3, these results further show that the erased Poisson model generates mixing patterns We can use the erased Poisson model to decouple de- that, although not identical, are sufficiently similar to gree assortativity from the degree constraints by infer- the maximum-entropy simple graph model, allowing us ring the underlying multigraph where these properties to correctly conclude that the resulting degree correla- are not tied to each other. In Fig. 8 we show the results tions arise directly from the imposed degrees. of applying our algorithm to 816 networks across differ- 9

IV. COMMUNITY DETECTION

102 102 nn nn Another important type of heterogeneous sparsity is i i k k h h community structure [4], which can be loosely defined as

101 the existence of groups of nodes with a high probabil- Original, r = 0.23 Original, r = 0.31 − 1 − Reconstructed, r0 = 0.018 10 Reconstructed, r0 = 0.094 − − ity of connection to themselves, or also to other groups. 100 101 102 100 101 102 103 Models for networks with this kind of structure can be (a) k (b) k obtained by forcing the number of edges between groups to have specific values. More precisely, we assume the 4 10 nodes are divided into B disjoint groups, with bi [1,B] Original, r = 0.23 − ∈ Reconstructed, r0 = 0.0048 − denoting the group membership of node i. With this,

nn 3 nn

i 10 i k k

h h and in addition to the expected degree constraints, we 102 102 have the expected edge counts between groups given by

Original, r = 0.32 − Reconstructed, r0 = 0.32 − 101 X X 100 101 102 103 104 100 101 102 103 P (A) Aijδbi,rδbj ,s = mrs. (47) k k (c) (d) A ij

Figure 10. Mean degree of a neighbor of node of degree k as a Performing the same calculation as before, i.e. maxi- function of k, for (a) the metabolic network of C. elegans [26], mizing the ensemble entropy conditioned on the above (b) the network of leaked emails of the Democratic National constraints in addition to Eq. 1 we arrive at the model Committee, (c) the class dependency graph of a large software project [27], and (d) the online Flixter [28], Aij Y (λbibj θiθj) both for the original network and reconstructed multigraph, P (A λ, θ, b) = , (48) | λ θ θ + 1 as shown in the legend, which includes the degree assortativity i 0 we do not observe any significant (49) difference, as this kind of mixing pattern is unrelated which once more cannot be solved in closed form. to degree constraints. However, for disassortative val- Instead, if we consider multigraphs with distinguish- ues r < 0 we observe a variety of behaviors, where for able multiedges, performing the same calculations as be- many networks the assortativity value is significantly in- fore, we arrive at the Poisson version of the degree- creased in the reconstructed multigraph, indicating that corrected SBM (DC-SBM), originally proposed by Karrer observed correlations can be largely associated with the and Newman [8] broadness of the degrees. A prime example of this is λ θ θ the Internet at the autonomous systems level [29], which Aij bibj i j Y (λbibj θiθj) e− P (A λ, θ, b) = has been long considered as a case where the observed | A ! × disassortativity is a byproduct of the broad degree se- i 1, sidered previously, when applied to simple graph data. In ij| ij ± 0 otherwise. order to alleviate these problems, we may therefore also employ the erased Poisson model for community detec- (62) tion, with a likelihood For the DC-SBM model above, a move b b0 that changes the group membership of a single node→ can be

Y λb b θiθj Gij λb b θiθj (1 Gij ) done in time O(k ) where k is the degree of that node in P (G λ, θ, b) = (1 e− i j ) e− i j − . i i | − G, independently of how many groups exist in total [31]. i

(a) (b)

Figure 11. Inferred groups for a political blog network generated form the maximum-entropy DC-SBM, inferred using the (a) Poisson DC-SBM and (b) the erased Poisson DC-SBM, with inferred groups indicated by the node colors, and node degrees by the node sizes. The layout was obtained with the spring-block algorithm of Ref. [34], which tries to place nodes together if they are connected by an edge. The “liberal” and “conservative” groups correspond to the visible left (yellow) and right (blue) clusters, respectively. The layout places nodes with high degree in the center of the figure, which in (a) are clustered into their own separate groups, whereas in (b) they are merged with their true .

being the entropy of the membership distribution. For 2 102 the DC-SBM above, the posterior distribution fluctu- × ates around 6 groups, falling significantly short of the expected 24. At first, one might think that this problem is due to an underfitting of the model, caused by the use of non- nn

i informative priors. As was shown in Refs. [31, 41], the k 102 h use of such priors incurs a penalty in the posterior log- probability that grows quadratically with the number of groups, which in turn means that no more than O(√N) groups can ever be inferred in sparse networks. This 6 101 Original, r = 0.17 issue is resolved by replacing the noninformative priors × − Reconstructed, r0 = 0.0073 by a deeper hierarchy of priors and their hyperpriors, − forming a nested DC-SBM [29]. The use of the nested 100 101 102 103 k model, which remains nonparametric and agnostic about mixing patterns, increases the inference resolution by en- abling the identification of up to O(N/ log N) groups. In Figure 12. Mean degree of a neighbor of a node of degree a similar example of a network composed of 64 cliques of k as a function of k, for a political blog network generated size 10, the nested model is capable of identifying all 64 form the maximum-entropy DC-SBM, and the corresponding cliques, whereas the “flat” version finds only 32 groups, inferred latent multigraph. composed each of 2 cliques [9]. However, when applied to the current example, the use ample by Good et al [39] of a situation where community of the nested Poisson DC-SBM is not sufficient to uncover detection methods fail to find the more obvious pattern. all 24 cliques. In Fig. 14b we see the posterior distribu- Indeed, as was shown recently by Riolo and Newman [40], tion of effective group sizes for the nested Poisson model. inferring the DC-SBM also yields unsatisfactory results, Although the mean number of groups increases, it still where adjacent cliques are merged together, as shown in falls short of the 24. This indicates that the problem Fig. 13a. In Fig. 14a we see the posterior distribution of may be not only underfitting, but also mispecification, Se effective number of groups, defined as Be = e , with i.e. the model is simply not adequate to describe the data. Indeed a closer inspection reveals that this is pre- X nr nr cisely the case. A version of the DC-SBM that should Se = ln , (69) − N N r be able to generate the given number of cliques would 13

(a) (b) (c)

Figure 13. (a) Network composed of a ring of 24 cliques of size 5, connected to each other by a single edge. The node colors and shapes correspond to a typical partition sampled from the posterior distribution of Eq. 52. (b) Network sampled from the maximum-likelihood Poisson DC-SBM obtained from the network in (a) and putting each clique in their own group (as shown by the node colors and shapes). (c) Fit of the erased Poisson nested DC-SBM to the network in (a), showing a sampled partition from the posterior distribution (node shape and color), coinciding perfectly with the individual cliques, and the marginal posterior distribution of edge multiplicities (edge thickness).

3 1.0 1.5

2 1.0 0.8 )

λ λ

1 | Poisson, P (m = 1 λ) = e λ 0.5 0.6 | − λ

= 1 Erased Poisson, P (m = 1 λ) = 1 e− Probability density Probability density | −

0 0.0 m

4 6 8 5.0 7.5 10.0 ( 0.4 B B P (a) e (b) e 0.2 0.75 75 0.0 0.50 50 0 2 4 6 8 10 λ 0.25 25 Probability density Probability density Figure 15. Probability of observing a sample from the 0.00 0 m = 1 5.0 7.5 10.0 22 24 Poisson and erased Poisson models, as a function of the pa- B B (c) e (d) e rameter λ, as shown in the legend. The vertical and horizontal lines show the maximum of the Poisson at λ = 1 and P (m = Figure 14. Posterior distribution of effective number of groups 1 λ) = 1/e, and the asymptotic value of P (m = 1 λ) 1 for the| erased Poisson model as λ . | → Be for the network in Fig. 13a, obtained with (a) the Poisson → ∞ DC-SBM, (b) the nested Poisson DC-SBM, (c) the erased Poisson DC-SBM, and (d) the nested erased Poisson DC- SBM, the latter showing a distribution highly concentrated is no longer negligible. This limitation is absent from on B = 24. e the erased Poisson model, which can describe arbitrary probabilities of single edges. In Fig. 13b we show a sam- ple of the DC-SBM with parameters chosen so that it be one where the probability of an edge existing between replicates the original network as closely as possible: the two nodes of the same clique would be very close to one. nodes are separated into 24 groups of size 5, the mean However, the Poisson model struggles at describing this number of edges between nodes of the same group is one, structure because it cannot allow for a single edge oc- and between adjacent groups is 1/25. The resulting net- curring with such a high probability, without generat- work is not only riddled with multiedges and self-loops, ing multiple edges as well. As is illustrated in Fig. 15, but its also shows a far more irregular structure than the the Poisson distribution can generate the occurrence of original one. However, through the lenses of the Poisson a single edge with a probability at most 1/e 0.368, model, both networks are difficult to distinguish, as they and even in that case the occurrence of multiple≈ edges have very similar likelihoods. Although it can be possi- 14

Karate club tlenose dolphins [36], and co-purchases of books about american politics [37]. In each case, using the erased Poisson DC-SBM we obtain a more detailed division of the network, with a larger number of groups, when com- pared to the employing the Poisson DC-SBM directly. The most extreme difference is obtained by the karate club network, where the Poisson DC-SBM yields a single group, but the erased Poisson version yields four groups. Poisson Erased Poisson The explanation for the difference in each case is the same Dolphins as for the ring-of-cliques example considered previously: since the Poisson DC-SBM is unable to ascribe high prob- abilities to the existence of edges, it puts a smaller statis- tical weight to dense regions of the network, even when that would be sufficient to point to the existence of a separate group. The erased Poisson model does not have this limitation and hence is able to isolate this kind of structure with more confidence. Poisson Erased Poisson Political books C. and group assortativity

An important pattern in network structure is the de- gree of assortativity, or , between node types. This is commonly measured via the modularity quan- tity [42], which counts the excess of edges between nodes Poisson Erased Poisson of the same type, when compared to a null model without any homophily, i.e. Figure 16. Inferred group memberships for Zachary’s karate club network [35], a dolphin social network [36], and co- 1 X Q = (G G ) δ , (70) purchases of political books [37], using the Poisson DC-SBM 2E ij − h iji bi,bj i=j and the erased Poisson DC-SBM. In each network, the latter 6 model reveals a larger number of groups, due to its increased ability of identifying heterogeneous densities. where Gij is the expectation of an edge (i, j) existing accordingh toi the chosen null model, and the normaliza- tion guarantees Q [ 1, 1]. The most often used null model is ∈ − , which corresponds to a Pois- ble to extract useful information even from mispecified Gij = kikj/2E son multigraphh i model with a maximum-likelihood choice models via a detailed inspection of the posterior distri- of fugacities √ . As we have discussed, the bution [40] — a powerful feature of Bayesian methods — θi = ki/ 2E Poisson model approaches the maximum-entropy simple this is not a satisfying resolution for such a simple exam- graph model if the degrees are sufficiently smaller than ple. Indeed, what we have is once more a situation where √ , otherwise this assumption becomes inadequate to the network is globally sparse but locally dense, and we 2E describe null models of simple graphs. We can use the should expect the erased Poisson model to behave better. erased Poisson model as a better alternative in two differ- In fact, using the nested DC-SBM based on the erased ent ways, the first of which is by simply using its expected Poisson model we can uncover all 24 cliques, as it is ca- θ θ value G = 1 e− i j , which yields pable of describing their probability more accurately. In h iji − Fig. 13c is shown the result obtained with the erased 1 X  θiθj  Poisson model, together with the inferred latent multi- Q = Gij (1 e− ) δbi,bj , (71) 2E − − i=j graph, which has a abundance of multiedges inside each 6 group, that translate into cliques once erased, with the   1 X X probability of each edge approaching one. Importantly, θiθj =  err nr(nr 1) + e− δbi,bj  , 2E − − the successful detection of the cliques is only possible if r i=j 6 the erased Poisson model is used together with the nested (72) priors of Ref. [31], which illustrates the combined effect of more appropriate model specification with structured P with 2E = ij Gij. The values of θ can be obtained ef- priors that prevent underfitting. ficiently with the EM algorithm presented in Sec III B. A We further illustrate the use of the erased Poisson disadvantage of this approach is that the computation of model with some further empirical examples in Fig. 16, the last term in the above equation requires O[(N/B)2] comprised of a social network between members of a operations, and thus is not very efficient for large net- karate club [35], an animal social network between bot- works. The second approach we describe is faster, and 15 is comprised of the computation of modularity for the paring the assortativity obtained. Our approach is more multigraph inferred from P (A G, θˆ) using the same EM constructive, since it yields an inference of a generative algorithm (instead of the simple| graph G directly), for model, rather than simply a comparison with a null one. which Aij = θiθj becomes the appropriate null model, This means it is more informative in situations where the i.e. h i degree constraints can account for only a portion of the correlations observed, in which case our approach yields 1 X Q = (w θ θ ) δ , (73) a residual multigraph, with a subtracted contribution of 2E ij − i j bi,bj 0 ij the degree constraints to the degree correlations, which can be further analyzed in arbitrary ways. 1 X ˆ2 = ωrr θr (74) We have also investigated the use of this model in 2E0 − r community detection, and shown how it is more ade- 1 X ω2 quate to uncover communities not only in simple graphs = ω r (75) 2E rr − 2E with broad degree distributions, but also when they pos- 0 r 0 sess strong community structure. In the latter case, the P P with with 2E0 = ij wij, ωrs = ij wijδbi,rδbj ,s, ωr = erased Poisson model is capable of combining degree cor- P ˆ P P √ rection with the existence of edge probabilities approach- s ωrs, θr = i θiδbi,r, and θi = j wji/ 2E0. This quantity can be computed in time O(E + N), and thus ing one, meaning it can easily model networks that are offers a significant speed advantage over the first one. globally sparse, but locally dense. The enhanced ex- The two approaches are not identical, and we should not planatory power is achieved by sacrificing neither math- expect to obtain the same value of Q between them in ematical tractability nor algorithmic efficiency. general, but in case the network was in fact sampled from The erased Poisson model has been used before as a the erased Poisson model, we must have Q 0 with means to combine multigraph generative models with either computation. ≈ measurement models for simple graphs, when perform- We note that the use of modularity maximization with ing joint network reconstruction with community detec- the purpose of identifying communities in networks, al- tion in Refs. [46, 47], although these works omitted a though a popular approach, is ill-advised. This is be- detailed analysis of this modelling approach. Since the cause that method cannot account for the statistical erased Poisson model is better specified for networks with significance of the node partitions found, and can lead strongly heterogeneous density, it remains to be deter- to misleading results, such as high-scoring partitions in mined to what extent it can improve link prediction and fully random graphs [43], non-modular networks such as network reconstruction, when compared to alternatives. trees [44], has been shown to systematically overfit em- We leave this investigation for future work. pirical data [24], while at the same time it will fail for net- works with obvious community structure [39, 45]. Nev- ertheless, if the partitions are obtained with some other Appendix A: Ensemble equivalences method (like the one of we have described in the previ- ous section, which suffers from none of the mentioned In this section we consider network generative pro- shortcomings), or originate from network annotations, cesses that at first might seem distinct, but in fact are the value of can be a good description of the exist- Q equivalent not only to each other but also to the Poisson ing homophily, and the corrections above can be used to model considered in the main text. improve it.

V. CONCLUSION 1. Sequential edge-dropping model

We have considered the use of the erased Poisson model We consider the situation where a random multigraph to describe simple graphs with different kinds of hetero- is grown by adding E edges one by one to the network in geneous sparsity, in particular with broad degree distri- sequence, and the probability that a given edge is placed butions and community structure. We have shown how P between nodes i and j is given by qij, with i

Aij λqij P Y (λqij) e− assuming j Aij = ki for every node i, otherwise = . (A3) A ! P (A k) = 0. We note that all generated graphs that i

[1] Mark Newman, Networks: An Introduction (Oxford Uni- Letters 107, 178701 (2011). versity Press, 2010). [8] Brian Karrer and M. E. J. Newman, “Stochastic block- [2] Albert-László Barabási and Réka Albert, “Emergence models and community structure in networks,” Physical of Scaling in Random Networks,” Science 286, 509–512 Review E 83, 016107 (2011). (1999). [9] Tiago P. Peixoto, “Bayesian Stochastic Blockmodeling,” [3] M. E. J. Newman, “Mixing patterns in networks,” Phys. in Advances in Network Clustering and Blockmodeling Rev. E 67, 026126 (2003). (John Wiley & Sons, Ltd, 2019) pp. 289–332. [4] Santo Fortunato, “Community detection in graphs,” [10] E. T. Jaynes, Probability Theory: The Logic of Science, Physics Reports 486, 75–174 (2010). edited by G. Larry Bretthorst (Cambridge University [5] Juyong Park and M. E. J. Newman, “Origin of degree Press, Cambridge, UK ; New York, NY, 2003). correlations in the Internet and other networks,” Physical [11] Ginestra Bianconi, “Entropy of network ensembles,” Review E 68, 026112 (2003). Physical Review E 79, 036114 (2009). [6] Samuel Johnson, Joaquín J. Torres, J. Marro, and [12] Kartik Anand and Ginestra Bianconi, “Gibbs entropy of Miguel A. Muñoz, “Entropic Origin of Disassortativity network ensembles by cavity methods,” Physical Review in Complex Networks,” Physical Review Letters 104, E 82, 011116 (2010). 108702 (2010). [13] Tiziano Squartini, Joey de Mol, Frank den Hollander, [7] Charo I. Del Genio, Thilo Gross, and Kevin E. Bassler, and Diego Garlaschelli, “Breaking of Ensemble Equiva- “All Scale-Free Networks Are Sparse,” Physical Review lence in Networks,” Physical Review Letters 115, 268701 18

(2015). view E 95, 012317 (2017). [14] Marián Boguñá, R. Pastor-Satorras, and A. Vespignani, [32] Nicholas Metropolis, Arianna W. Rosenbluth, Mar- “Cut-offs and finite size effects in scale-free networks,” shall N. Rosenbluth, Augusta H. Teller, and Edward The European Physical Journal B - Condensed Matter Teller, “Equation of State Calculations by Fast Comput- 38, 205–209 (2004). ing Machines,” The Journal of Chemical Physics 21, 1087 [15] Ilkka Norros and Hannu Reittu, “On a conditionally Pois- (1953). sonian graph process,” Advances in Applied Probability [33] W. K. Hastings, “Monte Carlo sampling methods using 38, 59–75 (2006). Markov chains and their applications,” Biometrika 57, 97 [16] Marc Barthélemy, “Spatial networks,” Physics Reports –109 (1970). 499, 1–101 (2011). [34] Y. Hu, “Efficient, high-quality force- draw- [17] Ciro Cattuto, Wouter Van den Broeck, Alain Barrat, ing,” Mathematica Journal 10, 37–71 (2005). Vittoria Colizza, Jean-François Pinton, and Alessandro [35] Wayne W. Zachary, “An Information Flow Model for Vespignani, “Dynamics of Person-to-Person Interactions Conflict and Fission in Small Groups,” Journal of An- from Distributed RFID Sensor Networks,” PLOS ONE thropological Research 33, 452–473 (1977). 5, e11596 (2010), publisher: Public Library of Science. [36] David Lusseau, Karsten Schneider, Oliver J. Boisseau, [18] Henri van den Esker, Remco van der Hofstad, and Ger- Patti Haase, Elisabeth Slooten, and Steve M. Dawson, ard Hooghiemstra, “Universality for the Distance in Fi- “The bottlenose dolphin community of Doubtful Sound nite Variance Random Graphs,” Journal of Statistical features a large proportion of long-lasting associations,” Physics 133, 169–202 (2008). Behavioral Ecology and Sociobiology 54, 396–405 (2003). [19] Shankar Bhamidi, Remco van der Hofstad, and Jo- [37] V Krebs, “Political Books Network,” unpublished, han van Leeuwaarden, “Scaling Limits for Critical In- retrieved from Mark Newman’s website: http:// homogeneous Random Graphs with Finite Third Mo- www-personal.umich.edu/{~}mejn/netdata/. ments,” Electronic Journal of Probability 15, 1682–1702 [38] Lada A. Adamic and Natalie Glance, “The political blo- (2010). gosphere and the 2004 U.S. election: divided they blog,” [20] Ginestra Bianconi and Albert-László Barabási, “Bose- in Proceedings of the 3rd international workshop on Link Einstein Condensation in Complex Networks,” Physical discovery, LinkKDD ’05 (ACM, New York, NY, USA, Review Letters 86, 5632–5635 (2001). 2005) pp. 36–43. [21] Dong Yao, Pim van der Hoorn, and Nelly Litvak, “Av- [39] Benjamin H. Good, Yves-Alexandre de Montjoye, and erage nearest neighbor degrees in scale-free networks,” Aaron Clauset, “Performance of modularity maximiza- arXiv:1704.05707 [math] (2017), arXiv: 1704.05707. tion in practical contexts,” Physical Review E 81, 046106 [22] Clara Stegehuis, “Degree correlations in scale-free ran- (2010). dom graph models,” Journal of Applied Probability 56, [40] Maria A. Riolo and M. E. J. Newman, “Consis- 672–700 (2019). tency of community structure in complex networks,” [23] Tiago P. Peixoto, “The graph-tool python library,” arXiv:1908.09867 [physics] (2019), arXiv: 1908.09867. figshare (2014), 10.6084/m9.figshare.1164194, available [41] Tiago P. Peixoto, “Parsimonious Module Inference in at https://graph-tool.skewed.de. Large Networks,” Physical Review Letters 110, 148701 [24] Amir Ghasemian, Homa Hosseinmardi, and Aaron (2013). Clauset, “Evaluating Overfit and Underfit in Models of [42] M. E. J. Newman and M. Girvan, “Finding and evaluat- Network Community Structure,” IEEE Transactions on ing community structure in networks,” Physical Review Knowledge and Data Engineering , 1–1 (2019). E 69, 026113 (2004). [25] Jérôme Kunegis, “KONECT: The Koblenz Network Col- [43] Roger Guimerà, Marta Sales-Pardo, and Luís A. Nunes lection,” in Proceedings of the 22Nd International Confer- Amaral, “Modularity from fluctuations in random graphs ence on World Wide Web, WWW ’13 Companion (ACM, and complex networks,” Physical Review E 70, 025101 New York, NY, USA, 2013) pp. 1343–1350. (2004). [26] Jordi Duch and Alex Arenas, “Community detection in [44] James P. Bagrow, “Communities and bottlenecks: Trees complex networks using extremal optimization,” Physical and treelike networks have high modularity,” Physical Review E 72, 027104 (2005). Review E 85, 066118 (2012). [27] Lovro Šubelj and Marko Bajec, “Software systems [45] Santo Fortunato and Marc Barthélemy, “Resolution limit through complex networks science: review, analysis in community detection,” Proceedings of the National and applications,” in Proceedings of the First Interna- Academy of Sciences 104, 36–41 (2007). tional Workshop on Software Mining, SoftwareMining ’12 [46] Tiago P. Peixoto, “Reconstructing Networks with Un- (Association for Computing Machinery, Beijing, China, known and Heterogeneous Errors,” Physical Review X 2012) pp. 9–16. 8, 041011 (2018). [28] Reza Zafarani and Huan Liu, Social computing data [47] Tiago P. Peixoto, “Network Reconstruction and Commu- repository at ASU (2009). nity Detection from Dynamics,” Physical Review Letters [29] Tiago P. Peixoto, “Hierarchical Block Structures and 123, 128301 (2019). High-Resolution Model Selection in Large Networks,” [48] Béla Bollobás, “A Probabilistic Proof of an Asymptotic Physical Review X 4, 011047 (2014). Formula for the Number of Labelled Regular Graphs,” [30] Sergei Maslov, Kim Sneppen, and Alexei Zaliznyak, “De- European Journal of Combinatorics 1, 311–316 (1980). tection of topological patterns in complex networks: cor- [49] B. Fosdick, D. Larremore, J. Nishimura, and J. Ugander, relation profile of the internet,” Physica A: Statistical “Configuring Random Graph Models with Fixed Degree Mechanics and its Applications 333, 529–540 (2004). Sequences,” SIAM Review 60, 315–355 (2018). [31] Tiago P. Peixoto, “Nonparametric Bayesian inference of [50] E.A. Bender and J.T. Butler, “Asymptotic Approxima- the microcanonical stochastic block model,” Physical Re- tions for the Number of Fanout-Free Functions,” Com- 19

puters, IEEE Transactions on C-27, 1180–1183 (1978). don Mathematical Society Lecture Note Series , 239–298 [51] N.C. Wormald, “Models of random regular graphs,” Lon- (1999).