
Introducing Graph Cumulants: What is the Kurtosis of Your Social Network?

Lee M. Gunderson∗,1,† & Gecia Bravo-Hermsdorff∗,2,†

∗Both authors contributed equally to this work

1Department of Astrophysical Sciences, Princeton University, Princeton, NJ, 08544, USA
2Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08544, USA

†Correspondence to: [email protected] or [email protected]

In an increasingly interconnected world, understanding and summarizing the structure of such networks becomes increasingly relevant. However, this task is nontrivial; proposed summary statistics are as diverse as the networks they describe, and a standardized hierarchy has not yet been established. In contrast, vector-valued random variables admit such a description in terms of their cumulants (e.g., mean, (co)variance, skew, kurtosis). Here, we introduce the natural analogue of cumulants for networks, building a hierarchical description based on correlations between an increasing number of connections, seamlessly incorporating additional information, such as directed edges, node attributes, and edge weights. These graph cumulants provide a principled and unifying framework for quantifying the propensity of a network to display any substructure of interest (such as cliques to measure clustering). Moreover, they give rise to a natural hierarchical family of maximum entropy models for networks (i.e., ERGMs) that do not suffer from the “degeneracy problem”, a common practical pitfall of other ERGMs.

The power of natural science relies on the ability to find abstract representations of complex systems and describe them with meaningful summary statistics. For example, using the pluripotent language of graph theory, the field of network science distills a variety of systems into the entities comprising them (the nodes) and their pairwise connections (the edges). Remarkably, many systems from a wide variety of fields naturally benefit from such a description, such as electrical circuits [1], brains [2], food webs [3], friendships [4], transportation networks [5], and the internet [6]. As the field developed, real networks were noticed to display several recurring themes [7, 8, 9], such as: sparsity (a small fraction of all potential connections actually exist), heterogeneous degree distributions (some nodes have many more connections than average), and high clustering (nodes form tightly connected groups). As a consequence, measures are often tailored to capture such properties [10, 11, 12] and network models are often chosen to mimic them [13, 14, 15, 16, 17, 18]. However, network statistics are often intertwined [19, 20], and models with notably different mechanisms [16, 17, 18, 21, 22, 23] can reproduce the same commonly observed properties. It is not currently clear how to compare different methods within a single principled framework; a standardized hierarchy of descriptive network statistics and associated models is needed.

Real networks often contain a hierarchy of interconnected scales [24]. This paper formalizes a novel bottom-up approach, where the elementary units are the edges (as compared to the nodes [20]), and the hierarchy is built by considering correlations between an increasing number of them. Relationships between several edges can be expressed as subgraphs. Such substructures are often referred to as network motifs (or anti-motifs) when their appearance (or absence) in a network is deemed to be statistically significant [25]. For example, triangular substructures appear in a wide range of real networks, indicating a common tendency for clustering. However, networks in different domains appear to display different motif profiles, suggesting that quantifying propensity for substructures could play a crucial role in understanding network function. For example, the feed-forward and the bi-fan substructures are prevalent in protein interaction networks, while the bi-directional two-hop substructure is more prominent in transportation networks. Current methods for quantifying a network’s propensity for various substructures are generally characterized by the following two choices. First, one needs a network statistic to measure the propensity for the substructures of interest. Second, to determine statistical significance, one needs a null model for the observed network: an ensemble of networks that matches some set of properties deemed to be important.

For the first choice, a common approach is to simply count the number of instances of a substructure [25, 26, 27]. However, substructure counts can be misleading, and are generally not faithful measures of their propensities [28, 29]. For example, the number of triangles (or of any other substructure) naturally correlates with the number of larger substructures that contain them as subgraphs. Unfortunately, there is currently no standard measure of the propensity for an arbitrary substructure [27]; other choices tend to be rather context-dependent or specifically tailored to the substructure of interest (e.g., clustering coefficients), often leading to a proliferation of essentially incomparable statistics. This issue is exacerbated when one considers incorporating additional information, such as directed edges, node attributes, and edge weights.

Regardless of the measure used, in order to assess whether its value in the observed network is statistically significant, one must compare against its distribution in some appropriate null model. This distribution is often obtained by measuring its value in networks explicitly sampled from the chosen null model. Replicating the observed degree sequence (i.e., the configuration model) is a popular choice of null model. In some cases, this node-centric model may be appropriate, such as when the nodes in the observed network have unique identities that should indeed be preserved by the null model (e.g., generating randomized networks of transactions between some specific set of countries). However, the defining symmetry of graphs is that nodes are a priori indistinguishable (i.e., node exchangeability), and often one desires a null model that is similar to the observed network, but not necessarily with exactly the “same” nodes (e.g., generating typical networks of interactions between students to model the spread of infections). In addition, the configuration model does not treat all substructures equally. For example, hubs and clustering, both hallmark features of real networks, are associated with a propensity for (k-)stars (i.e., a central node connected to k others) and for (k-)cliques (i.e., fully connected groups of k nodes), respectively. However, the configuration model fully constrains the counts of all k-stars, and is therefore useless to assess their relative propensity in the observed network. Moreover, by design, it does not promote clustering at any level, and is thus inappropriate to study the higher-order organization of clustering. Even if one were to employ modifications to promote triangles, the issue is simply postponed to slightly larger substructures [30].

For a real-valued random variable, cumulants provide a hierarchical sequence of summary statistics that efficiently encode its distribution [31, 32, 33]. Aside from their unique mathematical properties1, low-order cumulants have intuitive interpretations. The first two orders, the mean and variance, correspond to the center of mass and spread of a distribution, and are taught in nearly every introductory statistics course [34]. The next two orders have likewise been given unique names (skew and kurtosis), and are useful to describe data that deviate from normality, appearing in a variety of applications, such as finance [35], economics [36], and psychology [37]. Generalizations of these notions have proven similarly useful: for example, joint cumulants (e.g., covariance) are the natural choice for measuring correlations in multivariate random variables. Unsurprisingly, cumulants are essentially universally used by the statistics community.

By generalizing the combinatorial definition of cumulants [38, 39, 40], we obtain their analogue for networks, which we refer to as graph cumulants. After introducing the framework, we demonstrate their usefulness as principled and intuitive measures of the propensity (or aversiveness) of a network to display any substructure of interest, allowing for systematic and meaningful comparisons between networks. We then describe how this framework seamlessly incorporates additional network features, providing examples using real datasets containing directed edges, node attributes, and edge weights. Finally, we show how graph cumulants give rise to a natural hierarchical family of maximum entropy null models from a single observed network.

1In particular, their additive nature when applied to sums of independent random variables (see supplementary materials S8).

For ease of exposition, we first introduce our framework for the case of simple graphs, i.e., undirected, unweighted networks with no self-loops or multiple edges.

Graph moments

The cumulants of a real-valued random variable are frequently given in terms of its moments, i.e., the expected values of its powers. A real network can be thought of as a realization of some graph-valued random variable, and its graph cumulants are likewise given in terms of its graph moments. To motivate our definition of graph moments, consider the measurement that provides the smallest nonzero amount of information about a network G with n nodes. This measurement is a binary query, which yields 1 if an edge exists between a random pair of nodes in G, and 0 otherwise2. At first order, we consider repeated observations of a single such measurement. We define the first-order graph moment µ1 (the “mean”) as the expected value of this quantity: the counts of edges in G normalized by the counts of edges in the complete graph with n nodes. Hence, the first-order graph moment of a network is simply its edge density.

2One might consider measuring a random node instead. However, the presence of a node per se does not give any new information. On the other hand, querying the degree of a node yields an integer, containing more information than the binary result from querying the existence of a single edge.

For the second-order graph moments, we again consider a binary query, but now using two simultaneous such measurements. This query yields 1 if G contains edges between both pairs of nodes, and 0 otherwise. We now must distinguish between two new cases: when the two edges share a node, thereby forming a wedge (µ2∧); and when they do not share any node (µ2∥). Each case is associated with a different second-order graph moment, which we analogously define as the counts of the associated subgraph in G, normalized by the counts of this subgraph in the complete graph with n nodes. Hence, the second-order graph moments of a network are the densities of substructures with two edges.

Likewise, we define an rth-order graph moment for each of the ways that r edges can relate to each other (see fig. 1 for the subgraphs associated with graph moments up to third order). Again, µrg is the density of subgraph g, defined as its counts in G normalized by its counts in the complete graph with the same number of nodes as G.

Fig. 1: Subgraphs associated with graph moments. The graph moment µrg of a network G is defined as the counts of the subgraph g (with r edges), normalized by the counts of this subgraph if connections were present between all pairs of nodes in G. Displayed here are the subgraphs g associated with the graph moments up to third order (for simple graphs). The full set of rth-order moments contains all substructures with exactly r edges (including disconnected subgraphs), corresponding to all the ways that r edges can relate to each other.
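To make the definitions concrete, the following minimal sketch (our own illustration, not the authors' released code; the function name is ours) computes the moments up to third order for a simple graph using networkx:

```python
# Sketch: low-order graph moments of an undirected simple graph.
from math import comb
import networkx as nx

def graph_moments(G):
    n, m = G.number_of_nodes(), G.number_of_edges()
    mu1 = m / comb(n, 2)                      # edge density
    # Wedges: pairs of edges sharing a node; K_n contains 3*C(n,3) of them.
    wedges = sum(comb(d, 2) for _, d in G.degree())
    mu2_wedge = wedges / (3 * comb(n, 3))
    # Disjoint pairs: all pairs of edges minus those sharing a node.
    disjoint = comb(m, 2) - wedges
    mu2_disjoint = disjoint / (comb(comb(n, 2), 2) - 3 * comb(n, 3))
    # Triangles: nx.triangles counts each triangle once per node.
    triangles = sum(nx.triangles(G).values()) // 3
    mu3_triangle = triangles / comb(n, 3)
    return mu1, mu2_wedge, mu2_disjoint, mu3_triangle
```

As a sanity check, every moment of the complete graph equals 1, and for an Erdős–Rényi graph with edge probability p, each moment concentrates around p raised to the number of edges in the corresponding subgraph.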


The scalability of computing graph moments is determined by the computational complexity associated with counting connected subgraphs, as these imply the counts of disconnected subgraphs (see supplementary materials S1.1 for details). This is an important fact, as the counts of disconnected subgraphs are generally orders of magnitude larger, and we can leverage numerous methods for efficiently counting connected subgraphs [41, 42, 43, 44].

Graph cumulants

As the density of smaller substructures increases, the appearance of larger substructures that contain them as subgraphs will clearly also tend to increase. Hence, we would like to measure the difference between the observed value of a given graph moment and that which would be expected due to graph moments of lower order, so as to quantify the propensity (or aversiveness) for that specific network substructure. Cumulants are the natural statistics with this desired property. For example, the variance quantifies the intuitive notion of the “spread” of a distribution (regardless of its mean), while the skew and the kurtosis reflect, respectively, the asymmetry and contribution of large deviations to this spread.

While often defined via the cumulant generating function [45, 46], cumulants have an equivalent combinatorial definition [38, 39, 40] (see fig. 2). At order r, it involves the partitions of a set of r elements:

$$\mu_r = \sum_{\pi \in P_r} \prod_{p \in \pi} \kappa_{|p|}, \qquad (1)$$

where $\mu_r$ is the $r$th moment, $\kappa_r$ is the $r$th cumulant, $P_r$ is the set of all partitions of a set of $r$ unique elements, $\pi$ is one such partition, $p$ is a subset of a partition $\pi$, and $|p|$ is the number of elements in a subset $p$.
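At low order, this expansion can be inverted by hand; the following quick symbolic check (a sketch of ours using sympy) verifies the classical inversion up to third order:

```python
# Verify the classical inversion of eq. (1) up to third order.
import sympy as sp

mu1, mu2, mu3 = sp.symbols('mu_1 mu_2 mu_3')
k1 = mu1                          # kappa_1: the single partition {1}
k2 = mu2 - mu1**2                 # from mu_2 = kappa_2 + kappa_1^2
k3 = mu3 - 3*mu2*mu1 + 2*mu1**3   # from mu_3 = kappa_3 + 3 kappa_2 kappa_1 + kappa_1^3
# Substituting back reproduces the moments exactly.
assert sp.expand(k2 + k1**2) == mu2
assert sp.expand(k3 + 3*k2*k1 + k1**3) == mu3
```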

When generalizing this definition to graph moments, the partitioning $P_r$ of the edges must respect their connectivity (see fig. 2, bottom row), i.e.,

$$\mu_{rg} = \sum_{\pi \in P_{E(g)}} \prod_{p \in \pi} \kappa_{|p|\,g_p}, \qquad (2)$$

where $E(g)$ is the set of the $r$ edges forming subgraph $g$, $P_{E(g)}$ is the set of partitions of these edges, and $g_p$ is the subgraph formed by the edges in a subset $p$. These expressions can then be inverted to yield the graph cumulants in terms of graph moments (summarized in supplementary materials S10 and provided in our code).
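For instance, applying equation 2 to the wedge and the triangle and inverting gives (our worked derivation; the paper's complete tables are in supplementary materials S10):

```latex
% Low-order graph cumulants in terms of graph moments (worked example).
% The triangle expansion uses the fact that any two edges of a triangle
% share a node, so every {2,1} partition contributes a wedge cumulant.
\begin{align*}
  \kappa_1            &= \mu_1, \\
  \kappa_{2\wedge}    &= \mu_{2\wedge} - \mu_1^2, \\
  \kappa_{3\triangle} &= \mu_{3\triangle} - 3\,\mu_{2\wedge}\,\mu_1 + 2\,\mu_1^3.
\end{align*}
```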

Fig. 2: To expand an rth-order graph moment in terms of graph cumulants, enumerate all partitions of the r edges comprising its subgraph. The top three rows illustrate the combinatorial expansion of the classical moments in terms of cumulants:
$$\mu_1 = \kappa_1, \qquad \mu_2 = \kappa_2 + \kappa_1^2, \qquad \mu_3 = \kappa_3 + 3\,\kappa_2\kappa_1 + \kappa_1^3.$$
Analogously, the bottom row shows how to expand the third-order graph moment of the three-edge path in terms of graph cumulants. The last term (κ1³) corresponds to partitioning this subgraph into three subsets, each with a single edge. The first term (κ3) corresponds to “partitioning” this subgraph into a single subset containing all three edges, thus inheriting the connectivity of the entire subgraph. The remaining terms (κ2κ1) correspond to partitioning this subgraph into a subset with one edge and a subset with two edges. This can be done in three different ways: in two cases (the two κ2∧κ1 terms), the subset with two edges has those edges sharing a node; and in one case (the κ2∥κ1 term), the subset with two edges has those edges not sharing any node.

Essentially, the defining feature of cumulants is their additive nature when summing independent random variables [31, 32, 46, 45]. In supplementary materials S8, we show that the graph cumulants of independent graph-valued random variables also have this additive property for a natural notion of summing graphs. Moreover, we remark that the Erdős–Rényi distribution has graph cumulants of zero for all orders r ≥ 2, similar to the classical cumulants of the normal distribution, which are zero for all orders r ≥ 3.

Quantifying propensity for substructures: scaled graph cumulants

Cumulants are often scaled, as dimensionless quantities allow for interpretable comparisons. For example, the precision of a measurement is often quantified by the relative standard deviation ($\kappa_2^{1/2}/\kappa_1$), and the linear correlation between two random variables $X$ and $Y$ is often quantified by the Pearson correlation coefficient ($\mathrm{Cov}(X,Y)/\kappa_2^{1/2}(X)\,\kappa_2^{1/2}(Y)$, i.e., their second-order joint cumulant divided by the geometric mean of their individual second-order cumulants). Likewise, we define scaled graph cumulants as $\tilde{\kappa}_{rg} \equiv \kappa_{rg}/\kappa_1^r$, and report the “signed” $r$th root, i.e., the real number with magnitude equal to $|\tilde{\kappa}_{rg}|^{1/r}$ and with the same sign as $\tilde{\kappa}_{rg}$. These scaled graph cumulants offer principled measures of the propensity (or aversiveness) of a network to display any substructure of interest.
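Concretely, reusing the hypothetical graph_moments sketch from above, the signed cube root of the scaled triangle cumulant could be computed as:

```python
def scaled_triangle_cumulant(G):
    """Signed cube root of the scaled triangle cumulant (r = 3)."""
    mu1, mu2_wedge, _, mu3_tri = graph_moments(G)
    k3_tri = mu3_tri - 3 * mu2_wedge * mu1 + 2 * mu1**3   # see worked example
    k3_scaled = k3_tri / mu1**3                           # kappa / kappa_1^r
    return abs(k3_scaled) ** (1 / 3) * (1 if k3_scaled >= 0 else -1)
```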

Scaling graph cumulants also allows for meaningful comparisons of the propensity of different networks to exhibit a particular substructure, even when these networks have different sizes and edge densities. To illustrate this point, we consider clustering, a hallmark feature of many real networks [7, 8, 13]. This notion is frequently understood as the prevalence of triadic closure [4, 9], an increased likelihood that two nodes are connected if they have mutual neighbors, i.e., a propensity for triangles. This property is often quantified by the clustering coefficient C, defined as the probability that two neighbors of the same node are themselves connected. While this quantity is easily expressed within our formalism as C = µ3△/µ2∧, it is neither a graph cumulant nor dimensionless. We propose that the scaled triangle cumulant κ̃3△ is a more appropriate measure of clustering in networks, as demonstrated in fig. 3 (for the natural extension to clustering in bipartite networks, see fig. S1).
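The qualitative behavior in fig. 3 can be probed with a few lines (hypothetical usage of the sketches above; the block probabilities are arbitrary choices of ours):

```python
import networkx as nx

# Fixed average degree (about 16) while n varies: C drifts with size
# and density; the scaled triangle cumulant stays comparatively stable.
for n in (128, 512):
    a, b = 24 / n, 8 / n   # within- and between-community edge probabilities
    G = nx.stochastic_block_model([n // 2, n // 2], [[a, b], [b, a]], seed=0)
    print(n, nx.transitivity(G), scaled_triangle_cumulant(G))
```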

Fig. 3: The scaled triangle cumulant provides a principled measure of triadic closure. We sampled networks from the symmetric stochastic block model on n nodes with 2 communities, SSBM(n, 2, a, b), for a range of a and b, for both 128 nodes (top) and 512 nodes (bottom). The horizontal axis indicates the level of assortativity, (a − b)/(a + b): 0 corresponds to Erdős–Rényi (ER) graphs (no community structure), +1 to two disjoint ER graphs (only connections within the two communities), and −1 to random bipartite graphs (only connections between the two communities, thus, no triangles). The vertical axis indicates the edge density, in terms of average degree ⟨d⟩. For each set of parameters, we compute the average of both the global clustering coefficient C (left) and the scaled triangle cumulant κ̃3△ (right), displaying the signed fourth root for κ̃3△. While both measures increase with assortativity (as desired), the clustering coefficient also varies with network size and edge density, whereas the scaled triangle cumulant is insensitive to such differences.

Graph cumulants for networks with additional features

While it is possible to treat most networks as undirected, unweighted graphs, real networks frequently contain more information (e.g., node attributes). Our formalism naturally incorporates many such augmentations. As before, there is a graph moment of order r for each of the unique substructures (now endowed with the additional features) containing r edges. The conversion to graph cumulants likewise respects the additional features. We now discuss the specific cases of directed edges, node attributes, and edge weights. In supplementary materials S2, we illustrate their ability to quantify the propensity for different substructures in a variety of real networks with these additional features. In supplementary materials S10, we provide expressions for computing some of these augmented graph moments and graph cumulants. For clarity, we consider each of these features individually; combining them is relatively straightforward.

Directed edges

When analyzing a directed network, the graph moments must incorporate the orientation of the edges. While the first-order graph moment (µ1) still simply considers the number of directed edges (as edge orientation only carries meaning when considered with respect to the orientation of some other edge), there are now five second-order directed graph moments. The wedge configuration (two edges sharing one node) is now associated with three moments: one with both edges oriented towards the central node, one with both edges oriented away from it, and one with an edge towards and the other away. The relative orientation of two edges that do not share any node cannot be determined, and therefore this configuration is still associated with a single moment. Finally, the configuration of two reciprocal edges (i.e., two nodes that are connected by edges in both directions) is associated with the fifth second-order moment. The appropriate normalization is with respect to the counts in the complete directed graph, i.e., that which has every pair of nodes connected by edges in both directions. Fig. S3 illustrates how incorporating the directed nature of protein interaction networks reveals additional structure.
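A sketch (ours; the function name and the counting identities are our own derivations) of the four connected second-order directed moments for a networkx DiGraph; the fifth (disjoint-pair) moment follows by subtracting the connected pairs from all pairs of edges:

```python
from math import comb
import networkx as nx

def directed_second_moments(D):
    """In-wedge, out-wedge, through-wedge, and reciprocal moments."""
    n = D.number_of_nodes()
    din, dout = dict(D.in_degree()), dict(D.out_degree())
    recip = sum(1 for u, v in D.edges() if D.has_edge(v, u)) // 2
    # Each reciprocal dyad is counted twice in sum(d_in * d_out).
    through = sum(din[v] * dout[v] for v in D) - 2 * recip
    wedge_norm = 3 * comb(n, 3)               # = n * C(n-1, 2) in complete digraph
    mu2_in = sum(comb(d, 2) for d in din.values()) / wedge_norm
    mu2_out = sum(comb(d, 2) for d in dout.values()) / wedge_norm
    mu2_through = through / (n * (n - 1) * (n - 2))
    mu2_recip = recip / comb(n, 2)
    return mu2_in, mu2_out, mu2_through, mu2_recip
```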

Node attributes

Often, nodes of a network have intrinsic attributes (e.g., demographics for social networks). The graph moments of such networks are defined by endowing the subgraphs with the same attributes. For example, consider the case of a network in which every node has one of two possible “flavors”: “charm” and “strange”. There are now three first-order graph moments: an edge between two “charm” nodes, an edge between two “strange” nodes, and an edge between one of each. To compute the moments, we normalize by their counts in the complete graph on the same set of nodes: here, $\binom{n_c}{2}$, $\binom{n_s}{2}$, and $n_c n_s$, respectively (where $n_c$ and $n_s$ are the numbers of “charm” and “strange” nodes). Fig. S2 considers a (binary) gendered network of primary school students [47], illustrating how incorporating node attributes elucidates the correlations between node type and their connectivity patterns.

A common special case of networks with node attributes are bipartite networks: nodes have one of two types (e.g., authors and publications [48], plants and pollinators [49]) and edges are only allowed between nodes of different type. As certain subgraphs are now unrealizable, bipartite networks have only one first-order moment (an edge connecting a “charm” to a “strange”), and two second-order wedge moments: a “charm” node connected to two “strange” nodes, and a “strange” node connected to two “charm” nodes.
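A minimal sketch (ours) of the three first-order moments for a two-flavor network; storing the attribute under the node-data key "flavor" is our assumption:

```python
from math import comb

def flavored_first_moments(G, a="charm", b="strange"):
    # Hypothetical convention: each node stores G.nodes[v]["flavor"].
    na = sum(1 for v in G if G.nodes[v]["flavor"] == a)
    nb = G.number_of_nodes() - na
    m_aa = m_bb = m_ab = 0
    for u, v in G.edges():
        fu, fv = G.nodes[u]["flavor"], G.nodes[v]["flavor"]
        if fu == fv == a:
            m_aa += 1
        elif fu == fv == b:
            m_bb += 1
        else:
            m_ab += 1
    # Normalize by the counts in the complete graph on the same node set.
    return m_aa / comb(na, 2), m_bb / comb(nb, 2), m_ab / (na * nb)
```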

Edge weights

To compute graph moments for weighted networks, subgraphs should be counted with multiplicity equal to the product of their edge weights [24] (see supplementary materials S8 for a detailed justification). The normalization is the same as in the unweighted case, i.e., divide the (weighted) count of the relevant subgraph by the count of this subgraph in the unweighted complete graph with the same number of nodes. Note that, unlike for unweighted networks, the graph moments of weighted networks may be greater than one. In fig. S4, we analyze a weighted network of social interactions [50], illustrating how allowing for variable connection strength can increase statistical significance and even change the resulting interpretations.
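For example, a sketch (ours) of the weighted wedge moment, counting each wedge with multiplicity equal to the product of its two edge weights:

```python
from math import comb

def weighted_wedge_moment(G, weight="weight"):
    """Wedge moment of a weighted networkx Graph (missing weights default to 1)."""
    n = G.number_of_nodes()
    total = 0.0
    for v in G:
        w = [d.get(weight, 1.0) for _, _, d in G.edges(v, data=True)]
        s1, s2 = sum(w), sum(x * x for x in w)
        total += (s1 * s1 - s2) / 2.0   # sum of w_i * w_j over pairs at v
    return total / (3 * comb(n, 3))     # normalize by wedges in K_n
```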

Inference from a single network

Unbiased estimators of graph cumulants

Thus far, we have not made a distinction between the graph moments of an observed network $G$ and those of the distribution $\mathcal{G}$ from which this network was sampled. This is because, in a sense, they are the same: $\langle \mu_{rg}(G) \rangle = \mu_{rg}(\mathcal{G})$ (where the angled brackets $\langle \cdot \rangle$ denote expectation with respect to the distribution $\mathcal{G}$), a property known as “inherited on the average” [51]. However, for cumulants, this distinction is important. Cumulants of a distribution are defined by first computing the moments of the distribution, then converting them to cumulants (as opposed to computing the cumulants of the individual samples, then taking the expectation of those quantities). Because the cumulants are nonlinear functions of the moments, they are not necessarily preserved in expectation, i.e., not inherited on the average [51, 52]. For example, consider the problem of estimating the variance of a distribution over the real numbers given $n$ observations from it. Simply using the variance of these observations gives an estimate whose expectation is less than the variance of the underlying distribution, and one should multiply it by the well-known correction factor of $\frac{n}{n-1}$. The generalizations of this correction factor to higher-order cumulants are known as the k-statistics [52]; given a finite number of observations, they are the minimum-variance unbiased estimators of the cumulants of the underlying distribution [53, 54, 55, 56, 57, 58].

In many applications, one wishes to estimate a distribution $\mathcal{G}$ over networks after observing only a single network $G$. Just as with classical cumulants, applying equation 2 to this network yields biased estimates of the graph cumulants of the underlying distribution, i.e., $\langle \kappa_{rg}(G) \rangle \neq \kappa_{rg}(\mathcal{G})$. In supplementary materials S3, we describe a procedure to obtain unbiased estimators of graph cumulants $\check{\kappa}_{rg}$, as well as the variance of these estimators $\mathrm{Var}(\check{\kappa}_{rg})$ (supplementary materials S6.2). In particular, for simple graphs, we provide the expressions for these unbiased estimators up to third order, as well as the variance for first order. Obtaining a complete list of the expressions for the unbiased estimators of graph cumulants and their variance would provide a powerful tool for network science (see supplementary materials S6.1 for a detailed discussion). This would allow for principled statistical tests of the propensity (or aversiveness) for any substructure without explicitly sampling from some null model, a procedure that is in general quite computationally expensive and often a main obstacle in the analysis of real networks [20, 59, 60, 61, 62]. However, sometimes one indeed requires samples from the underlying distribution, such as when assessing the statistical significance of some property that cannot be easily expressed in terms of the statistics defining this distribution. We now discuss how these unbiased graph cumulants can be used to obtain a principled hierarchical family of network models (see supplementary materials S4 and S5 for more details).
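A quick numerical illustration of the classical bias that motivates these estimators (a minimal sketch; the uniform distribution, sample size, and trial count are arbitrary choices, not from the paper):

```python
import random

# The plain variance of a sample underestimates the distribution variance
# by a factor of (n-1)/n; the n/(n-1) correction removes the bias.
random.seed(0)
n, trials = 5, 200_000
true_var = 1.0 / 12.0  # variance of Uniform(0, 1)

def variance_of_sample(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

avg = sum(variance_of_sample([random.random() for _ in range(n)])
          for _ in range(trials)) / trials
print(avg, (1 - 1 / n) * true_var)  # biased: expectation is (n-1)/n of the truth
print(avg * n / (n - 1), true_var)  # corrected: matches the true variance
```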

A principled hierarchical family of network models

A ubiquitous problem, arising in many forms, is that of estimating a probability distribution based on partial knowledge. Often, one desires the distribution to have some set of properties, but the problem is typically still highly underconstrained. In such cases, the maximum entropy principle [63, 64, 65] provides a natural solution: of the distributions that satisfy these desired properties, choose the one that assumes the least amount of additional information, i.e., that which maximizes the entropy. For example, when modeling real-valued data, one often uses a Gaussian, the maximum entropy distribution (over the reals) with prescribed mean and variance. The analogous maximum entropy distributions for networks are known as exponential random graph models (ERGMs). These models are used to analyze a wide variety of real networks [66, 67, 68, 69, 70]. Although it is possible to define an ERGM by prescribing any set of realizable properties, attention is often given to the counts of edges and of other context-dependent substructures (such as wedges or triangles for social networks [4, 60, 71, 72]). While individual networks sampled from this distribution do not necessarily have the same counts (of edges and of specified substructures), the average values of these counts are required to match those of the observed network. Unfortunately, ERGMs of this type often result in pathological distributions, exhibiting strong multimodality, such that typical samples have properties far from those of the observed network they were intended to model. For example, the networks sampled from such an ERGM might be either very dense or very sparse, despite the fact that averaging over these networks yields the same edge count as that of the observed network. This phenomenon is known as the

“degeneracy problem”, and much effort has gone into understanding it [73, 72, 74]. While some remedies have been proposed [75, 76, 77], a principled and systematic method for alleviating degeneracy has thus far remained elusive.

Based on our framework, we propose a hierarchical family of ERGMs that are immune to the degeneracy problem (see fig. 4 for an example and supplementary materials S5 for details). Our hierarchy is based on correlations between an increasing number of individual connections: at $r$th order, the proposed ERGM is specified by all the graph cumulants of order at most $r$. Importantly, when inferring such an ERGM from a single observed network $G$, it is appropriate to use the unbiased graph cumulants, such that $\kappa_{rg}(\mathcal{G}) = \check{\kappa}_{rg}(G)$ for all subgraphs through the chosen order (see supplementary materials S4 for the detailed protocol).

There are two main differences between our proposed family of ERGMs and those typically used in the literature. First, our family prescribes the expected counts of all subgraphs (including disconnected ones) of at most the desired order (see supplementary materials S9 for a motivation of this choice based on a spectral representation of distributions over networks). In contrast, current ERGMs only consider some (usually connected) subgraphs of interest, perhaps because not all of these subgraphs are deemed important for modeling the observed network, or because disconnected subgraphs are not usually thought of as motifs [25, 78]. Second, the use of unbiased graph cumulants results in a distribution with expected subgraph counts different from those of the observed network. Nevertheless, the ERGM distribution induced by this choice generates samples that are appropriately clustered around the observed network (see fig. 4 and supplementary materials S5 for detailed intuition).

Fig. 4: Our proposed family of ERGMs does not suffer from the “degeneracy problem”. We compute the exact distributions over simple graphs with 10 nodes resulting from fitting two ERGMs to the same observed network (shown on the right), and plot the resulting distributions of: a) edge counts, and b) wedge (i.e., 2-star) counts. Green corresponds to a network model frequently used in the literature: ERGM(µ1, µ2), the maximum entropy distribution with prescribed expected counts of edges and wedges. Purple corresponds to our analogous second-order network model: ERGM(µ̌1, µ̌2, µ̌2), with prescribed expected unbiased counts of substructures of first and second order, i.e., edges, wedges, and two edges that do not share any node. We fit the model using the procedure described in supplementary materials S4, with unbiasing parameter η = 1/11. Black lines denote the counts in the observed network that these distributions are intended to model. In sharp contrast to our proposed ERGM(µ̌1, µ̌2, µ̌2), the currently used ERGM(µ1, µ2) can result in a distribution whose typical samples are notably different from the observed network. This is reflected in the fact that the green distributions have maxima far from their means. This undesirable behavior, known as the degeneracy problem, tends to become even more pronounced for larger networks [75]. In supplementary materials S5, we explain why this occurs and why our proposed family of ERGMs does not suffer from this problem.

Discussion

Over a century ago, Thiele introduced cumulants [79], a concept now fundamental to the field of statistics [32, 33, 46, 45, 39, 40], which has justifiably percolated throughout the

scientific community [35, 36, 37, 38, 80]. In this work, we introduce graph cumulants, their generalization to networks (fig. 2). This principled hierarchy of network statistics provides

a framework to systematically describe and compare networks (figs. 3 and S1), naturally including those with additional features, such as directed edges (fig. S3), node attributes

(fig. S2), and edge weights (fig. S4). Moreover, through the lens of the maximum entropy principle, these statistics induce a natural hierarchical family of network models. These models are immune to the “degeneracy problem”, providing a principled prescription for obtaining distributions that are clustered around the properties of the network they intend to model (figs. 4, S5, and S6). To make appropriate predictions, one must acknowledge that the observed data are but one instantiation, inherently incomplete and stochastic, of some underlying process. In network science, one typically has a single network observation, and would like to make inferences about the distribution from which it came. This is analogous to characterizing the distribution of a classical random variable given a finite collection of samples. However, aside from the mean, the cumulants of a finite sample are not the same (in expectation) as those of the underlying distribution. The desired unbiased estimators are known as the k-statistics [52, 56, 58] (e.g.,

the $\frac{n}{n-1}$ correction factor for the sample variance). Characterizing the distributions of these unbiased estimators allows for a variety of principled statistical tests (e.g., the use of the $\chi^2$ distribution for analyzing the sample variance). In supplementary materials S3, we provide a procedure for deriving the analogous unbiased estimators of graph cumulants given a single network observation, and in supplementary materials S6.2, we provide a procedure for deriving their variance. While the derivations are rather tedious, once the expressions are obtained, they can be used to systematically measure the statistical significance of the propensity (or aversiveness) for arbitrary substructures (see supplementary materials S6.1). This is a promising avenue, as it circumvents the need for constructing and sampling from a network null model, a major challenge for many current methods. As with the rest of our framework, such an analysis naturally incorporates additional features, such as directed edges,

node attributes, and edge weights.

Graph cumulants quantify the propensity for substructures throughout the entire network.

However, in some applications, statistics that quantify the propensity of individual nodes (or edges) to participate in these substructures are more appropriate. For example, the local clustering coefficient [13] aims to describe the propensity of a node to participate in triangles

(as does the edge clustering coefficient for edges). However, as before, there is no general framework for arbitrary substructures. Our graph cumulant formalism again offers a systematic prescription. Essentially, the local graph cumulants are obtained by giving the node (or edge) of interest a unique identity and computing the graph cumulants of the entire network with this augmented information (see supplementary materials S7 for details). These local graph cumulants could serve as useful primitives in machine learning tasks such as node classification

[81, 82, 83] and link prediction [84]. Just as the scientific community has converged upon the variance as the canonical measure of the spread of a distribution, the field of network science could greatly benefit from a similarly standardized measure of the propensity for an arbitrary substructure. Inspired by over a century of work in theoretical statistics, the framework of graph cumulants introduced in this paper provides a uniquely principled solution.

References

1. D. J. Klein, M. Randić, Resistance distance. J. Math. Chem. 12, 81-95 (1993).

2. C. W. Lynn, D. S. Bassett, The physics of brain network structure, function and control.

Nat. Rev. Phys. 1, 318-332 (2019).

3. P. Landi, H. O. Minoarivelo, Å. Brännström, C. Hui, U. Dieckmann, Complexity and

stability of ecological networks: a review of the theory. Popul. Ecol. 60, 319-345 (2018).

4. S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications (Cambridge Univ. Press, Cambridge, 1994).

5. M. G. Bell, Y. Iida, Transportation network analysis (Wiley, 1997).

6. W. Hall, T. Tiropanis, Web evolution and web science. Comput. Netw. 56, 3859-3865 (2012).

7. G. Cimini, et al., The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58-71 (2019).

8. A.-L. Barabási, et al., Network Science (Cambridge University Press, 2016).

9. M. Newman, Networks (Oxford University Press, 2018).

10. S. Emmons, S. Kobourov, M. Gallant, K. Börner, Analysis of network clustering

algorithms and cluster quality metrics at scale. PloS One 11, e0159161 (2016).

11. S. Goswami, C. Murthy, A. K. Das, Sparsity measure of a network graph: Gini index.

Inf. Sci. 462, 16-39 (2018).

12. E. D. Demaine, et al., Structural sparsity of complex networks: Bounded expansion in

random models and real-world graphs. J. Comput. Syst. Sci. 105, 199-241 (2019).

13. D. J. Watts, S. H. Strogatz, Collective dynamics of “small-world” networks. Nature 393, 440-442 (1998).

14. A.-L. Barabási, R. Albert, Emergence of scaling in random networks. Science 286, 509-512 (1999).

15. E. R. Colman, G. J. Rodgers, Complex scale-free networks with tunable power-law

exponent and clustering. Physica A 392, 5501-5510 (2013).

16. D. Cai, T. Campbell, T. Broderick, “Edge-exchangeable graphs and sparsity”, in NIPS'16 Proceedings of the 29th International Conference on Neural Information Processing

Systems (2016).

17. A. Kartun-Giles, D. Krioukov, J. Gleeson, Y. Moreno, G. Bianconi, Sparse power-law

network model for reliable statistical predictions based on sampled data. Entropy 20, 257 (2018).

18. P. Holme, B. J. Kim, Growing scale-free networks with tunable clustering. Phys. Rev. E

65, 026107 (2002).

19. C. I. Del Genio, T. Gross, K. E. Bassler, All scale-free networks are sparse. Phys. Rev. Lett.

107, 178701 (2011).

20. C. Orsini, et al., Quantifying randomness in real networks. Nat. Commun. 6, 8627 (2015).

21. F. Caron, E. B. Fox, Sparse graphs using exchangeable random measures. J. Royal

Stat. Soc. 79, 1295-1366 (2017).

22. C. I. Sampaio Filho, A. A. Moreira, R. F. Andrade, H. J. Herrmann, J. S. Andrade Jr,

Mandala networks: ultra-small-world and highly sparse graphs. Sci. Rep. 5, 9082 (2015).

23. K. Zuev, M. Boguñá, G. Bianconi, D. Krioukov, Emergence of soft communities from

geometric preferential attachment. Sci. Rep. 5, 9421 (2015).

24. L. Lovász, Large Networks and Graph Limits (American Mathematical Soc., 2012).

25. R. Milo, et al., Network motifs: simple building blocks of complex networks. Science

298, 824-827 (2002).

26. U. Alon, Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450-461 (2007).

27. F. Xia, H. Wei, S. Yu, D. Zhang, B. Xu, A survey of measures for network motifs. IEEE

Access 7, 106576–106587 (2019).

28. C. Fretter, M. Müller-Hannemann, M.-T. Hütt, Subgraph fluctuations in random graphs.

Phys. Rev. E 85, 056119 (2012).

29. R. Ginoza, A. Mugler, Network motifs come in sets: Correlations in the randomization

process. Phys. Rev. E 82, 011921 (2010).

30. M. Ritchie, L. Berthouze, I. Z. Kiss, Generation and analysis of networks with a prescribed

degree sequence and subgraph family: higher-order structure matters. J. Complex Netw. 5, 1-31 (2017).

31. T. N. Thiele, Theory of observations (Charles & Edwin Layton, 1903).

32. G.-C. Rota, J. Shen, On the combinatorics of cumulants. J. Combin. Theory A 91, 283-304 (2000).

33. P. McCullagh, Tensor Methods in Statistics: Monographs on Statistics and Applied

Probability (Chapman and Hall, 1987).

34. B. Gunderson, M. Aliaga, Interactive statistics (Prentice Hall, 1999).

35. X. Wu, M. Xia, H. Zhang, Forecasting VaR using realized EGARCH model with skewness and kurtosis. Finance Res. Lett. (2019).

36. S. Vähämaa, Skewness and kurtosis adjusted Black-Scholes model: A note on hedging

performance. Finance Lett. 1, 6-12 (2003).

37. M. J. Blanca, J. Arnau, D. López-Montiel, R. Bono, R. Bendayan, Skewness and kurtosis

in real data samples. Methodology: Eur. J. Res. Meth. Behav. Soc. Sci. 9, 78-84 (2013).

38. M. Kardar, Statistical Physics of Particles (Cambridge Univ. Press., 2007).

39. T. Speed, Cumulants and partition lattices i. J. Aust. Math. Soc. 25, 378-388 (1983).

40. T. Speed, Cumulants and partition lattices ii: Generalised k-statistics. J. Aust. Math. Soc.

40, 34-53 (1986).

41. R. Curticapean, H. Dell, D. Marx, “Homomorphisms are a good basis for counting small subgraphs”, in STOC’17 Proceedings of the 49th ACM Symposium on Theory of Computing (2017).

42. F. V. Fomin, D. Lokshtanov, V. Raman, S. Saurabh, B. R. Rao, Faster algorithms for

finding and counting subgraphs. J. Comput. Syst. Sci. 78, 698-706 (2012).

43. O. Amini, F. V. Fomin, S. Saurabh, “Counting subgraphs via homomorphisms”, in International Colloquium on Automata, Languages, and Programming (2009).

44. A. Pinar, C. Seshadhri, V. Vishal, “Escape: Efficiently counting all 5-vertex subgraphs”, in Proceedings of the 26th International Conference on World Wide Web (2017).

45. A. Hald, The early history of the cumulants and the gram-charlier series. Int. Stat. Rev. 68, 137-153 (2000).

46. B. Gnedenko, A. Kolmogorov, Limit distributions for sums of independent random

variables (Addison-Wesley, 1954).

47. J. Stehlé, et al., High-resolution measurements of face-to-face contact patterns in a

primary school. PloS One 6, e23176 (2011).

48. G. Bravo-Hermsdorff, et al., Gender and collaboration patterns in a temporal scientific

authorship network. Appl. Netw. Sci. 4, 112 (2019).

49. C. Campbell, S. Yang, R. Albert, K. Shea, A network model for plant–pollinator

community assembly. Proc. Natl. Acad. Sci. U.S.A. 108, 197-202 (2011).

50. L. Isella, et al., What’s in a crowd? analysis of face-to-face behavioral networks.

J. Theor. Biol. 271, 166-180 (2011).

51. J. W. Tukey, Some sampling simplified. J. Am. Stat. Assoc. 45, 501-519 (1950).

52. R. A. Fisher, Moments and product moments of sampling distributions.

Proc. Lond. Math. Soc. 2, 199-238 (1930).

53. P. R. Halmos, The theory of unbiased estimation. Ann. Math. Stat. 17, 34-43 (1946).

54. M. Kendall, Some properties of k-statistics. Ann. Eugen. 10, 106–111 (1940).

55. M. Kendall, Proof of fisher’s rules for ascertaining the sampling semi-invariants of k-

statistics. Ann. Eugen. 10, 215-222 (1940).

56. J. F. Kenney, E. Keeping, Mathematics of statistics Vol. II (D. Van Nostrand Co., 1951).

57. G. James, On moments and cumulants of systems of statistics. Sankhyā: The Indian

Journal of Statistics 20, 1-30 (1958).

58. N. Fisher, Unbiased estimation for some non-parametric families of distributions.

Ann. Stat. 10, 603-615 (1982).

59. C. T. Butts, A perfect sampling method for random graph models.

J. Math. Sociol. 42, 17-36 (2018).

60. G. Robins, P. Pattison, Y. Kalish, D. Lusher, An introduction to exponential random graph

(p*) models for social networks. Soc. Netw. 29, 173-191 (2007).

61. P. Wang, K. Sharpe, G. L. Robins, P. E. Pattison, Exponential random graph p* models

for affiliation networks. Soc. Netw. 31, 12-25 (2009).

62. V. Veitch, D. M. Roy, Sampling and estimation for (sparse) exchangeable graphs.

Ann. Stat. 47, 3274-3299 (2019).

63. E. T. Jaynes, Information theory and statistical mechanics. Phys. Rev. 106, 620-630 (1957).

64. E. T. Jaynes, Information theory and statistical mechanics. ii. Phys. Rev. 108, 171-190 (1957).

65. S.-I. Amari, Information geometry and its applications, vol. 194 (Springer, 2016).

66. Z. M. Saul, V. Filkov, Exploring biological network structure using exponential random

graph models. Bioinformatics 23, 2604-2611 (2007).

67. A. Chakraborty, H. Krichene, H. Inoue, Y. Fujiwara, Exponential random graph models

for the japanese bipartite network of banks and firms. J. Comput. Soc. Sci. 2, 3-13 (2019).

68. J. Koskinen, P. Wang, G. Robins, P. Pattison, Outliers and influential observations in

exponential random graph models. Psychometrika 83, 809-830 (2018).

69. J. Park, M. E. J. Newman, Statistical mechanics of networks. Phys. Rev. E 70, 066117 (2004).

70. C. E. Buddenhagen, et al., Epidemic network analysis for mitigation of invasive pathogens

in seed systems: Potato in ecuador. Phytopathology 107, 1209-1218 (2017).

71. D. Strauss, On a general class of models for interaction. SIAM Review 28, 513-527 (1986).

72. J. Park, M. E. J. Newman, Solution of the two-star model of a network. Phys. Rev. E 70, 066146 (2004).

73. S. Chatterjee, P. Diaconis, Estimating and understanding exponential random graph

models. Ann. Stat. 41, 2428-2461 (2013).

74. J. Park, M. E. J. Newman, Solution for the properties of a clustered network. Phys. Rev. E

72, 026136 (2005).

75. S. Horvát, É. Czabarka, Z. Toroczkai, Reducing degeneracy in maximum entropy models

of networks. Phys. Rev. Lett. 114, 158701 (2015).

76. T. A. Snijders, P. E. Pattison, G. L. Robins, M. S. Handcock, New specifications for

exponential random graph models. Sociol. Method. 36, 99-153 (2006).

77. A. Caimo, N. Friel, Bayesian inference for exponential random graph models. Soc. Netw.

33, 41-55 (2011).

78. A. R. Benson, D. F. Gleich, J. Leskovec, Higher-order organization of complex networks.

Science 353, 163-166 (2016).

79. T. Thiele, The general theory of observations. Reitzel, Copenhagen (1889).

80. R. Kubo, Generalized cumulant expansion method. J. Phys. Soc. Jpn. 17, 1100-1120 (1962).

81. T. N. Kipf, M. Welling, “Semi-supervised classification with graph convolutional networks”, in ICLR'17 Proceedings of the 5th International Conference on Learning Representations (2017).

82. W. L. Hamilton, R. Ying, J. Leskovec, Representation learning on graphs: Methods and

applications. IEEE Data Eng. Bull. 40, 52-74 (2017).

83. W. Hamilton, Z. Ying, J. Leskovec, “Inductive representation learning on large graphs”,

in NeurIPS’17 Proceedings of the 31st International Conference on Neural Information Processing Systems (2017).

84. D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks.

J. Am. Soc. Inf. Sci. Tec. 58, 1019-1031 (2007).

85. G. Csardi, T. Nepusz, et al., The igraph software package for complex network research.

InterJournal 1695, 1-9 (2006).

86. I. Wolfram Research, Mathematica (Wolfram Research, Inc., Champaign, Illinois, 2019).

87. M. Danisch, O. Balalau, M. Sozio, “Listing k-cliques in sparse real-world graphs”, in WWW’18 Proceedings of the 27th World Wide Web Conference (2018).

88. N. Chiba, T. Nishizeki, Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 210-223 (1985).

89. M. Kuba, A. Panholzer, On the degree distribution of the nodes in increasing trees.

J. Combin. Theory A 114, 597-618 (2007).

90. M. Houbraken, et al., The index-based subgraph matching algorithm with general symmetries (ISMAGS): exploiting symmetry for faster subgraph enumeration. PloS One

9, e97896 (2014).

91. S. Wernicke, Efficient detection of network motifs. IEEE/ACM Trans. Com-

put. Biol. Bioinform. 3, 347-359 (2006).

92. G. M. Slota, K. Madduri, “Fast approximate subgraph counting and enumeration”, in ICPP'13 Proceedings of the 42nd International Conference on Parallel Processing

(2013).

93. P. Ribeiro, F. Silva, L. Lopes, “Efficient parallel subgraph counting using g-tries”, in

Proceedings of the 11th IEEE International Conference on Cluster Computing (2010).

94. M. Aliakbarpour, et al., Sublinear-time algorithms for counting star subgraphs via edge

sampling. Algorithmica 80, 668-697 (2018).

95. K. Miyajima, T. Sakuragawa, Continuous and robust clustering coefficients for weighted

and directed networks. arXiv:1412.0059 (2014).

96. G. Fagiolo, Clustering in complex directed networks. Phys. Rev. E 76, 026107 (2007).

97. A. Barrat, M. Barthélemy, R. Pastor-Satorras, A. Vespignani, The architecture of complex

weighted networks. Proc. Natl. Acad. Sci. USA 101, 3747-3752 (2004).

98. I. E. Antoniou, E. T. Tsompa, Statistical analysis of weighted networks. Discrete

Dyn. Nat. Soc. 2008, 375452 (2008).

99. G. Robins, M. Alexander, Small worlds among interlocking directors: Network structure

and distance in bipartite graphs. Comput. Math. Organ. Theory 10, 69-94 (2004).

100. J. C. Brunson, Triadic analysis of affiliation networks. Netw. Sci. 3, 480-508 (2015).

101. P. G. Lind, M. C. González, H. J. Herrmann, Cycles and clustering in bipartite networks.

Phys. Rev. E 72, 056127 (2005).

102. N. Bhardwaj, K.-K. Yan, M. B. Gerstein, Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels.

Proc. Natl. Acad. Sci. USA 107, 6841-6846 (2010).

103. https://oeis.org/A000088, The on-line encyclopedia of integer sequences (OEIS), entry A000088 (founded by Neil Sloane in 1964).

104. M. Nouri, S. Talatahari, A. S. Shamloo, Graph products and its applications in

mathematical formulation of structures. J. Appl. Math. 2012, 510180 (2012).

105. X. Gao, B. Xiao, D. Tao, X. Li, A survey of graph edit distance. Pattern Anal. Appl. 13, 113-129 (2010).

106. A. Ruciński, When are small subgraphs of a random graph normally distributed?

Probab. Theory Relat. Fields 78, 1-10 (1988).

107. P. G. Maugis, S. Olhede, C. Priebe, P. Wolfe, Testing for equivalence of network

distribution using subgraph counts. J. Comput. Graph. Stat., 1-29 (2020).

108. P. J. Bickel, A. Chen, E. Levina, The method of moments and degree distributions for

network models. Ann. Statist. 39, 2280-2301 (2011).

109. T. Opsahl, P. Panzarasa, Clustering in weighted networks. Soc. Netw. 31, 155-163 (2009).

110. P. Zhang, et al., Clustering coefficient and community structure of bipartite networks.

Phys. A. 387, 6869-6875 (2008).

111. C. Borgs, et al., “Graph limits and parameter testing”, in Proceedings of the 38th annual

ACM symposium on Theory of Computing (2006).

Supplementary Materials

S1 Python module for computing graph cumulants

We provide a Python module that computes the graph moments of networks, including networks with directed edges, edge weights, and binary node attributes. Conversions to and from graph cumulants (as well as their unbiased counterparts) are also implemented. We use the Python package igraph [85] to count instances of subgraphs in a network. In conjunction with the symbolic computation available in Mathematica [86], we automatically derived the expressions for the counts of disconnected subgraphs in terms of the connected counts (see supplementary materials S1.1), as well as the expressions for converting graph moments to graph cumulants (see supplementary materials S10). Our code contains the expressions to obtain graph cumulants up to: sixth order for undirected unweighted networks; fifth order for undirected weighted networks (as well as all sixth-order connected subgraphs); fifth order for directed unweighted networks (as well as the sixth-order $K_3$ subgraph); subgraphs of $K_{2,2}$ for bipartite networks; and subgraphs of $K_3$ for networks with binary node attributes. Expressions for the unbiased graph cumulants are implemented up to third order for undirected unweighted networks (see supplementary materials S3).

S1.1 Efficiently computing graph moments

The scalability of our framework is determined by the computational time required to count the relevant connected subgraphs. This is because the counts of the disconnected subgraphs can be inferred from the counts of the connected subgraphs, and therefore do not need to be explicitly enumerated. This is important, as the counts of the disconnected subgraphs are generally orders of magnitude larger than those of the connected ones. Moreover, counting

connected subgraphs is an active area of research, and much work has gone into their efficient computation [41, 42, 43, 44].

To illustrate how the counts of disconnected subgraphs are derivable from the connected subgraph counts, consider the case of second-order moments for simple graphs. From first order, one has the count of edges in the network, $c_{\text{edge}} = \binom{n}{2}\mu_1$. Consider all unordered pairs of distinct edges; each pair corresponds to a single second-order count: either a wedge, or two edges that do not share any node, thus

$$\binom{c_{\text{edge}}}{2} = c_{\text{wedge}} + c_{\text{edge}\sqcup\text{edge}}.$$

Hence, the count of two edges that do not share any node, $c_{\text{edge}\sqcup\text{edge}}$, is directly derivable from the count of edges $c_{\text{edge}}$ and the count of wedges $c_{\text{wedge}}$. A similar argument applies at all orders. For instance, at third order, there are two disconnected subgraphs: three edges that do not share any node, and a wedge and an edge that do not share any node. By enumerating all triplets of distinct edges, as well as all pairs of a wedge and an edge not contained in that wedge, we obtain the following expressions:

$$\binom{c_{\text{edge}}}{3} = c_{\text{triangle}} + c_{3\text{-star}} + c_{3\text{-path}} + c_{\text{wedge}\sqcup\text{edge}} + c_{\text{edge}\sqcup\text{edge}\sqcup\text{edge}},$$
$$c_{\text{wedge}}\,(c_{\text{edge}} - 2) = 3c_{\text{triangle}} + 3c_{3\text{-star}} + 2c_{3\text{-path}} + c_{\text{wedge}\sqcup\text{edge}}.$$
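A minimal numerical check of the second-order identity above, on an assumed random toy graph (all counts computed by brute force):

```python
import random
from itertools import combinations
from math import comb

# Every unordered pair of distinct edges is either a wedge (shares a node)
# or two node-disjoint edges, so C(c_edge, 2) = c_wedge + c_disjoint.
random.seed(1)
n = 12
edges = [e for e in combinations(range(n), 2) if random.random() < 0.3]

deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

c_edge = len(edges)
c_wedge = sum(comb(d, 2) for d in deg)  # pairs of edges sharing a node
c_disjoint = sum(1 for e, f in combinations(edges, 2) if not set(e) & set(f))
assert comb(c_edge, 2) == c_wedge + c_disjoint
```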

We now discuss the scalability of counting the instances of a connected subgraph $g$ with $n'$ nodes in a network $G$ with $m$ edges and $n$ nodes. The complexity of a naïve enumeration of the $\frac{n!}{(n-n')!}$ potential node mappings scales as $\mathcal{O}(n^{n'})$. However, there exist notably more efficient algorithms for certain subgraphs (such as triangles, stars, and cliques), especially when $G$ has particular properties, such as sparsity [78, 87, 90]. For example, the worst-case computational time complexity for counting $n'$-cliques is known to be at most $\mathcal{O}(n' m^{n'/2})$ [88]. The counts of the $r$-stars can be quickly computed, as they are proportional to the $r$th factorial moments of the degree distribution [89, 94]. Moreover, some of these algorithms can be substantially accelerated through parallel computation [92, 93], and approximate values can be obtained by stochastic methods [91]. Asymptotics aside, from a pragmatic perspective, Pinar et al. [44] showed that the exact counts of all connected subgraphs with up to 5 nodes can be obtained for networks with tens of millions of edges in minutes on a commodity machine (64GB memory).
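For instance, the $r$-star shortcut mentioned above needs only the degree sequence; a minimal sketch (the degree sequence is illustrative):

```python
from math import comb

# A node of degree d is the center of C(d, r) distinct r-stars,
# so no subgraph enumeration is needed.
def r_star_count(degrees, r):
    return sum(comb(d, r) for d in degrees)

degrees = [3, 2, 2, 1, 1, 1]
print([r_star_count(degrees, r) for r in (1, 2, 3)])  # [10, 5, 1]
```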

S2 Applications to networks with additional features

In this section, we first demonstrate how our framework provides a natural notion of clustering in bipartite networks. We then illustrate the utility of incorporating additional network features by analyzing real networks with directed edges, node attributes, and weighted edges.

S2.1 Quantifying clustering

Quantifying clustering in networks with additional features is an active domain of research that has arguably not reached a consensus. For example, there have been multiple proposals for weighted [109, 97, 98], directed [95, 96], and bipartite networks [99, 110, 100, 101]. In all cases, our framework provides a principled measure of clustering, viz., the relevant scaled graph cumulant. For example, for directed networks, the two third-order scaled triangle cumulants provide two measures of clustering: one with cyclic orientation (κ̃3) and one with transitive orientation (κ̃3). For bipartite networks, extensions are somewhat less straightforward, as triangles are now excluded. Several proposed measures consider the appearance of 4-cycles, similarly compared to the number of incomplete cycles. In fig. S1, we compare the scaled graph cumulant of the 4-cycle subgraph (κ̃4) with the clustering coefficient proposed by [99], expressed in our framework as C = µ4/µ3. Again, our measure is more directly sensitive to the propensity for clustering.

Fig. S1: A principled measure of clustering in bipartite networks. We simulated a bipartite geometric graph model with two parameters, f and ⟨d⟩, which determine the propensity for clustering (horizontal axis) and the edge density (vertical axis), respectively. The model first divides the nodes (here, 256) into two groups of equal size and randomly places them on the unit sphere. Each node may only connect to nodes from the other group, and only when they are within a certain radius, such that the area it contains is a fraction f of the entire unit sphere. Among these possible connections, a random subset is chosen, so as to match the desired average degree ⟨d⟩. For each set of parameters, we computed the average of both the global bipartite clustering coefficient C from [99] (left) and the scaled square cumulant for bipartite networks κ̃4 (right), displaying the signed fourth root κ̃4^{1/4}. While both clustering measures increase with f, the bipartite clustering coefficient C also notably increases with average degree, whereas κ̃4 is insensitive to such changes in edge density.

S2.2 Networks with node attributes

Fig. S2: Including node attributes reveals additional structure. We use the scaled graph cumulants for weighted networks with a binary node attribute to analyze a social network (222 nodes and 5364 edges) of interactions between primary school students from [47], with edge weights proportional to the number of interactions between pairs of students and node attributes corresponding to their (binary) sex. Purple nodes indicate female students, and green nodes indicate male students. Colored bars denote the value of the (scaled) graph cumulants in the original network, and error bars denote the mean plus or minus one standard deviation for randomly shuffled node attributes (64 runs). a) The three first-order cumulants. b) The six second-order scaled wedge cumulants. c) The four third-order scaled triangle cumulants. The first-order graph cumulants (i.e., the density of edges between nodes of the indicated type) reveal a symmetric preference for homophily between the sexes, an effect well-documented in the social science literature [4]. As all second-order scaled wedge cumulants are positive, we can infer a preference for hubs (i.e., nodes with degree notably larger than the average). Likewise, as all third-order scaled triangle cumulants are notably positive, we can infer a preference for triadic closure, a feature also present in many social networks [13, 60]. While there does not appear to be much of a difference between the sexes at second order, the third-order scaled cumulants suggest that triadic closure is more prevalent when more participants are female.

S2.3 Networks with directed edges

Fig. S3: The graph cumulant formalism naturally incorporates directed structure. We use the scaled cumulants for directed networks to analyze regulatory networks of protein interactions from [102] of: a) yeast (4441 nodes and 12873 edges), and b) humans (3197 nodes and 6896 edges). Error bars denote the mean and one standard deviation for the same network with randomized edge orientations. Despite the marked phenotypical differences between these two species, their regulatory networks display notable similarities. In particular, of the triangular substructures (κ̃3), the feedforward (or transitive) substructure is significantly more prevalent than the cyclic one. Interestingly, while there is a higher prevalence of a central protein regulating many others (κ̃2), proteins that regulate each other display a propensity to both regulate the same other protein (κ̃4).


S2.4 Networks with weighted edges


Fig. S4: Incorporating edge weights can increase the signal and possibly change the resulting interpretations. We use the scaled graph cumulants for weighted networks to analyze a social network (410 nodes and 2765 edges) of face-to-face interactions during an exhibition on infectious diseases from [50], with edge weights proportional to the total time a pair of people spent interacting. The scaled cumulants associated with clustering (i.e., κ̃3 and κ̃6) increase when using the true edge weights (dark purple) as opposed to using the corresponding unweighted network (light purple). Conversely, many of the others become smaller, in particular κ̃2 and κ̃3, both of which are associated with power-law properties of the degree distribution. The most notable deviation occurs for κ̃4, which is positive for the unweighted network, but negative for the weighted network. This negative cumulant could be interpreted as an anticorrelation between participating in triadic closure and interacting with many others, not unreasonable for such an exhibition: hosts are likely to talk with many different people, whereas groups of visitors tend to interact amongst themselves.

S3 Unbiased graph cumulants

In this section, we discuss the desired properties of the unbiased estimators κˇrg of graph cumulants and how to obtain them.

37 In the spirit of k-statistics [52], imagine a large network G with many nodes (the ∞ “population”), from which one randomly subsamples n nodes, and observes the induced subgraph G (the “sample”). We require the expectations of the unbiased graph cumulants

κˇrg to be invariant under this node subsampling (i.e., κˇrg(G) = κrg(G )), and to have h i ∞ the appropriate limit (i.e., κˇ (G) κ (G) as n ). Analogous to real-valued random rg → rg → ∞ variables, the expressions for the unbiased graph cumulants κˇrg(G) will be polynomials in the graph moments µrg(G). Graph moments are preserved in expectation under random subsampling of the nodes [107,

108], i.e.,

µrg(G) = µrg(G ). (3) h i ∞

As an example, consider the expectation of µ1 when removing a single random node i. Let c be the counts of edges in a network with n nodes, di be the degree of node i, and c0 be the counts of edges in this network after removing node i (and all its connections). Clearly, c0 = c d , − i 2c 2 so c0 = c di . As di = n , we obtain c0 = (1 n )c . Dividing by the edge counts in h i − h i h i h i − n (2) 2 the corresponding complete networks, we have µ10 = n−1 (1 n )µ1 = µ1 . By induction, h i ( 2 ) − the expectation of µ1 is also preserved under removal of any number of random nodes. Products of moments, however, are generally not preserved in expectation (under node subsampling). Fortunately, they can be expressed in terms of a linear combination of individual moments [106, 107], which, as mentioned above, are preserved in expectation. For example,

2 consider µ1 . The squared counts of edges satisfies the following relation:

c = c + c , 2  

38 which, in turn, implies

1 µ2 = # µ + 2# µ + 2# µ #2 2 4(n 2)  (n 2)(n 3) = µ + − µ + − − µ , (4) n(n 1) n(n 1) n(n 1) − − − where #g is the count of subgraph g in the complete network with n nodes (e.g., the denominators of equations 35–46 in supplementary materials S10). A s moments are preserved in expectation, taking the limit as n in equation 4 yields → ∞

µ2 µ . n −−−→→∞

In fact, this is general, when a graph distribution is obtainable via sampling from a single graphon (the natural limit of a sequence of graphs with an increasing number of nodes [111]), products of graph moments limit to the graph moment associated to their disjoint union [24]:

µr g µ(P r )(S g ). (5) i i −−−→n i i i i i →∞ Y In particular, for a single network observation, this implies that the unbiased graph cumulants are zero for all disconnected subgraphs. By combining relations 3 and 5, we obtain a succinct derivation of the unbiased estimators: begin with the combinatorial definition of the cumulant (equation 2), and replace all products of graph moments with a single graph moment associated with the disjoint union of the individual

39 graphs, e.g.,

κˇ1 = µ1 , (6)

κˇ = µ µ , (7) 2 2 − 2

κˇ2 = 0, (8)

κˇ = µ 3µ + 2µ , (9) 3 3 − 3 3 κˇ = µ 3µ + 2µ , (10) 3 3 − 3 3 κˇ = µ 2µ + µ , (11) 3 3 − 3 3

κˇ3 = 0, (12)

κˇ3 = 0. (13)

S4 Fitting ERGMs using unbiased graph cumulants

We now describe how to infer a model from our proposed hierarchical family of ERGMs using a single observed network G. In particular, we consider ERGMs with prescribed expected graph moments of at most order r (or, equivalently, the associated subgraph counts, as the number of nodes n is fixed). These distributions have the following form:

1 p(G) = ER 1 2(G) exp β c (G) , (14) Z n, / g g g ! X

Z = ERn,1/2(G0) exp βgcg(G0) , (15) G0 Ω " g !# X∈ X where ERn,p(G) is the probability of the network G in an Erdos–R˝ enyi´ random graph model (i.e., the presence of an edge between any pair of nodes occurs independently with the given

3 probability p) ; βg is the parameter (or Lagrange multiplier) associated with subgraph g; cg(G)

3Just as a biased coin has maximal entropy for p = 1/2, here, the maximum entropy distribution is given by

ERn,1/2, which is uniform over all labeled simple graphs with n nodes, i.e., uniform over all their associated adjacency matrices.

40 is the count of subgraph g in the network G; Ω is the space of all simple graphs with n nodes; and Z is the normalization constant (or partition function).

While an ERGM is typically specified by the desired expectations of the statistics of interest (in this case, subgraph counts/moments), the parameters needed to compute the distribution are the βg, and in general must be determined numerically. Moreover, as the number of unique graphs grows super-exponentially in the number of nodes (e.g., there are 12005168 simple graphs with 10 nodes [103]), the partition function Z often cannot be exactly computed. However, there exists a large body of literature on sampling and variational techniques for efficiently approximating Z as a function of βg [76]. Given a single observed network, the protocol for inferring an ERGM from our hierarchical family is as follows:

1. Choose the order of the desired ERGM. Use the graph moments of observed network µrg

to compute the unbiased estimators κˇrg of all graph cumulants up to and including this order (see supplementary materials S3).

2. Substitute these unbiased cumulants into the combinatorial formula (equation 2) to

obtain the desired unbiased graph moments µˇrg for this ERGM.

3. Fit the parameters βg such that the resulting ERGM distribution has expected graph

moments equal to these µˇrg.

S4.1 Partially unbiased ERGMs

The expressions for the unbiased graph cumulants are derived by assuming that the nodes were sampled randomly from a much larger underlying network G . However, this assumption ∞ may not always be appropriate, such as when the observed network is small, or when it could

41 feasibly represent a significant fraction of the system of interest. For these cases, we introduce an adjustable unbiasing parameter η [0, 1], where 1 corresponds to the aforementioned “fully” ∈ unbiased case, and 0 corresponds to using the original moments µrg of the observed network (“no unbiasing”). Essentially, instead of assuming that the underlying network is infinite, η = 1 n controls its size (N nodes) relative to that of the observed network (n nodes). − N The procedure is similar to before, with a modified second step. Instead of using the combinatorial expressions to convert κˇrg to the desired moments for the ERGM µˇrg, one inverts the expressions for the unbiased cumulants assuming a graph with N nodes. As the unbiased cumulants associated to disconnected subgraphs are always zero, their inverse expressions are not completely determined by the forward expressions; one must also use the expressions relating the products of graph moments for a single network, yielding, e.g.,

µˇ1 =κ ˇ1 , # #2 # µˇ = κˇ + κˇ2 κˇ , 2 # + # 2 2(# + # ) 1 − 2(# + # ) 1 (N 2)(N 3) N(N 1) 2 = − − κˇ + − κˇ2 κˇ , (N + 1)(N 2) 2 (N + 1)(N 2) 1 − (N + 1)(N 2) 1 − − − # #2 # µˇ = κˇ + κˇ2 κˇ 2 −# + # 2 2(# + # ) 1 − 2(# + # ) 1 4(N 2) N(N 1) 2 = − κˇ + − κˇ2 κˇ . −(N + 1)(N 2) 2 (N + 1)(N 2) 1 − (N + 1)(N 2) 1 − − − Note that the combinatorial definitions are recovered as N . → ∞

S5 A geometric understanding of the degeneracy problem

The degeneracy problem refers to the appearance of undesirable large-scale multimodality in the distribution induced by an ERGM; despite the fact that averaging over this distribution gives expected subgraph counts equal to those of the observed network (as desired), typical samples

42 from it have counts vastly different from these average values.

Essentially, this arises due to the shape of the base distribution (i.e., ERn,1/2) as a function of the statistics whose expected values are constrained [75] (here the subgraph counts, or equivalently, the corresponding graph moments). Recall from equation 14 that these ERGM distributions have the following form:

p(G) ER 1 2(G) exp β c (G) . (16) ∝ n, / g g g ! X Projecting this distribution to the space of the relevant subgraph counts (i.e., summing the probability of all networks for which these counts are the same), and taking its logarithm yields:

~ ln p(~c) = ln ER 1 2(~c) + β ~c, (17) n, / ·  where ~c is the vector of relevant subgraphs counts, β~ is the vector of their associated parameters, and we have dropped the term associated with the partition function (as it does not depend on ~c). Thus, to understand the behavior of p(G), it is geometrically instructive to look at the shape of ln ERn,1/2 as a function of ~c. To provide intuition about the degeneracy problem and our proposed solution, here we give attention to a commonly used (and easily visualizable) 2D model, denoted by ERGM(µ1 , µ2 ), which prescribes the expected counts of edges and wedges in the distribution to be equal to those of the observed network. For comparison, we consider our second-order ERGM, which additionally prescribes the expected counts of pairs of edges that do not share any node. We will discuss both the case when the expectations of these three subgraph counts are prescribed to be those of the observed network, denoted by ERGM(µ1 , µ2 , µ2 ), and when they are prescribed to be equal to the unbiased values (see supplementary materials S4), denoted by

ERGM(ˇµ1 , µˇ2 , µˇ2 ). Consider all tuples representing realizable subgraph counts of a single network in their re- spective 2D (for ERGM(µ1 , µ2 )) or 3D (for ERGM(µ1 , µ2 , µ2 ) and ERGM(ˇµ1 , µˇ2 , µˇ2 ))

43 spaces (fig. S5). Any point within the convex hull formed by these points may serve as the prescribed expected values of some ERGM. However, some of these choices require degenerate distributions. For example, consider c = 1 # , c = 1 # (where # is the count of subgraph h i 2 h i 2 g g in the complete graph with n nodes). Indeed, the only distribution with these expected values is an equal mixture of the empty and complete networks — in a sense, the most “degenerate” distribution possible! Even if one restricts attention to tuples of subgraph counts that are realizable by a single network, ERGM(µ1 , µ2 ) still does not always concentrate around these values. In particular, this occurs when one chooses a network that lies along the concave boundary of the of ln(ERn,1/2(c , c )) (i.e., the region in fig. S5b, where c is large for a given number of edges). This can be understood by considering equation 17: the β~ term (which serves to enforce the prescribed expected subgraph counts) is linear and essentially “pushes” on the distribution with the same direction and magnitude everywhere. Thus, increasing the expected counts of wedges is inevitably coupled with a motion of the probability density toward the “tips” of this crescent-shaped domain. Hence, the expected counts of wedges and the spread in the counts of edges cannot be independently controlled, and the distribution can become degenerate. ~ In contrast, the β in ERGM(µ1 , µ2 , µ2 ) has an additional degree of freedom. Thus, it is able to independently control the expected counts of edges and wedges, as well as the spread in the counts of edges. However, if one requires that the expected counts (c , c , c ) are exactly equal to those of the observed network, the resulting distribution necessarily concentrates on networks with precisely this edge count. Essentially, this occurs because the triplet (c , c , c ) of any individual network lies on the boundary of the convex hull formed by all such triplets. For a fixed number of edges, the relationship between the second-order moments is linear:

# µ2 + # µ2 = C (see fig. S5a). Additionally, the relationship between the counts of edges and this invariant sum C has a curvature that does not change sign: C = 1 (#2µ2 # µ ). 2 1 − 1

44 Thus, any distribution with expected counts equal to those of an observed network must have support only in the linear direction given by the set of triplets with the same invariant sum C

(and therefore the same number of edges). Thus, we have “solved” the degeneracy problem by essentially fixing the number of edges in the ERGM. However, such a solution is not satisfactory for many applications.

In order to obtain a non- containing networks with different numbers of edges, the triplet of expected counts must be slightly in the interior of the convex hull, in the direction of the red arrow in figs. S5a and S6. The unbiased graph cumulants derived in supplementary materials S3 provide a natural and consistent prescription for obtaining such modified triplets of expected counts (and, more generally, modified tuples of expected counts for higher order ERGMs). While this may seem to be an unusual choice (as such tuples are not realizable by any individual network), it is indeed quite natural: even the ERn,p distributions have tuples of expected counts that lie in this direction.

45 Introducing Cumulants for Graphs: Introducing Cumulants for Graphs: What is the Kurtosis of yourIntroducing Social Network? Cumulants for Graphs: What is the Kurtosis of your Social Network? What is the Kurtosis of your Social Network?

Introducing Cumulants,1 Introducing for Graphs: Cumulants,2 for Graphs: Gecia Bravo-Hermsdorff⇤,1 & Lee M. Gunderson⇤,2 Gecia Bravo-Hermsdorff⇤ Introducing& Lee M. Gunderson Cumulants⇤ for Graphs: WhatIntroducing is the Cumulants Kurtosis forWhat of Graphs: your is the Social Kurtosis Network?, of1 your Social Network?,2 Introducing Cumulants forWhat Graphs: isGecia the Kurtosis Bravo-Hermsdorff of your Social⇤ & Network? Lee M. Gunderson⇤ What is the Kurtosis of your SocialIntroducing Network? Cumulants for Graphs: What is the Kurtosis of your Social Network? What is the Kurtosis,1 of your Social Network?,2 ⇤Both authors contributedGecia equally Bravo-Hermsdorff to this work. ⇤ & Lee M. Gunderson⇤ ,1 ,1 ,2 ,2 ,1 ,2 1 GeciaGecia Bravo-Hermsdorff⇤Both Bravo-Hermsdorff authors⇤ contributed& Lee M. Gunderson⇤ equally&Gecia Lee⇤ to this M. Bravo-Hermsdorff work. Gunderson⇤ ⇤ & Lee M. Gunderson⇤ Princeton Neuroscience Institute,,1 Princeton University,,2 Princeton, NJ, 08544, USA 1 Gecia Bravo-Hermsdorff⇤ & Lee M. Gunderson⇤ ⇤Both authors contributed equally to this work. 2 Princeton Neuroscience Institute, Princeton University, Princeton, NJ,,1 08544, USA ,2 Department of Astrophysical Sciences, PrincetonGecia University, Bravo-Hermsdorff Princeton,⇤ NJ,& Lee 08544, M. Gunderson USA ⇤ 2 1Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08544, USA Department of Astrophysical Sciences, Princeton University,⇤Both authors Princeton, contributed equally NJ, 08544, to this work. USA ⇤Both authors contributed equally2Department1Princeton to this work. Neuroscience of Astrophysical Institute, Princeton Sciences, University, Princeton Princeton, NJ, University, 08544, USA Princeton, NJ, 08544, USA 1 2 Princeton⇤Both Neuroscience authors contributed Institute, Princeton equally University, to thisDepartment work. Princeton, of Astrophysical NJ, 08544, USA Sciences, Princeton University, Princeton, NJ, 08544, USA 1 2 PrincetonDepartment Neuroscience of Astrophysical Institute,⇤ PrincetonSciences,Both authors PrincetonUniversity, University, contributed Princeton, Princeton, NJ, 08544, equally NJ, USA 08544,⇤Both to USA authorsthis⇤Both work. contributed authors equally contributed to this work. equally to this work. 2 1 Department1Princeton of Astrophysical Neuroscience Sciences, Princeton Institute, University,Abstract Princeton Princeton,1PrincetonPrinceton NJ, University, 08544, Neuroscience Neuroscience USA Institute, Princeton, Princeton Institute, NJ, University, 08544, Princeton Princeton, USA NJ,University, 08544, USA Princeton, NJ, 08544, USA 2Department of Astrophysical Sciences, Princeton University, Princeton, NJ, 08544, USA 2Department of Astrophysical Sciences,Abstract2Department Princeton University, of Astrophysical Princeton,Abstract Sciences, NJ, 08544, Princeton USA University, Princeton, NJ, 08544, USA Abstract Abstract Abstract a ERGM(µ1 ,µ2 ) Abstract ERGM(µ1 ,µ2 ) ERGM(µ1 ,µ2 ) a (degenerate)(degenerate) b (degenerate) ERGM(µ1 ,µ2 ) AbstractERGM(µ1 ,µ2 ) Abstract b ERGM(µ1 , µˆ2 , µˆ2 ) c(degenerate)(degenerate) ERGM((our solution)µ1 , µˆ2 , µˆ2 ) ERGM(µ , µˆ , µˆ ) c (our solution)1 2 2 d ERGM(µ1 , µˆ2 , µˆ2 ) c (our solution)µ , µˆ , µˆ (ourERGM( solution)µ1 ,µ2 ) ERGM(ERGM(1 µ12 ,µ22 )) d Probability density (degenerate) (our(degenerate) solution) c c c Probabilityc density Edge counts c c c ERGM(µ , µˆ , µˆ ) c ERGM(µ , µˆ , µˆ ) Edgec counts c 1 2 2 Wedge counts c 1 2 2 c (our solution) c(our solution) Wedge counts c c c c c c 1 1 c Fig. 
S5: Prescribing the expected1 counts of all subgraphs of first and second order allows forc greater control over the resulting distribution.c Here, we explicitly enumerate1 all graphs with 10 nodes. Each point corresponds to a tuple of subgraph counts, and the

color corresponds to ln ER10,1/2 , with darker colors denoting higher probability. a) When

representedc as a function of (c , c , c ), the density of ERc 1 2 lies on a 2D submanifold with  10, / extrinsic curvature. Three orthogonal directions can independently control the edge counts (blue arrow), the wedge counts (green arrow), and the1 spread in the edge counts (red arrow). b) In contrast, when representing the distribution as a1 function of (c , c ), these three quantities cannot be independently controlled. This can lead to the “degenerate” distributions with 1

significant bimodality observed in certain ERGM(µ1 , µ2 ).

1 1

46 Fig. S6: A stereographic image of the base distribution for graphs with 10 nodes. Displayed is the logarithm of the base distribution ln(ER10,1/2(c , c , c )) embedded in the space of all subgraph counts up to and including second order, i.e., edges, wedges, and two edges that do not share any node. Darker colors indicate higher probability. To fully appreciate the stereographic effect, print the image using either standard A4 paper or US letter size. Begin with the paper close to your eyes. Allow your left eye to focus on the left image, while your right eye focuses on the right image. Slowly move the paper away from your eyes, while maintaining focus on the middle of the “three” images, until it is approximately 20–40 cm (8–16 in) away. The “middle” image should be more apparent than those on either side, and its upper right corner should appear farther away than its bottom left. See here for an animation.

For a few extremal networks, our prescription for obtaining the modified expected counts may result in tuples that lie outside the convex hull, and thus do not lead to realizable ERGMs. This tends to occur for networks that are unlikely to be observed when subsampling nodes from a large network (such as regular or nearly-regular graphs). From a pragmatic perspective, this is unlikely to be an issue, as real networks tend not to have such properties. Moreover, if one does observe a network for which this is the case, the issue is often alleviated by using an intermediate choice of the unbiasing parameter η to obtain the modified tuples of expected counts (see supplementary materials S4.1).

47 S6 Statistical inference without constructing an explicit null model

In order to assess the statistical significance of a network’s propensity for substructures, one needs to compare the observed cumulants with the distribution obtained from some appropriate null model. While our proposed hierarchical family of ERGMs is a principled option, unfortunately, obtaining the parameters β~ is often computationally prohibitive. Fortunately, our procedure to derive the unbiased graph cumulants κˇrg (supplementary materials S3) can also be used to derive their variance Var(ˇκrg), allowing for statistical tests of a network’s propensity for substructures without explicitly constructing a null model. We first explain how to perform such a statistical test, then we describe how to obtain the variance of the unbiased graph cumulants.

S6.1 Statistical test using κˇrg and Var(ˇκrg)

To analyze a substructure g with r edges, first measure the moments of the observed network up to order 2r, and use these to compute the unbiased cumulant κˇrg, as well as its variance Var(ˇκ ). If κˇ = 0, this potentially indicates a propensity (or aversiveness) for the substructure rg rg 6 g, depending on the sign of κˇrg. To determine if such an assessment is statistically significant, one should compute the (squared) Z-score associated with the null hypothesis that κˇ 0: rg ∼ 2 2 κˇrg 2 Zrg = . If Zrg 1, one can be reasonably confident that the observed network has a Var(ˇκrg)  propensity (or aversiveness) for the substructure g. This procedure can also be used to measure the similarity between two networks, by applying a two-sample t-test to the pair of unbiased cumulants associated to each particular substructure.

The standard conversion from a Z-score to a p-value tacitly assumes normality. While this does not necessarily hold in general, the distribution of κˇrg is indeed asymptotically normal as n [108]. → ∞ 48 S6.2 Deriving Var(ˇκrg)

The variance of the unbiased graph cumulants can be obtained by exploiting the known [107, 108] constraints on the products of graph moments (supplementary materials S3). In general, the expressions for Var(ˇκrg) require moments up to order 2r, as do the analogous expressions for the variance of the unbiased estimators for the cumulants of real-valued random variables.

We now describe this procedure, using the derivation of Var(ˇκ1 ) as an example. The variance of the unbiased graph cumulant associated to the edge (κˇ1 = µ1 i.e., the edge density) is given by

Var(ˇκ ) = κˇ2 κˇ 2. (18) 1 h 1 i − h 1 i

2 The second term is simply µˇ1 of the distribution. Evaluation of the first term requires the “product” rule for the graph moments (see supplementary materials S3), in particular, we have

2 # 2# 2# that µ = 2 µˇ + 2 µˇ + 2 µˇ (i.e., equation 4). Thus, we have: h 1 i # 1 # 2 # 2

# 2# 2# 2 Var κˇ = µˇ1 + µˇ2 + µˇ2 µˇ 1 #2 #2 #2 − 1  2 4(n 2) (n 2)(n 3) = µˇ + − µˇ + − − µˇ µˇ2 . (19) n(n 1) 1 n(n 1) 2 n(n 1) 2 − 1 − − − 2 We note that, in fully unbiased case, µˇ1 =µ ˇ2 , as κˇ2 = 0.

S7 Local graph cumulants

Graph cumulants are statistics of the entire network, quantifying its overall propensity for a given substructure. However, in some applications, such as node classification [81, 82, 83] and link prediction [84], one often desires statistics of the propensity of an individual node or edge to participate in a given substructure. The graph cumulant framework naturally incorporates both of these “local” cases. In this section, we describe how to derive these local graph moments and

49 cumulants for both nodes and edges, providing the expressions necessary to compute both local triangle cumulants.

S7.1 Node local graph cumulants

The node local graph moments and cumulants are defined by giving a unique identity to the node of interest (here, symbolically distinguished by an empty circle), and applying the equations for general node attributes (see supplementary materials S10.3). For example, for simple graphs, there are now two first-order moments. One is defined as the count of edges between the distinguished node and any other node (i.e., the degree of the distinguished node), again normalized by the corresponding count in the associated complete graph:

c µ = . 1 n 1 − The other first-order moment is defined as the count of edges that do not use the distinguished node, normalized by the corresponding count in the associated complete graph:

c µ1 = n 1 . −2

Likewise: 

c µ2 = n 1 , −2 c µ2 = n 1 , 2 −2 c µ3 = n 1 . −2

The definition of node local graph cumulants follows the same procedure as before, now taking care to incorporate the presence of this distinguished node, e.g.,

κ = µ µ µ 2µ µ + 2µ2 µ . (20) 3 3 − 2 1 − 2 1 1 1

50 The same care must be taken when scaling the node local graph cumulants, e.g.,

κ3 κ˜3 = 2 . (21) µ1 µ1

S7.2 Edge local graph cumulants

A similar procedure can be used to obtain edge local graph cumulants, where instead of distinguishing a node, one now distinguishes an edge (here, represented by a four-pointed star at the midpoint of that edge). Again, there are two first-order edge local graph cumulants, although the one associated with the distinguished edge itself is trivial:

µ 1. 1 ≡ The other, associated with the remaining edges, is given by c µ = , 1 n 1 2 − where the floating star indicates that the distinguished  edge is not included in the illustrated subgraph. In particular, as the nodes associated with the distinguished edge are equivalent to any other, they are neither required in nor excluded from the illustrated subgraph. Thus, c = c 1, i.e., the count of edges in the network minus the one distinguished edge. − Likewise, c µ = , 2 2(n 2) − c µ2 = n , 3 3 2(n 2) c − − µ =  . 3 n 2 − Again, our procedure requires no modification; the edge local graph cumulants are given by a straightforward application of the combinatorial definition, e.g.,

κ = µ 2µ µ µ µ + 2µ µ2 . (22) 3 3 − 2 1 − 2 1 1 1

51 Scaling the edge local graph cumulants also incorporates the distinguished edge, e.g.,

κ3 κ˜3 = 2 . (23) µ1 µ1

S8 Graph cumulants are additive

Essentially, the defining property of cumulants is their unique additive nature when applied to sums of independent random variables [31, 32] (e.g., Var(X + Y ) = Var(X) + Var(Y ) when

X and Y are independent). This property is to foundational results in probability and statistics, such as the central limit theorem and its generalizations [46, 45]. In this section, we

first define a natural notion of “summing” (denoted by ) graph-valued random variables with ⊕ the same number of nodes. We then show that the graph cumulants of these distributions sum when they are independent.

There are a variety of operations that compose two graphs, such as the disjoint union and a variety of graph products [104]. Here, we consider the sum of two graphs G G to be at a ⊕ b the level of their adjacency matrices, defined by simply adding the entries component-wise. In general, as the same graph can be represented by many adjacency matrices, we must assign equal probability to each. In particular, for a graph G with n nodes represented by an adjacency matrix A = a , then one distributes the probability associated to this graph uniformly over { ij} all matrices A0 = a for all permutations σ of 1, . . . , n . Thus, when summing two { σ(i)σ(j)} { } graphs, one considers all the ways that their sets of representative adjacency matrices could sum. The result is a graph-valued random variable over weighted graphs (see figs. S7 and S8).

This notion extends to graph-valued random variables by the distributive property,

p (G) G p (G0) G0 = p (G) p (G0) G G0. (24) a ⊕ b a b ⊕  G Ω   G0 Ω  G Ω G0 Ω X∈ X∈ X∈ X∈ Moreover, as is clearly commutative and associative, it is also well-defined for multiple ⊕ graph-valued random variables.

52 Fig. S7: Subgraphs are counted with multiplicity equal to the product of their edge weights. To motivate this prescription, consider a network with integer edge weights, and represent each integer-weighted edge as that number of unit-weighted edges. For the weighted wedge graph shown here, by counting the number of pairs of unit-weighted edges that share one node, we find that there are 2 3 unweighted wedges in this weighted graph. Indeed, for × any weighted network (including those with real-valued edge weights), each subgraph should be counted with weight equal to the product of its edge weights.

Even when summing unweighted graph-valued random variables, the result is a graph-valued random variable over weighted graphs. Thus, to obtain the graph cumulants of the resulting distribution, we must generalize the notion of subgraph density to weighted networks (itself a useful extension). Several ways have been proposed to generalize counts of subgraphs to weighted networks [95, 109, 97, 98]. Within our framework, the consistent prescription is to treat a weighted edge as a collection of multiple edges that sum to its weight. Hence, when counting subgraphs, one should consider each instance with multiplicity equal to the product of its edge weights [24]. The normalization for graph moments is the same as before, i.e., the counts of the subgraphs in the unweighted complete network (thus, the graph moments of weighted networks may be greater than one). Likewise, the conversion from graph moments to graph cumulants remains identical (see expressions in supplementary materials S10.4). With the definitions for summing graph-valued random variables and for computing

53 moments and cumulants of weighted networks, we can now state the main result of this section (see fig. S8): For two independent graph-valued random variables over n nodes, and , the Ga Gb Introducing Cumulants for Graphs:Introducing Cumulants for Graphs: 1 3 graph cumulants of their sum is the sum of their cumulants: = + 1 What is the Kurtosis of your SocialWhat Network? is the Kurtosis of1 your3 SocialGa Network? Gb 4 4 a b = + 1 ✓ ◆ ✓ ◆ G G 4 4 1 3 ✓ ◆ ✓ ◆ = + 1 3 4 4 = + ✓ ◆ ✓ ◆ 4 4 1 1 1 1 3 1 1 1 Gecia Bravo-Hermsdorff ,1 & Lee M. GundersonGecia,2 Bravo-Hermsdorff✓,1 & Lee◆ M. Gunderson✓ ◆= ,2 + + + + + ⇤ κ ( )⇤ = κ ( ) + κ1 ⇤ 1( )1. 1 3 1⇤ 4 3 1 3 1 3 4 (25)3 3 3 rg a b rg a = rg b+ + + ✓ + + ◆ ✓ ◆ G ⊕ G G 4 3 G 3 3 4 3 1 3 1 3 3 ✓ ◆ ✓= + + ◆ 1 1 3 6 12 4 = + + 1 3 6 12 4 = + 1 ⇤Both authors contributedGa Gb equally4 to this4 work. ⇤Both authors contributed equally to this work. By induction,1 this holds for✓ the sum◆1 ✓ of any◆ number of independent graph-valued random Princeton Neuroscience Institute, Princeton University,1 Princeton,Princeton3 NJ, 08544, Neuroscience USA Institute, Princeton University, Princeton, NJ, 08544,1 USA 3 1 1 3 3 5 2 = 2 + µ = µ + µ = + = Department of Astrophysical Sciences, Princeton4 University, Princeton,Department4 NJ, 08544,of Astrophysical USA Sciences, Princeton1 University,3 Princeton,1 NJ,1Ga 08544,34 3 USA 5 4 4 3 4 3 6 ✓ ◆ ✓ ◆ µ a = µ + µ = + = ✓ ◆ ✓ ◆ 1 1 1 1 3 1 1 1 G 4 4 4 3 41 3 6 3 1 0 3 3 3 variables. = + + + + + µ✓ ◆ = ✓µ ◆ + µ = + = 4 3 3 3 4 3 3 3 1 3 1 G0a 43 3 3 4 4 3 4 3 4 Abstract ✓ ◆ ✓ µ Abstract◆a = µ + µ = + = ✓ ◆ ✓ ◆ 1 1 3 G 4 4 4 3 34 3 5 2 4 1 = + +  ✓ ◆= ✓ ◆ = 6 12 4 3 5 2 1 Ga 4 6 18  = = ✓ ◆ Ga 4 6 18 ✓ ◆ 1 3 1 1 1 3 3 5 1 2 µ = µ + µ = + = µ =1µ = 3/4 Ga 4 4 4 3 4 3 3/46 2 Gb 3 ✓ ◆ ✓ ◆ µ b =1µ = 1 3 1 0 3 3 3 G 3 1 1/4 µ = µ + µ = 1/4+ = µ =1µ = Ga 4 4 4 3 4 3 4 1 Gb 3 a = b = a = ✓ ◆ ✓ ◆ bµ= b =1µ = G G 3 5 2 1 G G G 3 1 2 2 1  = =  = = Ga 4 6 18 1 2 2 1 Gb 3 3 9 ✓ ◆  = = ✓ ◆ Gb 3 3 9 ✓ ◆ 2 1 1 3 1 3 1 3 3 5 3 µ =1µ = µ = µ + µ + µ = + + = 1 3 Gb 3 1 1 Ga3 Gb 6 1 3 121 3 3 45 3 6 3 12 3 4 3 2 a b = + 1 µ a b = µ + µ + µ = + + = ✓ ◆ ✓ ◆ ✓ ◆ G G 4 4 1 G G 6 12 4 1 6 3 121 3 4 3 2 1 2 1 3 3 8 79 ✓ ◆ ✓ ◆ µ =1µ = µ = µ ✓ ◆ + µ✓ ◆ +✓ µ◆ = + + = 1 3 Gb 3 1 1 Ga 3 Gb 6 1 2 121 3 3 4 8 79 6 3 12 3 4 3 36 = + µ = µ + µ + µ = + + = ✓ ◆ ✓ ◆ ✓ ◆ 4 4 2 Ga Gb 6 12 4 6 32 12 3 4 3 36 ✓ ◆ ✓ ◆ 1 2 1 79 ✓3 ◆ 1✓ ◆ ✓ ◆  = = 2  = = =  +  1 1 1 1 3 1 1 Gb 1 3 3 9 79 3 1 Ga Gb 36 2 18 Ga Gb = + + + + + ✓ ◆  = = =  +  ✓ ◆ 4 3 3 3 4 3 3 3 Ga Gb 36 2 18 Ga Gb ✓ ◆ ✓ ◆ ✓ ◆ 1 1 3 = + + 6 12 4 1 1 3 1 3 1 3 3 5 3 µ 1= µ + µ + µ = + + 1 = Ga Gb 6 12 4 6 3 12 3 4 3 2 ✓ ◆ ✓ ◆ ✓ ◆ 1 1 3 1 2 1 3 3 8 79 µ = µ + µ + µ = + + = Fig. S8: Graph1 cumulants3Ga Gb 61 1 have3 123 the5 same4 additive6 3 12 3 property4 3 36 as classical cumulants. µ = µ + µ = + = ✓ ◆ ✓ ◆ ✓ ◆ Ga 4 4 4 3 42 3 6 79✓ ◆ 3 ✓ ◆1 2  a = = =  a +  a) A graph-valued1 random3G Gb variable361 0 2 3 3 18 3correspondsG Gb to a probability2 distribution over graphs µ = µ + µ = ✓ +◆ = Ga 4 4 4 3 4 3 G 4 n n = 3 ✓ ◆ ✓ ◆ 1/4 with nodes. Here,3 5 2 1 and a yields a single edge with probability and a triangle  = = Ga 34 6 18 G /4 ✓ ◆ with probability . b) A graph-valued random variable may also be concentrated on a single graph, as is the case for b, which always yields a wedge. c) Our graph sum is defined for 2 µ b =1Gµ = ⊕ graph-valued randomG variables3 with the2 same number of nodes. 
Specifically, the adjacency 1 µ =1µ = matrices of each networkGb are summed3 component-wise, with probability uniformly distributed 1 2 2 1  = = over all permutationsGb of3 the 3 nodes.9 If the graph-valued random variables being summed are ✓ ◆ independent, the resulting (weighted) graph-valued random variable has cumulants equal to the 1 1 3 1 3 1 3 3 5 3 sum ofµ thosea b = ofµ the+ originalµ + µ distributions.= + + = G G 6 12 4 6 3 12 3 4 3 2 ✓ ◆ ✓ ◆ ✓ ◆ 1 1 3 1 2 1 3 3 8 79 µ = µ + µ + µ = + + = Ga Gb 6 12 4 6 3 12 3 4 3 36 ✓ ◆ ✓ ◆ ✓ ◆ 79 3 2 1  = = =  +  Ga Gb 36 2 18 Ga Gb ✓ ◆

To demonstrate how to verify this property, we first consider the specific cases of

κ1 , κ2 , and κ3 , and then2 give the combinatorial argument for the general case. Clearly, for

54 the first moment:

µ ( ) = µ ( ) + µ ( ), (26) 1 Ga ⊕ Gb 1 Ga 1 Gb as edge weights simply sum and the normalization remains the same.

For µ ( ), one must consider the 22 ways in which a wedge could be formed: both 2 Ga ⊕ Gb edges from , giving µ ( ); both edges from , giving µ ( ); a “left” edge from and a Ga 2 Ga Gb 2 Gb Ga “right” edge from , giving µ ( )µ ( ); and a “left” edge from and a “right” edge from Gb 1 Ga 1 Gb Gb , giving µ ( )µ ( ). Thus, Ga 1 Gb 1 Ga µ ( ) = µ ( ) + µ ( ) + 2µ ( )µ ( ) (27) 2 Ga ⊕ Gb 2 Ga 2 Gb 1 Ga 1 Gb

Substituting 26 and 27 into the expression for κ2 (equation 48), we have

κ ( ) = µ ( ) µ ( )2 2 Ga ⊕ Gb 2 Ga ⊕ Gb − 1 Ga ⊕ Gb = µ ( ) + µ ( ) + 2µ ( )µ ( ) µ ( ) + µ ( ) 2 2 Ga 2 Gb 1 Ga 1 Gb − 1 Ga 1 Gb = µ ( ) + µ ( ) µ2 ( ) µ2 ( )  2 Ga 2 Gb − 1 Ga − 1 Gb = κ ( ) + κ ( ), 2 Ga 2 Gb as desired.

Likewise, for µ ( ), one must again consider the 23 ways in which a triangle could be 3 Ga ⊕ Gb formed: all edges from , giving µ ( ); all edges from , giving µ ( ); a wedge from Ga 3 Ga Gb 3 Gb Ga and an edge from (occurring for three configurations), giving 3µ ( )µ ( ); and a wedge Gb 2 Ga 1 Gb from and an edge from (again occurring for three configurations), giving 3µ ( )µ ( ). Gb Ga 2 Gb 1 Ga Thus,

µ ( ) = µ ( ) + µ ( ) + 3µ ( )µ ( ) + 3µ ( )µ ( ). (28) 3 Ga ⊕ Gb 3 Ga 3 Gb 2 Ga 1 Gb 2 Gb 1 Ga

Substituting 26, 27 and 28 into the expression for κ3 (equation 50), we again find that

κ ( ) = κ ( ) + κ ( ). 3 Ga ⊕ Gb 3 Ga 3 Gb

55 as desired. More generally,

µ ( ) = µ ( )µ c ( ) (29) g Ga ⊕ Gb x Ga x Gb x g X⊆ = κ ( ) κ ( ) (30) p Ga p Gb x g π P p π ! π P p π ! X⊆ X∈ x Y∈ X∈ xc Y∈

= κ ( ) κ ( ) (31) p Ga p Gb π P y π p y p yc ! X∈ g X⊆ Y∈ Y∈ = κ ( ) + κ ( ) (32) p Ga p Gb π P p π ! X∈ g Y∈ = κ . (33) p Ga ⊕ Gb π P p π X∈ g Y∈  where, for notational convenience, g now represents the edge set of the subgraph (replacing the pair rg). The superscript c denotes the complement with respect to the set from which this subset was taken. Line 29 enumerates all possible 2r ways in which subgraph g could be formed.

Line 30 expands the moments in terms of cumulants (equation 2). Line 31 exchanges the order of the summands. Line 32 applies the distributive law. Line 33 demonstrates consistency with the additive property of graph cumulants (equation 25).

S9 Spectral motivation

In this section, we describe a spectral motivation for graph moments and their generalizations.

S9.1 Simple graphs

Consider the problem of parameterizing a distribution over simple graphs with n nodes. We will

n represent such networks by ordered binary vectors of length 2 , where each entry represents an unordered pair of nodes, with 1 indicating that these two nodes are connected by an edge

56 and 0 that they are not. Let be the space of all such vectors. In general, the same graph X can be represented by multiple vectors, and distributions over these graphs must give the same probability to all vectors that represent the same graph. When parametrizing a distribution, it is often desirable that the distribution be “smooth”, in the sense that similar graphs are assigned similar probabilities. As our notion of similarity, we consider a “graph edit distance” [105], defined as the minimum number of edge changes (i.e., or deletions) needed to transform one graph into the other. For example, the wedge is distance 2 from the empty graph, and distance 1 from both the single edge and the triangle.

One common method for parameterizing smooth functions is via a Fourier representation, i.e., in terms of the eigenfunctions of some Laplacian operator. For example, a low-pass filter is equivalent to giving preference to the low-frequency (i.e., long-wavelength) terms, where the location of the cutoff determines the of the output. To obtain similarly smooth parameterizations over the space of networks, we consider a Laplacian operator based on this graph edit distance.

To this end, we define a (weighted, directed) “edit graph” Hn, with nodes representing unique (i.e., non-isomorphic) networks with n nodes. A directed edge from one node in Hn to another appears whenever the network it represents can be transformed into the other by adding or removing an edge at a single location. The weight of an edge in Hn is given by the number of locations that could be altered to effect this transformation (see Fig. S9 for the case of simple graphs with 4 nodes). The Laplacian of H is defined as L = D A> , where D is n H out − H out the diagonal matrix of out-degrees, and A = a is the (asymmetric) adjacency matrix with H { ij} entries aij equal to the weight of the transition from i to j.

The lowest eigenvalue of LH is 0, and is associated with a left eigenvector that is uniform over the unique networks and a right eigenvector that is uniform over all representations of the

n networks (i.e., uniform over all binary vectors of length 2 ), thus corresponding to the ERn,1/2  57 distribution. The remainder of the spectrum contains additional structure. Its support is the set

n of positive up to and including 2 , and each integer has a predictable degeneracy: the multiplicity of an eigenvalue λ = r is equal  to the number of distinct subgraphs with exactly r edges (with at most n nodes). This is not just a combinatorial coincidence; the span of left eigenvectors (with eigenvalue at most r) is precisely the span of subgraph counts (of order at most r) in each network. The structure of this spectrum gives rise to a hierarchical parameterization of distributions over networks that is equivalent to our proposed family of hierarchical ERGMs, namely

r ri p(G) v (G) exp β v (G) , (34) ∝ d,0 i,j l,i,j i=1 j=1 ! X X where vd,0 is the right eigenvector of LH with eigenvalue 0 (i.e., the ERn,1/2 base distribution); vl,i is the set of left eigenvectors of LH with eigenvalue i, and vl,i,j is one such vector; ri is the number of eigenvectors with eigenvalue i; and βi,j are the parameters to be determined. In particular, as eigenvectors with the same eigenvalue are intrinsically intertwined, this degeneracy offers a principled motivation for the use of all subgraphs up to some chosen order.

58 Introducing Cumulants for Graphs: What is the Kurtosis of your Social Network?

Introducing Cumulants,1 for Graphs:,2 Gecia Bravo-Hermsdorff⇤ & Lee M. Gunderson⇤ What is the Kurtosis of your Social Network?

Both authors contributed equally to this work. ⇤ ,1 ,2 1PrincetonGecia Neuroscience Bravo-Hermsdorff Institute, Princeton⇤ & University, Lee M. Princeton, Gunderson NJ, 08544,⇤ USA 2Department of Astrophysical Sciences, Princeton University, Princeton, NJ, 08544, USA

⇤Both authors contributed equally to this work. 1Princeton Neuroscience Institute, PrincetonAbstract University, Princeton, NJ, 08544, USA 2Department of Astrophysical Sciences, Princeton University, Princeton, NJ, 08544, USA

a 3 1 1 3 2 3 1 4 b Abstract 1 4 1 3 2 6 2 1 6 2 2 1 1 2 c 1 2 2 4 a 4 1 d b

Probability density c Edge counts c ! 0d 1 2 3 4 5 6

Fig.Wedge S9: Spectral counts c motivation for graph moments. a) Directed weighted “edit graph” Probability density H4 for simple networks with 4 nodes. The large nodes represent each of the 11 unique (non-isomorphic) networks. A directed edge from node i to node j indicates that network i canEdge be transformed counts c into network j by adding or removing an edge at a single location, where the weight of this directed edge is equal to the number of locations that could accomplish this transformation. Note that every node has1 an out-degree of 6, corresponding to the 4 Wedge counts c 2 possible locations. b) Schematic of the spectrum of the Laplacian of H4. The eigenvalues are 4  integers ranging from 0 to 2 , and are degenerate with multiplicity equal to the number of distinct subgraphs (with at most 4 nodes) with that number of edges. This correspondence has  a combinatorial interpretation: the span of left eigenvectors with eigenvalue λ r is precisely ≤ the span of subgraph counts up to and including1 order r.

S9.2 Generalizations

We now generalize the concept of moments and cumulants for distributions over a set `, X ≡ A i.e., the vectors of length ` N over the alphabet = a1, a2,... , invariant with respect ∈ A { } to a group G acting on this set . The action of G induces an equivalence relation on X : x x0 g G x = g x0, partitioning it into orbits. The distribution over is then X ∼ ⇔ ∃ ∈ | ◦ X characterized by assigning a probability to each of these orbits. For example, for the case of

simple graphs, the group G is Sn, acting by permuting the n nodes of a graph. The set X

59 n consists of ordered binary vectors of length ` = 2 , where each entry represents an unordered pair of nodes, with 1 indicating that these two nodes  are connected by an edge and 0 that they are not. Nonisomorphic graphs are in different orbits, and all the elements in a given orbit correspond to the same graph, with the probability associated to that orbit distributed uniformly over all of its elements.

With this framework, we can construct the weighted directed “edit graph” described in the previous section for an arbitrary set and a group G acting upon it. We can then use the X spectrum of the Laplacian of this edit graph to obtain the number of moments at each order.

Again, the nodes of the edit graph are the orbits of under the action of G, and a directed edge X from orbit i to orbit j indicates that an element in orbit i can be transformed into an element in orbit j by changing one of its ` entries. The weight of this directed edge is given by the number of elements in orbit j that differ by a single entry from any single fixed element in orbit i. This abstraction applies to a variety of situations, and naturally encompasses the generalizations previously presented in this paper. For example, for unweighted directed networks with no self-loops, is the set of all ordered binary vectors of length ` = 2 n , X 2 where each entry represents an ordered pair of nodes, with 1 indicating that there is an edge  from the first node to the second and 0 that there is not. The group G is again Sn. As another example, consider the case of undirected unweighted bipartite networks, i.e., every node has one of two possible “flavors” (“charm” and “strange”), and edges can occur only between nodes of different flavors. The set consists of all ordered binary vectors of length ` = n n , where X ch str each entry represents a different unordered pair of nodes of different flavors, and 1 indicates that these two nodes are connected by an edge and 0 that they are not. The group G allows for permutations of nodes of the same flavor, namely Sn Sn . ch × str We now illustrate the versatility of this formalism by describing an additional generalization, namely, k-uniform hypergraphs, i.e., where each (hyper)edge represents a connection between

60 k distinct nodes. As in the standard graph case (i.e., k = 2), the group G is the symmetric group S acting by permuting the n nodes. The set consists of all ordered binary vectors n X n of length ` = k , where each entry represents an unordered set of k nodes, and a 1 indicates the presence of a hyperedge between them and 0 its absence. The orbits induced by this group action again partition the elements of into equivalence classes, one for each of the unique X hypergraphs. The eigenvalues of the Laplacian of the corresponding edit graph follow a similar pattern: associated to the eigenvalue of 0 is a left eigenvector that is uniform over unique hypergraphs, and a right eigenvector that is uniform over all elements of . Likewise, for X the remaining left eigenvectors, there is one eigenvector with associated eigenvalue of 1 that is linear in the number of hyperedges. At second order (associated eigenvalue of 2), there are now k eigenvalues (for n 2k), corresponding to the k ways that two hyperedges can relate (sharing ≥ any number from 0 to k 1 nodes). −

S10 Formulas for graph moments and graph cumulants

Here, we provide the expressions for graph moments and graph cumulants used to obtain the results presented in this paper. These expressions are also included explicitly in our associated code, and we have automated their derivation to arbitrary order. We first give the normalizations for obtaining the graph moments as well as the expressions for efficiently computing the disconnected subgraph counts (see supplementary materials S1.1). We then give the expressions for their conversion to graph cumulants (by inverting equation 2).

S10.1 Undirected, unweighted networks

For simple graphs, we now enumerate the expressions for all third-order graph moments and cumulants, as well as those that are necessary for computing the (sixth-order) cumulant

61 associated with the complete graph with four nodes. The remaining expressions up to and including sixth order are explicitly included on our code.

S10.1.1 Graph moments

c µ1 = n (35) 2 c µ2 = n (36) 3 3 c c c µ =  = 2 − (37) 2 3 n 3 n 4  4 c µ3 = n   (38) 3 c µ3 = n (39) 4 4 c µ3 = n (40) 12 4 c c (c 2) 3c 3c 2c  µ3 = n = − − n − − (41) 30 5 30 5 c c c c c c µ =  = 3 − − − − (42) 3 15 n 15 n 6  6 c µ4 = n  (43) 12 4 c µ4 = n  (44) 3 4 c µ5 = n (45) 6 4 c µ6 = n  (46) 4  S10.1.2 Graph cumulants

κ1 = µ1 (47)

κ = µ µ2 (48) 2 2 − 1 κ = µ µ2 (49) 2 2 − 1 62 κ = µ 3µ µ + 2µ3 (50) 3 3 − 2 1 1 κ = µ 3µ µ + 2µ3 (51) 3 3 − 2 1 1 κ = µ 2µ µ µ µ + 2µ3 (52) 3 3 − 2 1 − 2 1 1 κ = µ µ µ 2µ µ + 2µ3 (53) 3 3 − 2 1 − 2 1 1 κ = µ 3µ µ + 2µ3 (54) 3 3 − 2 1 1 κ = µ µ µ µ µ 2µ µ 4 4 − 3 1 − 3 1 − 3 1 2µ2 µ µ + 10µ µ2 + 2µ µ2 6µ4 (55) − 2 − 2 2 2 1 2 1 − 1 κ = µ 4µ µ 2µ2 µ2 + 8µ µ2 + 4µ µ2 6µ4 (56) 4 4 − 3 1 − 2 − 2 2 1 2 1 − 1 κ = µ 4µ µ µ µ 2µ µ 2µ µ 4µ µ 2µ µ 5 5 − 4 1 − 4 1 − 3 2 − 3 2 − 3 2 − 3 2 2 2 2 2 + 4µ3 µ1 + 4µ3 µ1 + 8µ3 µ1 + 4µ3 µ1

2 2 + 20µ2 µ1 + 8µ2 µ2 µ1 + 2µ2 µ1

48µ µ3 12µ µ3 + 24µ5 (57) − 2 1 − 2 1 1 κ = µ 6µ µ 12µ µ 3µ µ + 24µ µ2 + 6µ µ2 6 6 − 5 1 − 4 2 − 4 2 4 1 4 1 4µ µ 6µ2 − 3 3 − 3

+ 24µ3 µ2 µ1 + 24µ3 µ2 µ1 + 48µ3 µ2 µ1 + 24µ3 µ2 µ1

24µ µ3 24µ µ3 72µ µ3 − 3 1 − 3 1 − 3 1 3 2 3 + 12µ2 + 15µ2 µ2 + 3µ2

153µ2 µ2 90µ µ µ2 27µ2 µ2 − 2 1 − 2 2 1 − 2 1 + 288µ µ4 + 72µ µ4 120µ6 (58) 2 1 2 1 − 1

S10.2 Directed networks

We now enumerate the expressions necessary for computing the graph moments and cumulants of all directed subgraphs with three nodes, including the sixth-order graph cumulant associated

63 with the complete directed triad. The remaining expressions up to and including fifth order are explicitly included on our code.

S10.2.1 Graph moments

c1 µ1 = n (59) 2 2 c2 µ2 = n (60) 3 3 c2 µ2 = n (61) 3 3 c2 µ2 = n (62) 6 3 c2 µ2 = n  (63) 2 c3 µ3 = n (64) 6 3 c3 µ3 = n (65) 6 3 c3 µ3 = n (66) 6 3 c3 µ3 = n (67) 2 3 c4 µ4 = n (68) 3 3 c4 µ4 = n (69) 3 3 c4 µ4 = n (70) 6 3 c4 µ4 = n (71) 3 3 c5 µ5 = n (72) 6 3 c6 µ6 = n  (73) 3 

64 S10.2.2 Graph cumulants

κ1 = µ1 (74)

κ = µ µ2 (75) 2 2 − 1 κ = µ µ2 (76) 2 2 − 1 κ = µ µ2 (77) 2 2 − 1 κ = µ µ2 (78) 2 2 − 1 κ = µ µ µ µ µ µ µ + 2µ3 (79) 3 3 − 2 1 − 2 1 − 2 1 1 κ = µ µ µ µ µ µ µ + 2µ3 (80) 3 3 − 2 1 − 2 1 − 2 1 1 κ = µ µ µ µ µ µ µ + 2µ3 (81) 3 3 − 2 1 − 2 1 − 2 1 1 κ = µ 3µ µ + 2µ3 (82) 3 3 − 2 1 1 κ = µ 2µ µ 2µ µ µ2 µ2 µ µ 4 4 − 3 1 − 3 1 − 2 − 2 − 2 2 + 2µ µ2 + 4µ µ2 + 4µ µ2 + 2µ µ2 6µ4 (83) 2 1 2 1 2 1 2 1 − 1 κ = µ 2µ µ 2µ µ µ2 µ2 µ µ 4 4 − 3 1 − 3 1 − 2 − 2 − 2 2 + 4µ µ2 + 2µ µ2 + 4µ µ2 + 2µ µ2 6µ4 (84) 2 1 2 1 2 1 2 1 − 1 κ = µ µ µ µ µ µ µ µ µ µ µ µ µ µ µ 4 4 − 3 1 − 3 1 − 3 1 − 3 1 − 2 2 − 2 2 − 2 2 + 2µ µ2 + 2µ µ2 + 6µ µ2 + 2µ µ2 6µ4 (85) 2 1 2 1 2 1 2 1 − 1 κ = µ 2µ µ 2µ µ µ µ µ2 µ2 4 4 − 3 1 − 3 1 − 2 2 − 2 − 2 + 2µ µ2 + 2µ µ2 + 4µ µ2 + 4µ µ2 6µ4 (86) 2 1 2 1 2 1 2 1 − 1 κ = µ µ µ µ µ 2µ µ µ µ 5 5 − 4 1 − 4 1 − 4 1 − 4 1 − µ µ µ µ µ µ µ µ µ µ − 3 2 − 3 2 − 3 2 − 3 2 − 3 2 µ µ µ µ µ µ µ µ µ µ − 3 2 − 3 2 − 3 2 − 3 2 − 3 2 2 2 2 2 + 6µ3 µ1 + 6µ3 µ1 + 9µ3 µ1 + 2µ3 µ1

65 2 2 2 2 + 2µ2 + 2µ2 + 6µ2 + 2µ2 + 2µ2 µ2

+ 4µ2 µ2 + 4µ2 µ2 + 2µ2 µ2 + 2µ2 µ2 + 4µ2 µ2

12µ µ3 12µ µ3 24µ µ3 12µ µ3 + 24µ5 (87) − 2 1 − 2 1 − 2 1 − 2 1 1 κ = µ 6µ µ 3µ µ 3µ µ 6µ µ 3µ µ 6 6 − 5 1 − 4 2 − 4 2 − 4 2 − 4 2 + 6µ µ2 + 6µ µ2 + 12µ µ2 + 6µ µ2 3µ2 3µ2 3µ2 µ2 4 1 4 1 4 1 4 1 − 3 − 3 − 3 − 3

+ 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1

+ 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1 + 12µ3 µ2 µ1

36µ µ3 36µ µ3 36µ µ3 12µ µ3 − 3 1 − 3 1 − 3 1 − 3 1 3 2 3 2 2 3 + 2µ2 + 6µ2 µ2 µ2 + 6µ2 µ2 + 2µ2 + 6µ2 µ2 + 6µ2 µ2 + 2µ2

18µ2 µ2 18µ µ µ2 36µ µ µ2 18µ µ µ2 18µ2 µ2 − 2 1 − 2 2 1 − 2 2 1 − 2 2 1 − 2 1 36µ µ µ2 18µ µ µ2 54µ2 µ2 36µ µ µ2 18µ2 µ2 − 2 2 1 − 2 2 1 − 2 1 − 2 2 1 − 2 1 + 60µ µ4 + 60µ µ4 + 120µ µ4 + 60µ µ4 120µ6 (88) 2 1 2 1 2 1 2 1 − 1

S10.3 Networks with node attributes

Often, networks have additional attributes associated with the nodes. By incorporating this information into the graph cumulant formalism, one can reveal structure that is correlated with these attributes. Our example in Figure S2 considers cumulants associated with subgraphs containing up to three nodes for a network with a binary node attribute. We now enumerate the expressions for computing the graph moments and cumulants required for this analysis (as well as for the 3-star subgraphs). Note that the mapping from attributes to colors is arbitrary; the colors may be reversed in any expression (e.g., the expression for κ2 can be obtained from that for κ2 by exchanging all instances of purple and green with each other).

66 S10.3.1 Graph moments

c1 µ1 = n (89) 2 c1 µ =  (90) 1 n n c2 µ2 = n (91) 3 3 c2 µ2 = n  (92) 2 2 n c2 µ2 = n (93) n 2 c3 µ3 = n  (94) 3 c3 µ3 = n  (95) 2 n c3 µ3 = n (96) 4 4 c3 µ3 = n  (97) 3 3 n c3 µ3 = n  n (98) 2 2 2 c3 µ3 = n  (99) n 3  S10.3.2 Graph cumulants

κ1 = µ1 (100)

κ1 = µ1 (101)

κ = µ µ2 (102) 2 2 − 1 κ = µ µ µ (103) 2 2 − 1 1 κ = µ µ2 (104) 2 2 − 1 κ = µ 3µ µ + 2µ3 (105) 3 3 − 2 1 1 67 κ = µ 2µ µ µ µ + 2µ µ2 (106) 3 3 − 2 1 − 2 1 1 1 κ = µ 3µ µ + 2µ3 (107) 3 3 − 2 1 1 κ = µ 2µ µ µ µ + 2µ2 µ (108) 3 3 − 2 1 − 2 1 1 1 κ = µ 2µ µ µ µ + 2µ µ2 (109) 3 3 − 2 1 − 2 1 1 1 κ = µ 3µ µ + 2µ3 (110) 3 3 − 2 1 1

S10.4 Weighted networks

For weighted networks, each instance of a subgraph is counted with weight equal to the product of its edge weights (see fig. S7 and supplementary materials S8)), but the normalization to moments and conversion to cumulants remain the same as in the unweighted case. However, the expressions for computing the disconnected counts from the connected counts requires a slight modification. For example, the counts of two weighted edges that do not share any node now includes a second-order term related to the square of the edge weights.

S10.4.1 Graph moments

1 µ1 = n wij (111) 2 0 i

κ1 = µ1 (117)

κ = µ µ2 (118) 2 2 − 1 κ = µ µ2 (119) 2 2 − 1 κ = µ µ2 (120) 2 2 − 1 κ = µ 3µ µ + 2µ3 (121) 3 3 − 2 1 1 κ = µ 2µ µ µ µ + 2µ3 (122) 3 3 − 2 1 − 2 1 1

69