Generating Hierarchically Modular Networks via Link Switching

Susan Khor

ABSTRACT

This paper introduces a method to generate hierarchically modular networks with prescribed node degree list by link switching. Unlike many existing network generating models, our method does not use link probabilities to achieve modularity. Instead, it utilizes a user-specified topology to determine relatedness between pairs of nodes in terms of edge distances, and links are switched to increase edge distances. To measure the modular-ness of a network as a whole, a new metric called Q2 is proposed. Comparisons are made between the Q [15] and Q2 measures. We also comment on the effect of our modularization method on other network characteristics such as clustering, hierarchy, average path length, small-worldness, degree correlation and centrality. An application of this method is reported elsewhere [12]. Briefly, the generated networks are used as test problems to explore the effect of modularity and degree distribution on evolutionary search algorithms.

1. INTRODUCTION

Many real-world networks of natural and man-made phenomena exhibit topological properties atypical of classical random graphs [1, 14]. Real-world networks tend to exhibit heterogeneous connectivity of the kind that, allowing for finite sizes of the networks, exhibits a power-law decay, i.e. $P(k) \sim k^{-\gamma}$, $\gamma > 1$, where P(k) is the probability that a randomly selected node is linked to k other nodes and γ is the degree exponent or scaling factor. Typically 2 ≤ γ ≤ 3. Real-world networks also tend to be more modular and have higher levels of network clustering than expected in comparable random networks. A modular network has identifiable subsets of nodes with a higher density of links amongst nodes within a subset than between nodes of different subsets [9, 15, 19]. Modularity (sometimes called hierarchical clustering or community structure) is a common characteristic of biological [7, 18] and other real-world networks [9, 22]. Network clustering refers to the cliquishness of a network and its prevalence in a network indicates dependency between links.

Hierarchical organization was proposed as the key to combine a heavy-tailed right-skewed degree distribution with high network clustering within a single network [19]. A hierarchically modular network is one where the nodes can be recursively subdivided into modules (subsets of unexpectedly densely linked nodes) over several scales until some atomic level is reached. An artificial instance is the hierarchical network model [19]. The modules in a hierarchy need not be isolated from each other, but can be interrelated subsystems of a larger encompassing whole. The "relations that hold among its parts" are, as Simon (1969) [21] emphasized, important. Hierarchical organization was also suggested as one of the pillars of the architecture of complex systems [21]. It is not surprising then that hierarchical organization is detected in many real-world networks [5].

In order to study real-world networks, it is therefore handy to have a method to generate networks with similar characteristics as real-world networks, i.e. to generate hierarchically modular networks. Like the hierarchical model [5], our method to generate hierarchically modular networks uses a pre-specified topology to outline the hierarchical structure and guide the formation of modules. However, unlike other network generating models that employ link probabilities, whether derived internally from the network itself, e.g. as a function of node degree [4], or defined externally, e.g. [5], or a combination of both, e.g. [11], our method does not deal with link probabilities directly. Instead, it utilizes the topology to determine relatedness between pairs of nodes in terms of edge distances. Nodes that belong to the same module are more related to one another than nodes that belong to different modules. Larger edge distance values are associated with more related nodes. The modularization algorithm performs link switching that favours links between more closely related nodes (according to the topology used) over less closely related nodes. Hogg (1996) [11] used an ultrametric distance (similar to our notion of edge distance) in his method to create clustered graphs. To measure the modular-ness of a network as a whole, a new metric called Q2 is introduced.

2. THE METHOD

The networks generated by this algorithm are simple graphs, i.e. unweighted, undirected, and have no loops (self-edges) and no multiple edges.

Algorithm

1. Create a random graph with N nodes and a given node degree list (ndl). Let the resultant graph be G0.

2. Randomize G0 by exchanging or switching pairs of edges selected uniformly at random. Let the resultant graph be Gr.

3. Generate T, the decomposition topology.

4. Measure aed(Gr), the average edge distance for Gr relative to T.

5. Modularize Gr by selectively exchanging pairs of edges preferentially selected at random. Edge switching is biased towards increasing edge distances relative to T. Let the resultant graph be Gm.

6. Measure aed(Gm), the average edge distance for Gm relative to T. aed(Gm) > aed(Gr) is interpreted as Gm being more modular than Gr. Using aed(Gm) and aed(Gr), measure modular-ness of Gm with the Q2 metric as follows:

$$Q_2 = 1.0 - \frac{aed(G_r)}{aed(G_m)}$$
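As a small illustration of step 6, the Q2 metric can be written as a one-line function of the two average edge distances. This is only a sketch: the function and parameter names (`average_edge_distance`, `q2`, `aed_random`, `aed_modular`) are chosen here for illustration and are not taken from the paper. The example values are the aed figures reported for ndl 2 in Table 2.

```python
def average_edge_distance(edge_distances):
    """aed(G): mean of the edge distances of an unweighted graph."""
    if not edge_distances:
        raise ValueError("graph has no edges")
    return sum(edge_distances) / len(edge_distances)


def q2(aed_random, aed_modular):
    """Q2 = 1.0 - aed(Gr) / aed(Gm); positive when Gm is more modular than Gr."""
    if aed_modular <= 0:
        raise ValueError("aed(Gm) must be positive")
    return 1.0 - aed_random / aed_modular


# aed values reported for ndl 2 in Table 2 (rows 2_m0 and 2_m8).
print(round(q2(0.8940, 4.4669), 4))
```

Equal average edge distances give Q2 = 0.0, and a candidate network with a smaller aed than the reference gives a negative value.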

arXiv:0903.2598v3

Step 1: Node degree lists and random graph G0 creation.

The degree of a node, deg(n), is the number of edges adjacent to n. A node degree list (ndl) enumerates the degree for all nodes in an undirected graph in ascending node label order (assumes all nodes are uniquely labeled and node i represents problem variable i). It is common in the complex networks literature to speak of generating random graphs with a given degree sequence. A degree sequence of an undirected graph is its node degrees listed in non-increasing order. However, this ordering is not necessary here.

Two basic conditions for a well-formed node degree list are: (i) it must sum to an even number, and (ii) all its elements must be positive integers. The values of an ndl are in the order generated by the random number generator and satisfy an additional condition: (iii) all its elements must be much smaller than the number of nodes N, and at least as large as the minimum node degree deg_min.

G0 is produced as follows:

(i) A randomized (shuffled) list, all_nodes, is made of all nodes according to their node degrees, e.g. if deg(n0) = 2, then n0 will appear twice in this list.

(ii) A node x is chosen uniformly at random from all_nodes as the source of a link e.

(iii) Nodes other than x in all_nodes are uniquely inspected in order of their position in all_nodes starting from a node, y, chosen uniformly at random, to find the target of e. A link between two nodes can be made only if the nodes are distinct from each other, the nodes are not already linked to one another and the link does not change the given node degree list. If link e is made between x and y, the frequencies of nodes x and y in all_nodes are reduced by one each.

(iv) If the target of e is not found as yet, a link l is chosen uniformly at random from the existing set of edges. Let l connect nodes u and v, i.e. l = (u, v). Since loops are prohibited, u ≠ v. Node u (v) is made the target for e provided x ≠ u (x ≠ v), and a link does not already exist between x and u (v). If u (v) is made the target of e, link l is removed from the existing set of edges, and the frequency of node v (u) in all_nodes is increased by one while the frequency of node x in all_nodes is decreased by one. Nonetheless, at this point, the target of e may still not be found and the search is abandoned. The random graph creation algorithm loops back to (ii) to try with another source node.

The random graph creation algorithm iterates through steps (ii) to (iv) N^2 times, or until all_nodes is emptied, i.e. a random graph with the given node degree list has been formed. If all_nodes is not empty after N^2 iterations, the node degree list is abandoned. The set of simple graph configurations for certain node degree lists can be very small, and the random graph creation algorithm above may not be successful.

Step 2: Randomization of G0.

Using a common procedure in random network formation, pairs of links in G0 are chosen uniformly at random and exchanged if permissible, to reduce any bias inadvertently introduced into G0 in step 1. Edges are exchanged only if the switch does not introduce loops or multiple edges. Suppose the pair of original links to be switched is (p, q) and (r, s). Two patterns of exchange are used: (p, s) and (q, r); and (p, r) and (q, s). The given node degree list is preserved by this step.

Step 3: Decomposition topology T creation.

The decomposition topology T is used in step 5 to guide the formation of modules in a network. Previously, Clauset et al. [5] proposed the creation of hierarchical random graphs (random graphs with hierarchical structure) from a pre-specified topology in the form of a dendrogram D (a special kind of binary tree), and a set of probabilities {pr}. The leaf nodes of D represent graph nodes, while the non-leaf or internal nodes of D identify groups of related graph nodes, i.e. modules. Each internal node r in D is associated with a probability value pr. The probability of connecting two graph nodes i and j is pr where r is the lowest ancestor node in D common to i and j. The values pr can be adjusted to favour different types of connections, e.g. short-range connections between closely related nodes versus long-range connections between distantly related nodes, or vice versa.

We use conventional binary trees, and do not require a set of pre-defined probabilities {pr}. The function of {pr} is taken up by edge distance (defined in step 4) and the edge switching condition in step 5, which favours short-range connections over long-range ones where possible (although the reverse or other edge switching condition may be specified). Like the internal nodes of D, nodes in T help identify modules; these modules are theoretical because it remains to be seen whether the actual network, Gm, respects them. T carves out modules so that smaller modules are nested within larger modules to form a hierarchy, and there is no overlapping of territory between modules at the same level.

Figure 1 depicts T for N=20 and ts = 4 (minimum size of a module). Nodes of T are denoted internal nodes. Edges between internal nodes are internal edges, and a path comprised exclusively of internal edges is an internal path. All internal paths originate at the root of T. Other kinds of trees or structures could be generated for T.

Step 4: Measuring edge distance ed, relative to T.

Edge distance is a measure of the relatedness between a node pair. The distance of an edge e = (x, y) is the length of the longest internal path shared by x and y. As in a hierarchical random graph where more closely related nodes have lowest common ancestors which are situated lower in D than distantly related nodes, nodes incident on edges with larger edge distance values are more related to one another in the sense that they are more likely to belong to the same module according to T.

The following example is with reference to Figure 1. Let e1 = (1, 4), e2 = (5, 4) and e3 = (4, 17). The longest internal paths (defined in step 3) for nodes 1, 4, 5 and 17 respectively are: ⟨10i, 5i, 2i⟩, ⟨10i, 5i, 2i⟩, ⟨10i, 5i, 7i⟩ and ⟨10i, 15i, 17i⟩. Since the longest internal paths for nodes 1 and 4 have two internal edges in common, i.e. (10i, 5i) and (5i, 2i), the edge distance of e1, ed(e1), is 2. Similarly, ed(e2) = 1, and ed(e3) = 0. Thus, of nodes 1, 5 and 17, node 4 is more related to node 1 than to node 5, and least related to node 17. The lowest common ancestor for nodes 1 and 4 is 2i, which is lower in T than 5i, the lowest common ancestor for nodes 4 and 5.
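The edge distance of step 4 can be sketched in a few lines. The sketch below assumes the splitting rule described in section 3 (a run of node labels is halved at its midpoint label for as long as it holds at least ts nodes); the exact stopping rule and the names `internal_paths`, `edge_distance` and `average_edge_distance` are assumptions made for illustration. Under that assumption it reproduces the internal paths and edge distances of the Figure 1 example.

```python
def internal_paths(n_nodes, ts):
    """Map each network node to its internal path in T, root first.

    Assumption: a range of labels [lo, hi) is split at its midpoint while
    it holds at least ts nodes; the midpoint label names the internal node.
    """
    paths = {v: [] for v in range(n_nodes)}

    def split(lo, hi):
        if hi - lo < ts:
            return
        mid = (lo + hi) // 2
        for v in range(lo, hi):
            paths[v].append(mid)
        split(lo, mid)
        split(mid, hi)

    split(0, n_nodes)
    return paths


def edge_distance(paths, x, y):
    """Number of internal edges shared by the internal paths of x and y."""
    shared_nodes = 0
    for a, b in zip(paths[x], paths[y]):
        if a != b:
            break
        shared_nodes += 1
    return max(shared_nodes - 1, 0)


def average_edge_distance(paths, edges):
    """aed(G): sum of edge distances divided by the number of edges."""
    return sum(edge_distance(paths, x, y) for x, y in edges) / len(edges)


paths = internal_paths(20, 4)
for edge in [(1, 4), (5, 4), (4, 17)]:
    print(edge, edge_distance(paths, *edge))
```

Representing each internal path as a root-first list makes the shared internal path a common prefix, so the edge distance is the prefix length minus one.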

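The degree-preserving exchange of step 2 can be sketched as follows. The paper states that two exchange patterns are used but not how one is picked for a given attempt; choosing between them with equal probability is an assumption of this sketch, as are the names `switch_pair` and `randomize` and the `seed` parameter.

```python
import random


def switch_pair(edges, rng):
    """Try one exchange on a set of sorted edge tuples; return True if applied."""
    (p, q), (r, s) = rng.sample(sorted(edges), 2)
    # Two exchange patterns from step 2: (p, s), (q, r) or (p, r), (q, s).
    if rng.random() < 0.5:
        new = [(p, s), (q, r)]
    else:
        new = [(p, r), (q, s)]
    new = [tuple(sorted(e)) for e in new]
    if any(a == b for a, b in new):
        return False  # the switch would introduce a loop
    if new[0] == new[1] or any(e in edges for e in new):
        return False  # the switch would introduce a multiple edge
    edges.difference_update({(p, q), (r, s)})
    edges.update(new)
    return True


def randomize(edge_list, attempts, seed=0):
    """Apply `attempts` switching attempts and return the new edge set."""
    rng = random.Random(seed)
    edges = {tuple(sorted(e)) for e in edge_list}
    for _ in range(attempts):
        switch_pair(edges, rng)
    return edges
```

Because each accepted switch removes one edge at every endpoint and adds one back, every node keeps its degree, which is why the node degree list survives this step.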
[Figure 1 diagram: binary tree T. Higher level: root 10i; below it 5i and 15i; lower level: 2i, 7i, 12i and 17i. Network nodes 0 to 19 are arranged in a row beneath the tree.]

Figure 1. A decomposition topology T for N=20 and ts=4. The internal nodes, i.e. nodes of T, are labeled xi to distinguish them from the actual network nodes, which are arranged in a row in ascending node label order at the bottom of T. Internal nodes identify modules. E.g. internal node 5i identifies the module encompassing nodes 0 to 9, while internal node 2i marks that nodes 0 to 4 belong to a module. The module identified by 2i is organized in T to nest directly within the module identified by 5i.

The average edge distance for a graph G, aed(G), is the sum of all edge distances divided by the number of edges in G:

$$aed(G) = \frac{1}{M} \sum_{i=0}^{M-1} ed(e_i)\, w_i$$

M is the number of links, and since networks are unweighted, w_i = 1.0 for all i.

Step 5: Modularization of Gr.

The link switching method in step 2, with additional conditions, is used to modularize Gr as follows:

(i) The complementary edge distance (ced) for all edges is calculated. The complementary edge distance for an edge e is the length of the longest internal path in T, ed_max, less the edge distance of e plus 1, i.e. ced(e) = ed_max - ed(e) + 1. The "plus 1" ensures that all edges are included at least once in all_edges. A randomized (shuffled) list, all_edges, is made of all edges according to their ced values, i.e. if ced(e0) = 2, then e0 will appear twice in this list.

(ii) Distinct pairs of edges are selected uniformly at random from all_edges for exchange in step (iii). In this way, edges with smaller edge distances, which link less related nodes relative to T, are preferentially selected for modularization.

(iii) Let e1 = (p, q) and e2 = (r, s) be the distinct pair of edges selected in step (ii). Then, the two pairs of alternative edges are e3 = (p, r) and e4 = (q, s), and e5 = (p, s) and e6 = (q, r). Let the edge distance for edge e_i be ed_i. If an edge is a loop or introduces a multiple edge, its edge distance is -1. Let ped_ij be the product of edge distances of a pair of edges i and j, i.e. ped_ij = ed_i × ed_j. The original edge pair (e1, e2) is exchanged with the alternative edge pair that has the larger ped_ij value, provided that value is larger than ped_12. Thus, (e1, e2) is switched to (e3, e4) only if ped_34 > ped_12 and ped_34 ≥ ped_56, and (e1, e2) is switched to (e5, e6) only if ped_56 > ped_12 and ped_56 ≥ ped_34.

(iv) If a switch is made in step (iii), the modularization algorithm goes to step (i), otherwise it loops to step (ii).

The modularization algorithm iterates through steps (i) to (iv) Pg × [M + N(N-1)/2] times. Pg is a parameter for modularizing networks. The number of edges M is included in the number of times the modularization algorithm is iterated to accommodate graphs with the same number of nodes N, but significantly different number of edges. Increasing the number of times the modularization algorithm is applied on a network need not result in a more modular network because a network's degree distribution in part constrains a network's structural possibilities. For example, node degree lists 13 and 14 (section 3) have the two largest M amongst all the ndls (Table 1), but their Q and Q2 values (Figure 4) are on the smaller side. This is to be expected since nodes with unusually high degree have no choice but to make inter-module links.

The modularization algorithm has the effect of reducing the number of edges with smaller edge distances, and increasing the number of edges with larger edge distances (Figure 2). However, step (iii) does not necessarily favour the preservation of edges with larger edge distances over edges with smaller edge distances, and permits for instance the exchange of a pair of edges with edge distances 1 and 8 with a pair of edges with edge distances 4 and 5 (the product rises from 8 to 20). This exchange favours the formation of larger modules over the formation of smaller modules.

[Figure 2 plot: P(ed) against edge distance ed (0 to 6) for networks 2, 10, 12 and 14, pre-modularization (_m0) and post-modularization (_m8).]

Figure 2. Change in edge distance distribution due to modularization.

Step 6: Measuring modular-ness with Q2.

It is proposed here that modular-ness of a network GB be assessed relative to the modular-ness of a comparable network GA by Q2 as follows:

$$Q_2 = 1.0 - \frac{aed(G_A)}{aed(G_B)}$$

Q2 is 0.0 if GB is as (not) modular as GA, i.e. aed(GA) = aed(GB); Q2 is > 0.0 if GB is more modular than GA, i.e. aed(GB) > aed(GA); and Q2 is < 0.0 if GB is less modular than GA, i.e. aed(GB) < aed(GA). In this paper, Gr is GA, and Gm is GB. In its usage here, Q2 compares what is presumed to be a modularized graph with its previous random and therefore likely less modular version. It is possible, as in a fully connected simple graph, that there are no alternative simple graph configurations. In this case, Q2 will be 0.0, which is appropriate since all nodes in a fully connected graph belong to one and the same module and the graph thus does not exhibit modularity.

3. AN EXAMPLE

To illustrate, the method in section 2 is applied to eight node degree lists (ndls). For each network, the number of nodes N is 200, and the number of links M is given in Table 1. Minimum node degree deg_min is set to three to induce the formation of connected graphs so that all nodes of a network belong to a single graph.

The ndls were produced by rounding the values generated by the randht.m procedure (version 1.0.2) provided online by A. Clauset. Two different kinds of distributions are used: normal (ndls 1 and 2) and power-law with degree exponents 4.0 (ndls 9 and 10), 3.0 (ndls 11 and 12) and 2.6 (ndls 13 and 14). The characteristics of the ndls are summarized in Table 1. The degree distributions of ndls 1 and 2 have little to no skew, mean = median = mode. The degree distribution curves of the other ndls are right-skewed, mean ≥ median ≥ mode. The standard deviations noticeably increase going down the list of ndls in Table 1.

Table 1. Node degree list summary statistics, N=200

ndl   Min   Max   Mean   Std. dev   Mode   Median   M
1     3     9     6.05   1.0786     6      6        605
2     3     9     6.04   1.0459     6      6        604
9     3     19    4.60   2.2573     3      4        460
10    3     15    4.36   1.8595     3      4        436
11    3     36    5.74   4.3006     4      4        574
12    3     53    5.25   4.3822     3      4        525
13    3     50    6.94   7.0673     3      5        694
14    3     68    6.74   7.6265     4      5        674

[Figure 3 plot: CDF (0.001 to 1.000) against degree (1 to 100) on log-log axes, one curve for each of ndls 9 to 14.]

Figure 3. The complementary cumulative distributions (CDF) of ndls 9 to 14 on a log-log plot.

Figure 3 depicts the complementary cumulative distribution function (CDF), P(x) = Pr(X ≥ x), of ndls 9 to 14 on a double log scale. Whether these distributions are best characterized by power-laws is less important than that they are heavy-tailed and right-skewed. The doubly logarithmic scale is used as it is a convenient form to depict the distributions. Power-law identification for the ndls is hampered by the fact that N=200, which is small compared to the hundreds of nodes in real-world scale-free networks. deg_min, which is 3, is also smaller than 5, which increases the error associated with our method of generating the ndls [6]. Nevertheless, these networks are generated for the purpose in [12] and using a larger N would substantially increase simulation time without necessarily producing more relevant insight. The ndls may be produced by network growing models such as the many variations of the preferential attachment model [2], other generalized random graph models [13], and using biologically inspired mechanisms such as duplication and divergence. This is a possible next step.

Step 1 successfully constructed G0 for every ndl, in part because deg_min is three, and Pe, the fraction of actual to all possible links, which ranges from 0.022 to 0.035, is very low.

Attempts at edge switching in step 2 are made 0.125 × N(N-1)/2 times, which is 2,487 for networks in this paper. The largest number of links a network in Table 1 has is 694, so each link would have had a chance to switch.

All networks in this paper use the same T, which is created as follows in step 3: nodes of G0 are arranged by node label in a string, and this string is recursively split into two (almost) equal sized halves at node labeled x until the remaining portion is smaller than some size ts, which is set to four here. The set of all x node labels derived from this process forms T.

In step 5, Pg is set to 0.8.

4. MODULARITY

To verify that the modularization algorithm in step 5 does indeed modularize a network, Q values for the first three highest levels for the networks generated from the ndls in section 3 are taken before and after step 5. The Q metric was introduced in [15] as a measure of modularity given a certain division of a network:

$$Q = \frac{s^T B s}{2m}$$

where s is a column vector of ±1 elements representing a particular division of a network into two candidate modules, s^T is the transpose of s, and B is a real symmetric matrix called the modularity matrix with elements

$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$$

A is the adjacency matrix for the network where A_ij = e means e links exist between nodes i and j, k_i and k_j are the respective degrees of nodes i and j, and m is the total number of links in the network. Elements of B reflect the statistical surprisingness of links relative to what could be expected by random chance. A positive (negative) Q value indicates that a network has fewer (more) links than expected between its two divisions as delineated by s. Each row and each column of B sum to 0, which assures the existence of an all-ones eigenvector with an eigenvalue of zero. A network is indivisible when no other s but the all-ones vector produces a non-negative Q value. Indivisible networks have a Q value of 0.0. Optimally divided networks have a Q value of 1.0.

Since Ts in this paper are binary trees which subdivide modules into two more or less equal halves, the s vectors for Q follow suit. For example, to calculate the highest level Q value for a network with T in Figure 1, s has 20 elements with +1 in its top half and -1 in its bottom half. There are two Q values at the second highest level. The Q value for one module is calculated for nodes 0 to 9, and its s has 10 elements with +1 in its top half and -1 in its bottom half. The Q value for the other module is similarly calculated for nodes 10 to 19.
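The Q formula above can be checked numerically with a short pure-Python sketch. It follows the normalization written in this section (division by 2m) and uses the two eight-node networks N1 and N2 that are discussed later in this section, with s set to +1 for nodes 0 to 3 and -1 for nodes 4 to 7. The function name `bisection_q` is an illustrative choice, not the paper's.

```python
def bisection_q(edges, s):
    """Q = s^T B s / (2m) with B_ij = A_ij - k_i * k_j / (2m)."""
    n = len(s)
    m = len(edges)
    adj = [[0] * n for _ in range(n)]
    deg = [0] * n
    for u, v in edges:
        adj[u][v] += 1
        adj[v][u] += 1
        deg[u] += 1
        deg[v] += 1
    total = 0.0
    for i in range(n):
        for j in range(n):
            b_ij = adj[i][j] - deg[i] * deg[j] / (2 * m)
            total += s[i] * b_ij * s[j]
    return total / (2 * m)


N1 = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 6), (4, 5), (4, 7)]
N2 = [(0, 1), (0, 2), (0, 4), (2, 3), (4, 5), (4, 6), (6, 7)]
s = [1, 1, 1, 1, -1, -1, -1, -1]
print(round(bisection_q(N1, s), 4), round(bisection_q(N2, s), 4))
```

Both networks have six links inside the two halves, one link between them, and equal degree sums on each side, so the two Q values coincide at the highest level.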

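Returning to step 5 of section 2, the preferential selection in (i) and (ii) and the exchange rule in (iii) can be sketched as below. Several points are assumptions of this sketch rather than the paper's procedure: edge distance is supplied as a callable `ed(x, y)`; weighted drawing with `random.choices` stands in for sampling from the all_edges list; an alternative pair containing a loop or multiple edge is rejected outright instead of being scored with an edge distance of -1; and a tie between the two alternative pairs is resolved in favour of (e3, e4). All function names are hypothetical.

```python
import random


def ced_weights(edges, ed, ed_max):
    """Complementary edge distance ced(e) = ed_max - ed(e) + 1 for each edge."""
    return [ed_max - ed(x, y) + 1 for x, y in edges]


def pick_pair(edges, weights, rng):
    """Draw two distinct edges, favouring those with larger ced."""
    if len(edges) < 2:
        raise ValueError("need at least two edges")
    while True:
        e1, e2 = rng.choices(edges, weights=weights, k=2)
        if e1 != e2:
            return e1, e2


def pair_product(pair, edge_set, ed):
    """ped of a candidate pair, or None if it has a loop or multiple edge."""
    a, b = (tuple(sorted(e)) for e in pair)
    if a[0] == a[1] or b[0] == b[1] or a == b or a in edge_set or b in edge_set:
        return None
    return ed(*a) * ed(*b)


def choose_switch(e1, e2, edge_set, ed):
    """Return the alternative pair to switch to, or None to keep (e1, e2).

    edge_set must hold the current edges as sorted tuples.
    """
    (p, q), (r, s) = e1, e2
    ped12 = ed(p, q) * ed(r, s)
    options = []
    for pair in (((p, r), (q, s)), ((p, s), (q, r))):
        ped = pair_product(pair, edge_set, ed)
        if ped is not None and ped > ped12:
            options.append((ped, pair))
    if not options:
        return None
    return max(options, key=lambda item: item[0])[1]
```

Since the rule compares products of edge distances, a switch is accepted only when it raises the product for the pair, which is what biases the network towards links inside modules of T.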
Table 2 gives a sample of the Q and Q2 values before modularization (m0) and after modularization (m8) for networks generated from four different ndls. The degree distribution curve for ndl 2 is bell-shaped while it is right-skewed and heavy-tailed for the other three ndls (section 3). Prior to modularization, the Q values are negative and close to 0.0000 and the Q2 values are by definition 0.0000. After the modularization algorithm in step 5 is applied with Pg = 0.8, the Q values for the first three highest levels increase significantly towards 1.0. The average edge distance (aed) values for modularized networks are at least 3.5 times those of non-modularized networks. As such, Q2 values of the modularized networks rise significantly above 0.0000 (Figure 4). In short, the networks do become more modular after step 5, and the Q2 measure does indicate an increase in the modular-ness of a network.

[Figure 4 plot: Q and Q2_m8 values against ndl.]

Figure 4. Q and Q2 values for modularized networks.

However, while the correlation between the Q values for the highest level and corresponding Q2 values is strong (0.8487 for the networks in section 3), the two measures need not necessarily rank networks by modular-ness in the same order. This reflects the semantic difference between Q and Q2. While Q measures modular-ness of a network with respect to a particular division of the network at a single level, Q2 considers the modular-ness of a network in its entirety with respect to a particular decomposition topology T.

N1 = {(0, 1), (0, 2), (0, 3), (0, 4), (4, 6), (4, 5), (4, 7)}, and N2 = {(0, 1), (0, 2), (0, 4), (2, 3), (4, 5), (4, 6), (6, 7)}, where (u, v) represents a link between nodes u and v, are two networks of the same size (N=8, M=7). Assuming T is a perfect binary tree with ts = 2, N1 and N2 have the same Q value (0.7143) for the highest level, but compared with the same random graph N1's Q2 value is smaller than N2's, rightly reflecting the fact that N2 is more modular than N1 relative to T. Hence, the Q and Q2 metrics are distinguishable from each other.

5. HIERARCHY

A clustering coefficient spectrum that is inversely related with node degree is interpreted as indicative of hierarchical organization [19]. A network's clustering coefficient spectrum C(k) values are obtained by averaging the clustering coefficient of node i, C_i, for all i with degree k. C_i is the ratio of actual to possible links amongst a set of nodes:

$$C_i = \frac{2E_i}{k_i (k_i - 1)}$$

E_i is the number of links between node i's k neighbors, and k(k-1)/2 is the number of possible (undirected single) links between k nodes [26]. Figure 5 shows that modularization significantly increases the network clustering coefficient C for all networks regardless of degree distribution type.

[Figure 5 plot: C, H and A values against ndl for non-modularized (_m0) and modularized (_m8) networks.]

Figure 5. Network characteristics for non-modularized (_m0) and modularized (_m8) networks.

However, after modularization, C(k) values for broad connectivity networks (ndls 9 to 14) become significantly more inversely related with k, while C(k) values for random connectivity networks (ndls 1 and 2) show almost uniform increases independent of k (Figure 6). This implies that after the modularization step following the binary tree decomposition topology T, networks with normal degree distribution (ndls 1 and 2) are less hierarchically organized than networks with right-skewed degree distribution (ndls 9 to 14). This observation is supported by the H metric [22], which decreases more sharply for ndls 1 and 2 than for the other ndls (Figure 5). The H metric identifies hierarchical architecture in a network through the concept of hierarchical path [10]. "A path between nodes a and b is hierarchical if (1) the degrees of vertices along this path vary monotonously from one to the other, or (2) vertex degrees first monotonously grow, reach maximum value, and then monotonously decrease along the path. Let the fraction H of the shortest paths in a network be hierarchical. Then this number H can be used as a metric of a hierarchical topology [27]. If H of a network is sufficiently close to 1, the network has a pronounced hierarchical organization." [8].
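The clustering coefficient spectrum C(k) defined at the start of this section can be computed directly from an edge list, as in the sketch below. Nodes with fewer than two neighbours are skipped, since C_i is undefined for them; that handling is an assumption of this sketch (it does not arise in the paper's networks, where the minimum degree is three). The name `clustering_spectrum` is illustrative.

```python
from collections import defaultdict


def clustering_spectrum(edges):
    """Return {k: average C_i over nodes i with degree k}."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    by_degree = defaultdict(list)
    for i, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue  # C_i is undefined for fewer than two neighbours
        ordered = sorted(ns)
        # E_i: number of links among the neighbours of node i.
        e_i = sum(
            1
            for idx, a in enumerate(ordered)
            for b in ordered[idx + 1:]
            if b in nbrs[a]
        )
        by_degree[k].append(2 * e_i / (k * (k - 1)))
    return {k: sum(cs) / len(cs) for k, cs in by_degree.items()}


# A triangle (0, 1, 2) with one pendant node attached to node 2.
print(clustering_spectrum([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

In this toy graph C(k) falls from 1.0 at k = 2 to 1/3 at k = 3, a small instance of the inverse relation between C(k) and k that this section reads as a sign of hierarchical organization.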

Table 2. Network modular-ness pre- (m0) and post- (m8) modularization. N=200

Network   Q values for the first three highest levels                        aed      Q2
2_m0      0.0232   0.0069    0.1067    0.1940   0.0900   0.2407   0.0220     0.8940   0.0000
2_m8      0.9966   0.9666    0.9934    0.9725   0.9690   0.9339   0.9731     4.4669   0.7999
10_m0     0.0554   0.0893    0.0750    0.5936   0.1023   0.1478   0.4025     0.9014   0.0000
10_m8     0.9905   0.9715    0.9662    0.9268   0.9611   0.9456   0.9799     4.5757   0.8030
12_m0     0.0097   0.11097   0.08526   0.5147   0.1736   0.0636   0.0494     0.9219   0.0000
12_m8     0.9732   0.8371    0.9555    0.6585   0.9636   0.9636   0.9518     4.1295   0.7768
14_m0     0.0200   0.0394    0.1286    0.2849   1.0149   0.8272   0.1720     0.9896   0.0000
14_m8     0.9162   0.8118    0.7638    0.8887   0.6762   0.7780   0.5917     3.5460   0.7209

[Figure 6 plots: average C(k) against degree k, one panel each for networks 1, 9, 11 and 13, non-modularized (_m0) and modularized (_m8).]

Figure 6. Clustering coefficient spectrum for non-modularized (_m0) and modularized (_m8) networks. The network with normal degree distribution (1_) is less hierarchically organized than the other networks. Note the difference in the x-axis.

Thinking in terms of hierarchical paths can help to explain the lack of hierarchical organization in networks with normal degree distribution (ndls 1 and 2). These networks lack hubs, nodes with unusually high degree, through which paths between pairs of nodes, especially distant pairs of nodes (in terms of their label differences, e.g. the leftmost and the rightmost nodes in Figure 1), can route. Therefore, they have fewer hierarchical paths and lower H values. Post modularization, the H values (Figure 5) become positively correlated with Max node degree (Table 1).

6. SMALL-WORLDNESS

The distance between two nodes is the number of edges or length of a shortest (geodesic) path connecting them. Average path length is the average distance between all node pairs in a network. The diameter (maximum) and average path lengths (apl) for networks generated from ndls 1, 2, 9 and 10 increased significantly after modularization (Figure 7). This can be explained by the lack of hubs, or nodes with large enough degree values to span module boundaries and become short-cuts in a network. Interestingly, for modularized networks, there seems to be an inverse relationship between H and apl. In particular, H for ndls 1, 2, 9 and 10 declines as their apl increases.

[Figure 7 plot: shortest path length (average and maximum) against ndl for non-modularized (_m0) and modularized (_m8) networks.]

Figure 7. Maximum and average path length for non-modularized (_m0) and modularized (_m8) networks.

A network exhibits the small-world property if the network has a high clustering coefficient and if a short path connects most pairs of nodes in the network or, more precisely, if the average distance (path length) between nodes grows logarithmically or slower with network size for fixed mean degree [26]. Many artificial and natural networks exhibit the small-world property, e.g. [3, 20, 23, 24, 25].

Modularization increases network clustering for all networks (Figure 5). In this partial sense, all networks become smaller worlds compared to their pre-modularized state (when network clustering is quite low). However, due to their smaller apl values, modularized networks generated from ndls 11 to 14 are smaller worlds compared to the other modularized networks in this paper.

7. DEGREE CORRELATION

Degree correlation refers to the correlation between the degrees of connected nodes in a network. Degree correlation of a network (the A values in Figure 5) is measured using Eq. 4 in ref. [16], which measures the correlation of degrees at either end of an edge. Networks with negative A values are termed disassortative (nodes tend to link to other nodes unlike themselves, e.g. high degree nodes are more likely to connect with low degree nodes), while networks with positive A values are termed assortative (nodes tend to link to other nodes like themselves). Collaborative and social networks tend to show assortative mixing by degree, while biological networks and the World Wide Web show disassortative mixing by degree [14].

No significant change in network degree correlation is detected between pre- and post-modularization networks (Figure 5). This is not surprising since no effort is made to either increase or decrease degree correlation in a positive or negative manner (but see Appendix A for a way to vary degree correlation). However, networks from ndls 11 to 14 are slightly more disassortative (A < 0) than the other networks.

Degree correlation of a network can also be inferred by observing how the average degree of nearest neighbours of nodes with degree k, k_nn(k), changes as a function of k (we follow Eq. 6 in [7]). If this function is constant, degrees of connected nodes are uncorrelated. An increasing k_nn(k) with increasing k means that on average, the degree of nodes neighbouring k-degree nodes increases with increases in k, and this implies assortativity. A decreasing k_nn(k) with increasing k means that on average, the degree of nodes neighbouring k-degree nodes decreases with increases in k, and this implies disassortativity. By this definition, the networks generated from ndls 11 and 13 show slight disassortativity, while the degrees of neighbouring nodes in the other networks appear uncorrelated (Figure 8).

[Figure 8 plots: average k_nn(k) against degree k, one panel each for networks 1, 9, 11 and 13, non-modularized (_m0) and modularized (_m8).]

Figure 8. Nearest-neighbours spectrum for non-modularized (_m0) and modularized (_m8) networks. Note the difference in the x-axis.

8. NODE CENTRALITY

Centrality of a node measures the number of shortest paths between other nodes that traverse the node [14]. Prior to modularization, hubs or nodes with high degree tend to occupy a central position in all networks. After modularization, however, the positive correlation between node degree and centrality declines more substantially for networks from ndls 1, 2, 9 and 10 than for the other test problems (Figure 9).

[Figure 9 plot: degree-centrality correlation against ndl, before (DC_m0) and after (DC_m8) modularization.]

Figure 9. Correlation between node degree and centrality.
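The nearest-neighbours spectrum k_nn(k) of section 7 can be sketched as below. The paper follows Eq. 6 of [7], which is not reproduced in the text, so this sketch uses the common definition (for each node, the mean degree of its neighbours, averaged over all nodes of degree k); treating that as equivalent is an assumption, and the name `knn_spectrum` is illustrative.

```python
from collections import defaultdict


def knn_spectrum(edges):
    """Return {k: average neighbour degree over nodes with degree k}."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = {i: len(ns) for i, ns in nbrs.items()}
    per_k = defaultdict(list)
    for i, ns in nbrs.items():
        per_k[deg[i]].append(sum(deg[j] for j in ns) / deg[i])
    return {k: sum(vals) / len(vals) for k, vals in per_k.items()}


# A star: hub 0 linked to four leaves, an extreme disassortative case.
print(knn_spectrum([(0, 1), (0, 2), (0, 3), (0, 4)]))
```

In the star, k_nn falls from 4.0 at k = 1 to 1.0 at k = 4, the decreasing trend that section 7 associates with disassortativity.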

9. DISCUSSION

We have proposed a link switching modularization method based on a pre-specified topology and edge distance (section 2), and applied the algorithm to eight different node degree lists (ndl) generated at random from different probability distributions (section 3). Several key network structural characteristics (modularity, hierarchy, clustering, path length, small-worldness, degree correlation and centrality) were examined, and the following were observed:

(i) In general, networks generated from ndls 11 to 14 (Group A) exhibit different structural characteristics from networks generated from ndls 1, 2, 9 and 10 (Group B). This suggests that degree distribution influences the other structural characteristics.

(ii) After modularization, all networks have higher Q and Q2 values (section 4). However, Group A networks show more hierarchical organization (section 5), have smaller diameters, have shorter average path lengths, are smaller worlds (section 6), exhibit more degree correlation between neighbouring nodes (section 7), and have higher degree-centrality correlation (section 8) than do Group B networks. Coincidentally, ndls 11 to 14 are those generated from a power-law distribution with degree exponents 2.6 and 3.0 (section 3).

An application of the link switching modularization method proposed here is reported elsewhere [12]. Briefly, the generated networks are used as test problems to explore the effect of modularity and degree distribution on evolutionary search algorithms. Further work is planned in this area of connecting problem structure, in terms of the topological characteristics of their constraint networks, and evolutionary search.

A relevant question in this respect is how well the networks produced in this paper reflect real-world search problems. To answer this question, further investigation of real-world search problems is first needed, and refs. [11, 25] are two promising starting points in this direction. It is also encouraging to find that networks in Group A have small-world tendencies, since several benchmark search problems [25], including the protein contact map which has been used for protein folding, are also found to be small worlds [23].

However, in terms of biological networks, we detect a slight mismatch between networks in Group A and the relationship between modularity, hierarchy and degree correlation in ref. [21], which associates antihierarchical organization with disassortativity, and disassortativity with modularity (isolation of parts) [17]. This discrepancy may be due in part to differences in network construction, and may warrant further investigation.

10. REFERENCES

[1] Albert, R. and Barabási, A.-L. (2002) Statistical mechanics of complex networks. Reviews of Modern Physics 74(1):47-97.
[2] Barabási, A.-L. and Albert, R. (1999) Emergence of scaling in random networks. Science 286:509-512.
[3] Bartoli, L., Fariselli, P. and Casadio, R. (2007) The effect of backbone on the small-world properties of protein contact maps. Physical Biology (4):L1-L5.
[4] Boguñá, M. and Pastor-Satorras, R. (2003) Class of correlated random networks with hidden variables. Physical Review E 68, 036112.
[5] Clauset, A., Moore, C. and Newman, M.E.J. (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453:98-101.
[6] Clauset, A., Shalizi, C.R. and Newman, M.E.J. (2007) Power-law distributions in empirical data. arXiv:0706.1062v1.
[7] Colizza, V., Flammini, A., Maritan, A. and Vespignani, A. (2005) Characterization and modeling of protein-protein interaction networks. Physica A 352:1-27.
[8] Dorogovtsev, S.N. and Mendes, J.F.F. (2004) The shortest path to complex networks. arXiv:cond-mat/0404593v4.
[9] Girvan, M. and Newman, M.E.J. (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99:7821-7826.
[10] Gao, L. (2001) On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking 9:733-745.
[11] Hogg, T. (1996) Refining the phase transition in combinatorial search. Artificial Intelligence 81(1-2):127-154.
[12] Khor, S. (2009) Effect of degree distribution on evolutionary search. Extended abstract to appear in GECCO 2009.
[13] Newman, M.E.J. (2003) Random graphs as models of networks. In: S. Bornholdt and H.C. Schuster (Eds.), Handbook of Graphs and Networks: from the Genome to the Internet. Wiley-VCH, Berlin, 35-68.
[14] Newman, M.E.J. (2003) The structure and function of complex networks. SIAM Review 45:167-256.
[15] Newman, M.E.J. (2006) Modularity and community structure in networks. PNAS 103(23):8577-8582.
[16] Newman, M.E.J. (2002) Assortative mixing in networks. Physical Review Letters 89, 208701.
[17] Maslov, S. and Sneppen, K. (2002) Specificity and stability in topology of protein networks. Science 296:910-913.
[18] Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. and Barabási, A.-L. (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551-1555.
[19] Ravasz, E. and Barabási, A.-L. (2003) Hierarchical organization in complex networks. Physical Review E 67:026112.
[20] Scala, A., Amaral, L.A.N. and Barthélémy, M. (2000) Small-world networks and the conformation space of a lattice polymer chain. arXiv:cond-mat/0004380v1.
[21] Simon, H.A. (1969) The Sciences of the Artificial. MIT Press.
[22] Trusina, A., Maslov, S., Minnhagen, P. and Sneppen, K. (2004) Hierarchy measures in complex networks. Physical Review Letters 92, 178702.
[23] Vendruscolo, M., Dokholyan, N.V., Paci, E. and Karplus, M. (2002) Small-world view of the amino acids that play a key role in protein folding. Physical Review E (65):061910.
[24] Wagner, A. and Fell, D.A. (2001) The small world inside large metabolic networks. Proc. R. Soc. Lond. B (268):1803-1810.
[25] Walsh, T. (1999) Search in a Small World. Int. Joint Conference on Artificial Intelligence: 1172-1177.
[26] Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
[27] Xulvi-Brunet, R. and Sokolov, I.M. (2004) Reshuffling scale-free networks: From random to assortative. Physical Review E 70, 066102.
[28] Zhou, S. and Mondragon, R.J. (2007) Structural constraints in complex networks. New Journal of Physics 9, 173.

Appendix A. Varying Degree Assortativity

In this section, the algorithm described in section 2 is modified to produce networks with varying degree correlation (section 7). The basic idea behind this modification is to control links between nodes of high degree. To increase disassortativity, fewer to no links are permitted between nodes of high degree. To increase assortativity, more to all possible links are made obligatory between nodes of high degree. This modification bears resemblance to rich-club connectivity [28], and to an extension mentioned in [27] to increase clustering.

We illustrate this modification with the following three cases on test problems 11 to 14 described in section 3. All three cases involve the ten nodes with the highest degrees (footnote 1). Let this node set be R.

(i) “Top 10 0.75”: Nodes in R are randomly linked with probability 0.75, and these links cannot be removed.

(ii) “Top 10 0.25”: Nodes in R are randomly linked with probability 0.25, and these links cannot be removed.

(iii) “Top 10 0.00”: No links are permitted between nodes in R.

Additional links between nodes in R may be created in cases (i) and (ii). The effect of this modification is compared to the “null” case, which is produced by the original algorithm, i.e. without this modification.

As in section 7, degree correlation of a network is measured using Eq. 4 in ref. [16]. Figure A1 compares the degree correlation of networks produced under the above three conditions. Condition (iii) made the networks more disassortative than the “null” case networks. Conditions (i) and (ii), while not exhibiting a significant difference between them, nevertheless increased assortativity or decreased disassortativity of the networks. Hence, our modification produces the desired effect.

Further, network modularity (Q2) and clustering (C) are not, or only slightly, affected by the change in degree correlation brought about by the modification (Figure A2). Thus, our method is able to produce modular networks with prescribed degree sequence, high clustering, and varying degree correlation.

Figure A2. Clustering (C) and Modularity (Q2). [Plot omitted: C and Q2 against network / test problem for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]

The networks produced with the modified algorithm also demonstrate the association made in [22] between negative degree correlation, or disassortativity, and antihierarchical organization. Networks produced under condition (iii) are more disassortative and have smaller H values than the “null” networks, while networks produced under conditions (i) and (ii), which tend towards assortativity, are more hierarchical and have larger H values (Figure A3).
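The three cases for the node set R in Appendix A can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, ties in degree are broken by node label (the source does not say how ties are handled), and the guard `switch_allowed` assumes a switch is described by the links it removes and the links it adds. How these constraints are wired into the algorithm of section 2 is not specified in this section.

```python
import random
from itertools import combinations


def select_R(degree, size=10):
    """Return the `size` nodes of highest degree (sorted in descending order)."""
    # Assumption: ties are broken by node label.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:size]


def constrain_R(R, p, rng):
    """Return (fixed, forbidden) sets of links among the nodes in R.

    p > 0 : each pair in R is linked with probability p and the link is
            frozen, as in the "Top 10 0.75" and "Top 10 0.25" cases.
    p == 0: every pair in R is forbidden, as in the "Top 10 0.00" case.
    """
    pairs = [frozenset(pair) for pair in combinations(R, 2)]
    if p == 0.0:
        return set(), set(pairs)
    fixed = {pair for pair in pairs if rng.random() < p}
    return fixed, set()


def switch_allowed(removed, added, fixed, forbidden):
    """Hypothetical guard for a link switching loop.

    Rejects a switch that removes a frozen link or creates a forbidden one.
    """
    if any(frozenset(link) in fixed for link in removed):
        return False
    if any(frozenset(link) in forbidden for link in added):
        return False
    return True


# Example: node i has degree i, so R holds nodes 12 down to 3.
example_degree = {node: node for node in range(1, 13)}
example_R = select_R(example_degree)
example_fixed, example_forbidden = constrain_R(example_R, 0.75, random.Random(1))
```

Keeping the frozen and forbidden links in separate sets lets the same guard serve all three cases, with the "null" case corresponding to two empty sets.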

Figure A3. Hierarchy (H). [Plot omitted: H against network / test problem for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]
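The degree correlation reported for these networks is measured with Eq. 4 in ref. [16]. A minimal sketch is given below, assuming that equation is the Pearson correlation between the degrees found at the two ends of each link, with every undirected link listed once. The function name is hypothetical.

```python
def degree_assortativity(edges):
    """Degree-degree correlation r over the links of an undirected network.

    edges is a list of (u, v) pairs, each undirected link appearing once.
    r > 0 indicates assortativity, r < 0 disassortativity.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    m = len(edges)
    # Averages over links of: the product of end degrees, the mean of the
    # end degrees, and the mean of the squared end degrees.
    mean_product = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_degree = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_square = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m

    denominator = mean_square - mean_degree ** 2
    if denominator == 0:
        raise ValueError("degree correlation undefined: all end degrees equal")
    return (mean_product - mean_degree ** 2) / denominator
```

A star network, in which the single hub links only to degree-1 nodes, gives r = -1, the extreme disassortative case that condition (iii) pushes the networks towards.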

Figure A1. Degree-degree correlation (A). [Plot omitted: assortativity against network / test problem for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]

Footnote 1: Nodes are sorted in descending order of their degree, and the first ten nodes in this list are made members of R.

Figure A4 compares the median path length amongst nodes in different degree quartiles, averaged over the networks. The quartiles are formed as follows: (i) unique degree values are sorted in ascending order, and (ii) this sorted list is divided into four (almost) equal parts. Quartile 1 nodes are those with degree values larger than or equal to the minimum value in the upper quartile of this sorted list (Quartile 1 nodes are those with higher degrees). Quartile 2 nodes are those with degree values larger than or equal to the median of this sorted list. Quartile 3 nodes are those with degree values larger than or equal to the minimum value of the lower quartile of this sorted list. Quartile 4 comprises all nodes in the network. We observe that the difference in average median path length (AMPL) is larger in the first two quartiles (1 and 2) than in the last two. Also, networks produced under conditions (i) and (ii) have shorter median path lengths than case (iii) networks on average. This is expected given the way the networks were constructed.

Figure A4. Average Median Path Length by Quartile. Error bars indicate ± one standard deviation. [Plot omitted: average median path length against quartiles 1 to 4 for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]

Figure A5 gives the diameter (longest shortest path length) of the networks, and Figure A6 the average and median path length of the networks. Distinct differences between case (i) and (ii) networks and case (iii) networks are not discernible in these figures.

Figure A5. Network diameter. [Plot omitted: network diameter against network / test problem for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]

Figure A6. Average and Median Path Length. [Plot omitted: average and median shortest path length against network / test problem for the top 10 0.75, top 10 0.25, null and top 10 0.00 cases.]

Finally, we confirm that it is necessary to use nodes of high degree for R to vary degree correlation. Figure A7 gives the degree correlation for networks produced under the same three conditions but with ten randomly selected R nodes. The degree correlations of these networks are indistinguishable from that of the “null” network.

Figure A7. Random nodes for R. [Plot omitted: assortativity against network / test problem for the ran 0.75, ran 0.25, null and ran 0.00 cases.]

Appendix B. Equal-only and Unequal-only problems

Using the method described in Appendix A herein, we conducted experiments that support the hypothesis that, given similar conditions (i.e. number of nodes, number of links, degree distribution and clustering), fewer colors are needed to color disassortative than assortative networks (footnote 2). This result contradicts our previous results somewhat, in the sense that shorter characteristic path lengths amongst hubs were viewed favourably in [12] for quick resolutions. It leads us to the insight that search difficulty for problems with broad degree distribution can vary considerably depending on the degree of separation between hubs, and on whether the problem constraints are equal-only, e.g. the problems in [12], or unequal-only, e.g. graph-coloring.

Footnote 2: This point has been developed further in relation to biological networks and ref. [17]. See author’s website for further updates on this front.
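The degree quartiles used for Figure A4 in Appendix A can be sketched as follows. Two points are assumptions rather than statements from the text: the sorted list of unique degree values is cut into four contiguous parts by integer division (one reading of "(almost) equal"), and the threshold for Quartile 3 is taken as the minimum of the second-lowest part, since taking the minimum of the lowest part would make Quartile 3 identical to Quartile 4. The function name is hypothetical.

```python
def degree_quartiles(degree):
    """Return {1: nodes, 2: nodes, 3: nodes, 4: nodes} as nested node sets.

    degree maps each node to its degree. Quartile q contains every node
    whose degree is at least the smallest value in the q-th part from the
    top of the ascending list of unique degree values, so Quartile 1 holds
    the highest-degree nodes and Quartile 4 holds all nodes.
    """
    values = sorted(set(degree.values()))   # unique degrees, ascending
    n = len(values)
    if n == 0:
        return {q: [] for q in (1, 2, 3, 4)}

    quartiles = {}
    for q in (1, 2, 3, 4):
        # Index where the q-th part from the top begins (assumed split rule).
        start = ((4 - q) * n) // 4
        threshold = values[start]
        quartiles[q] = sorted(node for node, k in degree.items() if k >= threshold)
    return quartiles
```

Because the quartiles are nested rather than disjoint, a path length statistic computed for Quartile 4 covers the whole network, while Quartiles 1 and 2 isolate paths amongst the hubs.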