Discriminating topology in galaxy distributions using network analysis


Citation Hong, Sungryong, Bruno C. Coutinho, Arjun Dey, Albert-L. Barabási, Mark Vogelsberger, Lars Hernquist, and Karl Gebhardt. 2016. “Discriminating Topology in Galaxy Distributions Using Network Analysis.” Monthly Notices of the Royal Astronomical Society 459 (3): 2690–2700. https://doi.org/10.1093/mnras/stw803.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:41381855

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Open Access Policy Articles, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#OAP

arXiv:1603.02285v1 [astro-ph.CO] 7 Mar 2016

Preprint typeset using LaTeX style emulateapj v. 2/16/10

Sungryong Hong 1,2, Bruno Coutinho 3, Arjun Dey 1, Albert-L. Barabási 3,4,5, Mark Vogelsberger 6, Lars Hernquist 7, and Karl Gebhardt 2

1 National Optical Astronomy Observatory, Tucson, AZ 85719, USA
2 Department of Astronomy, The University of Texas at Austin, 2515 Speedway, Stop C1400, Austin, TX 78712, USA
3 Center for Complex Network Research and Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
4 Department of Medicine and Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
5 Center for Network Science, Central European University, 1051, Budapest, Hungary
6 Department of Physics, Kavli Institute for Astrophysics and Space Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
7 Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, Massachusetts 02138, USA

ABSTRACT

The large-scale distribution of galaxies is generally analyzed using the two-point correlation function. However, this statistic does not capture the topology of the distribution, and it is necessary to resort to higher order correlations to break degeneracies. We demonstrate that an alternate approach using network analysis can discriminate between topologically different distributions that have similar two-point correlations. We investigate two galaxy point distributions, one produced by a cosmological simulation and the other by a Lévy walk, that have different topologies but yield the same power-law two-point correlation function. For the cosmological simulation, we adopt the redshift z = 0.58 slice from Illustris (Vogelsberger et al. 2014A) and select galaxies with stellar masses greater than 10^8 M⊙. The two point correlation function of these simulated galaxies follows a single power-law, ξ(r) ∼ r^−1.5. Then, we generate Lévy walks matching the correlation function and abundance with the simulated galaxies. We find that, while the two simulated galaxy point distributions have the same abundance and two point correlation function, their spatial distributions are very different; most prominently, filamentary structures, which are present in the simulation, are absent in Lévy fractals. To quantify these missing topologies, we adopt network analysis tools and measure diameter, giant component, and transitivity from networks built by a conventional friends-of-friends recipe with various linking lengths. Unlike the abundance and two point correlation function, these network quantities reveal a clear separation between the two simulated distributions; therefore, the galaxy distribution simulated by Illustris is not a Lévy fractal quantitatively. We find that the described network quantities offer an efficient tool for discriminating topologies and for comparing observed and theoretical distributions.

Subject headings: methods: data analysis – galaxies: formation – galaxies: evolution – large-scale structure of Universe : network science

1. INTRODUCTION

Throughout the history of the Universe, various geometrical and topological features in the cosmic energy and matter distribution have formed, evolved, and vanished. It is undeniably critical to quantify and measure such features, since many of them can provide important and definitive probes for constraining cosmological parameters.

During the past two decades, studies of anisotropic features in the cosmic microwave background (CMB), specifically the acoustic peaks, have motivated the so-called Λ cold dark matter (ΛCDM) paradigm as a standard cosmology (e.g., Hinshaw et al. 2013, Aghanim et al. 2015) and made a new step forward in precision cosmology. Various experiments (Levi et al. 2013, Zhao et al. 2015, Delubac et al. 2015), currently underway or beginning, are mapping out the expansion history of the Universe with unprecedented accuracy, by measuring baryon acoustic oscillations (BAO). These experiments will also result in the most detailed maps of the large-scale galaxy distribution over a wide range in redshifts, from z ∼ 0 to z ∼ 3.

The successes of measuring CMB acoustic peaks and BAO features demonstrate how important the two-point correlation functions (or power spectra) are for quantifying cosmic structures. Higher order statistics are essential for analyzing the cosmic structures. For example, three and four point correlation functions (or, bi- and tri-spectra) can constrain the non-Gaussianity of primordial quantum fluctuations (Barkats et al. 2014, Ade et al. 2015); however, these measures are computationally challenging (e.g., Kulkarni et al. 2007, Gil-Marín et al. 2015).

Along with the successful n-point statistics, many topological measurements have been introduced, such as genus numbers and Minkowski functionals (Gott, Weinberg & Melott 1987, Eriksen et al. 2004). To identify voids and filaments, various methods have been adapted from other fields of science, including minimum-spanning trees, watersheds, Morse theory, wavelets, and smoothed Hessian matrices (e.g., Barrow, Bhavsar & Sonoda 1985, Sheth et al. 2003, Martínez et al. 2005, Aragón-Calvo et al. 2007, Colberg 2007, Sousbie et al. 2008, Bond, Strauss & Cen 2010, Lidz et al. 2010, Cautun et al. 2013). While these topological diagnostics have provided important insights into the structure in the Universe, the wide range of methodologies applied reflects the heterogeneous nature of this structure, but also how difficult it is to find a consistent and comprehensive framework for quantifying and measuring the topology of the Universe, in contrast to the successful n-point statistics.

Many of these studies generate a continuous density field by smoothing the galaxy point distribution and then measuring geometric topologies of genus numbers and Minkowski functionals. Our approach, which we term “network cosmology”, is to characterize the topology of the discrete point distribution directly using graph theory and network algorithms.

As a pilot study to explore new ways to quantify cosmic topologies, Hong & Dey (2015; hereafter, HD15) applied the analysis tools developed for the study of complex networks (e.g. Albert & Barabási 2002, Newman 2010) to the study of the large-scale galaxy distribution. The basic idea is to generate a graph (i.e., a “network”) composed of vertices (nodes) and edges (links) from a galaxy distribution, and then measure network quantities used in network science. In this paper, we demonstrate the utility of these techniques for differentiating between point distributions that have identical two-point correlations but different spatial distributions and topologies.

Our paper is organized as follows. In §2, as a more specific introduction to this paper, we offer a general discussion about what types of features can be measured from galaxy survey data, the strong and weak points of n-point statistics, and how network representations of galaxy distributions can improve our ability to quantify topological features in the Universe. In §3, we describe our samples to be investigated, the snapshot of Illustris data (Vogelsberger et al. 2014A) and Lévy walks with various parameters. In §4, we present the two-point correlation functions and network measurements from the samples and discuss the results. Then, we summarize in §5.

2. GEOMETRIC CONFIGURATIONS VS. TOPOLOGICAL TEXTURES IN GALAXY SURVEYS

Sections 2.1 and 2.2 present the definitions of geometry and topology used in this paper and our overall philosophy in applying methods of network analyses to galaxy distributions. Readers can skip these sections without losing the main thread of this paper.

2.1. Geometry vs. Topology

The terms geometry and topology are often used interchangeably in astronomical contexts. Geometry can be defined as the study of shapes of known metric dimensions, whereas topology refers to the intrinsic shape properties that are invariant to deformation (i.e., homotopic). For example, triangles are 3-sided geometric shapes that are characterized by the measures of their angles and sides. However, removing all metric features from triangles, we can also represent them topologically as a metric-free structure with three vertices where each vertex is connected to the other two by two edges. Another well-known example is the comparison between a mug and a donut; these are different geometric shapes with a common topology, the latter measured by a genus number of one.

Euler characteristics in graphs or genus numbers in manifolds are mathematically well-defined topological measures, invariant under homotopy or homeomorphism. However, most practical measurements are both geometric and topological, and do not have to be homotopy invariant in the strict mathematical sense in order to be topologically meaningful. For example, the set of Roman letters is topological. We can consistently recognize letters irrespective of font or handwriting since each letter has its own distinct topology. “i”, “k”, “l” are topologically very different even in mathematically rigorous measures. However, “i” and “j” are indistinct topologically. They are discerned instead by the differences in length and curve (angle). The process of reading, i.e., visually measuring the characteristics of each letter, is predominantly topological but includes geometric aspects.

In galaxy surveys, n-point statistics are typical measurements, as presented in §1. These are geometrically driven measurements; n-point correlation functions contain specific information about distances and angles between galaxies. From a practical standpoint, this renders n-point statistics computationally challenging, since computation times are dominated by the handling of geometric information.

If we are only interested in topological features, much of the geometric information is redundant. For example, if we need to count all triangles in a friends-of-friends network from a certain galaxy distribution, we can run a network algorithm to count all triangular subgraphs. We do not need to measure the three point statistic for the problem of only counting triangles. Likewise, if we are interested in the number of holes for an object, we do not need to know whether it looks like a mug or a donut.

Therefore, in practical analyses, we need to determine whether we are interested in quantifying geometric configurations or topological textures, when extracting measurements from galaxy survey data. Theoretically, a complete set of n-point statistics can suffice to characterize all aspects of a point distribution. However, such geometric analyses can be very inefficient when our prime focus is to quantify topological textures of the Universe.

2.2. Continuous Density Function vs. Discrete Point Distribution

To quantify topological structures of the Universe, many conventional studies have used geometric topology, where a metric topology is well-defined in a continuous cosmic matter distribution, ρ(x), or its density contrast, δ(x). In this approach, discrete observables such as galaxy or halo distributions, n(x), are considered as biased samplings of the underlying continuous cosmic matter distribution. Therefore, we generally smooth this discrete point distribution to approximate the continuous mass field.

In a different and empirical approach, we do not smooth over the discrete observable, n(x). Instead, we build a network structure (or, a graph) from this discrete observable, and measure network quantities; hence, algebraic topology from discrete observables, in contrast to the previous approach of metric topology from continuous observables. Hereafter, we refer to the former as the “DA” (discrete and algebraic) approach, and the latter as the “CM” (continuous and metric) approach.

The CM and DA approaches differ in methodology. The CM approach is based on differential geometry and topology; hence, parameters of geometric shape and topology are derived from differentials or integrals of the density field. For example, the Hessian matrix is derived from partial differentials of the density field. From the eigenvalues of this matrix, the clusters, walls, and filaments of the density field are classified (Aragón-Calvo et al. 2007, Bond et al. 2010, Cautun et al. 2013). Minkowski functionals are defined using integrals of the density field to quantify geometric and topological features such as area, perimeter, and genus (Mecke et al. 1994, Park et al. 2005, Hikage et al. 2008, Ducout et al. 2013).

On the other hand, our DA approach (which we refer to as “network cosmology”) mostly utilizes network algorithms developed and used in computer science, mathematics, physics, and sociology. With its roots in Euler’s brilliant solution to the Königsberg bridge problem (Euler 1741), network science has grown rapidly during the last two decades, driven by the growth of computing power, large databases, and internet infrastructures (Albert & Barabási 2002, Barabási 2009, Newman 2010). Networks can be constructed for studying subjects as diverse as the relationships between costarring actors, protein interactions, paper citations, the food web, power grids, traffic patterns, the world wide web (WWW), etc. Many network tools have been developed to extract useful information from these various kinds of big data networks. We have attempted, therefore, to utilize these network tools for investigating galaxy survey data. For example, PageRank was developed for prioritizing the importance of WWW documents, used in the search engine, Google 8 (Page et al. 1999). We can measure these PageRanks for galaxies, once we build a galaxy network from galaxy survey data. As we have a friend recommendation from Facebook 9, such a recommendation algorithm also can be applied to our galaxy network. This is the basic philosophy of our network cosmology.

8 http://www.google.com
9 http://www.facebook.com

Early attempts of applying network science tools to galaxy point distributions made use of methods and the minimum spanning tree, or MST (see, e.g., Shandarin et al. 1983AB, Barrow et al. 1985, Colberg 2007). Since these pioneering papers, the tools developed for analyzing networks have proliferated and mathematically matured. Our earlier work, HD15, investigated galaxy distributions using various measures of network centrality (degree, betweenness, and closeness). In this paper, we apply the network measures of diameter, giant component fraction, and transitivity to simulated galaxy point distributions.

2.3. Degeneracy in Two-point Correlation Function

It has long been reported that observed galaxy populations exhibit single power-law clusterings within several tens of megaparsecs in comoving scale (e.g., Davis & Peebles 1983, Shandarin & Zeldovich 1989, Adelberger et al. 2005). Within the cold dark matter paradigm of galaxy formation, galaxies are biased tracers of the underlying matter distribution and the clustering properties of different galaxy populations can be diverse, depending on how galaxies populate their dark matter halos. Analysis of the two-point correlation function of different galaxy populations has resulted in the idea of the “halo occupation distribution”; i.e., the probability that a given halo contains a certain number of galaxies, and has given rise to various analytic and probabilistic formulations of this function (Berlind & Weinberg 2002, Zheng et al. 2005, Tinker 2007). From these halo occupation studies, there should be a transitioning scale from the dominance of the one halo term to the two halo term; hence there is no need for galaxies to show a single, seamless power-law clustering trend. The apparent single power-law behavior, especially for low redshift galaxies, is thought to be due to massive galaxy clusters whose contribution erases the transition feature (Berlind & Weinberg 2002).

For a single power-law correlation, a couple of methods have been proposed to generate mock galaxies including Lévy walks (Mandelbrot 1975) and multi-layered shells akin to Russian dolls or onion rings (Soneira & Peebles 1978; hereafter, SP78). These models are “statistical”, since, unlike galaxies in simulations or the real Universe, their clustering properties do not originate from “gravitational” interaction, but instead from a statistical fractal realization. These models can be tuned to match both the abundance and the two-point correlation function of the observed galaxy distribution.

Now, we raise two questions:
(1) What is the gap between gravitational and statistical realizations?, and
(2) Can we quantify the gap to finally test how reliable statistical models are as mocks?
These are based on doubts about the sufficiency of information from abundance and two-point statistics for testing cosmologies and features that are missing in two-point statistics.

Interestingly, it is trivial for the human eye to capture the gap between statistical models and observed (or simulated) galaxy distributions. SP78 reproduced a reasonable first approximation of the observed galaxy distribution using their fractal model to match the observed single power-law clustering. And, due to some visual gap in spatial distributions between their models and observed galaxies, they remarked on the ability of the human eye, inherently optimized to detect topological patterns rather than mathematical geometries. SP78 note that patterns easily discriminated by the human eye are difficult to quantify, when compared to mathematically straightforward n-point statistics. Overall, SP78 implied that there are some features, easily captured by the human eye, that are not easily quantified by n-point statistics.

In this paper, we show that statistical ensembles produced by Lévy walks do not resemble simulated galaxies upon visual inspection (§3), agreeing with the same qualitative conclusion of SP78. However, to make a new step forward, we propose that topologically motivated diagnostics, especially the network measurements adopted in what follows, can quantify such eye-capturing features.

To test our proposal, we employ a simple setup, as follows. First, we adopt simulated galaxies as a cosmological sample. While there are discrepancies between observed and simulated galaxies, cosmological hydrodynamic simulation can provide accurate three-dimensional positions with realistic galactic properties, appropriate as a simple pilot study without any observational complication. Second, we generate a statistical ensemble, using Lévy walks, to match the two-point correlation function of the simulated galaxies. The next section, §3, will cover these two steps and present the spatial distributions of the simulated and statistical models to show any visual gap between them. Finally, we measure network quantities for the simulated sample and statistical ensemble in §4. These network measurements will explain why we recognize a difference between statistical and gravitational realizations by sight, while they have practically [...]

[...] coordinates. The resolution of the dark matter mass is 6.26 × 10^6 M⊙ and the resolution of the baryonic mass 1.26 × 10^6 M⊙. We selected galaxies with stellar mass ≥ 10^8 M⊙; this yields a sample of 75,050 galaxies. Hereafter, we refer to this sample as “Snap100”.

The top-left panel of Figure 1 shows the two-dimensional spatial distribution of Snap100, projected along the z-axis. We can identify rich structures of clusters and filaments. The red-open diamonds in the top-right panel of Figure 1 show the two point correlation function of Snap100, measured using the method from Landy & Szalay (1993). We do not apply integral constraints to any of the samples in this paper, since they are minor and contribute the same amount due to the equal survey volume. Power-law slopes can be slightly shallower, when integral constraints are applied. The clustering of galaxies in Snap100 is well represented by a single power-law with the slope, γ ∼ 1.5.

3.2. Statistical Fractal Galaxies : Lévy Walks

A Lévy walk (or a Lévy flight) is a random walk, whose step-size l follows the distribution

$$ P(>l) = \begin{cases} (l_0/l)^{\alpha} & \text{for } l \ge l_0 \\ 1 & \text{for } l < l_0 \end{cases} \tag{1} $$

[...] > l_0. All galaxy pairs closer than this minimum length are random encounters resulting in flat clustering for r ≤ l_0. The middle panels of Figure 1 show the spatial distributions of two LWIB models, LWIBa and LWIBb; their model parameters are summarized in Table 1. The top-right panel shows the two-point correlation functions of these two models, LWIBa (black) and LWIBb (grey). The vertical dashed lines represent the minimum step sizes: l_0 = 0.2 (grey) and l_0 = 0.24 (black). The two LWIB models match well the two point correlation function of Snap100 for r > l_0. However, for r ≤ l_0, the clustering flattens out due to their intrinsic limitations, as noted above. To extend the power-law behavior to the smaller scales r ≤ l_0, we need to make those random close pairs geometrically more compact. We refer to this small-scale tuning of clustering as “Proximity Adjustment” (PA).

There are many empirical approaches for determining the proximity correction. Our method is to require: (1) the correction to be based on the LWIB, and identical to the latter in the limit of zero correction; and (2) that the corrections only be applied on small scales r ≤ l_0.
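
The step-size distribution of Equation 1 can be sampled by inverse transform: if u is uniform on (0, 1], then l = l_0 u^(−1/α) satisfies P(>l) = (l_0/l)^α for l ≥ l_0. The following is a minimal sketch of a basic walk under stated assumptions, not the authors' implementation. The function names, the isotropic choice of directions, the periodic wrapping into a cubic box, and the `box_size` value are all assumptions made for illustration; only l_0 and α come from the text.

```python
import math
import random


def levy_step(l0, alpha, rng):
    """Draw a step length l with P(>l) = (l0 / l)**alpha for l >= l0 (Equation 1)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so the power below is finite
    return l0 * u ** (-1.0 / alpha)


def random_direction(rng):
    """Isotropic unit vector in three dimensions (an assumption of this sketch)."""
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def levy_walk_in_box(n_points, l0, alpha, box_size, seed=0):
    """Generate n_points positions of a Levy walk confined to a cubic box.

    Periodic wrapping is assumed here; the source text describing the box
    treatment is not available, so this choice is hypothetical.
    """
    rng = random.Random(seed)
    pos = [rng.uniform(0.0, box_size) for _ in range(3)]
    points = [tuple(pos)]
    while len(points) < n_points:
        step = levy_step(l0, alpha, rng)
        direction = random_direction(rng)
        pos = [(p + step * d) % box_size for p, d in zip(pos, direction)]
        points.append(tuple(pos))
    return points


# Illustrative call: l0 and alpha follow the LWIBa row of Table 1,
# while box_size = 100.0 is an arbitrary placeholder.
walk = levy_walk_in_box(1000, l0=0.24, alpha=1.5, box_size=100.0)
```

Because each step is drawn independently of the existing points, pairs closer than l_0 arise only by chance, which is consistent with the flat small-scale clustering described above.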

We refer to the models satisfying these two criteria as “Lévy Walks in a Box with Proximity Adjustment” (LWIBPA).

Specifically, we choose a simple extension of LWIB for our LWIBPA model, as follows. First, from the initial position (or the current position), we generate the next walk by LWIB with (l_0, α). Second, we find the nearest neighbor from the new walk position. Third, if the distance from the nearest neighbor r_min is larger than the minimum step size l_0, i.e., r_min > l_0, then we accept this walk and proceed to the next iteration. If r_min ≤ l_0, we calculate a new step size from the new power-law of (l_m, β), where l_m < l_0. If this new step size, r_new, is larger than l_0, i.e., r_new > l_0, then we discard this PA process to accept the original LWIB position and proceed to the next iteration. If r_new ≤ l_0, we take a random roll, µ ∈ [0, 1). If this roll, µ, is larger than our acceptance threshold, p_θ, i.e., µ > p_θ, we again discard this PA process to accept the original LWIB position and proceed to the next iteration. Finally, for µ ≤ p_θ along with the previous r_min ≤ l_0 and r_new ≤ l_0, we accept this PA correction. We keep the direction between the nearest neighbor and new walk position to only replace r_min with r_new. Whether the PA correction is accepted or not, the next walk is calculated based on the original LWIB position. We build this PA recipe in a conservative manner to keep the new LWIBPA as close as possible to the original LWIB. To briefly summarize, our LWIBPA model is a broken two-power-law model with a threshold of acceptance probability determining the choice between the two power laws, and is represented by the five parameters (l_0, α, l_m, β, p_θ).

The bottom panels of Figure 1 show the spatial distributions of our two LWIBPA models, LWIBPAa and LWIBPAb, where their parameters are summarized in Table 1. The top-right panel shows the two point correlation functions of LWIBPAa (green) and LWIBPAb (blue). These two models are variants of the original LWIBa model, sharing the same parameters (l_0 = 0.24, α = 1.5); but having different acceptance probabilities, p_θ = 0.35 for LWIBPAa and p_θ = 1 for LWIBPAb. LWIBa also corresponds to the model of p_θ = 0.

Table 1
Lévy Walk Models

Name       l_0    α     l_m    β     p_θ
LWIBa      0.24   1.5   –      –     –
LWIBb      0.20   1.6   –      –     –
LWIBPAa    0.24   1.5   0.01   1.5   0.35
LWIBPAb    0.24   1.5   0.01   1.5   1.00

Note. — l_0 and α are the basic Lévy walk parameters presented in Equation 1. The others are for the “proximity adjustment” as explained in the text. None of these Lévy walk models properly mimic the spatial distribution of Snap100.

4. RESULTS

4.1. Missing Topologies in Two Point Correlation Function

In the previous section, we described our adopted simulation sets, Snap100, and Lévy walk recipes, and measured their two-point correlation functions, as summarized in Figure 1.

LWIBPAa yields a good match to the correlation function of Snap100, at least to the accuracy of practical clustering studies. LWIBa and LWIBPAb can be considered, respectively, as “lower” and “upper” bounds of correlation functions encompassing suppressed and enhanced small scale clustering. LWIBb is a model with slightly different parameters, l_0 = 0.2 and α = 1.6, from LWIBa, demonstrating that the clustering properties of LWIB models do not change abruptly by choosing parameters nearby. Hence, the four types of Lévy walk models illustrated in Figure 1 span a good range of possible Lévy fractals, comparable to Snap100.

The important point is that while the Lévy walk distributions reproduce the two-point correlation function of the galaxy distribution in the Illustris simulation, none of them mimic the actual spatial distribution of the Snap100 galaxies. In particular, the Lévy walks fail to reproduce the filamentary structure that is so characteristic of actual galaxy distributions. This implies that Lévy fractals are not appropriate for explaining the structure of the (simulated) Universe, and two-point statistics are highly degenerate. As noted in §2, this is because topological features are elusive in n-point statistics, while human eyes are more adapted to effectively recognize topologies of patterns and connectivities. Contrary to the long-held belief that such eye-capturing features are hard to quantify, in the following sections we describe how such topologies can be measured using network science tools.

4.2. Network Analysis: Quantifying Missing Topologies

A network is a data structure composed of “vertices” (or nodes) connected by “edges” (or links); also known as a graph in mathematics. In the 21st century, network science (or graph theory) has become one of the most critical tools in various fields, such as bioinformatics, computer science, physics, and sociology. In a previous work (HD15), we explored the use of network measures (betweenness, closeness and degree) to investigate the relationships between galaxy properties and topology. The results were promising, but limited by the use of photometric (rather than spectroscopic) redshifts to characterize the 3-dimensional galaxy distribution. Readers interested in networks are referred to Newman (2003), Dorogovtsev & Goltsev (2008), Barthélemy (2011), and HD15 for further information.

4.2.1. Linking Length and Friends-of-friends Network

To build a network from a given galaxy population, we adopt the conventional friends-of-friends (FOF) recipe (e.g. Huchra & Geller 1984, More et al. 2011, HD15). For a given linking length l, we define the adjacency matrix as,

$$ A_{ij} = \begin{cases} 1 & \text{if } r_{ij} \le l, \\ 0 & \text{otherwise}, \end{cases} \tag{4} $$

where r_ij is the distance between the two vertices, i and j. This binary matrix quantitatively represents the network connectivities of the FOF recipe. Many important network measures are derived from this matrix.

4.2.2. Network Topologies : Diameter, Giant Component, and Transitivity

We measure three simple scalar quantities, diameter, giant component, and transitivity, from FOF networks.
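
The friends-of-friends construction of Equation 4 can be sketched in a few lines. This is a hedged illustration only: the function name `fof_adjacency` is hypothetical, the pair search is a brute-force O(N²) loop that would be too slow for the full sample, distances are plain Euclidean without periodic wrapping, and self-links (i = j) are excluded by convention even though r_ii = 0 would formally satisfy r_ij ≤ l.

```python
import math


def fof_adjacency(points, linking_length):
    """Return adjacency sets: j is in adj[i] iff i != j and r_ij <= linking_length."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean separation r_ij between vertices i and j
            if math.dist(points[i], points[j]) <= linking_length:
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Four toy "galaxies" on a line: three spaced by 1, one far away.
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
short_links = fof_adjacency(toy_points, 1.0)  # chain 0-1-2, vertex 3 isolated
long_links = fof_adjacency(toy_points, 2.0)   # 0, 1, 2 become mutually linked
```

Storing the matrix as adjacency sets rather than a dense N × N array keeps the memory cost proportional to the number of edges, which matters when the linking length is small and the network is sparse.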

Figure 1. The top-left panel shows the spatial distribution of the Illustris galaxy distribution at z = 0.58 (“Snap100”) in comoving coordinates and the middle and bottom panels show L´evy walk galaxies with various parameters, summarized in Table 1. The top-right panel shows the two point correlation function for each sample. The dashed vertical lines represent the minimum step sizes of L´evy Walk models. We produce 100 realizations for each L´evy walk and, here, we present 5 measurements (hence, 5 lines for each) to illustrate ensemble variances. The major difference, even clear in visual inspection, between the cosmological simulation and L´evy fractals is the filamentary structure, which is absent in the L´evy fractal realizations. Network Topology 7

The Diameter is the largest path length of shortest- 1.0 pathways from all pairs in a network. The path length is defined as the number of steps to reach from a certain 0.8 vertex, i, to another, j. Hence, the pathways of min- imum path length are the shortest pathways between 0.6 the vertices, i and j; generally, there can be multiple shortest pathways between a pair in an unweighted net- work. When a linking length is quite small to isolate all 0.4 galaxies alone, the diameter is trivially 0. As the link-

Giant Component ing length is increased, the diameter grows to reach a 0.2 certain maximum value. Since, for a very large linking length, all pairs are connected by a single direct edge (in 0.0 the mathematical terms, forming a “complete graph”), 0.0 0.5 1.0 1.5 2.0 2.5 3.0 the diameter asymptotically decreases to 1, after reach- Linking Length (h−1 Mpc) ing the maximum value. Hence, this varying curve of Diameter vs. Linking Length is a quantified topology, depending on pathway structures. 300 The Giant component is the largest connected sub- graph in a network. As in the case of diameter, gi- 250 ant components are trivial for the two extreme linking lengths. For a small linking length that isolates individ- 200 ual galaxies, the size of the giant component is 1. In the opposite case of a very large linking length forming 150 a complete graph, the giant component size is equal to

Diameter the total number of vertices (galaxies). Hence, the ratio 100 of the size of the giant component to the total number of vertices is a fraction that increases from 0 to 1 monoton- 50 ically with the linking length. The rate of growth of this 0 ratio depends on topology; if a network has some topo- 0.0 0.5 1.0 1.5 2.0 2.5 3.0 logical structures to connect vertices more efficiently, the Linking Length (h−1 Mpc) fraction of the giant component grows faster to reach 1 at a smaller linking length. Transitivity can be described as a “triangle density” 1.0 for a network. It is defined as: number of closed paths of length two 0.8 C = , (5) number of paths of length two 0.6 where C denotes the transitivity (Newman 2010). A path Snap100 of length two means a “∨” shaped connection; i.e. my 0.4 LWIBa Transitivity friend-of-friend configuration in a social network. If my LWIBb friend-of-friend is my direct friend, this path of length 0.2 LWIBPAa two forms a closed path of length two; i.e. a triangle ▽ LWIBPAb “ ”. Therefore, Equation 5 predicts a higher transitiv- 0.0 ity value if there are more triangles in a network. To 0.0 0.5 1.0 1.5 2.0 2.5 3.0 some extent, transitivity can be considered as a minimal Linking Length (h−1 Mpc) (and topological) version of the three point correlation function. Figure 2. The three network measurements, giant component Figure 2 shows the three network quantities for our 5 fraction (top), diameter (middle), and transitivity (bottom) vs. samples, Snap100 (red-open diamonds), LWIBa (black linking length for the five models, the Illustris z = 0.58 snapshot (“Snap100”) (red-open diamonds), LWIBa (black lines), LWIBb lines), LWIBb (grey lines), LWIBPAa (green lines), and (grey lines), LWIBPAa (green lines), and LWIBPAb (blue lines). LWIBPAb (blue lines). 
As in Figure 1, for each Lévy walk model we plot 5 lines for 5 realizations to illustrate statistical variance. The three network quantities are uniquely determined for a given linking length. Namely, the three plots of diameter, giant component, and transitivity vs. linking length are self-consistently determined for a given spatial distribution, like n-point statistics, without any further parameter or assumption except for their independent variable, the linking length.

Figure 2 (continued). For each Lévy walk model, we plot 5 lines for 5 realizations, as in Figure 1, to illustrate statistical variances. All three network measurements show clear separations between Snap100 and Lévy fractals, implying that Lévy walk models fail to match the topological properties of Snap100. The galaxy distribution in the Illustris simulation is clearly not a Lévy fractal.

... for various linking lengths, using the open network library igraph (Csardi & Nepusz 2006). Due to their simple definitions, these measures are computationally cheap and widely used in complex networks. The question is whether these network science tools can quantify the topological differences missed by two-point statistics.

The top panel of Figure 2 shows the results for giant component fractions. Now we can quantitatively discern Snap100 (red-open diamonds) from LWIBPAa (green lines), though they have (practically) the same abundance and two-point correlation function, as shown in Figure 1. All the other Lévy walk models also fail to match the growth curve of giant component fractions. Considering that LWIBa and LWIBPAb are, respectively, lower and upper bounds of the small-scale clustering for LWIBPAa (and Snap100), the failure of all Lévy walk models to match the giant component fractions implies a fundamental difference in pathway topology between Snap100 and Lévy fractals: Snap100 has more efficient pathways to connect all galaxies at a shorter linking length than Lévy fractals. Very likely, this is due to the filamentary structures in Snap100, which are lacking in Lévy fractals.

The middle panel of Figure 2 shows the diameters. We can again see clear separations of Snap100 from the Lévy walk models. Snap100 reaches its maximum diameter, 300, at a linking length of 1.1 $h^{-1}$ Mpc, while the Lévy walk models reach maximum diameters of around 200 at linking lengths near 2.0 $h^{-1}$ Mpc. Even for 100 Lévy walk realizations, none of the Lévy walk models can match the diameter measurements of Snap100. Hence, both the size of the giant component and the diameter are network measures that discriminate the Lévy walk topologies from the Illustris simulation, despite the data sets being constructed to have matching abundance and two-point correlation statistics.

The linking length for maximum diameter is related to the inflection point of the growth curve of giant component fractions; the rate of growth of the giant component decreases after reaching the maximum diameter. This transition occurs due to the “saturation” of connecting edges. At first (i.e., at small linking lengths), increasing the linking length results in adding new vertices and increasing the size of the connected network components. However, once the largest diameter is reached, increasing the linking length tends to form new pathways within the existing structure between more far-flung members and only slowly increases the overall size of the connected structure. Therefore, the diameter is maximized at this critical scale, transitioning from a “growing phase” to a “saturating phase”. Previous percolation studies are closely related to this maximum diameter scale, though they have not measured these specific diameters. If the system size is infinite, the diameter measurements transition from finite values to infinity near this scale.

The bottom panel of Figure 2 shows the transitivity results. Again, none of the Lévy walk models mimic the transitivity curve of Snap100. We note that the statistical variances of the transitivity measurements are much smaller than those of the other measurements, as shown in Figure 2, since a single realization of the network is statistically large enough for counting triangles. Hence, the difference in transitivity between Snap100 and Lévy fractals also suggests that Snap100 is topologically very different from Lévy fractals.

An interesting feature is the difference of convexities between Snap100 (concave or “cup”) and Lévy fractals (convex or “hat”). The transitivities of Snap100 are high for small linking lengths, then decrease to a minimum transitivity at 0.4 $h^{-1}$ Mpc as the linking length increases. After this, the transitivities slowly increase to 0.8. This transitivity trend of Snap100 is related to the transition from the one-halo term to the two-halo term in halo occupation clustering models (Berlind & Weinberg 2002). For small linking lengths, most triangles form in cluster environments reflecting halo substructures. Hence, these “intra-halo triangles” (i.e., triangles lying within one halo) dominate the transitivities for small linking lengths, and result in a decreasing trend from a very high transitivity. On the other hand, for sufficiently large linking lengths, “inter-halo triangles” (i.e., halo-halo-halo triangles) dominate over intra-halo triangles, simply because their configurations are more frequently found. Since a “∨” shaped configuration becomes a triangle as the linking length increases, the transitivities for inter-halo scales are generally an increasing function. Therefore, the decreasing intra-halo trend and the increasing inter-halo trend together produce the concave shape of the Snap100 transitivity curve, with its minimum near 0.4 $h^{-1}$ Mpc marking the transition between the two regimes.

For Lévy walk models, the origins of triangles are different from Snap100. For scales smaller than the minimum Lévy walk step $l_0$ (0.2 $h^{-1}$ Mpc and 0.24 $h^{-1}$ Mpc in Table 1), the triangles originate from “random” encounters or our “proximity adjustment” recipe. For scales larger than the minimum Lévy walk steps, the fractal Lévy walks shape the rest of the triangles. Hence, discontinuities occur at these breaking scales in the transitivity curves. The typical fractal transitivities increase to reach maximum values, and then asymptotically decrease to around 0.75. These convex (or “hat”) trends contrast with the concave (or “cup”) shape of Snap100.

Figure 3 shows the edges (red lines) connecting galaxies in the giant component of Snap100, visualizing the spatial network structure of the giant component. The linking length is 1.1 $h^{-1}$ Mpc, where the diameter is maximized. The texture of this Snap100 giant component can be described as “thin, diversifying, and filamentary”. Figure 4 shows the same as Figure 3 for LWIBPAa. The linking length is 2.0 $h^{-1}$ Mpc, where the diameter for LWIBPAa is maximized. The texture of LWIBPAa’s giant component (red lines) can be described as “thick, clumpy, and modularized”. The blue lines show the edges of the giant component for the linking length 1.1 $h^{-1}$ Mpc, comparable to Figure 3. While the giant component of Snap100 shows a fully developed global structure at 1.1 $h^{-1}$ Mpc, the giant component of LWIBPAa (blue lines) is still localized due to the lack of topological bridges. Overall, the structural and topological differences between Snap100 and Lévy fractals are well reflected in network structure.

Figure 3. The simulated galaxies of the Illustris z = 0.58 snapshot (“Snap100”; grey dots) and edges connecting galaxies in the giant component (red lines), visualizing the spatial network structure of the giant component. The linking length is 1.1 $h^{-1}$ Mpc, where the diameter is maximized. The texture of this Snap100 giant component can be described as “thin, diversifying, and filamentary”.

Figure 4. The same as Figure 3 but for LWIBPAa. The linking length is 2.0 $h^{-1}$ Mpc, where the diameter for LWIBPAa is maximized. The texture of LWIBPAa’s giant component (red lines) can be described as “thick, clumpy, and modularized”. The blue lines show the edges of the giant component (hence, the largest component), and the green lines those of the second, third, and fourth largest components, for the linking length 1.1 $h^{-1}$ Mpc, comparable to Figure 3. While the giant component of Snap100 shows a fully developed global structure at 1.1 $h^{-1}$ Mpc, the giant component (blue lines) and next largest components (green lines) of LWIBPAa are still localized due to the lack of topological bridges.

Figure 5 presents two basic schemas to demonstrate which topological configurations can increase (or decrease) the transitivity. We note that the variation of transitivity depends on very complex topological structures. The schemas are only two possible cases among many. The top diagram of Figure 5 shows that the new vertex (asterisk) and edges (grey dashed lines) produce four additional “∨” configurations, but none of them form a triangle; hence, transitivity decreases with this new vertex. This schema provides a possible illustration as to why Snap100 shows a decreasing transitivity trend at intra-halo scales. On the other hand, the bottom diagram of Figure 5 shows that the new vertex and edges form three additional triangles, increasing the transitivity. Basically, a linear chain of walks is less efficient in forming triangles than a gravitational pull that packs galaxies. This explains why Lévy walks show smaller transitivities at small scales than Snap100. However, such low transitivity values can be restored as the linking length increases, as in the bottom diagram. Hence, the different behaviors of transitivity between Snap100 and Lévy fractals reflect the different topological bindings of galaxies (or walks); i.e., a gravitationally packed solid ball vs. a linearly tangled ball.

Figure 5. Schemas demonstrating two possible cases to decrease (top, “Decreasing Transitivity”) and to increase (bottom, “Increasing Transitivity”) transitivity values by adding a new vertex (asterisk) and its edges (dashed-grey lines). In the top diagram, the new dashed edge produces 4 additional “∨” configurations, but none of them form triangles. On the other hand, in the bottom diagram, three triangles form by the new vertex and edges. Hence, the different convexities of transitivity curves for Snap100 and Lévy fractals reflect the intrinsic difference of topological structures.

5. SUMMARY

In this paper we have used a network approach to compare two galaxy distributions with similar two-point correlation statistics but different topologies, one derived from a cosmological simulation and the other from a Lévy walk. The network measures are computed directly from the point distribution of the galaxies, unlike past measures that characterize a smoothed continuous version of the point distribution. We find that the simulated galaxies and Lévy walks are statistically different in diameter, giant component, and transitivity measurements, which shows that Lévy walks fail to mimic the topologies of the distribution of the simulated galaxies, though they successfully match the abundance and two-point correlation function.

This implies that quantified topologies are important for testing cosmologies. While n-point statistics are undeniably useful diagnostics, their topological complements are necessary to properly test cosmologies and to prevent misinterpretation that could result from oversimplified false-positive models.

We are grateful to an anonymous referee for comments that have improved this paper. SH’s research activities have been supported by the National Optical Astronomy Observatory (NOAO) and the University of Texas at Austin, and AD’s by NOAO. NOAO is operated by the Association of Universities for Research in Astronomy (AURA) under cooperative agreement with the National Science Foundation. LH is supported by NASA ATP Award NNX12AC67G and NSF grant AST-1312095.

REFERENCES

[1] Adelberger, K. L., et al. 2005, ApJ, 619, 697

[2] Albert, R., & Barabási, A.-L. 2002, Reviews of Modern Physics, 74, 47
[3] Alon, N., Yuster, R., & Zwick, U. 1997, Algorithmica, 17, 209
[4] Aragón-Calvo, M. A., Jones, B. J. T., van de Weygaert, R., & van der Hulst, J. M. 2007, A&A, 474, 315
[5] Barkats, D., et al. 2014, ApJ, 783, 67
[6] Barabási, A.-L. 2009, Science, 325, 412
[7] Barrow, J. D., Bhavsar, S. P., & Sonoda, D. H. 1985, MNRAS, 216, 17
[8] Barthélemy, M. 2011, Physics Reports, 499, 1
[9] Berlind, A. A., & Weinberg, D. H. 2002, ApJ, 575, 587
[10] Bond, N. A., Strauss, M. A., & Cen, R. 2010, MNRAS, 409, 156
[11] Csardi, G., & Nepusz, T. 2006, InterJournal, Complex Systems, 1695 (http://igraph.org)
[12] Cautun, M., van de Weygaert, R., & Jones, B. J. T. 2013, MNRAS, 429, 1286
[13] Colberg, J. M. 2007, MNRAS, 375, 337
[14] Cormen, T. H., et al. 2009, Introduction to Algorithms, 3rd Edition, The MIT Press, Cambridge, Massachusetts
[15] Davis, M., & Peebles, P. J. E. 1983, ApJ, 267, 465
[16] Delubac, T., et al. 2015, A&A, 574, 59
[17] Di Matteo, T., et al. 2005, Nature, 433, 604
[18] Dorogovtsev, S. N., & Goltsev, A. V. 2008, Rev. Mod. Phys., 80, 1275
[19] Ducout, A., et al. 2013, MNRAS, 429, 2104
[20] Euler, L. 1741, Commentarii Acad. Sci. Petropolitanae, 8, 128
[21] Eisenbrand, F., & Grandoni, F. 2004, Theoretical Computer Science, 326, 57
[22] Eriksen, H. K., et al. 2004, ApJ, 612, 64
[23] Genel, S., et al. 2014, MNRAS, 445, 175
[24] Gil-Marin, H., et al. 2015, MNRAS, 451, 539
[25] Gott, J. R., Weinberg, D. H., & Melott, A. L. 1987, ApJ, 319, 1
[26] Hikage, C., et al. 2008, MNRAS, 389, 1439
[27] Hinshaw, G., et al. 2013, ApJS, 208, 19
[28] Hong, S., & Dey, A. 2015, MNRAS, 450, 1999
[29] Huchra, J. P., & Geller, M. J. 1982, ApJ, 257, 423
[30] Keres, D., et al. 2012, MNRAS, 425, 2027
[31] Kulkarni, G., et al. 2007, MNRAS, 378, 1196
[32] Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64
[33] Levi, M., et al. 2013, arXiv:1308.0847
[34] Lidz, A., et al. 2010, ApJ, 718, 199
[35] Mandelbrot, B. 1975, C. R. Acad. Sci. (Paris), A280, 1551
[36] Martínez, V. J., Starck, J.-L., Saar, E., Donoho, D. L., Reynolds, S. C., de la Cruz, P., & Paredes, S. 2005, ApJ, 634, 744
[37] Mecke, K. R., Buchert, T., & Wagner, H. 1994, A&A, 288, 697
[38] More, S., et al. 2011, ApJS, 195, 4
[39] Nelson, D., et al. 2015, A&C, 13, 12
[40] Nelson, D., et al. 2013, MNRAS, 429, 3353
[41] Newman, M. E. J. 2003, SIAM Review, 45, 167
[42] Newman, M. E. J. 2010, Networks: An Introduction, Oxford Univ. Press, Oxford
[43] Park, C., et al. 2005, ApJ, 633, 11
[44] Planck Collaboration, & Ade, P. A. R., et al. 2015, arXiv:1502.01592
[45] Planck Collaboration, & Aghanim, N., et al. 2015, arXiv:1507.02704
[46] Shandarin, S. F. 1983A, Pis’ma Astron. Zh., 9, 195 [Sov. Astron. Lett. 9, 104 (1983)] (A)
[47] Shandarin, S. F., et al. 1983B, Usp. Fiz. Nauk, 139, 83 [Sov. Phys. Usp. 26, 46 (1983)] (B)
[48] Shandarin, S. F., & Zeldovich, Y. B. 1989, Reviews of Modern Physics, 61, 185
[49] Sheth, J. V., Sahni, V., Shandarin, S. F., & Sathyaprakash, B. S. 2003, MNRAS, 343, 22
[50] Sijacki, D., et al. 2012, MNRAS, 424, 2999
[51] Sijacki, D., et al. 2007, MNRAS, 380, 877
[52] Soneira, R. M., & Peebles, P. J. E. 1978, AJ, 83, 845
[53] Sousbie, T., Pichon, C., Courtois, H., Colombi, S., & Novikov, D. 2008, ApJ, 672, L1
[54] Springel, V., & Hernquist, L. 2003, MNRAS, 339, 289
[55] Springel, V., et al. 2005, MNRAS, 361, 776
[56] Springel, V. 2010, MNRAS, 401, 791
[57] Tinker, J. 2007, MNRAS, 374, 477
[58] Vogelsberger, M. 2014A, Nature, 509, 177 (A)
[59] Vogelsberger, M., et al. 2014B, MNRAS, 444, 1518 (B)
[60] Vogelsberger, M., et al. 2013, MNRAS, 436, 3031
[61] Vogelsberger, M., et al. 2012, MNRAS, 425, 3024
[62] Zhao, G., et al. 2015, arXiv:1510.08216
[63] Zheng, Z., et al. 2005, ApJ, 633, 791
[64] Zhu et al. 2015, ApJ, 800, 6