Eindhoven University of Technology

BACHELOR

Site percolation on the hierarchical configuration model

Verleijsdonk, Peter

Award date: 2017


Department of Mathematics and Computer Science

Site Percolation on the Hierarchical Configuration Model

Bachelor Thesis

P. Verleijsdonk

Supervisors: prof. dr. R.W. van der Hofstad C. Stegehuis (MSc)

Eindhoven, March 2017

Abstract

This thesis extends the research on percolation on the hierarchical configuration model. The hierarchical configuration model is a configuration model where single vertices are replaced by small community structures. We study site percolation on the hierarchical configuration model, as well as the critical percolation value, size of the giant component and distribution after percolation. For this we use analytical methods and a stochastic simulation.


Contents

1 Introduction
    1.1 Report Structure

2 Model Description
    2.1 Configuration model
    2.2 Hierarchical Configuration Model
    2.3 Community structure
    2.4 Model Assumptions

3 Site Percolation
    3.1 Site Percolation On Special Community Structures
        3.1.1 Household Communities
        3.1.2 Star Communities
        3.1.3 Line Communities
    3.2 Site Percolation on the Configuration Model
    3.3 Site Percolation on the Hierarchical Configuration Model
    3.4 Existence of a Giant Component after Site Percolation

4 Simulation
    4.1 Simulation Motivation
    4.2 Simulation Description

5 Simulation Results
    5.1 Size of the Giant Component
    5.2 Distances

6 Conclusions and discussion

7 Appendix
    7.1 Table of Definitions

References

1 Introduction

This thesis is about community structures in graphs. Random graphs constructed using the so-called configuration model are often studied, but they do not match real-world graph structures well. A configuration model is a random graph where the degree sequence is known beforehand. All vertices have outgoing half-edges, which are paired using a random permutation on a suitable vector that has one entry for every open half-edge. This method of pairing is called a configuration. A more formal definition of this algorithm is given later.

Since configuration models are studied widely, many results exist about these kinds of graph structures. In order to use these results, we study an extension of the configuration model, called the hierarchical configuration model. On the large scale these random graphs are a configuration model, but they are constructed using small communities. The hierarchical configuration model replaces all vertices by small communities and then connects these communities in the same way as the configuration model. The advantage of such random graphs is that they contain more short cycles than configuration model random graphs.

The general idea behind introducing these community structures is that it increases the number of short cycles and triangles. A triangle is a set of three vertices which are all adjacent to each other. Configuration model random graphs do not have many short cycles or community structures. Real-life problems often depend on how people or networks are connected [8]. The configuration model does not include structures like family relations or important vertices. For example, a certain vertex may play a more important or different role than other vertices inside a neighborhood. Graph structures are important in studying mathematical models. A realistic example of such a mathematical model is the spreading of a virus through a network, using the Reed-Frost model with a connectivity matrix. A connectivity matrix can be interpreted as the blueprint of a graph, and hence introducing community structures might improve the accuracy of the model. We introduce community structures which are found in real-life networks such as Facebook networks or routing networks. The Facebook network is studied widely, for example by Ugander et al. [9]. Facebook networks are locally highly connected and consist of many triangles and short cycles. To illustrate this, the mean distance between two randomly chosen users was just 4.7, as calculated in May 2011. This holds for the entire social graph; when restricted to the United States, the mean distance between two randomly chosen people was only 4.3 [9]. Local routing networks are traditionally introduced as star networks. A classic model consists of a central hub and adjacent components. The main disadvantage of such networks is that the central hub is a single point of failure [7].

The main research question is how introducing community structures changes model specifics. We compare both models by looking at graph properties such as connected components and distances. We derive new results, namely sufficient conditions for the existence of a giant component after site percolation for specific community structures, and we show that introducing community structures does not necessarily increase the critical percolation threshold or mean distance. Site percolation and bond percolation can be seen as the process of deleting vertices or edges, respectively, with a given probability. Furthermore, we analyze the empirical distance distributions obtained after stochastic simulation and present a statistical analysis.

1.1 Report Structure

This report is structured as follows. First the mathematical objects are formally introduced and we explain which properties are to be studied and why. It is important to precisely describe how the hierarchical configuration model extends a configuration model random graph and under what conditions derived results are true. This is done in Section 2.

In Section 3, we study several community structures using an analytical approach. Since these properties are suitable for a stochastic simulation, the next main section, Section 4, is dedicated to explaining this sort of simulation. Then these simulation results are presented, analyzed and compared to the prior knowledge in Section 5.

2 Model Description

Here the mathematical objects which will be studied are introduced. For ease of reference, a list of definitions is included in the appendix.

A configuration model or a hierarchical configuration model is a random graph model. We start with the definition of an arbitrary graph.

Definition 1. A graph $G = (V, E)$ is an ordered pair where $V$ and $E \subset V^2$ represent the set of vertices and the set of edges, respectively. If $(w, v) \in E$, then the shorthand notation for this is $w \sim v$.

Thus a graph consists of a set of vertices and a set that describes which vertices are connected.

Definition 2. The neighborhood $N_G(v) = \{u \in V(G) \mid u \sim v\} \subset V$ of a vertex $v$ in a graph $G$ is the set of all vertices which are adjacent to $v$.

A simple graph is a special kind of graph:

Definition 3. A graph $G = (V, E)$ is called simple if and only if $\sim$ is a symmetric and anti-reflexive relation. Furthermore, every element in $E$ is unique.

So a simple graph is an unweighted, undirected graph where self-loops and multi-edges are not allowed. We now introduce a special kind of graph structure, the configuration model.

2.1 Configuration model

The definition of a configuration model is straightforward [4]. The configuration model is an algorithm which produces a random graph $G$ with $N$ vertices where every vertex $v$ has $k_v$ outgoing half-edges. These $(k_i)$ are called the degree sequence and are assumed to be known beforehand. The first step in constructing the graph is defining $N$.

The next step is adding $k_i$ half-edges to every vertex $i$ in the network. Completing the graph is done by uniformly matching the half-edges. This can be done either by using a random permutation on a suitable vector which has one entry for every half-edge, or by selecting random half-edges iteratively until all half-edges are connected. It follows that $\sum_i k_i$ must be even. Note that this algorithm is able to produce self-loops and multi-edges. When $\mathbb{E}[D^2] < \infty$, these self-loops and multi-edges become rare [3].
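The pairing step can be illustrated with a minimal Python sketch. It assumes the permutation variant: the half-edge vector is shuffled and consecutive entries are paired. The function name `configuration_model` and the edge-list output format are illustrative choices, not part of the thesis.

```python
import random

def configuration_model(degrees, rng=None):
    """Pair half-edges uniformly at random and return a list of edges (u, v).

    Self-loops and multi-edges may occur, as noted in the text.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    rng = rng or random.Random()
    # One entry per half-edge, labelled by the vertex it is attached to.
    half_edges = [v for v, k in enumerate(degrees) for _ in range(k)]
    # A uniform random permutation of the half-edge vector ...
    rng.shuffle(half_edges)
    # ... paired off in consecutive positions yields a uniform matching.
    return [(half_edges[i], half_edges[i + 1])
            for i in range(0, len(half_edges), 2)]

edges = configuration_model([3, 3, 2, 1, 1], rng=random.Random(1))
```

Each vertex appears in the returned edge list exactly as often as its prescribed degree, with a self-loop counting twice.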

2.2 Hierarchical Configuration Model

The hierarchical configuration model is an extension of the configuration model, as introduced by Van der Hofstad et al. in their article [2]. On the large scale, the idea is that the hierarchical configuration model behaves like a configuration model. However, when studying the graph on a local scale, a community structure can be seen. To achieve this, every vertex in a graph is replaced by a small subgraph, a community. There are no restrictions on these communities, except that they must satisfy simpleness and connectedness.

Define a random graph $G$ with $n$ communities. Let $H$ be a community. Then $H$ is a subgraph of $G$ with its own specifics. Let $H$ be represented as $H = (F, (d_v^{(b)})_{v \in V_F})$, where $F = (V_F, E_F)$ satisfies simpleness and connectedness. Define $d_v = d_v^{(b)} + d_v^{(c)}$ as the total degree of a vertex. Here $d_v^{(b)}$ represents the number of edges to other communities and $d_v^{(c)}$ the degree of a vertex inside the community.

Then $d_H = \sum_{v \in V_F} d_v^{(b)}$ is the total degree of a community $H$. Hence $d_H$ describes the degree of a community and $d_v^{(b)}$ describes the inter-community degree of a single vertex. Then $G$ is a configuration model with degrees $d_H$ and $n$ vertices. In this view, all communities are suppressed to a single vertex.

2.3 Community structure

Communities may appear in several forms. We say that two communities are of the same type if and only if the (sub)graphs representing these communities are isomorphic and the outside degrees coincide. Hence for communities $H_1$ and $H_2$ in a graph $G$ it must hold that

\[ \exists f : V(H_1) \to V(H_2) : \quad v \in N_G(u) \iff f(v) \in N_G(f(u)), \]

and

\[ \forall v \in V(H_1) : \quad d_v^{(b)} = d_{f(v)}^{(b)}. \]

This implies that a bijection $f$ exists such that two nodes are adjacent if and only if their images are adjacent. Then define $n_H^{(n)}$ as the number of type $H$ communities in the graph $G$ with $n$ communities. Hence $n_H^{(n)}/n$ describes the fraction of type $H$ communities in the graph. Let $(d_{H_i})$ and $(s_i)$ be the corresponding community degree and community size sequences. Then for a randomly chosen community, define $D_n$ and $S_n$ as the degree and size of that community, respectively.

2.4 Model Assumptions

When deriving important results in the paper by Van der Hofstad et al. [2], the following technical assumptions for the community structures in the hierarchical configuration model were required:

Condition 1 (Community structure) As $n \to \infty$,

\[ \mathbb{P}(H_n) = n_H^{(n)}/n \to \mathbb{P}(H), \]

and

\[ \mathbb{E}[S_n] \to \mathbb{E}[S] < \infty. \]

This says that the probability of obtaining a community of type $H$ follows a probability distribution in the limit, and that we expect these communities to be of finite size. Furthermore:

Condition 2 (Community connectedness) As $n \to \infty$,

\[ \mathbb{E}[D_n] \to \mathbb{E}[D] < \infty, \]

where

\[ \mathbb{P}(D = 2) < 1. \]

This puts a restriction on the outgoing degree of a community. Condition 1 already implies convergence in distribution of $D_n$ and $S_n$, since the probability of obtaining a type $H$ community exists for every type. We use results derived in the paper by Van der Hofstad et al. [2]. These results, for example the existence of a giant component in the hierarchical configuration model, rely on these assumptions. We need convergence of $S_n$ and $D_n$ in order to study infinite graphs. We say that a connected component is a giant component if the fraction of vertices in the connected component does not vanish as $n \to \infty$. Thus the size of a giant connected component tends to $\infty$ as $n \to \infty$.

For configuration model random graphs, a similar set of assumptions is required. For a graph G with N vertices and degree sequence (ki) we require convergence in degree distribution and mean degree. Note that these conditions coincide with above assumptions when communities are seen as a single vertex.

Condition 1 (Degree distribution) As $N \to \infty$,

\[ \frac{\#\{k_i \mid k_i = j\}}{N} \to p_j \quad \forall j \ge 0, \]

and

Condition 2 (Mean degree) As $N \to \infty$,

\[ \sum_j j \cdot p_j \to \lambda > 0. \]

3 Site Percolation

Percolation theory studies the connected components or clusters in a random graph. We use the following definition for site percolation:

Definition 4. Let $A$ be the adjacency matrix of a graph $G$. Then with probability $1 - \pi$ the $i$-th row and column of $A$ are replaced by zero vectors. This process is carried out independently for all $i \in \{1, .., N\}$, where $N$ is the size of $G$.

This means that with probability $1 - \pi$ a vertex of the graph becomes isolated and loses all outgoing connections. An equivalent definition is to remove the corresponding row and column from the adjacency matrix: since the vertex is isolated, it is not part of any connected component. However, we then have to account for the total number of deleted vertices when calculating the size of the remaining connected components.
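Definition 4 translates almost directly into code. The following is a minimal sketch, assuming the graph is stored as a dense list-of-lists adjacency matrix; the helper names `site_percolate` and `component_sizes` are hypothetical. Removed vertices are tracked in a separate `kept` list so that they are not counted as components of size 1.

```python
import random

def site_percolate(adj, pi, rng=None):
    """Independently for each vertex, with probability 1 - pi, replace its
    row and column in the adjacency matrix by zeros (Definition 4)."""
    rng = rng or random.Random()
    n = len(adj)
    kept = [rng.random() < pi for _ in range(n)]
    new_adj = [[adj[i][j] if kept[i] and kept[j] else 0 for j in range(n)]
               for i in range(n)]
    return new_adj, kept

def component_sizes(adj, kept):
    """Sizes of the connected components among retained vertices only."""
    n = len(adj)
    seen = [not k for k in kept]  # removed vertices are never visited
    sizes = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, size = [s], 0
        while stack:  # depth-first search from s
            u = stack.pop()
            size += 1
            for w in range(n):
                if adj[u][w] and not seen[w]:
                    seen[w] = True
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

Dividing the largest entry by the original number of vertices gives the fraction of the graph in the largest remaining component.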

An alternative definition for site percolation is an algorithm that produces the same graph in two steps. This method is described by Janson in his paper about the configuration model [1], and we refer to it as Janson’s Approach.

Definition 5. Every vertex $v$ in a graph $G$ is, with probability $1 - \pi$, replaced by $d_v$ red vertices of degree 1. Denote this intermediate random graph by $G_\pi^*$, where $n_+$ red vertices are present. Let $G_\pi$ be the graph where the red vertices are deleted. To obtain $G_\pi$ from $G_\pi^*$, $n_+$ uniformly chosen vertices of degree 1 need to be deleted from $G_\pi^*$.

In the case N → ∞, it does not matter which vertices of degree 1 are deleted, the n+ vertices may be chosen uniformly amongst all vertices with degree 1. This is true since vertices of degree 1 cannot break up connected components.

3.1 Site Percolation On Special Community Structures

There exist special community structures which are affected in a special way by site percolation. We will study three of these structures.

3.1.1 Household Communities

Household structures are structures where remaining vertices always stay connected after percolation.

Definition 6. A community $H$ is a household community when the graph $H$ is complete and furthermore every vertex has inter-community degree 1, i.e., $\forall v \in V(H)$ it holds that $d_v^{(b)} = 1$.

This leads to the unique situation where a household always falls apart into one household community and several communities of size 1.

3.1.2 Star Communities

A star community is a community with one central vertex. This central vertex alone connects the entire community. Furthermore, every vertex inside the community has one half-edge for a connection with other communities, except for the central vertex, which has none. An example of a star community is shown in Figure 1.

Definition 7. A community $H$ of size $k$ is a star community when there exists an isomorphism $f$ such that the adjacency matrix $A$ of the community $f(H)$ satisfies $A_{1,j} = A_{j,1} = 1\ \forall j \neq 1$ and $A_{i,j} = 0$ otherwise. Furthermore, for $V(f(H)) = \{v_j \mid j \in \{1, .., k\}\}$ it holds that $d_{v_j}^{(b)} = 1 - \delta_j^1$.

Figure 1: Star Community of size 6

Depending on whether site percolation removes the central vertex, either the entire community falls apart or the remaining vertices stay connected.

3.1.3 Line Communities

Line communities present themselves as a connected chain. Only the outer vertices have half-edges to other communities (two each), as can be seen in Figure 2.

Definition 8. A community $H$ of size $k$ is a line community when there exists an isomorphism $f$ such that the adjacency matrix $A$ of the community $f(H)$ satisfies $A_{j,j+1} = A_{j+1,j} = 1\ \forall j \in \{1, .., k-1\}$ and $A_{i,j} = 0$ otherwise. Furthermore, for $V(f(H)) = \{v_j \mid j \in \{1, .., k\}\}$ it holds that $d_{v_j}^{(b)} = 2$ for $j = 1, k$ and $0$ otherwise.

Figure 2: Line community of size 4

A line community falls apart into line communities which may not be connected to other communities or consist of just a single vertex. The outer vertices have a special place in this community structure.
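The three community types of Definitions 6 to 8 can be written down as small adjacency matrices together with their inter-community degrees. The sketch below is illustrative only: the function names are hypothetical, vertices are indexed from 0 (so the central vertex of the star is vertex 0), and each builder returns the pair (adjacency matrix, list of inter-community degrees).

```python
def household(k):
    """Complete graph on k vertices; every vertex has inter-community degree 1."""
    adj = [[1 if i != j else 0 for j in range(k)] for i in range(k)]
    return adj, [1] * k

def star(k):
    """Vertex 0 is the hub: adjacent to all other vertices and with
    inter-community degree 0; every other vertex has degree 1 outwards."""
    adj = [[1 if (i == 0) != (j == 0) else 0 for j in range(k)]
           for i in range(k)]
    return adj, [0] + [1] * (k - 1)

def line(k):
    """Path 0-1-...-(k-1); only the two end vertices have
    inter-community degree 2, all inner vertices have 0."""
    adj = [[1 if abs(i - j) == 1 else 0 for j in range(k)] for i in range(k)]
    return adj, [2 if i in (0, k - 1) else 0 for i in range(k)]

def community_degree(inter_degrees):
    """d_H: the sum of the inter-community degrees of the community's vertices."""
    return sum(inter_degrees)
```

For example, `star(6)` corresponds to the community in Figure 1 and has community degree 5, while `line(4)` corresponds to Figure 2 and has community degree 4.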

3.2 Site Percolation on the Configuration Model

Janson has already studied site percolation in his article on the configuration model [1]. We will describe the most important results here. Let $G$ be a graph with $N$ vertices and given degree sequence $(k_i)$. Assume this graph satisfies the assumptions made in section 2.4. These conditions state that the infinite random graph has an underlying vertex degree distribution $D$ with a finite mean $\lambda$. First we need the following definition:

Definition 9. The size biased distribution of $D$ is the random variable $D_1$ with probability mass function $\mathbb{P}(D_1 = k) = (k + 1)\mathbb{P}(D = k + 1)/\lambda$.

It follows that E[D1] = E[D(D − 1)]/λ. When picking a half-edge at random, the remaining number of half-edges at its endpoint approaches the distribution D1 asymptotically. When a giant component exists after percolation, this component is of size proportional to n and hence we expect that E[D1] > 1. It follows that this is an important condition.
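The identity for $\mathbb{E}[D_1]$ can be checked in one line. The computation below is a sketch that takes the size-biased mass function to be $\mathbb{P}(D_1 = k) = (k+1)\mathbb{P}(D = k+1)/\lambda$, which is the form under which the stated identity holds, and substitutes $j = k + 1$:

```latex
\begin{align*}
\mathbb{E}[D_1]
  &= \sum_{k \ge 0} k \, \frac{(k+1)\,\mathbb{P}(D = k+1)}{\lambda}
   = \frac{1}{\lambda} \sum_{j \ge 1} (j-1)\, j \,\mathbb{P}(D = j)
   = \frac{\mathbb{E}[D(D-1)]}{\lambda}.
\end{align*}
```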

Definition 10. The expectation of the size biased distribution of D is denoted by νD = E[D1].

The critical value $\pi_c$ of the percolation parameter $\pi$ is the infimum of the values of $\pi$ for which, as $n \to \infty$, a giant component exists with high probability. Such a critical value exists and can be found analytically.

Theorem 1. Let $G$ be a random graph satisfying the above conditions. Let $G_\pi$ be the random graph after site percolation with probability $\pi$. Then a giant component exists in $G_\pi$ if and only if $\sum_j j(j-1)\pi p_j > \lambda$. Then $\pi_c = \frac{1}{\nu_D}$ and furthermore the size of the giant component satisfies

\[ v(c_{\max})/N \to \sum_{j \ge 1} \pi p_j (1 - \xi^j), \]

where $\xi \in (0, 1)$ is the unique solution of $\sum_j j p_j \pi (1 - \xi^{j-1}) = \lambda(1 - \xi)$.
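Theorem 1 can be evaluated numerically for a given degree distribution. The sketch below assumes the distribution is passed as a dictionary mapping a degree $j$ to $p_j$, and finds $\xi$ by bisection; the function names, the bisection bracket and the tolerance are illustrative choices. Very close to the critical point the root lies near 1 and the fixed bracket may be too coarse.

```python
def critical_pi(p):
    """pi_c = lambda / E[D(D-1)] = 1 / nu_D for a degree distribution p."""
    lam = sum(j * pj for j, pj in p.items())
    second = sum(j * (j - 1) * pj for j, pj in p.items())
    return lam / second

def giant_fraction(p, pi, tol=1e-12):
    """Limit of v(c_max)/N from Theorem 1; returns 0.0 when no giant exists."""
    lam = sum(j * pj for j, pj in p.items())
    if pi * sum(j * (j - 1) * pj for j, pj in p.items()) <= lam:
        return 0.0

    def h(xi):
        # h(xi) = sum_j j p_j pi (1 - xi^(j-1)) - lambda (1 - xi); root = xi
        s = sum(j * pj * (1 - xi ** (j - 1)) for j, pj in p.items() if j >= 1)
        return pi * s - lam * (1 - xi)

    lo, hi = 0.0, 1.0 - 1e-9
    if h(lo) >= 0:
        xi = 0.0  # boundary case, e.g. pi = 1 and no vertices of degree 1
    else:
        # h is negative left of the root and positive between the root and 1
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if h(mid) < 0:
                lo = mid
            else:
                hi = mid
        xi = (lo + hi) / 2
    return sum(pi * pj * (1 - xi ** j) for j, pj in p.items() if j >= 1)
```

For a 3-regular degree distribution, `critical_pi({3: 1.0})` gives 0.5, and for $\pi = 0.8$ the equation for $\xi$ reduces to $\pi(1 + \xi) = 1$, so $\xi = 0.25$.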

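Theorem 2. Let $G$ be a random graph satisfying the above conditions. Let $G_\pi^*$ be the intermediate random graph with red vertices after site percolation with probability $\pi$. Then for $v \in G_\pi^*$:

\[
\mathbb{P}(d_v = j) =
\begin{cases}
\zeta^{-1} \pi p_j & \text{if } j \neq 1, \\
\zeta^{-1} \big( \pi p_1 + \sum_{i \ge 1} i (1 - \pi) p_i \big) & \text{if } j = 1,
\end{cases}
\]

where $\zeta = \pi + (1 - \pi)\lambda$.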

These statements and proofs can be found in the article by Janson [1] and are the inspiration for the proofs in this article.

3.3 Site Percolation on the Hierarchical Configuration Model

Using Definition 4, we try to derive important results regarding the degree distribution after percolation. To do this, we first introduce an equivalent approach to site percolation, similar to Janson's Approach as defined in section 3.2.

An equivalent approach to site percolation on the hierarchical configuration model is to replace every vertex $v$ with probability $1 - \pi$ by $d_v^{(b)}$ new vertices of degree 1. Here $d_v^{(b)}$ is the inter-community degree of the vertex $v$. We also call these new vertices red vertices. Then after percolation of the entire graph, $n_+$ additional communities of size 1 (with degree $d_H = 1$) are present, so afterwards $n_+$ communities of degree 1 need to be deleted. Note that this is equivalent to the alternative definition of site percolation where the original vertex would be deleted.

Before removing the $n_+$ vertices, an intermediate graph $G_\pi^*$ is obtained.

3.4 Existence of a Giant Component after Site Percolation

When applying percolation to a random graph, it is likely that the largest connected component during percolation falls apart. This behavior is interesting when studying community structures.

For the proof, we use the adjusted version of Janson's Approach with red vertices. For the hierarchical configuration model, we distinguish between edges in the same community and edges between different communities. Red vertices now appear only for deleted edges between vertices in different communities. This is illustrated in Figure 3.

Figure 3: Site Percolation on a HCM

We see 3 household communities in one connected component. The vertices with a "*" next to them are to be deleted. We see 4 red vertices appear, which are all communities with $d_H = 1$.

Theorem 3. Let $G$ be a hierarchical configuration model as introduced above with household communities and $d_v^{(b)} = 1\ \forall v \in G$. Then if and only if

\[ \nu_{D^*} = \frac{\pi^2 (\mathbb{E}[S^2] - \mathbb{E}[S])}{\mathbb{E}[S] - \frac{\pi}{1-\pi} \mathbb{E}[(S^2 - S)(1-\pi)^S]} > 1, \]

a giant component exists in $G_\pi$ with high probability.

Proof. Let $G$ be a graph with $n$ household communities. Let $G_\pi^*$ be the intermediate graph after site percolation with parameter $\pi$ following the red vertices method. We show that this graph is again a hierarchical configuration model. Let $H^*$ be a community in the intermediate graph $G_\pi^*$. Let $H$ be the community from which $H^*$ originates. Then as $n \to \infty$

\[
\begin{aligned}
\#\{H^* \mid V(H^*) = k\}/n &= \sum_{l \ge k} \#\{H^* \mid V(H^*) = k, V(H) = l\}/n \quad \forall k > 1, \\
&= \sum_{l \ge k} \frac{\#\{i \mid s_i = l\}}{n} \binom{l}{k} \pi^k (1-\pi)^{l-k}, \\
&\to \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}, \\
&=: f(k).
\end{aligned}
\]

This represents the probability that a size $k$ community arises, obtained by summing over all probabilities that a size $l$ community transforms into a size $k$ community. However, after percolation, more communities are present. Let $H_+$ denote the number of communities in $G_\pi^*$ and $H_i^+$ the random variable which describes how many communities arise from one random community after site percolation. Then by the Law of Large Numbers

\[ H_+/n = \frac{1}{n} \sum_{i=1}^{n} H_i^+ \to \mathbb{E}[H_i^+]. \]

Thus we have that

\[ \mathbb{P}(V(H^*) = k) = \frac{\#\{H^* \mid V(H^*) = k\}/n}{H_+/n} \to \frac{f(k)}{\mathbb{E}[H_i^+]}. \]

Lastly, $\mathbb{P}(V(H^*) = 1) = 1 - \sum_{k>1} \mathbb{P}(V(H^*) = k)$. Denote this probability distribution by $S^* \overset{\text{def}}{=} V(H^*)$.
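The function $f(k)$ is a binomial thinning of the household size distribution and is easy to tabulate. The following sketch assumes the size distribution is given as a dictionary mapping a size $l$ to $\mathbb{P}(S = l)$ with finite support; the function name is hypothetical. It returns the unnormalised values $f(k)$ for $k > 1$, so dividing by $\mathbb{E}[H_i^+]$ is left to the caller.

```python
from math import comb

def household_f(size_pmf, pi):
    """f(k) = sum_{l >= k} P(S = l) * C(l, k) * pi^k * (1 - pi)^(l - k), k > 1.

    size_pmf maps a household size l to P(S = l).
    """
    max_size = max(size_pmf)
    return {
        k: sum(p_l * comb(l, k) * pi ** k * (1 - pi) ** (l - k)
               for l, p_l in size_pmf.items() if l >= k)
        for k in range(2, max_size + 1)
    }

# Households of fixed size 3 with retention probability pi = 0.5.
f = household_f({3: 1.0}, 0.5)
```

For households of fixed size 3 and $\pi = 0.5$, this gives $f(2) = 3 \cdot 0.25 \cdot 0.5 = 0.375$ and $f(3) = 0.125$.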

Let $D^*$ be the community degree distribution after percolation. Then $D^* \overset{d}{=} S^*$, since every percolated vertex becomes a red vertex. We conclude that $G_\pi^*$ is a hierarchical configuration model with community size distribution $S^*$ and community degree $D^*$. Hence if and only if

\[ \nu_{D^*} = \frac{\mathbb{E}[D^*(D^* - 1)]}{\mathbb{E}[D^*]} > 1, \]

w.h.p. a giant component in $G_\pi^*$ exists. If a giant component does not exist in $G_\pi^*$, then it does not exist in $G_\pi$ after removing the red vertices. Removing the red vertices does not influence the value of $\pi_c$, only the resulting size of the largest connected component. It follows that

\[
\frac{\mathbb{E}[D^*(D^* - 1)]}{\mathbb{E}[D^*]}
= \frac{\frac{1}{\mathbb{E}[H_i^+]} \sum_{k>1} (k^2 - k) f(k)}
{\left(1 - \frac{1}{\mathbb{E}[H_i^+]} \sum_{k>1} f(k)\right) + \frac{1}{\mathbb{E}[H_i^+]} \sum_{k>1} k \cdot f(k)}.
\]

Thus

\[
\nu_{D^*} = \frac{\sum_{k>1} (k^2 - k) f(k)}
{\left(\mathbb{E}[H_i^+] - \sum_{k>1} f(k)\right) + \sum_{k>1} k \cdot f(k)}.
\]

Following the line of reasoning of Janson, removing any red vertices of the connected components in $G^*$ will not break up any components. Red vertices appear when a point becomes percolated. Let $n_+$ be the number of red vertices. Then clearly $n_+ \sim \mathrm{Bin}(N, 1 - \pi)$. To obtain $G_\pi$ from the intermediate graph $G_\pi^*$, the red vertices have to be removed or disconnected.

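For the case of household communities of fixed size $l$, it follows that $H_{i,l}^+$ follows the following discrete distribution:

\[
\mathbb{P}(H_{i,l}^+ = k) =
\begin{cases}
\binom{l}{k-1} \pi^{l-k+1} (1-\pi)^{k-1} & 1 \le k \le l - 1, \\
(1-\pi)^{l-1} & k = l.
\end{cases}
\]

Then the general random variable $H_i^+$ is the random variable $H_{i,l}^+$ with probability $\mathbb{P}(S = l)$, $l \in \mathbb{N}$. Hence

\[
\begin{aligned}
\mathbb{E}[H_i^+] &= \sum_{l>0} \mathbb{P}(S = l)\, \mathbb{E}[H_{i,l}^+], \\
&= \sum_{l>0} \mathbb{P}(S = l) \Big( l(1-\pi)^{l-1} + \sum_{1 \le k \le l-1} k \cdot \binom{l}{k-1} \pi^{l-k+1} (1-\pi)^{k-1} \Big), \\
&= \mathbb{E}[S(1-\pi)^{S-1}] + \sum_{l>0} \mathbb{P}(S = l) \big( 1 - (1-\pi)^l + (1-\pi) l - l(1-\pi)^l - l^2 \pi (1-\pi)^{l-1} \big), \\
&= \mathbb{E}[S(1-\pi)^{S-1}] + 1 - \mathbb{E}[(S+1)(1-\pi)^S] + (1-\pi)\mathbb{E}[S] - \pi \mathbb{E}[S^2 (1-\pi)^{S-1}].
\end{aligned}
\]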

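Thus we find

\[
\nu_{D^*} = \frac{\sum_{k>1} (k^2 - k) \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}}
{\mathbb{E}[H_i^+] + \sum_{k>1} (k - 1) \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}},
\]

where the numerator can be written as

\[
\begin{aligned}
\sum_{k>1} (k^2 - k) \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}
&= \sum_{l=0}^{\infty} \mathbb{P}(S = l) \sum_{k=0}^{l} (k^2 - k) \binom{l}{k} \pi^k (1-\pi)^{l-k}, \\
&= \sum_{l=0}^{\infty} \mathbb{P}(S = l)\, l \pi^2 (l - 1),
\end{aligned}
\]

which simplifies to

\[ \sum_{l=0}^{\infty} \mathbb{P}(S = l)\, l \pi^2 (l - 1) = \pi^2 (\mathbb{E}[S^2] - \mathbb{E}[S]). \tag{1} \]

Note that $\mathbb{P}(S = 0) = 0$. The second term in the denominator can be written as

\[
\begin{aligned}
\sum_{k>1} (k - 1) \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}
&= \sum_{l>0} \mathbb{P}(S = l) \sum_{k=1}^{l} (k - 1) \binom{l}{k} \pi^k (1-\pi)^{l-k}, \\
&= \sum_{l>0} \mathbb{P}(S = l) \big( l\pi + (1-\pi)^l - 1 \big), \\
&= \pi \mathbb{E}[S] + \mathbb{E}[(1-\pi)^S] - 1.
\end{aligned}
\]

Then the denominator simplifies to

\[
\mathbb{E}[H_i^+] + \sum_{k>1} (k - 1) \sum_{l \ge k} \mathbb{P}(S = l) \binom{l}{k} \pi^k (1-\pi)^{l-k}
= \mathbb{E}[S] - \frac{\pi}{1-\pi} \mathbb{E}[(S^2 - S)(1-\pi)^S]. \tag{2}
\]

The result follows from combining equations (1) and (2) and the definition of $\nu_{D^*}$.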

We can do something similar for star communities. First we find the new community size distribution and then relate the community degree distribution to this distribution. We arrive at the following result.

Theorem 4. Let $G$ be a hierarchical configuration model with star communities as described in section 3.1.2. Assume $G$ satisfies the model assumptions from section 2.4. Then if and only if

\[ \nu_{D^*} = \frac{\pi^3\, \mathbb{E}[S^2 - 3S + 2]}{\mathbb{E}[S] - \pi \mathbb{P}(S = 1) - 1 + \frac{\pi}{1-\pi} \mathbb{E}[(1-\pi)^S]} > 1, \]

a giant component exists in $G_\pi$ with high probability.

Proof. We distinguish between the two cases where the central vertex is percolated or not. The transform probability $p_k$ is the probability that a connected component of a community after percolation has size $k$, where the original community has size $l$. If the central vertex is deleted, then the transform probability of a size $l$ star is $p_k = 1$ if $k = 1$ and $0$ otherwise. If the central vertex is not deleted, then $p_k = \binom{l-1}{k-1} \pi^{k-1} (1-\pi)^{l-k}$. In that case the central vertex stays together with $k - 1$ attached vertices, and these vertices form a star community of size $k \ge 1$. Hence

\[
\begin{aligned}
\#\{H^* \mid V(H^*) = k\}/n &= \sum_{l \ge k} \#\{H^* \mid V(H^*) = k, V(H) = l\}/n \quad \forall k > 1, \\
&= \sum_{l \ge k} \frac{\#\{i \mid s_i = l\}}{n} \binom{l-1}{k-1} \pi^k (1-\pi)^{l-k}, \\
&\to \sum_{l \ge k} \mathbb{P}(S = l) \binom{l-1}{k-1} \pi^k (1-\pi)^{l-k}, \\
&=: g(k).
\end{aligned}
\]

Let $H_i^+$ be the random variable describing the number of new communities that arise from a random community after site percolation. Then it follows that

\[
\begin{aligned}
\mathbb{E}[H_i^+] &= \sum_{l>0} \mathbb{P}(S = l)\, \mathbb{E}[H_{i,l}^+]
= \sum_{l>0} \mathbb{P}(S = l) \Big( l(1-\pi) + \sum_{1 \le k \le l} k \cdot \binom{l-1}{k-1} \pi^{l-k+1} (1-\pi)^{k-1} \Big), \\
&= (1-\pi)\mathbb{E}[S] + \sum_{l>0} \mathbb{P}(S = l) \sum_{1 \le k \le l} k \cdot \binom{l-1}{k-1} \pi^{l-k+1} (1-\pi)^{k-1}, \\
&= (1-\pi)\mathbb{E}[S] + \sum_{l>0} \mathbb{P}(S = l) \big( -l\pi^2 + l\pi + \pi^2 \big), \\
&= (1 - \pi^2)\mathbb{E}[S] + \pi^2.
\end{aligned}
\]

We define the following distribution function:

\[ \mathbb{P}(S^* = k) = \frac{g(k)}{\mathbb{E}[H_i^+]} \quad \forall k > 1, \]

and $\mathbb{P}(S^* = 1) = 1 - \sum_{k>1} \mathbb{P}(S^* = k)$.

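The relation between $S^*$ and $D^*$ is not trivial. All communities with size larger than one are again star communities, therefore

\[ \mathbb{P}(D^* = k) = \mathbb{P}(S^* = k + 1) \quad \forall k > 1. \]

Only star communities with the central vertex deleted can lead to a new community of size 1 with $d_H = 0$. Furthermore, a star of size 1 already satisfies $d_H = 0$. Thus

\[
\begin{aligned}
\mathbb{P}(D^* = 0) &= \frac{1}{\mathbb{E}[H_i^+]} \Big( \mathbb{P}(S = 1) + \sum_{l>1} (1-\pi)\mathbb{P}(S = l) \Big), \\
&= \frac{1}{\mathbb{E}[H_i^+]} \big( \mathbb{P}(S = 1) + (1-\pi)(1 - \mathbb{P}(S = 1)) \big), \\
&= \frac{1}{\mathbb{E}[H_i^+]} \big( \pi \mathbb{P}(S = 1) + (1-\pi) \big).
\end{aligned}
\]

Then define $\mathbb{P}(D^* = 1) = 1 - \sum_{l \neq 1} \mathbb{P}(D^* = l)$.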

We find

\[
\begin{aligned}
\mathbb{E}[D^*(D^* - 1)] &= \frac{1}{\mathbb{E}[H_i^+]} \sum_{k \ge 0} (k^2 - k)\, g(k + 1), \\
&= \frac{1}{\mathbb{E}[H_i^+]} \sum_{l>0} \mathbb{P}(S = l) \sum_{k=0}^{l-1} (k^2 - k) \binom{l-1}{k} \pi^{k+1} (1-\pi)^{l-k-1}, \\
&= \frac{\pi}{\mathbb{E}[H_i^+](1-\pi)} \sum_{l>0} \mathbb{P}(S = l)\, \pi^2 (1-\pi)(l^2 - 3l + 2), \\
&= \frac{\pi^3}{\mathbb{E}[H_i^+]}\, \mathbb{E}[S^2 - 3S + 2],
\end{aligned}
\]

and

\[
\begin{aligned}
\mathbb{E}[D^*] &= 1 - \sum_{k \neq 1} \mathbb{P}(D^* = k) + \sum_{k>1} k\, \mathbb{P}(D^* = k), \\
&= 1 - \frac{1}{\mathbb{E}[H_i^+]} \Big( \pi \mathbb{P}(S = 1) + (1-\pi) + \sum_{k>1} g(k+1) \Big) + \frac{1}{\mathbb{E}[H_i^+]} \sum_{k>1} k\, g(k+1), \\
&= 1 - \frac{1}{\mathbb{E}[H_i^+]} \Big( \pi \mathbb{P}(S = 1) + (1-\pi) - \sum_{k>1} \sum_{l \ge k+1} \mathbb{P}(S = l)(k-1) \binom{l-1}{k} \pi^{k+1} (1-\pi)^{l-k-1} \Big), \\
&= 1 - \frac{1}{\mathbb{E}[H_i^+]} \Big( \pi \mathbb{P}(S = 1) + (1-\pi) - \sum_{l>0} \sum_{k=1}^{l-1} \mathbb{P}(S = l)(k-1) \binom{l-1}{k} \pi^{k+1} (1-\pi)^{l-k-1} \Big), \\
&= 1 - \frac{1}{\mathbb{E}[H_i^+]} \Big( \pi \mathbb{P}(S = 1) + (1-\pi) - \frac{\pi}{1-\pi} \sum_{l>0} \mathbb{P}(S = l) \big( -l\pi^2 + (1-\pi)^l + l\pi + \pi^2 - 1 \big) \Big), \\
&= 1 - \frac{1}{\mathbb{E}[H_i^+]} \Big( \pi \mathbb{P}(S = 1) + (1-\pi) - \frac{\pi}{1-\pi} \mathbb{E}\big[ (\pi - \pi^2) S + (1-\pi)^S + \pi^2 - 1 \big] \Big).
\end{aligned}
\]

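Recall that $\nu_{D^*}$ is defined as

\[ \nu_{D^*} = \frac{\mathbb{E}[D^*(D^* - 1)]}{\mathbb{E}[D^*]}. \tag{3} \]

Keeping in mind that we multiply both the numerator and the denominator of $\nu_{D^*}$ by $\mathbb{E}[H_i^+]$, we have that

\[
\begin{aligned}
\mathbb{E}[D^*]\, \mathbb{E}[H_i^+] &= \mathbb{E}[H_i^+] - \pi \mathbb{P}(S = 1) - (1-\pi) + \frac{\pi}{1-\pi} \mathbb{E}\big[ (\pi - \pi^2) S + (1-\pi)^S + \pi^2 - 1 \big], \\
&= \mathbb{E}[H_i^+] - \pi \mathbb{P}(S = 1) - (1-\pi) + \pi^2 \mathbb{E}[S] + \frac{\pi}{1-\pi} \mathbb{E}\big[ (1-\pi)^S + \pi^2 - 1 \big], \\
&= \mathbb{E}[S] + \pi^2 - \pi \mathbb{P}(S = 1) - (1-\pi) + \frac{\pi}{1-\pi} \mathbb{E}\big[ (1-\pi)^S + \pi^2 - 1 \big], \\
&= \mathbb{E}[S] - \pi - \pi \mathbb{P}(S = 1) - (1-\pi) + \frac{\pi}{1-\pi} \mathbb{E}\big[ (1-\pi)^S \big], \\
&= \mathbb{E}[S] - \pi \mathbb{P}(S = 1) - 1 + \frac{\pi}{1-\pi} \mathbb{E}\big[ (1-\pi)^S \big].
\end{aligned}
\]

For $\nu_{D^*}$ as in Equation (3) we then find that

\[
\begin{aligned}
\nu_{D^*} &= \frac{\mathbb{E}[D^*(D^* - 1)]\, \mathbb{E}[H_i^+]}{\mathbb{E}[D^*]\, \mathbb{E}[H_i^+]}, \\
&= \frac{\pi^3\, \mathbb{E}[S^2 - 3S + 2]}{\mathbb{E}[S] - \pi \mathbb{P}(S = 1) - 1 + \frac{\pi}{1-\pi} \mathbb{E}[(1-\pi)^S]}.
\end{aligned}
\]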

Now we can again apply the theorem derived in the paper by Van der Hofstad et al. [2] that a giant component exists w.h.p. if and only if $\nu_{D^*} > 1$, which completes the proof.

We will use these results in Section 5 to compare the theoretical analysis with the empirical results. For other community structures, it is in general difficult to find the expectation of $H_i^+$ or the distributions of $S^*$ and $D^*$. However, we can state a general result using abstract functions for the possible ways a community can percolate. In the above results we found the degree distribution after percolation as a consequence of the new household size distribution. To find the new degree distribution, we define the following (abstract) function that describes how vertices are connected after percolation.

Definition 11. Let g(H, v, k, π) be the probability that the connected component containing v is still connected to k other communities after site percolation with the adjusted red vertices method. Here H is a community type and π is the percolation parameter.

Then we have the following result.

Theorem 5. Let $G$ be a hierarchical configuration model as introduced above with communities and $d_v^{(b)} \le 1\ \forall v \in G$. Then if and only if

\[ \sum_H \sum_{v \in V(H)} \sum_{k=1}^{D_H - 1} d_v^{(b)}\, \mathbb{P}(H)\, g(H, v, k+1, \pi)\, k < \mathbb{E}[D] \]

a giant component exists in $G_\pi$ with high probability.

Proof. The proof of this result is similar to the proof given in the paper by Van der Hofstad et al. [2, pp. 12-14]. We use the adjusted red vertices method and show that the intermediate graph is again a hierarchical configuration model. Note that for $k = 1$ this function becomes very complex, since all vertices $v$ transform with probability $(1 - \pi)$ into $d_v^{(b)}$ communities of degree 1.

Let N (n)(H, k, π) denote the total number of connected components of all percolated versions of a community H having inter-community degree k. Let M (n)(H, v, k, π) be the total number of connected components of all percolated versions of a community H containing v and having inter-community degree k, then

14 Site Percolation on the Hierarchical Configuration Model 3 SITE PERCOLATION

(n) M (H, v, k, π)/n →P P(H)g(H, v, k, π). And (n) X (b) (n) N (H, k, π) = dv M (H, v, k, π)/k. v∈V (H) From here the proof is similar, we have that

(n) P X (b) N (H, k, π)/n → dv P(H)g(H, v, k, π)/k. v∈V (H) And therefore

∗ 1 X X (b) P(D = k) = dv P(H)g(H, v, k, π)/k. [H+] E i H v∈V (H)

Now if νD∗ > 1, a giant component exists w.h.p., thus

1 P P PDH (b) + H v∈V (H) k=1 kdv P(H)g(H, v, k, π)/k E[Hi ] νD∗ = , 1 P P PDH (b) + H v∈V (H) k=1 k(k − 1)dv P(H)g(H, v, k, π)/k E[Hi ] [D] = E . P P PDH −1 (b) H v∈V (H) k=1 dv P(H)g(H, v, k + 1, π)k

And hence if and only if

DH −1 X X X (b) dv P(H)g(H, v, k + 1, π)k < E[D] H v∈V (H) k=1 a giant component exists.


4 Simulation

Using the model dynamics of the configuration model and the hierarchical configuration model, it is possible to write a simulation to analyze the specifics of the percolated graph. We are interested in model specifics such as connectedness or distances and how they are influenced by site percolation.

4.1 Simulation Motivation

The applied type of simulation is a Monte Carlo simulation, a technique that relies on repeated random sampling and applies when the properties of the percolated graph can be expressed as expectations. Because of the complexity or size of the problem, deducing results by theoretical analysis can be quite difficult, and simulation may be a more fruitful approach. Simulations can also be used to verify theoretical results.

4.2 Simulation Description

To analyze the influence of the household types, the hierarchical configuration model must be compared to the corresponding configuration model. First, the graph with household communities is constructed. For an initial graph size n, community objects are constructed. A community object can be seen as a subgraph of the final graph. Its key properties are its size, which may be fixed or random, a connectivity (adjacency) matrix that fits the community type, and its position in the graph.

The simulation performs percolation on household communities of Poisson-distributed or deterministic size. Each node in a community has one outgoing edge to other communities. Furthermore, each community is a complete graph (other community structures are discussed later). When constructing the community, the connectivity matrix A satisfies $A_{ij} = 1 - \delta_{ij}$ for each matrix element. The connectivity matrix of the full graph is then created by placing the community matrices as blocks along the diagonal. Finally, the communities are paired using the following algorithm:

First find the vertices that have open outgoing edges and select one of them uniformly at random. Then choose another vertex in the same way, and set the corresponding positions in the adjacency matrix to 1. Self-loops are not allowed, so they are deleted afterwards; note that this does influence the number of open outgoing edges. Repeat this process until all nodes are paired.

This entire process could also be replaced by generating a vector k of length $\sum_i k_i$, where $(k_i)$ is the degree sequence. The vector k then has $k_i$ positions for the i-th vertex, and a random permutation of k leads to the same result.

Constructing hierarchical configuration models with a different community structure is done in the same way. The simulation takes as input a class structure that describes the community type. Each community type has a unique connectivity matrix.
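The permutation-based variant of the pairing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the thesis: the function name `pair_half_edges` and the edge-list output (instead of an adjacency matrix) are hypothetical choices, and self-loops are simply dropped after pairing, as described in the text.

```python
import random

def pair_half_edges(degrees, rng=None):
    """Pair half-edges uniformly at random (configuration-model style).

    degrees[i] is the number of open half-edges of vertex i.
    Returns a list of edges (u, v); self-loops are removed afterwards.
    """
    rng = rng or random.Random()
    if sum(degrees) % 2 != 0:
        raise ValueError("total degree must be even")
    # Vector with k_i copies of vertex i, one entry per half-edge.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)  # uniform random permutation of the half-edges
    # Consecutive entries of the shuffled vector form an edge.
    pairs = zip(stubs[0::2], stubs[1::2])
    return [(u, v) for u, v in pairs if u != v]
```

For household communities, `degrees` would contain a 1 for every vertex, since each vertex has a single outgoing half-edge.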

Constructing the corresponding configuration model is done in a similar way. Afterwards, a graph of the same size is constructed. The degree sequence from the graph with communities is copied and using the same algorithm, the vertices are paired with each other. The result is a graph with an identical degree distribution but with a totally different internal structure.

Then the graphs are percolated independently. For each node a random number in (0, 1) is generated. If this number exceeds the percolation parameter π, the corresponding row and column are deleted. Afterwards, the sizes of the largest connected components are calculated, as well as the distance distributions. The Monte Carlo procedure varies the percolation parameter and repeats the percolation of the graphs nRuns times.
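The percolation step and the measurement of the largest component can be sketched as below. The adjacency-set representation and the function names are assumptions made for this illustration (the simulation described above works on an adjacency matrix), and each vertex is retained with probability π.

```python
import random
from collections import deque

def site_percolate(adj, pi, rng=None):
    """Keep each vertex independently with probability pi.

    adj maps each vertex to the set of its neighbours (hypothetical
    representation, chosen here instead of a matrix).
    """
    rng = rng or random.Random()
    kept = {v for v in adj if rng.random() < pi}
    return {v: adj[v] & kept for v in kept}

def largest_component_size(adj):
    """Size of the largest connected component, found by BFS."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def giant_fraction(adj, pi, n_runs, rng=None):
    """Monte Carlo estimate of the largest-component fraction at pi."""
    rng = rng or random.Random()
    total = sum(largest_component_size(site_percolate(adj, pi, rng))
                for _ in range(n_runs))
    return total / (n_runs * len(adj))
```

Calling `giant_fraction` on a grid of π values would give one curve of the kind plotted in the next section.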


5 Simulation Results

The simulation can be applied to varying parameters and settings for community structures. Some results for the communities defined earlier are presented in this section. Community sizes are varied and the hierarchical configuration model is compared to the configuration model with identical degree sequence.

5.1 Size of the Giant Component

The size of the giant component is presented as a fraction of the entire graph. In Section 3 we analyzed site percolation on household communities. For household communities of fixed size l, we can numerically find the corresponding value of $\pi_c$ using Theorem 3.

l      πc
3      0.648
5      0.459
7      0.381
10     0.312
15     0.256

Table 1: Critical percolation thresholds for deterministic sized household communities

Let l = 10. For the equivalent configuration model we then have, using Proposition 5 in the paper of Van der Hofstad et al. [2], that $\tilde{p}_j = 1$ if j = 10 and 0 otherwise. It follows from Theorem 3.5 in the paper of Janson that a giant component exists w.h.p. if $\sum_j j(j-1)\pi \cdot p_j > \sum_j j \cdot p_j$. Hence $\pi_c = 1/9$, or in general $\pi_c = 1/(l-1)$, for configuration models corresponding to household communities of a fixed size l. We see that the approximate values of $\pi_c$ in Table 1 match the simulation results in Figure 4.
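Rearranging the condition above gives a giant component when π exceeds E[D] / E[D(D − 1)]. The small helper below evaluates this ratio for an arbitrary degree distribution; the function name and the dictionary input are assumptions for illustration only.

```python
def cm_site_threshold(pmf):
    """Critical value E[D] / E[D(D-1)] for a degree pmf given as {j: p_j}."""
    mean = sum(j * p for j, p in pmf.items())
    second = sum(j * (j - 1) * p for j, p in pmf.items())
    if second == 0:
        raise ValueError("E[D(D-1)] is zero, so no giant component can exist")
    return mean / second

# All vertices have degree 10, as for households of fixed size l = 10.
print(cm_site_threshold({10: 1.0}))
```

For the degree-10 case this evaluates to 10/90, that is 1/9, in line with the value stated in the text.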

Figure 4: Site Percolation on HCM graph of 10,000 vertices with fixed sized communities


Adopting Poisson sized household communities works in a similar way. First, the graph according to the hierarchical configuration model is generated. Each household size is randomly generated as a Poisson(9) + 1 random variable, so that the expectation remains 10 while households of size 0 cannot exist. These blocks together form the HCM graph, after which the outgoing half-edges are paired. To make sure the total degree is even, a while-loop replaces a uniformly chosen community until the total degree is even.
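The size generation with the parity correction can be sketched as follows. The helper names are hypothetical, and the Poisson sampler uses Knuth's multiplication method only because the Python standard library has no built-in Poisson generator; the thesis does not state which sampler was used.

```python
import math
import random

def poisson(lam, rng):
    """Sample Poisson(lam) with Knuth's multiplication method."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_household_sizes(n, lam=9.0, rng=None):
    """Draw n sizes distributed as Poisson(lam) + 1 with an even sum."""
    rng = rng or random.Random()
    sizes = [poisson(lam, rng) + 1 for _ in range(n)]
    # Each vertex has one outgoing half-edge, so total degree = total size.
    while sum(sizes) % 2 != 0:
        i = rng.randrange(n)              # uniformly chosen community
        sizes[i] = poisson(lam, rng) + 1  # replace it by a fresh draw
    return sizes
```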

Figure 5: Site Percolation on HCM graph of 10,000 vertices with Poisson sized communities

Already for n = 10,000, we see a remarkable structure in Figure 5. The right part of the graph is explained by the fact that when a vertex is site percolated, it becomes isolated from the entire graph. Because the communities are complete, the connected components therefore shrink in an approximately linear way without falling apart. However, when the percolation probability exceeds a critical value, the size of the largest connected component decays rapidly. This behavior appears earlier than in the configuration model with identical degree sequence.

Figure 6: Site Percolation on CM graph with identical degree distribution (N = 10,000)


The configuration model random graph does not have many vertices with a low vertex degree. This implies that the model is well-connected. This is illustrated by the fact that the size of the giant component decays in an approximately linear way on the interval $\pi \in (1/4, 1)$ for L > 4 in Figure 6. We see that this interval is much smaller for the corresponding hierarchical configuration model random graphs in Figure 5.

In comparison with the complete households shown above, we also have star households as introduced in Section 3.1.2. These star households consist of L vertices, of which L − 1 are connected to a central vertex. Every vertex except the central vertex has an additional edge to another community. For fixed sizes, this is a hierarchical configuration model in which a vertex has degree L − 1 with probability 1/L and degree 2 with probability (L − 1)/L. Theorem 4 formulates the condition for the critical value. For fixed stars, the following values of $\pi_c$ can be computed.

l      4       5       7       10      15      100
πc     0.794   0.700   0.585   0.500   0.423   0.217

Table 2: Critical percolation thresholds on deterministic sized star communities

We see that the values in Table 2 match the simulation results shown below in Figure 7. Compared with the approximate critical values for household communities, we see that the critical values indeed satisfy $\pi_{\text{stars}} \ge \pi_{\text{households}}$.

Figure 7: Site Percolation on HCM graph of 10,000 vertices

We will now look at star communities of varying size. Percolation of such random graphs leads to the following results:


Figure 8: Site Percolation on HCM graph of 10,000 vertices with Poisson sized star communities

Clearly, the presence of the edges that connect the entire community introduces a vulnerability. Near the critical value, such a graph is more likely to break apart under percolation than the configuration model. For the case L = 3, star communities with a random size are generated such that E[S] = 3. This community type coincides with a line community in which the outer vertices each have one outgoing edge to another community. These community structures are even more vulnerable than star communities. The configuration model has a high fraction of vertices with degree 2, and these vertices may form lines. This gives an intuitive argument for the behavior of the size of the giant component of the configuration model compared to the hierarchical configuration model away from the critical value, as shown in Figure 9.

Figure 9: Site Percolation on HCM graph of 10,000 vertices with Poisson sized Star Communities


Figure 10: Site Percolation on the Configuration Model graphs with identical degree distribution

It is interesting to see that the configuration model random graphs with identical degree distribution do not break up fast. In the hierarchical configuration model, there are many vertices with just two half-edges and a few vertices with a lot of half-edges. This ratio changes only a little when the community sizes are varied, which explains the behavior of the configuration model. Apparently, just a few vertices keep the entire graph connected. In comparison with other community structures, line communities are more vulnerable when they are larger. Every intermediate vertex can be seen as a weak link in a chain, which is clearly visible in Figure 11.

Figure 11: Site Percolation on HCM graph of 10,000 vertices with Poisson sized line communities

Household communities and star communities are examples of communities which increase $\pi_c$. This means that $\pi_c^{HCM} \ge \pi_c^{CM}$. However, the opposite is true for line communities, as can be seen from the following graph:


Figure 12: Site Percolation on HCM graph of 10,000 vertices with Poisson sized Line Communities

The same holds true for line communities of a different form. Let $\theta \in (0, 1)$. With probability θ the community H is of type $H_1$: a line community similar to the definition in Section 3.1.3. This line community, however, satisfies $d_v^{(b)} = 1$ for the outer vertices. With probability 1 − θ the community is of type $H_2$: a single vertex with $d_H = 3$. This is again a hierarchical configuration model with degree distribution D, as shown in the article by Van der Hofstad et al. [2]. The degree distribution is given by:

\[
\mathbb{P}(D = k) =
\begin{cases}
\dfrac{L\theta}{L\theta + 1 - \theta} & \text{if } k = 2, \\[2ex]
\dfrac{1 - \theta}{L\theta + 1 - \theta} & \text{if } k = 3, \\[2ex]
0 & \text{otherwise.}
\end{cases}
\]

To select θ such that a fraction α of the vertices in G is of type $H_2$, it must hold that $\alpha = \mathbb{P}(D = 3)$. Hence $\theta = \frac{1 - \alpha}{L\alpha + 1 - \alpha}$. Let α = 1/3. When adopting Poisson sized communities, L must be seen as a random variable and therefore θ is a varying probability. When n → ∞ it still holds true that a fraction α of the vertices is of type $H_2$.
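For completeness, the expression for θ follows from the condition on α in three steps. The worked lines below only rearrange the degree distribution with $\mathbb{P}(D = 3) = (1 - \theta)/(L\theta + 1 - \theta)$; the numerical case at the end is an illustration and does not appear in the original text.

```latex
\begin{align*}
\alpha = \frac{1-\theta}{L\theta + 1 - \theta}
  &\iff \alpha L\theta + \alpha - \alpha\theta = 1 - \theta \\
  &\iff \theta\,(L\alpha + 1 - \alpha) = 1 - \alpha \\
  &\iff \theta = \frac{1-\alpha}{L\alpha + 1 - \alpha}.
\end{align*}
% Illustration: for \alpha = 1/3 this simplifies to \theta = 2/(L+2),
% so a fixed length L = 10 would give \theta = 1/6.
```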

In the graph shown in Figure 13 we see the line corresponding to L = 10 crossing the other lines. This is explained by the fact that the hierarchical configuration model with long lines is more vulnerable than hierarchical configuration model random graphs with smaller line communities, while the constant fraction α has a positive influence on the giant component. When comparing Figure 11 and Figure 13, we see that increasing E[S] increases $\pi_c$ for regular line communities and decreases $\pi_c$ for the hierarchical configuration model random graphs with line communities and a fraction of single vertex communities with $d_H = 3$.


Figure 13: Site Percolation on HCM graph of 10,000 vertices with Poisson sized Line Communities

The difference in critical percolation value is shown more clearly when comparing Figure 12 and Figure 14. The hierarchical configuration model clearly increases πc.

Figure 14: Site Percolation on HCM graph of 10,000 vertices with Poisson sized Line Communities

5.2 Distances

The distance between two vertices is defined as the length of the shortest path between them.

Definition 12. For vertices u and v in a connected graph G, the distance d(u, v) equals the minimum number of edges between u and v.

For a graph with multiple connected components, we say that vertices u and v that are not connected have distance ∞. An empirical probability distribution is then defined as follows.


Definition 13. Let u and v be vertices in a graph G. Then the distance distribution is defined as

\[
\mathbb{P}(d(\tilde{u}, \tilde{v}) = j) = \frac{|\{(u, v) \in G \mid d(u, v) = j\}|}{|\{(u, v) \in G \mid d(u, v) < \infty\}|}, \qquad j \in \mathbb{N}.
\]

This definition is straightforward to handle. Before and after percolation a distance matrix is calculated. This matrix has the same dimensions as the adjacency matrix and contains, for each pair of vertices, the length of the shortest path expressed as the minimal number of edges between them. The denominator then equals the number of finite elements and the numerator is the number of elements equal to j. For the configuration model, this distance distribution or hopcount is widely studied, for example in the article by Nitzan et al. [5]. The most important result is the logarithmic growth of the mean hopcount, which we state here as a theorem.
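The empirical distance distribution of Definition 13 can be computed with one breadth-first search per vertex, as in the sketch below. The adjacency-set input and the function names are assumptions for illustration. The sketch counts ordered pairs of distinct vertices, so pairs at distance 0 are left out; that is one possible reading of the definition, not necessarily the one used in the simulation.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_distribution(adj):
    """Empirical pmf of d(u, v) over ordered pairs u != v at finite distance."""
    counts = Counter()
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                counts[d] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {j: c / total for j, c in sorted(counts.items())}
```

Unreachable pairs never enter `counts`, which matches the restriction to finite distances in the denominator.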

Theorem 6. For a configuration model random graph G with N vertices, the mean hopcount grows like $\log_{\nu_D}(N)$, where $\nu_D$ is the base of the logarithm. The approximation is asymptotic:

\[
\mathbb{E}[d(u, v)] \sim \log_{\nu_D}(N) \quad \text{as } N \to \infty.
\]

Recall that on the large scale, the hierarchical configuration model is a configuration model with n vertices and degrees $d_H$. Since $N/n \to \mathbb{E}[S]$ as $n \to \infty$, we arrive at the following result for special community structures.

Theorem 7. For a hierarchical configuration model random graph G with n household communities with community degree $D_{HCM}$, the mean hopcount grows like $2\log_{\nu_{D_{HCM}}}(n)$.

Proof. The proof is straightforward. Since, as n → ∞, selecting two vertices in the same community has probability 0, the hopcount can be interpreted as hopping between communities, which is a configuration model. Furthermore, we need to take the hopping inside communities into account. It follows from Theorem 6 that the expected number of hops between communities equals $\log_{\nu_{D_{HCM}}}(n)$. The number of intermediate communities then equals $\log_{\nu_{D_{HCM}}}(n) - 1$. Since in every intermediate community 1 additional hop is required and we need 2 additional hops for leaving the first community and arriving at the last community, we have that

\[
2\log_{\nu_{D_{HCM}}}(n) \le \mathbb{E}[d(u, v)] \le 2\log_{\nu_{D_{HCM}}}(n) + 2.
\]

In the best case scenario, no hop inside the community is required in the first and last community. In the worst case scenario, an intermediate hop is required in both communities. As n → ∞, we can divide by $2\log_{\nu_{D_{HCM}}}(n)$, and by applying the squeeze theorem we arrive at the desired result.

In the particular case that the households are of a fixed size, all vertices have the same degree, and hence $\nu_{D_{HCM}} = \nu_{D_{CM}}$, which leads to

\begin{align*}
2\log_{\nu_{D_{HCM}}}(n) &= 2\log_{\nu_{D_{HCM}}}(N/\mathbb{E}[S]) \\
&= 2\log_{\nu_{D_{CM}}}(N) - 2\log_{\nu_{D_{CM}}}(\mathbb{E}[S]) \\
&> \log_{\nu_{D_{CM}}}(N) \quad \text{for } N > \mathbb{E}[S]^2.
\end{align*}

From here the following heuristic argument can be made. Consider a hierarchical configuration model random graph and the configuration model random graph with identical degree distribution. Since a household community is the best connected community structure, it must follow that, in general, if $d_v^{(b)} \le 1$ for all $v \in G$, then $\mathbb{E}[d_{HCM}(u, v)] \ge \mathbb{E}[d_{CM}(u, v)]$.


For the case S = 4 and hence νD = 13/4, the following empirical distance distribution is found for a graph of n = 3000 household communities:

Figure 15: Distance distribution household communities L=4

Figure 16: Distance distribution configuration model with identical degree distribution

Then using Theorem 6 and Theorem 7 we can compare the theoretical result with the numerical mean. Already for n = 3000 we see these theorems provide good estimators.

E[d(u, v)]     HCM       CM
Theoretical    13.586    7.969
Numerical      13.593    7.896
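The theoretical row of this table can be reproduced from Theorem 6 and Theorem 7 with a few lines of code. The function names are hypothetical, and the number of vertices N = 4n is an assumption inferred from the fixed community size of 4; it is not stated explicitly in the text.

```python
import math

def mean_hopcount_cm(num_vertices, nu):
    """Theorem 6 approximation: log base nu of the number of vertices."""
    return math.log(num_vertices) / math.log(nu)

def mean_hopcount_hcm(num_communities, nu):
    """Theorem 7 approximation: twice the log base nu of the number of communities."""
    return 2 * math.log(num_communities) / math.log(nu)

nu = 13 / 4   # value of nu_D stated in the text
n = 3000      # number of household communities
N = 4 * n     # number of vertices (assumed: every community has size 4)
print(round(mean_hopcount_hcm(n, nu), 3))  # 13.586
print(round(mean_hopcount_cm(N, nu), 3))   # 7.969
```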

In general this mean distance inequality is not true. Let H be the alternative type of line communities as described in Section 5.1 with α = 1/3. Then the following empirical probability mass functions are found for the hierarchical configuration model and the configuration model with identical degree sequence.

Figure 17: Distance distribution HCM with 3500 fixed size communities

Figure 18: Distance distribution CM with identical degree distribution (N ≈ 10,000)

The red lines are normal density fits whose parameters are the empirical mean and variance. For the HCM it holds that $\mathbb{E}[d_{HCM}(u, v)] \approx 18.73$ and for the CM we find $\mathbb{E}[d_{CM}(u, v)] \approx 20.74$. Notice that introducing community structures does not necessarily increase the mean distance.


For a hierarchical configuration model with a Poisson sized household distribution (Poi(9) + 1) and 1000 (complete) communities, the distance distributions before and after percolation are shown here as probability mass functions. The shapes do not look like normal distributions or Poisson distributions.

Figure 19: Distance distribution 1000 household communities (L=10)

Figure 20: Distance distribution HCM after percolation (π = 0.35)

Figure 21: Distance distribution configuration model with identical degree distribution

Figure 22: Distance distribution configuration model after percolation (π = 0.35)

Evidently, percolation increases distances, and the influence of the community structure is clearly visible. In the standard configuration model, distances barely increase, while in the hierarchical configuration model it is possible to reach a distance of 24 intermediate vertices. Note that Figure 20 shows a strange dip at $\mathbb{P}(d(u, v) = 10)$.

When looking at Poisson sized star households, we see this behavior more clearly, as shown in Figure 24. Apparently, structures in the hierarchical configuration model cause this alternating effect in the resulting distance distribution after percolation. When randomly selecting two vertices in the hierarchical configuration model, the distance is odd when just one vertex is a central vertex and even otherwise. Therefore even distances occur more often.


Figure 23: Distance distribution 1000 Poisson sized star communities (L=10)

Figure 24: Distance distribution HCM after percolation (π = 0.55)

When looking at line communities as defined in section 3.1.3, we see that distances become very large compared to other community structures.

Figure 25: Distance distribution HCM (N=10000) with normal density fit

Figure 26: Distance distribution HCM after percolation (π = 0.8)

The configuration model with identical degree distribution looks very similar. This is not strange. When pairing the half-edges in the configuration model random graph, there are many vertices with $d_v = 2$. These vertices also form lines of arbitrary length. Intuitively, the hierarchical configuration model with line communities and the configuration model with identical degree distribution have an almost identical structure. The empirical probability mass functions suggest normality. For a normal distribution, the minimum-variance unbiased estimators are the sample mean and the sample variance. Using these two estimators, a normal probability density function is plotted on the same interval.


Figure 27: Distance distribution configuration model with identical degree distribution

Figure 28: Distance distribution CM after percolation (π = 0.8)

For the empirical distance distribution of the percolated configuration model random graphs, the conditional normal distribution is plotted. The conditional normal distribution is the distribution of the random variable X with the following tail probabilities:

\[
\mathbb{P}(X > x) = \mathbb{P}\big(N(\mu, \sigma^2) > x \mid N(\mu, \sigma^2) > 0\big) \quad \forall x > 0.
\]

The unbiased estimators for this random variable can be obtained by numerically solving for µ and σ given the following expectations [6]:

\[
\mathbb{E}[X] = \sigma\,\psi(\mu/\sigma) \quad \text{and} \quad \mathbb{E}[X^2] = \sigma^2\Big(1 + \frac{\mu}{\sigma}\,\psi\Big(\frac{\mu}{\sigma}\Big)\Big), \tag{4}
\]

where $\psi(u) = u + \varphi(u)/\Phi(u)$, and the functions φ and Φ denote the PDF and CDF of a standard normal random variable, respectively.
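The two moments can be evaluated with the standard library alone, as in the sketch below. It uses the standard closed-form moments of a normal distribution truncated at zero, written in terms of ψ; the function names are hypothetical, and the numerical solving for µ and σ mentioned above is not shown.

```python
import math

def phi(u):
    """Standard normal PDF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def psi(u):
    """psi(u) = u + phi(u) / Phi(u), as defined in the text."""
    return u + phi(u) / Phi(u)

def conditional_normal_moments(mu, sigma):
    """First two moments of N(mu, sigma^2) conditioned on being positive."""
    a = mu / sigma
    mean = sigma * psi(a)
    second = sigma ** 2 * (1 + a * psi(a))
    return mean, second
```

A quick sanity check is the half-normal case µ = 0, σ = 1, where the mean should be sqrt(2/π) and the second moment should be 1.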


6 Conclusions and discussion

In this thesis, we introduced the configuration model and, as an extension, the hierarchical configuration model. The hierarchical configuration model is a configuration model where each vertex is replaced by a small community.

We summarized several already known results for these models and used these results to find new properties for hierarchical configuration model random graphs with special community structures in percolation theory. Percolation theory studies the connected components in large scale random graphs. For example, it asks when, after site percolation, a connected component of infinite size exists for special community structures.

The condition for this giant component to emerge depends only on the new community degree distribution after percolation. It is sometimes difficult to calculate this new distribution, since it depends on the community size, the intra-community connectivity matrix and the expected number of communities that emerge from a random community after site percolation. However, for a few examples it was possible to calculate this condition and numerically find the critical percolation thresholds. A general condition was only derived for the case where $d_v^{(b)} \le 1$ for all $v \in G$. The other case leads to complexities in the proof, since the red vertices method allows a vertex to split into multiple vertices if it has an inter-community degree larger than 1.

Using a stochastic simulation, we studied the behavior of the giant component more closely and verified the analytical results. For more complex community structures we visualized the behavior of the giant component and compared the community structures with each other, as well as the hierarchical configuration model with the corresponding configuration model, that is, the configuration model random graph with identical degree distribution. We see that introducing community structures in the configuration model does not necessarily mean that the critical site percolation threshold $\pi_c$ increases. Line communities decrease $\pi_c$, while star communities and households increase $\pi_c$.

We extended this study to the distance distribution and argued when introducing specific community structures increases distances. In particular, when $d_v^{(b)} \le 1$ for all $v \in G$, the mean distance grows larger in the hierarchical configuration model. This result is not true in general, as shown in Figure 17 and Figure 18. We see familiar shapes in the empirical distance distribution plots. The distance distribution of household communities looks like a Poisson distribution, and the distance distributions of line communities, before and after percolation, look like they may be approximated by a normal distribution or a conditional normal random variable. A statistical analysis shows that some of these probability mass functions do not differ much from a normal distribution; in particular, the fits in Figure 17 and Figure 27 are quite good. It would be interesting to do more analytical research and find out more about these distributions.


7 Appendix

7.1 Table of Definitions

Symbol          Definition
G               Graph
$N_G(v)$        Neighborhood of a vertex v in G
1 − π           Percolation probability
$G_\pi^*$       Intermediate graph after percolation with red vertices
$G_\pi$         Graph after percolation
n               Number of communities
N               Number of vertices
H               Community graph
$\{k_i\}$       Degree sequence
$\{s_i\}$       Community size sequence
D               Degree distribution
S               Size distribution
$\nu_D$         $\mathbb{E}[D(D-1)] / \mathbb{E}[D]$
$d_v^{(b)}$     Degree of a vertex to other communities
$d_v^{(c)}$     Degree of a vertex inside the community
$d_v$           Total degree of a vertex
$d_H$           Degree of a community
d(u, v)         Distance between two vertices


References

[1] Janson, S. (2009). On percolation in random graphs with given vertex degrees. Electronic Journal of Probability, 14, pp. 87-118.

[2] Hofstad, R. Van Der, Leeuwaarden, J. S. H. Van, & Stegehuis, C. (2016). Hierarchical configuration model, 1-25.

[3] Angel, O., Hofstad, R. Van Der, & Holmgren, C. (2017). Limit laws for self-loops and multiple edges in the configuration model.

[4] Bollobás, B. (2001). Random Graphs (Cambridge Studies in Advanced Mathematics). Cambridge: Cambridge University Press.

[5] Nitzan, M., Katzav, E., Kühn, R., & Biham, O. (2016). Distance distribution in configuration model networks, 1-28. https://doi.org/10.1103/PhysRevE.93.062309

[6] Conditional mean and variance of normal random variables. (n.d.). Retrieved June 23, 2017, from https://math.stackexchange.com/questions/546524/conditional-mean-and-variance-of-normal-random-variables

[7] Roberts, Lawrence G.; Wessler, Barry D. (1970), "Computer network development to achieve resource sharing", AFIPS '70 (Spring): Proceedings of the May 5-7, 1970, spring joint computer conference, New York, NY, USA: ACM, pp. 543-549, doi:10.1145/1476936.1477020

[8] Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821-7826. http://doi.org/10.1073/pnas.122653799

[9] Ugander, J., Karrer, B., Backstrom, L., Marlow, C., & Alto, P. (n.d.). The Anatomy of the Facebook Social Graph, 1-17. https://arxiv.org/pdf/1111.4503.pdf
