School of Information Sciences University of Pittsburgh

TELCOM2125: and Analysis

Konstantinos Pelechrinis
Spring 2015
Figures are taken from: M.E.J. Newman, “Networks: An Introduction”

Part 8: Small-World Network Model

Small-World Phenomenon

• Milgram’s experiment
  - Given a target individual (a stockbroker in Boston), pass the message to a person you know on a first-name basis who you think is closest to the target
• Outcome
  - 20% of the chains were completed
  - Average chain length of completed trials ~ 6.5
• Dodds, Muhamad and Watts have repeated this experiment using e-mail communications
  - Completion rate is lower, average chain length is lower too
• Are these numbers accurate?
  - What bias do the uncompleted chains introduce?

[Figure: curves labeled inter-country and intra-country]
Source: Peter Sheridan Dodds, Roby Muhamad, and Duncan J. Watts, “An Experimental Study of Search in Global Social Networks,” Science 301 (5634), 827 (8 August 2003).

Clustering in real networks

• What would you expect if networks were completely cliquish?
  - Friends of my friends are also my friends
  - What happens to short paths?
• Real-world networks (e.g., social networks) exhibit high levels of transitivity/clustering
  - But they also exhibit short paths

Clustering coefficient

• The network models that we have seen until now do not show any significant clustering coefficient
  - For instance, the random graph model has a clustering coefficient of c/(n-1), which vanishes in large networks
• However, it is easy to find networks that have a high clustering coefficient independent of the network size

Triangular lattice

• For instance, consider a triangular lattice
• Due to symmetry we can consider a random vertex
  - The clustering coefficient gives the probability that two neighbors of the vertex under consideration are themselves connected
• Every vertex has six neighbors and hence there are 15 pairs of neighbors
  - Of these, 6 are connected
    - Hence, the clustering coefficient is 6/15 = 0.4
    - Independent of the network size
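The counting argument above can be checked with a few lines of Python. This is only a sketch: the axial-coordinate representation of the lattice and the helper names (`NEIGHBOR_OFFSETS`, `neighbors`, `adjacent`, `local_clustering`) are illustrative assumptions, not something the slides specify.

```python
from itertools import combinations

# Assumption: lattice vertices are written in axial coordinates (q, r),
# where every vertex has exactly these six neighbor offsets.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def neighbors(v):
    """Return the six lattice neighbors of vertex v."""
    q, r = v
    return [(q + dq, r + dr) for dq, dr in NEIGHBOR_OFFSETS]

def adjacent(u, v):
    """Two vertices are adjacent if they differ by one of the six offsets."""
    return (v[0] - u[0], v[1] - u[1]) in NEIGHBOR_OFFSETS

def local_clustering(v):
    """Count neighbor pairs of v and how many of those pairs are connected."""
    pairs = list(combinations(neighbors(v), 2))
    connected = sum(1 for a, b in pairs if adjacent(a, b))
    return len(pairs), connected, connected / len(pairs)

pairs, connected, coeff = local_clustering((0, 0))
print(pairs, connected, coeff)  # 15 6 0.4
```

Because every vertex of the lattice looks the same, the value for the single vertex (0, 0) is also the clustering coefficient of the whole network.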

Figure 15.1: A triangular lattice. Any vertex in a triangular lattice, such as the one highlighted here, has six neighbors and hence $\binom{6}{2} = 15$ pairs of neighbors, of which six are connected by edges, giving a clustering coefficient of 6/15 = 0.4 for the whole network, regardless of size.

Circle model

Figure 15.2: A simple one-dimensional network model. (a) Vertices are arranged on a line and each is connected to its c nearest neighbors, where c = 6 in this example. (b) The same network with periodic boundary conditions applied, making the line into a circle.

To calculate the number of triangles in such a network, we observe that a trip around any triangle must consist of two steps in the same direction around the circle (say clockwise) followed by one step back to close the triangle. The number of triangles per vertex in the whole network is then equal to the number of such triangles that start from any given point.

• In this model the vertices are arranged on a circle
  - Each node is connected to its c nearest vertices
    - Fixed degree for all nodes
• A triangle in this network requires two edge traversals in the same direction on the circle and one in the opposite direction
  - The final/opposite step can span at most c/2 vertices
  - Hence, the number of triangles for a given node is given by the number of distinct ways of choosing 2 forward target vertices from the c/2 possibilities:
    $$\binom{c/2}{2} = \frac{1}{4} c \left( \frac{1}{2} c - 1 \right)$$
  - The number of connected triples per vertex is:
    $$\binom{c}{2} = \frac{1}{2} c (c - 1)$$
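The two per-vertex counts above can be compared against a brute-force count on a small circle. The sketch below assumes c is even and n is comfortably larger than c; the function names are hypothetical and the values n = 24, c = 6 are chosen only for illustration.

```python
from itertools import combinations
from math import comb

def circle_model(n, c):
    """Adjacency sets for n vertices on a ring, each joined to its c nearest vertices."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj

def count_triangles(adj):
    """Brute-force triangle count; every triangle is seen once from each corner."""
    seen = 0
    for v in adj:
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                seen += 1
    return seen // 3

def count_triples(adj):
    """Connected triples: pairs of neighbors around each central vertex."""
    return sum(comb(len(nbrs), 2) for nbrs in adj.values())

n, c = 24, 6
adj = circle_model(n, c)
triangles = count_triangles(adj)
triples = count_triples(adj)
print(triangles, n * c * (c // 2 - 1) // 4)  # brute force vs n * (1/4) c (c/2 - 1)
print(triples, n * c * (c - 1) // 2)         # brute force vs n * (1/2) c (c - 1)
```

Both lines print matching pairs of numbers, which is a quick sanity check on the binomial expressions before they are combined into a clustering coefficient.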

• The clustering coefficient of the circle model is three times the number of triangles divided by the number of connected triples:
  $$C = \frac{\frac{1}{4} n c \left( \frac{1}{2} c - 1 \right) \times 3}{\frac{1}{2} n c (c - 1)} = \frac{3(c - 2)}{4(c - 1)}$$
• The clustering coefficient is not constant as in the triangular lattice; it takes values between 0 (when c=2) and 0.75 (when c→∞)
  - However, note that C is independent of n
• While this model exhibits a large clustering coefficient, it has two problems
  - [...]
  - “Large worlds” → The average shortest path is not small as in real networks

Small-world models

• Random graphs exhibit short paths but not clustering
  - Why not try to combine these two models together?
• The small-world model (Watts and Strogatz 1998) tries to do exactly this
  - We start with a circle model of n vertices in which every vertex has a degree of c
  - We go through each of the edges and with some probability p we rewire it
    - Remove this edge and pick two vertices uniformly at random and connect them with a new edge
      - Shortcut edge

The small-world model, in its original form, interpolates between our circle model and the random graph by moving or rewiring edges from the circle to random positions. The detailed structure of the model is shown in Fig. 15.3a. Starting with a circle model of n vertices in which every vertex has degree c, we go through each of the edges in turn and with some probability p we remove that edge and replace it with one that joins two vertices chosen uniformly at random. The randomly placed edges are commonly referred to as shortcuts because, as shown in Fig. 15.3a, they create shortcuts from one part of the circle to another.
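The rewiring procedure just described can be sketched in a few lines of Python. The function names are hypothetical, and one detail is an assumption that the text above does not state: here a randomly drawn shortcut is redrawn if it would be a self-loop or would duplicate an existing edge, so that the number of edges stays fixed.

```python
import random

def circle_edges(n, c):
    """Edges of the circle model: vertex v is joined to the next c/2 vertices."""
    return [(v, (v + step) % n) for v in range(n) for step in range(1, c // 2 + 1)]

def rewire(n, c, p, rng):
    """Original small-world model: each circle edge is moved with probability p."""
    edges = {frozenset(e) for e in circle_edges(n, c)}
    for edge in list(edges):          # snapshot: only circle edges are visited
        if rng.random() < p:
            edges.discard(edge)
            # Assumption (not in the source): redraw until the shortcut is
            # neither a self-loop nor a copy of an edge already present.
            while True:
                u, v = rng.randrange(n), rng.randrange(n)
                shortcut = frozenset((u, v))
                if u != v and shortcut not in edges:
                    edges.add(shortcut)
                    break
    return edges

rng = random.Random(42)
edges = rewire(24, 6, 0.07, rng)
circle = {frozenset(e) for e in circle_edges(24, 6)}
print(len(edges), "edges,", len(edges - circle), "of them off the circle")
```

With n = 24 and c = 6 the circle has 72 edges, so at p = 0.07 one expects about five of them to be moved on average; the exact count varies from run to run and with the seed.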


Figure 15.3: Two versions of the small-world model. (a) In the original version of the small- world model, edges are with independent probability p removed from the circle and placed between two vertices chosen uniformly at random, creating shortcuts across the circle as shown. In this example n = 24, c = 6, and p = 0.07, so that 5 out of 72 edges are “rewired” in this fashion. (b) In the second version of the model only the shortcuts are added and no edges are removed from the circle.

The parameter p in the small-world model controls the interpolation between the circle model and the random graph. When p = 0 no edges are rewired and we retain the original circle. When p = 1 all edges are rewired to random positions and we have a random graph. For intermediate values of p we generate networks that lie somewhere in between. Thus for p = 0 the small-world model shows clustering (so long as c > 2; see Eq. (15.2)) but no small-world effect. For p = 1 it does the reverse. The crucial point about the model is that as p is increased from zero the clustering is maintained up to quite large values of p while the small-world behavior, meaning short average path lengths, already appears for quite modest values of p. As a result there is a substantial range of intermediate values for which the model shows both effects simultaneously, thereby demonstrating that the two are in fact entirely compatible and not exclusive at all.

Unfortunately, it is hard to demonstrate this result rigorously because the small-world model as defined above is difficult to treat by analytic means. For this reason we will in this chapter study a slight variant of the model, which is easier to treat [254]. In this variant, shown in Fig. 15.3b, edges are added between randomly chosen vertex pairs just as before, but no edges are removed from the original circle. This leaves the circle intact, which makes our calculations much simpler. For ease of comparison with the original small-world model, the definition of the parameter p is kept the same: for every edge in the circle we add with independent probability p an additional shortcut between two vertices chosen uniformly at random. A downside of this version of the model is that it no longer becomes a random graph in the limit p = 1. Instead it becomes a random graph plus the original circle. This, however, turns out not to be a significant problem, since most of the interest in the model lies in the regime where p is small and in this regime the two models differ hardly at all; the only difference is the presence in the second variant of a small number of edges around the circle that would be absent in the first, having been rewired. Henceforth, we will study the variant model in which no edges are removed and we will refer to it, as others have, as the small-world model, although the reader should bear in mind that there are two slightly different models that carry this name.

• The parameter p controls the interpolation between the circle model and the random graph
  - p=0 → ordered situation/circle model
  - p=1 → random graph
  - Intermediate values of p give networks somewhere in between
• The crucial and interesting point is that short paths appear even for small values of p as we increase it from p=0, while the high clustering remains until fairly large values of p
  - Hence, there is a regime of values of p where both short paths and high clustering exist!

• The above model is the original small-world model, but it is rather involved to analyze
• We will use another model for our derivations
  - Edges are added at random between two vertices in the circular lattice but no edges from the original circle are removed
  - The definition of p remains the same
    - For every edge of the original circle we create an additional shortcut with probability p between two randomly chosen vertices
• When p→1 we no longer have a completely random graph
  - This is not a big problem since we are interested in the regime where p is small
    - The only difference in this regime is that a small number of edges around the circle that would be absent in the original model are now present
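To see the two effects side by side, the variant model can be simulated directly. The following is a rough sketch with hypothetical function names; it assumes that self-loops are skipped and that repeated edges collapse (adjacency is stored as sets), and it uses n = 600, c = 6 purely as illustrative parameters.

```python
import random
from collections import deque
from itertools import combinations

def small_world_variant(n, c, p, rng):
    """Circle model plus shortcuts; no circle edge is ever removed."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
            # One shortcut trial per circle edge, with independent probability p.
            if rng.random() < p:
                a, b = rng.randrange(n), rng.randrange(n)
                # Assumption: self-loops are skipped; duplicates collapse in the sets.
                if a != b:
                    adj[a].add(b)
                    adj[b].add(a)
    return adj

def clustering(adj):
    """3 x (number of triangles) / (number of connected triples)."""
    closed = triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # closed triple centered at v
    return closed / triples

def mean_path_length(adj):
    """Average breadth-first-search distance over all pairs of distinct vertices."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)
for p in (0.0, 0.01, 0.1):
    g = small_world_variant(600, 6, p, rng)
    print(p, round(clustering(g), 3), round(mean_path_length(g), 2))
```

In runs of this kind the mean path length drops sharply once a handful of shortcuts are present, while the clustering coefficient stays close to its p = 0 value; the precise numbers depend on the random seed.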

Degree distribution

• In the small-world model that we examine every node has degree at least c
• The expected number of shortcut edges we add is (1/2)ncp
  - That is, ncp ends of shortcut edges, or cp per vertex on average
• The number of shortcuts s attached to any vertex is Poisson distributed:
  $$p_s = e^{-cp} \frac{(cp)^s}{s!}$$
• The total vertex degree is k=c+s
  $$p_k = e^{-cp} \frac{(cp)^{k-c}}{(k-c)!}, \quad k \ge c$$

• For c=6, p=0.5
  - Not similar to real-world networks

Figure 15.4: The degree distribution of the small-world model. The frequency distribution of vertex degrees in a small-world model with parameters c = 6 and p = 0.5.
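The shifted Poisson form of the degree distribution is easy to tabulate. The sketch below evaluates p_k for the same parameters as the figure, c = 6 and p = 0.5; the function name `degree_pmf` is a hypothetical helper introduced here for illustration.

```python
from math import exp, factorial

def degree_pmf(k, c, p):
    """Probability that a vertex has degree k: Poisson(cp) shortcuts on top of c."""
    if k < c:
        return 0.0          # every vertex keeps its c circle edges
    s = k - c               # number of shortcut ends attached to the vertex
    mean = c * p
    return exp(-mean) * mean ** s / factorial(s)

c, p = 6, 0.5
for k in range(c, c + 8):
    print(k, round(degree_pmf(k, c, p), 4))
```

The distribution has a hard lower cutoff at k = c and a mean of c + cp, with a rapidly decaying tail, which is why it does not resemble the broad degree distributions seen in many real networks.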

Clustering coefficient

• In order to calculate the clustering coefficient we need to calculate the number of triangles and connected triples after the addition of the shortcuts
• Number of triangles
  - The triangles of the original circle are not changed: (1/4)nc((1/2)c-1)
  - New triangles can be created
    - In general, nodes whose distance on the circle is between (1/2)c+1 and c are connected through 2-hop paths
      - The number of such pairs increases linearly with the size of the network n
    - If a shortcut connects them then we have a new triangle
    - The probability they are connected through a shortcut is:
      $$\frac{\frac{1}{2} ncp}{\frac{1}{2} n(n-1)} = \frac{cp}{n-1} \simeq \frac{cp}{n}$$
    - Hence, the number of triangles that are completed through the shortcuts is proportional to n × cp/n = cp
      - In the limit of large n these triangles are negligible compared to those of the original circle
• Number of connected triples
  - All connected triples of the original circle are still there: (1/2)nc(c-1)
  - Every shortcut creates new connected triples
    - At each end of the shortcut edge there are c edges that can form a triple
    - Hence, the total number of triples created by the (1/2)ncp shortcuts is: (1/2)ncp × 2 × c = nc²p
  - Pairs of shortcuts attached to a vertex can create connected triples as well
    - If a vertex has m attached shortcuts there are (1/2)m(m-1) triples centered at this node
    - The number of shortcuts a node receives is Poisson distributed with mean cp
      - Hence, the expected number of connected triples centered at a given vertex is (1/2)c²p²

• Combining all of the above, the clustering coefficient for the small-world network model we consider is:
  $$C = \frac{\frac{1}{4} n c \left( \frac{1}{2} c - 1 \right) \times 3}{\frac{1}{2} n c (c-1) + n c^2 p + \frac{1}{2} n c^2 p^2} = \frac{3(c-2)}{4(c-1) + 8cp + 4cp^2}$$
  - For p=0 we obtain the clustering coefficient of the circle model
  - As p grows the clustering coefficient decreases
    - For p=1 the minimum value is $C_{min} = \frac{3(c-2)}{4(4c-1)}$
    - This value is non-zero
      - E.g., for c=6 → Cmin=0.13
• Note: the original small-world model from Watts and Strogatz exhibits Cmin=0
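As a quick numerical check of the closed form, the expression can be evaluated at the two endpoints and at a few intermediate values of p. The function name `clustering_coefficient` is a hypothetical helper; the formula itself is the one given above for the variant model in which no circle edges are removed.

```python
def clustering_coefficient(c, p):
    """C = 3(c - 2) / (4(c - 1) + 8cp + 4cp^2) for the shortcut-adding model."""
    return 3 * (c - 2) / (4 * (c - 1) + 8 * c * p + 4 * c * p ** 2)

c = 6
for p in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(p, round(clustering_coefficient(c, p), 4))

# Endpoints: the circle-model value at p = 0 and the minimum at p = 1.
c_max = clustering_coefficient(c, 0.0)   # 3(c-2) / (4(c-1)) = 0.6 for c = 6
c_min = clustering_coefficient(c, 1.0)   # 3(c-2) / (4(4c-1)) = 12/92 for c = 6
print(round(c_max, 2), round(c_min, 2))
```

For c = 6 the value only falls from 0.6 to roughly 0.48 by p = 0.1, which illustrates how slowly clustering decays as shortcuts are added.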

• For c=6 and n=600

Average shortest path lengths

• The analytical treatment of shortest paths in the small-world model is harder than that of the degree distribution and the clustering coefficient
• It can be argued that the average path length is given by:
  $$\ell = \frac{\ln(ncp)}{c^2 p}, \quad ncp \gg 1$$
  - The average path length increases only logarithmically with n for given c and p
    - Hence, even a few shortcuts per vertex can produce short paths
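The logarithmic growth can be made concrete by evaluating the expression above for increasing n. This is a minimal sketch: the function name and the guard against small ncp are assumptions added for illustration, and the formula is only meaningful in the regime ncp ≫ 1 stated above.

```python
from math import log

def mean_path_length(n, c, p):
    """Approximate average path length ln(ncp) / (c^2 p), valid for ncp >> 1."""
    if n * c * p <= 1:
        raise ValueError("the approximation assumes ncp >> 1")
    return log(n * c * p) / (c ** 2 * p)

c, p = 6, 0.1
for n in (600, 6_000, 60_000, 600_000):
    print(n, round(mean_path_length(n, c, p), 3))

# Doubling n adds the same constant, ln(2) / (c^2 p), regardless of n.
step = mean_path_length(1200, c, p) - mean_path_length(600, c, p)
print(round(step, 4), round(log(2) / (c ** 2 * p), 4))
```

Each tenfold increase in n adds the same fixed increment to the path length, which is the signature of the small-world effect.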

Pairs of shortcuts attached to the same vertex contribute an expected (1/2)c²p² triples per vertex, for a total of (1/2)nc²p² triples over all vertices. Thus the expected total number of connected triples of all types in the whole network is (1/2)nc(c-1) + nc²p + (1/2)nc²p². Substituting the numbers of triangles and triples into Eq. (15.5), we then find that

$$C = \frac{3(c-2)}{4(c-1) + 8cp + 4cp^2}. \tag{15.7}$$

Note that this becomes the same as Eq. (15.2), as it should, when p = 0. And as p grows it becomes smaller, with a minimum value of C = 3(c-2)/(4(4c-1)) when p = 1. For instance when c = 6, the minimum value of the clustering coefficient is 3/23 ≈ 0.13. (This behavior contrasts with that of the original Watts-Strogatz version of the small-world model in which edges are removed from the circle. In that version the clustering coefficient tends to zero as n → ∞ when p = 1, since the network becomes a random graph at p = 1.)

Average shortest path lengths

• For c=6 and n=600
  - [Figure annotation: small-world regime]

Figure 15.5: Clustering coefficient and average path length in the small-world model. The solid line shows the clustering coefficient, Eq. (15.7), for a small-world model with c = 6 and n = 600, as a fraction of its maximum value C_max = 3(c-2)/(4(c-1)) = 0.6, plotted as a function of the parameter p. The dashed line shows the average geodesic distance between vertices for the same model as a fraction of its maximum value ℓ_max = n/2c = 50, calculated from the mean-field solution, Eq. (15.14). Note that the horizontal axis is logarithmic.

Figure 15.5 shows a plot of the clustering coefficient as a function of p for a small-world network with c = 6.