School of Information Sciences, University of Pittsburgh
TELCOM2125: Network Science and Analysis
Konstantinos Pelechrinis, Spring 2015
Figures are taken from: M.E.J. Newman, "Networks: An Introduction"

Part 8: Small-World Network Model

Small-World Phenomenon
- Milgram's experiment
  - Given a target individual (a stockbroker in Boston), pass the message to a person you know on a first-name basis who you think is closest to the target
- Outcome
  - 20% of the chains were completed
  - Average chain length of completed trials ~6.5
- Dodds, Muhamad and Watts repeated this experiment using e-mail communications
  - The completion rate is lower, and the average chain length is lower too

Small-World Phenomenon
- Are these numbers accurate?
  - What bias do the uncompleted chains introduce?
[Figure panels: inter-country, intra-country]
Source: Peter Sheridan Dodds, Roby Muhamad, and Duncan J. Watts, "An Experimental Study of Search in Global Social Networks," Science 301 (5634), 827 (8 August 2003).

Clustering in real networks
- What would you expect if networks were completely cliquish?
  - Friends of my friends are also my friends
  - What happens to short paths?
- Real-world networks (e.g., social networks) exhibit high levels of transitivity/clustering
  - But they also exhibit short paths

Clustering coefficient
- The network models we have seen so far (random graph, configuration model, and preferential attachment) do not show any significant clustering
  - For instance, the random graph model has a clustering coefficient of c/(n − 1), which vanishes in large networks
- However, it is easy to find networks whose clustering coefficient is high and independent of the network size

Triangular lattice
- For instance, consider a triangular lattice
- Due to symmetry, we can consider a random vertex
  - The clustering coefficient gives the probability that two neighbors of the vertex under consideration are themselves connected (friends)
- Every vertex has six neighbors, and hence there are $\binom{6}{2} = 15$ pairs of neighbors
  - Of these, 6 are connected
    - Hence, the clustering coefficient is 6/15 = 0.4
    - This is independent of the network size

Figure 15.1: A triangular lattice. Any vertex in a triangular lattice, such as the one highlighted here, has six neighbors and hence $\binom{6}{2} = 15$ pairs of neighbors, of which six are connected by edges, giving a clustering coefficient of 6/15 = 0.4 for the whole network, regardless of size.

To calculate the number of triangles in the circle model, we observe that a trip around any triangle must consist of two steps in the same direction around the circle (say, clockwise) followed by one step back to close the triangle. The number of triangles per vertex in the whole network is then equal to the number of such triangles that start from any given point.

Figure 15.2: A simple one-dimensional network model. (a) Vertices are arranged on a line and each is connected to its c nearest neighbors, where c = 6 in this example. (b) The same network with periodic boundary conditions applied, making the line into a circle.
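To make the triangular-lattice and random-graph claims concrete, here is a minimal Python sketch (not part of the course material) that computes the global clustering coefficient as 3 × (number of triangles) / (number of connected triples). The helper names `clustering_coefficient`, `triangular_lattice`, and `random_graph` are hypothetical. Two assumptions: the lattice wraps around on an L × L torus so that every vertex has exactly six neighbors, and the random graph is G(n, p) with p = c/(n − 1), so that its mean degree is c.

```python
import random
from itertools import combinations


def clustering_coefficient(adj):
    """Global clustering coefficient = 3 x (triangles) / (connected triples).

    adj is a list of neighbor sets describing an undirected simple graph.
    Summing the linked neighbor pairs over all vertices counts each
    triangle three times, so that sum already equals 3 x (number of triangles).
    """
    closed = 0   # linked neighbor pairs, summed over vertices
    triples = 0  # k(k-1)/2 neighbor pairs, summed over vertices
    for nbrs in adj:
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return closed / triples if triples else 0.0


def triangular_lattice(L):
    """Triangular lattice on an L x L torus (periodic boundaries), L >= 4.

    Vertex (x, y) is joined to offsets (1, 0), (0, 1), (1, -1); the reverse
    links supply (-1, 0), (0, -1), (-1, 1), so every vertex has six neighbors.
    """
    adj = [set() for _ in range(L * L)]
    for x in range(L):
        for y in range(L):
            v = x * L + y
            for dx, dy in ((1, 0), (0, 1), (1, -1)):
                u = ((x + dx) % L) * L + (y + dy) % L
                adj[v].add(u)
                adj[u].add(v)
    return adj


def random_graph(n, c, seed=0):
    """G(n, p) random graph with p = c / (n - 1), i.e. mean degree c."""
    rng = random.Random(seed)
    p = c / (n - 1)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


if __name__ == "__main__":
    # 6 of the 15 neighbor pairs are linked at every vertex, so this prints 0.4.
    print(clustering_coefficient(triangular_lattice(10)))
    n, c = 2000, 10
    print(clustering_coefficient(random_graph(n, c)), "vs c/(n-1) =", c / (n - 1))
```

Because every vertex of the periodic lattice looks the same, the result is 0.4 for any lattice size (L ≥ 4 in this sketch); the random-graph estimate varies with the seed but stays close to c/(n − 1), which shrinks toward zero as n grows with c fixed.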
Circle model
- In this model the vertices are arranged on a circle
  - Each node is connected to its c nearest vertices
    - Fixed degree for all nodes
- A triangle in this network requires two edge traversals in the same direction around the circle and one in the opposite direction
  - The final (opposite) step can span at most c/2 vertices
  - Hence, the number of triangles for a given node is the number of distinct ways of choosing 2 forward target vertices from the c/2 possibilities: $\binom{c/2}{2} = \frac{1}{4}c\left(\frac{c}{2} - 1\right)$
  - The number of connected triples per vertex is: $\binom{c}{2} = \frac{1}{2}c(c - 1)$
- Hence, the clustering coefficient of the circle model (three times the number of triangles divided by the number of connected triples, both summed over all n vertices) is:
  $$C = \frac{\frac{1}{4}nc\left(\frac{c}{2} - 1\right) \times 3}{\frac{1}{2}nc(c - 1)} = \frac{3(c - 2)}{4(c - 1)}$$
- The clustering coefficient is not constant as in the triangular lattice; it takes values between 0 (when c = 2) and 0.75 (as c → ∞)
  - However, note that C is independent of n
- While this model exhibits a large clustering coefficient, it has two problems
  - Degree distribution (every vertex has the same degree)
  - "Large worlds": the average shortest path is not small, as it is in real networks

Small-world models
The small-world model, in its original form, interpolates between our circle model and the random graph by moving or rewiring edges from the circle to random positions. The detailed structure of the model is shown in Fig. 15.3a.
- Random graphs exhibit short paths but not clustering
  - Why not try to combine these two models?
- The small-world model (Watts and Strogatz 1998) tries to do exactly this
  - We start with a circle model of n vertices in which every vertex has degree c
  - We go through each of the edges and, with some probability p, we rewire it
    - Remove this edge, pick two vertices uniformly at random, and connect them with a new edge (a shortcut edge)

Starting with a circle model of n vertices in which every vertex has degree c, we go through each of the edges in turn and with some probability p we remove that edge and replace it with one that joins two vertices chosen uniformly at random. The randomly placed edges are commonly referred to as shortcuts because, as shown in Fig. 15.3a, they create shortcuts from one part of the circle to another.

Figure 15.3: Two versions of the small-world model. (a) In the original version of the small-world model, each edge is, with independent probability p, removed from the circle and placed between two vertices chosen uniformly at random, creating shortcuts across the circle as shown. In this example n = 24, c = 6, and p = 0.07, so that 5 out of 72 edges are "rewired" in this fashion. (b) In the second version of the model only the shortcuts are added and no edges are removed from the circle.

The parameter p in the small-world model controls the interpolation between the circle model and the random graph. When p = 0 no edges are rewired and we retain the original circle. When p = 1 all edges are rewired to random positions and we have a random graph. For intermediate values of p we generate networks that lie somewhere in between. Thus for p = 0 the small-world model shows clustering (so long as c > 2; see the circle-model result C = 3(c − 2)/(4(c − 1))) but no small-world effect. For p = 1 it does the reverse. The crucial point about the model is that as p is increased from zero, the clustering is maintained up to quite large values of p, while the small-world behavior, meaning short average path lengths, already appears for quite modest values of p.
As a result, there is a substantial range of intermediate values of p for which the model shows both effects simultaneously, thereby demonstrating that the two are in fact entirely compatible and not exclusive at all.

Unfortunately, it is hard to demonstrate this result rigorously because the small-world model as defined above is difficult to treat by analytic means. For this reason, we will in this chapter study a slight variant of the model, which is easier to treat [254]. In this variant, shown in Fig. 15.3b, edges are added between randomly chosen vertex pairs just as before, but no edges are removed from the original circle. This leaves the circle intact, which makes our calculations much simpler. For ease of comparison with the original small-world model, the definition of the parameter p is kept the same: for every edge in the circle, we add with independent probability p an additional shortcut between two vertices chosen uniformly at random. A downside of this version of the model is that it no longer becomes a random graph in the limit p = 1.
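The shortcut-only variant (keep the circle, add shortcuts) is easy to simulate. The following minimal Python sketch, with hypothetical helper names `circle_model`, `add_shortcuts`, `clustering_coefficient`, and `mean_shortest_path`, builds a circle model in which each vertex is joined to its c nearest neighbors, adds shortcuts with probability p per circle edge, and reports the clustering coefficient and the mean shortest-path length. Two simplifications are assumptions of this sketch, not part of the model as stated: shortcuts that would be self-loops are skipped, and a shortcut duplicating an existing edge is merged, so the graph stays simple. At p = 0 the clustering coefficient should match the circle-model result 3(c − 2)/(4(c − 1)); with shortcuts, the mean path length should fall much faster than C does.

```python
import random
from collections import deque
from itertools import combinations


def circle_model(n, c):
    """Circle model: each vertex joined to its c nearest neighbors (c even, c/2 per side)."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for d in range(1, c // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, c, p, rng):
    """Shortcut-only variant, modifying adj in place: for each of the n*c/2
    circle edges, with independent probability p, add one edge between two
    uniformly random vertices. No circle edges are removed.
    Assumption of this sketch: self-loops are skipped, duplicate edges merge."""
    n = len(adj)
    for _ in range(n * c // 2):
        if rng.random() < p:
            u, v = rng.randrange(n), rng.randrange(n)
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Global clustering coefficient: 3 x triangles / connected triples."""
    closed = triples = 0
    for nbrs in adj:
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return closed / triples if triples else 0.0


def mean_shortest_path(adj):
    """Mean BFS distance over ordered pairs of distinct vertices.
    Assumes a connected graph, which holds here because the circle is never broken."""
    n = len(adj)
    total = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if dist[u] < 0:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist)
    return total / (n * (n - 1))


if __name__ == "__main__":
    n, c = 400, 6
    rng = random.Random(42)
    for p in (0.0, 0.01, 0.1):
        g = add_shortcuts(circle_model(n, c), c, p, rng)
        print(f"p={p}: C={clustering_coefficient(g):.3f}, mean path={mean_shortest_path(g):.2f}")
    print("circle-model prediction at p = 0:", 3 * (c - 2) / (4 * (c - 1)))
```

With n = 400 and c = 6, the p = 0 line should report C = 0.600 and a mean path of about 33.75 (ring distance divided by the c/2 = 3 step reach, rounded up); the shortcut runs depend on the seed. Because the circle is never broken, this version stays connected but, as noted above, never becomes a random graph, even at p = 1.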