School of Information Sciences University of Pittsburgh

Studying Homophily Through Network Randomization

Ke Zhang

Homophily in networks

• “Birds of a feather flock together” [McPherson et al. 2001]
  - Homophily/matching
• Quantification is easy (in general)
  - E.g., assortativity coefficient
• Identifying its causes is hard
  - Peer influence
  - Social selection
• Randomization can help in both problems

Homophily and assortative mixing

• People tend to associate with others whom they perceive as being similar to themselves in some ways
  - Homophily or assortative mixing
  - Figure presents the friendships in a high school
    ✓ Nodes are annotated with regards to their race
• Disassortative mixing is also possible
  - E.g., sexual partnerships

7.13 Homophily and assortative mixing: “Consider Fig. 7.10, which shows a friendship network of children at an American school, determined from a questionnaire of the type discussed in Section 3.2.1. One very clear feature that emerges from the figure is the division of the network into two groups. It turns out that this division is principally along lines of race. The different shades of the vertices in the picture correspond to students of different race as denoted in the legend, and reveal that the school is sharply divided between a group composed principally of black children and a group composed principally of white.”


Figure 7.10: Friendship network at a US high school. The vertices in this network represent 470 students at a US high school (ages 14 to 18 years). The vertices are color coded by race as indicated in the key. Data from the National Longitudinal Study of Adolescent Health [34, 314].

This is not news to sociologists, who have long observed and discussed such divisions [225]. Nor is the effect specific to race. People are found to form friendships, acquaintances, business relations, and many other types of tie based on all sorts of characteristics, including age, nationality, language, income, educational level, and many others. Almost any social parameter you can imagine plays into people’s selection of their friends. People have, it appears, a strong tendency to associate with others whom they perceive as being similar to themselves in some way. This tendency is called homophily or assortative mixing. More rarely, one also encounters disassortative mixing, the tendency for people to associate with others who are unlike them. Probably the most widespread and familiar example of …

Enumerative characteristics

• Consider a network where each vertex is classified according to a discrete characteristic that can take a finite number of values
  - E.g., race, nationality, etc.
• We find the fraction of edges that run between vertices of the same type, and subtract from it the fraction of such edges we would expect to find if edges were placed at random (i.e., without regard for vertex type)
  - If the result is significantly positive → assortative mixing is present
  - If the result is significantly negative → disassortative mixing is present

• Let $c_i$ be the class of vertex i and let $n_c$ be the number of possible classes
• The total number of edges connecting two vertices belonging to the same class is given by:

$$\sum_{\text{edges }(i,j)} \delta(c_i,c_j) = \frac{1}{2}\sum_{ij} A_{ij}\,\delta(c_i,c_j)$$

• The expected number of edges between same-type vertices is given by:

$$\frac{1}{2}\sum_{ij} \frac{k_i k_j}{2m}\,\delta(c_i,c_j)$$

• Taking the difference of the above quantities we get:

$$\frac{1}{2}\sum_{ij} A_{ij}\,\delta(c_i,c_j) - \frac{1}{2}\sum_{ij} \frac{k_i k_j}{2m}\,\delta(c_i,c_j) = \frac{1}{2}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)\delta(c_i,c_j)$$


• Conventionally we express this difference as a fraction of edges, which is called the modularity Q:

$$Q = \frac{1}{2m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)\delta(c_i,c_j)$$

• Q is strictly less than 1 and takes positive values when there are more edges between vertices of the same type than we would expect
• The quantity in parentheses appears in many other situations, and we define it to be the modularity matrix B:

$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$$

• The value of Q can be considerably lower than 1 even for perfectly mixed networks
  - Depends on group sizes, degrees, etc.
  - How can we decide whether a modularity value is large or not?
• We normalize the modularity value Q by the maximum value it can attain
  - Perfect mixing is when all edges fall between vertices of the same type

$$Q_{\max} = \frac{1}{2m}\Big(2m - \sum_{ij}\frac{k_i k_j}{2m}\,\delta(c_i,c_j)\Big)$$

• Then, the assortativity coefficient is given by:

$$\frac{Q}{Q_{\max}} = \frac{\sum_{ij}\big(A_{ij} - k_i k_j/2m\big)\,\delta(c_i,c_j)}{2m - \sum_{ij}\big(k_i k_j/2m\big)\,\delta(c_i,c_j)}$$

Computation speed-up

• The above equations may be computationally prohibitive for computing the modularity Q in large networks
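For a small network, though, the defining formulas can be evaluated directly from a dense adjacency matrix. A minimal NumPy sketch (the function name and the toy network are illustrative, not from the slides):

```python
import numpy as np

def enumerative_assortativity(A, classes):
    """Modularity Q and assortativity coefficient Q/Qmax for a discrete
    (enumerative) vertex attribute, straight from the defining formulas."""
    A = np.asarray(A, dtype=float)
    classes = np.asarray(classes)
    k = A.sum(axis=1)                              # degrees k_i
    two_m = k.sum()                                # 2m = sum of degrees
    same = classes[:, None] == classes[None, :]    # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m                 # modularity matrix B_ij
    Q = (B * same).sum() / two_m
    Qmax = (two_m - (np.outer(k, k) / two_m * same).sum()) / two_m
    return Q, Q / Qmax

# Two triangles of different types joined by a single edge:
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
Q, coef = enumerative_assortativity(A, [0, 0, 0, 1, 1, 1])  # Q = 5/14, coef = 5/7
```

Both the O(n²) matrix products and the memory for a dense A are what becomes prohibitive at scale, which motivates the edge-list speed-up below.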

• Let $e_{rs} = \frac{1}{2m}\sum_{ij} A_{ij}\,\delta(c_i,r)\,\delta(c_j,s)$ be the fraction of edges that join vertices of type r to vertices of type s

• Let $a_r = \frac{1}{2m}\sum_{i} k_i\,\delta(c_i,r)$ be the fraction of ends of edges attached to vertices of type r
• We further have $\delta(c_i,c_j) = \sum_r \delta(c_i,r)\,\delta(c_j,r)$


$$Q = \frac{1}{2m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)\sum_r \delta(c_i,r)\,\delta(c_j,r) = \sum_r\Big[\frac{1}{2m}\sum_{ij}A_{ij}\,\delta(c_i,r)\,\delta(c_j,r) - \frac{1}{2m}\sum_i k_i\,\delta(c_i,r)\,\frac{1}{2m}\sum_j k_j\,\delta(c_j,r)\Big] = \sum_r\big(e_{rr} - a_r^2\big)$$

$$Q_{\max} = 1 - \sum_r a_r^2$$
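The $e_{rr}$ / $a_r$ bookkeeping can be done in a single pass over an edge list, which is the point of the speed-up: no adjacency matrix is ever built. A sketch (names are illustrative):

```python
from collections import defaultdict

def modularity_fast(edges, classes):
    """Q = sum_r (e_rr - a_r^2), computed in one O(m) pass over an edge list."""
    m = len(edges)
    e_rr = defaultdict(float)   # fraction of edges joining type r to type r
    a_r = defaultdict(float)    # fraction of edge ends attached to type r
    for u, v in edges:
        if classes[u] == classes[v]:
            e_rr[classes[u]] += 1.0 / m
        a_r[classes[u]] += 0.5 / m   # each endpoint is one of the 2m edge ends
        a_r[classes[v]] += 0.5 / m
    return sum(e_rr[r] - a_r[r] ** 2 for r in a_r)

# Same two-triangle toy network as before:
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q = modularity_fast(edges, {i: int(i >= 3) for i in range(6)})  # Q = 5/14
```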

Scalar characteristics

• Scalar characteristics take values that come in a particular order and can, in theory, take an infinite number of values
  - E.g., age
  - In this case, two people can be considered similar if they were born on the same day, or within a year, or within two years, etc.
• When we consider scalar characteristics we thus have an approximate notion of similarity between connected vertices
  - There is no approximate similarity with enumerative characteristics

… students in the same grade. There is also, in this case, a notable tendency for students to have more friends of a wider range of ages as their age increases, so there is a lower density of points in the top right box than in the lower left one.


Figure 7.11: Ages of pairs of friends in high school. In this scatter plot each dot corresponds to one of the edges in Fig. 7.10, and its position along the horizontal and vertical axes gives the ages of the two individuals at either end of that edge. The ages are measured in terms of the grades of the students, which run from 9 to 12. In fact, grades in the US school system don’t correspond precisely to age since students can start or end their high-school careers early or late, and can repeat grades. (Each student is positioned at random within the interval representing their grade, so as to spread the points out on the plot. Note also that each friendship appears twice, above and below the diagonal.)

One could make a crude measure of assortative mixing by scalar characteristics by adapting the ideas of the previous section. One could group the vertices into bins according to the characteristic of interest (say age) and then treat the bins as separate “types” of vertex in the sense of Section 7.13.1. For instance, we might group people by age in ranges of one year or ten years. This however misses much of the point about scalar characteristics, since it considers vertices falling in the same bin to be of identical types when they may be only approximately so, and vertices falling in different bins to be entirely different when in fact they may be quite similar.

A better approach is to use a covariance measure as follows. Let xi be the value for vertex i of the scalar quantity (age, income, etc.) that we are interested in. Consider the pairs of values (xi, xj) for the vertices at the ends of each edge (i, j) in the network and let us calculate their covariance over all edges as follows. We define the mean of the value of xi at the end of an edge thus:

$$\mu = \frac{\sum_{ij} A_{ij} x_i}{\sum_{ij} A_{ij}} = \frac{1}{2m}\sum_i k_i x_i \qquad (7.77)$$

Scalar characteristics

• A simple solution would be to quantize/bin the scalar values
  - Treat vertices that fall in the same bin as exactly the same
  - Apply the modularity for enumerative characteristics
• The above misses much of the point of scalar characteristics
  - Why are two vertices that fall in the same bin the same?
  - Why are two vertices that fall in different bins different?


• $x_i$ is the value of the scalar characteristic of vertex i

• If $X_i$ is the variable describing the characteristic at vertex i and $X_j$ the one for vertex j, then we want to calculate the covariance of these two variables over the edges of the graph

• First we need to compute the mean value of $x_i$ at the end i of an edge:

$$\mu = \frac{\sum_{ij} A_{ij} x_i}{\sum_{ij} A_{ij}} = \frac{\sum_i k_i x_i}{\sum_i k_i} = \frac{1}{2m}\sum_i k_i x_i$$


• Hence, the covariance is:

$$\mathrm{cov}(x_i, x_j) = \frac{\sum_{ij} A_{ij}\,(x_i - \mu)(x_j - \mu)}{\sum_{ij} A_{ij}} = \dots = \frac{1}{2m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)x_i x_j$$

• If the covariance is positive we have assortative mixing, otherwise disassortative
• As with the modularity Q, we need to normalize the above value so that it equals 1 for a perfectly mixed network

  - All edges fall between vertices that have the same value $x_i$ (highly unlikely in a real network)

• Setting $x_i = x_j$ for all existing edges (i, j) we get the maximum possible value (which is the variance of the variable):

$$\frac{1}{2m}\sum_{ij}\Big(k_i\delta_{ij} - \frac{k_i k_j}{2m}\Big)x_i x_j$$

• Then the assortativity coefficient r is defined as:

$$r = \frac{\sum_{ij}\big(A_{ij} - k_i k_j/2m\big)\,x_i x_j}{\sum_{ij}\big(k_i\delta_{ij} - k_i k_j/2m\big)\,x_i x_j}$$

  - r = 1 → perfectly assortative network
  - r = −1 → perfectly disassortative network
  - r = 0 → no (linear) correlation
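A sketch of the coefficient r for a scalar attribute, assuming a dense NumPy adjacency matrix (the function name and toy graphs are illustrative):

```python
import numpy as np

def scalar_assortativity(A, x):
    """r = sum_ij (A_ij - k_i k_j / 2m) x_i x_j
          / sum_ij (k_i delta_ij - k_i k_j / 2m) x_i x_j."""
    A = np.asarray(A, float)
    x = np.asarray(x, float)
    k = A.sum(axis=1)
    P = np.outer(k, k) / k.sum()     # expected edges k_i k_j / 2m
    xx = np.outer(x, x)
    return ((A - P) * xx).sum() / ((np.diag(k) - P) * xx).sum()

# Two disjoint edges; equal x across each edge -> perfectly assortative:
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = A[2, 3] = A[3, 2] = 1
r_pos = scalar_assortativity(A, [1, 1, 5, 5])    # r = 1
# Rewire so every edge joins unlike values -> perfectly disassortative:
B = np.zeros((4, 4))
B[0, 2] = B[2, 0] = B[1, 3] = B[3, 1] = 1
r_neg = scalar_assortativity(B, [1, 1, 5, 5])    # r = -1
```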

Assortative mixing by degree

• A special case is when the characteristic of interest is the degree of the node
• In this case, studying assortative mixing will show whether high/low-degree nodes connect with other high/low-degree nodes or not

  - Perfectly mixed networks by degree → core/periphery structure
  - Disassortative networks by degree → uniform
    ✓ star-like structures

“… mixing). Once we know the network (and hence the degrees) of all vertices we can calculate r. Perhaps for this reason mixing by degree is one of the most frequently studied types of assortative mixing.”


Figure 7.12: Assortative and disassortative networks. These two small networks are not real networks—they were computer generated to display the phenomenon of assortativity by degree. (a) A network that is assortative by degree, displaying the characteristic dense core of high-degree vertices surrounded by a periphery of lower-degree ones. (b) A disassortative network, displaying the star-like structures characteristic of this case. Figure from Newman and Girvan [249]. Copyright 2003 Springer-Verlag Berlin Heidelberg. Reproduced with kind permission of Springer Science and Business Media.

Assortative mixing by degree

• Assortative mixing by degree requires only structural information about the network
  - No knowledge of any other attributes is required

$$\mathrm{cov}(k_i, k_j) = \frac{1}{2m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)k_i k_j$$

$$r = \frac{\sum_{ij}\big(A_{ij} - k_i k_j/2m\big)\,k_i k_j}{\sum_{ij}\big(k_i\delta_{ij} - k_i k_j/2m\big)\,k_i k_j}$$
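Since only structure is needed, the scalar formula can be reused with $x_i = k_i$. A sketch (the star example is illustrative):

```python
import numpy as np

def degree_assortativity(A):
    """Degree mixing: the scalar assortativity coefficient with x_i = k_i,
    so only the adjacency matrix is required."""
    A = np.asarray(A, float)
    k = A.sum(axis=1)
    P = np.outer(k, k) / k.sum()
    kk = np.outer(k, k)
    return ((A - P) * kk).sum() / ((np.diag(k) - P) * kk).sum()

# A star (one hub joined to leaves) is the textbook disassortative case:
star = np.zeros((4, 4))
star[0, 1:] = star[1:, 0] = 1
r = degree_assortativity(star)   # r = -1: every edge joins unlike degrees
```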

Chapter 7.13, “Networks: An Introduction”. M.E.J. Newman

Common theme

• Compare the real structure with a counterfactual one where connections are generated at random
• In the case of single-dimensional nodal features this is easy
  - Closed-form expressions available
• What about vector attributes?
  - Vectors describe behavior, degree in directed networks, degree in composite networks, etc.

Case study

• Spatial homophily, i.e., homophily with regards to the mobility of people
• Q1: Is there spatial homophily?
• Q2: How much of the observed homophily can be attributed to peer influence?

Spatial behavior

• Mobility is a complex behavior that cannot be reduced to a single number
• Alice’s mobility is described through a vector, each element of which corresponds to the number of times she has been to the corresponding location
• So Q1 now reduces to quantifying assortativity with respect to vector nodal attributes

Naïve approach

• Compute the assortativity coefficient for each vector element
• Obtain the mean/median assortativity over all elements
• Straightforward and easy to compute
  - … but biased, especially when
    ✓ elements are correlated
    ✓ variance is high
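The naïve baseline is easy to sketch: apply the scalar formula per vector element and average. Function names and the toy data are illustrative:

```python
import numpy as np

def scalar_r(A, x):
    # scalar assortativity coefficient for one attribute (slide formula)
    k = A.sum(axis=1)
    P = np.outer(k, k) / k.sum()
    xx = np.outer(np.asarray(x, float), np.asarray(x, float))
    return ((A - P) * xx).sum() / ((np.diag(k) - P) * xx).sum()

def naive_vector_assortativity(A, X):
    """Naive baseline: scalar assortativity per column of X, then the mean
    over all columns (biased when columns are correlated)."""
    A = np.asarray(A, float)
    X = np.asarray(X, float)
    return np.mean([scalar_r(A, X[:, j]) for j in range(X.shape[1])])

# Two disjoint edges; both attribute columns equal across each edge:
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = A[2, 3] = A[3, 2] = 1
X = np.array([[1, 2], [1, 2], [5, 9], [5, 9]], float)
r_bar = naive_vector_assortativity(A, X)   # mean of two perfect coefficients = 1
```

Here the two columns are perfectly correlated, which is exactly the regime where averaging per-element coefficients overstates confidence in the result.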

CQ-method

• Bad clustering will produce bad results
  - Given that clustering is an ill-posed problem, the CQ-method, while elegant, is not expected to be practical

“Matching patterns in networks with multi-dimensional attributes: a machine learning approach”. K. Pelechrinis. Social Network Analysis and Mining, 2014

VA-index

• Return to the basics of statistical inference
  - Randomization
• When performing a hypothesis test we assess the probability of obtaining the observed data simply by chance
  - If this probability is small, we reject the hypothesis that chance played a significant role in the observation
• This is the basis of the VA-index

VA-index in a nutshell

• $\xi_G$: average similarity of connected nodes in the network
• $\xi_{rand}$: expected average similarity if connections were made at random
• $H_0: \xi_{rand} = \xi_G$, $\quad H_1: \xi_{rand} \neq \xi_G$

• Randomize the network and compute

$$s = \frac{\xi_G - \xi_{rand}}{\sqrt{d^2 + \epsilon}}, \qquad d = \xi_G - \xi_{rand}$$
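The randomization idea can be sketched as a permutation test on the average cosine similarity of connected nodes. This is a sketch in the spirit of the VA-index, not the exact published statistic; names and the toy data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_edge_cosine(edges, X):
    """xi_G: average cosine similarity of the attribute vectors at edge ends."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return float(np.mean([Xn[u] @ Xn[v] for u, v in edges]))

def randomization_score(edges, X, n_rand=300):
    """Compare the observed xi_G with xi_rand over attribute-shuffled
    networks and return a z-like score."""
    obs = mean_edge_cosine(edges, X)
    null = np.array([mean_edge_cosine(edges, X[rng.permutation(len(X))])
                     for _ in range(n_rand)])
    return (obs - null.mean()) / (null.std() + 1e-12)

# Two tight attribute clusters, edges only within clusters:
X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
edges = [(0, 1), (2, 3)]
s = randomization_score(edges, X)   # positive: observed exceeds the null mean
```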

Evaluation

• We generate synthetic network datasets → ground-truth assortativity is known
  - Every node is randomly assigned a type
  - Vector attributes are then drawn from a pre-defined multivariate normal distribution
• Baseline assortativity
  - Average assortativity over every vector element

Sensitivity to ξ and ε

Comparison with the baseline

$$e = \frac{|r_{true} - r_s|}{|r_{true} - r_{base}|}$$

Spatial (network) dataset

• Longitudinal dataset
• Gowalla dataset [Scellato et al. 2011]
  - 10,097,713 check-ins
  - 183,709 users
  - 1,407,727 distinct venues
    ✓ each venue is associated with a category (283 in total)
  - 754,430 friendships with a timestamp for the formation time
• Infer home location of users
  - Iterative application of DBSCAN on users’ check-in histories

Is there spatial homophily?

• Users are annotated with a vector attribute
• Cosine similarity is used:

$$sim_{u,v} = \frac{c_u \cdot c_v}{\|c_u\|_2\,\|c_v\|_2}$$

• Randomized networks
  - Pure G(n,m) Erdős–Rényi ensemble
  - Modified G(n,m) ensemble
    ✓ control for the distribution of the home distances of friends


Real network similarity: 0.05425
ER network similarity: [0.00233, 0.0024]
Home-location controlled random network similarity: [0.01834, 0.01837]

• These similarities correspond to s = 0.91 (p-value < 0.05) when the ER ensemble is used and s = 0.31 (p-value < 0.05) when controlling for the home location

“VA-Index: Quantifying Assortativity Patterns in Networks with Multidimensional Nodal Attributes”. K. Pelechrinis, D. Wei. PLOS ONE, 2016

Causes of (spatial) homophily

[Diagram: check-in timelines of Alice, Joe, Jack, and Mark at venues (University Library, Li’s Restaurant, Mike’s Coffee Shop) illustrating peer influence vs. social selection]

Peer influence and social selection

• Do people influence their friends with regards to the locations they visit?
  - What is the geographic scope of this influence?
• Are there venues with specific characteristics that promote friendship creation?

Global peer influence

• Key idea: compare the global spatial similarity between friends before and after their social tie formation ($t_s$)
  - Increase in the similarity after $t_s$ → global peer influence

$$gsim^{b}_{u,v} = \frac{c^b_u \cdot c^b_v}{\|c^b_u\|_2\,\|c^b_v\|_2} \;\text{(inherent similarity)}, \qquad gsim^{a}_{u,v} = \frac{c^a_u \cdot c^a_v}{\|c^a_u\|_2\,\|c^a_v\|_2}$$

• $c^b_u(i)$ = number of check-ins u has performed at venue i before becoming friends with v

[Plot: gsim between friends before vs. after tie formation, against the distance between home locations of friend pairs (km)]

• At most 11.32% of the global spatial similarity between friends can be attributed to peer influence
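The before/after comparison can be sketched as follows; the (venue, timestamp) check-in format and the function name are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def gsim(checkins_u, checkins_v, t_s, before=True):
    """Cosine similarity of u's and v's venue-count vectors restricted to
    check-ins before (or after) the tie-formation time t_s."""
    keep = (lambda t: t < t_s) if before else (lambda t: t >= t_s)
    cu = Counter(venue for venue, t in checkins_u if keep(t))
    cv = Counter(venue for venue, t in checkins_v if keep(t))
    venues = sorted(set(cu) | set(cv))
    if not venues:
        return 0.0
    xu = np.array([cu[w] for w in venues], float)
    xv = np.array([cv[w] for w in venues], float)
    nu, nv = np.linalg.norm(xu), np.linalg.norm(xv)
    return float(xu @ xv / (nu * nv)) if nu and nv else 0.0

u = [("library", 1), ("cafe", 2), ("library", 5)]
v = [("library", 1), ("pub", 3), ("library", 6)]
sim_before = gsim(u, v, t_s=4)               # 0.5
sim_after = gsim(u, v, t_s=4, before=False)  # 1.0
```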

• 11.32% is just an upper bound on the effect of global peer influence
• Global similarity can naturally increase with time
  - Null/reference model
  - 9% of the observed global similarity can be explained by the increasing nature of gsim with time
• Factoring in the above result, global peer influence can explain only up to 2.32% of the observed global similarity

Contextual dependencies

[Plots: gsim before vs. after tie formation for the real data and the null model, broken down by venue category (Coffee Shop, Food, Shopping, Pub, Airport), against the distance between home locations of friend pairs (km)]

Local peer influence

• While people do not influence their friends at a global scale, can they influence each other in a localized manner?
  - E.g., around each other’s home locations?
• We cannot use the above statistical analysis to estimate the levels of local peer influence
  - Randomization to the rescue!
• We devise another test, for which we need pairs of friends that had no common check-in before becoming friends but have at least one common co-location at one’s home location after forming their social tie

  - 43,618 such pairs

Local peer influence

• Using our data we obtain a local similarity value:

$$lsim^{data}_{u,v} = \frac{c^v_u \cdot c^v_v}{\|c^v_u\|_2\,\|c^v_v\|_2}$$

• This similarity is not necessarily due to local peer influence
• We compare it with two baseline similarity values
  - RRM: Random Reference Model
  - PRM: Popularity-based Reference Model

• RRM and PRM retain the structure of the original check-ins of u

$$lsim^{RRM}_{u,v} = \frac{c^v_{u,RRM} \cdot c^v_v}{\|c^v_{u,RRM}\|_2\,\|c^v_v\|_2}, \qquad lsim^{PRM}_{u,v} = \frac{c^v_{u,PRM} \cdot c^v_v}{\|c^v_{u,PRM}\|_2\,\|c^v_v\|_2}$$

[Plot: the observed local similarity lsim compared against the RRM and PRM reference models, per venue category]

… main reason for the levels of local similarity (e.g., for “Airports”, the curve obtained from the real data is almost on top of the curve for PRM). Table 2 summarizes the progressive percentage of local similarity between friends that can be explained by the two reference models (RRM and PRM), as well as the maximum possible effect of local influence. As we can see, RRM can explain approximately 44% of the observed local similarity when considering all the check-ins, while an additional 16% can be attributed to PRM. Consequently, local peer influence mechanisms can explain up to almost 40% of the local similarity between friends. Hence, friends are influenced more easily when they are in proximity as compared to a global scale. Moreover, the levels of local influence are context dependent. For instance, there appears to be no local peer influence in “Airports” (upper bound of local peer influence is only 10%), but significant levels are observed in “Pubs” (approximately 64%).

Venue type  | % explained by RRM | additional % explained by PRM | Upper bound on local peer influence
All Venues  | 43.93%             | 16.29%                        | 39.78%
Coffee Shop | 26.32%             | 21.47%                        | 52.21%
Food        | 56.02%             |  8.63%                        | 35.35%
Shopping    | 47.14%             |  1.31%                        | 51.55%
Pub         | 12.55%             | 23.63%                        | 63.82%
Airport     |  6.84%             | 82.94%                        | 10.22%

Table 2: Progressive percentage of local similarity that can be attributed to RRM, PRM and local peer influence.

4 Social Selection

As alluded to in Section 1, social selection works between people that have high levels of similarity and can cause the creation of friendships. We further saw in the previous section that users exhibit some inherent similarity, which also increases with time. Also, by observing the absolute global similarity values of the null model in Section 3, we find that pairs of non-friends exhibit significant levels of global similarity as well. Why then does social selection work with specific pairs and not with others?

When examining similar questions, we need to be cautious and in particular to avoid confusing actual similarity with what we call “trivial” similarity. For instance, there are places that most of the people living in a city will visit, e.g., subway station(s), city hall, etc. Such places introduce trivial similarity, and it does not necessarily mean that the social selection mechanism will be triggered and these people will form social ties. In order to avoid confusion, we would like to emphasize here that trivial similarity is also part of the inherent similarity between two people⁵. However, it does not add valuable information.

In this section we seek to answer the question posed above and investigate the dynamics of the social selection mechanism. In particular, we examine the (network) characteristics of the venues that appear to trigger the selection mechanism. More specifically, we consider pairs of friends that were check-in co-located prior to becoming friends, and hence exhibited a non-zero level of similarity. In total, we have 84,460 such pairs. For these pairs of friends, we analyze the categories of the places at which they were co-located prior to becoming friends, the degree (i.e., number of check-ins) of these places, their clustering coefficient, as well as their entropy (to be defined later). As we will see in what follows, all of the above metrics exhibit the same properties for the friends pairs considered.

However, the above results alone are not conclusive. In particular, the common locations of other pairs of users that have been check-in co-located but never formed a social tie can also exhibit the same features. Hence, in order to avoid this sampling bias and to be able to draw safe conclusions, we need a reference group for comparison. We randomly pick 84,460 pairs of users that have been check-in co-located (hence, having some non-zero level of similarity) but never became friends, and we calculate the same metrics for their common locations. Note here that our random sampling retains in the reference group the same distribution of home location distances as that for the check-in co-located pairs that eventually became friends. In particular, if there are µ pairs of friends with home distances in the range [1, 2], we sample uniformly at random µ reference pairs with home distances in the same range.

To preview our results, the features of the common venues of the reference group exhibit significant differences as compared with those of the friends’ pairs. Furthermore, the common venues of the reference group pairs manifest characteristics of locations that introduce trivial similarity (e.g., large degree, low clustering coefficient, large entropy)! These results clearly indicate that the social selection mechanism works upon non-trivial similarity and can be stimulated by venues with specific features.

In what follows we introduce the venue metrics we examine and present the details of our results.

Venue Category: As mentioned in Section 2, every spot in Gowalla is labeled with a category depending on the type of place, and there are 283 possible categories. Hence, we compute the category probability mass function of the common locations for the pairs in the two groups (friends and reference). Figure 6 depicts our results. As we can see, for pairs of users in the reference group, the mass function exhibits a clear 4-modal distribution. On the contrary, the corresponding mass function for the pairs of friends is closer to a uniform distribution. The four category-modes of the reference mass function are: “Convention center”, “Interactive”, “Airport” and “Travel/Lodging”. As we can see, these are types of places that people can co-locate at, not necessarily because of their similarity or common interests. For instance, there …

⁵ Actually, it might be the case that the “trivial” part of the inherent similarity is the major part that varies with time. Nevertheless, further examining this is beyond the scope of our work.

Social selection

• The social selection mechanism leads people that are similar to form a social tie
  - However, pairs of non-friends can also exhibit significant levels of similarity
  - Why is social selection triggered for some pairs but not for others?
• Key point: “trivial” similarity
• What are the (network) characteristics of the venues at which friends co-locate before becoming connected?
  - Compare with a reference model

Venue degree

[Plot: venue degree deg(π) of common venues for friends vs. reference pairs, against the distance between home locations of pairs (km)]

deg(π) = total number of check-ins at π

Venue clustering coefficient


$$CC_\pi = \frac{k}{n_\pi(n_\pi - 1)/2}$$

where $n_\pi$ = number of unique users that have visited π, and k = number of friendship pairs among the $n_\pi$ users
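A direct sketch of CC_π; the input format (a visitor set plus a set of friendship pairs) is an assumption for illustration:

```python
from itertools import combinations

def venue_clustering(visitors, friendships):
    """CC_pi = k / (n_pi (n_pi - 1) / 2): the fraction of pairs of the
    venue's n_pi unique visitors that are friends."""
    n = len(visitors)
    if n < 2:
        return 0.0
    k = sum(1 for a, b in combinations(sorted(visitors), 2)
            if frozenset((a, b)) in friendships)
    return k / (n * (n - 1) / 2)

# Three visitors, one friendship among the three possible pairs:
cc = venue_clustering({"u1", "u2", "u3"}, {frozenset(("u1", "u2"))})  # 1/3
```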

Venue entropy

[Plot: venue entropy e(π) of common venues for friends vs. reference pairs, against the distance between home locations of pairs (km)]

$$e(\pi) = -\sum_{u:\,u\in S \,\wedge\, P_\pi(u)>0} P_\pi(u)\,\log P_\pi(u)$$
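A direct sketch of e(π) from a venue's list of check-in user ids, taking P_π(u) as the fraction of the venue's check-ins performed by user u (input format assumed for illustration):

```python
import math
from collections import Counter

def venue_entropy(checkin_users):
    """e(pi) = -sum_u P_pi(u) log P_pi(u), where P_pi(u) is the fraction
    of the venue's check-ins performed by user u."""
    counts = Counter(checkin_users)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

e_shared = venue_entropy(["u1", "u2", "u1", "u2"])   # log 2: evenly shared
e_single = venue_entropy(["u1", "u1", "u1"])         # 0: a single regular
```

Low entropy marks venues dominated by a few regulars, the kind of place the slides associate with social selection rather than trivial similarity.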

Social selection mechanism…

• Is triggered by non-trivial similarity
• Can be stimulated by venues with specific network characteristics
  - Low entropy
  - High clustering coefficient
  - Low degree

“Understanding Spatial Homophily: The Case of Peer Influence and Social Selection”. K. Zhang, K. Pelechrinis. WWW 2014

Take-away messages

• Randomization is a simple yet powerful method for quantifying the significance of a pattern
• Depending on the application, different ways of randomizing might be more appropriate
  - Uniform (e.g., Erdős–Rényi)
  - Controlling for various characteristics (either network characteristics such as the degree, or external ones such as the home location)

Conclusions for spatial homophily

• The spatial similarity of friends at a global scale cannot be attributed to peer influence
• Peer influence can explain on average up to 40% of the geographically local spatial similarity between friends
• The levels of local peer influence differ depending on the type of location we consider
• The social selection mechanism works upon non-trivial similarity and can be triggered by venues with specific network characteristics

Thank you!
