Section 7.13: Homophily (Or Assortativity)

By: Ralucca Gera, NPS

Are hubs adjacent to hubs?

• How does a node’s degree relate to its neighbors’ degrees?
• Real networks usually show a non-zero degree correlation (defined later, relative to a random network).
• If it is positive, the network is said to have assortatively mixed degrees (assortativity based on other attributes can also be considered).
• If it is negative, it is disassortatively mixed.
• According to Newman, social networks tend to be assortatively mixed, while other kinds of networks are generally disassortatively mixed.

Partitioning the network

Sociologists have observed network partitioning based on the following characteristics:
– Friendships, acquaintances, business relationships
– Relationships based on certain characteristics:
  • Age
  • Nationality
  • Language
  • Education
  • Income level

Homophily: the tendency of individuals to choose friends with similar characteristics. “Like links with like.”

Example of homophily

http://www.slideshare.net/NicolaBarbieri/homophily-and-influence-in-social-networks

Assortativity by race

James Moody

Homophily (or assortative mixing)

Friendship networks at a US high school

• Homophily (or assortative mixing) is the strong tendency of people to associate with others whom they perceive as similar to themselves in some way.
• Disassortative mixing is the tendency of people to associate with others who are unlike them (such as gender in sexual contact networks, where the majority of partnerships are between opposite-sex individuals).

Assortative mixing

Twitter data: political retweet network
Red = Republicans
Blue = Democrats

Note that users mostly tweet and retweet within their own group.

Conover et al., 2011

Homophily (or assortative mixing)

Homophily or assortativity is a common (but not necessary) property of social networks:
– Papers in citation networks tend to cite papers in the same field
– Websites tend to point to websites in the same language
– Political views
– Race
– Obesity

Homophily in Gephi

See the Gephi layout tutorial: https://gephi.org/tutorials/gephi-tutorial-layouts.pdf

Homophily in Python

• In NetworkX, to check degree assortativity: r = nx.degree_assortativity_coefficient(G)

• To check an attribute’s assortativity: assortativity_val = nx.attribute_assortativity_coefficient(G, "color"). The attribute "color" can be replaced by any other attribute that your data is tagged with.
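The two NetworkX calls can be tried on a small graph. The sketch below is only an illustration: the toy graph (two triangles joined by one edge) and the "color" values are assumptions made up for the example, not data from the lecture.

```python
import networkx as nx

# Toy graph (an assumption for illustration): two triangles joined by one edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])

# Tag every node with a hypothetical "color" attribute.
colors = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}
nx.set_node_attributes(G, colors, "color")

# Assortativity by degree (a scalar, topological characteristic).
r_degree = nx.degree_assortativity_coefficient(G)

# Assortativity by the enumerative attribute "color".
r_color = nx.attribute_assortativity_coefficient(G, "color")

print(f"degree assortativity: {r_degree:.3f}")
print(f"color assortativity:  {r_color:.3f}")
```

Here six of the seven edges join nodes of the same color, so the color coefficient comes out clearly positive, while the degree coefficient is slightly negative because the two degree-3 nodes mostly attach to degree-2 nodes.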

Disassortative

• Disassortative mixing: “like links with unlike”.
• Disassortative networks are the ones in which adjacent nodes tend to be dissimilar:
– Dating network (females/males)
– Food web (predator/prey)
– Economic networks (producers/consumers)

Assortative mixing (homophily)

We will study two types of assortative mixing:
1. Based on enumerative characteristics (the characteristics don’t fall in any particular order), such as:
   – Nationality
   – Race
   – Gender
2. Based on scalar characteristics, such as:
   – Age
   – Income
   – Degree: high-degree vertices connect to high-degree vertices

Based on enumerative characteristics (characteristics that don’t fall in any particular order), such as:
1. Nationality
2. Race
3. Gender

Possible definition of assortativity

• A network is assortative if a significant fraction of its edges run between same-type vertices.
• How do we quantify how assortative the network is?

• Let $c_i$ be the assortative class of vertex $i$. Then:

$$\frac{\#\{\text{edges } ij : c_i = c_j\}}{|E(G)|}$$

What is the assortativity if we consider V(G) as the only class? Does it make sense?
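The question about V(G) as the only class can be explored numerically. The sketch below reads this candidate definition as "the fraction of edges whose two endpoints share a class"; the toy graph and the class labels are assumptions chosen for illustration.

```python
def same_class_fraction(edges, cls):
    """Fraction of edges whose two endpoints belong to the same class."""
    same = sum(1 for i, j in edges if cls[i] == cls[j])
    return same / len(edges)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
color = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}

# Degenerate labelling: every vertex is put in one single class.
one_class = {v: "all" for v in color}

frac_color = same_class_fraction(edges, color)
frac_trivial = same_class_fraction(edges, one_class)

print(frac_color)    # 6 of the 7 edges stay within a color
print(frac_trivial)  # every edge is trivially "within class"
```

With a single class the fraction is always 1 regardless of the structure, which suggests why the raw fraction needs to be compared against what a random graph would give.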

Alternative definitions

Alternative definitions, as before, allow us to compare the assortativity of the current network to that of a random graph:

• Compute the fraction of edges within classes in the given network
• Compute the fraction of edges within classes in a random graph
• Consider their difference

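Compute the fraction of edges within classes in the given network

• Let $c_i$ be the class of vertex $i$
• Let $n_c$ be the total number of classes
• Let $\delta(c_i, c_j)$ be the Kronecker delta, which checks if vertices $i$ and $j$ are in the same class: it is 1 if $c_i = c_j$ and 0 otherwise.
• Then the number of edges of the same type is:

$$\sum_{ij \in E(G)} \delta(c_i, c_j) = \frac{1}{2} \sum_{ij} A_{ij}\, \delta(c_i, c_j)$$

where $A_{ij}$ is the entry of the adjacency matrix for vertices $i$ and $j$.

Compute the fraction of edges within classes in the random network

• Construct a random graph with the same degree distribution.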

• Let $c_i$ be the class of vertex $i$
• Let $n_c$ be the total number of classes
• Let $\delta(c_i, c_j)$ be the Kronecker delta, which checks if vertices are in the same class
• Pick an arbitrary edge in the random graph:
– pick a vertex $i$: there are $\deg i$ edges incident with it, so $\deg i$ choices for $i$ to be the first end vertex of our arbitrary edge
– and then there are $\deg j$ choices for $j$ to be the other end
• If edges are placed at random, the expected number of edges between $i$ and $j$ is

$$\frac{\deg i \cdot \deg j}{2|E(G)|}$$

where $2|E(G)|$ is the number of vertices at the ends of the $|E(G)|$ edges.

• With $m = |E(G)|$ and $k_i = \deg i$, the expected number of edges between $i$ and $j$ if $m$ edges are placed at random is $\frac{k_i k_j}{2m}$
• Then the expected number of edges of the same type is:

$$\frac{1}{2} \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j)$$

where $\delta(c_i, c_j)$ is the Kronecker delta (1 if vertices $i$ and $j$ are in the same class, 0 otherwise). The factor $\frac{1}{2}$ removes duplications: as you choose vertex $j$ above, the edge $ji$ will be counted after edge $ij$ was counted.

Consider their difference

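$$\frac{1}{2} \sum_{ij} A_{ij}\, \delta(c_i, c_j) - \frac{1}{2} \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j) = \frac{1}{2} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

Here $\delta(c_i, c_j)$ checks if vertices are in the same class.

And now normalize by $m = |E(G)|$:

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

This is the same definition as the modularity in community detection (assortativity based on degree).

Modularity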

• $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$, where $\delta(c_i, c_j)$ checks if vertices are in the same class
• Measure used to quantify how much like vertices are connected to like vertices
• $-1 < Q < 0$ means there are fewer edges between like vertices in a class compared to a random network, i.e. a disassortative network
• $0 < Q < 1$ means there are more edges between like vertices in a class compared to a random network, i.e. an assortative network
• $Q = 0$ means it behaves like a random network.
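The modularity can be evaluated directly from its definition as a double sum over ordered vertex pairs. This is a minimal sketch for a simple undirected graph (no self-loops or multi-edges); the toy graph and class labels are assumptions for illustration, and the O(n²) loop is meant for reading, not for large networks.

```python
from itertools import product

def modularity(edges, cls):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    nodes = sorted(cls)
    m = len(edges)
    # Symmetric adjacency lookup: A_ij = 1 iff (i, j) is in this set.
    adj = {(i, j) for i, j in edges} | {(j, i) for i, j in edges}
    deg = {v: 0 for v in nodes}
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    total = 0.0
    for i, j in product(nodes, repeat=2):   # all ordered pairs, including i == j
        if cls[i] == cls[j]:                # delta(c_i, c_j) = 1
            a_ij = 1 if (i, j) in adj else 0
            total += a_ij - deg[i] * deg[j] / (2 * m)
    return total / (2 * m)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
color = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}
one_class = {v: "all" for v in color}

q_color = modularity(edges, color)
q_single = modularity(edges, one_class)
print(q_color, q_single)
```

On this graph the color partition gives a positive Q, while putting all vertices into one class gives Q = 0, since the observed and expected within-class edge counts then coincide.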

Ranges of the modularity values

• Range: $-1 < Q < 1$
• However, it never reaches 1 (we’ll see shortly), even in a perfectly assortative network (where all edges join like nodes)
• The modularity is generally far from 1
• So what is a good modularity value (Q > ?) that says the network is assortative?
• Should we normalize Q again? What are some choices?

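Enumerative characteristics

• We normalize the modularity value $Q$ by the maximum value that it can take
– Perfect mixing is when all edges fall between vertices of the same type, so that $\sum_{ij} A_{ij}\, \delta(c_i, c_j) = 2m$ and

$$Q_{\max} = \frac{1}{2m} \left( 2m - \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j) \right)$$

• Then, the assortativity coefficient is given by:

$$-1 \le \frac{Q}{Q_{\max}} = \frac{\sum_{ij} \left( A_{ij} - k_i k_j / 2m \right) \delta(c_i, c_j)}{2m - \sum_{ij} \left( k_i k_j / 2m \right) \delta(c_i, c_j)} \le 1$$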
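The normalization by the maximum modularity can be sketched in a few lines. The code assumes a simple undirected graph given as an edge list; the toy graph and labels are made up for illustration.

```python
def assortativity_enumerative(edges, cls):
    """Return (Q, Q_max, Q / Q_max) for class labels cls on an undirected edge list."""
    m = len(edges)
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    # sum_ij A_ij delta(c_i, c_j): each same-class edge is counted as ij and ji.
    observed = 2 * sum(1 for i, j in edges if cls[i] == cls[j])
    # sum_ij (k_i k_j / 2m) delta(c_i, c_j), over all ordered pairs (i == j included).
    expected = sum(deg[i] * deg[j]
                   for i in deg for j in deg if cls[i] == cls[j]) / (2 * m)
    q = (observed - expected) / (2 * m)
    q_max = (2 * m - expected) / (2 * m)
    return q, q_max, q / q_max

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
color = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}

q, q_max, r = assortativity_enumerative(edges, color)
print(q, q_max, r)
```

For this graph Q is 5/14 and the maximum is 1/2, so the normalized coefficient is 5/7: Q alone understates how assortative the coloring is. The ratio is undefined when all vertices share one class, because the maximum is then 0.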

Alternative form for edge list

• An alternative, faster-to-compute formula, if the network is given by a list of edges rather than an adjacency matrix:
• Let $e_{rs} = \frac{1}{2m} \sum_{ij} A_{ij}\, \delta(c_i, r)\, \delta(c_j, s)$ be the fraction of edges that join vertices of type $r$ to vertices of type $s$
• Let $a_r = \frac{1}{2m} \sum_i k_i\, \delta(c_i, r)$ be the fraction of ends of edges attached to vertices of type $r$
• We further have that $\delta(c_i, c_j) = \sum_r \delta(c_i, r)\, \delta(c_j, r)$

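Substituting into the modularity:

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \sum_r \delta(c_i, r)\, \delta(c_j, r)$$

$$= \sum_r \left[ \frac{1}{2m} \sum_{ij} A_{ij}\, \delta(c_i, r)\, \delta(c_j, r) - \frac{1}{2m} \sum_i k_i\, \delta(c_i, r) \cdot \frac{1}{2m} \sum_j k_j\, \delta(c_j, r) \right]$$

$$= \sum_r \left( e_{rr} - a_r^2 \right)$$

where $a_r = \frac{1}{2m} \sum_i k_i\, \delta(c_i, r)$ is the fraction of ends of edges attached to nodes of type $r$, and $e_{rs} = \frac{1}{2m} \sum_{ij} A_{ij}\, \delta(c_i, r)\, \delta(c_j, s)$ is the fraction of edges between types $r$ and $s$.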
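The edge-list form needs only one pass over the edges, accumulating the within-class fractions $e_{rr}$ and the edge-end fractions $a_r$. A minimal sketch, assuming an undirected edge list and hypothetical class labels:

```python
from collections import Counter

def modularity_from_edge_list(edges, cls):
    """Q = sum_r (e_rr - a_r^2), accumulated in a single pass over the edges."""
    m = len(edges)
    e_same = Counter()  # e_rr: fraction of edges with both ends in class r
    a = Counter()       # a_r: fraction of edge ends attached to class r
    for i, j in edges:
        a[cls[i]] += 1 / (2 * m)
        a[cls[j]] += 1 / (2 * m)
        if cls[i] == cls[j]:
            # The edge appears as ij and ji in the double sum: 2 / (2m) = 1 / m.
            e_same[cls[i]] += 1 / m
    return sum(e_same[r] - a[r] ** 2 for r in a)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
color = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}

q_edge_list = modularity_from_edge_list(edges, color)
print(q_edge_list)
```

The cost is linear in the number of edges, compared with the quadratic double sum over vertex pairs, and on this graph it gives the same value, 5/14.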

Based on scalar characteristics, such as:
1. Age
2. Income
3. Degree

Scalar characteristics

• Scalar characteristics take numerical values that come in a particular order (unlike enumerative characteristics)
– Such as age, income
– Age: in this case two people can be considered similar if they are born the same day, or within a year, or within x years, etc.
• If people are friends with others of the same age, we consider the network assortatively mixed by age (or stratified by age)

Assortativity by race

James Moody

Assortativity by grade/age

James Moody

Scalar characteristics

• When we consider scalar characteristics, we have an approximate notion of similarity between adjacent vertices (i.e. how far apart or close the values are)
– No such approximate similarity can be measured for enumerative characteristics; there, a shared characteristic is simply present or absent

Recall

Friendship networks at a US high school

• Homophily (or assortative mixing) is the strong tendency of people to associate with others whom they perceive as similar to themselves in some way.
• Disassortative mixing is the tendency of people to associate with others who are unlike them (such as gender in sexual contact networks, where the majority of partnerships are between opposite-sex individuals).

Scalar characteristics

Friendships at the same US high school: each dot represents a friendship (an edge from the network).

• Denser along the y = x line (because of the way data is displayed)
• Sparser as the difference in grades increases

Strongly assortative

• Top: scatter plot of 1141 married couples
• Bottom: same data showing a histogram of the age difference
• Data: 1995 US National Survey of Family Growth

Newman, Phys. Rev. E 67, 026126 (2003)

Scalar characteristics

• How do we measure this assortative mixing?
• Would the idea of the enumerative assortative mixing work?
• That is, place vertices in bins based on scalar values:
– Treat vertices that fall in the same bin (such as age) as “like vertices” or “identical”
– Apply the modularity metric for enumerative characteristics

Scalar characteristics

This approach misses much of the point of the scalar characteristics:
– Why are two vertices identical if they are in the same bin? Should they only be approximately the same type?
– Why are two vertices that fall in different bins different? That is, if the bins are close enough (1 grade apart), could the vertices be more related than those that fall farther apart (like 4 grades apart)?
– We don’t have just a binary choice as we did for enumerative characteristics

Scalar characteristics

• We will use a covariance measure (next slide)

• Let $x_i$ (or $x_j$) be the value of the scalar characteristic of vertex $i$ ($j$), such as the age

• Then the mean $\mu$ of the value of $x_i$ at the end of an edge, over all edges of the graph (not just the mean value of $x_i$ over vertices), is:

$$\mu = \frac{\sum_{ij} A_{ij}\, x_i}{\sum_{ij} A_{ij}} = \frac{\sum_i k_i x_i}{2m}$$

since a vertex of degree $k_i$ appears at the end of $k_i$ edges.

• And so $\mu = \frac{1}{2m} \sum_i k_i x_i$
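The difference between the mean over edge ends and the plain mean over vertices can be checked on a small example. The graph and the "age" values below are assumptions made up for illustration.

```python
def edge_end_mean(edges, x):
    """Mean of x over the 2m edge ends, i.e. (1/2m) * sum_i k_i x_i."""
    ends = [x[i] for i, j in edges] + [x[j] for i, j in edges]
    return sum(ends) / len(ends)

def vertex_mean(x):
    """Plain mean of x over the vertices."""
    return sum(x.values()) / len(x)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
age = {0: 14, 1: 14, 2: 15, 3: 17, 4: 17, 5: 18}  # hypothetical ages

mu_edges = edge_end_mean(edges, age)
mu_vertices = vertex_mean(age)
print(mu_edges, mu_vertices)
```

The two means differ because vertices 2 and 3 have degree 3 and are therefore counted three times among the edge ends, while every other vertex is counted twice.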

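Scalar characteristics

• What is the covariance of $x_i$ and $x_j$ (the values of the scalar characteristic at the two ends of an edge) over all edges of the graph?

$$\operatorname{cov}(x_i, x_j) = \frac{\sum_{ij} A_{ij} (x_i - \mu)(x_j - \mu)}{\sum_{ij} A_{ij}} = \dots = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j$$

where $\mu = \frac{1}{2m} \sum_i k_i x_i$ is the mean over edge ends, and the product of values $x_i x_j$ is a function of two variables.

• Which is similar to the enumerative modularity

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

where $\delta(c_i, c_j)$ is either 0 or 1.

Scalar characteristics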

• If the covariance is positive, we have assortative mixing
• If the covariance is negative, we have disassortative mixing
• As with the modularity Q, we need to normalize the above covariance so that it equals 1 for a perfectly assortative network
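The sign of the covariance can be seen on a toy example. The sketch below computes $\frac{1}{2m}\sum_{ij}(A_{ij} - k_i k_j/2m)\,x_i x_j$ from an edge list; the graph and the ages are hypothetical values chosen so that young vertices mostly link to young ones.

```python
def edge_covariance(edges, x):
    """cov = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) x_i x_j for an undirected edge list."""
    m = len(edges)
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    # sum_ij A_ij x_i x_j: every edge contributes twice (as ij and as ji).
    observed = 2 * sum(x[i] * x[j] for i, j in edges)
    # sum_ij (k_i k_j / 2m) x_i x_j factorizes as (sum_i k_i x_i)^2 / 2m.
    kx = sum(deg[v] * x[v] for v in deg)
    expected = kx * kx / (2 * m)
    return (observed - expected) / (2 * m)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
age = {0: 14, 1: 14, 2: 15, 3: 17, 4: 17, 5: 18}  # hypothetical ages

cov_age = edge_covariance(edges, age)
print(cov_age)
```

The value is positive here, consistent with assortative mixing by age, but its size depends on the units of the scalar, which is why a normalized coefficient is still needed.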

Scalar characteristics

Recall

$$\operatorname{cov}(x_i, x_j) = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j$$

• Setting $x_j = x_i$ for all the existing edges $(i,j)$, we get the maximum possible value (which is the variance of the variable) for a perfectly assortative network:

$$\frac{1}{2m} \sum_{ij} \left( A_{ij}\, x_i^2 - \frac{k_i k_j}{2m}\, x_i x_j \right) = \frac{1}{2m} \sum_{ij} \left( k_i \delta_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j$$

• This can then be used for comparison

Scalar characteristics

Then the assortativity coefficient r is defined as:

$$r = \frac{\sum_{ij} \left( A_{ij} - k_i k_j / 2m \right) x_i x_j}{\sum_{ij} \left( k_i \delta_{ij} - k_i k_j / 2m \right) x_i x_j}$$

Similar to the enumerative one again.

• r = 1: perfectly assortative network
• r = −1: perfectly disassortative network
• r = 0: no correlation

Similar to the Pearson correlation coefficient
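The link to the Pearson correlation coefficient can be verified numerically: r computed from the formula should match the Pearson correlation of the value pairs taken over both orientations of every edge. A minimal sketch, with an assumed toy graph and hypothetical ages (it divides by zero if all values are identical):

```python
from math import sqrt

def scalar_assortativity(edges, x):
    """r = sum_ij (A_ij - k_i k_j/2m) x_i x_j / sum_ij (k_i delta_ij - k_i k_j/2m) x_i x_j."""
    m = len(edges)
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    kx = sum(deg[v] * x[v] for v in deg)
    expected = kx * kx / (2 * m)                        # sum_ij (k_i k_j / 2m) x_i x_j
    observed = 2 * sum(x[i] * x[j] for i, j in edges)   # sum_ij A_ij x_i x_j
    maximum = sum(deg[v] * x[v] ** 2 for v in deg)      # sum_ij k_i delta_ij x_i x_j
    return (observed - expected) / (maximum - expected)

def pearson_over_edge_ends(edges, x):
    """Pearson correlation of (x_i, x_j) over both orientations of each edge."""
    xs = [x[i] for i, j in edges] + [x[j] for i, j in edges]
    ys = [x[j] for i, j in edges] + [x[i] for i, j in edges]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
age = {0: 14, 1: 14, 2: 15, 3: 17, 4: 17, 5: 18}  # hypothetical ages

r_age = scalar_assortativity(edges, age)
r_pearson = pearson_over_edge_ends(edges, age)
print(r_age, r_pearson)
```

Both routes give the same number on this example, which is the sense in which r is a Pearson correlation measured across edges.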

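Simpler reformulation for Corr. Coeff.

$$r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}$$

$r = 0.621$ for the network below (strongly assortative)

$e_{ij}$ is the fraction of edges in a network that connect a vertex of type $i$ to one of type $j$, from the matrix above; $a_i = \sum_j e_{ij}$ and $b_j = \sum_i e_{ij}$ are its row and column sums (equal for undirected networks).

Ref: Newman, Phys. Rev. E 67, 026126 (2003)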

Computer Science faculty

88 Computer Science faculty:
• Vertices are PhD-granting institutions in North America
• Edge (i,j) means that a person received a PhD at i and is now faculty at j
• Labels are US census regions + Canada

Five Lectures on Networks, by Aaron Clauset

Assortative

• Communities are tight and dense groups that connect to other dense groups

Five Lectures on Networks, by Aaron Clauset

Community structure

Five Lectures on Networks, by Aaron Clauset

By degree: high-degree vertices connect to high-degree vertices

Assortative mixing by degree

• A special case is when the characteristic of interest is the degree of the node
• Commonly used in social networks (the most used of the scalar characteristics)
• More interesting since degree is a topological property of the network (not just a value, as with the previously seen scalars)
• In this case, studying the assortative mixing will show whether high/low-degree nodes connect with other high/low-degree nodes or not

Assortative mixing by degree

• Assortative network by degree: a core of high-degree nodes and a periphery of low-degree nodes (Figure (a))
• Disassortative network by degree: uniform (low-degree nodes adjacent to high-degree nodes) (Figure (b))

Assortative mixing by degree

• Measured like the other scalar assortative mixing
• Requires the same adjacency information of the network

The scalar is the degree:

$$\operatorname{cov}(k_i, k_j) = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) k_i k_j$$

• The correlation coefficient is the same normalized value:

$$\text{degcorr\_coeff} = \frac{\sum_{ij} \left( A_{ij} - k_i k_j / 2m \right) k_i k_j}{\sum_{ij} \left( k_i \delta_{ij} - k_i k_j / 2m \right) k_i k_j}$$

Bottom line
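Taking the degree itself as the scalar gives a short pure-Python version of the degree correlation coefficient. This is a sketch for simple undirected graphs; both example graphs are assumptions chosen for illustration, and the function is undefined for regular graphs (all degrees equal), where the denominator is 0.

```python
def degree_assortativity(edges):
    """Degree correlation coefficient, with the degree k_i used as the scalar x_i."""
    m = len(edges)
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    k2 = sum(k * k for k in deg.values())                   # sum_i k_i * k_i
    expected = k2 * k2 / (2 * m)                            # sum_ij (k_i k_j / 2m) k_i k_j
    observed = 2 * sum(deg[i] * deg[j] for i, j in edges)   # sum_ij A_ij k_i k_j
    maximum = sum(k ** 3 for k in deg.values())             # sum_ij k_i delta_ij k_i k_j
    return (observed - expected) / (maximum - expected)

# Star graph (assumed): one hub attached to four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# Two triangles joined by the edge (2, 3).
triangles = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

r_star = degree_assortativity(star)
r_triangles = degree_assortativity(triangles)
print(r_star, r_triangles)
```

The star is the extreme disassortative case: every edge joins the degree-4 hub to a degree-1 leaf, so the coefficient is exactly −1. The two-triangle graph is only mildly disassortative.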

• Same formula:

$$-1 \le \frac{Q}{Q_{\max}} = \frac{\sum_{ij} \left( A_{ij} - k_i k_j / 2m \right) \delta(c_i, c_j)}{2m - \sum_{ij} \left( k_i k_j / 2m \right) \delta(c_i, c_j)} \le 1$$

• If enumerative: either the property is or is not present (the Kronecker delta function is used)
• If scalar: use the value of the scalar instead of the Kronecker delta function
– In particular, one could use the degree

Examples

Newman, Phys. Rev. E 67, 026126 (2003)

Same slide with more details

Newman, Phys. Rev. E 67, 026126 (2003)

Assortative mixing by degree

• The coefficient of assortative mixing by degree, r, measures how connected the vertices of the same degree are (compared to random graphs)
• In general, the absolute value of the assortativity coefficient r is not large in observed networks (the correlation between the degrees is not especially strong)
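One way to make "compared to random graphs" concrete is to compare the observed coefficient with the values obtained after randomizing the network while keeping every degree fixed. The sketch below uses degree-preserving double edge swaps from NetworkX as the random baseline; the choice of the karate club graph, the number of swaps, and the number of repetitions are all assumptions for illustration, not a procedure taken from the lecture.

```python
import networkx as nx

G = nx.karate_club_graph()
m = G.number_of_edges()
r_observed = nx.degree_assortativity_coefficient(G)

# Degree-preserving randomization: each swap replaces edges u-v and x-y
# by u-x and v-y, so every node keeps its degree and the graph stays simple.
r_random = []
for seed in range(10):
    H = G.copy()
    nx.double_edge_swap(H, nswap=2 * m, max_tries=200 * m, seed=seed)
    r_random.append(nx.degree_assortativity_coefficient(H))

r_baseline = sum(r_random) / len(r_random)
print(f"observed r:        {r_observed:.3f}")
print(f"randomized mean r: {r_baseline:.3f}")
```

If the observed value sits close to the randomized ones, the degree correlation is largely explained by the degree sequence alone; a clear gap suggests additional structure.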

Table 8.1 page 237

r = assortativity coefficient

Assortative mixing by degree

• Technological, biological, and information networks tend to have negative r (disassortative mixing based on degree), while social networks tend to have positive r (assortative)
• Good time to share your thoughts on why.
• Why? One possible reason is that technological, biological and information networks are simple graphs (no edge multiplicity), and Maslov et al. showed these networks tend to be disassortative, since the number of edges between high-degree vertices is limited by the number of vertices of high degree

Assortative mixing by degree

• The multiplicity of edges in social networks tends to drive the value of r up
• There might be other biases in place in social networks that cause a slight assortative mixing:
– Communities: low-degree nodes adjacent to low-degree nodes within the same community, and the high-degree ones connecting the communities might have better chances of being adjacent

References

• Besides the references at the bottom left of different slides, this is a good overall reference: – Newman, Mark EJ. "Mixing patterns in networks." Physical Review E 67.2 (2003): 026126.
