
10/9/2015

Complex Social Systems: a guided exploration of concepts and methods

SOCIAL NETWORK ANALYSIS (SNA part 1): Network Structure

Martin Hilbert (Dr., PhD)

Today’s questions

I. How can we formalize networks?

II. How can we describe the structure of networks? …network metrics…

III. How can you analyze a network with software?

[Figure credit: Christakis & Fowler, 2007]

social = network !

Traditional database of attributes

        Gender  Location  Income  Educat.   …
Jorge   M       Urban     700     Tertiary  …
Maria   F       Urban     500     Second.   …
Juan    M       Rural     300     Primary   …
Magda   F       Rural     200     ---       …
…       …       …         …       …

Network database of links

        Jorge  Maria  Juan  Magda  …
Jorge   Self   link   ---   link   …
Maria   link   Self   ---   link   …
Juan    ---    ---    Self  ---    …
Magda   ---    link   link  Self   …
…       …      …      …     …

[Figure: Jorge, Maria, Juan and Magda drawn twice, once for each kind of database]


What nodes?

• Products
• Organizations (1990, 1996, 2002)
• Organizational Units
• Countries
• People

Source: http://www.visualcomplexity.com/vc/; Powell et al. (2010); Hidalgo et al. (2007)

What nodes?

When to use “new kind of node” and when to use “attribute”?

“new kind of node” = often exclusive
“attribute” = very overlapping

Are the right people, with the adequate skills, connected to the right tasks?

What links?

• Visual contact (visual contact with teacher = 3 % = same as other students…)
• Communication
• Joint use
• Animosity
• Sequence
• Labor Flow

Source: Paul Butler (2010); Guerrero & Axtell (2013); http://www.ciae.uchile.cl/index.php?page=view_noticias&id=322; http://www.tutorvista.com/content/biology/biology-iv/ecosystem/food-web.php; Adamic, L. (2012), Social Network Analysis, Coursera


What links?

Whom do you go to for work-related advice?

Whom would you trust to keep in confidence your concerns?

“The CEO appointed Calder manager because his colleagues respected him as the most technically accomplished person… common practice… make your best producer the manager. Calder, however, turned out to be a very marginal figure in the trust network… he regularly told people they were stupid and paid little attention to their professional concerns. Leers knew that Calder was no diplomat, but he had no idea to what extent the performance and morale of the group were suffering as a result of Calder’s tyrannical management style… Frequently, senior managers presume that formal work ties will yield good relationship ties over time, and they assume that if they trust someone, others will too….”

Krackhardt & Hanson. Informal networks: the company behind the chart. Harv Bus Rev. 1993 Jul‐Aug;71(4):104‐11.

Multiplex networks

Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gómez‐Gardeñes, J., Romance, M., … Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544(1), 1–122; A. Cardillo: http://bifi.es/~cardillo/images/multiplex‐air.png

Multi‐mode networks & Projections

according to Newman (2001)

http://toreopsahl.com/tnet/two‐mode‐networks/projection/
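A two-mode network (for example people and the events they attend) can be projected onto one mode by linking two people whenever they share an event. The sketch below uses made-up memberships and the simplest possible weight, the number of shared events. It is an illustration only, not the discounted weighting discussed by Newman (2001); all names in it are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode data: event -> people who attended (illustrative only)
memberships = {
    "event_1": ["Jorge", "Maria"],
    "event_2": ["Maria", "Magda", "Juan"],
    "event_3": ["Jorge", "Maria", "Magda"],
}

def project(two_mode):
    """One-mode projection: tie weight = number of events two people share."""
    weights = Counter()
    for people in two_mode.values():
        # every pair of attendees of the same event gets one more shared event
        for a, b in combinations(sorted(set(people)), 2):
            weights[(a, b)] += 1
    return dict(weights)

projection = project(memberships)
for (a, b), w in sorted(projection.items()):
    print(a, "-", b, "shared events:", w)
```

The projection loses information (which event created which tie), which is why weighted projections are often preferred over a plain 0/1 link.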


Draw out (one of your) networks

http://netmap.wordpress.com/about/


Where does the network end?

Source: Chu, D., Strand, R., Fjelland, R. (2003).

Which strength to give to ties?

• Tie strength
   o Real ties are in a grey zone between 0 and 1: usually implicit or explicit dichotomization
   o Strength of weak ties: Granovetter (1973)
      - 35,000 citations (Google Scholar 2015) vs. Darwin’s Origin of Species (30,000) vs. Castells’ Network Society & Keynes’ General Theory (26,000)
      - People find jobs through:
         * Strong ties (at least two interactions/week): 17 %
         * Medium ties (at least one interaction/year): 56 %
         * Weak ties (less than one interaction/year): 28 % (!)
      - Caution: naturally there are more weak than strong ties… but even if only a few, they build important bridges (small world…)!

Source: Granovetter M. 1973. The strength of weak ties. Am. J. Sociol. 78: 1360 –80


How to design the network? …layout algorithms…

Source: own elaboration, based on Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Making sense of complex networks …through network analysis

http://www.ted.com/talks/eric_berlow_how_complexity_leads_to_simplicity.html (1:00–3:35 min)

http://www.cc.com/video-clips/flgi4b/the-daily-show-with-jon-stewart-afghanistan-stability-chart (0:00–1:55 min)

Specific network jargon

• Some of these words are well known, but have a specific meaning
   o Cluster
   o Diameter
   o Degree
   o ‐builder
   o Intermediator
   o …
• Others might sound new, but are usually taken from somewhere else
   o Transitivity
   o Geodesic distance
   o …
• Still others were born in network analysis
   o Triadic closure
   o Weak ties
   o Small world
   o …


Network Names

[Figure: example graph with nodes A to G illustrating a triad, an isolate, a pendant and a dyad; directed links (communication) vs. undirected links (friendship)]

points     lines             origin
vertices   edges, arcs       Math
nodes      links             Computer Science
sites      bonds             Physics
actors     ties, relations

network = graph

Source: P. Monge (2012), COMM 645: Communication Networks, USC Annenberg

Network Measures

• Global Measures
   o Average degree, path length, etc.

• Local Measures
   o Clustering, transitivity, structural equivalence, etc.

• Position Measures
   o Degree Centrality, Closeness, Betweenness, Eigenvector, etc.

Network Structure

• Degree
   o In-degree: number of incoming links to one node
   o Out-degree: number of outgoing links from one node
   o Degree: total number of in- & outgoing links
   o Questions:
      - How many links does this network have?
      - How many degrees does this network have?
      - What’s the degree of Jorge?
      - What’s the in-degree of Jorge?
      - What’s the out-degree of Jorge?
• Networks and Matrix algebra
   o Adjacency Matrix

        Jorge  Maria  Juan  Magda  SUM
Jorge   0      1      0     1      2
Maria   1      0      0     1      2
Juan    0      0      0     0      0
Magda   0      1      1     0      2
SUM     1      2      1     2
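The degree questions above can be answered directly from the adjacency matrix. A minimal sketch, assuming that each row holds the links a person sends and each column the links a person receives (the slide does not state the direction convention explicitly):

```python
names = ["Jorge", "Maria", "Juan", "Magda"]

# Adjacency matrix from the slide; assumption: row = sender, column = receiver
A = [
    [0, 1, 0, 1],  # Jorge
    [1, 0, 0, 1],  # Maria
    [0, 0, 0, 0],  # Juan
    [0, 1, 1, 0],  # Magda
]

# Row sums = outgoing links, column sums = incoming links
out_degree = {n: sum(row) for n, row in zip(names, A)}
in_degree = {n: sum(row[j] for row in A) for j, n in enumerate(names)}
degree = {n: in_degree[n] + out_degree[n] for n in names}
n_links = sum(out_degree.values())  # every directed link is counted once

print("links in the network:", n_links)
for n in names:
    print(n, "in:", in_degree[n], "out:", out_degree[n], "total:", degree[n])
```

Under this convention the row sums reproduce the SUM column of the slide and the column sums reproduce its SUM row.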


Network Structure

• Roaming the network
   o Walk: pass among nodes through links
   o Path: walk that only passes through different nodes
   o Cycle: walk that ends where it began
   o Geodesic: shortest path between two nodes
   o Diameter: largest geodesic (longest shortest path)
   o Average path length: avg. geodesic

Avg. path length:
   * Letters U.S.: 5.5, Milgram (1967)
   * Co-authorship
      - Physics = 5.9, Newman (2001)
      - Math = 7.6, Grossman (2002)
      - Economics = 9.5, Goyal et al. (2006)
   * Facebook: 4.7, Backstrom et al. (2012)

Source: Jackson, M. (2013), Social and Economic Networks: Models and Analysis, Coursera; Milgram, S. (1967) “The Small-World Problem,” Psych. Today 2:60–67; Newman, M.E.J. (2001) Scientific collaboration networks. Phys. Rev. E 64, 016131; Grossman, J.W. (2002) “The Evolution of the Math. Research Collaboration Graph,” in Proc. of the 33rd Southeastern Conf. on Combinatorics Vol. 158; Goyal, S., M. van der Leij, and J.-L. Moraga-Gonzalez (2006) “Economics: An Emerging Small World,” J. of Pol. Economy 114(2):403–412; Backstrom, L., P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) “Four Degrees of Separation,” arXiv 1111.4570v3
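Geodesics, diameter and average path length can be computed with a breadth-first search from every node. The sketch below runs on a small, hypothetical undirected network (nodes A to E are not taken from the slides) and assumes the network is connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network (illustrative only)
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def geodesics_from(source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# One geodesic length per unordered pair of nodes (assumes a connected network)
pair_dist = [geodesics_from(u)[v] for u, v in combinations(sorted(adj), 2)]
diameter = max(pair_dist)                          # longest shortest path
avg_path_length = sum(pair_dist) / len(pair_dist)  # average geodesic

print("diameter:", diameter)
print("average path length:", avg_path_length)
```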

Networks and Matrix algebra

        Asere  Buddy  Cumpa  Dude
Asere   0      1      0      1
Buddy   1      0      0      1
Cumpa   0      0      0      1
Dude    1      1      1      0

[Figure: network of the four friends Asere, Buddy, Cumpa and Dude]

What is the number of walks between these friends? I.e., what is the number of walks of LENGTH 2 between these friends? = getting from one to the other in 2 steps

        Asere  Buddy  Cumpa  Dude
Asere   ?      ?      ?      ?
Buddy   ?      ?      ?      ?
Cumpa   ?      ?      ?      ?
Dude    ?      ?      ?      ?

Networks and Matrix algebra: Primer on Matrix multiplication

Source: Elizabeth Stapel (2003); http://www.purplemath.com/modules/mtrxmult.htm
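As a compact reminder of the rule behind the primer (standard linear algebra, written here in symbols that do not appear on the slide): each entry of a product is a row-times-column sum. Applied to an adjacency matrix $A$ multiplied by itself, entry $(i,j)$ counts the intermediate nodes $k$ that connect $i$ to $j$, that is, the walks of length 2.

```latex
(AB)_{ij} = \sum_{k} A_{ik} B_{kj},
\qquad
(A^{2})_{ij} = \sum_{k} A_{ik} A_{kj}
             = \#\{\text{walks of length 2 from } i \text{ to } j\}
```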


Networks and Matrix algebra

What is the number of walks of LENGTH 2 between these friends? Why?

Adjacency matrix x adjacency matrix = walk2

        Asere  Buddy  Cumpa  Dude
Asere   0      1      0      1
Buddy   1      0      0      1
Cumpa   0      0      0      1
Dude    1      1      1      0

walk2   Asere  Buddy  Cumpa  Dude
Asere   2      1      1      1
Buddy   1      2      1      1
Cumpa   1      1      1      0
Dude    1      1      0      3

walk3   Asere  Buddy  Cumpa  Dude
Asere   2      3      1      4
Buddy   3      2      1      4
Cumpa   1      1      0      3
Dude    4      4      3      2

etc…
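The walk counts can be reproduced by multiplying the adjacency matrix of the four friends with itself. A minimal sketch in plain Python; the helper name `matmul` is just an illustrative choice:

```python
names = ["Asere", "Buddy", "Cumpa", "Dude"]

# Adjacency matrix of the four friends, as given on the slide
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

def matmul(X, Y):
    """Row-times-column matrix product of two lists of lists."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]

walk2 = matmul(A, A)      # walks of length 2
walk3 = matmul(walk2, A)  # walks of length 3

for label, M in (("walk2", walk2), ("walk3", walk3)):
    print(label)
    for name, row in zip(names, M):
        print(" ", name, row)
```

The diagonal of `walk2` equals each friend's degree, because the only way to return to yourself in two steps is to go to a friend and come back.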

Who’s at the center of this network? Network Centrality

[Figure: example network with three focal nodes labeled 1, 2 and 3, shown once per measure]

• Degree: most connected
   o in-/out-degree: # of links incoming / outgoing
   o can be normalized by size of graph = # of possible ties (N-1)

• Closeness: closest to all (reaches all others more quickly; few steps to all others)
   o reciprocal of the sum of distances
   o can be normalized by the maximum closeness possible
   o sum of distances for nodes 1, 2 and 3:
      1: (1*3) + 0 + 1 + 2 + (3+4+5)*2 = 30
      2: (2*3) + 1 + 0 + 1 + (2+3+4)*2 = 26
      3: (3*3) + 2 + 1 + 0 + (1+2+3)*2 = 24

• Betweenness: on the paths connecting all (“gatekeeper”, “intermediary” or “broker”?)
   o Sum of shortest paths through node / all shortest paths
   o $g(v) = \sum_{s \neq v \neq t} \sigma_{st}(v) / \sigma_{st}$, where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
   o count for nodes 1, 2 and 3:

                  1        2        3
     Left three   3 * 10   3 * 7    3 * 6
     [1]                   7        6
     [2]          3                 6
     [3]          3        4
     Right six    6 * 3    6 * 4    6 * 8
     Total        54       56       78

• Eigenvector: friends of friends (it’s all about who you know)
   o Proportional to sum of neighbors’ centrality
   o Google’s PageRank: score of a page is proportional to the sum of the scores of pages linked to it

Source: Freeman, L. C. (1978). Centrality in social networks: conceptual clarification. Social Networks, 1(3), 215–239; https://en.wikipedia.org/wiki/Centrality
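The numbers on this slide can be checked on a reconstructed network. The drawing itself is not part of the text, so the edge list below is an assumption: three pendants on node 1, the chain 1-2-3, and two chains of three nodes hanging off node 3 (node labels such as `a` or `d1` are hypothetical). This tree reproduces the distance sums 30, 26, 24 and the path counts 54, 56, 78.

```python
from collections import deque

# Assumed reconstruction of the example figure (not given in the text)
edges = [
    ("a", "1"), ("b", "1"), ("c", "1"),      # "left three" pendants on node 1
    ("1", "2"), ("2", "3"),                  # focal nodes 1-2-3
    ("3", "d1"), ("d1", "d2"), ("d2", "d3"),  # "right six": first chain
    ("3", "e1"), ("e1", "e2"), ("e2", "e3"),  # "right six": second chain
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def distances(source):
    """Breadth-first search distances from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

dist = {v: distances(v) for v in adj}

def degree(v):
    return len(adj[v])

def distance_sum(v):
    return sum(dist[v].values())

def closeness(v):
    return 1 / distance_sum(v)  # reciprocal of the sum of distances

def paths_through(v):
    """Ordered pairs (s, t) whose shortest path passes through v.
    Only valid as a betweenness count because this graph is a tree,
    so every shortest path is unique."""
    return sum(
        1
        for s in adj
        for t in adj
        if len({s, t, v}) == 3 and dist[s][v] + dist[v][t] == dist[s][t]
    )

for v in ("1", "2", "3"):
    print(v, degree(v), distance_sum(v), round(closeness(v), 4), paths_through(v))
```

Node 1 wins on degree, while node 3 wins on both closeness and betweenness, which is the point of comparing the measures.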

Centrality Application

• Diffusion of microfinance (Banerjee, Chandrasekhar, Duflo, & Jackson, 2012)
   o 75 rural villages in Karnataka, India, without microfinance initially
   o Bank entered 43 of them and offered microfinance
   o Questions:
      - Does it matter who the Bank talked to first about the program?
      - Does the attribute (e.g. profession, gender, age, religion, etc.) or network position matter? Which kind of network position?
      - Who passes on the information?
      - Which network properties or node attributes describe agents of change?
   o Challenges:
      - How to map the network: “who would you borrow from?”
      - They created 13 different/multiplex networks

[Figure: four village networks: “Who you borrow money from?”, “Who you borrow kerosene from?”, “Who you go to temple with?”, “Who you go to for advice?”; legend: Does not participate / Participates]

[Figure: Degree centrality vs. Eigenvector centrality, for probability of communicating = 0.1 and probability of communicating = 0.5]
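To get a feeling for what a “probability of communicating” means, one can simulate word-of-mouth on a toy network. The sketch below is a deliberately simple cascade (each newly informed person tells each neighbor once, with probability `p`); it is not the model estimated in the study, and the village network and node names are hypothetical.

```python
import random

# Hypothetical village network (illustrative only, not the study's data)
adj = {
    "leader": {"a", "b", "c"},
    "a": {"leader", "b"},
    "b": {"leader", "a", "d"},
    "c": {"leader"},
    "d": {"b", "e"},
    "e": {"d"},
}

def spread(first_informed, p, rng):
    """Each newly informed node tells each neighbor once, with probability p."""
    informed = {first_informed}
    frontier = [first_informed]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in sorted(adj[node]):
                if neighbor not in informed and rng.random() < p:
                    informed.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return informed

def average_reach(first_informed, p, runs=1000, seed=0):
    """Average number of informed nodes over repeated simulated cascades."""
    rng = random.Random(seed)
    return sum(len(spread(first_informed, p, rng)) for _ in range(runs)) / runs

for p in (0.1, 0.5):
    print("p =", p,
          "| start at leader:", average_reach("leader", p),
          "| start at e:", average_reach("e", p))
```

Comparing the two starting points for the same `p` is a small-scale version of the question of whether it matters who hears about the program first.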


Community Structure: partitioning

Understanding power relations, opinion formations, group splits, etc.

[Figure: example network with nodes 1 to 10]

• Clique
   o Everybody is connected to everybody else in the clique
   o Cliques can overlap & are fragile
• Component
   o weakly connected: non-directed path between every pair
   o strongly connected: every node reachable from every other node
• K-cores
   o Each member is connected to at least k other nodes of the group
• Girvan-Newman algorithm
   1. Calculate the betweenness of all existing ties
   2. Remove the tie with the highest betweenness
   3. Recalculate the betweenness of all ties
   4. Repeat steps 2 and 3 until no ties remain
• Quality of Partition: modularity
   o Value between -1 and 1 measuring the density of links “inside” vs. “between” communities

Source: Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. PNAS, 99(12), 7821–7826; Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008
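The Girvan-Newman steps can be sketched in a few lines of plain Python. Assumptions: an undirected, unweighted, connected toy network (two triangles joined by one tie, not from the slides), edge betweenness computed with a Brandes-style breadth-first accumulation, and the loop stopped at the first split instead of running until no ties remain. The `modularity` helper uses the usual “share of ties inside minus expected share” form.

```python
from collections import deque

def build(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_betweenness(adj):
    """Shortest-path load on each tie (each pair is counted from both ends)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # farthest nodes first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score

def components(adj):
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts

def first_split(edges):
    """Steps 1-3 of the algorithm, repeated until the network falls apart."""
    adj = build(edges)
    while len(components(adj)) == 1:
        score = edge_betweenness(adj)      # (re)calculate betweenness
        u, v = max(score, key=score.get)   # tie with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

def modularity(edges, parts):
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for part in parts:
        inside = sum(1 for u, v in edges if u in part and v in part)
        q += inside / m - (sum(deg[n] for n in part) / (2 * m)) ** 2
    return q

# Hypothetical network: two triangles joined by the tie 3-4
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
parts = first_split(edges)
print(parts, round(modularity(edges, parts), 3))
```

In practice the full algorithm keeps removing ties and one picks the partition with the highest modularity along the way.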

Community detection

Triangles everywhere

• Clustering / Triadic closure / Transitivity:
   o Math: a ○ b and b ○ c => a ○ c
   o What fraction of my friends are friends with each other?
   o Can predict future friendships with high likelihood

Clustering coefficient (observed vs. random graph G(n_same, p_same)):
   * Prison friendships: 0.31 vs. 0.01, MacRae (1960)
   * Co-authorship
      - Math = 0.15 vs. 0.00002, Grossman (2002)
      - Econom. = 0.19 vs. 0.00002, Goyal et al. (2006)
   * www links: 0.11 vs. 0.0002, Adamic (1999)

Sources: Grossman, J.W. (2002) “The Evolution of the Math. Research Collaboration Graph,” in Proc. of the 33rd Southeastern Conf. on Combinatorics Vol. 158; Goyal, S., M. van der Leij, and J.-L. Moraga-Gonzalez (2006) “Economics: An Emerging Small World,” J. of Pol. Economy 114(2):403–412; Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg; https://en.wikipedia.org/wiki/Clustering_coefficient
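“What fraction of my friends are friends with each other?” can be computed node by node. A minimal sketch on the Asere, Buddy, Cumpa and Dude network from the matrix-algebra slides; returning 0 for nodes with fewer than two friends is a convention assumed here.

```python
from itertools import combinations

# Undirected friendship network of the four friends (from the adjacency matrix)
adj = {
    "Asere": {"Buddy", "Dude"},
    "Buddy": {"Asere", "Dude"},
    "Cumpa": {"Dude"},
    "Dude": {"Asere", "Buddy", "Cumpa"},
}

def local_clustering(node):
    """Share of pairs of the node's friends that are themselves friends."""
    pairs = list(combinations(sorted(adj[node]), 2))
    if not pairs:
        return 0.0  # assumed convention for nodes with fewer than two friends
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

clustering = {node: local_clustering(node) for node in adj}
average_clustering = sum(clustering.values()) / len(clustering)

for node, value in clustering.items():
    print(node, round(value, 3))
print("average:", round(average_clustering, 3))
```

Dude has the most friends but the lowest non-zero clustering, since only one of the three pairs of his friends (Asere and Buddy) is linked.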


‘Structural holes’ as separation between non-redundant contacts

[Figure: network diagram with three gaps, each labeled “Structural hole”]

Source: Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg. https://en.wikipedia.org/wiki/Structural_holes

Local Network Structure

• Structural equivalence
   o Two nodes are structurally equivalent if they are connected to the same nodes
   o Two parents are structurally equivalent (with respect to their children)
   o Perfectly substitutable: same contacts, same degree, same centrality, etc.

• Regular equivalence
   o Two nodes are regularly equivalent if they have the same profile of ties with members of other sets of actors that are also regularly equivalent
   o Two mothers are regularly equivalent because they are connected to children and to gynecologists, who are connected to many mothers, etc.
   o Every pair of structurally equivalent nodes is also regularly equivalent
   o Dream: Definition of social roles!? What is a teacher, CEO, or CIO?
   o Problem: often hard to interpret (e.g. too many regularities). Inclusion of attributes?

Sources: Robert Hanneman and Mark Riddle (2005), Introduction to social network methods, http://faculty.ucr.edu/~hanneman/; Steve Borgatti (2006), “MB 874”, www.analytictech.com/mb874/Slides/equivalence.pdf
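Structural equivalence is easy to test mechanically: two nodes are equivalent if their sets of contacts coincide once the two nodes themselves are left out. The family network below is hypothetical and only serves to illustrate the definition.

```python
# Hypothetical family network (illustrative only)
adj = {
    "mother": {"child_1", "child_2"},
    "father": {"child_1", "child_2"},
    "child_1": {"mother", "father", "teacher"},
    "child_2": {"mother", "father"},
    "teacher": {"child_1"},
}

def structurally_equivalent(a, b):
    """Same contacts, ignoring a possible tie between a and b themselves."""
    return adj[a] - {b} == adj[b] - {a}

print("mother ~ father:  ", structurally_equivalent("mother", "father"))
print("child_1 ~ child_2:", structurally_equivalent("child_1", "child_2"))
```

The two children are not structurally equivalent here because only one of them is tied to the teacher; regular equivalence would require comparing the roles of their contacts instead of the contacts themselves.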

Brokerage

Source: Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg


Amplifier

Voice

Source: Hilbert, et al (2015). One‐step, two‐step, network‐step flow?


Network Structure + node attributes

• Homophily
   o Old insight: William Turner (1545): "Byrdes of on kynde and color flok and flye allwayes together"
   o Term coined by Lazarsfeld and Merton (1954): status vs. value homophily
   o Interracial marriages: 1% of whites; 5% of blacks; 14% of Asians (Fryer 2007)
   o Closest friend: 10% of men => woman; 32% of women => man (Verbrugge 1977)
   o Reasons for homophily:
      - Opportunity (contact theory): self-reinforcing path-dependency (lock-in from the past)
      - Cost (transaction theory): common culture/“weltanschauung” facilitates communication
      - Social pressure (dialectic theory): current narrative / prejudices
      - Social competition (evolution theory): group/kin selection theory

[Figures: Political blogs; High school friends by race; Sexual orientation on Facebook]

Source: Lazarsfeld, P. F. and Merton, R. K. (1954). "Friendship as a Social Process: A Substantive and Methodological Analysis"; http://www.visualcomplexity.com/vc/; Fryer, R. (2007) “Guess Who’s Been Coming to Dinner?” Journal of Economic Perspectives 21(2):71–90; Verbrugge, L.M. (1977) The Structure of Adult Friendship Choices, Social Forces, 56:2, 576-597; Jernigan & Mistree (2009). Gaydar. First Monday, 14(10).
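Combining links with node attributes, a first check for homophily is the share of ties that connect nodes with the same attribute value, compared with what one would expect if ties ignored the attribute. The sketch reuses the Jorge, Maria, Juan and Magda example (gender from the attribute table, links from the adjacency matrix); treating rows as senders and using “all possible ordered pairs” as the baseline are assumptions of this illustration.

```python
from itertools import permutations

gender = {"Jorge": "M", "Maria": "F", "Juan": "M", "Magda": "F"}

# Directed links read from the adjacency matrix (assumption: row = sender)
links = [
    ("Jorge", "Maria"), ("Jorge", "Magda"),
    ("Maria", "Jorge"), ("Maria", "Magda"),
    ("Magda", "Maria"), ("Magda", "Juan"),
]

# Observed share of links between people of the same gender
same = sum(1 for source, target in links if gender[source] == gender[target])
observed_share = same / len(links)

# Baseline: share of same-gender pairs among all possible ordered pairs
all_pairs = list(permutations(gender, 2))
baseline_share = sum(1 for a, b in all_pairs if gender[a] == gender[b]) / len(all_pairs)

print("observed same-gender share:", round(observed_share, 3))
print("baseline same-gender share:", round(baseline_share, 3))
```

In this tiny example the observed share equals the baseline, so the four links show no sign of gender homophily; a share clearly above the baseline would.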

Today’s questions

I. How can we formalize networks?

II. How can we describe the structure of networks? …network metrics…

III. How can you analyze a network with software?

[Figure credit: Christakis & Fowler, 2007]

SNA software

• Gephi: http://gephi.org
   o The “SPSS” of network analysis
• UCINET: www.analytictech.com/ucinet/
   o One of the pioneers and still most often taught
• NodeXL: https://nodexl.codeplex.com/
   o Add-in for MS Excel
• Visone: http://visone.info/
   o Makes the pretty pictures
• Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
   o For very large networks
• ORA: http://www.casos.cs.cmu.edu/projects/ora/
   o For multi-modal networks and longitudinal time-series
• R: http://www.r-project.org/ & http://www.rstudio.com/
   o igraph & network: generic packages
   o sna: sociometric analysis of networks
   o tnet: weighted networks
   o ergm: exponential random graph models
   o RSiena: dynamic actor-oriented models
   o etc.

Gephi software

• Download Gephi: http://gephi.org
   o Like all network analysis, it needs two kinds of data:
      - Nodes: 1st column = ID; 2nd column = label
      - Links: 1st column = Source; 2nd column = Target
• Fill out spreadsheet
   o To whom have you ever written a (non-email-list) email?
   o “Data Laboratory” tab => “import spreadsheet”
   o Delete non-existing links (in Data Laboratory)

• P.S. Data representation
   o Adjacency Matrix

             Jorge  Maria  Juan  Magda
     Jorge   0      1      0     1
     Maria   1      0      0     1
     Juan    0      0      0     0
     Magda   0      1      1     0

   o Edge list

     Source  Target
     Jorge   Maria
     Jorge   Magda
     Maria   Jorge
     Maria   Magda
     Magda   Maria
     Magda   Juan

   o Adjacency list

     Jorge   Maria  Magda
     Maria   Jorge  Magda
     Juan    --
     Magda   Maria  Juan
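The three representations carry the same information and can be converted into each other mechanically. The sketch below derives the edge list and the adjacency list from the adjacency matrix and writes a node table and a link table as CSV text. The column headers (`Id`, `Label`, `Source`, `Target`) follow the slide; the exact spelling Gephi's import dialog expects is an assumption, and the helper name `to_csv` is hypothetical.

```python
import csv
import io

names = ["Jorge", "Maria", "Juan", "Magda"]

# Adjacency matrix from the slide (assumption: row = source, column = target)
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

# Edge list: one (source, target) pair per non-zero cell
edge_list = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(len(names))
    if A[i][j]
]

# Adjacency list: for every node, the nodes it points to
adjacency_list = {
    names[i]: [names[j] for j in range(len(names)) if A[i][j]]
    for i in range(len(names))
}

def to_csv(header, rows):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

nodes_csv = to_csv(["Id", "Label"], [(n, n) for n in names])
edges_csv = to_csv(["Source", "Target"], edge_list)

print(nodes_csv)
print(edges_csv)
```

Saving the two strings as files gives the node and link spreadsheets described above; the edge list is usually the most compact form for sparse networks.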

Gephi software

• Look at “Overview” tab
   o Zoom in and out: mouse-wheel
   o Move around: right mouse-button
   o Show labels
   o Layout (ForceAtlas2; Yifan Hu…) + drag with “hand”
• Change color and size of nodes
   o Ranking tab
   o Color button; Diamond = size button; Data-table
   o …

• Calculate basics:
   o Results will appear in “Data Laboratory” tab
   o Sort results
   o Visualize results with “Partition” and “Ranking” tabs
   o Who’s the biggest spammer? (out-degree)
   o Are there different groups? (modularity: density of links “inside” vs. “between” communities)
   o How many steps does information need to flow? (average path length)
