10/9/2015
Complex Social Systems: a guided exploration of concepts and methods
SOCIAL NETWORK ANALYSIS (SNA part 1): Network Structure
Martin Hilbert (Dr., PhD)
Today’s questions
I. How can we formalize networks?
II. How can we describe the structure of networks? …network metrics…
III. How can you analyze a network with software?
(Figure: Christakis & Fowler, 2007)
social = network !
Traditional database of attributes

         Gender  Location  Income  Educat.   …
Jorge    M       Urban     700     Tertiary  …
Maria    F       Urban     500     Second.   …
Juan     M       Rural     300     Primary   …
Magda    F       Rural     200     ---       …
…        …       …         …       …         …

Network database of links

         Jorge   Maria   Juan    Magda   …
Jorge    Self    link    ---     link    …
Maria    link    Self    ---     link    …
Juan     ---     ---     Self    ---     …
Magda    ---     link    link    Self    …
…        …       …       …       …       …

[Figure: Jorge, Maria, Juan and Magda drawn as nodes, once for each database]
What nodes?
o Products
o Organizations (1990, 1996, 2002)
o Organizational units
o Countries
o People
Source: http://www.visualcomplexity.com/vc/; Powell et al. (2010); Hidalgo et al. (2007)
What nodes?
When to use “new kind of node” and when to use “attribute”?
“new kind of node” = often exclusive
“attribute” = very overlapping
Are the right people, with the adequate skills, connected to the right tasks?
What links?
o Visual contact (visual contact with teacher = 3 % = same as other student…)
o Communication
o Joint use
o Animosity
o Sequence
o Labor flow
Source: Paul Butler (2010); Guerrero & Axtell (2013); http://www.ciae.uchile.cl/index.php?page=view_noticias&id=322; http://www.tutorvista.com/content/biology/biology-iv/ecosystem/food-web.php; Adamic L. (2012), Social Network Analysis, Coursera
What links?
Whom do you go to for work-related advice?
Whom would you trust to keep in confidence your concerns?
“The CEO appointed Calder manager because his colleagues respected him as the most technically accomplished person… common practice… make your best producer the manager. Calder, however, turned out to be a very marginal figure in the trust network… he regularly told people they were stupid and paid little attention to their professional concerns. Leers knew that Calder was no diplomat, but he had no idea to what extent the performance and morale of the group were suffering as a result of Calder’s tyrannical management style… Frequently, senior managers presume that formal work ties will yield good relationship ties over time, and they assume that if they trust someone, others will too….”
Krackhardt & Hanson. Informal networks: the company behind the chart. Harv Bus Rev. 1993 Jul‐Aug;71(4):104‐11.
Multiplex networks
Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gómez‐Gardeñes, J., Romance, M., … Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544(1), 1–122; A. Cardillo: http://bifi.es/~cardillo/images/multiplex‐air.png
Multi‐mode networks & Projections
accord. to Newman (2001)
http://toreopsahl.com/tnet/two‐mode‐networks/projection/
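A projection turns a two-mode network (for example people and events) into a one-mode network of people who share an event. The sketch below is only illustrative: the event data and the `project` helper are hypothetical, and the `newman=True` option assumes the weighting usually attributed to Newman (2001), where each shared event with N participants adds 1/(N-1) to a tie instead of 1.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: each event (second mode) lists its participants (first mode).
events = {
    "e1": ["Ana", "Ben"],
    "e2": ["Ana", "Ben", "Cleo"],
}

def project(events, newman=False):
    """Project a two-mode network onto its participants (one-mode network)."""
    weights = defaultdict(float)
    for members in events.values():
        if len(members) < 2:
            continue  # an event with a single participant creates no tie
        # Simple projection: every shared event counts 1.
        # Assumed Newman-style weighting: a shared event counts 1 / (N - 1).
        share = 1.0 / (len(members) - 1) if newman else 1.0
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += share
    return dict(weights)

simple = project(events)                 # tie weight = number of shared events
weighted = project(events, newman=True)  # tie weight discounts large events

for pair in sorted(simple):
    print(pair, simple[pair], weighted[pair])
```

The discounted version treats a tie formed in a small event as stronger than a tie formed in a large one, which is the usual motivation for this weighting.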
Draw out (one of your) networks
http://netmap.wordpress.com/about/
Where does the network end?
Source: Chu, D., Strand, R., Fjelland, R. (2003).
Which strength to give to ties?
Tie strength
o Real ties are in a grey zone between 0 and 1: usually implicit or explicit dichotomization
o Strength of weak ties: Granovetter (1973)
  o 35,000 citations (Google Scholar 2015) vs. Darwin’s Origin of Species (30,000) vs. Castells’ Network Society & Keynes’ General Theory (26,000)
  o People find jobs through:
    o Strong ties (at least two interactions/week): 17 %
    o Medium ties (at least one interaction/year): 56 %
    o Weak ties (less than one interaction/year): 28 % (!)
  o Caution: naturally there are more weak ties than strong ties… but even if only a few, they build important bridges (small world…)!
Source: Granovetter M. 1973. The strength of weak ties. Am. J. Sociol. 78: 1360 –80
How to design the network? …layout algorithms…
Source: own elaboration, based on Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Making sense of complex networks …through network analysis
http://www.ted.com/talks/eric_berlow_how_complexity_leads_to_simplicity.html (1:00–3:35 min)
http://www.cc.com/video-clips/flgi4b/the-daily-show-with-jon-stewart-afghanistan-stability-chart (0:00–1:55 min)
Specific network jargon
Some of these words are well known, but have a specific meaning
o Clique
o Cluster
o Diameter
o Degree
o Bridge-builder
o Intermediator
o …
Others might sound new, but are usually taken from somewhere else
o Transitivity
o Homophily
o Centrality
o Geodesic distance
o …
Yet others were born in network analysis
o Triadic closure
o Weak ties
o Small world
o …
Network Names
[Figure: example network with nodes A to G, labeling a triad, an isolate, a pendant and a dyad; directed ties (e.g. communication) vs. undirected ties (e.g. friendship)]

points      lines             origin
vertices    edges, arcs       Math
nodes       links             Computer Science
sites       bonds             Physics
actors      ties, relations   Sociology

network = graph
Source: P. Monge (2012), COMM 645: Communication Networks, USC Annenberg
Network Measures
Global Measures o Average degree, Degree distribution, path length, etc.
Local Measures o Clustering, transitivity, structural equivalence, etc.
Position Measures o Degree Centrality, Closeness, Betweenness, Eigenvector, etc.
Network Structure
Degree
o In-degree: number of incoming links to one node
o Out-degree: number of outgoing links to one node
o Degree: total number of in- & outgoing links
o Questions:
  o How many links does this network have?
  o How many degrees does this network have?
  o What’s the degree of Jorge?
  o What’s the in-degree of Jorge?
  o What’s the out-degree of Jorge?

Networks and Matrix algebra
o Adjacency Matrix

         Jorge  Maria  Juan  Magda  SUM
Jorge    0      1      0     1      2
Maria    1      0      0     1      2
Juan     0      0      0     0      0
Magda    0      1      1     0      2
SUM      1      2      1     2
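The questions above can be answered directly from the adjacency matrix. The following is a minimal sketch, assuming that a row is the sender and a column is the receiver of a link (an assumption about the reading direction, chosen so that row sums are out-degrees and column sums are in-degrees); the variable names are illustrative.

```python
names = ["Jorge", "Maria", "Juan", "Magda"]

# Adjacency matrix from the slide; assumed reading: A[i][j] = 1 means i -> j.
A = [
    [0, 1, 0, 1],  # Jorge
    [1, 0, 0, 1],  # Maria
    [0, 0, 0, 0],  # Juan
    [0, 1, 1, 0],  # Magda
]

# Row sums = outgoing links, column sums = incoming links.
out_degree = {name: sum(A[i]) for i, name in enumerate(names)}
in_degree = {name: sum(row[j] for row in A) for j, name in enumerate(names)}
degree = {name: in_degree[name] + out_degree[name] for name in names}

# Every 1 in the matrix is one directed link.
total_links = sum(sum(row) for row in A)

print("links:", total_links)
print("Jorge: degree", degree["Jorge"],
      "in", in_degree["Jorge"], "out", out_degree["Jorge"])
```

Under this reading the row sums reproduce the SUM column of the table and the column sums reproduce its SUM row.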
Network Structure
Roaming the network
o Walk: pass among nodes through links
o Path: a walk that passes only through different nodes (no node is visited twice)
o Cycle: walk that ends where it began
o Geodesic: shortest path between two nodes
o Diameter: largest geodesic (longest shortest path)
o Average path length: avg. geodesic

Avg. path length:
* Letters U.S.: 5.5 Milgram (1967)
* Co-authorship
  - Physics = 5.9 Newman (2001)
  - Math = 7.6 Grossman (2002)
  - Economics = 9.5 Goyal et al. (2006)
* Facebook: 4.7 Backstrom et al. (2012)

Source: Jackson M. (2013); Social and Economic Networks: Models and Analysis; Coursera; Milgram, S. (1967) “The Small-World Problem,” Psych. Today 2:60–67; Newman, M.E.J. (2001) Scientific collaboration networks. Phys. Rev. E 64, 016131; Goyal, S., M. van der Leij, and J.-L. Moraga-Gonzalez (2006) “Economics: An Emerging Small World,” J. of Pol. Economy 114(2):403–412; Backstrom, L., P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) “Four Degrees of Separation” arXiv 1111.4570v3; Grossman, J.W. (2002) “The Evolution of the Math. Research Collaboration Graph,” in Proc. of the 33rd Southeastern Conf. on Combinatorics Vol. 158
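Geodesics, diameter and average path length can be computed with a breadth-first search from every node. This is a minimal sketch on a small hypothetical undirected network (the five nodes and their ties are invented for illustration) and it assumes the network is connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network, for illustration only.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def geodesics_from(source):
    """Breadth-first search: length of the shortest path to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

dist = {s: geodesics_from(s) for s in adj}
pairs = list(combinations(sorted(adj), 2))

# Diameter = longest geodesic; average path length = mean geodesic over all pairs.
diameter = max(dist[s][t] for s, t in pairs)
avg_path_length = sum(dist[s][t] for s, t in pairs) / len(pairs)

print("diameter:", diameter)
print("average path length:", avg_path_length)
```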
Networks and Matrix algebra
         Asere  Buddy  Cumpa  Dude
Asere    0      1      0      1
Buddy    1      0      0      1
Cumpa    0      0      0      1
Dude     1      1      1      0

[Figure: the four friends Asere, Buddy, Cumpa and Dude drawn as a network]
What is the number of walks between these friends? i.e. What is the number of walks of LENGTH 2 between these friends? = getting from one to the other in 2 steps
Asere Buddy Cumpa Dude
Asere
Buddy
Cumpa
Dude
Networks and Matrix algebra Primer on Matrix multiplication
Source: Elizabeth Stapel (2003); http://www.purplemath.com/modules/mtrxmult.htm
Networks and Matrix algebra
         Asere  Buddy  Cumpa  Dude
Asere    0      1      0      1
Buddy    1      0      0      1
Cumpa    0      0      0      1
Dude     1      1      1      0

[Figure: the four friends Asere, Buddy, Cumpa and Dude drawn as a network]

What is the number of walks of LENGTH 2 between these friends? Why?
Adjacency matrix x adjacency matrix = walk2

walk2    Asere  Buddy  Cumpa  Dude
Asere    2      1      1      1
Buddy    1      2      1      1
Cumpa    1      1      1      0
Dude     1      1      0      3

walk3    Asere  Buddy  Cumpa  Dude
Asere    2      3      1      4
Buddy    3      2      1      4
Cumpa    1      1      0      3
Dude     4      4      3      2

etc…
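The walk counts can be checked by multiplying the adjacency matrix with itself: entry (i, j) of the k-th power counts the walks of length k from i to j. Below is a minimal sketch in plain Python; the helper name `matmul` is illustrative.

```python
names = ["Asere", "Buddy", "Cumpa", "Dude"]

# Friendship (undirected) adjacency matrix from the slide.
A = [
    [0, 1, 0, 1],  # Asere
    [1, 0, 0, 1],  # Buddy
    [0, 0, 0, 1],  # Cumpa
    [1, 1, 1, 0],  # Dude
]

def matmul(X, Y):
    """Multiply two square matrices: row of X times column of Y."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

walk2 = matmul(A, A)      # walks of length 2
walk3 = matmul(walk2, A)  # walks of length 3

for name, row in zip(names, walk2):
    print(name, row)
```

The diagonal of `walk2` equals each person's degree, because the only way to return to yourself in two steps is to visit a friend and come back.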
Who’s at the center of this network?
Network Centrality
[Figure: example network with nodes 1, 2 and 3 highlighted, repeated for each measure]

o Degree: most connected
  o in- / out-degree: # of links incoming / outgoing
  o can be normalized by size of graph = # of possible ties (N-1)

o Closeness: closest to all (reach all others more quickly; few steps with all others)
  o reciprocal of the sum of distances
  o can be normalized by the maximum closeness possible
  o Sum of distances for nodes 1, 2 and 3:
    1: (1*3) + 0 + 1 + 2 + (3+4+5)*2 = 30
    2: (2*3) + 1 + 0 + 1 + (2+3+4)*2 = 26
    3: (3*3) + 2 + 1 + 0 + (1+2+3)*2 = 24

o Betweenness: on the paths connecting all (“gatekeeper”, “intermediary” or “broker”?)
  o Sum of shortest paths through node / all shortest paths:
    $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
    where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
  o Shortest paths passing through nodes 1, 2 and 3, by starting node:

    starting node   through 1   through 2   through 3
    Left three      3 * 10      3 * 7       3 * 6
    [1]             -           7           6
    [2]             3           -           6
    [3]             3           4           -
    Right six       6 * 3       6 * 4       6 * 8
    Total           54          56          78

o Eigenvector: friends of friends (it’s all about who you know)
  o Proportional to sum of neighbors’ centrality
  o Google’s PageRank: score of a page is proportional to the sum of the scores of pages linked to it
Source: Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social Networks, 1(3), 215–239. https://en.wikipedia.org/wiki/Centrality;
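The degree, closeness and betweenness numbers can be reproduced with a breadth-first search that also counts shortest paths. The sketch below rests on an assumption: the example network is not given as data, so its shape (node 1 with three pendant neighbors, the chain 1-2-3, and two three-node chains hanging off node 3) is inferred from the arithmetic on the slide. Node labels such as "L1" and "R1" are invented.

```python
from collections import deque

# Assumed structure of the example network, inferred from the slide's arithmetic.
edges = [("L1", "1"), ("L2", "1"), ("L3", "1"), ("1", "2"), ("2", "3"),
         ("3", "R1"), ("R1", "R2"), ("R2", "R3"),
         ("3", "R4"), ("R4", "R5"), ("R5", "R6")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs(source):
    """Return (dist, sigma): distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

info = {s: bfs(s) for s in adj}

degree = {v: len(adj[v]) for v in adj}
farness = {v: sum(info[v][0].values()) for v in adj}  # sum of distances
closeness = {v: 1 / farness[v] for v in adj}          # reciprocal of that sum

def betweenness(v):
    """Shortest paths through v, summed over ordered pairs (s, t), s != v != t."""
    dist_v, sigma_v = info[v]
    total = 0.0
    for s in adj:
        if s == v:
            continue
        dist_s, sigma_s = info[s]
        for t in adj:
            if t in (s, v):
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:  # v lies on a shortest s-t path
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

for v in ("1", "2", "3"):
    print(v, degree[v], farness[v], betweenness(v))
```

In this assumed network node 1 has the highest degree, while node 3 has both the smallest sum of distances and the highest betweenness, so the measures do not agree on who is "at the center".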
Centrality Application
Diffusion of microfinance (Banerjee, Chandrasekhar, Duflo, & Jackson, 2012)
o 75 rural villages in Karnataka, India, without microfinance initially
o Bank entered 43 of them and offered microfinance
o Questions:
  o Does it matter who the Bank talked to first about the program?
  o Does the attribute (e.g. profession, gender, age, religion, etc.) or network position matter? Which kind of network position?
  o Who passes on the information?
  o Which network properties or node attributes describe agents of change?
o Challenges:
  o How to map the network: “who would you borrow from?”
  o They created 13 different / multiplex networks
[Figure panels: Who you borrow money from? Who you borrow kerosene from? Who you go to temple with? Who you go to for advice?]
[Figure legend: Participates / Does not participate; Degree centrality; Eigenvector centrality; Probability of communicating = 0.1 and = 0.5]
Community Structure: partitioning
Understanding power relations, opinion formations, group splits, etc.
[Figure: example network with nodes 1 to 10]
Clique
o Everybody is connected to everybody else in the clique
o Cliques can overlap & are fragile
Component
o weakly connected: non-directed path btw every pair
o strongly connected: every node reachable from every other node
K-cores
o Every member is connected to at least k other nodes of the group
Girvan-Newman algorithm
1. Calculate the betweenness of all existing ties
2. Remove the tie with the highest betweenness
3. Recalculate the betweenness of all ties
4. Repeat steps 2 and 3 until no ties remain
Quality of Partition: Modularity
o Value between -1 and 1 measuring the density of links “inside” vs. “between” communities
Source: Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. PNAS, 99(12), 7821–7826; Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008
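The Girvan-Newman steps and the modularity score can be sketched in a few functions. This is an illustrative sketch, not an optimized implementation: the test network (two triangles joined by one bridge) is hypothetical, the loop stops at the first split instead of running until no ties remain, and modularity is assumed to be the common form Q = sum over communities of (links inside / m) - (total degree / 2m)^2.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, sigma): distance and number of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp = set(bfs(adj, s)[0])
            seen |= comp
            comps.append(comp)
    return comps

def edge_betweenness(adj):
    """Share of shortest paths using each tie, summed over unordered node pairs."""
    info = {s: bfs(adj, s) for s in adj}
    score = {}
    for u in adj:
        for v in adj[u]:
            if not u < v:
                continue  # handle each undirected tie once
            total = 0.0
            for s in adj:
                dist_s, sigma_s = info[s]
                for t in dist_s:
                    if t == s:
                        continue
                    dist_t, sigma_t = info[t]
                    for a, b in ((u, v), (v, u)):
                        if (a in dist_s and b in dist_t
                                and dist_s[a] + 1 + dist_t[b] == dist_s[t]):
                            total += sigma_s[a] * sigma_t[b] / sigma_s[t]
            score[(u, v)] = total / 2  # ordered pairs were counted twice
    return score

def girvan_newman_split(adj):
    """Remove highest-betweenness ties until the network falls apart once."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)  # recalculated after every removal
        if not score:
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

def modularity(adj, communities):
    m = sum(len(vs) for vs in adj.values()) / 2
    q = 0.0
    for c in communities:
        inside = sum(1 for u in c for v in adj[u] if v in c) / 2
        total_degree = sum(len(adj[u]) for u in c)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q

# Hypothetical network: two triangles (0,1,2) and (3,4,5) joined by the tie 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = girvan_newman_split(adj)
print(communities, modularity(adj, communities))
```

In this toy network the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles emerge as communities.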
Community detection
Triangles everywhere
Clustering / Triadic closure / Transitivity:
o Math: a ○ b and b ○ c => a ○ c
o What fraction of my friends are friends with each other?
o Can predict future friendships with high likelihood

Clustering coefficient: observed vs. G(n_same, p_same), i.e. a random graph with the same number of nodes and the same link probability
* Prison friendships: 0.31 vs. 0.01 MacRae (1960)
* Co-authorship
  - Math = 0.15 vs. 0.00002 Grossman (2002)
  - Econom. = 0.19 vs. 0.00002 Goyal et al. (2006)
* www links: 0.11 vs. 0.0002 Adamic (1999)
Sources: Grossman, J.W. (2002) “The Evolution of the Math. Research Collaboration Graph,” in Proc. of the 33rd Southeastern Conf. on Combinatorics Vol. 158; Goyal, S., M. van der Leij, and J.‐L. Moraga‐Gonzalez (2006) “Economics: An Emerging Small World,” J.of Pol.Economy 114(2):403–412; Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg ; https://en.wikipedia.org/wiki/Clustering_coefficient
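The question "what fraction of my friends are friends with each other?" is the local clustering coefficient. A minimal sketch follows, reusing the four-friend network (Asere, Buddy, Cumpa, Dude) from the matrix algebra slides; scoring a node with fewer than two friends as 0 is a convention assumed here, not something the slides specify.

```python
from itertools import combinations

# Undirected friendship network from the matrix algebra slides.
friends = {
    "Asere": {"Buddy", "Dude"},
    "Buddy": {"Asere", "Dude"},
    "Cumpa": {"Dude"},
    "Dude": {"Asere", "Buddy", "Cumpa"},
}

def local_clustering(node):
    """Fraction of pairs of the node's friends that are friends themselves."""
    neighbors = sorted(friends[node])
    if len(neighbors) < 2:
        return 0.0  # assumed convention: no pairs of friends means 0
    pairs = list(combinations(neighbors, 2))
    closed = sum(1 for a, b in pairs if b in friends[a])
    return closed / len(pairs)

clustering = {node: local_clustering(node) for node in friends}
average_clustering = sum(clustering.values()) / len(clustering)

for node, value in clustering.items():
    print(node, round(value, 2))
print("average:", round(average_clustering, 2))
```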
“Social capital”: ‘structural holes’ as separation between non-redundant contacts
[Figure: network with three structural holes marked]
Source: Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg. https://en.wikipedia.org/wiki/Structural_holes
Local Network Structure
Structural equivalence
o Two nodes are structurally equivalent if they are connected to the same nodes
o Two parents are structurally equivalent (to their children)
o Perfectly substitutable: same contacts, same degree, same centrality, same cliques, etc.

Regular equivalence
o Two nodes are regularly equivalent if they have the same profile of ties with members of other sets of actors that are also regularly equivalent.
o Two mothers are regularly equivalent because they are connected to children and to gynecologists, who are connected to many mothers, etc.
o Every pair of structurally equivalent nodes is also regularly equivalent
o Dream: Definition of social roles!? What is a teacher, CEO, or CIO?
o Problem: often hard to interpret (e.g. too many regularities). Inclusion of attributes?
Sources: Robert Hanneman and Mark Riddle. 2005. Introduction to social network methods. http://faculty.ucr.edu/~hanneman/ ; Steve Borgatti (2006), “MB 874 Social Network Analysis”, www.analytictech.com/mb874/Slides/equivalence.pdf
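Structural equivalence can be checked by comparing the contact sets of two nodes. The sketch below uses a hypothetical family network and assumes the common convention that a tie between the two compared nodes themselves is ignored.

```python
# Hypothetical undirected network: two parents, each tied to the same two children.
ties = {
    "Mom": {"Kid1", "Kid2"},
    "Dad": {"Kid1", "Kid2"},
    "Kid1": {"Mom", "Dad"},
    "Kid2": {"Mom", "Dad"},
}

def structurally_equivalent(a, b):
    """True if a and b are connected to exactly the same other nodes."""
    # Assumed convention: a possible tie between a and b themselves is ignored.
    return ties[a] - {b} == ties[b] - {a}

print(structurally_equivalent("Mom", "Dad"))    # same contacts
print(structurally_equivalent("Kid1", "Kid2"))  # same contacts
print(structurally_equivalent("Mom", "Kid1"))   # different contacts
```

Regular equivalence is harder to compute because it compares the roles of the contacts rather than their identities, so it is not covered by this check.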
Brokerage
Source: Peter Monge (2012), COMM 645: Communication Networks, USC Annenberg
Amplifier
Voice
Source: Hilbert, et al (2015). One‐step, two‐step, network‐step flow?
Network Structure + node attributes
Homophily
o Old insight: William Turner (1545): "Byrdes of on kynde and color flok and flye allwayes together"
o Term origin by Lazarsfeld and Merton (1954): status vs. value homophily
o Interracial marriages: 1% of whites; 5% of blacks; 14% of Asians (Fryer 2007)
o Closest friend: 10% of men => woman; 32% of women => man (Verbrugge 1977)
o Reasons for homophily:
  o Opportunity (contact theory): self-reinforcing path-dependency (lock-in from the past)
  o Cost (transaction theory): common culture/“weltanschauung” facilitates communication
  o Social pressure (dialectic theory): current narrative / prejudices
  o Social competition (evolution theory): group/kin selection theory
[Figures: Political blogs; High school friends by race; Sexual orientation on Facebook]
Source: Lazarsfeld, P. F. and Merton, R. K. (1954). "Friendship as a Social Process: A Substantive and Methodological Analysis". http://www.visualcomplexity.com/vc/; Fryer, R. (2007) “Guess Who’s Been Coming to Dinner?” Journal of Economic Perspectives 21(2):71–90. Verbrugge, L.M (1977) The Structure of Adult Friendship Choices , Social forces, 56:2, 576‐597; Jernigan & Mistree (2009). Gaydar. First Monday, 14(10).
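One simple way to look for homophily is to combine the link data with a node attribute and compare the share of ties between same-attribute nodes with the share expected if ties ignored the attribute. The sketch below is only one possible indicator; the nodes, groups and ties are hypothetical.

```python
from itertools import combinations

# Hypothetical node attribute and undirected ties, for illustration only.
group = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "blue"}
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("c", "d")]

# Observed share of ties that connect two nodes of the same group.
same = sum(1 for u, v in ties if group[u] == group[v])
share_same = same / len(ties)

# Baseline: share of all possible node pairs that belong to the same group,
# i.e. what to expect if ties were formed without regard to the attribute.
all_pairs = list(combinations(sorted(group), 2))
baseline = sum(1 for u, v in all_pairs if group[u] == group[v]) / len(all_pairs)

print("observed same-group share:", share_same)
print("baseline same-group share:", baseline)
```

An observed share well above the baseline is consistent with homophily, although it does not say which of the listed reasons produced it.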
Today’s questions
I. How can we formalize networks?
II. How can we describe the structure of networks? …network metrics…
III. How can you analyze a network with software?
(Figure: Christakis & Fowler, 2007)
SNA software
Gephi: http://gephi.org
o The “SPSS” of network analysis
UCINET: www.analytictech.com/ucinet/
o One of the pioneers and still most often taught
NodeXL: https://nodexl.codeplex.com/
o Add-in for MS Excel
Visone: http://visone.info/
o Makes the pretty pictures
Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
o For very large networks
ORA: http://www.casos.cs.cmu.edu/projects/ora/
o For multi-modal networks and longitudinal time-series
R: http://www.r-project.org/ & http://www.rstudio.com/
o igraph & network: generic packages
o sna: sociometric analysis of networks
o tnet: weighted networks
o ergm: exponential random graph models
o RSiena: dynamic actor-oriented models
o etc.
Gephi software
Download Gephi: http://gephi.org
o Like all network analysis, it needs two kinds of data:
  o Nodes: 1st column = ID; 2nd column = label
  o Links: 1st column = Source; 2nd column = Target
Fill out spreadsheet
o To whom have you ever written a (non-mailing-list) email?
o “Data Laboratory” tab => “import spreadsheet” =>
o Delete non-existing links (in Data Laboratory)
P.S. Data representation
o Adjacency Matrix

         Jorge  Maria  Juan  Magda
Jorge    0      1      0     1
Maria    1      0      0     1
Juan     0      0      0     0
Magda    0      1      1     0

o Edge list

Source   Target
Jorge    Maria
Jorge    Magda
Maria    Jorge
Maria    Magda
Magda    Maria
Magda    Juan

o Adjacency list

Jorge    Maria  Magda
Maria    Jorge  Magda
Juan     --
Magda    Maria  Juan
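The three representations carry the same information and can be converted into one another. The sketch below derives the edge list and the adjacency list from the adjacency matrix and writes the edge list as CSV text with a Source/Target header, the column layout described for the links spreadsheet. It writes to an in-memory buffer; saving it under a file name of your choice is left out.

```python
import csv
import io

names = ["Jorge", "Maria", "Juan", "Magda"]

# Adjacency matrix: row = source, column = target.
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

n = len(names)

# Edge list: one (source, target) pair per 1 in the matrix.
edge_list = [(names[i], names[j]) for i in range(n) for j in range(n) if A[i][j]]

# Adjacency list: for each node, the nodes it points to.
adjacency_list = {names[i]: [names[j] for j in range(n) if A[i][j]]
                  for i in range(n)}

# Edge list as CSV text with the Source/Target header.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target"])
writer.writerows(edge_list)
edges_csv = buffer.getvalue()

print(edges_csv)
print(adjacency_list)
```

Note that Juan has no outgoing links and therefore never appears as a Source in the edge list, which is one reason a separate nodes table is useful.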
Gephi software
Look at “overview” tab
o Zoom in and out: mouse-wheel
o Move around: right mouse-button
o Show labels =>
o Layout (ForceAtlas2; Yifan Hu…) + drag with “hand”
Change color and size of nodes
o Ranking tab
o Color button; Diamond = size button; Data-table
o …
Calculate basics:
o Results will appear in “Data Laboratory” tab
o Sort results
o Visualize results with “Partition” and “Ranking” tabs
o Who’s the biggest spammer? (out-degree)
o Are there different groups? (modularity: density of links “inside” vs. “between” communities)
o How many steps does information need to flow? (average path length)