<<

Analysis

Day 1 , , Clustering, Transitivity Graph Theoretic Concepts • In this section we will cover: – Reachability/Connectednes – Definitions s – Terminology • Connectivity, flows – Adjacency – Isolates, Pendants, Centers – Density concepts – Components, bi-components • E.g, – Walk Lengths, Completeness • Geodesic distance – Walks, trails, paths – Independent paths – Cycles, Trees – Cutpoints, bridges What is a Graph?

• G = (V,E) – A graph is a set of vertices and edges • Vertices, sometimes called nodes, are the actors or entities between which relationships exist – People – Organizations • Edges, sometimes called relations or lines, are the behavior/interaction/relationship of interest – Communicates with – Trusts – Uses Kinds of Network Data

Complete Ego 1- Bonni Bo mode e b Biff Bill Be ty t

Betsy

Dr. Web MD 2- Jones Bill mode PDR Merc k manu al Patient Mo Jane m 1-mode1-mode Complete Complete NetworkNetwork

Information flow within virtual group

Data collected by Cross 2-mode Complete Network

Data compiled from newspaper society pages by Davis, Gardner & Gardner Bipartite graphs

• Used to represent 2- mode data • Nodes can be partitioned into two sets (corresponding to modes) • Ties occur only between sets, not within Complete Network Data vs.

• The term “Complete Network Data” refers to collecting data for/from all actors (vertices) on the graph – The opposite if Ego-Network or Ego-Centric Network data, in which data is collected only from the perspective an individual (the ego) • The term “Complete Graph” refers to a graph where every edge that could exist in the graph, does: – For all i, j (j>i), v(i,j) = 1 Complete Complete Network Graph Data

LEE STEVE GER Y BRAZEY

BERT

STEVE RUSS

BERT

BRAZEY LEE GERY

RUSS EgoEgo NetworkNetwork AnalysisAnalysis

Mainstream Ego Network Social Science Networks Analysis

data perspective

• Combine the perspective of network • Combineanalysis the with perspective the data of of mainstream network analysissocial withscience the data of mainstream social science 1-mode Ego Network

Carter Administration meetings

Data courtesy of Michael Year 1 Link Year 4 2-mode Ego Network Undirected Graphs

• An undirected graph G(V,E) (often referred to simply as a graph or a simple graph) consists of … – Set of nodes|vertices V representing actors – Set of lines|links|edges E representing ties among pairs of actors • An edge is an unordered pair of nodes (u,v) • Nodes u and v adjacent if (u,v) Î E • So E is subset of set of all pairs of nodes • Drawn without arrow heads – Sometimes with dual arrow heads • Used to represent logically symmetric social relations – In communication with; attending same meeting as Directed vs. Undirected Ties

• Undirected relations – Attended meeting with – Communicates daily with • Directed relations – Lent money to • Logically vs empirically directed ties Bonnie – Empirically, even un- Bob directed relations can Biff be non-symmetric due to measurement error Betty

Betsy Directed Graphs (Digraphs)

• Digraph G(V,E) consists of … – Set of nodes V – Set of directed arcs E • An arc is an ordered pair of nodes (u,v) • (u,v) Î E indicates u sends arc to v • (u,v) Î E does not imply that (v,u) Î E • Ties drawn with arrow heads, which can be in both directions • Represent logically non-symmetric or anti- symmetric social relations – Lends money to Transpose • In directed graphs, interchanging rows/columns of adjacency matrix effectively reverses the direction & meaning of ties

Mary Bill John Larry Mary Bill John Larry Mary 0 1 0 1 Mary 0 1 0 1 Bill 1 0 0 1 Bill 1 0 1 0 John 0 1 0 0 John 0 0 0 1 Larry 1 0 1 0 Larry 1 1 0 0 Gives money to Gets money from

bill bill mary mary

john john larry larry Valued Digraphs (vigraphs)

• A valued digraph G(V,E,W) consists of … – Set of nodes V – Set of directed arcs E • An arc is an ordered pair of 3.72 nodes (u,v) • (u,v) ∈ E indicates u sends 2.9 5.28 arc to v 0.1 • (u,v) ∈ E does not imply that 3.2 (v,u) ∈ E 1.2 5.1 – Mapping W of arcs to real values 3.5 • Values can represent such things as 8.9 – Strength of relationship 9.1 1.5 – Information capacity of tie – Rates of flow or traffic across tie – Distances between nodes – Probabilities of passing on information – Frequency of interaction Valued Adjacency Matrix

Dichotomized • The diagram below uses solid lines to Jim Jill Jen Joe represent the adjacency matrix, while the numbers along the solid line (and Jim - 1 0 1 dotted lines where necessary) Jill 1 - 1 0 represent the proximity matrix. • In this particular case, one can derive Jen 0 1 - 1 the adjacency matrix by dichotomizing Joe 1 0 1 - the proximity matrix on a condition of pij <= 3.

Distances btw offices Jill Jim Jill Jen Joe 1 3 Jim - 3 9 2 Jen Jim Jill 3 - 1 15 9 Jen 9 1 - 3 15 Joe 2 15 3 - 3 2 Joe Node-related concepts • Degree BRAZE – Y The number of ties incident LEE upon a node – In a digraph, we have indegree STEV EVANDER BER E (number of arcs to a node) and T outdegree (number of arcs from BILL a node) RUS GERY S MICHAEL • Pendant HARRY – A node connected to a DON JOH component through only N one edge or arc HOLLY • A node with degree 1 • Example: John PAULINE PAM PAT • Isolate

CAROL – A node which is a component ANN on its own JENNIE • E.g., Evander Graph traversals BRAZEY

LEE • Walk – Any unrestricted traversing of vertices STEVE across edges (Russ-Steve-Bert-Lee-Steve) BERT

• Trail BILL – A walk restricted by not repeating an edge GERY or arc, although vertices can be revisited RUSS (Steve-Bert-Lee-Steve-Russ) MICHAEL • Path HARRY – A trail restricted by not revisiting any (Steve- DON Lee-Bert-Russ) JOHN • Geodesic Path HOLLY – The shortest path(s) between two vertices (Steve- Russ-John is shortest path from Steve to John) PAULINE • Cycle PAM PAT – A cycle is in all ways just like a path except that it ends where it begins CAROL – Aside from endpoints, cycles do not repeat nodes ANN – E.g. Brazey-Lee-Bert-Steve-Brazey JENNIE LengthLength && DistanceDistance

• Length of a path (or any 10 • Length of a path (or 10 12 walk)any is the walk) number is the of 12 links numberit has of links it has 1111 8 8 • The• GeodesicThe Geodesic Distance Distance 9 9 (aka graph-theoretic(aka graph-theoretic distance)distance) between between two two 22 7 7 nodesnodes is the is lengththe length of theof the shortest path shortest path 3 1 – Distance from 5 to 8 is 2, 3 1 6 – Distancebecause from the 5 toshortest 8 is 2, 6 becausepath the (5- shortest1-8) has two pathlinks

(5-1-8) has two links 4 5 4 5 Geodesic Distance Matrix

a b c d e f g 0 1 2 3 2 3 4 a 1 0 1 2 1 2 3 b c 2 1 0 1 1 2 3 d 3 2 1 0 2 3 4 e 2 1 1 2 0 1 2 f 3 2 2 3 1 0 1 g 4 3 3 4 2 1 0 SubgraphsSubgraphs

• Set of nodes bc • Set of nodes – Is just a set of nodes b c – Is just a set of nodes d • A subgraph a d • A subgraph a – Is set –of Isnodes set of together nodes together with ties amongwith them ties among them f f ee • An induced• An induced subgraphsubgraph – Subgraph defined by a set – Subgraph defined by a set of bcb c nodes of nodes – Like pulling the nodes and – Like pulling the nodes and d d ties out of the original a a ties out graphof the original graph

f e f e Subgraph induced by Subgraphconsidering induced the set by{a,b,c,f,e} considering the set {a,b,c,f,e} Components

• Maximal sets of nodes in which every node can reach every other by some path (no matter how long) • A graph is connected if it has just one component

It is relations (types of tie) that define different networks, not components. A network that has two components remains one (disconnected) network. Components in Directed Graphs

• Strong component – There is a directed path from each member of the component to every other • Weak component – There is an undirected path (a weak path) from every member of the component to every other – Is like ignoring the direction of ties – driving the wrong way if you have to A graph with 1 Weak Component & 4 Strong BILL components HARRY DON

MICHAEL HOLLY

PAT GERY LEE STEVE CAROL JENNIE BRAZEY PAM

RUSS

BERT JOHN ANN PAULINE Cutpoints and Bridges

• Cutpoint s – If a tie is a bridge, at least one of A node which, e its endpoints must be a if deleted, q cutpoint would b

increase the g m number of d a components p f • Bridge l h i – A tie that, if j c removed, would k increase n the number r of o components Data Representation

• Adjacency matrix • Edgelist • Adjacency/node list data representation – adjacency matrix

! Represen)ng!edges!(who!is!adjacent!to! whom)!as!a!matrix!

! Aij!=!1!if!node!i!has!an!edge!to!node!j! !!!=!0!if!node$i$does!not!have!an!edge!to!j!

! Aii!=!0!unless!the!network!has!self`loops!

! Aij!=!Aji!if!the!network!is!undirected,! or!if!i!and$j!share!a!reciprocated!edge! data representation – adjacency matrix

2( 0! 0! 0! 0! 0! 3( 1( 0! 0! 1! 1! 0! A!=! 0! 1! 0! 1! 0! 0! 0! 0! 0! 1! 5( 4( 1! 1! 0! 0! 0!

Issues:

1. Your dataset will likely contain network data in a non-matrix format; 2. Large, sparse networks take way too much space if kept in a matrix format data representation – adjacency matrix

Which adjacency matrix represents this network?

A)'

B)'

C)'

9/21/15! Jure!Leskovec!and!Lada!Adamic,!Stanford!CS224W:!Social!and!Informa)on!Network!Analysis! 60! data representation – edgelist

! Edge!list! ! 2,!3! 2( 3( ! 2,!4! 1( ! 3,!2! ! 3,!4! ! 4,!5! 5( 4( ! 5,!2! ! 5,!1! ! Edgelistdata representation Data Format – edgelist with weights

Source Destination Weight

A B A 1 B E 1 B C D C A 1 C E 1 E C D 1 Note: Weights are optional.

Page 14 COMM 645 Lab Fall 2012 data representation – nodelist

! ! Adjacency!list! ! is!easier!to!work!with!if! 2( network!is! 3( 1( ! large! ! sparse! ! quickly!retrieve!all!neighbors! 4( 5( for!a!node! ! 1:! ! 2:!3!4! ! 3:!2!4! ! 4:!5! ! 5:!1!2! Some strategies for network data collection Network Data Collection

Ego Can use standard sampling techniques (e.g. random sample) Networks Each respondent describes their own relationships (name generators).

Boundary specification? Complete Each respondent reports their own relationships within the network. Networks Could use a roster that people use to identify contacts. Cognitive Ask not only for a person’s own relationships, but also for perceived Social relationships between other people in your population. Structures

Individuals included in the sample identify contacts (friends, sexual Snowball partners, etc.) who are added to the study at the next step. Sampling Often used in preventive medicine.

Secondary Digital traces, social media, hyperlink networks and many more. Data

Page 16 COMM 645 Lab Fall 2012 Centrality, Clustering &Transitivity networksGraph-theoretic are complex measures Which vertices are important? Can we understand them better without a “ridiculogram”?

M.Grandjean, 2014 Leonid E. Zhukov (HSE) Lecture 5 10.02.2015 2 / 22 simplifying networks – undirected graph

Consider a classroom with 30 students. How many different possible networks could exist to represent the friendships in that classroom? simplifying networks – undirected graph simplifying networks – undirected graph How do we determine who is “important” in a Network?

How to describe an individual position in the network?

- Degree (number of connections)

- Clustering

- Distance to other nodes

- Centrality, influence, power describing networks describing networks

degree: 3 number of connections k

2 ki = Aij 4 2 j X 2 1

1 n 1 n n 1 n n number of edges m = k = A = A 2 i 2 ij 2 ji i=1 i=1 j=1 j=1 i=1 X X X X X 1 n 2m mean degree k = ki = h i n n i=1 X describing networks describing networks

degree: 3 number of connections k

2 ki = Aij 4 2 j X 2 1

degree sequence 1, 2, 2, 2, 3, 4 { }

1 3 1 1 Pr(k)= 1, , 2, , 3, , 4, 6 6 6 6 ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ degree distributionsdescribing networks

22 18 0.35 20 25 26 8 10 0.3 28 2 4 30 24 0.25 31 27 13 1 3 0.2 34 15 Pr(k) 6 0.15 32 16 7 5 14 0.1 19 12 33 21 0.05 17 9 11 29 23 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 degree, k Zachary karate club*

* Zachary, J. Anthropological Research 33, 452–473 (1977). describing networksdescribing networks

degree: 3 number of connections k

2 ki = Aij 4 2 j X 2 1 when does node degree matter? network degrees describing networks

spreading processes on networks biological (diseases)! • SIS and SIR models! susceptible-infected-susceptible social (information)! S I • SIS, SIR models! • threshold models susceptible-infected-recovered

threshold S I

R 1 1 0 ⌃ network degrees describing networks

cascade ! epidemic! branching process! spreading process

R0 = net reproductive rate! = average degree k h i

R0 =0.923 ... caveat:

R0 is the basic reproduction ignores network structure, number: the number of dynamics, etc. infected people an infected person can reproduce. network degrees describing networks

R0 < 1 R0 =1 R0 > 1 “sub-critical”! “critical”! “super-critical”! small outbreaks outbreaks of all sizes global epidemics describing networks network degrees describing networks how could we halt the spread? • break network into disconnected pieces letters to nature

called scale-free networks, which include the World-Wide Web3–5, The inhomogeneous connectivity distribution of many real net- the Internet6, social networks7 and cells8. We find that such works is reproducedletters by the scale-free to nature model17,18 that incorporates networks display an unexpected degree of robustness, the ability two ingredients common to real networks: growth and preferential 3–5 called scale-freeof their networks, nodes to which communicate include the being World-Wide unaffected Web even, byThe un- inhomogeneousattachment. connectivity The model distribution starts with ofm many0 nodes. real At net- every time step t a the Internetrealistically6, social high networks failure7 and rates. cells However,8. We find error that tolerance such comesworks at is a reproducednew node by is the introduced, scale-free model which17,18 isthat connected incorporates to m of the already-

networkshigh display price an in unexpected that these degree networks of robustness, are extremely the ability vulnerabletwo ingredients to existing common nodes. to real The networks: probability growthΠi that and preferentialthe new node is connected of theirattacks nodes to (that communicate is, to the selection being unaffected and removal even of by a un-few nodesattachment. that Theto node modeli startsdepends with m on0 nodes. the connectivity At every timeki stepoft nodea i such that realistically high failure rates. However, error tolerance comes at a new node is introduced, which is connected to m of the already- play a vital role in maintaining the network’s connectivity). Such Πi ¼ ki=Sjkj. For large t the connectivity distribution is a power- high priceerror in tolerance that these and networks attack are vulnerability extremely are vulnerable generic to propertiesexisting of nodes.law The following probabilityPðkÞ¼Πi that2m2 the=k3. new node is connected attacks (thatcommunication is, to the selection networks. and removal of a few nodes that to node i dependsThe interconnectedness on the connectivity ofki aof network node i issuch described that by its diameter play a vital role in maintaining the network’s connectivity). Such Π ¼ k =S k . For large t the connectivity distribution is a power- The increasing availability of topological data on large networks,i i j j d, defined as the average length of the shortest paths between any error tolerance and attack vulnerability are generic properties of law following PðkÞ¼2m2=k3. aided by the computerization of data acquisition, had led to great two nodes in the network. The diameter characterizes the ability of communication networks. The interconnectedness of a network is described by its diameter advances in our understanding of the generic aspects of network two nodes to communicate with each other: the smaller d is, the The increasing availability of topological9–16 data on large networks, d, defined as the average length of the shortest paths between any aided bystructure the computerization and development of data acquisition,. The existing had led empirical to great andtwo theo- nodes inshorter the network. is the The expected diameter path characterizes between them. the ability Networks of with a very advancesretical in our results understanding indicate of that the complex generic aspectsnetworks of cannetwork be dividedtwo nodes into tolarge communicate number of with nodes each can other: have thequite smaller a smalld diameter;is, the for example, 20 structuretwo and major development classes9–16 based. The on existing their connectivityempirical and distribution theo- shorterP(k), is thethe expected diameter path of between the WWW, them. with Networks over 800 with million a very nodes , is around retical resultsgiving indicate the probability that complex that networksa node in can the be network divided is into connectedlarge numberto k 19 of (ref. nodes 3), can whereas have quite social a small networks diameter; with for over example, six billion individuals two majorother classes nodes. based The on first their class connectivity of networks distribution is characterizedP(k), bythe a diameterP(k) of the WWW, with over 800 million nodes20, is around giving thethat probability peaks at an that average a nodek inand the network decays exponentially is connected to fork large19 (ref.k. The 3), whereas social networks with over six billion individuals other nodes.most The investigated first class ofexamples networks⟨ ⟩ of is such characterized exponential by a networksP(k) are the that peaksrandom at an average graphk modeland decays of Erdo exponentially¨s and Re´nyi for9,10 largeandk. the The small-world 12 ⟨ ⟩ a most investigatedmodel of Watts examples and of Strogatz such exponential11, both leading networks to a fairly are the homogeneous 9,10 E SF model of Erdo¨s and Re´nyi and the small-world 12 Failure network, in which11 each node has approximately the same number a model ofof Watts links, andk StrogatzӍ k . In, both contrast, leading results to a fairly on homogeneous the World-Wide Web 10E SF Attack network,(WWW) in which3–5 each, the⟨ node⟩ Internet has approximately6 and other the large same networks number17–19 indicate Failure of links, k k . In contrast, results on the World-Wide Web 10 Attack thatӍ many systems belong to a class of inhomogeneous networks, (WWW)3–5, the⟨ ⟩ Internet6 and other large networks17–19 indicate 8 that manycalled systems scale-free belong networks, to a class of for inhomogeneous which P(k) decays networks, as a power-law, Ϫ g 8 called scale-freethat is P networks,ðkÞϳk , for free which of a characteristicP(k) decays as scale. a power-law, Whereas the prob- that is PabilityðkÞϳk Ϫ thatg, free a node of a characteristic has a very large scale. number Whereas of connections the prob- (k q k ) 6 is practically prohibited in exponential networks, highly connected⟨ ⟩ ability that a node has a very large number of connections (k q k ) 6 is practicallynodes prohibited are statistically in exponential significant networks, in scale-free highly connected networks⟨ ⟩ (Fig. 1). nodes are statisticallyWe start by significant investigating in scale-free the robustness networks (Fig. of the 1). two basic con- 4 ¨ ´ 9,10 0.00 0.02 0.04 We startnectivity by investigating distribution the models, robustness the of Erdo thes–Re twonyi basic (ER) con- model that4 d ¨ ´ 9,10 0.00 0.02 0.04 nectivityproduces distribution a network models, the with Erdo ans–Re exponentialnyi (ER) model tail, andthat the scale-freed producesmodel a network17 with with a power-law an exponential tail. In the tail, ER and model the scale-free we first define the N b c network17 degrees model withnodes, a power-law anddescribing then connecttail. In thenetworks each ER model pair of we nodes first definewith probability the N p. This b c nodes, and then connect each pair of nodes with probability p. This 15 algorithm generates a homogeneous network (Fig. 1), whose con-15 Internet WWW algorithmnectivity generates follows a homogeneous a Poisson network distribution (Fig. peaked 1), whose at con-k and decaying Internet WWW 20 whatnectivity promotes follows a Poisson spreading? distribution peaked at k and decaying⟨ ⟩ 20 exponentially for k q k . ⟨ ⟩ •exponentiallyhigh-degree for k vertices*q k . !⟨ ⟩ 10 ⟨ ⟩ 10 Attack Attack Attack Attack • centrally-located vertices 15 15 5 homogeneous in degree heterogeneous in degree 5 a b a b Failure Failure Failure Failure 0 0 10 10 0.00 0.010.00 0.02 0.010.00 0.02 0.010.00 0.02 0.01 0.02 f f

Figure 2 ChangesFigure in the 2 diameterChangesd of in the the network diameter asd aof function the network of the fractionas a functionf of the of the fraction f of the removed nodes. aremoved, Comparison nodes. betweena, Comparison the exponential between (E) and the scale-free exponential (SF) (E) network and scale-free (SF) network models, each containingmodels,N each¼ 10 containing;000 nodesN and¼ 20,00010;000 links nodes (that and is, 20,000k ¼ 4). links The (that blue is, k ¼ 4). The blue ⟨ ⟩ ⟨ ⟩ symbols correspondsymbols to the correspond diameter of to the the exponential diameter (triangles) of the exponential and the scale-free (triangles) and the scale-free (squares) networks(squares) when a fraction networksf of when the nodes a fraction are removedf of the randomly nodes are (error removed tolerance). randomly (error tolerance). Red symbols show the response of the exponential (diamonds) and the scale-free (circles) Red symbols show the response of the exponential (diamonds) and the scale-free (circles) networks to attacks, when the most connected nodes are removed. We determined the f Exponential Scale-free networks to attacks, when the most connected nodes are removed. We determined the f Scale-freedependence of the diameter for different system sizes (N ¼ 1;000; 5,000; 20,000) and Exponential dependence of the diameter for different system sizes (N ¼ 1;000; 5,000; 20,000) and Figure 1 Visual illustration of the difference between an exponential and a scale-free found that the obtained curves, apart from a logarithmic size correction, overlap with network. a,Figure The exponential 1 Visual illustrationnetwork is homogeneous: of the difference most between nodes have an exponential approximately and a scale-freethose shown in a,found indicating that that the the obtained results curves, are independent apart from of the a logarithmic size of the system. size correction, We overlap with the same numbernetwork. ofa links., Theb, exponential The scale-free network network is homogeneous: is inhomogeneous: most the nodes majority have of approximatelynote that the diameterthose of shown the unperturbed in a, indicating (f ¼ that0) scale-free the results network are independent is smaller than of the that size of the system. We the nodes havethe same one or number two links of but links. a fewb, The nodes scale-free have a large network number is inhomogeneous: of links, theof majority the exponential of note network, that the indicating diameter that of scale-free the unperturbed networks (f use¼ 0) the scale-free links available network to is smaller than that guaranteeingthe that nodes the systemhave one is fully or two connected. links but Red, a few the nodes five nodes have with a large thehighest number of links,them more efficiently,of the generating exponential a more network, interconnected indicating web. that scale-freeb, The changes networks in the use the links available to number ofguaranteeing links; green, their that first the neighbours. system is fully Although connected. in the exponential Red, the five network nodes only with thediameter highest of the Internetthem more under efficiently, random failures generating (squares) a more or attacks interconnected (circles). We web. usedb the, The changes in the 27% of thenumber nodes are of reachedlinks; green, by the their five first most neighbours. connected nodes, Although in the in scale-freethe exponential networktopological only map ofdiameter the Internet, of the containing Internet under 6,209 random nodes and failures 12,200 (squares) links ( k or¼ attacks3:4), (circles). We used the ⟨ ⟩ network more27% than of the60% nodes are reached, are reached demonstrating by the five the most importance connected of the nodes, connected in the scale-freecollected by the Nationaltopological Laboratory map of for the Applied Internet, Network containing Research 6,209http://moat.nlanr.net/ nodes and 12,200 links ( k ¼ 3:4), ⟨ ⟨ ⟩ nodes in thenetwork scale-free more network than 60% Both arenetworks reached, contain demonstrating 130 nodes and the 215 importance links of theRouting/rawdata/ connected collected. c, Error by (squares) the National and attack Laboratory (circles) for survivability Applied Network of the World-Wide Research http://moat.nlanr.net/ ⟩ 3 ⟨ ( k ¼ 3:3).nodes The network in the scale-free visualization network was done Both using networks the Pajek contain program 130 for nodes large and 215Web, links measured onRouting/rawdata/ a sample containing. c, 325,729 Error (squares) nodes and and 1,498,353 attack (circles) links , survivability such that of the World-Wide ⟨ ⟩ ⟩ network analysis:( k ¼ http://vlado.fmf.uni-lj.si/pub/networks/pajek/pajekman.htm3:3). The network visualization was done using the Pajek. program fork large¼ 4:59. Web, measured on a sample containing 325,729 nodes and 1,498,353 links3, such that ⟨ ⟩ ⟨ ⟩ ⟨ ⟩ network analysis: http://vlado.fmf.uni-lj.si/pub/networks/pajek/pajekman.htm . k ¼ 4:59. NATURE | VOL 406 | 27 JULY 2000⟨ | www.nature.com © 2000 Macmillan Ma⟩ gazines Ltd⟨ ⟩ 379

NATURE | VOL 406 | 27 JULY 2000 | www.nature.com © 2000 Macmillan Magazines Ltd 379 Centrality examples

Closeness centrality

from www.activenetworks.net

Leonid E. Zhukov (HSE) Lecture 5 10.02.2015 11 / 22 Centrality examples

Betweenness centrality

from www.activenetworks.net

Leonid E. Zhukov (HSE) Lecture 5 10.02.2015 12 / 22 Centrality examples

Eigenvector centrality

from www.activenetworks.net

Leonid E. Zhukov (HSE) Lecture 5 10.02.2015 13 / 22 Centrality (node-level)

• A measure of how network structure and position contributes to a node’s importance • Value associated with every node • Many different measures which capture different aspects • Can be characterized by the nature of the flow describing networks Centrality measures

• Degree – how well connected; direct influence

• Closeness – how far from all others – how long information takes to arrive

• Betweenness – brokerage, gatekeeping, control of info

• Eigenvector – being connected to the well connected (a popularity & power measure) Who’s Important in this network?

Data courtesy of David Krackhardt

Degree Centrality

• Index of exposure to what is flowing through the network

• Interpreted as opportunity to influence & be influenced directly

• Predicts variety of outcomes from virus resistance to power & leadership to job satisfaction to knowledge Degree Bob Con Joe Kim Lin Pam Pat Pete Red Rick Rgr Sally Sam Sue Tim Tylr Will Bob 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 5 Con 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 4 Joe 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 4 Kim 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 4 Lin 0 0 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 8 Pam 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 3 Pat 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 3 Pete 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 6 Red 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 3 Rick 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 5 Rgr 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 1 7 Sally 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 5 Sam 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 3 Sue 1 0 0 1 0 1 0 1 1 0 0 1 0 0 0 0 0 6 Tim 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 1 5 Tylr 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 3 Will 0 1 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0 6 5 4 4 4 8 3 3 6 3 5 7 5 3 6 5 3 6 Degree Centrality with Directed Data

• Indegree- The number of ties directed to the node

• Outdegree- The number of ties that the node directs to others Degree Centrality with Valued Data OUTDEGREE

MT6 MT71 MT72 MT83 MT93 MT210 MT215 MT272 MT6 0 100 500 1600 1100 300 2450 1500 7550

MT71 0 0 0 0 0 0 0 0 0

MT72 0 0 0 0 0 0 0 0 0 MT83 0 0 0 0 0 0 0 0 0

MT93 0 0 0 0 0 0 0 0 0 MT210 0 0 0 0 0 0 0 0 0

MT215 0 0 0 0 0 0 0 0 0

MT272 0 0 0 0 0 0 0 0 0

INDEGREE 0 100 500 1600 1100 300 2450 1500 0

NOTE: some software may binarize networks before calculating degree with valued data. Centrality: who’s important based on their network

Best measure if importance means: how popular you are

how many people you know It is a local measure! Degreeformula centrality for degree (normalized)

k CD (i)= i N 1

M.E.J. Newman. (2010). Networks: An Introduction. Oxford University Press. Betweenness is not

degree iseverything not everything… X and Y do not difer much for betweenness

X Y

We want to capture: Being close to all nodes Closeness Closeness Centrality

• Is an inverse measure of centrality • The extent to which a node is close from all other nodes • Index of expected time until arrival for a given node of whatever is flowing through the network –Gossip network: central player hears things first, on average describing networks describing networks

position = centrality: harmonic, closeness centrality!

importance = being in “center” of the network

1 1 harmonic ci = n 1 dij j=i X6

length of shortest path

`ij if j reachable from i distance: dij = Boldi & Vigna, arxiv:1308.2140 (2013)! otherwise Borgatti, Social Networks 27, 55–71 (2005) ⇢ 1 Closeness centrality

1 N ClosenessC˜C (i)= centralityd (i, j) closeness centrality formula2 3 j=1 X 1 4 N 5 C˜C (i)= d (i, j) 2 3 j=1 XX Y 4 5 Normalized C˜C (i) CC (i)=X Y N 1 All other nodes in the network M.E.J. Newman. (2010). Networks: An Introduction. Oxford University Press.

What happens to isolates? GEODESIC DISTANCE MATRIX used to Calculate Closeness

Bob Con Joe Kim Lin Pam Pat Pete Red Rick Rgr Sally Sam Sue Tim Tylr Will Bob 0 1 3 2 3 1 3 1 2 3 2 1 4 1 3 4 2 36 Con 1 0 2 2 2 2 2 1 2 2 1 2 3 2 2 3 1 30 Joe 3 2 0 4 1 4 2 3 4 1 1 4 2 4 1 2 2 40 Kim 2 2 4 0 4 2 4 1 1 4 3 1 5 1 4 5 3 46 Lin 3 2 1 4 0 4 1 3 4 1 1 4 1 4 1 1 1 36 Pam 1 2 4 2 4 0 4 2 2 4 3 1 5 1 4 5 3 47 Pat 3 2 2 4 1 4 0 3 4 2 1 4 2 4 2 2 1 41 Pete 1 1 3 1 3 2 3 0 1 3 2 1 4 1 3 4 2 35 Red 2 2 4 1 4 2 4 1 0 4 3 2 5 1 4 5 3 47 Rick 3 2 1 4 1 4 2 3 4 0 1 4 1 4 2 2 1 39 Rgr 2 1 1 3 1 3 1 2 3 1 0 3 2 3 1 2 1 30 Sally 1 2 4 1 4 1 4 1 2 4 3 0 5 1 4 5 3 45 Sam 4 3 2 5 1 5 2 4 5 1 2 5 0 5 2 1 2 49 Sue 1 2 4 1 4 1 4 1 1 4 3 1 5 0 4 5 3 44 Tim 3 2 1 4 1 4 2 3 4 2 1 4 2 4 0 1 1 39 Tylr 4 3 2 5 1 5 2 4 5 2 2 5 1 5 1 0 2 49 Will 2 1 2 3 1 3 1 2 3 1 1 3 2 3 1 2 0 31 NOTE: if data is directed, you can calculate in-closeness and out-closeness centrality closeness centrality example

Closeness Centrality: Toy Example

A B C D E

$ N '− 1 d(A, j) & ∑ ) −1 −1 j 1 $1 + 2 + 3+ 4' $10 ' C' (A) = & = ) = = = 0.4 c & N −1 ) %& 4 () %& 4 () & ) % (

€ Closeness in directed networks ! choose a direction ! in-closeness (e.g. prestige in citation networks) ! out-closeness

! usually consider only vertices from which the node i in question can be reached

Closeness in directed networks

• In-Closeness centrality measures the degree to which a node can be easily reached *from* other nodes (i.e. using edges coming in towards the node) where easily means shortest distance.

• Out-Closeness centrality measures the degree to which a node can easily reach other nodes (i.e. using edges out from the node), and easily again means shortest distance.

• If there is no (directed) path between vertex v and i then the total number of vertices is used in the formula instead of the path length. Eigenvector

Eigenvector: The extent to which a given node is connected to other well-connected nodes

Data courtesy of David Krackhardt

• Node has high score if connected to many nodes that are themselves well connected (you are important to the extent your friends are important)

• Indicator of popularity, – Google Page Rank

• Like degree, is index of exposure, risk

• However, tends to identify centers of large cliques describing networks eigenvector centrality

position = centrality: PageRank, Katz, eigenvector centrality!

importance = sum of importances* of nodes that point at you! I I = j i k j i j X! or, the left eigenvector of Ax = x

Boldi & Vigna, arxiv:1308.2140 (2013)! Borgatti, Social Networks 27, 55–71 (2005) *modulo several technical details Eigenvector centrality

A node is important if it is connected to important nodes N Xi = Xj Xi = AijXj AX = X j ⇤(i) j=1 2X X The solution (when exists) gives the node centrality. We take the highest

Note: Bonacich eigenvector centrality includes a parameter ß Thiswhich allows concept one to adjust is at how the important core are of neighbours the rankingin different path lengths to a algorithmnode’s centrality of versus Google how important is the number of neighbours in path length = 1; high ß leads to low M.E.J. Newman. (2010). Networks: An Introduction. Oxford attenuation and the global networkUniversity Press. structure matters; low ß yields high attenuation and only the immediate friends matter. When ß = 0, equivalent to degree centrality. Bonacich'eigenvector'centrality'

ci (β) = ∑(α + βcj )Aji j c(β) = α(I − βA)−1 A1

• α is a normalization constant • β determines how important the centrality of your neighbors is • A is the adjacency matrix (can be weighted) • I is the identity matrix (1s down the diagonal, 0 off-diagonal) • 1 is a matrix of all ones. Bonacich'Power'Centrality:'attenuation'factor'β small β #'high'attenuation' 'only'your'immediate'friends'matter,'and'their' importance'is'factored'in'only'a'bit' ' high'β #'low'attenuation' 'global'network'structure'matters'(your'friends,' your'friends''of'friends'etc.)'

' β'='0'yields'simple'degree'centrality'

ci (β) = ∑(α + βcj )Aji j Bonacich'Power'Centrality:'examples

β=.25

β=-.25

Why does the middle node have lower centrality than its neighbors when β is negative? NOTE: Node with highest eigenvector centrality is not always node with highest degree centrality

Highest eigenvector centrality Highest degree centrality Degree is not everything Who is more important in the following Who is more important insituations: the networks below? X or Y X or Y?

We want to capture: Ability to broker between groups

Likelihood that information originating anywhere in the network reaches you ability to broker between groups ability to broker between groups

Betweenness Centrality

• How often a node lies along the shortest path between two other nodes • Index of potential for gatekeeping, brokering, controlling the flow, and also of liaising otherwise separate parts of the network • Interpreted as indicating power and access to diversity of what flows; potential for synthesizing formula Betweenness centrality d (i) C˜B (i)= jk djk Xj

Normalized C˜B CB (i)= (N 1) (N 2) /2

Number of pairs of vertices excluding i

M.E.J. Newman. (2010). Networks: An Introduction. Oxford University Press.

For directed graphs: when normalizing, we have (N-1)*(N-2) instead of (N-1)*(N-2)/2, because we have twice as many ordered pairs as unordered pairs. Betweenness in directed networks

! A node does not necessarily lie on a geodesic (shortest path) from j to k if it lies on a geodesic from k to j

j

k An example I

Betweenness in directed networks

6 The node betweenness for the graph on the left:

4 3 Node Betwenness 1 0 2 1.5

5 3 1 2 4 4 5 3

1 6 0

Donglei Du (UNB) 20 / 85 Betweenness in directed networks How to find the betweeness in the example?

For example: for node 2, the (n 1)(n 2)/2 = 5(5 1)/2 = 10 terms in the summation in the order of 13, 14, 15, 16, 34, 35, 36, 45, 46, 56 are 1 0 0 0 0 1 0 0 0 0 + + + + + + + + + = 1.5. 1 1 1 1 1 2 1 1 1 1 Here the denominators are the number of shortest paths between pair of edges in the above order and the numerators are the number of shortest paths passing through edge 2 between pair of edges in the above order.

Donglei Du (UNB) Social Network Analysis 21 / 85

Summarizing

Centrality indices are answers to the question "What characterizes an important node?” Betweenness Closeness The word "importance" has a wide number of meanings, leading to many diferent definitions of centrality.

Eigenvector Degree Source: http://en.wikipedia.org/wiki/Centrality WhichWhich nodesnodes are are most most “central”? “central”? Definition of ‘central’ varies by context/ purpose. Local measure: degree Relative to the rest of network: betweenness closeness eigenvector (Bonacich power centrality) Data Types: Centrality

Disconnected or Binary or Valued Directed or Connected Undirected Degree Both Both Both Closeness Strongly Binary Both Connected Betweenness Both Binary Both Eigenvector Connected Both Undirected check yourCentrality: understanding Check Your Understanding

• generally different centrality metrics will be positively correlated • when they are not, there is likely something interesting about the network • suggest possible topologies and node positions to fit each square

Low Low Low Degree Closeness Betweenness High Degree

High Closeness

High Betweenness

adapted from a slide by James Moody Centrality: Check Your Understanding

• generally different centrality metrics will be positively correlated • when they are not, there is likely something interesting about the network • suggest possible topologies and node positions to fit each square

Low Low Low Degree Closeness Betweenness High Degree Embedded in cluster Ego's connections that is far from the are redundant - rest of the network communication bypasses him/her

High Closeness Key player tied to Probably multiple important/active paths in the players network, ego is near many people, but so are many others High Ego's few ties are Very rare cell. Betweenness crucial for network Would mean that flow ego monopolizes the ties from a small number of people to many others. adapted from a slide by James Moody Fun Applications of Centrality • Oracleofkevinbacon.org – 6 degrees of Kevin Bacon – Can you find anyone with a Bacon score > 4?

• Theyrule.net – Board overlaps of top corporations

• Oilmoney.priceofoil.org – Tracking petroleum industry campaign contributions Global Properties/Graph-level approach: Centralization

To measure the degree to which the graph as a whole is centralized,centralization:'skew'in'distribution' we look at dispersion of centrality

How much variation is there in the centrality scores among the nodes? Freemans general formula for centralization (can use other metrics, e.g. gini coefficient or standard deviation):

maximum value in the network g * ∑ [CD (n ) − CD (i)] C = i=1 D [(N −1)(N − 2)]

€ network position

most! vast wilderness most! centralized of in-between decentralized degree'centralization'examples'degree centralization examples

CD = 0.167 CD = 1.0

CD = 0.167 real-worldrealNworld'examples' networks

example financial trading networks

high in-centralization: low in-centralization: one node buying from buying is more evenly many others distributed “model in which opinion flows only from the media to influentials, and then only from influentials to the larger populace is deprecated” “large cascades of influence are driven not by influentials, but by a critical mass of easily influenced individuals.” “influence is not really about the influencer as much about the susceptibles” what have we learnt from it… Baker & Faulkner (1993): Social Organization of conspiracy

(reconstructs communicationBaker networks & Faulkner: in three well -known price-fixing conspiracies in the heavy electrical equipment industry to study social organization)Social organization of conspiracy# Questions: How are relations organized to facilitate illegal behavior?

Pattern of communication maximizes concealment, and predicts the criminal verdict.

Inter-organizational cooperation is common, but too much cooperation can thwart market competition, leading to (illegal) market failure.

Illegal networks differ from legal networks, in that they must conceal their activity from outside agents. A Secret society should be organized to (a) remain concealed and (b) if discovered make it difficult to identify who is involved in the activity

The need for secrecy should lead conspirators to conceal their activities by creating sparse and decentralized networks. The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry, Wayne E. Baker, Robert R. Faulkner. American Sociological Review, Vol. 58, No. 6 (Dec., 1993), pp. 837-860. Published by: American Sociological Association, http://www.jstor.org/stable/2095954.

ClusteringClustering coecient

A feature of interest when studying a network is its transitivity,i.e.,if i j and j k then i k. ⇠ ⇠ ⇠ If node i is connected to nodes j and k, how often is it the case that j and k are also connected?

When i, j, and k are all connected to each other they form a triangle. Clustering

What fraction of my friends are friends of each other?

(1)Calculate clustering for a particular node;

(1)Average individual clustering coefficients across the network (it weights clustering node by node) DifferencesinClustering (2)Overall clustering: out of all possible triplets in the network, what the frequency with which it is connected? Average tends to 1

Overall tends to 0 Local clustering coecient

If i is a node with ki 2 then its local clustering coecient is defined as: local clustering coefficient NumberLocal of clustering triangles containing coecienti Ci = , If i is a node with ki Number2 then its oflocal pairs clustering of neighbours coecient is of definedi as: ti =Number1 of triangles, containing i C = ki(ki 1) , i Number2 of pairs of neighbours of i t where t =[A3] =. i , i ii 1 k (k 1) 2 i i 3 Possible triangles including node 1: where ti =[A ]ii. Possible(1 triangles2 including3), (1 3 node5) 1:, (1 2 5), { (1 2(13), (15 34), 5)(1, (1 2 2 4)5),,(1 3 4) . { } (1 5 4), (1 2 4), (1 3 4) . } Actual triangles: Actual triangles: (1 2 3), (1 3 5) . (1 2{ 3), (1 3 5) . } { } C = 1 . 1 1 C3 1 = 3 . global clusteringGlobal clusteringcoefficient coecient

There are two alternative definitions of the global clustering coecient:

Version 1: Average Clustering Coefficient N 1 C = C = C . h ii N i Xi=1 Version 2: Overall Clustering Coefficient 3 t C = ⇥ number of connected triples where t is the total number of triangles. If there are no 1 3 self-loops then t = 3 trace(A ). The clustering coecient of a node v is, intuitively, the proportion of its ”friends” who are friends themselves. Mathematically, it is the proportion of neighbours of v which are neighbours themselves. In adjacency matrix notation,

u,w V au,vaw,vau,w C(v)= 2 . u,w V au,vaw,v P 2 P The (average) clustering coecient is defined as 1 C = C(v). V v V | | X2 Note that

au,vaw,vau,w u,w V X2 is the number of triangles involving25 v in the graph. Similarly,

au,vaw,v u,w V X2 is the number of 2-stars centred around v in the graph. The clustering coecient is thus the ratio between the number of triangles and the number of 2-stars. The clustering coecient describes how ”locally dense” a graph is. Sometimes the clustering coecient is also called the transitivity.

The clustering coecient in the Florentine family example is 0.1914894; the average clustering coecient in the Yeast data is 0.1023149.

26 Social balance theories

Central question in modeling social networks from structural individualism:

how can the global properties of the network be understood from local properties?

E.g. theory of clusterability of balanced signed graphs:

(1) Harary’s theorem says that a complete is balanced iff the nodes can be partitioned into two sets, so that all ties within sets are positive, and all ties between sets are negative; (2) Heider’s work on cognition of social situations (Person- Object-Other), interested in correspondence between P and O, given their beliefs (like/dislike) about Object X [dyads PO, PX, OX]; (3) David & Leinhardt generalized conditions for clusterability of signed graphs and structures of ranked clusters;

These theories pose the problem: how can triadic properties of signed graphs (aggregate properties of all subgraphs of 3 nodes) determine global properties of signed graphs. argument that unbalanced triads tended towards balance, which implied all intransitive triads would disappear from the network. not what we find empirically…

+ + “A friend of a (+)(+)(+) = (+) Balanced friend is a friend” +

- - (-)(+)(-) = (-) Balanced “An enemy of my enemy is a friend” +

- - (-)(-)(-) = (-) Unbalanced “An enemy of my enemy is an - enemy”

+ + (+)(-)(+) = (-) Unbalanced “A Friend of a Friend is an enemy” Local Structure – Transitivity Transitivity Transitivity

TheTransitivity tendency of a for relation a tie meansfrom i thatto k to occur at greater than chance frequencieswhen there isif athere tie from arei totiesj, andfrom also i to from j andj to fromh, j to k – the i to j tie completesthen there “transitively” is also a tie from thei to tripleh: consisting of the tie from i to j and the tie from j tofriends k. of my friends are my friends.

TransitivityTransitivity depends on ontriads triads, subgraphs, subgraphs formedformed by 3 nodes. by 3 nodes

i i i j ? j j

h h h

Potentially Intransitive Transitive transitive

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 5 / 32

Local Structure – Transitivity

Transitive graphs

One example of a (completely) transitive graph is evident:

the complete graph Kn, which has n nodes and density 1. (The K is in honor of Kuratowski, a pioneer in graph theory.)

Is the empty graph transitive?

Try to find out for yourself, what other graphs exist that are completely transitive!

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 6 / 32 Simmelian Ties

• These are the ties that make up transitivity • Simmelian ties are [reciprocated] transitive triples CD,DE,EC is one set of simmelian ties CT,TE,EC is another set All other sets are not simmelian C B T

D A E Local Structure – Transitivity measuringMeasure transitivity for transitivity – clustering index

A measure for transitivity is the (global) transitivity index, defined as the ratio ]Transitive triads Transitivity Index = . ] Potentially transitive triads

(Note that “]A” means the number of elements in the set A.) This also is sometimes called a clustering index.

This is between 0 and 1; it is 1 for a transitive graph. For random graphs, the expected value of the transitivity index is close to the density of the graph (why?); for actual social networks, values between 0.3 and 0.6 are quite usual.

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 7 / 32

Local Structure – Transitivity

Local structure and triad counts

The studies about transitivity in social networks led Holland and Leinhardt (1975) to propose that the local structure in social networks can be expressed by the triad census or triad count, the numbers of triads of any kinds.

For (nondirected) graphs, there are four triad types:

h h h h

i j i j i j i j

Empty One edge Two-path / Triangle Two-

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 8 / 32 Local Structure – Transitivity

Measure for transitivity

A measure for transitivity is the (global) transitivity index, defined as the ratio ]Transitive triads Transitivity Index = . ] Potentially transitive triads

(Note that “]A” means the number of elements in the set A.) This also is sometimes called a clustering index.

This is between 0 and 1; it is 1 for a transitive graph. For random graphs, the expected value of the transitivity index is close to the density of the graph (why?); for actual social networks, values between 0.3 and 0.6 are quite usual.

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 7 / 32

Local Structure – Transitivity local structure and triad counts Local structure and triad counts

The studies about transitivity in social networks led Holland and Leinhardt (1975) to propose that the local structure in social networks can be expressed by the triad census or triad count, the numbers of triads of any kinds.

For (nondirected) graphs, there are four triad types:

h h h h

i j i j i j i j

Empty One edge Two-path / Triangle Two-star

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 8 / 32 Local Structure – Transitivity local structure and triad counts

A simple example graph ijhtriad type with 5 nodes. 1 2 3 triangle 1 2 4 one edge 3 1 2 5 one edge 1 3 4 two-star 2 4 1 3 5 one edge 1 4 5 empty 2 3 4 two-star 1 5 2 3 5 one edge 3 4 5 one edge

In this graph, the triad census is (1, 5, 2, 1) (ordered as: empty – one edge – two-star – triangle).

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 9 / 32

Local Structure – Transitivity

It is more convenient to work with triplets instead of triads: triplets are like triads, but they refer only to the presence of the edges, and do not require the absence of edges.

E.g., the number of two-star triplets is the number of potentially transitive triads.

The triplet count for a non- is defined by the number of edges, the total number of two-stars (irrespective of whether they are embedded in a triangle), and the number of triangles.

c Tom A.B. Snijders University of Oxford Transitivity and Triads May 14, 2012 10 / 32 MAN coding for triad census triad census

(0) (1) (2) (3) (4) (5) (6)

003 012 102 111D 201 210 300

Transitivity = 030C closure Closure = 030D Similarity = 030T 021D 111U 120D Intransitive

Transitive

021U 030T 120U Mixed

021C 030C 120C

Transitivity: tie i to k to occur if ties from i to j and j to k exist; Closure: tie i to j to occur if persons k with ties to both i and j exist; Similarity: tie i to j to occur if persons k to whom i and j have ties exist; triad census - example

Type Number of triads ------1 - 003 21 ------2 - 012 26 3 - 102 11 4 - 021D 1 5 - 021U 5 6 - 021C 3 7 - 111D 2 8 - 111U 5 9 - 030T 3 10 - 030C 1 11 - 201 1 12 - 120D 1 13 - 120U 1 14 - 120C 1 15 - 210 1 16 - 300 1 ------Sum (2 - 16): 63 • triads define behavioral mechanisms: we can leverage the distribution of triads in a network to test whether the hypothesized mechanism is active.

• How?

(1) Count the number of each triad type in a given network

(2) Compare to the expected number, given some (random) distribution of ties in the network;

• Statistical approach proposed by Holland and Leinhardt is now obsolete. Statistical methods have been proposed for probability distributions of graphs depending primarily on triad counts, but complemented with stat counts and nodal variables, along with some higher-order configurations essential for adequate modeling of empirical network data.