Introduction to social network analysis

Paola Tubaro

University of Greenwich, London

27th June 2012

Welcome!

The rise of online social networking services has brought social networks to the fore, with new interest in social network analysis (SNA). Yet networks have always existed! Likewise, SNA already has a long history.

This workshop

Understand what SNA is. Understand how you could use it. Learn basic principles and measures. Be aware of available resources.

Outline

1 Introduction

2 What is SNA

3 Data

4 Network metrics

5 Models

6 Resources

1 Introduction

What can SNA be used for?

Classical applications: improving organisational performance; social policy interventions for behaviour change. Newer applications on online social networking services: for organisations; for social policy.

Figure: The organisational chart of a company.

Formal chart vs. network

With whom do you discuss issues important to your work?

Senior people are relatively peripheral (Barry): removed from the day-to-day activities of the group.

Nick plays a very central role (what if he moves to another job?).

The Product 1 division is relatively separate from the overall network.

Interventions

Using network data to improve flows of communication and coordination in the organisation.

Networks for behaviour change: smoking prevention

Figure: Network of friendships among sixth-grade pupils. Squares = girls, circles = boys; blue = smokers, red = non-smokers. Valente et al. 2003.

Use popular pupils (“opinion leaders”) to reduce smoking in adolescents

Identify the most popular pupils in the class; recruit and train them; use them to spread the message.
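Identifying the most popular pupils is often operationalised as counting the friendship nominations each pupil receives (their in-degree). A minimal sketch, assuming nominations are stored as a dictionary from each pupil to the classmates they name; the pupil names and the cut-off `k` are hypothetical, and this is not necessarily the exact procedure of Valente et al. 2003:

```python
from collections import Counter

# Hypothetical survey answers: each pupil -> classmates they name as friends.
nominations = {
    "Ann": ["Ben", "Cat"],
    "Ben": ["Cat"],
    "Cat": ["Ben", "Dan"],
    "Dan": ["Cat"],
    "Eve": ["Cat", "Ben"],
}


def most_nominated(nominations, k):
    """Return the k pupils who receive the most nominations (highest in-degree)."""
    received = Counter(friend for friends in nominations.values() for friend in friends)
    return [pupil for pupil, _ in received.most_common(k)]


print(most_nominated(nominations, 2))  # ['Cat', 'Ben']
```

In a real intervention the cut-off would depend on class size and on how many leaders can be trained.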

Valente et al. 2003: the network method was effective in reducing adolescents’ smoking.

2 What is SNA

Defining SNA

An approach to human behaviours and social interactions. A set of specific analytical and statistical methods. A special type of data (and techniques of data collection). A set of visualisation tools.

⇒ It applies to both online and offline social networks. ⇒ It can be combined with other social science theories, methods and data.

What is a network: a formal definition

A network is a set of units (nodes) connected by one or more relations (ties). What is a node? ⇒ It depends on the setting: a person, a group/organisation, an object. What is a tie? ⇒ A relation or a shared trait: friendship, advice, exchange, co-work.

Graphs and networks

Circles (A, B) represent nodes. Lines (e.g. between A and B) represent ties/edges. The graph visualises the whole structure of ties of a defined group. Graphical conventions (colours, size of nodes and/or ties) can be added to show attributes. For example, if this is a network of friendship, blue = boys, red = girls.

Isolates, dyads and triads

Figure: an isolate, a dyad and a triad (nodes a to f).

A new perspective

SNA requires a change of mindset with respect to other social science approaches. The emphasis is on relationships, not attributes. Not just dyadic relationships (A and B alone), but dyadic relationships as embedded in a whole set of relationships.

Embedded relationships

Figure: Suppose the relationship represented here is friendship. How may friendship between A and B vary in these three different contexts?

Triads

Figure: three triads among nodes a, b and c: intransitive, transitive and a 3-cycle.

Intransitive: only bilateral ties. Transitive: a friend of my friend is my friend. Three-cycles: a form of generalized exchange.

Some theory

Georg Simmel: triad as a fundamental unit of sociological analysis. Three actors in a triad may allow for social dynamics that are qualitatively different from what can be observed based on dyads or individuals. Social behaviours and phenomena cannot be reduced to dyads or individuals. The right starting point is the triad (and higher).

N.B. Dyad = a two-node set; triad = a three-node set.

Network effects, more globally

Figure: a small network of choices among nodes a, b, c, d and e.

For example, those who attract many choices will attract even more in future (reputation effect, “Matthew” effect).

The importance of network position

Position in the network determines opportunities and constraints. For example, a is very central, and will become ever more central over time. Does a high (and rising) number of friends have advantages? Does it have disadvantages? In this respect, how does a compare to b (or c or d)?

More precisely, what are constraints and opportunities?

Burt 1992, 2005, 2010, in a business context:

Closure: every member of a network is (directly or indirectly) connected to everyone else. Communication reinforces shared views. Closure strengthens social control, imposes reputation costs on any poor behaviour and facilitates trust formation, but it may be a source of rigidity (redundant information).

Constraints and opportunities (cont.)

Burt’s “structural holes”:

A broker (A) bridges groups that are otherwise poorly connected. The broker can access knowledge and information from different sources (groups 1 and 2) and facilitates innovation through flows of information and knowledge between groups: a competitive advantage.

Weak ties and information

Granovetter’s classical thesis (1973):

Weak ties act as bridges; through weak ties, individuals have better access to non-redundant information; this is one reason why weak ties are often instrumental in finding jobs.

Networks and groups

Some social science concepts or objects can be re-interpreted as networks. For example, collectivities perceived as groups are often better described as networks. The notion of group only allows for a binary membership status, while a network allows for varied levels of embeddedness and commitment; a group appears as a cohesive whole, while a network allows for multiple memberships and interests (whether in agreement or conflict).

Adapted from: Marin 2011.

Network position and attributes

Traditional social science places emphasis on individuals’ attributes (gender, age, education, occupation etc.). This perspective can be integrated into social network analysis. Correlations between (shared) attributes and network position are often empirically observed. Homophily: those with similar attributes tend to form ties with one another (McPherson, Smith-Lovin and Cook 2001). For example, friendship in schools: more often boys with boys (blue), and girls with girls (red), than boys with girls.

Possible research questions

Two approaches. First, understand the influence of network positions (with attributes) on behaviours: for example, among adolescents, having friends who smoke may induce one to smoke too. Second, understand network formation based on attributes: two adolescents who both smoke may become friends precisely for this reason. The two approaches can be combined in a dynamic perspective: two adolescents become friends because they both smoke, and friendship reinforces their smoking behaviours.

More precisely, how do behaviours spread in a social network?

Things spread through networks: diseases, rumours, information, resources, norms... Some striking claims: e.g. the spread of obesity through social ties (Christakis and Fowler 2007). Various causal mechanisms may be operating: uncertainty/limited information; status effects (following leaders); self-reinforcing value homophily; exposure to common external influences/factors.

How do networks form?

Exogenous attributes: attributes of the tie sender (ego), e.g. some people are more sociable than others; attributes of the tie receiver (alter), e.g. being knowledgeable attracts advice requests; similarities (homophily), e.g. in class, girls are friends with girls and boys with boys.

Network-based social processes: transitivity (a friend of my friend is my friend); popularity (Matthew effect); three-cycles (i helps j who helps k who in turn helps i); ...

Now you know:

What a network is; what the network perspective is; its theoretical bases; differences and similarities with respect to other approaches; why networks may help understand behaviours.

3 Data

Network data

Data format: what network data look like; how they differ from other social science data; from data to graph.

Data collection: name generators/interpreters; archives; web crawlers.

Data type I: Ego networks

An ego network is the whole set of contacts (alters) of one person or entity (ego). It usually includes attributes of alters and ties between them, and is usually collected for a sample of egos (e.g. in a survey). It is typically represented graphically with ego at its centre (star-shaped).

Example: Ego networks to discover “hidden” populations

Enrolling HIV+ persons to participate in a vaccine preparedness study through their networks. Valente, 2010.

Data type II: Whole networks

Mapping the whole set of ties of a particular group, setting or population. Not focused on one particular person or entity. Network boundaries must be well-defined. Examples: network of friends in a classroom; network of knowledge-sharing between employees of an organisation.

Example: friendship in a school class

Figure: Repeated observations of friendship ties in a class. Blue = boys, pink = girls. One possible use is to enlist popular pupils to reduce smoking in the class.

Data storage: traditional social science

A rectangular table, where each row is an observation, each column is a variable:

name  age  gender  married
Jane  25   0       0
Mary  31   0       0
Bob   29   1       1
Sue   28   0       1
Alan  32   1       0
Tom   29   1       1

Network data storage I: Matrix

Network data can be stored as an n-by-n square matrix with all nodes listed in both columns and rows. The value of cell (i, j) in the matrix indicates whether node i and node j are connected (1) or not (0). The diagonal is not meaningful.

For example, for a friendship network:

      Jane  Mary  Bob  Sue  Alan  Tom
Jane  -     1     1    0    0     0
Mary  1     -     0    1    0     0
Bob   1     0     -    0    1     0
Sue   0     1     0    -    1     0
Alan  0     0     1    1    -     1
Tom   0     0     0    0    1     -

Data storage II: Edge list

The edge list stores each pair of connected nodes in a single row of a table.
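Because each row of an edge list is one tie, attributes from the person-level table can be attached to both ends of every tie, and tie-level variables derived from them. A minimal sketch using the example friendship data (the six ties of this section, genders coded 0/1 as in the attribute table); plain dictionaries stand in for whatever format a real package would use, and the field names are illustrative:

```python
# Attribute table (one row per person): gender coded 0/1.
gender = {"Jane": 0, "Mary": 0, "Bob": 1, "Sue": 0, "Alan": 1, "Tom": 1}

# Edge list of the friendship network: one (ego, alter) pair per row.
edges = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
         ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]

# Attach ego and alter attributes to each tie and derive a shared-trait variable.
rows = []
for ego, alter in edges:
    rows.append({
        "ego": ego,
        "alter": alter,
        "gender_ego": gender[ego],
        "gender_alter": gender[alter],
        "same_gender": int(gender[ego] == gender[alter]),
    })

for row in rows:
    print(row)

# Share of ties linking people of the same gender (a simple look at homophily).
share_same = sum(row["same_gender"] for row in rows) / len(rows)
print(f"share of same-gender ties: {share_same:.2f}")
```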

For example, for the same friendship network:

ego   alter
Jane  Mary
Jane  Bob
Mary  Sue
Bob   Alan
Alan  Tom
Alan  Sue

The advantages of an edge list

The edge list allows combining tie and attribute data:

ego   alter  gender (ego)  gender (alter)
Jane  Mary   0             0
Jane  Bob    0             1
Mary  Sue    0             0
Bob   Alan   1             1
Alan  Tom    1             1
Alan  Sue    1             0

The advantages of an edge list (cont.)

New variables (e.g. shared traits) can be easily built and stored:

ego   alter  gender (ego)  gender (alter)  same gender?
Jane  Mary   0             0               1
Jane  Bob    0             1               0
Mary  Sue    0             0               1
Bob   Alan   1             1               1
Alan  Tom    1             1               1
Alan  Sue    1             0               0

Data storage III: Node list

The node list stores the links of each node to all others, for example:

ego   alter 1  alter 2
Jane  Mary     Bob
Mary  Sue
Bob   Alan
Alan  Tom      Sue

Which format to choose, more generally

Most network analysis packages support both formats. Some provide conversion facilities (e.g. UCINET: edge list to matrix). It is usually possible to combine network data (in matrix or edge list format) and attributes. A rectangular table is usually needed for attribute data, as in traditional social science.

Some general rules

A matrix is visually appealing when the node set is small, but difficult to handle when it is large (because all possible pairs must be explicitly included). With large node sets, edge lists and node lists are more convenient (because only existing ties need to be listed).

Tie data I

Directed ties: a tie goes from one node to another, but not necessarily back (e.g. advice-giving, money-lending). Usual graphical representation: an arrow (a → b). When directed ties do go in both directions, they are reciprocal ties. Usual graphical representation: a double arrow (a ↔ b).

Tie data II

Undirected ties: ties are mutual by definition (e.g. siblings, co-workers). Usual graphical representation: a line.

Undirected ties: matrix is symmetric

      Jane  Mary  Bob  Sue  Alan  Tom
Jane  -     1     1    0    0     0
Mary  1     -     0    1    0     0
Bob   1     0     -    0    1     0
Sue   0     1     0    -    1     0
Alan  0     0     1    1    -     1
Tom   0     0     0    0    1     -

Directed ties: matrix is NOT symmetric

      Jane  Mary  Bob  Sue  Alan  Tom
Jane  -     1     1    0    0     0
Mary  0     -     0    1    0     0
Bob   0     0     -    0    1     0
Sue   0     0     0    -    0     0
Alan  0     0     0    1    -     1
Tom   0     0     0    0    0     -

Edge lists with directed/undirected ties

The edge list in itself does not change: it can accommodate both directed and undirected ties.

However, some indication must be provided as to whether the data are to be interpreted as directed or undirected (e.g. in metadata).
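A minimal sketch of how one and the same edge list yields a symmetric or a non-symmetric matrix depending on that indication. The `directed` argument is a hypothetical stand-in for the metadata, and an optional third column holds a tie value (here the duration of friendship in years, as used for valued ties in this section):

```python
def to_matrix(nodes, edges, directed):
    """Build an adjacency matrix from an edge list.

    Each edge is (ego, alter) or (ego, alter, value); the value defaults to 1.
    The diagonal stays 0 because self-ties are not meaningful here.
    """
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for ego, alter, *value in edges:
        weight = value[0] if value else 1
        matrix[index[ego]][index[alter]] = weight
        if not directed:
            matrix[index[alter]][index[ego]] = weight  # mirror undirected ties
    return matrix


def is_symmetric(matrix):
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))


nodes = ["Jane", "Mary", "Bob", "Sue", "Alan", "Tom"]
edges = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
         ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]

print(is_symmetric(to_matrix(nodes, edges, directed=False)))  # True
print(is_symmetric(to_matrix(nodes, edges, directed=True)))   # False

# Valued ties: third column = duration of friendship (years).
durations = [("Jane", "Mary", 5), ("Jane", "Bob", 2), ("Mary", "Sue", 3),
             ("Bob", "Alan", 1), ("Alan", "Tom", 2), ("Alan", "Sue", 2)]
for name, row in zip(nodes, to_matrix(nodes, durations, directed=True)):
    print(name, row)
```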

ego   alter
Jane  Mary
Jane  Bob
Mary  Sue
Bob   Alan
Alan  Tom
Alan  Sue

Binary and valued ties

Binary ties indicate the presence or absence of a tie. Valued ties can be stronger or weaker, under some definition of strength: emotional closeness; frequency of contact; duration of the relationship. Graphically, line (arrow) thickness often represents the strength of a tie.

Storing valued ties in an edge list

The edge list can include a third column with attributes of each tie.

In our friendship example, we can include duration of friendship:

ego   alter  duration (years)
Jane  Mary   5
Jane  Bob    2
Mary  Sue    3
Bob   Alan   1
Alan  Tom    2
Alan  Sue    2

Attributes and valued ties in an edge list

As before, we can include attributes of ego and/or alter:

ego   alter  duration (years)  same gender?
Jane  Mary   5                 1
Jane  Bob    2                 0
Mary  Sue    3                 1
Bob   Alan   1                 1
Alan  Tom    2                 1
Alan  Sue    2                 0

Storing valued ties in a matrix

Instead of 0-1 values, the matrix has different values depending on the duration of the relationship:

      Jane  Mary  Bob  Sue  Alan  Tom
Jane  -     5     2    0    0     0
Mary  0     -     0    3    0     0
Bob   0     0     -    0    1     0
Sue   0     0     0    -    0     0
Alan  0     0     0    2    -     2
Tom   0     0     0    0    0     -

One-mode and two-mode networks

One-mode network: Only one type of node: persons connecting with other persons, or organisations connecting with other organisations. Example: the friendship network of Jane, Mary, Bob, Sue, Alan and Tom.

Two-mode network: Two types of nodes: e.g. board directors and companies; lenders and borrowers; teachers and students. Relations in two-mode networks are affiliations from one kind of node to another. One-mode networks can be extracted from two-mode networks.
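Extracting a one-mode network usually means linking two nodes of the same type according to the affiliations they share, with tie strength equal to the number of shared affiliations (as with microfinance institutions and their common funders). A minimal sketch on the clients-and-supermarkets data used in this section; the function name is illustrative:

```python
# Two-mode data: rows = clients, columns = Tesco, Sainsbury's, Waitrose (1 = shops there).
clients = ["Jane", "Mary", "Bob", "Sue", "Alan", "Tom"]
affiliation = [
    [0, 1, 1],  # Jane
    [1, 0, 0],  # Mary
    [1, 0, 0],  # Bob
    [0, 1, 0],  # Sue
    [0, 0, 1],  # Alan
    [1, 0, 0],  # Tom
]


def project_rows(matrix):
    """One-mode projection onto the row nodes.

    Cell (i, j) counts the column nodes shared by rows i and j (the matrix
    product B times B-transpose); the diagonal is set to 0.
    """
    n = len(matrix)
    return [[0 if i == j else sum(a * b for a, b in zip(matrix[i], matrix[j]))
             for j in range(n)]
            for i in range(n)]


projection = project_rows(affiliation)
for name, row in zip(clients, projection):
    print(name, row)
```

Clients who share no supermarket with anyone would appear as isolates, just like microfinance institutions with no common funders.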

Figure: A two-mode network: contracts between Peruvian microfinance institutions (blue) and their funders (red).

Figure: Extracting a one-mode network: similarities between Peruvian microfinance institutions in terms of their choice of funders. Tie strength depends on the number of common funders. Isolates do not share funders.

Two-mode networks require rectangular matrices

The number of nodes in rows is not necessarily equal to the number of nodes in columns. For example, clients and supermarkets:

      Tesco  Sainsbury’s  Waitrose
Jane  0      1            1
Mary  1      0            0
Bob   1      0            0
Sue   0      1            0
Alan  0      0            1
Tom   1      0            0

Graphs

Basic principles of graph representation are simple (nodes and edges), but graph visualisation is a complex problem in computer science. Which representation is most suitable for detecting network structure and properties?

Figure: network layouts compared: circle, Fruchterman-Reingold, Kamada-Kawai, spring, MDS (multidimensional scaling).

Now you know:

Format for network data: square matrix, rectangular matrix, edge list. Difference between ego and whole networks. Directed and undirected ties. Binary and valued ties. One-mode and two-mode networks. Graphical conventions to represent these different data.

Collecting network data

Networks are built from nodes and the ties between them: who are the nodes? What are the ties, i.e. what is the relationship of interest? Both aspects are essential. Once these have been defined, how can we elicit information on ties from nodes?

How to identify nodes

Ego-network data collections are often included in larger surveys. Whole network data collection requires defining network boundaries, for example: members of an organisation; students of one school; attendees of one particular event. N.B. Collection of whole network data needs to be exhaustive, so it is sensitive to the response rate.

What are the relevant relationships

It depends on your research questions, however in general:

Ego-network data: usually broad definitions, e.g. people with whom you discuss important matters; people you have been in touch with over the last six months; etc. Whole networks: often narrower definitions, e.g. colleagues from whom you sought work-related advice; schoolfriends you spend time with outside school; organisations from which you borrow money (other than grants or equity investment); etc.

Collecting network data through surveys: name generators and interpreters

Name generators are questions to elicit respondents’ alters, for example:

From time to time, most people discuss important matters with other people. Looking back over the last six months, who are the people with whom you discussed matters important to you? Just tell me their names or initials. (General Social Survey, 1985)

Name generators can be accompanied by name interpreters to report alter characteristics and identify ties between alters.

Name generators and interpreters with real-time visualisation

The appeal of visual depictions of relationships in social network analysis is well known. A recent tendency is to exploit their advantages during data gathering (Hogan, Carrasco and Wellman 2007). This improves participants’ survey experience, as they may gain insight into their social connectivity. Figure: A name generator with real-time visualisation in a web-based survey; research project ANAMIA.

Collecting network data through surveys: rosters

Provide respondents with a list of potential network members and ask them to choose from the list those to whom they are tied, for example:

Here is the list of all the members of your Firm. Would you go through this list, and check the names of those you socialize with outside work. You know their family, they know yours, for instance. I do not mean all the people you are simply on a friendly level with, or people you happen to meet at Firm functions. (Lazega, 2001)

Collecting network data through surveys: rosters (cont.)

Used for whole network studies. Also useful as a memory aid. Requires the researcher to have a complete list of nodes from the start. Only feasible for relatively small networks (e.g. schools, companies).

Collecting network data through surveys: other methods

Position generators: provide a list of occupations (or other positions) and ask respondents to select the positions in which they know at least one person. Resource generators: provide respondents with a list of resources or skills and ask them to select those possessed by at least one network member. Phone book method, reverse small world method...

Collecting network data through surveys: ethical challenges

Borgatti and Molina (2003):

Difficult to ensure respondent anonymity because specific personal relationships are the object of study. Respondents may cite relationships with third parties who may not wish to participate in the study. Difficult to offer confidentiality, as participants may deduce the identity of individuals even from an anonymous graph.

Collecting network data from archives

For example: contract data for microfinance institutions, retrieved from their financial statements; citation data, from publishers’ portals; historical data (e.g. Padgett’s marriage data in his study of the Medici family in Renaissance Florence; characters in Les Misérables). Depends on the quality of the archive and the actual availability of network information. Need to ensure that the definition of ties is consistent and that data are reported uniformly across all nodes. Need to ensure completeness (for whole networks). Figure: A citation network, from a study of the literature on pro-anorexia websites over ten years, with a corpus of 60 scientific articles. From Casilli, Tubaro and Araya (2012); research project ANAMIA.

Webcrawling

Using dedicated software to retrieve websites and the links between them. Increasingly popular with the rise of web-based networks and online social networking services, and with the study of the Internet as a network. Defining network boundaries may be difficult. Frequent need for manual verification of data quality. Privacy protection issues. Figure: Map of the eating-disorder-related web sphere in France, 2010-12. By D. Pereira, F. Pailler, research project ANAMIA.

Now you know:

Different ways of collecting network data: surveys, archives, webcrawling. All have advantages and disadvantages. The choice depends on research questions, context, legal framework and expected outcomes.

4 Network metrics

Measuring properties of networks

The focus is on properties of patterns of relationships, independently of node attributes. Based on the mathematics of graph theory, refined with social science concepts. A variety of algorithms, measures and software applications are available.

Size

Network size = number of nodes (= number of contacts in a personal network). The “Dunbar number”: cognitive limitations restrict the size of personal networks to about 150 contacts. An open question: have online social networking services increased human capacity to maintain relationships? Median network size on Facebook = 100, average about 150-200 (though with large variation).

Density

Density is the proportion of the ties that could exist in principle which actually exist:

$\text{Density} = \frac{L}{n(n-1)/2} = \frac{2L}{n(n-1)}$ for undirected ties;

$\text{Density} = \frac{L}{n(n-1)}$ for directed ties.
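The two formulas differ only in how many pairs are possible: n(n−1) ordered pairs for directed ties, half as many unordered pairs for undirected ties. A minimal sketch, with L the number of ties and n the number of nodes; the example counts are those of the six-person friendship network used in the Data section (6 ties among 6 people), and the function name is illustrative:

```python
def density(n_nodes, n_ties, directed=False):
    """Share of possible ties that actually exist.

    Directed: n(n-1) possible ordered pairs.
    Undirected: n(n-1)/2 possible unordered pairs.
    """
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible /= 2
    return n_ties / possible


# Example friendship network: 6 people, 6 ties.
print(density(6, 6))                 # 0.4
print(density(6, 6, directed=True))  # 0.2
```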

In these formulas, L is the number of edges (ties) and n the number of nodes. In ego networks, ego and the ties to/from ego are omitted.

Application: Dense networks and behaviours

Denser online networks spread behaviours faster: Centola 2010.

Why is this so?

Adapted from Valente (2010). When adoption of a new behaviour requires social reinforcement (threshold effect), a denser network favours change.

Distance

Distance: the number of steps along the shortest path from one member to another; directly connected nodes have distance 1. Diameter: the longest distance between any two nodes in a network. Average path length: the average distance over all pairs of nodes. Shorter paths in a network are the most important (quicker flows of information).
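In an unweighted network, distances can be found by breadth-first search from each node; the diameter and the average path length then follow directly. A minimal sketch on the six-person friendship network of the Data section, assuming the network is connected (every pair has a finite distance); helper names are illustrative:

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


edges = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
         ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]
adj = adjacency(edges)
dist = {node: distances_from(adj, node) for node in adj}

# Assumes a connected network: every pair has a finite distance.
lengths = [dist[u][v] for u, v in combinations(sorted(adj), 2)]
diameter = max(lengths)
average_path_length = sum(lengths) / len(lengths)
print(diameter, round(average_path_length, 2))  # 3 1.73
```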

Figure: left, longer paths; right, shorter paths.

Application: small worlds

Most nodes are not directly connected to each other, yet most nodes can be reached from every other in a few steps, e.g. two strangers connected through a mutual acquaintance. From Milgram’s “six degrees of separation” (1967) to Facebook’s distance of 4.

Source: Facebook Data Team, 21 Nov. 2011.

Centrality measures

Who are the most “important” nodes, based on network position? Intuitively, A is the most important actor in a “star” network. How can the most important actor be identified more generally? Four main answers: degree centrality; eigenvector centrality; closeness centrality; betweenness centrality.

Figure: a star network with A at its centre.

Degree centrality

Who are the most “important” nodes? Diane has the highest number of direct connections (degree): she is a connector, or hub.
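Degree centrality counts each node's direct ties, often normalised by the maximum possible, n − 1. A minimal sketch on Krackhardt's kite; the edge list assumes the standard version of the kite found in textbooks, and the names Andre, Beverly, Carol and Ed, which do not appear on these slides, are taken from that version:

```python
from collections import Counter

# Krackhardt's kite as an undirected edge list (assumed standard textbook version).
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"), ("Andre", "Fernando"),
    ("Beverly", "Diane"), ("Beverly", "Ed"), ("Beverly", "Garth"),
    ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"),
    ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"),
    ("Heather", "Ike"),
    ("Ike", "Jane"),
]

# Degree = number of ties incident to each node.
degree = Counter()
for a, b in KITE_EDGES:
    degree[a] += 1
    degree[b] += 1

n = len(degree)
for name, d in degree.most_common(3):
    # Raw degree and degree normalised by the maximum possible, n - 1.
    print(name, d, round(d / (n - 1), 2))
```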

Figure: Krackhardt’s kite network.

Betweenness centrality

Heather has fewer connections than Diane; Yet she occupies a strategic position between different parts of the network, lying on the shortest paths that connect them; She can control what flows between those parts.
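Betweenness counts, for every pair of other nodes, the share of shortest paths between them that pass through a given node. The sketch below uses Brandes' algorithm (our choice of method) on the same assumed reconstruction of the kite; scores are raw, unnormalised counts.

```python
from collections import deque

# Assumed reconstruction of Krackhardt's kite (see the degree example).
KITE = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"), ("Andre", "Fernando"),
    ("Beverly", "Diane"), ("Beverly", "Ed"), ("Beverly", "Garth"),
    ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected networks (raw counts)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: b / 2 for v, b in score.items()}


bc = betweenness(build_adj(KITE))
top = max(bc, key=bc.get)
print(top, round(bc[top], 2))  # Heather 14.0
```

Heather's score of 14 is every pair linking {Ike, Jane} to the seven nodes of the dense core (2 × 7), since all such paths must pass through her.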

Krackhardt’s kite network.

Closeness centrality

Fernando and Garth have fewer connections than Diane; But they are, on average, at a shorter distance from all other network members; They can monitor the information flow in the network.
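A minimal sketch of closeness as (n − 1) divided by the sum of a node's distances to all others, on the same assumed reconstruction of the kite. This simple form assumes a connected network; variants exist for disconnected ones.

```python
from collections import deque

# Assumed reconstruction of Krackhardt's kite (see the degree example).
KITE = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"), ("Andre", "Fernando"),
    ("Beverly", "Diane"), ("Beverly", "Ed"), ("Beverly", "Garth"),
    ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj):
    """(n - 1) / sum of shortest-path distances; assumes a connected network."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        scores[s] = (len(adj) - 1) / sum(dist.values())
    return scores


cc = closeness(build_adj(KITE))
best = max(cc.values())
leaders = sorted(v for v, c in cc.items() if c == best)
print(leaders, round(best, 3))  # ['Fernando', 'Garth'] 0.643
```

Diane's many ties do not make her the closest: she is four steps from Jane, whereas Fernando and Garth reach everyone in at most three.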

Krackhardt’s kite network.

Core-periphery structures

Ike and Jane have low centrality scores; They may be, e.g., external contractors for a company; As such, they may be sources of fresh information!

Krackhardt’s kite network.

Network centralisation

The extent to which a network is dominated by one (or a few) nodes:

[Figure: example networks with different levels of centralisation.]
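One common version is Freeman's degree centralisation, sketched here for an undirected network: sum the gaps between the highest degree and every node's degree, then divide by the largest possible sum, (n − 1)(n − 2), which a star reaches. Only the degree version is shown; the betweenness and closeness versions use their own maxima. The example networks are hypothetical.

```python
def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centralisation(adj):
    """Freeman degree centralisation of an undirected network (needs n >= 3)."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)
    # A star maximises the sum of gaps: (n - 1) leaves, each n - 2 below the centre.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


star = build_adj([(0, 1), (0, 2), (0, 3), (0, 4)])
ring = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
print(degree_centralisation(star))  # 1.0
print(degree_centralisation(ring))  # 0.0
```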

Network centralisation measures the extent to which a network is dominated by a single central node. It compares the centrality of the most central node with the centrality of every other node (summing the differences), and is normalised by dividing by the maximum centralisation possible for a network of the given size. It ranges from 0 (all nodes equally central) to 1 (star network). It is defined for degree, betweenness and closeness centrality.

Centralisation may vary over time

Figure: The advice network of judges in a Parisian court. Correlation between in-degrees from the first to the second observation (left panel) and from the second to the third (right panel).

Subgroups, clusters and cliques

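Identifying components: a component is a group of nodes that are all connected to one another, directly or indirectly, and not connected to any node outside the component. When a network has several components, the usual solution is to extract the main (= largest) component, then calculate metrics on it.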
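A minimal sketch of the usual workflow: find all components, keep the largest, and restrict the network to it before computing metrics. The example network (a triangle, a dyad and an isolate) and all function names are hypothetical.

```python
def build_adj(nodes, edges):
    """Undirected adjacency sets; isolates keep an empty neighbour set."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(adj):
    """Connected components, largest first (iterative depth-first search)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def subgraph(adj, keep):
    """Restrict the network to the nodes in keep (e.g. the main component)."""
    return {v: adj[v] & keep for v in keep}


net = build_adj("abcdef", [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")])
comps = components(net)
main = subgraph(net, comps[0])
print([len(c) for c in comps])  # [3, 2, 1]
print(sorted(main))             # ['a', 'b', 'c']
```

Metrics such as distance are then well defined on `main`, since every node in it can reach every other.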

The (two-mode) network of directors and the top 250 companies in Hong Kong. More than one component can be detected.

Isolates, dyads and triads as components

[Figure: examples of components labelled Isolate, Dyad and Triad, with nodes a to f.]

Cyclic components

[Figure: a network with nodes A, B, E, F, G in the top row and C, D, H, I, J in the bottom row.]

Cycle: a path that returns to its starting point. 3-cycles: ABCA, BCAB, etc.; 4-cycles: ABDCA, BDCAB, etc.; 6-cycle: EFGJIHE. Cut-point: a node whose removal would increase the number of components (here, B and E).
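Following the cut-point definition literally, a brute-force sketch removes each node in turn and recounts components. The edge list is our reconstruction of the figure from the cycles listed (A, B, C, D joined by 3- and 4-cycles; the 6-cycle E-F-G-J-I-H); the single B-E tie linking the two parts is an assumption consistent with B and E being the cut-points.

```python
def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_components(adj):
    """Number of connected components (iterative depth-first search)."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
    return count


def without(adj, node):
    """Copy of the network with node and its ties removed."""
    return {v: nbrs - {node} for v, nbrs in adj.items() if v != node}


def cut_points(adj):
    """Nodes whose removal increases the number of components."""
    base = count_components(adj)
    return {v for v in adj if count_components(without(adj, v)) > base}


# Assumed reconstruction: cycles among A-D, a 6-cycle E-F-G-J-I-H,
# and a single B-E tie joining them (the B-E tie is our assumption).
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "C"), ("B", "E"),
         ("E", "F"), ("F", "G"), ("G", "J"), ("J", "I"), ("I", "H"), ("H", "E")]
print(sorted(cut_points(build_adj(EDGES))))  # ['B', 'E']
```

This costs one full component count per node, which is fine for small networks; linear-time algorithms exist for large ones.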

Adapted from Scott (2000), Everett (1982).

Cliques

[Figure: a 3-member, a 4-member and a 5-member clique.]

A clique is a subset of nodes in which all possible pairs of nodes are directly connected. Scott (2000).

Real-world cliques

[Figure: a 1-clique, a 2-clique and a 3-clique.]

Completely connected groups are uncommon. n-clique: a set of points in which every pair is connected by a path of length at most n. n-cliques with n greater than 2 are empirically infrequent.
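A minimal sketch of both definitions: a group is a clique if every pair of members is directly tied, and an n-clique if every pair is at most n steps apart. Here, as in Scott's definition, the connecting paths may pass through nodes outside the group. The example network and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: steps from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def is_clique(adj, group):
    """True if every pair of members is directly tied."""
    return all(v in adj[u] for u, v in combinations(group, 2))


def is_n_clique(adj, group, n):
    """True if every pair of members is at most n steps apart in the whole network."""
    for u in group:
        dist = distances_from(adj, u)
        if any(dist.get(v, float("inf")) > n for v in group if v != u):
            return False
    return True


# Hypothetical network: triangle a-b-c with a tail c-d-e.
net = build_adj([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")])
print(is_clique(net, "abc"), is_clique(net, "abcd"))              # True False
print(is_n_clique(net, "abcd", 2), is_n_clique(net, "abcde", 2))  # True False
```

A 1-clique is simply a clique, so `is_n_clique(net, "abc", 1)` agrees with `is_clique(net, "abc")`.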

Scott (2000).

The clustering coefficient

The clustering coefficient (C.C.) of a node indicates how densely this node is embedded in its neighbourhood: it is the proportion of pairs of the node's neighbours that are themselves connected; The average C.C. gives an overall indication of clustering in the network.

Application: better understanding small worlds
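The clustering coefficient can be computed directly from its definition. In this minimal sketch, nodes with fewer than two neighbours get a coefficient of 0 and are included in the average; that is one common convention, and some tools leave such nodes out instead. The example network is hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Share of pairs of node's neighbours that are tied to each other (0 if degree < 2)."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean local coefficient over all nodes, counting degree < 2 nodes as 0."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: triangle a-b-c plus a pendant node d tied to c.
net = build_adj([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(round(local_clustering(net, "c"), 3))  # 0.333
print(round(average_clustering(net), 3))     # 0.583
```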

A “small world” network is sparse, but with dense neighbourhoods (= high clustering coefficient) and short average path length.

Now you know:

Key metrics to measure properties of networks: Size; Density; Distance; Centrality / Centralisation; Components and cliques.

Statistical models for network analysis

By definition, relational data do not meet the independence assumptions of standard statistical models: ties that involve the same actors are not independent observations. Therefore, conventional approaches to statistical inference are unlikely to be reliable. Special models have been devised to account for network structures.

Statistical models for network formation

How do people form ties to others? The presence or absence of a tie can be conceptualised as a binary variable, so we can estimate the probability that a tie forms between any two given nodes. The standard statistical model for binary choice is logistic regression. However, logistic regression on its own does not control for dependence between observations. “Adapted” versions of logit models are therefore used to estimate network tie formation.
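A minimal sketch of the plain logistic starting point: each dyad's tie probability depends on a hypothetical covariate (whether the two nodes belong to the same group). The coefficients, groups and function name are invented for illustration; nothing here is estimated from data.

```python
import math
from itertools import combinations


def tie_probability(beta0, beta_same, same_group):
    """Logistic model: log-odds of a tie = beta0 + beta_same * same_group."""
    log_odds = beta0 + beta_same * same_group
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients and group memberships, for illustration only.
BETA0, BETA_SAME = -2.0, 1.5
group = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}

expected_ties = 0.0
for i, j in combinations(sorted(group), 2):
    same = int(group[i] == group[j])
    p = tie_probability(BETA0, BETA_SAME, same)
    expected_ties += p
    print(f"{i}-{j}: same group={same}, P(tie)={p:.3f}")
print(f"expected number of ties: {expected_ties:.2f}")
```

Summing dyad probabilities this way treats every dyad as independent, which is precisely the assumption flagged above as unrealistic for relational data; the adapted models exist to let ties depend on one another.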

The most commonly used modeling approaches

Older generation of models: P1, P2 models: revised versions of logistic regression, adjusted to take dyads (and to a more limited extent, triads) into account. Useful for cross-section data. A more sophisticated approach: P* or ERGM (Exponential random graph model). Takes into account a large variety of triadic effects. Today, the most advanced approach to the study of cross-section network data. Siena: takes into account both triadic and global (degree-dependent) effects. For longitudinal data.

Other approaches

Economic models of network tie formation: take into account costs and benefits of forming, maintaining and deleting ties. Often based on game-theoretic notions (payoffs, strategy, equilibrium). Agent-based computer simulation models of network formation: compare empirical data with a theoretical process simulated on a computer, to infer properties of network and its evolution. Models inherited from physics or computer science: often based on comparison with some theoretical distribution (e.g. random networks, scale-free networks, etc.).

Now you know:

Basic principles of statistical models for networks: Based on logistic regression; ERGM for cross-section data; Siena for longitudinal data; Other (agent-based, game-theoretic etc.).

Software for network analysis

UCINET and Netdraw: for data management, metrics, basic models and visualisation; Easy to use, widely used; Comprehensive tutorial available on the web, good Help function; Free for a one-month trial, then needs to be purchased. Pajek: for exploratory data analysis and visualisation; Good for large networks, widely used; Some tutorials available on the web, and a supporting book; Freely available online.

Software for network analysis (cont.)

R packages: sna, tnet, RSiena. sna, tnet: metrics, basic models and visualisation; RSiena for longitudinal data modeling; More difficult at the beginning, but more powerful and versatile; Freely available. PNet (a stand-alone program rather than an R package) for ERGM (cross-section modeling). Gephi: state of the art for visualisation; Very good for large (incl. online) social networks; Freely available.

Software for network analysis (cont.)

Stata/SAS: general-purpose statistical software, especially useful for data management, format conversion, coding etc.; Can handle dyadic data and basic measures. Other: ORA (exploratory analysis and visualisation); Visone (visualisation); Sonia (visualisation with animation, movies); NodeXL (data capture from the web, some analysis and visualisation), etc.

Books on social network analysis: general

Thomas W. Valente. Social Networks and Health: Models, Methods, and Applications, Oxford UP, 2010. Christina Prell. Social Network Analysis: History, Theory and Methodology, Sage, 2011. John Scott and Peter J. Carrington (Eds.). The SAGE Handbook of Social Network Analysis, Sage, 2011.

Books on social network analysis: general (cont.)

Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications, Cambridge UP, 1994. Peter J. Carrington, John Scott and Stanley Wasserman (Eds.). Models and Methods in Social Network Analysis, Cambridge UP, 2005. David Knoke and Song Yang. Social Network Analysis, Sage, 2008.

Books on social network analysis: Theory

Ronald S. Burt. Brokerage and Closure: An Introduction to Social Capital, Oxford UP, 2005. Ronald S. Burt. Neighbor Networks: Competitive Advantage Local and Personal, Oxford UP, 2010. Nan Lin. Social Capital: A Theory of Social Structure and Action, Cambridge UP, 2002.

Books on social network analysis: Economics

Matthew O. Jackson. Social and Economic Networks, Princeton UP, 2010. Sanjeev Goyal. Connections: An Introduction to the Economics of Networks, Princeton UP, 2009. Fernando Vega-Redondo. Complex Social Networks, Cambridge UP, 2007.

Journals

Social Networks (Elsevier); Connections; Journal of Social Structure; Redes (Spanish); Réseaux (French); etc.

Associations and conferences

INSNA: International Sunbelt conference, yearly (www.insna.org); UKSNA: annual conference in the UK (www.uksna.org); ASNA: annual conference in Zurich (www.asna.ch).

Questions? Comments?

Thank you!

Paola Tubaro

References

Borgatti S. P. and J. L. Molina (2003). Ethical and strategic issues in organizational social network analysis. Journal of Applied Behavioral Science, 39(3), 337-349.

Burt R.S. (2005). Brokerage and Closure: An Introduction to Social Capital, Oxford UP.

Burt R.S. (2010). Neighbor Networks: Competitive Advantage Local and Personal, Oxford UP.

Centola D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194-1197.

Christakis N. A. and J.H. Fowler (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370-379.

Everett M. (1982). A graph theoretic blocking procedure for social networks. Social Networks, 4, 147-167.

Granovetter M. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360-1380.

Hogan B., J.A. Carrasco and B. Wellman (2007). Visualizing personal networks: working with participant-aided sociograms. Field Methods, 19, 116-144.

Lazega E. (2001). The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership, Oxford University Press.

Marin A. (2011). Networks for newbies. Workshop given at the Sunbelt conference.

McPherson M., L. Smith-Lovin and J.M. Cook (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415-444.

Scott J.P. (2000). Social Network Analysis: A Handbook, Sage.