Social Network Analysis
Hansen and Smith

Heather Huynh

What is Social Network Analysis?

Social network analysis (SNA) is the systematic study of collections of social relationships, which consist of social actors implicitly or explicitly connected to one another.

What is a Social Network?

• Entities joined together by relationships
• Relationships can be used to measure changes in patterns of relationships and workflow that are not visible in more common metrics (counts of users, rates of usage)
• This perspective distinguishes between simple population growth and the development of important social structures within that population

• Social networks long predate social networking services such as LinkedIn, which support the creation of large, distributed, real-time social networks

History of Social Network Analysis

Foundational phase (18th century – 1970s)
• Focus on defining terms and establishing the necessary mathematical foundation
• Erdős and Rényi: formal mechanisms for generating random graphs, which made statistical tests of network properties viable
• Moreno, Warner, and Mayo: applied formal mathematical methods to describe, analyze, and visualize networks (for example, “sociometrics”)
• Milgram: six degrees of separation
• Granovetter: showed that “weak ties” are a much better source of new jobs than “strong ties”, demonstrating the value of a social network approach

Computational phase (1970s – mid-1990s)
• Creation and systematic use of computational tools and methods
• SNA became a methodological approach that leveraged the new capabilities of computers to analyze and visualize networks in novel ways
• By the mid-1990s, SNA was a well-respected approach in numerous fields (organizational behavior, communication networks, and others)
• The “SNA Bible”: Social Network Analysis: Methods and Applications by Stanley Wasserman and Katherine Faust
• It summarizes decades of research into a coherent mathematical framework, identifying core metrics and techniques used by SNA tools and researchers today

Network Deluge phase (current)
• People outside academia, such as corporations, governments, and nonprofits, now use SNA techniques
• Many tools have been created, including Pajek, SNAP, NodeXL, and Gephi
• Mining of data from Facebook, IM services, and other channels
• Techniques pioneered for inferring friendship networks from data captured via mobile devices

Goals of Social Network Analysis for HCI Researchers

Goal 1: Inform the design and implementation of new CSCW systems
• SNA can characterize a population of intended users of a new CSCW system before the system is put in place
• Research has shown that mapping the members of a large group can help design social and technical strategies to facilitate more effective information flow
• Use SNA to identify, educate, and leverage those who will most influence the spread of adoption through the network, to ensure the system's rapid, effective use or to help others learn how to use a new technology
• Data for these analyses may come from network surveys or from existing data sources such as communication exchanges
• Individuals with unique and important network positions can be identified and interviewed or observed as part of a comprehensive contextual inquiry process

Goal 2: Understand and improve current CSCW systems
• SNA of data from existing CSCW systems can illustrate how current features are used by users in different locations in the network
• SNA may help managers understand what is happening in large-scale settings where reading through even a meaningful portion of the content is not feasible
• Example: knowing about the “Theorist” subgroup on Lostpedia allowed designers to develop tools, such as page templates, that meet the subgroup's needs
• Several studies have developed recommendations for improving virtual reality games based on network analysis of guild networks and social interaction patterns
• Network methods that identify subpopulations make it possible to offer customized interfaces and services to different groups of users, using the history of other users in the same group as a guide
• Researchers have shown how students use different social features to interact within small groups and class-wide, with implications for system design and instructional strategies

Goal 3: Evaluate the impact of a CSCW system on social relationships
• Evaluate the impact of a CSCW system on the existing social structure of a population
• Measuring the changes in aggregate and person-specific network metrics can help systematically evaluate the effectiveness of such systems
• Evaluation can also assess the impact of a specific feature or social intervention (e.g., the effect of an online “icebreaker”)
• Education researchers are also using network data to identify students using online course management systems who may need extra support
• Data for evaluation may come from offline network surveys, existing communications captured over time, or system usage data
• For large-scale evaluations, SNA can be used as part of a mixed-method approach (for example, to identify whom to interview in a network)

Goal 4: Design novel CSCW systems and features using SNA methods
• SNA can be used as input to new CSCW systems and features
• A growing number of research prototypes and innovative products leverage SNA metrics and methods to provide enhanced functionality
• Work on identifying the political tendencies of followers of different news agencies could be used for tools that personalize news and similar services
• Recent work has explored the theoretical and practical design implications of promoting “social translucence” within directed social network systems, like Twitter, where users can see only a portion of the social space (unlike chatrooms and discussion forums)

Goal 5: Answer fundamental questions
• “Computational social science”: a set of computational techniques that address core social science questions in novel ways
• The large volume of data automatically captured via social media provides new opportunities to test hypotheses and theories at a much larger scale than previously possible
• Predicting the strength of ties from social media interactions or mobile phone usage patterns can support further large-scale studies of social networks by reducing the need for raw data collection from users
• Work done by professors and students here at UIUC!

Performing Social Network Analysis

Identify Goals and Research Questions

• It is essential that analysts home in on a few critical goals and turn them into specific research questions
• Within HCI, SNA is often exploratory in nature, and analysts may only recognize what they are looking for once they see it
• Questions are often refined after preliminary analysis of initial data

Types of Questions SNA answers

• Questions about social actors
  • Find prominent individuals; use “centrality metrics” or “equivalence metrics”
• Questions about overall network structure
  • Focus on the overall distribution instead of the position of individuals; use “community detection algorithms” (network clustering) and a variety of “aggregate network metrics”
• Questions about network dynamics and flows
  • How networks change over time and how information and other content flow through networks (e.g., information diffusion)

Collect Data

• Sources of data (difficulty in brackets):
  • Raw data from system usage (e.g., database or XML files) [Medium-high]
  • Network surveys [High]
  • Application Programming Interfaces (APIs) [Medium-high]
  • Screen scraping [Medium-high]
  • Network analysis importer tools (can import from 3rd party sites) [Easy]
  • Existing datasets, like the Enron email network and Amazon related items (more at http://snap.stanford.edu/data/) [Easy]
• The type of social network determines how to appropriately analyze, visualize, and interpret the data
• The type is determined by the underlying phenomena the network represents (e.g., Facebook vs. Twitter relationships)

Networks can be…

• Directed vs. undirected: directed = ties are not necessarily reciprocated; undirected = ties are always mutual
• Weighted vs. unweighted: weighted = edges have values associated with them; unweighted = edges either exist or do not
• Multiplex networks: include multiple types of edges (can be analyzed as one multiplex network or as multiple distinct networks)
• Unimodal vs. multimodal: unimodal networks include only one type of node (e.g., all nodes represent people); multimodal networks include more than one type of node. Networks with exactly two node types are called bimodal or bipartite networks, and they can be transformed into unimodal networks

• Partial networks: an “egocentric network” includes a single node, called the “ego”, and all nodes that the ego is directly connected to (called “alters”); including further connections (for example, the alters' own connections) extends the network to higher degrees. A large network can also be sampled, using some network boundary, to create partial networks

• A single socio-technical system has many types of networks; the choice of which to focus on depends on the goals of your study

Representing Network Data

• Network data can be represented in three primary ways:
  • Edge lists (adjacency lists)
  • Matrices
  • Graphs – visually show nodes as vertices and edges as lines connecting them
• Representations usually include additional attribute data to describe nodes and/or edges
• In practice, there are several common network file formats: .graphml, .net, .gml, .dot, .txt, or .csv

Analyze and Visualize Data
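
Before analysis, network data is often converted from one representation to another. The sketch below is only an illustration of the edge list, adjacency list, and matrix representations listed under Representing Network Data; the actor names and the helper functions `to_adjacency_list` and `to_matrix` are hypothetical and are not taken from any of the tools named in these slides.

```python
# Hypothetical directed edge list: (source, target) pairs, e.g. "who replies to whom".
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("ann", "cat")]


def to_adjacency_list(edge_list):
    """Map each node to the sorted list of nodes it points to."""
    adjacency = {}
    for source, target in edge_list:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # keep nodes with no outgoing edges
    return {node: sorted(neighbors) for node, neighbors in adjacency.items()}


def to_matrix(edge_list):
    """Return (ordered node names, 0/1 matrix); matrix[i][j] == 1 means i -> j."""
    nodes = sorted({name for edge in edge_list for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edge_list:
        matrix[index[source]][index[target]] = 1
    return nodes, matrix


adjacency_list = to_adjacency_list(edges)
nodes, matrix = to_matrix(edges)
```

For an undirected network the matrix would be symmetric, and for a weighted network the cells would hold edge weights instead of 0/1 values.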

• 5 commonly used tools for network analysis:
  • Gephi
  • NodeXL
  • Pajek
  • R
  • UCINet
• All are open source except UCINet; maximum network size varies from thousands to millions (more info in Table 2)
• SNA requires the use of specialized tools designed to compute network metrics and visualize network graphs

Node-specific metrics: focusing on the trees

• Analysts want to characterize how important an individual is within a network, but the definition of “important” may vary
• Researchers have developed a set of quantitative measures called “centrality metrics” to represent these various types of importance (Table 3)
• Many use the idea of “geodesic distance”, the number of edges on the shortest path between two actors
• These metrics, along with statistical and visualization techniques, help identify the “structural signatures” of individual participants
• Network analysts differentiate between those in the core of the network and those on the periphery

• Common centrality metrics:
  • Degree (in-degree and out-degree) – number of direct connections a person has
  • Betweenness – how disrupted flows would be if the person were removed
  • Closeness – how long it would take to disseminate information from one person to all others
  • Eigenvector – accounts for both the number of immediate connections and the importance of those connections
• It is sometimes helpful to identify classes of people who share a similar structural signature or position in the network
• Equivalence methods can also identify similar individuals based on their relation to others in the network

Aggregate Network Metrics: Focusing on the Forest
• Researchers have developed a set of metrics to help characterize the entire network, so that networks can be compared to one another or over time
• Visualization of the entire graph can reveal the core or periphery of a network, clusters, and other patterns
• Graphs can be too large to visualize meaningfully; aggregate network metrics can be used instead
• They are similar to summary statistics for networks
• Basic metrics:
  • Number of vertices or edges
  • Number of connected components and their size
• Common aggregate network metrics:
  • Density – amount of interconnectivity in the network
  • Diameter – number of hops between the two individuals who are furthest apart in the network
  • Average geodesic distance – average number of hops between two people in the network (degrees of separation)
  • Network centralization – measure of how hierarchical a network is
• In addition to aggregate network metrics, analysts often look at the distribution of node-specific metrics such as degree
• These distributions help identify outliers and give an overall sense of the network
• Example: a network centralized around a few key individuals, but otherwise not densely connected, will have a skewed degree distribution with a few high-degree individuals and many low-degree individuals
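
As a rough illustration of several of the node-specific and aggregate metrics above, the following sketch computes degree, closeness, density, diameter, average geodesic distance, and the degree distribution for a small network. It is a minimal sketch under stated assumptions: the five-node network is hypothetical, it is treated as undirected, unweighted, and connected, and closeness is computed with one common formulation ((n - 1) divided by the sum of distances). Betweenness and eigenvector centrality are left out for brevity; in practice they would come from one of the tools listed earlier.

```python
from collections import Counter, deque

# Hypothetical undirected, unweighted network stored as an adjacency list.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e"],
    "e": ["d"],
}


def geodesic_distances(graph, start):
    """Breadth-first search: number of hops from start to each reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


n = len(graph)

# Degree centrality: number of direct connections per node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
edge_count = sum(degree.values()) // 2  # each undirected edge is counted twice

# Density: share of possible undirected ties that are actually present.
density = edge_count / (n * (n - 1) / 2)

# Geodesic distances between all pairs (assumes one connected component).
all_distances = {node: geodesic_distances(graph, node) for node in graph}
pair_distances = [d for dist in all_distances.values() for d in dist.values() if d > 0]
diameter = max(pair_distances)
average_geodesic = sum(pair_distances) / len(pair_distances)

# Closeness centrality: (n - 1) divided by the sum of distances to all others.
closeness = {node: (n - 1) / sum(dist.values()) for node, dist in all_distances.items()}

# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(degree.values())
```

In this toy network, node "a" has both the highest degree and the highest closeness, which is the kind of core position that centrality metrics are meant to surface.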

Network Clusters and Motifs: Focusing on the Thickets
• Networks are composed of smaller components that can be examined
• Examples:
  • Network clusters (can be identified using community detection algorithms and similar methods)
  • Network motifs (recurring structures in networks)
    • Can show patterns like fans, tunnels, and structural holes
    • Triads (combinations of 3 nodes) serve as building blocks of networks
• Identifying and quantifying these network structures is important because they often reflect social divides, political opinions, and other behavior of interest

Network Dynamics and Information Flow

• Networks are constantly evolving
• Research topics include information diffusion and the extension of techniques for modeling the spread of disease to the spread of other phenomena
• Information can flow through networks, but the structure of the network itself can also change
• Changes in networks can be examined by comparing network metrics from different snapshots in time to highlight important critical events or “bursts” in the network
• Dynamic analysis features are being added to existing network tools, such as allowing edges and vertices to be timestamped

Network visualization

• Visualizing networks in a meaningful way is not trivial (many factors have to be considered)
• Ideally, we can attain a “netviz nirvana”:
  • Every vertex is visible
  • Every vertex’s degree is countable
  • Every edge can be followed from source to destination
  • Clusters and outliers are identifiable
  • Unnecessary edge crossings are removed
• Larger or denser networks pose significant challenges; tools try to help with reaching these goals
• Ongoing work explores network readability metrics and graph summarization techniques

SNA Best Practices

• Use network metrics appropriate for the type of network being examined
• Do not claim more than your data can support
• Customize your network visualizations to illustrate the core points you are making
• Use appropriate statistical techniques when mapping network properties to outcomes of interest or comparing networks
• Look at exemplary work for examples of methods and techniques appropriate for your questions