IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Lesson 10 Graph Visualisation and Analysis: Methods and Applications

Mentor: Dr. Kam Tin Seong Associate Professor of Information Systems (Practice) School of Information Systems, Singapore Management University

Content • Motivation of graph visualisation • Graph Visualisation in Actions • Typology of network data • Graph data file formats • Graph Layout Techniques • Network visualisation and analysis process model • Network Metrics • Network visualisation and analysis tools – Gephi – NodeXL

10-1 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network in real World • Physical – Transportation (i.e. road, port, rail, etc) – Utility (electricity, water, gas, network cable, etc) – Natural (river, etc) • Abstract – Social media (i.e. e-mail, Facebook, Twitter, Wikipedia, etc) – Organisation (i.e. NGO, politics, customer-company, staff-to-staff, criminal, terrorist, disease, etc)

3

Classical Graph Theory • The Seven Bridges of Königsberg is a historically notable problem in . Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of .

Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg

4

10-2 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Classical Graph Visualisation and Analysis • Using sociogram, also know as network graph

Early 20th-century social graph used in studying workflows among employees

Source: Valdis Krebs (2010) “Your Choice Reveal Who You Are: Mining and Visualizing Social Patterns” in Beautiful Visualization.

5

Graph Visualisation in action 1 • Using graph visualisation to understand business networks

6

10-3 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation in action 2 • Graph visualisation is used to reveal voting patterns among United States senators

Graph Visualisation in action 3 • Graph visualisation is used to understand online

8

10-4 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation in action 4 • Graph visualisation is used to show how the news are all connected by degrees of separation

Link: http://whichlight.github.io/reddit-network- vis/?discussion=http://reddit.com/r/gaming/comments/3d2ewz/ninten do_president_satoru_iwata_passes_away/

9

Graph Visualisation in actin 5 • SNA in Construction Industry

Source: Pryke, S.D.”Analysing construction project coalitions: exploring the application of ”, Construction Management and Economics, (2004), 22. pp. 787-797.

10-5 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation in action 6 • (a) Similarity graph of log entries and (b) Similarity graph of network scans

(b) (a)

Source: for Security Visualization

Graph Visualisation in action 7 • Networks of the below universities are expanded in a breadth first manner up to the depth of 2, (showing university, alumni and companies they are associated with through employment, investment or other activities) • Size of the node reflects of the node (scaled logarithmically).

Source: http://www.innovation-ecosystems.org/wp-content/uploads/2010/12/2011.educon.pdf

10-6 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation in action 8 • Degree indexes for nodes in the existing (2006) and proposed (2020) public transport networks in Melbourne’s north-east

Graph Visualisation in action 9 • Maritime degree, centrality and vulnerability: port hierarchies and emerging areas in containerized transport (2008–2010)

10-7 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph visualization in action 10 • S&T cooperation network diagram of cities in China

To learn more, go to Visual Complexity

Source: http://www.visualcomplexity.com/vc/index.cfm?domain=Transportation%20Networks

16

10-8 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

What are interaction data? • A table showing phone call links between 10 staff in a company.

Potential source of network data • Transaction records (for example, EZLink card)

18

10-9 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Potential source of network data

• Link records (for example, board of directors)

19

Potential source of network data • Unstructured data (for example, Tweets)

20

10-10 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

An undirected link table • Focus on interaction but not the direction. – For example C1 -> C6 and C6 -> C1 are considered as one interaction. • The table is symmetric.

A directed link table • Focus on both interaction and direction. – For example C1 -> C6 and C6 -> C1 are considered as two different interactions. • The table is asymmetric.

10-11 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Data File Formats • GraphML, an XML-based file format for graphs. • GXL, graph exchange format based on XML. • , simple text based format. • GML is another widely used graph exchange format. • DGML, Markup Language from Microsoft. • XGMML, an XML-based graph markup language closely related to GML. • Dot Language, a format for describing graphs and their presentation, for the set of tools.

23

Graph database -

Reference: http://neo4j.com/developer/get-started/

10-12 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph representation • On the left is a normal graph, in the centre is a graph in which each edge is given a numerical value, and to the right is a directed graph.

Complete Graph • A is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge.

10-13 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

A sample graph • A complete graph is a simple undirected graph in which every pair of distinct vertices (also known as nodes) are connected by an unique edge.

edge

A directed graph • Have a clear origin and destination. • Also known as asymmetric edges. • Suitable for representing network with non reciprocal relationships such as Twitter.

10-14 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

A weighted graph • A weighted edge includes values associated with each edge that indicate the strength or frequency of tie. – For example, numbers of calls between two staffs.

A weighted graph

• Edges with different thickness are used to represent the monthly calls by staffs.

10-15 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

An ego-centric graph • Network consisting of an individual and their immediate peers (Heer & Boyd, 2005)

Bipartite Graph • A graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; that is, U and V are independent sets. • Equivalently, a is a graph that does not contain any odd- length cycles.

32

10-16 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Affiliation Networks - Bipartite Graph

• Nodes are partitions into two subsets and all Two-mode view of the Southern lines are between pairs of nodes belonging to Women social event dataset different subsets • As there are g actors and h events, there are g + h nodes • The lines on the graph represent the relation “is affiliated with” from the perspective of the actor and the relation “has as a member” from the perspective of the event. • No two actors are adjacent and no two events are adjacent. If pairs of actors are reachable, it is only via paths containing one or more events. Similarly, if pairs of events are reachable, it is only via paths containing one or more actors. Valdis Krebs (2010) “Your Choice Reveal Who You Are: Mining and Visualizing Social Patterns” in Beautiful Visualization.

A Multimodel Graph

• Social network connecting different types of vertices. – For example, a network may connect peers to discussion forums and blog posts they have commented on.

SeriousEats visualization emphasizing the most important people (black circles), forums (orange squares), and blogs (blue diamonds).

10-17 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Adjacency Matrix

35

Graph Layouts

36

10-18 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Conventional graph drawing • Showing node-edge relationship. • Very challenging for large graph.

Issues to consider when representing structured data through a graph • Positioning of nodes • Representation of the edges • Dimensioning • Interaction with graphs

10-19 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Layout Techniques • Force-directed • Radial • Tree

Source: http://en.wikipedia.org/wiki/Graph_drawing

Force-Directed Layout • http://bl.ocks.org/mbostock/4062045

Source: https://en.wikipedia.org/wiki/Force-directed_graph_drawing

40

10-20 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

BiPartite Layout

Link: http://bl.ocks.org/NPashaP/9796212

41

Node-Only Layout

Link: http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html

42

10-21 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Time-Oriented Layout

Linked: http://neuralengr.com/asifr/journals/

43

Radial Hierarchical Layout

Link: http://mbostock.github.io/d3/talk/20111018/cluster.html

44

10-22 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Tree Hierarchical Layout

Link: http://mbostock.github.io/d3/talk/20111018/tree.html

45

Geographic Layout • Air transportation network for Brazil.

Source: http://www.researchgate.net/publication/260109852_Using_Network_Sciences_to_Evaluate_the_Brazilian_Airline_Network

46

10-23 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Chord Diagrams

Link: http://projects.delimited.io/experiments/chord-transitions/demos/trade.html

47

Sankey Diagrams

Link: http://bost.ocks.org/mike/sankey/

48

10-24 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Hive Plot

Link: http://www.hiveplot.net/

49

Hive Plot using d3.js

Link: http://www.konstantinkashin.com/blog/2013/12/22/network-visualization-with-d3js/

50

10-25 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Basic Visual Attributes

51

Additional Visual Attributes

52

10-26 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Visualisation and Analysis Process Model (Hansen, D. L. et. al. 2009)

Measures of Power and Influence - Network Metrics • A collection of statistical measures to report: – the connectivity of a node within a network, – the complexity of a network, – the clusters or sub- groups within a network.

10-27 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Metrics: Degree • Degree, the number of direct connections a node has. • Degree is often interpreted in terms of the immediate risk of node for catching whatever is flowing through the network (such as a virus, or some information). • If the network is directed (meaning that ties have direction), then we usually define two separate measures of degree centrality, namely indegree and outdegree. • Indegree is a count of the number of ties directed to the node, and outdegree is the number of ties that the node directs to others. • For positive relations such as friendship or advice, we normally interpret indegree as a form of popularity, and outdegree as gregariousness.

Network Metrics: Degree

10-28 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Metrics: Betweenness centrality • Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). • Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.

Network Metrics: Betweenness centrality

10-29 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Metrics: Closeness Centrality • In graph theory closeness is a centrality measure of a vertex within a graph. Vertices that are 'shallow' to other vertices (that is, those that tend to have short geodesic distances to other vertices with in the graph) have higher closeness. • Closeness is preferred in network analysis to mean shortest-path length, as it gives higher values to more central vertices, and so is usually positively associated with other measures such as degree.

Network Metrics: Closeness Centrality

10-30 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Metrics: Eigenvector Centrality • A measure of the importance of a node in a network. • It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.

Network Metrics: Eigenvector Centrality

10-31 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Network Metrics: • A measure on how connected a vertex’s neighbours are to one another. More specifically, it is the number of edges connecting a vertex’s neighbours divided by the total number of possible edges between the vertex’s neighbour.

Network Metrics: Clustering Coefficient

10-32 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Summary of Network Metrics

Network Analytics Methods

• Mapping relationships • Identifying hierarchies • Detecting communities • Analysing flow

66

10-33 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Analysing Spatial Networks Relationship • Spatial Network Analysis for Multi-modal Urban Transport Systems

Source: http://www.snamuts.com/melbourne.html 67

Analysing Hierarchy of Spatial Network • Global Maritime Transport

http://onlinelibrary.wiley.com/doi/10.1111/j.1471-0374.2011.00355.x/pdf

68

10-34 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Detecting communities • in protein-protein interaction networks.

Source: http://arxiv.org/pdf/0906.0612.pdf

69

Spatial Networks for Flow Analysis • Arteries of the City

Source: http://2013.sopawards.com/wp-content/uploads/2013/05/697-South-China-Morning-Post-Arteries-of-the-City.pdf

70

10-35 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Swiss Knife for Graph Visualisation and Analysis I

• NodeXL, an open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs.

Homepage: http://nodexl.codeplex.com/

Swiss Knife for Graph Visualisation and Analysis II

Reference: https://gephi.org/

10-36 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation toolkit for the Web • http://sigmajs.org/

Graph Visualisation toolkit for the Web • D3.js

10-37 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Graph Visualisation App for Facebook • This application is created using d3.js

For more information: http://groups.google.com/group/d3-js/browse_thread/thread/d1b6ec5c59aa576f

Interactive Network Visualisation

10-38 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

d3.js force directed graph examples IV

Source: http://dhs.stanford.edu/dh/networks/

77

d3.js force directed graph examples V

Source: http://christophergandrud.github.io/networkD3/

78

10-39 IS428 Visual Analytics for Business Intelligence 10/29/2015 Lesson 10: Graph Visualisation and Analysis

Reference • Brath, R. and Jonker, D. (2015) Graph Analysis and Visualization. John Wiley & Sons, Indianapolis, IN. • Hansen, D., Shneiderman, B. and Smith, M.A. (2011) Analyzing Social Media Networks with NodeXL. Elservier Inc. USA. • Cherven, K. (2015) Mastering Gephi Network Visualization. PACKT Publishing, Birmingham, UK.

79

Reference • Chord diagram (https://en.wikipedia.org/wiki/Chord_diagram) • Chord Diagrams in D3 (http://www.delimited.io/blog/2013/12/8/chord- diagrams-in-d3) • Circos (http://circos.ca/) • Using Data Storytelling with a Chord Diagram (http://www.visualcinnamon.com/2014/12/using-data- storytelling-with-chord.html) • Hive plots – rational approach to visualizing networks (http://bib.oxfordjournals.org/content/13/5/627.long)

10-40