Visual Encoding Approaches for Temporal Social Networks

Peter John Hoek

BIT (Distinction) CQU, MSc (IT) UNSW

A thesis submitted in partial fulfilment of the requirements

for the degree of

Doctor of Information Technology at the

School of Engineering and Information Technology

The University of New South Wales

at the Australian Defence Force Academy

2013

Abstract

Visualisations have become an inseparable part of analysis methodologies. However, despite the large amount of work in the field of social network visualisation there are still a number of areas in which current visualisation methods can be improved.

The current dynamic network visualisation approaches consisting of aggregation or animated movies suffer from various limitations, such as introducing artefacts that could obscure interesting micro-level patterns or disrupting the users’ internalised mental models. In addition, very few social network tools support the inclusion of semantic and contextual information or provide visual topological representations based on network node attributes.

This thesis introduces novel approaches to the visualisation of social networks and assesses their effectiveness through the use of concept demonstrators and prototypes. These are the software artefacts of this thesis, which provide illustrations of complementary visualisation techniques that could be considered for inclusion into social network visualisation and analysis tools.

The novel methods for visualising temporal networks introduced in this thesis consist of:

• an Attribute-Based Graph Visualisation (ABGV) approach for introducing attributes as additional nodes of the sociogram,

• a visual component for the Social Network Analysis for Command and Control (SNAC2) tool, useful for visualising the changing topology of the network together with the contextual source of the links in the network,

• a Parallel Arc Diagram (PAD) visualisation method and a software prototype, the Temporal Interactive Parallel Arc Diagram (TIPAD), for discovering temporal patterns embedded in the network, and

• a prototype of the Temporal Interactive Multi-slider Event and Relationship (TIMER) tool, which combines node-link representations with representations of events in time without aggregation or animation, the main purpose of which is to preserve information faithfulness.

In addition, this thesis extends the standard static taxonomic tasks with temporal task descriptions for the purpose of enhancing the evaluation of temporal visualisation techniques. The performance of the proposed methods was tested using a combination of case studies and publicly available datasets, and the features of the proposed methods were compared with those of established tools. A taxonomic evaluation provides the basis for positioning the proposed methods within the social network visualisation domain.


Acknowledgments

Social network theory suggests everyone connected to me has in some way influenced my research. I am therefore very grateful to all the people in my social network. I am especially grateful to my supervisors Dr Hussein Abbass and Dr Michael Sweeney for their patience and support throughout this journey. Mike’s continued encouragement drove me to complete this thesis. This work was enabled through the support of the Defence Study Bank scheme which provided both financial support and leave to pursue this research.

I wish to acknowledge my son, Keeley, for all his work in assisting me with the preparation of Adobe Flash animations for journal papers and his assistance in coding the TIMER tool. Most of all I appreciate the time he made for our discussions and his ongoing encouragement. I would also like to thank my son Kai for allowing me the time to write this thesis when he really wanted me to do something “fun” with him. Most of all I wish to thank my wife Mellissa for all her support and encouragement. Without her continued support and understanding I would not have been able to undertake this research.

The software prototypes discussed in this thesis were constructed through the use of the Java programming language. The screenshots of some of the prototypes and other images presented herein were often created by programs utilising the Prefuse and JUNG programming libraries. I therefore wish to thank both of those software development communities for their great work and their efforts in maintaining these very useful toolkits, and especially those who have contributed to the theoretical underpinnings, which are often overlooked.


Introduction

Table of contents

ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1. INTRODUCTION
1.1 Social Networks
1.2 Social Network Visualisation
1.3 Challenges
1.4 Research Aims and Approach
1.5 Scope
1.6 Contributions
1.7 Thesis Structure

CHAPTER 2. BACKGROUND
2.1 Visualisation
2.2 Social Network Visualisation
2.3 Social Network Analysis
2.4 Fundamental Algorithmic Approaches to Visualisation
    2.4.1 Force-Directed Methods
    2.4.2 Non-Euclidean Approaches
    2.4.3 Hierarchical Approaches
    2.4.4 Orthogonal
    2.4.5 Matrix Representations
    2.4.6 Arc Diagrams
    2.4.7 Radial Layout
    2.4.8 Circular Layout
    2.4.9 Algorithm Summary
2.5 Enhancing Social Network Visualisation
    2.5.1 Node-link Aesthetics
    2.5.2 Alternatives to Node-link Diagrams
    2.5.3 Analytical User-Centred Visualisation
2.6 High-level Approaches to Social Network Visualisation
    2.6.1 Visualising Clusters (Communities)
    2.6.2 Dynamic Network Visualisation
    2.6.3 Visualising Semantics and Attributes
2.7 Studies and Evaluation

CHAPTER 3. ATTRIBUTE BASED GRAPH VISUALISATION (ABGV)
3.1 Background
3.2 Motivation and Inspiration
3.3 Technique
3.4 Data Structures and Design Choices
3.5 Discussion
3.6 Evaluation
3.7 Implementation
    3.7.1 Data
    3.7.2 Program Code
3.8 Summary and Conclusion

CHAPTER 4. SOCIAL NETWORK ANALYSIS FOR COMMAND AND CONTROL (SNAC2)
4.1 Background
4.2 Similar Work
    4.2.1 Continuous Node-Link Layout
4.3 Event Based Analysis + Content
4.4 Implementation
    4.4.1 Data
    4.4.2 Program Code
4.5 User Evaluation
4.6 Concluding Remarks

CHAPTER 5. THE PARALLEL ARC DIAGRAM (PAD)
5.1 Background
5.2 PAD Concept
5.3 Evaluation Taxonomy
5.4 Temporal Interactive Parallel Arc Diagram (TIPAD)
    5.4.1 Similar Work
    5.4.2 The Script and the Network
    5.4.3 Comparative Visualisations
    5.4.4 TIPAD Design and Operation
5.5 Implementation
    5.5.1 Data
    5.5.2 Program Code
5.6 Conclusion

CHAPTER 6. TEMPORAL INTERACTIVE MULTI-SLIDER EVENT AND RELATIONSHIP (TIMER) TOOL
6.1 The Dataset
6.2 Similar Work
6.3 Motivation
6.4 TIMER Approach
6.5 Evaluating the Prototype
6.6 Discussion
6.7 Implementation
    6.7.1 Program Code

CHAPTER 7. PROTOTYPES EVALUATION
7.1 Evaluation
    7.1.1 Object Focus
    7.1.2 Low Level Tasks
    7.1.3 Topology-based Tasks
    7.1.4 Attribute-based Tasks
    7.1.5 Browsing Tasks
    7.1.6 Overview
    7.1.7 Temporal Task Evaluation
    7.1.8 Faithfulness

CHAPTER 8. CONCLUSIONS AND FUTURE WORK
8.1 ABGV
8.2 SNAC2
8.3 TIPAD
8.4 TIMER

REFERENCES

APPENDIX A – GLOSSARY

APPENDIX B – PSEUDO CODE & CLASS DIAGRAM
    ABGV
    SNAC2
    TIPAD
    TIMER


List of Figures

FIGURE 1 – HUMANS NATURALLY USE NODE-LINK DIAGRAMS TO DESCRIBE RELATED CONCEPTS
FIGURE 2 – DIFFERENT PATHS ARE EMPHASISED DESPITE NODES POSITIONED IN THE SAME LOCATION [ADAPTED FROM (WARE 2000)]
FIGURE 3 – THESIS CONTRIBUTIONS IN THE SOCIAL NETWORK VISUALISATION SPACE
FIGURE 4 – VAN WIJK MODEL OF VISUALISATION (VAN WIJK 2005; VAN WIJK 2006)
FIGURE 5 – VISUALISATION TAXONOMY
FIGURE 6 – MORENO'S HAND DRAWN SOCIOGRAM: THIS NODE-LINK DIAGRAM PUBLISHED IN THE NEW YORK TIMES IN 1933 BY JACOB MORENO SHOWS RELATIONSHIPS BETWEEN FOURTH GRADERS.
FIGURE 7 – A SOCIAL NETWORK DIAGRAM. SOURCE: ORGNET.COM (IMAGE REPRODUCED WITH KIND PERMISSION OF VALDIS KREBS)
FIGURE 8 – SAMPLES OF SOCIAL NETWORK VISUALISATION TOOLS. SOURCE: VISUALCOMPLEXITY.COM
FIGURE 9 – PRIMARY ELEMENTS FOR VISUAL GRAPH ANALYSIS (VON LANDESBERGER, KUIJPER ET AL. 2011)
FIGURE 10 – AN EXAMPLE OF THE SUGIYAMA LAYOUT
FIGURE 11 – (A) A DIGRAPH AND (B) THE SAME GRAPH DRAWN WITH THE SUGIYAMA ALGORITHM (ADAPTED FROM (HEALY AND NIKOLOV 2013))
FIGURE 12 – TWO ALTERNATIVE DRAWINGS OF THE SAME NETWORK. (A) FORCE DIRECTED (B) SUGIYAMA (HEALY AND NIKOLOV 2013)
FIGURE 13 – CINEMATIC PARTICLES
FIGURE 14 – HENRY & FEKETE’S MATLINK – SHOWING A COMPARATIVE VIEW OF THE NETWORK. SOURCE: (FEKETE 2009)
FIGURE 15 – PATH FOLLOWING ON MATLINK. SOURCE: (HENRY 2008)
FIGURE 16 – VISUALISATION MATURITY. SOURCE: (AIGNER, MIKSCH ET AL. 2008)
FIGURE 17 – A PSEUDO-RANDOM “SATELLITE” GRAPH RENDERED WITH (A) THE LINLOG MODEL AND (B) THE FRUCHTERMAN-REINGOLD MODEL (NOACK 2003)
FIGURE 18 – A MULTI-LEVEL STRAIGHT-LINE CONVEX DRAWING OF A GRAPH (EADES AND FENG 1997)
FIGURE 19 – LAYOUT TO DETECT BORDER NODES (CRUZ, BOTHOREL ET AL. 2014)
FIGURE 20 – TEMPORAL DEVELOPMENT OF SUBGROUPS. SOURCE: (FALKOWSKI, BARTELHEIMER ET AL. 2006)
FIGURE 21 – GRAPH VISUALIZATION (LEFT) AND CUT-OUT VIEW (RIGHT). SOURCE: (FALKOWSKI, BARTELHEIMER ET AL. 2006)
FIGURE 22 – SPIRAL GRAPH VISUALISATION OF CYCLIC TIME-ORIENTED DATA. SOURCE: (AIGNER, MIKSCH ET AL. 2008)


FIGURE 23 – GEOBOOST. SOURCE: (EICK, EICK ET AL. 2008)
FIGURE 24 – WAVE PROPAGATION VISUALISATION. SOURCE: (BLYTHE, PATWARDHAN ET AL. 2006)
FIGURE 25 – RING VISUALIZATION. SOURCE: (APPAN, SUNDARAM ET AL. 2006)
FIGURE 26 – THEME RIVER. SOURCE: (HAVRE, HETZLER ET AL. 1999)
FIGURE 27 – SONIA SCREEN SHOT
FIGURE 28 – VISUALISATION OF TWO COMMUNITIES AT TIME-STEP T1 AND T2 AND TWO INDIVIDUALS FORMING A THIRD COMMUNITY AT T3 (REDA, TANTIPATHANANANDH ET AL. 2011)
FIGURE 29 – BUBBA TALK. SOURCE: (TAT AND CARPENDALE 2002)
FIGURE 30 – CRYSTALCHAT. SOURCE: (TAT AND CARPENDALE 2006)
FIGURE 31 – LOOM (LEFT) AND CHAT CIRCLES “CONVERSATION LANDSCAPE” (RIGHT). SOURCE: (DONATH, KARAHALIOS ET AL. 1999)
FIGURE 32 – THEMAIL SCREEN SHOT. SOURCE: (VIÉGAS, GOLDER ET AL. 2006)
FIGURE 33 – SMALL MULTIPLES NETWORK EXPLORATION SYSTEM. SOURCE: (FARRUGIA, HURLEY ET AL. 2011)
FIGURE 34 – SMALL MULTIPLES OF VERTEX SLICES (BACH, PIETRIGA ET AL.)
FIGURE 35 – GRAPHDICE. SOURCE: (BEZERIANOS, CHEVALIER ET AL. 2010)
FIGURE 36 – ENHANCED PAIRED PARALLEL COORDINATES VISUALISATION TOOL. SOURCE: (SHANNON, HOLLAND ET AL. 2008)
FIGURE 37 – ATTRIGRAPH, MULTIVARIATE GRAPH VISUALISATION. SOURCE: (PRETORIUS AND WIJK 2008)
FIGURE 38 – NETWORK VISUALISATION WITH SEMANTIC SUBSTRATES. SOURCE: (ARIS AND SHNEIDERMAN 2007)
FIGURE 39 – SUBSTRATE DESIGNER. SOURCE: (ARIS AND SHNEIDERMAN 2007)
FIGURE 40 – PIVOTGRAPH SHOWING TWO CATEGORICAL DIMENSIONS. SOURCE: (WATTENBERG 2006)
FIGURE 41 – NETLENS. SOURCE: (PLAISANT, BEDERSON ET AL. 2006)
FIGURE 42 – NESTED MODEL FOR VISUALISATION DESIGN AND VALIDATION. SOURCE: (MUNZNER 2009)
FIGURE 43 – THREATS AND VALIDATION IN THE NESTED MODEL. SOURCE: (MUNZNER 2009)
FIGURE 44 – NODE TYPE CLUSTERED LAYOUT. SOURCE: (STOLPNIK 2009)
FIGURE 45 – ABGV VISUALISATION
FIGURE 46 – STANDARD FORCE DIRECTED VISUALISATION OF THE SAME NETWORK AS SHOWN IN FIGURE 45
FIGURE 47 – SNAC2 MAIN PANEL LAYOUT
FIGURE 48 – AN EXAMPLE OF A DATA INPUT FILE FOR SNAC2


FIGURE 49 – OPERATOR SELECTION PANEL IN SNAC2
FIGURE 50 – SNAC2 SNA GRAPH, PHASE 1
FIGURE 51 – SNAC2 SNA GRAPH, PHASE 2
FIGURE 52 – SNAC2 METRICS PER PHASE
FIGURE 53 – SNAC2 METRICS GRAPH
FIGURE 54 – NODE-LINK LAYOUT VS. PAD LAYOUT
FIGURE 55 – DAVIS, GARDNER ET AL. DATASET DISPLAYED AS 2-MODE NODE-LINK
FIGURE 56 – DAVIS, GARDNER ET AL. DATASET DISPLAYED AS 1-MODE PROJECTION NODE-LINK DIAGRAM. SOURCE: (FREEMAN 2000)
FIGURE 57 – PAD LAYOUT DETAIL
FIGURE 58 – DAVIS, GARDNER ET AL. DATASET DISPLAYED AS 4-MODE PAD LAYOUT (EVENTS 1 TO 8)
FIGURE 59 – DAVIS, GARDNER ET AL. DATASET DISPLAYED AS 4-MODE PAD LAYOUT (EVENTS 7 TO 14)
FIGURE 60 – EXAMPLE OF 'THE MATRIX' PARSED FILE
FIGURE 61 – “THE MATRIX” DEPICTED AS 2 MODE GRAPH USING A FORCE-DIRECTED LAYOUT (AGGREGATED UTTERANCES)
FIGURE 62 – “THE MATRIX” DEPICTED USING A RADIAL GRAPH LAYOUT (AGGREGATED UTTERANCES)
FIGURE 63 – “THE MATRIX” DEPICTED AS AN ADJACENCY MATRIX (AGGREGATED UTTERANCES)
FIGURE 64 – BIPARTITE GRAPH OF THE MOVIE 'THE MATRIX' (AGGREGATED UTTERANCES)
FIGURE 65 – 2-MODE NODE-LINK GRAPH OF THE MOVIE 'THE MATRIX' (SEPARATE UTTERANCES)
FIGURE 66 – TIPAD, SHOWING THE DEFAULT VIEW OF THE MOVIE ‘THE MATRIX’ IN A PAD LAYOUT
FIGURE 67 – BASIC EGOCENTRIC VIEW (MODE 2)
FIGURE 68 – HIGH PARTICIPATION IN SCENES VS. LOW PARTICIPATION
FIGURE 69 – PARTICIPATION WITH TRINITY
FIGURE 70 – PARTICIPATION WITH MORPHEUS
FIGURE 71 – MODE 3
FIGURE 72 – MODE 4
FIGURE 73 – MODE 4, ZOOMED VIEW
FIGURE 74 – AN EXAMPLE OF A MOVIE SCRIPT FORMAT EXPECTED BY THE TIPAD APPLICATION
FIGURE 75 – TIPAD APPLICATION MAJOR GUI COMPONENTS
FIGURE 76 – A SMALL PORTION OF THE VAST 2008 'CELL PHONE CALL' DATASET


FIGURE 77 – VAST 2008 CELL PHONE MINI-CHALLENGE TOOL. SOURCE: (FARRUGIA AND QUIGLEY 2008)
FIGURE 78 – MOBIVIS, CALL GRAPH (CORREA, CRNOVRSANIN ET AL. 2008)
FIGURE 79 – MOBIVIS WITH TIME CHART. SOURCE: (CORREA, CRNOVRSANIN ET AL. 2008)
FIGURE 80 – SOCIALACTION, STACKED HISTOGRAM. SOURCE: (PERER 2008)
FIGURE 81 – TIMER TOOL INTERFACE
FIGURE 82 – TIMER TOOL SHOWING THE FULL DATASET VIEWED AS A NODE-LINK AND TEMPORAL EVENTS
FIGURE 83 – TIMER TOOL SHOWING PHONE CALLS WITH RELATIVE DURATION FROM NODE 200
FIGURE 84 – TIMER SHOWING ALL PHONE CALLS TO NODES 200, 5, 3, 2 AND 1
FIGURE 85 – ZOOMED IN FROM THE START OF DAY 8
FIGURE 86 – SECOND SET OF NODES
FIGURE 87 – MAJOR TIMER APPLICATION USER INTERACTION AREAS (DIRECT USER MANIPULATION)
FIGURE 88 – ABGV CLASS DIAGRAM
FIGURE 89 – PARTIAL CLASS DIAGRAM FOR SNAC2
FIGURE 90 – PARTIAL TIPAD APPLICATION CLASS DIAGRAM
FIGURE 91 – PARTIAL CLASS DIAGRAM OF THE "VIEW" PORTION OF THE TIMER TOOL
FIGURE 92 – PARTIAL CLASS DIAGRAM OF THE "MODEL" PORTION OF THE TIMER TOOL
FIGURE 93 – PARTIAL CLASS DIAGRAM OF THE "VIEW/CONTROLLER" PORTION OF THE TIMER TOOL


List of Tables

TABLE 1 – EXAMPLES OF SOCIAL NETWORK DATA SOURCES [ADAPTED FROM (ZHU 2007)]
TABLE 2 – STRENGTHS AND WEAKNESSES OF FUNDAMENTAL ALGORITHMIC APPROACHES
TABLE 3 – STRENGTHS AND WEAKNESSES OF CLUSTER VISUALISATIONS APPROACHES
TABLE 4 – STRENGTHS AND WEAKNESSES OF DYNAMIC VISUALISATION APPROACHES
TABLE 5 – STRENGTHS AND WEAKNESSES OF ATTRIBUTE-BASED VISUALISATIONS
TABLE 6 – PAD VISUALISATION EVALUATION TABLE
TABLE 7 – ONE POSSIBLE SOLUTION OF THE VAST 2008 CELL PHONE CALLS MINI-CHALLENGE
TABLE 8 – CHARACTERISATION OF STANDARD APPROACHES AND THE PROTOTYPES BASED ON OBJECT FOCUS (CF. LEE, PLAISANT ET AL. 2006)
TABLE 9 – EVALUATION TEST SUMMARY OF LOW-LEVEL TASKS
TABLE 10 – EVALUATION SUMMARY OF TOPOLOGY BASED TASKS
TABLE 11 – ATTRIBUTE-BASED TASKS
TABLE 12 – BROWSING TASKS
TABLE 13 – EVALUATION SUMMARY OF TEMPORAL ANALYSIS FEATURES
TABLE 14 – SUMMARY OF PROTOTYPES FAITHFULNESS


List of Abbreviations

ABGV Attribute Based Graph Visualisation

AGNA Applied Graph Network Analysis tool

AOC Air and Space Operations Centre

CSV Comma-Separated Value

EDA Exploratory Data Analysis

INSNA International Network for Social Network Analysis

IRC Internet Relay Chat

HCI Human-computer Interaction

LED Light Emitting Diode

MAT Matrix Layout

MDS Multi-dimensional Scaling

MILC Multi-dimensional In-depth Long-term Case studies

NL Node Link

OLAP Online Analytical Processing

PAD Parallel Arc Diagram

SNA Social Network Analysis

SNAC2 Social Network Analysis for Command and Control

SVD Singular value decomposition

TIMER Temporal Interactive Multi-slider Event and Relationship viewer

TIPAD Temporal Interactive Parallel Arc Diagram

TST Time Sensitive Target

VAST Visual Analytics Science and Technology (Symposium)


Chapter 1. Introduction

The primary goal of this thesis is to study the Visualisation of Social Networks and develop improved methods of displaying complex social network information.

Social Network Visualisation is a subfield of Information Visualisation. Although there are numerous definitions of Information Visualisation, there is no common agreement on the term.

The definition used in this thesis is “methods of providing visual representations of data to assist in comprehension and interpretation”, or, stated more generally, “amplifying cognition”. Information Visualisation improves our mental model of data by assisting us in gaining a better understanding of various features of the data. Often abstract data is transformed into a visual form which allows us to reason and deduce new knowledge about the data, which would be difficult to do without a visual representation.

The field of Information Visualisation is influenced by numerous researchers from many domains including Psychology, Human-Computer Interaction (HCI), Computer Science, Semiotics, Graphics and Visual Design. This thesis focuses on the visual aspects of social network analysis, drawing from the work done in many of the contributing domains, and concentrates on the application of computer-based information visualisation.

In the next section, I first present essential contextual information including a brief introduction to Social Networks, then discuss the requisite visualisation of those networks and introduce some of the challenges in this visualisation domain. This brief prologue provides a foretaste of the primary areas which are explored in greater detail in this thesis.

1.1 Social Networks

“We are caught in an inescapable network of mutuality… Whatever affects one directly, affects all indirectly.” (Martin Luther King Jr., 1963)

A social network is a theoretical concept which views social relationships as a network of links between actors. The term “Social Network” could be considered an umbrella term describing the existence of relationships between a set of actors. It is commonly accepted that any collection of persons or organisations connected by relations is a social network (Henry 2008). The term Social Network as used in this thesis should not be confused with websites such as Facebook, Twitter and the like. These sites are more correctly referred to as Social Networking Services or Social Media. They provide services that allow us to extend and maintain our personal social networks beyond our conventional real world capacity. These services are vast sources of social network data which can be (and often are) visualised and analysed using various social network visualisation tools.


Social Networks are ubiquitous. All of us have ties to others, whether the relationship is brought about by birth, friendship or employment, and these “others” have similar connections to even more others. Social networks are composed of actors representing individuals or groups tied to each other through various relationships. These relationships could be based on anything, including visions, shared values, financial exchanges, membership and conflict. The network of interconnections that results from these ties forms the structure of the social network.

Social networks usually consist of many locally dense clusters with a globally sparse structure. This means that any person is a small number of links away from any other person. This is typified in the six degrees of separation property, which has been the topic of numerous experiments and studies (see Milgram 1967). Some examples of social network data sourced from traditional and emerging online areas are shown in Table 1.

Table 1 – Examples of Social Network data sources [adapted from (Zhu 2007)]

Traditional sources          Online data sources
Interpersonal relations      P2P file sharing
Co-authorship                Instant messenger
Citation                     Email
Trade                        Online game play
Telecommunication traffic    Forum posts
Disease transmission         Hyperlinks

Interest in interpersonal relations initially stimulated the theories underpinning the concept of a Social Network and they are still of prime interest to researchers today. Examples of the types of interpersonal relationships that are of interest to researchers are (Gretzel 2001):

• Kinship: brother of, father of;
• Social Roles: boss of, teacher of, friend of;
• Affective: likes, respects, hates;
• Cognitive: knows, similar views.

Researchers and analysts are interested in understanding how the structure and the pattern of interactions in a network relate to observed phenomena. In some cases the network data itself is the phenomenon of interest, however, generally the network data are considered to be a proxy for a “real” phenomenon (Bender-deMoll and McFarland 2006). Although a number of methodologies could be employed to facilitate this type of study, Social Network Analysis (SNA) has emerged as a principal contemporary technique (Ryan Cragun 2008).

SNA has become a generally accepted methodological approach to studying social networks, drawing many concepts from network theory. Its application, however, has extended beyond the boundaries of sociology into other domains including anthropology, biology, medicine and economics. Many approaches to SNA include visualising data as an important aspect of analysis, and therefore good visualisations have the potential to enhance analysis activities across many application domains.

1.2 Social Network Visualisation


“Visualisation provides an interface between two powerful information processing systems – the human mind and the modern computer” (Gershon, Eick et al. 1998)

For as long as we have sought to understand complex sequences of events in nature and society we have used visual methods to help us explore and explain underlying cause and effect relationships. Visualisation encourages discovery, assists in identifying patterns and helps disseminate information. One might argue that statistics and other mathematical techniques could provide answers to questions more efficiently and accurately. However, a mathematical approach is only effective if you know what questions to ask a priori. Knowing what questions to ask often comes from insight gained through visual representations. The human brain is particularly effective at processing visual information and consequently images and graphics have greatly assisted in the advancement of many branches of science.

Social network analysis is no exception: Freeman (2000) suggests visualisation has been central to the growth of social network analysis.

The need for visualisation of social networks derives not only from the analyst’s desire to gain insight into social networks but also from the participants in networks wanting to understand the network they participate in. For example, many analysts desire visual techniques to explore, investigate and explain social network phenomena. In addition, the popularity of social media has created a need for users of those services to understand and manage their social network, which in many cases may exceed Dunbar’s number¹ (see Dunbar 1993).

The ability to see data clearly creates a capacity for building intuition that is unsurpassed by summary statistics (Moody, McFarland et al. 2005). The challenge is then to determine what can be improved in the current approaches to social network visualisation.

1.3 Challenges

When asked to represent relationships between multiple entities, most people intuitively draw nodes connected to other nodes by lines, even if they have never heard of a network diagram or a graph. Graphs are underpinned by network and graph theory and are used widely in numerous domains. So pervasive is graph visualisation that graphs are used to visualise everything from nerve cells to activities within a military headquarters. Sociologists and other analysts undertaking Social Network Analysis (SNA) often use a particular type of graph, a Sociogram, to visualise the relationships or flows between people, groups or organisations.

Often the phenomena we seek to explain are contained in data sets which consist of entities and associated relationships which can be expressed formally as a graph. The resulting graph is almost always represented as a node-link diagram (Freeman 2000). Computer-based visualisation using the node-link representation is very common. The vast majority of visualisation software found in the INSNA software repository uses node-link diagrams and nearly all the submissions to the InfoVis 2004 contest used displays of node-link diagrams (Kang, Plaisant et al. 2007; Henry 2008).

¹ Dunbar’s number, named after Robin Dunbar (b. 1947), is a proposed cognitive limit to the number of people with whom any individual is able to sustain a stable or meaningful social relationship (approximately 150).

Even though they are intimately connected with a branch of mathematics, node-link diagrams (also called Network Diagrams) are an intuitive method of representing entities and the relationships between them (see Figure 1). A predecessor of these diagrams can be found in electrical circuit schematics in the sense that the exact location of components and the lengths of wires connecting them are irrelevant to the circuit function (Blackwell 2011).

Sociologists and other analysts undertaking social network analysis are increasingly making use of node-link visualisations to analyse and highlight important relationships between the elements of the network. There are a number of software packages that offer the ability to present the network using node-link diagrams, some of which are listed in (Bender-deMoll and McFarland 2006)². These software packages generally take advantage of the large amount of work that has been devoted to improving the algorithms responsible for the visual layout of node-link representations. However, finding an optimal layout is a difficult problem, considering the almost infinite possibilities of laying out the nodes and arcs of a graph in 2- or 3-dimensional space.

² A count of referenced software on the Wikipedia page “Social network analysis software” is 72, with most offering some form of visualisation. (http://en.wikipedia.org/wiki/Social_network_analysis_software)

Figure 1 – Humans naturally use node-link diagrams to describe related concepts

Layout algorithms for node-link diagrams are designed to optimise the positioning of nodes and edges with respect to various aesthetic criteria, such as minimising edge crossings and maximising symmetry (Di Battista, Eades et al. 1999). However, the optimisation of some criteria is intractable, and the optimisation of one criterion may adversely affect the optimisation of another, causing unexpected consequences (Garey and Johnson 1983; Huang, Hong et al. 2007).
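As a minimal illustration of one such aesthetic criterion, the following Python sketch counts edge crossings in a straight-line drawing. It is not taken from any of the layout algorithms cited above; the function names, the example graph and its two sets of coordinates are hypothetical and were chosen only to show how the same graph can score differently under different node placements.

```python
from itertools import combinations


def orientation(p, q, r):
    """Return the sign (-1, 0 or 1) of the cross product (q - p) x (r - p)."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d (no touching cases)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_edge_crossings(positions, edges):
    """Count pairs of edges that cross in a straight-line drawing."""
    crossings = 0
    for (u, v), (s, t) in combinations(edges, 2):
        if len({u, v, s, t}) < 4:
            continue  # edges that share a node are not counted as crossing
        if segments_cross(positions[u], positions[v],
                          positions[s], positions[t]):
            crossings += 1
    return crossings


# Hypothetical four-node cycle A-B-C-D drawn with two different placements.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
bowtie = {"A": (0, 0), "B": (1, 1), "C": (1, 0), "D": (0, 1)}

print(count_edge_crossings(square, edges))
print(count_edge_crossings(bowtie, edges))
```

Under these assumed coordinates the square drawing has no crossings, whereas the second drawing of the same cycle has one. The check ignores collinear overlaps and edges that pass through nodes, which a fuller treatment would need to handle.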

Moreover, the number of studies evaluating the aesthetic and perceptual criteria on which those algorithms are based is relatively small (see Dwyer, Lee et al. 2009) and there are few well-defined criteria for assessing how “good” a graph layout is at the interpretation level (Bender-deMoll and McFarland 2006). Some studies have found that the perceptions of structural features of the network changed as the spatial arrangement of the network changed (McGrath, Blythe et al. 1997; Huang, Hong et al. 2007). As Figure 2 illustrates, even at the most fundamental level, techniques used to render a graph can emphasise different network paths (Ware 2000). In this regard node-link visualisations present great challenges in providing a representation of network relationships that could be relied upon to be interpreted in a consistent manner (Huang, Hong et al. 2006).

In addition, the current approaches to node-link drawing focus on readability rather than the visual communication of substantive content (Brandes, Raab et al. 2001). In view of the number of problems that node-link diagrams pose, some authors have proposed that we move beyond the graph paradigm or at least use graphs in concert with other visualisation approaches (Viégas and Donath 2004; Karahalios and Viégas 2006).

Figure 2 – Different paths are emphasised despite nodes positioned in the same location [adapted from (Ware 2000)]

Node-link representations of social networks are by far the most common visual representation. However, a number of studies have found that their utility decreases quickly as the network size becomes larger. They often use algorithms that are high in complexity, and little empirical testing has been undertaken to evaluate the usefulness of this type of layout. These issues have encouraged researchers to investigate alternative methods of visualising social networks, primarily using matrix representations (cf. Henry and Fekete 2007; Henry 2008), tables, scatterplots and histograms (cf. Lee 2006; Aris and Shneiderman 2007), tree-ring layouts (cf. Farrugia, Hurley et al. 2011) or hybrid approaches (cf. Henry, Fekete et al. 2007). Currently, force-directed algorithms and other layout algorithms usually take into consideration only the topology of the graph, ignoring meta-information, specifically the attributes of nodes and edges.

The challenges extant in representing static networks are compounded when attempting to visualise dynamic networks. The visualisation of dynamic or temporal networks is becoming increasingly important and has been identified by leading researchers in the field as one of the next primary challenges in network theory. The data describing a social network will generally have a temporal dimension to it (Daassi, Nigay et al. 2006). Social networks are dynamic systems in a constant state of change. People establish and dissolve friendships, business relationships continually change, and people are born while others die.

Temporal relational datasets available to network researchers are becoming more common, due in part to social interactions on the web and the availability of the data describing those interactions. In addition, modern computing power and storage make it possible to store and analyse data from large, dynamic networks.


Visualisations of dynamic networks tend to consist of animated or time-based series of static snapshots of node-link representations. Animated node-link representations suffer from the same issues that afflict static ones, with additional challenges introduced because of the temporal nature of the data.
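The time-based series of static snapshots mentioned above can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the method of any tool discussed in this thesis: the snapshots function, the integer timestamps, the window width and the sample events are all hypothetical.

```python
from collections import defaultdict


def snapshots(events, window):
    """Group timestamped edges (source, target, time) into fixed-width windows.

    Each window keeps only the set of distinct edges seen in it, which is
    what a static node-link snapshot of that window would draw.
    """
    slices = defaultdict(set)
    for source, target, time in events:
        slices[time // window].add((source, target))
    return dict(slices)


# Hypothetical contact events: (caller, callee, time step).
events = [
    ("ann", "bob", 1),
    ("bob", "cy", 2),
    ("ann", "bob", 3),
    ("cy", "ann", 12),
]

by_window = snapshots(events, window=10)
for index in sorted(by_window):
    print(index, sorted(by_window[index]))
```

In this sketch the two contacts between ann and bob inside the first window collapse into a single edge, which illustrates how slicing a temporal network into snapshots can hide micro-level temporal patterns.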

Visualisations generally require trade-offs. However, the advantage of computer-based visualisations is that we are not tied to one particular representation, and the most suitable visualisation or a combination of them can be used to assist the analysis. Recognising the challenges described above and remaining cognisant that there is no one best representation of a social network, I have developed new approaches which address some of the visualisation issues related to graph representations. The next section describes the aims and approach of this research in more detail.

1.4 Research Aims and Approach

The primary goal of this thesis is to study Social Network Visualisation, examining what the current issues are and exploring what new visualisation approaches could be employed to mitigate them.

The examination of the literature in the social network visualisation field reveals that the visualisation of Dynamic Social Networks is one of the primary challenges in the area. In addition, there are few approaches that visualise the relationship between the network structure and node attributes or semantic relationship information. This research investigates what visualisation approaches might be employed to address these challenges.

This thesis aims to answer the questions:

1. How could alternative visual encoding approaches aid in performing fundamental temporal tasks on social networks?

2. How can network visualisation techniques be used to examine the relationship of metadata to the structure of a network and vice versa?

An analytical evaluation approach is used to assess the performance of the visualisation approaches presented in this thesis. Established taxonomies are employed to qualitatively determine what low-level tasks can be achieved through the application of each visualisation technique. This evaluation assists in informing designers and developers of visualisations when the approaches described herein might be of value, and when they might not be useful.

1.5 Scope

Information visualisation is a broad field. This thesis focuses on visualising information contained in datasets which can be viewed as a social network. The term visualisation can be interpreted to include aural and tactile “visualisations”, for use by those people with disabilities or other impairment. This thesis focuses on visualisations that require the use of the human visual system.


The application of the techniques described in this thesis was generally assessed using social network datasets of fewer than 50 nodes and under 400 edges. However, as discussed in Chapter 6, the Temporal Interactive Multi-slider Event and Relationship (TIMER) tool was evaluated with the 2008 Visual Analytics Science and Technology (VAST) ‘Mini-challenge three’ dataset consisting of a log of cell phone calls which contains 400 nodes and 9834 edges (cf. Grinstein, Plaisant et al. 2008).

One claim of social network analysis is that it supports an ability to work at the macro, meso and micro levels simultaneously; however, when multiple level analyses are conducted they are generally performed more or less separately. Thus far, the visualisation of very large datasets has remained for the most part a separate field of visualisation research which generally includes dimensionality scaling, clustering or other aggregation techniques. The visualisation of large and very large datasets is not considered in this thesis. Moreover, as the focus of this thesis is on small to medium size relational datasets, the use of aggregation techniques is avoided to reduce the possibility of introducing artefacts that could obscure interesting micro-level patterns.

This thesis does not propose additional metrics for a domain already overcrowded with statistical measures. The innovations described herein focus on the visual representation of the social network data. The visual encoding techniques enable users to obtain fundamental information quickly, at times preattentively (see Wolfe 2005), thereby contributing to the visual facet of Visual Analytics.


The research described in this thesis is centred on the exploration of visualisation encoding and interaction approaches. Most visualisation approaches can be enhanced through the addition of filtering, the ability to query, and interactive navigation and interaction techniques. However, this thesis is primarily concerned with exploring whether there is inherent value to a particular visual encoding approach. To facilitate this exploration, it is necessary that the prototypes and concept demonstrators implement a simple level of filtering and interactivity. However, developing fully functioning software with the ability to perform complex filtering and provide a high degree of interactivity is beyond the scope of this thesis. Therefore, evaluation approaches that are designed to evaluate the performance of a fully developed tool are inappropriate and are consequently not employed in this thesis.

The prototypes presented in this thesis vary in maturity and application-level functionality. Some prototypes in this thesis are largely concept assessment applications which do not have the same level of filtering and querying capability as applications that have had tens of thousands of man-hours of development devoted to them.

The prototypes described in this thesis utilise colour as a notable part of the visual encoding scheme. Therefore, the approaches described will at a minimum require redesigning if they are to be used by people who have difficulties with colour perception.


This thesis aims to evaluate the new approaches with respect to established opinion on what is required of a useful social network visualisation. The ultimate aim is to evaluate individual techniques in isolation and to incorporate the techniques which prove useful into a tool which can be evaluated by longitudinal studies.

1.6 Contributions

The research contained in this thesis contributes primarily to the field of information visualisation but also to the field of visual analytics through the visual encoding designs and interactive prototypes.

Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces (Bartoletti, Billinghurst et al. 2005), and the research described in this thesis investigates and uses interactive application prototypes to facilitate testing of the visual encoding designs. Although interaction mechanisms are very important in computer-based information visualisation, the underlying method of representing data is equally important. Card (2008) describes information visualisation as “a set of technologies that use visual computing to amplify human cognition with abstract information.” Good visual encodings can be read and comprehended almost instantaneously, while others create visual puzzles (Tufte 1983; Tufte 1990, cited in Krempel 2009).

A particular focus of this thesis is on Dynamic Network Visualisation and developing approaches that help users understand the impact of external events on the structure of the networks and the degree of change within them. The specific contributions of this thesis are:

1. Extensive literature review of Social Network Visualisation.

2. Design of the Attribute-Based Graph Visualisation (ABGV) approach concept demonstrator.

3. Proposing the visualisation component of the Social Network Analysis for Command and Control tool (SNAC2).

4. Creation of the Parallel Arc Diagram visualisation method and the development of the software prototype Temporal Interactive Parallel Arc Diagram (TIPAD).

5. Development of the Temporal Interactive Multi-slider Event and Relationship (TIMER) prototype. This focuses on visualising and enabling temporal pattern discovery in Dynamic Networks.

6. Temporal additions to the static taxonomic task descriptions for enhanced temporal visualisation evaluation.

The reader may also refer to other publications (listed below) which I have authored or co-authored related to aspects of this thesis. My contributions to these publications are:

1. The PAD approach and the TIPAD application are also described in the publication entitled Parallel Arc Diagrams: Visualizing Temporal Interactions, which appeared in the Journal of Social Structure, volume 12, issue 7 (Hoek 2011).

2. The use of SNAC2 to analyse team interaction in dynamic targeting is described in the International Journal of Intelligence Defence Support Systems (Lo, Au et al. 2011), and the use of contextual data in the analysis of temporal social networks is described in the TTCP Human Science Symposium (Lo, Au et al. 2010) and the proceedings of the SimTect Conference (Lo, Au et al. 2009).

1.7 Thesis Structure

This chapter has briefly described the concept of Social Networks, Social Network Analysis, Visual Analytics and the need for the visualisation of Social Networks. In addition, it introduced the most common and natural representation of social network data: the node-link diagram. It also alerted the reader to some of the challenges facing the field of social network visualisation, including the observation that very little empirical testing has been performed on social network visualisation techniques, and a predominant focus on readability rather than visual communication.

This chapter also informed the reader of what some consider the next principal challenge in the field, dynamic network visualisation, and what some see as an excessive focus on network topology at the expense of visualising the attributes of nodes and edges. Finally, it covered the aims and contributions of this thesis to the field of Information Visualisation.


Chapter 2 covers the theoretical and methodological contributions to social network visualisation. It reviews the important contributions and theoretical foundations of Social Network Visualisation and introduces a basic feedback model of visualisation. In addition, it describes the importance of visualisation, discusses the elements that are considered important in social network analysis and the associated visualisation, reviews the current approaches to social network visualisation, discusses various categories of visualisations and examines assorted improvements and alternatives to node-link visualisations. Most importantly for this thesis, it describes the user requirements for visualisation and analysis and some of the challenges of dynamic network visualisation.

Chapter 3 discusses the poor state of support in existing tools for visualisation and analysis where node and edge attributes play a central role. The work described in this chapter contributes to filling that void. I provide a description of a visualisation technique using a modification of existing approaches which benefits novice and experienced analysts alike. This technique is demonstrated through the use of the ABGV prototype software application. This prototype is the only one in the thesis which is not focused on visualising temporal aspects of social networks.

Chapter 4 discusses the visualisation aspects of the Social Network Analysis for Command and Control (SNAC2) application which supplements traditional social network analysis with the addition of features such as contextual mark-up. The application provides the ability to categorise the sequence of events into phases using flexible notions of time and annotations for important events. It displays the semantic information associated with a particular event and as a result analysts are able to determine communication latencies. It was created to fill an urgent requirement as no existing tool could provide the ability to visualise and utilise the rich data captured by a team of Defence analysts.

Chapter 5 describes the concept of the Parallel Arc Diagram (PAD) technique for visually representing dynamic networks. Visualisations using typical existing techniques are compared to the PAD technique. This concept is demonstrated and tested via a prototype application, the Temporal Interactive Parallel Arc Diagram (TIPAD), which applies the PAD approach to visualising 2-Mode temporal relationships inferred from the script of the movie “The Matrix”. This chapter discusses how simple features of the inferred social network are made visually apparent and how the visualisation assists in examining egocentric patterns of interaction.

Chapter 6 discusses the concept and motivation behind the Temporal Interactive Multi-slider Event and Relationship (TIMER) tool. This prototype is the only one in this thesis which was tested on a relatively large dataset with almost ten thousand edges (the VAST 2008 Cell Phone Mini-challenge). It is also compared against tools which have been applied to visualise and analyse the same data.

Chapter 7 provides a discussion of the evaluation methodologies, including a model which characterises the visualisation design and validation process. It also presents the results of the analytical evaluation of the new visual encoding techniques demonstrated in the prototypes using heuristics drawn from static and temporal task taxonomies.

Chapter 8 discusses and summarises the work presented in the preceding chapters, draws conclusions based on this work and suggests opportunities for future work.

A glossary of specialist terms has been included in Appendix A.

[Figure 3 diagram. Labels: Static Network Visualisation; Dynamic Network Visualisation; Structure Focused Visualisation; Attribute/Context Focused Visualisation; Graph representation; Matrix representation; Map representation; Unique representation; Node-link representation; SNAC2 (Ch. 4); TIPAD (Ch. 5); TIMER (Ch. 6); ABGV (Ch. 3).]

Figure 3 - Thesis contributions in the Social Network Visualisation space

Figure 3 shows the four primary contributions of this thesis and depicts their relationship to the body of research concerned with the visualisation of social networks. The green objects represent the contributions and include a reference to the relevant chapter number. The red lines can be read as “is a type of”. The dotted line of the TIMER tool indicates that although it utilises a node-link representation it is not the main focus.

The figure shows that concept demonstrators or prototypes presented in this thesis utilise node-link representations as a component of a unique visualisation approach or alternatively rely on an original visualisation technique introduced and described in the relevant chapter.


Chapter 2. Background

2.1 Visualisation

Many authors suggest that the use of visual images has greatly assisted in advancing many branches of science. Some even suggest that visualisation and measurement are the two primary factors that are responsible for the rapid development of all modern science (Crosby 1997).

One might argue that statistics and other mathematical techniques could answer questions more efficiently and accurately. However, these methods are often only useful if you know what questions to ask a priori. According to Tufte (2001) one should not fail to appreciate the importance of “looking” at one’s data before analysing it. This principle is illustrated elegantly by Anscombe’s (1973) quartet, a set of four datasets that share nearly identical summary statistics yet look very different when plotted, which shows the dangers of blindly applying statistical methods to data. Visualisations help overcome these pitfalls by leveraging the powerful perceptual abilities of humans and allow us to gain insights that would not otherwise be apparent (Perer and Shneiderman 2006). Information Visualisation systems appear to be most useful when a person does not know what question to ask about the data or when they want to ask more informed questions (Wolfe 2005).
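The point made by Anscombe's quartet can be checked numerically. The following is a minimal Python sketch, not part of the thesis software: it assumes the commonly published values of the quartet (they are not reproduced elsewhere in this thesis), and the names `QUARTET`, `pearson` and `summarise` are illustrative only. It computes the mean of x, the mean of y and the Pearson correlation for each of the four datasets.

```python
from math import sqrt

# Commonly published values of Anscombe's (1973) quartet (an assumption of
# this sketch; the thesis itself does not list the data).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarise(quartet):
    """Map each dataset name to (mean of x, mean of y, correlation)."""
    return {name: (mean(xs), mean(ys), pearson(xs, ys))
            for name, (xs, ys) in quartet.items()}


if __name__ == "__main__":
    for name, (mx, my, r) in summarise(QUARTET).items():
        print(f"{name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The three summary values agree across the four datasets to about two decimal places, so these statistics alone cannot distinguish them; only a plot reveals that one is roughly linear, one is curved, and two are dominated by a single outlying point.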


A simple feedback model of visualisation is shown in Figure 4. The model is intended to fit all levels of visualisations, ranging from a simple LED indicator to complex 3D computer simulations (Van Wijk 2005). The model depicts Data (D) being transformed according to a Specification (S) into a time-varying Visualisation (V). The user applies Perception (P) and cognition to the Image (I) to gain some Knowledge (K). The amount of knowledge gained is dependent on the amount of existing knowledge. Importantly, the user may decide to adapt the original specification of the visualisation in order to further explore the data, represented by E(K) in the model. This iterative cycle informs the user, thereby increasing their knowledge after each iteration.
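The relationships between the symbols named above can be written compactly. The following is a sketch only, assuming the notation usually associated with Van Wijk (2005); the exact form should be checked against the original paper, and, as with the figure, the notation is shorthand rather than a precise description.

```latex
\begin{align}
  I(t) &= V(D, S, t)
      && \text{the image is produced from data and specification} \\
  \frac{dK}{dt} &= P(I, K)
      && \text{knowledge grows through perception of the image} \\
  \frac{dS}{dt} &= E(K)
      && \text{exploration driven by knowledge adapts the specification}
\end{align}
```

Read this way, the feedback loop is explicit: a change in S alters I, which alters the rate at which K grows, which in turn drives further changes to S.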

Figure 4 - Van Wijk model of visualisation (Van Wijk 2005; Van Wijk 2006)3

Traditionally the area of information visualisation is viewed as a distinct sub-field that overlaps with two other visualisation areas: scientific visualisation and geovisualisation (see Figure 5). Geovisualisation is an abbreviated term for geographic visualisation and comprises a set of visualisation and analysis techniques targeted at understanding and exploring geospatial data. The scientific visualisation area tends to concentrate on the visualisation of data generated by scientific inquiry and generally consists of numerical data about the real world, that is, data related to the human body, the earth, molecules, etc. Information visualisation, in contrast, is concerned with the visual representation of abstract data to assist human cognition.

3 Reproduced with the kind permission of Prof. Jarke Van Wijk. The symbols and mathematical notation are used as concise shorthand rather than a precise description.

Although this categorisation is not universally agreed upon, it is still a useful method of partitioning visualisation domains (Rhyne, Tory et al. 2003). There are copious amounts of abstract data in the world, and its abundance and complexity have been the motivation for much of the research in information visualisation. This thesis concentrates on the area of information visualisation.

[Figure 5 diagram. Three overlapping areas: Information Visualisation, Scientific Visualisation and Geovisualisation.]

Figure 5 - Visualisation Taxonomy


2.2 Social Network Visualisation

A social network consists of nodes and ties; the nodes represent the individual actors in the network, and the ties represent relationships between the actors. In the analysis of social networks the nature of the relationship among the actors is of prime importance and can consist of numerous types. For example, relationships could be kinship, interactions, affiliations, cognitive, perceptual and role based.

Sociograms are an early form of graph visualisation which visually describe the social relationships between actors (see Figure 6). Sociologists and other analysts undertaking SNA often use sociograms to perform their analysis, although they are now more commonly referred to as social network visualisations (see Figure 7).

Figure 6 - Moreno's hand-drawn sociogram: This node-link diagram published in the New York Times in 1933 by Jacob Moreno shows relationships between fourth graders.


Figure 7 - A Social network diagram. Source: orgnet.com (image reproduced with kind permission of Valdis Krebs)

Hand-drawn images of social networks started to appear in the 1930s, with visualisation techniques developing slowly until the advent of personal computers, which provided enhanced visualisation possibilities. Even though modern computing technology has enabled better interactivity and higher fidelity representations, most of the social network visualisations used today are still based on a node-link representation.

Jamali and Abolhassani (2006) describe four social network models:

• Using formal methods to show Social Networks

• Graphs to represent Social Relations

• Using matrices to represent Social Relations

• Statistical Models for Social Network Analysis

Although using a formal or mathematical model can provide a compact and systematic representation, the visual models can aid in the exploration of relationships in ways that other representations cannot. Several authors claim that visualisations of social networks can be an effective approach to helping analysts both explore relationships between actors and present their findings to others. Traditionally visual representations were not used as an Exploratory Data Analysis (EDA) tool but were used as an aid to communicate or emphasise a unique aspect of the data in a compelling way. EDA is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to maximise insight into a data set (Natrella 2010).

An examination of the ‘International Network for Social Network Analysis’ software repository reveals that most of the 55 software packages referenced for visualising and exploring social networks use node-link diagrams (Henry and Fekete 2007)4. Most visualisation research on understanding relationships in large data sets implicitly assumes that a node-link diagram is appropriate (Kang, Plaisant et al. 2007).

4 Note: Henry’s thesis updates this to 60 systems for visualisation in the INSNA repository.

Typically, researchers believe that node-link diagrams are especially useful in supporting topology-based analysis tasks, and poor at attribute-based analysis tasks.

It has been shown that Node-Link diagrams behave poorly on dense networks for even the simplest of tasks (Ghoniem, Fekete et al. 2005). Node-link diagrams are generally perceived to ‘not scale up well’ and produce cluttered overviews with unreadable labels (Kang, Plaisant et al. 2007). There are two common weaknesses inherent in existing social network analysis tools (Perer and Shneiderman 2006):

• It is difficult to find patterns and comprehend the structure of networks with many nodes and links, and

• There is often a medley of statistical methods and overwhelming visual output, which leaves many analysts uncertain about how to explore the network in an orderly manner.

Various authors have proposed that we move beyond the graph paradigm or at least use graphs in concert with other visualisation approaches (Viégas and Donath 2004; Karahalios and Viégas 2006). Others consider graph drawing to be currently treated as more of an illustrative art than a methodological tool (Moody, McFarland et al. 2005; Bender-deMoll and McFarland 2006). Even those that support the use of node-link representations admit that they are not particularly good at conveying changes in relationships over time, and that little work has gone into improving this failing (Gloor, Laubacher et al. 2004; Viégas and Wattenberg 2004; Appan, Sundaram et al. 2006; Bender-deMoll and McFarland 2006).


Others regard the current state of social network visualisation as poor, saying that existing SNA visualisations show too little detail or have inadequate functionality. These failings are particularly highlighted when attempting to display a large number of entities. “Most programs currently try to fit everything on to one screen. As a result, once the number of nodes has passed a critical point the graph becomes crowded and almost useless for extracting information” (Higgins, Richards et al. 2001).

There are numerous tools which currently provide capabilities to perform analysis and visualisation of social networks (see Figure 8). These tools tend to cater for two distinct categories of users: casual users who are simply exploring data or using the tool to clarify relationships, and analysts who desire a detailed understanding of the network data.

One of the major challenges in any visualisation system is how to present as much important information as possible given a finite display area (Munzner 2000). This problem is immediately apparent in the visualisation of large graphs. Solutions to cope with limited screen area include panning and zooming (cf. Herman, Melancon et al. 2000, p.33), distortion techniques (cf. Gansner, Koren et al. 2005), filtering and aggregating or partitioning data (cf. Munzner 2000; Marshall 2001; Xu, Cunningham et al. 2007; Henry 2008; Wong, Foote et al. 2008). The simplest approach is probably partitioning, where the amount of material to be visualised at any particular time is restricted (Van Ham 2005).


One area increasingly making use of visualisations is that of Social Network Analysis, which is rapidly gaining momentum as a powerful technique to improve individual and organisational effectiveness. It is the mapping and measuring of relationships and flows between people, groups, organisations, animals, computers or other information/knowledge processing entities (Jamali and Abolhassani 2006).

The typical measures and metrics associated with SNA are discussed in the next section as it is important to understand what social network analysts are concerned with and hence what they desire from a social network visualisation tool.


Figure 8 - Samples of social network visualisation tools. Source: visualcomplexity.com5

5 Screen capture of a portion of 777 visualisation projects related to social networks. Available from: http://www.visualcomplexity.com/vc/index.cfm?domain=Social%20Networks. Images reproduced with the kind permission of Manuel Lima. [31/07/2013]


2.3 Social Network Analysis

Richard Cross (2006) describes SNA as “…a set of techniques underpinned by statistical analysis that make visible the hidden connections that are important for sharing information, decision-making and innovation in an organisation”. It is an inherently interdisciplinary academic field and has become one of the major paradigms in contemporary sociology.

The foundation of SNA is a set of theories to assist in understanding the relationships and structures of a network. Graphs are often used as the primary data structure to represent networks, and as a consequence many of the measures and methods applied to SNA are derived from network and graph theory. For instance, the networks that are being analysed can be characterised as random, small world and scale free. Additionally, there are numerous metrics that are considered important in SNA with the most often cited metric being centrality. A list of common terminology and measures used in SNA is presented below:

1. There are at least eight commonly cited measures of centrality. These measures are: degree, betweenness, closeness, eigenvector, power, information, flow, and reach, with the first four of these being the most frequently used. The first three were proposed by Freeman (1979), with the eigenvector approach being proposed by Bonacich (1972). Centrality is generally used to find important and less important actors in a network. It was first applied to human communication by Bavelas in 1948 and stemmed from his research in communication in small groups. His hypothesis was that centrality and influence in group processes were directly related (Freeman 1979). General definitions of the most used measures of centrality are given below; for more extensive definitions, including “valued” network measures, see Opsahl, Agneessens et al. (2010). The fact that so many measures of centrality exist suggests they have varying relevance to particular data sets and that there is no one absolute measure of centrality.

a. Degree - this is the simplest form of centrality measure and can be stated as the number of ties a node has to all others. In directed networks this measure can be divided into two subtypes, indegree and outdegree. This measure is characterised as the “Freeman” approach, described in the foundation paper (Freeman 1979). It is simply characterised by the question “how many people can this person reach directly?”. Degree centrality of a vertex $v$ is defined as:

\[ C_d(v) = \deg(v) \]

b. Betweenness - Freeman (1977) defined a measure of a node's betweenness based on the number of shortest paths from all nodes to all others that pass through the selected node. Typically betweenness is used to quantify the level of ‘control’ a human has in a communication network. It can be simply characterised by the question “how likely is this person to be the most direct route between two people?” Betweenness is defined as:

\[ C_b(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]

where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.

c. Closeness – is the reciprocal of the ‘farness’ of a node, which is the sum of all its distances to other nodes in the network. This measure takes into account the indirect ties a node has to all other nodes and tries to quantify the intuitive notion of being central in two-dimensional space. It is simply characterised by the question “how fast can this person reach everyone in the network?” In-closeness is termed integration, and out-closeness is termed radiality. The closeness centrality of actor $x$ is defined as the reciprocal of the average of $d(x, y)$:

\[ C_c(x) = \frac{n - 1}{\sum_{y \neq x} d(x, y)} = \frac{1}{\mathrm{AVG}_{y \neq x}\, d(x, y)} \]

where $n$ is the number of actors in the network, and $d(x, y)$ is the shortest path distance between actors $x$ and $y$.

d. Eigenvector – the eigenvector centrality approach is an effort to find the most central “global” actors and gives less consideration to patterns that are more local. A node’s eigenvector centrality is proportional to the sum of the eigenvector centralities of all the nodes directly connected to it. It is characterised by the question “how well is this person connected to other well-connected people?” The defining equation of an eigenvector in vector notation is:

\[ \lambda v = A v \]

where $A$ is the adjacency matrix of the graph, $\lambda$ is a constant (the eigenvalue), and $v$ is the eigenvector. The equation lends itself to the interpretation that a node that has a high eigenvector score is one that is adjacent to nodes that are themselves high scorers.

2. Clustering – clustering is generally used to find social groups or cohesive subgroups. Cliques, N-cliques, and K-plexes look for strongly connected sub-graphs based on various thresholds and parameters. λ-sets is an alternative top-down approach and determines which connections, when removed from a graph, result in a disconnected structure.

3. Structural cohesion – is a measure of cohesion in social groups. It is defined as the minimal number of actors who, if removed from a group, would disconnect the group (Moody and White 2003).

4. Density - the density of a network is the ratio of the number of edges present to the total number of theoretically possible edges between all pairs of nodes. That is, the density of a network is the degree of dyadic connection of the network. Density is commonly used as an indicator of how well the network is connected. A perfectly connected network with a value of 1 is termed a clique. Terminology such as ‘sparse’ and ‘dense’ networks is derived from this measure.

5. Path Length - a path between two nodes is any sequence of non-repeating nodes that connects them. Path Length is then a measure of the length of the paths between nodes (also called the distance between nodes). In networks with binary relations, the length is simply a count of the edges between them. The longest shortest path between any two nodes is termed the diameter. The average of the shortest path lengths in a network (average distance) is an indicator of how far apart any two nodes will be on average.

6. Reach – common reach measures are 2-reach and 3-reach, which are measures of the proportion of nodes that can be reached within 2 or 3 steps respectively.

7. Equivalence – Structural, Regular, and Automorphic equivalence are all variants of this measure. Structural Equivalence refers to the extent that two actors exhibit connections to the same alters. Nodes are considered structurally equivalent if they have exactly the same relationships to all other nodes. When the strict requirements to meet structural equivalence are relaxed, the concepts of automorphic equivalence and regular equivalence emerge. Two actors are regularly equivalent if they are equally related to equivalent others (Everett and Borgatti 2002).

8. Structural hole – refers to the absence of ties between parts of the network which can give a particular node benefit as a result of its position in the network and its lack of ties.

9. Reciprocity - when the ties are directed, reciprocity measures the extent that ties in dyadic relationships are reciprocated. There are a number of possible definitions; however, it is most often a ratio of reciprocal relations relative to the total number of actual ties. Reciprocity can be considered to be one indicator of balance or stability in social structure (Rao and Bandyopadhyay 1987).

10. Transitivity – focuses on triads such that in a transitive network: if there is a tie between A and B and between B and C then there will also be a tie between A and C.

11. Core and Periphery Structures – can be measured by the degree to which

a social network is centralized. These structures (often detected at the

meso-scale level) can be identified visually or by using node degree and

their distributions or other complex algorithmic detection.

12. Density – is the ratio of the number of edges in the network over the total

number of possible edges. This ratio is often used when comparing the

density of local areas within the same network.
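Several of the measures listed above reduce to a few lines of code. The following Python sketch is illustrative only: the function names, the toy five-node network and the use of a plain edge list are assumptions made for this example and do not come from any tool discussed in this thesis. Reciprocity is computed as the share of directed ties that are reciprocated, which is only one of the possible definitions noted above.

```python
from collections import deque
from itertools import combinations


def density(nodes, edges):
    """Undirected density: actual edges over the n*(n-1)/2 possible edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0


def adjacency(nodes, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: edge-count distance to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def k_reach(adj, source, k):
    """Proportion of the other nodes reachable from source within k steps."""
    dist = distances_from(adj, source)
    within = sum(1 for d in dist.values() if 0 < d <= k)
    return within / (len(adj) - 1)


def diameter_and_average_distance(adj):
    """Longest and mean shortest-path length over all connected pairs."""
    lengths = [d for v in adj for d in distances_from(adj, v).values() if d > 0]
    return max(lengths), sum(lengths) / len(lengths)


def reciprocity(arcs):
    """Share of directed ties (a, b) whose reverse tie (b, a) also exists."""
    arcs = set(arcs)
    return sum(1 for a, b in arcs if (b, a) in arcs) / len(arcs)


def transitivity(adj):
    """Share of two-paths A-B-C that are closed by a tie between A and C."""
    paths = closed = 0
    for b in adj:
        for a, c in combinations(adj[b], 2):
            paths += 1
            closed += c in adj[a]
    return closed / paths if paths else 0.0


# Hypothetical toy network: a triangle (a, b, c) with a tail c-d-e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = adjacency(nodes, edges)
```

For this toy network the density is 0.5, the diameter is 3, the average distance is 1.7, the 2-reach of node a is 0.75 and the transitivity is 0.5.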

A study of communication networks found that traditional methods of analysis, which largely rely on prestige and centrality measures, can be further enhanced by the inclusion of context (Viermetz and Skubacz 2007). The authors argue that traditional analysis may reveal which member is dominant; however, this rests on the assumption that the amount or frequency of communication is an indicator of some member characteristic. In this type of analysis, members that are concise and transmit important information may be misrepresented. Viermetz and Skubacz use the content to divide large networks into “semantically connected sub-networks” (using topic detection and density-based clustering) and then perform analysis on these sub-networks.

One particular type of network that analysts commonly study is the “Ego” network. Ego network analysis combines the perspective of network analysis with the data of mainstream social science. These networks are typically studied when a researcher has a question about phenomena affecting individual entities. An ego network is a sub-network that focuses on a single actor within the network. This focal point is called the “ego” and the other actors are called “alters”. The

“Neighbourhood” is the set of alters the ego has a connection with.

In social network analysis the ego neighbourhood is generally considered one-step; that is, it includes only actors that are directly adjacent (Hanneman and Riddle 2005). In terms of influence, an ego’s behaviour is often attributed to the ego’s alters because they are a source of information, social support, sense-making, normative pressures, etc. Ego-centric data is rarely discussed in the Graph Theory literature, and many standard SNA measures are of little use when applied to it (Seary 2005).
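As an illustration of the one-step neighbourhood described above, the following Python sketch extracts an ego network from an undirected edge list. The function name, the edge-list representation and the actor names are assumptions made for this example only.

```python
def ego_network(edges, ego):
    """Return (alters, ties) for the one-step neighbourhood of ego.

    alters: the actors directly adjacent to ego.
    ties:   every tie among ego and its alters, including alter-alter ties.
    """
    alters = {b if a == ego else a for a, b in edges if ego in (a, b)}
    alters.discard(ego)                       # ignore any self-tie on the ego
    members = alters | {ego}
    ties = [(a, b) for a, b in edges if a in members and b in members]
    return alters, ties


# Hypothetical communication network.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]
alters, ties = ego_network(edges, "ann")
```

Here the alters of ann are bob and cat, and the ego network keeps the tie between bob and cat while dropping dan and eve, who are more than one step away.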


One approach to understanding complicated networks is to study local connections of individual actors and determine how the complete network forms from separate ego networks. This approach is tailored towards understanding individuals rather than the complete network structure. Typically, analysts using this approach are interested in observing or discovering patterns of interaction. An ego centric approach is often applied to the visualisation of networks that are inferred based on communication (Fisher and Dourish 2004).

2.4 Fundamental Algorithmic Approaches to Visualisation

The ultimate purpose of graph visualisation is to create a representation from which a human can infer properties of the network being studied. The visual analysis of a graph is generally composed of three related elements: the visual representation, user interaction and algorithmic analysis (see Figure 9). This thesis concentrates on the visual representation; however, where these elements intersect it is difficult to discuss them separately.

Figure 9 - Primary elements for visual graph analysis: visual representation, algorithmic analysis and user interaction (von Landesberger, Kuijper et al. 2011)

The following sub-sections describe common visualisation approaches which can be the basic building blocks for more advanced approaches. Advanced approaches often include, combine or extend the basic approaches and consequently benefit from them. However, they may also suffer from their inherent weaknesses.

2.4.1 Force-Directed Methods

Force-directed methods are a popular mechanism to calculate and render node-link representations of graphs. They are also known as spring embedders and determine the layout of a graph using information only contained within the structure of the graph itself. These methods generally have two parts: an energy model and an algorithm that searches for an equilibrium state where the total force on each node is zero. In addition, the algorithm generally attempts to satisfy various aesthetic criteria.

The origins of this method date back to at least 1963 with a method based on barycentric representations (Tutte 1963). Graphs drawn with force-directed algorithms tend to exhibit symmetries and produce crossing-free layouts for planar graphs. The most popular approaches are based on the method proposed by Eades (1984). Those based on this approach often rely on spring forces similar to those in Hooke’s law and generally consist of repulsive forces between nodes, but also attractive forces between nodes that are adjacent.

Eades’s original algorithm views the graph as a mechanical system in which steel rings are connected by springs that exert logarithmic forces, so that nodes do not overreact. The rationale for using logarithmic forces is that if the normal linear behaviour of springs is used, the resultant force is too strong on nodes that are far apart. Eades’s algorithm is summarised as follows.

The force exerted by a spring is:

$c_1 \log(d / c_2)$,

where $d$ is the length of the spring and $c_1$ and $c_2$ are constants. Non-adjacent nodes repel each other with an inverse square law force,

$c_3 / d^2$,

where $c_3$ is a constant and $d$ is the distance between the nodes.
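The two force laws can be sketched as a single iteration of a spring embedder. This is a minimal illustration rather than a reproduction of Eades’s implementation: the constant values, the step size C4 (moving each node by a fraction of the net force) and the function name are assumptions made for this example.

```python
import math

# Illustrative constants (assumed values); C4 scales how far a node moves.
C1, C2, C3, C4 = 2.0, 1.0, 1.0, 0.1


def eades_step(pos, edges):
    """One iteration: sum the forces on each node, then move it by C4 * force."""
    adjacent = {frozenset(e) for e in edges}
    new_pos = {}
    for v, (vx, vy) in pos.items():
        fx = fy = 0.0
        for u, (ux, uy) in pos.items():
            if u == v:
                continue
            dx, dy = ux - vx, uy - vy          # vector pointing from v to u
            d = math.hypot(dx, dy) or 1e-9     # guard against coincident nodes
            if frozenset((u, v)) in adjacent:
                # Logarithmic spring: pulls v towards u when d > C2.
                magnitude = C1 * math.log(d / C2)
            else:
                # Inverse square repulsion: pushes v away from u.
                magnitude = -C3 / d ** 2
            fx += magnitude * dx / d
            fy += magnitude * dy / d
        new_pos[v] = (vx + C4 * fx, vy + C4 * fy)
    return new_pos
```

Repeating the step a fixed number of times moves the layout towards an equilibrium. Because every pair of nodes is examined, each iteration of this naive version costs time quadratic in the number of nodes.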

The basic force-directed algorithm computes O(|E|) attractive forces and O(|V|^2) repulsive forces. Techniques to reduce the quadratic complexity of the repulsive forces have been adopted which reduce the complexity to O(|V|).

Eades’s algorithm attempts to create aesthetically pleasing layouts by keeping all edge lengths the same and by demonstrating as much symmetry as possible. Later, Fruchterman and Reingold (1991) added the goal of even vertex distribution and modelled the system as atomic particles or celestial bodies. In addition, they added the notion of temperature, which reduces the amount of adjustment as the layout becomes “better”. This is a special case of the general technique called simulated annealing.
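The role of temperature can be illustrated with a short sketch in the style of Fruchterman and Reingold. It is a simplified illustration under stated assumptions: the attraction d^2/k and repulsion k^2/d force laws are the ones commonly associated with their method, while the linear cooling schedule, the initial temperature, the frame size and the function name are choices made for this example.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Force-directed layout whose moves are capped by a cooling temperature."""
    rng = random.Random(seed)                    # random start state
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))   # ideal distance between nodes
    temperature = width / 10                     # largest move allowed per step
    cooling = temperature / (iterations + 1)     # assumed linear cooling

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion k^2 / d acts between every pair of nodes.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d
                disp[v][0] += dx / d * push
                disp[v][1] += dy / d * push
                disp[u][0] -= dx / d * push
                disp[u][1] -= dy / d * push
        # Attraction d^2 / k acts along every edge.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k
            disp[v][0] -= dx / d * pull
            disp[v][1] -= dy / d * pull
            disp[u][0] += dx / d * pull
            disp[u][1] += dy / d * pull
        # Move each node by at most the current temperature, inside the frame.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + disp[v][0] / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + disp[v][1] / length * step))
        temperature -= cooling                   # smaller moves as layout settles
    return {v: tuple(p) for v, p in pos.items()}
```

The nodes argument is expected to be a list. Running the sketch with different seed values gives different final layouts.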


Another variation on force directed approaches calculates forces between nodes based on their graph theoretic distances, determined by the lengths of shortest paths between them (Kamada and Kawai 1989). Recent work attempts to overcome some of the less desirable effects of established energy models. Traditional energy models enforce short uniform edge lengths and tend to group nodes with a large degree in the centre of the layout. This approach hinders the viewer’s ability to discriminate clusters. The LinLog model has two variants implementing either node-repulsion or edge-repulsion which cluster nodes according to two well-known criteria: the density of the cut and the normalised cut (Leighton and Rao 1988; Shi and Malik 2000). This model improves the discrimination of clusters over the standard force-directed approach.

The use of genetic algorithms together with force-directed placement has also been considered as an addition to the method. Genetic algorithms are inspired by evolutionary biology mimicking the process of natural selection to find solutions that are close to optimal. This includes inheritance, mutation, selection and crossover

(Kosak, Marks et al. 1994; Zhang, Liu et al. 2005). Genetic algorithms have been introduced as a solution to the computationally intractable sub-problems in graph drawing. Genetic algorithms find approximate solutions to optimisation and search problems. Therefore their application to force-directed layout means they find a solution that is close enough to being optimal. However, the introduction of genetic algorithms can introduce problems relating to crossover operations of graphs

(Eloranta and Mäkinen 2001).


The utility of the basic force-directed approach is limited to small graphs (Ghoniem, Fekete et al. 2005). There are a number of reasons why traditional force-directed algorithms do not perform well for large graphs. These include the fact that the physical model typically has many local minima and, in addition, large graphs have a minimum vertex separation that tends to be very small, leading to unreadable drawings. One method of attempting to address this limitation is the multi-level layout technique. This technique views the graph as a series of progressively simpler structures, which are laid out in reverse order: from the simplest to the most complex (Hadany and Harel 1999; Harel and Koren 2001; Walshaw 2001).

A common problem with force-directed layout algorithms is that they do not take the size of each node into account, resulting in overlapping nodes. Refinements of force-directed algorithms to address this problem typically consist of introducing repulsive forces, treating the problem as a constrained optimisation problem or developing heuristics to reduce most overlapping nodes (Marriott, Stuckey et al. 2003; Li, Eades et al. 2005; Gansner and Hu 2009).

Although force-directed algorithms have some limitations they are utilised very regularly because of inherent advantages, which include:

• they are relatively simple to implement,

• extensions and heuristic improvements can be added easily,

• smooth transitions to the final state help preserve the user’s mental map, and

• they can be extended to 3D models.

Limitations of the basic force directed layouts include issues related to not being able to distinguish communities easily. In addition, naïve implementations have a relatively long running time and according to established aesthetic criteria, local minima can produce low-quality drawings. Another weakness is that force-directed layouts do not provide a deterministic model, that is, the start state of nodes greatly influences the final layout.

2.4.2 Non-Euclidean Approaches

Many of the proposed non-Euclidean approaches utilise hyperbolic space. Hyperbolic space provides a number of advantages including the ability to provide a fish-eye view of a section of the graph as a natural consequence of converting from hyperbolic space to Euclidean space (Lamping, Rao et al. 1995). To display the layout on a computer monitor it is necessary to map it to two-dimensional Euclidean space. Two of the most common methods of achieving this are the Poincaré disk and Beltrami-Klein projections. The Beltrami-Klein projection preserves straight lines but the angles are not necessarily preserved. Conversely, the Poincaré disk model maintains the angles, but distorts lines.
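Both projections place the hyperbolic plane inside the unit disk, and a point can be converted from one to the other with a simple radial rescaling. The following Python sketch shows the standard conversion between the two disk models; the function names are assumptions made for this example.

```python
import math


def poincare_to_klein(x, y):
    """Map a point of the Poincaré disk to the Beltrami-Klein disk."""
    s = 1 + x * x + y * y
    return 2 * x / s, 2 * y / s


def klein_to_poincare(x, y):
    """Inverse mapping; the point must lie strictly inside the unit disk."""
    s = 1 + math.sqrt(1 - (x * x + y * y))
    return x / s, y / s
```

For example, the Poincaré point (0.3, 0.4), at distance 0.5 from the centre, maps to approximately (0.48, 0.64) in the Beltrami-Klein disk, at distance 0.8. The centre is fixed and every other point moves outward along its ray.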

Hyperbolic geometries have properties that are well suited to the layout of large graphs. Layout techniques for visualising very large directed graphs in three-dimensional hyperbolic space can handle two orders of magnitude more data than general graph layout tools (Munzner 1997). The use of force-directed layouts in non-Euclidean space is common and particular types of graphs are often visualised on the surface of a sphere or torus (Munzner 1998; Kobourov and Wampler 2005).

When visualising large graphs, users of a hyperbolic browser select an area of

“focus” and the regions that approach the centre become magnified, while the regions that were in the centre shrink as they move to the edge. A large portion of graph remains visible allowing users to see the finer details of a portion of the graph and yet remain cognisant of the bigger picture. This technique is commonly termed

“focus and context”, whereby the context is provided through visible nodes that are several degrees away from the focus area (Lamping and Rao 1999). The underlying algorithm handles a user’s change of focus by changing the mapping from the hyperbolic plane to the Euclidean plane.

Layout in a hyperbolic space is sensitive to the amount of curvature; an overabundance of space allows the nodes to space out too easily and this often results in undesirable layouts. In addition, some graphs are better suited to non-Euclidean spaces than others. For example, trees work well in non-Euclidean spaces whereas a regular grid works best in Euclidean space.

2.4.3 Hierarchical Approaches

The hierarchical layout, also called the Sugiyama approach, is an algorithm for drawing directed acyclic graphs. Hierarchical algorithms are used when there is a need to represent all the relationships graphically so that the positioning of the nodes is consistent with the transitivity of the relationship. This is commonly described as the edges “flowing” in a uniform direction, whether that is up, down, left or right.

The algorithm is generally implemented in three phases:


1. Layer assignment – each node in a graph is assigned a layer, such that all edges

extend from a lower layer to a higher layer.

2. Crossing minimization – the number of edge crossings is minimised. Edges

that span more than one layer are usually handled by introducing dummy nodes on

the intermediate layers.

3. Node placement – the coordinates of the nodes are determined. Nodes on

the same layer get the same y-coordinate and the x-coordinate is assigned

from the permutation calculated in the second phase.
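The first phase, together with the dummy nodes used in the second, can be sketched in Python as follows. The sketch uses longest-path layering, which is one simple way of assigning layers and is not necessarily the method used in the original Sugiyama framework. The function names and the tuple used to label dummy nodes are assumptions made for this example.

```python
def assign_layers(nodes, edges):
    """Longest-path layering of a directed acyclic graph.

    Sources are placed on layer 0 and every other node one layer above its
    highest predecessor, so all edges run from a lower to a higher layer.
    """
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)
    layer = {}

    def visit(v):
        if v not in layer:
            layer[v] = 1 + max((visit(u) for u in preds[v]), default=-1)
        return layer[v]

    for v in nodes:
        visit(v)
    return layer


def add_dummy_nodes(layer, edges):
    """Split each edge spanning several layers into single-layer segments."""
    layer = dict(layer)
    segments = []
    for u, v in edges:
        prev = u
        for level in range(layer[u] + 1, layer[v]):
            dummy = ("dummy", u, v, level)     # hypothetical dummy-node label
            layer[dummy] = level
            segments.append((prev, dummy))
            prev = dummy
        segments.append((prev, v))
    return layer, segments
```

After this step every segment connects adjacent layers, so crossing minimisation only has to consider the ordering of nodes on neighbouring layers.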

Figure 10 - An example of the Sugiyama layout

Figure 10 demonstrates the result of a simple network layout with the Sugiyama algorithm. The Sugiyama method is guided by a number of aesthetically desirable properties which according to some studies make graphs more readable. The steps of the algorithm address each of the following aesthetic criteria:

• edges point in a uniform direction,

• short edges are more readable,

• uniformly distributed nodes,

• minimise edge crossings, and

• edges are straight.

Hierarchical layouts are best suited to directed acyclic graphs that are inherently hierarchical. If the network does not form a directed acyclic graph, a pre-processing step is required in the Sugiyama algorithm. The pre-processing reverses the direction of some edges to ensure that the input to the layer assignment is an acyclic digraph.

Once the nodes have been assigned to layers, the original direction of the reversed edges can be restored. Figure 11 demonstrates how graph (a) would be rendered with the Sugiyama algorithm.
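One simple way of choosing which edges to reverse is to reverse every back edge found by a depth-first search. The sketch below illustrates that heuristic. It is not necessarily the pre-processing used by any particular Sugiyama implementation, the function name is an assumption, and self-loops are assumed to be absent.

```python
def reverse_back_edges(nodes, edges):
    """Make a digraph acyclic by reversing the back edges of a DFS.

    Returns the acyclic edge list and the set of original edges that were
    reversed, so their direction can be restored after layer assignment.
    """
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {}                 # absent = unvisited, 1 = on DFS stack, 2 = done
    reversed_edges = set()

    def dfs(u):
        state[u] = 1
        for v in succ[u]:
            if state.get(v) == 1:          # v is an ancestor: edge closes a cycle
                reversed_edges.add((u, v))
            elif v not in state:
                dfs(v)
        state[u] = 2

    for v in nodes:
        if v not in state:
            dfs(v)
    acyclic = [(v, u) if (u, v) in reversed_edges else (u, v) for u, v in edges]
    return acyclic, reversed_edges
```

For the cycle a→b→c→a, a search started at a reverses only the edge c→a, giving the acyclic edges a→b, b→c and a→c. The result depends on the order in which nodes are visited.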

Figure 11 – (a) A digraph and (b) the same graph drawn with the Sugiyama algorithm (adapted from Healy and Nikolov 2013)

It is also possible to remove edges instead of reversing them and introduce them back after the layer assignment. However, if this approach is employed the layer assignment is performed with a sub-graph and it may result in an undesirable layout. The Sugiyama method is a very popular way of rendering directed graphs and many modifications and enhancements have been proposed in the literature. These extensions include applying interactive user constraints to hierarchical drawings which allow the users to interact with a drawing and introduce additional constraints on the positions of some nodes (Nascimento 2001; Nascimento and Eades 2002).

The Sugiyama method does have some limitations when applied to a cyclic graph. By forcing nodes to a particular level, one node may be incorrectly perceived to be inferior to another (Healy and Nikolov 2013). For example, in Figure 12 (b) the Sugiyama layout suggests that node ‘d’ is inferior to node ‘a’.

Figure 12 – Two alternative drawings of the same network. (a) force directed (b) Sugiyama (Healy and Nikolov 2013)

Alternatives to the Sugiyama layout based on a genetic algorithm have been proposed, with the chromosome containing values which represent the x and y coordinates. In this case the fitness measure is provided by the number of edge crossings (Utech, Branke et al. 1998). Another alternative, using an energy-based model and solving a one-dimensional optimisation problem for each of the axes, was proposed by Carmel, Harel et al. (2004). This algorithm does not restrict nodes to lie on a horizontal level and can be applied without change to both cyclic and acyclic digraphs with either directed or undirected edges.

One particular class of hierarchical layout approach is that of the tree layout. Trees can be represented in the familiar “normal” way, displayed radially, or drawn using other spatial projections. There are numerous techniques for displaying trees but they can be divided into three groups: space filling, node-link based, and hybrid (von Landesberger, Kuijper et al. 2011). A survey of implicit hierarchical visualisation lists forty separate tree visualisation approaches but leaves many more unexplored (Schulz, Hadlak et al. 2011). There have been several studies comparing various approaches to tree visualisation (Stasko 2000; Barlow and Neville 2001; van Ham and van Wijk 2002; Kobsa 2004; Archambault, Munzner et al. 2007). However, the results vary so significantly that it is impossible to draw general conclusions.

2.4.4 Orthogonal Graph Drawing

The major aesthetic feature that is addressed by orthogonal graph drawing is improving the angular resolution of edges. That is, the smaller the angle between edges the more difficult it is to discriminate between them.

Orthogonal graph drawing generally uses the ‘topology-shape-metrics’ approach which originated from Tamassia’s (1987) seminal paper. The original algorithms were constrained to graphs that had a maximum degree of four. However, newer algorithms introduced approaches such as the proportional-growth model to overcome this limitation (Biedl, Madden et al. 1997).

The name topology-shape-metrics approach was first introduced in 1999 (Di Battista, Eades et al. 1999). The approach produces drawings that are orthogonal and was originally used principally for printed circuit board and silicon chip layout design.

Applying the approach to graph drawing is motivated by the premise that one of the most important aesthetic criteria is the number of edge crossings. In addition, orthogonal drawings typically aim to minimise the number of edge bends.

The topology-shape-metrics approach comprises three phases:

1. Planarization – constructs the topology which is described by a planar

embedding. Non-planar graphs use dummy nodes to represent edge

crossings. This stage is computationally intensive and it is difficult to perform

efficiently with large graphs.

2. Orthogonalization – the number and orientation of 90° bends is determined.

3. Compaction – dummy nodes are removed and coordinates for nodes and

bends are determined, usually minimising the sum of the lengths of all edges.

Modifications to the topology-shape-metrics have been proposed to allow for a limited set of additional constraints to be included in the layout algorithm.

The orthogonal drawing approach results in no overlapping nodes and produces compact orthogonal drawings. In terms of user interpretability it is suited to medium to sparse graphs. When the approach is applied to large graphs, the drawings become too cluttered to make confident interpretations.

2.4.5 Matrix Representations

An adjacency matrix contains one row and one column for each node in a network. Typically, each cell contains a Boolean value indicating whether an edge exists between the two nodes. In a weighted adjacency matrix the values represent some measure of the strength of the relations.

Studies have shown that matrix representations perform well for most tasks in large dense networks, except for the task of path following (Ghoniem, Fekete et al. 2005). Matrix representations allow for the easy determination of the degree of a node by counting the filled cells within the column or row. Patterns of relationships can be identified with an appropriate ordering. However, the visibility of patterns is very sensitive to the order of the rows and columns. In addition, particular patterns may only be visible with a particular combination of row and column ordering. The literature on matrix ordering comes from various domains such as biology, statistics or graph theory. However, the ordering methods can be generally classified as (Fekete 2009):

• interactive,

• statistical,

• dimensional reduction,

• heuristic,

• graph linearization, and

• block modelling.

The advantages of matrix representations are that there are no overlapping nodes or edge crossings. In addition, manipulation is less computationally expensive enabling interactive approaches.
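The matrix representation, the degree count and the effect of reordering can be illustrated with a short Python sketch. The function names and the four-node example are assumptions made for illustration, and the network is treated as undirected.

```python
def adjacency_matrix(nodes, edges):
    """Boolean (0/1) adjacency matrix of an undirected network."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1     # undirected: mirror the entry
    return matrix


def degree(matrix, i):
    """Degree of node i: the number of filled cells in its row."""
    return sum(matrix[i])


def reorder(matrix, order):
    """Apply the same permutation to the rows and the columns."""
    return [[matrix[r][c] for c in order] for r in order]


# Hypothetical network: a, c and d are mutually connected, b is isolated.
nodes = ["a", "b", "c", "d"]
matrix = adjacency_matrix(nodes, [("a", "c"), ("a", "d"), ("c", "d")])
grouped = reorder(matrix, [0, 2, 3, 1])    # order a, c, d, b
```

In the original order the three connected nodes are interrupted by the empty row and column of b. After reordering, their filled cells form a single block in the top-left corner, which illustrates why the visibility of patterns depends on the ordering.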

Tasks which require the tracing of paths are more difficult with adjacency matrices than with other forms of layout such as node-link representations. Solutions to the path following problem have been proposed which include attaching arcs to the outside of the matrix, and interactive methods that draw paths on the top of the matrix (Henry and Fekete 2007; Shen and Ma 2007).


2.4.6 Arc Diagrams

Arc diagrams consist of the nodes of a network positioned in a straight line in the Euclidean plane and the edges between them drawn as semi-circular arcs. Wattenberg (2002) is generally credited with the concept but there are many others that have employed similar layouts (Saaty 1964; Nicholson 1968). The arcs can be drawn on one or both sides of the line of nodes. Using a single side provides space for labelling and other information. Sorting the nodes in a different order may make important patterns apparent.

The algorithm implementing the drawing ensures that each arc covers the same angle. Therefore an arc between two points will extend outward by a distance proportional to the distance between nodes. It is common for the algorithm to attempt to reduce the length of the arcs to assist in making the topology of the network easier to understand.
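The geometry described above can be sketched in Python. Each edge becomes a semicircle whose centre is midway between its two nodes and whose radius, and therefore height, is half the distance between them. The function names, the evenly spaced node positions and the summed span used as a measure of arc length are assumptions made for this example.

```python
def arc_diagram(order, edges, spacing=1.0):
    """Place nodes on a line and describe each edge as a semicircular arc."""
    x = {v: i * spacing for i, v in enumerate(order)}
    arcs = []
    for a, b in edges:
        left, right = sorted((x[a], x[b]))
        arcs.append({"edge": (a, b),
                     "centre": (left + right) / 2,    # x position of the centre
                     "radius": (right - left) / 2})   # also the arc's height
    return x, arcs


def total_arc_span(order, edges):
    """Sum of the position gaps bridged by the arcs for a given ordering."""
    index = {v: i for i, v in enumerate(order)}
    return sum(abs(index[a] - index[b]) for a, b in edges)
```

With the edges a-d and b-c, the ordering a, b, c, d has a total span of 4, whereas the ordering a, d, b, c has a total span of 2, so the second ordering produces shorter arcs.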

Arc diagrams do not convey the structure of the network as well as other layouts. However, with an appropriate ordering of nodes it is possible to identify cliques and bridges (Heer, Bostock et al. 2010). The problem of placing nodes in serial order such that they reveal underlying clusters is formally called seriation. There are many algorithms to compute the ordering of nodes; for a historical overview see (Liiv 2010).

2.4.7 Radial Layout


Radial Layouts treat the graph as a tree rooted at a focus node. Generally a breadth-first traversal of the graph starting from the focus node determines parent-child relationships. In these layouts the attention is on a particular node and how the structure relates to it. Nodes are arranged on concentric rings around the focus node.

A node lies on the ring corresponding to its shortest network distance from the focus.

The angular position of a node on its ring is generally determined by the sector allocated to it. Usually this sector is proportional to the angular width of the parent node sub-tree.
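A basic radial layout can be sketched as follows. The sketch assumes that each child receives a share of its parent’s sector proportional to the number of leaves in its sub-tree, which is one common choice among several. The function name and the ring_gap parameter are assumptions made for this example, and nodes that cannot be reached from the focus are omitted.

```python
import math
from collections import deque


def radial_layout(adj, focus, ring_gap=1.0):
    """Place nodes on concentric rings around focus; returns (pos, depth)."""
    depth = {focus: 0}
    children = {focus: []}
    queue = deque([focus])
    while queue:                        # breadth-first traversal from the focus
        v = queue.popleft()
        for w in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1     # ring index = shortest distance
                children[v].append(w)
                children[w] = []
                queue.append(w)

    leaves = {}

    def count_leaves(v):
        leaves[v] = sum(count_leaves(c) for c in children[v]) or 1
        return leaves[v]

    count_leaves(focus)
    pos = {focus: (0.0, 0.0)}

    def place(v, start, end):
        angle = start
        for c in children[v]:
            # The child's sector is its leaf share of the parent's sector.
            width = (end - start) * leaves[c] / leaves[v]
            mid = angle + width / 2
            radius = depth[c] * ring_gap
            pos[c] = (radius * math.cos(mid), radius * math.sin(mid))
            place(c, angle, angle + width)
            angle += width

    place(focus, 0.0, 2 * math.pi)
    return pos, depth
```

Changing the focus node only requires calling the function again with a different focus argument; every node is then re-assigned to the ring matching its new distance.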

A number of authors have proposed extensions to the basic radial layout for various purposes, including dynamic visualisation or for very large graphs (Wills 1997;

Yee, Fisher et al. 2001). Basic radial layout algorithms and other extensions that strive to create more visually balanced layouts often mean that information on the structure of the graph is lost. Some approaches attempt to find a balance between the inherent structure and an ordered balanced depiction (Brandes and Pich 2010).

Radial layouts tend to suit tree structures because there is somewhat of a match between the increasing circumference of each circle and the spread of the nodes of a tree. Nevertheless, where the tree has a large number of leaf nodes they can be tightly packed on the outer circles. A special case of the radial layout is where all nodes are positioned on the same circle (see section 2.4.8).

2.4.8 Circular Layout


Circular layouts are one of the oldest graph layout algorithms. They position nodes on the circumference of a circle with the edges going between nodes.

Many of the improvements in circular layout algorithms focus on reducing the number of crossings, edge length reduction and node ordering. Some algorithms draw the edges as curves rather than straight lines to increase the readability of the visualisation and include a barycentre heuristic. Other approaches draw links on the outside of the circle or ‘bundle’ edges for the same reasons.
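The effect of node ordering on crossings can be illustrated with a short sketch. When edges are drawn as straight chords, two edges cross exactly when their endpoints alternate around the circle, so crossings can be counted from the ordering alone. The function names are assumptions made for this example, and the sketch does not implement any of the specific improvements mentioned above.

```python
import math
from itertools import combinations


def circular_layout(order, radius=1.0):
    """Place nodes at equal angular steps on a circle, in the given order."""
    n = len(order)
    return {v: (radius * math.cos(2 * math.pi * i / n),
                radius * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(order)}


def count_crossings(order, edges):
    """Count pairs of straight-line edges that cross inside the circle."""
    index = {v: i for i, v in enumerate(order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue                     # edges sharing a node cannot cross
        lo, hi = sorted((index[a], index[b]))
        c_between = lo < index[c] < hi
        d_between = lo < index[d] < hi
        crossings += c_between != d_between   # endpoints alternate => crossing
    return crossings
```

For the edges a-c and b-d, the ordering a, b, c, d produces one crossing, while the ordering a, c, b, d produces none.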

An advantage of circular layouts is that nodes cannot be occluded by other nodes and, because no three nodes on a circle can be collinear, the issue of two edges obscuring each other is also avoided. In addition, if circular graphs have nodes placed at equal distances from each other and equal distances from the centre, no node is perceived as being more important because of its central position (Gansner and Koren 2007).

The disadvantage of these layouts is they can be very dense and following paths can be difficult. In addition, it can be difficult to visualise the network topology.

2.4.9 Algorithm Summary

Table 2 lists the basic advantages and disadvantages of several fundamental algorithms. This list is by no means definitive as for every basic algorithm there are numerous proposed additions, modifications and improvements. The basic layout algorithms discussed in this section are only concerned with internal properties of the graph. To introduce semantic or contextual information into the layout of the graph, the basic algorithms need to be modified.

Table 2 – Strengths and Weaknesses of Fundamental Algorithmic Approaches

| Algorithmic Approach to Visualisation | Strengths | Weaknesses |
|---|---|---|
| Basic force-directed + genetic or others | Intuitive and easy to understand, simple to implement. Extensions and heuristic improvements easily added. Smooth transitions to preserve user’s mental map. Can be extended to 3D. Path following is better than other approaches. | Does not differentiate communities well. Naïve implementations have a relatively long running time O(n^3). Finding local minima can produce low-quality drawings. Do not provide a deterministic model. |
| Non-Euclidean | Large graphs can be represented. Nodes on structures such as trees can be placed with equal space between them. | Layout is sensitive to the amount of curvature. Some graphs are more suited to Euclidean space. Similar to traditional force-directed algorithms, does not scale well to large graphs. Visualisation requires user interaction. |
| Orthogonal | Maximises the angular resolution of edges. | The planarization step is often very difficult to perform efficiently with large graphs. Large graphs become cluttered. |
| Hierarchical | Useful where analysis requires visualisation of hierarchies. | Not useful when the data is not inherently hierarchical and not directed. Can lead to incorrectly interpreting a node to be inferior to another. |
| Matrix | Does not suffer from edge and node occlusion. Scales better than node-link diagrams. Efficient area usage. Layout not computationally expensive. | Ordering of rows and columns affects interpretation. Difficult to perform edge following tasks. Patterns hard to discern and can occur for different orderings. |
| Radial | User can shift focus without the need for panning. Suited to tree structures. | Can result in very tightly packed outer layers. |
| Circular | Nodes that occlude other nodes are avoided and collinear edges are not possible. No node is perceived as being important because of a central location. Compact drawing. Particularly useful for star or ring networks. | The links can be very dense and can obscure information and hinder tasks such as path following. |
| Arc diagrams | With correct ordering can reveal clusters and bridges. | Node ordering may affect the detection of patterns. Does not convey the structure of the graph well. Not useful for large graphs. |

It is also important to note that the algorithms discussed in this section are often assessed in relation to the aesthetics that underlie their design. Whilst this is important, “…algorithms that are designed for abstract graph structures, with no consideration for their ultimate use, will not produce useful visualisations of semantic information” (Purchase, Carrington et al. 2002).


2.5 Enhancing Social Network Visualisation

When developing SNA visualisations some analysts attempt to adhere to Shneiderman’s (1996) mantra of “overview first, zoom and filter, then details on demand” and some explicitly choose to violate it. Often those that do not follow Shneiderman’s guiding principles are not concerned with supporting the needs of the analyst, but are simply supplementing general interfaces or pursuing an artistic endeavour (Schindling 2007). The term “casual information visualisation” has been coined to describe this category of visualisation tools (Pousman, Stasko et al. 2007). A common approach in assisting the user in developing a deeper understanding of their data is to focus on what they already know and allow the exploration from this reference point. This strategy is typified by the phrase, “start with what you know and grow”.

A number of visualisation tools have been developed for analysts and for end-users, while others are simply curiosities or artistic endeavours. An example of the latter is Schindling’s visualisations of movie dialogues (Schindling 2007). Her tool, Cinematic Particles, produces drawings that are based on the frequency of spoken words and the letters these words contain (Figure 13). Although this tool produces intriguing pictures, it is an artistic endeavour and the results cannot be utilised by analysts. For analysts this is often the problem with visualisations; they can be visually interesting, but often they do not provide any quantitative benefit to analysis.


Figure 13 - Cinematic Particles⁶

Those attempting to enhance social network visualisations can be generally categorised as those that are endeavouring to improve the visualisation for a particular group or domain, and those that are trying to improve the generic capabilities offered by visualisation tools such that they can be applied to any domain. The motivation for improvements can be further categorised as those supporting analysis or those wishing to improve the user experience.

An example of a tool designed for a specific narrow domain is introduced by Le et al. (Le, Dang et al. 2008). The purpose of WikiNetViz is to visualise and analyse “dispute-induced social networks”, specifically the networks that evolve through the disputed editing of Wikipedia⁷. Another example of those servicing a narrow domain utilising a node-link approach is Munzner (Munzner 2000). Munzner is firstly concerned with overcoming problems of displaying large data sets with a node-link representation and secondly, with laying out node-link representations to communicate high-level domain-specific semantics. The latter was achieved by catering for the concerns of an extremely small user base.

⁶ Source - http://www.evsc.net/projects/cinematic-particles

⁷ Wikipedia is a free online open content encyclopaedia project.

An example of those targeting the user experience is presented by Heer and Boyd (2005). Their tool ‘Vizster’ helps those using social networking services such as ‘Friendster’ visualise their friendship networks. Although it has the capability to automatically define friendship groups (clustering) and display them in an understandable way, the tool does not include any basic traditional analysis metrics in its functionality. These types of tools are not designed with analysts in mind; rather, they focus on enabling the user to casually explore their social network.

2.5.1 Node-link Aesthetics

Efforts to improve visualisations often centre on improving the highly utilised node-link (graph) representation. Those using the node-link diagram as a visual representation of networks typically target improving layout aesthetics. Numerous aesthetic criteria have been proposed as a measure of how good a graph visualisation is. Some of the proposed criteria are based on research in human perception; however, many aesthetic metrics have not been empirically tested. A good description of graph visualisation aesthetics is given in (Bennett, Ryall et al. 2007). A summary of the most cited graph assessment metrics is provided below:

• Node Metrics

o Distribute nodes evenly, keep nodes separated from edges, maximise orthogonality, minimise node overlap, maximise clustering of similar nodes.

• Edge Metrics

o Minimise edge crossings, uniform edge lengths, minimise edge lengths, minimise edge bends.

• Overall Layout

o Maximise consistent flow direction, maintain aspect ratio, minimise total drawing area, maximise symmetries of the graph, maximise path continuity.

Producing a visualisation that satisfies all these criteria is impossible as the optimisation of one criterion may negatively impact on other criteria. Maximising symmetries, for example, may negatively impact on the number of edge crossings (Henry 2008).
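Two of the edge metrics listed above can be measured directly from a layout. The following is a minimal sketch only, assuming straight-line edges and a layout supplied as node coordinates; the function names and the two sample layouts are illustrative and are not taken from the cited works.

```python
from itertools import combinations
from math import hypot
from statistics import pstdev


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0 or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def count_edge_crossings(pos, edges):
    """Count pairs of straight-line edges that properly cross.

    Edges sharing an endpoint are skipped and collinear overlaps are
    ignored (a simplification made for this sketch).
    """
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # adjacent edges meet at a node, not at a crossing
        pa, pb, pc, pd = pos[a], pos[b], pos[c], pos[d]
        if (_orient(pa, pb, pc) * _orient(pa, pb, pd) < 0
                and _orient(pc, pd, pa) * _orient(pc, pd, pb) < 0):
            crossings += 1
    return crossings


def edge_length_spread(pos, edges):
    """Population standard deviation of edge lengths (0 means uniform)."""
    lengths = [hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
               for u, v in edges]
    return pstdev(lengths)


# The same 4-cycle drawn as a square and as a "bow-tie".
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
bowtie = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}

print(count_edge_crossings(square, edges), edge_length_spread(square, edges))
print(count_edge_crossings(bowtie, edges), edge_length_spread(bowtie, edges))
```

The square layout scores better on both metrics than the bow-tie layout of the same graph, which illustrates how a single graph can be ranked differently depending on where its nodes are placed.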

Despite the considerable effort that has gone into devising algorithms to optimise the visualisation properties described above, very little work has gone into empirical validations of these aesthetic principles (Lee 2006; Bennett, Ryall et al. 2007). Moreover, many of the aesthetics seem only applicable to graphs with a small number of nodes. “In general, it makes no sense to test a graph of several hundred nodes for planarity or to try to minimise edge crossings” because at this scale many of the aesthetic criteria break down (Herman, Melancon et al. 2000).

Although much of the work on graph aesthetics appears to be based on intuition, a limited number of studies have attempted to evaluate and validate proposed criteria through experiments (Purchase, Carrington et al. 2000; Purchase, Carrington et al. 2002; Ware, Purchase et al. 2002). Studies on the effect these criteria have on the interpretation of social networks have been limited to date (Blythe, McGrath et al. 1996; McGrath, Blythe et al. 1997).

2.5.2 Alternatives to Node-link diagrams

A common criticism of node-link diagrams is that, as the number of nodes increases, they quickly become cluttered and conceal nodes and relationships, making it difficult to discern any information, even when the graph contains as few as 20 nodes (Ghoniem, Fekete et al. 2005). In addition, the ability to deal with dynamic networks has been questioned, with some claiming that the current approaches do not reveal network temporal dynamics (Appan, Sundaram et al. 2006). Some authors go further and state that the work in the area of dynamic network visualisation is “often as much art as science” (Bender-deMoll and McFarland 2006).

In an attempt to overcome the shortcomings of node-link diagrams, some research has focused on providing alternatives to node-link diagrams. In general these alternatives consist of variations on the node-link diagram or matrix representations, or an amalgamation of the two in an attempt to leverage the advantages of both (Figure 14). These advantages are broadly described in a number of studies which found that matrix-based representations were better suited for large and dense graphs and node-link diagrams were more suitable for small and sparse graphs (Ghoniem, Fekete et al. 2005; Keller, Eckert et al. 2006).


Figure 14 - Henry & Fekete’s MatLink – Showing a comparative view of the network. Source: (Fekete 2009)

It is quite surprising that node-link representations are used as extensively as they are, given that these studies noted that the matrix approach performed better in almost all low level tasks, with the exception of path finding, for graphs with more than 20 vertices. Social Network Analysis often calls for the user to follow paths, and this activity is difficult to accomplish using a matrix representation. However, matrix representations allow users to see nodes and their connections clearly.

A tool which combines the advantages of both matrix representations and node-link representations has been proposed (Henry and Fekete 2007). This tool essentially attempts to increase the ability to perform path related tasks using matrix representations. The matrix representation is supplemented with links overlaid on its borders and interactive drawing of additional links and highlighting of cells included in a path (Figure 15). A hybrid approach was also developed (NodeTrix) which provided a global structure of the network that enabled portions of the network to be shown as adjacency matrices to better support the analysis of communities (Henry, Fekete et al. 2007).

Figure 15 - Path following on MatLink. Source: (Henry 2008)

2.5.3 Analytical User-Centred Visualisation

Some researchers advocate a much tighter integration of user-centred analytical visualisation methods (Aigner, Miksch et al. 2008; Perer and Shneiderman 2008). The graphic shown in Figure 16 is a depiction of the growing maturity towards combining visualisations, analysis and the user’s needs.

Considering the user’s analytical interests helps to achieve a more targeted visual representation. Some automated approaches have been proposed to assist the user by highlighting relevant data and de-emphasising data that is not of interest (Aigner, Miksch et al. 2008). A key component of bringing the needs of the user, the analysis requirements and visualisations together is the ability to interact extensively with the visual interface. The field of visual analytics emphasises the use of interactive visual interfaces which provide an enhanced ability for interaction (Bartoletti, Billinghurst et al. 2005). Modern interactive interfaces have challenged traditional approaches, which tried to maximise what is drawn on the static screen. New approaches now provide designers with opportunities to improve visualisations by paying more attention to supporting specific user tasks with interactive controls (Shneiderman and Aris 2006).

Figure 16 - Visualisation Maturity. Source: (Aigner, Miksch et al. 2008)

In order to analyse networks, an analysis tool should provide the ability to filter the data and manipulate the view (Pohl, Reitz et al. 2008). In terms of manipulation, Zoom and Pan are indispensable, particularly when large graph structures are being explored. Zooming generally consists of two forms: Geometric zooming and Semantic zooming (Herman, Melancon et al. 2000). Geometric zooming is simply rendering an enlarged version of a portion of what is currently displayed. Semantic zooming involves a change in the information that is being displayed as the user focuses on a particular area of the graph.
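The distinction between the two forms of zooming can be illustrated with a small sketch. It is not drawn from any of the tools discussed here; the function names, the zoom thresholds (2 and 4) and the node attributes `role` and `degree` are all assumptions made for illustration.

```python
def geometric_zoom(pos, focus, factor):
    """Scale every coordinate about the focus point.

    The content is unchanged; only its size on screen differs.
    """
    fx, fy = focus
    return {n: (fx + factor * (x - fx), fy + factor * (y - fy))
            for n, (x, y) in pos.items()}


def semantic_zoom(nodes, factor):
    """Choose what is displayed for each node according to the zoom level.

    The thresholds and attribute names are hypothetical.
    """
    labels = {}
    for name, attrs in nodes.items():
        if factor < 2:
            labels[name] = ""        # overview: unlabelled marks only
        elif factor < 4:
            labels[name] = name      # mid level: show the node name
        else:                        # close up: show attribute detail
            labels[name] = f"{name} ({attrs['role']}, degree {attrs['degree']})"
    return labels


nodes = {"ann": {"role": "analyst", "degree": 3}}
print(geometric_zoom({"ann": (2, 3)}, focus=(1, 1), factor=2))
print(semantic_zoom(nodes, factor=5))
```

In the geometric case the same marks are redrawn larger, whereas in the semantic case the information attached to each mark changes as the user moves closer.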

A number of frameworks and taxonomies of information visualisation interaction techniques currently exist. However, they generally concentrate on low-level interaction techniques rather than a user’s intent while interacting with a system. As a result, Yi, Kang et al. (2007) conducted an extensive review of information visualisation systems and their interactive capabilities and proposed seven general categories of widely used interaction techniques. They consist of:

1. Select: mark something as interesting

2. Explore: show me something else (the most common exploring technique is panning)

3. Reconfigure: show me a different arrangement

4. Encode: show me a different representation (Generally colour)

5. Abstract/Elaborate: show me more or less detail

6. Filter: show me something conditionally

7. Connect: show me related items

The scope of this thesis did not lend itself to exploring all these modes of interactivity. The visualisation approaches presented in this thesis largely utilise three of Yi’s interaction techniques: Explore, Encode and Filter. The most common technique to enable exploring in the Yi survey was Panning. Panning is utilised by the TIPAD application discussed in this thesis to provide the ability to scroll to areas of interest. The TIMER tool supports a special mode of panning where three panes can be panned individually whilst maintaining connections across the panes. In addition, all tools and approaches discussed in this thesis apply some form of encoding to the data. In some cases this encoding enables pre-attentive cognition but also serves to assist the user’s understanding of relationships. The proposed TIPAD and TIMER prototypes employ very simple forms of filtering to demonstrate how the application of the Parallel Arc Diagram and the TIMER encoding techniques can be used for analysis, even with a limited range of filtering options.

2.6 High-level Approaches to Social Network Visualisation

The sub-sections below describe approaches to visualising communities and dynamic networks and visualisation approaches that reveal the influence of content and context within networks.

2.6.1 Visualising Clusters (Communities)

A technique that can be very valuable in the visualisation of attributes is that of clustering. A cluster is also known as a ‘community’ or a ‘module’, with the term ‘community’ commonly used in social network analysis. Graph clustering is typically accomplished by finding similarities between data points according to characteristics inherent in the data itself. Unfortunately, no single definition of a cluster in graphs is universally accepted. However, it is generally based on classifying nodes according to a distance measure or vertex similarities (Schaeffer 2007). The primary difference between graph clustering and general data clustering is that data clustering is typically based on attribute similarity, such as the Euclidean distance between two attribute vectors (Zhou, Cheng et al. 2009).


There are two separate areas of concern when visualising clusters: (1) the detection of the clusters in the data, and (2) the laying out of the clusters in a visualisation. This thesis focuses on visualisation encoding techniques used to display the data. However, when discussing the visualisation of clusters it is often difficult to disentangle the algorithms for detection and those used for visual representation. For example, some algorithms position nodes or centroids on a two dimensional plane with the visualisation component simply rendering objects at the calculated coordinates.

The goal of clustering is to group similar data items together, and it has been widely studied and applied in numerous domains. An overview of commonly used clustering algorithms can be found in (Fung 2001). Clustering is often performed on large graphs as a mechanism to aggregate nodes and increase the intelligibility for the user (Huang and Eades 1998). However, clustering algorithms typically have at least a few parameters, and determining appropriate values may not be trivial. The resulting clustering may be heavily dependent on the parameter choice. For example, determining the required number of clusters to create is often very difficult and often requires an initial visualisation to help estimate an appropriate number (Schaeffer 2007).

Visually displaying clusters of aggregated nodes as a single representative super-node solves some issues relating to readability. However, several new problems are introduced that need to be addressed in terms of correctly interpreting the drawings. These primarily consist of:

• How is the intra-cluster connectivity represented so it can be understood?

• How are the connectivity patterns of nodes in a cluster to another cluster understood?

• How are the attributes of nodes understood in relation to the cluster?

• How is membership of multiple clusters represented?

Research on multivariate information visualisation has shown that visually displayed clusters can yield a more reliable and effective overall analysis and interpretation of the data, compared with statistical methods which tend to summarise and compress information (Yi, Melton et al. 2005). However, the clustering approach is only possible in domains that have relatively small cardinality (typically 2 ≤ D ≤ 30), where D is the number of attribute domains (Pretorius and Wijk 2008).

Researchers have proposed various approaches to visualise multivariate networks (Yi, Melton et al. 2005; Xu, Cunningham et al. 2007; Pretorius and Wijk 2008; Bezerianos, Chevalier et al. 2010), with some approaches taking advantage of Force-directed layout algorithms. Force-directed layouts have regularly been used to visualise clusters, but they have also been used as the mechanism to visually cluster nodes together. The addition of ‘dummy’ attractors into the network as a means to accomplish clustering is described in (Tamassia 1998; Brockenauer and Cornelson 2001) and has been utilised by a number of authors (Huang and Eades 1998; Frishman and Tal 2004; Zhou, Cheng et al. 2009).


Modifications and additions to the basic force-directed algorithm are often employed to generate the layout of clustered graphs. A “LinLog” layout is one of those variations which computes a layout and partitions the graph into clusters while it is being drawn (Noack 2003). Noack uses a model to obtain a layout whereby the distance between two clusters is inversely proportional to their coupling. Coupling of clusters is a measure of their connectivity and hence gives the user a sense of the degree of inter-cluster connections.

Figure 17 – A Pseudo-random “satellite” graph rendered with (a) the LinLog model and (b) the Fruchterman-Reingold model (Noack 2003)

To draw useful interpretations from the LinLog generated drawing one must understand what properties of the drawing correspond to which properties of the graph. The LinLog algorithm ensures that the Euclidean distance of nodes corresponds to the adjacency of nodes and that the Euclidean distance of sets of nodes corresponds to their coupling (Figure 17). These algorithms have been shown to provide minimum energy drawings from which valid inferences can be drawn. However, interpreting drawings with high energy can give invalid results. Later this algorithm was modified so that the repulsion between nodes was replaced with repulsion between edges (Noack 2006).
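To make the notion of a drawing's "energy" concrete, the sketch below evaluates the node-repulsion LinLog energy in the form in which it is commonly stated: the sum of Euclidean distances over all edges minus the sum of the natural logarithms of the distances over all node pairs. The formula is given here from the general literature on the model rather than from this thesis, the function name is illustrative, and no minimisation procedure is shown.

```python
from itertools import combinations
from math import dist, log


def linlog_energy(pos, edges):
    """Node-repulsion LinLog energy of a layout (lower is better).

    pos maps each node to an (x, y) coordinate; edges is a list of node
    pairs. Coincident nodes are not handled, since log(0) is undefined.
    """
    attraction = sum(dist(pos[u], pos[v]) for u, v in edges)
    repulsion = sum(log(dist(pos[u], pos[v]))
                    for u, v in combinations(pos, 2))
    return attraction - repulsion


# Two connected nodes: the energy d - ln(d) is smallest at distance d = 1.
edges = [("a", "b")]
for d in (0.5, 1.0, 2.0):
    layout = {"a": (0.0, 0.0), "b": (d, 0.0)}
    print(d, linlog_energy(layout, edges))
```

For a single edge the energy reduces to d - ln(d), which is minimised at d = 1; drawings further from such a minimum have higher energy and, as noted above, are less safe to interpret.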

Providing both detailed information and a global context in one image is one of the fundamental problems in Information Visualisation. Graph drawings provide an understandable representation with (at best) a few hundred nodes. To view larger graphs one of two strategies is typically employed. The first is clustering nodes into super-nodes which provide a “summary” of the graph. The second strategy is to limit navigation to a small subset of the nodes and edges at any one time. A number of approaches employ one of these strategies, and some combine both.

Hierarchical clustering using an abridgement approach is one that combines both strategies using a force directed algorithm (Eades and Huang 2000). The abridgement approach enables the user to select a small part of the graph that is of interest and view the whole graph a piece at a time. Hierarchical clustering is performed by successively applying the same clustering process to groups discovered by previous clustering operations. In this way the clustering induces a hierarchical tree structure from a graph that is not necessarily hierarchical. The visualisation of the resulting tree represents the cluster structure of the graph, but the finer details have been discarded.


One method that provides a representation of clustering and also preserves the details of the links between nodes is the planar straight-line hierarchically clustered approach (Eades, Feng et al. 2006). This lays out all the nodes on a two dimensional plane and uses boxes around sets of them to indicate membership of a cluster. However, this approach is only visually effective when applied to small graphs. A more scalable approach, where only a portion of the graph is shown in detail while the rest is represented as partly overlapping spheres (super-nodes), is presented in (Ham and Wijk 2004). This approach maps relatively well to the small world characteristics of many networks, that is, graphs that exhibit a local cluster structure and have a small diameter compared to the total number of nodes.

A method utilising a force-directed layout algorithm to visualise clusters in three dimensions has been proposed (Eades and Feng 1997). A clustered graph can be considered a particular type of hierarchical graph in which the distance between two nodes from the inclusion tree is equal to one. Eades’s algorithm exploits this structure and utilises a force-directed layout to place nodes from each community into a level. This method extends two dimensional plane drawing algorithms to three dimensional multilevel representations and provides a successively more abstract view in each successive layer. Although this approach allows views at different abstraction levels, it is difficult to understand the position and influence of an individual node within a community at the higher levels (Figure 18).


Figure 18 – A multi-level straight-line convex drawing of a graph (Eades and Feng 1997)

Often the primary goal of the layout algorithm is to show the interaction between communities and thereby discover important nodes. One such algorithm places significant nodes that span communities in the centre. This enables easy detection of these nodes but does not show which community they interact with (Cruz, Bothorel et al. 2014). It is also difficult to determine the degree to which individual nodes interact with other spanners or inner nodes of the community. Figure 19 shows the layout and placement and how the visualisation should be interpreted.


Figure 19 – Layout to detect border nodes (Cruz, Bothorel et al. 2014)

There are various methods of representing nodes which overlap communities. Techniques that use semi-transparent convex hulls for communities naturally achieve this. The transparency level of hulls is often determined by the maximum number of overlapping groups in a determinate set of groups. However, when there is a high degree of cross-community membership it becomes difficult to discern which communities a node belongs to.

Another technique to visualise cross-community membership is to place cross-community nodes on the border of the cluster group and display pie charts at the nodes’ location to show the affiliation percentage to each community (Santamaría and Therón 2008). In this approach transparent hulls wrap the elements of a cluster, and intersecting nodes and edges are not displayed in order to increase the comprehensibility of the visualisation.


Summary of Cluster Visualisation

Table 3 - Strengths and Weaknesses of Cluster Visualisations Approaches

| Approach to Visualisation | Strengths | Weaknesses |
| --- | --- | --- |
| Border & inner node detection + convex hulls (Cruz, Bothorel et al. 2014) | Helps to identify nodes that span communities and those on the periphery. Takes semantic and relationship information into account. | Does not show intra-community edges, making it difficult to understand internal patterns. If a large number of nodes have multiple membership the visualisation may be hard to understand. |
| Hierarchical clustering using an abridgement (Eades, Feng et al. 2006) | Allows a user to focus on a cluster whilst remaining cognisant of the global structure. Suitable for large graphs. | Intra-cluster connections are not visible. |
| Multi-level force-directed (Eades and Feng 1997) | Provides views at different levels of abstraction. | Nodes and edges can be occluded, making it difficult to see the intra and inter cluster relationship. |
| Multi-level + force-directed (convex hull, Voronoi cells, etc.) | Easy identification of cluster group membership in small graphs. | Difficult to understand the position and influence of an individual node. Interpreting drawings with high energy is problematic. |
| LinLog + force-directed (Noack 2003) | Improves over the standard energy model in isolating clusters. Works well with small world networks. Finds partitions while being drawn. | Does not show the intra-cluster relations. Suffers similar issues to force-directed approaches. Greater tendency to get stuck in a local minimum than conventional methods (when started from a random initial configuration). |
| Semantic and Geometrical distortions (Ham and Wijk 2004) | Suitable for small world characteristics of many real world graphs. Provides intra-cluster connections with a global view. Because O(Log(N)) clusters are visible, can be interactive. | Details of the inter-cluster connections are lost. |
| Transparent Hulls and Pie Chart (Santamaría and Therón 2008) | Identifying members from multiple clusters is easier than traditional force-directed approaches. | Does not show intra-community edges, making it difficult to understand internal patterns. If a large number of nodes have multiple membership, the visualisation may be hard to understand. |
| Planar straight-line hierarchically clustered | Preserves the intra-cluster links whilst representing the clusters. | Does not scale well. Because nodes are constrained to their cluster they may appear further away than they actually are. |


2.6.2 Dynamic Network Visualisation

Social networks are not static; the relationships that “connect” the network evolve and change over time as do the members of the network. The goal of dynamic network visualisation is to assist the user in gaining an understanding of how a network develops and changes, and includes comprehending the mechanisms of formation and disintegration of communities. Scientists use this understanding to build models of social processes that result in the observed structures.

Static graph-based representations are suited to investigating structural properties of networks at a single point in time. However, to analyse structural changes over a period of time various alternative visualisations are being investigated. Most visualisation models aggregate interactions over an entire observation period without regard to the chronological order. However, social scientists are interested in the emergence of communities which are brought about by groups of individuals interacting with each other. These groups tend to form and disband over extended time periods.

Generally, the combined use of timing and relational information is termed ‘Dynamic Network’ visualisation. Unfortunately, the term ‘Dynamic Network’ is overloaded with a number of alternative uses describing various subclasses (Bender-deMoll, Morris et al. 2007):

• Networks in which the edge and node sets remain fixed, but values of the attributes on nodes and edges may vary in time (transmission models).

• Networks in which edges are added or deleted over time (computer networks, friendship relations).

• Networks in which the weights of edges change over time (neural networks, exchange networks).

• Networks in which nodes are added or removed in time (ecological food webs, organizations).

Although in the past the temporal dimension was often overlooked, there is a growing recognition that this aspect is significant. The importance of understanding the evolution of complex networks is now becoming widely recognised. Social scientists and analysts are increasingly focused on the evolution of social networks over time in an attempt to understand how networks develop and change. Consequently, there is a growing amount of research devoted to the visualisation of Dynamic Networks. Despite this, there are relatively few examples in the literature, with very few tools or approaches sufficiently addressing the problem of how to visually analyse the social dynamics of change over time (Chen 2006; Ahn, Taieb-Maimon et al. 2011). Moreover, technologies and methodologies appropriate to the temporal domain have had much less research devoted to them, and as a result the current state of temporal network visualisation is still quite immature (Ahn, Plaisant et al. 2011).

The purpose of Dynamic Network visualisation is to help augment theoretical intuition provided by summary statistics and standard static visualisations (Moody, McFarland et al. 2005). Current visualisation techniques that support analysis over time include the creation of animations, and static visual snapshots over a period of time presented sequentially or placed next to each other on a page. These approaches are often supported with the display of summary statistics.

The literature has conflicting views on the merits of using a static or animated approach to facilitate understanding of the evolution of the network. A recent study compared the efficiency of animated displays versus static displays on macro and micro level analytical tasks (Farrugia and Quigley 2011). The study concluded that static representations are generally more effective, particularly in terms of time performance. Other work has studied the combined effects of layout and motion on viewers’ perceptions of social network graphs, and found that motion had a “positive effect” on viewers’ perceptions of change in status from formal to informal networks (McGrath and Blythe 2004).

Numerous tools focus on vertex and edge level analysis; however, some attempt to address analysis at a subgroup level. An approach to visualisation that concentrates on the evolution of subgroups and visualises the changes as plots of changing statistical measures has been proposed (Falkowski, Bartelheimer et al. 2006). These measures include stability, density and cohesion, Euclidean distance, correlation coefficient and group activity (see Figure 20). However, these techniques are only useful in networks that have relatively stable membership. In networks that consist of highly fluctuating membership, alternative methods of visualising the dynamics may be required.


Figure 20 – Temporal development of subgroups. Source (Falkowski, Bartelheimer et al. 2006)

To visualise changes in subgroup structure with highly fluctuating members, techniques have been proposed that detect “community instances” and assign them to communities based on a similarity measure. That is, communities that have been discovered at different times are judged as being similar if the overlap of members between them exceeds a given threshold. Falkowski, Bartelheimer et al. (2006) define these measures as:


$$\mathrm{overlap}(x, y) = \frac{|x \cap y|}{\min(|x|, |y|)}$$

where $|x|$ is the number of vertices in a community instance or intersection, and $x \in C_{G_i}$ and $y \in C_{G_j}$ describe the two community instances in the corresponding graphs of interactions. Using the above notion of overlap, the similarity function is defined as:

$$\mathrm{sim}(x, y) = \begin{cases} 1 & \text{if } t_j - t_i < \tau_{periods} \wedge \mathrm{overlap}(x, y) \geq \tau_{overlap} \\ 0 & \text{otherwise} \end{cases}$$

The equation introduces an upper boundary $\tau_{periods}$ on the number of periods that may separate two potentially similar community instances. The left side of the visualisation shows all communities detected in a time period. The y-coordinate is determined by the interval in which the community was detected and the x-coordinate by the period it appears in (see Figure 21). The visualisation requires that the user has an understanding of the layout to correctly interpret changes. In addition, significant changes are relatively easy to detect; however, less significant changes are more difficult to detect.
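The overlap and similarity measures defined above can be sketched in a few lines. This is an illustration only: community instances are assumed to be non-empty sets of member identifiers, the period indices are passed in explicitly, and the sample members and threshold values are invented for the example.

```python
def overlap(x, y):
    """|x ∩ y| / min(|x|, |y|) for two non-empty sets of members."""
    return len(x & y) / min(len(x), len(y))


def sim(x, t_i, y, t_j, tau_periods, tau_overlap):
    """Return 1 if the two community instances are judged similar, else 0.

    x was detected in period t_i and y in period t_j. The instances are
    similar when they are fewer than tau_periods apart and their overlap
    reaches tau_overlap.
    """
    close_in_time = (t_j - t_i) < tau_periods
    return int(close_in_time and overlap(x, y) >= tau_overlap)


# Hypothetical community instances from two different periods.
x = {"ann", "bob", "cy", "dee"}
y = {"cy", "dee", "eve"}

print(overlap(x, y))                                      # two shared members out of min(4, 3)
print(sim(x, 1, y, 2, tau_periods=3, tau_overlap=0.5))    # similar
print(sim(x, 1, y, 2, tau_periods=3, tau_overlap=0.8))    # overlap too small
print(sim(x, 1, y, 5, tau_periods=3, tau_overlap=0.5))    # too far apart in time
```

Normalising by the smaller of the two instances means that a small community wholly contained in a larger one still yields an overlap of 1, which suits communities that grow or shrink between periods.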


Figure 21 – Graph Visualization (left) and cut-out view (right). Source (Falkowski, Bartelheimer et al. 2006)

Given the ubiquitous nature of time-oriented data across many application domains, Aigner, Miksch et al. (2008) state, “a wide repertoire of interactive techniques for visualizing data sets with temporal dependencies is available.” Despite this, many current visualisation frameworks merely consider time as a common quantitative parameter and do not treat it as a special dimension. The work of Aigner, Miksch et al. (2008) is based on Frank’s (1998) characterisation of different forms of time. Their work includes a list of the three most important time characteristics from a visualisation point of view. These are:

• Linear time versus cyclic time

• Time points versus time intervals

• Ordered time versus branching time versus time with multiple perspectives.

In this regard they demonstrate the importance of choosing the correct visualisation technique and appropriate parameters on datasets that have time as an attribute. Figure 22 shows the appropriately parameterised “spiral graph” which makes the periodic pattern easy to discern. These techniques are well known and have been explored by others (Carlis and Konstan 1998; Hewagamage, Hirakawa et al. 1999; Weber, Alexa et al. 2001). It is important to note, however, that one may not always be aware that the data encodes a particular characteristic (in this case cyclic time), and even if one is aware, it is often difficult to find suitable parameter settings. Possible approaches to this problem are applying analytical methods to detect patterns or animating the possible parameter settings (Aigner, Miksch et al. 2008).

Figure 22 – Spiral Graph visualisation of cyclic time-oriented data. Source: (Aigner, Miksch et al. 2008)
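The sensitivity of a spiral display to its cycle-length parameter can be illustrated with a small sketch. It assumes a simple Archimedean spiral in which one full turn represents one cycle; the function name, the `ring_gap` parameter and the sample values are illustrative and are not taken from the cited works.

```python
from math import cos, sin, tau


def spiral_polar(t, period, ring_gap=1.0):
    """Map a time value to (angle, radius) on an Archimedean spiral.

    One full turn corresponds to one period, so observations that are a
    whole number of periods apart share the same angle.
    """
    turns = t / period
    angle = tau * (turns % 1.0)   # position within the cycle, in radians
    radius = ring_gap * turns     # each completed cycle moves one ring out
    return angle, radius


def spiral_xy(t, period, ring_gap=1.0):
    """Cartesian coordinates of the same point, ready for plotting."""
    angle, radius = spiral_polar(t, period, ring_gap)
    return radius * cos(angle), radius * sin(angle)


# Days 3 and 10 are one week apart.
for period in (7, 6):
    a3, _ = spiral_polar(3, period)
    a10, _ = spiral_polar(10, period)
    print(period, round(a3, 3), round(a10, 3))
```

With a period of 7 the two observations fall on the same ray of the spiral and a weekly pattern becomes visible; with a period of 6 they drift apart, which is why an ill-chosen parameter can hide a cyclic pattern that is present in the data.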

Although not focusing on network representations, approaches have been proposed to improve the visualisation of time-ordered geospatial data (Eick, Eick et al. 2008). The central features of this approach are linked dual timeline components for multiple timescales and the time wheel component (see Figure 23). This allows for fine and coarse control over the period of interest to the user.

Figure 23 – Geoboost. Source (Eick, Eick et al. 2008)

Time-tunnel is a tool that visualises any number of numerical time series as individual charts in a three dimensional space. The central feature of this tool is its ability to overlay charts and compare them (Akaishi and Okada 2004).

Those attempting to improve the ability of graphs to represent temporal changes frequently concentrate on the dynamics of constructing an animation of a node-link diagram. A common design choice is whether it is better to keep the position of nodes fixed or allow them to move over time. The fixed position is often regarded as the best way for a user to maintain a mental model. However, this approach introduces the additional challenge of establishing the ideal location for each node, because node placement has been shown to dramatically affect the perception of the network (Moody, McFarland et al. 2005).


Various studies have concentrated on providing a visualisation of network metrics such as measures of centrality, or on using centrality measures to derive other visualisations that show inferred details (Gloor 2005; Dwyer, Hong et al. 2006; Weng, Chu et al. 2007). The measure of centrality has been used to infer communities within the network and the evolution of social relationships in groups (Gloor 2005; Weng, Chu et al. 2007). Others have used centrality to show a positive correlation between changes over time in betweenness centrality and creativity, and a negative correlation between changes in betweenness centrality and performance (Kidane and Gloor 2005).

Often ideas for the visualisation of information come from real word metaphors. Two visualisation systems inspired by real world phenomena have been demonstrated (Blythe, Patwardhan et al. 2006). The first is based on the metaphor of fluid flow through elastic pipes and the second on wave propagation (see Figure 24).

In both of these visualisations motion is central to efficiently identifying salient information in the data. The elastic pipe metaphor animates the link lines to represent changing relationships between nodes. The wave propagation metaphor is similar to dropping a pebble into a pond, which corresponds to selecting one of the nodes in the graph (the focal node). The focal node rises and falls continuously, and nodes that are related to the focal node rise on subsequent time steps, with the timing and height of their motion tied to their distance (either relational or temporal) from the focal node. Both of these approaches rely on the animated movement of objects and, as such, static snapshots of the dynamic display cannot be relied upon to be interpreted correctly.
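The timing rule of the wave metaphor can be illustrated with a minimal sketch. It assumes an unweighted network held in a hypothetical adjacency dictionary, a delay of one time step per hop from the focal node, and an amplitude that halves with each hop. The delay rule and the decay factor are illustrative assumptions and are not parameters reported by Blythe, Patwardhan et al. (2006).

```python
from collections import deque


def wave_schedule(adjacency, focal, decay=0.5):
    """Return {node: (start_step, amplitude)} for a ripple from `focal`.

    Assumption: delay equals hop distance and amplitude shrinks by
    `decay` per hop. Nodes unreachable from `focal` are omitted.
    """
    distance = {focal: 0}
    queue = deque([focal])
    while queue:  # breadth-first search yields hop distances
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {n: (d, decay ** d) for n, d in distance.items()}


# Hypothetical five-actor network; "e" is isolated.
network = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": []}
schedule = wave_schedule(network, "a")
```

In this sketch an isolated node never moves, which mirrors the observation that only nodes related to the focal node take part in the ripple.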

Figure 24 – Wave propagation visualisation. Source: (Blythe, Patwardhan et al. 2006)

A similar approach (shown in Figure 25) draws from waves propagating in a pool of water, with the observation that the ripples start from the centre and radiate outward (Appan, Sundaram et al. 2006). This facilitates the representation of temporal aspects with a concentric circle layout, without using a graph-based (node-link) layout mechanism. The layout can focus on periodicity, isolated events or widespread growth.

The focus of this work was on the visualisation and not information retrieval. A simple keyword search algorithm finds all messages pertaining to the topic. The messages are then ordered in time and associated with a set of people. The visualisation is provided at three scales of time: weekly, monthly and quarterly. Two underlying algorithms detect “spikes in time” and “spikes in people”. “Spikes in time” refers to message activity that exceeds a certain threshold and exhibits a sharp rise and fall within a short period. “Spikes in people” refers to people who send a large percentage of the messages relevant to the selected topic. One weakness of this visualisation approach is that it does not provide a view of the complete structure (or sub-structure) of the network, that is, what the frequent communication paths are and how people are connected to each other.
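The two detectors can be sketched as follows. This is an illustration only: the rule used for a “sharp rise and fall” (both neighbouring periods at or below half of the peak) and the names `spikes_in_time` and `spikes_in_people` are assumptions, as the exact algorithms of Appan, Sundaram et al. (2006) are not described here.

```python
from collections import Counter


def spikes_in_time(counts, threshold):
    """Indices of periods whose activity exceeds `threshold` as a sharp peak.

    Assumption: "sharp" means both neighbouring periods are at or
    below half of the peak value.
    """
    spikes = []
    for i in range(1, len(counts) - 1):
        peak = counts[i]
        if (peak > threshold
                and counts[i - 1] <= peak / 2
                and counts[i + 1] <= peak / 2):
            spikes.append(i)
    return spikes


def spikes_in_people(senders, min_share):
    """People who sent at least `min_share` of the topic's messages."""
    totals = Counter(senders)
    return sorted(p for p, n in totals.items() if n / len(senders) >= min_share)


# Hypothetical data: messages per week, and the sender of each message.
weekly_counts = [2, 3, 14, 4, 3, 9, 8, 2]
topic_senders = ["ann", "bob", "ann", "ann", "cy", "ann", "bob", "ann"]
time_spikes = spikes_in_time(weekly_counts, threshold=8)
people_spikes = spikes_in_people(topic_senders, min_share=0.5)
```

Week 5 exceeds the threshold but is not reported, because activity stays high in the following week; only the isolated burst in week 2 qualifies under the assumed rule.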

Figure 25 – Ring Visualization. Source: (Appan, Sundaram et al. 2006)

Another approach inspired by nature is demonstrated by the ThemeRiver™ tool. This is a prototype system that visualises thematic variations over time across a collection of documents (Havre, Hetzler et al. 1999). The major design goal was to provide a visualisation in which a user could detect theme changes over time (Figure 26). It provides a macro-view of the thematic changes in a corpus of documents.

Figure 26 - Theme River. Source (Havre, Hetzler et al. 1999)

The approach targets those undertaking exploratory analysis, with the intention of making patterns, trends and anomalies visible. Coloured “currents” that run horizontally within the “river” represent themes. The x-axis represents time and each “current” changes width to reflect the strength of its theme at each time slice. At each time slice a viewer can judge the relative widths of the currents and make interpretations about the relative strength of the themes. The visualisations can also include markers for significant events that occur during the time-line.
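The mapping from theme strength to current width can be sketched as a stacked layout. The sketch assumes that the river is centred on a horizontal axis at y = 0 and that band edges are computed per time slice; the smooth curves drawn between slices in the actual tool are not reproduced, and the function name `river_layout` is hypothetical.

```python
def river_layout(strengths):
    """Stack currents symmetrically about y = 0 for each time slice.

    `strengths` maps theme -> list of non-negative values, one per slice.
    Returns theme -> list of (lower, upper) band edges.
    """
    themes = list(strengths)
    n_slices = len(next(iter(strengths.values())))
    bands = {theme: [] for theme in themes}
    for t in range(n_slices):
        total = sum(strengths[theme][t] for theme in themes)
        lower = -total / 2  # centre the whole river on the axis
        for theme in themes:
            upper = lower + strengths[theme][t]
            bands[theme].append((lower, upper))
            lower = upper
    return bands


# Two hypothetical themes over two time slices.
bands = river_layout({"oil": [2, 4], "trade": [2, 0]})
```

A theme with zero strength collapses to a band of zero width, which illustrates why very small values are hard to see in this type of display.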

This visualisation approach loses continuity and structure if too few or too many themes are visualised. In addition, it is difficult to identify minor trends in the data because the curves tend to de-emphasize very small values. The visualisation displays relative and not exact values and works best with continuous data although interpolations between discrete data points can be used.

Figure 27 shows a screen shot of the Social Network Image Animator (SoNIA). This is a visualisation and analysis tool that is primarily targeted at visualising networks that change over time (Moody, McFarland et al. 2005).

Figure 27 – SoNIA screen shot⁸.

The creators of this tool have developed and modified common force-directed layout algorithms that allow the creation of “movies” that show the evolution of the network. They have focused on two separate types of dynamic network visualisations: flip books and dynamic movies. In the flip book approach nodes remain in a constant position and arcs fill in the holes among these nodes. The dynamic movie approach allows the nodes to move as a function of relational change.

8 Available from: http://sourceforge.net/projects/sonia/files/sonia/sonia_1_2_0/

SoNIA can create movies based on various layouts, such as Kamada-Kawai, Fruchterman-Reingold, Moody’s Peer Influence, metric MDS, file coordinates, and circle. The algorithm in SoNIA uses interpolation to create meaningful movement between network time samples and rests on finding an “anchor” starting position for the network nodes. The tool also supplies a rating of the “accuracy” of the layout: a modification of Kruskal’s stress statistic, which compares the matrix distances with the screen distances, indicates the degree of distortion after rendering (Kruskal 1964).
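To make the idea of comparing matrix and screen distances concrete, the following sketch computes the standard stress-1 statistic using the raw target distances. It is not SoNIA’s modified statistic, whose details are not given here; the function name and the example data are hypothetical.

```python
import math


def kruskal_stress(target, positions):
    """Stress-1 between target (matrix) distances and screen distances.

    `target[i][j]` is the desired distance between nodes i and j;
    `positions[i]` is the (x, y) screen coordinate of node i.
    Assumes at least two nodes occupy different screen positions.
    """
    squared_error = 0.0
    squared_screen = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            screen = math.dist(positions[i], positions[j])
            squared_error += (screen - target[i][j]) ** 2
            squared_screen += screen ** 2
    return math.sqrt(squared_error / squared_screen)


# Three nodes drawn on a line, one unit apart.
line = [(0, 0), (1, 0), (2, 0)]
faithful = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # distances the line reproduces
distorted = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # all pairs should be 1 apart
stress_faithful = kruskal_stress(faithful, line)
stress_distorted = kruskal_stress(distorted, line)
```

A value of zero means that the screen distances reproduce the target distances exactly, and larger values indicate greater distortion.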

The creators of SoNIA have demonstrated the value of the holistic view in terms of highlighting structural change in one dimension that was not visible in another dimension. Nevertheless, other studies have found that static images of force-directed networks provided better performance than their animated counterparts (Farrugia and Quigley 2011). Currently animated movies are mostly used as an exploratory data analysis tool and have yet to be utilised as a confirmatory tool.

TeCFlow is another tool that uses movies to visualise changing behaviour between single actors in dynamic networks (Gloor and Zhao 2004). It treats the exchange of e-mail between actors as an approximation of social ties. The visualisation algorithm uses a sliding time frame to influence a Fruchterman-Reingold (1991) force-directed graph. Each graph represents a single day, and combining these graphs results in an interactive movie which can be positioned at a time of interest to the user. The visualisation highlights network links inside the selected time frame and dims the others.
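The sliding time frame can be sketched as a filter over time-stamped messages that yields one edge set per day. The tuple layout of `messages`, the window length and the function name are assumptions made for illustration, and the force-directed layout step itself is omitted.

```python
from datetime import date, timedelta


def window_edges(messages, day, window_days):
    """Edges active in the window of `window_days` ending on `day`.

    `messages` is a list of (sent_on, sender, recipient) tuples.
    Returns {(sender, recipient): message_count}.
    """
    start = day - timedelta(days=window_days - 1)
    edges = {}
    for sent_on, sender, recipient in messages:
        if start <= sent_on <= day:
            key = (sender, recipient)
            edges[key] = edges.get(key, 0) + 1
    return edges


# Hypothetical e-mail log.
mail = [
    (date(2004, 3, 1), "ann", "bob"),
    (date(2004, 3, 2), "ann", "bob"),
    (date(2004, 3, 4), "bob", "cy"),
]
# One frame per day, each summarising the two most recent days.
frames = [window_edges(mail, date(2004, 3, d), window_days=2) for d in range(1, 5)]
```

Each entry of `frames` would be the input to one layout step, so that ties fade from the movie once they fall outside the window.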

The tool is designed to be used in three steps:

1. Watching the movie to find dense clusters indicating the potential emergence of collaborating teams.

2. Identifying peaks and troughs in the betweenness centrality and density to find interesting phases of collaboration.

3. Using the contribution index to understand the roles of individuals in the teams.
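Two of the measures used in these steps can be sketched briefly. The sketch assumes that the contribution index is defined as (sent − received) / (sent + received) and that density is the share of possible undirected ties that are present; neither formula is stated in the text above, so both should be read as assumptions.

```python
def contribution_index(sent, received):
    """+1 for a pure sender, -1 for a pure receiver, 0 when balanced."""
    total = sent + received
    return 0.0 if total == 0 else (sent - received) / total


def density(n_nodes, n_edges):
    """Share of possible undirected ties that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return 0.0 if possible == 0 else n_edges / possible


# Hypothetical team member who sent 30 messages and received 10.
index = contribution_index(sent=30, received=10)
# Hypothetical team of four actors with three ties.
team_density = density(n_nodes=4, n_edges=3)
```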

A number of approaches that shun the use of node-link diagrams have been demonstrated. A visualisation method that shows the evolution of communities composed of individuals was inspired by charts depicting the proximity of movie characters (Reda, Tantipathananandh et al. 2011). The visualisation has similarities to a timeline chart; however, while the x-axis represents time, the y-axis is used to position individuals into groups. These groups form thicker bundles, creating “social fibres” (see Figure 28).

Figure 28 – Visualisation of two communities at time-step T1 and T2 and two individuals forming a third community at T3 (Reda, Tantipathananandh et al. 2011)

User testing indicated potential for this type of visualisation; however, the scalability of the approach may be problematic. The visualisation copes with 20 to 30 communities that have “moderate” membership changes. However, a large number of community changes in a short period of time results in excessive thread crossing and a cluttered view. Another issue that is yet to be addressed with this type of visualisation is that one member can belong to multiple groups. Possible methods of dealing with these issues include limiting the crossings to a particular subset of individuals, and splitting threads into branches.

Many analysts are interested in the sequence of events over time to understand causal sequences and this is particularly true for those undertaking forensic analysis (Phan 2008). However, the tools described in this section do not visualise the sequence of events over time very well, whether it be ordinal or continuous time-based data.

A special case of dynamic social network visualisation (and one this thesis considers) is that of visualising the progression of conversations either in the physical environment such as business meetings or in virtual ones such as forum discussions.

The virtual environment is particularly attractive to researchers as our modern “connected” society means there is often a permanent record of communication available to be analysed. Communication is the process of transferring information from one source to another, and can thus be conceived of in terms of a network. This network is often taken to be a proxy for social ties and is therefore representative of the social network.

Visually assisted analysis of communication has been used in numerous ways, such as the prediction of links in criminal social networks and the behaviour of teenagers, managers and project team members (Gloor and Zhao 2006). The work in the area of visualising communication tends largely to be concerned with discovering patterns of interactions and classifying those an ego communicates with into distinct groups.

The visualisation tool “Soylent” was aimed at making the social and temporal setting of everyday collaboration visible (Fisher and Dourish 2004). This goal rests on the hypothesis that there is structure in interaction, which can be found in electronic traces of communication activity. Although the primary targets of this tool are the end users themselves, this work identifies common topological patterns found in social networks to help the user interpret the results of the visualisation. An understanding of how to find and interpret these patterns is of prime importance when using node-link representations, and the ability to interpret a node-link representation is aided by a good understanding of archetypal patterns.

Significant patterns identified are characterised as recurrent structures that occur across users or across time. These are listed below and more detailed descriptions can be found by referring to their work (Fisher and Dourish 2004):

• the Onion Pattern,

• the Nexus Pattern,

• the Butterfly Pattern, and

• the shifting Involvement Pattern.

Some researchers are concerned less with topological patterns than with chronemics, that is, patterns of a temporal nature (Perer, Shneiderman et al. 2006). This research is directed towards the detection of “Rhythms of Relationships”. These rhythms are based on the frequency of interactions with others, which is aggregated to provide a number representing the average interactions over the period of a year.

There are several potential problems with aggregation. One should be cognisant that temporal patterns emerge at different scales. In addition, in Perer’s work the aggregation was based on the email headers and not the content. Aggregation techniques such as this may introduce artefacts, possibly of a regular nature, that obscure more interesting hidden patterns.
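The sensitivity of aggregation to the chosen scale can be shown with a small, entirely hypothetical example in which two very different interaction rhythms produce the same yearly total.

```python
def aggregate(counts, bucket):
    """Sum consecutive groups of `bucket` interaction counts."""
    return [sum(counts[i:i + bucket]) for i in range(0, len(counts), bucket)]


steady = [2] * 12                  # two messages every month
bursty = [0, 0, 0, 24] + [0] * 8   # one intense month, then silence

yearly_steady = aggregate(steady, 12)
yearly_bursty = aggregate(bursty, 12)
quarterly_bursty = aggregate(bursty, 3)
```

At the yearly scale the two relationships are indistinguishable, whereas the quarterly totals of the second series reveal the burst.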

Other researchers concentrate on approaches which could aid analytical tasks and target visualisation techniques that can represent various aspects of conversation (Tat and Carpendale 2002). They argue that there is a need to be able to view the whole conversation at one time without having to move back and forth to compare conversational characteristics. The visualisation tool “Bubba Talk” uses the motion of “bubbles”, instead of the arrows used in directed graphs, to indicate the direction of communication (Figure 29). Dots also accumulate around the bubble representing the speaker, giving an indication of the quantity of contributions to the conversation. Other symbols are used to indicate various types of emphasis. These visualisations generate visual patterns from conversational elements such as tempo, punctuation and character usage.

Tat and Carpendale state that the graphical patterns produced can reveal some aspects of the mood of each speaker in addition to the connections between the speakers. This is in contrast to tools such as ThemeRiver, which can use word frequency information but do not claim to “portray the spirit” of a conversation. In Bubba Talk speakers are initially positioned in a circle. The direction of communication is indicated by bubbles moving towards the recipient of the communication. Elements of typed communication modify the features of the representation. For example, an animated circle grows in size every time a speaker uses an exclamation mark.

Figure 29 - Bubba Talk. Source: (Tat and Carpendale 2002)

Visualisation tools targeted at revealing patterns in individuals’ communications and the social network that develops around them have been proposed (Tat and Carpendale 2006). CrystalChat shows the social network and the temporal characteristics of one individual’s personal chat history in a three-dimensional structure (Figure 30). One of the design goals was to show temporality, indicating which message is followed by which and the comparative time between messages. This egocentric view of the communications uses a three-dimensional hub and spoke diagram. The centre represents the focus node and each spoke represents a person the focus node communicates with. Time runs both vertically and along rows. For example, the vertical axis could show successive days, with each row showing the conversations within a day. This approach has scalability and usability issues when applied to records of conversation that span long periods of time, or where the communication is highly active within the time window.

Figure 30 – CrystalChat. Source: (Tat and Carpendale 2006)

Occasionally researchers will use the dialogue from well-known plays or movies as a means of inferring social ties over time. The visualisation of the resulting inferred networks has a strong association with the visualisation of networks based on communication logs. This methodology is well illustrated in the work of inferring and visualising social networks on Internet Relay Chat (Mutton 2004). Mutton successfully applied his visualisation approach to both IRC and Shakespeare’s play Macbeth. To infer these networks he applied simple heuristics that provided reasonably accurate approximations of the social network, in a similar way to which social networks are inferred from real life conversations. The script of the play provides a sequence of dialogue similar to that observed in an IRC channel or what is presented in logs of discussion forums or email.

Much of the work in studying social networks inferred from communication is made possible by the prolific use of email. Many tools and studies use email as the basis for the social network under examination; see (Boyd and Potter 2003; Kerr 2003; Fisher and Dourish 2004; Fu, Hong et al. 2006; Perer, Shneiderman et al. 2006; Perer and Smith 2006; Viégas, Golder et al. 2006; Fu, Hong et al. 2007).

To view conversations in the context of a social network, Donath, Karahalios et al. (1999) designed two graphical interfaces, “Chat Circles” for synchronous conversation and “Loom” for threaded discussions (see Figure 31). These interfaces were created primarily to reveal patterns of conversations such as bursts of activity, the arrival of new members, or the evolution of conversational topics.

Chat Circles focuses on representing the real-time communication of chat systems. However, the archive of these messages can be represented as a “Conversation Landscape”. Participants, identified by colour, are positioned along the x-axis, with the y-axis representing time. Chat postings are represented as horizontal lines; the wider the line, the longer the message. Highlighting shows who is within hearing range.

Figure 31 – Loom (left) and Chat Circles “Conversation Landscape” (right). Source: (Donath, Karahalios et al. 1999)

Loom focuses on enabling the discovery of patterns in threaded newsgroup communication. The primary view consists of participants on one axis and time on the other. The posts can be represented as dots, or as lines connecting posts as a thread passes from person to person. Although Loom can make some patterns visible, this is sensitive to the ordering of the participants, particularly when viewing a threaded discussion.

Figure 32 shows an approach that provides a typographic visualisation of an individual’s email content over time, based on the hypothesis that the visualisation of email content constitutes contextual representations of an individual’s relationships (Viégas, Golder et al. 2006). This visualisation approach gives a unique content-based view of relations. Keywords from communication messages are placed in columns based on a selected timescale. These keyword stacks allow a user with knowledge about the communication (usually the owner of the mailbox) to reflect on the evolution of relationships over time. One disadvantage is that it can only show dyadic relationships, that is, one relationship at a time. Although useful for a user browsing and understanding their personal messages, it is of limited use for analysis.
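The placement of keywords into time-based columns can be sketched as follows. The sketch assumes that keywords are ranked by raw frequency within each month after removing a small, illustrative stop-word list; the scoring actually used by Viégas, Golder et al. (2006) is not described here and may differ.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "to", "and", "of", "is", "for", "on"}  # illustrative only


def keyword_columns(messages, top_n=2):
    """Map each (year, month) to its `top_n` most frequent non-stop words.

    `messages` is a list of ((year, month), text) pairs.
    """
    counts = defaultdict(Counter)
    for period, text in messages:
        words = [w for w in text.lower().split() if w not in STOP_WORDS]
        counts[period].update(words)
    return {period: [w for w, _ in c.most_common(top_n)]
            for period, c in counts.items()}


# Hypothetical messages exchanged with one correspondent.
columns = keyword_columns([
    ((2005, 1), "thesis draft for the thesis committee"),
    ((2005, 1), "draft thesis deadline"),
    ((2005, 2), "holiday plans and holiday photos"),
])
```

Each list in `columns` corresponds to one keyword stack, and reading the stacks from left to right gives the content-based view of how the relationship changes.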

Figure 32 - Themail screen shot. Source: (Viégas, Golder et al. 2006)

In reviewing the work conducted in the area of social network visualisation based on a conversational record, it becomes apparent that a number of key elements are important in determining patterns. As a starting point in visualising conversational records, a visualisation should assist in making easy determinations of the elements listed below:

• the frequency and duration of interactions,

• the frequency and duration of periods of no interaction,

• the attributes and categorisation of recipients,

• the actor’s position in the social network,

• the content and nature of the message.

As discussed in section 2.3, the analysis of ego networks is one method that analysts use to understand social networks. In recent times the dynamic nature of these networks, and methods for visualising and analysing them, have appeared in the literature. One method represents dynamic ego networks using static representations instead of animation techniques, by using “Small Multiples and Tree-ring Layouts” (Farrugia, Hurley et al. 2011). The central idea is to place several small visualisations next to each other to facilitate pattern identification and comparison (see Figure 33).

The individual visualisations are made with concentric circles representing discrete time periods, like the growth rings in trees. The “ego” is placed in the centre and points are placed on the ring representing alters’ interactions with the ego. Each alter is assigned an angular position with respect to the ego. This visualisation technique is similar to Ring and Spiral Graph approaches (cf. Appan, Sundaram et al. 2006; Aigner, Miksch et al. 2008).
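The geometry of a tree-ring layout can be sketched in a few lines. The sketch assumes that alters are spaced evenly around the circle, that period 1 is the innermost ring and that rings are equally spaced; these are illustrative choices and not necessarily those of Farrugia, Hurley et al. (2011).

```python
import math


def tree_ring_layout(alters, interactions, ring_gap=1.0):
    """Place one point per (alter, period) interaction around an ego at (0, 0).

    `alters` fixes each alter's angular position; `interactions` is a list
    of (alter, period) pairs, with period 1 drawn on the innermost ring.
    """
    angle = {a: 2 * math.pi * i / len(alters) for i, a in enumerate(alters)}
    points = []
    for alter, period in interactions:
        radius = period * ring_gap
        x = radius * math.cos(angle[alter])
        y = radius * math.sin(angle[alter])
        points.append((alter, period, round(x, 6), round(y, 6)))
    return points


# Hypothetical ego with four alters, three of whom interact with the ego.
points = tree_ring_layout(
    ["ann", "bob", "cy", "dee"],
    [("ann", 1), ("bob", 2), ("cy", 3)],
)
```

Because each alter keeps the same angle in every period, repeated interactions with one alter line up along a single spoke.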

Figure 33 - Small Multiples Network Exploration System. Source: (Farrugia, Hurley et al. 2011)

The small images can be sorted by different criteria, such as the node degree at a time period, node attributes, and activity within the time period. A limitation of this approach is that the chain of connections is not immediately apparent and following a path is difficult. In a reasonably large graph it is difficult to select appropriate egos to compare. Supplementing this approach with another method may help to provide the indicators for appropriate ego selection.

Another approach that uses small multiples in conjunction with a matrix layout for the visualisation of dynamic networks is “matrix cubes” (Bach, Pietriga et al.). The approach includes a three-dimensional representation; however, this provides more of an overview and a means of manipulating views than an analysis view in itself. The small multiples view provides the basis for understanding and interpreting the underlying dynamic network (Figure 34).

Figure 34 - Small multiples of vertex slices (Bach, Pietriga et al.)

The projection of the cube that is selected determines the “slicing” of the cube. For example, if the time × node projection is selected, the three-dimensional cube view is sliced into adjacency matrices, one for each time step. Each node slice then shows the dynamic ego-network for a node, which enables a comparison of individual connection patterns. This time slicing is possible on datasets that are based on discrete time. However, if the data consisted of continuous time it would have to be discretised. This visualisation approach derives many of its advantages from the basic adjacency matrix representation, for example, the easy identification of node degree. A limitation of this approach is that the small multiples technique does not scale well with certain projections on large networks.
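The two kinds of slice can be sketched with a nested list standing in for the cube. The storage order `cube[t][i][j]` (time step, then row node, then column node) and the function name `node_slice` are assumptions made for this illustration.

```python
def node_slice(cube, node):
    """Node x time matrix: row j, column t holds the tie of `node` to j at t."""
    n_nodes = len(cube[0])
    return [[cube[t][node][j] for t in range(len(cube))] for j in range(n_nodes)]


# Hypothetical cube: three nodes observed at two time steps.
cube = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],  # t = 0: node 0 is tied to node 1
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],  # t = 1: node 0 is tied to nodes 1 and 2
]

first_time_slice = cube[0]      # a time slice is one adjacency matrix
ego = node_slice(cube, 0)       # dynamic ego-network of node 0
degree_over_time = [sum(row[t] for row in ego) for t in range(len(cube))]
```

Summing a column of the node slice gives the degree of that node at one time step, which is the kind of reading that the adjacency matrix representation makes easy.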

Summary of Dynamic Visualisation Approaches

Table 4 - Strengths and Weaknesses of Dynamic Visualisation Approaches

| Approach to Visualisation | Strengths | Weaknesses |
| --- | --- | --- |
| (Reda, Tantipathananandh et al. 2011) | Depicts the change in communities over time. | Not scalable, only 20 to 30 communities possible. Does not represent individual interactions. |
| SoNIA/TeCFlow (Gloor and Zhao 2004; Moody, McFarland et al. 2005) | As an exploratory tool it can draw attention to structural changes over time. | Not as useful in confirmatory applications. Suffers from the same inherent limitations as node-link diagrams. |
| ThemeRiver (Havre, Hetzler et al. 1999) | Easy to understand and useful for identifying macro trends. Following trends over time is easier than with a histogram. | A loss of continuity and structure if there are too few or too many themes. Works best with continuous data. Micro trends are difficult to identify. Relative and not exact values shown. |
| Ring (Appan, Sundaram et al. 2006) | Provides visualisation of spikes in time and people. Patterns of communication are more visible. Provides a summary of activity. | The structure and communication paths are not immediately apparent. |
| Wave propagation (Blythe, Patwardhan et al. 2006) | Using pre-attentive movement disparity, the changing relationships of a given node are visible over time. | Relies on the animated movement of objects and, as such, static snapshots of the dynamic display cannot be relied upon to be interpreted correctly. |
| Community Detector (Falkowski, Bartelheimer et al. 2006) | Detects possible communities. Suitable for networks that have highly fluctuating membership. Provides an overall view and zoomed area of interest. | Significant changes are relatively easy to detect, but the detection of less significant changes presents a problem. Requires an understanding of the community detection and layout to make valid interpretations. |
| ChatCircles (Conversation Landscape) (Donath, Karahalios et al. 1999) | The history interface “Conversation Landscape” layout reveals the temporal patterns of discussions. | Does not display the global structure. Provides an ego-centric view only. The paths of communication are not immediately apparent. |
| Crystal Chat (Tat and Carpendale 2006) | Presents the opportunity to detect patterns of interaction. | Scalability issues in applying to long records or highly active communication. Egocentric view, which loses sight of global communication that may impact on the individual. |
| BubbaTalk (Tat and Carpendale 2002) | Capable of providing an indication of the ‘tone’ and tempo of the conversation. | Is best understood by viewing the animation. The user must understand the mapping of textual elements to visualisation features. |
| Loom (Donath, Karahalios et al. 1999) | Enables the discovery of patterns not visible using other visualisation approaches. Potential to visualise messages classified by content. | Ordering of participants in the threaded view affects readability and interpretability. |
| Themail (Viégas, Golder et al. 2006) | Allows a user to detect and reflect on the evolution of a relationship expressed by the content of messages. Useful for casual browsing and understanding. | Only explores dyadic relationships. No visualisation of the global communication. Not very useful for analysis tasks. |
| Small Multiples + Ring layout (Farrugia, Hurley et al. 2011) | Easy to understand, based on a recognised metaphor. Able to display different egos simultaneously. | No global view of relations. Path following is difficult. Selection of appropriate egos can be difficult. |
| Matrix Cube (Bach, Pietriga et al.) | Derives many of its advantages from basic matrix representations, for example, easy identification of node degree. | The small multiples approach does not scale well with certain projections on large networks. |

2.6.3 Visualising Semantics and Attributes

“I should venture to assert that the most pervasive fallacy of philosophic thinking goes back to neglect of context” – John Dewey

Context is often stated as a core value of community science; however, a survey of empirical articles published in the American Journal of Community Psychology shows that community scientists utilise a narrow range of statistical tools that are not well suited to assess contextual data (Luke 2005). A social network is composed of more than simply links and actors. Interactions occur in a particular context, for example a date and a location. This non-topological information enriches the network but is still underexploited by researchers (Cruz, Bothorel et al. 2014). Most graph visualisation techniques focus on the structure of the graphs and do not support techniques to deal with attributes (Pretorius and Wijk 2008). Many of the standard Social Network Analysis techniques focus on understanding within a single context, and within this many practitioners use unweighted, single-criterion representations of relationships (Renfro 2001).

There are two types of attributes discussed in the literature. In this section we are concerned with individual attributes of nodes and edges and not structural attributes. Structural attributes consist of deriving meaning from a node’s position in the network using the measures described in section 2.3. Tools such as SocialAction (see Perer and Shneiderman 2006), which are designed to utilise and examine structural properties of the network by systematically examining numerous SNA measures, are not discussed here.

Often analysts are unable to utilise the content and contextual information in the data for reasons of source confidentiality and maintaining the anonymity of the actors in the network. However, where semantic information is available, using it in conjunction with traditional network measures can only add to the quality of the analysis. This is particularly so where analysts use communication networks to infer the existence of other networks such as friendship or collaboration. A fine-grained approach may reveal changes in processes in more substantive detail than a model based on the topology alone (Moody, McFarland et al. 2005).

The most rudimentary visualisation approach that incorporates node or edge attributes and endeavours to preserve the network topology is one where a limited number of attributes, in the form of colour, shape, size or glyphs, are embedded in a force-directed layout. There are a number of visualisation tools which map various attributes to features of a force-directed display. For example, the Visone and Cytoscape⁹ tools map attributes to a node label, node colour, edge width, size of the node, radial position, etc. (Brandes and Wagner 2004).

Although this simple method preserves the topology, presenting attributes in this way has a number of limitations. These include a limit on the number of attributes that can be displayed simultaneously and reduced readability. The type of information that is shown in these displays invariably means the user will have to refer to a key to decode meaning. This makes visualisations of this type less useful for a user in quickly identifying possible relationships between an attribute and the structure of the network.

Another simple visualisation technique that considers node attributes as an important visualisation parameter is “attribute-based scatterplots”. These plots are useful for understanding how attributes of the nodes affect who is connected to whom (Borgatti, Everett et al. 2013). These plots work best when the attributes are continuous and not categorical, and work in a similar fashion to conventional scatterplots. Nodes are positioned along the x-axis and y-axis based on two attributes; the difference from a conventional scatterplot is that the relational edges are drawn between nodes.
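The construction of an attribute-based scatterplot can be sketched as a mapping from attributes to coordinates, with ties carried along as line segments. The data structure, the attribute names and the function name are hypothetical, and the drawing itself is left to whichever plotting tool is used.

```python
def scatter_segments(attributes, edges, x_attr, y_attr):
    """Return node coordinates and one line segment per tie.

    `attributes` maps node -> {attribute name: numeric value};
    `edges` is a list of (node, node) pairs.
    """
    coords = {n: (a[x_attr], a[y_attr]) for n, a in attributes.items()}
    segments = [(coords[u], coords[v]) for u, v in edges]
    return coords, segments


# Hypothetical actors with two continuous attributes.
people = {
    "ann": {"age": 34, "tenure": 5},
    "bob": {"age": 51, "tenure": 20},
    "cy": {"age": 29, "tenure": 2},
}
coords, segments = scatter_segments(
    people, [("ann", "bob"), ("ann", "cy")], "age", "tenure"
)
```

Unlike a force-directed layout, the position of each node here carries meaning, so short segments indicate ties between actors with similar attribute values.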

A more advanced use of scatterplots is occasionally seen in sophisticated visualisation tools. These tools can use a “scatterplot matrix” as the primary selection mechanism. A scatterplot matrix presents a series of small charts of all the pairwise relationships. Each of these charts shows the correlation between a given pair of attributes. A scatterplot matrix is a graph exploration tool which provides a quick visual representation of potential associations between variables.

9 Available: http://www.cytoscape.org/index.html

The GraphDice tool, created to assist in social network analysis, uses a scatterplot matrix as the master selection mechanism (Bezerianos, Chevalier et al. 2010). The tool displays a scatterplot for every combination of attributes and arranges them in a plot matrix, with links drawn between nodes. This overview matrix is ordered based on a function of the similarity between dimensions. Selecting a scatterplot thumbnail from the overview matrix displays a large version. Moving from one scatterplot to the next is performed via three-dimensional animated transitions, with the aim of the user extracting structure from the motion during the transitions (see Figure 35). A weakness of the scatterplot matrix approach is that it only allows an examination of two variables at a time, and complex attribute interactions may not be detected “during the transition”.

Figure 35 – GraphDice. Source: (Bezerianos, Chevalier et al. 2010)¹⁰

Another method which considers the context is the combination of traditional approaches to social network analysis with text analysis. This attempts to take advantage of the strengths of each approach (Gruzd and Haythornthwaite 2008). The approach facilitates constructing a network from the content of the communication messages and using the content to supplement analysis at critical junctures.

In a social network each actor has internal information which presents an opportunity to analyse the network from different perspectives. One method of integrating the compositional information (semantic information) is finding the semantic categories and then changing the weights of the links in the graph according to the semantic similarity of the nodes. A model to find and visualise communities in social networks based on the non-structural semantic information was presented by Cruz, Bothorel et al. (2014).

10 Image reproduced with the kind permission of Dr Jean-Daniel Fekete. Image sourced from http://raweb.inria.fr/rapportsactivite/RA2010/aviz/IMG/graphdice.png
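The reweighting step can be illustrated with a minimal sketch. It assumes that each actor’s semantic categories are held as a set and uses Jaccard similarity as a stand-in for semantic similarity; this measure is an assumption for illustration and is not necessarily the one used by Cruz, Bothorel et al. (2014).

```python
def jaccard(a, b):
    """Overlap of two category sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reweight(edges, categories):
    """Scale each tie's weight by the semantic similarity of its endpoints."""
    return {
        (u, v): w * jaccard(categories[u], categories[v])
        for (u, v), w in edges.items()
    }


# Hypothetical actors, their semantic categories and equally weighted ties.
interests = {"ann": {"jazz", "chess"}, "bob": {"jazz", "chess"}, "cy": {"golf"}}
weighted = reweight({("ann", "bob"): 2.0, ("ann", "cy"): 2.0}, interests)
```

After reweighting, a community detection algorithm that respects edge weights would tend to group actors who are both connected and semantically similar.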

Approaches to visualise multivariate information whilst preserving the topological network structure often use conventional visualisation approaches and combine them with a node-link representation. Generally this is presented in an interface with tightly coupled views exploiting the strengths of each technique. For example, Shannon, Holland et al. (2008) coupled a traditional Parallel Coordinate Visualisation (PCV) with a traditional force-directed graph layout and tested this approach with social network data (Figure 36).

PCVs are a statistical visual analytics technique that assists in investigating large data sets with high attribute dimensionality. They show the distribution of attribute values and excel at visual clustering. The sharing of ordinal or quantitative attributes across entities can be identified by the distribution of lines on each attribute dimension. A limitation of PCVs is that relationships between individual nodes are not represented. However, this tool includes a tightly coupled graph representation to account for this limitation. Polylines are drawn translucent, so that the more lines pass through a particular point, the more opaque and hence more prominent that point becomes. Brushing techniques are used to link the views together.

By coupling the two distinct views, a wider range of interactions is possible which enables effective “drill down” into the data to access rich details about individual items in the dataset. In this case, two simultaneous views are used to overcome the inability of Parallel Coordinate Visualisation to show relationships between individual data cases, while leveraging its ability to represent the distribution of attribute values. A weakness of this approach is that large data sets create a lot of visual clutter and obscure meaningful patterns. In addition, the order of the axes and the nodes can dramatically change the readability and interpretation, and differing attribute scales can make the display difficult to interpret.

Figure 36 – Enhanced Paired Parallel Coordinates visualisation tool. Source: (Shannon, Holland et al. 2008)

Some researchers have chosen to adopt multivariate attribute visualisation approaches that avoid the typical use of node-link graph visualisations (Shneiderman and Aris 2006; Bezerianos, Chevalier et al. 2010). For example, Pretorius and Wijk (2008) were primarily concerned with visualising the attributes and edge labels of networks, as they claim users often seek to understand graphs in terms of those two elements. Their goal was to develop a highly interactive visualisation technique that would enable users to inspect graphs based on data associated with the nodes and edges. In addition, they provided an alternative to the arduous and error-prone practice of formulating formal queries.

This approach clusters nodes and edges by considering their associated data. Edge labels are positioned in the centre of the visualisation and edges are partitioned by letting every edge pass through the region that represents its label. Using a directed graph as input, source nodes are positioned on the left and target nodes are on the right. By recursive partitioning a new layer of clusters is computed for each attribute and drawn in a column of boxes left and right. Querying the data can be accomplished by selecting the data clusters and edge labels in sequence either additively or in a subtractive way.

The Pretorius and Wijk tool was targeted at general graph visualisation and was tested with state transition data but was untested on datasets of social networks (see Figure 37). A weakness in this approach is that a large number of edge labels or attribute categories will make the screen very cluttered and compressed (without further aggregation).


Figure 37 – AttriGraph, Multivariate graph visualisation. Source: (Pretorius and Wijk 2008)

Many of the current network visualisation tools are based on node-link diagrams that make use of force-directed layouts (Aris and Shneiderman 2007). These types of layouts have inherent drawbacks including numerous link crossings, occluded nodes, cluttered displays and unreadable labels. In attempting to solve these problems, Aris and Shneiderman were inspired by map-based layouts, enabling the user to create meaningful spatial layouts in which they could understand node placement, spot relationships among nodes, and notice regions where nodes were absent or sparse (see Figure 38).


Users of the tool design rectangular regions and group nodes into them according to one of their attributes (user-defined semantic substrates). The designer interface, which allows a user to build the substrate, is shown in Figure 39. The two unique features of this tool are the ability for users to design the layouts of components, and the support for laying out nodes according to node attributes. The user can limit visual clutter through the use of filters; however, the dataset can have a dramatic effect on link visibility. In addition, the decision on which combination of attributes to use and place in regions can be non-trivial and affects the interpretability of the graph. Having more than three regions can become complex and make the graph difficult to understand.


Figure 38 – Network visualisation with semantic substrates. Source: (Aris and Shneiderman 2007)


Figure 39 – Substrate designer. Source: (Aris and Shneiderman 2007)

PivotGraph (Figure 40) adopted a grid-based approach focusing on the relationships between node attributes, analogous to pivot tables used in spreadsheets and Online Analytical Processing (OLAP). The technique relies on classical node aggregation and edge contraction techniques, reducing the graph to a small number of categorical dimensions (Wattenberg 2006). Two dimensions are compared to each other at any one time.

This tool allows the user to visualise nodes according to categorical attributes. Each value of the attribute is evenly positioned along the horizontal axis, with the size of the circle representing the number of nodes possessing that attribute value. The width of the edge between two circles is drawn proportional to the number of edges connecting the two points. The visualisation is appropriate for exploration and hypothesis generation and could assist in identifying attribute relationships.
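The aggregation underlying this kind of display can be sketched in a few lines. The sketch below is an illustration only and is not Wattenberg's implementation: it assumes nodes are held as dictionaries of categorical attributes, and the function name `pivot_rollup` and the toy data are hypothetical. Each node is mapped to the cell given by its values on two chosen dimensions; circle sizes are node counts per cell and edge widths are counts of original edges between cells.

```python
from collections import Counter


def pivot_rollup(nodes, edges, dim_x, dim_y):
    """Aggregate a graph onto two categorical dimensions.

    nodes -- dict mapping node id -> dict of categorical attributes
    edges -- iterable of (source, target) pairs, kept in the given direction
    Returns (node_sizes, edge_widths), keyed by (x_value, y_value) cells
    and by pairs of cells respectively.
    """
    cell = {n: (attrs[dim_x], attrs[dim_y]) for n, attrs in nodes.items()}
    node_sizes = Counter(cell.values())
    edge_widths = Counter((cell[u], cell[v]) for u, v in edges)
    return node_sizes, edge_widths


# Hypothetical toy network with two categorical attributes per actor.
nodes = {
    1: {"gender": "F", "dept": "eng"},
    2: {"gender": "F", "dept": "eng"},
    3: {"gender": "M", "dept": "eng"},
    4: {"gender": "M", "dept": "ops"},
}
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
sizes, widths = pivot_rollup(nodes, edges, "gender", "dept")
```

Here the two female engineers collapse into a single circle of size two, and the two links from them to the male engineer collapse into one aggregated edge of width two.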

Figure 40 – PivotGraph showing two categorical dimensions. Source: (Wattenberg 2006)

Lee’s dissertation (Lee 2006) was focussed on simple readable visualisations and their associated interaction techniques. Lee showed that analysis can be effectively done through simple data representations such as histograms and tables.

NetLens (see Figure 41) and PaperLens (the winner of the InfoVis 2004 contest) are examples of applications built around this approach (Kang, Plaisant et al. 2007).


Figure 41 – NetLens. Source: (Plaisant, Bederson et al. 2006)

NetLens is a data focused system which supports any dataset that can be represented with a Content-Actor data model. The user interface presents content on the left and actors on the right. The top contains histograms which can show the distribution of items over the available attributes. The bottom area consists of ordered lists describing the selected data. All the views are tightly coupled; selecting a histogram bar for instance will change the lists to only show the data included in the histogram. Rich data manipulation facilitates the ability to formulate complex queries without knowledge of SQL or the database.


User testing of NetLens revealed a number of usability issues, with the major ones being a lack of understanding of derived relationships and a lack of visibility of what filters had been set.

There are a small number of approaches that utilise force-directed algorithms along with node or edge attributes as an integral shaping parameter within the layout algorithm. One such algorithm aims to show the clustering in a small world network and identify which node attributes are most influential in the clustering and in the layout in general (Gibson and Faith 2011). This technique is representative of a number of approaches that rely on dimension reduction techniques so the nodes can be drawn in two dimensions. A prerequisite for this particular visualisation is that a node’s assignment to a cluster structure must be known a priori. The clustering can be based on observing an attribute that occurs naturally in the dataset or by a clustering algorithm.

The visualisation algorithm utilises two sets of nodes; one set consists of entity nodes and the other consists of attribute nodes. In the final visualisation entity nodes are displayed and attribute nodes are not. However, the attribute nodes are integral to the final layout and can be influenced by the user to fit with their intuition.

A potential weakness of this approach is that the loss of information through dimension reduction may mean that micro patterns are no longer discernible. In addition, the use of a force-directed layout means it will suffer from issues related to that layout approach.


Several approaches have used attribute data as the driving characteristic of the clustering or layout. However, much of this work does not take the topological structure of the graph into account. A hierarchical approach has been introduced that allows a user to influence the hierarchical structure such that they can create different abstractions (Archambault, Munzner et al. 2008). To accomplish this, the authors utilise a topology-preserving graph hierarchy and an algorithm which respects edge conservation and connectivity conservation.

The approach relies on a high degree of user interaction combined with automated algorithms to perform meta-node creation, hierarchical modification, pattern matching, coarsening and colouring. Users make selections on a tree view and execute queries. The hierarchy is modified by selecting a graph cut and initiating a reform-below-cut or merge-at-cut operation based on the current selection. Spatial proximity is then used to indicate parts of the graph with similar attributes.

The strength of this approach is its focus on preserving the topological structure and the elimination of clutter through hierarchies. A potential weakness is that optimal use relies on the formulation of regular expressions. In addition, the layout will change as the user “explores” the graph, which may disrupt their mental map. User feedback also suggested that some people had difficulty in understanding how containment is used to convey information about the dataset.


Table 5 - Strengths and Weaknesses of Attribute-based Visualisations

| Approach to Visualisation | Strengths | Weaknesses |
|---|---|---|
| Attribute-based scatterplots | Useful for detecting how attributes of the nodes affect who is connected to whom. | Works best when attributes are continuous and not categorical. |
| Embedding attributes, e.g. Visone/Cytoscape (Brandes and Wagner 2004; Baur 2008) | Provides a simple approach to visualisation whilst keeping the spatial arrangement representative of the graph topology. | Limited number of attributes that can be displayed simultaneously and reduced readability. Decoding key needed to provide meaning. Cannot easily identify the relationship to topology. |
| Force-directed + dimension reduction, e.g. (Gibson and Faith 2011) | May assist in being able to assess which attributes are the most and least influential in the layout. | Dimension reduction approaches suffer from a loss of information and are therefore unable to detect micro patterns. Suffers from issues inherent in force-directed layouts. Topology is not apparent. |
| PivotGraph | Scales reasonably well due to summarising graphs. Useful for exploration and hypothesis generation through the identification of attribute relationships. | Only two dimensions can be compared at any one time. When there are a number of different dimensions, selecting an appropriate set to view might be difficult. |
| GraphDice (Bezerianos, Chevalier et al. 2010) | Consistent and simple representation. Provides edge and node attribute visualisation. Links tabular information to visualisation. Provides dynamic query capability. | The scatterplot matrix approach allows examination between two variables only. Multiple variables may interact. |
| AttriGraph (Pretorius and Wijk 2008) | Can answer questions such as, “which edge types are activated by specific node attributes?” | A large number of edge labels or attributes will make this approach cluttered and very compressed. |
| Parallel Coordinates Visualisation tool + force-directed | Able to detect correlations. Can view distribution of attributes. Does not require the use of a legend. | Large data sets create a lot of visual clutter and can obscure patterns. The order of axes and nodes can affect interpretation. Different attribute scales cause confusion. |
| NetLens (Kang, Plaisant et al. 2007) | Can formulate complex queries without knowledge of SQL or the database. Shows the relationship between actors and the attributes. | No view of the topology of the network. Network paths cannot be seen or traced. Understanding derived relationships is difficult. Easy to get lost in a sequence of exploration interactions. |
| GrouseFlocks (Archambault, Munzner et al. 2008) | Attempts to preserve topological structure. Eliminates clutter by clustering in a hierarchy. | The layout of the graph constantly changes as the user interactively explores. Performs best with a user understanding of regular expressions. Containment approach is difficult for some to understand. |
| Semantic substrates (Shneiderman and Aris 2006) | Facilitates an understanding of a set of attributes relative to another dimension such as time. | Link visibility is a problem dependent on the dataset and the semantic space design. Having more than three regions can get complex. Decisions on which attributes to use and place in regions affect interpretability. |


2.7 Studies and Evaluation

It is important to know whether a particular visualisation technique actually works and, additionally, under what circumstances it should be used, how it compares with others and what tasks it best serves (Santos 2008). Despite the effort expended by numerous authors attempting to improve the visualisation of networks, some have noted that very little evidence is offered that quantitatively demonstrates that a particular tool or approach improves the analysis of the data (Hall, McMullen et al. 2006). Despite the long history of graph visualisation research, only a few graph visualisation systems have actually been tested with real users (Lee 2006). This is not surprising given the challenges of evaluating information visualisations.

One of the major problems in performing visualisation evaluation is the wide range of functions and interactions required by users of such tools. In particular, challenges are greatest when evaluating the utility of a visualisation to support exploratory data analysis (Kang, Plaisant et al. 2007). Exploratory data analysis means the research strategy is unknown a priori; therefore, the set of tasks the user may want to perform will not be known. Evaluating visualisation systems which support exploratory data analysis is consequently problematic, as controlled experiments may not effectively represent individual users’ research strategies (Perer and Shneiderman 2008).

In Henry’s (Henry 2008) thesis she describes three levels of possible evaluation:


1. The component level

2. The system level

3. The work environment level

The most common type of evaluation is at the component level as it is easier to quantitatively measure aspects at this level. Evaluation at the system level is less common, and assessment at the “work environment” level is very rare due to the more qualitative and ethnographic methods employed and the extended periods of time required. However, Perer and Shneiderman (2006) believe network analysis cannot be replicated easily in the form of small user studies and advocate a series of longitudinal case studies, believing this to be the most effective method of evaluation.

Henry and Fekete (2007) evaluated their visualisation tool MatLink against matrix graph (MAT) and node-link representations in an experiment establishing readability in terms of errors and time. In this experiment they gave users tasks related to common mid-level tasks in social network analysis. These revolved around evaluating connectivity, finding central actors and identifying communities.

Viégas and Donath (2004) also trialled different forms of visualising networks comparing the traditional network graph with a temporal based visualisation. They concluded (based on observations and interviews) that both visualisation methods complemented each other and they offered basic principles that should be incorporated in visualisations but did not offer quantitative evidence for such advice.


A number of authors have created taxonomies of low level tasks commonly required to interpret visualisations of graph data (Ghoniem, Fekete et al. 2005; Keller, Eckert et al. 2006; Lee 2006; Cui 2007) and some have developed taxonomies specifically for evaluating temporal network evolution (Ahn, Plaisant et al. 2011). Taxonomies can help us go beyond simply asking whether something helps by offering tools to answer questions of why and how it helps (Munzner 2000). By using established taxonomies one can measure performance for a particular task and qualitatively determine whether a tool or an approach is of value. The evaluations in this thesis use this approach. However, many taxonomies target application level tasks and do not focus on methods of evaluating the merits of the visual encoding technique. Nevertheless, as Munzner (2000) points out in her thesis, user studies are tricky to construct without confounding variables and it is sometimes difficult to convince others that the positive results of a study merit high-level conclusions about the validity of an approach.

Munzner (2009) proposes a model with four nested layers which characterise the visualisation design and validation process (depicted in Figure 42). Using this model as a reference, much of the work in this thesis is situated in the encoding and interaction technique design layer, crossing into the area of “algorithmic design” and the “operation and data abstraction design” layer. An important aspect of Munzner’s model is the recognition that validation can be conducted at the immediate and downstream levels.


Figure 42 - Nested Model for Visualisation Design and Validation. Source: (Munzner 2009)

However, testing at any of the intermediate levels can introduce its own challenges. For example, “a poor visual encoding choice may cast doubt when testing a legitimate abstraction choice, or poor algorithm design may cast doubt when testing interaction technique” (Munzner 2009). Despite this, Munzner notes that downstream testing is necessary as the testing of intermediate levels only provides partial evidence of success.

Munzner’s model brings to the fore problems that will be encountered when evaluating prototypes that provide the necessary platform to trial a particular visualisation encoding technique. When applying a downstream approach to validate the visualisation component, the user’s experience relies on all upstream levels being designed properly. When one of the levels is not well designed, the user experience is diminished. However, determining the cause of the poor experience may not be easy. For example, a particular set of results may raise the question, “was the algorithm slow or was the visual encoding scheme the cause of the issues?”


Figure 43 - Threats and validation in the nested model. Source: (Munzner 2009)

Possible threats and validation approaches at the different levels in the nested model are shown in Figure 43. The evaluation approaches at the encoding and interaction design level usually consist of evaluating the design with respect to known perceptual and cognitive principles. This could be in the form of expert review and comparative evaluation. Downstream approaches tend to consist of laboratory experiments, qualitative discussions of images or video, quantitative measurements of the representations produced (e.g. measurable aesthetic criteria such as the number of edge crossings in graph drawing) or informal usability studies. Many of the common techniques drawn from the HCI community, such as user studies and usability testing, ultimately test all four of Munzner’s levels at once. Applying these techniques to testing the prototypes presented in this thesis would therefore not provide a useful evaluation of the visual encoding technique.
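As an illustration of one such measurable aesthetic criterion, the number of edge crossings in a straight-line drawing can be counted with a standard orientation test. The sketch below is a minimal example under stated assumptions: node positions are given as 2D coordinates, edges are drawn as straight segments, and the function names are hypothetical. It is not drawn from any of the tools discussed here.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0 or -1."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def count_edge_crossings(pos, edges):
    """Count pairs of edges whose straight-line segments properly cross.

    Edges that share an endpoint are skipped; segments that merely touch
    or overlap collinearly are not counted as crossings.
    """
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue
        pa, pb, pc, pd = pos[a], pos[b], pos[c], pos[d]
        if (_orient(pa, pb, pc) * _orient(pa, pb, pd) < 0
                and _orient(pc, pd, pa) * _orient(pc, pd, pb) < 0):
            crossings += 1
    return crossings


# Four nodes on the corners of a unit square.
pos = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
crossed = count_edge_crossings(pos, [("a", "b"), ("c", "d")])    # diagonals
untangled = count_edge_crossings(pos, [("a", "c"), ("b", "d")])  # parallel sides
```

The pairwise comparison is quadratic in the number of edges, which is adequate for the small graphs typically used in evaluation studies.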

Evaluations of visualisation systems in the literature range from informal usability observations to formal controlled experiments designed to gather statistically significant results (Lee 2006). It seems that the more general purpose the visualisation tool or infrastructure, the less empirical evaluation can be employed. Weaver (2006), for example, opted for the Delphi method (cf. Linstone and Turoff 1975), a qualitative evaluation approach that elicits the opinions of users and arrives at a collaborative consensus. Aris (2008) adopted a “case study” approach for his thesis which consisted of long and short case studies. These studies involved conducting a series of sessions and eliciting feedback from the user.

An additional difficulty of this research is that it is largely focused on determining the potential of the visual encoding technique. However, as explained above, the benefits of visual features are intimately intertwined with the domain characterisation, the data abstraction design, and the algorithm design. Nonetheless, a common approach is to evaluate the tool or technique against a generally accepted visualisation taxonomy (see Morse, Lewis et al. 2000; Valiati, Pimenta et al. 2006). A taxonomy for the graph visualisation domain is proposed by Lee, Plaisant et al. (2006). Their taxonomy is focused on static representations of networks but can also be applied to dynamic network representations.

Lee’s (2006) thesis recommends and opts for a taxonomy of low-level tasks which facilitates the evaluation of the effectiveness of visualisation systems. Lee’s taxonomy was partially derived from the work of Amar, Eagan et al. (2005). Amar, Eagan et al. presented a set of ten low-level analysis tasks geared towards the facilitation of user analytic activity. They compiled the list of tasks by asking participants about how they would analyse five data sets from different domains.


Lee’s taxonomy presents a comprehensive catalogue of graph visualisation tasks and relevant examples, but it does not explicitly address the time dimension.

There has been recent work on creating taxonomies for temporal networks, such as the work conducted by Ahn, Plaisant et al. (2011). They propose a taxonomy of visualisation tasks for the analysis of network evolution. One of their goals was to learn about common strategies of existing techniques and, in addition, to discover strategies that were not yet in use. Their taxonomy suggests that visual analysis of networks is performed at three different levels of granularity: the node, group and network levels. In addition, they divide temporal analysis features into individual time event features and aggregated time event features. A central finding was a notable gap in the existing tools for visualisation of “rate of changes” in the aggregated time event features area.


Chapter 3. Attribute Based Graph Visualisation (ABGV)

This chapter examines the methods of visualisation that focus on attributes of the graph in addition to topology. It embodies the first small step in the research journey documented by this thesis. The sections below describe and demonstrate how the relationship of attributes (metadata) to the network topology can be visualised in accordance with the stated research aims.

Some of the techniques and tools discussed in subsequent chapters incorporate the importance of attributes and other contextual information touched on here and in section 2.6.3. As such, this chapter provides some of the context for the work described in subsequent chapters which is focused primarily on techniques of visualising the temporal dimension of social networks.

This chapter first discusses the requirement for visualisations of this nature and common approaches to satisfy this requirement. The ABGV technique combines the ideas of force-directed clustering described in Chapter 2.4.1 and that of a “dust and magnet” approach utilised in multivariate visualisation. The technique described in the following sections is demonstrated through the use of a proof of concept prototype application. It uses a force-directed approach where attributes are introduced as additional nodes of the graph which apply forces on a selected set of entity nodes.

The visualisation of a randomly generated graph using a standard node-link approach is compared to the ABGV method and the advantages and disadvantages are discussed. A preliminary evaluation is conducted based on published work which determined that (1) identifying central actors that bridge communities together, and (2) analysing roles and positions, are key tasks which visualisations should support. A more detailed analysis is deferred to Chapter 7.

3.1 Background

Social networks are typically rich in attributes associated with both actors (nodes) and relation links (edges). These attributes could consist of race, gender, age and affiliation, for example. In addition, the application of social network analysis (SNA) introduces additional attributes such as degree and centrality (Bezerianos, Chevalier et al. 2010). Each of these attributes or properties is termed a dimension of the network, which can be of discrete, continuous or categorical type. The process of analysing networks consisting of multiple dimensions is termed multivariate social network analysis. Understanding the effects of the many attributes on the network is seen as one of the main challenges of multivariate SNA (Bezerianos, Chevalier et al. 2010).

Researchers frequently attempt to correlate social network structure with domain attributes; however, many researchers have cited a lack of tools to facilitate this (Ahn, Plaisant et al. 2011). In addition, there is an urgent need for graph analysis techniques where node attributes and edge labels play a central role. However, very few tools support this type of analysis (Pretorius and Wijk 2008). The work described in this chapter begins to address the area of visualisation that assists in the correlation of network structure with attributes through the application of a simple visualisation technique.

The technique described in this chapter is a simple extension of a common visualisation approach which benefits both casual information visualisation users and experienced analysts alike. The advantage of being based on a common technique is that there is minimal effort required to implement it in the current suite of visualisation tools.

The technique described in this chapter was also motivated by the desire to explore possibilities of visualising the correlation of attributes and network structure using just one visualisation technique, thus avoiding the use of tightly coupled views. It concentrates on one dimension without the application of aggregation techniques. The intention of this approach is to depict and facilitate the investigation of the topology relative to the categories of the dimension. The technique is demonstrated through the use of a proof of concept prototype application.


3.2 Motivation and Inspiration

Analysts are often interested in defining and detecting substructures that may be present in a network. The divisions of actors into groups and substructures facilitate an understanding of how the network as a whole is likely to behave. In addition, knowing how an individual is embedded in the structure of a network may also be critical to understanding their behaviour (Hanneman and Riddle 2005).

Subgroups can be divided into two types: structural groups and domain groups. In the former, structural positions of the members decide the groupings. In the latter, some measure of similarity determines the groupings (Ahn, Plaisant et al. 2011). Analysts are also interested in the relationship between these two categories of groups; however, many of the approaches discussed in section 0 do not facilitate examining the relationship between the two.

When attempting to understand a social network, analysts are typically interested in the various roles and groupings in the network. This generally consists of finding the connectors, mavens, leaders, bridges/cutpoints, and isolates. Correspondingly, analysts investigating a network in terms of node attributes require a similar understanding of the roles of the actors. For example, ‘who are the “cutpoints” that span attribute groups?’ or conversely ‘who are the isolates overall and within attribute groups?’

A number of attribute-based visualisation techniques described in section 2.6.3 suffer from a loss of awareness of the topology of the underlying graph and are therefore supplemented with node-link visualisations to make up for a lack of awareness of the structural aspects. The desire to visualise the domain groups (based on attributes) together with structural aspects of the network provided the motivation for the approach described in this section, together with the desire for an approach that is relatively simple to use, understand and implement.

In an effort to extend the use of general multivariate information visualisation to inexperienced users, Yi, Melton et al. (2005) propose the Dust and Magnet approach. This is proposed as an alternative to traditional Multidimensional Scaling (MDS) techniques which, despite showing clusters in data sets, are poor at supporting the extraction of meaning due to the dimensionality reduction. Using the magnet metaphor, the attributes become magnets and the data points the dust which is attracted to the attributes. In relating this approach to graph visualisation, it is very similar to adding additional attractors representing attributes into the network.
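The magnet metaphor can be made concrete with a small sketch. The update rule below is an assumption made purely for illustration: each dust particle moves a fraction of the way towards every magnet, in proportion to its normalised value on that magnet's attribute. The function name `step_dust`, the `strength` parameter and the toy data are hypothetical and are not taken from Yi, Melton et al. (2005).

```python
def step_dust(dust_pos, dust_values, magnet_pos, strength=0.1):
    """Move every dust particle one step towards each magnet.

    dust_pos    -- dict particle -> (x, y)
    dust_values -- dict particle -> dict attribute -> value normalised to [0, 1]
    magnet_pos  -- dict attribute -> (x, y)
    strength    -- hypothetical global gain applied to every magnet
    """
    new_pos = {}
    for p, (x, y) in dust_pos.items():
        dx = dy = 0.0
        for attr, (mx, my) in magnet_pos.items():
            # Pull is proportional to the particle's value on this attribute.
            pull = strength * dust_values[p].get(attr, 0.0)
            dx += pull * (mx - x)
            dy += pull * (my - y)
        new_pos[p] = (x + dx, y + dy)
    return new_pos


# Two magnets and two data points starting at the origin.
magnets = {"price": (10.0, 0.0), "mpg": (0.0, 10.0)}
values = {"car_a": {"price": 1.0, "mpg": 0.0},
          "car_b": {"price": 0.0, "mpg": 0.5}}
dust = {"car_a": (0.0, 0.0), "car_b": (0.0, 0.0)}
for _ in range(20):
    dust = step_dust(dust, values, magnets)
```

After a few steps the particle with a high "price" value has drifted towards the price magnet and the other towards the "mpg" magnet, so spatial position can be read directly in terms of the named attributes rather than abstract reduced dimensions.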

A technique of applying node-link diagrams to visualise the contribution of attributes while preserving the structural topology is to cluster nodes based on node type. Figure 44 shows an approach implemented by Stolpnik (2009) in which a clustering algorithm is utilised whereby nodes are constrained to a visual area. However, in this instance, the importance of the nodes bridging each cluster can only be determined by comparing the number of incoming and outgoing links and not by spatial positioning.


Figure 44 - Node Type Clustered Layout. Source: (Stolpnik 2009)

In the next section I describe a method combining the concept of clustering around attributes and attribute visualisation. It is primarily targeted at users that are inexperienced in graph analysis but wish to explore how attributes of network entities are affected by or affect the topology. The technique stands alone, in contrast to tightly coupled approaches which take existing visualisation techniques and link them together to form a single tool. The visualisation technique described below is a proof of concept which demonstrates the value and feasibility of including it in existing tools.

3.3 Technique

A simple visualisation method is proposed that uses a standard force-directed layout based on the notion of spring forces as introduced by Eades (1984). Specifically, an augmented graph is constructed by adding a set of “dummy” vertices, where each vertex represents an attribute value. A similar approach was applied to visually cluster similar vertices and keep separate clusters visually distant from each other (Zhou, Cheng et al. 2009). In contrast, I propose using the dummy attractors as a simple means of clustering nodes around a set of mutually exclusive categorical attributes. Nodes possessing a particular attribute are attracted to the dummy node representing that attribute, in a similar way to the visualisation proposed by Yi, Melton et al. (2005). The name of the attribute is embedded in the visual representation to facilitate the cognitive mapping of the attribute space to two-dimensional space.

This method introduces additional nodes into the force-directed network that represent discrete or categorical attributes. The introduced attribute nodes have an attraction force which can be adjusted by the user. In addition, these nodes have a spring force between them with an adjustable natural spring length. This force strives to keep the attributes positioned roughly equidistant from each other in the two-dimensional space. The distance between these attributes is moderated by the force of intra-attribute ties in the network, and the resulting relative spatial positions of the attribute nodes are generally representative of the ‘closeness’ of attribute groups.

There are two additional adjustable forces. The first is between a node representing an attribute and those nodes exhibiting that attribute. The second is a repulsion force between nodes with dissimilar attributes. The positioning of nodes is affected by the ties between nodes in a similar way to a standard force-directed layout; however, it is also affected by the nodes’ ties to attributes. In this way, as well as the attribute nodes affecting the topology of the network, they themselves are affected by the structure of the network.
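The combination of forces described above can be illustrated with a minimal sketch. It is not the prototype's code (which is written in Java on top of Prefuse); it is a simplified Python stand-in that assumes plain Hooke-style springs, an inverse-square repulsion and a capped step size. The function name abgv_layout, all parameter names and their default values, and the six-person example are assumptions made for illustration only.

```python
import math
import random


def _push(pos, move, u, v, force):
    """Accumulate a force along the line u->v: positive attracts, negative repels."""
    dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
    dist = math.hypot(dx, dy) or 1e-9
    move[u][0] += force * dx / dist
    move[u][1] += force * dy / dist
    move[v][0] -= force * dx / dist
    move[v][1] -= force * dy / dist


def abgv_layout(entity_attr, ties, iterations=500, seed=1,
                attr_spring_len=4.0, member_spring_len=1.0, tie_spring_len=1.0,
                k_spring=0.05, k_repel=0.02, max_step=0.5):
    """Return {node: [x, y]} for the entities plus one dummy node per attribute."""
    rng = random.Random(seed)
    attr_node = {a: "attr:" + a for a in sorted(set(entity_attr.values()))}
    entities = list(entity_attr)
    dummies = list(attr_node.values())
    nodes = entities + dummies
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    # Springs: visible ties, invisible entity-to-attribute springs, and
    # invisible springs between every pair of attribute nodes.
    springs = [(u, v, tie_spring_len) for u, v in ties]
    springs += [(e, attr_node[a], member_spring_len) for e, a in entity_attr.items()]
    springs += [(a, b, attr_spring_len)
                for i, a in enumerate(dummies) for b in dummies[i + 1:]]

    for _ in range(iterations):
        move = {n: [0.0, 0.0] for n in nodes}
        for u, v, rest in springs:
            _push(pos, move, u, v, k_spring * (math.dist(pos[u], pos[v]) - rest))
        # Repulsion acts only between entities whose attributes differ.
        for i, u in enumerate(entities):
            for v in entities[i + 1:]:
                if entity_attr[u] != entity_attr[v]:
                    dist = max(math.dist(pos[u], pos[v]), 0.1)
                    _push(pos, move, u, v, -k_repel / dist ** 2)
        for n in nodes:
            step = math.hypot(*move[n])
            scale = min(1.0, max_step / step) if step > 0 else 0.0
            pos[n][0] += move[n][0] * scale
            pos[n][1] += move[n][1] * scale
    return pos


# Hypothetical example: two departments joined by a single bridging tie.
DEPARTMENT_OF = {"a1": "HR", "a2": "HR", "a3": "HR",
                 "b1": "Finance", "b2": "Finance", "b3": "Finance"}
TIES = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3"), ("a3", "b1")]
layout = abgv_layout(DEPARTMENT_OF, TIES)
```

In this sketch the three adjustable quantities (the natural length of the attribute-to-attribute springs, the strength of the springs and the strength of the repulsion) are exposed as keyword arguments, mirroring the user-adjustable forces described in the text.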

An approach allowing the user to control the layout by adjusting the edge forces was adopted. This is termed an “Edge Type Aware Layout” (Stolpnik 2009). This type of approach enables visual clustering of nodes according to their semantic information, emphasising different aspects of the data.

Figure 45 - ABGV visualisation

Figure 46 - Standard force directed visualisation of the same network as shown in Figure 45

Figures 45 and 46 depict the same fictional organisation. Both figures use similar force-directed approaches to lay out the nodes of the graph. This consists of a model whereby nodes are connected via springs which have user-configurable natural lengths. However, Figure 45 also includes attractor nodes, each representing an attribute category.

Classic force-directed graph drawing algorithms strive for uniform edge lengths in the quest for “favourable” aesthetics. In many force-directed approaches, near uniform edge lengths are achieved by spring forces which have a natural spring length. Hence an algorithm that seeks a global minimum energy state seeks one where the spring forces are as close to their natural state as possible. The ABGV approach uses spring forces; however, not all of them need to employ a uniform natural spring length. An attribute node is connected by an invisible spring to all other attributes. From a user’s perspective the primary principle is to adjust the forces between attribute nodes such that the attribute nodes largely encapsulate the network nodes.

The addition of the attribute nodes results in a tendency for nodes with a particular attribute to be attracted towards their attribute node, to which they are invisibly connected via a spring force. The repulsion force also tends to move nodes with dissimilar attributes away from each other. The combination of these forces can be viewed as a simple mechanism of clustering nodes around their attribute node.

The attribute nodes are visible and are adjustable in size; this creates node association with attributes via both colour and proximity.

The attribute nodes are larger (adjustable) and are labelled with the attribute.

The colour coding of an attribute and node possessing that attribute are the same, allowing the user to infer the attribute-node association. The natural tendency for nodes to cluster around their associated attribute distributes the nodes based on attributes but allows identification of those nodes bridging attribute groups.

Referring to Figure 45 it is also evident that the distribution of the nodes around an attribute can be indicative of a particular characteristic. In the example shown above some departments are insular whereas others are more collaborative.

3.4 Data Structures and Design Choices

The graphs visualised by the prototype are described in GraphML, an XML description language which describes the nodes, edges and the attributes of nodes and edges. The prototype described in this section tests the visualisation technique with node attributes. Testing this technique with edge attributes is left for future work.

The data set was randomly generated with a graph generation tool constructed by Fabien Viger11 using the algorithm described in (Viger and Latapy 2005). This network represents ties to others in the organisation and consists of 31 nodes with a “heavy-tailed” distribution of links. The ties can be thought of as interactions via email beyond a set threshold.

The assignment of nodes to attributes and total number of nodes was accomplished with a random number generator and a random sequence generator12.

After the generation of the graph, additional steps were undertaken to introduce disconnected nodes into the network and to randomly assign attributes to nodes.

11 The random graph generation tool is published under the GNU General Public License and is available from http://fabien.viger.free.fr/liafa/generation/ [12/3/2012]

12 The random number generator and random sequence generator are available from http://www.random.org/ [1/2/2014]

The disconnected nodes were randomly selected from the nodes that had a degree of one with the number of possible disconnected nodes being between 0 and 5.

The resultant network is intended to represent an organisation consisting of 31 people. The number of attributes was selected to be four to avoid the resultant visualisation being too trivial while still allowing analysis of the technique.
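The data preparation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the graph generator here is a simple degree-proportional attachment stand-in and not the Viger and Latapy algorithm, Python's pseudo-random generator replaces random.org, and the function names and the seed are hypothetical. The four department names are those that appear in the discussion of Figure 45.

```python
import random

DEPARTMENTS = ["HR", "Finance", "Marketing", "Purchasing"]


def toy_heavy_tailed_tree(n, rng):
    """Stand-in generator: each new node attaches to an endpoint drawn in
    proportion to degree, so a few hubs emerge."""
    edges, endpoints = [(0, 1)], [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints += [new, target]
    return edges


def prepare_dataset(n=31, seed=7):
    rng = random.Random(seed)
    edges = toy_heavy_tailed_tree(n, rng)

    degree = {v: 0 for v in range(n)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Disconnect between 0 and 5 nodes chosen at random from the degree-one nodes.
    leaves = [v for v in range(n) if degree[v] == 1]
    k = rng.randint(0, min(5, len(leaves)))
    isolates = set(rng.sample(leaves, k))
    edges = [(u, v) for u, v in edges if u not in isolates and v not in isolates]

    # Assign exactly one of the four mutually exclusive attributes to every node.
    attribute = {v: rng.choice(DEPARTMENTS) for v in range(n)}
    return edges, attribute, isolates


edges, attribute, isolates = prepare_dataset()
```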

The intention of this technique is to explore the relationship of attributes to the topology of the network. This technique will, however, have limitations on the number of attributes that can be explored. In addition, it will be subject to the same limitations that apply to all node-link representations, namely increased clutter and occlusion as the network density increases.

The intention of this approach is to provide a simple means of examining relationships between nodes, their attributes and the graph sub-structures whilst still preserving the global structure of the graph and hence the user’s mental map.

3.5 Discussion

The application of this technique assists analysts in performing community analysis. Those undertaking community analysis desire techniques to group actors by attributes and study the connection patterns within each group (Henry, Fekete et al. 2007). The visualisation technique builds on the most common graph representation: the node-link diagram. The representation is therefore more likely to be familiar to users and designers of social network visualisations.

It is well documented that the readability of node-link representations is particularly poor on dense networks or areas of networks that are locally dense. This technique is based on the node-link representation and therefore it too will suffer readability problems on dense networks. This technique does not overcome any of the inherent problems of node-link representations. Rather, this technique offers a mechanism that uses a force-directed graph to improve the users’ ability to conduct the three principal tasks described in the next section. As graphs become denser, conducting these analytical tasks becomes more difficult, just as the utility of standard node-link representations decreases under increasing graph density.

As this technique is a simple extension of standard node-link force-directed visualisation, it could easily be included in existing tools to facilitate the examination of connection patterns in relation to attribute groups. The parameter settings for the force-directed algorithm were tuned manually for both the traditional and ABGV layouts. Further work on this visualisation technique could include automatic methods of optimally positioning the attribute nodes. However, the approach described here requires a degree of user interactivity which, according to many researchers, is the cornerstone of exploratory analysis (see section 2.5.3).

This approach has been tested on small graphs that do not utilise the aggregation of nodes. This was done deliberately to ensure ‘faithfulness’ in information visualisation (see section 7.1.8 in the evaluation section for a discussion of task faithfulness). Nothing precludes utilising this visualisation approach on aggregated networks; however, further work is required to test its viability on such networks.

Figure 45 represents a network in which there are four mutually exclusive attributes of interest possessed by nodes in the network. In addition to the limitation on network size, the tests I have conducted have shown a limitation on the number of attributes this visualisation technique is able to effectively support. When three attribute attractors are displayed there is a minimal impact on the layout; however, as more attribute attractors are placed in the visualisation, it becomes more likely that the resulting spatial positioning will not accurately reflect a proportional node/attribute relationship.

3.6 Evaluation

Each of the prototypes presented in this thesis is analytically evaluated in Chapter 7. In this section the performance potential of the technique is appraised against three principal tasks in social network analysis as identified by Henry, Fekete et al. (2007) and influenced by the work of Wasserman and Faust (1994) and Scott (1991).

They are:

• (T1) identify communities, i.e. cohesive groups of actors that are strongly connected to each other;

• (T2) identify central actors, i.e. actors linked to many others or that bridge communities together;

• (T3) analyse roles and positions, which are higher level tasks relying on the interpretation of groups of actors (positions) and connection patterns (roles).

These tasks are used as the basis for assessing the utility of the proposed visualisation technique. As outlined in this chapter, attributes play an important part in a network. Most existing visualisation approaches do not assist in determining the relationship attributes have to the network topology. This visualisation technique aims to allow users to assess a network consisting of mutually exclusive discrete attributes in terms of the three tasks.

This visualisation technique is designed to be applied to datasets in which we know a priori the categorical attribute to which a given node has been assigned. If we identify those possessing an exclusive attribute as separate groups of interest, then the Henry, Fekete et al. task set can be restated as a set of attribute-based tasks:

• Task 1, identify intra and inter attribute communities, i.e. cohesive groups of actors that are strongly connected to each other in groups possessing the same attribute or different attributes.

• Task 2, identify central actors. That is, those that are connected to many others and bridge attribute groups.

• Task 3, the analysis of roles and positions equates to the higher level task of interpreting the significance of groups of actors and connection patterns in relation to attribute groups. This task is informed by both Task 1 and Task 2.

Considering the network shown in Figures 45 and 46, a situation is proposed where an analyst is confronted with the questions: “who are the important individuals that provide an interface into each department?” and “how is the interface achieved?” Answering these questions using a traditional node-link force-directed visualisation (shown in Figure 46) presents a challenge. Although the traditional visual representation does make clear that Renetta is connected to several disconnected nodes from various departments, it does not highlight the significance of her role as a link between the HR and the Purchasing departments. Her links with Paul, Mitchell and Claude are important in terms of the HR and Purchasing relationship. However, this is not immediately apparent in the standard attribute colour-coded graph in Figure 46.

Identifying intra and inter attribute communities is easier using Figure 45 than using Figure 46. Using Figure 45 it is easy to determine, for example, that Brooke and Terry are the most active in bridging attribute communities. It is also clear that Brayden, Aidan and Conner provide the primary interface to HR for the Finance and Marketing areas but only Claude, Mitchell and Paul provide the interface to the Purchasing area. In contrast, although one is able to discern from Figure 46 that Brooke and Terry have a high degree, their role in bridging communities is not clear, nor is the role of Brayden, Aidan and Conner. Furthermore, although one is able to discern that Renetta is the only one with connections to Mitchell, Paul and Claude, the importance of this relationship in the context of the complete network is not clear.

The spatial positioning of nodes clearly shows ‘isolates’ within attribute communities or across them. Typically, force-directed layouts do not show ‘isolates’; if they do, the isolates tend to drift to the extremities of the visualisation because they are not bound by a force to any other node. The ABGV layout overcomes this problem as a node is always bound to an attribute node. This produces the effect whereby an ‘isolate’ hovers close to its parent attribute node and, via repulsion forces, as far as it can be from all other attribute nodes.

The discussion of the analysis above relates strongly to attribute-based tasks one and two. The improved ability to perform these attribute-based tasks facilitates an improvement of the higher level task of interpreting the significance of groups of actors and connection patterns in relation to attribute groups, namely task three.

3.7 Implementation

3.7.1 Data

The visualisation technique is designed to be applied to datasets in which we know a priori the categorical attribute to which a given node has been assigned. Therefore, it is assumed the dataset to be used by this technique contains a mutually exclusive categorical attribute assigned to each node.

The data was constructed in the GraphML13 file format where the accepted format is defined by the GraphML schema 14. The composition of individual nodes in the GraphML file is shown below, where the data key “attribute1” defines the category of the node.

Data Schema

Node Example

Xxxxx

M

deparementX

person
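As a rough illustration of how such a node could be read, the sketch below parses a small hypothetical GraphML fragment with Python's standard library. Only the data key “attribute1” is taken from the description above; the other key names (name, gender, type), the node identifiers, the second node and the sample values are assumptions made for the example, and the function name read_categories is hypothetical.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="gender" for="node" attr.name="gender" attr.type="string"/>
  <key id="attribute1" for="node" attr.name="attribute1" attr.type="string"/>
  <key id="type" for="node" attr.name="type" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0">
      <data key="name">Xxxxx</data>
      <data key="gender">M</data>
      <data key="attribute1">departmentX</data>
      <data key="type">person</data>
    </node>
    <node id="n1">
      <data key="name">Yyyyy</data>
      <data key="gender">F</data>
      <data key="attribute1">departmentY</data>
      <data key="type">person</data>
    </node>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>"""


def read_categories(graphml_text, category_key="attribute1"):
    """Map each node id to the value of its category data key (None if absent)."""
    root = ET.fromstring(graphml_text)
    categories = {}
    for node in root.iterfind("g:graph/g:node", NS):
        data = {d.get("key"): (d.text or "") for d in node.iterfind("g:data", NS)}
        categories[node.get("id")] = data.get(category_key)
    return categories


categories = read_categories(SAMPLE)
```

The distinct values returned by such a function would determine how many attribute nodes need to be added to the layout.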

13 GraphML may be used free of charge and a description can be found at http://graphml.graphdrawing.org/ [accessed 18/07/2013]

14 The GraphML schema can be found at http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd

3.7.2 Program Code

The implementation makes use of the Prefuse visualisation toolkit which provides an extensive visualisation framework for the Java® programming language15. The technique described in this section uses simulated forces to help discern nodes that bridge communities together and facilitates analysis of connection patterns (roles) using a modification of the standard force-directed layout algorithm.

The implementation of the layout mechanism utilises the Prefuse layout and force-directed visualisation components. The concept demonstrator described in this section uses a customised force-directed layout built on the Prefuse ForceSimulator class. The force simulator uses the Prefuse RungeKuttaIntegrator for the positioning of the nodes and is supplemented with the NBodyForce and DragForce for stability.

The NBodyForce computes an n-body force such as gravity, anti-gravity, or the results of electric charges and implements the Barnes-Hut algorithm for efficient simulation when there are a large number of nodes in a network. This aids in computing all the forces acting on a node and is performed in O(N log N) time, where N is the number of nodes. The DragForce simulates a viscosity or drag on a node to help stabilise nodes and prevent node jitter.
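The stabilising role of a drag term can be seen in a very small stand-alone example. The sketch below is not Prefuse code and does not use its API; it integrates a single unit-mass node attached to a fixed anchor by a spring, using a classical fourth-order Runge-Kutta step, once with and once without a velocity-proportional drag. The function names, the one-dimensional setting and all numeric values are assumptions chosen for illustration.

```python
def rk4_step(x, v, dt, accel):
    """Advance position x and velocity v by dt with classical 4th-order Runge-Kutta."""
    k1x, k1v = v, accel(x, v)
    k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x, v + dt * k3v)
    x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x, v


def simulate(drag, steps=2000, dt=0.1, stiffness=1.0, rest=1.0):
    """Unit-mass node on a spring of natural length `rest`, anchored at the origin."""

    def accel(x, v):
        return -stiffness * (x - rest) - drag * v   # spring force plus drag

    x, v = 2.0, 0.0                                 # start stretched and at rest
    for _ in range(steps):
        x, v = rk4_step(x, v, dt, accel)
    return x, v


x_damped, v_damped = simulate(drag=0.5)   # settles at the natural spring length
x_free, v_free = simulate(drag=0.0)       # keeps oscillating
```

With drag the node comes to rest at the natural spring length, whereas without it the node continues to oscillate about that position, which in a rendered layout would appear as jitter.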

15 Prefuse is released under the terms of a BSD (Berkeley Software Distribution) license and is available at https://github.com/prefuse/Prefuse [accessed 26/07/2013]

The description of the code herein will not include details of implementation of the classes available in the Prefuse framework. For a detailed discussion of those refer to the Prefuse package documentation.

The customised force directed layout implements forces between nodes based on their type. Different forces are applied dependent on the type of nodes represented in the relationship. The force relationships controlled by the layout algorithm consist of:

• Attribute node to attribute node

• Attribute node to entity node exhibiting that particular attribute

• Attribute node to entity node exhibiting an alternate attribute

• Entity node to an entity node with similar attribute

• Entity node to an entity node with an alternate attribute.

The differences in forces controlled by the layout algorithm consist of:

• Natural spring length between nodes

• The mass of the nodes

• Strength of the force between nodes

The pseudo code in Appendix B – Pseudo Code describes the method of determining the factors listed above. The application is event driven. That is, interactions with the application by the user cause events to be generated which initiate re-rendering.
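One way to organise the five force relationships and three factors listed above is as a lookup keyed on the types of the two nodes involved. The sketch below is not the Appendix B pseudo code; it is a hypothetical Python illustration in which the class Node, the relationship labels and every numeric value are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str        # "attribute" or "entity"
    attribute: str   # the category the node represents or possesses


def relationship(a, b):
    """Classify a pair of nodes into one of the five force relationships."""
    kinds = {a.kind, b.kind}
    same = a.attribute == b.attribute
    if kinds == {"attribute"}:
        return "attribute-attribute"
    if kinds == {"attribute", "entity"}:
        return "attribute-own-entity" if same else "attribute-other-entity"
    return "entity-same-attribute" if same else "entity-other-attribute"


# Hypothetical values: (natural spring length, force strength).
# A negative strength stands for repulsion.
FORCE_TABLE = {
    "attribute-attribute": (200.0, 1e-4),
    "attribute-own-entity": (50.0, 5e-5),
    "attribute-other-entity": (150.0, -1e-5),
    "entity-same-attribute": (40.0, 2e-5),
    "entity-other-attribute": (80.0, -2e-5),
}

# Hypothetical node masses by node type.
MASS = {"attribute": 5.0, "entity": 1.0}


def spring_parameters(a, b):
    return FORCE_TABLE[relationship(a, b)]


hr = Node("HR", "attribute", "HR")
finance = Node("Finance", "attribute", "Finance")
ann = Node("Ann", "entity", "HR")
bob = Node("Bob", "entity", "Finance")
```

In an interactive tool the table entries would be bound to user controls, so that a change to a slider replaces a value in the table and triggers a re-layout.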

3.8 Summary and Conclusion

This chapter presented a technique which is a simple extension of existing force directed layout approaches and as such is one which is easy for users to understand and easily implementable by visualisation developers. Nodes are related in both colour and proximity to named attributes. This facilitates the visualisation user in maintaining a consistent mental map. Users interactively adjust parameters such as the repulsion and attraction strength between nodes and attribute nodes and the natural spring length between attribute nodes to facilitate exploration of the network.

The prototype demonstrated the feasibility of assisting users to quickly identify possible relationships between attributes and the structure of the network in a single view. The datasets applied to the prototype consisted of mutually exclusive categorical attribute dimensions. Using this type of dataset, the prototype demonstrated the capacity to assist in community analysis and in particular the task of identifying central actors that bridge communities together when compared with traditional force directed layouts.

The approach does have a number of clear limitations, including only being effective on attribute domains with a relatively small cardinality and being subject to the same occlusion and readability problems as typical force-directed approaches. However, it does have the advantages of being a simple addition to the visualisation options of existing force-directed layout tools and of being familiar to users who have seen force-directed network visualisations. The evaluation section (Chapter 7) presents a more detailed examination of this visualisation approach against established task taxonomies.

Social Network Analysis for Command and Control (SNAC2)

Chapter 4. Social Network Analysis for Command and Control (SNAC2)

The previous chapter focused on attributes of nodes and their relationship to the topology of the network. The data represents a summary of a network over a set time period. This chapter concentrates on events that occur in time and result in an ever-changing network. The distinction between the two views of the network is important, as most often the static network is a result of temporal interactions which reflect behaviours that cannot be detected by analysing the static view.

As noted in Chapter 2, dynamic network visualisation is challenging due to the complexity introduced by the extra dimension of time. Despite the recent work that has been devoted to dynamic network visualisation there is still room to improve upon existing approaches. This chapter sets the foundation for the following chapters which use novel visualisation techniques and draw on the concepts outlined herein to answer the research question, ‘how can alternative visual encoding approaches aid in performing fundamental temporal tasks on social networks’.

The Social Network Analysis for Command and Control (SNAC2) tool focuses on aspects of the data that motivated the work described in Chapter 3, specifically the importance of contextual and attribute information in the analysis of social networks. The SNAC2 tool views communication acts as temporal relational events. It provides the ability to select a particular set of events and analyse them as a phase. For example, in analysis work undertaken with the aid of the tool, phases were aligned to the phases expressed in military doctrine.

There is no one definition of command and control (C2) with which everyone in the Defence community agrees. However, it can be generally stated that C2 is the process of commanding and controlling assigned and attached forces to achieve an objective. The mechanism to achieve C2 in a headquarters is via a sociotechnical system. It is this sociotechnical system that the Defence analysts were interested in studying particularly during the prosecution of dynamic targets.

The SNAC2 tool was successfully used by analysts to assess the efficiency and effectiveness of processes, and to identify chokepoints, inefficient work practices and the propagation of errors within the Combat Operations Centre of the Air and Space Operations Centre (AOC). Without the unique ability to visualise the state of the network at various points in time in concert with centrality and semantic information, the analysis of the military command and control activities would have been much more challenging. A detailed evaluation is discussed in Chapter 7 where the capabilities of techniques described in this thesis are presented and compared.

4.1 Background

The primary motivation for creating the Social Network Analysis for Command and Control (SNAC2) tool was to support the analysis of intensive command and control activities in military headquarters such as the prosecution of dynamic targets.

Analysts were interested in the temporal dynamics of the network and in particular what events triggered the flow of information, how the information was processed and travelled through the network, how the flow of information contributed to a shared situational awareness and how the decisions and actions were accomplished with the coordinated use of information.

Initial efforts in analysing data collected on the prosecution of dynamic targets necessitated discarding some of the contextual information that the field analysts intuitively considered important shaping factors of the command and control network. The goal of SNAC2 was to enable users with some domain experience to gain insight into the conduct of the prosecution of dynamic targets.

The development of SNAC2 was a team effort and my contribution was primarily centred on the visualisation of the network data. Therefore this section concentrates on aspects concerning social network visualisation but leaves the detail of specific analysis to other papers (Au, Lo et al. 2009; Lo, Au et al. 2011).

The data captured and used to test this prototype consisted of verbal communication, communication via a chat application, movement around the workspace, and actions undertaken by participants.

As explained in Chapter 2, traditional social network analysis quantifies social interactions in terms of network theory without consideration of the associated contextual information. In addition, standard approaches tend to aggregate relationships and either visualise the entire network or a series of static snapshots.

The set of tools currently available to view a network and contextual information is summarised in section 2.6.3. This summary is representative of the state of the art in visualising networks with semantic and attribute information.

The analysts using SNAC2 were initially motivated by the need to see the semantic information in conjunction with the evolving network. However, the attributes of individual nodes were not a concern for them. None of the tools surveyed provided a contextual display of the data while showing the changing interaction network, or the ability to partition the evolving network into phases.

SNAC2 attempts to supplement traditional social network analysis with the addition of content and features such as contextual mark-up. The tool provides the ability to categorise a sequence of events into phases using nominalist notions of time and flag important events. It was created as a consequence of no existing tool providing the ability to visualise the rich data captured by a team of defence analysts.

A prime motivation of the development of this tool is to answer questions about the impact of events (in this case largely communication events) on the network structure and display the evolution of temporal interactions over time.

The main display window of SNAC2 consists of five primary visual areas: a domain-specific Time Sensitive Target (TST) traffic light area, a network visualisation area, a temporal centrality scatterplot, an event and communication area, and a timeline slider widget (shown in Figure 47). Each of the separate areas updates its portion of the display in response to the timeline slider widget moving through time. Together these areas form a set of interactive Coordinated Multiple Views (CMV). CMV is a method that combines different visualisation techniques on the one screen using the same data. The tool allows users to filter and scroll through time whilst observing the changes in each view.

Figure 47 - SNAC2 Main Panel layout (labelled areas: Time Sensitive Target (TST) traffic light; network graph and rolling measures of centrality (scatter plot) areas; event and communication area; timeline slider)

The tool implements a range of layout algorithms which include:

• Fruchterman Reingold (1991)

• Kamada Kawai (1989)

• Simple Spring Layout (Eades 1984)

• Simple Circle Layout

The main display currently has two tabs, “graph” and “motion chart”; selecting between them swaps the main network display with the mini temporal scatterplot. In addition, the tool provides a number of other panels which allow the user to select events and assign them to user-defined phases, allowing a comparison of the graphs and metrics of these phases.

The Traffic Light area is a domain-specific indicator of the progression through the dynamic targeting process. This gives additional context to the network as it develops over time. These indicators relate to the doctrine that guides the process, specifically the find, fix, track, target and engage steps.

4.2 Similar Work

Gloor and his colleagues developed a tool called TeCFlow to explore the evolution of social networks over time (Gloor and Yhao 2004; Gloor, Laubacher et al. 2004). Treating emails between actors as an approximation of social ties, Gloor applies the Fruchterman-Reingold force-directed layout algorithm (Fruchterman and Reingold 1991) to display daily communication patterns. This is achieved by applying a ‘sliding time frame algorithm’ which provides a ‘window’ of a variable number of days of highlighted communication. By default, old communication activities (outside the window) are included in the layout of the graph, and the existing old links do not decay. These old links are displayed as dimmed arcs but still affect the ongoing layout of the graph.

SNAC2, unlike Gloor’s TeCFlow, allows for the selection of various layout algorithms. However, a similar sliding window technique is employed. Independent of which layout algorithm is chosen, links lying outside of the variable time window decay and lose their influence on the layout. Gloor’s persistent ties approach is predicated on the assumption that ties are established once an email message has been sent and that this message has an enduring effect on the topology of the network.

Gloor’s Temporal Social Surfaces offered a “no history” mode in which ties do decay (Gloor 2005). He noted that in the persistent link (“history”) mode centralities get smoothed out, whereas in “no history” mode centralities tend to oscillate wildly. However, in an intensive dynamic event-driven environment it is precisely these variations that one is interested in observing and drawing conclusions from.
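The difference between the two modes can be made concrete with a small sketch. It assumes events are (time, sender, receiver) tuples and treats the window as a hard cut-off, which simplifies the gradual decay described above; the function names and the example events are hypothetical, and degree is used as the simplest stand-in for a centrality measure.

```python
from collections import Counter


def edges_in_window(events, now, window):
    """'No history' view: only events with now - window < time <= now count."""
    return Counter((min(a, b), max(a, b))
                   for t, a, b in events if now - window < t <= now)


def edges_with_history(events, now):
    """'History' view: every event up to `now` leaves a persistent tie."""
    return Counter((min(a, b), max(a, b)) for t, a, b in events if t <= now)


def degree(edge_counts):
    """Number of distinct neighbours per actor."""
    deg = Counter()
    for u, v in edge_counts:
        deg[u] += 1
        deg[v] += 1
    return deg


EVENTS = [(1, "A", "B"), (2, "A", "C"), (3, "B", "C"), (8, "C", "D"), (9, "D", "C")]

recent = degree(edges_in_window(EVENTS, now=9, window=3))
persistent = degree(edges_with_history(EVENTS, now=9))
```

At time 9 the windowed view contains only the exchange between C and D, whereas the persistent view still credits A and B with the ties they formed at the start, which is the smoothing effect noted above.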

In an environment where the number of actors is small, the communication events are numerous and the pattern of communication fluctuates, the enduring effects are less important than viewing the topology changes as they occur through time. Traditional graph representations tend to calculate the layout of the whole graph, use the resulting minimum-energy layout as fixed node positions and animate the graph over time using these fixed positions. In this prototype node positions are calculated continuously.

4.2.1 Continuous Node-Link Layout

Approaches that use node-link representations to study network changes over time typically use animation or the comparison of snapshots of the state of the graph at different times. Those that have used this approach (Gloor, Laubacher et al. 2004; Moody, McFarland et al. 2005) generally calculate the network layout for a moving time window. For each successive window the layout is computed as a key-frame and for periods between these key-frames they compute and render interim frames.

Examination of this approach by (Trier 2008) shows that the rendering of transition frames disturbs the impression of the evolution of network structures.

Trier proposed a method used in the tool Commetrix (Trier 2006) whereby nodes and edges are dynamically added to and removed from the graph and their impact is seen immediately by the force-directed layout seeking a new energy minimum. He suggests that this technique is best suited to environments where the user has the ability to control the timeline. For every adjustment of the time parameter $t \in T$, a layout algorithm $\mathcal{L}_G(t) \mapsto \mathcal{L}_G(t+1)$ continually updates the visual layout (Peterson 2011). A similar continuous layout approach is applied in this application.
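The continuous layout idea can be sketched as follows. This is a minimal illustration in Python, not the SNAC2 implementation (which uses Java and JUNG): the force model (pairwise repulsion k²/d, attraction d²/k along links), the step size `dt`, the cap `max_move` and the four-node example are all assumptions chosen for brevity. The point it shows is that each new time window continues from the current node positions rather than recomputing the layout from scratch.

```python
import math


def layout_step(pos, edges, k=1.0, dt=0.02, max_move=0.05):
    """Advance every node one small step under spring-embedder forces."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Every pair of nodes repels with magnitude k^2 / d.
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k * k / d
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
    # Linked nodes attract with magnitude d^2 / k.
    for a, b in edges:
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        force[a][0] -= f * dx / d
        force[a][1] -= f * dy / d
        force[b][0] += f * dx / d
        force[b][1] += f * dy / d
    new_pos = {}
    for n, (x, y) in pos.items():
        mx, my = dt * force[n][0], dt * force[n][1]
        length = math.hypot(mx, my)
        if length > max_move:  # cap the step so nodes glide rather than jump
            mx, my = mx * max_move / length, my * max_move / length
        new_pos[n] = (x + mx, y + my)
    return new_pos


def update_layout(pos, edges, iterations=200):
    """Continue from the current positions; the layout is never restarted."""
    for _ in range(iterations):
        pos = layout_step(pos, edges)
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical four-node example: the link moves from A-B to A-C
# as the time window advances.
start = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0)}
window_1 = update_layout(start, [("A", "B")])
window_2 = update_layout(window_1, [("A", "C")])
```

Because `window_2` starts from the positions in `window_1`, node A drifts from B towards C, and it is this visible drift that signals the change in relationships.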

This approach allows the user to instantly see the topology of the network when particular nodes are deselected. More importantly, when a window or a frame of reference is moved in time, topological effects can be seen through the movement of nodes. This movement, for instance, draws attention to central nodes changing their position in the network as we move through the timed events, and alerts the user to points in time where significant changes occur. These changes can then be compared with the semantic information, which might offer clues to their cause, and analysed further to determine whether the change was appropriate and whether remedial action in a process or structure might be required.

This approach is also ideal when analysing network changes across various phases of an activity, so this functionality became an important design feature of SNAC2. By playing and replaying the network and observing the changing network topology together with the centrality measures for the links in the current time window, it is possible to gain an understanding of network topology changes through the course of the activity and which actors were critical to each phase.

Trier also proposed adding inertia to nodes based on their degree to reduce “unnecessary node movement”. SNAC2 uses the continuous layout approach but does not introduce extra inertia for highly connected nodes. Observations of SNAC2 with the C2 dataset suggest that adding inertia to nodes may not be necessary and may negatively affect node movement. Nodes of high degree that tend to have “excessive” movement are slowed naturally by virtue of the force created by the connections with other nodes. Movement that does occur may be an important indication of a change in relationships. As described above, it is these oscillations in network position that we are particularly interested in.


The advantage of this approach is that no additional computation is required beyond that of a force-directed layout. It may also reduce the amount of memory required, because only the subset of relationships in the time window is displayed.

4.3 Event Based Analysis + Content

The text-based event and communication area provides a time-stamped transcript of the activities occurring in the adjustable time window. In addition, it highlights events the analysts have marked as belonging to a particular category.

Dynamic targeting can be divided into six distinct phases: Find, Fix, Track, Target, Engage and Assess (F2T2EA).

The “Find” phase refers to the process of detecting a target for prosecution using intelligence or surveillance assets. The “Fix” phase is concerned with methods of making an accurate determination of the probable target location. The “Track” phase focuses on monitoring activity and movement of the probable target. The “Target” phase endeavours to match an asset and method of attack with the desired effect and gains the appropriate approvals. Orders are transmitted and confirmed by the prosecution asset and the action is monitored in the “Engage” phase. The “Assess” phase examines the results of the engagement to determine if the desired effects were achieved.

The ability to mark up the data with meta-data enables the categorisation of events into particular phases. This in turn enables a more focused and differentiated view of the network, facilitating an understanding of how the network changes in response to key events and how this relates to the movement between phases.

In cases of prosecuting simultaneous targets, various nodes in the network are operating in different phases. SNAC2 supports access to the content and the context of events, providing the analysts with the only method of determining the node-phase relationship at points in time. Without access to the content of the many communication acts it is impossible to determine communication latencies. Latencies are important in any communication network as they indicate the time during which nodes in the network are working with ‘stale’ data.

Graph centrality measures are typically used to find influential or important nodes in a network. Centrality is often associated with the amount of information a node has or has access to. For instance, Leavitt (1950) considers network centrality to be a measure of the availability of information necessary for solving a problem and the availability of information to be of prime importance in affecting one’s behaviour. However, information in the network is not available to everyone simultaneously; it takes time to propagate through the network and the fastest route is not always via the shortest path (Kossinets, Kleinberg et al. 2008).

As described in section 2.3, most centrality measures use geodesic distances to calculate a value which defines a node’s centrality. Most distances between nodes are based on the presence of a link between nodes or, in the case of a valued network, the strength of the link between nodes. When using communication or interaction networks as a proxy for a real social network, customary measures of centrality may not be adequate and may point to nodes which are not central at critical time periods. Prevailing measures of centrality do not consider the timing and sequencing of event links. Centrality is volatile and dependent on time, reflecting a temporal use of the network by individuals to carry out organisational tasks (Trier 2008).
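To illustrate how centrality depends on the time window, the sketch below computes degree centrality using only the events that fall inside a given window. It is an assumption-laden example rather than the SNAC2 metric code: the event tuples `(seconds, speaker, listeners)`, the function name and the sample events are hypothetical, and degree centrality stands in for whichever measures the tool actually reports.

```python
from collections import defaultdict


def windowed_degree_centrality(events, start, width):
    """Degree centrality from events with start <= t < start + width.

    All nodes seen anywhere in the data are kept in the result, so a node
    that is silent in this window simply scores 0.
    """
    nodes = set()
    for _t, speaker, listeners in events:
        nodes.add(speaker)
        nodes.update(listeners)
    neighbours = defaultdict(set)
    for t, speaker, listeners in events:
        if start <= t < start + width:
            for listener in listeners:
                if listener != speaker:
                    neighbours[speaker].add(listener)
                    neighbours[listener].add(speaker)
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: len(neighbours[node]) / (n - 1) for node in sorted(nodes)}


# Hypothetical events: (seconds from start, speaker, listeners)
events = [
    (0, "DTO", ["SIDO"]),
    (30, "DTO", ["SODO", "Legal"]),
    (200, "SIDO", ["SODO"]),
    (230, "SIDO", ["Legal"]),
    (250, "SIDO", ["DTO"]),
]
early = windowed_degree_centrality(events, start=0, width=120)
late = windowed_degree_centrality(events, start=180, width=120)
```

In this toy data the most central operator changes from DTO in the early window to SIDO in the late window, a shift that an aggregate over the whole period would smooth away.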

The SNAC2 tool visualises the progression of the network topology via the force-directed layout as a consequence of event sequencing. This is displayed in concert with the context of the event. In the data recorded for the Dynamic Targeting study the “context” consisted of the content of communications (both written and verbal) and actions undertaken by the participants in the study.

Visualising the changing topology concurrently with the contextual source of the links during a time period allows analysts to visually explore the progression of events with a richer understanding of the events and how their sequencing is responsible for the network topology or vice versa.

4.4 Implementation

The implementation of SNAC2 was a collaborative effort of three Defence analysts, two of whom designed and implemented the tool. This chapter describes the visualisation aspects of the tool which I was responsible for designing and implementing. These visual aspects draw on the ideas outlined in the previous chapters. The tool was implemented primarily to aid in understanding the F2T2EA process and to assist in identifying avenues for optimising the process within an Air Operations Centre.

4.4.1 Data

SNAC2 uses four primary input files. However, the only one that is significant to the visualisation aspects described in this chapter is the “data” input file. An example of a small portion of a data input file is shown in Figure 48. Internally the tab-delimited file is parsed and used to create an internal representation. Central to this representation is the SNAMetrics3D Class (see Appendix B).

The data fields that are essential to the graph visualisation are “speaker”, “listeners”, “date” and “time”. The speaker field describes a directed arc to one or many listeners. The date and time fields are important in terms of the temporal visualisation aspects. In concert with the date and time fields, the selection of the “operators” and the “window time” from the control panel are also important (see Figure 49).

Figure 48 - An example of a data input file for SNAC2
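As a rough sketch of how such a record could be turned into directed arcs, the Python example below parses a tab-delimited text with the four fields named above. The actual SNAC2 file layout is not reproduced here, so the header row, the comma-separated “listeners” field, the date and time formats and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: the column order, header row and formats are assumed.
SAMPLE = (
    "date\ttime\tspeaker\tlisteners\n"
    "01/06/2008\t10:15:30\tDTO\tSIDO,SODO\n"
    "01/06/2008\t10:16:05\tSIDO\tDTO\n"
)


def parse_events(text):
    """Read tab-delimited rows into event dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    events = []
    for row in reader:
        when = datetime.strptime(
            row["date"] + " " + row["time"], "%d/%m/%Y %H:%M:%S"
        )
        listeners = [
            name.strip() for name in row["listeners"].split(",") if name.strip()
        ]
        events.append(
            {"when": when, "speaker": row["speaker"].strip(), "listeners": listeners}
        )
    return events


def to_arcs(event):
    """One directed, time-stamped arc from the speaker to each listener."""
    return [
        (event["speaker"], listener, event["when"])
        for listener in event["listeners"]
    ]


events = parse_events(SAMPLE)
arcs = [arc for event in events for arc in to_arcs(event)]
```

A single utterance with several listeners therefore expands into several arcs that share one timestamp.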


The selection of parameters from this panel determines which nodes are displayed as the user moves the slider through time. An “operator” is a person performing a particular role in the conduct of the F2T2EA process. The abbreviations of the “operators” depicted in Figure 49 are:

AirspaceMgr – Air Space Manager

BCD – Battlefield Coordination Detachment

CCO – Chief of Combat Operations

C2DO – Command and Control Duty Officer

DTO – Dynamic Targeting Officer

ISRD – Intelligence Surveillance and Reconnaissance Division personnel

Legal – Legal Duty Officer

Radio – Radio Communications

SADO – Senior Air Defence Officer

SIDO – Senior Intelligence Duty Officer

SODO – Senior Operations Duty Officer

SOLE – Special Operations Liaison Officer

TDO – Tanker Duty Officer

The suffix 1 refers to the same roles conducted on an alternate shift roster.


The “window time” has a range of one to ten minutes, effectively providing a variable-width sliding window in which to view events over time. The population of the “operators” panel is derived from the speakers and listeners fields.

Figure 49 – Operator selection panel in SNAC2
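The interplay between the slider, the window time and the operator selection can be sketched as below. This is an illustrative Python example, not the SNAC2 code: the function names, the event dictionaries and the choice of a window that trails the slider position are assumptions.

```python
from datetime import datetime, timedelta


def operators_from(events):
    """Populate the operator list from the speaker and listener fields."""
    names = set()
    for event in events:
        names.add(event["speaker"])
        names.update(event["listeners"])
    return sorted(names)


def visible_arcs(events, slider_time, window_minutes, selected):
    """Arcs inside the window that ends at the slider position.

    Only arcs whose speaker and listener are both selected are returned.
    """
    if not 1 <= window_minutes <= 10:
        raise ValueError("window time must be between one and ten minutes")
    start = slider_time - timedelta(minutes=window_minutes)
    arcs = []
    for event in events:
        if start < event["when"] <= slider_time and event["speaker"] in selected:
            arcs.extend(
                (event["speaker"], listener)
                for listener in event["listeners"]
                if listener in selected
            )
    return arcs


# Hypothetical events
t0 = datetime(2008, 6, 1, 10, 0, 0)
events = [
    {"when": t0, "speaker": "DTO", "listeners": ["SIDO", "SODO"]},
    {"when": t0 + timedelta(minutes=4), "speaker": "SIDO", "listeners": ["DTO"]},
    {"when": t0 + timedelta(minutes=9), "speaker": "SODO", "listeners": ["DTO", "Legal"]},
]
everyone = set(operators_from(events))
narrow = visible_arcs(events, t0 + timedelta(minutes=10), 2, everyone)
wide = visible_arcs(events, t0 + timedelta(minutes=10), 10, everyone)
```

Widening the window from two to ten minutes brings earlier arcs back into view, while deselecting an operator removes every arc that touches it.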

4.4.2 Program Code

The application was written in the Java® programming language and tested with the Oracle® Java Development Kit (JDK 1.7)16, making use of the Java Universal Network/Graph Framework (JUNG) software library17 to render the node-link diagram.

16 The JDK is available at http://www.oracle.com/technetwork/java/javase/downloads/index.html accessed [23/07/2013]

The application has two panels which use a node-link visualisation of the time-windowed view of the network (one static and one dynamic). Interactions with the application by the user cause events to be generated which, in turn, initiate recalculations and re-rendering. As the tool is intended for specific tasks where the number of participants is low (fewer than 30), all nodes are displayed, even those that are disconnected at any point in time. As described in this chapter, this is done to help identify changing relationships over time.

One primary class is responsible for rendering the dynamic node-link representation. The NetworkGraphlet has two internal classes representing the links and the nodes of the graph. The pseudo code in describes the key aspects of the NetworkGraphlet.

4.5 User Evaluation

The tool (and associated concepts) was tested by analysts using data from a military exercise within an AOC environment. This data was collected on the relatively short but highly dynamic interactions required to prosecute a dynamic target. Dynamic targets are often of a fleeting nature and the ability to prosecute them in a timely manner is of concern to the military. This requires the group responsible for targeting within the AOC to have periods of very high intensity work. The analysts were interested in optimising the targeting process through improved work practices.

17 The JUNG library is licensed and made freely available under the Berkeley Software Distribution (BSD) license. JUNG is available at http://jung.sourceforge.net accessed [21/07/2013]

The simultaneous visualisation of the utterances, the state of the process, and the network topology throughout the conduct of the targeting missions enabled the Defence analysts to identify the duration of sub-tasks and to understand and decompose the workflow. This in turn enabled analysts to compute state transition probabilities for branched workflows (which were later used in modelling activities).
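One simple way to estimate such state transition probabilities is to count how often each state is followed by each other state across the observed missions. The Python sketch below assumes this first-order counting approach; the method actually used by the analysts is not described in this chapter, and the two phase sequences are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for current, followers in counts.items()
    }


# Hypothetical missions: the second one branches back from Track to Fix.
missions = [
    ["Find", "Fix", "Track", "Target", "Engage", "Assess"],
    ["Find", "Fix", "Track", "Fix", "Track", "Target", "Engage", "Assess"],
]
probabilities = transition_probabilities(missions)
```

In this toy data “Track” is followed by “Target” in two of three observations and by “Fix” in one, which is the kind of branch a workflow model needs to capture.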

The identification of the sources of error and their propagation in the network was facilitated by the contextual information provided by the tool. The tool provided the ability to determine whether an error was made by an operator central to the network at that point in time or peripheral to it. For example, in one instance the analysts observed that a central operator made an incorrect decision at a time when the error could be corrected. However, it remained uncorrected and the resulting impact on the network and the workflow was observed.

During the exercise, the analysts were assessing two alternate operator configurations for the purpose of improving targeting throughput. These configurations allowed operators to perform all or part of another operator’s function at times when an operator is overloaded. These configurations were the backbone of an autonomous workgroup structure in which operators changed their role to accommodate dynamically shifting workloads.


The tool’s ability to animate the network undertaking the targeting activity enabled the analysts to compare both conditions in terms of central operators, network structure, and task timings during selected phases of interest, similar to what is shown in Figures 50 and 51. These graphs were supported via additional visualisations of centrality plots and bar graphs for the separate phases (shown in Figures 52 and 53).

None of the standard tools provided the ability to utilise the contextual information whilst studying the changing topology. This tool improved over the standard offerings in that it provided the ability to utilise data from observer captured utterances, chat system logs, observer comments and system status logs. In addition, the analysts also identified that the replaying of such military activities has the potential to aid military operator training and assist in military self-assessment.

The tool also showed potential to assist linguistics-based studies; however, it would benefit from the inclusion of a text searching function. The ability to search for and locate specific dialogue in the textual data would benefit most analytical activities that employ this tool.


Figure 50 - SNAC2 SNA Graph, Phase 1

Figure 51 - SNAC2 SNA Graph, Phase 2


Figure 52 - SNAC2 Metrics per Phase

Figure 53 - SNAC2 Metrics Graph


4.6 Concluding Remarks

SNAC2 is unique in its use of coordinated views consisting of a variable windowed continuous temporal graph layout and the use of associated attribute and contextual data to define and visualise those sub networks. In addition, the visualisation of both a domain specific ‘traffic light’ and the coordinated rolling measures of centrality make this approach unique.

The SNAC2 tool was successfully used by analysts to assess the efficiency and effectiveness of processes, to identify chokepoints, inefficient work practices and the propagation of errors. For a more detailed description of the analysis performed with SNAC2 see (Lo, Au et al. 2009).


Chapter 5. The Parallel Arc Diagram (PAD)

The work described in Chapters 3 and 4 focuses on the visualisation of what is often considered the metadata of the network, the attributes of nodes and edges, and additionally the contextual information encoded in the semantics of the captured data. Chapter 4 focused on the exploration of the research question of “how can alternative visual encoding approaches aid in performing fundamental temporal tasks on social networks?” There the temporal tasks were assisted through the dynamic visualisation of the network and social network metrics. This chapter concentrates on a key difference between analysing static and dynamic social networks: the detection of interaction patterns over time.

The visualisation approach discussed in this chapter focuses on temporal patterns embedded in the network. It is based on the premise that the patterns of communication we build up over time are themselves significant. The concept demonstrator discussed in this chapter visually aids in detecting those patterns and uses the novel Parallel Arc Diagram (PAD) technique as its basis.


Human interactions can be thought of and analysed as event-based. That is, interactions are a consequence of or occur at a particular event. The TIPAD tool described in this chapter visualises interactions that are event based.

The detection of temporal patterns in social network data usually relies on a viewer to detect them by playing and replaying animations of the network. This chapter presents an approach where a static picture is sufficient to discern particular interaction patterns without the need for animation.

5.1 Background

Historians and social scientists believe archives of email, forums and blogs are important artefacts that enable an improved understanding of individuals and communities. Although the primary component of these modes of communication is text, exploring and analysing them using only the text has limitations. A common approach to exploring these types of archives is keyword search; however, Perer, Shneiderman et al. (2006) argue that this results in losing the conversation’s context. They propose that visualising the temporal rhythms inherent in email archives is one way of providing the missing context. By visualising the embedded rhythms, analysts are stimulated to ask why certain relationships start and stop, why certain relationships have similar interaction patterns and, conversely, why others yield different interaction patterns.

Viégas, Golder et al. (2006) hypothesise that the patterns of communication we build up over time are themselves significant. The detection of these patterns can be remarkably useful. Barabasi (2010) detected burst patterns in general e-mail use and eventually expanded on this idea as a model for everything humans do. Pattern discovery in timed event data is very important and a source of much understanding of social networks, as is demonstrated and explained in (Schaefer, Wanner et al. 2011). The Parallel Arc Diagram (PAD) visualisation approach is aimed at visualising data consisting of a sequence of separate periods in such a way as to reveal the temporal patterns (rhythms) hidden in the data.

A social network that many of us are familiar with, and which fits the criteria above, is the motion picture. Although we do not typically think of a movie in those terms, it has all the essential ingredients of a social network, just as email archives and forums do. A character in the movie will have ‘ties’ to other characters in the movie, and the relationships between characters are reflected by their participation in movie scenes and the dialogue they have in those scenes.

In this chapter the PAD visualisation approach is demonstrated using a movie script as the source of a social network18. The concept demonstrator application, the “Temporal Interactive Parallel Arc Diagram” (TIPAD), implements the PAD idea and explores its utility with this dataset.

18 Movie scripts are available online under fair use provisions.

5.2 PAD Concept

The concept of the PAD visualisation is a simple one; in some respects one could view it as a merging of a matrix representation with that of a node-link diagram and the concept of histograms. Node-link drawings are well suited to being drawn by hand; indeed it is often claimed that their origin is the sociograms drawn by Moreno (1953). However, a computer can render the same network with much greater precision and regularity.

The PAD approach relies on the rendering precision that computer technology provides. Consider the simple 2-mode network shown on the left in Figure 54.

Figure 54 - Node-link layout vs. PAD layout


This graph representation shows relational links that have some temporal ordering associated with them. This network can be redrawn as shown on the right by laying out the links in a parallel fashion immediately adjacent to each other (see animation at http://users.tpg.com.au/pjhoek/PAD_explained.html). It might be argued that in such an approach the link relationships are harder to discern, however this is not the case and this aspect will be explained in later sections of this thesis.

Moreover, according to studies previously described, the typical way of representing a relationship with a link line connected to a node at an arbitrary angle is only useful for path-following tasks.

The key aspect of a 2-mode social network is that it does not record direct relations between social actors but rather relations via collective entities conventionally termed ‘events’ (Alexander 2005). The classic dataset from Davis, Gardner et al. (1941) is an illustration of such data and is shown in a node-link diagram in Figure 55.

Other examples of 2-mode data include character participation in movie scenes, staff participation in business meetings, membership of corporate boards, and individual participation in online forums. Often the data has a temporal nature to it and commonly additional data is available related to the actors in a particular event; for example, characters in a movie might have a sequence of alternating dialogue in a movie scene.


Figure 55 - Davis, Gardner et al. Dataset displayed as 2-mode node-link

Direct relations (1-mode projections) can be derived from 2-mode data; however, the process of transforming the data discards some information (see Figure 56), in particular which actors are related to which event and how they are related in time. Furthermore, the data often contains additional detail describing the relationship itself. This could include timing or sequencing of relational links within the event, or contextual information such as questioning or responding actions in forum participation. The PAD approach attempts to preserve and visualise all aspects of the data, including the temporal ordering of relational links and the contextual data.
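The information lost in a 1-mode projection can be demonstrated with a small Python sketch. The function name and the two toy attendance lists are hypothetical; the projection itself is the standard co-occurrence count, where the weight of a tie is the number of events two actors share.

```python
from collections import Counter
from itertools import combinations


def project_to_actors(attendance):
    """Collapse (event, actors) records into weighted actor-actor ties."""
    weights = Counter()
    for _event, actors in attendance:
        for a, b in combinations(sorted(set(actors)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Two hypothetical 2-mode datasets with a different temporal ordering.
early_pairing = [
    ("e1", ["Ann", "Bea"]),
    ("e2", ["Ann", "Bea"]),
    ("e3", ["Bea", "Cat"]),
]
late_pairing = [
    ("e1", ["Bea", "Cat"]),
    ("e2", ["Ann", "Bea"]),
    ("e3", ["Ann", "Bea"]),
]
same_projection = project_to_actors(early_pairing) == project_to_actors(late_pairing)
```

Both datasets collapse to the same weighted ties even though Ann and Bea meet first in one and last in the other, so the projection alone cannot answer questions about when a relationship was active.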

Collective entities (event nodes) are placed horizontally across the top in time-ordered sequence and actors, the origin of the data points, are placed vertically down the left. I propose that the ordering, and indeed the inclusion, of actors19 not be fixed but based on the requirements of the user.

19 The use of the word ‘actor’ is in the Social Network sense, not a thespian

Figure 56 - Davis, Gardner et al. dataset displayed as 1-mode projection node-link diagram. Source (Freeman 2000)

The order of actors could be determined by filters and be of any nature desired by the user. For example, the ordering could be based on graph theoretic measures or thresholds. In the examples shown in this section the actors are first ordered by degree and, second, by time of participation. Actors are assigned a colour from a predetermined set and the colour sequence is repeated if the number of actors exceeds the available colours in the set. The colour palette chosen in this instance consisted of strongly contrasting colours. The application uses colour to indicate grouping; dissimilar colours have been chosen to adhere to the Gestalt laws of grouping (Todorovic 2008). In addition, the pixel-thin lines utilise colours that have high luminance contrast with the background. However, designers have the option to choose an alternate colour palette which might better suit their particular requirements.
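The ordering and colouring rules described above can be sketched in a few lines of Python. This is an assumed reading of those rules rather than the TIPAD code: degree is taken as the number of actor-to-event links, ordering is by descending degree and then by earliest event, and the three-colour palette and sample links are hypothetical.

```python
from itertools import cycle

PALETTE = ["red", "blue", "green"]  # hypothetical palette


def order_actors(links):
    """Order actors by degree (descending), then by first participation.

    links is a list of (event_index, actor) pairs.
    """
    degree = {}
    first_seen = {}
    for event_index, actor in links:
        degree[actor] = degree.get(actor, 0) + 1
        first_seen[actor] = min(first_seen.get(actor, event_index), event_index)
    return sorted(degree, key=lambda a: (-degree[a], first_seen[a], a))


def assign_colours(actors, palette=PALETTE):
    """Give each actor a colour, repeating the palette when it runs out."""
    return dict(zip(actors, cycle(palette)))


# Hypothetical links
links = [
    (1, "Bea"), (1, "Ann"), (2, "Ann"), (2, "Cat"),
    (3, "Bea"), (3, "Dee"), (4, "Ann"),
]
ordered = order_actors(links)
colours = assign_colours(ordered)
```

With four actors and three colours the fourth actor reuses the first colour, which mirrors the repeated colour sequence mentioned in the text.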

Link lines representing the relationship from actor to event form horizontal ‘bars’ of varying thickness due to multiple adjacent lines (shown in Figure 57). In node-link drawings these relational events would normally be represented via aggregated or multiple arcs. The ‘bars’ resulting from the PAD approach allow for easy visual determination of the relative degree of each node. As the horizontal collective entities (event nodes) are arranged in sequential order, when a filter is applied controlling which link lines are rendered, patterns emerge based on the filter and the temporal ordering of nodes (demonstrated later in this thesis).

Figure 57 - PAD layout detail (figure annotations: discrete collective entities (events) horizontally sequenced; discrete relational links within an event (e.g. communication act) vertically sequenced; aggregated relational links (per “event”) vertically sequenced; node labels a, b, c and X, Y)


This type of layout also offers the ability to include contextual or attribute data in the visualization. In this section we describe the inclusion of ordered dialogue, indicated by small semi-circles (bubbles) shown in the diagram above. These utterances are ordered vertically allowing the user to view the progression of the conversation. As well as preserving the temporal relationship the PAD layout affords the opportunity to preserve and display other elements of the data which normally would be difficult to include.

The PAD approach is useful in visualizing 2-mode data that has a sequential or temporal dimension where the data can be divided into discrete sessions or clusters. In addition, the data ideally consists of actors that are the origin of data points that relate to the temporally ordered relational events.

As a simple introductory example, Figures 58 and 59 show the Davis, Gardner et al. dataset displayed using the PAD approach. You may note that within each social event or activity (boxes at the top) there are no ‘bubbles’ indicating individual actor relational events, as this information is not available in the data.

In these examples Evelyn is selected and the social events that she attended are highlighted. What amounts to a count of co-attendance is displayed next to the actors’ names and their co-attendance is highlighted. The pattern of co-attendance of each individual is displayed and highlighted via their coloured arcs contrasted against other interactions shown as white lines.
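The co-attendance count shown next to each actor can be computed as in the Python sketch below. The function name is hypothetical and the three-event attendance table is a toy example that only borrows names from the text; it is not the published Davis, Gardner et al. data.

```python
def co_attendance(attendance, selected):
    """Count how many events each other actor shares with the selected actor.

    attendance maps an event label to the set of actors who attended it.
    """
    counts = {}
    for actors in attendance.values():
        if selected in actors:
            for other in actors:
                if other != selected:
                    counts[other] = counts.get(other, 0) + 1
    return counts


# Toy attendance table (illustrative values only)
attendance = {
    "E1": {"Evelyn", "Laura"},
    "E2": {"Evelyn", "Laura", "Theresa"},
    "E3": {"Nora", "Theresa"},
}
shared = co_attendance(attendance, "Evelyn")
```

Actors who never attend an event with the selected actor do not appear in the result at all, which corresponds to their arcs remaining unhighlighted.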


Figure 58 - Davis, Gardner et al. dataset displayed as 4-mode PAD layout (events 1 to 8)

Figure 59 - Davis, Gardner et al. dataset displayed as 4-mode PAD layout (events 7 to 14)


The images in Figures 58 and 59 show a clear shift in the relationships (expressed by attendance at the same social functions) from the 8th social event, which Nora did not attend. In particular, Sylvia and Katherine co-attend with Nora thereafter. Although Evelyn is selected in Figure 59, the continuation of the white lines of Sylvia and Katherine demonstrates a similar pattern to that of Nora. Figure 58 shows the strong attendance of Evelyn, Laura, and Theresa in events 1 to 8.

5.3 Evaluation Taxonomy

Plaisant (2004) describes information visualisation as a way to answer questions you didn’t know you had. She goes on to point out that this presents challenges in designing tasks to measure the effectiveness of a visualisation. In this thesis we use a task taxonomy derived from various studies of node-link diagrams to comparatively assess the potential utility of the PAD approach. As discussed earlier, node-link visualisations are by far the most commonly used representation in social network analysis, therefore we consider it valid to compare the PAD visualisation approach with node-link visualisations.

Graph visualisations are utilised for numerous activities, each of which is composed of low-level tasks. Cui’s (2006) survey report lists a number of low-level tasks that are commonly carried out when interpreting graph visualisations:

1. For the whole graph, count the number of nodes.

2. For a given node, count the number of its incoming or outgoing links.

3. For a given node, find its adjacent nodes.

4. For a given node, find the nodes that can be reached by a certain number of steps.

5. For the whole graph, find the middleman nodes.

6. For the whole graph, find strongly connected clusters.

7. For the whole graph, find all nodes/links which share some specific attribute.

Cui states that node-link representations are good enough for the first three of these tasks and that matrix layouts are particularly good at revealing the different proportions of links from a node that go to different categories. Ghoniem, Fekete et al. (2005), Keller, Eckert et al. (2006), Lee (2006) and Lee, Plaisant et al. (2006) also provide taxonomies of tasks commonly encountered while analysing graph data. Lee categorises them in four groups: topology based, attribute based, browsing and the overview task. These taxonomies generally align relatively well with Cui’s list, with the following possible additions from Lee:

8. Which node has a maximum number of adjacent nodes?

9. Find and select a node.

10. Find and select a link.

11. Find the length of the shortest path.


The taxonomy above can be applied to dynamic network visualisations; however, what is missing are descriptions of tasks that include the temporal dimension. Some of the listed tasks can simply be applied to a dynamic network visualisation at time t and then at t + time interval. However, there are tasks which are specific to the time domain and which require new low-level task descriptions targeted at the temporal dimension. For example, for actor i, identify the time period where no relationship exists with actor j. There are currently no published taxonomies of fundamental tasks undertaken during the interpretation of dynamic network visualisations.
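The example temporal task (for actor i, identify the period where no relationship exists with actor j) can be made concrete with a short Python sketch. It assumes that a relationship “exists” at an event when both actors take part in it, and it reports each run of consecutive events without such a tie; the function name and the sample events are hypothetical.

```python
def periods_without_tie(events, i, j):
    """Return (first_event, last_event) for each run of events lacking an i-j tie.

    events is a time-ordered list of (label, set_of_actors).
    """
    gaps = []
    run = []
    for label, actors in events:
        if i in actors and j in actors:
            if run:
                gaps.append((run[0], run[-1]))
                run = []
        else:
            run.append(label)
    if run:
        gaps.append((run[0], run[-1]))
    return gaps


# Hypothetical time-ordered events
events = [
    (1, {"i", "j"}),
    (2, {"i"}),
    (3, {"j", "k"}),
    (4, {"i", "j"}),
    (5, {"k"}),
]
gaps = periods_without_tie(events, "i", "j")
```

Here the two actors have no tie during events 2 to 3 and again at event 5, an answer that cannot be read from a single aggregated snapshot of the network.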

In the following section we demonstrate the advantages of the PAD approach in the Temporal Interactive Parallel Arc Diagram (TIPAD) application. This application uses character participation in movie scenes as the basis for exploring social interactions over time and facilitates the comparison of the PAD visualisation with traditional visualisations of the same network. We will use the extended taxonomy of tasks described above as the basis for comparative evaluation of the approach but will also discuss why one might want to perform some of the listed tasks and why this particular approach to visualisation might be appropriate.

5.4 Temporal Interactive Parallel Arc Diagram (TIPAD)

The primary goal of the TIPAD application is to serve as a means to comparatively evaluate the PAD approach. TIPAD uses a movie script as input and assumes the character dialogue within it is representative of social interaction.

Numerous methods have been proposed to analyse movies and usually they focus on audiovisual features of the movie. However, Weng, Chu et al. (2007) claim these approaches don’t assist in “understanding” movie content and propose analysing movie content through examining the social relationships between roles.

The TIPAD application is designed to visually support analysis that enables “understanding” of the kind proposed by Weng, Chu et al.

This thesis uses the script of the movie “The Matrix” as a simple illustration of the efficacy of using a PAD visualisation approach. This particular movie was chosen as it is relatively well known, has a number of key characters, and presents the development of various relationships during the course of the movie.

5.4.1 Similar Work

Similar work was undertaken by Mutton (2004), who applied his Internet Relay Chat (IRC) visualisation approach to animate social networks derived from the text of the plays of William Shakespeare. TIPAD differs, however, in that it orders scenes sequentially, directly from the script, and does not animate a node-link representation.

The work of (Donath, Karahalios et al. 1999; Tat and Carpendale 2006) in designing graphical representations for persistent conversations is also pertinent. Donath, Karahalios et al. focused on the ability to reveal, at a glance, interaction patterns which are not ordinarily perceivable by simply perusing the conversational archive. Tat and Carpendale also targeted revealing patterns in an individual’s online conversations.

The TIPAD application is similar to the extent that it immediately reveals patterns of interactions based on a persistent conversation log, e.g. a movie script. It also reveals inferred relationships, associated dialogue and contextual information in a simple two dimensional display without clutter or occlusion. A goal of this application was the ability to view commonly utilised metrics and characteristics of network visualisations quickly without having to refer to the script.

One should also note that although my Parallel Arc Diagram bears a similar name to Wattenberg’s (2002) Arc Diagrams, they are different concepts. The word “arc” in Wattenberg’s work refers to the shape of the connecting lines, that is, they are drawn in curved or arched shapes. The word “arc” in Parallel Arc Diagrams is used as it is in mathematics, describing directed edges or arrows between ordered pairs of vertices in graphs.

5.4.2 The Script and the Network

A movie script is considered as a 2-mode bipartite network with one set of vertices composed of all the characters in the movie and the other set of vertices consisting of all the scenes in the movie. Co-appearances in scenes are described by a set of edges, each an ordered pair with one vertex from each set. Describing the movie script as a 2-mode bipartite network appears appropriate as it is difficult to discern from a movie script a set of character-to-character mappings based on dialogue. Often what is said in a scene is not directed to an individual but rather broadcast to all.
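The 2-mode structure described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (TwoModeNetwork, addAppearance, scenesOf, degree, edgeCount) are hypothetical and are not taken from the TIPAD source. Each edge is a (character, scene) pair, and repeated utterances by a character in the same scene collapse to a single edge.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical 2-mode (character x scene) network; not the TIPAD implementation. */
final class TwoModeNetwork {
    // character name -> ordered set of scene numbers in which the character has dialogue
    private final Map<String, SortedSet<Integer>> scenesByCharacter = new LinkedHashMap<>();

    /** Adds one edge (character, scene); duplicates are ignored by the set. */
    void addAppearance(String character, int scene) {
        SortedSet<Integer> scenes = scenesByCharacter.get(character);
        if (scenes == null) {
            scenes = new TreeSet<>();
            scenesByCharacter.put(character, scenes);
        }
        scenes.add(scene);
    }

    /** Scenes linked to a character, in scene-number (temporal) order. */
    SortedSet<Integer> scenesOf(String character) {
        SortedSet<Integer> scenes = scenesByCharacter.get(character);
        return scenes == null
                ? new TreeSet<Integer>()
                : Collections.unmodifiableSortedSet(scenes);
    }

    /** Degree of a character node: the number of distinct scenes it links to. */
    int degree(String character) {
        return scenesOf(character).size();
    }

    /** Total number of edges in the bipartite graph. */
    int edgeCount() {
        int total = 0;
        for (SortedSet<Integer> scenes : scenesByCharacter.values()) {
            total += scenes.size();
        }
        return total;
    }
}
```

Keeping the scenes of each character in a sorted set preserves the temporal ordering that the PAD layout relies on, at the cost of discarding who was addressed, which the script does not reliably record in any case.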

One should note here that there are existing techniques such as correspondence analysis and bipartite graphs that allow for the analysis and visualisation of 2-mode networks (Borgatti, Everett et al. 2002). In addition, there have been attempts to visualise 2-mode social networks using the line-graph of the bipartite adjacency matrix (Alexander 2005). However, these techniques present problems in determining the ‘meaning’ of values or inhibit the ability to discern ties between nodes, particularly when there is a temporal dimension to the data.

Many movie scripts are documented in a semi-structured format such as the Warner Brothers formatting style (Cole and Haag 1988), making them relatively easy for a computer to parse. “The Matrix”20 movie script was parsed and encoded into the structure shown in Figure 60. A program was created using the Java programming language to accept files structured in this format, and from this, a network was constructed and visualised using the PAD approach.

20 “The Matrix” was written by Larry and Andy Wachowski; a fair use copy is available at http://www.imsdb.com/

Figure 60 - example of 'The Matrix' parsed file

5.4.3 Comparative Visualisations

For the purposes of comparison, the social network of the movie “The Matrix” was also visualised as a bipartite graph and an adjacency matrix utilising programs created with the Java programming language. Similarly, the force-directed and radial layout visualisations were constructed using Java, the prefuse visualisation library (Heer 2004) and the JUNG visualisation library (O'Madadhain, Fisher et al.).

The network could be characterised as small to medium in size. It has 220 scene nodes and 28 character nodes with a total of 316 edges, which is fairly typical of a movie network. Networks of this order of size are well suited to the PAD approach.

Graphs of networks are often represented as node-link diagrams, adjacency matrices, or bipartite graphs for the purpose of analysis or to highlight particular features of the network. In the case of Figure 61 (a node-link representation) the parameters affecting the force-directed layout were manually adjusted to give the graph the best spatial positioning. This is often the difficulty in using force-directed layouts as a visualisation component: the optimal parameters are dependent on the particular network and the requirements of the user.

Figure 61 - “The Matrix” depicted as 2 mode graph using a force-directed layout (aggregated utterances)

In analysing the alternative representations (shown in Figures 62-65) one might be able to detect which characters contribute to the majority of scenes. However, the exact degree to which each contributes is not so immediately apparent that one could confidently place them in rank order. In addition, despite Cui’s (Cui 2007) assertion that node-link representations facilitate easy counting of the number of nodes and of incoming or outgoing links, and finding adjacent nodes, these tasks are clearly not “easy”.

Figure 62 - “The Matrix” depicted using a radial graph layout (aggregated utterances)

Figure 63 - “The Matrix” depicted as an adjacency matrix (aggregated utterances)

Figure 64 - Bipartite Graph of the movie 'The Matrix' (aggregated utterances)

Figure 65 - 2-Mode Node-link Graph of the movie 'The Matrix' (separate utterances)

The conventional visualisations shown above are static representations and do not offer any ability to discern each character’s involvement in movie scenes over time. Additionally, if one wished to view the sequence of dialogue in each scene, within the same visualisation, it would be a difficult task to represent it in an aesthetically appealing and comprehensible way.

Visualising all utterances of the movie using the node-link approach is shown in Figure 65. Character nodes are rendered in pink and movie scenes in which the utterances occur are coloured yellow. This visualisation shows the complete set of utterances; however, it still gives no indication of their temporal ordering. It is this figure that should be compared to the TIPAD application. The information density of each is about the same and both attempt to visualise the same dataset with the same level of fidelity.

The TIPAD application implements the new PAD technique to overcome the temporal limitations of existing visualisation approaches described above.

5.4.4 TIPAD Design and Operation

Authors such as (Brandes, Raab et al. 2001) have noted that in node-link drawing, “the focus is on readability rather than visual communication of substantive content”. The design of TIPAD was influenced by this paper and by the observation that standard graph layouts often don’t make the simple characteristics of the graph immediately apparent (even on small or moderately sized networks). In addition, it is difficult to use a standard node-link diagram to display interactions over time without the use of animation.

In the TIPAD default view the application attempts to provide an initial overview of the movie as advocated in Shneiderman’s (1996) Visual Information Seeking Mantra. However, this overview does not mean viewing the entire collection, but rather viewing all the characters and their participation in scenes. The TIPAD application organises the scenes horizontally in sequential order, using panning to see the full complement of scenes. Examples of early visualisation environments in which this temporal horizontal ordering is effectively used can be seen in the time-based bar and line graph approach used by (Plaisant, Milash et al. 1996) and (Harris, Allen et al. 1999).

The design of the application acknowledges and makes use of recognised residual effects of English reading habits and the resultant interpretation of graphics (Winn 1994). It does this by following a left-to-right, top-to-bottom convention. It is temporally ordered with the earliest scenes on the left, and characters are positioned from top to bottom in order of highest to lowest degree.

Those characters that have dialogue in a scene are shown with a horizontal line linking the character to that scene and each line is vertically adjacent to the next.

In traditional node-link diagrams, the links are placed at angles dependent on node positioning and it is difficult to estimate the comparative degree of links from one node to another, particularly as the number of edges increases. The PAD approach lays these links out adjacent to each other and in sequential order, creating a bar with progressively decreasing height over time. The total thickness of the bar resulting from adjacent links reflects the number of appearances in scenes from a particular point in time onwards. This is similar to the way bar height in a histogram represents a numeric value, except that any point in time can be the basis of comparison.
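The bar-thickness idea can be illustrated with a small sketch. It assumes a reading in which each horizontal link runs from the character to its scene, so that at scene position t only the links to scenes at or after t are still present. The class and method names are hypothetical and this is not the TIPAD rendering code.

```java
import java.util.List;

/** Hypothetical helper illustrating PAD bar thickness; names are not from TIPAD. */
final class PadBar {
    /**
     * Thickness of a character's bar at scene position t: the number of that
     * character's links whose scene number is at or after t.
     */
    static int thicknessAt(List<Integer> scenes, int t) {
        int remaining = 0;
        for (int scene : scenes) {
            if (scene >= t) {
                remaining++; // this link still passes through position t
            }
        }
        return remaining;
    }
}
```

Under this reading, the thickness at the first scene equals the character's degree, and a steep drop in thickness between two positions indicates many appearances within that interval.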

Organising the links in this way facilitates preattentive (Healey, Booth et al. 1995; Alexandre and Tavares 2010) visual processing of the information. Preattentive processing is derived from an area of human cognitive psychology and is based on the idea that there is a limited set of features which enable humans to detect things very rapidly and accurately using the low-level visual system. In this case, the features are colour and size, which assist high-speed visual estimation of the relative out-degree of character nodes.

One might note that scrolling may be required to see all the characters when viewing a movie with a large cast in TIPAD. However, by default the characters are positioned in out-degree rank order (in this application equivalent to the number of scenes in which they contribute to the dialogue), allowing the user to graphically determine the degree order. This provides a portion of the overview capability described in Lee’s taxonomy, allowing users to immediately get estimated values or at least relative weights. In terms of the extended task taxonomy described above, the default view satisfies items 1 and 2, albeit through the use of scrolling and panning.

Compare this with a user’s ability to do the same in the most commonly used visualisations (Figures 61 to 65) and one will see that the PAD approach provides this capability much more effectively.

In terms of analysing the movie, (Weng, Chu et al. 2007) claim that degree centrality is directly related to leading roles. Therefore, one could hypothesise that Neo, Morpheus, Trinity, and perhaps Tank are the main characters in this movie. Determining a character’s degree when using the traditional visualisations is achievable in most cases, but requires more cognitive effort than using the PAD approach.
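Both the default top-to-bottom ordering and the hypothesis about leading roles rest on sorting characters by degree. A minimal sketch of such a ranking follows. The class name and the alphabetical tie-break are assumptions made for the example; the thesis does not state how TIPAD resolves ties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical degree ranking; the tie-break rule is an assumption. */
final class DegreeRanking {
    /** Returns character names from highest to lowest degree, ties in alphabetical order. */
    static List<String> rank(Map<String, Integer> degreeByCharacter) {
        List<String> names = new ArrayList<>(degreeByCharacter.keySet());
        names.sort((a, b) -> {
            int byDegree = Integer.compare(degreeByCharacter.get(b), degreeByCharacter.get(a));
            return byDegree != 0 ? byDegree : a.compareTo(b);
        });
        return names;
    }
}
```

The first few entries of the returned list are the candidates for leading roles in the sense suggested by Weng, Chu et al.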

The design of the TIPAD application recognises research such as (Lee 2006) in which it is demonstrated that visual exploration of network data requires interactivity. Indeed, static visualisations of complex systems pose problems for which interaction and navigation are essential to providing a solution (Munzner 2000; Goud 2006). Basic interaction in TIPAD is performed with mouse operations including the selection of characters, selection of display modes, scrolling the screen, hovering for information access and zooming through the use of the mouse scroll wheel.

The scene rectangles in the top third of the TIPAD display contain vertical colour-coded lines indicating contributions by characters in that scene (see Figure 66). Hovering the mouse over a line will cause an information popup to display the character’s name. Similarly, hovering over the scene number will provide a detailed description of the scene. Each of the vertical character lines has ‘bubbles’ representing dialogue for that character, displayed vertically in sequential order. Hovering the mouse over these ‘bubbles’ will reveal the character name followed by the particular utterance. The visualisation therefore has temporal ordering in the horizontal direction based on scene order and also temporal ordering in the vertical direction within a scene, based on dialogue order. This ability is generally not provided in the other comparative visualisations and in some cases would be difficult to include within the same visual space.

Scenes in horizontal sequence, containing dialogue in vertical sequential order. Scenes in which selected actors participate are highlighted with the actors’ allocated colour.

Hovering over the scene number provides a description of the scene and similarly hovering over dialogue bubbles provides a view of that particular utterance.

Selectable characters in rank order according to participation in scenes with dialogue

Individual horizontal lines create a bar representing out-degree at particular points in time. Vertical links connect characters with scenes and show their dialogue in sequential order

Figure 66 - TIPAD, showing the default view of the movie ‘The Matrix’ in a PAD layout

Selecting a character by clicking on their name in the left-hand pane or directly in the scene node, and then changing the mode to ‘mode 2’, provides a filtered view of characters and their participation in scenes. Actors with dialogue appearing in scenes with the selected actor are displayed, providing immediate visual feedback as to the co-appearances in scenes (shown in Figure 67). In this example, the character Trinity is selected, which provides a somewhat egocentric view of the network, allowing one to immediately see that Neo has the most appearances in scenes with Trinity, followed by Morpheus and Tank.
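The filtering behind this egocentric view can be sketched as a count of shared scenes. The example below assumes the network is held as a map from character name to the set of scenes in which that character has dialogue. The class and method names are hypothetical and do not come from the TIPAD source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the 'mode 2' filter; not the TIPAD implementation. */
final class EgoView {
    /**
     * For the selected character, counts how many scenes each other character
     * shares with it. Characters with no shared scene are left out, which
     * mirrors the filtered display.
     */
    static Map<String, Integer> coAppearances(Map<String, Set<Integer>> scenesByCharacter,
                                              String selected) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<Integer> egoScenes = scenesByCharacter.get(selected);
        if (egoScenes == null) {
            return counts; // unknown character: nothing to show
        }
        for (Map.Entry<String, Set<Integer>> entry : scenesByCharacter.entrySet()) {
            if (entry.getKey().equals(selected)) {
                continue; // skip the selected character itself
            }
            int shared = 0;
            for (Integer scene : entry.getValue()) {
                if (egoScenes.contains(scene)) {
                    shared++;
                }
            }
            if (shared > 0) {
                counts.put(entry.getKey(), shared);
            }
        }
        return counts;
    }
}
```

Sorting the resulting counts in descending order would give the kind of reading described in the text, where the character with the most shared scenes appears first.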

Figure 68 demonstrates how the PAD visualisation naturally shows the amount of interaction over a time period. Rapidly decreasing height of the horizontal bars formed by the link lines is an indication of high participation in scenes and this is quickly identified with the use of scrolling. Figures 69 and 70 show how different patterns of interaction can be rapidly identified. Using these figures one can easily determine that Neo’s participation in scenes in the early stages is dominated by scenes in which Morpheus also participates. A number of thick bands of lines indicate that, in terms of Neo’s total participation in the movie, on three occasions he participates in numerous successive scenes with Morpheus. However, his co-appearances in scenes with Trinity are more evenly distributed.

As the characters (nodes) are presented as a vertical list, finding a node does not require searching the full two dimensional space as is the case in a node-link diagram. This effectively provides the capability described in item 9 (find and select a node) of the extended taxonomy.

Figure 67 – Basic egocentric view (Mode 2)

The TIPAD application also highlights the scene nodes to which the selected character links, which is equivalent to highlighting all scenes in which the selected character has dialogue. In terms of the task taxonomy this satisfies item 3, ‘for a given node find adjacent nodes.’ In addition, the application filters the display to show only those characters that appear in the same scenes, effectively the adjacent nodes that are two links away. This interactive capability can be provided in traditional node-link visualisations. However, the PAD approach provides a filtered representation whilst visually maintaining the temporal ordering of links.

Figure 68 – High Participation in Scenes vs. Low Participation

Figure 69 – Participation with Trinity

Dark bands indicate a number of successive scenes with Morpheus

Trinity has 2 groups (Phases) of interaction with Morpheus

Figure 70 – Participation with Morpheus


If the user is simply interested in the number of characters and pattern of interaction with the selected character then ‘mode 3’ provides this without the distraction of other link lines by only displaying those links representing co-appearances (see Figure 71). Mode 3 also provides a crude means of determining communities within the movie. Typically communities are groups of nodes that have dense connections between them but sparse connections between communities. By progressively viewing those that have dialogue in the same scenes and logically disregarding those possessing a degree below a particular value, one can make a rough estimate of the members of a community.
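The rough community estimate described above amounts to a threshold filter. The sketch below interprets “degree” as the number of co-appearance links shown in mode 3, which is an assumption, as are the class name, the method name and the choice of threshold. It illustrates the manual procedure only and is not a community detection algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical threshold filter for a rough community estimate. */
final class CommunityEstimate {
    /** Keeps the co-appearing characters whose shared-scene count is at least minShared. */
    static List<String> likelyMembers(Map<String, Integer> sharedScenes, int minShared) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : sharedScenes.entrySet()) {
            if (entry.getValue() >= minShared) {
                members.add(entry.getKey());
            }
        }
        return members;
    }
}
```

Repeating the filter from the point of view of each retained character, as the text suggests, would show whether the same group keeps reappearing.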

By selecting ‘mode 4’ one can view the sequence of co-appearances in scenes with the selected character (see Figure 72). Broadly in accordance with Shneiderman’s (1996) guidance, the TIPAD application allows a user to zoom in and examine the patterns of interactions with the selected character (see Figure 73 for a ‘zoomed’ view). In this mode the user can see how those co-appearances (shown in the character’s assigned colour) relate in sequence to the rest of the scenes undertaken by the co-appearing character (shown in white).

Figure 71 – Mode 3

Figure 72 – Mode 4

Trinity is selected

Complete set of links with only links to scenes with Trinity

Notable periods when Morpheus and Tank do not have dialogue with Trinity.

Tank’s pattern of interaction with Trinity

Figure 73 – Mode 4, Zoomed view

5.5 Implementation

5.5.1 Data

The TIPAD application is designed to use a movie script in the format shown in Figure 74. Scene location number and information are utilised by the application for mouse hover popup display. Similarly, the speech utterances by the characters are encoded and can be viewed by hovering over a “bubble”.

Scenes are viewed as events in which a number of entities (Actors) participate.

Co-participation in an event (in this case a scene) provides a relationship link.

Speech is noted in TIPAD by ‘bubbles’ and the order of conversation is preserved.

Figure 74 - An example of a movie script format expected by the TIPAD application

The parser code used by the TIPAD application makes use of the Java® Scanner class to process lines starting with “Scene:”, “Location:” and “Speech”, together with their associated lines.
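A line-oriented parser of this kind might be sketched as follows. The exact layout of the input file is the one shown in Figure 74, which is not reproduced here, so the sketch makes several loudly labelled assumptions: that the scene number follows “Scene:” on the same line, that the speech keyword is written “Speech:” with the character name after it, and that the utterance occupies the single line that follows. The class names are hypothetical and this is not the TIPAD parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical script parser; the input layout is assumed, not taken from TIPAD. */
final class ScriptParser {
    /** One utterance: the scene it belongs to, who spoke, and what was said. */
    static final class Utterance {
        final int scene;
        final String character;
        final String text;

        Utterance(int scene, String character, String text) {
            this.scene = scene;
            this.character = character;
            this.text = text;
        }
    }

    /** Reads the script line by line and returns utterances in their original order. */
    static List<Utterance> parse(String script) {
        List<Utterance> utterances = new ArrayList<>();
        int currentScene = 0;
        Scanner in = new Scanner(script);
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.startsWith("Scene:")) {
                // Assumed layout: "Scene: <number>"
                currentScene = Integer.parseInt(line.substring("Scene:".length()).trim());
            } else if (line.startsWith("Location:")) {
                // Location text would feed the hover popup; ignored in this sketch.
            } else if (line.startsWith("Speech:") && in.hasNextLine()) {
                // Assumed layout: "Speech: <character>" followed by one line of dialogue
                String character = line.substring("Speech:".length()).trim();
                String text = in.nextLine().trim();
                utterances.add(new Utterance(currentScene, character, text));
            }
        }
        in.close();
        return utterances;
    }
}
```

Because utterances are appended as they are read, the returned list preserves both scene order and the order of dialogue within a scene, which are the two temporal orderings the display depends on.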


5.5.2 Program Code

The application was written in the Java programming language and tested with the Oracle® Java Development Kit (JDK1.7)21. No additional libraries were utilised other than the standard Java class libraries.

The implementation makes use of the Panel and Component design of Java itself. The application consists of four main elements: the TIPADCanvas, the TIPADContainer, EventObject and Actor (see Figure 75). The EventObject, TIPADContainer and TIPADCanvas extend the JPanel class and the Actor extends the JComponent class. The EventObject, TIPADContainer and the Actor all override the paint method to render the display.

The TIPADContainer paint method performs the rendering of the parallel lines which make up a large part of the TIPAD design. The pseudo code in Appendix B describes the rendering algorithm of the TIPADContainer (ignoring incidental details of Java graphics such as determining x and y location and display size detection).

21 The JDK is available at http://www.oracle.com/technetwork/java/javase/downloads/index.html accessed [23/07/2013]

EventObject TIPADCanvas Actor TIPADContainer

Figure 75 - TIPAD Application major GUI components

5.6 Conclusion

It is somewhat cumbersome describing an interactive visualisation application with the use of words and static pictures. A short movie is available showing the interactive operation of the TIPAD that demonstrates the use of the application and the potential utility of the PAD approach22.

Despite the fact that node-link diagrams are widely used across numerous domains, studies indicate that they lose their utility on even moderately sized graphs. This section demonstrates that, given a simple network and a node-link diagram, it is often difficult to answer the simplest of questions, such as ‘which actor has the most interactions?’ In addition, studies described in this thesis have noted that the way the node-link representation is positioned affects the way the graph is perceived.

22 A YouTube video showing the operation of the TIPAD prototype can be found at http://www.youtube.com/watch?v=D5HEtCIIca8

I believe the widespread use of node-link diagrams can be in part explained by their visual appeal. For example, there is something mesmerising about watching a force-directed node-link visualisation wriggle and adjust to find an optimal layout. On this basis, node-link representations may well be the best choice to convey the results of analysis but may not always be the best choice for conducting visual exploratory analysis.

The PAD approach enables answering simple questions about the network from the visualisation itself. It also allows the analyst to explore the network over time from the point of view of different actors in the network.

This section demonstrated and evaluated the PAD approach with the use of a rudimentary implementation of it in the TIPAD application. The PAD approach makes simple network features, such as degree, immediately visually discernible. The studies and research referenced in this thesis contributed to the comparative assessment of the PAD visualisation approach using a common task taxonomy and visually comparing alternative representations of the same network. A summary of this assessment is presented below in Table 6. The evaluation presented in this table was based on the network of the movie “The Matrix” and the alternative representations of the network presented in this thesis. In addition, the general utility of other approaches was garnered from the academic papers cited herein.

This assessment is comparative and is subject to variation dependent on the properties of the network being analysed. However, in general it is possible to say, for example, that the standard matrix representation is not effective in facilitating path following. No controlled experiments have been conducted to validate any of the assertions presented in this section; Table 6 therefore serves to orient the reader as to the tasks for which the author perceives the PAD approach to be particularly applicable, and where it sits in the visualisation space.

The PAD approach does have a number of limitations that have yet to be explored. One drawback is its currently limited value in assisting in the analysis of network topology characteristics or evolution. In addition, although scrolling is utilised to view the full extent of the network, large networks will present a problem. However, as described in this thesis, this is similar to the problems faced when trying to visualise large networks with node-link diagrams. On a large scale network it might be possible to use techniques borrowed from other visualisations, such as aggregation and drill-down techniques.

In large networks, an important advantage of the PAD approach is that it offers a linear rendering time, compared to force-directed layouts, which attempt to minimise a global energy function and are generally considered to have a running time proportional to the cube of the number of nodes, O(V^3).

Table 6 - PAD Visualisation evaluation table

Tasks assessed (table columns): # of nodes; # incoming or outgoing links; find adjacent nodes; can be reached in # steps; find middle-man nodes; find clusters; find nodes & links with attribute; find, select node; find, select link; find shortest path; show dynamic aspects.

PAD: ✓ ✓ ✓ ✓; estimate relative weight & read value; interaction showing adjacent to adjacent; ease is dependent on graph size and density; possible with colour, size and additions.

Node-link: ✓ ✓; not as good as matrix or PAD; Keller notes it can be hard, can’t estimate as easily as PAD; ease is dependent on graph size and density; ease is dependent on graph size and density; possible with colour, size and additions; search full 2D space; ease is dependent on graph size; possible with animation.

Matrix: ✓ ✓ ✓ ✓ ✓ ✓; link tracing is hard, might necessitate matrix reordering; possible with colour, size and additions; not as easy as PAD; possible with animation.

Bi-partite graph: ✓ ✓ ✓; PAD better; not as easy as PAD; ease is dependent on graph size; ease is dependent on graph size; possible with colour, size and additions; might be possible with animation.

The TIPAD visualisation has some inherent advantages that flow from this alternative view that may not be immediately apparent. The use of this tool during evaluation has shown the usefulness of scrolling whilst maintaining focus on a particular link line to quickly find the scene of interest. In addition, the ability to incorporate and visualise contextual data in the same view is considered to be of value to analysts.

The ability to distinguish patterns of interaction with a particular actor is somewhat unique. ‘Patterns of interaction’ is a commonly used phrase in dynamic network analysis but very rarely do visualisations provide the ability to display the patterns in an obvious way. More often than not, these ‘patterns’ are a result of a cognitive process and are constructed in the analyst’s head.

The PAD approach presented in this section represents a viable alternative to the use of node-link diagrams to represent small temporal network graphs. The use of this approach was demonstrated in the TIPAD application to visualise character interactions in a movie and tested for its ability to service the low level tasks commonly required of such a visualisation. There is much scope to extend the PAD idea, starting with the use of simple filters based on contextual data (such as keywords) or time periods of activity.

The PAD approach is a general concept which can be applied more broadly to the visualisation of network graphs in a number of domains. We believe the PAD approach suits data that is of a temporal nature that can ideally be divided into discrete sessions or bursts. Using the PAD approach offers an alternative view on data which will likely reveal new insights and foster new techniques for extracting meaning from such data sets.

Data sets of this nature include online communication interactions including email, discussion forums and blog comments. The approach might also be applied to meetings, parliamentary sessions, legal cases or even football games based on “plays”.

Temporal Interactive Multi-slider Event and Relationship (TIMER) tool

Chapter 6. Temporal Interactive Multi-slider Event and Relationship (TIMER) tool

This thesis attempts to answer the question, ‘how can alternative visual encoding approaches aid in performing fundamental temporal tasks on social networks?’ The alternative visualisation approach discussed in this chapter tests a new visualisation technique with an established dataset. The advantage of using this dataset is that it has previously been thoroughly analysed using a number of tools and the analysis processes are documented. This offers an opportunity to compare the technique implemented in the Temporal Interactive Multi-slider Event and Relationship (TIMER) tool with other approaches and also provides the ability to test the concept with a medium size dataset.

The TIMER tool was constructed to test a new temporal visualisation concept. The concept consists of using a windowed node-link representation in a similar way to SNAC2 (discussed in Chapter 4) together with a novel temporal visualisation technique.

The coordinated dynamic visualisation approach provides complementary views on the data. A primary element of these coordinated views is a component implementing a novel visualisation representation.

The new temporal visualisation approach draws on the TIPAD visualisation approach (discussed in Chapter 5) whereby parallel connecting lines describe the pattern of interactions over time. The visualisation approach differs from that of the TIPAD application in that interactions are not event based but occur over a continuous period of time.

6.1 The Dataset

One method of assessing a visualisation approach is to use a dataset that others have used to demonstrate the utility of their particular visualisation technique. The use of the same dataset facilitates comparison and determination of what is revealed by competing approaches and assists in assessing the intrinsic value of the underlying visualisation.

The Visual Analytics Science and Technology (VAST) datasets have become an important part of the Visual Analytics Benchmark Repository23. The repository was created to improve the evaluation of visual analytics technology, which is generally difficult for two main reasons: highly interactive visual analysis systems are not suited to traditional evaluation methods, and researchers rarely have access to users, their data and the type of problems they face.

23 The Visual Analytics Science and Technology (VAST) datasets can be found on the Visual Analytics Benchmark Repository web site: http://www.cs.umd.edu/hcil/varepository/

The VAST datasets are synthetic datasets with embedded ground truth. They are framed around an analytic problem which has known solutions. Participants in the VAST competitions have several months to analyse the data. Their answers and a description of the analytic process they used are evaluated using both quantitative accuracy ratings and subjective ratings from experts and professional analysts. Awards are given to the teams whose tools received the best reviews. The award winners of the 2008 Cell Phone mini-challenge (Grinstein, Plaisant et al. 2008) are described in this chapter and their work is used to compare and contrast with the qualities of the TIMER tool. Twenty-two entries were submitted to this mini-challenge with four awards given in this section.

Other researchers have also used the 2008 Cell Phone mini-challenge dataset as the means to demonstrate the value of their approach whether it be visual or algebraic (see Shaverdian, Zhou et al. 2009; Johansson and Johansson 2010; Vigliotti and Hankin 2012).

In the 2008 Cell Phone mini-challenge the contestants are provided with information on a fictional Paraiso movement and persons of interest in that group. The goal of the challenge is to:

1. Characterise the Catalano/Vidro social structure as reflected in the calling data.

2. Identify any changes in the social structure over the ten day period.

The mini-challenge data consists of a set of records stored in the Comma-Separated Value (CSV) format containing cell phone calls in Isla del Sueno over a ten day period in June 2006. The data consists of the IDs of the caller and receiver, the cell phone tower that the call originated from, and the date and duration of the call. This data contains some spatio-temporal data; however, as this thesis focuses on temporal aspects, the geospatial aspects of this data are not considered.

Using the distinction between continuous and discrete time in networks described in Moody et al. (2005), this data can be considered to represent a continuous network as it is made up of streaming data with known start and end times for each relationship. This data can be interpreted as defining a dynamic social network in which each cell phone is a node and each phone call is a separate edge between two nodes. When interpreted in this way, the 10 days’ worth of data could be considered a social network consisting of 400 nodes and 9834 edges.

The dataset consists of a Comma-Separated Values (CSV) file in the format shown in Figure 76.


Figure 76 - A small portion of the VAST 2008 'Cell Phone Call' dataset

The information provided with the dataset also reveals that Ferdinando Catalano’s identifier may be 200 and that close relatives and associates that he would likely call include David Vidro, Juan Vidro, Jorge Vidro and Estaban Catalano. In addition, further information indicates that Ferdinando would call his brother Estaban most frequently and that David Vidro coordinates high level Paraiso activities.

6.2 Similar Work

The data used for assessing this approach formed a sub-challenge of the VAST 2008 Challenge. Several submissions, each describing a particular approach, were received and are available online²⁴. In this section I discuss some of the award-winning submissions to the Cell Phone mini-challenge to enable a comparison with the TIMER tool approach.

²⁴ The papers relating to the 2008 Challenge can be found at http://vac.nist.gov/publications.html

All the award-winning submissions inferred the existence of a social network from the data and utilised a force-directed node-link representation of the network together with other visualisations. Most of the node-link representations use standard force-directed layout approaches, and some also include conventional visualisation enhancements using colour or size, typically to show degree.

One of the award winners, Farrugia and Quigley (2008), constructed a tool specifically targeting the phone calls mini-challenge. Their tool consisted of a two by two pixel matrix view coupled with a node-link animation. In the matrix portion, the intensity of the edge colour increased with the number of identical edges encountered over time. Likewise, in the node-link view the thickness of the edge increased up to a threshold, and thereafter the darkness of the edge was increased (see Figure 77). In both views the time taken for the colour to fade could be adjusted.

Their observation when using this tool was that having control over what information was displayed, and for how long, was crucial to discovering patterns. It appears that the “patterns” were discovered by adjusting how long the edges took to fade, together with replaying the animation.


Figure 77 - VAST 2008 Cell Phone Mini-challenge tool. Source: (Farrugia and Quigley 2008)

Another winning submission in the challenge, from Correa, Crnovrsanin et al. (2008), used two visual analytics tools, MobiVis and OntoVis, to aid in understanding and solving the challenge. Two important visualisation elements were the call graphs (see Figure 78), which represented calls and people in a two dimensional plot utilising temporal filtering, and the interactive time chart (see Figure 79), which assisted in discovering salient changes in communication patterns.

Figure 78 - MobiVis, Call Graph (Correa, Crnovrsanin et al. 2008)


Figure 79 – MobiVis with Time Chart. Source: (Correa, Crnovrsanin et al. 2008)

A third winning submission was that of Perer (2008), using a tool called SocialAction which was designed to incorporate visualisation and statistics to improve exploratory data analysis (Figure 80). Besides the customary node-link diagram, Perer used a “stacked histogram” visualisation. Although he found it “quite complex and hard to interpret”, it did highlight that some individuals were very active until June 8 and others were active after that.


Figure 80 – SocialAction, stacked histogram. Source: (Perer 2008)

Even though the visually assisted analysis performed by the mini-challenge award winners was quite accurate, there was room for improvement in the visualisations they used. Correa, Crnovrsanin et al. (2008) noted that the complex network quickly became cluttered and that it was difficult to find interesting patterns. They also noted that the use of data abstractions which summarised the data limited the ability to see detailed information, and in this case some of the patterns of interest were only visible when viewing the lower level details. Perer (2008) also noted difficulty in extracting information from the visualisations he was using, saying, “very little information seems to be evident from network visualisations of this size”.

It should be noted that the evaluation of this tool using the VAST 2008 dataset has the advantage of hindsight. However, the purpose of this evaluation was not to prove that the TIMER tool was a more effective aid to solving the challenge, but rather to evaluate the visualisations employed in each tool and determine at a fundamental level the advantages of each.


6.3 Motivation

It is apparent from the literature that the answers to the questions analysts ask when analysing dynamic networks are often found in the temporal patterns in the data. The idea of visualising patterns appears in the literature repeatedly. For example, “There are several statistical modelling tools that have been developed specifically for network data. But these tools were designed primarily for testing hypotheses. They do not provide a simple direct way to explore the patterning of network data – one that will permit an investigator to “see” groups and positions.” (Freeman 2005)

The primary motivation behind the development of this prototype is to examine whether using a node-link view and a specific time event view together facilitates a better understanding and identification of temporal patterns and temporal groups.

6.4 TIMER Approach

Node-link diagrams can be very useful for understanding the structural properties of a social network. However, node-link representations do not reveal the temporal dynamics of a network. In their work on analysing the Enron dataset (a large scale social network), Appan, Sundaram et al. (2006) noted that animation or graph aggregation visualisation techniques did not work very well. The approach discussed in this section combines a node-link representation with a representation of events in time without aggregation or animation. This aids in preserving what Nguyen, Eades et al. (2013) refer to as information faithfulness.


The primary purpose of the TIMER tool is to test a visual encoding technique that assists in finding temporal patterns in dynamic network data. The tool provides a simple means of filtering the dataset to particular nodes, which is essential when visually searching for patterns in large datasets. Filtering is currently achieved through a simple checkbox approach. An important aspect of the TIMER tool, however, is direct manipulation of the components by clicking and dragging them, or by clicking on them and using the mouse scroll wheel. The linking technique propagates a change in one view to the other visualisation component (see Figure 81).

Figure 81 - TIMER tool interface (labelled areas: node-link layout with scroll wheel zoom; three direct manipulation areas with drag scroll and scroll wheel zoom (swipe and move); selection panel; temporal event layout (time event diagram); Time/Date bar)


As noted elsewhere in this thesis, visualisations of graphs with a large number of edges using force-directed layouts may result in an unintelligible depiction with many occlusions and overlaps. This is true of this tool as well (see Figure 82).

However, when zooming is applied to the time panel the user’s viewport is progressively “zoomed” and entities that do not have events within this time period are removed from view. Similarly when filtering is applied to the nodes only the ones selected will be visible in the node-link layout and the time event diagram.
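The effect of zooming on the visible entities can be sketched as a simple interval test. The following Python fragment is an illustration only: the function name `visible_in_window`, the tuple layout and the rule that a call is shown when it overlaps the visible time range are assumptions, not a description of the TIMER source code.

```python
def visible_in_window(calls, window_start, window_end):
    """Return the calls overlapping the window and the nodes involved in them.

    Each call is a hypothetical tuple (caller, receiver, start, duration).
    """
    shown = [
        c for c in calls
        if c[2] < window_end and c[2] + c[3] > window_start
    ]
    # Nodes with no event inside the window drop out of both views.
    nodes = {node for c in shown for node in (c[0], c[1])}
    return shown, nodes


calls = [(200, 5, 0, 60), (1, 2, 100, 30), (3, 200, 500, 10)]
shown, nodes = visible_in_window(calls, 90, 200)
```

Node filtering can be layered on top of this by additionally discarding calls whose caller or receiver is not in the selected set.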

Figure 82 - TIMER tool showing the full dataset viewed as a node-link and temporal events


6.5 Evaluating the Prototype

To test the prototype, the task set in the VAST 2008 Cell Phone Mini-Challenge was used as a case study. The goal of the challenge is to identify Ferdinando Catalano, Estaban Catalano, David Vidro, Juan Vidro and Jorge Vidro in the dataset and to characterise the changes in the social structure throughout the time period.

The background information provided for this challenge included intelligence that the node with the identifier 200 might be Ferdinando Catalano, that Ferdinando calls his brother, Estaban, most frequently, and that David Vidro coordinates high-level activities and communications within the network. In using this data we have the benefit of being able to access the competition’s submissions and the ground truth.

However, I used only the data in the Cell Phone Call Challenge for this case study and concentrated on exploring how the TIMER tool would have assisted analysis without any prior knowledge (in fact, I only became aware that the ground truth existed after performing this analysis).

As mentioned previously, after loading the data set in full the node-link diagram becomes a “hairball” and the temporal event layout is extremely cluttered. With almost 10000 links this is not unexpected.

Figure 83 shows the first step in solving the challenge: the data point 200 (intelligence has a medium level of confidence that this is Ferdinando Catalano) is visualised with the rest of the nodes that have made calls to it. In the figure, filtering has been applied and the node-link representation shows the set of nodes to which node 200 has a directed edge.

Figure 83 - TIMER tool showing phone calls with relative duration from node 200

This figure also shows another feature of the prototype: the ability to visualise the relative length of each conversation. The width of the line in the middle panel can be adjusted to represent the length of each telephone call. Moving the slider re-adjusts the widths of the lines relative to each other.
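One way such a relative width encoding could be computed is sketched below in Python. The function `line_widths`, the linear scaling and the `min_width` floor are hypothetical choices made for illustration; the scaling used by the prototype is not specified here.

```python
def line_widths(durations, slider, min_width=1.0):
    """Map call durations to line widths relative to the longest call.

    `slider` is the width given to the longest call; shorter calls scale
    linearly, with a floor so that very short calls stay visible.
    """
    if not durations:
        return []
    longest = max(durations)
    return [max(min_width, slider * d / longest) for d in durations]


wide = line_widths([30, 60, 120], slider=8.0)
narrow = line_widths([30, 60, 120], slider=2.0)
```

With the slider at 8.0 the three calls are drawn at widths 2.0, 4.0 and 8.0; at 2.0 the two shortest calls both fall back to the minimum width, which shows how a low slider setting flattens the differences.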

After establishing the identities of the recipients of calls from node 200, they too are visualised in addition to node 200 (see Figure 84). It is immediately apparent from this visualisation that communication to and from this sub network drops off sharply after day 7. (The Time/Date bar has a tool tip, and a timer line appears in the central panel when the mouse hovers over the bar.)


Figure 84 - TIMER showing all phone calls to nodes 200, 5, 3, 2 and 1 (figure annotation: very little communication after day 8)


An additional piece of intelligence for this challenge is “Ferdinando would call brother Estaban most frequently”. If this is true, it is easy to identify node 5 as Estaban, as it has the most incoming links from node 200. The next piece of intelligence is “David Vidro coordinates high-level Paraiso activities and communications”. This statement could be interpreted as an indication that he would have the highest level of communication in the sub network, or that he is central to others who communicate heavily. An appropriate metric representing this relationship is “betweenness centrality”. As this prototype did not provide filtering on betweenness centrality, I opted for another tool to calculate the degree and betweenness values of the nodes in this network. Node 5 had the highest betweenness centrality, followed by nodes 1, 0, 309 and 306. Given that I have tentatively identified node 5 as Estaban, it might be reasonable to assume that David Vidro is one of the other nodes, possibly the one with the highest betweenness among them. I therefore tentatively chose node 1 as David Vidro, supported by the visualisation, which shows that node 1 makes a high number of calls to nodes 200 and 5.
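The two measures used in this step, call frequency from node 200 and betweenness centrality, can be sketched in Python as follows. The call list is hypothetical toy data, and the betweenness function is a plain implementation of Brandes' algorithm for an unweighted, undirected graph; which variant the external tool used is not stated, so this choice is an assumption.

```python
from collections import Counter, deque

# Hypothetical (caller, receiver) pairs; not the challenge data.
calls = [(200, 5), (200, 5), (200, 1), (1, 5), (1, 200), (2, 5), (3, 5)]

# Which node does 200 call most often?
callee_counts = Counter(r for c, r in calls if c == 200)
most_called = callee_counts.most_common(1)[0][0]

# Undirected adjacency, ignoring repeated calls.
adj = {}
for c, r in calls:
    adj.setdefault(c, set()).add(r)
    adj.setdefault(r, set()).add(c)


def betweenness(adj):
    """Brandes' algorithm: unnormalised betweenness for each node."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []                        # nodes in order of discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)     # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


scores = betweenness(adj)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network node 5 is both the most frequent recipient of calls from node 200 and the node with the highest betweenness, mirroring the situation described above in which the second-ranked node becomes the candidate for the coordinator.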

In the next phase I attempted to determine the entities that communicate with node 200’s first order network (adjacent nodes). Scrolling across the bottom panel, it quickly becomes apparent that there are a number of nodes outside the ones listed above that have a high degree of communication. It is also apparent that the calls are not distributed across the whole 10 days (visually recognising this using the node-link view would be impossible). Investigating further, the additional nodes that have a high degree begin communicating on the 8th day and continue until the 10th day. This step brings to light that nodes 1, 2, 3 and 5 communicate with a large set of nodes which show a pattern of communication from day 1 to day 7, and thereafter the pattern of communication is between nodes 309, 306, 360 and 397 (shown in Figure 85).
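The split into two phases observed visually here can also be expressed as a small computation over each node's first and last active day. The Python sketch below uses hypothetical toy calls and assumes, following the observation above, that the boundary lies between day 7 and day 8.

```python
# Hypothetical (caller, receiver, day) triples; not the challenge data.
calls = [
    (1, 5, 1), (1, 200, 3), (5, 2, 7),
    (309, 306, 8), (306, 397, 9), (309, 360, 10),
    (276, 1, 2), (276, 309, 9),
]

# First and last day on which each node takes part in a call.
span = {}
for caller, receiver, day in calls:
    for node in (caller, receiver):
        first, last = span.get(node, (day, day))
        span[node] = (min(first, day), max(last, day))

phase_one = sorted(n for n, (first, last) in span.items() if last <= 7)
phase_two = sorted(n for n, (first, last) in span.items() if first >= 8)
both = sorted(n for n, (first, last) in span.items() if first <= 7 and last >= 8)
```

Nodes that appear in `both` are the ones that bridge the two periods, which is the kind of node the subsequent analysis relies on when linking the first and second sets of phones.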

Figure 85 - Zoomed in from the start of day 8

When this second set of nodes is selected within the prototype, the visualisation shows that they all make calls to node 300, that node 309 makes a high number of calls to node 185, and that node 260 makes a high number of calls to node 276. When all of these nodes are selected and their intercommunication is visualised, it is apparent that node 276 is the only node in this sub network that makes calls to the same node (360) across the two phases (days 1 to 7 and days 8 to 10). Node 276 also makes calls to node 1 in the first phase and node 309 in the second phase (shown in Figure 86).

Figure 86 - Second set of nodes

From the information obtained (described above), a reasonable conjecture would be that node 200 is Ferdinando Catalano (essentially given), node 5 is Estaban Catalano, and node 1 is David Vidro. However, there is an apparent change in the network on day 8, with a completely different set of nodes dominating in the second phase. Given that the callers to the sub networks span both phases, one could surmise that either something dramatic happened to nodes 1, 2, 3 and 5, or they are the same individuals as nodes 306, 309, 360 and 397 using different communication devices. If the latter is assumed, then for the same reasons that node 5 is presumed to be Estaban Catalano and node 1 David Vidro, node 306 could be Estaban Catalano and node 309 could be David Vidro during the second phase. This is reinforced by the interaction pattern of node 276, which interacts with both node 1 and node 309 over those phases.

If we continue with the assumption that the actors in the network switched phones, we need to find an equivalent node for node 200. Reviewing the node-link display reveals that node 300 is central to the 306, 309, 360 and 397 sub network in a similar way to node 200 in the phase 1 network. From this we could assume that nodes 200 and 300 are the same individual. With no additional information, one could also assume that nodes 2 and 3, and likewise nodes 360 and 397, are Juan Vidro and Jorge Vidro respectively. One could base this on their prominence in the network and their communication with the nodes identified as David Vidro.

6.6 Discussion

The case study described above allowed us to explore the use of visualisation to solve the VAST 2008 Cell Phone mini-challenge. The visualisation proved useful for exploring the temporal aspects of the challenge, which it seems were an important factor in explaining the network structure and evolution. When comparing the case study described above with the entries submitted to the VAST 2008 competition, it is clear that all techniques require the injection of human intuition to interpret the results of the visualisation. However, determining what effect the visualisation has on triggering that intuition is difficult. Nevertheless, all the submissions highlight the importance of visualisation in this process.

Comparing the award winner for time visualisations of cell phone activity (Perer 2008) with the TIMER tool, it is apparent that the TIMER tool provides a higher fidelity visualisation than the “stacked histogram” used in the SocialAction tool. However, the stacked histogram approach provides a clearer “overview” visualisation. The ability to visualise the interaction over different time scales is very valuable, particularly when an analyst is interested in which event occurred first and may have triggered subsequent events.

The solution to the VAST 2008 mini-challenge is not definitive but organisers offer one possibility (shown in Table 7).

Table 7 - One possible solution of the VAST 2008 Cell Phone Calls Mini-challenge

Person                 Day 1-7 Cell    Day 8-10 Cell
Ferdinando Catalano    200             300
David Vidro            1               309
Estaban Catalano       5               306
Jorge Vidro            2               397
Juan Vidro             3               360


The analysis performed in the case study came to conclusions similar to those described in the solution provided by the competition organisers.

However, the TIMER tool does not have the ability to determine or visualise centrality measures and would benefit greatly if this ability were included. The selection of nodes could be enhanced by adopting filtering based on graph characteristics such as n-level adjacency and centrality. In addition, in a directed graph such as that of the mini-challenge, it would be beneficial if the selection of ‘to’ and ‘from’ nodes could be interchanged.

The prototype discussed in this chapter was designed for modern computer displays. Specifically the application was designed to run on a display resolution of not less than 1600 x 1000 pixels. Testing has shown that redesigning the application for displays such as laptops with 1280 x 800 pixels does not provide enough screen area if the shown component layout is maintained. It is important when using this visualisation technique that both the node-link and the temporal layout are visible concurrently. This limits the minimum resolution screen in which this technique can be applied.

The case study demonstrated the usefulness of having the event relationship component linked and simultaneously viewable in the user interface. Often it was the temporal relationship combined with the topological relationship which was the trigger for human intuition. The prototype had limited filtering abilities, and although manually selecting nodes of interest was sufficient to evaluate the visual encoding approach, the evaluation would have been much easier with the ability to query and display node groups that fit particular criteria. In addition, the ability to switch the selection of the “to” and “from” nodes would have made the analysis work conducted on the VAST case study much less tedious. If this visual encoding technique is being considered for a social network tool, these two filter and display functions should be included as a minimum.

The interaction approach of directly scrolling the data areas is very similar to the way modern touch interfaces directly manipulate areas without the use of scrollbars. Pulling the data through time and pulling nodes into view is an effective approach; however, having three similar areas adjacent to each other might be unclear to some users without prior instruction.

6.7 Implementation

6.7.1 Program Code

The TIMER application follows a model-view-controller software architecture pattern and was written in the Java programming language and tested with the Oracle® Java Development Kit (JDK 1.7)²⁵. The application makes use of the Java Universal Network/Graph Framework software (JUNG) library²⁶ to render the node-link diagram in the force-directed panel.

²⁵ The JDK is available at http://www.oracle.com/technetwork/java/javase/downloads/index.html accessed [23/07/2013]

The intention of the design is to be flexible and to enable its use with a range of datasets. The VAST data was modelled with three classes: VASTData, VASTDate and VASTDateFormatter. The date classes are essential because date and time can be represented differently and different datasets cover a range of time scales. In addition, the prime purpose of the tool is to provide a temporal view of the data, and as such it is important to get an accurate representation of the time dimension.

The application can be manipulated by a user in any of the six areas shown in Figure 87. Most of the interaction is accomplished directly with the mouse and scroll wheel. The function of the mouse buttons and scroll wheel changes depending on the location of the mouse. This is akin to the way users interact using a finger with the now common tablet computing devices.

²⁶ The JUNG library is licensed and made freely available under the Berkeley Software Distribution (BSD) license. JUNG is available at http://jung.sourceforge.net [21/07/2013]

If the mouse is positioned over the node-link graph, the scroll wheel will “zoom” the graph in or out. Holding the left mouse button down will translate or move the position of the graph. If the mouse is positioned over the EdgeTopPanel or the EdgeBottomPanel, the left mouse button will horizontally translate the node position. If the mouse is held over the ScrollPane or the Times pane, the scroll wheel will provide a “zooming” function. That is, the time range represented in the viewable area will become smaller or larger. In this position the left mouse button will translate the time. That is, the displayed times are advanced or regressed.
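The zooming and translating of the visible time range can be illustrated with a little arithmetic on the range boundaries. The Python functions below are a hypothetical sketch: the names `zoom` and `translate`, the use of a scaling factor and the choice of the range midpoint as the default anchor are assumptions, not the behaviour of the TIMER source code.

```python
def zoom(start, end, factor, anchor=None):
    """Scale the visible time range about an anchor time.

    A factor below 1 narrows the range (zoom in); above 1 widens it.
    """
    if anchor is None:
        anchor = (start + end) / 2
    return anchor - (anchor - start) * factor, anchor + (end - anchor) * factor


def translate(start, end, delta):
    """Shift the visible time range forwards (delta > 0) or backwards."""
    return start + delta, end + delta


zoomed = zoom(0, 100, 0.5)          # halve the visible range about its centre
moved = translate(*zoomed, 10)      # then advance it by 10 time units
```

Anchoring the zoom at the time under the mouse pointer, rather than the midpoint, would keep the event being inspected in place while the range shrinks around it.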

Figure 87 - Major TIMER application user interaction areas (direct user manipulation): SelectionPanel, GraphPanel, EdgeTopPanel, SliderPanel, TimePanel, EdgeBottomPanel and Times Scroll Pane

Edges are maintained in a linked list in chronological order allowing a variable sliding window approach to be applied to the window translation/zoom recalculation algorithm. An alternative approach would be to binary search for the new start and end points, but that would require a large amount of contiguous memory for very large datasets, which may be impractical.
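The sliding window idea can be sketched as two boundary positions that are walked forwards or backwards from where they were after the previous translation or zoom, instead of being searched for from scratch. The Python class below is an illustration under stated assumptions: it uses an array-backed Python list rather than a linked list, the class and attribute names are hypothetical, and the half-open window [start, end) is a convention chosen for the example.

```python
class SlidingWindow:
    """Tracks which chronologically sorted event times fall in [start, end)."""

    def __init__(self, times):
        self.times = times   # event times, sorted ascending
        self.lo = 0          # index of the first event with time >= start
        self.hi = 0          # index of the first event with time >= end

    def move(self, start, end):
        """Reposition the window and return the events it now covers."""
        t = self.times
        # Walk each boundary from its previous position in either direction.
        while self.lo < len(t) and t[self.lo] < start:
            self.lo += 1
        while self.lo > 0 and t[self.lo - 1] >= start:
            self.lo -= 1
        while self.hi < len(t) and t[self.hi] < end:
            self.hi += 1
        while self.hi > 0 and t[self.hi - 1] >= end:
            self.hi -= 1
        return t[self.lo:self.hi]


window = SlidingWindow([1, 3, 5, 8, 13, 21])
first = window.move(3, 10)    # initial view
second = window.move(5, 22)   # pan and zoom out
third = window.move(0, 4)     # jump back towards the start
```

Because interactive panning and zooming usually change the range only slightly, each call tends to move the boundaries by a few positions, which is the property that makes the approach workable on a sequentially accessed structure.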

The pseudo code in Appendix B shows the rendering algorithm of the TimePanel. This is the principal panel in terms of temporal analysis as it shows the patterns that are formed by the relational links. The source code of this application is available from https://github.com/TIMERTool/TIMERTool


Prototypes Evaluation

Chapter 7. Prototypes Evaluation

Various dissertations describe the challenge of information visualisation evaluation (Munzner 2000; Lee 2006; Henry 2008). This is supported by many researchers who draw attention to the numerous issues associated with the evaluation of visualisations (Plaisant 2004; Hall, McMullen et al. 2006; Kang, Plaisant et al. 2007). For example, Farrugia and Quigley (2008) state, “It is difficult, if not impossible, to claim that a group of measures can be used as an absolute indicator of optimal layout or presentation. Instead, quantitative claims from a new algorithm provide an indication that it improves upon certain criteria that are currently believed to impact upon perception”.

According to Nielsen and Molich (1990) there are four ways to evaluate a user interface:

• Formally by analysis techniques,

• Automatically by a computerised procedure,

• Empirically by experiments with test users, and

• Heuristically comparing the interface (that is, logically comparing parameters of the interface with established techniques)


Formal approaches are a field of research but currently cannot be applied to real software development projects. Likewise automatic evaluation is currently infeasible except for a few very primitive checks. That leaves empirical tests or heuristic evaluations.

Empirical approaches are a common methodology used in the visualisation field to produce statistically significant results. This necessitates controlled conditions on low-level tasks. However, according to Riche (2009) this is particularly challenging for the information visualisation community because of the difficulty in decomposing a complex high-level task (e.g. find insight) into low level ones (e.g. find a pattern in a graph). This in turn raises doubts on the validity of the results of such experiments.

It can be difficult to gauge the effectiveness of a visualisation approach because incidental choices that are made during the system implementation can have a dramatic impact on its effectiveness, as demonstrated by Ware (2000). Experiments designed to test the effectiveness of a visualisation therefore either have to be targeted at testing a very simple visual element or have to gather data on the effectiveness of the approach in a tool, in situ. The in situ approach is advocated by Shneiderman and Plaisant (2006). As the majority of the visualisation approaches described in this thesis are demonstrated on simple prototypes, longitudinal approaches to evaluation are infeasible.

The inability to evaluate new concepts over long periods of time can be an issue. Information visualisation is a means of discovery, and therefore the ultimate goal of information visualisation researchers is to provide visualisations that help discover more insights (Riche 2009). The problem is that discovery is rarely an instantaneous event; quite the contrary. Discovery is most often generated by studying and manipulating the data for extended periods of time. It is very difficult to design short controlled experiments or user testing to determine a visualisation’s contribution to making discoveries. It is relatively easy to measure ‘finding’ and ‘navigating’ tasks, but effective exploration and discovery requires active intellectual engagement, which is difficult to trigger and control (Plaisant 2004).

The experience and prior knowledge of a network also has a great effect on how well users can ascertain information from the visual representation of a graph. The most appropriate choice of representation depends on the detailed properties of the connectivity model and the specific task that needs to be carried out (Keller, Eckert et al. 2006). That is, depending on what is being modelled and even the personal preference of the user, different visualisations may be advantageous.

Studies such as these suggest the most accurate approach to evaluation is the application of comprehensive longitudinal studies in real environments using real world scenarios (Shneiderman and Plaisant 2006).

In order for application designers to consider using the visualisation approaches discussed in this thesis, it is important to know which visualisation tasks are supported by each approach. To assist in this, the chapters herein provide a discussion of case studies, evaluations against taxonomies and demonstration layouts. It is recognised that the effectiveness of a social network visualisation tool needs to be measured by how effectively social network analysts gain insights from the tool. Yi, Elmqvist et al. (2010) indicate that approaches attempting to measure the level of insight gained require long-term and multifaceted evaluation techniques, such as the strategies discussed by Shneiderman and Plaisant (2006), which in turn require many users and a number of years for a proper study.

In the visualisation community the rate of adoption of visual encoding techniques is considered a useful measure of the value of a particular technique. However, what are now considered valuable visualisation encoding techniques can take a long time (considerably longer than that allowed for a DIT dissertation) to gain acceptance. For example, techniques such as Treemaps, dynamic scatterplots and datamaps took ten to fifteen years to gain significant visualisation community adoption and acceptance (Plaisant 2004).

The premise of visual encoding techniques is that they reduce the cognitive load of users. However, even though physiological sensing of heart rate, respiration, skin conductance, muscle activity and brain measurements is helpful in measuring cognitive load, there are issues in detecting whether the effect is positive or negative (Riche 2009).


7.1 Evaluation

In this thesis I have presented case study evaluations and discussions of the visualisation prototypes in their respective chapters. Each of the chapters uses a different case study data set. The ABGV prototype uses a random synthetic data set, the SNAC2 tool uses real interaction data collected by analysts, the TIPAD prototype uses data inferred from a movie script, and the TIMER tool uses a well-known synthetic data set with a known ground truth.

According to Plaisant (2004), case studies can be extremely convincing for potential adopters working in the same application domain as the one addressed by the study. Other approaches, such as long-term and multifaceted evaluation or studying adoption rates, are beyond the time and resources available within the confines of this DIT thesis. I have therefore opted to use a logical comparative evaluation approach in addition to the case studies already presented. A comprehensive assessment based on the taxonomy of Lee, Plaisant et al. (2006) has been undertaken on all prototypes. This is an important first step of a longitudinal approach which will evaluate the application of the visual encoding techniques described herein.

One of the most important aspects of evaluation is to understand under what circumstances visualisation approaches should be used, how they compare and what tasks they best serve. This must be the first step in assessing any new approach so it can be applied to the correct problems and subsequently evaluated via longitudinal studies. To evaluate the range of visual encoding techniques presented in this thesis I have opted for an approach using task taxonomies. These taxonomies (and extensions to them) are used to assess the visual encoding effectiveness of the prototypes discussed in this thesis. The task taxonomies provide heuristics grounded in objective research on which one can base predictions of the efficacy of visual representations in supporting fundamental visualisation tasks. The research context and taxonomies are an essential part of the research method. Taxonomies have been utilised as an important evaluation measure in a number of other visualisation dissertations and research papers (Munzner 2000; Bezerianos, Chevalier et al. 2010; Spritzer and Freitas 2012) and are used in this thesis to form the basis of a grounded approach in which the visualisation encoding techniques are situated within the context of intended use.

Although an assessment in respect to task taxonomies is an important first step, many taxonomic approaches do not explicitly address temporal aspects of network visualisation tasks. Three of the prototypes in this thesis are aimed at addressing aspects of temporal network visualisation and therefore evaluating these prototypes against fundamental temporal tasks is informative. To facilitate this, I have augmented Lee’s taxonomy with a temporal version of each of the fundamental tasks. I have also included a temporal version of the test question. The full assessment is presented below with a summary of this evaluation shown in tables 4 to 9.

In addition to this, I also present an evaluation based on the network evolution analysis taxonomy of Ahn, Plaisant et al. (2011), with a summary of the results shown in Table 13. This evaluation is carried out on the TIMER, TIPAD and SNAC2 prototypes but not on the ABGV prototype, as the first three are targeted at visualising various dynamic networks whereas the ABGV is currently not capable of utilising a temporal data set.

7.1.1 Object Focus

The evaluation detailed in this chapter starts with characterising the prototypes according to the tasks they focus on. This is contrasted with standard node-link and matrix representations. The ratings for the node-link and matrix representations in Table 8 (and all other tables in this section) are largely drawn from the results shown in (Lee, Plaisant et al. 2006) in which they compare a set of visualisation tools with the standard node-link and matrix representations. A bold tick represents particular strengths of the visualisation approach.

Table 8 - Characterisation of standard approaches and the prototypes based on object focus (cf. Lee, Plaisant et al. 2006)

Object Focus Node-link Matrix TIPAD SNAC2 TIMER ABGV Nodes       Temporal support   Links       Temporal support    Paths     Temporal support Graphs    Temporal support   Groups  Temporal support Connected Components       Temporal support Clusters      Temporal support 

Table 8 compares the object focus of regular node-link and matrix representations with that of the prototypes presented in this thesis. The table is similar to the one presented in Lee, Plaisant et al. (2006); however, it further categorises support for each object type in terms of static features and temporal aspects. The level of support is based on their characterisation and definition of the object focus, which can be summarised as:

• Nodes – In social networks the terms node and actor are often used interchangeably. Nodes intrinsically have attributes consisting of a degree and a label.

• Links – In social network analysis the term is often used interchangeably with tie or arc; links represent relationships.

• Paths – A path is an alternating sequence of nodes and links.

• Graphs – For the purpose of Table 8, graphs are considered to be objects which users might want to compare or in which they might want to see changes over time.

• Groups – A set of related nodes, such as nodes with common attribute values or nodes of interest to users.

• Connected Components – A connected component is a maximal connected subgraph.

• Clusters – A cluster is a set of objects that are spatially close together. For graphs, this is a sub-graph of connected components whose nodes have high connectivity.

Table 8 shows that SNAC2 has the greatest coverage on object focus supporting a focus on nodes, links, paths, graphs and clusters both statically and temporally. It also provides the most temporal support from an object focus perspective.

Those approaches that use or incorporate a node-link representation such as the TIMER, SNAC2 and ABGV naturally support the same object focus as typical node-link representations. The TIMER tool additionally supports certain temporal aspects of those objects. The ABGV visualisation approach extends support to ‘group objects’ and the SNAC2 visualisation improves on classic node-link representations with support for temporal aspects. The SNAC2 prototype also facilitates the comparison of graph objects over time.

7.1.2 Low Level Tasks

Lee, Plaisant et al. (2006) demonstrate how all complex tasks can be seen as a series of low-level tasks performed on those objects. In this section the visualisation approaches embodied by the prototypes are tested against their ability to perform each primary fundamental task and, by extension, their ability to support high-level complex tasks. Each task has a static, visualisation-based definition and an illustrative test associated with it. In addition, each is accompanied by a temporal version of both the definition and the test. Each test provides a targeted evaluation of the static and dynamic performance of each prototype, which can subsequently be compared directly with classic node-link and matrix representations.

An individual test summary table is provided at the end of each task evaluation and the individual tables are summarised in Table 9 which provides a complete overall summary of the assessment of low level tasks. The comparison to classic node-link and matrix representations was supported via the assessment contained in the Lee, Plaisant et al. publication.

1. Retrieve Value

Given a set of cases, find attributes of those cases. In a temporal context this task would be rephrased as: “Given a set of cases, find attributes of those cases at a particular point in time”

Test: In which scenes does character A perform? An equivalent temporal test would be: “What is the third thing character A says in scene 4?”

Performance: SNAC2 partially supports this through displaying the type of action, interaction or communication act and the current global state of TST prosecution. Selection of the “given set” of cases is enabled via checkboxes. The TIPAD concept demonstrator partially supports this, primarily through navigation and interaction. It supports the temporal version of this question by showing which scenes a character contributes dialogue to, what the dialogue is at that point in time, the order of contribution in the progression of dialogue during a scene, what the scene context is, and the out-degree of the character. The Attribute-Based Graph Visualisation approach supports this simply by virtue of the additional attribute nodes introduced into the force-directed layout and the colour of the nodes, but does not support temporal aspects. The TIMER tool does not support this task.

The TIPAD and SNAC2 tools provide link attribute information in the form of dialogue and actions through time. Depending on the dataset, the visual encoding could accommodate other time-based link attributes.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Visual Retrieval of Semantic visual retrieval edge retrieval retrieval of of node attribute supported categorical degree value (edge attribute supported attribute supported value) temporal  

Metrics such as the degree of a node can be considered inherent attributes of nodes. Node-link diagrams enable the visual determination of node degree and therefore the standard node-link representation receives a partial support tick in the rating above.
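The difference between the static and the temporal form of the Retrieve Value test can be illustrated with a small data-level sketch. It is not taken from any of the prototypes: the `Utterance` record, its field names and the sample dialogue are hypothetical, and the sketch assumes only that each line of dialogue is stored with its scene number and its order within the scene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    scene: int      # scene number, used here as the time index
    order: int      # position of the line within its scene
    speaker: str
    text: str


# Hypothetical sample data, for illustration only.
SCRIPT = [
    Utterance(3, 1, "A", "Hello."),
    Utterance(4, 1, "A", "Where is B?"),
    Utterance(4, 2, "B", "Here."),
    Utterance(4, 3, "A", "Good."),
    Utterance(4, 4, "A", "Let us go."),
    Utterance(5, 1, "B", "Wait."),
]


def scenes_of(script, speaker):
    """Static test: in which scenes does the speaker perform?"""
    return sorted({u.scene for u in script if u.speaker == speaker})


def nth_utterance(script, speaker, scene, n):
    """Temporal test: the n-th thing (1-based) the speaker says in a scene."""
    lines = sorted(
        (u for u in script if u.speaker == speaker and u.scene == scene),
        key=lambda u: u.order,
    )
    return lines[n - 1].text if len(lines) >= n else None
```

With the sample data, `scenes_of(SCRIPT, "A")` answers the static test and `nth_utterance(SCRIPT, "A", 4, 3)` answers the temporal one.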

2. Filter

Given a set of conditions on attribute values, find data cases satisfying those conditions. Used in a temporal context, this task could be rephrased as “Given a set of conditions on attribute values, find data cases satisfying those conditions over a particular time period or at a particular point in time.”

Test: Which workers belong to the Finance Department? An equivalent temporal test is: Which workers belong to the Finance Department at time x?

Performance: This task is not fully supported by the ABGV, TIMER and TIPAD concept demonstrators or the SNAC2 prototype. The ABGV prototype can answer the example question purely because each node is tied to and coloured by a categorical attribute. Temporal filtering is applied in the SNAC2 tool, constraining the display to a time window and making apparent the link attributes during that period in the form of dialogue or events. Temporal filtering is also applied in the TIMER tool: zooming in constrains the event and node-link area to a time window; however, the only attribute visually discernible is the link degree. These prototypes offer a very limited ability to apply filtering; however, there is no reason why more advanced filtering could not be implemented in applications implementing these visual encoding approaches. The results of such filtering could be revealed via colour highlighting, for example. Currently the filtering applied in the TIPAD tool consists of selecting a character and filtering out everyone except those who co-appear with that character in movie scenes. The TIMER tool applies a minimal amount of filtering through the use of node selection checkboxes. However, this does not filter attribute values and merely displays connections to or from the selected nodes.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV Visual Minimum level of filtering is available in retrieval of the prototypes. The visualisation sets via static approach does not exclude the possibility associated of more advanced filtering. Potentially attribute visualised via exclusion or highlighting value temporal  
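As an indication of what more advanced attribute filtering might involve at the data level, the following sketch answers both forms of the Finance Department test. It is an assumption-laden illustration rather than a description of any prototype: the `Membership` record, its fields and the sample records are hypothetical, and time is assumed to be a simple integer step.

```python
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    worker: str
    department: str
    start: int              # first time step of the membership (inclusive)
    end: Optional[int]      # last time step (inclusive); None means still current


# Hypothetical sample data, for illustration only.
RECORDS = [
    Membership("ann", "Finance", 0, 4),
    Membership("ann", "Sales", 5, None),
    Membership("bob", "Finance", 2, None),
    Membership("cy", "Sales", 0, None),
]


def members(records, department, at=None):
    """Workers in `department`; if `at` is given, only those belonging at that time."""
    result = set()
    for r in records:
        if r.department != department:
            continue
        if at is None or (r.start <= at and (r.end is None or at <= r.end)):
            result.add(r.worker)
    return sorted(result)
```

The resulting set could then be passed to a view for exclusion or colour highlighting, as suggested in the rating above.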

3. Compute Derived Value

Given a set of data cases, compute an aggregate numeric representation of those data cases (e.g. average, median, and count). In a temporal context, this task could be rephrased as “Given a set of data cases and a period of time, compute an aggregate numeric representation of those data cases.”

Test: What is the betweenness value of participant j? An equivalent temporal test question would then be: “What is the betweenness value of participant j over time period t?”

Performance: The ABGV and the TIMER prototypes do not compute additional values. The TIPAD application computes the number of co-appearances with an ego and displays them as part of the visualisation. SNAC2 visualises rolling metrics showing the progression of centrality measures over time windows of varying length and, in addition, visualises metrics for selected time periods, phases or particular events. The other visualisation prototypes do not include rolling or static metrics. These would, however, be simple to implement, and applications beyond these prototypes would benefit from their inclusion.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV Only Numeric Computed values not individual computation static count of centrality currently supported supported visualised  temporal 
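The idea of a rolling metric can be sketched in a few lines. The example below is not the SNAC2 implementation: it uses degree (the number of distinct neighbours) as a stand-in for betweenness, which would additionally require shortest-path counting, and the event list, function names and integer time stamps are all hypothetical.

```python
from collections import defaultdict

# Hypothetical time-stamped links: (time, source, target).
EVENTS = [(1, "j", "k"), (2, "j", "m"), (2, "k", "m"), (5, "j", "k"), (6, "m", "n")]


def degree_in_window(events, start, end):
    """Number of distinct neighbours of each node for events with start <= t <= end."""
    neighbours = defaultdict(set)
    for t, u, v in events:
        if start <= t <= end:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {node: len(adj) for node, adj in neighbours.items()}


def rolling_degree(events, node, width, t_min, t_max):
    """Degree of `node` in each window [t, t + width - 1] as t slides from t_min."""
    return [
        (t, degree_in_window(events, t, t + width - 1).get(node, 0))
        for t in range(t_min, t_max - width + 2)
    ]
```

Calling `degree_in_window(EVENTS, 1, 6)` corresponds to the static test over the whole data set, while `rolling_degree(EVENTS, "j", 3, 1, 6)` yields one value per three-step window and so corresponds to the temporal test.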

4. Find Extremum

Find data cases possessing an extreme value of an attribute over its range within the data set. In a temporal context, this task could be rephrased as: “find periods where data cases possess an extreme value of an attribute.”

Test: Which character participates in the most scenes? An equivalent temporal test question is: “In what period does person J call person K the most?”

Performance: The task is partially supported visually by TIPAD. A user of TIPAD can answer the test question easily, as the characters are currently sorted by participation in scenes by default. However, they could be sorted by any other attribute value, for example to answer ‘who has the most dialogue?’. The TIPAD application also shows the rate of participation in scenes over time via the slope of the horizontal bar. SNAC2 shows extreme values of selected metrics over time via the rolling metrics. The TIMER tool shows nodes with high degree, either overall or over set periods of time, via the embedded node-link representation. The time event area shows areas of high communication, as discussed in the case study.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Visual Visual Limited Limited support for support for visual visual appearances graph support for support for metric degree degree temporal   
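The temporal test for this task amounts to finding the period in which a count is largest. A minimal sketch follows, assuming a hypothetical call log in which days are numbered from 1 and periods are of fixed length; none of the names below come from the prototypes.

```python
from collections import Counter

# Hypothetical call log: (day, caller, callee), days numbered from 1.
CALLS = [
    (1, "J", "K"), (2, "J", "K"), (3, "J", "L"), (8, "J", "K"),
    (9, "J", "K"), (10, "J", "K"), (15, "K", "J"),
]


def busiest_period(calls, caller, callee, period_length):
    """Return ((first_day, last_day), count) for the fixed-length period with most calls.

    Ties are resolved in favour of the earliest period; None is returned
    when the caller never calls the callee.
    """
    counts = Counter(
        (day - 1) // period_length
        for day, src, dst in calls
        if src == caller and dst == callee
    )
    if not counts:
        return None
    index, count = max(counts.items(), key=lambda item: (item[1], -item[0]))
    first_day = index * period_length + 1
    return (first_day, first_day + period_length - 1), count
```

For the sample log, `busiest_period(CALLS, "J", "K", 7)` identifies the week in which J calls K most often.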

5. Sort

Given a set of data cases, rank them according to some ordinal metric. In a temporal context, this task could be rephrased as: “Given a set of data cases and a period of time, rank them according to some ordinal metric over that time period.”

Test: Sort characters by participation in scenes. An equivalent temporal test question would then be: “Sort characters by participation from scene 6 to scene 30.”

Performance: Currently the TIPAD application sorts characters by participation in scenes by default (this is equivalent to out-degree). Any other sorting algorithm which sorts on ordinal attribute values could be implemented. The ABGV, SNAC2 and the TIMER do not rank and visualise entities according to an ordinal metric.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Possibility for extended Not supported. These prototypes do support not rank and visualise entities demonstrated according to some ordinal metric via degree temporal

6. Determine Range

Given a set of data cases and an attribute of interest, find the span of values within the set. In a temporal context this task could be rephrased as: “Given a set of data cases and an attribute of interest, find the span of values within the set over a particular time period.”

Test: What is the range of participation in scenes in the movie? An equivalent temporal test question would be: “What is the range of participation in movie scenes 6 to 30?”

Performance: This task is currently not supported by the TIMER, SNAC2 or ABGV approaches. TIPAD does not support it directly; however, it does offer the ability to read the out-degree value directly from the ordered list of characters. A user could therefore find the minimum and maximum of those values and determine the range manually. This task could easily be supported with additional components to the prototypes.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV Static Not currently supported. Visualisation of Only manual range is not available. This could easily be method on degree supported with additional components. supported

temporal
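The Sort and Determine Range tests share the same underlying computation: count participation within an optional scene window, then either rank the counts or take their minimum and maximum. The sketch below illustrates this under stated assumptions; the two-mode appearance list and all function names are hypothetical and are not part of TIPAD.

```python
from collections import Counter

# Hypothetical two-mode data: (scene number, character).
APPEARANCES = [
    (1, "A"), (1, "B"), (6, "A"), (7, "A"), (7, "C"),
    (12, "B"), (12, "A"), (30, "C"), (31, "B"),
]


def participation(appearances, first_scene=None, last_scene=None):
    """Count the distinct scenes each character appears in, optionally within a range."""
    seen = {
        (scene, character)
        for scene, character in appearances
        if (first_scene is None or scene >= first_scene)
        and (last_scene is None or scene <= last_scene)
    }
    return Counter(character for _, character in seen)


def ranked(counts):
    """Sort: characters by descending participation, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def value_range(counts):
    """Determine Range: (minimum, maximum) participation, or None if there is no data."""
    return (min(counts.values()), max(counts.values())) if counts else None
```

Note that characters who do not appear at all within the chosen window are omitted from the counts rather than reported with a value of zero; whether that is desirable would depend on the application.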

7. Characterise Distribution

Given a set of data cases and a quantitative attribute of interest, characterize the distribution of that attribute’s values over the set. In a temporal context this task could be rephrased as: “Given a set of data cases and a quantitative attribute of interest, characterize the distribution of that attribute’s values over the set during a specific time period.”

Performance: This task is not supported by the ABGV prototype. SNAC2 does make characterising the distribution of centrality measures possible via the use of the scatterplot and separate bar charts. The TIPAD and TIMER prototypes offer some ability to characterise distribution via the density of lines. For example, in the TIPAD application one is able to visually determine the co-appearance distribution of characters with a selected character via line patterns. The TIMER tool provides a visualisation of the distribution of communication to or from nodes via the generated line patterns.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Visualisation Ability to Visualisation of of scene co- visualise and the distribution   appearances characterise of via line distribution of communication patterns centrality via line patterns temporal   
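A numeric counterpart to these visual impressions of distribution is a simple histogram. The following sketch computes a degree distribution, optionally restricted to a time window; the link list and the function name are hypothetical, and degree is chosen here only as a convenient quantitative attribute.

```python
from collections import Counter

# Hypothetical undirected, time-stamped links: (time, node, node).
LINKS = [(1, "a", "b"), (1, "a", "c"), (2, "a", "d"), (3, "b", "c"), (4, "d", "e")]


def degree_distribution(links, start=None, end=None):
    """Map each degree value to the number of nodes with that degree in the window."""
    neighbours = {}
    for t, u, v in links:
        if (start is None or t >= start) and (end is None or t <= end):
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    histogram = Counter(len(adj) for adj in neighbours.values())
    return dict(sorted(histogram.items()))
```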

8. Find Anomalies

Identify any anomalies within a given set of data cases with respect to a given relationship or expectation, e.g. statistical outliers. In a temporal context this task could be rephrased as: “Identify any anomalies within a given set of data cases with respect to a given relationship or expectation over a specific time period.”

Test: Is there a person in node 200’s first order network which node 200 calls often? An equivalent temporal test question would be: “Is there a person in node 200’s first order network which node 200 calls often between day 1 and day 7?”

Performance: This task could be considered partially supported by all the prototypes, depending on the nature of the anomaly. However, there will be many anomalies that are impossible to detect. None of the prototypes supports the identification of statistical outliers based on statistical calculations. However, the TIMER tool has the potential to assist in the detection of abnormal communication patterns. The TIPAD application is designed to visualise movie scripts and as such provides little opportunity to detect anomalies. However, using the PAD approach may well provide the ability to detect abnormal associations in a similar way to the TIMER tool. SNAC2 can be used to detect anomalies in structure over time or anomalies in centrality and workload in relation to the activity. The ABGV can detect anomalies in the structure in relation to the attributes that a node or set of nodes has.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static  Partially supported depending on the nature of an anomaly. All prototypes are able to detect some anomalies. temporal 

9. Cluster

Given a set of data cases, find clusters of similar attribute values. In a temporal context this task could be rephrased as: “Given a set of data cases, find clusters of similar attribute values over a specified period of time.”

Test: Which workers belong to the same department?

Performance: ABGV clusters nodes which possess a particular attribute close to the corresponding attribute node as a natural consequence of the force-directed behaviour. In addition, as the ABGV algorithm attempts to position attribute nodes equidistant from each other, the relationship between a node exhibiting a particular attribute and another node exhibiting a different attribute can be visually ascertained (provided the number of mutually exclusive attributes is below 3). The TIMER, SNAC2 and TIPAD prototypes do not support this task.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Clusters are Not supported by the application   established a prototypes priori. temporal

10. Correlate

Given a set of data cases and two attributes, determine useful relationships between the values of those attributes. In a temporal context this task could be rephrased as: “Given a set of data cases and two attributes, determine useful relationships between the values of those attributes over a specified period of time.”

Test: What is the nature of the correlation between workers of one department and that of another department? An equivalent temporal test question would be: “What is the nature of the correlation between workers of one department and that of another department over time period x?”

Performance: This task is supported (via visual determination) by ABGV on mutually exclusive attributes. The topology resulting from the relationship between actors possessing different attributes is represented visually. This allows the user to hypothesise about possible reasons for the structure of the network and, in an organisational context, whether the structure is appropriate. ABGV does not support the temporal version of this question as the prototype does not currently support dynamic networks. SNAC2 allows for some limited correlation based on the roles of nodes.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Some Supported limited via correlation visualisation based on of the roles of topology nodes between attributes temporal

11. Find Adjacent Nodes

Given a node, find its adjacent nodes. In a temporal context this task could be rephrased as: “Given a node and a specified time period, find its adjacent nodes over that time period.”

Test: Which characters have appeared in scenes with character J? An equivalent temporal test question would be: “Which characters have appeared in the same scene with character J from scene number 3 to scene number 30?”

Performance: Node-link diagrams inherently support this, since they show the nodes and the edges that link them. The SNAC2, ABGV and TIMER applications all include node-link diagrams in their interface and therefore, to some extent, they all support this task. On dense networks, determining which nodes are adjacent requires zooming. These prototypes all implement zooming. The TIPAD application also supports this task and additionally shows adjacent scene nodes in the two-mode network. In addition, if characters which co-appear in scenes are considered adjacent, it readily presents that information as well. Highlighting enables easy identification when a node is clicked in the prototype.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Supported Naturally supported through the use of   (two mode) node-link representations with zooming temporal  
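In a two-mode (character and scene) network, adjacency between characters can be derived from co-appearance. The sketch below illustrates both the static and the temporal test under that reading; the scene dictionary and the function name are hypothetical and do not describe how TIPAD stores its data.

```python
from collections import defaultdict

# Hypothetical two-mode data: scene number -> characters appearing in that scene.
SCENES = {1: {"J", "K"}, 2: {"K", "L"}, 3: {"J", "L"}, 30: {"J", "M"}, 31: {"J", "N"}}


def co_appearing(scenes, character, first_scene=None, last_scene=None):
    """Characters sharing at least one scene with `character`, mapped to those scenes."""
    shared = defaultdict(list)
    for number in sorted(scenes):
        if first_scene is not None and number < first_scene:
            continue
        if last_scene is not None and number > last_scene:
            continue
        cast = scenes[number]
        if character in cast:
            for other in cast - {character}:
                shared[other].append(number)
    return dict(shared)
```

Returning the shared scenes alongside each adjacent character keeps the link to the second mode of the network, which is what the two-mode display makes visible.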

12. Scan

Quickly review the list of items. In a temporal context this task could be rephrased as: “Quickly review the list of items over the specified time period.”

Test: Which people attended the third event? An equivalent temporal test question would be: “Which people attended events one to five?”

Performance: This task is supported in varying degrees by the TIPAD, ABGV and SNAC2. TIPAD (via direct selection) displays those who participate in a particular event. The ABGV targets visualising groups by attribute value and hence reviewing nodes with a particular attribute is trivial.

The TIMER tool partially supports this task (temporally) by being able to visually isolate nodes which contribute over particular time periods. It also supports it in a non-temporal way via the node-link representation. The SNAC2 tool allows a user to quickly review the set of nodes involved over a particular time period.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Limited Shows Shows support for  participation participation scanning in event in event attribute groups temporal  visually   isolate to periods

13. Set Operation

Given multiple sets of nodes, perform set operations on them. Put in a temporal context, this task could be rephrased as: “Given multiple sets of nodes, perform set operations across specific time periods.”

Test: Who from nodes 1, 2, 3 and 5 have called nodes 31, 34 and 170? An equivalent temporal test question would be: “Who from nodes 1, 2, 4 and 5 have called nodes 31, 34 and 170 in both time period 1 and time period 2?”

Performance: Temporal set operations are partially supported by the TIMER tool and not supported by the ABGV, SNAC2 and TIPAD prototypes. In the TIMER tool, members of the “to” and “from” sets are selected by checkboxes. The interactions between them are visualised in both the node-link and TIMER representations.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Partially temporal supported
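The temporal test for this task is an intersection of per-period result sets. A minimal sketch follows, assuming a hypothetical call log in which each call is already labelled with its time period; the function names are illustrative and do not correspond to TIMER functionality.

```python
# Hypothetical call log: (time period, caller, callee).
CALLS = [(1, 1, 31), (1, 2, 34), (1, 5, 99), (2, 1, 170), (2, 3, 31), (2, 5, 34)]


def callers(calls, from_set, to_set, period=None):
    """Members of `from_set` that called any member of `to_set` (optionally in one period)."""
    return {
        src
        for p, src, dst in calls
        if src in from_set and dst in to_set and (period is None or p == period)
    }


def callers_in_every_period(calls, from_set, to_set, periods):
    """Intersection of the per-period caller sets."""
    per_period = [callers(calls, from_set, to_set, p) for p in periods]
    return set.intersection(*per_period) if per_period else set()
```

For example, `callers_in_every_period(CALLS, {1, 2, 3, 5}, {31, 34, 170}, [1, 2])` keeps only those callers who reached the target set in both periods.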

7.1.3 Topology-based Tasks

It is mainly the examination of changes in the topological structures over time which facilitates dynamic network analysis (Yi, Elmqvist et al. 2010). In this section the visualisation approaches embodied by the prototypes are tested against their ability to perform topology-based tasks. Each of the tasks has static and temporal definitions and tests. Each of the tests provides an evaluation of the static and dynamic performance of each prototype which can be directly compared with classic node-link and matrix representations. The assessment of classic node-link and matrix representations was drawn from the original authors’ publication.

An individual test summary table is provided after each taxonomic classification and these individual summaries are combined into an overall summary in Table 10.

1. Adjacency

• Find the set of nodes adjacent to a node. Put in a temporal context, this task could be rephrased as: “Find the set of nodes adjacent to a node over a particular time period.”

Test: What are the IDs of the people that person 200 called? An equivalent temporal test question would be: “What are the IDs of the people that person 200 called over the first 7 days?”

Performance: This is accomplished in the TIMER tool with both the node-link diagram and time event diagram after the selected node is filtered. The TIPAD tool displays adjacent nodes in the 2-mode network (characters and scenes). The SNAC2 tool provides a visual depiction of adjacency via the node-link diagram and the check-box filtering.

• How many nodes are adjacent to a node? Put in a temporal context, this task could be rephrased as: “How many nodes are adjacent to a node over the specified time period?”

Test: How many scenes has character k participated in? An equivalent temporal test question would be: “How many scenes has character k participated in between scene 2 and scene 100?”

Performance: The TIPAD tool shows this comparatively, using the pre-attentive aspects of line thickness and displaying node bars in sorted degree order. It also includes the numerical value, shown next to the name. SNAC2, ABGV and the TIMER tool visualise adjacency via the node-link display, which requires counting to determine exact numbers. The TIMER tool also visualises adjacency via the time-event diagram; this also requires counting to determine exact numbers.

• Which node has a maximum number of adjacent nodes? Put in a temporal context, this task could be rephrased as: “Which node has a maximum number of adjacent nodes over the specified time period?”

Test: Which character participates in the most scenes? An equivalent temporal test question would be: “Which character participates in the most scenes between scene 50 and scene 400?”

Performance: This is a specific case of “Find Extremum” (discussed above). TIPAD visually provides this via the thickness of the resulting bar and node ordering. The SNAC2, ABGV and the TIMER tool only provide visual representations via the node-link diagram (however this is often cited as an advantage of node-link representations).

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static      temporal   

2. Accessibility (direct or indirect connection)

• Find the set of nodes accessible from a node. Put in a temporal context, this task could be rephrased as: “Find the set of nodes accessible from a node during the specified time period.”

Test: To whom could node 200 have got a message by calling his contacts? An equivalent temporal test question would be: “To whom could node 200 have got a message by calling his contacts between day 1 and day 7?”

Performance: This function does not exist in any of the prototypes directly. It would be possible to achieve this via a manual iterative approach with the TIMER tool. However, this would only be practical on small, sparsely connected networks. Extensions to handle filtering by degrees of separation would be beneficial to most of the prototypes and could be visually implemented via highlighting.

• How many nodes are accessible from a node? Put in a temporal context, this task could be rephrased as: “How many nodes are accessible from a node during the specified time period?”

Test: How many nodes can node 200 get a message to by calling his contacts? An equivalent temporal test question would be: “How many nodes can node 200 get a message to by calling his contacts between day 1 and day 7?”

Performance: This task could be achieved with appropriate filtering as discussed above and subsequently verifying how many nodes are highlighted.

• Find the set of nodes accessible from a node where the distance is less than or equal to n. Put in a temporal context, this task could be rephrased as: “Find the set of nodes over the specified time period that are accessible from a node where the distance is less than or equal to n?”

Test: Who has called node 200’s first order network? An equivalent temporal test question would be: “Who has called node 200’s first order network over the period from day 1 to day 7?”

Performance: This task is not supported via any of the prototypes and is currently only possible using manual selection and filtering.

• How many nodes are accessible from a node where the distance is less than or equal to n? Put in a temporal context, this task could be rephrased as: “How many nodes during the specified time period are accessible from a node where the distance is less than or equal to n?”

Test: How many nodes have called node 200’s first order network? An equivalent temporal test question would be: “In the period from day 1 to day 7, how many nodes have called node 200’s first order network?”

Performance: As described in the previous task this is not supported by any of the prototypes and is currently only achievable using manual methods and counting.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static   Not implemented in the prototypes temporal  although possible
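The filtering by degrees of separation suggested above could be based on a breadth-first search over the links that fall inside the selected time window. The sketch below is one possible formulation and is not implemented in any of the prototypes; the call log and the `accessible` function are hypothetical. As a deliberate simplification it ignores the order of calls inside the window, so it does not check that a message could actually have been passed on in time sequence.

```python
from collections import deque

# Hypothetical directed call log: (day, caller, callee).
CALLS = [(1, 200, 31), (2, 31, 34), (3, 34, 170), (9, 200, 97), (10, 97, 5)]


def accessible(calls, source, max_hops=None, first_day=None, last_day=None):
    """Breadth-first search over the calls inside the time window.

    Returns {node: hop distance} for every node reachable from `source`,
    limited to `max_hops` when given. Call order inside the window is ignored.
    """
    successors = {}
    for day, src, dst in calls:
        if (first_day is None or day >= first_day) and (last_day is None or day <= last_day):
            successors.setdefault(src, set()).add(dst)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_hops is not None and distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in successors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[source]
    return distance
```

The keys of the result give the set of accessible nodes and its length gives their number, so the four accessibility questions above differ only in the arguments supplied.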

3. Common Connection

• Given nodes, find a set of nodes that are connected to all of them. Put in a temporal context, this task could be rephrased as: “Given a time period and a set of nodes, find a set of nodes that are connected to all of them within that time period.”

Test: Find all the characters that have participated in scenes 4, 5 and 12.

Performance: This task is not supported via any of the prototypes.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Not achievable in standard node-link and matrix representations and not temporal implemented in any of the prototypes.
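Although the task is not supported visually, at the data level it reduces to a set intersection over the neighbourhoods of the given nodes. The following sketch shows this for the test question; the scene dictionary and the function name are hypothetical.

```python
# Hypothetical two-mode data: scene number -> characters appearing in that scene.
SCENES = {4: {"A", "B", "C"}, 5: {"A", "C"}, 12: {"A", "C", "D"}, 13: {"B"}}


def common_participants(scenes, wanted):
    """Characters connected to every scene in `wanted` (empty if any scene is unknown)."""
    casts = [scenes.get(number, set()) for number in wanted]
    return set.intersection(*casts) if casts else set()
```

A temporal variant would first restrict the links to the selected time period and then apply the same intersection.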

4. Identify and Locate

• Find the shortest path between two nodes. Put in a temporal context, this task could be rephrased as: “Find the shortest path between two nodes within a selected time period.”

Test: Which nodes form the shortest connection between node 200 and node 97? An equivalent temporal test question would be: “Which nodes form the shortest connection between node 200 and node 97 between day 1 and day 7?”

Performance: This task is not supported via any of the prototypes.

• Identify clusters (sub-graphs whose nodes have high connectivity). Put in a temporal context, this task could be rephrased as: “Identify clusters (sub-graphs of connected components whose nodes have high connectivity) during the selected time period.”

Test: Find groups of people that call each other frequently. An equivalent temporal test task would be: “Find groups of people that call each other frequently between day 1 and day 7.”

Performance: This is supported visually by the TIMER, ABGV and SNAC2 prototypes, where such groups can be found by visual examination. However, there is no algorithmic function implemented to “discover” these clusters.

• Identify connected components (maximal connected sub-graphs). Put in a temporal context, this task could be rephrased as: “Identify connected components (maximal connected sub-graphs) during the selected time period.”

Test: Find groups of people that call each other which do not call other similar groups. An equivalent temporal test task would be: “Find groups of people that call each other which do not call other similar groups between day 8 and day 10.”

Performance: This is not supported visually or numerically in any of the prototypes. However, there are known methods to calculate this and they could be implemented in each of the visual prototypes in various ways.

• Find bridges. Put in a temporal context, this task could be rephrased as: “Find nodes that act as bridges within the selected time period.”

Test: Which caller bridges two distinct groups? An equivalent temporal test task would be: “Which caller bridges two distinct groups between day 1 and day 7?”

Performance: The prototypes generally do not support this function. However, using the TIMER tool it is possible to visually determine those callers that bridge groups over time. The ABGV enables users to visually determine those nodes which provide bridges between groups that have different values of a particular attribute.

• Find articulation points. Put in a temporal context, this task could be rephrased as: “Find articulation points within the selected time period.”

Test: What is the node that the distinct groups A and B communicate through? An equivalent temporal test task would be: “What is the node that the distinct groups A and B communicate through between day 1 and day 7?”

Performance: The prototypes do not support this task.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static     temporal   
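The connected component, bridge and articulation point tasks above are noted as computable by known methods even though the prototypes do not implement them. The following is a minimal sketch of one such computation, not code from any of the prototypes. It assumes call records are available as (caller, callee, day) tuples; the sample data and the function names `window_graph`, `components`, `bridges` and `articulation_points` are hypothetical. It uses a simple brute-force definition (remove a node or link and check whether the number of components increases), which is adequate for small to medium graphs but is not the most efficient known algorithm.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, day). Not taken from the thesis data.
CALLS = [
    ("a", "b", 1), ("b", "c", 2), ("a", "c", 3),  # first group
    ("c", "d", 4),                                # single link between the groups
    ("d", "e", 5), ("e", "f", 6), ("d", "f", 6),  # second group
    ("g", "h", 9),                                # falls outside days 1 to 7
]


def window_graph(calls, start, end):
    """Undirected adjacency sets for calls with start <= day <= end."""
    adj = defaultdict(set)
    for caller, callee, day in calls:
        if start <= day <= end:
            adj[caller].add(callee)
            adj[callee].add(caller)
    return adj


def components(adj, skip_node=None, skip_edge=None):
    """Connected components by depth-first search, optionally ignoring one node or edge."""
    seen, result = set(), []
    for root in adj:
        if root in seen or root == skip_node:
            continue
        seen.add(root)
        stack, comp = [root], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v == skip_node or v in seen:
                    continue
                if skip_edge in ((u, v), (v, u)):
                    continue
                seen.add(v)
                stack.append(v)
        result.append(comp)
    return result


def articulation_points(adj):
    """Nodes whose removal increases the number of connected components."""
    base = len(components(adj))
    return {n for n in adj if len(components(adj, skip_node=n)) > base}


def bridges(adj):
    """Edges whose removal increases the number of connected components."""
    base = len(components(adj))
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return {e for e in edges if len(components(adj, skip_edge=e)) > base}


if __name__ == "__main__":
    week_one = window_graph(CALLS, 1, 7)
    print(components(week_one))
    print(articulation_points(week_one))
    print(bridges(week_one))
```

Restricting the graph to a time window before running the structural query is the only temporal step here; the same three functions answer the static form of each task when the window spans the whole dataset.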

7.1.4 Attribute-based Tasks

As discussed in Chapter 3, attributes play an important role in social network analysis. This section evaluates the prototypes’ ability to support tasks in which attributes of the nodes or the links between them are a central factor. The assessment of classic node-link and matrix representations in this section was drawn from Lee, Plaisant, Parr et al.’s (2006) original publication. The temporal tests are an extension of their taxonomy with a focus on similar tasks in the temporal domain. The results of this section are summarised in Table 11.

1. On the Nodes

• Find the nodes having a specific attribute value. Put in a temporal context, this task could be rephrased as: “Find the nodes having a specific attribute value within the selected period of time.”

Test: What characters participate in more than 20 scenes? An equivalent temporal

test question would be: “What characters participate in more than 20 scenes

between scene 50 and 300?”

Performance: The prototypes do not offer the ability to answer queries involving the comparison of a numerical value. However, some of the prototypes visually support a very limited subset of this task. The TIPAD tool does not provide an answer to the test question, but a user can visually determine this by looking at the sorted list and ascertaining a cut-off point. The ABGV prototype shows preconfigured categories via colour. This enables the answering of simple attribute-based questions that relate to these categories.

• Review the set of nodes. Put in a temporal context, this task could be rephrased as:

“Review the set of nodes within the selected time period.”

Test: Who works for the Finance Department? An equivalent temporal test question

would be: “Who works for the Finance Department between June 1998 and July

2011?”

Performance: This low-level Scan task is partially supported by the ABGV and the TIPAD tools. The TIPAD tool displays an ordered visual representation of nodes and hence provides the ability to review the set of nodes. SNAC2 shows the set of nodes in a node-link representation over the selected windowed time period.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static     temporal  
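The numerical comparison that the prototypes do not currently offer for the first test question can be illustrated with a short sketch. It assumes scene participation is available as (scene number, character) pairs with no duplicates; the sample records and the function names `scene_counts` and `characters_above` are hypothetical and are not part of TIPAD.

```python
from collections import Counter

# Hypothetical (scene_number, character) participation records.
APPEARANCES = [
    (10, "X"), (60, "X"), (70, "X"), (120, "X"),
    (55, "Y"), (200, "Y"),
    (20, "Z"), (310, "Z"),
]


def scene_counts(appearances, first_scene=None, last_scene=None):
    """Count scenes per character, optionally restricted to a scene range."""
    counts = Counter()
    for scene, character in appearances:
        if first_scene is not None and scene < first_scene:
            continue
        if last_scene is not None and scene > last_scene:
            continue
        counts[character] += 1
    return counts


def characters_above(appearances, threshold, first_scene=None, last_scene=None):
    """Characters participating in more than `threshold` scenes in the range."""
    counts = scene_counts(appearances, first_scene, last_scene)
    return sorted(c for c, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Static form of the question, then the temporal form for scenes 50 to 300.
    print(characters_above(APPEARANCES, 2))
    print(characters_above(APPEARANCES, 2, first_scene=50, last_scene=300))
```

A filter of this kind could sit behind the sorted list that TIPAD already displays, replacing the visual estimation of a cut-off point with an exact one.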

2. On the Links

• Given a node, find the nodes connected only by certain types of edges. Put in a temporal context, this task could be rephrased as: “Given a node, find the nodes connected only by certain types of edges during the selected time period.”

Performance: This is not supported by the prototypes. The data sets used by the

prototypes consist of one type of edge only. The TIPAD application visualises

data consisting of utterances. So here the value is different but the type

remains constant.

• Which node is connected by an edge having the largest/smallest value? Put in a temporal context, this task could be rephrased as: “Which node is connected by an edge having the largest/smallest value over the selected time period?”

Test: Who participates in most scenes with character Y? An equivalent temporal test

question would be: “Who participates in most scenes with character Y between

scene 30 and scene 100?”

Performance: This is visually supported by the prototypes incorporating node-link layouts with weighted line thickness (TIMER, SNAC2, ABGV). It is also supported in the TIPAD application through the thickness of link bands. The tasks supported on links consist only of tasks that relate to degree. Tasks that are not supported are those based on the type of link. However, there is no inherent limitation in the visual encoding approach which would prevent the application of filtering to enable this ability. The TIMER and the TIPAD prototypes support the temporal tests. More broadly, the TIMER tool supports the visualisation of links during particular time periods and the TIPAD tool statically shows patterns of relationships with an ego over time.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static Partially visually Supported via the use of node-link  supported via representation line thickness temporal  
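The link-value test above (“Who participates in most scenes with character Y?”) corresponds to finding the heaviest link incident to an ego, which the prototypes encode as line or band thickness. The sketch below computes the same quantity directly, under the assumption that the data is a mapping from scene number to the set of characters present; the sample data and the names `co_appearances` and `strongest_partners` are hypothetical.

```python
from collections import Counter

# Hypothetical mapping of scene number to the characters present in that scene.
SCENES = {
    10: {"Y", "A"},
    35: {"Y", "A", "B"},
    40: {"Y", "B"},
    90: {"Y", "B"},
    150: {"Y", "A"},
    160: {"Y", "A"},
}


def co_appearances(scenes, ego, first_scene=None, last_scene=None):
    """Link weights between the ego and every character sharing a scene with it."""
    weights = Counter()
    for number, cast in scenes.items():
        if ego not in cast:
            continue
        if first_scene is not None and number < first_scene:
            continue
        if last_scene is not None and number > last_scene:
            continue
        for other in cast - {ego}:
            weights[other] += 1
    return weights


def strongest_partners(weights):
    """All characters tied for the largest link weight (empty list if none)."""
    if not weights:
        return []
    top = max(weights.values())
    return sorted(name for name, w in weights.items() if w == top)


if __name__ == "__main__":
    print(strongest_partners(co_appearances(SCENES, "Y")))
    print(strongest_partners(co_appearances(SCENES, "Y", 30, 100)))
```

In this sample the answer differs between the whole dataset and the window from scene 30 to scene 100, which is the kind of difference the temporal form of the test is intended to expose.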

7.1.5 Browsing Tasks

Browsing tasks form the basis of exploratory data analysis. Without the ability to traverse a path and return to the starting point, the ability to perform analysis is curtailed. This section evaluates the prototypes’ ability to support browsing tasks as described by Lee, Plaisant et al. (2006). In addition, it evaluates browsing in the temporal domain. A summary of this assessment is shown in Table 12.

1. Follow Path

• Follow a given path. Put in a temporal context, this task could be rephrased as:

“Follow a given path during the selected time period.”

Test: Follow a sequence of telephone calls starting from node 3. An equivalent

temporal test question would be: “Follow a sequence of telephone calls starting

from node 3 during the period between day 1 and day 7.”

Performance: This is visually supported by the prototypes incorporating node-link layouts; however, highlighting of paths (via colour) has not been implemented in any of the prototypes. The visual components such as the time event visualisation component in the TIMER tool and the TIPAD tool do not support path tracing (except for 1st order node to node). Answering the temporal questions is possible using the TIMER tool if the number of subsequent calls is very low.

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static  Supported by use of Node-link diagram temporal
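Following a sequence of telephone calls within a time period differs from following a path in a static graph, because each call in the sequence must occur after the call that precedes it. The sketch below illustrates this under stated assumptions: calls are directed (caller, callee, day) tuples, and a call can only continue a sequence if it occurs on a strictly later day than the call that reached its caller. The sample data and the function name `follow_calls` are hypothetical.

```python
# Hypothetical directed call records: (caller, callee, day).
CALLS = [
    (3, 5, 1),
    (4, 3, 2),
    (5, 8, 3),
    (2, 6, 5),
    (8, 2, 6),
    (5, 9, 8),
]


def follow_calls(calls, source, start, end):
    """Earliest day each node is reached from `source` by time-ordered calls.

    Only calls with start <= day <= end are used, and each call in a
    sequence must fall on a strictly later day than the one before it.
    """
    # The source counts as reached before the window opens.
    arrival = {source: start - 1}
    for caller, callee, day in sorted(calls, key=lambda c: c[2]):
        if start <= day <= end and caller in arrival and arrival[caller] < day:
            # Calls are processed in day order, so the first arrival is the earliest.
            arrival.setdefault(callee, day)
    del arrival[source]
    return arrival


if __name__ == "__main__":
    print(follow_calls(CALLS, 3, 1, 7))
    print(follow_calls(CALLS, 3, 1, 10))
```

In the sample data node 6 is never reached from node 3, even though a static node-link view shows the path 3, 5, 8, 2, 6, because the call from 2 to 6 happens before node 2 is reached.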

2. Revisit

• Return to a previously visited node. Put in a temporal context, this task could be rephrased as: “Return to a previously visited node within the selected time period.”

Test: Do the task above (described in path following) and then follow another

sequence of phone calls from node 3. An equivalent temporal test question

would be: “Do the task above and then follow another sequence of phone calls

from node 3 during the selected time period.”

Performance: This is visually supported by the prototypes incorporating node-link layouts; however, highlighting of paths has not yet been implemented in any of the prototypes. The visual components such as the time event visualisation component in the TIMER tool and the component in the TIPAD tool do not support path tracing (except for 1st order node to node).

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static  May require Supported by the use of Node-link scrolling diagram temporal May require May require scrolling scrolling & (may be zoom (may difficult) be difficult)

7.1.6 Overview

Lee, Plaisant et al. (2006) describe the overview task as a compound exploratory

task to get estimated values quickly. They note that sometimes it is more

important to be able to estimate the answer than to get an accurate one. In

addition they add that the overview task helps to find patterns. Once a significant

aspect has been identified there are often other computational approaches that

can provide the user with exact values. In the temporal domain the overview task

facilitates the discovery of patterns pertaining to particular time periods or

indeed identifying distinct periods within the data.

Performance: The overview task is supported to some degree by all the prototypes. The TIMER, SNAC2 and ABGV prototypes all support overviewing through the use of node-link representations. The TIPAD application provides a vertically sorted list of nodes and a list of horizontal temporal event nodes, although scrolling may be required to see the full complement of nodes. As with standard node-link representations, the prototypes incorporating node-links may suffer occlusion problems in large dense networks. However, as these tools are designed for temporal datasets, the ability to confine the visualisation to a time window assists in overviews of those time periods.

The ratings for the standard node-link and matrix representations were drawn from the original assessment by Lee, Plaisant et al. (2006).

Rating:

Mode Node-link Matrix TIPAD SNAC2 TIMER ABGV static May   require Supported by the use of Node-link diagram scrolling temporal   


Table 9 –Evaluation test summary of low-level tasks

Low-level Node-link Matrix TIPAD SNAC2 TIMER ABGV tasks 1 Retrieve Semantic Retrieval visual Value Visual retrieval of edge retrieval of retrieval supported Not attribute categorical of node (edge supported value attribute degree attribute supported supported value) temporal   2 Filter Minimum level of filtering is available in Visual the prototypes. The visualisation retrieval of approach does not exclude the sets via

possibility of more advanced filtering. associated Potentially visualised via exclusion or attribute highlighting value temporal   3 Compute Numeric Only Derived computati Computed values not individual Value on of currently supported count centrality supported visualised temporal   4 Find Visual Visual Limited visual Limited visual Extremum support for support for support for support for graph appearances degree degree metric temporal    5 Sort Possibility for extended Not supported. These prototypes do not support rank and visualise entities according to demonstrate some ordinal metric d via degree temporal 6 Determine Only manual Not currently supported. Visualisation of Range method on range is not available. This could easily be

degree supported with additional components. supported temporal 7 Characterise visualisation Ability to visualisation of Distribution of scene co- visualise and the distribution   appearances characterise of via line distribution communication patterns of centrality via line patterns temporal   


8 Find Partially supported depending on the nature of an Anomalies  anomaly. All prototypes are able to detect some anomalies. temporal  9 Cluster Clusters are Not supported by the application   established a prototypes priori. temporal 10 Correlate Supported Some via limited visualisation correlation of the based on topology roles of between nodes attributes temporal 11 Find Supported Naturally supported through the use of Adjacent   (two mode) node-link representations with zooming Nodes temporal   12 Scan Is supported in varying ways by all the prototypes Limited Shows Shows Partial support for  participation participation temporal scanning in event in event support attribute groups  visually temporal   isolate to periods 13 Set Partially Operations supported temporal

Table 9 shows the performance of each prototype against each test. The table includes classic node-link and matrix representations as a standard baseline. The table was derived from Lee, Plaisant et al. (2006); however, it also includes a rating with respect to temporal support for each low-level task.

The prototypes have limited abilities to filter and compute derived values. Therefore, with the exception of SNAC2, the prototypes do not rate well on those low-level tasks.

Table 10 - Evaluation summary of topology based tasks

Matr Complex tasks Task Node-link TIPAD SNAC2 TIMER ABGV ix 1 Adjacency Find the set of  (node-link) nodes adjacent to    (2 mode)

a node. temporal   How many nodes  Node-link are adjacent to a (requires  (2 mode) representation node? counting) (requires counting) temporal   Which node has a  Node-link  (out maximum number (requires representation degree) of adjacent nodes? counting) (requires counting) temporal  2 Accessibility Find the set of  (two nodes accessible mode) By virtue of node-link  from a node. Requires representation counting temporal  How many nodes  Achievable via verifying are accessible from (two mode) number of highlighted  a node? Requires nodes (same as node- counting link) temporal  Find the set of nodes accessible Not Supported (possible using from a node where visualisation and counting but the distance is less infeasible in large graphs) than or equal to n. temporal How many nodes are accessible from Not Supported (possible using a nodes where the visualisation and counting but distance is less infeasible in large graphs) than or equal to n? temporal 3 Common Given nodes, find a Connections set of nodes that

are connected to all of them. temporal


Matr Complex tasks Task Node-link TIPAD SNAC2 TIMER ABGV ix 4 Identify and Find the shortest Visually supported by Locate path between two  virtue of node-link nodes. representation temporal Identify clusters Supported visually by the TIMER, (subgraphs whose ABGV and SNAC2 prototypes where nodes have high such groups can be found by visual  connectivity). examination. However, there is no algorithmic function implemented to “discover” these clusters. temporal  Identify connected Not supported visually or numerically components in any of the prototypes. However, (maximal there are known methods to calculate connected this and they could be implemented subgraphs). in each of the visual prototypes. temporal Find bridges.     temporal   Find articulation

points. temporal  partial support


Table 11 – Attribute-based tasks

Attribute- Task Node- Matri TIPAD SNAC2 TIMER ABGV based / link x Browsing tasks 1 On the Find the nodes Nodes having a specific  attribute value temporal  Review the set    of nodes temporal   2 On the Given a node, Links find the nodes connected only by certain types of links temporal Which node is Visually connected by a Supported via the use of supported link having the  node-link via line largest/smallest representation thickness value? temporal   


Table 12 - Browsing tasks

Browsing tasks Task Node- Matrix TIPAD SNAC2 TIMER ABGV link 1 Follow Path Follow a Supported by the use of  given path Node-link diagram temporal 2 Revisit Return to a May Supported by the use of previously  require Node-link diagram visited node scrolling temporal May May require require scrolling scrolling & zoom 3 Overview May Supported by the use of   require Node-link diagram scrolling temporal   

7.1.7 Temporal Task Evaluation

Ahn, Plaisant et al. (2011) propose a taxonomy of temporal network visualisation tasks developed by surveying a range of social network visualisation tools. These tools include ones directly targeted at the VAST08 cell phone network mini-challenge discussed in Chapter 6. The survey was primarily conducted to assist in developing the taxonomy. However, its capture of the significant temporal features of sixteen visualisation tools affords the opportunity to evaluate the prototypes discussed in this thesis against those sixteen tools. This evaluation of temporal features is presented below in terms of individual temporal event tasks and aggregated temporal event tasks.

Individual Temporal Event Tasks

1. Single occurrences

(a) Examine a specific value of an entity of one or more discrete time point(s).

(b) Compare the value of entities of multiple time points.

(c) Compare the value of entities among entities.

(d) Compare multiple time points using similarity measures.

(e) Compare the events with domain attributes.

• This task is accomplished in the TIPAD application by virtue of the ‘parallel arc diagram’ encoding providing screen area to embed and visualise attributes. In the case of the TIPAD application, edge attribute values (verbal utterances) are visualised as an ordered sequence of “bubbles” and the content is available by hovering over a “bubble”.

• SNAC2 enables the comparison of centrality measures at various time points. It also provides a visualisation of individual centrality over an aggregated time window. In addition, it provides a limited ability to compare multiple network diagrams (currently there is a requirement to switch between them; however, they can be captured and viewed separately).

• The TIMER tool provides a visualisation method which facilitates the display of a high concentration of events (cell phone calls) within a specific time period.

2. Birth and Death

(a) Find if and when a specific entity appears and disappears.

(b) Find an emergence of a new network structure such as an interaction pattern, or sub-groups.

(c) Compare the events with domain attributes.

• The TIMER prototype and the SNAC2 tool both visualise the entry and exit of entities in the network. Both provide the ability to tailor the time period in which the “birth and death” are observed. This can be done by moving through time and visually noting those that appear and disappear. With the TIMER tool it can be determined via the overview provided for a particular time period.

• The TIMER tool enables a user to find when a group of callers disappear (by the call frequencies) and when another group of callers appear. It also provides a visualisation of “interaction patterns” if the right level of zoom and node filtering are applied.

• The TIPAD application provides a view of first appearance and last appearance via drop down lines from the scene event boxes.

3. Replacement

(a) Find the change of entity properties.

(b) Compare the event with domain attributes.

• The TIMER tool provides a visual representation which may assist in identifying a change of properties, for example, the identification of the point in time when the cell phones were switched, as described in Chapter 6. Based on the Ahn, Plaisant et al. (2011) definitions, this tool provides more of a “group” perspective.

• SNAC2 visualises the replacement of nodes as an “individual event feature” by advancing in individual time steps.
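The “birth and death” observations described above are made visually in the prototypes, by stepping through time or by reading first and last appearance lines. As a complement, the following minimal sketch derives the same information from the event list. It assumes events are (time, source, target) tuples; the sample events and the function names `lifespans`, `active` and `births_and_deaths` are hypothetical and do not describe how TIMER, SNAC2 or TIPAD are implemented.

```python
# Hypothetical interaction events: (time, source, target).
EVENTS = [
    (1, "a", "b"),
    (2, "a", "c"),
    (5, "c", "d"),
    (6, "d", "e"),
    (9, "d", "e"),
]


def lifespans(events):
    """Map each entity to (first appearance, last appearance)."""
    spans = {}
    for time, source, target in events:
        for node in (source, target):
            first, last = spans.get(node, (time, time))
            spans[node] = (min(first, time), max(last, time))
    return spans


def active(events, start, end):
    """Entities taking part in at least one event with start <= time <= end."""
    return {n for t, u, v in events if start <= t <= end for n in (u, v)}


def births_and_deaths(events, previous, current):
    """Entities entering and leaving between two (start, end) windows."""
    before = active(events, *previous)
    now = active(events, *current)
    return now - before, before - now


if __name__ == "__main__":
    print(lifespans(EVENTS))
    print(births_and_deaths(EVENTS, (1, 4), (5, 8)))
```

Comparing two adjacent windows mirrors moving a time slider by one window width, while `lifespans` corresponds to the first and last appearance view.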

Aggregated Temporal Event Tasks

1. Growth and Contraction

(a) Observe the value of an entity measure increases or decreases.

(b) Compare the growth or the contraction of an entity between time points.

(c) Compare the growth or the contraction pattern among entities.

(d) Compare the events with domain attributes.

• SNAC2 provides the ability to observe the selected centrality measure (a structural property of a node) increase or decrease over time. SNAC2 also provides the ability to select various time periods and observe the growth or the contraction of centrality over that time period. In addition, it provides the ability to select separate non-contiguous periods and compare the centrality between them.

• The TIPAD prototype visualises consecutive participation in scenes with the selected node. The changing density of lines indicates an increasing or decreasing pattern of consecutive participation over a time period.

2. Convergence and Divergence

(a) Observe the value of an entity measure and find if and when it converges to a specific point.

(b) In case of convergence, find if there appears a new structure at the point.

(c) Compare the convergence states.

(d) Compare the events with domain attributes.

• Not supported by any of the prototypes.

3. Stability

(a) Find if a changing value of an entity stabilizes.

(b) Identify when the stabilization happens.

(c) Compare the stability states.

(d) Compare the events with domain attributes.

• SNAC2 enables the visual identification of periods over which centrality stabilises.

• The TIMER tool can generally assist in determining if a pattern of interaction (communication) is growing or stabilising. However, there is no mechanism to compare them (other than viewing them separately).

4. Repetition

(a) Find out if a pattern of an entity value change repeats.

(b) Identify the repeating pattern.

(c) Compare the repetition patterns.

(d) Compare the events with domain attributes.

• The TIMER tool enables the identification of repeated patterns of interaction over a time period.

• The TIPAD tool enables the identification of repeating patterns of interaction with a selected character.

5. Peaks or Valleys

(a) Find out if there are any peaks or valleys of an entity value change over time.

(b) Identify the shape of the peaks/valleys. Do they change sharply or slowly?

(c) Compare the peak/valley patterns.

(d) Compare the events with domain attributes.

• The SNAC2 tool visualises centrality, enabling a user to identify the peaks and valleys during the progression of the activity. In the terms defined by the temporal network visualisation taxonomy, this would constitute structural properties (“graph theory-based measures that are used for social network analysis”).

• The TIMER and TIPAD tools present peaks and valleys as heavily or lightly banded areas. These represent the rate of communication or participation respectively.

6. Rate of Changes

1. Fast or Slow

(a) Identify how much changes an entity had during a given time period.

(b) Compare the difference of changes of multiple entities. Find out which one is faster or slower.

(c) Compare the events with domain attributes.

• The TIPAD application provides a visual representation of the rate of participation in scenes via the shape of the slope on the interaction bar. This provides a limited ability to perform comparisons visually. In addition, it shows participation with a selected character via the density of parallel lines.

• Similarly, the TIMER tool displays frequency of interaction via the density of lines.

• The SNAC2 application provides the rate of change of centrality measures, which is read as slope.

2. Accelerating or Decelerating

(a) Identify whether a change is getting faster or slower.

(b) Compare the acceleration or deceleration.

(c) Compare the events with domain attributes.

• SNAC2 provides the ability to identify change in centrality which is read as the change in slope. This also affords the opportunity to compare the rates across entities.
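Slope, change in slope, and peaks and valleys are read visually from the SNAC2 centrality plot. The numerical counterparts are first differences, second differences and local extrema of the centrality series. The sketch below illustrates them on a hypothetical series of per-time-step centrality values (integer degree counts are assumed here to keep the arithmetic exact); the function names `differences` and `peaks_and_valleys` are illustrative and are not part of SNAC2.

```python
# Hypothetical degree centrality of one node at successive time steps.
CENTRALITY = [1, 2, 4, 7, 5, 4, 6]


def differences(series):
    """Change between consecutive values (the slope for unit time steps)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def peaks_and_valleys(series):
    """Indices of strict local maxima and strict local minima."""
    peaks, valleys = [], []
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] > series[i + 1]:
            peaks.append(i)
        elif series[i - 1] > series[i] < series[i + 1]:
            valleys.append(i)
    return peaks, valleys


if __name__ == "__main__":
    slope = differences(CENTRALITY)       # rate of change (fast or slow)
    acceleration = differences(slope)     # change in slope (accelerating or decelerating)
    print(slope)
    print(acceleration)
    print(peaks_and_valleys(CENTRALITY))
```

Computing the same two difference series for several nodes would give a numerical basis for comparing rates across entities.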

Table 13 presents a summary of the temporal analysis features discussed above.

Table 13 - Evaluation summary of temporal analysis features

Temporal Analysis Features

Individual Time Event Features: Single Occurrences, Birth/Death, Replacement

Aggregated Time Event Features (Shape of Changes): Growth, Contraction, Convergence, Divergence, Stability, Repetition, Peak/Valley

Aggregated Time Event Features (Rate of Changes): Speed, Accelerate

Prototype TIMER      TIPAD       1  SNAC2     2  

1 Peaks/Valleys of interaction in relation to ego.

2 Graph centrality peaks/valleys

7.1.8 Faithfulness

Recently Nguyen, Eades et al. (2013) introduced a new criterion to assist in measuring the quality of graph visualisations. They argue that commonly used criteria, while necessary, are not sufficient. They propose the faithfulness measure and distinguish three kinds of faithfulness: information faithfulness, task faithfulness, and change faithfulness.

A brief description of their approach follows. For a more in-depth description, refer to their publication. They define a general model of the visualisation process as follows. The visualisation process maps a data item $d \in D$ (an attributed graph) to a layout item $l = V(d, s) \in L$ according to a specification $s \in S$, described as:

\[ V : D \times S \to L \]

The perception process maps a picture from the layout space $L$ to the knowledge space $K$. They denote this as:

\[ P : L \to K \]

They model the task process as mapping the data space $D$, the layout space $L$, and the knowledge space $K$ to a result space $R$. The task process is $T = (T_D, T_L, T_K)$, where $T_D$, $T_L$ and $T_K$ are three functions:

\[ T_D : D \to R, \qquad T_L : L \to R, \qquad T_K : K \to R. \]

Information faithfulness is based on the idea that the visual representation of a dataset should contain all the information of the dataset. Put simply, if the same visualisation can result from several input graphs then the visualisation method is not information faithful. By measuring a graph visualisation’s level of “ambiguity”, the worst case information faithfulness of $V$ is defined by the function:

\[ f_{info}(n) = \frac{1}{\max_{\|l\| \le n} |V^{-1}(l)|} \]

where, for each $l \in L$, $V^{-1}(l)$ denotes the set $\{(d, s) \in D \times S : V(d, s) = l\}$ and $|V^{-1}(l)|$ denotes the number of elements in $V^{-1}(l)$.
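A small worked example may help to make the information faithfulness measure concrete. The sketch below is an illustration under simplifying assumptions, not part of Nguyen, Eades et al.’s publication: the data space is restricted to the eight undirected graphs on three labelled nodes, the specification s and the size bound n are ignored, and two hypothetical “visualisations” are compared, one that draws every edge and one that shows only the number of edges. The measure is computed as one divided by the size of the largest set of graphs that produce the same picture.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

NODES = ("a", "b", "c")
POSSIBLE_EDGES = list(combinations(NODES, 2))


def all_graphs():
    """Every undirected graph on NODES, represented as a frozenset of edges."""
    for k in range(len(POSSIBLE_EDGES) + 1):
        for edges in combinations(POSSIBLE_EDGES, k):
            yield frozenset(edges)


def worst_case_info_faithfulness(visualise):
    """1 / (largest number of graphs that share the same picture)."""
    preimage_sizes = Counter(visualise(graph) for graph in all_graphs())
    return Fraction(1, max(preimage_sizes.values()))


def draw_every_edge(graph):
    """Hypothetical visualisation in which each edge is shown individually."""
    return graph


def draw_edge_count_only(graph):
    """Hypothetical visualisation that only reports how many edges exist."""
    return len(graph)


if __name__ == "__main__":
    print(worst_case_info_faithfulness(draw_every_edge))
    print(worst_case_info_faithfulness(draw_edge_count_only))
```

Drawing every edge gives a value of 1, since each picture corresponds to exactly one graph. Showing only the edge count gives 1/3, because three different graphs have exactly one edge (and three have exactly two), so the picture alone cannot distinguish between them.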

Although the authors acknowledge the importance of node and edge attributes in a visualisation, their measure only implicitly includes them. Their definition of information faithfulness simply implies that a visualisation of a graph is more faithful if all the nodes and edges are represented in the visualisation. For example, they state that the visualisation of graphs using a matrix representation is in general very faithful as all the nodes and edges are represented in the visualisation; however, they do not account for the associated attributes. I contend that a truly faithful information visualisation requires a faithful representation of the attribute data.

All the prototypes in this thesis display all the edges and nodes and thus are information faithful. However, some go a step further and remain faithful to the representation of attribute data.

Task faithfulness can be measured as the difference between the result of performing a task on the visualisation and the result of the same task on the data. Nguyen, Eades et al. define the worst case task faithfulness of $V$ as a function $f_{task}$ on the natural numbers, defined by:

\[ f_{task}(n) = \max_{\|d\| \le n} \| T_L(V(d, s)) - T_D(d) \| \]

with respect to the task $T = (T_D, T_L, T_K)$.

A visualisation is always task faithful if it is information faithful; however, the converse may not be true. As the prototypes in this thesis are information faithful, they are by implication task faithful.


The idea behind change faithfulness is that a change in the visual representation should be consistent with the change in the original data. The authors point out that visualisations based on force directed algorithms are not change faithful. Static representations are rated on change faithfulness based on the consistency between the difference between two pictures and the data they represent. Some of the prototypes employ dynamic visualisation methods based on force directed approaches and therefore are not change faithful. The TIPAD application uses a static representation to show change in interaction over time and is therefore considered change faithful.

Table 14 summarises the prototypes described in this thesis alongside classic node-link and matrix representations in terms of faithfulness. I have explicitly included the attribute faithfulness in the information faithfulness category due to its importance in overall faithfulness. It is difficult to accomplish a full range of tasks utilising attributes if the visualisation is not attribute faithful.

Table 14 - Summary of prototypes faithfulness

Faithfulness Node-link Matrix TIPAD SNAC2 TIMER ABGV Information       attribute    Task       Change 

In terms of information, task and change faithfulness, the TIPAD prototype supports faithfulness best across all categories. TIPAD, SNAC2 and ABGV support attribute faithfulness, which classic node-link and matrix representations do not. The TIPAD, TIMER and SNAC2 prototypes support information faithfulness as they incorporate or use node-link representations. The tools described in this thesis are information faithful because no data items are removed. In fact, the proposition in this thesis is that data reduction techniques, although valuable in very large graphs, should be avoided as much as possible in small to medium graphs due to the loss of important information. Stress measures applied in multidimensional scaling (MDS) techniques are an attempt to reduce the impact of this loss; however, loss is inevitable. All the visualisation approaches in this thesis attempt to retain all the original data and some retain attribute or contextual data.

Chapter 8. Conclusions and Future Work

I set out in this thesis to study Social Network Visualisation, examining what the current issues are and exploring what new visualisation approaches could be employed to assist in mitigating them. In particular, this thesis aims to answer the questions:

1. How could alternative visual encoding approaches aid in performing

fundamental temporal tasks on social networks?

2. How can network visualisation techniques be used to examine the

relationship of metadata to the structure of a network and vice versa?

The contributions of this thesis are:

1. Extensive literature review of Social Network Visualisation.

2. Design of the Attribute Based Graph Visualisation (ABGV) approach concept demonstrator.

3. Proposing the Visualisation component of the Social Network Analysis for Command and Control tool (SNAC2).

4. Creation of the Parallel Arc Diagram visualisation method and the development of the software prototype Temporal Interactive Parallel Arc Diagram (TIPAD).


5. Development of the Temporal Interactive Multi-slider Event and Relationship (TIMER) prototype. This focuses on visualising and enabling temporal pattern discovery in Dynamic Networks.

6. Temporal additions to the static taxonomic task descriptions for enhanced temporal visualisation evaluation.

I have proposed three unique visual encoding approaches to address the problems of visualising the temporal aspects of social networks and one unique approach to visualise attributes and their interplay with the dynamics of social networks. I have thoroughly examined the literature and extended commonly accepted task taxonomies with temporal tasks to assist with the future development of visualisation tools.

The prototypes presented in this thesis vary in maturity and application-level functionality. Some prototypes are largely concept assessment applications which do not have the same level of filtering or interactivity as applications that have had many man-hours of development devoted to them. Their purpose is to demonstrate a visual encoding approach which, when incorporated by developers into applications with appropriate querying and filtering mechanisms, provides a new view on the data which contributes to developing further insight.

A number of approaches presented in this thesis are complementary in that they provide another view on the data which is not available in the current set of visualisation tools. The intention of this thesis is to determine what is missing from the current set of tools and evaluate what gaps might be filled by applying the techniques described herein. Future work flowing from this thesis therefore centres on enhancing the new visualisation techniques and combining them into a tool that utilises all the new techniques in concert with established visualisation approaches.

This enhanced tool will be subsequently tested using the longitudinal approach advocated by many as the most appropriate method of assessing visualisation tools that are intended to assist in analysis.

All the visualisation approaches described in this thesis could benefit from more advanced filtering and querying and were generally found lacking in their ability to display traditional network metrics. However, the focus of this thesis was not the tools themselves but rather the relative worth of the techniques they employ and as such these limitations could be addressed by using existing methods.

The techniques described in this thesis have been evaluated and in the case of SNAC2 used by analysts to successfully conduct analysis on real datasets. However, use of these approaches combined into a more comprehensive tool needs to be measured and evaluated over an extended period in a longitudinal testing phase.

8.1 ABGV

The Attribute-Based Graph Visualisation (ABGV) technique described in this thesis contributes to the common aspiration in the information visualisation community to develop methods to include node attributes in social network visualisations. The technique is a layout mechanism using classic force-directed layout techniques. Force-directed visualisations are the most common form of social network visualisation and are therefore likely to be the most familiar type of visualisation to users and tool developers. The visualisation approach uses spatial positioning to highlight interactions, or the lack thereof, between mutually exclusive attribute groups, allowing users to visually determine important nodes in cross-attribute interactions.
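As a rough illustration of the idea of treating attribute values as additional nodes of the sociogram, the following minimal sketch augments an edge list with one anchor node per attribute value, so that the result could be passed to any force-directed layout routine. The function names, the `attr:` label prefix and the sample data are assumptions made purely for illustration and are not taken from the ABGV prototype.

```python
def add_attribute_nodes(edges, node_attrs, prefix="attr:"):
    """Return (nodes, edges) for a sociogram augmented with attribute nodes.

    edges      -- iterable of (u, v) pairs between actors
    node_attrs -- dict mapping each actor to one categorical attribute value
    prefix     -- hypothetical label prefix used to mark attribute nodes
    """
    nodes = set(node_attrs)
    augmented = []
    for u, v in edges:
        nodes.update((u, v))
        augmented.append((u, v))
    # One extra node per distinct attribute value, linked to every actor holding it.
    for actor, value in sorted(node_attrs.items()):
        anchor = f"{prefix}{value}"
        nodes.add(anchor)
        augmented.append((actor, anchor))
    return sorted(nodes), augmented


def cross_attribute_edges(edges, node_attrs):
    """Edges whose endpoints belong to different attribute groups."""
    return [(u, v) for u, v in edges if node_attrs[u] != node_attrs[v]]


# Hypothetical sample data: four actors in two mutually exclusive groups.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
node_attrs = {"ann": "ops", "bob": "ops", "cat": "intel", "dan": "intel"}
nodes, augmented = add_attribute_nodes(edges, node_attrs)
bridges = cross_attribute_edges(edges, node_attrs)
```

In a force-directed layout, the anchor nodes would be expected to pull actors sharing an attribute value towards each other, so that links such as those collected in `bridges` stand out as interactions between groups.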

The development and evaluation of the ABGV approach in this thesis is the first step towards incorporating it into the many tools that already use a force-directed approach, which will enable longitudinal user testing and experiments to be conducted. The inclusion of this approach in the current array of methods using force-directed displays represents an additional visual lens that social network analysts have at their disposal to assist in their analysis. This thesis tested the visual encoding approach against the ideals of a generally accepted and widely used taxonomy for graph visualisation.

In terms of object focus, ABGV is very similar to node-link diagrams, as it uses the node-link approach as the basis for the visual encoding technique. However, ABGV extends classic node-link diagrams into the group focus space. The utilisation of node-link representations means its performance on low-level tasks is similar to that of the node-link encoding, with an enhanced ability to support the low-level correlation tasks, namely correlating topology with node attributes. In a similar way, its performance is comparable with classical node-link representations in the topology-based tasks category.


The ABGV approach has the advantage of being a simple extension of a common technique but also has a number of disadvantages. The primary disadvantage is the limited number of attribute dimensions it is able to represent. It is suited to 3 dimensions but its utility quickly falls away beyond 5 dimensions. In addition, just as standard node-link representations perform badly on dense networks, the ABGV approach will not work well on large or similarly dense networks.

Additionally, an examination of the evaluation section highlights the inability of the ABGV approach to assist with analysis involving the temporal dimension. This is expected, as the approach was conceived to assist only with the visualisation of categorical social network attributes, unlike the other visualisation techniques in this thesis, which contribute to enhancing visualisation in the temporal space.

Future work will attempt to integrate this simple approach with the approach used in the SNAC2 and TIMER tools. This could potentially bring the benefits of group-focused analysis to these two alternative approaches, which incorporate force-directed node-link displays. The prototype tested the technique with node attributes. Testing the application of the technique to edge attributes is left for future work. In addition, further work could include methods of ‘optimally’ positioning the attribute nodes automatically without user intervention.

More rigorous evaluation of this visualisation approach, via the application of usability tests with sets of benchmark data, is also left for future work. This would supply quantitative data regarding the technique’s ability to aid in community analysis.


8.2 SNAC2

The SNAC2 prototype is the most mature of the tools described in this thesis and is one of three visualisation approaches presented which focus on assisting analysts exploring the temporal dynamics of social networks. SNAC2 supplements traditional social network analysis with the addition of content and features such as contextual mark-up. Updating the visual layout in response to the change in time assists in analysing network changes by drawing attention to the movement of nodes. This is in contrast to typical approaches, which seek to limit the movement of nodes over time.

The application focuses on the sequencing of events and the content associated with them. Visualising the changing topology concurrently with the contextual information affords analysts the opportunity to visually explore the progression of events and how their sequencing might influence the network topology and vice versa.

The evaluation of SNAC2 demonstrated the greatest range of “object focus” when compared to typical visual encoding techniques and the other approaches presented in this thesis. In terms of low-level tasks, SNAC2 supports semantic retrieval and the visualisation of numeric computation of centrality, and provides the ability to characterise the distribution of centrality in both static and temporal modes. The evaluation shows greater support for temporal attribute-based tasks than for static ones, and ‘browsing’ tasks are supported by virtue of using the node-link representation, albeit with enhanced temporal support for overviewing tasks.

In terms of temporal tasks, SNAC2 supports the analysis of individual time event features consisting of single occurrences, birth and death events, and the replacement of nodes. In addition, it supports the aggregated time event features of growth and contraction, stability, peaks and valleys, and rate of change.

SNAC2 is novel in its use of coordinated views consisting of a continuous temporal graph layout, associated contextual data, domain specific ‘traffic lights’ and rolling measures of centrality. SNAC2 has been utilised by analysts to assess the efficiency and effectiveness of the processes, to identify chokepoints, inefficient work practices and the propagation of errors.
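To make the notion of a rolling measure of centrality concrete, the sketch below computes normalised degree centrality for each node over a sliding time window of timestamped communication events. The choice of degree centrality, the window parameters, the normalisation over all nodes seen in the data and the sample events are all assumptions for illustration; this section does not specify the measure or windowing scheme that SNAC2 itself uses.

```python
from collections import defaultdict


def rolling_degree_centrality(events, window, step):
    """Degree centrality per node for each sliding time window.

    events -- list of (timestamp, source, target) communication events
    window -- window length, in the same units as the timestamps
    step   -- amount by which the window start advances each iteration
    """
    if not events:
        return []
    nodes = sorted({n for _, u, v in events for n in (u, v)})
    start = min(t for t, _, _ in events)
    end = max(t for t, _, _ in events)
    # Assumption: normalise by (n - 1) over every node seen in the whole dataset.
    denom = max(len(nodes) - 1, 1)
    results = []
    while start <= end:
        neighbours = defaultdict(set)
        for t, u, v in events:
            # Only events inside the half-open window [start, start + window) count.
            if start <= t < start + window and u != v:
                neighbours[u].add(v)
                neighbours[v].add(u)
        results.append((start, {n: len(neighbours[n]) / denom for n in nodes}))
        start += step
    return results


# Hypothetical sample events: (time, sender, receiver).
events = [(0, "a", "b"), (1, "a", "c"), (5, "b", "c"), (6, "b", "c")]
series = rolling_degree_centrality(events, window=3, step=3)
```

Plotting each node's values from `series` against the window start times would give one possible form of rolling centrality display.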

SNAC2 has demonstrated its use in visualising topology changes over time with the associated context. However, SNAC2 does not present the communication events in a way in which a user might directly find patterns of communication. For example, if there was a repetitive rhythm in the data, this would be hard to ascertain using SNAC2 alone.

In the future SNAC2 will be used to analyse the data collected during other military command and control activities. These user tests will inform the future development of the tool and form the basis of a long-term longitudinal study.

Currently the potential to incorporate similar contextual and mark-up features into the TIMER tool is being investigated.


More rigorous evaluation of this visualisation prototype, via the application of usability tests with sets of appropriate data, is also left for future work.

8.3 TIPAD

The Parallel Arc Diagram (PAD) visualisation approach is targeted at visualising data consisting of a sequence of separate periods in such a way as to reveal the temporal patterns (rhythms) hidden in the data. The primary purpose of the creation of the TIPAD application was to test the concept of the parallel arc diagram (PAD) visual encoding technique.

TIPAD was characterised in the evaluation section as focusing on Nodes and Links, which is similar to classic Matrix representations; however, it improves upon them by enabling a temporal focus on Links and the ability to focus on connected components. TIPAD strongly supports 7 of the 13 defined low-level tasks both temporally and statically. This is similar to classic node-link diagrams; however, the TIPAD application has additional temporal support for the tasks. In terms of topology-based tasks, the TIPAD application supports all 3 of the adjacency subtasks and 2 of the accessibility tasks, and supports the “identify clusters” subtask of the “identify and locate” task. It also partially supports attribute-based tasks on both nodes and edges, and ‘revisit’ and ‘overview’ tasks via scrolling.

In the temporal task evaluation, the space available using the parallel arc diagram approach affords the developer the screen area to embed and visualise attributes, satisfying the evaluation criteria of single occurrence temporal tasks. It supports the visualisation of birth and death events, growth and contraction, repetition, and peaks and valleys, and makes possible the visualisation of rate of change.

In terms of information, task and change faithfulness, as defined in Section 7.1.8, the TIPAD prototype performs the best of all the prototypes in this thesis and is an improvement over standard node-link and matrix representations.

TIPAD is novel in its ability to show temporal patterns which are directly reflected in patterns generated by lines connecting the entities. It shows individual links over time with minimal “ink” as a consequence of the lines being adjacent to each other. Traditional network diagrams cannot easily display links over time without the assistance of animation and therefore the PAD approach provides a relatively compact static representation and is useful for novice and experienced analysts.

Investigation of the utility of the PAD approach with alternative datasets is left for future work. The testing of this tool with movie scripts demonstrated that it may have potential in the movie scriptwriting software space. Indeed, the scripts used by TIPAD were in industry-standard formatting. Investigating the possibility of integrating the approach with scriptwriting applications is also left for future work.


8.4 TIMER

The primary purpose of the TIMER tool is to test a visual encoding technique that assists in finding temporal patterns in dynamic network data. The tool was tested with the VAST 2008 Cell Phone Mini-Challenge dataset²⁷ with the goal of identifying key players in the social structure. The analysis was successful in identifying most of the key players (or possibilities), based on subsequent comparison with a possible ground truth supplied by the competition. Although providing greater fidelity than some competition winners, the TIMER tool lacked simple metrics which would have made the analysis easier.

The TIMER tool provides the ability to focus on Nodes, Links, Paths, Graphs, Connected Components, and Clusters. Due to the tool’s utilisation of dynamic node-link diagrams, its capability is very similar to classic node-link representations, with the additional ability to focus on temporal links. In terms of low-level tasks, the TIMER tool is equivalent to the classic node-link representations, with the additional capacity to visualise temporal aspects of the tasks Find Extremum, Characterise Distribution, Find Anomalies, Find Adjacent Nodes, Scan, and Set Operations. Similarly, the TIMER tool’s support of topology-based tasks is comparable to node-link representations, with additional support for temporal aspects of the tasks of Adjacency and Identify and Locate.

27 The VAST Cell Phone Calls dataset is available via the Visual Analytics Benchmark Repository at http://hcil.cs.umd.edu/localphp/hcil/vast/archive/task.php?ts_id=121 [31/07/2013]


The advantage the TIMER tool has over classic node-link representations in performing attribute based tasks is the ability to review the set of nodes and determine the largest and smallest degree links both statically and over time.

Similarly the browsing subtask of “overviewing” is enhanced beyond that of typical node-link representations by temporal support. The TIMER tool also visualises time event features of birth and death and replacement and the aggregated time event features of stability, repetition, peaks and valleys and rate of change. The TIMER tool demonstrates a similar level of information and task faithfulness as the traditional node-link representations.

The use of the tool on datasets such as the VAST dataset has shown that analysts would benefit from the inclusion of common network measures (such as centrality) within the tool itself. Future work on the TIMER tool will incorporate the ability to visualise common measures, in a similar way to SNAC2’s visualisation of rolling centrality metrics. In addition, the simple filtering will be expanded to provide the ability to visualise the results of complex queries consisting of time, centrality and subsets of the node set. The tool was tested on a large dataset and future work is required to evaluate its performance on smaller datasets.


References

Abdi, H., O'Toole, A. J., Valentin, D. and Edelman, B. (2005). DISTATIS: The

Analysis of Multiple Distance Matrices. Proceedings of the 2005 IEEE

Computer Society Conference on Computer Vision and Pattern Recognition

(CVPR'05) - Workshops Washington, DC, USA, IEEE Computer Society,

pp.42

Ahn, J., Plaisant, C. and Shneiderman, B. (2011). A Task Taxonomy of Network

Evolution Analysis (HCIL-2011-09), HCIL, Available:

https://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2011-09, Accessed 1/2/2014.

Ahn, J., Taieb-Maimon, M., Sopan, A., Plaisant, C. and Shneiderman, B. (2011).

Temporal visualization of social network dynamics: prototypes for nation of

neighbors. Proceedings of the 4th international conference on Social

computing, behavioral-cultural modeling and prediction, College Park, MD,

Springer-Verlag, pp.309-316

Aigner, W., Miksch, S., Muller, W., Schumann, H. and Tominski, C. (2008). "Visual

Methods for Analyzing Time-Oriented Data." IEEE Transactions on

Visualization and Computer Graphics 14(1).

Akaishi, M. and Okada, Y. (2004). Time-tunnel: Visual Analysis Tool for Time-series

Numerical Data and Its Aspects as Multimedia Presentation Tool. Proceedings

of the Eighth International Conference on Information Visualisation (IV’04),

London, England, IEEE Computer Society, pp.456-461

Alexander, M. (2005). Using the bipartite line graph to visualize 2-mode social

networks. Proceedings of the North American Association for Computational

Social and Organizational Science (NAACSOS), Notre Dame, Indiana, USA

Alexandre, D. S. and Tavares, J. M. R. S. (2010). "Introduction of Human Perception

in Visualization." International Journal of Imaging 4(A10).

Amar, R., Eagan, J. and Stasko, J. (2005). Low-Level Components of Analytic

Activity in Information Visualization. Proceedings of the 2005 IEEE

Symposium on Information Visualization, Minneapolis, Minnesota, USA,

IEEE Computer Society, pp.111-117

Anscombe, F. J. (1973). "Graphs in Statistical Analysis." American Statistician 27(1):

17-21.

Appan, P., Sundaram, H. and Tseng, B. (2006). Summarization and Visualization of

Communication Patterns in Large-Scale Social Network. Proceedings of the

10th Pacific-Asia conference on Advances in Knowledge Discovery and Data

Mining, Singapore, Springer-Verlag Berlin Heidelberg, pp.371-379

Archambault, D., Munzner, T. and Auber, D. (2007). "TopoLayout: Multilevel Graph

Layout by Topological Features." IEEE Transactions on Visualization and

Computer Graphics 13(2): 305-317.


Archambault, D., Munzner, T. and Auber, D. (2008). "GrouseFlocks: Steerable

Exploration of Graph Hierarchy Space." Visualization and Computer Graphics,

IEEE Transactions on 14(4): 900-913.

Aris, A. (2008). Visualizing & Exploring Networks Using Semantic Substrates

Department of Computer Science. Maryland, University of Maryland. PhD:

298.

Aris, A. and Shneiderman, B. (2007). "Designing semantic substrates for visual

network exploration." Information Visualization 6: 281-300.

Au, T. A., Lo, E. H. S. and Hoek, P. J. (2009). Evaluation of Command and Control

Activity for Air Operations. Proceedings of the Human Factors & Ergonomics

Society of Australia inc., University of Melbourne, Victoria, Australia

Bach, B., Pietriga, E. and Fekete, J.-D. Visualizing Dense Dynamic Networks with

Matrix Cubes. IEEE VisWeek 2013 Electronic Conference Proceedings

Barabasi, A.-L. (2010). Bursts: The hidden pattern behind everything we do. New

York, N.Y:, Dutton, Penguin Group (USA) Inc.

Barlow, T. and Neville, P. (2001). A comparison of 2-D visualizations of hierarchies.

Information Visualization, 2001. INFOVIS 2001. IEEE Symposium on,

pp.131-138

Bartoletti, A., Billinghurst, M., Card, S., Carr, D., Dill, J., Earnshaw, R., Ebert, D.,

Eick, S. and Grossman, R. (2005). Illuminating the Path The Research and


Development Agenda for Visual Analytics. J. J. Thomas and K. A. Cook,

IEEE.

Baur, M. (2008). Software for the Analysis and Visualization of Social Networks.

Fakultät für Informatik. Karlsruhe, Germany, Universität Fridericiana zu

Karlsruhe. Doktors der Naturwissenschaften.

Bender-deMoll, S. and McFarland, D. A. (2006). "The Art and Science of Dynamic

Network Visualization." Journal of Social Structure 7.

Bender-deMoll, S., Morris, M. and Moody, J. (2007). "Prototype Packages for

Managing and Animating Longitudinal Network Data: dynamicnetwork and

rSoNIA." Journal of Statistical Software 24(7): 1--36.

Bennett, C., Ryall, J., Spalteholz, L. and Gooch, A. (2007). The aesthetics of graph

visualization. Proceedings of the Third Eurographics conference on

Computational Aesthetics in Graphics, Visualization and Imaging, Alberta,

Canada, Eurographics Association, pp.57-64

Bezerianos, A., Chevalier, F., Dragicevic, P., Elmqvist, N. and Fekete, J.-D. (2010).

"GraphDice: A System for Exploring Multivariate Social Networks."

Computer Graphics Forum - Eurographics/IEEE-VGTC Symposium on

Visualization 2010 (EuroVis 2010) 29(3): 863-872.

Biedl, T., Madden, B. and Tollis, I. (1997). The three-phase method: A unified

approach to orthogonal graph drawing. Graph Drawing. G. DiBattista, Springer

Berlin Heidelberg. 1353: 391-402.


Blackwell, A. (2011). Visual Representation. Encyclopedia of Human-Computer

Interaction. M. Soegaard and R. F. Dam. Aarhus, Denmark, The Interaction-Design.org

Foundation. Available online at

http://www.interaction-design.org/encyclopedia/visual_representation.html.

Blythe, J., McGrath, C. and Krackhardt, D. (1996). The effect of graph layout on

inference from social network data. Graph Drawing. F. Brandenburg, Springer

Berlin Heidelberg. 1027: 40-51.

Blythe, J., Patwardhan, M., Oates, T., desJardins, M. and Rheigans, P. (2006).

Visualization Support for Fusing Relational, Spatio-Temporal Data: Building

Career Histories. Proceedings of the 9th International Conference on

Information Fusion, Florence, Italy, IEEE, pp.1-7

Bonacich, P. (1972). "Technique for analyzing overlapping memberships."

Sociological Methodology 4: 176-185.

Borgatti, S. P., Everett, M. G. and Freeman, L. C. (2002). Ucinet 6 for Windows:

Software for social network analysis, Harvard, Analytic Technologies.

Borgatti, S. P., Everett, M. G. and Johnson, J. C. (2013). Analyzing social networks.

London, SAGE Publications Limited.

Boyd, D. and Potter, J. (2003). Social network fragments: an interactive tool for

exploring digital social connections. Proceedings of the ACM SIGGRAPH

2003 Sketches and Applications, San Diego, California, ACM, pp.1-1


Brandes, U. and Pich, C. (2010). More Flexible Radial Layout. Graph Drawing. D.

Eppstein and E. Gansner, Springer Berlin Heidelberg. 5849: 107-118.

Brandes, U., Raab, J. and Wagner, D. (2001). "Exploratory Network Visualization:

Simultaneous Display of Actor Status and Connections." Journal of Social

Structure 2(4).

Brandes, U. and Wagner, D. (2004). Analysis and Visualization of Social Networks.

Graph Drawing Software. M. Jünger and P. Mutzel, Springer Berlin

Heidelberg: 321-340.

Brockenauer, R. and Cornelson, S. (2001). Drawing clusters and hierarchies. Drawing

graphs: Methods and Models. K. Michael and W. Dorothea, Springer-Verlag.

2025: 193-227.

Card, S. (2008). The human-computer interaction handbook : fundamentals, evolving

technologies, and emerging applications. A. Sears and J. A. Jacko. New York,

Lawrence Erlbaum Assoc.

Carlis, J. V. and Konstan, J. A. (1998). Interactive Visualization of Serial Periodic

Data. Proceedings of the UIST '98 user Interface Software and Technology

Carmel, L., Harel, D. and Koren, Y. (2004). "Combining hierarchy and energy for

drawing directed graphs." Visualization and Computer Graphics, IEEE

Transactions on 10(1): 46-57.

Chen, C. (2006). Information Visualization: Beyond the Horizon; 2nd Edition,

Springer.


Cole, H. R. and Haag, J. H. (1988). The Complete Guide to Standard Script Formats:

The Screenplay. California, CMC Publishing.

Correa, C. D., Crnovrsanin, T., Muelder, C., Shen, Z., Armstrong, R., Shearer, J. and

Ma, K.-L. (2008). Visual Analytics of Cell Phone Data using MobiVis and

OntoVis. Proceedings of the IEEE VAST Symposium, Piscataway, NJ, IEEE,

pp.211-212

Crosby, A. (1997). The Measure of Reality: Quantification and Western Society, 1250-

1600, Cambridge University Press.

Cross, R. (2006). Social Network Analysis: an introduction. The Bumble Bee.

Cruz, J. D., Bothorel, C. and Poulet, F. (2014). "Community detection and

visualization in social networks: Integrating structural and semantic

information." ACM Trans. Intell. Syst. Technol. 5(1): 1-26.

Cui, W. (2007). A Survey on Graph Visualization. Hong Kong, Computer Science

Department, Hong Kong University of Science and Technology.

Daassi, C., Nigay, L. and Fauvet, M.-C. (2006). "A taxonomy of temporal data

visualization techniques." Information Interaction Intelligence Review 5(2):

41-63.

Davis, A., Gardner, B. and Gardner, M. R. (1941). Deep South, Chicago University

Press.


Di Battista, G., Eades, P., Tamassia, R. and Tollis, I. G. (1999). Graph Drawing:

Algorithms for the Visualization of Graphs, Prentice Hall.

Donath, J., Karahalios, K. and Viegas, F. (1999). Visualizing Conversation.

Proceedings of the Thirty-Second Annual Hawaii International Conference on

Systems Sciences, Maui, Hawaii, IEEE

Dunbar, R. (1993). "Coevolution of neocortex size, group size and language in

humans." Behavioral and Brain Sciences 16(4): 681-735.

Dwyer, T., Hong, S.-H., Koschutzki, D., Schreiber, F. and Xu, K. (2006). Visual

Analysis of Network Centralities. Proceedings of the 2006 Asia-Pacific

Symposium on Information Visualisation (APVIS'06), Sydney, Australia,

Australian Computer Society, Inc., pp.189-197

Dwyer, T., Lee, B., Fisher, D., Quinn, K. I., Isenberg, P., Robertson, G. and North, C.

(2009). "A Comparison of User-Generated and Automatic Graph Layouts."

IEEE Transactions on Visualization and Computer Graphics 15(6).

Eades, P. (1984). "A heuristic for graph drawing." Congressus Numerantium 42: 149-

160.

Eades, P. and Feng, Q.-W. (1997). Multilevel visualization of clustered graphs. Graph

Drawing. S. North, Springer Berlin Heidelberg. 1190: 101-112.

Eades, P., Feng, Q., Lin, X. and Nagamochi, H. (2006). "Straight-Line Drawing

Algorithms for Hierarchical Graphs and Clustered Graphs." Algorithmica

44(1): 1-32.


Eades, P. and Huang, M. L. (2000). "Navigating clustered graphs using force-directed

methods." J. Graph Algorithms Appl. 4(3): 157-181.

Eick, S. G., Eick, A., Fugitt, J., Heath, J. E. and Ross, M. (2008). Geotemporal

Analysis. Proceedings of Aerospace Conference, Big Sky, MT, IEEE, pp.1-7

Eloranta, T. and Mäkinen, E. (2001). "TimGA: A genetic Algorithm for Drawing

Undirected Graphs." Divulgaciones Matemáticas 9(2): 155-171.

Everett, M. and Borgatti, S. (2002). Computing regular equivalence: practical and

theoretical issues. Metodološki zvezki. A. Mrvar and A. Ferligoj. Ljubljana:

FDV. 17: 31-42.

Falkowski, T., Bartelheimer, J. and Spiliopoulou, M. (2006). Mining and Visualizing

the evolution of Subgroups in Social Networks. Proceedings of the 2006

IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong,

IEEE Computer Society

Farrugia, M., Hurley, N. and Quigley, A. (2011). Exploring Temporal Ego Networks

Using Small Multiples and Tree-ring Layouts. Proceedings of ACHI 2011: The

Fourth International Conference on Advances in Computer-Human

Interactions, Gosier, Guadeloupe, France, IARIA, pp.79 - 88

Farrugia, M. and Quigley, A. (2008). Cell Phone Mini Challenge: Node-Link

Animation Award Animating multivariate dynamic social networks.

Proceedings of the IEEE Symposium on Visual Analytics Science and

Technology, Columbus, Ohio, USA


Farrugia, M. and Quigley, A. (2011). "Effective temporal graph layout: A comparative

study of animation versus static display methods." Information Visualization

10(1): 47-64.

Fekete, J. (2009). Visualizing networks using adjacency matrices: Progresses and

challenges. Proceedings of the 11th IEEE International Conference on

Computer-Aided Design and Computer Graphics, CAD/Graphics '09. , pp.636-

638

Fisher, D. and Dourish, P. (2004). Social and temporal structures in everyday

collaboration. Proceedings of the SIGCHI conference on Human factors in

computing systems, Vienna, Austria, ACM, pp.551-558

Frank, A. U. (1998). "Different types of 'Times' in GIS." Spatial and Temporal

Reasoning in Geographic Information Systems.

Freeman, L. C. (1977). "A set of measures of centrality based upon betweenness."

Sociometry 40: 35-41.

Freeman, L. C. (1979). "Centrality in social networks: Conceptual clarification."

Social Networks 1(3): 215-239.

Freeman, L. C. (2000). "Visualizing Social Networks." Journal of Social Structure

1(1).

Freeman, L. C. (2005). Graphical Techniques for Exploring Social Network Data.

Model and Methods in Social Network Analysis. P. J. Carrington, J. Scott and

S. Wasserman. Cambridge, Cambridge University Press.


Frishman, Y. and Tal, A. (2004). Dynamic Drawing of Clustered Graphs. Proceedings

of the IEEE Symposium on Information Visualization, Austin, TX, IEEE

Computer Society, pp.191-198

Fruchterman, T. M. J. and Reingold, E. M. (1991). "Graph Drawing by Force-directed

Placement." Software: Practice and Experience 21(11): 1129-1164.

Fu, X., Hong, S.-H., Nikolov, N., Shen, X., Wu, Y. and Xu, K. (2006). Visualization

and Analysis of Small-World Email Networks. Proceedings of the 12th annual

IEEE Symposium on Information Visualization (InfoVis 2006), Baltimore,

Maryland

Fu, X., Hong, S. H., Nikolov, N. S., Shen, X., Wu, Y. and Xu, K. (2007). Visualization

and analysis of email networks. Proceedings of APVIS '07. 6th International

Asia-Pacific Symposium on Visualisation, Sydney, NSW, pp.1-8

Fung, G. (2001). A Comprehensive Overview of Basic Clustering Algorithms.

Madison, University of Wisconsin, Available:

http://pages.cs.wisc.edu/~gfung/clustering.ps.gz, Accessed: 1/2/2014.

Gansner, E. and Hu, Y. (2009). Efficient Node Overlap Removal Using a Proximity

Stress Model. Graph Drawing. I. Tollis and M. Patrignani, Springer Berlin

Heidelberg. 5417: 206-217.

Gansner, E. R. and Koren, Y. (2007). Improved circular layouts. Proceedings of the

14th international conference on Graph drawing, Karlsruhe, Germany,

Springer-Verlag, pp.386-398


Gansner, E. R., Koren, Y. and North, S. C. (2005). "Topological fisheye views for

visualizing large graphs." Visualization and Computer Graphics, IEEE

Transactions on 11(4): 457-468.

Garey, M. R. and Johnson, D. S. (1983). "Crossing number is NP-complete." SIAM

Journal on Algebraic and Discrete Methods 4: 312-316.

Gershon, N., Eick, S. G. and Card, S. (1998). "Information visualization." ACM

Interactions 5(2): 9-15.

Ghoniem, M., Fekete, J. D. and Castagliola, P. (2005). "On the readability of graphs

using node-link and matrix-based representations: a controlled experiment and

statistical analysis." Information Visualization 4(2): 114-135.

Gibson, H. and Faith, J. (2011). Node-attribute Graph Layout for Small-World

Networks. Proceedings of the 15th International Conference on Information

Visualisation (IV), 2011 London, United Kingdom, pp.482-487

Gloor, P. and Zhao, Y. (2004). TeCFlow - A Temporal Communication Flow

Visualizer for Social Networks Analysis. Proceedings of the ACM CSCW

Workshop on Social Networks, Chicago

Gloor, P. A. (2005). Capturing Team Dynamics Through Temporal Social Surfaces.

Proceedings of the Ninth International Conference on Information

Visualisation (IV’05), London, UK, IEEE

Gloor, P. A., Laubacher, R., Zhao, Y. and Dynes, S. B. C. (2004). Temporal

Visualization and Analysis of Social Networks. Proceedings of NAACSOS


North American Association for Computational Social and Organizational

Science, Pittsburgh, PA

Gloor, P. A. and Zhao, Y. (2006). Analyzing Actors and Their Discussion Topics by

Semantic Social Network Analysis. Proceedings of the Tenth International

Conference on Information Visualization IV 2006, London, England, IEEE

Computer Society, pp.130-135

Goud, R. (2006). Visualization of File Relations in Software Systems with Adjacency

Matrices. Department of Mathematics and Computer Science. Eindhoven,

Technische Universiteit Eindhoven. Master's Thesis.

Gretzel, U. (2001, November, 2001). "Social Network Analysis: Introduction and

Resources." University of Illinois Retrieved 28/06, 2012, from

http://lrs.ed.uiuc.edu/tse-portal/analysis/social-network-analysis/#analysis.

Grinstein, G., Plaisant, C., Laskowski, S., O'connell, T., Scholtz, J. and Whiting, M.

A. (2008). VAST 2008 Challenge: Introducing Mini-Challenges. Proceedings

of IEEE VAST 2008 Symposium, Columbus, Ohio, IEEE

Gruzd, A. and Haythornthwaite, C. (2008). The analysis of Online Communities using

Interactive Content-based Social Networks. Proceedings of American Society

for Information Science and Technology, Wiley Subscription Services, Inc.,

pp.1-5

Hadany, R. and Harel, D. (1999). A Multi-Scale Algorithm for Drawing Graphs

Nicely, Weizmann Science Press of Israel.


Hall, C. M., McMullen, S. A. H. and Hall, D. D. L. (2006). Cognitive Engineering

Research Methodology: A Proposed Study of Visualization Analysis

Techniques.

Ham, F. v. and Wijk, J. J. v. (2004). Interactive Visualization of Small World Graphs.

Proceedings of the IEEE Symposium on Information Visualization, IEEE

Computer Society, pp.199-206

Hanneman, R. and Riddle, M. (2005). Introduction to social network methods.

Riverside, CA, University of California, Riverside, Available:

http://faculty.ucr.edu/~hanneman/nettext/Introduction_to_Social_Network_Methods.pdf, Accessed: 1/2/2014.

Harel, D. and Koren, Y. (2001). A Fast Multi-scale Method for Drawing Large Graphs.

Proceedings of the 8th International Symposium on Graph Drawing, Springer-

Verlag, pp.183-196

Harris, C., Allen, R. B., Plaisant, C. and Shneiderman, B. (1999). Temporal

Visualization for Legal Case Histories, HCIL Technical Report No. 99-18,

Computer Science Department, University of Maryland.

Havre, S., Hetzler, B. and Nowell, L. (1999). ThemeRiver [TM] : In Search of Trends,

Patterns, and Relationships. Proceedings of the IEEE Symposium on

Information Visualization (Info Vis '99), San Francisco, CA, USA, IEEE,

pp.115-123


Healey, C. G., Booth, K. S. and Enns, J. T. (1995). "Visualizing Real-Time

Multivariate Data Using Preattentive Processing." ACM Transactions on

Modeling and Computer Simulation 5(3): 190-221.

Healy, P. and Nikolov, N. S. (2013). Hierarchical Drawing Algorithms. Handbook of

Graph Drawing and Visualization. R. Tamassia, CRC Press.

Heer, J., Bostock, M. and Ogievetsky, V. (2010). "A tour through the visualization

zoo." Commun. ACM 53(6): 59-67.

Heer, J. and Boyd, D. (2005). Vizster: Visualizing Online Social Networks.

Proceedings of IEEE Symposium on Information Visualization INFOVIS

2005, Minneapolis, MN, IEEE, pp.32-39

Heer, J. M. (2004). Prefuse a software framework for interactive information

visualization. Department of Electrical Engineering and Computer Sciences.

Berkeley, University of California. Master of Science.

Henry, N. (2008). Exploring Social Networks with Matrix-based Representations.

Sydney, University of Sydney. PhD.

Henry, N. and Fekete, J.-D. (2007). MatLink: Enhanced Matrix Visualization for

Analyzing Social Networks. Proceedings of the 11th IFIP TC 13 international

conference on Human-computer interaction, Rio de Janeiro, Brazil, Springer,

pp.288-302

Henry, N., Fekete, J.-D. and McGuffin, M. J. (2007). "NodeTrix: A Hybrid

Visualization of Social Networks." IEEE Transactions on Visualization and

Computer Graphics 13(6).

Herman, I., Melancon, G. and Marshall, M. S. (2000). "Graph Visualization and

Navigation in Information Visualization: A Survey." IEEE Transactions on

Visualization and Computer Graphics 6(1): 24-43.

Hewagamage, K. P., Hirakawa, M. and Ichikawa, T. (1999). Interactive Visualization

of Spatiotemporal Patterns Using Spirals on a Geographical Map. Proceedings

of VL '99, IEEE Symposium on Visual Languages, pp.296-303

Higgins, P., Richards, D. and McGrath, M. (2001). Intelligent Visualisation of Social

Network Analysis Data. Proceedings of Pan-Sydney Workshop on Visual

Information Processing, Sydney, Australia

Hoek, P. (2011). "Parallel Arc Diagrams: Visualizing Temporal Interactions...."

Journal of Social Structure 12(7).

Huang, M. L. and Eades, P. (1998). A Fully Animated Interactive System for

Clustering and Navigating Huge Graphs. Proceedings of the 6th International

Symposium on Graph Drawing, Montreal, Canada, Springer-Verlag, pp.374-

383

Huang, M. L. and Eades, P. (1998). "A fully interactive system for clustering and

navigating large graphs." Graph Drawing, Springer Lecture Notes in Computer

Science: 374-383.

Huang, W., Hong, S.-H. and Eades, P. (2006). How People Read Sociograms: A

Questionnaire Study. Proceedings of the Asia-Pacific Symposium on

Information Visualization (APVIS 2006), Tokyo, Japan, Australian Computer

Society

Huang, W., Hong, S.-H. and Eades, P. (2007). "Effects of Sociogram Drawing

Conventions and Edge Crossings in Social Network Visualization." Journal of

Graph Algorithms and Applications.

Jamali, M. and Abolhassani, H. (2006). Different Aspects of Social Network Analysis.

Proceedings of the IEEE/WIC/ACM International Conference on Web

Intelligence (WI'06), Hong Kong

Johansson, S. and Johansson, J. (2010). "Visual analysis of mixed data sets using

interactive quantification." SIGKDD Explor. Newsl. 11(2): 29-38.

Kamada, T. and Kawai, S. (1989). "An algorithm for drawing general undirected

graphs." Inf. Process. Lett. 31(1): 7-15.

Kang, H., Plaisant, C., Lee, B. and Bederson, B. B. (2007). "NetLens: iterative

exploration of content-actor network data." Information Visualization(6): 18-

31.

Karahalios, K. G. and Viégas, F. B. (2006). Social Visualization: Exploring Text,

Audio, and Video Interaction. Proceedings of the Conference on Human

Factors in Computing Systems, CHI '06, Quebec, Canada, pp.1667-1670

Keller, R., Eckert, C. M. and Clarkson, P. J. (2006). "Matrices or node-link diagrams:

which visual representation is better for visualising connectivity models?"

Information Visualization 5(1): 62-76.

Kerr, B. (2003). Thread arcs: an email thread visualization. Proceedings of the Ninth

annual IEEE conference on Information visualization, Seattle, Washington,

IEEE Computer Society, pp.211-218

Kidane, Y. H. and Gloor, P. A. (2005). Correlating temporal communication patterns

of the Eclipse open source community with performance and creativity.

Proceedings of NAACSOS North American Association for computational

Social and Organizational Science, Notre Dame

Kobourov, S. G. and Wampler, K. (2005). "Non-Euclidean Spring Embedders." IEEE

Transactions on Visualization and Computer Graphics 11(6): 757-767.

Kobsa, A. (2004). User Experiments with Tree Visualization Systems. Proceedings of

the IEEE Symposium on Information Visualization, IEEE Computer Society,

pp.9-16

Kosak, C., Marks, J. and Shieber, S. (1994). "Automating the layout of network

diagrams with specified visual organization." Systems, Man and Cybernetics,

IEEE Transactions on 24(3): 440-454.

Kossinets, G., Kleinberg, J. and Watts, D. (2008). "The Structure of Information

Pathways in a Social Communication Network." Computing Research

REpository (CoRR).

Krempel, L. (2009). Network Visualization. Sage Handbook of Social Network

Analysis. J. Scott and P. J. Carrington. London, SAGE.

Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a

nonmetric hypothesis." Psychometrika 29(1): 1-27.

Lamping, J. and Rao, R. (1999). The hyperbolic browser: a focus + context technique

for visualizing large hierarchies. Readings in information visualization. S. K.

Card, J. D. Mackinlay and B. Shneiderman, Morgan Kaufmann Publishers Inc.: 382-408.

Lamping, J., Rao, R. and Pirolli, P. (1995). A focus+context technique based on

hyperbolic geometry for visualizing large hierarchies. Proceedings of the

SIGCHI Conference on Human Factors in Computing Systems, Denver,

Colorado, USA, ACM Press/Addison-Wesley Publishing Co., pp.401-408

Le, M.-T., Dang, H.-V., Lim, E.-P. and Datta, A. (2008). WikiNetViz: Visualizing

Friends and Adversaries in Implicit Social Networks. Proceedings of the

Intelligence and Security Informatics, 2008. ISI 2008. IEEE International

Conference, Taipei, Taiwan

Leavitt, H. J. (1950). "Communication Patterns in Task-Oriented Groups." Journal of

the Acoustical Society of America 12: 725-730.

Lee, B. (2006). Interactive Visualization for Trees and Graphs, University of

Maryland. PhD.

Lee, B., Plaisant, C., Parr, C. S., Fekete, J.-D. and Henry, N. (2006). Task taxonomy

for graph visualization. Proceedings of the 2006 AVI Workshop on BEyond

time and Errors: novel evaLuation methods for Information Visualization New

York, USA, ACM, pp.1-5

Leighton, T. and Rao, S. (1988). An approximate max-flow min-cut theorem for

uniform multicommodity flow problems with applications to approximation

algorithms. Proceedings of the 29th Annual Symposium on Foundations of

Computer Science, pp.422-431

Li, W., Eades, P. and Nikolov, N. (2005). Using spring algorithms to remove node

overlapping. Proceedings of the 2005 Asia-Pacific symposium on Information

visualisation - Volume 45, Sydney, Australia, Australian Computer Society,

Inc., pp.131-140

Liiv, I. (2010). "Seriation and matrix reordering methods: An historical overview."

Statistical analysis and data mining 3(2): 70-91.

Linstone, H. A. and Turoff, M. (1975). The Delphi Method: Techniques and

Applications. Reading, MA, Addison-Wesley.

Lo, E., Au, A., Hoek, P. and Eberl, L. (2010). Combining Contextual Data in the

Analysis of Temporal Social Networks... Proceedings of the TTCP Human

Sciences Symposium, Sydney, Australia.

Lo, E. H. S., Au, T. A., Hoek, P. J. and Eberl, L. (2011). "Analysis of Evolving Team

Interactions in Dynamic Targeting." International Journal of Intelligent

Defence Support Systems 4(4/2011): 309-327.

Lo, E. H. S., Au, T. A., Hoek, P. J. and La, P. D. (2009). Analysis of team interactions

in dynamic targeting. Proceedings of the SimTect 2009 Conference:

Simulation - Concepts, Capability and Technology, Adelaide Convention

Centre

Luke, D. A. (2005). "Getting the Big Picture in Community Science: Methods That

Capture Context." American Journal of Community Psychology 35(3/4).

Marriott, K., Stuckey, P., Tam, V. and He, W. (2003). "Removing Node Overlapping

in Graph Layout Using Constrained Optimization." Constraints 8(2): 143-171.

Marshall, M. S. (2001). Methods and Tools for the Visualization and Navigation of

Graphs. Mathematics and Computer, University of Bordeaux. PhD.

McGrath, C. and Blythe, J. (2004). "Do You See What I Want You to See? The Effects

of Motion and Spatial Layout on Viewers' Perceptions of Graph Structure."

Journal of Social Structure 5(2).

McGrath, C., Blythe, J. and Krackhardt, D. (1997). "The effect of spatial arrangement

of judgments and errors in interpreting graphs." Social Networks 19(3): 223-

242.

Milgram, S. (1967). "The Small World Problem." Psychology Today 2(1): 60-67.

Moody, J., McFarland, D. and Bender-deMoll, S. (2005). "Dynamic Network

Visualization." American Journal of Sociology 110(4).

Moody, J. and White, D. R. (2003). "Structural cohesion and embeddedness: a

hierarchical concept of social groups." American Sociological Review

69(February): 103-127.

Moreno, J. L. (1953). Who Shall Survive: Foundations of Sociometry, Group

Psychotherapy, and Sociodrama. Washington, DC, Beacon House.

Morse, E., Lewis, M. and Olsen, K. A. (2000). "Evaluating visualizations: using a

taxonomic guide." Int. J. Hum.-Comput. Stud. 53(5): 637-662.

Munzner, T. (1997). H3: laying out large directed graphs in 3D hyperbolic space.

Proceedings of the 1997 IEEE Symposium on Information Visualization

(InfoVis '97), IEEE Computer Society, pp.2

Munzner, T. (1998). Drawing Large Graphs with H3Viewer and Site Manager. Graph

Drawing. S. Whitesides, Springer Berlin Heidelberg. 1547: 384-393.

Munzner, T. (2000). Interactive Visualization of Large Graphs and networks.

Department of Computer Science, Stanford University PhD.

Munzner, T. (2009). "A Nested Model for Visualization Design and Validation." IEEE

Transactions on Visualization and Computer Graphics 15(6): 921-928.

Mutton, P. (2004). Inferring and Visualizing Social Networks on Internet Relay Chat.

Proceedings of the Eighth International Conference on Information

Visualisation (IV'04), IEEE

Nascimento, H. A. D. d. (2001). A framework for human-computer interaction in

directed graph drawing. Proceedings of the 2001 Asia-Pacific symposium on

Information visualisation - Volume 9, Sydney, Australia, Australian Computer

Society, Inc., pp.63-69

Nascimento, H. D. and Eades, P. (2002). User Hints for Directed Graph Drawing.

Graph Drawing. P. Mutzel, M. Jünger and S. Leipert, Springer Berlin

Heidelberg. 2265: 205-219.

Natrella, M. (2010). NIST/SEMATECH e-Handbook of Statistical Methods,

NIST/SEMATECH.

Nguyen, Q., Eades, P. and Hong, S.-H. (2013). On the faithfulness of graph

visualizations. Proceedings of the 20th international conference on Graph

Drawing, Redmond, WA, Springer-Verlag, pp.566-568

Nicholson, T. A. J. (1968). "Permutation procedure for minimising the number of

crossings in a network." Electrical Engineers, Proceedings of the Institution of

115(1): 21-26.

Nielsen, J. and Molich, R. (1990). Heuristic evaluation of user interfaces. Proceedings

of the SIGCHI conference on Human factors in computing systems:

Empowering people, Seattle, Washington, United States, ACM, pp.249-256

Noack, A. (2003). An Energy Model for Visual Graph Clustering. Proceedings of the

11th International Symposium on Graph Drawing (GD'03), Perugia, Italy,

Springer Berlin Heidelberg, pp.425-436

Noack, A. (2006). Energy-Based Clustering of Graphs with Nonuniform Degrees.

Graph Drawing. P. Healy and N. Nikolov, Springer Berlin Heidelberg. 3843:

309-320.

O'Madadhain, J., Fisher, D. and Nelson, T. "JUNG - Java Universal Network/Graph

Framework." Sourceforge.net, Available from: http://jung.sourceforge.net/,

Accessed 23/4/2013.

Opsahl, T., Agneessens, F. and Skvoretz, J. (2010). "Node centrality in weighted

networks: Generalizing degree and shortest paths." Social Networks 32(2):

235.

Perer, A. (2008). Using SocialAction to Uncover Structure in Social Networks over

Time. Proceedings of the IEEE VAST Symposium, Piscataway, NJ, IEEE,

pp.213-214

Perer, A. and Shneiderman, B. (2006). "Balancing systematic and Flexible Exploration

of Social Networks." IEEE Transactions on Visualization and Computer

Graphics 12(5).

Perer, A. and Shneiderman, B. (2008). Integrating Statistics and Visualization: Case

Studies of Gaining Clarity during Exploratory Data Analysis. Proceedings of

the CHI 2008 - Conference on Human Factors in Computing Systems,

Florence, Italy

Perer, A., Shneiderman, B. and Oard, D. W. (2006). "Using rhythms of relationships

to understand e-mail archives." J. Am. Soc. Inf. Sci. Technol. 57(14): 1936-

1948.

Perer, A. and Smith, M. A. (2006). Contrasting portraits of email practices: visual

approaches to reflection and analysis. Proceedings of the working conference

on Advanced visual interfaces, Venezia, Italy, ACM, pp.389-395

Peterson, E. (2011). Time Spring Layout for Visualization of Dynamic Social

Networks. Proceedings of the Network Science Workshop (NSW) IEEE, West

Point, NY, pp.98-104

Phan, D. (2008). Supporting the visualization and forensic analysis of network events,

Stanford University: 108.

Plaisant, C. (2004). The Challenge of Information Visualization Evaluation.

Proceedings of the working conference on Advanced Visual Interfaces (AVI

2004), Gallipoli, Italy, ACM Press: New York, pp.109-116

Plaisant, C., Bederson, B. B., Kang, H. and Lee, B. (2006). Exploring ENRON Email

with NetLens, Human-Computer Interaction Laboratory, University of

Maryland.

Plaisant, C., Milash, B., Rose, A., Widoff, S. and Shneiderman, B. (1996). Lifelines:

visualizing personal histories. Proceedings of the Conference on Human

Factors in Computing Systems: Common Ground, SIGCHI, Vancouver,

British Columbia, Canada, pp.221-227

Pohl, M., Reitz, F. and Birke, P. (2008). As Time Goes by- Integrated Visualization

and Analysis of Dynamic Networks. Proceedings of the working conference

on Advanced Visual Interfaces AVI`08, Napoli, Italy, ACM

Pousman, Z., Stasko, J. and Mateas, M. (2007). Casual information visualization:

Depictions of data in everyday life. Proceedings of the IEEE Information

Visualization Conference (InfoVis'07), Sacramento, California, pp.1145-

1152

Pretorius, A. J. and Wijk, J. J. V. (2008). "Visual Inspection of Multivariate Graphs."

presented at Comput. Graph Forum: 967-974.

Purchase, H., Carrington, D. and Allder, J.-A. (2002). "Empirical Evaluation of

Aesthetics-based Graph Layout." Empirical Software Engineering 7(3): 233-

255.

Purchase, H. C., Carrington, D. A. and Allder, J.-A. (2000). Experimenting with

Aesthetics-Based Graph Layout. Proceedings of the First International

Conference on Theory and Application of Diagrams, Springer-Verlag, pp.498-

501

Rao, A. R. and Bandyopadhyay, S. (1987). "Measures of Reciprocity in a Social

Network." Sankhya: The Indian Journal of Statistics 49, Series A(2): 141-188.

Reda, K., Tantipathananandh, C., Johnson, A., Leigh, J. and Berger-Wolf, T. (2011).

"Visualizing the Evolution of Community Structures in Dynamic Social

Networks." Computer Graphics Forum 30(3): 1061-1070.

Renfro, R. S. (2001). Modeling and Analysis of Social Networks. Graduate School of

Engineering and Management, Air Force Institute of Technology, Air University.

PhD.

Rhyne, T.-M., Tory, M., Munzner, T., Ward, M., Johnson, C. and Laidlaw, D. H.

(2003). Information and Scientific Visualization: Separate but Equal or Happy

Together at Last. Proceedings of the 14th IEEE Visualization 2003 (VIS'03),

Washington, DC, USA, IEEE Computer Society pp.115

Riche, N. H. (2009). Beyond System Logging: Human Logging for Evaluating

Information Visualization. Proceedings of the BELIV'10 workshop at ACM

SIGCHI Atlanta, Georgia, USA

Ryan Cragun, D. C. (2008). Introduction to Sociology, Blacksleet River.

Saaty, T. L. (1964). The minimum number of intersections in complete graphs.

Proceedings of the National Academy of Sciences of the United States of

America, pp.688-690

Santamaría, R. and Therón, R. (2008). Overlapping Clustered Graphs: Co-authorship

Networks Visualization. Proceedings of the 9th international symposium on

Smart Graphics, Rennes, France, Springer-Verlag, pp.190-199

Santos, B. S. (2008). Evaluating Visualization techniques and tools: what are the main

issues? Proceedings of the CHI`08 Conference on Human Factors in

Computing Systems, Florence, Italy

Schaefer, M., Wanner, F., Mansmann, F., Scheible, C., Stennett, V., Hasselrot, A. T.

and Keim, D. A. (2011). Visual Pattern Discovery in Timed Event Data.

Proceedings of the SPIE 7868, Visualization and Data Analysis, San Francisco,

USA

Schaeffer, S. E. (2007). "Graph Clustering." Computer Science Review 1(1): 27-64.

Schaeffer, S. E. (2007). "Survey: Graph clustering." Comput. Sci. Rev. 1(1): 27-64.

Schindling, E. (2007). "Cinematic Particles." Retrieved 26/06, 2012, from

http://www.evsc.net/v6/htm/cinematic.htm.

Schulz, H., Hadlak, S. and Schumann, H. (2011). "The Design Space of Implicit

Hierarchy Visualization: A Survey." Visualization and Computer Graphics,

IEEE Transactions on 17(4): 393-411.

Scott, J. P. (1991). Social Network Analysis: A Handbook, Sage Publications Ltd,

2000.

Seary, A. J. (2005). MultiNet: An Interactive Program for Analysing and Visualizing

Complex Networks, Simon Fraser University.

Shannon, R., Holland, T. and Quigley, A. (2008). Multivariate Graph Drawing using

Parallel Coordinate Visualisations.

Shaverdian, A. A., Zhou, H., Michailidis, G. and Jagadish, H. V. (2009). Algebraic

visual analysis: the Catalano phone call data set case study. Proceedings of the

ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery:

Integrating Automated Analysis with Interactive Exploration, Paris, France,

ACM, pp.74-82

Shen, Z. and Ma, K.-L. (2007). Path Visualization for Adjacency Matrices.

Proceedings of the Eurographics/ IEEE-VGTC Symposium on Visualization

EuroVis07, pp.83-90

Shi, J. and Malik, J. (2000). "Normalized cuts and image segmentation." Pattern

Analysis and Machine Intelligence, IEEE Transactions on 22(8): 888-905.

Shneiderman, B. (1996). "The Eyes Have it: A Task By Data Type Taxonomy for

Information Visualizations." IEEE Symposium on Visual Languages 1: pp336.

Shneiderman, B. and Aris, A. (2006). "Network Visualization by Semantic

Substrates." IEEE Transactions on Visualization and Computer Graphics

12(5): 733-740.

Shneiderman, B. and Plaisant, C. (2006). Strategies for evaluating information

visualization tools: multi-dimensional in-depth long-term case studies.

Proceedings of the 2006 AVI workshop on BEyond time and errors: novel

evaluation methods for information visualization, Venice, Italy, ACM, pp.1-7

Spritzer, A. S. and Freitas, C. M. D. S. (2012). "Design and Evaluation of MagnetViz

- A Graph Visualization Tool." IEEE Transactions on Visualization and

Computer Graphics 18(5): 822-835.

Stasko, J. (2000). "An evaluation of space-filling information visualizations for

depicting hierarchical structures." Int. J. Hum.-Comput. Stud. 53(5): 663-694.

Stolpnik, A. (2009). Visual Hints for Semantic Graph Exploration. Faculty of Exact

Sciences, School of Computer Science. Tel-Aviv, Tel-Aviv University. M.Sc.

Tamassia, R. (1987). "On embedding a graph in the grid with the minimum number of

bends." SIAM J. Comput. 16(3): 421-444.

Tamassia, R. (1998). "Constraints in Graph Drawing Algorithms." Constraints 3(1):

87-120.

Tat, A. and Carpendale, M. S. T. (2002). Visualising Human Dialog. Proceedings of

the Sixth International Conference on Information Visualisation (IV’02),

London, UK, pp.295-302

Tat, A. and Carpendale, S. (2006). CrystalChat: Visualizing Personal Chat History.

Proceedings of the 39th Annual Hawaii International Conference on System

Sciences HICSS'06, Hawaii

Todorovic, D. (2008). "Gestalt principles." Scholarpedia 3: 5345.

Trier, M. (2006). "Commetrix." Retrieved 28/06, 2012, from

http://www.commetrix.de/.

Trier, M. (2008). "Towards Dynamic Visualization for Understanding Evolution of

Digital Communication Networks." Information Systems Research 19(3): 335-

350.

Tufte, E. R. (1983). The visual display of quantitative information, Graphics Press.

Tufte, E. R. (1990). Envisioning information, Graphics Press.

Tufte, E. R. (2001). The Visual Display of Quantitative Information, 2nd Edition.

Cheshire, Graphics Press.

Tutte, W. T. (1963). How to draw a graph. Proceedings of the London Mathematical

Society, pp.743-768

Utech, J., Branke, J., Schmeck, H. and Eades, P. (1998). An Evolutionary Algorithm

for Drawing Directed Graphs. Proceedings of the 1998 International

Conference on Imaging Science, Systems, and Technology (CISST'98), Las

Vegas, Nevada USA, CSREA Press, pp.154-160

Valiati, E. R. A., Pimenta, M. S. and Freitas, C. M. D. S. (2006). A taxonomy of tasks

for guiding the evaluation of multidimensional visualizations. Proceedings of

the 2006 AVI workshop on BEyond time and errors: novel evaluation methods

for information visualization, Venice, Italy, ACM, pp.1-6

Van Ham, F. (2005). Interactive Visualization of Large Graphs, Technische

Universiteit Eindhoven. PhD.

van Ham, F. and van Wijk, J. J. (2002). Beamtrees: compact visualization of large

hierarchies. Proceedings of the IEEE Symposium on Information

Visualization, INFOVIS 2002, pp.93-100

Van Wijk, J. J. (2005). The Value of Visualization. Proceedings of the 16th IEEE

Visualization 2005 (VIS05), Dept. of Math. & Comput. Sci., Technische

Univ. Eindhoven, Netherlands, pp.79-86

Van Wijk, J. J. (2006). "Views on Visualization." IEEE Transactions on Visualization

and Computer Graphics 12(4): 421-433.

Viégas, F. and Wattenberg, M. (2004). Studying Cooperation and Conflict between

Authors with history flow Visualizations. Proceedings of the SIGCHI

Conference on Human Factors in Computing Systems, Vienna, Austria,

pp.575-582

Viégas, F. B. and Donath, J. (2004). "Social Network Visualization: Can We Go

Beyond the Graph?" Workshop on Social Networks, CSCW 4: 6-10.

Viégas, F. B., Golder, S. and Donath, J. (2006). Visualizing email content: portraying

relationships from conversational histories. Proceedings of the SIGCHI

conference on Human Factors in computing systems, Montréal, Québec, Canada,

ACM, pp.979-988

Viermetz, M. and Skubacz, M. (2007). Using Topic Discovery to Segment Large

Communication Graphs for Social Network Analysis. Proceedings of the

IEEE/WIC/ACM International Conference on Web Intelligence, Silicon

Valley, CA, pp.95-99

Viger, F. and Latapy, M. (2005). Efficient and simple generation of random simple

connected graphs with prescribed degree sequence. Proceedings of the 11th

annual international conference on Computing and Combinatorics, Kunming,

China, Springer-Verlag, pp.440-449

Vigliotti, M. G. and Hankin, C. (2012). Discovery of anomalous behaviour and

network inference in security setting. London, Imperial College.

von Landesberger, T., Kuijper, A., Schreck, T., Kohlhammer, J., van Wijk, J. J.,

Fekete, J. D. and Fellner, D. W. (2011). "Visual Analysis of Large Graphs:

State-of-the-Art and Future Research Challenges." Computer Graphics Forum

30(6): 1719-1749.

Walshaw, C. (2001). A Multilevel Algorithm for Force-Directed Graph Drawing.

Graph Drawing. J. Marks, Springer Berlin Heidelberg. 1984: 171-182.

Ware, C. (2000). The Visual Representation of Information Structures. Proceedings of

the 8th International Symposium on Graph Drawing, pp.1-4

Ware, C., Purchase, H., Colpoys, L. and McGill, M. (2002). "Cognitive measurements

of graph aesthetics." Information Visualization 1(2): 103-110.

Wasserman, S. and Faust, K. (1994). Social network analysis : methods and

applications, Cambridge [England]; New York: Cambridge University Press,

1997.

Wattenberg, M. (2002). Arc Diagrams: Visualizing Structure in Strings. Proceedings

of the IEEE Symposium on Information Visualization (InfoVis'02), Boston,

Massachusetts, IEEE Computer Society, pp.110-116

Wattenberg, M. (2006). Visual exploration of multivariate graphs. Proceedings of the

SIGCHI conference on Human Factors in computing systems, Montreal,

Québec, Canada, ACM, pp.811-819

Weaver, C. E. (2006). Improvise: A User Interface for Interactive Construction of

Highly-coordinated Visualizations, University of Wisconsin--Madison.

Weber, M., Alexa, M. and Müller, W. (2001). Visualizing Time-Series on Spirals.

Proceedings of InfoVis '01 IEEE Symposium Information Visualization, pp.7-

14

Weng, C.-Y., Chu, W.-T. and Wu, J.-L. (2007). Movie Analysis Based on Roles'

Social Network. Proceedings of the IEEE International Conference on

Multimedia and Expo, Beijing, pp.1403-1406

Wills, G. (1997). NicheWorks — Interactive visualization of very large graphs. Graph

Drawing. G. DiBattista, Springer Berlin Heidelberg. 1353: 403-414.

Winn, W. (1994). "Contributions of Perceptual and Cognitive Processes to the

Comprehension of Graphics." Comprehension of Graphics(3): 27.

Wolfe, J. M. (2005). Guidance of Visual Search by Preattentive Information.

Neurobiology of attention. L. Itti, G. Rees and J. Tsotsos. San Diego, CA,

Academic Press / Elsevier: 101-104.

Wong, P. C., Foote, H., Mackey, P., Chin, G., Sofia, H. and Thomas, J. (2008). "A

dynamic multiscale magnifying tool for exploring large sparse graphs."

Information Visualization 7(2): 105-117.

Xu, K., Cunningham, A., Hong, S.-H. and Thomas, B. H. (2007). GraphScape:

Integrated Multivariate Network Visualization. Proceedings of the 6th

International AsiaPacific Symposium on Visualization, Sydney, Australia,

IEEE, pp.33-40

Yee, K.-P., Fisher, D., Dhamija, R. and Hearst, M. (2001). Animated Exploration of

Dynamic Graphs with Radial Layout. Proceedings of the IEEE Symposium on

Information Visualization 2001 (INFOVIS'01), IEEE Computer Society, pp.43

Yi, J., Elmqvist, N. and Lee, S. (2010). "TimeMatrix: Analyzing Temporal Social

Networks Using Interactive Matrix-Based Visualizations." International

Journal of Human-Computer Interaction 26(11): 1031-1051.

Yi, J. S., Kang, Y. a., Stasko, J. and Jacko, J. (2007). "Toward a Deeper Understanding

of the Role of Interaction in Information Visualization." IEEE Transactions on

Visualization and Computer Graphics 13(6): 1224-1231.

Yi, J. S., Melton, R., Stasko, J. and Jacko, J. A. (2005). "Dust & Magnet: multivariate

information visualization using a magnet metaphor." Information Visualization

4: 239-256.

Zhang, Q.-G., Liu, H.-Y., Zhang, W. and Guo, Y.-J. (2005). Drawing Undirected

Graphs with Genetic Algorithms. Advances in Natural Computation. L. Wang,

K. Chen and Y. Ong, Springer Berlin Heidelberg. 3612: 28-36.

Zhou, Y., Cheng, H. and Yu, J. X. (2009). "Graph clustering based on

structural/attribute similarities." Proc. VLDB Endow. 2(1): 718-729.

Zhu, J. (2007). Opportunities and Challenges for Network Analysis of Social and

Behavioral Data. Proceedings of the Seminar Series on Chaos, Control and

Complex Networks, Hong Kong, Hong Kong Polytechnic University

Appendix A - Glossary

This glossary provides a short explanation of some of the technical terms used in this thesis. Terms in italics are also defined in the glossary.

Adjacent: A node is adjacent to another if there is an edge connecting them.

Aggregation function: A function that creates a numeric representation of a set of data cases (e.g. average, count, sum).

Animation: A sequence of images viewed in a timed sequence to give the impression of movement. It is particularly suitable for representing changes over time and is often used to visualise communication or flow and to make causality obvious.

Arc: In graph drawing an arc, edge or link represents a relationship between nodes.

Attribute Domain: The set of values allowed in an attribute.

Bipartite graph: A graph B = [N, E], where N is a finite set of nodes and E is a collection of pairs of nodes, in which N is partitioned into two disjoint subsets and no edge in E has both end points in the same subset.
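The bipartite condition can be checked mechanically by trying to assign every node to one of two subsets so that no edge joins two nodes in the same subset. The following is a minimal sketch in Python, assuming an undirected graph supplied as a node list and an edge list; the function name `bipartition` and the example data are illustrative and are not taken from the thesis prototypes.

```python
from collections import deque


def bipartition(nodes, edges):
    """Return (left, right) node sets if the graph is bipartite, else None."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    side = {}  # node -> 0 or 1, the subset the node has been assigned to
    for start in nodes:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in side:
                    # Place each neighbour in the opposite subset.
                    side[other] = 1 - side[current]
                    queue.append(other)
                elif side[other] == side[current]:
                    # An edge with both end points in the same subset.
                    return None

    left = {n for n in nodes if side[n] == 0}
    right = {n for n in nodes if side[n] == 1}
    return left, right


# Hypothetical two-mode example: people linked to the projects they work on.
people_projects = [("ann", "p1"), ("bob", "p1"), ("bob", "p2")]
print(bipartition(["ann", "bob", "p1", "p2"], people_projects))
```

A graph of actors and attribute nodes in which edges only ever join an actor to an attribute would pass this check, whereas any graph containing a triangle would not.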

Brushing and Linking: The connection of two or more views whereby the interactive changes imposed on one component affect the representation of the other component.

Comma-Separated Values: A plain text file format that stores tabular data. It consists of any number of records in which the fields are separated with a comma.

Connected: Any two nodes in a graph are connected if there is a path from one to the other.

Degree Centrality: The simplest form of centrality measure, defined as the number of ties a node has to all others. In directed networks this measure can be divided into two subtypes, in-degree and out-degree.
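As an illustration of the definition above, the sketch below counts in-degree and out-degree from a list of directed edges. It is written in Python under the assumption that each edge is a (source, target) pair; the helper names `degree_counts` and `degree` are hypothetical, and treating total degree as in-degree plus out-degree is one common convention rather than the only one.

```python
from collections import Counter


def degree_counts(edges):
    """Count in-degree and out-degree for each node of a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # the edge leaves the source node
        in_degree[target] += 1   # the edge enters the target node
    return in_degree, out_degree


def degree(node, in_degree, out_degree):
    """Total degree, assumed here to be in-degree plus out-degree."""
    return in_degree[node] + out_degree[node]


# Hypothetical message log: each pair is (sender, receiver).
messages = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
in_deg, out_deg = degree_counts(messages)
for node in sorted(set(in_deg) | set(out_deg)):
    print(node, in_deg[node], out_deg[node], degree(node, in_deg, out_deg))
```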

Density (in networks): The proportion of all possible ties that are present in the network.

Diameter (in networks): The largest geodesic distance in the graph.
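Density and diameter can both be computed directly from a node list and an edge list. The Python sketch below makes several assumptions that are not stated in the glossary: edges are unique and contain no self-loops, the geodesic distance between two nodes is taken to be the number of edges on a shortest path, the diameter is computed on the undirected graph, and unreachable pairs are ignored. All function names are illustrative.

```python
from collections import deque


def density(nodes, edges, directed=False):
    """Proportion of possible ties that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # n(n-1) ordered pairs for directed graphs, half that for undirected ones.
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    return len(edges) / possible


def geodesic_distances(edges, source):
    """Shortest-path lengths (in edges) from source, by breadth-first search."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


def diameter(nodes, edges):
    """Largest geodesic distance between any pair of reachable nodes."""
    return max(
        (max(geodesic_distances(edges, n).values()) for n in nodes),
        default=0,
    )


# Hypothetical chain of four actors: a - b - c - d.
chain_nodes = ["a", "b", "c", "d"]
chain_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(density(chain_nodes, chain_edges), diameter(chain_nodes, chain_edges))
```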

Ecological Validity: The degree to which the behaviours observed and recorded in a

study reflect the behaviours that actually occur in natural settings.

Edge: An edge represents a relationship between nodes in a network. The term is

used in graph theory (as in directed and undirected edges). In social network

analysis the term is often used interchangeably with tie, arc and link.

Entity: A data element represented in a graph. For example: a person or an

organisation.

Force-Directed Algorithm: Force-directed algorithms treat the graph as a physical system with forces acting on the nodes of the graph. Each pass of the algorithm progressively minimises the energy until a reasonably stable layout is reached.
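To make the idea concrete, the following Python sketch runs a very simple force-directed layout: each edge acts as a spring with a natural length, every pair of nodes repels, and each pass moves the nodes a small step in the direction of the net force. This is an illustrative toy under assumed parameters (`natural_length`, `stiffness`, `repulsion`, `step` and `passes` are all hypothetical names and values), not the layout implementation used by the thesis prototypes or by any particular toolkit.

```python
import math


def force_directed_layout(positions, edges, natural_length=1.0,
                          stiffness=1.0, repulsion=0.1,
                          step=0.1, passes=200):
    """Return new 2D positions after a fixed number of force passes."""
    pos = {n: list(p) for n, p in positions.items()}
    nodes = list(pos)
    for _ in range(passes):
        force = {n: [0.0, 0.0] for n in nodes}

        # Spring forces: pull or push edge end points towards natural length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = stiffness * (dist - natural_length)
            fx, fy = pull * dx / dist, pull * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy

        # Repulsive forces: every pair of nodes pushes apart (inverse square).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / (dist * dist)
                fx, fy = push * dx / dist, push * dy / dist
                force[a][0] -= fx
                force[a][1] -= fy
                force[b][0] += fx
                force[b][1] += fy

        # Move each node a small step along its net force.
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return {n: tuple(p) for n, p in pos.items()}


# Two connected nodes that start too far apart are drawn together.
start = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
print(force_directed_layout(start, [("a", "b")], repulsion=0.0))
```

With repulsion switched off, the two nodes in the usage example settle at the natural spring length. Giving different edge types different natural lengths, as an attribute-based layout might, only requires replacing the single `natural_length` value with a per-edge lookup.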

Geodesic: A curve that locally minimizes the distance between two points in space.

In non-curved space the geodesic is a straight line.

Geovisualisation: Short for geographic visualisation.

Graph: A graph G = [N, E], where N is a finite set of nodes and E is a collection of pairs

of nodes represented as edges.

Graph Aggregation: Any form of clustering or merging of graph elements in a

hierarchical manner for the purpose of reducing the number of elements to

render to the screen.

Glyph (in information technology): A symbol or stylized figure that contributes to the meaning of what is being presented.

Human-computer Interaction: The study of the interaction between people and

computers.

In-Degree: In a directed graph the count of edges going into a node.

Information Visualisation (InfoVis): The visual representation of collections of non-

numerical information allowing one to form a mental model of information. The

principal task of information visualisation is to allow information to be derived

from data.

Link: In social network analysis the term is often used interchangeably with tie and arc.

Node: An individual entity in a network; a point in a graph. In social networks the terms node and actor are often used interchangeably. In graph theory the term vertex is often used.

Node Link Diagram: Diagrams consisting of points representing entities or concepts

connected via lines representing a relationship between the two nodes.

Multi-Dimensional Scaling: Maps a data set in higher dimensions to lower dimensions by non-linear projection, so that the distance between data points in lower dimensions best preserves the similarities or dissimilarities in the original distance matrix (Abdi, O'Toole et al. 2005).

Out-Degree: In a directed graph the count of edges going out of the node.

Preattentive processing: A special property of the human visual system that enables visual information to be processed very quickly without the need for focused attention. Processing times of less than 200 to 250 milliseconds are often quoted as being preattentive.

Singular value decomposition (SVD): An algebraic procedure that decomposes a data

matrix into its “basic structure”.

Scientific Visualisation (SciVis): Usually focuses on real-world objects that cannot

normally be seen such as fluid dynamics and molecules and has applications in

biology, chemistry, engineering and medicine to name a few.

Size (in networks): The number of nodes (or actors) in the network.

Sociogram: Usually a node-link diagram representing social relationships in a

network. A tool of Sociometry.

Sociometry: An early version of social network analysis.

Relation: A connection between a pair of entities. For example person A is “son of”

person B.

Render: The process of ‘drawing’ the graph nodes and edges on the screen and

determining how each node and edge is displayed.

Tie (in social networks): A relationship between actors in the network. Interpersonal ties can be categorised into three varieties: strong, weak or absent.


Two mode matrix: A data matrix in which the rows and columns represent different objects.

Visualisation: Formation in the mind of the image of an abstract concept.

Visual Analytics: The science of analytical reasoning facilitated by interactive visual

interfaces. Combines and extends the fields of Information Visualisation and

Scientific Visualisation.


Appendix B – Pseudo Code & Class Diagram


ABGV

//Customised force-directed layout class methods
//primary methods only

getSpringLength (Edge) {
    Using globally accessible variables
        natural_attribute_to_attribute_spring_length
        natural_node_to_attrib_spring_length
        natural_node_to_node_differentAttrib
        natural_node_to_node_sameAttrib
        current scaling_factor
    //return the appropriate natural spring length dependent on which nodes are being connected by the edge
    if Edge_source type is attribute node and Edge_destination type is attribute node {
        return current natural_attribute_to_attribute_spring_length x current scaling_factor
    } else if Edge_source type is attribute node or Edge_destination type is attribute node {
        return current natural_node_to_attrib_spring_length x current scaling_factor
    } else if Edge_source attribute <> Edge_destination attribute {
        return current natural_node_to_node_differentAttrib x current scaling_factor
    } else {
        return current natural_node_to_node_sameAttrib x current scaling_factor
    }
}

getMassValue (Visual Item) {
    Using globally accessible variables
        attribute_mass
        default node mass
    //return the currently set mass for attributes and nodes. These are used independently to allow for
    //visual separation of node-attribute and node-node relationships
    if Visual Item type is attribute node {
        return current attribute_mass
    } else {
        return current default node mass
    }
}

getSpringCoefficient (Edge) {
    Using globally accessible variables
        spring_coefficient
    //return the currently set spring_coefficient for an edge item. Globally modifying this facilitates
    //adjusting the density of groupings, allowing those that span groups to be more easily identified.
    if Edge source or target is an attribute node {
        return spring_coefficient x current_scaling
    }
    return default_spring_coefficient x current_scaling
}

//Edge renderer is required when the visualisation class is called upon to re-render. This class draws
//the link shape and ensures the attribute-attribute 'hidden' links are invisible.
New_Edge_Renderer extends EdgeRenderer { //the prefuse EdgeRenderer
    …
    Shape getRawShape (Edge) {
        Shape EdgeRenderer::Arc = null
        Shape EdgeRenderer::Arrow = null
        EdgePoints[2] = {} //an array to hold the endpoints of the arc
        //return a shape for all but attribute to attribute links
        if Edge source or target is not an attribute node {
            Visual_Item Source = Edge.getSource
            Visual_Item Target = Edge.getTarget
            EdgePoints[0] = 2Dpoint (Source)
            if Edge is directed { //determine arc direction
                Visual_Item Dest
                if EdgeRenderer::forward {
                    Dest = Target
                } else {
                    Dest = Source
                }
                Arrow = EdgeRenderer::getArrowTransform (Dest.start, Dest.end)
            }
            //If drawing straight lines
            if EdgeRenderer::Line type = straight_line_type {
                Arc = line (Source.x, Source.y, Target.x, Target.y)
            }
            //If drawing curved lines
            if EdgeRenderer::Line type = curved_line_type {
                Arc = cubic line (Source.x, Source.y, getControlPoints(Source), getControlPoints(Target), Target.x, Target.y)
            }
        }
        return Arc
    }
}
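The spring-length and mass selection described in the ABGV pseudocode can be sketched in plain Java as follows. This is only an illustrative sketch: the class `AbgvForceSketch`, its `Node` type, the field names and all numeric values are assumptions made for the example and are not taken from the thesis prototype or from the Prefuse API. It also assumes that node-to-node edges use the "different attribute" and "same attribute" lengths declared at the top of the pseudocode.

```java
import java.util.Objects;

// Hypothetical sketch: names and values are illustrative, not the prototype's or Prefuse's API.
final class AbgvForceSketch {
    enum Kind { ATTRIBUTE, ACTOR }

    /** Minimal node: an attribute node stands for an attribute value, an actor node carries one. */
    static final class Node {
        final String id;
        final Kind kind;
        final String attribute;

        Node(String id, Kind kind, String attribute) {
            this.id = id;
            this.kind = kind;
            this.attribute = attribute;
        }
    }

    // Placeholder natural spring lengths and masses, chosen for illustration only.
    double attributeToAttributeLength = 300.0;
    double nodeToAttributeLength = 60.0;
    double nodeToNodeDifferentAttributeLength = 200.0;
    double nodeToNodeSameAttributeLength = 40.0;
    double scalingFactor = 1.0;
    double attributeMass = 5.0;
    double defaultNodeMass = 1.0;

    /** Picks the natural spring length from the kinds of the two endpoints, then scales it. */
    double springLength(Node source, Node target) {
        boolean sourceIsAttribute = source.kind == Kind.ATTRIBUTE;
        boolean targetIsAttribute = target.kind == Kind.ATTRIBUTE;
        double natural;
        if (sourceIsAttribute && targetIsAttribute) {
            natural = attributeToAttributeLength;
        } else if (sourceIsAttribute || targetIsAttribute) {
            natural = nodeToAttributeLength;
        } else if (!Objects.equals(source.attribute, target.attribute)) {
            natural = nodeToNodeDifferentAttributeLength;
        } else {
            natural = nodeToNodeSameAttributeLength;
        }
        return natural * scalingFactor;
    }

    /** Attribute nodes and actor nodes use independently adjustable masses. */
    double mass(Node node) {
        return node.kind == Kind.ATTRIBUTE ? attributeMass : defaultNodeMass;
    }
}
```

Keeping the four natural lengths as separate fields mirrors the intent stated in the pseudocode comments: each kind of relationship can be tuned independently, and the single scaling factor stretches or compresses the whole layout.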

The primary class structure is presented in Figure 88, which omits much of the Prefuse framework detail.


Figure 88 - ABGV Class Diagram


SNAC2

NetworkGraphlet methods

//The NetworkGraphlet is the primary visualisation of the network as it is dynamic.
//The primary method is the makeGraph method, called when a change event triggers a call
//hierarchy leading to this method. This method updates the currently displayed graph vertices and links.
//The graph is encapsulated in the larger application by the class snaMetrics3D; it includes a
//3-dimensional member array indexed by minutes from the start of the dataset.
//currenttime is the time pointed to by the current slider position
public void makeGraph(currenttime) {
    //get a 2D matrix array for the current time and window selection
    graphAtTimeT [ ] [ ] = snaMetrics3D.get_current_matrix(currenttime)
    //iterate through the currently displayed nodes and modify edges so as to maintain
    //node positions to make the visualisation contiguous
    for s in all currently_displayed_operators {
        for d in all currently_displayed_operators {
            s_op = s //source node
            d_op = d //destination node
            link_count = graphAtTimeT[s][d]
            potential_edge_name = s_op - d_op
            graph_link = getLink(potential_edge_name)
            if s_op or d_op are in non_displaylist { //do not display this link
                if graph_link exists { //this link exists in the Jung representation
                    graph.remove_edge(s_op, d_op) //so remove it
                }
            } else { //if edge is to be displayed
                if graph_link exists { //and link already exists
                    graph_link.edgeCount = link_count
                } else { //edge doesn't currently exist, make one
                    graph_link = new graph_link(s_op, d_op)
                    graph_link.edgeCount = link_count
                    graph.add_edge(graph_link)
                }
            }
        }
    }
    Visualisation_viewer.repaint //draw the updated graph
}
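The edge reconciliation performed by makeGraph can be sketched in plain Java as below. This is a minimal sketch under stated assumptions: a `LinkedHashMap` keyed by edge name stands in for the JUNG graph, and the names `EdgeSyncSketch`, `Link` and `sync` are hypothetical and do not appear in the SNAC2 source. As in the pseudocode, an edge is created for every displayed pair, even when its count is zero.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a map of named links stands in for the graph library's edge set.
final class EdgeSyncSketch {
    /** Edge carrying a count, following the GraphLink idea in the pseudocode. */
    static final class Link {
        final String id;
        int edgeCount;

        Link(String from, String to) {
            this.id = from + "-" + to;
        }
    }

    final Map<String, Link> links = new LinkedHashMap<>();

    /**
     * Updates the links in place for one time step. Existing Link objects are reused,
     * which is what lets a layout keep node positions stable between steps.
     * matrix[s][d] is the link count from displayed.get(s) to displayed.get(d).
     */
    void sync(List<String> displayed, Set<String> hidden, int[][] matrix) {
        for (int s = 0; s < displayed.size(); s++) {
            for (int d = 0; d < displayed.size(); d++) {
                String from = displayed.get(s);
                String to = displayed.get(d);
                String name = from + "-" + to;
                if (hidden.contains(from) || hidden.contains(to)) {
                    links.remove(name); // no effect if the link is absent
                } else {
                    Link link = links.get(name);
                    if (link == null) {
                        link = new Link(from, to);
                        links.put(name, link);
                    }
                    link.edgeCount = matrix[s][d];
                }
            }
        }
    }
}
```

Updating the existing edge objects, rather than rebuilding the graph at every slider position, is the design choice the pseudocode comments point to: the graph changes incrementally, so the displayed network stays visually continuous.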

The NetworkGraphlet uses two classes to model the links and nodes in the Jung environment. GraphLink models the link and GraphNode models the node.

Class GraphLink { //the representation of a link in the graph
    Id //store the name of the link
    edgeCount //a place to store edge count between 2 nodes

    public GraphLink(fromNode, toNode) {
        id = fromNode+"-"+toNode; //link name derived from nodes
    }

    incrementCount () //increase edge count
    {
        ++edgeCount;
    }

    decrementCount () //decrease edge count
    {
        --edgeCount;
    }
}

Class GraphNode { //the representation of a node in the network
    Id //name of the node

    public GraphNode(id) {
        this.id = id;
    }

    public String toString() {
        return id; //JUNG makes use of this
    }
}

The primary class structure is presented in Figure 89.


Figure 89 - Partial Class diagram for SNAC2


TIPAD

//TIPADContainer render methods
render {
    selectedActor = namepanel.getselected
    do while set_of_actors_iterator has more {
        actor_name = set_of_actors_iterator.next
        //below sets the colour of event components in which the selected actor participates throughout the
        //TIPAD Container
        for i in componentList {
            eventObject = componentList[i]
            //eventObject contains event information including actors but is also an extension of a Java
            //GUI component
            found = false
            if selectedActor ≠ null or mode ≠ 1 {
                for j in event_actor_list {
                    actor = event_actor_list[j]
                    if actor.name = selectedActor {
                        found = true
                        eventObject.set_colour (actor.get_colour) //use java subsystem
                        eventObject.set_background (default_select_colour)
                    }
                }
                if found = false {
                    eventObject.set_background (default_found_colour)
                }
            }
            //below draws lines connecting the actor name with the correct event component and aligns with the
            //actors contained within it
            if mode = 1 or (mode = 2 and found) or mode = 3 or mode = 4 {
                for j in event_actor_list {
                    actor = event_actor_list[j]
                    if actor_name = actor.name {
                        Point = actor.getLocationOnScreen
                        xpos = Point.x + scrollvalue - parentPos.x //x position of line
                        Graphics_colour = actor.colour
                        if mode = 4 and found = false {
                            //white shadow lines in this mode
                            Graphics_colour = white
                        }
                        //skip the following when in mode 4 with nothing selected; in that case
                        //only white lines are drawn, without the riser line to the event
                        if not (no_actor_selected and mode = 4) {
                            Graphics_drawline Point.x, Point.y + eventObject.height - parentPos.y
                        }
                        //below draws the horizontal line connecting to the appropriate riser line to the event
                        Graphics_drawline 0, riserHeight, xpos-strokesize, riserHeight
                    }
                }
            }
        }
    }
}
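The mode-dependent drawing decisions in the render pseudocode can be isolated into a small pure function, sketched here in Java. The names `TipadConnectorSketch`, `Style` and `decide` are hypothetical, and the sketch only reproduces the conditions visible in the pseudocode; it makes no assumption about what each numbered mode means to the user.

```java
// Hypothetical sketch: separates the "what to draw" decision from the drawing calls.
final class TipadConnectorSketch {
    enum Colour { ACTOR, WHITE }

    /** Outcome of the decision for one actor and one event component. */
    static final class Style {
        final boolean drawConnector; // horizontal line from the actor name
        final boolean drawRiser;     // riser line up to the event component
        final Colour colour;

        Style(boolean drawConnector, boolean drawRiser, Colour colour) {
            this.drawConnector = drawConnector;
            this.drawRiser = drawRiser;
            this.colour = colour;
        }
    }

    /**
     * mode is the display mode (1 to 4), actorSelected says whether any actor is selected,
     * and found says whether the selected actor takes part in this event.
     */
    static Style decide(int mode, boolean actorSelected, boolean found) {
        boolean draw = mode == 1 || (mode == 2 && found) || mode == 3 || mode == 4;
        // Mode 4 draws white "shadow" lines for events without the selected actor.
        Colour colour = (mode == 4 && !found) ? Colour.WHITE : Colour.ACTOR;
        // The riser is skipped only in mode 4 when nothing is selected.
        boolean riser = draw && !(!actorSelected && mode == 4);
        return new Style(draw, riser, colour);
    }
}
```

Separating the decision from the Graphics calls in this way would make the four modes testable without a display, which is one possible reason to structure a renderer like this.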

The primary class structure is presented in Figure 90.


Figure 90 - Partial TIPAD application class diagram


TIMER

Structure Edge { //the structure of an edge
    Edge next
    Edge previous
    Time time
}

Edge first
Edge last

//initialise the variables so that window first edge = dataset first edge and window last edge = dataset
//last edge
init(Dataset data, Time initialStart, Time initialEnd) {
    first = data.first
    last = data.last
    translate(initialStart, initialEnd) //use translate function for rendering
}

//NOTE draw(Edge) is a routine which draws the parameter to the correct place on screen
//Integer scalingFactor is the factor by which the standard time scale is multiplied (used in draw(Edge))
redraw() { //simply put the edges in the viewport
    clearScreen()
    for Edge e between first and last (inclusive)
        draw(e)
}

Time windowStart
Time windowEnd

//translate provides the rendering of appropriate edges within specified timeframes.
translate(Time newStart, Time newEnd) {
    Edge oldFirst = first
    Edge oldLast = last
    if windowStart after newStart {
        //Extends the start of the window backward, so that the first Edge is within the new boundaries
        Edge next = first.previous
        while next.time not before newStart {
            first = next
            next = first.previous
        }
    } else if windowStart before newStart {
        //Advances the start of the window until the Edge stored in first is within the current window
        while first.time before newStart {
            first = first.next
        }
    }
    if windowEnd after newEnd {
        //Shrinks the end of the window backward, until the Edge stored in last is within the current window
        while last.time after newEnd {
            last = last.previous
        }
    } else if windowEnd before newEnd {
        //Advances the end of the window forwards, so that the Edge stored in last is the last Edge within the
        //new boundaries
        Edge next = last.next
        while next.time not after newEnd {
            last = next
            next = last.next
        }
    }
    //Record the new window boundaries
    windowStart = newStart
    windowEnd = newEnd
    //If the first or last Edge was updated, redraw the visible edges
    if oldFirst does not equal first OR oldLast does not equal last {
        redraw()
    }
}

//provides zooming using the translate function
zoom(Integer newFactor) {
    scalingFactor = scalingFactor * newFactor
    translate(windowStart + (windowStart / scalingFactor), windowEnd - (windowEnd / scalingFactor))
}

The primary class structure is presented in Figures 91 to 93.
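The sliding-window behaviour that the translate comments describe can be sketched in Java over a doubly linked list of time-stamped edges. This is a minimal sketch, not the TIMER implementation: the class `TimeWindowSketch`, the `chain` helper and the `redraws` counter (which stands in for actual drawing) are hypothetical. It assumes the edges are sorted by time and that the requested window contains at least one edge. Each loop tests its own boundary, so the sketch does not need to compare against the previous window start and end.

```java
// Hypothetical sketch: moves the first/last pointers of a time window over a sorted edge list.
final class TimeWindowSketch {
    static final class Edge {
        final long time;
        Edge previous;
        Edge next;

        Edge(long time) {
            this.time = time;
        }
    }

    Edge first;  // earliest edge inside the window
    Edge last;   // latest edge inside the window
    int redraws; // counts redraw requests instead of drawing

    TimeWindowSketch(Edge dataFirst, Edge dataLast) {
        this.first = dataFirst;
        this.last = dataLast;
    }

    /** Builds a doubly linked list from ascending times and returns its edges. */
    static Edge[] chain(long... times) {
        Edge[] edges = new Edge[times.length];
        for (int i = 0; i < times.length; i++) {
            edges[i] = new Edge(times[i]);
            if (i > 0) {
                edges[i].previous = edges[i - 1];
                edges[i - 1].next = edges[i];
            }
        }
        return edges;
    }

    void translate(long newStart, long newEnd) {
        Edge oldFirst = first;
        Edge oldLast = last;
        // Extend the start backward while the preceding edge is still inside the window.
        while (first.previous != null && first.previous.time >= newStart) {
            first = first.previous;
        }
        // Advance the start while the current first edge lies before the window.
        while (first.next != null && first.time < newStart) {
            first = first.next;
        }
        // Extend the end forward while the following edge is still inside the window.
        while (last.next != null && last.next.time <= newEnd) {
            last = last.next;
        }
        // Shrink the end backward while the current last edge lies after the window.
        while (last.previous != null && last.time > newEnd) {
            last = last.previous;
        }
        // Redraw only when the set of visible edges actually changed.
        if (oldFirst != first || oldLast != last) {
            redraws++;
        }
    }
}
```

Because only the two boundary pointers move, the cost of a pan or zoom is proportional to the number of edges entering or leaving the window rather than to the size of the dataset.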

Figure 91 - Partial class diagram of the "View" portion of the TIMER tool

Figure 92 - Partial class diagram of the "Model" portion of the TIMER tool


Figure 93 - Partial class diagram of the "View/Controller" portion of the TIMER tool