Using immersive real-time collaboration environments to manage IP networks

A thesis submitted for the degree of Doctor of Philosophy

Warren Harrop, BEng(Hons)(Telecommunications and Technologies) & BAppSc(Multimedia Technologies) (Swinburne University), Centre for Advanced Internet Architectures, Faculty of Science, Engineering and Technology, Swinburne University of Technology, Melbourne, Victoria, Australia.

August 21, 2014

Declaration

This thesis contains no material which has been accepted for the award to the candidate of any other degree or diploma, except where due reference is made in the text of the examinable outcome. To the best of the candidate’s knowledge this thesis contains no material previously published or written by another person except where due reference is made in the text of the examinable outcome; and where the work is based on joint research or publications, discloses the relative contributions of the respective workers or authors.

Warren Harrop
Centre for Advanced Internet Architectures (CAIA), Faculty of Science, Engineering and Technology
Swinburne University of Technology
August 21, 2014


Publications arising from this thesis

Some preliminary results and discussions in this thesis have been previously published in peer-reviewed literature:

W. Harrop and G. Armitage, “Intuitive Real-Time Network Monitoring Using Visually Orthogonal 3D Metaphors,” in Australian Telecommunications Networks & Applications Conference 2004 (ATNAC 2004), Sydney, Australia, 8-10 December 2004, pp. 276–282. [Online]. Available: http://caia.swin.edu.au/pubs/ATNAC04/harrop-armitage-ATNAC2004.pdf

W. Harrop and G. Armitage, “Modifying first person shooter games to perform real time network monitoring and control tasks,” in NetGames ’06: Proceedings of 5th ACM SIGCOMM workshop on Network and system support for games. New York, NY, USA: ACM, 2006, p. 10.

W. Harrop and G. Armitage, “Real-time collaborative network monitoring and control using 3D game engines for representation and interaction,” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 31–40.

The development of Greynets, the motivating use case for visualisation, is documented in:

W. Harrop and G. Armitage, “Defining and evaluating greynets (sparse darknets),” in LCN ’05: Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary. IEEE Computer Society, 2005, pp. 344–350.

W. Harrop and G. Armitage, “Greynets: a definition and evaluation of sparsely populated darknets,” in MineNet ’05: Proceeding of the 2005 ACM SIGCOMM workshop on Mining network data. New York, NY, USA: ACM Press, 2005, pp. 171–172.

F. Baker, W. Harrop, and G. Armitage, “IPv4 and IPv6 Greynets,” RFC 6018 (Informational), Internet Engineering Task Force, Sep. 2010. [Online]. Available: http://www.ietf.org/rfc/rfc6018.txt


Acknowledgements

My supervisor Professor Grenville Armitage & associate supervisor Dr. Philip Branch, for providing valuable guidance and feedback throughout the creation of this thesis.

Mitchell Harrop and Sebastian Zander for their feedback.

Many thanks to all CAIA members past and present for creating an excellent research environment.

The following people helped greatly in recruiting usability experimentation participants: Mitchell Harrop, Bettina Harrop, Juniris Harrop and Peter Harrop.

Thanks go to Daniel Trembath for creation of software to aid in data entry. This code saved numerous hours and allowed the accurate conversion of the approximately 200 pages of paper answer sheets to digital form.

The Cisco University Research Program Fund provided support funding for the project “Anomalous Traffic Detection and Collaborative Network Configuration Using 3D Multiplayer Game Engines”, Project Leader and Participants Grenville Armitage, Warren Harrop & Cisco Project Champion, Fred Baker. This funding enabled the production of L3DGEWorld and thanks go to Lucas Parry for his year of software engineering on the project. (http://www.caia.swin.edu.au/urp/l3dge/)

The Cisco Network Topology Icons are used in many diagrams in this thesis. (http://www.cisco.com/web/about/ac50/ac47/2.html)

The auDA Foundation supplied a supporting grant for the development of the CAIA Greynets Toolkit – greynetd. (http://caia.swin.edu.au/greynets/)

Trace file data of multi-player Quake III Arena game play from Chapter 6 is taken from data publicly available in the SONG collection, created by the Centre for Advanced Internet Architectures, Swinburne University of Technology (http://caia.swin.edu.au/sitcrc/song/).

Monika Dieker for being there from start to finish.


Contents

Abstract

1 Introduction
    1.1 A real-time collaborative environment
    1.2 Usability experimentation
    1.3 Network resource consumption
    1.4 Thesis outline

2 Techniques for Immersive and Collaborative Management
    2.1 The role of visualisation
        2.1.1 Reasons for the visualisation of data networks
        2.1.2 Past and present
    2.2 The diversity of metering, transfer and collection methods
        2.2.1 Network observation points - collecting frames
        2.2.2 Network layers - example network metrics
    2.3 The diversity of network control & service discovery methods
        2.3.1 Example network control methods
    2.4 Network data interpretation
        2.4.1 Text based
        2.4.2 2D – static and interactive
        2.4.3 3D – static and interactive
        2.4.4 3D – immersive
    2.5 Collaboration
    2.6 Taxonomy
        2.6.1 User control over presented information
        2.6.2 Real-time dynamic update of data into a system
        2.6.3 Historical data access
        2.6.4 Interaction with running network configuration
        2.6.5 Visual presentations and metaphors used
        2.6.6 Concurrent presentation of network variables
        2.6.7 Collaboration
        2.6.8 Scalability
    2.7 Network visualisation evolution – a review of notable examples
        2.7.1 Textual visualisations
        2.7.2 Static 2D visualisations
        2.7.3 Interactive 2D visualisations
        2.7.4 Interactive 3D visualisations
        2.7.5 Immersive 3D visualisations
    2.8 Visualisations using game engines
    2.9 Conclusion

3 Proposal & Methodology
    3.1 Leveraging a FPS game engine
    3.2 Historical data access
    3.3 Evaluation Methodology
        3.3.1 Usability experiments
        3.3.2 Network resource consumption experimentation

4 Towards a Visual Environment
    4.1 Motivating use case – visualising greynet data
    4.2 Early prototyping - 3VEN
    4.3 Cube engine
    4.4 Example usage
        4.4.1 Multiple monitoring points
    4.5 L3DGEWorld
        4.5.1 Collaboration
        4.5.2 Drill down
    4.6 Experiment architecture
    4.7 L3DGEWorld extensions/modifications
    4.8 Conclusion

5 Usability Evaluation
    5.1 Methodology
    5.2 Results & Discussion
        5.2.1 Participant demographics & experience
        5.2.2 Navigation within L3DGEWorld
        5.2.3 Object movements and the concepts they convey
        5.2.4 Visual orthogonality
        5.2.5 Participants’ ability to detect in-world events
        5.2.6 Participants’ in-world positions
        5.2.7 Participants’ sensitivity to latency
        5.2.8 Final open-ended questions
        5.2.9 Professional network administrators
    5.3 Conclusion

6 Network Resource Consumption
    6.1 Experiment setup
    6.2 Client connection establishment and teardown
    6.3 Data propagation process
    6.4 Continuous operation traffic
        6.4.1 Attribute update effect on snapshot size
    6.5 L3DGEWorld server data propagation delay
    6.6 Multiple client L3DGEWorld - usability traffic analysis
        6.6.1 Command traffic
        6.6.2 Snapshot traffic
    6.7 L3DGEWorld limitations
        6.7.1 Under-sampling
        6.7.2 Packet loss
        6.7.3 Acknowledgments
    6.8 L3DGEWorld performance over uncontrolled paths
    6.9 Conclusion

7 Conclusion
    7.1 L3DGEWorld
    7.2 Usability
    7.3 Network Resource Consumption

A Ethics clearance

B Questionnaire

C Answer sheets

D Hardware accuracy
    D.1 Methodology
    D.2 Results
        D.2.1 Alloy 8 port switch (GS-08DXI)
        D.2.2 Alloy 5 port switch (NS-05CR)

E Synthetic Packet Pairs

References

List of Figures

2.1 A section of the modern London underground map - still based on Beck’s original 1933 design
2.2 A graph of nodes and links represented as pure text, hierarchically and circularly.
2.3 Delays between data observation and user perception can be intrinsic to a system or artificially introduced using storage
2.4 Layers in IP data networking - abstract names, example protocols and examples of metrics measurable at each layer
2.5 Data gathered from observation points is often transported elsewhere for collection and analysis (with various transport protocols at each layer)
2.6 tcpdump displaying a fragment of the packets from a TCP exchange
2.7 trafshow displaying flows captured from a network interface
2.8 A running instance of wireshark
2.9 1999 IPv4 Internet reachability by Cheswick et al. [1]
2.10 2009 IPv4 AS reachability from CAIDA [2]
2.11 An MRTG graph for a link with outbound data pink, inbound blue
2.12 Clockwise from top left, heat maps displaying ‘fingerprints’ of HTTP, SMTP, SSH and AIM (instant messaging) packet data [3].
2.13 etherApe - hosts discovered on a network are joined by animated lines, where width represents a moving window average of bandwidth usage.
2.14 glTail - Circles represent web-server requests ejected into an area with gravity and a choke-point [4].
2.15 Rumint - ‘Binary rainfall’ visualisation, each line is one packet, one pixel maps to one packet bit (TCP packets in green, UDP in orange, ICMP purple) [5]
2.16 NVisionIP - Showing the three levels of drill down from overview to port level [6]
2.17 VISUAL - a user’s internal network addresses represented as a grid, external network addresses as surrounding yellow squares [7]
2.18 TNV - Multiple window sections displaying both traffic overview and individual packet data [8]
2.19 SeeNet3D - displaying 1993 backbone traffic [9]
2.20 The Spinning Cube of Potential Doom - Displaying TCP connection attempts into a network as coloured dots [10]
2.21 Untitled (Malécot et al.) - User configurable cube faces represent address space or port numbers while a 2D representation provides detail [11]
2.22 VAST - AS networks represented within a cube, for the diagnosis of BGP routing issues [12]
2.23 Untitled (Crutcher and Lazar et al.) - ATM network physical link inspection [13]
2.24 CyberNet - showing a building metaphor (top) and city metaphor [14] (bottom)
2.25 PSDoom - Unix processes represented as monsters within the game Doom II [15]

3.1 High-level system architecture

4.1 A greynet host monitors multiple IP addresses (amongst normal ‘lit’ hosts) from various subnets on an enterprise network. a – logical layout, b – implementation using VLAN trunking
4.2 Early prototyping with 3VEN - pyramids representing subnets and wireframe cones representing ports. Object size denotes packet-per-second rate into greynet space.
4.3 The Cube game engine prototype visualises the output of a greynet
4.4 Detecting and blocking an active network scan
4.5 L3DGEWorld - the viewer and another user inspect pyramid objects representing network hosts
4.6 Three-tier drill down in L3DGEWorld 2.3: Left to right, moving into a router object shows host objects (VoIP phones), moving into a host object then shows port objects
4.7 The network flows in a L3DGEWorld system
4.8 LCMON - Objects representing super-cluster nodes (left) & LAMS - Objects representing VoIP clients (right)

5.1 A greynet host object
5.2 Human usability world layout & clockwise from top-left, detail of in-world objects representing a router, laptop, greynet host and VoIP phone
5.3 Human usability participant age distribution
5.4 The distribution of participants’ skills in each category, from lowest to highest scoring participant
5.5 The relationship between a participant’s experience and their ability to move within the world
5.6 Participant obstacle course run times, ordered by first run time
5.7 Participants’ attitudes to object small, medium or large bounce height
5.8 Participants’ attitudes to object colour
5.9 Participants’ attitudes to object roll of 90° or 180°
5.10 Participants’ attitudes to object large or extra-large size
5.11 Participants’ attitudes to object slow, medium or fast spin
5.12 Correct recognition of object attributes, alone and as aggregate with other attributes.
5.13 The relationship between total experience and correct recognition of object attributes

6.1 L3DGEWorld isolated lab-bench experiment setup
6.2 Typical connection establishment states with 20 snapshot and command packets per second and no in-world activity
6.3 L3DGEWorld attribute propagation through a L3DGEWorld server
6.4 L3DGEWorld client to server traffic distributions for a single stationary user and no world updates
6.5 L3DGEWorld server to client traffic distributions for a single stationary user and no world updates
6.6 Updating the spin rate attribute of 8 to 128 objects (increasing in steps of 8) versus resulting snapshot packet sizes.
6.7 Command packets - Multi-user Quake III Arena play compared to L3DGEWorld network monitoring
6.8 All snapshot packet size data probability density functions - empirical and predicted
6.9 Empirical and predicted data, comparative CDFs and Q-Q plot of two participant scenarios
6.10 Empirical and predicted data, comparative CDFs and Q-Q plot of three participant scenarios (1*1*1)
6.11 L3DGEWorld over uncontrolled paths - experiment setup
6.12 L3DGEWorld 802.11g and 3G + ADSL SPP derived RTT
6.13 L3DGEWorld 3G + ADSL SPP derived RTT time-series

D.1 Dell time-stamping accuracy for packets with a 10ms inter-arrival time

E.1 Using the SPP (Synthetic Packet Pairs) method to estimate path RTT

List of Tables

5.1 Object attribute scales

Abstract

Network administrators attempting to provide reliable Internet Protocol (IP) based network services face many challenges when monitoring and controlling their networks. They must react to constant change from equipment malfunctions, changing network loads and attacks upon components of the network itself. To be aware of issues and respond, the administrator must collect network metrics using a diverse set of methods, then collate, present and interpret this data. They must then implement solutions for a running network using a variety of methods. None of these tasks are trivial, and they are resource intensive.

To ease the burdens of network management and to keep human administrators involved in network management in an efficient way, approaches based on visualisation have been proposed. These have often been limited, particularly in the areas of collaboration, self-evaluation through usability experiments, and the evaluation of their network resource consumption.

In this thesis we make three novel contributions. First, we build an immersive 3D visualisation for network management, implemented using an open-source 3D game engine, that contains a unique combination of features. These include the display of a wide variety of visual elements within an immersive 3D presentation, control of external systems using simple in-world interaction methods, distributed collaborative operation (using a client-server model) and deployment on a wide variety of commodity hardware and software platforms. We have used it to visualise data from our novel network monitoring technique (greynets) as a demonstration.

The next two contributions evaluate the feasibility of this approach to network management. We designed and conducted human usability trials with 49 participants, who had a wide variety of skill sets and demographics. We have found that when a visual attribute (such as spin) is presented in combination with other attributes on a single object, the detection accuracy of the attribute can be reduced. Generally speaking, participants did not have strong preconceived notions or common views about what meanings attributes convey. All participants were able to develop a rapid understanding of the virtual world and its controls, regardless of their level of computer experience. Also, participants successfully navigated simulated network events to correctly discover in-world changes. While using the system they were quite insensitive to network latency between the visualisation client and the server, tolerating simulated network delay higher than would normally be encountered on the wider Internet.

Third, we evaluate the network resource requirements of our prototype system. We found that it meets our goals of having modest bandwidth usage. It satisfactorily operates over long delay network paths that include the wider Internet and low-bandwidth wireless links such as 3G and WiFi. The system introduces no significant propagation delay and the traffic it produces scales linearly with the number of elements to be updated.

This thesis demonstrates that a collaborative 3D immersive environment shows promise as a network management tool. Such an environment can be constructed from a 3D game engine, can be used to detect network events (even by novices) and, when supporting collaborative visualisation, has modest network requirements.

Chapter 1

Introduction

Managing an Internet Protocol (IP) network is a significant challenge. A network administrator must minimise inappropriate resource consumption, while providing adequate service for applications that have a wide variety of network resource requirements. The traffic that an administrator must manage can originate from the administrator’s own network, or from outside the administrator’s domain of control. There are changing traffic engineering requirements, levels of congestion, and temporary equipment outages (whether scheduled or unforeseen). In addition to these issues, the networking devices or end hosts of an administrator’s network can be attacked by malicious parties. A network’s internal state is high-dimensional, presenting the administrator with a range of data that must be interpreted concurrently. With the outcomes of this thesis we aim to assist administrators of IP networks to provide improved service, by providing them with better tools for network management.

Managing networks presents challenges to the network administrator on several fronts. First, the regular, real-time collection and collation of network monitoring data from heterogeneous sources is non-trivial. Second, helpful presentation and correct interpretation of this data requires specialised skills. Finally, specialised skills are required to create a solution for the issue and then correctly implement the solution on the network devices involved. These devices can be heterogeneous in the function they perform, heterogeneous in manufacturer and, to complicate matters further, actively carrying user data. This has been likened to controlling a network using tweezers, because administrators are trapped at the device level and stuck with control via low level configuration using proprietary interfaces.

In this environment, network management requires vigilant administration by a team with specialist skills. This leads to a financial challenge. Employees cost money, and highly skilled employees both cost more and are harder to obtain. They represent the human central-point-of-failure in a network management system. In short, it is resource intensive for network administrators to have a clear understanding of what is happening within their network and then generate responses to issues in timely and correct ways. A challenge for researchers in the network management field is to reduce the level of specialised knowledge required for an employee to be a useful member of a network operations team.

An administrator may wish to monitor a number of metrics from their network, such as packet rates on key interfaces, loss rate and latency over important links, the spread of destination IP addresses being probed from external sources, and the rate of route updates between border routers. The administrator could simply view all data in raw numeric form. But watching more than one set of numbers scroll past on a screen at the same time becomes problematic for even the most highly trained human administrators. Raw numbers also make qualitative monitoring (such as looking for potential trends or patterns) particularly difficult.

One potential solution to these issues is the implementation of automated systems for management, but these have limitations and network management is more effective when a human element is present ‘in-the-loop’ [16]. Humans have sophisticated pattern recognition abilities and are also capable of adding high-level context to low-level events. For example, an automated system seeing a massive increase in repeat requests for a specific set of web pages might automatically trigger a response to a denial of service attack, blocking the traffic from hosts performing ‘excessive’ repeat page loads. However, a human administrator can realise that it is normal for a sport team’s website to receive large volumes of traffic when they are playing in the finals after many years of failure. Human administrators can recognise the broader context that is difficult to identify and pre-program into a machine-based system.

To easily and effectively leverage these human abilities, visualisations of various forms are used. In defining what visualisations attempt to achieve, Spence observes that: “...quite often, the sight of a graphical encoding of data causes an ‘Ah Ha!’ reaction in the viewer in the sense that a useful discovery has been made” [17]. Various forms of visualisation have been proposed for data network management. Many are visually novel and explore various mappings of network metrics to visualisation. Some allow the user to manipulate the visualisation in real-time to explore and gain insight into the presented data set. But despite these steps by the research community, commonly used network monitoring software has evolved little and remains limited. In many systems, important metrics are displayed in the form of 2D time-series graphs. In other systems, data is overlaid on 2D network topology diagrams. This is largely the extent of common visualisation techniques.

Past attempts to visualise live network state have been limited in a number of ways. Few offer both real-time and historical data and very few enable the control of the underlying network. Many have an intrinsic, inflexible coupling between data and visualisation. To date, no research has explored what the remote networking requirements are for systems that enable multi-party collaboration for network management tasks. Most previous approaches are only described in the literature, and have no public software releases of any kind. Further, authors have not generally engaged in much self-criticism of ideas, nor performed usability and scalability testing of their work. For any given scheme, with an absence of evidence of effectiveness and ability to scale to real-world operational needs, it is hard to obtain and retain the interest of the network operations community.

In this thesis we demonstrate how we can assist with these challenges: data collection, visual presentation, interactive interpretation by humans, response, reconfiguration, and real-time operation. We propose that this is best achieved through visualisations that present data as attributes of in-world objects in an artificial 3D space, an immersive virtual environment. By immersive, we mean a 3D virtual environment where a user can move around in the world to inspect the objects in it, and gain different perspectives on the presented data. Users and collaborators are represented as avatars, and can interact with the objects in the virtual environment. We believe the emergence of small, low-cost, yet graphically powerful devices will enable network monitoring and control to continue an evolution towards immersive 3D environments and away from text or 2D representation.

In our immersive environment we can present visual metaphors by applying various attributes (or combinations of attributes) to simple in-world objects, such as the spin rate of a cube or colour of a pyramid. The visual metaphors are not necessarily for fine-grained discrimination of network issues but they do enable quick recognition of relevant, coarse-grained events occurring on a network. Further, in an immersive 3D visualisation, administrators are not limited to viewing the visualisation and drawing conclusions. They are also able to respond to issues using in-world interaction metaphors that are transparently converted by the system into re-configuration commands sent to networking devices. With this kind of system we propose that even non-experts can make a positive contribution to management tasks.

Some of the features of the visualisation system we propose have been implemented previously, but they have often been implemented separately. Further, evaluation has been very limited or non-existent, leaving many unexplored areas. These include:

• If a collaborative immersive 3D network management system can be created.

• If network administrators and those with no data networking knowledge can use such a system to identify network events accurately.

• If there are any constraints in how we should map attributes of in-world objects to network metrics.

• If the system created can support collaboration over the Internet and, if so, what its network resource requirements are.

In this thesis we address these areas by making three novel contributions. First, we develop prototype visualisation software that implements a number of key features. Second, we evaluate the usability of the system through human usability experiments. Third, we evaluate the prototype’s network resource consumption, specifically when supporting collaborative network management. We further outline these in the following sub-sections.

1.1 A real-time collaborative environment

We first describe the design and implementation evolution of an open source immersive world for network management. The final iteration of the software contains a unique combination of features including: immersive 3D presentations, distributed server-client based collaboration among users, control of external systems through in-world interaction methods, interactive display of a wide variety of visual elements, and real-time operation while deployed on commodity hardware. The final software of the prototyping effort is L3DGEWorld [18]. It is released as open source software under a GPL license.

Our system can present many objects in a virtual environment. Each of these objects can have multiple attributes, such as colour, size, shape and spatial orientation. Of particular significance in our method of visual presentation is the visual orthogonality of the attributes of an object. In other words, the visual attributes used for distinct network metrics must themselves be visually distinct when presented on an object.

An abstract representation of a network’s state can be achieved by mapping each state variable to different attributes of objects within the virtual environment. By suitably combining visual characteristics presented on a single object we simultaneously represent multiple network metrics. For example, a rack of routers can be represented by a grid of small balls – one ball per router with each ball’s diameter proportional to the aggregate traffic load on the corresponding router. Each ball grows and shrinks as the traffic through its corresponding router rises and falls. Further, each ball’s colour can be linked to the current packet drop rate being experienced at each router. At a glance, the diameter and colour of a grid of balls reveals key attributes of a network’s current state.
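The rack-of-routers example above can be sketched in a few lines of Python. This is only an illustrative sketch, not L3DGEWorld code: the names RouterMetrics, BallAttributes and to_ball, the scaling limits, and the green-to-red colour ramp are all assumptions introduced here to show how two metrics might be mapped onto two visually distinct attributes.

```python
from dataclasses import dataclass


@dataclass
class RouterMetrics:
    """Hypothetical per-router measurements (field names are illustrative)."""
    load_bps: float   # aggregate traffic load in bits per second
    drop_rate: float  # fraction of packets dropped, 0.0 to 1.0


@dataclass
class BallAttributes:
    """Visual state of the ball that stands in for one router."""
    diameter: float   # world units
    colour: tuple     # (red, green, blue), each 0 to 255


def clamp(value, low, high):
    """Keep value within [low, high] so outliers cannot distort the scene."""
    return max(low, min(high, value))


def to_ball(m, max_load_bps=1e9, min_d=0.5, max_d=3.0, max_drop=0.05):
    """Map load to diameter and drop rate to colour (assumed scales)."""
    load = clamp(m.load_bps / max_load_bps, 0.0, 1.0)
    drop = clamp(m.drop_rate / max_drop, 0.0, 1.0)
    # Diameter grows linearly with load between the two assumed limits.
    diameter = min_d + load * (max_d - min_d)
    # Colour fades from green (no drops) to red (max_drop or worse).
    colour = (round(255 * drop), round(255 * (1.0 - drop)), 0)
    return BallAttributes(diameter, colour)


# A two-router rack: one lightly loaded, one saturated and dropping packets.
rack = {"r1": RouterMetrics(2.5e8, 0.0), "r2": RouterMetrics(1.0e9, 0.05)}
balls = {name: to_ball(m) for name, m in rack.items()}
```

Because each metric drives exactly one attribute, a change in load never alters colour and a change in drop rate never alters size, which is the property referred to above as visual orthogonality.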
Animation may also be used to represent additional attributes (such as the spin-rate of small cubes to indicate the rate of routing table updates being processed by each router). Each object’s state in the virtual environment becomes a proxy for the network’s own state. Network administrators move around the virtual environment to inspect objects and collections of objects representing network state.

We also enable network administrators to control their networks by interacting with objects in-world. A Command Line Interface (CLI) can be an efficient tool for control tasks. However, it is not intuitive and it takes time for a new user to become familiar and efficient with the syntax of different programs and devices. The CLI will also likely require the user to know a set of specific physical to logical mappings of the underlying system being controlled. For example, to implement a change on a network, an operator will need to know the syntax of the device they are connected to and also the details of what subnets represent what end hosts (i.e. VoIP phones, accounting PCs, external facing servers, etc.).

We map in-world actions to triggers of pre-programmed network control operations, hiding the arcane details of a particular command line syntax or web interface. Certain operations can be performed entirely in-world by network administrators with a lower skill level than would otherwise be required. The advantage of this approach is described by Goodall et al. [19, 20]. We ease the burden of more experienced administrators in an organisation by lowering the barrier for less experienced administrators to contribute to the management of the network. At the same time we allow these less experienced administrators to gain training through their experiences.

We do not attempt to implement all of the possible functionality enabled by a CLI interface with L3DGEWorld’s in-world interaction methods.
It is designed as a complementary technology for network control, rather than a complete replacement for all systems that have come before it. We allow for mappings between in-world events and one or more network configuration activities. Configurations can be complicated (for example, setting off macros of events), but mappings need to be created by those implementing a L3DGEWorld system¹.

¹ An example of this type of mapping is discussed in Section 4.1, where interacting with an object in-world (a Boolean event) creates a block rule against a specific host that the system can implicitly determine is an attacker.

We do note that integration of other interfaces into L3DGEWorld is possible. We have considered implementing a feature where the 2D interface of a device (such as a CLI or web interface) can be accessed from within the virtual-world. This could be enabled by moving close to an object representing a device and pressing a key. The client screen would then display the traditional interface of the device. This is much like in any modern 2D GUI, where the user can open any number of CLI windows and then move between 2D GUI and CLI tasks as they choose. In the same way, using a virtual world does not preclude the placement of CLI or 2D GUI elements within the 3D world. We did not implement this feature, as we wished to focus on the virtual world specifics of our work.

As networks carry more and more mission critical data, the need for collaboration on network management decisions becomes increasingly important. The implementation of the wrong network management decision can have an impact as devastating, if not more so, than the original problem. Typically the detection, interpretation and reaction process requires time from multiple staff with relatively high skill levels - time and skills that might be more cost-effectively utilised. Using our prototype system, multiple people at different locations are able to view the same network state from their own monitoring device.

In addition, rules for collaborative interaction are easily configured, so that multiple administrators can be required to interact with a particular object before a network reconfiguration is instantiated (analogous to military nuclear missile launches that require multiple keys to be turned simultaneously). Finally, our prototype system allows for the real-time operation of all of these features, unlike many of the systems in the literature which are non-real-time, and work only with recorded logs.

Key to practical implementation is our use of off-the-shelf multiplayer 3D game-engine technology.
Game engines provide a relatively inexpensive and flexible platform for implementing immersive and collaborative network management systems. With modification, modern game engines enable the creation of computer-controlled objects whose behaviours are tied to monitored network events, and whose reactions to being ‘shot’ or ‘healed’ (common in-game metaphors for interaction) can be translated into network reconfiguration events. This type of software is also capable of running on a diversity of clients and in a variety of networking conditions.

Our prototypes were implemented with our motivating use case in mind: the visualisation of greynets [21, 22, 23]. Greynets are a novel, lightweight scan detection method for enterprise networks that we developed in parallel with our visualisation software. Greynets passively listen to unused IP addresses scattered across an enterprise network, and possible security issues can be inferred from the unsolicited packets that arrive.

1.2 Usability experimentation

The second key contribution of this thesis is our evaluation of L3DGEWorld through usability experiments. We ran tests with 49 participants who had a wide variety of computer skills, ranging from network administrators to participants with little experience with computers. We investigate the degree to which our objects’ attributes convey common concepts to participants. For example, does an object bouncing vigorously suggest a common meaning (e.g. ‘normality’ or ‘urgency’) to participants? We test the visual orthogonality of attributes by determining the degree to which users can correctly differentiate the attributes of size, bounce and spin when presented in combination. We find that care must be taken when mapping network metrics to attributes, as certain attributes can prevent participants from reliably detecting other attributes on the same object. Overall, some attributes were detected correctly more often than others.

Finally, all participants used a L3DGEWorld system presenting simulated networking scenarios. We found that participants could all successfully navigate the system and use it to detect events. During these experiments we also examined participant sensitivity to emulated network latency introduced between the L3DGEWorld server and client. We found that participants are quite insensitive to this latency, with participants not detecting 400 ms of round trip latency.

1.3 Network resource consumption

To quantify the networking resource requirements and characteristics of L3DGEWorld we first introduce our experimental setup. We characterise network traffic between L3DGEWorld clients and server so that its impact on the network can be calculated. Through experiments we detail the data propagation process of attribute updates on a L3DGEWorld server.

We then discuss L3DGEWorld client connection establishment and teardown, and characterise traffic during continuous operation, in both the server-to-client and client-to-server directions. We detail the relationship between attribute updates and the subsequent size of server-to-client packets, and the data propagation delay that a L3DGEWorld server introduces to an attribute update. Through analysis of usability trial traffic, we quantify the client-to-server traffic when L3DGEWorld is used by between one and three participants, followed by quantification and prediction of server-to-client traffic. We show that L3DGEWorld’s traffic profile is predictable and does not present a significant load on a modern network.

A final set of experiments presents data obtained by running L3DGEWorld over a series of uncontrolled Internet paths (including link technologies such as WiFi and 3G). The results show that realistic values of latency do not prevent or limit L3DGEWorld from operating effectively.

1.4 Thesis outline

The remainder of this thesis is structured as follows. Chapter 2 reviews the literature. Current methods of network data collection and transfer are presented, followed by an overview of the role of visualisation in network monitoring. We then generate a taxonomy from study of the literature and review works of note before drawing conclusions from the literature. In Chapter 3 we present the rationale for our prototype development and our broad methodology for evaluation. Chapter 4 details the evolution of prototype development, from an application based on the GLUT (OpenGL Utility Toolkit) library to the final prototype system, L3DGEWorld, based on the OpenArena game engine. Chapter 5 evaluates the prototype with human usability experiments. Chapter 6 characterises and evaluates the prototype’s network resource consumption. We conclude in Chapter 7.

Chapter 2

Techniques for Immersive and Collaborative Management

This chapter surveys literature related to network management visualisations. First, it presents an overview of network data metering, collection and transfer protocols and an overview of network control methods and protocols. This is presented as an introduction to the diversity of management methods and their complexities, and as broader context for why people turn to visualisation as a method of network monitoring (and occasionally control). We follow with an overview of the methods that can be used to assist a human in interpreting raw networking data, from text based solutions through to fully immersive 3D environments.

Our taxonomy for the evaluation of literature includes: user control over presented information, real-time dynamic update of data into a system, historical data access, interaction with running network configuration, visual presentations and metaphors used, concurrent presentation of network variables, collaboration support, and scalability.

Selected literature is then described, starting with text based systems and moving towards systems that integrate interaction, immersion, collaboration and control features. Specific coverage is then given to a set of literature where game engines have been modified and leveraged to create visualisations of various data sets. This chapter concludes by outlining how the literature is deficient in a number of areas including: collaboration, network control, inflexible couplings of visualisation and data set, prototype evaluation, and public releases of software.


Figure removed due to potential copyright issues

Figure 2.1: A section of the modern London underground map - still based on Beck’s original 1933 design

2.1 The role of visualisation

The broader field of data visualisation is well established and its techniques have been applied to a vast range of data sets to produce representations of many forms [17]. In defining what visualisations attempt to achieve, Spence observes that “...quite often, the sight of a graphical encoding of data causes an ‘Ah Ha!’ reaction in the viewer in the sense that a useful discovery has been made” [17]. Along similar lines, Simon states, “...solving a problem simply means representing it so as to make the solution transparent” [24].

For example, Figure 2.1 shows how the network map of the London underground train system was simplified by Beck in the 1930s by abandoning scale, and representing important information while hiding unnecessary detail [25]. The key insight was that people traversing the underground rail system do not need exact map detail, which only serves to overwhelm them with irrelevant data. Clearly, different circumstances may call for more detailed maps, but Beck’s simplified style served its intended purpose well and is now a de facto world standard for public transport systems.

Similarly, clear understanding of network state is not necessarily best served by faithful representation of a network’s entire structure and associated attributes. Administrators can benefit from simplified visual representations that focus their attention on a network’s key characteristics, enhancing their ability to react and control their systems.

Figure 2.2 illustrates how one’s choice of presentation can impact on the ease with which information is processed by a human. Figure 2.2-A represents a set of relationships between nodes in pure text form (in ‘DOT language’ [26]), forcing the observer to create their own mental model of how the nodes relate to each other. Figures 2.2-B and 2.2-C simplify the observer’s mental task by presenting the relationships in hierarchical and circular layouts respectively.

[Figure 2.2 panels. A: digraph G { A -> I -> H; A -> B; G -> H; D -> E; D -> F; B -> D -> G; B -> C -> E; C -> F; } B: the same graph drawn hierarchically. C: the same graph drawn circularly.]

Figure 2.2: A graph of nodes and links represented as pure text, hierarchically and circularly.

Text has limited presentation possibilities but flexibility in analysis of details, while a visual interface allows for better overviews of network activity [27]. Some visualisations use simple representations (single pixels, polygons connected by lines etc.), while others use higher-level graphics or objects as metaphors for representation (e.g. the spin rate of a cube representing the rate of data transferred by a link). When using visualisation metaphors we attempt to achieve two goals. The first is to make metaphors distinct and optimal for leveraging a human’s ability to develop intuitive responses to the visualisation. The second is to map these visual elements to underlying network measurements in a manner that makes some form of intuitive sense to the user.

However, there are also a number of ways in which visualisation systems can be misleading and prove to be a weakness rather than a benefit. Data could be hidden simply through bad visualisation design, or malicious activity could attempt to leverage weaknesses in the way the human sensory system perceives visualisation data. These kinds of attacks could take various forms, including an attack on human cognition, memory or vision, or other attempts to cause the human element of a system to fail. In 2005, Conti et al. published work that discusses these forms of attack and creates a taxonomy of them [28]. The evaluation of the potential ‘exploitability’ of a visualisation scheme is important for the stability and security of a real-world deployment. However, we consider the topic beyond the scope of this thesis and our analysis.

2.1.1 Reasons for the visualisation of data networks

The complexity, volume and diversity of critical data carried by modern networks is difficult to conceptualise. Management methods give diverse views and inputs on an operating network. The network administrator is constantly monitoring variables for patterns of communications and systems acting outside of pre-determined bounds. The cause of variation could be anything from equipment failure or attacks upon the network to legitimate changes in network activity.

Naive investigation of every network anomaly quickly becomes overwhelming for an administrator [16]. Conti et al. observed that network based attacks quickly fill “logs with redundant information, until stopped. This fact, together with the average amount of unique alarms generated, can cause information overload and possibly hide the most significant attacks” [5].

In addition to using visualisation to assist in the diagnosis of known problems, in an ideal world, network administrators would also have enough insight into their network to data-mine and discover problems of which they had no prior knowledge. They should be made aware of issues before they become critical and are identified by other means (such as system failure or user complaints).

[Figure: a timeline starting at t=0 showing a metering process (optional filtering and/or storage), a collector process (optional filtering and/or storage), storage, presentation play-out, and perception]

Figure 2.3: Delays between data observation and user perception can be intrinsic to a system or artificially introduced using storage

2.1.2 Past and present

A defining element of network visualisation is the presentation and handling of time-dependent events, as many important trends in network data will occur in the time domain. Figure 2.3 illustrates the various time delays between data observation and final perception of this data by a human. Some delays are intrinsic to the entire visualisation system, and some may be artificially introduced.

Intrinsic delay can be introduced at any system step, be it data metering, transfer, collection or presentation. Data metering delays include any time required for a metering process to collect underlying network data and format it. Data transfer delays include any networking transmission delays, including speed of light limitations or queuing delays. Presentation delays consist of the time it takes to render a visualisation from raw data to graphic form. The magnitude of specific time delays can vary widely between visualisation systems. Whether these intrinsic delays are an issue for the administrator depends on the visualisation system’s intended purpose and the data being displayed.

Along with these intrinsic delays, optional data processing and/or storage can occur between any of the above steps. Processing can consist of data calculations and manipulations, or filtering to reduce the scope of data. Storage can consist of either short term buffering or long term data retention. If a visualisation system can decouple the processes of data recording and play-out, it gains the helpful ability to variably time shift (akin to being able to rewind, pause or fast-forward like a home video recorder).

This functionality allows the network administrator, from within a visualisation, to place problems that only clearly manifest themselves at a later date into their historic context. For example, an administrator can forensically establish which systems a previously undetected attacker has compromised after a major security breach is noticed.
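The decoupling of recording and play-out described above can be illustrated with a minimal sketch. The class name `TimeShiftBuffer`, its methods and the sample events are assumptions made for illustration only; they are not part of L3DGEWorld or any system in the literature.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class TimeShiftBuffer:
    """Stores timestamped observations so play-out can lag behind recording."""
    times: list = field(default_factory=list)    # observation times (seconds), ascending
    events: list = field(default_factory=list)   # payload recorded at each time

    def record(self, t, event):
        # Recording is assumed to happen in time order (append-only).
        if self.times and t < self.times[-1]:
            raise ValueError("observations must be recorded in time order")
        self.times.append(t)
        self.events.append(event)

    def play(self, start, end):
        """Return events with start <= t < end, regardless of when they were recorded."""
        lo = bisect_left(self.times, start)
        hi = bisect_left(self.times, end)
        return self.events[lo:hi]


buf = TimeShiftBuffer()
for t, ev in [(0.0, "link up"), (2.5, "scan from 10.0.0.9"), (7.0, "host compromised")]:
    buf.record(t, ev)

live = buf.play(5.0, 10.0)       # what a 'live' view of the latest window would show
rewound = buf.play(0.0, 5.0)     # the same buffer, rewound to the earlier window
```

Because play-out only reads from the stored sequence, the same recorded data can serve a live view and a rewound forensic view at once.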

2.2 The diversity of metering, transfer and collection methods

Data metering, transfer and collection are the first steps an administrator must take to gain an understanding of their network and the information being transmitted across it. Broadly speaking, they then aim to discover patterns of communications within this sea of collected data. When these patterns of communications diverge from predefined limits or norms, administrators need to be aware of these changes, so that they can act.

The raw data collected from a network is diverse and can be collected from any layer of the IP networking stack. Figure 2.4 shows the layers commonly defined in IP data networking: Physical, Data link, Internet, Transport and Application. This delineation allows for ease and flexibility in the development of protocols by hiding lower layer complexity from higher layers (a good overview of the IP layers can be found in [29]). The type of data monitored at each layer can vary greatly. Bit errors and loss of synchronisation can be measured at the physical layer, the propagation time of a link at the data link layer, the number of HTTP requests per second at the application layer, or any number of other performance metrics [30].

Figure 2.5 shows the components of network data metering, transfer and collection. Metering processes at observation points may utilise a range of different protocols to transfer various types of captured data to (potentially remote) collector processes.

Much of the literature discussed in later sections uses common methods to meter, transfer and collect data for its visualisations. In addition, visualisations and the type of data they present are often tightly coupled, so an overview follows for those less familiar with the diverse range of network management data and protocols.

Protocol stack   Example protocols    Example units of monitored data
Application      http/ftp/DNS         Log files
Transport        TCP/UDP/SCTP         Flows
Internet         IPv4/IPv6            Per-packet statistics
Data Link        Ethernet/SONET       Virtual circuit status
Physical         CAT5 copper/fibre    Bit errors/loss of synchronisation

Figure 2.4: Layers in IP data networking - abstract names, example protocols and examples of metrics measurable at each layer

[Figure: metering process(es) at network observation point(s) pass data of several types (NIDS, log files, MIB, flows, raw packets) via transfer protocols (SysLog, SNMP, netflow, IPFIX and various others) to collector process(es)]

Figure 2.5: Data gathered from observation points is often transported elsewhere for collection and analysis (with various transport protocols at each layer)

2.2.1 Network observation points - collecting frames

Metering of data at observation points can be performed using a number of methods. At the lowest level, passive taps can be inserted into, or onto, a data carrying medium to intercept a small percentage of the electrons or photons traversing it.

At the data link layer, devices have the ability to take a network frame’s exact bit pattern and capture this data as it is seen ‘on the wire’. Any network event can be reconstructed from captured frames, giving full data recovery and insight (although, in most circumstances, retaining only the necessary headers can lead to a substantial saving in storage space). Frames (and metadata relating to the capture) may be recorded in various file formats, including libpcap [31], Microsoft Network Monitor (NetMon) [32] and Sun’s Snoop [33], among others.
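The storage saving from retaining only headers can be made concrete with a small sketch of the classic libpcap file layout (a 24-byte file header followed by a 16-byte header per record). The two frames and the 54-byte snapshot length are assumptions chosen for illustration; the sketch handles only little-endian files with microsecond timestamps.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4                  # classic libpcap magic, microsecond timestamps
GLOBAL_HDR = struct.Struct("<IHHiIII")   # magic, version major/minor, thiszone, sigfigs, snaplen, linktype
RECORD_HDR = struct.Struct("<IIII")      # ts_sec, ts_usec, incl_len, orig_len


def write_pcap(frames, snaplen):
    """Serialise (ts_sec, ts_usec, frame_bytes) tuples, keeping at most snaplen bytes each."""
    out = io.BytesIO()
    out.write(GLOBAL_HDR.pack(PCAP_MAGIC, 2, 4, 0, 0, snaplen, 1))  # linktype 1 = Ethernet
    for sec, usec, frame in frames:
        kept = frame[:snaplen]
        out.write(RECORD_HDR.pack(sec, usec, len(kept), len(frame)))
        out.write(kept)
    return out.getvalue()


def read_pcap(data):
    """Yield (ts_sec, ts_usec, captured_bytes, original_length) for each record."""
    magic = GLOBAL_HDR.unpack_from(data, 0)[0]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian microsecond pcap file")
    offset = GLOBAL_HDR.size
    while offset < len(data):
        sec, usec, incl_len, orig_len = RECORD_HDR.unpack_from(data, offset)
        offset += RECORD_HDR.size
        yield sec, usec, data[offset:offset + incl_len], orig_len
        offset += incl_len


frames = [(1, 0, bytes(1500)), (1, 500, bytes(60))]   # two hypothetical frames
full = write_pcap(frames, snaplen=65535)
headers_only = write_pcap(frames, snaplen=54)          # Ethernet + IPv4 + TCP headers, no options
records = list(read_pcap(headers_only))
```

Each record still carries the original frame length, so packet size statistics survive truncation even though the payload does not.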

2.2.2 Network layers - example network metrics

Figure 2.4’s physical layer consists of the physical connections between equipment. Network metrics of interest can include bit error rate (the proportion of incorrectly transmitted 1s or 0s) and the loss of timing synchronisation between end points.

Data link layer protocols (such as Ethernet and SONET/SDH [34]) provide delineation of bits into frames, and multiplexing of different logical communication channels across a single physical link. With mechanisms such as MPLS (MultiProtocol Label Switching) [35] creating virtual links, network metrics of interest at this level include virtual circuit status and utilisation.

At the internetworking layer, the majority of literature covered in later sections assumes the use of the Internet Protocol (IP). There are various characteristics of the IP layer that an administrator might wish to monitor. These include the state of the routing control plane (reachability data), as bad data can result in routing black holes or routing loops. They also include statistics about the actual user data being transferred, as measured in the following examples.

Flow statistics

Although IP packets are forwarded independently by routers, successive packets that share common source and destination end points are often referred to as a flow. IP flows represent application communication and, by extension, end user actions and experiences. Because IP networking relies on statistical multiplexing, links can experience transient bursts of load. A network administrator may want to monitor what impact a user is having on their network, or to estimate what experience an end user is receiving.

A common definition is that flows are made up of packets that share the same 5-tuple of: protocol type (TCP, UDP, etc.), source and destination IP addresses, and source and destination port numbers (this may also be referred to as a microflow [36]). A flow can have statistics, such as a packet size distribution, inter-packet time distribution, and autocorrelation.

Flow information is commonly transported using NetFlow records [37] or the derived IETF protocol IPFIX (IP Flow Information eXport) [38]. For flow sampling (where only 1 in N packets are sampled and sent to a metering process to reduce resource consumption) there is the sFlow specification (RFC3176 [39]) or Cisco’s Sampled NetFlow [40]. RFC5474 [41] provides more detailed information on packet sampling.
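The 5-tuple definition and 1-in-N sampling above can be sketched in a few lines. The packet records, field order and function names are assumptions for illustration; real flow exporters such as NetFlow or IPFIX carry many more fields and use their own record formats.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records: (time_s, proto, src_ip, dst_ip, src_port, dst_port, size_bytes)
packets = [
    (0.00, "TCP", "10.0.0.1", "10.0.0.2", 40000, 80, 60),
    (0.01, "TCP", "10.0.0.1", "10.0.0.2", 40000, 80, 1500),
    (0.02, "UDP", "10.0.0.3", "10.0.0.2", 5353, 53, 80),
    (0.05, "TCP", "10.0.0.1", "10.0.0.2", 40000, 80, 1500),
]


def flow_stats(pkts):
    """Group packets by 5-tuple and summarise each flow."""
    flows = defaultdict(list)
    for t, proto, src, dst, sport, dport, size in pkts:
        flows[(proto, src, dst, sport, dport)].append((t, size))
    stats = {}
    for key, seen in flows.items():
        times = [t for t, _ in seen]
        gaps = [b - a for a, b in zip(times, times[1:])]
        stats[key] = {
            "packets": len(seen),
            "bytes": sum(size for _, size in seen),
            "mean_gap_s": mean(gaps) if gaps else None,   # undefined for single-packet flows
        }
    return stats


def sample_1_in_n(pkts, n):
    """Deterministic 1-in-N sampling: keep every n-th packet, starting with the first."""
    return pkts[::n]


stats = flow_stats(packets)
tcp_flow = stats[("TCP", "10.0.0.1", "10.0.0.2", 40000, 80)]
sampled = flow_stats(sample_1_in_n(packets, 2))
```

Comparing `stats` with `sampled` shows the cost of sampling: the TCP flow shrinks from three packets to one, so per-flow totals must be scaled up and short flows may be missed entirely.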

MIBs and SNMP

MIBs (Management Information Bases) are a form of hierarchical database that can store statistics from any layer of the network stack (or any other system information). Entries are retrieved using Object Identifiers (OIDs). Transferring MIB data can be performed using SNMP (Simple Network Management Protocol) [42]. It allows for MIB key-value pair retrieval of managed objects in two ways. A collector process can send an SNMP get packet to a metering process (an SNMP agent). Alternatively, metering devices can be configured so that collector processes are sent an SNMP message when predefined conditions are met (or exceeded).
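The hierarchical key-value nature of a MIB can be illustrated with a toy in-memory model. This is a sketch only: it does not speak SNMP, the stored values are invented, and the function names are assumptions. The OIDs used are the standard MIB-II identifiers for sysDescr, sysUpTime and ifInOctets.

```python
# A toy MIB: OIDs (as integer tuples) mapped to values. The values are made up.
mib = {
    (1, 3, 6, 1, 2, 1, 1, 1, 0): "Example router",   # sysDescr.0
    (1, 3, 6, 1, 2, 1, 1, 3, 0): 123456,             # sysUpTime.0 (hundredths of a second)
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 1): 987654,      # ifInOctets for interface 1
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 2): 4321,        # ifInOctets for interface 2
}


def parse_oid(text):
    """Convert dotted notation such as '1.3.6.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get(oid_text):
    """Return the value stored at exactly this OID (analogous to an SNMP get)."""
    return mib[parse_oid(oid_text)]


def get_next(oid_text):
    """Return the first (oid, value) pair after oid_text in lexicographic order."""
    oid = parse_oid(oid_text)
    later = sorted(k for k in mib if k > oid)
    if not later:
        raise KeyError("end of MIB view")
    return later[0], mib[later[0]]


def walk(prefix_text):
    """Collect every entry beneath a subtree by repeated get_next calls."""
    prefix = parse_oid(prefix_text)
    found, current = [], prefix_text
    while True:
        try:
            oid, value = get_next(current)
        except KeyError:
            break
        if oid[:len(prefix)] != prefix:
            break
        found.append((oid, value))
        current = ".".join(map(str, oid))
    return found


in_octets = walk("1.3.6.1.2.1.2.2.1.10")
```

Python compares tuples element by element, which matches the lexicographic OID ordering that a subtree walk relies on.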

Log files

Networking devices and end hosts record events in ‘log files’ written in human-readable ASCII text or binary encoded data. The exact layout of this data is diverse and is dependent on the application writing the log file. Two examples of standardisation in the community are the common log format by the W3C [43], implemented by software like the Apache web server [44] and proxy software Squid [45], and the syslog standard (RFC3164 [46]) used for the collection of log entries and real-time log entry transport.
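As a rough illustration of how these two standardised layouts can be turned into structured records, the sketch below parses one common log format line and the priority field of one BSD syslog line. The regular expressions are simplified assumptions that cover only the sample lines shown, not every variant found in practice.

```python
import re

# Common log format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)
# BSD syslog (RFC 3164) messages start with a priority value in angle brackets.
SYSLOG_PRI = re.compile(r"<(?P<pri>\d{1,3})>(?P<rest>.*)")


def parse_clf(line):
    """Return a dict of fields for a common log format line, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def parse_syslog(line):
    """Split the syslog priority into facility and severity."""
    m = SYSLOG_PRI.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    # The priority value encodes facility * 8 + severity.
    return {"facility": pri // 8, "severity": pri % 8, "message": m.group("rest")}


web = parse_clf('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326')
sys_entry = parse_syslog("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8")
```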

NIDS

A NIDS (Network Intrusion Detection System) takes data collected using the previously discussed methods and interprets it as ‘suspect’ or ‘not suspect’ for the human administrator. NIDSs utilise a range of techniques that may often complement each other, and can be deployed together. If desired, a NIDS can interact back with a network to instantiate traffic policy changes based on its discoveries. This creates an Intrusion Prevention System (IPS), or an Intrusion Detection and Prevention System (IDPS).

A signature based NIDS relies on pre-defined binary signatures to detect noteworthy traffic traversing a network. It can only detect predefined circumstances and has the additional resource burden of keeping signature data current. An algorithmic NIDS relies on predefined traffic patterns or limits being met or exceeded before alerts are triggered.

A NIDS may also reply to incoming connection attempts in order to elicit further insight into the nature of the connection’s initiator. Through various methods, apparently vulnerable software can be safely presented to the network. These Honeypots [47, 48] actually respond to connection attempts as vulnerable software would. Progressing to the connection and potential infection stages provides more insight into the nature and intent of the remote scanner. When attempts are made to infect honeypot processes, alerts are sent to administrators.

Developers of NIDS may use the XML based protocol suite defined in RFC4767 [49] and RFC4765 [50] for the transfer of NIDS data.

Darknets & Greynets

Network Telescopes [51] (or Darknets) monitor IP address space that is routed publicly (advertised into each network’s routing tables) but otherwise unused for normal network hosts. The same basic idea also appears as Internet Motion Sensors [52], Black Holes [53] or our own Greynets [21] (sparse darknets). Passive observation of these addresses reveals three types of inbound packets: malware seeking exploitable hosts by performing network scans, misconfigured network devices, and backscatter [54] from attacks occurring elsewhere on the Internet [55]. Network scans can be produced by tools such as nmap [56] under human control, or automated malware, and are often a precursor to attempted infection.

In summary, there is a wide diversity of network metrics that can be monitored to gain insight into a network’s performance. To collect these metrics, a wide variety of collection methods and transfer protocols have been developed. Each of these methods allows for the collection of raw data, but the administrator must take further steps to interpret the data and discover anomalous events.
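The idea of listening on unused addresses scattered among live hosts, as in the greynets described above, can be sketched as follows. The subnet, the set of active hosts, the observed packets and the threshold of three distinct dark addresses are all assumptions for illustration; this is not the detection logic of any published greynet implementation.

```python
import ipaddress
from collections import defaultdict

# Hypothetical enterprise subnet and the addresses assigned to real hosts.
subnet = ipaddress.ip_network("192.0.2.0/28")
active = {ipaddress.ip_address(a) for a in ("192.0.2.1", "192.0.2.2", "192.0.2.5", "192.0.2.9")}
# Every other usable address is 'dark': nothing legitimate should talk to it.
dark = {addr for addr in subnet.hosts() if addr not in active}


def suspects(packets, min_dark_hits=3):
    """Return sources that contacted at least min_dark_hits distinct dark addresses."""
    hits = defaultdict(set)
    for src, dst in packets:
        dst_addr = ipaddress.ip_address(dst)
        if dst_addr in dark:
            hits[src].add(dst_addr)
    return {src for src, seen in hits.items() if len(seen) >= min_dark_hits}


observed = [
    ("198.51.100.7", "192.0.2.1"),    # normal traffic to a live host
    ("203.0.113.50", "192.0.2.3"),    # sequential scan sweeping the subnet
    ("203.0.113.50", "192.0.2.4"),
    ("203.0.113.50", "192.0.2.5"),
    ("203.0.113.50", "192.0.2.6"),
    ("198.51.100.7", "192.0.2.10"),   # a single stray packet (possibly misconfiguration)
]
flagged = suspects(observed)
```

Requiring several distinct dark addresses is one simple way to separate a sweeping scan from an isolated stray packet, at the cost of missing very slow or sparse scans.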

2.3 The diversity of network control & service discovery methods

Upon discovering anomalous events, a network administrator will usually respond with some form of intervention. This may involve non-trivial reconfiguration of multiple components within their network. Furthermore, these components may be heterogeneous in terms of function (router, switch etc.), manufacturer (Cisco, Juniper etc.) and/or control method (HTTP, SSH etc.). In this section we briefly review some common methods and systems for network control, covering both specific protocols used for signalling and particular user interfaces for control.

2.3.1 Example network control methods

Text based interface

A Command Line Interface (CLI) provides low bandwidth, interactive, two-way text communication. This can occur over a dedicated management connection (using a direct physical attachment, such as an RS232 serial connection) or via a network protocol (such as Telnet or the Secure SHell application, SSH). Telnet and SSH may be used across the network being managed, or across a separate and dedicated management network to improve availability and security. The development of automated tools for controlling devices via CLI is complicated by the fact that commands may differ between devices, even when from the same vendor.

HTTP (Hyper Text Transfer Protocol)

A common method of controlling network devices is using HTTP [57]. Web pages detailing the setup of the device are transferred from the device to a web browser upon request. Web forms within these pages can trigger an HTTP GET or POST request to change underlying settings on the device. While convenient from a user perspective, this method does not provide a consistent method of control. Device manufacturers each define their own key/value pairs to be set with HTTP GET or POST requests. Developing automation therefore requires a degree of reverse engineering.

SNMP - Device Control (and emerging alternatives)

In addition to retrieving network data (Section 2.2.2), SNMP also provides a standardised method for remote setting of variables (resetting counters to a certain value, controlling specific operational parameters of a device, and so on). As with retrieving values, an SNMP message is sent to a device containing a write-enabled OID and a value to which the OID should be set.

In 2002 the Internet Architecture Board (IAB) outlined a number of issues with SNMP and other network control methods in RFC3535 [58]. These included: little deployment of writable MIB modules, lack of standardisation of deployed MIBs, and a disconnect between the task-oriented view of network administrators and the data-centric view given by SNMP. To address these (and other) issues, a number of RFCs have been released defining new protocols for network control. These include the XML-based protocol Netconf [59], and CAPWAP [60] for management of WLAN (Wireless Local Area Network) devices. A more detailed discussion of management technologies can be found in [61].

Service discovery in consumer and isolated networks

In our previous examples of network control protocols, the question of what is being controlled is either implicit or predefined by the network administrator. However, the owners of isolated or small consumer-run edge networks often lack the kind of network support available in large managed networks. In these situations, it is desirable to have configuration of network elements like addressing, naming and service discovery/control occur without requiring human intervention.

In self-contained home networks, IP addresses are generally allocated by a DHCP (Dynamic Host Configuration Protocol) service. Translation between names and IP addresses is performed by the globally scoped DNS (Domain Name System) service. The ability to discover services on a network, without a centralised directory, may or may not be present. In a network where these services are limited or non-existent, the following systems attempt to provide them.

NetBIOS [62] can provide a simple dynamic naming service. UPnP (Universal Plug and Play) [63] (published as ISO/IEC 29341) can be used for service discovery, the announcement of events and control of devices. Two common uses of UPnP are to signal port traversal requests to a NAPT (Network Address Port Translation) service and to signal QoS (Quality of Service) requests. Zeroconf [64] (also known by Apple’s implementation name, Bonjour) comprises components to allow for decentralised addressing, naming and service discovery.

In summary, the individual devices that make up a network can each implement one (or more) of these different control methods, and implementing a change on a network may require one or more devices to be reconfigured. To allow simplified control of one or more devices, some researchers have added network control features to their visualisations.

2.4 Network data interpretation

Network administrators must interpret their collected monitoring data in the broader context of their network. Where context is clear and decisions are relatively simple, interpretation may be achieved with fully automated tools. However, more sophisticated interpretation usually requires a human ‘in the loop’, using anything from a traditional text based interface, all the way to a fully immersive 3D visualisation system.

Moving to fully automated systems¹ to detect network issues brings challenges and limitations. Signature-based systems are limited by the availability of signatures describing known activities of interest and must be constantly updated. Anomaly-based approaches must be appropriately tuned. If not configured for enough sensitivity, real events requiring attention will not be detected. If configured for more sensitive detection, alerts may be ignored by administrators if too many false positives (events incorrectly flagged as problems) are generated by the system [16]. False positives are even more problematic if they trigger an automatic ‘fix’ of the supposed problem.

We can identify the following broad classes of techniques for visual presentation of data for human-mediated interpretation: text based; two dimensional (2D) static and interactive; and three dimensional (3D) static, interactive and immersive². Section 2.7 will review literature ordered in these categories.

¹ Examples include the open-source programs Snort [65] and Bro [66].
² Research into network monitoring using other senses (sometimes combined with visualisation), such as auditory or even haptic ‘touch’ interaction exists, for example [67, 68], but is beyond the scope of this thesis.
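The tuning trade-off for anomaly-based detection described in this section can be shown with a toy example. The connection counts, the attack labels and the two thresholds are invented for illustration and do not come from any measured data set.

```python
# Hypothetical per-minute connection counts, labelled by whether an attack was in progress.
samples = [
    (12, False), (15, False), (18, False), (22, False), (35, False),
    (30, True), (48, True), (60, True),
]


def evaluate(threshold):
    """Count false positives and missed events when alerting on count >= threshold."""
    false_pos = sum(1 for count, attack in samples if count >= threshold and not attack)
    missed = sum(1 for count, attack in samples if count < threshold and attack)
    return false_pos, missed


sensitive = evaluate(20)     # low threshold: nothing missed, but more false alarms
insensitive = evaluate(50)   # high threshold: no false alarms, but real events missed
```

Because the normal and attack counts overlap in this invented data, no single threshold removes both kinds of error, which is one reason a human is often kept in the loop.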

2.4.1 Text based

Textual presentation and interaction provide simplicity of implementation, detail and relatively low consumption of bandwidth. A text based visualisation has limited display possibilities and makes no attempt to aid human interpretation of data or attempt higher level abstractions. Textual information is returned when performing common tasks like reading recorded log files, or when inspecting live log file data (such as with the UNIX command ‘tail -f’). If the data rate of text output is low enough, a human observer may be able to continue correct interpretation as data scrolls by, but as the data rate increases, the output will quickly become overwhelming. Consequently, text based data-mining involves some form of iterative process where data is inspected and then further filtered and reduced until the data set displayed is specific enough for the administrator to draw conclusions.
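The iterative inspect-and-filter process described above resembles chaining filters in a shell pipeline. A minimal sketch follows; the log lines and the helper name `grep` are assumptions for illustration.

```python
def grep(lines, needle):
    """Lazily keep only lines containing needle, like one stage of a shell pipeline."""
    return (line for line in lines if needle in line)


# Hypothetical log lines; in practice these might be read from a file as it grows.
log = [
    "10:00:01 sshd accepted login for alice from 10.0.0.4",
    "10:00:02 sshd failed login for root from 203.0.113.50",
    "10:00:03 httpd GET /index.html 200",
    "10:00:04 sshd failed login for admin from 203.0.113.50",
    "10:00:05 sshd failed login for bob from 10.0.0.7",
]

step1 = list(grep(log, "sshd"))               # first pass: one service
step2 = list(grep(step1, "failed"))           # second pass: only failures
step3 = list(grep(step2, "203.0.113.50"))     # third pass: a single source
```

Each pass narrows the data set until what remains is small enough for the administrator to read and draw conclusions from.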

2.4.2 2D – static and interactive

Static 2D visualisations do not allow the user to re-define or manipulate their data presentation in real-time. This includes visualisations that update only periodically or visualisations with animated elements that are not under the observer’s control. An example would be software that periodically updates a graph of some form, but does not allow for changes in what the graph is presenting from within the presentation application. Interactive 2D visualisations allow the network administrator to easily change their view of a data set from within the visualisation. In other words, there is some method of manipulating the presentation of data ‘on the fly’ that does not require interruption of the presentation in order to reconfigure the visualisation system.

2.4.3 3D – static and interactive

3D visualisation involves presentation of 3D objects in 3D space (whether mapped on a 2D screen, or rendered using special optical techniques to induce a realistic 3D perspective). Multiple network metrics can be concurrently presented in a compact form by using visually distinct attributes of a 3D object (such as colour, size, shape and movement) to represent each metric. These attributes are described as being ‘visually orthogonal’ [69] to each other.

As with static 2D, a static 3D visualisation would present information without allowing dynamic redefinition or manipulation of the presentation in real-time. We do not cover any examples of static 3D visualisations in this chapter. With the introduction of 3D presentation tools, developers have generally moved to implementing interactive 3D systems. As with 2D interactive, a key characteristic is the ability to change one’s view of a data set from within the visualisation, such as manipulating the angle of view of 3D objects representing data, or changing the mappings of network metrics to on-screen object attributes to gain further insight into a data set.

2.4.4 3D – immersive

Immersive 3D interfaces include all of the properties of the interactive 3D visualisations, but the user also has a concept of ‘self’ within the 3D environment. The user observes themselves (and possibly other users) as existing within a virtual world populated by 3D objects representing data in various ways. Users can move their view around the world to inspect objects and gain different perspectives on the data presented, either alone or collaborating with other users. Most importantly, an immersive 3D visualisation system may allow users to interact with objects inside the virtual world to control the real-world network that is being represented.

2.5 Collaboration

Network administration is often a collaborative endeavour [19]. Visualisation systems give the opportunity to better facilitate this. A sophisticated management system would allow multiple administrators, at physically diverse locations and with different presentation devices (such as desktops, tablets or smartphones), to experience their own visualisation of the same network events. (In other words, one instance of Figure 2.3’s ‘Presentation play-out’ for each administrator.) Ideally a collaborative system would allow separate control of each administrator’s perspective and view of the network data, and flexible means for prioritising (or linking) each administrator’s attempts to control the underlying network.

Examples of distributed, collaborative systems range from text based MUDs (Multiple User Dungeon) and MOOs (MUD Object Oriented) [70] to visually collaborative workspaces and interactive systems [71] and immersion within a virtual world [72, 73].

2.6 Taxonomy

From the literature we have identified the following eight characteristics as important points of difference for network monitoring and control visualisations:

• User control over presented information

• Real-time dynamic update of data into a system

• Historical data access

• Interaction with running network configuration

• Visual presentations and metaphors used

• Concurrent presentation of network variables

• Collaboration

• Scalability

We outline each below, since the illustrative example systems in Section 2.7 are described in terms of these characteristics.

2.6.1 User control over presented information

It is rare that viewing a data set from a single, static perspective will be sufficient. Visualisation schemes vary in the degree to which they allow the user to configure (and alter) data filters and/or alter the user’s perspective on the data being displayed. For example, some systems support the concept of ‘drill down’, where data is first presented in an aggregate view, yet the user may then choose to select a more finely-grained presentation of some subset of the data [74]. In some systems this may be associated with a visual effect akin to ‘zooming in’ on the more detailed view.
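A minimal sketch of ‘drill down’, assuming the data set is a table of per-host byte counts, is given below. The addresses, counts and function names are hypothetical and are not drawn from [74]; the aggregate view groups hosts by /24 network, and the detailed view lists the hosts inside one network chosen from that aggregate.

```python
import ipaddress
from collections import defaultdict

# Hypothetical per-host byte counts.
host_bytes = {
    "136.186.229.1": 1200,
    "136.186.229.7": 300,
    "136.186.230.5": 50,
    "10.0.0.9": 4000,
}

def aggregate(counts, prefix_len):
    """Aggregate view: sum byte counts per enclosing network."""
    totals = defaultdict(int)
    for host, nbytes in counts.items():
        net = ipaddress.ip_network(f"{host}/{prefix_len}", strict=False)
        totals[str(net)] += nbytes
    return dict(totals)

def drill_down(counts, network):
    """Detailed view: per-host counts for one network picked by the user."""
    net = ipaddress.ip_network(network)
    return {h: b for h, b in counts.items() if ipaddress.ip_address(h) in net}

overview = aggregate(host_bytes, 24)
detail = drill_down(host_bytes, "136.186.229.0/24")
print(overview)
print(detail)
```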

2.6.2 Real-time dynamic update of data into a system

As noted in section 2.1.2, another differentiator is each visualisation scheme’s intrinsic delay between a network event being observed (by a metering process) and the presentation of that event to the user. A closely related characteristic is the frequency with which the visualisation system updates its presentation of network state. Much of the literature works with recorded data, and is incapable of real-time operation.

2.6.3 Historical data access

Section 2.1.2 also observed that visualisation schemes differ in their ability to provide access to historical data. It is often impossible to know in advance which historical network traffic data will be required for present or future forensic investigation [75, 76]. Consequently, it is notable if a visualisation scheme provides mechanisms for the user to investigate (‘play back’) previously-recorded network events.

2.6.4 Interaction with running network configuration

It is useful for administrators to re-configure their running network from within their real-time visualisation, so the specific complexities of the network control systems outlined in Section 2.3 can be avoided, and the consequences and efficacy of changes can be observed promptly [77]. Visualisation schemes may be characterised by the degree to which they provide methods for reconfiguration of network devices and system settings from within a visualisation.

2.6.5 Visual presentations and metaphors used

A visualisation scheme’s choice of presentation is fundamental to the way a user interprets underlying data. Today’s graphics technology allows for a huge range of visual expressions, from plain text to complicated 3D representations. We identify if and why animation occurs within a visualisation system. Objects representing network state may move because the underlying network state fluctuates, or because the visualisation system utilises movement (such as bouncing or spinning at a certain rate) as a unique way to present network state.

2.6.6 Concurrent presentation of network variables

We identify the manner in which a visualisation scheme presents multiple network state variables at the same time, and whether the result is visually cluttered or clean. For example, a single object can simultaneously represent multiple state variables. The colour of a cube projected into a 3D space might represent one state variable, while its size represents another state variable, and the cube’s motion (bouncing or spinning) a third state variable. All three attributes may be adjusted independently to track each network state variable, allowing a user to observe them concurrently in one glance.
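The cube example can be made concrete with a short sketch. The choice of metrics (link utilisation, active flow count, packet loss), the scaling constants and all names below are assumptions made for illustration; the point is only that each metric drives exactly one visual attribute, so the three can vary independently.

```python
from dataclasses import dataclass

@dataclass
class CubeAppearance:
    hue_degrees: float   # colour: 120 (green) down to 0 (red)
    edge_length: float   # size
    spin_rpm: float      # motion

def clamp01(x):
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, x))

def render_state(link_utilisation, active_flows, packet_loss,
                 max_flows=1000, max_loss=0.05):
    """Map three hypothetical state variables onto three independent attributes."""
    u = clamp01(link_utilisation)
    f = clamp01(active_flows / max_flows)
    loss = clamp01(packet_loss / max_loss)
    return CubeAppearance(
        hue_degrees=120.0 * (1.0 - u),  # utilisation controls colour only
        edge_length=1.0 + 2.0 * f,      # flow count controls size only
        spin_rpm=60.0 * loss,           # packet loss controls spin only
    )

quiet = render_state(0.5, 500, 0.0)
lossy = render_state(0.5, 500, 0.05)
print(quiet)
print(lossy)
```

In this sketch only the spin rate differs between `quiet` and `lossy`, since only the packet loss input changed.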

2.6.7 Collaboration

We report on the support for, and level of, collaboration. For example, we note whether multiple geographically separate users can be presented with independent views of the same network events, to allow the shared interpretation of anomalies; whether individual users are themselves represented by objects or avatars within the visualisation; and whether the visualisation system allows multiple users to concur on a course of action before changes are applied to a running network configuration. Requiring such agreement improves the system’s robustness against an individual’s error.

Collaborators may utilise a heterogeneous mix of user interface devices interlinked using a range of different link technologies. A user may gain access from a wired desktop device, but could also use a portable device utilising a wide-area wireless technology. Supporting this type of distributed presentation and control for multiple administrators requires network capacity, and this can limit the diversity of places from where a network can be remotely monitored and controlled.
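The idea of several users concurring before a change reaches the running network can be sketched as a simple approval count. This is a hypothetical illustration, not a mechanism described in the reviewed literature; the class name, the threshold of two approvals and the placeholder `apply` method are all assumptions.

```python
class PendingChange:
    """A proposed configuration change that needs several approvals."""

    def __init__(self, description, required_approvals=2):
        self.description = description
        self.required_approvals = required_approvals
        self.approvers = set()
        self.applied = False

    def approve(self, administrator):
        """Record one administrator's agreement; apply once enough concur."""
        self.approvers.add(administrator)  # a set ignores repeated votes
        if not self.applied and len(self.approvers) >= self.required_approvals:
            self.apply()
        return self.applied

    def apply(self):
        # Placeholder: a real system would push the change to the network here.
        self.applied = True

change = PendingChange("block traffic from 10.0.0.7")
first = change.approve("alice")
repeat = change.approve("alice")   # the same user voting twice does not count
second = change.approve("bob")
print(first, repeat, second)
```

A single administrator cannot trigger the change alone, which is the robustness property noted above.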

2.6.8 Scalability

Finally, the practicality of a visualisation scheme depends greatly on its resource requirements. If data collection, storage, conversion and rendering are computationally intensive (or generate too much network traffic of their own), a visualisation scheme may be impractical to deploy at any useful scale. Unfortunately this aspect of a given scheme is often not clear from the literature.

2.7 Network visualisation evolution – a review of notable examples

This section reviews an evolution of ideas and techniques for visually presenting network state for monitoring and control. Rather than exhaustively list every possible permutation, we focus on a small representative selection of examples that illustrate the range of possibilities discussed in sections 2.1 (The role of visualisation) and 2.4 (Network data interpretation). Our goal is to place practical tools and innovative prototypes into a historical context, characterise them in terms of the taxonomy outlined in section 2.6, and identify areas where the literature is deficient and where there is potential for further work. We focus on research prototypes and openly-available tools, beginning with text based systems and moving toward interactive, immersive and collaborative 3D monitoring and control systems.

Many of the reviewed visualisation systems have been implemented as prototypes, but only a few implementers have actually performed human usability trials (despite such trials being a core test of any particular approach). Where usability trials have been performed, there is diversity in the evaluation methodologies. Consequently, we are generally only able to comment on each work’s implicit claims of usability.

Figure 2.6: tcpdump displaying a fragment of the packets from a TCP exchange

2.7.1 Textual visualisations

The following text based tools have no collaborative aspects or (except for one) methods of influencing a running network configuration. All allow for real-time traffic data input into their systems (using the pcap capture library).

tcpdump

The default visual presentation of tcpdump [31] is a single line of structured text decoding each network frame encountered on a single network interface. The display can range from a single line summary, through to full multi-line hexadecimal and/or ASCII representations of a frame. Figure 2.6 illustrates tcpdump displaying a section of a TCP session with default verbosity.

A user limits what frame information is displayed by running tcpdump with different filter rules. These pcap filters are human-readable expressions, for example:

    host 136.186.229.1 or port 53

selects packets sourced from, or destined to, IP address 136.186.229.1 or packets with a source or destination port number of 53. The line:

    ether src 00:1b:63:1d:62:68 and !icmp

filters for all packets sourced from the interface with the MAC address 00:1b:63:1d:62:68, but not ICMP packets.

Figure 2.7: trafshow displaying flows captured from a network interface

tcpdump can perform three general functions: recording frames to storage from a network interface, displaying frames read from a file, or displaying frames captured from a network interface. When displaying frames, tcpdump does not have an internal method for accessing historical data that has moved off-screen (although this functionality can be implemented with an external text scroll-back buffer).

A limitation of tcpdump is that the visualisation becomes overwhelming when data is displayed faster, and in more detail, than a person can correctly interpret. In addition, the viewer requires arcane knowledge of networking protocols to make sense of the data provided.

trafshow

Displaying IP flow information in a similar manner to the Unix program ‘top’ (which displays process information), trafshow’s [78] output consists of structured text. A list of flows periodically refreshes (with a user defined period), sorted with higher transfer rate flows shown at the top of the screen. trafshow’s input can be taken from a network interface using the pcap library, or from NetFlow formatted data sent to it by an external metering process.

Figure 2.7 shows trafshow displaying flow information calculated from packets captured in real-time from a network interface. It displays varying types of flows in the same list, depending on what level of detail it can discern from collected packets. For example, in Figure 2.7 there are both IP 5-tuple flows and Ethernet flows in the same list. The variables that are concurrently presented by trafshow are: source and destination address (and ports where appropriate), protocol and total bytes or packets transferred per second.

Figure 2.8: A running instance of Wireshark

At startup or during runtime, the flows collected can be filtered (using the pcap filter syntax). While running trafshow, the user can select individual flows for a hexadecimal or ASCII encoded output of all bytes captured. The scalability limitations of trafshow are not clearly outlined. trafshow has no capability to view medium or long term historical data; only information on active flows matching the active filter expression is stored.

While trafshow distils packets into flow-level data and shows the highest bandwidth flows prominently, like tcpdump it still requires low-level knowledge and understanding of the traffic traversing a particular network interface to be useful.
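The reduction from packets to a rate-sorted flow list can be sketched as follows. This is not trafshow’s implementation; the packet tuples, the two second refresh interval and the function name are assumptions chosen to illustrate grouping by IP 5-tuple and sorting by transfer rate.

```python
from collections import defaultdict

# Hypothetical captured packets: (src, sport, dst, dport, proto, length in bytes)
packets = [
    ("10.0.0.1", 40000, "136.186.229.1", 80, "tcp", 1500),
    ("10.0.0.2", 53000, "136.186.229.1", 53, "udp", 80),
    ("10.0.0.1", 40000, "136.186.229.1", 80, "tcp", 1500),
    ("10.0.0.3", 41000, "136.186.229.9", 22, "tcp", 200),
]

def top_flows(pkts, interval_seconds):
    """Group packets into 5-tuple flows and sort by bytes per second, highest first."""
    totals = defaultdict(int)
    for src, sport, dst, dport, proto, length in pkts:
        totals[(src, sport, dst, dport, proto)] += length
    rates = [(key, nbytes / interval_seconds) for key, nbytes in totals.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)

flows = top_flows(packets, interval_seconds=2)
for flow, rate in flows:
    print(flow, f"{rate:.0f} bytes/s")
```

The per-packet detail is discarded once the totals are computed, which is why a flow view of this kind is more compact than tcpdump’s output but carries less information per packet.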

Wireshark

Wireshark (formerly known as Ethereal) is a 2D GUI that presents captured network frames as structured text. Figure 2.8 shows how one window section displays colour-coded packets in a (user configurable) summary form. A second window section displays details of a selected packet in a collapsible/expandable hierarchical view. The bottom window section shows the packet’s raw bytes, represented in hexadecimal and ASCII format. The program has user configurable output and verbosity options for each window section.

Dynamic screen updates and packet inspection can occur as packets arrive at a capture interface, or the buffer can be loaded from a recorded file. For historical data access, the entire buffer is available to the user via a scroll-bar. Filters (similar in syntax to pcap filters) can be applied to frames contained in the buffer, and various analysis functions can be called to produce both statistics on flow information and simple 2D graphs.

With reference to scalability, Wireshark’s main limiting factor is that its packet buffer is stored in memory, and once physical memory is exhausted, performance drops dramatically. This imposes limits on data analysis: the data must either fit into physical memory or be pre-filtered to limit traffic to that of interest.

Cisco IOS

The text based control interface to Cisco IOS can not only be used to retrieve real-time information from a running device, but is also an example of a text based interface that enables changes to a device’s configuration settings. For example, the command:

    ip default-network 136.186.229.1

would set the default gateway (sometimes called the route of last resort) of a device to the IP address 136.186.229.1. The text based interface has no historic data access mechanisms and the user cannot control the format of presented information. Multiple simultaneous logins to a device are allowed, but the interface contains no specific methods for collaboration.

None of these text based tools has collaborative aspects, and all present a great level of detail, which can overwhelm even experienced users. In their favour, they work with real-time data and are open source (except Cisco IOS).

2.7.2 Static 2D visualisations

We introduce the following tools in two categories: visualisations of reachability data and visualisations of traffic statistics. None of the tools can interact with the underlying network configuration, nor do they have any mechanisms enabling collaboration.

Reachability visualisations

Figure 2.9 shows IPv4 reachability data from 1999, with nodes representing routers and connecting lines representing logical links [1]. Data was collected using a network host within Bell Labs (running the program traceroute against Internet hosts) and is arranged using a minimum distance spanning tree algorithm.

Figure removed due to potential copyright issues

Figure 2.9: 1999 IPv4 Internet reachability by Cheswick et al. [1]

Figure 2.10 shows a polar-like plot of IPv4 reachability data from 2009 by CAIDA (The Cooperative Association for Internet Data Analysis). It structures AS (Autonomous System) data by placing nodes into a polar plot [2]. Each AS node’s angle is plotted based on an estimate of its physical longitude, while radius, node colour and link colour are determined by the associated AS’s connectivity (more connected ASes have a smaller radius value and a redder colour). Data was collected using a distributed probe network run by CAIDA.

Figure removed due to potential copyright issues

Figure 2.10: 2009 IPv4 AS reachability from CAIDA [2]

These reachability visualisations are one-off creations and thus have no form of user control, dynamic updates, historical data access, collaboration or interaction with network configuration.
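The polar placement used in Figure 2.10 can be illustrated with a short sketch. The source states only that angle follows longitude and that better connected ASes sit closer to the centre; the logarithmic radius formula below is an assumption made for this example and should not be read as the formula used in [2].

```python
import math

def polar_position(longitude_degrees, degree, max_degree):
    """Place one AS on a unit disc: angle from longitude, radius from connectivity.

    `degree` is a hypothetical connectivity measure (number of neighbouring ASes).
    """
    if max_degree < 1 or not 0 <= degree <= max_degree:
        raise ValueError("expected 0 <= degree <= max_degree and max_degree >= 1")
    angle = math.radians(longitude_degrees)
    # Assumed scaling: the most connected AS maps to radius 0, an isolated AS to radius 1.
    radius = 1.0 - math.log(degree + 1) / math.log(max_degree + 1)
    return (radius * math.cos(angle), radius * math.sin(angle))

edge = polar_position(0, 0, 100)      # unconnected AS at longitude 0
core = polar_position(90, 100, 100)   # most connected AS
print(edge, core)
```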

Figure 2.11: An MRTG graph for a link with outbound data pink, inbound blue

Traffic characteristics visualisations

Multi Router Traffic Grapher (MRTG) [79] and its ‘sibling’ Round Robin Database (RRDtool) [80] graph time series data. The visual presentation of the tools is a traditional graph (two variables versus time, as shown in Figure 2.11), usually presented as an element of a web page for convenient retrieval and viewing. They are primarily used for visualisation of traffic load. An SNMP source is common³, although the tools can retrieve input from virtually any source via helper processes (such as the display of flow data when combined with a tool like NfSen [81]).

Graphs show a period of historical time (that can only be modified by the user altering configuration files). MRTG and RRDtool have been extended to work with 320,000 graphs and over half a million data points in a five minute period [83].

Figure 2.12 shows heatmap graphs for IP flow classification [3], where packet sizes are plotted in such a way that the visual ‘fingerprint’ of each protocol is apparent. The premise is that client-server protocol interactions are distinguishable by the size, timing and direction of packets produced (even when application layer data is encrypted). For example, SMTP, a protocol that mostly consists of data transmission from client to server, has most plot points in the upper right quadrant, while HTTP, a protocol where there is generally more server to client data transfer, has most plot points in the lower left quadrant. The authors of [3] suggest using the technique as an anomaly detection tool, for example, to verify that all traffic to a web server exhibits the characteristic behaviour of HTTP. The work does not state whether the visualisation can be created in real-time, how well it scales, or its technical specifics. This work is an example of a mapping of flow data to a visual pattern that directly leverages a human’s pattern recognition capabilities to determine what protocol is used across a flow.

³ Tools also exist that display flow statistics, but present the data in a geographical context, such as the Google Maps API based SURFmap [82].

Figure removed due to potential copyright issues

Figure 2.12: Clockwise from top left, heat maps displaying ‘fingerprints’ of HTTP, SMTP, SSH and AIM (instant messaging) packet data [3].

Figure removed due to potential copyright issues

Figure 2.13: etherApe - hosts discovered on a network are joined by animated lines, where width represents a moving window average of bandwidth usage.

The open-source etherApe [84], shown in Figure 2.13, visualises flow information by presenting circles (representing nodes on a network) on the circumference of a larger circle. Lines connect communicating nodes, while line thickness and node size represent bandwidth usage. etherApe retrieves real-time data from a network interface, has no methods of displaying historical data, and its scalability is unclear. etherApe gives an overall view of the flows, but only through a single network interface. Compared to trafshow, the visualisation makes it easier to see the relationships of flows between machines, but at the cost of detailed information about the flows.
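The caption of Figure 2.13 notes that line width represents a moving window average of bandwidth usage. A minimal sketch of that idea follows; it is not etherApe’s code, and the window length, the linear width scaling and all names are assumptions for illustration.

```python
from collections import deque

class WindowedRate:
    """Average bytes per second over the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.samples = deque()   # (timestamp, nbytes), oldest first
        self.total = 0

    def add(self, timestamp, nbytes):
        self.samples.append((timestamp, nbytes))
        self.total += nbytes
        self._expire(timestamp)

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window:
            _, old = self.samples.popleft()
            self.total -= old

    def rate(self, now):
        self._expire(now)
        return self.total / self.window

def line_width(rate, max_rate, max_width=10.0):
    """Scale a rate linearly to a drawing width, capped at max_width."""
    return max_width * min(1.0, rate / max_rate)

link = WindowedRate(window_seconds=10)
link.add(0, 1000)
link.add(5, 1000)
early = link.rate(5)    # both samples are inside the window
later = link.rate(12)   # the sample at t=0 has expired
print(early, later, line_width(later, max_rate=200.0))
```

Averaging over a window smooths short bursts, so the drawn line thickens and thins gradually rather than flickering with each packet.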

Figure removed due to potential copyright issues

Figure 2.14: glTail - Circles represent web-server requests ejected into an area with gravity and a choke-point [4].

PingTV (2001)

Created for broadcasting networking information to non-expert users, PingTV [85, 86] is a visualisation notable for disseminating real-time network state information to a university community in an intuitive manner. The visualisation is distributed across a campus via a cable TV channel and consists of network ‘health’ icons (coloured red, green or yellow) overlaid on a geographical map image. Part of the motivation behind the system is to reduce the work load on help desk personnel that inevitably results from network outages, without using a website to convey the information (which would be down in a network outage). End users have no control over the system and the system does not allow access to historical data. There are no specific elements of collaboration and the system does not interact with running network configuration.

glTail [4], shown in Figure 2.14, and glTrail [87] use the simulation of physics as a mechanism for creating animations that display system loads. glTail is a real-time web server visualisation that represents web site requests as circles with mass. They are ‘ejected’ from lists of servers into an area with downward gravity and a choke point at the base, through which the circles slowly drain. glTrail is an animated link-node graph that shows referrals between web pages. Larger circles represent more popular pages and referrals add an attracting force, automatically grouping associated pages. The tools have no historical data access capabilities and their scalability is not discussed. Both tools are written using Ruby and OpenGL, and both are notable for abstracting a network metric to a simple object and then using simulated physics as a key element of the visualisation.

Figure removed due to potential copyright issues

Figure 2.15: Rumint - ‘Binary rainfall’ visualisation, each line is one packet, one pixel maps to one packet bit (TCP packets in green, UDP in orange, ICMP purple) [5]

2.7.3 Interactive 2D visualisations

IDS RainStorm and Rumint (2005)

Described as complementary tools, IDS RainStorm is designed for the detection of network intrusions while Rumint is for in-depth analysis of network events [5, 88, 89]. IDS RainStorm is a graph of an enterprise’s IP address space, with hosts represented on the y-axis and time (the last 24 hours) on the x-axis. The data points, coloured red, yellow and green, represent the concern level of IDS output. When drilling down for more detail, local IPs are shown on one side of the screen and external IPs on the other, while lines represent connections. Rolling the mouse over elements reveals data details, and filters can be applied in real-time to the displayed data set.

The Rumint tool contains a number of visualisation elements including a parallel coordinate plot display, a glyph-based animation display and a binary rainfall visualisation. Figure 2.15 shows the binary rainfall visualisation. Each individual bit of a captured packet is represented as a single pixel in a horizontal row, 0s as black and 1s as a colour (TCP packets in green, UDP in orange, ICMP in purple). It has a video-recorder-like interface for the manipulation of time series data and can take pcap formatted data from file, or in real-time from a network interface.

Both tools are reported as capable of handling tens of thousands of packets and alarms, but neither tool is collaborative or interacts with a running network configuration.
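The binary rainfall mapping (one pixel per packet bit) is simple enough to sketch directly. The code below is an illustration and not Rumint’s implementation; in particular, drawing the most significant bit of each byte first is an assumption, and pixels are represented by colour names rather than drawn to a screen.

```python
COLOURS = {"tcp": "green", "udp": "orange", "icmp": "purple"}

def rainfall_row(packet_bytes, protocol):
    """Return one row of pixels for a packet: black for 0 bits, protocol colour for 1 bits."""
    colour = COLOURS[protocol]
    row = []
    for byte in packet_bytes:
        for shift in range(7, -1, -1):   # assumed bit order: most significant first
            bit = (byte >> shift) & 1
            row.append(colour if bit else "black")
    return row

# 0xA0 is 10100000 in binary.
row = rainfall_row(b"\xa0", "tcp")
print(row)
```

Stacking one such row per packet, newest at the bottom, would give the scrolling ‘rainfall’ effect described above.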

NVisionIP (2006)

Taking recorded flow level data as input, NVisionIP [6, 90, 91, 92] displays data at three levels: a ‘galaxy view’, an area showing all IP address space seen, with coloured dots representing user configurable metrics; a ‘small multiple view’ showing histograms of subnet level activity; and a ‘machine view’. A user can move down these levels by selecting data points of interest (shown in Figure 2.16). NVisionIP allows a user to discover a pattern of activity in the visualisation, present this pattern as a symbolic rule and use this rule to search for further patterns, without leaving the tool. NVisionIP has no ability for collaboration or network control.

Figure removed due to potential copyright issues

Figure 2.16: NVisionIP - Showing the three levels of drill down from overview to port level [6]

VISUAL (Visual Information Security Utility for Administration Live) (2004)

VISUAL [7] consists of a network’s ‘home’ IP addresses represented in a grid, with all external IP addresses as coloured squares outside of this grid. This creates an ‘us versus them’ concept where users detect patterns in communication between their internal network hosts and external hosts. Figure 2.17 shows how lines represent connections between hosts, while their colours delineate connection attempts that did or did not receive responses. While using the visualisation, selections can be made to display detailed information about a host’s connections and hide all other visualisation elements. A timeline allows the user to display different time periods from a data set, read from a pcap formatted file. VISUAL is reported as being tested with 2,500 internal hosts and 10,000 external hosts at a time. The tool has no methods for interacting with a running network’s configuration and does not allow for collaboration.

TNV (Time-based Network traffic Visualization) (2005)

The TNV visualisation has a hybrid of visualisation elements on screen at once [8, 93, 94, 95]. Figure 2.18 shows how the main visualisation window is made up of columns representing time and rows representing IP addresses. Network activity is shown with lines connecting hosts, while in each host cell, port activity is shown in a small bar graph-like format. The visualisation also contains window sections showing a legend, a list of packets for a selected host, details for a selected packet and a selected host’s port activity.

Figure removed due to potential copyright issues

Figure 2.17: VISUAL - a user’s internal network addresses represented as a grid, external network addresses as surrounding yellow squares [7]

Figure removed due to potential copyright issues

Figure 2.18: TNV - Multiple window sections displaying both traffic overview and individual packet data [8]

An aim of the tool is to provide a visual display that helps a user gain temporal context of network events. Data can be captured in real-time or read from pcap encoded files. TNV is presented in the literature with 50,000 packets represented in the visualisation. The tool does not have any specific methods for collaboration, or ability to interact with an underlying network.

All of the tools mentioned in this section allow for user control over what data is presented, but are still tightly coupled with their underlying data sets.

Figure removed due to potential copyright issues

Figure 2.19: SeeNet3D - displaying 1993 backbone traffic [9]

2.7.4 Interactive 3D visualisations

SeeNet3D (1995)

SeeNet3D [9, 96, 97] uses a link-node graph rendered over a spherical representation of the earth to display the geographic distribution of network metrics. Figure 2.19 shows packet count data from a two hour window during February 1993, from the NSFNET [98] backbone. Higher and redder arcs indicate a higher packet count. The user can manipulate the statistic displayed, arc colours, widths, heights and highlighting, as well as globe translucency. Nodes can be toggled to not display, allowing for more detailed investigation of individual links and nodes. The user can also rotate and zoom in, as well as dynamically display different time periods from the data set. The work reports success in visualising several ‘industrial-size’ network data sets. SeeNet3D has no features for interaction with a running network configuration and no collaborative features.

SeeNet3D reportedly ran on SGI workstations and PCs with Microsoft Windows 95/NT. On the PCs running at 150MHz with graphics acceleration (high-end machines for the time) the display took just under a second to render, which is not fast enough to support interactivity.

Figure removed due to potential copyright issues

Figure 2.20: The Spinning Cube of Potential Doom - Displaying TCP connection attempts into a network as coloured dots [10]

The Spinning Cube of Potential Doom (2004)

The Spinning Cube of Potential Doom (SCoPD) is a real-time animated 3D scatter plot of darknet traffic [10]. Figure 2.20 shows that the scatter plot consists of local IP address space represented on the x-axis, global IP address space on the z-axis and port numbers on the y-axis. Successful TCP connections are represented by white dots, while coloured dots are used for incomplete connections (such as might be seen during network scans). Scans create visually distinct patterns inside the cube (such as the ‘barber pole’).

SCoPD does not have any methods of historical data access, although the potential for this ability is identified by the work’s author. SCoPD does not have any specific methods of enabling collaboration (other than the ability for underlying data to be multicast to multiple SCoPD instances). SCoPD does not have any methods of interaction with a running network configuration and has no published data on scalability.

Although the code for the original SCoPD has not been released, a GPL reimplementation is available called The GPL Cube of Potential Doom [99] (2005), as well as an implementation written in Python, NetCube [100] (2007). Inspired by SCoPD, InetVis [101] (2006) extends SCoPD by adding the features of variable data replay rate, an adjustable time window, navigation of the visualisation and real-time adjustable data filters.

SCoPD is a visualisation created for non-expert interpretation and designed specifically to leverage a human’s visual ability to detect patterns.
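The axis assignment described for SCoPD can be sketched as a function from one connection attempt to a point in a unit cube. This is an illustration only, not code from [10]: the monitored network `LOCAL_NET` is hypothetical, the linear normalisation of each axis is an assumption, and a single arbitrary colour stands in for whatever colour scheme is applied to incomplete connections.

```python
import ipaddress

# Hypothetical monitored (local) network.
LOCAL_NET = ipaddress.ip_network("136.186.229.0/24")

def plot_point(local_ip, remote_ip, port, completed):
    """Map one TCP connection attempt to (x, y, z, colour) in a unit cube."""
    local = ipaddress.ip_address(local_ip)
    remote = ipaddress.ip_address(remote_ip)
    if local not in LOCAL_NET:
        raise ValueError("local_ip is outside the monitored network")
    # x: position within the local address space
    x = (int(local) - int(LOCAL_NET.network_address)) / (LOCAL_NET.num_addresses - 1)
    # y: port number
    y = port / 65535
    # z: position within the whole IPv4 address space
    z = int(remote) / (2**32 - 1)
    # White for completed connections; "red" is an arbitrary stand-in colour.
    colour = "white" if completed else "red"
    return (x, y, z, colour)

print(plot_point("136.186.229.10", "10.0.0.7", 22, completed=False))
```

Under this mapping, a scan that steps through both local addresses and ports traces a diagonal line of coloured points, which gives some intuition for why scans form visually distinct patterns.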

Figure removed due to potential copyright issues

Figure 2.21: Untitled (Malécot et al.) - User configurable cube faces represent address space or port numbers while a 2D representation provides detail [11]

Untitled - Malécot et al. (2006)

Figure 2.21 shows that the fundamental element of the untitled work by Malécot et al. [11, 102] is a grid. The grid can represent either a /0, /8, /16 or /24 group of IPv4 addresses or the ports of a host. The user can change between these representations, using drill down functionality. These 2D grids are then combined by the user into multiple cubes in the 3D visualisation, with lines representing connections (and user configurable line colours displaying the connection type). The 3D representation provides an overview of the data, while a simultaneous 2D visualisation provides a detailed view. No mention is made in [11, 102] as to whether the visualisation can present historical data or has any collaborative features. The tool does not have any abilities to interact with running network configuration and there is no evaluation of the scalability of the system. The tool can take input from recorded pcap format files or capture real-time data from a network interface.

VAST (Visualizing Autonomous System Topology) (2006)

A 3D BGP (Border Gateway Protocol) route topology visualisation, VAST [12], shows overall reachability and individual AS (Autonomous System) behaviour. AS numbers are placed as points in a 3D cube using an ‘Octo-tree Algorithm’. The first 3 bits of an AS number select one of the 8 equal sub-cubes into which the cube is divided. By repeating this process within the selected sub-cube, all the bits of an AS number can be used to map to a unique point inside the cube. Figure 2.22 shows how the size of an AS node representation is based on the number of peering sessions the AS has, while lines connect AS numbers to show topological links. The user can rotate, zoom and pan, while filters can be placed upon the data set on display. The tool is designed to allow detection of route hijacking incidents and leaks of advertisements for invalid IP address blocks.

Figure removed due to potential copyright issues

Figure 2.22: VAST - AS networks represented within a cube, for the diagnosis of BGP routing issues [12]

VAST works with pre-recorded route data (visualising data from a live BGP peering session is only reported as a possible future feature). The user cannot display different time periods or have any form of data ‘playback’ from within the visualisation. VAST does not have any collaborative features or the ability to interact with a running network configuration.
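The octree placement that VAST uses for AS numbers can be illustrated with a minimal sketch. This is not code from VAST: the function name as_to_point, the zero-padding of a 16-bit AS number to 18 bits (six levels of three bits) and the assignment of the three bits to the x, y and z halves are all assumptions made for illustration, as [12] is not quoted here on those details.

```python
from typing import Tuple


def as_to_point(asn: int, levels: int = 6) -> Tuple[float, float, float]:
    """Map an AS number to a point in the unit cube by repeated octant subdivision.

    Hypothetical helper: each group of 3 bits (most significant first) picks
    one of 8 sub-cubes; the result is the centre of the final sub-cube.
    """
    if not 0 <= asn < 1 << (3 * levels):
        raise ValueError("AS number does not fit in the chosen number of levels")
    x = y = z = 0.0          # minimum corner of the current sub-cube
    half = 0.5               # edge length of the sub-cubes at this level
    for level in range(levels):
        shift = 3 * (levels - 1 - level)
        octant = (asn >> shift) & 0b111
        # Assumed bit order: high bit selects the x half, then y, then z.
        if octant & 0b100:
            x += half
        if octant & 0b010:
            y += half
        if octant & 0b001:
            z += half
        half /= 2
    # 'half' is now half the edge of the final sub-cube, so adding it
    # moves from the corner to the centre of that sub-cube.
    return (x + half, y + half, z + half)
```

Because every AS number follows a different sequence of octants, no two AS numbers share a point, which is the property the text above relies on.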

2.7.5 Immersive 3D visualisations

The following systems are all user-controlled 3D environments, where network metrics and events are capable of being updated in real-time and are displayed through the use of visual metaphors.

Untitled - Crutcher and Lazar et al. (1993)

In the untitled work by Crutcher et al. [77, 13, 103, 104, 105], ATM (Asynchronous Transfer Mode) network nodes are represented as spheres. Links are represented by thin cylinders that vary in diameter in proportion to capacity and change colour based on how a path is being used. Figure 2.23 shows how these elements are then overlaid on a geographical map. Users can view traffic statistics, or move to another ‘virtual plane’ to observe signalling network state. The tool utilises drill down: if a node is selected, port detail is displayed; when a port is selected, virtual path information is displayed; and when a virtual path is selected, final details are displayed textually. In-world gestures generated using a hardware device called a ‘3D floating mouse’ allow the user to ‘grab links’, ‘move virtual paths’ and ‘commit links’, altering underlying system state from within the visualisation.

Figure removed due to potential copyright issues

Figure 2.23: Untitled (Crutcher and Lazar et al.) - ATM network physical link inspection [13]

The work’s authors identify, but do not implement, the potential of the tool to use varied visual metaphors and historical data access at various speeds. [77, 13, 103, 104, 105] make no reference to collaborative aspects, the exact number of variables that are concurrently presented, or the system’s scalability. The system was prototyped using an SGI Onyx Reality/Engine 2 with a StereoGraphics CrystalEyes stereo display while a Sun workstation generated emulated virtual circuit flow data. The software appears to have not been publicly released.

CyberNet (2000)

CyberNet [106, 14, 107, 108, 109] uses metaphors taken from the real world. Figure 2.24 shows two examples. At top, a building metaphor is used where networked devices are represented in-world (with their virtual locations based on their real-world positions). Below, a city metaphor is used for NFS (Network File System) data, where disks are represented as buildings, computers as districts and subnets as towns. CyberNet also includes more abstract metaphors like a ‘cone tree’ link-node graph and a solar system (where computers are stars, users are planets, processes are satellites and orbit radius, size and colour are used to represent CPU usage and virtual memory consumption). The user does not have access to historical data and there are no functions for interaction with underlying systems from in-world, or for any collaboration. It is not clear how the system scales and the software, which uses Virtual Reality Modelling Language (VRML), appears to have never been released.

Figure removed due to potential copyright issues

Figure 2.24: CyberNet - showing a building metaphor (top) and city metaphor [14] (bottom)

Figure removed due to potential copyright issues

Figure 2.25: PSDoom - Unix processes represented as monsters within the game Doom II [15]

PSDoom (2001)

PSDoom [110] is the hybrid of two programs, ps (for listing Unix processes) and the First Person Shooter (FPS) game Doom II. It represents processes in real-time as avatars in the Doom II world. Figure 2.25 shows how existing game artwork is used to display processes as ‘monsters’ in a Doom II game map. Drill down is implemented by moving closer to avatars, where process name and ID are overlaid on each avatar. PSDoom allows for in-world activity to be translated into system commands. The in-world metaphor of inflicting injury upon an avatar is translated into the Unix system command ‘renice’ (lowering a process’ scheduled priority). Completely killing an avatar causes the associated process to be stopped in the controlled system.

The work’s author recognised the potential of (but did not implement) visualising network data and utilising Doom II’s network capabilities to allow multiple users to collaborate (with different levels of real-world authority to be given through different strength in-world weapons). The work’s author suggested that visual orthogonality can be used to present process memory and CPU usage through an avatar’s attributes (specifically, the varying of width and height respectively). PSDoom does not have any means for historic data access and no scalability research appears in [110].

2.8 Visualisations using game engines

Game engines have been explored as an aid to creating interactive and collaborative work spaces other than in the context of network visualisation.4 As mentioned previously, text based multiplayer games like Multi-User Dungeons (MUDs) and MUD Object Oriented (MOOs) have been considered for this purpose [70] (1996).

3D game engines have also been suggested as a platform for visualisation, for example for 3D browsing of digital libraries [111] (2008) and for assistance in Geographic Information Systems (GIS) [112] (2004). The software Brutal [113] (2003) visualises a file system within a 3D FPS game.5 Files are represented as objects and in-world weapons allow the user to remove files. This is similar to fsn (File System Navigator), a demonstration tool for Silicon Graphics’ IRIX operating system that allowed users to navigate their file system in 3D. It has gained some notoriety as it was briefly shown being used in the 1993 film Jurassic Park.

The game engine Quake III Arena has been used as an aid for collaborative source code visualisation [114, 115] (2005) (with the initial ideas outlined in [116] in 2003). A collaborative visualisation of an astrophysics simulation has been created using the Second Life game engine [117] (2009); participants can interact with the visualisation, including starting, stopping and rewinding the simulation.

It is clear that a number of groups have leveraged game engines for a variety of visualisation purposes. The advantages they offer are hard to ignore, given the alternative is most likely writing an engine from scratch. However, authors of several research papers have pointed out limitations of game engines when using them for visualisation purposes. By far the most serious issue noted is the limit on the number of in-world objects possible. We discuss the issue of game engine limitations and our workarounds in Chapter 4.

4 It is worth noting that this is distinct from the research area of ‘serious games’, where game-like tasks (both utilising a computer and otherwise) are used to aid learning and understanding.

5 Although not built from a game engine, it was implemented from scratch in Java 3D to have the look and feel of an FPS game.

2.9 Conclusion

Operational networks generate enormous monitoring data sets, and researchers explore network data visualisation because of the nontrivial task of finding anomalies and patterns within this data [118, 119]. Parallel to this goal, some researchers also aim to produce systems that reduce the skill level required to make positive contributions to network monitoring efforts. This chapter has surveyed the evolution of approaches for IP network data visualisation that are moving towards merging collaboration, immersion and control features into integrated systems. We have created a taxonomy based on features identified from the literature, and presented published works with respect to this taxonomy.

Significant development has occurred in the areas of visual presentations and interactivity. Each unique visualisation scheme commonly allows the user to interactively explore the data set it presents. However, few had features that allowed for both real-time and historical data access (most were limited to a single option). Very few enable control of actual networking devices from within their visualisations. Very few supported collaboration in any form. Many of the visualisations have an intrinsic, inflexible coupling between their data set type and visualisation, meaning they are highly specialised to a single task. However, there are tools that allow for generalised mappings.

More recent works have started to create low-cost immersive environments that integrate some combination of multi-user collaboration, network control, and real-time data input. The area’s potential has been widely acknowledged through the implementation of many prototype systems; however, most of the tools mentioned in this survey are proof-of-concept and have not seen wide deployment. Some systems have had open source code releases, but others have seen no release whatsoever.

Unfortunately, whether due to funding or time constraints, authors have not generally engaged in much self-criticism of their ideas, nor performed usability and scalability testing of their work. In the absence of evidence for any given scheme’s effectiveness and ability to scale to real-world operational needs, it will be hard to obtain (and retain) the interest of the network operations community.

Network management that integrates collaboration, immersion and interactivity shows potential. In the following chapter, we describe our proposal for a system that brings these three elements together in an integrated whole, and outline the broad methodology we will use to evaluate the proposal.

Chapter 3

Proposal & Methodology

The previous chapter identified a number of significant features that visualisation systems for network management can implement: real-time operation, data set interactivity (including data drill down), flexible metric mappings, network control, collaboration, and immersion. As part of the work for this thesis we have constructed the first and only open source system that brings together these features. In this chapter we present an overview of this 3D visualisation and outline our evaluation methodology. In accordance with the issues discussed in the previous chapter, our system performs the following tasks:

• Collates networking metric data from multiple sources, carried by multiple protocols and in multiple formats.

• Translates this data into in-world updates using a visualisation policy (rules for mapping net- working metrics to the attributes of in-world objects).

• Maintains the distributed visualisation’s state for connected clients, in real-time, to support collaboration.

• Takes in-world actions and translates these into network reconfiguration commands for devices (carried by the required protocol).

A high-level architecture of our system is shown in Figure 3.1. External network metrics are received or solicited by the input abstraction layer and collated. The collated data is converted to in-world updates using the visualisation policy, and sent to the server. Software clients running on various platforms connect to the server over varying network types and conditions. Multiple network administrators using these devices can view and collaborate on the same data from their own perspective in-world (while avatars represent other in-world users).

Figure 3.1: High-level system architecture

We map network metrics to attributes of relatively simple 3D objects, such as pyramids and simple 3D icons of networking devices. These objects have a spatial location in-world and various visually orthogonal attributes, such as spin, bounce, size, roll and colour. From within the immersive 3D environment an administrator can move in and around presented data sets and in doing so, filter the amount of data on display. Other network administrators are represented as avatars. Any in-world interactions can be used as a trigger for the output abstraction layer to issue re-configuration commands to one or more networking devices.

We also deploy our system on commodity hardware. Although making good use of the hardware available at the time, in the mid 1990s systems such as SeeNet3D required dedicated high performance SGI workstations to run effectively and, when run on high-end commodity hardware, could achieve only one frame per second [97]. In contrast, cheap, consumer-grade devices are now capable of providing a collaborative environment and sophisticated animations at over 1920x1080 and 25 frames per second.
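The role of the visualisation policy, as rules that map networking metrics to the attributes of in-world objects, can be illustrated with a minimal sketch. It is not taken from our prototype: the names Rule and translate, the metric names, and the choice to scale each metric linearly into the range 0 to 1 are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One hypothetical policy rule: drive an object attribute from a metric."""
    metric: str      # name of the incoming network metric
    attribute: str   # in-world attribute to drive, e.g. 'spin' or 'size'
    lo: float        # metric value mapped to attribute level 0.0
    hi: float        # metric value mapped to attribute level 1.0

    def apply(self, value: float) -> float:
        # Clamp to the configured range, then scale linearly to 0.0..1.0.
        clamped = min(max(value, self.lo), self.hi)
        return (clamped - self.lo) / (self.hi - self.lo)


def translate(policy, device, metrics):
    """Turn collated metrics for one device into a single in-world update."""
    update = {"object": device}
    for rule in policy:
        if rule.metric in metrics:
            update[rule.attribute] = rule.apply(metrics[rule.metric])
    return update


policy = [
    Rule("packets_per_second", "spin", 0, 1000),
    Rule("cpu_load", "size", 0, 100),
]
update = translate(policy, "router-1", {"packets_per_second": 250, "cpu_load": 150})
```

Keeping the rules as data rather than code is one way to keep metric mappings flexible, since a mapping can then be changed without touching the input abstraction layer or the engine.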

3.1 Leveraging a FPS game engine

To efficiently implement our system we use the FPS game engine OpenArena [120] as a development base. Doing this grants multiple advantages, including: flexible development through access to source code and Software Development Kits (SDKs), built-in mechanisms for creating and manipulating immersive 3D presentations, distributed collaborative operation, and a navigation method that has seen extensive deployment. All of these features are implemented to support real-time operation on commodity hardware.

The game engine handles the complicated task of instantiating, tracking and rendering 3D graphics, along with mediating objects and their interactions (including in-world physics such as simulated gravity and collision detection). In addition to graphics, the engine also allows for audio to be emitted from a point source in-world, with directional audio effects created by clients, based on their relative location and orientation to sound sources.

Since id Software’s 1993 release of Doom, FPS games have included real-time network-enabled multi-player capabilities. In-game this can take the form of a cooperative or player-versus-player format. We leverage this ability to allow cooperative collaboration upon a presented data set. Other users are represented as in-world avatars that indicate where they are and what area of the world they can see.

Network FPS games are also designed to provide support for real-time, distributed involvement of multiple users, across home network links that have low bandwidth and no guarantees of Quality of Service (QoS). Early FPS games were designed to work over dial-up modem connections, the dominant consumer link technology of the 1990s. These provided, at best, approximately 50 kbit/sec of downstream and 33 kbit/sec of upstream traffic. To maintain acceptable interactivity over these links, FPS games have been optimised to keep packet rates and sizes low.
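The constraint that dial-up links place on packet rates and sizes is simple arithmetic, sketched below. The packet rates and the 100 byte packet size are illustrative assumptions rather than measurements of any game, and the packet size is assumed to include all protocol headers.

```python
def stream_kbits_per_sec(packets_per_sec: float, bytes_per_packet: float) -> float:
    """Bit rate of a constant packet stream, in kbit/sec (1 kbit = 1000 bits)."""
    return packets_per_sec * bytes_per_packet * 8 / 1000


def fits(link_kbits_per_sec: float, packets_per_sec: float, bytes_per_packet: float) -> bool:
    """True if the stream fits within the given link capacity."""
    return stream_kbits_per_sec(packets_per_sec, bytes_per_packet) <= link_kbits_per_sec


UPSTREAM_KBITS = 33  # approximate dial-up upstream capacity quoted above

slow = stream_kbits_per_sec(20, 100)  # 20 packets/sec of 100 bytes each
fast = stream_kbits_per_sec(60, 100)  # 60 packets/sec of 100 bytes each
```

Under these assumed figures the slower stream needs 16 kbit/sec and fits within the upstream link, while the faster stream needs 48 kbit/sec and does not.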
FPS games can be placed in two general categories: open source or closed source. An open source game engine gives full access to all code and allows for any modification required. Orthogonal to being open or closed source, most game engines contain software hooks to simplify the modification of game logic, as well as providing freely available tools for the creation or modification of artwork content (both maps and models). This development software can be created by third parties, or created by game authors and bundled into official Software Development Kits (SDKs).1

1 It is not unusual for games to be created with advanced SDKs, to encourage new games to be built upon an engine. For commercial games this leads to the possibility of more sales for the original game (if the modifications are only minor), or relicensing of the entire engine if a completely new game is developed.

Whether through the use of the SDK of a closed-source engine, or modifying underlying code of an open source engine, we can gain low level access to virtual world variables and controls. We can add and remove objects from the world, modify their behaviour, and take in-world events and pass these on to external processes. Additional GUI design tools provide us with high level means with which to modify artwork and express our desired 3D representations and the virtual environment they will inhabit.

The most significant modification needed to the game engine was the addition of input and output software hooks. With these, we allow input from external sources to alter world-state, and conversely, take in-world actions and send this information to external processes. We have added this functionality to our prototypes through game engine source code modifications, coupled with external abstraction layer processes.

Finally, the artwork included in most FPS games is not well suited for our purpose and was also modified. FPS maps and objects are designed to facilitate interesting game play and can be artistically busy. This would only have detracted from our requirements, so we created simplified artwork that places emphasis on presented network data. Maps are simplified to single rooms or platforms, while networking objects are simplified to 3D icons for easy recognition. Retained, however, was the use of human-style avatars to represent other network administrators.

3.2 Historical data access

Although we do not evaluate the use of historical data access in this thesis, we do wish to outline the potential to implement the feature, from both a technical perspective and how users could control it in-world.

To support historical data access, two types of data can be recorded in a visualisation system: raw network monitoring data (captured packets, netflow records etc.) or in-world object attribute changes (i.e. network monitoring data after it has passed through the input abstraction layer). While recording either of these would allow users to view historical data in-world, recording the actual network monitoring data allows for more flexibility. For example, if an incident requires more detailed data than the visualisation is presenting, original data can be inspected. If needed, changes can then be made to the input abstraction layer to visually present the event differently in the future. If only in-world attribute changes are recorded, the visualisation can re-play the visualisation of the event, but precise detail of the event is lost.

A second advantage of recording network monitoring data is that there are existing tools for this purpose. These tools can be placed before the input abstraction layer and controlled by the visualisation server. One such open source tool is ‘Time Machine’ [75, 121]. It allows for the efficient continual recording of frames and provides a query interface to recall recorded data.

Viewing historical data involves three variables: the time being viewed, the time span being viewed, and the speed of ‘playback’. All three of these variables could be controlled through slider bars, gestures, or buttons (like the traditional ‘fast-forward’ and ‘rewind’ buttons on a VCR). Sliders can be dragged using in-world tools, while VCR-like buttons could be controlled through the ‘shooting’ of objects with tools, or bound to user-definable keys on the client keyboard.
As well as the client keyboard, there is the possibility of using a hardware device called a jog shuttle. It is widely used in the video industry for moving quickly and precisely through large amounts of linear information. It consists of two rings, an inner and an outer. The outer ring controls the speed of playback, varying between ‘fast-forward’ and ‘fast-rewind’, while the inner ring controls moving ‘frame-by-frame’. We suggest that it may assist in the time domain navigation of large amounts of network data. Many USB versions of jog shuttle hardware are available.

To indicate to the user what historical time span is being displayed and the speed of playback, time stamps of the data can be superimposed in a corner of the client’s screen. It is left to further work to fully implement these features and evaluate them with a human usability study.
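The three playback variables discussed in this section (the time being viewed, the time span being viewed, and the speed of playback) can be captured in a small state object. The sketch below is illustrative only and was not implemented in our prototypes; the names Playback and select, and the representation of recorded data as (timestamp, payload) tuples, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Playback:
    """Hypothetical playback state for historical data access."""
    position: float  # start of the window being viewed, in seconds
    span: float      # width of the window being viewed, in seconds
    speed: float     # recorded seconds per wall-clock second; negative rewinds

    def advance(self, wall_seconds: float) -> None:
        # Move the window according to the elapsed wall-clock time.
        self.position += self.speed * wall_seconds

    def window(self):
        return (self.position, self.position + self.span)


def select(records, playback):
    """Return the recorded (timestamp, payload) tuples inside the window."""
    start, end = playback.window()
    return [r for r in records if start <= r[0] < end]


records = [(0, "a"), (5, "b"), (12, "c"), (20, "d")]
pb = Playback(position=0, span=10, speed=2.0)
first = select(records, pb)    # window covers 0 to 10
pb.advance(3)                  # 3 wall-clock seconds at double speed
second = select(records, pb)   # window covers 6 to 16
```

A slider, a VCR-like button or a jog shuttle ring would then only need to change position, span or speed, leaving the selection of recorded data unchanged.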

3.3 Evaluation Methodology

Through design and implementation of a usability study, we explore: to what degree usability participants can adapt to the default navigation system of the world; to what degree the attributes of spin, size and bounce can be considered visually orthogonal; if participants have common assumptions about what network metric to object mappings should be; if users can successfully detect changes in an active virtual environment; and to what degree usability participants are sensitive to system latency. Through experiment we quantify the network resource requirements of our system and its ability to work across Internet paths.

3.3.1 Usability experiments

Test participants were sought from two broad groups, those with network administration experience and those without. Demographic information was collected for each participant regarding their general computing experience, Internet usage, network administration experience and computer gaming experience.

We first explore how well administrators move around and interact with the immersive environment, as this is pivotal to their experience and the overall utility of the system. Various methods have been used to navigate and manipulate 3D environments and objects [122, 123]. Some require only a standard keyboard and mouse, but others have been developed around specialised hardware devices. We use the movement method sometimes referred to as ‘WASD’ or ‘mouse-look’. It uses a standard mouse and keyboard for navigation and has been the default for many PC-based FPS genre games [124] since the mid-1990s. One hand uses the keyboard to control forward-reverse and side-to-side (strafe) movement, the other hand uses the mouse to control direction of view. The name ‘WASD’ refers to the original default use of W, A, S and D keys, but other combinations are possible (such as R, D, F, and G, or the numeric key-pad).

Unlike previous work, we evaluate the WASD movement method in a state known as ‘fly’ mode, where no virtual gravity is implemented in-world. By looking up or down (with the mouse) and moving forwards (with the keyboard), our participants can gain or lose height, moving in full 3D rather than 2.5D (3D with gravity). Fly mode does restrict the user from rolling: although they can look in any direction, their avatar’s viewpoint is always oriented vertically. We implement fly mode to enable drill down [74], with administrators moving higher above objects for overviews and approaching objects for more finely-grained views of subsets of data.

Due to the differences between this movement method and previous evaluations of FPS game play, we first set out to formally evaluate if our diverse range of participants could successfully use it to navigate in-world. To answer this question we timed participants running through an in-world ‘obstacle course’ three times. We evaluate their initial competency and what improvement they achieved.
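The way fly mode turns a view direction and a key press into 3D movement can be sketched with basic trigonometry. This is an illustrative model rather than the engine's code: the function name fly_step, the z-up axis convention, and the choice that a positive pitch means looking up are all assumptions.

```python
import math


def fly_step(position, yaw_deg, pitch_deg, forward, strafe, speed, dt):
    """Advance a viewpoint in 'fly' mode (no gravity, no roll).

    forward and strafe are key inputs in the range -1..1; yaw and pitch come
    from the mouse. Hypothetical conventions: z is up, positive pitch looks up.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    # Unit vector along the direction of view: looking up or down gives
    # forward movement a vertical component, so height can be gained or lost.
    fwd = (math.cos(pitch) * math.cos(yaw),
           math.cos(pitch) * math.sin(yaw),
           math.sin(pitch))
    # Strafing stays horizontal because the viewpoint never rolls.
    right = (math.sin(yaw), -math.cos(yaw), 0.0)
    return tuple(p + (forward * f + strafe * r) * speed * dt
                 for p, f, r in zip(position, fwd, right))


level = fly_step((0.0, 0.0, 0.0), 0, 0, forward=1, strafe=0, speed=10, dt=1)
climb = fly_step((0.0, 0.0, 0.0), 0, 90, forward=1, strafe=0, speed=10, dt=1)
```

With gravity enabled, the vertical component of the forward vector would be discarded, which corresponds to the 2.5D movement mentioned above.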
We then explore the visual orthogonality of three object attributes: spin, size, and bounce. To do this, we display to each participant a series of single grey pyramid objects with various attributes, for two seconds each. The participants were then asked to identify and record the attributes presented. The errors from these results indicate if various levels of these attributes are occluded by other concurrently presented attributes.

A further experiment explores if any attributes exhibit any form of perceived affordance [125], the suggestion of a property or use. A simple example of affordance is the user interface to a (properly designed) door. A metallic plate on one side of the door affords pushing, a handle on the other affords pulling. The door communicates its possible use to a user through its design. Similarly, we wish to discover if any of our attributes convey broad concepts in a way common to all usability study participants. We ask participants to score attributes as ‘good’ through ‘bad’, ‘urgent’ through ‘not-urgent’ and ‘important’ through ‘unimportant’. In the case of network administrator participants, we additionally invite them to indicate if the attribute affords the concept of an underlying networking measurement.

To determine if participants can successfully detect changes in an active in-world environment they are placed in a world simulating the monitoring of multiple networking devices. We show participants the ‘normal’ activity of this simulated network for 15 seconds, followed by ‘abnormal’ network behaviour (or a ‘null’ scenario, where participants continued to view normal activity). The scenarios were one of: over usage of bandwidth by a host, a sequential port scan across greynet hosts, or a router failure resulting in a localised outage. We then ask participants to report on what changes they noticed.

We use these same tests to explore at what point participants notice simulated network delay introduced to their in-world state updates. Each scenario was also given a different delay (between 0 and 400 ms RTT) and afterwards, participants reported if they felt the system was in any way unresponsive.

3.3.2 Network resource consumption experimentation

For a system to be useful in a real-world networking scenario, it must not consume excessive network resources and must also work over the wider Internet. Through experimentation we have characterised the load our prototype places on a management and control network. We have done this with data collected from controlled lab-bench experiments and data collected during usability testing.

Our prototype is based on the source code of OpenArena, which is a fork of the Quake III Arena code. The network requirements of Quake III Arena and its derivative games have been quantified in previous work [126, 127, 128]. However, these results were collected as part of ‘normal’ (as designed) game play and do not reflect the potential network activity of our prototype, despite the common code base. In Quake III Arena game play, the only elements that can influence world-state are users’ movements and actions, which are generally fast paced. But in our prototype, there are two components that make up world-state changes. The first are participants’ movements and actions, which are at a much slower pace compared with that of game play. The second is the world-state changes introduced by external monitoring processes, which can potentially introduce many changes over a short time period, for example, during a sudden network attack.

We evaluate the network characteristics for two broad reasons. First, we would like our prototype to run in real-time, using various link technologies and under a variety of path conditions. Use of the prototype on a dedicated desktop host with a LAN is possible, as shown through the usability trials in Chapter 5, but portable devices using wireless interfaces are becoming increasingly commonplace (if not dominant in some markets). We show that the prototype can be used over WiFi based wireless networks and over paths that include ADSL or 3G link layer technologies.

Second, it is desirable that the act of monitoring a network does not interfere with the normal network traffic, and vice-versa. Management traffic can be carried across a physically dedicated or logically separated dedicated network, but can also share the same links that provide end-user data. It is important that the characteristics of the prototype’s network flows are known so that its requirements and potential for interaction with other flows can be calculated. We show that the prototype’s traffic profile is predictable and does not present significant load on a modern network.

Chapter 4

Towards a Visual Environment

In this chapter we outline the development of L3DGEWorld, our software for immersive real-time collaborative management of IP networks. We begin by defining greynets [21, 22, 23], a novel network monitoring method that we developed in parallel with our visualisation software. It was our motivating use case throughout prototyping and provided the main data source for all prototypes.

The development leading to L3DGEWorld was iterative, with two other systems informing its development. The first was 3D Visualisation Environment for NIDS (3VEN) [69], a visualisation written utilising the library FreeGLUT [129]. The second prototype moved to using the FPS game engine Cube [130] as a development base.

As we explained in the previous chapter, off-the-shelf game engines closely match our requirements and leveraging them provides a number of advantages over building software from more fundamental software libraries. We present the modifications that were required for the engines to fully meet our requirements. The final prototype is the GPL licensed L3DGEWorld, based on the game engine OpenArena [120].

4.1 Motivating use case – visualising greynet data

Our ultimate goal was to have the ability to visualise a wide variety of network metrics. But as a way of narrowing the scope of development during early prototyping, the motivating use case was to visualise the network scans detected by a greynet [21, 22, 23]. All prototypes were able to present visualisations of this type of data as a demonstration. We have developed and evaluated the concept of a greynet - a region of network address space that is sparsely populated with ‘darknet’ addresses. (An alternative term would be ‘sparse darknet’,

54 4.1. MOTIVATING USE CASE – VISUALISING GREYNET DATA 55

(b) Greynet (a) Greynet monitoring host monitoring host VLAN trunk connection carrying all VLAN 10 VLANs VLAN 10 machines

VLAN 11 machines VLAN 11

but we feel that greynet is easier to use in both conversational and written form.)

Figure 4.1: A greynet host monitors multiple IP addresses (amongst normal ‘lit’ hosts) from various subnets on an enterprise network. a – logical layout, b – implementation using VLAN trunking

We have defined two methods to instantiate a greynet. The first method involves configuring a single host to take multiple IP addresses on various enterprise subnetworks. The logical diagram of this is shown in Figure 4.1-A. A low-cost method to achieve this is to connect the greynet monitoring host to a network ‘trunk port’ carrying all enterprise Virtual LANs (VLANs) using a tagging method such as 802.1Q, as shown in Figure 4.1-B. The greynet monitoring host can then take multiple IP addresses from each subnet (by either static configuration or through DHCP lease). This method is outlined in ‘Greynets: a definition and evaluation of sparsely populated darknets’ [22] and implemented in the software greynetd [131].

The second method is detailed in ‘IPv4 and IPv6 Greynets’, RFC 6018 [23]. In this case greynets are implemented with router assistance. Edge routers forward a packet to a greynet host when that in-bound packet triggers the router to emit an ARP (Address Resolution Protocol) or ND (Neighbor Discovery) request on the destination network. This second method, while published during thesis creation, had no deployments at the time of experimentation and was not used during prototype development.

Greynets are an ideal source of data to visualise for our purposes, as they are simple to implement and the data they output needs no complicated processing. Our prototypes are all capable of visualising a greynet of up to 128 hosts, scattered across an enterprise network.

Figure 4.2: Early prototyping with 3VEN - pyramids representing subnets and wireframe cones representing ports. Object size denotes packet-per-second rate into greynet space.

We display packets per second to each greynet host as the object attributes of ‘jump’ and ‘spin’. As scans go across a network, ripples of movement can be seen going across a field of objects representing greynet hosts. This was designed so that even a layperson can see when a scan is traversing a network. Furthermore, our design allows for objects to be spatially grouped by subnet, so it is easy to see what specific areas of a network are being scanned.
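The mapping from packet rate to object movement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in our prototypes: the saturation rate `MAX_PPS`, the 0.0 to 1.0 attribute scale and all function names are assumptions introduced for this example.

```python
from dataclasses import dataclass

# Hypothetical rate (packets per second) at which movement saturates.
MAX_PPS = 50.0


@dataclass
class ObjectAttributes:
    jump: float  # 0.0 (still) to 1.0 (maximum bounce)
    spin: float  # 0.0 (still) to 1.0 (maximum rotation rate)


def attributes_for_rate(pps: float, max_pps: float = MAX_PPS) -> ObjectAttributes:
    """Scale a packets-per-second reading onto the jump and spin attributes."""
    level = min(max(pps, 0.0) / max_pps, 1.0)
    return ObjectAttributes(jump=level, spin=level)


def attributes_by_subnet(rates):
    """Keep hosts grouped by subnet so objects can be laid out together.

    `rates` maps subnet -> {greynet address -> packets per second}.
    """
    return {
        subnet: {host: attributes_for_rate(pps) for host, pps in hosts.items()}
        for subnet, hosts in rates.items()
    }
```

An idle greynet host (0 packets per second) maps to a motionless object, while any rate at or above the assumed ceiling maps to full movement. A linear scan then appears as successive objects in one subnet group leaving the idle state.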

4.2 Early prototyping - 3VEN

The first prototype, 3VEN, was written in C and utilised FreeGLUT [129], a library for the cross-platform development of OpenGL based applications. The development of 3VEN (described in more detail in [69]) allowed for the articulation of the belief that we can concurrently present multiple network metrics in a compact form by using visually distinct attributes of a 3D object (such as a colour, size, shape and various movements) to represent each metric. We termed such distinct attributes as being visually orthogonal.

3VEN was only capable of presenting simple solid colour or wireframe shapes, as shown in Figure 4.2. Subnets were represented as pyramids and groups of ports were represented as wire-frame cones. These objects changed in size based on the number of packets per-second arriving at a greynet host.

It was at this point in development that it became apparent that use of a game engine as a base for development would drastically simplify the creation of a virtual environment that meets our needs. Creating the 3VEN system using nothing but the library FreeGLUT allowed for the highest level of customisation, but it ignored an enormous body of existing work.

We subsequently chose to experiment with the Cube game engine [130] (due to its open code base and ability to dynamically alter the game world during play) and then moved to Quake III Arena due to its open source code base and well-tested networked multiplayer capabilities. Many other alternatives were considered and excluded (such as Doom 3 and Quake 4 [132], and Half Life 2 [133]), primarily due to their game engines being closed source and thus difficult to retrofit with new capabilities.

Figure 4.3: The Cube game engine prototype visualises the output of a greynet

4.3 Cube engine

The Cube engine prototype builds upon the feature set of 3VEN in several areas. The system allows for remote collaboration by presenting other users as in-world avatars. It also contains a generalised abstraction layer that allows data to be input in real-time, and represented by in-world objects. An output abstraction layer allows reconfiguration of networking infrastructure to be triggered when the user acts upon in-world objects with a variety of in-game weapons and tools.

The Cube game engine’s artwork was modified to create our desired look and feel. The world was reduced to a single room, with a high platform so a user can gain an overview of all the objects on display. As with 3VEN, the input to the visualisation was the network metric of packet arrival into a greynet. The objects created to represent greynet hosts are simple pyramids, with a red and black texture pattern to accentuate any rotation. When a packet is detected at a greynet host, its corresponding in-world object both rotates and oscillates up and down (a jump-like action). The in-world metaphor of shooting was modified so that hits to objects trigger the placement of an ACL (Access Control List) onto a Cisco router – blocking all access to the network by the attacker.

The left side of Figure 4.3 shows a user carrying a ‘firewall gun’ and surveying the 25 greynet hosts (distributed over 5 subnets). The view also contains traditional in-game elements – such as ‘health’, ‘armour’, and ‘ammunition’ indicators – which are unused in this prototype.

The experiment network’s layout is shown on the right side of Figure 4.3. A Cisco router connected our ‘outside world’ network and an internal enterprise network, configured as five small subnets. Five greynet hosts were placed in each subnet. The Cisco router’s Access Control List (ACL) was remotely configurable (using telnet from a game server helper process). Both the greynet metering process and visualisation hosts were developed on a single machine running FreeBSD 5.4 [134].
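To make the output side of this prototype concrete, the following Python sketch shows one way a helper process could turn a set of blocked attacker addresses into Cisco IOS configuration lines. It is an assumption-laden illustration rather than the helper we wrote: the function name, the ACL number 101 and the choice to rebuild the whole list on every change are hypothetical, and delivery of the commands to the router (telnet in our experiment) is deliberately left out.

```python
import ipaddress


def acl_commands(blocked_hosts, acl_number=101):
    """Return IOS config lines that rebuild an extended ACL from scratch.

    Each blocked host gets a 'deny' entry, followed by a final
    'permit ip any any' so that all other traffic still passes.
    """
    if not 100 <= acl_number <= 199:
        raise ValueError("extended numbered ACLs use the range 100-199")
    # IPv4Address() rejects malformed input, so nothing unvalidated
    # is ever placed into a router command.
    hosts = sorted({ipaddress.IPv4Address(h) for h in blocked_hosts})
    commands = ["configure terminal", f"no access-list {acl_number}"]
    commands += [f"access-list {acl_number} deny ip host {h} any" for h in hosts]
    commands.append(f"access-list {acl_number} permit ip any any")
    commands.append("end")
    return commands
```

Rebuilding the list each time keeps the deny entries ahead of the final permit, which matters because IOS evaluates ACL entries in order. A real deployment would also need the list applied to an interface and would have to consider the brief window while the list is being replaced.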

4.4 Example usage

The sequence of events shown in Figure 4.4 illustrates an example use of the Cube engine prototype (in this case both visualisation server and client are running on the same host). The system presents an active network scan, as detected by a set of greynet hosts. The scan activity is viewed and acted upon by an administrator. Initially the network’s greynet is idle (i.e. experiencing no ingress packets). Objects representing the greynet hosts stay still. An attacker launches a scan on the network by using nmap [56] to send (TCP SYN) packets to port 445, linearly scanning across one of the network’s subnets. An object begins jumping and spinning, indicating that a TCP SYN packet has been detected heading towards the associated greynet address. As nmap continues to scan across the subnet, the remaining objects associated with that subnet also begin spinning and jumping. Due to each object’s uncharacteristic behaviour, the network administrator infers that there is reason to intervene. The administrator uses the in-world metaphor of ‘shooting’ an affected object. The game-engine translates this into an ACL update on a router, implementing a block on the address from which the inbound scan packets are arriving. All objects stop their spinning and jumping as the network scan packets no longer enter the network.

4.4.1 Multiple monitoring points

Our view onto a network depends on where our monitoring point (or points) are placed. In our simple greynet usage example, we place our greynet hosts inside an enterprise network. When a scan is detected and mitigated against, a block is placed against the traffic of the host generating the scan. This is effective, but removes visibility of the attack, because the attack traffic is now stopped before it reaches the monitoring point. This issue occurs with all monitoring systems. Placing points inside the network will collect data on issues that have come beyond the outer defences. Placing them outside will collect data from any attack, even if it is stopped by a network’s outer defences.

Figure 4.4: Detecting and blocking an active network scan

In our prototypes we support taking data from multiple monitoring points. For example, both sides of a firewall can be monitored and visualised. This allows issues to be seen and prevented, but still shown in-world as ongoing. There are a number of methods through which this could be visualised in-world. One method would be to present different monitoring points as different spaces or rooms that can be moved between. Alternatively, with correct configuration of the input abstraction layer, external monitoring data of ongoing attacks could be presented as ‘ghost’ objects (objects with a transparency) near their internal monitoring point object counterparts.

4.5 L3DGEWorld

Development for L3DGEWorld moved to a Quake III Arena based game engine. There were a number of advantages in doing this. The codebase has been released under the GPL since 2005 and has been revised and forked into various projects, meaning that bugs in the original source have been minimised. It runs on a wide variety of desktop (Windows, Mac OS X and Unix-based) and portable (iOS and Android) platforms. Our group’s previous experience and study of the engine was also quite extensive, it having been the focus of other research [135, 136, 126, 127, 137, 128]. Quake III Arena could also be modified to support dynamic control of the animation of in-world objects representing network state.

While Quake III Arena’s source code was released in 2005, the artwork (maps, textures and object models) was not. The actual development base chosen was the OpenArena package, which takes the GPL’ed Quake III Arena code, makes various code improvements and provides GPL artwork and maps.

L3DGEWorld is shown in Figure 4.5 with the viewer and another user inspecting two greynet host objects (pyramids with a grey brick texture). When close to a greynet object, the IP address of the host is overlaid upon it. L3DGEWorld retains the in-world metaphors of ‘shooting’ various ‘weapons’ as its external interaction method. When the basic weapon is used on a greynet object, it places a block on the underlying system’s router, preventing the source of the network scan from continuing to access the network. When the ‘lightning’ weapon is used on an object, more detailed information about the object is displayed in textual form.

Objects in-world can have any model, skin or colour and various levels of jump, spin, size and roll, mappable to any underlying data source. Navigation around the world is, by default, the ‘fly’ method. This allows the user to gain height in-world and receive an overview of activity without needing supporting platforms or other structures. Environments and objects for the visualisation can be created with any Quake III Arena compatible development tools.

Figure 4.5: L3DGEWorld - the viewer and another user inspect pyramid objects representing network hosts

OpenArena’s network stack does not implement any form of security for network transmissions. L3DGEWorld does not attempt to add security, and we leave it to other protocols (such as IPsec) to provide a secure path for the network management data carried by a L3DGEWorld system that requires it.

4.5.1 Collaboration

Goodall et al. find, through an exploratory field study of information security experts [19, 20], that network administration is a collaborative task, both internally and externally to an organisation. On one level it is externally collaborative, with administrators using Internet resources (such as mailing lists, forums and the like) to continue “keeping up with everything” – as one participant described the majority of their time. On another level it is internally collaborative. Operators must have a detailed knowledge of their organisation’s networking environment to have the context to properly respond to issues. A common occurrence for an administrator is to dismiss an alert as a false positive. However, to find what is abnormal, the administrator must first know what is normal. This experience can be spread across administrators in an organisation and must be gathered together to address a potential issue. Further, this spread of information presents difficulty when bringing new staff into a network. Goodall also finds that collaboration is performed particularly when issues are difficult or time critical.

The study breaks the tasks participants perform into three phases: monitoring, analysis, and response. Monitoring tasks are much less demanding than analysis and response, and can be performed by less experienced staff with the right tools. Doing this also allows such staff to gain experience with the networking environment. With more experienced staff freed from the task of monitoring, they can devote more time to analysis and response, and also proactively take on preventative security work.

L3DGEWorld supports collaborative control and implements a permissions system to manage user-activation of external events. Users may be assigned different levels of authority, by giving users and objects ‘weightings’. For example, one user can be given a weighting of 100, and two further users weightings of 50 each. An object can be set a weighting of 100. In this scenario, the first user can act upon the object on their own, while the two lower-weighted users must concur and act upon an object together, within a limited period of time, for a change to be implemented on the underlying network.

What follows are two specific scenarios of how the types of collaboration outlined in the above work would be experienced when using our visualisation system.

The organisation consists of three senior administrators and a junior one. Each administrator has a different background and set of experiences. During a morning there are increasing complaints about poor response from the servers of the organisation. A single senior administrator begins an investigation. They enter their virtual world and at a glance find significant bandwidth usage by all servers. This is a highly unusual and significant event, so they call in the other senior administrators. One of these administrators joins the world remotely as they are working from home. All three confer and come to the conclusion that it is a serious and sustained bandwidth based attack on many of the servers, arriving from two external hosts. The situation is serious enough to warrant immediate action, and the solution to the issue is to place new access controls on the main corporate firewall. A serious mistake is avoided when one of the administrators realises that one of the hosts seemingly under ‘attack’ is a development server and is actually just running test cases as a stress-test. The correct firewall rule is implemented when two of the senior administrators take action upon the firewall object at the same time.

A second scenario involves a junior administrator monitoring the greynet hosts of the network. The junior administrator is sure that they have discovered a network scan against a particular subnet of the network. Although convinced the scan is a network attack, the junior administrator is prevented from actually placing a traffic block, and is forced to collaborate with a more senior administrator before a network change can be instantiated. They immediately call the senior administrator into the world and they both inspect the issue. It does turn out to be a scan, but the senior administrator realises that the scan is coming from a machine controlled by a member of the security team. Although the senior administrator does not know why the scan is occurring, their assumption is that this is most likely a legitimate penetration test by the team member. A phone call confirms this. In this case, an issue was discovered by a less experienced administrator that could have been very serious, but turned out to be a false positive. In this process though, the junior administrator gains knowledge of the organisation’s network, so they can perform the same task in the future more effectively.

We have not extensively explored all the possibilities regarding collaborative management of networks in this thesis, but we have created a system that opens a way forward for further exploitation of the area.
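The weighting scheme described in this section can be illustrated with a short Python sketch. It assumes that the weightings of users who act on the same object within the time window are simply summed and compared with the object's weighting; the class name, method names and the five second window are hypothetical and are not taken from the L3DGEWorld source.

```python
class WeightedObject:
    """An in-world object that triggers only with enough combined authority."""

    def __init__(self, required_weight, window_seconds):
        self.required_weight = required_weight
        self.window_seconds = window_seconds
        self._recent = {}  # user name -> (weighting, time of latest action)

    def act(self, user, weighting, now):
        """Record an action at time `now`; return True if the change fires."""
        self._recent[user] = (weighting, now)
        # Forget actions that fall outside the concurrence window.
        self._recent = {
            name: (w, t)
            for name, (w, t) in self._recent.items()
            if now - t <= self.window_seconds
        }
        combined = sum(w for w, _ in self._recent.values())
        if combined >= self.required_weight:
            self._recent.clear()  # one trigger per group of actions
            return True
        return False
```

With an object weighting of 100, a user weighted 100 triggers the change alone, whereas two users weighted 50 each must both act within the window. A lone junior administrator's action simply expires, which matches the second scenario above.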

4.5.2 Drill down

We implement a drill down feature in L3DGEWorld. The feature provides advantages from two perspectives. One is human scalability, in other words, aggregating data to prevent the user from becoming visually overwhelmed, while still enabling analysis of details as required. It also allows an increase in the amount of network monitoring data L3DGEWorld can handle, while working around limitations in the Quake III Arena engine.

An input abstraction process can monitor a large number of devices, resulting in massive amounts of data. We collect this data in the input abstraction layer and then convert it to in-world attributes of objects organised in a series of virtual rooms or spaces. Initially the user sees a single space filled with objects presenting some form of aggregate data. Virtual world camera movement allows the user to move within these objects representing networks and enter another visually different space. Here they view further objects that represent the next level of detail of the data.

L3DGEWorld is flexible: this drill down mechanism can be used with any data mapping the user wishes to configure. In L3DGEWorld version 2.3 the demonstration configuration represents a three-tier drill down approach that is akin to the hierarchy already found in IP networks: that of subnets, hosts and ports. This is shown in Figure 4.6. At the highest level of this example, visual objects of routers represent entire subnets. Moving into a router object takes the user into another room with objects that represent individual hosts, in this case, VoIP phones. If a host is moved into, the user is taken into a room displaying four pyramid objects representing ports of interest. To move back up a level of aggregation, the user moves towards the top of the space, where they reach a trigger point that takes them back to the previous space.

Figure 4.6: Three-tier drill down in L3DGEWorld 2.3: Left to right, moving into a router object shows host objects (VoIP phones), moving into a host object then shows port objects

This three-tier approach allows network state to be gleaned quickly at any level. It gives the ability to quickly expose details not present in the top level views by moving in, or just as quickly allows movement out to a view of the entire network. When using this three-tier representation, we attempt to preserve as much visual attribute consistency as possible at each level of the hierarchy. For example, when viewing an object representing an entire subnet, size corresponds to the unique connections leaving and entering that subnet; the same is true when viewing an object representing a host.

In the OpenArena engine, the drill down functionality is implemented using a number of virtual rooms, reused through avatar teleportation. More technical detail of this method can be found in [138].
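The aggregation behind the three-tier drill down can be sketched as a walk over nested data. The Python below is a simplified illustration only: the nested dictionary layout, the function names and the use of a plain sum as the aggregate are assumptions made for this example, and the sample addresses and counts are invented.

```python
def total(node):
    """Sum every leaf value beneath a node of the hierarchy."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def view(data, path=()):
    """Return the objects shown in the room reached by following `path`.

    An empty path is the top-level room (one object per subnet), a
    one-element path is a subnet's room (one object per host), and a
    two-element path is a host's room (one object per port).
    """
    node = data
    for key in path:
        node = node[key]
    return {name: total(child) for name, child in node.items()}


# Invented sample data: subnet -> host -> port -> activity count.
SAMPLE = {
    "10.0.1.0/24": {"10.0.1.5": {5060: 3, 80: 1}, "10.0.1.6": {5060: 2}},
    "10.0.2.0/24": {"10.0.2.9": {22: 7}},
}
```

Here `view(SAMPLE)` gives one aggregate value per subnet, and `view(SAMPLE, ("10.0.1.0/24",))` gives one value per host in that subnet. Because the same value drives the same attribute at every level, the visual consistency described above is preserved as the user moves in and out.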

4.6 Experiment architecture

To outline the various network flows of L3DGEWorld, we introduce the high level L3DGEWorld system architecture we have implemented. The various network flows of this system are shown in Figure 4.7, which for clarity shows all flows traversing separate paths (represented as clouds). This need not be the case in an actual deployment, as all the flows could share the same path, all could traverse their own dedicated paths, or any combination between.

Figure 4.7: The network flows in a L3DGEWorld system

From left to right in Figure 4.7, various types of network monitoring data can be transferred to the input abstraction layer from metering processes, via network Monitor Flows (MF). Input abstraction layer processes take diverse monitoring data and turn this into in-world attributes, based on some form of visualisation policy previously decided upon by administrators. The process then uses the L3DGEWorld protocol L3DGEComms-in [139] to send these in-world state changes to a L3DGEWorld server.

After attribute updates have been sent to the L3DGEWorld server from input abstraction layer processes, L3DGEWorld world state flows (LWS) are used by the server to maintain a consistent in-world state on the clients. This protocol is inherited from Quake III Arena’s protocol. UDP packets containing world-state are sent at regular intervals from the server to all connected clients. These are referred to as snapshot packets. The UDP packets sent from each client to the server contain individual users’ in-world actions and these are referred to as command packets.

L3DGEWorld protocol output flows (L3DGEComms-out) carry information about in-world actions by users to output abstraction processes. After receiving this in-world information, the output abstraction layer converts it into network re-configuration commands carried by network Control Flows (CF) to devices under control. The network re-configuration commands could be communicated to devices by one or more of the data types and protocols outlined in Section 2.3.

4.7 L3DGEWorld extensions/modifications

L3DGEWorld was developed in a modular fashion and is not restricted to only monitoring greynets and controlling routers. Since our public releases of L3DGEWorld in 2006 and 2007 [18], it has also been successfully used by other researchers to visualise: Uninterruptible Power Supply (UPS) devices with LupsMON 0.2 (L3DGEWorld Uninterruptible Power Supply Monitoring) [140], TCP session state with LTCPMON (L3DGEWorld Transmission Control Protocol Monitoring) [141], supercomputer state with LCMON (L3DGEWorld Cluster-node Monitoring) [142] and VoIP systems with LAMS (L3DGEWorld Asterisk Management System) [143]. Figure 4.8 shows LCMON and LAMS in use.

Figure 4.8: LCMON - Objects representing super-cluster nodes (left) & LAMS - Objects representing VoIP clients (right)

To modify the base L3DGEWorld software for their needs, there were two modifications the developers of these systems needed to make: changes to in-world artwork and creation of an input abstraction layer. Creation of an output abstraction layer is also possible, but optional. None of the above examples made use of an output abstraction layer, as they were created for monitoring purposes only.

The in-world look of a L3DGEWorld based visualisation is created by modifying the map of the world and the objects within it. Map elements include all static items in-world, such as sky, terrain, structures, and their associated textures and lighting. Maps can be created from scratch (or the L3DGEWorld examples modified) using any 3D modelling tool that supports the BSP format, including the open source tool GtkRadiant. Object shapes and textures can be created with any tool that supports the MD3 file format, including the shareware software MilkShape 3D. In all of the above L3DGEWorld modifications GtkRadiant and MilkShape 3D were used for map and object creation (when assets were not reused from L3DGEWorld). The location of objects that present attributes can be statically set in the map file or can be dynamically controlled (as the engine is running) by modifying values in a text file on the L3DGEWorld server. More detail of map and object creation can be found in Map & Entity Modeling for L3DGEWorld [144], the technical report by Javier outlining the modifications performed for LCMON.

There is a wide variety of data collection methods implemented by the input abstraction layers of the modifications mentioned above. Each created its own unique input abstraction layer process for collecting data. These processes convert the data to in-world attribute values and then, using the example C code included in the L3DGEWorld package, transfer these to a L3DGEWorld server using the L3DGEComms-in protocol.

LupsMON uses an open source SNMP library to periodically poll UPS devices across five university campuses. LTCPMON takes state machine data from the TCP stack of a machine running FreeBSD. It does this by reading the output of log files produced by the FreeBSD kernel module ‘siftr’. LCMON periodically creates a TCP connection to a port on the controller host of the Swinburne University supercomputer, which then transmits back an XML file with supercomputer state information. The LAMS input abstraction process, ‘Grazer’, periodically collects statistics from multiple Asterisk based VoIP servers using TCP to issue commands to the Asterisk Manager Interface (AMI).
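As an illustration of the collection step of an input abstraction process, the Python sketch below follows the pattern described for LCMON: open a TCP connection, read an XML document and convert it to attribute values. It is not the LCMON code. The XML layout (`node` elements with `name` and `load` attributes), the mapping of load onto ‘spin’ and the function names are all hypothetical, and the sketch assumes the remote end closes the connection once the document has been sent.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_state(host, port, timeout=5.0):
    """Read bytes from a TCP port until the remote end closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def node_attributes(xml_bytes):
    """Convert a (hypothetical) cluster state document to in-world attributes."""
    root = ET.fromstring(xml_bytes)
    attributes = {}
    for node in root.iter("node"):
        load = float(node.get("load", "0"))
        # Clamp to the 0.0-1.0 range assumed for attribute levels.
        attributes[node.get("name")] = {"spin": min(max(load, 0.0), 1.0)}
    return attributes
```

Keeping the parsing separate from the network read means the visualisation policy can be exercised against saved documents. The resulting attribute values would then be handed to whatever code speaks L3DGEComms-in, which is not shown here.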

4.8 Conclusion

We have successfully used 3D game engines as a development base to create a visualisation platform. By adopting this approach, we could focus on implementing mappings between monitored network events and in-game events, relying on the FPS engine to take care of complex graphics rendering (at the client) and synchronisation of collaborative actions between users. Although capable of visualising a wide variety of data inputs, throughout prototype development the motivating use case was to visualise a greynet implementation. Prototyping moved through three stages, starting with a FreeGLUT based application written from scratch in C (3VEN), to an implementation with the Cube game engine, and finally to an OpenArena based system, L3DGEWorld (released under the GPL). It is L3DGEWorld that the remainder of this thesis evaluates.

Chapter 5

Usability Evaluation

In this chapter we illustrate the potential utility of our proposal through a human usability study¹. We solicited participants from the widest possible range of demographics (age, gender, education, occupation etc.), but specifically targeted network engineers for recruitment. At the end of usability testing there was a total of 49 participants.

We have four broad areas of investigation for the human usability tests. First, we characterise the degree to which prior computing and gaming experience affects a person’s ability to navigate a 3D virtual world. We wish to have even inexperienced users capable of navigating the world. Second, we characterise the degree to which our objects’ attributes afford common concepts to participants. For example, we explore whether an object bouncing vigorously suggests ‘normality’ or ‘urgency’ to participants. Third, we outline the degree to which users can correctly differentiate the visual orthogonality of the attributes size, bounce and spin, when presented in combination. For example, when an object has a slow spin, we explore whether other attributes prevent participants from detecting this spin. Fourth, we explore if participants can use an ‘active’ system to correctly discover in-world changes. Finally, we quantify how sensitive participants are to network latency introduced between the L3DGEWorld server and client.

The usability study consisted of multiple choice questions and open-ended written questions covering five broad sections: collection of participant demographic information, testing of participants’ ability to navigate the virtual world, determining if object attributes (at varying levels) convey common concepts, testing for visual orthogonality and running simulated network monitoring scenarios. A reproduction of the questionnaire is included in Appendix B along with blank answer sheets in Appendix C.

¹ All of the aspects of human experimentation were granted human ethics approval by Swinburne University Human Research Ethics Committee (SUHREC). Required documentation relating to this approval can be found in Appendix A.

5.1 Methodology

A total of 49 participants took part in the usability experimentation over a period of two months. Experiments occurred with between one and three participants at a time. All sections were completed individually, except the final scenario based tasks, which were performed as a group if multiple participants were present. For all usability experiments the software used was L3DGEWorld release version 2.2.1.

We began with participants completing a questionnaire to capture demographic information. We wished to know if there were any significant biases in our sample and if any particular results had correlation to a specific demographic. The questions were basic information regarding: gender, age, what (if anything) they were currently studying, and if they had any form of colour blindness. They were then asked a series of questions regarding their experience in four areas: Internet usage, network administration (if any), computer gaming and general computer proficiency. We marked the answers to these 32 questions to gain a simple numerical score of experience in each area. As an example, under the area of Internet usage, to the question ‘The first year I started to regularly use the Internet was:’ a participant receives a score of 4 for reporting ‘<1989’. Possible answers increased in 5 year increments through to ‘2005 - present’, where this answer scored a 0. In the area of general computer proficiency a question was, ‘I am familiar with (have used long enough to arrive at an opinion of) the following operating systems (check all that apply):’ followed by a list of 14 operating systems; the participant received a score of 1 for each checked.

Navigation ability was tested by putting each participant through a simple obstacle course made up of three sections. For each section, participants were placed floating in the middle of a plain cube shaped room that took participants approximately 5 seconds to traverse end to end. One grey pyramid object (Figure 5.1) was placed in each of the eight corners. Participants needed to interact with each object (move up to and touch or shoot) as it highlighted in-turn. The different sections ended when all objects were interacted with. Each section increased movement complexity. Section 1 required simply shooting each highlighted object while being locked from lateral movement. Section 2 required participants to use both the mouse and the keyboard to move around the world and touch each highlighted object. Section 3 required participants to perform a combination of the first two tasks – shoot red, and touch blue coloured objects. Participants practised each section once, then completed all three test sections three times each, while being timed by the L3DGEWorld game engine. All participants completed the tests with objects highlighting in the same order.

Figure 5.1: A greynet host object

In the next section we determined if participants shared common concepts about various object attributes. Participants were shown grey pyramid objects with various attributes, one at a time, for 4 seconds each. There were five possible attributes: spin (slow, medium or fast), size (large or extra large), bounce height (small, medium or large), roll (90 or 180 degrees) and colour (red, purple, yellow, blue, orange and green). Participants were then asked to rank each object they saw on three scales, ‘Goodness’, ‘Urgency’ and ‘Importance’. Each scale had 5 possibilities, as summarised in Table 5.1. Participants were also asked to optionally name the networking activity that this object and its attributes represent.

Table 5.1: Object attribute scales

‘Goodness’               ‘Urgency’              ‘Importance’
Good/Normal              Not urgent             Unimportant
Somewhat Good/Normal     Somewhat not Urgent    Somewhat Unimportant
Neither                  Neither                Neither
Somewhat Bad/Abnormal    Somewhat Urgent        Somewhat Important
Bad/Abnormal             Urgent                 Important

To test for visual orthogonality, participants viewed 73 pyramid objects, each with a different combination of attributes. They viewed them one at a time for 2 seconds, after pressing the space bar. We displayed objects only briefly, as we wished to push the participant’s threshold of attribute perception to a point where lack of visual orthogonality can be detected. After each object was displayed, participants were then asked to report what they perceived the object attributes and values were. Every participant was presented with the same randomised object order. To limit the number of variations participants were required to view, the attributes and values used were restricted to: spin (none, slow, medium and fast), size (none, large, extra large) and bounce (none, small, medium and large).

Figure 5.2: Human usability world layout & clockwise from top-left, detail of in-world objects representing a router, laptop, greynet host and VoIP phone

Finally, a set of more open in-world tests were undertaken. In the cases where more than one participant was present, this section was performed in groups. The tests consisted of each participant sitting at a desktop machine running the L3DGEWorld client software, connected to a dedicated L3DGEWorld server via an isolated LAN. Figure 5.2 shows an overview of the world in which participants were placed. It consisted of four platforms interconnected by ‘walkways’. Each platform contained 32 objects of the type greynet host, phone, router or laptop. Participants could choose to navigate the world with simulated gravity enabled, or use fly mode.

Four scenarios of 120 seconds each were shown to each participant group (with the scenario order randomised for each group). Participants were informed that the first 60 seconds of the scenario simulated ‘normal’ activity. This normal activity was the same for each scenario, with various objects presenting attributes at a low level, slowly changing over time. This was followed by 60 seconds of scenario activity. During the scenario period participants were asked to freely navigate and record what, if any, changes they saw in-world. The simulated scenarios are as follows:

• A network scan (such as those generated by the program nmap), visually consisting of all the greynet host objects sequentially beginning to spin, then at the end of the 60 seconds, slowing down and stopping.

• A continuous high use of bandwidth by a single host, visually consisting of one laptop object that became much larger than it was during normal activity and spun quickly.

• A router failure, visually consisting of two components. A router object, that was spinning during normal activity, stopped spinning and started bouncing (to indicate it was unresponsive). At the same time, a single line of eight VoIP phone objects also began bouncing.

• The null scenario, where the normal network activity continued.

We also invited participants to optionally name the activity as a networking event. They could provide a written answer or check an answer from a list. The list consisted of possible interpretations of the visual activity presented: network scan, large bandwidth usage by a network host, large number of connections by a network host, a network host down, denial of service attack against a host, or a denial of service attack sourced from a host.

Previous research has shown that less than 150 ms one-way network delay is desired by players to maintain a satisfactory interactive FPS game experience [137, 145]. In such games, a player’s reaction time is a major factor in game success. To determine participants’ sensitivity to latency when performing network monitoring tasks, for the final 14 participants we added 5 levels of emulated delay to the 5 in-world tests they participated in (they were given an extra null scenario compared to other participants). These delays were 0, 100, 200, 300 or 400 ms RTT (and were administered in a random order). At the end of each test, the participant was asked “Did you find that the system was responsive (i.e. was there any noticeable delay preventing you from moving or collaborating effectively in the world)?” with a 5 point scale ranging from “Responsive” to “Unresponsive”. This was followed by the short answer question, ‘If you found the system to be unresponsive in any way, how did you find it unresponsive?’.

Participants were then asked to self-report on a 5 point scale whether they walked or flew around the world during the scenarios, and whether they generally kept their distance from objects or moved in for closer inspection. Finally, a series of open-ended short answer questions were asked to determine if the participant had any troubles during the scenarios, and if they had any further overall comments about the usability experiments.

[Histogram: Age ranges of participants; x-axis: age ranges (< 20 to 61-65); y-axis: frequency]

Figure 5.3: Human usability participant age distribution

5.2 Results & Discussion

5.2.1 Participant demographics & experience

Of the 49 participants, 57% (n=28) reported male and 41% (n=20) reported female (one participant did not answer). Ages ranged from under 20 to 61-65, as shown in the distribution in Figure 5.3 (the minimum possible age of participation was 18 years). Our sample is skewed to younger participants, with 67% of participants under 30, but 22% were over 51.

61% (n=30) of our participants were not enrolled in any form of study. 31% (n=15) reported that they were, with 10% (n=5) enrolled in a course relating to telecommunications or information technologies, 14% (n=7) arts, 2% (n=1) engineering and 4% (n=2) science. 8% (n=4) did not provide an answer.

For colour blindness, 92% (n=45) reported no, one reported yes and one did not answer. For this analysis we can consider our results to be free from any issues that would be associated with a significant number of our participants having colour blindness.

Figure 5.4 shows the distributions of participants’ skill metrics. Our participants ranged from one who had little experience of computers or using a mouse, through to experienced network engineers with computer gaming experience. There were relatively even distributions for general computing and Internet usage. There were fewer participants with experience of network administration, with about 15% having no experience at all (even setting up or configuring a small home network). Close to 25% of participants had no computer gaming experience at all, but the remaining participants were evenly distributed.

[Four cumulative distribution plots: general computing experience, Internet usage experience, network administration experience and computer gaming experience of participants; x-axes: experience score; y-axes: cumulative participants (%)]

Figure 5.4: The distribution of participants’ skills in each category, from lowest to highest scoring participant

We sought network administrators for the usability experiments. Five participants were current network engineers. Four of these were employed by one of Australia’s largest telecommunication providers. A further four participants identified as having been network administrators at some time in the past, and had experience with managing networking infrastructure and some form of network monitoring software.

5.2.2 Navigation within L3DGEWorld

Figure 5.5 shows the relationship between a participant’s total experience and their ability to navigate the world, as measured by the total of all their experience metrics and the total time taken to complete the obstacle course. There is a negative correlation between the two variables for participants with experience metric values below 60: the less experience, the longer the times recorded moving around the world. Above an experience value of 60, participants were relatively close to the limit of how fast they could possibly complete the obstacle course, so times do not show significant improvement.

Figure 5.5: The relationship between a participant’s experience and their ability to move within the world

Data does not appear in Figure 5.5 for three participants who did not record times for all three attempts. One of these participants is noteworthy as an extreme example of the negative correlation between experience and obstacle course completion time. The participant completed the course in 914.5 seconds (even without recording the time taken on one component) and had a very low total experience metric. This participant reported having had very little experience using a computer mouse. Nevertheless, although three times slower than the fastest participants, they were still able to complete the course.

Figure 5.6: Participant obstacle course run times, ordered by first run time

Figure 5.6 shows participants’ obstacle course run times, ordered by their first run. The vast majority of participants improved across their runs and, in general, the longer the participant’s first run time, the greater their improvement. The participants with the fastest 20% of first run times improved by a mean of approximately 5% between their first and third run, while the slowest 20% improved by a mean of 15%.

After these tests participants were asked to report: ‘What, if anything did you have problems with?’. The majority of participants (57%, n=28) did not report any problems. Reported problems fell into two distinct categories. Distinctly negative comments centered around difficulty with vertigo-like problems, for example: ‘With all the movement it felt like everything was vibrating. Made tension in my eyes’, ‘Motion sickness, Motion sickness improved with use’ and ‘The overall effect was slightly nauseating. Particularly the wobbling walk motion’. By contrast, the other category of comments included: ‘Nothing really – pretty straightforward’, ‘Moving was slow, but it wasn’t a problem’, ‘Is there a way to speed up moving to get to a distant object? Can you zoom out?’, ‘I felt a little frustrated by a narrow field of vision when hunting for the next target’.
Although these last comments are superficially negative, they were reported by experienced game users who were so comfortable with the navigation method that they wished to be able to move faster and have a wider field of vision in-world. In a real-world system it would be advantageous to let users customise their field of view and movement speed, to accommodate those who come to the system with navigation experience.

5.2.3 Object movements and the concepts they convey

Figure 5.7 shows distributions of participants’ opinions of bounce height. With all three measures, the height of the bounce did not significantly change participants’ opinions of the object. Overall, any existence of bounce was considered more good/normal than bad/abnormal, but almost 40% of participants felt it was neither. Participants were divided on whether bounce represented urgency or importance, with no clear answer emerging.

Figure 5.7: Participants’ attitudes to object small, medium or large bounce height

The results for colour are shown in Figure 5.8. Green was reported by over 80% of participants as (somewhat) good/normal, while red was the opposite. Blue was considered slightly more neutral than green. Purple and yellow did not elicit strong opinions in either direction, but tended to be either neutral or slightly positive. Orange was similar, but almost 40% of participants saw this as somewhat bad/abnormal. There were similar results for urgency and importance, but not as strong, and participants seemed split as to whether the colour red was urgent or important for these measures.

This data suggests that green, orange and red retain what would be expected from their common real-world traffic light usage. Blue was close to green, but considered slightly more neutral. Purple, orange and yellow had no clear results. For the measures of urgency and importance, green and blue were considered to be more not-urgent and unimportant, but all the other colours seemed to split participants on their meaning.

Figure 5.8: Participants’ attitudes to object colour

As shown in Figure 5.9, whether on its side (90° roll) or upside down (180° roll), object roll seemed to represent almost exactly the same meaning to participants. Overall, participants were split on what an object with roll represented, with an almost even number of participants choosing each possibility.

Figure 5.9: Participants’ attitudes to object roll of 90° or 180°

The results for size are shown in Figure 5.10. Participants reported similar opinions for both large and extra-large object size. Participants seemed split, with about half reporting that size increase was some level of good/normal, not urgent and unimportant. Almost the entire other half reported neither for each measure.

Figure 5.10: Participants’ attitudes to object large or extra-large size

For spin (Figure 5.11), there was little spread of opinion between the different speeds. Generally, there were no strong results for any speed of spin. Approximately 40% of participants chose neither for every measure and approximately 40% of participants felt that any rate of spin was somewhat good/normal, somewhat urgent and somewhat important.

During this same test participants were also asked to optionally name the presented attributes as a particular network activity. However, the data collected was too sparse to allow for any sort of analysis: after reporting on each of the three scales for the results above, very few participants then reported a network activity.

Overall, while different colours showed some clearer results, for the most part the other attributes tested (bounce height, roll, size and spin) showed little difference for varying values and largely split participants’ opinions. These results were less clearly defined than expected. It was thought that participants would have stronger preconceptions about attributes and that these could be positively exploited in a visualisation.

5.2.4 Visual orthogonality

Participants’ ability to detect object attributes correctly is summarised in Figure 5.12 with three measurements: when an attribute value is displayed alone, when displayed with one or more other attributes in aggregate, and when the overall attribute was correctly detected (but the attribute’s value was not necessarily correctly identified).

Overall, the presence of any spin was correctly detected 93% of the time. Fast spin rate was detected 86% of the time when presented on its own and 87% when in aggregate. Medium spin rate was detected 71% of the time when presented on its own and 74% in aggregate. Slow spin rate was detected 71% of the time when presented on its own and 49% in aggregate. The overall detection rate of spin was high, but the accuracy of detection drops as the spin rate becomes smaller, more so when in aggregate with other attributes. Given these results, spin rate should be implemented with a non-linear scale to keep its lower values from being overlooked.
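One way to read the recommendation above is to map the monitored metric onto spin rate with a compressive curve, so that small non-zero values still produce a clearly visible rotation. The following is a minimal sketch under assumed parameters: the function name, the logarithmic curve and the minimum and maximum rates are illustrative assumptions and are not taken from L3DGEWorld.

```python
import math


def spin_rate(value, max_value, min_rate=30.0, max_rate=360.0):
    """Map a metric in [0, max_value] to a spin rate in degrees per second.

    Hypothetical mapping: zero stays at rest, while any non-zero value
    starts at min_rate and grows logarithmically towards max_rate.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value <= 0:
        return 0.0
    # Clamp to the top of the range, then normalise to (0, 1].
    fraction = min(value, max_value) / max_value
    # Logarithmic curve: expands low values and compresses high ones.
    curved = math.log1p(9.0 * fraction) / math.log(10.0)
    return min_rate + (max_rate - min_rate) * curved
```

With these assumed settings, a metric at 10% of its maximum spins noticeably faster than it would under a linear mapping over the same range, which is the property the results above call for.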

Figure 5.11: Participants’ attitudes to object slow, medium or fast spin

[Bar chart: Attribute recognition; x-axis: attribute value (Spin-Slow, Spin-Med, Spin-Fast, Size-Large, Size-XLarge, Bounce-Small, Bounce-Med, Bounce-Large); y-axis: correct recognition, all participants (%); series: Alone, Aggregate, Overall Attribute]

Figure 5.12: Correct recognition of object attributes, alone and as aggregate with other attributes.

Overall, size change was only detected 59% of the time. Only very large changes in size were detected at all (approximately 53% of the time). The smaller size changes were only detected approximately 13% of the time. The difference between the attribute detection alone and in aggregate was less than 1%, suggesting that the size of the object is not occluded by other attributes; it is just not reliably detected at all. This is perhaps because, in the absence of other visual clues, object size can be very difficult to determine. Larger objects can be mistaken for simply being closer than thought. We attempted to provide visual cues by presenting the objects in a small room, but this seems to have not worked reliably. If size change is to be used at all, we recommend that future work explore using some form of periodic size change (a pulsing of sorts) as an attribute, rather than absolute size change.

Figure 5.13: The relationship between total experience and correct recognition of object attributes

Overall, bounce height was detected 89% of the time. It was assumed that larger bounce height would be more easily detected but, unexpectedly, participants found larger bounce height to be more difficult to detect. Similar to the results for size, there was little difference (less than 4%) between showing the attribute alone or in aggregate. Overall, the attribute was less reliably detected as bounce height became higher. During tests, while height of bounce was varied, the speed of bounce was not. This meant that while bounce with low height could be detected in two seconds, large bounce height was too subtle during the same duration. Bounce height and rate should be set together, as a large bounce height set with a very low bounce speed can render the bounce difficult to perceive.

We explored whether there was any correlation between participant experience in any demographic area and their performance in detecting attributes and their levels; none was found. As an example, Figure 5.13 shows total experience against correct recognition of object attributes.
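The bounce result above can be made concrete with a small calculation of how many bounce cycles fit into a viewing window. This is a sketch only: it assumes the object travels at a constant vertical speed (the actual L3DGEWorld animation curve is not described here), and the function names and units are hypothetical.

```python
def bounce_cycles(height, speed, window=2.0):
    """Complete up-and-down cycles visible within `window` seconds.

    Assumes constant vertical `speed` (world units per second), so one
    cycle covers a distance of 2 * height.
    """
    if height <= 0 or speed <= 0:
        return 0.0
    period = 2.0 * height / speed
    return window / period


def speed_for_cycles(height, min_cycles=2.0, window=2.0):
    """Vertical speed needed so that `min_cycles` cycles fit in `window`."""
    return 2.0 * height * min_cycles / window
```

Under this model, quadrupling the height at a fixed speed cuts the number of visible cycles to a quarter, which is consistent with large bounces being harder to perceive in a two second viewing period. Scaling the speed with the height keeps the number of visible cycles constant.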

5.2.5 Participants’ ability to detect in-world events

The open in-world tests were completed with 15 participants individually, 8 groups of 2 participants, and 6 groups of 3 participants. Participants were invited to verbally collaborate when in groups, but only one group of two participants did. The others quickly became engaged in their own view of the world and did not communicate at all. Due to this lack of communication, we consider these results to be specific to the individual for the purposes of this analysis.

We placed the written answers from this section into three categories: correct, incorrect or ambiguous. An answer was marked as correct where the participant reported the changed visual activity (or made only a minor error). An answer was counted as incorrect when the wrong activity or no answer was reported, and an answer was counted as ambiguous when it was unclear or otherwise vague.

The simulated network scan was the most visually obvious of the scenarios: 45 of the participants detected this event and four did not. The failure of a router was the next most visually obvious scenario. Six participants indicated they had seen the change in both the VoIP phone object and router object, 40 reported on the VoIP phones alone, and three were incorrect. The high use of bandwidth by a single host was the most visually subtle scenario: 41 participants detected it, three did not and five provided ambiguous answers.

The fourth scenario was a control, where the normal network activity continued. Participants were informed at the beginning of testing that one or more of the scenarios could be a continuation of normal operation. 23 participants correctly indicated it was a continuation of normal activity, four answers were ambiguous and 22 were wrong. When participants were wrong, it was because they reported aspects of the normal activity as an event other than normal. They did not invent and report activity that did not occur.
This tendency seemed to simply be because of their lack of familiarity with the activity of the normal scenario.
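The counts reported in this section can be expressed as proportions of the 49 participants. The short sketch below only restates the figures given above; the dictionary layout and the category labels are illustrative choices rather than part of the study's analysis.

```python
TOTAL = 49

# Counts as reported in this section (category labels are illustrative).
results = {
    "network scan": {"correct": 45, "incorrect": 4},
    "router failure": {"router and phones": 6, "phones only": 40, "incorrect": 3},
    "high bandwidth": {"correct": 41, "incorrect": 3, "ambiguous": 5},
    "null scenario": {"correct": 23, "ambiguous": 4, "incorrect": 22},
}


def proportions(counts, total=TOTAL):
    """Convert category counts to percentages, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not add up to the number of participants")
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


for scenario, counts in results.items():
    print(scenario, proportions(counts))
```

The check inside `proportions` confirms that each scenario's categories account for all 49 participants.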

5.2.6 Participants’ in-world positions

It was clear from these results and observation of the participants during tests that they kept their distance from the in-world objects to maintain an overview, moving closer only for inspection of areas of interest. They also showed and reported flying around the world. No participants felt the need to move across the virtual ground, indicating that they were comfortable with the flying navigation method.

5.2.7 Participants’ sensitivity to latency

When asked to report on the system’s responsiveness while moving around the world, 71% (n=10) of the 14 participants reported completely “Responsive” for all 5 tests (or had a single missing answer). Only 4 participants answered less than the maximum of “Responsive” for one of their tests. Although participants experienced up to 400 ms of RTT latency, they still indicated no issue using the system. RTTs over 400 ms are not expected in any realistic IP path between L3DGEWorld server and client(s), so we did not explore the impact of higher RTTs.

5.2.8 Final open-ended questions

The final two written questions of the usability study solicited comment on the scenarios and any final general comments. The answers provided in this section were similar to the comments made in Section 5.2.2. Negative comments included: “I didn’t really have a feel for the world”, “My eyes ached from the continual changing + vibration”, “Felt frustrated, out of my league, out of control” and “... I needed to stretch a few times it was good enough fun to continue/persist”.

As before, the other comments were only superficially negative: there was a consistent desire among participants with gaming experience to be able to navigate and explore the world at a faster pace. Comments included: “Speed, I felt I couldn’t move around fast enough”, “Slow flying is annoying. Meant I couldn’t quickly investigate a particular host/router/phone”, “The flying speed was too slow”, “Couldn’t move fast enough”, “sometimes felt slow moving to where I wanted to be”, “Need a ‘Zoom’ function to quickly inspect a single or small group of components quickly (think sniper scope)”.

5.2.9 Professional network administrators

The professional network administrators’ comments were broadly similar to those of non-administrators. Generally, they were more positive, for example: “Overview of flying is great” and “Stunning and visually informative way to look at such a complex network.” They did more strongly stress the desire mentioned above, namely to explore the world at a faster pace. For example: “slow to move to elements in error”, “Slow flying is annoying. Meant I couldn’t quickly investigate a particular host/router/phone (Not enough time to inspect up close, then fly to get an overview)”, “Too slow to get a view / Slow to navigate” and “Sometimes felt slow moving to where I wanted to be. Need more than one view point!”.

One set of comments was unique to the network administrators. These included: “Difficulty determining possible network events without knowing the topology, or capability of routers to, for example, detect/rectify DOS”, “It would be good to show network topology”, “Subtle information and info re: connections between devices lacking” and “No way of seeing how network joins together. Laid out orderly, but location independent compared to real world”. From these comments and the verbal comments of administrators during the tests, it seems that the network administrators did not just see a set of objects in-world, but were attempting to make mental connections between these objects and the real networking devices they represented. They seemed to wish to further interpret the results they were seeing in the context of the network they represented.

This is an understandable response given our goals and what we present in-world. In the same way a rail engineer would likely view Beck’s map (Section 2.1) and desire more detailed data about the train network (such as exact distance between stations, track gradients and load-limits etc.), our network professionals desired more network detail.
The data presented in the usability study was simulated, with no drill-down to real networking data available. It is a promising result that the network professionals almost immediately started to attempt to make these connections between the visualisation and an underlying network.

5.3 Conclusion

In this chapter we have illustrated the utility of our proposal through a human usability study, with 49 participants covering a wide variety of skill sets. Participants, even those with little experience with computers, could navigate the world. Most participants showed improvement during the time period of the test. A small number of participants reported that they felt uncomfortable with the method, at worst feeling slightly nauseated. These participants contrast with a group who wished they could customise their movement further for more efficiency when moving around the world.

For the affordances of the attributes, while different colours showed some clearer results, for the most part the other attributes tested (bounce height, roll, size and spin) showed little difference for varying values and largely split participants’ opinions. These results were less clearly defined than expected.

Multiple object attributes can be used to simultaneously represent variables, but caution must be used to make sure that these are visually orthogonal and do not interfere with each other’s perception by the user. As an example, we have found that the slower an object’s spin is made, the less reliably it will be detected when presented in combination with other attributes.

Participants could use the system (and the ‘fly’ movement method) to accurately detect simulated network activity. Although participants experienced up to 400 ms of RTT latency (more than expected in any realistic IP path between L3DGEWorld server and client), they still indicated no issue using the system.

Overall, the reaction participants had to the virtual environment was very positive. This is the first evaluation of this type, and shows that this approach to network management has potential.

Chapter 6

Network Resource Consumption

In this chapter we present the L3DGEWorld system architecture, evaluate its network resource consumption and present the limitations of the system. We quantify the system in terms of the delay it introduces, and its network resource characteristics when presenting changing object attributes. We test the system under a series of conditions, starting with isolated lab-bench tests utilising synthesised traffic over controlled paths, then analysing traffic captured as part of the usability trials covered in the previous chapter, before completing a set of tests running over paths that include the Internet, WiFi and 3G links.

We begin by outlining our experiments, before discussing L3DGEWorld client connection establishment and teardown, as these are special but brief cases of network traffic behaviour. We then detail the data propagation process of attribute updates through a L3DGEWorld server. Following this we move on to the characteristics of continuous operation traffic, and then proceed to detail the effect of attribute updates on snapshot size.

We quantify the delay that a L3DGEWorld server introduces to data, between receiving a L3DGEComms-in packet and sending this update to a client as a snapshot. As discussed in Section 2.1.2, a defining feature of a visualisation system is the delay that it introduces between an event being monitored and that data being presented to a user. Many are not capable of real-time operation (they read from post-processed historical logs), or could only be described as near real-time.

Through analysis of usability trial traffic we show how the L3DGEWorld world state flows change when multiple clients are connected. We quantify multi-participant command traffic, followed by quantification and prediction of snapshot traffic.

L3DGEWorld limitations are then outlined before we end the chapter by reporting on data obtained by running L3DGEWorld over a series of uncontrolled paths that include 3G, ADSL2+ and 802.11g links, and the Internet, to test that realistic values of loss and latency do not prevent or limit L3DGEWorld from operating effectively.

Figure 6.1: L3DGEWorld isolated lab-bench experiment setup

6.1 Experiment setup

Figure 6.1 shows the experiment network’s layout for our controlled experiments. This consisted of two hosts running FreeBSD 8.2-RELEASE and one running Microsoft Windows XP. The first FreeBSD host generated L3DGEComms-in packets containing attribute updates (far left). This machine provides the ability to generate repeatable traffic from a synthesised network made up of elements of our choosing. The second FreeBSD host acted as L3DGEWorld server (centre), and the third machine provided Microsoft Windows XP client functionality (far right). We chose to run the L3DGEWorld client on a Windows platform due to the mature support for accelerated 3D graphics hardware. We chose to run the L3DGEWorld server on FreeBSD due to the author’s familiarity with that platform. The underlying Quake III Arena network code is platform-independent.

The experiments for this section occurred on an isolated 100 Mbit/s Ethernet based test network (bottom). In addition to connection to this network, the machines were multi-homed to a second network for out of band control via SSH (top). The packet trace files for analysis were all collected on the server’s network interface using the program tcpdump.

Our time stamping accuracy was found to be, at worst, 119 µs (and switching delay 130 µs)¹. Since game traffic has been shown to support real-time interactivity with inter-packet times measured in the tens of milliseconds [146, Ch. 10], we can consider our level of time-stamping accuracy to be sufficient for this analysis. In the following sections we use packet size to mean the IP packet length (including IP header).

¹ The experiments and details of these results can be found in Appendix D, Hardware accuracy.

6.2 Client connection establishment and teardown

L3DGEWorld state flows can be broken down into three phases: connection establishment, continuous operation, and connection teardown. Connection establishment and teardown are brief. We cover them here as special cases before concentrating on the more significant traffic contribution of continuous operation. All results after this section have had L3DGEWorld connection and disconnection phases removed.

Within the L3DGEWorld engine, client connection establishment contains two sub-states, connected and primed, before moving into the continuous operation state of active. A client initiates a connection to the server with a ‘challenge request’ and receives a challenge reply. This challenge exchange transfers a nonce/cookie to be used in subsequent communications, as a simple method of preventing session hijacking or malicious connection attempts to the server. The client then sends a connect request to the server and the server replies with a connection reply. The server now considers the client connected. Required world state data is generated, and the server begins transfer of this to the client (with no system of acknowledgment). The client is considered primed and the server will repeatedly send entity state data to the client until it replies. The client then loads game resources from its storage to memory, and when finished, it responds to the server. The client is then considered active by the server. The network traffic profile now moves into continuous operation until the client disconnects.

A typical example of the connection establishment packet exchange is shown in Figure 6.2. In this example the server has no in-world attributes set and snapshot and command rates are set to 20 packets per-second. The majority of traffic for connection establishment is created in the server to client direction. The client to server traffic is not a significant contribution to connection establishment.
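The connection sub-states described above can be sketched as a small state machine, seen from the server's point of view. This is an illustration only and is not taken from the Quake III Arena or L3DGEWorld source: the connected, primed and active states come from the text, while the `DISCONNECTED` and `CHALLENGED` states and all event names are hypothetical labels introduced for the example.

```python
from enum import Enum, auto


class ClientState(Enum):
    DISCONNECTED = auto()  # hypothetical: no exchange has taken place yet
    CHALLENGED = auto()    # hypothetical: nonce/cookie issued to the client
    CONNECTED = auto()     # connect request accepted by the server
    PRIMED = auto()        # server is sending initial world state
    ACTIVE = auto()        # continuous operation


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (ClientState.DISCONNECTED, "challenge_request"): ClientState.CHALLENGED,
    (ClientState.CHALLENGED, "connect_request"): ClientState.CONNECTED,
    (ClientState.CONNECTED, "state_transfer_started"): ClientState.PRIMED,
    (ClientState.PRIMED, "client_reply"): ClientState.ACTIVE,
    (ClientState.ACTIVE, "disconnect"): ClientState.DISCONNECTED,
}


def step(state, event):
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected event {event!r} in state {state.name}") from None
```

Feeding the events in the order given in the text moves a client from `DISCONNECTED` through to `ACTIVE`; any out-of-order event raises an error, mirroring the role of the challenge exchange in rejecting unexpected connection attempts.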
Similar results to those in Figure 6.2 are obtained for snapshot and command rates of 10 and 5 packets per-second.

[Plot: Packet sizes of all received packets; x-axis: time (sec), 0.0 to 3.0; y-axis: packet size (bytes); series: client to server, server to client; states marked: Connected, Primed, Active]

Figure 6.2: Typical connection establishment states with 20 snapshot and command packets per-second and no in-world activity

Typically, when a client connects to a server that has all object attributes set to zero, just over 7200 bytes are transferred. On a server with all possible object attributes set (128 objects with all 9 attributes), just over 15,400 bytes are sent. The time it takes to transfer this data, and thus the per-second bandwidth spike introduced, depends on the rate at which snapshots are being sent out by the server. A trade-off exists between the bandwidth spike introduced by client connection and how quickly world state transfer (and thus, the user entering the world) is completed. The lower the snapshot packet rate is set, the lower the per-second bandwidth spike will be, but at the cost of longer initial state transfer.

For example, in our experiments, when sending snapshots at 5 packets per-second and having set all 9 attributes of all 128 in-world objects, the bandwidth spike introduced peaks at 55,000 bits per-second. When using 10 and 20 packets per-second, this is 110,000 and 220,000 bits per-second respectively. From a user experience perspective, this translates into a world state transfer time of 973 ms for 20 packets per-second, 1943 ms for 10 packets per-second, and 3940 ms for 5 packets per-second.

When a L3DGEWorld client wishes to disconnect, it simply signals this intention to the server and, within a few milliseconds, closes its listening socket.
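The bandwidth peaks reported in this section scale linearly with snapshot rate, which is what one would expect if each snapshot during the initial state transfer is a near full-size packet. The sketch below makes that arithmetic explicit. The 1375-byte packet size is not stated in the text; it is back-calculated here from the 55,000 bits per-second peak at 5 packets per-second and should be treated as an assumption, as should the function name.

```python
def peak_bandwidth_bps(snapshot_rate, packet_bytes=1375):
    """Peak bits per second if every snapshot in a second is full-size.

    packet_bytes is an assumed value inferred from the reported peaks,
    not a figure given in the text.
    """
    return snapshot_rate * packet_bytes * 8


for rate in (5, 10, 20):
    print(rate, "packets per-second:", peak_bandwidth_bps(rate), "bits per-second")
```

Under this assumption the three reported peaks are reproduced by a single packet size, which supports reading the spike as being set by the snapshot rate rather than by the total amount of state to transfer.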

6.3 Data propagation process

L3DGEWorld attribute propagation through a server is shown in Figure 6.3. A L3DGEComms-in packet containing an object ID, attribute and a value to set (for example, object 7, attribute 3 set to value 100.0) is received by the server. If the L3DGEComms-in packet contains information that is identical to that already stored on the server, no further action is taken. Each attribute for each object has a unique index number and is stored on the server as a one-dimensional array of ‘configStrings’. The configuration strings that have changed are then copied into the data structures holding information destined for each connected client.

[Figure 6.3 diagram: a L3DGEComms-in packet is parsed into configStrings[index] (update only if difference) and copied to the clients' data structures; at snapshot creation, configStrings, entity states (delta compression) and file transfer data form a message, which is fragmented (1300) and transmitted over UDP/IP]

Figure 6.3: L3DGEWorld attribute propagation through a L3DGEWorld server

Concurrently with the above process, the server will (at the configured time interval) create and transmit a snapshot for each connected client. A ‘message’ (up to 32,768 bytes) is created and three types of data are added in the following order. First, all changed configuration strings, if any exist. Second, data for entities that have changed (these are further reduced in size through delta encoding with previously sent entity data). Third, any file currently being downloaded from server to client: the Quake III Arena engine has the capability of transferring files from server to client if the client is missing game assets (maps, objects, etc.) for the current game the server is hosting.

The final message, if over 1300 bytes, is then fragmented. These fragments are then sent as UDP snapshots, at the set rate for snapshot transmission. Messages are generally only larger than 1300 bytes (and thus transmitted as fragments) after initial client connection, when a large amount of entity state is being transmitted, or when many L3DGEWorld object attributes are altered in a short time. At times other than these, the messages generated are well under 1300 bytes.

By default the Quake III Arena network code sends 20 snapshots per second (one every 50 ms) to provide smooth FPS game play. However, this can be adjusted by the server operator and, for command packets, similarly by the client. In the following experiments we ran tests with both the server and clients configured to generate 20, 10 and 5 packets per second.

Figure 6.4: L3DGEWorld client to server traffic distributions for a single stationary user and no world updates
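The propagation path of Section 6.3 (store an update only if it differs, queue it for every client, then build and fragment a message at snapshot time) can be sketched as follows. This is a toy model under stated assumptions, not the engine's implementation: the `Server` class, the `index=value;` text encoding and the function names are all hypothetical, and entity states and file transfer data are omitted.

```python
MAX_FRAGMENT_BYTES = 1300  # fragmentation threshold quoted in the text


class Server:
    def __init__(self, client_ids):
        self.config_strings = {}                         # index -> stored value
        self.pending = {cid: {} for cid in client_ids}   # per-client changed strings

    def on_comms_in(self, index, value):
        """Store an update; return False if it is identical to the stored value."""
        if self.config_strings.get(index) == value:
            return False                                  # redundant update: no action
        self.config_strings[index] = value
        for changes in self.pending.values():
            changes[index] = value                        # queue for every client
        return True

    def build_message(self, client_id):
        """Serialise (toy encoding) and clear the changes queued for one client."""
        changes = self.pending[client_id]
        # Ordered by index, not by arrival order at the server.
        payload = b"".join(
            f"{index}={changes[index]};".encode("ascii") for index in sorted(changes)
        )
        changes.clear()
        return payload


def fragment(message, limit=MAX_FRAGMENT_BYTES):
    """Split a message into chunks of at most `limit` bytes (one empty chunk if idle)."""
    return [message[i:i + limit] for i in range(0, len(message), limit)] or [b""]
```

In this sketch a repeated identical update never reaches the per-client queues, which mirrors the behaviour described above where redundant L3DGEComms-in data adds nothing to snapshots.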

6.4 Continuous operation traffic

After the client connection phase, L3DGEWorld enters the continuous operation state. In this state L3DGEWorld sends snapshot and command traffic even when there is no world data to be updated. This leads to a minimum rate at which client and server communicate. To determine this rate, a single client was connected to the server. The client’s perspective was not moved, and no updates were made to the world state of the server. Data was collected for approximately 300 seconds (5 minutes). This was repeated for snapshot and command packet update rates of 5, 10 and 20 packets per-second.

Client to server direction per-second bitrates are shown in Figure 6.4. The average bandwidth of the flow in the client to server direction is 2, 4 and 8 kbit/sec for 5, 10 and 20 commands per second, respectively. All command packet sizes were between 49 and 56 bytes and over 90% of packets were between 50 and 52 bytes. The inter-arrival times were as expected: 200, 100 and 50 ms ± 1 ms, respectively. The data rate does not vary significantly over time.

Server to client direction bitrates are shown in Figure 6.5. The server to client flows had an average bandwidth of 18.8, 37.4 and 75.3 kbit/sec for 5, 10 and 20 snapshots per second, respectively. Over 95% of packets are 46, 47 or 48 bytes and the packet inter-arrival times were as set: 200, 100 and 50 ms ± 1 ms, respectively.

Figure 6.5: L3DGEWorld server to client traffic distributions for a single stationary user and no world updates

The L3DGEWorld system of data transfer leads to a trade-off between a high update rate (in the order of 20 snapshots per-second) and lower rates. A higher number of snapshots per-second leads to a high level of interactivity and a faster transfer of world state updates, but a less efficient use of bandwidth. Fewer packets per-second leads to a less interactive experience and a slower propagation of world state updates, but a more efficient use of bandwidth.

6.4.1 Attribute update effect on snapshot size

We quantify the additional load that L3DGEComms-in attribute updates and client movement introduce to snapshot packet size for connected users. The following experiments were run in the same virtual world map used during usability trials, a world containing 128 objects. The experiment setup was also as shown in Figure 6.1.

The experiment procedure was automated and proceeded as follows. At the start of the experiment all 128 objects had their attributes set to zero. One attribute of each of 8 objects was then updated to a value (the same value for all iterations). The size of the snapshot data caused by these updates was recorded. All objects’ attributes were then reset to zero. This was repeated, each time increasing the number of objects having their attribute updated by 8, until 128 objects were reached. The entire experiment was repeated for each of the 9 possible attributes.

An example and representative experiment is shown in Figure 6.6 for attribute ‘1’, spin rate. The other 8 attributes are not shown as their results are similar.

[Figure 6.6 plot: spin rate attribute updates versus snapshot packet sizes; x-axis: Objects updated, 0 to 120; y-axis: Packet size (bytes), 0 to 2500; series: First packet, Second packet, Total]

Figure 6.6: Updating the spin rate attribute of 8 to 128 objects (increasing in steps of 8) versus resulting snapshot packet sizes.

Each data point in Figure 6.6 shows snapshot size versus the number of objects updated in-world. Circles represent the size of the first packet sent and crosses represent the size of the second packet, if one was sent. Asterisk points represent the total size of both packets. When updating in-world attributes, each attribute change adds between 16 and 20 bytes to a snapshot (depending on the attribute that is being updated). Snapshot fragmentation occurs when a packet reaches 1336 bytes (this corresponds to approximately 70 attributes updated), leading to a second packet being sent. From the total data sent in both packets, it can be seen that snapshot size scales linearly with the number of altered attribute values.
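The linear scaling reported above can be expressed as a simple estimate. The sketch below is an approximation under stated assumptions rather than a fitted model: the 48 byte idle snapshot size is borrowed from Section 6.4, the default of 18 bytes per attribute is simply the midpoint of the reported 16 to 20 byte range, and any per-fragment header overhead is ignored.

```python
FRAGMENT_AT_BYTES = 1336    # packet size at which the text reports fragmentation
IDLE_SNAPSHOT_BYTES = 48    # assumption: upper end of idle snapshot sizes (Section 6.4)


def estimated_snapshot_bytes(updated_attributes, bytes_per_attribute=18):
    """Linear estimate: idle size plus a fixed cost per changed attribute."""
    if not 16 <= bytes_per_attribute <= 20:
        raise ValueError("the text reports 16 to 20 bytes per attribute")
    return IDLE_SNAPSHOT_BYTES + updated_attributes * bytes_per_attribute


def estimated_packets(updated_attributes, bytes_per_attribute=18):
    """Packets needed, with a further packet each time the size reaches the threshold."""
    total = estimated_snapshot_bytes(updated_attributes, bytes_per_attribute)
    return total // FRAGMENT_AT_BYTES + 1
```

With these assumptions, 70 updated attributes still fit in one packet while 128 require two, in line with the behaviour shown in Figure 6.6.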

6.5 L3DGEWorld server data propagation delay

The propagation delay of an attribute update through a L3DGEWorld server depends on the snapshot rate and on snapshot fragmentation, if any. As shown in the last section, when the number of attributes updated is under 70, there is no need for snapshot fragmentation and all attribute updates are transmitted in the next snapshot created.

The delay that can be introduced to an attribute update has an upper and lower bound. At one extreme, an attribute update can arrive just before a snapshot is scheduled to be created, leading to a very small delay, consisting solely of the server processing time. At the other extreme, an attribute update can arrive at a L3DGEWorld server just after a snapshot has been sent. This attribute update will wait until the next snapshot is created before being sent. If a server transmits snapshots at 20 packets per-second, this leads to a maximum possible delay of 50 ms for an update. The same applies when snapshots are sent at 10 or 5 packets per-second, but with a maximum possible delay of 100 ms or 200 ms respectively.

As covered in Section 6.3, if many attribute updates occur in a short period of time, a L3DGEWorld message will be sent as multiple snapshot fragments. Snapshot fragments are still sent at the snapshot rate (as opposed to being sent back-to-back). Thus, the maximum possible delay experienced by an attribute update sent in a snapshot fragment is the amount of time the attributes were required to wait for the next snapshot to be created, plus the time for all of the subsequent fragments to be sent. The exact delay experienced by an individual attribute depends on the ID of the in-world object the attribute is being applied to, and the specific attribute being updated.
This is because attributes are ordered in a snapshot based on the order they are stored in L3DGEWorld’s internal arrays, not on their order of arrival at the server in L3DGEComms-in packets.

An example of a server updating a large number of attributes could proceed as follows. First, the large number of attribute updates contained in L3DGEComms-in packets arrive at the L3DGEWorld server in less than a millisecond. This occurs just after a snapshot has been created and sent. If the snapshot rate is set to 20 snapshots per-second, 50 ms passes before the next snapshot is created. The number of attribute updates means the created message is fragmented into three 1300 byte snapshots. The first is sent immediately but the next is sent after another 50 ms, and the last after a further 50 ms. The set of attribute updates, although arriving within the same millisecond at the server, will only be fully sent to the clients after 150 ms.
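The upper bound in this example follows directly from the snapshot interval and the number of fragments. A minimal sketch of that arithmetic is given below; the function names are illustrative, and server processing time is assumed to be negligible.

```python
def snapshot_interval_ms(snapshots_per_second):
    """Time between consecutive snapshots, in milliseconds."""
    return 1000.0 / snapshots_per_second


def worst_case_delay_ms(snapshots_per_second, fragments=1):
    """Wait one full interval for the next snapshot, plus one interval per extra fragment."""
    interval = snapshot_interval_ms(snapshots_per_second)
    return interval + (fragments - 1) * interval
```

For 20 snapshots per-second and three fragments this gives 150 ms, matching the worked example above; for a single unfragmented snapshot it gives 50, 100 and 200 ms at 20, 10 and 5 snapshots per-second.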

6.6 Multiple client L3DGEWorld - usability traffic analysis

We analyse traffic collected during usability experiments, to create predictions for L3DGEWorld traffic. By doing this we can predict what traffic load a system with multiple administrators will require and at what point L3DGEWorld will cease to effectively support collaboration on a given path.

Previous work has been able to create predictive models for Quake III Arena traffic (among other games) [147, 148, 149, 150]. By taking empirical traffic statistics from games with fewer players, accurate predictions can be made about the traffic of games with a larger number of players. For example, a 4-player game’s traffic probability density function (pdf) can be predicted using convolution operations on two 2-player games. A 5-player game can be predicted with a 2-player game and a 3-player game. Our analysis here finds similar properties for L3DGEWorld traffic.

The traffic trace data used in this analysis was collected during usability testing, when users completed simulated network monitoring tasks (outlined in detail in Section 5.2.5). We have five traces where participants used the system individually, ten with 2 participants and three with 3 participants. This provided us with empirical data for scenarios with 1, 2 and 3 participants (from the total of 34 participants where network traffic was captured).

Snapshot traffic and command traffic each change differently as the number of connected clients increases. Command traffic varies little with the number of clients connected to a server, because command packets only carry the data of an individual user’s movements and actions. These movements and actions vary little between a user alone in-world, or with multiple other users. However, snapshot traffic does increase as the number of connected clients increases. As more clients connect to a server their individual movements and actions must be aggregated and transmitted to all other connected clients.
The snapshot packet send rate is fixed by a server, so the only variable that changes is packet size. As more clients join a server, packet size will increase.

6.6.1 Command traffic

Observation of the participants revealed that the in-world movement style used for network monitoring was completely different to that of a FPS game (even with participants who had extensive gaming experience). L3DGEWorld movement was observed to be more subdued, considered and sporadic (think, move, think, etc.) compared to Quake III Arena play with an equivalent number of participants. Users needed more time to consider what they were viewing. Quake III Arena play is based on sub-second decisions and can have a frantic pace, but discerning what attributes are being presented in L3DGEWorld takes time for consideration (sometimes more than 2 seconds, as shown by the errors of users discerning various attributes in Section 5.2.4).

Participants’ propensity for slower movement can also be seen in the participants’ self-reporting in Section 5.2.6, where participants preferred wider overall views of the world, rather than remaining close to the objects and constantly moving to gather information about the network. This comparatively subdued movement in L3DGEWorld can further be seen in the network traffic comparison in Figure 6.7, where client to server per-second bit rates are shown. Three human usability participants were in-world at the same time. The data also shows three Quake III Arena players playing a game at the same time. The Quake III Arena play data is actual gameplay from the Simulating Online Networked Games (SONG) Database [151, 136]. The Quake III Arena traffic has a higher and more consistent bitrate, while L3DGEWorld required less overall bitrate than Quake III Arena but showed more variability between participants.

[Figure 6.7 plot: CDF of per-second bandwidth; x-axis: Per-second bandwidth (bps), 0 to 80000; y-axis: %, 0 to 100; series: Quake III Arena, L3DGEWorld]

Figure 6.7: Command packets - Multi-user Quake III Arena play compared to L3DGEWorld network monitoring

6.6.2 Snapshot traffic

The methodology of previous work used convolutions of the pdf of snapshot packet size data from games with a smaller number of players to predict the pdf of games with a larger number of players [147, 148]. We use the same technique here.

Our usability studies had up to 3 users in-world at once. We define these as $E_n$, where $n$ is the number of participants. Thus, we have empirical data for scenarios $E_1$, $E_2$ and $E_3$. We create predictions for between 2 and 6 users and test them against real data for 2 and 3 users. We define these as $P_{i*j}$, where $i$ and $j$ are the empirical data sets convolved to make up the prediction. Our predictions are thus:

$P_{1*1}$, $P_{1*2}$, $P_{1*1*1}$, $P_{2*2}$, $P_{2*3}$ and $P_{3*3}$.

Figure 6.8 shows the empirical and predicted probability density functions (pdf) of snapshot packet sizes from the usability trials. Top-left shows a combination pdf of all the usability trials where a single participant was in-world. Top-right shows the usability trials where two participants were in-world, along with the predicted pdf (dashed line) based on the self convolution of the single participant data. Bottom-left shows the data for three participants, along with the predicted pdfs (dashed lines) of the self convolution of the single participant data and an alternate prediction from the convolution of the single participant data and the two participant data. Bottom-right shows the predicted pdfs of a four participant scenario based on the self convolution of the two participant data, a five participant scenario based on the convolution of the two and three participant data, and a six participant scenario based on the self convolution of the three participant data.

[Figure 6.8 plots: four panels of snapshot packet size pdfs (x-axis: Packet size; y-axis: 0.00 to 0.20). Panels: All single participant traffic; All two participant traffic (Empirical, Predicted (1*1)); All three participant traffic (Empirical, Predicted (1*1*1), Predicted (1*2)); All four to six participant traffic (Predicted (2*2), Predicted (2*3), Predicted (3*3))]

Figure 6.8: All snapshot packet size data probability density functions - empirical and predicted

From these graphs we can see good agreement between empirical and predicted results. The predicted results accurately capture the peak of the packet sizes while also including the tail of larger packets.

Following Paxson et al. [152] we do not use quantitative statistical tests such as Kolmogorov-Smirnov or chi-squared. Paxson et al. point out that such tests can be misleading or unreliable when applied to the large amount of data typical of telecommunications experiments. Instead we follow their lead and rely on visualisation techniques. We use Q-Q plots to visually show agreement. A Q-Q plot plots the quantiles of two distributions against each other on the same set of axes. The more similar the data sets, the closer to y = x the plot will be. If the two data sets are linearly related, the Q-Q plot will still lie on a straight line, though not necessarily on y = x.

Figure 6.9 shows comparative CDF plots and a Q-Q plot of empirical versus predicted data for two participant scenarios. This data shows an excellent match. Figure 6.10 shows comparative CDF plots and a Q-Q plot of empirical versus predicted data for three participant scenarios, using the $P_{1*1*1}$ prediction. The predicted data still captures the characteristics of empirical data well.

[Figure 6.9 plots: 2 user scenario snapshots, empirical and predicted CDFs (x-axis: Packet size (bytes); y-axis: %) and Q-Q plot (x-axis: Empirical; y-axis: Predicted)]

Figure 6.9: Empirical and predicted data, comparative CDFs and Q-Q plot of two participant scenarios

[Figure 6.10 plots: 3 user scenario snapshots, empirical and predicted (1+1+1 convolution) CDFs (x-axis: Packet size (bytes); y-axis: %) and Q-Q plot (x-axis: Empirical; y-axis: Predicted (1+1+1 convolution))]

Figure 6.10: Empirical and predicted data, comparative CDFs and Q-Q plot of three participant scenarios (1*1*1)

CDF plots and a Q-Q plot of empirical versus predicted data for three participants using the alternate $P_{1*2}$ prediction are extremely similar to the graphs of the convolution of three single participant scenarios (and are not included).

The empirical and predicted packet sizes match well. As more participants join an in-world scenario, the majority of packets move from being 55-60 bytes and spread into a ‘tail’ extending out to 110 bytes. The results for 4, 5 and 6 participant scenarios continue this trend. We can conclude that, like Quake III Arena traffic, L3DGEWorld traffic has a predictable linear relationship with the number of concurrent users of the system.
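The convolution step used for these predictions can be illustrated with a short sketch. This is not the analysis code used for the thesis: the function names are hypothetical, and the probability mass functions are assumed to be lists indexed by bytes above a fixed per-packet base size, so that contributions from different participants add.

```python
def convolve(pmf_a, pmf_b):
    """Discrete convolution of two pmfs given as lists indexed by size offset."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            out[i + j] += pa * pb
    return out


def predict(*pmfs):
    """Convolve any number of pmfs, e.g. predict(e1, e2) for a 1*2 prediction."""
    result = [1.0]            # identity element: all mass at offset zero
    for pmf in pmfs:
        result = convolve(result, pmf)
    return result
```

Under these assumptions, `predict(e1, e1)` would correspond to the 1*1 prediction and `predict(e2, e3)` to the 2*3 prediction, with `e1`, `e2` and `e3` standing in for the empirical distributions.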

6.7 L3DGEWorld limitations

While developing a visualisation system based on L3DGEWorld, we discovered that the networking code has three architectural limitations that must be accounted for. The first involves under-sampling, the second relates to packet loss and the third relates to acknowledgments.

6.7.1 Under-sampling

A form of under-sampling can occur when attempting to visualise events with a frequency component greater than the snapshot rate. In this situation no matter how quickly data is sent to a L3DGEWorld server via L3DGEComms-in, new attribute values overwrite old values in the server’s RAM. Many updates to an attribute could come in to a L3DGEWorld server in an inter-snapshot period, but only the last value set will be sent to clients at snapshot creation time. Mitigating this issue of under-sampling needs to be performed at the input abstraction processes. If a metric has a high frequency component, some form of moving window average should be used before representing the value. If there is the possibility of significant events occurring to a network metric that could be described as impulse-like, these should be detected by the input abstraction layer process and signaled to the users as a more specific visual attribute. For example, if a link is fully utilised for a sub-snapshot period (and this is considered a note-worthy event), an object might be set to the colour red for an extended period to clearly indicate this event to users. This is not an issue unique to L3DGEWorld however. As a general rule, any visualisation of network data has a risk of presenting a stimulus to a user for a time period that is too short, and reducing it below the threshold of human perception.
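The two mitigations suggested above, a moving window average and a held indication for impulse-like events, could be combined in an input abstraction layer process along the following lines. This is a minimal sketch; the class name, window length, threshold and hold length are hypothetical choices, not values from L3DGEWorld.

```python
from collections import deque


class SmoothedMetric:
    """Moving-window mean plus a latch that holds impulse-like events."""

    def __init__(self, window=5, impulse_threshold=1.0, hold_samples=3):
        self.samples = deque(maxlen=window)   # only the most recent samples are kept
        self.impulse_threshold = impulse_threshold
        self.hold_samples = hold_samples
        self.hold_remaining = 0

    def add(self, value):
        self.samples.append(value)
        if value >= self.impulse_threshold:
            self.hold_remaining = self.hold_samples   # (re)start the alert hold
        elif self.hold_remaining > 0:
            self.hold_remaining -= 1                  # count down on quiet samples

    def mean(self):
        """Average over the window; 0.0 before any sample has arrived."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def alert(self):
        """True until hold_samples below-threshold samples follow the last impulse."""
        return self.hold_remaining > 0
```

The smoothed mean would be mapped to an ordinary attribute, while `alert()` could drive a distinct attribute such as setting the object's colour to red.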

6.7.2 Packet loss

With regard to packet loss, there are a number of places where loss can create issues in a L3DGEWorld setup: L3DGEComms-in flows, L3DGEWorld state flows, and L3DGEComms-out flows.

L3DGEComms-in data is carried using UDP and L3DGEWorld has no internal methods for ensuring the reliability of these packets. Loss of these packets could result in a visualisation not reflecting the monitored network’s state. To mitigate this, if loss is possible on the L3DGEComms-in path, it is recommended that any input abstraction layer processes periodically re-send any L3DGEComms-in data. As previously stated, re-sending of this data will not increase snapshot size for connected clients, but it will re-synchronise the server and input abstraction layer in the event of L3DGEComms-in flow packet loss. The only trade-off from this method of operation is the extra use of bandwidth on the L3DGEComms-in flow packet path.

L3DGEComms-out flows can suffer packet loss, but the messages must be positively acknowledged. When an output abstraction layer process receives a packet containing an action it must send an acknowledgment, or the L3DGEWorld server will re-transmit the packet until it is acknowledged.
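The periodic re-sending recommended above for L3DGEComms-in data can be sketched as a small wrapper that remembers the last value sent for every object attribute. This is an illustration only: the `send` callable and its `(object_id, attribute, value)` signature are hypothetical stand-ins, since the L3DGEComms-in packet format is not reproduced here.

```python
class ResendingInput:
    """Remembers the last value per (object, attribute) and can re-send all of them."""

    def __init__(self, send):
        self.send = send      # hypothetical callable(object_id, attribute, value)
        self.state = {}       # (object_id, attribute) -> last value sent

    def update(self, object_id, attribute, value):
        """Record and immediately send a new value."""
        self.state[(object_id, attribute)] = value
        self.send(object_id, attribute, value)

    def resend_all(self):
        """Call on a timer; re-sends every remembered value and returns the count."""
        for (object_id, attribute), value in sorted(self.state.items()):
            self.send(object_id, attribute, value)
        return len(self.state)
```

Because the server ignores updates identical to its stored values, calling `resend_all()` periodically only costs bandwidth on the L3DGEComms-in path, as noted in the text.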

6.7.3 Acknowledgments

During continuous operation (not connection establishment or teardown), all snapshots are positively acknowledged by a client. If a snapshot is un-acknowledged it is re-sent until it is. This simple method of acknowledgment leads to two behaviours of note.

First, if a path is unable to sustain the data rate produced by L3DGEWorld, frames that are dropped will be re-sent, further increasing load on the path and preventing further snapshots from getting through. This compounds the issue, leading to even more lost packets. If the client or server buffer contains too many un-acknowledged packets, that endpoint will close the connection. (By default, this is 1,792 packets.)

The second behaviour of note is that if the client to server command rate is set lower than the server to client snapshot rate, two (or more) snapshots will be emitted before an acknowledging command is returned. These subsequent snapshots will simply be duplicates of the first un-acknowledged snapshot, wasting bandwidth. To prevent this, the command packet rate should be at or above the snapshot rate.
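The second behaviour can be checked with a rough calculation when choosing rates. The sketch below is a simplified model, assuming evenly spaced packets and no loss; the function names are illustrative.

```python
import math


def snapshots_per_ack(snapshot_rate, command_rate):
    """Snapshots emitted per acknowledging command packet (at least one)."""
    return max(1, math.ceil(snapshot_rate / command_rate))


def duplicate_snapshots_per_ack(snapshot_rate, command_rate):
    """Snapshots beyond the first that are sent before an acknowledgment can arrive."""
    return snapshots_per_ack(snapshot_rate, command_rate) - 1
```

In this model, 20 snapshots per-second against 10 commands per-second yields one duplicate snapshot per acknowledgment, while equal rates yield none.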

6.8 L3DGEWorld performance over uncontrolled paths

Up until this point, all presented results have been collected on isolated networks. We aim to make sure that L3DGEWorld will still perform over paths that include the wider Internet, and specifically when the path is high latency and/or contains wireless links.

The experiment setup is shown in Figure 6.11, and consists of 5 hosts. Three of these (client 1, the host creating L3DGEComms-in updates, and the server) were connected using 100 Mbit/s Ethernet. Client 2 was connected using 802.11g wireless via an access point bridge. Client 3 was connected to a 3G capable mobile phone via USB and accessed the server via the Internet. The Internet connection was provided via an ADSL2+ connection.

[Figure 6.11 diagram; labels: clients 1, 2 and 3, Comms-in, Server, 3G, Internet, ADSL, Domain of control]

Figure 6.11: L3DGEWorld over uncontrolled paths - experiment setup

Once the three clients had all connected to the server, the same attribute update sequence as Section 6.4.1 was performed. We increased the load of attribute updates sent over time, starting with an update of 8 objects, a pause and then continued up in steps of 8 additional objects, up to the maximum of 128 objects. Snapshot and command rates were both set to a 50 ms inter-arrival time.

To gain an understanding of the path properties used by the clients, we present the path hop count and the RTT of the path. We calculate hop count from the Time to Live field of the IP packets arriving at the server and the RTT using the Synthetic Packet Pairs method [153, 154] (outlined in Appendix E).

Client 1’s flows only traverse the 100 Mbit/s switch, and it was only used as a control in the experiment. No results collected from this client are presented, as no new information beyond what has been presented in the previous sections was found.

Figure 6.12 shows a CDF of the RTT experienced by the flows traversing between the server and clients 2 and 3. Client 2’s flows traversed a path with just a simple bridged wireless link where 90% of packets experienced under 6 ms of delay and 99% experienced less than 20 ms. The path between client 3 and the server contained 16 gateway hops. The lowest RTT was 66 ms, with a maximum RTT of 846 ms and a 50th percentile of 120 ms. Figure 6.13 shows a time-series of the SPP estimated RTT of client 3.

L3DGEWorld experienced no issues when used across these paths. In all cases L3DGEWorld performed well, accurately transmitting attribute updates. From the end-user perspective there was near imperceptible delay of attribute updates, even over the slowest path tested.

[Figure 6.12 plot: CDF of SPP estimated RTT; x-axis: RTT (Sec), 0 to 0.25; y-axis: %, 0 to 100; series: 802.11g, 3G + ADSL]

Figure 6.12: L3DGEWorld 802.11g and 3G + ADSL SPP derived RTT

Figure 6.13: L3DGEWorld 3G + ADSL SPP derived RTT time-series
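Section 6.8 derives hop count from the Time to Live field of packets arriving at the server. The text does not state how the sender's initial TTL was determined, so the sketch below uses a common heuristic as an assumption: take the smallest widely used initial TTL that is not below the observed value. The list of defaults and the function name are illustrative.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)  # assumption: widely used default values


def estimate_hops(observed_ttl):
    """Hops = smallest common initial TTL >= observed TTL, minus the observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl
```

Under this heuristic, a packet arriving with a TTL of 48 would be estimated to have crossed 16 hops from a sender that started at 64.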

6.9 Conclusion

We began this chapter by presenting a high level L3DGEWorld system architecture, and proceeded to quantify its network usage characteristics, outline the architecture’s limitations and test the system when run over realistic network paths.

The initial experiments to characterise network requirements took place on an isolated test network. We first outlined the propagation of an attribute update through the L3DGEWorld server software from its entry in a L3DGEComms-in packet, to its output as a snapshot packet. L3DGEWorld uses a simple acknowledgment system (where all dropped frames are re-sent until acknowledgment) that leads to two behaviours of note. A path must be able to sustain the data rate offered by L3DGEWorld, or either client or server will drop the connection. Along with this, the command rate of clients must be set at or above the snapshot rate, or the server will simply re-send un-acknowledged snapshots, wasting bandwidth.

In a typical example of client connection, just over 15,400 bytes are transferred from server to client when the server has all possible object attributes set to a value. Closing a connection generates no significant traffic.

L3DGEWorld has a base traffic rate for continuous operation even when no updates are being transmitted. For client to server traffic this is 2, 4 and 8 kbit/sec for 5, 10 and 20 commands per second, respectively. For server to client traffic this is 18.8, 37.4 and 75.3 kbit/sec for 5, 10 and 20 packets per-second, respectively. A higher number of snapshots per-second leads to a high level of interactivity and a faster transfer of world state updates, but a higher use of bandwidth. Fewer packets per-second leads to a less interactive experience and a slower propagation of world state updates, but a lower use of bandwidth.

A single attribute update increases snapshot size by between 16 and 20 bytes.
Approximately 70 attributes can be sent in a single snapshot before fragmentation occurs and multiple snapshots must be sent. No matter how many times an attribute value is set to the same value by a L3DGEComms-in packet, this is only sent to clients once. The maximum delay an attribute update can experience is one snapshot interval time, plus the time taken to send all snapshot fragments. Even with the maximum number of attributes being changed, at 20 snapshots per-second (a 50 ms snapshot interval) all updates will still be transferred in under a second. This is acceptable, as we have previously seen in Section 5.2.4 that usability participants can require more than two seconds to consider what an object is presenting to them (less than this introduces errors in correct attribute detection).

In addition to these experiments, traffic from the usability experiments was also analysed. Of the two types of traffic, snapshot and command, command traffic carries only the movements and actions for a single user, and this does not vary significantly based on the number of other users in-world. Command traffic for participants was at all times under 75 kbps and for 90% of the time was under 40 kbps. Snapshot traffic changes based on the number of users in-world. Previous work with Quake III Arena successfully created predictions of traffic for higher numbers of participants from empirical traffic with lower numbers of players. We were able to achieve similar results for L3DGEWorld.

We have outlined an issue of under-sampling in L3DGEWorld. If an event occurs in a sub-snapshot period, it has the potential to not be signaled to clients. This should be mitigated at the input abstraction layer. A second issue is that of packet loss. This can occur in state flows, but re-transmission when acknowledgments are not received solves this issue (at the cost of the bandwidth of re-transmission).
If loss occurs in the L3DGEComms-in flows, there is no mechanism in place for re-transmission (L3DGEComms-out messages, by contrast, are re-transmitted by the server until acknowledged, as described in Section 6.7.2). It is recommended that input abstraction layer processes periodically re-send all world-state, to make sure that server and input abstraction layer do not lose synchronisation. This will not increase snapshot size if server and input abstraction process are synchronised.

Finally we performed a set of tests across uncontrolled paths that included 3G mobile and 802.11g links. These tests consisted of running a L3DGEWorld server on a home network, and then connecting one client to the server via an 802.11g link and another client via the Internet, including 3G and ADSL2+ links. We have shown that L3DGEWorld supports real-time collaborative visualisation and control over a variety of IP network environments.

L3DGEWorld is efficient in the way it manages updating clients of world state. If a server receives redundant updates via the L3DGEComms protocol, this information will not be sent to clients. L3DGEWorld’s overall bandwidth requirements are quite modest when compared with bandwidths commonly available on modern networks.

Chapter 7

Conclusion

In this thesis we have discussed how IP networks are in a constant state of change, and how the management of IP networks is a significant challenge. Data must be collected and collated from many sources, presented, interpreted and then acted upon. To assist network administrators to provide improved services, we have created and evaluated a visualisation system for network management.

We have created and evaluated the visualisation tool L3DGEWorld to assist with network management challenges. Our visualisation presents data as attributes of in-world objects in an immersive virtual environment. We do this to leverage the sophisticated pattern recognition abilities of humans, and their capacity to add high-level context to low-level events. Doing so also reduces the level of specialised knowledge required for an employee to make a positive contribution to management tasks.

Many forms of network visualisation have been outlined in the literature. They are visually diverse and allow for various mappings of network metrics to visualisation. Despite steps by the research community, commonly used network monitoring software remains limited. Few have real-time data access and very few enable the control of the underlying network. Many have an intrinsic and inflexible coupling between data and visualisation.

We have generated a taxonomy and surveyed works that were moving towards merging collaboration, immersion and control features into integrated systems. No research has explored what the remote networking requirements are for systems that enable multi-party collaboration for network management tasks. Very few enable control of underlying networking elements from within their visualisations or support collaboration in any form. The area’s potential has been widely acknowledged through the implementation of many prototype systems. However, most previous approaches are only described in the literature and have no public software releases of any kind.
Further, authors have not generally engaged in much self-criticism of ideas, nor performed usability tests of their work.


We have made three novel contributions in this thesis. First, we developed L3DGEWorld. Second, we evaluated its usability through human usability experiments. Third, we evaluated its resource consumption, specifically when supporting collaborative network management.

7.1 L3DGEWorld

We have successfully used a 3D game engine as a development base to create a visualisation platform. Prototyping moved through a series of software versions. All were capable of visualising the data from a greynet (our novel passive network monitoring method) as a demonstration. Our final system is an OpenArena-based system, L3DGEWorld, capable of visualising a wide variety of data inputs. L3DGEWorld contains a unique combination of features including: immersive 3D presentations, distributed server-client based collaboration among users, control of external systems through in-world interaction methods, interactive display of a wide variety of visual elements, and real-time operation while deployed on commodity hardware. L3DGEWorld can present many objects, each with multiple attributes, such as colour, size, shape and spatial orientation. In-world administrators can control their networks by interacting with objects in collaboration with other administrators (represented as avatars). These actions are then converted into network re-configuration commands. In addition to its use in this thesis, L3DGEWorld has also been modified to display supercomputer and VoIP system state in other work.
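As an illustration of presenting data as attributes of in-world objects, the following minimal Python sketch maps per-host measurements onto object attributes using the mapping from the usability scenarios in Appendix B (bounce for an unresponsive device, size for the number of unique connections, spin speed for throughput). It is not L3DGEWorld code: the names `HostMetrics`, `scale` and `to_attributes`, and the scaling limits, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Hypothetical per-host measurements fed into the visualisation."""
    responding: bool
    unique_connections: int
    bytes_per_second: float


def scale(value, maximum):
    """Clamp value/maximum into the range 0.0 to 1.0."""
    if maximum <= 0:
        return 0.0
    return max(0.0, min(1.0, value / maximum))


def to_attributes(m, max_connections=100, max_bytes_per_second=12_500_000):
    """Map measurements to object attributes (illustrative limits only).

    12,500,000 bytes/s corresponds to a fully loaded 100 Mbit/s link.
    """
    return {
        "bounce": not m.responding,                                      # device not responding
        "size": scale(m.unique_connections, max_connections),            # unique connections
        "spin_speed": scale(m.bytes_per_second, max_bytes_per_second),   # throughput
    }


attrs = to_attributes(HostMetrics(True, 25, 6_250_000))
print(attrs)
```

Keeping the mapping in one small function reflects the loose coupling between data and visualisation argued for above: a different metaphor only requires a different mapping, not a different data source.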

7.2 Usability

Our 49-participant usability study showed that the WASD navigation method in ‘fly-mode’ could be used by all participants, including one who reported little experience with a computer mouse. Generally, participants improved their navigation times as the tests progressed, and improvement was greatest for inexperienced users. A small number of participants reported that they felt uncomfortable with the method, at worst feeling slightly nauseated. In contrast, another group wished they could customise their movement further for more efficiency when moving around the world.

Multiple object attributes can be used to simultaneously represent variables, but care must be taken to make sure that these are visually orthogonal and do not interfere with each other’s perception by the user. We also explored the affordances of the attributes of objects. While different colours showed some clearer results for their implied meaning, for the most part the other attributes tested (bounce height, roll, size and spin) showed little difference for varying values and largely split participants’ opinions.

Participants could use the system to accurately detect simulated network activity. During these same tests, although participants experienced up to 400 ms of RTT latency (more than expected in any realistic IP path between L3DGEWorld server and client), they still indicated no issue using the system. Overall, the reaction participants had to the virtual environment was very positive.

7.3 Network Resource Consumption

To evaluate the networking requirements of L3DGEWorld, we introduced an experiment architecture. Using an isolated test network we first outlined the propagation of an attribute update through a L3DGEWorld server, from its entry in a L3DGEComms-in packet, to its output as a snapshot packet. Due to L3DGEWorld’s simple acknowledgment system, a path must be able to sustain the data rate offered by L3DGEWorld, or the connection will be dropped. Along with this, the command rate of clients must be set higher than the snapshot rate, or the server will simply re-send un-acknowledged snapshots, wasting bandwidth.

L3DGEWorld has a base traffic rate for continuous operation even when no updates are being transmitted. For client to server traffic this was 2, 4 and 8 kbit/sec for 5, 10 and 20 commands per second, respectively. For server to client traffic this was 18.8, 37.4 and 75.3 kbit/sec for 5, 10 and 20 packets per second, respectively. A single attribute update increases snapshot size by between 16 and 20 bytes. Approximately 70 attributes can be sent in a single snapshot before fragmentation occurs and multiple snapshots must be sent. No matter how many times an attribute is set to the same value by L3DGEComms-in packets, the value is only sent to clients once.

The data propagation delay of a L3DGEWorld server is based on the configured snapshot rate, and snapshot fragmentation (if any is required). Even with the maximum number of attributes being changed and at 20 ms snapshot time, all updates will still be transferred in under a second.

Traffic from the usability experiments was also analysed. Of the two types of traffic, snapshot and command, command traffic carries only the movements and actions for a single user, and this does not vary significantly based on the number of other users in-world. Command traffic for participants was at all times under 75 kbps and for 90% of the time was under 40 kbps. Snapshot traffic does change based on the number of users in-world.
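The base-rate and fragmentation figures reported earlier in this section can be sanity-checked with some simple arithmetic. The Python sketch below is illustrative only and rests on stated assumptions: 1 kbit is taken as 1000 bits, the three snapshot rates are taken to be 5, 10 and 20 per second (matching the command rates), and each snapshot fragment is assumed to leave in its own snapshot interval. The function names are hypothetical.

```python
import math

# Base server-to-client rates reported in the text: snapshots per second -> kbit/s.
# Assumes snapshot rates of 5, 10 and 20 per second and 1 kbit = 1000 bits.
BASE_SNAPSHOT_KBITS = {5: 18.8, 10: 37.4, 20: 75.3}

# Approximate number of attribute updates that fit in one snapshot (from the text).
UPDATES_PER_SNAPSHOT = 70


def bytes_per_snapshot(snapshots_per_second, kbits_per_second):
    """Average size of one idle snapshot packet, in bytes."""
    return kbits_per_second * 1000 / 8 / snapshots_per_second


def snapshots_needed(updates, per_snapshot=UPDATES_PER_SNAPSHOT):
    """Number of snapshots needed to carry the given number of updates."""
    return math.ceil(updates / per_snapshot)


def worst_case_delay_s(updates, snapshots_per_second):
    """Upper bound on propagation delay, assuming one fragment per interval."""
    return snapshots_needed(updates) / snapshots_per_second


for rate, kbits in BASE_SNAPSHOT_KBITS.items():
    print(rate, "snapshots/s:", round(bytes_per_snapshot(rate, kbits)), "bytes each")
print(worst_case_delay_s(150, 10), "s for 150 updates at 10 snapshots/s")
```

Under these assumptions each idle snapshot works out to roughly 470 bytes at all three rates, which suggests the base traffic scales linearly with the configured snapshot rate.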
Previous work with Quake III Arena successfully predicted traffic for higher numbers of participants from empirical traffic with lower numbers of players.

This was also true for L3DGEWorld data. The predicted data closely reflects the empirical data for two and three participant scenarios. The results for 4 to 6 participant predictions match the trends from previous work with Quake III Arena, indicating that although L3DGEWorld traffic has an overall lower bit-rate than Quake III Arena traffic, it follows the same general traffic patterns. L3DGEWorld does support real-time collaboration over the wider Internet and is not limited to specially configured or dedicated networks.

From the outcomes of this work, we foresee a number of areas for future work. Our testing used regular desktop screens. An open question remains as to whether visualisation of complex network scenarios can be similarly effective or intuitive on portable devices with smaller screens. L3DGEWorld is also capable of presenting complicated higher-level visual metaphors via more detailed objects, or even avatars, with complicated movements or facial expressions. Further investigation could also explore integrated systems that allow for historical data access, or integrated network simulators.

In this thesis, a novel immersive 3D network management system was created and described. Network operators and those with no data networking knowledge could use the system to identify network events accurately. It can be used to support collaborative network monitoring and control over the Internet and when doing so, its network resource requirements were modest.

Appendix A

Ethics clearance

As per university requirements, the following page reproduces email approval of Swinburne University Human Research Ethics Committee (SUHREC) Project 0708/093, ‘The application of immersive virtual environment metaphors to the monitoring and control of data communication networks’, Assoc Prof Grenville Armitage, Mr Warren Harrop and Mr Lucas Parry. It was made clear to participants that the results were recorded anonymously. No participants were subordinate to the experimenters or offered any direct incentives to participate (although after participation, light refreshments were served for some groups at the expense of the experimenters). All conditions pertaining to the clearance were properly met and the project’s final report was submitted.

Return-Path:
Message-Id: <[email protected]>
Date: Tue, 23 Oct 2007 10:34:03 +1000
From: "Keith Wilkins"
To: "Grenville Armitage" ,
Subject: SUHREC Project 0708/093 Ethics Clearance

To: Assoc Prof Grenville Armitage/Mr Warren Harrop, FICT

Dear Grenville and Warren

SUHREC Project 0708/093 The application of immersive virtual environment metaphors to the monitoring and control of data
Assoc Prof G Armitage FICT Mr Warren Harrop Mr Lucas Parry
Approved Duration: 22/10/2009 To 01/02/2008

Ethical review of the above project protocols was undertaken on behalf of Swinburne's Human Research Ethics Committee (SUHREC) by a SUHREC Subcommittee (SHESC4) at a meeting held 19 October 2007.

I am pleased to advise that the project was approved as submitted. The standard on-going ethics clearance conditions are as follows.

- All human research activity undertaken under Swinburne auspices must conform to Swinburne and external regulatory standards, including the current National Statement on Ethical Conduct in Research Involving Humans and with respect to secure data use, retention and disposal.

- The named Swinburne Chief Investigator/Supervisor remains responsible for any personnel appointed to or associated with the project being made aware of ethics clearance conditions, including research and consent procedures or instruments approved. Any change in chief investigator/supervisor requires timely notification and SUHREC endorsement.

- The above project has been approved as submitted for ethical review by or on behalf of SUHREC. Amendments to approved procedures or instruments ordinarily require prior ethical appraisal/clearance. SUHREC must be notified immediately or as soon as possible thereafter of (a) any serious or unexpected adverse effects on participants and any redress measures; (b) proposed changes in protocols; and (c) unforeseen events which might affect continued ethical acceptability of the project.

- At a minimum, an annual report on the progress of the project is required as well as at the conclusion (or abandonment) of the project.

- A duly authorised external or internal audit of the project may be undertaken at any time.

Please contact me if you have any queries about on-going ethics clearance. The SUHREC project number should be quoted in communication.

Best wishes for the project.

Yours sincerely

Keith Wilkins
Secretary, SHESC4

*******************************************
Keith Wilkins
Research Ethics Officer
Swinburne Research (H68)
Swinburne University of Technology
P O Box 218
HAWTHORN VIC 3122
Tel: 9214 5218

Appendix B

Questionnaire

The following pages are the anonymous questionnaire for user reporting during human usability experiments.

Anonymous questionnaire

Note: All questions are optional, feel free to skip any you do not wish to answer. If you have any questions at any time, please ask the experiment investigator.

Instructions: Please answer each question by putting a tick in the appropriate box. Some questions may have ‘Please Specify’ underneath or near a box. If you choose this option then please also write an expanded answer on the line provided.

In this section we wish to gain insight into your general computer and Internet proficiency. We also wish to know if you have any network administration experience.

Section 1. (To be completed before evaluation of software.)

1. Demographic

1.1. Gender:  Male,  Female

1.2. Age:  < 20,  21-25,  26-30,  31-35,  36-40,  41-45,  46-50,  51-55,  56-60,  61-65,  66-70,  71-75,  76-80,  80 >

1.3. I am currently studying (check all that apply):  Not a student,  Telecommunications or Information technologies,  Arts,  Engineering,  Science

1.4. Do you have any form of colour blindness?  Yes,  No

2. Information and Communication Technologies Proficiency

2.1. General computing

2.1.1. I would describe my overall general computer proficiency as:  Beginner,  Beginner to Intermediate,  Intermediate,  Intermediate to Expert,  Expert

2.1.2. The year that I first had access to a computer that I regularly used was (roughly):  < 1960,  1960-1964,  1965-1969,  1970-1974,  1975-1979,  1980-1984,  1985-1989,  1990-1994,  1995-1999,  2000-2004,  2005-present

2.1.3. I use a computer (check all that apply):  At work,  At home,  Friends/Relatives homes  At a school/university where I am a student,  Internet café

2.1.4. I am familiar with (have used long enough to arrive at an opinion of) the following operating systems (check all that apply):  Windows ME/95/98,  Windows 2000,  Windows XP,  Vista,  ,  BSD,  Mac OS ≤ 9,  Mac OS X ,  AIX,  Solaris,  Nextstep,  Multics,  MS DOS,  Windows 3.x

Anonymous questionnaire SUHREC Project 0708/093 Page 1

2.2. Internet usage

2.2.1. I would describe my Internet proficiency as:  Beginner,  Beginner to Intermediate,  Intermediate,  Intermediate to Expert,  Expert

2.2.2. I access the Internet from (check all that apply):  Home,  Work,  School/university where I am a student,  Internet Café,  Other locations using mobile devices

2.2.3. The first year I started to regularly use the Internet was:  <1989,  1990 – 1994,  1995 – 1999,  2000 – 2004,  2005 – present

2.2.4. I connect to the Internet at home via:  ADSL,  Cable,  Dial-up Modem,  'Wireless broadband' ('Unwired' or '3G' etc.),  I'm unsure of the connection type,  I don't have a home Internet connection

If you answered “I don't have a home Internet connection” please skip to question 2.2.4

2.2.1. My total monthly download limit, including any 'on-peak' or 'off-peak' usage before I am 'rate shaped' or charged extra is (in gigabytes):  0 - 2,  3 – 5,  6 – 10,  11 – 20,  21 - 30,  31 – 40,  41 – 50,  51 – 60,  61 – 70,  71 >,  Don't know,  N/A or No download limit

2.2.2. At home, I leave my Internet connection 'always on'.  Yes,  No

2.2.3. Before obtaining my current home Internet connection:  I researched for an Internet Service Provider (ISP) myself and the decision was ultimately mine,  I didn't really research the ISP – but the decision was ultimately mine,  Someone else did the research and chose for me,  My current Internet connection came about for other reasons, eg Bought by work, came bundled with other telecommunications deal, etc.

2.2.4. On average, if I had to estimate the time I spend using an Internet connected device (including all locations) it would be (hrs per day):  <1,  1-2,  3-4,  5-6,  7-8,  9-10,  11-12,  13-14,  15-16,  >16

2.2.5. I use the Internet for (check all that apply):  E-mail,  'Web surfing',  P2P file sharing,  'Net-banking',  'Blogging',  Participating in social networking pages (myspace, facebook etc),  Playing network games,  Working from home ('VPNing'),  Voice over IP (VoIP) eg, skype,  Instant messaging (ICQ, MSN),  Purchasing goods/services,  Listening to audio/watching video,  Education / Research

2.3. Network administration

2.3.1. I have a 'home network' (2 or more computers in my home linked to share resources):  Yes,  No

If you answered 'No' please skip to question 2.3.2.

2.3.1.1. When it comes to my home network's set-up and administration I consider myself to be:  Beginner,  Beginner to Intermediate,  Intermediate,  Intermediate to Expert,  Expert  I did not set-up nor do I administer my home network, someone else does it for me

2.3.1.2. I use my home network to (check all that apply):  Share files between PCs or other devices,  Share a printer,  Share the Internet

2.3.1.3. My home network uses wireless technology:  No,  Yes – I configured it myself - I am unsure if it is running securely,  Yes – I configured it myself - to run securely (or insecurely, but I understand the possible consequences of this),  Yes – Someone else configured it for me

2.3.2. My work can best be described as (check all that apply):  Network administrator,  IT or ICT management,  Telecommunication engineer,  Academic,  Student,  Other

2.3.3. Now, or sometime in the past, my professional work has involved some type of network administration.  Yes,  No

If you answered 'No' please skip to question 2.4. If you answered 'Yes' and your network administration work occurred in a previous employment position, note that all the following questions are phrased in the present tense. Please answer all of the following questions as if you were still in this previous role.

2.3.3.1. In my work as a network administrator the operating systems I work with are (check all that apply):  Windows,  BSD,  Linux,  Other Unix variant(s)

2.3.3.2. My network administration work involves OSI layer (check all that apply):  1 - Physical,  2 – Link Layer,  3 - Network,  4 - Transport,  5 – 7 – Session through Application,  Unsure

2.3.3.3. I have been required to administer the following equipment (check all that apply):  Switches,  Routers,  Servers

2.3.3.4. To diagnose network issues I have used the following programs (check all that apply):  tcpdump,  Ethereal (Wireshark),  MRTG/RRD,  What's up gold,  HP Openview,  Nagios,  openNMS,  nmap

2.4. Computer gaming

In the following section when we refer to 'games' we refer only to electronic video games (and do not count web based games or games sites such as pogo).

2.4.1. I would describe my usage of games as:  Never/infrequent,  Infrequent – Moderate,  Moderate,  Moderate - Extensive,  Extensive

2.4.2. I am familiar with (have played enough to decide if I like or dislike) the following games or their sequels (check all that apply):  Halo,  Quake,  Enemy Territory,  Battlefield,  Half Life,  Doom,  Tribes

2.4.3. I regularly use a games console for playing games (eg. Wii, Playstation, Xbox):  Yes,  No

2.4.4. On average in a week, I would play games approximately (hrs):  0 - 1,  2 - 3,  4 - 5,  5 - 6,  7 - 8,  9 - 10,  11 - 12,  13 - 14,  15 - 16,  >16

2.4.5. How much experience have you had with PC “first person shooter” game movement (where your left hand on the keyboard controls character lateral movement and your right hand controls character viewpoint with the mouse)?  None,  Beginner,  Beginner – Intermediate,  Intermediate,  Intermediate - Extensive,  Extensive

2.4.6. I prefer to use my own key/mouse configuration for the following experiments involving first person shooter movement controls.  Yes,  No, I'll use the default

If you answered yes, please inform the experiment investigator so you can configure your client machine. A copy of your configuration will be kept as part of your anonymous questionnaire.

Section 2. (To be completed using the prototype software, guided by the experiment investigator.)

Reminder: There are no right or wrong answers to the following questions. We are seeking your opinions of the software that will be presented. The experimenter will now show you through the software that has been developed at the Centre for Advanced Internet Architectures.

3. Virtual world

3.1. Movement around the virtual world

In this section we are trying to find out how people's familiarity (or unfamiliarity) with 3D 'First Person Shooter' (FPS) games influences their ability to use the system under test.

3.1.1. Moving around the world – practice. We will give you a chance to familiarise yourself with a simple “virtual obstacle course” and the method of moving around this world. The course consists of three sections:

1: “Looking” and “shooting”
2: “Looking” and “moving”
3: “Looking” and “shooting” and “moving”

The default keys are: Forwards – 'w', back – 's', step left – 'a', step right – 'd'. The mouse controls the direction you are looking in. Feel free to ask the experiment investigator for any information you need.

3.1.2. Moving around the world – timed. We would now like you to complete a similar obstacle course set 3 times, as quickly as possible. The experiment investigator will record your times below.

                    1                  2                  3
Time for part A:    ______ seconds     ______ seconds     ______ seconds
Time for part B:    ______ seconds     ______ seconds     ______ seconds
Time for part C:    ______ seconds     ______ seconds     ______ seconds

3.1.3. I found the first part of each test (looking and shooting)  Very Easy,  Easy,  Neutral,  Challenging,  Very challenging

3.1.4. I found the second part of each test (looking and moving)  Very Easy,  Easy,  Neutral,  Challenging,  Very challenging

3.1.5. I found the third part of each test (looking and shooting and moving)  Very Easy,  Easy,  Neutral,  Challenging,  Very challenging

3.1.6. What, if anything did you have problems with (check all that apply)?  Nothing,  Moving,  Looking,  Disorientation,  Moving and looking at the same time,  Shooting/Aiming  Other – Please specify ______

3.1.7. If you can, please expand on your response on 3.1.6: ______

3.2. Objects in the virtual world

The objects in the virtual world can perform a number of different movements. We wish to get your views on what you feel these movements represent, both in terms of emotions and in a computer networking sense. We also wish to determine what movement combinations people can determine quickly and accurately.

3.2.1. Objects movements and the information they convey

In this section we wish to gain insight into people's views on various objects movements. You will be shown a moving object for 4 seconds. When the screen goes blank please write your answers on the attached sheet “3.2.1 ANSWER SHEET”. As well as a list of three things the object could be representing, the sheet contains the letters A through I. These letters correspond to computer networking phrases below. (Even if you are not a networking professional still feel free to choose any of these phrases if you understand them and see fit.) If you are a networking professional and you think that some other phrase may fit, use F through H and specify your answer.

Choose as many or as few as you think can apply. Please read through the list of emotions and networking statements before proceeding.

Networking related statements: To me, this object action represents ...

A: How many other network hosts it is communicating to (active connections)
B: How much communication it is doing per second (bandwidth consumption - bps)
C: The number of packets per second (PPS) it is communicating
D: A network alarm
E: An unresponsive networking component
F: Other – please specify ______
G: Other – please specify ______
H: Other – please specify ______
I: I'm not sure

3.2.2. Movement combinations and their ability to be quickly determined

It is our hypothesis that when combined in certain ways (eg 'jumping' and 'spinning') people will be able to quickly detect how an object is moving. In other combinations, the movements will be less clear – possibly 'covering up' each other. We are attempting to find what combinations of movements slows people's ability to correctly interpret what actual moves are being presented. In the next experiment, you will be shown short flashes of objects performing different movements. Please place a mark on the attached sheet “3.2.2 ANSWER SHEET” indicating what movements you saw in each instance.

3.3. “Macro” views of the virtual world – network monitoring

In this section we are attempting to compare networking professionals, non-networking professionals, 'gamers' and non-gamers when they use the system to observe network activity.

3.3.1. Practice. This is a chance for you to look around a virtual world that represents the network of a medium sized business. For this scenario we have chosen a set of mappings from virtual world to underlying network measurements. They are as follows:

Virtual World    Network Measurement
Bounce           Device not responding (broken)
Size             Time aggregate of unique connections (how many 'things' it is 'talking' to)
Spin speed       Throughput (bytes per second)

You will start off in 'walk' mode, but you may choose to switch between this and 'fly' mode at any time with the 'f' key (if you are using the default key set-up).

Note for those using own key combinations: bind f “togglefly”

3.3.2. You will now be presented with 5 different scenarios. The scenarios start with the world in the “normal” state, with network activity that has been deemed “OK”. Then, 60 to 90 seconds after the start of the scenario, one or more network activities may occur – changing the virtual world. Please indicate below what virtual world actions you see as anomalous. At the end of the network simulation, all movement in-world will stop.

3.3.2.1. Scenario 1

3.3.2.1.1. In a non-networking sense, if anything, what did you see change in the virtual world that you would consider different to the 'normal' state? ______

3.3.2.1.2. If you can, please name this as a network activity (check all you feel apply):  Network scan,  Large bandwidth usage by a network host,  Large usage of connections by a network host,  A network host down,  Denial of service attack against a host,  Denial of service attack sourced from a host  Other ______

3.3.2.1.3. Did you find that the system was responsive (ie was there any noticeable delay preventing you from moving or collaborating effectively in the world)? Responsive -  1,  2,  3,  4,  5 – Unresponsive

3.3.2.1.4. If you found the system to be unresponsive in any way, how did you find it unresponsive? ______

3.3.2.2. Scenario 2

3.3.2.2.1. In a non-networking sense, if anything, what did you see change in the virtual world that you would consider different to the 'normal' state? ______

3.3.2.2.2. If you can, please name this as a network activity (check all you feel apply):  Network scan,  Large bandwidth usage by a network host,  Large usage of connections by a network host,  A network host down,  Denial of service attack against a host,  Denial of service attack sourced from a host  Other ______

3.3.2.2.3. Did you find that the system was responsive (ie was there any noticeable delay preventing you from moving or collaborating effectively in the world)? Responsive -  1,  2,  3,  4,  5 – Unresponsive

3.3.2.2.4. If you found the system to be unresponsive in any way, how did you find it unresponsive? ______

3.3.2.3. Scenario 3

3.3.2.3.1. In a non-networking sense, if anything, what did you see change in the virtual world that you would consider different to the 'normal' state? ______

3.3.2.3.2. If you can, please name this as a network activity (check all you feel apply):  Network scan,  Large bandwidth usage by a network host,  Large usage of connections by a network host,  A network host down,  Denial of service attack against a host,  Denial of service attack sourced from a host  Other ______

3.3.2.3.3. Did you find that the system was responsive (ie was there any noticeable delay preventing you from moving or collaborating effectively in the world)? Responsive -  1,  2,  3,  4,  5 – Unresponsive

3.3.2.3.4. If you found the system to be unresponsive in any way, how did you find it unresponsive? ______

3.3.2.4. Scenario 4

3.3.2.4.1. In a non-networking sense, if anything, what did you see change in the virtual world that you would consider different to the 'normal' state? ______

3.3.2.5. If you can, please name this as a network activity (check all you feel apply):  Network scan,  Large bandwidth usage by a network host,  Large usage of connections by a network host,  A network host down,  Denial of service attack against a host,  Denial of service attack sourced from a host  Other ______

3.3.2.5.1. Did you find that the system was responsive (ie was there any noticeable delay preventing you from moving or collaborating effectively in the world)? Responsive -  1,  2,  3,  4,  5 – Unresponsive

3.3.2.5.2. If you found the system to be unresponsive in any way, how did you find it unresponsive? ______

3.3.2.6. Scenario 5

3.3.2.6.1. In a non-networking sense, if anything, what did you see change in the virtual world that you would consider different to the 'normal' state? ______

3.3.2.7. If you can, please name this as a network activity (check all you feel apply):  Network scan,  Large bandwidth usage by a network host,  Large usage of connections by a network host,  A network host down,  Denial of service attack against a host,  Denial of service attack sourced from a host  Other ______

3.3.2.7.1. Did you find that the system was responsive (ie was there any noticeable delay preventing you from moving or collaborating effectively in the world)? Responsive -  1,  2,  3,  4,  5 – Unresponsive

3.3.2.7.2. If you found the system to be unresponsive in any way, how did you find it unresponsive? ______

3.3.3. Did you have any difficulties during the above scenarios? ______

3.3.4. On a scale of 1 to 5, did you prefer walking or flying during these scenarios? Walking -  1,  2,  3,  4,  5 – Flying

3.3.4.1. Why? ______

3.4. When completing the previous tasks did you find yourself keeping a distance to have a world overview, or keeping closer to inspect objects up close? Keeping distance -  1,  2,  3,  4,  5 – Closer

3.4.1. If possible, please expand on your choice above ______

3.5. We would now like to get your written comments on the virtual worlds, both positive and negative aspects.

3.5.1. Negative aspects of your usage of the world ______

3.5.2. Positive aspects of your usage of the world ______

3.6. Is there anything else, that you have not yet communicated, that you would like to state about anything regarding today's research? ______

Appendix C

Answer sheets

The following pages are the answer sheets for user reporting of questionnaire sections 3.2.1 and 3.2.2.

3.2.1 ANSWER SHEET (check all that apply)

[Answer grid not reproducible from the extracted text: one row per object shown (numbered 1 to 61), with rotated column headings that are illegible in this copy, followed by columns A to I for the networking related statements of question 3.2.1.]

3.2.2 Answer sheet (check all that apply)

[Answer grid: one row per instance shown (numbered 1 to 73), with columns Spin (Slow, Medium, Fast), Scale (Large, X Large), Bounce (Small, Medium, Large), Roll in degrees (90, 180) and Colour (Specify).]

Appendix D

Hardware accuracy

To characterise the time stamping accuracy and delay of the commodity switching and host devices used in this thesis, a precision traffic generator/monitor was used. The device is a NetCom System Smartbits 2000, populated with 4 SX-7410B cards (capable of 100 Mbit/s). All experiments in this thesis were conducted with Ethernet cards configured to auto-negotiate speed, which in all cases was 100 Mbit/s full-duplex.

D.1 Methodology

For the hosts involved in packet collection and time-stamping, the ‘Netcom SmartWindow’ applica- tion was used to send frames of various sizes and inter-arrival times (up to and exceeding the loads presented by L3DGEWorld) to the hosts from a single card of the Smartbits. Frames were captured by the device and timestamped by their internal systems. This host generated timestamp was compared with the sub-microsecond accurate Smartbits generated traffic. Two switches were used in experiments. Each was individually connected to the four SX-7410B Smartbits cards1. The Smartbits software SmartApplications was then used to automate tests set out in RFCs ‘Benchmarking Terminology for Network Interconnection Devices’ [155] and ‘Benchmarking Methodology for Network Interconnect Devices’ [156].

¹ Multiple connection combinations were tested between the 4 Smartbits ports and the 5 or 8 switch ports. The results are not presented here, as varying port to port connections did not alter any results.


[Plot: "Dell recorded packet inter-arrivals - 10ms". Horizontal axis: inter-arrival time (ms), 9.980 to 10.020. Vertical axis: % of packets, 0 to 100.]

Figure D.1: Dell time-stamping accuracy for packets with a 10ms inter-arrival time

D.2 Results

A Dell Optiplex GX260 (with 512 MB of RAM and a Pentium 4 CPU running at 2.66GHz) running FreeBSD 8.1 was used to record the traffic traces in Chapter 6. This device never experienced errors when recording packet size. For traffic generated with a 10 ms inter-arrival time, the majority of packets were timestamped as shown in Figure D.1. Approximately 4% of packets were timestamped as 10.118 ms or 10.119 ms.
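The percentage distribution plotted in Figure D.1 can be derived from a list of capture timestamps. The thesis does not state how the plot was produced, so the following is only a minimal sketch under stated assumptions: timestamps are integer microseconds, bins are 5 microseconds wide, and the function name and sample values are hypothetical.

```python
from collections import Counter


def inter_arrival_percentages(timestamps_us, bin_width_us=5):
    """Bin the gaps between consecutive timestamps and return percentages.

    timestamps_us: packet capture times in integer microseconds (assumed unit).
    Returns {bin_start_us: percentage_of_all_gaps}.
    """
    gaps = [later - earlier
            for earlier, later in zip(timestamps_us, timestamps_us[1:])]
    if not gaps:
        return {}
    # Floor each gap to the start of its bin, e.g. 10119 -> 10115 for 5 us bins.
    counts = Counter((gap // bin_width_us) * bin_width_us for gap in gaps)
    return {start: 100.0 * n / len(gaps) for start, n in sorted(counts.items())}


# Made-up capture times, nominally 10 ms (10000 us) apart; not thesis data.
example_timestamps_us = [0, 10000, 20001, 29999, 40118]
histogram = inter_arrival_percentages(example_timestamps_us)
```

Working in integer microseconds avoids floating point rounding at bin edges, which matters when the deviations of interest are only a few microseconds.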

D.2.1 Alloy 8 port switch (GS-08DXI)

When switching frames from one port to another, the switch showed no frame loss at any frame size (from 64 to 1518 bytes), with loads up to the full rate of 100Mbit/s. When switching frames from two or three ports to a single congested port, frames were lost only when expected: when the total ingress bandwidth exceeded the egress port’s 100Mbit/s bandwidth. The maximum latency introduced by the switch occurred while processing 1518 byte frames at the maximum rate. This delay was 129.6 microseconds. The switch appears to use store and forward; taking this into account, the switch’s processing time at this rate is 8.2 microseconds. The number of machines connected to this device during tests was not large enough to warrant testing of the switch’s CAM (Content Addressable Memory) table limits. The switch’s claimed CAM table limit is 4K entries, and its claimed buffer memory is 2 Megabit.
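The store and forward adjustment can be checked with simple arithmetic: a store and forward switch must receive the whole frame before sending it on, so the frame's serialisation time is part of the measured latency. The sketch below assumes the serialisation time is the frame length in bits divided by the link rate, with no preamble or inter-frame gap included. This assumption reproduces the quoted figure but is not confirmed by the text, and the function names are hypothetical.

```python
def serialisation_delay_us(frame_bytes, link_mbit_s=100):
    """Time to clock one frame onto the link, in microseconds.

    bits / (Mbit/s) gives microseconds directly.
    """
    return frame_bytes * 8 / link_mbit_s


def processing_time_us(measured_latency_us, frame_bytes, link_mbit_s=100):
    """Measured latency minus the store and forward serialisation time."""
    return measured_latency_us - serialisation_delay_us(frame_bytes, link_mbit_s)


# 1518 byte frames at 100 Mbit/s with the 129.6 microsecond latency quoted above.
frame_delay = serialisation_delay_us(1518)
switch_processing = processing_time_us(129.6, 1518)
print(f"serialisation: {frame_delay:.2f} us, processing: {switch_processing:.1f} us")
```

A 1518 byte frame takes 121.44 microseconds to serialise at 100Mbit/s, which leaves roughly 8.2 microseconds attributable to the switch itself.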

D.2.2 Alloy 5 port switch (NS-05CR)

This particular switch can be powered using an AC adaptor or from a USB port. All experiments were completed with the switch powered from a USB port. The switch showed no frame loss at any frame size, with loads up to the full rate of 100Mbit/s. The maximum latency introduced by the switch occurred while processing 1518 byte frames at the maximum rate. This delay was 124.2 microseconds (2.8 microseconds of frame processing if accounting for store and forward operation). The number of machines connected to this device during tests was not large enough to warrant testing of the switch’s CAM table limits. The switch’s claimed CAM table limit is 1K entries, and its claimed buffer memory is 1 Megabit.

Appendix E

Synthetic Packet Pairs

Synthetic Packet Pairs (SPP) [153, 154] allows for continuous RTT estimates to be made from packets captured at two different locations on a network path. Figure E.1 shows how two packets passing monitoring points 1 and 2 can be captured and time stamped. From this data, the RTT between point 1 and 2 can be calculated using formula E.1:

\[ \mathrm{RTT} = (t_{12} - t_{11}) + (t_{21} - t_{22}) \tag{E.1} \]

where $t_{11}$ and $t_{12}$ are the time stamps of a packet as it traverses monitor point 1 and then 2. Similarly for $t_{21}$ and $t_{22}$ but with the packet direction reversed. These experiments used the SPP software implementation. More detail on the SPP method and its open source implementation can be found in [153, 154]. For these experiments, the SPP monitoring points were each of the respective clients, and the server.

[Diagram: monitoring points 1 and 2 against a time axis, showing timestamps t11 and t12 with the interval labelled "t12 - t11", and timestamps t21 and t22 with the interval labelled "t22 - t21".]

Figure E.1: Using the SPP (Synthetic Packet Pairs) method to estimate path RTT
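Formula E.1 can be illustrated with a short numeric sketch. It assumes that the first index of each time stamp identifies the packet and the second identifies the monitoring point, so the reverse packet is stamped at point 2 (t22) before point 1 (t21). This reading is an interpretation of the definitions above, not something the text states explicitly. The function name and sample values are hypothetical.

```python
def spp_rtt(t11, t12, t21, t22):
    """RTT between monitoring points 1 and 2, following formula E.1."""
    return (t12 - t11) + (t21 - t22)


# Illustrative values in seconds. Point 2's clock runs 5 s ahead of point 1's
# to show that an unknown clock offset cancels out of the sum.
clock_offset = 5.0

t11 = 0.000                  # forward packet seen at point 1 (point 1 clock)
t12 = 0.010 + clock_offset   # same packet seen at point 2 (point 2 clock)
t22 = 0.015 + clock_offset   # reverse packet seen at point 2 (point 2 clock)
t21 = 0.027                  # same reverse packet seen at point 1 (point 1 clock)

rtt = spp_rtt(t11, t12, t21, t22)
print(f"estimated RTT: {rtt * 1000:.1f} ms")
```

Each one-way difference on its own is distorted by the 5 second offset, but the offset enters the two terms with opposite signs. The sum therefore depends only on the 10 ms forward and 12 ms reverse delays, so the two monitoring points do not need synchronised clocks under this reading.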

References

[1] B. Cheswick, H. Burch, and S. Branigan, “Mapping and visualizing the internet,” in USENIX Annual Technical Conference, General Track, 2000, pp. 1–12.

[2] Visualizing IPv4 Internet Topology at a Macroscopic Scale. Accessed: Dec 2013. [Online]. Available: http://www.caida.org/research/topology/as%5Fcore%5Fnetwork/

[3] C. V. Wright, F. Monrose, and G. M. Masson, “Using visual motifs to classify encrypted traffic,” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 41–50.

[4] glTail.rb - realtime logfile visualization. Accessed: Dec 2013. [Online]. Available: http://www.fudgie.org/

[5] G. Conti, K. Abdullah, J. Grizzard, J. Stasko, J. A. Copeland, M. Ahamad, H. L. Owen, and C. Lee, “Countering security information overload through alert and packet visualization,” IEEE Comput. Graph. Appl., vol. 26, no. 2, pp. 60–70, 2006.

[6] W. Yurcik, “Tool update: NVisionIP improvements (difference view, sparklines, and shapes),” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 65–66.

[7] R. Ball, G. A. Fink, and C. North, “Home-centric visualization of network traffic for security administration,” in VizSEC/DMSEC ’04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. New York, NY, USA: ACM Press, 2004, pp. 55–64.

[8] J. R. Goodall, W. G. Lutters, P. Rheingans, and A. Komlodi, “Focusing on context in network traffic analysis,” IEEE Comput. Graph. Appl., vol. 26, no. 2, pp. 72–80, 2006.


[9] K. C. Cox and S. G. Eick, “Case study: 3D displays of internet traffic,” in INFOVIS ’95: Proceedings of the 1995 IEEE Symposium on Information Visualization. Washington, DC, USA: IEEE Computer Society, 1995, p. 129.

[10] S. Lau, “The spinning cube of potential doom,” Commun. ACM, vol. 47, no. 6, pp. 25–26, 2004.

[11] E. L. Malécot, M. Kohara, Y. Hori, and K. Sakurai, “Interactively combining 2D and 3D visualization for network traffic monitoring,” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 123–127.

[12] J. Oberheide, M. Karir, and D. Blazakis, “VAST: visualizing autonomous system topology,” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 71–80.

[13] L. A. Crutcher, A. A. Lazar, S. K. Feiner, and M. Zhou, “Management of broadband networks using 3D virtual world,” in High Performance Distributed Computing, 1993., Proceedings the 2nd International Symposium on, Spokane, WA, Jul. 1993, pp. 306–315.

[14] C. R. D. Santos, P. Gros, P. Abel, D. Loisel, N. Trichaud, and J. P. Paris, “Metaphor-aware 3d navigation,” in INFOVIS ’00: Proceedings of the IEEE Symposium on Information Vizualization 2000. Washington, DC, USA: IEEE Computer Society, 2000, p. 155.

[15] D. Koppenhofer. psDooM (aka: DooM for Sys A’s). Accessed: Dec 2013. [Online]. Available: http://psdoom.sourceforge.net/

[16] S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” ACM Trans. Inf. Syst. Secur., vol. 3, no. 3, pp. 186–205, 2000.

[17] R. Spence and A. Press, Information Visualization (2nd Edition). Prentice Hall, January 2007.

[18] (2007) Leveraging 3D Game Engines (L3DGE): Novel techniques for anomalous traffic detection and collaborative network control. Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/urp/l3dge/

[19] J. R. Goodall, W. G. Lutters, and A. Komlodi, “I know my network: collaboration and expertise in intrusion detection,” in CSCW ’04: Proceedings of the 2004 ACM conference on Computer supported cooperative work. New York, NY, USA: ACM Press, 2004, pp. 342–345.

[20] ——, “The work of intrusion detection: Rethinking the role of security analysts,” in Proceedings of the Americas Conference on Information Systems (AMCIS). Association for Information Systems, 2004.

[21] W. Harrop and G. Armitage, “Defining and evaluating greynets (sparse darknets),” in LCN ’05: Proceedings of the The IEEE Conference on Local Computer Networks 30th Anniversary. Washington, DC, USA: IEEE Computer Society, 2005, pp. 344–350.

[22] ——, “Greynets: a definition and evaluation of sparsely populated darknets,” in MineNet ’05: Proceeding of the 2005 ACM SIGCOMM workshop on Mining network data. New York, NY, USA: ACM Press, 2005, pp. 171–172.

[23] F. Baker, W. Harrop, and G. Armitage, “IPv4 and IPv6 Greynets,” RFC 6018 (Informational), Internet Engineering Task Force, Sep. 2010. [Online]. Available: http://www.ietf.org/rfc/rfc6018.txt

[24] H. A. Simon, The sciences of the artificial (3rd ed.). Cambridge, MA, USA: MIT Press, 1996.

[25] K. Garland, Mr. Beck’s Underground Map. Capital Transport Publishing, 1994.

[26] The DOT Language. Accessed: Dec 2013. [Online]. Available: http://www.graphviz.org/doc/info/lang.html

[27] R. S. Thompson, E. M. Rantanen, W. Yurcik, and B. P. Bailey, “Command line or pretty lines?: comparing textual and visual interfaces for intrusion detection,” in CHI ’07: Proceedings of the SIGCHI conference on Human factors in computing systems. New York, NY, USA: ACM, 2007, p. 1205.

[28] G. Conti, M. Ahamad, and J. Stasko, “Attacking information visualization system usability overloading and deceiving the human,” in SOUPS ’05: Proceedings of the 2005 symposium on Usable privacy and security. New York, NY, USA: ACM, 2005, pp. 89–100.

[29] F. Baker and D. Meyer, “Internet Protocols for the Smart Grid,” RFC 6272 (Informational), Internet Engineering Task Force, Jun. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6272.txt

[30] V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, “Framework for IP Performance Metrics,” RFC 2330 (Informational), Internet Engineering Task Force, May 1998. [Online]. Available: http://www.ietf.org/rfc/rfc2330.txt

[31] L. MartinGarcia. tcpdump/libpcap public repository. Accessed: Dec 2013. [Online]. Available: http://www.tcpdump.org/

[32] Microsoft network monitor 3.4 - download page. Accessed: Dec 2013. [Online]. Available: http://www.microsoft.com/download/en/details.aspx?id=4865

[33] B. Callaghan and R. Gilligan, “Snoop Version 2 Packet Capture File Format,” RFC 1761 (Informational), Internet Engineering Task Force, Feb. 1995. [Online]. Available: http://www.ietf.org/rfc/rfc1761.txt

[34] “ANSI T1.105: Synchronous optical network (SONET): Basic description including multiplexing structure, rates and formats,” American National Standards Institute, 2008.

[35] E. Rosen, A. Viswanathan, and R. Callon, “Multiprotocol Label Switching Architecture,” RFC 3031 (Proposed Standard), Internet Engineering Task Force, Jan. 2001. [Online]. Available: http://www.ietf.org/rfc/rfc3031.txt

[36] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, “An Architecture for Differentiated Service,” RFC 2475 (Informational), Internet Engineering Task Force, Dec. 1998, updated by RFC 3260. [Online]. Available: http://www.ietf.org/rfc/rfc2475.txt

[37] B. Claise, “Cisco Systems NetFlow Services Export Version 9,” RFC 3954, October 2004. [Online]. Available: http://www.ietf.org/rfc/rfc3954.txt

[38] ——, “Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information,” RFC 5101 (Proposed Standard), Internet Engineering Task Force, Jan. 2008. [Online]. Available: http://www.ietf.org/rfc/rfc5101.txt

[39] P. Phaal, S. Panchen, and N. McKee, “InMon Corporation’s sFlow: A Method for Monitoring Traffic in Switched and Routed Networks,” RFC 3176, September 2001. [Online]. Available: http://www.ietf.org/rfc/rfc3176.txt

[40] Cisco - Sampled NetFlow. Accessed: Dec 2013. [Online]. Available: http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/12s_sanf.html

[41] N. Duffield, D. Chiou, B. Claise, A. Greenberg, M. Grossglauser, and J. Rexford, “A Framework for Packet Selection and Reporting,” RFC 5474 (Informational), Internet Engineering Task Force, Mar. 2009. [Online]. Available: http://www.ietf.org/rfc/rfc5474.txt

[42] D. Harrington, R. Presuhn, and B. Wijnen, “An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks,” RFC 3411 (Standard), Internet Engineering Task Force, Dec. 2002, updated by RFCs 5343, 5590. [Online]. Available: http://www.ietf.org/rfc/rfc3411.txt

[43] World Wide Web Consortium (W3C). Accessed: Dec 2013. [Online]. Available: http://www.w3.org/

[44] The Apache Software Foundation. Accessed: Dec 2013. [Online]. Available: http://www.apache.org/

[45] Squid: Optimising web delivery. Accessed: Dec 2013. [Online]. Available: http://www.squid-cache.org/

[46] C. Lonvick, “The BSD Syslog Protocol,” RFC 3164 (Informational), Aug. 2001. [Online]. Available: http://www.ietf.org/rfc/rfc3164.txt

[47] N. Provos and T. Holz, Virtual Honeypots: From Botnet Tracking to Intrusion Detection. Addison Wesley, July 2007.

[48] Honeyd Honeypot project. Accessed: Dec 2013. [Online]. Available: http://www.honeyd.org/

[49] B. Feinstein and G. Matthews, “The intrusion detection exchange protocol (idxp),” RFC 4767, March 2007. [Online]. Available: http://www.rfc-editor.org/rfc/rfc4767.txt

[50] H. Debar, D. Curry, and B. Feinstein, “The Intrusion Detection Message Exchange Format (IDMEF),” RFC 4765 (Experimental), Mar. 2007. [Online]. Available: http://www.ietf.org/rfc/rfc4765.txt

[51] D. Moore, C. Shannon, G. M. Voelker, and S. Savage, “Network telescopes,” Tech. Rep., April 2004. [Online]. Available: http://www.caida.org/publications/papers/2004/tr-2004-04/

[52] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson, “The internet motion sensor: A distributed blackhole monitoring system,” in Proceedings of Network and Distributed System Security Symposium (NDSS ’05), 2005, pp. 167–179.

[53] E. Cooke, M. Bailey, Z. M. Mao, D. Watson, F. Jahanian, and D. McPherson, “Toward understanding distributed blackhole placement,” in WORM ’04: Proceedings of the 2004 ACM workshop on Rapid malcode. New York, NY, USA: ACM Press, 2004, pp. 54–64.

[54] R. Pang, V. Yegneswaran, P. Barford, V. Paxson, and L. Peterson, “Characteristics of internet background radiation,” in IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. New York, NY, USA: ACM, 2004, pp. 27–40.

[55] D. Moore, C. Shannon, D. J. Brown, G. M. Voelker, and S. Savage, “Inferring internet denial-of-service activity,” ACM Trans. Comput. Syst., vol. 24, no. 2, pp. 115–139, 2006.

[56] Nmap security scanner for network exploration & hacking. Accessed: Dec 2013. [Online]. Available: http://nmap.org/

[57] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, “Hypertext Transfer Protocol – HTTP/1.1,” RFC 2616 (Draft Standard), Jun. 1999, updated by RFC 2817. [Online]. Available: http://www.ietf.org/rfc/rfc2616.txt

[58] J. Schoenwaelder, “Overview of the 2002 IAB Network Management Workshop,” RFC 3535 (Informational), Internet Engineering Task Force, May 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3535.txt

[59] R. Enns, Ed., “NETCONF Configuration Protocol,” RFC 4741, 2006. [Online]. Available: http://www.ietf.org/rfc/rfc4741.txt

[60] B. O’Hara, P. Calhoun, and J. Kempf, “Configuration and Provisioning for Wireless Access Points (CAPWAP) Problem Statement,” RFC 3990 (Informational), Feb. 2005. [Online]. Available: http://www.ietf.org/rfc/rfc3990.txt

[61] J. Schonwalder, A. Pras, and J.-P. Martin-Flatin, “On the future of internet management technologies,” Communications Magazine, IEEE, vol. 41, no. 10, pp. 90–97, Oct. 2003.

[62] NetBIOS Working Group in the Defense Advanced Research Projects Agency, Internet Activities Board, and End-to-End Services Task Force, “Protocol standard for a NetBIOS service on a TCP/UDP transport: Concepts and methods,” RFC 1001 (Standard), Internet Engineering Task Force, Mar. 1987. [Online]. Available: http://www.ietf.org/rfc/rfc1001.txt

[63] UPnP Forum. Accessed: Dec 2013. [Online]. Available: http://www.upnp.org/

[64] D. Steinberg and S. Cheshire, Zero Configuration Networking: The Definitive Guide. O’Reilly Media, Inc., 2005.

[65] Snort. Accessed: Dec 2013. [Online]. Available: http://www.snort.org/

[66] Bro Intrusion Detection System. Accessed: Dec 2013. [Online]. Available: http://bro-ids.org/

[67] K. Nyarko, T. Capers, C. Scott, and K. Ladeji-Osias, “Network intrusion visualization with NIVA, an intrusion detection visual analyzer with haptic integration,” in HAPTICS ’02: Proceedings of the 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems. Washington, DC, USA: IEEE Computer Society, 2002, p. 277.

[68] C. Scott, K. Nyarko, T. Capers, and J. Ladeji-Osias, “Network intrusion visualization with NIVA, an intrusion detection visual and haptic analyzer,” Information Visualization, vol. 2, no. 2, pp. 82–94, 2003.

[69] W. Harrop and G. Armitage, “Intuitive Real-Time Network Monitoring Using Visually Orthogonal 3D Metaphors,” in Australian Telecommunications Networks & Applications Conference 2004 (ATNAC 2004), Sydney, Australia, 8-10 December 2004, pp. 276–282. [Online]. Available: http://caia.swin.edu.au/pubs/ATNAC04/harrop-armitage-ATNAC2004.pdf

[70] Y. Waern and D. Pargman, “Design and use of MUDs for serious purposes (workshop session) (abstract only),” in CSCW ’96: Proceedings of the 1996 ACM conference on Computer supported cooperative work. New York, NY, USA: ACM Press, 1996, p. 2.

[71] H. Takemura and F. Kishino, “Cooperative work environment using virtual workspace,” in CSCW ’92: Proceedings of the 1992 ACM conference on Computer-supported cooperative work. New York, NY, USA: ACM Press, 1992, pp. 226–232.

[72] E. Swing, “Adding immersion to collaborative tools,” in VRML ’00: Proceedings of the fifth symposium on Virtual reality modeling language (Web3D-VRML). New York, NY, USA: ACM, 2000, pp. 63–68.

[73] B. Brown and M. Bell, “CSCW at play: ‘there’ as a collaborative virtual environment,” in CSCW ’04: Proceedings of the 2004 ACM conference on Computer supported cooperative work. New York, NY, USA: ACM, 2004, pp. 350–359.

[74] B. Shneiderman, “The eyes have it: a task by data type taxonomy for information visualizations,” in Visual Languages, 1996. Proceedings., IEEE Symposium on, Sep. 1996, pp. 336–343.

[75] S. Kornexl, V. Paxson, H. Dreger, A. Feldmann, and R. Sommer, “Building a time machine for efficient recording and retrieval of high-volume network traffic,” in IMC’05: Proceedings of the Internet Measurement Conference 2005 on Internet Measurement Conference. Berkeley, CA, USA: USENIX Association, 2005, pp. 23–23.

[76] G. Maier, R. Sommer, H. Dreger, A. Feldmann, V. Paxson, and F. Schneider, “Enriching network security analysis with time travel,” in SIGCOMM ’08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication. New York, NY, USA: ACM, 2008, pp. 183–194.

[77] L. A. Crutcher, A. A. Lazar, S. K. Feiner, and M. X. Zhou, “Managing networks through a virtual world,” IEEE Parallel Distrib. Technol., vol. 3, no. 2, pp. 4–13, 1995.

[78] V. Vorovyev. trafshow. Accessed: Jun 2012. [Online]. Available: http://soft.risp.ru/trafshow/

[79] T. Oetiker. MRTG: The Multi Router Traffic Grapher. Accessed: Dec 2013. [Online]. Available: http://oss.oetiker.ch/mrtg/

[80] ——. RRDtool. Accessed: Dec 2013. [Online]. Available: http://oss.oetiker.ch/rrdtool/

[81] Nfsen - netflow sensor. Accessed: Dec 2013. [Online]. Available: http://nfsen.sourceforge.net/

[82] Surfmap – a network monitoring tool based on the google maps api. Accessed: Dec 2013. [Online]. Available: http://sourceforge.net/projects/surfmap/

[83] D. Plonka, A. Gupta, and D. Carder, “Application buffer-cache management for performance: running the world’s largest mrtg,” in LISA’07: Proceedings of the 21st conference on Large Installation System Administration Conference. Berkeley, CA, USA: USENIX Association, 2007, pp. 1–16.

[84] J. Toledo and R. Ghetta. EtherApe a graphical network monitor. Accessed: Dec 2013. [Online]. Available: http://etherape.sourceforge.net/

[85] A. Gubin, W. Yurcik, and L. Brumbaugh, “PingTV: a case study in visual network monitoring,” in VIS ’01: Proceedings of the conference on Visualization ’01. Washington, DC, USA: IEEE Computer Society, 2001, pp. 421–424.

[86] ——, “Network management visualization with PingTV,” in LCN ’01: Proceedings of the 26th Annual IEEE Conference on Local Computer Networks. Washington, DC, USA: IEEE Computer Society, 2001, p. 62.

[87] glTrail - realtime website usage visualization. Accessed: Dec 2013. [Online]. Available: http://www.fudgie.org/gltrail.html

[88] K. Abdullah and J. A. Copeland, “Tool update: high alarm count issues in IDS rainstorm,” in VizSEC ’06: Proceedings of the 3rd international workshop on Visualization for computer security. New York, NY, USA: ACM, 2006, pp. 61–62.

[89] G. Conti, J. Grizzard, M. Ahamad, and H. Owen, “Visual exploration of malicious network objects using semantic zoom, interactive encoding and dynamic queries,” in VIZSEC ’05: Proceedings of the IEEE Workshops on Visualization for Computer Security. Washington, DC, USA: IEEE Computer Society, 2005, p. 10.

[90] K. Lakkaraju, R. Bearavolu, A. Slagell, W. Yurcik, and S. North, “Closing-the-Loop in NVisionIP: Integrating Discovery and Search in Security Visualizations,” in VIZSEC ’05: Proceedings of the IEEE Workshops on Visualization for Computer Security. Washington, DC, USA: IEEE Computer Society, 2005, p. 9.

[91] K. Lakkaraju, W. Yurcik, and A. J. Lee, “NVisionIP: netflow visualizations of system state for security situational awareness,” in VizSEC/DMSEC ’04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. New York, NY, USA: ACM, 2004, pp. 65–72.

[92] R. Bearavolu, K. Lakkaraju, W. Yurcik, and H. Raje, “A visualization tool for situational awareness of tactical and strategic security events on large and complex computer networks,” in Military Communications Conference, 2003. MILCOM 2003. IEEE, vol. 2, Oct. 2003, pp. 850–855.

[93] J. R. Goodall, W. G. Lutters, P. Rheingans, and A. Komlodi, “Preserving the big picture: Visual network traffic analysis with TNV,” in VIZSEC ’05: Proceedings of the IEEE Workshops on Visualization for Computer Security. Washington, DC, USA: IEEE Computer Society, 2005, p. 6.

[94] J. R. Goodall, “User requirements and design of a visualization for intrusion detection analysis,” in Information Assurance Workshop, 2005. IAW ’05. Proceedings from the Sixth Annual IEEE SMC, Jun. 2005, pp. 394–401.

[95] J. R. Goodall, A. A. Ozok, W. G. Lutters, P. Rheingans, and A. Komlodi, “A user-centered approach to visualizing network traffic for intrusion detection,” in CHI ’05: CHI ’05 extended abstracts on Human factors in computing systems. New York, NY, USA: ACM, 2005, pp. 1403–1406.

[96] S. T. Eick, “Aspects of network visualization,” IEEE Computer Graphics and Applications, vol. 16, no. 2, pp. 69–72, Mar. 1996.

[97] K. C. Cox, S. G. Eick, and T. He, “3D geographic network displays,” SIGMOD Rec., vol. 25, no. 4, pp. 50–54, 1996.

[98] J. D. Rogers, “Internetworking and the politics of science: Nsfnet in internet history,” The Information Society, vol. 14, no. 3, pp. 213–228, 1998. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/019722498128836

[99] The GPL Cube of Potential Doom. Accessed: Dec 2013. [Online]. Available: http://www.kismetwireless.net/doomcube/

[100] Netcube 0.3.0. Accessed: Dec 2013. [Online]. Available: http://pypi.python.org/pypi/NetCube/

[101] J.-P. van Riel and B. Irwin, “InetVis, a visual tool for network telescope traffic analysis,” in Afrigraph ’06: Proceedings of the 4th international conference on Computer graphics, virtual reality, visualisation and interaction in Africa. New York, NY, USA: ACM, 2006, pp. 85–89.

[102] E. Le Malecot, M. Kohara, Y. Hori, and K. Sakurai, “Grid based network address space browsing for network traffic visualization,” in 2006 IEEE Information Assurance Workshop, West Point, NY, Jun., pp. 261–267.

[103] S. Feiner, M. Zhou, L. Crutcher, and A. Lazar, “A virtual world for network management,” in Virtual Reality Annual International Symposium, 1993., 1993 IEEE, Seattle, WA, USA, Sep. 1993, pp. 55–61.

[104] L. Crutcher and A. Lazar, “Management and control for giant gigabit networks,” Network, IEEE, vol. 7, no. 6, pp. 62–71, Nov 1993.

[105] A. A. Lazar, W. Choe, K. Fairchild, and N. Hern, “Exploiting virtual reality for network management,” in Singapore ICCS/ISITA ’92. ’Communications on the Move’, Nov. 1992, pp. 979–983.

[106] C. R. D. Santos, P. Gros, P. Abel, D. Loisel, and J.-P. Paris, “Using virtual reality for network management: automated construction of dynamic 3D metaphoric worlds,” in VRST ’99: Proceedings of the ACM symposium on Virtual reality software and technology. New York, NY, USA: ACM, 1999, pp. 184–185.

[107] C. R. D. Santos, P. Gros, P. Abel, D. Loisel, N. Trichaud, and J. P. Paris, “Experiments in information visualization using 3D metaphoric worlds,” in WETICE ’00: Proceedings of the 9th IEEE International Workshops on Enabling Technologies. Washington, DC, USA: IEEE Computer Society, 2000, pp. 51–58.

[108] C. R. D. Santos, P. Gros, P. Abel, D. Loisel, N. Trichaud, and J.-P. Paris, “Mapping information onto 3D virtual worlds,” in Information Visualization, 2000. Proceedings. IEEE International Conference on, 2000, pp. 379–386.

[109] P. Abel, P. Gros, C. Santos, D. Loisel, and Paris, “Automatic construction of dynamic 3D metaphoric worlds: An application to network management,” in Visual Data Exploration and Analysis VII, volume 3960, Jan 2002, pp. 312–323.

[110] D. Chao, “Doom as an interface for process management,” in CHI ’01: Proceedings of the SIGCHI conference on Human factors in computing systems. New York, NY, USA: ACM Press, 2001, pp. 152–157.

[111] K.-I. Friese, M. Herrlich, and F.-E. Wolter, “Using game engines for visualization in scientific applications,” in New Frontiers for Entertainment Computing, ser. IFIP International Federation for Information Processing, P. Ciancarini, R. Nakatsu, M. Rauterberg, and M. Roccetti, Eds. Springer Boston, 2008, vol. 279, pp. 11–22.

[112] D. Fritsch, M. Kada, and C. V, “Visualisation using game engines,” ISPRS commission, vol. 5, pp. 621–625, 2004.

[113] (2006) Brutal file manager. Accessed: Dec 2013. [Online]. Available: http://www.forchheimer.se/bfm/

[114] B. Kot, B. Wuensche, J. Grundy, and J. Hosking, “Information visualisation utilising 3D computer game engines case study: a source code comprehension tool,” in CHINZ ’05: Proceedings of the 6th ACM SIGCHI New Zealand chapter’s international conference on Computer-human interaction. New York, NY, USA: ACM Press, 2005, pp. 53–60.

[115] B. C. Wünsche, B. Kot, A. Gits, R. Amor, and J. Hosking, “A framework for game engine based visualisations,” in Proceedings of Image and Vision Computing New Zealand, Nov 2005.

[116] T. Panas, R. Berrigan, and J. Grundy, “A 3d metaphor for software production visualization,” in Information Visualization, 2003. IV 2003. Proceedings. Seventh International Conference on, Jul. 2003, pp. 314–319.

[117] A. Nakasone, K. Miura, H. Prendinger, P. Hut, S. Holland, and J. Makino, “Astrosim: Collaborative visualization of an astrophysics simulation in second life,” Computer Graphics and Applications, IEEE, vol. 29, no. 5, pp. 69–81, Sept.-Oct. 2009.

[118] Vizsec website. Accessed: Dec 2013. [Online]. Available: http://www.vizsec.org/

[119] D. A. Keim, A. Pras, J. Schönwälder, P. C. Wong, and F. Mansmann, “Report on the dagstuhl seminar on visualization and monitoring of network traffic,” J. Netw. Syst. Manage., vol. 18, pp. 232–236, June 2010. [Online]. Available: http://dx.doi.org/10.1007/s10922-010-9161-1

[120] Openarena. Accessed: Dec 2013. [Online]. Available: http://www.openarena.ws/

[121] (2014, July) Time machine. [Online]. Available: http://www.bro.org/community/time-machine.html

[122] W. Stuerzlinger and C. A. Wingrave, “The value of constraints for 3d user interfaces,” in Virtual Realities, S. Coquillart, G. Brunnett, and G. Welch, Eds. Springer, 2008, pp. 203–223.

[123] D. S. Tan, G. G. Robertson, and M. Czerwinski, “Exploring 3D navigation: combining speed-coupled flying with orbiting,” in Proceedings of the SIGCHI conference on Human factors in computing systems, ser. CHI ’01. New York, NY, USA: ACM, 2001, pp. 418–425. [Online]. Available: http://doi.acm.org/10.1145/365024.365307

[124] K. Gkikas, D. Nathanael, and N. Marmaras, “The evolution of FPS games controllers: how use progressively shaped their present design,” in Current Trends in Informatics, volume A of Proceedings of the 11th Panhellenic Conference in Informatics (PCI’07), May 2011, pp. 37–46.

[125] W. W. Gaver, “Technology affordances,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’91. New York, NY, USA: ACM, 1991, pp. 79–84. [Online]. Available: http://doi.acm.org/10.1145/108844.108856

[126] T. Lang, P. Branch, and G. Armitage, “A Synthetic Traffic Model for Quake 3,” in Proceedings of the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology, vol. 74, Singapore, June 2004, pp. 233–238. [Online]. Available: http://dx.doi.org/10.1145/1067343.1067373

[127] A. Pavlicic and G. Armitage, “Quake 3 Packet Inter-Arrival and Length Over the Internet,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 030919B, 19 September 2003. [Online]. Available: http://caia.swin.edu.au/reports/030919B/

[128] M. D. Pozzobon, “Quake 3 Packet and Traffic Characteristics,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 021220A, 20 December 2002. [Online]. Available: http://caia.swin.edu.au/reports/021220A/

[129] The freeglut project. Accessed: Dec 2013. [Online]. Available: http://freeglut.sourceforge.net/

[130] Cube (Game/3D Engine). Accessed: Dec 2013. [Online]. Available: http://cube.sourceforge.net/

[131] CAIA greynet toolkit. Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/greynets/downloads.html

[132] id Software, Doom 1, 2, Quake 1, 2 and 3. Accessed: Dec 2013. [Online]. Available: http://www.idsoftware.com/

[133] Valve software. Accessed: Dec 2013. [Online]. Available: http://half-life2.com/

[134] The freebsd project. Accessed: Dec 2013. [Online]. Available: http://www.freebsd.org/

[135] D. Stefyn, A. Cricenti, and P. Branch, “Quake III Arena Game Structures,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 110209A, 09 February 2011. [Online]. Available: http://caia.swin.edu.au/reports/110209A/CAIA-TR-110209A.pdf

[136] L. Stewart and P. Branch, “SONG: Quake 3 Network Traffic Trace Files,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 060406F, 06 April 2006. [Online]. Available: http://caia.swin.edu.au/reports/060406F/CAIA-TR-060406F.pdf

[137] G. Armitage, “An Experimental Estimation of Latency Sensitivity in Multiplayer Quake 3,” in 11th IEEE International Conference on Networks (ICON 2003), Sydney, Australia, 28-1 September 2003, pp. 137–141. [Online]. Available: http://dx.doi.org/10.1109/ICON.2003.1266180

[138] L. Parry, “L3DGEWorld 2.3 Hierarchy & Room Reuse Documentation,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 080222D, 22 February 2008. [Online]. Available: http://caia.swin.edu.au/reports/080222D/CAIA-TR-080222D.pdf

[139] ——, “L3DGEWorld 2.3 Input & Output Specifications,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 080222C, 22 February 2008. [Online]. Available: http://caia.swin.edu.au/reports/080222C/CAIA-TR-080222C.pdf

[140] M. Allen. LupsMON 0.2 (L3DGEWorld Uninterruptible Power Supply Monitoring). Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/urp/l3dge/tools/lupsmon/

[141] M. A. G. Armitage, “Monitoring of the Local Transmission Control Protocol’s State Variables Using L3DGEWorld,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 100820C, 20 August 2010. [Online]. Available: http://caia.swin.edu.au/reports/100820C/CAIA-TR-100820C.pdf

[142] C. Javier and G. Armitage. LCMON 1.1 (L3DGEWorld Cluster-node Monitoring). Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/urp/l3dge/tools/lcmon/

[143] A. Huebner and C. Javier. L3DGEWorld Asterisk Management System (LAMS). Accessed: Dec 2013. [Online]. Available: http://code.google.com/p/lams-ah-cj/

[144] C. Javier, “Map & Entity Modeling for L3DGEWorld,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 070809A, 09 August 2007. [Online]. Available: http://caia.swin.edu.au/reports/070809A/

[145] S. Zander and G. Armitage, “Empirically Measuring the QoS Sensitivity of Interactive Online Game Players,” in Australian Telecommunications Networks & Applications Conference 2004 (ATNAC 2004), Sydney, Australia, 8-10 December 2004, pp. 511–517. [Online]. Available: http://caia.swin.edu.au/pubs/ATNAC04/zander-armitage-ATNAC2004.pdf

[146] G. Armitage, M. Claypool, and P. Branch, Networking and Online Games - Understanding and Engineering Multiplayer Internet Games. UK: John Wiley & Sons, 2006.

[147] A. Cricenti and P. Branch, “A Generalised Prediction Model of First Person Shooter Game Traffic,” in 34th IEEE Conference on Local Computer Networks (LCN 2009), Zurich, Switzerland, 20-23 October 2009, pp. 213–216. [Online]. Available: http://dx.doi.org/10.1109/LCN.2009.5355165

[148] P. Branch, A. Cricenti, and G. Armitage, “An ARMA(1,1) Prediction Model of First Person Shooter Game Traffic,” in 10th IEEE Workshop on Multimedia Signal Processing (MMSP 2008), Cairns, Australia, 8-10 October 2008, pp. 736–741. [Online]. Available: http://dx.doi.org/10.1109/MMSP.2008.4665172

[149] ——, “A Markov Model of Server to Client IP traffic in First Person Shooter Games,” in 2008 IEEE International Conference on Communications, Beijing, China, 19-23 May 2008, pp. 5715–5720. [Online]. Available: http://dx.doi.org/10.1109/ICC.2008.1070

[150] A. Cricenti, P. Branch, and G. Armitage, “Time-series Modelling of Server to Client IP Packet Length in First Person Shooter Games,” in 15th IEEE International Conference on Networks (ICON 2007), Adelaide, Australia, 19-21 November 2007, pp. 507–512. [Online]. Available: http://dx.doi.org/10.1109/ICON.2007.4444138

[151] (2009, Sep) SONG - Simulating Online Networked Games Database. Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/sitcrc/song/

[152] V. Paxson, “Empirically derived analytic models of wide-area TCP connections,” IEEE/ACM Transactions on Networking, vol. 2, no. 4, pp. 316–336, 1994.

[153] S. Zander, G. Armitage, L. M. Thuy Nguyen, and B. Tyo, “Minimally Intrusive Round Trip Time Measurements Using Synthetic Packet-Pairs,” Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia, Tech. Rep. 060707A, 07 July 2006. [Online]. Available: http://caia.swin.edu.au/reports/060707A/CAIA-TR-060707A.pdf

[154] SPP - Synthetic Packet Pairs. Accessed: Dec 2013. [Online]. Available: http://caia.swin.edu.au/tools/spp/

[155] S. Bradner, “Benchmarking Terminology for Network Interconnection Devices,” RFC 1242 (Informational), Internet Engineering Task Force, Jul. 1991, updated by RFC 6201. [Online]. Available: http://www.ietf.org/rfc/rfc1242.txt

[156] S. Bradner and J. McQuaid, “Benchmarking Methodology for Network Interconnect Devices,” RFC 2544 (Informational), Internet Engineering Task Force, Mar. 1999, updated by RFC 6201. [Online]. Available: http://www.ietf.org/rfc/rfc2544.txt