DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2020

Video Game Network Analysis
A Study on Tooling Design

MURAT EKSI

MARKUS PIHL

KTH SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Bachelor's Programme in Information and Communication Technology
Date: April 23, 2020
Supervisor: Thomas Sjöland
Examiner: Johan Montelius
School of Electrical Engineering and Computer Science
Host company: Crackshell AB
Swedish title: Nätverksanalys för Videospel
Swedish subtitle: En Studie om Verktygsdesign

© 2020 Murat Eksi and Markus Pihl

Abstract

Crackshell is an indie game studio situated in Stockholm. They have released several iterations of a game called Hammerwatch, developed with their in-house game engine, and they are still extending both Hammerwatch and the engine. Hammerwatch is a rogue-like multiplayer game played by up to four players in a single session using a peer-to-peer network topology. Hammerwatch has become significantly popular, and the planned features have led the team to question whether their network utilization is performant and in what ways it can be improved. Even though they are the ones who implemented the network part of Hammerwatch, they do not have an exact understanding of the underlying behavior of this utilization, nor do they currently have any way to analyze it. This project aimed to design and implement proper tooling for their data analysis needs by identifying the network topology, data structures, extraction, and storage, and by providing an environment in which the network utilization is easy to analyze. To achieve this aim, an iterative approach based on design thinking was conducted with Crackshell. Certain decisions had to be made in accordance with the constraints and the purpose of the tooling, which were defined together with Crackshell through workshops held as part of the design thinking approach. This strategy allowed a swift understanding of the problem and led to tooling that Crackshell approved as both helpful and easy to use. The data analysis tool was implemented using a local data extraction solution, MongoDB, and Jupyter Notebook in Python, together with extensions that further supported the analysis of the collected data. The data analysis proved a significant success: it pointed out problems such as game events being sent unnecessarily frequently, stale data issues, caching opportunities, and potential data clustering issues in network packets. Crackshell was happy with the ability to examine their network utilization in detail, and they will use the implemented tooling for further analysis as Hammerwatch continues to be developed.

Keywords: Video Games, Network Analysis, Data Analysis, Tooling Design, Design Thinking, Network Packets, Data Extraction, Database Design

Sammanfattning

Crackshell is an indie game studio located in Stockholm. They have released a number of games under the name Hammerwatch, developed with their own game engine. Hammerwatch and its game engine are still under continuous development. It is a rogue-like multiplayer game played by up to four players in a single session using a peer-to-peer network topology. Hammerwatch quickly became popular, and the planned features have led the team to question whether their network usage is efficient from a performance standpoint and in what ways it can be improved. Even though they are the ones who implemented the network part of Hammerwatch, they do not have an exact understanding of the underlying behavior of the network communication, nor do they currently have any way to analyze it. This project aimed to design and implement tooling for data analysis by identifying the network topology, data structures, extraction, and storage, and by providing an environment that makes it easy to analyze the network usage. To achieve this goal, an iterative method based on design thinking was chosen and carried out together with Crackshell. During the design phase, decisions were made regarding the constraints and purpose of the tool. This strategy enabled a quick understanding of the problem, which led to the development of a tool that Crackshell approved as both useful and easy to use. The data analysis tool was implemented using a local solution for data extraction, MongoDB, and Jupyter Notebook in Python, together with extensions that further supported the analysis of the collected data. The results of the data analysis turned out to be a significant success: problems such as game events being sent unnecessarily often, data that was stale by the time it arrived, caching opportunities, and potential data clustering issues in network packets could be found. Crackshell was satisfied with the result and the new ability to examine their network usage in detail. They will be able to use the developed tool for future analyses as the game engine is developed further.

Keywords: Video Games, Network Analysis, Data Analysis, Tooling Design, Design Thinking, Network Packets, Data Extraction, Database Design

Acknowledgments

We would like to thank Johan Montelius for his contributions throughout the whole thesis work, both for providing the needed support and knowledge and for helping with the direction the work took. We consider ourselves lucky to have had such an experience, and we are very grateful for his help throughout the whole thesis work. We are also happy and thankful to Thomas Sjöland for his help throughout the thesis report and the supervision sessions, where we got to learn a lot. Finally, we would like to thank Niklas Myström for all the help and patience throughout the process.

Stockholm, April 2020
Murat Eksi and Markus Pihl

Contents

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Challenges
  1.6 Delimitation
  1.7 Methodology
  1.8 Structure

2 Background
  2.1 Game Engine
  2.2 Case Study
  2.3 Network and Transport Layers
  2.4 Network providers
  2.5 Database Design
  2.6 Data Analysis
  2.7 Software Environment
  2.8 Virtualisation and Containers

3 Methods
  3.1 Design Thinking
    3.1.1 Story-telling
    3.1.2 Continuous Delivery
  3.2 Preparation for Data Analysis
    3.2.1 Collection of data
    3.2.2 Analysis and optimization
  3.3 Designing the tool
    3.3.1 Implementation
    3.3.2 Maintenance
    3.3.3 Utilization

4 Implementation of Tooling
  4.1 Data Import
    4.1.1 Current Design
    4.1.2 Cloud Solution
    4.1.3 Local Solution
  4.2 Parsing
    4.2.1 Workshops of Data
    4.2.2 Network Packet
    4.2.3 Game Message
    4.2.4 Solution
  4.3 Data Storage
    4.3.1 SQL
    4.3.2 NoSQL
    4.3.3 MongoDB
  4.4 Environment
    4.4.1 SQL Analysis
    4.4.2 Apache Kafka
    4.4.3 R
    4.4.4 Python
    4.4.5 Jupyter Notebook
  4.5 Extensions
    4.5.1 Pandas
    4.5.2 Qgrid
    4.5.3 Pivottablejs
  4.6 Distribution and Portability
    4.6.1 Virtualization
    4.6.2 Containers
    4.6.3 Docker

5 Results
  5.1 Workflow
  5.2 Data Insights

6 Conclusion
  6.1 Limitations
  6.2 Future work
  6.3 Ethics and Sustainability
  6.4 Reflections
  6.5 Verdict

References

List of Figures

2.1 Example of a Simplified Game Engine Structure with Respective Modules
2.2 OSI Model Standard
2.3 Example Network Flow through a Provider
2.4 Differences between Virtualisation and Containerisation

4.1 Current Topology of the Network for a Single Game Session
4.2 Respective Topology of the Network for a Possible Cloud Solution
4.3 Respective Topology of the Network for a Possible Local Solution
4.4 The Structure of a Typical Outgoing Network Packet
4.5 The Structure of a Typical Incoming Network Packet
4.6 The Structure of a Typical Game Related Message Data
4.7 Proposed Structure of the Services

5.1 An Overview of an Example Workflow
5.2 A Datagrid Example for the Processed Data through Qgrid Package
5.3 Average Network Packet Size Over Time

List of Tables

5.1 Overview of the Game Message Types per Session in Accordance with Package Size and Frequency

Chapter 1

Introduction

This thesis goes in depth on evaluating peer-to-peer traffic in a multiplayer game for a small indie game studio situated in Stockholm. In this chapter, an overall picture of the problem at hand is given, together with the reasoning behind it, the goals set, and the challenges involved. Additionally, a concrete scope and structure for the subject area are defined to give a clear idea of how the thesis proceeded.

1.1 Background

Crackshell is a small indie game developer in Stockholm. They specialize in making multiplayer 2D action role-playing games for the PC market. They run a small team and have built their own game engine to develop their games on. Their proposed task concerning this thesis comes in the form of designing and implementing a solution for analyzing the underlying network behavior of one of their released and still actively developed games. The network behavior of a multiplayer video game is a significant aspect of the overall player experience. This can be illustrated by the consequences of poorly optimized network behavior: frequent latency and connectivity issues can affect the player experience negatively and lead players to stop playing the game. This would not only result in revenue losses, but could also manifest as high maintenance and development costs as the game is developed further. In the long term, this can lead to an unsustainable business environment in which it becomes much harder to operate.

The study areas concerning this thesis are explained below.

• Case Study methodologies to understand the problem, design a solution, and implement it together with prototyping experiments.

• Network Layer and Transport Layer studies such as connection methods, tunneling, latency, packet analysis, etc. A good understanding of the network topology is needed.

• A good understanding of Database Design and optimization techniques is also needed for this study.

• Data Analysis concerning the extraction, parsing and further understanding of data.

• Software Environment Analysis and Design concerning the needed solution is a significant requirement in this subject.

• Virtualization and Containers are also studied to have a solution that is easy to replicate.

• Statistical Methods are also used to showcase the proposed solution and understand the processed data, with the aim of providing high-level insight into the data.

1.2 Problem

Today, one of Crackshell's main games, "Hammerwatch" [1], is becoming significantly popular, and they are adding many more features, together with patches to the already-implemented systems in the game. This puts the team in a position where the additions and patches make it significantly important to have a performant game system at hand. In the first stages of development, given that features are much smaller and the player base is not that big, the effects of such issues do not surface easily. However, the growing player base and the growing game led the studio to concerns about whether they are headed in the right direction when it comes to their network utilization throughout game sessions. Since Crackshell has such a small team, they have not had the opportunity to develop and analyze their game's performance on more than a surface level. They have primarily focused on high-level behaviors. This implies that the fast-paced feature development neither allowed them to have a further look

into the underlying behavior of the network layer nor to implement a solution that makes the analysis process easier. In this thesis, we have tried to help them analyze how the game's network layer performs and use that as a base for designing tools to help them with these problems. In this regard, we have conducted studies to gather fundamental information about the game, forming a case study that leads to a solution giving Crackshell an easy approach to the analysis of their game's network layer. The thesis study can be summarised in the following question.

“How can one conduct a case study on making tools for the analysis of the network layer of a peer-to-peer multiplayer game, and how can one implement a respective solution?”

This question is further studied throughout the project and will be showcased in this report together with the decisions we took and how we came up with the proposed solution.

1.3 Purpose

The purpose of this project is to help Crackshell understand and analyze the behavior of the network layer in their peer-to-peer communication. This study will help them optimize the network layer as well as establish a stronger toolset for finding faults in the communication implementation. The analysis could also lead to them being able to run a simpler implementation of their communication. An optimized network layer could result in efficient use of resources, hence reduced operating costs, energy usage, and easier development of new features. Throughout this thesis work, the participants can also learn significantly in the study areas mentioned above in the background section, leading to increased knowledge and experience. Furthermore, other indie video game studios that have similar issues can get a good insight into the design and implementation of the tooling needed for network analysis. This implies that the work presented in this report could also help the industry by allowing easy analysis of video game networks.

1.4 Goals

This thesis aims to showcase the steps that need to be taken for an improved solution to high-bandwidth peer-to-peer communication in a multiplayer game. In this regard, the main goals to be achieved are as stated below.

• Do a thorough analysis of the problem together with Crackshell

• A study of the game’s current network parameters and design choices

• An effective way of data extraction and storage

• Design and implementation of the tooling for data analysis

• Design methods for analysis

• Create a tool to use the methods of analysis and allow for future customisation

1.5 Challenges

One of the main challenges that comes to mind regards the ethical handling of the data collected through game sessions. To have a service that respects players' privacy, the collected data should not contain any information concerning the respective player. Furthermore, the collection of the data should be done in a way that does not affect the game experience, hence no noticeable performance impact should arise. The currently loosely defined underlying behavior of the game sessions might lead to situations where the study and the provided service must be flexible enough to accommodate possible adjustments along the way. Another challenge shows itself in the lack of prior groundwork in this domain, as Crackshell had made no progress concerning this thesis' area before. In this regard, this thesis work becomes a foundation for the future development of data analysis services for Crackshell, which places significant weight on finding an optimal architecture. Concerning the above-mentioned challenges, further study and work are presented in the coming chapters.

1.6 Delimitation

To achieve the main goals and keep a focused mindset throughout the project, there are certain topics and areas that the thesis project has not touched on. These can be summarised as stated below.

• The extensive behavior of the network layer between the players is not taken into consideration, as that part of the pipeline can be thought of as a black box, hence it can change at any time.

• With this project, the main focus was the aggregation of game data and its analysis. This implies that no significant solution was provided for any possibly existing bug or optimization issue; rather, these issues were showcased. Once showcased, these issues must be mitigated in a manner that considers the whole game.

• This thesis also does not cover how the data is produced and consumed by the game engine in any detailed manner, but rather studies it in the way it concerns the network interactions. In this regard, the thesis focuses on the data in its purest form without expecting any modifications.

1.7 Methodology

The methodologies utilized throughout this thesis project can be summarized shortly as stated below. Providing a solution comes with the need of first understanding the problem thoroughly, so that one can design accordingly and provide a respective implementation. In this regard, the project was conducted by taking the below-stated methodologies into consideration.

Design Thinking is the main methodology that this thesis focused on. This implies that the work was conducted together with Crackshell in an iterative approach that always looked for feedback loops, where the use cases of the possible solution were discussed with Crackshell to further understand their needs. In this regard, a framework was utilized to have a workflow that allows changes easily and affirms that the conducted work is aligned with the needs of Crackshell. The framework is explained and discussed further in the next chapter of this report.

Once the problem and the needs are understood in a confident manner that leads to a well-aligned design, the need for methodologies with which one can

implement, test and analyze emerges. In this regard, the thesis project utilized the below mentioned methodologies to achieve the goals of the project.

Quantitative Analysis is the measure taken to analyze and test the proposed designs and possible implementations. In this regard, this project focused on data analysis techniques such as population analysis, clustering patterns and performance-related variables.

Qualitative Analysis is a process with the aim of providing a solution that is both easy to implement and to maintain further. Furthermore, practical use cases of the solution also need to provide a flow that is easy to follow and operate, so that Crackshell can utilize the tooling solution effectively. In this regard, the project focused on analysis methods concerning both narrative analysis and discourse analysis.

1.8 Structure

From this chapter onward, the thesis consists of particular chapters concerning their respective focus areas. These chapters are summarised as stated below.

• Chapter 2: Background provides the background information on the study areas needed to understand the thesis work.

• Chapter 3: Methods describes the methodologies that were taken into consideration during the thesis work.

• Chapter 4: Implementation of Tooling accommodates the discussion regarding the design and implementation of the above-mentioned required tooling service.

• Chapter 5: Results presents the outcome and the resulting data insights provided by the implemented tooling service, together with an analysis of the key points.

• Chapter 6: Conclusion concludes this report by showcasing the possible future work in this area and the writers' reflections in a compiled manner.

Chapter 2

Background

This chapter aims to provide adequate information for the reader to understand the coming chapters, especially Chapter 4, where the implementation details are presented. The chapter covers the main topics in this regard, together with associated figures that help with the understanding of certain abstract definitions and their relation to the thesis work.

2.1 Game Engine

Figure 2.1: Example of a Simplified Game Engine Structure with Respective Modules

A game engine is a composition of particular tools that provide the solutions needed to develop a game. These tools can be categorized into topics such as network utilization, graphics generation, level design, animations, audio design, and platform-specific hardware needs [2]. The composition of different tools allows teams to work on the game independently toward the final product. Furthermore, this also allows the development of the game engine itself to be conducted swiftly. In figure 2.1, one can see a simplified structure that showcases the modularity of a typical game engine. Many types of game engines serve specific genres of games, but there are also certain popular ones that are more versatile, such as Unity [3] and Unreal Engine [4]. These versatile game engines are utilized a lot in the industry, especially the indie games industry [5]. However, they come with certain royalties depending on the game's success, which can be quite expensive depending on the game studio's goals. Furthermore, using such an engine can lead to situations where the game in development needs a certain optimization or feature that is hard to develop or optimize because the engine itself is provided by a third-party company; the studio becomes locked into their ecosystem. For this reason, particular game studios such as Crackshell develop their own game engines, focusing on implementing and designing the parts they specifically need for their current games, which leads to better customizability and control over the ecosystem, together with royalty-free sales. This comes with advantages, but also makes it harder for the game studio to focus only on the game itself, as they need to develop their game engine in parallel with the game's needs, which can lead to longer development times.

2.2 Case Study

According to Cambridge [6], a case study is a research method centered around an investigation of an entity, phenomenon, group of people, or similar. This is achieved by closely studying the topic in question, and the data about the respective topic is often gathered in different ways with the aim of a thorough analysis. As explained in the guide by the University of California [7], a usual case study starts by identifying the respective problem to be investigated through extensive investigation, with the aim of defining the case limits, challenges, and potential impact, so that the study is conducted in a successful manner. Once the investigation is done, the study conductors try certain

analysis and research methods to compare and come up with solutions in relation to the subject area [7]. In the context of this thesis, a case study of network communication has been done by investigating the communication layer of the game Hammerwatch and its engine by the game studio Crackshell. This study is explained in detail in Chapter 3, where the utilized methodologies are presented with their reasoning, to also showcase the steps taken to conduct this case study.

2.3 Network and Transport Layers

Figure 2.2: OSI Model Standard

Since the problem investigated throughout the thesis work is closely related to the network activities of the game, one needs to understand the underlying, fundamental behavior of the utilized network. To simplify further and analyze in a focused manner, this thesis uses the OSI Model definitions for the respective layers that are under analysis. As described by Cloudflare [8], the OSI Model is a concept used for the standardization of the complex and different communication protocols used between different computer systems. The top-down structure of the OSI Model can be seen in figure 2.2, where the layers this thesis focuses on are in the middle, labeled as the Transport Layer and the Network Layer.

According to Cloudflare [8], the Network Layer is the main layer that makes sure that data is sent between different networks by utilizing the best routing options. This is done by partitioning the respective data into smaller units when sending, with the expectation that these smaller units are reassembled back into the original data [8]. This allows data to be sent and received optimally even if the respective computer systems are far away from each other, hence connected through different networks. In the game, this is mainly utilized through the network providers, which are explained in the next section. The Transport Layer, following the description by Cloudflare [8], can be summarised as the layer that focuses on the connection from one computer system to another, where error checking, receipt acknowledgment, and the reassembling of the data from the respective network are conducted. For delivery semantics, protocols such as TCP and UDP are utilized in this layer. In the game's context, these protocols can be thought of as either making sure that an important game event has been delivered successfully (TCP) or sending a game event as fast as possible without any concern about whether it arrives (UDP). This layer is utilized by the respective computer systems outside of the network provider.
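To make this reliable/unreliable distinction concrete, the following minimal Python sketch (not Crackshell's engine code; the loopback addresses, ports, and event payloads are invented for illustration) sends one "important" event over TCP, where delivery is guaranteed by the protocol, and one "fast" event over UDP, fire-and-forget:

import socket
import threading
import time

HOST, TCP_PORT, UDP_PORT = "127.0.0.1", 9401, 9402

def tcp_peer():
    # Reliable channel: TCP itself handles handshakes, ordering and
    # retransmission, so an important event is guaranteed to arrive.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, TCP_PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            print("TCP peer received:", conn.recv(1024))

def udp_peer():
    # Unreliable channel: datagrams may be lost or reordered; the game
    # simply acts on whatever arrives, as fast as possible.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind((HOST, UDP_PORT))
        print("UDP peer received:", srv.recvfrom(1024)[0])

peers = [threading.Thread(target=tcp_peer), threading.Thread(target=udp_peer)]
for p in peers:
    p.start()
time.sleep(0.2)  # crude way to let both peers start listening

# "Important" event: worth the handshake and acknowledgment overhead.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, TCP_PORT))
    s.sendall(b"EVENT important: boss_defeated")

# "Fast" event: fire-and-forget, no delivery guarantee.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(b"EVENT fast: player_position x=12 y=7", (HOST, UDP_PORT))

for p in peers:
    p.join()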

2.4 Network providers

Figure 2.3: Example Network Flow through a Provider

Network providers is a term more often used to describe service providers. In this thesis, it is the name of the tunneling service in the network for the game engine. As can be seen in figure 2.3, there is a Network Provider between the different Game Clients. In Hammerwatch, the most utilized of these providers is Steam, but the game also gives players the choice of a few other providers. As an example, the purpose of using Steam as a tunneling service could be explained as helping different game clients find each other through a Steam-provided service called "Matchmaking" [9], and once the game clients

are connected, the tunneling service also helps with directing the network traffic between the game clients in an isolated, secure fashion. The utilization of such network providers helps Crackshell to have an easy implementation of the network infrastructure between the game clients, which additionally helps with reducing development time and the respective costs. However, these providers must be thought of as black boxes; they only expose certain interfaces for utilization, hence their underlying workings are not publicly known in detail. The thesis problem in this regard can be looked into by just observing what goes into and comes out of these black boxes, without the need to analyze the providers, as they do not alter the data in any way.

2.5 Database Design

The thesis work consisted of implementing a data analysis tool, which means that there was data that needed to be stored and handled properly. A possible solution in this aspect is using databases, but the design of a database is a crucial step in this process. As stated by Microsoft [10], a proper design for a database provides significant benefits for the overall solution, and the process consists of purpose determination, information gathering, and setting a proper architecture in the database for the respective data to be stored. Furthermore, Microsoft also suggests that a further study can be done to optimize by using techniques such as "normalization", which are specific rules that can be applied depending on the data stored and the design [10]. According to Microsoft, a good database design provides good performance, data integrity, modularity, and flexible accessibility that allows easy modifications and queries [10]. In this thesis work, these attributes were taken into consideration to provide a proper storage solution for the implemented data analysis tool. These considerations took the form of identifying the data to be analyzed and the potential options in the category of databases. The specific details in this regard are showcased and explained further, together with the considered options, in Chapter 4.

2.6 Data Analysis

According to the United States Office of Research Integrity (ORI) [11], data analysis can be described as the combination of analytical and statistical

methods that help to identify, collect, and analyze data to provide insights about a certain data set, where the interest is to further understand how a group of data behaves and relates to each other. Furthermore, they also state that the analysis takes the form of an iterative process, driven by the gained insights, especially if the analysis is done in a qualitative manner [11]. The integrity of such an analysis is crucial for providing correct solutions to the respective problems; ORI also mentions that a good data analysis should consist of honest, unbiased insights that are accurate and produced with reliability in mind [11]. The data analysis aims to help with the decisions to be made throughout this thesis work and also with the future work of Crackshell, which means that a proper mechanism should be in place when one is trying to make decisions from the gathered insights. In this regard, John Dillard [12] describes the steps of a proper data analysis as a process that works with well-defined questions together with comprehensible measurement criteria, followed by properly defined data collection methods where all the properties such as data structure, extraction, and storage are set properly. Furthermore, he also mentions that there can be a need to iterate on the questions and criteria as the collected data is being analyzed, followed by interpreting the gathered insights in accordance with the defined measurement criteria and asking whether the goals have been achieved in a reliable way [12]. In this thesis, once the data analysis tool was designed and implemented, the data analysis concept was utilized mainly to look at the amounts, sizes, population, and structure of the data and how these relate to each other. This helps with the aim of the thesis, as the questions asked by Crackshell regarding the game's network utilization are answered through the gained insight. Benefiting from the provided insight, Crackshell will also look into further scenarios and potential extensions, both in the data analysis tool and in the game's network utilization.

2.7 Software Environment

In this thesis work, a software environment is defined as both the development and the runtime environments, where both are utilized using certain programming languages and paradigms. The software environment to be designed and implemented can be thought of as the core of the data analysis tool, where the big chunks of both development and utilization happen. The design process of a software environment

could be thought of as the choice among available programming languages, frameworks, and paradigms. Without proper analysis and understanding of such attributes, one cannot design a proper tool that serves the demands properly. In this regard, the conducted work and the considered options are discussed and presented in Chapter 4, together with the reasoning, including what each option is used for. The detailed explanation aims to help the reader follow the conducted work and material swiftly.

2.8 Virtualisation and Containers

Figure 2.4: Differences between Virtualisation and Containerisation

A good solution is only good if it can be used by others in an easy manner, where the replication of the tooling is a swift process. In this regard, there are certain options one can think of, such as Virtualisation and Containerisation. The Virtualisation method, as described by Red Hat [13], is an abstraction that provides the needed secure separation between big environments, such as operating systems, on a single computer system. As further explained by Red Hat [13], this is provided with the help of a Hypervisor, which can be

thought of as manager software between the respective environments and the hardware, orchestrating the usage of resources. The Containerisation method, on the other hand, is described by IBM as a recent and potential alternative to virtualisation [14]. As explained by IBM [14], one can think of containerisation as a method of bundling an application into a package together with all of its configurations, dependencies, and necessary infrastructure, where the bundle can run on any supported platform without the overhead of, for example, an additional Hypervisor. An operating system can have many different bundles/containers running at the same time, where the containers can be thought of as processes that are isolated from each other through the usage of Namespaces [14]. These methods can be seen side by side in a top-level structure in figure 2.4, where one can easily understand the differences in structure. Furthermore, these methods can provide the sought replication solution, which is further analysed in chapter 4, where the thesis work is explained in detail.

Chapter 3

Methods

This chapter aims to provide in-depth information concerning the methodologies that were utilized throughout the project timeline. Since the main aim of the project is to design and implement tooling for a problem that has no prior foundation, the methodologies used mostly take the form of understanding the problem properly and designing a solution together with Crackshell according to the available data. The fundamental methodologies presented below allowed the implementation of the tooling system that is showcased in the next chapters.

3.1 Design Thinking

For this project, the design thinking process based on the description provided by the Interaction Design Foundation [15] was taken into consideration to thoroughly understand Crackshell's different needs for the tool. In their article [15], they describe the process as an iterative workflow that helps with the understanding of problems with many unknown parameters through the utilization of phases. According to the Interaction Design Foundation [15], these phases can be thought of as consecutive steps that are followed and iterated, right from the understanding of the problem to the conclusion. In order to utilize the design thinking process properly, two main methods were chosen to incorporate these phases. Together, these methods should handle our main challenges. Firstly, Crackshell themselves did not know the behavior of the communication layer very well and wanted the tool to help them understand it now and going forward, so we had to make sure we understood their wanted output to better design and implement a tool that answers their needs. Secondly, we had

to properly understand their communication layer architecture to be able to design the tool, which helps with understanding its behaviors. Thirdly, the design and implementation of the tooling needed to be checked frequently, in an iterative manner, with Crackshell to ensure the work was conducted on the correct path towards the defined goals and needs.

3.1.1 Story-telling

To understand their initial requirements for the tooling, a method centered around storytelling was chosen. The reason for choosing this method is that it is a well-used method for handling user-centric systems in the IT industry, as exemplified by a study of the principle by NNgroup, a Silicon Valley based design firm [16]. For this project, the method was used to run a series of interviews with the development team at Crackshell. They, in turn, described with words and simple sketches how the game's communication layer currently works and how they would like it to work in the future. The focus here was to talk about the intended behavior of the tooling and the output they wish from it, more than to talk about very detailed requirements. These interviews were logged as entries in a logbook for better follow-up. The expected output is a good understanding of Crackshell's overall needs and wants when it comes to analyzing their communication layer.

3.1.2 Continuous Delivery

Once the requirements for the tool were understood, a series of workshops was held together with Crackshell; the reason for these was to understand, on a technical level, the system and the technical requirements for the tool. After the communication layer was understood, the aim shifted to a step-by-step delivery process, which first focused on presenting sketches for the architecture of the tool to Crackshell. After the architecture design was decided upon, a series of code iterations for prototypes commenced, aiming to make prototypes that comply with the initial requirements. The advantage of choosing this type of method is that it provides both a feedback loop on functionality and feature sets and new insights, which are reached as the understanding of the problems grows. This is a widely used method of application development; Amazon refers to it as "a pillar of modern application development" [17]. These insights could then be worked into the next iteration as new feature sets, or eliminate earlier wished-for features as the need for them is no longer valued after the second pass.

The expected output is working prototypes that can be tested by Crackshell and later developed into the tool they need.

3.2 Preparation for Data Analysis

Crackshell had early on set out two specific questions which they wanted to understand better; they were as follows:

1. Which of the game events are sent most frequently?

2. Is there any game event which we have specified but which is never used?

In addition to these specific questions, Crackshell also presented a general problem regarding the understanding of the overall network behavior. This problem was not tied to any specific aspect, but was rather an exploration of the network where the findings would further define the underlying behavior. To answer these and other questions that might come up during the iterative process, data needs to be collected and visualized.

3.2.1 Collection of data

To be able to make a tool for analysis of the behaviors of the communication layer, communication data has to be gathered from the game session. The data has to comply with a series of requirements. It has to:

• Be compliant with legal requirements for privacy

• Be of at least a full gaming session played from start to end by more than one player to ensure a realistic set of circumstances

To make sure the data we gather is ethically sound and compliant with privacy requirements, it should, for example, be thoroughly anonymous. For this project, only the data sent will be gathered. The gaming client also has data it received, but that will not be gathered; this project runs on the assumption that all messages which are sent are also received properly. This is due to received data not being needed to answer Crackshell's initial questions, as well as keeping down the level of complexity in the implementation of the tool. Since more questions are expected to arise during the iterative implementation period, more data will be gathered from the game clients than initially needed to answer the questions. The following types of data will be gathered:

• Amount of data: How many of each type of game event are being sent?

• Type of data: Which types of game events are being sent, which are not?

• Structure of data: What does the packaging of the game events look like, and how is it clustered?

• Values of data: What are the values of the game events being sent?

All the gathered data will have to be humanly readable for later visualization and analysis.
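As an illustration of what such a human-readable record could look like, the sketch below builds one anonymized entry per sent game event; the field names and values are invented for this example and are not Crackshell's actual schema:

import json
import time

record = {
    "session_id": "local-test-1",          # no player-identifying information
    "timestamp_ms": int(time.time() * 1000),
    "event_type": "UnitPosition",          # hypothetical game event name
    "payload_bytes": 14,                   # size, for amount/structure analysis
    "params": {"x": 12.5, "y": 7.25},      # values, for value analysis
}
print(json.dumps(record, indent=2))        # human-readable for later inspection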

3.2.2 Analysis and optimization

When the data is gathered and stored properly, further analysis will be done. This is to find insights into the behaviors as well as to spot opportunities for optimization. In addition to questions arising from Crackshell during the implementation period, three types of questions which the tool aims to answer are stated below.

• Resolution of sent data: Is too much data being sent where less could suffice?

• Redundancy of data: Is there any data being sent even though it is invalidated before use?

• Caching: Is the same static data being sent multiple times, which could be cached?

To try for resolution of data, one can check how often a certain action is taken and sent to the other clients, and compare it to human reaction speed. If the game is sending new instructions, which of them are updated much quicker than a human can react to them? Is that level of data resolution needed? The redundancy of data can be tried by looking at how the data is structured. Is a good chunk of the same instruction sent repeatedly at the same time, packaged in the same structure, with the game just acting on the last one in the structure? Are the ones before the last in the sent structure needed? Does the game keep sending the same data over and over again? If so, can it be optimized by caching some form of the communicated data?
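A minimal pandas sketch of the resolution check described above could look as follows; the column names, sample values, and the ~200 ms human reaction threshold are assumptions for illustration:

import pandas as pd

# Hypothetical log of sent game events (type + send time in milliseconds).
events = pd.DataFrame({
    "event_type":   ["Move", "Move", "Move", "Attack", "Move"],
    "timestamp_ms": [0, 15, 31, 40, 47],
})

# Interval between consecutive sends of the same event type.
events = events.sort_values("timestamp_ms")
events["interval_ms"] = events.groupby("event_type")["timestamp_ms"].diff()

HUMAN_REACTION_MS = 200  # rough lower bound on human reaction time
median_interval = events.groupby("event_type")["interval_ms"].median()
print("Median send interval per event type (ms):")
print(median_interval)
print("Candidates for lower resolution:")
print(median_interval[median_interval < HUMAN_REACTION_MS])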

3.3 Designing the tool

To help with designing the tooling and to guide the workshops, a series of requirements was set up to make sure that the end output would be well received. These criteria were set as qualitative ones with no clear definition of completion, but to make sure they were considered during the implementation phase. The criteria were grouped into three categories. The process of defining these criteria was conducted to make sure that the solution provided to Crackshell was in alignment with Crackshell's needs and requests. This implies that these criteria were set through the initial discussions with Crackshell, where they also reflected on their expectations.

3.3.1 Implementation

To help decide which language the tool would be developed in, as well as to help determine what kind of third-party systems would be used, three questions were posed.

• Ease of implementation: How hard would it be to implement in a given way and within the time constraints?

• Complexity: How complex would a certain choice of implementation be?

• Speed: How good is a tool or a language for prototyping?

3.3.2 Maintenance

For the tool to be useful over an extended duration of time, it had to be maintainable by the Crackshell team. To help answer this, the following questions were asked.

• Local vs Cloud: What are the advantages and disadvantages of having the service remote compared to having it local?

• Dependencies: How many dependencies are there, and how well maintained are they?

3.3.3 Utilization

For the tooling to be of proper use to Crackshell, consideration needed to be given to its utilization. The level of utilization is defined as showcased below.

• Customizability: Would Crackshell be able to tailor the tool for changing needs in the future?

• Ease of use: Would Crackshell be able to use the tool in a good way?

Chapter 4

Implementation of Tooling

This chapter focuses on the design and implementation details of the proposed solution, to give the reader a good understanding of the reasoning behind the choices made. Furthermore, a detailed explanation and compiled documentation of the game's current network topology are also showcased in this chapter. The respective results regarding the implementation are presented in the next chapter.

4.1 Data Import

Before the project's start, Crackshell did not have any solution implemented for extracting a game session's network data. In this regard, before starting with any process of tooling design and implementation, a way to import the data effectively without hindering game performance was necessary. This section aims to present the steps taken to both design and implement a solution that allows a game session's network data to be extracted. Furthermore, both the advantages and the disadvantages considered while making these decisions are provided, to showcase the reasoning behind the design up to this point.

4.1.1 Current Design

Figure 4.1: Current Topology of the Network for a Single Game Session

An understanding of the network topology was a necessary step to figure out possible solutions for the extraction of network packets. In this regard, workshops focused on the architecture were conducted to map out how a game session is handled and how a single network packet travels in the given topology for a usual session. These workshops led to an understanding of the network topology as showcased in figure 4.1, where each key component of a usual session is displayed. A description of the topology follows below. An overview of the topology can be described as a peer-to-peer topology where there is no centralized host throughout the whole session; each computer system acts as a peer once the session is established. The Network Provider can be thought of as a black box in this design and implementation process, as the provider can be any kind of service. This leads to a modular component that does not alter the network packet content-wise; hence focusing on network providers would be pointless for a tooling solution that aims at network packet data analysis. A Host is the name given to the initial system that starts the session, but this does not imply that all network packets go through the host. The host is

used in a centralized manner only while the game session is being set up and handshakes between all other players are being processed. After the setup of the session, the host continues as a peer, just like the other players. The Client is showcased in a manner that provides an example of the setup of a game session. Similar to the host, once the game session is set up, clients also act as peers; hence the network packets flow in a peer-to-peer session with the help of the network provider.

4.1.2 Cloud Solution

A possible solution when it comes to the storage of the network data resides in cloud services. This would allow an implementation where all the processed data is stored in a centralized manner, where anyone working with the tooling could access these broadly available resources. As mentioned by Srivastava and Khan [18], cloud computing services are on the rise and come with many potential benefits, where the freedom of design can lead to a highly scalable architecture. Furthermore, Srivastava and Khan also state that such an architecture can provide cost-effective results, the reasoning being that the user only pays for how much they utilize the service in a given period [18].

Figure 4.2: Respective Topology of the Network for a Possible Cloud Solution

One of the main design choices that came to mind was having a cloud microservice solution together with storage. A microservice architecture, as described in a survey study done in Brazil [19], can be summarised as a software design paradigm where the software is provided as small and independent modular components, which makes the development of each component free from the others. Further, in the survey results [19], where they received responses from developers with different levels of experience, certain advantages of this paradigm were also pointed out, such as the ease of scaling, maintenance, and the vast freedom of not being tied to a specific technology over time. This could mean that a tool designed in this fashion would be both easy to change and have the necessary power to handle big data. A possible implementation is showcased in figure 4.2, where the flow of a single network packet can be described as first going through the

proposed cloud service and then being routed to the network provider. This implies that the proposed cloud service would be the middleman, working like a proxy between the local system and the network provider. On the other hand, the above-mentioned survey also pointed out certain disadvantages the surveyed developers have experienced [19]. Most of the disadvantages seem to stem from the overhead of microservice architectures, as they mention the need for tests, bigger teams, and the complexity of the flow that needs to be implemented [19]. These disadvantages were crucial to the aims of the thesis project. One of the key factors was the currently small team of Crackshell and the necessity of having an additional big team for further development and maintenance. Such a system would become costly to operate over time, as they did not plan for an in-house team focusing on the development of the tooling. Furthermore, given the time constraints of the thesis project, implementing a robust architecture with good performance was deemed unrealistic and would make it harder to reach the main goals of this project at this stage. Another concern for Crackshell was the possible performance loss in game sessions due to the additional overhead that could be caused by the architecture showcased in figure 4.2.

4.1.3 Local Solution

Figure 4.3: Respective Topology of the Network for a Possible Local Solution

Once the cloud solution with a microservice architecture was deemed unnecessarily complex and costly for this project, additional workshops took place with

Crackshell to further understand their needs, aiming to pinpoint the exact use case of the intended tooling service. The gathered feedback implied that the team would like to use the tooling mostly during development, where the data could be both small and live on the respective analyzer's local system. This conclusion was made considering the short validity time of the data as the game is being developed; the extracted data can become stale over time. In this regard, the analysis of a possible local solution led the project to a cost-effective and easy-to-implement solution for the extraction of the data. As showcased in figure 4.3, the proposed local data extraction can be implemented independently from the network flow of the packet, where the system simply dumps the game data additionally to its local storage. With this solution, there is no possible interference in the network flow, hence no lag in the game session. After going over the possible solutions, the decision was made to proceed with the proposed local solution.
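As a sketch of the local extraction idea in figure 4.3, the snippet below appends each outgoing packet to a binary dump file next to the normal send path. The length-prefixed framing (one peer byte, a 4-byte little-endian size, then the raw payload) is an assumption made for illustration; the real format is defined by Crackshell's engine:

import struct

def dump_packet(path, peer, payload):
    # Frame: 1-byte peer id, 4-byte little-endian payload size, raw payload.
    # Appending to local storage happens outside the network flow, so the
    # game session itself is not slowed down.
    with open(path, "ab") as f:
        f.write(struct.pack("<BI", peer, len(payload)))
        f.write(payload)

dump_packet("session.bin", peer=2, payload=b"\x01\x07\x00\x00\x00")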

4.2 Parsing

Once it was possible to extract the network packets, a proper way of parsing the extracted data was needed in order to progress toward a solution for the respective data analysis. There were many concerns in this aspect, given that the extracted data is in binary format and a certain structure of the data must be defined to have a proper parsing solution. In this respect, iterative analysis together with Crackshell was conducted to first define the needed data structure and also to understand their needs and wishes for the implementation of the parsing step.

4.2.1 Workshops of Data

Together with Crackshell, a study on a proper definition of the data structure was conducted. These workshops focused on analyzing the currently implemented event system of the game, to carve the bigger network packets into the game's related events. These data structures were put into a hierarchical tree system to further understand where each structure stands in an overview. This way, it was possible to figure out how to process the extracted data, where the parsing order, patterns, and the respective parameters were taken into consideration.

4.2.2 Network Packet

Figure 4.4: The Structure of a Typical Outgoing Network Packet

Figure 4.5: The Structure of a Typical Incoming Network Packet

Once the overall structure of the extracted data had been carved out, the top-level structure came in the form of an encapsulating data packet that serves the function of clustering many types of game events together and sending them to the other peers. From this finding, the decision was made to call these packets "network packets". As can be seen in the figures above, there are two types of network packets, one classified as the outgoing packet and the other as the incoming packet. On common ground, both of them have the same purpose of encapsulating and providing the respective game events for the concerned peer. The only difference comes in the form of delivery type, where the type of delivery, such as reliable/unreliable, is taken into account for outgoing messages. Once the packet is received on the other end, the delivery type plays little role for either the functionality or the analysis. Crackshell already decides how each event should be sent when it comes to reliability, which affects the related packet as a whole. As can be seen in both figure 4.4 and figure 4.5, these packets hold three common parameters. The first is the PeerTo/PeerFrom parameter, which is used for identifying the respective sender or receiver. This parameter is very small, covering only a byte of space in each packet. Another common parameter is the Data Size parameter, which is used to tell

the respective game client how to process the encapsulated game events, taking the encapsulated data size into consideration so that the processing of the data can be handled correctly. Finally, there is the Data parameter of the network packet, which is used most of the time as a wrapper for the clustered game events, which are explained further in the next subsection. Additionally, one should also expect that in some cases network packets are either not clustered optimally or hold only a single game event.
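A hedged sketch of walking this envelope structure is shown below: each packet contributes its one-byte PeerTo/PeerFrom id and a Data Size field telling the reader how many payload bytes to consume. The 4-byte little-endian size field matches the framing assumed in the earlier dump sketch and is not Crackshell's actual field width:

import struct

HEADER = "<BI"  # assumed: 1-byte peer id + 4-byte little-endian data size
HEADER_LEN = struct.calcsize(HEADER)

def iter_packets(blob):
    # Yield (peer, data) pairs, where data wraps the clustered game events.
    offset = 0
    while offset < len(blob):
        peer, size = struct.unpack_from(HEADER, blob, offset)
        offset += HEADER_LEN
        yield peer, blob[offset:offset + size]
        offset += size

with open("session.bin", "rb") as f:
    for peer, data in iter_packets(f.read()):
        print(f"peer={peer}, payload={len(data)} bytes")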

4.2.3 Game Message

Figure 4.6: The Structure of a Typical Game Related Message Data

The parsing of the Data parameter of the network packets was a crucial step, given that these data structures are what makes the game function, and they are quite dynamic and loosely shaped, a conclusion drawn from the information received from Crackshell during the workshops. These structures are called Game Messages in this report and in both implementation designs, to provide clarity when referring to them throughout the thesis. Game messages come in many shapes and forms, while also respecting the general structure showcased in figure 4.6, where the Message parameter provides the type of game event the respective game message holds. This parameter accounts for the main difference, which stems from the Params parameter, depending on the type of the game message. As the analysis of the structure continued, a further compilation of the currently developed and available message types, together with their respective parameter information, was provided by Crackshell. This information further helped with understanding how one can parse these loosely shaped structures and what kinds of concerns should be taken into consideration when doing so.

4.2.4 Solution

After the compilation of the available game messages and the hierarchical structure of the extracted game data, the process of implementing a parsing solution was much easier. This also showcases the importance of understanding the problem properly before coming up with a solution right away. In this regard, a script to handle the parsing of the extracted data was designed and implemented in Python, using a buffer to read the binary file. Once the file was readable, a design for how the network packets should be parsed was implemented, where the buffer was read according to the size information provided by the respective network packet. Once a network packet was parsed, it was time for the encapsulated game messages, where a pattern matching technique was used for identifying the type of game event provided by each game message, in order to figure out the related parameters of the game event inside the game message. This technique provided an easy way to parse the whole game message in a correct manner. The values of each related parameter were, however, a challenge to parse in Python, given that the parameters are C/C++ typed variables in a usual game client. This problem was solved by using the struct package [20] from the standard library, defining the expected structure of each variable according to the corresponding C/C++ types.
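The per-type pattern matching can be sketched as below: the Message byte selects a struct format string describing that event's C/C++-typed parameters. The message ids, names, and formats here are invented for illustration; Crackshell's actual message table is the real source of truth:

import struct

# Hypothetical message table: id -> (name, struct format for the params).
MESSAGE_FORMATS = {
    0x01: ("UnitPosition", "<Iff"),  # uint32 unit id, float x, float y
    0x02: ("UnitHealth",   "<IH"),   # uint32 unit id, uint16 hit points
}

def parse_messages(data):
    # Walk the Data parameter of a network packet, one game message at a time.
    offset, messages = 0, []
    while offset < len(data):
        msg_type = data[offset]
        offset += 1
        name, fmt = MESSAGE_FORMATS[msg_type]
        params = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        messages.append({"message": name, "params": params})
    return messages

sample = b"\x01" + struct.pack("<Iff", 7, 12.5, 7.25)
print(parse_messages(sample))  # [{'message': 'UnitPosition', 'params': (7, 12.5, 7.25)}]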

4.3 Data Storage

After the parsing stage, the data was available in memory only, which means it would have to be parsed again whenever it was needed for analysis. This can work for small data and quick analysis, but it would contradict the goals of this project, as the proposed tooling should be able to work with large data sets and allow detailed analysis that accommodates data growing over time. This section focuses on how the parsed data is stored so that these needs can be fulfilled efficiently.

4.3.1 SQL

When it comes to database options, one of the oldest paradigms available is SQL, which stands for "Structured Query Language" [21]. As mentioned by Microsoft [21], the language arose from a demand by users of database management systems for an easy-to-operate, high-level language. From this development, many SQL-based database solutions emerged, which made the language and its paradigm popular, with good community support. When it comes to data science, according to Marin Fotache and Catalin Strimbei [22], the algebraic nature of the query language provides a vast pool of features that can be used for the analysis of big data. However, it is also stated that these features become efficient and easy to use only through certain enterprise products, which makes it harder for small groups to invest in and work with such solutions effectively [22]. In other words, the cost of adoption would not pay off for this project's purpose, given the needs and circumstances Crackshell defined for the proposed tooling service. Furthermore, Fotache and Strimbei also state that the design of such a service comes with significant overhead, since the stored data needs to be structured in a relational and strict way, leaving less room for flexibility [22]. This implies that the loosely-shaped nature of the data structures parsed in the previous section could not be accommodated easily, and could also lead to costly maintenance as the game keeps developing and new types of game messages are added. After further analysis and workshops together with Crackshell, the overhead of designing structured data storage in a SQL-based database came to look like a costly decision, which led the project to consider other options. As mentioned by Fotache and Strimbei [22], solutions were already being developed under a newer paradigm, NoSQL, which caught the team's attention: a performant, free, and easy-to-implement solution with support for high-volume data sounded appealing.

4.3.2 NoSQL

NoSQL is not brand new, but it is a young paradigm compared to SQL, and according to a study made at KIIT University [23], the need for flexibility stemming from emerging big data processing became a significant motivation for the paradigm. Furthermore, it is also stated that NoSQL helps with the scaling and efficiency problems experienced by big cloud vendors [23]. There are many options to choose from when it comes to data storage. As stated in the International Journal of Database Management Systems [24], there are four main strategies for storing data in NoSQL-based database solutions, some of which even provide schema-less structures. They are summarised below.

The key-value strategy can be summarised as having the data tied to a unique key in a schema-less structure, where the data can be accessed through the unique keys without any conflict [24]. It is also an efficient solution, as stated in the comparative study [24].

The document concept resembles having JSON objects stored independently of each other in the database. The comparative study notes that this concept is quite similar to relational databases, but differs in the power of a schema-less structure [24]. This would allow the project to store its loosely-shaped data structures while still benefiting from some top-level database structure and from efficiency comparable to the key-value strategy mentioned above.

The column-based strategy, as mentioned in the comparative study [24], has a significant performance benefit achieved through a scalable architecture and a structured database. This strategy seems to work well for cloud services with high demands on access and performance, but in this project's case a structured database would not work well and the gained performance could not be utilized as intended.

The graph strategy, with its directed nature, can provide good performance with strongly related database structures, which the comparative study [24] illustrates by comparison to the recursive joins of a typical relational database solution. The proposed tooling would not need a strategy that works heavily with relational data, so no significant performance gain would be realized; furthermore, the design and implementation overhead of this strategy could hinder the progress of the tooling without providing a benefit in this context.

After comparing the above storage methods, the decision was made to move forward with the document strategy.
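As a toy illustration of the difference between the key-value and document strategies, the snippet below holds the same game message first as an opaque key-value entry and then as a self-describing document; all names and values are illustrative.

Key-Value Versus Document Representation (illustrative)

# Key-value: the value is an opaque blob, reachable only through its key.
kv_store = {
    "session1:packet42:message7": b"\x02\x00\x0e...",
}

# Document: a schema-less, self-describing record whose fields may vary
# freely from one message type to another.
document = {
    "message_type": "PlayerMove",
    "parameters": [104.5, 88.25, 3],
    "packet_id": 42,
}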

4.3.3 MongoDB

When it came to concrete database technologies, two main candidates were considered in this project's context: MySQL, a relational SQL solution, and MongoDB, a NoSQL document store. The project further analyzed these options to choose a technology that could accommodate the intended storage solution effectively. According to a study concerning the differences between MySQL and MongoDB [25], MySQL has to operate on static data that follows a fixed structure, while MongoDB makes it possible to easily implement a document-based data storage strategy. Furthermore, according to the International Journal of Database Management Systems [24], MongoDB is a popular choice in the developer community. This implies that moving forward with MongoDB could provide further advantages in community support, documentation, and compatibility with other modules of the proposed tooling. A performance analysis in the comparison study [25] also showed that MongoDB is overall faster for both read and insert operations.

Once the decision was made for MongoDB, a design analysis was conducted on how to architect a loose structure for storing the parsed data structures described in the previous section. A decision was made to have a document for every major data structure. One concern here was performance, especially when reading the stored documents. One option was to store the game messages as an array of objects inside each network packet document, but the analysis deemed this a solution that could cause performance loss: the nested encapsulation of game messages inside network packet documents would slow down read operations as the data grew. Because of this concern, two loosely-structured major document types were chosen, one being the network packets and the other being the game messages. Game messages point to their network packets, but network packets do not need to know anything about the game messages they originally encapsulated. Network packets are thus used as extra metadata when needed, a design that helped overall performance thanks to the independent nature of document storage. Below, the loosely defined document structure for each major data structure is shown, matching the parsing script explained in the previous section.

Network Packet Structure

{
  packet_id: used to distinguish packets
  peer: respective peer id this packet is received from or sent to in a given session
  delivery_type: respective delivery type
  data_size: respective total packet size
}

Message Structure

{
  message_type: used to determine the message type
  parameters: respective array of parsed parameters
  packet_id: packet this message belongs to
  packet_size: total packet size this message belongs to
  _id: used to distinguish messages
}
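A minimal storage sketch along these lines, assuming a local MongoDB instance reached through the pymongo driver; the database and collection names are illustrative.

A Python Sketch of the Two-Collection Storage Design

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["hammerwatch_analysis"]

# The packet carries no reference to its messages...
db.packets.insert_one({
    "packet_id": 42,
    "peer": 1,
    "delivery_type": "reliable",
    "data_size": 634,
})

# ...while each message points back to its packet, so messages can be
# read without unpacking nested arrays as the data set grows.
db.messages.insert_one({
    "message_type": "UnitBuffed",
    "parameters": [7, 12],
    "packet_id": 42,
    "packet_size": 634,
})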

Once the extraction, parsing, and storage were designed and implemented, the overall process so far can be summarised by the pseudo-code below. The choices made up to this point resulted in a process that is easy to implement and use, which is an attribute Crackshell was looking for when requesting this tooling service. A condensed Python sketch of the same loop follows the listing.

Pseudo Code for the Final Data Import Solution

1. Open a buffer for the file holding the binary data
2. Start reading from the buffer
3. Parse until a pattern of a Packet is met
4. Parse the respective Messages inside the Packet
5. Send the Packet separately for storage
6. Send the Messages separately for storage
7. Iterate from step 2 until end of file
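Under the same assumptions as the sketches above (the illustrative read_network_packet helper, a hypothetical parse_messages helper for the Data payload, and the pymongo collections), the loop can be condensed as follows.

A Python Sketch of the Import Loop

from pymongo import MongoClient

def import_dump(path, parse_messages):
    db = MongoClient("mongodb://localhost:27017/")["hammerwatch_analysis"]
    with open(path, "rb") as f:
        buf = f.read()
    offset, packet_id = 0, 0
    while offset < len(buf):                                  # step 7: until end of file
        packet, offset = read_network_packet(buf, offset)     # steps 2-3
        packet["packet_id"] = packet_id
        messages = parse_messages(packet.pop("data"), packet_id)  # step 4
        db.packets.insert_one(packet)                         # step 5
        if messages:
            db.messages.insert_many(messages)                 # step 6
        packet_id += 1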

4.4 Environment

Good data extraction and storage is by itself not enough when the aim is to analyze large chunks of data that are both interconnected and dynamic. An improvement in this regard is to provide an environment that can be extended with needed features and is also easy to maintain. This section covers both the steps of choosing an environment and a solution compatible with the decisions proposed so far.

4.4.1 SQL Analysis

Even though the decision to continue with NoSQL was presented in the previous section, it is worth pointing out that SQL-based analysis tooling was also a studied option during the design stage. This subsection showcases the findings in a manner related to the thesis question. An approach implemented quickly on top of a plain SQL environment could redeem itself by utilizing the power of raw queries for analyzing the data. This approach would not need any further design and could therefore be called a fast and easy way of providing tooling. However, in order to write queries that are useful and produce good insights, one needs considerable experience in formulating them. Another finding to consider was the need for an existing fundamental understanding of the stored data: whoever analyzes the data must already know it well, which was not something Crackshell was interested in, the reasoning being that the analysis should be easy to operate effectively. An additional finding concerned the speed of utilization: the mistakes and re-iterations of queries could be mitigated by a tool that automates the fundamental stages of the analysis. When the need to look at the data as if one had no prior idea about it became apparent, relying on SQL queries alone risked missing potentially important aspects of the stored data, since insights would be produced only by user-generated queries.

4.4.2 Apache Kafka

Another environment that was considered, but later deemed inapplicable in relation to the goals of this project, was Apache Kafka. According to the documentation by the Apache Software Foundation [26], Apache Kafka can be summarised as a platform providing distributed streaming solutions that can be used for messaging, storage, and the processing of streams of data. The idea could have been interesting if the project were proceeding with cloud options, since Apache Kafka could cover all three stages presented so far as well as the analysis of the data. It would have been an applicable solution if the data were analyzed in real time, but in this project's context, and given the needs of Crackshell, the aim is to aggregate the game data for later analysis. In other words, if there were a need for real-time logging from which Crackshell could benefit, the advantage of data stream processing would be useful. In Crackshell's case, they want a tooling service that provides both ease of use and the flexibility to analyze previously aggregated data, looking for answers to issues such as performance and possible bugs in a game session's network layer. Given these findings, the overhead of implementing a tooling service through Apache Kafka no longer looked applicable, so the project proceeded to look at other options.

4.4.3 R

The goal of providing data analysis tooling led the project to study the R language, which is very popular in this domain as a language specifically focused on statistical data analysis. At this stage of the project, R looked very promising with its default features. As mentioned by Mateusz Staniak and Przemyslaw Biecek [27], the exploratory data analysis desired for the proposed tooling service can be provided through external packages, which made the language even more interesting for this project. In their article, Staniak and Biecek [27] focus on the exploratory phase of data analysis and the available packages that can further ease the process. The vast choice of packages made R a strong candidate, prompting a further analysis of how R performs in comparison to other languages available for easy data analysis.

One of the studies, from Valparaiso University [28], showcases the advantages and disadvantages of R over other languages suitable for data analysis. From this comparison, it was interesting to see that R left quite a positive overall impression, but that its ease of adoption by newcomers and its popularity were rated quite low in comparison to languages such as Python [28]. This was important for the design of the tooling service, as ease of use for newcomers was an important factor, both for the providers of the tooling service, who should have an easy time implementing the solution, and for Crackshell, who should be able to utilize the tooling easily. The lower popularity of R compared to Python also raised the concern of whether the available packages would be updated frequently and what the community support would be like. Furthermore, if Crackshell were to hire people for data analysis, the lower popularity could make it harder to find developers. Another thing to consider was the experience of the team working on the design and implementation of the tooling: continuing with R could have led the team to focus on figuring out the workings of R rather than delivering a solution within the time constraints. Together with the above reasoning and the team's experience, the decision was made to skip R, a decision Crackshell also supported when considering the maintenance of the tooling.

4.4.4 Python

Even though the study on R did not result in using R for the tooling, it led to the interesting finding of Python for data analysis, prompting a study of Python and its extensions to determine whether it would prove applicable. Python is a general-purpose language that is both easy to learn and to utilize [29]. Its ease of use, and the community support that follows from its open-source nature, make Python usable in many different areas of the software world. This became evident once the team wrote small scripts to get familiar with the language. The aim of writing these small scripts with different functionalities was to see how the language works, gauge the ease of development, and get a good idea of how the source code could be organized if the decision to continue with Python were made. The syntax was very easy to handle, and the standard system libraries made faster development of the tooling possible. This is because the available system libraries and packages let the developer focus on the actual problem rather than on system or language-related issues, which made the language more attractive as the team stress-tested it with small scripts solving random small problems, such as those used in interviews. Even though this could be regarded as the team's personal viewpoint, the idea from the beginning was to find an environment that is easy to develop in and use, a topic where it is hard to be objective. Still, aligning this sub-study with the main goals of the thesis project made it useful work, given that Crackshell was also interested in having something easy to work with. The results of another comparative study were also taken into consideration, indicating that Python is more popular and comes with many more community-supported packages [30]. These advantages over R were crucial for the ease of development and use of the tooling. To elaborate, the popularity suggests that packages are updated more frequently and that more community guides are available. Furthermore, the possibility of hiring people easily when needed would be appreciated by Crackshell, given that the stated popularity also correlates with the number of developers using the language [28]. Another interesting aspect of Python was the available community support, which has given the language a vast pool of packages usable for data analysis. Additionally, these packages are updated quite frequently by the community, and finding guides was easy compared to packages in other comparable development environments. It is also important to mention that the packages were compatible with each other, which promised further extension of the tooling when needed. After testing the language with small scripts, studying the packages, and weighing the comparative studies, the decision was made to continue with Python for the tooling implementation.

4.4.5 Jupyter Notebook

One of the reasons for choosing Python for the tooling environment was the discovery of Jupyter Notebook during the research on available packages for data analysis. Jupyter Notebook is open-source software that can be used with many different languages and aims to make analysis easy, with features such as version control and interactivity [31]. Easy-to-edit, shareable, interactive notebooks were a good point of interest for the project, as Crackshell was also looking for a solution with which they could easily explore the data. Another aspect that made Python and Jupyter Notebook together look applicable was the ease of reproducing analyses on different systems, as showcased in the report from the Universidade Federal Fluminense on the reproducibility of notebooks [32]. That study also mentions the environment management service Anaconda, described as software that handles the needed data science packages and the notebook-related dependencies [32]. Even though their results show a low rate of reproducibility among notebooks, their interpretation mostly points to documentation-related issues, such as the need to know which packages a given notebook requires [32]. This can be easily mitigated in this project's context: the tooling will be used by Crackshell, and good documentation of the tooling and the notebooks, accessible to the team, would resolve the dependency issues and thereby increase the reproducibility of the notebooks. Furthermore, another study, from the University of California, looked into the effects of Jupyter Notebook on open science, that is, the feasibility of using Jupyter Notebook in a way that benefits community-supported research projects [33]. In this project's context, that study can be interpreted to mean that support for version control and good documentation makes Jupyter Notebook applicable for data analysis in a way that eases teamwork.

4.5 Extensions

The current environment provides an easy-to-use workflow for data analysis, but the features provided by default are not enough, especially considering big data analysis and the questions that should be answerable quickly. This section covers the reasoning, design, and implementation behind the extensions to the proposed tooling that let Crackshell analyze the processed data effectively.

4.5.1 Pandas

When the data is fetched from the MongoDB database into the Jupyter Notebook's Python environment, working with the in-memory data proved harder and less performant than expected when using only pure system libraries and no related packages. The team therefore started to look for options that would make working with in-memory data objects performant and easy. A promising candidate was Pandas, described by NumFocus as a package that provides high-level implementations for data analysis purposes, with its performance-critical parts written in C [34]. What made this package even more interesting was the study by Wes McKinney, who describes it as a robust and performant addition that makes Python more feature-rich and easy to use for in-memory data manipulation, hence for easier data analysis [35]. Fundamental features mentioned by McKinney [35], such as structured data manipulation, indexing, and data models that enable even advanced analysis such as grouping and aggregation, made the package more attractive. A decision was therefore made to use Pandas as the data manipulation package.
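As a small sketch of the kind of aggregation this enabled, the snippet below groups messages by type and computes the size statistics of the sort later reported in table 5.1; the column names mirror the stored documents and the rows are toy data.

A Pandas Sketch of Per-Message-Type Statistics

import pandas as pd

messages = pd.DataFrame([
    {"message_type": "PlayerMove", "packet_size": 14},
    {"message_type": "PlayerMove", "packet_size": 14},
    {"message_type": "UnitBuffed", "packet_size": 634},
    {"message_type": "UnitBuffed", "packet_size": 1196},
])

# Mean/min/max/median/std of packet size per message type...
stats = messages.groupby("message_type")["packet_size"].agg(
    ["mean", "min", "max", "median", "std"])
# ...plus each type's share of all observed messages.
stats["frequency"] = messages["message_type"].value_counts(normalize=True)
print(stats)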

4.5.2 Qgrid

As the prototyping progressed, one aspect Crackshell was not happy about was the notebook rendering of the data frames provided by the pandas package. The problem was the truncated and static way the data was rendered inside the notebook, which made it necessary to modify the code whenever a new question had to be answered. A study to find a solution to this problem led to a package called Qgrid. As described in its API documentation [36], Qgrid is a widget extension for Jupyter Notebook that allows interactive rendering of pandas data frames in a performant way, with on-the-fly scrolling, filtering, and sorting. This description made Qgrid look like a plausible solution for Crackshell's data frame rendering needs. The analysis progressed by adding Qgrid to the tooling and prototyping with it together with Crackshell. Through these prototyping sessions and the received feedback, it was clear to the team that Qgrid solved the problem effectively and that the expectations set by the API documentation [36] were met.
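A minimal usage sketch, assuming the qgrid package is installed in the notebook environment; the data frame is toy data.

A Qgrid Usage Sketch

import pandas as pd
import qgrid

df = pd.DataFrame({"message_type": ["PlayerMove", "UnitBuffed"],
                   "packet_size": [14, 634]})
# show_grid returns a notebook widget with on-the-fly scrolling,
# filtering and sorting; displayed as the last expression of a cell.
qgrid.show_grid(df, show_toolbar=True)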

4.5.3 Pivottablejs

Another feature missing from the default Jupyter Notebook setup in Python was dynamic and interactive graphing, which surfaced as a problem throughout the prototyping and workshop sessions with Crackshell. The feedback was not in favor of having only static graphs that can be updated solely by changing code sections in the notebook, implying a need for interactive rendering of graphs for on-the-fly analysis. A study of possible solutions commenced, focusing on what could be done in compliance with the already implemented components of the Jupyter Notebook. One option came in the form of a package compatible with pandas: the Python version of Pivottablejs, described by Nicolas Kruchten as a solution providing both interactivity and nice-looking graphs that are easy to work with [37]. Further prototyping and workshop sessions were conducted with Crackshell to determine whether the extension fulfilled their needs. As a result, Crackshell was quite happy with the solution, and the implementing team's impression was positive as well, owing to the ease of integrating the extension into the proposed tooling. Furthermore, the package allowed many fundamental data analysis scenarios to be handled without writing any additional code, by simply feeding it the necessary data.
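A minimal usage sketch of the Python wrapper, assuming the pivottablejs package is installed; the data frame is toy data.

A Pivottablejs Usage Sketch

import pandas as pd
from pivottablejs import pivot_ui

df = pd.DataFrame({"message_type": ["PlayerMove", "UnitBuffed", "Ping"],
                   "packet_size": [14, 634, 2]})
# pivot_ui renders an interactive pivot table in the notebook; rows,
# columns and renderers can be rearranged without changing any code.
pivot_ui(df)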

4.6 Distribution and Portability

With the data processed and the environment set, the only remaining question tooling-wise is how to replicate this tooling on many different computer systems. Easy replication makes the proposed tooling accessible and lets Crackshell work more productively. This section covers the steps taken toward the provided solution, together with the considered needs of Crackshell.

4.6.1 Virtualization

One typical solution at hand was classical virtualization, which provides an additional operating system in order to replicate the whole environment. However, given the findings from reports and research studies, the idea of plain virtualization quickly faded. According to a study made at DIT University [38], even though virtualization allows the replication of environments, the overhead it produces is significantly higher compared to other solutions such as containers. This led to a concern about whether this solution would be both performant and easy to utilize, given the many variables that need to be considered in the process [38], and it led the project team to look for other options.

4.6.2 Containers

According to a report from Riga Technical University [39], a recently popular solution for replication is the use of containers, described as lightweight and easy-to-manage process isolation solutions that benefit from kernel features. The same report also mentions the increased popularity and agility of containers compared to older virtualization methods as advantages [39]. This implies that implementation and setup with containers could prove efficient, but an analysis of performance had to be made before moving forward. A group study conducted by IBM, the Georgia Institute of Technology, and Yunnan University showcases the performance implications of containers compared to other virtualization methods [40]. They found that containerization solutions such as Docker proved much faster than conventional virtualization, especially in CPU utilization, startup and execution times, and scaling [40]. These findings led the project to proceed with studying container solutions such as Docker to assess their plausibility as a solution.

4.6.3 Docker

Since the project was aiming for a lightweight solution that is easy to utilize, the findings so far led the project to further prototype with Docker to figure out whether it fulfills the needs of the project. These needs can be summarised as ease of replication and management, without overhead that could result in performance loss. According to Docker Inc. [41], Docker is software that provides a robust and performant infrastructure for containerizing environments so that they can be replicated regardless of the host system's variables and dependencies. This promise aligned with the project's needs at this stage, so the team studied the documentation and found that some orchestration was needed: MongoDB and the Jupyter Notebook should run as separate containers that still work together in their virtual network layer. For this, Docker Compose came to help, described in its documentation as a tool for running many containers together without additional management overhead on the user's side [42]. The tool works with a YAML file used as a configuration script to define the services and their respective settings. At this stage, the configuration and containerization of the services was as simple as the excerpt below.

A Section from the YAML File

...
services:
  tool-jupyter-notebook:
    container_name: tool_notebook
    build: .
    links:
      - mongo:mongo
  tool-mongo:
    container_name: tool_db
    image: mongo
...

Figure 4.7: Proposed Structure of the Services

After implementing the Docker script, the structure of the containerized tool was as showcased in figure 4.7. It was easy to implement and manage. Furthermore, Crackshell's feedback was also positive when it came to replicating and setting up the environment; they mentioned the ease of use, which was the main goal of this subsection. Accordingly, the decision was made to continue with Docker as the containerization solution for replication.

Chapter 5

Results

Once the design and implementation were done, the next step was to analyze how the solution performs. This chapter provides information about the workflow of the proposed solution. Furthermore, a large chunk of extracted data is analyzed with the proposed solution, leading to certain insights that are showcased in this chapter. For a discussion of these results, the reader is referred to the next chapter.

5.1 Workflow

Figure 5.1: An Overview of an Example Workflow

The final version of the tooling can be summarised by the flow showcased in figure 5.1. The design of the tooling aligned quite well with the initial flow desired in the previous chapters. Furthermore, the feedback from Crackshell was positive regarding ease of use and utilization.

The first stage of the flow is to play sessions in a version of the game that supports data extraction, which dumps the desired data in binary form. This data is then processed through the parsing script the project proposed and stored directly in MongoDB. From the extraction up to this stage, everything is automated. When the time for data analysis comes, the Docker setup starts the environment together with a connection to MongoDB, which is then ready for the analysis. With the implemented extensions, the workflow offers a standard way of performing fundamental data analysis, while advanced and more in-depth analysis can be conducted by utilizing the full power of Jupyter Notebook. The next section showcases the project's findings concerning the game sessions and the collected data.

5.2 Data Insights

Figure 5.2: A Datagrid Example for the Processed Data through Qgrid Package

Figure 5.2 can be seen not only as a showcase of the dynamic rendering of data frames provided by the tooling, but also as one of the very first findings about the game made possible by it. The finding concerns one of the most frequently sent message types on average across many game sessions and can be examined further in table 5.1.

PlayerMove messages by themselves made up around 19% of the pool of messages in any given game session. This can initially be thought of as normal, assuming that one of the most frequent events in a game is the movement of player characters, but what is interesting are the values of the PlayerMove messages sent consecutively. Consecutive PlayerMove messages differed by only about one pixel in the character's position on the respective map. This implies that each peer sends its position very aggressively, once per pixel, putting unnecessary extra overhead on the network, especially considering how little difference a single pixel makes to the human eye in a dynamic scene on a high-definition screen. This was an interesting finding for Crackshell as well; they also saw significant room for improvement. Another interesting finding concerned the UnitBuffed messages, where Crackshell was surprised by the results. The reason was the dominating frequency of this message type, around 28%, which was so unexpected that Crackshell considered it a design bug that should be fixed in the game. The concern was that many more items will become available to be buffed as the game keeps developing and players keep fetching more items over time. Continued development of the game could thus lead to significant performance loss, since the number of UnitBuffed messages sent would keep increasing. Another aspect was that the maximum packet size was 1196 bytes while the standard deviation was around 412 bytes, which implies that these messages were clustered rather inconsistently. When it comes to clustering, it was also easy to see that some message types that are not supposed to be clustered, such as Ping messages, were clustered anyway, which made the team question the plausibility of some of these sent messages. Clustering also raised a different concern through the very high standard deviations, which further support the idea that the clustering techniques for messages were not working fully as intended. The clustered nature of certain messages such as SetPetTarget, together with its noticeably high frequency, also led to the concern that a sent event might already be stale before it was even received, given that these messages were found clustered together with each other, with whole clusters consisting of SetPetTarget messages in a single network packet.

Message Type                 Mean        Min  Max   Median  Std         Frequency
UnitBuffed                   650.678712  26   1196  634.0   412.170417  0.281973
PlayerMove                   14.000000   14   14    14.0    0.000000    0.196339
ModifierTriggerEffect        568.587974  28   1196  454.0   386.281475  0.148029
UnitDamagedBySelf            463.288358  22   1196  342.0   364.397758  0.106947
PlayerSyncStats              72.670417   10   1196  10.0    152.637761  0.082900
SetPetTarget                 71.491626   14   1196  24.0    152.197428  0.021407
PlayerStackSkillAdd          328.792483  10   1196  243.0   289.281830  0.016433
UnitKilled                   565.605936  22   1196  454.0   374.889306  0.014958
PlayerActiveSkillActivate    247.614718  20   1196  197.0   205.926176  0.014539
PlayerShareExperience        576.999606  28   1196  468.0   373.613167  0.014200
PlayerActiveSkillDoActivate  242.006697  15   1182  195.0   195.718047  0.014011
PlayerGiveGold               88.167091   6    1196  26.0    182.968221  0.013762
SetLoadedState               46.998633   6    1196  6.0     134.488658  0.012297
Ping                         42.012910   2    1196  2.0     126.719520  0.012258
PlayerSyncExperience         357.544248  14   1196  170.0   401.583257  0.012138
UnitDecimated                326.296973  14   1196  248.0   302.568033  0.011657
PlayerPickups                103.692162  26   1196  52.0    172.899986  0.004449
UnitPicked                   103.692162  26   1196  52.0    172.899986  0.004449
PlayerDamaged                219.903132  22   1196  98.0    264.369429  0.003846
DoSpawnUnitBase              560.781877  42   1196  444.0   365.276721  0.002581
PlayerHealed                 670.390244  20   1196  608.0   382.654232  0.002067
UnitUseSSkill                349.773626  44   1196  244.0   270.634359  0.001274
UnitUseSkill                 351.136264  62   1196  244.0   272.173825  0.001274
SpawnedOwnedUnit             277.768924  48   1196  120.0   270.027293  0.001055
AttachEffect                 217.941333  10   1196  98.0    273.161526  0.001050
BoltShooter                  495.997253  34   1196  475.5   328.074240  0.001020
PlayerTitleModifiers         252.645094  185  413   234.0   55.408632   0.000671
UnitHealed                   159.799065  10   815   110.0   144.432076  0.000599
PlayerGiveUpgrade            86.643505   13   377   20.0    99.421325   0.000464
PlayerGiveItem               237.080702  200  413   233.0   29.424716   0.000399
PlayerStackSkillTake         244.548387  10   982   179.0   228.071853  0.000174
UseUnitSecure                14.946429   6    108   6.0     16.555579   0.000157
PlayerCombo                  150.563636  11   1043  47.0    216.118067  0.000154
PlayerLoadPet                223.191011  43   288   220.0   34.615379   0.000125
SpawnPlayer                  28.391304   28   36    28.0    1.307725    0.000064
ProximityTrapExit            30.555556   6    106   21.0    28.387567   0.000050
ProximityTrapEnter           22.222222   6    106   14.0    27.229303   0.000050
PlayerGiveKey                98.529412   10   556   26.0    165.118917  0.000048
PlayerGiveDrink              376.875000  376  390   376.0   3.443086    0.000045
UnitPickSecure               101.529412  10   908   20.0    219.733395  0.000024
PlayerActiveSkillRelease     53.428571   10   158   34.0    52.608444   0.000020
StartScenarioDownload        6.000000    6    6     6.0     0.000000    0.000010
PlayerShatterActivate        196.666667  134  288   183.0   63.506430   0.000008

Table 5.1: Overview of the Game Message Types per Session by Packet Size and Frequency

Regarding possible caching options, there were also traces of the same strings being sent and received repeatedly within a single session, where the opportunity to implement a caching system, sending a small event message instead of the full string every time, was clearly apparent. This implies that there is room for optimization in how the game dialog texts are used.

[Plot: Average Size (bytes) on the vertical axis against Time (seconds) on the horizontal axis, both ranging from 0 to 120]

Figure 5.3: Average Network Packet Size Over Time

To further understand how the clustering works within a single game session, the tooling was used to check the average network packet size over a session in which the game was played in a similar fashion throughout. One aspect visible in the graph in figure 5.3 is that clustering was far from fully utilized, especially considering that the maximum possible clustered packet size is 1196 bytes. The main proposed explanation was the PlayerMove messages, which affect network utilization significantly, as this graph further demonstrates. That said, the average sizes over time do not look promising even when the PlayerMove messages are disregarded. One should therefore analyze the implementation of the clustering algorithm in the game engine, but that is an optimization task out of scope for this project. Another concern that emerged from this analysis was that around 60% of the available message types were never observed. The explanation from Crackshell was that most of those messages are level-specific, such as the defeat of a level, or so specific that they only occur at certain moments, which explains why the gathered session data did not cover those message types. This was not seen as a problem by Crackshell, given that they are more concerned with the most frequently sent messages, as those typically generate most of the overhead in any given game session; those are therefore a good starting point for the optimization opportunities mentioned above.

Chapter 6

Conclusion

This chapter concludes the thesis report by providing a comprehensive summary of the work together with the writers' reflections and possible future work.

6.1 Limitations

When it comes to the design and implementation of the tooling service, one can argue that the derived results are significantly shaped by the needs of Crackshell, which can be an issue if another indie studio tries to adopt these solutions directly. Each case is unique when solutions are being provided, so the way of designing and thinking showcased here should be adopted rather than the results replicated outright. The design approach can help derive solutions in other cases, but taking the proposition here as a pure recipe may prove unhelpful, depending on how the needs differ from Crackshell's. Another aspect to consider is that measures were taken in relation to the thesis team's preferences and the limited time constraints, which led to decisions that looked most suitable in the given circumstances. One should keep in mind that the decisions taken were also affected by the requirements of the project together with the efficiency of development.

6.2 Future work

One aspect that would be interesting and helpful to look into is the possibility of incorporating machine learning solutions into the proposed tooling in a way that helps automate data exploration by detecting certain patterns. This could potentially cut down the time spent exploring the data significantly, assuming the findings are accompanied by an equally automated, clear presentation. More automation in this respect could make the utilization of the tooling more effective. The current workflow allows comprehensive data analysis only if the person utilizing the tool has enough knowledge and experience, which can become a problem over time if the developers working with the tooling service only want to see the results. In this regard, a solution that automates these steps could be helpful.

6.3 Ethics and Sustainability

Through the network optimization that motivated the proposed tooling, the studio aims to utilize its available network bandwidth more efficiently. Further savings on energy and related costs can thus help both Crackshell and the environment in a significant way. These improvements also help the future development of the game, much as a well-chosen algorithm's complexity does, and hence lead to more performant game sessions, an increased player base, and a longer lifetime for the game. The data gathered through the proposed tooling contains no player-specific information; in other words, it can be presented as anonymous data. The tooling service thus complies with player privacy, which covers the ethical aspects of the data gathering. In this regard, the proposed tooling service raises no ethical concerns and might even help players have a better gaming experience through the above-mentioned optimization possibilities.

6.4 Reflections

Through this project, we got to test many tooling solutions and had the chance to analyze large amounts of game data with the methodologies showcased in the previous chapters. This furthered our understanding of the underlying topics, letting us both view the needed components from a top-level design perspective and conduct focused individual work on each component. Through this opportunity, we were also able to help Crackshell with the proposed tooling, and our strategy for designing together with clients improved accordingly. Another aspect to consider was the path taken to overcome an unknown and novel challenge, a process that was both interesting and demanding of robust problem-solving abilities.

6.5 Verdict

Crackshell was satisfied with the outcome of the tooling service, which supports the validity of the decisions taken. This also implies that the goals of the thesis project have been achieved, leaving Crackshell with a good foundation that can both be utilized for data analysis and extended for future needs, the future work above being one example. This is a good outcome, since Crackshell will use the provided tooling and continue extending it as the game and the game engine keep developing.

References

[1] Crackshell. (accessed: 2020-03-19) Heroes of hammerwatch. [Online]. Available: http://www.heroesofhammerwatch.com/

[2] U. Technologies. (accessed: 2020-04-01) Game engines—how do they work? [Online]. Available: https://unity3d.com/what-is-a-game-engine

[3] ——. (accessed: 2020-04-01) Unity game engine. [Online]. Available: https://unity.com/

[4] I. Epic Games. (accessed: 2020-04-01) Unreal engine. [Online]. Available: https://www.unrealengine.com/en-US/

[5] U. Technologies. (accessed: 2020-04-01) Made with unity. [Online]. Available: https://unity.com/madewith

[6] C. U. Press. (accessed: 2020-04-01) Case study. [Online]. Available: https://dictionary.cambridge.org/dictionary/english/case-study

[7] U. of Southern California. (accessed: 2020-04-01) How to approach writing a case study research paper. [Online]. Available: https://libguides.usc.edu/writingguide/casestudy

[8] I. Cloudflare. (accessed: 2020-04-01) What is the osi model? [Online]. Available: https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/

[9] Steam. (accessed: 2020-04-01) Steam matchmaking lobbies. [Online]. Available: https://partner.steamgames.com/doc/features/multiplayer/matchmaking

[10] Microsoft. (accessed: 2020-04-01) Database design basics. [Online]. Available: https://support.office.com/en-us/article/database-design-basics-eb2159cf-1e30-401a-8084-bd4f9c9ca1f5

[11] U. S. O. of Research Integrity. (accessed: 2020-04-01) Data analysis. [Online]. Available: https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/datopic.html

[12] J. Dillard. (accessed: 2020-04-01) The data analysis process: 5 steps to better decision making. [Online]. Available: https://www.bigskyassociates.com/blog/bid/372186/The-Data-Analysis-Process-5-Steps-To-Better-Decision-Making

[13] I. Red Hat. (accessed: 2020-04-01) What is virtualization? [Online]. Available: https://www.redhat.com/en/topics/virtualization

[14] IBM. (accessed: 2020-04-01) What is containerization? [Online]. Available: https://www.ibm.com/cloud/learn/containerization

[15] I. D. Foundation. (accessed: 2020-04-02) What is design thinking? [Online]. Available: https://www.interaction-design.org/literature/topics/design-thinking

[16] S. Gibbons. (accessed: 2020-03-25) Ux stories communicate designs. [Online]. Available: https://www.nngroup.com/articles/ux-stories/

[17] AWS. (accessed: 2020-03-25) What is continuous delivery? [Online]. Available: https://aws.amazon.com/devops/continuous-delivery/

[18] P. Srivastava and R. Khan, “A review paper on cloud computing,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 8, p. 17, 06 2018. doi: 10.23956/ijarcsse.v8i6.711

[19] M. Viggiato, R. Terra, H. Rocha, M. Valente, and E. Figueiredo, “Microservices in practice: A survey study,” 09 2018.

[20] Python. (accessed: 2020-03-20) Struct. [Online]. Available: https://docs.python.org/2/library/struct.html

[21] Microsoft. (accessed: 2020-03-21) Structured query language (sql). [Online]. Available: https://docs.microsoft.com/en-us/sql/odbc/reference/structured-query-language-sql?redirectedfrom=MSDN&view=sql--ver15

[22] M. Fotache and C. Strimbei. (accessed: 2020-03-21) Sql and data analysis. some implications for data analysts and higher education. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2212567115000714

[23] B. Sethi, S. Mishra, and P. K. Patnaik. (accessed: 2020-03-21) A study of nosql database. [Online]. Available: https://www.ijert.org/research/a-study-of-nosql-database-IJERTV3IS041265.pdf

[24] M. V, “Comparative study of nosql document, column store databases and evaluation of cassandra,” International Journal of Database Management Systems, vol. 6, pp. 11–26, 08 2014. doi: 10.5121/ijdms.2014.6402

[25] C. Győrödi, R. Gyorodi, G. Pecherle, and A. Olah, “A comparative study: Mongodb vs. mysql,” 06 2015. doi: 10.13140/RG.2.1.1226.7685

[26] A. S. Foundation. (accessed: 2020-03-21) Introduction. [Online]. Available: https://kafka.apache.org/intro

[27] M. Staniak and P. Biecek, “The landscape of r packages for automated exploratory data analysis,” 03 2019. doi: 10.32614/RJ-2019-033

[28] C. Ozgur, T. Colliau, G. Rogers, Z. Hughes, and E. B. Myer-Tyson. (accessed: 2020-03-21) Matlab vs. python vs. r. [Online]. Available: http://www.jds-online.com/files/150%E5%AE%8C%E6%88%90V.pdf

[29] P. S. Foundation. (accessed: 2020-03-22) Python. [Online]. Available: https://www.python.org/about/

[30] J. Brittain, M. Cendon, J. Nizzi, and J. Pleis. (accessed: 2020-03-21) Data scientist's analysis toolbox: Comparison of python, r, and sas performance. [Online]. Available: https://scholar.smu.edu/cgi/viewcontent.cgi?article=1021&context=datasciencereview

[31] P. Jupyter. (accessed: 2020-03-22) Jupyter notebook. [Online]. Available: https://jupyter.org/index.html

[32] J. F. Pimentel, L. Murta, V. Braganholo, and J. Freire. (accessed: 2020-03-22) A large-scale study about quality and reproducibility of jupyter notebooks. [Online]. Available: http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf

[33] B. M. Randles, I. V. Pasquetto, M. S. Golshan, and C. L. Borgman. (accessed: 2020-03-22) Using the jupyter notebook as a tool for open science: An empirical study. [Online]. Available: https://arxiv.org/pdf/1804.05492.pdf

[34] NumFocus. (accessed: 2020-03-22) About pandas. [Online]. Available: https://pandas.pydata.org/about/index.html

[35] W. McKinney, "pandas: a foundational python library for data analysis and statistics," Python for High Performance and Scientific Computing, 01 2011.

[36] Quantopian. (accessed: 2020-03-22) Qgrid api documentation. [Online]. Available: https://qgrid.readthedocs.io/en/latest/

[37] N. Kruchten. (accessed: 2020-03-22) Pivottable.js. [Online]. Available: https://pivottable.js.org/examples/

[38] A. Yadav, M. Garg, and R. Mehra, Docker Containers Versus Virtual Machine-Based Virtualization: Proceedings of IEMIS 2018, Volume 3, 01 2019, pp. 141–150. ISBN 978-981-13-1500-8

[39] V. Silva, M. Kirikova, and G. Alksnis, “Containers for virtualization: An overview,” Applied Computer Systems, vol. 23, pp. 21–27, 05 2018. doi: 10.2478/acss-2018-0003

[40] Q. Zhang, L. Liu, C. Pu, Q. Dou, L. Wu, and W. Zhou. (accessed: 2020-03-22) A comparative study of containers and virtual machines in big data environment. [Online]. Available: https://arxiv.org/pdf/1807.01842.pdf

[41] D. Inc. (accessed: 2020-03-22) What is a container? [Online]. Available: https://www.docker.com/resources/what-container

[42] ——. (accessed: 2020-03-22) Overview of docker compose. [Online]. Available: https://docs.docker.com/compose/

{ "Author1": { "name": "Murat Eksi"}, "Author2": { "name": "Markus Pihl"}, "Degree": {"Educational program": "Bachelor’s Programme in Information and Communication Technology"}, "Title": { "Main title": "Video Game Network Analysis", "Subtitle": "A Study on Tooling Design", "Language": "eng" }, "Alternative title": { "Main title": "Nätverksanalys för Videospel", "Subtitle": "En Studie om Verktygsdesign", "Language": "swe" }, "Supervisor1": { "name": "Thomas Sjöland" }, "Examiner": { "name": "Johan Montelius", "organisation": {"L1": "School of Electrical Engineering and Computer Science" } }, "Cooperation": { "Partner_name": "Crackshell AB"}, "Other information": { "Year": "2020", "Number of pages": "ix,55"} } TRITA-EECS-EX-2020:98
