
Long Short-Term Memory for detecting DDoS flooding attacks within TensorFlow Implementation framework.

Peter Ken Bediako

Information Security, master's level (120 credits) 2017

Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Master Thesis Project

Long Short-Term Memory Recurrent Neural Network for detecting DDoS flooding attacks within TensorFlow Implementation framework.

Author: Peter Ken Bediako E-mail: [email protected]

Supervisor: Dr. Ali Ismail Awad E-mail: [email protected]

November 2017
Master of Science in Information Security

Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering

Contents

1. Introduction
   1.1 Problem Statement
   1.2 Research Questions
   1.3 Research Goals
   1.4 Delimitation
   1.5 Research Contribution
   1.6 Research Methodology
   1.7 Thesis Outline

2. Background Information
   2.1 Overview of DDoS attack
   2.2 How DDoS operate
   2.3 How DDoS attack happens
   2.4 Types of DDoS attack
   2.5 DDoS flooding attack types
       2.5.1 UDP Flood
       2.5.2 ICMP (Ping) Flood
       2.5.3 TCP SYN Flood
   2.6 Machine learning algorithms in detecting DDoS attacks
   2.7 Deep learning model
   2.8 Recurrent Neural Networks (RNNs)
   2.9 Reasons for choosing LSTM RNN over other techniques
   2.10 The Datasets formatting for deep learning
   2.11 TensorFlow
       2.11.1 TensorFlow data flow graph
       2.11.2 Tensors
       2.11.3 Benefits of using TensorFlow in this thesis work
       2.11.4 Benefits for using TensorBoard in this thesis work

3. Literature Review
   3.1 Defense mechanisms and techniques to detect DDoS attacks
   3.2 Research Gap analysis
   3.3 Improvements to the gaps identified in the existing research works

4. Research Methodology
   4.1 Design Science Research (DSR) Methodology
   4.2 How DSR Methodology is used to address RQ1
   4.3 How DSR Methodology is used to address RQ2

5. Design and Development
   5.1 Design, Develop and Implement LSTM RNN Algorithm
   5.2 Designing the algorithm based on the four layers of LSTM RNN
   5.3 Environment setup for Algorithm Development
   5.4 Design Structure for LSTM RNN technique using TensorFlow API
   5.5 How to access TensorBoard for Training Results

6. Results
   6.1 Results
       6.1.1 Data Collection Phase
       6.1.2 Data Cleaning and Segmenting Phase
       6.1.3 Data Pre-Processing Phase
       6.1.4 LSTM RNN Training and Testing Phase
       6.1.5 Classification of attacks
   6.2 Results from CPU Based environment
       6.2.1 Training the Model
       6.2.2 CPU base system Results - RQ1
       6.2.3 CPU Iteration 1
       6.2.4 CPU Iteration 2
       6.2.5 CPU Iteration 3
       6.2.6 CPU Iteration 4
       6.2.7 CPU Iteration 5
   6.3 Analysis of CPU Based Environment Results - RQ1
       6.3.1 Accuracy and Dataset size Analysis
       6.3.2 Accuracy and Epochs Analysis
       6.3.3 Final and Average Accuracy Analysis
   6.4 Results from GPU Based environment
       6.4.1 GPU Iteration 1
       6.4.2 GPU Iteration 2
       6.4.3 GPU Iteration 3
       6.4.4 GPU Iteration 4
       6.4.5 GPU Iteration 5
   6.5 Analysis of CPU and GPU Based System Results - RQ2
       6.5.1 CPU-GPU Accuracy Analysis
       6.5.2 CPU and GPU Time analysis
       6.5.3 CPU and GPU Epoch analysis
       6.5.4 Epochs Analysis on GPU Systems Training Time

7. Discussion

8. Conclusion and Future Works
   8.1 Conclusion
   8.2 Future Works

List of Figures

1.1 AI model training and testing process overview

2.1 DDoS common network and multi-vector attack surface [3]
2.2 Average peak bandwidth for DDoS attacks [3]
2.3 DDoS attack network infrastructure illustration
2.4 Classification of DDoS attack types
2.5 The most common DDoS attack types in Q2 2016 [3]
2.6 Illustration of the UDP attack process
2.7 TCP SYN flood process
2.8 Machine learning techniques [46]
2.9 Deep neural architecture
2.10 Recurrent Neural Network model
2.11 RNNs folded and unfolded states [48]
2.12 TensorFlow data flow graph [48]
2.13 The vivisection of a tensor [48]

4.1 DSR Methodology Model [20]

5.1 The four interacting repeating modules of LSTM RNN [54]
5.2 LSTM RNN algorithm design architecture
5.3 Sample TensorFlow code for layer 1 of the LSTM RNN algorithm
5.4 CPU and GPU base system environment setup process
5.5 TensorFlow algorithm architecture
5.6 How to access TensorBoard
5.7 TensorBoard graphs

6.1 ISCX link to download the dataset
6.2 3D embedding visualizer of a 2000-sample data size with 38 features from 23 different attack types, represented by the coded serial numbers of the metadata created for the sample size
6.3 3D embedding visualizer of a 2000-sample data size with 38 features from 23 different attack types, represented by the coded labels of the metadata created for the sample size
6.4 Iteration process adapted to increase the efficiency of the LSTM RNN model
6.5 Variable Explorer values for iteration 1
6.6 CPU results for iteration 1 based on a 2000 dataset size and 100 epochs
6.7 Graph results of iteration 1 based on 100 epochs and a 2000 dataset size
6.8 CPU results for iteration 2 based on a 5000 dataset size and 200 epochs
6.9 Graph results of iteration 2 based on 200 epochs and a 5000 dataset size
6.10 Variable Explorer values for iteration 3
6.11 CPU results for iteration 3 based on a 10000 dataset size and 300 epochs
6.12 Graph results of iteration 3 based on 300 epochs and a 10000 dataset size
6.13 CPU results for iteration 4 based on a 15000 dataset size and 400 epochs
6.14 Graph results of iteration 4 based on 400 epochs and a 15000 dataset size
6.15 CPU results for iteration 5 based on a 20000 dataset size and 500 epochs
6.16 Graph results of iteration 5 based on 500 epochs and a 20000 dataset size
6.17 Relations among the five iteration processes
6.18 Accuracy and dataset size analysis
6.19 Accuracy and epochs analysis
6.20 Final and average accuracy analysis
6.21 GPU results for iteration 1 based on 100 epochs and a 2000 dataset size
6.22 Graph of iteration 1 GPU results based on 100 epochs and a 2000 dataset size
6.23 GPU results for iteration 2 based on 200 epochs and a 5000 dataset size
6.24 Graph of iteration 2 GPU results based on 200 epochs and a 5000 dataset size
6.25 GPU results for iteration 3 based on 300 epochs and a 10000 dataset size
6.26 Graph of iteration 3 GPU results based on 300 epochs and a 10000 dataset size
6.27 GPU results for iteration 4 based on 400 epochs and a 15000 dataset size
6.28 Graph of iteration 4 GPU results based on 400 epochs and a 15000 dataset size
6.29 GPU results for iteration 5 based on 500 epochs and a 20000 dataset size
6.30 Graph of iteration 5 GPU results based on 500 epochs and a 20000 dataset size
6.31 CPU-GPU accuracy analysis
6.32 CPU-GPU base system time analysis
6.33 CPU-GPU base system epochs analysis
6.34 Constant and varied epochs analysis on the GPU base system

List of Tables

2.1 Difference between machine learning and deep learning

3.1 Summary of deep learning techniques
3.2 Summary of deep learning techniques
3.3 Performance comparison of LSTM and Random Forest [55]

4.1 Evaluation parameters for the iteration process

6.1 The 22 attack types of the NSL-KDD dataset
6.2 The 42 features of the NSL-KDD dataset
6.3 CPU Iteration 1 parameters
6.4 CPU Iteration 2 parameters
6.5 CPU Iteration 3 parameters
6.6 CPU Iteration 4 parameters
6.7 CPU Iteration 5 parameters
6.8 Model training results for the five (5) iterations
6.9 Relations between iterations
6.10 Accuracy and dataset analysis
6.11 Accuracy and epochs analysis
6.12 Final and average accuracy analysis
6.13 GPU evaluation parameters
6.14 GPU Iteration 1 parameters
6.15 GPU Iteration 2 parameters
6.16 GPU Iteration 3 parameters
6.17 GPU Iteration 4 parameters
6.18 GPU Iteration 5 parameters
6.19 Summary of GPU and CPU evaluation parameters and training results
6.20 CPU and GPU time results
6.21 CPU and GPU epochs results
6.22 CPU and GPU results with varying dataset and constant epochs
6.23 Epoch analysis on GPU systems results
6.24 Performance comparison of LSTM, Random Forest, and project results

Acronyms

AI Artificial Intelligence
ANN Artificial Neural Network
BPTT Back Propagation Through Time
BW-DDoS Bandwidth DDoS
CIS critical internet sites
CPU Central Processing Unit
DARPA Defense Advanced Research Projects Agency
DDoS Distributed Denial of Service
DNS Domain Name Service
DSRM Design Science Research Methodology
GPU Graphics Processing Unit
HMM Hidden Markov Model
ICMP Internet Control Message Protocol
IDS Intrusion Detection System
IoT Internet of Things
LSTM Long Short-Term Memory
ML Machine Learning
OS Operating System
OSI Open System Interconnection
r-RNN Recurrent Random Neural Networks
RNN Recurrent Neural Networks
SL Supervised Learning
TCP Transport Control Protocol
UDP User Datagram Protocol

Acknowledgements

My heartfelt gratitude goes to the Almighty God, Jehovah, for the wisdom and strength given to me from the conception of this programme to the completion of this work.

Secondly, my sincere thanks go to my supervisor, Dr. Ali Ismail Awad, for his patience and insightful contributions at every stage of the work. His desire for my success made him grant me remote access to his personal GPU system for use in the experimental aspect of this project. I am very grateful, Sir!

I am also grateful to my research opponents Mr. Mikel Izagirre, Marcus Hufvudsson and Andreas Schmoll, for their contributions and suggestions during and after the thesis seminars.

To Dr. Arash Habibi Lashkari from the Canadian Institute for Cybersecurity (CIC), University of New Brunswick, I say thank you for assisting in getting the dataset used for this project.

Finally, I say a big thank you to Mr. Richard Nyarko for his encouragement during the difficult moments.

Dedication

To my lovely wife Margaret Amisare and children Felicity, Peter Ken Jnr. and Perry Ken.

Abstract

Distributed Denial of Service (DDoS) attacks are among the most widespread security attacks against internet service providers. They are among the easiest attacks to launch, yet very difficult and expensive to detect and mitigate. In view of the devastating effect of DDoS attacks, there has been increasing adoption of network detection techniques that reveal the presence of a DDoS attack before a huge traffic buildup can prevent service availability.

Several works on DDoS attack detection reveal that conventional detection methods based on statistical divergence are useful; however, the large surface area of the internet, which serves as the main conduit for DDoS flooding attacks, makes it difficult to use this approach to detect attacks on the network. Hence this research work focuses on detection based on a deep learning technique, because deep learning has proven to be the most effective detection technique against DDoS attacks.

Out of the several deep neural network techniques available, this research focuses on one type of recurrent neural network, called Long Short-Term Memory (LSTM), and on the TensorFlow framework, to build and train a deep neural network model that detects the presence of DDoS attacks on a network. This model can be used to develop an Intrusion Detection System (IDS) to aid in detecting DDoS attacks. The expectation at the completion of this project is that the produced model will have a high detection accuracy rate and a low false alarm rate.

Design Science Research Methodology (DSRM) was used to carry out this project. The test experiment for this work was performed on CPU- and GPU-based systems to determine the base system's effect on the detection accuracy of the model. To achieve the set goals, seven evaluation parameters were used to test the model's detection accuracy and performance on both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) systems.

The results reveal that the model produced a detection accuracy of 99.968% on both CPU- and GPU-based systems, which is better than the 97.606% reported by Yuan et al. [55]. The results also show that the model's performance does not depend on the base system used for training but rather on the dataset size; however, GPU systems train faster than CPU systems. It was also revealed that increasing the number of epochs during training does not affect the model's detection accuracy but rather extends the training time.

This model is limited to detecting 17 different attack types while maintaining the detection accuracy mentioned above. Future work should extend the model so that it can detect additional attack types.

Chapter No. 1

Introduction

The most significant common threat to online service providers is the distributed denial of service (DDoS) attack [11]. It involves the attacker compromising the availability of the web services offered by the targeted host. This is achieved by using attacking agents, such as botnets and/or compromised Internet of Things (IoT) devices, to exhaust the victim's capacity (network bandwidth, system, and application resources), preventing service availability to legitimate users [1, 5, 37, 38]. According to [37], the main victims of DDoS attacks are organizations with an online presence, and the effect of a DDoS attack on these organizations ranges from very simple problems to significant ones such as financial losses, compromise of national security, and endangerment of human life.

Research conducted by Nexusguard in 2016 revealed that the frequency of DDoS attacks increased tremendously, by 83%, in the second quarter of 2016 [53]. Various researchers attribute this volatile increase to several factors. According to Mansfield-Devine [3], the increase in DDoS attacks is due to the attackers' motivations, such as money, politics, revenge, reputation, and destruction in order to perform other attacks. Kshirsagar et al. [2] said that the increase is a result of hackers advancing their attacking strategies and continuously looking for new vulnerabilities to exploit. Fallah et al. [8] and Kim et al. [43] also said that the steep increase in DDoS attacks is due to the inefficiency of existing detection and mitigation techniques at filtering legitimate packets from attack packets, the large volume of data sent from spoofed sources, and the type of DDoS attack used by the attacker.

The alarming growth of DDoS attacks over the years has attracted the attention of several research groups, who have investigated and proposed different detection techniques and countermeasures covering the three aspects of DDoS defense: detecting the existence of a DDoS attack, classifying traffic as normal or DDoS traffic, and mounting the mitigation response to the attack [37]. This thesis work focuses on the detection aspect; below are some of the works of other researchers who used machine learning to detect DDoS attacks.

Sabrina et al. [38] used an RNN ensemble, an Artificial Intelligence (AI) approach, to help detect the various behaviors of DDoS attacks. Loukas et al. [41] used Bayesian classifiers (comparing four different implementations) and Recurrent Random Neural Networks (r-RNN) to fuse real-time networking statistical data and distinguish normal traffic from DDoS attack traffic. Salah et al. [39] adopted an approach called the multi-agent pattern recognition mechanism to detect DDoS attacks. Kim et al. [43] used a combined approach consisting of automatic feature selection using a decision tree algorithm and a classifier generation module using neural networks to detect DDoS attacks. Saied et al. [5] used an Artificial Neural Network (ANN) algorithm to detect and mitigate DDoS attacks based on specific patterns that distinguish attack traffic from genuine traffic.

However, all the above-mentioned detection techniques become obsolete with the emergence of new attacking techniques and strategies that have different features or patterns, because the detection algorithms depend on packet features to train and detect DDoS attacks [43]. The concerns shown and contributions made by the various research groups have therefore attracted the attention of both industry and academia to come up with a detection technique or mechanism to serve as the first line of defence against DDoS attacks. The research groups all support the view that online services are under attack and therefore need an effective and efficient system to detect the presence of a DDoS attack on the network infrastructure, so that the right mitigation techniques can be applied before its devastating effects are experienced by users.

In view of this line of discussion, this research work focuses on using a deep learning technique called Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) to develop and train a TensorFlow artificial intelligence (AI) model that detects the presence of DDoS flooding attack traffic patterns on the network, achieving a high detection accuracy and a low false alarm rate.

LSTM RNN was chosen as the technique for this work because, among the family of RNN techniques and existing conventional machine learning techniques, LSTM RNN is rated the best. This rating is a result of its ability to learn longer historical features during training [55]. Also, unlike the other techniques, LSTM resolves the vanishing gradient problem associated with training vanilla RNNs using BPTT by ensuring that a constant error is maintained, allowing the RNN to learn over long time steps. Lastly, LSTM can use its gated cell state, which acts like a computer's memory by deciding what data is written to it, read from it, and stored in it, to keep attack features learned during training and to make detection decisions based on this stored information. LSTM has also achieved an accuracy rate of 97.996% [55], which older machine learning techniques have not been able to achieve.

TensorFlow was chosen as the machine learning implementation platform because it has a flexible architecture that supports CPU, GPU, Android, and iOS, which makes it easy to port trained models to other hardware without any code changes. It also provides simple, trainable mathematical functions that are useful for neural networks.

1.1 Problem Statement

Despite the numerous DDoS attack countermeasures provided by several research studies, the increase in DDoS attacks clearly reveals that gaps still exist in how DDoS attacks are efficiently detected, analyzed, and mitigated in a timely manner to ensure network service availability. The main objective of this research work is to use a deep learning technique, LSTM RNN, and the TensorFlow implementation framework to develop and train a TensorFlow AI model capable of learning the features of a network packet automatically, enhancing its ability to detect DDoS flooding attack traffic patterns on the network infrastructure in a timely and efficient manner, and achieving a high detection rate and a low false alarm rate.

1.2 Research Questions

Many schools of thought have produced several works on detecting DDoS attacks based on machine learning techniques, such as RNN ensembles [38], Bayesian classifiers and Recurrent Random Neural Networks (r-RNN) [41], the multi-agent pattern recognition mechanism [39], the combined approach consisting of a decision tree algorithm and a classifier generation module using neural networks [43], and the Artificial Neural Network (ANN) algorithm [5], to help detect the presence of DDoS flooding attacks on the network infrastructure. This thesis work is in line with these various researchers in seeking a countermeasure for DDoS attack detection.

The research questions this thesis seeks to answer are as follows:

RQ1. How well does LSTM RNN perform when implemented using the TensorFlow framework?

RQ2. What is the performance change of the Tensorflow AI model when trained on CPU and GPU based systems?

1.3 Research Goals

In answering these research questions, the following research goals will be achieved.

• Use the LSTM RNN technique and TensorFlow APIs to produce the code for a TensorFlow AI model capable of detecting DDoS flooding attacks.

• Determine the fine-tuned evaluation parameters required by the AI model to achieve the best detection accuracy and false alarm rate within the shortest time.

• Determine the performance of the AI model when trained on CPU and GPU based systems.

• Design a test environment for CPU and GPU systems based on the NSL-KDD dataset.

Figure 1.1 below illustrates the process which will be followed to train and test the detection accuracy and false alarm rate of the LSTM RNN TensorFlow AI model.

Fig. 1.1: AI model training and Testing Process overview.

Testing of the LSTM RNN TensorFlow algorithm process: To achieve this aspect of the research goal, Anaconda3 will be installed on a Windows 10 computer, followed by the Windows version of TensorFlow, Python 3, and pip3, to enable the smooth running of the code for the model. To train and test the developed model, the NSL-KDD dataset downloaded from the University of New Brunswick lab will be converted to a TensorFlow-acceptable format and used for that purpose.
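Once the packages are installed, a quick sanity check helps confirm the environment before any model code is written. The following is a minimal sketch assuming a TensorFlow 1.x build (current at the time of writing); it only verifies that the framework imports and can execute a trivial graph:

```python
# Minimal TensorFlow 1.x installation check (the session-based API
# below assumes TensorFlow 1.x; the printed version is whatever pip installed).
import sys
import tensorflow as tf

print("Python:", sys.version)
print("TensorFlow:", tf.__version__)

# Build and run a trivial graph; if this prints the greeting,
# the installation can construct and execute computations.
hello = tf.constant("Hello, TensorFlow!")
with tf.Session() as sess:
    print(sess.run(hello))
```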

1.4 Delimitation

This research work is limited to the detection of DDoS flooding attacks, particularly the UDP, TCP, and ICMP flooding attack types. To achieve a higher detection accuracy, the model is limited to detecting only seventeen (17) different attacks, since a drastic decrease in detection accuracy occurs when the capability of the algorithm is increased to detect more than seventeen attacks. In any case, the produced model can detect at most 22 different attacks, because the dataset contains features for only 22 attack types. Lastly, the model works in offline mode, not real time, because the dataset is offline.

1.5 Research Contribution

The main contributions of this research work are:

• Provide the TensorFlow code for the LSTM RNN algorithm and its fine-tuned parameters, which detect DDoS attacks with high detection accuracy and a low false alarm rate.

• Evaluate the performance of the TensorFlow AI model on CPU- and GPU-based systems.

• Analyse the effect of the evaluation parameters on the model's detection accuracy and false alarm rate.

• Provide the steps on how to install TensorFlow on a Windows-based system, including how to install and upgrade the Anaconda3 platform, Python 3, pip3, and the other Python libraries required by TensorFlow to function. Steps on how to verify the existence of the installed packages are also provided.

• Provide the steps and procedures on how to get the TensorFlow AI model to run on CPU- and GPU-based systems.

1.6 Research Methodology

This thesis work will produce a TensorFlow LSTM RNN AI model for detecting DDoS flooding attacks. It will be developed based on the design science research (DSR) methodology framework [9], because the DSR framework provides guidelines that support the creation, evaluation, and implementation of a TensorFlow LSTM RNN software artifact.

1.7 Thesis Outline

This thesis work is organized as follows. Chapter 2 gives background information on DDoS flooding attack types, machine learning, the deep learning techniques of Recurrent Neural Networks (RNN) and LSTM, and the TensorFlow framework. Chapter 3 reviews similar works done by other researchers. Chapter 4 explains the research methodology used to conduct this work. Chapter 5 explains how the LSTM RNN algorithm is derived, how to set up the CPU and GPU environments for the TensorFlow implementation, and how to code the LSTM RNN using the TensorFlow API. In chapter 6, the evaluation results for the various test parameters are presented. Chapter 7 discusses the results, and chapter 8 presents the conclusions drawn from the project and possible future work to enhance it.

Chapter No. 2

Background Information

2.1 Overview of DDoS attack

Several researchers have different definitions of a DDoS attack, but all carry the same thought and meaning. For instance, Ko et al. [6] explained a DDoS attack as involving the use of an extremely large volume of packets directed by the attacker at the target machine through the simultaneous cooperation of a large number of hosts distributed throughout the internet. According to this research group, the attack traffic eventually prevents legitimate users from accessing the target system by consuming the network bandwidth and internal system resources.

Fallah et al. [8] explained that DDoS flooding attacks are a form of attack in which the attacker first looks for hosts on the internet with multiple vulnerabilities, compromises them, and then remotely manages them to send large volumes of packets to the targeted victim. They further explain that when attackers are successful, they consume critical system resources such as CPU time, stack space in protocol software, or internet link capacity, so that the victim cannot provide network services to its legitimate clients.

From a business perspective, Yoon [7] explained that a DDoS attack is performed by outside attackers on the internet using a botmaster to remotely control a botnet and launch an attack on critical internet sites (CIS), so that essential business services such as internet banking, e-government, e-trading, and e-commerce are no longer available to users.

Geva et al. [11] said that a DDoS attack involves the use of many attacking agents to direct an excessive load at a victim host, service, or network. [11] added that Bandwidth DDoS (BW-DDoS) attacks disrupt network infrastructure operation by causing congestion, which is achieved by increasing the total amount of traffic (in bytes) or the total number of packets. These attacks can cause loss or severe degradation of connectivity between the internet and victim networks, or even whole autonomous systems (ASs), possibly disconnecting entire regions of the internet.

In support of their fellow researchers, Doron et al. [12] describe DDoS attacks as threats common among internet users, in which the attacker uses a botnet to send traffic that saturates the bandwidth of the victim's web server so that legitimate users are denied access to the services provided by that server. Looking at the various explanations given by the different research groups, it can be deduced that even though they explain their understanding from different perspectives, they all point out the devastating effects a DDoS attack brings to its victims, which include the exhaustion of network, system, and application resources, leading to the unavailability of the services those systems provide.

2.2 How DDoS operate

DDoS attacks occur in two ways. The first is to exhaust system resources by flooding the network with malicious packets so that service is no longer available; the second is to trigger a bug in the code of the victim's operating system or an application, causing it to crash so that service becomes unavailable [41]. DDoS attacks are launched against the transport, network, or application layers of the Open System Interconnection (OSI) model; however, the attacker's main target is the network layer. The resources that a DDoS attack can affect are categorized into network bandwidth, system resources, and application resources [1]. Figure 2.1 below shows the proportion of DDoS attacks on the three targeted layers of the OSI model.

Fig. 2.1: DDoS common network and multi-vector attacks surface [3]

The 2015 DDoS attack report by Arbor Networks states that among the several techniques available, the three most widely used are Transport Control Protocol (TCP) SYN floods, Domain Name Service (DNS) floods, and Smurf attacks, with TCP SYN being the most frequently used of the three. The estimated volume of traffic used to launch such an attack was around 400 Gbps in 2014, compared to 100 Gbps in 2010, and the report reveals that the frequency of DDoS attacks has increased tremendously over the years. Nexusguard also reports that DDoS attacks rose rapidly, by 83%, to more than 182,000 attacks in the second quarter of 2016 [3].

Fig. 2.2: Average peak bandwidth for DDoS attacks [3]

As shown in figure 2.2, the 2016 DDoS attack survey conducted by IDG/A10 Networks revealed that most DDoS attacks are in the range of 30-40 Gbps, which most organizations' bandwidth cannot match. All this shows that DDoS attacks are here to stay, so critical network infrastructure should be protected against them [3].

2.3 How DDoS attack happens

To launch a DDoS attack, the attacker first uses malware to detect flaws in the operating system or in a common application running on groups of computers or IoT devices and installs a malicious agent program onto them [5, 39]. These multiple compromised systems are referred to as zombies. Next, the attacker forms a botnet by creating large collections of zombies from globally distributed systems, so that he can control them from a centralized location. To launch an attack, the attacker sends the attack to the zombie handlers, which automatically distribute it to all zombies and then launch the attack against the target system. To make the attack more dangerous, the smart attacker generates packets with spoofed IP addresses and uses those in the attack. This is dangerous because it becomes difficult to filter the addresses or configure firewalls to block them, as every packet carries a spoofed IP address. Figure 2.3 illustrates how the attacker sets up the network infrastructure before an attack is launched.

Fig. 2.3: DDoS attack Network Infrastructure Illustration

2.4 Types of DDoS attack

DDoS attack types are classified into bandwidth depletion and resource depletion attacks, as shown in figure 2.4.

Fig. 2.4: Classification of DDoS attack types.

Based on the targeted resource category (network bandwidth, system resources, or application resources), the attacker selects the attack type that will achieve the best results. For instance, if an attacker seeks to consume the available network bandwidth or the resources of routers near a target host or network, he will launch a flooding attack against the network bandwidth. Figure 2.5 represents Verisign's trend report on DDoS flood attack types. From figure 2.5, it is obvious that attackers use the UDP flood attack type more than any other.

Fig. 2.5: The most common DDoS attack types in Q2 2016 [3].

According to [38], flooding occurs when an attacker continuously sends streams of packets with the objective of exhausting the victim's bandwidth or system resources. Flooding attacks are classified by the number of attack sources into single-source and multiple-source attacks. This research work focuses on how to detect flooding attack types such as UDP, Internet Control Message Protocol (ICMP) (ping), and TCP SYN floods.

2.5 DDoS flooding attack types:

2.5.1 UDP Flood

This type of DDoS attack occurs when the attacker floods the ports of the target machine with User Datagram Protocol (UDP) packets. For each packet sent to a port, the target machine checks for an application using that port for its communication. When the target machine is unable to find such an application, it sends out a destination-unreachable packet. This process continues for as long as the target machine continues to see the port as active, exhausting the target machine's resources and causing the target machine and its services to become inaccessible [21]. Figure 2.6 below illustrates the steps involved in a UDP flood attack.

Fig. 2.6: Illustration of UDP attack process.

Summary of UDP Flood attack process:

1. The attacker A sends a UDP packet to a random port on the target server B.

2. Target server B realizes that no application is waiting on that port.

3. It generates an ICMP destination-unreachable packet addressed to the forged source address.

4. If enough UDP packets are delivered to ports on target server B, the system goes down.

2.5.2 ICMP (Ping) Flood

This form of DDoS attack is similar to the UDP flood. The difference is that instead of overwhelming the target host's resources with connectionless UDP packets, a ping flood sends a continuous stream of ICMP Echo Request packets to the target host without waiting for any replies. Because the target host tries to respond to every packet sent to it, both its incoming bandwidth (consumed by the ICMP flood itself) and its outgoing bandwidth (consumed by the host's replies to the Echo Requests) are exhausted. This makes the system slow, and it eventually becomes unavailable [17]. This form of attack can be prevented at the operating system (OS) security level by controlling ping services. Most attackers avoid this type of attack because the attacker is easy to track unless a spoofed address is used. The attacker's own network availability will also be affected in situations where the target server has a larger bandwidth capacity and responds to the requests of an attacker with low bandwidth. Additionally, the attacker is unlikely to succeed because most firewall systems will filter ICMP traffic. To cover his tracks, the smart attacker will generate a spoofed IP address, which shields him from being easily identified. The drawback is that the attacker has to generate large volumes of packets, which requires increasing the attacker's bandwidth. When the server responds to the attacker's requests at the spoofed source address, it saturates the network, and if the attacker is on that network, his link will also be affected.

2.5.3 TCP SYN Flood

In this form of DDoS attack, the attacker exploits a known weakness of the TCP three-way handshake. In the TCP connection process, a host machine (the requester) sends a SYN request to initiate a TCP connection with a target host; this must be answered by a SYN-ACK response from the target and then confirmed by an ACK response from the requester. For this attack to occur, the requester sends multiple SYN requests from a spoofed IP address, which never responds to the target's SYN-ACK responses. The target system continues to commit resources while waiting for an acknowledgement for each of the requests. The binding of these resources to the numerous requests results in denial of service, as no resources remain available for legitimate connections [17]. So, from figure 2.7, the server keeps sending SYN-ACK packets to the spoofed address in the hope of receiving an acknowledgement from the client machine. This fills the TCP connection queue, and no new connections are accepted, rendering the system inaccessible to users [17].

Fig. 2.7: TCP SYN Flood Process.

2.6 Machine learning algorithms in detecting DDoS attacks.

Machine learning (ML) is the process of teaching computers to learn from experience. The main focus of machine learning algorithms is to use computational methods to find and extract natural patterns present in data and to analyze the process of extracting those features [28]. There are two main techniques used in machine learning: supervised learning (SL) and unsupervised learning (UL) [27]. Figure 2.8 below shows the various machine learning techniques.

Fig. 2.8: Machine learning techniques [46].

The main difference among the various categories of machine learning is that a supervised learning (SL) algorithm depends on both the input and the output to learn a function, whereas in unsupervised learning (UL) the algorithm learns the relationships among data using only the input vectors [27]. Research conducted by [42] revealed that, despite the existence of several algorithms, neural network algorithms are considered the best at classifying DDoS attack patterns based on statistical features, owing to their ability to recognize patterns in untrained data and to work with imprecise and incomplete data. In an attempt to provide efficient detection techniques for DDoS attacks, several schools of thought have proposed different machine learning algorithms, such as [34, 33, 32, 31, 30, 29], for DDoS attack detection. The major problem associated with these proposed ML algorithms is that they have not minimized the cost of errors, which results in more false alarms, and the cost associated with a false alarm is higher than that of a misdetection [23]. The main focus of this work is to minimize these errors, improve the false alarm rate, and increase the detection accuracy using the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) architecture.
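To make the SL/UL distinction concrete, the toy sketch below contrasts the two: the supervised classifier is fitted on input-output pairs, while the unsupervised algorithm sees only the inputs. The use of scikit-learn and the synthetic data are illustrative assumptions, not part of this thesis's toolchain:

```python
# Supervised vs. unsupervised learning on toy data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised
from sklearn.cluster import KMeans                   # unsupervised

rng = np.random.RandomState(0)
X = rng.rand(100, 4)                 # 100 samples, 4 features
y = (X[:, 0] > 0.5).astype(int)      # labels, synthesized for the demo

# Supervised: the algorithm learns a function from (input, output) pairs.
clf = LogisticRegression().fit(X, y)
print("SL predictions:", clf.predict(X[:5]))

# Unsupervised: the algorithm sees only the input vectors and
# discovers structure (here, two clusters) on its own.
km = KMeans(n_clusters=2, random_state=0).fit(X)
print("UL cluster labels:", km.labels_[:5])
```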

2.7 Deep learning model

According to [46], deep learning is an aspect of machine learning that is making a great impact in areas such as image classification and natural language processing. It uses multiple nonlinear processing layers to learn and extract useful features and objects directly from data, which comes in the form of images, text, and sound. Today, deep learning has become the new approach for developing high-precision object classification systems for three main reasons:

1. Its ability to exceed human-level performance in classifying images.

2. High-performance GPUs help train machines in less time, and

3. The large amount of data needed for deep learning is now available for use.

All deep neural models are trained using a large set of labeled data (the expected outputs of a task) and neural network architectures that contain many layers. The most popular areas where deep learning is applied are image classification, speech recognition, and natural language processing. In deep learning, the term "deep" refers to the number of hidden layers in the network. Figure 2.9 below shows the architecture of a deep neural network model.

Fig. 2.9: Deep Neural Architecture.

The difference between machine learning and deep learning is summarized in table 2.1.

Tab. 2.1: Difference between machine learning and deep learning.

| Factors                      | Machine Learning | Deep Learning |
|------------------------------|------------------|---------------|
| Training dataset             | Small            | Large         |
| Choose your own features     | Yes              | No            |
| No. of classifiers available | Many             | Few           |
| Training time                | Short            | Long          |

2.8 Recurrent Neural Networks (RNNs)

According to the research conducted by [35], the main reason for the development of RNNs is to process sequential information. The well-known sequence processing methods are the Hidden Markov Model (HMM) and the n-gram language model [35]. In traditional neural networks, the assumption has always been that all inputs (and outputs) are independent of each other, which is not true for many tasks. For instance, to predict a word in a sentence, one obviously has to know the words that came before it. RNNs address this: the "recurrent" part of the name comes from their ability to perform the same task for every element of a sequence, with the output depending on the previous computations. In theory, an RNN can be thought of as having a memory that stores an unbounded history of previously processed elements; at every point in time, this stored history is used to predict the next output. Figure 2.10 below shows the RNN model:

Fig. 2.10: Recurrent Neural Network Model.

The states of a Recurrent Neural Network:

Fig. 2.11: RNNs folded and unfolded state [48]

In figure 2.11, section A illustrates the folded state of an RNN and section B illustrates the RNN unfolded into a network; the unfolded state simply shows the network for the complete sequence. From section B we can see that it is a three-layer neural network, and it can be referred to as a deep neural network because it has more than one hidden layer. The meanings of the various parameters used in the RNN computations of figure 2.11 are given below.

• $U$, $V$, and $W$ represent the weights of the neurons. For instance, $W$ represents the weights on the recurrent connections between hidden states $S$, $V$ represents the weights between the hidden state $S$ and the output $O$, and $U$ represents the weights between the input $X$ and the hidden state $S$. A major difference between an RNN and a traditional deep neural network is that in an RNN all three weight matrices keep the same values at every point in the operation of the network, whereas in a traditional neural network the values differ from layer to layer. This is because the same task is performed at each step, only with different inputs, and it reduces the total number of parameters the RNN needs to learn. To update these weights, Back Propagation Through Time (BPTT) is used.

• $X_t$ is the input at time step $t$. For example, $X_1$ could be a one-hot vector corresponding to the second word of a sentence.

• $S_t$ is the hidden state at time step $t$; it is the memory of the network. $S_t$ is calculated from the previous hidden state $S_{t-1}$ and the input at the current step:

$$S_t = f(U X_t + W S_{t-1})$$

The function $f$ is usually a non-linearity such as tanh or ReLU. The initial hidden state, required to calculate the first $S_t$, is typically initialized to all zeroes.

• $O_t$ is the output at step $t$. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities across our vocabulary:

$$O_t = \mathrm{softmax}(V S_t)$$

From figure 2.11, the error of the unfolded RNN is calculated as:

$$E_{total} = \sum_t \frac{1}{2}\left(\text{given output}_t - \text{actual output}_t\right)^2$$

Recurrent Neural Network Training

Training an RNN differs from training a traditional neural network, although the same backpropagation algorithm is used, because of the difference in how the nets are connected. In an RNN, the weight parameters are shared by all time steps in the network, so the gradient at each output depends on the calculations of the current and all previous time steps. For example, in order to calculate the gradient at time $t = 5$, we would need to backpropagate through the 4 earlier steps and sum up the gradients. This process is called Back Propagation Through Time (BPTT).
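The recurrence above can be made concrete with a minimal NumPy forward pass. The dimensions, initialization, and toy sequence below are illustrative assumptions, not values from the thesis model:

```python
# One forward pass of a vanilla RNN:
#   S_t = tanh(U X_t + W S_{t-1}),  O_t = softmax(V S_t)
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

input_dim, hidden_dim, output_dim = 5, 8, 3
rng = np.random.RandomState(0)
U = 0.1 * rng.randn(hidden_dim, input_dim)   # input  -> hidden weights
W = 0.1 * rng.randn(hidden_dim, hidden_dim)  # hidden -> hidden weights
V = 0.1 * rng.randn(output_dim, hidden_dim)  # hidden -> output weights

xs = [rng.randn(input_dim) for _ in range(4)]  # a toy input sequence
s = np.zeros(hidden_dim)                       # initial hidden state: zeros

for t, x in enumerate(xs):
    s = np.tanh(U @ x + W @ s)   # the hidden state carries the history
    o = softmax(V @ s)           # output distribution at step t
    print(f"t={t}, output={o.round(3)}")
```

Note that U, W, and V are created once and reused at every step, which is exactly the weight sharing described above.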

2.9 Reasons for choosing LSTM RNN over other techniques

To resolve the vanishing gradient problem associated with training vanilla RNNs with BPTT, the LSTM variant of RNN is used instead. LSTM RNN resolves this BPTT problem by ensuring that a constant error is maintained, allowing the RNN to learn over long time steps and thus to associate causes with effects that are far apart in the sequence. LSTM achieves this through its gated cell. This gated cell has two states, open and closed, which makes it act like a computer's memory: it decides what data may be written to it, read from it, and stored in it. This feature is what enables it to keep details of attacks learnt during the training process and to make detection decisions based on the information stored in the gated cell. The difference from computer memory, however, is that these gates are analog rather than digital. See chapter 5, section 5.2, for a detailed explanation of the LSTM RNN algorithm.
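For reference, the gating just described can be written out in the standard LSTM cell equations (common textbook notation, not taken from the thesis; chapter 5 gives the thesis's own treatment). Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{cell state update}\\
h_t &= o_t \odot \tanh(c_t) &&\text{hidden state / output}
\end{aligned}
```

The additive cell-state update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ is what keeps the error signal roughly constant during BPTT and thus avoids the vanishing gradient.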

2.10 The Datasets formatting for deep learning

Achieving a successful outcome in machine and deep learning depends on the type and format of the dataset used in the training and testing phases. Getting the right data in the right format is the most challenging aspect of machine and deep learning projects. Two main sets of data are required: the training and testing datasets. The training dataset is the benchmark on which the nets are trained, which means the performance and efficiency of the net depend on taking the right raw training data and transforming it into a numerical format, a vector, that a deep learning algorithm can understand. Producing the right training dataset requires considerable time and domain expertise in the area of study to select the right features for training the algorithm.
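As a hedged sketch of this transformation step, the snippet below one-hot encodes a categorical field and scales numeric fields into a fixed range. The column names imitate NSL-KDD features but are assumptions for illustration; this is not the preprocessing code used later in the thesis:

```python
# Illustrative conversion of raw, mixed-type records into numeric vectors.
import numpy as np
import pandas as pd

# A hypothetical mini-batch of NSL-KDD-like records.
raw = pd.DataFrame({
    "duration":      [0, 12, 3],
    "protocol_type": ["tcp", "udp", "icmp"],   # categorical feature
    "src_bytes":     [181, 0, 520],
    "label":         ["normal", "neptune", "smurf"],
})

# One-hot encode the categorical column; numeric columns pass through.
features = pd.get_dummies(raw.drop("label", axis=1),
                          columns=["protocol_type"])

# Min-max scale every column into [0, 1] so no feature dominates training.
X = features.values.astype(np.float32)
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-9)

# Map string labels to integer class IDs the network can use.
classes = {name: i for i, name in enumerate(sorted(raw["label"].unique()))}
y = raw["label"].map(classes).values
print(X.shape, y)
```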

2.11 TensorFlow

TensorFlow is a second-generation machine learning system, created to replace DistBelief. Its purpose is to help researchers and users create models for solving natural language processing and image recognition problems using machine learning libraries and data flow graphs. It can also be used in deep learning to create neural network architectures and to develop machine learning algorithms.

2.11.1 TensorFlow data flow graph

In TensorFlow, all computations are represented as directed graphs, where the mathematical operations (as well as input/output data) are nodes. The edges of the graph represent multi-dimensional arrays called tensors; these are the paths along which data flows from one node to another. Figure 2.12 below represents the TensorFlow data flow graph.

Fig. 2.12: TensorFlow data flow graph [48]

In the TensorFlow data flow graph, the ReLU layer represents the hidden layers and the logit layer represents the output layer. Gradient descent drives the training: it processes the data and updates the parameters in both the hidden and output layers of the neural network. TensorFlow also has its own visualization module, TensorBoard, which can visualize the created model so that a user can trace the data flow within it.
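A minimal TensorFlow 1.x sketch of such a graph is shown below: a placeholder feeds a ReLU hidden layer, a logit layer produces the outputs, and gradient descent adds the update operations. The layer sizes and names are illustrative assumptions, not the thesis's actual model:

```python
# Illustrative TensorFlow 1.x data flow graph: input -> ReLU layer -> logits.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 38], name="input")  # feature vectors
y = tf.placeholder(tf.int64, [None], name="labels")       # class IDs

w1 = tf.Variable(tf.random_normal([38, 64], stddev=0.1))
b1 = tf.Variable(tf.zeros([64]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)                # the "ReLU layer" node

w2 = tf.Variable(tf.random_normal([64, 2], stddev=0.1))
b2 = tf.Variable(tf.zeros([2]))
logits = tf.matmul(hidden, w2) + b2                       # the "Logit layer" node

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Gradient descent inserts nodes that update every trainable parameter.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```

Every line above only builds graph nodes; nothing executes until the graph is run inside a session, which is precisely the data-flow model that figure 2.12 depicts.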

2.11.2 Tensors

Data in TensorFlow are represented as tensors: multidimensional, dynamically sized data arrays. Tensors literally flow through the graph from node to node, which makes the name of the framework sound logical. Simply speaking, a tensor can be pictured as a 3D array (though this is not a strict mathematical definition, of course). Figure 2.13 below shows a tensor in terms of a vivisection. Compared to a matrix, it has more degrees of freedom for data selection and slicing.

Fig. 2.13: The vivisection of Tensor [48]
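These extra degrees of freedom can be illustrated with a small rank-3 tensor; the shape and values below are arbitrary:

```python
# Slicing a rank-3 tensor along different axes (TensorFlow 1.x session API).
import tensorflow as tf

t = tf.constant([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]],
                 [[9, 10], [11, 12]]])  # shape (3, 2, 2)

front = t[0]         # one (2, 2) slab of the tensor: an ordinary matrix
column = t[:, :, 0]  # fix the last axis: a (3, 2) slice across all slabs
cell = t[1, 0, 1]    # a single scalar element (the value 6)

with tf.Session() as sess:
    print(sess.run(front))
    print(sess.run(column))
    print(sess.run(cell))
```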

2.11.3 Benefits of using TensorFlow in this thesis work

1. It has extensive built-in support for deep learning and tools to assemble neural networks. This also makes it compatible with many variants of machine learning [50].

2. Flexibility of representation: a user can create almost any type of data flow graph, then visualize and admire it. If you can express your algorithm as a data flow graph, you can do it with TensorFlow, without exceptions.

3. It has simple, trainable mathematical functions that are useful for neural networks.

4. TensorFlow has a flexible architecture that supports CPU, GPU, Android, and iOS. A TensorFlow model can also easily be ported to other hardware without any code changes, for example from server to PC and from PC to laptop.

5. TensorFlow is used for both research and production purposes. This allows one to create a model for research and then push that very model into a product (after some code rewriting, of course, as researchers usually forget about code optimization) using the same TensorFlow library.

6. Auto-differentiation: TensorFlow can automatically compute derivatives, which is very convenient for gradient-based machine learning algorithms such as Stochastic Gradient Descent (see the sketch after this list).

7. It has a Python interface. However, its C++ interface is poorly documented.
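As an illustration of point 6, the minimal sketch below (arbitrary variable names, TensorFlow 1.x API) asks TensorFlow for the symbolic derivative of one node with respect to another:

```python
# Automatic differentiation: d(y)/d(x) for y = x^2 + 3x.
import tensorflow as tf

x = tf.Variable(2.0)
y = x * x + 3.0 * x

grad = tf.gradients(y, x)[0]  # symbolic gradient node, equal to 2x + 3

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))     # prints 7.0 at x = 2.0
```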

2.11.4 Benefits for using TensorBoard in this thesis work

It helps shine a flashlight into the black box representation of neural networks by visualizing the computational structure of the TensorFlow graph. This enables the developer to analyze the developed algorithm, see how the connected layers work, and correct any errors that may be due to mis-wired tensors.

What about the license? TensorFlow is free for both research and commercial purposes.

Why was TensorFlow chosen over Theano?

• TensorFlow supports parallel computations, while Theano does not.

• TensorFlow models are more flexible in terms of portability.

• TensorFlow code structure is more human-interpretable and easier to support.

• TensorFlow is a C++ library with a Python interface, while Theano is a Python library with the ability to generate internal C or CUDA modules.

Chapter No. 3

Literature Review

The battle against the distributed denial of service (DDoS) attack has received considerable attention from different researchers, all proposing different detection techniques to combat DDoS flooding attacks. The research conducted by Bhuyan et al. [40] reveals that DDoS detection methods are classified into four major classes: statistical, knowledge-based, soft computing, and machine learning. The need for different classes arises from the fact that DDoS flooding attacks have different features, which makes it difficult for a single technique to detect them all. These features include:

• The large volume of streamed packets from spoofed sources. This feature makes detection difficult because the source of the attack, where detection should begin, is unknown [38].

• The per-source volume of data in a DDoS flooding attack can be very low, making it very difficult to predict whether a request is good or bad. This can result in a detection system having either a high false positive or a high false negative rate [38].

• The different characteristics shown by the various tools used to launch DDoS attacks make statistical analysis possible. However, the large surface area of the internet, which serves as the main conduit for DDoS attacks, makes them difficult to detect [38]. Hence this research work focuses on detection techniques based on data mining and machine learning methods, because these are rated as the most effective defense against DDoS flooding attacks [40].

This review is organized into three sections.

• Section 3.1 will discuss works done by various research groups using machine learning methods to detect the ever-changing patterns of DDoS flooding attacks.

• Section 3.2 will highlight the gaps in the work of the various research groups.

• Section 3.3 will focus on how these gaps can be addressed using LSTM RNN technique as proposed by this research work.

3.1 Defense mechanisms and techniques to detect DDoS attacks

The various research groups proposed several defense techniques and mechanisms to help detect DDoS attacks. Loukas et al.'s [41] technique for detecting DDoS attacks was based on three main elements: the study of statistical features of incoming DDoS attack traffic, a Bayesian classifier to assess and predict the likelihood of an attack, and a recurrent random neural network (r-RNN) that fuses all gathered information on incoming traffic to make a detection decision. This technique operates on the victim side of the attack. Their work revealed that the performance of the technique depended on how well the r-RNN was trained with different features of different DDoS attack packets. In deriving the features for training the r-RNN, the research group used both instantaneous and statistical characteristics of the incoming traffic, which give different results for normal and DDoS traffic.

Again, Loukas et al. [37], in another research work, focused on using two schemes, namely the biologically inspired Random Neural Network (RNN) and multiple Bayesian classifiers, to detect and distinguish normal traffic from DDoS attack traffic. This technique works by selecting the detection features, computing estimates of the probability density functions (in the form of histograms) for the features, and computing likelihood ratios, which serve as the first-level decision for each selected feature. Next, they calculate the high-level decision by fusing the first-level decisions with the RNN, and then implement the RNN with the actual values and histogram categories of the features. The strength of this approach lies in the fact that they are able to combine the RNN's discriminating capacity and approximation properties with the incoming traffic's statistical data.

Sabrina et al. [38] proposed a detection procedure based on observation and artificial intelligence (an RNN ensemble) to detect DDoS attacks at the client and intermediate nodes. The main reason for choosing the RNN ensemble approach is that DDoS attacks behave differently at client and intermediate nodes, and according to [38], an ensemble can help detect the various states of a DDoS attack. Detection at the client node is based on observing two main things: the number of requests rejected by an affected node and the changes in the victim's resource usage (CPU, physical memory, and NIC). These two features are fed into the RNN ensemble to predict whether a request is good or bad.

Salah et al. [39] used a multi-agent pattern recognition mechanism to detect DDoS attacks launched against a victim server in a distributed network with multiple internet gateways. It works on the principle of distributed multi-agents performing attack detection at various levels. Detection is based on parameters extracted from the observed network traffic flow. The agents collectively, and in a coordinated manner, produce a pattern of network traffic behavior on which the proposed solution depends to perform recognition. According to [39], this solution is robust and fault-tolerant because it employs multiple agents to detect attacks at each node, so the breakdown of any one node will not affect the operation and performance of the proposed model.

Kim et al. [43] used Cisco Systems NetFlow and two different data mining techniques to detect the different types of DDoS attacks. NetFlow provided seven unique and useful features for every data flow entering the network: the source IP address, destination IP address, source port, destination port, layer 3 protocol type, TOS byte (DSCP), and input logical interface (ifIndex). They used the decision tree algorithm to automatically select among the features provided by NetFlow and model the traffic patterns of the different DDoS attack types. The second technique was a neural network, which [43] used to classify traffic as normal or abnormal using the attributes automatically produced by the decision tree algorithm. According to [43], their results produced twice the performance of heuristic feature selection, and their approach performed better than a single data mining approach.

Saied et al. [5] used an Artificial Neural Network (ANN) algorithm to detect and mitigate DDoS attacks based on specific patterns that distinguish attack traffic from genuine traffic in a real-time environment. To achieve these objectives, they used existing popular DDoS tools to generate real-life cases and scenarios to train the ANN algorithm. They reported that much greater success was achieved when the algorithm was trained with up-to-date patterns of the latest known DDoS attacks. They also discovered that the detection rate does not increase with over-training on the same dataset. However, the ANN algorithm was able to learn from patterns and scenarios and could detect zero-day DDoS attack traffic whose patterns are similar to those it was trained with.

Kim et al. [51] focused their research on detecting attacks on the network. They used a deep neural network to study and develop an artificial intelligence (AI) based abnormality detection system which, according to them, was fast and effective in combating evolving attacks. Their study was based on the static KDD Cup 99 dataset, developed by the Defense Advanced Research Projects Agency (DARPA) for the purpose of testing and training such algorithms. Their work achieved a high accuracy and detection rate of 99% and a false alarm rate of 0.08%.

Azzouni et al. [52] is among the few works that used LSTM RNN to predict future network traffic. According to them, they chose LSTM RNN because it is best suited to learning from experience to classify, process and predict time series with time lags of unknown size. Their LSTM RNN prediction technique was developed to help network operators detect and react to network traffic changes in near real time, before network congestion occurs. The results of their work show that LSTM RNN outperforms the traditional linear methods and feed-forward neural networks by many orders of magnitude.

Lastly, Yuan et al. [55] determined and compared the performance of different neural network models, such as the Convolutional Neural Network (CNN), Long Short-Term Memory Neural Network (LSTM) and Gated Recurrent Unit Neural Network (GRU), against the Random Forest method. Their goal was to learn patterns from sequences of network traffic and trace network attack activities. According to them, compared with conventional machine learning methods, their DeepDefense system was able to reduce the error rate by 39.69% on dataset Data14 and from 7.517% to 2.103% on Data15. Their best model, the LSTM, produced a detection accuracy of 97.996%. Their findings also show that recurrent neural networks can learn much longer historical features than conventional machine learning techniques.

3.2 Research Gap analysis

The researchers above have introduced different deep learning and machine learning methods to detect the presence of DDoS flooding attacks on the network. Below are the research gaps identified after studying the individual works of these research groups:

The research work by Loukas et al. [44] achieved some level of success when the r-RNN was well trained. However, one major drawback of this technique is that the results were tested only on a standalone, outdated dataset which did not contain current DDoS attack features. Also, the technique is not self-learning: it must be continuously retrained with newly discovered DDoS attack packets to keep its state updated with new DDoS attack features, unlike the TensorFlow LSTM RNN, which can predict possible DDoS attacks because of its ability to remember the analysis done up to that point.

Sabrina et al. [41]'s detection procedure was based on the use of the human brain to manually observe and make assumptions, combined with an automated artificial intelligence component (the RNN ensemble). The drawback of this approach pertains to the observation stage, where one assumes that an increase in the number of rejected requests and an increase in system resource usage indicate that the victim is under attack. Such incidents can also occur as a result of normal user activity, and when that happens, all the predictions made by the RNN will be erroneous because wrong input features were fed into it. This problem is addressed by the TensorFlow LSTM RNN because its self-learning process does not depend on any human observation: predictions are based on the dynamically trained past state or context, not on manual feature selection.

The work of Salah et al. [42] would achieve a level of success only if the predefined thresholds match the DDoS attack traffic. This leaves a big drawback in the approach of [42], because a DDoS attack is capable of sending small amounts of traffic from many different spoofed sources, which will not exceed the set thresholds and will therefore be allowed onto the network without being detected. This problem is resolved by the TensorFlow LSTM RNN because it does not depend on set thresholds; rather, it predicts the presence of a DDoS attack based on dynamically trained states which store all newly discovered DDoS attack features.

The approach proposed by Kim et al. [46] presents two main bottlenecks in detecting DDoS attacks. The first bottleneck is the feature selection performed by NetFlow on the Cisco router, because the router's feature selection signature must be updated regularly to enable it to detect new features exhibited by emerging DDoS attack types. The second bottleneck is that the neural network used does not have the ability to retain past learned state, so it cannot predict possible DDoS attack types based on already learned features. Also, because the intelligence of the classification rests on the selected features, an error is introduced into the detection technique, since the feature selection may not be 100% accurate at all times.

The research work of Kim et al. [54] also depends on a training dataset to update the knowledge base of the Artificial Neural Network (ANN) algorithm. The ANN algorithm does not have the capability to automatically learn and store long-term information which can later be used to predict possible DDoS attacks. The researchers themselves point to my approach as the solution to this drawback: they mention that their approach focused on analyzing and classifying single traffic records, whereas a time series is needed, and they recommend the LSTM recurrent neural network model as the right detection technique to battle distributed denial of service attacks.

The research work by Azzouni et al. [55] achieved great results using the LSTM RNN method. However, it did not use the TensorFlow implementation framework, and it did not compare the performance of the LSTM RNN technique on both CPU and GPU systems to determine the parameters which work best in each environment.

The research work by Yuan et al. [55] is the most recent work performed on DDoS attack detection on the network. They did great work by comparing the performance of different neural network models such as CNN, LSTM and GRU to the Random Forest method. Their best detection accuracy, 97.996%, was produced by their LSTM technique. However, they did not compare the performance of their model on both CPU and GPU systems to determine the effect on detection accuracy, and they did not determine which evaluation parameters of their model affect its detection accuracy when trained on CPU and GPU. They also did not mention how many different attack types their model can detect at that accuracy level, nor at what point the model's detection accuracy deteriorates when the evaluation parameters are changed. It is because of these drawbacks that, as part of their future work, they stated that "they plan to increase the diversity of DDoS vectors and system settings to test their model's robustness in different environments". Table 3.3 below shows the result of their training compared to the Random Forest model.

Tab. 3.1: Summary of deep learning techniques: DDoS detection mechanisms by various research groups (entries 1-5).

1. Loukas et al. [44]. Objective: DDoS attack detection. Deployment: victim node protection. Detection technique: Bayesian classifiers and Random Recurrent Neural Network (r-RNN). Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: uses Bayesian classifiers to assess the likelihood of a DDoS attack and then uses the gathered information as input to the r-RNN to make detection decisions.

2. Sabrina et al. [41]. Objective: DDoS attack detection. Deployment: client side and intermediate nodes. Detection technique: artificial intelligence (RNN Ensemble). Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: the RNN ensemble is used to help detect the various behaviours of DDoS attacks.

3. Loukas et al. [40]. Objective: DDoS attack detection. Deployment: victim mode detection. Detection technique: Bayesian classifiers and Random Neural Network (RNN). Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: they used Bayesian classifiers to compare four different implementations and an RNN to fuse real-time networking statistical data to distinguish normal traffic from DDoS attack traffic.

4. Salah et al. [42]. Objective: DDoS attack detection and mitigation. Deployment: distributed client-server detection. Detection technique: multi-agent attack detection scheme based on thresholds and a distributed multi-agent pattern recognition mechanism. Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: works on the principle of distributed multi-agents performing attack detection at various levels in a coordinated manner, producing a pattern of network traffic behaviour; detection is based on parameters extracted from the observed network traffic flow.

5. Kim et al. [46]. Objective: DDoS attack detection. Deployment: victim mode detection. Detection technique: decision tree algorithm and neural network. Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: uses the decision tree algorithm to select packet features automatically and then uses the neural network to classify the attack packets based on the selected features.

Tab. 3.2: Summary of deep learning techniques: DDoS detection mechanisms by various research groups (entries 6-10).

6. Saied et al. [5]. Objective: DDoS attack detection and mitigation. Deployment: victim mode detection. Detection technique: Artificial Neural Network (ANN) algorithm and DDoS attack traffic patterns. Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: No. Remarks: uses the ANN algorithm to detect and mitigate DDoS attacks based on specific patterns that distinguish attack traffic from genuine traffic.

7. Azzouni et al. [55]. Objective: network traffic prediction. Deployment: network detection. Detection technique: network traffic matrix prediction using LSTM RNN. Trains on traffic features: No. Self-trainable: Yes. Trained on both CPU and GPU: No. Uses LSTM RNN: Yes. Uses the TensorFlow framework: No. Remarks: they used LSTM RNN to predict future network traffic based on previous network traffic data.

8. Kim et al. [54]. Objective: attack detection. Deployment: network mode detection. Detection technique: Artificial Neural Network (ANN) algorithm. Trains on traffic features: Yes. Self-trainable: No. Trained on both CPU and GPU: No. Uses LSTM RNN: No. Uses the TensorFlow framework: Yes. Remarks: used the ANN algorithm to detect attacks on the network.

9. Yuan et al. [55]. Objective: DDoS detection. Deployment: network mode detection. Detection technique: Convolutional Neural Network (CNN), LSTM and Gated Recurrent Unit Neural Network (GRU). Trains on traffic features: Yes. Self-trainable: Yes. Trained on both CPU and GPU: No. Uses LSTM RNN: Yes. Uses the TensorFlow framework: Yes. Remarks: used the DeepDefense system to detect DDoS attack patterns on the network.

10. Peter Ken Bediako (this thesis). Objective: DDoS flooding attack detection. Deployment: network mode detection. Detection technique: TensorFlow LSTM RNN algorithm. Trains on traffic features: Yes. Self-trainable: Yes. Trained on both CPU and GPU: Yes. Uses LSTM RNN: Yes. Uses the TensorFlow framework: Yes. Remarks: uses the TensorFlow Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) algorithm to detect DDoS flooding attacks.

Tab. 3.3: Performance comparison of LSTM and Random Forest [55].

Model Name      Error Rate   Accuracy
LSTM            2.394%       97.606%
Random Forest   6.373%       93.627%

3.3 Improvements to the gaps identified in existing research works.

The main identified research gap is that no paper has used both LSTM RNN and the TensorFlow framework to develop an algorithm for detecting DDoS flooding attacks. The research groups of [54] and [55] used one of the two in their work, but not both, as this thesis presents. Most of the papers also used the conventional RNN, which has the vanishing and exploding gradient problem [55] that the LSTM-based RNN solves.

Unlike the various techniques used by the research groups above, the TensorFlow LSTM RNN technique does not only depend on the large dataset used in training to build its knowledge base (state), but also has the capability to learn and update its knowledge base (state) automatically with new features of network traffic packets, without any human intervention. Hence, the LSTM RNN does not depend on any preset rules, but on the continuous training of the algorithm with updated datasets containing newly discovered features of DDoS attack packets.

Another interesting gap not mentioned by any of the research groups is the performance of the algorithm on CPU- or GPU-based systems. This is very important because the detection rate depends not only on how the various parameters of the algorithm are fine-tuned to achieve the best detection accuracy within the shortest time, but also on the performance of the algorithm when it is run on a CPU- or GPU-based system. This will enable system developers to know the best thresholds to use when implementing TensorFlow LSTM RNN in an intrusion detection system running on either a CPU- or GPU-based system.

Chapter No. 4

Research Methodology

4.1 Design Science Research (DSR) Methodology

This thesis work is focused on using Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), a deep learning technique, and the TensorFlow implementation framework to develop an AI model which can be incorporated into intrusion detection software, enabling it to detect DDoS flooding attacks with a higher detection accuracy rate and a lower false alarm rate. The DSR methodology will be used to achieve these objectives because this thesis project will:

• Produce a TensorFlow LSTM RNN AI model artifact, to address the low detection rate and high false alarm rate of DDoS flooding attack systems.

• The DSR methodology will help to rigorously evaluate the TensorFlow AI model's utility, quality, and efficacy.

• The artifact produced will be relevant in solving DDoS flooding attacks.

• The research conducted will represent a verifiable contribution and rigor will be applied in both the development of the model and its evaluation.

• The development of the tensorflow AI model will be based on a search process that draws from existing theories and knowledge to come up with a solution to a defined problem.

• Finally, the developed tensorflow AI model will be demonstrated to the appropriate audiences [20].

Another reason for choosing the DSR methodology is that it provides a framework which enables other researchers to easily form a mental model upon which they can understand and evaluate the work of their peers. By following the six process steps, researchers can easily see whether there are any omissions in the research work. The six process steps that form the DSR framework are as follows: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication [20]. Figure 4.1 below illustrates the various stages of the DSR methodology which will help achieve the project objectives.

Fig. 4.1: DSR Methodology Model [20].

The following briefly explains what is involved at each stage of the DSR methodology framework processes:

• Activity 1: Problem identification and motivation. This phase of the model defines the research problem and gives reasons why it is important for this work to be pursued as thesis work.

• Activity 2: Define the objectives for a solution. This stage of the project defines the main objectives of the proposed solution, what it will accomplish after completion of the project, and its limitations.

• Activity 3: Design and development. This is the stage where the model is developed based on the proposed solution, taking into consideration all assumptions and limitations.

• Activity 4: Demonstration. This phase of the model demonstrates how the produced tensorflow AI model is able to solve my research questions.

• Activity 5: Evaluation. This is a continuous process of testing the tensorflow AI model and fine tuning its various parameters until an acceptable result is obtained.

• Activity 6: Communication. This phase of the model deals with communicating the research problem and its importance, the artifact, its utility and novelty, the rigor of its design, and its effectiveness to researchers and other relevant audiences, such as practicing professionals, when appropriate.

According to [20], the DSR methodology model produces four different types of artifacts: constructs, models, methods and instantiations. Following the DSR methodology model, this research work will produce two main types of artifacts, based on the results achieved from answering the research questions. Research question 1 (RQ1), the design of the algorithm, will produce a model type of artifact, and research question 2 (RQ2), the testing of the artifact, will produce an instantiation type of artifact. The following shows the various lines of activity which will be carried out at each stage of the DSR methodology model.

4.2 How DSR Methodology is used to address RQ1.

RQ1. How well does the LSTM RNN technique detect DDoS attacks when implemented using the TensorFlow framework?

Activity 1: Problem identification and motivation. The main research problem is to produce the code for the LSTM RNN AI model using the TensorFlow API and then determine its detection accuracy and false alarm rate in detecting DDoS flooding attacks. The artifact will also produce the values of the evaluation parameters which yield the best detection accuracy. The motivation is to find out whether the artifact produced will improve the detection rate of DDoS flooding attacks in intrusion detection systems.

Activity 2: Define the objectives for a solution. The objective of the first phase of the thesis work is to use the deep learning model LSTM RNN and the TensorFlow implementation framework to develop an artificial intelligence model with much higher detection accuracy for DDoS flooding attack types, at a much faster rate and with a lower false alarm rate. Limitation of the project: this research work is limited to the detection of DDoS flooding attacks only, the first aspect of preventing distributed denial of service (DDoS) attacks.

Activity 3: Design and development. The artifact of this thesis work will be designed and developed using a deep neural network technique, LSTM RNN, and the TensorFlow API libraries. Datasets of different sizes obtained from the University of New Brunswick Lab [53] will be used to train and test the AI model. Refer to chapter five of this thesis for a detailed explanation of the design and development processes involved in developing, training and testing the LSTM RNN model.

Activity 4 and 5: Demonstration and Evaluation. This phase of the model demonstrates how the LSTM RNN algorithm is able to detect DDoS flooding attacks. The model will be subjected to five main iteration processes which depend on variable parameters such as dataset size, epochs, weights, biases, learning rate, nodes and layers. The setup used to carry out this iteration process is as follows: a Windows 10 computer, installed with Anaconda3, the Windows version of TensorFlow, Python 3 and Pip3, to support the smooth coding and training of the LSTM RNN model. Table 4.1 below shows the parameters used for the five-level iteration process.

Tab. 4.1: Evaluation parameters for the five-level iteration process.

Evaluation    Dataset   Epochs   Learning   Nodes   Layers   Weights   Biases
parameters    size               rate
Iteration 1   2000      100      0.001      40      5        Random    Random
Iteration 2   5000      200      0.001      40      5        Random    Random
Iteration 3   10000     300      0.001      40      5        Random    Random
Iteration 4   15000     400      0.001      40      5        Random    Random
Iteration 5   20000     500      0.001      40      5        Random    Random

The evaluation of the trained model will be based on three parameters, namely, the time taken to complete the training and testing process, the mean square error (MSE) and the detection accuracy.

To train and test the developed algorithm, the NSL-KDD dataset downloaded from the University of New Brunswick Lab will be converted to a TensorFlow-acceptable format and then fed into the algorithm according to the specified dataset size. The best detection rate is achieved when the evaluation factors produce a detection accuracy of 100%. The TensorBoard visual analyzer will be used to interpret the results obtained from the training process. A detailed explanation on setting up the experiment and evaluation environments is provided in chapters 6 and 7, to assist other researchers to follow and reproduce similar results, and even experiment with different parameters.

4.3 How DSR Methodology is used to address RQ2.

RQ2. What is the performance change of the Tensorflow AI model when trained on CPU and GPU based systems?

Activity 1: Problem identification and motivation. The literature review reveals that no research group has investigated the performance of TensorFlow LSTM RNN on CPU- and GPU-based systems. The motivation is to evaluate the model on both CPU- and GPU-based systems and also to obtain a GPU-trained version of the model.

Activity 2: Define the objectives for a solution. The objective of RQ2 is to find out whether there is any performance difference if the same dependent parameters, such as weights, biases, learning rate, nodes and layers, obtained from the RQ1 results, are used to train the same algorithm on GPU-based systems with much larger dataset sizes but completed in the same training time frame, and also to determine the effect of the evaluation parameters on the model's performance.

Activity 3: Design and development. The artifact from RQ1 will be fine-tuned to a GPU-acceptable format to enable the algorithm to run in a GPU-based environment. However, the environment setup for a GPU-based TensorFlow system differs from that of the CPU-based environment. The same NSL-KDD dataset obtained from the University of New Brunswick Lab will be used to train and test the algorithm.

Activity 4 and 5: Demonstration and Evaluation. This phase of the model demonstrates how the TensorFlow LSTM RNN algorithm is able to detect DDoS flooding attacks in a GPU-based environment. This part of the project will use the same iteration parameters as RQ1, with the exception of dataset size and epochs, which will be determined based on the training times obtained in the RQ1 training results. The algorithm will be evaluated on epochs and dataset size to see their effect on the mean square error and detection accuracy.

Activity 6 for RQ1 and RQ2: Communication. At this phase of the thesis, the entire project's research questions, objectives and limitations, and the produced artifacts will be communicated to the appropriate audience. Among other things, the design process mentioned in activity 3, the demonstration and evaluation results mentioned in activities 4 and 5 for RQ1 and RQ2, and chapter 6 sections 6.3 and 6.4 of this work will be highlighted. Lastly, the future works detailed in chapter 8 section 8.2 will be made known so that further work can improve on the produced artifact.

Chapter No. 5

Design and Development

This phase of the project involves the design of the LSTM RNN algorithm based on its four interacting layers, the steps to set up the development environment, coding the designed LSTM RNN algorithm using TensorFlow, and evaluating the algorithm based on parameters such as dataset size, epochs, learning rate, weights, and biases.

5.1 Design, Develop and Implement LSTM RNN Algorithm

LSTM RNN architecture
LSTM RNN resolves the deficiency of conventional RNNs by being able to learn long-term dependencies. Another difference is that while a conventional RNN has only a single neural network layer, the LSTM RNN has four neural network layers interacting with each other. Figure 5.1 below illustrates the four interacting layers of the LSTM RNN.

Fig. 5.1: The four interacting repeating module of LSTM RNN [54].

Below are the meanings of the various notations used in figures 5.1 and 5.2.

5.2 Designing the algorithm based on the four layers of LSTM RNN

Fig. 5.2: LSTM RNN Algorithm design architecture.

The LSTM RNN algorithm is developed following a four-phase process, as indicated in figure 5.2.

Phase 1: Layer 1
This phase is called the decision-making layer. The LSTM RNN looks at information from the previous cell state (h_{t-1}) and the input x_t, and decides what to keep or discard using the sigmoid layer σ, which is also called the forget gate layer. In equation (1), w_f and b_f represent the weights and biases respectively. The weights and biases remain the same throughout the iterations.

f_t = σ(w_f · [h_{t-1}, x_t] + b_f)        (1)

Phase 2: Multiplication of Layers 2 and 3
This is the phase where the cell state is updated with new information. Equation (2), the sigmoid layer, decides which information is to be added to the cell state, and equation (3) converts the new information into a vector form which can be added to the cell state.

i_t = σ(w_i · [h_{t-1}, x_t] + b_i)        (2)

C̃_t = tanh(w_c · [h_{t-1}, x_t] + b_c)        (3)

Phase 3: Layers 1, 2 and 3

At this phase of the iteration, the old cell state C_{t-1} is updated into the new cell state C_t. This is achieved by the element-wise multiplication of f_t and the old cell state C_{t-1}; to the result we add i_t · C̃_t, which determines how much we want to update the new state. Equation (4) shows how this is achieved.

C_t = f_t · C_{t-1} + i_t · C̃_t        (4)

Phase 4: Layer 4
This is the output phase of the LSTM, whose result is the input to the next layer of the neural network. Equation (5) shows that before deciding what to output, we first run a sigmoid layer which decides what parts of the cell state to output, and equation (6) shows that we pass the cell state through tanh and then multiply the result by the output of the sigmoid gate.

o_t = σ(w_o · [h_{t-1}, x_t] + b_o)        (5)

h_t = o_t · tanh(C_t)        (6)

Figure 5.3 below is a sample of the TensorFlow code for layer 1 of the LSTM RNN algorithm; a minimal sketch of equations (1) to (6) follows the figure. The entire code for the algorithm is available upon request.

Fig. 5.3: Sample TensorFlow code for layer 1 of the LSTM RNN algorithm.
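To make the four phases concrete, below is a minimal NumPy sketch of equations (1) to (6) for a single LSTM cell step. This is an illustration under stated assumptions, not the thesis code: the function names and the concatenated layout of [h_{t-1}, x_t] are my own choices.

    import numpy as np

    def sigmoid(z):
        # Logistic function used by the forget, input and output gates.
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        # W and b are dicts holding the gate weights/biases under keys f, i, c, o.
        z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
        f_t = sigmoid(W["f"] @ z + b["f"])       # (1) forget gate
        i_t = sigmoid(W["i"] @ z + b["i"])       # (2) input gate
        c_tilde = np.tanh(W["c"] @ z + b["c"])   # (3) candidate cell values
        c_t = f_t * c_prev + i_t * c_tilde       # (4) new cell state
        o_t = sigmoid(W["o"] @ z + b["o"])       # (5) output gate
        h_t = o_t * np.tanh(c_t)                 # (6) new hidden state / output
        return h_t, c_t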

5.3 Environment setup for Algorithm Development

The experimental work for this thesis was carried out in two different environments: CPU-based and GPU-based systems. Below are the specifications for both systems.

CPU-based system:

• Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz

• 16.0 GB of RAM.

• Windows 10 OS

GPU-based system:

• Intel(R) Xeon(R) CPU E5-2690 v4 @ 3.50 GHz

• 16.0 GB of RAM.

• Windows 10 OS

Figure 5.4 below is the dataflow diagram which shows how the two environments will be created.

Fig. 5.4: CPU and GPU base system environment setup process.

5.4 Design Structure for LSTM RNN technique using Tensorflow API

See the attached file tensorflowlstmrn.py for the code, and the datasets KDDTest.csv and KDDTrain.csv, which were used to train and test it. Figure 5.5 below is the architecture followed to develop the algorithm code.

Fig. 5.5: TensorFlow Algorithm Architecture

The design parameters used to train the model to achieve a 99.968% detection accuracy are as follows (a hypothetical grouping of these constants in code is sketched after this list):

• Epoch = 100

• Hidden Layers = 5

• Activation function = ReLu

• Optimizer = Gradient Descent Optimizer

• Classification engine = softmax cross entropy between logits and labels

• Neurons on each layer = 40

• Learning Rate = 0.001

• Embedding size = 2

• Class size = 17

• One hot encoder = Yes

• Weights = Random

• Biases = Random
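As announced above, these design parameters could be grouped in one place in the code. The sketch below is a hypothetical grouping (the constant names are mine, not those used in tensorflowlstmrn.py); the weights and biases are drawn randomly, as listed.

    import tensorflow as tf

    EPOCHS = 100
    HIDDEN_LAYERS = 5
    NODES = 40             # neurons on each layer
    LEARNING_RATE = 0.001
    EMBEDDING_SIZE = 2
    NUM_CLASSES = 17       # class size for the one-hot encoded labels

    # Randomly generated weights and biases, here for the output layer.
    weights = tf.Variable(tf.random_normal([NODES, NUM_CLASSES]))
    biases = tf.Variable(tf.random_normal([NUM_CLASSES]))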

Explanation of the code structure shown in figure 5.5.

The algorithm code architecture is divided into five (5) main parts, namely, the initialization phase, the data pre-processing phase, the model implementation phase, the training iteration phase and the results phase.

At the initialization phase, all the algorithm parameters are defined and initialized with their constant values. Both the training and testing datasets are read and stored in their respective variables. The record default values are also initialized, and variables that need to be reshaped are reshaped to prevent computational errors.

At the data pre-processing phase, the feature and label records of both the training and testing datasets are extracted from the variables defined at the initialization phase and placed in their respective placeholders.

The model implementation phase holds the code for the five-layer LSTM RNN algorithm. It is at this phase that the model's randomly generated weights, biases and specified nodes are defined, and the rectilinear function (ReLu) computation is applied.

The training phase of the algorithm structure is the part of the code that performs the training iteration of the model. Both the training and testing datasets, the epochs and the results from the model implementation are used to calculate the success metrics, such as the cost function, prediction and accuracy. The success metrics, together with the learning rate, are used to compute the optimizer and the train step. For better interpretation of the results, TensorBoard is incorporated into the code structure by introducing name scopes, summary scalars, a file writer and embeddings. Section 5.5 explains how one can access the training results using TensorBoard.
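The TensorBoard hooks just mentioned (name scopes, summary scalars, a file writer) could be wired in roughly as in the following TensorFlow 1.x sketch; the stand-in placeholder tensors and the logs directory name are illustrative assumptions, not the thesis code.

    import tensorflow as tf

    # Stand-in tensors; in the real code these come from the LSTM model graph.
    loss = tf.placeholder(tf.float32, name="loss")
    accuracy = tf.placeholder(tf.float32, name="accuracy")

    with tf.name_scope("metrics"):
        tf.summary.scalar("loss", loss)
        tf.summary.scalar("accuracy", accuracy)
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        # Writes the graph and the scalar summaries into the logs folder.
        writer = tf.summary.FileWriter("logs", sess.graph)
        for epoch in range(3):
            summary = sess.run(merged, feed_dict={loss: 0.01, accuracy: 0.99})
            writer.add_summary(summary, global_step=epoch)
        writer.close()
    # Afterwards run: tensorboard --logdir=logs  and open the printed URL.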

5.5 How to access Tensorboard for Training Results.

TensorBoard is a component of TensorFlow that displays the results of the trained algorithm. The commands shown in figure 5.6 illustrate how the TensorFlow data flow graph can be accessed: first navigate to where the logs folder is created on the computer, then issue the command tensorboard --logdir=logs. This command builds the graph and prints the URL at which the user can access it. Below is the result displayed when the URL is entered in a web browser.

Fig. 5.6: How to access TensorBoard.

This is the graphical interface used to visualize the LSTM RNN code and the results generated from executing it. It has six main menu items, and depending on what is specified in the code to be visualized, the various menus shown in figure 5.7 are populated with the required results. For instance, figure 5.6 shows that four main scalar items were captured by the TensorBoard visualizer because the code specified them. Since this work has no image or audio components, those tabs have nothing to display; the main focus is on the Scalars, Graphs, Distributions, Histograms and Embeddings tabs.

TensorBoard Graphs
Figure 5.7 below is the main graph for the entire TensorFlow LSTM RNN algorithm code. It shows that the code uses two batches of datasets, the training input and the testing input. The graph also shows the computational connections among the five layers of the algorithm and how the output is used to calculate the loss and accuracy of the model.

Fig. 5.7: TensorBoard Graphs.

Chapter No. 6

Results

The results of this thesis work are grouped into three main parts. These are:

• Data collection results

• Model detection accuracy results to address RQ1

• CPU and GPU Performance results to address RQ2

6.1 Data Collection Results

Five main processes were followed to achieve these results. The work carried out at each process level in obtaining and fine-tuning the dataset required for the thesis is illustrated below.

6.1.1 Data Collection Phase

At this phase of the process, the NSL-KDD dataset and its accompanying documentation were downloaded from https://iscxdownloads.cs.unb.ca/iscxdownloads, using the login credentials provided by the R&D Manager of the Canadian Institute for Cybersecurity (CIC). Figure 6.1 is a screenshot of the link provided by the New Brunswick Lab to download the dataset.

Fig. 6.1: ISCX link to download dataset.

This dataset has a reasonable number of records in both the train and test files. It does not contain redundant records, which prevents the learning algorithm from being biased towards methods that have better detection rates on repeated records. Below are the data files which make up the NSL-KDD dataset [53].

• KDDTrain+.ARFF: The full NSL-KDD train set with binary labels in ARFF format.

• KDDTrain+.TXT: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format.

• KDDTrain+20Percent.TXT: A 20% subset of the KDDTrain+.txt file.

• KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format.

• KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format.

• KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21.

• KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21.

The dataset obtained from the New Brunswick Lab was built from 22 different attack types, as shown in table 6.1, and each record is represented by 42 different feature fields, as shown in table 6.2. Of the 22 attack types, six (6) are DoS attack types and the remaining sixteen (16) are non-DoS attacks.

Tab. 6.1: The 22 attack types of the NSL-KDD dataset.

S/N  Name              Attack Type    S/N  Name          Attack Type
1    back              DoS            12   perl          u2r
2    buffer overflow   u2r            13   phf           r2l
3    ftp write         r2l            14   pod           DoS
4    guess passwd      r2l            15   portsweep     probe
5    imap              r2l            16   rootkit       u2r
6    ipsweep           probe          17   satan         probe
7    land              DoS            18   smurf         DoS
8    loadmodule        u2r            19   spy           r2l
9    multihop          r2l            20   teardrop      DoS
10   neptune           DoS            21   warezclient   r2l
11   nmap              probe          22   warezmaster   r2l

Table 6.1 lists the 22 different attack types in the dataset: six (6) are DoS attack types and sixteen (16) are non-DoS attack types.

Tab. 6.2: The 42 features of the NSL-KDD dataset.

S/N  Field Name           Description    S/N  Field Name                    Description
1    duration             continuous     22   is guest login                symbolic
2    protocol type        symbolic       23   count                         continuous
3    service              symbolic       24   srv count                     continuous
4    flag                 symbolic       25   serror rate                   continuous
5    src bytes            continuous     26   rerror rate                   continuous
6    dst bytes            continuous     27   srv serror rate               continuous
7    land                 symbolic       28   srv rerror rate               continuous
8    wrong fragment       continuous     29   same srv rate                 continuous
9    urgent               continuous     30   diff srv rate                 continuous
10   hot                  continuous     31   srv diff host rate            continuous
11   num failed logins    continuous     32   dst host count                continuous
12   logged in            symbolic       33   dst host srv count            symbolic
13   num compromised      continuous     34   dst host same srv rate        continuous
14   root shell           continuous     35   dst host diff srv rate        continuous
15   su attempted         continuous     36   dst host same src port rate   continuous
16   num root             continuous     37   dst host srv diff host rate   continuous
17   num file creations   continuous     38   dst host serror rate          continuous
18   num shells           continuous     39   dst host srv serror rate      continuous
19   num access file      continuous     40   dst host rerror rate          continuous
20   num outbound cmds    continuous     41   dst host srv rerror rate      continuous
21   is host login        symbolic       42   attack type                   label

6.1.2 Data Cleaning and Segmenting Phase

After segmenting the data into various sizes, the features which identify DDoS flooding attacks were extracted and converted into a format acceptable to TensorFlow. Four symbolic and label feature fields, namely protocol type, service, flag and attack type, were taken out to help re-size the dataset. Because the dataset contains attack types other than DDoS, a metadata file was created for each dataset size used in the iteration process so that the attacks could be properly visualized. The metadata was created using serial numbers and labels, meaning each attack type has a unique label and serial number to distinguish it from the others. A one-hot encoder was applied based on the attack serial number field. Lastly, the record field headings were removed so that the entire dataset consisted of numbers acceptable for vector computation. The next stage was to segment both the training and testing datasets into various sizes: 2000, 5000, 10000, 15000 and 20000 records, which were used to create different trained models. Each dataset size shares the same metadata size.
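As an illustration of this cleaning and segmenting step, a minimal pandas sketch might look as follows. The file name, column positions and output naming are assumptions based on the description above, not the exact thesis code.

    import pandas as pd

    df = pd.read_csv("KDDTrain+.TXT", header=None)   # headerless CSV, integer column index

    # Columns 1-3 (protocol type, service, flag) are symbolic and the textual
    # attack label sits near the end; these positions are illustrative assumptions.
    LABEL_COL = 41
    SYMBOLIC_COLS = [1, 2, 3]

    # Metadata: one unique serial number per attack name.
    serials = {name: i for i, name in enumerate(sorted(df[LABEL_COL].unique()))}
    df["serial"] = df[LABEL_COL].map(serials)

    # Drop the symbolic fields and the textual label, keeping numeric fields only.
    numeric = df.drop(columns=SYMBOLIC_COLS + [LABEL_COL])

    # Segment into the record sizes used for the five iterations.
    for size in (2000, 5000, 10000, 15000, 20000):
        numeric.head(size).to_csv("KDDTrain_%d.csv" % size, header=False, index=False)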

6.1.3 Data Pre-Processing Phase

In this phase of the process, the 2-D testing and training datasets were converted into a vector form acceptable for neural network use. Records with empty fields were initialized to 0, and all other values were converted to floating-point numbers. This was achieved in the code by calling the tf.decode_csv and tf.cast functions respectively.
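A rough TensorFlow 1.x sketch of this conversion, using the tf.decode_csv and tf.cast functions named above, is shown below; the five-field record layout and the default values are simplifying assumptions.

    import tensorflow as tf

    # One CSV line from the segmented dataset (simplified to five numeric fields).
    line = tf.constant("0,181,5450,1,0")

    # Defaults of 0.0 initialise empty fields to 0, as described above.
    record_defaults = [[0.0]] * 5
    fields = tf.decode_csv(line, record_defaults=record_defaults)

    # Stack into a feature vector and ensure everything is float32.
    features = tf.cast(tf.stack(fields), tf.float32)

    with tf.Session() as sess:
        print(sess.run(features))   # -> [0.0, 181.0, 5450.0, 1.0, 0.0]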

6.1.4 LSTM RNN Training and Testing Phase

The vectorized training and testing datasets were used to train the TensorFlow LSTM RNN algorithm. At this stage, the various evaluation parameters, such as dataset size, nodes, biases, weights, epochs and learning rate, were manipulated while the training accuracy, mean square error and cost were monitored to ensure that the best detection accuracy was achieved.

6.1.5 Classification of attacks

As a one-hot encoding process was used, all DDoS flooding attacks detected by the TensorFlow LSTM algorithm were represented by 1, while all other traffic, including normal packets and non-DoS attacks, was represented by 0. I adopted this approach for ease of interpretation of the data after the simulation runs successfully. For easy interpretation of the results, the embedding visualizer of TensorBoard was employed to help distinguish the detected DDoS attacks from other attacks with similar characteristics. Figures 6.2 and 6.3 show the 3D embedding visualizer for a 2000-record sample with 38 features from 23 different attack types.

Fig. 6.2: 3D embedding visualizer of the 2000-record sample (38 features, 23 attack types), represented by the coded serial numbers created for the metadata.

Fig. 6.3: 3D embedding visualizer of the 2000-record sample (38 features, 23 attack types), represented by the coded labels created for the metadata.
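Returning to the classification rule above, the mapping of DDoS flooding attacks to 1 and all other traffic to 0 can be sketched as follows; the serial numbers assigned to the six DoS attack types are illustrative assumptions, since the actual metadata serials are generated per dataset.

    import numpy as np

    # Hypothetical serials for the six DoS flooding types of table 6.1
    # (back, land, neptune, pod, smurf, teardrop).
    DDOS_SERIALS = {1, 7, 10, 14, 18, 20}

    def binarize(attack_serials):
        # 1 for DDoS flooding attacks, 0 for normal packets and non-DoS attacks.
        return np.array([1 if s in DDOS_SERIALS else 0 for s in attack_serials])

    print(binarize([1, 5, 10, 3]))   # -> [1 0 1 0]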

6.2 Results from CPU Based environment

6.2.1 Training the Model

To efficiently train the model and increase its detection accuracy, the iteration process shown in figure 6.4 was followed. The main aim is to reduce the cost function so that the distance between the model's predictions and the provided dataset is reduced. This involved updating variables such as the weights, biases and learning rate until the cost became small.

Fig. 6.4: Iteration process adapted to increase the efficiency of LSTM RNN Model.
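The iteration loop of figure 6.4 can be summarised in a TensorFlow 1.x training-loop sketch like the one below, built around the gradient descent optimizer and the softmax cross-entropy classifier listed in chapter 5. The single dense ReLu layer and the dummy data are stand-in assumptions; the real code trains the five-layer LSTM RNN graph.

    import numpy as np
    import tensorflow as tf

    NODES, NUM_CLASSES, LEARNING_RATE, EPOCHS = 40, 17, 0.001, 100

    # Dummy stand-in data: 200 records with 38 numeric features each.
    train_x = np.random.rand(200, 38).astype(np.float32)
    train_y = np.eye(NUM_CLASSES)[np.random.randint(0, NUM_CLASSES, 200)].astype(np.float32)

    x = tf.placeholder(tf.float32, [None, 38])
    y = tf.placeholder(tf.float32, [None, NUM_CLASSES])

    # One ReLu layer standing in for the five-layer LSTM RNN.
    w1 = tf.Variable(tf.random_normal([38, NODES]))
    b1 = tf.Variable(tf.random_normal([NODES]))
    hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
    w2 = tf.Variable(tf.random_normal([NODES, NUM_CLASSES]))
    b2 = tf.Variable(tf.random_normal([NUM_CLASSES]))
    logits = tf.matmul(hidden, w2) + b2

    # Cost (softmax cross-entropy), optimizer and accuracy metric.
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
    train_step = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(cost)
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(EPOCHS):
            # Update weights/biases, then watch the cost shrink and accuracy grow.
            _, c, a = sess.run([train_step, cost, accuracy],
                               feed_dict={x: train_x, y: train_y})
            if epoch % 10 == 0:
                print("epoch", epoch, "cost", c, "accuracy", a)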

6.2.2 CPU base system Results -RQ1

To achieve better results from the model, five (5) iterations were performed based on seven (7) evaluation parameters. Of these seven (7) parameters, the weights and biases were randomly generated; the layers, nodes and learning rate were held constant; and the dataset size and epochs were varied in all the iterations.

6.2.3 CPU Iteration 1

The first iteration was based on the parameters shown in table 6.3. The parameters in table 6.3 include dataset size, epochs, learning rate, nodes, layers, weights, and biases; the other three parameters, namely time, mean square error (MSE) and accuracy, are the training results. The goal of this iteration is to establish the first training results upon which further iterations can be based to determine the performance of the model. Figure 6.5 is the variable explorer of the code, which shows the assigned parameters used in the model training. Figure 6.6 shows the training results for iteration 1. Lastly, figure 6.7 is the graph from TensorBoard which represents the iteration 1 training results; it contains two graphs, the accuracy and the loss graph of the trained model. From figures 6.6 and 6.7, the results show that the developed model has a very high detection rate and a low false alarm rate. Out of the 100 iterations performed by the algorithm, 70% of the iterations gave a 100% detection accuracy and 30% gave an average of 99.70% detection accuracy. However, the model completed its training with a 100% detection accuracy. These results support the fact that the LSTM RNN technique detects DDoS attacks accurately and efficiently when implemented using the TensorFlow framework.

Tab. 6.3: CPU iteration 1 parameters and training results.

Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (s)   MSE   Accuracy
2000           100      0.001           40      5        Random    Random   173        nan   1

Fig. 6.5: Variable Explorer values for iteration 1

Fig. 6.6: CPU results for iteration 1 based on 2000 dataset size and 100 epochs.

Fig. 6.7: Graph results of iteration 1 based on 100 epochs and 2000 dataset size.

6.2.4 CPU Iteration 2

The purpose of the second iteration is to determine whether the model can maintain its 100% detection accuracy when the values of the varied parameters are changed, and to measure the effect of the increased epochs on the model's training time. This iteration is based on the parameters shown in table 6.4. Comparing table 6.3 to table 6.4, the varied parameters, dataset size and epochs, have been increased by 150% and 100% respectively; all other parameters, such as learning rate, nodes and layers, remained the same. Figure 6.8 shows the training results based on the parameters set out in table 6.4, and figure 6.9 shows the TensorBoard scalar graphs representing the training results, in two sections for the accuracy and the loss of the trained model. The accuracy and loss graphs show the lowest detection accuracy and the corresponding loss reached by the model to be 0.995 and 5.000e-3 respectively. From figures 6.8 and 6.9, the results show that the model developed in iteration 2 has a very high detection rate and a low false alarm rate. Out of the 200 iterations performed by the algorithm, 75% of the iterations gave a 100% detection accuracy and 25% gave an average of 99.74% detection accuracy. However, the model completed its training with a 100% detection accuracy. Compared with iteration 1, the iteration 2 results show a performance increase of 5% in detection accuracy. Also, the training time for iteration 2 increased by 398.27% over that of iteration 1, which is attributed to the larger epochs and dataset size. These results support the fact that the LSTM RNN technique detects DDoS attacks accurately and efficiently when implemented using the TensorFlow framework.

Tab. 6.4: CPU iteration 2 parameters and training results.

Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (s)   MSE   Accuracy
5000           200      0.001           40      5        Random    Random   862        nan   1


Fig. 6.8: CPU results for iteration 2 based on 5000 dataset size and 200 epochs.

Fig. 6.9: Graph results of iteration 2 based on 200 epochs and 5000 dataset size (TensorBoard presentation of results based on the table 6.4 parameters).

6.2.5 CPU Iteration 3

The goal of the third iteration is to determine whether the model will continue to show the 5% increase in detection accuracy and the 398.27% increase in training time when the varied parameters, dataset size and epochs, are increased by 100% and 50% respectively compared to iteration 2. Table 6.5 shows the values for the various evaluation parameters; the values for learning rate, nodes and layers remain the same, and the weights and biases continue to be randomly generated. From figures 6.11 and 6.12, the results show that the developed model has a very high detection rate and a low false alarm rate. Out of the 300 iterations performed by the algorithm, 60% of the iterations gave a 100% detection accuracy and 40% gave an average of 99.87% detection accuracy. However, the model completed its training with a 100% detection accuracy. A comparison of iterations 2 and 3 reveals a drop of 15% both in the share of 100% detection accuracy and in the average detection accuracy. Also, the percentage increase of the training time dropped from 398.27% to 191.531%. These results reveal that the model was not able to maintain the percentage increases in detection accuracy and training time when the same percentage increases of dataset and epochs were applied. However, since the model completed with a 100% detection accuracy, this supports the fact that the LSTM RNN technique detects DDoS attacks accurately and efficiently when implemented using the TensorFlow framework.

Tab. 6.5: CPU iteration 3 parameters and training results.

Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (s)   MSE   Accuracy
10000          300      0.001           40      5        Random    Random   2513       nan   1

Figure 6.10 shows the variable explorer for iteration 3, reflecting the changes in the dataset size and the epochs.

Fig. 6.10: Variable Explorer values for iteration 3.

Figure 6.11 shows the training results for iteration 3, based on the parameters set out in table 6.5. These results also show the increase in epochs, the storage location of the trained model, the mean square error (MSE) and the accuracy of the trained model.

Fig. 6.11: CPU results for iteration 3 based on 10000 dataset size and 300 epochs.

Figure 6.12 shows the TensorBoard scalar graphs representing the training results, in two sections for the accuracy and the loss of the trained model. The accuracy and loss graphs show the lowest detection accuracy and the corresponding loss reached by the model to be 0.995 and 5.000e-3 respectively.

Fig. 6.12: Graph results of iteration 3 based on 300 epochs and 10000 dataset size (TensorBoard representation of the training results based on the parameters shown in table 6.5).

6.2.6 CPU Iteration 4

The goal of the fourth iteration is to determine whether the model will maintain its final 100% detection accuracy, the decrease in training-time growth, the increase in average detection accuracy and the decrease in the share of 100% detection accuracy during training. To verify these goals, the dataset and epochs were increased by 50% and 33.33% respectively compared to iteration 3. Table 6.6 shows the values for the various evaluation parameters; the values for learning rate, nodes and layers remain the same, and the weights and biases continue to be randomly generated. From figures 6.13 and 6.14, the results show that the developed model has a very high detection rate and a low false alarm rate. Out of the 400 iterations performed by the algorithm, 47.5% of the iterations gave a 100% detection accuracy and 52.5% gave an average of 99.90% detection accuracy. However, the model completed its training with a 99.85% detection accuracy. A comparison of the iteration 3 and 4 results shows a drop of 0.15% in the final detection accuracy of the model, a decrease of 12.5% in the share of 100% detection accuracy during training and an increase of 0.03% in the average detection accuracy. Also, the percentage increase of the training time dropped from 191.531% to 98.448%. These results reveal decreases in the final detection accuracy, in the share of 100% detection accuracy during training and in the training-time growth. However, since the model completed with a 99.85% detection accuracy, this supports the fact that the LSTM RNN technique detects DDoS attacks accurately and efficiently when implemented using the TensorFlow framework.

Tab. 6.6: CPU iteration 4 parameters and training results.

Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (s)   MSE   Accuracy
15000          400      0.001           40      5        Random    Random   4987       nan   0.998533

Fig. 6.13: CPU results for iteration 4 based on 15000 dataset size and 400 epochs.

Fig. 6.14: Graph results of iteration 4 based on 400 epochs and 15000 dataset size (TensorBoard presentation of results based on the table 6.6 parameters).

6.2.7 CPU Iteration 5

The goal of the fifth iteration is to determine whether the final detection accuracy, the share of 100% detection accuracy during training, and the training-time growth will continue to decrease. To verify this goal, the dataset and epochs were increased by 33.33% and 25% respectively compared to iteration 4. Table 6.7 shows the values for the various evaluation parameters; the values for learning rate, nodes and layers remain the same, and the weights and biases continue to be randomly generated. Figure 6.15 shows the various training stages of the model. The entire training took 2 hours, 7 minutes and 30 seconds to complete. Looking at the training results, it can be seen that the model began learning during the first iteration, when the first batch of the dataset was presented to it. The minimum and maximum accuracy rates are 0.99725 (99.725%) and 1 (100%) respectively. From figures 6.15 and 6.16, the results show that the developed model has a very high detection rate and a low false alarm rate. Out of the 500 iterations performed by the algorithm, 36% of the iterations gave a 100% detection accuracy and 64% gave an average of 96.91% detection accuracy during the training. However, the model completed its training with a 99.99% detection accuracy. Comparing the results of iterations 4 and 5, there is an increase of 0.14% in the final detection accuracy of the model, a decrease of 11.5% in the share of 100% detection accuracy during training and an increase of 16.5% in the average detection accuracy. Also, the percentage increase of the training time increased from 49.609% to 65.731%. Since the training completed with an accuracy of 0.9999 (99.99%), a better detection accuracy than the iteration 4 model, this supports the fact that the LSTM RNN technique detects DDoS attacks accurately and efficiently when implemented using the TensorFlow framework.

Tab. 6.7: CPU iteration 5 parameters and training results.

Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (s)   MSE   Accuracy
20000          500      0.001           40      5        Random    Random   8265       nan   0.9999

Fig. 6.15: CPU results for iteration 5 based on 20000 dataset size and 500 epochs.

Fig. 6.16: Graph results of iteration 5 based on 500 epochs and 20000 dataset size (TensorBoard representation of the training results based on the parameters shown in table 6.7).

6.3 Analysis of CPU Based Environment Results -RQ1

Table 6.8 summarises the results of the five-iteration process. The iterations produced results for the following important parameters: final accuracy of the trained model, 100% detection accuracy during training, average % accuracy for the entire training, average mean square error (MSE), average accuracy, dataset, epochs, training time and time % increase. The meanings of the produced parameters are:

• Final accuracy of the trained model: the final training result produced by the model.

• 100% detection accuracy during training: the percentage of training iterations in which the model produced a 100% detection accuracy.

• Average % accuracy for the entire training: the percentage of training iterations that produced a non-100% detection accuracy.

• Average accuracy: the average of the training results below 100%.

• Average mean square error (MSE): the average MSE for the training results below 100%.

• Dataset: the number of records used for the training process.

• Epochs: the number of times the model is exposed to the attack features during training.

• Time: the time taken to complete each iteration.

• Time % increase: the percentage increase in training time over the five iterations.

Tab. 6.8: Model training results for the five (5) iterations.

Results                           Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Final accuracy                    1             1             1             0.9985        0.9999
100% accuracy during training     70            75            60            47.5          36
Average % accuracy                30            25            40            52.5          64
Average accuracy                  0.997         0.9974        0.9987        0.999         0.9691
Average mean square error (MSE)   0.003         0.0026        0.0013        0.001         0.0309
Dataset                           2,000         5,000         10,000        15,000        20,000
Epochs                            100           200           300           400           500
Time (seconds)                    173           862           2513          4987          8265
Time % increase                   0             398.72        191.531       98.44         65.731

To understand the general relationship between the iteration processes based on the evaluation parameters, table 6.9 was generated from table 6.8 and used to construct the graph shown in figure 6.17 below. This graph shows both the evaluation parameters and their results for each iteration process. The general trend in the graph shows an increase in dataset, epochs and training time over the five iterations. The detection accuracy, however, did not show such a trend but kept changing as the other parameters were increased.

Tab. 6.9: Relations between iterations.

Results          Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Final accuracy   1             1             1             0.9985        0.9999
Dataset          2,000         5,000         10,000        15,000        20,000
Epochs           100           200           300           400           500
Time (seconds)   173           862           2513          4987          8265

Fig. 6.17: Relations among the five iteration processes.

6.3.1 Accuracy and Dataset size Analysis:

From figure 6.18, it can be inferred that the first three iterations achieved 100% accuracy. This is due to the exposure of the model to attacks with distinct features. However, as the dataset size increased by 50% in iteration 4, there was a sharp drop in the final accuracy rate from 100% to 99.85%. This may be due to the increase in dataset size and the exposure of the model to different attack features with common similarities. As the dataset size increased further and more of such attacks with similar features were used to train the model, the detection accuracy rate rose to 99.99%. This supports the fact that the accuracy of the trained model depends on the different attack features it is exposed to.

Tab. 6.10: Accuracy and dataset analysis.

Results          Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Final accuracy   1             1             1             0.9985        0.9999
Dataset          2,000         5,000         10,000        15,000        20,000

Fig. 6.18: Accuracy and Dataset Size Analysis

6.3.2 Accuracy and Epochs Analysis:

Epochs play a very important role in increasing the accuracy of the model. An epoch is one pass in which the algorithm is exposed to the features of the various attacks. From the various tests conducted on the model, it was discovered that if the dataset size is large, the model requires more epochs to learn and distinguish the various attack features. This is why the same number of epochs was not used for every training run; rather, the epochs were increased as the dataset size increased. Figure 6.19 shows the relation between epochs and the detection accuracy of the model.

Tab. 6.11: Accuracy and epochs analysis.

Results          Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Final accuracy   1             1             1             0.9985        0.9999
Epochs           100           200           300           400           500

Fig. 6.19: Accuracy and Epochs Analysis

6.3.3 Final and Average Accuracy Analysis:

The accuracy of the model is an important aspect of this project. In view of that, the final accuracy of each iteration was compared to the average training accuracy of the model. Table 6.12 shows the results from the five iterations. It was revealed that the average training accuracy increased consistently over the first four iterations, which had nearly 100% detection accuracy. However, the average accuracy dropped sharply in the fifth iteration as the model began to learn attack features with similar characteristics.

Tab. 6.12: Final and average accuracy analysis.

Results            Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Final accuracy     1             1             1             0.9985        0.9999
Average accuracy   0.997         0.9974        0.9987        0.999         0.9691

Fig. 6.20: Final and Average Accuracy Analysis

6.4 Results from GPU Based environment

In answering RQ2, five iterations were performed on the GPU-based system and the results were compared to the CPU-based system results. The evaluation parameters and their respective values used for this part of the work are shown in Table 6.13 below.

Tab. 6.13: GPU evaluation parameters.

Evaluation    Dataset   Epochs   Learning   Nodes   Layers   Weights   Biases
parameters    size               rate
Iteration 1   2000      100      0.001      40      5        Random    Random
Iteration 2   5000      200      0.001      40      5        Random    Random
Iteration 3   10000     300      0.001      40      5        Random    Random
Iteration 4   15000     400      0.001      40      5        Random    Random
Iteration 5   20000     500      0.001      40      5        Random    Random

6.4.1 GPU Iteration 1

The first iteration was based on the parameters shown in Table 6.14. The evaluation parameters in Table 6.14 include dataset size, epochs, learning rate, nodes, layers, weights, and biases. The other three columns, training time, mean square error (MSE) and accuracy, are the training results. The goal of this iteration is to establish the first training results upon which further iterations can be based to determine the performance of the model on the GPU based system. Figure 6.21 shows the training results for iteration 1, and Figure 6.22 shows the TensorBoard graphs representing those results: the accuracy graph and the loss graph of the trained model. From Figures 6.21 and 6.22, the results show that the developed model has a very high detection rate and a low false alarm rate. Of the 100 training epochs performed by the algorithm, 70% gave 100% detection accuracy and 30% gave an average of 99.70% detection accuracy; the model nevertheless completed its training with 100% detection accuracy. The training completed in 129 seconds, a 25.43% decrease in training time compared to the CPU iteration 1 result. These first GPU iteration results are identical to those of the first CPU iteration except for the training time, so no firm conclusions can be drawn until the remaining four iterations are completed.

Tab. 6.14: GPU Iteration 1 Parameters

Evaluation Parameters   Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (seconds)   MSE   Accuracy
Results                 2000           100      0.001           40      5        Random    Random   129              nan   1

Fig. 6.21: GPU results for iteration 1 based on 100 epochs and 2000 dataset size

Average training detection accuracy = (0.994 + 0.9985 + 0.9985)/3 = 2.991/3 = 0.997

Graph of iteration 1 GPU results based on 100 epochs and 2000 dataset size.

Fig. 6.22: TensorBoard representation of the training results based on the parameters shown in Table 6.14

6.4.2 GPU Iteration 2

The purpose of the second iteration is to determine whether the trend in the GPU-CPU iteration 1 results repeats itself, that is, whether the GPU iteration 2 results match the CPU iteration 2 results; only then can possible conclusions begin to be drawn. GPU iteration 2 is based on the evaluation parameters shown in Table 6.15. Comparing Table 6.14 with Table 6.15, the varied parameters, dataset size and epochs, were increased by 150% and 100% respectively, while all other parameters, learning rate, nodes and layers, remained the same. Figure 6.23 shows the training results based on the parameters set out in Table 6.15, and Figure 6.24 shows the TensorBoard scalar graphs representing those results. The graph is in two sections, the accuracy and loss graphs of the trained model; they show the lowest plotted detection accuracy to be 0.9995 and the corresponding loss to be 5.000e-4. From Figures 6.23 and 6.24, the results show that the model developed in GPU iteration 2 has a very high detection rate and a low false alarm rate. Of the 200 training epochs performed by the algorithm, 75% gave 100% detection accuracy and 25% gave an average of 99.74% detection accuracy; the model completed its training with 100% detection accuracy. Compared with iteration 1, this is a five percentage point increase in the share of epochs achieving 100% detection accuracy. The training time for GPU iteration 2 also decreased by 31.90% relative to CPU iteration 2. These results show that the GPU system trains the model much faster than the CPU system, while the detection accuracy remains the same for both.

Tab. 6.15: GPU Iteration 2 Parameters

Evaluation Parameters   Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (seconds)   MSE   Accuracy
Results                 5000           200      0.001           40      5        Random    Random   587              nan   1

Fig. 6.23: GPU results for iteration 2 based on 200 epochs and 5000 dataset size

Average training detection accuracy = (0.9926 + 0.9972 + 0.9998 + 0.999 + 0.9984)/5 = 4.987/5 = 0.9974

Graph of iteration 2 GPU results based on 200 epochs and 5000 dataset size.

Fig. 6.24: TensorBoard representation of the training results based on the parameters shown in Table 6.15

6.4.3 GPU Iteration 3

The goal of GPU iteration 3 is to determine whether the detection accuracy continues to match its corresponding CPU iteration 3 result while again obtaining a decrease in training time. Table 6.16 shows the values of the evaluation parameters; learning rate, nodes and layers remain the same, and weights and biases continue to be randomly generated. From Figures 6.25 and 6.26, the results show that the developed model has a very high detection rate and a low false alarm rate. Of the 300 training epochs performed by the algorithm, 80% gave 100% detection accuracy and 20% gave an average of 99.87% detection accuracy; the model completed its training with 100% detection accuracy. Compared with iteration 2, this is a five percentage point increase in the share of epochs achieving 100% detection accuracy. The training time for GPU iteration 3 also decreased by 29.56% relative to CPU iteration 3. These results show that the GPU system trains the model about 30% faster than the CPU system, while the detection accuracy remains the same on both CPU and GPU based systems.

Tab. 6.16: GPU Iteration 3 Parameters

Evaluation Parameters   Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (seconds)   MSE   Accuracy
Results                 10000          300      0.001           40      5        Random    Random   1770             nan   1

Fig. 6.25: GPU results for iteration 3 based on 300 epochs and 10000 dataset size.

Graph of iteration 3 GPU results based on 300 epochs and 10000 dataset size.

Fig. 6.26: TensorBoard representation of the training results based on the parameters shown in Table 6.16

6.4.4 GPU Iteration 4

For the first three iterations, the GPU and CPU based systems produced consistently equal detection accuracy, with the GPU based system also showing a decrease of nearly 30% in training time. The goal of GPU iteration 4 is to determine whether the detection accuracy and training time results continue to match the corresponding CPU iteration 4 results; this will help confirm whether the detection accuracy depends on the system used to train the model. Table 6.17 shows the values of the evaluation parameters; learning rate, nodes and layers remain the same, and weights and biases continue to be randomly generated. From Figures 6.27 and 6.28, the results show that the developed model has a very high detection rate and a low false alarm rate. Of the 400 training epochs performed by the algorithm, 47.5% gave 100% detection accuracy and 52.5% gave an average of 90.39% detection accuracy; the model completed its training with 99.85% detection accuracy, a decrease of 0.15 percentage points relative to iteration 3. The training time for GPU iteration 4 decreased by 22.28% relative to CPU iteration 4, showing that the GPU system trains the model about 22% faster than the CPU system, while the detection accuracy remains the same on both CPU and GPU based systems.

Tab. 6.17: GPU Iteration 4 Parameters

Evaluation Parameters   Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (seconds)   MSE   Accuracy
Results                 15000          400      0.001           40      5        Random    Random   3876             nan   0.9985


Fig. 6.27: GPU results for iteration 4 based on 400 epochs and 15000 dataset size.

Graph of iteration 4 GPU results based on 400 epochs and 15000 dataset size.

Fig. 6.28: TensorBoard representation of the training results based on the parameters shown in Table 6.17

6.4.5 GPU Iteration 5

This last iteration on the GPU based system is intended to confirm the trends observed in the previous four iterations: that GPU based systems train faster than CPU based systems, and that the performance of the trained model does not depend on the system used for training. The iteration is based on the parameters shown in Table 6.18. From Figures 6.29 and 6.30, the results show that the developed model has a very high detection rate and a low false alarm rate. Of the 500 training epochs performed by the algorithm, 34% gave 100% detection accuracy and 66% gave an average of 99.93% detection accuracy during training; the model completed its training with 99.99% detection accuracy. Compared with iteration 4, GPU iteration 5 shows an increase of 0.14 percentage points in the final detection accuracy, a better result than the iteration 4 model. The training time for GPU iteration 5 also decreased by 29.89% relative to CPU iteration 5, showing that the GPU system trains the model about 30% faster than the CPU system, while the detection accuracy remains the same on both CPU and GPU based systems.

Tab. 6.18: GPU Iteration 5 Parameters

Evaluation Parameters   Dataset size   Epochs   Learning rate   Nodes   Layers   Weights   Biases   Time (seconds)   MSE   Accuracy
Results                 20000          500      0.001           40      5        Random    Random   5795             nan   0.99995


Fig. 6.29: GPU results for iteration 5 based on 500 epochs and 20000 dataset size.

Graph of iteration 5 GPU results based on 500 epochs and 20000 dataset size.

Fig. 6.30: TensorBoard representation of the training results based on the parameters shown in Table 6.18

6.5 Analysis of CPU and GPU Based System Results RQ2

Table 6.19 below summarises the results obtained from both the CPU and GPU systems. To arrive at these results, five iterations were performed on each system using the evaluation parameters shown in the table. The training times obtained provide evidence that the GPU system outperforms the CPU system in training time. However, given the same evaluation parameters, the trained model produced the same detection accuracy on both systems, as shown in Figure 6.31. This implies that, given the same training time budget, a GPU system can train a model on more attack features, enabling it to detect more attacks than a CPU-trained model; and the more different attack features the model is trained on, the better its detection accuracy and hence its performance. Figure 6.31 and Table 6.20 show the timing relationship between the CPU and GPU based systems, and a sketch of how the same graph can be pinned to either processor follows the table.

Tab. 6.19: Summary of GPU and CPU evaluation parameters and training results

Evaluation Parameters   Dataset Size   Epochs   Learning Rate   Nodes   Layers   Weights   Biases   GPU-Time   GPU-MSE   GPU-Accuracy   CPU-Time   CPU-MSE   CPU-Accuracy
Iteration 1             2000           100      0.001           40      5        Random    Random   129        nan       1              173        nan       1
Iteration 2             5000           200      0.001           40      5        Random    Random   587        nan       1              862        nan       1
Iteration 3             10000          300      0.001           40      5        Random    Random   1770       nan       1              2513       nan       1
Iteration 4             15000          400      0.001           40      5        Random    Random   3876       nan       0.9985         4987       nan       0.9985
Iteration 5             20000          500      0.001           40      5        Random    Random   5795       nan       0.9999         8265       nan       0.9999
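As a minimal sketch only, in the TensorFlow 1.x style used in this thesis, the same graph can be pinned to either processor with `tf.device`. Here `build_lstm_graph` and `feed` are hypothetical stand-ins for the Chapter 5 model constructor and its input feed; they are not part of the thesis code.

```python
import time
import tensorflow as tf

def train_on(device, build_lstm_graph, feed, epochs):
    """Train the same LSTM graph on a chosen device ('/cpu:0' or '/gpu:0')."""
    tf.reset_default_graph()
    with tf.device(device):
        train_op, accuracy = build_lstm_graph()  # assumed to return these two ops
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        start = time.time()
        for _ in range(epochs):
            sess.run(train_op, feed_dict=feed)
        return sess.run(accuracy, feed_dict=feed), time.time() - start
```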

6.5.1 CPU- GPU Accuracy Analysis:

Fig. 6.31: CPU-GPU Accuracy analysis.

6.5.2 CPU and GPU Time analysis:

Tab. 6.20: CPU and GPU Time results (training time in seconds)

Results   Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
GPU       129           587           1770          3876          5795
CPU       173           862           2513          4987          8265

Fig. 6.32: CPU-GPU base system time analysis.
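The relative time savings quoted in the per-iteration discussions (25.43% for iteration 1, 31.90% for iteration 2, and so on) can be reproduced directly from Table 6.20; a minimal sketch:

```python
# Training times in seconds per iteration, copied from Table 6.20.
gpu_times = [129, 587, 1770, 3876, 5795]
cpu_times = [173, 862, 2513, 4987, 8265]

for i, (g, c) in enumerate(zip(gpu_times, cpu_times), start=1):
    saving = 100.0 * (c - g) / c  # percentage decrease in training time on GPU
    print("Iteration %d: GPU training time is %.2f%% lower" % (i, saving))
# Yields approximately 25.43%, 31.90%, 29.57%, 22.28% and 29.89%.
```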

6.5.3 CPU and GPU Epoch analysis:

The results obtained show that epochs have a positive relation with model training time on both GPU and CPU based systems, since an epoch is one pass in which the model is trained on the attack features. The more the model learns the different and similar attacks, the more its detection accuracy improves; however, this increases the training time and the system resources consumed. Table 6.21 and Figure 6.33 show the relation between epochs and training time on the GPU and CPU based systems.

Tab. 6.21: CPU and GPU Epochs results

Results        Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
GPU time (s)   129           587           1770          3876          5795
CPU time (s)   173           862           2513          4987          8265
Epochs         100           200           300           400           500

Fig. 6.33: CPU-GPU base system Epochs analysis.

Table 6.22 below presents the results obtained after five iterations in which the dataset size was varied while the number of epochs was held constant. The results show that the GPU system requires less training time than the CPU system. They also confirm that holding epochs constant does not affect the detection accuracy on either system, because the trained models learn from the same dataset; the GPU model simply learns faster than the CPU model.

Tab. 6.22: CPU and GPU results with varying dataset and constant Epochs

Evaluation Parameters   Dataset Size   Epochs   Learning Rate   Nodes   Layers   Weights   Biases   GPU-Time   GPU-MSE   GPU-Accuracy   CPU-Time   CPU-MSE   CPU-Accuracy
Iteration 1             2000           100      0.001           40      5        Random    Random   130        nan       1              171        nan       1
Iteration 2             5000           100      0.001           40      5        Random    Random   314        nan       1              391        nan       1
Iteration 3             10000          100      0.001           40      5        Random    Random   622        nan       1              790        nan       1
Iteration 4             15000          100      0.001           40      5        Random    Random   932        nan       0.999933       1159       nan       0.999933
Iteration 5             20000          100      0.001           40      5        Random    Random   1235       nan       0.99995        1533       nan       0.99995

6.5.4 Epochs Analysis on GPU Systems Training Time:

The purpose of the epoch analysis was to determine the effect on training time of holding the number of epochs constant versus varying it. The results from the five iterations show that the training times differ depending on whether epochs are held constant or varied. Table 6.23 and Figure 6.34 show the relation between constant and varied epoch values. The results also show that epochs do not affect the detection accuracy of the model when the same dataset size is used for training: the same detection accuracy was obtained whether the same number of epochs or different numbers of epochs were used across the five iterations. This implies that once an acceptable number of epochs is established, all that is needed to widen the model's attack detection spectrum is to train it with more data.

Tab. 6.23: Epoch Analysis on GPU systems results

Results                         Iteration 1   Iteration 2   Iteration 3   Iteration 4   Iteration 5
Constant epochs                 100           100           100           100           100
GPU time (s), constant epochs   130           314           622           932           1235
GPU time (s), varying epochs    129           587           1770          3876          5795
Varying epochs                  100           200           300           400           500
Dataset Size                    2000          5000          10000         15000         20000

Fig. 6.34: Constant and varied epochs analysis on the GPU based system.

Table 6.24 below presents the model training results from the various deep learning detection techniques; it is evident that this project's trained model has a higher detection accuracy and lower false alarm rate than those reported by Yuan et al. [55].

Tab. 6.24: Performance comparison of LSTM, Random Forest and project results

Model Name                          Error Rate   Accuracy
LSTM RNN by Peter Ken Bediako       0.007%       99.993%
LSTM by Yuan et al. [55]            2.394%       97.606%
Random Forest by Yuan et al. [55]   6.373%       93.627%

Chapter No. 7

Discussion

There are three main processes that need to be followed to combat DDoS flooding attacks on a network: attack detection, analysis, and mitigation. The success of these processes, however, depends heavily on the model's ability to detect attacks. This thesis work therefore focused on developing a DDoS attack detection model with a high detection accuracy and a low false alarm rate. The attention was on DDoS because it is the most significant common threat to online service providers [11]. The effects of DDoS attacks have attracted several countermeasures from different research groups using different mechanisms. This thesis work adopted the deep learning technique called the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) algorithm, because of the gated cell's ability to decide what data may be written to it, read from it, and stored in it. This memory-like feature enables it to keep the attack features learnt during training and to make detection decisions based on the stored information; the standard gate formulation is sketched below. To code the LSTM RNN algorithm, the TensorFlow API was used, because of its performance, the flexibility of its implementation on different platforms, and the easy graphical interpretation of training results. The ultimate goal of this thesis work was to provide clear answers to research questions RQ1 and RQ2 as stated in chapter 1, since answers to these questions produce an efficient deep neural network model able to detect DDoS attack traffic patterns on the network infrastructure in a timely and efficient manner, achieving a high detection accuracy and a low false alarm rate.
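For reference, the standard LSTM gate formulation behind this gated-cell behaviour, following the common presentation in [54], is, at time step $t$ with input $x_t$, previous hidden state $h_{t-1}$ and previous cell state $C_{t-1}$:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate: what to erase from the cell}\\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate: what to write to the cell}\\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{candidate cell contents}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{updated cell state}\\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate: what to read out}\\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state passed to the next step}
\end{aligned}
```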

To arrive at these two research questions, RQ1 and RQ2, similar works by other research groups were carefully reviewed and analysed to identify the research gaps they present and how RQ1 and RQ2 would help address them. From the literature review, three main research gaps were identified. The first was that no existing research paper addressed DDoS flooding attack detection using both the LSTM RNN and the TensorFlow framework in the algorithm development; the two closest works, [54] and [55], each used either LSTM RNN or the TensorFlow framework, but not both as this thesis does. The second gap is that most of the papers used the conventional RNN, which has the vanishing and exploding gradient problem [55] that the LSTM RNN solves. Lastly, none of the research groups measured the performance of the LSTM RNN algorithm on both CPU and GPU based systems.

To provide answers to RQ1 and RQ2, the project adopted the Design Science Research Methodology, because the project produced a software artifact relevant to solving DDoS flooding attack detection. For better detection accuracy, a five-level iteration process was followed to build a model with high detection accuracy. The iteration process varied the epochs and dataset size; the other parameters, such as learning rate, nodes, and layers, were held constant, while weights and biases were randomly generated. To meet the requirements of RQ1 and RQ2, the project was conducted on both CPU and GPU based systems. On the two systems, different methods were used to install the Anaconda framework, which was then used to create the TensorFlow API environment for algorithm development; see Chapter 5, Section 5.3 for the step-by-step environment setup. A five-layer LSTM RNN deep neural network algorithm was developed and tested on a fine-tuned NSL-KDD dataset obtained from the Canadian Institute for Cybersecurity (CIC). A minimal sketch of how such a stacked network could be assembled with the thesis's constant parameters follows.
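As a minimal sketch only, not the thesis's exact code, a five-layer stacked LSTM with 40 nodes per layer and a 0.001 learning rate can be assembled in the TensorFlow 1.x API as below. The input/output shapes (41 NSL-KDD features, 5 traffic classes, a single time step) and the choice of the Adam optimiser are assumptions made for illustration.

```python
import tensorflow as tf

# Assumed shapes: 41 NSL-KDD features per record, 5 output classes, 1 time step.
N_FEATURES, N_CLASSES, TIME_STEPS = 41, 5, 1

x = tf.placeholder(tf.float32, [None, TIME_STEPS, N_FEATURES])
y = tf.placeholder(tf.float32, [None, N_CLASSES])

# Five LSTM layers of 40 units each; weights and biases are randomly
# initialised by TensorFlow's default initialisers, as in the iterations.
cells = [tf.nn.rnn_cell.BasicLSTMCell(40) for _ in range(5)]
outputs, _ = tf.nn.dynamic_rnn(tf.nn.rnn_cell.MultiRNNCell(cells), x,
                               dtype=tf.float32)

# Classify from the output of the last time step.
logits = tf.layers.dense(outputs[:, -1, :], N_CLASSES)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```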

The experiment conducted for RQ1 revealed two main things. The first was that detection accuracy has a positive relation with epochs and dataset size: the results showed that the larger the dataset, the more features the model learns from, and hence the more intelligent it becomes, which increases the detection rate and accuracy of the trained model. However, this required more training time and considerably more system resources.

Secondly, it was observed that as the dataset size increased, the model experienced a 0.15 percentage point drop in detection accuracy. This was due to the larger dataset exposing the model to more attack features with common similarities. However, as the model became familiar with such attacks, the accuracy rate increased to 99.99%. This supports the observation that the accuracy of the trained model depends on the different attack features it is exposed to.

Lastly, it was found that the model's detection accuracy on both CPU and GPU systems starts to deteriorate when the number of attack types to determine increases to 22. This indicates that the model is limited to determining up to 20 different attack types.

The experiment conducted for RQ2 revealed three main things. The first is that GPU systems outperform CPU systems in training time, but not in detection accuracy, when the same dataset size is used for training. However, given the same training time, a GPU-trained model will perform better than a CPU-trained model, because the GPU system can train the model on a larger dataset with more attack features in that time.

Secondly, the results show that epochs have a positive relation with model training time on both GPU and CPU based systems when the same dataset is used for training. However, epochs do not affect the accuracy of the model when the same dataset is used: the same detection accuracy was obtained whether the same number of epochs or different numbers of epochs were used across the five iterations.

Lastly, the dataset size is also related to training time. The results show that, even when the same number of epochs is used for training, a larger dataset requires more training time. Chapter No. 8

Conclusion and Future Works

8.1 Conclusion

This thesis work set out to address the two main research questions posed in Section 1.2 and to produce results that fulfil the research goals stated in Section 1.3, both in chapter one of this work. In addressing these tasks, the ultimate goal was to detect DDoS flooding attacks on a network. To achieve these objectives, artificial intelligence, specifically the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) machine learning technique, together with Google's second-generation machine learning framework TensorFlow, was adopted. The evaluation of the produced model was performed on two computing platforms, CPU and GPU.

To address the first research question (RQ1), a CPU based TensorFlow environment was created and the algorithm was coded using TensorFlow APIs, following the mathematical computational formulas of the LSTM architecture. To determine the accuracy and efficiency of the algorithm, it was evaluated against seven main parameters: dataset size, epochs, learning rate, nodes, layers, weights, and biases. Many iterations were performed to obtain the best results; each iteration assigned different values to dataset size and epochs and used randomly generated values for weights and biases, while layers, nodes and learning rate were held constant, until the best detection accuracy was achieved.

In all the iterations performed with different dataset sizes and epochs, the results show that the model becomes more intelligent as the dataset size increases, and that detection accuracy does not depend on the system used to train the model. The model produced by this project achieved a detection accuracy of 99.968%. Sections 6.3 (analysis of the CPU based environment results) and 6.5 (analysis of the CPU and GPU based system results) explain these findings further.

To check the platform compatibility and performance of the TensorFlow API on different computing platforms, the developed algorithm was also tested on CPU and GPU based architectures. The results in Chapter 6 clearly show that it is better to train the model on a GPU based system, whose greater computing power allows training to complete in less time than on a CPU based system. The practical challenge of this proposal is obtaining a GPU resource for the training.

8.2 Future Works

In this thesis work, the focus was only on detecting DDoS flooding attacks on the network. However, the dataset obtained comprised twenty-two (22) different attacks, of which six (6) were DDoS attack types and sixteen (16) were other attack types. The results obtained reveal that the model's detection accuracy deteriorated when the number of attack detection classes was increased from 17 to 20. The recommendation is that anyone extending this work should focus on how this algorithm can be modified to detect all the other attacks; this would extend the capabilities of any system built on this algorithm.

Secondly, the data pre-processing stage of the thesis was done manually, which took many working hours, because the source dataset was not in an acceptable TensorFlow format. The recommendation for this drawback is that further work be done to automate this process.

Thirdly, the thesis work was analysed on two computing resources, CPU and GPU based systems. Since TensorFlow models can be ported to different platforms, the recommendation is that researchers port the model to platforms such as Android, iOS, CUDA and Google's Cloud TPU to test the performance of the TensorFlow LSTM RNN algorithm.

Fourthly, the best results generated by the algorithm depended on parameters such as layers, dataset size, epochs, learning rate, weights, and biases. The values of these parameters were set manually for every iteration until the best results were achieved. It is recommended that other researchers work on automating the assignment of these parameter values; this would remove the guesswork and trial-and-error approach to finding the values that produce the best detection accuracy.

Lastly, although this thesis work touched on the TensorBoard embedded visualizer, it did not cover it extensively because of time constraints. Future work should therefore consider covering it extensively to improve on this project.

Bibliography

[1] W. Stallings, and L. Brown, Computer Security, Principles and Practice, 3rd Ed. New Jersey, FL: Pearson, 2014, ch. 7, pp. 242-258.

[2] D. Kshirsagar, S. Sawant, A. Rathod, and S. Wathore, CPU Load Analysis and Minimization for TCP SYN Flood Detection, Procedia Computer Science (International Conference on Computational Modelling and Security, CMS 2016), vol. 85, pp. 626-633, June 2016.

[3] S. Mansfield-Devine, DDoS goes mainstream: how headline-grabbing attacks could make this threat an organization's biggest nightmare, Network Security, vol. 2016, no. 11, pp. 7-13, Nov 2016.

[4] J. Frearson, (2015) The Kaspersky report 2015. [Online]. Available: http://business-reporter.co.uk/2015/01/29/firms-face-financial-loss-ddos-att [Accessed 4th January, 2017].

[5] A. Saied, R. Overill, and T. Radzik, Neurocomputing, Detection of known and unknown DDoS attacks using Artificial Neural Networks, vol. 172, no.1, pp. 385-393, Jan 2016.

[6] N. Ko, S. Noh, J. Park, S. Lee and H. Park, "An efficient anti-DDoS mechanism using flow-based forwarding technology," Digest of the 9th International Conference on Optical Internet (COIN 2010), Jeju, 2010, pp. 1-3.

[7] M. Yoon, ”Using whitelisting to mitigate DDoS attacks on critical Internet sites,” in IEEE Communications Magazine, vol. 48, no. 7, pp. 110-115, July 2010.

[8] M. Fallah, and N. Kahani, TDPF: a traceback-based distributed packet filter to mitigate spoofed DDoS attacks, Security and Communication Networks, vol. 7, no. 2, pp. 245-264, Feb 2014.

[9] A.R. Hevner, S.T. March, J. Park, and S. Ram, Design science in information systems research, Management Information Systems, vol. 28, no. 1, pp. 75-105, Mar 2004.

[10] T. Booth, and K. Andersson, Network DDoS Layer 3/4/7 Mitigation via Dynamic Web Redirection, Future Network Systems and Security (9783319480206), pp. 111-125, Nov 2016.

[11] M. Geva, A. Herzberg and Y. Gev, ”Bandwidth Distributed Denial of Service: Attacks and Defenses,” in IEEE Security and Privacy, vol. 12, no. 1, pp. 54-61, Jan.-Feb. 2014.

[12] E. Doron, and A. Wool, WDA: A Web farm Distributed Denial of Service attack attenuator, Computer Networks, vol. 55, no. 5, pp. 1037-1051, April 2011.

[13] S. Chen and Q. Song, ”Perimeter-based defense against high bandwidth DDoS attacks,” in IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 6, pp. 526-537, June 2005.

[14] K. Lu, D. Wu, J. Fan, S. Todorovic, and A. Nucci, Robust and efficient detection of DDoS attacks for large-scale internet, Computer Networks, vol. 51, no. 18, pp. 5036-5056, December 2007.

[15] D. Larson, Distributed denial of service attacks holding back the flood, Network Security, vol. 2016, no. 3, pp. 5-7, November 2016.

[16] S. Behal, and K. Kumar, Trends in Validation of DDoS Research, Procedia Computer Science (International Conference on Computational Modelling and Security, CMS 2016), vol. 85, pp. 7-15, January 2016.

[17] IMPERVA INCAPSULA, DDoS Attack [Online]. Available: https://www.incapsula.com/ddos/ddos-attacks/ [Accessed 4th January, 2017].

[18] N. Ko, S. Noh, J. Park, S. Lee, and H. Park, "An efficient anti-DDoS mechanism using flow-based forwarding technology," Digest of the 9th International Conference on Optical Internet (COIN 2010), Jeju, pp. 1-3, 2010.

[19] D. Stevanovic and N. Vlajic, ”Application-layer DDoS in dynamic Web-domains: Building defenses against next-generation attack behavior,” 2014 IEEE Conference on Communications and Network Security, San Francisco, CA, pp. 490-491, 2014.

[20] K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee, A Design Science Research Methodology for Information Systems Research, Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007.

[21] A. A. Acharya and K. M. Arpitha, An Intrusion Detection System Against UDP Flood Attack and Ping of Death Attack (DDOS) in MANET, International Journal of Engineering and Technology (IJET), vol. 8, no. 2, 2016.

[22] T. Chatterjee, and A. Bhattacharya, VHDL Modeling of Intrusion Detection and Prevention System (IDPS) A Neural Network Approach, International Journal of Computer Trends and Technology (IJCTT) ,arXiv, vol. 8, no. 1, pp. 52-56, 2014.

[23] UCI Knowledge Discovery in Databases Archive, [Online]. Available: https://kdd.ics.uci.edu/databases/kddcup99/task.html [Accessed 15th April, 2017].

[24] W. Hu, W. Hu and S. Maybank, ”AdaBoost-Based Algorithm for Network Intrusion Detection,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 2, pp. 577-583, April 2008.

[25] T. Chatterjee, and A. Bhattacharya, VHDL Modeling of Intrusion Detection and Prevention System (IDPS) A Neural Network Approach, International Journal of Computer Trends and Technology (IJCTT) ,arXiv, vol. 8, no. 1, pp. 52-56, 2014.

[26] N. Wattanapongsakorn et al., "A Practical Network-Based Intrusion Detection and Prevention System," 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, Liverpool, pp. 209-214, 2012.

[27] P. Arun Raj Kumar and S. Selvakumar, Distributed denial of service attack detection using an ensemble of neural classifiers, Computer Communications, vol. 34, no. 11, pp. 1328-1341, 2011.

[28] R. R. R. Robinson and C. Thomas, "Ranking of machine learning algorithms based on the performance in classifying DDoS attacks," 2015 IEEE Recent Advances in Intelligent Computational Systems (RAICS), Trivandrum, 2015, pp. 185-190.

[29] Y. Xiang and W. Zhou, Mark-Aided Distributed Filtering by using Neural Networks for DDoS defense, IEEE GLOBECOM 2005, pp. 1701-1705.

[30] L. Xu, Bayesian Ying-Yang System and Theory as a Unified Statistical Learning Approach: (III) Models and Algorithms for Dependence Reduction, Data Dimension Reduction, ICA and Supervised Learning, Lecture Notes in Computer Science: Proc. of International Workshop on Theoretical Aspects of Neural Computation, Hong Kong, Springer-Verlag, pp. 43-60, 1997.

[31] X. Xu, Y. Sun, and Z. Huang, Defending DDoS attacks using Hidden Markov Models and Cooperative Reinforcement Learning, PAISI 2007, LNCS 4430, pp. 196-207.

[32] S. Seufert and D. O'Brien, Machine Learning for Automatic Defense against Distributed Denial of Service Attacks, in: Proceedings of IEEE International Conference on Communications (ICC), 2007, pp. 1217-1222.

[33] R. Jalili, F. Imanimehr, M. Amini, and H. R. Shahriari, Detection of DDoS attacks using statistical preprocessor and unsupervised neural networks, LNCS, 2005, pp. 192-203.

[34] D. Gavrilis, and E. Dermatas, Real time detection of distributed denial-of-service attacks using RBF networks and statistical features, Computer Networks, vol. 44, pp 235-245, 2005.

[35] S. Liu, N. Yang, M. Li, and M. Zhou, A recursive recurrent neural network for statistical machine translation, 2014.

[36] Deep Learning for Java, [Online]. Available: https://deeplearning4j.org/data-sets-mldatasets-and-machine-learning [Accessed 1st May, 2017].

[37] G. Oke, G. Loukas and E. Gelenbe, "Detecting Denial of Service Attacks with Bayesian Classifiers and the Random Neural Network," 2007 IEEE International Fuzzy Systems Conference, London, 2007, pp. 1-6.

[38] A. B. M. A. A. Islam and T. Sabrina, "Detection of various denial of service and Distributed Denial of Service attacks using RNN ensemble," 2009 12th International Conference on Computers and Information Technology (ICCIT), Dhaka, 2009, pp. 603-608.

[39] Z. A. Baig and K. Salah, ”Multi-Agent pattern recognition mechanism for detecting distributed denial of service attacks,” in IET Information Security, vol. 4, no. 4, pp. 333-343, December 2010.

[40] M. H. Bhuyan, H. J. Kashyap, D. K. Bhattacharyya, and J. K. Kalita, Detecting distributed denial of service attacks: methods, tools and future directions, The Computer Journal, p. bxt031, 2013.

[41] G. Loukas and G. Oke, "Likelihood ratios and recurrent random neural networks in detection of denial of service attacks," 2007.

[42] L. Feinstein, D. Schnackenberg, R. Balupari and D. Kindred, ”Statistical approaches to DDoS attack detection and response,” Proceedings DARPA Information Survivability Conference and Exposition, 2003, pp. 303-314 vol.1.

[43] M. Kim, H. Na, K. Chae, H. Bang, and J. Na: A Combined Data Mining Approach for DDoS Attack Detection, Lecture Notes in Computer Science, Vol. 3090, pp. 943-950, 2004.

[44] D. Gavrilis and E. Dermatas: Real-time detection of distributed denial-of-service attacks using RBF networks and statistical features, Computer Networks, Vol. 48, pp. 235-245, 2005.

[45] Data Science Central, [Online]. Available: http://www.datasciencecentral.com/profiles/blogs/google-open-source-tensorflow [Accessed 7th May, 2017].

[46] MathWorks, Deep Learning, [Online]. Available: https://www.mathworks.com/discovery/deep-learning.html?s_tid=srchtitle [Accessed 11th May, 2017].

[47] Big Data University, [Online]. Available: https://courses.bigdatauniversity.com/dashboard [Accessed 24th May, 2017].

[48] WILDML, [Online]. Available: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ [Accessed 24th May, 2017].

[49] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, A Detailed Analysis of the KDD CUP 99 Data Set, Submitted to Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.

[50] NSL-KDD dataset, [Online]. Available: http://www.unb.ca/cic/research/datasets/nsl.html [Accessed 25th May, 2017].

[51] Jin Kim, Nara Shin, S. Y. Jo and Sang Hyun Kim, ”Method of intrusion detection using deep neural network,” 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, 2017, pp. 313-316.

[52] A. Abdelhadi and G. Pujolle, "A Long Short-Term Memory Recurrent Neural Network Framework for Network Traffic Matrix Prediction," arXiv preprint arXiv:1705.05690, 2017.

[53] NEXUSGUARD, [Online] https://news.nexusguard.com/threat-advisories/q2-2016-ddos- threat-report, [Accessed 21st August, 2017].

[54] Understanding LSTM Networks, [Online]. Available: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ [Accessed 23rd October, 2017].

[55] X. Yuan, C. Li and X. Li, ”DeepDefense: Identifying DDoS Attack via Deep Learning,” 2017 IEEE International Conference on Smart Computing (SMARTCOMP), Hong Kong, 2017, pp. 1-8.