Long Short-Term Memory Recurrent Neural Network for Detecting DDoS Flooding Attacks Within TensorFlow Implementation Framework
Master Thesis Project
Author: Peter Ken Bediako
E-mail: [email protected]
Supervisor: Dr. Ali Ismail Awad
E-mail: [email protected]
Information Security, master's level (120 credits)
Master of Science in Information Security
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
November 2017

Contents

1. Introduction ..... 1
1.1 Problem Statement ..... 2
1.2 Research Questions ..... 3
1.3 Research Goals ..... 3
1.4 Delimitation ..... 4
1.5 Research Contribution ..... 4
1.6 Research Methodology ..... 4
1.7 Thesis Outline ..... 4
2. Background Information ..... 5
2.1 Overview of DDoS attack ..... 5
2.2 How DDoS operate ..... 6
2.3 How DDoS attack happens ..... 7
2.4 Types of DDoS attack ..... 7
2.5 DDoS flooding attack types ..... 9
2.5.1 UDP Flood ..... 9
2.5.2 ICMP (Ping) Flood ..... 9
2.5.3 TCP SYN Flood ..... 10
2.6 Machine learning algorithms in detecting DDoS attacks ..... 10
2.7 Deep learning model ..... 11
2.8 Recurrent Neural Networks (RNNs) ..... 12
2.9 Reasons for choosing LSTM RNN over other techniques ..... 14
2.10 The Datasets formatting for deep learning ..... 14
2.11 TensorFlow ..... 15
2.11.1 TensorFlow data flow graph ..... 15
2.11.2 Tensors ..... 15
2.11.3 Benefits of using TensorFlow in this thesis work ..... 16
2.11.4 Benefits for using TensorBoard in this thesis work ..... 16
3. Literature Review ..... 17
3.1 Defense mechanisms and techniques to detect DDoS attacks ..... 17
3.2 Research Gap analysis ..... 19
3.3 Improvement to existing gaps identify in the existing research works ..... 23
4. Research Methodology ..... 24
4.1 Design Science Research (DSR) Methodology ..... 24
4.2 How DSR Methodology is used to address RQ1 ..... 26
4.3 How DSR Methodology is used to address RQ2 ..... 27
5. Design and Development ..... 28
5.1 Design, Develop and Implement LSTM RNN Algorithm ..... 28
5.2 Designing the algorithm based on the four layers of LSTM RNN ..... 29
5.3 Environment setup for Algorithm Development ..... 30
5.4 Design Structure for LSTM RNN technique using Tensorflow API ..... 31
5.5 How to access Tensorboard for Training Results ..... 32
6. Results ..... 35
6.1 Data Collection Results ..... 35
6.1.1 Data Collection Phase ..... 35
6.1.2 Data Cleaning and Segmenting Phase ..... 37
6.1.3 Data Pre-Processing Phase ..... 37
6.1.4 LSTM RNN Training and Testing Phase ..... 38
6.1.5 Classification of attacks ..... 38
6.2 Results from CPU Based environment ..... 41
6.2.1 Training the Model ..... 41
6.2.2 CPU base system Results -RQ1 ..... 41
6.2.3 CPU Iteration 1 ..... 41
6.2.4 CPU Iteration 2 ..... 44
6.2.5 CPU Iteration 3 ..... 46
6.2.6 CPU Iteration 4 ..... 49
6.2.7 CPU Iteration 5 ..... 51
6.3 Analysis of CPU Based Environment Results -RQ1 ..... 54
6.3.1 Accuracy and Dataset size Analysis ..... 55
6.3.2 Accuracy and Epochs Analysis ..... 56
6.3.3 Final and Average Accuracy Analysis ..... 57
6.4 Results from GPU Based environment ..... 57
6.4.1 GPU Iteration 1 ..... 58
6.4.2 GPU Iteration 2 ..... 60
6.4.3 GPU Iteration 3 ..... 62
6.4.4 GPU Iteration 4 ..... 64
6.4.5 GPU Iteration 5 ..... 66
6.5 Analysis of CPU and GPU Based System Results RQ2 ..... 69
6.5.1 CPU- GPU Accuracy Analysis ..... 69
6.5.2 CPU and GPU Time analysis ..... 70
6.5.3 CPU and GPU Epoch analysis ..... 70
6.5.4 Epochs Analysis on GPU Systems Training Time ..... 71
7. Discussion ..... 73
8. Conclusion and Future Works ..... 75
8.1 Conclusion ..... 75
8.2 Future Works ..... 75

List of Figures

1.1 AI model training and Testing Process overview ..... 3
2.1 DDoS common network and multi-vector attacks surface [3] ..... 6
2.2 Average peak bandwidth for DDoS attacks [3] ..... 6
2.3 DDoS attack Network Infrastructure Illustration ..... 7
2.4 Classification of DDoS attack types ..... 8
2.5 The most common DDoS attack types in Q2 2016 [3] ..... 8
2.6 Illustration of UDP attack process ..... 9
2.7 TCP SYN Flood Process ..... 10
2.8 Machine learning techniques [46] ..... 11
2.9 Deep Neural Architecture ..... 12
2.10 Recurrent Neural Network Model ..... 13
2.11 RNNs folded and unfolded state [48] ..... 13
2.12 TensorFlow data flow graph [48] ..... 15
2.13 The vivisection of Tensor [48] ..... 16
4.1 DSR Methodology Model [20] ..... 25
5.1 The four interacting repeating module of LSTM RNN [54] ..... 28
5.2 LSTM RNN Algorithm design architecture ..... 29
5.3 Sample TensorFlow code for layer 1 of the LSTM RNN algorithm ..... 30
5.4 CPU and GPU base system environment setup process ..... 31
5.5 TensorFlow Algorithm Architecture ..... 31
5.6 How to access TensorBoard ..... 33
5.7 TensorBoard Graphs ..... 34
6.1 ISCX link to download dataset ..... 35
6.2 3D embedding visualizer of 2000 sample data size with 38 features from 23 different attack types represented by the coded serial numbers of the metadata created for the sample size ..... 39
6.3 3D embedding visualizer of 2000 sample data size with 38 features from 23 different attack types represented by the coded labels of the metadata created for the sample size ..... 40
6.4 Iteration process adapted to increase the efficiency of LSTM RNN Model ..... 41
6.5 Variable Explorer values for iteration 1 ..... 42
6.6 CPU Results for iteration 1 base on 2000 dataset size and 100 epochs ..... 42
6.7 Graph results of iteration 1 based on 100 epoch and 2000 dataset size ..... 43
6.8 CPU Results for iteration 2 base on 5000 dataset size and 200 epochs ..... 44
6.9 Graph results of iteration 2 based on 200 epoch and 5000 dataset size ..... 45
6.10 Variable Explorer values for iteration 3 ..... 46
6.11 CPU Results for iteration 3 base on 10000 dataset size and 300 epochs ..... 47
6.12 Graph results of iteration 3 based on 300 epoch and 10000 dataset size ..... 48
6.13 CPU Results for iteration 4 base on 15000 dataset size and 400 epochs ..... 49
6.14 Graph results of iteration 4 based on 400 epoch and 15000 dataset size ..... 50
6.15 CPU Results for iteration 5 base on 20000 dataset size and 500 epochs .....