Low-Rate Attack Detection with Intelligent Fine-Grained Network Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Low-Rate Attack Detection with Intelligent Fine-Grained Network Analysis A thesis submitted in partial fulfilment of the requirement for the degree of Doctor of Philosophy Baskoro Adi Pratomo October 2020 Cardiff University School of Computer Science & Informatics i Declaration This work has not previously been accepted in substance for any degree and is not concurrently submitted in candidature for any degree. Signed . (candidate) Date . Statement 1 This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD. Signed . (candidate) Date . Statement 2 This thesis is the result of my own independent work/investigation, except where oth- erwise stated. Other sources are acknowledged by explicit references. Signed . (candidate) Date . Statement 3 I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations. Signed . (candidate) Date . ii Copyright c 2020 Pratomo, Baskoro. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”. A copy of this document in various transparent and opaque machine-readable formats and related software is available at https://baskoroadi.web.id. Dedication iii To my family for their patience and support. iv Abstract Low-rate attacks are a type of attacks that silently infiltrate the victim network, control computers, and steal sensitive data. As the effect of this attack type is devastating, it is essential to be able to detect such attacks. A detection system allows system adminis- trators to react accordingly. More importantly, when the detection system is to analyse the network traffic, it may identify the malicious activity before the attack reaches the system. And by incorporating machine learning into the detection approach, the Network-based Intrusion Detection System (NIDS) will be able to adapt to evolving attacks and minimise human intervention, unlike signature-based NIDS. Several works have tried to address the problem of low-rate attack detection. However, there are several issues with these previous works. Some of them are dated; therefore their performance drops on contemporary low-rate attacks. Some of them only focus on detecting attacks in one protocol, while low-rate attacks exist on various protocols. To tackle this problem, we proposed two Deep Learning (DL) models which analyse network payload and were trained with the unsupervised approach. Our best perform- ing model surpasses the state-of-the-arts and provides an improvement in detection rate of at least 12.04%. The experiments also show that payload-based NIDSs are superior to header-based ones for identifying low-rate attacks. A common approach in payload-based NIDSs is to read the full-length application layer messages, while in some protocols such as HTTP or SMTP, it is usual to have lengthy messages. Processing the full-length of such messages would be time-consuming. Abstract v The damage from the attack may have been done by the time the decision for the par- ticular message comes out. Therefore, we proposed an approach that can early predict the occurrence of low-rate attacks from as little information as possible. Based on our experiments, the proposed method can detect 97.57% of attacks by merely reading, on average, 35.21% of the application layer messages. It improves the detection speed by three-fold. vi Acknowledgements This thesis has been a massive challenge for me. I would not have finished it without helps and supports from many people I met during my study. Firstly, I would like to thank my first supervisor, Pete Burnap. Without his guidance, I would have gotten distracted and lost. He did not give up on me when I almost had given up on myself. He also showed huge patience when dealing with my writing, which helped me to get back on my track. I would like to thank my second supervisor, George Theodorakopoulos, for the dis- cussions we had during my study. His alternate point of view on various matters had helped to figure out solutions when I had been stuck with some issues. I would like to thank Lembaga Pengelola Dana Pendidikan (LPDP), Indonesia, for providing funding for this research. I would like to thank my family for the patience and support. In particular, my wife, Sari, for the constant reminder that brain needs glucose to function, especially during the lockdown. My mum and sister who always pay attention to my health and my well- being, and sometimes, my hair. I dedicate this thesis for my uncle, who passed away during the writing of this thesis. His travels around the world had inspired my younger self to study abroad. I would like to thank my friends and colleagues. I’d like to thank Hudan, my partner in crime, who was never tired having - sometimes - inconsequential debates despite the Acknowledgements vii eight hours time difference and who always reminds me that the best thesis is the one that is finished. I’d like to thank Beryl for organising social events and her suggestions on our shared interests which have kept me clear-headed. Stefano, for the borderline banter and life lessons. I will never forget the question you asked in our first meeting. I would like to thank Roberto for sharing his knowledge on language and cultural references. Lauren, for the language and mathematical advice, and helping me to adapt to the local weather. Covid-19 has stopped us from debating whether the window should be open. Oscar, for spending time to read the thesis and the philosophical discussions. Nyala, for statistical knowledge. Matthew, my brother, for inspirational quotes. Joana and Mafalda, for letting me use some space in their house. Lucie, for her knowledge of places. Pete Sueref, for the constant, literally, pushes. Shakil and Alex, for the regular coffee meeting. I’d like to thank the Cyber Visualisation Demo team, Amir, Irene, Matilda, for the cooperation during this time. I would like to thank the members of the Indonesian Student Association in Wales. It was a great experience to spend time and to have great discussions with them. I would like to show appreciation to bands in which their music has helped me to get through this difficult time: Avenged Sevenfold, DragonForce, Helloween, My Chemical Romance, Nightwish, Powerwolf, Sabaton (for the historical knowledge), and SpyAir. Their songs have helped me to stay focused, reduce noise, and keep me awake. viii Contents Abstract iv Acknowledgements vi Contents viii List of Publications xii List of Figures xiii List of Tables xv List of Acronyms xviii 1 Introduction 1 1.1 Cyber security: threats and detection methods . .1 1.2 Contributions . .8 1.3 Limitations . 10 1.4 Thesis Structure . 11 Contents ix 2 Background 13 2.1 Why study low-Rate Attacks? . 13 2.2 Network-based Intrusion Detection Systems . 18 2.2.1 Header-based NIDS . 21 2.2.2 Payload-based NIDS . 30 2.3 Conclusions . 40 3 How well do existing validation datasets capture representative examples of contemporary low rate attacks? 42 3.1 Existing Network Traffic Datasets with Low-Rate Attacks . 43 3.2 BlattaSploit Dataset . 53 3.3 Conclusion . 57 4 An unsupervised approach for detecting low-rate attacks in network traffic 59 4.1 Introduction . 59 4.2 The Basics of Deep Learning for Outlier Detection . 64 4.2.1 Recurrent Neural Networks . 65 4.2.2 Autoencoders . 67 4.3 Low-rate Attack Detection Methodology . 69 4.3.1 Network Traffic Preprocessing . 69 4.3.2 Outlier Detection Models . 71 4.4 Experiments and Results . 78 4.4.1 Experimental Setup . 78 Contents x 4.4.2 Datasets . 82 4.4.3 Defining Threshold . 83 4.4.4 Results Discussion . 87 4.5 Conclusions . 92 5 Early prediction of low-rate attacks on network traffic with Recurrent Neural Networks 94 5.1 Introduction . 94 5.2 Threat Model . 100 5.3 Methodology . 101 5.3.1 Data Preprocessing . 103 5.3.2 Training an RNN-based classifier . 105 5.3.3 Detecting Attacks . 106 5.4 Experiments and Results . 108 5.4.1 Data Analysis . 110 5.4.2 Comparison With Previous Works . 111 5.4.3 Early Prediction . 113 5.4.4 Detection Speed . 117 5.4.5 Visualisation . 124 5.5 Conclusion and Future Work . 125 Contents xi 6 Conclusions 128 6.1 Thesis Summary . 128 6.2 Discussion on Evasion Techniques, Adversarial Attacks, and Future Works . 136 6.3 Conclusions . 138 Appendices 142 A List of low-rate attacks in BlattaSploit dataset 142 B Experiment results of AE-OD and RNN-OD 167 Bibliography 172 xii List of Publications The work introduced in this thesis is based on the following publications. • B. A. Pratomo, P. Burnap, and G. Theodorakopoulos. Unsupervised approach for detecting low rate attacks on network traffic with autoencoder. In 2018 Interna- tional Conference on Cyber Security and Protection of Digital Services (Cyber Security), pages 1–8. IEEE, 2018 • B. A. Pratomo, P. Burnap, and G. Theodorakopoulos. Blatta: early exploit detec- tion on network traffic with recurrent neural networks. Security and Communic- ation Networks, 2020 xiii List of Figures 1.1 Top 10 attack vectors in 2018-2019. Compiled by McAfee Labs. [62]2 1.2 An illustration of what is being transmitted between two parties com- municating via the Internet. A flow transmits multiple packets from one host to the others.