On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses
Total Page:16
File Type:pdf, Size:1020Kb
On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses Amine Boukhtouta A Thesis in The Department of Electrical and Computer Engineering Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at Concordia University Montreal, Quebec, Canada April 2016 c Amine Boukhtouta, 2016 CONCORDIA UNIVERSITY Devision Of Graduate Studies This is to certify that the thesis prepared By: Amine Boukhtouta Entitled: On the Generation of Cyber Threat Intelligence: Malware and Net- work Traffic Analyses and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy complies with the regulations of the University and meets the accepted standards with respect to originality and quality. Signed by the final examining committee: Dr. Fariborz Haghighat Chair Dr. Indrajit Ray External Examiner Dr. Olga Ormandjieva External to Program Dr. Ferhat Khendek Examiner Dr. Amr Youssef Examiner Dr. Mourad Debbabi Thesis Supervisor Approved by Chair of Department or Graduate Program Director Dean of Faculty ABSTRACT On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses Amine Boukhtouta, Ph. D. Concordia University, 2016 In recent years, malware authors drastically changed their course on the subject of threat design and implementation. Malware authors, namely, hackers or cyber-terrorists perpetrate new forms of cyber-crimes involving more innovative hacking techniques. Be- ing motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations’ networks to collect and steal sensitive data as well as blackmail, scam people, or scupper IT infrastructures. Accordingly, IT secu- rity experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes a continuous allure of a fight, where cyber-criminals are obsessed by the idea of outsmarting security defenses. As such, security experts have to elaborate an ef- fective strategy to counter cyber-criminals. The generation of cyber-threat intelligence is of a paramount importance as stated in the following quote: “the field is owned by who owns the intelligence”. In this thesis, we address the problem of generating timely iii and relevant cyber-threat intelligence for the purpose of detection, prevention and miti- gation of cyber-attacks. To do so, we initiate a research effort, which falls into: First, we analyze prominent cyber-crime toolkits to grasp the inner-secrets and workings of advanced threats. We dissect prominent malware like Zeus and Mariposa botnets to un- cover their underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of a cyber-threat intelligence for situational awareness. We adapt a graph-theoretic ap- proach to study infrastructures used by malware to perpetrate malicious activities. We build a scoring mechanism based on a page ranking algorithm to measure the badness of infrastructures’ elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures during a period of one year. Third, we use machine learning techniques to fingerprint ma- licious IP traffic. By fingerprinting, we mean detecting malicious network flows and their attribution to malware families. This research effort relies on a ground truth collected from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and imple- ment a system that generates anomalies from passive DNS traffic. Due to the tremendous nature of DNS data, we build a system on top of a cluster computing framework, namely, Apache Spark [70]. The integrated analytic system has the ability to detect anomalies observed in DNS records, which are potentially generated by widespread cyber-threats. iv ACKNOWLEDGEMENTS To a great degree, all praises to God for giving me the power to accomplish this thesis. I could not finish this work without his blessing and guidance. I would like to express my heartfelt gratitude to my supervisor Prof. Mourad Debbabi, who has highly contributed in the supervision and fulfillment of this thesis. I have to express my appreciation to the committee members, including Dr. Amr Youssef, Dr. Olga Ormandjieva, Dr. Ferhat Khendek, and Dr. Indrajit Ray for evaluating my thesis and providing me with valuable feedback. I thank the National Cyber-Forensics & Training Alliance (NCFTA) Canada for providing facilities to conduct research, this work would not be possible without their active supports. I thank Farsight Security, Inc. and in particular, Dr. Paul Vixie, for the access to valuable data feeds. Special thanks to lab-mates, namely, Hamad Binsalleeh, Nour-Eddine Lakhdari, Serguei Mokhov, Djedjiga Mouheb, Farkhund Iqbal, Son Dinh, Claude Fachkha, Elias Bou-Harb, Prosenjit Sinha, Thomas Ormerod and Victor Belarde, Taher Azab, Saed Alrabaee, Perry Jones, Houssam Borjiba, Mouatez Karbab, Anhar Haneef, Paria Shirani, Lina Nouh, Ray Sujoy, André Soeanu for stimulating discussions and good memories shared in the laboratory. Last but not least, I would like to express my profound gratitude to my beloved wife and daughter for their moral support and patience during my stud- ies, without forgetting my parents, sister, brothers, my uncle and his family, and my best friend H. Oualikene for their encouragement and love. v TABLE OF CONTENTS LIST OF FIGURES . xiii LIST OF TABLES . xvi LIST OF ACRONYMS . xvii 1 Introduction 1 1.1 Motivation and Problem Description . .1 1.2 Objectives . .3 1.3 Methodology . .4 1.3.1 Malware Cyber-Threat Intelligence . .4 1.3.2 Network Traffic Cyber-Threat Intelligence . .5 1.4 Contributions . .5 1.4.1 Analysis of Prominent Threats . .5 1.4.2 Investigation of Cyber-Threat Infrastructures . .6 1.4.3 Fingerprinting Maliciousness in IP Traffic . .7 1.4.4 Near-Real-Time and Scalable Detection of Anomalies in Passive DNS Streams . .8 1.5 Thesis Organization . .9 2 Background and Related Work 11 2.1 Overview . 11 vi 2.2 Cyber-Threat Intelligence . 12 2.2.1 Definition . 12 2.2.2 Cyber-Threat Intelligence Challenges . 12 2.2.3 Cyber-Threat Intelligence Model . 14 Analysis of Cyber-Threats . 15 Specifying Indicators for Cyber-Threats . 15 Managing Cyber-Threat Responses . 15 Sharing Cyber-Threat Information . 16 Cyber-Threat Intelligence Sources . 16 2.3 Prominent Cyber-Threat Analysis . 25 2.4 Network Analysis: Graph Theoretic Approach . 27 2.5 Traffic Fingerprinting and Malware Analysis . 28 2.5.1 Network Traffic Analysis . 29 2.5.2 Malware Traffic Analysis and Classification . 36 2.6 Passive DNS Analysis Systems . 39 2.7 Conclusion . 42 3 Prominent Cyber-Threats 43 3.1 Overview . 43 3.2 Analysis of Mariposa Botnet . 43 3.2.1 Mariposa Botnet Description . 44 3.2.2 Network Analysis . 46 vii 3.2.3 Static Analysis . 49 De-Obfuscation and First Decryption Layer . 50 Anti-Debugging Traps . 51 Second, Third and Fourth Decryption Layers . 52 Code Injection . 55 Injected Thread Activity . 57 3.2.4 Modules . 60 Spreader Module . 60 Uploader and Downloader Modules . 62 Components Diagram . 64 3.3 Analysis of Zeus Botnet Crime-ware Toolkit . 65 3.3.1 Zeus Botnet Description . 65 3.3.2 Network Analysis . 67 3.3.3 Static Analysis . 69 Analysis of the Zeus Builder Program . 71 Zeus Bot Binary Analysis . 72 Packet Decryption . 83 3.4 Conclusion . 85 4 Cyber-Threat Infrastructures 86 4.1 Overview . 86 4.2 Approach . 87 viii 4.2.1 Data Collection . 88 4.2.2 Cyber-Threat Graph Generation . 89 4.2.3 Badness Scoring . 93 4.2.4 Patterns Inference . 96 Decomposition . 98 Vector Generation . 98 Fingerprinting . 99 Graph Similarities Computation . 99 Pattern Time-Based Inference . 100 4.3 Experimental Results . 101 4.3.1 Dataset Description . 101 4.3.2 Descriptive Statistics . 101 Domains & Resolving IPs . 102 Connected IPs . 104 Whois Information . 106 4.3.3 Badness Ranking . 108 4.3.4 Patterns Inference . 112 4.4 Conclusion . 115 5 Malicious Traffic Fingerprinting 117 5.1 Overview . 117 5.2 Traffic Maliciousness Ground Truth . 118 ix 5.3 Packet Headers Flow-Based Fingerprinting . 120 5.3.1 Malicious Traffic Detection . 121 Benign Traffic Datasets . 122 Bidirectional Flow Features Extraction . 122 Traffic Classification . 124 5.3.2 Malicious Traffic Attribution . 126 Malware Family Indexation . 127 Sequencing Flows . 128 Labeling Sequences . 128 Hidden Markov Modeling . 130 Hidden Markov Models Initialization . 131 5.4 Signal and NLP DPI Fingerprinting . 134 5.4.1 Core Principles . 135 5.4.2 Knowledge Base . 136 5.4.3 MARFPCAT’s DPI Methodology . 137 5.4.4 NLP Pipeline . 138 5.4.5 Demand-Driven Distributed Evaluation . 139 5.4.6 Wavelets . 141 5.5 Results . 142 5.5.1 Non-DPI Approach . 142 Classification . 142 x Attribution . 145 Computational Complexity . 150 5.5.2 DPI Approach . ..