Network Based Botnet Detection
Total Page:16
File Type:pdf, Size:1020Kb
2012-ENST EDITE - ED 130 Ph.D. ParisTech DISSERTATION In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy from TELECOM ParisTech Specialization « Internet and System’s Security » presented and defended by Leyla BILGE on the 15th of December 2011 Network Based Botnet Detection Thesis supervisor : Prof. Engin Kirda Jury Mr. Herbert BOS, Prof., Vrije Universiteit, Amsterdam Reviewer Mr. Christopher Kruegel, Prof., University of California, Santa Barbara Reviewer Mr. Marc Dacier, Prof., Symantec Examiner Mr. Refik Molva, Prof., Eurecom, Sophia Antipolis Examiner TELECOM ParisTech école de l’Institut Télécom - membre de ParisTech 2012-ENST EDITE - ED 130 Doctorat ParisTech THÈSE pour obtenir le grade de docteur délivré par TELECOM ParisTech Spécialité « Sécurité d’Internet et des systèmes » présentée et soutenue publiquement par Leyla BILGE le 15 Décembre 2011 La Détection des Botnet par l’Analyse de Réseau Directeur de thèse : Prof. Engin KIRDA Jury Mr. Herbert BOS, Prof., Vrije Universiteit, Amsterdam Rapporteur Mr. Christopher Kruegel, Prof., University of California, Santa Barbara Rapporteur Mr. Marc Dacier, Prof., Symantec Examinateur Mr. Refik Molva, Prof., Eurecom, Sophia Antipolis Examinateur TELECOM ParisTech école de l’Institut Télécom - membre de ParisTech Bubacı˘gıma... Abstract The days when the Internet used to be an academic network with no malicious activity are long gone. Today, there is a high incentive for cyber- criminals to engage in malicious, profit-oriented illegal activity on the Inter- net. A popular tool of choice for digital criminals today are bots. Compared to other types of malware, the distinguishing characteristic of a bot is its ability to establish a command and control (C&C) channel that allows an attacker to remotely control or update a compromised machine. A number of bot-infected machines that are combined under the control of a single, malicious entity are referred to as a botnet. Such botnets are often abused as platforms to launch denial of service to send spam or to host scam pages. In this thesis, we propose three network-based botnet detection tech- niques. Each technique models the detections by analyzing different types of network data : the first detection technique performs packet level inspec- tion. The second one analyzes the DNS traffic to find the domains that are abused for different kinds of malicious purposes including being assigned for the command and control servers. And finally, the last one detects command and control servers by analyzing NetFlow data. We propose a detection approach to identify single, bot-infected ma- chines without any prior knowledge about command and control mecha- nisms or the way in which a bot propagates. Our detection model leverages the characteristic behavior of a bot, which is that it (a) receives commands from the botmaster, and (b) carries out some actions in response to these commands. The basic idea of our system is that we can generate detection models by observing the behavior of bots that are captured in the wild. Based on the observations of commands and responses, we generate detec- tion models that can be deployed to scan network traffic for similar activity, indicating the fact that a machine is infected by a bot. We introduce a passive DNS analysis approach and a detection system, Exposure, to effectively and efficiently detect domain names that are in- volved in malicious activity. We use a set of features that allow us to charac- i ii Abstract terize different properties of DNS names and the ways that they are queried. Our experiments with a large, real-world data set consisting of 100 billion DNS requests, and a real-life deployment for two weeks in an ISP show that our approach is scalable and that we are able to automatically identify un- known malicious domains that are misused in a variety of malicious activity (such as for botnet command and control, spamming, and phishing). Finally, we present Disclosure, a large-scale, wide-area botnet detec- tion system that incorporates a combination of novel techniques to overcome the challenges imposed by the use of NetFlow data. In particular, we iden- tify several groups of features that allow Disclosure to reliably distinguish C&C channels from benign traffic using NetFlow records : (i) flow sizes, (ii) client access patterns, and (iii) temporal behavior. We demonstrate that these features are not only effective in detecting current C&C channels, but that these features are relatively robust against expected countermeasures future botnets might deploy against our system. Furthermore, these features are oblivious to the specific structure of known botnet C&C protocols. R´esum´e La p´eriode ou Internet ´etaitun r´eseauuniversitaire sans activit´esmalveil- lantes est r´evolue depuis longtemps. De nos jours, les cyber-criminels ont beaucoup plus de raisons et de motivations pour conduire des activit´es ill´egales`abut lucratif. Un des outils les plus reconnus pour les criminels num´eriquessont les bots. Un botnet correspond `aun ensemble de plusieurs machines "infect´ees"qui sont sous le contr^oled'une seule entit´emalveillante. Ces botnets sont souvent utilis´eescomme des plateformes pour lancer des attaques de d´enide service, pour envoyer des spams ou pour h´eberger des pages d'arnaques. Cette th`esepropose trois techniques de d´etectionde botnet en se bas- ant sur l'analyse du r´eseau.Chaque technique mod´eliseces d´etectionsen analysant diff´erents types de donn´ees : la premi`eretechnique effectue une ispection des donn´eesau niveau paquet. La deuxi`emetechnique analyse le traffic DNS pour retrouver des domaines qui sont affect´espour des fins malveillantes notamment par des serveurs de commandes et de contr^ole.En- fin, la derni`eretechnique d´etectedes serveurs de commande et de contr^ole en analysant les donn´eesdu flux r´eseau. iii iv R´esum´e Table des mati`eres Abstract . .i R´esum´e ................................. iii Contents . .v List of Figures . ix List of Tables . .x Acronyms . xiii 1 Introduction 1 1.1 Botnet Detection Through Network Level Packet Inspection .3 1.2 Botnet Detection Through Passive DNS Analysis . .5 1.3 Botnet Command and Control Server Detection Through Net- flow Analysis . .7 1.4 Contributions . .9 1.5 Outline . 10 2 The Malware Evolution : Botnets 11 2.1 Botnet Characteristics . 12 2.1.1 Botnet Command and Control Infrastructures . 12 2.1.2 Bot Propagation Mechanisms . 15 2.1.3 Malicious Activites . 16 2.2 Historical Evolution of Botnets . 19 3 State of the Art 23 3.1 General malware detection . 23 3.2 Network intrusion detection . 24 3.3 Signature generation . 24 3.4 Botnet analysis and defense . 25 3.5 Using DNS Analysis Techniques for Detecting Botnets . 26 3.5.1 Identifying Malicious Domains . 26 v vi Table des mati`eres 3.5.2 Identifying Infected Machines by Monitoring Their DNS Activities . 28 3.5.3 Generic Identification of Malicious Domains Using Pas- sive DNS Monitoring . 28 3.6 Anomaly Detection Through NetFlow Analysis . 28 3.7 Botnet Detection with NetFlow Analysis . 30 4 Botnet Detection Through Network Level Packet Inspec- tion 31 4.1 System Overview . 32 4.1.1 Detection Models . 32 4.1.2 Model Generation . 34 4.2 Analyzing Bot Activity . 36 4.2.1 Locating Bot Responses . 36 4.2.2 Extracting Model Generation Data . 39 4.3 Generating Detection Models . 40 4.3.1 Command Model Generation . 41 4.3.2 Response Model Generation . 43 4.3.3 Mapping Models into Bro Signatures . 44 4.4 Evaluation . 44 4.4.1 Capturing Bot Traffic . 45 4.4.2 Generating Signatures . 46 4.4.3 Detection Capability . 48 4.4.4 Real-World Deployment . 49 4.4.5 Examples And Comparison To Hand-tuned Signatures 51 4.5 Limitations . 54 5 Botnet Detection Through Passive DNS Analysis 55 5.1 Overview . 56 5.1.1 Extracting DNS Features for Detection . 56 5.1.2 Architecture of EXPOSURE . 57 5.1.3 Real-Time Deployment . 58 5.2 Feature Selection . 58 5.2.1 Time-Based Features . 58 5.2.2 DNS Answer-Based Features . 63 5.2.3 TTL Value-Based Features . 64 5.2.4 Domain Name-Based Features . 65 5.3 Building Detection Models . 66 5.3.1 Constructing the Training Set . 66 5.3.2 The Initial Period of Training . 67 Table des mati`eres vii 5.3.3 The Classifier . 67 5.4 Evaluation . 69 5.4.1 DNS Data Collection for Offline Experiments . 69 5.4.2 Evaluation of the Classifier . 70 5.4.3 Experiments with the Recorded Data Set . 71 5.4.4 Real-World, Real-Time Detection with EXPOSURE . 73 5.4.5 Comparison with Previous Work . 75 5.5 Real-World Deployment of EXPOSURE . 78 5.6 Limitations . 81 6 Botnet Command and Control Server Detection Through Netflow Analysis 83 6.1 System Overview . 84 6.2 Feature Selection and Classification . 85 6.2.1 NetFlow Attributes . 85 6.2.2 Disclosure Feature Extraction . 87 6.2.3 Building the Detection Models . 91 6.3 False Positive Reduction . 92 6.4 Evaluation . 94 6.4.1 The NetFlow Data Sets . 95 6.4.2 The Ground-Truth Data Sets . 95 6.4.3 Labeled Data Set Detection and False Positive Rates . 96 6.4.4 Real-Time Detection . 99 6.4.5 Performance Evaluation . 102 6.4.6 Deployment Considerations . 102 6.4.7 Resilience to Evasion . 103 7 Concluding Remarks 105 Appendices 107 A R´esum´e´etendu 109 A.1 D´etectionBotnet Par Packet Inspection R´eseau. 111 A.2 D´etectionBotnet par DNS Analyse Passive . 113 A.3 D´etectiondes Serveurs de Commande et de Contr^oleen Analysant les Donn´eesdu Flux R´eseau. 116 A.4 Contributions .