Botminer: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection

Wright State University CORE Scholar Computer Science and Engineering Faculty Publications Computer Science & Engineering 2008 BotMiner: Clustering Analysis of Network Traffic for otPr ocol- and Structure-Independent Botnet Detection Guofei Gu Roberto Perdisci Junjie Zhang Wright State University - Main Campus, [email protected] Wenke Lee Follow this and additional works at: https://corescholar.libraries.wright.edu/cse Part of the Computer Sciences Commons, and the Engineering Commons Repository Citation Gu, G., Perdisci, R., Zhang, J., & Lee, W. (2008). BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Proceedings of the 17th USENIX Security Symposium, 139-154. https://corescholar.libraries.wright.edu/cse/4 This Conference Proceeding is brought to you for free and open access by Wright State University’s CORE Scholar. It has been accepted for inclusion in Computer Science and Engineering Faculty Publications by an authorized administrator of CORE Scholar. For more information, please contact [email protected]. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu†, Roberto Perdisci‡, Junjie Zhang†, and Wenke Lee† †College of Computing, Georgia Institute of Technology ‡Damballa, Inc. Atlanta, GA 30308, USA {guofei,jjzhang,wenke}@cc.gatech.edu, [email protected] Abstract tion exfiltration. Botnets are now the key platform for many Internet In order for a botmaster to command a botnet, there attacks, such as spam, distributed denial-of-service needs to be a command and control (C&C) channel (DDoS), identity theft, and phishing. Most of the current through which bots receive commands and coordinate botnet detection approaches work only on specific attacks and fraudulent activities. The C&C channel is botnet command and control (C&C) protocols (e.g., the means by which individual bots form a botnet. Cen- IRC) and structures (e.g., centralized), and can become tralized C&C structures using the Internet Relay Chat ineffective as botnets change their C&C techniques. In (IRC) protocol have been utilized by botmasters for a this paper, we present a general detection framework that long time. In this architecture, each bot logs into an IRC is independent of botnet C&C protocol and structure, channel, and seeks commands from the botmaster. Even and requires no a priori knowledge of botnets (such as today, many botnets are still designed this way. Quite a captured bot binaries and hence the botnet signatures, few botnets, though, have begun to use other protocols and C&C server names/addresses). We start from the such as HTTP [8, 14, 24, 39], probably because HTTP- definition and essential properties of botnets. We define based C&C communications are more stealthy given a botnet as a coordinated group of malware instances that Web traffic is generally allowed in most networks. that are controlled via C&C communication channels. Although centralized C&C structures are effective, they The essential properties of a botnet are that the bots suffer from the single-point-of-failure problem. For ex- communicate with some C&C servers/peers, perform ample, if the IRC channel (or the Web server) is taken malicious activities, and do so in a similar or correlated down due to detection and response efforts, the botnet way. Accordingly, our detection framework clusters loses its C&C structure and becomes a collection of similar communication traffic and similar malicious isolated compromised machines. Recently, botmasters traffic, and performs cross cluster correlation to identify began using peer-to-peer (P2P) communication to avoid the hosts that share both similar communication patterns this weakness. For example, Nugache [28] and Storm and similar malicious activity patterns. These hosts worm [18,23] (a.k.a. Peacomm) are two representative are thus bots in the monitored network. We have P2P botnets. Storm, in particular, distinguishes itself implemented our BotMiner prototype system and as having infected a large number of computers on the evaluated it using many real network traces. The results Internet and effectively becoming one of the “world’s top show that it can detect real-world botnets (IRC-based, super-computers” [27] for the botmasters. HTTP-based, and P2P botnets including Nugache and Researchers have proposed a few approaches [7, 17, Storm worm), and has a very low false positive rate. 19,20,26,29,35,40] to detect the existence of botnets in monitored networks. Almost all of these approaches 1 Introduction are designed for detecting botnets that use IRC or HTTP Botnets are becoming one of the most serious threats to based C&C [7,17,26,29,40]. For example, Rishi [17] Internet security. A botnet is a network of compromised is designed to detect IRC botnets using known IRC bot machines under the influence of malware (bot) code. The nickname patterns as signatures. In [26, 40], network botnet is commandeered by a “botmaster” and utilized as flows are clustered and classified according to IRC-like “resource” or “platform” for attacks such as distributed traffic patterns. Another more recent system, BotSniffer, denial-of-service (DDoS) attacks, and fraudulent activi- [20] is designed mainly for detecting C&C activities ties such as spam, phishing, identity theft, and informa- with centralized servers (with protocols such as IRC USENIX Association 17th USENIX Security Symposium 139 and HTTP1). One exception is perhaps BotHunter [19], ties. If the botmaster commands each bot individually which is capable of detecting bots regardless of the C&C with a different command/channel, the bots are nothing structure and network protocol as long as the bot be- but some isolated/unrelated infections. That is, they do havior follows a pre-defined infection life cycle dialog not function as a botnet according to our definition and model. are out of the scope of this work3. However, botnets are evolving and can be quite flexi- We propose a general detection framework that is ble. We have witnessed that the protocols used for C&C based on these essential properties of botnets. This evolved from IRC to others (e.g., HTTP [8,14,24,39]), framework monitors both who is talking to whom that and the structure moved from centralized to distributed may suggest C&C communication activities and who is (e.g., using P2P [18,28]). Furthermore, a botnet dur- doing what that may suggest malicious activities, and ing its lifetime can also change its C&C server address finds a coordinated group pattern in both kinds of activi- frequently, e.g., using fast-flux service networks [22]. ties. More specifically, our detection framework clusters Thus, the aforementioned detection approaches designed similar communication activities in the C-plane (C&C for IRC or HTTP based botnets may become ineffective communication traffic), clusters similar malicious activ- against the recent/new botnets. Even BotHunter may fail ities in the A-plane (activity traffic), and performs cross as soon as botnets change their infection model(s). cluster correlation to identify the hosts that share both Therefore, we need to developa next generationbotnet similar communication patterns and similar malicious detection system, which should be independent of the activity patterns. These hosts, according to the botnet C&C protocol, structure, and infection model of botnets, definition and properties discussed above, are bots in the and be resilient to the change of C&C server addresses. monitored network. In addition, it should require no a priori knowledge of This paper makes the following main contributions. specific botnets (such as captured bot binaries and hence • the botnet signatures, and C&C server names/addresses). We develop a novel general botnet detection frame- In order to design such a general detection system that work that is grounded on the definition and essential can resist evolution and changes in botnet C&C tech- properties of botnets. Our detection framework niques, we need to study the intrinsic botnet communi- is thus independent of botnet C&C protocol and cation and activity characteristics that remain detectable structure, and requires no a priori knowledge (e.g., with the proper detection features and algorithms. We C&C addresses/signatures) of specific botnets. It thus start with the definition and essential properties of a can detect both centralized (e.g., IRC,HTTP) and botnet. We define a botnet as: current (and possibly future) P2P based botnets. “A coordinated group of malware instances that are • We define a new “aggregated communication flow” controlled via C&C channels”. (C-flow) record data structure to store aggregated The term “malware” means these bots are used to traffic statistics, and design a new layered clustering perform malicious activities. According to [44], about scheme with a set of traffic features measured on 53% of botnet activity commands observed in thousands the C-flow records. Our clustering scheme can of real-world IRC-based botnets are related to scan (for accurately and efficiently group similar C&C traffic the purpose of spreading or DDoS2), and about 14.4% patterns. are related to binary downloading (for the purpose of malware updating). In addition, most of HTTP-based • We build a BotMiner prototype system based on our and P2P-based botnets are used to send spam [18,39]. general detection framework, and evaluate it with The term “controlled” means these bots have to con- multiple real-world network traces including nor- tact their C&C servers to obtain commands to carry out mal traffic and several real-world botnet traces that activities, e.g., to scan. In other words, there should contain IRC, HTTP and P2P-based botnet traffic be communication between bots and C&C servers/peers (including Nugache and Storm). The results show (which can be centralized or distributed). Finally, the that BotMiner has a high detection rate and a low term “coordinated group” means that multiple (at least false positive rate.

Load more