Detection of Botnets Using Combined Host- and Network-Level Information

Yuanyuan Zeng, Xin Hu, Kang G. Shin
University of Michigan, Ann Arbor, MI 48109-2121, USA
{gracez, huxin, kgshin}@eecs.umich.edu

1. INTRODUCTION

Botnets have now become one of the most serious security threats to Internet services and applications; they can mount Distributed-Denial-of-Service (DDoS) attacks, spamming, phishing, identity theft, and other cyber crimes.

To control a botnet, a botmaster needs to use a C&C channel to issue commands and coordinate bots' actions. Although IRC- and HTTP-based C&C have been adopted by many past and current botnets, both of them are vulnerable to a central point of failure. That is, once the central IRC or HTTP server is identified and removed, the entire botnet will be disabled.

To counter this weakness, attackers have recently shifted toward a new generation of botnets utilizing decentralized C&C protocols such as P2P. This C&C infrastructure makes detection and mitigation much harder. A well-known example is the Storm worm (a.k.a. Nuwar, W32.Peacomm, and Zhelatin) [1], which spreads via email spam and is known to be the first malware to seed a botnet in a hybrid P2P fashion. Storm uses peers as HTTP proxies to relay C&C traffic and hides the botmasters well behind the P2P network. A recent spambot, Waledac, which appeared at the end of 2008, also spreads via spam emails and forms its botnet using a C&C structure similar to the Storm botnet. Some researchers pointed out that Waledac is the new and improved version of the Storm botnet [2].

To date, most botnet-detection approaches operate at the network level; a majority of them target traditional IRC- or HTTP-based botnets [3, 4, 5, 6, 7, 8] by looking for traffic signatures or flow patterns. We are aware of only one approach [9] designed for protocol- and structure-independent botnet detection; it depends on network traffic analysis and is unlikely to have a complete view of botnets' behavior. We thus need finer-grained host-by-host behavior inspection to complement the network analysis. On the other hand, since bots behave maliciously system-wide, general host-based detection can be useful. One such approach is to match malware signatures, but it is effective only in detecting known bots. To deal with unknown bot infiltration, in-host behavior analysis [10, 11, 12, 13, 14] is needed. However, since in-host mechanisms are vulnerable to host-resident malware, host-based approaches alone can hardly provide reliable detection results, and thus we need external, hard-to-compromise (i.e., network-level) information for detection of bots' malicious behavior.

Considering the required coordination within each botnet at the network level and the malicious behavior each bot exhibits at the host level, we propose a C&C protocol-independent detection framework that incorporates information collected at both the host and the network levels. The two sources of information complement each other in making detection decisions. Our framework first identifies suspicious hosts by discovering similar behaviors among different hosts using network-flow analysis, and then validates whether the identified suspects are malicious by scrutinizing their in-host behavior. Since bots within the same botnet are likely to receive the same input from the botmaster and take similar actions, whereas benign hosts rarely demonstrate such correlated behavior, our framework looks for flows with similar patterns and labels them as triggering flows. It then associates all subsequent flows with each triggering flow on a host-by-host basis, checking the similarity among those associated groups. If multiple hosts behave similarly in the trigger-action patterns, they are grouped into the same suspicious cluster, as they are likely to belong to the same botnet. Whenever a group of hosts is identified as suspicious by the network analysis, the host-behavior analysis results, based on a history of monitored host behaviors, are reported. A correlation algorithm finally assigns a detection score to each host under inspection by considering both network and host behaviors.

Our contributions are three-fold. First, to the best of our knowledge, this is the first framework that combines both network- and host-level information to detect botnets. The benefit is that it completes a detection picture by considering not only the coordination behavior intrinsic to each botnet but also each bot's in-host behavior. For example, it can detect botnets that appear stealthy in network activities with the assistance of host-level information. Moreover, we extract features from NetFlow data to analyze the similarity or dissimilarity of network behavior without inspecting each packet's payload, thus preserving privacy. Second, our detection relies on the invariant properties of botnets' network and host behaviors, which are independent of the underlying C&C protocol. It can detect traditional IRC- and HTTP-based botnets as well as recent hybrid P2P botnets. Third, our approach is evaluated by using several days of real-world NetFlow data from a core router of a major campus network containing benign and botnet traces, as well as multiple benign and botnet data sets collected from virtual machines. Our preliminary results show that the proposed framework can detect different types of botnets with low false-alarm rates.
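The trigger-action grouping outlined in the introduction can be illustrated with a minimal Python sketch. It is not the framework's actual algorithm: the `Flow` record, the rule that a destination contacted by several hosts counts as a triggering flow, the 60-second window, the use of destination ports as the action profile, and the Jaccard threshold are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str      # monitored host that initiated the flow
    dst: str      # remote endpoint
    dport: int    # destination port
    start: float  # start time in seconds


def find_triggers(flows, min_hosts=2):
    """Assumed rule: a (dst, dport) pair contacted by at least
    min_hosts distinct hosts is a candidate triggering flow."""
    hosts_by_key = defaultdict(set)
    for f in flows:
        hosts_by_key[(f.dst, f.dport)].add(f.src)
    return {k for k, hosts in hosts_by_key.items() if len(hosts) >= min_hosts}


def action_profiles(flows, triggers, window=60.0):
    """Per host, collect destination ports of non-trigger flows started
    within `window` seconds after the host's first triggering flow."""
    by_host = defaultdict(list)
    for f in sorted(flows, key=lambda f: f.start):
        by_host[f.src].append(f)
    profiles = {}
    for host, host_flows in by_host.items():
        t0 = next((f.start for f in host_flows
                   if (f.dst, f.dport) in triggers), None)
        if t0 is None:
            continue  # host never produced a triggering flow
        profiles[host] = frozenset(
            f.dport for f in host_flows
            if t0 < f.start <= t0 + window and (f.dst, f.dport) not in triggers
        )
    return profiles


def jaccard(a, b):
    """Set similarity; two empty profiles are treated as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_clusters(profiles, threshold=0.5):
    """Greedy grouping: a host joins the first cluster whose first member
    has a similar action profile; singleton clusters are dropped."""
    clusters = []
    for host in sorted(profiles):
        for cluster in clusters:
            if jaccard(profiles[host], profiles[cluster[0]]) >= threshold:
                cluster.append(host)
                break
        else:
            clusters.append([host])
    return [c for c in clusters if len(c) > 1]


# Hypothetical flows: h1 and h2 contact the same server, then both send mail.
flows = [
    Flow("h1", "c2", 6667, 0.0), Flow("h1", "m1", 25, 5.0),
    Flow("h1", "m2", 25, 8.0), Flow("h2", "c2", 6667, 1.0),
    Flow("h2", "m3", 25, 7.0), Flow("h3", "web", 80, 2.0),
    Flow("h3", "x", 443, 3.0),
]
triggers = find_triggers(flows)
clusters = suspicious_clusters(action_profiles(flows, triggers))
print(clusters)  # [['h1', 'h2']]
```

In this toy input, h3 never produces a triggering flow and is therefore never considered, which mirrors the observation that benign hosts rarely show correlated trigger-action behavior.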
Figure 1: System architecture (components shown: host analyzer with in-host monitor and suspicion-level generator; network analyzer with flow analyzer and clustering; campus router; NetFlow data; clustering results; suspicion-level and network features; correlation engine; detection report)

2. SYSTEM ARCHITECTURE

Figure 1 shows the architecture of our system, which primarily consists of three components: host analyzer, network analyzer, and correlation engine.

As almost all current botnets target Windows machines, our host analyzer is designed and implemented for Windows platforms. The host analyzer is deployed at each host and contains two modules: in-host monitor and suspicion-level generator. The former monitors run-time system-wide behavior taking place in the Registry, file system, and network stack on a host. The latter generates a suspicion-level by applying a machine-learning algorithm based on the behavior reported at each time window and computes the overall suspicion-level using a moving average algorithm. The host analyzer sends the average suspicion-level along with a few network feature statistics to the correlation engine, if required. The network analyzer also contains two modules: flow analyzer and clustering. The flow analyzer takes the flow data from a router as input and searches for trigger-action botnet-like flow patterns among different hosts. It then extracts a set of features that can best represent those associated flows and transforms them into feature vectors. Those vectors are then fed

connected via a virtual network to monitor and collect their activities at the Registry, file system, and network stack. Table 1 shows the details of these botnet traces, each containing 4 bot instances. IRC-rbot and IRC-spybot's traces were generated by running their modified source code in the virtual network. We obtained the binaries of HTTP-based BobaxA and BobaxB, and hybrid-P2P-based Storm and Waledac from public web sites. The IRC- and HTTP-based botnets' network-level traces were captured within a controlled environment and transformed from packet data to flow data in our experiment. Since Storm and Waledac botnets were still active in the wild when we collected data, to make it more realistic, we carefully configured the firewall settings and connected virtual machines to the outside network so that the bots actually joined the real Storm and Waledac botnets and the campus router captured all of the bots' traffic. We also collected 5-day NetFlow data from a core router in our campus network, which covered the flows generated by Storm and Waledac instances and all other hosts in the network. The 5-day data consists of three sets: (1) 2-day data that contains 48-hour Storm traces; (2) 1-day data including Waledac; and (3) other 2-day data.
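The conversion of packet traces to flow data mentioned above can be sketched as follows. The text does not say which tool or parameters were used, so this is only an illustration under stated assumptions: packets are given as pre-parsed tuples sorted by timestamp, a flow is keyed by the usual unidirectional 5-tuple, and the 30-second idle timeout is a hypothetical value.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    key: tuple    # (src_ip, dst_ip, src_port, dst_port, proto)
    first: float  # timestamp of the first packet
    last: float   # timestamp of the most recent packet
    packets: int  # number of packets in the flow
    nbytes: int   # total bytes in the flow


def packets_to_flows(packets, idle_timeout=30.0):
    """Aggregate time-sorted packets into unidirectional flow records.

    Each packet is (ts, src_ip, dst_ip, src_port, dst_port, proto, length).
    A flow is closed once its 5-tuple stays idle longer than idle_timeout
    (an assumed value, not one taken from the paper).
    """
    active = {}
    finished = []
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)
        rec = active.get(key)
        if rec is not None and ts - rec.last > idle_timeout:
            finished.append(rec)  # idle too long: close the old flow
            rec = None
        if rec is None:
            rec = FlowRecord(key, ts, ts, 0, 0)
            active[key] = rec
        rec.last = ts
        rec.packets += 1
        rec.nbytes += length
    finished.extend(active.values())  # flush flows still open at the end
    return sorted(finished, key=lambda r: r.first)


# Hypothetical packets using documentation-range addresses.
packets = [
    (0.0, "192.0.2.5", "198.51.100.1", 1025, 6667, "tcp", 60),
    (0.5, "192.0.2.5", "198.51.100.1", 1025, 6667, "tcp", 120),
    (1.0, "192.0.2.5", "203.0.113.9", 1026, 25, "tcp", 200),
    (50.0, "192.0.2.5", "198.51.100.1", 1025, 6667, "tcp", 40),
]
flow_records = packets_to_flows(packets)
for rec in flow_records:
    print(rec.key, rec.first, rec.last, rec.packets, rec.nbytes)
```

Because the last packet arrives 49.5 seconds after the previous packet with the same 5-tuple, it opens a new flow record, so the four packets yield three flows. Only header-derived fields are kept, which is consistent with analyzing flows without inspecting packet payloads.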