The Analysis and Identification of P2P Botnet's Traffic Flows
International Journal of Communication Networks and Information Security (IJCNIS) Vol. 3, No. 2, August 2011

Wernhuar Tarng¹, Li-Zhong Den¹, Kuo-Liang Ou¹ and Mingteh Chen²
¹National Hsinchu University of Education, 521 Nanda Rd., Hsinchu, Taiwan, ROC
²Micrel Semiconductor Inc., 2180 Fortune Drive, San Jose, CA 95131, USA

Abstract: With the advance of information and communication technologies, the Internet has become an integral part of human life. Although it provides many convenient services, it also exposes its users to potential risks. For example, hackers may try to steal confidential data for illegal benefits, and they use a variety of attack methods to do so, e.g., Distributed Denial of Service (DDoS), spam and Trojans. These methods require a large number of computers; hence, hackers often spread malicious software to infect computers with weak defense mechanisms. The infected computers become zombie computers in botnets controlled by hackers. Detecting and defending against botnets is therefore an important subject in network security. Among them, the Peer-to-Peer (P2P) botnet is a new type of botnet in which every zombie computer acts as a peer controlled by hackers, which makes it more difficult to defend against. The objective of this study is to identify the traffic flows produced by known or unknown malicious software in order to defend against P2P botnets. Based on an analysis of P2P network traffic flows and the ASCII distribution in their packets, a six-step mechanism is proposed to identify the traffic flows of P2P botnets, locate the zombie computers, and finally protect other computers from further infection.

Keywords: P2P botnets, network traffic flows, network security, decision-tree model.

1. Introduction

With the advance and development of information and communication technologies (ICT), computer networks have become an integral part of human life. Their applications range from online news, online shopping and the use of Google search to acquire information, to online ATM and stock trading. In open network environments, there are always some unscrupulous criminals or organizations trying to use various methods to steal or destroy personal data in order to obtain illegal benefits. Usually, the hackers attempt to infect a large number of poorly protected or unprotected computers with malicious software to form so-called botnets, and then achieve their purposes by launching attacks from the zombie computers in the botnets. The methods often used for attacks include Distributed Denial of Service (DDoS), spam, click fraud and information leakage.

The first botnet appeared in 1993 in the Internet Relay Chat (IRC) networks, and botnets became widespread after 1999. In New Zealand, a 19-year-old hacker controlled 150 million computers through the Internet, which is the largest known botnet; another hacker, in China, controlled 60,000 computers to attack a music website, keeping the website out of service even after its server was transferred to Taiwan or the USA. The two events caused losses of hundreds of millions of dollars [1], and the two hackers were finally arrested. Waledac [2] is one of the top 10 botnets in the USA, affecting at least hundreds of thousands of personal computers in the world, and it can send 1.5 billion spam email messages daily, enough to seriously affect global network activities. According to Microsoft’s statistics, as many as 650 million malicious spam emails were sent to Hotmail from December 3 to 21, 2009. At least 233 source IP addresses in Taiwan were involved in sending spam emails for the Waledac botnets during early May 2009, showing that botnets can have a real influence on global computer networks.

Today, the Internet is widely used for communication, multimedia, shopping, entertainment, research, education, and so on, and its application areas continue to expand. In open network environments, the computers connected to the Internet are vulnerable and subject to different kinds of attacks. Even a computer with frequently updated antivirus software installed can still be infected. Because of user negligence and the fast mutation of computer viruses, a computer has a great chance of being infected and becoming a zombie computer. According to Symantec’s global Internet security report [3], Taipei has become the city with the world’s highest density of botnet viruses. Up to 80% of the computers may have been infected, and, what is worse, the users may still be unaware of it. Thus, the prevention of malicious attacks cannot rely on antivirus software alone; efficient mechanisms are also required to detect and defend against botnets.

A botnet is a collection of software agents, or robots, that run autonomously and automatically [4]. The term is most commonly associated with IRC botnets and, more recently, malicious software, but it can also refer to a computer network using distributed computation software. Botnets are usually named after their malicious software, such as Peacomm and Waledac. Basically, a botnet is composed of the server programs used to control the infected computers, the client programs installed on the infected computers to wait for control instructions, and the malicious software that infects normal computers and turns them into zombie computers. These programs often use a unique encryption system to communicate with each other to avoid detection, and they run in the background of infected computers, using an exchange channel (e.g., the RFC1459 standard, Twitter) to communicate with the command and control server. A new robot can automatically scan its environment and exploit weak passwords to infect other computers. The more computers a robot is capable of infecting, the more valuable it is in the botnets controlled by the hackers.

Based on the way the hackers connect to the zombie computers, there are three types of botnets, i.e., IRC, HTTP and P2P botnets. In the first type, an infected computer automatically connects to an IRC chat room controlled by the hackers and waits for the next operational command. Hackers can set up their own IRC servers or use public IRC servers to exchange messages with zombie computers. The architecture of HTTP botnets is similar to that of IRC botnets, mainly launching attacks through malicious HTTP servers set up by the hackers.

IRC and HTTP botnets use the client-server architecture and thus have a single point of failure, which means the entire botnet will collapse once the server has been shut down. Therefore, the P2P botnet was introduced by hackers as a new architecture using P2P communication protocols. In a P2P botnet, any zombie computer can be a client or a server, and it connects to the botnet according to its peer list to form a reciprocal relationship within the network topology. A P2P botnet therefore doesn’t need any particular server to download programs or receive instructions; the hackers can launch attacks from any computer in the P2P botnet. Consequently, the detection and prevention of P2P botnets are more difficult and challenging.

In recent years, research on botnets has become an important issue. According to the study of Zhu et al. [5], current research on botnets can be divided into three main areas: (a) the investigation of botnets by structural analysis or by observing their operation, (b) detecting and tracking botnets, and (c) defending against the attacks of botnets. That study focused on the IRC protocols of botnets. Currently, most detection mechanisms for P2P botnets are designed to detect a single type of P2P botnet, so they cannot be applied to other types. To remedy this drawback, Liu [6] proposed an adaptive defense [...] overall efficiency. This study improved the above approach by filtering out the unwanted P2P and non-P2P packets to reduce the time required by the identification process. Then, it used a decision-tree model trained on known P2P traffic flows to further increase the identification rate.

A decision tree is a classification procedure that assigns a number of objects to predefined categories. In the classification process, data are collected and divided recursively into several homogeneous subsets. The decision tree consists of the root, intermediate nodes, and end nodes. The root forms the base of all information, so it doesn’t have any input but can have zero or several outputs; an intermediate node is a partitioned data set, which can have two or more inputs and outputs; an end node, or leaf node, has one input and no output. The J48 decision tree used in this study is an improved decision tree based on Quinlan’s C4.5 decision tree [10], and it expands the tree structure from the root to the end nodes, which makes the generated rules easier to understand.

In this study, P2P botnets were detected by identifying their traffic flows in order to locate the zombie computers and finally protect other computers from further infection. First, the packets sent from the source ports to the destination ports by the computers in the network were filtered, which helps in understanding the current status of the network. The information obtained from these packets could then be used to identify the traffic flows of P2P botnets. The mechanism proposed in this study for identifying P2P botnets contains the following six steps:
• Pre-processing stage: filtering out non-P2P traffic flows to simplify the identification process.
• Identification of P2P application hosts: identifying the hosts running P2P application programs.