Malicious Website Detection Based on Honeypot Systems
Total Page:16
File Type:pdf, Size:1020Kb
2nd International Conference on Advances in Computer Science and Engineering (CSE 2013) Malicious Website Detection Based on Honeypot Systems Tung-Ming Koo, Hung-Chang Chang, Ya-Ting Hsu Huey-Yeh Lin Department of Information Management Department of Finance National Yunlin University of Science & Technology National Formosa University Douliou, Yunlin, Taiwan, R.O.C. Huwei, Yunlin, Taiwan, R.O.C. e-mail: {koo, g9823811, g9723742}@yuntech.edu.tw e-mail: [email protected] Abstract—In the Internet age, every computer user is likely to Several reasons can be attributed to why malicious inadvertently encounter highly contagious viruses. Over the attacks use the web as the medium for attacking. First, the past several years, a new type of web attack has spread across HTTP protocol is very easy to configure and use. Second, the web, that is, when a client connects to a malicious remote using a generic communication protocol is inconspicuous. server, the server responds to the request while simultaneously Because most firewalls permit port 80 of the HTTP protocol transporting malicious programs to the client’s computer, to pass, they are often useless to withstand such attacks, thereby launching a drive-by download attack. If the attack is allowing attacking packets to invade the system effortlessly. successful, malicious servers can control and execute any Because interconnectivity has become an indispensable part program from the client’s computer. Malicious websites of everyday life, terrorists believe that attacking through the frequently harbor obfuscation mechanisms to evade signature- based detection systems. These obfuscators have become web is an ideal option because of its maximum benefits and increasingly sophisticated that they have begun to invade minimum risks. multimedia files (JPG, Flash, and PDF). Under such Malicious websites frequently use obfuscation techniques circumstances, unless specific behaviors are triggered by to evade detection; these obfuscations have become malicious webpages, identifying programs with malicious increasingly sophisticated, even extending to multimedia intent by merely analyzing web content is extremely difficult, documents (JPG, Flash, and PDF). Under these not to mention the formidable quantity of webpages and the circumstances, unless specific behaviors are triggered by ever changing attack techniques. Based on a client-side browsing malicious webpages, identifying programs with honeypot system, this study proposes a model for determining malicious intentions by analyzing only the content is whether a webpage is malicious. We present a technique to extremely difficult, not including the daunting quantity of improve the accuracy of malicious web detection. First, static webpages and constantly changing attacking techniques. content analysis is performed to accelerate the detection, Thus, this study first analyzes the static content of webpages followed by actual browsing on webpages for in-depth probing to accelerate the analyzing process; then, we use a honeypot using the client-side honeypot system. Using this method, system to browse webpages for in-depth probing. The user’s security is protected when surfing the Internet. analysis outcome is used to determine whether webpages are benign or malicious automatically. Keywords- Honeypot; malicious website; drive-by download The followings are the main objectives of this technique: z Active detection of malicious websites I. INTRODUCTION z Automated malicious website analysis A web browser is indispensable tool for browsing online z Establish a protective mechanism content. However, a security report by IBM published in the II. LITERATURE REVIEW first half of 2009 [10] indicated that web browser vulnerabilities have been the most exploited over the past Traditional attacks are launched from the server-side, that few years. Regarding the range of operating systems, the is, the attack and penetration are aimed at the server service Microsoft operating systems family was ranked first place system. Nowadays, attacks have been shifting to the client- for the severity and incidence of system breaches. side, that is, when a client-side application (browser) Furthermore, Microsoft has the largest user base. Therefore, interacts with malicious servers, the vulnerabilities of the this study uses Microsoft operating systems to build the application are exploited for malicious purposes, or the experimental environment. client-side is induced to execute malicious programs. Any When a client connects to a malicious remote server, the client/server architecture can lead to this type of attack (Web, server transfers malicious programs to the client’s computer. FTP, and Mail), which is difficult to detect with the current Then, when the user sends a request to the malicious server, firewall, invasion detection, and proxy-based defense the server responds to the request while simultaneously systems. launching a drive-by download attack[17]. If the download is Drive-by downloads[1][17], also called “forced completed successfully, malicious servers can execute any downloads” or “pass-by downloads,” are a new type of programs on the client’s computer. According to statistics by client-side attack. When a user visits a webpage containing the Symantec Corporation [5][17], more than 18 million malicious scripts, the malicious programs are downloaded to drive-by downloads were enforced in 2008. the user’s computer without the consent of the user. Even © 2013. The authors - Published by Atlantis Press 76 legitimate websites can carry drive-by downloads simply assesses through manual work and only analyzes specific because they were compromised and injected with malicious websites. scripts. If the web browser or other software is not patched or Honeyclient is a highly interactive web-based honeypot updated regularly, the user may become infected. developed by Wang in 2004 [12]; it was the first open source The process of drive-by downloads are as follows: client honeypot written using Perl. As an event-based 1. Attackers seize control of legitimate web servers application, Honeyclient can detect attacks at the client side 2. Web browsing by users by monitoring specific directories and the system registry. 3. Drive-by downloads Using the password hash MD5, Honeyclient compares the 4. Redirecting to malicious websites directories and registry after interacting with the server. 5. Malicious website attacking the user When the checksum differs from the original, it is considered Numerous studies have investigated malicious webpage malicious. However, this approach is time-consuming and detection and protection in recent years. Static analysis has a high false positive rate. examines whether the program code or original documents In 2008, Seifert proposed Capture [3][13], a highly carry the defined signatures or patterns without executing interactive client honeypot with full functionalities. This these programs. Dynamic analysis refers to the detection honeypot uses a virtual machine to simulate the system technique that uses behavioral patterns to simulate the environment at user’s end. By controlling the browser, it browsing environment of users by loading an actual webpage detects the remote target hosts and observes changes in the into the browser and determining whether the browser system status. Modifications to the system status may downloads or executes unwanted scripts. suggest an abnormality. Although this approach is precise, A honeypot is a type of information system resource that detection is time-consuming. can be used without authorization or illegally. Thus, honeypots are primarily deployed as targets to be detected, III. RESEARCH METHODS attacked, or harmed by exploitive code. The basic Thousands of drive-by downloads occur on mainstream assumption is that because honeypots simulate non-existent websites daily; the users of these websites are infected or systems or services, a person attempting to reach the non- attacked at an astonishing rate. Unfortunately, numerous existent systems or services must have bad intentions. users remain unaware of these attacks. Thus, this study Most honeypots are server-side systems. They are investigates using static content analysis and dynamic configured as vulnerable hosts providing specific services to behavior analysis to determine whether a webpage has ensure they are detected and assaulted by hackers and malicious or abnormal intentions. We employ semantic- damaged by malware. Honeypots are attacked passively and based static content analysis and the behavior capturing do not act as a trapping system without first being attacked. technique proposed by Seifert et al. [2][14][15], and attempt Over the past several years, with the attack targets to improve these methods considering the network shifting to the client side, Spizner developed a new client characteristics. honeypot type [4]. This honeypot differs from the traditional Currently, most malicious webpages are assessed using server honeypot in that its active sniffing creates interactions static analysis or signature-based value. Despite the with the target hosts, and it is designed to actively detect provision of rapid detection and judgment, these methods offensive behaviors from malicious hosts. By contrast, server cannot resist unknown attack variations. Although heuristics honeypots cannot sniff actively. Therefore, a client honeypot and behavior-based detection techniques can prevent is used to actively detect whether a web service provided