Paint it Black: Evaluating the Effectiveness of Malware Blacklists

Marc Kührer, Christian Rossow, and Thorsten Holz

Horst Görtz Institute for IT-Security, Ruhr-University Bochum, Germany

Abstract. Blacklists are commonly used to protect computer systems against the tremendous number of malware threats. These lists include abusive hosts such as malware sites or botnet Command & Control and dropzone servers to raise alerts if suspicious hosts are contacted. Up to now, though, little is known about the effectiveness of malware blacklists. In this paper, we empirically analyze 15 public malware blacklists and 4 blacklists operated by antivirus (AV) vendors. We aim to categorize the blacklist content to understand the nature of the listed domains and IP addresses. First, we propose a mechanism to identify parked domains in blacklists, which we find to constitute a substantial number of blacklist entries. Second, we develop a graph-based approach to identify sinkholes in the blacklists, i.e., servers that host malicious domains which are controlled by security organizations. In a thorough evaluation of blacklist effectiveness, we show to what extent real-world malware domains are actually covered by blacklists. We find that the union of all 15 public blacklists includes less than 20% of the malicious domains for a majority of prevalent malware families and that most AV vendor blacklists fail to protect against malware that utilizes Domain Generation Algorithms.

Keywords: Blacklist Evaluation, Sinkholing Servers, Parking Domains

1 Introduction

The security community needs to deal with an increasing number of malware samples that infect computer systems world-wide. Many countermeasures have been proposed to combat the ubiquitous presence of malware [1–4]. Most notably, researchers progressively explored network-based detection methods to complement existing host-based malware protection systems. Endpoint reputation systems are one prominent example.
The typical approach is to assemble a blacklist of endpoints that have been observed to be involved in malicious operations. For example, blacklists can contain domains of Command & Control (C&C) servers of botnets, dropzone servers, and malware download sites [5]. Such blacklists can then be queried by an intrusion detection system (IDS) to determine if a previously unknown endpoint (such as a domain) is known for suspicious behavior.

Up to now, though, little is known about the effectiveness of malware blacklists. To the best of our knowledge, the completeness and accuracy of malware blacklists have never been examined in detail. Completeness is important as users otherwise risk missing notifications about malicious but unlisted hosts. Similarly, blacklists may become outdated if entries are not frequently revisited by the providers. While an endpoint may have had a bad reputation in the past, this might change in the future (e.g., due to shared hosting).

In this paper, we analyze the effectiveness of 15 public and 4 anti-virus (AV) vendor malware blacklists. That is, we aim to categorize the blacklist content to understand the nature of the listed entries. Our analysis consists of multiple steps. First, we propose a mechanism to identify parked domains, which we find to constitute a substantial number of blacklist entries. Second, we develop a graph-based approach to identify sinkholed entries, i.e., malicious domains that are mitigated and now controlled by security organizations. Last, we show to what extent real-world malware domains are actually covered by the blacklists.

In the analyzed blacklist data we identified 106 previously unknown sinkhole servers, revealing 27 sinkholing organizations. In addition, we found between 40% and 85% of the blacklisted domains to be unregistered for more than half of the analyzed blacklists and up to 10.9% of the blacklist entries to be parked.
The results of analyzing the remaining blacklist entries show that the coverage and completeness of most blacklists are insufficient. For example, we find public blacklists to be impractical when it comes to protecting against prevalent malware families, as they fail to include domains for the variety of families or list malicious endpoints with reaction times of 30 days or more.

Fortunately, the performance of three AV vendor blacklists is significantly better. However, we also identify shortcomings of these lists: only a single blacklist sufficiently protects against malware using Domain Generation Algorithms (DGAs) [3], while the other AV vendor blacklists include only a negligible number of DGA-based domains. Our thorough evaluation can help to improve the effectiveness of malware blacklists in the future.

To summarize, our contributions are as follows:

- We propose a method to identify parked domains by training an SVM classifier on seven inherent features we identified for parked web sites.
- We introduce a mechanism based on blacklist content and graph analysis to effectively identify malware sinkholes without a priori knowledge.
- We evaluate the effectiveness of 19 malware blacklists and show that most public blacklists have an insufficient coverage of malicious domains for a majority of popular malware families, leaving the end hosts fairly unprotected. While we find blacklists operated by AV vendors to have a significantly higher coverage, up to 26.5% of the domains were still missed for the majority of the malware families, revealing severe deficiencies of current reputation systems.

2 Overview of Malware Blacklists

Various malware blacklists operated by security organizations can be used to identify malicious activities.
These blacklists include domains and IP addresses that have been observed in a suspicious context, i.e., hosts of a particular type such as C&C servers or, less restrictively, endpoints associated with malware in general. Table 1 introduces the 15 public malware blacklists that we have monitored for the past two years [6]. For the majority of blacklists, we repeatedly obtained a copy every 3 hours (if permitted). The columns Current state the number of entries that were listed at the end of our monitoring period. The columns Historical summarize the entries that were once listed in a blacklist, but became delisted during our monitoring period. For reasons of brevity, we have omitted the number of listed IP addresses per blacklist, as we mainly focus on the blacklisted domains in our analyses. For all listed domains, we resolved the IP addresses and stored the name server (NS) DNS records. If blacklists contained URLs, we used the domain part of the URLs for our analysis.

Four blacklists are provided by Abuse.ch, of which three specifically list hosts related to the Palevo worm and the banking trojans SpyEye and ZeuS. The Virustracker project lists domains generated by DGAs, and the Citadel list includes domains utilized by the Citadel malware (which was seized by Microsoft in 2013 [7]). UrlBlacklist combines user submissions and other blacklists, covering domains and IPs of various categories; we focus on its malware-related content. The Exposure [4] blacklist included domains that were flagged as malicious by employing passive DNS (pDNS) analysis. The Abuse.ch AMaDa and the Exposure lists were discontinued, yet we leverage the collected historical data.

Table 1. Observed content of the analyzed malware blacklists (‡ denotes C&C blacklists)

Blacklist             Current domains  Historical domains  Observ. (days)
AMaDa [8] ‡                         0               1,494             267
Citadel [7] ‡                   4,634                   0              66
Cybercrime [9] ‡                1,070                   0             121
Exposure [4]                        0             107,183             559
Malc0de [10]                    2,121              20,135             832
MDL Hosts [11]                  1,653              11,996             832
MDL ZeuS [11] ‡                    12               1,675             829
MW-Domains [12]                23,396              37,490             832
Palevo Tracker [8] ‡               35                 147             542
Shadowserver [13] ‡                 0                   0             832
Shallalist [14]                20,677                  48             320
SpyEye Tracker [8] ‡              123                 956             832
UrlBlacklist [15]             127,745                 281             824
Virustracker [16]              12,066              56,269             196
ZeuS Tracker [8] ‡                759               8,042             832

Besides these public blacklists, we have requested information from four anti-virus (AV) vendors, namely Bitdefender TrafficLight [17], Browserdefender [18], McAfee Siteadvisor [19], and Norton SafeWeb [20]. These blacklists cannot be downloaded, but we can query if a domain is listed. We thus do not know the overall size of these blacklists and omit the numbers in Table 1.

Datasets. We divide the 15 public blacklists into three overlapping datasets. The first dataset, referred to as S_C&C, consists of domains taken from the sources primarily listing endpoints associated with C&C servers, denoted by ‡ in Table 1. We extend S_C&C with the IP addresses to which any of these domains resolved at some point. The second, coarse-grained dataset S_Mal includes the domains that were at any time listed in any of the 15 blacklists (including S_C&C) and the resolved IPs. Last, we generate a third dataset S_IPs, covering all IP addresses currently listed by any of the 15 public blacklists (i.e., 196,173 IPs in total). This dataset will help us to verify if blacklists contain IPs of sinkholing servers.

Paper Outline. Motivated by the fact that blacklists contain thousands of domains, we aim to understand the nature of these listings. We group the entries in four main categories: domains are either i) unregistered, ii) controlled by parking providers, iii) assigned to sinkholes, or iv) serve actual content.
Unregistered domains can easily be identified using DNS. However, it is non-trivial to detect parked or sinkholed domains. We thus propose detection mechanisms for these two types in Section 3 (parking domains) and Section 4 (sinkholed domains). In Section 5, we classify the blacklist content and analyze to what extent blacklists help to protect against real malware.
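
The construction of the three datasets described under Datasets above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the Blacklist container, the helper domain_of, the function build_datasets, and the resolved_ips mapping (domain to every IP address it was observed resolving to) are hypothetical names introduced only for illustration, and the sketch assumes blacklist entries are either bare domains or URLs.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Matches a leading URL scheme such as "http://" (illustrative helper).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


@dataclass
class Blacklist:
    """Hypothetical container for one monitored public blacklist."""
    name: str
    is_cnc: bool  # True for lists that primarily cover C&C endpoints
    current_domains: set = field(default_factory=set)
    historical_domains: set = field(default_factory=set)
    current_ips: set = field(default_factory=set)


def domain_of(entry: str) -> str:
    """Return the lowercased domain part of an entry that may be a URL."""
    # urlsplit only recognizes the host when a "//" authority marker exists,
    # so bare domains are prefixed before parsing.
    target = entry if _SCHEME.match(entry) else "//" + entry
    return (urlsplit(target).hostname or "").lower()


def build_datasets(blacklists, resolved_ips):
    """Return (S_C&C, S_Mal, S_IPs) as sets of domains and IP addresses."""
    s_cnc, s_mal, s_ips = set(), set(), set()
    for bl in blacklists:
        # Domains listed at any time: current plus historical entries.
        domains = {domain_of(e) for e in bl.current_domains | bl.historical_domains}
        domains.discard("")
        s_mal |= domains
        if bl.is_cnc:
            s_cnc |= domains
        # S_IPs only covers IP addresses that are currently listed.
        s_ips |= bl.current_ips
    # Extend both domain datasets with the IPs their domains resolved to.
    s_cnc |= {ip for d in list(s_cnc) for ip in resolved_ips.get(d, ())}
    s_mal |= {ip for d in list(s_mal) for ip in resolved_ips.get(d, ())}
    return s_cnc, s_mal, s_ips
```

In this sketch S_C&C is a subset of S_Mal by construction, which mirrors the statement that the datasets overlap, while S_IPs is kept separate because it is built from listed IP addresses rather than from resolved domains.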