Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits

Carl Sabottke, Octavian Suciu, and Tudor Dumitraş, University of Maryland

https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/sabottke

This paper is included in the Proceedings of the 24th USENIX Security Symposium, August 12–14, 2015, Washington, D.C. ISBN 978-1-939133-11-3

Open access to the Proceedings of the 24th USENIX Security Symposium is sponsored by USENIX.

Abstract

In recent years, the number of software vulnerabilities discovered has grown significantly. This creates a need for prioritizing the response to new disclosures by assessing which vulnerabilities are likely to be exploited and by quickly ruling out the vulnerabilities that are not actually exploited in the real world. We conduct a quantitative and qualitative exploration of the vulnerability-related information disseminated on Twitter. We then describe the design of a Twitter-based exploit detector, and we introduce a threat model specific to our problem. In addition to response prioritization, our detection techniques have applications in risk modeling for cyber-insurance and they highlight the value of information provided by the victims of attacks.

1 Introduction

The number of software vulnerabilities discovered has grown significantly in recent years. For example, 2014 marked the first appearance of a 5 digit CVE, as the CVE database [46], which assigns unique identifiers to vulnerabilities, has adopted a new format that no longer caps the number of CVE IDs at 10,000 per year. Additionally, many vulnerabilities are made public through a coordinated disclosure process [18], which specifies a period when information about the vulnerability is kept confidential to allow vendors to create a patch. However, this process results in multi-vendor disclosure schedules that sometimes align, causing a flood of disclosures. For example, 254 vulnerabilities were disclosed on 14 October 2014 across a wide range of vendors including Microsoft, Adobe, and Oracle [16].

To cope with the growing rate of vulnerability discovery, the security community must prioritize the effort to respond to new disclosures by assessing the risk that the vulnerabilities will be exploited. The existing scoring systems that are recommended for this purpose, such as FIRST's Common Vulnerability Scoring System (CVSS) [54], Microsoft's exploitability index [21] and Adobe's priority ratings [19], err on the side of caution by marking many vulnerabilities as likely to be exploited [24]. The situation in the real world is more nuanced. While the disclosure process often produces proof-of-concept exploits, which are publicly available, recent empirical studies reported that only a small fraction of vulnerabilities are exploited in the real world, and this fraction has decreased over time [22,47]. At the same time, some vulnerabilities attract significant attention and are quickly exploited; for example, exploits for the Heartbleed bug in OpenSSL were detected 21 hours after the vulnerability's public disclosure [41]. To provide an adequate response on such a short time frame, the security community must quickly determine which vulnerabilities are exploited in the real world, while minimizing false positive detections.

The security vendors, system administrators, and hackers, who discuss vulnerabilities on social media sites like Twitter, constitute rich sources of information, as the participants in coordinated disclosures discuss technical details about exploits and the victims of attacks share their experiences. This paper explores the opportunities for early exploit detection using information available on Twitter. We characterize the exploit-related discourse on Twitter, the information posted before vulnerability disclosures, and the users who post this information. We also reexamine a prior experiment on predicting the development of proof-of-concept exploits [36] and find a considerable performance gap. This illuminates the threat landscape evolution over the past decade and the current challenges for early exploit detection.

Building on these insights, we describe techniques for detecting exploits that are active in the real world. Our techniques utilize supervised machine learning and ground truth about exploits from ExploitDB [3], OSVDB [9], Microsoft security advisories [21] and the descriptions of Symantec's anti-virus and intrusion-protection signatures [23]. We collect an unsampled corpus of tweets that contain the keyword "CVE," posted between February 2014 and January 2015, and we extract features for training and testing a support vector machine (SVM) classifier. We evaluate the false positive and false negative rates and we assess the detection lead time compared to existing data sets. Because Twitter is an open and free service, we introduce a threat model, considering realistic adversaries that can poison both the training and the testing data sets but that may be resource-bound, and we conduct simulations to evaluate the resilience of our detector to such attacks. Finally, we discuss the implications of our results for building security systems without secrets, the applications of early exploit detection and the value of sharing information about successful attacks.

In summary, we make three contributions:

• We characterize the landscape of threats related to information leaks about vulnerabilities before their public disclosure, and we identify features that can be extracted automatically from the Twitter discourse to detect exploits.
• To our knowledge, we describe the first technique for early detection of real-world exploits using social media.
• We introduce a threat model specific to our problem and we evaluate the robustness of our detector to adversarial interference.

Roadmap. In Sections 2 and 3 we formulate the problem of exploit detection and we describe the design of our detector, respectively. Section 4 provides an empirical analysis of the exploit-related information disseminated on Twitter, Section 5 presents our detection results, and Section 6 evaluates attacks against our exploit detectors. Section 7 reviews the related work, and Section 8 discusses the implications of our results.

2 The problem of exploit detection

We consider a vulnerability to be a software bug that has security implications and that has been assigned a unique identifier in the CVE database [46]. An exploit is a piece of code that can be used by an attacker to subvert the functionality of the vulnerable software. While many researchers have investigated the techniques for creating exploits, the utilization patterns of these exploits provide another interesting dimension to their security implica-

exploits, for which the exploit code is publicly available, and private PoC exploits, for which we can find reliable information that the exploit was developed, but it was not released to the public. A PoC exploit may also be a real-world exploit if it is used in attacks.

The existence of a real-world or PoC exploit gives urgency to fixing the corresponding vulnerability, and this knowledge can be utilized for prioritizing remediation actions. We investigate the opportunities for early detection of such exploits by using information that is available publicly, but is not included in existing vulnerability databases such as the National Vulnerability Database (NVD) [7] or the Open Sourced Vulnerability Database (OSVDB) [9]. Specifically, we analyze the Twitter stream, which exemplifies the information available from social media feeds. On Twitter, a community of hackers, security vendors and system administrators discuss security vulnerabilities. In some cases, the victims of attacks report new vulnerability exploits. In other cases, information leaks from the coordinated disclosure process [18] through which the security community prepares the response to the impending public disclosure of a vulnerability.

The vulnerability-related discourse on Twitter is influenced by trend-setting vulnerabilities, such as Heartbleed (CVE-2014-0160), Shellshock (CVE-2014-6271, CVE-2014-7169, and CVE-2014-6277) or Drupalgeddon (CVE-2014-3704) [41]. Such vulnerabilities are mentioned by many users who otherwise do not provide actionable information on exploits, which introduces a significant amount of noise in the information retrieved from the Twitter stream. Additionally, adversaries may inject fake information into the Twitter stream, in an attempt to poison our detector. Our goals in this paper are (i) to identify the good sources of information about exploits and (ii) to assess the opportunities for early detection of exploits in the presence of benign and adversarial noise. Specifically, we investigate techniques for minimizing false-positive detections (vulnerabilities that are not actually exploited), which is critical for prioritizing response actions.

Non-goals. We do not consider the detection of zero-day attacks [32], which exploit vulnerabilities before their public disclosure; instead, we focus on detecting the use of exploits against known vulnerabilities. Because our aim is to assess the value of publicly available information for exploit detection, we do not