D3.5 Analysis of Legal and Illegal Vulnerability Markets and Specification of the Data Acquisition Mechanisms

Work Package 3: Economic Analysis

Document Dissemination Level
PU Public ☒
CO Confidential, only for members of the Consortium (including the Commission Services) ☐

Document Due Date: 31/10/2017
Document Submission Date: 06/11/2017

This work is performed within the SAINT Project – Systemic Analyser in Network Threats – with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No 740829

Copyright SAINT Consortium. All rights reserved. 1


Document Information
Deliverable number: 3.5
Deliverable title: Analysis of Legal and Illegal Vulnerability Markets and Specification of the Data Acquisition Mechanisms
Deliverable version: 1.0
Work Package number: 3
Work Package title: Economic Analysis
Due Date of delivery: 31/10/2017
Actual date of delivery: 06/11/2017
Dissemination level: PU
Editor(s): Yannis Stamatiou (CTI)
Contributor(s): John Bothos (NCSRD) (CYBE); Dimitrios Kavallieros (KEMEA); Pantelis Tzamalis (CTI); Vasileios Vlachos (CTI); Yannis Stamatiou (CTI)
Reviewer(s): Stelios Thomopoulos (NCSRD); Georgios Germanos (KEMEA); Jart Armin (CYBE); Edgardo Montes (MNTMG)
Ethical advisor(s): Christina Chalanouli (KEMEA)
Project name: Systemic Analyser in Network Threats
Project Acronym: SAINT
Project starting date: 1/5/2017
Project duration: 24 months
Rights: SAINT Consortium

Version History
Version | Date | Beneficiary | Description
0.1 | 28/07/2017 | CTI | Table of Contents
0.2 | 29/09/2017 | CTI | First draft version for further processing by the involved partners
0.3 | 12/10/2017 | CTI | Updated version ready for proofreading
0.4 | 16/10/2017 | CTI | Proofread version ready for technical review
0.5 | 20/10/2017 | Ethical advisor | Review by Ethical and legal advisor
0.6 | 25/10/2017 | CTI | Final version
0.7 | 2/11/2017 | Security Advisory Board members | Review by Security Advisory Board
1.0 | 6/11/2017 | CTI | Final version ready for submission


Abbreviations and Acronyms
ACRONYM: EXPLANATION
ANZUS: The Australia, New Zealand, United States Security Treaty
API: Application Programming Interface
AR: Abnormal Returns
ASEAN: Association of Southeast Asian Nations
CAR: Cumulative Abnormal Returns
CERT: Computer Emergency Response Team
CVE: Common Vulnerabilities and Exposures
DDoS: Distributed Denial-of-Service (type of attack)
DoS: Denial-of-Service (type of attack)
EU: European Union
FBI: Federal Bureau of Investigation
I2P: Invisible Internet Project
IoT: Internet of Things
JSON: JavaScript Object Notation
LPE: Local Privilege Escalation (type of vulnerability)
NATO: North Atlantic Treaty Organization (also called the North Atlantic Alliance)
NIST: National Institute of Standards and Technology
NSA: National Security Agency
NVD: National Vulnerability Database
OS: Operating System
RCE: Remote Code Execution (type of vulnerability)
OVAL: Open Vulnerability and Assessment Language
RUB: Remote Jailbreak with Persistence (type of vulnerability)
TCP/IP: Transmission Control Protocol/Internet Protocol
UK: United Kingdom
US: United States


Table of Contents
Executive summary
1. Introduction
2. Identification of vulnerability markets
2.1 Vulnerability related concepts
2.2 Vulnerability facts and trends
2.3 Vulnerability producers (discoverers)
2.4 Vulnerability markets
2.4.1 White markets
2.4.1.1 Publicity
2.4.1.2 Captive
2.4.1.3 Reward programs
2.4.1.4 Security company
2.4.2 Online forums
2.4.3 Grey markets
2.4.4 Black markets
2.5 Vulnerability consumers (buyers)
2.6 Vulnerability resolutions (patches)
3. 0-Day vulnerabilities and Deep Web markets
3.1 0-day vulnerabilities
3.2 Pricing information on 0-day vulnerabilities and exploits
3.3 0-day vulnerability markets
3.3.1 White Markets
3.3.2 Grey Markets
3.3.3 Black Markets
3.4 Cryptovirology and the Market for Encryption Back Doors
4. The role of the rate of updates and security fixes published by vendors
5. Financial aspects of cybersecurity breaches and vulnerability information
5.1 General considerations
5.2 The Capacity and Value-Based Pricing Model for vulnerability and exploit trading
5.3 Costs of vulnerability announcements to vendors and costs of proactive defences
5.4 The effect of vulnerability disclosure on the market value of software product vendors
5.5 Modelling the decisions of the vulnerability discoverer and defender
6. Specifications for the OSINT Web Crawler and the Social Network Analyser
6.1 Web Crawler
6.2 Social Network Analyzer (SNA)
6.3 Terms of use of the tools
7. Conclusion
References


Table of Figures
Figure 1-1: Types of experienced cyber-attacks (Ponemon, 2015)
Figure 1-2: Average annual cyber-crime cost weighted by attack frequency (Ponemon, 2015)
Figure 2-1: Vulnerability life cycle (blue rectangle: pre-disclosure risk, red rectangle: post-disclosure risk)
Figure 2-2: The evolution of vulnerability numbers since 1999 (CVE)
Figure 2-3: Number of vulnerabilities per type since 1999 (CVE)
Figure 2-4: Number of vulnerabilities, since 1999 (CVE), according to the CVSS (Common Vulnerability Scoring System – NIST)
Figure 2-5: Number of vulnerabilities per vendor and software product or system
Figure 2-6: A generic vulnerability market and involved actors’ taxonomy
Figure 2-7: Generic vulnerability discovery and reporting process in reward programs
Figure 2-8: Generic vulnerability discovery reward process in reward programs
Figure 2-9: Company specific vulnerability and patch related information
Figure 2-10: Patch availability for vulnerabilities in all products (2011 – 2017)
Figure 2-11: Patch availability for vulnerabilities in Top-50 products in the market plus Windows 7 (2011 – 2017)
Figure 3-1: 0-day vulnerabilities registered by Secunia (year 2016)
Figure 3-2: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (August 28, 2017)
Figure 3-3: Zerodium pricing list for desktop/server 0-Day vulnerabilities/exploits (as of August 28, 2017)
Figure 3-4: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (September, 2016)
Figure 3-5: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (November, 2015)
Figure 3-6: Top affected by cyberattacks countries (Shodan tool)
Figure 3-7: Top affected by vulnerabilities services (Shodan tool)
Figure 3-8: Top affected by vulnerabilities Operating Systems (Shodan tool)
Figure 3-9: Buffer overflow related information (Shodan tool)
Figure 4-1: Information on software security updates and patches
Figure 5-1: Vulnerability discovery and announcement with ensuing Abnormal Returns
Figure 6-1. From raw data to semantic knowledge
Figure 6-2. The process of social media mining
Figure 6-3. The time dimension of REST API Vs Streaming API
Figure 6-4. Line graph with keyword search: Ransomware
Figure 6-5. Map graph with keyword search: Ransomware
Figure 6-6. Line graph with related keyword query: ransomware attacks 2017
Figure 6-7. Map graph with related keyword query: ransomware attacks 2017

Table of Tables
Table 1-1: Important facts about information breaches on large and small organizations in 2015
Table 2-1: Price ranges for 0-day vulnerabilities
Table 2-2: OVAL repository vulnerability tests
Table 3-1: 0-day vulnerability prices (from Forbes)
Table 3-2: Pricing list updates by Zerodium (published on August 23, 2017)
Table 3-3: Steps in a successful vulnerability sale
Table 3-4: Steps in an unsuccessful vulnerability sale
Table 3-5: Some estimates on exploit values
Table 5-1: A simple two-person vulnerability discovery/defence game
Table 5-2: The unique Nash equilibrium in mixed strategies
Table 6-1: Description of “Tweet” fields


Executive summary

This deliverable reflects work performed within the context of Task T3.3. Its goal is to identify and categorise the main vulnerability and exploit markets, along with the stakeholders involved and the roles they play in these markets. Emphasis is given to the 0-day vulnerability markets and their financial impact on organisations. Based on this market analysis, the deliverable documents the first vulnerability information sources that are appropriate for the project’s goals and that will feed data to the automatic vulnerability discovery and analysis tools to be developed within the scope of WP5. These sources include vulnerability archiving sites, software vendors’ sites, vulnerability finding contests and bug bounties, as well as governmental sites that monitor activity and archive incidents.

In addition, we review (for possible adoption within WP3) some existing techniques that identify the financial impact of cybercrime activity on companies, as well as the effect that the financial aspects of cybersecurity have on companies and cybercriminals. These techniques include abnormal returns analysis and game theory. They rely on publicly known figures about the costs (e.g. in market value or stock market losses) of cybercrime incidents to victim organisations, as well as the counteracting cost of defending against cybercrime.

Moreover, we present preliminary work on the specifications and architecture of the automated analysis tools that will be developed within the context of WP5. The first tool is the Deep Web Crawler, which will legally search and scrape websites related to cybercrime activities (to the degree that this is possible), as well as known vulnerability markets, bug bounties, and related websites. The second tool is the Social Network Analyzer (SNA), which applies social media mining techniques to the well-known Twitter API and the Google Trends platform in order to monitor cybercriminal activity around the world.

This deliverable, which is essentially the first technical deliverable of the project, presents our initial investigations and results with respect to the vulnerability markets and how the information sources related to them will feed data, mainly, to WP3 and WP5. In this sense, it paves the way for the subsequent, deeper and more technical, work within these work packages and their corresponding deliverables.


1. Introduction

Cybercrime is not new: it dates back to the inception of the first computer networks and the invention of the Internet. However, the Internet of Things (IoT) era will increase machine interconnectivity and organisations’ reliance on new technologies and the Cloud. Additionally, people are more readily adopting new technologies, such as mobile phones, tablets, smart wearable devices and cloud services, in their personal and professional lives. Moreover, applications and system software have reached unprecedented levels of complexity in order to handle the demands of today’s interconnected world of machines and services. The abrupt increase, in recent years, in cybercrime activity and in the cost inflicted on its victims was thus a natural consequence. Numerous surveys conducted by leading consulting organisations reveal an alarming picture of the global cybercrime trends. For instance, survey results provided by PWC for the year 2015, based on 664 respondent small and large organisations, showed the following (among many other interesting cybersecurity related findings – see [30]):

Table 1-1: Important facts about information breaches on large and small organizations in 2015

Important information breach facts (Survey by PWC, 2015)
• 90% of the large organisations and 74% of the small ones suffered a cybersecurity breach in 2015, representing a notable increase from 81% and 60% respectively for the year 2014.
• The average cost of cybersecurity breaches was £1.46m – £3.14m for the large organisations and £75k – £311k for the small organisations.
• 59% of respondents feared that in 2016 there would be more cybersecurity breaches than in 2015.
• 69% of the large organisations and 38% of the small organisations suffered malicious outsider cyberattacks in 2015. This represents an impressive increase.

Furthermore, for the same year, independent survey results published by Ponemon, based on 252 companies, revealed the following picture, which confirms the alarming findings of PWC’s and other similar surveys:

Figure 1-1: Types of experienced cyber-attacks (Ponemon, 2015)


Figure 1-2: Average annual cyber-crime cost weighted by attack frequency (Ponemon, 2015)

Finally, as a more recent incident we can cite the WannaCry ransomware cyberattack [17], which started on May 12, 2017 and inflicted damage upon almost 300,000 computers in 150 countries, some of which are owned by governmental operators of critical infrastructures (e.g. railways, hospitals, telecommunication providers etc.). The propagation of the cyberattack was stopped by the application of an effective kill switch discovered by an individual. We can also cite the Petya-like cyberattack which started on June 27, 2017 (see [17]), impacting about 12,500 computers in 65 countries. That cyberattack was finally stopped because its propagation was confined, mostly, to local networks rather than spreading across networks throughout the Internet.

Behind these attacks (mostly the WannaCry attack, since the Petya-like cyberattack took advantage of other vulnerabilities too) was EternalBlue, an exploit generally believed to have been developed by the U.S. National Security Agency (NSA). It was leaked by the Shadow Brokers hacker group on April 14, 2017, and was used as part of the worldwide WannaCry ransomware attack. EternalBlue exploits a vulnerability in Microsoft's implementation of the Server Message Block (SMB) protocol. This vulnerability is denoted by the entry CVE-2017-0144 in the Common Vulnerabilities and Exposures (CVE) catalogue. The vulnerability exists because the SMB version 1 (SMBv1) server in various versions of Microsoft Windows accepts specially crafted packets from remote attackers, allowing them to execute arbitrary code on the target computer (see [4,21]).

An important fact about these (and almost all known) attacks points to a key element behind the increase of cybercrime, despite everybody’s awareness of it and willingness to avoid becoming a victim: negligence or indifference in the face of warnings.
For instance, on Tuesday, March 14, 2017, Microsoft issued security bulletin MS17-010, which detailed the flaw and announced that patches had been released for all Windows versions that were supported at that time. These were Windows 7, Windows 8.1, Windows 10, Windows Server 2008, Windows Server 2012, and Windows Server 2016, as well as Windows Vista. It is now evident that many Windows users (including system administrators) had not installed the patches since, two months later, on May 12, 2017, the WannaCry ransomware cyberattack used the EternalBlue exploit to spread itself. The next day, Microsoft released emergency security patches for Windows 7 and Windows 8, and the unsupported Windows XP and Windows Server 2003.

The recent devastating WannaCry cyberattack and the (less devastating, but still dangerous) Petya-like cyberattack, as well as the statistical data cited above, demonstrate certain key facts about cyberattacks and how they materialise. We will address these in this deliverable as key targets of the SAINT project’s approach. Moreover, behind the materialisation of these attacks there is a new and fast-growing body of vulnerability markets, with stakeholders selling and buying vulnerabilities for financial gain or to avoid financial loss. This implies that a whole new economy is rapidly evolving based on immaterial assets: the vulnerabilities and their exploits.

However, there is another financial aspect of ransomware incidents, such as the WannaCry attack, that is important for our project: the currency in which the ransom payment is demanded (see [38]). Over the last years, ransomware attackers have demanded payment in cryptocurrencies, with Bitcoin (https://www.bitcoin.com/) being among the most popular ones. Bitcoin is a virtual currency, also called a cryptocurrency because of its reliance on cryptography. It was invented by Satoshi Nakamoto, probably an alias for an anonymous programmer or a team of programmers, and was launched in 2009. Its major feature is that it is not backed by the usual, material, assets such as gold, but by an immaterial type of asset: information (produced through the expense of computational effort). Although Bitcoin has been established as a form of currency worldwide (with some caveats, which are out of the scope of this deliverable), it has recently found its way into cybercrime, mainly through ransomware and illicit Dark Web transactions ([38]). This is because Bitcoin offers two principal advantages to cybercriminals. First, Bitcoin is a decentralised currency, which means that transactions are accomplished in a peer-to-peer fashion, without the mediation of a trusted third party which, in the case of conventional currency, is the central bank of a country or the European Union. Thus, Bitcoin offers a degree of anonymity: transactions are tied to addresses rather than to the real-world identities of the parties involved. Second, using Bitcoins for illicit purposes is becoming increasingly easy.
David Prince, a cybersecurity specialist and director at Baringa Partners (a London-based ICT consultancy firm), has said the following (see [38]): “If you have the skills to get an iTunes account, you can probably download a ransomware toolkit, an automated bit of software, and start distributing it.”, “You can then go on the darknet and “wash” your bitcoins and convert them back into cash.” Thus, the SAINT project will pay particular attention, in the context of WP3, to virtual currency and, especially, Bitcoin.

This is where the work in Task T3.3 and its corresponding deliverable comes in, at this early stage of the project’s timeline. The goal of the task and its deliverable is to identify and categorise the vulnerability and exploit markets along with the involved stakeholders and their roles. Moreover, we highlight the so-called 0-day vulnerability markets and their impact on organisations. Finally, based on the vulnerability market analysis, we identify the vulnerability information sources that are appropriate for the project’s goals and that will form the basis of the automatic vulnerability discovery procedures to be designed within the project, and we sketch the architecture and functionality of the tools that will be developed within the scope of WP5. This deliverable, which is the first technical deliverable of the project, was scheduled for early delivery in order to present our initial investigations and results with respect to the vulnerability markets and how the information sources related to them will feed data, mainly, to WP3 and WP5. It acts as a precursor to the subsequent, deeper and more technical, deliverables of these work packages, which will build upon and expand the preliminary findings and observations documented in D3.5.

2. Identification of vulnerability markets

2.1 Vulnerability related concepts

As stated by CVE (https://www.cvedetails.com/cve-help.php), an information security vulnerability is a problem or error in a software product that can be exploited by malicious parties (e.g. hackers) in order to gain unauthorised access to information, computer systems and networks, or to commit an illicit action. According to CVE, an error in a software product is deemed a vulnerability if a malicious party can exploit it to violate the cybersecurity policies and requirements of a system affected by the error. Generally, a vulnerability can lead to one or more of the following unwanted events:

• It allows a malicious party to execute commands, impersonating authorised users
• It allows a malicious party to access data that the party is not authorised to access, violating the access rights restrictions on the data
• It allows identity theft, whereby a malicious party appears as another person by stealing her/his digital identity
• It allows a malicious party to unleash a Denial of Service cyberattack or, generally, a cyberattack that degrades the performance and Quality of Service (QoS) of an information system

Vulnerability examples include the following, among others:

• Buffer/stack overflow
• Phf (Remote Command Execution as user “nobody”)
• Rpc.ttdbserver (Remote Command Execution as “root”)
• World-writeable password file (modification of system-critical data)
• Default password (Remote Command Execution or other access)
• Denial of service problems that allow a cyber-attacker to cause Blue Screen of Death (i.e. the “blue screen” that the MS Windows OS displays whenever a fatal OS error occurs)
• Smurf (Denial of Service by flooding a network)

It should also be pointed out that vulnerabilities may result from incorrect configurations of increasingly complex computer systems, or may be due to the carelessness of system users, for example the use of weak passwords or the sharing of passwords with colleagues and friends.

2.2 Vulnerability facts and trends

In Figure 2-1 we can see a generic view of the life-cycle of a vulnerability, from its creation (e.g. by a carelessly developed or installed software product) to the patch produced by the vulnerability “creator” (i.e. the company whose product contained the vulnerability). It is estimated (see [31]) that as many as 85 new exploits are produced per day, which is an alarming number for the cybersecurity industry as well as for the targeted, affected organisations.

Figure 2-1: Vulnerability life cycle (blue rectangle: pre-disclosure risk, red rectangle: post-disclosure risk)

In Figure 2-2, we see a graph from the CVE (Common Vulnerabilities and Exposures – https://cve.mitre.org/) initiative, showing how the number of discovered vulnerabilities has evolved over the years since 1999. This graph, like the graphs that follow, was downloaded at the end of July 2017, so the number of vulnerabilities for 2017 is expected to increase significantly by the end of the year, reaching a maximum over all previous years.


Vulnerabilities By Year
Year | Number of vulnerabilities
1999 | 894
2000 | 1020
2001 | 1677
2002 | 2156
2003 | 1526
2004 | 2451
2005 | 4935
2006 | 6610
2007 | 6520
2008 | 5632
2009 | 5736
2010 | 4651
2011 | 4155
2012 | 5297
2013 | 5191
2014 | 7946
2015 | 6452
2016 | 6435
2017 | 7961

Figure 2-2: The evolution of vulnerability numbers since 1999 (CVE)

In Figure 2-3, we see (from CVE) the distribution of the number of vulnerabilities discovered since 1999 according to their type. “Execute Code” vulnerabilities are the most numerous of all types, which implies that software vendors (mainly OS developers) fail to take appropriate measures during the design and implementation stages. The distribution of the vulnerability types also reflects the preferences of the vulnerability discoverers, as well as the susceptibility of the particular products that are related to each vulnerability type.

Vulnerability type | Number of vulnerabilities
Denial of Service | 19541
Execute Code | 26127
Overflow | 13406
XSS | 10933
Directory Traversal | 3184
Bypass Something | 4897
Gain Information | 7974
Gain Privilege | 4426
Sql Injection | 6585
File Inclusion | 2152
Memory Corruption | 4318
CSRF | 1516
Http Response Splitting | 146

Figure 2-3: Number of vulnerabilities per type since 1999 (CVE)
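The ranking behind the statement that “Execute Code” vulnerabilities dominate can be reproduced with a few lines of Python. The sketch below is only an illustration: the counts are copied from the Figure 2-3 data, while the variable names are our own. Percentage shares are deliberately not computed, because we assume that a single vulnerability may be counted under more than one type.

```python
# Per-type vulnerability counts copied from the Figure 2-3 data (CVE, since 1999).
type_counts = {
    "Denial of Service": 19541,
    "Execute Code": 26127,
    "Overflow": 13406,
    "XSS": 10933,
    "Directory Traversal": 3184,
    "Bypass Something": 4897,
    "Gain Information": 7974,
    "Gain Privilege": 4426,
    "Sql Injection": 6585,
    "File Inclusion": 2152,
    "Memory Corruption": 4318,
    "CSRF": 1516,
    "Http Response Splitting": 146,
}

# Rank the types from the most to the least frequent.
ranked = sorted(type_counts.items(), key=lambda item: item[1], reverse=True)

# Ratio between the most frequent type and the runner-up.
lead_over_second = ranked[0][1] / ranked[1][1]

for name, count in ranked:
    print(f"{name:25s} {count:6d}")
print(f"Lead of '{ranked[0][0]}' over '{ranked[1][0]}': {lead_over_second:.2f}x")
```

The same structure could later be fed with counts collected automatically, instead of the hard-coded values used here.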

In Figure 2-4 we see the categorisation of the discovered vulnerabilities, since 1999, according to their severity based on the Common Vulnerability Scoring System (https://nvd.nist.gov/vuln-metrics/cvss), from the least severe (CVSS score 0 – 1) to the most severe ones (CVSS score 9 – 10). This is, again, alarming evidence that most of the discovered vulnerabilities (over 50%) are severe, with a severity score of at least 6. This, in turn, may imply severe financial or other intangible (e.g. trust, fame) costs for the affected companies.

Distribution of all vulnerabilities by CVSS Scores
CVSS Score | Number of Vulnerabilities | Percentage
0-1 | 330 | 0.40
1-2 | 701 | 0.80
2-3 | 3608 | 4.10
3-4 | 2352 | 2.70
4-5 | 18087 | 20.60
5-6 | 17117 | 19.50
6-7 | 11063 | 12.60
7-8 | 21436 | 24.40
8-9 | 381 | 0.40
9-10 | 12850 | 14.60
Total | 87925 |

Figure 2-4: Number of vulnerabilities, since 1999 (CVE), according to the CVSS (Common Vulnerability Scoring System – NIST)
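The “over 50%” figure quoted above can be checked directly against the CVSS distribution. The following Python sketch is an illustration only: the counts are copied from the table, whereas the function name share_at_or_above and the choice of 6 as the threshold parameter are our own assumptions for the example.

```python
# Vulnerability counts per CVSS score band, copied from the table above.
cvss_counts = {
    "0-1": 330,
    "1-2": 701,
    "2-3": 3608,
    "3-4": 2352,
    "4-5": 18087,
    "5-6": 17117,
    "6-7": 11063,
    "7-8": 21436,
    "8-9": 381,
    "9-10": 12850,
}


def share_at_or_above(counts, threshold):
    """Fraction of vulnerabilities in bands whose lower bound is >= threshold."""
    total = sum(counts.values())
    selected = sum(
        n for band, n in counts.items() if int(band.split("-")[0]) >= threshold
    )
    return selected / total


total = sum(cvss_counts.values())
share = share_at_or_above(cvss_counts, 6)
print(f"total = {total}, share with CVSS score >= 6: {share:.1%}")
```

With the figures of the table, the bands from 6-7 upwards account for roughly 52% of the 87925 recorded vulnerabilities, in agreement with the statement in the text.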

Figure 2-5 (from CVE) indicates that no software product or system is immune to vulnerabilities: vulnerability discoverers can target virtually any vendor, operating system, or software product, as long as it is a challenging or a profitable target (or both).


Figure 2-5: Number of vulnerabilities per vendor and software product or system

Information such as that depicted in this figure is important for SAINT and will be monitored by the tools developed within the context of the project. This information will be correlated, through the work in WP3, with the economy of the vulnerability markets as well as with pricing and financial features of the traded vulnerabilities. In Figure 2-6, we see a general view of today’s vulnerability markets and the involved actors/stakeholders, following the taxonomy proposed by Algarni and Malaiya in [1].


There are, mainly, three types of stakeholders: Vulnerability Producers (discoverers/sellers), Vulnerability Markets and the brokers who manage them, and Vulnerability Consumers (buyers). We also include Attack Resolutions in the picture, since they play a central role in the economics of cybersecurity (e.g. in defining the values of vulnerabilities or the damage inflicted on a company).

As we will see in the next section, there are two types of vulnerability discoverers: those who discover vulnerabilities as part of their job in a company and those who operate in a freelance fashion. The first category is the one that has a direct and deeper impact on the economy of cybersecurity as well as on the value of sold vulnerabilities and exploits.

With respect to the markets, we have two types: regulated and unregulated. As their names imply, the regulated markets obey the normal market rules and operate within the legal financial system of countries. These markets are public and open to anyone, with financial transactions taking place (as a rule) according to national and international legislation. The unregulated markets, on the other hand, are more “cryptic”. Usually, they do not openly publicise their existence and operate among anonymous stakeholders. In general, these markets are illicit and support cybercrime activities such as the trading of illegal goods and money laundering (mainly through virtual currency, such as Bitcoin).

Then we have the vulnerability consumers, i.e. the buyers or users. They include software developers, hacktivists, governmental agencies and hackers. Among the stakeholders, hackers are considered the most malicious vulnerability user group, as they invariably use the vulnerabilities they buy or discover for illicit purposes.

Finally, we have the attack resolution stage, which is the last one in the vulnerability life-cycle. Most often, vulnerabilities are corrected and software updates that are free from them are made available. The corrected vulnerabilities are archived, and their history, along with the accompanying corrective measures, is available to the public.


[Figure 2-6 diagram labels: Producers; Vulnerability Markets (regulated, unregulated); Consumers; Attack resolutions]

Figure 2-6: A generic vulnerability market and involved actors’ taxonomy


In the next sections, we will use this taxonomy as a basis and expand on its constituents according to the needs of the project.

2.3 Vulnerability producers (discoverers)

There are, generally, two types of vulnerability discoverers with respect to their position in the software industry: internal and external to the software vendor affected by a discovered vulnerability. Both target personal profit, either by satisfying their organisation’s needs (internal) or by selling their discovered vulnerabilities to third parties (external). Vulnerability discovery is, in general, a legal activity, while the exploitation of a vulnerability for malicious or illegal acts is, of course, prohibited by law. People active in legal vulnerability discovery operate in the White and Grey markets (see below), while people who engage in illegal actions through vulnerability discovery operate in the Black Market. As we are going to see in this section, the prices for vulnerabilities and their exploits sold to industrial players in the software business, as well as to governmental agencies, can reach $250,000 and, depending on the vulnerability’s characteristics, even the $1,000,000 level (see, e.g., [12,14]).

Each of the vulnerability markets that we examine in this deliverable, and that will be the target of the automated analysis and Human Intelligence methodologies of WP5, has its own attractive features for vulnerability discoverers and buyers, depending on their goals. The markets provide a margin for profit both for the vulnerability discoverers (monetary reward or recognition and fame) and for the vulnerability buyers (the vendors can use the vulnerability information to strengthen their products and avoid financial and reputational loss). In the following sections, we examine the vulnerability markets based on the taxonomy introduced in [1], with suitable adaptations and extensions for our project’s purposes. Along with our description, we give our first impressions, as well as our views on which information sources we can deploy (for both technical and financial vulnerability related data) for the purposes of our project and, especially, in the context of the automated analysis of WP5.

2.4 Vulnerability markets

2.4.1 White markets

2.4.1.1 Publicity

This is not a market, per se, but the first step in entering a vulnerability market. The main purpose of this “market” is to establish someone as knowledgeable about cybersecurity issues and vulnerabilities through publishing information about one’s own activities, achievements and knowledge. It encompasses blogs, Twitter and other Social Networking accounts, discussion forums, and national/international CERTs (Computer Emergency Response Teams). In the context of our project, we especially target Social Networking as a valuable source of information about cybersecurity incidents. Many cybersecurity specialists as well as vulnerability hunters maintain social networking accounts and regularly publish information about their findings, actions, and results, sharing their knowledge with their followers. In the context of WP5, the project partners are already experimenting with Twitter accounts through a working prototype of the Social Network Analyser described in the project’s description for WP5, as applied to publicly available information on Twitter and other, tool-searchable, social media accounts (Facebook, for instance, blocks automated search engines). The results are automatically collected and archived for subsequent algorithmic analysis (for some more details, see the discussion in Section 6).

2.4.1.2 Captive

Again, this is not a market in the strict sense, since captive discoverers are not allowed to publish or sell the vulnerabilities they find in products: they are bound to disclose them only to their own organisations. Usually, this “market” consists of organisational environments within whose strict confines vulnerabilities are disclosed to the developers of the affected products.


2.4.1.3 Reward programs

Frequently, and on a permanent basis, software product publishers themselves initiate searches for vulnerabilities in their own products, offering (almost always) a monetary reward to the vulnerability finders. This market is most commonly called a bug bounty market. Through our research, we identified two main models that software companies employ for creating and maintaining bug bounty programs:

Self-managed: Companies create and maintain their bug bounty programs themselves, on their own website, calling for vulnerabilities in their products and offering (usually) monetary rewards or a “Hall-of-Fame” mention in a list of vulnerability finders. An excellent example of such a program is the one established by Drupal (see https://www.drupal.org/drupal-security-team/general-information), a well-known provider of a CMS (Content Management System). This software facilitates the development and maintenance of websites of various complexities, many of which are deployed in cybersecurity-critical environments. Thus, Drupal has created its own bug bounty program, which is explained on its website, with the support of its own cybersecurity team. This program also involves the cybersecurity team offering cybersecurity advice, as well as help, in developing secure web applications based on Drupal’s CMS. Bug bounty web pages of software developers, such as Drupal, will be a valuable information source for the tools that are being developed in the context of WP5.

Outsourced: Companies create and maintain their bug bounty programs through other, specialised, vulnerability search service providers. This model is, essentially, a “bug bounty outsourcing” model, with the usual advantages (mainly cost related) and disadvantages (mainly that someone else is in charge of the required service) of outsourcing solutions. A prominent provider of outsourced vulnerability discovery services, and one frequently appointed by companies, as we found out, is HackerOne (https://www.hackerone.com/). As advertised on its welcome web page, HackerOne is “The Most Trusted Hacker-Powered Security Platform”. Its role, as HackerOne states, is to receive and resolve critical vulnerabilities before they can be exploited. The goal of HackerOne is to provide a bug bounty and vulnerability disclosure platform for developers and software companies while attracting, in parallel, the interest of a large community of ethical hackers, as well as hacker-powered cybersecurity research programs, to discover vulnerabilities. As of August 30, 2017, HackerOne had run 880 bug bounty programs on behalf of its clients and had helped towards fixing 52,002 vulnerabilities, thus rendering them non-exploitable for malicious purposes, with $20.1M in bounties paid in total. In order to deliver these services, HackerOne is in contact with vulnerability discoverers (many of whom are hackers). One (among many) of the renowned software companies using HackerOne’s services is Adobe ([32]), while Dropbox has also turned to HackerOne after closing down its own vulnerability search site ([33]). It thus appears that the vulnerability search outsourcing model is gaining popularity. There are two programs offered by HackerOne:

• Non-Commercial Responsible Disclosure Programs.
• Commercial Bug Bounty Programs.

HackerOne offers its own "Exclusive Bug Bounty Programs" for interested companies. In recent years the cybersecurity business has seen an upsurge of interest in public or private bug bounty programs. In this model, manufacturers or vendors reward cybersecurity researchers who identify and report vulnerabilities in their own products, such as software, mobile applications, scripts, operating systems, protocols, services, or online-service web applications. The intended benefit of an official "Exclusive Bug Bounty Program" is to improve the cybersecurity of products through the disclosure of zero-day vulnerabilities or the unveiling of cyberthreats, in active cooperation with the manufacturer or vendor.

In Figure 2-7, we see the timeframe of a typical vulnerability discovery and reporting process in a reward program, as provided by HackerOne.


Figure 2-7: Generic vulnerability discovery and reporting process in reward programs

In Figure 2-8, we see the timeframe of the reward process that follows a successful vulnerability discovery and reporting process, as provided by HackerOne.

Figure 2-8: Generic vulnerability discovery reward process in reward programs

Web sites that offer vulnerability search services also prove valuable as information sources for the purposes of our project. HackerOne is one of the most prominent, owing to its apparent popularity (measured by the large number of participating companies, as well as the presence of major software product developers among them).

2.4.1.4 Security company

Several companies specialising in developing and licensing cybersecurity products also publish and run special “call for vulnerabilities” programs. The vulnerabilities gathered by these companies in the context of such programs are, principally, used to strengthen their own products and, thus, provide better services to their customers. Also, these companies may sell the acquired vulnerabilities to the vendors concerned under a suitable vulnerability reporting agreement. These companies tend not to sell the vulnerabilities they acquire to third parties. However, there are some third-party cybersecurity companies that buy these vulnerabilities and then sell them to the vendors concerned. Listed below, for instance, are some existing or discontinued programs as outlined by Algarni and Malaiya [1]:

• Secunia Vulnerability Coordination Reward Program (SVCRP). There were two special awards: most valued contributor and most interesting coordination report. However, as of August 16, 2013, this program was discontinued (see the relevant statement at https://secuniaresearch.flexerasoftware.com/community/research/svcrp/).

• Zero Day Initiative Rewards Program (ZDI – http://www.zerodayinitiative.com/about/benefits/): The Zero Day Initiative (ZDI) rewards vulnerability discoverers with points each time a submitted vulnerability is accepted. These points determine their ZDI status: bronze, silver, gold, platinum, and diamond. The monetary rewards can range from $1,000 up to $25,000.

• iDefence (acquired by Accenture) Vulnerability Contributor Program (https://www.accenture.com/us-en/service-idefense-security-intelligence): This is one of the oldest reward programs, and a few of the most eminent vulnerability discoverers have mentioned that they have collaborated with iDefence. Detailed reward information is not, however, available from the company, apart from some scattered references on the Web (e.g. https://www.darkreading.com/vulnerabilities---threats/idefense-awards-$50000-vulnerability-contributor-challenge-prize/d/d-id/1130194).

Cybersecurity service companies may also have their own internal vulnerability discoverers, in addition to resorting to programs such as the ones listed above. These discoverers’ mission is to promote the interests and businesses of their organisations. The role of cybersecurity companies that run vulnerability reward programs is illustrated by Adriel Desautels, co-founder of the cybersecurity group Secure Network Operations Software (SNOSoft) and chief technology officer of Netragard, who claimed that he established a number of legal deals between vulnerability discoverers and third parties focused on information on critical flaws in software ([39]). The importance of the programs run by cybersecurity companies lies in the following statement by Desautels ([39]): “One of the reasons why the hacking community is so frustrated with large corporations is because these corporations are making a killing off their research and they are not seeing fair value for their work.” Based on this statement, one can conclude that the White Market sector of cybersecurity company reward programs can affect the Grey and Black vulnerability markets. This is examined in the following section.

2.4.2 Online forums

Online forums provide the place where information is shared about vulnerabilities and exploits. In some cases, the exchange may not involve money; rather, the members (called hacktivists) have a special or private agenda to attack specific organisations “for the sport of it”. LulzSec (https://Twitter.com/lulzsec) was once a famous hacktivist group that attacked several user accounts and websites in different countries in 2011, until it ceased operation at the end of 2014. Also, Anonymous (http://anonofficial.com/) is a network of hacktivists located in different places in the world that agree on the same targets for attack. It is believed that these groups do not really have 0-day vulnerabilities in their possession, owing to their non-monetary orientation and their goal of giving information away for free (it would make little sense to reveal a very expensive 0-day vulnerability for free). In addition, over the last few years, hacktivists have organised gatherings in the form of non-profit hacker camps or conventions. One of the most successful of these took place in The Netherlands in August 2017, in the form of an outdoor hacker camp (https://sha2017.org/), the successor to a series of similar gatherings organised every four years in various places in the world.

2.4.3 Grey markets

In contrast with the programs openly launched directly by companies (e.g. concerned vendors or cybersecurity companies), this category comprises vulnerability brokers. As the characterisation “broker” implies, they act as intermediaries between vulnerability discoverers and governmental agencies or companies, and they buy as well as sell vulnerabilities. It has been stated that vulnerability brokers often ask for a commission for their services as high as 15% of the vulnerability selling price. The vulnerability broker therefore sells the vulnerability to whichever party pays the highest price, either the affected vendor or another organisation (mainly governmental). Of course, discoverers and buyers can engage in negotiation procedures (through the vulnerability broker) in order to agree jointly on a vulnerability price that satisfies them both (much like a Nash equilibrium in pure strategies, as discussed in Section 5.5 in the context of Game Theory). The main characteristic of the Grey Market, as the word “Grey” implies, is that it is only partially regulated, having only a generic set of governing rules of operation and not a strict business/market legislation under which it openly operates. Interestingly, several international government organisations are said to have become significant buyers over the last year. As expected, however, no governmental agency discloses (or comments upon) its vulnerability buying policies or which vulnerabilities (and at which prices) it has bought. We also could not locate such information, apart from some generic points of view and guesswork.

An example of a vulnerability broker was Vupen, which operated from 2004 until 2015. It ceased operation in May 2015 and its founders proceeded to establish a new vulnerability broker company, Zerodium (see Section 3.2 for more detailed information on its policies and operation, as deduced from the information made publicly available by Zerodium itself). Although Vupen no longer exists, its operation policy is typical of vulnerability brokers in general, and many elements of this policy were also adopted by Zerodium. Vupen, as vulnerability brokers do, could sell vulnerabilities to governmental agencies (which are the principal buyers of 0-day vulnerabilities and 0-day exploits) under the condition that the country belongs to NATO, ANZUS, or ASEAN. If the concerned vendor bids higher than the governmental agencies and buys the vulnerability, the acquired information will be used to patch the affected software products. However, if a governmental agency bids higher and acquires the vulnerability, it usually uses the information for military or intelligence purposes. In this context, many security researchers and experts believe that the Stuxnet malware was developed by governmental agencies to attack Iran's nuclear facilities back in 2010. Since vulnerability brokers’ principal buyers are governments, which are in a position to bid much higher than any company owing to their larger available funds, many experts fear that the Grey Market actually limits public vulnerability announcements and raises vulnerability prices to such levels that vendors do not have the opportunity to react to vulnerabilities before governmental agencies exploit them for their own (unpublicised) purposes.

Other than Zerodium, another influential vulnerability broker (operating at a 15% commission) is Grugq, a Bangkok-based security researcher operating as an individual. His job, like any other broker’s, is to act as an intermediary between governmental agencies (mainly US and European, for ethical reasons as well as greater profit margins) and vulnerability discoverers. He maintains a Twitter account (which will be one of our information sources for automated analysis) at https://twitter.com/thegrugq?lang=el.
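
The brokerage mechanism described in this section (the broker sells to the highest bidder and retains a commission) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the 15% rate is the upper figure quoted above, while the `Bid` type, the `settle_sale` function, the buyer labels and the bid amounts are hypothetical and do not describe any real broker's procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    buyer: str     # illustrative label, e.g. "vendor" or "agency"
    amount: float  # offered price in USD


def settle_sale(bids, commission_rate=0.15):
    """Select the highest bid and split it between broker and discoverer."""
    if not bids:
        raise ValueError("at least one bid is required")
    winner = max(bids, key=lambda b: b.amount)
    commission = winner.amount * commission_rate
    return {
        "buyer": winner.buyer,
        "price": winner.amount,
        "broker_commission": commission,
        "discoverer_payout": winner.amount - commission,
    }


# Hypothetical bids, chosen only to illustrate the mechanism.
result = settle_sale([Bid("vendor", 60_000.0), Bid("agency", 100_000.0)])
print(result)
```

The sketch deliberately ignores negotiation between the parties; it only captures the asymmetry discussed above, namely that the party with the larger budget obtains the vulnerability.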

2.4.4 Black markets

Black Markets, as the name implies, are neither regulated nor bound by normal market ethics and policies. The actors involved include virtually any individual or organisation, such as cybercriminals, cyberterrorists, or governmental agencies, which purchase (at high prices) 0-day vulnerabilities in order to conduct malicious actions against other countries.


The price paid to the vulnerability discoverers is said to be five to ten times the amount paid in the other known vulnerability markets, depending, of course, on the vulnerability characteristics (e.g. target, criticality, possible profit from exploiting it, etc.). Some price ranges for 0-day vulnerabilities in the black market, as estimated by experts in the field (since very little, if any, first-hand pricing information is publicly available), appear in the table below (see [1]):

Table 2-1: Price ranges for 0-day vulnerabilities

PRICE LIST FOR ZERO-DAY VULNERABILITY EXPLOITS

Products                          Minimum price for zero-day exploits “2011”   Minimum price for zero-day exploits “2013”
ADOBE READER                      $5,000 - $30,000                             N/A
MAC OSX                           $20,000 - $50,000                            N/A
ANDROID                           $30,000 - $60,000                            $100,000
FLASH OR JAVA BROWSER PLUG-INS    $40,000 - $100,000                           N/A
MICROSOFT WORD                    $50,000 - $100,000                           N/A
WINDOWS                           $60,000 - $120,000                           $40,000 - $250,000
FIREFOX OR SAFARI                 $60,000 - $150,000                           N/A
CHROME OR INTERNET EXPLORER       $80,000 - $200,000                           $200,000 - $500,000
IOS                               $100,000 - $250,000                          $50,000 - $200,000
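
For the automated processing envisaged in WP5, price ranges such as those in Table 2-1 first have to be converted from text into numbers. The following Python fragment is a minimal sketch of this step, assuming the ranges arrive as strings in the format shown in the table. The names `PRICE_TABLE` and `parse_range` are our own illustrative choices, and only the rows that have estimates for both years are included.

```python
import re

# Rows copied from Table 2-1 (2011 estimate, 2013 estimate).
PRICE_TABLE = {
    "Android": ("$30,000 - $60,000", "$100,000"),
    "Windows": ("$60,000 - $120,000", "$40,000 - $250,000"),
    "Chrome or Internet Explorer": ("$80,000 - $200,000", "$200,000 - $500,000"),
    "iOS": ("$100,000 - $250,000", "$50,000 - $200,000"),
}


def parse_range(text):
    """Return (low, high) in USD, or None when no amount is given (e.g. 'N/A')."""
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)", text)]
    if not amounts:
        return None
    # A single amount such as "$100,000" yields low == high.
    return (min(amounts), max(amounts))


# Compare the upper bounds of the two estimates for each product.
for product, (est_2011, est_2013) in PRICE_TABLE.items():
    old, new = parse_range(est_2011), parse_range(est_2013)
    ratio = new[1] / old[1]
    print(f"{product}: upper bound {old[1]} -> {new[1]} (x{ratio:.2f})")
```

Comparing upper bounds is only one possible choice; a financial model could equally use midpoints or lower bounds, depending on how the estimates are interpreted.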

As reported in [1], many governmental agencies and industrial players, such as the International Monetary Fund, Intel, the Indian Defense Ministry, as well as the Pacific Northwest National Laboratory, have been the target of malicious attacks based on 0-day vulnerabilities and exploits. It is well known that several countries in the world have programs whose goal is to develop effective cyber weapons to inflict damage on hostile countries. The agencies running these programs are key players in the black market and set the pricing standards very high (owing to competition among them), attracting, in this way, the attention of some of the very best vulnerability discoverers and hackers.

2.5 Vulnerability consumers (buyers)

At the final stage of the vulnerability discovery process we have the vulnerability consumers, who are the buyers of vulnerability information and exploits. They are either software vendors, who make efforts to identify vulnerabilities and develop patches, or organisations that intend to exploit the vulnerabilities in order to inflict damage or gain financial profit. In the case of governmental agencies, which are considered to be the principal buyers of 0-day vulnerabilities and exploits, the vulnerability information is used for espionage or intelligence purposes in order to gain advantage over other countries. As international power status is at stake and governments have virtual command over the distribution of governmental expenses, they are willing to pay high rewards to 0-day vulnerability discoverers. This deprives software vendors of the opportunity to buy vulnerability information, patch their products in a timely fashion, and save themselves (and their clients) from potentially huge financial damages. This is an important asymmetry in the 0-day vulnerability market, which should be taken into account in any financial model of vulnerability trading and of the financial impact of vulnerabilities on the market players.

2.6 Vulnerability resolutions (patches)

The end of the vulnerability’s life-cycle is found in the action of patching. This involves the company whose product is targeted by the vulnerability producing a solution that prevents the vulnerability from being exploited to inflict damage on the company and its clients or to gain illegal profit. Valuable information related to the vulnerability and 0-day vulnerability markets can be gained from observing the patch management and processes of an individual company, specifically through the number of patches, their distribution over time, as well as the number of products. This is in addition to the apparent usefulness of patches to the company and its customers. The characteristics of patches reveal information that sheds light on several aspects of a company, such as loyalty to customers, reactiveness, and responsibility, qualities that can result in turning financial losses due to vulnerability announcements (accompanied, perhaps, by damage to the company’s or to the company’s customers’ assets) into financial gains. Turning to one of our major information sources, the CVE database, for instance, we can examine the information in Figure 2-9, obtained when searching for “Microsoft” related vulnerability patches (the query was conducted on 13/9/2017, but it conveys the general trend with respect to vulnerabilities and patching):

Figure 2-9: Company specific vulnerability and patch related information

In this table, we find useful information whose interpretation can lead to interesting conclusions with respect to vulnerability responsiveness. The information provided in the “OVAL Definitions” columns is the following (see https://oval.mitre.org/repository/about/overview.html), which is also provided in XML format for algorithmic processing:


Table 2-2: OVAL repository vulnerability tests

OVAL Vulnerability Definitions    Tests that determine the presence of vulnerabilities on systems.
OVAL Compliance Definitions       Tests that determine whether the configuration settings of a system meet a security policy.
OVAL Inventory Definitions        Tests that determine whether a specific piece of software is installed on the system.
OVAL Patch Definitions            Tests that determine whether a particular patch is appropriate for a system.
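
Since OVAL definitions are distributed in XML, the four categories of Table 2-2 can be counted algorithmically. The sketch below uses Python's standard `xml.etree.ElementTree` module and relies on the `class` attribute that the OVAL definitions schema places on each `definition` element. The embedded sample document is hand-written and heavily simplified (real definitions also carry metadata and criteria), the identifiers in it are invented for illustration, and `count_by_class` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the OVAL 5 definitions schema.
OVAL_NS = "http://oval.mitre.org/XMLSchema/oval-definitions-5"

# Hand-written, simplified sample; ids are illustrative, not real entries.
SAMPLE = f"""<oval_definitions xmlns="{OVAL_NS}">
  <definitions>
    <definition id="oval:example:def:1" version="1" class="patch"/>
    <definition id="oval:example:def:2" version="1" class="patch"/>
    <definition id="oval:example:def:3" version="1" class="vulnerability"/>
    <definition id="oval:example:def:4" version="1" class="inventory"/>
    <definition id="oval:example:def:5" version="1" class="compliance"/>
  </definitions>
</oval_definitions>"""


def count_by_class(xml_text):
    """Count OVAL definitions per class (patch, vulnerability, ...)."""
    root = ET.fromstring(xml_text)
    tag = f"{{{OVAL_NS}}}definition"
    return Counter(d.get("class") for d in root.iter(tag))


counts = count_by_class(SAMPLE)
for oval_class, n in sorted(counts.items()):
    print(f"{oval_class}: {n}")
```

Applied to a real repository export, a count of this kind would reproduce the per-class totals of the sort shown in the “OVAL Definitions” columns of Figure 2-9.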

The OVAL Repository is a central database where people in the vulnerabilities community discuss and disseminate information about vulnerabilities in the form of OVAL definitions. These are standardised (in XML), machine-readable (and CVE database compatible) tests written in the Open Vulnerability and Assessment Language (OVAL) that can test computer systems for the presence of known software vulnerabilities, problematic configuration settings, vulnerable software, and installed company patches. With respect to the information in Figure 2-9 (which is used as an example for our purposes in this deliverable), we focus on the column “Patch Definitions” and the “Windows 7” row. Clicking on the relevant cell in the matrix (with the number “92”) transfers us to a screen with 92 links, each leading to information and further links about tests that decide whether a patch published by the affected company, which removes a specific vulnerability, exists in a system. Given that the number of vulnerabilities for Windows 7 known to the database is 426, we find that 92 of them were successfully patched. Most probably more have been patched; these can be located accurately by searching the Microsoft vulnerability patch web page (https://technet.microsoft.com/en-us/security/bulletins.aspx) for further analysis and greater precision. Nevertheless, 92 is a lower bound on the number of patches that Microsoft has produced for Windows 7 alone, which demonstrates a good level of responsiveness and responsibility. This information is exploitable within the scopes of WP3 (financial aspects of cybercrime) and WP5 (automatic processing of vulnerability related information feeds) in order to draw useful conclusions about how patches affect a company’s financial performance, as well as its attractiveness as a target for vulnerability discoverers.

On the other hand, a very interesting vulnerability report by Flexera Software in [8] reveals a two-sided reality with respect to vulnerability responsiveness by affected companies. In two of the report’s figures, provided here as Figure 2-10 and Figure 2-11, we see a rather surprising fact about vulnerabilities and patches. As we see in Figure 2-10, 81% of all vulnerabilities registered by Secunia Research in 2016 (which provided the data for the vulnerability review) had patches available on the day of their disclosure (just slightly less than the 84.5% for the year 2015, shown in the same figure).


Figure 2-10: Patch availability for vulnerabilities in all products (2011 – 2017)

Let us now examine the Top 50 applications list cited within the report, which contains the 50 most popular applications by market share. These 50 applications, which are also the ones most commonly found on users’ computers (either personal or corporate), include 35 Microsoft applications and 15 non-Microsoft applications. For the complete Top 50 applications list, one can consult the report, as it is not necessary to list its contents to make our point. For these applications, as shown in Figure 2-11, 92.5% of vulnerabilities had a patch available on the day of disclosure in 2016, the same figure as for the year 2015.

Figure 2-11: Patch availability for vulnerabilities in Top-50 products in the market plus Windows 7 (2011 – 2017)


The figure for the 2016 time-to-patch time frame shows that about one fifth of the total number of reported vulnerabilities (i.e. 19% of all the vulnerabilities) were left without patches beyond the first day, i.e. the day on which the vulnerability was disclosed. As commented in [8], this percentage is considered a typical fraction of software products that are not subjected to immediate remedial action with appropriate patches. This delay can be attributed to lack of resources on the vendor’s side, to lack of timely information about the imminent vulnerability, or to the fact that it was a 0-day vulnerability which required more time to remedy. With respect to the Top 50 applications, this percentage falls below one tenth (around 7.5%), which demonstrates that popular products are better supported by timely patches. Most importantly, with respect to the vulnerability markets discussed in Section 2.4, the research in [8] makes a very interesting point. The fact that 81% of vulnerabilities in all software products and 92.5% of vulnerabilities in the Top-50 products in 2016 received an appropriate patch on day 1 (the day of vulnerability disclosure), especially compared with 65% in the year 2011, suggests a shift in vulnerability discoverers’ practices. Most probably, the continuously improving time-to-patch rate can be attributed to the fact that researchers decide to share their vulnerability discoveries with affected vendors and vulnerability programs (e.g. bug bounties). This, in turn, gives rise to immediate availability of patches for most of the discovered vulnerabilities.

3. 0-Day vulnerabilities and Deep Web markets

3.1 0-day vulnerabilities

As suggested in [3,10], in order to define the concept of a “zero-day”, or 0-day, vulnerability one must first define what is meant by “zero”. In other words, one must define the time instance which is considered as the first day of the vulnerability’s existence. Is it the time instance when the vulnerability is discovered and announced, or is it the time when it is used for the first time without being noticed? Also, how many times can a 0-day vulnerability be exploited before it is no longer considered a 0-day vulnerability? Most of the people interviewed in the research conducted in the context of [3] appear to agree with the following definition (quoted from [3]):

0-day (zero-day): the term 0-day exploit describes an exploit that is not publicly known. It describes tools by elite hackers who have discovered a new bug and shared it only with close friends. It also describes some new exploit for compromising popular services (the usual suspects: BIND, FTP services, Linux distros, Microsoft IIS, Solaris servers).

It is natural that many 0-day exploits are first discovered by their targets when hackers use them. In this context, the term “0-day” can describe the fact that the value of the corresponding exploit reduces significantly upon its announcement. This reduction can even be exponential, as remarked in [3]. For instance, on the day after the announcement an exploit may be only half as valuable; on the second day it may retain ¼ of its 0-day value; and ten days later it may be 1/1000 as valuable as on day zero, or even totally worthless. Thus, according to this line of thinking and the standard terminology, a 0-day exploit characterises a vulnerability that is being exploited before anything about the vulnerability or its exploit is announced or becomes public knowledge. Again, from the research conducted in [8] we have the interesting data appearing in Figure 3-1, which shows the 0-day vulnerabilities registered by Secunia in 2016.
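
The exponential reduction mentioned above corresponds to a value that halves every day: after t days an exploit retains a fraction 2^(-t) of its day-zero value, which gives 1/2 after one day, 1/4 after two days and 1/1024 (roughly 1/1000) after ten days. The short Python sketch below expresses this reading; the one-day half-life is inferred from the example figures and is an assumption, not a measured market parameter, and the function name is our own.

```python
def exploit_value(day_zero_value, days_since_disclosure, half_life_days=1.0):
    """Value of an exploit that halves every `half_life_days` after disclosure."""
    if days_since_disclosure < 0:
        raise ValueError("days_since_disclosure must be non-negative")
    return day_zero_value * 0.5 ** (days_since_disclosure / half_life_days)


# Fraction of the day-zero value retained on selected days.
for day in (0, 1, 2, 10):
    print(day, exploit_value(1.0, day))
```

A financial model could fit `half_life_days` to observed price data instead of fixing it at one day.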


Figure 3-1: 0-day vulnerabilities registered by Secunia (year 2016)

We first observe that the number of 0-day vulnerabilities discovered in 2016 has decreased in comparison with the number for 2015 (row labelled “ALL” in the figure) with 19 of the 22 0-day vulnerabilities affecting products in the Top 50 product list, as compared with the number 23 the previous year. The fact that so many 0-days have been discovered within a time span of three consecutive years is rather significant, when one considers the critical role of 0-day vulnerabilities as potential attack vectors in Advanced Persistent Threat attacks (see [8]).

3.2 Pricing information on 0-day vulnerabilities and exploits

Information on 0-Day vulnerabilities and exploits is, as expected, hard to locate and, even if such information is discovered, it is usually scant or not accurate enough. Information is especially scarce with respect to the pricing trends and models for such vulnerabilities. Trying to elicit pricing information by contacting sources in the Deep Web is hopeless, since either it is impossible to make a reliable contact or the information given is misleading. However, one source that publishes such information in sufficient detail to understand the 0-Day market or, at least, some ways of locating and pricing 0-Day vulnerabilities, is Zerodium (https://zerodium.com). Zerodium first published a price list for 0-day vulnerabilities and exploits in November 2015, and it was the first time that such a price list appeared in public. We could say, after recent investigations, that Zerodium remains a unique source of price information on 0-Days. Zerodium, as it states on its web page, is “The premium acquisition program for zero-day exploits and advanced cybersecurity research”. It thus acts as a buyer of 0-day vulnerabilities and exploits. As an exploit acquisition platform, Zerodium publishes, on a regular basis, price charts for different classes of digital intrusion techniques and software targets that it buys from hackers and later resells to its clients through a subscription service. The 0-Day price ranges provided by Zerodium, although not exact, are of interest because they are continuously updated and, thus, indirectly provide important information about the evolution of the Deep Web vulnerability markets as well as about vulnerability prices. In addition, the prices reflect how vulnerable software products are at the time the price updates are published, and how difficult it is to provide 0-Day exploits that attack them and, consequently, the enterprise/user that deploys them. It is especially interesting to compare price information across consecutive updates, since an update implies some changes in the 0-day vulnerability market and/or some increase/decrease in the security and robustness of the listed products. Our project will adopt the pricing information provided by Zerodium as one of the information sources that will be utilised in the financial models for 0-day vulnerability pricing within the context of WP3, while it will also be one of the information feeds required by the tools developed within WP5. We will now focus on this pricing information and provide directions towards exploiting it for the purposes of our project, especially the work performed in WP3 and WP5. The pricing lists we cite were located on Zerodium’s site on August 28, 2017.

Figure 3-2: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (August 28, 2017)


Figure 3-3: Zerodium pricing list for desktop/server 0-Day vulnerabilities/exploits (as of August 28, 2017)

Figure 3-4: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (September, 2016)


Figure 3-5: Zerodium pricing list for mobile phone 0-Day vulnerabilities/exploits (November, 2015)

To confirm these prices (at least, the order of magnitude), we cite another, independent, information source (from 2012) and the corresponding vulnerability price table (see [34]):

Table 3-1: 0-day vulnerability prices (from Forbes)

Beyond the 0-day vulnerability price lists themselves, the updates to these lists are of particular interest when considered together with their dates of publication, since they carry valuable information about the software products affected by the updates. For instance, Table 3-2 shows a recent update to the 0-day vulnerability price list published by Zerodium.

Table 3-2: Pricing list updates by Zerodium (published on August 23, 2017)

Modification / Details

New Entries (Mobiles)
    $500,000 - Messaging Apps RCE + LPE (SMS/MMS, iMessage, Telegram, WhatsApp, Signal, Facebook, Viber, WeChat)
    $500,000 - Default Email Apps RCE + LPE
    $150,000 - Baseband RCE + LPE
    $150,000 - Media files or documents RCE + LPE
    $100,000 - Sandbox escapes, code signing bypass, LPE to Kernel, WiFi RCE + LPE, SS7
    $XX,000 - Other exploits for Mobiles

Modified Entries (Mobiles)
    $1,500,000 - Apple iOS Remote Jailbreak + Persistence (Zero Click). Must be remote and without any user interaction
    $1,000,000 - Apple iOS Remote Jailbreak + Persistence with user interaction e.g. clicking a link or opening a file

New Entries (Servers/Desktops)
    $300,000 - Windows 10 RCE (Zero Click) i.e. remote exploits targeting default Windows services e.g. SMB or RDP
    $150,000 - Apache Web Server (on Linux) and Microsoft IIS RCE
    $100,000 - Microsoft Outlook RCE
    $80,000 - Mozilla Thunderbird RCE
    $80,000 - VMware ESXi Guest-to-Host Escape
    $30,000 - USB Code Execution i.e. after inserting a generic USB key/drive
    $10,000 - Routers RCE

Increased Payouts (Servers/Desktops)
    $150,000 - Chrome RCE + LPE (Windows) including sandbox escape (previously: $80,000)
    $100,000 - PHP RCE (previously: $50,000)
    $100,000 - OpenSSL RCE (previously: $50,000)
    $100,000 - Microsoft Exchange Server RCE (previously: $40,000)
    $100,000 - Firefox/Tor RCE + LPE (Linux) including sandbox escape (previously: $30,000)
    $80,000 - Firefox/Tor RCE + LPE (Windows) including sandbox escape (previously: $30,000)
    $50,000 - Sendmail, Dovecot, Postfix, MS Office, and Antivirus RCEs (previously: $40,000)
    $50,000 - WordPress RCE (previously: $10,000)

Reduced Payouts (Servers/Desktops)
    $10,000 - Antivirus LPE (previously: $40,000)

Deleted Entries (Servers/Desktops)
    Internet Explorer RCE (previously: $30,000)

Let us first consider the last entry of the table, which states that Zerodium is no longer interested in the Internet Explorer RCE (Remote Code Execution) vulnerability, which was previously priced relatively high. This may imply several things about this software product, such as the following:

• The vulnerability no longer exists due to a recent patch provided by the software product owner that prevents such a vulnerability (i.e. Remote Code Execution in Microsoft’s Internet Explorer).
• There is no longer sufficient interest from buyers for some reason (e.g. shift of focus, politics in the ICT realm etc.).
• There has been no offer for a corresponding exploit for several months.

From the information provided in [35], we conclude that this vulnerability was handled effectively by July 11, 2017 (when this information was published) or earlier. This led Zerodium to stop searching for a relevant 0-day vulnerability. Similar remarks hold for the “Reduced Payouts” entry. However, for the purposes of our project, of special interest is the information provided by the table entry “Increased Payouts”, with some increases surpassing 100% of the original price.
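The size of each increase can be computed directly from the “Increased Payouts” rows of Table 3-2. The following minimal Python sketch is only an illustration of the kind of comparison between consecutive price updates that the WP5 tools could automate; the dictionary layout and the function name `percent_change` are our own assumptions, while the figures are those of the table.

```python
# Payouts in USD taken from the "Increased Payouts" rows of Table 3-2,
# stored as (new payout, previous payout). The layout is an assumption.
increased_payouts = {
    "Chrome RCE + LPE (Windows)": (150_000, 80_000),
    "PHP RCE": (100_000, 50_000),
    "OpenSSL RCE": (100_000, 50_000),
    "Microsoft Exchange Server RCE": (100_000, 40_000),
    "Firefox/Tor RCE + LPE (Linux)": (100_000, 30_000),
    "Firefox/Tor RCE + LPE (Windows)": (80_000, 30_000),
    "Sendmail, Dovecot, Postfix, MS Office, Antivirus RCEs": (50_000, 40_000),
    "WordPress RCE": (50_000, 10_000),
}


def percent_change(new, old):
    """Relative change of a payout, as a percentage of the previous payout."""
    return 100.0 * (new - old) / old


changes = {
    name: percent_change(new, old)
    for name, (new, old) in increased_payouts.items()
}

# List the entries from the largest to the smallest relative increase.
for name, pct in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:55s} {pct:+7.1f}%")
```

Sorting by relative rather than absolute change highlights products whose exploits became markedly more sought after between two updates, even when the absolute payout remains modest.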


3.3 0-day vulnerability markets

Our survey in this section draws material from [3] and [10], as well as other cited published information sources, since first-hand information from 0-day vulnerability traders is not readily available or easy to access.

3.3.1 White Markets

Discoverers of 0-day vulnerabilities have various options for disclosing them. One option is to disclose the vulnerability after the vendor has released the corresponding patch, which is termed coordinated disclosure. For instance, as discussed in the interview of Feliciano Intini in [3], after the patch has been released the discoverer can present the vulnerability at a security conference, or the vendor can credit the discoverer by adding his/her name to the patch announcement. Alternatively, the discoverer can share the vulnerability without the vendor’s knowledge, in which case the vendor has no opportunity to release a timely patch. Finally, the discoverer can monetise the effort and work invested in discovering the vulnerability by selling it on the market. This case has become increasingly frequent in recent years, as the number of people who view vulnerability discovery (especially 0-day vulnerability discovery) and hacking as profit-making activities is increasing. Markets where 0-day vulnerability transactions take place include the White Market, the Black Market and the Government Market. White Markets, which are legal markets that anyone can reach through normal web search and contact means, mainly include security contests and bug-bounty programs in which vendors offer payments to researchers for the 0-day vulnerabilities they discover. Pricing a 0-day vulnerability is a rather complex, multi-parameter problem, which is addressed in further detail in Section 5 of this deliverable.

In general, a 0-day vulnerability can be likened to an immaterial asset, such as information, whose value is governed by the laws of demand and supply. Thus, one should conduct a careful cost-benefit analysis, weighing the cost of identifying a vulnerability against the expected profit, before embarking on 0-day vulnerability discovery for profit. The vulnerability discoverer should be aware that the value of a 0-day vulnerability can fluctuate widely, depending on many conditions and parameters, and that it can drop abruptly from very high to almost zero. If, for instance, the vulnerability is also discovered and traded by others, or if the vendor suddenly releases a patch, the value of the vulnerability may drop to zero. Consequently, the discoverer bears the risk of financial loss due to unrewarded person hours (and even costs for buying tools and equipment) spent on the discovery of the vulnerability. For some critical vulnerabilities, as can be seen in Section 3.2, the price of a 0-day vulnerability can reach hundreds of thousands of dollars and above, but the potential gain can be reduced to nothing if one spends too much time settling on the optimal price or locating a trusted buyer. To illustrate this, we present below two contrasting cases, one successful and one unsuccessful, of efforts to capitalise on a discovered vulnerability, as reported in [20]. Table 3-3 shows how a successful vulnerability sale ended at a price of $50K (see [20] for more details).

Table 3-3: Steps in a successful vulnerability sale

Date        Action
6/05        Vulnerability discovered.
11/07/05    Submitted to prepub review at NSA.
7/27/06     Approved for release by prepub review.
7/27/06     Offered to government.
8/10/06     Verbally agreed to $80K conditional deal.
8/11/06     Exploit given for evaluation.
8/25/06     Hash of exploit published.
8/28/06     Agreed to lesser amount.
9/8/06      Paid.

On the other hand, and more interestingly for our study of vulnerability markets, Table 3-4 shows a case of an unsuccessful vulnerability sale. The discoverer, as he himself reports in [20], estimated that this vulnerability might be worth $50,000 to the government or $20,000 to a private company. He received a few offers and had agreed on a price of $12,000, offered by a security company. However, the sale was not completed because it became known that the vulnerability had been patched. For clarity, the patch in question corrected a security vulnerability in Microsoft Office 2003 that could allow arbitrary code to run when a maliciously modified file is opened.

Table 3-4: Steps in an unsuccessful vulnerability sale

Date        Action
1/20/07     Vulnerability discovered.
1/25/07     Offered to government.
1/28/07     Exploit finished.
2/10/07     Offered to security companies.
2/13/07     Patched – KB929064.
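The time pressure in the two cases can be made explicit by computing the number of days between the key events of Tables 3-3 and 3-4. The following short Python sketch is an illustration only: the dictionary keys and the helper `days_between` are our own naming, and the patch date of the unsuccessful sale is read as 13 February 2007, which is the reading consistent with the order of the other entries of Table 3-4.

```python
from datetime import date

# Dates from Table 3-3 (successful sale); the source uses M/D/YY.
successful = {
    "offered": date(2006, 7, 27),
    "paid": date(2006, 9, 8),
}

# Dates from Table 3-4 (unsuccessful sale). The patch date is assumed
# to be in 2007, consistent with the chronology of the other rows.
unsuccessful = {
    "discovered": date(2007, 1, 20),
    "offered_to_companies": date(2007, 2, 10),
    "patched": date(2007, 2, 13),
}


def days_between(start, end):
    """Number of calendar days from start to end."""
    return (end - start).days


print("Successful sale, offer to payment:",
      days_between(successful["offered"], successful["paid"]), "days")
print("Unsuccessful sale, discovery to patch:",
      days_between(unsuccessful["discovered"], unsuccessful["patched"]), "days")
print("Unsuccessful sale, offer to companies to patch:",
      days_between(unsuccessful["offered_to_companies"],
                   unsuccessful["patched"]), "days")
```

Such intervals are one simple way of quantifying how much negotiation time a discoverer actually has before a vulnerability loses its value.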

This last case demonstrates the principal cause of failure of the White Market, i.e. the legitimate vulnerability market: the discoverer had difficulty locating an interested buyer. Although a sale can be completed very quickly, within a couple of days, once the right buyer is located, it is difficult to know where to look in order to contact the right buyers. Moreover, in White Markets there is also a difficulty in agreeing on the right price. Table 3-5, taken from [20], shows some vulnerability price estimates as well as the variation and uncertainty that may exist in such estimates.

Table 3-5: Some estimates on exploit values

Vulnerability/Exploit            Value
“Some exploits”                  $200,000 - $250,000
Significant, reliable exploit    $125,000
Internet Explorer                $60,000 - $120,000
Vista exploit                    $50,000
“Weaponized exploit”             $20,000 - $30,000
ZDI, iDefense purchases          $2,000 - $10,000
WMF exploit                      $4,000
Microsoft Excel                  ≥ $1,200
Mozilla                          $500

Buyers (or consumers) of vulnerabilities, for their part, protect their investment (i.e. the purchased vulnerability) by taking the following considerations into account. First of all, the vulnerability must be technically effective and exploitable, meaning that it can have the effect it is supposed to have. Second, the buyer should have exclusive rights to the vulnerability, which prevents discoverers from reselling it (although this requires an honest discoverer). Reselling a vulnerability may increase a discoverer’s immediate profits (while decreasing the buyers’ profits, since the vulnerability loses its value), but it decreases her/his future profits, since reselling is considered an unethical and punishable action in the 0-day vulnerability markets. Thus, an exclusive rights agreement must accompany each transaction on 0-day vulnerabilities. Vulnerability selling and buying are founded on mutual trust between the buyer and the seller; as in any other business, but to a greater degree, trust and reputation on both sides are a fundamental requirement, since such transactions are not normally governed by clear-cut trade legislation and pricing rules. Alternatively, 0-day vulnerability trading can be conducted through an intermediary company, or vulnerability broker (see 2.4.1.4), which offers payment to vulnerability discoverers if they sign an exclusive rights agreement with the company.

Our preliminary research has confirmed the common knowledge that there is no uniform pricing for 0-day vulnerabilities in the vulnerability markets, with only price ranges being available. There are, however, various vulnerability-related parameters that one can consider before negotiating a price, as discussed in the research results described in [10]:


• What is the potential impact (financial gains, inflicted damage etc.) from the use of the vulnerability and its related exploits?
• How widespread is the use of the vulnerable application (i.e. a rough estimate of the size of the existing user base; see [36] for some estimates)?
• Is the application part of the operating system, or does it come as an add-on or extra application?
• Is the application switched on (i.e. active) by default?
• Is authentication required to use the application?
• How well do typical firewall systems and policies block access to the application?
• What versions of the operating system/application are vulnerable?
• Does the vulnerability target a server or a client application/system?
• Is user interaction required for exploiting the vulnerability?
• How difficult is it to discover the vulnerability (this serves as a crude estimate of how long it will be before some other researcher discovers the vulnerability or, alternatively, how much time is available for negotiating a price in the vulnerability markets)?
• How many people know about the vulnerability (this is very difficult to know for 0-day vulnerabilities; also, it is very rare for a vulnerability researcher to share information about her/his findings or goals)?
• How reliable and effective is the exploit corresponding to the vulnerability?
• Does a single exploit work against many versions of the vulnerable application?

One can think of several other parameters, which will be investigated in the context of the SAINT project along with the ones above, such as the following:

• Which OS does the vulnerability target?
• Does the vulnerability target a mobile platform?
• What are the statistics, with respect to vulnerabilities and their severity, for the vulnerability target (e.g. are vulnerabilities rare or frequent)?
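For the data acquisition tools, the parameters listed above could be captured in a structured record, so that each collected vulnerability can later be fed into a pricing model. The following Python sketch shows one possible layout under our own assumptions: the class name `VulnerabilityPricingRecord`, its field names and the helper `to_feature_row` are hypothetical and cover only a subset of the questions, and the example values are placeholders, not real data.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class VulnerabilityPricingRecord:
    """Hypothetical record of some of the pricing parameters listed above."""
    product: str
    target_os: str
    estimated_user_base: Optional[int]        # None when no estimate exists
    part_of_os: bool
    enabled_by_default: bool
    targets_server: bool
    targets_mobile: bool
    authentication_required: Optional[bool] = None
    user_interaction_required: Optional[bool] = None
    exploit_reliable: Optional[bool] = None
    works_across_versions: Optional[bool] = None


def to_feature_row(record):
    """Flatten a record into numeric features.

    Booleans become 0/1, numbers are kept, unknown values stay None and
    free-text fields (product, target_os) are left out.
    """
    row = {}
    for key, value in asdict(record).items():
        if isinstance(value, bool):           # test bool before int
            row[key] = int(value)
        elif value is None or isinstance(value, (int, float)):
            row[key] = value
    return row


# Placeholder values, used only to show how a record is filled in.
example = VulnerabilityPricingRecord(
    product="ExampleApp",
    target_os="ExampleOS",
    estimated_user_base=None,
    part_of_os=True,
    enabled_by_default=True,
    targets_server=True,
    targets_mobile=False,
    user_interaction_required=False,
)
print(to_feature_row(example))
```

Keeping unknown answers as `None`, rather than guessing a default, reflects the fact that several of these parameters are rarely known for 0-day vulnerabilities.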

As stated by Cesar Cerrudo in one of the experts’ interviews discussed in [3], “0-days value depends on what product is affected and how many people and/or servers run that product. 0-days for widely used software will be the most valuable. The highest value depending on the most valuable 0-days are those that will let you to remotely compromise servers without authentication and also vulnerabilities in client-side software such as Internet Explorer, Adobe Reader, Microsoft Office, etc.” Such information is widely available today (e.g. in the CVE and NVD databases; see the discussion in Section 2) and may serve as a good information basis for modelling vulnerability pricing and negotiations, especially for 0-day vulnerabilities. These information items will be of importance in the analysis that will be conducted in the context of WP5.

3.3.2 Grey Markets

Besides White and Black Markets, there is a market in the 0-day vulnerability terrain that can be defined as the Government Market or Grey Market. This view of Grey Markets complements the view discussed in Section 2, where Grey Markets were composed mainly of vulnerability brokers. In [3], an interviewee who prefers to stay anonymous stated that the US government buys 0-day vulnerabilities not for defensive reasons but, rather, to attack. The emails stolen by the hacktivist group Anonymous from the private company HBGary, which specialises in IT security, seem to support this statement about governmental activities, which are probably typical for many countries. Some of the exchanged emails provide information on 0-day vulnerabilities and their exploitation, with some of the emails containing attachments with the vulnerabilities themselves in plaintext format. Ironically, the emails were stolen by exploiting a vulnerability caused by a server configuration error, which allowed a successful SQL injection attack. The recipients (i.e. vulnerability buyers or brokers) ranged from well-known private companies in the ICT industry to governmental agencies, and the emails were addressed to key persons in each organisation.

The governments of certain countries are known to organise training programs for hackers with a view to employing them in intelligence agencies for intelligence and cyberwar purposes. Several examples of such programs are provided in [3] and [10]. In addition, the purchase of 0-day vulnerabilities is an important task that governments pursue in parallel to this training, both for securing their own ICT infrastructure and for inflicting damage on other countries’ infrastructures. Due to the high strategic importance of 0-day vulnerabilities, the Grey Market involving governments is very profitable for vulnerability discoverers.

3.3.3 Black Markets

Finally, 0-day vulnerabilities are also traded in Black Markets. These are markets for illegal products and illicit services (which include drugs, weapons, stolen credit cards and 0-day vulnerabilities) where trading operations occur in two principal ways: (i) through Web contacts and online sales, and (ii) through physical meetings between trading parties. According to the research conducted in [3] and [10], the sale of 0-day vulnerabilities takes place in more secluded Black Markets, in ways that are not clearly documented or publicly available. Even experts’ opinions on this issue are mixed (see [3]). One of the experts who agreed to talk without anonymity and whose views are stated in [3], Boris Sharov, maintains that, in order to conduct transactions for 0-day vulnerabilities in the Black Market, one first needs to be introduced to specific Darknet forums by key persons trusted by the 0-day vulnerability discoverers and sellers. Moreover, according to the experience of one of the principal partners of the SAINT project, CYBE, accessing the Black Market of 0-day vulnerabilities requires going through specific intermediary (broker) companies.

One of the elements that makes the Black Market very attractive for 0-day vulnerability sellers is the high profit potential. Pedram Amini, a security researcher, for example, states that on the Black Market the prices for individual 0-day vulnerabilities can range from $20,000 to $100,000, with an average price of around $50,000 (see the full discussion in [3]); this statement agrees with the pricing information that we discovered during our research for this deliverable (see Section 2.4). As is also reported in [3], it is possible for a vulnerability discoverer to sell the same 0-day vulnerability first on the White or Grey Market (e.g. to a governmental agency) and then on the Black Market. However, discoverers displaying such deceptive behaviour may face detrimental consequences, both with respect to their reputation in the market (she or he may be expelled from a highly selective and profitable market) and with respect to the law.

Finally, in the context of obtaining and investigating global data related to Black Market visibility and access patterns for 0-day vulnerabilities, one can use, as suggested in [10], the online tool Shodan (https://www.shodan.io/). This powerful tool can provide several types of raw information of interest for our project, and is used, as Shodan states on its Web page, by “researchers, security professionals, large enterprises, CERTs and everybody in between”. The tool essentially crawls the IoT (Internet of Things) on a 24/7 basis and identifies IoT devices of any kind, recording information about their connections and activities as well as their software and communications capabilities. The information provided by Shodan can possibly be used to identify how widespread and dangerous common vulnerabilities are, especially in connection with large vulnerability data sources such as CVE and NVD. For instance, the tool gave us the following tables (on September 10, 2017) when we asked it to provide an estimate of the number of devices and software tools/applications in the world which are susceptible to the vulnerability exploited by Heartbleed (we only cite three tables out of the 11 provided by the Shodan tool):


Figure 3-6: Top affected by cyberattacks countries (Shodan tool)

Figure 3-7: Top affected by vulnerabilities services (Shodan tool)


Figure 3-8: Top affected by vulnerabilities Operating Systems (Shodan tool)

For a query about “buffer overflow”, we received the following results (only part of which is shown, due to screen capture limitations):

Figure 3-9: Buffer overflow related information (Shodan tool)

The hyperlinks in blue font link to related exploits in a very interesting exploits database named Exploit Database (https://www.exploit-db.com/), which will be one of our information sources within the context of WP5. In the left column we see that the Shodan tool provides useful information related to the appearance of the vulnerability in other information sources (“Source” information block) as well as the platforms it affects (“Platform” information block). When these information sources are linked automatically (through our tools within the context of the SAINT project), the resulting information can considerably enhance our knowledge about how widespread the vulnerability is, and can be complemented with the further details provided by the different information sources. Automatic gathering of information, such as that provided by Shodan and Exploit Database, can therefore be of benefit for the goals of the SAINT project. This automatic gathering will be further investigated within the context of WP5.

Moreover, the author of [10] proposes the use of the tool to identify connectivity patterns to Black Markets of 0-day vulnerabilities as well as Black Market entry points. For instance, if a Black Market entry point (e.g. a server) is known, one may use the Shodan tool to monitor its behaviour and its connectivity patterns from all over the world in order to assess, e.g., the popularity of the Black Market in comparison to other markets, as well as 0-day vulnerability type information (which is difficult due to the possibility of traders using strong encryption methods). The work performed in the context of WP5 will investigate the possibility of gaining access to this tool (there is an API available at https://developer.shodan.io/api) and of automatically obtaining such data for further analysis using the Deep Web Crawlers of the SAINT project.
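As an indication of how such automatic acquisition might look, the following Python sketch builds a request to Shodan’s host count REST endpoint and extracts the top values of a facet (e.g. countries or operating systems) from the JSON reply. It is a minimal sketch under stated assumptions, not the SAINT implementation: the endpoint path, the `key`, `query` and `facets` parameters and the reply layout (a `facets` object holding lists of `value`/`count` pairs) follow our reading of the public API documentation and should be verified against it; the function names are our own; a valid API key is required; and the `vuln:` search filter used in the commented example may only be available on some subscription plans.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Shodan REST API (to be checked against the docs).
SHODAN_COUNT_ENDPOINT = "https://api.shodan.io/shodan/host/count"


def build_count_url(api_key, query, facets):
    """Build the request URL for a count query with the given facets."""
    params = {"key": api_key, "query": query, "facets": ",".join(facets)}
    return SHODAN_COUNT_ENDPOINT + "?" + urlencode(params)


def fetch_count(api_key, query, facets):
    """Send the request and return the decoded JSON reply (network access)."""
    with urlopen(build_count_url(api_key, query, facets), timeout=30) as resp:
        return json.load(resp)


def top_facet_values(response, facet, limit=5):
    """Return up to `limit` (value, count) pairs for a facet, largest first."""
    buckets = response.get("facets", {}).get(facet, [])
    ranked = sorted(buckets, key=lambda b: b["count"], reverse=True)
    return [(b["value"], b["count"]) for b in ranked[:limit]]


# Example use (not executed here; requires a real API key):
# reply = fetch_count("YOUR_API_KEY", "vuln:CVE-2014-0160", ["country", "os"])
# print(top_facet_values(reply, "country"))
```

Separating URL construction and reply parsing from the network call keeps the parsing logic testable offline and makes it easier to swap in the official client library at a later stage.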

3.4 Cryptovirology and the Market for Encryption Back Doors

Kleptography is the study of asymmetric back doors in key generation algorithms for stealing information securely and covertly. A kleptographic attack is an attack in which a malicious developer uses asymmetric encryption to create or implement a cryptographic back door. This was first documented as far back as 1996 (see [28,29]). Here cryptography itself is utilised against cryptography: the back door is not an additional channel of communication, nor does it require the transmission of additional data. Instead, the back door is embedded directly inside the communication channel. Kleptography is, in fact, a subfield of cryptovirology, the application of cryptography in malware, as seen most recently in ransomware attacks, e.g. Wannacry ([9,21]). The target of a kleptographic attack is the specific environment of a cryptosystem and not just any general form of software.

Encryption back doors are essentially kleptographic attacks utilising cryptovirology. The back door is either provided by the developer of the cryptography, within the encryption algorithm itself, or by a third party for use on specific encrypted mechanisms or data. Currently, this is a major topic because various law enforcement agencies and governments are seeking to mandate back doors in encrypted services, which law enforcement would be able to use under warrants. For example, in the US, the FBI wants a “backdoor” into encrypted products: not just phones, but other communications services as well. FBI Director Comey wants companies to build security flaws into their encrypted products. This would enable the government to break through and wiretap consumers or seize data stored on their devices. In the case of the San Bernardino shooting, the FBI sought to force Apple Inc. to produce an insecure version of its mobile operating system. The counter argument is that this would increase the security and privacy risks to hundreds of millions of mobile devices ([5]). The ISO has decided not to approve two NSA-designed block encryption algorithms, Speck and Simon, due to suspicions that backdoors were included within the encryption algorithms ([24]). In contrast, the European Parliament is considering a draft proposal that would effectively outlaw the introduction of backdoors in encryption systems and other kinds of interference with confidential information ([7]). Internationally, domestic controls on the use of encryption are enforced in Russia, China, Mongolia, Vietnam, Pakistan, Iran, Kazakhstan, Belarus, Ukraine, Moldova, Israel, Tunisia and Morocco. Currently, within the EU, the UK is seeking domestic controls and encryption backdoors from various messaging providers via its revised data protection act ([4]).

The market for encryption backdoors remains in focus, primarily around governmental agency activities, and is arguably similar in nature to the 0-day market. Former NSA contractor Edward Snowden leaked documents that show that the NSA created and promoted a flawed formula for generating random numbers that constituted a “back door” for encryption products. Reuters reported that RSA became the most important distributor of that formula by including it in a software tool called BSafe, which is used to enhance security in personal computers and many other products. RSA received $10 million in an agreement to use the NSA formula as the preferred, or default, method for number generation in the BSafe software ([18,23]).

From a grey or dark market perspective, there are two specific examples that emerged from the Wannacry ransomware attacks. Primarily, Wannacry was based upon the NSA tool FuzzBunch, which was leaked in the ShadowBrokers data dump after their failed auction request for payment. FuzzBunch is a framework for the easy launching of exploits and interaction with encryption backdoors as implants. It contains several ready-to-use exploits for specific types of targets:

* Easybee-1.0.1.exe
* Easypi-3.1.0.exe
* Eclipsedwing-1.5.2.exe
* Educatedscholar-1.0.0.exe
* Emeraldthread-3.0.0.exe
* Emphasismine-3.4.0.exe
* Englishmansdentist-1.2.0.exe
* Erraticgopher-1.0.1.exe
* Eskimoroll-1.1.1.exe
* Esteemaudit-2.1.0.exe
* Eternalromance-1.3.0.exe
* Eternalromance-1.4.0.exe
* Eternalsynergy-1.0.1.exe
* Ewokfrenzy-2.0.0.exe
* Explodingcan-2.0.2.exe
* Eternalblue-2.2.0.exe
* Eternalchampion-2.0.0.exe

Eternalblue and Eternalsynergy were utilised within Wannacry and the later Petya/NotPetya ransomware. It is estimated that the Wannacry attack yielded approximately €110,000 ($130,000) and Petya/NotPetya €85,000 ($100,000) in Bitcoin payments ([22]). ShadowBrokers are now releasing a variety of NSA exploits and backdoors that they have obtained. The medium used is the open Steemit social news service, with regular dumps offered for payment in ZCash (an anonymous cryptocurrency; currently 1 ZEC = € 195.43) ([25]). Current pricing ranges from 100 ZEC to 16,000 ZEC for the latest planned dump of November 2017, that is, from € 19,543 to approximately € 3,126,880 per dump at the stated exchange rate. The number of subscribers is unknown.

4. The role of the rate of updates and security fixes published by vendors

With respect to the role of vendors’ updates in the vulnerability discovery and selling ecosystem, there are two aspects:

• There is little, or nothing, vendors can do for 0-day vulnerabilities, by definition, beyond taking the strategic decision to adopt good proactive measures in developing their products and protecting their infrastructure by bearing the corresponding costs along with all the other operational costs of their businesses (see considerations in Section 5).

• With respect to published vulnerabilities and exploits, the affected vendors should first take immediate reactive measures. They need to maintain alertness (e.g. by establishing a dedicated team) in order to learn, in a timely fashion, about announced vulnerabilities, and to take immediate and appropriate measures to eliminate the vulnerability, again bearing the relevant costs. Then, the vendors should implement proactive measures in order to avoid future vulnerabilities.

For our purposes, information on update rates as well as update volumes of vendor’s products is of significance (and will be targeted by the tools developed in WP5) for the following reasons:

One of the information sources that we will explore is Stanford’s security updates list (see [37]), a screenshot of which appears in Figure 4-1.

Figure 4-1: Information on software security updates and patches

5. Financial aspects of cybersecurity breaches and vulnerability information

5.1 General considerations

In order to provide guidelines for cost-effective cybersecurity methodologies that can be applied as counter-measures against malicious hackers, thorough research should also take into account the economic incentives for hacking. That is, it should analyse the cybercrime market with respect to the factors that affect the purchase of illicit digital products and, subsequently, the hackers’ revenues and profitability. Cybercriminals’ profits depend on the net value of revenues from purchased criminal activity products. The relevant price fluctuations are indicative of the cybercriminals’ revenues. To minimise the risk of misusing valuable resources and time, cybercriminals diversify their actions towards multiple, potentially profitable targets and illicit actions that can yield them substantial earnings, allocating the appropriate time and resources. In this framework, relevant information is of high value and constitutes a real asset. Such information is represented by the prices of cybercrime products and their sensitivity as demand changes.

Cybercriminals act as investors in a stock market, allocating time and resources to the activities that are most likely to be profitable in cost-benefit terms. These activities are treated as stocks: those with the best potential and the strongest prospects of future profits are valued more highly. The rapid change of the value of any service offered in the Deep Web market is a valid indication for cybercriminals of the actual price of purchased cybercrime products, when evaluating the cost-effectiveness of their revenue models and reallocating their resources.

The profit motives that drive hackers to engage in criminal cyberattacks depend strongly on the cost-benefit relationship between the cyber-attackers’ revenues from cybercriminal activities and the alternative costs related to loss of earnings from not investing time and effort in more profitable activities. An effective cybersecurity strategy for organisations facing potential cyberattacks should target exactly this: the deployment of mechanisms and methods that, in terms of alternative costs, make cyberattacks uneconomic for cybercriminals. Applying theoretical frameworks from Microeconomics, we can examine the necessary conditions for profit maximisation of hackers as sellers of illicit digital products.
Assessment of the hackers’ economic equilibrium requires knowledge of the demand and supply elasticities of illicit digital products in the deep web markets. We can also examine the conditions shaping the typical hacker’s utility maximisation behaviour, regarding the allocation of time and effort between hacking and other relevant cybercriminal activities and the respective anticipated revenues. Assessment of the typical hacker’s marginal rate of technical substitution, as a producer of illicit digital products, requires knowledge of the alternative costs related to hacking and/or other cybercriminal activities and of the revenues associated with them. In order to make the necessary estimations for examining the validity of the above theoretical frameworks, methods of statistics (correlation analysis) and econometrics (stochastic modelling) will be used. Applying these methodologies requires data on certain variables for the identification of vulnerability markets and their taxonomy, and on the prices of the illicit digital products of cybercrime as they are depicted in deep web market intelligence information, such as: 0-day exploits, rate of updates and cybersecurity fixes, number of bug bounties and cybersecurity contests, and data on ongoing transactions about these. In the following sections, we present some first considerations.
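As an illustration of how a demand elasticity could be estimated once price and sales data become available, the following minimal sketch fits a constant-elasticity (log-log) demand curve by ordinary least squares. The function name `loglog_elasticity` and all numbers are hypothetical; they are generated from an assumed demand curve and are not market observations, and the project may well adopt a different estimator.

```python
import math


def loglog_elasticity(prices, quantities):
    """Estimate price elasticity of demand as the OLS slope of
    ln(quantity) on ln(price), assuming a constant-elasticity demand curve."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical listing prices (USD) and units sold, generated from an
# assumed demand curve q = 1000 * p^(-1.5); NOT real deep web market data.
prices = [50.0, 100.0, 200.0, 400.0]
quantities = [1000.0 * p ** -1.5 for p in prices]

elasticity = loglog_elasticity(prices, quantities)
print(f"Estimated price elasticity of demand: {elasticity:.3f}")
```

An estimate below -1 would indicate elastic demand, i.e. a price increase reduces the seller's revenue, which is the kind of sensitivity the equilibrium analysis above depends on.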

5.2 The Capacity and Value-Based Pricing Model for vulnerability and exploit trading

The Capacity and Value-Based Pricing (CVBP) Model was proposed in [27] in order to reassess how professional services (in the context of our project, vulnerability and exploit discovery and selling) should be priced in a very competitive market, such as the vulnerability markets targeted by our project. In general (see discussion in Section 5.1), the price of any product should, ideally, represent an aggregate estimate of the values assigned to its features and attributes. Especially with respect to services (such as vulnerability discovery), pricing can be accomplished in several ways. Most often, the pricing decision relies on what is called “cost-plus” pricing. Cost-plus pricing is based on the natural consideration that service offers are priced according to the costs incurred by the service providers in developing and offering their services, plus a pre-determined profit margin on these costs. This simple model, however, does not take into account an important parameter related to the customer (the vulnerability buyer), which is reflected in the value of the service (vulnerability discovery and selling). It is our assumption that if pricing is more closely correlated with value, then profit margins can be increased without having to significantly increase staff or workload. It was in this vein that the CVBP Model was developed, in an effort to systematically incorporate “value”, along with several other attributes, into the price of professional services.

The “stand-alone” value of the product (e.g. “cost-plus” price) is essentially one dimension of the pricing scheme, and should be considered in the context of other variables, such as available capacity, when determining the price of a particular service. The term “stand-alone” value is meant to connote the value of a product in the absence of external factors such as time, delivery, competition, etc. While this may seem contradictory, it is easily understandable if one thinks about the products that consumers purchase. For instance, one may have some perception of the value of chewing gum. However, depending on the time of day or the situation in which one finds oneself, the value one ascribes to chewing gum changes. For example, after a meal or before a meeting, the value is higher. The “stand-alone” value of a good (or service) is, thus, its value in the absence of special circumstances such as the above. The CVBP Model attempts to incorporate that “stand-alone” value, along with other variables, to arrive at a set of improved prices in a systematic way.

5.3 Costs of vulnerability announcements to vendors and costs of proactive defences

In [13], the following important points are stated, which are of direct relevance to any cost model applicable to vulnerability identification and its financial aspects (i.e. vulnerability pricing and costs for the proactive and reactive defence against it):

• The defender must defend all points. The attacker can choose the weakest point.

• The defender can defend only against known attacks. The attacker can probe for unknown vulnerabilities.

• The defender must be constantly vigilant. The attacker can strike at will.

• The defender must play by the rules. The attacker can play dirty.

It is a commonly held belief in software engineering that making unforeseen changes to a software product late in the design and implementation cycle (especially after it has already been distributed) involves high costs, both tangible (e.g. development and re-distribution costs) and intangible (e.g. tarnished reputation of the company and dissatisfied customers). The same holds true (to a greater extent, perhaps) when a software vendor changes a product after a vulnerability has been identified late in the development or distribution phases, either by the company’s own developers or by vulnerability discoverers (perhaps malicious ones). Now the costs involve not only the costs of developing, applying and distributing the required patches, as “late software product changes”, but also the liabilities and revenue losses (e.g. from customer data loss or hacked bank accounts) of the company due to the damage inflicted on customers or on the company itself. That said, one may reasonably argue that the costs involved in fixing a vulnerability include the following:

• The cost of locating the system parts (e.g. code segments) that cause the vulnerability

• The cost of developing and testing the required patch that fixes the vulnerability

• Team coordination costs that, often, involve coordinating teams from several parties

• Loss of productivity costs due to person hours assigned to the vulnerability fix process instead of developing new products and handling clients’ requirements

• The costs involved in digitally signing the patch code before distribution (e.g. Microsoft’s Authenticode service)

• The costs involved in handling ransomware demands

• The costs of publishing the patch on the company’s own Web site

• The cost of writing the appropriate patch documentation and installation instructions

• The costs of handling customer complaints and bad publicity

• The liability costs according to law for damage caused to users of the faulty software product

• Costs of legal services in case of lawsuits by product users

• Loss of customers and revenues

In conclusion, the cost of a vulnerability fix can easily be in the tens of thousands of USD, giving an estimate of around $100,000, and may be even higher due to the customer loss and paid liabilities components. Useful information about cybercrime cases and the inflicted costs can be found on the sites http://www.cybercrime.gov (for the US) and http://www.cybercrime.eu (for the EU). However, these sites would require sophisticated NLP techniques for automated analysis, since the information they publish does not follow a formal format (such as CVE, for instance) but consists of free text written in English. We plan, however, to use the information contained therein, which will be collected manually by our researchers. In contrast to the costs of fixing a vulnerability are the costs of not allowing it to appear in the first place, that is, the costs of preventing vulnerabilities. These costs are listed below:

• Daily monitoring, by a group of the company’s developers, of publicly available vulnerability reports from major vulnerability reporting providers and agencies. It is not necessary that all developers be cybersecurity experts, since vulnerabilities are most commonly based on certain recurring software bug patterns (e.g. buffer/stack overflows, open unused TCP/IP ports and services, lack of defences, such as CAPTCHAs, against DDoS attacks, etc.) which can be easily understood and (most probably) fixed by software developers. For instance, a group of developers can check, on a weekly or daily basis, NVD’s site and search for known vulnerabilities using its advanced vulnerability search facility (https://nvd.nist.gov/vuln/search).

• Educating developers of key software products in cybersecurity, using on-line or conventional seminar services. Numerous cybersecurity companies (e.g. Symantec, https://www.symantec.com/services/education-services/training-courses) and leading ICT product providers (e.g. Cisco, https://learningnetwork.cisco.com/community/certifications/security_ccna) provide seminars on vulnerability prevention, incident response, and good cybersecurity practices, which also lead to certification of the participants, for reasonable participation fees.

• Performing penetration testing on a regular basis. Penetration testing involves a specialist or a group of specialists attempting to gain access to an ICT infrastructure using abnormal or offensive access methods, emulating potential cyber-attackers of the system, in an effort to identify vulnerabilities and cybersecurity holes of the system before attackers do. The goal of a penetration test is to enhance the cybersecurity features of the target infrastructure. Penetration testers employ all possible tools and knowledge available to achieve their goals, as if they were the hackers. Some penetration tests target one vulnerability or security hole but, in most cases, they look beyond anything possibly caused by a single vulnerability or hidden under an apparently normal operation of the system. Costs are usually agreed after discussions between the penetration testing provider and the infrastructure owner but are not normally high, as indicated in the following price lists from two different penetration testing service providers:

❖ ServerScan (https://www.serverscan.com/Penetration-Testing): We quote from their site:

Base Price: Up to 32 IP Addresses (internal + external) - $6,500
Additional IP Addresses (per 32) - $1,200
Internal Penetration Test - Included In Base Price
First Web Application - $2,200
Additional Web Applications - $1,200

❖ HighBit Security (https://www.highbitsecurity.com/penetrationtesting-cost.php): We quote from their site:

| Type | Description | Starting Price, USD |
|---|---|---|
| External Network | Base price is for an external penetration test addressing security vulnerabilities at the network layer* and also including host configuration* vulnerabilities, up to 32 IP addresses. A non-credentialed Web Application Test may be substituted if you do not need network testing. | $5,900 |
| Internal Network | Base engagement price is for an internal penetration test (on your internal network) addressing security vulnerabilities at the network layer* and also including host configuration* vulnerabilities, up to 32 IP addresses. A non-credentialed Web Application Test may be substituted if you do not need network testing. | $6,900 |
| Web Application | Price is for a single non-credentialed* web application penetration test, in conjunction with an external or internal network penetration test. | $1,900 |
| Wireless | Price is for a wireless penetration test, in conjunction with an internal network penetration test, for one wireless access point and associated client devices. | $4,900 |
| Social Engineering | Price is for a remote social engineering test, including two separate electronic attack vectors including spear phishing email directed at human targets within your organisation, in conjunction with an external network penetration test*. | $4,900 |

*Network Layer testing includes firewall configuration testing, such as stateful analysis tests and common firewall bypass testing, IPS evasion, DNS attacks including zone transfer testing, switching and routing issues and other network related testing.

We observe that the prices of the two providers are comparable. Again, beyond penetration testing, continuous effort and alertness are required of the infrastructure owner, since new vulnerabilities are discovered and new patches are published each day.

• Obtaining an ISO 27000 family certificate (e.g. ISO/IEC 27001). This step usually necessitates thorough penetration testing beforehand, in order to ascertain that the level of security of the organisation’s infrastructure corresponds to the ISO standards (see, e.g., https://www.itgovernance.co.uk/iso27001).

• Launching an in-house or outsourced bug bounty or vulnerability reward program. For instance, with just 500€ per year, one can start hosting a responsible vulnerability disclosure program at the Vulnerability Lab (https://www.vulnerability-lab.com/) and take advantage of the advanced bug bounty support and vulnerability identification services of the lab.

• Employing good software design, implementation, and testing methodologies, including targeting a system quality/security certificate. Two such widely accepted certification standards are the Common Criteria framework (https://www.cse-cst.gc.ca/en/canadian-common-criteria-scheme/main) and the ISO/IEC 27000 family of standards (https://www.iso.org/isoiec-27001-information-security.html). The costs of preparing the systems for the certification audits, as well as the costs of obtaining the certificates, should be taken into account and compared with the potential benefits of avoiding cyberattack related damage.
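To make the ServerScan price list quoted above more tangible, the following minimal sketch computes an indicative quote from those published figures. The function name `serverscan_quote` is hypothetical, and the rule that a partially used block of 32 additional IP addresses is billed in full is our assumption; actual prices are agreed with the provider.

```python
import math


def serverscan_quote(ip_addresses, web_apps=0):
    """Indicative cost in USD, based on the ServerScan figures quoted in the text."""
    cost = 6500  # base price: up to 32 IP addresses, internal test included
    # Assumption: every started block of 32 additional addresses costs $1,200.
    extra_blocks = max(0, math.ceil(ip_addresses / 32) - 1)
    cost += 1200 * extra_blocks
    if web_apps >= 1:
        cost += 2200                      # first web application
        cost += 1200 * (web_apps - 1)     # each additional web application
    return cost


# Hypothetical infrastructure: 64 IP addresses and two web applications.
quote = serverscan_quote(64, web_apps=2)
print(f"Indicative penetration test cost: ${quote:,}")
```

Under these assumptions, even a test of this size remains well below the estimate of around $100,000 given above for fixing a single vulnerability.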

These two contrasting cost types, i.e. fixing or preventing a vulnerability, or in other words, reactive versus proactive measures, are related to the costs/benefits of finding a vulnerability, i.e. the costs/benefits of hackers and other vulnerability discovery agencies.


5.4 The effect of vulnerability disclosure on the market value of software product vendors

As outlined in Section 5.3, vulnerability announcements can inflict severe monetary and other intangible costs (e.g. loss of trust and tarnished reputation) on the affected company, measured by system downtime, operation disruption, and loss of credibility. However, the economic implications of these defects for software vendors are complex and not easy to grasp or to model accurately. One interesting study on these implications was conducted in [26], based on the examination of the effects (e.g. on stock market value) of the disclosure of certain vulnerabilities on companies. As stated by the authors, their goal is twofold: (i) to examine how a software vendor's market value changes when a vulnerability is announced, and (ii) to examine how the characteristics of the affected company and of the vulnerability influence the change in the market value of a vendor. The information and data for the study consisted of leading national newspapers’ articles and industry sources, such as the Computer Emergency Response Team (CERT), as well as information about software vulnerabilities. The study demonstrated that vulnerability announcements lead to a negative and significant change in a software vendor's market value. According to the quantitative analysis conducted, an affected vendor can lose about 0.6 percent of its stock price when a related vulnerability is disclosed. The study also showed that a software vendor loses more market share if the market is competitive or if the vendor is small. Moreover, as was natural to expect, the change in stock value is more negative if the vendor fails to provide the right patch at the time of disclosure of the vulnerability. In addition, according to the findings of the study, key vulnerabilities have significantly more impact on the company’s value.

Of interest for our project’s purposes is the technique employed by the researchers in [26] in conducting their investigations, which can be of benefit to our study of the economic impact of vulnerabilities on affected industry players. The event-study methodology assumes that an event of interest, which for our own project’s goals is a vulnerability announcement, has a significant impact (see Figure 5-1) on the returns of a stock (of the vendor or firm attacked through the vulnerability).

Figure 5-1: Vulnerability discovery and announcement with ensuing Abnormal Returns (diagram labels: revised cash flow expectation by investors; abnormal returns)

The time period of observation of the event is termed the event window. The smallest such event window is one day, namely day “0”, the day of the announcement of the event. In practice, however, the event window is often extended to two days, that is day “0” and day “1”, so that the financial impact of an announcement made after the close of the markets on day “0” is captured on the next day (day “1”). Sometimes, the day before the announcement of the event is also included in the event window to capture the impact of potential leaks about the event announcement. In [26], the researchers used a one-day event window, i.e. they included only day “0” in the study of the event effects on the stock value of the targeted firms. The parameter that is evaluated through the event study methodology is called Abnormal Returns (AR, see Figure 5-1). This is defined as the difference between the actual return of the stock over the event window and the expected return of the stock over the event window. The expected return on the stock can be calculated in several ways but, in [26], the authors used the market model, which assumes a stable linear relation between the market return and the return on the stock. Also of interest is the Cumulative Abnormal Returns (CAR), which is defined as the sum of all abnormal returns over the event window. Cumulative Abnormal Returns are usually calculated over small windows, often only days. This is because evidence has shown that compounding daily abnormal returns can create bias in the results. According to the market model, the following are the steps that are taken in order to apply the event study methodology by computing the abnormal returns and cumulative abnormal returns (see, e.g., [15,26]):

• Gather and match time series data of the financial returns of the target firm's stock and the reference market index (i.e. the target market index, which in our case is the IT market; data can be derived, for instance, from the NASDAQ Computer market sector index IXCO or the NASDAQ 100 Technology Sector index NDXT, depending on the target firm).

• For each event (in our case, vulnerability disclosure), identify the sequences of firm and market returns that need to be included in the chosen estimation window.

• Using regression analysis, calculate the alpha, beta and sigma coefficients that explicate the typical relationship between the stock returns and the reference market index.

• With these three parameters, predict the “normal returns” of the stock for all days of the event window.

• Deducting these “normal returns” from the “actual returns” gives you the “abnormal returns”, which are the parameter of interest in the event study methodology.
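The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in [26]: the function names `fit_market_model` and `abnormal_returns` are ours, the regression is an ordinary least squares fit of the market model (stock return = alpha + beta × market return + error), sigma is taken to be the residual standard deviation, and all return series are hypothetical numbers rather than NASDAQ data.

```python
from statistics import mean


def fit_market_model(stock_returns, market_returns):
    """OLS fit over the estimation window.
    Returns (alpha, beta, sigma), with sigma the residual standard deviation."""
    n = len(stock_returns)
    mean_m = mean(market_returns)
    mean_s = mean(stock_returns)
    sxx = sum((m - mean_m) ** 2 for m in market_returns)
    sxy = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    beta = sxy / sxx
    alpha = mean_s - beta * mean_m
    residuals = [s - (alpha + beta * m)
                 for m, s in zip(market_returns, stock_returns)]
    sigma = (sum(e * e for e in residuals) / (n - 2)) ** 0.5
    return alpha, beta, sigma


def abnormal_returns(stock_event, market_event, alpha, beta):
    """Actual return minus the 'normal' return predicted by the market model."""
    return [s - (alpha + beta * m) for m, s in zip(market_event, stock_event)]


# Hypothetical daily returns over the estimation window (before the event).
est_market = [0.010, -0.004, 0.006, 0.002, -0.008, 0.004, -0.002, 0.007]
est_stock = [0.013, -0.003, 0.008, 0.004, -0.009, 0.005, -0.001, 0.010]

# Hypothetical returns in a two-day event window (day "0" and day "1").
event_market = [0.003, -0.002]
event_stock = [-0.012, -0.001]

alpha, beta, sigma = fit_market_model(est_stock, est_market)
ar = abnormal_returns(event_stock, event_market, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the event window
print(f"alpha={alpha:.5f} beta={beta:.3f} sigma={sigma:.5f}")
print("AR per day:", [round(x, 5) for x in ar], "CAR:", round(car, 5))
```

Dividing each abnormal return by sigma gives a simple standardised measure of how unusual the event-day return is relative to the estimation window, which is one common starting point for the hypothesis tests discussed next.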

Methodologically, event studies imply the following: based on an estimation window prior to the analysed event, the method estimates what the normal stock returns of the affected firm(s) should be on the day of the event and on several days before and after the event (i.e., during the event window). Thereafter, the method deducts these 'normal returns' from the 'actual returns' to obtain the 'abnormal returns' attributed to the event. The hypotheses posed by the authors in [26] were the following:

• H1: A software vendor suffers a loss in market value when a cybersecurity related vulnerability is announced in its products.

• H2: CAR [negative] of a stock is greater for vulnerabilities where the software vendor does not release a patch at the time of the vulnerability disclosure.

• H3: CAR is greater for a vulnerability that can potentially cause a breach in confidentiality, as compared to non-confidentiality related breaches.

• H4a: The loss in market value of a software vendor is greater if the announced vulnerability has a higher severity.

• H4b: The loss in market value of a software vendor is greater if an exploit exists publicly at the time of the vulnerability announcement.

• H5: The loss in market value for a software vendor is lower in case the cybersecurity vulnerability is discovered by the vendor itself rather than by rivals or third-party cybersecurity firms.


• H6: The magnitude of CAR is greater when the vulnerability is reported in the popular press than in industry sources.

The research data gathered by the researchers are shown below:

| Characteristic | Value |
|---|---|
| Number of firms | 18 |
| Number of announcements | 146 |
| Percentage of vulnerability announcements in popular press | 35 |
| Percentage of vulnerabilities for which vendor has patch available at the time of the announcement | 24 |
| Percentage of vulnerabilities discovered by the vendor itself | 36 |
| Percentage of vulnerabilities that could potentially result in a security breach related to confidentiality | 39 |
| Percentage of vulnerabilities for which the announcement contained information of a publicly available ‘exploit’ | 22 |

The distribution of the vulnerability announcements, over the years targeted by the study, is shown in the table below:

| Year | Number of Announcements |
|---|---|
| 1999 | 4 |
| 2000 | 22 |
| 2001 | 28 |
| 2002 | 24 |
| 2003 | 45 |
| 2004 (till May 30) | 23 |

The statistical analysis that was conducted (based on hypothesis testing methodologies) showed that the available data supported hypotheses H1, H2, H3, and H4. However, H5 and H6 were not supported, although they were reasonable conjectures.

5.5 Modelling the decisions of the vulnerability discoverer and defender

Game theory is concerned with the study of the ways in which strategic interactions among rational agents induce outcomes with respect to the preferences (or utilities) of those agents, none of which may have been intended by any of them alone (i.e. without the element of interaction). Game theory uses mathematical tools and techniques for modelling and analysing these strategic interactions and conflicts of interest. Any situation in which at least one rational agent acts to maximise his utility by anticipating the responses to his actions by one or more other agents is called a game. Agents involved in games are referred to as players. Each player in a game faces a choice among two or more possible strategies. A strategy is a predetermined “program of play” that tells the player what actions to take in response to every possible strategy other players may use.

A crucial aspect of the specification of a game involves the information that players have when they choose strategies. In game theory, information regarding moves and utilities is of primary concern. Games of perfect information involve players who know all the moves made thus far in the game by all players. In contrast, in a game of imperfect information, players cannot directly observe the actions of their opponents. Since game theory is about rational action given the strategically significant actions of others, the fact that players know, or fail to know, about each other’s actions makes a considerable difference in the analyses. In games of complete information, agents know the utilities available for all possible outcomes of the game. Incomplete information means that the players do not know the utilities of their opponents. In games of incomplete information, players should utilise their beliefs to estimate their opponents’ preferences and, subsequently, their possible strategies. In game theory, these beliefs are modelled as probability distribution functions over their opponents’ possible sets of moves and utility structures.

The solutions of games are referred to as equilibria. A set of strategies is a Nash equilibrium if no player can improve his utility, given the strategies of all other players in the game, by deviating from his equilibrium strategy. A game reaches an equilibrium when each player’s strategy is stable and self-enforcing. Whenever incomplete information or the structure of the game causes uncertainty, the players cannot deduce the adversaries’ strategies with certainty. To account for such games, the concept of Nash equilibrium has been extended to Bayesian equilibrium, which consists of mixed (stochastic) strategies, where a mixed strategy is a probability distribution over all the actions in the strategy set.

More formally, a game $G$ involves a finite number $n$ of players, with $n \geq 2$. Player $i$ has a finite set $S_i$ of at least two pure strategies to choose from. We call the elements of the set $S = S_1 \times \dots \times S_n$ the pure strategy profiles. Therefore, component $i$ of an element of $S$ represents the choice made by player $i$ from the set $S_i$ of pure strategies. Moreover, to quantify and rank the players’ decisions, given any pure strategy profile $s \in S$, $G$ must specify a real-valued utility, or payoff, for each player. Let $u_i(s)$ denote the payoff to player $i$ that results from $s \in S$. A Nash equilibrium is a set of strategies (one per player) from which no player has an incentive to deviate. In general, pure strategies alone do not always give Nash equilibria. It is often necessary to allow players to randomise over their pure strategies. Classical texts, such as Drew Fudenberg and Jean Tirole’s Game Theory book (see [11]), provide further information on these strategies.

Our discussion below is based on the survey paper [19] on the applications of game theory to cybersecurity and risk management. Among a number of interesting cybersecurity-related games surveyed in [19] is a two-player game involving an attacker and a defender, as defined in [2]. Of particular importance is the quick and practical investigation of the conditions that lead enterprises to invest in vulnerability defences, by taking strong cybersecurity measures (which involve a cost and a profit), and of those under which cyber-attackers will invest in discovering vulnerabilities (a task which also involves a cost and a profit). In the model, the attacker has two choices: to develop an attack or to do nothing. The defender’s choices are to develop, or not to develop, a strong defence scheme (in our case, a good system design and implementation methodology that minimises vulnerabilities or builds defences against the most common vulnerability causes, e.g. stack/buffer overflow attacks).
Both the attacker’s and the defender’s choices (attacking or defending, respectively, and doing nothing) have associated benefits and costs, which are modelled by the payoffs in a two-person game. The authors in [2] observe that, as happens in many situations in practice, for certain reasonable ranges of the payoff values of the two players, the game has no Nash equilibrium in pure strategies. In other words, there is no strategy for any of the players that always leads to maximum profits. This is not hard to see in our “vulnerability finding/defending game”. If a vulnerability discoverer always elects to target a specific vendor’s products or systems for identifying vulnerabilities, then the vendor (knowing the vulnerability hunter’s permanent choice) would do its best to defend, even at a high cost, against the costliest cybersecurity breach. The discoverer would, then, not expend effort (and, perhaps, money) on finding vulnerabilities against this vendor. On the other hand, if the vulnerability discoverer decides never to target this vendor, then the vendor is better off not investing in costly counter-attack measures.

In game theory, a convenient way to visualise such a simple two-person game is to represent the players’ utilities, depending on their actions, in a matrix where Player 1’s (defender) strategies correspond to rows and Player 2’s (attacker) strategies correspond to columns. Then cell $(i,j)$ (with $i,j$ equal to 1 or 2) of the matrix contains an ordered pair $(u_1, u_2)$ of payoff values, where the first value corresponds to the row player and the second value corresponds to the column player. In Table 5-1 we see an instance of this normal form representation of a two-person game, where the payoff values a, b, c, α, γ are positive real numbers. We assume here that investing in vulnerability elimination (respectively, investing in vulnerability discovery) is costly to the vendor (respectively, the attacker), whereas doing nothing has no effort investment cost.

Table 5-1: A simple two-person vulnerability discovery/defence game

| Defender \ Attacker | Invest in vulnerability discovery | Do nothing |
|---|---|---|
| Invest in vulnerability elimination | (a, -α) | (-b, 0) |
| Do nothing | (-c, γ) | (0, 0) |

In this game, it is generally reasonable to assume that the cost for the vendor in failing to guard against a vulnerability is much larger than the cost of protecting itself against it, as discussed in Section 5. In particular, based on the numbers cited in Section 5.3, it is not unreasonable to assume, for instance, that c > 2b. According to the theory, then, under this condition there exists a unique Nash equilibrium in mixed strategies, which is shown in Table 5-2:

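Table 5-2: The unique Nash equilibrium in mixed strategies

| | Invest in vulnerability discovery | Do nothing |
|---|---|---|
| Invest in vulnerability elimination | $\left(\frac{\gamma}{\alpha+\gamma}, \frac{b}{a+b+c}\right)$ | $\left(\frac{\gamma}{\alpha+\gamma}, \frac{a+c}{a+b+c}\right)$ |
| Do nothing | $\left(\frac{\alpha}{\alpha+\gamma}, \frac{b}{a+b+c}\right)$ | $\left(\frac{\alpha}{\alpha+\gamma}, \frac{a+c}{a+b+c}\right)$ |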
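The probabilities of Table 5-2 can be checked numerically against the payoffs of Table 5-1. The following is a minimal sketch, not part of the analysis in [2]: the function names and the integer payoff values are hypothetical and were chosen only so that c > 2b holds. It verifies that, at the stated mix, each player is indifferent between his/her two pure strategies.

```python
from fractions import Fraction


def mixed_equilibrium(a, b, c, alpha, gamma):
    """Equilibrium probabilities for the game of Table 5-1 (integer payoffs)."""
    p_invest = Fraction(gamma, alpha + gamma)   # vendor: invest in elimination
    q_discover = Fraction(b, a + b + c)         # attacker: invest in discovery
    return p_invest, q_discover


def vendor_expected_payoffs(a, b, c, q_discover):
    """Vendor's expected payoff for each pure strategy, given the attacker's mix."""
    invest = q_discover * a + (1 - q_discover) * (-b)
    do_nothing = q_discover * (-c) + (1 - q_discover) * 0
    return invest, do_nothing


def attacker_expected_payoffs(alpha, gamma, p_invest):
    """Attacker's expected payoff for each pure strategy, given the vendor's mix."""
    discover = p_invest * (-alpha) + (1 - p_invest) * gamma
    do_nothing = Fraction(0)
    return discover, do_nothing


# Illustrative payoffs only (hypothetical values, chosen so that c > 2b).
p, q = mixed_equilibrium(a=2, b=1, c=3, alpha=1, gamma=4)
print("vendor invests with probability", p)
print("attacker discovers with probability", q)
print("vendor payoffs (invest, do nothing):", vendor_expected_payoffs(2, 1, 3, q))
print("attacker payoffs (discover, do nothing):", attacker_expected_payoffs(1, 4, p))
```

Exact fractions are used instead of floating-point numbers so that the two expected payoffs of each player can be compared for equality.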

In Table 5-2, the values in the parentheses are the probabilities (or, in a more practical interpretation, the percentage of the number of times, or frequency, a strategy is chosen) with which the respective players choose the corresponding strategies. Thus, the vendor adopts the “invest in vulnerability elimination” strategy with probability (or rate of adoption) $\frac{\gamma}{\alpha+\gamma}$ and the “do nothing” strategy with probability $\frac{\alpha}{\alpha+\gamma}$. Correspondingly, the attacker adopts the “invest in vulnerability discovery” strategy with probability $\frac{b}{a+b+c}$ and the “do nothing” strategy with probability $\frac{a+c}{a+b+c}$. The authors in [2] remark that, interestingly, in such types of games the mixed strategy choice (Nash equilibrium) of each player is taken in a way that deprives the opponent of any preference for one or the other of her/his strategies.

The SAINT project will expand on the game theory based analysis methodologies in order to derive conclusions with respect to the profit equilibria among defenders and attackers in the vulnerability market financial terrain.

6. Specifications for the OSINT Web Crawler and the Social Network Analyser

As illustrated in the diagram below, the concept of knowledge, sometimes referred to as the knowledge hierarchy, is depicted as a pyramid. This pyramid has data as its foundation, information as the middle layer, and knowledge on the top. Climbing the pyramid means refining knowledge from raw data, using the technology we build, and gaining a deeper knowledge of the original (raw) data.


Figure 6-1. From raw data to semantic knowledge

SAINT implements two tools for searching the Web for cybercrime-related activity. The first is the Web Crawler, which searches and scrapes websites hosting cybercrime activities, vulnerability markets and bug bounties, as well as black markets in the Deep and Dark Web. The second is the Social Network Analyzer (SNA), which applies social media mining techniques to the well-known Twitter API and Google Trends platforms in order to monitor cybercriminal activity around the world. The specifications of both tools are presented in the following sub-sections.

6.1 Web Crawler

Open Source Intelligence (OSINT) is a term used to refer to the data collected from publicly available sources to be used in an intelligence context. A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

If the crawler is performing archiving of websites, it copies and saves the information as it goes. The archives are usually stored in such a way that they can be viewed, read and navigated as if they were on the live web, but are preserved as ‘snapshots’. The archive is known as the repository and is designed to store and manage the collection of web pages. The repository only stores HTML pages, and these pages are stored as distinct files. A repository is similar to any other system that stores data, like a modern database. The main difference is that a repository does not need all the functionality offered by a full-blown database system. The repository stores the most recent version of the web page retrieved by the crawler.

The SAINT OSINT Web crawler indexes web content related to cybercriminal activities, vulnerability markets, bug bounty markets and, finally, black markets in the Deep and Dark Web. Some extra specifications are needed for web crawling and scraping:

• The SAINT web crawler tool performs legal web scraping, within fair use under copyright law, and makes download requests at a reasonable rate.
• Developing an understanding of the scale and structure of a website that will be crawled is a prerequisite. Sitemap files and robots.txt files let the crawler know of any restrictions when a website is being crawled. The robots.txt file is a valuable resource to check before crawling, both to minimize the chance of being blocked and to discover clues about the website's structure. Sitemap files provide links to all of a site’s web pages, which is an efficient way to crawl a website. However, these files should be treated carefully, as they can be missing, out-of-date, or incomplete.
• Estimating the size of the website affects the way it will be crawled and, subsequently, the efficiency of the crawler. For example, if a website has over a million web pages, distributing the downloading is an efficient way of crawling and scraping.
• Identifying the technology used by a website affects the way it will be crawled.
• Finding the owner of a website. This is important because if, for example, the owner is known to block web crawlers, then it would be wise to be more conservative in the download rate.
• Making the crawler act more like a typical user and follow links to reach the interesting content. This can be done by using regular expressions (regex) to determine which web pages the crawler should download.
• The response object can be parsed using both CSS and XPath selectors, which makes it versatile for extracting both obvious and harder-to-reach content.
• Understanding how the web page loads its content, a process which can be described as reverse engineering.
• Solving CAPTCHAs automatically. A CAPTCHA is used by a website to determine whether the user is human or not and to prevent bots from interacting with the website. Solving can be done by optical character recognition or deep learning image recognition methods.
• The crawler can save scraped items automatically in CSV, JSON, or XML format.
• Dynamic web pages that use AJAX or rely on JavaScript for their functionality cause problems for web crawlers. In such situations, reverse engineering techniques are used for better crawling.
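The seed, frontier, repository, robots.txt and regex ideas described above can be sketched with the Python standard library. This is an illustrative outline under stated assumptions, not the SAINT crawler: `crawl`, `fetch`, the `saint-sketch` user agent and the default one-second delay are hypothetical, and the page download is injected as a function so that the sketch can be exercised without network access.

```python
import re
import time
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

HREF_RE = re.compile(r'href="([^"]+)"')  # simplistic link extraction, for the sketch only


def crawl(seeds, fetch, robots_lines, url_filter,
          agent="saint-sketch", delay=1.0, max_pages=100):
    """Breadth-first crawl; returns the repository as {url: html}.

    fetch(url) is a caller-supplied (hypothetical) function that returns the
    page HTML, or None when the page cannot be retrieved.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)          # rules taken from the site's robots.txt
    pattern = re.compile(url_filter)    # regex deciding which links to follow
    frontier = deque(seeds)             # the crawl frontier
    seen = set(seeds)
    repository = {}
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if not robots.can_fetch(agent, url):
            continue                    # respect the site's restrictions
        time.sleep(delay)               # keep the request rate reasonable
        html = fetch(url)
        if html is None:
            continue
        repository[url] = html          # most recent version of the page
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            if link not in seen and pattern.search(link):
                seen.add(link)
                frontier.append(link)
    return repository
```

In a real deployment `fetch` would wrap an HTTP client and the robots.txt lines would be downloaded from the target site; both are left to the caller here so that the policy logic stays visible.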

Web pages in the Deep Web are, in practice, accessible only by submitting queries to a database, and regular crawlers are unable to find these pages if there are no links pointing to them. One way to address this issue is to increase the number of web links to be crawled. Crawling all the text contained inside the hypertext, whether in tags or in plain text, is an efficient method to achieve better Deep Web crawling. Another strategic approach is a technique called screen scraping. Specialized software may be customized to query a given Web form automatically and repeatedly, with the intention of aggregating the resulting data. In this way, multiple Web forms can be spanned across multiple websites. Data extracted from the results of one Web form submission can be applied as input to another Web form, thus establishing continuity across the Deep Web in a way not possible with traditional web crawlers.

6.2 Social Network Analyzer (SNA)

The SAINT Social Network Analyzer (SNA) is composed of two sub-tools: the Twitter SNA and the Google Trends SNA. The overall process for building a Social Network Analyzer can be summarized in the following steps:

1. Authentication (OAuth: Open Authorization, which grants access to the social media platform)
2. Data collection
3. Data cleaning and pre-processing
4. Modeling and analysis
5. Result presentation (Visualization)


Figure 6-2. The process of social media mining

The Twitter SNA utilizes the popular social network Twitter to extract trends in cybercrime activity. A dictionary of #hashtags of interest is created. Thereafter, the SAINT SNA mines only publicly available tweets for the specific hashtags and extracts the related information. When Python’s Tweepy module is used to stream from the Twitter API, a tweet instance in JSON format returned for a #ransomware hashtag search looks like the following example:

{
  "created_at": "Mon Sep 11 12:09:28 +0000 2017",
  "id": 907214508048961536,
  "id_str": "907214508048961536",
  "text": "RT @htbridge: #MongoDB Ransom Victims Had No Account Passwords: https:\/\/t.co\/qfzfy9dc3v #Ransomware #cybersecurity",
  "source": "\u003ca href=\"http:\/\/www.Twitter.com\" rel=\"nofollow\"\u003eTwitter for Windows\u003c\/a\u003e",
  "truncated": false,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
    "id": 906268429,
    "id_str": "906268429",
    "name": "Benoit Mouysset",
    "screen_name": "benmouy",
    "location": null,
    "url": null,
    "description": null,
    "translator_type": "none",
    "protected": false,
    "verified": false,
    "followers_count": 34,
    "friends_count": 61,
    "listed_count": 51,
    "favourites_count": 46,
    "statuses_count": 3497,
    "created_at": "Fri Oct 26 15:50:29 +0000 2012",
    "utc_offset": null,
    "time_zone": null,
    "geo_enabled": false,
    "lang": "fr",
    "contributors_enabled": false,
    "is_translator": false,
    "profile_background_color": "C0DEED",
    "profile_background_image_url": "http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
    "profile_background_image_url_https": "https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
    "profile_background_tile": false,
    "profile_link_color": "1DA1F2",
    "profile_sidebar_border_color": "C0DEED",
    "profile_sidebar_fill_color": "DDEEF6",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/2781987098\/49769fc170934db2f0e828560a831d6b_normal.jpeg",
    "profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/2781987098\/49769fc170934db2f0e828560a831d6b_normal.jpeg",
    "default_profile": true,
    "default_profile_image": false,
    "following": null,
    "follow_request_sent": null,
    "notifications": null
  },
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "retweeted_status": {
    "created_at": "Mon Sep 11 11:31:40 +0000 2017",
    "id": 907204994763509760,
    "id_str": "907204994763509760",
    "text": "#MongoDB Ransom Victims Had No Account Passwords: https:\/\/t.co\/qfzfy9dc3v #Ransomware #cybersecurity",
    "source": "\u003ca href=\"http:\/\/Twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
      "id": 130604183,
      "id_str": "130604183",
      "name": "High-Tech Bridge",
      "screen_name": "htbridge",
      "location": "Geneva and San Francisco (CA)",
      "url": "https:\/\/www.htbridge.com\/",
      "description": "Award-winning Machine Learning and Intelligent Automation technology for web and mobile application security testing.",
      "translator_type": "none",
      "protected": false,
      "verified": false,
      "followers_count": 8477,
      "friends_count": 6307,
      "listed_count": 426,
      "favourites_count": 8,
      "statuses_count": 4317,
      "created_at": "Wed Apr 07 19:51:40 +0000 2010",
      "utc_offset": 7200,
      "time_zone": "Paris",
      "geo_enabled": false,
      "lang": "en",
      "contributors_enabled": false,
      "is_translator": false,
      "profile_background_color": "0090A1",
      "profile_background_image_url": "http:\/\/pbs.twimg.com\/profile_background_images\/806316121\/025278c41d8cb68ab92802dede901ab7.png",
      "profile_background_image_url_https": "https:\/\/pbs.twimg.com\/profile_background_images\/806316121\/025278c41d8cb68ab92802dede901ab7.png",
      "profile_background_tile": false,
      "profile_link_color": "0084B4",
      "profile_sidebar_border_color": "FFFFFF",
      "profile_sidebar_fill_color": "DDEEF6",
      "profile_text_color": "333333",
      "profile_use_background_image": true,
      "profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/807279406431567872\/bKG_kAH8_normal.jpg",
      "profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/807279406431567872\/bKG_kAH8_normal.jpg",
      "profile_banner_url": "https:\/\/pbs.twimg.com\/profile_banners\/130604183\/1497610307",
      "default_profile": false,
      "default_profile_image": false,
      "following": null,
      "follow_request_sent": null,
      "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "is_quote_status": false,
    "quote_count": 0,
    "reply_count": 0,
    "retweet_count": 1,
    "favorite_count": 0,
    "entities": {
      "hashtags": [
        { "text": "MongoDB", "indices": [0, 8] },
        { "text": "Ransomware", "indices": [74, 85] },
        { "text": "cybersecurity", "indices": [86, 100] }
      ],
      "urls": [{
        "url": "https:\/\/t.co\/qfzfy9dc3v",
        "expanded_url": "https:\/\/www.infosecurity-magazine.com\/news\/mongodb-ransom-victims-no\/",
        "display_url": "infosecurity-magazine.com\/news\/mongodb-r\u2026",
        "indices": [50, 73]
      }],
      "user_mentions": [],
      "symbols": []
    },
    "favorited": false,
    "retweeted": false,
    "possibly_sensitive": false,
    "filter_level": "low",
    "lang": "en"
  },
  "is_quote_status": false,
  "quote_count": 0,
  "reply_count": 0,
  "retweet_count": 0,
  "favorite_count": 0,
  "entities": {
    "hashtags": [
      { "text": "MongoDB", "indices": [14, 22] },
      { "text": "Ransomware", "indices": [88, 99] },
      { "text": "cybersecurity", "indices": [100, 114] }
    ],
    "urls": [{
      "url": "https:\/\/t.co\/qfzfy9dc3v",
      "expanded_url": "https:\/\/www.infosecurity-magazine.com\/news\/mongodb-ransom-victims-no\/",
      "display_url": "infosecurity-magazine.com\/news\/mongodb-r\u2026",
      "indices": [64, 87]
    }],
    "user_mentions": [{
      "screen_name": "htbridge",
      "name": "High-Tech Bridge",
      "id": 130604183,
      "id_str": "130604183",
      "indices": [3, 12]
    }],
    "symbols": []
  },
  "favorited": false,
  "retweeted": false,
  "possibly_sensitive": false,
  "filter_level": "low",
  "lang": "en",
  "timestamp_ms": "1505131768606"
}

A tweet is a complex object. The following table provides a list of all its attributes and a brief description of their meaning:

Table 6-1: Description of “Tweet” fields

| Attribute Name | Description |
|---|---|
| contributors | This is a list of contributors, if the feature is enabled |
| author | This is the tweepy.models.User instance of the tweet author |
| place | This is the tweepy.models.Place instance of the place attached to the tweet |
| coordinates | This is the dictionary of coordinates in GeoJSON format |
| created_at | This is the datetime.datetime instance of the tweet creation time |
| entities | This is a dictionary of URLs, hashtags, and mentions in the tweets |
| favorite_count | This is the number of times the tweet has been favorited |
| favorited | This flags whether the authenticated user has favorited the tweet |
| geo | These are coordinates too |
| id | This is the unique ID of the tweet as a big integer |
| id_str | This is the unique ID of the tweet as a string |
| in_reply_to_screen_name | This is the username of the status the tweet is replying to |
| in_reply_to_status_id | This is the status ID of the status the tweet is replying to, as a big integer |
| in_reply_to_status_id_str | This is the status ID of the status the tweet is replying to, as a string |
| in_reply_to_user_id | This is the user ID of the status the tweet is replying to, as a big integer |
| in_reply_to_user_id_str | This is the user ID of the status the tweet is replying to, as a string |
| is_quote_status | This flags whether the tweet is a quote (that is, contains another tweet) |
| lang | This is the string with the language code of the tweet |
| possibly_sensitive | This flags whether the tweet contains a URL with possibly sensitive material |
| retweet_count | This is the number of times the status has been retweeted |
| retweeted | This flags whether the status is a retweet |
| source | This is the string describing the tool used to post the status |
| text | This is the string with the content of the status |
| truncated | This flags whether the status was truncated (e.g., retweet exceeding 140 chars) |
| user | This is the tweepy.models.User instance of the tweet author |
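A raw streaming message such as the #ransomware example can be reduced to the handful of fields that the analysis needs. The sketch below uses only the standard library and works on the JSON text directly rather than on Tweepy model objects; `summarise_tweet` and the selection of fields are assumptions made for illustration. The `user` object is deliberately left out of the summary. Note that parsing the day and month names with `strptime` assumes an English (C) locale.

```python
import json
from datetime import datetime, timezone

# Timestamp layout used by the "created_at" field in the example above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarise_tweet(raw_line):
    """Parse one JSON tweet and keep only the fields of interest."""
    tweet = json.loads(raw_line)
    entities = tweet.get("entities", {})
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return {
        "id": tweet["id_str"],
        "created_at": created.astimezone(timezone.utc),
        "lang": tweet.get("lang"),
        "is_retweet": "retweeted_status" in tweet,
        "hashtags": [h["text"].lower() for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "text": tweet["text"],
    }
```

Lower-casing the hashtags makes `#Ransomware` and `#ransomware` count as the same tag when trends are aggregated.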

With these data, it is possible to perform the following analyses by decomposing the tweet into the following fields:

• text: the text of the tweet itself. Besides the hashtags, it is important to analyze the body of the tweet for other words that may be related to cybercrime but have not been marked with the # symbol. The most basic yet powerful analysis is based on regular expressions (regex). A more sophisticated approach, which is currently under evaluation, is to use Natural Language Processing in order to extract the sentiments of the users, which may be positive, negative, urgent, etc.

• created_at: the date of creation. The timestamp of each tweet is the most critical information for the Social Network Analysis. Two metrics are derived from it: the rate of appearance of new tweets for specific hashtags (e.g. #ransomware) and the rate of change of this value. These metrics are well known in epidemiology and constitute the epidemic curves, which are one of the basic instruments for identifying developing epidemics. Recent research has demonstrated the ability to use Twitter to detect epidemics. A similar but adapted approach is used in the SAINT SNA to focus on malware epidemics or large-scale cybercrime incidents. The SAINT SNA is based both on the hashtag trends and on the regex analysis of the main body of the tweet. The main goal of this approach is to identify possible threats even before having access to detailed samples or data related to a cybercrime incident.

• favorite_count, retweet_count: the number of favorites and retweets. Only a limited number of tweets is available to researchers for scientific purposes, about 1% of the total number of posts that are uploaded every day, so it is essential to extract as much information as possible from the messages that can be obtained. Therefore, in some cases, it might be better to focus on tweets that have a great impact instead of performing a uniform random search. A research strategy that is designed on the Pareto rule (80/20) can provide more targeted and, therefore, more accurate results. A strategy that identifies and follows the most active accounts might help to overcome the 1% access limit imposed by the Twitter policy. A combination of uniform Twitter mining and preferential monitoring of the most popular accounts on cybercrime topics will be tested and evaluated.

• favorited, retweeted: booleans stating whether the authenticated user has favorited or retweeted this tweet. A possible way to identify accounts of interest is by following specific lists related to the topic of cybersecurity. The SAINT SNA will explore possible mechanisms to flag prominent accounts and automatically retrieve their behavior.

• lang: acronym for the language (e.g. “en” for English). The SAINT SNA naturally will mainly be focused on English tweets. As an experimental approach, alternative languages may be used. As the hashtags are mostly written in English, it is possible to analyze other languages as well. Languages that are related to non-EU countries, with a bad record on cybersecurity incidents and a vivid cybercrime ecosystem, are of special interest.

• id: the tweet identifier. A useful tag for archive and retrieval purposes of the stored information.

• place, coordinates, geo, location: geo-location information, if available. The location of the related tweets and accounts can be visualized in spot maps so as to provide the geographical dimension of the cybersecurity and cybercrime activity. A clustering of incidents in particular areas could be a potential intelligence lead for further investigation. As the project will already provide an interactive visual map, the spot maps can be integrated into that platform.

• user: The author’s full profile will not be stored or processed.

• entities: a dictionary that contains lists of entities such as URLs, @-mentions, hashtags, and symbols. The URLs found in tweets of interest will also be used as input to the OSINT Web Crawler. Therefore, an initial seed of important websites with relevant information would significantly enhance the capabilities of the OSINT Web Crawler.

• in_reply_to_user_id: user identifier if the tweet is a reply to a specific user

• in_reply_to_status_id: status identifier if the tweet is a reply to a specific status
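The text and created_at analyses described in the list above can be sketched as follows: a tweet is considered relevant if it carries a tracked hashtag or if its body matches a regex, and the timestamps of the relevant tweets are then turned into an hourly curve together with its change from hour to hour. The keyword list, the tracked hashtag set, the hourly bucket size and all function names are assumptions made for this sketch, not the SAINT SNA configuration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical keyword list for matching un-tagged words in the tweet body.
KEYWORD_RE = re.compile(r"\b(ransomware|malware|exploit)\b", re.IGNORECASE)


def is_relevant(text, hashtags, tracked=frozenset({"ransomware"})):
    """True if a tracked hashtag is present or the body matches the regex."""
    tags = {h.lower() for h in hashtags}
    return bool(tags & tracked) or KEYWORD_RE.search(text) is not None


def hourly_counts(timestamps):
    """Tweets per hour as [(hour_start, count), ...], empty hours included."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    if not buckets:
        return []
    hour, last = min(buckets), max(buckets)
    series = []
    while hour <= last:
        series.append((hour, buckets.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


def differences(values):
    """Change between consecutive values (the rate of change of the rate)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    start = datetime(2017, 9, 11, 12, 0, tzinfo=timezone.utc)
    stamps = [start + timedelta(minutes=m) for m in (5, 30, 70, 180, 181, 182)]
    rate = [count for _, count in hourly_counts(stamps)]
    print("tweets per hour:", rate)
    print("change per hour:", differences(rate))
```

A sustained run of positive values in the second series is the kind of signal an epidemic-curve approach would look for; the threshold for raising an alert is left open here.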

The Twitter SNA uses the Twitter Streaming API. The Streaming API looks into the future: once a connection is opened, it is kept open and goes forward in time. By keeping the HTTP connection open, all the tweets that match the search criteria can be retrieved as they are published. For this reason, the Twitter Streaming API is one of the favourite ways of getting a massive amount of data without exceeding the rate limits. The difference from the Twitter REST API is illustrated below.

Figure 6-3. The time dimension of REST API Vs Streaming API

The Google Trends SNA utilizes the popular Google Trends platform to extract trends that are related to the cybercrime activity. Google Trends is a public web facility of Google Inc., based on Google Search, that shows how often a particular search term is entered, relative to the total search volume across various regions of the world, and in various languages. The horizontal axis of the main graph represents time, and the vertical is how often a term is searched for relative to the total number of global searches. Below the main graph, popularity is broken down by countries, regions, cities, and language. Google Trends also allows the user to compare the volume of searches between two or more terms. An additional feature of Google Trends is in its ability to show news related to the search-term overlaid on the chart, showing how new events affect search popularity. Two pictures are given below for Google Trends query results, containing search trends about “ransomware” keyword (link: https://trends.google.com/trends/explore?q=ransomware).

Figure 6-4. Line graph with keyword search: Ransomware


Figure 6-5. Map graph with keyword search: Ransomware

Two further figures are given below, illustrating Google Trends results for the “Related Queries” feature. Specifically, the query “ransomware” yields the related query “ransomware attacks 2017” (link: https://trends.google.com/trends/explore?q=ransomware%20attacks%202017):

Figure 6-6. Line graph with related keyword query: ransomware attacks 2017

Figure 6-7. Map graph with related keyword query: ransomware attacks 2017

Similar to the Twitter SNA, the Google Trends SNA sub-tool uses social media mining techniques, to collect, process, and analyze Google Search trends queries related to cybercriminal activities around the globe. In a Google Trends SNA result instance, one can obtain and analyze the following useful option keys:

• Targeted search term(s) (string, or array if someone wishes to compare search terms); required.
• Start of the time period of interest. If startTime is not provided, a date of January 1, 2004, is assumed (this is the oldest available Google Trends data).
• End of the time period of interest. If the end time is not provided, the current date is selected.
• Geolocation of interest (string).
• Preferred language (string; defaults to English).
• Category to search within (number; defaults to all categories).
• The granularity of the geo searches (enumerated string [‘COUNTRY’, ‘REGION’, ‘CITY’, ‘DMA’]).

Two raw JSON examples of search result instances are:

Example 1 - Returning top related queries for 'Westminster Dog show' with the default start time, end time, and geolocation:

{
  "default": {
    "rankedList": [{
      "rankedKeyword": [
        { "query": "dog show 2016", "value": 100, "formattedValue": "100", "link": "/" },
        { "query": "2016 westminster dog show", "value": 95, "formattedValue": "95", "link": "/" },
        { "query": "dogs", "value": 20, "formattedValue": "20", "link": "/" }
      ]
    }, {
      "rankedKeyword": [
        { "query": "dog show 2016", "value": 836500, "formattedValue": "Breakout", "link": "/" },
        { "query": "2016 westminster dog show", "value": 811550, "formattedValue": "Breakout", "link": "/" },
        { "query": "who won the westminster dog show", "value": 59000, "formattedValue": "Breakout", "link": "/" }
      ]
    }]
  }
}

Example 2 - Returning top related topics for 'Chipotle' from January 1st, 2015 to February 10th, 2017:

{
  "default": {
    "rankedList": [{
      "rankedKeyword": [
        { "topic": { "mid": "/m/01b566", "title": "Chipotle Mexican Grill", "type": "Restaurant company" }, "value": 100, "formattedValue": "100", "link": "/" },
        { "topic": { "mid": "/m/02f217", "title": "Chipotle", "type": "Jalape\u00f1o" }, "value": 5, "formattedValue": "5", "link": "/" },
        { "topic": { "mid": "/m/01xg7s", "title": "Chorizo", "type": "Topic" }, "value": 0, "formattedValue": "0", "link": "/" }
      ]
    }, {
      "rankedKeyword": [
        { "topic": { "mid": "/m/09_yl", "title": "E. coli", "type": "Bacteria" }, "value": 40700, "formattedValue": "Breakout", "link": "/" },
        { "topic": { "mid": "/m/0dqc4", "title": "Caridea", "type": "Animal" }, "value": 40, "formattedValue": "+40%", "link": "/" }
      ]
    }]
  }
}

The SAINT Social Network Analyzer tools and the Web Crawler are already in the requirements specification and design phase, towards implementation within the duration of WP5.
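As an illustration of how the Google Trends responses of Examples 1 and 2 could be processed, the sketch below flattens the nested `rankedList` structure into simple rows. The function name `ranked_rows` and the row layout are hypothetical; the code relies only on the keys that appear in the two examples and makes no assumption about what each ranked list represents.

```python
import json


def ranked_rows(payload):
    """Flatten a related-queries/topics response into rows.

    Each row is (list_index, label, value, formatted_value), where label is
    the query string or, for topic results, the topic title.
    """
    data = json.loads(payload)
    rows = []
    for list_index, ranked in enumerate(data["default"]["rankedList"]):
        for item in ranked["rankedKeyword"]:
            label = item["query"] if "query" in item else item["topic"]["title"]
            rows.append((list_index, label, item["value"], item["formattedValue"]))
    return rows
```

Rows in this form can be written directly to CSV or loaded into a table for the visualization step of the SNA process.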

6.3 Terms of use of the tools

Because SAINT’s tools, which will be developed and operated by CTI, will handle sensitive and risky information, certain measures and precautions need to be taken in order to ensure safe operation of the tools and ethical use of the gathered information. First, as a formal obligation observed by any entity gathering and storing information which may be considered personal or sensitive, CTI (as well as other involved partners) has submitted the data handling declaration form to the Hellenic Data Protection Authority. Then, under KEMEA’s guidance, CTI will implement the following terms of use for the tools and the computers on which they will run (the dots are placeholders which will be filled in with the specific details when the systems and tools are up and running):

• The software will be installed on ……. (number of computers at CTI’s premises) computers. The location of these computers will be ……….. (the room into which the computers on which the tools run will operate).

• The software will use the following static IPs in order to connect to the Internet/Deep Web.
  o 1st Static IP
  o 2nd Static IP

• The software will be operated exclusively by the following authorized personnel of (Name of Organisation) of the consortium, under the supervision of (…), either physically or by remote access (e.g. secure VPN):
  o 1
  o 2

• A screen recorder will be installed on each of the above-mentioned computers, so that all activities of the users are recorded.


• Log files of every operation will be created, along with the appropriate timestamps.

• In order to avoid unwanted access to illegal content, no files will be downloaded to the above- mentioned computers. Metadata and calculated hash values will be stored, instead.

• A “report on illegal content” will be created (number of times) times per month. This report will include any suspicious or illegal content (path, timestamp, metadata, hash value) that the software will trace. This report will be officially sent, from the Consortium to the Cyber Crime Division of the Hellenic Police, for any further actions.
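The "metadata and hash values instead of files" and timestamped-log rules above could be sketched as follows. This is an assumption-laden illustration, not the agreed procedure: SHA-256 as the hash function, the record field names, and the JSON-lines log format are all choices made here for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_record(path, content):
    """Describe a piece of content (bytes) without keeping the content itself."""
    return {
        "path": path,                                         # where it was found
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when it was seen
        "sha256": hashlib.sha256(content).hexdigest(),        # assumed hash function
        "size_bytes": len(content),                           # example metadata
    }


def append_log(record, log_path):
    """Append one record per line to a timestamped operation log."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
```

Because only the digest and metadata are written, the log can later feed the periodic report (path, timestamp, metadata, hash value) without the content ever being stored on the machine.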

7. Conclusion

In this deliverable we surveyed the existing vulnerability markets, with emphasis on 0-day exploits, focusing on their financial parameters as well as their usefulness as intelligence feeds and information sources for SAINT’s tools. As expected, there is no limit to the kind, volume, and format of information available on the Web about vulnerabilities and their characteristics. We tried to identify the information sources which are authoritative, comprehensive and allow automated processing. These sources will form the basis for the financial analysis in WP3 as well as the tool-based processing in WP5. Subsequent deliverables (mainly from these WPs) will expand and elaborate on the contents of this deliverable by providing more details about the identified information sources, after their automated processing with the WP5 tools and their financial analysis within the context of WP3.

We should note, however, the (nearly) insurmountable difficulties involved in collecting information from Deep Web sources and black markets. Our research shows that any effort to infiltrate these sources and obtain data with computer-based methods is bound to fail since, even with specialized browsers such as Tor or I2P, access to the entry points of the markets is blocked. Efforts will be expended on alleviating this difficulty, to the degree possible, but our team will rely mostly on HUMINT (i.e. human intelligence) methods, in the sense that members of our team, in close collaboration with KEMEA, will periodically try to access the Deep Web and black markets “manually”, through dedicated computer systems and under the protection of the Greek authorities (through KEMEA).
Our team will proceed along two directions, using automated tools as well as HUMINT methods, since we believe that this is the best way to combine vulnerability information from accessible information sources, as well as more “esoteric” sources, from the Deep Web and the black markets. With respect to the accessible sources, our team has been working on the Social Network Analyzer (SNA) with some first successful results with respect to collecting information from public Twitter accounts. The analyser will be extended to work on any accessible information source, such as the ones identified in this deliverable. In parallel, our team has been experimenting with the Deep Web Crawler using open source browsers. The details of these investigations, as well as the design and implementation of the SNA and the Deep Web Crawler, will be given in the deliverables of WP5.


