SecureCloud

Joint EU-Brazil Research and Innovation Action
SECURE BIG DATA PROCESSING IN UNTRUSTED CLOUDS
https://www.securecloudproject.eu/

Analysis of existing technologies
D2.1

Due date: 31 December 2016 Submission date: 23 January 2017

Start date of project: 1 January 2016

Document type: Deliverable
Work package: WP2
Editor: Andrey Brito (UFCG)
Reviewer: Michal Fischer (IEC)
Reviewer: Christian Priebe (IMP)

Dissemination Level
√ PU Public
  CO Confidential, only for members of the consortium (including the Commission Services)
  CI Classified, as referred to in Commission Decision 2001/844/EC

SecureCloud has received funding from the European Union’s Horizon 2020 research and innovation programme and was supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under grant agreement No 690111.

Tasks related to this deliverable:
Task No. T2.1 – Evaluation of technology trade-offs – Partners involved○: UFCG∗, TUD, IMP, UniNE, CS

○This task list may not be equivalent to the list of partners contributing as authors to the deliverable
∗Task leader

Executive Summary

This document provides an analysis of the existing technologies that can be used to address the goal of the SecureCloud project: secure big data processing in untrusted clouds. It is intended to serve as contextualization for the SecureCloud project partners and to support the initial decisions. In addition, it describes how the workflows for deploying secure applications should be adapted, which will directly affect the implementation of the APIs.

Although the Brazilian National Congress recently began to consider a bill that would allow the president to require that personal data of Brazilian citizens be kept in the country, Brazil does not yet have a comprehensive data protection law. Therefore, in this document, we take as the basis for our analysis the influential European Network and Information Security Agency (ENISA) document on risks and recommendations for information security [36], together with other authoritative publications on the subject, such as NIST’s guidelines on security and privacy for cloud computing [59]. We then identify the risks that are relevant for the selection of the technology, classified into three categories: (i) policy and organizational; (ii) technical; and (iii) legal.

Next, we survey the existing technologies, considering the two basic approaches for secure computation in the cloud: hardware-based and software-based. As representatives of the hardware-based approach we consider Intel SGX [31], secure co-processors, ARM TrustZone [14], and AMD Memory Encryption [34] (which, at the time of writing, had not yet been officially launched). As a representative of the software-based approach we consider homomorphic encryption. Finally, we revisit the relevant risks and identify how the chosen technology, Intel SGX, can be used to mitigate them or, when applicable, highlight the risks that this technology does not mitigate.
We end this investigation by implementing a proof-of-concept application that helped us to identify the workflow for deploying secure applications in the cloud and, consequently, to design the main services and the API requirements for the SecureCloud infrastructure.

Contents

Executive Summary

1 Objectives

2 Approach

3 Risk-based assessment of cloud usage
  3.1 Vulnerabilities
  3.2 Risks
    3.2.1 Policy and organizational risks
    3.2.2 Technical risks
    3.2.3 Legal risks

4 Analysis of existing technologies
  4.1 Intel SGX
    4.1.1 Components
    4.1.2 Usage
    4.1.3 Limitations
  4.2 Trusted and Secure Boot
    4.2.1 Components
    4.2.2 Usage
    4.2.3 Limitations
  4.3 Secure Computation
    4.3.1 Components
    4.3.2 Usage
    4.3.3 Limitations
  4.4 Conclusion
    4.4.1 Attestation
    4.4.2 Confidentiality
    4.4.3 Feasibility
    4.4.4 Robustness
    4.4.5 Summary

5 Evaluation of SGX
  5.1 Attacks
    5.1.1 Address Translation Attacks
    5.1.2 Physical Attacks
    5.1.3 Privileged Software Attacks
    5.1.4 Software Attacks on Peripherals
    5.1.5 Cache Timing Attacks
  5.2 Risks
    5.2.1 Policy and organizational risks
    5.2.2 Technical risks
    5.2.3 Legal risks
  5.3 Practical evaluation
    5.3.1 Intel SGX remote attestation


    5.3.2 The proof-of-concept application
  5.4 Conclusion

6 Requirements for secure computation in the cloud
  6.1 Overview of the deployment process of a typical application
    6.1.1 Containers and Docker
    6.1.2 Basic Applications
    6.1.3 Orchestrated applications
  6.2 Secure services for secure applications
    6.2.1 Basic applications
    6.2.2 Orchestrated application
  6.3 Conclusion

7 Final remarks

List of Figures

4.1 Comparison between reading from SGX enclave and from unencrypted memory
4.2 SGX compared with homomorphic encryption

5.1 Simplified scheme of the proof of concept [73]
5.2 SGX imposed overhead

1 Objectives

In this document, we evaluate technologies for implementing secure big data applications on servers owned by a cloud provider. The set of evaluated technologies includes well-known hardware alternatives, such as Secure Boot, based on the Trusted Platform Module (TPM) specifications and supported by major hardware producers, as well as features that currently require restructuring how applications are developed and deployed, such as Intel Software Guard Extensions (SGX). As a result, one technology has been validated and selected. We also took steps to identify how this technology, Intel SGX, will be integrated into a cloud infrastructure environment, including basic services and the respective APIs. Although the concepts and APIs are agnostic to the cloud middleware being used, our evaluation considered OpenStack, the de facto standard platform for managing cloud computing infrastructures. The definitions of the services to be provided and of the APIs will guide the tasks and work packages developed in the remainder of the project.

2 Approach

To guide the identification of technology alternatives for achieving the project’s goals, and to validate these technologies, we structured this document as follows:

1. A risk and vulnerability assessment for secure computing in the cloud is made, based on the ENISA [36] and NIST [59] documents. The risks are classified into three categories: organizational, technical, and legal.

2. Technologies that can be used for processing data with security guarantees are listed.

3. The practical use of the identified technologies (Secure Boot, Trusted Boot, Intel SGX, homomorphic encryption) is evaluated. This evaluation leads to a proof-of-concept implementation with the selected core technology, Intel SGX, and helps in understanding the major potential obstacles in using these technologies.

4. The integration of the selected technologies into a cloud ecosystem to achieve the project goals is discussed.

3 Risk-based assessment of cloud usage

According to NIST [59], in cloud computing “the common characteristics most share are on-demand scalability of highly available and reliable pooled computing resources, secure access to metered services from nearly anywhere, and dislocation of data from inside to outside the organization.” The same source declares that “aspects of these characteristics have been realized to a certain extent”, but that “cloud computing remains a work in progress”. As a general goal, this document assesses how a set of new technologies contributes to advancing the realization of these characteristics.

Therefore, in order to analyze current hardware technologies supporting secure processing in the context of cloud computing, we adopt a risk-based approach, assessing how these technologies contribute towards mitigating the risks commonly associated with cloud computing. Risks are commonly associated with a set of vulnerabilities, i.e., conditions or features of a system that may open the way for system malfunction or malicious attacks. The level of risk may be reduced by introducing countermeasures that help to prevent attacks.

We take as the basis for our analysis the influential European Network and Information Security Agency (ENISA) document “Cloud Computing: Benefits, risks and recommendations for information security” [36], together with other authoritative publications on the subject, such as NIST’s “Guidelines on Security and Privacy in Public Cloud Computing” [59] and the “NIST Cloud Computing Security Reference Architecture” [60]. These two documents are among others listed in [19], a matrix relating fundamental security concepts to documents that guide the process of assessing the overall security risk of using cloud computing.

The rest of this chapter is organized as follows. In Section 3.1 we discuss a set of vulnerabilities based on the ENISA and NIST documents discussed above.
Then, in Section 3.2, we discuss a set of risks associated with data processing in the cloud. Understanding risks and vulnerabilities creates the basis for investigating technologies to support the processing of sensitive data in the cloud.

3.1 Vulnerabilities

In this section we enumerate a set of vulnerabilities that may be relevant in the context of data processing in a cloud environment. Some vulnerabilities are specific to cloud computing; others are more general and apply to any system that may be accessed through the Internet. After describing the vulnerabilities, the next section associates them with risks that may or may not be addressed by the available approaches to provide secure data processing services.

V1. AAA VULNERABILITIES: AAA (Authentication, Authorization, Accounting) vulnerabilities arise out of situations such as (i) insecure storage of cloud access credentials by a customer; (ii) insufficient roles available; and (iii) credentials stored on a transitory machine. The consequences could be unauthorized resource access, difficulties in tracking misuse of resources, and other security incidents. In the context of cloud computing, password-based authentication is considered insufficient, and stronger authentication for accessing cloud resources is required.

V2. USER PROVISIONING VULNERABILITIES: User provisioning (e.g., creation) vulnerabilities are usually the consequence of (i) a provisioning process not under the control of the customer, (ii) the identity of the customer not being adequately verified at registration, (iii) delays in synchronization between cloud system components, (iv) multiple unsynchronized copies of identity data, and (v) credentials vulnerable to interception and replay.

V3. USER DE-PROVISIONING VULNERABILITIES: Due to time delays in the revocation process, revoked credentials may remain valid for some time after revocation.


V4. REMOTE ACCESS TO MANAGEMENT INTERFACE: As a consequence of weak authentication, for example, the cloud infrastructure may be compromised due to vulnerabilities in end-point machines.

V5. HYPERVISOR VULNERABILITIES: Attacks on the hypervisor are critical because it fully controls the physical resources and the VMs running on top of it. By exploiting the hypervisor, every VM may potentially be compromised.

V6. LACK OF RESOURCE ISOLATION: Resources used by one customer can affect resources used by another customer. In IaaS cloud computing, physical resources are typically shared by multiple virtual machines owned by different cloud consumers. Hypervisors used in IaaS clouds offer rich APIs with interfaces that are exposed to cloud customers, and vulnerabilities in their security models may lead to unauthorized access to these shared resources and customer information. Attackers may be able to manipulate assets belonging to the cloud and provoke denial of service, data leakage, data compromise, and direct financial damage. Other risks to resource isolation may arise from a lack of controls on cloud cartography and co-residence, and from cross-VM side-channel vulnerabilities.

V7. LACK OF REPUTATIONAL ISOLATION: Activities of one customer may impact the reputation of another customer.

V8. COMMUNICATION ENCRYPTION VULNERABILITIES: These vulnerabilities concern the possibility of reading data in transit.

V9. LACK OF OR WEAK ENCRYPTION OF ARCHIVES AND DATA IN TRANSIT: Failure to encrypt data in transit or held in archives and databases.

V10. IMPOSSIBILITY OF PROCESSING DATA IN ENCRYPTED FORM: Despite recent advances in homomorphic encryption, there is little prospect of any commercial system being able to maintain encryption during processing for general computations. For specific cases, efficient algorithms already exist; nevertheless, each problem requires a custom solution.
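To make this limitation concrete, the sketch below uses textbook RSA, which happens to be homomorphic with respect to multiplication only: the product of two ciphertexts decrypts to the product of the plaintexts, but there is no corresponding operation for addition, so general computations remain out of reach. The parameters are toy values for illustration, not a secure configuration.

```python
# Textbook RSA is multiplicatively homomorphic: a server can multiply
# ciphertexts without knowing the private key, but it cannot, e.g.,
# add them, illustrating why V10 matters for general computation.
# Toy parameters for illustration only; never use such small keys.

p, q = 61, 53                  # toy primes
n = p * q                      # modulus: 3233
e, d = 17, 2753                # public / private exponents (e*d ≡ 1 mod φ(n))

def enc(m: int) -> int:
    """Textbook RSA encryption of m under the public key (n, e)."""
    return pow(m, e, n)

def dec(c: int) -> int:
    """Textbook RSA decryption of c under the private key (n, d)."""
    return pow(c, d, n)

a, b = 12, 15
c = (enc(a) * enc(b)) % n      # multiply ciphertexts: no key needed
assert dec(c) == (a * b) % n   # decrypts to the product of the plaintexts
print(dec(c))                  # 180
```

Schemes deployed in practice (e.g., RSA with OAEP padding) deliberately destroy this property, and even fully homomorphic schemes that support both addition and multiplication remain orders of magnitude slower than plaintext processing.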

V11. POOR KEY MANAGEMENT PROCEDURES: Cloud computing infrastructures require the management and storage of different kinds of keys (session keys to protect data in transit, file encryption keys, key pairs identifying cloud providers, key pairs identifying customers, authorization tokens, and revocation certificates). Some difficulties in key management regarding cloud computing are the following.

• Virtual machines do not have a fixed hardware infrastructure, and cloud-based contents are usually geographically distributed, which makes it harder to apply standard controls, such as hardware security module (HSM) storage, to keys on cloud infrastructures; HSMs, being strongly physically protected, are not easily distributed or replicated geographically.
• Key management standards do not provide standardized wrappers for interfacing with distributed systems.
• Key management interfaces accessible via the public Internet are more vulnerable.


• Distribution of secrets needed for virtual machine authentication may present problems of scalability; dynamic scaling of hierarchical trust authorities is difficult to achieve because of the resource overhead in creating new authorities. • Revocation of keys within a distributed architecture is expensive; centralized solutions such as OCSP are expensive and do not necessarily reduce the risk unless the CA and the CRL are tightly bound.

V12. KEY GENERATION – LOW ENTROPY FOR RANDOM NUMBER GENERATION: The combination of standard system images, virtualization technologies, and lack of input devices may lead to low entropy; therefore, an attacker on one virtual machine may be able to guess encryption keys generated on other virtual machines.
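The danger can be sketched in a few lines of Python. The key-derivation function, seed size, and seed value below are hypothetical stand-ins; the point is only that when the effective seed space is small, an attacker can enumerate it offline.

```python
# Illustration of V12 (hypothetical scenario): a key derived from a
# low-entropy source -- here a seed with only 2**16 possible values --
# can be recovered by exhaustive search.
import hashlib

def derive_key(seed: int) -> bytes:
    """Stand-in key-derivation routine whose only entropy is `seed`."""
    return hashlib.sha256(seed.to_bytes(4, "big")).digest()

# Victim VM draws its seed from a weak entropy pool.
victim_seed = 40961                      # unknown to the attacker
victim_key = derive_key(victim_seed)

# Attacker enumerates the entire seed space offline.
recovered_seed = next(s for s in range(2 ** 16)
                      if derive_key(s) == victim_key)

assert recovered_seed == victim_seed     # key material fully recovered
print("key recovered by exhausting 2**16 candidate seeds")
```

In practice, key generation on virtual machines should draw from the host’s hardware entropy sources, exposed through the operating system’s cryptographically secure random number generator, rather than from guessable machine state.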

V13. LACK OF STANDARD TECHNOLOGIES AND SOLUTIONS: Lack of standards may lead to lock-in, which can become a major problem if the provider ceases operation.

V14. NO SOURCE ESCROW AGREEMENT: Lack of source escrow means that if a PaaS or SaaS provider goes into bankruptcy, its customers are not protected.

V15. INACCURATE MODELLING OF RESOURCE USAGE: Because cloud services are provisioned statistically, inaccurate modelling may lead to resource exhaustion. Failures may be due to inaccurate modelling of resource usage, failure of resource allocation algorithms due to unexpected events or to job or packet classification, and failures in overall resource provisioning.

V16. NO CONTROL ON VULNERABILITY ASSESSMENT PROCESS: Restrictions on port scanning and vulnerability testing, which prevent customers from assessing the security of the services they use, constitute an important vulnerability.

V17. POSSIBILITY THAT INTERNAL (CLOUD) NETWORK PROBING WILL OCCUR: Cloud customers can perform port scans and other tests on other customers within the internal network.

V18. POSSIBILITY THAT CO-RESIDENCE CHECKS WILL BE PERFORMED: Side-channel attacks exploiting a lack of resource isolation allow attackers to determine which resources are shared by which customers.

V19. LACK OF FORENSIC READINESS: Many providers do not implement appropriate services and terms of use to improve forensic readiness, such as access to the IP logs of clients accessing content and, in the case of IaaS, forensic services such as recent VM and disk images.

V20. SENSITIVE MEDIA SANITIZATION: Shared tenancy of physical storage resources means that sensitive data may leak because of difficulties in destroying data at the end of its life-cycle.

V21. SYNCHRONIZING RESPONSIBILITIES OR CONTRACTUAL OBLIGATIONS EXTERNAL TO CLOUD: Cloud customers are often unaware of the contents of the terms of service and may therefore wrongly attribute responsibility for activities such as archive encryption to the cloud provider.


V22. CROSS-CLOUD APPLICATIONS CREATING HIDDEN DEPENDENCY: Hidden dependencies exist in the services supply chain, and the cloud provider architecture may not support continued operation when the third parties involved have been separated from the service provider.

V23. SLA CLAUSES WITH CONFLICTING PROMISES TO DIFFERENT STAKEHOLDERS: SLA clauses may contradict other clauses or clauses from other providers.

V24. SLA CLAUSES CONTAINING EXCESSIVE BUSINESS RISK: SLAs may carry too much business risk for a provider, given the actual risk of technical failures.

V25. AUDIT OR CERTIFICATION NOT AVAILABLE TO CUSTOMERS: The cloud provider cannot provide any assurance to the customer via audit certification, e.g. because it is using open source hypervisors or customized versions of them which have not reached any Common Criteria certification.

V26. CERTIFICATION SCHEMES NOT ADAPTED TO CLOUD INFRASTRUCTURES: There might not be cloud-specific controls, leading to missed security vulnerabilities.

V27. INADEQUATE RESOURCE PROVISIONING AND INVESTMENTS IN INFRASTRUCTURE: If predictive models fail, the cloud provider service can fail for a long period.

V28. NO POLICIES FOR RESOURCE CAPPING: When resource use is unpredictable, there should be a flexible and configurable way to set limits on resources.

V29. STORAGE OF DATA IN MULTIPLE JURISDICTIONS AND LACK OF TRANS- PARENCY ABOUT IT: Companies may violate regulations if clear information is not provided about the jurisdiction of storage.

V30. LACK OF INFORMATION ON JURISDICTIONS: Customers should be informed about data stored or processed in high risk jurisdictions where it is vulnerable to confiscation by forced entry.

V31. LACK OF COMPLETENESS AND TRANSPARENCY IN TERMS OF USE: Customers may be unaware of the implications of the accepted terms.

V32. LACK OF SECURITY AWARENESS: Cloud customers might not be aware of the risks they may face when migrating into the cloud.

V33. LACK OF VETTING PROCESSES: Since there may be very high privilege roles within cloud providers, due to the scale involved, a lack of, or inadequate, vetting of the risk profile of staff with high privilege roles is an important vulnerability.

V34. UNCLEAR ROLES AND RESPONSIBILITIES: Inadequate attribution of roles and responsibilities in the cloud provider organization also constitutes a vulnerability.

V35. POOR ENFORCEMENT OF ROLE DEFINITIONS: A failure to segregate roles may lead to excessively privileged roles which can make extremely large systems vulnerable.


V36. NEED-TO-KNOW PRINCIPLE NOT APPLIED: Giving parties unnecessary access to data constitutes an unnecessary risk.

V37. INADEQUATE PHYSICAL SECURITY PROCEDURES: These include lack of physical perimeter controls and of electromagnetic shielding for critical assets vulnerable to eavesdropping.

V38. MISCONFIGURATION: These vulnerabilities include inadequate application of security baselines and hardening procedures, human error, and untrained administrators.

V39. SYSTEM OR OS VULNERABILITIES: For example, zero-day exploits can be an issue in systems accessible through the Internet.

V40. UNTRUSTED SOFTWARE: For example, tools and scripts for administration can be installed from unsafe origins (e.g., subjected to malicious modifications).

V41. LACK OF, OR A POOR AND UNTESTED, BUSINESS CONTINUITY AND DISASTER RECOVERY PLAN: Customers may lose their data in case of major provider failures.

V42. LACK OF, OR INCOMPLETE OR INACCURATE, ASSET INVENTORY: This vulnerability may lead to inadequate service provisioning.

V43. LACK OF, OR POOR OR INADEQUATE, ASSET CLASSIFICATION: This vulnerability may lead to inadequate service provisioning.

V44. UNCLEAR ASSET OWNERSHIP: This vulnerability may lead to inconsistent responsibility division between providers and customers.

V45. POOR IDENTIFICATION OF PROJECT REQUIREMENTS: These include a lack of consideration of security and legal compliance requirements, no systems and applications user involvement, and unclear or inadequate business requirements, among others.

V46. POOR PROVIDER SELECTION: Price-focused decisions may lead to providers with higher service risks.

V47. LACK OF SUPPLIER REDUNDANCY: Loss of data and unavailability can affect customer’s applications and data.

V48. APPLICATION VULNERABILITIES OR POOR PATCH MANAGEMENT: These vulnerabilities include bugs in the application code, conflicting patching procedures between provider and customer, application of untested patches, and vulnerabilities in browsers.

V49. RESOURCE CONSUMPTION VULNERABILITIES: This vulnerability includes resource exhaustion due to unexpected events, failure of resource allocation algorithms using job or packet classification, and failures in overall resource provisioning.

V50. BREACH OF NDA BY PROVIDER: Loss of data control by the customers.

V51. LIABILITY FROM DATA LOSS (CP): Customer’s business may be considerably damaged by losses in the data on the provider.


V52. LACK OF POLICY OR POOR PROCEDURES FOR LOGS COLLECTION AND RETENTION: The lack of auditing capabilities may lead to difficulties in problem detection and tracking.

V53. INADEQUATE OR MISCONFIGURED FILTERING RESOURCES: This includes inadequate access or protection of resources.

3.2 Risks

Once we have an understanding of the typical vulnerabilities that can affect data processing services, we classify them according to the types of risks that they cause. Deciding whether an application can be hosted in the cloud then starts by identifying which risks are relevant and, subsequently, which vulnerabilities are associated with these risks. The level of risk varies with the type of cloud architecture (e.g., SaaS, PaaS, IaaS). ENISA defines three risk categories:

• Policy and organizational risks may have consequences in the governance or reputation of the customer services or applications executed in the cloud;

• Technical risks may cause loss of data or control of this data, as well as availability and quality of service metrics of the application;

• Legal risks result from contractual breaches involving the application provider (e.g., concerning the use of resources from the cloud provider).

The following sections discuss risks, relating them to the previously defined vulnerabilities.

3.2.1 Policy and organizational risks

R.1 LOCK-IN Cloud providers have few incentives to facilitate migration, and there is thus currently little support for it.

Vulnerabilities

V13. Lack of standard technologies and solutions.
V31. Lack of completeness and transparency in terms of use.
V46. Poor provider selection.
V47. Lack of supplier redundancy.

R.2 LOSS OF GOVERNANCE By going into the cloud, a client turns over control to the Cloud Provider (CP) on a number of issues that may impact security, including confidentiality, integrity and availability of data, as well as deterioration of performance and quality of service.


Vulnerabilities

V13. Lack of standard technologies and solutions.
V14. No source escrow agreement.
V16. No control on vulnerability assessment process.
V21. Synchronizing responsibilities or contractual obligations external to cloud.
V22. Cross-cloud applications creating hidden dependency.
V23. SLA clauses with conflicting promises to different stakeholders.
V25. Audit or certification not available to customers.
V26. Certification schemes not adapted to cloud infrastructures.
V29. Storage of data in multiple jurisdictions and lack of transparency about it.
V30. Lack of information on jurisdictions.
V31. Lack of completeness and transparency in terms of use.
V34. Unclear roles and responsibilities.
V35. Poor enforcement of role definitions.
V44. Unclear asset ownership.

R.3 COMPLIANCE CHALLENGES Certification investments and other kinds of compliance can be put at risk by migration if the cloud provider cannot provide evidence of compliance to the relevant requirement or does not permit audits. Difficulties in obtaining from cloud providers evidence of their own compliance with the relevant requirements or in carrying out audits may seriously limit the willingness to migrate to the cloud.

Vulnerabilities

V13. Lack of standard technologies and solutions.
V25. Audit or certification not available to customers.
V26. Certification schemes not adapted to cloud infrastructures.
V29. Storage of data in multiple jurisdictions and lack of transparency about it.
V30. Lack of information on jurisdictions.
V31. Lack of completeness and transparency in terms of use.

R.4 LOSS OF BUSINESS REPUTATION DUE TO CO-TENANT ACTIVITIES Resource sharing means that malicious activities carried out by one tenant may affect the reputation of another tenant.


Vulnerabilities

V5. Hypervisor vulnerabilities.
V6. Lack of resource isolation.
V7. Lack of reputational isolation.

R.5 CLOUD SERVICE TERMINATION OR FAILURE Some providers may go out of business or restructure their service portfolio, terminating some cloud computing services. This could lead to a loss or deterioration of service delivery performance, quality of service, and investment, and have a significant impact on the cloud customer’s ability to meet its duties and obligations towards its own customers.

Vulnerabilities

V31. Lack of completeness and transparency in terms of use.
V46. Poor provider selection.
V47. Lack of supplier redundancy.

R.6 CLOUD PROVIDER ACQUISITION Acquisition of the cloud provider could make it impossible to comply with the security requirements.

Vulnerabilities

V31. Lack of completeness and transparency in terms of use.

R.7 SUPPLY CHAIN FAILURE If specialized tasks of its production chain are outsourced by the cloud provider to third parties, the resulting level of security may depend on the level of security of each of the links. Possible consequences are unavailability of services and loss of data confidentiality, integrity, and availability, apart from economic and reputational damage. Moreover, if there are dependencies on a third-party identity management service, an interruption of that service or weaknesses in its security procedures may affect the availability and confidentiality of the cloud customer.

Vulnerabilities

V22. Cross-cloud applications creating hidden dependency. V31. Lack of completeness and transparency in terms of use. V46. Poor provider selection. V47. Lack of supplier redundancy.


3.2.2 Technical risks

R.8 RESOURCE EXHAUSTION (UNDER OR OVER PROVISIONING) Cloud services are by definition on-demand services, and hence there is a certain level of risk associated with the allocation of the resources of a cloud service. Possible consequences are service unavailability, infrastructure oversizing, access control compromise, as well as reputational and economic damage.

Vulnerabilities

V15. Inaccurate modeling of resource usage.
V27. Inadequate resource provisioning and investments in infrastructure.
V28. No policies for resource capping.
V47. Lack of supplier redundancy.

R.9 ISOLATION FAILURE Multi-tenancy and shared resources are among the defining characteristics of cloud computing, where computing capacity, storage, and network are shared. This class of risks includes the failure of mechanisms separating storage, memory, and routing. Possible attacks include guest-hopping, SQL injection exposing multiple customers’ data stored in the same table, and side channel attacks.

Vulnerabilities

V6. Lack of resource isolation.
V18. Possibility that co-residence checks will be performed.
V28. No policies for resource capping.
V38. Misconfiguration.
V39. System or OS vulnerabilities.
V49. Resource consumption vulnerabilities.
V53. Inadequate or misconfigured filtering resources.

R.10 CLOUD PROVIDER MALICIOUS INSIDER - ABUSE OF HIGH PRIVILEGE ROLES Insider malicious activities could have an impact on confidentiality, data integrity, and availability. Malicious insiders may cause great harm and constitute a risk that is very hard to counteract.

Vulnerabilities

V8. Communication encryption vulnerabilities.
V9. Lack of or weak encryption of archives and data in transit.
V10. Impossibility of processing data in encrypted form.
V11. Poor key management procedures.


V25. Audit or certification not available to customers.
V26. Certification schemes not adapted to cloud infrastructures.
V33. Lack of vetting processes.
V34. Unclear roles and responsibilities.
V36. Need-to-know principle not applied.
V37. Inadequate physical security procedures.
V52. Lack of policy or poor procedures for logs collection and retention.

R.11 MANAGEMENT INTERFACE COMPROMISE (MANIPULATION, AVAILABILITY OF INFRASTRUCTURE) The Internet-accessible customer management interfaces of public cloud providers, which control virtual machines and the operation of the overall cloud system, pose an increased risk, especially when combined with remote access and web browser vulnerabilities.

Vulnerabilities

V1. AAA vulnerabilities.
V4. Remote access to management interface.
V38. Misconfiguration.
V39. System or OS vulnerabilities.
V48. Application vulnerabilities or poor patch management.

R.12 INTERCEPTING DATA IN TRANSIT Cloud computing implies that more data is in transit over public networks than in traditional infrastructures, as a result of, for example, data transferred to synchronize multiple distributed machine images, images distributed across multiple physical machines, and data transferred between the cloud infrastructure and remote web clients. Possible threat sources in this context are, among others, sniffing, spoofing, man-in-the-middle attacks, side-channel attacks, and replay attacks.

Vulnerabilities

V1. AAA vulnerabilities.
V8. Communication encryption vulnerabilities.
V9. Lack of or weak encryption of archives and data in transit.
V17. Possibility that internal (cloud) network probing will occur.
V18. Possibility that co-residence checks will be performed.
V31. Lack of completeness and transparency in terms of use.

R.13 DATA LEAKAGE ON UP/DOWNLOAD, INTRA-CLOUD As R.12, but applies only to the transfer of data between the cloud provider and the cloud customer.


Vulnerabilities

V1. AAA vulnerabilities.
V8. Communication encryption vulnerabilities.
V10. Impossibility of processing data in encrypted form.
V17. Possibility that internal (cloud) network probing will occur.
V18. Possibility that co-residence checks will be performed.
V48. Application vulnerabilities or poor patch management.

R.14 INSECURE OR INEFFECTIVE DELETION OF DATA A request to delete a cloud resource may not result in complete wiping of the data, which, given multiple tenancies and the reuse of hardware resources, constitutes an added risk to the client. Data may remain available beyond the lifetime specified in the security policy. If effective encryption is used, the level of risk may be considered lower.

Vulnerabilities

V20. Sensitive media sanitization.

R.15 DISTRIBUTED DENIAL OF SERVICE (DDOS) The resources of the cloud provider may have their performance degraded by external accesses that aim to overload them.

Vulnerabilities

V38. Misconfiguration. V39. System or OS vulnerabilities. V53. Inadequate or misconfigured filtering resources.

R.16 ECONOMIC DENIAL OF SERVICE (EDOS) A cloud customer’s resources may be used by other parties in a malicious way. This includes identity theft, unexpected loads on resources, and attacks that use a public channel to use up the customer’s metered resources.

Vulnerabilities

V1. AAA vulnerabilities. V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V4. Remote access to management interface. V28. No policies for resource capping.

R.17 LOSS OF ENCRYPTION KEYS This includes disclosure of secret keys or passwords to malicious parties, the loss or corruption of those keys, or their unauthorized use for authentication and non-repudiation.

Vulnerabilities

V10. Impossibility of processing data in encrypted form. V11. Poor key management procedures. V12. Key generation: low entropy for random number generation. V37. Inadequate physical security procedures.

R.18 UNDERTAKING MALICIOUS PROBES OR SCANS These are indirect threats to the assets that can be used to collect information in the context of a hacking attempt, with potential loss of confidentiality, integrity, and availability of service and data.

Vulnerabilities

V17. Possibility that internal (cloud) network probing will occur. V18. Possibility that co-residence checks will be performed.

R.19 COMPROMISE SERVICE ENGINE The service engine code that sits above the physical hardware resources in each cloud architecture, e.g. the hypervisor in IaaS clouds, can have vulnerabilities and is prone to attacks or unexpected failure. For instance, an attacker can compromise the service engine by hacking it from inside a virtual machine in order to gain access to the data contained inside another customer environment, to monitor and modify the information inside them, or to reduce the resources assigned to them, causing denial of service.

Vulnerabilities

V5. Hypervisor vulnerabilities. V6. Lack of resource isolation.

R.20 CONFLICTS BETWEEN CUSTOMER HARDENING PROCEDURES AND CLOUD ENVIRONMENT The failure of customers to properly secure their environments may pose a vulnerability to the cloud platform if the cloud provider has not taken the necessary steps to provide isolation. Cloud providers should therefore articulate their isolation mechanisms and provide best practice guidelines to assist customers in securing their resources. Assumptions by the customer that the cloud provider is conducting all activities required to ensure data security may place unnecessary risk on the customer’s data; cloud customers must identify their responsibilities and comply with them. The co-location of many customers may also cause conflict for the cloud provider, as customers’ communication security requirements are likely to diverge from each other, which worsens as the number of tenants and the disparity of their requirements increase. Therefore, cloud providers must be in a position to deal with these challenges by means of technology, policy, and transparency.

Vulnerabilities

V31. Lack of completeness and transparency in terms of use. V23. SLA clauses with conflicting promises to different stakeholders. V34. Unclear roles and responsibilities.

3.2.3 Legal risks

R.21 SUBPOENA AND E-DISCOVERY In the event of the confiscation of physical hardware, the centralization of storage as well as shared tenancy of physical hardware means many more clients are at risk of disclosure of their data to unwanted parties.

Vulnerabilities

V6. Lack of resource isolation. V29. Storage of data in multiple jurisdictions and lack of transparency about it. V30. Lack of information on jurisdictions.

R.22 RISK FROM CHANGES OF JURISDICTION Customer data may be held in multiple jurisdictions, some of which may be high risk.

Vulnerabilities

V29. Storage of data in multiple jurisdictions and lack of transparency about it. V30. Lack of information on jurisdictions.

R.23 DATA PROTECTION RISKS Cloud computing poses several data protection risks. Checking the data handling practices of a cloud provider may be difficult for the cloud customer, in particular in cases of multiple transfers of data.

The cloud customer, who is always the main person responsible for the processing of personal data, may lose control of the data processed by the cloud provider, as it can be difficult to effectively check the data processing that the cloud provider carries out, a problem exacerbated in the case of multiple transfers between federated clouds. There may be data security breaches that are not notified to the controller by the cloud provider.

Vulnerabilities

V30. Lack of information on jurisdictions. V29. Storage of data in multiple jurisdictions and lack of transparency about it.

R.24 LICENSING RISKS Licensing conditions and online licensing checks may become unworkable in a cloud environment. If software is charged on a per instance basis, every time a new machine is instantiated the cloud customer’s licensing costs may increase even though they are using the same number of machine instances for the same duration of time.

Vulnerabilities

V31. Lack of completeness and transparency in terms of use.

R.25 NETWORK BREAKS This is one of the highest risks, since thousands of customers may be affected at the same time.

Vulnerabilities

V6. Lack of resource isolation. V38. Misconfiguration. V39. System or OS vulnerabilities. V41. Lack of, or a poor and untested, business continuity and disaster recovery plan.

R.26 NETWORK MANAGEMENT (I.E., NETWORK CONGESTION, MIS-CONNECTION, NON-OPTIMAL USE) Inadequate planning or poor monitoring and isolation of resources may lead to poor performance of resources.

Vulnerabilities

V38. Misconfiguration. V39. System or OS vulnerabilities. V6. Lack of resource isolation. V41. Lack of, or a poor and untested, business continuity and disaster recovery plan.

R.27 MODIFYING NETWORK TRAFFIC Also related to networking is the possibility that traffic is intercepted and modified in transit.

Vulnerabilities

V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V8. Communication encryption vulnerabilities. V16. No control on vulnerability assessment process.

R.28 PRIVILEGE ESCALATION Unauthorized parties could get access to sensitive data of the cloud customers.

Vulnerabilities

V1. AAA vulnerabilities. V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V5. Hypervisor vulnerabilities. V34. Unclear roles and responsibilities. V35. Poor enforcement of role definitions. V36. Need-to-know principle not applied. V38. Misconfiguration.

R.29 SOCIAL ENGINEERING ATTACKS (I.E., IMPERSONATION) Loss of confidentiality may also be a consequence of social attacks.

Vulnerabilities

V2. User provisioning vulnerabilities. V6. Lack of resource isolation. V8. Communication encryption vulnerabilities. V32. Lack of security awareness. V37. Inadequate physical security procedures.

R.30 LOSS OR COMPROMISE OF OPERATIONAL LOGS Logs may help identify data that have been compromised or system vulnerabilities. The absence of these logs may be a consequence of technical problems (e.g., storage, maintenance policies) or security problems (e.g., logs have been intentionally deleted).

Vulnerabilities

V1. AAA vulnerabilities. V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V19. Lack of forensic readiness. V39. System or OS vulnerabilities. V52. Lack of policy or poor procedures for logs collection and retention.

R.31 LOSS OR COMPROMISE OF SECURITY LOGS (MANIPULATION OF FORENSIC INVESTIGATION) Logs are essential for forensic investigation and identification of breaches. Not only can logs be lost (as in R.30), but also may be tampered with to hide malicious activities.

Vulnerabilities

V1. AAA vulnerabilities. V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V19. Lack of forensic readiness. V39. System or OS vulnerabilities. V52. Lack of policy or poor procedures for logs collection and retention.

R.32 BACKUPS LOST, STOLEN Data loss can be a significant risk, with severe technical and economic consequences for customers.

Vulnerabilities

V1. AAA vulnerabilities. V2. User provisioning vulnerabilities. V3. User de-provisioning vulnerabilities. V37. Inadequate physical security procedures.

R.33 UNAUTHORIZED ACCESS TO PREMISES (INCLUDING PHYSICAL ACCESS TO MACHINES AND OTHER FACILITIES) Cloud providers concentrate resources in large data centers; although the physical perimeter controls are likely to be stronger, the impact of a breach of those controls is higher.

Vulnerabilities

V37. Inadequate physical security procedures.

R.34 THEFT OF COMPUTER EQUIPMENT Similar to R.33, physical perimeter controls are typically stronger than with on-premise infrastructures, but the impact is higher.

Vulnerabilities

V37. Inadequate physical security procedures.

R.35 NATURAL DISASTERS Generally speaking, the risk from natural disasters is lower compared to traditional infrastructure because cloud providers offer multiple redundant sites and network paths by default.

Vulnerabilities

V41. Lack of, or a poor and untested, business continuity and disaster recovery plan.

4 Analysis of existing technologies

This chapter discusses approaches to increasing the security of data processing in untrusted environments. We will discuss three well-known approaches, namely Intel Software Guard Extensions, trusted and secure boot, and secure computation.

4.1 Intel SGX

Intel Software Guard eXtensions (SGX) is an Intel hardware-based technology for protecting sensitive data from disclosure or modification. It enables user-level code to allocate enclaves (i.e., private regions of memory) that are protected even from processes running at higher privilege levels. Intel SGX capabilities are available through a set of instructions introduced in off-the-shelf processors, starting with the 6th generation Intel Core family, based on the Skylake microarchitecture.

4.1.1 Components The application of the Intel SGX technology requires four main components: (i) the availability of the set of instructions in the processor, (ii) the operating system driver, (iii) the software development kit that facilitates access to the driver from application code, and (iv) the Platform Software. The Platform Software (Intel SGX PSW) is a collection of special SGX enclaves and an Intel SGX Application Enclave Services Manager (AESM), provided along with the SGX SDK. These special enclaves and the AESM are used when loading enclaves, retrieving cryptographic keys, and evaluating the contents of an enclave. The software development kit (SDK) is a collection of APIs, sample source code, libraries, and tools that enable software developers to write and debug SGX applications in C/C++. Next, the drivers enable operating systems and other software to access the SGX hardware. Intel SGX drivers are available both for Windows (via the Intel Management Engine) and for Linux. Finally, the instruction set is composed of 17 new instructions that can be classified into the following functions [54]:

Enclave build/teardown: Used to allocate protected memory for the enclave, load values into the protected memory, measure the values loaded into the enclave’s protected memory, and tear down the enclave after the application has completed. Instructions used for this purpose are:

• ECREATE - Declare base and range, start build • EADD - Add 4k page • EEXTEND - Measure 256 bytes • EINIT - Declare enclave built • EREMOVE - Remove Page
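The build instructions above define the enclave’s identity: EEXTEND incrementally folds the loaded contents into a measurement (MRENCLAVE) that is finalized at EINIT. The following sketch models this as a running SHA-256 over the added pages; it is a simplification for illustration only (real SGX hashes dedicated per-instruction data structures, not just the raw bytes shown here):

```python
import hashlib

def build_measurement(pages):
    """Conceptual model of ECREATE/EADD/EEXTEND building the enclave
    measurement: a running hash extended with each page added to the
    enclave, fixed when EINIT declares the enclave built."""
    h = hashlib.sha256()
    h.update(b"ECREATE")  # measurement starts at enclave creation
    for offset, data in pages:
        h.update(b"EADD" + offset.to_bytes(8, "little"))
        # EEXTEND measures the page contents in 256-byte chunks
        for i in range(0, len(data), 256):
            h.update(b"EEXTEND" + data[i:i + 256])
    return h.hexdigest()

pages = [(0x0, b"\x90" * 4096)]  # one 4 KB page
m1 = build_measurement(pages)
m2 = build_measurement([(0x0, b"\x90" * 4095 + b"\xcc")])
assert m1 != m2  # changing a single byte yields a different identity
```

Because any change to the loaded code or data changes the measurement, a remote party that knows the expected value can detect a tampered enclave at attestation time.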

Enclave entry/exit: Used to enter and exit the enclave. An enclave can be entered and exited explicitly. It may also be exited asynchronously due to interrupts or exceptions. In the case of asynchronous exits, the hardware will save all secrets inside the enclave, scrub secrets from registers, and return to the external program flow; execution later resumes where it left off. Instructions used for this purpose are:

• EENTER - Enter enclave • ERESUME - Resume enclave • EEXIT - Leave enclave • AEX - Asynchronous enclave exit

Enclave security operations: Allow an enclave to prove to an external party that the enclave was built on hardware which supports the SGX instruction set. Instructions used for this purpose are:

• EREPORT - Enclave report • EGETKEY - Generate unique key

Paging instructions: Allow system software to securely move enclave pages to and from unprotected memory. Instructions used for this purpose are:

• EPA - Create version array page • ELDB/U - Load an evicted page into protected memory • EWB - Evict a protected page • EBLOCK - Prepare for eviction • ETRACK - Prepare for eviction

Debug instructions: Allow developers to use familiar debugging techniques inside special debug enclaves. A debug enclave can be single stepped and examined. A debug enclave cannot share data with a production enclave. This protects enclave developers if a debug enclave should escape the development environment. Instructions used for this purpose are:

• EDBGRD - Read inside debug enclave • EDBGWR - Write inside debug enclave

In addition, some features in Intel SGX make it very useful for providing data security. The main aspects are discussed below:

Enclave Page Cache: The Enclave Page Cache (EPC) is protected memory used to store enclave pages and SGX structures. The EPC is divided into 4KB chunks called EPC pages. EPC pages can either be valid or invalid. A valid EPC page contains either an enclave page or an SGX structure. Each enclave instance has an enclave control structure, SECS. Every valid enclave page in the EPC belongs to exactly one enclave instance. System software is required to map enclave virtual addresses to a valid EPC page.

Memory Encryption Engine: Memory Encryption Engine (MEE) is a hardware unit that encrypts and integrity protects selected traffic between the processor package and the main memory (DRAM). The overall memory region that an MEE operates on is called an MEE Region. Depending on implementation, the Processor Reserved Memory (PRM) is covered by one or more MEE regions. Intel SGX guarantees that all the data that leaves the CPU and is stored in DRAM is first encrypted using the MEE. Thus, even attackers with physical access to DRAM will not be able to retrieve secret data protected by SGX enclaves from it.
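The integrity side of the MEE can be pictured as a hash tree over the protected memory whose root stays inside the CPU package, so any modification or replay of DRAM contents is detected on the next access. The toy model below uses a plain Merkle tree over pages; the actual MEE uses AES-based encryption with a counter/MAC tree, so this is only a conceptual sketch:

```python
import hashlib

def h(b):
    return hashlib.sha256(b).digest()

def merkle_root(pages):
    """Toy integrity tree over memory pages: leaves are page hashes,
    internal nodes hash their children, and only the root needs to be
    stored in trusted (on-die) storage."""
    level = [h(p) for p in pages]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

pages = [bytes([i]) * 4096 for i in range(4)]
root = merkle_root(pages)                 # root kept inside the CPU
pages[2] = b"tampered" + pages[2][8:]     # attacker modifies DRAM
assert merkle_root(pages) != root         # modification is detected
```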

Memory Access Semantics: CPU memory protection mechanisms physically block access to the PRM from all external agents by treating such accesses as references to non-existent memory. To access a page inside an enclave using MOV and other memory-related instructions, the hardware checks the following:

• The logical processor is executing in "enclave mode". • The page belongs to the enclave that the logical processor is executing. • The page is accessed using the correct virtual address.

If any of these checks fails, the page access is treated as a reference to non-existent memory or a fault is signaled. This guarantees that even processes with higher privilege levels are not able to access enclave memory.
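The three checks above can be expressed as a simple predicate. The field names below are purely illustrative (they do not correspond to actual SGX hardware structures); the point is that all three conditions must hold for an EPC access to succeed:

```python
def enclave_access_allowed(cpu, page):
    """Conceptual model of the hardware checks on every EPC page access
    (illustrative names, not real SGX structures)."""
    return (cpu["enclave_mode"]                     # in "enclave mode"
            and page["owner"] == cpu["enclave_id"]  # page owned by this enclave
            and page["vaddr"] == cpu["vaddr"])      # correct virtual mapping

cpu = {"enclave_mode": True, "enclave_id": 7, "vaddr": 0x1000}
page = {"owner": 7, "vaddr": 0x1000}
assert enclave_access_allowed(cpu, page)
# An OS or hypervisor running outside enclave mode is refused:
assert not enclave_access_allowed({**cpu, "enclave_mode": False}, page)
# So is another enclave, even when executing in enclave mode:
assert not enclave_access_allowed({**cpu, "enclave_id": 8}, page)
```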

4.1.2 Usage Hoekstra et al. describe three example secure solutions that have been developed to take advantage of the new instructions provided by Intel SGX [48]:

One-time Password (OTP): OTP is an authentication technology often used as a second factor to authenticate a user. As suggested by the name, the password is valid only for one authentication and is often used to authorize online financial transactions. There are two primary components to the architecture: the OTP server and the OTP client. In the case of the prototype developed for this work, the OTP client side component was implemented as a browser plugin. Within the OTP client software, the algorithms that interact directly with the OTP secrets were placed in an enclave. The OTP server can then use the Remote Attestation mechanism to verify the enclave running on the client side and establish a secure communication channel, allowing it to send a preshared key to be used as an OTP.
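A common way to derive such one-time passwords from a shared secret is the HOTP construction of RFC 4226, shown below. Note this is the standard algorithm, not necessarily the exact scheme used in the prototype from [48]; in an SGX-based client, the secret would be provisioned over the attested channel and live only inside the enclave:

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """One-time password per RFC 4226: HMAC-SHA-1 over a 64-bit counter,
    dynamically truncated to a short decimal code."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                   # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 4226 Appendix D test vector: secret "12345678901234567890"
assert hotp(b"12345678901234567890", 0) == "755224"
```

The server holds the same secret and counter, so it can recompute and compare the code; each counter value is used only once, which is what makes the password "one-time".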

Enterprise Rights Management: Enterprise Rights Management (ERM) is a technology that aims to secure crucial elements of access and distribution of sensitive documents, such as confidentiality, access control, usage policies, and logging of user activities. While most existing solutions focus on the protection of enterprise data, the need to enforce the authorized use and dissemination of personal content such as pictures and videos is becoming increasingly apparent. The same technologies could be used for this purpose as well. If an attacker has physical possession of a platform, he may be able to use memory snooping or cold boot style attacks [46] to acquire the keying material for a valid ERM solution. This would permit the attacker to create malware which could use those stolen keys to effectively impersonate a valid ERM client. To avoid this kind of attack, SGX can be used, so that even if an attacker gains physical possession of the platform he will not be able to acquire the keying material.

Secure Video Conferencing: With the widespread availability of high network bandwidth and inexpensive hardware for capturing video and audio on client platforms, use of video chat, video conferencing and web conferencing applications has become increasingly popular for real time information sharing. This creates an opportunity for the unauthorized capture and distribution of a video conferencing stream by malicious individuals, or theft of valuable IP or sensitive information in enterprise and government sectors. Today’s secure video conferencing solutions provide strong protection of sensitive content on the network through the use of cryptographic methods. But with the migration of threats from the network onto the computing platform, this level of security is no longer sufficient to protect the AV stream as it is being processed on the computing device. SGX allows a video conferencing application to protect its assets on the platform and enables strong participant authentication, thus mitigating a broad range of threats that could compromise the secrecy and integrity of the AV stream.

All the above-mentioned examples provided by Intel focus on the usage of SGX on client machines. Nevertheless, Intel SGX also has potential uses in server-side/backend applications. One example is VC3 [69], a system that allows users to run distributed MapReduce computations in the cloud while keeping their code and data secret, and ensuring the correctness and completeness of their results. VC3 runs on unmodified Hadoop, but crucially keeps Hadoop, the operating system and the hypervisor out of the trusted computing base (TCB); thus, confidentiality and integrity are preserved even if these large components are compromised. VC3 relies on SGX to isolate memory regions on individual computers, and to deploy new protocols that secure distributed MapReduce computations.

4.1.3 Limitations Before designing a new secure application using SGX, there are some limitations that enclave developers need to keep in mind in order to avoid security flaws or large overheads due to memory swapping.

Memory size: When a machine starts, the SGX-capable processor reserves a portion of memory for itself, the PRM. The entire EPC must reside inside the PRM. In the current version of SGX, this portion of memory is limited to 128 MB per machine. If more memory is needed than is available, a significant processing overhead is incurred, because data must be encrypted before being swapped from the EPC to DRAM and decrypted after being swapped from DRAM back to the EPC.

Programming languages: The SGX SDK provided by Intel is only compatible with C/C++. This leaves secure application developers with no choice over which programming language to use when writing an enclave’s code.

Hardware dependency: Intel SGX is a hardware-based technology. Therefore, the SGX capabilities can only be used on machines that are SGX capable and that have SGX enabled in the BIOS setup.

4.2 Trusted and Secure Boot

There are two basic approaches that aim at protecting the software that will run on a machine: Secure Boot and Trusted Boot. While Secure Boot focuses on verifying that a machine used signed software in the boot process, Trusted Boot’s main feature is the ability to measure the software that was loaded in each step of the boot process. Both are described in the next sections.

4.2.1 Components When performing Secure Boot, the system firmware checks whether the system boot loader is signed with a cryptographic key authorized by a database contained in the firmware. Depending on the correctness of the given signatures, the system will proceed with the current boot or terminate it. UEFI Secure Boot [18, 6] uses the UEFI (Unified Extensible Firmware Interface) specification. It does not require specialized hardware apart from non-volatile flash storage, which is used to store the UEFI implementation itself and some of the protected UEFI variables, including the trusted root certificate store. As with Trusted Boot, detailed below, Secure Boot can also provide a chain of trust, in which each stage verifies the signature on the next stage. Digital signatures provide both integrity and authenticity, and the process can be even simpler if all files on a system are signed by a single distribution key, because in this case all of the files can be appraised with just this one public key [23]. It is important to notice that the operator who controls the hardware also controls which signing certificates Secure Boot accepts, either by installing new certificates or by removing existing ones. Signatures are verified during the boot, not when the boot loader is installed or updated. Therefore, UEFI Secure Boot does not stop boot path manipulations; it only prevents the system from executing a modified boot path once such a modification has occurred, and simplifies its detection. On the most basic level, UEFI Secure Boot prevents running unsigned boot loaders; the effect of running a boot loader obviously depends on that boot loader. With Trusted Boot, the concept of trust is built upon verifying the integrity of resources such as the boot loader, firmware, operating system, and other components in the computing stack.
To make this possible, technologies such as a Trusted Platform Module (TPM) [17, 15] chip and support for Trusted Execution Technology (TXT) [50, 49] are required. When a trusted boot is performed, the system computes measurements of the components of the computing stack to be loaded and commits these measurements to a TPM chip, where they are stored in secure registers called Platform Configuration Registers (PCRs). In this context, tboot [15] emerges as an open source, pre-kernel/pre-hypervisor module that works with Intel TXT in the process of measuring and verifying launched components. Tboot certifies these elements using known-good values stored in the host’s TPM and enables launch control policies that can prevent the booting of systems whose components cannot be verified. Once the boot is measured, host integrity can be attested using the remote attestation protocols defined by the Trusted Computing Group (TCG) [15]. The TPM can attest the measurements in the PCRs to a third party by signing them with a private key; this third party can then use the corresponding public key to verify the signature and, consequently, the measurement of the boot code by a valid TPM chip. Intel’s OpenAttestation project, which recently evolved into OpenCIT 1, is an example of an attestation server that acts as a third party in this process. OpenCIT is an Intel solution that enables physical machine attestation based on Intel TXT technology. With OpenCIT it is possible to certify operating systems, hypervisors, and any software component whose cryptographic summary can be extended into a TPM register, a Platform Configuration Register (PCR). The attestation solution is based on a client-server model, where the server is responsible for safely collecting the PCR measurements of the machines to be attested. On each machine, an agent must be

1https://01.org/opencit

installed that communicates directly with the TPM and collects the records reliably and attestably (called PCR Quotes).
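The PCR commitment underlying these quotes is the TPM extend operation: a register can only be updated as a hash of its previous value and the new measurement, so it commits to the entire ordered boot sequence. A simplified model (using SHA-256 throughout; actual PCR banks depend on the TPM version):

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = H(old PCR || H(measurement)).
    The register can only be extended, never set directly."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

def measured_boot(components):
    pcr = bytes(32)  # PCRs start zeroed at platform reset
    for c in components:
        pcr = pcr_extend(pcr, c)  # each boot stage measures the next
    return pcr

stack = [b"BIOS", b"boot loader", b"kernel", b"initrd"]
quoted = measured_boot(stack)
# An attestation server compares the quote with a known-good value:
assert quoted == measured_boot(stack)
# Swapping components (same set, different order) is also detected:
assert quoted != measured_boot([b"boot loader", b"BIOS", b"kernel", b"initrd"])
```

This order-sensitivity is why a verifier comparing PCR quotes against known-good values can detect not only modified components but also a manipulated boot sequence.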

4.2.2 Usage Cloud computing environments allow users to store and process their data in third-party data centers. Along with the benefits of this approach come a number of security concerns that must be taken into consideration [37]. The cloud provider should be able to ensure that its infrastructure is secure and guarantee the data integrity of the applications it runs. Trusted and secure boot mechanisms can be used in cloud environments to mitigate some of these issues. OpenStack’s approach to providing these features involves a couple of its services, such as Ironic [2] and Nova [3]. Ironic, also known as the Bare Metal service, provisions physical hardware as opposed to virtual machines. Its pluggable driver architecture also enables vendor-specific drivers to be added for improved performance or for accessing specific functionalities. Nova provides virtual compute resources, e.g. instances, accessible via APIs and web interfaces for administrators and users, also allowing horizontal scaling on standard hardware. Both services implement the trusted and secure boot features at different levels. Secure boot support offered by Ironic is limited, since among its drivers only the integrated Lights-Out (iLO) ones [7] provide such functionality for their users. It is important to notice that this is a vendor-specific feature: only Ironic users that have hardware from HPE2 equipped with the so-called iLO management interface can use an iLO driver in Ironic. From a technical perspective, in order to activate UEFI Secure Boot support, a user of the iLO drivers needs to add a secure-boot:true property under the given node’s capabilities, and also add a machine type (typically known as a flavor) that has this property. Therefore, when the user requests the secure type of machine, the scheduler will search for machines that have the needed property.
Lastly, signed user and deploy images should be created to be used for the deployment of the node. Ironic also supports Trusted Boot [13] for all drivers compatible with partition images. Partition images are a less convenient approach to providing images for a machine that is going to be provisioned: in contrast to whole-disk images, which require a single image for the disk, partition images require additional separate images for the RAM disk and kernel used in the boot of the Linux operating system. This boot takes place at the end of the deployment process, when the node is rebooted with the new user image. If Trusted Boot is used, it measures the node’s BIOS, boot loader, Option ROM, and kernel/RAM disk, and then determines whether a bare metal node deployed by Ironic should be trusted. To configure a node to perform trusted boot, the user needs to add a trusted-boot:true property under the given node’s capabilities and to the respective flavor (as for Secure Boot). It is also necessary to prepare tboot and mboot.c32 files for the hosts of ironic-conductor processes. In addition to using Trusted Boot for bare metal provisioning, trusted boot is also supported by Nova by classifying a group of servers that will host virtual machines as trusted servers using trusted compute pools [16, 49]. Combining the security technologies mentioned above (e.g., TXT, TPM) with a remote attestation server, such as OpenCIT, cloud providers can ensure that a compute node runs only verified software, resulting in a secure cloud stack. From a user perspective, services can then be requested to run only on verified compute nodes, made possible by trusted compute pools.

2http://www.hpe.com

What both trusted boot implementations, for Ironic and Nova, have in common is the need for an attestation server, such as OpenAttestation (OAT) [11]. However, installing it together with OpenStack might not be a simple task [10]. The project has been through some modifications over the last couple of years, and a new version of it, called OpenCIT [12], is now the one indicated by Intel as the next-generation attestation solution.

4.2.3 Limitations It is important to note, though, that Secure Boot will not protect a machine from most malware or attackers. Secure Boot protects the boot phase of a system, but does not protect against attacks on the running system or its data. It does not provide hardware-anchored attestation to a third party, like Trusted Boot does, so a centralized management system cannot tell whether a system has been compromised. Similarly, Trusted Boot will only attest that trusted operating system modules have been booted. Vulnerabilities in the operating system and in applications may still present a very large attack surface for compromising a system. Furthermore, physical attacks can enable an adversary to get hold of memory or disks that may host unencrypted data or encryption keys. Finally, the basis for Trusted Boot, Intel TXT, a dynamic root of trust mechanism, requires a complex implementation that has been shown to contain exploitable security vulnerabilities [77], and any System Management Mode attack can also be used to compromise TXT.

4.3 Secure Computation

The research field of secure computation has developed protocols and algorithms to perform various tasks in a privacy-preserving way. The reasons to include not-fully-trusted parties are manifold: local devices like smart cards might be too weak to perform the relevant operations due to power or bandwidth constraints, which motivates the secure outsourcing of operations, while including sensitive information from more than one party requires the use of secure multi-party computation. Proposed solutions give input-privacy guarantees, which means that privacy-preserving computations guarantee that the inputs, as well as any intermediate results, will not be leaked during the computation. However, they do not give any privacy guarantees regarding the output. Consider a secure computation of the sum of two values from different parties: the computation will leak neither the inputs nor any intermediate value, yet the result clearly allows each party to calculate the other party’s input by subtracting its own input from the sum.
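The input-privacy guarantee can be illustrated with additive secret sharing, one classic building block of secure multi-party computation (a toy sketch, not any specific protocol from the literature): each input is split into random shares so that no individual share, and no intermediate value, reveals anything, yet the output itself still allows a party to subtract its own input from the published sum.

```python
import secrets

Q = 2 ** 61 - 1  # public modulus for the share arithmetic

def share(x, n=3):
    """Split x into n additive shares modulo Q; any n-1 shares are
    uniformly random and reveal nothing about x."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

# Two parties secret-share their private inputs among three servers;
# each server adds only the shares it holds, locally.
a, b = 20000, 22000
sa, sb = share(a), share(b)
partial = [(x + y) % Q for x, y in zip(sa, sb)]

result = sum(partial) % Q
assert result == a + b       # only the output is reconstructed ...
assert result - a == b       # ... but it lets party A derive B's input
```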

4.3.1 Components A basic building block for privacy-preserving protocols using secure computation is homomorphic encryption, which allows certain operations to be performed on encrypted data using a privacy homomorphism, as introduced by Rivest, Adleman and Dertouzos [67]. This useful property was published shortly after the malleable asymmetric encryption algorithm RSA had been invented by Rivest, Shamir and Adleman [66]. It can be used to outsource computation by letting an untrusted party perform operations on the ciphertext. While some operations require knowledge of the private key, others can be performed given only a ciphertext and the system parameters of the cryptographic scheme.

27 Deliverable 2.1 Secure Big Data Processing in Untrusted Clouds

Formally, an encryption scheme is homomorphic if, for any encryption key k, the encryption function E satisfies:

∀m1, m2 ∈ M : E(m1 ○M m2) ← E(m1) ○C E(m2) (4.1)

where M denotes the set of plaintexts, C denotes the set of ciphertexts, ○M and ○C are operators in M and C respectively, and ← means "can be directly computed from", without any intermediate decryption [38].

Early algebraic homomorphic encryption schemes were only partially homomorphic in that they supported either only a multiplicative homomorphism between plaintext and ciphertext [66, 35] or only an additive homomorphism [62, 56, 43]. However, starting with the seminal work of Gentry [41], the first systems supporting a limited homomorphism for both addition and multiplication at the same time were constructed. These systems are termed somewhat, leveled or fully homomorphic encryption schemes, depending on whether they support circuits of a certain fixed maximum depth (somewhat vs. leveled) or an unlimited number of successive homomorphic operations (fully homomorphic).

A typical benchmark for fully homomorphic encryption schemes is to homomorphically evaluate a symmetric encryption or decryption function. Most often AES [57] is used, while shallow-depth circuits as used in lightweight encryption schemes like Prince [28] offer much better performance. Shahverdi et al. [70] report a runtime of about 30 hours to homomorphically evaluate 2048 parallel AES encryptions and about 1 hour to evaluate 1024 parallel Prince encryptions. A homomorphic encryption testing framework, HETest [76], was proposed to test the performance and correctness of different fully homomorphic encryption schemes under identical conditions. Many optimizations have been proposed to make fully homomorphic encryption more efficient. These include squashing the decryption circuit [42], removing bootstrapping [29], introducing SIMD operations for multiple data items inside a single ciphertext [74], and even building homomorphisms into noise-free encryption schemes [61, 53], which have not yet yielded any secure constructions.
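To make the additive case of Equation 4.1 concrete, the following sketch implements a toy Paillier cryptosystem, in which ○M is addition on plaintexts and ○C is multiplication on ciphertexts. The tiny primes are for illustration only; real deployments use keys of 2048 bits or more.

```python
import math
import random

def keygen():
    p, q = 1009, 1013          # toy primes; far too small for real use
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)       # valid because we fix g = n + 1
    return n, (n, lam, mu)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    l = (pow(c, lam, n * n) - 1) // n
    return (l * mu) % n

# Multiplying ciphertexts adds the plaintexts: E(3) ○C E(4) decrypts to 7
n, sk = keygen()
c_sum = (encrypt(n, 3) * encrypt(n, 4)) % (n * n)
assert decrypt(sk, c_sum) == 7
```

Note that the untrusted party performing the ciphertext multiplication never needs the private key, which is exactly the outsourcing property described above.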
Another often-used building block is Yao's garbled circuits [80], which were designed to solve the prominent millionaires' problem [79]. The technique takes a boolean circuit that is to be evaluated upon private inputs from two or more parties. One party (the garbler) encrypts (garbles) the circuit and its inputs. The other parties also encrypt their inputs and let the garbler connect them to the correct gates using oblivious transfer [64]. Some party then evaluates the garbled circuit using all encrypted inputs and communicates the result to all inputting parties. As mentioned, garbled circuits operate over boolean circuits whose gates have boolean inputs attached, while homomorphic encryption defines operations over group elements, typically for larger groups with a prime modulus. However, over the last decades many new schemes, modes of operation and optimizations have been presented in the respective research areas, such that it is possible to pick optimized schemes or protocols for many different data types. Supported operations are typically logical operators like AND, OR, XOR on bits, as well as addition, subtraction, multiplication or exponentiation on larger groups.
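As a minimal illustration of garbling, the sketch below garbles a single AND gate: each wire value gets a random label, each truth-table row encrypts the output label under a hash of the two input labels, and the evaluator recognizes the correct row by a zero-padding check. All names are illustrative; real constructions add point-and-permute and other optimizations, and the labels are exchanged via oblivious transfer rather than handed over directly.

```python
import hashlib
import os
import random

def _pad(ka, kb):
    return hashlib.sha256(ka + kb).digest()   # 32-byte one-time pad

def garble_and_gate():
    # one random 16-byte label per possible wire value
    wire = {w: {v: os.urandom(16) for v in (0, 1)} for w in ("a", "b", "out")}
    rows = []
    for va in (0, 1):
        for vb in (0, 1):
            # pad the output label with zeros so the evaluator can
            # detect which row decrypted correctly
            pt = wire["out"][va & vb] + b"\x00" * 16
            pad = _pad(wire["a"][va], wire["b"][vb])
            rows.append(bytes(x ^ y for x, y in zip(pt, pad)))
    random.shuffle(rows)                       # hide the row order
    return wire, rows

def evaluate(rows, label_a, label_b):
    # the evaluator sees only labels, never the underlying bits
    pad = _pad(label_a, label_b)
    for ct in rows:
        pt = bytes(x ^ y for x, y in zip(ct, pad))
        if pt[16:] == b"\x00" * 16:            # padding check succeeded
            return pt[:16]
    raise ValueError("no row decrypted")

wire, rows = garble_and_gate()
assert evaluate(rows, wire["a"][1], wire["b"][1]) == wire["out"][1]
assert evaluate(rows, wire["a"][1], wire["b"][0]) == wire["out"][0]
```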

4.3.2 Usage The typically mentioned use-cases in the relevant literature are taken from a wide range of possible domains: • Sharing medical information about patients across hospitals


• Search human genomic data for dispositions

• Compare criminal records across different states/counties/countries

• Network anomaly detection across different providers

• Outsourcing of computation upon personal information from weak or power-constrained smart cards

• Privacy-preserving bidding at anonymous auctions

• Remote private search over sensitive documents/e-mails

• Electronic Voting

• Secure Databases

Secure computation was used in the practical task of calculating the market clearing price of a Danish sugar beet auction [27] in January 2008. 1229 farmers handed in at least one bid in the auction, which took roughly 40 minutes to be processed by the secure multi-party scheme used. Another practical task was performed by Bogdanov et al. [24], who looked for correlations between working while studying and failing to graduate, based on real Estonian government data. The most efficient toolset currently available to perform privacy-preserving statistical analysis is the Sharemind framework [26], which was also used in the aforementioned study. It supports a language called SecreC [51], which allows developers to easily write secure multi-party computation programs. On top of this language there exists the Rmind [25] statistical analysis system, which is a derivative of the statistical system R. The relevant literature contains further proposals for secure computation tools and languages to ease the development of privacy-preserving protocols and programs:

• Obliv-C [81]

• HElib [71]

• Sepia [30]

• VIFF [33]

• CBMC-GC [39]

• TASTY [47]

• PICCO [82]

• L1 [68]

• Wysteria [65]

• SMCL [58]

Building upon these are more complex systems like secure databases such as CryptDB [63], Monomi [75], TrustedDB [22] or Cipherbase [20].


4.3.3 Limitations Even though the research field around secure computation is very active, especially since the invention of fully homomorphic encryption by Gentry [41], few practical applications have been developed, mainly due to the still very high overhead of running programs securely using either homomorphic encryption or garbled circuits. As both solutions are data-oblivious, program execution cannot use branching, such as conditional jumps depending on the actual data, as this would leak information about intermediate values to the executing party. Instead, all loops must be unrolled and both sides of every branch must be computed, regardless of whether they are taken.
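The data-oblivious restriction can be illustrated with an arithmetic multiplexer: instead of branching on a secret bit, both alternatives are computed and blended, so the executed instruction sequence is independent of the data. This is a hypothetical sketch in plain Python; in homomorphic encryption or garbled circuits the same arithmetic would run on encrypted values.

```python
def oblivious_select(bit, if_true, if_false):
    # bit must be 0 or 1; both alternatives are always computed,
    # so nothing about the secret bit leaks via control flow
    return bit * if_true + (1 - bit) * if_false

def oblivious_negate(x, sign_bit):
    # replaces "if sign_bit: return -x else: return x"
    return oblivious_select(sign_bit, -x, x)

def oblivious_clamped_double(x, times, max_times=8):
    # a data-dependent loop must run for a fixed, maximal trip count;
    # here the loop bound "times" is treated as public for simplicity
    for i in range(max_times):
        apply = 1 if i < times else 0
        x = oblivious_select(apply, 2 * x, x)
    return x

assert oblivious_select(1, 10, 20) == 10
assert oblivious_negate(-5, 1) == 5
```

Both branches (2 * x and x) are evaluated on every iteration, which is exactly the overhead the paragraph above describes.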

4.4 Conclusion

During the analysis of the technologies, we have identified four main aspects for using technologies to provide secure computation: (i) the chosen technology must support attestation, so that a cloud provider or cloud customer providing a service to other users can assure these users that their data goes only to the applications or services it is supposed to go to, and that these applications or services are unmodified; (ii) the sensitive data must be protected and there must be confidentiality guarantees, so that users may trust that their data will not be disclosed; (iii) the technology must be practical and feasible, both from the performance point of view and regarding the existence of tools to support development, so that the SecureCloud project can deliver platforms that are flexible and readily applicable to real scenarios; and (iv) the solution must be as robust as possible against vulnerabilities in components that are not an integral part of the technology compromising the whole system.

4.4.1 Attestation Assessing the integrity of a computational platform is essential to ensure that we are always running the expected software with the expected configuration on an adequate machine. This is supported by both Trusted Boot technologies3 and Intel SGX. With Trusted Boot, attestation covers the earliest stages of the boot process. A variety of technologies enable verification of these early boot stages; they typically require hardware support such as the Trusted Platform Module (TPM), Intel Trusted Execution Technology (TXT), dynamic root of trust measurement (DRTM), and Unified Extensible Firmware Interface (UEFI) secure boot. Once the machine is booted, it can emit certificates of this boot, and applications run unmodified on the infrastructure. Intel SGX also enables attestation: a remote client can check both that the code that is running is the expected code and that this code is running inside an enclave. Homomorphic encryption by itself does not provide attestation; it would need to be combined with complementary technologies (such as Trusted Boot) to ensure that the code running is the expected code.
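The attestation flow can be sketched as a challenge-response protocol. In real SGX the attestation key never leaves the hardware and quotes are verified via Intel's attestation service; in this simplified model a shared HMAC key stands in for that trust anchor, and all names are illustrative.

```python
import hashlib
import hmac
import os

# models the hardware attestation key; in SGX this never leaves the CPU
ATTESTATION_KEY = os.urandom(32)

def measure(code: bytes) -> bytes:
    # models the enclave measurement (MRENCLAVE): a hash of the loaded code
    return hashlib.sha256(code).digest()

def quote(code: bytes, nonce: bytes) -> bytes:
    # produced on the attested platform: binds the measurement to the
    # verifier's fresh nonce so old quotes cannot be replayed
    return hmac.new(ATTESTATION_KEY, measure(code) + nonce,
                    hashlib.sha256).digest()

def verify(expected_measurement: bytes, nonce: bytes, q: bytes) -> bool:
    expected = hmac.new(ATTESTATION_KEY, expected_measurement + nonce,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, q)

code = b"enclave binary"
nonce = os.urandom(16)
assert verify(measure(code), nonce, quote(code, nonce))
assert not verify(measure(b"tampered binary"), nonce, quote(code, nonce))
```

Only after this check succeeds would the client provision secrets to the attested code, which is the deployment workflow the SecureCloud APIs must support.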

4.4.2 Confidentiality Regarding confidentiality of data, both Intel SGX and homomorphic encryption can provide guarantees that data that was passed to an already attested application will not leak. In the case of SGX, the data

3Some refer to all of these collectively as secure boot technologies.

inside the enclave is protected by encryption whenever it is stored in main memory. The potential leaks of information with SGX are therefore related to direct hardware attacks, application bugs, or indirect attacks such as side-channel attacks. In the case of HE, the data is never decrypted, even while it is being processed. Trusted Boot by itself does not offer means to protect data; these would need to be implemented by the application itself, otherwise unencrypted data could be stored on disks (which could be physically stolen, for example).

4.4.3 Feasibility On the one hand, both Trusted Boot and Intel SGX require special hardware, while HE does not. Nevertheless, Trusted Boot hardware can easily be found in both servers and personal computers. Intel SGX hardware is becoming more common, but is still much less widespread than Trusted Boot hardware. Trusted Boot also has the advantage of not considerably restricting applications: the boot chain is validated and regular applications can then run with unmodified code and performance. On the other hand, both HE and Intel SGX require changes to application deployment and affect performance characteristics. In the case of HE, there are no freely available tools for converting generic code into code that processes encrypted data. In addition, generic approaches would be many orders of magnitude slower. As a consequence, HE is currently applicable only to specific problems with custom-made algorithms. Intel SGX currently also requires some changes in the development and deployment of applications. Nevertheless, as planned in the Description of Work for the SecureCloud project, there are means to isolate the regular application developer from the complexity of SGX. As an example of performance issues, Figure 4.1 depicts how memory access costs grow when the size of the application data inside the enclave gets close to the memory allocated to the enclave (causing an effect similar to paging). The figure compares the costs of memory accesses from code within an enclave and from code running outside enclaves. Intel SGX suffers performance losses when memory paging is required, as can be seen in Figure 4.1. However, as can be seen in Figure 4.2 – taken from previous work on homomorphic encryption and secure multi-party computation – the imposed computational cost is much lower with SGX.
In this case, the figure illustrates the computation costs for aggregating measurements in a smart grid application when done inside an enclave and in code that uses a homomorphic scheme based on Paillier.


Figure 4.1: Comparison between reading from SGX enclave and from unencrypted memory

Nevertheless, there could be alternatives to reduce the performance loss that results from using more than the roughly 90 MB of protected memory. We want to develop a solution so that secure containers are a sensible basis for the SecureCloud platform and there are no performance bottlenecks for the big-data applications that will be developed in WP5. To overcome this drawback, we want to utilize the unprotected memory and apply our own security mechanisms to maintain the confidentiality and integrity of the data. As a prerequisite, we need a secure mechanism to manage the unprotected memory. One solution would be to execute a memory allocation function outside of the enclave, but leaving and re-entering enclave mode is very expensive. Therefore, we are currently working on an alternative approach, where the memory allocation function is executed inside the enclave. As this function works on unprotected memory, an attacker could potentially influence its behavior. Therefore, we apply additional security mechanisms, which allow this function to access only unprotected memory, so that reading or modifying sensitive data is not possible.


Figure 4.2: SGX compared with homomorphic encryption

4.4.4 Robustness As the last factor, all approaches can be compromised if the application being executed has bugs that leak data or enable adversaries to somehow interfere with the results. Nevertheless, for Intel SGX, the code surface where such bugs would need to exist is the application code that runs inside the enclave. Similarly, for HE, the attack surface would be only the application. For Trusted Boot, the attack surface includes not only the application but also the operating system, libraries and tools installed on the machine. For comparison, Linux, the most used kernel in IaaS cloud environments, has 17 million lines of code, and over 100 security vulnerability patches were issued yearly between 2012 and 2014.

4.4.5 Summary Because SGX achieves a balance between feasibility and security guarantees, it has been chosen as the main technology for secure big data computation in the cloud. In the next chapter, we evaluate the chosen solution more closely to validate that it can serve as the basis of the SecureCloud approach.

5 Evaluation of SGX

Following the analysis of the previous chapter, a more thorough study of attacks, risks and vulnerabilities related to Intel SGX has been made and is presented in this chapter.

5.1 Attacks

Despite the advantages provided by Intel SGX, the technology does not provide security against all kinds of attacks. SGX can be shown to provide solid guarantees against straightforward attacks on enclaves, as we shall see below, but concerns have been raised regarding the lack of guarantees against sophisticated attacks such as side-channel attacks. A side-channel attack is any attack based on information gained from the physical implementation of a cryptosystem, including timing information, power consumption and electromagnetic leaks, among others. In the following sections we provide an overview of common attack approaches and discuss their potential impact on SGX. For a detailed discussion of these attacks, see [32].

5.1.1 Address Translation Attacks Straightforward Active Attacks Straightforward active address translation attacks, where malicious system software simply modifies the page tables, are prevented by rejecting undesirable address translations before they reach the TLB.

Active Attacks Using Page Swapping Active address translation attacks are a class of attacks in which the page tables used by an application are maliciously modified. For instance, the page tables could be modified so that a virtual address within the ELRANGE of an enclave, originally mapping to a PRM location belonging to the enclave, is remapped to a different PRM location also belonging to the enclave; this could result in, e.g., disclosure of sensitive information. To defend against this attack, the contents of each evicted page must be cryptographically bound to the virtual address to which the page should be mapped. This attack is prevented in SGX by an EPC page eviction method, enforced by an SGX instruction called EWB, which relies on symmetric-key cryptography to guarantee the confidentiality and integrity of evicted EPC pages, and on nonces stored in Version Arrays (VAs) to guarantee the freshness of pages brought back into the EPC after eviction. EWB evicts an encrypted version of the EPC page's contents together with some fields of the corresponding EPCM page, the nonce, and a message authentication code (MAC) tag. With the exception of the nonce, all of these are written outside the PRM area. The PAGEINFO field in the corresponding EPCM entry is substituted by a structure called Page Crypto Metadata (PCMD), which includes metadata associated with the evicted page as well as the MAC tag generated by EWB, which covers the authenticity of the evicted EPC page contents, the metadata, and the nonce. When the evicted page is reloaded, the MAC tag is checked, and the page is loaded back into the EPC only if the MAC value confirms the authenticity of the page data, metadata, and nonce, thereby counteracting the page swapping attack described above.
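The EWB-style binding can be approximated by the following sketch, in which a MAC binds the encrypted page to its virtual address and a per-eviction nonce, so a remapped or tampered page fails verification on reload. Key handling is heavily simplified (a SHA-256-based keystream stands in for the hardware cipher); in real SGX the keys and version nonces stay inside the processor and the EPC.

```python
import hashlib
import hmac
import os

KEY = os.urandom(32)  # models the CPU-internal eviction key

def _keystream(nonce: bytes, length: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(KEY + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def evict(page: bytes, vaddr: int):
    nonce = os.urandom(16)  # models the nonce kept in a Version Array
    ct = bytes(a ^ b for a, b in zip(page, _keystream(nonce, len(page))))
    # the MAC binds ciphertext, virtual address and nonce together
    tag = hmac.new(KEY, ct + vaddr.to_bytes(8, "big") + nonce,
                   hashlib.sha256).digest()
    return ct, tag, nonce

def reload(ct: bytes, tag: bytes, nonce: bytes, vaddr: int) -> bytes:
    expect = hmac.new(KEY, ct + vaddr.to_bytes(8, "big") + nonce,
                      hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("page tampered with, remapped, or replayed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(nonce, len(ct))))
```

Reloading the page at a different virtual address changes the MAC input, so the swap attack described above is detected before the page re-enters protected memory.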

Passive Address Translation Attacks However, passive attacks against SGX are still possible concerning page swapping. Although the system can be prevented from reading the contents of an application, and also from changing it without discovery,

it can still infer partial information about the application from its memory access patterns by observing page faults and page table attributes. In [78] the authors discuss how this kind of attack can be carried out against Intel SGX.

Active Attacks Based On TLBs This is an active memory mapping attack that may take place if the system software does not invalidate a core’s TLBs when it evicts two pages from DRAM, and then exchanges their locations when reading them back in. This is defeated by guaranteeing that all the logical processors have invalidated any TLB entry associated with pages that will be evicted, which in SGX is done by a special ETRACK instruction, invoked by EWB.

5.1.2 Physical Attacks This is one of the main risks that Intel SGX intends to protect against. However, available documentation on SGX is not enough in order to carry out a precise analysis of the risks deriving from physical access to the platform. The main physical attacks are detailed below.

Port Attacks Port attacks involve connecting a device to an existing port on the victim computer's case or motherboard. They include so-called cold boot attacks, in which the computer is booted from a flash drive containing malicious system software. It is worth mentioning that there might be vulnerabilities associated with the Generic Debug External Connection (GDXC), which collects and filters the data transferred by the uncore's ring (used for inter-core and core-uncore communication) and reports it to an external debugger, but the available documentation on Intel SGX is not sufficient to judge this. There is, though, an Intel patent [72] targeting debug port attacks.

Bus Tapping Attacks Bus tapping attacks consist of installing a device that taps a bus on the computer's motherboard. Passive attacks only monitor the bus traffic, active attacks also modify the traffic or insert new commands on the bus, and replay attacks replay old traffic. SGX considers the DRAM and the bus connecting it to the CPU chip to be untrusted. Confidentiality, integrity and freshness guarantees for the EPC data stored in DRAM are provided by SGX's Memory Encryption Engine (MEE). However, the MEE does not protect the addresses of the DRAM locations accessed when cache lines holding EPC data are evicted or loaded, thereby providing an opportunity to observe an enclave's memory access patterns by combining a DRAM address line bus tap with system software that creates artificial load on the LLC lines holding the enclave's EPC pages. There are also concerns that an attack may tap the SMBus to reach into the Intel ME, since the SMBus is easier to access than the DRAM bus.


Chip Attacks Chip attacks involve removing a chip's packaging and directly interacting with its electrical circuits, e.g. with equipment and techniques developed to diagnose design and manufacturing defects in chips. The cost of these attacks may be very high, requiring ion beam microscopy in the case of the latest Intel CPUs. The cheapest attacks are destructive and involve imaging the chip's circuitry with a microscope that captures the details of each layer, together with equipment that mechanically removes each layer and exposes the layer below to the microscope. These attacks commonly target global secrets shared by a family of chips, such as ROM masks that store global encryption keys or secret boot code. E-fuses and polyfuses are particularly vulnerable to imaging attacks. Non-destructive passive chip attacks, on the other hand, are much more expensive because the integrity of the chip's circuitry must be maintained. However, active attacks are not significantly more expensive than passive non-destructive attacks, since the process for accessing a module without destroying the chip's circuitry can be used for both. Often, secure computing architectures assume that the processor chip package is invulnerable to physical attacks. However, it is more realistic to assume that physical attacks will not take place if their cost is higher than their utility value, for instance by reducing the value that an attacker obtains by compromising an individual chip. The value of compromising an individual system can be reduced by avoiding shared secrets such as global encryption keys, and by not storing a platform's secrets in hardware that is vulnerable to destructive attacks, such as e-fuses. The SGX threat model does not consider physical attacks targeting the CPU chip. However, several Intel patents [52, 55] describe countermeasures aimed at increasing the cost of chip attacks.
For example, the Root Seal Key and the Root Provisioning Key, both stored in e-fuses, are encrypted with a global wrapping logic key (GWK), a 128-bit AES key hard-coded in the processor's circuitry, thereby increasing the cost of extracting the keys from the processor. However, the GWK is shared among all the chip dies created from the same mask and therefore has the drawbacks of a global secret. Newer Intel patents for SGX-enabled processors [44, 45] describe a Physical Unclonable Function (PUF) for generating a symmetric key to be used during the provisioning process. The idea is to encrypt the PUF key with the GWK before it is transmitted to the key generation server. Later, the key generation server encrypts the key material to be burned into the processor chip's e-fuses with the PUF key, and transmits it to the chip. In this way, the cost of obtaining a chip's fuse key material increases, because the attacker must compromise both provisioning stages in order to decrypt the fuse key material. Deriving the root keys from the PUF is expected to be more resilient against imaging attacks. However, it does not seem to be publicly known which of these mechanisms have been integrated into the current implementation of SGX.

Power Analysis Attacks Power analysis attacks consist of measuring the power consumption of a computer system or its components, e.g. the processor chip, taking advantage of a known correlation between power consumption and the computed data in order to derive some property of the data from the observed power consumption. The cost of the equipment required to carry out such attacks is usually dominated

by the cost of the complex analysis required to learn the desired information from the observed power trace. Successful power analysis attacks against Intel processors, which can easily be performed by any data center employee using inexpensive off-the-shelf sensor equipment, have been reported [40]. Power analysis attacks are not included in SGX's threat model, since such attacks cannot be addressed at the architectural level. Moreover, defending against power attacks requires expensive countermeasures with a very high cost-to-benefit ratio. In addition, power analysis attacks can extend to displays and human input devices, e.g. by measuring the radiation emitted by a CRT display's ion beam, rendering them even harder to address.

5.1.3 Privileged Software Attacks In IaaS clouds, commodity CPUs usually run software at four different privilege levels, Ring 0 through Ring 3, in order of decreasing privilege. Software running at higher privilege levels may access and modify code and data running at lower privilege levels (but not the other way around), and hence can compromise the latter, which therefore must trust the former. System Management Mode (SMM), on the other hand, runs with the highest privilege, above Ring 0. In the context of hardware virtualization, support is added for a hypervisor or Virtual Machine Monitor (VMM). The hypervisor runs at a higher privilege level than the operating system and allocates hardware resources across multiple operating systems sharing the same physical machine. Hypervisor code generally runs at ring 0 in VMX root mode. Operating systems are ideally split into a small kernel running at a high privilege level, in kernel or supervisor mode at ring 0, with device drivers and services running at lower privilege levels. However, for performance reasons, mainstream operating systems have a larger part of their code running at ring 0. Application code runs at the lowest privilege level, in user mode (ring 3). In IaaS cloud environments, the VM images provided by customers run in VMX non-root mode: the kernel in VMX non-root ring 0, and the application code in VMX non-root ring 3. The SGX design assumes that all the privileged software on the computer is malicious. SMM is only used to handle a specific kind of interrupt, System Management Interrupts (SMI). The SMM handler is stored in System Management RAM (SMRAM), which should not be accessible when the processor is not running in SMM. However, SMM-based rootkits have been demonstrated, and compromising the SMM grants an attacker access to all the software on the computer.
The SGX threat model considers system software to be untrusted, which is what makes SGX a candidate solution to the secure remote computation problem in IaaS cloud computing. Malicious software is prevented from directly reading or modifying the EPC pages storing an enclave's code and data. Two features of the SGX design are crucial here: (i) the SGX implementation runs in the processor's microcode at a higher privilege level; and (ii) SGX's security checks are the last step performed by the PMH and thus cannot be bypassed by any other architectural feature. SGX's microcode is always involved in transitions between enclave code and non-enclave code, regulating all interactions between system software and an enclave's environment, thereby preventing a malicious OS or hypervisor from attacking the enclave's software by tampering with its execution environment. In contrast to SGX, Intel's TXT relies on Intel's Virtual Machine Extensions (VMX) for isolation, whose restrictions could be bypassed by software running in System Management Mode (SMM).


In SGX, on the other hand, all transitions between enclave and non-enclave code place SMM software on the same level as any other system software at lower privilege levels. System Management Interrupts (SMI) causing the SMM to take over are handled using the same Asynchronous Enclave Exit (AEX) process as all other hardware exceptions. Whenever a hardware exception occurs during the execution of an enclave's code, an AEX is performed before the system software's exception handler is invoked. The AEX saves the enclave code's execution context: the execution state of the logical processor is removed and stored in a State Save Area (SSA) inside the enclave, so that the system software's exception handler cannot access any enclave secrets. However, the inherent complexity of the architectural features of SGX, which are also expected to change with every new generation of CPUs, makes it very difficult to prove the correctness of the security properties of the transitions between enclave and non-enclave code. Another security risk is associated with hyperthreading. SGX does not prevent hyperthreading, where the execution units and caches of a core are shared by two logical processors, each with its own execution state. In this case, malicious system software can schedule a thread executing the code of a normal enclave on a logical processor that shares its core with another logical processor executing a snooping thread. This could be prevented either by disabling hyperthreading for SGX, which could be done by letting the enclave measurement include the hyperthreading configuration, or by having the SGX implementation guarantee that any other logical processor sharing the core is either inactive or executing the same enclave's code that the enclave's logical processor is executing. However, no initiatives by Intel to address this issue are known.

5.1.4 Software Attacks on Peripherals Since the system software is untrusted by SGX, SGX should by design be able to resist attacks by peripheral devices on the computer's motherboard controlled by system software. However, peripherals could be used in physical-like attacks, so it must also be shown that there are barriers preventing untrusted software on the CPU from communicating with other programmable devices, and preventing compromised programmable devices from tampering with sensitive buses or DRAM. This is, however, a difficult task because there is not enough official documentation on devices such as the Management Engine (ME), the Platform Controller Hub (PCH), and the Direct Media Interface (DMI). Next, we present some possible attacks of this type.

PCI Express Attacks The PCI Express bus allows devices connected to the bus to perform direct memory access (DMA) to the computer's DRAM without the involvement of a CPU core. Although each device is assigned a range of DRAM addresses for its own use, devices may perform DMA on DRAM addresses outside of that range. Hence, an attacker may use programmable devices to access any DRAM region. In an SGX-enabled processor, the memory controller (MC) integrated on the processor's chip die rejects any DMA transfer to the PRM range, which includes the EPC, thus protecting the enclaves' contents from PCI Express attacks. This protection is possible because the MC configuration commands issued by SGX's microcode are transmitted over a trusted communication path lying entirely within the CPU die.
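The memory controller's protection can be modeled as a simple range-overlap check on every DMA request; the PRM base and size below are hypothetical values for illustration only.

```python
# hypothetical PRM placement, for illustration only
PRM_BASE = 0x80000000
PRM_SIZE = 128 * 1024 * 1024  # 128 MiB

def dma_transfer(addr: int, size: int) -> bool:
    """Model of an SGX-style memory controller: reject any DMA
    transfer whose address range overlaps the PRM."""
    if addr < PRM_BASE + PRM_SIZE and addr + size > PRM_BASE:
        raise PermissionError("DMA into PRM rejected")
    return True

# a transfer well below the PRM is allowed
assert dma_transfer(0x10000000, 4096)
```

The overlap test covers transfers that start below the PRM but spill into it, not just transfers that begin inside the protected range.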


DRAM Attacks DRAM attacks include the rowhammer DRAM bit-flipping attack, in which an attacker may cause bit flips in the page tables used for address translation. The DRAM is considered an untrusted storage medium by SGX, and hence cryptographic primitives are employed to guarantee the confidentiality, integrity and freshness of the EPC contents stored in DRAM. These primitives are supported by an autonomous hardware unit called the Memory Encryption Engine (MEE), whose role is to protect the confidentiality, integrity, and freshness of the CPU-DRAM traffic over the PRM range [40].

The Performance Monitoring Side Channel A power analysis attack can be carried out if a computer's system software is compromised so that an attacker gains access to performance monitoring events. SGX addresses these attacks by disabling processor features such as Precise Event Based Sampling (PEBS) for the logical processor, as well as any hardware breakpoints placed inside the enclave's virtual address range (ELRANGE). However, SGX does not protect against software side-channel attacks that rely on performance counters, and the lack of information in the publicly available Intel documentation on this part of the SGX design prevents enclave authors from trying to find countermeasures against these kinds of attacks. The CPU uncore has a bidirectional ring interconnect used for communication between the execution cores, which are connected to the ring by CBoxes, and the other uncore components. The CBoxes statically route the LLC accesses of the execution cores. However, Intel does not document the mapping implemented by the CBoxes between physical DRAM addresses and the LLC slices used to cache those addresses, a mapping which impacts several uncore performance counters and appears amenable to reverse-engineering. As a result, knowledge of the CBox mapping can be used to learn details about an enclave's memory access patterns, for instance by attacks that compromise the firmware executed by the Intel Management Engine (ME), an embedded computer that plays a crucial role in platform bootstrapping.

Attacks on the Boot Firmware and Intel ME

Computers usually store the firmware used to boot in a flash memory chip that can be written and updated by system software. Hence, attacks that compromise system software can subvert the firmware update mechanism to inject malicious code into the firmware, which can, for instance, be used to carry out a cold boot attack. The Intel Management Engine (ME) also loads its firmware from the same flash memory chip as the main computer, and a compromised ME could leak most of the information that could otherwise only be obtained by installing active probes on the DRAM bus, the PCI bus, and the System Management Bus (SMBus), together with power consumption meters. It is known that security checks on the computer firmware and on the ME have been subverted in the past. Since SGX assumes that the DRAM is untrusted, in principle these attacks should not compromise an enclave's code or data, and should be equivalent to other DRAM attacks. Nevertheless, there is concern about the lack of documentation on the ME design and implementation. Since the ME is involved in the boot process, it may play a part in the SGX initialization sequence. For instance, the ME could get direct access to the CPU's caches in order to enable the ME's TPM implementation to measure the

39 Deliverable 2.1 Secure Big Data Processing in Untrusted Clouds

firmware directly, which could sidestep the MEE and thereby allow attackers to directly read the EPC’s contents.

5.1.5 Cache Timing Attacks

Cache timing attacks can be carried out by application code running at ring 3. They exploit the dependency between the location of a memory access and the time it takes to perform that access. A cache miss requires a memory access to at least the next-level cache, and a second memory access if a write-back occurs. With the aid of the high-resolution time-stamp counter, read via the RDTSC and RDTSCP instructions available to ring 3 software, an attacker can distinguish the latency of a cache hit from that of a miss. The attacker proceeds by filling up all the cache sets that may otherwise hold the relevant memory locations of the victim; when the latter accesses a memory location in its own address space, the shared cache must evict one of the cache lines holding the attacker's memory locations. Meanwhile, the attacker repeatedly accesses its own memory locations, and when access times indicate an eviction, the attacker concludes that the victim accessed a memory location mapping to the same cache set. Over time, the attacker thereby learns the victim's memory access pattern, which may ultimately yield sensitive information in the case of data-dependent memory fetches. Cache timing attacks have been able to retrieve cryptographic keys used by AES, RSA, Diffie-Hellman, and elliptic-curve cryptography. In order to carry out such attacks, an attacker must have access to memory locations mapping to the same cache sets as the victim. As a result, one line of defense would be a cache partitioning scheme, which must nevertheless be enforced by trusted system software. Another requirement is that the victim accesses its memory in a data-dependent fashion, allowing inference of private information from the memory access pattern, which is difficult to change. Concerns have been expressed that SGX does not protect against software attacks based on memory access patterns, such as cache timing attacks.
The SGX threat model does not consider cache timing attacks: they are classified as side-channel attacks and dismissed as complex physical attacks, and no modification seems to have been made to SGX to defend against them, such as disabling caching for the PRM range containing the EPC.
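The fill-then-probe technique described above (often called Prime+Probe) can be sketched with a toy simulated direct-mapped cache. The cache model and all names are illustrative assumptions; a real attack measures access latencies with RDTSC instead of querying a simulator.

```python
# Toy Prime+Probe: the attacker fills ("primes") every cache set, the
# victim performs one secret-dependent access, and the attacker re-reads
# ("probes") its own lines; the set that now misses reveals the access.

N_SETS = 16

class ToyCache:
    """Direct-mapped cache tracking which principal owns each set."""
    def __init__(self):
        self.owner = [None] * N_SETS
    def access(self, who, set_index):
        hit = self.owner[set_index] == who
        self.owner[set_index] = who          # fill / evict
        return hit                           # True = fast (hit), False = slow (miss)

def victim(cache, secret):
    # Data-dependent memory access: the set index depends on the secret.
    cache.access("victim", secret % N_SETS)

def prime_and_probe(cache, secret):
    for s in range(N_SETS):                  # prime: occupy every set
        cache.access("attacker", s)
    victim(cache, secret)                    # victim runs once
    # probe: a miss (slow access) marks the set the victim touched
    return [s for s in range(N_SETS) if not cache.access("attacker", s)]

assert prime_and_probe(ToyCache(), 5) == [5]
```

The inferred set index is exactly the secret-dependent part of the victim's address, which is why constant, data-independent memory access patterns are the standard mitigation.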

5.2 Risks

In this section we revisit the risks detailed in Chapter 3 and highlight how Intel SGX may address some of them.

5.2.1 Policy and organizational risks

R.1 LOCK-IN

Vulnerabilities

V13. Lack of standard technologies and solutions
V31. Lack of completeness and transparency in terms of use
V46. Poor provider selection


V47. Lack of supplier redundancy

Assessment

Intel SGX should in principle have no impact on application portability.

R.2 LOSS OF GOVERNANCE

Vulnerabilities

V13. Lack of standard technologies and solutions
V14. No source escrow agreement
V16. No control on vulnerability assessment process
V21. Synchronizing responsibilities or contractual obligations external to cloud
V22. Cross-cloud applications creating hidden dependency
V23. SLA clauses with conflicting promises to different stakeholders
V25. Audit or certification not available to customers
V26. Certification schemes not adapted to cloud infrastructures
V29. Storage of data in multiple jurisdictions and lack of transparency about this
V30. Lack of information on jurisdictions
V31. Lack of completeness and transparency in terms of use
V34. Unclear roles and responsibilities
V35. Poor enforcement of role definitions
V44. Unclear asset ownership

Assessment

Technologies such as Intel SGX extend client control into the cloud. As they help protect information from the provider itself, such technologies contribute to minimizing this risk.

R.3 COMPLIANCE CHALLENGES

Vulnerabilities

V25. Audit or certification not available to customers
V13. Lack of standard technologies and solutions
V29. Storage of data in multiple jurisdictions and lack of transparency about this


V26. Certification schemes not adapted to cloud infrastructures
V30. Lack of information on jurisdictions
V31. Lack of completeness and transparency in terms of use

Assessment

The use of Intel SGX enclaves and attestation may render requirements on compliance evidence superfluous, since the cloud provider is regarded as largely untrusted, thereby mitigating this risk.

R.4 LOSS OF BUSINESS REPUTATION DUE TO CO-TENANT ACTIVITIES

Vulnerabilities

V6. Lack of resource isolation
V7. Lack of reputational isolation
V5. Hypervisor vulnerabilities

Assessment

This risk is not affected by the usage of SGX. Even though applications can be isolated, in a cloud provider there is always the risk of IP addresses migrating from one tenant to another.

R.5 CLOUD SERVICE TERMINATION OR FAILURE

Vulnerabilities

V31. Lack of completeness and transparency in terms of use
V46. Poor provider selection
V47. Lack of supplier redundancy

Assessment

This is not affected by a technology like Intel SGX.

R.6 CLOUD PROVIDER ACQUISITION

Vulnerabilities

V31. Lack of completeness and transparency in terms of use


Assessment

This is not affected by a technology like Intel SGX.

R.7 SUPPLY CHAIN FAILURE

Vulnerabilities

V22. Cross-cloud applications creating hidden dependency
V31. Lack of completeness and transparency in terms of use
V46. Poor provider selection
V47. Lack of supplier redundancy

Assessment

Intel SGX may make it harder for the cloud provider to outsource or sub-contract services to third parties on other platforms, especially for running applications, even if those platforms offered the same guarantees as the original cloud provider.

5.2.2 Technical risks

R.8 RESOURCE EXHAUSTION (UNDER OR OVER PROVISIONING)

Vulnerabilities

V15. Inaccurate modeling of resource usage
V27. Inadequate resource provisioning and investments in infrastructure
V28. No policies for resource capping
V47. Lack of supplier redundancy

Assessment

Moving live applications running in SGX enclaves to a distributed environment in which they may migrate and interact is still a research challenge. This increases the risks associated with resource exhaustion or over-provisioning, since the flexibility provided by the distribution of resources, both compute and storage, is one of the defining features of cloud computing and what can make it cheaper than in-house approaches. If costs are not reduced, owing to the rather monolithic nature of SGX-enabled computing, the incentive to move to the cloud may be significantly reduced. Service availability is important for smart grids, and oversizing the infrastructure may be a consequence of this, especially if a bare-metal solution is adopted. Moreover, limitations on the size of the PRM


(currently at a static size of 128 MB), as well as inefficient page swapping mechanisms and the difficulty, or even impossibility, of distributing computation or data running in enclaves, e.g. for load balancing purposes, significantly increase this risk. Moreover, once initialized, enclaves cannot dynamically grow. In SGX, the area used to store an enclave thread's execution context after hardware exceptions is called a State Save Area (SSA), which is stored in regular EPC pages; both the number of available SSAs (NSSA) and the number of EPC pages that each SSA may use (SSAFRAMESIZE) are static and must be specified before enclave creation. The creation of an enclave will fail if SSAFRAMESIZE is too small. Moreover, this field is included in the enclave's measurement, since leaving it out would allow a malicious enclave loader to specify a bigger SSAFRAMESIZE than the intended one in order to cause the SSA contents to overwrite the enclave's code or data. This forces the enclave author to accurately model resource usage in order to avoid overbooking or over-provisioning of resources.
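Why including SSAFRAMESIZE in the measurement matters can be shown with a minimal sketch: if the field feeds the measurement hash, a loader that tampers with it necessarily changes the attested identity. The hash layout here is a deliberate simplification of the real MRENCLAVE computation; the function name is invented for illustration.

```python
import hashlib

def measure_enclave(code: bytes, ssa_frame_size: int, nssa: int) -> str:
    """Toy enclave measurement: SSA parameters are hashed together with
    the code, so a malicious loader cannot silently alter them."""
    h = hashlib.sha256()
    h.update(ssa_frame_size.to_bytes(4, "big"))
    h.update(nssa.to_bytes(4, "big"))
    h.update(code)
    return h.hexdigest()

honest = measure_enclave(b"enclave code", ssa_frame_size=1, nssa=2)
tampered = measure_enclave(b"enclave code", ssa_frame_size=64, nssa=2)
assert honest != tampered  # changed SSAFRAMESIZE changes the measurement
```

Because attestation ultimately verifies this measurement, the oversized-SSA attack described above would be detected by any relying party.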

R.9 ISOLATION FAILURE

Vulnerabilities

V5. Hypervisor vulnerabilities
V6. Lack of resource isolation
V7. Lack of reputational isolation
V17. Possibility that internal (cloud) network probing will occur
V18. Possibility that co-residence checks will be performed

Assessment

Intel SGX is assumed to provide protection against these attacks by means of its strong isolation features. Trusted booting, together with attestation, may also provide assurance against these types of attacks. The use of Metal as a Service (MaaS) can provide further isolation capabilities. SGX's security model is based on the guarantee that the software inside an enclave is isolated from all the software outside it. This is achieved through a series of memory access checks that prevent the currently running software from accessing memory that does not belong to it; non-enclave software is only allowed to access memory outside the PRM range. As this is a crucial feature of the SGX architecture, we present below some details about the mechanisms used to enforce these memory access checks. Memory mappings in SGX-enabled processors are carried out with the help of page tables managed by untrusted system software, which by itself would enable simple address translation attacks; preventing such attacks is one of the main drivers behind many of SGX's design decisions. Hence, SGX provides access checks that prevent the system software from directly accessing the isolated container's memory. SGX protects the confidentiality and integrity of an enclave's memory through address space allocation and memory commitment. SGX reserves a memory region called the Processor Reserved


Memory (PRM), protected by the CPU from all non-enclave memory accesses, including kernel, hypervisor, and DMA accesses from peripherals. The PRM holds the Enclave Page Cache (EPC), consisting of pages that store enclave information, code, and data. Each EPC page's state is held in the Enclave Page Cache Metadata (EPCM), an array with one entry per EPC page whose contents are used only for SGX's security checks. Non-enclave software can only access memory outside the PRM range, while code inside an enclave is allowed to access both PRM and non-PRM memory. SGX's approach rests on the fact that software always references memory using virtual addresses, which must be resolved before the actual memory accesses are carried out. A specific area within an enclave's virtual address space, called the ELRANGE, is allocated for the enclave's own use. Memory commitment adds physical pages to the enclave. Virtual address space outside the ELRANGE is mapped to access memory outside the enclave. Non-enclave software cannot access the ELRANGE: a non-enclave memory access that resolves inside the PRM results in an aborted transaction. The defense mechanisms aim to ensure that each enclave page can only be mapped at a specific virtual address. This is done by recording the intended virtual address of each EPC page, at the time it is allocated, in the ADDRESS field of the page's EPCM entry. Thereafter, every time an address translation yields the physical address of an EPC page, the CPU checks whether the virtual address given to the address translation process matches the virtual address recorded in the page's EPCM entry. Hence, the EPCM may be viewed as a kind of inverted page table covering the entire EPC. In order to implement SGX's memory access checks, a modification has been made to the processor's execution cores in the Page Miss Handler (PMH), which resolves TLB misses.
Most of SGX's memory access checks can be implemented in a microcode assist; the only required modification to the PMH hardware is a mechanism that triggers the microcode assist for address translations when a logical processor is in enclave mode or when the physical address is in the PRM range. Address translation results are cached in the translation look-aside buffer (TLB). When a TLB miss occurs, the memory execution unit forwards the virtual address to the PMH, which performs the page walk needed to obtain a physical address. In order to protect against page translation attacks, a series of security checks is performed by the PMH whenever a new TLB entry is created. These checks are performed whenever the processor is in enclave mode; otherwise, the PMH allows any address translation that does not target the PRM range. Virtual addresses inside the ELRANGE must always translate into physical addresses inside the EPC, and an EPC page must only be accessed by the enclave owning that page. The latter is made possible by the fact that each enclave is identified by the index of the EPC page storing the enclave's SECS, which is also stored in a special microcode register whenever the enclave is entered. At execution time, this register is compared against the enclave identifier stored in the corresponding EPCM entry.


Updates to the TLB after a miss occur as follows. After a normal TLB miss, the PMH performs the page walk and obtains the corresponding physical address to be entered into the TLB. When not in enclave mode, if the physical address is within the PRM range, the corresponding virtual address is associated with an abort page, since this means that non-enclave code is trying to access enclave memory; otherwise, the TLB is updated normally, since no enclave memory is involved. When in enclave mode and the physical address is not in the PRM: if the virtual address is in the ELRANGE, a page fault results, since this is a sign that the translation process has been manipulated; otherwise, the TLB is updated (in this case the final address translation might still have been manipulated, but since no guarantees are provided for non-ELRANGE memory, this must be assumed by enclave authors). If, on the other hand, the physical address is in the PRM, it must be checked that the virtual address is in the ELRANGE of the active enclave. If not, a page fault is produced, as this would mean that an enclave is trying to access the protected memory of another enclave. Otherwise, the EPCM entry for the physical address is checked, which is the crucial step in this process, and the reason why each EPCM entry records the virtual address of the EPC page it is associated with.
After checking that the EPCM entry is not blocked and is of type PT_REG (other types, for instance PT_SECS, denote EPC pages not accessible to applications; in those cases a page fault occurs), it is checked whether the EPCM entry's EID field (which stores the ID of the enclave owning the page) equals the current enclave's ID, i.e. whether the active enclave owns the page being accessed. If not, a page fault results; otherwise, the final check is made whether the EPCM entry's ADDRESS field equals the translated virtual address, which is the central point of the whole process. If not, a page fault is produced; otherwise, a new entry is inserted into the TLB and used in the future for fast memory accesses to the corresponding virtual address, without repeating the whole process described above. The SGX memory protection measures are implemented in the Page Miss Handler (PMH) and, at the chip die level, in the memory controller. Finally, since SGX instructions are implemented in microcode, the processor's microcode has the ability to issue physical memory accesses that bypass the TLBs in order to access memory that is off limits to software, such as EPCM pages. It is worth noting that the only place where the code and data inside enclaves appear in plaintext is in the on-chip caches, which means that enclave contents are transmitted in the clear on the uncore's ring bus.
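The decision logic of the preceding two paragraphs can be condensed into a short sketch. Ranges, EPCM entries, and outcome labels ("ok" = insert TLB entry, "abort" = abort page, "fault" = page fault) are simplifications for illustration; this is not Intel's microcode.

```python
# Toy model of the security checks run when the PMH creates a new TLB entry.

PT_REG = "PT_REG"

def tlb_miss_check(in_enclave, vaddr, paddr, elrange, prm, epcm, current_eid):
    in_elrange = elrange[0] <= vaddr < elrange[1]
    in_prm = prm[0] <= paddr < prm[1]
    if not in_enclave:
        return "abort" if in_prm else "ok"      # non-enclave code never reaches PRM
    if not in_prm:
        return "fault" if in_elrange else "ok"  # ELRANGE must map into the EPC
    entry = epcm.get(paddr)                      # paddr is in the PRM: consult EPCM
    if entry is None or entry["blocked"] or entry["type"] != PT_REG:
        return "fault"
    if entry["eid"] != current_eid:              # page owned by another enclave
        return "fault"
    if entry["address"] != vaddr:                # address-translation attack
        return "fault"
    return "ok"

epcm = {0x5000: {"blocked": False, "type": PT_REG, "eid": 7, "address": 0x1000}}
elrange, prm = (0x1000, 0x2000), (0x5000, 0x6000)
assert tlb_miss_check(True, 0x1000, 0x5000, elrange, prm, epcm, 7) == "ok"
assert tlb_miss_check(False, 0x0, 0x5000, elrange, prm, epcm, 7) == "abort"
assert tlb_miss_check(True, 0x1800, 0x5000, elrange, prm, epcm, 7) == "fault"
```

The last assertion shows the ADDRESS check defeating a remapping attack: the page's EPCM entry pins it to virtual address 0x1000, so mapping it at 0x1800 faults.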


We have presented in more detail the security checks performed whenever a new entry is added to the TLB because we believe they constitute one of the central security mechanisms in Intel SGX, responsible for a sizable part of the complexity of its design. They guarantee the intended isolation of the enclaves and the defense against active memory mapping attacks, such as address translation attacks.

R.10 CLOUD PROVIDER MALICIOUS INSIDER - ABUSE OF HIGH PRIVILEGE ROLES

Vulnerabilities

V1. AAA vulnerabilities
V10. Impossibility of processing data in encrypted form
V34. Unclear roles and responsibilities
V35. Poor enforcement of role definitions
V36. Need-to-know principle not applied
V39. System or OS vulnerabilities
V37. Inadequate physical security procedures
V48. Application vulnerabilities or poor patch management

Assessment

Intel SGX is assumed to protect sensitive data against such insider attacks. However, physical attacks, such as chip attacks, remain possible.

R.11 MANAGEMENT INTERFACE COMPROMISE (MANIPULATION, AVAILABILITY OF INFRASTRUCTURE)

Vulnerabilities

V1. AAA vulnerabilities
V4. Remote access to management interface
V38. Misconfiguration
V39. System or OS vulnerabilities
V48. Application vulnerabilities or poor patch management

Assessment

This is not affected by a technology such as Intel SGX.

R.12 INTERCEPTING DATA IN TRANSIT


Vulnerabilities

V1. AAA vulnerabilities
V8. Communication encryption vulnerabilities
V9. Lack of or weak encryption of archives and data in transit
V17. Possibility that internal (cloud) network probing will occur
V18. Possibility that co-residence checks will be performed
V31. Lack of completeness and transparency in terms of use

Assessment

The attestation protocols in SGX include the secure creation of symmetric keys, with which man-in-the-middle attacks are prevented thanks to SGX platform authentication. Side-channel attacks, on the other hand, are still possible, as discussed in the previous section.

R.13 DATA LEAKAGE ON UP/DOWNLOAD, INTRA-CLOUD

As R.12, but applies only to the transfer of data between the cloud provider and the cloud customer.

Vulnerabilities

V1. AAA vulnerabilities
V8. Communication encryption vulnerabilities
V17. Possibility that internal (cloud) network probing will occur
V18. Possibility that co-residence checks will be performed
V10. Impossibility of processing data in encrypted form
V48. Application vulnerabilities or poor patch management

R.14 INSECURE OR INEFFECTIVE DELETION OF DATA

Vulnerabilities

V20. Sensitive media sanitization

Assessment

The use of Intel SGX enclaves, memory encryption, and sealing mechanisms reduces this risk.
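One way sealing mitigates insecure deletion is crypto-shredding: data sealed under an enclave-derived key becomes unrecoverable once the inputs to the key derivation (for instance the OWNEREPOCH register discussed under R.17) are destroyed, so leftover ciphertext on media matters far less. The toy XOR "seal" and the HMAC-based key derivation below are illustrative stand-ins; real SGX sealing uses AES-GCM with an EGETKEY-derived key.

```python
import hmac, hashlib, os

def seal_key(root_seal_key: bytes, mrsigner: bytes, owner_epoch: bytes) -> bytes:
    """Toy sealing-key derivation: a one-way function of the root seal key,
    the signer identity, and the owner epoch."""
    return hmac.new(root_seal_key, mrsigner + owner_epoch, hashlib.sha256).digest()

def xor_seal(key: bytes, data: bytes) -> bytes:
    """Toy symmetric 'seal' (XOR with a key-derived stream); involutive."""
    stream = hashlib.sha256(key).digest()
    return bytes(d ^ stream[i % 32] for i, d in enumerate(data))

root, epoch = os.urandom(16), os.urandom(16)
key = seal_key(root, b"signer", epoch)
ct = xor_seal(key, b"secret record")
assert xor_seal(key, ct) == b"secret record"       # same epoch: unseal works
# After the OWNEREPOCH is cleared and regenerated, the derived key differs
# and the old ciphertext is irrecoverable garbage.
new_key = seal_key(root, b"signer", os.urandom(16))
assert xor_seal(new_key, ct) != b"secret record"
```

Destroying a 16-byte secret is far easier to do reliably than sanitizing every copy of the data itself, which is the point of this mitigation.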

R.15 DISTRIBUTED DENIAL OF SERVICE (DDOS)


Vulnerabilities

V38. Misconfiguration
V39. System or OS vulnerabilities
V53. Inadequate or misconfigured filtering resources

Assessment

This is not affected by a technology such as Intel SGX.

R.16 ECONOMIC DENIAL OF SERVICE (EDOS)

Vulnerabilities

V1. AAA vulnerabilities
V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V4. Remote access to management interface
V28. No policies for resource capping

Assessment

This is not affected by a technology such as Intel SGX.

R.17 LOSS OF ENCRYPTION KEYS

Vulnerabilities

V11. Poor key management procedures
V12. Key generation: low entropy for random number generation

Assessment

The way Intel SGX provides EPID keys is a controversial subject. Intel's documentation on remote attestation has been criticized for being incomplete and obscure. The attestation scheme relies on two services operated by Intel: a key-generation facility and a provisioning service. When an SGX processor is manufactured, two keys are burned into e-fuses: a Root Seal Key and a Root Provisioning Key. The root seal key is generated inside the processor chip and is supposed to be unknown to Intel, which implies that even if Intel's keys were compromised, an attacker would not be able to derive most of the keys produced by enclaves, e.g. sealing keys. The root provisioning key, however, is generated


at Intel's key generation facility and stored in a database used by Intel's provisioning service. It therefore constitutes a shared secret between Intel's provisioning service and the processor. However, in order to obtain an attestation key, the root provisioning key is not enough; instead, a provisioning key is derived from the root provisioning key using the certificate-based identity (MRSIGNER, ISVPRODID, ISVSVN) of a special provisioning enclave (signed by Intel), the SGX implementation's security version number (CPUSVN), the name of the desired key, KEYNAME (in this case a two-byte word representing the key type "provisioning key"), and the value of MASKEDATTRIBUTES. The certificate-based identity consists of three fields: MRSIGNER, ISVPRODID, and ISVSVN. MRSIGNER, called the signer measurement, is a hash of the modulus of the RSA key used to sign the enclave certificate (MODULUS); ISVPRODID is the enclave's product ID, and ISVSVN its security version number. SGX requires each enclave to have a certificate issued by its author. A certificate is formatted as a SIGSTRUCT structure, commonly generated by an enclave-building toolchain that has access to the enclave author's private RSA key (the public exponent need not be specified, as it is always 3). SIGSTRUCT contains the values of MODULUS (extracted from the enclave author's public RSA key, which must therefore be provided by the author when building the enclave), the enclave's product ID ISVPRODID, and the security version number ISVSVN, the three values required to build the certificate-based identity of the enclave. The values of ISVPRODID and ISVSVN are provided by the toolchain. ISVPRODID is a unique value representing a software module, which may have different versions, and each version can have a different security version number ISVSVN.
The SGX implementation's security version number, CPUSVN, on the other hand, is associated with the SGX implementation itself and reflects the processor's microcode update version. The value of ATTRIBUTES is extracted from the attributes field in the SGX Enclave Control Structure (SECS) associated with each enclave. It consists of bit-granular fields that include the DEBUG flag, the INIT attribute set when an enclave is initialized, and the value of the XCR0 register while the enclave's code executes; the latter is set by the kernel to the feature bitmap declared by the application. ATTRIBUTESMASK is a field in the KEYREQUEST structure used to derive the provisioning key; some of its fields, like DEBUG and INIT, cannot be masked away. Summing up, the provisioning key is a (one-way) function of the following attributes (all of which must be known by Intel):

• the root provisioning key in the e-fuses;
• the author's RSA key MODULUS;
• the enclave's product ID ISVPRODID;
• the security version number of the enclave, ISVSVN;
• the security version number of the SGX implementation, CPUSVN;
• the ATTRIBUTES field in the enclave's SECS;
• the ATTRIBUTEMASK in the KEYREQUEST structure.
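The functional dependence on these inputs can be sketched as follows. SGX's real derivation runs inside EGETKEY with an AES-based KDF; the HMAC-SHA256 below, and the field widths chosen, are stand-ins to illustrate the one-way dependence only.

```python
import hmac, hashlib

def derive_provisioning_key(root_provisioning_key: bytes, mrsigner: bytes,
                            isvprodid: int, isvsvn: int, cpusvn: int,
                            attributes: int, attributes_mask: int) -> bytes:
    """Toy one-way derivation of the provisioning key from the inputs
    listed above (KEYNAME fixed to the 'provisioning key' selector)."""
    material = (mrsigner
                + isvprodid.to_bytes(2, "big")
                + isvsvn.to_bytes(2, "big")
                + cpusvn.to_bytes(16, "big")
                + (attributes & attributes_mask).to_bytes(16, "big")
                + b"PROVISION_KEY")
    return hmac.new(root_provisioning_key, material, hashlib.sha256).digest()

k1 = derive_provisioning_key(b"\x00" * 16, b"signer", 1, 2, 3, 0b110, 0b111)
k2 = derive_provisioning_key(b"\x00" * 16, b"signer", 1, 3, 3, 0b110, 0b111)
assert k1 != k2  # bumping ISVSVN yields a different provisioning key
```

Because Intel knows all of these inputs, it can derive the same key on its side, which is exactly what lets the provisioning enclave authenticate itself to Intel's provisioning service.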


After derivation of the provisioning key, the provisioning enclave uses it to authenticate itself to Intel's provisioning service. If the authentication succeeds, the provisioning service generates an Attestation Key and sends it to the Provisioning Enclave. The Attestation Key is thereafter encrypted with a Provisioning Seal Key and handed to the system software for storage. The Provisioning Seal Key is derived from the root seal key, which is unknown to Intel. The Attestation Key encrypted by the Provisioning Seal Key can only be decrypted by another enclave signed with the same Intel RSA key, since the key derivation is performed by the provisioning enclave and includes the value of MRSIGNER. Remote attestation involves first a local attestation procedure. Local attestation is performed by a calling enclave towards a target enclave. For remote attestation, the Quoting Enclave must be invoked as the target enclave. It is a privileged enclave issued by Intel and is therefore able to access the processor's attestation key; hence, the Quoting Enclave's software may read the attestation key and produce attestation signatures. It is worth noting in this context that enclaves are vulnerable to many software side-channel attacks, which have been shown to be effective in extracting secrets from isolated environments. During attestation, an EREPORT structure is produced. This structure contains, among other information, the values of ATTRIBUTES, MRENCLAVE, MRSIGNER, ISVPRODID, ISVSVN, and CPUSVN. The EREPORT is authenticated by means of a Report Key, used to create a MAC tag for the EREPORT; this MAC tag is thereafter reproduced by the target enclave. The EREPORT cryptographically binds a message supplied by the enclave with the enclave's measurement-based and certificate-based identities by means of the MAC tag computed with the Report Key, a symmetric key shared only by the target enclave and the SGX implementation.
For deriving the report key, a call is made to EGETKEY in which, in contrast to the derivation of the provisioning key described above, MRSIGNER is not included; instead, the MRENCLAVE of the target enclave is used (for remote attestation, the Quoting Enclave; for local attestation, the intended local target enclave), as well as the latter's ATTRIBUTES. MRENCLAVE represents the identity of an enclave, i.e. the measurement of the code and data within it. The values of ISVSVN and ISVPRODID are left out. Moreover, the value of the seal fuse is included, as is the value of a 128-bit Owner Epoch (OWNEREPOCH) SGX configuration register, set by the computer's firmware to a secret generated once and stored in non-volatile memory. Before handing over ownership, an owner can clear the OWNEREPOCH from non-volatile memory, thereby making it impossible for a new owner to decrypt any enclave secrets left on the computer. Most importantly from a security point of view, the value of CPUSVN is included, which implies that SGX security updates, which always increase the value of CPUSVN, invalidate all outstanding reports. For remote attestation, as noted above, the Quoting Enclave is the target enclave. It is a privileged enclave issued by Intel and can therefore access the SGX attestation key. Upon receipt of the EREPORT, the Quoting Enclave first verifies its authenticity using the Report Key generated by EGETKEY. Thereafter, the Quoting Enclave obtains the Provisioning Seal Key from EGETKEY and uses it to decrypt the Attestation Key stored by the system software. Finally, the Quoting Enclave replaces the MAC in the local attestation report with an Attestation Signature produced with the Attestation Key, and provides it to the application to which the enclave under attestation


belongs. Last, the application provides the signed EREPORT to the off-platform challenger that initially requested the attestation. The Attestation Key uses Intel's Enhanced Privacy ID (EPID) cryptosystem, a group signature scheme intended to preserve the anonymity of the signers. Intel publishes the Group Public Key and securely stores the Master Issuing Key. The Attestation Key provided by Intel's provisioning service is therefore an EPID Member Private Key, which the Provisioning Enclave can use to execute the EPID Join protocol to join the group. If the EPID Join protocol is blinded, Intel's provisioning service cannot trace an Attestation Signature to a specific Attestation Key, and hence to individual chips, assuming the correctness of the EPID scheme. From the above characterization, several observations can be made. Intel must be trusted, and a business relationship with Intel must be established: attestation keys cannot be provisioned without Intel's intervention, since its provisioning service must be contacted online. This can be seen as a serious limitation. However, Intel should not be able to obtain access to confidential information: all types of sealing keys depend on the root seal key, which should not be known to Intel. Secure migration of secrets between enclaves within the same platform is possible. The group signature scheme increases the anonymity of the owners of SGX-enabled processors, ensuring that Intel's provisioning service cannot track individual SGX-enabled processor chips. The Intel Attestation Service (IAS) can be used for attestation, but this could also be done by other agents trusted by the consumer. The most troubling issue here is perhaps that Intel must store the root provisioning keys. If these were compromised, potentially no sealed secrets would be revealed, since sealing keys are a function of the root seal key; however, remote attestation would be compromised.
Moreover, gaps in the security guarantees mean that it is up to software developers to implement the EPID signing scheme without leaking any information, for instance by writing code that avoids data-dependent memory accesses, which is extremely difficult for complex software systems.
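The EREPORT binding described above, a MAC over the report fields keyed with the Report Key, can be sketched as follows. The field encoding and the MAC algorithm are simplifications (real SGX uses AES-CMAC over a fixed binary layout), and the function names are invented for illustration.

```python
import hmac, hashlib

def ereport(report_key: bytes, mrenclave: bytes, mrsigner: bytes,
            attributes: int, isvprodid: int, isvsvn: int,
            cpusvn: int, user_data: bytes):
    """Toy EREPORT: binds the enclave's identities and a caller-supplied
    message under a MAC keyed with the Report Key."""
    body = (mrenclave + mrsigner
            + attributes.to_bytes(16, "big")
            + isvprodid.to_bytes(2, "big")
            + isvsvn.to_bytes(2, "big")
            + cpusvn.to_bytes(16, "big")
            + user_data)
    tag = hmac.new(report_key, body, hashlib.sha256).digest()
    return body, tag

def verify_report(report_key: bytes, body: bytes, tag: bytes) -> bool:
    """What the target enclave does: re-derive the MAC and compare."""
    return hmac.compare_digest(tag, hmac.new(report_key, body, hashlib.sha256).digest())

key = b"\x01" * 16
body, tag = ereport(key, b"enclave-hash", b"signer-hash", 7, 1, 2, 3, b"nonce")
assert verify_report(key, body, tag)
assert not verify_report(key, body + b"x", tag)  # tampered report fails
```

In the remote flow the Quoting Enclave performs the verification step and then replaces this symmetric MAC with an EPID Attestation Signature that an off-platform challenger can check.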

R.18 UNDERTAKING MALICIOUS PROBES OR SCANS

Vulnerabilities

V17. Possibility that internal (cloud) network probing will occur
V18. Possibility that co-residence checks will be performed

Assessment

This risk is not affected by a technology such as Intel SGX.

R.19 COMPROMISE SERVICE ENGINE

52 Deliverable 2.1 Secure Big Data Processing in Untrusted Clouds

Vulnerabilities

V5. Hypervisor vulnerabilities
V6. Lack of resource isolation

Assessment

The SGX design considers the hypervisor untrusted, as it does other system software such as the kernel and SMM code.

R.20 CONFLICTS BETWEEN CUSTOMER HARDENING PROCEDURES AND CLOUD ENVIRONMENT

Vulnerabilities

V31. Lack of completeness and transparency in terms of use
V23. SLA clauses with conflicting promises to different stakeholders
V34. Unclear roles and responsibilities

Assessment

The SGX design considers the cloud environment untrusted, and the cloud provider's isolation mechanisms must be attested by the customers.

5.2.3 Legal risks

R.21 SUBPOENA AND E-DISCOVERY

Vulnerabilities

V6. Lack of resource isolation
V29. Storage of data in multiple jurisdictions and lack of transparency
V30. Lack of information on jurisdictions

Assessment

Because the data is encrypted and operations on it are performed within an enclave, the provider cannot be forced to grant access to the plaintext data. The data is only exposed while being processed inside the enclave; if an agency (or attacker) tries to obtain it later, it will either be gone or remain encrypted and therefore useless.

R.22 RISK FROM CHANGES OF JURISDICTION


Vulnerabilities

V30. Lack of information on jurisdictions
V29. Storage of data in multiple jurisdictions and lack of transparency

Assessment

Same as R.21.

R.23 DATA PROTECTION RISKS

Vulnerabilities

V30. Lack of information on jurisdictions
V29. Storage of data in multiple jurisdictions and lack of transparency

Assessment

The use of memory encryption and secure communication in Intel SGX mitigates this risk, since data remains encrypted regardless of where it is stored or processed.

R.24 LICENSING RISKS

Vulnerabilities

V31. Lack of completeness and transparency in terms of use

Assessment

As with any technology, we studied whether the terms of use of Intel SGX fit the scope of the project. Based on this analysis, this risk was disregarded.

R.25 NETWORK BREAKS

Vulnerabilities

V38. Misconfiguration
V39. System or OS vulnerabilities
V6. Lack of resource isolation
V41. Lack of, or a poor and untested, business continuity and disaster recovery plan


Assessment

This is a network infrastructure risk and is not tied to Intel SGX.

R.26 NETWORK MANAGEMENT (I.E., NETWORK CONGESTION / MIS-CONNECTION / NON-OPTIMAL USE)

Vulnerabilities

V38. Misconfiguration
V39. System or OS vulnerabilities
V6. Lack of resource isolation
V41. Lack of, or a poor and untested, business continuity and disaster recovery plan

Assessment

This is a network infrastructure risk and is not tied to Intel SGX.

R.27 MODIFYING NETWORK TRAFFIC

Vulnerabilities

V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V8. Communication encryption vulnerabilities
V16. No control on vulnerability assessment process

Assessment

This is a network infrastructure risk and is not tied to Intel SGX.

R.28 PRIVILEGE ESCALATION

Vulnerabilities

V1. AAA vulnerabilities
V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V5. Hypervisor vulnerabilities
V34. Unclear roles and responsibilities


V35. Poor enforcement of role definitions
V36. Need-to-know principle not applied
V38. Misconfiguration

Assessment

Intel SGX addresses this risk by construction. Even privileged code, such as the operating system, will not be able to access code or data residing inside the SGX enclave.

R.29 SOCIAL ENGINEERING ATTACKS (I.E., IMPERSONATION)

Vulnerabilities

V32. Lack of security awareness
V2. User provisioning vulnerabilities
V6. Lack of resource isolation
V8. Communication encryption vulnerabilities
V37. Inadequate physical security procedures

Assessment

Given the nature of Intel SGX, even with physical access to the machine, an attacker will not be able to capture the information inside an enclave.

R.30 LOSS OR COMPROMISE OF OPERATIONAL LOGS

Vulnerabilities

V1. AAA vulnerabilities
V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V19. Lack of forensic readiness
V39. System or OS vulnerabilities
V52. Lack of policy or poor procedures for logs collection and retention


Assessment

Because operations are performed within a secure enclave, no log is saved in a form that would allow sensitive information to be extracted from it.

R.31 LOSS OR COMPROMISE OF SECURITY LOGS (MANIPULATION OF FORENSIC INVESTIGATION)

Vulnerabilities

V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V19. Lack of forensic readiness
V39. System or OS vulnerabilities
V52. Lack of policy or poor procedures for logs collection and retention

Assessment

Because operations are performed within a secure enclave, no log is saved in a form that would allow sensitive information to be extracted from it.

R.32 BACKUPS LOST, STOLEN

Vulnerabilities

V1. AAA vulnerabilities
V2. User provisioning vulnerabilities
V3. User de-provisioning vulnerabilities
V37. Inadequate physical security procedures

Assessment

As with risks R.30 and R.31, information being processed within an enclave is not included in backups in plaintext; any backed-up data therefore remains encrypted.

R.33 UNAUTHORIZED ACCESS TO PREMISES (INCLUDING PHYSICAL ACCESS TO MACHINES AND OTHER FACILITIES)

Vulnerabilities

V37. Inadequate physical security procedures


Assessment

This is one of the main risks Intel SGX is designed to protect against. The risks associated with physical access to the chip are mitigated by the SGX protection mechanisms explained in Chapter 4.

R.34 THEFT OF COMPUTER EQUIPMENT

Vulnerabilities

V37. Inadequate physical security procedures

Assessment

The risks associated with physical access to the chip are mitigated by the SGX protection mechanisms explained in Chapter 4. Even if the machine is stolen, the attacker will not be able to access the data within an enclave, nor generate identical keys to decrypt the existing information.

R.35 NATURAL DISASTERS

Generally speaking, the risk from natural disasters is lower compared to traditional infrastructure because cloud providers offer multiple redundant sites and network paths by default.

Vulnerabilities

V41. Lack of, or a poor and untested, business continuity and disaster recovery plan

Assessment

This risk is not impacted by the use of SGX-enabled processors.

5.3 Practical evaluation

In this section, we describe a practical evaluation of the Intel SGX technology [73]. This validation is intended to serve as a proof of concept of an application handling sensitive data in the cloud and explores three important aspects: (i) attestation; (ii) protected communication with the enclave; and (iii) performance costs (also mentioned in Section 4.4.3).


5.3.1 Intel SGX remote attestation

When an SGX enclave is created, it should not contain any secrets. Secrets are only delivered to an SGX enclave after it has been instantiated on the platform. The process of proving that the enclave has been established in a secure hardware environment is referred to as remote attestation. Remote attestation allows a remote party to gain confidence that the intended software is securely running within an enclave on an Intel SGX enabled platform. The attestation conveys the following information in an assertion:

• MRENCLAVE – a SHA-256 digest of an internal log that records all the activity done while the enclave is built. It contains information about the contents of the pages (code, data, stack, heap) loaded, the relative position of the pages in the enclave, and any security flags associated with the pages.

• Details of unmeasured state (e.g., the mode software is running in).

• Data that the software associates with itself.
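The MRENCLAVE construction can be illustrated with a short sketch: a running SHA-256 digest over an ordered log of the pages added to the enclave. The record layout below is our simplification for illustration, not the exact SGX encoding of the build operations.

```python
import hashlib

def measure_enclave(build_log):
    """Sketch: MRENCLAVE as a SHA-256 digest over an ordered log of
    enclave-build activity (page contents, relative position, flags).
    The record encoding here is ours, not the real SGX one."""
    h = hashlib.sha256()
    for offset, flags, page_bytes in build_log:
        h.update(offset.to_bytes(8, "little"))
        h.update(flags.to_bytes(8, "little"))
        h.update(page_bytes)
    return h.digest()

log = [(0x0000, 0x1, b"\x90" * 4096),   # a code page
       (0x1000, 0x3, b"\x00" * 4096)]   # a data page
m1 = measure_enclave(log)
# Any change to page content, position, or flags changes the measurement.
m2 = measure_enclave([(0x0000, 0x1, b"\x90" * 4096),
                      (0x2000, 0x3, b"\x00" * 4096)])
assert m1 != m2 and len(m1) == 32
```

This captures why the measurement identifies both what was loaded and where: the digest covers content, position, and security flags of every page.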

As we have seen above in R.17, SGX uses an asymmetric attestation key (Intel Enhanced Privacy ID, EPID), representing the SGX TCB, to sign an assertion with the information listed above. EPID is a group signature scheme that allows a platform to sign objects without uniquely identifying the platform or linking different signatures. Instead, each signer belongs to a "group", and verifiers use the group's public key to verify signatures. EPID supports two modes of signatures. In the fully anonymous mode, a verifier cannot associate a given signature with a particular member of the group. In pseudonymous mode, an EPID verifier can determine whether it has verified the platform previously.

Remote attestation also involves another party, namely the Intel Attestation Service (IAS), a web service hosted and operated by Intel in a cloud environment. The primary responsibility of the IAS is to verify the assertion, also known as a QUOTE, generated by SGX. Another participant in the remote attestation process is the Quoting Enclave (QE), a special enclave provided by Intel as part of the platform software. The QE creates the EPID key used for signing platform attestations, which is then certified by the IAS. The EPID key represents not only the platform but also the trustworthiness of the underlying hardware. Only the Quoting Enclave has access to the EPID key when the enclave system is operational, and the EPID key is bound to the version of the processor's firmware. Therefore, a QUOTE can be seen as being issued by the processor itself. With the components described above, the remote attestation process happens as follows:

1. The application residing in the SGX enclave establishes a connection with the party, here denoted by service provider (SP), that will provide the secrets. The SP issues a challenge to the application to demonstrate that it is indeed running the necessary components inside an SGX enclave.

2. The application is provided with the QE’s MRENCLAVE and passes it along with the provider’s challenge to the application’s enclave.

3. The enclave generates a manifest that includes a response to the challenge and an ephemerally generated public key to be used by the challenger for communicating secrets back to the enclave. It then generates a hash digest of the manifest and includes it as User Data for the EREPORT instruction that will generate a REPORT (i.e., a signed structure that contains the MRENCLAVE of the enclave, used for attestation) that binds the manifest to the enclave. The enclave then sends the REPORT to the application.

4. The application forwards the REPORT to the QE for signing.

5. The QE retrieves its Report Key using the EGETKEY instruction and verifies the REPORT generated by the application’s enclave. The QE creates the QUOTE structure and signs it with its EPID key. The QE returns the QUOTE structure to the application.

6. The application sends the QUOTE structure and any associated manifest of supporting data to the SP challenger.

7. The challenger forwards the QUOTE structure to IAS to validate the signature over the QUOTE. It then verifies the integrity of the manifest using USERDATA and checks the manifest for the response to the challenge it sent in step 1.

After this process is completed, the SP can be confident that the application is indeed running in an Intel SGX enclave, now verified by the IAS, and can proceed to send secrets to it.
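The seven steps above can be illustrated with a small simulation. This is only a sketch under loud assumptions: HMAC keys stand in for the hardware-held Report and EPID keys (the real scheme uses EPID group signatures, with the signature check delegated to the IAS), and all function names are ours, not Intel's API.

```python
import hashlib
import hmac
import os
import secrets

# Illustrative stand-ins for hardware-held keys (NOT real SGX keys).
REPORT_KEY = secrets.token_bytes(32)  # shared between enclave and QE via EGETKEY
EPID_KEY = secrets.token_bytes(32)    # QE's attestation key (symmetric stand-in)

def enclave_create_report(challenge: bytes, ephemeral_pubkey: bytes,
                          mrenclave: bytes) -> dict:
    """Step 3: bind a manifest (challenge response + ephemeral key) to the
    enclave by hashing it into the REPORT's User Data."""
    manifest = challenge + ephemeral_pubkey
    user_data = hashlib.sha256(manifest).digest()
    body = mrenclave + user_data
    mac = hmac.new(REPORT_KEY, body, hashlib.sha256).digest()  # EREPORT analogue
    return {"body": body, "mac": mac, "manifest": manifest}

def qe_sign_quote(report: dict) -> dict:
    """Step 5: the QE verifies the REPORT and signs a QUOTE."""
    expected = hmac.new(REPORT_KEY, report["body"], hashlib.sha256).digest()
    assert hmac.compare_digest(report["mac"], expected)
    sig = hmac.new(EPID_KEY, report["body"], hashlib.sha256).digest()
    return {"body": report["body"], "sig": sig}

def sp_verify(quote: dict, manifest: bytes, challenge: bytes,
              expected_mrenclave: bytes) -> bool:
    """Step 7: the SP checks the quote signature (via the IAS in reality),
    the manifest digest, and the challenge response."""
    ok_sig = hmac.compare_digest(
        quote["sig"], hmac.new(EPID_KEY, quote["body"], hashlib.sha256).digest())
    mrenclave, user_data = quote["body"][:32], quote["body"][32:]
    return (ok_sig
            and mrenclave == expected_mrenclave
            and user_data == hashlib.sha256(manifest).digest()
            and manifest.startswith(challenge))

# Usage: the SP issues a challenge (step 1) and verifies the resulting quote.
mrenclave = hashlib.sha256(b"enclave build log").digest()
challenge = os.urandom(16)
report = enclave_create_report(challenge, b"ephemeral-pubkey", mrenclave)
quote = qe_sign_quote(report)
assert sp_verify(quote, report["manifest"], challenge, mrenclave)
```

The essential property shown is the chain of bindings: the manifest is bound to the REPORT through User Data, and the REPORT is bound to the platform through the QE's signature.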

5.3.2 The proof-of-concept application

To evaluate Intel SGX, we developed a proof-of-concept application that makes use of Intel SGX and its remote attestation. The application is in the context of smart metering for smart grids and consists of the following entities: (i) the utility provider, representing the electric utility; (ii) regions, each containing residences whose consumption is monitored remotely; (iii) smart meters, the devices responsible for collecting and sending the electric energy consumption of each residence in a region; and (iv) aggregators, responsible for aggregating the values sent by the smart meters and calculating the consumption per region and instant of time. Additionally, each smart meter in a region must remotely attest an aggregator from a utility provider. It can then periodically send measurements to this aggregator, which in turn sums up all measurements within an enclave (thus ensuring data privacy) at every instant of time t and sends the result to the utility provider. A simplified application flow can be seen in Figure 5.1.


Figure 5.1: Simplified scheme of the proof of concept [73]

In our experiments, two processes run in parallel. The first contains the meters associated with a region; each meter runs on a separate thread. The second represents the aggregator, which, within a secure enclave, calculates the consumption of the region at each instant of time. Both were implemented in C, and there are two forms of communication: (i) via HTTP, using the REST architecture, to perform the remote attestation process between each meter and the aggregator; and (ii) via Apache Kafka1 to carry the encrypted measurements, using the key derived from the remote attestation process. Figure 5.2 depicts the cost imposed by the use of the technology in comparison to the cost of the computations without the security guarantees. These experiments were performed in an OpenStack environment with nova-docker (responsible for Docker container management), confirming the feasibility of this technology in a cloud computing scenario.
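The aggregator's core computation can be sketched as follows. This is an illustrative reimplementation in Python (the actual PoC is written in C); it shows only the per-instant summation that runs inside the enclave, after the measurements have been decrypted with the attestation-derived key.

```python
from collections import defaultdict

def aggregate(measurements):
    """Sum the readings of all meters in a region, grouped by instant t.
    measurements: iterable of (t, meter_id, watt_hours) tuples, as they
    would appear after decryption inside the enclave."""
    totals = defaultdict(int)
    for t, _meter_id, watt_hours in measurements:
        totals[t] += watt_hours
    return dict(totals)

# Two meters reporting every 60 seconds (integer watt-hours).
readings = [(0, "m1", 500), (0, "m2", 700), (60, "m1", 400), (60, "m2", 600)]
assert aggregate(readings) == {0: 1200, 60: 1000}
```

Because only the per-region sum leaves the enclave, individual household readings are never exposed to the utility provider.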

1http://kafka.apache.org


The experiments consisted of two parts. The first part aimed to compare the execution time for the aggregation of daily consumption data for regions with 50, 100, 200, 300, 400, 500, 600, 700 and 750 households, each of them sending measurements every 60 seconds. For every group size, 10 runs were executed. Figure 5.2 shows the run times with a 95% confidence interval. This experiment is important in order to measure the overhead caused by the security measures.

Figure 5.2: SGX imposed overhead

5.4 Conclusion

In this chapter, we evaluated how the technology chosen as the basis of the SecureCloud approach addresses the risks enumerated by reference documents on security and privacy risks for cloud computing. We then confirmed the practical feasibility of implementing SGX-based services in the cloud by developing a proof-of-concept application that collects sensitive data from sensors (smart meters) and aggregates it before making it public.

6 Requirements for secure computation in the cloud

In this chapter, we describe the current application deployment process and then outline an appropriate approach to support SGX-based security guarantees.

6.1 Overview of the deployment process of a typical application

6.1.1 Containers and Docker

Container technology is a solution to provide isolation and portability of applications across different environments. It is a form of virtualization that acts at the operating system level, which means that different containers share the environment stack from the hardware up to the kernel. Resources such as process IDs, network address spaces and file systems are isolated among containers running on the same host machine. Nevertheless, the services provided by the kernel, such as process scheduling and access to hardware resources, are shared among them (even though this sharing is transparent). With these techniques, containers enable users to create images of configured application environments that can be deployed on a server without further configuration, replicated on other servers, and used as a means of isolation in a multi-tenant environment, similarly to what is done with virtual machines.

The concept of containers emerged in the 1990s, but did not see commercial success as a virtualization solution until the last few years, with the arrival of Docker. Docker is an application containerization platform that has grown widely in popularity since its debut in March 2013. It makes use of Linux isolation capabilities, namely kernel namespaces (which provide isolation for network, file system, process trees and similar resources) and cgroups (which provide usage limitation and accounting capabilities for resources such as I/O, memory, CPU and network bandwidth). Docker provides a multilayer filesystem that enables the creation of lightweight images, which can be built step by step and reused as the base for other images. Docker also provides access to an image repository from which images can be pulled. Docker, and containers in general, provide a number of advantages over virtual machine based architectures.
The most significant one is that they are several times lighter. The average container instantiation time is on the order of a few hundred milliseconds once its underlying image has been built; by comparison, virtual machines can take minutes to become ready. This significant difference leads to other advantages. Since container initialization has virtually no overhead, containers can be created and removed at will, which greatly improves scalability. In addition, because isolated applications can be re-instantiated quickly and freely, client-exclusive application architectures, for example, become possible.

Although the deployment possibilities are greatly expanded with containers, a major setback is that security is considerably weaker than with virtual machines. Escaping from a virtual machine into its host is far more challenging than escaping from a container into its host. The shared infrastructure in a virtual-machine-hosted application is the hypervisor, a simple and somewhat monolithic attack surface that offers few vulnerabilities, no matter the level of control the tenant has acquired over its virtual machine instance. Containers, on the other hand, share the operating system kernel, which is complex and has a much larger number of possible exploits.


6.1.2 Basic Applications

In a relatively safe environment, containers are an inexpensive solution to isolate services and applications from complexities and dependency incompatibilities with the host. To use containers in this way, the user only needs to prepare a Dockerfile, a script that uses a set of Docker-specific commands to run installation steps (which, in turn, are simply bash commands). These steps create several layers of container images, each based on the previous one, until the script finishes and the image is fully built and ready to run in a container. Deploying a basic container-based application in a cloud environment consists of the following steps:

1. Create a Dockerfile or an image;

2. In the case of Nova, an image is generated and submitted to Glance, and Nova is used to boot this service;

3. In the case of a standalone Docker deployment, this Dockerfile or image is transferred to the host and instantiated;

4. The final user can then access the container-hosted application.

6.1.3 Orchestrated applications

The compute service for OpenStack, known as Nova, can use a driver to manage containers. The driver is named nova-docker and enables the cloud infrastructure to manage Docker containers. To use Docker containers in OpenStack, the user must export the binary image resulting from building the originating Dockerfile from Docker to nova-docker. Then, Nova is able to instantiate such a container through the same interface it uses for other resources, such as VMs (with the default Nova driver) or bare-metal nodes (with the nova-ironic driver). Because containers are highly disposable and lightweight, load balancing and auto-scaling become very easy to implement and effective with this infrastructure.

It is also well known that containers' most desirable properties (like dependency isolation and disaster recovery) are best exploited if each container runs the minimum possible set of services (ideally, one process per container). In an application that has several different components, it is then natural that each container run only one of these components, and that the containers be linked through an internal network. To support these requirements, the concept of container orchestration was created. The underlying idea is that a full application must have its components isolated and individually or collectively load balanced and auto-scaled as the incoming load varies. To achieve that, container orchestrators, like Kubernetes [8] and Docker Swarm [5], take in a set of Dockerfiles (one for each component), a cluster configuration specification document and other desired parameters, and build and deploy the full application, including its scaling features.

6.2 Secure services for secure applications

In this section, we discuss the requirements for deploying secure applications in an OpenStack cloud.


6.2.1 Basic applications

The first requirement is SGX-awareness in the scheduler. Since we have an application with a single container instance or a set of independently instantiated containers, we need to make sure that OpenStack will schedule the respective instances onto the machines that support SGX. This can be done with adequate configuration of the scheduler, the machine templates (known as flavors) and the physical nodes that have the required hardware. Basically, metadata is associated with the hardware nodes and with the machine (or container) templates, and the scheduler performs the allocation considering only nodes and templates with matching metadata. Once the scheduler picks an adequate machine, nova-docker will instantiate the container on the selected node, which should also have the SGX driver installed.

Note that the developer who creates the container image also needs to compute the MRENCLAVE, a signature of the application container, in a trusted environment (e.g., his own machine). This is necessary for attesting that the deployed service is the one the developer expected. This attestation is a functionality that is not part of the regular orchestration procedure. During the attestation, the user, or an agent acting on the user's behalf, generates elliptic curve public and private keys with OpenSSL. The next step is to call a REST web service in the application to pass in the public key. The web service will reply with the enclave's public key and the GID of the enclave, and call the IAS with the GID to get the signature revocation list. The attestation cannot be done without knowledge of the application and, therefore, applications should have built-in support for it. This basic application deployment does not handle aspects such as fault tolerance, auto-scaling, and migration.
This deployment also requires a client on the end-user side in order to take advantage of the security features.
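The metadata matching performed by the scheduler can be sketched as follows. The host names and the `sgx` capability key are hypothetical; a real Nova deployment would express this with host aggregates and flavor extra_specs rather than the plain dictionaries used here.

```python
# Hypothetical host capabilities and flavor metadata; in Nova this would be
# expressed with host aggregates and flavor extra_specs.
hosts = [
    {"name": "node-1", "capabilities": {"sgx": "true"}},
    {"name": "node-2", "capabilities": {}},
]

def filter_hosts(hosts, flavor_metadata):
    """Keep only the hosts whose capabilities satisfy every key/value
    pair required by the flavor metadata."""
    return [h["name"] for h in hosts
            if all(h["capabilities"].get(k) == v
                   for k, v in flavor_metadata.items())]

# An SGX flavor is only schedulable on the SGX-capable node.
assert filter_hosts(hosts, {"sgx": "true"}) == ["node-1"]
# A flavor without SGX requirements can run on any node.
assert filter_hosts(hosts, {}) == ["node-1", "node-2"]
```

The design choice here is deliberate: the scheduler only needs opaque metadata matching, so SGX-awareness requires no scheduler code changes, only configuration.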

6.2.2 Orchestrated application

As mentioned in Section 6.1.3, container orchestrators can manage complex, multi-container applications deployed on a cluster of machines, handling them in a simplified fashion as a single component. A key feature of orchestration services is the automation they provide for a number of tasks, such as start-up, scheduling and deployment of applications, besides health monitoring, scaling and fail-over functions. In general, the orchestration tool selects an appropriate host for the container being initialized based on constraints specified by the user, and then operates it using the features just described. Examples of known orchestrator services available in the market are Docker Swarm [5] and Kubernetes [8]. The first uses its clustering capabilities to turn a set of Docker hosts into a single, virtual Docker Engine. One of the biggest advantages of this orchestrator is its compatibility with Docker [4], an already very well known container service; thus, any tool that already communicates with a Docker daemon can use Docker Swarm to transparently scale to multiple hosts. Similarly, Kubernetes, an open-source system supported by Google, also provides automated deployment, scaling, and management of containerized applications. A number of other orchestrators exist [9, 1] providing the basic functionality presented here, and in general, when configured correctly, such platforms can handle containerized application workloads in a secure, reliable and scalable way. However, in secure environments, containers' reduced isolation guarantees make the confidentiality and integrity of the data they handle easier to compromise. The main

technology to address this problem, in the context of the SecureCloud project, is Intel SGX. In this direction emerges SCONE [21], a secure container mechanism for Docker that uses Intel SGX to protect container processes within enclaves, making sure all enclave code is verified by SGX when it is created. SCONE offers secure containers on top of an untrusted operating system in a way that is transparent to existing Docker container environments, and relies on the shared host OS kernel for the execution of system calls. Such a host is required to have a Linux SGX driver (the Intel Linux SDK is not necessary) and a SCONE kernel module.

A secure image to be used in a SCONE container must be built differently. It is necessary to first build a SCONE executable of the application, compiling it together with its dependencies and the SCONE library. A SCONE client is used to create the configuration files and metadata necessary to protect the file system (FS), which contain message authentication codes and keys used for encryption. The FS protection file is then encrypted and added to the image. SCONE also supports transparent encryption and authentication of data through shields. The SCONE client is additionally used to launch and communicate with secure containers. Each container requires a start-up configuration file (SCF) to be initialized. This file contains security-relevant data such as encryption keys and a hash of the FS protection file. Such SCF files can only be accessed by an enclave whose identity has been verified. Upon enclave initialization, the SCF is received through a TLS-protected network connection. As an alternative, the SGX remote attestation mechanism can be used to verify the enclave's identity. Thus, in order to add support for remote attestation, there are different approaches to address the issue.
One option is to use existing orchestration services available in the market as an untrusted part of the solution. In this case, these orchestrators would only have access to data that is not security relevant, handling information such as dynamic scaling parameters, the name of the image used to create containers, etc. The security-relevant part (passwords, certificates, encryption keys, as well as the corresponding MRENCLAVE) is handed to a new component, the Configuration and Attestation Service (CAS), which is responsible for storing such a configuration file and returning a configuration ID that is then handed to the untrusted orchestration tool. Next, the orchestrator creates a new container that receives the configuration ID and the address of the CAS to allow communication. An enclave is initialized and requests the security-relevant configuration file from the CAS. This data is only handed to the enclave if the enclave is able to successfully attest itself. This is where remote attestation takes place. Three components are necessary in the process: a Verifier library, used by the data owner, responsible for verifying unknown enclaves; an Attestor library inside the enclave that communicates with the Verifier while attesting the given enclave; and the Local Attestation Service (LAS), which turns local attestation into a remote one by exchanging Quoting Enclave (QE) values with Intel's Attestation Service (IAS). In order to reduce the potentially high latency of attestation requests, we assume that each enclave is only attested once and that parties use certificates to verify themselves afterwards. Approaches for optimizing this process include the idea of a Known Key Quoting Enclave (KKQE). The KKQE would work as a second quoting enclave that signs valid reports in the local attestation process with a previously known key provisioned by the tenant.
Each tenant is therefore identified with a public key that signs the key provisioning message so that it is accepted by the KKQE. Tenants are then able to verify signatures on their own, and communication with Intel's IAS could be omitted.
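The CAS handoff described above can be sketched as follows. The class and method names are hypothetical, and attestation is reduced to comparing MRENCLAVE values; a real CAS would verify an SGX quote before releasing the configuration. The key property illustrated is that the untrusted orchestrator only ever handles the opaque configuration ID, never the secrets.

```python
import secrets

class CAS:
    """Sketch of a Configuration and Attestation Service (names are ours)."""

    def __init__(self):
        self._configs = {}

    def register(self, mrenclave: bytes, secret_config: dict) -> str:
        """The tenant stores the security-relevant data; only the opaque
        configuration ID is later given to the untrusted orchestrator."""
        config_id = secrets.token_hex(8)
        self._configs[config_id] = (mrenclave, secret_config)
        return config_id

    def request_config(self, config_id: str, attested_mrenclave: bytes) -> dict:
        """Release the secrets only to an enclave whose (attested)
        MRENCLAVE matches the one registered by the tenant."""
        expected, config = self._configs[config_id]
        if attested_mrenclave != expected:
            raise PermissionError("enclave failed attestation")
        return config

cas = CAS()
good = b"\x01" * 32
cid = cas.register(good, {"db_password": "s3cret"})
# The orchestrator passes only `cid` to the container it starts.
assert cas.request_config(cid, good) == {"db_password": "s3cret"}
try:
    cas.request_config(cid, b"\x02" * 32)   # wrong enclave identity
    raise AssertionError("should have been rejected")
except PermissionError:
    pass
```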


An alternative for providing secure container-based applications is to move from untrusted orchestrators to trusted ones. Remote attestation would be used to guarantee the trustworthiness of the relevant pieces of the given orchestrator. In this scenario, the attestation process would be handled by the orchestration tool itself, and not by a separate component similar to the CAS. Existing solutions, like Docker Swarm and Kubernetes, do not offer support for such a mechanism, so a new orchestration service or intermediary would have to be created. In this case, SCONE could be used to provide integration with the host's SGX support, and the attestation process would involve only the IAS and the orchestrator. In summary, when deploying a new application in a secure container using this second approach, a modified Dockerfile would be created using a basic SCONE image as the basis of the container. After container start-up, remote attestation would take place to attest the enclave being initialized. At this point, the process would be similar to the one described before when using a CAS component, but involving the orchestration tool instead. We currently intend to evaluate both approaches in order to thoroughly understand the trade-offs.

6.3 Conclusion

In this chapter, we discussed how the process of deploying both simple applications, composed of a single container, and complex applications, composed of a set of containers, should change to enable the security guarantees that can be provided by SGX. We highlighted two basic approaches: one that maximizes the use of existing container orchestration services and another that considers implementing trusted orchestration components. At this point, we consider that further investigation is needed to evaluate the trade-offs between these approaches.

7 Final remarks

In this report, we have described a deeper investigation of Intel SGX, the main technology identified during the elaboration of the Description of Work for the SecureCloud project. We started with an overview of technologies that could be used to provide secure computation and compared them with SGX. Then, taking as a basis the influential European Network and Information Security Agency (ENISA) document on risks and recommendations for information security, we analyzed how Intel SGX could influence these risks. We ended the analysis by discussing a proof-of-concept implementation of a secure, cloud-based application for aggregating sensitive data. We conclude our analysis with confidence that Intel SGX can be the base technology to enable cloud-based applications that provide security and privacy guarantees. Lastly, in Chapter 6 we discussed how Intel SGX can be used in the cloud scenario. We looked at the deployment process of cloud applications without SGX and drafted how it can work with the help of SGX. These workflow and service modifications will guide the implementation of the services managing secure cloud applications in the next deliverables.

Bibliography

[1] Azure container service. https://azure.microsoft.com/en-us/blog/azure-container-service-preview/. [Online; accessed 19-December-2016].

[2] Bare metal service installation guide. http://docs.openstack.org/project-install-guide/baremetal/draft/. [Online; accessed 24-November-2016].

[3] Compute service installation guide. http://docs.openstack.org/newton/install-guide-ubuntu/nova.html. [Online; accessed 24-November-2016].

[4] Docker containers. https://www.docker.com. [Online; accessed 19-December-2016].

[5] Docker swarm. https://www.docker.com/products/docker-swarm. [Online; accessed 19-December-2016].

[6] Hp uefi secure boot. http://support.hp.com/us-en/document/c03653226. [Online; accessed 30-November-2016].

[7] Ilo drivers for ironic. http://docs.openstack.org/developer/ironic/drivers/ilo.html. [Online; accessed 24-November-2016].

[8] Kubernetes. http://kubernetes.io/. [Online; accessed 19-December-2016].

[9] Marathon for mesos. https://mesosphere.github.io/marathon/. [Online; accessed 19-December-2016].

[10] Open attestation and openstack. https://github.com/OpenAttestation/OpenAttestation/wiki/Steps-to-configure-Openstack-to-Work-with-OAT. [Online; accessed 24-November-2016].

[11] Open attestation project. https://01.org/openattestation. [Online; accessed 24-November-2016].

[12] Open cit project. https://01.org/opencit. [Online; accessed 24-November-2016].

[13] Trusted boot for ironic. http://docs.openstack.org/project-install-guide/baremetal/draft/advanced.html#trusted-boot-with-partition-image. [Online; accessed 24-November-2016].

[14] ARM TrustZone. https://www.arm.com/products/security-on-arm/trustzone. [Online; accessed 10-December-2016].

[15] Trusted computing group. http://www.trustedcomputinggroup.org/trusted-boot/. [Online; accessed 30-November-2016].

69 Deliverable 2.1 Secure Big Data Processing in Untrusted Clouds

[16] Trusted computing pools for nova. http://docs.openstack.org/admin-guide/compute-security.html. [Online; accessed 24-November-2016].

[17] Trusted platform module. https://en.wikipedia.org/wiki/Trusted_Platform_Module. [Online; accessed 30-November-2016].

[18] UEFI secure boot. https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface#Secure_boot. [Online; accessed 30-November-2016].

[19] C. S. Alliance. Cloud controls matrix. CSA webpage, 2016. https://cloudsecurityalliance.org/group/cloud-controls-matrix/.

[20] A. Arasu, S. Blanas, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, R. Ramamurthy, P. Upadhyaya, and R. Venkatesan. Secure Database-as-a-service with Cipherbase. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 1033–1036, New York, NY, USA, 2013. ACM.

[21] S. Arnautov, B. Trach, F. Gregor, et al. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, 2016.

[22] S. Bajaj and R. Sion. TrustedDB: A Trusted Hardware-Based Database with Privacy and Data Confidentiality. IEEE Transactions on Knowledge & Data Engineering, 26(3):752–765, 2014.

[23] S. Berger, K. Goldman, D. Pendarakis, D. Safford, E. Valdez, and M. Zohar. Scalable attestation: A step toward secure and trusted clouds. In Proceedings of the 2015 IEEE International Conference on Cloud Engineering, IC2E '15, pages 185–194, Washington, DC, USA, 2015. IEEE Computer Society.

[24] D. Bogdanov, L. Kamm, B. Kubo, R. Rebane, V. Sokk, and R. Talviste. Students and Taxes: a Privacy-Preserving Study Using Secure Computation. Proceedings on Privacy Enhancing Technologies, 2016(3):117–135, 2016.

[25] D. Bogdanov, L. Kamm, S. Laur, and V. Sokk. Rmind: a tool for cryptographically secure statistical analysis. Technical Report 512, 2014.

[26] D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A Framework for Fast Privacy-Preserving Computations. In S. Jajodia and J. Lopez, editors, Computer Security - ESORICS 2008, volume 5283 of Lecture Notes in Computer Science, pages 192–206. Springer Berlin / Heidelberg, 2008.

[27] P. Bogetoft, D. L. Christensen, I. B. Damgård, M. Geisler, T. Jakobsen, M. Krøigaard, J. D. Nielsen, J. B. Nielsen, K. Nielsen, J. Pagter, M. Schwartzbach, and T. Toft. Secure Multiparty Computation Goes Live. In Financial Cryptography 2009, pages 325–343, Berlin, Heidelberg, 2009. Springer-Verlag.

[28] J. Borghoff, A. Canteaut, T. Güneysu, E. B. Kavun, M. Knežević, L. R. Knudsen, G. Leander, V. Nikov, C. Paar, C. Rechberger, P. Rombouts, S. S. Thomsen, and T. Yalçın. PRINCE – A Low-Latency Block Cipher for Pervasive Computing Applications. In X. Wang and K. Sako, editors, Advances in Cryptology – ASIACRYPT 2012, volume 7658 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012.

[29] Z. Brakerski, C. Gentry, and V. Vaikuntanathan. Fully Homomorphic Encryption without Bootstrapping. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, pages 309–325, 2012.

[30] M. Burkhart, M. Strasser, D. Many, and X. Dimitropoulos. SEPIA: Privacy-Preserving Aggregation of Multi-Domain Network Events and Statistics. In 19th USENIX Security Symposium, Aug. 2010.

[31] Intel Corporation. Intel® Software Guard Extensions (Intel® SGX). Intel webpage, 2016. https://software.intel.com/en-us/sgx.

[32] V. Costan and S. Devadas. Intel SGX explained. Cryptology ePrint Archive, Report 2016/086, 2016.

[33] I. B. Damgård, M. Geisler, M. Krøigaard, and J. B. Nielsen. Asynchronous Multiparty Computation: Theory and Implementation. In Proceedings of the 12th International Conference on Practice and Theory in Public Key Cryptography, pages 160–179, Irvine, CA, USA, Sept. 2008. Springer Berlin Heidelberg.

[34] D. Kaplan, J. Powell, and T. Woller. AMD memory encryption white paper. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf, 2013. [Online; accessed 25-November-2016].

[35] T. ElGamal. A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. In Advances in Cryptology, pages 10–18, Santa Barbara, California, USA, Aug. 1984. Springer Berlin Heidelberg.

[36] ENISA. Cloud Computing: Benefits, Risks and Recommendations for Information Security. ENISA, 2012.
[37] D. A. Fernandes, L. F. Soares, J. V. Gomes, M. M. Freire, and P. R. Inácio. Security issues in cloud environments: A survey. Int. J. Inf. Secur., 13:113–170, 2014.

[38] C. Fontaine and F. Galand. A survey of homomorphic encryption for nonspecialists. EURASIP Journal on Information Security, 2007(1):1–10, 2007.

[39] M. Franz, A. Holzer, S. Katzenbeisser, C. Schallhart, and H. Veith. CBMC-GC: An ANSI C Compiler for Secure Two-Party Computations. In Compiler Construction, pages 244–249. Springer, Berlin, Heidelberg, Apr. 2014. DOI: 10.1007/978-3-642-54807-9_15.

[40] D. Genkin, A. Shamir, and E. Tromer. RSA key extraction via low-bandwidth acoustic cryptanalysis. Cryptology ePrint Archive, Report 2013/857, 2013. http://eprint.iacr.org/2013/857.

[41] C. Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, Stanford, CA, USA, 2009.

[42] C. Gentry and S. Halevi. Fully Homomorphic Encryption without Squashing Using Depth-3 Arithmetic Circuits. 2011.

[43] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270–299, Apr. 1984.

[44] K. Gotze, G. Iovino, and J. Li. Secure provisioning of secret keys during integrated circuit manufacturing, Apr. 3 2014. US Patent App. 13/631,512.

[45] K. Gotze, J. Li, and G. Iovino. Fuse attestation to secure the provisioning of secret keys during integrated circuit manufacturing, Nov. 11 2014. US Patent 8,885,819.

[46] J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul, J. A. Calandrino, A. J. Feldman, J. Appelbaum, and E. W. Felten. Lest we remember: cold-boot attacks on encryption keys. Communications of the ACM, 52(5):91–98, 2009.

[47] W. Henecka, S. Kögl, A.-R. Sadeghi, T. Schneider, and I. Wehrenberg. TASTY: tool for automating secure two-party computations. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS '10, pages 451–462, Chicago, IL, USA, 2010. ACM.

[48] M. Hoekstra, R. Lal, P. Pappachan, V. Phegade, and J. Del Cuvillo. Using innovative instructions to create trustworthy software solutions. In HASP@ISCA, page 11, 2013.

[49] Intel. Creating trust in the cloud. http://www.intel.com.tw/content/dam/www/public/us/en/documents/white-papers/creating-trust-in-cloud-ubuntu-intel-white-paper.pdf, 2013. [Online; accessed 25-November-2016].

[50] Intel. Intel trusted execution technology. http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/trusted-execution-technology-security-paper.pdf, 2013. [Online; accessed 30-November-2016].

[51] R. Jagomägis. SecreC: a Privacy-Aware Programming Language with Applications in Data Mining. PhD thesis, University of Tartu, Tartu, May 2010.

[52] S. Johnson, U. Savagaonkar, V. Scarlata, F. McKeen, and C. Rozas. Technique for supporting multiple secure enclaves, June 21 2012. WO Patent App. PCT/US2011/063,140.

[53] J. Li and L. Wang. Noise-free Symmetric Fully Homomorphic Encryption based on noncommutative rings. Technical Report 2015/641, Beijing University of Posts and Telecommunications, Beijing, China, June 2015.

[54] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. Innovative instructions and software model for isolated execution. In HASP@ISCA, page 10, 2013.

[55] F. McKeen, U. Savagaonkar, C. Rozas, M. Goldsmith, H. Herbert, A. Altman, G. Graunke, D. Durham, S. Johnson, M. Kounavis, et al. Method and apparatus to provide secure application execution, May 20 2010. WO Patent App. PCT/US2009/064,493.

[56] D. Naccache and J. Stern. A new cryptosystem based on higher residues. In Proceedings of the 5th ACM Conference on Computer and Communications Security, pages 59–66, 1998.

[57] J. Nechvatal, E. Barker, L. Bassham, W. Burr, and M. Dworkin. Report on the development of the Advanced Encryption Standard (AES). Technical report, 2000.

[58] J. D. Nielsen and M. I. Schwartzbach. A Domain-specific Programming Language for Secure Multiparty Computation. In Proceedings of the 2007 Workshop on Programming Languages and Analysis for Security, PLAS ’07, pages 21–30, New York, NY, USA, 2007. ACM.

[59] NIST. Guidelines on security and privacy in public cloud computing. NIST, 2011.

[60] NIST. Nist cloud computing security reference architecture. NIST, 2013.

[61] K. Nuida. A Simple Framework for Noise-Free Construction of Fully Homomorphic Encryption from a Special Class of Non-Commutative Groups. Technical report, 2014.

[62] P. Paillier. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. Advances in Cryptography - Eurocrypt ’99, 1592:223–238, 1999.

[63] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan. CryptDB: Protecting Confidentiality with Encrypted Query Processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 85–100, New York, NY, USA, 2011. ACM.

[64] M. O. Rabin. How To Exchange Secrets with Oblivious Transfer. Technical Memo TR-81, 1981.

[65] A. Rastogi, M. A. Hammer, and M. Hicks. Wysteria: A Programming Language for Generic, Mixed-Mode Multiparty Computations. In 2014 IEEE Symposium on Security and Privacy, pages 655–670, 2014.

[66] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 1978.

[67] R. L. Rivest, L. Adleman, and M. L. Dertouzos. On data banks and privacy homomorphisms. Foundations of secure computation, 4(11):169–180, 1978.

[68] A. Schropfer, F. Kerschbaum, and G. Muller. L1 - An Intermediate Language for Mixed-Protocol Secure Computation. In Proceedings of the 2011 IEEE 35th Annual Computer Software and Applications Conference, COMPSAC ’11, pages 298–307, Washington, DC, USA, 2011. IEEE Computer Society.

[69] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich. Vc3: trustworthy data analytics in the cloud using sgx. In 2015 IEEE Symposium on Security and Privacy, pages 38–54. IEEE, 2015.

[70] A. Shahverdi, T. Eisenbarth, and B. Sunar. Toward Practical Homomorphic Evaluation of Block Ciphers Using Prince. 2nd Workshop on Applied Homomorphic Cryptography and Encrypted Computing (WAHC 2014), 2014.

[71] S. Halevi and V. Shoup. Algorithms in HElib. In 34th Annual Cryptology Conference, pages 554–571, Santa Barbara, CA, USA, Aug. 2014. Springer Berlin Heidelberg.

[72] V. Shanbhogue, J. Brandt, and J. Wiedemeier. Protecting information processing system secrets from debug attacks, Apr. 26 2016. US Patent 9,323,942.

[73] L. V. Silva, R. Marinho, J. L. Vivas, and A. Brito. Security and privacy preserving data aggregation in cloud computing. In Proceedings of the 32nd Annual ACM Symposium on Applied Computing (to be published), SAC '17, New York, NY, USA, 2017. ACM.

[74] N. P. Smart and F. Vercauteren. Fully homomorphic SIMD operations. Designs, Codes and Cryptography, 71(1):57–81, July 2012.

[75] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. In Proceedings of the 39th international conference on Very Large Data Bases, PVLDB’13, pages 289–300, Trento, Italy, 2013. VLDB Endowment.

[76] M. Varia, S. Yakoubov, and Y. Yang. HETest: A Homomorphic Encryption Testing Framework. Technical Report 416, 2015.

[77] R. Wojtczuk, J. Rutkowska, and A. Tereshkin. Another way to circumvent intel trusted execution technology, 2009.

[78] Y. Xu, W. Cui, and M. Peinado. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, SP ’15, pages 640–656, Washington, DC, USA, 2015. IEEE Computer Society.

[79] A. C. Yao. Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, SFCS ’82, pages 160–164. IEEE Computer Society, 1982.

[80] A. C.-C. Yao. How to generate and exchange secrets. In Foundations of Computer Science, 1986., 27th Annual Symposium on, pages 162–167, 1986.

[81] S. Zahur and D. Evans. Obliv-C: A Language for Extensible Data-Oblivious Computation. Technical Report 1153, 2015.

[82] Y. Zhang, A. Steele, and M. Blanton. PICCO: A General-purpose Compiler for Private Distributed Computation. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS ’13, pages 813–826, New York, NY, USA, 2013. ACM.
