<<

WELES: Policy-driven Runtime Integrity Enforcement of Virtual Machines

Wojciech Ozga Do Le Quoc Christof Fetzer TU Dresden, Germany TU Dresden, Germany TU Dresden, Germany IBM Research Europe

May 3, 2021

Abstract provider, its employees, and the infrastructure protect the tenant’s intellectual property as well as the confidentiality Trust is of paramount concern for tenants to deploy their and the integrity of the tenant’s data. A malicious em- security-sensitive services in the cloud. The integrity of ployee [8], or an adversary who gets into possession of virtual machines (VMs) in which these services are de- employee credentials [74, 50], might leverage administra- ployed needs to be ensured even in the presence of pow- tor privileges to read the confidential data by introspecting erful adversaries with administrative access to the cloud. virtual machine (VM) memory [85], to tamper with com- Traditional approaches for solving this challenge leverage putation by subverting the hypervisor [54], or to redirect trusted computing techniques, e.g., vTPM, or hardware the tenant to an arbitrary VM under her control by alter- CPU extensions, e.g., AMD SEV. But, they are vulnera- ing a network configuration [91]. We tackle the problem ble to powerful adversaries, or they provide only load time of how to establish trust in a VM executed in the cloud. (not runtime) integrity measurements of VMs. Specifically, we focus on the integrity of legacy systems We propose WELES, a protocol allowing tenants to es- executed in a VM. tablish and maintain trust in VM runtime integrity of soft- The existing attestation protocols focus on leveraging ware and its configuration. WELES is transparent to the trusted hardware to report measurements of the execu- VM configuration and setup. It performs an implicit at- tion environment. In trusted computing [34], the trusted testation of VMs during a secure login and binds the VM platform module attestation [11] and integrity measure- integrity state with the secure connection. Our prototype’s ment architecture (IMA) [78] provide a means to en- evaluation shows that WELES is practical and incurs low force and monitor integrity of the that has been performance overhead (≤ 6%). executed since the platform bootstrap [7]. The virtual TPM (vTPM) [19] design extends this concept by intro- arXiv:2104.14862v1 [cs.CR] 30 Apr 2021 1 Introduction ducing a software-based (TPM) that, together with the hardware TPM, provides integrity measurements of the entire software stack — from the Cloud computing paradigm shifts the responsibility of the firmware, the hypervisor, up to the VM. However, this computing resources management from application own- technique cannot be applied to the cloud because an ad- ers to cloud providers, allowing application owners (ten- versary can tamper with the communication between the ants) to focus on their business use cases instead of on vTPM and the VM. For example, by reconfiguring the net- hardware management and administration. However, trust work, she can mount a man-in-the-middle attack to per- is of paramount concern for tenants operating security- form a TPM reset attack [53], compromising the vTPM sensitive systems because software managing computing security guarantees. resources and its configuration and administration remains out of their control. Tenants have to trust that the cloud A complementary technology to trusted computing,

1 trusted execution environment (TEE) [46], uses hard- that provides integrity guarantees to legacy systems exe- ware extensions to exclude the administrator and privi- cuted in the cloud. WELES has noteworthy advantages. leged software, i.e., , hypervisor, from First, it supports legacy systems with zero-code changes the . The software guard ex- by running them inside VMs on the integrity-enforced ex- tensions (SGX) [29] comes with an attestation protocol ecution environment. To do so, it leverages trusted com- that permits remotely verifying application’s integrity and puting to enforce and attest to the hypervisor’s and VM’s the genuineness of the underlying hardware. However, it integrity. Second, WELES limits the system administra- is available only to applications executing inside an SGX tor activities in the host OS using integrity-enforcement enclave. Legacy applications executed inside an enclave mechanisms, while relying on the TEE to protect its own suffer from performance limitations due to a small amount integrity from tampering. Third, it supports tenants con- of protected memory [15]. The SGX adoption in the vir- necting from machines not equipped with trusted hard- tualized environment is further limited because the pro- ware. Specifically, WELES integrates with the secure shell tected memory is shared among all tenants. (SSH) protocol [89]. Login to the VM implicitly performs Alternative technologies isolating VMs from the un- an attestation of the VM. trusted hypervisor, e.g., AMD SEV [52, 51] or IBM Our contributions are as follows: PEF [44], do not have memory limitations. They sup- • We designed a protocol, WELES, attesting to the VM’s port running the entire operating system in isolation from runtime integrity (§4). the hypervisor while incurring minimal performance over- • We implemented the WELES prototype using state-of- head [41]. However, their attestation protocol only pro- the-art technologies commonly used in the cloud (§5). vides information about the VM integrity at the VM ini- • We evaluated it on real-world applications (§6). tialization time. It is not sufficient because the loaded op- erating system might get compromised later–at runtime– with operating system vulnerabilities or misconfigura- 2 Threat model tion [88]. Thus, to verify the runtime (post-initialization) integrity of the guest operating system, one would still We require that the cloud node is built from the software need to rely on the vTPM design. But, as already men- which source code is certified by a trusted third party [3] tioned, it is not enough in the cloud environment. or can be reviewed by tenants, e.g., open-source soft- Importantly, security models of these hardware tech- ware [13] or accessible under non- nologies isolating VM from the hypervisor assume threats disclosure agreement. Specifically, such software is typ- caused by running tenants’ OSes in a shared execution ically considered safe and trusted when (i) it originates environment, i.e., attacks performed by rogue operators, from trusted places like the official git repository; compromised hypervisor, or malicious co-tenants. These (ii) it passes security analysis like fuzzing [90]; (iii) it technologies do not address the fact that a typical tenant’s is implemented using memory safe languages, like Rust; OS is a complex mixture of software and configuration (iv) it has been formally proven, like seL4 [56] or Ever- with a large vector attack. I.e., the protected application Crypt [72]; (v) it was compiled with memory corruption is not, like in the SGX, a single process, but the kernel, mitigations, e.g., position-independent executables with userspace services, and applications, which might be com- stack-smashing protection. promised while running inside the TEE and thus exposes Our goal is to provide tenants with an runtime integrity tenant’s computation and data to threats. These technolo- attestation protocol that ensures that the cloud node (i.e., gies assume it is the tenant’s responsibility to protect the host OS, hypervisor) and the VM (guest OS, tenant’s OS, but they lack primitives to enable runtime integrity legacy application) run only expected software in the ex- verification and enforcement of guest OSes. This work pected configuration. We distinguish between an inter- proposes means to enable such primitives, which are nei- nal and an external adversary, both without capabilities of ther provided by the technologies mentioned above nor by mounting physical and hardware attacks (e.g., [87]). This the existing cloud offerings. is a reasonable assumption since cloud providers control We present WELES, a VM remote attestation protocol and limit physical access to their data centers.

2 An internal adversary, such as a malicious administra- tamper-resistant. They cannot be written directly but tor or an adversary who successfully extracted administra- only extended with a new value using a cryptographic tors credentials [74, 50], aims to tamper with a hypervisor hash function: PCR_extend = hash(PCR_old_value || configuration or with a VM deployment to compromise data_to_extend). the integrity of the tenant’s legacy application. She has The TPM attestation protocol [40] defines how to read remote administrative access to the host machine that al- a report certifying the PCRs values. The report is signed lows her to configure, install, and execute software. The using a cryptographic key derived from the endorsement internal adversary controls the network that will allow her key, which is an asymmetric key embedded in the TPM to insert, alter, and drop network packages. chip at the manufacturing time. The TPM also stores a An external adversary resides outside the cloud. Her certificate, signed by a manufacturer, containing the en- goal is to compromise the security-sensitive application’s dorsement key’s public part. Consequently, it is possible integrity. She can exploit a guest OS misconfiguration or to check that a genuine TPM chip produced the report be- use social engineering to connect to the tenant’s VM re- cause the report’s signature is verifiable using the public motely. Then, she runs dedicated software, e.g., software key read from the certificate. debugger or custom kernel, to modify the legacy applica- Runtime integrity enforcement. The administrator has tion’s behavior. privileged access to the machine with complete control We consider the TPM, the CPU, and their hardware fea- over the network configuration, with permissions to in- tures trusted. We rely on the soundness of cryptographic stall, start, and stop applications. These privileges per- primitives used by software and hardware components. mit him to trick the DRTM attestation process because the We treat software-based side-channel attacks (e.g., [58]) hypervisor’s integrity is measured just once when the hy- as orthogonal to this work because of (i) the counter- pervisor is loaded to the memory. The TPM report certi- measures existence (e.g., [67]) whose presence is verifi- fies this state until the next DRTM launch, i.e., the next able as part of the WELES protocol, (ii) the possibility boot. Hence, after the hypervisor has been mea- of provisioning a dedicated (not shared) machine in the sured, an adversary can compromise it by installing an ar- cloud. bitrary hypervisor [75] or downgrading it to a vulnerable version without being detected. 3 Background and Problem Statement Integrity measurement architecture (IMA) [7, 78, 43] allows for mitigation of the threat mentioned above. Be- ing part of the measured kernel, IMA implements an Load-time integrity enforcement. A cloud node is a integrity-enforcement mechanism [43], allowing for load- computer where multiple tenants run their VMs in par- ing only digitally signed software and configuration. Con- allel on top of the same computing resources. VMs are sequently, signing only software required to manage VMs managed by a hypervisor, a privileged layer of software allows for limiting activities carried out by an adminis- providing access to physical resources and isolating VMs trator on the host machine. A load of a legitimate kernel from each other. Since the VM’s security depends on the with enabled IMA and input-output memory management hypervisor, it is essential to ensure that the correct hyper- unit is ensured by DRTM, and it is attestable via the TPM visor controls the VM. attestation protocol. The trusted computing [34] provides hardware and soft- ware technologies to verify the hypervisor’s integrity. In Integrity auditing. IMA provides information on what particular, the dynamic root of trust for measurements software has been installed or launched since the kernel (DRTM) [81] is a mechanism available in modern CPUs loading, what is the system configuration, and whether that establishes a trusted environment in which a hyper- the system configuration has changed. IMA provides a visor is securely measured and loaded. The hypervi- tamper-proof history of collected measurements in a ded- sor’s integrity measurements are stored inside the hard- icated file called IMA log. The tamper-proof property is ware TPM chip in dedicated memory regions called plat- maintained because IMA extends to the PCR the measure- form configuration registers (PCRs) [11]. PCRs are ments of every executable, configuration file, script, and

3 a) modifies network ! b) hijacks the vTPM! c) proxy to a vTPM! Although the TLS helps protect the communication’s packages routing communication via a legitimate VM integrity, lack of authentication between the vTPM and the T T T hypervisor still enables an adversary to fully control the communication by placing a proxy in front of the vTPM. TPM TPM TPM A proxy driver driver driver In more detail, an adversary can configure the hypervisor

hypervisor hypervisor hypervisor in a way it communicates with vTPM via an intermedi- (virtual machine monitor) (virtual machine monitor) (virtual machine monitor) TPM reset, ary software, which intercepts the communication (Fig- A A replay of arbitrary ure 1b). She can then drop arbitrary measurements or per- ! measurements form the TPM reset attack [53], thus compromising the vTPMvm1 vTPMvm2 vTPM vTPMvm1 vTPMvm2 vTPM security guarantees. host OS host OS host OS

tenant! or! To mitigate the attack, the vTPM must ensure the re- - legitimate! - compromised ! T - A - VM VM (verifier) misconfiguration mote peer’s integrity (is it the correct hypervisor?) and Figure 1: An adversary with root access to the hypervisor can violate the security guarantees promised by the vTPM [19] de- its locality (is the hypervisor running on the same plat- sign. form?). Although the TEE local attestation gives infor- mation about software integrity and locality, we cannot dynamic library before such file or program is read or ex- use it here because the hypervisor cannot run inside the ecuted. Consequently, an adversary who tampers with the TEE. However, suppose we find a way to satisfy the lo- IMA log cannot hide her malicious behavior because she cality condition. In that case, we can leverage integrity cannot modify the PCR value. She cannot overwrite the measurements (IMA) to verify the hypervisor’s integrity PCR and she cannot reset it to the initial value without because among trusted software running on the platform rebooting the platform. there can be only one that connects to the vTPM—the hy- Problems with virtualized TPMs. The TPM chip can- pervisor. To satisfy the locality condition, we make the not be effectively shared with many VMs due to a limited following observation: Only software running on the same amount of PCRs. The vTPM [19] design addresses this platform has direct access to the same hardware TPM. We problem by running multiple software-based TPMs ex- propose to share a secret between the vTPM and the hy- posed to VMs by the hypervisor. This design requires ver- pervisor using the hardware TPM (§4.4). The vTPM then ifying the hypervisor’s integrity before establishing trust authenticates the hypervisor by verifying that the hypervi- with a software-based TPM. We argue that verifying the sor presents the secret in the pre-shared key TLS authen- hypervisor’s integrity alone is not enough because an ad- tication. ministrator can break the software-based TPM security guarantees by mounting a man-in-the-middle (MitM) at- Finally, an adversary who compromises the guest OS tack using the legitimate software, as we describe next. can mount the cuckoo attack [69] to impersonate the le- Consequently, the vTPM cannot be used directly to pro- gitimate VM. In more detail, an adversary can modify the vide runtime integrity of VMs. TPM driver inside a guest OS to redirect the TPM commu- In the vTPM design, the hypervisor prepends a 4-byte nication to a remote TPM (Figure 1c). A verifier running vTPM identifier that allows routing the communication to inside a compromised VM cannot recognize if he commu- the correct vTPM instance. However, the link between nicates with the vTPM attached to his VM or with a re- the vTPM and the VM is unprotected [31], and it is routed mote vTPM attached to another VM. The verifier is help- through an untrusted network. Consequently, an adver- less because he cannot establish a secure channel to the sary can mount a masquerading attack to redirect the VM vTPM that would guarantee communication with the lo- communication to an arbitrary vTPM (Figure 1a) by re- cal vTPM. To mitigate the attack, we propose leveraging placing the vTPM identifier inside the network package. the TEE attestation protocol to establish a secure commu- To mitigate the attack, we propose to use the transport nication channel between the verifier and the vTPM and to layer security (TLS) protocol [9] to protect the commu- use it to exchange a secret allowing the verifier to identify nication’s integrity. the vTPM instance uniquely (§4.5).

4 hybrid cloud node (host OS) C trusted computing components cloud node (host OS) WELES TLS deploys policy integrity measurements policyA A! returns keypublicA virtual machine (VM) trusted execution! TLS environment . tenant A signs with ! C keyprivateA SSH authenticates VM! keyprivateA trusted computing components VM’s SSH! VMA D WELES private key with keypublicA sends integrity! policy ! measurements enforcement B logical tenant separation 1 4 hypervisor 2 TLS deploys policy policyB spawns connects to the VM! deploys " returns key new VM (identify with VM’s public key) the policy TLS publicB tenant B signs with ! SSH authenticates VM! key keyprivateB TLS returns the VM’s ! SSH privateB with keypublicB VMB TLS public key sends integrity! policy ! 3 measurements enforcement cloud provider tenant Figure 2: The high-level overview of WELES. The VM’s SSH Figure 3: Multiple tenants interacting with WELES concur- key is bound to the VM’s integrity state defined in the policy. rently. WELES generates a dedicated SSH key for each de- 4 WELES design ployed security policy and allows using it only if the VM’s in- tegrity conforms to the security policy. Our objective is to provide an architecture that: 1. pro- use the private key only if the platform matches the state tects legacy applications running inside a VM from threats defined inside the policy. Finally, the tenant establishes defined in §2, 2. requires zero-code changes to legacy trust with the VM during the SSH-handshake. He veri- applications and the VM setup, 3. permits tenants to re- fues that the VM can use the private key corresponding motely attest to the platform runtime integrity without to the previously obtained public key (™). The tenant au- possessing any vendor-specific hardware. thenticates himself in a standard way, using his own SSH private key. His SSH public key is embedeed inside the VM’s image or provisioned during the VM deployment. 4.1 High-level Overview

Figure 2 shows an overview of the cloud node running 4.2 Tenant Isolation and Security Policy WELES. It consists of the following four components: (A) the VM, (B) the hypervisor managing the VM, pro- Multiple applications with different security requirements viding it with access to physical resources and isolating might coexist on the same physical machine. WELES al- from other VMs, (C) trusted computing components en- lows ensuring that applications run in isolation from each abling hypervisor’s runtime integrity enforcement and at- other and match their security requirements. Figure 3 testation, (D) WELES, software executed inside TEE that shows how WELES assigns each VM a pair of a public allows tenants to attest and enforce the VMs’ integrity. and private key. The keys are bound with the application’s The configuration, the execution, and the operation of policy and the VM’s integrity. Each tenant uses the public the above components are subject to attestation. First, the key to verify that he interacts with his VM controlled by cloud operator bootstraps the cloud node and starts WE- the integrity-enforced hypervisor. LES. At the tenant’s request, the cloud provider spawns Listing 1 shows an example of a security policy. The a VM (–). Next, the tenant establishes trust with WE- policy is a text file containing a whitelist of the hardware LES (§4.5), which becomes the first trusted component TPM manufacturer’s certificate chain (line 4), DRTM in- on a remote platform. The tenant requests WELES to tegrity measurements of the host kernel (lines 6-9), in- check if the hypervisor conforms to the policy (—), which tegrity measurements of the guest kernel (line 16), and le- contains tenant-specific trust constraints, such as integrity gal runtime integrity measurements of the guest OS (lines measurements (§4.2). WELES uses IMA and TPM to ver- 18-20, 24). The certificate chain is used to establish ify that the platform’s runtime integrity conforms to the trust in the underlying hardware TPM. WELES compares policy and then generates a VM’s public/private key pair. DRTM integrity measurements with PCR values certified The public key is returned to the tenant (˜). WELES pro- by the TPM to ensure the correct hypervisor with enabled tects access to the private key, i.e., it permits the VM to integrity-enforced mechanism was loaded. WELES uses

5 runtime integrity measurements to verify that only ex- Listing 1: Example of the WELES’s security policy pected files and software have been loaded to the guest 1 host: OS memory. A dedicated certificate (line 24) makes the 2 tpm:|- 3 -----BEGIN CERTIFICATE----- integrity-enforcement practical because it permits more 4 # ManufacturerCA certificate files to be loaded to the memory without redeploying the 5 -----END CERTIFICATE----- 6 drtm: # measurements provided by the DRTM policy. Specifically, it is enough to sign software allowed 7 - {id: 17, sha256: f9ad0...cb} to execute with the corresponding private key to let the 8 - {id: 18, sha256: c2c1a...c1} 9 - {id: 19, sha256: a18e7...00} software pass through the integrity-enforcement mecha- 10 certificate:|- 11 -----BEGIN CERTIFICATE----- nism. Additional certificates (lines 12, 27) allow for OS 12 # software update certificate updates [68]. 13 -----END CERTIFICATE----- 14 guest: 15 enforcement: true 16 pcrs: # boot measurements(e.g., secure boot) 4.3 Platform Bootstrap 17 - {id: 0, sha256: a1a1f...dd} 18 measurements: # legal integrity measurement digests The cloud provider is responsible for the proper machine 19 -"e0a11...2a" # SHA digest overa startup script initialization. She must turn on support for hardware tech- 20 -"3a10b...bb" # SHA digest overa library nologies (i.e., TPM, DRTM, TEE), launch the hypervisor, 21 certificate: 22 -|- and start WELES. Tenants detect if the platform was cor- 23 -----BEGIN CERTIFICATE----- 24 # certificate of the signer,e.g.,OS updates rectly initialized when they establish trust in the platform 25 -----END CERTIFICATE----- 26 -----BEGIN CERTIFICATE----- (§4.5). 27 # software update certificate First, WELES establishes a connection with the hard- 28 -----END CERTIFICATE----- ware TPM using the TPM attestation; it reads the TPM The communication between the hypervisor and the certificate and generates an attestation key following the emulated TPM is susceptible to MitM attacks. Unlike activation of credential procedure ([16] p. 109-111). WE- WELES, the hypervisor does not execute inside the TEE, LES ensures it communicates with the local TPM using preventing WELES from using the TEE attestation to ver- [84], but other approaches might be used as well [33, 69]. ify the hypervisor identity. However, WELES confirms Eventually, WELES reads the TPM quote, which certifies the hypervisor identity by requesting it to present a se- the DRTM launch and the measurements of the hypervi- cret when establishing a connection. WELES generates a sor’s integrity. secret inside the TEE and seals it to the hardware TPM via an encrypted channel ([10] §19.6.7). Only software 4.4 VM Launch running on the same OS as WELES can unseal the secret. Thus, it is enough to check if only trusted software ex- The cloud provider requests the hypervisor to spawn a new ecutes on the platform to verify that it is the legitimate VM. The hypervisor allocates the required resources and hypervisor who presents the secret. starts the VM providing it with TPM access. At the end of the process, the cloud provider shares the connection Figure 4 shows the procedure of attaching an emulated details with the tenant, allowing the tenant to connect to TPM to a VM. Before the hypervisor spawns a VM, it the VM. commands WELES to emulate a new software-based TPM WELES emulates multiple TPMs inside the TEE be- (–). WELES creates a new emulated TPM, generates a cause many VMs cannot share a single hardware TPM secret, and seals the secret with the hardware TPM (—). [19]. When requested by the hypervisor, WELES spawns WELES returns the TCP port and the sealed secret to the a new TPM instance accessible on a unique TCP port. The hypervisor. The hypervisor unseals the secret from the hypervisor connects to the emulated TPM and exposes it hardware TPM (˜) and establishes a TLS connection to to the VM as a standard character device. We further use the emulated TPM authenticating itself with the secret the term emulated TPM to describe a TEE-based TPM (™). At this point, the hypervisor spawns a VM. The running inside the hypervisor and distinguish it from the VM boots up, the firmware and IMA send integrity mea- software-based TPM proposed by the vTPM design. surements to the emulated TPM (š). To protect against

6 hypervisor trusted execution environment trusted execution environment cloud node (host OS) 6 challenge WELES generates ! 3 WELES 1 register VM a key pair signs with the 7 private key VM emulated TPM verifies ! VM’s SSH 2 private key the policy 8 SSH server generate! signature 2 secret 4 VM’s SSH! challenge! 5 hardware seal secret (SSH handshake) 3 unseal secret TPM public key 9 HTTPs 4 establish TLS connection, auth with secret SSH client SSH deploys policy 1 signature" VM HTTPs tenant (SSH handshake) 5 integrity measurement Figure 5: The high-level view of the attestation protocol. WE- increase MC monotonic store " LES generates a SSH public/private key pair inside the TEE. counter (MC) 6 MC The tenant receives the public key as a result of the policy de- ployment. To mitigate the MitM attacks, the tenant challenges Figure 4: Attachment of an emulated TPM to a VM. WELES the VM to prove it has access to the private key. WELES signs emulates TPMs inside the TEE. Each emulated TPM is accessi- the challenge on behalf of the VM if and only if the platform ble via a TLS connection. To prevent the MitM attack, WELES integrity conforms with the policy. authenticates the connecting hypervisor by sharing with him a secret via a hardware TPM. To mitigate the rollback attack, the LES’s integrity and the underlying hardware genuineness. emulated TPM increments the monotonic counter value on each The tenant establishes trust with WELES during the TLS- non-idempotent command. handshake, verifying that the presented certificate was is- the rollback attack, each integrity measurement causes the sued to WELES by the CA. emulated TPM to increment the hardware-based mono- Although the tenant remotely ensures that WELES is tonic counter (MC) and store the current MC value inside trusted, he has no guarantees that he connects to his VM the emulated TPM memory (›). To prevent the attach- controlled by WELES because the adversary can spoof the ment of multiple VMs to the same emulated TPM, WE- network [91] to redirect the tenant’s connection to an ar- LES permits a single client connection and does not per- bitrary VM. To mitigate the threat, WELES generates a mit reconnections. An attack in which an adversary redi- secret and shares it with the tenant and the VM. When the rects the hypervisor to a fake emulated TPM exporting a tenant establishes a connection, he uses the secret to au- false secret is detected when establishing trust with the thenticate the VM. Only the VM which integrity conforms VM (§4.5). to the policy has access to this secret. Figure 5 shows a high-level view of the protocol. First, 4.5 Establish Trust the tenant establishes a TLS connection with WELES to deploy the policy (–). WELES verifies the platform in- The tenant establishes trust with the VM in three steps. tegrity against the policy (—), and once succeeded, it gen- First, he verifies that WELES executes inside the TEE erates the SSH key pair (˜). The public key is returned and runs on genuine hardware (a CPU providing the TEE to the tenant (™) while the private key remains inside the functionality). He then extends trust to the hypervisor and TEE. WELES enforces that only a guest OS which runtime VM by leveraging WELES to verify and enforce the host integrity conforms to the policy can use the private key for and guest OSes’ runtime integrity. Finally, he connects to signing. Second, the tenant initializes an SSH connection the VM, ensuring it is the VM provisioned and controlled to the VM, expecting the VM to prove the possession of by WELES. the SSH private key. The SSH client requests the SSH Since the WELES design does not restrict tenants to server running inside the VM to sign a challenge (š). The possess any vendor-specific hardware and the existing SSH server delegates the signing operation to WELES (›). TEE attestation protocols are not standardized, we pro- WELES signs the challenge using the private key (œ) if pose to add an extra level of indirection. Following the and only if the hypervisor’s and VM’s integrity match the existing solutions [39], we rely on a trusted certificate policy. The SSH private key never leaves WELES; only authority (CA) that performs the TEE-specific attestation a signature is returned to the SSH server (). The SSH before signing an X.509 certificate confirming the WE- client verifies the signature using the SSH public key ob-

7 tained by the tenant from WELES (ž). The SSH server Intel SGX enclaves WELES: ! also authenticates the tenant, who proves his identity us- monotonic counter service ing his own private SSH key. The SSH server is config- ured to trust his SSH public key. The tenant established SSH" WELES: ! Linux emulated TPM trust in the remote platform as soon as the SSH handshake VM serverVMVM IMA QEMU QEM QEM succeeds. WELES: monitoring service Alpine Linux kernel-based virtual integrity measurement TPM 4.6 Policy Enforcement machine (KVM) architecture (Linux IMA) driver Linux kernel WELES policy enforcement mechanism guarantees that Intel trusted ! Intel software guard the VM runtime integrity conforms to the policy. At execution technology (TXT) extensions (SGX) TPM 2.0 the host OS, WELES relies on the IMA integrity- hardware enforcement [43] to prevent the host kernel from loading Figure 6: The overview of the WELES prototype implementa- to the memory files that are not digitally signed. Specif- tion. ically, each file in the filesystem has a digital signature memory corruption mitigation techniques. Those tech- stored inside its extended attribute. IMA verifies the sig- niques help mitigate the consequences of, for example, nature issued by the cloud provider before the kernel loads buffer overflow attacks that might lead to privilege escala- the file to the memory. The certificate required for signa- tion or arbitrary code execution. To restrict the host from ture verification is loaded from initramfs (measured by the accessing guest memory and state, we follow existing DRTM) to the kernel’s keyring. At the guest OS, IMA in- security-oriented commercial solutions [28] that disable side the guest kernel requires the WELES approval to load certain hypervisor features, such as hypervisor-initiated a file to the memory. The emulated TPM, controlled by memory dump, huge memory pages on the host, memory WELES, returns a failure when IMA tries to extend it with swapping, memory ballooning through a virtio-balloon measurement not conforming to the policy. The failure device, and crash dumps. For production implementa- instructs IMA not to load the file. tions, we propose to rely on microkernels like formally proved seL4 [56]. We rely on SGX as the TEE technology. The SGX re- 5 Implementation mote attestation [48] allows us to verify if the application executes inside an enclave on a genuine Intel CPU. Other 5.1 Technology Stack TEEs might be supported (§7). We implemented WELES We decided to base the prototype implementation on the in Rust [63], which preserves memory-safety and type- Linux kernel because it is an open-source project support- safety. To run WELES inside an SGX enclave, we use ing a wide range of hardware and software technologies. the SCONE framework [15] and its Rust cross-compiler. It is commonly used in the cloud and, as such, can demon- We also exploit the Intel trusted execution technology strate the practicality of the proposed design. QEMU [17] (TXT) [38] as a DRTM technology because it is widely and kernel-based virtual machine (KVM) [55] permit to available on Intel CPUs. We use the open-source soft- use it as a hypervisor. We rely on Linux IMA [78] as an ware tboot [12] as a pre-kernel bootloader that establishes integrity enforcement and auditing mechanism built-in the the DRTM with TXT to provide the measured boot of the Linux kernel. Linux kernel. We chose Alpine Linux because it is designed for se- curity and simplicity in contrast to other Linux distribu- 5.2 Prototype Architecture tions. It consists of a minimum amount of software re- quired to provide a fully-functional OS that permits keep- The WELES prototype architecture consists of three com- ing a trusted computing base (TCB) low. All userspace ponents executing in SGX enclaves: the monitoring ser- binaries are compiled as position-independent executables vice, the emulated TPM, and the monotonic counter ser- with stack-smashing protection and relocation read-only vice.

8 The monitoring service is the component that leverages The emulated TPM establishes trust with the MCS via Linux IMA and the hardware TPM to collect integrity the TLS-based SGX attestation (§5.4) and maintains the measurements of the host OS. There is only one monitor- TLS connection open until the emulated TPM is shut- ing service running per host OS. It is available on a well- down. We implemented the emulated TPM to increase known port on which it exposes a TLS-protected REST the MC before executing any non-idempotent TPM com- API used by tenants to deploy the policy. We based this mand, e.g., extending PCRs, generating keys, writing to part of the implementation on [84] that provides a mech- non-volatile memory. The MC value and the TLS cre- anism to detect the TPM locality. The monitoring ser- dentials are persisted in the emulated TPM state, which vice spawns emulated TPMs and intermediates in the se- is protected by the SGX during runtime and at rest via cret exchange between QEMU and the emulated TPM. sealing. When the emulated TPM starts, it reads the MC Specifically, it generates and seals to the hardware TPM value from the MCS and then checks the emulated TPM the secret required to authenticate the QEMU process, and state freshness by verifying that its MC value equals the passes this secret to an emulated TPM. value read from the MCS. The emulated TPM is a software-based TPM based on the libtpms library [18]. It exposes a TLS-based 5.4 TLS-based SGX Attestation API allowing QEMU to connect. The connection is au- thenticated using the secret generated inside an SGX en- We use the SCONE key management system (CAS) [5] clave and known only to processes that gained access to to perform remote attestation of WELES components, ver- the hardware TPM. We extracted the emulated TPM into ify SGX quotes using Intel attestation service (IAS) [14], a separate component because of the libtpms implementa- generate TLS credentials, and distribute the credentials tion, which requires running each emulator in a separate and the CAS CA certificate to each component during process. initialization. WELES components are configured to es- The monotonic counter service (MCS) provides access tablish mutual authentication over TLS, where both peers to a hardware monotonic counter (MC). Emulated TPMs present a certificate, signed by the same CAS CA, con- use it to protect against rollback attacks. We designed the taining an enclave integrity measurement. Tenants do not MCS as a separate module because we anticipate that due perform the SGX remote attestation to verify the monitor- to hardware MC limitations (i.e., high latency, the lim- ing service identity and integrity. Instead, they verify the ited number of memory overwrites [82]), a distributed ver- certificate exposed by a remote peer during the policy de- sion, e.g., ROTE [62], might be required. Notably, the ployment when establishing a TLS connection to the mon- MCS might also be deployed locally using SGX MC [22] itoring service. The production implementation might use accessible on the same platform where the monitoring ser- Intel SGX-RA [57] to achieve similar functionality with- vice and emulated TPM run. out relying on an external key management system.

5.3 Monotonic Counter Service 5.5 VM Integrity Enforcement We implemented a monotonic counter service (MCS) as The current Linux IMA implementation extends the in- a service executed inside the SGX enclave. It leverages tegrity digest of the IMA log entry to all active TPM PCR TPM 2.0 high-endurance indices [10] to provide the MC banks. For example, when there are two active PCR banks functionality. The MCS relies on the TPM attestation (e.g., SHA-1 and SHA-256), both are extended with the to establish trust with the TPM chip offering hardware same value. We did a minor modification of the Linux MC, and on the encrypted and authenticated communi- kernel. It permitted us to share with the emulated TPM cation channel ([10] §19.6.7) to protect the integrity and not only the integrity digest but also the file’s measure- confidentiality of the communication with the TPM chip ment and the file’s signature. We modified the content of from the enclave. The MCS exposes a REST API over a the PCR_Extend command sent by the Linux IMA in a TLS (§5.4), allowing other enclaves to increment and read way it uses the SHA-1 bank to transfer the integrity di- hardware monotonic counters remotely. gest, the SHA-256 bank to transfer the file’s measurement

9 digest, and the SHA-512 bank to transfer the file’s signa- tualization (SR-IOV) [23] technologies are turned on in ture. In the emulated TPM, we intercept the PCR_extend the UEFI/BIOS configuration. The hyper-threading is command to extract the file’s measurement and the file’s switched off. The enclave page cache (EPC) is configured digest. We use obtained information to enforce the pol- to reserve 128 MiB of random-access memory (RAM). icy; if the file is not permitted to be executed, the emu- On host and guest OSes, we run Alpine 3.10 with Linux lated TPM process closes the TLS connection causing the kernel 4.19. We modified the guest OS kernel according QEMU process to shut down the VM. to the description in §5.5. We adjusted quick emulator (QEMU) 3.1.0 to support TLS-based communication with 5.6 SSH Integration the emulated TPM as described in §4.4. CPUs are on the microcode patch level (0x5e). To enable a secure connection to the VM, we relied on the OpenSSH server. It supports the PKCS#11 [66] standard, 6.1 Micro-benchmarks which defines how to communicate with cryptographic devices, like TPM, to perform cryptographic operations Are TPM monotonic counters practical to handle using the key without retrieving it from the TPM. the rollback protection mechanism? Strackx and We configured an OpenSSH server running inside the Piessens [82] reported that the TPM 1.2 memory gets cor- guest OS to use an SSH key stored inside the emulated rupted after a maximum of 1.450M writes and has a lim- TPM running on the host OS. The VM’s SSH private key ited increment rate (one increment per 5 sec). We run an is generated and stored inside the SGX enclave, and it experiment to confirm or undermine the hypothesis that never leaves it. The SSH server, via PCRS#11, uses it those limitations apply to the TPM 2.0 chip. We were for the TLS connection only when WELES authorizes ac- continuously incrementing the monotonic counter in the cess to it. The tenant uses his own SSH private key, which dTPM and the iTPM chips. The dTPM chip reached 85M is not managed by WELES. increments, and it did not throttle its speed. The iTPM chip slowed down after 7.3M increments limiting the in- 6 Evaluation crement latency to 5 sec. We did not observe any problem with the TPM memory. We present the evaluation of WELES in three parts. In the What is the cost of the rollback protection mechanism? first part, we measure the performance of internal WELES Each non-idempotent TPM operation causes the emulated components and related technologies. In the second part, TPM to communicate with the MCS and might directly we check if WELES is practical to protect popular appli- influence the WELES performance. We measured the la- cations, i.e., nginx, memcached. Finally, we estimate the tency of the TPM-based MCS read and increment opera- TCB of the prototype used to run the experiments men- tions. In this experiment, the MCS and the test client ex- tioned above. ecute inside an SGX enclave. Before the experiment, the test client running on the same machine establishes a TLS Testbed. Experiments execute on a rack-based cluster connection with the MCS. The connection is maintained of PowerEdge R330 servers equipped with an Intel during the entire experiment to keep the communication Xeon E3-1270 v5 CPU, 64 GiB of RAM, Infineon 9665 overhead minimal. The evaluation consists of sending 5k TPM 2.0 discrete TPM chips (dTPMs). Experiments that requests and measuring the mean latency of the MCS re- use an integrated TPM (iTPM) run on Intel NUC7i7BNH sponse. machine, which has the Intel platform trusted technol- ogy (PTT) [25] running on Intel ME 11.8.50.3425 pow- Table 1: The latency of main operations in the TPM-based ered by Intel Core i7-7567U CPU and 8 GiB of RAM. MCS. σ states for standard deviation. All machines have a 10 Gb Ethernet network interface Read Increase card (NIC) connected to a 20 Gb/s switched network. The SGX, TXT, TPM 2.0, Intel virtualization technology for discrete TPM 42 ms (σ = 2 ms) 40 ms (σ = 2 ms) directed I/O (VT-d) [24], and single root input/output vir- integrated TPM 25 ms (σ = 2 ms) 32 ms (σ = 1 ms)

10 200 (6236 ms) swTPM 150 WELES Host OS, no IMA Host OS, with IMA Guest OS, with IMA WELES MC 100 dTPM 1st file open 134ms iTPM 75 50

Latency (ms) Latency 50 0 25 baseline: Create Create Quote PCR PCR ms Latency < 40µs Primary Read Extend 0 2nd+ file open Figure 7: TPM operations latency depending on the TPM: s the swTPM, the emulated TPM without rollback protection µ 40 (WELES TPM), the emulated TPM with rollback protection 20

(WELES TPM with MC), the discrete TPM chip (dTPM), and Latency 0 the integrated TPM (iTPM). 100 1k 10k 100k 1M 10M Table 1 shows that the MCS using iTPM performs from File size [bytes] 1.25× to 1.68× faster than its version using dTPM. The Figure 8: File opening times with and without Linux IMA. read operation on the iTPM is faster than the increment impacts the opening time of files depending on their size. operation (25 ms versus 32 ms, respectively). Differently, Figure 8 shows that the IMA inside the guest OS incurs on dTPM both operations take a similar amount of time higher overhead than the IMA inside the host OS. It is pri- (about 40 ms). marily caused by i) the higher latency of the TPM extend What is the cost of running the TPM emulator in- command (∼ 43 ms) that is dominated by a slow network- side TEE and with the rollback protection mecha- based monotonic counter, ii) the IMA mechanism itself nism? Is it slower than a hardware TPM used by that has to calculate the cryptographic hash over the entire the host OS? As a reference point to evaluate the em- file even if only a small part of the file is actually read, ulated TPM’s performance, we measured the latency of and iii) the less efficient data storage used by the VM (vir- various TPM commands executed in different implemen- tualized storage, QCOW format). In both systems, the tations of TPMs. The TPM quotes were generated with IMA takes less than 70 ms when loading files smaller than the elliptic curve digital signature algorithm (ECDSA) us- 1 MB (99% of files in the deployed prototype are smaller ing the P-256 curve and SHA-256 bit key. PCRs were than 1 MB). Importantly, IMA measures the file only once extended using the SHA-256 algorithm. Figure 7 shows unless it changes. Figure 8 shows that the next file reads that except for the PCR extend operation, the SGX-based take less than 40 µs regardless of the file size. TPM with rollback protection is from 1.2× to 69× faster than hardware TPMs and up to 6× slower than the un- 6.2 Macro-benchmarks protected software-based swTPM. Except for the create primary command, which derives a new key from the We run macro-benchmarks to measure performance TPM seed, we did not observe performance degradation degradation when protecting popular applications, i.e., when running the TPM emulator inside an enclave. How- the nginx web-server [4] and the memcached cache sys- ever, when running with the rollback protection, the TPM tem [2], with WELES. We compare the performance of slows down the processing of non-idempotent commands four variants for each application running on the host op- (e.g., PCR_Extend) due to the additional time required to erating system (OS) (native), inside a SCONE-protected increase the MC. Docker container on the host OS (SCONE), inside a guest OS (VM), inside a WELES-protected guest OS with roll- How much IMA impacts file opening times? Before back protection turned on (WELES). Notably, WELES the kernel executes the software, it verifies if executable, operates under a weaker threat model than SCONE. We related configuration files, and required dynamic libraries compare them to show the tradeoff between the security can be loaded to the memory. The IMA calculates a cryp- and performance. tographic hash over each file (its entire content) and sends the hash to the TPM. We measure how much this process How much does WELES influence the throughput of

11 before the throughput started to degrade (latency <2 ms). 2 Except for the reference measurement (native) run on the

1.5 Native bare metal, memcached run in a VM with access to all SCONE available cores and 4 GB of memory. Figure 10 shows 1 VM WELES WELES influence on the throughput-latency ratio of mem- Latency [s] Latency 0.5 cached. We observed low performance degradation when 0 0 10 20 30 40 50 60 70 80 running memcached in a VM. WELES achieved 0.98× of Throughput (k.req/s) the native throughput. It is a result of IMA implemen- Figure 9: Throughput/latency of the nginx tation, which measures the integrity of the memcached executables and configuration files during memcached a web server, e.g., nginx? We configured nginx to run launch, but it does not measure data directly written to the a single worker thread with turned off gzip compression memory during runtime. WELES throughput was 1.23× and logging, according to available SCONE’s benchmark higher than the memcached run inside SCONE. settings. Then, we used wrk2 [86], running on a differ- ent physical host, to simulate 16 clients (4 per physical How the measured boot increases the VM boot time? core) concurrently fetching a pre-generated 10 KiB binary Table 2 shows how WELES impacts VM boot times. As uncompressed file for 45 s. 1 We were increasing the fre- a reference, we measure the boot time of a VM without quency of the fetching until the response times started to any TPM attached. Then, we run experiments in which a degrade. Except for the reference measurement (native) VM has access to different implementations of a software- run on the bare metal, nginx run inside a VM with access based TPMs. Except for the reference measurement, the to all available cores and 4 GB of memory. Linux IMA is always turned on. Each VM has access Figure 9 shows that WELES achieved 0.94× of the to all available cores and 4 GB of memory. As the guest native throughput, reaching 70k requests. The SCONE OS, we run Ubuntu 18.10, a with a pre- variant reached about 31k requests, which is 0.45× of installed tool (systemd-analyze) to calculate system boot the WELES throughput. We observed low-performance times. The measured boot increases the VM load time. degradation incurred by the virtualization (less than 2%). It is caused by the IMA module that measures files re- The WELES overhead is caused mostly by the IMA. quired to initialize the OS. We did not observe any differ- Does WELES influence the throughput of systems that ence in boot time between the setup with the swTPM [6] extensively use in-memory storage, i.e., memcached? and the WELES emulated TPM (WELES no MC). How- We used memtier [59] to generate load by sending ever, when running the emulated TPM with the rollback GET/SET requests (1:1 ratio) of 500 bytes of random data protection (WELES with MC), the VM boot time is 5.2× to a memcached instance running on a different physical and 3.6× higher when compared to the reference and host. We calculated the memcached performance by com- the swTPM setting, respectively. Alternative implementa- puting the mean throughput achieved by the experiment Table 2: The VM boot time duration when using different TPM 1The limited network bandwidth dictated the file size. For larger implementations: no TPM, swTPM [6], emulated TPM inside size we saturated the NIC bandwidth. an SGX enclave without rollback protection (WELES TPM), Native SCONE VM WELES and emulated TPM inside an SGX enclave with rollback protec- tion (WELES TPM with MC). σ states for standard deviation. 3 MC TPM IMA Boot time 2 no TPM    9.7 s (σ = 0.1 s) 1 swTPM    14.0 s (σ = 0.2 s) Latency [ms] Latency WELES 0 220 240 260 280 300 320 340 no MC    14.1 s (σ = 0.3 s) Throughput (k.req/s) with MC    50.8 s (σ = 0.4 s) Figure 10: Throughput/latency of memcached fast MC    15.8 s (estimate)

12 2 VMs 3 VMs 4 VMs and QEMU with support for gnutls, TPM and KVM. 4 4 4 3 3 3 To estimate the TCB of nginx and memcached executed 2 2 2 with WELES, we added their source code sizes (including 1 1 1 musl libc) to the source code size of the software stack

Latency [ms] Latency 0 0 0 150 200 250 150 200 250 150 200 250 required to run it (92.49 MB). Table 3 shows that the eval- Throughput (k.req/s) native WELES uated setup increases the TCB 13× and 32× for nginx and Figure 11: WELES scalability memcached, respectively. When compared to SCONE, WELES prototype increases the TCB from 8× to 12×. SCONE offers not only stronger security guarantees (con- Table 3: The TCB (source code size in MB) of nginx and mem- fidentiality of the tenant’s code and data) but also requires cached when running in SCONE and WELES. All variants in- clude the musl libc library. to trust a lower amount of software code. However, this comes at the cost of performance degradation (§6.2) and +musl libc +SCONE +WELES the necessity of reimplementing or at least recompiling a Nginx 7.3 12.3 (1.6×) 97.69 (13.4×) legacy application to link with the SCONE runtime. Memcached 2.9 7.9 (2.7×) 93.29 (32.2×) tions of MCS, such as ROTE [62], offer much faster MC 7 Discussion increments (1–2 ms) than the presented TPM-based pro- totype. We estimated that using WELES with a fast MC 7.1 Alternative TEEs would slow down VM boot time only by 1.13×. The WELES design (§4) requires a TEE that offers a re- Does WELES incur performance degradation when mote attestation protocol and provides confidentiality and multiple VMs are spawned? We run memcached con- integrity guarantees of WELES components executing in currently in several VMs with and without WELES to ex- the host OS. Therefore, the SGX used to build the WE- amine WELES scalability. We then calculate the perfor- LES prototype (§5) might be replaced with other TEEs. In mance degradation between the variant with and without particular, WELES implementation might leverage Sanc- WELES. We do not compare the performance degradation tum [30], Keystone [60], Flicker [65], or L4Re [73] as between different number of VMs, because it depends on an SGX replacement. WELES might also leverage ARM the limited amount of shared network bandwidth. Each TrustZone [?] by running WELES components in the se- VM run with one physical core and 1 GB of RAM. Fig- cure world and exploiting the TPM attestation to prove its ure 11 shows that when multiple VMs are concurrently integrity. running on the host OS, WELES achieves 0.96×–0.97× of the native throughput. 7.2 Hardware-enforced VM Isolation

6.3 Trusted Computing Base Hardware CPU extensions, such as AMD secure en- crypted virtualization (SEV) [45], Intel multi-key total As part of this evaluation, we estimate the TCB of soft- memory encryption (MKTME) [26], Intel trust domain ware components by measuring the source code size (us- extensions (TDX) [27], are largely complementary to the ing source lines of code (SLOC) as a metric) used to build WELES design. They might enrich WELES design by the software stack. providing the confidentiality of the code and data against How much does the WELES increase the TCB? We rogue operators with physical access to the machine, com- estimate the software stack TCB by calculating size of promised hypervisor, or malicious co-tenants. They also the source code metrics using CLOC [32], lib.rs [1], and consider untrusted hypervisor excluding it from the WE- Linux stat utility. For the Linux kernel and QEMU, we LES TCB. On the other hand, WELES complements these calculated code that is compiled with our custom configu- technologies by offering means to verify and enforce the ration, i.e., Linux kernel with support for IMA and KVM, runtime integrity of guest OSes. Functionality easily

13 available for bare-metal machines (via a hardware TPM) 8 Related work but not for virtual machines. VM attestation is a long-standing research objective. The existing approaches vary from VMs monitoring systems 7.3 Trusted Computing Base focusing on system behavior verification [42, 76, 70], in- trusion detection systems [49, 77], or verifying the in- The prototype builds on top of software commonly used tegrity of the executing software [35, 20]. WELES fo- in the cloud, which has a large TCB because it supports cuses on the VM runtime integrity attestation. Following different processor architectures and hardware. WELES Terra [35] architecture, WELES leverages VMs to provide might be combined with other TEE and hardware exten- isolated execution environments constrained with differ- sions, resulting in a lower TCB and stronger security guar- ent security requirements defined in a policy. Like Scal- antees. Specifically, WELES could be implemented on able Attestation [20], WELES uses software-based TPM to top of a microkernel architecture, such as formally ver- collect VM integrity measurements. WELES extends the ified seL4 [56, 71], that provides stronger isolation be- software-based TPM functionality by enforcing the policy tween processes and a much lower code base (less than and binding the attestation result with the VM connection, 10k SLOC [56]), when compared to the Linux kernel. as proposed by IVP [80]. Unlike the idea of linking the Comparing to the prototype, QEMU might be replaced remote attestation quote to the TLS certificate [37], WE- with Firecracker [13], a virtual machine monitor writ- LES relies on the TEE to restrict access to the private key ten in a type-safe programming language that consists of based on the attestation result. Following TrustVisor [64], 46k SLOC (0.16× of QEMU source code size) and is used WELES exposes trusted computing methods to legacy ap- in production by Amazon AWS cloud. The TCB of the plications by providing them with dedicated TPM func- prototype implementation might be reduced by removing tionalities emulated inside the hypervisor. Unlike oth- superfluous code and dependencies. For example, most of ers, WELES addresses the TPM cuckoo attack at the VM the TPM emulator functionalities could be removed fol- level by combining integrity enforcement with key man- lowing the approach of µTPM [64]. WELES API could agement and with the TEE-based remote attestation. Al- be built on top of the socket layer, allowing removal of ternative approaches to TPM virtualization exist [83, 36]. HTTP dependencies that constitute 41% of the prototype However, the cuckoo attack remains the main problem. implementation code. WELES enhances the vTPM design [19] mostly because of the simplicity; no need for hardware [83] or the TPM 7.4 Integrity Measurements Management specification [36] changes. Hardware solutions, such as SEV [45], IBM PEF [44], The policy composed of digests is sensitive to software TDX [27], emerged to isolate VMs from the untrusted hy- updates because newer software versions result in differ- pervisor and the cloud administrator. However, they lack ent measurement digests. Consequently, any software up- the VM runtime integrity attestation, a key feature pro- date of an integrity-enforced system would require a pol- vided by WELES.WELES is complementary to them. icy update, which is impractical. Instead, WELES sup- Combining these technologies allows for better isolation ports dedicated update mirrors serving updates containing of VM from the hypervisor and the administrator and for digitally signed integrity measurements [68, 21]. Other runtime integrity guarantees during the VM’s runtime. measurements defined in the policy can be obtained from the national software reference library [3] or directly from the IMA-log read from a machine executed in a trusted 9 Conclusion environment, e.g., development environment running on tenant premises. The amount of runtime IMA measure- This paper presented WELES, the virtual machine (VM) ments can be further reduced by taking into account pro- attestation protocol allowing for verification that security- cesses interaction to exclude some mutable files from the sensitive applications execute in the VM composed and measurement [47, 79]. controlled by expected software in expected configura-

14 tion. WELES provides transparent support for legacy ap- https://www.techspot.com/news/ plications, requires no changes in the VM configuration, 40280-google-fired-employees-for-breaching-user-privacy. and permits tenants to remotely attest to the platform html, accessed on March 2020. runtime integrity without possessing any vendor-specific hardware by binding the VM state to the SSH connec- [9] The Transport Layer Security Protocol Version https://tools.ietf.org/html/rfc5246 tion. WELES complements hardware-based trusted execu- 1.2. , tion environment (TEE), such as AMD SEV, IBM PEF, In- accessed on March 2020. tel TDX, by providing runtime integrity attestation of the [10] TPM Library Part 1: Architecture, Fam- guest OS. Finally, WELES incurs low performance over- ily "2.0", Level 00, Revision 01.38. ≤ head ( 6%) but has much larger trusted computing base http://www.trustedcomputinggroup.org/ (TCB) compared to pure TEE approaches, such as Intel resources/tpm_library_specification, software guard extensions (SGX). accessed on March 2020.

References [11] TPM Library Specification, Family "2.0", Level 00, Revision 01.38. http://www. trustedcomputinggroup.org/resources/ [1] Lib.rs: Find quality Rust libraries. https://lib. tpm_library_specification, accessed on March rs, accessed on March 2020. 2020. [2] memcached. https://memcached.org, accessed on March 2020. [12] Trusted Boot (tboot). https://sourceforge.net/ projects/tboot/, accessed on March 2020. [3] National Software Reference Library [13]A MAZON WEB SERVICES,I. O. I. A. Firecracker: (NSRL). https://www.nist.gov/ secure and fast microVMs for serverless comput- itl/ssd/software-quality-group/ ing. http://firecracker-microvm.github.io, national-software-reference-library-nsrl/ accessed on March 2020. about-nsrl/nsrl-introduction, accessed on March 2020. [14]A NATI,I.,GUERON,S.,JOHNSON,S., AND SCARLATA, V. Innovative Technology for CPU [4] NGINX. https://www.nginx.com, accessed on March 2020. Based Attestation and Sealing. In Proceedings of the 2nd International Workshop on Hardware and Ar- [5] SCONE Configuration and Attestation service. chitectural Support for Security and Privacy (2013), https://sconedocs.github.io/CASOverview/, vol. 13 of HASP ’13, ACM. accessed on March 2020. [15]A RNAUTOV,S.,TRACH,B.,GREGOR, F., [6] SWTPM - Software TPM Emulator. https:// KNAUTH, T., MARTIN,A.,PRIEBE,C.,LIND, github.com/stefanberger/swtpm, accessed on J.,MUTHUKUMARAN,D.,O’KEEFFE,D.,STILL- March 2020. WELL,M.,GOLTZSCHE,D.,EYERS,D., KAPITZA,R.,PIETZUCH, P., AND FETZER,C. [7] TCG Infrastructure Working Group Architec- SCONE: Secure Linux Containers with Intel SGX. ture Part II - Integrity Management, Speci- In Proceedings of OSDI (2016). fication Version 1.0, Revision 1.0. https: //trustedcomputinggroup.org/wp-content/ [16]A RTHUR, W., AND CHALLENER,D. A Practical uploads/IWG_ArchitecturePartII_v1.0.pdf, Guide to TPM 2.0: Using the Trusted Platform Mod- accessed on March 2020. ule in the New Age of Security. Apress, 2015.

[8] TechSpot News. Google fired em- [17]B ELLARD, F. QEMU, a Fast and Portable Dynamic ployees for breaching user privacy. Translator. In Proceedings of the Annual Conference

15 on USENIX Annual Technical Conference (2005), [29]C OSTAN, V., AND DEVADAS, S. Intel SGX Ex- ATEC ’05. plained. IACR Cryptology ePrint Archive (2016).

[18]B ERGER, S. Libtpms: software emulation of a [30]C OSTAN, V., LEBEDEV,I., AND DEVADAS,S. Trusted Platform Module. https://github.com/ Sanctum: Minimal Hardware Extensions for Strong stefanberger/libtpms, accessed on March 2020. Software Isolation. In Proceedings of the 25th USENIX Conference on Security Symposium (2016), [19]B ERGER,S.,CACERES,R.,GOLDMAN,K.A., SEC’16. PEREZ,R.,SAILER,R., AND VAN DOORN,L. vTPM: virtualizing the trusted platform module. In [31]C UCURULL,J., AND GUASCH, S. Virtual TPM Proc. 15th Conf. on USENIX Security Symposium for a secure cloud: fallacy or reality? RECSI 2014 (2006). (2014).

[20]B ERGER,S.,GOLDMAN,K.,PENDARAKIS,D., [32]D ANIAL, A. cloc: Count Lines of Code. https:// SAFFORD,D.,VALDEZ,E., AND ZOHAR, M. Scal- github.com/AlDanial/cloc, accessed on March able Attestation: A Step Toward Secure and Trusted 2020. Clouds. In 2015 IEEE International Conference on Cloud Engineering (2015), IEEE. [33]F INK,R.A.,SHERMAN, A. T., MITCHELL,A.O., AND CHALLENER, D. C. Catching the Cuckoo: [21]B ERGER,S.,KAYAALP,M.,PENDARAKIS,D., Verifying TPM Proximity Using a Quote Timing AND ZOHAR, M. File Signatures Needed! Linux Plumbers Conference (2016). Side-Channel. In Trust and Trustworthy Computing. 2011. [22]C EN,S., AND ZHANG, B. Trusted Time and Mono- tonic Counters with Intel Software Guard Extensions [34]G ALLERY,E., AND MITCHELL, C. J. Trusted Com- Platform Services. Intel white paper, 2017. puting: Security and Applications. Cryptologia 33, 3 (2009), 217–245. [23]C ORPORATION, I. An Introduction to SR-IOV Technology, revision 2.5. In PCI-SIG SR-IOV [35]G ARFINKEL, T., PFAFF,B.,CHOW,J.,ROSEN- Primer (2011). BLUM,M., AND BONEH, D. Terra: A Virtual Machine-based Platform for Trusted Computing. In [24]C ORPORATION, I. Intel Virtualization Technology Proceedings of the Nineteenth ACM Symposium on for Directed I/O. In Architecture Specification - Rev Operating Systems Principles (2003), SOSP ’03. 1.3 (2011). [36]G OLDMAN,K., AND BERGER, S. TPM Main Part [25]C ORPORATION, I. Strengthening Security with Intel 3 – IBM Commands. https://researcher. Platform Trust Technology. Intel white paper, 2014. watson..com/researcher/files/ us-kgoldman/mainP3IBMCommandsrev36.pdf [26]C ORPORATION, I. Memory Encryption Technolo- , gies Specification Rev: 1.2. Intel Architecture, 2019. accessed on March 2020.

[27]C ORPORATION, I. Intel Trust Domain Extensions. [37]G OLDMAN,K.,PEREZ,R., AND SAILER, R. Link- Intel white paper, 2020. ing Remote Attestation to Secure Tunnel Endpoints. In Proceedings of the First ACM Workshop on Scal- [28] CORPORATION, I. Introducing IBM Secure Execu- able Trusted Computing (2006), STC ’06. tion for Linux. https://www.ibm.com/support/ knowledgecenter/linuxonibm/com.ibm. [38]G REENE, J. Intel Trusted Execution Technology linux.z.lxse/lxse_t_secureexecution.html, Hardware based Technology for Enhancing Server accessed on March 2020. Platform Security. Intel white paper, 2012.

16 [39]G REGOR, F., OZGA, W., VAUCHER,S.,PIRES, [47]J AEGER, T., SAILER,R., AND SHANKAR,U. R.,LE QUOC,D.,ARNAUTOV,S.,MARTIN,A., PRIMA: Policy-Reduced Integrity Measurement Ar- SCHIAVONI, V., FELBER, P., AND FETZER,C. chitecture. In Proceedings of the Eleventh ACM Trust Management as a Service: Enabling Trusted Symposium on Access Control Models and Technolo- Execution in the Face of Byzantine Stakeholders. gies (2006), SACMAT ’06. In Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks [48]J OHNSON,S.,SCARLATA, V., ROZAS,C.,BRICK- (DSN 2020) (2020). ELL,E., AND MCKEEN, F. Intel Software Guard Extensions: EPID Provisioning and Attestation Ser- [40]G ROUP, T. C. TCG Trusted Attestation Protocol vices. Intel white paper, 2016. (TAP) Information Model for TPM Families 1.2 and 2.0 and DICE Family 1.0. Version 1.0, Revi- [49]J OSHI,A.,KING, S. T., DUNLAP, G. W., AND sion 0.36. https://trustedcomputinggroup. CHEN, P. M. Detecting Past and Present Intrusions org/resource/tcg-tap-information-model/, Through Vulnerability-specific Predicates. In Pro- accessed on March 2020. ceedings of the Twentieth ACM Symposium on Op- erating Systems Principles (2005), SOSP ’05. [41]G ÖTTEL,C.,PIRES,R.,ROCHA,I.,VAUCHER, S.,FELBER, P., PASIN,M., AND SCHIAVONI, [50]J OYCE, R. Disrupting Nation State Hackers. V. Security, Performance and Energy Trade-offs of USENIX Enigma’16 (2016). Hardware-assisted Mechanisms. 2018 IEEE 37th Symposium on Reliable Distributed [51]K APLAN, D. Protecting VM register State with Systems (SRDS) (2018). SEV-ES. Amd white paper, 2017.

[42]H ALDAR, V., CHANDRA,D., AND FRANZ, M. Se- [52]K APLAN,D.,POWELL,J., AND WOLLER, T. mantic Remote Attestation – A Virtual Machine di- AMD Memory Encryption. Amd white paper, 2016. rected approach to Trusted Computing. 3rd Vir- tual Machine Research and Technology Symposium, [53]K AUER, B. OSLO: Improving the Security of USENIX, 2004. Trusted Computing. In Proceedings of 16th USENIX Security Symposium on USENIX Security Sympo- [43]H ALLYN,S.,KASATKIN,D.,SAFFORD,D., sium (2007), SS’07. SAILER,R., AND ZOHAR, M. Linux Integrity Measurement Architecture (IMA). https: [54]K ING,S., AND CHEN, P. SubVirt: implementing //.net/p/linux-ima/wiki/Home/, malware with virtual machines. In 2006 IEEE Sym- accessed on March 2020. posium on Security and Privacy (S&P’06) (2006). [44]H UNT,G.D.,PAI,R.,LE, M. V., JAMJOOM, [55]K IVITY,A.,KAMAY, Y., AND LAOR, D. kvm: the H.,BHATTIPROLU,S.,BOIVIE,R.,DUFOUR,L., Linux Virtual Machine Monitor. In Proceedings of FREY,B.,KAPUR,M.,GOLDMAN,K.A., ETAL. the Linux Symposium, Volume One (2007). Confidential computing for OpenPOWER. In Pro- ceedings of the Sixteenth European Conference on [56]K LEIN,G.,NORRISH,M.,SEWELL, T., TUCH, Computer Systems (2021), pp. 294–310. H.,WINWOOD,S.,ELPHINSTONE,K.,HEISER, [45]I NC., A. M. D. AMD Secure Encrypted Virtualiza- G.,ANDRONICK,J.,COCK,D.,DERRIN, P., tion API Version 0.22. Technical Preview (2019). ELKADUWE,D.,ENGELHARDT,K., AND KOLAN- SKI, R. seL4: formal verification of an OS kernel. [46]I NC., G. The trusted execution environment: Deliv- In Proceedings of the ACM SIGOPS 22nd sympo- ering enhanced security at a lower cost to the mobile sium on Operating systems principles - SOSP ’09 market. Globalplatform white paper, 2011. (Big Sky, Montana, USA, 2009).

17 [57]K NAUTH, T., STEINER,M.,CHAKRABARTI,S., os/pkcs11-base-v2.40-os.html, accessed on LEI,L.,XING,C., AND VIJ, M. Integrating Re- March 2020. mote Attestation with Transport Layer Security. In- tel white paper, 2018. [67]O LEKSENKO,O.,TRACH,B.,KRAHN,R.,MAR- TIN,A.,FETZER,C., AND SILBERSTEIN,M. [58]K OCHER, P., HORN,J.,FOGH,A.,,GENKIN,D., Varys: Protecting SGX enclaves from practical side- GRUSS,D.,HAAS, W., HAMBURG,M.,LIPP,M., channel attacks. In USENIX ATC (2018). MANGARD,S.,PRESCHER, T., SCHWARZ,M., AND YAROM, Y. Spectre Attacks: Exploiting Spec- [68]O ZGA, W., LE QUOC,D., AND FETZER,C. ulative Execution. In 40th IEEE Symposium on Se- A practical approach for updating an integrity- curity and Privacy (S&P’19) (2019). enforced operating system. 21st International Mid- dleware Conference (Middleware ’20), December 7- [59]L ABS, R. NoSQL Redis and Memcache traf- 11, 2020, Delft, Netherlands. ACM, New York, NY, fic generation and benchmarking tool. https: USA (2020). //github.com/RedisLabs/memtier_benchmark, accessed on March 2020. [69]P ARNO, B. Bootstrapping Trust in a "Trusted" Plat- Proceedings of the 3rd Conference on Hot [60]L EE,D.,KOHLBRENNER,D.,SHINDE,S., form. In Topics in Security ASANOVIC,K., AND SONG, D. Keystone: An (2008). Open Framework for Architecting Trusted Execu- [70]P ORITZ,J.,SCHUNTER,M.,HERREWEGHEN, tion Environments. In Proceedings of the Fifteenth E. V., AND WAIDNER, M. Property attestation — European Conference on Computer Systems (2020), Scalable and privacy-friendly security assessment of EuroSys ’20. peer . In IBM Research Report, Computer [61]L IMITED, A. Building a Secure System using Trust- Science RZ3548 (2004). Zone Technology. Arm white paper, accessed on March 2020. [71]P OTTS,D.D.,BOURQUIN,R.,ANDRESEN,L., KLEIN,D.G., AND HEISER, G. Mathematically [62]M ATETIC,S.,SOMMER,D.,AHMED,M.,GER- Verified Software Kernels: Raising the Bar for High VAIS,A.,KOSTIAINEN,K.,DHAR,A.,JUELS,A., Assurance Implementations. AND CAPKUN, S. ROTE: Rollback Protection for Trusted Execution. 26th USENIX Security Sympo- [72]P ROTZENKO,J.,PARNO,B.,FROMHERZ,A., sium (USENIX Security ’17) (2017). HAWBLITZEL,C.,POLUBELOVA,M.,BHARGA- VAN,K.,BEURDOUCHE,B.,CHOI,J.,DELIGNAT- ATSAKIS AND LOCK [63]M ,N.D., K , II, F. S. The Rust LAVAUD,A.,FOURNET,C., ETAL. EverCrypt: A Language. In Proceedings of HILT (2014). fast, verified, cross-platform cryptographic provider. In 2020 IEEE Symposium on Security and Privacy [64]M CCUNE,J.M.,LI, Y., QU,N.,ZHOU,Z., (SP) (2020), IEEE, pp. 983–1002. DATTA,A.,GLIGOR, V., AND PERRIG, A. TrustVi- sor: Efficient TCB Reduction and Attestation. In [73]R EITZ,M. Isolating Program Execution on 2010 IEEE Symposium on Security and Privacy L4Re/Fiasco.OC. PhD thesis, TU Dresden, 2019. (Oakland, CA, USA, 2010).

[65]M CCUNE,J.M.,PARNO,B.J.,PERRIG,A.,RE- [74]R ESPONSE, I. X.-F. I., AND (IRIS), I. S. X-Force ITER,M.K., AND ISOZAKI, H. Flicker: an execu- Threat Intelligence Index. In IBM Security report tion infrastructure for tcb minimization. ACM Press. (2020).

[66] OASIS. PKCS#11 specification. http://docs. [75]R UTKOWSKA, J. Introducing Blue Pill. SyScan’06 oasis-open.org/pkcs11/pkcs11-base/v2.40/ (2006).

18 [76]S ADEGHI,A.-R., AND STÜBLE, C. Property- [85]S UNEJA,S.,ISCI,C., DE LARA,E., AND BALA, Based Attestation for Computing Platforms: Caring V. Exploring VM Introspection: Techniques and about Properties, Not Mechanisms. In Proceedings Trade-offs. In Proceedings of the 11th ACM SIG- of the 2004 Workshop on New Security Paradigms PLAN/SIGOPS International Conference on Vir- (2004). tual Execution Environments - VEE ’15 (Istanbul, Turkey, 2015). [77]S AILER,R.,VALDEZ,E.,JAEGER, T., PEREZ,R., DOORN, L. V., GRIFFIN,J.L., AND BERGER,S. [86]T ENE,G., ETAL. wrk2 HTTP benchmarking sHype: Secure Hypervisor Approach to Trusted Vir- tool. https://github.com/giltene/wrk2, ac- tualized Systems. In IBM research report RC23511 cessed on March 2020. (2005). [87]W INTER,J., AND DIETRICH, K. A hijacker’s guide to communication interfaces of the trusted platform [78]S AILER,R.,ZHANG,X.,JAEGER, T., AND WAT- module. Computers & Mathematics with Applica- SON, I. T. J. Design and Implementation of a TCG- tions (2013). based Integrity Measurement Architecture. In In Proceedings of the 13th USENIX Security Sympo- [88]Y IN,Z.,MA,X.,ZHENG,J.,ZHOU, Y., sium. USENIX Association (2004). BAIRAVASUNDARAM,L.N., AND PASUPATHY, S. An Empirical Study on Configuration Errors in [79]S ASSU, R. Infoflow LSM. In Linux Security Sum- Commercial and Open Source Systems. In Proceed- mit’19 (2019). ings of the Twenty-Third ACM Symposium on Oper- ating Systems Principles (2011), SOSP ’11. [80]S CHIFFMAN,J.,VIJAYAKUMAR,H., AND JAEGER, T. Verifying System Integrity by Proxy. In [89]Y LONEN, T., AND LONVICK, C. The Secure Shell Trust and Trustworthy Computing. Springer Berlin (SSH) Transport Layer Protocol. RFC 4254 (Pro- Heidelberg, 2012. posed Standard). Updated by RFC 6668. https: //tools.ietf.org/html/rfc6668, 2006. [81]S HIN,J.,JACOBS,B.,SCOTT-NASH,M.,HAM- MERSLEY,J.,WISEMAN,M.,SPIGER,R., [90]Z ELLER,A.,GOPINATH,R.,BOEHME,M., WILKINS,D.,FINDEISEN,R.,CHALLENER,D., FRASER,G., AND HOLLER, C. Generating Soft- DESSELLE,D.,GOODMAN,S.,SIMPSON,G., ware Tests, 2019. BRANNOCK,K.,NELSON,A.,PIWONKA,M., [91]Z HENG,O.,POON,J., AND BEZNOSOV,K. DAILEY,C., AND SPRINGFIELD, R. TCG D- Application-based TCP hijacking. In Proceedings of RTM Architecture, Document Version 1.0.0. Trusted the Second European Workshop on System Security Computing Group (2013). - EUROSEC ’09 (2009), ACM Press. [82]S TRACKX,R., AND PIESSENS, F. Ariadne: A Min- imal Approach to State Continuity. 25th USENIX Security Symposium (USENIX Security ’16) (2016).

[83]S TUMPF, F., AND ECKERT, C. Enhancing Trusted Platform Modules with Hardware-Based Virtualiza- tion Techniques. In 2008 Second International Con- ference on Emerging Security Information, Systems and Technologies (2008), IEEE.

[84] SUBMISSION.DETAILSOMITTEDFORDOUBLE- BLINDREVIEWING., U. Anonymous.

19