Rkt-Io: a Direct I/O Stack for Shielded Execution Jörg Thalheim1,2, Harshavardhan Unnibhavi1,2, Christian Priebe3, Pramod Bhatotia1,2, Peter Pietzuch3

rkt-io: A Direct I/O Stack for Shielded Execution Jörg Thalheim1,2, Harshavardhan Unnibhavi1,2, Christian Priebe3, Pramod Bhatotia1,2, Peter Pietzuch3 1Technical University of Munich, Germany 2The University of Edinburgh, UK 3Imperial College London, UK Abstract and fault tolerance. At the same time, it increases the risk The shielding of applications using trusted execution of security violations when applications run in untrusted environments (TEEs) can provide strong security guarantees third-party cloud environments. Attackers (or even malicious in untrusted cloud environments. When executing I/O cloud administrators) can compromise the security of operations, today’s shielded execution frameworks, however, applications [72]. In fact, many studies show that software exhibit performance and security limitations: they assign bugs, configuration errors, and security vulnerabilities pose resources to the I/O path inefficiently, perform redundant serious threats to cloud systems, and software security is data copies, use untrusted host I/O stacks with security cited as a barrier to the adoption of cloud solutions [71]. risks and performance overheads. This prevents TEEs from Hardware-assisted trusted execution environments (TEEs), running modern I/O-intensive applications that require such as Intel SGX [35], ARM Trustzone [4], RISC-V Key- high-performance networking and storage. stone [49, 69], and AMD-SEV [3], offer an appealing way We describe rkt-io (pronounced “rocket I/O”), a direct user- to make cloud services more resilient against security space network and storage I/O stack specifically designed for attacks. TEEs provide a secure memory region that protects TEEs that combines high-performance, POSIX compatibility application code and data from other privileged layers in the and security. rkt-io achieves high I/O performance by system stack, including the OS kernel/hypervisor. Based on employing direct userspace I/O libraries (DPDK and SPDK) this promise, TEEs are now commercially offered by major inside the TEE for kernel-bypass I/O. For efficiency, rkt-io cloud computer providers, including Azure [54], Google [27], polls for I/O events directly, by interacting with the hardware and Alibaba [17]. instead of relying on interrupts, and it avoids data copies TEEs, however, introduce new challenges to meet the per- by mapping DMA regions in the untrusted host memory. formance requirements of modern I/O-intensive applications To maintain full Linux ABI compatibility, the userspace I/O that rely on high-performance networking hardware (e.g., libraries are integrated with userspace versions of the Linux >20 Gbps NICs) and storage (e.g., SSDs). Since TEEs are pri- VFS and network stacks inside the TEE. Since it omits the host marily designed to protect in-memory state, they only offer OS from the I/O path, does not suffer from host interface/Iago relatively expensive I/O support to interact with the untrusted attacks. Our evaluation with Intel SGX TEEs shows that rkt-io host environment [20]. Early designs relied on expensive syn- is 9× faster for networking and 7× faster for storage compared chronous world switches between the trusted and untrusted to host- (Scone) and LibOS-based (SGX-LKL) I/O approaches. domains for I/O calls, where a thread executing an I/O oper- ation must exit the TEE before issuing a host I/O system call. CCS Concepts: • Security and privacy ! Trusted com- Thisapproachincursprohibitiveoverheadsduetothesecurity puting; • Software and its engineering ! Operating sanitization of the CPU state including registers, TLBs, etc. systems. To overcome this limitation, more recent designs used by shielded execution frameworks (e.g., Scone [5], Eleos [62], 1 Introduction and SGX-LKL [66]) employ a switchless I/O model in which Cloud computing offers economies of scale for computational dedicated host I/O threads process I/O calls from TEE threads resources combined with the ease of management, elasticity, using shared memory queues. To avoid blocking TEE threads when waiting for I/O results, these frameworks employ Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies user-level threading libraries inside the TEE to execute I/O are not made or distributed for profit or commercial advantage and that calls asynchronously [76]. copies bear this notice and the full citation on the first page. Copyrights While such switchless asynchronous designs improve for components of this work owned by others than the author(s) must I/O performance over the strawman synchronous world be honored. Abstracting with credit is permitted. To copy otherwise, or switching design, current frameworks still exhibit significant republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. performance and security limitations: (1) they manage EuroSys ’21, April 26–28, 2021, Online, United Kingdom resources inefficiently by requiring dedicated I/O threads © 2021 Copyright held by the owner/author(s). Publication rights licensed outside the TEE, which incurs extra CPU cycles when busy to ACM. polling syscalls queues. These additional I/O threads also ACM ISBN 978-1-4503-8334-9/21/04...$15.00 require fine-grained performance tuning to determine their https://doi.org/10.1145/3447786.3456255 490 EuroSys ’21, April 26–28, 2021, Online, United Kingdom Thalheim et al. 100 read 800 14 write 80 12 600 10 60 8 400 Time [μs] 40 6 Throughput [Gbps] Throughput [MiB/s] 4 20 200 2 0 0 0 native sync async direct native sgx-lkl scone rkt-io native sgx-lkl scone rkt-io (a) System call latency with sendto() (b) Storage stack performance with fio (c) Network stack performance with iPerf Figure 1. Micro-benchmarks to showcase the performance of syscalls, storage and network stacks across different systems optimal number based on the application threads and I/O abstractions provided by a Linux-based LibOS (LKL [59]) in- workload; (2) they perform additional data copies between side the TEEs. This combination results in a high-performance the trusted and untrusted domains, and the indirection via I/O path, while preserving compatibility with off-the-shelf, shared memory queues significantly increases I/O latency; well-tested Linux filesystems and network protocol imple- (3) the untrusted host interface on the I/O path has security mentations inside the TEE. Since the I/O stack runs in the pro- and performance issues: the host interface is fundamentally tected domain of the TEE, rkt-io provides improved security, insecure [15, 81], and requires context switches, which are as it does not rely on information from the untrusted host OS. expensive for high-performance network and storage devices; The design of rkt-io embodies four principles to address and (4) they lack a universal and transparent mechanism to the aforementioned limitations of current frameworks: encrypt data on the I/O path. Instead, they rely on application- • rkt-io adopts a host-independent I/O interface to improve level encryption, which is potentially not comprehensive, performance and security. This interface leverages a and incompatible with full VM encryption models. direct I/O mechanism in the context of TEEs, where it To overcome these limitations, we argue for a fundamen- bypasses thehost OS when accessing external hardware tally different design point where we re-design the I/O stack devices. At the same time, it leverages a Linux-based based on direct userspace I/O in the context of TEEs. To exem- LibOS (LKL [59]) to provide full Linux compatibility. plify our design choice, we compare the direct I/O approach • rkt-io favors a polling-based approach for I/O event within TEEs with three alternative I/O approaches, measur- handling since TEEs do not provide an efficient way ing the performance of the sendto() syscall with 32-byte to receive interrupts on I/O events. UDP packets over a 40GbE link for (i) native (not secured), • rkt-io proposes a sensible I/O stack partitioning strategy (ii) synchronous and (iii) asynchronous syscalls within TEEs to efficiently utilize resources and eliminate spurious (secured). As Figure 1a shows, native system calls (16.4 µs) and data copies. It partitions the I/O stack by directly the direct I/O based approach (17.9 µs) take approximately the mapping the (encrypted) hardware DMA regions into same time, while we see higher per-packet processing time for untrusted memory outside the TEE, and runs the I/O the synchronous (91.7 µs) and asynchronous (96.7 µs) system stack within the TEE. calls. By bypassing the host I/O support, TEE I/O stacks can • rkt-io provides universal and transparent encryption in avoid performance overheads (and security limitations). the I/O stack to ensure the confidentiality and integrity Our design for a TEE I/O stack therefore has the following of data entering and leaving the TEE. It supports goals: (a) performance: we aim to provide near-native Layer 3 network packet encryption (based on Linux’s performance by accessing the I/O devices (NICs or SSDs) in-kernel Wireguard VPN [23]) for networking, and directly within the TEEs; (b) security: we aim to ensure strong full disk encryption (based on Linux’s dm-crypt device security guarantees, mitigating against OS-based Iago [15] mapper [22]) for storage. and host interface attacks [81]; and (c) compatibility: we Our evaluation with a range of micro-benchmarks aim to offer a complete POSIX/Linux ABI for applications and real-world applications shows that rkt-io provides without having to rewrite their I/O interface. better performance compared to Scone (a host-OS based To achieve these design goals, we describe rkt-io (pro- approach) and SGX-LKL (a LibOS-based approach). For nounced “rocket I/O”), an I/O stack for shielded execution example, the read/write bandwidth of rkt-io’s storage stack using Intel SGX TEEs.

Rkt-Io: a Direct I/O Stack for Shielded Execution Jörg Thalheim1,2, Harshavardhan Unnibhavi1,2, Christian Priebe3, Pramod Bhatotia1,2, Peter Pietzuch3

Linux: Kernel Release Number, Part II

Open Source Software Notice

Hardware-Driven Evolution in Storage Software by Zev Weiss A

Achieving Keyless Cdns with Conclaves

To FUSE Or Not to FUSE? Analysis and Performance Characterization of the FUSE User-Space File System Framework

Reducing the Test Effort on AGL Kernels Together by Using LTSI and LTSI Test Framework

Open Source Used in Cisco RV130/RV130W Firmware Version

Using TCP/IP Traffic Shaping to Achieve Iscsi Service Predictability

Reliable Writeback for Client-Side Flash Caches

DAMN: Overhead-Free IOMMU Protection for Networking

Open Source Used in Rv130x V1.0.3.51

Understanding the Impact of Cache Locations on Storage Performance and Energy Consumption of Virtualization Systems