Scalable Memory Protection in the Penglai Enclave

Scalable Memory Protection in the PENGLAI Enclave Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China https://www.usenix.org/conference/osdi21/presentation/feng This paper is included in the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. July 14–16, 2021 978-1-939133-22-9 Open access to the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. Scalable Memory Protection in the PENGLAI Enclave Erhu Feng1†‡, Xu Lu1†‡, Dong Du†‡, Bicheng Yang†‡, Xueqiang Jiang†‡, Yubin Xia†§‡, Binyu Zang†§‡, Haibo Chen†§‡ †Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University §Shanghai AI Laboratory ‡Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China Abstract both are also managed by the cloud providers, it is natural to Secure hardware enclaves have been widely used for pro- use enclaves to protect serverless-like and microservice-like tecting security-critical applications in the cloud. However, applications in the cloud [36, 93]. existing enclave designs fail to meet the requirements of However, existing enclave systems cannot well fit some scalability demanded by new scenarios like serverless com- inherent characteristics of these cloud applications, including puting, mainly due to the limitations in their secure mem- resilient memory allocation [86], high resource utilization [43, ory protection mechanisms, including static allocation, re- 76], auto-scaling [43] and ephemeral execution time [34, 56, stricted capacity and high-cost initialization. In this paper, 64], mainly due to three scalability limitations of their secure we propose a software-hardware co-design to support dy- memory protection mechanisms: namic, fine-grained, large-scale secure memory as well as Limitation-1. Non-scalable memory partition/isolation: fast-initialization. We first introduce two new hardware prim- Most existing enclave systems use static or almost-static parti- itives: 1) Guarded Page Table (GPT), which protects page tion for region-based memory isolation, like fixed-sized PRM table pages to support page-level secure memory isolation; (Processor Reserved Memory) in SGX [15], limited secure 2) Mountable Merkle Tree (MMT), which supports scalable world memory regions in ARM TrustZone [2]2, 16 protected integrity protection for secure memory. Upon these two primi- memory regions in Keystone [13, 68], etc. It is hard to dy- tives, our system can scale to thousands of concurrent enclaves namically adjust the boundaries of partitions and scale to a with high resource utilization and eliminate the high-cost ini- large amount of secure memory. Region-based isolation also tialization of secure memory using fork-style enclave creation violates the scalability requirement of fined-grained memory without weakening the security guarantees. management in the cloud. We have implemented a prototype of our design based on Limitation-2. Non-scalable memory integrity protection: PENGLAI [24], an open-sourced enclave system for RISC- Using a traditional Merkle hash tree (or its variants) to protect V. The experimental results show that PENGLAI can sup- memory integrity is hard to scale. For example, Intel SGX port 1,000s enclave instances running concurrently and scale only supports 128/256MB EPC (Enclave Page Cache)3. Al- up to 512GB secure memory with both encryption and in- though SGX has no restriction on the number of enclaves, tegrity protection. The overhead of GPT is 5% for memory- running thousands of enclaves may either cause little available intensive workloads (e.g., Redis) and negligible for CPU- EPC for each instance or frequent EPC swapping, which fails intensive workloads (e.g., RV8 and Coremarks). PENGLAI to meet the scalability requirement of serverless computing. also reduces the latency of secure memory initialization by three orders of magnitude and gains 3.6x speedup for real- Limitation-3. Non-scalable secure memory initialization: world applications (e.g., MapReduce). High-cost secure memory initialization causes long startup latency for enclaves, which significantly affects the perfor- 1 Introduction mance of auto-scaling. For example, SGX needs seconds to There has been a surge of interest in using enclaves like Intel create an enclave [40,69] by adding memory to EPC and mea- SGX [77], AMD SEV [12] and ARM TrustZone [58] to host suring the contents (by EADD and EEXTEND instructions) security-critical applications in cloud with minimal reliance for every page. On the contrary, serverless functions usually on the trust of cloud providers [28,29,37,40,46,59,84,87,96]. have a very short life cycle (<1s) [50, 88]. Meanwhile, microservice [80] and serverless computing [1, In this paper, we propose scalable secure memory protec- 3–5] have become emerging paradigms of cloud, which use tion mechanisms for enclaves with three metrics: (1) size and single-purpose service or function as a basic computation granularity of secure memory, (2) number of enclaves, (3) unit and achieve high scalability. Since the frameworks of 2The number depends on specific implementations, and is typically 8. 1The two authors contributed equally to this work and should be consid- 3The latest SGX platforms (Ice Lake Server) support larger EPC sizes ered co-first authors. (e.g., 1TB) [22], but they do so by giving up on integrity protection. USENIX Association 15th USENIX Symposium on Operating Systems Design and Implementation 275 startup latency of enclaves. We first introduce two new archi- is restricted to 16 (509 in EPYC Generation 2 [14]). Intel tectural primitives, Guarded Page Table (GPT) and Mount- TDX [16] is also designed to isolate secure VMs from other able Merkle Tree (MMT): GPT protects page table pages and software, including the hypervisor. However, TDX only pro- enables memory isolation with page-level granularity, and vides basic integrity protection and cannot defend against MMT is a new abstraction to achieve on-demand and scalable hardware-based memory replay attacks. The number of secure memory encryption and integrity protection. Leveraging these VMs is limited by hardware to 64 private keys in MKTME two primitives, a lightweight secure monitor running in the (Multi-Key Total Memory Encryption). ARM TrustZone- most privileged mode is in charge of enclave management based enclaves, e.g., Komodo [52] and Sanctuary [35], have and maintaining security guarantees. To mitigate the high no restriction on enclave number or memory size. However, overhead of enclave creation due to costly secure memory ini- the secure memory can only reside in a few fixed memory tialization, we propose a new type of enclave, called shadow regions and has no encryption or integrity guarantees. enclave, to support fork-style fast enclave creation. Keystone [68] implements enclave memory isolation by We have implemented a prototype of our design based on leveraging the PMP (Physical Memory Protection) mecha- PENGLAI [24], an open-source RISC-V enclave system using nism of RISC-V [100], which includes a set of paired registers a secure monitor to manage all the enclaves. We extend the to indicate physical memory regions as well as their access secure monitor to support scalable secure memory protection permissions. Thus, the number of memory regions in Key- with two hardware primitives, as well as fast enclave creation. stone is restricted by the number of PMP registers (up to 16). The evaluation results show that PENGLAI can host 1,000s In order to defend against physical attacks, Keystone lever- of concurrent enclave instances and support secure memory ages on-chip computing, which is costly due to the restricted up to 512GB. PENGLAI incurs negligible overhead for CPU- on-chip RAM [68]. CURE [21] adopts enclave ID-based ac- intensive benchmarks (e.g., RV8 and Coremarks), and incurs cess control for customizable enclaves. It utilizes a hardware 5% overhead for memory-intensive benchmarks (e.g., Redis). arbiter to record contiguous physical memory regions of en- For startup latency, PENGLAI leverages the shadow enclave claves, which can only support 13 enclaves. Similarly, the to boost enclave creation by three orders of magnitude (with enclave number of Sanctum [45] is also restricted by the num- 16MB enclave memory). We also evaluate PENGLAI with real- ber of isolated DRAM regions. TIMBER-V [102] extends world scalable applications. The results show that PENGLAI the RISC-V ISA to run unlimited number of enclaves, but it can significantly reduce the execution time of MapReduce incurs non-trivial overhead (25.2% on average) and does not (3.6x speedup with shadow fork) and achieve near-native consider memory integrity protection. performance for a serverless application. We have implemented all the architectural features of PENGLAI on RISC-V 2.2 State-of-the-art Fall Short platform, including an FPGA board, QEMU and the Gem5 Fine-grained memory isolation. Prior art [27,91,92] achieves simulator, and implemented the software monitor with 6,399 fine-grained and flexible memory isolation by introducing ad- LoCs, including enclave/hardware management and encryp- ditional metadata like bitmap [27,92] or tags [102] to identify tion libraries. The hardware costs are also minor, i.e., 0.81% whether a page belongs to an enclave and check each memory (without MMT or memory encryption) in LUT and 0.73% in access. However, due to the capacity restriction of in-SoC FF on a Xilinx VC707 FPGA board. RAM, most of the metadata has to be stored in the main PENGLAI is open-sourced at https://github.com/ memory, which requires an extra memory load when meta- Penglai-Enclave. data is out of SoC and incurs a high performance penalty in 2 Motivation TIMBER-V [102].

Load more