An Authenticated Call Stack

PACStack: an Authenticated Call Stack Hans Liljestrand Thomas Nyman Lachlan J. Gunn University of Waterloo, Canada Aalto University, Finland Aalto University, Finland [email protected] thomas.nyman@aalto.fi [email protected] Jan-Erik Ekberg N. Asokan Huawei Technologies Oy, Finland University of Waterloo, Canada Aalto University, Finland Aalto University, Finland [email protected] [email protected] Abstract Recent software-based shadow stacks show reasonable per- A popular run-time attack technique is to compromise the formance [10], but are vulnerable to an adversary capable control-flow integrity of a program by modifying function re- of exploiting memory vulnerabilities to infer the location of turn addresses on the stack. So far, shadow stacks have proven the shadow stack. To date, only hardware-assisted schemes, to be essential for comprehensively preventing return ad- such as Intel CET [29], achieve negligible overhead without dress manipulation. Shadow stacks record return addresses in trading off security. But employing such a custom hardware integrity-protected memory secured with hardware-assistance mechanism incurs development and deployment costs. or software access control. Software shadow stacks incur Recent ARM processors include support for pointer authen- high overheads or trade off security for efficiency. Hardware- tication (PA); a hardware extension that uses tweakable mes- assisted shadow stacks are efficient and secure, but require sage authentication codes (MACs) to sign and verify point- the deployment of special-purpose hardware. ers [4]. One initial use case of PA is the authentication of re- We present authenticated call stack (ACS), an approach turn addresses [45]. However, current PA schemes are vulnera- that uses chained message authentication codes (MACs). Our ble to reuse attacks, where the adversary can reuse previously prototype, PACStack, uses the ARM general purpose hard- observed valid protected pointers [35]. Prior work [35, 45] 1 2 ware mechanism for pointer authentication (PA) to implement and current implementations by GCC and LLVM mitigate ACS. Via a rigorous security analysis, we show that PACStack reuse attacks, but cannot completely prevent them. achieves security comparable to hardware-assisted shadow In this paper, we propose a new approach, authenticated stacks without requiring dedicated hardware. We demonstrate call stack (ACS), providing security comparable to hardware- that PACStack’s performance overhead is small (≈3%). assisted shadow stacks, with minimal overhead and without requiring new hardware-protected memory. ACS binds all return addresses into a chain of MACs that allow verification 1 Introduction of return addresses before their use. We show how ACS can be efficiently realized using ARM PA while resisting reuse Traditional code-injection attacks are ineffective in the pres- attacks. The resulting system, PACStack, can withstand strong ence of W⊕X policies that prevent the modification of exe- adversaries with full memory access. Our contributions are: cutable memory [49]. However, code-reuse attacks can alter • ACS, a new approach for precise verification of function the run-time behavior of a program without modifying any of return addresses by chaining MACs (Section4). its executable code sections. Return-oriented programming • PACStack, an LLVM-based realization of ACS using ARM (ROP) is a prevalent attack technique that corrupts function PA without requiring additional hardware (Section5). return addresses to hijack a program’s control flow. ROP can • A systematic evaluation of PACStack security, showing that be used to achieve Turing-complete computation by chain- its security is comparable to shadow stacks (Section6). ing together existing code sequences in the victim program. • Demonstrating that the performance overhead of PAC- To prevent ROP, return addresses must be protected when Stack is small (≈3%) (Section7). stored in memory. At present, the most powerful protection PACStack and associated evaluation code is available as open against ROP is using an integrity-protected shadow stack source at https://pacstack.github.io. that maintains a secure reference copy of each return address [1]. Integrity of the shadow stack is ensured by mak- 1https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function- ing it inaccessible to the adversary either by randomizing its Attributes.html location in memory or by using specialized hardware [29]. 2https://reviews.llvm.org/D49793 2 Background 8 bits reserved bit 3 – 23 bits VA_SIZE bits tag/PAC sign ext./PAC virtual address (AP) 2.1 ROP on ARM general purpose registers HK(AP, M) PA key (K) In ROP, the adversary exploits a memory vulnerability to manipulate return addresses stored on the stack, thereby al- 64-bit modifier (M) configuration register tering the program’s backward-edge control flow. ROP allows Turing-complete attacks by chaining together multiple Figure 1: PA uses a pointer authentication code (PAC) based gadgets, i.e., adversary-chosen sequences of pre-existing pro- on the pointer’s address, a modifier, and a key. gram instructions that together perform the desired operations. 1 prologue: ARM architectures use the link register (LR) to hold the cur- 2 paciasp ; signLR usingSP ¶ rent function’s return address. LR is automatically set by the 3 str LR,[SP] ; pushLR onto stack · branch with link (bl) or branch with link to register (blr) 4 ... instructions that are used to implement regular and indirect 5 epilogue: 6 ldr LR,[SP] ; pop stack ontoLR ¸ function calls. Because LR is overwritten on call, non-leaf 7 retaa ; verifyLR and return ¹ functions must store the return address onto the stack. This opens up the possibility of ROP on ARM [30]. Listing 1: The -mbranch-protection feature in GCC and LLVM/Clang uses PA to sign (¶) and verify (¹) the return address in LR. PA does not access memory directly, the LR 2.2 ARM Pointer Authentication value is stored (·) and loaded (¸) conventionally. The ARMv8.3-A PA extension supports calculating and veri- fying pointer authentication codes (PACs) [4]. PA is at present 2.2.1 PA-based return address protection deployed in the Apple A12, A13, S4, and S5 systems-on-chip (SoCs) and is going to be available in all upcoming ARMv8.3- PA-based return address protection is implemented as part A and later SoCs. A pac instruction calculates a keyed tweak- of the -mbranch-protection feature of GCC and LLVM/- Clang.4 An authenticated return address is computed with able MAC, HK(AP;M), over the address AP of a pointer P using a 64-bit modifier M as the tweak. The resulting au- paciasp (¶ in Listing1) and verified with retaa (¹). These thentication token, referred to as a PAC, is embedded into instructions use the instruction key A and the value of stack the unused high-order bits of P. It can be verified using an pointer (SP) as the modifier. The PA-keys are protected by hardware; consequently an adversary has to resort to guessing aut instruction that recalculates HK(AP;M), and compares the result to P’s PAC. the correct PAC for a modified return address. Since the PAC is stored in unused bits of a pointer, its size The -mbranch-protection feature and other prior PA- is limited by the virtual address size (VA_SIZE in Figure1) based solutions are vulnerable to reuse attacks where an ad- and whether address tagging is enabled [4]. On a 64-bit ARM versary replaces a valid authenticated return address with machine running a default Linux kernel, VA_SIZE is 39, which another authenticated return address previously read from the leaves 16 bits for the PAC when excluding the reserved and process’ memory. For a reused PAC to pass verification, both address tag bits. PA provides five different keys; two for the original and replacement PAC must have been computed code pointers, two for data pointers, and one for generic use. using the same PA key and modifier. This applies to any PA Each key has a separate set of instructions, e.g., the autia scheme, not only authenticated return addresses. Using the SP and pacia instructions always operate on the instruction key value as a modifier reduces the set of interchangeable pointers, A, stored in the APIAKey_EL1 register. Access to the key but still allows reuse attacks when SP values coincide. Reuse registers and PA configuration registers can be restricted to a attacks can be mitigated, but not completely prevented, by higher exception level (EL). Linux v5:03 adds full support for further narrowing the scope of modifier values [35]. PA, such that the kernel (at EL1) manages user-space (EL0) keys and prevents EL0 from modifying them. The kernel 3 Adversary model and requirements generates new PA keys for a process on an exec system call. As currently specified, PA does not cause a fault on verifi- In this work, we consider a powerful adversary, A, with arbi- cation failure; instead, it strips the PAC from the pointer P and trary control of process memory but restricted by a W⊕X pol- flips one of the high-order bits such that P becomes invalid. icy that prevents modification of code pages. This adversary If the invalid pointer is used by an instruction that causes the model is consistent with prior work on run-time attacks [49]. pointer to be translated, such as load or instruction fetch, the We limit A to user space; thus A cannot read or modify kernel- memory management unit issues a memory translation fault. managed registers such as the PA keys. 3https://kernelnewbies.org/Linux_5.0#ARM_pointer_ 4https://gcc.gnu.org/gcc-9/changes.html and authentication https://reviews.llvm.org/D51429 We make the following assumptions about the system: the MAC key and the last authentication token authn must A1 AW⊕X policy protects code memory pages from modifi- be stored securely to ensure that previous auth tokens and cation by non-privileged processes. All major processor return addresses can be correctly verified when unwinding architectures, including ARMv8-A, support W⊕X.

Load more