Autolock: Why Cache Attacks on ARM Are Harder Than You Think

AutoLock: Why Cache Attacks on ARM Are Harder Than You Think Marc Green, Worcester Polytechnic Institute; Leandro Rodrigues-Lima and Andreas Zankl, Fraunhofer AISEC; Gorka Irazoqui, Worcester Polytechnic Institute; Johann Heyszl, Fraunhofer AISEC; Thomas Eisenbarth, Worcester Polytechnic Institute https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/green This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX AutoLock: Why Cache Attacks on ARM Are Harder Than You Think Marc Green Leandro Rodrigues-Lima Andreas Zankl Worcester Polytechnic Institute Fraunhofer AISEC Fraunhofer AISEC Gorka Irazoqui Johann Heyszl Thomas Eisenbarth Worcester Polytechnic Institute Fraunhofer AISEC Worcester Polytechnic Institute Abstract caches speed up the access to data and instructions, timing measurements allow an adversary to infer the activity Attacks on the microarchitecture of modern processors of other applications and the data processed by them. In have become a practical threat to security and privacy fact, cache attacks have been demonstrated in multiple in desktop and cloud computing. Recently, cache at- scenarios in which our personal data is processed, e.g., tacks have successfully been demonstrated on ARM web browsing [41] or cloud computing [26, 58]. These based mobile devices, suggesting they are as vulner- attacks have severe security implications, as they can re- able as their desktop or server counterparts. In this cover sensitive information such as passwords, crypto- work, we show that previous literature might have left graphic keys, and private user behavior. The majority of an overly pessimistic conclusion of ARM’s security as attacks have been demonstrated on classic desktop and we unveil AutoLock: an internal performance enhance- server hardware [25, 30, 42, 51], and with Intel’s market ment found in inclusive cache levels of ARM proces- share for server processors being over 98% [31], their sors that adversely affects Evict+Time, Prime+Probe, platforms have been targeted most frequently. and Evict+Reload attacks. AutoLock’s presence on system-on-chips (SoCs) is not publicly documented, yet With mobile usage skyrocketing, the feasibility of knowing that it is implemented is vital to correctly as- cache attacks on smartphone and IoT processors – which sess the risk of cache attacks. We therefore provide a de- are predominantly ARM-based – has become a rele- tailed description of the feature and propose three ways vant issue. Attacks that rely on the existence of a to detect its presence on actual SoCs. We illustrate how cache flush instruction, i.e., Flush+Reload [51] and AutoLock impedes cross-core cache evictions, but show Flush+Flush [23], work efficiently across a broad range that its effect can also be compensated in a practical at- of x86 processors, but have limited applicability on ARM tack. Our findings highlight the intricacies of cache at- devices. In general, cache flush instructions serve the tacks on ARM and suggest that a fair and comprehen- legitimate purpose of manually maintaining coherency, sive vulnerability assessment requires an in-depth under- e.g., for memory mapped input-output or self-modifying standing of ARM’s cache architectures and rigorous test- code. On any x86 processor implementing the SSE2 ing across a broad range of ARM based devices. instruction set extension, this flush instruction is avail- able from all privilege levels as clflush. A similar instruction was introduced for ARM processors only in the 1 Introduction most recent architecture version, ARMv8. In contrast to clflush, it must be specifically enabled to be accessi- The rapid growth of mobile computing illustrates the ble from userspace. This leaves a significant number of continually increasing role of digital services in our daily ARM processors without a cache flush instruction. lives. As more and more information is processed digi- For all processors with a disabled flush instruction or tally, data privacy and security are of utmost importance. an earlier architecture version, e.g., ARMv7, only evic- One of the threats known today aims directly at the fab- tion based cache attacks can be deployed. In particular, ric of digital computing. Attacks on processors and their these attacks are Evict+Time [42], Prime+Probe [42], microarchitecture exploit the very core that handles our and Evict+Reload [24]. On multi-core systems, they data. In particular, processor caches have been exploited target the last-level cache (LLC) to succeed regardless of to retrieve sensitive information across logic boundaries which core a victim process is running on. This requires established by operating systems and hypervisors. As the LLC to be inclusive, i.e., to always contain the con- USENIX Association 26th USENIX Security Symposium 1075 tents of all core-private cache levels. On Intel processors, technical reference manuals (TRMs) nor any other pub- the entire cache hierarchy fulfills the inclusiveness prop- lic product documentation by ARM mention AutoLock. erty and is therefore a viable target for eviction based We therefore provide a detailed description of the fea- attacks. ARM devices, on the contrary, implement inclu- ture and propose three methodologies to test for it: using sive and non-inclusive caches alike. Both properties can a hardware debugging probe, reading the performance co-exist, even in the same cache hierarchy. This renders monitoring unit (PMU), and conducting simple cache- eviction based attacks to be practicable only on a limited timing measurements. Each test strategy has different number of devices, in particular those that implement requirements and reliability; having multiple of them inclusive last-level caches. Yet, our findings show that is vital to test for AutoLock under any circumstances. an internal performance enhancement in inclusive last- With the proposed test suite, we verify AutoLock on level caches, dubbed AutoLock, can still impede evic- ARM Cortex-A7, A15, A53, and A57 processors. As tion based cache attacks. In short, AutoLock prevents AutoLock is likely implemented on a larger number of a processor core from evicting a cache line from an in- ARM processors, we discuss its general implications and clusive last-level cache, if said line is allocated in any how our results relate to previous literature. of the other cores’ private cache levels. This inhibits Despite its adverse effect on eviction based cache at- cross-core LLC evictions, a key requirement for practi- tacks, the impact of AutoLock can be reduced. We dis- cal Evict+Time, Prime+Probe, and Evict+Reload at- cuss generic circumvention strategies and execute the tacks on multi-core systems, and further limits the num- attack by Irazoqui et al. [29] in a practical cross-core ber of ARM based attack targets in practice. Evict+Reload scenario on a Cortex-A15 implementing In literature, Lipp et al. [37], Zhang et al. [54], and AutoLock. We successfully recover the secret key from Zhang et al. [56] confirmed the general feasibility of a table based implementation of AES and show that at- flush and eviction based cache attacks from unprivileged tacks can tolerate AutoLock if multiple cache lines are code on ARM processors. Given the lack of flush in- exploitable. Furthermore, the presented circumvention structions on a large selection of ARM devices and the strategies implicitly facilitate cross-core eviction based deployment of non-inclusive LLCs or inclusive LLCs attacks also on non-inclusive caches. This is because in implementing AutoLock, the authors might have left an the context of cross-core LLC evictions, inclusive last- overly pessimistic conclusion of ARM’s security against level caches with AutoLock behave identically to non- cache attacks. In addition, ARM’s highly flexible licens- inclusive ones. In summary, our main contributions are: ing ecosystem creates a heterogeneous market of system- on-chips (SoCs) that can exhibit significant differences • the disclosure and description of AutoLock, an un- in their microarchitectural implementations. Demme et documented and previously unknown cache imple- al. [17] illustrate that already small changes to the cache mentation feature with adverse impact on practical architecture can have considerable impact on the side- eviction based cache attacks on ARM devices, channel vulnerability of the processor. Consequently, judging the true impact of cache attacks on a broad range • a comprehensive test suite to determine the exis- of ARM based platforms remains to be a challenge. Our tence of AutoLock on actual devices, as its presence work adds another step in this process. It is a contribu- is not documented publicly, tion to an in-depth understanding of microarchitectural • a discussion of AutoLock’s implications and its re- features on ARM in general and an extension to our cur- lation to previous literature demonstrating cache at- rent knowledge of cache implementations in particular. tacks on ARM, and 1.1 Contribution • a set of strategies to circumvent AutoLock to- gether with a practical demonstration of a cross- This work unveils AutoLock, an internal and undocu- core Evict+Reload attack on a multi-core SoC im- mented performance enhancement feature found in in- plementing AutoLock. clusive cache levels on ARM processors. It prevents cross-core evictions of cache lines from inclusive

Load more