Enabling Hardware Randomization Across the Cache Hierarchy in Linux-Class Processors

Max Doblas1, Ioannis-Vatistas Kostalabros1, Miquel Moretó1 and Carles Hernández2

1Computer Sciences - Runtime Aware Architecture, Barcelona Supercomputing Center, {max.doblas, vatistas.kostalabros, miquel.moreto}@bsc.es
2Department of Computing Engineering, Universitat Politècnica de València, [email protected]

Abstract—The most promising secure-cache design approaches use cache-set randomization to index cache contents, thus thwarting cache side-channel attacks. Unfortunately, existing randomization proposals cannot be successfully applied to processors' cache hierarchies due to the overhead added when dealing with coherency and virtual memory. In this paper, we solve existing limitations of hardware randomization approaches and propose a cost-effective randomization implementation for the whole cache hierarchy of a Linux-capable RISC-V processor.

I. INTRODUCTION

Recently reported security vulnerabilities exploit key high-performance processor features such as the use of speculative execution [11] in order to get access to classified information of victim processes executed in the CPU. In Spectre-like attacks, speculative execution is used to force the execution of instructions not belonging to the regular program path to obtain private information from the victim's address space. Since the misspeculated paths are not functionally visible, they have to rely on microarchitectural side-channels to obtain the secret information. Cache side-channel attacks have been shown to be a feasible and powerful tool to leak sensitive information from a victim process [13], [16].

The design of secure caches has become a very active area of research. The most promising secure-cache design approaches randomize the mapping from memory line to cache set in order to thwart cache side-channel attacks [14], [18], [20], [21]. These cache designs use a parametric, keyed function to index cache contents. The function is fed with a subset of the target address's bits and a key obtained from an entropy source, and it dictates the cache set to look for that value. When the key value used in this function is altered, the memory line to cache-set mapping changes. By extension, the cache conflicts observed will be completely different.

The complexity of building an attack exploiting cache conflicts relies on the ability of the attacker to find an eviction set [19]. When cache randomization is in place, eviction sets have to be rediscovered every time the key used to index cache contents is modified. The key modification frequency defines the maximum vulnerable time-window of the system against a side-channel attack.

Recent proposals have shown that cache randomization can be successfully applied to Last-Level Caches (LLCs) to prevent cache side-channel attacks at the cost of increasing cache access latency [14], [15], [21]. However, in many computing domains (e.g. safety-critical systems) protecting LLCs is not enough, as it does not provide any security guarantees against attacks targeting other upper-level caches [22].

Unfortunately, applying cache randomization to the whole cache hierarchy is more challenging, as it requires the randomization approaches to have low implementation costs and to support virtual-to-physical memory translation. The majority of current processors rely on Virtually-Indexed Physically-Tagged (VIPT) first-level caches for enhanced performance. Such designs allow reducing cache access latency, as virtual-to-physical translation is done in parallel with the cache access. Current cache randomization solutions cannot be directly applied to such cache designs without sacrificing performance. Moreover, keeping coherence across the cache hierarchy requires being able to transparently index cache contents both with physical and virtual addresses.

In this paper, we propose the first randomization mechanism that can be applied to the whole cache hierarchy of a processor design with virtual memory support. In particular, this paper makes the following contributions:

1) A mechanism to deal with both virtual and physical addresses in randomized cache designs while retaining cache coherency.
2) A performance/security balanced solution in which the best randomization strategies are employed at different cache levels to improve performance vs. security trade-offs.
3) An FPGA implementation of our proposal in a RISC-V processor that is able to successfully boot the Linux operating system (OS), including randomization in the whole cache hierarchy.
4) An evaluation of the security and performance overheads of the proposed solution, showing for the first time the feasibility of applying randomization across the whole cache hierarchy.

II. RANDOMIZED CACHE DESIGNS FOR SECURITY

A. Cache Side-Channel Attacks

Cache-based side-channel attacks have become a serious concern in many computing domains [13], [16]. These attacks are able to bypass existing virtualization and software isolation mechanisms as they infer confidential information from cache conflicts between the victim and the attacker.
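To make the attack pattern concrete, the sketch below shows the prime+probe loop that underlies many of these attacks. It is our illustration rather than code from any cited work: the eviction_set of congruent addresses and the hit/miss threshold are assumptions, and a real attack would use a finer-grained timer than clock_gettime.

```c
/* Minimal prime+probe sketch (illustrative only). eviction_set holds
 * NWAYS addresses congruent with the victim's cache set, and the
 * THRESHOLD_NS separating hits from misses is platform-dependent;
 * both are assumptions for this example. */
#include <stdint.h>
#include <time.h>

#define NWAYS 4
#define THRESHOLD_NS 40

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Returns 1 if the victim touched the monitored set between the
 * prime and probe phases: one of our lines was evicted, so at
 * least one reload is slow. */
int prime_probe(volatile const char *eviction_set[NWAYS]) {
    for (int i = 0; i < NWAYS; i++)      /* prime: fill the set */
        (void)eviction_set[i][0];

    /* ... let the victim run here ... */

    for (int i = 0; i < NWAYS; i++) {    /* probe: time the reloads */
        uint64_t t0 = now_ns();
        (void)eviction_set[i][0];
        if (now_ns() - t0 > THRESHOLD_NS)
            return 1;                    /* slow reload: conflict observed */
    }
    return 0;
}
```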

Fig. 1: Existing randomization schemes versus our proposal: (a) CEASER, (b) Time-Randomized, (c) Randomize Once.

Contention-based attacks work in private caches (same-core) or shared caches (cross-core) and exploit cache evictions to learn the access patterns of victim applications. Same-core side-channel attacks require the victim and the attacker to be co-located on the same core by exploiting hyperthreading [16] or OS scheduling [13]. While cross-core attacks impose fewer restrictions on the victim's and the attacker's co-location, they encounter more obstacles, as timing measurements suffer from interference noise coming from multiple cores.

B. Randomization Countermeasures

Cache-layout randomization schemes use a parametric function that combines the address with a key value to randomize the mapping from the cache line to the cache set. Proposed functions rely on hashing schemes [12], random permutations of the set index [8], or whole-address encryptors [14].

Single vs. multiple security domains. Currently, two alternative schemes exist that offer different levels of protection. In the single-domain scheme [14], all processes share the same key value to access cache contents. Protection is provided by ensuring that the value of the key is modified frequently. The frequency at which the key is changed determines the maximum amount of time that attackers have in order to build a successful attack (e.g. discover an eviction set). In the second scheme [18], [21], different security domains are defined such that every domain uses a different and independent key value. With this latter approach, and assuming that the victim and the attacker belong to different security domains, cache conflicts cannot be directly associated to a specific address. As a result, an additional profiling process is required to determine the actual relationship between victim and attacker congruent addresses. Moreover, using security domains provides additional protection against unauthorized control-information-tampering attacks [10].

C. Current Limitations

A typical memory hierarchy of a high-performance processor targeting data-center or embedded-systems domains includes several levels of caches and support for virtual memory. In multicore designs, the first cache level is private. The second level of the cache can be shared across different cores or can remain private when a third level of cache is present. Typically, L1 caches are VIPT, whereas upper cache levels are Physically Indexed, Physically Tagged (PIPT). The coherence protocol is in charge of invalidating cache lines in the different cache levels using physical and/or virtual addresses.

Private caches are weaker against a powerful side-channel attack due to their small size. Nevertheless, current randomization approaches have only been shown suitable for LLCs [14], [21] or cache hierarchies with limited virtualization support [9]. Consequently, there is an imminent need for solutions that deal with randomized VIPT cache designs.

III. ENABLING RANDOMIZATION ACROSS THE ENTIRE CACHE HIERARCHY

In this section we propose a novel mechanism that supports virtual memory in randomized caches. We also discuss the complexity and performance of existing schemes.

A. Cache Layout Randomization

First, we analyze the suitability of different randomization schemes to implement cache randomization in the whole cache hierarchy. The CEASER approach [14] (see Figure 1 (a)) uses an encryption/decryption scheme for the addresses accessing/stemming from the cache. The main advantage of this approach is that the cache structure remains unaltered. However, its main drawback is the increased access latency due to the encryption and decryption process. The latter are handled by a Low-Latency Block Cipher (LLBC), which utilizes a Feistel network to produce the necessary encrypted/decrypted bits. As shown by Bodduna et al. [5], this LLBC is vulnerable to key and bits invariance attacks and is therefore deemed futile at thwarting cache side-channel attacks against a powerful adversary.

An alternative scheme consists of using a randomization function to produce the cache set's index [8], [12], [21] (see Figure 1 (b)). This scheme requires extending cache tags to include all bits in the address, except the offset bits, to avoid collisions between blocks with the same tag in the same set. Moreover, the latency of the randomization process can be partially hidden, as it occurs in parallel with the cache access. Nevertheless, this may have an impact on the timing of the cache if the randomization delay does not fit in the available timing slack. Finally, set-index randomization is only applied when cache contents are accessed, and therefore no decryption process is required. In this paper, we opt for cache-set randomization as the baseline randomization scheme to avoid the latency overheads of CEASER.
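For concreteness, a toy version of an encrypt/decrypt indexing scheme of the CEASER kind is sketched below. This is our illustration, not CEASER's actual LLBC: the round function, widths, and key schedule are placeholders. The point is structural: the set index is taken from the encrypted line address, so evictions and refills must run the inverse rounds to recover the address, which is exactly the latency that set-index randomization avoids.

```c
/* Toy 4-round Feistel over a 48-bit line address, illustrating a
 * CEASER-style encrypt/decrypt indexing scheme [14]. The round
 * function and parameters are placeholders, not the real LLBC. */
#include <stdint.h>

#define HALF_MASK 0xFFFFFFull            /* two 24-bit halves */

static uint32_t round_fn(uint32_t half, uint32_t rk) {
    half ^= rk;
    half *= 0x9E3779B1u;                 /* cheap mixing, illustrative only */
    return (uint32_t)((half ^ (half >> 13)) & HALF_MASK);
}

uint64_t feistel48(uint64_t line_addr, const uint32_t rk[4], int decrypt) {
    uint32_t L = (uint32_t)((line_addr >> 24) & HALF_MASK);
    uint32_t R = (uint32_t)(line_addr & HALF_MASK);
    for (int i = 0; i < 4; i++) {
        if (!decrypt) {                  /* (L,R) <- (R, L ^ F(R, k_i)) */
            uint32_t t = R;
            R = L ^ round_fn(R, rk[i]);
            L = t;
        } else {                         /* inverse rounds, reverse key order */
            uint32_t t = L;
            L = R ^ round_fn(L, rk[3 - i]);
            R = t;
        }
    }
    return ((uint64_t)L << 24) | R;
}

/* The cache set is taken from the low bits of the encrypted address: */
static inline uint32_t ceaser_set(uint64_t line_addr, const uint32_t rk[4]) {
    return (uint32_t)(feistel48(line_addr, rk, 0) & 63);   /* 64 sets */
}
```

A set-index randomizer, by contrast, only ever runs the forward direction and keeps the stored tags in plaintext, at the price of the extended tags discussed above.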

B. Dealing with Virtual Memory in Randomized Caches

Randomized cache designs employ upper address bits to determine the cache set index. Thus, physical and virtual addresses pointing to the same cache line will very likely produce different cache set indexes, thus creating a coherency problem. To overcome this limitation, we propose a new set-index randomization scheme (see Figure 1 (c)).

Initially, we apply randomization when the data request arrives at the cache from the core. This is the only case in which a data request carries the virtual address. Thus, all core requests go through the randomizer module to determine the cache set. In case of a cache hit, data is served to the CPU. In case of a miss, a request including the randomized index is propagated to the L2 cache. This index is stored in the metadata of the corresponding cache line in the L2 cache. When the L1 miss is served, the incoming requested cache line incorporates the randomized index to access the L1 cache without using the randomizer module, since the randomized index has already been produced.

The same solution can also be applied to the other levels of the cache hierarchy. Incoming physical addresses go through the randomizer module, thus generating a randomized index and accessing the L2 cache. This randomized index will be stored in the metadata of the L3 cache. Therefore, all memory requests to the different cache levels are modified in order to propagate the corresponding randomized index. In the case of the LLC, the randomized index does not need to be propagated to main memory but only to its Miss Status Holding Register (MSHR) in order to serve LLC misses.

Apart from storing the randomized index at the upper cache level, we also need to augment the tags inside the metadata structure with the original (non-randomized) index of the address. The reason for this extension is twofold. First of all, we may have collisions of different blocks having the same tag in the same set due to the randomization. Moreover, these extra bits of information are also needed to generate the block address in case of eviction, because the (random) set index and the (original) address index are different.

Finally, all the requests from the coherence protocol will use the randomized index stored in the metadata of the different cache levels, apart from that of the L1. Note that this modification does not alter the behavior of the regular randomized cache. Figure 2 illustrates how the proposed randomized cache hierarchy works.

Fig. 2: Overview of a randomized cache hierarchy with three levels of cache.

The proposed mechanism supports the in-flight change of the mapping function of the randomizer module. When the randomized index for a given address is changed, a cache miss may happen even if the data is already in the cache (coupled with the previous randomized index). Since the previous randomized index is stored in the metadata of the upper cache level, the old line can be invalidated and the up-to-date version of the data can be recovered. All necessary modifications with respect to the coherence protocol functionality are explained in detail in Section IV-C.

Summing up, the proposed mechanism enables cache randomization across the entire cache hierarchy, not only for virtually indexed cache designs but also for physically indexed ones.
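The following sketch captures this flow as a functional model. It is our abstraction of the scheme, not the RTL: the cache_t callbacks and naming are invented for illustration, and address translation is elided (addr is virtual at the L1 request and physical below).

```c
/* Simplified functional model of the "randomize once" flow (our
 * abstraction of this section, not the RTL). */
#include <stdbool.h>
#include <stdint.h>

typedef struct cache cache_t;
struct cache {
    cache_t *next;                                   /* NULL at the LLC */
    uint32_t (*randomize)(cache_t *, uint64_t);      /* keyed index fn  */
    bool     (*lookup)(cache_t *, uint32_t, uint64_t);
    void     (*fill)(cache_t *, uint32_t, uint64_t);
    void     (*note_child_idx)(cache_t *, uint64_t, uint32_t);
};

/* Each level computes its randomized index exactly once per request
 * and forwards it on a miss; the level above records that index so a
 * later coherence action can find the line without re-randomizing. */
void cache_access(cache_t *c, uint64_t addr,
                  uint32_t idx_from_below, bool have_idx_from_below) {
    uint32_t idx = c->randomize(c, addr);

    if (have_idx_from_below)                  /* remember where the lower */
        c->note_child_idx(c, addr, idx_from_below); /* level put the line */

    if (c->lookup(c, idx, addr))
        return;                               /* hit: data served upward */

    if (c->next)                              /* miss: forward our index */
        cache_access(c->next, addr, idx, true);

    c->fill(c, idx, addr);                    /* refill reuses idx;
                                                 no re-randomization */
}
```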
C. Cost-effective Randomization Functions

Applying randomization to small caches is challenging, as timing and area margins are really tight in these memory structures. Secure randomization functions proposed for LLCs [14], [21] are too complex to be applied without increasing cache access latency. As an alternative, simple randomization functions can be used to index small caches. In that case, it is critical that such functions uniformly distribute addresses to cache sets, to preserve security properties [19]. Time-randomized caches offer a solution to this challenge, as they implement cache randomization with uniform set distribution at very low implementation cost. The current proposal, which is agnostic to the underlying randomization function, implements a hash function [12] and a random modulo [8] solution that incur low overheads and preserve the uniform distribution of the input. However, since L1/L2 caches are smaller than the LLC, the complexity required to find eviction sets, and thus to build an attack, is significantly reduced for these caches. To achieve the highest protection in these caches, we also integrate an explicit, per-way randomization function, which significantly increases the robustness of the system against cache side-channel attacks [15], [21].
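The two functions below are illustrative stand-ins for these two options; the published constructions in [12] and [8] differ in detail, and the widths and key handling here are assumptions. The random-modulo-style variant displaces the conventional index by a key- and tag-dependent offset, so consecutive lines still map to distinct sets, which preserves the spatial-locality behavior discussed later in Section V-D.

```c
/* Illustrative low-cost index functions: stand-ins for the hash-based
 * placement of [12] and the random modulo of [8]; the published
 * constructions differ in detail. 64-byte lines, 64 sets assumed. */
#include <stdint.h>

#define LINE_BITS 6
#define SET_BITS  6
#define NSETS     (1u << SET_BITS)

/* Hash-based placement: XOR-fold every keyed line-address bit into
 * the index. Uniform, but neighboring lines may collide, so spatial
 * locality is weaker. */
uint32_t hash_index(uint64_t addr, uint64_t key) {
    uint64_t x = (addr >> LINE_BITS) ^ key;
    uint32_t idx = 0;
    for (; x != 0; x >>= SET_BITS)
        idx ^= (uint32_t)(x & (NSETS - 1));
    return idx;
}

/* Random-modulo-style placement: displace the conventional index by a
 * key/tag-dependent offset. Consecutive lines within the same tag
 * region still map to consecutive, distinct sets. */
uint32_t random_modulo_index(uint64_t addr, uint64_t key) {
    uint32_t orig_idx = (uint32_t)(addr >> LINE_BITS) & (NSETS - 1);
    uint64_t tag      = addr >> (LINE_BITS + SET_BITS);
    uint32_t offset   = hash_index(tag << LINE_BITS, key);
    return (orig_idx + offset) & (NSETS - 1);
}
```

Either function still requires the extended tags described in Section III, since two distinct lines with the same conventional tag can land in the same randomized set.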
IV. IMPLEMENTATION IN THE LOWRISC SOC

We have implemented our proposal on top of lowRISC [3], a RISC-V System-on-Chip (SoC) with a memory hierarchy of two cache levels based on the Rocket-chip generator [7]. In the first level, we have private, per-core, separate instruction and data caches. In the second level, there is a shared L2 cache, which is also the LLC in this SoC.

In order to deal with physical and virtual addresses, we randomize the cache index once per L1 data cache request, also storing the original index in the L1 cache metadata (explained in Section III-B). As described in the same section, we also propagate the generated random index to the L2 cache, which stores it in its metadata in order to maintain coherency in the system.

A. Modifications to the L1 Data Cache

The Rocket core L1 data cache is composed of four stages and includes an MSHR, a prober and a writeback module. The stages of a successful access are: 1) accessing the metadata and data structures at the same time as the TLB; 2) comparing the N-way tags stored at the corresponding set with the Physical Page Number (PPN) tag from the TLB; 3) if the request is a load, responding to the CPU with the required data; 4) if it is a store, updating the respective structures.

First of all, we need to place the randomizer module between the CPU request port of the cache and the metadata (tags) and data arrays. The randomized index generation occurs only once per request and takes place in the first stage. This module generates the random index used to access the metadata and data structures only if the address introduced lacks a randomized index (i.e., it is accessed for the first time). If the address already has a randomized index, this module is bypassed. Our proposal supports using different randomization functions to generate the random index. Figure 3 depicts a schematic of the L1 randomized data cache. As shown in the diagram, the randomized index is present at the output of the metadata and data reading. Then, it is propagated with the request to later stages such as the writing stage or the coherence modules.

Fig. 3: Simplified diagram showing the different types of indexes on the L1 randomized data cache implementation.

Another module that has to be slightly modified is the metadata array in the cache. Since two different cache lines with the same tag can be placed in the same set due to the randomization function, we need to deal with that issue. A simple solution consists of extending the tag residing in the metadata with the original index of the address. As already mentioned (Section III-B), these extra bits are also used to generate the address in the case of an eviction.

Next, we elaborate on the required changes to the coherence modules. As we explained before, we need to send the randomized index to the upper-level cache, the L2 cache in this case, to be stored in the metadata. To do that, we send this index to the L2 in every data request produced by an L1 miss. The L1 MSHR is in charge of that. Finally, in every coherence request, the randomized index present in the L2 cache metadata is sent along with the request to access the L1 cache structures. As a result, it is possible to deal with virtual and physical addresses seamlessly.
B. Modifications to the Shared L2 Cache

In the L2 cache, we include a randomizer module to randomize the access to the cache contents. Also, we have modified the directory structure to support interfacing with the VIPT L1 caches. In particular, we add a new field to the directory metadata to store the randomized index of the L1 in each valid line. The random index of a cache line is updated on every request affecting this line. Thus, the directory always has an up-to-date reference to the data stored in the lower-level cache. Such a mechanism allows the coherence protocol to make correct requests to the randomized cache.
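Functionally, the directory extension amounts to one extra field per tracked line, as in the sketch below (our naming; the actual Rocket L2 directory structures differ).

```c
/* Sketch of the extended L2 directory entry (our naming, not the
 * actual Rocket structures). The rnd_idx_l1 field lets a coherence
 * probe address the line inside the randomized VIPT L1 without
 * knowing the L1 randomizer key or the original virtual address. */
#include <stdbool.h>
#include <stdint.h>

typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_t;

typedef struct {
    bool     valid;
    mesi_t   state;       /* coherence state of the tracked line        */
    uint64_t ext_tag;     /* extended tag, including the original index */
    uint32_t rnd_idx_l1;  /* set index the L1 used for this line        */
} dir_entry_t;

/* Refresh the stored L1 index on every request touching the line,
 * so the directory always reflects the L1's current mapping. */
void dir_update(dir_entry_t *e, uint64_t ext_tag,
                uint32_t l1_idx, mesi_t state) {
    e->valid = true;
    e->state = state;
    e->ext_tag = ext_tag;
    e->rnd_idx_l1 = l1_idx;
}
```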
C. Modifications to the Coherence Protocol

The Rocket core uses a MESI protocol to maintain data coherence in the whole memory hierarchy. The coherence protocol relies on a directory placed in the L2 cache. With the proposed modifications, there can be specific cases in which the protocol malfunctions (as explained below). The original MESI protocol, combined with the randomized cache, can result in multiple instances of the same data being valid in the same core. This may create ghost copies that can no longer be invalidated.

Ghost copies can appear when there is a modification of the key of the randomizer module or a modification in the page table. If a new CPU request for data that is already in the cache comes after any of the events mentioned before, this request will have a new randomized index assigned by the randomizer module. Then a miss will be produced. At this point, if the original data in the L1 data cache is in the shared state and the request comes from a load, the L2 cache sends the data without invalidating the old instance. At this moment, the new randomized index is stored in the L2 cache and replaces the old one. This action makes the other line (the one valid before the key/page-table change) remain valid and untracked, without the possibility of being invalidated.

To solve this issue, we alter the MESI protocol. Now the cache that makes a request is bound to receive coherency requests from the upper-level cache in order to remove potential ghost copies. If the state is shared, and the core that is making the request already has the data, the coherency protocol sends an invalidation message to this cache before sending the data. All other protocol mechanisms remain unchanged.

D. Implementation of the Skewed Cache

To increase the security of randomized caches, a different randomization module per way can be used. To implement this functionality, we only need to increase the number of functions inside the randomizer module to get one function per way. The propagation and storage of the randomized index are still necessary. Now we need to choose the correct randomized index out of the N per-way indexes. The index selected in the case of a hit is the index of the way that produced the hit; in the case of a miss, we choose the index of the cache line that will be evicted.
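The selection rule can be summarized as follows; the per-way keys and the lookup/replacement helpers below are toy stand-ins we introduce for illustration.

```c
/* Sketch of index selection in the skewed configuration (our
 * abstraction of this subsection). Each way has its own keyed index
 * function; the index that is stored and propagated upstream is the
 * hitting way's on a hit, or the victim way's on a miss. */
#include <stdbool.h>
#include <stdint.h>

#define NWAYS 4
#define NSETS 64

static const uint64_t way_key[NWAYS] = { 0xA5, 0x3C, 0x7E, 0xD2 }; /* toy keys */

/* Toy per-way keyed index function (stand-in for the real one). */
static uint32_t randomize_way(int w, uint64_t addr) {
    uint64_t x = (addr >> 6) ^ way_key[w];
    return (uint32_t)((x ^ (x >> 6) ^ (x >> 12)) & (NSETS - 1));
}

/* Toy stand-ins for the tag-array lookup and the replacement choice. */
static bool tag_match(int w, uint32_t idx, uint64_t addr) {
    (void)w; (void)idx; (void)addr;
    return false;                       /* toy: always miss */
}
static int victim_way(uint64_t addr) {
    return (int)(addr % NWAYS);         /* toy: pseudo-random victim */
}

/* Returns the randomized index to use and propagate; *hit reports
 * whether the access hit in any way. */
uint32_t skewed_index(uint64_t addr, bool *hit) {
    uint32_t idx[NWAYS];
    for (int w = 0; w < NWAYS; w++) {
        idx[w] = randomize_way(w, addr);   /* one candidate set per way */
        if (tag_match(w, idx[w], addr)) {
            *hit = true;
            return idx[w];                 /* hit: hitting way's index */
        }
    }
    *hit = false;
    return idx[victim_way(addr)];          /* miss: evicted line's index */
}
```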
E. Interface with the Operating System

To allow the OS to handle the randomized-cache-hierarchy keys, which affect the randomizer module's address hashing (i.e., the random index generation), we use custom Control and Status Registers (CSRs). We use a dedicated CSR per cache, accessible only in privileged mode (N per-way CSRs in the case of the skewed cache), to store the active keys used by the randomizer module. Using these, the OS can easily change the key for each process and achieve the desired isolation level. Finally, these CSRs are initialized to a random value on every reboot. Using that property, the mapping of the caches changes every time the machine is rebooted, thus preventing an adversary from knowing the initial seed in advance. This feature is also present in other solutions [14].
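From privileged software, re-keying then reduces to a CSR write, roughly as below. The CSR number 0x7c0 and the helper names are our placeholders in the RISC-V custom CSR range, not the SoC's documented register map.

```c
/* Privileged-mode sketch of re-keying a randomizer through a custom
 * CSR on RISC-V (GNU C, inline asm). CSR 0x7c0 (custom range) is a
 * placeholder; the SoC's actual per-cache CSR map is not given here. */

#define csr_write(csr, val)                                   \
    ({ unsigned long v_ = (unsigned long)(val);               \
       asm volatile("csrw " #csr ", %0" :: "r"(v_)); })

#define csr_read(csr)                                         \
    ({ unsigned long v_;                                      \
       asm volatile("csrr %0, " #csr : "=r"(v_)); v_; })

/* Install the scheduled process's key for the L1 data cache
 * randomizer, e.g. on a context switch between security domains. */
static inline void set_l1d_key(unsigned long key) {
    csr_write(0x7c0, key);
}

static inline unsigned long get_l1d_key(void) {
    return csr_read(0x7c0);
}
```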

TABLE I: Microarchitectural configuration of the target SoC.

Component | Description
Core details | 200 MHz, in-order, 5-stage pipeline, 64-bit RISC-V IMA extensions, 32 integer registers
Private L1 data cache | 16KB 4-way set assoc., 64B line size, random replacement, 3-cycle latency, VIPT
Data TLB | 8 entries
Private L1 instr. cache | 16KB 4-way set assoc., 64B line size, random replacement, 2-cycle latency, VIPT
Instruction TLB | 8 entries
Shared L2 cache | 64KB 8-way set assoc., 64B line size, random replacement, 8-cycle latency
MSHR size | 2 entries

V. EVALUATION

This section presents the results obtained in the experimental campaign of our proposal. We discuss performance, security properties and area overhead separately.

A. Experimental Setup

As explained in Section IV, we have implemented all the modules required for hardware randomization in the cache hierarchy of the Rocket core [7]. The RISC-V core, as well as the rest of the SoC, was synthesized and emulated on a Kintex-7 FPGA development board. The SoC contains all the required peripherals to function and to communicate with a host machine (i.e., UART). The testbed comprises the FPGA development board and a host machine, which collects and interprets the data sent over the UART from the FPGA. The configuration parameters of the SoC and core modules are summarized in Table I. We perform the experimental evaluation of our SoC using the non-floating-point benchmarks from the EEMBC suite [2].

B. Functional Verification

To validate the modifications introduced in the processor's RTL, we have executed the RISC-V ISA tests [4], a battery of more than 105 unit tests. In particular, the executed tests have been wrapped with the virtual environment code. With this environment, the tests use virtual-to-physical address translation, thus forcing the core to utilize all the virtual memory mechanisms it supports. We validated that our modifications do not alter the functional behavior of the system. Also, we verified that the EEMBC tests produce the correct results using a cyclic redundancy check (CRC) validation.

Additionally, we have been able to successfully load version 3.14.41 of the Linux kernel on top of our randomized platform. The booted kernel provides the user with the basic shell environment functionality. Note that booting Linux was not possible without including the support to deal with virtual memory in the caches.

C. Security Assessment

We assess the security properties of our randomized cache hierarchy using the formulation provided in [21]. In particular, we use the number of accesses required to build a profiled eviction set for a prime-probe attack [22] as a security vulnerability indicator. Having the ability to build an eviction set (i.e., to determine the congruent set of addresses required to evict a particular cache set) is a necessary step in many cache side-channel attacks [21]. Assuming that the eviction set consists of addresses colliding with the victim in exactly one way each, the eviction probability ($P_e$) for a skewed cache is determined by $P_e = 1 - \left(1 - \frac{1}{n_{ways}}\right)^{A/n_{ways}}$, where $A$ represents the number of colliding addresses. Finally, the expected number of accesses ($N_a$) in a cache with $n_{sets}$ sets is $N_a = n_{ways}^2 \cdot n_{sets} \cdot A$.
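These expressions can be evaluated directly, as in the helper below (ours, not the paper's tooling). How $A$ is discretized is our assumption, so the resulting values may differ slightly from those reported in Table II.

```c
/* Evaluate the Section V-C formulas (our helper; the rounding of A
 * is an assumption, so results may differ slightly from Table II). */
#include <math.h>
#include <stdio.h>

static double eviction_probability(int nways, double A) {
    /* Pe = 1 - (1 - 1/nways)^(A/nways) */
    return 1.0 - pow(1.0 - 1.0 / nways, A / nways);
}

static double expected_accesses(int nways, int nsets, double A) {
    /* Na = nways^2 * nsets * A */
    return (double)nways * nways * nsets * A;
}

int main(void) {
    int nways = 4, nsets = 64;                     /* skewed L1 of Table I */
    double A = 1.0;
    while (eviction_probability(nways, A) < 0.99)  /* smallest A, Pe >= 99% */
        A += 1.0;
    printf("A = %.0f, Na = %.0f\n", A, expected_accesses(nways, nsets, A));
    return 0;
}
```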
TABLE II: Accesses required to find an eviction set.

Processor | L1 | L1 (skewed) | L2 | L2 (skewed)
Rocket | 4288 | 68608 | 35456 | 2269184
Neoverse | 17152 | 274432 | 2269184 | 18153472
Skylake | 17728 | 1134592 | 67584 | 1081344

Table II shows the estimated number of accesses required to build an eviction set in the different cache modules, assuming a success probability $P_e$ equal to 99%. We report values for the cache configuration used in our RISC-V prototype and for the cache configurations included in the ARM Neoverse [1] and the Intel Skylake [6]. The Neoverse CPU includes 4-way 64KB L1 caches and 8-way 512KB L2 caches, both with 64-byte cache lines. The Skylake has 8-way 32KB L1 caches and a 4-way 256KB L2 with 64-byte cache lines. Results show that for small caches the number of accesses required to build an eviction set is rather small. Thus, including skewed randomizers becomes fundamental to obtain meaningful security properties. For the Skylake and Neoverse setups with skewed caches, the computed number of accesses shows that randomization is an effective way to protect the system, especially for the Skylake due to its higher associativity in the L1. Note that, as explained in Section IV, our randomization scheme allows re-keying the cache without a flush and independently for each way.

We have also analyzed the randomization properties of the randomization functions implemented. Having a randomization function with almost ideal properties avoids the possibility of reducing the time required to build a successful side-channel attack. Additionally, having ideal randomization properties is very useful to protect the system from software implementation vulnerabilities [10]. We ran the NIST randomness test suite [17] using 108 different keys and the same number of random addresses to prove that the randomizers implemented can be used safely in our design. This is an interesting observation, since the complexity of such functions is much lower than the complexity of the functions proposed in other works [14], [21].

D. Performance Evaluation

Each benchmark has been executed 1000 times with 1000 different random keys for each underlying randomization mode in our RISC-V FPGA prototype.

Note that for the tested benchmarks, 1000 runs suffice to produce a distribution of cache misses that is independent and identically distributed (i.i.d.). The randomization functions we chose are, as stated before, random modulo [8] and hash-based random placement [12]. For each randomization function we have also implemented its skewed variant. For the baseline setup we have also performed several runs to capture the inherent platform variability, which was shown to be lower than 1%.

Figure 4 shows the average as well as the 5% and 95% percentiles of the Misses per Kilo Instruction (MPKI) values over 1000 runs with different random keys. Initially, we observe that not all of the benchmarks show a significant MPKI value, since the EEMBC benchmarks do not have a big memory footprint. The ones that do exhibit more misses are ttsprk, cacheb, canrdr and pntrch. For most of the benchmarks, the skewed cache architecture (both with random modulo and with hash function) is the configuration providing better results. On the other hand, the benchmarks executed on top of the hash function randomization module yield the highest MPKI values, since this randomization scheme has worse spatial locality properties than modulo or random modulo. Interestingly, when the hash randomizer is deployed in a skewed scheme, its behavior improves significantly due to the elimination of the spatial-locality pathological situations produced by the hash [8].

Fig. 4: Mean L1 data cache MPKI for the EEMBC benchmarks with the baseline design, cache randomization based on random modulo (RM) and on hash function (HF), and the combination with skewed cache (Skewed RM and Skewed HF).

E. Area Utilization

Table III reports the resource utilization of the FPGA implementation. The impact on resource utilization of the RM solution is very low in the L1 caches. This solution needs 6.0% more LUTs and 3.3% more flip-flops (FF) compared with the baseline. In contrast, the HF solution uses 41.2% more LUTs and 3.3% more FFs than the baseline. Both solutions have some impact on the L2 cache design. The majority of the LUT increase in the L2 cache is produced by the added hash function. As the number of BRAMs used in the different configurations is the same, the increase of 5.8% in FFs is due to the storage of the randomized index of the L1 data cache. As expected, the integration of the skewed caches uses more resources to implement multiple randomization functions. Nevertheless, the L2 remains identical because, as mentioned in Section IV-D, we propagate only one randomized index.

TABLE III: FPGA resource utilization for the baseline, hash function (HF) and random modulo (RM) configurations.

| | LUTs | FF | CLAs
L1 | Baseline | 3249 | 2514 | 82
L1 | HF | 4587 (+41.2%) | 2598 (+3.3%) | 87 (+6.1%)
L1 | RM | 3553 (+6.0%) | 2598 (+3.3%) | 87 (+6.1%)
L1 | HF Skewed | 7862 (+142.0%) | 2676 (+6.4%) | 87 (+6.1%)
L1 | RM Skewed | 3718 (+14.4%) | 2676 (+6.4%) | 87 (+6.1%)
L2 | Baseline | 11047 | 3778 | 85
L2 | Others | 13607 (+23.2%) | 3999 (+5.8%) | 93 (+9.4%)
Total | Baseline | 15301 | 7636 | 199
Total | HF | 19199 (+25.5%) | 7941 (+4.0%) | 212 (+6.5%)
Total | RM | 18055 (+18.0%) | 7941 (+4.0%) | 212 (+6.5%)
Total | HF Skewed | 22474 (+46.9%) | 8019 (+5.0%) | 212 (+6.5%)
Total | RM Skewed | 18330 (+25.5%) | 8019 (+5.0%) | 212 (+6.5%)

VI. RELATED WORK

The earliest cache randomization approaches aimed at improving the security of caches by relying on hardware mapping tables to find the location of cache lines [20]. These approaches suffer from scalability limitations and require having the ability to classify applications as protected or unprotected. More recent randomization schemes propose an encryption and/or randomization of the address used to index cache contents [14], [18]. The authors of [18] suggested the use of a parametric function to randomize the cache line mapping, reusing cache designs already proposed for probabilistic real-time systems [8], [12]. The work in [18] recommended including the process id in the randomization function to create independent cache layouts for different processes. Later, this approach was generalized in [21], introducing the concept of security domains. The security properties of randomization schemes have been analysed in several recent works [14], [15], [21]. [14] introduced an encryption mechanism coupled with a remapping process for LLCs, showing that for contemporary attacks the performance impact caused by remapping can be neglected. However, the simplicity of the LLBC used in [14] creates a major vulnerability against some cryptanalysis attacks. Recent works [15], [19] have shown that the cost of finding eviction sets can be reduced, which makes initial randomization schemes vulnerable. To solve this issue, the latest randomization schemes have proposed skewed mechanisms implementing independent per-way randomization functions [15], [21]. With skewed randomization, the complexity of building side-channel attacks is significantly increased, making the performance cost of key remapping negligible for LLCs. Unfortunately, existing randomization schemes are only useful for LLCs, in which randomization only deals with physical addresses [14], [15], [21], or in simpler processor configurations [18].

VII. CONCLUSIONS

In this paper, we have introduced a novel randomization mechanism for high-performance cache designs that makes use of virtual and physical addresses while retaining cache coherency. The proposed approach can be successfully applied to a RISC-V processor capable of booting Linux, significantly improving its security against cache-based side-channel attacks.

ACKNOWLEDGMENTS

This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy

and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020, with a grant of 50% of the total eligible cost. We also thank Red-RISCV for the efforts to promote activities around open hardware. This work has received funding from the EU Horizon 2020 programme under grant agreement no. 871467 (SELENE). M. Doblas has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Beques de Col·laboració d'estudiants en departaments universitaris per al curs 2019-2020. V. Kostalabros has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under the Ajuts per a la contractació de personal investigador novell fellowship, number 2019FI B01274. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under the Ramón y Cajal fellowship, number RYC-2016-21104.

REFERENCES

[1] Arm Neoverse N1 CPU. https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-n1.
[2] EEMBC. Embedded Microprocessor Benchmark Consortium. https://www.eembc.org/techlit/.
[3] lowRISC. Open to the core: enabling open source silicon through collaborative engineering. https://www.lowrisc.org/.
[4] RISC-V Tools. RISC-V tools including ISA simulator and tests. https://github.com/riscv/riscv-tools.
[5] R. Bodduna, V. Ganesan, P. Slpsk, C. Rebeiro, and V. Kamakoti. Brutus: Refuting the security claims of the cache timing randomization countermeasure proposed in CEASER. IEEE Computer Architecture Letters, 2020.
[6] C. Disselkoen, D. Kohlbrenner, L. Porter, and D. Tullsen. Prime+Abort: A timer-free high-precision L3 cache attack using Intel TSX. In USENIX Security, pages 51-67, 2017.
[7] K. Asanović et al. The Rocket Chip generator. Technical Report UCB/EECS-2016-17, University of California, Berkeley, Apr 2016.
[8] C. Hernández, J. Abella, A. Gianarro, J. Andersson, and F. J. Cazorla. Random modulo: a new processor cache design for real-time critical systems. In DAC, pages 29:1-29:6, 2016.
[9] C. Hernández et al. Design and implementation of a time predictable processor: Evaluation with a space case study. In ECRTS, pages 16:1-16:23, 2017.
[10] J. Xu, Z. Kalbarczyk, and R. K. Iyer. Transparent runtime randomization for security. In SRDS, pages 260-269, 2003.
[11] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom. Spectre attacks: Exploiting speculative execution. In IEEE SSP, 2019.
[12] L. Kosmidis, J. Abella, E. Quiñones, and F. J. Cazorla. Efficient cache designs for probabilistically analysable real-time systems. IEEE Trans. Computers, 63(12):2998-3011, 2014.
[13] M. Neve and J. Seifert. Advances on access-driven cache attacks on AES. In SAC, pages 147-162, 2006.
[14] M. K. Qureshi. CEASER: Mitigating conflict-based cache attacks via encrypted-address and remapping. In MICRO, pages 775-787, 2018.
[15] M. K. Qureshi. New attacks and defense for encrypted-address cache. In ISCA, pages 360-371, 2019.
[16] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, pages 199-212, 2009.
[17] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, and E. Barker. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Technical report, Booz Allen Hamilton Inc., McLean, VA, 2001.
[18] D. Trilla, C. Hernández, J. Abella, and F. J. Cazorla. Cache side-channel attacks and time-predictability in high-performance critical real-time systems. In DAC, pages 98:1-98:6, 2018.
[19] P. Vila, B. Köpf, and J. F. Morales. Theory and practice of finding eviction sets. In IEEE SSP, pages 39-54, 2019.
[20] Z. Wang and R. B. Lee. New cache designs for thwarting software cache-based side channel attacks. In ISCA, pages 494-505, 2007.
[21] M. Werner, T. Unterluggauer, L. Giner, M. Schwarz, D. Gruss, and S. Mangard. ScatterCache: Thwarting cache attacks via cache set randomization. In USENIX Security, pages 675-692, 2019.
[22] M. Yan, R. Sprabery, B. Gopireddy, C. Fletcher, R. Campbell, and J. Torrellas. Attack directories, not caches: Side channel attacks in a non-inclusive world. In IEEE SSP, 2019.
