Mithril: Cooperative Protection on Commodity DRAM Leveraging Managed Refresh

Michael Jaemin Kim, Jaehyun Park, Yeonhong Park, Wanju Doh, Namhoon Kim, Tae Jun Ham, Jae W. Lee, and Jung Ho Ahn Seoul National University [email protected]

Abstract—Since its public introduction in the mid-2010s, the computer system. RH-induced -flip can be abused in various Row Hammer (RH) phenomenon has drawn significant attention attack scenarios. For example, the RH can be utilized to from the research community due to its security implications. modify an unauthorized memory location or enable other Although many RH-protection schemes have been proposed by processor vendors, DRAM manufacturers, and academia, they attack techniques such as and cross-VM still have shortcomings. Solutions implemented in the memory attacks [1], [11], [12], [17], [41], [49], [53]. controller (MC) pay an increasingly higher cost due to their The criticality of this problem motivated many RH protec- conservative design for the worst case in terms of the number tion solutions. There exist several software-based solutions [5], of DRAM banks and RH threshold to support. Meanwhile, the [9], [19], [30], [49], but such solutions often incur a high- DRAM-side implementation has a limited time margin for RH protection measures or requires extensive modifications to the performance cost and have limited coverage (i.e., only ef- standard DRAM interface. fective against a specific attack scenario). For these reasons, Recently, a new command for RH protection has been intro- architectural solutions have emerged as promising alternatives. duced in the DDR5/LPDDR5 standards, called refresh manage- By providing the RH mitigation at the DRAM device level, ment (RFM). RFM enables the separation of the tasks for RH pro- these mechanisms can provide strong protection against most tection to both MC and DRAM. It does so by having the former generate an RFM command at a specific activation frequency, attack scenarios with relatively low or negligible performance and the latter take proper RH protection measures within a given overhead. Most architectural RH defense solutions exploit one time window. Although promising, no existing study presents and of the two mechanisms: adjacent row refresh (ARR) and mem- analyzes RFM-based solutions for RH protection. In this paper, ory access throttling. ARR [29], [31], [39] is a non-standard we propose Mithril, the first RFM interface-compatible, DRAM- DRAM command that is issued by memory controllers to MC cooperative RH protection scheme providing deterministic protection guarantees. Mithril has minimal energy overheads for initiate an extra preventive refresh (REFprev) on potential RH common use cases without adversarial memory access patterns. victim rows. Throttling [16], [50] is a mechanism of limiting We also introduce Mithril+, an extension to provide minimal the overly frequent activations on the potential RH aggressor performance overheads at the expense of a tiny modification to row from the . the MC while utilizing an existing DRAM command. Using one of the remedies, existing architectural RH- protection schemes provide either probabilistic or determinis- I.INTRODUCTION tic protection guarantee for the target RHTH . The probabilistic DRAM has been an essential part of computer systems for protection schemes are efficient in that they often require very decades. One characteristic of DRAM is that DRAM inher- few extra structures, but they cannot completely prevent the ently leaks charges over time. Thus an auto-refresh command RH attack. On the other hand, the deterministic protection arXiv:2108.06703v1 [cs.CR] 15 Aug 2021 needs to recharge a DRAM cell at an interval of refresh schemes utilize an extra counter structure to track the potential window to avoid a loss of data. However, when a certain RH aggressor rows to provide the complete protection guar- row (aggressor) is activated at a rate surpassing a certain antee. Often, streaming (Section II-F) or grouped- threshold, the nearby row (victim) may experience a bit flip counter approaches are utilized to effectively manage the phenomenon. This reliability problem is called Row Hammer counter table. (RH) [28], [29] and the threshold number of activations within One of the important design decisions for the architectural a single refresh window causing the first-bit flip is called RH-protection scheme is where to implement the proposed RH-threshold (RHTH ). Unfortunately, the RH problem is solution within the system. In practice, most RH protection becoming more critical as fabrication technology scales [14]. solutions are either implemented in an on-die memory con- Recent works report RHTH to be as low as 20K, and several troller (MC) or a DRAM device. For example, Graphene [39], literatures identified that even non-adjacent victim rows could BlockHammer [50], and PARA [29] are implemented in the be influenced by the RH phenomenon [15], [28]. on-die MC, whereas TWiCe [31] and the industry-oriented RH The RH phenomenon has been a serious security vul- protection schemes [14], [37] are implemented in the DRAM. nerability as it breaks the basic integrity guarantee in the Unfortunately, both choices have their own drawbacks.

1 First, the MC-side implementation needs to provide RH TABLE I protection resources for the worst-case, where the expected SYMBOLSANDTHEIRDESCRIPTIONSUSEDFOR DRAM REFRESH,ROW HAMMER, ANDSTREAMINGALGORITHM. RHTH level is very low, and the processor is connected to the maximum number of DRAM banks it supports. As a result, it Symbol Description requires a large extra area for the counter structures that the tREFW Per row auto-refresh interval (e.g. 32ms and RH protection mechanism utilizes. 64ms) RH Notation for RH threshold. The DRAM-side implementations are free from such con- TH REFprev Extra preventive refresh on the potential RH vic- cerns, as RHTH of a specific DRAM is estimated accurately tim rows. Executed during ARR, RFM command, by DRAM vendors, and the resource usage is proportional to or hidden in the auto-refresh. the number of DRAM banks since on-DRAM RH-protection rblast Distance that RH effect reaches. Realcnt Real number of occurrence in a data stream. schemes are often deployed at a per-bank or per-DIMM basis. Estcnt Reported number of Realcnt from the . However, such on-DRAM protection schemes suffer from the deployability issues. protection against a target RHTH . Finally, we propose 1) To secure the time margin for the extra REFprev for the potential RH victim rows, the DRAM-side schemes must algorithmic optimization for energy saving, 2) RFM extension either request the MC to generate the non-standard ARR to mitigate performance overhead by exploiting the memory access patterns of the ordinary workloads, and 3) hardware commands or perform extra REFprev during the auto-refresh in a way that is transparent to the MC. The former mechanism optimization that obviates the need for counter table reset, breaks the abstraction that DRAM is a passive device. In which was mandatory in the prior work. contrast, the latter mechanism [14] called the time-margin- The key contributions of this paper are as follows: stealing method is not always possible depending on the • We classify the possible combinations of different stream- DRAM characteristics and the auto-refresh margin. ing algorithms and remedies for RH-protection and inves- Refresh Management (RFM) is a new candidate for a tigate appropriate methods for the RFM-based scheme. remedy in addition to the prior ARR and throttling, which has • We propose Mithril, the first RFM-based RH-protection been newly introduced to the DDR5 and LPDDR5 specifica- scheme with deterministic safety guarantees, exploiting a tions [21], [22]. An MC sends an RFM command at a specific modified Counter-based Summary algorithm. activation frequency to a target DRAM bank without a target • We provide a rigorous mathematical proof of the modified row. The DRAM-side RH-protection scheme can exploit the algorithm and the RH safety of Mithril. time margin provided by the RFM command to take necessary • We suggest the energy and performance optimization tech- niques that exploit the memory access patterns of the actions, typically extra REFprev. This cooperation between the MC and DRAM effectively avoids the critical drawbacks common, non-adversarial workloads. of MC- or DRAM-only implementations. Despite its promising trait, the applicability of RFM for II.BACKGROUND the RH-protection scheme has not been publicly verified or A. DRAM Refresh appropriately evaluated to the best of our knowledge. A prior probabilistic scheme [29] may be trivially applied. However, DRAM stores a single bit in a cell, composed of one capac- the prior deterministic schemes cannot be directly applied to itor and one access . These cells are organized into the RFM interface. The prior ARR-based schemes reactively rows and columns. A DRAM row, which shares a wordline, is issue the command targeting a specific row when the acti- the granularity of the activation (ACT) and precharge (PRE), vation count reaches a scheme-specific predefined threshold. which respectively allows and disallows the read or write However, periodic in its nature, the RFM command is prone to operation on the row. The read and write operation occurs the worst case where multiple rows simultaneously require the by accessing a certain number of columns in an activated REFprev in a short time period, unlike the ARR command. row. DRAM is composed of multiple banks. Each bank allows Throttling-based prior work also cannot utilize the RFM independent ACT, PRE, read, and write operations. Multiple command properly. Thus, prior ARR- or throttling-based RH- banks form a rank, which shares the memory channel to the protection schemes are inefficient over the RFM-interface. memory controller (MC) at the host side. Mithril, a novel DRAM-side, RFM-compatible, and de- Due to the inherent characteristic of a DRAM cell terministic RH-protection scheme takes a new approach in that the stored charge leaks over time, the cell value needs utilizing the RFM command. To avoid such a concentration to be restored periodically [6], [8]. This periodic restoration of rows to refresh, we take a greedy approach in selecting the is called auto-refresh and is initiated at every refresh (REF) target row to refresh at every RFM command. Moreover, we command within tRFC (refresh time) period. Every DRAM investigate the usage of the streaming algorithms and grouped row must be refreshed at least once every refresh window counter approaches from the prior works to finally construct an (tREFW) to be safe from such a charge retention problem. In effective method to manage the counter structure. We provide modern DRAM devices (e.g., DDR5 [22]), all the rows in a a new mathematical proof that by simply maintaining the single bank are divided into typically 8,192 groups. A group greedy selection for REFprev, we guarantee deterministic is refreshed every time interval tREFI (refresh interval).

2 B. Row Hammer Phenomenon D. Implementation Location Row Hammer (RH) refers to a phenomenon that repetitive Architectural RH-protection schemes can be located either activations of a specific row (aggressor) lead to bit flips in on the MC-side or the DRAM-side. MC-side implementation its physically nearby rows (victim) [29], [36], [38], [51]. A has its strength in that it utilizes a superior logic bit flip is observable when the activation count reaches a with a higher area budget available. However, it has major drawbacks. It demands 1) a conservatively high number of certain RH threshold (RHTH ) without being refreshed inside counter structures to allocate, 2) a conservatively low target a tREFW time window. The RHTH value varies depending on different chips, generations, or DRAM manufacturers [28]. RHTH value, and 3) MC modification with higher complexity. The RH problem is getting worse following the current trend The counter table of the deterministic scheme is typically of fabrication technology scale-down, ensued by the intensified allocated per DRAM bank. The latest CPU servers, such as inter-cell interference. Recent studies [14], [28] reported that Ice Lake, support up to 2048 banks per socket (8 channels × 8 ranks × 32 banks). This number could further increase if RHTH has reduced to mere several thousand ACTs. It has also been observed that non-adjacent rows affect the victim rows we consider 3D stacked DRAM devices or future generations. when activated frequently. We refer to the distance between Even though fully populating 2048 banks may be unlikely, the counter structures must be designed to support the worst case. the aggressor and victim as blast radius (rblast) [50]. Such limitation was not properly taken into account by the prior works such as BlockHammer [50] and Graphene [39], C. Remedies and Protection Guarantees for RH which only report the area overhead of the counter table for 16 DRAM banks. There have existed two types of remedies prior to the refresh The target RH value to protect against varies greatly management (RFM) command: ARR and throttling. ARR TH depending on the manufacturer, generation, or even device. refers to a type of commands that the MC issues to DRAM Considering that most deterministic schemes have to be tuned with an explicit target row address (either aggressor or victim) to the target RH at design time, it must be built to protect at a required moment. It triggers an extra REF on the TH prev against the pessimistic RH value. The MC-side schemes potential RH victim rows within the time margin the command TH also often demand new commands that have to be issued in an provides. It is different from the normal REF command, which -like manner [39], [43] or require a new MC scheduler is row agnostic and periodic. A command with a similar con- and system support [50]. cept was once proposed in DDR4 [20] but is now deprecated. Throttling is only practical on the MC side, physical isola- Prior RH-protection schemes that exploited ARR either issued tion [9], [30], [49] is too costly, and the higher refresh rate [4], the command based on some probability [29], [46], [52] or [29] is limited in its protection guarantee. Therefore, DRAM- when the ACT count of a certain aggressor exceeds a scheme- side implementation typically relies on the extra REF specific predefined threshold, being assumed hazardous. prev on the potential RH victim row. To secure the uninterrupted Throttling is a method where MC delays the frequency of time to execute the REF , prior DRAM-side RH-protection activation on an aggressor starting at the moment of identifica- prev schemes had to rely on either the feedback-augmented ARR tion for a defined time. The duration and intensity of the delay command or the auto-refresh time-margin stealing method. are adjusted to guarantee RH protection. [16] first suggested The former is similar to the normal ARR command issued such a methodology, and [50] proposed a deterministic RH- by MCs but requires that DRAM halt the MC for a certain protection scheme utilizing the throttling method. amount of time. There exists some method of feedback from There exist two different types of RH-protection guaran- DRAM to MC, such as ALERT_n signal, but requires more tees: deterministic and probabilistic. Deterministic guarantee pin to deliver additional alert types to support the DRAM-side ensures the RH-protection by guaranteeing that a victim row RH-protection scheme. The latter method, auto-refresh time is always refreshed before it is affected by the ACT more margin stealing, invisibly executes REFprev during normal than RHTH , either by the extra REFprev or the normal auto- auto-refresh. Albeit not requiring any feedback path, it has a refresh. It utilizes a counter structure to track the aggressor limited amount of time margin that can be stolen, thus cannot row and executes one of the two remedies promptly. The be scaled to a low RHTH value. main drawback of the deterministic scheme is its higher area overhead. Given that the counter structure dominates the E. RFM Interface as a New Remedy overall area cost, representative prior works have either utilized Under these circumstances, the RFM interface has been a streaming algorithm [31], [39], [50] or a grouped counter newly introduced as an alternative remedy that allows for approach [25], [43] to minimize its size. DRAM-MC cooperation. It is suggested as the primary means Probabilistic guarantees prevent RH with a certain prob- of RH-protection by the JEDEC committee [23], [24]. The ability. The probabilistic approach has its strengths in the RH-protection scheme resides on the DRAM-side while the minimal area overhead. However, the performance overhead MC provides a periodic but DRAM-row agnostic time margin exacerbates severely when the target RHTH level is lowered to the DRAM bank. Periodic here means that it is not based or when the number of DRAM devices in the system increases. on time but the number of ACTs on a single DRAM bank. It does not provide a deterministic protection guarantee either. Figure 1 shows an exemplar main-memory organization using

3 Memory Controller DRAM fit for RH protection. They report the approximate number of RFM Logic RH Protection each element’s occurrences called estimated count (Estcnt) RAA Counter [Last] Scheme instead of real count (Realcnt). RFM Command . DRAM Bank [Last] (tRFM Time Margin) . Counter-based Summary (CbS) [2], [33], [34] is one of the representative streaming algorithms. The algorithm has a table RAA Counter [1] RH Protection Scheme of entries, each holding an address and a counter. When the RAA Counter [0] DRAM Bank [0] queried address hits an entry in the table, the corresponding counter is incremented by one. When it misses the table, it (a) RFM organization replaces the address of the entry with the minimum counter Find Bank & value in the table with the queried address. Then it increments MC Scheduler ACT Issue RAA Counter ++ its counter by one (Figure 2). Due to its monotonically increasing nature and swapping, the accumulated counter value RFM Issue to the Bank & Reach above the minimum in the table belongs to the currently Reset RAA Counter Yes �����? No written address. In contrast, the ones below the minimum cannot find their source. (b) RFM issue flow On-Table Address: Realcnt ≤ Estcnt = Counter ≤ Realcnt + Min (1) Fig. 1. (a) Exemplar main-memory organization with RFM interface. (b) The Off-Table Address: Realcnt ≤ Estcnt = Min ≤ Realcnt + Min (2) issue logic of RFM.

The CbS algorithm reports the Estcnt of an on-table address Yes Address Counter with its written counter value, whereas the Estcnt of an off- Address exists? increases by 1 table address with the minimum value in the whole table. No Inequalities (1) and (2) show the bound of Estcnt, with Min Replace the address of denoting the minimum counter value in the table. minimum counter NVESTIGATING BASED CHEMES Fig. 2. Counter-based Summary(CbS) algorithm operation. III.I RFM- S The probabilistic scheme (e.g., [29]) can be trivially ex- the RFM interface and RFM issue logic. An MC has a Rolling tended to the RFM interface. However, when it comes to Accumulated ACT (RAA) counter per bank that keeps track the deterministic scheme, we no longer can use the prior of the number of ACTs on its bank. When the RAA count ARR-based schemes nor the existing streaming algorithm as-is reaches an RFM threshold (RFMTH ), the MC issues an RFM for the counter table management. We investigate the proper command only to the corresponding bank and resets the RAA approach for the deterministic RFM-based scheme and the counter for the target bank. At every RFM command issue, the proper method of counter-table management. recipient bank receives a time margin (tRFM) during which no disturbance from any other regular operation is guaranteed. A. Streaming Algorithm Bounds A key difference with the prior ARR command is that RFM The Estcnt boundary of the streaming algorithm is ex- is row agnostic and periodic (i.e., it cannot be issued in a pressed in two inequalities with regard to the Realcnt (see bursty way). In a sense, it can be seen as an extension of the the exemplar inequality (1) and (2) of the CbS algorithm). time-margin stealing method. We refer to the left-hand side of the inequality as the bound- The RFM-based RH-protection scheme can avoid overkill low and the right-hand side as the bound-high (Table II). The because it can use a more accurate prediction of RH and TH bound-low shows not only the lower bound of Estcnt but set the RFM value after testing the manufactured chip. TH also the upper bound of Realcnt, and vice versa. Among the Moreover, the DRAM-side RH-protection scheme can get an inequality boundaries, there are two types: deterministic (Det) additional time margin from the MC without an additional and probabilistic (Prob) ones. The Det boundary is always interface for the feedback path. The format of an RFM met regardless of the input distribution or stream pattern. command is similar to that of a per-bank REF command [21], However, the P rob boundary is only satisfied with a certain [22], requiring minimal additional complexity. probability, which is likely to be met at a common case such as uniform or zipfian distribution, while being prone to skewed F. Streaming Algorithms and CbS Algorithm adversarial patterns.1 The algorithms can be classified into four A sequence of ACTs can be seen as a data stream where each input can be examined only once. Processing dense, fast 1The Count-min Sketch algorithm [10] is representative algorithm with data streams with limited memory to gain useful informa- P rob bound-high. It keeps a table of counters and uses multiple hash functions. When input data is processed, data is designated to multiple tion has been researched under the name of the streaming counters pointed by the hash functions, and corresponding counters are algorithm [35] in the field of data mining. Among different incremented. When queried by data, the algorithm reports its Estcnt as subgroups of streaming algorithms, each aiming at extracting the minimum value among the hash indexed counters, thus is always over- estimated (Det bound-low). However, because it utilizes the hash functions, a a different kind of information from a data stream, the ones maliciously skewed data set can cause redundant indexing, resulting in a high estimating the total number of occurrences per input element minimum and over-estimation (P rob bound-high) for a single data address.

4 TABLE II COMBINATIONS OF THREE RH-PROTECTIONREMEDIESANDFOURCATEGORIESOFSTREAMINGALGORITHMS.

bound-low / bound-high Det/ Det Det/ Prob Prob/ Det Prob/ Prob Lossy-Counting CbS Count-min Sketch Sticky-Sampling Count Sketch Realcnt − m ≤ Estcnt≤ Realcnt Realcnt ≤ Estcnt ≤ Realcnt + m Realcnt ≤ Estcnt ≤ Realcnt + ||f−a|| Realcnt − m ≤ Estcnt ≤ Realcnt Realcnt − ||f−a|| ≤ Estcnt ≤ Realcnt + ||f−a|| ARR O (TWiCe) O (Graphene) X X X Throttle O O O (BlockHammer) X X RFM ? ? X X X

*  indicates an algorithm specific error terminology and m denotes the length of the total stream (e.g., number of ACTs). ||f−a|| denotes the size of the input space (e.g., DRAM row address space), excluding the address of the current Estcnt. categories depending on the type of boundaries they possess It may be tempting to utilize the algorithms with the (see Table II). P rob bound-low such as Sticky Sampling [32] or Count Sketch [10], not to the deterministic but the probabilistic B. Viable Combinations of Remedies and Algorithms RH-protection scheme. They may incur minimal performance overhead compared to the pure sampling with merely a limited When it comes to applying the streaming algorithm to area overhead. However, they are prone to adversarial patterns, an RH-protection scheme, first the bound-low has to be resulting in a worse probability in the corner cases, just as Det to build a deterministic solution. Suppose Est , the cnt ProHIT and MRLoc did [39]. value that the scheme bases its action upon, be lower than Realcnt unboundedly (i.e., P rob). In that case, we cannot C. RFM and Greedy Selection guarantee deterministic safety. Thus, algorithms with the P rob An RFM-based scheme requires a new approach in selecting bound-low cannot be the candidates for deterministic schemes. a row to execute REFprev and manage the counter table. Moreover, depending on the type of remedy, the requirements Executing a REFprev during the RFM is a prominent op- for the algorithm are different. tion. However, prior ARR-based schemes with the predefined ARR: Once ARR is applied to the target victim, the activation threshold and reactive issue, which are also based on the Realcnt of the aggressor becomes zero. Thus, it is necessary REFprev, are extremely inefficient with the RFM interface. to readjust the counter structure (i.e., Estcnt) to minimize It is because an RFM command cannot be issued in a bursty the error between Estcnt and Realcnt. TWiCe [31], which way as ARR can. exploited the Lossy-counting algorithm [32] in retrospect, Prior ARR-based RH-protection schemes issued the ARR utilizes the Det bound-high of the algorithm. Estcnt is command whenever Estcnt reaches a certain predefined guaranteed to be lower than Realcnt; thus, it can be safely threshold specified by the target RHTH and type of the reset to zero following Realcnt (Table II). Graphene [39], scheme. If we pursue the same approach to the RFM interface, which uses the modified Misra-Gries algorithm [34], exploits we could buffer the rows’ addresses that reach a predefined a property derived from the Det bound-low. When a certain threshold. Then we wait for the following RFM command Estcnt reaches the m value, Misra-Gries guarantees that its to execute the REFprev for the buffered victim row. If we address does not get swapped out [39]. Using this property, assume an ARR-based scheme with the rblast of one, the Graphene does not readjust Estcnt after ARR. However, it safe RHTH level is 3K when the predefined threshold is issues ARR at every multiple of the predefined threshold that 1.5K. However, the same scheme with the RFM interface of Estcnt reaches. RFMTH 64 may have a severely degraded safe RHTH level Throttling: Unlike ARR, throttling does not require readjust- of 19K. When 300 different rows reach the activation count of ment of Estcnt since the normal ACT command only alters 1.5K simultaneously in a tREFW time window, the last row Realcnt. Therefore, only the Det bound-low is required for has to wait for over 19K (300×64) ACTs until it gets its turn. the algorithm. Thus, not only the “Det/Det” algorithms of In this work, we propose to use a greedy selection of a target Table II are possible to use, but the algorithm of “Det/P rob” row at every RFM command for the RFM-based scheme. To is available. In fact, BlockHammer [50] used the Count-min avoid a concentration of victim rows that needs a REFprev, we Sketch [10] algorithm (or its equivalent counting bloom filter) should preemptively refresh the rows with the highest Estcnt. as a means of counter management. Based on this action, the streaming algorithm for the counter Count-min Sketch does not require a specific counter table management has two requirements; 1) Det bound-low and size for a target RHTH unlike the Lossy-counting or CbS 2) Det bound-high. Det bound-low is again necessary for algorithm; it rather can be chosen empirically. However, the deterministic RH-protection guarantee. Det bound-high reducing the table size too much could result in a high rate of is newly required for the proper readjustment of Estcnt.A false-positive. Moreover, the algorithm is prone to adversarial similar method of Estcnt adjustment as TWiCe can be applied patterns, given that it is based on the hash function. This can be if the lower bound of the decremented Realcnt is known. more detrimental to the throttling-based RH-protection scheme Out of two possible candidates, we choose the CbS algorithm because falsely throttling a benign or DRAM row can for Mithril due to its smaller area overhead compared to the lead to performance degradation. Lossy-Counting algorithm (Figure 4 in Section IV-C).

5 D. Compatibility of the Grouped Counter Approach preemptive refresh create an upper bound in the rate of Estcnt So far, we have overlooked another method of efficient increment. The first point is easily proved from the algorithm counter management, the grouped counter approach. However, property. The second point, on the other hand, requires rigor- the prior works that had augmented the methodology are not ous proof based on mathematical induction. In the case of the compatible with or efficient at the RFM interface. CBT [43] is prior ARR-based schemes, proving the RH-protection from the representative scheme of this kind. First, it cannot utilize the utilized algorithm (second point) was trivial; they refresh the RFM opportunities during its tree construction phase. a row whenever it reaches a predefined threshold, directly Suppose it chooses to prematurely refresh a group that is not related to the target RHTH . The throttling-based scheme, fully split. In that case, it will have to refresh too many rows BlockHammer, relies on rigorous case categorization and the too conservatively. Second, even after the tree is constructed, analytical solver in proving the RH-protection. having a leaf node of a size larger than eight rows will not fit Proposition 1. Reported Estcnt is always greater than or into a single tRFM period, leading to the stacking of refresh equal to the Realcnt (Det upper-bound). loads. CAT-TWO [25], which extends CBT, may guarantee the leaf to be small (covering a single row) enough, but only at The Estcnt of any address in the CbS algorithm always the cost of a higher area overhead. satisfies the inequality (1) or (2) (Section II-F). Considering Mithril operates the same as the original CbS algorithm IV. MITHRIL at the ACT command, we only need to show whether the Mithril is the first RFM-interface-compatible RH-protection conditions hold at the RFM command. On-table rows: for scheme providing the deterministic protection guarantee. It the decremented address at RFM, inequality (1) still holds exploits a modified CbS algorithm for counter management. because the Realcnt after the RFM command is now zero Each entry in the counter table holds a DRAM row address and and the Estcnt is Min. As for the other addresses in the a counter value related to the address. Mithril also tracks the table, no variable is affected. Off-Table rows: the addresses that maximum and minimum counter entries using two pointers, are not on the table are also unaffected since such decrement MaxP tr and MinP tr, favoring the greedy selection and does not alter the Min of inequality (2). In other words, if adjustment of the Estcnt. We first only consider the rblast we had decremented Estcnt lower than the Min, it would of one. In Section V-C, we will reconsider the different rblast. have affected the boundary of other addresses, breaking the algorithm property. A. Operation Proposition 2. Within any tREFW, the increase of Estcnt for Figure 3 illustrates how Mithril manages its counter struc- any single row is bounded to M, which is a function of N MaxP tr entry ture (henceforth Mithril table) and the two pointers, and RFM . and MinP tr. For every ACT, Mithril checks if the table TH Nentry tRFC ! already tracks the activated row address. If so, the associated X RFMTH RFMTH tREFW(1 − ) M = + tREFI − 2 counter is incremented by one. When the row address misses, k Nentry tRC × RFMTH + tRFM k=1 the address of the entry pointed by MinP tr is replaced with the requesting row address, and its counter is incremented by The detailed proof of Proposition 2 is provided in Appendix one. MaxP tr and MinP tr are updated at each step to point (Section IX). Intuitively, Mithril can limit the maximum to the correct maximum and minimum. Thus far, the operation amount of increase in accumulated ACT count on a single is identical to the original CbS algorithm. row through greedy selection. If a row is activated too often Once in every RFMTH ACT, the MC issues an RFM for a short period, it will soon become the target of the greedy command. At this point, Mithril selects the entry pointed selection and its victims will get refreshed. If a row is activated by MaxP tr (greedy selection). It preemptively performs the too rarely, the increment rate of the activation count will be REFprev for the two victim rows associated with this entry, low, although it can avoid being the target of REFprev. The identified as a prime candidate of aggressor rows. Then, the formula of M can also be interpreted as the worst-case ACT counter value is decremented to the table’s minimum value pattern with the initial condition. pointed by MinP tr. MaxP tr and MinP tr are updated Combining Proposition 1 and 2, the bound M of Estcnt is correspondingly. also always greater than or equal to the increase in Realcnt. Thus, with a properly configured Nentry and RFMTH value B. Brief Overview of the Proof that satisfies the M < RHTH /2, we can prove the determin- Theorem. Mithril guarantees that the maximum increase of istic RH-protection. ACT Realcnt for any single row is smaller than RHTH /2 C. Configuring Nentry and RFMTH during any given tREFW. There are multiple possible Mithril configurations for a Proving the above theorem guarantees the deterministic RH- single target RHTH , because both Nentry and RFMTH can protection at rblast of one. To prove the theorem, we need change to satisfy M < RHTH /2. Figure 4 plots (Nentry, to prove that 1) the modified CbS algorithm with the reset RFMTH ) pairs that satisfy this condition for varying RHTH still holds and that 2) the continuous greedy selection and (e.g., 1.5K, 3.125K, ..., 50K). It first shows a trade-off between

6 row_addr ������ row_addr ������ row_addr ������ row_addr ������ 0xA0 9 0xA0 10 ������ 0xA0 10 ������ 0xA0 2 0xB0 9 ������ 0xB0 9 0xB0 9 0xB0 9 ������ 0xC0 3 0xC0 3 0xC0 3 0xC0 3 0xD0 1 ������ 0xD0 1 ������ 0xE0 2 ������ 0xE0 2 ������

ACT 0xA0 ACT 0xE0 RFM Fig. 3. A sequence of ACT and RFM commands and the corresponding update on the Mithril table.

TABLE III structures must be equipped for every bank at every DRAM NOTATION USED IN THEOREMAND PROPOSITION chip. At every ACT command, the address CAM is checked Symbol Description by the activated row address. Then, the count CAM, MaxP tr, M Maximum increase in an aggressor row’s Estcnt and MinP tr are updated if needed. At every RFM command, within tREFW. the count CAM is queried by the MaxP tr pointed index and Nentry The number of entries in the Mithril table. RFMTH ACT threshold that triggers the RFM command. performs decrement. Then, during the tRFM time window, a new maximum value is found and updated for the MaxP tr. 6 1.56K Wrapping Counter: The absolute counter value of the Mithril 3.125K table can increase unbounded during its run-time, which 6.25K 4 12.5K complicates the hardware implementation. Prior works solved 25K 50K this issue by periodically resetting the whole table [31], [39] or 25K Lossy-Counting using a duplicate counter table in an interleaving fashion [50]; 2 50K Lossy-Counting both are bearing a two-fold degradation of predefined threshold Table size(KB) Table level (from RHTH /2 to RHTH /4) and area, respectively. How- ever, Mithril can avoid this. Unlike the prior works, Mithril 0 16 116 216 316 416 does not require the absolute value of Estcnt. Instead, we ��� �� require the relative difference of Estcnt against the minimum Fig. 4. Each line denotes the possible configuration of Nentry and RFM TH Estcnt on the Mithril table. Moreover, due to the operational that can protect against the given RHTH value. RFM-based scheme built with Lossy-Counting algorithm is also denoted in dotted lines. behavior of Mithril, the maximum difference between the MaxP tr and MinP tr counter values is always bounded. Nentry and RFMTH regardless of RHTH . Decreased Nentry Therefore, we adopt a wrapping counter for the Mithril table implies less area usage but results in the lower RFMTH , implementation. If we provision enough that can express implying more performance and energy overhead due to more a higher value than the maximum difference in the table, the frequent issues of RFM commands. The trade-off exists for all wrapping counter can always correctly identify the relative RHTH , but the curve looks different across the various RHTH size relationship among the Mithril table entries. Through this values. A scheme similar to Mithril but based on the Lossy- implementation, we acquire a two-fold benefit. counting algorithm is also notated at 50K and 25K RHTH values, which clearly demonstrates a greater table size. V. MITHRIL OPTIMIZATIONS When RHTH is sufficiently high (e.g., larger than 12.5K), A. Adaptive Refresh it is possible to set RFMTH to be around 256 at a relatively In Section IV, it was assumed that Mithril performs a small Nentry. Then, Mithril can achieve RH-protection with REFprev for every RFM command. However, the other option a relatively low area, performance, and energy overhead. is to selectively perform a REFprev depending on the current In contrast, when RHTH is low, maintaining the low per- state. Specifically, we propose to perform an additional refresh formance/energy overhead (i.e., sufficiently large RFMTH ) only when the difference between the MaxP tr and the requires a substantially larger Nentry. Overall, this is a trade- MinP tr count values exceeds a certain threshold (AdTH ). off that a DRAM vendor needs to consider when determining We call this an adaptive refresh policy. Nentry. The target RHTH level can be adjusted by tweaking Intuitively, this effectively avoids performing the additional the RFMTH value even if Nentry is fixed. This flexibility refresh when the memory access exhibits a pattern where can be handy when the scheme has to be built based on the no specific row is accessed much more frequently than the predicted RHTH level and thus fixed area, as it can avoid others in a short time window. If AdTH is chosen large excessive performance/energy overhead. enough, Mithril with the adaptive refresh policy can effec- tively filter out the ACT patterns observed by the normal D. Hardware Implementation workloads. By doing so, we can minimize the unnecessary Mithril Hardware Overview: The Mithril table comprises energy overhead caused by false-alarmed refreshes. Figure 5 two CAM structures, each storing the row address and the shows the effectiveness of the adaptive refresh policy, almost activation count. A MaxP tr and MinP tr are also employed eliminating additional energy overhead with benign workloads as the index pointing registers. The Mithril logic and CAM (see Section VI for the details of the experimental setup).

7 120 30 TABLE IV 100 (%) ARCHITECTURAL PARAMETERS FOR SIMULATION 80 20 60 !"#$%

40 10 � Core Configurations (16 cores) 20 Core 3.6 GHz 4-way OOO cores 0 0

0 0 0 0 LLC 16 MB 50 50 50 50 Relativeenergy (%) ����

100 150 200 100 150 200 100 150 200 100 150 200 Memory System Configurations Additional (����, �����) (3.125K,Multi-programmed 16) (6.25K, 64) (3.125K,Multi-threaded 16) (6.25K, 64) Module DDR5-4800 Multi-programmed Multi-threaded Channel 2 channels Configuration 1 rank; 32 banks per rank Fig. 5. Relative dynamic energy overhead against baseline with no RH- Scheduling BLISS [47] protection and relative number of Nentry regarding 4 different AdTH levels. -Policy Minimalist-open [26] tRFC, tRC, tRFM 295 ns, 48.64 ns, 97.28 ns 150 tRCD, tRP, tCL 16.64 ns 100 B. Mithril+ 50 The adaptive refresh policy allows Mithril to skip REF 0 prev Accessed row Accessed 0 50 100 150 200 250 300 350 400 450 even when the RFM command is issued by the memory (a) Access pattern (large time window) 15 controller. By doing so, Mithril can save energy but not the 10 performance overhead. Whether a DRAM actually performs 5 refreshes or not, the MC will continue to issue RFM com- mands at every RFMTH ACTs. Accessed row Accessed

AccessedRow 0 (b) Access pattern (small time window) Inspired by such limitation, we propose an optional, more 15 invasive extension of Mithril, named Mithril+, which pre- 10 vents the MC from issuing the unnecessary RFM command. 5 Mithril+ utilizes the mode register in the DRAM device, which 0 Activated rowActivated ActivatedRow is flagged when the difference between MaxP tr and MinP tr 15 20 25 30 35 40 45 time (μs) is smaller than AdTH . At every RFMTH , MC reads the (c) Activate pattern (small time window) flag using the JEDEC-standard MRR (Mode Register Read) command, deciding whether or not to issue the RFM com- Fig. 6. Exemplar large object sweep pattern of lbm in SPEC CPU2017. (a) shows the in large time window. (b) is magnified to mand. With this interface, Mithril+ can substantially minimize the small window. (c) shows the activation pattern in the small window. the performance overhead at the common case of ordinary workloads at the expense of modification to the RFM interface.

Among the multiple AdTH values, we can identify that C. Non-adjacent Row Hammer the adaptive refresh policy is effective at the range of 100 Mithril can follow similar approaches of prior works [39], to 200 in all cases. We seek the root cause in the cross-play [50] in handling the non-adjacent RH, by adjusting the M of memory access patterns of the ordinary workloads and the value and number of rows to execute REFprev. When rblast DRAM row size. Multithreaded or memory-intensive work- is one (double-sided attack), M smaller than RHTH /2 is safe. loads often exhibit large-object-sweep behavior that results in However, when rblast is larger, M has to be smaller than main-memory accesses (Figure 6(a)). In such a case, memory RHTH /(aggregated RH effect) regarding the non-adjacent access concentrates on a small number of rows in a short time aggressors. In the case of a rblast of 3, the aggregated RH period (Figure 6(b)) while being rather evenly distributed to effect is 3.5 [50], with six victim rows to execute REFprev. the entire footprint in total. Although such an access pattern may possess high DRAM row locality, inter-process/thread VI.EVALUATION conflict may cause a high rate of ACT per memory access We evaluate the performance, energy, and area overhead (Figure 6(c)). Here, the number of concentrated ACTs would of Mithril and Mithril+, in comparison with the probabilistic be similar to the number of streaming RDs/WRs, which would RFM-based scheme (henceforth PARFM), as well as prior be 128 for 8KB DRAM row and 64B line size. This works based on either ARR or throttling. matches the value range of the effective adaptive threshold range, although the exact value must be decided empirically. A. Experimental Setup The adaptive refresh policy causes a slight deterioration on Methodology: Performance overhead is evaluated based on the bound M, thus inducing a higher area or performance McSimA+ [3]. Table IV summarizes the experimental setup. overhead to reach the same effect as the baseline. However, We use normalized IPC throughput as the performance metric. such an effect is minimal unless AdTH is very high. Figure 5 We count the number of ACT, PRE, and executed REFprev to shows a small increase in Nentry, maximum 12% only at calculate the dynamic energy consumption. We first synthesize very low RHTH . The mathematical proof of the adjusted Mithril hardware module RTL implementation using TSMC 40 table size can be derived from Proposition 2 and conducted in nm standard cell library with the Synopsys design compiler. Section IX-B. The area overhead is scaled down to DRAM 20 nm, then again

8 Mithril Mithril+ PARFM scaled up 10× [13] to conservatively take the inferior DRAM 102 process into account. Mithril hardware energy consumption is (%) 100 also derived from the synthesis. 98 96

Workloads: We use 1) ordinary, 2) double-sided RH, 3) multi- Relative 94 sided RH, 4) CBT energy adversarial, and 5) BlockHammer ����� 784 256 128 64 32 128 64 32 32 16 performance performance adversarial workloads for evaluation. We use both ���� 50k 6.25k 3.125k 1.5k (a) Performance overhead 1.5 multi-programmed and multi-threaded workloads for ordinary 3.12 6.25 12.5 workloads. From SPEC CPU2017, we extract 100M instruc- (%) 1.0 tion traces [45] and render two different workloads of mix- 0.5 high and mix-blend, each with 16 traces of memory-intensive Energy overhead 0.0 784 256 128 64 32 128 64 32 32 16 and randomly selected workloads, respectively. We execute ����� 50k 6.25k 3.125k 1.5k 400M instructions in total. We also evaluate 3 different multi- ���� (b) Energy overhead threaded benchmarks (FFT and RADIX from SPLASH-2 [40], PageRank from GAP [7]). Fig. 7. The relative performance and energy overhead of Mithril, Mithril+, and PARFM. (a) 100% denotes the baseline IPC throughput without the RFM We configure the multi-sided RH attack that targets multiple interface. (b) the additional dynamic energy overhead of the REFprev and victims [14], [15]. We model the CBT energy adversarial the module. pattern as the randomized ACT pattern over the whole DRAM address space; it effectively uniformly constructs the CBT tree, 700 (as opposed to 109 ACTs in original system with more making the leaf group large. The BlockHammer performance banks per thread [50]), especially for the memory-intensive adversarial pattern is configured to blacklist specific profiled workloads. Because NBL must be lower than RHTH /2 (750 rows that share the CBF entry with the benign threads. Each is for RHTH 1.5K), it is difficult to set an appropriate NBL activated just enough to reach the blacklist threshold. This may value that distinguishes the benign accesses against aggressor effectively cause the throttling of benign workloads, especially accesses and fulfill the RH-protection at RHTH 1.5K, while for memory-intensive ones. Each RH attack or adversarial also incurring minimal performance overhead. pattern runs simultaneously with 15 other benign workloads. B. RFM-based Schemes Configurations: We select up-to three different Mithril and Figure 7(a) shows the performance and energy overhead N RFM RH Mithril+ ( entry, TH ) configurations for each TH , of the RFM-based schemes. Mithril+ shows near-zero perfor- RH ranging from 50K to 1.5K. At high TH values of 50K and mance overhead in all RH levels. Mithril shows various RFM N TH 25K, the TH is fixed to 256 given that entry is already performance degradation, depending on the RH level and RH RFM TH low. At the lowest TH of 1.5K, the TH is fixed to the configuration types. At every RH level, Mithril with RFM TH 32 because a higher TH value results in an overly high multiple configurations shows better or at least competitive N Ad entry. We use 200 for TH as default. PARFM, similar to performance overhead when compared with PARFM. PARA, samples a single row that is activated in a RFMTH Figure 7(b) shows dynamic energy overhead, induced by interval and executes REFprev. A new mathematical equation the executed REFprev. Both Mithril and Mithril+ show low for the probability of failure is conducted in Section IX-C. For overhead at all RHTH levels because the properly configured each target RHTH , the RFMTH value is fixed for PARFM −15 AdTH value allows them to skip the majority of RFM com- to satisfy the failure probability of 10 for 64 banks within mands in common cases. PARFM incurs much higher energy a 32ms time period (tREFW). The probability degrades if the overhead as it executes REFprev at every RFM command. number of banks to support increases. We also evaluate the prior ARR and throttling-based C. Comparison with the Prior Works schemes. We configure TWiCe and Graphene using the equa- Figure 8 shows the performance and the energy overhead tions provided in each work. We configure PARA to satisfy of various schemes on multiple workloads at RHTH ranging the same failure probability of 10−15. CBT is configured to from 50K to 1.5K. First, on the ordinary workloads, all follow the configurations in the original work [43]. the evaluated schemes show negligible performance overhead 2 We reconfigure the BlockHammer to match our simulation except at an extremely pessimistic RHTH level of 1.5K environment and our target RHTH values. For (CBF Size, (Figure 8(a)) for Mithril and PARA. Second, as for the double- NBL) pairs, we used (1K, 17.1K), (1K, 8.6K), (1K, 4.3K), sided and multi-sided RH attack, a similar tendency is ob- (2K, 2.1K), (4K, 1.1K), and (8K, 0.49K) for the RHTH from served in the aggregate IPC of the benign workloads, whereas 50K to 1.5K. Under our system of 4 banks for a single the BlockHammer exhibits better throughput in most cases thread, the number of ACTs per row easily reaches over (Figure 8(b,c)). It is because the attacking thread is throttled by BlockHammer, favoring the benign threads. Notice that 2BlockHammer uses a pair of interleaved counting bloom filters (CBFs) similar to the Count-min Sketch algorithm. Each CBF is reset at every the RH attack is adversarial to Mithril, Mithril+, TWiCe, and CBF lifetime (tCBF ), which typically matches tREFW. There exists a Graphene, where Mithril executes more REFprev (energy), certain blacklist threshold (NBL) of ACT, which triggers the delay on a Mithril+ issues the RFM commands more frequently (perfor- certain row when surpassed. Delay time (t ) is calculated as (t − Delay CBF mance), and TWiCe and Graphene issue the ARR command NBL×tRC)/(RHTH − NBL). Thread-level scheduling support is built on top of these to throttle the aggressor thread itself. more frequently (performance).

9 50k 25k 12.5k 6.25k 3.125k 1.5k

PARA CBT TWiCe Graphene BlockHammer Mithril Mithril+ TABLE V 106 PER BANK TABLE SIZE COMPARISON (KB) (%) 104 102 RH-protection 100 Location 50K 25K 12.5K 6.25K 3.125K 1.5K Scheme Relative 98 CBT 0.47 0.97 2 4.12 8.5 17.5 89.0 94.4 94.9 performance 96 MC Graphene 0.14 0.21 0.51 0.99 1.92 3.7 �� !" 50k 25k 1.5k 50k 25k 1.5k 50k 25k 1.5k BlockHammer 3.75 3.5 3.25 6 11 20 12.5k 6.25k 12.5k 6.25k 12.5k 6.25k 3.125k 3.125k 3.125k Buffer Chip TWiCe 2.79 5.08 9.54 18.27 35.29 71.26 (a) Ordinary (b) Double-sided RH (c) Multi-sided RH A 0.08 0.17 0.41 0.84 3.76 - 2.5 105 DRAM Mithril B - - 0.3 0.68 1.78 - C - - 0.27 0.57 1.38 4.85 2 100 (%)

(%) * Three Mithril configurations (A,B,C) have different RF MTH , from 256 to 32. 1.5 95

1 90 Energy

0.5 Relative 85 , allowing adversaries to compromise the overhead

0 performance 80 confidentiality and integrity and confidentiality of real systems. ��!" 50k 25k 12.5k 6.25k 3.125k 1.5k ��!" 50k 25k 12.5k 6.25k 3.125k 1.5k (d) CBT energy adversarial (e) BH performance adversarial In 2015, [42] demonstrated that a user-level program could breach the system-level security of a typical PC by Fig. 8. (a) Relative performance at ordinary workloads. (b),(c) Relative exploiting Row Hammer vulnerability. A number of successful performance of benign threads at the double-side and multi-sided RH-attack. (d) Relative dynamic energy at the CBT-adversarial pattern. (e) Relative attacks have followed [42], including the ones compromising performance of benign threads at the BlockHammer-adversarial pattern. mobile devices [48], [49] and servers [12], [18], [41] that break the authentication process and damage the entire system, Third, under the CBT-energy-adversarial pattern, both CBT even in the case that a system protects memory locations and PARA show high dynamic energy overhead, especially near sensitive data [53]. Because Row Hammer undermines at low RHTH (Figure 8(d)). PARA shows high energy the fundamental principle of memory isolation, it has been overhead, regardless of the ACT pattern. CBT shows high regarded as a real threat drawing mitigation proposals from energy overhead in this case because the largely built leaves software, architecture, and hardware levels. covering several DRAM rows frequently reach the predefined Architectural Proposals to Mitigate Row Hammer: There threshold, resulting in a high number of REFprev. Fourth, un- have been deterministic [25], [31], [39], [44], [50] and prob- der the BlockHammer-performance-adversarial pattern, Block- abilistic [29], [46], [52] schemes proposed to mitigate Row Hammer shows lower IPC throughput than usual (Figure 8(e)). Hammer attacks in the architecture level. Among these, [44], It is because multiple rows accessed by the benign memory- [46], [52] are susceptible to adversarial DRAM access pat- intensive thread are blacklisted and thus throttled. terns. TWiCe [31] and CAT-TWO [25] are relatively free from We report the counter table size of each scheme in KB- this susceptibility but require an order of magnitude more per-bank (Table V). While the MC-side scheme benefits from storage to track aggressor rows compared to Graphene [39]. exploiting faster , abundant wiring resources, and a PARA [29] incurs low performance and energy overhead, relaxed area budget, the number of total banks is much higher whereas it is also extremely area-efficient as it does not require (2048), and the target RHTH value has to be pessimistic. The counters to trace aggressor rows. Yet, the protection is prob- DRAM-side implementation can benefit from fewer banks (32) abilistic in nature; even if the probability is quite small, there to support per device and more accurate RHTH value, but is a non-zero probability that a victim row is not refreshed is hindered by slower transistors and tighter area and wiring after reaching its Row Hammer threshold. BlockHammer [50] budget. Mithril shows lower or competitive area overhead in uses a throttling approach backed up with a thread-level MC 2 terms of KB per bank, reaching 0.024mm at 6.25K RHTH . scheduling. None of the proposals are compatible with RFM, It is 1% of a single DDR5 chip [27] when multiplied 32× to the topic of this paper, either. cover the 32 banks per chip. All the schemes suffer from a high area or performance VIII.CONCLUSION overhead at the RHTH level of 1.5K. However, it is possible to We have proposed Mithril, a DRAM-side, RFM-compatible, use the throttling-based MC-side scheme (e.g., BlockHammer) energy-efficient scheme that provides deterministic safety in cooperation with Mithril to avoid scalability issues. For against Row Hammer attacks. First, we show that the conven- example, if the throttling-based scheme guarantees the RHTH tional algorithms and methodology used in the previous archi- of 32K with low overhead, the maximum number of ACTs in a tectural RH-prevention schemes are not compatible with the tREFW time window is reduced from 600K (tREF W/tRC) RFM command introduced in the latest DRAM specifications, to 32K. Because the second term of the equation for M in such as DDR5 and LPDDR5. By mathematical defining the Proposition 2 is dominated by this, M can effectively decrease maximum bound of activation count without being refreshed in 18×, and Mithril can easily defend against a low RHTH level a tREFW time window, we guarantee the safety on the specific with only a minimal area or performance overhead. RHTH value. The devised adaptive refresh policy can decrease the energy overhead by exploiting the row activation pat- VII.RELATED WORK terns of ordinary workloads. Moreover, we proposed Mithril+ Row Hammer on Real Systems: As a hardware vulnerability, that requires slight modification to the memory controller Row Hammer has been shown to bypass all the system operation. It utilizes the existing DRAM command to skip

10 k k 3 P P Identity 3. i=1cj[i] ≤ i=1cj−1[i]+RFMTH for 1≤k ≤N � 1 ��� � � 1 � 1 1 Proof: This is an obvious extension from Identity 2 because � 2 � 2 � 2 � 2 Pk Pk 0 1 c [i] ≤ c [i] . . . . i=1 j i=1 j−1 . In other words, an RFM refresh . . . . 1 : Identity 1 always decreases the sum of top k counter values in the table. � � − 1 � � − 1 � � − 1 � � − 1 2 : Identity 2 1 � � � � � � � � Pk−1 k−1 Pk . . . 2 . 3 : Identity 3 Identity 4. i=1 cj[i] ≤ k ( i=1 cj−1[i] + RFMTH ) for . . Refreshed . . by RFM � � � � �� � � � 2 ≤ k ≤ N Proof: Using Identity 1, Identity 2, and the fact that c0 [1] ≥ ���!" ACTs ���!" ACTs j−1 0 (j-1)-th RFM interval j-th RFM interval cj−1[i] for all i, the following holds true: k−1 k k k ! Fig. 9. Exemplar case illustrating how the Est is updated between two X X k − 1 X k − 1 X cnt c [i] = c0 [i] ≤ c0 [i] ≤ c [i] + RFM consecutive RFM intervals. j j−1 k j−1 k j−1 TH i=1 i=2 i=1 i=1 sending RFM commands, which can significantly reduce the With these identities, we are ready to prove Proposition 2. performance overhead of the RFM interface. Our evaluation Proving Proposition 2 is equivalent to proving the following: demonstrates that Mithril achieves a significantly low energy c0 [1] − c [N] ≤ M for given c [1], ..., c [N] overhead in all cases compared to the PARA tailored to support W 1 1 1 RFM, whereas suffering from a higher performance overhead. This is because any row’s Estcnt that is increased during the Mithril+ shows not only a low energy overhead but also a W RFM intervals is obviously less than difference between 0 significantly lower performance overhead, to be comparable the largest Estcnt at the end (cW [1]) and the smallest Estcnt to Graphene, a state-of-the-art RH-prevention scheme which at the beginning (c1[N]). Then, we can obtain the upper bound 0 is not supporting RFM. for cW [1] through the following: 0 cW [1] ≤ cW [1] + RFMTH (∵ Identity 2) IX.APPENDIX 2 ! 1 X ≤ c [i] + RFM + RFM ( Identity 4) A. Proof for Proposition 2 2 W −1 TH TH ∵ i=1 3 ! 2 Proposition 2. Within any tREFW, the increase of Estcnt for 1 X X RFMTH ≤ c [i] + RFM + ( Identity 4) any single row is bounded to M, which is a function of N 3 W −2 TH k ∵ entry i=1 k=1 and RFM . TH Repeatedly applying Identity 4 for a total of N times, we get Nentry tRFC ! the following inequality: X RFMTH RFMTH tREFW(1 − tREFI ) M = + − 2 N ! N−1 k Nentry tRC × RFMTH + tRFM 1 X X RFMTH k=1 c0 [1] ≤ c [i] + RFM + W N W −N+1 TH k i=1 k=1 Henceforth, Nentry is replaced by N. Also, W represents N ! N 1 X X RFMTH the maximum number of RFM intervals (the period between = c [i] + N W −N+1 k two consecutive RFM commands) between consecutive auto- i=1 k=1 refreshes and is computed as follows: At this point, we cannot apply Identity 4 anymore, but instead W = d(tREFW − (tREFW/tREFI) × tRF C)/(tRC × RFMTH + tRFM)e apply Identity 3 (k = N) for W − N times. Suppose that cj[i] is the i-th largest Estcnt in the Mithril table PN N c1[i] (W − N)RFMTH X RFMTH c0 [1] ≤ i=1 + + at the beginning of j-th RFM interval (1 ≤ i ≤ N, 1 ≤ j ≤ W N N k W ). c0 [i] is the i-th largest Est in the table at the end of k=1 j cnt PN i=1 c1[i] N − 2 the j-th RFM interval. Figure 9 illustrates the exemplar case = + M − (RFMTH − 1) for these notations. Then, the following identities hold true. N N Earlier, we showed that proving Proposition 2 is equivalent to 0 0 Identity 1. cj[i] = cj+1[i − 1] proving cW [1]−c1[N] ≤ M. With the above equation, proving Proof: At the end of each RFM interval, an entry with the the following is the only step left to prove Proposition 2: largest Est (i.e., c0 [1]) becomes the target for the RFM PN cnt j i=1 c1[i] N − 2 − c1[N] ≤ RFMTH refresh, and its Estcnt is reset to the minimum count in the N N table. Thus, the ranks of all the other entries promote by one Here, the left hand side can be represented as follows. after the RFM refresh. PN c [i] PN c [i] − Nc [N] PN (c [i] − c [N]) i=1 1 − c [N] = i=1 1 1 = i=1 1 1 Pk 0 Pk N 1 N N Identity 2. i=1 cj[i] ≤ i=1 cj[i]+RFMTH for 1≤k ≤N RFM PN Proof: Using the fact that there are TH ACTs within The upper bound of (cj [i]−cj [N]) for any j-th RFM inter- 0 i=1 each RFM interval and c [i] is larger than cj[i] for all i by PN j val can be obtained by contradiction. Assume that i=1(cj [i]− definition, the following holds true. cj [N]) is maximum when j is m, and the difference between k N N X 0 X X 0 cm[1] and cm[N] is greater than RFMTH . Then cj[i] = cj[i] − cj[i] + RFMTH 0 0 0 0 i=1 i=1 i=k+1 cm−1[1] − cm−1[N] ≥ cm−1[2] − cm−1[N] k N k (3) X X 0 X = cm[1] − cm[N] > RF MTH ≤ cj[i] − cj[i] − cj[i] + RFMTH ≤ cj[i] + RFMTH i=1 i=k+1 i=1

11 At the end of the (m-1)-th RFM interval, c0 [1] is reduced N N m−1 X X 0 to c0 [N] by RFM. Therefore cj[i] = cj−1[i] m−1 i=1 i=1 k N N N X 0 X 0 X X 0 0 0 0 = c [i] + c [i] (cm[i] − cm[N]) = (cm−1[i] − cm−1[N]) − (cm−1[1] − cm−1[N]) j−1 j−1 i=1 i=1 i=1 i=k+1 N k X 0 0 X = (cm−1[i] − cm−1[N]) + RFMTH − (cm−1[1] − cm−1[N]) 0 0 ≥ cj−1[i] + (N − k)cj−1[N] i=1 N i=1 X < (c [i] − c [N]) ( (3)) k m−1 m−1 ∵ X 0 0 i=1 ≥ cj−1[i] + (N − k)(cj−1[1] − AdTH ) i=1 PN k k ! This contradicts that (cj [i]−cj [N]) is maximum when j is i=1 X 0 1 X 0 PN ≥ cj−1[i] + (N − k) cj−1[i] − AdTH m. Therefore, if i=1(cj [i] − cj [N]) is maximum when j is m, k i=1 i=1 the difference between cm[1] and cm[N] is less than or equal k N X 0 to RFM . Then, we get the following inequality: ≥ c [i] − (N − k)AdTH TH k j−1 i=1 PN PN k k i=1(c1[i] − c1[N]) i=1(cm[i] − cm[N]) ≤ X X 0 N N cj[i] = cj[i] PN−2 i=1 i=1 i=1 (cm[i] − cm[N]) = (∵ cm[N − 1] = cm[N]) N ! N k X 0 PN−2 ≤ cj−1[i] + (N − k)AdTH (cm[1] − cm[N]) N ≤ i=1 i=1 N N ! k X (N − 2)RFMTH ≤ c [i] + RFM + (N − k)Ad ≤ N j−1 TH TH N i=1

B. Finding New M for Adaptive Refresh Similar to Proposition 2, proving Proposition 3 is equivalent 0 0 to proving cW [1] − c1[N] ≤ M . For some arbitrary number If the adaptive refresh policy (Section V-A) is applied to n smaller than W , assume that REFprev does not occur at Mithril, Identity 1 and 4 in Section IX-A no longer hold the n-th last RFM interval (i.e., the (W − n)-th RFM interval) because REFprev may not occur at the end of the RFM and that REFprev occurs at all the subsequent RFM intervals. interval. The modified M (henceforth M 0) for adaptive Even with the adaptive refresh, Identity 2 and 3 are still true. 0 refresh is as follows. If n is greater than N, the upper bound of cW [1] − c1[N] is equivalent to M (the result of Proposition 2). We can obtain Proposition 3. When the adaptive refresh is applied to Mithril, this result by applying Identity 5 (equivalent to Identity 4) for 0 the increase of Estcnt for any single row within any tREFW N times and applying Identity 3 for (W − N) times to cW [1]. 0 is bounded to M , which is a function of Nentry, RFMTH , Otherwise, if n is less than or equal to N, we first and AdTH . repeatedly apply Identity 5 for a total of (n − 1) times to 0 n ∗ ∗ obtain the upper bound of c [1]: X RFMTH (W − n + Nentry − 2)RFMTH + (Nentry − n )AdTH W M 0 = + k Nentry 0 k=1 cW [1] ≤ cW [1] + RFMTH (∵ Identity 2) ∗ W = d(tREFW − (tREFW/tREFI) × tRFC)/(tRC × RFMTH + tRFM)e 2 ! 1 X ∗ n∗ = d(N × RFM )/(RFM + Ad )e ≤ c [i] + RFM + RFM ( Identity 5) entry TH TH TH 2 W −1 TH TH ∵ i=1 3 ! 2 At the end of any j-th RFM interval, REF does not 1 X X RFMTH prev ≤ cW −2[i] + RFMTH + ( Identity 5) 0 0 3 k ∵ occur if the difference between cj[1] and cj[N] is less than i=1 k=1 . AdTH . Considering that REFprev may not occur, we modify . Identity 4 to Identity 5 and 6 as follows: n ! n−1 1 X X RFMTH ≤ cW −n+1[i] + RFMTH + 0 0 n k Identity 5. If cj−1[1] − cj−1[N] > AdTH , then i=1 k=1 k k+1 n ! n P k P 1 X X RFMTH i=1 cj[i] ≤ k+1 ( i=1 cj−1[i] + RFMTH ) for 1 ≤ k ≤ = c [i] + n W −n+1 k N − 1 i=1 k=1 0 0 Proof: If cj−1[1] − cj−1[N] > AdTH , REFprev occurs at At this point, we have to apply Identity 6. the end of the (j-1)-th RFM interval, so we can derive the N ! n 1 X X RFMTH same result as Identity 4. c0 [1] ≤ c [i] + RFM + (N − n)Ad + W N W −n TH TH k i=1 k=1 0 0 Identity 6. If cj−1[1] − cj−1[N] ≤ AdTH , then Pk k PN Then, we apply Identity 3 (k = N) for (W − n) times. i=1 cj[i] ≤ N ( i=1 cj−1[i] + RFMTH + (N − k)AdTH ) for 1 ≤ k ≤ N − 1 N ! n 1 X X RFMTH c0 [1] ≤ c [i] + (W − n)RFM + (N − n)Ad + Proof: Because RFM does not occur at the j-th RFM interval, W N 1 TH TH k 0 i=1 k=1 cj[i] = cj−1[i] for 1 ≤ i ≤ N. Then, the following holds true.

12 PN The maximum value of i=1(cj [i]−cj [N]) is equivalent to that does not contribute to reaching the RHTH ACTs), activating in the proof of Proposition 2. Then, the following holds true. a row only a single time for every RFMTH is the most N ! 1 X cost-effective pattern. We thus base our further formulation c0 [1] − c0 [N] ≤ (c [i] − c [N]) + (W − n)RFM + (N − n)Ad W 1 N 1 1 TH TH i=1 on that the RFMTH number of different rows are activated n X RFMTH + once every RFMTH period. k k=1 Compared to PARA, the mathematical formula also must 1 ≤ ((N − 2)RFM + (W − n)RFM + (N − n)Ad ) N TH TH TH be different. The following formula is the accurate probability n X RFMTH + of failure for a single DRAM bank in a tREFW time win- k k=1 dow. F ail(1) denotes the failure probability where a single n 1 X RFMTH ≤ ((N − 2 + W − n)RFM + (N − n)Ad ) + (4) row fails. F ail(2) denotes the failure probability where two N TH TH k k=1 different rows fail in a tREFW window and so on. Based on Suppose that M 0[n] is the right side of the inequality (4) for Equation (5), a uniform distribution of ACTs on the activated 1 ≤ n ≤ N. Note that M 0[N] = M, which is the upper bound rows is assumed. 0 of c [1]−c1[N] when n is greater than N (which is the same W Bank failure probability: RFMTH C1F ail(1) − RFMTH C2F ail(2) as the value when the adaptive refresh is not applied). Now, we +RFM C3F ail(3) − RFM C4F ail(4) ... need to find the n value maximizing M 0[n]. For 2 ≤ n ≤ N, TH TH 0 0 the difference between M [n] and M [n − 1] is as follows: F ail(1) can be calculated using the following recurrence RFMTH RFMTH + AdTH equation where P [i] denotes the failure probability at the i-th M 0[n] − M 0[n − 1] = − n N RFM command: 1 1 RHTH /2 0 0 P [i] = P [i − 1] + (1 − ) (1 − P [i − RHTH /2 − 1]) (M [n] − M [n − 1]) is a decreasing function with respect to RFMTH RFMTH n. Thus, the largest n (i.e., n∗) satisfying M 0[n] > M 0[n − 1] ∗ The initial condition is as follows. is given by n = d(N ×RFMTH )/(RFMTH +AdTH )e, and  0 0 for 0 ≤ i ≤ RHTH − 1 it maximizes M [n]. Finally, we can prove Proposition 3 as  2 follows: P [i] =  1 RHTH /2 1 − for i = RHTH ∗  2 n ∗ ∗  RFMTH X RFMTH (W − n + N − 2)RFMTH + (N − n )AdTH c0 [1] − c0 [N] ≤ + W W k N k=1 We acquire F ail(1) by calculating the last P [i] in a tREFW = M 0 window. F ail(2) is much smaller than F ail(1) because the ∗ M 0 = M 0[n∗] RHTH value exceeds over 1K even at the most pessimistic RH C. PARFM Probability of Failure vulnerability. The probability of more than one row reaching the RH ACT value without being refreshed is much less A PARA-inspired, intuitive form of a probabilistic preven- TH likely compared to that of a single row. Therefore, we estimate tion scheme PARFM is deployable under the RFM interface. the (upper-bound) probability of failure with only the first term Whenever an RFM command arrives, PARFM randomly sam- of bank failure probability. Using the bank failure probability, ples a single aggressor row among the last RFM activa- TH we can acquire the system failure probability based on the tions and executes the REF on its victims. The probability prev number of banks that can be simultaneously attacked (N ). of being selected for a row depends on the ratio of its ACTs on banks the last RFMTH activations. PARFM’s protection capability System Failure Probability: 1 − (1 − F ail(1))Nbanks depends on RFMTH , which determines the sampling rate. Failure probability of PARFM requires two major modifi- In our experimental system of 2 ranks of 32 banks each, cations on the original method of PARA: the worst-case ACT total 22 banks can be activated satisfying the tFAW constraints. pattern and mathematical formulation. First, the number of In Section VI-A, we properly set the RFMTH value on each rows to activate depends on the given RFMTH value, while target RHTH ; therefore, our system failure probability is lower the worst-case ACT pattern of PARA was to activate a single than 10−15. row continuously. Suppose only a single row is activated under PARFM. In that case, it will always be selected at the next REFERENCES RFM command, and its victims will receive REFprev. From [1] M. T. Aga, Z. B. Aweke, and T. Austin, “When good protections go the attacker’s perspective, the cost-effectiveness in minimiz- bad: Exploiting anti-dos measures to accelerate rowhammer attacks,” in 2017 IEEE International Symposium on Hardware Oriented Security ing the PARFM selection and quickly reaching RHTH is and Trust (HOST). IEEE, 2017, pp. 8–13. expressed as follows (where j denotes the number of ACTs [2] P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, and K. Yi, “Mergeable summaries,” ACM Transactions on Database Systems for a single row in a single RFMTH activation interval): (TODS), vol. 38, no. 4, pp. 1–28, 2013.  j 1/j [3] J. Ahn, S. Li, S. O, and N. P. Jouppi, “McSimA+: A manycore simu- Cost-effectiveness: 1 − (5) lator with application-level+ simulation and detailed RFMTH modeling,” in IEEE International Symposium on Performance Analysis of Systems and Software, 2013. Because this is a monotonically decreasing function and the [4] Apple Inc., “About the security content of Mac EFI Security Update RFMTH period that the row is not activated can be ignored (it 2015-001,” https://support.apple.com/en-us/HT204934, 2015.

13 [5] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, [30] R. K. Konoth, M. Oliverio, A. Tatar, D. Andriesse, H. Bos, C. Giuffrida, and T. Austin, “ANVIL: Software-Based Protection Against Next- and K. Razavi, “ZebRAM: Comprehensive and Compatible Software Generation Rowhammer Attacks,” in Proceedings of the 21st Interna- Protection Against Rowhammer Attacks,” in 13th USENIX Symposium tional Conference on Architectural Support for Programming Languages on Operating Systems Design and Implementation, 2018. and Operating Systems, 2016. [31] E. Lee, I. Kang, S. Lee, G. E. Suh, and J. Ahn, “TWiCe: Preventing [6] R. Balasubramonian, Innovations in the Memory System. Morgan & Row-hammering by Exploiting Time Window Counters,” in Proceedings Claypool Publishers, 2019. of the 46th International Symposium on , 2019. [7] S. Beamer, K. Asanovic, and D. A. Patterson, “The GAP Benchmark [32] G. S. Manku and R. Motwani, “Approximate Frequency Counts over Suite,” ArXiv, 2015. Data Streams,” in Proceedings of the 28th International Conference on [8] I. Bhati, M. Chang, Z. Chishti, S. Lu, and B. Jacob, “DRAM Refresh Very Large Data Bases, 2002. Mechanisms, Penalties, and Trade-Offs,” IEEE Transactions on Com- [33] A. Metwally, D. Agrawal, and A. El Abbadi, “Efficient Computation of puters, 2016. Frequent and Top-k Elements in Data Streams,” in Proceedings of the [9] F. Brasser, L. Davi, D. Gens, C. Liebchen, and A.-R. Sadeghi, “CAn’T 10th International Conference on Database Theory, 2005. Touch This: Software-only Mitigation Against Rowhammer Attacks [34] J. Misra and D. Gries, “Finding repeated elements,” Science of Computer Targeting Kernel Memory,” in 26th USENIX Conference on Security Programming, vol. 2, no. 2, 1982. Symposium, 2017. [35] S. Muthukrishnan, Data streams: Algorithms and applications. Now [10] M. Charikar, K. Chen, and M. Farach-Colton, “Finding Frequent Items Publishers Inc, 2005. in Data Streams,” in Proceedings of the 29th International Colloquium [36] O. Mutlu and J. S. Kim, “RowHammer: A Retrospective,” IEEE Trans- on Automata, Languages and Programming, 2002. actions on Computer-Aided Design of Integrated Circuits and Systems, [11] L. Cojocar, J. Kim, M. Patel, L. Tsai, S. Saroiu, A. Wolman, and 2019. O. Mutlu, “Are we susceptible to rowhammer? an end-to-end methodol- [37] B. Nale and C. E. Cox, “Refresh command control for host assist of ogy for cloud providers,” in IEEE Symposium on Security and Privacy row hammer mitigation,” Jul. 25 2019, uS Patent App. 16/370,578. (S&P), 2020. [38] K. Park, C. Lim, D. Yun, and S. Baeg, “Experiments and root cause [12] L. Cojocar, K. Razavi, C. Giuffrida, and H. Bos, “Exploiting correcting analysis for active-precharge hammering fault in under 3x codes: On the effectiveness of against rowhammer attacks,” nm technology,” Microelectronics Reliability, 2016. in IEEE Symposium on Security and Privacy (S&P), 2019. [39] Y. Park, W. Kwon, E. Lee, T. J. Ham, J. Ahn, and J. W. Lee, “Graphene: [13] F. Devaux, “The true processing in memory accelerator,” in 2019 IEEE Strong yet Lightweight Row Hammer Protection,” in MICRO, 2020. Hot Chips 31 Symposium (HCS). IEEE Computer Society, 2019, pp. [40] PARSEC Group, “A Memo on Exploration of SPLASH-2 Input Sets,” 1–24. in Princeton University, 2011. [14] P. Frigo, E. Vannacci, H. Hassan, V. van der Veen, O. Mutlu, C. Giuf- [41] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos, frida, H. Bos, and K. Razavi, “Trrespass: Exploiting the many sides of “Flip feng shui: Hammering a needle in the software stack,” in USENIX target row refresh,” in S&P, May 2020. Conference on Security Symposium, 2016. [15] ““half-double”: Next-row-over assisted rowhammer,” https://github. [42] M. Seaborn and T. Dullien, “Exploiting the DRAM Rowhammer Bug com/google/hammer-kit/blob/main/20210525_half_double.pdf, Google, to Gain Kernel Privileges,” https://googleprojectzero.blogspot.com/2015/ 2021. 03/exploiting-dram-rowhammer-bug-to-gain.html, 2015. [16] Z. Greenfield and L. Tomer, “Throttling support for row-hammer coun- [43] S. M. Seyedzadeh, A. K. Jones, and R. Melhem, “Counter-Based Tree ters,” Feb. 2 2016, uS Patent 9,251,885. Structure for Row Hammering Mitigation in DRAM,” IEEE Computer [17] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O’Connell, Architecture Letters, 2017. W. Schoechl, and Y. Yarom, “Another flip in the wall of rowhammer [44] S. M. Seyedzadeh, A. K. Jones, and R. Melhem, “Mitigating Wordline defenses,” S&P, 2017. Crosstalk Using Adaptive Trees of Counters,” in ISCA, 2018. [18] D. Gruss, C. Maurice, and S. Mangard, “Rowhammer.js: A remote [45] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, “Automatically software-induced fault attack in ,” in Detection of Intrusions Characterizing Large Scale Program Behavior,” in Proceedings of the and Malware, and Vulnerability Assessment, 2016. 10th International Conference on Architectural Support for Program- [19] G. Irazoqui, T. Eisenbarth, and B. Sunar, “MASCAT: Preventing Mi- ming Languages and Operating Systems, 2002. croarchitectural Attacks Before Distribution,” in Proceedings of the 8th [46] M. Son, H. Park, J. Ahn, and S. Yoo, “Making DRAM Stronger ACM Conference on Data and Application Security and Privacy, 2018. Against Row Hammering,” in Proceedings of the 54th Annual Design [20] JEDEC, “DDR4 SDRAM Standard,” 2017. Automation Conference, 2017. [21] JEDEC, “LPDDR5 standard JESD209-5,” 2019. [47] L. Subramanian, D. Lee, V. Seshadri, H. Rastogi, and O. Mutlu, “Bliss: [22] JEDEC, “DDR5 SDRAM,” 2020. Balancing performance, fairness and complexity in memory access [23] JEDEC, “NEAR-TERM DRAM LEVEL ROWHAMMER MITIGA- scheduling,” IEEE Transactions on Parallel and Distributed Systems, TION,” 2021. vol. 27, no. 10, pp. 3071–3087, 2016. [24] JEDEC, “SYSTEM LEVEL ROWHAMMER MITIGATION,” 2021. [48] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, [25] I. Kang, E. Lee, and J. Ahn, “CAT-TWO: Counter-Based Adaptive Tree, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida, “Drammer: Deterministic Time Window Optimized for DRAM Row-Hammer Prevention,” IEEE Rowhammer Attacks on Mobile Platforms,” in Proceedings of the 2016 Access, vol. 8, pp. 17 366–17 377, 2020. ACM SIGSAC Conference on Computer and Communications Security, [26] D. Kaseridis, J. Stuecheli, and L. K. John, “Minimalist open-page: 2016. A DRAM page-mode scheduling policy for the many-core era,” in [49] V. van der Veen, M. Lindorfer, Y. Fratantonio, H. Pillai, G. Vigna, 44th Annual IEEE/ACM International Symposium on Microarchitecture, C. Kruegel, H. Bos, and K. Razavi, “GuardiON: Practical mitigation 2011. of dma-based rowhammer attacks on ARM,” in 15th International [27] D. Kim, M. Park, S. Jang, J.-Y. Song, H. Chi, G. Choi, S. Choi, J. Kim, Conference on Detection of Intrusions and Malware, and Vulnerability C. Kim, K. Kim et al., “23.2 a 1.1 v 1ynm 6.4 gb/s/pin 16gb Assessment (DIMVA), 2018. with a phase-rotator-based dll, high-speed serdes and rx/tx equalization [50] A. G. Yaglikci, M. Patel, J. Kim, R. AziziBarzoki, J. Park, H. Has- scheme,” in 2019 IEEE International Solid-State Circuits Conference- san, A. Olgun, L. Orosa, K. Kanellopoulos, T. Shahroodi, S. Ghose, (ISSCC). IEEE, 2019, pp. 380–382. and O. Mutlu, “Blockhammer: Preventing rowhammer at low cost by [28] J. Kim, M. Patel, A. G. Yaglikçi, H. Hassan, R. Azizi, L. Orosa, and blacklisting rapidly-accessed dram rows,” in International Symposium O. Mutlu, “Revisiting rowhammer: An experimental analysis of modern on High-Performance Computer Architecture, 2021. dram devices and mitigation techniques,” in Proceedings of the 47th [51] T. Yang and X. Lin, “Trap-Assisted DRAM Row Hammer Effect,” IEEE Annual International Symposium on Computer Architecture, 2020. Electron Device Letters, 2019. [29] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, [52] J. M. You and J.-S. Yang, “MRLoc: Mitigating Row-hammering Based K. Lai, and O. Mutlu, “Flipping Bits in Memory Without Accessing on Memory Locality,” in Proceedings of the 56th Annual Design Them: An Experimental Study of DRAM Disturbance Errors,” in Automation Conference, 2019. Proceeding of the 41st Annual International Symposium on Computer [53] Z. Zhang, Y. Cheng, D. Liu, S. Nepal, Z. Wang, and Y. Yarom, “PTham- Architecture, 2014. mer: Cross-User-Kernel-Boundary Rowhammer through Implicit Ac- cesses,” in MICRO, 2020.

14