Yat: A Validation Framework for Persistent Memory Software

Philip Lantz, Subramanya Dulloor, Sanjay Kumar, Rajesh Sankaran, and Jeff Jackson, Intel Labs

https://www.usenix.org/conference/atc14/technical-sessions/presentation/lantz

This paper is included in the Proceedings of USENIX ATC ’14: 2014 USENIX Annual Technical Conference. June 19–20, 2014 • Philadelphia, PA. ISBN 978-1-931971-10-2. Open access to the Proceedings of USENIX ATC ’14: 2014 USENIX Annual Technical Conference is sponsored by USENIX.

Abstract

This paper describes the design and implementation of Yat. Yat is a hypervisor-based framework that supports testing of applications that use Persistent Memory (PM): byte-addressable, non-volatile memory attached directly to the memory controller. PM has implications on both system architecture and software. The PM architecture extends the memory ordering model to add software-visible support for durability of stores to PM. By simulating the characteristics of PM, and integrating an application-specific checker in the framework, Yat enables validation, correctness testing, and debugging of PM software in the presence of power failures and crashes. We discuss the use of Yat in development and testing of the Persistent Memory File System (PMFS), describing the effectiveness of Yat in catching and debugging several hard-to-find bugs in PMFS.

[Figure 1: PM software flow. Consistent states A, B, and C. The write set W_{A->B} = {W1, W2} is followed by clflush (L1, L2) and pm_wbarrier; the write set W_{B->C} = {W3, W4} is followed by clflush (L3, L4) and pm_wbarrier.]

1 Introduction

We are witnessing growing interest in Non-Volatile DIMMs (NVDIMMs) that attach storage class memory (e.g., PCM, MRAM, etc.) directly to the memory controller [5]. We refer to any such byte-addressable, non-volatile memory as Persistent Memory (PM).

PM has implications on system architecture and software [2]. Since PM performance (both latency and bandwidth) is within an order of magnitude of DRAM, software can map PM as write-back cacheable for performance reasons. Several studies [2, 6] have shown significant performance gains from the use of in-place data structures in write-back PM. These studies also show the need for extensions to the existing memory model to allow PM software to control ordering and durability of stores to write-back PM.

However, such extensions to the memory model introduce the possibility of new types of programming errors. For instance, consider a PM software flow as shown in Figure 1. Starting at consistent state A, PM software performs two writes to PM (set W_{A->B}), dirtying two cachelines (L1, L2) in the process. For traditional block based storage, software explicitly schedules IO to make these cachelines persistent at block granularity. However, for PM-based storage these cachelines can become persistent in arbitrary order by cacheline evictions outside of software control. Hence, extra care must be taken in enforcing ordering on updates to PM. Programmers today are not used to explicitly tracking and flushing modified cachelines in volatile memory. But this is a critical requirement of PM software; failing to meet it could cause serious consistency bugs and data corruption.

Testing for the correctness of PM software is challenging. One way is to simulate or induce failures (such as from power loss or crashes) and use an application-specific checker tool (similar to fsck for Linux filesystems) to verify consistency of the data in PM. However, this method tests only the actual ordering of writes to PM in a single flow, even though many other orderings are possible outside the control of the PM software (e.g., due to arbitrary cacheline evictions).

To overcome this challenge, we built Yat (meaning “trial by fire” in Sanskrit), a hypervisor-based framework for testing the correctness of PM software. Yat uses a record and replay method to simulate architectural failure conditions that are specific to PM. We used Yat in validation and correctness testing of PMFS [2], which is a reasonably large and complex PM software module. Though we focus on PMFS as the only case study in this paper, the principles of Yat are applicable to other PM software, including libraries and applications [6].

Contributions of this paper are as follows:

• Design and implementation of Yat, a hypervisor-based framework for testing PM software.
• Evaluation of a Yat prototype, using PMFS as an example PM software.

2 System Architecture

We assume the Intel64 based system architecture described elsewhere [2], where software can access PM directly using regular CPU load and store instructions. Because PM is typically mapped write-back (for performance reasons), PM data in CPU caches can be evicted and made durable at any time. To give software the ability to control consistency, the architecture must provide a software visible guarantee of ordering and durability of stores to PM. The proposed architecture includes a simple hardware primitive, PM write barrier (pm_wbarrier), which guarantees durability of all stores to PM that have been explicitly flushed from the CPU caches but might still be in a volatile buffer in the memory controller or in the PM module.

In Figure 1, to effect W_{A->B} before W_{B->C}, PM software must flush the dirty cachelines L1 and L2 and make those writes durable using pm_wbarrier, before proceeding to writes in W_{B->C}. In complex software, it can be challenging to keep track of all the dirty cachelines that need to be flushed before the use of pm_wbarrier. We expect user-level libraries and programming models to hide most of this programming complexity from PM applications [6].

Yat is designed to help validate that PM software correctly uses cache flushes (clflush), ordering instructions (sfence and mfence), and the new hardware primitive (pm_wbarrier) to control durability and consistency in PM, even in the face of arbitrary failures and cacheline evictions. For any sequence of updates to PM, Yat creates all possible states in PM based storage and then runs the PM application’s recovery tool to test recovery to a consistent state. The possible orderings are determined by the memory ordering model of the processor architecture. The proposed system (based on Intel64 architecture) follows these rules:

1. When a write is executed, it may become durable immediately or at any subsequent time up to the point where it is known to have become durable by rule 5.
2. When a write is executed that modifies a cache line that has been modified by a prior write, the later write is guaranteed to become durable no sooner than the prior write to the same cache line. This guarantee holds across cores based on Intel64 architecture.
3. When a clflush is executed, it has no effect until it is followed by a fence (see rule 4).
4. When a fence is executed, prior clflushes on that processor take effect. All writes performed by any processor to the cache lines affected by the clflushes are flushed.
5. When a pm_wbarrier primitive is executed, all writes that have been flushed by rule 4 are made durable. Any writes that have not been clflushed, or that were clflushed but where no subsequent fence was executed on the same processor as the clflush prior to the pm_wbarrier, are not guaranteed to be durable.

3 Yat Design

Yat is a framework for testing PM software. We refer to the PM software being tested as App.

The goals of Yat are:

1) to test App for bugs caused by improper reordering of write operations; e.g., due to missing or misplaced ordering and durability instructions.
2) to exercise the PM recovery code in App in the context of a large variety of failure scenarios, such as power failures and software failures internal or external to App that cause it to abort.

To test that App applies sufficient memory ordering constraints to preserve consistency no matter when a failure occurs, Yat simulates reordering PM writes in every allowed order in which they may become durable in the PM hardware based on the rules in §2. Note that if App is multithreaded, Yat records the actual sequence of operations executed by the various threads. However, Yat does not model the non-determinism in the software as it has no knowledge of synchronization done by the software.

The operation of Yat is shown in Figure 2. Yat operates in two phases. The first phase, Yat-record, simply collects a trace (App-trace) while App is executing. Yat-record logs write and clflush instructions within the address range of PM, along with the explicit ordering instructions (sfence and mfence), and the new hardware primitive pm_wbarrier.

The second phase, Yat-replay, has the following steps:

Segment: Yat-replay divides App-trace into segments, separated by each pm_wbarrier in the trace. For each segment, there is a set of writes to PM that have been executed but are not guaranteed to be durable. This set of writes is called the active set for that segment.

Reorder: For each segment, Yat-replay selects subsets of writes in the active set that do not violate rule 2. Each such selected subset of writes is called a combination.

Replay: For each combination, Yat-replay starts with an initial state (App-initial-state), applies all writes that are guaranteed to have become durable as of the pm_wbarrier that precedes the current segment, and then applies the writes contained in the current combination.
