Using Test Beds to Probe Non- Volatile Memories' Dark
Total Page:16
File Type:pdf, Size:1020Kb
Beyond the Datasheet: Using Test Beds to Probe Non- Volatile Memories’ Dark Secrets Laura M. Grupp*, Adrian M. Caulfield*, Joel Coburn*, John D. Davis†, and Steven Swanson* *University of California, San Diego †Microsoft Research, Silicon Valley Lab [lgrupp, acaulfie, jdcoburn, swanson]@cs.ucsd.edu [email protected] latency, bandwidth, bit error rate, and energy efficiency will all combine to determine which designs are viable Abstract and which are not. Second, they must understand how Non-volatile memories (such as NAND flash and phase the non-volatile memory array will affect the rest of the change memories) have the potential to revolutionize system. If a new storage technology significantly computer systems. However, these technologies have increases performance, it may expose bottlenecks in complex behavior in terms of performance, reliability, other parts of the system. and energy consumption that make fully exploiting Addressing the first challenge requires a their potential a complicated task. As device engineers comprehensive understanding of non-volatile memory push bit densities higher, this complexity will only performance, failure modes, and energy consumption. increase. Managing and exploiting the complex and at This information is difficult to come by. Device times surprising behavior of these memories requires a manufacturers hide the details of their devices behind deep understanding of the devices grounded in non-disclosure agreements, and even when datasheets experimental results. Our research groups have are freely available, they provide scant details: average developed several hardware test beds for flash and performance characteristics, vague recommendations other memories that allow us to both characterize these about error correction requirements, and inexact memories and experimentally evaluate their bounds on energy consumption. The datasheets make performance on full-scale computer systems. We no mention of many “warts” that these devices possess. describe several of these test bed systems, outline some It is not that datasheets are inaccurate; it is that they are of the research findings they have enabled, and discuss woefully inadequate if our goal is to fully exploit the some of the methodological challenges they raise. capabilities of these memories. To address the second challenge and fully evaluate a 1. Introduction new non-volatile storage array, it is easiest to measure and observe the array in situ, in a working system, Non-volatile memory has recently emerged as a running real applications. The benefits of this approach possible replacement for main memory and hard disk – versus, for example analytical model or simulation – drives (HDDs). While these devices have some is that it reveals unexpected hardware and/or software limitations, the potential benefits that the components bottlenecks and reveals the actual benefit of this promise more than outweigh them. In current systems, application. non-volatile memories usually appear as solid state disks (SSDs) that use NAND flash to build “black box” The only way to address either of these challenges is replacements for conventional HDDs. to construct working hardware test beds – custom-built hardware systems that incorporate non-volatile Constructing SSDs or other components built from memories or can emulate them at very high fidelity. non-volatile memories presents two problems that These test beds can provide “ground truth” data on the researchers and designers must confront. First, how to behavior of both memory devices and systems that best construct a system to exploit the strengths and incorporate them. mitigate the weaknesses of the technology. The This paper describes the non-volatile test beds that PowerPC FPGA we have built and provides some results from them. Flash Memory We describe some of the insights we have gleaned Processor Testing Socket from them and discuss some of the challenges that building and using these test beds presents. Our Flash Flash Bus experiences with these systems demonstrate that they Controller can provide a wealth of useful data and raise Flash Memory unexpected and important questions that motivate and Other Peripherals Testing Socket inform research into applying non-volatile memories. Figure 1. The Ming/Zarkov system uses a PowerPC processor and a custom flash controller built into an 2. Hardware Test beds FPGA to provide direct access to flash devices for either Hardware test beds come in two flavors: characterization (in Ming) or FTL prototyping (in Zarkov). characterization platforms and application platforms. Characterization platforms allow researchers to observe for the flash controller. The driver is unique in that it the behavior of the device under test at a finer provides user-level access to low-level flash operations granularity than the information provided in the device (read, program, and erase, etc.). The result is that datasheet. Application platforms allow researchers to developing test code for Ming is very easy. For evaluate the impact of a memory technology at the instance, to test a new error correction or data encoding application level, in a working system, and in the technique, it is sufficient to implement it in C. presence of real-world software and hardware The combination of the Ming hardware and software overheads. In some cases, the same platform can be provides a wealth of information about each flash used for both characterization and applications. memory access. The driver returns the operation Characterization platforms latency at 10ns resolution and allows straight forward A device’s datasheet provides some information about measurement of the energy consumed by each chip. how it will behave, but it does not provide the level of Application platforms detail that a characterization platform can extract. Application platforms offer insight in how a new Whereas a datasheet may provide a latency range for a technology will affect application-level behavior. They device operation, the characterization platform can offer insight into how storage devices will interact with observe the distribution of latencies for that operation other hardware and software components. This can, for across the chip or a collection of chips. In general, instance, reveal bottlenecks in the system that the new these platforms may be able to observe characteristics technology exposes. Below we describe the application of the device that are not reported in the datasheet. platforms our groups have developed and used. We have used a platform called Ming to characterize The first is Zarkov. Zarkov relies on the same many NAND flash devices. Ming is composed of an FPGA-based controller and FPGA prototyping board off-the-shelf Xilinx FPGA-based development board as as Ming but holds up to 32, non-removable flash chips. well as a custom-built flash testing board, enabling Zarkov is a key component of the BlueSSD project [5]. detailed measurements of almost all aspects of flash BlueSSD aims to provide an open platform for memory devices. The test board holds two “burn in” experimenting with SSD optimizations. sockets that accept standard TSOP flash devices. Each The second application platform is the Flash socket has a separate power plane to support per-chip Research Platform (FRP). The FRP hardware is a power measurement. Figure 1 provides a block combination of the BEE3 multi-FPGA research diagram of the Ming hardware platform. platform [1] and a custom printed circuit board, a Flash In addition to the hardware, Ming also includes Dual Inline Memory Module (FDIMM). The several software components that make it useful as a combination of an FPGA and DIMM slot provide the characterization system. The Xilinx FPGA contains a ability to interface to other non-volatile memories, like microprocessor and we configure it with a custom-built PCM, for future research, leveraging the same BEE3 flash memory controller. We run a full-fledged version hardware and software. Each FDIMM exposes 8 of Linux on the processor, and provide a custom driver independent flash channels and up to four FDIMMs execution of these commands on the actual hardware. Host PC Host PC The FRP management software can be executed on an PCI-E x8, embedded CPU in the FPGA, or on the host CPU in the 1 GbE, PCI-E x8, 10 GbE 8 GB 64 GB 1 GbE, device driver or even as a user-level application. 10 GbE FF 8 GB 64 GB LL F Currently, we organize the management layer as a user- M Controller F A AA L R L FPGA S M Controller D S A AA level application on a PC running Windows XP, A R HH FPGA S D S A HH trading off performance for ease of implementation. (A) Ring A2D Ring A2B Co-development of the user-level application and the FF FF FF FF FPGA flash controller enable implementing and C LL Flash LL 2 LL Flash LL A AA A A A FPGA A H A FPGA A instrumenting all flash operations. Like the SS D SS S SS B SS Q HH HH HH HH Ming/Zarkov platform, the FRP can be a 64 GB 64 GB 64 GB 64 GB characterization or application platform. The main FF FF LL Flash LL difference is that the FRP platform does not have AA FPGA AA SS C SS HH HH separate power planes per flash package on the 64 GB (B) 64 GB FDIMM, enabling only module power measurement Figure 2. FRP configuration ranges from (A) a single FPGA and not individual package power measurement. and associated DIMM cards to (B) a multiple FPGA system with up to 16 DIMM cards. These configurations assume 4 The final application platform is a prototyping GB DRAM DIMMs and 32 GB FDIMMs. system for developing PCIe-attached storage devices based on next-generation non-volatile memories. The can be connected to an FPGA. The BEE3 has 16 platform is called Moneta and it targets memories such DIMM slots that we can populate with DRAM, as phase change memory, the memristor, or scalable FDIMMs, or other special-purpose modules.