AutoSSD: an Autonomic SSD Architecture
Bryan S. Kim, Seoul National University; Hyun Suk Yang, Hongik University; Sang Lyul Min, Seoul National University
https://www.usenix.org/conference/atc18/presentation/kim





Abstract

From small mobile devices to large-scale storage arrays, flash memory-based storage systems have gained a lot of popularity in recent years. However, the uncoordinated use of resources by competing tasks in the flash translation layer (FTL) makes it difficult to guarantee predictable performance.

In this paper, we present AutoSSD, an autonomic SSD architecture that self-manages FTL tasks to maintain a high level of QoS performance. In AutoSSD, each FTL task is given an illusion of a dedicated flash memory subsystem, allowing tasks to be implemented oblivious to others and making it easy to integrate new tasks to handle future flash memory quirks. Furthermore, each task is allocated a share that represents its relative importance, and its utilization is enforced by a simple and effective scheduling scheme that limits the number of outstanding flash memory requests for each task. The shares are dynamically adjusted through feedback control by monitoring key system states and reacting to their changes to coordinate the progress of FTL tasks.

We demonstrate the effectiveness of AutoSSD by holistically considering multiple facets of SSD internal management, and by evaluating it across diverse workloads. Compared to state-of-the-art techniques, our design reduces the average response time by up to 18.0%, the 3 nines (99.9%) QoS by up to 67.2%, and the 6 nines (99.9999%) QoS by up to 76.6% for QoS-sensitive small reads.

Figure 1: Performance drop and variation under 4KB random writes.

1 Introduction

Flash memory-based storage systems have become popular across a wide range of applications from mobile systems to enterprise data storages. Flash memory's small size, resistance to shock and vibration, and low power consumption make it the de facto storage medium in mobile devices. On the other hand, flash memory's low latency and collectively massive parallelism make flash storage suitable for high-performance storage for mission-critical applications. As multi-level cell technology [5] and 3D stacking [38] continue to lower the cost per GB, flash storage will not only remain competitive in the data storage market, but also will enable the emergence of new applications in this Age of Big Data.

Large-scale deployments and user experiences, however, reveal that despite its low latency and massive parallelism, flash storage exhibits high performance instabilities and variations [9, 17]. Garbage collection (GC) has been pointed out as the main source of the problem [9, 25, 28, 29, 45], and Figure 1 illustrates this case. It shows the performance degradation of our SSD model under small random writes, and it closely resembles measured results from commercial SSDs [2, 23]. Initially, the SSD's performance is good because all the resources of the flash memory subsystem can be used to service host requests. But as the flash memory blocks are consumed by host writes, GC needs to reclaim space by compacting data spread across blocks and erasing unused blocks. Consequently, host and GC compete for resources, and the host performance inevitably suffers.

However, garbage collection is a necessary evil for the flash storage. Simply putting off space reclamation or treating GC as a low-priority task will lead to larger performance degradations, as host writes will eventually block and wait for GC to reclaim space. Instead, garbage collection must be judiciously scheduled with host requests to ensure that there is enough free space for future requests, while meeting the performance demands of current requests. This principle of harmonious coexistence, in fact, extends to every internal management task. Map caching [15] that selectively keeps mapping data in memory generates flash memory traffic on cache misses, but this is a mandatory step for locating host data. Read scrubbing [16] that preventively migrates data before its corruption also creates traffic when blocks are repeatedly read, but failure to perform its duty on time can lead to data loss. As more tasks with unique responsibilities are added to the system, it becomes increasingly difficult to design a system that meets its performance and reliability requirements [13].

In this paper, we present an autonomic SSD architecture called AutoSSD that self-manages its management tasks to maintain a high level of QoS performance. In our design, each task is given a virtualized view of the flash memory subsystem by hiding the details of flash memory request scheduling. Each task is allocated a share that represents the amount of progress it can make, and a simple yet effective scheduling scheme enforces resource arbitration according to the allotted shares. The shares are dynamically and automatically adjusted through feedback control by monitoring key system states and reacting to their changes. This achieves predictable performance by maintaining a stable system. We show that for small read requests, AutoSSD reduces the average response time by up to 18.0%, the 3 nines (99.9%) QoS by up to 67.2%, and the 6 nines (99.9999%) QoS by up to 76.6% compared to state-of-the-art techniques. Our contributions are as follows:

• We present AutoSSD, an autonomic SSD architecture that dynamically manages internal housekeeping tasks to maintain a stable system state. (§ 3)

• We holistically consider multiple facets of SSD internal management, including not only garbage collection and host request handling, but also mapping management and read scrubbing. (§ 4)

• We evaluate our design and compare it to the state-of-the-art techniques across diverse workloads, analyze causes for long tail latencies, and demonstrate the advantages of dynamic management. (§ 5)

The remainder of this paper is organized as follows. § 2 gives a background on understanding why flash storages exhibit performance unpredictability. § 3 presents the overall architecture of AutoSSD and explains our design choices. § 4 describes the evaluation methodology and the SSD model that implements various FTL tasks, and § 5 presents the experimental results under both synthetic and real I/O workloads. § 6 discusses our design in relation to prior work, and finally § 7 concludes.

2 Background

For flash memory to be used as storage, several of its limitations need to be addressed. First, it does not allow in-place updates, mandating a mapping table between the logical and the physical address space. Second, the granularities of the two state-modifying operations—program and erase—are different in size, making it necessary to perform garbage collection (GC) that copies valid data to another location for reclaiming space. These internal management schemes, collectively known as the flash translation layer (FTL) [11], hide the limitations of flash memory and provide an illusion of a traditional block storage interface.

The role of the FTL has become increasingly important as hiding the error-prone nature of flash memory can be challenging when relying solely on hardware techniques such as error correction code (ECC) and RAID-like parity schemes. Data stored in the flash array may become corrupt in a wide variety of ways. Bits in a cell may be disturbed when neighboring cells are accessed [12, 41, 44], and the electrons in the floating gate that represent data may gradually leak over time [6, 35, 44]. Sudden power loss can increase bit error rates beyond the error correction capabilities [44, 47], and error rates increase as flash memory blocks wear out [6, 12, 19]. As flash memory becomes less reliable in favor of high density [13], more sophisticated FTL algorithms are needed to complement existing reliability enhancement techniques.

Even though modern flash storages are equipped with sophisticated FTLs and powerful controllers, meeting performance requirements involves three main challenges. First, as new quirks of flash memory are introduced, more FTL tasks are added to hide the limitations, thereby increasing the complexity of the system. Furthermore, existing FTL algorithms need to be fine-tuned for every new generation of flash memory, making it difficult to design a system that universally meets performance requirements. Second, multiple FTL tasks generate sequences of flash memory requests that contend for the resources of the shared flash memory subsystem. This resource contention creates queueing delays that increase response times and cause long-tail latencies. Lastly, depending on the state of the flash storage, the importance of FTL tasks dynamically changes. For example, if the flash storage runs out of free blocks for writing host data, host request handling stalls and waits for garbage collection to reclaim free space. On the other hand, with sufficient free blocks, there is no incentive to prioritize garbage collection over host request handling.
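To make the out-of-place update described above concrete, the sketch below shows how a page-mapping FTL might service a logical write: the data goes to a fresh physical page, the logical-to-physical table is updated, and the old copy is only marked invalid, leaving the actual space reclamation to GC. This is a minimal illustration under our own assumptions; all names and sizes are hypothetical and it is not AutoSSD's implementation.

```python
# Minimal sketch of out-of-place updates in a page-mapping FTL (illustrative only).
INVALID = -1

class TinyPageMapFTL:
    def __init__(self, num_physical_pages: int):
        self.l2p = {}                                   # logical page -> physical page
        self.owner = [INVALID] * num_physical_pages     # physical page -> logical page, or INVALID
        self.write_pointer = 0                          # naive append-only allocation

    def write(self, lpn: int) -> int:
        """Flash forbids in-place updates: allocate a fresh page and remap."""
        if self.write_pointer >= len(self.owner):
            raise RuntimeError("no free pages left: GC must erase blocks to reclaim space")
        old_ppn = self.l2p.get(lpn)
        ppn = self.write_pointer
        self.write_pointer += 1
        self.owner[ppn] = lpn               # program the new copy
        self.l2p[lpn] = ppn                 # update the logical-to-physical mapping
        if old_ppn is not None:
            self.owner[old_ppn] = INVALID   # stale copy becomes garbage for GC to reclaim
        return ppn

    def read(self, lpn: int) -> int:
        return self.l2p[lpn]                # every access goes through the mapping table

ftl = TinyPageMapFTL(num_physical_pages=8)
ftl.write(0); ftl.write(0)                  # rewriting LPN 0 leaves one invalid page behind
print(ftl.l2p, ftl.owner)
```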

Figure 2: Overall architecture of AutoSSD and its components.

3 AutoSSD Architecture

In this section, we describe the overall architecture and design of the autonomic SSD (AutoSSD) as shown in Figure 2. In our model, all FTL tasks run concurrently, with each designed and implemented specifically for its job. Each task independently interfaces with the scheduling subsystem, and the scheduler arbitrates the resources in the flash memory subsystem according to the assigned share. The share controller monitors key system states and determines the appropriate share for each FTL task. AutoSSD is agnostic to the specifics of the FTL algorithms (i.e., mapping scheme and GC victim selection), and the following subsections focus on the overall architecture and design that enable the self-management of the flash storage.

3.1 Virtualization of the Flash Memory Subsystem

The architecture of AutoSSD allows each task to be independent of others by virtualizing the flash memory subsystem. Each FTL task is given a pair of request and response queues to send and receive flash memory requests and responses, respectively. This interface provides an illusion of a dedicated (yet slower) flash memory subsystem and allows an FTL task to generate flash memory requests oblivious of others (whether idle or active) or the requests they generate (intensity or which resources they are using). Details of the flash memory subsystem are completely abstracted by the scheduling subsystem, and only the back-pressure of the queue limits each task from generating more flash memory requests.

This virtualization not only effectively frees each task from having to worry about others, but also makes it easy to add a new FTL task to address any future flash memory quirks. While background operations such as garbage collection, read scrubbing, and wear leveling have similar flash memory workload patterns (reads and programs, and then erases), the objective of each task is distinctly different. Garbage collection reclaims space for writes, read scrubbing preventively relocates data to ensure data integrity, and wear leveling swaps contents of data to even out the damage done on flash memory cells. Our design allows seamless integration of new tasks without having to modify existing ones and reoptimize the system.

3.2 Scheduling for Share Enforcement

The scheduling subsystem interfaces with each FTL task and arbitrates the resources of the flash memory subsystem. The scheduler needs to be efficient with low overhead as it manages concurrency (tens and hundreds of flash memory requests) and parallelism (tens and hundreds of flash memory chips) at a small timescale.

In AutoSSD, we consider these unique domain characteristics and arbitrate the flash memory subsystem resources through debit scheduling. The debit scheduler tracks and limits the number of outstanding requests per task across all resources, and is based on the request windowing technique [14, 20, 34] from the disk scheduling domain. If the number of outstanding requests for a task, which we call its debit, is under the limit, its requests are eligible to be issued; if it is not, the request cannot be issued until one or more requests from that task complete. The debt limit is proportional to the share set by the share controller, allowing a task with a higher share to potentially utilize more resources simultaneously. The sum of all tasks' debt limits represents the total amount of parallelism, and is set to the total number of requests that can be queued in the flash memory controller.
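The following sketch illustrates the eligibility rule just described: a request may be issued only while its task's debit is under the task's debt limit and the target chip queue has room. This is a simplified illustration under stated assumptions (a per-chip queue depth of four, as in § 4.1); the names and structure are ours, not the actual AutoSSD code.

```python
# Illustrative sketch of the debit scheduler's eligibility rule (not the paper's code).
from dataclasses import dataclass, field
from collections import deque

CHIP_QUEUE_DEPTH = 4          # assumption: at most four queued requests per chip (cf. Section 4.1)

@dataclass
class Task:
    name: str
    debt_limit: int           # proportional to the share set by the share controller
    debit: int = 0            # number of requests currently outstanding across all chips

@dataclass
class Chip:
    queue: deque = field(default_factory=deque)

def eligible(task: Task, chip: Chip) -> bool:
    """A request is eligible only if the task is under its debt limit
    and the target chip's queue is not full."""
    return task.debit < task.debt_limit and len(chip.queue) < CHIP_QUEUE_DEPTH

def issue(task: Task, chip: Chip, request) -> bool:
    if not eligible(task, chip):
        return False
    chip.queue.append((task.name, request))
    task.debit += 1           # debit rises on issue ...
    return True

def complete(task: Task, chip: Chip) -> None:
    chip.queue.popleft()
    task.debit -= 1           # ... and falls on completion, re-enabling the task

# Example mirroring Figure 3: Task A has a debt limit of 5, Task B of 3.
task_a, task_b = Task("A", debt_limit=5), Task("B", debt_limit=3)
chip0 = Chip()
assert issue(task_a, chip0, "read")
```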

Figure 3: Two debit scheduling examples. (a) Task A's request issued to Chip 2. (b) Task B's request issued to Chip 0. In the scenario of Figure 3a, no more requests can be sent to Chip 0, and Task B is at its maximum debit. The only eligible scheduling is issuing Task A's request to Chip 2. In the scenario of Figure 3b, while Task A is still under the debt limit, its request cannot be issued to a chip with a full queue. On the other hand, a request from Task B can be issued as Chip 3's operation for Task B completes.

Figure 3 illustrates two scenarios of debit scheduling. In both scenarios, the debt limit is set to 5 requests for Task A, and 3 for Task B. In Figure 3a, no more requests can be sent to Chip 0 as its queue is full, and Task B's requests cannot be scheduled as it is at its debt limit. Under this circumstance, Task A's request to Chip 2 is scheduled, increasing its debit value from 1 to 2. In Figure 3b, the active operation at Chip 3 for Task B completes, allowing Task B's request to be scheduled. Though Task B's request to Chip 1 is not at the head of the queue, it is scheduled out-of-order as there is no dependence between the requests to Chip 0 and Chip 1. Task A, although below the debt limit, cannot have its requests issued until Chip 0 finishes a queued operation, or until a new request to another chip arrives. Though not illustrated in these scenarios, when multiple tasks under the limit compete for the same resource, one is chosen with skewed randomness favoring a task with a smaller debit-to-debt-limit ratio. Randomness is added to probabilistically avoid starvation.

Debit scheduling only tracks the number of outstanding requests, yet exhibits interesting properties. First, it can make scheduling decisions without complex computations and bookkeeping. This allows the debit scheduler to scale with an increasing number of resources. Second, although it does not explicitly track time, it implicitly favors short-latency operations as they have a faster turn-around time. In scheduling disciplines such as weighted round robin [26] and weighted fair queueing (WFQ) [10], the latency of operations must be known or estimated to achieve some degree of fairness. Debit scheduling, however, approximates fairness in the time domain only by tracking the number of outstanding requests. Lastly, the scheduler is in fact not work-conserving. The total debt limit can be scaled up to approximate a work-conserving scheduler, but the share-based resource reservation of the debit scheduler allows high responsiveness, as observed in the resource reservation protocol for Ozone [36].

3.3 Feedback Control of Share

The share controller determines the appropriate share for the scheduling subsystem by observing key system states. States such as the number of free blocks and the maximum read count reflect the stability of the flash storage. This is critical for the overall performance and reliability of the system, as failure to keep these states at a stable level can lead to an unbounded increase in response time or even data loss.

For example, if the flash storage runs out of free blocks, not only do host writes block, but also all other tasks that use flash memory programs stall: activities such as making mapping data durable and writing periodic checkpoints also depend on the garbage collection to reclaim free space. Even worse, a poorly constructed FTL may become deadlocked if GC is unable to obtain a free block to write the valid data from its victim. On the other hand, if a read count for a block exceeds its recommended limit, accumulated read disturbances can lead to data loss if the number of errors is beyond the error correction capabilities. In order to prevent falling into these adverse system conditions, the share controller monitors the system states and adjusts shares to control the rate of progress of individual FTL tasks, so that the system is maintained within stable levels.

AutoSSD uses feedback to adaptively determine the shares for the internal FTL tasks. While the values of key system states must be maintained at an adequate level, the shares of internal tasks must not be set so high that they severely degrade the host performance. Once a task becomes active, it initially is allocated a small share. If this fails to maintain the current level of the system state, the share is gradually increased to counteract the momentum. The following control function is used to achieve this behavior:

    S_A[t] = P_A · e_A[t] + I_A · S_A[t−1]    (1)

where S_A[t] is the share for task A at time t, S_A[t−1] is the previous share for task A, P_A and I_A are two non-negative coefficients for task A, and e_A[t] is the error value for task A at time t. The error value function for GC is defined as follows:

    e_GC[t] = max(0, target_freeblk − num_freeblk[t])    (2)

With target_freeblk set to the GC activation threshold, the share for GC, S_GC, starts out small. If the number of free blocks num_freeblk[t] falls far below target_freeblk, the error function e_GC[t] augmented by P_GC ramps up the GC share S_GC. After the number of free blocks num_freeblk[t] exceeds the threshold target_freeblk, the share S_GC slowly decays given I_GC < 1.

In addition to the GC share control, the error value function for read scrubbing (RS) is defined as follows:

    e_RS[t] = max(0, max_{i∈blk}(readcnt_i[t]) − target_readcnt)    (3)

where max_{i∈blk}(readcnt_i[t]) is the maximum read count across all blocks in the system at time t, and target_readcnt is the RS activation threshold.

In our design, the share for internal management schemes starts out small, anticipating host request arrivals and using the minimum amount of resources to perform its task. If the system state does not recover, the error (the difference between the desired and the actual system state values) accumulates, increasing the share over time.

It is important to note that the progress rate for a task depends not only on the share, but also on the workload, algorithm, and system state. For example, the amount of valid data in the victim block, the location of the mapping data associated with the valid data, and the access patterns at the flash memory subsystem all affect the rate of progress for GC. A task's progress rate is, in fact, non-linear in the share under real-world workloads, and computationally solving for the optimal share involves large overhead, if not impossible. As a result, the two coefficients P and I for FTL tasks are empirically hand-tuned in this work.
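As a concrete illustration of Equations 1 and 2, the sketch below updates the GC share each control period from the observed number of free blocks. The coefficient values, the clamping range, and the update period are our own placeholder assumptions; the paper states only that P and I are non-negative and hand-tuned, with I_GC < 1.

```python
# Illustrative feedback controller for the GC share (Equations 1 and 2).
# Coefficients and clamping below are hypothetical, not the paper's tuned values.

TARGET_FREEBLK = 128          # GC activation threshold used in the evaluation (Section 4.2)

def gc_error(num_freeblk: int) -> int:
    """e_GC[t] = max(0, target_freeblk - num_freeblk[t])   (Eq. 2)"""
    return max(0, TARGET_FREEBLK - num_freeblk)

def next_gc_share(prev_share: float, num_freeblk: int,
                  p_gc: float = 0.2, i_gc: float = 0.9) -> float:
    """S_GC[t] = P_GC * e_GC[t] + I_GC * S_GC[t-1]   (Eq. 1), clamped to [0, 100]%."""
    share = p_gc * gc_error(num_freeblk) + i_gc * prev_share
    return min(max(share, 0.0), 100.0)

# When free blocks dip below the target, the share ramps up; once they recover,
# the error term vanishes and the share decays geometrically (I_GC < 1).
share = 1.0
for free_blocks in [200, 150, 120, 90, 90, 150, 300, 300]:
    share = next_gc_share(share, free_blocks)
    print(f"free={free_blocks:4d}  GC share={share:6.2f}%")
```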

4 Evaluation Methodology and Modeling

We model a flash storage system on top of the DiskSim environment [1] by enhancing its SSD extension [3]. In this section, we describe the various components and configuration of the SSD, and the workload and test settings used for the evaluation.

4.1 Flash Memory Subsystem

The flash memory controller is based on Ozone [36], which fully utilizes the flash memory subsystem's channel and chip parallelism. There can be at most four requests queued to each chip in the controller. Increasing this queue depth does not significantly increase intra-chip parallelism, as cached operations of flash memory have diminishing benefits as the channel bandwidth increases. Instead, a smaller queue depth is chosen to increase the responsiveness of the system.

Table 1 summarizes the default flash storage configuration used in our experiments. Of the 256GB of physical space, 200GB is addressable by the host system, giving an over-provisioning factor of 28%.

Table 1: System configuration.
  # of channels: 4             Read latency: 50µs
  # of chips/channel: 4        Program latency: 500µs
  # of planes/chip: 2          Erase latency: 5ms
  # of blocks/plane: 1024      Data transfer rate: 400MB/s
  # of pages/block: 512        Physical capacity: 256GB
  Page size: 16KB              Logical capacity: 200GB

4.2 Flash Translation Layer

We implement core FTL tasks and features that are essential for storage functions, yet cause performance variations. Garbage collection reclaims space, but it degrades the performance of the system under host random writes. Read scrubbing that preventively relocates data creates background traffic on read-dominant workloads. Mapping table lookup is necessary to locate host data, but it increases response time on map cache misses.

Mapping. We implement an FTL with map caching [15] and a mapping granularity of 4KB. The entire mapping table is backed in flash, and mapping data, also maintained at the 4KB granularity, is selectively read into memory and written out to flash during runtime. The LRU policy is used to evict mapping data, and if the victim contains any dirty mapping entries, the 4KB mapping data is written to flash. By default, we use 128MB of memory to cache the mapping table. The second-level mapping that tracks the locations of the 4KB mapping data is always kept in memory as it is accessed more frequently and is orders of magnitude smaller than the first-level mapping table.

Host request handling. Upon receiving a request, the host request handler looks up the second-level mapping to locate the mapping data that translates the host logical address to the flash memory physical address. If the mapping data is present in memory (hit), the host request handler references the mapping data and generates flash memory requests to service the host request. If it is a miss, a flash memory read request to fetch the mapping data is generated, and the host request waits until the mapping data is fetched. Host requests are processed in a non-blocking manner; if a request is waiting for the mapping data, other requests may be serviced out-of-order. In our model, if the host write request is smaller than the physical flash memory page size, multiple host writes are aggregated to fill the page to improve storage space utilization. We also take into consideration the mapping table access overhead and processing delays. The mapping table lookup delay is set to be uniformly distributed between 0.5µs and 1µs, and the flash memory request generation delay for the host task is between 1µs and 2µs.

Garbage collection. The garbage collection (GC) task runs concurrently and independently from the host request handler and generates its own flash memory requests. Victim blocks are selected based on cost-benefit [40]. Once a victim block is selected, valid pages are read and programmed to a new location. Mapping data is updated as valid data is copied, and this process may generate additional requests (both reads and programs) for mapping management. Once all the valid pages have been successfully copied, the old block is erased and marked free. GC becomes active when the number of free blocks drops below a threshold, and stops once the number of free blocks exceeds another threshold, similar to the segment cleaning policy used for the log-structured file system [40]. In our experiments, the two threshold values for GC activation and deactivation are set to 128 and 256 free blocks, respectively. The garbage collection task also has a request generation delay, set to be uniformly distributed between 1µs and 3µs.

Read scrubbing. The read scrubbing (RS) task also runs as its own stand-alone task. Victims are selected greedily based on the read count of a block: the block with the largest number of reads is chosen. Other than that, the process of copying valid data is identical to that of the garbage collection task. RS becomes active when the maximum read count of the system goes beyond a threshold, and stops once it falls below another threshold. The default threshold values for the activation and deactivation are set to 100,000 and 80,000 reads, respectively. Like the garbage collection task, the request generation delay (modeling the processing overhead of read scrubbing) is uniformly distributed between 1µs and 3µs.

4.3 Workload and Test Settings

We use both synthetic workloads and real-world I/O traces from Microsoft production servers [27] to evaluate the autonomic SSD architecture. Synthetic workloads of 128KB sequential accesses, 4KB random reads, and 4KB random read/writes are used to verify that our model behaves expectedly according to the system parameters.

From the original traces, the logical address of each host request is modified to fit into the 200GB range, and all the accesses are aligned to 4KB boundaries. All the traces are run for their full duration, with some lasting up to 24 hours and replaying up to 44 million I/Os. The trace workload characteristics are summarized in Table 2.

Table 2: Trace workload characteristics.
  Workload   Duration (hrs)   # of I/Os (M), write/read   Avg. request size (KB), write/read   Inter-arrival time (ms), avg/median
  DAP-DS     23.5             0.1 / 1.3                   7.2 / 31.5                           56.9 / 31.6
  DAP-PS     23.5             0.5 / 0.6                   96.7 / 62.1                          79.9 / 1.7
  DTRS       23.0             5.8 / 12.0                  31.9 / 21.8                          4.6 / 1.5
  LM-TBE     23.0             9.2 / 34.7                  61.9 / 53.2                          1.9 / 0.8
  MSN-CFS    5.9              1.1 / 3.2                   12.9 / 8.9                           4.9 / 2.0
  MSN-BEFS   5.9              9.2 / 18.9                  11.6 / 10.7                          0.8 / 0.3
  RAD-AS     15.3             2.0 / 0.2                   9.9 / 11.0                           24.9 / 0.8
  RAD-BE     17.0             4.3 / 1.0                   13.0 / 106.2                         11.7 / 2.6

Prior to each experiment, the entire physical space is randomly written to emulate a pre-conditioned state so that the storage would fall under the steady-state performance described in SNIA's SSS-PTS [2]. Furthermore, each block's read count is initialized with a non-negative random value less than the read scrubbing threshold to emulate past read activities.
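A minimal sketch of this pre-conditioning step is shown below: every physical page is written once in random order, and each block's read count is seeded with a random value below the read scrubbing activation threshold (100,000 reads). The structure and names are ours and purely illustrative of the procedure described above.

```python
# Illustrative pre-conditioning of the simulated SSD before each experiment.
import random

RS_THRESHOLD = 100_000         # read scrubbing activation threshold (Section 4.2)

def precondition(num_blocks, pages_per_block, write_page, rng=random):
    """Randomly write every physical page once, then seed per-block read counts
    with past read activity below the read scrubbing threshold."""
    pages = [(b, p) for b in range(num_blocks) for p in range(pages_per_block)]
    rng.shuffle(pages)                          # random write order over the whole space
    for block, page in pages:
        write_page(block, page)
    return [rng.randrange(RS_THRESHOLD) for _ in range(num_blocks)]

# Scaled-down example; a full run would use the geometry of Table 1
# (4 channels x 4 chips x 2 planes x 1024 blocks, 512 pages per block).
read_counts = precondition(num_blocks=64, pages_per_block=8,
                           write_page=lambda b, p: None)
print(max(read_counts) < RS_THRESHOLD)   # True: all below the RS activation threshold
```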

Figure 4: Performance under synthetic workloads. (a) 128KB sequential accesses. (b) 4KB random reads. (c) 4KB random reads/writes. Figure 4a shows the total bandwidth under 128KB sequential accesses with respect to changes in the channel bandwidth. Figure 4b shows the performance (average response time and 3 nines QoS) and the utilization of the flash memory subsystem with respect to changes in the size of the in-memory map cache. Figure 4c shows the performance (3 nines and 6 nines QoS) and the GC progress rate with respect to the GC share.

5 Experiment Results

This section presents experimental results under the configuration and workload settings described in the previous section. The main performance metric we report is the system response time seen at the I/O device driver. We first validate our SSD model using synthetic workloads, and then present experimental results with I/O traces. We replayed the I/O traces with the original request dispatch times, and with the dispatch times scaled down to increase the workload intensity.

5.1 Micro-benchmark Results

Figure 4 illustrates the performance of the autonomic SSD architecture (AutoSSD) with debit scheduling under four micro-benchmarks. Figure 4a plots the total bandwidth under 128KB sequential reads and 128KB sequential writes as we increase the channel bandwidth. As the channel bandwidth increases, the flash memory operation latency becomes the performance bottleneck. Write performance saturates early as the program latency cannot be hidden with data transfers. At 1000MB/s channel bandwidth, the read operation latency also becomes the bottleneck, unable to further extract bandwidth from the configured four channels. Traffic from GC and mapping management has a negligible effect for large sequential accesses, and the RS task was disabled for this run to measure the maximum raw bandwidth.

In Figure 4b, we vary the in-memory map cache size and measure the response times of 4KB random read requests when issued at 100K I/Os per second (IOPS). As expected, the response time is the smallest when the entire map is in memory, as it does not generate map read requests once the cache is filled after cold-start misses. However, as the map cache becomes smaller, the response time for host reads increases not only because it probabilistically stalls waiting for map reads from flash memory, but also due to increased flash memory traffic, which causes larger queueing delays.

Lastly, we demonstrate that the debit scheduling mechanism exerts control over FTL tasks in Figure 4c. In this scenario, 4KB random read/write requests are issued at 20K IOPS with a 1-to-9 read/write ratio. Both the response time of host read requests and the GC task's progress (in terms of the number of erases per second while active) are measured at fixed GC shares from 20% to 80%. As shown by the bar graph, more blocks are erased as the share for GC increases. Furthermore, with more GC share, the overall host performance suffers, as evidenced by the increase in the 3 nines QoS. Deceptively, however, assigning not enough share to GC will result in worse tail latency as shown by the 6 nines QoS. GC needs to produce a sufficient number of free blocks for the host to consume, and failure to do so will cause the host to block.

5.2 I/O Trace Results

Using I/O traces, we evaluate the performance of AutoSSD and compare it to the following three systems:

Vanilla represents a design without virtualization and coordination, and all tasks dispatch requests to the controller through a single pair of request/response queues.

RAIN [45] uses parity to reconstruct data when access to the chip is blocked by background tasks. Resources are arbitrated through fixed-priority scheduling, with host requests having the highest priority. This technique requires additional physical capacity to store parity data.

QoSFC [28] schedules using weighted fair queueing (WFQ) and represents a work-conserving system that does not reserve resources. It maintains virtual time as a measure of progress for each FTL task at each flash memory resource.

Figure 5: Comparison of Vanilla, RAIN, QoSFC, and AutoSSD under eight different traces. (a) Average response time. (b) Three nines QoS. (c) Six nines QoS. Results are normalized to the performance of RAIN. AutoSSD reduces the average response time by up to 18.0% under MSN-BEFS (by 3.8% on average), the 3 nines QoS by up to 67.2% under RAD-AS (by 53.6% on average), and the 6 nines QoS by up to 76.6% under RAD-AS (by 42.7% on average).

As the focus of this paper is response time characteristics, we only measure the performance of QoS-sensitive small reads (no larger than 64KB) in terms of the average response time, the 3 nines (99.9%) QoS figure, and the 6 nines (99.9999%) QoS figure.

Figure 5 compares the performance of the four systems under eight different traces. Compared to RAIN, AutoSSD reduces the average response time by up to 18.0% under MSN-BEFS as shown in Figure 5a. For the 3 nines QoS, AutoSSD shows improvements across most workloads, reducing it by 53.6% on average and as much as by 67.2% under RAD-AS (see Figure 5b). For the 6 nines QoS, AutoSSD shows much greater improvements, reducing it as much as by 76.6% under RAD-AS (see Figure 5c). Without coordination among FTL tasks, the Vanilla performance suffers, especially for the long tail latencies. In terms of the 6 nines, AutoSSD performs well under bursty workloads such as RAD-AS and LM-TBE (large difference between average and median inter-arrival times in Table 2). This is because AutoSSD limits the progress of internal FTL tasks depending on the state of the system, making resources available for the host in a non-work-conserving manner. This is in contrast to the scheduling disciplines used by the other systems: Vanilla uses FIFO scheduling; RAIN, priority scheduling; and QoSFC, weighted fair queueing.

To better understand the overall results in Figure 5, we microscopically examine the performance under RAD-AS in Figure 6 and LM-TBE in Figure 7. Figure 6a shows the average response time of three systems—RAIN, QoSFC, and AutoSSD—during a 10-second window, approximately 10 hours into RAD-AS. GC is active during this window for all the three systems, and both RAIN and QoSFC exhibit large spikes in response time. On the other hand, AutoSSD is better able to bound the performance degradation caused by an active garbage collection. Figure 6b shows the number of free blocks and the GC share during that window for AutoSSD. The sawtooth behavior for the number of free blocks is due to host requests consuming blocks, and GC gradually reclaiming space. The GC share is reactively increased when the number of free blocks becomes low, thereby increasing the rate at which GC produces free blocks. If the number of free blocks exceeds the GC activation threshold, the share decays gradually to allow other tasks to use more resources. In effect, AutoSSD improves the overall response time as shown in Figure 6c.

For LM-TBE, Figure 7a shows the average response time of the three systems during a 20-second window, approximately 15 hours into the workload. Here we observe read scrubbing (RS) becoming active due to the read-dominant characteristics of LM-TBE. We observe that both RAIN and QoSFC show large spikes in response time that last longer than the perturbation caused by GC for RAD-AS (cf. Figure 6a). While GC is incentivized to select a block with less valid data, RS is likely to pick a block with a lot of valid data that are frequently read but not frequently updated: this causes the performance degradation induced by RS to last longer than that by GC. AutoSSD limits this effect, while still decreasing the maximum read count in the system by dynamically adjusting the share of RS, as shown in Figure 7b. Figure 7c shows the response time CDF of the three systems.

Figure 8 illustrates the delay causes for the flash memory requests generated by the host request handling task under MSN-BEFS. Note that this is different from the response time of host requests: this shows the average wait time that a flash memory request (for servicing the host) experiences, broken down by different causes.
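For clarity, the tail-latency metrics used above can be read as percentiles of the response time distribution: the 3 nines QoS is the 99.9th percentile and the 6 nines QoS is the 99.9999th percentile. The snippet below shows one straightforward way to compute them from a list of measured response times; it is an illustration of the metric definition, not the paper's measurement code.

```python
# Compute average, 3 nines (99.9%), and 6 nines (99.9999%) QoS from response times.
def qos_metrics(response_times_ms):
    ordered = sorted(response_times_ms)
    n = len(ordered)
    def percentile(p):                       # nearest-rank percentile
        rank = min(n - 1, max(0, int(p * n) - 1))
        return ordered[rank]
    return {
        "average": sum(ordered) / n,
        "3 nines QoS": percentile(0.999),
        "6 nines QoS": percentile(0.999999),
    }

# Toy example; meaningful 6-nines figures need millions of samples (cf. 44M I/Os in Table 2).
import random
samples = [random.expovariate(1 / 0.2) for _ in range(1_000_000)]
print(qos_metrics(samples))
```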

Figure 6: Comparison of RAIN, QoSFC, and AutoSSD under RAD-AS. (a) Average response time under RAD-AS. (b) Change in GC share under RAD-AS. (c) Response time CDF under RAD-AS. Figure 6a shows the average response time sampled at 100ms in the selected 10-second window. Figure 6b shows the number of free blocks and the GC share of AutoSSD for the same 10-second window. Figure 6c plots the response time CDF for the entire duration.

Figure 7: Comparison of RAIN, QoSFC, and AutoSSD under LM-TBE. (a) Average response time under LM-TBE. (b) Change in RS share under LM-TBE. (c) Response time CDF under LM-TBE. Figure 7a shows the average response time sampled at 100ms in the selected 20-second window. Figure 7b shows the maximum read count and the RS share of AutoSSD for the same 20-second window. Figure 7c plots the response time CDF for the entire duration.

Figure 8: Breakdown of wait time experienced by flash memory requests under MSN-BEFS.

Figure 9: Comparison of AutoSSD with static shares and dynamic share under MSN-BEFS.

Category Flash represents flash memory latency, combining both flash array access latency and data transfers. Sched is the time spent waiting to be scheduled, either waiting in the queue because the target queue is full, or waiting because the scheduler limits the progress in a non-work-conserving manner (the case for AutoSSD). The large Sched wait time for Vanilla is caused by uncoordinated sharing of resources, while that for AutoSSD is small as the scheduler reserves resources for host requests. The remaining five categories are delays experienced due to resource blocking. Most noticeably, the wait time caused by GC in RAIN is higher than in the other systems. When RAIN generates alternate flash memory requests to reconstruct data through parity, these additional requests can, in turn, be blocked again at another resource. In ttFlash [45], this problem is overcome by statically limiting the number of active GCs to one per parity group. This technique is not used in our evaluation as a fixed cap on the number of allowed GCs can quickly deplete free blocks, especially for high-intensity small random write workloads.

Next, we examine the effectiveness of the dynamic share assignment over the static ones. Figure 9 shows the response time CDF of AutoSSD under MSN-BEFS with static shares of 5%, 10%, and 20% for GC, along with the share controlled dynamically. As illustrated by the gray lines, decreasing the GC share from 20% to 10% improves the overall performance.

However, when further reducing the GC share to 5%, we observe that the curve for 5% dwindles as it approaches higher QoS levels and performs worse than the 10% curve. This indicates that while a lower GC share achieves better performance at lower QoS levels, a higher GC share is desirable to reduce long-tail latencies as it generates free blocks at a higher rate, preventing the number of free blocks from becoming critically low. This observation is in accordance with the performance under the synthetic workload in Figure 4c. Using feedback control to adjust the GC share dynamically shows better performance over all the static values, as it can adapt to an appropriate share value by reacting to the changes in the system state.

5.3 I/O Trace Results at Higher Intensity

In this subsection, we present experimental results with higher request intensities. Here, the request dispatch times are reduced by half, but other parameters such as the access type and the target address remain unchanged. This experiment is intended to examine the performance of the four systems—Vanilla, RAIN, QoSFC, and AutoSSD—under a more stressful scenario.

Figure 10 compares the performance in the new setting. AutoSSD reduces the average response time by up to 24.6% under MSN-BEFS (see Figure 10a), the 3 nines QoS figure by 48.6% on average and as much as 70.6% under MSN-CFS (see Figure 10b), and the 6 nines QoS figure by as much as 55.3% under MSN-CFS (see Figure 10c). With workload intensity increased, the overall improvement in long tail latency decreases due to a smaller wiggle room for AutoSSD to manage FTL tasks. This is especially true for high-intensity workloads such as MSN-BEFS: with host requests arriving back-to-back (cf. halving the inter-arrival times in Table 2), debit scheduling has little advantage over other scheduling schemes. However, AutoSSD nevertheless outperforms prior techniques across the diverse set of workloads. Workloads such as RAD-AS and LM-TBE that showed the most reduction in long tail latency under the original intensity (cf. Figure 5c) still exhibit performance improvements with AutoSSD in the 6 nines, even with increased workload intensity.

Figure 10: Comparison of Vanilla, RAIN, QoSFC, and AutoSSD under eight different traces at 2x intensity. (a) Average response time at 2x intensity. (b) Three nines QoS at 2x intensity. (c) Six nines QoS at 2x intensity. Results are normalized to the performance of RAIN at 2x intensity. AutoSSD reduces the average response time by up to 24.6% under MSN-BEFS (by 4.9% on average), the 3 nines QoS by up to 70.6% under MSN-CFS (by 48.6% on average), and the 6 nines QoS by up to 55.3% under MSN-CFS (by 33.2% on average).

We examine DTRS more closely in Figure 11. Figure 11a shows the average response time of the three systems—RAIN, QoSFC, and AutoSSD—during a 20-second window, approximately 2 hours into the workload. GC is active during this window for all the three systems, and AutoSSD is better able to bound the performance degradation caused by an active garbage collection, while both RAIN and QoSFC exhibit large spikes in response time. Figure 11b shows the number of free blocks and the GC share during that window for AutoSSD. Similar to the results in the previous section, the share for GC reactively increases at a lower number of free blocks, and decays once the number of free blocks reaches a stable region. Again, the number of free blocks shows a sawtooth behavior, and the ridges of the GC share curve match the valleys of the free block curve. Figure 11c plots the response time CDF of the three systems, demonstrating the effectiveness of our dynamic management.

Figure 11: Comparison of RAIN, QoSFC, and AutoSSD under DTRS running at 2x intensity. (a) Average response time under 2x DTRS. (b) Change in GC share under 2x DTRS. (c) Response time CDF under 2x DTRS. Figure 11a shows the average response time sampled at 100ms in the selected 20-second window. Figure 11b shows the number of free blocks and the GC share of AutoSSD for the same 20-second window. Figure 11c plots the response time CDF for the entire duration.

6 Discussion and Related Work

There are several studies on real-time performance guarantees of flash storage, but they depend on RTOS support [7], specific mapping schemes [8, 39, 46], a number of reserve blocks [8, 39], and flash operation latencies [46]. These tight couplings make it difficult to extend performance guarantees when system requirements and flash memory technology change. On the other hand, our architecture is FTL implementation-agnostic, allowing it to be used across a wide range of flash devices and applications.

Some techniques focus on when to perform GC (based on threshold [31], slack [22], or host idleness [25, 37]). These approaches complement our design that focuses on the fine-grained scheduling and dynamic management of multiple FTL tasks running concurrently. By incorporating workload prediction techniques into our design, we can extend AutoSSD to increase the share of background tasks when host idleness is expected, and decrease it when host requests are anticipated.

Exploiting redundancy to reduce performance variation has been studied in a number of prior works. Harmonia [30] and Storage engine [42] duplicate data across multiple SSDs, placing one in read mode and the other in write mode to eliminate GC's impact on read performance. ttFlash [45] uses multiple flash memory chips to reconstruct data through a RAID-like parity scheme. Relying on redundancy effectively reduces the storage utilization, but otherwise complements our design of dynamic management of various FTL tasks.

Performance isolation aims to reduce performance variation caused by multiple hosts through partitioning resources (vFlash [43], FlashBlox [18]), improving GC efficiency by grouping data from the same source (Multi-streamed [24], OPS isolation [29]), and penalizing noisy neighbors (WA-BC [21]). These performance isolation techniques are complementary to our approach of fine-grained scheduling and dynamic management of concurrent FTL tasks.

The design of the autonomic SSD architecture borrows ideas from prior work on shared disk-based storage systems such as Façade [33], PARDA [14], and Maestro [34]. These systems aim to meet performance requirements of multiple clients by throttling request rates and dynamically adjusting the bound through feedback control. However, while these disk-based systems deal with fair sharing of disk resources among multiple hosts, we address the interplay between the foreground (host I/O) and the background work (garbage collection and other management schemes).

Aqueduct [32] and Duet [4] address the performance impact of background tasks such as data migration in disk-based storage systems. However, background tasks in flash storage are triggered at a much smaller timescale, and SSDs uniquely create scenarios where the foreground task depends on the background task, necessitating a different approach.

7 Conclusion

In this paper, we presented the design of an autonomic SSD architecture that self-manages concurrent FTL tasks in the flash storage. By judiciously coordinating the use of resources in the flash memory subsystem, the autonomic SSD manages the progress of concurrent FTL tasks and maintains the internal system states of the storage at a stable level. This self-management prevents the SSD from falling into a critical condition that causes long tail latency. In effect, AutoSSD reduces the average response time by up to 18.0%, the 3 nines (99.9%) QoS by up to 67.2%, and the 6 nines (99.9999%) QoS by up to 76.6% for QoS-sensitive small reads.

Acknowledgment

We thank the anonymous reviewers for their constructive and insightful comments, and also thank Jaejin Lee, Jongmoo Choi, Hyeonsang Eom, and Eunji Lee for reviewing the early stages of this work. This work was supported in part by SK Hynix and the National Research Foundation of Korea under the PF Class Heterogeneous High Performance Development (NRF-2016M3C4A7952587). The Institute of Computer Technology at Seoul National University provided the research facilities for this study.

References

[1] The DiskSim simulation environment version 4.0 reference manual (CMU-PDL-08-101). http://www.pdl.cmu.edu/PDL-FTP/DriveChar/CMU-PDL-08-101.pdf, 2008. Parallel Data Laboratory.

[2] Solid state storage (SSS) performance test specification (PTS) enterprise. http://snia.org/sites/default/files/SSS_PTS_Enterprise_v1.1.pdf, 2013. Storage Networking Industry Association.

[3] AGRAWAL, N., PRABHAKARAN, V., WOBBER, T., DAVIS, J. D., MANASSE, M. S., AND PANIGRAHY, R. Design tradeoffs for SSD performance. In USENIX Annual Technical Conference (2008), pp. 57–70.

[4] AMVROSIADIS, G., BROWN, A. D., AND GOEL, A. Opportunistic storage maintenance. In ACM Symposium on Operating Systems Principles (2015), pp. 457–473.

[5] ARITOME, S. NAND flash memory technologies. John Wiley & Sons, 2015.

[6] CAI, Y., HARATSCH, E. F., MUTLU, O., AND MAI, K. Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis. In Design, Automation and Test in Europe (2012), pp. 521–526.

[7] CHANG, L.-P., KUO, T.-W., AND LO, S.-W. Real-time garbage collection for flash-memory storage systems of real-time embedded systems. ACM Transactions on Embedded Computing Systems 3, 4 (2004), 837–863.

[8] CHOUDHURI, S., AND GIVARGIS, T. Deterministic service guarantees for NAND flash using partial block cleaning. In IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (2008), pp. 19–24.

[9] DEAN, J., AND BARROSO, L. A. The tail at scale. Communications of the ACM 56, 2 (2013), 74–80.

[10] DEMERS, A., KESHAV, S., AND SHENKER, S. Analysis and simulation of a fair queueing algorithm. In ACM SIGCOMM (1989), pp. 1–12.

[11] GAL, E., AND TOLEDO, S. Algorithms and data structures for flash memories. ACM Computing Surveys 37, 2 (2005), 138–163.

[12] GRUPP, L. M., CAULFIELD, A. M., COBURN, J., SWANSON, S., YAAKOBI, E., SIEGEL, P. H., AND WOLF, J. K. Characterizing flash memory: anomalies, observations, and applications. In IEEE/ACM International Symposium on Microarchitecture (2009), pp. 24–33.

[13] GRUPP, L. M., DAVIS, J. D., AND SWANSON, S. The bleak future of NAND flash memory. In USENIX Conference on File and Storage Technologies (2012), pp. 17–24.

[14] GULATI, A., AHMAD, I., WALDSPURGER, C. A., ET AL. PARDA: Proportional allocation of resources for distributed storage access. In USENIX Conference on File and Storage Technologies (2009), pp. 85–98.

[15] GUPTA, A., KIM, Y., AND URGAONKAR, B. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In ACM International Conference on Architectural Support for Programming Languages and Operating Systems (2009), pp. 229–240.

[16] HA, K., JEONG, J., AND KIM, J. An integrated approach for managing read disturbs in high-density NAND flash memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35, 7 (2016), 1079–1091.

[17] HAO, M., SOUNDARARAJAN, G., KENCHAMMANA-HOSEKOTE, D. R., CHIEN, A. A., AND GUNAWI, H. S. The tail at store: A revelation from millions of hours of disk and SSD deployments. In USENIX Conference on File and Storage Technologies (2016), pp. 263–276.

[18] HUANG, J., BADAM, A., CAULFIELD, L., NATH, S., SENGUPTA, S., SHARMA, B., AND QURESHI, M. K. FlashBlox: Achieving both performance isolation and uniform lifetime for virtualized SSDs. In USENIX Conference on File and Storage Technologies (2017), pp. 375–390.

[19] JIMENEZ, X., NOVO, D., AND IENNE, P. Wear unleveling: improving NAND flash lifetime by balancing page endurance. In USENIX Conference on File and Storage Technologies (2014), pp. 47–59.

[20] JIN, W., CHASE, J. S., AND KAUR, J. Interposed proportional sharing for a storage service utility. In ACM SIGMETRICS (2004), pp. 37–48.

[21] JUN, B., AND SHIN, D. Workload-aware budget compensation scheduling for NVMe solid state drives. In IEEE Non-Volatile Memory System and Applications Symposium (2015), pp. 1–6.

[22] JUNG, M., CHOI, W., SRIKANTAIAH, S., YOO, J., AND KANDEMIR, M. T. HIOS: A host interface I/O scheduler for solid state disks. In ACM International Symposium on Computer Architecture (2014), pp. 289–300.

[23] JUNG, M., AND KANDEMIR, M. Revisiting widely held SSD expectations and rethinking system-level implications. In ACM International Conference on Measurement and Modeling of Computer Systems (2013), pp. 203–216.

[24] KANG, J.-U., HYUN, J., MAENG, H., AND CHO, S. The multi-streamed solid-state drive. In USENIX Workshop on Hot Topics in Storage and File Systems (2014).

[25] KANG, W., SHIN, D., AND YOO, S. Reinforcement learning-assisted garbage collection to mitigate long-tail latency in SSD. ACM Transactions on Embedded Computing Systems 16, 5s (2017), 134.

[26] KATEVENIS, M., SIDIROPOULOS, S., AND COURCOUBETIS, C. Weighted round-robin cell multiplexing in a general-purpose ATM switch chip. IEEE Journal on Selected Areas in Communications 9, 8 (1991), 1265–1279.

[27] KAVALANEKAR, S., WORTHINGTON, B., ZHANG, Q., AND SHARDA, V. Characterization of storage workload traces from production Windows servers. In IEEE International Symposium on Workload Characterization (2008), pp. 119–128.

[28] KIM, B. S., AND MIN, S. L. QoS-aware flash memory controller. In IEEE Real-Time and Embedded Technology and Applications Symposium (2017), pp. 51–62.

[29] KIM, J., LEE, D., AND NOH, S. H. Towards SLO complying SSDs through OPS isolation. In USENIX Conference on File and Storage Technologies (2015), pp. 183–189.

[30] KIM, Y., ORAL, S., SHIPMAN, G. M., LEE, J., DILLOW, D. A., AND WANG, F. Harmonia: A globally coordinated garbage collector for arrays of solid-state drives. In IEEE Symposium on Mass Storage Systems and Technologies (2011), pp. 1–12.

[31] LEE, J., KIM, Y., SHIPMAN, G. M., ORAL, S., WANG, F., AND KIM, J. A semi-preemptive garbage collector for solid state drives. In IEEE International Symposium on Performance Analysis of Systems and Software (2011), pp. 12–21.

[32] LU, C., ALVAREZ, G. A., AND WILKES, J. Aqueduct: Online data migration with performance guarantees. In USENIX Conference on File and Storage Technologies (2002), pp. 219–230.

[33] LUMB, C. R., MERCHANT, A., AND ALVAREZ, G. A. Façade: Virtual storage devices with performance guarantees. In USENIX Conference on File and Storage Technologies (2003), pp. 131–144.

[34] MERCHANT, A., UYSAL, M., PADALA, P., ZHU, X., SINGHAL, S., AND SHIN, K. Maestro: quality-of-service in large disk arrays. In ACM International Conference on Autonomic Computing (2011), pp. 245–254.

[35] MEZA, J., WU, Q., KUMAR, S., AND MUTLU, O. A large-scale study of flash memory failures in the field. In ACM SIGMETRICS (2015), pp. 177–190.

[36] NAM, E. H., KIM, B. S. J., EOM, H., AND MIN, S. L. Ozone (O3): An out-of-order flash memory controller architecture. IEEE Transactions on Computers 60, 5 (2011), 653–666.

[37] PARK, S.-H., KIM, D.-G., BANG, K., LEE, H.-J., YOO, S., AND CHUNG, E.-Y. An adaptive idle-time exploiting method for low latency NAND flash-based storage devices. IEEE Transactions on Computers 63, 5 (2014), 1085–1096.

[38] PRINCE, B. Vertical 3D memory technologies. John Wiley & Sons, 2014.

[39] QIN, Z., WANG, Y., LIU, D., AND SHAO, Z. Real-time flash translation layer for NAND flash memory storage systems. In IEEE Real-Time and Embedded Technology and Applications Symposium (2012), pp. 35–44.

[40] ROSENBLUM, M., AND OUSTERHOUT, J. K. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems 10, 1 (1992), 26–52.

[41] SCHROEDER, B., LAGISETTY, R., AND MERCHANT, A. Flash reliability in production: The expected and the unexpected. In USENIX Conference on File and Storage Technologies (2016), pp. 67–80.

[42] SHIN, W., KIM, M., KIM, K., AND YEOM, H. Y. Providing QoS through host controlled flash SSD garbage collection and multiple SSDs. In International Conference on Big Data and Smart Computing (2015), pp. 111–117.

[43] SONG, X., YANG, J., AND CHEN, H. Architecting flash-based solid-state drive for high-performance I/O virtualization. IEEE Computer Architecture Letters 13, 2 (2014), 61–64.

[44] TSENG, H.-W., GRUPP, L., AND SWANSON, S. Understanding the impact of power loss on flash memory. In Design Automation Conference (2011), pp. 35–40.

[45] YAN, S., LI, H., HAO, M., TONG, M. H., SUNDARARAMAN, S., CHIEN, A. A., AND GUNAWI, H. S. Tiny-tail flash: Near-perfect elimination of garbage collection tail latencies in NAND SSDs. In USENIX Conference on File and Storage Technologies (2017), pp. 15–28.

[46] ZHANG, Q., LI, X., WANG, L., ZHANG, T., WANG, Y., AND SHAO, Z. Optimizing deterministic garbage collection in NAND flash storage systems. In IEEE Real-Time and Embedded Technology and Applications Symposium (2015), pp. 14–23.

[47] ZHENG, M., TUCEK, J., QIN, F., AND LILLIBRIDGE, M. Understanding the robustness of SSDs under power fault. In USENIX Conference on File and Storage Technologies (2013), pp. 271–284.
