Flashy Prefetching for High-Performance Flash Drives
Ahsen J. Uppal, Ron C. Chiang, and H. Howie Huang
Department of Electrical and Computer Engineering, George Washington University
{auppal, rclc, [email protected]}

Abstract—While hard drives hold on to the capacity advantage, flash-based solid-state drives (SSDs) with high bandwidth and low latency have become good alternatives for I/O-intensive applications. Traditional data prefetching has been designed primarily to improve I/O performance on hard drives. The same techniques, if applied unchanged on flash drives, are likely either to fail to fully utilize SSDs or to interfere with application I/O requests, both of which could result in undesirable application performance. In this work, we demonstrate that data prefetching, when effectively harnessing the high performance of SSDs, can provide significant performance benefits for a wide range of data-intensive applications. The new technique, flashy prefetching, consists of accurate runtime prediction of application needs and adaptive, feedback-directed prefetching that scales with application needs while being considerate to the underlying storage devices. We have implemented a real system in Linux and evaluated it on four different SSDs. The results show 65-70% prefetching accuracy and an average 20% speedup on LFS, web search engine traces, BLAST, and TPC-H like benchmarks across various storage drives.

I. INTRODUCTION

The spectrum of storage devices has expanded drastically in the last several years, thanks to the emergence of solid-state drives (SSDs) that are built upon NAND flash memory [6], [13]. As scientific and enterprise data usage continues to grow exponentially, new storage systems that leverage both the high performance of SSDs and the large capacity of hard drives (HDDs) will likely be in high demand to reduce the I/O performance gap.

While for many data-intensive applications moving "hot" data from HDDs to SSDs (disk swapping and data migration) can easily bring good speedups, in this paper we aim to achieve additional performance benefits from SSDs beyond the simple practice of disk replacement.

Data prefetching [37], [19] is one of the most widely used techniques to reduce access latency, because it can load data that is likely to be accessed soon from storage devices into main memory. Traditional prefetching techniques have focused on rotational hard drives and are conservative with the amount of data prefetched, for good reason: because data prefetching consumes shared system resources, aggressive prefetching is likely to interfere with normal accesses and subsequently hinder application performance. As a result, current techniques often leverage the low cost of sequential access on hard drives to read data that resides on the same and nearby tracks. Aggressive prefetching has been considered too risky by many researchers (given long seek penalties, limited HDD bandwidth, and limited system RAM), with one notable exception [36].

For flash-based SSDs, we believe that aggressive prefetching could potentially expedite data requests for many applications. However, as we will demonstrate shortly, simply prefetching as much data as possible does not provide the desired benefits, for three main reasons. First, data prefetching on faster devices such as SSDs, if uncontrolled, will take shared I/O bandwidth away from existing data accesses (more easily than on slower hard drives). As a side effect, useful cached data may be evicted while main memory fills with mispredicted (and unneeded) data as applications wait for useful data. Second, not every device is the same, and this is especially true for SSDs: the performance of an SSD can vary with its flash type (SLC/MLC), internal organization, memory management, etc. A prefetching algorithm that is reasonably aggressive for a faster drive could be too aggressive for another drive, slowing down normal execution. Last, not every application is the same; two applications often have different I/O requirements, and a single application can also go through multiple stages, each with different I/O requirements. Clearly, care should be taken to avoid adverse effects from both too-conservative and too-aggressive prefetching.

In this work, we propose the technique of flashy prefetching¹ for emerging flash devices, which is aware of the runtime environment and can adapt to the changing requirements of both devices and applications, as well as to its own runtime performance.

The salient features of flashy prefetching include not only taking advantage of the high bandwidth and low latency of SSDs, but also providing inherent support for parallel data accesses and feedback-controlled aggressiveness. To demonstrate its feasibility and benefits, we have implemented a real system called prefetchd in Linux that dynamically controls its prefetching aggressiveness at runtime to maximize performance benefits, by making good tradeoffs between data prefetching and resource consumption. Note that we use flashy prefetching and prefetchd interchangeably in this paper. We evaluate prefetchd on four different SSDs with a wide range of data-intensive applications and benchmarks. The prototype achieves average 20% speedups on LFS, web search engine traces, BLAST, and TPC-H like benchmarks across various storage drives, which we believe largely comes from the 65-70% prefetching accuracy.

¹ "Flashy: momentarily dazzling." (source: Merriam-Webster) Flashy prefetching aims to effectively harness "flashy", high-performance SSDs.

The main contributions of this paper are threefold:

• We conduct a comprehensive study of the effects of conservative and aggressive prefetching in the context of heterogeneous devices and applications. The results show that adaptive prefetching is essential for taking advantage of high-performance storage devices like solid-state drives.

• We design the architecture of flashy prefetching such that it self-tunes to prefetch data in a manner that matches application needs without being so aggressive that useful pages are evicted from the cache. Measuring performance metrics in real time and adjusting the aggressiveness accordingly significantly improves the effectiveness of this approach.

• We develop a Linux-based prototype, prefetchd, that monitors application read requests, predicts which pages are likely to be read in the near future, loads those pages into the system page cache while attempting not to evict other useful pages, monitors its success rate over time and across pages, and adjusts its aggressiveness accordingly (a sketch of this feedback loop follows the list).
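To make the monitor/predict/prefetch/adjust cycle in the last bullet concrete, the following is a minimal sketch of such a feedback loop; it is not the authors' prefetchd source. It assumes a simple sequential-pattern detector and uses posix_fadvise(POSIX_FADV_WILLNEED), a standard Linux mechanism for asynchronously pulling file pages into the page cache (the paper does not specify which kernel interface prefetchd uses). All names, thresholds, and the doubling/halving policy are illustrative assumptions.

    /* Hypothetical sketch of a feedback-directed prefetcher; NOT the
     * authors' prefetchd implementation.
     * Compile on Linux with: cc -std=c99 -D_POSIX_C_SOURCE=200112L -c sketch.c
     */
    #include <fcntl.h>      /* posix_fadvise, POSIX_FADV_WILLNEED */
    #include <sys/types.h>  /* off_t */

    #define BLOCK      (128 * 1024)  /* prefetch granularity: assumed 128 KB */
    #define MIN_DEPTH  1             /* most conservative setting */
    #define MAX_DEPTH  32            /* cap aggressiveness at 4 MB ahead */

    struct prefetcher {
        int   fd;      /* file whose reads we monitor */
        off_t last;    /* offset of the previous observed read */
        int   depth;   /* aggressiveness: how many blocks to read ahead */
        long  useful;  /* prefetched blocks the application later read */
        long  issued;  /* blocks submitted for prefetching */
    };

    /* Invoked for every read request observed from the application. */
    void on_read(struct prefetcher *p, off_t offset)
    {
        /* Score the prediction: a read landing inside the previously
         * prefetched window counts as a useful prefetch. */
        if (offset > p->last && offset <= p->last + (off_t)p->depth * BLOCK)
            p->useful++;

        /* Detect a sequential pattern: the next consecutive block. */
        if (offset == p->last + BLOCK) {
            /* Ask the kernel to pull the next 'depth' blocks into the page
             * cache; POSIX_FADV_WILLNEED initiates reads without blocking. */
            posix_fadvise(p->fd, offset + BLOCK,
                          (off_t)p->depth * BLOCK, POSIX_FADV_WILLNEED);
            p->issued += p->depth;
        }
        p->last = offset;
    }

    /* Invoked periodically: feedback control of the aggressiveness. */
    void adapt(struct prefetcher *p)
    {
        double accuracy = p->issued ? (double)p->useful / p->issued : 0.0;

        if (accuracy > 0.80 && p->depth < MAX_DEPTH)
            p->depth *= 2;   /* predictions are paying off: prefetch more */
        else if (accuracy < 0.50 && p->depth > MIN_DEPTH)
            p->depth /= 2;   /* wasting bandwidth and cache: back off */

        p->useful = 0;       /* reset counters for the next window */
        p->issued = 0;
    }

The multiplicative increase/decrease policy above is only one possible controller; the point it illustrates is that the prefetch depth is driven by measured accuracy at runtime rather than fixed in advance.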
Note that data prefetching on SSDs has drawn a lot of interest. For example, [46], [9] show that prefetching can be used to improve energy efficiency. Our positive results also demonstrate the performance potential of data prefetching. Another notable work, FAST [25], focuses on shortening application launch times and utilizes prefetching on SSDs for the quick start of various applications. Our approach expands on previous work along multiple dimensions, including employing a feedback-driven control mechanism, handling multiple simultaneous requests across processes, and requiring no application modifications. With a wider range of data-intensive applications in mind, the proposed prefetchd aims to improve the overall performance of generic applications.

The rest of the paper is organized as follows: Section II presents the challenges that prefetching must address, and Section III describes our design principles for addressing those challenges. Section IV presents the architecture of prefetchd and describes each individual component. Section V discusses the implementation in detail. The evaluation is presented in Section VI, and related work is discussed in Section VII. We conclude in Section VIII.

II. THREE CHALLENGES

A. Challenge #1: SSDs are different

SSDs are clearly different from HDDs in many ways. To name a few: no seek latency, excellent random read and write performance, inherent support for parallel I/O, expensive small writes, and limited erase cycles. We evaluate SSDs from two manufacturers (roughly covering two recent generations of SSDs): OCZ Vertex (SSD1) [34] and Vertex2 (SSD2) [35], and Intel X-25M (SSD3) [23] and 510 (SSD4) [24]. We also use a Samsung Spinpoint M7 hard drive (HDD) [39].

TABLE I: SPECIFICATION AND MEASURED PERFORMANCE

                       SSD1 [34]        SSD2 [35]        SSD3 [23]        SSD4 [24]
    Capacity           120 GB           120 GB           80 GB            120 GB
    Flash Type         MLC (34nm)       MLC (34nm)       MLC              MLC (25nm)
    Controller         Indilinx         SandForce        Intel            Marvell
    Read BW            250 MB/s (max)   285 MB/s (max)   250 MB/s (seq)   450 MB/s (seq)
    Write BW           180 MB/s (max)   275 MB/s (max)   100 MB/s (seq)   210 MB/s (seq)
    Latency            100 µs           100 µs           65 µs (read)     65 µs (read)
    Measured Read BW   170 MB/s         170 MB/s         265 MB/s         215 MB/s
    Measured Write BW  180 MB/s         203 MB/s         81 MB/s          212 MB/s

If we look at the specifications in the top half of Table I, the specification numbers for the SSDs are close, except for the Intel 510 SSD (which comes with SATA III support, but is limited by the SATA II interface in our test machines). However, the differences between different SSDs tend to be subtle, mostly in architectural design. Note that the same manufacturer may choose to adopt different controllers across two models, which coincidentally was the case for the SSDs chosen in this study. As shown in the bottom half of Table I, when measured under Linux, the four SSDs clearly have higher bandwidth than the hard drive (whose measured read bandwidth is about 90 MB/s); that is, the four SSDs outperform the hard drive by 189%, 189%, 294%, and 239%, respectively. More importantly, the four SSDs differ noticeably from one another; in particular, their measured write bandwidths range from 80 MB/s to 200 MB/s.
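As a quick arithmetic check (our reading; the paper does not spell out the formula), these percentages correspond to each SSD's measured read bandwidth taken as a ratio of the hard drive's roughly 90 MB/s:

\[
\frac{170}{90} \approx 1.89, \qquad
\frac{170}{90} \approx 1.89, \qquad
\frac{265}{90} \approx 2.94, \qquad
\frac{215}{90} \approx 2.39,
\]

i.e., 189%, 189%, 294%, and 239% of the hard drive's read bandwidth, matching the figures above.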
[Fig. 1. Application I/O Throughputs: measured throughput in IOPS (0-7000) for the benchmarks lfs, ws1, ws2, and db1-db7.]

B. Challenge #2: Applications are different

Although data-intensive applications are in dire need of high-performance data access, they tend to have different