Algorithm Designs and Implementations for Allocation in SSD and SSD Caching in Storage Systems

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Shuang Liang, B.E., M.E.

Graduate Program in Computer Science and Engineering

The Ohio State University

2010

Thesis Committee:

Xiaodong Zhang, Adviser

Feng Qin

© Copyright by

Shuang Liang

2010

ABSTRACT

The emerging flash memory based SSD technology poses a new challenge for effectively using it in storage systems. Despite its many advantages, which promise a performance breakthrough for the next generation of storage systems, the technical limitations of flash memory write operations call for innovative designs that can overcome the long latency of small random writes and the limited lifetime of flash memory cells.

Previous studies have attempted to address the write performance limitations by trading capacity at the firmware level such that the write access pattern can be transformed to avoid small random writes. Although this may improve the write performance, it does not utilize the data access locality information available at the system level to achieve optimal performance. In this thesis, we study techniques to improve the performance via system-level designs. We consider a major application scenario of flash SSDs – the SSD based disk cache. Based on a new disk cache model for

SSD based caches, we propose cache management algorithms which utilize the data access locality for write access pattern transformation with minimal cost to the cache hit ratio. We implemented the proposed algorithm in an SSD based second-level disk cache system. The results show that the proposed algorithm can significantly improve performance compared with existing cache management algorithms.

To maximize the lifetime of flash SSDs, we study techniques that can reduce the number of write operations, which consume the life-cycle of flash memory cells. Unlike previous approaches to flash memory life-cycle management, which balance the wear-level of flash memory

cells through data migration, we focus on reducing the internal data movement overhead introduced by flash memory SSD firmware to mask the long latency of small random writes. We found that the internal data movement overhead can be managed effectively by capacity over-provisioning based on the write request working-set size, and we show a theoretical analysis to illustrate the relationship between the minimal over-provisioning capacity and the page allocation policy to achieve zero data movement overhead. Based on these findings, two SSD firmware page allocation algorithms are proposed to improve the capacity efficiency. Our trace-driven simulation using the

Microsoft SSD extension of the well-known Disksim simulator shows that the proposed algorithms can significantly reduce the internal data movement overhead.

To My Family

ACKNOWLEDGMENTS

I am indebted to many great people.

First of all, I would like to thank my advisor, Prof. Xiaodong Zhang, for his patience and thoughtful advice during my graduate study. It is Prof. Zhang's help, motivation, support, and encouragement that have brought me to this point.

I would like to thank my research collaborators: Song Jiang, Ke Chen, Weikuan Yu, Ranjit

Noronha, and Prof. D.K. Panda. They have inspired and helped me in various aspects of my research. I am also very grateful to my lab-mates at The Ohio State University: Feng Chen,

Xiaoning Ding, Lei Guo, Rubao Li, Enhua Tan, and Qi Zhang. Their friendship and support have brought me joyful and unforgettable memories in these years.

I would like to thank Prof. Feng Qin for his advice as a committee member so that I can complete this thesis.

I am also grateful to Peg Steele and Nikki Strader. Their support and friendship provided me with an environment to be productive in research while I was working as a graduate administrative assistant.

Finally, I am indebted to my family for their unconditional love and support so that I can go through the toughest moments in these years.

VITA

1999 ...... B.E. in Computer Science, Huazhong University of Science and Technology

2002 ...... M.E. in Computer Science, Huazhong University of Science and Technology

2003-present ...... Graduate student in Computer Science and Engineering, The Ohio State University

FIELDS OF STUDY

Major Field: Computer Science and Engineering

PUBLICATIONS

Research Publications

Song Jiang, Xuechen Zhang, Shuang Liang, and Kei Davis. “Improving Networked Performance Using a Locality-Aware Cooperative Cache Protocol”. To appear in IEEE Transactions on Computers, 2010

Shuang Liang, Ke Chen, Song Jiang, and Xiaodong Zhang. “Cost-Aware Caching Algorithms for Distributed Storage Servers”. In Proceedings of 21st International Symposium on Distributed Computing (DISC’07)

Shuang Liang, Song Jiang, and Xiaodong Zhang. “STEP: Sequentiality and Thrashing Detection Based Prefetching to Improve Performance of Networked Storage Servers”. In Proceedings of 27th International Conference on Distributed Computing Systems (ICDCS’07)

Shuang Liang, Weikuan Yu, and D.K. Panda. “High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA”. In Proceedings of the 35th International Conference on Parallel Processing (ICPP’06)

Shuang Liang, Ranjit Noronha, and D.K. Panda. “Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device”. In Proceedings of IEEE International Conference on Cluster Computing (Cluster’05)

Weikuan Yu, Shuang Liang and D.K. Panda. “High Performance Support of Parallel Virtual File System (PVFS2) over Quadrics”. In Proceedings of 19th ACM International Conference on Supercomputing (ICS ’05)

Weikuan Yu, Ranjit Noronha, Shuang Liang, and D.K. Panda. “Benefits of High Speed Interconnects to Cluster File Systems: A Case Study with Lustre”. In Proceedings of the International Workshop on Communication Architecture for Clusters (CAC ’06), in conjunction with IPDPS ’06

TABLE OF CONTENTS

Abstract

Dedication

Acknowledgments

Vita

List of Tables

List of Figures

Chapters:

1. Introduction
   1.1 Rationale and Challenge of SSD based Disk Caches
   1.2 Challenge of Maximizing the Lifetime of Flash SSD
   1.3 Contribution of the Thesis
       1.3.1 Flash SSD based Disk Cache
       1.3.2 Maximizing the Lifetime of Flash SSD
   1.4 Thesis Organization

2. Background and Related Work
   2.1 SSD Basics
   2.2 Enhancing Flash SSDs via FTL
       2.2.1 Performance Optimization
       2.2.2 Managing the Life-cycles of SSDs
   2.3 Related Work
       2.3.1 Caching Models and Algorithms
       2.3.2 Flash Related Systems

3. Flash SSD based Disk Cache
   3.1 An SSD-based Caching Model
   3.2 Semi-Random Writes for SSD based Disk Cache
   3.3 Our SSD-based L2 Cache Design
       3.3.1 SSD Aware Placement
       3.3.2 Zone based Admission
       3.3.3 On Exclusive Cache
   3.4 Experimental Studies
       3.4.1 Simulation
       3.4.2 System Implementation
   3.5 Summary

4. Maximizing Flash SSD Lifetime
   4.1 Overview
   4.2 Capacity, Access Pattern, Performance and Wear
       4.2.1 Over-provisioning Capacity and Write Amplification
       4.2.2 Access Pattern Impacts
   4.3 Workload Measurements
       4.3.1 Experiment Environment
       4.3.2 Evaluation Results
   4.4 Write Amplification Rate Reduction Algorithms
       4.4.1 Algorithm Preliminaries
       4.4.2 WFWD Allocation Algorithm
       4.4.3 Predicting the Write Forward Distance
       4.4.4 Implementation in DiskSim-SSD
       4.4.5 Performance Evaluations
   4.5 Summary

5. Conclusions

Bibliography

LIST OF TABLES

3.1 Product Specification of SSDs

4.1 Workload Description

4.2 Parameters of DiskSim-SSD

4.3 Over-provisioning Capacity vs. Workload Storage Address Space Size

LIST OF FIGURES

1.1 SSD endurance/ECC vs. progress geometry (source: Micron at Memcon’09)

2.1 An example SSD internal organization (source: Microsoft at Usenix’08)

3.1 The SSD based two level cache architecture

3.2 Histogram of the evicted page address for MIN, LRU, and DULO replacement algorithms

3.3 Selection of new active placement blocks

3.4 Detailed Placement Algorithm

3.5 Zone based Admission Control Algorithm

3.6 Overall Performance of SSD based L2 cache

3.7 Detailed Performance of Placement and Admission Control

3.8 Effects of Exclusive Cache

3.9 α and Placement Block Size on Miss Ratio

3.10 Comparison of High-end and Low-end SSDs

3.11 The architecture of the system implementation of SSD based L2 cache in the kernel

3.12 Replaying the OLTP Traces on Real System for 3 Runs

3.13 Evaluation Results of TPC-H on MySQL

4.1 Sensitivity to Write Working-set based OPC

4.2 Sensitivity to Disk Capacity based OPC

4.3 State Transitions of Physical Pages in SSDs

4.4 An Example for Write Forward Distance based Allocation

4.5 Comparison of Algorithms (a)

4.6 Comparison of Algorithms (b)

CHAPTER 1

Introduction

The exponential growth of information in modern society has been constantly demanding the advancement of storage technology to provide better performance and larger capacity for various data intensive applications. In the past 50 years, the magnetic disk, as the major online system storage technology, has successfully kept up with the capacity requirements; however, it has not been able to catch up with the performance scaling trend of data processing speed. This fact has led to the well-known I/O bottleneck in computer systems, which is becoming increasingly severe. In the face of this performance challenge, storage technologies other than magnetic disks are being actively investigated [17]. NAND flash memory based solid state devices (SSDs) are emerging as one of the promising technologies. Compared with conventional magnetic storage devices,

NAND SSDs can improve the storage access latency by one to two orders of magnitude. Although their capacity is still catching up with that of magnetic disks, off-the-shelf NAND SSDs have already scaled to hundreds of gigabytes. This large capacity allows flash SSDs to be used in a wide range of applications, hence flash SSDs are becoming a major storage component in various general purpose systems. In the meantime, the trend also makes flash SSD related system optimizations an interesting research topic for system performance improvement.

In this thesis, we consider flash SSD optimization techniques in two aspects: (a) cache management algorithms to improve the performance of SSD based caches for magnetic disks; (b) methods to extend the lifetime of flash SSDs.

1.1 Rationale and Challenge of SSD based Disk Caches

Although flash SSDs have scaled to hundreds of gigabytes, their capacity scalability is limited due to several technical limitations. At the same time, magnetic disks are still scaling at about 40% per year [17, 48]. This implies that magnetic disk based storage systems can provide a much larger capacity than flash SSD based systems. However, in a typical storage system, the active working set in a certain period of time is much smaller than the total storage capacity. Using flash SSDs as disk caches, the storage system’s performance can be improved without compromising the large capacity that is enabled by the magnetic disks.

Using flash SSDs as disk caches can also provide a cost-effective storage solution for a given performance target. This is because the cost of SSDs is one order of magnitude higher than that of magnetic disks. However, the performance advantage of flash SSDs is dependent on the workload access patterns. For some workloads, using magnetic disks results in a better performance vs. cost ratio.

Since the access patterns of workloads change dynamically, using flash SSDs as disk caches can maximize the cost-effectiveness of a storage system. At the same time, the benefit of using flash

SSDs as disk caches cannot be replaced by a DRAM based disk cache, although the performance vs. cost ratio of DRAM may be higher than that of flash memory. This is because the effectiveness of a disk cache is highly dependent on the cache algorithm’s ability to predict the access pattern of workloads. For unpredictable workloads, increasing the capacity is the most effective way to increase the cache hit ratio. Given that the $/GB of DRAM is much higher than that of flash memory, an SSD based cache is irreplaceable in designing cost-effective storage systems.

A well-known issue of flash memory is the limitation of write operations. Specifically, the programming (write) speed is much slower than the read speed; furthermore, the small random write performance can be orders of magnitude slower than that of sequential writes. This leads to a caching model based on a non-uniform access latency to the caching device. However, existing cache management algorithms assume that the caching device is a fast device with very small access latencies, such that the cache loading cost is negligible. Since a flash SSD’s access latency depends on the access type (read or write) and the access pattern (sequential or random), new cache algorithms must be proposed that consider the workload access pattern to the caching device.

1.2 Challenge of Maximizing the Lifetime of Flash SSD

Due to technical limitations, each memory cell in a flash memory chip has only a limited number of programming cycles (also called endurance), such that each write operation reduces the remaining lifetime of the device. Moreover, as the memory cell density increases to keep up with the capacity scaling requirement, the number of life-cycles of a memory cell may decrease and the number of bits used for error correction (ECC) increases. This trend is shown in Figure 1.1. This poses another challenge for flash memory based storage systems, which is to minimize the total number of write operations on the flash chips.

In flash SSDs, the actual number of write operations performed can be much higher than the number of writes requested by the user due to the internal data movement overhead of flash SSDs.

Therefore, understanding such internal data movement overhead and designing algorithms to minimize it are critical to extending the lifetime of flash SSDs.


Figure 1.1: SSD endurance/ECC vs. progress geometry (source: Micron at Memcon’09)

1.3 Contribution of the Thesis

The contributions of this thesis include two aspects: cache management algorithms for flash

SSD based disk cache design and methods to maximize the lifetime of SSDs.

1.3.1 Flash SSD based Disk Cache

We made the following contributions to the design of SSD based disk caches. First, we have made a strong case for effectively optimizing SSD cache performance at the buffer cache level, particularly addressing the weakness of SSD write operations. We have proposed a cache placement algorithm based on a special write access pattern, which is able to perform well on various SSDs and independently of existing flash translation layer (FTL) optimizations. We show that the additional overhead of our placement algorithm due to the inherent asymmetric performance issues of SSDs can be kept minimal in comparison with a symmetric caching device. Second, we have proposed a space efficient admission control algorithm, which utilizes both spatial and temporal locality to minimize cache loads and improve system performance. Third, we have implemented our proposed algorithms as a module for page cache management in the Linux kernel, and

evaluated it with MySQL to demonstrate the performance benefits. To the best of our knowledge, this is the first research and development work to optimize SSD write performance at the OS level.

1.3.2 Maximizing the Lifetime of Flash SSD

In order to maximize the lifetime of flash SSDs, we made the following contributions: (a) we show empirical results on the relationship between the over-provisioned capacity of SSDs and the number of internal write operations for a variety of real server workloads. The results show that write-intensive workloads can be supported by SSDs with thin over-provisioning based on the write request working-set size; (b) we provide a theoretical analysis of the page allocation constraints that the SSD firmware should follow to achieve zero internal data movement overhead for a given over-provisioning capacity; (c) we propose page allocation algorithms to reduce the internal data movement overhead. Our algorithms consider both the online and offline cases of workloads;

(d) we implement our algorithms in the Disksim-SSD simulator and evaluate their performance on real workloads. The results show that the proposed page allocation algorithms can significantly improve the average response time and reduce the internal data movement overhead compared with existing algorithms.

1.4 Thesis Organization

The rest of this thesis is organized as follows. In Chapter 2, we discuss the technological characteristics of flash SSDs and related work on disk cache designs and flash memory based systems. In Chapter 3, we discuss a new storage cache model for the flash SSD based disk cache, and present the design, implementation and evaluation of a system based on this cache model.

In Chapter 4, we discuss the factors that are correlated with lifetime of flash memory SSDs, and present the measurement results based on real workloads trace-driven simulation to understand the impacts of these factors in real systems. Then we provide a theoretical analysis, and discuss the

derived algorithms to maximize the lifetime of flash SSDs and their evaluation results. Finally, in

Chapter 5 we conclude the thesis.

CHAPTER 2

Background and Related Work

2.1 Flash Memory SSD Basics

Flash memory is an electronic storage device based on metal-oxide-semiconductor field-effect transistors (MOSFET), and was invented in the 1980s. There are two types of flash memory chips:

NAND gate based and NOR gate based. Due to its better capacity scalability, NAND based flash has been widely used in various portable and embedded devices since the late 1980s. As the density of

NAND gates scales, NAND flash SSDs can currently provide storage capacities of several hundred gigabytes.

NAND flash SSDs interface with the host system in a similar way to magnetic disk drives, in which the storage space is presented as a linear block address space. The Flash Translation

Layer (FTL) implemented within the SSD firmware is responsible for translating logical addresses

to physical addresses. There are two types of NAND flash SSDs: multi-level cell (MLC) and

single-level cell (SLC). MLC stores multiple bits in a single memory cell, while SLC stores a

single bit of information per cell. Therefore, SLC SSDs provide better performance and reliability

than MLC SSDs, while MLC SSDs provide larger capacity and are more cost-efficient [25].

As flash SSDs are free of mechanical positioning, they have significant advantages for access

latencies over magnetic disk drives. However, the access latencies of flash memory are not uniform.

Specifically, the write (programming) latency is much longer than the read latency; furthermore, the small random write performance can be orders of magnitude slower than sequential writes. This is because flash memory requires that a data block must be erased before it can be re-programmed.

In other words, flash memory cell cannot be directly overwritten. The basic unit of an erasure operation is called an Erasure Block (EB), which is several tens to hundreds of times larger than the basic unit, page, of read and write operations. This adversely affects the write response time in two ways: (i) the latency of erasure operation must be included in the write response time; (ii) to update a page, other pages within the same EB must be read before the erasure, and be written (together with the updated page) back after the erasure, leading to write amplification for write operations.

For small writes, the impact is severe due to the long erasure latency and high write amplification ratio. For example, the Samsung K9XXG08UXM SLC flash memory chip has 64 pages in a single

EB. The erasure latency is about 1.5 milliseconds, while the per page write (programming) latency is about 0.2 milliseconds and the per page read latency is 0.025 milliseconds [1]. In the worst case, a single page update triggers 63 page reads, 1 block erasure, and 64 page writes, which in total costs around 79 times more than a single page programming operation. Therefore, in the worst case, the write latency to the flash memory can be comparable to or even larger than the typical disk access latency.
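To make the 79x figure explicit, a back-of-the-envelope calculation with the cited chip parameters (our own arithmetic, simply restating the numbers above):

T_worst = 63 × t_read + t_erase + 64 × t_write = 63 × 0.025 ms + 1.5 ms + 64 × 0.2 ms ≈ 15.9 ms ≈ 79 × t_write.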

In order to use flash SSDs as disk cache, the cache loading latency must be carefully managed to achieve the performance benefits.

A NAND flash memory SSD can be made of multiple chips in its internal organizations. As described in [1], an SSD can be made of multiple dies, and each die contains multiple planes as shown in Figure 2.1. Such organization may help increase the total bandwidth and capacity of flash

SSDs.

Figure 2.1: An example SSD internal organization (source: Microsoft at Usenix’08)

2.2 Enhancing Flash SSDs via FTL

FTL maps logical pages to physical pages on the flash memory chips. Such a mapping is essential to flash memory for both performance and reliability.

2.2.1 Performance Optimization

In order to improve the performance of small random writes, two techniques are commonly used in FTLs: log block based optimizations and log structured optimization, each with pros and cons [21, 1, 31].

In the log block approach, EBs are divided into two different kinds: data blocks and log blocks.

Data blocks are used for data storage, and are mapped at block granularity. Log blocks are a small number of blocks reserved by the FTL to store updated pages temporarily, and are mapped at page granularity. As the log blocks fill up, the updates are merged with the data blocks. This approach improves small random write performance if multiple updates in the log block belong to

the same EB, reducing the write amplification rate. It requires a smaller mapping table because the data blocks are mapped at block granularity.

The log structured approach is akin to the Log-Structured File System (LFS) [45]. It manages the logical to physical address mapping at page granularity. Incoming updates can always be appended to the most recently written EB to form a sequential write access sequence as long as space is available, while, at the same time, the FTL marks the original page as invalid since it is superseded by the updated page. When there is no EB available for appending, garbage collection is started to reclaim invalid pages. Garbage collection serves a purpose similar to the block merge in the log block approach. However, since garbage collection has the option of reclaiming the EBs with the least number of valid pages (smallest write amplification rate) among all the blocks, it has a better chance to achieve high performance. Due to its finer granularity mapping, this method requires a larger mapping table, which may offset its performance benefits. The mapping table is a random access data structure. Due to the cost constraints of SSDs, only a part of the mapping table can be buffered in the on-board RAM buffers, while the rest is accessed on demand from the flash memory cells. A large mapping table may degrade performance for table accesses, particularly the mapping table updates for write operations.
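To make the greedy reclamation idea concrete, the following C sketch (our own illustration with hypothetical structure names, not code from any actual FTL) selects the erasure block with the fewest valid pages as the garbage collection victim, which is the choice that copies the least data:

```c
#include <stddef.h>

#define PAGES_PER_EB 64          /* pages per erasure block (example value) */

struct erase_block {
    int num_valid;               /* live pages that must be copied if reclaimed */
    int is_active;               /* 1 if this EB is the current append target   */
};

/* Greedy victim selection: return the index of the EB with the fewest valid
 * pages among the non-active blocks, or -1 if none qualifies.  Reclaiming
 * this block copies the fewest pages, i.e. yields the smallest write
 * amplification for this round of garbage collection. */
static int pick_gc_victim(const struct erase_block *ebs, size_t n)
{
    int best = -1;
    int best_valid = PAGES_PER_EB + 1;

    for (size_t i = 0; i < n; i++) {
        if (ebs[i].is_active)
            continue;
        if (ebs[i].num_valid < best_valid) {
            best_valid = ebs[i].num_valid;
            best = (int)i;
        }
    }
    return best;
}
```

After the victim is chosen, its remaining valid pages are appended elsewhere and the block is erased, so the reclamation cost is exactly num_valid page copies plus one erasure.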

Despite these optimizations, the performance of flash SSDs is still access pattern dependent [6, 10]. Several recent detailed measurements on a variety of flash memory devices, including both high-end and low-end devices, have shown this dependency. It has been suggested that small random writes should be conducted in a focused area (meaning with strong spatial locality) for better performance. This illustrates that optimizing the SSD write patterns to reduce randomness is important for the various technologies used in specific SSD implementations.

2.2.2 Managing the Life-cycles of SSDs

The EBs of flash memory have limited erasure cycles in the range of 100K to 1M. As some blocks may be updated more frequently than other blocks for a workload, worn-out blocks must be managed to avoid failures. This is achieved by remapping out-of-cycle blocks to new locations reserved for such purposes.

An ideal way to manage the life cycle of memory cells is to keep the remaining cycles of memory cells as uniform as possible, such that the lifetime of an SSD can be maximized. Wear-leveling algorithms are designed for this purpose. The most commonly used technique in these algorithms is to migrate hot data from pages with fewer remaining life-cycles to pages with more remaining cycles, such as the algorithm studied in [1]. However, this may introduce additional write operations due to the data migration. Moreover, in order to implement wear-leveling and garbage collection algorithms, SSD designs have utilized over-provisioned capacity, which can be quite significant in some cases to achieve high performance. This makes the capacity, lifetime and performance of

flash SSDs closely related to each other. Understanding and managing all these aspects is critical to meeting the application requirements cost-effectively.

2.3 Related Work

Our work focuses on optimizing storage systems with NAND flash SSDs. Specifically, we consider a new caching model for flash SSD caches which accounts for the loading cost of the caching device; and we propose methods to maximize the lifetime of flash SSDs via capacity management, as well as new page allocation algorithms in flash SSD firmware to reduce the internal data movement overhead and improve the capacity efficiency. In this section, we discuss the work related to our studies.

2.3.1 Caching Models and Algorithms

Previous work on cache design assumes that the caching device is much faster than the device being cached, so the cache loading cost is trivial and can be neglected in the design. To the best of our knowledge, no previous work has investigated the impact of loading cost on cache performance. We discuss the existing work on cache algorithms here.

Cache management algorithms provide the basis for caches to work well, and are a fundamental issue for computer systems. A lot of studies have investigated this issue, and the locality principle was developed to guide cache algorithm designs [15]. The caching models studied in these early works follow the uniform block size and uniform fetch cost caching model. In this type of model, the cache hit ratio is the single metric that determines the cache performance. It was found that the optimal algorithm follows a simple rule: always replace the block in the cache whose next access is farthest in the future. In online systems instead, algorithms based on temporal locality, such as LRU, are shown to be the most effective [4]. Various algorithms based on this temporal locality principle have been developed to improve LRU by incorporating access frequency, specific access patterns, and application context into future access prediction [43, 28, 55, 12, 27, 39, 20].

In storage-aware caching models, the fetch cost is non-uniform. Although theoretical results provide a solution to this caching model, its complexity is O(kn²), where k is the cache size and n is the number of requests [13]. Due to its high complexity, both off-line and online heuristics have been proposed [53, 36]. These algorithms trade off hit ratio for the fetch costs of data blocks by keeping high cost but less recently used blocks in the cache while evicting low cost but more recently used blocks. For storage devices, such as disks (or flash memory), sequential access patterns are much more efficient than random accesses. Such a trade-off is mainly between hit ratio and access patterns [26, 31, 16].

The general caching model allows both the block size and the fetch cost to be variable. Unfortunately, it is known that determining the optimal performance is NP-hard, and several approximation algorithms have been proposed [8].

Several studies on multi-level caching are related to our work. Multi-level caches can maintain either the inclusion or the exclusion property. It was shown that exclusive cache can improve the hit ratio of storage caches due to the enlarged effective total cache size [11, 51, 3].

2.3.2 Flash Related Systems

Early work on flash memory considered managing flash memory based on ideas similar to log-structured file systems (LFS) [52, 29] to mask the long write latency due to the erasure operation before a memory cell can be re-programmed. This approach is also applied in flash memory based embedded file systems. In [18], the authors surveyed in detail the data structures, algorithms and systems for flash memory, and pointed out that most flash-specific file systems use the same overall principle of LFS to avoid in-place updates. The log structure is not only applicable to flash

file systems; it can also be used at the application level or the block driver level.

In the past years, several flash memory based file systems have been designed to store data in embedded devices [14, 54, 24, 23]. In these works, a log structured file system organization [45] is applied to improve flash memory write performance. Similar optimization techniques are also used in SSD FTLs [1].

In order to further optimize the write performance, methods to optimize the management of on-board write buffers have also been proposed. These studies showed that choosing, as the write buffer replacement victim, the pages whose corresponding EB has the largest number of buffered pages or sequential pages can improve the NAND flash write performance over the well-known LRU principle. However, these optimizations require that the buffered data demonstrate spatial locality to

fulfill the performance benefits. In summary, these optimizations can be considered as the storage-aware cache replacement algorithms discussed above.

In [6] and [10], the authors evaluated various SSDs in order to characterize the performance of existing SSDs on the market, and they found that the SSDs’ write performance is highly dependent on workload spatial locality even with the FTL optimizations. These results show that further research on optimizing SSD performance to reduce the workload impact is needed.

Recently, several projects led by industry have investigated the feasibility of integrating flash memory into operating systems. Microsoft ReadyBoost allows the system to use USB flash memory devices as part of the virtual memory [40]. Intel Turbo Memory (ITM) implements a system to use NAND flash as a non-volatile disk cache [38]. L2ARC is a recent Sun Microsystems implementation of an SSD based L2 cache in OpenSolaris [35]. Our research results can be applied in these systems.

Adopting SSDs in database systems has become an actively discussed topic in the database community recently. In [33], the authors propose an in-page logging method to reduce the write overhead for a DBMS using flash memory as storage. An evaluation of using SSDs in enterprise database applications was presented in [34]. In [32], the authors considered the data migration problem in a hybrid storage system of both SSDs and hard disk drives. In their work, they assume hard disks outperform SSDs for writes regardless of the access pattern, such that writes to SSDs should be avoided. In our design, we consider the performance impact that different access patterns make on the write latency, which models the SSD characteristics more accurately.

Due to the strong research interest in NAND flash SSDs recently, Microsoft has developed an SSD simulator for flash SSD related research [44]. In addition, they also evaluated the cost-effectiveness of migrating to flash based storage in enterprise environments [42]. The results showed that SSD-disk-cache based server solutions can lower the total storage system cost, while

an SSD replacement solution still needs SSD costs to be lowered by a factor of 3-3000 (depending on the workloads) to break even with a disk based solution. The energy saving benefits and reliability trade-offs of using flash memory as disk caches are studied in [30]. Building flash memory based storage systems for high performance computing systems is presented in [9].

CHAPTER 3

Flash SSD based Disk Cache

3.1 An SSD-based Caching Model

In a typical general purpose computer system, an SSD based disk cache is a second level (L2) disk cache. As shown in Figure 3.1, the first level (L1) disk cache is the DRAM based buffer cache, implemented within an operating system. On each L1 disk cache miss, the SSD based L2 disk cache is queried. If the L2 disk cache has the data, the data page is loaded into L1; otherwise, the request is forwarded to the storage system for the data access. On each L1 eviction, the data block is considered for loading into the L2 disk cache. If the L2 cache decides to admit the evicted data page, it is placed at a location chosen by the L2 cache management algorithm. In particular, when the L2 cache is full, an existing data page in the L2 cache must be replaced (or evicted) to admit the new data. If the replaced data page is clean, it can be silently dropped; otherwise, it is flushed to the storage before replacement.

In this model, the L1 cache is loaded on each miss. It admits all data pages and places them based on the replacement algorithm. In contrast, the L2 cache is loaded on L1 evictions. It admits data pages selectively based on its admission criteria. This multi-level caching model is consistent with real system architectures, where both the SSD and the storage are connected to the server system through system I/O controllers. Due to this organization, data flows between storage and

SSD must go through system memory, which hosts the L1 buffer cache. In order to distribute the data well among the levels of the cache hierarchy based on the locality strength of data blocks, it is natural to follow such a load-on-eviction design for the L2 cache.
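The load-on-eviction flow just described can be summarized by the following C sketch (the types and helper functions are our own illustrative interfaces, not the thesis's code; they are declared but left undefined here):

```c
typedef unsigned long addr_t;
typedef struct page page_t;

/* Lower-level helpers assumed to exist elsewhere (hypothetical interfaces). */
extern page_t *l1_lookup(addr_t a);
extern page_t *l1_insert(addr_t a, page_t *p);   /* may trigger on_l1_eviction() */
extern page_t *l2_lookup(addr_t a);
extern int     l2_admit(addr_t a);
extern page_t *l2_place(addr_t a, page_t *p);    /* returns the replaced page, if any */
extern int     page_is_dirty(const page_t *p);
extern page_t *storage_read(addr_t a);
extern void    storage_write(page_t *p);

/* Read path: L1 is loaded on every miss; L2 is only queried here, never loaded. */
page_t *read_page(addr_t a)
{
    page_t *p = l1_lookup(a);
    if (p)
        return p;                       /* L1 hit */
    p = l2_lookup(a);                   /* query the SSD based L2 cache */
    if (!p)
        p = storage_read(a);            /* L2 miss: go to the disks */
    return l1_insert(a, p);             /* always load L1 */
}

/* Eviction path: the L2 cache is loaded only from L1 evictions, and only
 * for pages its admission policy accepts. */
void on_l1_eviction(addr_t a, page_t *p)
{
    if (l2_admit(a)) {
        page_t *victim = l2_place(a, p);    /* placement chooses the location   */
        if (victim && page_is_dirty(victim))
            storage_write(victim);          /* flush a dirty replaced page      */
    } else if (page_is_dirty(p)) {
        storage_write(p);                   /* not admitted: write back to disk */
    }
}
```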

[Diagram: CPU and Main Memory / L1 Buffer Cache at the top; the SSD L2 cache is loaded on L1 misses and evictions; the Storage below is accessed on L2 misses and flushes.]

Figure 3.1: The SSD based two level cache architecture.

The overall performance of this multi-level cache depends on multiple factors, including the hit ratios of both L1 and L2 caches, average cache miss penalty at each level, and L2 cache loading overhead. The cache loading overhead is conventionally not considered by cache management algorithms, because most caches are load-on-miss, therefore the loading cost is equivalent to the miss penalty. However, in our multi-level caching model, L2 cache is loaded on evictions, where cache loading overhead is different from miss penalty.
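One way to make these factors explicit (our notation, not the thesis's): let h1 and h2 be the L1 and L2 hit ratios, t_L1, t_SSD and t_disk the respective access latencies, f the fraction of requests whose L1 eviction results in an L2 cache load, and C_load the average SSD write cost of one such load. Then the average cost per request is roughly

T ≈ h1 · t_L1 + (1 − h1) · [ h2 · t_SSD + (1 − h2) · t_disk ] + f · C_load,

where, unlike in a load-on-miss cache, the loading term f · C_load is separate from the miss penalty and depends mainly on the write pattern the placement algorithm generates on the SSD.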

We study L2 disk cache algorithms in this model. Although improving the performance of the existing L1 disk cache is part of the overall system performance goal, it has been studied extensively in previous works and is not the focus of this thesis.

17 3.2 Semi-Random Writes for SSD based Disk Cache

Although SSDs are designed as storage devices, when an SSD is used as a caching device, it follows a different contract from the storage device contract. The storage contract requires two responsibilities from an SSD drive: (i) any write request to the device must be written to the specified

(logical) device address; (ii) any read request to the device must fetch back the latest copy of the data written to the requested address. However, the cache contract is different: (i) the cache may serve a write request at any (logical) device address or even not serve it at all; (ii) it may serve a read request by responding “the data do not exist on the drive”. In other words, when the cache needs to load new data, it can choose any location to place the data to minimize the cache loading overhead, and it can overwrite any existing data if needed. This difference enables optimizations that transform the write accesses to SSD based disk caches. However, most existing cache management algorithms assume fast, random access caching devices, therefore their replacement policies do not consider the impact of placement locations on performance. In contrast, an SSD based cache must consider this placement cost, without compromising cache hit ratio, to optimize cache performance.

The principle for maximizing the cache hit ratio is to replace the data page that is least likely to be used in the near future, as the optimal off-line replacement algorithm MIN does. However, in practice, the data access sequence is not known in advance. Therefore, various heuristics are used to predict the likelihood of future accesses. For example, the LRU (Least Recently Used) algorithm re-orders the pages on each access to the cache. However, these algorithms may generate small random write patterns to the caching devices for the cache loads. Figure 3.2 shows the histogram of victim block locations of an OLTP trace for several representative cache replacement algorithms:

MIN is the optimal off-line algorithm; LRU is a widely used replacement heuristic in online systems; DULO is a recently proposed algorithm that considers sequentiality in replacement decisions

to reduce disk access cost. The histogram shows that all the algorithms exhibit randomness in the victim page addresses.

[Plots of evicted page address versus eviction sequence number, one panel each for (a) MIN, (b) LRU, and (c) DULO]

Figure 3.2: Histogram of the evicted page address for MIN, LRU, and DULO replacement algorithms.

Semi-random write is a write access pattern that allows randomness in the access pattern without introducing write amplification. A semi-random write sequence updates the pages within each EB sequentially, while it allows multiple write sequences to be interleaved. For example, the following sequence in the format of (EB-id, offset within EB) represents a semi-random write sequence:

(3, 0), (1, 0), (1, 1), (1, 2), (3, 1), (1, 3), .... This write access pattern allows randomness at the EB level, which provides a unique opportunity to design SSD-based cache placement algorithms. It can be shown that this access pattern does not amplify the number of writes for both kinds of FTL optimizations, if the firmware can detect the interleaved sequences and put them into different EBs on placement. In fact, such sequence detection algorithms have been proposed [37]. The overhead of implementing such an algorithm is small, especially when the number of interleaved streams is not huge.
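For illustration, the following C program (a hypothetical helper we wrote for this description, not part of the thesis code) checks whether a write sequence is semi-random in the sense defined above, i.e. the offsets within every EB are written strictly in order while writes to different EBs may interleave:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_EBS 1024                    /* EBs covered by this toy checker */

struct ssd_write { int eb; int offset; };

/* A sequence is semi-random if, within every EB, offsets are written
 * strictly in order 0, 1, 2, ..., while writes to different EBs may be
 * interleaved arbitrarily. */
static bool is_semi_random(const struct ssd_write *w, int n)
{
    static int next_offset[MAX_EBS];    /* next expected offset per EB */
    memset(next_offset, 0, sizeof(next_offset));

    for (int i = 0; i < n; i++) {
        if (w[i].eb < 0 || w[i].eb >= MAX_EBS)
            return false;               /* outside the modeled range     */
        if (w[i].offset != next_offset[w[i].eb])
            return false;               /* not sequential inside this EB */
        next_offset[w[i].eb]++;
    }
    return true;
}

int main(void)
{
    /* The example sequence from the text. */
    struct ssd_write seq[] = { {3,0}, {1,0}, {1,1}, {1,2}, {3,1}, {1,3} };
    printf("%s\n", is_semi_random(seq, 6) ? "semi-random" : "not semi-random");
    return 0;
}
```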

19 3.3 Our SSD-based L2 Cache Design

In our caching model, two new aspects, placement and admission, must be considered in the design. Placement considers where to store a data page in the cache; admission considers whether to admit a data page into the cache. Both aspects work to minimize the cache loading overhead without compromising the cache hit ratio, so as to improve overall system performance.

The EB-aware Admission Controlled (EBAC) framework aims to improve the SSD based cache performance by optimizing cache placement and admission control through utilizing the unique performance characteristics of SSDs. More specifically, our placement algorithm utilizes the semi-random write access pattern as discussed in Section 3.2, such that no small random writes exist in our cache load pattern, which prevents SSD performance degradation. The admission algorithm uses a zone based selection mechanism to load data blocks that are likely to be accessed in the future. Since it is able to utilize data access locality at a coarse management level, namely zones, it achieves high management space efficiency, such that it is practical to manage the data access locality for the entire storage address space. In Section 3.3.3, we also discuss the exclusive cache design and its effect on the SSD based L2 cache.

3.3.1 SSD Aware Placement

The main idea of our placement algorithm is to generate only semi-random write accesses for the cache loads (so that cache loading overhead is small), while still maintaining high cache hit ratio.

In our placement algorithm, the logical address space of an SSD is divided into placement blocks, each consisting of a fixed number of consecutive logical pages. At any time, an evicted page from the L1 cache is only allowed to be placed in certain placement blocks, and those blocks are called active

20 placement blocks (APB). For each active placement block, an offset in this block, called placement

point, is maintained.

For simplicity, we first discuss the case where there is only one active placement block (APB).

For an admitted page to the L2 cache, we simply place it to the placement point in this APB. If the

placement point has reached the end of the APB, we choose a new placement block as the APB.

Otherwise, we simply increment the placement point by one in the current APB (consequently, the

next page load will be placed right after the current one). For instance, assume that the number of

pages per placement block is four. The current placement point in the APB is 3. For a sequence

of three placement requests (326, 27, 18), the pages 326 and 27 are placed at the addresses 3 and 4 within the current APB, respectively. For page 18, however, we need to select a new APB and put it to the placement point there.

From the SSD drive's point of view, an L2 cache loading request (once it has been decided where to place it) is nothing more than a write access. It is clear from the above description that cache loads are written to the SSD sequentially in active placement blocks. Therefore, this method incurs considerably lower loading overhead, since, for SSD drives, sequential writes are significantly more efficient than random writes. The disadvantage is a slightly higher L2 cache miss ratio.

This is because when we select a new APB, the existing pages within the new APB need to be evicted, and the eviction may cause additional L2 cache misses, compared to the method of simply replacing the page least likely to be accessed soon. Fortunately, as will be demonstrated by the experiments, the benefit of lower loading overhead far exceeds this disadvantage; see Section 3.4 for details.

[Figure content: (a) five placement blocks of four pages each, labeled with page rank values and per-block rank sums (17, 39, 21, 31, 45); (b) the block with the lowest sum converted into the new APB, with the page of rank 11 reloaded and the new placement point set after it]

Figure 3.3: Selection of new active placement blocks.

Now, we describe how to choose a new active placement block (APB) when the current one is full. New APB selection is similar to traditional cache replacement algorithms, because selecting an APB is essentially selecting several pages together for replacement. Our algorithm assumes that there exists a ranking of all the pages in the L2 cache, such that the higher rank value a page has, the more likely it is to be accessed in the near future. Note that most cache replacement heuristics, such as the well-known LRU algorithm and its variants, approximate this property. For example, in LRU, all the pages are ranked by their access time stamps. A special case in our algorithm is the rank of invalid pages in the L2 cache. Invalid pages are those superseded by later updates from the L1 cache.

We assign 0, the lowest rank value, to invalid pages, since they will never be accessed. Let the rank of a placement block be the sum of the rank values of the pages in this block. Then the placement block with the lowest rank value is our choice for the new APB. Intuitively, this algorithm selects the placement block consisting of pages that (as an aggregate) are the least likely to be accessed in the near future. However, it is still possible that the selected APB contains some pages with high rank values. Replacement of these high rank pages may increase the miss ratio. In order to control this, we introduce a parameter α, where 0 ≤ α < 1. When a new APB is selected, we will keep the existing pages (in this APB) that have the top α percentile rank values. For example,

22 assume we have 5 placement blocks, each consisting of 4 pages, and 17 valid pages are resident in

the L2 cache. As shown in Figure 3.3 (a), the resident pages are marked by their ranks. Suppose

that α is 50%. According to our algorithm, any page having a rank value between 8 and 17 should

be kept, since we must keep the top 50% of all the pages. Therefore, the first placement block is

selected as the new APB; the page with rank 11 is reloaded into this APB (because it is the only

page that has a rank value between 8 and 17 within this placement block), and the placement point

becomes 2; see Figure 3.3 (b). More details are described in Algorithm 1, shown in Figure 3.4.

Remark. As described above, when we choose a new active placement block, we need to

reload some high rank pages already residing on this APB, and evict others. The parameter α

controls how many pages may need to be reloaded. Indeed, suppose an APB contains m pages,

then it is easy to show that at most αm high rank pages need to be reloaded. Note that the more

high rank pages are reloaded to the new APB, the less space APB has for future cache loading, and

the sooner another new APB is needed. On the other hand, however, the more high rank pages are

reloaded, the less negative impact is caused to the L2 cache hit ratio. As such, it seems desirable to

seek a good trade-off. Fortunately, the above analysis tends to consider worst cases; in practice, the

overall L2 cache performance is not very sensitive to the choice of α. See Section 3.4 for further

discussions.

As the final issue in the algorithm, two active placement blocks (APB) are used – one is for the

pages that tend to be updated later, and another is for the pages that are likely to be read-only. Note

that, when a newly admitted page is an update of an existing page in the L2 cache, the new copy

is loaded to a new location (the placement point), while the original copy is marked as invalid.

Since invalid pages waste L2 cache space, we intend to “cluster” invalid pages together, so that, according to our APB selection algorithm, the space occupied by invalid pages can be reclaimed sooner. To this end, we use two separate APBs: when an admitted page is dirty, we put it into one

Algorithm 1 placement algorithm
Input: x - the data page to be loaded;
Input: timestamp - the current timestamp;
/* executed on each data page load */
Procedure placement(x, timestamp)
  if x is clean and a copy exists
    x_existing.timestamp ← timestamp;
    return;
  if x is clean and no copy exists
    load x to the APB for clean pages;
  if x is dirty and an old copy exists
    invalidate x_existing;
    load x to the APB for dirty pages;
  if x is dirty and no old copy exists
    load x to the APB for dirty pages;
  x.timestamp ← timestamp;
  if the current active placement block is not full
    increment the placement point on the APB by one;
  else
    choose the placement block with the lowest rank value as the new APB;
    for all pages p on the new APB do
      if p is dirty and p.rank is NOT among the top α percentile rank values
        flush p to the storage
      if p.rank is among the top α percentile rank values
        reload p to the new APB;
  return

Figure 3.4: Detailed Placement Algorithm

APB (in the hope that they may be updated again); otherwise, it is clean, and we put it into another

APB (in the hope that they are read-only pages). The rationale here is that, as observed in practice, recently updated data (namely, dirty pages) are more likely to be updated again [19]. We should point out that, since there are two APBs and the write accesses are sequential within each APB, the write pattern is a semi-random write. Therefore, as discussed in Section 3.2, since semi-random writes are almost as efficient as sequential writes, utilizing two APBs won't increase the cache loading cost much.
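The APB selection and the α-controlled retention can be sketched in C as follows (a simplification of Algorithm 1 with our own names; dirty-page flushing and the global rank bookkeeping are omitted):

```c
#include <limits.h>

#define PAGES_PER_PB 4                  /* pages per placement block (example) */

/* One placement block; rank[i] is the rank of the page in slot i, with 0
 * meaning the slot holds an invalid page.  Higher rank means more likely
 * to be accessed soon, as assumed in the text. */
struct placement_block {
    int rank[PAGES_PER_PB];
};

/* Step 1: pick the block with the smallest aggregate rank as the new APB. */
static int choose_new_apb(const struct placement_block *pb, int nblocks)
{
    int best = 0, best_sum = INT_MAX;
    for (int b = 0; b < nblocks; b++) {
        int sum = 0;
        for (int i = 0; i < PAGES_PER_PB; i++)
            sum += pb[b].rank[i];
        if (sum < best_sum) {
            best_sum = sum;
            best = b;
        }
    }
    return best;
}

/* Step 2: keep only the resident pages whose rank reaches the cut-off for
 * the global top-alpha percentile (computed elsewhere over all resident
 * pages); everything else is evicted.  Returns the number of pages kept;
 * the placement point is then set to the first free slot. */
static int rebuild_apb(struct placement_block *apb, int rank_cutoff)
{
    int kept = 0;
    for (int i = 0; i < PAGES_PER_PB; i++)
        if (apb->rank[i] >= rank_cutoff && apb->rank[i] > 0)
            apb->rank[kept++] = apb->rank[i];
    for (int i = kept; i < PAGES_PER_PB; i++)
        apb->rank[i] = 0;               /* free slots for future cache loads */
    return kept;
}
```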

3.3.2 Zone based Admission

The performance of the SSD L2 cache can be further improved by admission control, which allows cache loads only for data pages that are likely to be requested in the future, such that the number of unnecessary cache loads can be minimized (recall that an SSD cache load is relatively expensive).

The idea is to use past access history of a data page to speculate its future access probability. The challenge here is that we have to keep meta-data for all data pages in the storage address space

(rather than in the L2 cache address space only). Thus, it requires a highly space-efficient design.

We propose a zone based admission control algorithm to achieve this goal.

Note that existing cache replacement algorithms cannot be directly applied for admission control. This is because these algorithms decide the future access probability based on the temporal property of data pages, such as recency. Since the data pages under admission consideration have just been evicted from the L1 cache, they are among the most recently accessed data. Therefore, by those algorithms, all evicted pages should be admitted, leading to essentially no admission control.

We use both spatial locality and recent access history to set admission criteria. In our algorithm, the storage space is divided into units called zones. Each zone is a continuous range of the storage address space, and may contain multiple data pages. Typically, a zone is much larger than a

25 placement block or a data page. An example of the sizes of these units is as follows: suppose that a data page is 4K and the placement block is 128K, the zone size can be set as 1M. Therefore, a zone contains 256 data blocks or 8 placement blocks. Each zone maintains a counter and two vectors. The counter records the total number of L1 cache evictions of the data pages in this zone.

The hit vector is a shift bit vector recording the most recent L2 cache hit/miss history of the data pages in the zone. For example, if the most recent access of a zone is (hit, miss, miss, hit) on

L2 cache, then the hit vector stores (1001). The access vector is a bit vector recording data pages accessed recently. For example, if data pages 0 and 4 are accessed in this zone, the access vector stores (10001). The space overhead for this history information is small. For example, for a 1TB storage space, if the hit vector is 64 bits, the hit vectors take 8MB in total. The access vector uses 1 bit per data page, so the access vectors take 32MB. The 8-bit counters need 1MB. In total, it costs only

0.0041% of the total storage space, and this space requirement can be well accommodated by the memory system of a typical 1TB data server.
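For illustration, the per-zone bookkeeping could look like the following C struct (field widths chosen to match the example sizes above; a sketch, not the actual implementation):

```c
#include <stdint.h>

#define PAGES_PER_ZONE 256              /* 1MB zone / 4KB data pages */

/* Per-zone metadata for zone based admission control.  The information
 * content is 8 (hit vector) + 32 (access vector) + 1 (counter) = 41 bytes
 * per zone; with about one million 1MB zones on a 1TB volume this is
 * roughly 41MB, matching the 0.0041% figure computed in the text
 * (struct padding ignored). */
struct zone_meta {
    uint64_t hit_vector;                         /* recent L2 hit/miss history  */
    uint8_t  access_vector[PAGES_PER_ZONE / 8];  /* 1 bit per data page         */
    uint8_t  evict_count;                        /* L1 evictions from this zone */
};
```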

Based on this information, each zone is either hot or cold. A zone is hot under the following conditions: (i) the eviction count from this zone is less than a threshold; or (ii) the number of 1's in the hit vector is greater than a lower bound; or (iii) its access vector has more 1's than that of a hot zone. Condition (i) allows a small percentage of data pages in each zone to be admitted to test its hotness; (ii) allows data pages to be admitted from zones with a high L2 cache hit history; (iii) allows a zone previously marked as cold to be re-tested if it has more accesses than an existing hot zone. Note that, in case (iii), we simply reduce the eviction counter of this zone to be slightly lower than the admission threshold, which gives another opportunity for this zone to be tested. The zones that do not pass the above conditions are marked as cold. Our admission control algorithm only admits data pages from a hot zone. See Algorithm 2, shown in Figure 3.5, for details.

Algorithm 2 Zone based admission
Input: x - the data page requested; z - the zone x belongs to.
/* executed on each request for a data page. */
Procedure record_history(x, z)
  if x is in the L2 cache
    hit ← 1
  else
    hit ← 0
  z.hit_vector ← (z.hit_vector << 1) + hit
  i ← block offset of x in z
  z.access_vector[i] ← 1
  return

Input: x - the data block evicted; z - the zone x belongs to.
Output: 1 - hot zone; 0 - cold zone.
/* executed on each eviction of the L1 cache. */
Procedure mark_status(x, z)
  z.count++
  if (z.count < Threshold.count)
    return 1
  if (ones(z.hit_vector) ≥ Threshold.hits_low)
    sample_z ← z;
    return 1
  if ones(z.access_vector) ≥ ones(sample_z.access_vector)
    z.count ← Threshold.count * β, where 0 ≤ β ≤ 1 is a pre-specified parameter;
    z.access_vector ← 0
    return 1;
  return 0;

Figure 3.5: Zone based Admission Control Algorithm

The zone based admission control is unique in several aspects. First, given a data page, the algorithm uses the zone meta-data to infer its future access likelihood. This method effectively utilizes the spatial locality to infer temporal locality for access prediction, which enables our algorithm to realize admission control with high management space efficiency. Second, for each zone, the algorithm uses its hit vector as a sliding window to maintain its recent cache hit/miss information. This (small) sliding window effectively captures the intuition that recent history is important for prediction. Finally, the algorithm has a mechanism to re-test the temperature of a zone by considering zone access frequency. This enables the algorithm to detect cases where a cold zone becomes hot.

3.3.3 On Exclusive Cache

For L2 caches, exclusiveness refers to the property that L2 cache does not contain any data resident in L1 cache. It has been a commonly used technique to improve performance [51, 11, 3].

This is because a page recently requested by the L1 cache will not be requested again before it is evicted by the L1 cache. According to this rule, as data pages in the L2 cache are requested by the L1 cache, they should be evicted from the L2 cache to save space for other data pages. Such a design is based on the assumption that the reloading overhead is small, so that the performance gain from cache hit ratio improvement is greater than the performance loss from potential reloads for the same data page in the future. In the case of an SSD based cache, those assumptions may not hold true for two reasons. First, compared with the DRAM based L1 cache, the SSD based L2 cache can be tens of times larger considering their price differences. Therefore, maintaining exclusiveness can only enlarge the effective total cache size of both L1 and L2 by a small percentage. Second, reloading requested pages can significantly increase the number of cache loads to the SSD based cache, especially when the workload hit ratio is high, which can offset the small benefit gain from

28 exclusive caching. Based on above considerations, in our L2 cache design, we do not follow this exclusiveness rule.

3.4 Experimental Studies

To study the performance of the SSD based L2 cache, we conduct extensive experiments using both trace driven simulation and system implementation to evaluate our designs. The simulation approach can provide detailed statistics to analyze the impact of each system component, and the system implementation demonstrates the realistic performance effects in the real world. In our evaluation, we studied the performance of existing symmetric cache management algorithms and our proposed placement and admission algorithms. We also compare the system performance between a disk-only system and an SSD-based L2 cache system. Our evaluation uses industry standard benchmarks such as the TPC-H benchmark [49] and OLTP traces collected by SPC from two large financial institutions [47]. The TPC-H benchmark is a decision support workload consisting almost entirely of data reads. The OLTP workloads are on-line transaction processing workloads. The first

OLTP trace is dominated by write requests; while the second trace contains both reads and writes.

We use two different SSD devices in our evaluations. The Intel X25-E is a relatively high-end SSD device. The Patriot PE32GS25SSDR 2.5” 32GB drive is a relatively low-end SSD device. The public performance specifications are listed in Table 3.1.

Our host system is a Dell PowerEdge SC440 server equipped with 2.8GHz dual Pentium D processors and 2GB main memory. The disk drives are 80GB and 160GB Caviar

SATA drives [50], where the 80GB drive is hosting the system image, and the 160GB drive is used for testing.

                  Patriot          Intel X25-E
Sequential Read:  up to 175MB/s    up to 250MB/s
Sequential Write: up to 100MB/s    up to 170MB/s
4K Read:          N/A              35000 IOPS
4K Write:         N/A              3300 IOPS
Interface:        SATA I/II        SATA II

Table 3.1: Product Specification of SSDs

3.4.1 Simulation

We implemented a multi-level cache simulator using our caching model as shown in Figure 3.1.

The L1 cache is a DRAM based page cache, which is implemented with existing cache replacement algorithms such as LRU. The L2 cache is an SSD-based page cache, which is implemented with the proposed EBAC algorithm. For the purpose of comparison, our simulator also supports two other schemes. The SSD unaware scheme, denoted as "UA", uses the SSD as a random access caching device as in existing systems. The exclusive caching scheme, denoted as "Ex", follows the exclusive caching rule. We use real SSD and disk drives in our simulator; therefore, the performance results include the effects of internal SSD FTL and disk firmware optimizations. On each L1 miss and L2 load, we use the O_DIRECT flag to access the disk and SSD for data transfers.
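
For reference, the snippet below sketches how such a block transfer can be issued with the O_DIRECT flag from user space. The device path, transfer size, and alignment are assumptions for illustration; O_DIRECT requires the buffer, offset, and length to be aligned to the device's logical block size.

```c
/* Minimal sketch of a page-cache-bypassing read, as performed on each L1 miss
 * and L2 load in our simulator.  Path and sizes are illustrative only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t blk = 4096;                          /* assumed transfer size      */
    void *buf;

    if (posix_memalign(&buf, 512, blk) != 0)          /* O_DIRECT needs aligned I/O */
        return 1;

    int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);   /* hypothetical SSD device    */
    if (fd < 0)
        return 1;

    ssize_t n = pread(fd, buf, blk, 0);               /* read one block at offset 0 */

    close(fd);
    free(buf);
    return n == (ssize_t)blk ? 0 : 1;
}
```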

The OLTP traces used are from Storage Performance Council (SPC). We name the two trace

files collected from two large financial institutions as OLTP1 and OLTP2. We also collected the

I/O traces from running the TPC-H benchmark with 21 queries (excluding query 18 due to its long running time) on MySQL/InnoDB using the Linux blktrace tool. In order to involve all three levels of the memory hierarchy (L1, L2, and disks) in our experiments, we set the size of each higher level as a percentage of the level below it and have the hard disk hold the entire dataset. For all the workloads, the L1:L2 ratio is 1:16. For all cases, the L1 cache size is 1.6% of the total working set.

Overall Performance

[Bar chart of execution time (s) for the Disk, UA, and EBAC configurations on workloads O1, O2, and TPC-H.]

Figure 3.6: Overall Performance of SSD based L2 cache

Figure 3.6 shows the overall trace replay time for the two OLTP traces and the TPC-H trace in three different cases:

(1) “disk” is the case where no SSD based L2 cache is used; (2) “UA” uses existing SSD-unaware algorithms; (3) “EBAC” uses our proposed design. The results show that using SSD-based L2 cache in the system can significantly improve the performance of a disk-only system when the L2 cache management algorithm is designed to accommodate the performance characteristics of the

SSDs. For example, the existing SSD unaware algorithms can only improve the performance of

OLTP1 by less than 4.7%, while the EBAC algorithm can improve the performance by 61.3%. For

OLTP2, the existing SSD unaware algorithm improves the performance by 53.6%, while EBAC improves by 76.2%. Finally, for the TPC-H trace, the existing SSD unaware algorithm improves by 44.5%, while EBAC improves the system performance by 55.6%. The results show that EBAC consistently outperforms the existing cache algorithms.

Examining the detailed statistics of the results shows that the performance benefit of EBAC comes from its ability to trade cache hit ratio for a more friendly write access pattern. For example, in terms of total number of disk accesses, EBAC is 2.6% higher than "UA" for OLTP1, while the total number of cache loads of EBAC is only 62.7% of "UA". Overall, the performance improvement using EBAC is 56.6% higher than "UA". This demonstrates the importance of reducing cache loads and using an SSD-friendly loading sequence rather than focusing only on reducing disk accesses.

The Effects of Placement and Admission Control

[Bar chart of execution time (s) for the UA, PL, UA+Zone, and PL+Zone configurations on workloads O1, O2, and TPC-H.]

Figure 3.7: Detailed Performance of Placement and Admission Control

Figure 3.7 shows the decomposed performance of EBAC by configuring placement and admission control individually. In the figure, "PL" represents placement, and "Zone" represents zone based admission control. The results show that both components contribute to the overall performance advantage over "UA", and each alone may significantly improve the performance. For example, placement alone improves the performance by 52.6%, 41.9% and 18.9% for the three workloads, respectively; admission control alone improves the performance by 46.2%, 22.3%, and 0.66%, respectively. This again confirms the importance of placement and admission control to SSD based cache performance.

The Effects of Exclusive Cache

[Bar chart of execution time (s) for the UA, UA+Ex, EBAC, and EBAC+Ex configurations on workloads O1, O2, and TPC-H.]

Figure 3.8: Effects of Exclusive Cache

Figure 3.8 shows the effects of using exclusive cache for SSD based L2 cache. The results show that exclusive cache causes additional cache loads and can degrade performance. For example, for “UA”, the exclusive cache scheme degrades the performance by 9.4%, 54.0%, and 39.6% for

OLTP1, OLTP2, and TPC-H, respectively. Correspondingly, the additional cache loads are 3.8%,

98% and 145.8%. For EBAC, with the exclusive cache design, it is even harder to maintain data

blocks of similar temporal locality within the same placement block due to invalidation of data

blocks upon requests. Therefore, not only are additional cache loads involved, but the cache miss ratio also increases, leading to performance degradation.

α and Placement Block Size on Miss Ratio

We discussed in Section 3.3.1 that in our placement algorithm, the overhead due to the asymmetric read/write performance characteristics of SSDs can be influenced by α. However, it remains to be studied how α affects the hit ratio in reality. In addition, although the placement block size

[Line chart of miss ratio versus α (from 90% down to 30%) for OLTP1, OLTP2, and TPC-H with placement block sizes of 4KB, 32KB, and 256KB.]

Figure 3.9: α and Placement Block Size on Miss Ratio

should be set as large as the EB size of the SSD, the actual value that maximizes the performance still needs to be evaluated. This is because, due to FTL optimizations, the semi-random write performance of a smaller placement block may be close to that of a larger one, yet a smaller placement block may lead to a higher hit ratio. For these reasons, we measure how the miss ratio changes as α and the placement block size change. As shown in Figure 3.9, when α decreases, the miss ratio change is small. For example, for OLTP1, the miss ratio starts from 48.15% and gradually increases to 51.5%. Similarly, for the other two traces, the miss ratio changes are within 3.7%. Compared with the miss ratios under the "UA" scheme (48.27%, 29.09% and 40.08%), the differences are very small.

The results demonstrate that for real workloads, the miss ratio loss due to the placement algorithm is low.

Comparing High-end and Low-end SSDs

The previous results are all based on the Patriot SSD drive. Now, we compare the performance of the Patriot and the Intel X25-E SSD drives. The Intel X25-E drive is a high-end SLC

SSD drive. It uses over provisioning and on-drive buffers to improve the SSD write performance.

Over provisioning reserves a significant portion of flash memory cells to provide space for writing, such that the block merge or the garbage collection process can be delayed for better opportunities.

In the case of a log structured FTL, the over-provisioned space can also help decrease the average number of valid pages in each EB that needs to be reclaimed, such that the write amplification rate can be reduced. Due to these optimizations as well as its high performance SLC flash memory chips, the Intel SSD is much more expensive than the MLC Patriot SSD1.

In Figure 3.10, we show the results of both "UA" and EBAC for these two devices. It shows that using EBAC, the low-end SSD performs close to the "UA" scheme of the high-end SSD. Although for OLTP1 the Patriot SSD is 16.6% slower than the Intel SSD, for both OLTP2 and TPC-H, the Patriot SSD is 7.2% and 2.6% better than the Intel SSD. Using EBAC can also improve the performance of the high-end SSD. For example, EBAC improves the performance of OLTP2 by 17.8%.

The reason for such improvement is similar to the low-end SSD case. For OLTP2, the total disk accesses of EBAC are 1.8% more and its cache loads are 9.4% more than "UA"; however, its cache load patterns are more SSD-friendly, resulting in the 17.8% overall performance

improvement.

[Bar chart of execution time (s) for the UA-Low, EBAC-Low, UA-High, and EBAC-High configurations on workloads OLTP1, OLTP2, and TPC-H.]

Figure 3.10: Comparison of High-end and Low-end SSDs

1According to the current market price, a 32GB Intel X25-E drive is around 5x the price of a 32GB Patriot drive.

3.4.2 System Implementation

The EBAC framework implementation for SSD based L2 cache contains two components: the eviction engine and the L2 cache manager, as shown in Figure 3.11.

[Block diagram: the kernel page cache manager issues clean/dirty evictions to the eviction engine and L1 misses to the L2 cache manager; the L2 cache manager sits in front of the SSD and the disk.]

Figure 3.11: The architecture of the system implementation of SSD based L2 cache in Linux kernel.

The eviction engine is responsible for forwarding each page that the kernel page cache manager evicts to the L2 cache manager. The L2 cache manager has multiple functions: (1) it decides whether to load an evicted page into the SSD based cache; (2) it places each admitted page into the SSD cache; (3) it serves each L1 miss if the block is resident in the L2 cache, or forwards it to the disk driver if not; (4) it periodically flushes dirty blocks in the L2 cache to the disk. The L2 cache manager maintains the L2 cache data structures, including the mapping table for cached data blocks and the data structures for the placement and admission algorithms. These data structures are memory resident during operation. On power-off or system reboot, they are flushed to persistent storage; on startup, they are reconstructed from persistent storage so that SSD-cached data can be put into use immediately when the system is back online.
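
The pseudocode-style sketch below summarizes how these functions fit together on the two paths handled by the L2 cache manager: a page arriving from the eviction engine and an L1 miss. The function names are placeholders introduced for illustration, not the identifiers used in the kernel module.

```c
/* Illustrative control flow of the L2 cache manager (names are placeholders). */
struct page_req { unsigned long blkno; int dirty; };

/* Helpers assumed by this sketch. */
extern int  l2_lookup(unsigned long blkno);        /* is the block cached on the SSD? */
extern int  zone_admit_block(unsigned long blkno); /* zone-based admission decision   */
extern void l2_place(struct page_req *r);          /* placement-block aware SSD write */
extern void ssd_read(unsigned long blkno, void *buf);
extern void disk_read(unsigned long blkno, void *buf);

/* Path 1: a page evicted by the kernel page cache arrives via the eviction engine. */
void l2_on_eviction(struct page_req *r)
{
    if (zone_admit_block(r->blkno))   /* admit only pages from zones predicted hot    */
        l2_place(r);                  /* otherwise the evicted page is simply dropped */
}

/* Path 2: an L1 miss is served from the SSD cache or forwarded to the disk driver. */
void l2_on_miss(unsigned long blkno, void *buf)
{
    if (l2_lookup(blkno))
        ssd_read(blkno, buf);
    else
        disk_read(blkno, buf);
}
```

Dirty blocks placed in the L2 cache are flushed to the disk by a separate periodic task, as described above.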

Our implementation allows the SSD to serve as an L2 cache for specified devices. A management interface is provided to register storage devices that intend to cache their data with the L2 cache manager. Our implementation uses Linux Kernel 2.6.18. The kernel eviction engine is implemented within the kernel page frame reclaim code. The L2 cache manager is implemented as a kernel module. The implementation is around 1,700 lines of code. Since it is implemented at the operating system level, it enables us to evaluate its performance for both raw device trace replaying and DBMS transaction processing. We evaluate the raw device replay by sending requests to the L2 cached device using Linux kernel supported asynchronous I/O. The transaction processing is evaluated using MySQL/InnoDB. We use raw partitions for the shared table space of the InnoDB storage engine. By registering the raw partition with the L2 cache manager, it is automatically cached by the SSD cache without any change to the DBMS.

[Two bar charts of execution time (s) versus run sequence number (1 to 3) for the Disk, UA, and EBAC configurations: (a) OLTP1, (b) OLTP2.]

Figure 3.12: Replaying the OLTP Traces on Real System for 3 Runs.

OLTP Traces

To confirm the benefits of SSD based L2 cache on real systems, we have replayed the OLTP traces on our system. In order to evaluate each design in a comparable system environment, we run each trace multiple times. For each trace, we start from an empty L1 kernel buffer cache and an empty L2 SSD cache. Beginning with the second run, each run immediately follows the previous one, so that the following runs can benefit from a warm L1 and L2 cache. Since the working set sizes of these traces are around 4GB and 2GB, respectively, we set only a 500MB SSD partition for the L2 cache. In addition, we set aside a large amount of DRAM so that the L1 buffer cache is limited to a size much smaller than the L2 cache. However, since the system interface only allows us to reserve around 1.6GB of DRAM and the memory is dynamically allocated for the L1 buffer cache, we expect the L1:L2 cache size ratio to be larger than that in practical systems. This situation is due to the relatively small working set size of the OLTP traces. However, this actually requires a better L2 cache design to show significant performance improvement, because a larger L1 cache leads to fewer evictions and misses, which may diminish the effects of the L2 cache.

Therefore, we consider it a meaningful configuration for our evaluation.

As shown in Figure 3.12, when replaying the traces with an EBAC supported L2 cache, the performance can be significantly improved in comparison with both "disk" (no L2 cache) and "UA". For example, for the last run, EBAC improves the "disk" performance of OLTP1 by 64.0%; similarly, OLTP2 is improved by 81.7%. In comparison with "UA", for the last run, EBAC improves the performance of OLTP1 by 61.6%, and OLTP2 is improved by 70.3%.

Several results need some detailed explanation. First, the later runs of "disk" do not seem to benefit from a warm cache left by earlier runs. By investigating the kernel, we found that this is because the kernel frees all the buffered pages for raw devices when the corresponding device file is closed. Therefore, each run of the "disk" case essentially starts with a cold L1 cache. In contrast, the later runs of "UA" and EBAC are able to benefit from both a warm L1 cache and a warm L2 cache, so their performance is improved significantly. Comparing "UA" and EBAC, it is clear that our design performs much better than the existing design. Second, for "UA" of OLTP1, the L2 cache supported cases perform better with a cold cache than with a warm cache. This is because when the L1 cache is cold, no evictions occur at the beginning of the run, so no L2 cache loading overhead for evictions needs to be paid. Since the cache loading overhead is high for "UA", this overhead is significant. On the other hand, since OLTP1 is a write-intensive workload, it does not benefit much from a warm cache. For these reasons, the first run actually performs better. In fact, the "UA" scheme actually performs worse than the "disk" scheme in the later runs. Third, the performance improvement of EBAC over "UA" is not significant for the first run of OLTP2. The reason is that the L2 cache starts empty. Since the working set size of OLTP2 is smaller, both "UA" and EBAC can place the data blocks sequentially without considering replacement in L2. In later runs, however, the cache fills up due to the significant amount of write requests in the workload, and replacement starts to take effect.

Since “UA” replaces data pages based only on hit ratio, its disadvantage in SSD based cache starts to show.

TPC-H Queries

Finally, we evaluate the TPC-H benchmark performance on the MySQL 5.1/InnoDB platform. We use the dbgen and qgen tools from the TPC to generate the dataset and queries. MySQL is configured based on the 2GB main memory of our system, and the scale factor of the dataset is 1. We set the L2 cache size as 1/8 of the dataset. Similar to the other evaluations, to create a comparable test environment, for each configuration we restart the database and run the 22 queries in the generated sequence without interruption. Figure 3.13 shows the results of the test. The results again confirm that our design consistently improves the performance of the "disk" scheme and the "UA" scheme, with an average performance improvement of 47.3% over the 22 queries.

[Three bar charts of per-query execution time (s) for the TPC-H queries, with values exceeding the chart range annotated (several queries take over 3600s): (a) Disk, (b) UA, (c) EBAC.]

Figure 3.13: Evaluation Results of TPC-H on MySQL

3.5 Summary

In this chapter, we study how to effectively use SSDs as an L2 disk cache behind the DRAM based buffer cache, which serves as the L1 disk cache. We show that an SSD based cache must address two new aspects, placement and admission, which are not considered by most cache management algorithms. We propose new algorithms to address these issues. Our placement algorithm is able to minimize the additional overhead due to the inherent asymmetric performance of SSDs in comparison with a random access caching device with symmetric performance characteristics. Our admission algorithm is highly space efficient and utilizes both spatial and temporal locality to minimize unnecessary cache loads and improve system performance. The performance of our algorithms is studied in detail through simulation with real SSDs and through a system implementation. The evaluation using industry standard benchmarks, such as TPC-H and OLTP storage traces from two large financial institutions, shows that our proposed algorithms can significantly improve system performance. We plan to deploy the SSD-based L2 buffer cache in a data-intensive production environment to further test its performance and reliability.

CHAPTER 4

Maximizing Flash SSD Lifetime

4.1 Overview

As discussed in Chapter 2, the evenness of the remaining life-cycles of different flash memory cells is managed by the FTL. Therefore, the key to maximizing the lifetime of flash SSDs is to minimize the total number of write operations issued to the flash memory chips. In Chapter 3, we discussed reducing the number of write requests issued to SSDs by designing new cache management algorithms. Since the total number of write operations includes both user issued requests and the SSD internal data movement overhead, it is important to understand the cause of this internal data movement overhead and optimize it. Moreover, such optimization should be accomplished at no cost to SSD performance.

4.2 Capacity, Access Pattern, Performance and Wear

4.2.1 Over-provisioning Capacity and Write Amplification

Log structured management is a major design principle in flash memory based systems to avoid in-place updates [18]. It always appends updates to the end of the log instead of writing to the previous location of the data. For each update, the previous copy of the data is simply invalidated, and the garbage collector later cleans the storage to reclaim the space occupied by stale (invalid) data pages. Since both current data and stale data co-exist in log structured storage systems, the physical storage space is larger than the logical storage space available to the user. The capacity difference is the Over-Provisioned Capacity (OPC). OPC has a strong correlation with the performance and wear of SSDs, due to its impact on the cleaning efficiency.

Cleaning efficiency is the ratio between the number of workload requested writes and the sum of the workload requested writes plus the number of data pages moved during cleaning (also called garbage collection; see Chapter 2 for details). Its reciprocal is the write amplification rate. This metric reflects the system write overhead due to the limitations of the flash memory write operation. When the cleaning efficiency is 100%, the write overhead is zero, and the write amplification rate is 1 (no amplification). The cleaning process is usually done in the background to minimize its impact on foreground tasks. Theoretically, in the worst case, the write amplification rate can be as large as the number of data pages within an EB if the OPC is less than 1/P of the logical storage space, where P is the number of pages within an EB (in the worst case, each EB may have only one invalid page, such that garbage collection always has a write amplification rate of P). The larger the write amplification rate is, the more writes are issued to the flash memory chip, decreasing both the performance and the remaining life-span of the SSD. Let us call the number of up-to-date (valid) data pages within an EB the occupancy of the EB. Then, OPC determines an upper bound on the occupancy of the EB that is least occupied. If the garbage collector uses the greedy cleaning policy, which always chooses the least occupied EB for cleaning, then the worst-case write amplification rate for all workloads is determined by the OPC. Since most cleaning policies consider the occupancy of EBs when choosing the cleaning target, a higher OPC in general leads to better performance and less overall wear.
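
Restating the two metrics from the definitions above, with W_user denoting the number of workload requested page writes and W_moved the number of valid pages relocated during cleaning:

\[
\text{cleaning efficiency} \;=\; \frac{W_{\mathrm{user}}}{W_{\mathrm{user}} + W_{\mathrm{moved}}},
\qquad
\text{write amplification rate} \;=\; \frac{W_{\mathrm{user}} + W_{\mathrm{moved}}}{W_{\mathrm{user}}}
\;=\; \frac{1}{\text{cleaning efficiency}} .
\]

A cleaning efficiency of 100% thus corresponds to W_moved = 0 and a write amplification rate of 1, matching the zero-overhead case described above.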

4.2.2 Access Pattern Impacts

Although the write amplification rate is highly dependent on the OPC in general, for a given workload, the actual write amplification is also dependent on the access pattern of the workload. An example is an event logger, which always records new events sequentially towards the end of the log file. Once the file space is filled up, it wraps around to the beginning of the file to replace the oldest data. In this example, if the logical file addresses are mapped appropriately onto the flash chip to keep logically sequential data in the same EB, then 100% cleaning efficiency can be achieved for this workload, and a very small OPC that accommodates just a single EB is adequate to achieve the optimal performance and wear. However, most workloads do not follow such sequential update patterns, and many have substantial random patterns, as shown in the evaluated server workloads obtained from multiple sources in the next section.

The access pattern is not only determined by the workload access sequence itself, it is also influenced by the SSD organization. Within a single SSD, multiple flash memory chips are organized as flash arrays to achieve the designed performance and storage capacity requirements. Similar to RAID, data can be striped among the chips to utilize the benefit of parallelism. On the other hand, the striped mapping may allocate logically sequential data pages onto different chips, such that sequential accesses to each flash memory chip actually correspond to strided workload accesses.

Summary In this section, we have discussed the qualitative relationship among the over-provisioning capacity, the access pattern, and the write overhead. In the next section, we will investigate the quantitative relationship among over-provisioning capacity, access pattern, and SSD performance and wear using experimental studies. Specifically, several questions need to be answered via thorough measurement based investigation: (a) How does the over-provisioning capacity affect the performance and wear of SSDs for realistic workloads? (b) How can SSDs or system designs be optimized to reduce the wear of SSDs? More specifically, how can the write amplification rate be significantly reduced for a given over-provisioning capacity?

4.3 Workload Measurements

4.3.1 Experiment Environment

The experiments are conducted using the Microsoft SSD extension for the CMU DiskSim simulation environment [44, 7]. We measured fifteen workloads. Twelve workloads were traced from enterprise servers by Microsoft Research Cambridge [41]. For multi-volume traces, we use volume 0 for evaluation. Two OLTP workloads are SPC traces from two large financial institutions [47]. One workload is a one-hour Cello99 trace from HP Labs [22]. The detailed information on the workloads is listed in Table 4.1.

  name     address space   write working set
  hm       13.94GiB        11.94%
  mds      33.92GiB         1.01%
  prn      66.32GiB        18.70%
  proj     16.24GiB        10.28%
  prxy     20.71GiB         3.48%
  rsrch    16.89GiB         1.76%
  src2     15.63GiB         3.34%
  stg      10.81GiB         3.96%
  ts       21.97GiB         2.50%
  usr      15.90GiB         4.11%
  wdev     16.96GiB         2.09%
  web      33.91GiB         2.16%
  cello     7.36GiB         6.94%
  oltp1     8.25GiB        43.36%
  oltp2     8.29GiB         7.68%

Table 4.1: Workload Description

We used the default settings for most of the DiskSim-SSD simulator, which simulates the Samsung K9XXG08UXM SLC flash memory chip [1]. In order to investigate the quantitative relationship between over-provisioning capacity, workload working set size, and write amplification, we vary the per-plane block number to adjust the SSD physical capacity, such that the overall chip array organization is kept the same for all configurations as an eight-element array, which is consistent with most Samsung NAND chip products [46]. The detailed values of other parameters are listed in Table 4.2.

  Page read to register                 25us
  Page program (write) from register    200us
  Block erase                           1.5ms
  Pages per block                       64
  Block mapping scheme                  full striping
  Cleaning policy                       greedy
  Background cleaning                   yes
  Copy-back                             enabled

Table 4.2: Parameters of DiskSim-SSD

4.3.2 Evaluation Results

Response Time Sensitivity to Over-provisioning Capacity

To measure the sensitivity of SSD performance and wear to the Over-Provisioning Capacity (OPC), we conducted two sets of experiments. In the first set of experiments, we configured an SSD of 64GiB in physical capacity, and varied the reserved capacity for over-provisioning from 10% to 45% of the physical capacity for all the workloads that can fit onto this disk. In the second set of experiments, we configured the OPC according to the write request working set size as listed in Table 4.1. For each workload, the physical capacity is configured as the workload storage address space size plus five times the write working set size. We then vary the reserved capacity for over-provisioning from 20% to 300% of the working set size. We report the average response time and the write amplification rate for each experiment. These experiments demonstrate the performance and wear sensitivity to disk capacity based and workload based over-provisioning strategies, respectively.

[For each workload (hm, mds, proj, prxy, rsrch, src2, stg, ts, usr, wdev, cello, oltp1, oltp2), (a) average response time (ms) and (b) write amplification rate as the OPC varies from 20% to 260% of the write working set size.]

Figure 4.1: Sensitivity to Write Working-set based OPC

[For each workload (hm, mds, proj, prxy, rsrch, src2, stg, ts, usr, wdev, cello, oltp1, oltp2), (a) average response time (ms) and (b) write amplification rate as the OPC varies from 10% to 45% of the disk physical capacity.]

Figure 4.2: Sensitivity to Disk Capacity based OPC

As shown in Figure 4.2, when the over-provisioning is based on disk capacity, the response time does not show significant changes for any of the workloads. This shows that disk capacity based over-provisioning provides more than necessary capacity for the desired performance. On the other hand, when the over-provisioning is based on the write working set size, the performance changes as the OPC increases. Although the sensitivity may vary among the workloads, for most workloads, an OPC of 140% can achieve an average response time similar to larger OPC scenarios. Due to the design of the DiskSim simulator, some workloads require an adequate OPC to be able to finish the simulation without saturation. This minimal required OPC also differs among the workloads, ranging from 40% to 200%. However, most workloads are able to complete when the OPC is at 60%. The response times of some workloads are very sensitive to the OPC, especially when the OPC is relatively small. For example, for the src2 workload, the average response time changes from 3.29 ms when the OPC is 40% to 1.04 ms when the OPC is 140%. Some change slowly; for example, the ts workload changes from 0.77 ms when the OPC is 60% to 0.62 ms when the OPC is 140%.

Effects on Write Amplification

In general, the response time impact is consistent with the write amplification rate. For disk capacity based over-provisioning, the write amplification rates are 100% for most cases, while for write working set based over-provisioning, the write amplification rates are distributed over a wide range from 100% to 876.70%.

Overall, the OPC impact on the write amplification rate is larger than on the response time. For example, the write amplification of the same src2 workload in the above example changes from 473.76% to 107.17% when the OPC changes from 40% to 140%, showing a difference of more than 4 times, while the response time differs only by 3.16 times. This can be explained by two causes: (a) the cleaning (garbage collection) can be scheduled in the background to utilize the idle periods between request arrivals to hide all (or part) of the data movement latency, reducing its effect on workload response time; (b) workloads contain both read and write requests, such that write amplification only affects part of the workload. In some extreme cases, the response time does not differ much, while the write amplification rate differences are quite significant. For example, the write amplification rate of the oltp2 workload changes from 233.06% to 129.59% when the OPC is increased from 60% to 140%, while the response time only changes from 0.20ms to 0.19ms. This indicates that capacity efficiency not only affects performance, but also has a critical implication for the lifespan of the SSD.

  name    M-OPC     OPC ratio   response time ratio
  hm      11.94%     5.78%      1.12
  mds      1.21%     1.19%      1.07
  proj    16.45%     5.80%      1.13
  prxy    10.44%     2.50%      1.08
  rsrch    2.82%     1.03%      1.13
  src2     4.68%     1.81%      1.12
  stg      5.54%     1.49%      1.13
  ts       3.00%     1.91%      1.09
  usr      8.22%     2.27%      1.05
  wdev     2.51%     1.23%      1.10
  cello   11.10%     1.77%      1.17
  oltp1   95.39%    12.43%      1.04
  oltp2   19.97%     2.21%      1.01

Table 4.3: Over-provisioning Capacity vs. Workload Storage Address Space Size

Summary

We make the following observations based on the measurement results.

1. It is more efficient to use the write request working set than the disk capacity as the reference for over-provisioning capacity design. For all the workloads, the OPC can be significantly reduced while maintaining a similar write amplification rate and average response time. To explain this effect, in Table 4.3 we show the minimal OPC (M-OPC) as a percentage of each workload's total storage address space size, when the write amplification rate is less than 1.10. To illustrate its capacity efficiency, we compare its response time with the response time when the over-provisioning capacity is 45% of the disk capacity. The result is listed as the response time ratio in Table 4.3. The OPC ratio in Table 4.3 is the ratio of M-OPC to 45% of the disk capacity. It shows that M-OPC uses only 1.03% to 12.43% of the disk capacity based over-provisioning capacity, with an average of 3.19%, while the response time impact is within 1.47% to 16.50%, with an average of 9.40%.

2. Although cleaning optimizations, such as background cleaning, can be used to reduce the impact of the cleaning overhead on request response time, the impact of OPC on the average response time is still significant. The key for further optimization is to reduce the write amplification rate, which is beneficial for both the response time and the lifetime of SSDs.

4.4 Write Amplification Rate Reduction Algorithms

The measurement results show that appropriately over-provisioning workloads is effective in reducing the write amplification rate, and that such over-provisioning should be workload dependent. Therefore, it is important to understand which workload dependent factors are most critical in determining how much capacity to provision. In essence, write amplification is caused by the non-uniform update times of pages within an EB, such that when garbage collection is in process, valid data pages need to be relocated. In general, there are two approaches to this issue: (a) write data in the order they are requested, but carefully choose the target to be reclaimed so that the relocation cost is small; (b) carefully select the location of each write, such that the update times of pages within each EB are close to each other to reduce the relocation cost. These two methods can be complementary to each other.

Previous work on LFS garbage collection policies, such as [45, 5], investigated methods to improve the write amplification ratio. These studies focus on choosing the appropriate segments to clean. In contrast, we propose to optimize data allocation policies.

4.4.1 Algorithm Preliminaries

For the clarity of presentation, let us first define the operation model of the SSD cleaning process and a few concepts.

Operation Model The physical data pages within an SSD can be in three states: valid, invalid, and free. Valid pages are mapped to logical addresses so that they are accessible by the user of the SSD. As soon as a valid page is updated, it becomes an invalid page and the logical data page mapping is relocated to another page allocated for the update. Once invalid pages are erased, they become free pages. Free pages can be allocated for write requests. Neither invalid nor free pages are mapped to logical addresses, and the total number of invalid and free pages is equal to the over-provisioning capacity. Figure 4.3 illustrates these state transitions.

Figure 4.3: State Transitions of Physical Pages in SSDs
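
The three states and the transitions of Figure 4.3 can be summarized with the small sketch below; the enum and transition function are only an illustration of the model described above.

```c
/* Illustrative encoding of the page states and transitions in Figure 4.3. */
enum page_state { PAGE_FREE, PAGE_VALID, PAGE_INVALID };

enum page_event {
    EV_ALLOCATE,    /* a free page is allocated for a write request      */
    EV_UPDATE,      /* the logical page is updated and written elsewhere */
    EV_ERASE        /* the containing EB is erased by the cleaner        */
};

static enum page_state page_transition(enum page_state s, enum page_event e)
{
    switch (s) {
    case PAGE_FREE:    return e == EV_ALLOCATE ? PAGE_VALID   : s;
    case PAGE_VALID:   return e == EV_UPDATE   ? PAGE_INVALID : s;
    case PAGE_INVALID: return e == EV_ERASE    ? PAGE_FREE    : s;
    }
    return s;
}
```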

The system invokes the cleaning process to reclaim the invalid pages based on its policies. A common heuristic is to trigger it when the number of free pages is below a threshold. When choosing the target for cleaning, two criteria are usually considered: (a) the number of invalid pages within an erase block (EB); (b) the age of the page. The greedy cleaning algorithm considers only the first criterion, while the cost-benefit cleaning algorithm weighs criterion (a) against (b). For simplicity of presentation, in our analysis we consider the greedy algorithm with threshold based invocation.
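
A minimal sketch of this operation model is given below; the data layout is hypothetical and only captures the two rules stated above (invoke cleaning when the number of free pages drops below the threshold, and reclaim the least occupied EB).

```c
/* Greedy cleaner sketch: pick the least occupied EB once free pages run low. */
#include <limits.h>

struct erase_block { int valid_pages; };

/* Return the index of the EB with the fewest valid pages, or -1 if cleaning is
 * not needed because enough free pages remain. */
int greedy_pick_victim(const struct erase_block *ebs, int nebs,
                       int free_pages, int threshold)
{
    if (free_pages >= threshold)
        return -1;                       /* threshold-based invocation        */

    int victim = -1, min_valid = INT_MAX;
    for (int i = 0; i < nebs; i++) {
        if (ebs[i].valid_pages < min_valid) {
            min_valid = ebs[i].valid_pages;
            victim = i;                  /* least occupied EB seen so far     */
        }
    }
    return victim;                       /* its valid pages must be relocated */
}
```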

Concepts Let us consider a request sequence of logical data pages, where each write request is assigned a write sequence number, starting from 1 in ascending order. The forward write sequence number of a write request is the write sequence number of the data page's next update. For example, let us assume a request sequence in the format (type, page number): (write, 524), (read, 783), (write, 24), (write, 24), (read, 83), (write, 524). Then the forward write sequence numbers of the four write requests are 4, 3, ∞, ∞. When logical data pages are mapped to physical pages, each physical page also has a forward write sequence number, which is the same as that of its current logical mapping. We also define the forward write sequence number of an invalid physical page as the forward write sequence number of its latest mapping. Free pages do not have a forward write sequence number. For example, assume that in an SSD, physical page 3 is mapped to logical page 8 at write sequence number 12, and the forward write sequence number of logical page 8 is 32. Then, the forward write sequence number of physical page 3 is 32 too. When the 32nd write request arrives, which invalidates the mapping of physical page 3 and remaps logical page 8 to a new physical page, the forward write sequence number of physical page 3 is still 32.
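
The forward write sequence numbers of a trace can be computed with a single backward pass, as in the sketch below; the logical page number range and the use of UINT_MAX as a stand-in for infinity are illustrative assumptions.

```c
/* Compute the forward write sequence number of every write in a trace by
 * scanning the trace backwards.  UINT_MAX stands in for "no next update". */
#include <limits.h>

#define MAX_LPN 1024               /* assumed logical page number range */

void forward_write_seq(const unsigned *write_lpn, unsigned nwrites,
                       unsigned *fwd /* out, length nwrites */)
{
    unsigned next_write_of[MAX_LPN];
    for (unsigned i = 0; i < MAX_LPN; i++)
        next_write_of[i] = UINT_MAX;

    /* Write sequence numbers are 1..nwrites in request order. */
    for (unsigned i = nwrites; i-- > 0; ) {
        unsigned lpn = write_lpn[i];
        fwd[i] = next_write_of[lpn];   /* sequence number of this page's next update */
        next_write_of[lpn] = i + 1;    /* this write's own sequence number           */
    }
}
```

Applied to the example above (writes to pages 524, 24, 24, 524), the pass yields 4, 3, ∞, ∞, matching the sequence given in the text.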

Lemma 4.4.1 The forward write sequence number of an invalid page is always smaller than the current write sequence number. Conversely, if the forward write sequence number of a page is less than the current write sequence number, then it must be an invalid page.

Theorem 4.4.2 Assume there are p pages within each EB, and t is the threshold of free pages below which cleaning is invoked. If the data pages are allocated such that the forward write sequence number difference of any two data pages within an EB is less than a constant k × p, and the over-provisioning capacity is k × p + t pages, then the write amplification rate is 1, i.e., the internal data movement overhead is 0.
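
For readability, the condition of the theorem can be restated in symbols; s(q) is introduced here only as shorthand for the forward write sequence number of a data page q:

\[
\max_{q_1, q_2 \in E} \bigl| s(q_1) - s(q_2) \bigr| < k \cdot p \;\; \text{for every EB } E
\quad \text{and} \quad
\mathrm{OPC} = k \cdot p + t
\;\; \Longrightarrow \;\;
\text{write amplification rate} = 1 .
\]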

Proof When the cleaning is invoked, let i be the smallest forward write sequence number among all invalid pages, and consider the EB that contains this invalid page. There are at least k × p invalid pages (the cleaning invocation precondition), and distinct invalid pages are invalidated by distinct write requests and thus have distinct forward write sequence numbers; according to Lemma 4.4.1, all of them are smaller than the current write sequence number, so i + k × p must be less than or equal to the current write sequence number. Since the forward write sequence number difference between any two pages within this EB is less than k × p, the forward write sequence number of every page in this EB must be less than i + k × p, and therefore less than the current write sequence number. By Lemma 4.4.1, every page within this EB must then be invalid. Since the greedy policy selects the least occupied EB, the chosen EB contains no valid pages either, so cleaning involves no data movement overhead.

The above theorem shows that for a given over-provisioning capacity (k × p + t), allocating pages according to their forward write sequence numbers can result in optimal cleaning efficiency. As an example, for a ring-buffer-like sequential write loop, if the algorithm allocates the pages simply according to their arrival sequence, then the forward write sequence numbers of all pages within an EB are consecutive, and the difference between any two pages is less than p. Then, with an over-provisioning capacity as small as one EB plus the threshold size, the write amplification rate can be 1.

4.4.2 WFWD Allocation Algorithm

Theorem 4.4.2 provides guidance for designing an optimal allocation algorithm. However, there are a few more challenges in designing a page allocation algorithm for real workloads.

First, for a given over-provisioning capacity, it can be impossible to allocate the incoming requests such that the forward write sequence number difference of pages within the same EB stays below the bound required for zero amplification. For example, assume an SSD whose EBs contain 2 pages each, with an over-provisioned capacity of 6 pages (3 EBs) and a cleaning threshold of 2 pages. Then page requests must be placed such that the forward write sequence number difference of the pages within an EB is less than 4. For a sequence of incoming requests with forward write sequence numbers (10, 45, 20, 35), this criterion cannot be satisfied: no two of these requests can share an EB, yet there are at most 3 EBs available for page allocation. Second, to achieve the optimal allocation, the algorithm may require look-ahead capability. If the forward write sequence numbers of incoming requests are (100, 2, 9, 3, 7, 101), then the optimal algorithm should group (100, 101), (2, 3) and (9, 7), respectively. In this example, to group (100, 101), it must be known in advance that the request with forward write sequence number 101 is coming, such that the one with 100 can be set aside to wait for it. Such look-ahead capability requires more detailed information to be passed from the workloads; otherwise, buffering resources are needed to allow allocation re-ordering of the requests.

Due to these difficulties, we propose WFWD, a heuristic that optimizes page allocation without look-ahead capability. We try to group pages of similar forward write sequence numbers as close together as possible, such that when a page is updated, the other pages within the same EB are likely to be updated soon as well. We first assume that the forward write sequence number is provided by the upper system layer. Later, in Section 4.4.3, we consider the case where such information is unavailable, and propose a prediction algorithm to estimate the forward write sequence number.

WFWD is based on the write forward distance, which is defined as the forward write sequence number minus the write sequence number of a page. A constant number of buckets are defined, each of which corresponds to a range of write forward distances. We allocate the data pages into different buckets according to these ranges. The size of each bucket is the same, and it is a multiple of the EB size. However, as soon as a bucket is filled up, new EBs are assigned to the bucket; therefore, the number of data pages allocated from each bucket can be different.

The ranges corresponding to the buckets vary in size. We number the buckets in ascending order according to the average write forward distance of their ranges. The range sizes of the buckets monotonically increase as the write forward distance increases. The range increasing step is determined by a step function, which is a configurable system parameter. By default, we use an exponential function in our evaluation. In Figure 4.4, we illustrate the design with a concrete example. In this example, there are four buckets in total. Each bucket consists of one EB, which contains four pages. Bucket 1 corresponds to the write forward distance range (1-128). There are 11 pages allocated from this bucket. Similarly, there are 6, 2 and 4 pages allocated from buckets 2, 3 and 4, respectively.
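
The sketch below shows one way the bucket index can be derived from a page's write forward distance under an exponential step function, following the description above. The number of buckets and the base range size are assumed values rather than the configuration used in our evaluation; a page is then given a free page from the EB currently assigned to its bucket, and a new EB is assigned to the bucket once that EB fills up.

```c
/* WFWD bucket selection sketch with exponentially growing distance ranges.
 * NBUCKETS and BASE_RANGE are illustrative parameters. */
#define NBUCKETS   4
#define BASE_RANGE 128ul    /* bucket 0 covers write forward distances 1..128 here */

/* Map a write forward distance to a bucket index in [0, NBUCKETS). */
static int wfwd_bucket(unsigned long distance)
{
    unsigned long upper = BASE_RANGE;
    for (int b = 0; b < NBUCKETS - 1; b++) {
        if (distance <= upper)
            return b;
        upper *= 2;                 /* exponential step function */
    }
    return NBUCKETS - 1;            /* last bucket catches all larger distances */
}
```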

4.4.3 Predicting the Write Forward Distance

When the forward write sequence number is not provided by the upper system layer, we need to estimate it using the access history of the workload. We use two heuristics in our estimation. The first heuristic is to predict the write forward distance of a data page using its past write intervals. This heuristic is used when the past write intervals are regular. The second heuristic is to predict the write forward distance of a data page from the write forward distance estimates of its friends. A friend of a data page is a data page that was recently updated and whose logical address is in the neighborhood of the aforementioned data page. This heuristic is

Figure 4.4: An Example for Write Forward Distance based Allocation.

used when the first heuristic cannot be applied. When multiple friends exist, we choose to use the closest friend as a reference. If a data page does not have a friend and its past update history is not regular, then the data page is allocated from the bucket with the largest number.

To record the access history, we track up to four recent write access intervals of each logical page. If the variation of the intervals is small, the page is considered regular. For the management of friends, we maintain a constant-sized pool of recently updated data pages. Only the meta-data of the data pages is kept, and the LRU algorithm is used to replace old entries in this pool. If a page is not considered regular, the pool is queried to locate friends.
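
A sketch of the two heuristics is given below; the regularity test, the neighborhood radius, and the interval count are illustrative choices consistent with the description above rather than the exact implementation.

```c
/* Predict a page's write forward distance from its own write-interval history
 * (heuristic 1) or from a recently updated neighbor (heuristic 2). */
#define NINTERVALS   4             /* up to four recent write intervals are kept */
#define NEIGHBORHOOD 64            /* assumed "friend" neighborhood, in pages    */

struct page_hist {
    unsigned long intervals[NINTERVALS];  /* recent write-to-write intervals */
    int nintervals;
};

/* Heuristic 1: if the past intervals vary little, predict the next interval as
 * their average; return 0 when the history is too short or irregular. */
static unsigned long predict_from_intervals(const struct page_hist *h)
{
    if (h->nintervals < NINTERVALS)
        return 0;
    unsigned long min = h->intervals[0], max = h->intervals[0], sum = 0;
    for (int i = 0; i < NINTERVALS; i++) {
        if (h->intervals[i] < min) min = h->intervals[i];
        if (h->intervals[i] > max) max = h->intervals[i];
        sum += h->intervals[i];
    }
    if (max - min > min / 2)       /* assumed regularity test */
        return 0;
    return sum / NINTERVALS;
}

/* Heuristic 2: borrow the estimate of the closest recently updated neighbor
 * from the LRU-managed friend pool; return 0 when no friend is close enough. */
static unsigned long predict_from_friend(unsigned long lpn,
                                         const unsigned long *friend_lpn,
                                         const unsigned long *friend_est,
                                         int nfriends)
{
    unsigned long best = 0, best_dist = NEIGHBORHOOD + 1;
    for (int i = 0; i < nfriends; i++) {
        unsigned long d = friend_lpn[i] > lpn ? friend_lpn[i] - lpn
                                              : lpn - friend_lpn[i];
        if (d <= NEIGHBORHOOD && d < best_dist) {
            best_dist = d;
            best = friend_est[i];
        }
    }
    return best;
}
```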

4.4.4 Implementation in DiskSim-SSD

The DiskSim simulator is a storage sub-system simulation environment [7]. The Microsoft SSD extension implements SSD devices as the storage device for the DiskSim simulator. As introduced in [1], each SSD is internally organized as an array of NAND flash memory chips. Each chip can consist of multiple planes.

The simulator supports multiple allocation schemes based on whether a physical page is allocated from a certain plane, from planes sharing a bus, or from a chip. In [1], it is shown that the greedy-policy based chip allocation scheme performs best. Since our algorithm and the existing algorithms in DiskSim-SSD are complementary, we implement our algorithm on top of this best performer. We maintain a set of buckets in our algorithm for each plane of the SSD. This enables our design to take advantage of the parallelism among the planes of the same flash chip. For example, when two requests of the same write forward distance arrive, the data pages can be allocated from different planes, such that the two pages can be written concurrently. The impact can be significant, as shown in our evaluation.

4.4.5 Performance Evaluations

We use the same setup as in Section 4.3.1 for our experiments and over-provision the SSDs based on the write working-set size. We show the results for our write forward distance based algorithms. For comparison, we implement the MODA algorithm in the DiskSim-SSD simulator. MODA is a recently studied data allocation policy for embedded flash devices [2]. It allocates data pages based on their recent access frequency: data pages are allocated in the same EBs when their recent access frequency is more than (or less than) a static threshold. More details on the MODA algorithm can be found in [2]. Our proposed policy is different in that it is based on the write forward distance, while the existing one is based on recent access frequency. In addition to MODA, the original DiskSim-SSD results are also included. In the original DiskSim-SSD design, the pages are allocated using a greedy strategy: it always chooses to allocate a page from the plane that has the most free pages. It balances the space utilization among the planes and exploits the possible parallelism among the planes. Such a strategy is complementary to both the proposed write forward distance based algorithm and the MODA algorithm, and we implement our algorithm and the MODA algorithm on top of this existing design.

Results

Figures 4.5 and 4.6 show the results of average response time and write amplification rate of different workloads using different algorithms. The bars in these graphs correspond to average response time, while the lines correspond to write amplification rate. Overall, there are several trends for these algorithms.

First, for the same over-provisioning capacity, the write forward distance based algorithms consistently perform better than MODA and the original algorithm of DiskSim-SSD, as reflected by their lower average response times and write amplification rates. In addition, due to the design of DiskSim, the simulator may stop the simulation when too many requests are queued for processing. In many cases of the experiments, the write forward distance based algorithms can finish the simulation without saturation while the others cannot, as shown by the missing data points for those algorithms in the graphs. For example, for the stg workload, when the over-provisioning capacity is 60% of the write working set size, both MODA and the original algorithm saturate, while the proposed algorithms can report results.

Overall, the algorithmic impact is significant when the over-provisioning is relatively small.

For example, when the over-provisioning capacity is 40%, the average improvements of the informed write forward distance algorithm over the MODA algorithm and the original algorithm across all the workloads are 16.6% and 26.1% in terms of average response time, and the average reductions of the write amplification rate are 48.1% and 153.7%, respectively. The corresponding improvements for the prediction based algorithm are 11.2%, 22.1%, 39.9%, and 145.8%, respectively.

[For each workload, average response time (ms) and write amplification rate are plotted against the over-provisioning capacity, expressed as a percentage of the write working set, for the informed, prediction, MODA, and original allocation schemes. Panels: (a) hm, (b) mds, (c) prn, (d) proj, (e) prxy, (f) rsrch, (g) src2, (i) stg.]

Figure 4.5: Comparison of Algorithms (a)

[Continuation of the comparison for the remaining workloads. Panels: (j) ts, (k) usr, (l) wdev, (m) web, (n) cello, (o) oltp1, (p) oltp2.]

Figure 4.6: Comparison of Algorithms (b)

In real systems, these improvements have several implications: (a) for the same over-provisioning capacity, both the performance and the life-span of the SSDs can be improved using write forward distance based algorithms; (b) when the response time and wear level are acceptable for the system requirements, a write forward distance based algorithm can accommodate the same workload with less capacity, or more workloads can be accommodated by the same SSD.

Second, among the algorithms, the original DiskSim-SSD algorithm only considers parallelism in page allocation, while the other schemes also consider a data page property (the write forward distance for the proposed algorithms and the recent access frequency for the MODA algorithm). The results show that considering a data page property can lead to significant improvements in both performance and write amplification rate. However, based on our experiments, we note that the consideration of parallelism in page allocation is also indispensable, though the details of utilizing parallelism in flash arrays are beyond the scope of this thesis.

Third, for the two proposed algorithms, the prediction based scheme performs close to the informed scheme for most workloads. This shows that the write forward distance based algorithm can be implemented transparently in the existing software stack, as the prediction based algorithm requires no additional information beyond what is already available in existing block protocols.

Of course, in cases where the protocol can be enhanced to inform the write forward distance from the upper layer, the performance of the algorithm can be further improved.

4.5 Summary

Extending the lifetime of flash SSDs is one of the main challenges as the capacity of flash memory scales up to meet the ever-increasing capacity demand. In this chapter, we analyze the relationship between capacity, access pattern, and SSD lifetime. Based on real workload measurements, we present the interaction between capacity, workload access pattern, and SSD wear. The results show that it is effective to trade the logical capacity of SSDs to extend their lifetime, and that an effective trade-off should be based on the working set size of the write requests of the workload. In order to improve the effectiveness of such trade-offs, the key is to reduce the write amplification ratio. We present a theoretical analysis of the relationship between the over-provisioning capacity and the data allocation strategy required to achieve the optimal write amplification ratio. Based on this analysis, we propose write forward distance based data page allocation algorithms, which, for a given over-provisioned capacity, can improve the write amplification rate over existing designs.

CHAPTER 5

Conclusions

Flash memory based storage is a promising technology that will enable high performance and low power consumption storage systems. It is considered to be a major technology for the next generation of storage systems. Despite its many advantages, two limitations are inherent in flash memory based systems: (a) random write performance is not competitive; (b) the life-cycle of flash memory is limited, and each write operation consumes it. These limitations call for system designs that minimize their impact.

In this thesis, we discuss techniques to optimize the performance and lifetime of NAND flash memory based SSDs. In particular, we consider cache management algorithms for SSD based disk cache to utilize high performance SSD access patterns and reduce unnecessary cache admissions; we also consider page allocation algorithms that minimize the write amplification rate to maximize the lifetime and performance of SSDs. We have implemented systems to validate our designs. The results show that the proposed techniques successfully achieve the design goals.

BIBLIOGRAPHY

[1] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy. Design tradeoffs for ssd performance. In Proceedings of USENIX Annual Technical Conference, pages 57–70, 2008.

[2] S. Baek, S. Ahn, J. Choi, D. Lee, and S. H. Noh. Uniformity improving page allocation for flash memory file systems. In EMSOFT '07: Proceedings of the 7th ACM & IEEE international conference on Embedded software, pages 154–163, New York, NY, USA, 2007. ACM.

[3] L. N. Bairavasundaram, M. Sivathanu, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. X-ray: A non-invasive exclusive caching mechanism for raids. In Proceedings of the 31st annual international symposium on Computer architecture (ISCA), page 176, 2004.

[4] L. Belady. A study of replacement algorithms for virtual storage computers. IBM Systems Journal, 5:78–101, 1966.

[5] T. Blackwell, J. Harris, and M. Seltzer. Heuristic cleaning algorithms in log-structured file systems. In TCON'95: Proceedings of the USENIX 1995 Technical Conference, pages 23–23, Berkeley, CA, USA, 1995. USENIX Association.

[6] L. Bouganim, B. T. Jónsson, and P. Bonnet. uflip: Understanding flash io patterns. In Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research, 2009.

[7] J. S. Bucy, G. R. Ganger, and et al. The disksim simulation environment version 3.0 reference manual.

[8] P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), Monterey, CA, 1997.

[9] A. M. Caulfield, L. M. Grupp, and S. Swanson. Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications. In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2009.

[10] F. Chen, D. A. Koufaty, and X. Zhang. Understanding intrinsic characteristics and system implications of flash memory based solid state drives. In Proceedings of SIGMETRICS/Performance, 2009.

[11] Z. Chen, Y. Zhou, and K. Li. Eviction-based Cache Placement for Storage Caches. In Proceedings of the General Track: USENIX 2003 Annual Technical Conference (USENIX), pages 269–281, 2003.

[12] J. Choi, S. H. Noh, S. L. Min, and Y. Cho. An implementation study of a detection-based adaptive block replacement scheme. In Proceedings of the annual conference on USENIX Annual Technical Conference, pages 18–18, 1999.

[13] M. Chrobak, H. Karloff, T. Payne, and S. Vishwanathan. New results on server problems. SIAM J. Discret. Math., 4(2):172–181, 1991.

[14] H. Dai, M. Neufeld, and R. Han. Elf: an efficient log-structured flash file system for micro sensor nodes. In Proceedings of the 2nd international conference on Embedded networked sensor systems, pages 176–187, 2004.

[15] P. J. Denning. The locality principle. Communication of ACM, 48(7):19–24, 2005.

[16] B. Forney, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Storage-aware caching: Revisiting caching for heterogeneous storage systems. In Proceedings of the First USENIX Symposium on File and Storage Technologies, pages 61–74, 2002.

[17] R. F. Freitas and W. W. Wilcke. Storage-class memory: the next storage system technology. IBM J. Res. Dev., 52(4):439–447, 2008.

[18] E. Gal and S. Toledo. Algorithms and data structures for flash memories. ACM Computing Survey, 37(2):138–163, 2005.

[19] B. S. Gill and D. S. Modha. Wow: Wise ordering for writes - combining spatial and temporal locality in non-volatile caches. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST), 2005.

[20] C. Gniady, A. R. Butt, and Y. C. Hu. Program-counter-based pattern classification in buffer caching. In Proceedings of the 6th USENIX Symposium on Operating System Design and Implementation, pages 395–408, 2004.

[21] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings. In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2009.

[22] HP Labs. Storage Tools and Traces.

[23] The Journalling Flash File System, version 2. http://sourceware.org/jffs2/.

[24] Yet Another Flash File System (YAFFS). http://www.yaffs.net/.

[25] Hyperstone. Flash Controller & Firmware.

[26] S. Jiang, X. Ding, F. Chen, E. Tan, and X. Zhang. DULO: An effective buffer cache management scheme to exploit both temporal and spatial localities. In Proceedings of USENIX Conference on File and Storage Technologies, 2005.

[27] S. Jiang and X. Zhang. Lirs: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proceedings of 2002 ACM Sigmetrics conference, pages 31–42, 2002.

[28] T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. In Proceedings of the 20th International Conference on Very Large Data Bases, pages 439–450, 1994.

[29] A. Kawaguchi, S. Nishioka, and H. Motoda. A flash-memory based file system. In Proceedings of the USENIX 1995 Technical Conference, pages 13–13, 1995.

[30] T. Kgil, D. Roberts, and T. Mudge. Improving nand flash based disk caches. In Proceedings of the 35th International Symposium on Computer Architecture, pages 327–338, 2008.

[31] H. Kim and S. Ahn. Bplru: a buffer management scheme for improving random writes in flash storage. In Proceedings of the 6th USENIX Conference on File and Storage Technologies, pages 1–14, 2008.

[32] I. Koltsidas and S. D. Viglas. Flashing up the storage layer. In Proceedings of VLDB, pages 514–525, 2008.

[33] S.-W. Lee and B. Moon. Design of flash-based dbms: an in-page logging approach. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 55–66, New York, NY, USA, 2007. ACM.

[34] S.-W. Lee, B. Moon, C. Park, J.-M. Kim, and S.-W. Kim. A case for flash memory ssd in enterprise database applications. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1075–1086, New York, NY, USA, 2008. ACM.

[35] A. Leventhal. Flash storage memory. Communications of the ACM, 51(7):47–51, 2008.

[36] S. Liang, K. Chen, S. Jiang, and X. Zhang. Cost-aware caching algorithms for distributed storage servers. In Proceedings of the 21st International Symposium on Distributed Computing (DISC), 2007.

[37] S. Liang, S. Jiang, and X. Zhang. STEP: Sequentiality and Thrashing Detection Based Prefetching to Improve Performance of Networked Storage Server. In Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS), 2007.

[38] J. Matthews, S. Trika, D. Hensgen, R. Coulson, and K. Grimsrud. Intel Turbo Memory: Nonvolatile disk caches in the storage hierarchy of mainstream computer systems. ACM Transactions on Storage, 4(2):1–24, 2008.

[39] N. Megiddo and D. S. Modha. Outperforming LRU with an adaptive replacement cache algorithm. Computer, 37(4):58–65, 2004.

[40] Microsoft. Microsoft ReadyBoost.

[41] D. Narayanan, A. Donnelly, and A. Rowstron. Write off-loading: Practical power management for enterprise storage. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08), Berkeley, CA, USA, 2008. USENIX Association.

[42] D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, and A. Rowstron. Migrating server storage to SSDs: analysis of tradeoffs. In Proceedings of the Fourth ACM European Conference on Computer Systems, pages 145–158, 2009.

[43] E. J. O’Neil, P. E. O’Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 297–306, 1993.

[44] Microsoft Research. SSD extension for the DiskSim simulation environment.

[45] M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1):26–52, 1992.

[46] Samsung. Samsung NAND Chip Products.

[47] Storage Performance Council. SPC I/O Traces. http://www.storageperformance.org.

[48] L. Tran. Challenges of DRAM and flash scaling - potentials in advanced emerging memory devices. In Proceedings of the Seventh International Conference on Solid-State and Integrated Circuits Technology, pages 668–672, 2004.

[49] Transaction Processing Performance Council. TPC-H. http://www.tpc.org.

[50] Western Digital. WD Caviar SE performance specification.

[51] T. M. Wong and J. Wilkes. My Cache or Yours? Making Storage More Exclusive. In Proceedings of the General Track: USENIX 2002 Annual Technical Conference (USENIX), pages 161–175, 2002.

[52] M. Wu and W. Zwaenepoel. eNVy: a non-volatile, main memory storage system. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 86–97, 1994.

[53] N. E. Young. On-line file caching. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'98), pages 82–86, Philadelphia, PA, USA, 1998. Society for Industrial and Applied Mathematics.

[54] D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos, and W. A. Najjar. MicroHash: an efficient index structure for flash-based sensor devices. In Proceedings of USENIX Conference on File and Storage Technologies, pages 3–3, Berkeley, CA, USA, 2005. USENIX Association.

[55] Y. Zhou, J. Philbin, and K. Li. The multi-queue replacement algorithm for second level buffer caches. In Proceedings of the USENIX Annual Technical Conference, pages 91–104, 2001.
