On-Line Data Reconstruction in Redundant Disk Arrays
A dissertation submitted to the Department of Electrical and Computer Engineering, Carnegie Mellon University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 1994 by Mark Calvin Holland

Abstract

There exists a wide variety of applications in which data availability must be continuous, that is, where the system is never taken off-line and any interruption in the accessibility of stored data causes significant disruption in the service provided by the application. Examples include on-line transaction processing systems such as airline reservation systems and automated teller networks in banking systems. In addition, there exist many applications for which a high degree of data availability is important, but continuous operation is not required. An example is a research and development environment, where access to a centrally-stored CAD system is often necessary to make progress on a design project. These applications and many others mandate both high performance and high availability from their storage subsystems.

Redundant disk arrays are systems in which a high level of I/O performance is obtained by grouping together a large number of small disks, rather than building one large, expensive drive. The high component count of such systems leads to unacceptably high rates of data loss due to component failure, and so they typically incorporate redundancy to achieve fault tolerance. This redundancy takes one of two forms: replication or encoding. In replication, the system maintains one or more duplicate copies of all data. In the encoding approach, the system maintains an error-correcting code (ECC) computed over the data. The latter category of systems is very attractive because it offers both low cost per megabyte and high data reliability, but unfortunately such systems exhibit very poor performance in the presence of a disk failure. This dissertation addresses the design of ECC-based redundant disk arrays that offer dramatically higher levels of performance in the presence of failure than systems comprising the current state of the art, without significantly affecting the performance, cost, or reliability of these systems.

The first aspect of the problem considered here is the organization of data and redundant information in the array. The dissertation demonstrates techniques for distributing the workload induced by a disk failure across a large set of disks, thereby reducing the impact of the failure recovery process on the system as a whole.

Once the organization of data and redundancy has been specified, additional improvements in performance during failure recovery can be obtained through the careful design of the algorithms used to recover lost data from redundant information. The dissertation shows that structuring the recovery algorithm so as to assign one recovery process to each disk in the array, as opposed to the traditional approach of assigning a process to each unit in a set of data units to be concurrently recovered, provides significant advantages.

Finally, the dissertation develops a design for a redundant disk array targeted at extremely high availability through extremely fast failure recovery. This development also demonstrates the generality of the techniques presented here.
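As a minimal illustration of the encoding approach described above (not taken from the dissertation or from raidSim), the sketch below shows, in C, the single-failure recovery performed by a parity-protected array such as RAID Level 5: the parity block of each stripe is the bitwise XOR of the stripe's data blocks, so any one lost block can be rebuilt by XORing together the surviving blocks of its stripe. The names xor_block, reconstruct_block, and BLOCK_SIZE are illustrative placeholders.

    /*
     * Illustrative sketch only: single-fault recovery in a
     * parity-protected disk array.  Each stripe holds N-1 data
     * blocks plus one parity block computed as the bitwise XOR of
     * the data, so any one lost block equals the XOR of the
     * stripe's surviving blocks.
     */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 4096   /* bytes per block; value is arbitrary here */

    /* XOR-accumulate one block: dst ^= src. */
    static void xor_block(uint8_t *dst, const uint8_t *src)
    {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            dst[i] ^= src[i];
    }

    /*
     * Rebuild the block lost on 'failed_disk' from one stripe.
     * 'stripe[d]' is the block held on disk d; the stripe spans
     * 'ndisks' disks, including the one holding parity.
     */
    void reconstruct_block(uint8_t *const stripe[], int ndisks,
                           int failed_disk, uint8_t *rebuilt)
    {
        memset(rebuilt, 0, BLOCK_SIZE);
        for (int d = 0; d < ndisks; d++)
            if (d != failed_disk)
                xor_block(rebuilt, stripe[d]);
    }

The dissertation's contributions concern how this per-stripe work is laid out and scheduled across the array, by declustering the parity so the recovery workload spreads over many disks and by assigning one recovery process per disk rather than per data unit, not the XOR arithmetic itself.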
Acknowledgments

First and foremost thanks of course go to my parents, Robert and Esther Holland. Their support has been unconditional and unwavering, but they’ve given me more than that. The really significant thing Mom and Dad did for me was to show me that education is the primary road to a better and more meaningful life. One is enriched by each new level of understanding that one acquires, irrespective of traditional boundaries between domains of knowledge. For this lesson, I’m more grateful to them than I can say. I love you both.

My advisor, Dan Siewiorek, gave me the freedom to pursue my own academic interests, and supported me even when my work did not exactly coincide with his own research agenda. This involved a great deal of extra effort on his part, and his willingness to make sure I succeeded didn’t go unnoticed. It was he who initially suggested the topics for both my Master’s and Ph.D. I’m very pleased to have had the opportunity to work with him, and my only regret is that the path my studies took did not allow us to work more closely.

This dissertation has grown out of long and fruitful discussions with Garth Gibson. Garth is one of the sharpest and most capable people I’ve ever worked with, and it was his encyclopedic knowledge of data storage technology, and where it is and should be going, that guided my studies from the start. This would be impressive even if it were all, but Garth and I were also able to establish a rare relationship based on confidence and communication that allowed our interaction to be pleasurable as well as productive. Thanks Garth.

Before I leave off thanking my advisors, one more point has to be made. Garth and Dan stood by me when a personal crisis caused me to shirk my studies for a while. This I appreciate more than anything else.

Bill Courtright, Hugo Patterson, and Dan Stodolsky all deserve thanks for contributing to my thesis through constant discussion, review, and technical assistance. In working with them I felt genuinely like a member of a team; like each of us made it a goal that we should all succeed. The three of you made it all a positive experience for me.

Stephanie Byram has put up with a lot from me lately, and I want her to know that I realize that nothing goes unnoticed. Thanks for tolerating, Steph. I’ll be there when you need me.

Finally, I want to express my thanks to Yale Patt, now with the University of Michigan at Ann Arbor. About nine years ago, Yale took a chance and gave some responsibility to an undergraduate student who lacked confidence and was uncertain of his abilities. I firmly believe that the opportunities that arose from his support have led to my every success since then. The rare and beautiful thing Yale did for me was to trust me in the absence of any compelling reason to do so. I really hope, Yale, that you continue to extend your confidence to others at the risk of getting burned.

Table of Contents

Chapter 1: Introduction 1
Chapter 2: Background Information 7
2.1. The need for improved availability in the storage subsystem 8
2.1.1. The widening access gap 8
2.1.2. The downsizing trend in disk drives 9
2.1.3. The advent of new, I/O intensive applications 10
2.1.4. Why these trends necessitate higher availability 10
2.2. Technology background 13
2.2.1. Disk technology 13
2.2.2. Disk array technology 16
2.2.2.1. Disk array architecture 17
2.2.2.2. Defining the RAID levels: data layout and ECC 18
2.2.2.3. Reading and writing data in the different RAID levels 23
2.2.2.3.1. RAID Level 1 24
2.2.2.3.2. RAID Level 3 25
2.2.2.3.3. RAID Level 5 26
2.2.2.4. Comparing the performance of the RAID levels 29
2.2.2.5. On-line reconstruction 29
2.2.2.6. Related work: variations on these organizations 30
2.2.2.6.1. Multiple failure toleration 30
2.2.2.6.2. Addressing the small-write problem 32
2.2.2.6.3. Spare space organizations 34
2.2.2.6.4. Distributing the functionality of the array controller 35
2.2.2.6.5. Striping studies 35
2.2.2.6.6. Disk array performance evaluation 37
2.2.2.6.7. Reliability modeling 37
2.2.2.6.8. Improving the write-performance of RAID Level 1 39
2.2.2.6.9. Network file systems based on RAID 40
2.3. Evaluation methodology 40
2.3.1. Simulation methodology 40
2.3.2. The raidSim disk array simulator 41
2.3.3. Default workload 42
Chapter 3: Disk Array Architectures and Data Layouts 45
3.1. Related work 45
3.1.1. Availability techniques in mirrored arrays 46
3.1.2. Availability techniques for parity-based arrays 47
3.1.2.1. Multiple independent groups 47
3.1.2.2. Distributing the failure-induced workload 49
3.1.3. Summary 51
3.2. Disk array layouts for parity declustering 52
3.2.1. Layout goodness criteria 52
3.2.2. Layouts based on balanced incomplete block designs 55
3.2.2.1. Block designs 55
3.2.2.2. Deriving a layout from a block design 56
3.2.2.3. Evaluating the layout 58
3.2.2.4. Finding block designs for layout 60
3.2.3. A related study: layout via random permutations 61
3.2.4. Summary 63
3.3. Primary evaluations 63
3.3.1. Comparing declustering to RAID Level 5 65
3.3.1.1. No effect on fault-free performance 65
3.3.1.2. Declustering greatly benefits degraded-mode performance 65
3.3.1.3. Declustering benefits persist during reconstruction 66
3.3.1.4. Declustering also benefits data reliability 68
3.3.1.5. Summary 70
3.3.2. Varying the declustering ratio 70
3.3.2.1. Fault-free performance 71
3.3.2.2. Degraded- and reconstruction-mode performance 72
3.3.2.3. High data reliability 74
3.3.2.4.