Information and Collaboration in the Storage Stack
INFORMATION AND COLLABORATION IN THE STORAGE STACK

by

Timothy Edward Denehy

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Computer Science)

at the

UNIVERSITY OF WISCONSIN–MADISON

2006

© Copyright by Timothy Edward Denehy 2006
All Rights Reserved

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 Introduction
  1.1 Approach
  1.2 Deconstructing Storage Arrays
  1.3 Bridging the Information Gap
  1.4 Collaborating Layers
  1.5 Contributions
  1.6 Organization
2 Deconstructing Storage Arrays with Shear
  2.1 Shear
    2.1.1 Assumptions
    2.1.2 Techniques
    2.1.3 Simulation Framework
    2.1.4 Algorithm
    2.1.5 Redundancy Simulations
    2.1.6 Overhead
  2.2 Real Platforms
  2.3 Shear Applications
    2.3.1 Shear Management
    2.3.2 Shear Disk Characterization
    2.3.3 Shear Performance
  2.4 Discussion
  2.5 Conclusions
3 Bridging the Information Gap: Exposed RAID and Informed LFS
  3.1 Overview
  3.2 Exposed RAID
    3.2.1 A Segmented Address Space
    3.2.2 Dynamic Information
    3.2.3 Implementation
  3.3 Informed LFS
    3.3.1 On-Line Expansion and Contraction
    3.3.2 Dynamic Parallelism
    3.3.3 Flexible Redundancy
    3.3.4 Lazy Mirroring
  3.4 Evaluation
    3.4.1 Baseline Performance
    3.4.2 On-line Expansion
    3.4.3 Dynamic Parallelism
    3.4.4 Flexible Redundancy
    3.4.5 Lazy Mirroring
  3.5 Discussion
  3.6 Conclusions
4 Collaborating Layers: Journal-guided Resynchronization
  4.1 Introduction
  4.2 The Consistent Update Problem
    4.2.1 Introduction
    4.2.2 Failure Models
    4.2.3 Measuring Vulnerability
    4.2.4 Solutions
  4.3 ext3 Background
    4.3.1 Modes
    4.3.2 Transaction Details
    4.3.3 Journal Structure
  4.4 Design and Implementation
    4.4.1 ext3 Write Analysis
    4.4.2 ext3 Declared Mode
    4.4.3 Software RAID Interface
    4.4.4 Recovery and Resynchronization
  4.5 Evaluation
    4.5.1 ext3 Declared Mode
    4.5.2 Journal-guided Resynchronization
    4.5.3 Complexity
  4.6 Conclusions
5 Related Work
  5.1 Gray-box Applications
  5.2 Storage Performance
  5.3 Volume Managers and Software RAID
  5.4 Exploiting Storage Details
  5.5 Expanding Storage Interfaces
6 Conclusions
  6.1 Summary and Observations
  6.2 Future Work
    6.2.1 Shear
    6.2.2 Informed LFS
    6.2.3 Journal-guided Resynchronization
    6.2.4 RAID-aware File Systems
  6.3 The End
LIST OF REFERENCES

LIST OF TABLES

4.1 Journal Write Records
4.2 Complexity of Linux Modifications

LIST OF FIGURES

2.1 Examples and Terminology
2.2 Pattern Size Detection Algorithm
2.3 Pattern Size Detection: Sample Execution
2.4 Pattern Size Detection: Simulations
2.5 Chunk Size Detection Algorithm
2.6 Chunk Size Detection: Sample Execution
2.7 Chunk Size Detection: Simulations
2.8 Layout Detection Algorithm
2.9 Read Layout Detection: Simulations
2.10 Pattern Size and Chunk Size Detection: RAID-0
2.11 Pattern Size Detection: RAID-5
2.12 Read and Write Layout Detection: RAID-5
2.13 Pattern Size, Chunk Size, and Layout Detection: RAID-4
2.14 Pattern Size, Chunk Size, and Layout Detection: P+Q
2.15 Pattern Size, Chunk Size, and Layout Detection: RAID-1
2.16 Pattern Size, Chunk Size, and Layout Detection: Chained Declustering
2.17 Shear Overhead
2.18 Sensitivity to Region Size
2.19 Avoiding the Write Buffer
2.20 Redundancy Detection
2.21 Detecting Misconfigured Layouts
2.22 Detecting Heterogeneity
2.23 Detecting Failure
2.24 Skippy
2.25 Benefits of Stripe Alignment
3.1 An Example E×RAID Configuration
3.2 The Crossed Pointer Problem
3.3 Baseline Performance Comparison
3.4 Storage Expansion
3.5 Static Storage Heterogeneity
3.6 Dynamic Storage Heterogeneity
3.7 Storage Failure
3.8 Per-file Redundancy
3.9 Lazy Mirroring
4.1 Failure Scenarios
4.2 Software RAID Vulnerability
4.3 Software RAID Resynchronization Time
4.4 Random Write Performance
4.5 Sequential Write Performance
4.6 Sprite Create Performance
4.7 ssh Benchmark Performance
4.8 Postmark Performance
4.9 TPC-B Performance
4.10 TPC-B with Varied sync() Intervals
4.11 Software RAID Resynchronization

ABSTRACT

Though there have been substantial innovations in both file systems and storage systems over the past twenty-five years, the interface with which they communicate has remained simple and abstract. The storage stack that exists today was not developed in a coherent manner; rather, it evolved over time due to the flexible nature of this abstraction. The result is an information gap: the file system no longer understands the nature of its underlying storage, and the storage system cannot comprehend the semantics of the blocks it stores. In the end, this obscuring interface has led to the development of independent, locally optimized, complex layers whose interactions may lead to poor performance and limited functionality.

Therefore, we believe it is time to re-examine the structure of the storage stack and the division of labor between the file system and storage system layers.
Specifically, we advocate designs that enable vertical coordination and even collaboration between layers in order to achieve the goals of the entire storage stack. Furthermore, we believe the key to achieving these goals lies in information, and therefore depends on the development of informing interfaces that facilitate such vertical designs.

In this dissertation, we take three steps in this direction. First, we develop a system that automatically discovers the properties of a RAID storage system using only the logical block abstraction, transforming the obscuring interface into a basic informing interface. Second, we examine the applicability of such storage level information by designing an improved informing interface and a file system that explicitly manages the multiple disks of a storage array. Third, we develop a vertical design for collaboration between journaling file systems and RAID systems that improves the reliability and availability of the overall storage system.

Chapter 1

Introduction

A chasm exists in the world of file storage and management. Though a hierarchical file system of directories and byte-accessible files has been the norm for almost 30 years [52], the internals of both file systems and underlying storage systems have evolved substantially.

In file systems, many approaches have been developed to improve performance, including read-optimized inode and file placement [40], logging of writes [55], improved meta-data update methods [67], more scalable internal data structures [75], and off-line reorganization strategies [39]. However, almost all such techniques have been developed under the assumption that the file system will be run upon a single, traditional disk. Storage systems have also received