Reliable, Parallel Storage Architecture: RAID & Beyond
Garth Gibson
School of Computer Science & Dept. of Electrical and Computer Engineering
Carnegie Mellon University
[email protected]
URL: http://www.cs.cmu.edu/Web/Groups/PDL/
Anonymous FTP on ftp.cs.cmu.edu in project/pdl

Patterson, Gibson, Katz, SIGMOD, 88.
Gibson, Hellerstein, ASPLOS III, 89.
Gibson, MIT Press, 92.
Gibson, Patterson, J. Parallel and Distributed Computing, 93.
Drapeau, et al., ISCA, 94.
Chen, Lee, Gibson, Katz, Patterson, ACM Computing Surveys, 94.
Stodolsky, Courtright, Holland, Gibson, ACM Trans. on Computer Systems, 94.
Holland, Gibson, Siewiorek, J. Distributed and Parallel Databases, 94.
Patterson, Gibson, Parallel and Distributed Information Systems, 94.
Patterson, et al., CMU-CS-95-134, 95.
Gibson, et al., Compcon, 95.

Tutorial Outline
• RAID basics: striping, RAID levels, controllers
• recent advances in disk technology
• expanding RAID markets
• RAID reliability: high and higher
• RAID performance: fast recovery, small writes
• embedding storage management
• exploiting disk parallelism: deep prefetching
• RAID over the network

Review of RAID Basics (RAID in the 80s)
Motivating need for more storage parallelism
Striping for parallel transfer, load balancing
Rapid review of original RAID levels
Simple performance model of RAID levels 1, 3, 5
Basic controller design
RAID = Redundant Array of Inexpensive (Independent) Disks
Patterson, Gibson, Katz, SIGMOD, 88. P. Massiglia, DEC, RAIDBook, 93.

What I/O Performance Crisis?
Existing huge gap in performance
• the disk:DRAM access-time ratio is 1000X the SRAM:DRAM and on-chip:SRAM ratios
Cache-ineffective applications stress the hierarchy
• video, data mining, scientific visualization, digital libraries
Increasing gap in performance
• 40-100+%/year VLSI versus 20-40%/year magnetic disk
Amdahl's law implies diminishing decreases in application response time

Disk Arrays: Parallelism in Storage

More Disk Array Motivations
Volumetric and floorspace density
Exploit best technology trend to smaller disks
Increasing requirement for high reliability
Increasing requirement for high availability
Enabling technology: SCSI storage abstraction

Data Striping for Read Throughput
Parallelism will only be effective if it provides
• load balancing for high-concurrency, small accesses
• parallel transfer for low-concurrency, large accesses
Striping data provides both (see the address-mapping sketch below)
• uniform load for small independent accesses: stripe unit large enough to contain single accesses
• parallel transfer for large accesses: stripe unit small enough to spread an access widely
[Figure: stripe units 0-3 and 4-7 laid out round-robin across four disks]
Livny, Khoshafian, Boral, SIGMETRICS, 87.
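To make the striped layout concrete, here is a minimal sketch, not from the tutorial, of the round-robin logical-to-physical map a striping driver computes (function and parameter names are mine; the final assert checks it against the four-disk figure above):

    def stripe_map(block, ndisks, unit_blocks):
        """Map a logical block to a (disk, disk block) pair."""
        unit = block // unit_blocks          # which stripe unit holds the block
        disk = unit % ndisks                 # stripe units rotate across the disks
        disk_block = (unit // ndisks) * unit_blocks + block % unit_blocks
        return disk, disk_block

    # blocks 0-7 with one-block stripe units on 4 disks reproduce the figure
    assert [stripe_map(b, 4, 1) for b in range(8)] == [
        (0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]

Every stripe-unit size plugs in through unit_blocks; picking its value is exactly the knob the next slides tune.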
Sensitivity to Stripe Unit Size
Simple model of 10 balanced, sync'd disks
[Figure: two plots versus request size (1 KB to 1 MB): response time at low loads (ms) and throughput at 50% utilization (IO/s), comparing non-striped, 32K-byte-interleaved, and 1K-byte-interleaved arrays]
Gibson, MIT Press, 92 (derived from J. Gray, VLDB, 90).

Selecting a Good Striping Unit (S.U.)
Balance benefit to penalty:
• benefit: parallel transfer shortens transfer time
• penalty: parallel positioning consumes more disk time
Stochastic simulation study of maximum throughput yields simple rules of thumb
• 16 disks, sync'd; choose the S.U. that maximizes the minimum normalized throughput
Given I/O concurrency (conc):
• S.U. = 1/4 * (positioning time) * (transfer rate) * (conc - 1) + 1
Given zero workload knowledge:
• S.U. = 2/3 * (positioning time) * (transfer rate)
Chen, Patterson, ISCA, 90.

Adding Redundancy to Striped Arrays
Meet performance needs with more disks in the array
• implies more disks vulnerable to failure
Striping all data implies a failure affects most files
• failure recovery involves processing a full dump and all incrementals
Provide failure protection with a redundant code == RAID
Single-failure-protecting codes:
• a general single-error-correcting code is too powerful
• disk failures are self-identifying - called erasures
• fact: a T-error-detecting code is also a T-erasure-correcting code
Parity is single-disk-failure-correcting (see the reconstruction sketch after the parity-placement slide)

Synopsis of RAID Levels
RAID Level 0: Non-redundant
RAID Level 1: Mirroring
RAID Level 2: Byte-Interleaved, ECC
RAID Level 3: Byte-Interleaved, Parity
RAID Level 4: Block-Interleaved, Parity
RAID Level 5: Block-Interleaved, Distributed Parity
Chen, Lee, Gibson, Katz, Patterson, ACM Computing Surveys, 94.

Basic Tradeoffs in RAID Levels
Relative to a non-redundant, 16-disk (sync'd) array
[Figure: bar chart of capacity (C), large reads (R), small reads (r), large writes (W), and small writes (w) as percentages of the non-redundant array, for RAID Level 1 (Mirroring), RAID Level 3 (Byte-Interleaved Parity), and RAID Level 5 (Block-Interleaved, Distributed Parity)]
Patterson, Gibson, Katz, SIGMOD, 88. Bitton, Gray, VLDB, 88.

Parity Placement
Throughput generally insensitive
• stochastic workload on 2 groups of 4+1, 32 KB stripe units
[Figure: left-symmetric data/parity layout (parity P0-P4 rotating right-to-left across five disks, data units 0-19 filling the remainder) and throughput (MB/s) versus request size (in 100 KB units): RAID 0, XLS, and FLS highest, then LS, then RAID 4, RA, and LA, with RS lowest]
Lee, Katz, ASPLOS IV, 91.
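To pin down the layout in the figure above, a minimal sketch, again not from the tutorial, of the left-symmetric map (names are mine; the asserts reproduce row 0 of the figure and the rule that data never lands on a row's parity disk):

    def left_symmetric(unit, ndisks):
        """Map a data stripe-unit number to (disk, row, parity disk)."""
        row = unit // (ndisks - 1)                   # each row holds one parity unit
        disk = unit % ndisks                         # data rotates across all disks
        parity_disk = (ndisks - 1) - (row % ndisks)  # parity rotates right-to-left
        assert disk != parity_disk                   # data never collides with parity
        return disk, row, parity_disk

    # row 0 of the figure: units 0-3 on disks 0-3, parity P0 on disk 4
    assert [left_symmetric(u, 5)[0] for u in range(4)] == [0, 1, 2, 3]
    assert left_symmetric(0, 5)[2] == 4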
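And, picking up the earlier claim that parity is single-disk-failure-correcting, a minimal sketch of erasure recovery by XOR, with disks simulated as byte strings and helper names of my own choosing:

    from functools import reduce

    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data = [b"ABCD", b"EFGH", b"IJKL"]     # three data "disks"
    parity = reduce(xor_blocks, data)      # the parity "disk"

    # a failed disk is an erasure: we know WHICH block is gone,
    # so XORing the survivors with parity rebuilds it exactly
    rebuilt = reduce(xor_blocks, [data[0], data[2], parity])
    assert rebuilt == data[1]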
Stripe Unit (S.U.) Size in RAID
Writes in RAID must update the redundancy code
• full-stripe overwrites can compute the new code in memory
• other accesses must pre-read disk blocks to determine the new code
• write work limits concurrency gains with a large S.U.
Rerun Chen's ISCA 90 simulations in RAID 5 (see the sketches at the end of this section)
Given I/O concurrency (conc):
• read S.U. = 1/4 * (positioning time) * (transfer rate) * (conc - 1) + 1
• write S.U. = 1/20 * (positioning time) * (transfer rate) * (conc - 1) + 1
Given zero workload knowledge:
• S.U. = 1/2 * (positioning time) * (transfer rate)
Chen, Lee, U. Michigan CSE-TR-181-93.

Modeling RAID 5 Performance
Queueing model for RAID 5 response time
• controller directly manages disk queues
• separate channels for each disk
• first-come-first-served scheduling policies
• accesses are 1 to N stripe units, N = number of data disks
• detailed model of disk seek and rotate used
Model succeeds for the case of special parity handling
• separate, high-priority queue for parity R-M-W operations
• parity R-M-W not generated until the (last) data operation starts
• parity R-M-W does not preempt the current data operation
• implies the parity update nearly always takes one rotation, finishing last
S. Chen, D. Towsley, J. Parallel and Distributed Computing, 93.

Example Controller: NCR (Symbios) 6299
Minimal buffering, for RAID 5 parity calculation only
• low-cost, low-complexity controller - quick to market
• problem: concurrent disk accesses return out-of-order (TTD/CIOP)
• problem: max RAID 5 bandwidth is a single drive's bandwidth
[Figure: block diagram: a Fast/Wide SCSI-2 NCR 53C916 host interconnect, multiple Fast SCSI-2 NCR 53C916 drive channels, and a MUX feeding an NCR 53C920 XOR engine with a 64 KB read-modify-write SRAM buffer]
Chen, Lee, Gibson, Katz, Patterson, ACM Computing Surveys, 94.

Adding Buffering to RAID Controller
Performance advantages; added complexity
• parallel independent transfers to all disks
• XOR placement: stream-through or separate transfer
• buffer management - extra buffers give higher concurrency
[Figure: buffered controller: large buffer and XOR engines on a high-bandwidth controller data bus between the host and the disks]
Menon, Cortney, ISCA, 93.

Recap RAID Basics
Disk arrays respond to increasing requirements for I/O throughput, reliability, availability
Data
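As promised on the stripe-unit slide, two sketches follow. First, the RAID 5 small write: because only part of a stripe changes, the controller must pre-read the old data and parity, then write new data and parity - four disk operations. A minimal simulation with byte strings for disk blocks (names are mine, not the tutorial's):

    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write(old_data, old_parity, new_data):
        """Four disk ops: pre-read old data and parity, write new data and parity."""
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
        return new_data, new_parity

    d = [b"ABCD", b"EFGH", b"IJKL"]                      # one stripe of data units
    p = xor_blocks(xor_blocks(d[0], d[1]), d[2])         # its parity unit
    d1, p = small_write(d[1], p, b"WXYZ")                # overwrite the middle unit
    assert p == xor_blocks(xor_blocks(d[0], d1), d[2])   # parity still covers stripe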
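Second, the stripe-unit rules of thumb from the Chen/Patterson and Chen/Lee slides, evaluated with assumed, illustrative disk parameters (not figures from the tutorial; I read the "+1" term as 1 KB):

    def stripe_unit_kb(factor, pos_ms, xfer_kb_per_ms, conc):
        """Rule of thumb: factor * positioning time * transfer rate * (conc - 1) + 1."""
        return factor * pos_ms * xfer_kb_per_ms * (conc - 1) + 1

    pos_ms, xfer = 15.0, 3.0    # assumed: ~15 ms positioning, ~3 MB/s transfer
    for conc in (2, 5, 10, 20):
        read = stripe_unit_kb(1 / 4, pos_ms, xfer, conc)    # read-oriented rule
        write = stripe_unit_kb(1 / 20, pos_ms, xfer, conc)  # RAID 5 write rule
        print(f"conc={conc:2d}: read S.U. ~{read:.0f} KB, write S.U. ~{write:.0f} KB")

    # zero workload knowledge: 2/3 (reads only) vs 1/2 (RAID 5) of pos * rate
    print(f"no-knowledge S.U.: ~{2 / 3 * pos_ms * xfer:.0f} KB reads, "
          f"~{1 / 2 * pos_ms * xfer:.0f} KB RAID 5")

The factor-of-five drop from the read rule to the write rule reflects the small-write penalty above: spreading a write thinly multiplies the pre-read and parity-update work.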