Designing Disk Arrays for High Data Reliability Garth A. Gibson School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsbugh PA 15213 David A. Patterson Computer Science Division Electrical Engineering and Computer Sciences University of California at Berkeley Berkeley, CA 94720 hhhhhhhhhhhhhhhhhhhhhhhhhhhhh This research was funded by NSF grant MIP-8715235, NASA/DARPA grant NAG 2-591, a Computer Measurement Group fellowship, and an IBM predoctoral fellowship. 1 Proposed running head: Designing Disk Arrays for High Data Reliability Please forward communication to: Garth A. Gibson School of Computer Science Carnegie Mellon University 5000 Forbes Ave. Pittsburgh PA 15213-3890 412-268-5890 FAX 412-681-5739
[email protected] ABSTRACT Redundancy based on a parity encoding has been proposed for insuring that disk arrays provide highly reliable data. Parity-based redundancy will tolerate many independent and dependent disk failures (shared support hardware) without on-line spare disks and many more such failures with on-line spare disks. This paper explores the design of reliable, redundant disk arrays. In the context of a 70 disk strawman array, it presents and applies analytic and simulation models for the time until data is lost. It shows how to balance requirements for high data reliability against the overhead cost of redundant data, on-line spares, and on-site repair personnel in terms of an array's architecture, its component reliabilities, and its repair policies. 2 Recent advances in computing speeds can be matched by the I/O performance afforded by parallelism in striped disk arrays [12, 13, 24]. Arrays of small disks further utilize advances in the technology of the magnetic recording industry to provide cost-effective I/O systems based on disk striping [19].