RAID: Redundant Array of Inexpensive Disks

CS-350-1

Spring 2004

Jason Drown Mark Rodden

Table of Contents

List of Figures

Introduction

History of RAID

Striping

Levels of RAID
    RAID Level 0
    RAID Level 1
    RAID Level 2
    RAID Level 3
    RAID Level 4
    RAID Level 5
    RAID Level 10
    RAID Level 0+1
    Other Levels

Conclusion

Works Cited

List of Figures

Figure 1: Data Mapping for RAID Level 0

Figure 2: Data Mapping for RAID Level 1

Figure 3: Data Mapping for RAID Level 3 and 4

Figure 4: Data Mapping for RAID Level 5

Figure 5: Data Mapping for RAID Level 10

Figure 6: Data Mapping for RAID Level 0+1

Introduction

The purpose of this report is to provide information about RAID. RAID stands for Redundant Array of Inexpensive (or Independent) Disks. RAID provides a method of accessing multiple individual disks as if the array were one larger disk. Data access is spread over these disks, which reduces the risk of losing all data if one of the drives fails and improves access time to data.

Typically, RAID is used in large file servers and in transaction or application servers, where data accessibility is critical and fault tolerance is required. Nowadays, RAID is also being used in desktop systems for multimedia editing and playback, where higher transfer rates are needed.

History of RAID

Early disk drives were enormously costly and occupied a large amount of physical space in proportion to their storage capacity, so only the largest computers were equipped with large disk storage systems. RAID began as a research project at the University of California, Berkeley, and the concept was introduced in 1988 by David Patterson, Garth Gibson, and Randy Katz in their paper, "A Case for Redundant Arrays of Inexpensive Disks."

In their paper, Patterson, Gibson, and Katz defined five levels of RAID with different performance and reliability characteristics. Over time, more levels have been, or are becoming, accepted by the industry. The paper described array configurations and applications for multiple inexpensive hard disks, providing fault tolerance (redundancy) and improved access rates. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array that yields performance exceeding that of a Single Large Expensive Disk (SLED). Additionally, this array of drives appears to the computer as a single logical storage unit. (Neuffer 3)

Striping

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping involves partitioning each drive's storage space into strips, which may be as small as one sector (512 bytes) or as large as several megabytes. These strips are then interleaved round-robin, so that the combined space (stripes) is composed alternately of strips from each drive. The type of application environment, I/O intensive or data intensive, determines whether large or small strips should be used. (Neuffer 3)
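The round-robin interleaving described above can be sketched in a few lines of Python. This is only an illustration: the function names `locate_strip` and `locate_byte`, the three-drive array, and the 512-byte strip size are assumptions chosen for the example, not part of any particular RAID controller.

```python
def locate_strip(logical_strip: int, num_drives: int) -> tuple[int, int]:
    """Map a logical strip number to (drive index, strip index on that drive)."""
    drive = logical_strip % num_drives       # round-robin choice of drive
    stripe = logical_strip // num_drives     # row (stripe) the strip belongs to
    return drive, stripe


def locate_byte(offset: int, strip_size: int, num_drives: int) -> tuple[int, int]:
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    drive, stripe = locate_strip(offset // strip_size, num_drives)
    return drive, stripe * strip_size + offset % strip_size


# Hypothetical array: 3 drives. Logical strips 0..5 alternate across the drives.
layout = [locate_strip(i, 3) for i in range(6)]
print(layout)

# With 512-byte strips, find where logical byte 1300 lives.
print(locate_byte(1300, 512, 3))
```

Consecutive logical strips land on different drives, which is why a large sequential request can keep several drives busy at once.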

Levels of RAID

--RAID Level 0--

Also known as "Disk Striping", this is technically not a RAID level since it provides no fault tolerance. The data is broken down into blocks and each block is written to a separate disk drive. Since data is written in blocks across multiple drives, one drive can be writing or reading a block while the next drive is seeking the next block.

The advantages of striping are the higher access rate, and full utilization of the array capacity. The disadvantage is that there is no fault tolerance - if one drive fails, the entire contents of the array become inaccessible. It offers no data security and is used where maximum performance is the only concern.

Figure 1: Data Mapping for RAID Level 0 (ACNC 1)

--RAID Level 1--

Known as "Disk Mirroring", RAID level 1 provides redundancy by writing all data twice, once to each drive. It is the simplest RAID storage subsystem design; if one drive fails, the other contains an exact duplicate of the data and the RAID can switch to the mirror drive with no lapse in user accessibility.

The disadvantages of mirroring are no improvement in data access speed, and higher cost, since twice the number of drives is required. However, it provides the best protection of data since the array management software will simply direct all application requests to the surviving drive when one fails.

Figure 2: Data Mapping for RAID Level 1 (ACNC 2)
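The mirroring behaviour described for RAID level 1 can be illustrated with a toy model. The class `MirroredPair` below is hypothetical and keeps its "drives" as in-memory dictionaries; it is a sketch of the idea that every write goes to both drives and reads fall back to the survivor, not a model of real array management software.

```python
class MirroredPair:
    """Toy RAID 1 array: two in-memory 'drives' that hold identical blocks."""

    def __init__(self):
        self.drives = [{}, {}]            # block number -> data, one dict per drive
        self.failed = [False, False]

    def write(self, block, data):
        # Every write is sent to each surviving drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Serve the read from the first surviving drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index):
        # Simulate the loss of one drive.
        self.failed[index] = True


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)                  # first drive is lost
recovered = pair.read(0)      # request is redirected to the mirror
print(recovered)
```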

--RAID Level 2--

RAID Level 2, which uses error correction codes, is intended for use with drives which do not have built-in error detection. It was conceived when disk drives were more expensive and less sophisticated. A Hamming code detects errors that occur and determines which part of the data is in error. Most drives support built-in error detection now, so this level is of little use and is not used in the industry today.
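To show how a Hamming code can both detect an error and identify which bit is wrong, the following sketch uses the small Hamming(7,4) code: four data bits protected by three check bits. The function names are hypothetical, and the choice of a 7-bit codeword is an assumption made for illustration; the report does not state which code size a RAID level 2 array would use.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4     # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4     # check bit covering positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4     # check bit covering positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]   # codeword positions 1..7


def hamming_syndrome(code):
    """Return 0 if the codeword is consistent, else the 1-based position of the bad bit."""
    s1 = code[0] ^ code[2] ^ code[4] ^ code[6]
    s2 = code[1] ^ code[2] ^ code[5] ^ code[6]
    s4 = code[3] ^ code[4] ^ code[5] ^ code[6]
    return s1 + 2 * s2 + 4 * s4


def hamming_correct(code):
    """Flip the single bad bit (if any) and return the 4 data bits."""
    fixed = list(code)
    position = hamming_syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


codeword = hamming_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                        # corrupt the bit at position 5
print(hamming_syndrome(damaged))       # position of the bad bit
print(hamming_correct(damaged))        # original data bits
```

The code corrects any single flipped bit in a codeword; two or more simultaneous errors are beyond what this small code can repair.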

--RAID Level 3--

Similar to RAID level 0, RAID level 3 stripes data across multiple drives. However, for this level an additional drive is dedicated to parity for error correction and data recovery. Data is striped at the byte level across all data drives. This level provides very high 'Read' and 'Write' data transfer rates, and disk failure has an insignificant impact on throughput. If a drive fails, its data can be reconstructed from the parity drive and the surviving data drives. A low ratio of parity disks to data disks makes for high efficiency, but in order to optimize performance disk rotations must be synchronized, which can be difficult to maintain.
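Parity-based recovery can be illustrated with exclusive-OR (XOR), the usual way a single parity drive is computed. The sketch below stripes a short message byte by byte over three hypothetical data drives, computes a parity drive, and rebuilds one lost drive. The helper names and the sample message are assumptions for the example.

```python
from functools import reduce


def stripe_bytes(data: bytes, num_data_drives: int) -> list:
    """Byte-level striping: byte i goes to drive i mod num_data_drives."""
    return [data[i::num_data_drives] for i in range(num_data_drives)]


def xor_bytes(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


message = b"RAID level 3"                 # 12 bytes, divides evenly over 3 drives
drives = stripe_bytes(message, 3)
parity = xor_bytes(*drives)               # contents of the dedicated parity drive

# Drive 1 fails: XOR the surviving data drives with the parity drive.
rebuilt = xor_bytes(drives[0], drives[2], parity)

# Re-interleave the bytes to recover the original message.
restored = bytes(b for column in zip(drives[0], rebuilt, drives[2]) for b in column)
print(restored)
```

Because XOR is its own inverse, the same operation that produces the parity also recovers any one missing drive.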

--RAID Level 4--

Like level 3, RAID level 4 stripes data across multiple drives and writes parity to a dedicated parity drive. However, RAID level 4 stripes data at the block level instead of the byte level. The parity information allows recovery from the failure of any single drive, and the performance of a level 4 array is very good for 'Reads' (the same as level 0) due to the larger strip size. It has the slowest 'Writes', however, because the parity disk is accessed twice for each 'Write' cycle.

Figure 3: Data Mapping for RAID Level 3 and 4 (ACNC 4)
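The write penalty noted for RAID level 4 comes from updating the parity whenever a single block changes. A minimal sketch, assuming XOR parity and using hypothetical in-memory lists in place of disks, shows the read-modify-write sequence and why the parity disk is touched twice per write.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, stripe, new_block):
    """Overwrite one block, patch the parity, and return the disk accesses made."""
    old_block = data_disks[disk][stripe]      # read old data
    old_parity = parity_disk[stripe]          # read old parity
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_block), new_block)
    data_disks[disk][stripe] = new_block      # write new data
    parity_disk[stripe] = new_parity          # write new parity
    return ["read data", "read parity", "write data", "write parity"]


# Hypothetical array: three data disks and one parity disk, one stripe each.
data_disks = [[b"\x0f\x0f"], [b"\xf0\x01"], [b"\x33\x10"]]
parity_disk = [xor_blocks(xor_blocks(data_disks[0][0], data_disks[1][0]),
                          data_disks[2][0])]

accesses = small_write(data_disks, parity_disk, disk=1, stripe=0,
                       new_block=b"\xaa\xbb")
print(accesses)
```

Every small write to any data disk goes through the same parity disk, so in this scheme the parity disk becomes the bottleneck as write traffic grows.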

--RAID Level 5--

RAID level 5 is the most popular configuration, providing striping as well as parity for error recovery. In RAID level 5, the parity block is distributed among the drives of the array (parity is written onto the next available drive rather than a dedicated parity drive), giving a more balanced access load across the drives. The parity information is used to recover data if one drive fails, and this, combined with the load balancing, is the reason this method is the most popular. The disadvantage is a relatively slow write cycle (2 reads and 2 writes are required for each block written).

Figure 4: Data Mapping for RAID Level 5 (ACNC 6)
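The effect of distributing parity can be sketched by computing which drive holds the parity block for each stripe. The rotation rule used here (stripe number modulo the number of drives) is an assumption for illustration; real implementations use several different rotation orders. The function names are hypothetical.

```python
from collections import Counter


def parity_drive(stripe: int, num_drives: int) -> int:
    """Drive holding the parity block for a stripe (one simple rotation rule)."""
    return stripe % num_drives


def stripe_layout(stripe: int, num_drives: int) -> list:
    """Mark each drive in a stripe as parity ('P') or data ('D')."""
    p = parity_drive(stripe, num_drives)
    return ["P" if d == p else "D" for d in range(num_drives)]


# Hypothetical 4-drive array: print the first four stripes.
rows = [" ".join(stripe_layout(s, 4)) for s in range(4)]
for row in rows:
    print(row)

# Count how many parity blocks each drive holds over 8 stripes.
load = Counter(parity_drive(s, 4) for s in range(8))
print(dict(load))
```

Each drive ends up with the same share of parity blocks, so parity updates are spread over the whole array instead of queuing on one dedicated drive as in level 4.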

--RAID Level 10--

This is striping and mirroring combined, without parity. It is created by combining level 0 and level 1 controllers. A single level 0 controller stripes the data to multiple level 1 controllers, and each level 1 controller then mirrors its data to two different disks. The advantages are fast data access (like RAID 0) and tolerance of a single drive failure, or of multiple drive failures in some cases. At most half of the drives can fail while the array continues to function, provided that no mirrored pair loses both of its disks. It requires at least four disks, however, and has limited scalability at a high cost.

Figure 5: Data Mapping for RAID Level 10 (ACNC 9)
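Whether a RAID level 10 array survives a given set of drive failures depends on which drives fail, not only on how many. The sketch below assumes a hypothetical numbering in which drives 2k and 2k+1 form mirrored pair k; the function name is likewise an assumption.

```python
def raid10_survives(failed: set, num_pairs: int) -> bool:
    """Array survives if every mirrored pair still has at least one working drive."""
    # Drives 2k and 2k+1 are assumed to form mirrored pair k.
    return all(not {2 * k, 2 * k + 1} <= failed for k in range(num_pairs))


# Hypothetical 4-drive array (two mirrored pairs).
print(raid10_survives({0, 2}, num_pairs=2))   # one drive lost from each pair
print(raid10_survives({0, 1}, num_pairs=2))   # both drives of pair 0 lost
```

Losing one drive from every pair removes half the drives and the array still works, while losing both drives of a single pair is fatal even though only two drives have failed.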

--RAID Level 0+1--

This is also striping and mirroring combined, without parity. Similar to level 10, this level is a combination of level 0 and level 1 controllers. This time, a level 1 controller mirrors the data to two level 0 controllers, which then stripe the data over their own sets of drives. The advantages are fast data access (like RAID 0), the execution of parallel reads, and single drive fault tolerance (like RAID 1). RAID 0+1 still requires twice the number of disks (like RAID 1 and RAID 10).

Figure 6: Data Mapping for RAID Level 0+1 (ACNC 11)
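The single drive fault tolerance of RAID level 0+1 follows from its layout: each half of the mirror is a striped set, and a striped set is unusable once any one of its drives fails. A minimal sketch, assuming a hypothetical numbering in which drives 0..n-1 form the first striped set and drives n..2n-1 form the second:

```python
def raid01_survives(failed: set, drives_per_set: int) -> bool:
    """Array survives if at least one of the two striped sets is fully intact."""
    set_a = set(range(drives_per_set))                        # first striped set
    set_b = set(range(drives_per_set, 2 * drives_per_set))    # its mirror
    return not (failed & set_a) or not (failed & set_b)


# Hypothetical 4-drive array (two striped sets of two drives each).
print(raid01_survives({0}, drives_per_set=2))      # one failure: mirror set intact
print(raid01_survives({0, 1}, drives_per_set=2))   # both failures in the same set
print(raid01_survives({0, 2}, drives_per_set=2))   # one failure in each set
```

Any single failure is survivable, but a second failure is only survivable if it falls in the striped set that is already out of service.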

--Other Levels--

There are many other possible levels of RAID. RAID levels 0-6 are the basic foundation, and all other levels are currently combinations of them. Many are only theoretical, though, as they would not be practical to implement.

Conclusion

This report has provided information about the history and use of RAID. RAID provides a method of accessing multiple individual disks as if the array were one larger disk, spreading data access out over these multiple disks, thereby reducing the risk of losing all data if one of the drives fails, and improving access time to data. The six basic levels of RAID and how they are implemented were covered, along with two combinations which are becoming more prevalent in the industry.

Works Cited

Advanced Computer & Network Corporation (2000). "RAID.edu" URL: http://www.acnc.com/04_00.html

Alonso, Dr. Gustavo (2002). "RAID and Data Striping." URL: http://edu.gbssg.ch/informatik/Lehrstoff/Informatik/Datenbanken/L4.pdf

Electronix Corporation (2003). "What is RAID?" URL: http://www.raidweb.com/whatis.html

Kintronics (2003). "Whitepaper on RAID." URL: http://www.kintronics.com/raidwpaper.htm

Neuffer, Mike (2001). "High Performance SCSI and RAID." URL: http://www.uni-mainz.de/~neuffer/scsi/index.html

Farley, Marc (2000). Building Storage Networks. Berkeley, CA: Osborne/McGraw-Hill. QA 76.9.D3 F37 2000.
