RAID Technology Overview HP Smart Array RAID Controllers

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Warranty A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office.

U.S. Government License Proprietary computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Trademark Notices UNIX® is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.

Table of Contents

About This Document
    Intended Audience
    New and Changed Documentation in This Edition
    Publishing History
    Document Organization
    Typographical Conventions
    Related Documents
    HP Encourages Your Comments

1 Introduction to RAID Technology
    What is RAID?
    The RAID Concept
    Physical Disks and Logical Drives
    Performance and Data Redundancy
    Increasing Logical Drive Performance
    Protecting Data With Fault Tolerance and Spare Disks

2 Smart Array Controller Supported RAID Configurations
    RAID 0: No Fault Tolerance
    RAID 1: Disk Mirroring
    RAID 1+0: Disk Mirroring and Striping
    RAID 5: Distributed Data Guarding
    RAID ADG: Advanced Data Guarding
    Summary of RAID Methods
    Choosing a RAID Method

A Logical Drive Failure Probability
    RAID Level and Probability of Drive Failure

Glossary

Index

List of Figures

1-1 Physical Disks Added to System
1-2 Physical Disks Configured into a Logical Drive (L1)
1-3 Data Striping (S1-S4) of Data Blocks B1-B12
2-1 Data Striping (S1-S4) of Data Blocks B1-B12
2-2 Disk Drive Mirroring of P1 onto P2 (RAID 1)
2-3 Mirroring and Striping (RAID 1+0)
2-4 Distributed Data Guarding, Showing Parity Information (Px,y)
2-5 Advanced Data Guarding, Showing Parity Information (Px,y and Qx,y)
A-1 Relative Probability of Logical Drive Failure

List of Tables

2-1 Summary of RAID Methods
2-2 Choosing a RAID Method

About This Document

This document provides an overview of Redundant Array of Independent Disks (RAID) technology. The information in this document applies to HP Integrity and HP 9000 servers equipped with HP Smart Array RAID controllers, running any supported operating system. The latest version of this document can be found online at: http://docs.hp.com/en/netcom.html#Smart%20Array%20%28RAID%29

Intended Audience

This document is intended for system and network administrators responsible for installing, configuring, and managing fault tolerant data storage. Administrators are expected to have knowledge of server hardware and operating system concepts, commands, and configuration. This document is not a tutorial.

New and Changed Documentation in This Edition

This is the first edition of this document.

Publishing History

This is the first edition of this document.

Document Organization

The RAID Technology Overview is divided into several chapters containing information about RAID in general, the RAID levels supported by each Smart Array Controller specifically, and installation, configuration, and troubleshooting details for the Smart Array Controllers. There are also several appendixes containing supplemental information.

Chapter 1 Introduction to RAID Technology: Use this chapter to learn about RAID in general, and how RAID technology can improve performance and data integrity.
Chapter 2 Smart Array Controller Supported RAID Configurations: Use this chapter to learn about the theory, advantages, and disadvantages of each RAID level.
Appendix A Logical Drive Failure Probability

Typographical Conventions

This document uses the following conventions.

Book Title: The title of a book.
Emphasis: Text that is emphasized.
Bold: Text that is strongly emphasized.
Bold: The defined use of an important word or phrase.

Related Documents

Additional information about HP Smart Array RAID controllers can be found at: http://docs.hp.com/en/netcom.html#Smart%20Array%20%28RAID%29

HP Encourages Your Comments

HP encourages your comments concerning this document. We are committed to providing documentation that meets your needs. Please send comments to:

[email protected]

Please include document title, manufacturing part number, and any comment, error found, or suggestion for improvement you have concerning this document. Also, please let us know if there is anything about this document that is particularly useful, so we can incorporate it into our other documents.

1 Introduction to RAID Technology

This chapter provides an overview of RAID technology and descriptions of the different RAID levels that are supported by HP Smart Array Controllers. This chapter addresses the following topics:
• “What is RAID?” (page 11)
• “Performance and Data Redundancy” (page 13)

What is RAID?

The RAID concept was proposed in 1987 when “A Case for Redundant Arrays of Inexpensive Disks (RAID)” was published by David Patterson, Garth Gibson, and Randy Katz at the University of California, Berkeley. The idea was to combine multiple small, inexpensive physical disks into an array that would function as a single logical drive, but provide better performance and higher data availability than a single large expensive disk drive (SLED).

The study defined five different disk array configurations, or RAID levels. All of the RAID levels provided fault tolerance, and each RAID level offered different feature sets and performance to accommodate different system administration priorities and computing environments.

NOTE: RAID now stands for “Redundant Array of Independent Disks”, because disks have become inexpensive.

Small disk drives are lower in performance and have less capacity compared to large disk drives. Small drives also have lower storage density than large drives. However, small disk drives are equal to or better than large disk drives in four areas:
• I/O per actuator (multiple I/O capability)
• Cost per megabyte
• Mean time between failures (MTBF)
• SCSI controller per disk drive (better cost/performance ratio)

Grouping small disk drives into an array provides the following additional advantages:
• High transfer rates
• Increased disk capacity
• High I/O rates

The RAID study pointed out that as the number of disk drives in an array (also called a stripe set) increases, the MTBF of the array decreases. At the time the RAID study was published, if a disk drive crashed, data restoration typically depended on a backup from a tape drive. In addition, the system was taken offline to replace the failed disk.

The RAID Concept

The RAID study proposed a multilevel concept for improved data input/output performance (by combining multiple physical disks) and improved data availability (by avoiding the impact of disk drive failures). Five original RAID configurations, or “levels” (RAID 1 through RAID 5), were defined to meet the needs of various computing environments. As the five original RAID configurations progress from RAID 1 through RAID 5, data redundancy increases.

Overall, RAID has three main attributes that are exploited in some way by all five original RAID configurations and by most other RAID configurations that have been defined since the 1987 study. These attributes are:
• A set of physical disk drives that can function as one or more logical drives (improved I/O)
• Data distribution across multiple physical disks (striping)
• Data recovery, or reconstruction of data in the event of a physical disk failure (redundancy)

“RAID 0” was not defined in the original study, and it does not have all of these attributes. The term was adopted to describe a disk array configuration that includes data block striping, but lacks redundancy. RAID 2, RAID 3 and RAID 4 have become impractical due to technological changes. Other RAID configurations (including some that are proprietary) have been defined over the years as well.

You can read the original RAID study at: http://techreports.lib.berkeley.edu/accessPages/CSD-87-391.html

RAID configurations supported by HP Smart Array Controllers are as follows:
• RAID 0
• RAID 1+0
• RAID 5
• ADG

Physical Disks and Logical Drives

The group of physical disks containing the logical drive is called an array (or drive array). Since all the physical disks in an array are commonly configured into a single logical drive, the term array is also used as a synonym for logical drive. In this document, “disk” refers to a physical disk, and “drive” refers to a logical drive or array.

Performance and Data Redundancy

Increasing Logical Drive Performance

Without an array controller, connecting extra physical disks to a system increases the total storage capacity. However, it has no effect on the efficiency of read/write operations, because data can only be transferred to one physical disk at a time (see Figure 1-1).

Figure 1-1 Physical Disks Added to System

With an array controller, connecting extra physical disks to a system increases both the total storage capacity and the read/write efficiency. The capacity of several physical disks is combined into one or more virtual units called logical drives (also called logical volumes). The read/write heads of all of the physical disks in a logical drive are active simultaneously, improving I/O performance and reducing the total time required for data transfer (see Figure 1-2).

Figure 1-2 Physical Disks Configured into a Logical Drive (L1)

Because the read/write heads for each physical disk are active simultaneously, the same amount of data is written to each disk during any given time interval. Each unit of data is called a block. The blocks form a set of data stripes that are spread evenly over all the physical disks in a logical drive (see Figure 1-3).

Figure 1-3 Data Striping (S1-S4) of Data Blocks B1-B12
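The striping arrangement described above can be sketched in a few lines of Python. This is only an illustration, not the controller's actual algorithm: it assumes that blocks are placed round-robin across the disks and that the logical drive has three physical disks (twelve blocks in four stripes imply three blocks per stripe). The function names `locate_block` and `layout` are hypothetical.

```python
def locate_block(block_index, num_disks):
    """Map a 0-based block index to a 0-based (stripe, disk) pair.

    Assumes simple round-robin placement: consecutive blocks go to
    consecutive disks, and a new stripe starts after the last disk.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if block_index < 0:
        raise ValueError("block_index must not be negative")
    return divmod(block_index, num_disks)


def layout(num_blocks, num_disks):
    """Return one list of block labels (B1, B2, ...) per stripe."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    stripes = []
    for start in range(0, num_blocks, num_disks):
        end = min(start + num_disks, num_blocks)
        stripes.append([f"B{i + 1}" for i in range(start, end)])
    return stripes


if __name__ == "__main__":
    # Twelve blocks on three disks give four stripes, S1 through S4.
    for number, stripe in enumerate(layout(12, 3), start=1):
        print(f"S{number}: {' '.join(stripe)}")
```

Under these assumptions, block B5 (index 4) lands in the second stripe on the second disk, which is what `locate_block(4, 3)` returns as `(1, 1)`.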

For data in the logical drive to be readable, the data block sequence must be the same in every stripe. This sequencing process is performed by the Smart Array Controller, which sends the data blocks to the write heads of the physical disks in the correct order.

In a striped array, each physical disk in a logical drive contains the same amount of data. If one physical disk has a larger capacity than other physical disks in the same logical drive, the extra capacity cannot be used.

A logical drive can extend over more than one channel on the same controller, but it cannot extend over more than one controller.

Disk failure, although rare, is potentially catastrophic to an array. Unless the logical drive is configured with fault tolerance, the failure of a single physical disk causes the logical drive it is assigned to to fail, and all of the data on that logical drive is lost.

Protecting Data With Fault Tolerance and Spare Disks

To protect against data loss due to physical disk failure, logical drives can be configured with fault tolerance. Fault-tolerant RAID configurations that are supported by the Smart Array Controllers are as follows:
• RAID 1: Disk Mirroring only (fault tolerant)
• RAID 1+0: Disk Mirroring and Striping (fault tolerant)
• RAID 5: Distributed Data Guarding (fault tolerant)
• RAID ADG: Advanced Data Guarding (fault tolerant)

For any fault-tolerant configuration, you can create further protection against data loss by assigning a physical disk as an online spare (or “hot spare”). Spare disks contain no data and must be in the same array as the logical drive they are assigned to. Multiple spare physical disks can be assigned to a logical drive, limited only by the availability of unused disks in the array.

NOTE: When multiple logical drives are defined on a controller, spare disks must be assigned to each logical drive.

When a physical disk in the array fails, the controller automatically rebuilds the information from the failed disk onto an online spare. The system is quickly restored to full RAID-level data protection. In the unlikely event that another disk in the array fails while data is being rewritten to the spare, the logical drive may fail, depending on which RAID configuration is in use. For more information, see Appendix A (page 21).

2 Smart Array Controller Supported RAID Configurations

This chapter provides details about each of the RAID levels that are supported by HP Smart Array Controllers. This chapter addresses the following topics:
• “RAID 0: No Fault Tolerance”
• “RAID 1: Disk Mirroring” (page 15)
• “RAID 1+0: Disk Mirroring and Striping” (page 16)
• “RAID 5: Distributed Data Guarding” (page 17)
• “RAID ADG: Advanced Data Guarding” (page 17)
• “Summary of RAID Methods” (page 19)
• “Choosing a RAID Method” (page 19)

RAID 0: No Fault Tolerance

The RAID 0 configuration enhances performance with data striping, but there is no data redundancy to protect against data loss when a physical disk fails. RAID 0 is useful for rapid storage of large amounts of non-critical data (for printing or image editing, for example), or when cost is the most important consideration (see Figure 2-1).

Figure 2-1 Data Striping (S1-S4) of Data Blocks B1-B12

The advantages of RAID 0 are as follows:
• Highest performance configuration for writes
• Lowest cost per unit of data stored
• All disk capacity is used to store data (none needed for fault tolerance)

The disadvantages of RAID 0 are as follows:
• All data on the logical drive is lost if a physical disk fails.
• Online spare disks are not available.
• Data can be preserved only by backing up to external physical disks.

RAID 1: Disk Mirroring

In this configuration, only two physical disks are present in the array. Data is duplicated from one disk onto the other, creating a mirrored pair of disk drives, but there is no striping of data (see Figure 2-2).

Figure 2-2 Disk Drive Mirroring of P1 onto P2 (RAID 1)

The advantages of RAID 1 are as follows:
• No data loss or interruption of service if a disk fails.
• Fast read performance: data is available from either disk.

The disadvantages of RAID 1 are as follows:
• High cost: 50% of disk space is allocated for data protection, so only 50% of total disk drive capacity is usable for data storage.

RAID 1+0: Disk Mirroring and Striping

RAID 1+0 requires an array with four or more physical disks. The disks are mirrored in pairs and data blocks are striped across the mirrored pairs.

Figure 2-3 Mirroring and Striping (RAID 1+0)

In each mirrored pair, the physical disk that is not busy answering other requests answers any read request sent to the array; this behavior is called load balancing. If a physical disk fails, the remaining disk in the mirrored pair can still provide all the necessary data. Several disks in the array can fail without incurring data loss, as long as no two failed disks belong to the same mirrored pair. This fault-tolerance method is useful when high performance and data protection are more important than the cost of physical disks.
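The rule that a RAID 1+0 logical drive survives as long as no two failed disks belong to the same mirrored pair can be expressed as a short check. The following Python sketch is illustrative only. It assumes that disks are numbered from 0 and that disks 0 and 1 form the first mirrored pair, disks 2 and 3 the second, and so on; the function name `raid10_survives` is hypothetical, and the real pairing is determined by the controller.

```python
def raid10_survives(failed_disks, num_disks):
    """Return True if a RAID 1+0 logical drive still holds all of its data.

    Assumed numbering (for illustration only): disks 2k and 2k+1 form
    mirrored pair k. The drive survives if no pair has lost both disks.
    """
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, four or more")
    failed = set(failed_disks)
    if any(disk < 0 or disk >= num_disks for disk in failed):
        raise ValueError("failed disk number out of range")

    pairs_with_a_failure = set()
    for disk in failed:
        pair = disk // 2
        if pair in pairs_with_a_failure:
            return False  # both disks of this mirrored pair have failed
        pairs_with_a_failure.add(pair)
    return True


if __name__ == "__main__":
    print(raid10_survives([0, 2, 4], 6))  # one disk lost in each of three pairs
    print(raid10_survives([0, 1], 6))     # both disks of the first pair lost
```

With six disks, losing disks 0, 2, and 4 leaves one good disk in every pair, so the data remains available; losing disks 0 and 1 removes both copies held by the first pair.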

The advantages of RAID 1+0 are as follows:
• Highest read and write performance of any fault-tolerant configuration.
• No loss of data as long as no failed disk is mirrored to any other failed disk (up to half of the physical disks in the array can fail).

The disadvantages of RAID 1+0 are as follows:
• Expensive: many disks are needed for fault tolerance.
• Only 50% of total disk capacity usable for data storage.

RAID 5: Distributed Data Guarding

RAID 5 uses a parity data formula to create fault tolerance. In RAID 5, one block in each data stripe contains parity data that is calculated from the other data blocks in that stripe. The blocks of parity data are distributed over the physical disks that make up the logical drive, with each physical disk containing only one block of parity data (see Figure 2-4). When a physical disk fails, the data that was on the failed disk can be calculated from the data blocks and parity data on the remaining physical disks in the logical drive. This recovered data is usually written to an online spare in a process called a rebuild.

RAID 5 is useful when cost, performance, and data availability are all equally important.

Figure 2-4 Distributed Data Guarding, Showing Parity Information (Px,y)
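The way a lost block is recalculated from parity can be illustrated with a small Python sketch. This document does not state which parity formula the controller uses; the sketch assumes bytewise exclusive OR (XOR), which is the formula commonly used for single-parity RAID. The function name `xor_blocks` and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings.

    Assumption: parity is computed with XOR. Applied to the data blocks of
    a stripe, this yields the parity block; applied to the surviving data
    blocks plus the parity block, it yields the missing data block.
    """
    blocks = list(blocks)
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
    parity = xor_blocks(stripe)            # stored on a fourth disk

    # Simulate losing the disk that held the second data block.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block cancels them out and leaves the missing block, which is why a single disk failure costs no data.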

The advantages of RAID 5 are as follows:
• High read performance
• No loss of data if one physical disk fails.
• More usable disk capacity than with RAID 1+0; parity information only requires the storage space equivalent to one physical disk on the array.

The disadvantages of RAID 5 are as follows:
• Relatively low write performance
• Data loss occurs if a second disk fails before data from the first failed disk is rebuilt.

RAID ADG: Advanced Data Guarding

RAID Advanced Data Guarding (ADG), sometimes referred to as RAID 6, is similar to RAID 5 in that parity data is generated and stored to protect against data loss caused by physical disk failure. However, with RAID ADG two different sets of parity data are generated for each data block on a stripe. The two parity data blocks are stored on different physical disks, allowing data to be preserved even if two physical disks fail simultaneously. Figure 2-5 illustrates the two sets of parity data that require as much storage capacity as the data blocks they correspond to on each stripe in a logical drive.

RAID ADG is most useful when data loss is unacceptable but cost must also be minimized. The probability that data loss will occur when arrays are configured with RAID ADG is less than when they are configured with RAID 5. For more information, see Appendix A (page 21).

Figure 2-5 Advanced Data Guarding, Showing Parity Information (Px,y and Qx,y)

The advantages of RAID ADG are as follows:
• High read performance.
• High data availability: any two disks can fail without loss of critical data.
• More disk capacity usable than with RAID 1+0; parity information requires only the storage space equivalent to two physical disks.

The only significant disadvantage of RAID ADG is a relatively low write performance (lower than RAID 5), due to the need for two sets of parity data.

Summary of RAID Methods

Table 2-1 summarizes the important features of the different RAID configurations that are supported by the Smart Array Controllers. The decision chart in Table 2-2 (page 19) can help you determine which option is best for your computing environment.

Table 2-1 Summary of RAID Methods

Feature | RAID 0 | RAID 1 | RAID 1+0 | RAID 5 | RAID ADG
Alternative name | Striping (no fault tolerance) | Mirroring | Mirroring and Striping | Distributed Data Guarding | Advanced Data Guarding
Usable disk space* | 100% | 50% | 50% | 67% to 96% | 50% to 93%
Usable disk space formula (fraction of total capacity; n = number of physical disks) | n/n | (n/2)/n | (n/2)/n | (n-1)/n | (n-2)/n
Minimum number of physical disks | 1 | 2 | 4 | 3 | 4
Tolerates failure of one physical disk? | No | Yes | Yes | Yes | Yes
Tolerates simultaneous failure of more than one physical disk? | No | No | Only if no two failed disks are in a mirrored pair | No | Yes
Read performance | High | High | High | High | High
Write performance | High | Medium | Medium | Low | Low
Relative cost | Low | High | High | Medium | Medium

*Values for usable disk space are calculated with these assumptions: (1) All physical disks in the array have the same capacity; (2) Online spares are not used; (3) No more than 28 physical disks are used per array for RAID 5.
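The usable disk space percentages in Table 2-1 follow directly from the formulas for each RAID level. The Python sketch below reproduces them under the assumptions stated in the footnote (equal-capacity disks, no online spares). The function name `usable_fraction` and the level labels are hypothetical. The 28-disk upper bound is stated in this document only for RAID 5; applying it to RAID ADG is an inference from the 93% figure in the table.

```python
def usable_fraction(level, num_disks):
    """Return the fraction of total capacity usable for data.

    Assumes equal-capacity disks and no online spares. Level labels are
    illustrative: "0", "1", "1+0", "5", "ADG".
    """
    minimum = {"0": 1, "1": 2, "1+0": 4, "5": 3, "ADG": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level!r}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "1" and num_disks != 2:
        raise ValueError("RAID 1 uses exactly two disks")
    if level == "1+0" and num_disks % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks")

    if level == "0":
        return 1.0
    if level in ("1", "1+0"):
        return 0.5
    if level == "5":
        return (num_disks - 1) / num_disks   # one disk's worth of parity
    return (num_disks - 2) / num_disks       # ADG: two disks' worth of parity


if __name__ == "__main__":
    for level, n in [("5", 3), ("5", 28), ("ADG", 4), ("ADG", 28)]:
        print(f"RAID {level}, {n} disks: {usable_fraction(level, n):.0%}")
```

For RAID 5 the result runs from 67% with three disks to 96% with 28 disks, and for RAID ADG from 50% with four disks to 93% with 28 disks, matching the ranges in the table.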

Choosing a RAID Method

Table 2-2 summarizes the advantages of each RAID method. Use this table to select the best RAID method for your needs.

Table 2-2 Choosing a RAID Method

Most Important | Also Important | Suggested RAID Level
Fault tolerance | Cost effectiveness | RAID ADG
Fault tolerance | I/O performance | RAID 1, RAID 1+0
Cost effectiveness | Fault tolerance | RAID ADG
Cost effectiveness | I/O performance | RAID 5 (RAID 0 if fault tolerance is not required)
I/O performance | Cost effectiveness | RAID 5 (RAID 0 if fault tolerance is not required)
I/O performance | Fault tolerance | RAID 1, RAID 1+0
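For administrators who script their configuration checks, the decision chart in Table 2-2 can be encoded as a simple lookup. The Python sketch below copies the six rows of the table; the names `SUGGESTED_RAID_LEVEL` and `suggest_raid` are hypothetical and are not part of any HP tool.

```python
# Keys are (most important, also important); values are taken from Table 2-2.
SUGGESTED_RAID_LEVEL = {
    ("fault tolerance", "cost effectiveness"): "RAID ADG",
    ("fault tolerance", "i/o performance"): "RAID 1, RAID 1+0",
    ("cost effectiveness", "fault tolerance"): "RAID ADG",
    ("cost effectiveness", "i/o performance"):
        "RAID 5 (RAID 0 if fault tolerance is not required)",
    ("i/o performance", "cost effectiveness"):
        "RAID 5 (RAID 0 if fault tolerance is not required)",
    ("i/o performance", "fault tolerance"): "RAID 1, RAID 1+0",
}


def suggest_raid(most_important, also_important):
    """Return the suggested RAID level for a pair of priorities.

    Matching ignores case and surrounding whitespace. A pair that does not
    appear in Table 2-2 raises ValueError.
    """
    key = (most_important.strip().lower(), also_important.strip().lower())
    if key not in SUGGESTED_RAID_LEVEL:
        raise ValueError(f"no suggestion for priorities {key!r}")
    return SUGGESTED_RAID_LEVEL[key]


if __name__ == "__main__":
    print(suggest_raid("Fault tolerance", "I/O performance"))
```

The lookup makes one property of the table easy to see: the suggestion depends only on which two priorities are chosen, not on which of them is ranked first.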

A Logical Drive Failure Probability

This appendix discusses the probability of logical drive failure.

RAID Level and Probability of Drive Failure

The probability that a logical drive will fail depends on the RAID level setting.
• A RAID 0 logical drive fails if any one physical disk fails.
• Whether a RAID 1+0 logical drive fails depends on which physical disks fail:
   - The maximum number of physical disks that can fail without causing failure of the logical drive is n/2, where n is the number of physical disks in the array. This maximum is reached only if no failed disk is mirrored to any other failed disk. In practice, a logical drive usually fails before this maximum is reached. As the number of failed disks increases, it becomes increasingly likely that a newly failed disk is mirrored to a previously failed disk.
   - The failure of only two physical disks can cause a logical drive to fail if the two disks happen to be mirrored to each other. The risk of this occurring decreases as the number of mirrored pairs in the array increases.
• A RAID 5 logical drive (with no online spare) fails if two physical disks fail.
• A RAID ADG logical drive (with no online spare) fails when three physical disks fail.

At any given RAID level, the probability of logical drive failure increases as the number of physical disks in the logical drive increases. Figure A-1 provides quantitative information about logical drive failure. The data for this graph is calculated from the mean time between failures (MTBF) value for a typical physical disk, assuming that no online spares are present. If an online spare is added to a fault-tolerant RAID configuration, the probability of logical drive failure decreases.

Figure A-1 Relative Probability of Logical Drive Failure
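The RAID 1+0 statements in this appendix can be checked with a small counting argument. The Python sketch below is not the MTBF-based calculation behind Figure A-1. It assumes a simplified model in which a given number of disks has failed and every set of failed disks of that size is equally likely, with no online spare and no rebuild in progress. The function name `raid10_survival_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb


def raid10_survival_probability(num_disks, num_failed):
    """Probability that a RAID 1+0 logical drive survives num_failed failures.

    Simplified model (an assumption, not HP's method): every set of
    num_failed disks is equally likely to be the failed set. The drive
    survives when the failed disks all come from different mirrored pairs:
    choose which pairs are hit, then which disk of each pair has failed.
    """
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, four or more")
    if not 0 <= num_failed <= num_disks:
        raise ValueError("num_failed must be between 0 and num_disks")

    pairs = num_disks // 2
    if num_failed > pairs:
        return Fraction(0)  # more failures than pairs: some pair lost both disks
    surviving_sets = comb(pairs, num_failed) * 2 ** num_failed
    return Fraction(surviving_sets, comb(num_disks, num_failed))


if __name__ == "__main__":
    for disks in (4, 8, 16):
        loss = 1 - raid10_survival_probability(disks, 2)
        print(f"{disks} disks, 2 failed: probability of drive failure {loss}")
```

In this model, two failures cause logical drive failure with probability 1/(n-1): 1/3 for four disks, 1/7 for eight, and 1/15 for sixteen, which agrees with the statement that the risk decreases as the number of mirrored pairs increases. With eight disks and four failures, the survival probability drops to 8/35, which illustrates why the n/2 maximum is rarely reached in practice.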

Glossary

array: A set of physical disks configured into one or more logical drives. Arrayed disks have significant performance and data protection advantages over non-arrayed disks.

array capacity expansion: See capacity expansion.

Auto-Reliability Monitoring (ARM): Also known as surface analysis. A fault management feature that scans physical disks for bad sectors. Data in the faulty sectors remaps onto good sectors. Also checks parity data consistency for disks in RAID 5 or RAID ADG configurations. Operates as a background process.

Automatic Data Recovery: A process that automatically reconstructs data from a failed disk and writes it onto a replacement disk. Automatic Data Recovery time depends on several factors, but HP recommends that you allow at least 15 minutes per gigabyte. Also known as rebuild.

cache: A high-speed memory component, used to store data temporarily for rapid access.

capacity expansion: The addition of physical disks to an existing disk array, and redistribution of existing logical drives and data over the enlarged array. The size of the logical drives does not change. Also known as array capacity expansion.

capacity extension: The enlargement of a logical drive without disruption of data. There must be free space on the array before capacity extension can occur. If necessary, create free space by deleting a logical drive or by carrying out a capacity expansion. Also known as logical drive capacity extension.

data guarding: See RAID.

data striping: Writing data to logical drives in interleaved chunks (by byte or by sector). Data striping improves system performance by distributing data evenly across all physical disks in the array, but has no fault tolerance.

drive mirroring: Duplicating data from one disk onto a second disk. Mirroring provides fault tolerance, but can only recover from failure of one physical disk per mirrored pair.

Error Correction and Checking (ECC) memory: A type of memory that checks and corrects single-bit or multi-bit memory errors (depending on configuration) without causing the server to halt or to corrupt data.

fault tolerance: The ability of a server to recover from physical disk hardware problems without interrupting server performance or corrupting data. Hardware RAID is most commonly used, but there are other types of fault tolerance, including controller duplexing and software-based RAID.

hot spare: See online spare.

interim data recovery: If a disk fails in RAID 1, 1+0, 5 or ADG, the system still processes I/O requests, but at a reduced performance level.

logical drive: A group of physical disks, or part of a group, that behaves as one storage unit. Each constituent physical disk contributes the same storage volume to the total volume of the logical drive. A logical drive has performance advantages over individual physical disks. Also known as a logical volume.

logical drive capacity extension: See capacity extension.

online spare: A physical disk in a fault-tolerant system that normally contains no data. When any other disk in the array fails, the controller automatically rebuilds the data that was on the failed disk onto the online spare. Also known as a hot spare.

physical disk: A random-access storage device. In traditional non-arrayed storage, one physical disk typically contains a single logical drive. In RAID configurations, multiple disks are combined to form a single logical drive.

rebuild: See Automatic Data Recovery.

Redundant Array of Independent Disks (RAID): A form of fault-tolerant storage control. See Chapter 1: “Introduction to RAID Technology” (page 11).

Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.): Technology co-developed by HP and the physical disk industry that provides warning of imminent disk failure. The self-monitoring routines are customized for each specific disk type and have direct access to internal performance, calibration, and error measurements.

spare: See online spare.

striping: See data striping.

surface analysis: See Auto-Reliability Monitoring.


*J6369-90050*

Printed in the US