RAID Technology
Total Page:16
File Type:pdf, Size:1020Kb
RAID Technology Reference and Sources: y The most part of text in this guide has been taken from copyrighted document of Adaptec, Inc. on site (www.adaptec.com) y Perceptive Solutions, Inc. RAID stands for Redundant Array of Inexpensive (or sometimes "Independent") Disks. RAID is a method of combining several hard disk drives into one logical unit (two or more disks grouped together to appear as a single device to the host system). RAID technology was developed to address the fault-tolerance and performance limitations of conventional disk storage. It can offer fault tolerance and higher throughput levels than a single hard drive or group of independent hard drives. While arrays were once considered complex and relatively specialized storage solutions, today they are easy to use and essential for a broad spectrum of client/server applications. Redundant Arrays of Inexpensive Disks (RAID) "KILLS - BUGS - DEAD!" -- TV commercial for RAID bug spray There are many applications, particularly in a business environment, where there are needs beyond what can be fulfilled by a single hard disk, regardless of its size, performance or quality level. Many businesses can't afford to have their systems go down for even an hour in the event of a disk failure; they need large storage subsystems with capacities in the terabytes; and they want to be able to insulate themselves from hardware failures to any extent possible. Some people working with multimedia files need fast data transfer exceeding what current drives can deliver, without spending a fortune on specialty drives. These situations require that the traditional "one hard disk per system" model be set aside and a new system employed. This technique is called Redundant Arrays of Inexpensive Disks or RAID. ("Inexpensive" is sometimes replaced with "Independent", but the former term is the one that was used when the term "RAID" was first coined by the researchers at the University of California at Berkeley, who first investigated the use of multiple- drive arrays in 1987.) The fundamental principle behind RAID is the use of multiple hard disk drives in an array that behaves in most respects like a single large, fast one. There are a number of ways that this can be done, depending on the needs of the application, but in every case the use of multiple drives allows the resulting storage subsystem to exceed the capacity, data security, and performance of the drives that make up the system, to one extent or another. The tradeoffs--remember, there's no free lunch- -are usually in cost and complexity. Originally, RAID was almost exclusively the province of high-end business applications, due to the high cost of the hardware required. This has changed in recent years, and as "power users" of all sorts clamor for improved performance and better up-time, RAID is making its way from the "upper echelons" down to the mainstream. The recent proliferation of inexpensive RAID controllers that work with consumer-grade IDE/ATA drives--as opposed to expensive SCSI units-- has increased interest in RAID dramatically. This trend will probably continue. I predict that more and more motherboard manufacturers will begin offering support for the feature on their boards, and within a couple of years PC builders will start to offer systems with inexpensive RAID setups as standard configurations. This interest, combined with my long-time interest in this technology, is the reason for my recent expansion of the RAID coverage on this site from one page to 80. :^) Unfortunately, RAID in the computer context doesn't really kill bugs dead. It can, if properly implemented, "kill down-time dead", which is still pretty good. :^) History RAID technology was first defined by a group of computer scientists at the University of California at Berkeley in 1987. The scientists studied the possibility of using two or more disks to appear as a single device to the host system. Although the array's performance was better than that of large, single-disk storage systems, reliability was unacceptably low. To address this, the scientists proposed redundant architectures to provide ways of achieving storage fault tolerance. In addition to defining RAID levels 1 through 5, the scientists also studied data striping -- a non-redundant array configuration that distributes files across multiple disks in an array. Often known as RAID 0, this configuration actually provides no data protection. However, it does offer maximum throughput for some data-intensive applications such as desktop digital video production. The driving factors behind RAID A number of factors are responsible for the growing adoption of arrays for critical network storage. More and more organizations have created enterprise-wide networks to improve productivity and streamline information flow. While the distributed data stored on network servers provides substantial cost benefits, these savings can be quickly offset if information is frequently lost or becomes inaccessible. As today's applications create larger files, network storage needs have increased proportionately. In addition, accelerating CPU speeds have outstripped data transfer rates to storage media, creating bottlenecks in today's systems. RAID storage solutions overcome these challenges by providing a combination of outstanding data availability, extraordinary and highly scalable performance, high capacity, and recovery with no loss of data or interruption of user access. By integrating multiple drives into a single array -- which is viewed by the network operating system as a single disk drive -- organizations can create cost-effective, minicomputersized solutions of up to a terabyte or more of storage. Principles RAID combines two or more physical hard disks into a single logical unit using special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system would be unaware of the technical workings. For example, if one were to configure a hardware-based RAID-5 volume using three 250GB hard drives (two drives for data, and one for parity), the operating system would be presented with a single 500GB volume. Software solutions are typically implemented in the operating system and would present the RAID volume as a single drive to applications running within the operating system. There are three key concepts in RAID: mirroring, the writing of identical data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant ("parity") data is stored to allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID schemes use one or more of these techniques, depending on the system requirements. The purpose of using RAID is to improve reliability and availability of data, ensuring that important data is not harmed in case of hardware failure, and/or to increase the speed of file input/output. Mirroring Mirroring is one of the two data redundancy techniques used in RAID (the other being parity). In a RAID system using mirroring, all data in the system is written simultaneously to two hard disks instead of one; thus the "mirror" concept. The principle behind mirroring is that this 100% data redundancy provides full protection against the failure of either of the disks containing the duplicated data. Mirroring setups always require an even number of drives for obvious reasons. The chief advantage of mirroring is that it provides not only complete redundancy of data, but also reasonably fast recovery from a disk failure. Since all the data is on the second drive, it is ready to use if the first one fails. Mirroring also improves some forms of read performance (though it actually hurts write performance.) The chief disadvantage of RAID 1 is expense: that data duplication means half the space in the RAID is "wasted" so you must buy twice the capacity that you want to end up with in the array. Performance is also not as good as some RAID levels. Block diagram of a RAID mirroring configuration. The RAID controller duplicates the same information onto each of two hard disks. Note that the RAID controller is represented as a "logical black box" since its functions can be implemented in software, or several different types of hardware (integrated controller, bus-based add-in card, stand-alone RAID hardware.) Mirroring is used in RAID 1, as well as multiple-level RAID involving RAID 1 (RAID 01 or RAID 10). It is related in concept to duplexing. Very high-end mirroring solutions even include such fancy technologies as remote mirroring, where data is configured in a RAID 1 array with the pairs split between physical locations to protect against physical disaster! You won't typically find support for anything that fancy in a PC RAID card. :^) Duplexing Duplexing is an extension of mirroring that is based on the same principle as that technique. Like in mirroring, all data is duplicated onto two distinct physical hard drives. Duplexing goes one step beyond mirroring, however, in that it also duplicates the hardware that controls the two hard drives (or sets of hard drives). So if you were doing mirroring on two hard disks, they would both be connected to a single host adapter or RAID controller. If you were doing duplexing, one of the drives would be connected to one adapter and the other to a second adapter. Block diagram of a RAID duplexing configuration. Two controllers are used to send the same information to two different hard disks. The controllers are often regular host adapters or disk controllers with the mirroring done by the system. Contrast this diagram with the one for straight mirroring. Duplexing is superior to mirroring in terms of availability because it provides the same protection against drive failure that mirroring does, but also protects against the failure of either of the controllers.