PROUHD : RAID for the end-user.

Pierre Vignéras [email protected]

April 14, 2010

Abstract

RAID has still not been adopted by most end-users despite its inherent qualities such as performance and reliability. Reasons such as the complexity of RAID technology (levels, hard/soft), set-up, or support may be given. We believe the main reason is that most end-users own a vast amount of heterogeneous storage devices (USB sticks, IDE/SATA/SCSI internal/external hard drives, SD/XD cards, SSDs, ...), and that RAID-based systems are mostly designed for homogeneous (in size and technology) hard disks. Therefore, there is currently no storage solution that manages heterogeneous storage devices efficiently. In this article, we propose such a solution and we call it PROUHD (Pool of RAID Over User Heterogeneous Devices). This solution supports heterogeneous (in size and technology) storage devices, maximizes the use of the available storage space, is tolerant to device failure up to a customizable degree, still makes automatic addition, removal and replacement of storage devices possible, and remains performant in the face of an average end-user workflow. Although this article makes some references to Linux, the algorithms described are independent of the operating system and thus may be implemented on any of them.

Copyrights

This document is licensed under a Creative Commons Attribution-Share Alike 2.0 France License. Please see http://creativecommons.org/licenses/by-sa/2.0/ for details.

Disclaimer

The information contained in this document is for general information purposes only. The information is provided by Pierre Vignéras and, while I endeavor to keep the information up to date and correct, I make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the document or the information, products, services, or related graphics contained in the document for any purpose. Any reliance you place on such information is therefore strictly at your own risk. In no event will I be liable for any loss or damage including, without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this document. Through this document you are able to link to other documents which are not under the control of Pierre Vignéras. I have no control over the nature, content and availability of those sites. The inclusion of any links does not necessarily imply a recommendation or endorsement of the views expressed within them.


Table of contents

1 Introduction
2 Algorithm
  2.1 Description
  2.2 Analysis
  2.3 Implementation (layout-disks)
  2.4 Performance
3 Partitioning drives
4 Handling Disk Failure
  4.1 Replacement Procedure
    4.1.1 Replacing a failed device with a same-size one
    4.1.2 Replacing a failed device with a larger one
    4.1.3 Replacing a failed drive with a smaller one
    4.1.4 RAID array reconstruction
5 Adding/removing a device to/from a PROUHD
6 Forecasting: Storage Box for Average End-Users
7 Alternatives
8 Questions, Comments & Suggestions
9 Note
10 Acknowledgment


FIGURE 1 – Stacking storage devices (same size, ideal RAID case): four drives of 1 Tb each.

1 Introduction

Whereas RAID [1] has been massively adopted by the industry, it is still not common on end-users' desktops. The complexity of RAID systems might be one reason... among many others. Actually, in a state-of-the-art data center, the storage is designed according to some requirements (the "top-bottom" approach already discussed in a previous article [2]). Therefore, from a RAID perspective, the storage is usually composed of a pool of disks of the same size and characteristics, including spares [3]. The focus is often on performance. The global storage capacity is usually not a big deal.

The average end-user case is rather different in that their global storage capacity is composed of various storage devices such as:
– hard drives (internal IDE, internal/external SATA, external USB, external Firewire);
– USB sticks;
– flash memory such as SDCard, XDCard, ...;
– SSDs.
In contrast, performance is not a big deal for the end-user: most usage does not require very high throughput. Cost and capacity are the main factors, along with ease of use. By the way, the end-user does not usually have any spare devices.

We propose in this paper an algorithm for disk layout using (software) RAID that has the following characteristics:
– it supports heterogeneous storage devices (size and technology);
– it maximizes storage space;
– it is tolerant to device failure up to a certain degree that depends on the number of available devices and on the RAID level chosen;
– it still makes automatic addition, removal and replacement of storage devices possible under certain conditions;
– it remains performant in the face of an average end-user workflow.

2 Algorithm

2.1 Description

Conceptually, we first stack storage devices one over the other as shown in figure 1.

[1] For an introduction to RAID technology, please refer to online articles such as: http://en.wikipedia.org/wiki/Standard_RAID_levels
[2] http://www.vigneras.org/pierre/wp/2009/07/21/choosing-the-right-file-system-layout-under-linux/
[3] By the way, since similar disks may fail at similar times, it may be better to create storage pools from disks of different models or even vendors.


FIGURE 2 – Stacking storage devices (different sizes = usual end-user case): hda (1 Tb), hdb (2 Tb), hdc (1 Tb) and hdd (4 Tb).

In that example, with n = 4 devices, each of capacity c = 1 Tb (terabyte), we end up with a global storage capacity of G = n·c = 4 Tb. From that global storage space, using RAID, you can get:
– a 4 Tb (n·c) virtual storage device (called PV for Physical Volume [4] in the following) using RAID0 (level 0), but then you have no fault tolerance (if a physical device fails, the whole virtual device is lost);
– a 1 Tb (c) PV using RAID1; in that case, you have a fault tolerance degree of 3 (the PV remains valid in the face of 3 drive failures, and this is the maximum);
– a 3 Tb ((n−1)·c) PV using RAID5; in that case, you have a fault tolerance degree of 1;
– a 2 Tb (M·c) PV using RAID10; in that case, the fault tolerance degree is also 1 [5] (M is the number of mirrored sets, 2 in our case).

The previous example hardly represents a real (end-user) case. Figure 2 represents such a scenario, also with 4 disks (though the listed capacities do not represent common use cases, they ease mental capacity calculation for the algorithm description). In this case, we face n = 4 devices d, of respective capacities c_d: 1 Tb, 2 Tb, 1 Tb and 4 Tb. Hence the global storage capacity is:

G = Σ_d c_d = 1 + 2 + 1 + 4 = 8 Tb

Since a traditional RAID array requires devices of the same size, in that case the minimum device capacity is used: c_min = 1 Tb. Therefore, we can have:
– 4 Tb, using RAID0;
– 1 Tb, using RAID1;
– 3 Tb, using RAID5;
– 2 Tb, using RAID10.
Thus, exactly the same possibilities as in the previous example. The main difference, however, is the wasted storage space, defined as the storage space on each disk used neither for storage nor for fault tolerance [6]. In our example, the 1 Tb capacity of both devices hda and hdc is fortunately fully used. But only 1 Tb out of the 2 Tb of device hdb and 1 Tb out of the 4 Tb of device hdd is really used. Therefore, in this case, the wasted storage space is given by the formula:

W = Σ_d (c_d − c_min) = (1 − 1) + (2 − 1) + (1 − 1) + (4 − 1) = 4 Tb

In this example, W = 4 Tb out of G = 8 Tb, i.e. 50% of the global storage space is actually unused. For an end-user, such an amount of wasted space is definitely an argument against using RAID, despite all the other advantages RAID provides (flexibility for adding/removing devices, fault tolerance and performance).

[4] This comes from the LVM terminology, which is often used with RAID on Linux.
[5] This is the worst case and the one that should be taken into account. Of course, disks hda and hdc may fail, for example, and the PV will remain available, but the best case is not the one that represents the fault tolerance degree.
[6] Note that this is independent of the actual RAID level chosen: each byte in a RAID array is used, either for storage or for fault tolerance. In the example, using RAID1, we only get 1 Tb out of 8 Tb and it may look like a waste. But if RAID1 is chosen for such an array, it actually means that a fault tolerance degree of 3 is required. And such a fault tolerance degree has a storage cost!
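To make these figures concrete, here is a tiny Python sketch (a throwaway illustration, not part of any tool) that computes what a traditional same-size RAID setup yields on the heterogeneous capacities of figure 2: every member is clamped to the smallest capacity, which is exactly where the wasted space comes from.

# Traditional RAID over heterogeneous drives: every member is clamped to
# the smallest capacity c_min. Capacities in Tb, from the figure 2 example.
capacities = {"hda": 1, "hdb": 2, "hdc": 1, "hdd": 4}

n = len(capacities)
c_min = min(capacities.values())
G = sum(capacities.values())                        # global storage space
W = sum(c - c_min for c in capacities.values())     # wasted space

usable = {
    "RAID0": n * c_min,
    "RAID1": c_min,
    "RAID5": (n - 1) * c_min,
    "RAID10": (n // 2) * c_min,
}
print(f"G = {G} Tb, W = {W} Tb ({100 * W / G:.0f}% of the pool is wasted)")
for level, capacity in usable.items():
    print(f"{level}: {capacity} Tb")

Running it prints G = 8 Tb and W = 4 Tb (50%), with the same 4/1/3/2 Tb options listed above.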


FIGURE 3 – Illustration of the vertical RAID layout: partitions p1, p2, p3 across hda (1 Tb), hdc (1 Tb), hdb (2 Tb) and hdd (4 Tb); RAID arrays R1 and R2; physical volumes PV(1) and PV(2).

The algorithm we propose is very simple indeed. First, we sort the device list in ascending capacity order. Then, we partition each disk in such a way that an array with the maximum number of other partitions of the same size can be made. Figure 3 shows the process in our preceding example with 4 disks.

A first partition p1 is made on all disks. The size of that partition is the size of the first disk, hda, which is the minimum: 1 Tb in our case. Since the second disk in our sorted list, named hdc, is also of 1 Tb capacity, no room is available for making a new partition, so it is skipped. The next disk in our sorted list is hdb. Its capacity is 2 Tb. The first partition p1 already takes 1 Tb; another 1 Tb is available for partitioning and it becomes p2. Note that this other 1 Tb partition p2 is also made on each following disk in our sorted list. Therefore, our last device, hdd, already has 2 partitions: p1 and p2. Since it is the last disk, the remaining storage space (2 Tb) will be wasted. Now, a RAID array can be made from the partitions of the same size on different disks. In this case, we have the following choices:
– making a RAID array R1 using the 4 p1 partitions, we can get:
  – 4 Tb in RAID0;
  – 1 Tb in RAID1;
  – 3 Tb in RAID5;
  – 2 Tb in RAID10;
– making another array R2 using the 2 p2 partitions, we can get:
  – 2 Tb in RAID0;
  – 1 Tb in RAID1.

Therefore, we maximized the storage space we can get from multiple devices. Actually, we minimized the wasted space, which is given, with this algorithm, by the last partition of the last drive, in this case: W = 2 Tb. Only 25% of the global storage space is wasted, and this is the minimum we can get. Said otherwise, 75% of the global storage space is used either for storage or for fault tolerance, and this is the maximum we can get using RAID technology.

The amount of available storage space depends on the RAID level chosen for each PV built from the vertical partitions {p1, p2}. It can vary from 2 Tb {RAID1, RAID1} up to 6 Tb {RAID0, RAID0}. The maximum storage space available with a fault tolerance degree of 1 is 4 Tb {RAID5, RAID1}.
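As a concrete illustration, here is a minimal Python sketch of the partitioning step just described: sort by capacity, create one vertical slice per capacity increase, and replicate it on every remaining device. It is a simplified illustration, not the actual layout-disks code presented in section 2.3; the "usable" figure printed per slice follows the RAID5/RAID1 convention used in the analysis below.

# Vertical partitioning of heterogeneous devices: sort by capacity, then each
# capacity increase defines a new partition replicated on every remaining
# (larger or equal) device. Simplified sketch, not the layout-disks code.
def prouhd_layout(devices):
    """devices: dict name -> capacity (Tb); returns a list of (slice_size, member_names)."""
    ordered = sorted(devices, key=devices.get)      # ascending capacity order
    previous, layout = 0, []
    for rank, name in enumerate(ordered):
        size = devices[name] - previous
        if size > 0:                                 # equal-sized devices add no new slice
            layout.append((size, ordered[rank:]))    # every remaining device is large enough
        previous = devices[name]
    return layout

for size, members in prouhd_layout({"hda": 1, "hdb": 2, "hdc": 1, "hdd": 4}):
    if len(members) >= 2:
        level = "RAID5" if len(members) >= 3 else "RAID1"
        usable = (len(members) - 1) * size           # maximum safe capacity of that slice
        print(f"{size} Tb slice on {members}: {level}, {usable} Tb usable")
    else:
        print(f"{size} Tb slice on {members}: wasted (last device only)")

On the example it reports a 1 Tb slice on the 4 drives (3 Tb usable in RAID5), a 1 Tb slice on hdb and hdd (1 Tb in RAID1), and a wasted 2 Tb slice on hdd, exactly the layout of figure 3.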

2.2 Analysis

In this section, we give an analysis of our algorithm. We consider n storage devices of respective capacities c_i for i ∈ [1, n], where ∀i ∈ [1, n−1], c_i ≤ c_{i+1}. Said otherwise, the n drives are sorted by capacity in ascending order, as illustrated in figure 4. We also define c_0 = 0 for simplification purposes. We also define:


FIGURE 4 – Illustration of the general algorithm: devices sorted by capacity are sliced into partitions of sizes c_1, c_2 − c_1, ..., c_n − c_{n−1}; the maximum RAID array capacity of each slice is (c_i − c_{i−1})·(n−i), down to 0 for the last slice.

– the global storage space:

G(n) = Σ_{i=1}^{n} c_i = c_1 + c_2 + ... + c_n

naturally, we also define G(0) = 0 (no device gives no storage);
– the wasted storage space W(n) = c_n − c_{n−1}; we also define W(0) = 0 (no device gives no waste); note anyway that W(1) = c_1 (with only one device you cannot make any RAID array and therefore the wasted space is maximum!);
– the maximum (safe) available storage space (using RAID5 [7]):

C_max(n) = c_1·(n−1) + (c_2 − c_1)·(n−2) + ... + (c_{n−1} − c_{n−2})·1
         = Σ_{i=1}^{n−1} (c_i − c_{i−1})·(n−i)
         = Σ_{i=1}^{n−1} W(i)·(n−i)

we also define C_max(0) = 0 and C_max(1) = 0 (you need at least 2 drives to make a RAID array);
– the lost storage space, defined as P(n) = G(n) − C_max(n) = W(n) + c_{n−1} = c_n; it represents the amount of space not used for storage (it includes both the space used for fault tolerance and the wasted space); note that P(0) = 0 and that P(1) = c_1 = W(1) (with one drive, the wasted space is maximum, and is equal to the lost space).

We also have C_max(n) = G(n) − c_n = G(n−1): the maximum storage space at level n is the global storage space at the previous level n−1. By the way, when a new storage device is added, with a capacity of c_{n+1}, we have:
– the new global storage space: G(n+1) = G(n) + c_{n+1};
– the new maximum available storage space: C_max(n+1) = C_max(n) + c_n;
– the new wasted space: W(n+1) = c_{n+1} − c_n;
– the new lost space: P(n+1) = c_{n+1}.

[7] From the available storage space point of view, RAID5 consumes one partition for fault tolerance. When only 2 partitions are available, RAID1 is the only option with fault tolerance, and it also consumes one partition for that purpose. Therefore, from a maximum available storage space perspective, a 2-device RAID1 array is considered a RAID5 array.
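These quantities are straightforward to compute. The short Python sketch below (an illustration only, capacities in Tb) evaluates G(n), W(n), C_max(n) and P(n) from a capacity list and checks them against the figure 2 example, together with the identity C_max(n) = G(n−1).

# G(n), W(n), C_max(n) and P(n) for a list of capacities (Tb), following the
# definitions above. Illustrative sketch only.
def prouhd_metrics(capacities):
    c = sorted(capacities)
    n = len(c)
    G = sum(c)                                       # global storage space
    W = c[-1] - (c[-2] if n > 1 else 0)              # wasted space c_n - c_{n-1}
    diffs = [c[0]] + [c[i] - c[i - 1] for i in range(1, n)]
    C_max = sum(diffs[i] * (n - 1 - i) for i in range(n - 1))
    P = G - C_max                                    # lost space (equals c_n)
    return G, W, C_max, P

G, W, C_max, P = prouhd_metrics([1, 2, 1, 4])
print(G, W, C_max, P)                                # 8 2 4 4, as in the example
assert C_max == sum(sorted([1, 2, 1, 4])[:-1])       # C_max(n) = G(n-1)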


FIGURE 5 – Graphical representation of quantities P(n), W(n) and C_max(n) for the traditional RAID algorithm (left) and the PROUHD algorithm (right).

When a new storage device bigger than any other in the configuration is added, the maximum available storage space is increased by an amount equal to the last device in the previous configuration (without the new device). Moreover, the new lost space is exactly equal to the size of that new device. As a conclusion, purchasing a much bigger device than the last one in the configuration is not a big win in the first place, since it mainly increases the wasted space! That wasted space will be used when a new drive of a higher capacity gets introduced.

You may compare our algorithm with the usual RAID layout (i.e. using the same device size c_min = c_1) on the same set of devices:
– the global storage space remains unchanged: G'(n) = Σ_{i=1}^{n} c_i;
– the maximum storage becomes: C'_max(n) = c_1·(n−1);
– the wasted space becomes:

W'(n) = Σ_{i=2}^{n} (c_i − c_1) = G'(n) − n·c_1

– the lost space becomes: P'(n) = G'(n) − C'_max(n) = c_1 + W'(n).
When a new device of capacity c_{n+1} is added to the device set, we get:
– C'_max(n+1) = c_1·n = C'_max(n) + c_1 (the available storage space is increased by c_1 only);
– W'(n+1) = W'(n) + (c_{n+1} − c_1) (whereas the wasted space is increased by (c_{n+1} − c_1));
– P'(n+1) = W'(n) + c_{n+1} = P'(n) + (c_{n+1} − c_1) (and the lost space is increased by the same amount).

As seen formally, the traditional algorithm is very weak in its handling of heterogeneous storage device sizes. When you add a new device of higher capacity to the configuration, you increase both the wasted space and the lost space by an amount that is the difference in size between that new device and the first one. Figure 5 gives a graphical comparison of P(n), W(n) and C_max(n) over the whole set of devices, for the traditional RAID algorithm (left) and for PROUHD (right).

By the way, formally, since c_n > c_{n−1} > c_1, it is clear that c_n − c_1 > c_n − c_{n−1} > 0. Thus,

(c_n − c_1) + (c_{n−1} − c_1) + ... + (c_2 − c_1) = Σ_{i=2}^{n} (c_i − c_1) = W'(n) > c_n − c_{n−1} = W(n)

Therefore, the heterogeneous algorithm always gives a better result in terms of wasted space, as expected. It can easily be shown that the heterogeneous algorithm also systematically gives a better result for the lost space: P(n) < P'(n).

On the opposite, our algorithm can be seen as an extension of the traditional layout where all devices are of the same size. This translates formally to c_i = c_1, ∀i ∈ [1, n], and we have:
– a global storage space of: G(n) = Σ_{i=1}^{n} c_i = n·c_1;
– a maximum storage space of: C_max(n) = (n−1)·c_1 (RAID5);

7 2.3 Implementation (layout-disks) 2 ALGORITHM

– a wasted space of: W(n) = Σ_{i=2}^{n} (c_i − c_1) = 0;
– a lost space of: P(n) = G(n) − C_max(n) = c_1.
And we get back to what we are used to, where only one disk is lost for n drives of the same size (using RAID5).
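As a quick numerical check on the example of figure 2 (capacities 1, 1, 2 and 4 Tb): the traditional layout gives W'(4) = (1−1) + (2−1) + (4−1) = 4 Tb and P'(4) = c_1 + W'(4) = 5 Tb, whereas PROUHD gives W(4) = 4 − 2 = 2 Tb and P(4) = c_4 = 4 Tb, in line with the inequalities above.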

2.3 Implementation (layout-disks)

We propose an open-source Python software, called layout-disks and available at http://www.sf.net/layout-disks, that, given a list of device labels and sizes, returns the possible layout using this algorithm. As an example, with the 4 disks taken from figure 3, the software proposes the following:

$ layout-disks hda:1 hdb:2 hdc:1 hdd:4
From ['hda', 'hdc', 'hdb', 'hdd'] create a partition of 1.0 to get:
Size | RAID Level | Tolerancy | Storage Efficiency (%)
 1.0 | RAID1      | 3         | 25.0
 3.0 | RAID5      | 1         | 75.0
 2.0 | RAID10     | 1         | 50.0
From ['hdb', 'hdd'] create a partition of 1.0 to get:
Size | RAID Level | Tolerancy | Storage Efficiency (%)
 1.0 | RAID1      | 1         | 50.0
-- Global overview --
-- Global storage space   : G = 8.0
-- Maximum (safe) storage : C_max = 4.0 ( 50.0 %)
-- Wasted storage space   : W = 2.0 ( 25.0 %)
-- Lost storage space     : P = 4.0 ( 50.0 %)
Enjoy! ;-)
$

The software tells us that, from the first partition of each of the 4 drives, several RAID level options are available (from RAID1 up to RAID5) [8]. From the second partition, which exists only on devices hdb and hdd, only RAID1 is available.

2.4 Performance

From a performance point of view, this layout is definitely not optimal for every usage. Traditionally, in the enterprise case, two different virtual RAID devices map to different physical storage devices. Here, on the contrary, distinct PROUHD devices share some of their physical storage devices. If no care is taken, this can lead to very poor performance, as any request made to a PROUHD device may be queued by the kernel until other requests made to other PROUHD devices have been served. Note however that this is not different from the single disk case, except from a strict performance point of view: the throughput of a RAID array (especially on reads) may well outperform the throughput of a single disk thanks to parallelism.

For most end-user cases, this layout is perfectly fine from a performance point of view, especially for storing multimedia files such as photo, audio or video files where, most of the time, files are written once and read multiple times, sequentially. A file server with such a PROUHD disk layout will easily serve multiple end-user clients simultaneously. Such a layout may also be used for backup storage. The only reason such a configuration should not be used is where you have strong performance requirements. On the other side, if your main concern is storage space management, such a configuration is very sound.

By the way, you may combine such a layout with the Logical Volume Manager (LVM). For example, if your main concern is storage space with a tolerance level of 1, you may combine the 3.0 Tb RAID5 region with the 1.0 Tb RAID1 region of the previous example in a volume group, resulting in a virtual device of 4.0 Tb, from which you can define logical volumes (LV) at will. The advantage of such a combined RAID/LVM layout versus a strict LVM layout (without any RAID array in between) is that you can benefit from the advantages of RAID levels (all levels 0, 1, 5, 10, 50, or 6), whereas LVM provides, as far as I know, a "poor" (compared to RAID) mirroring and striping implementation.

[8] RAID0 is only presented if the option --unsafe is specified. RAID6 and other RAID levels are not currently implemented. Any help is welcome! ;-)

By the way, note that specifying mirror or stripe options at logical volume creation will not give the expected performance and/or tolerance improvement, since the physical volumes are (already) RAID arrays sharing real physical devices.

SSD special case. Our solution makes good use of the available storage space at the expense of a raw performance penalty in some cases: when concurrent accesses are made to distinct RAID arrays sharing the same physical devices. Concurrent accesses usually imply random access to data. Hard drives have a hard limit on their I/O throughput with random access patterns due to their mechanical constraints: after the data has been located, the reading (or writing) head must seek to the correct cylinder and wait until the correct sector passes under it thanks to platter rotation. Obviously, reading from or writing to hard disks is mainly a sequential process. A read/write request is pushed onto a queue (in software or in hardware), and it just has to wait for the previous ones. Of course, many improvements were made to speed up the reading/writing process (for example, using buffers and caches, smart queue management, bulk operations, data locality computation, among others), but the performance of hard drives is physically limited anyhow, especially on random accesses. In some ways, this random (concurrent) access problem is the reason why RAID was introduced in the first place.

SSDs are very different from hard disks. In particular, they do not have such mechanical constraints. They handle random accesses much better than hard disks. Therefore, the performance penalty of PROUHD discussed above may not be so true with SSDs. Concurrent accesses made to distinct RAID arrays sharing physical SSDs will result in several requests with a random access pattern made to each underlying SSD. But as we have seen, SSDs handle random requests quite well. Some investigations should be made to compare the performance of PROUHD over hard disks versus PROUHD over SSDs. Any help in this regard will be appreciated.

3 Partitioning drives

PROUHD requires that storage devices are properly partitioned into slices of the same size. Depending on the number of different-sized storage devices, the algorithm may lead to the creation of a vast number of partitions on each device. Fortunately, it is not required to use primary partitions, which are limited to 4 by the PC BIOS for legacy reasons. Logical partitions can be used in order to create all the required slices: there is almost no limit to their number. On the other side, if you need partitions of more than 2 TB, then logical partitions are no longer an option. For this specific case (partition size of more than 2 TB), the GUID Partition Table (GPT) might be an option. As far as I know, only parted [9] supports it.

It might be tempting to use LVM for partitioning purposes. If this is a perfect choice in the usual case of partitioning, I would not recommend it for PROUHD anyway. Actually, the other way round is the good option: RAID arrays are a perfect choice for LVM Physical Volumes (PV). I mean, each RAID array becomes a PV. From some PVs, you create a Volume Group (VG). From those VGs, you create Logical Volumes (LV) that you finally format and mount into your filesystem. Therefore, the chain of layers is as follows: Device -> RAID -> PV -> VG -> LV -> FS. If you use LVM for partitioning drives, you end up with a huge number of layers that kill performance (probably) and design: Device -> PV -> VG -> LV -> RAID -> PV -> VG -> LV -> FS. Honestly, I have not tested such a complex configuration. I would be interested in feedback though. ;-)

4 Handling Disk Failure

Of course, any disk will fail, one day or another. The later, the better. But planning disk replacement is not something that can be postponed until failure; failure usually does not happen at a good time (Murphy's law!). Thanks to RAID (for level 1 and above), a disk failure does not prevent the whole system from working normally. This is a problem, since you may not even notice that something went wrong. Again, if nothing is planned, you will discover it the hard way, when a second disk actually fails and you have no way to recover your RAID arrays.

[9] See http://www.gnu.org/software/parted/index.shtml

The first thing to do is to monitor your storage devices. You have (at least) 2 tools for that purpose:

smartmontools: SMART is a standard implemented in most IDE and SATA drives that monitors the health of a disk, performs some tests (online and offline), and can send reports by email, especially when one or many tests went wrong. Note that SMART does not give any guarantee that it will anticipate failure, nor that its failure forecasts are accurate. Anyway, when SMART tells you that something is wrong, it is better to plan for a disk replacement very soon. By the way, in such a case, do not stop the drive unless you have a spare; drives usually dislike being re-started, especially after such forecasted failures. Configuring smartmontools is quite simple: install the software and look at the file smartd.conf, usually in /etc.

mdadm: mdadm is the Linux tool for (software) RAID management. When something happens to a RAID array, an email can be sent. See the file mdadm.conf, usually in /etc, for details.

In traditional RAID, when one device from a RAID array fails, the array is in a so-called "degraded" mode. In such a mode, the array is still working, data remains accessible, but the whole system may suffer a performance penalty. When you replace the faulty device, the array is reconstructed. Depending on the RAID level, this operation is either very simple (mirroring requires only a single copy) or very complex (RAID5 and 6 require parity computation). In either case, the time required to complete this reconstruction is usually quite long (depending on the array size). But the system is normally able to perform this operation online. It can even limit the overhead as much as possible when the RAID array is serving clients. Note that RAID5 and RAID6 levels can put significant stress on a file server during array reconstruction.

In the case of PROUHD, the effect on the whole system is worse, since one drive failure impacts many RAID arrays. Traditionally, degraded RAID arrays can all get reconstructed at the same time. The main point is to reduce the time spent in degraded mode, minimizing the probability of data loss globally (the more time spent in degraded mode, the more probable data loss becomes). But parallel reconstruction is not a good idea in the PROUHD case, because the RAID arrays share storage devices. Therefore, any reconstruction impacts all arrays. Parallel reconstructions will just put more stress on all storage devices, and thus the global reconstruction will probably not finish sooner than a simpler sequential one. Fortunately, Linux mdadm is smart enough to prevent parallel reconstructions of arrays that share the same storage devices, as shown by the following example:

Sep 6 00:57:02 phobos kernel: md: syncing RAID array md0
Sep 6 00:57:02 phobos kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 6 00:57:02 phobos kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 6 00:57:02 phobos kernel: md: using 128k window, over a total of 96256 blocks.
Sep 6 00:57:02 phobos kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 6 00:57:02 phobos kernel: md: syncing RAID array md2
Sep 6 00:57:02 phobos kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 6 00:57:02 phobos kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 6 00:57:02 phobos kernel: md: using 128k window, over a total of 625137152 blocks.
Sep 6 00:57:02 phobos kernel: md: delaying resync of md3 until md2 has finished resync (they share one or more physical units)
Sep 6 00:57:02 phobos kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 6 00:57:02 phobos kernel: md: delaying resync of md4 until md2 has finished resync (they share one or more physical units)
Sep 6 00:57:02 phobos kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)


Sep 6 00:57:02 phobos kernel: md: delaying resync of md3 until md4 has finished resync (they share one or more physical units)
Sep 6 00:57:25 phobos kernel: md: md0: sync done.
Sep 6 00:57:26 phobos kernel: md: delaying resync of md3 until md4 has finished resync (they share one or more physical units)
Sep 6 00:57:26 phobos kernel: md: syncing RAID array md1
Sep 6 00:57:26 phobos kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 6 00:57:26 phobos kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 6 00:57:26 phobos kernel: md: using 128k window, over a total of 2016064 blocks.
Sep 6 00:57:26 phobos kernel: md: delaying resync of md4 until md2 has finished resync (they share one or more physical units)
Sep 6 00:57:26 phobos kernel: RAID1 conf printout:
Sep 6 00:57:26 phobos kernel: wd:2 rd:2

Therefore, we can rely on mdadm to do the right thing with RAID, whether the configuration is homogeneous, heterogeneous, or a combination of both.

4.1 Replacement Procedure

4.1.1 Replacing a failed device with a same-size one.

This is the ideal situation and it mostly follows the traditional RAID approach, except that you now have more than one RAID array to manage for each device. Let's take our example (figure 6, left), and let's suppose that a failure has been detected on hdb. Note that a failure may have been detected locally on hdb2, and not on hdb1 for example. Anyway, the whole disk will have to be replaced and therefore all arrays are concerned. In our example, we have set up the storage with the following PROUHD configuration:

/dev/md0: hda1, hdb1, hdc1, hdd1 (RAID5, (4-1)*1Tb = 3 Tb)
/dev/md1: hdb2, hdd2 (RAID1, (2*1Tb)/2 = 1 Tb)

1. Logically remove each faulty device partition from its corresponding RAID array:
mdadm /dev/md0 --faulty /dev/hdb1 --remove /dev/hdb1
mdadm /dev/md1 --faulty /dev/hdb2 --remove /dev/hdb2
2. Physically remove the faulty device; unless you have a hot-plug system such as USB, you will have to power off the whole system;
3. Physically add a new device; unless you have a hot-plug system such as USB, you will have to power on the whole system;
4. Partition the new device (let's say /dev/sda) with the exact same layout as the failed device: 2 partitions of 1 Tb each, /dev/sda1 and /dev/sda2;
5. Logically add each new partition to its corresponding RAID array:
mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md1 --add /dev/sda2

After a while, all your RAID arrays will get reconstructed.

4.1.2 Replacing a failed device with a larger one.

This case is not so simple indeed. The main issue is that the whole new layout is not at all related to the old one. Let's take the previous example, and see what happens if /dev/hdb fails. If we replace that 2 Tb device with a new 3 Tb device, we should end up with the layout of figure 6 (right). Notice that partition p2 is now of 2 Tb and not of 1 Tb as was previously the case (see figure 3). This means that the previous RAID array made from /dev/hdb2 (1 Tb) and /dev/hdd2 (1 Tb) is no longer relevant after the replacement: it does not appear in the layout algorithm. Instead, we have a RAID array made of /dev/sda2 (2 Tb) and /dev/hdd2 (2 Tb).


FIGURE 6 – Replacing a failed device by a larger one. Layout before (left: hda 1 Tb, hdc 1 Tb, hdb 2 Tb, hdd 4 Tb) and after (right: hda 1 Tb, hdc 1 Tb, sda 3 Tb, hdd 4 Tb) the replacement of /dev/hdb (2 Tb) with /dev/sda (3 Tb).

FIGURE 7 – Replacing a failed device (f) by a larger one (k), general case before (left) and after (right).


In the general case, as shown on figure 7, the last partition of the failed device f is no longer relevant. Therefore, the whole RAID array labeled R_f, of size (c_f − c_{f−1})·(n − f) and made from the partitions p_f of devices f, f+1, ..., n, should be removed. The following array, R_{f+1}, which was made from the last partition of the following disk, f+1, should be resized according to the new layout. Partitions p_{f+1} had a size of c_{f+1} − c_f. These partitions can now be "merged" since there is no "in-between" c_{f−1} and c_{f+1}. Therefore, the new "merged" partitions become p'_f, with a size of c_{f+1} − c_{f−1}.

Finally, the new device is inserted between the devices at ranks k and k+1, because its capacity c'_k is such that c_k ≤ c'_k < c_{k+1}. (Note that all devices i, f < i ≤ k, will shift to rank i−1 because the new device is added after the failed device f.) The new device should be partitioned so that all partitions from 1 up to f−1 are of the same size as in the previous layout: p'_i = p_i, ∀i ∈ [1, f−1]. The size of partition f is given by p'_f = c_{f+1} − c_{f−1}, as we have seen previously. Then, all following partitions, up to partition k−1, are of the same size as in the old layout: p'_i = p_{i+1}, ∀i ∈ [f+1, k−1]. The new device adds its own modification to the new layout according to the difference between its size c'_k and the size of the previous device c'_{k−1}, which is device k in the old layout (c'_{k−1} = c_k). Therefore, in the new layout, partition k has a size given by p'_k = c'_k − c_k. Finally, the next partition should be modified: it was previously of size c_{k+1} − c_k, but this is no longer relevant in the new layout. It should be reduced to p'_{k+1} = c_{k+1} − c'_k. The following partitions should not be changed. Note that the new device replaces the failed partitions p_i, i ∈ [1, f], of the failed device, but adds 1 more partition to the RAID arrays R'_i, i ∈ [f+1, k−1]. We note dev(R_i) the number of partitions that make up RAID array R_i. Therefore, we have dev(R'_i) = dev(R_{i+1}) + 1. Fortunately, it is possible to grow a RAID array under Linux thanks to the great mdadm --grow command.

In summary, the old layout p_1, p_2, ..., p_f, ..., p_k, ..., p_n becomes the new layout p'_1, p'_2, ..., p'_f, ..., p'_k, ..., p'_n, with:

p'_i = p_i, ∀i ∈ [1, f−1]
p'_f = c_{f+1} − c_{f−1} = c'_f − c'_{f−1}
p'_i = p_{i+1}, ∀i ∈ [f+1, k−1]
p'_k = c'_k − c_k = c'_k − c'_{k−1}
p'_{k+1} = c_{k+1} − c'_k = c'_{k+1} − c'_k
p'_i = p_i, ∀i ∈ [k+2, n]
dev(R'_i) = dev(R_{i+1}) + 1, ∀i ∈ [f+1, k−1]

As we see, replacing a faulty device by a larger one leads to quite a lot of modifications. Fortunately, they are somewhat local: in a large set of devices, modifications happen only to a bounded number of devices and partitions. Anyway, the whole operation is obviously very time consuming and error prone if done without proper tools. Hopefully, the whole process can be automated. The algorithm presented below uses LVM advanced volume management. It supposes that the RAID arrays are physical volumes belonging to some volume groups (VG), from which logical volumes (LV) are created for the making of filesystems. As such, we note PV(i) the LVM physical volume backed by RAID array R_i.

We suppose disk f is dead. We thus have f degraded RAID arrays and n − (f+1) safe RAID arrays. An automatic replacement procedure is defined step by step below.
1. Backup your data (this should be obvious: we are playing with degraded arrays since one disk is out of order, therefore any mistake will eventually lead to data loss!). For that purpose, you may use any storage space available that does not belong to the failed disk. The next RAID arrays in the layout are fine for example.
2. Mark all partitions p_i, ∀i ∈ [1, f], of the broken device as faulty in their corresponding RAID arrays R_i and remove them (mdadm --fail --remove).


3. Remove the failed storage device f.
4. Insert the new storage device k.
5. Partition the new device k according to the new layout (fdisk). In particular, the last failed-device partition and the last new-device partition should have the correct sizes: p'_f = c_{f+1} − c_{f−1} = c'_f − c'_{f−1} and p'_k = c'_k − c_k = c'_k − c'_{k−1}. At that stage, we will still have f degraded arrays: R_i, ∀i ∈ [1, f].
6. Replace the failed partitions by adding each new device partition p'_i, i ∈ [1, f−1], to its corresponding array R'_i (mdadm --add). After this step, only R'_f is a degraded RAID array.
7. Remove PV(f) and PV(f+1) from their corresponding VG (pvmove). LVM will handle that situation quite well, but it requires enough free space in the VG (and time!). It will actually copy the data to other PVs in the (same) VG.

8. Stop both RAID arrays R_f and R_{f+1} corresponding to PV(f) and PV(f+1) (mdadm stop).
9. Merge (fdisk) partitions f and f+1 into one single partition p'_f = c_{f+1} − c_{f−1} = c'_f − c'_{f−1}. This should work fine, since other partitions are not impacted. It should be done on each device following the failed device f: that is, n − f storage devices in total (device k was already partitioned in step 5).
10. Create a new RAID array R'_f from the merged partition p'_f (mdadm create).
11. Create the corresponding PV(f) (pvcreate), and add it to the previous VG (vgextend). At that step, we are back to a safe global storage space: all RAID arrays are now safe. But the layout is not optimal: partitions p'_i, i ∈ [f+1, k], are still unused for example.
12. Remove PV(k+1) from its corresponding VG (pvmove). Again, you will need some available storage space.
13. Stop the corresponding RAID array (mdadm stop).

14. Split the old partition p_{k+1} = c_{k+1} − c_k into a new p'_k = c'_k − c'_{k−1} and a new p'_{k+1} = c'_{k+1} − c'_k (fdisk); this should be done on each device following k, that is, n − k devices in total. This should not cause any problem, as the other partitions are not impacted.

15. Create two new RAID arrays R'_k and R'_{k+1} from those 2 new partitions p'_k and p'_{k+1} (mdadm create).
16. Create PV'(k) and PV'(k+1) accordingly (pvcreate). Insert them back into the VG (vgextend).
17. Finally, add each new device partition p'_i, i ∈ [f+1, k−1], to its corresponding RAID array R'_i. You will have to grow the RAID arrays R'_i so that dev(R'_i) = dev(R_{i+1}) + 1, i ∈ [f+1, k−1] (mdadm grow).
18. We are back with the new correct layout, with n safe RAID arrays.

Note that this process focuses on the end-user: it makes the replacement as convenient as possible, preventing the user from a long wait between the removal of the failed device and its replacement by the new one. All is done at the beginning. Of course, the time required before the whole pool of RAID arrays runs non-degraded can be quite long. But it is somewhat transparent from the end-user point of view.
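The bookkeeping above can be cross-checked mechanically: recomputing the vertical slices from the new capacity list reproduces the p'_i of the summary. The short Python sketch below does that for the example of figure 6, reusing the prouhd_layout() helper sketched in section 2.1; it only illustrates the arithmetic and is not a replacement for the step-by-step procedure.

# Layout before and after replacing the failed 2 Tb hdb by a 3 Tb sda
# (figure 6), recomputed with the prouhd_layout() sketch from section 2.1.
before = prouhd_layout({"hda": 1, "hdc": 1, "hdb": 2, "hdd": 4})
after = prouhd_layout({"hda": 1, "hdc": 1, "sda": 3, "hdd": 4})
for label, layout in (("before", before), ("after", after)):
    print(label)
    for size, members in layout:
        print(f"  {size} Tb slice on {len(members)} device(s): {', '.join(members)}")

The last single-member slice is the wasted space on the largest drive; the middle array goes from two 1 Tb members to two 2 Tb members, which is exactly the resized array described above.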

4.1.3 Replacing a failed drive with a smaller one

This case is the worst one, for two reasons. First, the global capacity is obviously reduced: G'(n) < G(n). Second, since some bytes of the failed larger drive were used for fault tolerance [10], some of those bytes are no longer present in the new device. This will have quite a consequence on the practical algorithm, as we will see.

When a device f fails, all RAID arrays R_i, where i ≤ f, become degraded. When we replace the failed device f by a new device k+1, where k+1 < f and c_k ≤ c'_{k+1} < c_{k+1} ≤ c_f, then RAID arrays R'_i, i ≤ k, become repaired, but RAID arrays R'_i, k+2 ≤ i ≤ f, remain degraded (see figure 8) because there is not enough storage space in the new device for taking over the failed ones. (Note that all devices i, k+1 ≤ i ≤ f−1, will shift to rank i+1 because the new device is added before the failed device f.)

As in the previous case, the solution requires merging partitions p_f and p_{f+1}, since there is no longer any device of capacity c_f. Hence, p'_{f+1} = c_{f+1} − c_{f−1} = c'_{f+1} − c'_f on all devices i ≥ f+1. Also, the new device k+1 should be partitioned correctly. In particular, its last partition is p'_{k+1} = c'_{k+1} − c_k = c'_{k+1} − c'_k.

[10] Unless RAID0 was used, but in that case, the situation is even worse!


FIGURE 8 – Replacing a failed device (f) by a smaller one (k), general case before (left) and after (right).

Devices i ≥ k+2 should change their partitioning according to the new partition p'_{k+1}. For those devices, partition p'_{k+2} should also be changed: p'_{k+2} = c_{k+1} − c'_{k+1} = c'_{k+2} − c'_{k+1}. The most important modifications concern all RAID arrays R'_i, k+2 ≤ i ≤ f, since they are still degraded. For all of them, the number of (virtual) devices should be decreased by one: for example, R_{k+2} was made of (n − k − 1) "vertical" partitions p_{k+2}, from device k+2 up to device n, since device f was wide enough to support a partition p_{k+2}. This is no longer the case for R'_{k+2}, since the new device does not provide sufficient storage space to support such a partition. Therefore, dev(R'_i) = dev(R_{i−1}) − 1, k+2 ≤ i ≤ f.

In summary, the old layout p_1, p_2, ..., p_k, ..., p_f, ..., p_n becomes the new layout p'_1, p'_2, ..., p'_k, ..., p'_f, ..., p'_n, with:

p'_i = p_i, ∀i ∈ [1, k]
p'_{k+1} = c'_{k+1} − c_k = c'_{k+1} − c'_k
p'_{k+2} = c_{k+1} − c'_{k+1} = c'_{k+2} − c'_{k+1}
p'_i = p_{i−1}, ∀i ∈ [k+3, f]
p'_{f+1} = c_{f+1} − c_{f−1} = c'_{f+1} − c'_f
p'_i = p_i, ∀i ∈ [f+2, n]
dev(R'_i) = dev(R_{i−1}) − 1, ∀i ∈ [k+2, f]

Unfortunately, as far as we know, it is not (currently) possible to shrink a RAID device using Linux RAID. The only option is to remove the whole set of arrays R_{i−1}, ∀i ∈ [k+2, f], entirely, and to create new ones with the correct number of devices. An automatic replacement procedure is therefore defined step by step below:
1. Backup your data! ;-)
2. Mark all partitions p_i, ∀i ∈ [1, f], of the broken device as faulty in their corresponding RAID arrays R_i and remove them (mdadm --fail --remove).
3. Remove the failed storage device f.


4. Insert the new storage device k+1.
5. Partition the new device according to the new layout (fdisk). In particular, its last partition should have the correct size: p'_{k+1} = c'_{k+1} − c_k = c'_{k+1} − c'_k. At that stage, we still have f degraded RAID arrays: R_i, ∀i ∈ [1, f].
6. Replace the faulty partitions by adding the new device partitions p'_i to their respective arrays R'_i, ∀i ∈ [1, k]. After this step, R_i, ∀i ∈ [k+1, f], are still old degraded arrays, that is f − k RAID arrays in total. Two RAID arrays are still made of wrong-sized partitions: R_{k+1} and R_{f+1}.
7. For each array R'_i, ∀i ∈ [k+3, f]:

(a) Move the data corresponding to R'_i to other devices (pvmove on the related LVM volume PV'(i));

(b) Remove the corresponding LVM volume PV'(i) from its volume group VG' (pvremove);

(c) Stop the related array R'_i (mdadm stop);

(d) Create a new RAID array R'_i from partition p'_i. Note that there is now one less partition in R'_i: dev(R'_i) = dev(R_{i−1}) − 1;
(e) Create the corresponding LVM volume PV'(i) (pvcreate);

(f) Add that new LVM volume to its related volume group VG'.

8. At this step, R_{k+1} and R_{f+1} are still made of the wrong-sized old partitions p_{k+1} and p_{f+1}.
9. Move the data corresponding to R_{f+1} to other devices (pvmove on the related LVM volume PV(f+1));
10. Remove the corresponding LVM volume PV(f+1) from its volume group VG (pvremove);
11. Stop the related array R_{f+1} (mdadm stop);
12. Merge (fdisk) the old partitions p_f and p_{f+1} into one single partition p'_{f+1} = c_{f+1} − c_{f−1} = c'_{f+1} − c'_f. This should work fine, since other partitions are not impacted. It should be done on each device following the failed device f: that is, n − f storage devices in total.
13. Create a new RAID array R'_{f+1} from the merged partition p'_{f+1} (mdadm create).
14. Create the corresponding PV'(f+1) (pvcreate), and add it to the previous VG (vgextend). At that step, only R_{k+1} remains wrong and degraded.
15. Move the data corresponding to R_{k+1} to other devices (pvmove on the related LVM volume PV(k+1)).
16. Remove the corresponding LVM volume PV(k+1) from its volume group VG (pvremove);
17. Stop the related array R_{k+1} (mdadm stop);
18. Split (fdisk) the old partitions p_{k+1} into new partitions p'_{k+1} = c'_{k+1} − c_k = c'_{k+1} − c'_k and p'_{k+2} = c_{k+1} − c'_{k+1} = c'_{k+2} − c'_{k+1}. This should be done on all following devices, that is, n − k − 1 devices in total.
19. Create (mdadm --create) new RAID arrays R'_{k+1} and R'_{k+2} from partitions p'_{k+1} and p'_{k+2};
20. Create (pvcreate) the corresponding PV'(k+1) and PV'(k+2) and add (vgextend) them to their corresponding VG'.
21. You are back with the new correct layout, with n safe RAID arrays.

Note that step 7 is done one array at a time. The main idea is to reduce the amount of available storage space required by the algorithm. Another option is to remove all LVM volumes (PV) at the same time from their related VG, then to remove their corresponding RAID arrays, and then to recreate them with the correct number of partitions (it should be reduced by one). Removing all those arrays in one turn may result in a big reduction of available storage space that might block the whole process while removing the PVs from their corresponding VG. Since such a removal results in the move of the data from one PV to others (in the same VG), it also requires that there is enough free space in that VG to accommodate the full copy.

On the other side, the algorithm described may result in a vast amount of data transfer. For example, suppose that all PVs are actually in a single VG. The removal of the first PV in the list (PV(k+3) therefore) may result in the move of its data to PV(k+4). Unfortunately, on the next iteration, PV(k+4) will also be removed, resulting in the transfer of the same data to PV(k+5), and so on. Investigation of a smarter algorithm for that specific step 7 is therefore a must.


FIGURE 9 – Adding a device (k) to the pool, general case before (left) and after (right).

4.1.4 RAID array reconstruction

Given the size of current hard drives and the Unrecoverable Bit Error (UBE) rate, 1 error per 10^15 bits read for enterprise class disk drives (SCSI, FC, SAS) and 1 per 10^14 for desktop class disk drives (IDE/ATA/PATA, SATA), the reconstruction of a disk array after the failure of a device can be quite challenging. When the array is in degraded mode, during reconstruction, it tries to get data from the remaining devices. But with today's large device capacities, the probability of an error during that step becomes significant. Especially, there is a trend for large RAID5 groups to be unrecoverable after a single disk failure. Hence the design of RAID6, which can handle 2 simultaneous disk failures, but at a very high write performance cost. Instead of setting up large RAID5 groups, it might be preferable to set up large sets of RAID10 arrays. This gives better results both in terms of reliability (RAID1 is far easier to recover than RAID5) and performance. But the high storage cost (50% of space lost) often makes this choice irrelevant despite the low price per megabyte today. With PROUHD, given that the wasted space is minimal, the RAID10 option might be an acceptable compromise (over the traditional RAID layout, of course). Moreover, in PROUHD, RAID components do not cover entire drives but only a portion of them (a partition). Therefore, the probability of other sector errors is reduced.
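As a rough back-of-the-envelope illustration of that risk (the figures below are assumptions chosen for the example, treating the quoted UBE rate as an independent per-bit error probability), consider rebuilding a degraded RAID5 array of four 2 Tb desktop-class drives, which requires reading the three surviving members:

import math

# Probability of hitting at least one unrecoverable bit error while reading
# the surviving members of a degraded RAID5 array. Illustrative assumptions,
# not measurements: desktop UBE rate, four 2 Tb members, one of them failed.
ube = 1e-14                        # 1 error per 10^14 bits read (desktop class)
surviving_members = 3
member_size_bits = 2e12 * 8        # 2 Tb per member, expressed in bits

bits_read = surviving_members * member_size_bits
p_error = 1 - math.exp(bits_read * math.log1p(-ube))
print(f"{bits_read:.1e} bits read, P(at least one UBE) = {p_error:.0%}")

Under these assumptions the rebuild has close to a 40% chance of hitting an unrecoverable sector, which is the effect described above. Since a PROUHD array only spans a partition of each drive, the amount of data read per rebuild, and therefore this probability, is correspondingly smaller.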

5 Adding/removing a device to/from a PROUHD

As shown by figure 9, adding a new device k to the pool is much simpler than the previous replacement cases. The last partition of the new device impacts the previous layout:

p'_{k+1} = c'_{k+1} − c_k = c'_{k+1} − c'_k
p'_{k+2} = c_{k+1} − c'_{k+1} = c'_{k+2} − c'_{k+1}

And all RAID arrays up to k should see their number of devices increased by one:

dev(R'_i) = dev(R_i) + 1, ∀i ∈ [1, k]

The reverse is also much simpler than any replacement procedure, as shown by figure 10. Removing a device k from the pool also leads to a modification of its related partition p_k:

p'_k = c_{k+1} − c_{k−1} = c'_k − c'_{k−1}


FIGURE 10 – Removing a device (k) from the pool, general case before (left) and after (right).

And all RAID arrays up to k−1 should see their number of devices decreased by one:

dev(R'_i) = dev(R_i) − 1, ∀i ∈ [1, k−1]

Both step-by-step algorithms are quite straightforward compared to the replacement ones. They are therefore left as an exercise for the curious reader.
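As a small illustration of this bookkeeping on the end-user example (the 1.5 Tb device is a hypothetical addition), recomputing the slices before and after shows the split of the impacted partition and the extra member gained by the lower arrays; removal is simply the reverse.

# Adding a hypothetical 1.5 Tb device sde to the pool of figure 2, using the
# prouhd_layout() sketch from section 2.1 (removing a device is the reverse).
pool = {"hda": 1, "hdc": 1, "hdb": 2, "hdd": 4}
for label, devices in (("before", pool), ("after", {**pool, "sde": 1.5})):
    sizes = [(size, len(members)) for size, members in prouhd_layout(devices)]
    print(label, sizes)
# before [(1, 4), (1, 2), (2, 1)]
# after  [(1, 5), (0.5, 3), (0.5, 2), (2, 1)]

The first array gains one member, and the old 1 Tb slice shared by hdb and hdd is split in two around the new device's capacity, matching the two partition equations above.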

6 Forecasting: Storage Box for Average End-Users

Taken individually, each storage device answers some requirements the end-user had at one time (for example, a camera needs an XD card). But often, new storage devices are added to the pool for various reasons (a new camera without XD card support, a new USB disk for more storage space, ...). The end-user ends up having a global storage space composed of individual disconnected components. Some devices still need their context to be useful (the new camera and its new SD card). But others may not be used at all, even if they still work (the old XD card).

This study shows that a storage box can be provided with the following features:
– it provides a global storage space, made of any physical storage devices of any size and any technology (disk, SSD, flash, USB sticks, SDCard, XDCard, and so on);
– it supports disk addition, removal and replacement;
– it supports any RAID level;
– it supports mixtures of RAID levels;
– it supports fault tolerance up to a degree that depends on the RAID levels used;
– when used properly, the box can deliver high performance (for example, if 2 RAID arrays are never used simultaneously);
– it offers good performance for average end-user needs (such as media streaming);
– it is very efficient in terms of storage efficiency: any single byte can be used (either for storage or for fault tolerance, depending on the user's specific needs). Said otherwise, the storage box reduces the wasted space to the bare minimum (that space can still be used for storing data, but fault tolerance is not supported in such a case).

Of course, the complexity of our solution has to be masked to the end-user. As an example, imagine a storage box composed of a vast number of connectors for USB drives and sticks, Firewire disks, SATA/SCSI disks, XD/SD cards and all the others, that implements the presented solution. On initialization, when all devices have been connected, the software will detect all storage devices and will propose simple configurations such as:
– maximize space (choose RAID5 when possible, then RAID10, then RAID1);


– maximize performance (choose RAID10 when possible, then RAID1);
– safe config (choose RAID10 when possible, then RAID5, then RAID1);
– custom config.
A sketch of such a policy selection is given at the end of this section. Presenting those configurations graphically, enabling configuration comparisons, and proposing pre-defined configurations for well-known workloads (multimedia files, system files, log files and so on) will add up to the initial solution. Finally, the main performance (and cost) of such storage boxes will come from the actual number of controllers. Concurrent requests (which RAID naturally increases) are best served when they come from different controllers.
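A minimal Python sketch of such a policy selection follows (the policy names and rules are illustrative assumptions taken from the list above, not the specification of an existing product); it simply maps the number of members of a vertical slice to a RAID level.

# Choose a RAID level for a vertical slice of n_members partitions, according
# to a user-selected policy. Policies and rules are illustrative assumptions.
def pick_level(n_members, policy):
    raid10_ok = n_members >= 4 and n_members % 2 == 0
    if n_members < 2:
        return None                               # single device: wasted slice
    if policy == "maximize space":                # RAID5, then RAID10, then RAID1
        return "RAID5" if n_members >= 3 else ("RAID10" if raid10_ok else "RAID1")
    if policy == "maximize performance":          # RAID10, then RAID1
        return "RAID10" if raid10_ok else "RAID1"
    if policy == "safe config":                   # RAID10, then RAID5, then RAID1
        if raid10_ok:
            return "RAID10"
        return "RAID5" if n_members >= 3 else "RAID1"
    raise ValueError("custom config: ask the user")

# The two arrays of the running example have 4 and 2 members respectively.
for members in (4, 2):
    print(members, pick_level(members, "maximize space"), pick_level(members, "safe config"))

On the running example this selects RAID5 and RAID1 for "maximize space", and RAID10 and RAID1 for "safe config".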

7 Alternatives

One of the problems with the adoption of RAID technology is the lack of support for heterogeneous-sized storage devices. An alternative to our proposition does exist anyway: if failed devices are always replaced by larger ones in a RAID array, then there will come a time when all devices have been replaced. At that time, the whole RAID array can be expanded using the mdadm grow command. Note that this only works for RAID1 and RAID5.

8 Questions, Comments & Suggestions

If you have any questions, comments, and/or suggestions about this document, feel free to contact me at the following address: [email protected].

9 Note

This article has been published in HTML format at http://www.linuxconfig.org/prouhd-raid-for-the-end-user (ISSN 1836-5930), thanks to Lubos Rendek.

10 Acknowledgment

The author would like to thank Lubos Rendek and Pascal Grange for their valuable comments and suggestions.
