Fastscale: Accelerate RAID Scaling by Minimizing Data Migration

FastScale: Accelerate RAID Scaling by Minimizing Data Migration Weimin Zheng, Guangyan Zhang Tsinghua University [email protected] Abstract today’s server environments, many applications (e.g., e- business, scientific computation, and web servers) access Previous approaches to RAID scaling either require a data constantly. The cost of downtime is extremely high very large amount of data to be migrated, or cannot toler- [2], giving rise to the necessity of online and real-time ate multiple disk additions without resulting in disk im- scaling. balance. In this paper, we propose a new approach to Traditional approaches [3, 4, 5] to RAID scaling RAID-0 scaling called FastScale. First, FastScale mini- are restricted by preserving the round-robin order after mizes data migration, while maintaining a uniform data adding disks. The addressing algorithm can be expressed distribution. With a new and elastic addressing function, as follows for the ith scaling operation: it moves only enough data blocks from old disks to fill ½ an appropriate fraction of new disks without migrating d = x mod Ni fi(x) : (1) data among old disks. Second, FastScale optimizes data b = x=Ni migration with two techniques: (1) it accesses multiple physically successive blocks via a single I/O, and (2) it where block b of disk d is the location of logical block x, records data migration lazily to minimize the number of and Ni gives the total number of disks. Generally speak- metadata writes without compromising data consistency. ing, as far as RAID scaling from m disks to m + n is Using several real system disk traces, our experiments concerned, only the data blocks in the first stripe are not show that compared with SLAS, one of the most effi- moved. This indicates that almost 100 percent of data cient traditional approaches, FastScale can reduce redis- blocks have to be migrated no matter what the numbers tribution time by up to 86.06% with smaller maximum of old disks and new disks are. There are some efforts response time of user I/Os. The experiments also illus- [3, 5] concentrating on optimization of data migration. trate that the performance of the RAID-0 scaled using They improve the performance of RAID scaling by a cer- FastScale is almost identical with that of the round-robin tain degree, but do not overcome the limitation of large RAID-0. data migration completely. The most intuitive method to reduce data migration is the semi-RR [6] algorithm. It requires a block movement 1 Introduction only if the resulting disk number is one of new disks. The algorithm can be expressed as follows for the ith scaling Redundant Array of Inexpensive Disks (RAID) [1] was operation: proposed to achieve high performance, large capacity ½ g (x) if (x mod N ) < N and data reliability, while allowing a RAID volume to g (x) = i¡1 i i¡1 (2) i f (x) otherwise be managed as a single device. As user data increase i and computing powers enhance, applications often re- Semi-RR reduces data migration significantly. Unfortu- quire larger storage capacity and higher I/O performance. nately, it does not guarantee uniform distribution of data To supply needed capacity and/or bandwidth, one solu- blocks after subsequent scaling operations (see section tion is to add new disks to a RAID volume. This disk 2.4). This will deteriorate the initial equally distributed addition is termed “RAID scaling”. load. To regain uniform data distribution in all disks includ- In this paper, we propose a novel approach called ing the old and the new, RAID scaling requires certain FastScale to redistribute data for RAID-0 scaling. It ac- blocks to be moved onto added disks. Furthermore, in celerates RAID-0 scaling by minimizing data migration. 1 by up to 86.06% with smaller maximum response 1 m time of user I/Os. m n 1 ( ) ² The performance of the RAID scaled using FastScale is almost identical with that of the round- Ă Ă robin RAID. In this paper, we only describe our solution for RAID- 0 0, i.e., striping without parity. The solution can also work for RAID-10 and RAID-01. Although we do not handle D0 D1 Dm-1 Dm Dm+n-1 old disks new disks RAID-4 and RAID-5, we believe that our method pro- vides a good starting point for efficient scaling of RAID- 4 and RAID-5 arrays. Figure 1: Data migration using FastScale. Only data blocks are moved from old disks to new disks for regaining a uniform distribution, while no data is migrated among old disks. 2 Minimizing Data Migration As shown in Figure 1, FastScale moves only data blocks 2.1 Problem Statement from old disks to new disks enough for preserving the For disk addition into a RAID, it is desirable to ensure an uniformity of data distribution, while not migrating data even load on all the disks and minimal block movement. among old disks. Consequently, the migration fraction of Since the location of a block may be changed during a FastScale reaches the lower bound of the migration frac- scaling operation, another objective is to quickly com- tion, n=(m + n). In other words, FastScale succeeds in pute the current location of a block. minimizing data migration for RAID scaling. To achieve the above objectives, the following three We design an elastic addressing function through requirements should be satisfied for RAID scaling: which the location of one block can be easily computed without any lookup operation. By using this function, ² Requirement 1 (Uniform data distribution): If there FastScale changes only a fraction of the data layout while are B blocks stored on m disks, the expected number preserving the uniformity of data distribution. FastScale of blocks on each disk is approximately B=m so as has several unique features as follows: to maintain an even load. ² FastScale maintains a uniform data distribution af- ² Requirement 2 (Minimal Data Migration): During ter RAID scaling. the addition of n disks to a RAID with m disks stor- ² FastScale minimizes the amount of data to be mi- ing B blocks, the expected number of blocks to be grated entirely. moved is B £ n=(m + n). ² FastScale preserves a simple management of data ² Requirement 3 (Fast data Addressing): In a m-disk due to deterministic placement. RAID, the location of a block is computed by an algorithm with low space and time complexity. ² FastScale can sustain the above three features after multiple disk additions. 2.2 Two Examples of RAID Scaling FastScale also exploits special physical properties to optimize online data migration. First, it uses aggre- Example 1: To understand how the FastScale algorithm gate accesses to improve the efficiency of data migration. works and how it satisfies all of the three requirements, Second, it records data migration lazily to minimize the we take RAID scaling from 3 disks to 5 as an example. number of metadata updates while ensuring data consis- As shown in Figure 2, one RAID scaling process can be tency. divided into two stages logically: data migration and data We implement a detailed simulator that uses DiskSim filling. In the first stage, a fraction of existing data blocks as a worker module to simulate disk accesses. Under sev- are migrated to new disks. In the second stage, new data eral real-system workloads, we evaluate the traditional are filled into the RAID continuously. Actually, the two approach and the FastScale approach. The experimental stages, data migration and data filling, can be overlapped results demonstrate that: in time. For the RAID scaling, each 5 sequential locations in ² Compared with one of the most efficient traditional one disk are grouped into one segment. For the 5 disks, approaches, FastScale shortens redistribution time 5 segments with the same physical address are grouped 2 D0 0 3 6 9 12 15 18 21 24 27 30 D0 0 2 4 6 8 10 12 14 16 18 20 D1 1 4 7 10 13 16 19 22 25 28 31 D1 1 3 5 7 9 11 13 15 17 19 21 D2 2 5 8 11 14 17 20 23 26 29 32 2+3 3+2 D0 6 8 16 18 D0 6 9 12 21 24 27 D1 1 9 11 19 21 D1 1 10 13 16 25 28 31 D2 0 2 10 12 20 D2 2 5 14 17 20 29 32 D3 3 4 13 14 D3 0 3 7 15 18 22 30 D4 5 7 15 17 D4 4 8 11 19 23 26 New data is filled in New data is filled in D0 22 25 28 6 8 37 40 43 16 18 52 D0 33 35 6 9 12 43 45 21 24 27 53 D1 1 26 29 31 9 11 41 44 46 19 21 D1 1 36 37 10 13 16 46 47 25 28 31 D2 0 2 30 32 34 10 12 45 47 49 20 D2 2 5 38 39 14 17 20 48 49 29 32 D3 23 3 4 33 35 38 13 14 48 50 53 D3 0 3 7 40 41 15 18 22 50 51 30 D4 24 27 5 7 36 39 42 15 17 51 54 D4 34 4 8 11 42 44 19 23 26 52 54 Figure 3: RAID scaling from 2 disks to 5 using FastScale, where m < n. Figure 2: RAID scaling from 3 disks to 5 using FastScale, where m ¸ n. that the calculation overhead for the data addressing is very low. into one region.

Load more