
Using Replication for Energy Conservation in RAID Systems

Jinoh Kim and Doron Rotem
Lawrence Berkeley National Laboratory
University of California
Berkeley, CA 94720, USA
{jinohkim,d_rotem}@lbl.gov

Abstract— Energy efficiency has become a major concern in data centers, as several reports predict that the anticipated energy costs over a three-year period will exceed hardware acquisition costs. In particular, several reports indicate that storage devices (and cooling them off) may contribute over 25 percent of the total energy consumed in a data center. In this paper, we present a novel approach for energy conservation, called iRGS (inter-RAID Gear-Shift), which utilizes data replication as a tool for extending the idleness period of a large fraction of the disks. iRGS adapts to the workload observed by the system, thus allowing energy saving subject to the required service level. In particular, iRGS can manage power in large data centers with multiple RAID groups, unlike previous work which deals with individual disks in one RAID group. To enable this, iRGS provides (1) a new replication algorithm that allows gradual adaptation to the workload by gear shifting, (2) a mapping technique to service requests under power saving modes, and (3) a write consistency mechanism for new writes and existing file updates in replicated environments. Simulation with real-life trace data (Cello99) and synthetic data shows that our method saves up to 60% of energy and outperforms existing power management algorithms, while providing better response time.

Keywords: Power Management, RAID, Replication

1. Introduction

Power optimization in data center environments has recently gained a lot of interest because of the costs involved in power delivery and system cooling. In a recent report to Congress [12], the EPA stated that many data centers have already reached their power capacity limit and more than 10% of data centers will be out of power capacity by the end of this year, while 68% expect to be at their limit within the next three years. Among the many components in the data center, it is currently estimated that disk storage systems consume about 25-35 percent of the total power [4]. This percentage of power consumption by disk storage systems will only continue to increase, as data-intensive applications demand fast and reliable access to on-line data resources. This in turn requires the deployment of power-hungry, faster (high RPM), larger-capacity disks. Most of the larger data centers use some type of RAID disk configuration to provide I/O parallelism. While this parallelism benefits performance, it increases the number of spinning disks and energy consumption. There are many suggested techniques for power savings in the research literature, including:

(a) Dynamic power management (DPM) algorithms [5]: These make decisions in real time on when disks should be transitioned to a lower power dissipation state while experiencing an idle period. The length of the idle period before a spin down is triggered is called the idleness threshold. Analytical solutions to this online problem have been evaluated, and it was shown that the optimal idleness threshold should be set to β/Pτ, where β is the energy penalty (in Joules) for having to serve a request while the disk is in standby mode (i.e., spinning the disk down and then spinning it up in order to serve a request) and Pτ is the rate of energy consumption of the disk (in Watts) in the idle mode (a small worked example follows this list). However, it is difficult for this kind of fixed threshold-based technique to adapt to workload characteristics, which may vary significantly over time.

(b) Using Solid State Devices (SSDs) instead of disks [6]: While SSD technology may help to alleviate some of the energy problems, SSDs are currently an order of magnitude more expensive than HDDs in terms of dollars per gigabyte [9].

(c) MAID (massive array of idle disks) technology [2]: In a MAID architecture, the number of drives that can spin at any one time is limited. This allows extremely dense packaging, often impossible in conventional architectures. An example is a single COPAN frame (http://www.sgi.com/products/storage/maid/) that can support up to 896 drives, where at most 25% of the drives are spinning at any given period. Disks are spun down whenever they experience long idle periods. In case data is needed from a non-spinning disk, severe response time penalties may be incurred. This technology was successfully utilized for archival-type data, but in most typical data center environments the disks do not experience long enough idle periods, unless some request redirection is available.

(d) Disks with dynamic RPM [16], [10], [14]: This technology may potentially allow moving disks to a state that consumes less energy while still being able to actively serve requests, although at a slower data transfer rate. This technology is unfortunately not yet available on a large commercial scale.
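As a concrete illustration of the fixed threshold in (a), the following sketch evaluates β/Pτ. The wattage and transition-energy numbers are placeholder assumptions for a generic enterprise disk, not parameters taken from this paper.

```python
# Illustrative only: the optimal fixed idleness threshold beta / P_tau from (a).
# The disk parameters below are assumed placeholders, not this paper's values.

def idleness_threshold(spin_down_j, spin_up_j, idle_watts):
    """Break-even idle time (seconds) after which spinning the disk down pays off."""
    beta = spin_down_j + spin_up_j   # energy penalty (Joules) of a down/up cycle
    return beta / idle_watts         # P_tau: power drawn while staying idle (Watts)

if __name__ == "__main__":
    print(idleness_threshold(spin_down_j=10.0, spin_up_j=135.0, idle_watts=9.3))
    # -> roughly 15.6 seconds for these assumed numbers
```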
Our approach in this work is to use data replication as a tool for extending the idleness period of a large fraction of the disks, thus allowing them to be spun down. Our algorithms adapt to the workload observed by the system (e.g., disk utilization, I/O request arrival rate), thus allowing energy saving subject to the required service level. Our key contributions are as follows:

• We present the design of a new replication algorithm that allows gradual adaptation to the workload by gear shifting for massive RAID systems.
• We develop a mapping technique to service requests under power saving modes, and a write consistency mechanism for new writes and existing file updates in replicated environments.
• We present experimental results with real-life trace data (Cello99 [1]) and synthetic data. The results show that our method outperforms PARAID and DPM algorithms in energy saving, while providing a better response time.

The paper is organized as follows. First, we review related studies, particularly those utilizing replication for energy conservation, in section 2. Our novel replication and mapping techniques for flexible gear shifting are presented in section 3. We provide our evaluation results with synthetic and Cello99 workloads in section 4. Finally, we provide conclusions and future work directions in section 5.

2. Related Work

There has been a great deal of work on energy conservation for large-scale storage systems, based on caching [15], [17], data placement [14], data migration [10], [16], and data replication [11], [8], [13], [7]. These techniques try to prolong disk idle times, so as to make it possible to place idle disks in a low-power state, thus saving energy. Among these techniques, exploiting data replication is an attractive practical option, since it is widely employed in server clusters for diverse purposes, including data availability, durability, load balancing, etc. Although replication requires additional storage capacity, it usually comes at a very low cost, since it is well known that storage resources in data centers are often considerably under-utilized, at around 1/3 of total available capacity [4], [7], [8]. In addition, one critical problem of non-redundancy-based techniques is the possibility of extremely long response times (greater than tens of seconds) for any data accesses against powered-down disks, as these disks need to be spun up before they can serve any requests. Given that data access can be bursty, more than a single request may experience such unexpected, huge latencies, violating service level agreements (SLAs). In this work, we focus on techniques for taking advantage of data replication for energy conservation.

Diverted Accesses [11] and eRAID [8] utilize existing replication for energy saving by redirecting requests under light load. For example, eRAID relies on RAID-1 mirroring. By exploiting replication, it is possible to spin down mirrored disks, thus saving energy. Since the spinning disks contain all data in the system, there would be no data access misses, and this could extend disk sleeping time, unless the system load surges.

While the above studies simply exploit replicated data, some studies [13], [7] provide replication strategies for energy management. Lang et al. [7] suggested a data replication technique based on Chained Declustering. The basic idea is to create a full replica on one of the adjacent nodes in a virtual ring topology. Thus, up to half of the nodes can be spun down for energy management. The authors also showed how they can achieve load balancing in such a ring-based replication setting.

PARAID [13] suggests replication within a RAID group. Instead of running all disks in a RAID group, PARAID determines an appropriate gear level, i.e., how many disks in a RAID group are spinning vs. in sleep mode. This decision is based on the current system load (derived from disk utilization). Based on the gear level, disks in the RAID system are spun down (when the gear goes down) or spun up (when the gear goes up). PARAID provides skewed data replication, so that it can continue service using only a subset of the disks in each RAID group. Our work is inspired by PARAID in terms of shifting gear levels depending on the current workload. One big difference is that our approach, called iRGS (Inter-RAID Gear-Shift), shifts gear levels between RAID groups (or RDGs) rather than within a single RAID group. Hence, there is no sacrifice of parallelism, which PARAID may suffer in low-level gears. In addition, iRGS can be more suitable for large-scale settings, since data centers include multiple, possibly hundreds of, RAID groups. In this case, power management based on RAID groups (instead of individual disks) is simpler and more efficient. In the next section, we discuss the details of our iRGS architecture and operations.
3. Inter-RAID Gear-Shift

The basic idea of iRGS is to maintain replicated data between multiple RDGs. This redundancy allows the system to serve requests from a replica that resides on a spinning disk, allowing a large fraction of the disks to remain in the sleep state and resulting in significant energy conservation. Based on system workload, iRGS shifts the gear, which results in changes of the power state of RDGs. Again, our approach shifts power modes not within a single RAID system, but between multiple RAID groups. Thus, any gear shift spins all disks in a RAID group either down or up. When the load gets light, below a certain boundary, iRGS attempts a one-level gear-down shift, sending an RDG to the low-power state. In such a low-gear mode, replicated data is accessed instead of the original data (here, "original data" means the original copy, while "replicated data" refers to an exact copy of the original data without any encoding techniques). Conversely, iRGS shifts the gear up whenever necessary, e.g., when observing performance degradation. In this section, we discuss the iRGS core functions that enable gear shifts as well as load balancing in a power-saving mode.

Fig. 1: iRGS gear shift

3.1 Replication

We first discuss the replication strategy in iRGS. Figure 1 illustrates iRGS replication and gear shifts. In the highest gear, all RDGs are spinning, whereas at the lowest gear only RDG1 is spinning. For each gear shift, one RDG is either activated (spun up) or deactivated (spun down). As can be seen in the figure, RDGs have different sets of replicas to facilitate gear shifts for power management. We denote by "super-RDGs" the RDGs that are kept spinning even at the lowest gear level, while "ordinary-RDGs" can move to a low-power state to conserve energy. In the example in Figure 1, RDG1 is the only super-RDG.

iRGS employs replication not only for power management but also for fault tolerance. Each data item is maintained in at least 2 copies. Table 1 shows an example of replication in iRGS with 6 RDGs. iRGS replicates all the ordinary-RDGs' data to super-RDGs, where each super-RDG gets an equal fraction of the replicated data. Here, X_{a/b} stands for an a/b fraction of the data of X. Since there are 2 super-RDGs in this iRGS setting, they each keep half of the ordinary-RDGs' data. Ordinary-RDGs also maintain a part of the other ordinary-RDGs' replicas based on the gear-shift principle. By assigning a part of the replicas to ordinary-RDGs, iRGS can distribute the request load in a low-gear mode. In order to maintain fault tolerance requirements, super-RDG blocks are also evenly replicated to ordinary-RDGs, as shown in the table, even though super-RDGs are always kept spinning. It is easy to prove that with this replication strategy, any block of data is replicated at least 2 times and at most 3 times.

Table 1: iRGS replication (X_{a/b} denotes an a/b fraction of the data of X)

  RDG1 (super):    original A;  replicas C_{1/2}, D_{1/2}, E_{1/2}, F_{1/2}
  RDG2 (super):    original B;  replicas C_{1/2}, D_{1/2}, E_{1/2}, F_{1/2}
  RDG3 (ordinary): original C;  replicas D_{1/3}, E_{1/4}, F_{1/5}, A_{1/4}, B_{1/4}
  RDG4 (ordinary): original D;  replicas E_{1/4}, F_{1/5}, A_{1/4}, B_{1/4}
  RDG5 (ordinary): original E;  replicas F_{1/5}, A_{1/4}, B_{1/4}
  RDG6 (ordinary): original F;  replicas A_{1/4}, B_{1/4}
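To make the pattern in Table 1 concrete, the sketch below enumerates replica fractions for a configuration with N RDGs and M super-RDGs under our reading of the gear-shift principle: each ordinary-RDG's data is split evenly across the super-RDGs, the data of RDG j additionally leaves a 1/(j-1) share on every ordinary-RDG that stays up when RDG j is the next to be spun down, and super-RDG data is spread evenly over the ordinary-RDGs. The function and variable names are ours, not from the paper.

```python
# A sketch of the Table 1 replication pattern, under the assumptions stated above.
# Data of RDG j is labeled with the letter chr(ord('A') + j - 1), as in Table 1.

from fractions import Fraction

def replica_layout(n_rdgs, n_super):
    """Return {holder_rdg: [(data_rdg, fraction), ...]} for the iRGS-style layout."""
    layout = {i: [] for i in range(1, n_rdgs + 1)}
    # Ordinary-RDG data (j > M): an equal share on every super-RDG ...
    for j in range(n_super + 1, n_rdgs + 1):
        for s in range(1, n_super + 1):
            layout[s].append((j, Fraction(1, n_super)))
        # ... plus a 1/(j-1) share on each ordinary-RDG that remains up when
        # RDG j is spun down (RDG M+1 .. j-1), so its load can be spread evenly.
        for i in range(n_super + 1, j):
            layout[i].append((j, Fraction(1, j - 1)))
    # Super-RDG data (j <= M): spread evenly over the ordinary-RDGs for fault tolerance.
    for j in range(1, n_super + 1):
        for i in range(n_super + 1, n_rdgs + 1):
            layout[i].append((j, Fraction(1, n_rdgs - n_super)))
    return layout

if __name__ == "__main__":
    for holder, parts in replica_layout(n_rdgs=6, n_super=2).items():
        pretty = ", ".join(f"{chr(ord('A') + j - 1)}[{frac}]" for j, frac in parts)
        print(f"RDG{holder}: {pretty}")
```

For n_rdgs = 6 and n_super = 2, the printed layout matches the example in Table 1.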
One question that arises is: which blocks in an ordinary-RDG are replicated on other ordinary-RDGs? Let us consider an example with the replication in Table 1. The entire set of blocks in D is copied to the super-RDGs (RDG1 and RDG2). Additionally, 1/3 of the blocks in D is replicated to RDG3. The question is which blocks of D should be assigned to RDG3. This is important in order to balance the load at gear level 2 (notice that only RDG1 and RDG2 are running at gear level 1). If the copied blocks of D on RDG3 are drawn only from the half replicated on RDG1, then RDG2 has to serve 1/2 of the load for D at gear 2. Currently, we randomly assign replicas to ordinary-RDGs. Our future work will include exploration of other possible assignments.

Now, let us discuss storage requirements for replicas for each RDG. Suppose there are N RDGs and M super-RDGs (M < N). Let V_i be the volume of original data and S_i be the size of replica storage for RDG-i. For ease of exposition, we assume the M RDGs with the lowest identifiers (i.e., RDG 1 to M) have the role of super-RDGs. Thus, the rest of the RDGs (i.e., RDG M+1 to N) work as ordinary-RDGs. In addition, we assume that power management spins down the RDG with the greatest identifier among the currently working RDGs. Storage requirements for replication in this setting are as follows:

  S_i =  Σ_{k=M+1}^{N} V_k / M            for super-RDGs;
         Σ_{k=1}^{M} V_k / (N − M)        for RDG-N;
         S_N + S_{i+1} + V_{i+1} / i      otherwise (for RDG-i).

3.2 Write Consistency

Although our focus is more on read-dominant environments, iRGS also provides functionality for write consistency. There can be two kinds of write requests:
1) New writes: new files are created.
2) Updates: existing files are updated.

The mechanism we use for write consistency is straightforward. The main principle is that we redirect all writes to super-RDGs first and perform a reorganization phase later. For new writes (case 1), one of the super-RDGs is selected to accommodate the new files. For this, there may be some optimizations, such as first-fit, best-fit, and worst-fit, based on the new file size. For updates (case 2), iRGS simply updates the corresponding super-RDG copy and marks the original copy as "stale", so that it will not be accessed by any subsequent requests. Regardless of gear level, after that, only the super-RDG copies are accessed for any read and write requests, until a reorganization process is executed.

Whenever the gear goes up to the highest level, reorganization is scheduled. Shifting to the highest level means that the current system load is too heavy to handle, so we do not want to impose additional overhead at that time. After a while, if conditions allow iRGS to shift the gear down, reorganization is activated, and further gear shifts are deferred during reorganization. In this phase, modified blocks in super-RDGs are flushed to the RDGs storing the original blocks and the stale information is reset (the details are explained in section 3.3). New files are also moved to one of the ordinary-RDGs (based on the predefined policies), and replication takes place as discussed in section 3.1. Since super-RDGs cache new files, they simply discard any parts of the files that are unnecessary to keep. The gear is then shifted down after reorganization completes. During reorganization, any condition that triggers a gear-up interrupts this process, and the system devotes all its power to user requests.
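The deferred reorganization described above can be summarized by the following sketch. The dict-based block store and the mapping-entry fields (ORA, SRA, flag) are simplified stand-ins we introduce for illustration, not structures defined by the paper.

```python
# A simplified sketch of deferred reorganization: flush super-RDG copies back to
# their home RDGs once the load allows a gear-down. Names are illustrative only.

CLEAN, STALE, CREATE = "clean", "stale", "create"

def reorganize(mapping_table, blocks, pick_ordinary_addr):
    """Copy updated/new blocks from super-RDGs back to ordinary-RDGs."""
    for block_id, e in mapping_table.items():
        if e["flag"] == STALE:
            # Updated block: copy the fresh super-RDG version over the original.
            blocks[e["ora"]] = blocks[e["sra"]]
            e["flag"] = CLEAN
        elif e["flag"] == CREATE:
            # New file cached on a super-RDG: move it to an ordinary-RDG chosen by
            # the placement policy, then replicate as in Section 3.1 (not shown).
            e["ora"] = pick_ordinary_addr(block_id)
            blocks[e["ora"]] = blocks[e["sra"]]
            e["flag"] = CLEAN
        # A gear-up condition would interrupt this loop and defer the rest.

if __name__ == "__main__":
    blocks = {"rdg4:17": b"old", "rdg1:900": b"new"}       # toy block store
    table = {"blk17": {"ora": "rdg4:17", "sra": "rdg1:900", "flag": STALE}}
    reorganize(table, blocks, pick_ordinary_addr=lambda b: "rdg5:0")
    print(blocks["rdg4:17"], table["blk17"]["flag"])        # b'new' clean
```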
As may be noticed, iRGS relies on no additional components such as extra caches for update consistency. Nonetheless, adding disks or non-volatile memory for caching would be helpful for dealing with write requests, as considered by other studies [13], [8]. Our current work is focused on taking advantage of existing infrastructures, but we plan to employ large non-volatile caches in the future, as the price of SSDs rapidly decreases [6].

3.3 Request Mapping

iRGS shifts gears based on system load. In other words, RDGs (ordinary-RDGs) change their power state over time. In the highest gear, load is balanced by allowing access to "original data" rather than "replicated data", provided the original data is not stale (for read requests). When the gear shifts down, the remaining RDGs evenly share the load for the low-powered RDGs. To enable this, iRGS maintains a mapping table, as illustrated in Figure 2. When a request arrives and the original data block is located in a sleeping RDG, iRGS first refers to the mapping table. For each data block, the associated mapping table entry contains replica addresses in both ordinary-RDGs and super-RDGs, in addition to a flag indicating update history (stale or new creation). The associated ordinary-RDG address (ORA) can be null, but the super-RDG address (SRA) should not be null.

Fig. 2: iRGS mapping table

Algorithm 1 illustrates how a request is serviced in the system. If the request is a new write, iRGS allocates a block from one of the super-RDGs, and a new entry is created in the mapping table. In the case of a read, the stale bit is checked first, based on which the request is either directed to the original address or redirected to the mapped address. An update request is handled similarly to a read, but only the super-RDG copies are updated and the stale bit is set to indicate that the other associated copies must not be accessed.

  Input: Request r
  switch r.type do
      case new write:
          Get one of the super-RDGs and a block address;
          Write the block to the address;
          Create a mapping entry with flag = CREATE;
      case update:
          Entry e = MappingTable.get(r.address);
          Write block to e.SRA;
          e.flag = STALE;
      case read:
          Entry e = MappingTable.get(r.address);
          if e.ORA is not null and e.flag != STALE then
              Read block from e.ORA;
          else
              Read block from e.SRA;
          end
  endsw
  Algorithm 1: Request service
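For readers who prefer an executable form, a direct transcription of Algorithm 1 might look as follows. The request object, the mapping-table representation, and the helpers alloc_super_block, read_block, and write_block are hypothetical stand-ins; at low gears a real implementation would also consult the gear state to decide whether the ORA's RDG is actually spinning.

```python
# An executable rendering of Algorithm 1 (request service). Mapping entries carry
# ORA, SRA, and a flag as described above; the I/O helpers are assumed interfaces.

STALE, CREATE = "stale", "create"

def service_request(req, mapping_table, alloc_super_block, read_block, write_block):
    if req.type == "new_write":
        sra = alloc_super_block(req.size)          # place new files on a super-RDG
        write_block(sra, req.data)
        mapping_table[req.address] = {"ora": None, "sra": sra, "flag": CREATE}
    elif req.type == "update":
        e = mapping_table[req.address]
        write_block(e["sra"], req.data)            # update only the super-RDG copy
        e["flag"] = STALE                          # the original copy must not be read
    elif req.type == "read":
        e = mapping_table[req.address]
        if e["ora"] is not None and e["flag"] != STALE:
            return read_block(e["ora"])            # original copy is still valid
        return read_block(e["sra"])                # otherwise redirect to the super-RDG
```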
We use the stripe size as the block size in the mapping table. If the system is configured with 146 GB disks, a 128 KB stripe size, and 24 RDGs (which is our experimental configuration with 120 disks in section 4), the required amount of storage for the mapping table is 230 MB. (We compute this with an 8-byte mapping entry: ORA (31 bits), SRA (31 bits), and flag (2 bits), where each address consists of RDG# (7 bits) and Block# (24 bits).) Since server clusters typically have very large memory (as large as tens of GBs), the mapping table can be accommodated in main memory.

3.4 Discussion

Scalability is an essential requirement for large-scale systems. If there is a large number of RDGs (e.g., several tens of RDGs) in the system, some procedures, such as replication, can become much more complicated. To handle this, we consider "partitioning" of the system: each partition comprises super-RDGs and ordinary-RDGs as discussed, and replication and redirection apply only within a partition.

Exploring gear management is one of the big challenges in this project. Gear shifting can be reactive, as PARAID does based on disk utilization [13], or proactive, by incorporating load prediction models. In our initial evaluation presented in the next section, we use a reactive technique similar to PARAID's, based on disk utilization. One main difference is that iRGS uses super-RDG disk utilizations for gear shifts, while PARAID uses RAID disk utilization. More specifically, iRGS computes disk utilizations for each super-RDG at every time frame (32 seconds, as used in PARAID), and compares the maximum utilization (U) and its associated standard deviation (S) to threshold values for upshift (Tu) and downshift (Td). If U + S > Tu, iRGS shifts the gear up; otherwise, if U + S < Td, iRGS shifts the gear down. For our evaluation, we used Tu = 0.9 and Td = 0.5, but we plan to design more sophisticated gear-shift techniques with prediction models to maximize energy benefits.
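A minimal sketch of this reactive rule is shown below. How the utilization samples are aggregated (here, the mean of the busiest super-RDG and the spread of its samples within the frame) is our interpretation of U and S, and the monitoring source that produces the samples is assumed to exist elsewhere.

```python
# A minimal sketch of the reactive gear-shift rule: every time frame, compare the
# busiest super-RDG's utilization plus its spread against the up/down thresholds.

from statistics import mean, pstdev

T_UP, T_DOWN = 0.9, 0.5          # thresholds used in our evaluation
FRAME_SECONDS = 32               # time-frame length, as in PARAID

def next_gear(gear, util_samples_per_super, min_gear, max_gear):
    """Return the gear for the next frame.

    util_samples_per_super: one list of utilization samples in [0, 1], collected
    during the frame, for each super-RDG.
    """
    busiest = max(util_samples_per_super, key=mean)
    u, s = mean(busiest), pstdev(busiest)     # U and S for the busiest super-RDG
    if u + s > T_UP and gear < max_gear:
        return gear + 1                       # shift up: wake one more RDG
    if u + s < T_DOWN and gear > min_gear:
        return gear - 1                       # shift down: spin one RDG down
    return gear

# e.g. next_gear(3, [[0.4, 0.5, 0.3], [0.2, 0.3, 0.2]], min_gear=2, max_gear=6) -> 2
```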
4. Evaluation

In this section, we present our initial experimental results with synthetic and real workloads.

4.1 Experimental Setup

For evaluation, we augmented Disksim [3], which is widely used for studying storage systems. We consider Seagate Cheetah 15K.5 enterprise disks (http://www.seagate.com/www/en-us/products/enterprise-hard-drives/cheetah-15k/). For this disk model, however, some power information, such as standby power and spin up/down power, is missing from the associated documents. For this reason, we instead chose power parameters from the Seagate Barracuda specification (http://www.seagate.com/support/disc/manuals/sata/100402371a.pdf). Since the main purpose of our experiments is to examine the applicability of iRGS in terms of both performance and power, comparing iRGS with existing techniques under identical power parameters is acceptable. The power model used in this paper is shown in Figure 3.

Fig. 3: Disk power model
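Since the numeric values of Figure 3 are not reproduced in the text, the sketch below uses placeholder wattages and transition energies; only its structure (per-state power plus fixed spin-up/spin-down energies) reflects the kind of model described above.

```python
# A sketch of the kind of disk power model used in the simulation (Figure 3):
# steady-state power per state plus fixed energies for state transitions.
# The numbers below are placeholders, NOT the values from Figure 3.

from dataclasses import dataclass

@dataclass
class DiskPowerModel:
    active_w: float = 17.0       # placeholder: serving I/O
    idle_w: float = 9.3          # placeholder: spinning, no I/O
    standby_w: float = 0.8       # placeholder: spun down
    spinup_j: float = 135.0      # placeholder: energy of one spin-up
    spindown_j: float = 10.0     # placeholder: energy of one spin-down

    def energy(self, active_s, idle_s, standby_s, spinups, spindowns):
        """Total energy (Joules) for one disk over a simulated interval."""
        return (self.active_w * active_s + self.idle_w * idle_s
                + self.standby_w * standby_s
                + self.spinup_j * spinups + self.spindown_j * spindowns)

# e.g. DiskPowerModel().energy(3600, 1800, 1800, spinups=2, spindowns=2)
```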

In our experiments, we consider a large-scale data center consisting of 120 disks. Although our model has no dependency on the RAID organization, we simply assume a RAID-5 structure with 4 data disks and 1 parity disk. Thus, there are 24 individual RAID groups in the system (i.e., 24 RDGs). We partitioned the system into four partitions, each with 6 RDGs. Each partition consists of 2 super-RDGs and 4 ordinary ones.

We employed two workloads: a synthetic workload close to the Internet data access model and a real trace from Cello99 [1]. Cello99 is a trace collection gathered by the HP Storage Research Lab throughout 1999 (except for a couple of weeks in January). We used 3-day traces between May 1st and May 3rd. Table 2 describes the characteristics of the workloads in our experiments. As seen in the table, the synthetic trace is read-dominant, while the Cello99 trace is write-intensive.

Table 2: Workload traces

                        Synthetic               Cello99
  # Requests            1,000,000               17,783,038
  R:W ratio             9:1                     3.5:6.5
  Inter-arrival time    10 ms (exponential)     42 ms (R), 22 ms (W)
  Disk access pattern   Zipf (α = 1.0)          N/A
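The synthetic workload in Table 2 can be generated along the following lines; the Zipf normalization over disks, the random seed, and the omission of request sizes are our simplifications, not choices documented in the paper.

```python
# A sketch of a Table-2-style synthetic workload: exponential inter-arrival times
# (10 ms mean), a 9:1 read/write mix, and a Zipf(alpha = 1.0) choice of target disk.

import random

def synthetic_workload(n_requests, n_disks, mean_interarrival_s=0.010,
                       read_fraction=0.9, alpha=1.0, seed=1):
    rng = random.Random(seed)
    # Zipf weights over disks: popularity of disk k proportional to 1 / k**alpha.
    weights = [1.0 / (k ** alpha) for k in range(1, n_disks + 1)]
    t, trace = 0.0, []
    for _ in range(n_requests):
        t += rng.expovariate(1.0 / mean_interarrival_s)   # next arrival time
        disk = rng.choices(range(n_disks), weights=weights)[0]
        op = "R" if rng.random() < read_fraction else "W"
        trace.append((t, disk, op))
    return trace

# e.g. trace = synthetic_workload(1_000_000, n_disks=120)
```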

We compared 4 different systems: NPS is a baseline system without any energy management functions; FTH is a system employing the fixed idleness threshold Tτ = β/Pτ mentioned in section 1; PARAID(n, m) is a PARAID configuration with n disks, gear-shifting down to m; and iRGS(x, y) is an iRGS configuration with x RDGs and y super-RDGs per partition. We set up two PARAID systems (PARAID(5,3) and PARAID(5,2)) and one iRGS system (iRGS(6,2)). By definition, the PARAID systems can spin down up to 40% (PARAID(5,3)) and 60% (PARAID(5,2)) of the disks, while iRGS(6,2) can spin down at most 67% of the disks.

4.2 Synthetic Workload

We begin by presenting experimental results with the synthetic workload. Figure 4 presents normalized power saving compared to NPS. As shown in the figure, the PARAID systems saved 40%-54% power compared to NPS. iRGS saves even more energy, up to 60%. Interestingly, FTH yielded no power saving. This is due to the fairly high request rate: with a 10 ms inter-arrival time, 4-5 requests can arrive at each disk within the idleness threshold time. Even with the skewed disk access pattern (α = 1.0), there is a high probability that each disk sees at least a single request within the idleness time period. For this reason, in this setting, FTH finds few chances to spin down disks.

Fig. 4: Power saving in synthetic workload
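The claim that FTH rarely finds a long enough idle period can be checked with a back-of-the-envelope calculation: treating per-disk arrivals as Poisson, the probability that a disk sees no request during one threshold window is e^(-λT). The 15-second threshold below is an assumed placeholder (it depends on the Figure 3 power parameters), and the Zipf split of the aggregate arrival rate mirrors the workload description above.

```python
# Back-of-the-envelope check of why FTH saves no energy on the synthetic trace:
# estimate, per disk, the probability of an empty idleness-threshold window.
# The 15 s threshold is a placeholder, not the value used in the experiments.

import math

def idle_window_probabilities(n_disks=120, mean_interarrival_s=0.010,
                              alpha=1.0, threshold_s=15.0):
    total_rate = 1.0 / mean_interarrival_s                  # ~100 requests/s system-wide
    weights = [1.0 / (k ** alpha) for k in range(1, n_disks + 1)]
    z = sum(weights)
    probs = []
    for w in weights:
        disk_rate = total_rate * w / z                      # per-disk arrival rate
        probs.append(math.exp(-disk_rate * threshold_s))    # P(no arrival in window)
    return probs

if __name__ == "__main__":
    p = idle_window_probabilities()
    print(f"most popular disk: {p[0]:.3g}, least popular: {p[-1]:.2f}")
    # Even the least popular disk has only ~10% chance of an empty window here,
    # consistent with FTH finding few spin-down opportunities.
```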
We observed no significant performance differences in terms of the response time distribution. The mean response times are NPS = 3.9 ms, FTH = 4.0 ms, PARAID(5,3) = 4.9 ms, PARAID(5,2) = 4.4 ms, and iRGS(6,2) = 4.8 ms. The 99th-percentile response times of all the techniques are within 10 ms.

4.3 Cello99 Workload

We next report our simulation results with the 3-day Cello99 trace. Figure 5 shows power saving results with this real trace. Unlike with the synthetic workload, FTH could save energy with this workload, suggesting that power management with a fixed threshold can be highly sensitive to workload characteristics. In contrast, PARAID and iRGS consistently conserve energy in both workloads. The power savings of those systems in this workload are almost equal to those in the synthetic workload (37%-60%).

Fig. 5: Power saving in Cello99 workload

Unlike with the synthetic workload, FTH dramatically saved energy, over 70% compared to NPS. This is because FTH had many more chances to spin down disks, thanks to a lower request arrival rate than in the synthetic case. However, it pays severe performance penalties with respect to response time for this power saving. Table 3 summarizes the response times for each technique, on average and at several percentiles. The average response time of FTH is over 3 times that of NPS, and the percentile numbers suggest a heavy tail in the distribution. The PARAID systems show better performance than FTH, but still pay considerable penalties: PARAID(5,2) shows almost a factor of 2 over NPS, while PARAID(5,3) is around 1.5 times NPS in terms of mean response time. With the real trace, iRGS shows a relatively small performance loss (17%) while consuming only 40% of the energy.

Table 3: Response times in Cello99 trace (milliseconds)

  Technique      Mean   Median   90%    95%    99%
  NPS            121    3        37     118    2593
  FTH            429    4        77     858    13689
  PARAID(5,3)    179    5        64     190    3988
  PARAID(5,2)    239    7        105    294    5696
  iRGS(6,2)      142    6        66     139    3286

5. Conclusion

In this work, we developed a system called iRGS (inter-RAID Gear-Shift), which utilizes data replication as a tool for extending the idleness period of a large fraction of the disks. iRGS adapts to the workload observed by the system, thus allowing energy saving subject to the required service level.

For evaluation, we used two workloads: a read-intensive synthetic workload and the real-life Cello99 workload, which is write-intensive. The simulation results show that iRGS saves up to 60% of energy with a relatively small performance loss in terms of response time, while existing power management techniques suffer from significantly increased latencies. Future work will include experimenting with a broader set of real-life traces in order to study more sophisticated replication algorithms that may provide even better load balancing.

References

[1] Tools and traces, http://www.hpl.hp.com/research/ssp/software/.
[2] D. Colarelli and D. Grunwald. Massive arrays of idle disks for storage archives. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1-11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.
[3] Disksim, http://www.pdl.cmu.edu/disksim/.
[4] S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke. Reducing disk power consumption in servers with DRPM. Computer, 36(12):59-66, 2003.
[5] S. Irani, G. Singh, S. K. Shukla, and R. K. Gupta. An overview of the competitive and adversarial approaches to designing dynamic power management strategies. IEEE Trans. VLSI Syst., 13(12):1349-1361, 2005.
[6] A. Kadav, M. Balakrishnan, V. Prabhakaran, and D. Malkhi. Differential RAID: Rethinking RAID for SSD reliability. SIGOPS Oper. Syst. Rev., 44(1):55-59, 2010.
[7] W. Lang, J. Patel, and J. Naughton. On energy management, load balancing and replication. Univ. of Wisconsin Technical Report 1670, 2010.
[8] D. Li and J. Wang. eRAID: A queueing model based energy saving policy. In MASCOTS '06: Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, pages 77-86, Washington, DC, USA, 2006. IEEE Computer Society.
[9] D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, and A. Rowstron. Migrating server storage to SSDs: Analysis of tradeoffs. In EuroSys '09: Proceedings of the 4th ACM European Conference on Computer Systems, pages 145-158, New York, NY, USA, 2009. ACM.
[10] E. Pinheiro and R. Bianchini. Energy conservation techniques for cluster-based servers. In ICS '04: Proceedings of the 18th Annual International Conference on Supercomputing, pages 68-78, New York, NY, USA, 2004. ACM.
[11] E. Pinheiro, R. Bianchini, and C. Dubnicki. Exploiting redundancy to conserve energy in storage systems. In SIGMETRICS '06/Performance '06: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, pages 15-26, New York, NY, USA, 2006. ACM.
[12] EPA report to Congress on server and data center energy efficiency, August 2007, http://www.federalnewsradio.com/pdfs/epadatacenterreporttocongress-august2007.pdf.
[13] C. Weddle, M. Oldham, J. Qian, A.-I. A. Wang, P. Reiher, and G. Kuenning. PARAID: A gear-shifting power-aware RAID. ACM Trans. Storage, 3(3):13, 2007.
[14] T. Xie. SEA: A striping-based energy-aware strategy for data placement in RAID-structured storage systems. IEEE Trans. Comput., 57(6):748-761, 2008.
[15] X. Yao and J. Wang. RIMAC: A novel redundancy-based hierarchical cache architecture for energy efficient, high performance storage systems. In EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, pages 249-262, New York, NY, USA, 2006. ACM.
[16] Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes. Hibernator: Helping disk arrays sleep through the winter. In SOSP '05: Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, pages 177-190, New York, NY, USA, 2005. ACM.
[17] Q. Zhu and Y. Zhou. Power-aware storage cache management. IEEE Trans. Comput., 54(5):587-602, 2005.