
Data Reliability in Highly Fault-Tolerant Cloud Systems

Ajaykumar Rajasekharan, PhD

Seagate Point of View: Determining What Keeps Data Durable and Available

How often have you seen services such as an online email account, online streaming video or an online social page hang? Sometimes these problems are minor inconveniences, and sometimes they result in something major, like data loss. Both hardware (hard drives, servers, switches, networks, etc.) and software (hanging processes, the OS, etc.) fail at some point. There is also a chance of rare extraneous events, such as power outages and inclement weather, that can contribute to system failures. All of these events can affect system performance and lead to downtime, which often translates directly into frustrated customers and lost revenue for the service provider.

Reliability of cloud services is assessed on the basis of two main metrics:

• Data Durability – The probability that a customer’s data is not lost at any point after it has been uploaded into the system.

• Data Availability – The probability that a customer’s data is available for reading, writing or modifying at any point in time.

Data can be unavailable if, for example, the system hangs or a node reboots during a network outage; it becomes available again once the errors have been corrected. Data is lost, on the other hand, when all instances of a particular piece of data are lost, whether through drive failure or data corruption. So while data that is unavailable may not be lost, data that is lost is definitely unavailable. Data durability hence bounds data availability from above (data durability ≥ data availability).

Cloud reliability models can help provide estimates for the metrics of data durability and data availability, as well as estimates of hardware reliability, system availability, the number of annual replacements and replacement costs. Data center architects can use these models to quickly modify hardware and software parameters (number and types of drives, replication/erasure coding, storage zones, network layouts and risk choices) and reorganize their systems to meet their quality of service goals. With this in mind, the Seagate Cloud Modeling and Data Analytics (CMDA) team is developing a suite of such detailed system-level reliability models.

To safeguard against data loss, cloud computing platforms consist of highly fault-tolerant systems that use various methods of data redundancy, including:

• N-fold data replication – Data is partitioned and each partition is replicated n-fold across n storage targets. Data loss occurs only if the replica of the data is corrupted on all n storage targets, so multiple data replicas help ensure data is not lost.

• K out of N erasure coding – An erasure code is a data recovery technique that transforms data or a message of k symbols into a longer message (code word) of n symbols in such a way that the original data or message can be recovered from a subset of the n symbols.

• Declustered-parity Redundant Array of Inexpensive Disks (RAID) – Rather than treating the hard disk drive (HDD) as a whole device, RAID declustering applies RAID protection at the track level. When a disk fails, the RAID rebuild is also applied at the track level, improving rebuild times for failed devices and decreasing the impact on performance of the RAID volume.

Data durability is quantified in terms of the Mean Time to Data Loss (MTTDL), or in terms of the probability of losing data (for example, one object) in one year. Real-world testing of the durability of these systems is a hard task requiring a large amount of testing time and data to analyze. Mathematical models come to the rescue here and allow data durability values to be computed. It is extremely important that the durability of these systems is computed accurately and accounted for, since data loss is costly: overestimating durability exposes the system to data loss risk (a strict no-no), while underestimating it increases cost. Choosing a good model to estimate data durability can therefore be a daunting task, but there are several characteristics to look for that can help you make the best choice.

A good model of the data durability of storage systems should include:

• Mean Time to Failure (MTTF) – The length of time a device or other product is expected to last in operation.

• Mean Time to Repair (MTTR) – A measure of the maintainability of repairable items. It represents the average time needed to repair a failed device or component; mathematically, it is the total corrective maintenance time divided by the total number of corrective maintenance actions during a selected period of time.

• Ratio of MTTF to MTTR – Decreasing MTTF while increasing MTTR generally hurts data durability. Increasing HDD capacity increases MTTR.

• MTTDL – Traditional durability models (as in Ref [1]) use a two-state Markov Chain model (healthy and failed, respectively), with the failed state being reached when the number of hard drive failures exceeds the fault tolerance of the system (2 drive failures in the case of RAID 6). The analysis is carried out on one RAID group, and the result is divided by the number of RAID groups in the system to obtain the system’s MTTDL value (a minimal sketch of this style of calculation follows this list).

• Parallel Rebuilds – The selected model should assume that the system can conduct parallel rebuilds of lost data onto the remaining hard drives, thereby enhancing rebuild performance and durability.

• Correlated Failures – The selected model should address the situation where the failure of a device is related to the failure or non-failure of other devices in the system.

• Combinatorics – The selected model should address the situation where different fragments of the data are distributed randomly across all hard drives in the system, and hence the fact that the number of lost drives exceeds the fault tolerance level does not necessarily mean data loss.
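To make the MTTDL and Combinatorics items above concrete, the sketch below computes the classic closed-form MTTDL approximation that falls out of a two-state (birth-death) Markov analysis of the kind used for RAID in Ref [2], together with the hypergeometric probability that a single object is lost once more drives than the fault tolerance have failed. This is only an illustrative sketch under simplifying assumptions: the function and parameter names are ours, failures and repairs are assumed independent and exponentially distributed, and UREs and correlated failures are ignored.

```python
# Minimal sketch: classic Markov-style MTTDL approximation and a
# combinatorial fragment-placement check. All names and parameter
# values are illustrative assumptions, not Seagate's CMDA model.

from math import comb

HOURS_PER_YEAR = 8766


def mttdl_simple(n_drives: int, fault_tolerance: int,
                 mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTDL (hours) for one group of n_drives that can
    survive up to `fault_tolerance` concurrent failures.

    This is the familiar closed form of the birth-death Markov chain
    used in RAID-style analyses (e.g., MTTF^2 / (n(n-1) MTTR) for
    single-parity RAID). It ignores UREs and correlated failures."""
    numerator = mttf_hours ** (fault_tolerance + 1)
    denominator = mttr_hours ** fault_tolerance
    for i in range(fault_tolerance + 1):
        denominator *= (n_drives - i)
    return numerator / denominator


def p_object_lost(total_drives: int, failed_drives: int,
                  n_fragments: int, fault_tolerance: int) -> float:
    """Probability that one object is lost, given `failed_drives`
    concurrent failures, when its n_fragments sit on distinct,
    uniformly random drives (a hypergeometric tail).

    Loss requires more than `fault_tolerance` of the object's fragments
    to land on failed drives -- illustrating why exceeding the fault
    tolerance system-wide does not automatically mean data loss."""
    total_placements = comb(total_drives, n_fragments)
    lost = 0
    for i in range(fault_tolerance + 1, min(n_fragments, failed_drives) + 1):
        lost += comb(failed_drives, i) * comb(total_drives - failed_drives,
                                              n_fragments - i)
    return lost / total_placements


if __name__ == "__main__":
    # 14-out-of-16 erasure code (fault tolerance 2), MTTF/MTTR as in Figure 1.
    print("MTTDL (years):",
          mttdl_simple(16, 2, mttf_hours=1_400_000,
                       mttr_hours=24) / HOURS_PER_YEAR)
    # Three drives down in a 100-drive system: chance a 16-fragment object is lost.
    print("P(object lost):", p_object_lost(100, 3, 16, 2))
```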

It is desirable that durability models incorporate all of the above aspects, namely parallel rebuilds, correlated failures and the effects of fragment placement strategies. A Continuous Time Markov Chain (CTMC) is a good modeling framework that enables these aspects to be incorporated while allowing fast computation of the durability of the system at hand. Figure 1 compares one such model, developed by the Seagate CMDA group in Ref [1], with the simple and improved equations presented in Ref [2] and Ref [3] for a 14 out of 16 erasure coding scheme. The MTTF of the drives is set to 1,400,000 hours, the MTTR to 24 hours and the block size to 128MB. Data loss due to bit-rot or Unrecoverable Read Errors (UREs), as well as correlation of failures, is excluded here and will be incorporated into future models. The plot shows the under-prediction of durability by the simple RAID-based models in this setting and highlights the need for improved models that correctly capture the durability of highly fault-tolerant systems, rather than generalized simple models.

Figure 1 Data durability versus number of drives in a system

Data availability of highly fault-tolerant systems can be estimated using a storage-node-based data availability Markov Chain model, as shown in Figure 2. This model estimates the probability of the system being in an unavailable state by allowing for node and switch failures and accounting for node rebuilds/reboots. An example model for racks with three nodes is shown here.

Figure 2 Schematic of Markov Chain data availability model
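As a rough illustration of the kind of model sketched in Figure 2, the snippet below builds a small continuous-time Markov chain for a single rack of storage nodes and solves for its steady-state availability. It is a deliberately simplified stand-in, not the CMDA model itself: it assumes exponential node failure and rebuild/reboot times, omits switch failures, and treats data as available while no more than a chosen number of nodes are down; all names and parameter values are illustrative.

```python
# Minimal sketch of a node-level CTMC availability model in the spirit
# of Figure 2, simplified to one rack with independent node failures
# and rebuilds/reboots and no switch failures. State i = number of
# nodes currently down.

import numpy as np


def rack_availability(n_nodes: int, f: int,
                      mttf_hours: float, mttr_hours: float) -> float:
    lam = 1.0 / mttf_hours      # per-node failure rate
    mu = 1.0 / mttr_hours       # per-node rebuild/reboot rate
    n_states = n_nodes + 1

    # Build the CTMC generator matrix Q (birth-death chain).
    Q = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i < n_nodes:
            Q[i, i + 1] = (n_nodes - i) * lam   # another node fails
        if i > 0:
            Q[i, i - 1] = i * mu                # one of the down nodes recovers
        Q[i, i] = -Q[i].sum()

    # Steady-state distribution: solve pi @ Q = 0 with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(n_states)])
    b = np.zeros(n_states + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Data is considered available while no more than f nodes are down.
    return pi[: f + 1].sum()


if __name__ == "__main__":
    # Three-node rack; data still readable with one node down (assumed values).
    print("Availability:", rack_availability(n_nodes=3, f=1,
                                             mttf_hours=50_000, mttr_hours=8))
```

The same construction extends directly to richer models, for example by enlarging the generator matrix with switch-failure states, which is the direction the Figure 2 model takes.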

References

1. “How Safe is Your Data with OpenStack Swift”; Dimitar M V; http://www.evault.com/uncategorized/how-safe-is-your-data-with-openstack-swift/; 2013.
2. “RAID: High-Performance, Reliable Secondary Storage”; Chen P M, Lee E K, Gibson G A, Katz R H and Patterson D A; ACM Computing Surveys, Vol. 26, No. 2; June 1994.
3. “Reliability Models for Highly Fault-Tolerant Storage Systems”; Resch J and Volvovski I; Cleversafe, Inc.; October 2013.


www.seagate.com

AMERICAS Seagate Technology LLC 10200 South De Anza Boulevard, Cupertino, California 95014, United States, 408-658-1000 ASIA/PACIFIC Seagate Singapore International Headquarters Pte. Ltd. 7000 Ang Mo Kio Avenue 5, Singapore 569877, 65-6485-3888 EUROPE, MIDDLE EAST AND AFRICA Seagate Technology SAS 16–18, rue du Dôme, 92100 Boulogne-Billancourt, France, 33 1-4186 10 00

© 2014 Seagate Technology LLC. All rights reserved. Printed in USA. Seagate, Seagate Technology and the Wave logo are registered trademarks of Seagate Technology LLC in the United States and/ or other countries. All other trademarks or registered trademarks are the property of their respective owners. Seagate reserves the right to change, without notice, product offerings or specifications. PV0031.1-1410US, October 2014