CAN RAID TECHNOLOGY CONTINUE to PROTECT OUR DATA in the YEAR 2020? Bruce Yellin Advisory Technology Consultant [email protected] EMC Corporation

CAN RAID TECHNOLOGY CONTINUE TO PROTECT OUR DATA IN THE YEAR 2020? Bruce Yellin Advisory Technology Consultant [email protected] EMC Corporation Table of Contents What’s the Issue? ...................................................................................................................... 4 The Danger ................................................................................................................................ 8 Tackling the Issue Head On ......................................................................................................13 Erasure Coding .....................................................................................................................17 What About Solid State Disks? ..............................................................................................19 Triple parity ...........................................................................................................................22 RAID 1 ..................................................................................................................................22 Big Data ................................................................................................................................23 Conclusion ................................................................................................................................25 Appendix – Background on How A Disk Handles I/O Requests ................................................27 Footnotes ..................................................................................................................................28 Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies. 2012 EMC Proven Professional Knowledge Sharing 2 The digital data tsunami is coming. It is not a once in a decade storm, but a continuous daily event that can overwhelm the largest of data centers. Like a giant surge of rising water, data continues to grow at 60% a year and you need to be prepared. IDC reports over a zettabyte (one trillion gigabytes) of information was created and replicated worldwide in 2010, growing to 1.8 zettabytes just a year later. Data has increased by a factor of nine in just the last five years, and that the number of files will grow 8-fold over the next five years1. Over 90% of that data surge is unstructured, and “big data” is by far the largest producer of unstructured data. One of the challenging issues is how to store and protect all that data. No one is suggesting you’ll have a zettabyte or even an exabyte, but in the next eight years, you could easily be storing a petabyte of data. Will you use RAID to protect it? RAID is the “Swiss Army” tool invented in 19882 to protect and improve the performance of sub-gigabyte drives. Meanwhile, drive capacities have doubled every year or two since then according to Kryder’s Law3. Similar to Moore’s Law, Mark Kryder’s work suggest that by 2015, 6TB drives will be common and by 2020, we will have inexpensive 14TB4 hard drives. Is RAID still the right tool to use? The numbers can be scary. Years ago, 50-100TB seemed like a lot of storage. These days, 250TB of usable capacity minimally needs 320 x 1TB drives using RAID 6 (14+2) protection. At the projected growth rate, you will have a petabyte of usable capacity and almost 1,300 of those drives three years from now. RAID would be protecting over 13,300 x 1TB drives by 2020. Clearly, that is a lot of drives and eight years is beyond their useful drive life, so it is a good thing those larger 6TB drives are coming, right? Well, maybe. Whether you believe data will grow at 40%, 50%, or 60%, or if drive sizes double every year or two or three, one of the issues facing the storage industry is whether today’s parity RAID concepts are up to the task of protecting those large drives. Many have their doubts. Drives have gotten larger because of the advances in areal density. As shown to the right, areal density is the number of bits in a square inch of magnetic media. The original IBM RAMAC in 19565 had a density of 2,000 bits/in2 and today 625 gigabits fit in a square inch of a 4TB drive6. The downside to increased areal density is the potentially harmful increase in bit error rate (BER). 2012 EMC Proven Professional Knowledge Sharing 3 Larger drives also have longer rebuild times. We also know if we consolidate them into even larger ones, such as two 1TB drives combined into a single 2TB drive, the performance profile significantly decreases. Solid-state drives (SSDs) can offset the performance deficiencies of mechanical drives, but they have their own unique reliability issues. With that in mind, this Knowledge Sharing article examines how RAID holds up. What’s the Issue? Mechanical hard disk drives, which will be called HDD throughout this article, use a magnetically coated platter with “flying” disk heads, while SSDs use memory chips. Drive platters are made from aluminum or glass/ceramic that resists expansion and contraction when they get hot or cold. That material is ground down and base coated for an ultra flat surface. Additional polishing is done before various magnetic layers are deposited on the platter through processes such as “sputtering”7, all before a protective coating is applied. The disk drive housing contains one or more platters, a read/write head for each platter surface, a head positioning coil, a motor to spin the platters, and electronics to translate the disk controller’s commands and administer the buffer space. The head, literally flying over the spinning platter at a fraction of an inch at speeds greater than 150 M.P.H. (15,000 RPM drive), imparts a magnetic field or detects the existing magnetic field on the platter. The positioning coil receives instructions from the drive electronics as to where to position the head over the platter. Before a single bit of data is stored on a drive, it is first low-level formatted at the factory and high-level formatted by your storage frame’s operating system. This allows any “bad” sectors to be remapped to spare sectors. With the drive operational, when a server issues a write to a disk sector, the data is broken up into 512 or 4,096 byte sectors. Extra Error Correcting Code (ECC) check bits are calculated at the same time and written along with the sector to the platter. For example, on EMC VMAX®8 and VNX® arrays, a sector is 520 bytes in size with 512 bytes for data and an 8 byte CRC (cyclic redundancy check) checksum to maintain the integrity of the sector’s data. When the drive subsequently reads that sector, it also reads the ECC which together detects if the correct data was read. Data are Run-Length Limited (RLL) encoded to maximize the number of bits stored on the media, well beyond the coding of just a zero or a one bit. 2012 EMC Proven Professional Knowledge Sharing 4 With data packed so densely, ECC frequently corrects the data given the fluctuating signal strengths and electronic noise of a flying head over a rotating platter. It is when ECC is unable to correct the data that your operating system displays a disk read error message. As mentioned earlier, drive capacity is a factor of its areal density, and the denser you can pack the zeroes and ones on the magnetic media, the greater the BER and likelihood of read failures. In this chart, areal density is shown to be limited by the BER, especially when heat, commonly called a thermodynamic effect, becomes a factor.9 Some manufacturers are tackling the BER problem by formatting drives with larger 4,096 byte sectors and 100 bytes of ECC. Larger ECCs distribute the calculation overhead from eight sectors to one sector thereby reducing the error checking overhead caused by 512 byte sectors on dense drives10. The 4KB ECC helps address the BER problem that are in some cases approaching 1 bit of error for every 12.5TB read – more on this in a later section. Another benefit of the 4KB sector is a reduction in the usable capacity overhead on the drive. The improvement ranges from “…seven to eleven percent in physical platter space” and “The 4K format provides enough space to expand the ECC field from 50 to 100 bytes to accommodate new ECC algorithms.”11 Some systems proactively test drive reliability before failures occur by “sniffing” or “scrubbing” them and making sure all the data can be “test read” without error. This happens during idle moments and has very little performance impact on the rest of the system. 2012 EMC Proven Professional Knowledge Sharing 5 Monitoring the well-being of every zero and one on a disk and noting the quantity of recoverable read errors is just one way of predicting a possible future disk failure. These statistics, depending on the system, allow it to proactively move data from troubled disk areas to safe locations on other drives before serious problems arise. This monitoring is just one way a system can perform “self-healing”. Another tool used to predict failure is S.M.A.R.T., or Self-Monitoring Analysis and Reporting Technology. S.M.A.R.T. tracks drive conditions and alerts the user to pending disk failures with the goal of giving you enough time to back up your data before it is lost. This tool incorporates metrics such as reallocated sector counts, temperature, raw read errors, spin-up time, and others. In theory, if the temperature rises or any other triggering event occurs, data can be preemptively moved off the suspect drive or all the data can be copied to a spare. Copying is much faster than parity rebuilding. However, in a Google study of over 100,000 drives, there was “…little predictive value of S.M.A.R.T.“ and “…in the 60 days following the first scan error on a drive, the drive is, on average, 39 times more likely to fail than it would have been had no such error occurred.”12 For more on S.M.A.R.T., please read the article “Playing It S.M.A.R.T.”13. As a disk controller issues read and write drive requests, it may find a sector to be unreliable – i.e., the magnetic coding is not retaining data or is reporting the wrong information. For example, after a write, the drive reads the information again to make sure it was written properly.

CAN RAID TECHNOLOGY CONTINUE to PROTECT OUR DATA in the YEAR 2020? Bruce Yellin Advisory Technology Consultant [email protected] EMC Corporation

Active@ Boot Disk User Guide Copyright © 2008, LSOFT TECHNOLOGIES INC

Active@ UNDELETE Documentation

Chapter 3. Booting Operating Systems

Chapter 9: Peripheral Devices—Overview D a 2/E

Diskgenius User Guide (PDF)

Operating Systems Disk Management

File Allocation Table - Wikipedia, the Free Encyclopedia Page 1 of 22

R-Linux User's Manual

Backing Storage

Lecture Notes on Hard Disk

Darshan Institute of Engineering & Technology for Diploma Studies

Dd GNU Fileutils 4.0.36, Provided with Red Hat Linux 7.1 U.S