Oracle ZFS Storage--Data Integrity White Paper

Oracle ZFS Storage—Data Integrity ORACLE WHITE P A P E R | M A Y 2 0 1 7 Table of Contents Introduction 1 Overview 2 Primary Design Goals 2 Shortcomings of Traditional Data Storage 2 RAID Design Problems 2 Traditional RAID Data Integrity Study 3 ZFS Data Integrity 4 ZFS Transactional Processing 4 ZFS Ditto Blocks 5 ZFS Self-Healing 5 University Research Testing and Validation of ZFS Data Integrity 5 Defining Corruption and How Often Corruption Happens 5 Problems Identified with Other File Systems and RAID 6 Research Testing Methodology 6 Testing Configuration and Data Layout 6 Oracle ZFS Storage Appliance Robustness 7 Reducing Risk Through Advanced Data Integrity 7 Robust Protection from Hardware Failures 7 ZFS Data Encryption 8 Conclusion 8 Related Links 9 ORACLE ZFS STORAGE—DATA INTEGRITY Introduction Studies have shown that traditional RAID technology, while effective for disk space savings, cannot provide sufficient end-to-end data integrity to ensure against data loss or corruption in contemporary cloud-scale data storage environments. For example, traditional RAID technology is unable to isolate and protect against firmware or driver bugs that can have a substantial impact on a storage system’s ability to protect against data corruption or loss. Modern storage architectures, like those incorporated into Oracle ZFS Storage Appliance, protect against these failure modes by providing advanced data integrity technology such as hierarchical checksums, redundant metadata, transactional processing, and integrated redundancy. These features provide a more comprehensive and reliable protection against data corruption or loss. This paper explores inherent design flaws in traditional RAID projects that cause data loss, provides case study evidence of these deficiencies, and demonstrates how modern technology like Oracle ZFS Storage is better suited to today’s contemporary cloud-scale data storage environments.1 1 Throughout this paper, Oracle ZFS Storage is used as an umbrella term that covers numerous products, such as the Oracle ZFS Storage Appliance systems, which use the Oracle Solaris ZFS file system. 1 | ORACLE ZFS STORAGE—DATA INTEGRITY Overview Today's vast repositories of data represent valuable intellectual property that must be protected to ensure the livelihood of the enterprise. While backup and archival capabilities are designed to guard against data loss, they do not necessarily protect against silent data corruption. Furthermore, in the event of a problem, the process of restoring archived data generally entails downtime and thus lost productivity. Even worse, not all restore operations are successful, and this can result in permanent data loss. Primary Design Goals A primary design goal and foundation of Oracle ZFS Storage systems is data integrity. The modern end-to-end data integrity architecture of the ZFS file system is designed to overcome the deficiencies of traditional storage products. Oracle ZFS Storage Appliance architecture is able to deliver industry-leading performance while providing contemporary end-to-end data protection and data integrity capability that ensure against data loss better than traditional storage systems. » Redirect-on-write architecture: Data is never overwritten in place. » Snapshots provide continuous file system replication. » Checksum protection occurs throughout the data path. » “Self-healing” ability replaces damaged data from redundant configurations. » Multiple levels of RAID protection meet modern capacities. » ZFS data path reliability is available throughout the OS, application, and hardware stack. Shortcomings of Traditional Data Storage The problem with traditional data storage products is that there is no defense against silent data corruption. Any defect in disk, controller, cable, driver, laser, or firmware can corrupt data silently. » File systems rely on underlying hardware to detect and report errors. » Hardware doesn’t know that a firmware bug occurred. » If disk returns bad data, traditional file system won’t detect it. Even without hardware problems, data is vulnerable to in-transit damage such as controller bugs, DMA parity errors, and SAN network bugs. Block-level checksums only prove that a block is self-consistent. They do not ensure that it's the right block. There is no fault isolation between the data and the checksum that is supposed to protect it, as illustrated in Figure 1 below. RAID Design Problems RAID 5 and other RAID parity schemes include a fatal flaw known as the RAID 5 write hole. When data in a RAID stripe is updated, the parity also must be updated as part of the reconstruction process when a disk fails. Because no way exists to update two or more disks atomically, RAID stripes can become damaged during a reconstruction process if the system crashes or a power outage occurs. This problem means that now the data and the parity are inconsistent and remain inconsistent. This is a silent failure that corrupts data. RAID systems can provide protection with NVRAM that survives power loss but this solution is expensive, and if the battery-backed NVRAM cache fails, a risk of data inconsistency still exists. A well-known performance problem with RAID systems is that if they do partial-stripe writes, where the data updated is less than a single RAID stripe, the RAID system must read the old data and parity to calculate the new parity. This recalculation slows performance. A possible solution is that the RAID system buffers the partial-stripe writes in 2 | ORACLE ZFS STORAGE—DATA INTEGRITY NVRAM, which hides the latency from the user, but the NVRAM cache can fill up. One RAID vendor’s solution is to sell more expensive cache. Today’s savvy enterprise customers are looking for a storage solution without the costs associated with silent data corruption or slow performance. Figure 1. Traditional RAID approach leaves I/O path vulnerable. Traditional RAID Data Integrity Study A CERN research study on data integrity with traditional RAID solutions found that data corruption occurs at lower levels and that probes and monitors must be implemented and deployed, leading to increased OpEx and CapEx costs. 2 » Testing involved writing 2 GB files with special bit patterns and afterwards, reading the files back to compare the patterns. The testing was deployed on more than 3,000 nodes (disk server, CPU server, database server, etc.) and run every two hours. After about five weeks of testing that was run on 3,000 nodes, the results revealed 500 errors on 100 nodes. » The study’s findings uncovered the following RAID problems: » RAID controllers don’t always check parity when reading data from RAID 5 file systems. » RAID controllers don’t always report problems at the disk level to the upper-level OS. » Running a verify operation for the RAID controller reads all data from disk and recalculates the RAID 5 checksum, but it doesn’t have a notion of what “correct” data is from a user point of view. » Running the verify operation on 492 systems over four weeks resulted in the fix of ~300 block problems. » The discovery of a disk firmware problem required a manual update of the firmware on 3,000 disks. » CERN concluded that they must implement and deploy constant RAID monitoring and intensive probes that would double their original disk I/O performance requirements, and they must also increase CPU capacity on storage servers by 50 percent to accommodate the RAID monitoring. 2 Bernd Panzer-Steindel, CERN/IT, Data Integrity, April 2007 3 | ORACLE ZFS STORAGE—DATA INTEGRITY ZFS Data Integrity ZFS is a modern file system that provides the following data integrity components to eliminate the problems with previous storage products: » End-to-end data integrity requires each data block to be verified against an independent checksum, after the data has arrived in the host's memory. » The ZFS design goal provides fault isolation between data and checksum by storing the checksum of each block in its parent block pointer—not in the block itself. » ZFS is a self-validating Merkle tree of blocks, proven to provide cryptographically strong authentication for any component of the tree, and for the tree as a whole. » Each ZFS block contains checksums for all its children, meaning the entire pool is self-validating. » ZFS knows to trust the checksum because it is part of some other block at a level higher in the tree, and that block has already been validated. » ZFS uses this end-to-end checksum hierarchy to detect and correct silent data corruption: » If a disk returns bad data transiently, ZFS detect its and retries the read. » If disk is part of mirror or RAIDZ group, ZFS both detects and corrects the error by using the checksum to determine the copy is correct, provides good data to the application, and repairs the damaged copy. Figure 2. The ZFS hierarchical checksums can detect more types of errors than traditional block-level checksums. ZFS Transactional Processing ZFS maintains data consistency in the event of system crashes by using a redirect-on-write transactional update model that is similar to a copy-on-write (COW) model, only it is more efficient because it generates one less copy. ZFS file system metadata and data are represented as objects and are grouped into a transactional group for modification. New copies are created for all the modified blocks (in a Merkle tree) when a transaction group is committed to disk. See Figure 1. The root of the tree structure (uberblock) is updated atomically, maintaining an 4 | ORACLE ZFS STORAGE—DATA INTEGRITY always-consistent disk state. The redirect-on-write transactional processing model, along with hierarchical checksums, means there is no need for a journal that is part of many traditional file systems. ZFS Ditto Blocks ZFS uses block pointers to point to data blocks on disk called disk virtual addresses (DVAs). ZFS block pointers identify not just one DVA, but up to three DVAs.

Oracle ZFS Storage--Data Integrity White Paper

Hybrid Drowsy SRAM and STT-RAM Buffer Designs for Dark-Silicon

Z/OS ICSF Overview How to Send Your Comments to IBM

An Analysis of Data Corruption in the Storage Stack

Detection Method of Data Integrity in Network Storage Based on Symmetrical Diﬀerence

Linux Data Integrity Extensions

Understanding Real World Data Corruptions in Cloud Systems

Nasdeluxe Z-Series

MRAM Technology Status

The Effects of Repeated Refresh Cycles on the Oxide Integrity of EEPROM Memories at High Temperature by Lynn Reed and Vema Reddy Tekmos, Inc

On the Effects of Data Corruption in Files

Practical Risk-Based Guide for Managing Data Integrity

Exploiting Asymmetry in Edram Errors for Redundancy-Free Error