VerifyFS in Btrfs Style (Btrfs End-to-End Data Integrity)

Liu Bo ([email protected])

Btrfs community
• Filesystems span many different use cases
• Btrfs has contributors from many different companies (including Facebook, Fujitsu, FusionIO, Intel, Linux Foundation, Netgear, Novell/SUSE, Oracle, Red Hat, STRATO AG) and many individuals
• A broad community ensures that btrfs is full of interesting features

Btrfs
• Copy on write (COW)
• Writable snapshots, read-only snapshots
• Transparent compression (zlib, lzo)
• Integrated multiple-device support
• Built-in RAID with restriping (RAID 0, 1, 10, 5, 6)
• Checksums on data and metadata (crc32c)
• Space-efficient packing of small files
• Conversion of existing ext3/4 file systems
• Subvolume-aware quota support
• Etc.

Data corruptions
• Data from disk != the expected contents
• Why do they happen?
  • At different layers of the storage stack
  • Disk firmware bugs
  • Software bugs: library/kernel errors, e.g. bugs in filesystems and device drivers

Data integrity
• Why do we need "end to end data integrity" in btrfs?
  • Most filesystems depend on the disk/hardware to detect and report errors
  • Disk firmware is a black box
  • Most filesystems don't guarantee that the data is what you were looking for

How to verify data integrity
• Store the checksum with the disk block
  • A disk can be formatted with 520- or 528-byte sectors rather than 512
  • The extra bytes can be used to store a checksum (block-appended checksum)
  • Data and checksum are stored as a unit, so they are self-consistent
[Diagram: a sector holding 512 bytes of data followed by 8 or 16 bytes of checksum]

How to verify data integrity (cont.)
• It is harder than it sounds to make good use of a block-level checksum
  • It only proves that a block is self-consistent
  • It doesn't prove that it's the right block
• The rest of the I/O path from the disk to the host remains unprotected

Solutions
• Fault isolation: store the data block and its checksum separately (e.g. btrfs, zfs)
• Add more information in the extra bytes (e.g. T10 Protection Information, DIF)

Btrfs checksum
• Checksums of data blocks are stored in the checksum tree
• Checksums of metadata blocks and the superblock are stored inside those blocks
[Figure 1: the checksum tree, with data checksums held in its leaf items. Figure 2: a metadata block/superblock with its checksum embedded in the block itself]

Btrfs checksum (cont.)
• crc32c is already supported (see the sketch after the scheme slides below)
• Checksumming on everything: superblock, metadata blocks and data blocks
• Fast but insecure
  • crc32c isn't suitable for detecting malicious data in general
  • The goal is just to find blocks that are not correctly returned by the storage
• sha256 has recently been supported as an alternative algorithm

Why sha256?
• Fairly strong
• Slower but secure
• Intel has already developed acceleration instructions for sha256
• The btrfs disk format has a checksum size limit

Another checksum: sha256
• For the superblock and metadata blocks, btrfs reserves 32 bytes (256 bits) for the checksum
• For data blocks, btrfs stores checksums in the checksum tree, so there is no size limit
• No need to change the disk format! (see the sha256 sketch below)

Schemes
• Schemes to detect malicious changes to the filesystem data
• The Merkle tree?
  • Root hash

Schemes cont. (1)
• Btrfs + Merkle tree, sounds great?
• Does it work? Unfortunately not.
• A Merkle tree requires that a tree node not be written until all of its children have been checksummed
• These write-ordering rules for metadata blocks make things difficult under memory pressure

Schemes cont. (2)
• Checksum + 'btrfs scrub'
• Data scrubbing will:
  • read all superblocks, metadata blocks and data blocks on disk
  • verify their integrity by checking their checksums
  • if an error occurs (checksum failure or EIO), search for a good copy
  • if one is found, overwrite the bad copy with it
• There is a READONLY option
(An illustrative sketch of this repair loop follows below.)
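As a reference for the crc32c checksums mentioned above, here is a minimal bit-by-bit sketch of CRC-32C. It is not the kernel code (which uses table-driven or SSE4.2-accelerated implementations), and the 4 KiB block in the demo is only an example size.

```python
# Bit-by-bit CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78),
# the algorithm btrfs uses for its default checksums. Real implementations
# are table-driven or use the SSE4.2 crc32 instruction; this is only a sketch.

def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

if __name__ == "__main__":
    # Standard check value for CRC-32C.
    assert crc32c(b"123456789") == 0xE3069283
    # Checksum of an example 4 KiB data block full of zeroes.
    print(hex(crc32c(b"\x00" * 4096)))
```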
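The sha256 slides hinge on the fact that a SHA-256 digest is exactly 32 bytes, the space the disk format already reserves per checksum. A minimal sketch of that verify step, using Python's standard hashlib; csum_block and verify_block are hypothetical helpers, not btrfs APIs.

```python
# SHA-256 produces a 32-byte digest, which fits the 32 bytes (BTRFS_CSUM_SIZE)
# that the on-disk format reserves for metadata/superblock checksums; data
# block checksums live in the checksum tree, so size is not an issue there.

import hashlib

BTRFS_CSUM_SIZE = 32  # bytes reserved per checksum in the btrfs disk format

def csum_block(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    assert len(digest) == BTRFS_CSUM_SIZE  # no disk-format change needed
    return digest

def verify_block(block: bytes, stored_csum: bytes) -> bool:
    # stored_csum would come from the checksum tree (data blocks) or the
    # block itself (metadata blocks / superblock); here it is just an argument.
    return csum_block(block) == stored_csum

if __name__ == "__main__":
    block = b"\x00" * 4096
    good = csum_block(block)
    print(verify_block(block, good))                # True
    print(verify_block(b"\x01" + block[1:], good))  # False: corruption detected
```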
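To make the scrub behaviour concrete, here is an illustrative user-space sketch of the repair loop described in the slide above. It is not the kernel scrub code: read_copy, write_copy and the per-block stored checksum are hypothetical stand-ins for the real I/O paths and checksum-tree lookups.

```python
# Sketch of the scrub repair logic: check every copy of a block against its
# stored checksum, and if a bad copy is found, rewrite it from a good one
# (unless running in read-only mode).

import hashlib

def scrub_block(logical, num_copies, stored_csum, read_copy, write_copy,
                readonly=False):
    """Return True if at least one good copy exists (repairing bad ones)."""
    good_data = None
    bad_mirrors = []
    for mirror in range(num_copies):
        data = read_copy(logical, mirror)  # hypothetical helper; None on EIO
        if data is not None and hashlib.sha256(data).digest() == stored_csum:
            good_data = data
        else:
            bad_mirrors.append(mirror)     # checksum failure or read error
    if good_data is None:
        return False                       # uncorrectable: no good copy found
    if not readonly:
        for mirror in bad_mirrors:
            write_copy(logical, mirror, good_data)  # overwrite the bad copy
    return True

if __name__ == "__main__":
    block = b"\x00" * 4096
    copies = {0: block, 1: b"\x01" + block[1:]}   # mirror 1 is corrupted
    csum = hashlib.sha256(block).digest()
    ok = scrub_block(0, 2, csum,
                     read_copy=lambda logical, m: copies[m],
                     write_copy=lambda logical, m, d: copies.__setitem__(m, d))
    print(ok, copies[1] == block)  # True True: the bad mirror was repaired
```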
Demo
• Checksum sha256 + btrfs scrub

Limitations
• For btrfs's superblock and metadata blocks, there is no fault isolation
  • but they have two or more copies:
    • superblocks have up to 3 copies
    • metadata blocks have 2 copies
• Filesystem checksums are much better for read-time error detection
  • which could be months later, when the original buffer is lost
  • the redundant copy may also be bad if the buffer was already incorrect when written
• DIF/DIX checksums catch errors at write time, while we still have a chance to recover with good data in memory

Performance
• Heavily depends on the implementation of sha256 and btrfs scrub

Thank you! Questions?
