VerifyFS in Btrfs Style (Btrfs End-to-End Data Integrity)

Liu Bo ([email protected])

Btrfs community
• Filesystems span many different use cases
• Btrfs has contributors from many different companies (including Facebook, Fujitsu, FusionIO, Intel, Linux Foundation, Netgear, Novell/SUSE, Oracle, Red Hat, STRATO AG) and many individuals
• A broad community ensures that btrfs is full of interesting features

Btrfs
• Copy on write (COW)
• Writable snapshots, read-only snapshots
• Transparent compression (zlib, lzo)
• Integrated multiple-device support
• Built-in RAID with restriping (RAID 0, 1, 10, 5, 6)
• Checksums on data and metadata (crc32c)
• Space-efficient packing of small files
• Conversion of existing ext3/4 file systems
• Subvolume-aware quota support
• Etc.

Data corruptions
• Data from disk != the expected contents
• Why do they happen?
  • At different layers of the storage stack
  • Disk firmware bugs
  • Software bugs
    • Library / kernel errors, e.g. bugs in filesystems and device drivers

Data Integrity
• Why do we need "end to end data integrity" in btrfs?
  • Most filesystems depend on the disk/hardware to detect and report errors
  • Disk firmware is a black box
  • Most filesystems don't guarantee the data is what you're looking for

How to verify data integrity
• Store the checksum with the disk block
  • A disk can be formatted with 520- or 528-byte sectors rather than 512
  • The extra bytes can be used to store a checksum (block-appended checksum)
  • Data and checksum are stored as a unit, so they're self-consistent
[Figure: a sector holding 512 bytes of data followed by 8 or 16 extra bytes]

How to verify data integrity (cont.)
• It is harder than it sounds to make good use of block-level checksums
  • A block-level checksum only proves that a block is self-consistent
  • It doesn't prove that it's the right block
• The rest of the I/O path from the disk to the host remains unprotected

Solutions
• Fault isolation: store data blocks and checksums separately (e.g. btrfs, zfs)
• Add more information in the extra bytes (e.g. T10's Protection Information, DIF)

Btrfs checksum
• Checksums of data blocks are stored in the checksum tree
• Checksums of metadata blocks and the superblock are stored inside their own blocks
[Figures 1 and 2: the checksum tree (root, leaf holding data crc ... data crc) and a metadata block / superblock carrying its own crc]

Btrfs checksum (cont.)
• Already supports the crc32c algorithm
• Checksumming on all things
  • Superblock, metadata blocks and data blocks
• Fast but insecure
  • crc32c isn't suitable for detecting malicious data in general
  • The goal is just to find blocks that are not correctly returned by the storage
• Recently added support for sha256 as an alternative algorithm

Why sha256?
• Fairly strong
• Slower but secure
• Intel has already developed acceleration instructions for sha256
• The btrfs disk format has a checksum size limit

Another checksum: sha256
• For the superblock and metadata blocks, btrfs has reserved 32 bytes (256 bits) for the checksum
• For data blocks, btrfs stores checksums in the crc tree, with no size limit
• No need to change the disk format!

Schemes
• Schemes to detect malicious changes to the FS data
• The Merkle tree?
  • Root hash

Schemes (cont. 1)
• Btrfs + Merkle tree, sounds great?
• Does it work?
  • Unfortunately, no
• A Merkle tree requires...
  • We wouldn't be allowed to write a tree node until all of its children had been checksummed
  • These write-ordering rules for metadata blocks make things difficult under memory pressure

Schemes (cont. 2)
• Checksum + 'btrfs scrub'
• Data scrubbing will ...
  • read all superblocks, metadata blocks and data blocks on disk
  • verify integrity by checking their sums
  • If errors occur (checksum failure or EIO), a good copy is searched for
  • If one is found, the bad copy will be overwritten
• There is a READONLY option

Demo
• Checksum sha256 + btrfs scrub

Limitations
• For btrfs's superblock and metadata blocks, there is no fault isolation
  • but they have two or more copies
    • superblocks have up to 3 copies
    • metadata blocks have 2 copies
• Filesystem checksums are far better suited to read-time error detection
  • which could be months later, when the original buffer is lost
  • The redundant copy may also be bad if the buffer was incorrect
• DIF/DIX checksums catch errors at write time, while we still have a chance to recover with good data in memory

Performance
• Heavily depends on the implementation of sha256 and btrfs scrub

Thank you!
• Questions?
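The two ideas the slides lean on, keeping checksums apart from the data they describe and scrubbing mirrored copies against those checksums, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyVolume`, `csum_tree`, `scrub` and the 4096-byte block size are hypothetical names and values chosen for the example, not btrfs's on-disk format or kernel code, and a plain dictionary stands in for the checksum tree.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def checksum(block):
    """Return the 32-byte SHA-256 digest of a block."""
    return hashlib.sha256(block).digest()


class ToyVolume:
    """Hypothetical two-copy volume; not btrfs's real layout."""

    def __init__(self):
        # Two mirrored "devices", each mapping block number -> contents.
        self.copies = [{}, {}]
        # Checksums live apart from the data (stand-in for the checksum tree).
        self.csum_tree = {}

    def write(self, blocknr, data):
        self.csum_tree[blocknr] = checksum(data)
        for copy in self.copies:
            copy[blocknr] = data

    def read(self, blocknr):
        """Return the first copy whose checksum matches, else raise."""
        expected = self.csum_tree[blocknr]
        for copy in self.copies:
            data = copy.get(blocknr)
            if data is not None and checksum(data) == expected:
                return data
        raise IOError(f"block {blocknr}: no copy matches its checksum")

    def scrub(self, readonly=False):
        """Verify every copy of every block; repair bad copies unless readonly.

        Returns the (device index, block number) pairs that failed.
        """
        bad = []
        for blocknr, expected in self.csum_tree.items():
            good = None
            failed = []
            for dev, copy in enumerate(self.copies):
                data = copy.get(blocknr)
                if data is not None and checksum(data) == expected:
                    good = data
                else:
                    failed.append(dev)
            for dev in failed:
                bad.append((dev, blocknr))
                if good is not None and not readonly:
                    self.copies[dev][blocknr] = good  # overwrite the bad copy
        return bad


vol = ToyVolume()
vol.write(0, b"A" * BLOCK_SIZE)
vol.write(1, b"B" * BLOCK_SIZE)

# Simulate a misdirected write: device 0 now holds block 1's contents at
# block 0. A checksum appended to the block itself would still look valid.
vol.copies[0][0] = vol.copies[0][1]

report_only = vol.scrub(readonly=True)  # detect, but leave the bad copy alone
repaired = vol.scrub()                  # detect and overwrite from the mirror
after = vol.scrub()                     # nothing left to report
```

Because the expected digest is looked up by block number rather than read from the block itself, a block that is intact but in the wrong place fails verification, which is the case a block-appended checksum alone cannot catch.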